* [PATCH 001/190] drm: Release driver references to handle before making it available again
From: Chris Wilson @ 2016-01-11  9:16 UTC
  To: intel-gfx; +Cc: dri-devel, Daniel Vetter, Thierry Reding

When userspace closes a handle, we remove it from the file->object_idr
and then tell the driver to drop its references to that file/handle.
However, as the handle is immediately available again for reuse, it
may be reallocated back to userspace and be active on a new object
before the driver has had a chance to drop the old file/handle
references.

Whilst calling back into the driver, we have to drop the
file->table_lock spinlock, so to prevent reuse of the closed handle we
mark that handle as stale in the idr, perform the callback and only
then remove the handle. We set the stale handle to point to the NULL
object, so any idr_find() whilst the driver is releasing the handle
returns NULL, just as if the handle had already been removed from the
idr.

v2: Use NULL rather than an ERR_PTR to avoid having to adjust callers.
idr_alloc() tracks existing handles using an internal bitmap, so we are
free to use the NULL object as our stale identifier.
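
In outline, the delete path becomes a three-step dance (a sketch of the
pattern rather than the exact diff below; it treats both an ERR_PTR,
i.e. a never-allocated id, and NULL, i.e. a handle already being
closed, as invalid):

	spin_lock(&filp->table_lock);
	obj = idr_replace(&filp->object_idr, NULL, handle); /* mark stale */
	spin_unlock(&filp->table_lock);
	if (IS_ERR_OR_NULL(obj))
		return -EINVAL;

	/* concurrent idr_find() now returns NULL for this handle */
	drm_gem_object_release_handle(handle, obj, filp);

	spin_lock(&filp->table_lock);
	idr_remove(&filp->object_idr, handle); /* id reusable from here */
	spin_unlock(&filp->table_lock);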

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: dri-devel@lists.freedesktop.org
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Thierry Reding <treding@nvidia.com>
---
 drivers/gpu/drm/drm_gem.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 2e8c77e71e1f..d1909d1a1eb4 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -294,18 +294,21 @@ drm_gem_handle_delete(struct drm_file *filp, u32 handle)
 	spin_lock(&filp->table_lock);
 
 	/* Check if we currently have a reference on the object */
-	obj = idr_find(&filp->object_idr, handle);
-	if (obj == NULL) {
+	obj = idr_replace(&filp->object_idr, NULL, handle);
+	if (IS_ERR(obj)) {
 		spin_unlock(&filp->table_lock);
 		return -EINVAL;
 	}
 	dev = obj->dev;
+	spin_unlock(&filp->table_lock);
 
 	/* Release reference and decrement refcount. */
+	drm_gem_object_release_handle(handle, obj, filp);
+
+	spin_lock(&filp->table_lock);
 	idr_remove(&filp->object_idr, handle);
 	spin_unlock(&filp->table_lock);
 
-	drm_gem_object_release_handle(handle, obj, filp);
 	return 0;
 }
 EXPORT_SYMBOL(drm_gem_handle_delete);
-- 
2.7.0.rc3


* [PATCH 002/190] drm/i915: Move the mb() following release-mmap into release-mmap
From: Chris Wilson @ 2016-01-11  9:16 UTC
  To: intel-gfx; +Cc: Daniel Vetter, Goel, Akash

As paranoia, we want to ensure that the CPU's PTEs have been revoked for
the object before we return from i915_gem_release_mmap(). This allows us
to rely on there being no outstanding memory accesses and guarantees
serialisation of the code against concurrent access just by calling
i915_gem_release_mmap().

v2: Reduce the mb() into a wmb() following the revoke.
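
The ordering requirement in outline (a sketch; the mapping argument is
elided):

	lockdep_assert_held(&obj->base.dev->struct_mutex);

	drm_vma_node_unmap(&obj->base.vma_node, mapping); /* zap the PTEs */
	wmb(); /* revocation must be visible before the flag is cleared */
	obj->fault_mappable = false;

Only the writes need ordering on this path, which is why the full mb()
can be reduced to a wmb() after the revoke.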

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: "Goel, Akash" <akash.goel@intel.com
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_gem.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 6c60e04fc09c..3ab529669448 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1962,11 +1962,21 @@ out:
 void
 i915_gem_release_mmap(struct drm_i915_gem_object *obj)
 {
+	/* Serialisation between user GTT access and our code depends upon
+	 * revoking the CPU's PTE whilst the mutex is held. The next user
+	 * pagefault then has to wait until we release the mutex.
+	 */
+	lockdep_assert_held(&obj->base.dev->struct_mutex);
+
 	if (!obj->fault_mappable)
 		return;
 
 	drm_vma_node_unmap(&obj->base.vma_node,
 			   obj->base.dev->anon_inode->i_mapping);
+
+	/* Ensure that the CPU's PTEs are revoked before we return */
+	wmb();
+
 	obj->fault_mappable = false;
 }
 
@@ -3269,9 +3279,6 @@ static void i915_gem_object_finish_gtt(struct drm_i915_gem_object *obj)
 	if ((obj->base.read_domains & I915_GEM_DOMAIN_GTT) == 0)
 		return;
 
-	/* Wait for any direct GTT access to complete */
-	mb();
-
 	old_read_domains = obj->base.read_domains;
 	old_write_domain = obj->base.write_domain;
 
-- 
2.7.0.rc3


* [PATCH 003/190] drm/i915: Add an optional selection from i915 of CONFIG_MMU_NOTIFIER
From: Chris Wilson @ 2016-01-11  9:16 UTC
  To: intel-gfx

userptr requires mmu-notifiers for full unprivileged support. Most
systems already have mmu-notifier support enabled as a requirement for
virtualisation, but we should make the option for i915 to take
advantage of mmu-notifiers explicit (and enabled by default, so that
regular userspace can take advantage of passing client memory to the
GPU).
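
For illustration, this is roughly the shape of the guard the userptr
code keys off (a sketch, not the exact i915_gem_userptr.c contents):

	#if defined(CONFIG_MMU_NOTIFIER)
	/* Full userptr: an mmu-notifier revokes the GPU's access when
	 * the CPU address space changes, so unprivileged use is safe. */
	#else
	/* Without notifiers, userptr has to be restricted. */
	#endif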

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
---
 drivers/gpu/drm/i915/Kconfig | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
index fcd77b27514d..b979295aab82 100644
--- a/drivers/gpu/drm/i915/Kconfig
+++ b/drivers/gpu/drm/i915/Kconfig
@@ -48,3 +48,14 @@ config DRM_I915_PRELIMINARY_HW_SUPPORT
 	  option changes the default for that module option.
 
 	  If in doubt, say "N".
+
+config DRM_I915_USERPTR
+	bool "Always enable userptr support"
+	depends on DRM_I915
+	select MMU_NOTIFIER
+	default y
+	help
+	  This option selects CONFIG_MMU_NOTIFIER if it isn't already
+	  selected, to enable full userptr support.
+
+	  If in doubt, say "Y".
-- 
2.7.0.rc3


* [PATCH 004/190] drm/i915: Fix some invalid requests cancellations
From: Chris Wilson @ 2016-01-11  9:16 UTC
  To: intel-gfx; +Cc: Chris Wilson, Daniel Vetter, stable

As we add the VMA to the request early, it may be cancelled during
execbuf reservation. This will leave the context object pointing to a
dangling request; i915_wait_request() simply skips the wait and so we
may unbind the object whilst it is still active.

However, if at any point we make a change to the hardware (and equally
importantly our bookkeeping in the driver), we cannot cancel the request
as what has already been written must be submitted. Submitting a partial
request is far easier than trying to unwind the incomplete change.

Unfortunately this patch reintroduces the excess breadcrumb usage that
the olr (outstanding lazy request) used to prevent: e.g. if we
interrupt batchbuffer submission then we still submit the request along
with the memory writes and interrupt (even though we do no real work).
Disassociating requests from breadcrumbs (and semaphores) is a topic
for a past/future series, but is now much more important.
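
The resulting rule of thumb, as a sketch with illustrative control flow
(names taken from the diff below): a request may be cancelled only
while nothing has been emitted on its behalf; once emission starts,
failures must still end in submission.

	ret = i915_gem_request_add_to_client(req, file);
	if (ret) {
		i915_gem_request_cancel(req); /* nothing emitted yet */
		goto err;
	}

	ret = dev_priv->gt.execbuf_submit(params, args, vmas);
	/* even if submission failed part way, flush what was already
	 * written into the ring as a complete request */
	i915_gem_execbuffer_retire_commands(params);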

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: stable@vger.kernel.org
---
 drivers/gpu/drm/i915/i915_drv.h            |  1 -
 drivers/gpu/drm/i915/i915_gem.c            |  7 ++-----
 drivers/gpu/drm/i915/i915_gem_context.c    | 21 +++++++++------------
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 16 +++++-----------
 drivers/gpu/drm/i915/intel_display.c       |  2 +-
 drivers/gpu/drm/i915/intel_lrc.c           |  1 -
 6 files changed, 17 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 747d2d84a18c..ec20814adb0c 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2813,7 +2813,6 @@ int i915_gem_sw_finish_ioctl(struct drm_device *dev, void *data,
 			     struct drm_file *file_priv);
 void i915_gem_execbuffer_move_to_active(struct list_head *vmas,
 					struct drm_i915_gem_request *req);
-void i915_gem_execbuffer_retire_commands(struct i915_execbuffer_params *params);
 int i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
 				   struct drm_i915_gem_execbuffer2 *args,
 				   struct list_head *vmas);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 3ab529669448..fd24877eb0a0 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3384,12 +3384,9 @@ int i915_gpu_idle(struct drm_device *dev)
 				return ret;
 
 			ret = i915_switch_context(req);
-			if (ret) {
-				i915_gem_request_cancel(req);
-				return ret;
-			}
-
 			i915_add_request_no_flush(req);
+			if (ret)
+				return ret;
 		}
 
 		ret = intel_ring_idle(ring);
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index c25083c78ba7..e5e9a8918f19 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -661,7 +661,6 @@ static int do_switch(struct drm_i915_gem_request *req)
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
 	struct intel_context *from = ring->last_context;
 	u32 hw_flags = 0;
-	bool uninitialized = false;
 	int ret, i;
 
 	if (from != NULL && ring == &dev_priv->ring[RCS]) {
@@ -768,6 +767,15 @@ static int do_switch(struct drm_i915_gem_request *req)
 			to->remap_slice &= ~(1<<i);
 	}
 
+	if (!to->legacy_hw_ctx.initialized) {
+		if (ring->init_context) {
+			ret = ring->init_context(req);
+			if (ret)
+				goto unpin_out;
+		}
+		to->legacy_hw_ctx.initialized = true;
+	}
+
 	/* The backing object for the context is done after switching to the
 	 * *next* context. Therefore we cannot retire the previous context until
 	 * the next context has already started running. In fact, the below code
@@ -791,21 +799,10 @@ static int do_switch(struct drm_i915_gem_request *req)
 		i915_gem_context_unreference(from);
 	}
 
-	uninitialized = !to->legacy_hw_ctx.initialized;
-	to->legacy_hw_ctx.initialized = true;
-
 done:
 	i915_gem_context_reference(to);
 	ring->last_context = to;
 
-	if (uninitialized) {
-		if (ring->init_context) {
-			ret = ring->init_context(req);
-			if (ret)
-				DRM_ERROR("ring init context: %d\n", ret);
-		}
-	}
-
 	return 0;
 
 unpin_out:
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index dccb517361b3..b8186bd061c1 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1136,7 +1136,7 @@ i915_gem_execbuffer_move_to_active(struct list_head *vmas,
 	}
 }
 
-void
+static void
 i915_gem_execbuffer_retire_commands(struct i915_execbuffer_params *params)
 {
 	/* Unconditionally force add_request to emit a full flush. */
@@ -1318,7 +1318,6 @@ i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
 	trace_i915_gem_ring_dispatch(params->request, params->dispatch_flags);
 
 	i915_gem_execbuffer_move_to_active(vmas, params->request);
-	i915_gem_execbuffer_retire_commands(params);
 
 	return 0;
 }
@@ -1607,8 +1606,10 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		goto err_batch_unpin;
 
 	ret = i915_gem_request_add_to_client(params->request, file);
-	if (ret)
+	if (ret) {
+		i915_gem_request_cancel(params->request);
 		goto err_batch_unpin;
+	}
 
 	/*
 	 * Save assorted stuff away to pass through to *_submission().
@@ -1624,6 +1625,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	params->ctx                     = ctx;
 
 	ret = dev_priv->gt.execbuf_submit(params, args, &eb->vmas);
+	i915_gem_execbuffer_retire_commands(params);
 
 err_batch_unpin:
 	/*
@@ -1640,14 +1642,6 @@ err:
 	i915_gem_context_unreference(ctx);
 	eb_destroy(eb);
 
-	/*
-	 * If the request was created but not successfully submitted then it
-	 * must be freed again. If it was submitted then it is being tracked
-	 * on the active request list and no clean up is required here.
-	 */
-	if (ret && params->request)
-		i915_gem_request_cancel(params->request);
-
 	mutex_unlock(&dev->struct_mutex);
 
 pre_mutex_err:
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index b4cf9ce16155..959868c40018 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -11751,7 +11751,7 @@ cleanup_unpin:
 	intel_unpin_fb_obj(fb, crtc->primary->state);
 cleanup_pending:
 	if (request)
-		i915_gem_request_cancel(request);
+		i915_add_request_no_flush(request);
 	atomic_dec(&intel_crtc->unpin_work_count);
 	mutex_unlock(&dev->struct_mutex);
 cleanup:
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index f7fac5f3b5ce..7f17ba852b8a 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -972,7 +972,6 @@ int intel_execlists_submission(struct i915_execbuffer_params *params,
 	trace_i915_gem_ring_dispatch(params->request, params->dispatch_flags);
 
 	i915_gem_execbuffer_move_to_active(vmas, params->request);
-	i915_gem_execbuffer_retire_commands(params);
 
 	return 0;
 }
-- 
2.7.0.rc3



* [PATCH 005/190] drm/i915: Force clean compilation with -Werror
From: Chris Wilson @ 2016-01-11  9:16 UTC
  To: intel-gfx; +Cc: Jani Nikula

Our driver compiles clean (nowadays thanks to 0day) but for me, at least,
it would be beneficial if the compiler threw an error rather than a
warning when it found a piece of suspect code. (I use this to
compile-check patch series and want to break on the first compiler error
in order to fix the patch.)

v2: Kick off a new "Debugging" submenu for i915.ko

At this point, we applied it to the kernel and promptly kicked it out
again as it broke the buildbots (due to a compiler warning on 32-bit
builds):

commit 908d759b210effb33d927a8cb6603a16448474e4
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Tue May 26 07:46:21 2015 +0200

    Revert "drm/i915: Force clean compilation with -Werror"

v3: Avoid enabling -Werror for allyesconfig/allmodconfig builds, using
COMPILE_TEST as a suitable proxy suggested by Andrew Morton. (Damien)
Only make the option available for EXPERT to reinforce that the option
should not be casually enabled.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/Kconfig       |  6 ++++++
 drivers/gpu/drm/i915/Kconfig.debug | 12 ++++++++++++
 drivers/gpu/drm/i915/Makefile      |  2 ++
 3 files changed, 20 insertions(+)
 create mode 100644 drivers/gpu/drm/i915/Kconfig.debug

diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
index b979295aab82..33e8563c2f99 100644
--- a/drivers/gpu/drm/i915/Kconfig
+++ b/drivers/gpu/drm/i915/Kconfig
@@ -59,3 +59,9 @@ config DRM_I915_USERPTR
 	  selected to enabled full userptr support.
 
 	  If in doubt, say "Y".
+
+menu "drm/i915 Debugging"
+depends on DRM_I915
+depends on EXPERT
+source drivers/gpu/drm/i915/Kconfig.debug
+endmenu
diff --git a/drivers/gpu/drm/i915/Kconfig.debug b/drivers/gpu/drm/i915/Kconfig.debug
new file mode 100644
index 000000000000..1f10ee228eda
--- /dev/null
+++ b/drivers/gpu/drm/i915/Kconfig.debug
@@ -0,0 +1,12 @@
+config DRM_I915_WERROR
+	bool "Force GCC to throw an error instead of a warning when compiling"
+	default n
+	# As this may inadvertently break the build, only allow the user
+	# to shoot oneself in the foot iff they aim really hard
+	depends on EXPERT
+	# We use the dependency on !COMPILE_TEST to not be enabled in
+	# allmodconfig or allyesconfig configurations
+	depends on !COMPILE_TEST
+	---help---
+	  Add -Werror to the build flags for (and only for) i915.ko.
+	  Do not enable this unless you are writing code for the i915.ko module.
diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 0851de07bd13..1e9895b9a546 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -2,6 +2,8 @@
 # Makefile for the drm device driver.  This driver provides support for the
 # Direct Rendering Infrastructure (DRI) in XFree86 4.1.0 and higher.
 
+subdir-ccflags-$(CONFIG_DRM_I915_WERROR) := -Werror
+
 # Please keep these build lists sorted!
 
 # core driver code
-- 
2.7.0.rc3


* [PATCH 006/190] drm/i915: Add GEM debugging Kconfig option
From: Chris Wilson @ 2016-01-11  9:16 UTC
  To: intel-gfx

Currently there is a #define to enable extra BUG_ONs for debugging
requests and associated activities. I want to expand its use to cover
all of GEM internals (so that we can saturate the code with asserts).
We can add a Kconfig option to make it easier to enable - with the
usual caveat of not enabling it unless explicitly requested.
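
With the option disabled the macro expands to nothing, so the asserted
expression is not even evaluated in production builds (a sketch of the
intent; the usage line is illustrative):

	#ifdef CONFIG_DRM_I915_DEBUG_GEM
	#define GEM_BUG_ON(expr) BUG_ON(expr) /* hard assert on failure */
	#else
	#define GEM_BUG_ON(expr) /* compiled out entirely */
	#endif

	GEM_BUG_ON(obj->active); /* invariant checked in debug builds only */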

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/Kconfig.debug |  8 ++++++++
 drivers/gpu/drm/i915/i915_drv.h    |  6 ++++++
 drivers/gpu/drm/i915/i915_gem.c    | 12 +++++-------
 3 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/Kconfig.debug b/drivers/gpu/drm/i915/Kconfig.debug
index 1f10ee228eda..7fa6b97635e5 100644
--- a/drivers/gpu/drm/i915/Kconfig.debug
+++ b/drivers/gpu/drm/i915/Kconfig.debug
@@ -10,3 +10,11 @@ config DRM_I915_WERROR
 	---help---
 	  Add -Werror to the build flags for (and only for) i915.ko.
 	  Do not enable this unless you are writing code for the i915.ko module.
+
+config DRM_I915_DEBUG_GEM
+	bool "Insert extra checks into the GEM internals"
+	default n
+	depends on DRM_I915_WERROR
+	---help---
+	  Enable extra sanity checks (including BUGs) that may slow the
+	  system down and, if hit, hang the machine.
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index ec20814adb0c..1a6168affadd 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2271,6 +2271,12 @@ struct drm_i915_gem_request {
 
 };
 
+#ifdef CONFIG_DRM_I915_DEBUG_GEM
+#define GEM_BUG_ON(expr) BUG_ON(expr)
+#else
+#define GEM_BUG_ON(expr)
+#endif
+
 int i915_gem_request_alloc(struct intel_engine_cs *ring,
 			   struct intel_context *ctx,
 			   struct drm_i915_gem_request **req_out);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index fd24877eb0a0..99fd6aa4dd62 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -38,8 +38,6 @@
 #include <linux/pci.h>
 #include <linux/dma-buf.h>
 
-#define RQ_BUG_ON(expr)
-
 static void i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj);
 static void i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj);
 static void
@@ -1520,7 +1518,7 @@ i915_gem_object_wait_rendering(struct drm_i915_gem_object *obj,
 
 			i915_gem_object_retire__read(obj, i);
 		}
-		RQ_BUG_ON(obj->active);
+		GEM_BUG_ON(obj->active);
 	}
 
 	return 0;
@@ -2430,8 +2428,8 @@ void i915_vma_move_to_active(struct i915_vma *vma,
 static void
 i915_gem_object_retire__write(struct drm_i915_gem_object *obj)
 {
-	RQ_BUG_ON(obj->last_write_req == NULL);
-	RQ_BUG_ON(!(obj->active & intel_ring_flag(obj->last_write_req->ring)));
+	GEM_BUG_ON(obj->last_write_req == NULL);
+	GEM_BUG_ON(!(obj->active & intel_ring_flag(obj->last_write_req->ring)));
 
 	i915_gem_request_assign(&obj->last_write_req, NULL);
 	intel_fb_obj_flush(obj, true, ORIGIN_CS);
@@ -2442,8 +2440,8 @@ i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
 {
 	struct i915_vma *vma;
 
-	RQ_BUG_ON(obj->last_read_req[ring] == NULL);
-	RQ_BUG_ON(!(obj->active & (1 << ring)));
+	GEM_BUG_ON(obj->last_read_req[ring] == NULL);
+	GEM_BUG_ON(!(obj->active & (1 << ring)));
 
 	list_del_init(&obj->ring_list[ring]);
 	i915_gem_request_assign(&obj->last_read_req[ring], NULL);
-- 
2.7.0.rc3


* [PATCH 007/190] drm/i915: Hide the atomic_read(reset_counter) behind a helper
From: Chris Wilson @ 2016-01-11  9:16 UTC
  To: intel-gfx; +Cc: Daniel Vetter

This is principally a little bit of syntactic sugar to hide the
atomic_read()s used throughout the code to retrieve the current
reset_counter. It also provides utility functions to check the reset
state on an already-read reset_counter, so that (in later patches) we
can read it once and do multiple tests rather than risk the value
changing between tests.

v2: Strictly convert existing i915_reset_in_progress() users over to
the more verbose i915_reset_in_progress_or_wedged().
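
The intended read-once pattern, using the helpers added below (a
sketch):

	u32 reset = i915_reset_counter(&dev_priv->gpu_error);

	if (__i915_terminally_wedged(reset))
		return -EIO; /* the reset failed, the GPU is gone */
	if (__i915_reset_in_progress(reset))
		return -EAGAIN; /* back off, recovery wants the mutex */
	/* both tests saw the same snapshot; no race between them */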

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_debugfs.c     |  4 ++--
 drivers/gpu/drm/i915/i915_drv.h         | 32 ++++++++++++++++++++++++++++----
 drivers/gpu/drm/i915/i915_gem.c         | 16 ++++++++--------
 drivers/gpu/drm/i915/i915_irq.c         |  2 +-
 drivers/gpu/drm/i915/intel_display.c    | 18 +++++++++++-------
 drivers/gpu/drm/i915/intel_lrc.c        |  2 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c |  4 ++--
 7 files changed, 53 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index e3377abc0d4d..932af05b8eec 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -4696,7 +4696,7 @@ i915_wedged_get(void *data, u64 *val)
 	struct drm_device *dev = data;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
-	*val = atomic_read(&dev_priv->gpu_error.reset_counter);
+	*val = i915_reset_counter(&dev_priv->gpu_error);
 
 	return 0;
 }
@@ -4715,7 +4715,7 @@ i915_wedged_set(void *data, u64 val)
 	 * while it is writing to 'i915_wedged'
 	 */
 
-	if (i915_reset_in_progress(&dev_priv->gpu_error))
+	if (i915_reset_in_progress_or_wedged(&dev_priv->gpu_error))
 		return -EAGAIN;
 
 	intel_runtime_pm_get(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 1a6168affadd..b274237726de 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2983,20 +2983,44 @@ void i915_gem_retire_requests_ring(struct intel_engine_cs *ring);
 int __must_check i915_gem_check_wedge(struct i915_gpu_error *error,
 				      bool interruptible);
 
+static inline u32 i915_reset_counter(struct i915_gpu_error *error)
+{
+	return atomic_read(&error->reset_counter);
+}
+
+static inline bool __i915_reset_in_progress(u32 reset)
+{
+	return unlikely(reset & I915_RESET_IN_PROGRESS_FLAG);
+}
+
+static inline bool __i915_reset_in_progress_or_wedged(u32 reset)
+{
+	return unlikely(reset & (I915_RESET_IN_PROGRESS_FLAG | I915_WEDGED));
+}
+
+static inline bool __i915_terminally_wedged(u32 reset)
+{
+	return unlikely(reset & I915_WEDGED);
+}
+
 static inline bool i915_reset_in_progress(struct i915_gpu_error *error)
 {
-	return unlikely(atomic_read(&error->reset_counter)
-			& (I915_RESET_IN_PROGRESS_FLAG | I915_WEDGED));
+	return __i915_reset_in_progress(i915_reset_counter(error));
+}
+
+static inline bool i915_reset_in_progress_or_wedged(struct i915_gpu_error *error)
+{
+	return __i915_reset_in_progress_or_wedged(i915_reset_counter(error));
 }
 
 static inline bool i915_terminally_wedged(struct i915_gpu_error *error)
 {
-	return atomic_read(&error->reset_counter) & I915_WEDGED;
+	return __i915_terminally_wedged(i915_reset_counter(error));
 }
 
 static inline u32 i915_reset_count(struct i915_gpu_error *error)
 {
-	return ((atomic_read(&error->reset_counter) & ~I915_WEDGED) + 1) / 2;
+	return ((i915_reset_counter(error) & ~I915_WEDGED) + 1) / 2;
 }
 
 static inline bool i915_stop_ring_allow_ban(struct drm_i915_private *dev_priv)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 99fd6aa4dd62..78bf980a69bf 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -83,7 +83,7 @@ i915_gem_wait_for_error(struct i915_gpu_error *error)
 {
 	int ret;
 
-#define EXIT_COND (!i915_reset_in_progress(error) || \
+#define EXIT_COND (!i915_reset_in_progress_or_wedged(error) || \
 		   i915_terminally_wedged(error))
 	if (EXIT_COND)
 		return 0;
@@ -1111,7 +1111,7 @@ int
 i915_gem_check_wedge(struct i915_gpu_error *error,
 		     bool interruptible)
 {
-	if (i915_reset_in_progress(error)) {
+	if (i915_reset_in_progress_or_wedged(error)) {
 		/* Non-interruptible callers can't handle -EAGAIN, hence return
 		 * -EIO unconditionally for these. */
 		if (!interruptible)
@@ -1295,7 +1295,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 
 		/* We need to check whether any gpu reset happened in between
 		 * the caller grabbing the seqno and now ... */
-		if (reset_counter != atomic_read(&dev_priv->gpu_error.reset_counter)) {
+		if (reset_counter != i915_reset_counter(&dev_priv->gpu_error)) {
 			/* ... but upgrade the -EAGAIN to an -EIO if the gpu
 			 * is truely gone. */
 			ret = i915_gem_check_wedge(&dev_priv->gpu_error, interruptible);
@@ -1473,7 +1473,7 @@ i915_wait_request(struct drm_i915_gem_request *req)
 		return ret;
 
 	ret = __i915_wait_request(req,
-				  atomic_read(&dev_priv->gpu_error.reset_counter),
+				  i915_reset_counter(&dev_priv->gpu_error),
 				  interruptible, NULL, NULL);
 	if (ret)
 		return ret;
@@ -1562,7 +1562,7 @@ i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
 	if (ret)
 		return ret;
 
-	reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
+	reset_counter = i915_reset_counter(&dev_priv->gpu_error);
 
 	if (readonly) {
 		struct drm_i915_gem_request *req;
@@ -3115,7 +3115,7 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	}
 
 	drm_gem_object_unreference(&obj->base);
-	reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
+	reset_counter = i915_reset_counter(&dev_priv->gpu_error);
 
 	for (i = 0; i < I915_NUM_RINGS; i++) {
 		if (obj->last_read_req[i] == NULL)
@@ -3160,7 +3160,7 @@ __i915_gem_object_sync(struct drm_i915_gem_object *obj,
 	if (!i915_semaphore_is_enabled(obj->base.dev)) {
 		struct drm_i915_private *i915 = to_i915(obj->base.dev);
 		ret = __i915_wait_request(from_req,
-					  atomic_read(&i915->gpu_error.reset_counter),
+					  i915_reset_counter(&i915->gpu_error),
 					  i915->mm.interruptible,
 					  NULL,
 					  &i915->rps.semaphores);
@@ -4128,7 +4128,7 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
 
 		target = request;
 	}
-	reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
+	reset_counter = i915_reset_counter(&dev_priv->gpu_error);
 	if (target)
 		i915_gem_request_reference(target);
 	spin_unlock(&file_priv->mm.lock);
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index f04d799153ca..9a6b0ac54d01 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2484,7 +2484,7 @@ static void i915_reset_and_wakeup(struct drm_device *dev)
 	 * the reset in-progress bit is only ever set by code outside of this
 	 * work we don't need to worry about any other races.
 	 */
-	if (i915_reset_in_progress(error) && !i915_terminally_wedged(error)) {
+	if (i915_reset_in_progress_or_wedged(error) && !i915_terminally_wedged(error)) {
 		DRM_DEBUG_DRIVER("resetting chip\n");
 		kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE,
 				   reset_event);
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 959868c40018..0933bdbaa935 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -3290,10 +3290,12 @@ static bool intel_crtc_has_pending_flip(struct drm_crtc *crtc)
 	struct drm_device *dev = crtc->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
+	unsigned reset_counter;
 	bool pending;
 
-	if (i915_reset_in_progress(&dev_priv->gpu_error) ||
-	    intel_crtc->reset_counter != atomic_read(&dev_priv->gpu_error.reset_counter))
+	reset_counter = i915_reset_counter(&dev_priv->gpu_error);
+	if (intel_crtc->reset_counter != reset_counter ||
+	    __i915_reset_in_progress_or_wedged(reset_counter))
 		return false;
 
 	spin_lock_irq(&dev->event_lock);
@@ -11006,9 +11008,11 @@ static bool page_flip_finished(struct intel_crtc *crtc)
 {
 	struct drm_device *dev = crtc->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
+	unsigned reset_counter;
 
-	if (i915_reset_in_progress(&dev_priv->gpu_error) ||
-	    crtc->reset_counter != atomic_read(&dev_priv->gpu_error.reset_counter))
+	reset_counter = i915_reset_counter(&dev_priv->gpu_error);
+	if (crtc->reset_counter != reset_counter ||
+	    __i915_reset_in_progress_or_wedged(reset_counter))
 		return true;
 
 	/*
@@ -11665,7 +11669,7 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
 		goto cleanup;
 
 	atomic_inc(&intel_crtc->unpin_work_count);
-	intel_crtc->reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
+	intel_crtc->reset_counter = i915_reset_counter(&dev_priv->gpu_error);
 
 	if (INTEL_INFO(dev)->gen >= 5 || IS_G4X(dev))
 		work->flip_count = I915_READ(PIPE_FLIPCOUNT_G4X(pipe)) + 1;
@@ -13499,10 +13503,10 @@ static int intel_atomic_prepare_commit(struct drm_device *dev,
 		return ret;
 
 	ret = drm_atomic_helper_prepare_planes(dev, state);
-	if (!ret && !async && !i915_reset_in_progress(&dev_priv->gpu_error)) {
+	if (!ret && !async && !i915_reset_in_progress_or_wedged(&dev_priv->gpu_error)) {
 		u32 reset_counter;
 
-		reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
+		reset_counter = i915_reset_counter(&dev_priv->gpu_error);
 		mutex_unlock(&dev->struct_mutex);
 
 		for_each_plane_in_state(state, plane, plane_state, i) {
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 7f17ba852b8a..254ce14d790b 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1011,7 +1011,7 @@ void intel_logical_ring_stop(struct intel_engine_cs *ring)
 		return;
 
 	ret = intel_ring_idle(ring);
-	if (ret && !i915_reset_in_progress(&to_i915(ring->dev)->gpu_error))
+	if (ret && !i915_reset_in_progress_or_wedged(&to_i915(ring->dev)->gpu_error))
 		DRM_ERROR("failed to quiesce %s whilst cleaning up: %d\n",
 			  ring->name, ret);
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 339701d7a9a5..8c6b15ab652b 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2274,7 +2274,7 @@ int intel_ring_idle(struct intel_engine_cs *ring)
 
 	/* Make sure we do not trigger any retires */
 	return __i915_wait_request(req,
-				   atomic_read(&to_i915(ring->dev)->gpu_error.reset_counter),
+				   i915_reset_counter(&to_i915(ring->dev)->gpu_error),
 				   to_i915(ring->dev)->mm.interruptible,
 				   NULL, NULL);
 }
@@ -3068,7 +3068,7 @@ intel_stop_ring_buffer(struct intel_engine_cs *ring)
 		return;
 
 	ret = intel_ring_idle(ring);
-	if (ret && !i915_reset_in_progress(&to_i915(ring->dev)->gpu_error))
+	if (ret && !i915_reset_in_progress_or_wedged(&to_i915(ring->dev)->gpu_error))
 		DRM_ERROR("failed to quiesce %s whilst cleaning up: %d\n",
 			  ring->name, ret);
 
-- 
2.7.0.rc3


* [PATCH 008/190] drm/i915: Simplify checking of GPU reset_counter in display pageflips
From: Chris Wilson @ 2016-01-11  9:16 UTC
  To: intel-gfx; +Cc: Daniel Vetter

If, when we store the reset_counter for the operation, we ensure that
it is neither wedged nor in the middle of a reset, we can then assert
that if any reset occurs the reset_counter must change. Later we can
simply compare the operation's reset epoch against the current counter
to see if we need to abort the operation (to handle the hang).
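
The pageflip paths then reduce to (a sketch; names as in the diff
below):

	/* at flip submission: refuse to start inside a bad epoch */
	intel_crtc->reset_counter = i915_reset_counter(&dev_priv->gpu_error);
	if (__i915_reset_in_progress_or_wedged(intel_crtc->reset_counter))
		return -EIO;

	/* later: a bare inequality detects any intervening reset */
	if (crtc->reset_counter != i915_reset_counter(&dev_priv->gpu_error))
		return true; /* treat the flip as finished */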

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/intel_display.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 0933bdbaa935..183c05bdb220 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -3288,14 +3288,12 @@ void intel_finish_reset(struct drm_device *dev)
 static bool intel_crtc_has_pending_flip(struct drm_crtc *crtc)
 {
 	struct drm_device *dev = crtc->dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
 	unsigned reset_counter;
 	bool pending;
 
-	reset_counter = i915_reset_counter(&dev_priv->gpu_error);
-	if (intel_crtc->reset_counter != reset_counter ||
-	    __i915_reset_in_progress_or_wedged(reset_counter))
+	reset_counter = i915_reset_counter(&to_i915(dev)->gpu_error);
+	if (intel_crtc->reset_counter != reset_counter)
 		return false;
 
 	spin_lock_irq(&dev->event_lock);
@@ -11011,8 +11009,7 @@ static bool page_flip_finished(struct intel_crtc *crtc)
 	unsigned reset_counter;
 
 	reset_counter = i915_reset_counter(&dev_priv->gpu_error);
-	if (crtc->reset_counter != reset_counter ||
-	    __i915_reset_in_progress_or_wedged(reset_counter))
+	if (crtc->reset_counter != reset_counter)
 		return true;
 
 	/*
@@ -11668,8 +11665,13 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
 	if (ret)
 		goto cleanup;
 
-	atomic_inc(&intel_crtc->unpin_work_count);
 	intel_crtc->reset_counter = i915_reset_counter(&dev_priv->gpu_error);
+	if (__i915_reset_in_progress_or_wedged(intel_crtc->reset_counter)) {
+		ret = -EIO;
+		goto cleanup;
+	}
+
+	atomic_inc(&intel_crtc->unpin_work_count);
 
 	if (INTEL_INFO(dev)->gen >= 5 || IS_G4X(dev))
 		work->flip_count = I915_READ(PIPE_FLIPCOUNT_G4X(pipe)) + 1;
-- 
2.7.0.rc3


* [PATCH 009/190] drm/i915: Tighten reset_counter for reset status
From: Chris Wilson @ 2016-01-11  9:16 UTC
  To: intel-gfx; +Cc: Daniel Vetter

In the reset_counter, we use two bits to track a GPU hang and reset. The
low bit is a "reset-in-progress" flag that we set to signal when we need
to break waiters in order for the recovery task to grab the mutex. As
soon as the recovery task has the mutex, we can clear that flag (which
we do by incrementing the reset_counter, thereby incrementing the global
reset epoch). By clearing that flag when the recovery task holds the
struct_mutex, we can forgo a second flag that simply tells GEM to ignore
the "reset-in-progress" flag.

The second flag we store in the reset_counter is whether the
reset failed and we consider the GPU terminally wedged. Whilst this flag
is set, all access to the GPU (at least through GEM rather than direct mmio
access) is verboten.
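
A sketch of the resulting counter encoding and the recovery-side
transitions (consistent with i915_reset_count() in the diff below):

	/* reset_counter encoding:
	 *   low bit set (odd)  - reset in progress; break the waiters
	 *   even value         - stable epoch; resets seen = (v + 1) / 2
	 *   I915_WEDGED        - recovery failed; terminally wedged
	 */
	atomic_andnot(I915_WEDGED, &error->reset_counter); /* retry */
	atomic_inc(&error->reset_counter); /* odd -> even: new epoch */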

PS: Fun is in store, as in the future we want to move from a global
reset epoch to per-engine resets with request recovery.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_debugfs.c |  4 ++--
 drivers/gpu/drm/i915/i915_drv.c     | 39 ++++++++++++++++++++++---------------
 drivers/gpu/drm/i915/i915_drv.h     |  3 ---
 drivers/gpu/drm/i915/i915_gem.c     | 27 +++++++++----------------
 drivers/gpu/drm/i915/i915_irq.c     | 21 ++------------------
 5 files changed, 36 insertions(+), 58 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 932af05b8eec..6ff2d23faaa7 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -4696,7 +4696,7 @@ i915_wedged_get(void *data, u64 *val)
 	struct drm_device *dev = data;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
-	*val = i915_reset_counter(&dev_priv->gpu_error);
+	*val = i915_terminally_wedged(&dev_priv->gpu_error);
 
 	return 0;
 }
@@ -4715,7 +4715,7 @@ i915_wedged_set(void *data, u64 val)
 	 * while it is writing to 'i915_wedged'
 	 */
 
-	if (i915_reset_in_progress_or_wedged(&dev_priv->gpu_error))
+	if (i915_reset_in_progress(&dev_priv->gpu_error))
 		return -EAGAIN;
 
 	intel_runtime_pm_get(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 288fec7691dc..2f03379cdb4b 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -873,23 +873,32 @@ int i915_resume_switcheroo(struct drm_device *dev)
 int i915_reset(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	bool simulated;
+	struct i915_gpu_error *error = &dev_priv->gpu_error;
+	unsigned reset_counter;
 	int ret;
 
 	intel_reset_gt_powersave(dev);
 
 	mutex_lock(&dev->struct_mutex);
 
-	i915_gem_reset(dev);
+	/* Clear any previous failed attempts at recovery. Time to try again. */
+	atomic_andnot(I915_WEDGED, &error->reset_counter);
 
-	simulated = dev_priv->gpu_error.stop_rings != 0;
+	/* Clear the reset-in-progress flag and increment the reset epoch. */
+	reset_counter = atomic_inc_return(&error->reset_counter);
+	if (WARN_ON(__i915_reset_in_progress(reset_counter))) {
+		ret = -EIO;
+		goto error;
+	}
+
+	i915_gem_reset(dev);
 
 	ret = intel_gpu_reset(dev);
 
 	/* Also reset the gpu hangman. */
-	if (simulated) {
+	if (error->stop_rings != 0) {
 		DRM_INFO("Simulated gpu hang, resetting stop_rings\n");
-		dev_priv->gpu_error.stop_rings = 0;
+		error->stop_rings = 0;
 		if (ret == -ENODEV) {
 			DRM_INFO("Reset not implemented, but ignoring "
 				 "error for simulated gpu hangs\n");
@@ -902,8 +911,7 @@ int i915_reset(struct drm_device *dev)
 
 	if (ret) {
 		DRM_ERROR("Failed to reset chip: %i\n", ret);
-		mutex_unlock(&dev->struct_mutex);
-		return ret;
+		goto error;
 	}
 
 	intel_overlay_reset(dev_priv);
@@ -922,20 +930,14 @@ int i915_reset(struct drm_device *dev)
 	 * was running at the time of the reset (i.e. we weren't VT
 	 * switched away).
 	 */
-
-	/* Used to prevent gem_check_wedged returning -EAGAIN during gpu reset */
-	dev_priv->gpu_error.reload_in_reset = true;
-
 	ret = i915_gem_init_hw(dev);
-
-	dev_priv->gpu_error.reload_in_reset = false;
-
-	mutex_unlock(&dev->struct_mutex);
 	if (ret) {
 		DRM_ERROR("Failed hw init on reset %d\n", ret);
-		return ret;
+		goto error;
 	}
 
+	mutex_unlock(&dev->struct_mutex);
+
 	/*
 	 * rps/rc6 re-init is necessary to restore state lost after the
 	 * reset and the re-install of gt irqs. Skip for ironlake per
@@ -946,6 +948,11 @@ int i915_reset(struct drm_device *dev)
 		intel_enable_gt_powersave(dev);
 
 	return 0;
+
+error:
+	atomic_or(I915_WEDGED, &error->reset_counter);
+	mutex_unlock(&dev->struct_mutex);
+	return ret;
 }
 
 static int i915_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index b274237726de..60531df3844c 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1381,9 +1381,6 @@ struct i915_gpu_error {
 
 	/* For missed irq/seqno simulation. */
 	unsigned int test_irq_rings;
-
-	/* Used to prevent gem_check_wedged returning -EAGAIN during gpu reset   */
-	bool reload_in_reset;
 };
 
 enum modeset_restore {
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 78bf980a69bf..2cdd20b3aeaf 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -83,9 +83,7 @@ i915_gem_wait_for_error(struct i915_gpu_error *error)
 {
 	int ret;
 
-#define EXIT_COND (!i915_reset_in_progress_or_wedged(error) || \
-		   i915_terminally_wedged(error))
-	if (EXIT_COND)
+	if (!i915_reset_in_progress(error))
 		return 0;
 
 	/*
@@ -94,17 +92,16 @@ i915_gem_wait_for_error(struct i915_gpu_error *error)
 	 * we should simply try to bail out and fail as gracefully as possible.
 	 */
 	ret = wait_event_interruptible_timeout(error->reset_queue,
-					       EXIT_COND,
+					       !i915_reset_in_progress(error),
 					       10*HZ);
 	if (ret == 0) {
 		DRM_ERROR("Timed out waiting for the gpu reset to complete\n");
 		return -EIO;
 	} else if (ret < 0) {
 		return ret;
+	} else {
+		return 0;
 	}
-#undef EXIT_COND
-
-	return 0;
 }
 
 int i915_mutex_lock_interruptible(struct drm_device *dev)
@@ -1112,22 +1109,16 @@ i915_gem_check_wedge(struct i915_gpu_error *error,
 		     bool interruptible)
 {
 	if (i915_reset_in_progress_or_wedged(error)) {
+		/* Recovery complete, but the reset failed ... */
+		if (i915_terminally_wedged(error))
+			return -EIO;
+
 		/* Non-interruptible callers can't handle -EAGAIN, hence return
 		 * -EIO unconditionally for these. */
 		if (!interruptible)
 			return -EIO;
 
-		/* Recovery complete, but the reset failed ... */
-		if (i915_terminally_wedged(error))
-			return -EIO;
-
-		/*
-		 * Check if GPU Reset is in progress - we need intel_ring_begin
-		 * to work properly to reinit the hw state while the gpu is
-		 * still marked as reset-in-progress. Handle this with a flag.
-		 */
-		if (!error->reload_in_reset)
-			return -EAGAIN;
+		return -EAGAIN;
 	}
 
 	return 0;
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 9a6b0ac54d01..15973e917566 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2466,7 +2466,6 @@ static void i915_error_wake_up(struct drm_i915_private *dev_priv,
 static void i915_reset_and_wakeup(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = to_i915(dev);
-	struct i915_gpu_error *error = &dev_priv->gpu_error;
 	char *error_event[] = { I915_ERROR_UEVENT "=1", NULL };
 	char *reset_event[] = { I915_RESET_UEVENT "=1", NULL };
 	char *reset_done_event[] = { I915_ERROR_UEVENT "=0", NULL };
@@ -2484,7 +2483,7 @@ static void i915_reset_and_wakeup(struct drm_device *dev)
 	 * the reset in-progress bit is only ever set by code outside of this
 	 * work we don't need to worry about any other races.
 	 */
-	if (i915_reset_in_progress_or_wedged(error) && !i915_terminally_wedged(error)) {
+	if (i915_reset_in_progress(&dev_priv->gpu_error)) {
 		DRM_DEBUG_DRIVER("resetting chip\n");
 		kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE,
 				   reset_event);
@@ -2512,25 +2511,9 @@ static void i915_reset_and_wakeup(struct drm_device *dev)
 
 		intel_runtime_pm_put(dev_priv);
 
-		if (ret == 0) {
-			/*
-			 * After all the gem state is reset, increment the reset
-			 * counter and wake up everyone waiting for the reset to
-			 * complete.
-			 *
-			 * Since unlock operations are a one-sided barrier only,
-			 * we need to insert a barrier here to order any seqno
-			 * updates before
-			 * the counter increment.
-			 */
-			smp_mb__before_atomic();
-			atomic_inc(&dev_priv->gpu_error.reset_counter);
-
+		if (ret == 0)
 			kobject_uevent_env(&dev->primary->kdev->kobj,
 					   KOBJ_CHANGE, reset_done_event);
-		} else {
-			atomic_or(I915_WEDGED, &error->reset_counter);
-		}
 
 		/*
 		 * Note: The wake_up also serves as a memory barrier so that
-- 
2.7.0.rc3


* [PATCH 010/190] drm/i915: Store the reset counter when constructing a request
From: Chris Wilson @ 2016-01-11  9:16 UTC
  To: intel-gfx; +Cc: Daniel Vetter

As the request is only valid during the same global reset epoch, we can
record the current reset_counter when constructing the request and reuse
it when waiting upon that request in future. This removes a very hairy
atomic check serialised by the struct_mutex at the time of waiting and
allows us to transfer those waits to a central dispatcher for all
waiters and all requests.
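
The scheme in miniature (a sketch; both halves appear in the diff
below):

	/* at construction, under a known-good reset epoch */
	req->reset_counter = i915_reset_counter(&dev_priv->gpu_error);

	/* at wait time, no locked re-check is needed */
	if (req->reset_counter != i915_reset_counter(&dev_priv->gpu_error))
		return -EAGAIN; /* a reset happened since creation */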

PS: With per-engine resets, we obviously cannot assume a global reset
epoch for the requests - a per-engine epoch makes the most sense. The
challenge then is how to handle checking in the waiter for when to break
the wait, as the fine-grained reset may also want to requeue the
request (i.e. the assumption that the request is completed just because
the epoch changed may be broken - or we simply avoid breaking that
assumption with the fine-grained resets).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_drv.h         |  2 +-
 drivers/gpu/drm/i915/i915_gem.c         | 40 +++++++++++----------------------
 drivers/gpu/drm/i915/intel_display.c    |  7 +-----
 drivers/gpu/drm/i915/intel_lrc.c        |  7 ------
 drivers/gpu/drm/i915/intel_ringbuffer.c |  6 -----
 5 files changed, 15 insertions(+), 47 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 60531df3844c..f74bca326b79 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2191,6 +2191,7 @@ struct drm_i915_gem_request {
 	/** On Which ring this request was generated */
 	struct drm_i915_private *i915;
 	struct intel_engine_cs *ring;
+	unsigned reset_counter;
 
 	 /** GEM sequence number associated with the previous request,
 	  * when the HWS breadcrumb is equal to this the GPU is processing
@@ -3050,7 +3051,6 @@ void __i915_add_request(struct drm_i915_gem_request *req,
 #define i915_add_request_no_flush(req) \
 	__i915_add_request(req, NULL, false)
 int __i915_wait_request(struct drm_i915_gem_request *req,
-			unsigned reset_counter,
 			bool interruptible,
 			s64 *timeout,
 			struct intel_rps_client *rps);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 2cdd20b3aeaf..56069bdada85 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1212,7 +1212,6 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
 /**
  * __i915_wait_request - wait until execution of request has finished
  * @req: duh!
- * @reset_counter: reset sequence associated with the given request
  * @interruptible: do an interruptible wait (normally yes)
  * @timeout: in - how long to wait (NULL forever); out - how much time remaining
  *
@@ -1227,7 +1226,6 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
  * errno with remaining time filled in timeout argument.
  */
 int __i915_wait_request(struct drm_i915_gem_request *req,
-			unsigned reset_counter,
 			bool interruptible,
 			s64 *timeout,
 			struct intel_rps_client *rps)
@@ -1286,7 +1284,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 
 		/* We need to check whether any gpu reset happened in between
 		 * the caller grabbing the seqno and now ... */
-		if (reset_counter != i915_reset_counter(&dev_priv->gpu_error)) {
+		if (req->reset_counter != i915_reset_counter(&dev_priv->gpu_error)) {
 			/* ... but upgrade the -EAGAIN to an -EIO if the gpu
 			 * is truely gone. */
 			ret = i915_gem_check_wedge(&dev_priv->gpu_error, interruptible);
@@ -1459,13 +1457,7 @@ i915_wait_request(struct drm_i915_gem_request *req)
 
 	BUG_ON(!mutex_is_locked(&dev->struct_mutex));
 
-	ret = i915_gem_check_wedge(&dev_priv->gpu_error, interruptible);
-	if (ret)
-		return ret;
-
-	ret = __i915_wait_request(req,
-				  i915_reset_counter(&dev_priv->gpu_error),
-				  interruptible, NULL, NULL);
+	ret = __i915_wait_request(req, interruptible, NULL, NULL);
 	if (ret)
 		return ret;
 
@@ -1540,7 +1532,6 @@ i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
 	struct drm_device *dev = obj->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_i915_gem_request *requests[I915_NUM_RINGS];
-	unsigned reset_counter;
 	int ret, i, n = 0;
 
 	BUG_ON(!mutex_is_locked(&dev->struct_mutex));
@@ -1549,12 +1540,6 @@ i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
 	if (!obj->active)
 		return 0;
 
-	ret = i915_gem_check_wedge(&dev_priv->gpu_error, true);
-	if (ret)
-		return ret;
-
-	reset_counter = i915_reset_counter(&dev_priv->gpu_error);
-
 	if (readonly) {
 		struct drm_i915_gem_request *req;
 
@@ -1576,9 +1561,9 @@ i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
 	}
 
 	mutex_unlock(&dev->struct_mutex);
+	ret = 0;
 	for (i = 0; ret == 0 && i < n; i++)
-		ret = __i915_wait_request(requests[i], reset_counter, true,
-					  NULL, rps);
+		ret = __i915_wait_request(requests[i], true, NULL, rps);
 	mutex_lock(&dev->struct_mutex);
 
 	for (i = 0; i < n; i++) {
@@ -2692,6 +2677,7 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 			   struct drm_i915_gem_request **req_out)
 {
 	struct drm_i915_private *dev_priv = to_i915(ring->dev);
+	unsigned reset_counter = i915_reset_counter(&dev_priv->gpu_error);
 	struct drm_i915_gem_request *req;
 	int ret;
 
@@ -2700,6 +2686,11 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 
 	*req_out = NULL;
 
+	ret = i915_gem_check_wedge(&dev_priv->gpu_error,
+				   dev_priv->mm.interruptible);
+	if (ret)
+		return ret;
+
 	req = kmem_cache_zalloc(dev_priv->requests, GFP_KERNEL);
 	if (req == NULL)
 		return -ENOMEM;
@@ -2711,6 +2702,7 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 	kref_init(&req->ref);
 	req->i915 = dev_priv;
 	req->ring = ring;
+	req->reset_counter = reset_counter;
 	req->ctx  = ctx;
 	i915_gem_context_reference(req->ctx);
 
@@ -3068,11 +3060,9 @@ retire:
 int
 i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 {
-	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_i915_gem_wait *args = data;
 	struct drm_i915_gem_object *obj;
 	struct drm_i915_gem_request *req[I915_NUM_RINGS];
-	unsigned reset_counter;
 	int i, n = 0;
 	int ret;
 
@@ -3106,7 +3096,6 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	}
 
 	drm_gem_object_unreference(&obj->base);
-	reset_counter = i915_reset_counter(&dev_priv->gpu_error);
 
 	for (i = 0; i < I915_NUM_RINGS; i++) {
 		if (obj->last_read_req[i] == NULL)
@@ -3119,7 +3108,7 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 
 	for (i = 0; i < n; i++) {
 		if (ret == 0)
-			ret = __i915_wait_request(req[i], reset_counter, true,
+			ret = __i915_wait_request(req[i], true,
 						  args->timeout_ns > 0 ? &args->timeout_ns : NULL,
 						  to_rps_client(file));
 		i915_gem_request_unreference__unlocked(req[i]);
@@ -3151,7 +3140,6 @@ __i915_gem_object_sync(struct drm_i915_gem_object *obj,
 	if (!i915_semaphore_is_enabled(obj->base.dev)) {
 		struct drm_i915_private *i915 = to_i915(obj->base.dev);
 		ret = __i915_wait_request(from_req,
-					  i915_reset_counter(&i915->gpu_error),
 					  i915->mm.interruptible,
 					  NULL,
 					  &i915->rps.semaphores);
@@ -4094,7 +4082,6 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
 	struct drm_i915_file_private *file_priv = file->driver_priv;
 	unsigned long recent_enough = jiffies - DRM_I915_THROTTLE_JIFFIES;
 	struct drm_i915_gem_request *request, *target = NULL;
-	unsigned reset_counter;
 	int ret;
 
 	ret = i915_gem_wait_for_error(&dev_priv->gpu_error);
@@ -4119,7 +4106,6 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
 
 		target = request;
 	}
-	reset_counter = i915_reset_counter(&dev_priv->gpu_error);
 	if (target)
 		i915_gem_request_reference(target);
 	spin_unlock(&file_priv->mm.lock);
@@ -4127,7 +4113,7 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
 	if (target == NULL)
 		return 0;
 
-	ret = __i915_wait_request(target, reset_counter, true, NULL, NULL);
+	ret = __i915_wait_request(target, true, NULL, NULL);
 	if (ret == 0)
 		queue_delayed_work(dev_priv->wq, &dev_priv->mm.retire_work, 0);
 
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 183c05bdb220..4f36313f31ac 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -11458,7 +11458,6 @@ static void intel_mmio_flip_work_func(struct work_struct *work)
 
 	if (mmio_flip->req) {
 		WARN_ON(__i915_wait_request(mmio_flip->req,
-					    mmio_flip->crtc->reset_counter,
 					    false, NULL,
 					    &mmio_flip->i915->rps.mmioflips));
 		i915_gem_request_unreference__unlocked(mmio_flip->req);
@@ -13506,9 +13505,6 @@ static int intel_atomic_prepare_commit(struct drm_device *dev,
 
 	ret = drm_atomic_helper_prepare_planes(dev, state);
 	if (!ret && !async && !i915_reset_in_progress_or_wedged(&dev_priv->gpu_error)) {
-		u32 reset_counter;
-
-		reset_counter = i915_reset_counter(&dev_priv->gpu_error);
 		mutex_unlock(&dev->struct_mutex);
 
 		for_each_plane_in_state(state, plane, plane_state, i) {
@@ -13519,8 +13515,7 @@ static int intel_atomic_prepare_commit(struct drm_device *dev,
 				continue;
 
 			ret = __i915_wait_request(intel_plane_state->wait_req,
-						  reset_counter, true,
-						  NULL, NULL);
+						  true, NULL, NULL);
 
 			/* Swallow -EIO errors to allow updates during hw lockup. */
 			if (ret == -EIO)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 254ce14d790b..3b436eb86ac7 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -848,16 +848,9 @@ static int logical_ring_prepare(struct drm_i915_gem_request *req, int bytes)
  */
 int intel_logical_ring_begin(struct drm_i915_gem_request *req, int num_dwords)
 {
-	struct drm_i915_private *dev_priv;
 	int ret;
 
 	WARN_ON(req == NULL);
-	dev_priv = req->ring->dev->dev_private;
-
-	ret = i915_gem_check_wedge(&dev_priv->gpu_error,
-				   dev_priv->mm.interruptible);
-	if (ret)
-		return ret;
 
 	ret = logical_ring_prepare(req, num_dwords * sizeof(uint32_t));
 	if (ret)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 8c6b15ab652b..15121f3fd4f7 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2274,7 +2274,6 @@ int intel_ring_idle(struct intel_engine_cs *ring)
 
 	/* Make sure we do not trigger any retires */
 	return __i915_wait_request(req,
-				   i915_reset_counter(&to_i915(ring->dev)->gpu_error),
 				   to_i915(ring->dev)->mm.interruptible,
 				   NULL, NULL);
 }
@@ -2405,11 +2404,6 @@ int intel_ring_begin(struct drm_i915_gem_request *req,
 	ring = req->ring;
 	dev_priv = ring->dev->dev_private;
 
-	ret = i915_gem_check_wedge(&dev_priv->gpu_error,
-				   dev_priv->mm.interruptible);
-	if (ret)
-		return ret;
-
 	ret = __intel_ring_prepare(ring, num_dwords * sizeof(uint32_t));
 	if (ret)
 		return ret;
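
In short (condensed from the hunks above, using the same names): the
reset counter is now sampled once when the request is constructed, and
the wait path reads it back from the request instead of each caller
passing it in:

	/* i915_gem_request_alloc() */
	unsigned reset_counter = i915_reset_counter(&dev_priv->gpu_error);
	...
	req->reset_counter = reset_counter;

	/* callers of the wait no longer supply a counter */
	ret = __i915_wait_request(req, interruptible, NULL, NULL);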
-- 
2.7.0.rc3

* [PATCH 011/190] drm/i915: Simplify reset_counter handling during atomic modesetting
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (8 preceding siblings ...)
  2016-01-11  9:16 ` [PATCH 010/190] drm/i915: Store the reset counter when constructing a request Chris Wilson
@ 2016-01-11  9:16 ` Chris Wilson
  2016-01-11  9:16 ` [PATCH 012/190] drm/i915: Prevent leaking of -EIO from i915_wait_request() Chris Wilson
                   ` (76 subsequent siblings)
  86 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:16 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter

Now that the reset_counter is stored on the request, we can rearrange
the code that handles reading the counter versus waiting during the atomic
modesetting, for readability (by deleting the hairiest of codes).
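
The resulting shape, condensed from the diff below (error paths
elided), is just:

	ret = drm_atomic_helper_prepare_planes(dev, state);
	mutex_unlock(&dev->struct_mutex);

	if (!ret && !async) {
		for_each_plane_in_state(state, plane, plane_state, i) {
			...
			ret = __i915_wait_request(intel_plane_state->wait_req,
						  true, NULL, NULL);
			...
		}
	}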

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/intel_display.c | 18 +++++++-----------
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 4f36313f31ac..ee0ec72b16b4 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -13504,9 +13504,9 @@ static int intel_atomic_prepare_commit(struct drm_device *dev,
 		return ret;
 
 	ret = drm_atomic_helper_prepare_planes(dev, state);
-	if (!ret && !async && !i915_reset_in_progress_or_wedged(&dev_priv->gpu_error)) {
-		mutex_unlock(&dev->struct_mutex);
+	mutex_unlock(&dev->struct_mutex);
 
+	if (!ret && !async) {
 		for_each_plane_in_state(state, plane, plane_state, i) {
 			struct intel_plane_state *intel_plane_state =
 				to_intel_plane_state(plane_state);
@@ -13520,19 +13520,15 @@ static int intel_atomic_prepare_commit(struct drm_device *dev,
 			/* Swallow -EIO errors to allow updates during hw lockup. */
 			if (ret == -EIO)
 				ret = 0;
-
-			if (ret)
+			if (ret) {
+				mutex_lock(&dev->struct_mutex);
+				drm_atomic_helper_cleanup_planes(dev, state);
+				mutex_unlock(&dev->struct_mutex);
 				break;
+			}
 		}
-
-		if (!ret)
-			return 0;
-
-		mutex_lock(&dev->struct_mutex);
-		drm_atomic_helper_cleanup_planes(dev, state);
 	}
 
-	mutex_unlock(&dev->struct_mutex);
 	return ret;
 }
 
-- 
2.7.0.rc3

* [PATCH 012/190] drm/i915: Prevent leaking of -EIO from i915_wait_request()
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (9 preceding siblings ...)
  2016-01-11  9:16 ` [PATCH 011/190] drm/i915: Simplify reset_counter handling during atomic modesetting Chris Wilson
@ 2016-01-11  9:16 ` Chris Wilson
  2016-01-11  9:16 ` [PATCH 013/190] drm/i915: Suppress error message when GPU resets are disabled Chris Wilson
                   ` (75 subsequent siblings)
  86 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:16 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter

Reporting -EIO from i915_wait_request() has proven very troublesome
over the years, with numerous hard-to-reproduce bugs cropping up in the
corner case of where a reset occurs and the code wasn't expecting such
an error.

If we reset the GPU, or have detected a hang and wish to reset the
GPU, the request is forcibly completed and the wait broken. Currently, we
report either -EAGAIN or -EIO in order for the caller to retreat and
restart the wait (if appropriate) after dropping and then reacquiring
the struct_mutex (essential to allow the GPU reset to proceed). However,
if we take the view that the request is complete (no further work will
be done on it by the GPU because it is dead and soon to be reset), then
we can proceed with the task at hand and then drop the struct_mutex
allowing the reset to occur. This transfers the burden of checking
whether it is safe to proceed to the caller; in all but one instance
it is safe to do so - completely eliminating the source of all spurious
-EIO.

Of note, we only have two API entry points where we expect that
userspace can observe an EIO. First is when submitting an execbuf, if
the GPU is terminally wedged, then the operation cannot succeed and an
-EIO is reported. Secondly, existing userspace uses the throttle ioctl
to detect an already wedged GPU before starting to use HW acceleration
(or to confirm that the GPU is wedged after an error condition). So if
the GPU is wedged when the user calls throttle, also report -EIO.
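
With the reset_counter stored in the request, the wedge check reduces
to a pure function of that sampled value (as in the i915_gem.c hunk
below):

	static int
	i915_gem_check_wedge(unsigned reset_counter, bool interruptible)
	{
		if (__i915_terminally_wedged(reset_counter))
			return -EIO;

		if (__i915_reset_in_progress(reset_counter)) {
			/* Non-interruptible callers can't handle -EAGAIN */
			if (!interruptible)
				return -EIO;
			return -EAGAIN;
		}

		return 0;
	}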

v2: Split more carefully the change to i915_wait_request() and assorted
ABI from the reset handling.
v3: Add a couple of WARN_ON(EIO) to the interruptible modesetting code
so that we don't start to leak EIO there in future (and break our hang
resistant modesetting).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_drv.h         |  2 --
 drivers/gpu/drm/i915/i915_gem.c         | 44 ++++++++++++++++-----------------
 drivers/gpu/drm/i915/i915_gem_userptr.c |  6 ++---
 drivers/gpu/drm/i915/intel_display.c    | 13 +++++-----
 drivers/gpu/drm/i915/intel_lrc.c        |  2 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c |  2 +-
 6 files changed, 32 insertions(+), 37 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index f74bca326b79..bbdb056d2a8e 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2978,8 +2978,6 @@ i915_gem_find_active_request(struct intel_engine_cs *ring);
 
 bool i915_gem_retire_requests(struct drm_device *dev);
 void i915_gem_retire_requests_ring(struct intel_engine_cs *ring);
-int __must_check i915_gem_check_wedge(struct i915_gpu_error *error,
-				      bool interruptible);
 
 static inline u32 i915_reset_counter(struct i915_gpu_error *error)
 {
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 56069bdada85..f570990f03e0 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -206,11 +206,10 @@ i915_gem_object_put_pages_phys(struct drm_i915_gem_object *obj)
 	BUG_ON(obj->madv == __I915_MADV_PURGED);
 
 	ret = i915_gem_object_set_to_cpu_domain(obj, true);
-	if (ret) {
+	if (WARN_ON(ret)) {
 		/* In the event of a disaster, abandon all caches and
 		 * hope for the best.
 		 */
-		WARN_ON(ret != -EIO);
 		obj->base.read_domains = obj->base.write_domain = I915_GEM_DOMAIN_CPU;
 	}
 
@@ -1104,15 +1103,13 @@ put_rpm:
 	return ret;
 }
 
-int
-i915_gem_check_wedge(struct i915_gpu_error *error,
-		     bool interruptible)
+static int
+i915_gem_check_wedge(unsigned reset_counter, bool interruptible)
 {
-	if (i915_reset_in_progress_or_wedged(error)) {
-		/* Recovery complete, but the reset failed ... */
-		if (i915_terminally_wedged(error))
-			return -EIO;
+	if (__i915_terminally_wedged(reset_counter))
+		return -EIO;
 
+	if (__i915_reset_in_progress(reset_counter)) {
 		/* Non-interruptible callers can't handle -EAGAIN, hence return
 		 * -EIO unconditionally for these. */
 		if (!interruptible)
@@ -1283,13 +1280,14 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 		prepare_to_wait(&ring->irq_queue, &wait, state);
 
 		/* We need to check whether any gpu reset happened in between
-		 * the caller grabbing the seqno and now ... */
+		 * the request being submitted and now. If a reset has occurred,
+		 * the request is effectively complete (we either are in the
+		 * process of or have discarded the rendering and completely
+		 * reset the GPU. The results of the request are lost and we
+		 * are free to continue on with the original operation.
+		 */
 		if (req->reset_counter != i915_reset_counter(&dev_priv->gpu_error)) {
-			/* ... but upgrade the -EAGAIN to an -EIO if the gpu
-			 * is truely gone. */
-			ret = i915_gem_check_wedge(&dev_priv->gpu_error, interruptible);
-			if (ret == 0)
-				ret = -EAGAIN;
+			ret = 0;
 			break;
 		}
 
@@ -2162,11 +2160,10 @@ i915_gem_object_put_pages_gtt(struct drm_i915_gem_object *obj)
 	BUG_ON(obj->madv == __I915_MADV_PURGED);
 
 	ret = i915_gem_object_set_to_cpu_domain(obj, true);
-	if (ret) {
+	if (WARN_ON(ret)) {
 		/* In the event of a disaster, abandon all caches and
 		 * hope for the best.
 		 */
-		WARN_ON(ret != -EIO);
 		i915_gem_clflush_object(obj, true);
 		obj->base.read_domains = obj->base.write_domain = I915_GEM_DOMAIN_CPU;
 	}
@@ -2686,8 +2683,11 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 
 	*req_out = NULL;
 
-	ret = i915_gem_check_wedge(&dev_priv->gpu_error,
-				   dev_priv->mm.interruptible);
+	/* ABI: Before userspace accesses the GPU (e.g. execbuffer), report
+	 * EIO if the GPU is already wedged, or EAGAIN to drop the struct_mutex
+	 * and restart.
+	 */
+	ret = i915_gem_check_wedge(reset_counter, dev_priv->mm.interruptible);
 	if (ret)
 		return ret;
 
@@ -4088,9 +4088,9 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
 	if (ret)
 		return ret;
 
-	ret = i915_gem_check_wedge(&dev_priv->gpu_error, false);
-	if (ret)
-		return ret;
+	/* ABI: return -EIO if already wedged */
+	if (i915_terminally_wedged(&dev_priv->gpu_error))
+		return -EIO;
 
 	spin_lock(&file_priv->mm.lock);
 	list_for_each_entry(request, &file_priv->mm.request_list, client_list) {
diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
index 19fb0bddc1cd..1a5f89dba4af 100644
--- a/drivers/gpu/drm/i915/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
@@ -81,10 +81,8 @@ static void __cancel_userptr__worker(struct work_struct *work)
 		was_interruptible = dev_priv->mm.interruptible;
 		dev_priv->mm.interruptible = false;
 
-		list_for_each_entry_safe(vma, tmp, &obj->vma_list, vma_link) {
-			int ret = i915_vma_unbind(vma);
-			WARN_ON(ret && ret != -EIO);
-		}
+		list_for_each_entry_safe(vma, tmp, &obj->vma_list, vma_link)
+			WARN_ON(i915_vma_unbind(vma));
 		WARN_ON(i915_gem_object_put_pages(obj));
 
 		dev_priv->mm.interruptible = was_interruptible;
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index ee0ec72b16b4..7e36f85d3109 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -13516,11 +13516,9 @@ static int intel_atomic_prepare_commit(struct drm_device *dev,
 
 			ret = __i915_wait_request(intel_plane_state->wait_req,
 						  true, NULL, NULL);
-
-			/* Swallow -EIO errors to allow updates during hw lockup. */
-			if (ret == -EIO)
-				ret = 0;
 			if (ret) {
+				/* Any hang should be swallowed by the wait */
+				WARN_ON(ret == -EIO);
 				mutex_lock(&dev->struct_mutex);
 				drm_atomic_helper_cleanup_planes(dev, state);
 				mutex_unlock(&dev->struct_mutex);
@@ -13889,10 +13887,11 @@ intel_prepare_plane_fb(struct drm_plane *plane,
 		 */
 		if (needs_modeset(crtc_state))
 			ret = i915_gem_object_wait_rendering(old_obj, true);
-
-		/* Swallow -EIO errors to allow updates during hw lockup. */
-		if (ret && ret != -EIO)
+		if (ret) {
+			/* GPU hangs should have been swallowed by the wait */
+			WARN_ON(ret == -EIO);
 			return ret;
+		}
 	}
 
 	/* For framebuffer backed by dmabuf, wait for fence */
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 3b436eb86ac7..32644338e6f8 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1004,7 +1004,7 @@ void intel_logical_ring_stop(struct intel_engine_cs *ring)
 		return;
 
 	ret = intel_ring_idle(ring);
-	if (ret && !i915_reset_in_progress_or_wedged(&to_i915(ring->dev)->gpu_error))
+	if (ret)
 		DRM_ERROR("failed to quiesce %s whilst cleaning up: %d\n",
 			  ring->name, ret);
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 15121f3fd4f7..99780b674311 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -3062,7 +3062,7 @@ intel_stop_ring_buffer(struct intel_engine_cs *ring)
 		return;
 
 	ret = intel_ring_idle(ring);
-	if (ret && !i915_reset_in_progress_or_wedged(&to_i915(ring->dev)->gpu_error))
+	if (ret)
 		DRM_ERROR("failed to quiesce %s whilst cleaning up: %d\n",
 			  ring->name, ret);
 
-- 
2.7.0.rc3

* [PATCH 013/190] drm/i915: Suppress error message when GPU resets are disabled
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (10 preceding siblings ...)
  2016-01-11  9:16 ` [PATCH 012/190] drm/i915: Prevent leaking of -EIO from i915_wait_request() Chris Wilson
@ 2016-01-11  9:16 ` Chris Wilson
  2016-01-11  9:16 ` [PATCH 014/190] drm/i915: Delay queuing hangcheck to wait-request Chris Wilson
                   ` (74 subsequent siblings)
  86 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:16 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter

If we do not have low-level support for resetting the GPU, or if the user
has explicitly disabled resetting the device, the failure is expected.
Since it is an expected failure, we should be using a lower priority
message than *ERROR*, perhaps NOTICE. In the absence of DRM_NOTICE, just
emit the expected failure as a DEBUG message.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_drv.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 2f03379cdb4b..5160f1414de4 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -910,7 +910,10 @@ int i915_reset(struct drm_device *dev)
 		pr_notice("drm/i915: Resetting chip after gpu hang\n");
 
 	if (ret) {
-		DRM_ERROR("Failed to reset chip: %i\n", ret);
+		if (ret != -ENODEV)
+			DRM_ERROR("Failed to reset chip: %i\n", ret);
+		else
+			DRM_DEBUG_DRIVER("GPU reset disabled\n");
 		goto error;
 	}
 
-- 
2.7.0.rc3

* [PATCH 014/190] drm/i915: Delay queuing hangcheck to wait-request
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (11 preceding siblings ...)
  2016-01-11  9:16 ` [PATCH 013/190] drm/i915: Suppress error message when GPU resets are disabled Chris Wilson
@ 2016-01-11  9:16 ` Chris Wilson
  2016-01-11  9:16 ` [PATCH 015/190] drm/i915: Remove the dedicated hangcheck workqueue Chris Wilson
                   ` (73 subsequent siblings)
  86 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:16 UTC (permalink / raw)
  To: intel-gfx

We can forgo queuing the hangcheck from the start of every request
until we wait upon a request. This reduces the overhead of every
request, but may increase the latency of detecting a hang. However, if
nothing ever waits upon a hang, did it ever hang? It also improves the
robustness of the wait-request by ensuring that the hangchecker is
indeed running before we sleep indefinitely (and thereby ensuring that
we never actually sleep forever waiting for a dead GPU).

v2: Also queue the hangcheck from retire work in case the GPU becomes
stuck when no one is watching.
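
Condensed from the i915_gem.c hunk below, the wait loop now (re)arms
the hangcheck itself before each sleep:

	for (;;) {
		...
		/* Ensure that even if the GPU hangs, we get woken up. */
		i915_queue_hangcheck(dev_priv);

		io_schedule();
		...
	}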

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h |  2 +-
 drivers/gpu/drm/i915/i915_gem.c | 13 ++++++++-----
 drivers/gpu/drm/i915/i915_irq.c |  9 ++++-----
 3 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index bbdb056d2a8e..d9d411919779 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2710,7 +2710,7 @@ void intel_hpd_cancel_work(struct drm_i915_private *dev_priv);
 bool intel_hpd_pin_to_port(enum hpd_pin pin, enum port *port);
 
 /* i915_irq.c */
-void i915_queue_hangcheck(struct drm_device *dev);
+void i915_queue_hangcheck(struct drm_i915_private *dev_priv);
 __printf(3, 4)
 void i915_handle_error(struct drm_device *dev, bool wedged,
 		       const char *fmt, ...);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index f570990f03e0..b4da8b354a3b 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1306,6 +1306,9 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 			break;
 		}
 
+		/* Ensure that even if the GPU hangs, we get woken up. */
+		i915_queue_hangcheck(dev_priv);
+
 		timer.function = NULL;
 		if (timeout || missed_irq(dev_priv, ring)) {
 			unsigned long expire;
@@ -2592,8 +2595,6 @@ void __i915_add_request(struct drm_i915_gem_request *request,
 
 	trace_i915_gem_request_add(request);
 
-	i915_queue_hangcheck(ring->dev);
-
 	queue_delayed_work(dev_priv->wq,
 			   &dev_priv->mm.retire_work,
 			   round_jiffies_up_relative(HZ));
@@ -2947,8 +2948,8 @@ i915_gem_retire_requests(struct drm_device *dev)
 
 	if (idle)
 		mod_delayed_work(dev_priv->wq,
-				   &dev_priv->mm.idle_work,
-				   msecs_to_jiffies(100));
+				 &dev_priv->mm.idle_work,
+				 msecs_to_jiffies(100));
 
 	return idle;
 }
@@ -2967,9 +2968,11 @@ i915_gem_retire_work_handler(struct work_struct *work)
 		idle = i915_gem_retire_requests(dev);
 		mutex_unlock(&dev->struct_mutex);
 	}
-	if (!idle)
+	if (!idle) {
+		i915_queue_hangcheck(dev_priv);
 		queue_delayed_work(dev_priv->wq, &dev_priv->mm.retire_work,
 				   round_jiffies_up_relative(HZ));
+	}
 }
 
 static void
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 15973e917566..94f5f4e99446 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -3165,18 +3165,17 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 		goto out;
 	}
 
+	/* Reset timer in case GPU hangs without another request being added */
 	if (busy_count)
-		/* Reset timer case chip hangs without another request
-		 * being added */
-		i915_queue_hangcheck(dev);
+		i915_queue_hangcheck(dev_priv);
 
 out:
 	ENABLE_RPM_WAKEREF_ASSERTS(dev_priv);
 }
 
-void i915_queue_hangcheck(struct drm_device *dev)
+void i915_queue_hangcheck(struct drm_i915_private *dev_priv)
 {
-	struct i915_gpu_error *e = &to_i915(dev)->gpu_error;
+	struct i915_gpu_error *e = &dev_priv->gpu_error;
 
 	if (!i915.enable_hangcheck)
 		return;
-- 
2.7.0.rc3

* [PATCH 015/190] drm/i915: Remove the dedicated hangcheck workqueue
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (12 preceding siblings ...)
  2016-01-11  9:16 ` [PATCH 014/190] drm/i915: Delay queuing hangcheck to wait-request Chris Wilson
@ 2016-01-11  9:16 ` Chris Wilson
  2016-01-11  9:16 ` [PATCH 016/190] drm/i915: Make queueing the hangcheck work inline Chris Wilson
                   ` (72 subsequent siblings)
  86 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:16 UTC (permalink / raw)
  To: intel-gfx

The queue only ever contains at most one item and has no special flags.
It is just a very simple wrapper around the system-wq - a complication
with no benefits.
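
With at most one pending item and no ordering or reclaim requirements,
the system workqueue suffices; the change boils down to (see the
i915_irq.c hunk below):

	/* before: dedicated ordered workqueue */
	queue_delayed_work(e->hangcheck_wq, &e->hangcheck_work, delay);

	/* after: schedule_delayed_work() uses the system wq */
	schedule_delayed_work(&dev_priv->gpu_error.hangcheck_work, delay);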

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_dma.c | 11 -----------
 drivers/gpu/drm/i915/i915_drv.h |  1 -
 drivers/gpu/drm/i915/i915_irq.c |  6 +++---
 3 files changed, 3 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 44a896ce32e6..9e49e304dd8e 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1016,14 +1016,6 @@ int i915_driver_load(struct drm_device *dev, unsigned long flags)
 		goto out_freewq;
 	}
 
-	dev_priv->gpu_error.hangcheck_wq =
-		alloc_ordered_workqueue("i915-hangcheck", 0);
-	if (dev_priv->gpu_error.hangcheck_wq == NULL) {
-		DRM_ERROR("Failed to create our hangcheck workqueue.\n");
-		ret = -ENOMEM;
-		goto out_freedpwq;
-	}
-
 	intel_irq_init(dev_priv);
 	intel_uncore_sanitize(dev);
 
@@ -1105,8 +1097,6 @@ out_gem_unload:
 	intel_teardown_gmbus(dev);
 	intel_teardown_mchbar(dev);
 	pm_qos_remove_request(&dev_priv->pm_qos);
-	destroy_workqueue(dev_priv->gpu_error.hangcheck_wq);
-out_freedpwq:
 	destroy_workqueue(dev_priv->hotplug.dp_wq);
 out_freewq:
 	destroy_workqueue(dev_priv->wq);
@@ -1209,7 +1199,6 @@ int i915_driver_unload(struct drm_device *dev)
 
 	destroy_workqueue(dev_priv->hotplug.dp_wq);
 	destroy_workqueue(dev_priv->wq);
-	destroy_workqueue(dev_priv->gpu_error.hangcheck_wq);
 	pm_qos_remove_request(&dev_priv->pm_qos);
 
 	i915_global_gtt_cleanup(dev);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index d9d411919779..188bed933f11 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1330,7 +1330,6 @@ struct i915_gpu_error {
 	/* Hang gpu twice in this window and your context gets banned */
 #define DRM_I915_CTX_BAN_PERIOD DIV_ROUND_UP(8*DRM_I915_HANGCHECK_PERIOD, 1000)
 
-	struct workqueue_struct *hangcheck_wq;
 	struct delayed_work hangcheck_work;
 
 	/* For reset and error_state handling. */
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 94f5f4e99446..8939438d747d 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -3175,7 +3175,7 @@ out:
 
 void i915_queue_hangcheck(struct drm_i915_private *dev_priv)
 {
-	struct i915_gpu_error *e = &dev_priv->gpu_error;
+	unsigned long delay;
 
 	if (!i915.enable_hangcheck)
 		return;
@@ -3185,8 +3185,8 @@ void i915_queue_hangcheck(struct drm_i915_private *dev_priv)
 	 * we will ignore a hung ring if a second ring is kept busy.
 	 */
 
-	queue_delayed_work(e->hangcheck_wq, &e->hangcheck_work,
-			   round_jiffies_up_relative(DRM_I915_HANGCHECK_JIFFIES));
+	delay = round_jiffies_up_relative(DRM_I915_HANGCHECK_JIFFIES);
+	schedule_delayed_work(&dev_priv->gpu_error.hangcheck_work, delay);
 }
 
 static void ibx_irq_reset(struct drm_device *dev)
-- 
2.7.0.rc3

* [PATCH 016/190] drm/i915: Make queueing the hangcheck work inline
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (13 preceding siblings ...)
  2016-01-11  9:16 ` [PATCH 015/190] drm/i915: Remove the dedicated hangcheck workqueue Chris Wilson
@ 2016-01-11  9:16 ` Chris Wilson
  2016-01-11  9:16 ` [PATCH 017/190] drm/i915: Remove forcewake dance from seqno/irq barrier on legacy gen6+ Chris Wilson
                   ` (71 subsequent siblings)
  86 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:16 UTC (permalink / raw)
  To: intel-gfx

Since the function is a small wrapper around schedule_delayed_work(),
move it inline to remove the function call overhead for the principal
caller.
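
Note that the inlined copy also marks the disabled case as unlikely(),
so the common path costs only a predicted-not-taken branch at each
call site (see the i915_drv.h hunk below):

	if (unlikely(!i915.enable_hangcheck))
		return;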

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h | 17 ++++++++++++++++-
 drivers/gpu/drm/i915/i915_irq.c | 16 ----------------
 2 files changed, 16 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 188bed933f11..201dd330f66a 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2709,7 +2709,22 @@ void intel_hpd_cancel_work(struct drm_i915_private *dev_priv);
 bool intel_hpd_pin_to_port(enum hpd_pin pin, enum port *port);
 
 /* i915_irq.c */
-void i915_queue_hangcheck(struct drm_i915_private *dev_priv);
+static inline void i915_queue_hangcheck(struct drm_i915_private *dev_priv)
+{
+	unsigned long delay;
+
+	if (unlikely(!i915.enable_hangcheck))
+		return;
+
+	/* Don't continually defer the hangcheck so that it is always run at
+	 * least once after work has been scheduled on any ring. Otherwise,
+	 * we will ignore a hung ring if a second ring is kept busy.
+	 */
+
+	delay = round_jiffies_up_relative(DRM_I915_HANGCHECK_JIFFIES);
+	schedule_delayed_work(&dev_priv->gpu_error.hangcheck_work, delay);
+}
+
 __printf(3, 4)
 void i915_handle_error(struct drm_device *dev, bool wedged,
 		       const char *fmt, ...);
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 8939438d747d..2a8a9694eec5 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -3173,22 +3173,6 @@ out:
 	ENABLE_RPM_WAKEREF_ASSERTS(dev_priv);
 }
 
-void i915_queue_hangcheck(struct drm_i915_private *dev_priv)
-{
-	unsigned long delay;
-
-	if (!i915.enable_hangcheck)
-		return;
-
-	/* Don't continually defer the hangcheck so that it is always run at
-	 * least once after work has been scheduled on any ring. Otherwise,
-	 * we will ignore a hung ring if a second ring is kept busy.
-	 */
-
-	delay = round_jiffies_up_relative(DRM_I915_HANGCHECK_JIFFIES);
-	schedule_delayed_work(&dev_priv->gpu_error.hangcheck_work, delay);
-}
-
 static void ibx_irq_reset(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
-- 
2.7.0.rc3

* [PATCH 017/190] drm/i915: Remove forcewake dance from seqno/irq barrier on legacy gen6+
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (14 preceding siblings ...)
  2016-01-11  9:16 ` [PATCH 016/190] drm/i915: Make queueing the hangcheck work inline Chris Wilson
@ 2016-01-11  9:16 ` Chris Wilson
  2016-01-11 14:02   ` Dave Gordon
  2016-03-24  6:39   ` David Weinehall
  2016-01-11  9:16 ` [PATCH 018/190] drm/i915: Slaughter the thundering i915_wait_request herd Chris Wilson
                   ` (70 subsequent siblings)
  86 siblings, 2 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:16 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter

In order to ensure seqno/irq coherency, we currently read a ring register.
We are not quite sure how it works, only that it does. Experiments show
that e.g. doing a clflush(seqno) instead is not sufficient, but we can
remove the forcewake dance from the mmio access.

v2: Baytrail wants a clflush too.
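
The barrier then becomes a raw mmio read plus an explicit status page
flush, roughly (from the hunk below; POSTING_READ_FW being, if I have
the uncore layering right, the accessor that skips the forcewake
get/put):

	if (!lazy_coherency) {
		struct drm_i915_private *dev_priv = ring->dev->dev_private;
		POSTING_READ_FW(RING_ACTHD(ring->mmio_base));
		intel_flush_status_page(ring, I915_GEM_HWS_INDEX);
	}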

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 99780b674311..a1d43b2c7077 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1490,10 +1490,21 @@ gen6_ring_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency)
 {
 	/* Workaround to force correct ordering between irq and seqno writes on
 	 * ivb (and maybe also on snb) by reading from a CS register (like
-	 * ACTHD) before reading the status page. */
+	 * ACTHD) before reading the status page.
+	 *
+	 * Note that this effectively stalls the read by the time
+	 * it takes to do a memory transaction, which more or less ensures
+	 * that the write from the GPU has sufficient time to invalidate
+	 * the CPU cacheline. Alternatively we could delay the interrupt from
+	 * the CS ring to give the write time to land, but that would incur
+	 * a delay after every batch i.e. much more frequent than a delay
+	 * when waiting for the interrupt (with the same net latency).
+	 */
 	if (!lazy_coherency) {
 		struct drm_i915_private *dev_priv = ring->dev->dev_private;
-		POSTING_READ(RING_ACTHD(ring->mmio_base));
+		POSTING_READ_FW(RING_ACTHD(ring->mmio_base));
+
+		intel_flush_status_page(ring, I915_GEM_HWS_INDEX);
 	}
 
 	return intel_read_status_page(ring, I915_GEM_HWS_INDEX);
-- 
2.7.0.rc3

* [PATCH 018/190] drm/i915: Slaughter the thundering i915_wait_request herd
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (15 preceding siblings ...)
  2016-01-11  9:16 ` [PATCH 017/190] drm/i915: Remove forcewake dance from seqno/irq barrier on legacy gen6+ Chris Wilson
@ 2016-01-11  9:16 ` Chris Wilson
  2016-01-11  9:16 ` [PATCH 019/190] drm/i915: Separate out the seqno-barrier from engine->get_seqno Chris Wilson
                   ` (69 subsequent siblings)
  86 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:16 UTC (permalink / raw)
  To: intel-gfx

One particularly stressful scenario consists of many independent tasks
all competing for GPU time and waiting upon the results (e.g. realtime
transcoding of many, many streams). One bottleneck in particular is that
each client waits on its own results, but every client is woken up after
every batchbuffer - hence the thunder of hooves as then every client must
do its heavyweight dance to read a coherent seqno to see if it is the
lucky one.

Ideally, we only want one client to wake up after the interrupt and
check its request for completion. Since the requests must retire in
order, we can select the first client on the oldest request to be woken.
Once that client has completed his wait, we can then wake up the
next client and so on. However, all clients then incur latency as every
process in the chain may be delayed for scheduling - this may also then
cause some priority inversion. To reduce the latency, when a client
is added or removed from the list, we scan the tree for completed
seqno and wake up all the completed waiters in parallel.

Using igt/benchmarks/gem_latency, we can demonstrate this effect. The
benchmark measures the number of GPU cycles between completion of a
batch and the client waking up from a call to wait-ioctl. With many
concurrent waiters, with each on a different request, we observe that
the wakeup latency before the patch scales nearly linearly with the
number of waiters (before external factors kick in making the scaling much
worse). After applying the patch, we can see that only the single waiter
for the request is being woken up, providing a constant wakeup latency
for every operation. However, the situation is not quite as rosy for
many waiters on the same request, though to the best of my knowledge this
is much less likely in practice. Here, we can observe that the
concurrent waiters incur extra latency from being woken up by the
solitary bottom-half, rather than directly by the interrupt. This
appears to be scheduler induced (having discounted adverse effects from
having a rbtree walk/erase in the wakeup path); each additional
wake_up_process() costs approximately 1us on big core. Another effect of
performing the secondary wakeups from the first bottom-half is the
incurred delay this imposes on high priority threads - rather than
immediately returning to userspace and leaving the interrupt handler to
wake the others.

To offset the delay incurred with additional waiters on a request, we
could use a hybrid scheme that did a quick read in the interrupt handler
and dequeued all the completed waiters (incurring the overhead in the
interrupt handler, not the best plan either as we then incur GPU
submission latency) but we would still have to wake up the bottom-half
every time to do the heavyweight slow read. Or we could only kick the
waiters on the seqno with the same priority as the current task (i.e. in
the realtime waiter scenario, only it is woken up immediately by the
interrupt and simply queues the next waiter before returning to userspace,
minimising its delay at the expense of the chain, and also reducing
contention on its scheduler runqueue). This is effective at avoiding long
pauses in the interrupt handler and at avoiding the extra latency in
realtime/high-priority waiters.
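
The client side then takes roughly this shape (condensed from the
patch below; intel_wait is the per-task rbtree node keyed by seqno,
and the labels are elided):

	struct intel_wait wait;

	intel_wait_init(&wait, req->seqno);
	set_task_state(wait.task, state);

	/* Only the oldest waiter spins and owns the interrupt */
	if (intel_engine_add_wait(req->ring, &wait)) {
		if (__i915_spin_request(req, &wait, state))
			goto complete;

		if (intel_engine_enable_wait_irq(req->ring, &wait))
			goto wakeup;
	}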

v2: Convert from a kworker per engine into a dedicated kthread for the
bottom-half.
v3: Rename request members and tweak comments.
v4: Use a per-engine spinlock in the breadcrumbs bottom-half.
v5: Fix race in locklessly checking waiter status and kicking the task on
adding a new waiter.
v6: Fix deciding when to force the timer to hide missing interrupts.
v7: Move the bottom-half from the kthread to the first client process.
v8: Reword a few comments
v9: Break the busy loop when the interrupt is unmasked or has fired.
v10: Comments, unnecessary churn, better debugging from Tvrtko
v11: Wake all completed waiters on removing the current bottom-half to
reduce the latency of waking up a herd of clients all waiting on the
same request.
v12: Rearrange missed-interrupt fault injection so that it works with
igt/drv_missed_irq_hang
v13: Rename intel_breadcrumb and friends to intel_wait in preparation
for signal handling.
v14: RCU commentary, assert_spin_locked
v15: Hide BUG_ON behind the compiler; report on gem_latency findings.
v16: Sort seqno-groups by priority so that first-waiter has the highest
task priority (and so avoid priority inversion).

Testcase: igt/gem_concurrent_blit
Testcase: igt/benchmarks/gem_latency
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com>
Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
---
 drivers/gpu/drm/i915/Makefile            |   1 +
 drivers/gpu/drm/i915/i915_debugfs.c      |  19 +-
 drivers/gpu/drm/i915/i915_drv.h          |  32 ++-
 drivers/gpu/drm/i915/i915_gem.c          | 141 +++++--------
 drivers/gpu/drm/i915/i915_gpu_error.c    |   2 +-
 drivers/gpu/drm/i915/i915_irq.c          |  20 +-
 drivers/gpu/drm/i915/intel_breadcrumbs.c | 336 +++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.c         |   5 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c  |   5 +-
 drivers/gpu/drm/i915/intel_ringbuffer.h  |  69 ++++++-
 10 files changed, 521 insertions(+), 109 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/intel_breadcrumbs.c

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 1e9895b9a546..99ce591c8574 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -37,6 +37,7 @@ i915-y += i915_cmd_parser.o \
 	  i915_gem_userptr.o \
 	  i915_gpu_error.o \
 	  i915_trace_points.o \
+	  intel_breadcrumbs.o \
 	  intel_lrc.o \
 	  intel_mocs.o \
 	  intel_ringbuffer.o \
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 6ff2d23faaa7..9396597b136d 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -730,10 +730,22 @@ static int i915_gem_request_info(struct seq_file *m, void *data)
 static void i915_ring_seqno_info(struct seq_file *m,
 				 struct intel_engine_cs *ring)
 {
+	struct rb_node *rb;
+
 	if (ring->get_seqno) {
 		seq_printf(m, "Current sequence (%s): %x\n",
 			   ring->name, ring->get_seqno(ring, false));
 	}
+
+	spin_lock(&ring->breadcrumbs.lock);
+	for (rb = rb_first(&ring->breadcrumbs.waiters);
+	     rb != NULL;
+	     rb = rb_next(rb)) {
+		struct intel_wait *w = container_of(rb, typeof(*w), node);
+		seq_printf(m, "Waiting (%s): %s [%d] on %x\n",
+			   ring->name, w->task->comm, w->task->pid, w->seqno);
+	}
+	spin_unlock(&ring->breadcrumbs.lock);
 }
 
 static int i915_gem_seqno_info(struct seq_file *m, void *data)
@@ -1359,8 +1371,9 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
 
 	for_each_ring(ring, dev_priv, i) {
 		seq_printf(m, "%s:\n", ring->name);
-		seq_printf(m, "\tseqno = %x [current %x]\n",
-			   ring->hangcheck.seqno, seqno[i]);
+		seq_printf(m, "\tseqno = %x [current %x], waiters? %d\n",
+			   ring->hangcheck.seqno, seqno[i],
+			   intel_engine_has_waiter(ring));
 		seq_printf(m, "\tACTHD = 0x%08llx [current 0x%08llx]\n",
 			   (long long)ring->hangcheck.acthd,
 			   (long long)acthd[i]);
@@ -2346,7 +2359,7 @@ static int count_irq_waiters(struct drm_i915_private *i915)
 	int i;
 
 	for_each_ring(ring, i915, i)
-		count += ring->irq_refcount;
+		count += intel_engine_has_waiter(ring);
 
 	return count;
 }
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 201dd330f66a..a9e8de57e848 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1379,7 +1379,7 @@ struct i915_gpu_error {
 #define I915_STOP_RING_ALLOW_WARN      (1 << 30)
 
 	/* For missed irq/seqno simulation. */
-	unsigned int test_irq_rings;
+	unsigned long test_irq_rings;
 };
 
 enum modeset_restore {
@@ -2813,7 +2813,6 @@ ibx_disable_display_interrupt(struct drm_i915_private *dev_priv, uint32_t bits)
 	ibx_display_interrupt_update(dev_priv, bits, 0);
 }
 
-
 /* i915_gem.c */
 int i915_gem_create_ioctl(struct drm_device *dev, void *data,
 			  struct drm_file *file_priv);
@@ -3631,4 +3630,33 @@ static inline void i915_trace_irq_get(struct intel_engine_cs *ring,
 		i915_gem_request_assign(&ring->trace_irq_req, req);
 }
 
+static inline bool __i915_request_irq_complete(struct drm_i915_gem_request *req)
+{
+	/* Ensure our read of the seqno is coherent so that we
+	 * do not "miss an interrupt" (i.e. if this is the last
+	 * request and the seqno write from the GPU is not visible
+	 * by the time the interrupt fires, we will see that the
+	 * request is incomplete and go back to sleep awaiting
+	 * another interrupt that will never come.)
+	 *
+	 * Strictly, we only need to do this once after an interrupt,
+	 * but it is easier and safer to do it every time the waiter
+	 * is woken.
+	 */
+	if (i915_gem_request_completed(req, false))
+		return true;
+
+	/* We need to check whether any gpu reset happened in between
+	 * the request being submitted and now. If a reset has occurred,
+	 * the request is effectively complete (we either are in the
+	 * process of or have discarded the rendering and completely
+	 * reset the GPU. The results of the request are lost and we
+	 * are free to continue on with the original operation.
+	 */
+	if (req->reset_counter != i915_reset_counter(&req->i915->gpu_error))
+		return true;
+
+	return false;
+}
+
 #endif
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index b4da8b354a3b..4b26529f1f44 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1121,17 +1121,6 @@ i915_gem_check_wedge(unsigned reset_counter, bool interruptible)
 	return 0;
 }
 
-static void fake_irq(unsigned long data)
-{
-	wake_up_process((struct task_struct *)data);
-}
-
-static bool missed_irq(struct drm_i915_private *dev_priv,
-		       struct intel_engine_cs *ring)
-{
-	return test_bit(ring->id, &dev_priv->gpu_error.missed_irq_rings);
-}
-
 static unsigned long local_clock_us(unsigned *cpu)
 {
 	unsigned long t;
@@ -1164,7 +1153,9 @@ static bool busywait_stop(unsigned long timeout, unsigned cpu)
 	return this_cpu != cpu;
 }
 
-static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
+static bool __i915_spin_request(struct drm_i915_gem_request *req,
+				struct intel_wait *wait,
+				int state)
 {
 	unsigned long timeout;
 	unsigned cpu;
@@ -1179,31 +1170,30 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
 	 * takes to sleep on a request, on the order of a microsecond.
 	 */
 
-	if (req->ring->irq_refcount)
-		return -EBUSY;
-
 	/* Only spin if we know the GPU is processing this request */
 	if (!i915_gem_request_started(req, true))
-		return -EAGAIN;
+		return false;
 
 	timeout = local_clock_us(&cpu) + 5;
-	while (!need_resched()) {
+	do {
 		if (i915_gem_request_completed(req, true))
-			return 0;
+			return true;
 
-		if (signal_pending_state(state, current))
+		if (signal_pending_state(state, wait->task))
 			break;
 
 		if (busywait_stop(timeout, cpu))
 			break;
 
 		cpu_relax_lowlatency();
-	}
 
-	if (i915_gem_request_completed(req, false))
-		return 0;
+		/* Break the loop if we have consumed the timeslice (or been
+		 * preempted) or when either the background thread has
+		 * enabled the interrupt, or the IRQ itself has fired.
+		 */
+	} while (!need_resched() && wait->task->state == state);
 
-	return -EAGAIN;
+	return false;
 }
 
 /**
@@ -1227,18 +1217,13 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 			s64 *timeout,
 			struct intel_rps_client *rps)
 {
-	struct intel_engine_cs *ring = i915_gem_request_get_ring(req);
-	struct drm_device *dev = ring->dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	const bool irq_test_in_progress =
-		ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) & intel_ring_flag(ring);
 	int state = interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
-	DEFINE_WAIT(wait);
-	unsigned long timeout_expire;
+	struct intel_wait wait;
+	unsigned long timeout_remain;
 	s64 before, now;
-	int ret;
+	int ret = 0;
 
-	WARN(!intel_irqs_enabled(dev_priv), "IRQs disabled");
+	might_sleep();
 
 	if (list_empty(&req->list))
 		return 0;
@@ -1246,7 +1231,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 	if (i915_gem_request_completed(req, true))
 		return 0;
 
-	timeout_expire = 0;
+	timeout_remain = MAX_SCHEDULE_TIMEOUT;
 	if (timeout) {
 		if (WARN_ON(*timeout < 0))
 			return -EINVAL;
@@ -1254,83 +1239,65 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 		if (*timeout == 0)
 			return -ETIME;
 
-		timeout_expire = jiffies + nsecs_to_jiffies_timeout(*timeout);
+		timeout_remain = nsecs_to_jiffies_timeout(*timeout);
 	}
 
-	if (INTEL_INFO(dev_priv)->gen >= 6)
-		gen6_rps_boost(dev_priv, rps, req->emitted_jiffies);
-
 	/* Record current time in case interrupted by signal, or wedged */
 	trace_i915_gem_request_wait_begin(req);
 	before = ktime_get_raw_ns();
 
-	/* Optimistic spin for the next jiffie before touching IRQs */
-	ret = __i915_spin_request(req, state);
-	if (ret == 0)
-		goto out;
+	if (INTEL_INFO(req->i915)->gen >= 6)
+		gen6_rps_boost(req->i915, rps, req->emitted_jiffies);
 
-	if (!irq_test_in_progress && WARN_ON(!ring->irq_get(ring))) {
-		ret = -ENODEV;
-		goto out;
-	}
-
-	for (;;) {
-		struct timer_list timer;
+	intel_wait_init(&wait, req->seqno);
+	set_task_state(wait.task, state);
 
-		prepare_to_wait(&ring->irq_queue, &wait, state);
+	/* Optimistic spin for the next ~jiffie before touching IRQs */
+	if (intel_engine_add_wait(req->ring, &wait)) {
+		if (__i915_spin_request(req, &wait, state))
+			goto complete;
 
-		/* We need to check whether any gpu reset happened in between
-		 * the request being submitted and now. If a reset has occurred,
-		 * the request is effectively complete (we either are in the
-		 * process of or have discarded the rendering and completely
-		 * reset the GPU. The results of the request are lost and we
-		 * are free to continue on with the original operation.
+		/* In order to check that we haven't missed the interrupt
+		 * as we enabled it, we need to kick ourselves to do a
+		 * coherent check on the seqno before we sleep.
 		 */
-		if (req->reset_counter != i915_reset_counter(&dev_priv->gpu_error)) {
-			ret = 0;
-			break;
-		}
-
-		if (i915_gem_request_completed(req, false)) {
-			ret = 0;
-			break;
-		}
+		if (intel_engine_enable_wait_irq(req->ring, &wait))
+			goto wakeup;
+	}
 
-		if (signal_pending_state(state, current)) {
+	for (;;) {
+		if (signal_pending_state(state, wait.task)) {
 			ret = -ERESTARTSYS;
 			break;
 		}
 
-		if (timeout && time_after_eq(jiffies, timeout_expire)) {
+		/* Ensure that even if the GPU hangs, we get woken up. */
+		i915_queue_hangcheck(req->i915);
+
+		timeout_remain = io_schedule_timeout(timeout_remain);
+		if (timeout_remain == 0) {
 			ret = -ETIME;
 			break;
 		}
 
-		/* Ensure that even if the GPU hangs, we get woken up. */
-		i915_queue_hangcheck(dev_priv);
-
-		timer.function = NULL;
-		if (timeout || missed_irq(dev_priv, ring)) {
-			unsigned long expire;
-
-			setup_timer_on_stack(&timer, fake_irq, (unsigned long)current);
-			expire = missed_irq(dev_priv, ring) ? jiffies + 1 : timeout_expire;
-			mod_timer(&timer, expire);
-		}
+		if (intel_wait_complete(&wait))
+			break;
 
-		io_schedule();
+wakeup:
+		set_task_state(wait.task, state);
 
-		if (timer.function) {
-			del_singleshot_timer_sync(&timer);
-			destroy_timer_on_stack(&timer);
-		}
+		/* Carefully check if the request is complete, giving time
+		 * for the seqno to be visible following the interrupt.
+		 * We also have to check in case we are kicked by the GPU
+		 * reset in order to drop the struct_mutex.
+		 */
+		if (__i915_request_irq_complete(req))
+			break;
 	}
-	if (!irq_test_in_progress)
-		ring->irq_put(ring);
-
-	finish_wait(&ring->irq_queue, &wait);
 
-out:
+complete:
+	intel_engine_remove_wait(req->ring, &wait);
+	__set_task_state(wait.task, TASK_RUNNING);
 	now = ktime_get_raw_ns();
 	trace_i915_gem_request_wait_end(req);
 
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 06ca4082735b..f805d117f3d1 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -900,7 +900,7 @@ static void i915_record_ring_state(struct drm_device *dev,
 		ering->instdone = I915_READ(GEN2_INSTDONE);
 	}
 
-	ering->waiting = waitqueue_active(&ring->irq_queue);
+	ering->waiting = intel_engine_has_waiter(ring);
 	ering->instpm = I915_READ(RING_INSTPM(ring->mmio_base));
 	ering->seqno = ring->get_seqno(ring, false);
 	ering->acthd = intel_ring_get_active_head(ring);
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 2a8a9694eec5..95b997a57da8 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1000,8 +1000,7 @@ static void notify_ring(struct intel_engine_cs *ring)
 		return;
 
 	trace_i915_gem_request_notify(ring);
-
-	wake_up_all(&ring->irq_queue);
+	intel_engine_wakeup(ring);
 }
 
 static void vlv_c0_read(struct drm_i915_private *dev_priv,
@@ -1083,7 +1082,7 @@ static bool any_waiters(struct drm_i915_private *dev_priv)
 	int i;
 
 	for_each_ring(ring, dev_priv, i)
-		if (ring->irq_refcount)
+		if (intel_engine_has_waiter(ring))
 			return true;
 
 	return false;
@@ -2431,9 +2430,6 @@ out:
 static void i915_error_wake_up(struct drm_i915_private *dev_priv,
 			       bool reset_completed)
 {
-	struct intel_engine_cs *ring;
-	int i;
-
 	/*
 	 * Notify all waiters for GPU completion events that reset state has
 	 * been changed, and that they need to restart their wait after
@@ -2441,9 +2437,8 @@ static void i915_error_wake_up(struct drm_i915_private *dev_priv,
 	 * a gpu reset pending so that i915_error_work_func can acquire them).
 	 */
 
-	/* Wake up __wait_seqno, potentially holding dev->struct_mutex. */
-	for_each_ring(ring, dev_priv, i)
-		wake_up_all(&ring->irq_queue);
+	/* Wake up i915_wait_request, potentially holding dev->struct_mutex. */
+	intel_kick_waiters(dev_priv);
 
 	/* Wake up intel_crtc_wait_for_pending_flips, holding crtc->mutex. */
 	wake_up_all(&dev_priv->pending_flip_queue);
@@ -3079,16 +3074,17 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 			if (ring_idle(ring, seqno)) {
 				ring->hangcheck.action = HANGCHECK_IDLE;
 
-				if (waitqueue_active(&ring->irq_queue)) {
+				if (intel_engine_has_waiter(ring)) {
 					/* Issue a wake-up to catch stuck h/w. */
 					if (!test_and_set_bit(ring->id, &dev_priv->gpu_error.missed_irq_rings)) {
-						if (!(dev_priv->gpu_error.test_irq_rings & intel_ring_flag(ring)))
+						if (!test_bit(ring->id, &dev_priv->gpu_error.test_irq_rings))
 							DRM_ERROR("Hangcheck timer elapsed... %s idle\n",
 								  ring->name);
 						else
 							DRM_INFO("Fake missed irq on %s\n",
 								 ring->name);
-						wake_up_all(&ring->irq_queue);
+
+						intel_engine_enable_fake_irq(ring);
 					}
 					/* Safeguard against driver failure */
 					ring->hangcheck.score += BUSY;
diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
new file mode 100644
index 000000000000..9f756583a44e
--- /dev/null
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -0,0 +1,336 @@
+/*
+ * Copyright © 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#include "i915_drv.h"
+
+static void intel_breadcrumbs_fake_irq(unsigned long data)
+{
+	struct intel_engine_cs *engine = (struct intel_engine_cs *)data;
+
+	/*
+	 * The timer persists in case we cannot enable interrupts,
+	 * or if we have previously seen seqno/interrupt incoherency
+	 * ("missed interrupt" syndrome). Here the worker will wake up
+	 * every jiffie in order to kick the oldest waiter to do the
+	 * coherent seqno check.
+	 */
+	rcu_read_lock();
+	if (intel_engine_wakeup(engine))
+		mod_timer(&engine->breadcrumbs.fake_irq, jiffies + 1);
+	rcu_read_unlock();
+}
+
+static void irq_enable(struct intel_engine_cs *engine)
+{
+	WARN_ON(!engine->irq_get(engine));
+}
+
+static void irq_disable(struct intel_engine_cs *engine)
+{
+	engine->irq_put(engine);
+}
+
+static bool __intel_breadcrumbs_enable_irq(struct intel_breadcrumbs *b)
+{
+	struct intel_engine_cs *engine =
+		container_of(b, struct intel_engine_cs, breadcrumbs);
+	bool noirq;
+
+	assert_spin_locked(&b->lock);
+	if (b->rpm_wakelock)
+		return false;
+
+	/* Since we are waiting on a request, the GPU should be busy
+	 * and should have its own rpm reference. For completeness,
+	 * record an rpm reference for ourselves to cover the
+	 * interrupt we unmask.
+	 */
+	intel_runtime_pm_get_noresume(engine->i915);
+	b->rpm_wakelock = true;
+
+	/* No interrupts? Kick the waiter every jiffie! */
+	noirq = true;
+	if (intel_irqs_enabled(engine->i915)) {
+		noirq = test_bit(engine->id,
+				 &engine->i915->gpu_error.missed_irq_rings);
+		if (!test_bit(engine->id,
+			      &engine->i915->gpu_error.test_irq_rings)) {
+			irq_enable(engine);
+			b->irq_enabled = true;
+		}
+	}
+	if (noirq)
+		mod_timer(&b->fake_irq, jiffies + 1);
+
+	return b->irq_enabled;
+}
+
+static void __intel_breadcrumbs_disable_irq(struct intel_breadcrumbs *b)
+{
+	struct intel_engine_cs *engine =
+		container_of(b, struct intel_engine_cs, breadcrumbs);
+
+	assert_spin_locked(&b->lock);
+	if (!b->rpm_wakelock)
+		return;
+
+	if (b->irq_enabled) {
+		irq_disable(engine);
+		b->irq_enabled = false;
+	}
+
+	intel_runtime_pm_put(engine->i915);
+	b->rpm_wakelock = false;
+}
+
+static inline struct intel_wait *to_wait(struct rb_node *node)
+{
+	return container_of(node, struct intel_wait, node);
+}
+
+static inline void __intel_breadcrumbs_finish(struct intel_breadcrumbs *b,
+					      struct intel_wait *wait)
+{
+	assert_spin_locked(&b->lock);
+
+	/* This request is completed, so remove it from the tree, mark it as
+	 * complete, and *then* wake up the associated task.
+	 */
+	rb_erase(&wait->node, &b->waiters);
+	RB_CLEAR_NODE(&wait->node);
+
+	wake_up_process(wait->task); /* implicit smp_wmb() */
+}
+
+bool intel_engine_add_wait(struct intel_engine_cs *engine,
+			   struct intel_wait *wait)
+{
+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
+	u32 seqno = engine->get_seqno(engine, true);
+	struct rb_node **p, *parent, *completed;
+	bool first;
+
+	spin_lock(&b->lock);
+
+	/* Insert the request into the retirement-ordered list
+	 * of waiters by walking the rbtree. If we are the oldest
+	 * seqno in the tree (the first to be retired), then
+	 * set ourselves as the bottom-half.
+	 *
+	 * As we descend the tree, prune completed branches: since we hold
+	 * the spinlock, we know that the first_waiter must be delayed, and
+	 * we can reduce some of the sequential wake-up latency if we take
+	 * action ourselves and wake the completed tasks in parallel. Also,
+	 * by removing stale elements in the tree, we may be able to reduce
+	 * the ping-pong between the old bottom-half and ourselves as first-waiter.
+	 */
+	first = true;
+	parent = NULL;
+	completed = NULL;
+	p = &b->waiters.rb_node;
+	while (*p) {
+		parent = *p;
+		if (wait->seqno == to_wait(parent)->seqno) {
+			/* We have multiple waiters on the same seqno, select
+			 * the highest priority task (that with the smallest
+			 * task->prio) to serve as the bottom-half for this
+			 * group.
+			 */
+			if (wait->task->prio > to_wait(parent)->task->prio) {
+				p = &parent->rb_right;
+				first = false;
+			} else
+				p = &parent->rb_left;
+		} else if (i915_seqno_passed(wait->seqno,
+					     to_wait(parent)->seqno)) {
+			p = &parent->rb_right;
+			if (i915_seqno_passed(seqno, to_wait(parent)->seqno))
+				completed = parent;
+			else
+				first = false;
+		} else
+			p = &parent->rb_left;
+	}
+	rb_link_node(&wait->node, parent, p);
+	rb_insert_color(&wait->node, &b->waiters);
+
+	if (completed != NULL) {
+		struct rb_node *next = rb_next(completed);
+
+		if (next && next != &wait->node) {
+			GEM_BUG_ON(first);
+			smp_store_mb(b->first_waiter, to_wait(next)->task);
+			/* If we enable the IRQ, we may have missed the
+			 * interrupt for that seqno, so we have to wake up
+			 * that bottom-half in order to do a coherent check
+			 * in case the seqno passed.
+			 */
+			if (__intel_breadcrumbs_enable_irq(b))
+				wake_up_process(to_wait(next)->task);
+		}
+
+		do {
+			struct intel_wait *crumb = to_wait(completed);
+			completed = rb_prev(completed);
+			__intel_breadcrumbs_finish(b, crumb);
+		} while (completed != NULL);
+	}
+
+	if (first)
+		smp_store_mb(b->first_waiter, wait->task);
+	GEM_BUG_ON(b->first_waiter == NULL);
+
+	spin_unlock(&b->lock);
+
+	return first;
+}
+
+bool intel_engine_enable_wait_irq(struct intel_engine_cs *engine,
+				  const struct intel_wait *wait)
+{
+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
+	bool first = false;
+
+	spin_lock(&b->lock);
+	if (b->first_waiter == wait->task)
+		first = __intel_breadcrumbs_enable_irq(b);
+	spin_unlock(&b->lock);
+
+	return first;
+}
+
+void intel_engine_enable_fake_irq(struct intel_engine_cs *engine)
+{
+	mod_timer(&engine->breadcrumbs.fake_irq, jiffies + 1);
+}
+
+static inline bool chain_wakeup(struct rb_node *rb, int priority)
+{
+	return rb && to_wait(rb)->task->prio <= priority;
+}
+
+void intel_engine_remove_wait(struct intel_engine_cs *engine,
+			      struct intel_wait *wait)
+{
+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
+
+	/* Quick check to see if this waiter was already decoupled from
+	 * the tree by the bottom-half to avoid contention on the spinlock
+	 * by the herd.
+	 */
+	if (RB_EMPTY_NODE(&wait->node))
+		return;
+
+	spin_lock(&b->lock);
+
+	if (b->first_waiter == wait->task) {
+		struct rb_node *next;
+		struct task_struct *task;
+		const int priority = wait->task->prio;
+
+		/* We are the current bottom-half. Find the next candidate,
+		 * the first waiter in the queue on the remaining oldest
+		 * request. As multiple seqnos may complete in the time it
+		 * takes us to wake up and find the next waiter, we have to
+		 * wake up that waiter for it to perform its own coherent
+		 * completion check.
+		 */
+		next = rb_next(&wait->node);
+		if (chain_wakeup(next, priority)) {
+			/* If the next waiter is already complete,
+			 * wake it up and continue on to the next waiter. So
+			 * if we have a small herd, they will wake up in parallel
+			 * rather than sequentially, which should reduce
+			 * the overall latency in waking all the completed
+			 * clients.
+			 *
+			 * However, waking up a chain adds extra latency to
+			 * the first_waiter. This is undesirable if that
+			 * waiter is a high priority task.
+			 */
+			u32 seqno = engine->get_seqno(engine, true);
+			while (i915_seqno_passed(seqno,
+						 to_wait(next)->seqno)) {
+				struct rb_node *n = rb_next(next);
+				__intel_breadcrumbs_finish(b, to_wait(next));
+				next = n;
+				if (!chain_wakeup(next, priority))
+					break;
+			}
+		}
+		task = next ? to_wait(next)->task : NULL;
+
+		smp_store_mb(b->first_waiter, task);
+		if (task) {
+			/* In our haste, we may have completed the first waiter
+			 * before we enabled the interrupt. Do so now as we
+			 * have a second waiter for a future seqno. Afterwards,
+			 * we have to wake up that waiter in case we missed
+			 * the interrupt, or if we have to handle an
+			 * exception rather than a seqno completion.
+			 */
+			if (to_wait(next)->seqno != wait->seqno)
+				__intel_breadcrumbs_enable_irq(b);
+			wake_up_process(task);
+		} else
+			__intel_breadcrumbs_disable_irq(b);
+	}
+
+	if (!RB_EMPTY_NODE(&wait->node))
+		rb_erase(&wait->node, &b->waiters);
+	spin_unlock(&b->lock);
+}
+
+void intel_engine_init_breadcrumbs(struct intel_engine_cs *engine)
+{
+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
+
+	spin_lock_init(&b->lock);
+	setup_timer(&b->fake_irq,
+		    intel_breadcrumbs_fake_irq,
+		    (unsigned long)engine);
+}
+
+void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine)
+{
+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
+
+	del_timer_sync(&b->fake_irq);
+}
+
+void intel_kick_waiters(struct drm_i915_private *i915)
+{
+	struct intel_engine_cs *engine;
+	int i;
+
+	/* To avoid the task_struct disappearing beneath us as we wake up
+	 * the process, we must first inspect the task_struct->state under the
+	 * RCU lock, i.e. as we call wake_up_process() we must be holding the
+	 * rcu_read_lock().
+	 */
+	rcu_read_lock();
+	for_each_ring(engine, i915, i)
+		intel_engine_wakeup(engine);
+	rcu_read_unlock();
+}
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 32644338e6f8..16fa58a0a930 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1928,6 +1928,8 @@ void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
 	i915_cmd_parser_fini_ring(ring);
 	i915_gem_batch_pool_fini(&ring->batch_pool);
 
+	intel_engine_fini_breadcrumbs(ring);
+
 	if (ring->status_page.obj) {
 		kunmap(sg_page(ring->status_page.obj->pages->sgl));
 		ring->status_page.obj = NULL;
@@ -1945,10 +1947,11 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
 	ring->buffer = NULL;
 
 	ring->dev = dev;
+	ring->i915 = to_i915(dev);
 	INIT_LIST_HEAD(&ring->active_list);
 	INIT_LIST_HEAD(&ring->request_list);
 	i915_gem_batch_pool_init(dev, &ring->batch_pool);
-	init_waitqueue_head(&ring->irq_queue);
+	intel_engine_init_breadcrumbs(ring);
 
 	INIT_LIST_HEAD(&ring->buffers);
 	INIT_LIST_HEAD(&ring->execlist_queue);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index a1d43b2c7077..60b0df2c5399 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2152,6 +2152,7 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 	WARN_ON(ring->buffer);
 
 	ring->dev = dev;
+	ring->i915 = to_i915(dev);
 	INIT_LIST_HEAD(&ring->active_list);
 	INIT_LIST_HEAD(&ring->request_list);
 	INIT_LIST_HEAD(&ring->execlist_queue);
@@ -2159,7 +2160,7 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 	i915_gem_batch_pool_init(dev, &ring->batch_pool);
 	memset(ring->semaphore.sync_seqno, 0, sizeof(ring->semaphore.sync_seqno));
 
-	init_waitqueue_head(&ring->irq_queue);
+	intel_engine_init_breadcrumbs(ring);
 
 	ringbuf = intel_engine_create_ringbuffer(ring, 32 * PAGE_SIZE);
 	if (IS_ERR(ringbuf)) {
@@ -2223,6 +2224,8 @@ void intel_cleanup_ring_buffer(struct intel_engine_cs *ring)
 
 	i915_cmd_parser_fini_ring(ring);
 	i915_gem_batch_pool_fini(&ring->batch_pool);
+	intel_engine_fini_breadcrumbs(ring);
+
 	ring->dev = NULL;
 }
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 7349d9258191..51fcb66bfc4a 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -158,9 +158,35 @@ struct  intel_engine_cs {
 #define LAST_USER_RING (VECS + 1)
 	u32		mmio_base;
 	struct		drm_device *dev;
+	struct drm_i915_private *i915;
 	struct intel_ringbuffer *buffer;
 	struct list_head buffers;
 
+	/* Rather than have every client wait upon all user interrupts,
+	 * with the herd waking after every interrupt and each doing the
+	 * heavyweight seqno dance, we delegate the task (of being the
+	 * bottom-half of the user interrupt) to the first client. After
+	 * every interrupt, we wake up one client, who does the heavyweight
+	 * coherent seqno read and either goes back to sleep (if incomplete),
+	 * or wakes up all the completed clients in parallel, before then
+	 * transferring the bottom-half status to the next client in the queue.
+	 *
+	 * Compared to walking the entire list of waiters in a single dedicated
+	 * bottom-half, we reduce the latency of the first waiter by avoiding
+	 * a context switch, but incur additional coherent seqno reads when
+	 * following the chain of request breadcrumbs. Since it is most likely
+	 * that we have a single client waiting on each seqno, reducing
+	 * the overhead of waking that client is much preferred.
+	 */
+	struct intel_breadcrumbs {
+		spinlock_t lock; /* protects the lists of requests */
+		struct rb_root waiters; /* sorted by retirement, priority */
+		struct task_struct *first_waiter; /* bh for user interrupts */
+		struct timer_list fake_irq; /* used after a missed interrupt */
+		bool irq_enabled;
+		bool rpm_wakelock;
+	} breadcrumbs;
+
 	/*
 	 * A pool of objects to use as shadow copies of client batch buffers
 	 * when the command parser is enabled. Prevents the client from
@@ -304,8 +330,6 @@ struct  intel_engine_cs {
 
 	bool gpu_caches_dirty;
 
-	wait_queue_head_t irq_queue;
-
 	struct intel_context *default_context;
 	struct intel_context *last_context;
 
@@ -511,4 +535,45 @@ void intel_ring_reserved_space_end(struct intel_ringbuffer *ringbuf);
 /* Legacy ringbuffer specific portion of reservation code: */
 int intel_ring_reserve_space(struct drm_i915_gem_request *request);
 
+/* intel_breadcrumbs.c -- user interrupt bottom-half for waiters */
+struct intel_wait {
+	struct rb_node node;
+	struct task_struct *task;
+	u32 seqno;
+};
+void intel_engine_init_breadcrumbs(struct intel_engine_cs *engine);
+static inline void intel_wait_init(struct intel_wait *wait, u32 seqno)
+{
+	wait->task = current;
+	wait->seqno = seqno;
+}
+static inline bool intel_wait_complete(const struct intel_wait *wait)
+{
+	return RB_EMPTY_NODE(&wait->node);
+}
+bool intel_engine_add_wait(struct intel_engine_cs *engine,
+			   struct intel_wait *wait);
+bool intel_engine_enable_wait_irq(struct intel_engine_cs *engine,
+				  const struct intel_wait *wait);
+void intel_engine_remove_wait(struct intel_engine_cs *engine,
+			      struct intel_wait *wait);
+static inline bool intel_engine_has_waiter(struct intel_engine_cs *engine)
+{
+	return READ_ONCE(engine->breadcrumbs.first_waiter);
+}
+static inline bool intel_engine_wakeup(struct intel_engine_cs *engine)
+{
+	struct task_struct *task = READ_ONCE(engine->breadcrumbs.first_waiter);
+	/* Note that for this not to dangerously chase a dangling pointer,
+	 * the caller is responsible for ensuring that the task remains valid
+	 * for wake_up_process(), i.e. that the RCU grace period cannot expire.
+	 */
+	if (task)
+		wake_up_process(task);
+	return task != NULL;
+}
+void intel_engine_enable_fake_irq(struct intel_engine_cs *engine);
+void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine);
+void intel_kick_waiters(struct drm_i915_private *i915);
+
 #endif /* _INTEL_RINGBUFFER_H_ */
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 019/190] drm/i915: Separate out the seqno-barrier from engine->get_seqno
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (16 preceding siblings ...)
  2016-01-11  9:16 ` [PATCH 018/190] drm/i915: Slaughter the thundering i915_wait_request herd Chris Wilson
@ 2016-01-11  9:16 ` Chris Wilson
  2016-01-11 15:43   ` Dave Gordon
  2016-01-11  9:16 ` [PATCH 020/190] drm/i915: Remove the lazy_coherency parameter from request-completed? Chris Wilson
                   ` (68 subsequent siblings)
  86 siblings, 1 reply; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:16 UTC (permalink / raw)
  To: intel-gfx

In order to simplify the next couple of patches, extract the
lazy_coherency optimisation out of the engine->get_seqno() vfunc into
its own callback.

v2: Rename the barrier to engine->irq_seqno_barrier to try to better
reflect that the barrier is only required after the user interrupt before
reading the seqno (to ensure that the seqno update lands in time as we
do not have strict seqno-irq ordering on all platforms).
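
For illustration only (a minimal sketch built from the callbacks
introduced below, not an additional hunk): a waiter that has just been
woken by the user interrupt would now apply the barrier before sampling
the seqno,

	/* Sketch: the barrier is only needed after an interrupt, as the
	 * seqno write may not yet be visible to the CPU. */
	if (engine->irq_seqno_barrier)
		engine->irq_seqno_barrier(engine);
	seqno = engine->get_seqno(engine);

whereas a lazy (potentially incoherent) read simply calls
engine->get_seqno() on its own.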

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c      |  6 ++---
 drivers/gpu/drm/i915/i915_drv.h          | 12 ++++++----
 drivers/gpu/drm/i915/i915_gpu_error.c    |  2 +-
 drivers/gpu/drm/i915/i915_irq.c          |  4 ++--
 drivers/gpu/drm/i915/i915_trace.h        |  2 +-
 drivers/gpu/drm/i915/intel_breadcrumbs.c |  4 ++--
 drivers/gpu/drm/i915/intel_lrc.c         | 39 ++++++++++++--------------------
 drivers/gpu/drm/i915/intel_ringbuffer.c  | 36 +++++++++++++++--------------
 drivers/gpu/drm/i915/intel_ringbuffer.h  |  4 ++--
 9 files changed, 53 insertions(+), 56 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 9396597b136d..1499e2337e5d 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -600,7 +600,7 @@ static int i915_gem_pageflip_info(struct seq_file *m, void *data)
 					   ring->name,
 					   i915_gem_request_get_seqno(work->flip_queued_req),
 					   dev_priv->next_seqno,
-					   ring->get_seqno(ring, true),
+					   ring->get_seqno(ring),
 					   i915_gem_request_completed(work->flip_queued_req, true));
 			} else
 				seq_printf(m, "Flip not associated with any ring\n");
@@ -734,7 +734,7 @@ static void i915_ring_seqno_info(struct seq_file *m,
 
 	if (ring->get_seqno) {
 		seq_printf(m, "Current sequence (%s): %x\n",
-			   ring->name, ring->get_seqno(ring, false));
+			   ring->name, ring->get_seqno(ring));
 	}
 
 	spin_lock(&ring->breadcrumbs.lock);
@@ -1354,7 +1354,7 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
 	intel_runtime_pm_get(dev_priv);
 
 	for_each_ring(ring, dev_priv, i) {
-		seqno[i] = ring->get_seqno(ring, false);
+		seqno[i] = ring->get_seqno(ring);
 		acthd[i] = intel_ring_get_active_head(ring);
 	}
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index a9e8de57e848..9762aa76bb0a 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2972,15 +2972,19 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2)
 static inline bool i915_gem_request_started(struct drm_i915_gem_request *req,
 					   bool lazy_coherency)
 {
-	u32 seqno = req->ring->get_seqno(req->ring, lazy_coherency);
-	return i915_seqno_passed(seqno, req->previous_seqno);
+	if (!lazy_coherency && req->ring->irq_seqno_barrier)
+		req->ring->irq_seqno_barrier(req->ring);
+	return i915_seqno_passed(req->ring->get_seqno(req->ring),
+				 req->previous_seqno);
 }
 
 static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
 					      bool lazy_coherency)
 {
-	u32 seqno = req->ring->get_seqno(req->ring, lazy_coherency);
-	return i915_seqno_passed(seqno, req->seqno);
+	if (!lazy_coherency && req->ring->irq_seqno_barrier)
+		req->ring->irq_seqno_barrier(req->ring);
+	return i915_seqno_passed(req->ring->get_seqno(req->ring),
+				 req->seqno);
 }
 
 int __must_check i915_gem_get_seqno(struct drm_device *dev, u32 *seqno);
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index f805d117f3d1..01d0206ca4dd 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -902,8 +902,8 @@ static void i915_record_ring_state(struct drm_device *dev,
 
 	ering->waiting = intel_engine_has_waiter(ring);
 	ering->instpm = I915_READ(RING_INSTPM(ring->mmio_base));
-	ering->seqno = ring->get_seqno(ring, false);
 	ering->acthd = intel_ring_get_active_head(ring);
+	ering->seqno = ring->get_seqno(ring);
 	ering->start = I915_READ_START(ring);
 	ering->head = I915_READ_HEAD(ring);
 	ering->tail = I915_READ_TAIL(ring);
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 95b997a57da8..d73669783045 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2903,7 +2903,7 @@ static int semaphore_passed(struct intel_engine_cs *ring)
 	if (signaller->hangcheck.deadlock >= I915_NUM_RINGS)
 		return -1;
 
-	if (i915_seqno_passed(signaller->get_seqno(signaller, false), seqno))
+	if (i915_seqno_passed(signaller->get_seqno(signaller), seqno))
 		return 1;
 
 	/* cursory check for an unkickable deadlock */
@@ -3067,8 +3067,8 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 
 		semaphore_clear_deadlocks(dev_priv);
 
-		seqno = ring->get_seqno(ring, false);
 		acthd = intel_ring_get_active_head(ring);
+		seqno = ring->get_seqno(ring);
 
 		if (ring->hangcheck.seqno == seqno) {
 			if (ring_idle(ring, seqno)) {
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 52b2d409945d..cfb5f78a6e84 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -573,7 +573,7 @@ TRACE_EVENT(i915_gem_request_notify,
 	    TP_fast_assign(
 			   __entry->dev = ring->dev->primary->index;
 			   __entry->ring = ring->id;
-			   __entry->seqno = ring->get_seqno(ring, false);
+			   __entry->seqno = ring->get_seqno(ring);
 			   ),
 
 	    TP_printk("dev=%u, ring=%u, seqno=%u",
diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
index 9f756583a44e..10b0add54acf 100644
--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -127,7 +127,7 @@ bool intel_engine_add_wait(struct intel_engine_cs *engine,
 			   struct intel_wait *wait)
 {
 	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-	u32 seqno = engine->get_seqno(engine, true);
+	u32 seqno = engine->get_seqno(engine);
 	struct rb_node **p, *parent, *completed;
 	bool first;
 
@@ -269,7 +269,7 @@ void intel_engine_remove_wait(struct intel_engine_cs *engine,
 			 * the first_waiter. This is undesirable if that
 			 * waiter is a high priority task.
 			 */
-			u32 seqno = engine->get_seqno(engine, true);
+			u32 seqno = engine->get_seqno(engine);
 			while (i915_seqno_passed(seqno,
 						 to_wait(next)->seqno)) {
 				struct rb_node *n = rb_next(next);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 16fa58a0a930..333e95bda78a 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1775,7 +1775,7 @@ static int gen8_emit_flush_render(struct drm_i915_gem_request *request,
 	return 0;
 }
 
-static u32 gen8_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency)
+static u32 gen8_get_seqno(struct intel_engine_cs *ring)
 {
 	return intel_read_status_page(ring, I915_GEM_HWS_INDEX);
 }
@@ -1785,9 +1785,8 @@ static void gen8_set_seqno(struct intel_engine_cs *ring, u32 seqno)
 	intel_write_status_page(ring, I915_GEM_HWS_INDEX, seqno);
 }
 
-static u32 bxt_a_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency)
+static void bxt_seqno_barrier(struct intel_engine_cs *ring)
 {
-
 	/*
 	 * On BXT A steppings there is a HW coherency issue whereby the
 	 * MI_STORE_DATA_IMM storing the completed request's seqno
@@ -1798,11 +1797,7 @@ static u32 bxt_a_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency)
 	 * bxt_a_set_seqno(), where we also do a clflush after the write. So
 	 * this clflush in practice becomes an invalidate operation.
 	 */
-
-	if (!lazy_coherency)
-		intel_flush_status_page(ring, I915_GEM_HWS_INDEX);
-
-	return intel_read_status_page(ring, I915_GEM_HWS_INDEX);
+	intel_flush_status_page(ring, I915_GEM_HWS_INDEX);
 }
 
 static void bxt_a_set_seqno(struct intel_engine_cs *ring, u32 seqno)
@@ -2007,12 +2002,11 @@ static int logical_render_ring_init(struct drm_device *dev)
 		ring->init_hw = gen8_init_render_ring;
 	ring->init_context = gen8_init_rcs_context;
 	ring->cleanup = intel_fini_pipe_control;
+	ring->get_seqno = gen8_get_seqno;
+	ring->set_seqno = gen8_set_seqno;
 	if (IS_BXT_REVID(dev, 0, BXT_REVID_A1)) {
-		ring->get_seqno = bxt_a_get_seqno;
+		ring->irq_seqno_barrier = bxt_seqno_barrier;
 		ring->set_seqno = bxt_a_set_seqno;
-	} else {
-		ring->get_seqno = gen8_get_seqno;
-		ring->set_seqno = gen8_set_seqno;
 	}
 	ring->emit_request = gen8_emit_request;
 	ring->emit_flush = gen8_emit_flush_render;
@@ -2059,12 +2053,11 @@ static int logical_bsd_ring_init(struct drm_device *dev)
 		GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS1_IRQ_SHIFT;
 
 	ring->init_hw = gen8_init_common_ring;
+	ring->get_seqno = gen8_get_seqno;
+	ring->set_seqno = gen8_set_seqno;
 	if (IS_BXT_REVID(dev, 0, BXT_REVID_A1)) {
-		ring->get_seqno = bxt_a_get_seqno;
+		ring->irq_seqno_barrier = bxt_seqno_barrier;
 		ring->set_seqno = bxt_a_set_seqno;
-	} else {
-		ring->get_seqno = gen8_get_seqno;
-		ring->set_seqno = gen8_set_seqno;
 	}
 	ring->emit_request = gen8_emit_request;
 	ring->emit_flush = gen8_emit_flush;
@@ -2114,12 +2107,11 @@ static int logical_blt_ring_init(struct drm_device *dev)
 		GT_CONTEXT_SWITCH_INTERRUPT << GEN8_BCS_IRQ_SHIFT;
 
 	ring->init_hw = gen8_init_common_ring;
+	ring->get_seqno = gen8_get_seqno;
+	ring->set_seqno = gen8_set_seqno;
 	if (IS_BXT_REVID(dev, 0, BXT_REVID_A1)) {
-		ring->get_seqno = bxt_a_get_seqno;
+		ring->irq_seqno_barrier = bxt_seqno_barrier;
 		ring->set_seqno = bxt_a_set_seqno;
-	} else {
-		ring->get_seqno = gen8_get_seqno;
-		ring->set_seqno = gen8_set_seqno;
 	}
 	ring->emit_request = gen8_emit_request;
 	ring->emit_flush = gen8_emit_flush;
@@ -2144,12 +2136,11 @@ static int logical_vebox_ring_init(struct drm_device *dev)
 		GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VECS_IRQ_SHIFT;
 
 	ring->init_hw = gen8_init_common_ring;
+	ring->get_seqno = gen8_get_seqno;
+	ring->set_seqno = gen8_set_seqno;
 	if (IS_BXT_REVID(dev, 0, BXT_REVID_A1)) {
-		ring->get_seqno = bxt_a_get_seqno;
+		ring->irq_seqno_barrier = bxt_seqno_barrier;
 		ring->set_seqno = bxt_a_set_seqno;
-	} else {
-		ring->get_seqno = gen8_get_seqno;
-		ring->set_seqno = gen8_set_seqno;
 	}
 	ring->emit_request = gen8_emit_request;
 	ring->emit_flush = gen8_emit_flush;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 60b0df2c5399..57ec21c5b1ab 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1485,8 +1485,8 @@ pc_render_add_request(struct drm_i915_gem_request *req)
 	return 0;
 }
 
-static u32
-gen6_ring_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency)
+static void
+gen6_seqno_barrier(struct intel_engine_cs *ring)
 {
 	/* Workaround to force correct ordering between irq and seqno writes on
 	 * ivb (and maybe also on snb) by reading from a CS register (like
@@ -1500,18 +1500,14 @@ gen6_ring_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency)
 	 * a delay after every batch i.e. much more frequent than a delay
 	 * when waiting for the interrupt (with the same net latency).
 	 */
-	if (!lazy_coherency) {
-		struct drm_i915_private *dev_priv = ring->dev->dev_private;
-		POSTING_READ_FW(RING_ACTHD(ring->mmio_base));
-
-		intel_flush_status_page(ring, I915_GEM_HWS_INDEX);
-	}
+	struct drm_i915_private *dev_priv = ring->i915;
+	POSTING_READ_FW(RING_ACTHD(ring->mmio_base));
 
-	return intel_read_status_page(ring, I915_GEM_HWS_INDEX);
+	intel_flush_status_page(ring, I915_GEM_HWS_INDEX);
 }
 
 static u32
-ring_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency)
+ring_get_seqno(struct intel_engine_cs *ring)
 {
 	return intel_read_status_page(ring, I915_GEM_HWS_INDEX);
 }
@@ -1523,7 +1519,7 @@ ring_set_seqno(struct intel_engine_cs *ring, u32 seqno)
 }
 
 static u32
-pc_render_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency)
+pc_render_get_seqno(struct intel_engine_cs *ring)
 {
 	return ring->scratch.cpu_page[0];
 }
@@ -2698,7 +2694,8 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 		ring->irq_get = gen8_ring_get_irq;
 		ring->irq_put = gen8_ring_put_irq;
 		ring->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
-		ring->get_seqno = gen6_ring_get_seqno;
+		ring->irq_seqno_barrier = gen6_seqno_barrier;
+		ring->get_seqno = ring_get_seqno;
 		ring->set_seqno = ring_set_seqno;
 		if (i915_semaphore_is_enabled(dev)) {
 			WARN_ON(!dev_priv->semaphore_obj);
@@ -2715,7 +2712,8 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 		ring->irq_get = gen6_ring_get_irq;
 		ring->irq_put = gen6_ring_put_irq;
 		ring->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
-		ring->get_seqno = gen6_ring_get_seqno;
+		ring->irq_seqno_barrier = gen6_seqno_barrier;
+		ring->get_seqno = ring_get_seqno;
 		ring->set_seqno = ring_set_seqno;
 		if (i915_semaphore_is_enabled(dev)) {
 			ring->semaphore.sync_to = gen6_ring_sync;
@@ -2829,7 +2827,8 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev)
 			ring->write_tail = gen6_bsd_ring_write_tail;
 		ring->flush = gen6_bsd_ring_flush;
 		ring->add_request = gen6_add_request;
-		ring->get_seqno = gen6_ring_get_seqno;
+		ring->irq_seqno_barrier = gen6_seqno_barrier;
+		ring->get_seqno = ring_get_seqno;
 		ring->set_seqno = ring_set_seqno;
 		if (INTEL_INFO(dev)->gen >= 8) {
 			ring->irq_enable_mask =
@@ -2901,7 +2900,8 @@ int intel_init_bsd2_ring_buffer(struct drm_device *dev)
 	ring->mmio_base = GEN8_BSD2_RING_BASE;
 	ring->flush = gen6_bsd_ring_flush;
 	ring->add_request = gen6_add_request;
-	ring->get_seqno = gen6_ring_get_seqno;
+	ring->irq_seqno_barrier = gen6_seqno_barrier;
+	ring->get_seqno = ring_get_seqno;
 	ring->set_seqno = ring_set_seqno;
 	ring->irq_enable_mask =
 			GT_RENDER_USER_INTERRUPT << GEN8_VCS2_IRQ_SHIFT;
@@ -2931,7 +2931,8 @@ int intel_init_blt_ring_buffer(struct drm_device *dev)
 	ring->write_tail = ring_write_tail;
 	ring->flush = gen6_ring_flush;
 	ring->add_request = gen6_add_request;
-	ring->get_seqno = gen6_ring_get_seqno;
+	ring->irq_seqno_barrier = gen6_seqno_barrier;
+	ring->get_seqno = ring_get_seqno;
 	ring->set_seqno = ring_set_seqno;
 	if (INTEL_INFO(dev)->gen >= 8) {
 		ring->irq_enable_mask =
@@ -2988,7 +2989,8 @@ int intel_init_vebox_ring_buffer(struct drm_device *dev)
 	ring->write_tail = ring_write_tail;
 	ring->flush = gen6_ring_flush;
 	ring->add_request = gen6_add_request;
-	ring->get_seqno = gen6_ring_get_seqno;
+	ring->irq_seqno_barrier = gen6_seqno_barrier;
+	ring->get_seqno = ring_get_seqno;
 	ring->set_seqno = ring_set_seqno;
 
 	if (INTEL_INFO(dev)->gen >= 8) {
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 51fcb66bfc4a..3b49726b1732 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -219,8 +219,8 @@ struct  intel_engine_cs {
 	 * seen value is good enough. Note that the seqno will always be
 	 * monotonic, even if not coherent.
 	 */
-	u32		(*get_seqno)(struct intel_engine_cs *ring,
-				     bool lazy_coherency);
+	void		(*irq_seqno_barrier)(struct intel_engine_cs *ring);
+	u32		(*get_seqno)(struct intel_engine_cs *ring);
 	void		(*set_seqno)(struct intel_engine_cs *ring,
 				     u32 seqno);
 	int		(*dispatch_execbuffer)(struct drm_i915_gem_request *req,
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 020/190] drm/i915: Remove the lazy_coherency parameter from request-completed?
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (17 preceding siblings ...)
  2016-01-11  9:16 ` [PATCH 019/190] drm/i915: Separate out the seqno-barrier from engine->get_seqno Chris Wilson
@ 2016-01-11  9:16 ` Chris Wilson
  2016-01-11 15:45   ` Dave Gordon
  2016-01-12 10:27   ` Mika Kuoppala
  2016-01-11  9:16 ` [PATCH 021/190] drm/i915: Use HWS for seqno tracking everywhere Chris Wilson
                   ` (67 subsequent siblings)
  86 siblings, 2 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:16 UTC (permalink / raw)
  To: intel-gfx

Now that we have split out the seqno-barrier from the
engine->get_seqno() callback itself, we can move the users of the
seqno-barrier to the callsites that require it, simplifying the common
code and making the workaround handling much more explicit.
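
To sketch the resulting pattern (this mirrors the
__i915_request_irq_complete() hunk below and is not an extra change), a
callsite that does require coherency now reads:

	if (engine->irq_seqno_barrier)
		engine->irq_seqno_barrier(engine);

	if (i915_gem_request_completed(req))
		return true;

while hot paths such as busy-spinning and request retirement skip the
barrier entirely.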

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c  |  4 ++--
 drivers/gpu/drm/i915/i915_drv.h      | 17 ++++++++---------
 drivers/gpu/drm/i915/i915_gem.c      | 24 ++++++++++++++++--------
 drivers/gpu/drm/i915/intel_display.c |  2 +-
 drivers/gpu/drm/i915/intel_pm.c      |  4 ++--
 5 files changed, 29 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 1499e2337e5d..d09e48455dcb 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -601,7 +601,7 @@ static int i915_gem_pageflip_info(struct seq_file *m, void *data)
 					   i915_gem_request_get_seqno(work->flip_queued_req),
 					   dev_priv->next_seqno,
 					   ring->get_seqno(ring),
-					   i915_gem_request_completed(work->flip_queued_req, true));
+					   i915_gem_request_completed(work->flip_queued_req));
 			} else
 				seq_printf(m, "Flip not associated with any ring\n");
 			seq_printf(m, "Flip queued on frame %d, (was ready on frame %d), now %d\n",
@@ -1354,8 +1354,8 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
 	intel_runtime_pm_get(dev_priv);
 
 	for_each_ring(ring, dev_priv, i) {
-		seqno[i] = ring->get_seqno(ring);
 		acthd[i] = intel_ring_get_active_head(ring);
+		seqno[i] = ring->get_seqno(ring);
 	}
 
 	i915_get_extra_instdone(dev, instdone);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 9762aa76bb0a..44d46018ee13 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2969,20 +2969,14 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2)
 	return (int32_t)(seq1 - seq2) >= 0;
 }
 
-static inline bool i915_gem_request_started(struct drm_i915_gem_request *req,
-					   bool lazy_coherency)
+static inline bool i915_gem_request_started(struct drm_i915_gem_request *req)
 {
-	if (!lazy_coherency && req->ring->irq_seqno_barrier)
-		req->ring->irq_seqno_barrier(req->ring);
 	return i915_seqno_passed(req->ring->get_seqno(req->ring),
 				 req->previous_seqno);
 }
 
-static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
-					      bool lazy_coherency)
+static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
 {
-	if (!lazy_coherency && req->ring->irq_seqno_barrier)
-		req->ring->irq_seqno_barrier(req->ring);
 	return i915_seqno_passed(req->ring->get_seqno(req->ring),
 				 req->seqno);
 }
@@ -3636,6 +3630,8 @@ static inline void i915_trace_irq_get(struct intel_engine_cs *ring,
 
 static inline bool __i915_request_irq_complete(struct drm_i915_gem_request *req)
 {
+	struct intel_engine_cs *engine = req->ring;
+
 	/* Ensure our read of the seqno is coherent so that we
 	 * do not "miss an interrupt" (i.e. if this is the last
 	 * request and the seqno write from the GPU is not visible
@@ -3647,7 +3643,10 @@ static inline bool __i915_request_irq_complete(struct drm_i915_gem_request *req)
 	 * but it is easier and safer to do it every time the waiter
 	 * is woken.
 	 */
-	if (i915_gem_request_completed(req, false))
+	if (engine->irq_seqno_barrier)
+		engine->irq_seqno_barrier(engine);
+
+	if (i915_gem_request_completed(req))
 		return true;
 
 	/* We need to check whether any gpu reset happened in between
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 4b26529f1f44..d125820c6309 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1171,12 +1171,12 @@ static bool __i915_spin_request(struct drm_i915_gem_request *req,
 	 */
 
 	/* Only spin if we know the GPU is processing this request */
-	if (!i915_gem_request_started(req, true))
+	if (!i915_gem_request_started(req))
 		return false;
 
 	timeout = local_clock_us(&cpu) + 5;
 	do {
-		if (i915_gem_request_completed(req, true))
+		if (i915_gem_request_completed(req))
 			return true;
 
 		if (signal_pending_state(state, wait->task))
@@ -1228,7 +1228,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 	if (list_empty(&req->list))
 		return 0;
 
-	if (i915_gem_request_completed(req, true))
+	if (i915_gem_request_completed(req))
 		return 0;
 
 	timeout_remain = MAX_SCHEDULE_TIMEOUT;
@@ -2724,8 +2724,16 @@ i915_gem_find_active_request(struct intel_engine_cs *ring)
 {
 	struct drm_i915_gem_request *request;
 
+	/* We are called by the error capture and reset at a random
+	 * point in time. In particular, note that neither is crucially
+	 * ordered with an interrupt. After a hang, the GPU is dead and we
+	 * assume that no more writes can happen (we waited long enough for
+	 * all writes that were in flight to be flushed) - adding an
+	 * extra delay for a recent interrupt is pointless. Hence, we do
+	 * not need an engine->irq_seqno_barrier() before the seqno reads.
+	 */
 	list_for_each_entry(request, &ring->request_list, list) {
-		if (i915_gem_request_completed(request, false))
+		if (i915_gem_request_completed(request))
 			continue;
 
 		return request;
@@ -2859,7 +2867,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 					   struct drm_i915_gem_request,
 					   list);
 
-		if (!i915_gem_request_completed(request, true))
+		if (!i915_gem_request_completed(request))
 			break;
 
 		i915_gem_request_retire(request);
@@ -2883,7 +2891,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 	}
 
 	if (unlikely(ring->trace_irq_req &&
-		     i915_gem_request_completed(ring->trace_irq_req, true))) {
+		     i915_gem_request_completed(ring->trace_irq_req))) {
 		ring->irq_put(ring);
 		i915_gem_request_assign(&ring->trace_irq_req, NULL);
 	}
@@ -2995,7 +3003,7 @@ i915_gem_object_flush_active(struct drm_i915_gem_object *obj)
 		if (list_empty(&req->list))
 			goto retire;
 
-		if (i915_gem_request_completed(req, true)) {
+		if (i915_gem_request_completed(req)) {
 			__i915_gem_request_retire__upto(req);
 retire:
 			i915_gem_object_retire__read(obj, i);
@@ -3104,7 +3112,7 @@ __i915_gem_object_sync(struct drm_i915_gem_object *obj,
 	if (to == from)
 		return 0;
 
-	if (i915_gem_request_completed(from_req, true))
+	if (i915_gem_request_completed(from_req))
 		return 0;
 
 	if (!i915_semaphore_is_enabled(obj->base.dev)) {
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 7e36f85d3109..de4d4a0d923a 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -11523,7 +11523,7 @@ static bool __intel_pageflip_stall_check(struct drm_device *dev,
 
 	if (work->flip_ready_vblank == 0) {
 		if (work->flip_queued_req &&
-		    !i915_gem_request_completed(work->flip_queued_req, true))
+		    !i915_gem_request_completed(work->flip_queued_req))
 			return false;
 
 		work->flip_ready_vblank = drm_crtc_vblank_count(crtc);
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 9df9e9a22f3c..401c3770057d 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -7286,7 +7286,7 @@ static void __intel_rps_boost_work(struct work_struct *work)
 	struct request_boost *boost = container_of(work, struct request_boost, work);
 	struct drm_i915_gem_request *req = boost->req;
 
-	if (!i915_gem_request_completed(req, true))
+	if (!i915_gem_request_completed(req))
 		gen6_rps_boost(to_i915(req->ring->dev), NULL,
 			       req->emitted_jiffies);
 
@@ -7302,7 +7302,7 @@ void intel_queue_rps_boost_for_request(struct drm_device *dev,
 	if (req == NULL || INTEL_INFO(dev)->gen < 6)
 		return;
 
-	if (i915_gem_request_completed(req, true))
+	if (i915_gem_request_completed(req))
 		return;
 
 	boost = kmalloc(sizeof(*boost), GFP_ATOMIC);
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 021/190] drm/i915: Use HWS for seqno tracking everywhere
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (18 preceding siblings ...)
  2016-01-11  9:16 ` [PATCH 020/190] drm/i915: Remove the lazy_coherency parameter from request-completed? Chris Wilson
@ 2016-01-11  9:16 ` Chris Wilson
  2016-01-11 20:03   ` Dave Gordon
  2016-01-12 10:05   ` Mika Kuoppala
  2016-01-11  9:16 ` [PATCH 022/190] drm/i915: Check the CPU cached value of seqno after waking the waiter Chris Wilson
                   ` (66 subsequent siblings)
  86 siblings, 2 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:16 UTC (permalink / raw)
  To: intel-gfx

By using the same address for storing the HWS on every platform, we can
remove the platform-specific vfuncs and reduce the get-seqno routine to
a single read of a cached memory location.
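
For reference, the resulting get-seqno routine (as added to
intel_ringbuffer.h by this patch) is just a read of the cached status
page:

	static inline u32 intel_ring_get_seqno(struct intel_engine_cs *ring)
	{
		return intel_read_status_page(ring, I915_GEM_HWS_INDEX);
	}

Any required coherency is handled separately by the optional
engine->irq_seqno_barrier() callback.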

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c      | 10 ++--
 drivers/gpu/drm/i915/i915_drv.h          |  4 +-
 drivers/gpu/drm/i915/i915_gpu_error.c    |  2 +-
 drivers/gpu/drm/i915/i915_irq.c          |  4 +-
 drivers/gpu/drm/i915/i915_trace.h        |  2 +-
 drivers/gpu/drm/i915/intel_breadcrumbs.c |  4 +-
 drivers/gpu/drm/i915/intel_lrc.c         | 46 ++---------------
 drivers/gpu/drm/i915/intel_ringbuffer.c  | 86 ++++++++------------------------
 drivers/gpu/drm/i915/intel_ringbuffer.h  |  7 +--
 9 files changed, 43 insertions(+), 122 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index d09e48455dcb..5a706c700684 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -600,7 +600,7 @@ static int i915_gem_pageflip_info(struct seq_file *m, void *data)
 					   ring->name,
 					   i915_gem_request_get_seqno(work->flip_queued_req),
 					   dev_priv->next_seqno,
-					   ring->get_seqno(ring),
+					   intel_ring_get_seqno(ring),
 					   i915_gem_request_completed(work->flip_queued_req));
 			} else
 				seq_printf(m, "Flip not associated with any ring\n");
@@ -732,10 +732,8 @@ static void i915_ring_seqno_info(struct seq_file *m,
 {
 	struct rb_node *rb;
 
-	if (ring->get_seqno) {
-		seq_printf(m, "Current sequence (%s): %x\n",
-			   ring->name, ring->get_seqno(ring));
-	}
+	seq_printf(m, "Current sequence (%s): %x\n",
+		   ring->name, intel_ring_get_seqno(ring));
 
 	spin_lock(&ring->breadcrumbs.lock);
 	for (rb = rb_first(&ring->breadcrumbs.waiters);
@@ -1355,7 +1353,7 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
 
 	for_each_ring(ring, dev_priv, i) {
 		acthd[i] = intel_ring_get_active_head(ring);
-		seqno[i] = ring->get_seqno(ring);
+		seqno[i] = intel_ring_get_seqno(ring);
 	}
 
 	i915_get_extra_instdone(dev, instdone);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 44d46018ee13..fcedcbc50834 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2971,13 +2971,13 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2)
 
 static inline bool i915_gem_request_started(struct drm_i915_gem_request *req)
 {
-	return i915_seqno_passed(req->ring->get_seqno(req->ring),
+	return i915_seqno_passed(intel_ring_get_seqno(req->ring),
 				 req->previous_seqno);
 }
 
 static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
 {
-	return i915_seqno_passed(req->ring->get_seqno(req->ring),
+	return i915_seqno_passed(intel_ring_get_seqno(req->ring),
 				 req->seqno);
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 01d0206ca4dd..3e137fc701cf 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -903,7 +903,7 @@ static void i915_record_ring_state(struct drm_device *dev,
 	ering->waiting = intel_engine_has_waiter(ring);
 	ering->instpm = I915_READ(RING_INSTPM(ring->mmio_base));
 	ering->acthd = intel_ring_get_active_head(ring);
-	ering->seqno = ring->get_seqno(ring);
+	ering->seqno = intel_ring_get_seqno(ring);
 	ering->start = I915_READ_START(ring);
 	ering->head = I915_READ_HEAD(ring);
 	ering->tail = I915_READ_TAIL(ring);
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index d73669783045..627c7fb6aa9b 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2903,7 +2903,7 @@ static int semaphore_passed(struct intel_engine_cs *ring)
 	if (signaller->hangcheck.deadlock >= I915_NUM_RINGS)
 		return -1;
 
-	if (i915_seqno_passed(signaller->get_seqno(signaller), seqno))
+	if (i915_seqno_passed(intel_ring_get_seqno(signaller), seqno))
 		return 1;
 
 	/* cursory check for an unkickable deadlock */
@@ -3068,7 +3068,7 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 		semaphore_clear_deadlocks(dev_priv);
 
 		acthd = intel_ring_get_active_head(ring);
-		seqno = ring->get_seqno(ring);
+		seqno = intel_ring_get_seqno(ring);
 
 		if (ring->hangcheck.seqno == seqno) {
 			if (ring_idle(ring, seqno)) {
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index cfb5f78a6e84..efca75bcace3 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -573,7 +573,7 @@ TRACE_EVENT(i915_gem_request_notify,
 	    TP_fast_assign(
 			   __entry->dev = ring->dev->primary->index;
 			   __entry->ring = ring->id;
-			   __entry->seqno = ring->get_seqno(ring);
+			   __entry->seqno = intel_ring_get_seqno(ring);
 			   ),
 
 	    TP_printk("dev=%u, ring=%u, seqno=%u",
diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
index 10b0add54acf..f66acf820c40 100644
--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -127,7 +127,7 @@ bool intel_engine_add_wait(struct intel_engine_cs *engine,
 			   struct intel_wait *wait)
 {
 	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-	u32 seqno = engine->get_seqno(engine);
+	u32 seqno = intel_ring_get_seqno(engine);
 	struct rb_node **p, *parent, *completed;
 	bool first;
 
@@ -269,7 +269,7 @@ void intel_engine_remove_wait(struct intel_engine_cs *engine,
 			 * the first_waiter. This is undesirable if that
 			 * waiter is a high priority task.
 			 */
-			u32 seqno = engine->get_seqno(engine);
+			u32 seqno = intel_ring_get_seqno(engine);
 			while (i915_seqno_passed(seqno,
 						 to_wait(next)->seqno)) {
 				struct rb_node *n = rb_next(next);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 333e95bda78a..ad51b1fc37cd 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1775,16 +1775,6 @@ static int gen8_emit_flush_render(struct drm_i915_gem_request *request,
 	return 0;
 }
 
-static u32 gen8_get_seqno(struct intel_engine_cs *ring)
-{
-	return intel_read_status_page(ring, I915_GEM_HWS_INDEX);
-}
-
-static void gen8_set_seqno(struct intel_engine_cs *ring, u32 seqno)
-{
-	intel_write_status_page(ring, I915_GEM_HWS_INDEX, seqno);
-}
-
 static void bxt_seqno_barrier(struct intel_engine_cs *ring)
 {
 	/*
@@ -1800,14 +1790,6 @@ static void bxt_seqno_barrier(struct intel_engine_cs *ring)
 	intel_flush_status_page(ring, I915_GEM_HWS_INDEX);
 }
 
-static void bxt_a_set_seqno(struct intel_engine_cs *ring, u32 seqno)
-{
-	intel_write_status_page(ring, I915_GEM_HWS_INDEX, seqno);
-
-	/* See bxt_a_get_seqno() explaining the reason for the clflush. */
-	intel_flush_status_page(ring, I915_GEM_HWS_INDEX);
-}
-
 static int gen8_emit_request(struct drm_i915_gem_request *request)
 {
 	struct intel_ringbuffer *ringbuf = request->ringbuf;
@@ -1832,7 +1814,7 @@ static int gen8_emit_request(struct drm_i915_gem_request *request)
 				(ring->status_page.gfx_addr +
 				(I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT)));
 	intel_logical_ring_emit(ringbuf, 0);
-	intel_logical_ring_emit(ringbuf, i915_gem_request_get_seqno(request));
+	intel_logical_ring_emit(ringbuf, request->seqno);
 	intel_logical_ring_emit(ringbuf, MI_USER_INTERRUPT);
 	intel_logical_ring_emit(ringbuf, MI_NOOP);
 	intel_logical_ring_advance_and_submit(request);
@@ -2002,12 +1984,8 @@ static int logical_render_ring_init(struct drm_device *dev)
 		ring->init_hw = gen8_init_render_ring;
 	ring->init_context = gen8_init_rcs_context;
 	ring->cleanup = intel_fini_pipe_control;
-	ring->get_seqno = gen8_get_seqno;
-	ring->set_seqno = gen8_set_seqno;
-	if (IS_BXT_REVID(dev, 0, BXT_REVID_A1)) {
+	if (IS_BXT_REVID(dev, 0, BXT_REVID_A1))
 		ring->irq_seqno_barrier = bxt_seqno_barrier;
-		ring->set_seqno = bxt_a_set_seqno;
-	}
 	ring->emit_request = gen8_emit_request;
 	ring->emit_flush = gen8_emit_flush_render;
 	ring->irq_get = gen8_logical_ring_get_irq;
@@ -2053,12 +2031,8 @@ static int logical_bsd_ring_init(struct drm_device *dev)
 		GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS1_IRQ_SHIFT;
 
 	ring->init_hw = gen8_init_common_ring;
-	ring->get_seqno = gen8_get_seqno;
-	ring->set_seqno = gen8_set_seqno;
-	if (IS_BXT_REVID(dev, 0, BXT_REVID_A1)) {
+	if (IS_BXT_REVID(dev, 0, BXT_REVID_A1))
 		ring->irq_seqno_barrier = bxt_seqno_barrier;
-		ring->set_seqno = bxt_a_set_seqno;
-	}
 	ring->emit_request = gen8_emit_request;
 	ring->emit_flush = gen8_emit_flush;
 	ring->irq_get = gen8_logical_ring_get_irq;
@@ -2082,8 +2056,6 @@ static int logical_bsd2_ring_init(struct drm_device *dev)
 		GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS2_IRQ_SHIFT;
 
 	ring->init_hw = gen8_init_common_ring;
-	ring->get_seqno = gen8_get_seqno;
-	ring->set_seqno = gen8_set_seqno;
 	ring->emit_request = gen8_emit_request;
 	ring->emit_flush = gen8_emit_flush;
 	ring->irq_get = gen8_logical_ring_get_irq;
@@ -2107,12 +2079,8 @@ static int logical_blt_ring_init(struct drm_device *dev)
 		GT_CONTEXT_SWITCH_INTERRUPT << GEN8_BCS_IRQ_SHIFT;
 
 	ring->init_hw = gen8_init_common_ring;
-	ring->get_seqno = gen8_get_seqno;
-	ring->set_seqno = gen8_set_seqno;
-	if (IS_BXT_REVID(dev, 0, BXT_REVID_A1)) {
+	if (IS_BXT_REVID(dev, 0, BXT_REVID_A1))
 		ring->irq_seqno_barrier = bxt_seqno_barrier;
-		ring->set_seqno = bxt_a_set_seqno;
-	}
 	ring->emit_request = gen8_emit_request;
 	ring->emit_flush = gen8_emit_flush;
 	ring->irq_get = gen8_logical_ring_get_irq;
@@ -2136,12 +2104,8 @@ static int logical_vebox_ring_init(struct drm_device *dev)
 		GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VECS_IRQ_SHIFT;
 
 	ring->init_hw = gen8_init_common_ring;
-	ring->get_seqno = gen8_get_seqno;
-	ring->set_seqno = gen8_set_seqno;
-	if (IS_BXT_REVID(dev, 0, BXT_REVID_A1)) {
+	if (IS_BXT_REVID(dev, 0, BXT_REVID_A1))
 		ring->irq_seqno_barrier = bxt_seqno_barrier;
-		ring->set_seqno = bxt_a_set_seqno;
-	}
 	ring->emit_request = gen8_emit_request;
 	ring->emit_flush = gen8_emit_flush;
 	ring->irq_get = gen8_logical_ring_get_irq;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 57ec21c5b1ab..c86d0e17d785 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1216,19 +1216,17 @@ static int gen8_rcs_signal(struct drm_i915_gem_request *signaller_req,
 		return ret;
 
 	for_each_ring(waiter, dev_priv, i) {
-		u32 seqno;
 		u64 gtt_offset = signaller->semaphore.signal_ggtt[i];
 		if (gtt_offset == MI_SEMAPHORE_SYNC_INVALID)
 			continue;
 
-		seqno = i915_gem_request_get_seqno(signaller_req);
 		intel_ring_emit(signaller, GFX_OP_PIPE_CONTROL(6));
 		intel_ring_emit(signaller, PIPE_CONTROL_GLOBAL_GTT_IVB |
 					   PIPE_CONTROL_QW_WRITE |
 					   PIPE_CONTROL_FLUSH_ENABLE);
 		intel_ring_emit(signaller, lower_32_bits(gtt_offset));
 		intel_ring_emit(signaller, upper_32_bits(gtt_offset));
-		intel_ring_emit(signaller, seqno);
+		intel_ring_emit(signaller, signaller_req->seqno);
 		intel_ring_emit(signaller, 0);
 		intel_ring_emit(signaller, MI_SEMAPHORE_SIGNAL |
 					   MI_SEMAPHORE_TARGET(waiter->id));
@@ -1257,18 +1255,16 @@ static int gen8_xcs_signal(struct drm_i915_gem_request *signaller_req,
 		return ret;
 
 	for_each_ring(waiter, dev_priv, i) {
-		u32 seqno;
 		u64 gtt_offset = signaller->semaphore.signal_ggtt[i];
 		if (gtt_offset == MI_SEMAPHORE_SYNC_INVALID)
 			continue;
 
-		seqno = i915_gem_request_get_seqno(signaller_req);
 		intel_ring_emit(signaller, (MI_FLUSH_DW + 1) |
 					   MI_FLUSH_DW_OP_STOREDW);
 		intel_ring_emit(signaller, lower_32_bits(gtt_offset) |
 					   MI_FLUSH_DW_USE_GTT);
 		intel_ring_emit(signaller, upper_32_bits(gtt_offset));
-		intel_ring_emit(signaller, seqno);
+		intel_ring_emit(signaller, signaller_req->seqno);
 		intel_ring_emit(signaller, MI_SEMAPHORE_SIGNAL |
 					   MI_SEMAPHORE_TARGET(waiter->id));
 		intel_ring_emit(signaller, 0);
@@ -1299,11 +1295,9 @@ static int gen6_signal(struct drm_i915_gem_request *signaller_req,
 		i915_reg_t mbox_reg = signaller->semaphore.mbox.signal[i];
 
 		if (i915_mmio_reg_valid(mbox_reg)) {
-			u32 seqno = i915_gem_request_get_seqno(signaller_req);
-
 			intel_ring_emit(signaller, MI_LOAD_REGISTER_IMM(1));
 			intel_ring_emit_reg(signaller, mbox_reg);
-			intel_ring_emit(signaller, seqno);
+			intel_ring_emit(signaller, signaller_req->seqno);
 		}
 	}
 
@@ -1338,7 +1332,7 @@ gen6_add_request(struct drm_i915_gem_request *req)
 
 	intel_ring_emit(ring, MI_STORE_DWORD_INDEX);
 	intel_ring_emit(ring, I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT);
-	intel_ring_emit(ring, i915_gem_request_get_seqno(req));
+	intel_ring_emit(ring, req->seqno);
 	intel_ring_emit(ring, MI_USER_INTERRUPT);
 	__intel_ring_advance(ring);
 
@@ -1440,7 +1434,9 @@ static int
 pc_render_add_request(struct drm_i915_gem_request *req)
 {
 	struct intel_engine_cs *ring = req->ring;
-	u32 scratch_addr = ring->scratch.gtt_offset + 2 * CACHELINE_BYTES;
+	u32 addr = req->ring->status_page.gfx_addr +
+		(I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT);
+	u32 scratch_addr = addr;
 	int ret;
 
 	/* For Ironlake, MI_USER_INTERRUPT was deprecated and apparently
@@ -1455,11 +1451,12 @@ pc_render_add_request(struct drm_i915_gem_request *req)
 	if (ret)
 		return ret;
 
-	intel_ring_emit(ring, GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE |
-			PIPE_CONTROL_WRITE_FLUSH |
-			PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE);
-	intel_ring_emit(ring, ring->scratch.gtt_offset | PIPE_CONTROL_GLOBAL_GTT);
-	intel_ring_emit(ring, i915_gem_request_get_seqno(req));
+	intel_ring_emit(ring,
+			GFX_OP_PIPE_CONTROL(4) |
+			PIPE_CONTROL_QW_WRITE |
+			PIPE_CONTROL_WRITE_FLUSH);
+	intel_ring_emit(ring, addr | PIPE_CONTROL_GLOBAL_GTT);
+	intel_ring_emit(ring, req->seqno);
 	intel_ring_emit(ring, 0);
 	PIPE_CONTROL_FLUSH(ring, scratch_addr);
 	scratch_addr += 2 * CACHELINE_BYTES; /* write to separate cachelines */
@@ -1473,12 +1470,12 @@ pc_render_add_request(struct drm_i915_gem_request *req)
 	scratch_addr += 2 * CACHELINE_BYTES;
 	PIPE_CONTROL_FLUSH(ring, scratch_addr);
 
-	intel_ring_emit(ring, GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE |
+	intel_ring_emit(ring, GFX_OP_PIPE_CONTROL(4) |
+			PIPE_CONTROL_QW_WRITE |
 			PIPE_CONTROL_WRITE_FLUSH |
-			PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE |
 			PIPE_CONTROL_NOTIFY);
-	intel_ring_emit(ring, ring->scratch.gtt_offset | PIPE_CONTROL_GLOBAL_GTT);
-	intel_ring_emit(ring, i915_gem_request_get_seqno(req));
+	intel_ring_emit(ring, addr | PIPE_CONTROL_GLOBAL_GTT);
+	intel_ring_emit(ring, req->seqno);
 	intel_ring_emit(ring, 0);
 	__intel_ring_advance(ring);
 
@@ -1506,30 +1503,6 @@ gen6_seqno_barrier(struct intel_engine_cs *ring)
 	intel_flush_status_page(ring, I915_GEM_HWS_INDEX);
 }
 
-static u32
-ring_get_seqno(struct intel_engine_cs *ring)
-{
-	return intel_read_status_page(ring, I915_GEM_HWS_INDEX);
-}
-
-static void
-ring_set_seqno(struct intel_engine_cs *ring, u32 seqno)
-{
-	intel_write_status_page(ring, I915_GEM_HWS_INDEX, seqno);
-}
-
-static u32
-pc_render_get_seqno(struct intel_engine_cs *ring)
-{
-	return ring->scratch.cpu_page[0];
-}
-
-static void
-pc_render_set_seqno(struct intel_engine_cs *ring, u32 seqno)
-{
-	ring->scratch.cpu_page[0] = seqno;
-}
-
 static bool
 gen5_ring_get_irq(struct intel_engine_cs *ring)
 {
@@ -1665,7 +1638,7 @@ i9xx_add_request(struct drm_i915_gem_request *req)
 
 	intel_ring_emit(ring, MI_STORE_DWORD_INDEX);
 	intel_ring_emit(ring, I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT);
-	intel_ring_emit(ring, i915_gem_request_get_seqno(req));
+	intel_ring_emit(ring, req->seqno);
 	intel_ring_emit(ring, MI_USER_INTERRUPT);
 	__intel_ring_advance(ring);
 
@@ -2457,7 +2430,10 @@ void intel_ring_init_seqno(struct intel_engine_cs *ring, u32 seqno)
 			I915_WRITE(RING_SYNC_2(ring->mmio_base), 0);
 	}
 
-	ring->set_seqno(ring, seqno);
+	intel_write_status_page(ring, I915_GEM_HWS_INDEX, seqno);
+	if (ring->irq_seqno_barrier)
+		ring->irq_seqno_barrier(ring);
+
 	ring->hangcheck.seqno = seqno;
 }
 
@@ -2695,8 +2671,6 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 		ring->irq_put = gen8_ring_put_irq;
 		ring->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
 		ring->irq_seqno_barrier = gen6_seqno_barrier;
-		ring->get_seqno = ring_get_seqno;
-		ring->set_seqno = ring_set_seqno;
 		if (i915_semaphore_is_enabled(dev)) {
 			WARN_ON(!dev_priv->semaphore_obj);
 			ring->semaphore.sync_to = gen8_ring_sync;
@@ -2713,8 +2687,6 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 		ring->irq_put = gen6_ring_put_irq;
 		ring->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
 		ring->irq_seqno_barrier = gen6_seqno_barrier;
-		ring->get_seqno = ring_get_seqno;
-		ring->set_seqno = ring_set_seqno;
 		if (i915_semaphore_is_enabled(dev)) {
 			ring->semaphore.sync_to = gen6_ring_sync;
 			ring->semaphore.signal = gen6_signal;
@@ -2739,8 +2711,6 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 	} else if (IS_GEN5(dev)) {
 		ring->add_request = pc_render_add_request;
 		ring->flush = gen4_render_ring_flush;
-		ring->get_seqno = pc_render_get_seqno;
-		ring->set_seqno = pc_render_set_seqno;
 		ring->irq_get = gen5_ring_get_irq;
 		ring->irq_put = gen5_ring_put_irq;
 		ring->irq_enable_mask = GT_RENDER_USER_INTERRUPT |
@@ -2751,8 +2721,6 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 			ring->flush = gen2_render_ring_flush;
 		else
 			ring->flush = gen4_render_ring_flush;
-		ring->get_seqno = ring_get_seqno;
-		ring->set_seqno = ring_set_seqno;
 		if (IS_GEN2(dev)) {
 			ring->irq_get = i8xx_ring_get_irq;
 			ring->irq_put = i8xx_ring_put_irq;
@@ -2828,8 +2796,6 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev)
 		ring->flush = gen6_bsd_ring_flush;
 		ring->add_request = gen6_add_request;
 		ring->irq_seqno_barrier = gen6_seqno_barrier;
-		ring->get_seqno = ring_get_seqno;
-		ring->set_seqno = ring_set_seqno;
 		if (INTEL_INFO(dev)->gen >= 8) {
 			ring->irq_enable_mask =
 				GT_RENDER_USER_INTERRUPT << GEN8_VCS1_IRQ_SHIFT;
@@ -2867,8 +2833,6 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev)
 		ring->mmio_base = BSD_RING_BASE;
 		ring->flush = bsd_ring_flush;
 		ring->add_request = i9xx_add_request;
-		ring->get_seqno = ring_get_seqno;
-		ring->set_seqno = ring_set_seqno;
 		if (IS_GEN5(dev)) {
 			ring->irq_enable_mask = ILK_BSD_USER_INTERRUPT;
 			ring->irq_get = gen5_ring_get_irq;
@@ -2901,8 +2865,6 @@ int intel_init_bsd2_ring_buffer(struct drm_device *dev)
 	ring->flush = gen6_bsd_ring_flush;
 	ring->add_request = gen6_add_request;
 	ring->irq_seqno_barrier = gen6_seqno_barrier;
-	ring->get_seqno = ring_get_seqno;
-	ring->set_seqno = ring_set_seqno;
 	ring->irq_enable_mask =
 			GT_RENDER_USER_INTERRUPT << GEN8_VCS2_IRQ_SHIFT;
 	ring->irq_get = gen8_ring_get_irq;
@@ -2932,8 +2894,6 @@ int intel_init_blt_ring_buffer(struct drm_device *dev)
 	ring->flush = gen6_ring_flush;
 	ring->add_request = gen6_add_request;
 	ring->irq_seqno_barrier = gen6_seqno_barrier;
-	ring->get_seqno = ring_get_seqno;
-	ring->set_seqno = ring_set_seqno;
 	if (INTEL_INFO(dev)->gen >= 8) {
 		ring->irq_enable_mask =
 			GT_RENDER_USER_INTERRUPT << GEN8_BCS_IRQ_SHIFT;
@@ -2990,8 +2950,6 @@ int intel_init_vebox_ring_buffer(struct drm_device *dev)
 	ring->flush = gen6_ring_flush;
 	ring->add_request = gen6_add_request;
 	ring->irq_seqno_barrier = gen6_seqno_barrier;
-	ring->get_seqno = ring_get_seqno;
-	ring->set_seqno = ring_set_seqno;
 
 	if (INTEL_INFO(dev)->gen >= 8) {
 		ring->irq_enable_mask =
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 3b49726b1732..28ab07b38c05 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -220,9 +220,6 @@ struct  intel_engine_cs {
 	 * monotonic, even if not coherent.
 	 */
 	void		(*irq_seqno_barrier)(struct intel_engine_cs *ring);
-	u32		(*get_seqno)(struct intel_engine_cs *ring);
-	void		(*set_seqno)(struct intel_engine_cs *ring,
-				     u32 seqno);
 	int		(*dispatch_execbuffer)(struct drm_i915_gem_request *req,
 					       u64 offset, u32 length,
 					       unsigned dispatch_flags);
@@ -502,6 +499,10 @@ int intel_init_blt_ring_buffer(struct drm_device *dev);
 int intel_init_vebox_ring_buffer(struct drm_device *dev);
 
 u64 intel_ring_get_active_head(struct intel_engine_cs *ring);
+static inline u32 intel_ring_get_seqno(struct intel_engine_cs *ring)
+{
+	return intel_read_status_page(ring, I915_GEM_HWS_INDEX);
+}
 
 int init_workarounds_ring(struct intel_engine_cs *ring);
 
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 022/190] drm/i915: Check the CPU cached value of seqno after waking the waiter
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (19 preceding siblings ...)
  2016-01-11  9:16 ` [PATCH 021/190] drm/i915: Use HWS for seqno tracking everywhere Chris Wilson
@ 2016-01-11  9:16 ` Chris Wilson
  2016-01-11  9:16 ` [PATCH 023/190] drm/i915: Only apply one barrier after a breadcrumb interrupt is posted Chris Wilson
                   ` (65 subsequent siblings)
  86 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:16 UTC (permalink / raw)
  To: intel-gfx

If we have multiple waiters, we may find that many of them complete on
the same wake-up. If we first inspect the seqno from the CPU cache, we
can reduce the number of heavyweight coherent seqno reads we require.
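
As an illustration, here is a hedged sketch of the resulting fast path
(helper names match the diff below; the function name is hypothetical
and the GPU-reset check at the tail of the real function is omitted):

    static inline bool
    irq_complete_sketch(struct drm_i915_gem_request *req)
    {
            struct intel_engine_cs *engine = req->ring;

            /* Cheap check first: the seqno may already be visible in
             * the CPU cache, e.g. because another waiter has already
             * performed the coherent read on our behalf.
             */
            if (i915_gem_request_completed(req))
                    return true;

            /* Only then pay for the heavyweight coherent read. */
            if (engine->irq_seqno_barrier) {
                    engine->irq_seqno_barrier(engine);
                    if (i915_gem_request_completed(req))
                            return true;
            }

            return false;
    }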

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index fcedcbc50834..c2ee8efdd928 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3632,6 +3632,12 @@ static inline bool __i915_request_irq_complete(struct drm_i915_gem_request *req)
 {
 	struct intel_engine_cs *engine = req->ring;
 
+	/* Before we do the heavier coherent read of the seqno,
+	 * check the value (hopefully) in the CPU cacheline.
+	 */
+	if (i915_gem_request_completed(req))
+		return true;
+
 	/* Ensure our read of the seqno is coherent so that we
 	 * do not "miss an interrupt" (i.e. if this is the last
 	 * request and the seqno write from the GPU is not visible
@@ -3643,11 +3649,11 @@ static inline bool __i915_request_irq_complete(struct drm_i915_gem_request *req)
 	 * but it is easier and safer to do it every time the waiter
 	 * is woken.
 	 */
-	if (engine->irq_seqno_barrier)
+	if (engine->irq_seqno_barrier) {
 		engine->irq_seqno_barrier(engine);
-
-	if (i915_gem_request_completed(req))
-		return true;
+		if (i915_gem_request_completed(req))
+			return true;
+	}
 
 	/* We need to check whether any gpu reset happened in between
 	 * the request being submitted and now. If a reset has occurred,
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 023/190] drm/i915: Only apply one barrier after a breadcrumb interrupt is posted
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (20 preceding siblings ...)
  2016-01-11  9:16 ` [PATCH 022/190] drm/i915: Check the CPU cached value of seqno after waking the waiter Chris Wilson
@ 2016-01-11  9:16 ` Chris Wilson
  2016-01-11  9:16 ` [PATCH 024/190] drm/i915: Replace manual barrier() with READ_ONCE() in HWS accessor Chris Wilson
                   ` (64 subsequent siblings)
  86 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:16 UTC (permalink / raw)
  To: intel-gfx

If we flag the seqno as potentially stale upon receiving an interrupt,
we can use that information to reduce the frequency with which we apply
the heavyweight coherent seqno read (i.e. when we wake up a chain of
waiters).
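
The ordering between clearing the flag and applying the barrier is the
crux; condensed into a sketch (irq_posted and the engine callbacks are
as introduced by the diff below):

    /* interrupt handler */
    engine->irq_posted = true; /* paired with mb() in wake_up_process() */
    wake_up_process(waiter);

    /* woken waiter */
    if (engine->irq_seqno_barrier && READ_ONCE(engine->irq_posted)) {
            WRITE_ONCE(engine->irq_posted, false); /* clear first... */
            engine->irq_seqno_barrier(engine);     /* ...then barrier */
    }

If the clear happened after the barrier instead, an interrupt arriving
in between would be absorbed without triggering a barrier on the next
pass, and the read might miss the seqno update.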

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h          | 15 ++++++++++++++-
 drivers/gpu/drm/i915/i915_irq.c          |  1 +
 drivers/gpu/drm/i915/intel_breadcrumbs.c |  8 ++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.h  |  1 +
 4 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index c2ee8efdd928..8940b8d3fa59 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3649,7 +3649,20 @@ static inline bool __i915_request_irq_complete(struct drm_i915_gem_request *req)
 	 * but it is easier and safer to do it every time the waiter
 	 * is woken.
 	 */
-	if (engine->irq_seqno_barrier) {
+	if (engine->irq_seqno_barrier && READ_ONCE(engine->irq_posted)) {
+		/* The ordering of irq_posted versus applying the barrier
+		 * is crucial. The clearing of the current irq_posted must
+		 * be visible before we perform the barrier operation,
+		 * such that if a subsequent interrupt arrives, irq_posted
+		 * is reasserted and our task rewoken (which causes us to
+		 * do another __i915_request_irq_complete() immediately
+		 * and reapply the barrier). Conversely, if the clear
+		 * occurs after the barrier, then an interrupt that arrived
+		 * whilst we waited on the barrier would not trigger a
+		 * barrier on the next pass, and the read may not see the
+		 * seqno update.
+		 */
+		WRITE_ONCE(engine->irq_posted, false);
 		engine->irq_seqno_barrier(engine);
 		if (i915_gem_request_completed(req))
 			return true;
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 627c7fb6aa9b..738edd7fbf8d 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1000,6 +1000,7 @@ static void notify_ring(struct intel_engine_cs *ring)
 		return;
 
 	trace_i915_gem_request_notify(ring);
+	ring->irq_posted = true; /* paired with mb() in wake_up_process() */
 	intel_engine_wakeup(ring);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
index f66acf820c40..d689bd61534e 100644
--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -43,12 +43,20 @@ static void intel_breadcrumbs_fake_irq(unsigned long data)
 
 static void irq_enable(struct intel_engine_cs *engine)
 {
+	/* Enabling the IRQ may miss the generation of the interrupt, but
+	 * we still need to force the barrier before reading the seqno,
+	 * just in case.
+	 */
+	engine->irq_posted = true;
+
 	WARN_ON(!engine->irq_get(engine));
 }
 
 static void irq_disable(struct intel_engine_cs *engine)
 {
 	engine->irq_put(engine);
+
+	engine->irq_posted = false;
 }
 
 static bool __intel_breadcrumbs_enable_irq(struct intel_breadcrumbs *b)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 28ab07b38c05..6cc8e9c5f8d6 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -198,6 +198,7 @@ struct  intel_engine_cs {
 	struct i915_ctx_workarounds wa_ctx;
 
 	unsigned irq_refcount; /* protected by dev_priv->irq_lock */
+	bool		irq_posted;
 	u32		irq_enable_mask;	/* bitmask to enable ring interrupt */
 	struct drm_i915_gem_request *trace_irq_req;
 	bool __must_check (*irq_get)(struct intel_engine_cs *ring);
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 024/190] drm/i915: Replace manual barrier() with READ_ONCE() in HWS accessor
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (21 preceding siblings ...)
  2016-01-11  9:16 ` [PATCH 023/190] drm/i915: Only apply one barrier after a breadcrumb interrupt is posted Chris Wilson
@ 2016-01-11  9:16 ` Chris Wilson
  2016-01-12 14:17   ` Mika Kuoppala
  2016-01-11  9:16 ` [PATCH 025/190] drm/i915: Broadwell execlists needs exactly the same seqno w/a as legacy Chris Wilson
                   ` (63 subsequent siblings)
  86 siblings, 1 reply; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:16 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter

When reading from the HWS page, we use barrier() to prevent the compiler
from optimising away the read from the volatile memory address (it may
be updated by the GPU at any time). This is a job better suited to
READ_ONCE(); make it so.
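
The two forms are equivalent for this purpose; READ_ONCE() simply
scopes the compiler barrier to the single load:

    /* before: global compiler barrier, then a plain load */
    barrier();
    seqno = ring->status_page.page_addr[reg];

    /* after: a volatile load of just this location */
    seqno = READ_ONCE(ring->status_page.page_addr[reg]);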

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/intel_ringbuffer.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 6cc8e9c5f8d6..8f305ce253ae 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -418,8 +418,7 @@ intel_read_status_page(struct intel_engine_cs *ring,
 		       int reg)
 {
 	/* Ensure that the compiler doesn't optimize away the load. */
-	barrier();
-	return ring->status_page.page_addr[reg];
+	return READ_ONCE(ring->status_page.page_addr[reg]);
 }
 
 static inline void
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 025/190] drm/i915: Broadwell execlists needs exactly the same seqno w/a as legacy
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (22 preceding siblings ...)
  2016-01-11  9:16 ` [PATCH 024/190] drm/i915: Replace manual barrier() with READ_ONCE() in HWS accessor Chris Wilson
@ 2016-01-11  9:16 ` Chris Wilson
  2016-01-11  9:16 ` [PATCH 026/190] drm/i915: Stop setting wraparound seqno on initialisation Chris Wilson
                   ` (62 subsequent siblings)
  86 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:16 UTC (permalink / raw)
  To: intel-gfx

In legacy mode, we use the gen6 seqno barrier to insert a delay after
the interrupt before reading the seqno (as the seqno write is not
flushed before the interrupt is sent, the interrupt arrives before the
seqno is visible). Execlists, however, ignored the evidence from igt.

Note that it is harder, but not impossible, to reproduce the missed
interrupt syndrome with execlists. This is primarily because execlists,
being interrupt driven itself, helps mask the issue.
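
A hedged sketch of the workaround: issue an uncached mmio read before
touching the status page so the seqno write has time to land (this
mirrors the body of gen6_seqno_barrier in the diff below):

    struct drm_i915_private *dev_priv = ring->i915;

    /* Stall for roughly one mmio round-trip before the coherent read. */
    POSTING_READ_FW(RING_ACTHD(ring->mmio_base));
    intel_flush_status_page(ring, I915_GEM_HWS_INDEX);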

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_lrc.c | 39 +++++++++++++++++++++------------------
 1 file changed, 21 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index ad51b1fc37cd..27d91f1ceb2b 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1775,18 +1775,24 @@ static int gen8_emit_flush_render(struct drm_i915_gem_request *request,
 	return 0;
 }
 
-static void bxt_seqno_barrier(struct intel_engine_cs *ring)
+static void
+gen6_seqno_barrier(struct intel_engine_cs *ring)
 {
-	/*
-	 * On BXT A steppings there is a HW coherency issue whereby the
-	 * MI_STORE_DATA_IMM storing the completed request's seqno
-	 * occasionally doesn't invalidate the CPU cache. Work around this by
-	 * clflushing the corresponding cacheline whenever the caller wants
-	 * the coherency to be guaranteed. Note that this cacheline is known
-	 * to be clean at this point, since we only write it in
-	 * bxt_a_set_seqno(), where we also do a clflush after the write. So
-	 * this clflush in practice becomes an invalidate operation.
+	/* Workaround to force correct ordering between irq and seqno writes on
+	 * ivb (and maybe also on snb) by reading from a CS register (like
+	 * ACTHD) before reading the status page.
+	 *
+	 * Note that this effectively stalls the read by the time
+	 * it takes to do a memory transaction, which more or less ensures
+	 * that the write from the GPU has sufficient time to invalidate
+	 * the CPU cacheline. Alternatively we could delay the interrupt from
+	 * the CS ring to give the write time to land, but that would incur
+	 * a delay after every batch i.e. much more frequent than a delay
+	 * when waiting for the interrupt (with the same net latency).
 	 */
+	struct drm_i915_private *dev_priv = ring->i915;
+	POSTING_READ_FW(RING_ACTHD(ring->mmio_base));
+
 	intel_flush_status_page(ring, I915_GEM_HWS_INDEX);
 }
 
@@ -1984,8 +1990,7 @@ static int logical_render_ring_init(struct drm_device *dev)
 		ring->init_hw = gen8_init_render_ring;
 	ring->init_context = gen8_init_rcs_context;
 	ring->cleanup = intel_fini_pipe_control;
-	if (IS_BXT_REVID(dev, 0, BXT_REVID_A1))
-		ring->irq_seqno_barrier = bxt_seqno_barrier;
+	ring->irq_seqno_barrier = gen6_seqno_barrier;
 	ring->emit_request = gen8_emit_request;
 	ring->emit_flush = gen8_emit_flush_render;
 	ring->irq_get = gen8_logical_ring_get_irq;
@@ -2031,8 +2036,7 @@ static int logical_bsd_ring_init(struct drm_device *dev)
 		GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS1_IRQ_SHIFT;
 
 	ring->init_hw = gen8_init_common_ring;
-	if (IS_BXT_REVID(dev, 0, BXT_REVID_A1))
-		ring->irq_seqno_barrier = bxt_seqno_barrier;
+	ring->irq_seqno_barrier = gen6_seqno_barrier;
 	ring->emit_request = gen8_emit_request;
 	ring->emit_flush = gen8_emit_flush;
 	ring->irq_get = gen8_logical_ring_get_irq;
@@ -2056,6 +2060,7 @@ static int logical_bsd2_ring_init(struct drm_device *dev)
 		GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS2_IRQ_SHIFT;
 
 	ring->init_hw = gen8_init_common_ring;
+	ring->irq_seqno_barrier = gen6_seqno_barrier;
 	ring->emit_request = gen8_emit_request;
 	ring->emit_flush = gen8_emit_flush;
 	ring->irq_get = gen8_logical_ring_get_irq;
@@ -2079,8 +2084,7 @@ static int logical_blt_ring_init(struct drm_device *dev)
 		GT_CONTEXT_SWITCH_INTERRUPT << GEN8_BCS_IRQ_SHIFT;
 
 	ring->init_hw = gen8_init_common_ring;
-	if (IS_BXT_REVID(dev, 0, BXT_REVID_A1))
-		ring->irq_seqno_barrier = bxt_seqno_barrier;
+	ring->irq_seqno_barrier = gen6_seqno_barrier;
 	ring->emit_request = gen8_emit_request;
 	ring->emit_flush = gen8_emit_flush;
 	ring->irq_get = gen8_logical_ring_get_irq;
@@ -2104,8 +2108,7 @@ static int logical_vebox_ring_init(struct drm_device *dev)
 		GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VECS_IRQ_SHIFT;
 
 	ring->init_hw = gen8_init_common_ring;
-	if (IS_BXT_REVID(dev, 0, BXT_REVID_A1))
-		ring->irq_seqno_barrier = bxt_seqno_barrier;
+	ring->irq_seqno_barrier = gen6_seqno_barrier;
 	ring->emit_request = gen8_emit_request;
 	ring->emit_flush = gen8_emit_flush;
 	ring->irq_get = gen8_logical_ring_get_irq;
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 026/190] drm/i915: Stop setting wraparound seqno on initialisation
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (23 preceding siblings ...)
  2016-01-11  9:16 ` [PATCH 025/190] drm/i915: Broadwell execlists needs exactly the same seqno w/a as legacy Chris Wilson
@ 2016-01-11  9:16 ` Chris Wilson
  2016-01-11  9:16 ` [PATCH 027/190] drm/i915: Only query timestamp when measuring elapsed time Chris Wilson
                   ` (61 subsequent siblings)
  86 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:16 UTC (permalink / raw)
  To: intel-gfx

We have testcases to ensure that seqno wraparound works fine, so we can
forgo forcing everyone to encounter seqno wraparound during early
uptime. Seqno wraparound incurs a full GPU stall, so not forcing it
eliminates one source of jitter early in the system's life. The
testcases give us deterministic coverage, and given how difficult it
would be to debug an issue (a GPU hang) stemming from a wraparound
using pure postmortem analysis, I see no value in forcing a wrap
during boot.

Advancing the global next_seqno after a GPU reset is equally pointless.
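
Wraparound is safe because seqno comparisons use wrapping u32
arithmetic; a short worked example (the helper matches the driver's
i915_seqno_passed, reproduced here as a sketch):

    static inline bool i915_seqno_passed(u32 seq1, u32 seq2)
    {
            return (s32)(seq1 - seq2) >= 0;
    }

    /* e.g. seq1 = 2, seq2 = 0xfffffffe:
     * seq1 - seq2 == 4 (mod 2^32), so (s32)4 >= 0 and seq1 is
     * correctly treated as "after" seq2 across the wrap.
     */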

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 16 +---------------
 1 file changed, 1 insertion(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d125820c6309..a0744626a110 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4814,14 +4814,6 @@ i915_gem_init_hw(struct drm_device *dev)
 		}
 	}
 
-	/*
-	 * Increment the next seqno by 0x100 so we have a visible break
-	 * on re-initialisation
-	 */
-	ret = i915_gem_set_seqno(dev, dev_priv->next_seqno+0x100);
-	if (ret)
-		goto out;
-
 	/* Now it is safe to go back round and do everything else: */
 	for_each_ring(ring, dev_priv, i) {
 		struct drm_i915_gem_request *req;
@@ -5001,13 +4993,7 @@ i915_gem_load(struct drm_device *dev)
 		dev_priv->num_fence_regs =
 				I915_READ(vgtif_reg(avail_rs.fence_num));
 
-	/*
-	 * Set initial sequence number for requests.
-	 * Using this number allows the wraparound to happen early,
-	 * catching any obvious problems.
-	 */
-	dev_priv->next_seqno = ((u32)~0 - 0x1100);
-	dev_priv->last_seqno = ((u32)~0 - 0x1101);
+	dev_priv->next_seqno = 1;
 
 	/* Initialize fence registers to zero */
 	INIT_LIST_HEAD(&dev_priv->mm.fence_list);
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 027/190] drm/i915: Only query timestamp when measuring elapsed time
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (24 preceding siblings ...)
  2016-01-11  9:16 ` [PATCH 026/190] drm/i915: Stop setting wraparound seqno on initialisation Chris Wilson
@ 2016-01-11  9:16 ` Chris Wilson
  2016-01-11  9:16 ` [PATCH 028/190] drm/i915: On GPU reset, set the HWS breadcrumb to the last seqno Chris Wilson
                   ` (60 subsequent siblings)
  86 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:16 UTC (permalink / raw)
  To: intel-gfx

Avoid the two calls to ktime_get_raw_ns() (at best, each reads the TSC)
as we only need to compute the elapsed time for a timed wait.
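
The trick is to fold the start timestamp into the caller-provided
timeout, turning it into an absolute deadline; a sketch using the same
*timeout convention as the diff below:

    if (timeout) {
            timeout_remain = nsecs_to_jiffies_timeout(*timeout);
            *timeout += ktime_get_raw_ns();    /* relative -> absolute */
    }

    /* ... sleep/wait ... */

    if (timeout) {
            *timeout -= ktime_get_raw_ns();    /* absolute -> remaining */
            if (*timeout < 0)
                    *timeout = 0;
    }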

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index a0744626a110..b956b8813307 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1220,7 +1220,6 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 	int state = interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
 	struct intel_wait wait;
 	unsigned long timeout_remain;
-	s64 before, now;
 	int ret = 0;
 
 	might_sleep();
@@ -1239,13 +1238,12 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 		if (*timeout == 0)
 			return -ETIME;
 
+		/* Record current time in case interrupted, or wedged */
 		timeout_remain = nsecs_to_jiffies_timeout(*timeout);
+		*timeout += ktime_get_raw_ns();
 	}
 
-	/* Record current time in case interrupted by signal, or wedged */
 	trace_i915_gem_request_wait_begin(req);
-	before = ktime_get_raw_ns();
-
 	if (INTEL_INFO(req->i915)->gen >= 6)
 		gen6_rps_boost(req->i915, rps, req->emitted_jiffies);
 
@@ -1298,13 +1296,12 @@ wakeup:
 complete:
 	intel_engine_remove_wait(req->ring, &wait);
 	__set_task_state(wait.task, TASK_RUNNING);
-	now = ktime_get_raw_ns();
 	trace_i915_gem_request_wait_end(req);
 
 	if (timeout) {
-		s64 tres = *timeout - (now - before);
-
-		*timeout = tres < 0 ? 0 : tres;
+		*timeout -= ktime_get_raw_ns();
+		if (*timeout < 0)
+			*timeout = 0;
 
 		/*
 		 * Apparently ktime isn't accurate enough and occasionally has a
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 028/190] drm/i915: On GPU reset, set the HWS breadcrumb to the last seqno
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (25 preceding siblings ...)
  2016-01-11  9:16 ` [PATCH 027/190] drm/i915: Only query timestamp when measuring elapsed time Chris Wilson
@ 2016-01-11  9:16 ` Chris Wilson
  2016-01-11  9:16 ` [PATCH 029/190] drm/i915: Convert trace-irq to the breadcrumb waiter Chris Wilson
                   ` (59 subsequent siblings)
  86 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:16 UTC (permalink / raw)
  To: intel-gfx

After the GPU reset, once we have discarded all of the incomplete
requests, mark the GPU as having advanced to the last_submitted_seqno
(i.e. as having completed the requests and being ready for fresh work).
The impact of this is negligible, as all the requests will be considered
completed by this point; it just brings the HWS into line with the
expectations of external viewers.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index b956b8813307..a713e8a6cb36 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2818,6 +2818,8 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
 		buffer->last_retired_head = buffer->tail;
 		intel_ring_update_space(buffer);
 	}
+
+	intel_ring_init_seqno(ring, ring->last_submitted_seqno);
 }
 
 void i915_gem_reset(struct drm_device *dev)
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 029/190] drm/i915: Convert trace-irq to the breadcrumb waiter
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (26 preceding siblings ...)
  2016-01-11  9:16 ` [PATCH 028/190] drm/i915: On GPU reset, set the HWS breadcrumb to the last seqno Chris Wilson
@ 2016-01-11  9:16 ` Chris Wilson
  2016-01-11  9:16 ` [PATCH 030/190] drm/i915: Move the get/put irq locking into the caller Chris Wilson
                   ` (58 subsequent siblings)
  86 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:16 UTC (permalink / raw)
  To: intel-gfx

If we convert the tracing from direct use of ring->irq_get() over to
the breadcrumb infrastructure, we are left with a single user of
ring->irq_get() and so will be able to simplify the driver routines
(eliminating the redundant validation and irq refcounting).

v2: Move to a signaling framework based upon the waiter.
v3: Track the first signal to avoid having to walk the rbtree every
time.
v4: Mark the signaler thread as RT priority to reduce latency in the
indirect wakeups.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h          |   8 --
 drivers/gpu/drm/i915/i915_gem.c          |   6 --
 drivers/gpu/drm/i915/i915_irq.c          |   7 +-
 drivers/gpu/drm/i915/i915_trace.h        |   2 +-
 drivers/gpu/drm/i915/intel_breadcrumbs.c | 177 +++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.h  |   7 +-
 6 files changed, 186 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 8940b8d3fa59..7f021505e32f 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3620,14 +3620,6 @@ wait_remaining_ms_from_jiffies(unsigned long timestamp_jiffies, int to_wait_ms)
 			    schedule_timeout_uninterruptible(remaining_jiffies);
 	}
 }
-
-static inline void i915_trace_irq_get(struct intel_engine_cs *ring,
-				      struct drm_i915_gem_request *req)
-{
-	if (ring->trace_irq_req == NULL && ring->irq_get(ring))
-		i915_gem_request_assign(&ring->trace_irq_req, req);
-}
-
 static inline bool __i915_request_irq_complete(struct drm_i915_gem_request *req)
 {
 	struct intel_engine_cs *engine = req->ring;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index a713e8a6cb36..5ddb2ed0f785 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2889,12 +2889,6 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 		i915_gem_object_retire__read(obj, ring->id);
 	}
 
-	if (unlikely(ring->trace_irq_req &&
-		     i915_gem_request_completed(ring->trace_irq_req))) {
-		ring->irq_put(ring);
-		i915_gem_request_assign(&ring->trace_irq_req, NULL);
-	}
-
 	WARN_ON(i915_verify_lists(ring->dev));
 }
 
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 738edd7fbf8d..bf48fa63127a 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -996,12 +996,9 @@ static void ironlake_rps_change_irq_handler(struct drm_device *dev)
 
 static void notify_ring(struct intel_engine_cs *ring)
 {
-	if (!intel_ring_initialized(ring))
-		return;
-
-	trace_i915_gem_request_notify(ring);
 	ring->irq_posted = true; /* paired with mb() in wake_up_process() */
-	intel_engine_wakeup(ring);
+	if (intel_engine_wakeup(ring))
+		trace_i915_gem_request_notify(ring);
 }
 
 static void vlv_c0_read(struct drm_i915_private *dev_priv,
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index efca75bcace3..43bb2e0bb949 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -503,7 +503,7 @@ TRACE_EVENT(i915_gem_ring_dispatch,
 			   __entry->ring = ring->id;
 			   __entry->seqno = i915_gem_request_get_seqno(req);
 			   __entry->flags = flags;
-			   i915_trace_irq_get(ring, req);
+			   intel_engine_enable_signaling(req);
 			   ),
 
 	    TP_printk("dev=%u, ring=%u, seqno=%u, flags=%x",
diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
index d689bd61534e..cf9cbcc2d5d7 100644
--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -22,6 +22,8 @@
  *
  */
 
+#include <linux/kthread.h>
+
 #include "i915_drv.h"
 
 static void intel_breadcrumbs_fake_irq(unsigned long data)
@@ -320,10 +322,185 @@ void intel_engine_init_breadcrumbs(struct intel_engine_cs *engine)
 		    (unsigned long)engine);
 }
 
+struct signal {
+	struct rb_node node;
+	struct intel_wait wait;
+	struct drm_i915_gem_request *request;
+};
+
+static bool signal_complete(struct signal *signal)
+{
+	if (signal == NULL)
+		return false;
+
+	/* If another process served as the bottom-half it may have already
+	 * signalled that this wait is already completed.
+	 */
+	if (intel_wait_complete(&signal->wait))
+		return true;
+
+	/* Carefully check if the request is complete, giving time for the
+	 * seqno to be visible or if the GPU hung.
+	 */
+	if (__i915_request_irq_complete(signal->request))
+		return true;
+
+	return false;
+}
+
+static struct signal *to_signal(struct rb_node *rb)
+{
+	return container_of(rb, struct signal, node);
+}
+
+static void signaler_set_rtpriority(void)
+{
+	 struct sched_param param = { .sched_priority = 1 };
+	 sched_setscheduler_nocheck(current, SCHED_FIFO, &param);
+}
+
+static int intel_breadcrumbs_signaler(void *arg)
+{
+	struct intel_engine_cs *engine = arg;
+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
+	struct signal *signal;
+
+	/* Install ourselves with high priority to reduce signalling latency */
+	signaler_set_rtpriority();
+
+	do {
+		set_current_state(TASK_INTERRUPTIBLE);
+
+		/* We are either woken up by the interrupt bottom-half,
+		 * or by a client adding a new signaller. In both cases,
+		 * the GPU seqno may have advanced beyond our oldest signal.
+		 * If it has, propagate the signal, remove the waiter and
+		 * check again with the next oldest signal. Otherwise we
+		 * need to wait for a new interrupt from the GPU or for
+		 * a new client.
+		 */
+		signal = READ_ONCE(b->first_signal);
+		if (signal_complete(signal)) {
+			/* Wake up all other completed waiters and select the
+			 * next bottom-half for the next user interrupt.
+			 */
+			intel_engine_remove_wait(engine, &signal->wait);
+
+			i915_gem_request_unreference__unlocked(signal->request);
+
+			/* Find the next oldest signal. Note that as we have
+			 * not been holding the lock, another client may
+			 * have installed an even older signal than the one
+			 * we just completed - so double check we are still
+			 * the oldest before picking the next one.
+			 */
+			spin_lock(&b->lock);
+			if (signal == b->first_signal)
+				b->first_signal = rb_next(&signal->node);
+			rb_erase(&signal->node, &b->signals);
+			spin_unlock(&b->lock);
+
+			kfree(signal);
+		} else {
+			if (kthread_should_stop())
+				break;
+
+			schedule();
+		}
+	} while (1);
+
+	return 0;
+}
+
+int intel_engine_enable_signaling(struct drm_i915_gem_request *request)
+{
+	struct intel_engine_cs *engine = request->ring;
+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
+	struct rb_node *parent, **p;
+	struct task_struct *task;
+	struct signal *signal;
+	bool first;
+
+	signal = kmalloc(sizeof(*signal), GFP_ATOMIC);
+	if (unlikely(signal == NULL))
+		return -ENOMEM;
+
+	/* Spawn a thread to provide a common bottom-half for all signals.
+	 * As this is an asynchronous interface we cannot steal the current
+	 * task for handling the bottom-half to the user interrupt, therefore
+	 * we create a thread to do the coherent seqno dance after the
+	 * interrupt and then signal the waitqueue (via the dma-buf/fence).
+	 */
+	task = READ_ONCE(b->signaler);
+	if (unlikely(task == NULL)) {
+		spin_lock(&b->lock);
+		task = b->signaler;
+		if (task == NULL) {
+			task = kthread_create(intel_breadcrumbs_signaler,
+					      engine,
+					      "irq/i915:%d",
+					      engine->id);
+			if (!IS_ERR(task))
+				b->signaler = task;
+		}
+		spin_unlock(&b->lock);
+
+		if (IS_ERR(task)) {
+			kfree(signal);
+			return PTR_ERR(task);
+		}
+	}
+
+	signal->wait.task = task;
+	signal->wait.seqno = request->seqno;
+
+	signal->request = i915_gem_request_reference(request);
+
+	/* Insert ourselves into the retirement ordered list of signals
+	 * on this engine. We track the oldest seqno as that will be the
+	 * first signal to complete.
+	 */
+	spin_lock(&b->lock);
+	parent = NULL;
+	first = true;
+	p = &b->signals.rb_node;
+	while (*p) {
+		parent = *p;
+		if (i915_seqno_passed(signal->wait.seqno,
+				      to_signal(parent)->wait.seqno)) {
+			p = &parent->rb_right;
+			first = false;
+		} else
+			p = &parent->rb_left;
+	}
+	rb_link_node(&signal->node, parent, p);
+	rb_insert_color(&signal->node, &b->signals);
+	if (first)
+		smp_store_mb(b->first_signal, signal);
+	spin_unlock(&b->lock);
+
+	/* Now add ourselves into the list of waiters, but register our
+	 * bottom-half as the signaller thread. As per usual, only the oldest
+	 * waiter (not just signaller) is tasked as the bottom-half waking
+	 * up all completed waiters after the user interrupt.
+	 *
+	 * If we are the oldest waiter, enable the irq (after which we
+	 * must double check that the seqno did not complete).
+	 */
+	if (intel_engine_add_wait(engine, &signal->wait) &&
+	    intel_engine_enable_wait_irq(engine, &signal->wait))
+		wake_up_process(task);
+
+	return 0;
+}
+
 void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine)
 {
 	struct intel_breadcrumbs *b = &engine->breadcrumbs;
 
+	if (b->signaler)
+		kthread_stop(b->signaler);
+
 	del_timer_sync(&b->fake_irq);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 8f305ce253ae..ba81052999fa 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -145,6 +145,8 @@ struct  i915_ctx_workarounds {
 	struct drm_i915_gem_object *obj;
 };
 
+struct drm_i915_gem_request;
+
 struct  intel_engine_cs {
 	const char	*name;
 	enum intel_ring_id {
@@ -181,7 +183,10 @@ struct  intel_engine_cs {
 	struct intel_breadcrumbs {
 		spinlock_t lock; /* protects the lists of requests */
 		struct rb_root waiters; /* sorted by retirement, priority */
+		struct rb_root signals; /* sorted by retirement */
 		struct task_struct *first_waiter; /* bh for user interrupts */
+		struct task_struct *signaler; /* used for fence signalling */
+		void *first_signal;
 		struct timer_list fake_irq; /* used after a missed interrupt */
 		bool irq_enabled;
 		bool rpm_wakelock;
@@ -200,7 +205,6 @@ struct  intel_engine_cs {
 	unsigned irq_refcount; /* protected by dev_priv->irq_lock */
 	bool		irq_posted;
 	u32		irq_enable_mask;	/* bitmask to enable ring interrupt */
-	struct drm_i915_gem_request *trace_irq_req;
 	bool __must_check (*irq_get)(struct intel_engine_cs *ring);
 	void		(*irq_put)(struct intel_engine_cs *ring);
 
@@ -558,6 +562,7 @@ bool intel_engine_enable_wait_irq(struct intel_engine_cs *engine,
 				  const struct intel_wait *wait);
 void intel_engine_remove_wait(struct intel_engine_cs *engine,
 			      struct intel_wait *wait);
+int intel_engine_enable_signaling(struct drm_i915_gem_request *request);
 static inline bool intel_engine_has_waiter(struct intel_engine_cs *engine)
 {
 	return READ_ONCE(engine->breadcrumbs.first_waiter);
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 030/190] drm/i915: Move the get/put irq locking into the caller
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (27 preceding siblings ...)
  2016-01-11  9:16 ` [PATCH 029/190] drm/i915: Convert trace-irq to the breadcrumb waiter Chris Wilson
@ 2016-01-11  9:16 ` Chris Wilson
  2016-01-11  9:16 ` [PATCH 031/190] drm/i915: Harden detection of missed interrupts Chris Wilson
                   ` (57 subsequent siblings)
  86 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:16 UTC (permalink / raw)
  To: intel-gfx

With only a single callsite for intel_engine_cs->irq_get and ->irq_put,
we can reduce the code size by moving the common preamble into the
caller, and we can also eliminate the reference counting.

For completeness, as we are no longer doing reference counting on irq,
rename the get/put vfunctions to enable/disable respectively.
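
Schematically, this is the usual refactor of hoisting a shared
lock/refcount preamble into the single caller so that the vfuncs reduce
to bare register writes (a sketch of the shape, not the full diff):

    /* caller (intel_breadcrumbs.c): the one place that takes the lock */
    spin_lock_irq(&engine->i915->irq_lock);
    engine->irq_enable(engine);
    spin_unlock_irq(&engine->i915->irq_lock);

    /* callee: just the hardware write, no locking or refcounting */
    static void gen5_ring_enable_irq(struct intel_engine_cs *ring)
    {
            gen5_enable_gt_irq(ring->i915, ring->irq_enable_mask);
    }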

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_breadcrumbs.c |   8 +-
 drivers/gpu/drm/i915/intel_lrc.c         |  53 ++----
 drivers/gpu/drm/i915/intel_ringbuffer.c  | 302 ++++++++++---------------------
 drivers/gpu/drm/i915/intel_ringbuffer.h  |   5 +-
 4 files changed, 125 insertions(+), 243 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
index cf9cbcc2d5d7..0ea01bd6811c 100644
--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -51,12 +51,16 @@ static void irq_enable(struct intel_engine_cs *engine)
 	 */
 	engine->irq_posted = true;
 
-	WARN_ON(!engine->irq_get(engine));
+	spin_lock_irq(&engine->i915->irq_lock);
+	engine->irq_enable(engine);
+	spin_unlock_irq(&engine->i915->irq_lock);
 }
 
 static void irq_disable(struct intel_engine_cs *engine)
 {
-	engine->irq_put(engine);
+	spin_lock_irq(&engine->i915->irq_lock);
+	engine->irq_disable(engine);
+	spin_unlock_irq(&engine->i915->irq_lock);
 
 	engine->irq_posted = false;
 }
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 27d91f1ceb2b..b1ede2e9b372 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1640,37 +1640,20 @@ static int gen8_emit_bb_start(struct drm_i915_gem_request *req,
 	return 0;
 }
 
-static bool gen8_logical_ring_get_irq(struct intel_engine_cs *ring)
+static void gen8_logical_ring_enable_irq(struct intel_engine_cs *ring)
 {
-	struct drm_device *dev = ring->dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	unsigned long flags;
-
-	if (WARN_ON(!intel_irqs_enabled(dev_priv)))
-		return false;
-
-	spin_lock_irqsave(&dev_priv->irq_lock, flags);
-	if (ring->irq_refcount++ == 0) {
-		I915_WRITE_IMR(ring, ~(ring->irq_enable_mask | ring->irq_keep_mask));
-		POSTING_READ(RING_IMR(ring->mmio_base));
-	}
-	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
+	struct drm_i915_private *dev_priv = ring->i915;
 
-	return true;
+	I915_WRITE_IMR(ring, ~(ring->irq_enable_mask | ring->irq_keep_mask));
+	POSTING_READ(RING_IMR(ring->mmio_base));
 }
 
-static void gen8_logical_ring_put_irq(struct intel_engine_cs *ring)
+static void gen8_logical_ring_disable_irq(struct intel_engine_cs *ring)
 {
-	struct drm_device *dev = ring->dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	unsigned long flags;
+	struct drm_i915_private *dev_priv = ring->i915;
 
-	spin_lock_irqsave(&dev_priv->irq_lock, flags);
-	if (--ring->irq_refcount == 0) {
-		I915_WRITE_IMR(ring, ~ring->irq_keep_mask);
-		POSTING_READ(RING_IMR(ring->mmio_base));
-	}
-	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
+	I915_WRITE_IMR(ring, ~ring->irq_keep_mask);
+	POSTING_READ(RING_IMR(ring->mmio_base));
 }
 
 static int gen8_emit_flush(struct drm_i915_gem_request *request,
@@ -1993,8 +1976,8 @@ static int logical_render_ring_init(struct drm_device *dev)
 	ring->irq_seqno_barrier = gen6_seqno_barrier;
 	ring->emit_request = gen8_emit_request;
 	ring->emit_flush = gen8_emit_flush_render;
-	ring->irq_get = gen8_logical_ring_get_irq;
-	ring->irq_put = gen8_logical_ring_put_irq;
+	ring->irq_enable = gen8_logical_ring_enable_irq;
+	ring->irq_disable = gen8_logical_ring_disable_irq;
 	ring->emit_bb_start = gen8_emit_bb_start;
 
 	ring->dev = dev;
@@ -2039,8 +2022,8 @@ static int logical_bsd_ring_init(struct drm_device *dev)
 	ring->irq_seqno_barrier = gen6_seqno_barrier;
 	ring->emit_request = gen8_emit_request;
 	ring->emit_flush = gen8_emit_flush;
-	ring->irq_get = gen8_logical_ring_get_irq;
-	ring->irq_put = gen8_logical_ring_put_irq;
+	ring->irq_enable = gen8_logical_ring_enable_irq;
+	ring->irq_disable = gen8_logical_ring_disable_irq;
 	ring->emit_bb_start = gen8_emit_bb_start;
 
 	return logical_ring_init(dev, ring);
@@ -2063,8 +2046,8 @@ static int logical_bsd2_ring_init(struct drm_device *dev)
 	ring->irq_seqno_barrier = gen6_seqno_barrier;
 	ring->emit_request = gen8_emit_request;
 	ring->emit_flush = gen8_emit_flush;
-	ring->irq_get = gen8_logical_ring_get_irq;
-	ring->irq_put = gen8_logical_ring_put_irq;
+	ring->irq_enable = gen8_logical_ring_enable_irq;
+	ring->irq_disable = gen8_logical_ring_disable_irq;
 	ring->emit_bb_start = gen8_emit_bb_start;
 
 	return logical_ring_init(dev, ring);
@@ -2087,8 +2070,8 @@ static int logical_blt_ring_init(struct drm_device *dev)
 	ring->irq_seqno_barrier = gen6_seqno_barrier;
 	ring->emit_request = gen8_emit_request;
 	ring->emit_flush = gen8_emit_flush;
-	ring->irq_get = gen8_logical_ring_get_irq;
-	ring->irq_put = gen8_logical_ring_put_irq;
+	ring->irq_enable = gen8_logical_ring_enable_irq;
+	ring->irq_disable = gen8_logical_ring_disable_irq;
 	ring->emit_bb_start = gen8_emit_bb_start;
 
 	return logical_ring_init(dev, ring);
@@ -2111,8 +2094,8 @@ static int logical_vebox_ring_init(struct drm_device *dev)
 	ring->irq_seqno_barrier = gen6_seqno_barrier;
 	ring->emit_request = gen8_emit_request;
 	ring->emit_flush = gen8_emit_flush;
-	ring->irq_get = gen8_logical_ring_get_irq;
-	ring->irq_put = gen8_logical_ring_put_irq;
+	ring->irq_enable = gen8_logical_ring_enable_irq;
+	ring->irq_disable = gen8_logical_ring_disable_irq;
 	ring->emit_bb_start = gen8_emit_bb_start;
 
 	return logical_ring_init(dev, ring);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index c86d0e17d785..5625f56a2db1 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1503,109 +1503,56 @@ gen6_seqno_barrier(struct intel_engine_cs *ring)
 	intel_flush_status_page(ring, I915_GEM_HWS_INDEX);
 }
 
-static bool
-gen5_ring_get_irq(struct intel_engine_cs *ring)
+static void
+gen5_ring_enable_irq(struct intel_engine_cs *ring)
 {
-	struct drm_device *dev = ring->dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	unsigned long flags;
-
-	if (WARN_ON(!intel_irqs_enabled(dev_priv)))
-		return false;
-
-	spin_lock_irqsave(&dev_priv->irq_lock, flags);
-	if (ring->irq_refcount++ == 0)
-		gen5_enable_gt_irq(dev_priv, ring->irq_enable_mask);
-	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
-
-	return true;
+	gen5_enable_gt_irq(ring->i915, ring->irq_enable_mask);
 }
 
 static void
-gen5_ring_put_irq(struct intel_engine_cs *ring)
+gen5_ring_disable_irq(struct intel_engine_cs *ring)
 {
-	struct drm_device *dev = ring->dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	unsigned long flags;
-
-	spin_lock_irqsave(&dev_priv->irq_lock, flags);
-	if (--ring->irq_refcount == 0)
-		gen5_disable_gt_irq(dev_priv, ring->irq_enable_mask);
-	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
+	gen5_disable_gt_irq(ring->i915, ring->irq_enable_mask);
 }
 
-static bool
-i9xx_ring_get_irq(struct intel_engine_cs *ring)
+static void
+i9xx_ring_enable_irq(struct intel_engine_cs *ring)
 {
-	struct drm_device *dev = ring->dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	unsigned long flags;
-
-	if (!intel_irqs_enabled(dev_priv))
-		return false;
-
-	spin_lock_irqsave(&dev_priv->irq_lock, flags);
-	if (ring->irq_refcount++ == 0) {
-		dev_priv->irq_mask &= ~ring->irq_enable_mask;
-		I915_WRITE(IMR, dev_priv->irq_mask);
-		POSTING_READ(IMR);
-	}
-	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
+	struct drm_i915_private *dev_priv = ring->i915;
 
-	return true;
+	dev_priv->irq_mask &= ~ring->irq_enable_mask;
+	I915_WRITE(IMR, dev_priv->irq_mask);
+	POSTING_READ(IMR);
 }
 
 static void
-i9xx_ring_put_irq(struct intel_engine_cs *ring)
+i9xx_ring_disable_irq(struct intel_engine_cs *ring)
 {
-	struct drm_device *dev = ring->dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	unsigned long flags;
+	struct drm_i915_private *dev_priv = ring->i915;
 
-	spin_lock_irqsave(&dev_priv->irq_lock, flags);
-	if (--ring->irq_refcount == 0) {
-		dev_priv->irq_mask |= ring->irq_enable_mask;
-		I915_WRITE(IMR, dev_priv->irq_mask);
-		POSTING_READ(IMR);
-	}
-	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
+	dev_priv->irq_mask |= ring->irq_enable_mask;
+	I915_WRITE(IMR, dev_priv->irq_mask);
+	POSTING_READ(IMR);
 }
 
-static bool
-i8xx_ring_get_irq(struct intel_engine_cs *ring)
+static void
+i8xx_ring_enable_irq(struct intel_engine_cs *ring)
 {
-	struct drm_device *dev = ring->dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	unsigned long flags;
-
-	if (!intel_irqs_enabled(dev_priv))
-		return false;
-
-	spin_lock_irqsave(&dev_priv->irq_lock, flags);
-	if (ring->irq_refcount++ == 0) {
-		dev_priv->irq_mask &= ~ring->irq_enable_mask;
-		I915_WRITE16(IMR, dev_priv->irq_mask);
-		POSTING_READ16(IMR);
-	}
-	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
+	struct drm_i915_private *dev_priv = ring->i915;
 
-	return true;
+	dev_priv->irq_mask &= ~ring->irq_enable_mask;
+	I915_WRITE16(IMR, dev_priv->irq_mask);
+	POSTING_READ16(IMR);
 }
 
 static void
-i8xx_ring_put_irq(struct intel_engine_cs *ring)
+i8xx_ring_disable_irq(struct intel_engine_cs *ring)
 {
-	struct drm_device *dev = ring->dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	unsigned long flags;
+	struct drm_i915_private *dev_priv = ring->i915;
 
-	spin_lock_irqsave(&dev_priv->irq_lock, flags);
-	if (--ring->irq_refcount == 0) {
-		dev_priv->irq_mask |= ring->irq_enable_mask;
-		I915_WRITE16(IMR, dev_priv->irq_mask);
-		POSTING_READ16(IMR);
-	}
-	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
+	dev_priv->irq_mask |= ring->irq_enable_mask;
+	I915_WRITE16(IMR, dev_priv->irq_mask);
+	POSTING_READ16(IMR);
 }
 
 static int
@@ -1645,128 +1592,77 @@ i9xx_add_request(struct drm_i915_gem_request *req)
 	return 0;
 }
 
-static bool
-gen6_ring_get_irq(struct intel_engine_cs *ring)
+static void
+gen6_ring_enable_irq(struct intel_engine_cs *ring)
 {
-	struct drm_device *dev = ring->dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	unsigned long flags;
-
-	if (WARN_ON(!intel_irqs_enabled(dev_priv)))
-		return false;
-
-	spin_lock_irqsave(&dev_priv->irq_lock, flags);
-	if (ring->irq_refcount++ == 0) {
-		if (HAS_L3_DPF(dev) && ring->id == RCS)
-			I915_WRITE_IMR(ring,
-				       ~(ring->irq_enable_mask |
-					 GT_PARITY_ERROR(dev)));
-		else
-			I915_WRITE_IMR(ring, ~ring->irq_enable_mask);
-		gen5_enable_gt_irq(dev_priv, ring->irq_enable_mask);
-	}
-	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
+	struct drm_i915_private *dev_priv = ring->i915;
 
-	return true;
+	if (HAS_L3_DPF(dev_priv) && ring->id == RCS)
+		I915_WRITE_IMR(ring,
+			       ~(ring->irq_enable_mask |
+				 GT_PARITY_ERROR(dev_priv)));
+	else
+		I915_WRITE_IMR(ring, ~ring->irq_enable_mask);
+	gen5_enable_gt_irq(dev_priv, ring->irq_enable_mask);
 }
 
 static void
-gen6_ring_put_irq(struct intel_engine_cs *ring)
+gen6_ring_disable_irq(struct intel_engine_cs *ring)
 {
-	struct drm_device *dev = ring->dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	unsigned long flags;
+	struct drm_i915_private *dev_priv = ring->i915;
 
-	spin_lock_irqsave(&dev_priv->irq_lock, flags);
-	if (--ring->irq_refcount == 0) {
-		if (HAS_L3_DPF(dev) && ring->id == RCS)
-			I915_WRITE_IMR(ring, ~GT_PARITY_ERROR(dev));
-		else
-			I915_WRITE_IMR(ring, ~0);
-		gen5_disable_gt_irq(dev_priv, ring->irq_enable_mask);
-	}
-	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
+	if (HAS_L3_DPF(dev_priv) && ring->id == RCS)
+		I915_WRITE_IMR(ring, ~GT_PARITY_ERROR(dev_priv));
+	else
+		I915_WRITE_IMR(ring, ~0);
+	gen5_disable_gt_irq(dev_priv, ring->irq_enable_mask);
 }
 
-static bool
-hsw_vebox_get_irq(struct intel_engine_cs *ring)
+static void
+hsw_vebox_enable_irq(struct intel_engine_cs *ring)
 {
-	struct drm_device *dev = ring->dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	unsigned long flags;
-
-	if (WARN_ON(!intel_irqs_enabled(dev_priv)))
-		return false;
-
-	spin_lock_irqsave(&dev_priv->irq_lock, flags);
-	if (ring->irq_refcount++ == 0) {
-		I915_WRITE_IMR(ring, ~ring->irq_enable_mask);
-		gen6_enable_pm_irq(dev_priv, ring->irq_enable_mask);
-	}
-	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
+	struct drm_i915_private *dev_priv = ring->i915;
 
-	return true;
+	I915_WRITE_IMR(ring, ~ring->irq_enable_mask);
+	gen6_enable_pm_irq(dev_priv, ring->irq_enable_mask);
 }
 
 static void
-hsw_vebox_put_irq(struct intel_engine_cs *ring)
+hsw_vebox_disable_irq(struct intel_engine_cs *ring)
 {
-	struct drm_device *dev = ring->dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	unsigned long flags;
+	struct drm_i915_private *dev_priv = ring->i915;
 
-	spin_lock_irqsave(&dev_priv->irq_lock, flags);
-	if (--ring->irq_refcount == 0) {
-		I915_WRITE_IMR(ring, ~0);
-		gen6_disable_pm_irq(dev_priv, ring->irq_enable_mask);
-	}
-	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
+	I915_WRITE_IMR(ring, ~0);
+	gen6_disable_pm_irq(dev_priv, ring->irq_enable_mask);
 }
 
-static bool
-gen8_ring_get_irq(struct intel_engine_cs *ring)
+static void
+gen8_ring_enable_irq(struct intel_engine_cs *ring)
 {
-	struct drm_device *dev = ring->dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	unsigned long flags;
-
-	if (WARN_ON(!intel_irqs_enabled(dev_priv)))
-		return false;
+	struct drm_i915_private *dev_priv = ring->i915;
 
-	spin_lock_irqsave(&dev_priv->irq_lock, flags);
-	if (ring->irq_refcount++ == 0) {
-		if (HAS_L3_DPF(dev) && ring->id == RCS) {
-			I915_WRITE_IMR(ring,
-				       ~(ring->irq_enable_mask |
-					 GT_RENDER_L3_PARITY_ERROR_INTERRUPT));
-		} else {
-			I915_WRITE_IMR(ring, ~ring->irq_enable_mask);
-		}
-		POSTING_READ(RING_IMR(ring->mmio_base));
+	if (HAS_L3_DPF(dev_priv) && ring->id == RCS) {
+		I915_WRITE_IMR(ring,
+			       ~(ring->irq_enable_mask |
+				 GT_RENDER_L3_PARITY_ERROR_INTERRUPT));
+	} else {
+		I915_WRITE_IMR(ring, ~ring->irq_enable_mask);
 	}
-	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
-
-	return true;
+	POSTING_READ(RING_IMR(ring->mmio_base));
 }
 
 static void
-gen8_ring_put_irq(struct intel_engine_cs *ring)
+gen8_ring_disable_irq(struct intel_engine_cs *ring)
 {
-	struct drm_device *dev = ring->dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	unsigned long flags;
+	struct drm_i915_private *dev_priv = ring->i915;
 
-	spin_lock_irqsave(&dev_priv->irq_lock, flags);
-	if (--ring->irq_refcount == 0) {
-		if (HAS_L3_DPF(dev) && ring->id == RCS) {
-			I915_WRITE_IMR(ring,
-				       ~GT_RENDER_L3_PARITY_ERROR_INTERRUPT);
-		} else {
-			I915_WRITE_IMR(ring, ~0);
-		}
-		POSTING_READ(RING_IMR(ring->mmio_base));
+	if (HAS_L3_DPF(dev_priv) && ring->id == RCS) {
+		I915_WRITE_IMR(ring,
+			       ~GT_RENDER_L3_PARITY_ERROR_INTERRUPT);
+	} else {
+		I915_WRITE_IMR(ring, ~0);
 	}
-	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
+	POSTING_READ(RING_IMR(ring->mmio_base));
 }
 
 static int
@@ -2667,8 +2563,8 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 		ring->init_context = intel_rcs_ctx_init;
 		ring->add_request = gen6_add_request;
 		ring->flush = gen8_render_ring_flush;
-		ring->irq_get = gen8_ring_get_irq;
-		ring->irq_put = gen8_ring_put_irq;
+		ring->irq_enable = gen8_ring_enable_irq;
+		ring->irq_disable = gen8_ring_disable_irq;
 		ring->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
 		ring->irq_seqno_barrier = gen6_seqno_barrier;
 		if (i915_semaphore_is_enabled(dev)) {
@@ -2683,8 +2579,8 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 		ring->flush = gen7_render_ring_flush;
 		if (INTEL_INFO(dev)->gen == 6)
 			ring->flush = gen6_render_ring_flush;
-		ring->irq_get = gen6_ring_get_irq;
-		ring->irq_put = gen6_ring_put_irq;
+		ring->irq_enable = gen6_ring_enable_irq;
+		ring->irq_disable = gen6_ring_disable_irq;
 		ring->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
 		ring->irq_seqno_barrier = gen6_seqno_barrier;
 		if (i915_semaphore_is_enabled(dev)) {
@@ -2711,8 +2607,8 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 	} else if (IS_GEN5(dev)) {
 		ring->add_request = pc_render_add_request;
 		ring->flush = gen4_render_ring_flush;
-		ring->irq_get = gen5_ring_get_irq;
-		ring->irq_put = gen5_ring_put_irq;
+		ring->irq_enable = gen5_ring_enable_irq;
+		ring->irq_disable = gen5_ring_disable_irq;
 		ring->irq_enable_mask = GT_RENDER_USER_INTERRUPT |
 					GT_RENDER_PIPECTL_NOTIFY_INTERRUPT;
 	} else {
@@ -2722,11 +2618,11 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 		else
 			ring->flush = gen4_render_ring_flush;
 		if (IS_GEN2(dev)) {
-			ring->irq_get = i8xx_ring_get_irq;
-			ring->irq_put = i8xx_ring_put_irq;
+			ring->irq_enable = i8xx_ring_enable_irq;
+			ring->irq_disable = i8xx_ring_disable_irq;
 		} else {
-			ring->irq_get = i9xx_ring_get_irq;
-			ring->irq_put = i9xx_ring_put_irq;
+			ring->irq_enable = i9xx_ring_enable_irq;
+			ring->irq_disable = i9xx_ring_disable_irq;
 		}
 		ring->irq_enable_mask = I915_USER_INTERRUPT;
 	}
@@ -2799,8 +2695,8 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev)
 		if (INTEL_INFO(dev)->gen >= 8) {
 			ring->irq_enable_mask =
 				GT_RENDER_USER_INTERRUPT << GEN8_VCS1_IRQ_SHIFT;
-			ring->irq_get = gen8_ring_get_irq;
-			ring->irq_put = gen8_ring_put_irq;
+			ring->irq_enable = gen8_ring_enable_irq;
+			ring->irq_disable = gen8_ring_disable_irq;
 			ring->dispatch_execbuffer =
 				gen8_ring_dispatch_execbuffer;
 			if (i915_semaphore_is_enabled(dev)) {
@@ -2810,8 +2706,8 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev)
 			}
 		} else {
 			ring->irq_enable_mask = GT_BSD_USER_INTERRUPT;
-			ring->irq_get = gen6_ring_get_irq;
-			ring->irq_put = gen6_ring_put_irq;
+			ring->irq_enable = gen6_ring_enable_irq;
+			ring->irq_disable = gen6_ring_disable_irq;
 			ring->dispatch_execbuffer =
 				gen6_ring_dispatch_execbuffer;
 			if (i915_semaphore_is_enabled(dev)) {
@@ -2835,12 +2731,12 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev)
 		ring->add_request = i9xx_add_request;
 		if (IS_GEN5(dev)) {
 			ring->irq_enable_mask = ILK_BSD_USER_INTERRUPT;
-			ring->irq_get = gen5_ring_get_irq;
-			ring->irq_put = gen5_ring_put_irq;
+			ring->irq_enable = gen5_ring_enable_irq;
+			ring->irq_disable = gen5_ring_disable_irq;
 		} else {
 			ring->irq_enable_mask = I915_BSD_USER_INTERRUPT;
-			ring->irq_get = i9xx_ring_get_irq;
-			ring->irq_put = i9xx_ring_put_irq;
+			ring->irq_enable = i9xx_ring_enable_irq;
+			ring->irq_disable = i9xx_ring_disable_irq;
 		}
 		ring->dispatch_execbuffer = i965_dispatch_execbuffer;
 	}
@@ -2867,8 +2763,8 @@ int intel_init_bsd2_ring_buffer(struct drm_device *dev)
 	ring->irq_seqno_barrier = gen6_seqno_barrier;
 	ring->irq_enable_mask =
 			GT_RENDER_USER_INTERRUPT << GEN8_VCS2_IRQ_SHIFT;
-	ring->irq_get = gen8_ring_get_irq;
-	ring->irq_put = gen8_ring_put_irq;
+	ring->irq_enable = gen8_ring_enable_irq;
+	ring->irq_disable = gen8_ring_disable_irq;
 	ring->dispatch_execbuffer =
 			gen8_ring_dispatch_execbuffer;
 	if (i915_semaphore_is_enabled(dev)) {
@@ -2897,8 +2793,8 @@ int intel_init_blt_ring_buffer(struct drm_device *dev)
 	if (INTEL_INFO(dev)->gen >= 8) {
 		ring->irq_enable_mask =
 			GT_RENDER_USER_INTERRUPT << GEN8_BCS_IRQ_SHIFT;
-		ring->irq_get = gen8_ring_get_irq;
-		ring->irq_put = gen8_ring_put_irq;
+		ring->irq_enable = gen8_ring_enable_irq;
+		ring->irq_disable = gen8_ring_disable_irq;
 		ring->dispatch_execbuffer = gen8_ring_dispatch_execbuffer;
 		if (i915_semaphore_is_enabled(dev)) {
 			ring->semaphore.sync_to = gen8_ring_sync;
@@ -2907,8 +2803,8 @@ int intel_init_blt_ring_buffer(struct drm_device *dev)
 		}
 	} else {
 		ring->irq_enable_mask = GT_BLT_USER_INTERRUPT;
-		ring->irq_get = gen6_ring_get_irq;
-		ring->irq_put = gen6_ring_put_irq;
+		ring->irq_enable = gen6_ring_enable_irq;
+		ring->irq_disable = gen6_ring_disable_irq;
 		ring->dispatch_execbuffer = gen6_ring_dispatch_execbuffer;
 		if (i915_semaphore_is_enabled(dev)) {
 			ring->semaphore.signal = gen6_signal;
@@ -2954,8 +2850,8 @@ int intel_init_vebox_ring_buffer(struct drm_device *dev)
 	if (INTEL_INFO(dev)->gen >= 8) {
 		ring->irq_enable_mask =
 			GT_RENDER_USER_INTERRUPT << GEN8_VECS_IRQ_SHIFT;
-		ring->irq_get = gen8_ring_get_irq;
-		ring->irq_put = gen8_ring_put_irq;
+		ring->irq_enable = gen8_ring_enable_irq;
+		ring->irq_disable = gen8_ring_disable_irq;
 		ring->dispatch_execbuffer = gen8_ring_dispatch_execbuffer;
 		if (i915_semaphore_is_enabled(dev)) {
 			ring->semaphore.sync_to = gen8_ring_sync;
@@ -2964,8 +2860,8 @@ int intel_init_vebox_ring_buffer(struct drm_device *dev)
 		}
 	} else {
 		ring->irq_enable_mask = PM_VEBOX_USER_INTERRUPT;
-		ring->irq_get = hsw_vebox_get_irq;
-		ring->irq_put = hsw_vebox_put_irq;
+		ring->irq_enable = hsw_vebox_enable_irq;
+		ring->irq_disable = hsw_vebox_disable_irq;
 		ring->dispatch_execbuffer = gen6_ring_dispatch_execbuffer;
 		if (i915_semaphore_is_enabled(dev)) {
 			ring->semaphore.sync_to = gen6_ring_sync;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index ba81052999fa..3364bcebd456 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -202,11 +202,10 @@ struct  intel_engine_cs {
 	struct intel_hw_status_page status_page;
 	struct i915_ctx_workarounds wa_ctx;
 
-	unsigned irq_refcount; /* protected by dev_priv->irq_lock */
 	bool		irq_posted;
 	u32		irq_enable_mask;	/* bitmask to enable ring interrupt */
-	bool __must_check (*irq_get)(struct intel_engine_cs *ring);
-	void		(*irq_put)(struct intel_engine_cs *ring);
+	void		(*irq_enable)(struct intel_engine_cs *ring);
+	void		(*irq_disable)(struct intel_engine_cs *ring);
 
 	int		(*init_hw)(struct intel_engine_cs *ring);
 
-- 
2.7.0.rc3


* [PATCH 031/190] drm/i915: Harden detection of missed interrupts
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (28 preceding siblings ...)
  2016-01-11  9:16 ` [PATCH 030/190] drm/i915: Move the get/put irq locking into the caller Chris Wilson
@ 2016-01-11  9:16 ` Chris Wilson
  2016-01-11  9:16 ` [PATCH 032/190] drm/i915: Remove debug noise on detecting fault-injection " Chris Wilson
                   ` (56 subsequent siblings)
  86 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:16 UTC (permalink / raw)
  To: intel-gfx

Only declare a missed interrupt if we find that the GPU is idle with
waiters and a hangcheck interval has passed in which no new user
interrupts have been raised.
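
In rough terms the hardening reduces to one extra comparison in the
hangcheck worker (a minimal sketch distilled from the diff below, using
the names the patch introduces):

	/* irq handler: count every user interrupt that wakes a waiter */
	ring->user_interrupts++;

	/* hangcheck worker, one interval later */
	user_interrupts = READ_ONCE(ring->user_interrupts);
	if (ring_idle(ring, seqno) &&
	    intel_engine_has_waiter(ring) &&
	    ring->hangcheck.user_interrupts == user_interrupts) {
		/* idle, with waiters, and no new user interrupt in a
		 * whole hangcheck interval: only now do we declare a
		 * missed interrupt and arm the fake-irq fallback */
		intel_engine_enable_fake_irq(ring);
	}
	ring->hangcheck.user_interrupts = user_interrupts;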

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c     |  6 ++++++
 drivers/gpu/drm/i915/i915_irq.c         | 10 ++++++++--
 drivers/gpu/drm/i915/intel_ringbuffer.h |  2 ++
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 5a706c700684..567f8db4c70a 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -735,6 +735,9 @@ static void i915_ring_seqno_info(struct seq_file *m,
 	seq_printf(m, "Current sequence (%s): %x\n",
 		   ring->name, intel_ring_get_seqno(ring));
 
+	seq_printf(m, "Current user interrupts (%s): %x\n",
+		   ring->name, READ_ONCE(ring->user_interrupts));
+
 	spin_lock(&ring->breadcrumbs.lock);
 	for (rb = rb_first(&ring->breadcrumbs.waiters);
 	     rb != NULL;
@@ -1372,6 +1375,9 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
 		seq_printf(m, "\tseqno = %x [current %x], waiters? %d\n",
 			   ring->hangcheck.seqno, seqno[i],
 			   intel_engine_has_waiter(ring));
+		seq_printf(m, "\tuser interrupts = %x [current %x]\n",
+			   ring->hangcheck.user_interrupts,
+			   ring->user_interrupts);
 		seq_printf(m, "\tACTHD = 0x%08llx [current 0x%08llx]\n",
 			   (long long)ring->hangcheck.acthd,
 			   (long long)acthd[i]);
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index bf48fa63127a..b3942dec7de4 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -997,8 +997,10 @@ static void ironlake_rps_change_irq_handler(struct drm_device *dev)
 static void notify_ring(struct intel_engine_cs *ring)
 {
 	ring->irq_posted = true; /* paired with mb() in wake_up_process() */
-	if (intel_engine_wakeup(ring))
+	if (intel_engine_wakeup(ring)) {
 		trace_i915_gem_request_notify(ring);
+		ring->user_interrupts++;
+	}
 }
 
 static void vlv_c0_read(struct drm_i915_private *dev_priv,
@@ -3061,12 +3063,14 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 	for_each_ring(ring, dev_priv, i) {
 		u64 acthd;
 		u32 seqno;
+		unsigned user_interrupts;
 		bool busy = true;
 
 		semaphore_clear_deadlocks(dev_priv);
 
 		acthd = intel_ring_get_active_head(ring);
 		seqno = intel_ring_get_seqno(ring);
+		user_interrupts = READ_ONCE(ring->user_interrupts);
 
 		if (ring->hangcheck.seqno == seqno) {
 			if (ring_idle(ring, seqno)) {
@@ -3074,7 +3078,8 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 
 				if (intel_engine_has_waiter(ring)) {
 					/* Issue a wake-up to catch stuck h/w. */
-					if (!test_and_set_bit(ring->id, &dev_priv->gpu_error.missed_irq_rings)) {
+					if (ring->hangcheck.user_interrupts == user_interrupts &&
+					    !test_and_set_bit(ring->id, &dev_priv->gpu_error.missed_irq_rings)) {
 						if (!test_bit(ring->id, &dev_priv->gpu_error.test_irq_rings))
 							DRM_ERROR("Hangcheck timer elapsed... %s idle\n",
 								  ring->name);
@@ -3142,6 +3147,7 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 
 		ring->hangcheck.seqno = seqno;
 		ring->hangcheck.acthd = acthd;
+		ring->hangcheck.user_interrupts = user_interrupts;
 		busy_count += busy;
 	}
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 3364bcebd456..73da75fa47c1 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -90,6 +90,7 @@ struct intel_ring_hangcheck {
 	u64 acthd;
 	u64 max_acthd;
 	u32 seqno;
+	unsigned user_interrupts;
 	int score;
 	enum intel_ring_hangcheck_action action;
 	int deadlock;
@@ -328,6 +329,7 @@ struct  intel_engine_cs {
 	 * inspecting request list.
 	 */
 	u32 last_submitted_seqno;
+	unsigned user_interrupts;
 
 	bool gpu_caches_dirty;
 
-- 
2.7.0.rc3


* [PATCH 032/190] drm/i915: Remove debug noise on detecting fault-injection of missed interrupts
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (29 preceding siblings ...)
  2016-01-11  9:16 ` [PATCH 031/190] drm/i915: Harden detection of missed interrupts Chris Wilson
@ 2016-01-11  9:16 ` Chris Wilson
  2016-01-11  9:16 ` [PATCH 033/190] drm/i915: Only start retire worker when idle Chris Wilson
                   ` (55 subsequent siblings)
  86 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:16 UTC (permalink / raw)
  To: intel-gfx

Since the tests can and do explicitly check debugfs/i915_ring_missed_irqs
for the handling of a "missed interrupt", adding it to the dmesg at INFO
is just noise. When it happens for real, we still class it as an ERROR.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_irq.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index b3942dec7de4..502663f13cd8 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -3083,9 +3083,6 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 						if (!test_bit(ring->id, &dev_priv->gpu_error.test_irq_rings))
 							DRM_ERROR("Hangcheck timer elapsed... %s idle\n",
 								  ring->name);
-						else
-							DRM_INFO("Fake missed irq on %s\n",
-								 ring->name);
 
 						intel_engine_enable_fake_irq(ring);
 					}
-- 
2.7.0.rc3


* [PATCH 033/190] drm/i915: Only start retire worker when idle
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (30 preceding siblings ...)
  2016-01-11  9:16 ` [PATCH 032/190] drm/i915: Remove debug noise on detecting fault-injection " Chris Wilson
@ 2016-01-11  9:16 ` Chris Wilson
  2016-01-11  9:16 ` [PATCH 034/190] drm/i915: Do not keep postponing the idle-work Chris Wilson
                   ` (54 subsequent siblings)
  86 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:16 UTC (permalink / raw)
  To: intel-gfx

The retire worker is a low frequency task that makes sure we retire
outstanding requests if userspace is being lax. We only need to start it
once as it remains active until the GPU is idle, so do a cheap test
before the more expensive queue_work(). A consequence of this is that we
need correct locking in the worker to make the hot path of request
submission cheap. To keep the symmetry and keep hangcheck strictly bound
by the GPU's wakelock, we move the cancel_sync(hangcheck) to the idle
worker before dropping the wakelock.
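
The cheap test on the hot path amounts to the following (a trimmed
sketch of the i915_gem_mark_busy() helper added below; the full version
also takes the rpm wakeref and kicks RPS):

	static void i915_gem_mark_busy(struct drm_i915_private *dev_priv)
	{
		if (dev_priv->mm.busy)	/* worker already queued */
			return;

		queue_delayed_work(dev_priv->wq,
				   &dev_priv->mm.retire_work,
				   round_jiffies_up_relative(HZ));
		dev_priv->mm.busy = true;
	}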

v2: Guard against RCU fouling the breadcrumbs bottom-half whilst we kick
the waiter.
v3: Remove the wakeref assertion squelching (now we hold a wakeref for
the hangcheck, any rpm error there is genuine).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
References: https://bugs.freedesktop.org/show_bug.cgi?id=88437
---
 drivers/gpu/drm/i915/i915_drv.c            |  2 -
 drivers/gpu/drm/i915/i915_drv.h            |  2 +-
 drivers/gpu/drm/i915/i915_gem.c            | 83 ++++++++++++++++++++----------
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  6 +++
 drivers/gpu/drm/i915/i915_irq.c            | 16 +-----
 drivers/gpu/drm/i915/intel_display.c       | 29 -----------
 6 files changed, 66 insertions(+), 72 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 5160f1414de4..4c090f1cf69c 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1490,8 +1490,6 @@ static int intel_runtime_suspend(struct device *device)
 	i915_gem_release_all_mmaps(dev_priv);
 	mutex_unlock(&dev->struct_mutex);
 
-	cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work);
-
 	intel_guc_suspend(dev);
 
 	intel_suspend_gt_powersave(dev);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 7f021505e32f..9ec6f3e9e74d 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2987,7 +2987,7 @@ int __must_check i915_gem_set_seqno(struct drm_device *dev, u32 seqno);
 struct drm_i915_gem_request *
 i915_gem_find_active_request(struct intel_engine_cs *ring);
 
-bool i915_gem_retire_requests(struct drm_device *dev);
+void i915_gem_retire_requests(struct drm_device *dev);
 void i915_gem_retire_requests_ring(struct intel_engine_cs *ring);
 
 static inline u32 i915_reset_counter(struct i915_gpu_error *error)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 5ddb2ed0f785..3788fce136f3 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2479,6 +2479,37 @@ i915_gem_get_seqno(struct drm_device *dev, u32 *seqno)
 	return 0;
 }
 
+static void i915_gem_mark_busy(struct drm_i915_private *dev_priv)
+{
+	if (dev_priv->mm.busy)
+		return;
+
+	intel_runtime_pm_get_noresume(dev_priv);
+
+	i915_update_gfx_val(dev_priv);
+	if (INTEL_INFO(dev_priv)->gen >= 6)
+		gen6_rps_busy(dev_priv);
+
+	queue_delayed_work(dev_priv->wq,
+			   &dev_priv->mm.retire_work,
+			   round_jiffies_up_relative(HZ));
+
+	dev_priv->mm.busy = true;
+}
+
+static void i915_gem_mark_idle(struct drm_i915_private *dev_priv)
+{
+	dev_priv->mm.busy = false;
+
+	if (cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work))
+		intel_kick_waiters(dev_priv);
+
+	if (INTEL_INFO(dev_priv)->gen >= 6)
+		gen6_rps_idle(dev_priv);
+
+	intel_runtime_pm_put(dev_priv);
+}
+
 /*
  * NB: This function is not allowed to fail. Doing so would mean the
  * request is not being tracked for completion but the work itself is
@@ -2559,10 +2590,7 @@ void __i915_add_request(struct drm_i915_gem_request *request,
 
 	trace_i915_gem_request_add(request);
 
-	queue_delayed_work(dev_priv->wq,
-			   &dev_priv->mm.retire_work,
-			   round_jiffies_up_relative(HZ));
-	intel_mark_busy(dev_priv->dev);
+	i915_gem_mark_busy(dev_priv);
 
 	/* Sanity check that the reserved size was large enough. */
 	intel_ring_reserved_space_end(ringbuf);
@@ -2892,7 +2920,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 	WARN_ON(i915_verify_lists(ring->dev));
 }
 
-bool
+void
 i915_gem_retire_requests(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -2900,6 +2928,9 @@ i915_gem_retire_requests(struct drm_device *dev)
 	bool idle = true;
 	int i;
 
+	if (!dev_priv->mm.busy)
+		return;
+
 	for_each_ring(ring, dev_priv, i) {
 		i915_gem_retire_requests_ring(ring);
 		idle &= list_empty(&ring->request_list);
@@ -2918,8 +2949,6 @@ i915_gem_retire_requests(struct drm_device *dev)
 		mod_delayed_work(dev_priv->wq,
 				 &dev_priv->mm.idle_work,
 				 msecs_to_jiffies(100));
-
-	return idle;
 }
 
 static void
@@ -2928,17 +2957,21 @@ i915_gem_retire_work_handler(struct work_struct *work)
 	struct drm_i915_private *dev_priv =
 		container_of(work, typeof(*dev_priv), mm.retire_work.work);
 	struct drm_device *dev = dev_priv->dev;
-	bool idle;
 
 	/* Come back later if the device is busy... */
-	idle = false;
 	if (mutex_trylock(&dev->struct_mutex)) {
-		idle = i915_gem_retire_requests(dev);
+		i915_gem_retire_requests(dev);
 		mutex_unlock(&dev->struct_mutex);
 	}
-	if (!idle) {
+
+	/* Keep the retire handler running until we are finally idle.
+	 * We do not need to do this test under locking as in the worst-case
+	 * we queue the retire worker once too often.
+	 */
+	if (READ_ONCE(dev_priv->mm.busy)) {
 		i915_queue_hangcheck(dev_priv);
-		queue_delayed_work(dev_priv->wq, &dev_priv->mm.retire_work,
+		queue_delayed_work(dev_priv->wq,
+				   &dev_priv->mm.retire_work,
 				   round_jiffies_up_relative(HZ));
 	}
 }
@@ -2952,25 +2985,23 @@ i915_gem_idle_work_handler(struct work_struct *work)
 	struct intel_engine_cs *ring;
 	int i;
 
-	for_each_ring(ring, dev_priv, i)
-		if (!list_empty(&ring->request_list))
-			return;
+	if (!mutex_trylock(&dev->struct_mutex))
+		return;
 
-	/* we probably should sync with hangcheck here, using cancel_work_sync.
-	 * Also locking seems to be fubar here, ring->request_list is protected
-	 * by dev->struct_mutex. */
+	if (!dev_priv->mm.busy)
+		goto out;
 
-	intel_mark_idle(dev);
+	for_each_ring(ring, dev_priv, i) {
+		if (!list_empty(&ring->request_list))
+			goto out;
 
-	if (mutex_trylock(&dev->struct_mutex)) {
-		struct intel_engine_cs *ring;
-		int i;
+		i915_gem_batch_pool_fini(&ring->batch_pool);
+	}
 
-		for_each_ring(ring, dev_priv, i)
-			i915_gem_batch_pool_fini(&ring->batch_pool);
+	i915_gem_mark_idle(dev_priv);
 
-		mutex_unlock(&dev->struct_mutex);
-	}
+out:
+	mutex_unlock(&dev->struct_mutex);
 }
 
 /**
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index b8186bd061c1..da1c6fe5b40e 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1475,6 +1475,12 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		dispatch_flags |= I915_DISPATCH_RS;
 	}
 
+	/* Take a local wakeref for preparing to dispatch the execbuf as
+	 * we expect to access the hardware fairly frequently in the
+	 * process. Upon first dispatch, we acquire another prolonged
+	 * wakeref that we hold until the GPU has been idle for at least
+	 * 100ms.
+	 */
 	intel_runtime_pm_get(dev_priv);
 
 	ret = i915_mutex_lock_interruptible(dev);
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 502663f13cd8..8866e981bcba 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -3047,13 +3047,6 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 	if (!i915.enable_hangcheck)
 		return;
 
-	/*
-	 * The hangcheck work is synced during runtime suspend, we don't
-	 * require a wakeref. TODO: instead of disabling the asserts make
-	 * sure that we hold a reference when this work is running.
-	 */
-	DISABLE_RPM_WAKEREF_ASSERTS(dev_priv);
-
 	/* As enabling the GPU requires fairly extensive mmio access,
 	 * periodically arm the mmio checker to see if we are triggering
 	 * any invalid access.
@@ -3157,17 +3150,12 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 		}
 	}
 
-	if (rings_hung) {
-		i915_handle_error(dev, true, "Ring hung");
-		goto out;
-	}
+	if (rings_hung)
+		return i915_handle_error(dev, true, "Ring hung");
 
 	/* Reset timer in case GPU hangs without another request being added */
 	if (busy_count)
 		i915_queue_hangcheck(dev_priv);
-
-out:
-	ENABLE_RPM_WAKEREF_ASSERTS(dev_priv);
 }
 
 static void ibx_irq_reset(struct drm_device *dev)
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index de4d4a0d923a..8e646780c971 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -10874,35 +10874,6 @@ struct drm_display_mode *intel_crtc_mode_get(struct drm_device *dev,
 	return mode;
 }
 
-void intel_mark_busy(struct drm_device *dev)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
-
-	if (dev_priv->mm.busy)
-		return;
-
-	intel_runtime_pm_get(dev_priv);
-	i915_update_gfx_val(dev_priv);
-	if (INTEL_INFO(dev)->gen >= 6)
-		gen6_rps_busy(dev_priv);
-	dev_priv->mm.busy = true;
-}
-
-void intel_mark_idle(struct drm_device *dev)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
-
-	if (!dev_priv->mm.busy)
-		return;
-
-	dev_priv->mm.busy = false;
-
-	if (INTEL_INFO(dev)->gen >= 6)
-		gen6_rps_idle(dev->dev_private);
-
-	intel_runtime_pm_put(dev_priv);
-}
-
 static void intel_crtc_destroy(struct drm_crtc *crtc)
 {
 	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
-- 
2.7.0.rc3


* [PATCH 034/190] drm/i915: Do not keep postponing the idle-work
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (31 preceding siblings ...)
  2016-01-11  9:16 ` [PATCH 033/190] drm/i915: Only start retire worker when idle Chris Wilson
@ 2016-01-11  9:16 ` Chris Wilson
  2016-01-11  9:16 ` [PATCH 035/190] drm/i915: Remove redundant queue_delayed_work() from throttle ioctl Chris Wilson
                   ` (53 subsequent siblings)
  86 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:16 UTC (permalink / raw)
  To: intel-gfx

Rather than persistently postponing the idle-work every time somebody
calls i915_gem_retire_requests() (potentially ensuring that we never
reach the idle state), queue the work the first time we detect all
requests are complete. Then if in 100ms, more requests have been queued,
we will abort the idle-worker and wait again until all the new requests
have been completed.
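
The patch rests on the semantic difference between the two workqueue
calls (a hedged summary of the core API, illustrated with the call from
the diff below):

	/* mod_delayed_work() re-arms the timer on every call, so each
	 * retire pushes the idle work another 100ms into the future.
	 * queue_delayed_work() is a no-op whilst the work is still
	 * pending, so the countdown started on the first all-idle
	 * detection runs to completion undisturbed.
	 */
	queue_delayed_work(dev_priv->wq,
			   &dev_priv->mm.idle_work,
			   msecs_to_jiffies(100));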

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 3788fce136f3..efd46adb978b 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2946,9 +2946,9 @@ i915_gem_retire_requests(struct drm_device *dev)
 	}
 
 	if (idle)
-		mod_delayed_work(dev_priv->wq,
-				 &dev_priv->mm.idle_work,
-				 msecs_to_jiffies(100));
+		queue_delayed_work(dev_priv->wq,
+				   &dev_priv->mm.idle_work,
+				   msecs_to_jiffies(100));
 }
 
 static void
-- 
2.7.0.rc3


* [PATCH 035/190] drm/i915: Remove redundant queue_delayed_work() from throttle ioctl
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (32 preceding siblings ...)
  2016-01-11  9:16 ` [PATCH 034/190] drm/i915: Do not keep postponing the idle-work Chris Wilson
@ 2016-01-11  9:16 ` Chris Wilson
  2016-01-11  9:16 ` [PATCH 036/190] drm/i915: Restore waitboost credit to the synchronous waiter Chris Wilson
                   ` (52 subsequent siblings)
  86 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:16 UTC (permalink / raw)
  To: intel-gfx

We know, by design, that whilst the GPU is active (and thus we are
throttling) the retire_worker is queued. Therefore attempting to requeue
it with queue_delayed_work() is a no-op and we can safely remove it.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index efd46adb978b..e9f5ca7ea835 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4116,9 +4116,6 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
 		return 0;
 
 	ret = __i915_wait_request(target, true, NULL, NULL);
-	if (ret == 0)
-		queue_delayed_work(dev_priv->wq, &dev_priv->mm.retire_work, 0);
-
 	i915_gem_request_unreference__unlocked(target);
 
 	return ret;
-- 
2.7.0.rc3


* [PATCH 036/190] drm/i915: Restore waitboost credit to the synchronous waiter
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (33 preceding siblings ...)
  2016-01-11  9:16 ` [PATCH 035/190] drm/i915: Remove redundant queue_delayed_work() from throttle ioctl Chris Wilson
@ 2016-01-11  9:16 ` Chris Wilson
  2016-01-11 16:10   ` Jesse Barnes
  2016-01-11  9:16 ` [PATCH 037/190] drm/i915: Add background commentary to "waitboosting" Chris Wilson
                   ` (51 subsequent siblings)
  86 siblings, 1 reply; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:16 UTC (permalink / raw)
  To: intel-gfx

Ideally, we want to automagically have the GPU respond to the
instantaneous load by reclocking itself. However, reclocking occurs
relatively slowly, and to the client waiting for a result from the GPU,
too late. To compensate and reduce the client latency, we allow the
first wait from a client to boost the GPU clocks to maximum. This
overcomes the lag in autoreclocking, at the expense of forcing the GPU
clocks too high. So to offset the excessive power usage, we currently
allow a client to only boost the clocks once before we detect the GPU
is idle again. This works reasonably for say the first frame in a
benchmark, but for many more synchronous workloads (like OpenCL) we find
the GPU clocks remain too low. By noting a wait which would idle the GPU
(i.e. we just waited upon the last known request), we can give that
client the idle boost credit (for their next wait) without the 100ms
delay required for us to detect the GPU idle state. The intention is to
boost clients that are stalling in the process of feeding the GPU more
work (and who in doing so let the GPU idle), without granting boost
credits to clients that are throttling themselves (such as compositors).
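
The heuristic reduces to a single test at the end of the wait (a sketch
mirroring the hunk below):

	if (ret == 0 && rps &&
	    req->seqno == req->ring->last_submitted_seqno) {
		/* we completed a wait on the very last request
		 * submitted to this ring, so the GPU idled on this
		 * client's account; unlinking the client re-arms its
		 * waitboost immediately, without the ~100ms wait for
		 * the idle worker to do the same */
		spin_lock(&req->i915->rps.client_lock);
		list_del_init(&rps->link);
		spin_unlock(&req->i915->rps.client_lock);
	}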

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: "Zou, Nanhai" <nanhai.zou@intel.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
---
 drivers/gpu/drm/i915/i915_gem.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index e9f5ca7ea835..3fea582768e9 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1314,6 +1314,22 @@ complete:
 			*timeout = 0;
 	}
 
+	if (ret == 0 && rps && req->seqno == req->ring->last_submitted_seqno) {
+		/* The GPU is now idle and this client has stalled.
+		 * Since no other client has submitted a request in the
+		 * meantime, assume that this client is the only one
+		 * supplying work to the GPU but is unable to keep that
+		 * work supplied because it is waiting. Since the GPU is
+		 * then never kept fully busy, RPS autoclocking will
+		 * keep the clocks relatively low, causing further delays.
+		 * Compensate by giving the synchronous client credit for
+		 * a waitboost next time.
+		 */
+		spin_lock(&req->i915->rps.client_lock);
+		list_del_init(&rps->link);
+		spin_unlock(&req->i915->rps.client_lock);
+	}
+
 	return ret;
 }
 
-- 
2.7.0.rc3


* [PATCH 037/190] drm/i915: Add background commentary to "waitboosting"
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (34 preceding siblings ...)
  2016-01-11  9:16 ` [PATCH 036/190] drm/i915: Restore waitboost credit to the synchronous waiter Chris Wilson
@ 2016-01-11  9:16 ` Chris Wilson
  2016-01-11  9:16 ` [PATCH 038/190] drm/i915: Flush the RPS bottom-half when the GPU idles Chris Wilson
                   ` (50 subsequent siblings)
  86 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:16 UTC (permalink / raw)
  To: intel-gfx

Describe the intent of boosting the GPU frequency to maximum before
waiting on the GPU.

RPS waitboosting was introduced with

commit b29c19b645287f7062e17d70fa4e9781a01a5d88
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Sep 25 17:34:56 2013 +0100

    drm/i915: Boost RPS frequency for CPU stalls

but lacked a concise comment in the code to explain itself.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_gem.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 3fea582768e9..3948e85eaa48 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1244,6 +1244,22 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 	}
 
 	trace_i915_gem_request_wait_begin(req);
+
+	/* This client is about to stall waiting for the GPU. In many cases
+	 * this is undesirable and limits the throughput of the system, as
+	 * many clients cannot continue processing user input/output whilst
+	 * blocked. RPS autotuning may take tens of milliseconds to respond
+	 * to the GPU load and thus incurs additional latency for the client.
+	 * We can circumvent that by promoting the GPU frequency to maximum
+	 * before we wait. This makes the GPU throttle up much more quickly
+	 * (good for benchmarks and user experience, e.g. window animations),
+	 * but at a cost of spending more power processing the workload
+	 * (bad for battery). Not all clients even want their results
+	 * immediately and for them we should just let the GPU select its own
+	 * frequency to maximise efficiency. To prevent a single client from
+	 * forcing the clocks too high for the whole system, we only allow
+	 * each client to waitboost once in a busy period.
+	 */
 	if (INTEL_INFO(req->i915)->gen >= 6)
 		gen6_rps_boost(req->i915, rps, req->emitted_jiffies);
 
-- 
2.7.0.rc3


* [PATCH 038/190] drm/i915: Flush the RPS bottom-half when the GPU idles
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (35 preceding siblings ...)
  2016-01-11  9:16 ` [PATCH 037/190] drm/i915: Add background commentary to "waitboosting" Chris Wilson
@ 2016-01-11  9:16 ` Chris Wilson
  2016-01-11  9:16 ` [PATCH 039/190] drm/i915: Remove stop-rings debugfs interface Chris Wilson
                   ` (49 subsequent siblings)
  86 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:16 UTC (permalink / raw)
  To: intel-gfx

Make sure that the RPS bottom-half is flushed before we set the idle
frequency when we decide the GPU is idle. This should prevent any races
with the bottom-half and setting the idle frequency, and ensures that
the bottom-half is bounded by the GPU's rpm reference taken for when it
is active (i.e. between gen6_rps_busy() and gen6_rps_idle()).
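
The resulting idle-path ordering looks like this (a trimmed sketch of
gen6_rps_idle() as modified below; the VLV/CHV branch is omitted):

	void gen6_rps_idle(struct drm_i915_private *dev_priv)
	{
		/* mask the interrupt, synchronize_irq(), then
		 * cancel_work_sync() the bottom-half, so that setting
		 * the idle frequency cannot race with the worker */
		gen6_disable_rps_interrupts(dev_priv);

		mutex_lock(&dev_priv->rps.hw_lock);
		if (dev_priv->rps.enabled)
			gen6_set_rps(dev_priv->dev, dev_priv->rps.idle_freq);
		mutex_unlock(&dev_priv->rps.hw_lock);
	}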

v2: Avoid recursively using the i915->wq - RPS does not touch the
struct_mutex so has no place being on the ordered i915->wq.
v3: Enable/disable interrupts for RPS busy/idle in order to prevent
further HW access from RPS outside of the wakeref.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
---
 drivers/gpu/drm/i915/i915_drv.c      |  1 -
 drivers/gpu/drm/i915/i915_irq.c      | 45 +++++++++++++++---------------------
 drivers/gpu/drm/i915/intel_display.c |  1 +
 drivers/gpu/drm/i915/intel_drv.h     |  6 ++---
 drivers/gpu/drm/i915/intel_pm.c      | 23 +++++++++---------
 5 files changed, 34 insertions(+), 42 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 4c090f1cf69c..442e1217e442 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1492,7 +1492,6 @@ static int intel_runtime_suspend(struct device *device)
 
 	intel_guc_suspend(dev);
 
-	intel_suspend_gt_powersave(dev);
 	intel_runtime_pm_disable_interrupts(dev_priv);
 
 	ret = intel_suspend_complete(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 8866e981bcba..d9757d227c86 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -336,9 +336,8 @@ void gen6_disable_pm_irq(struct drm_i915_private *dev_priv, uint32_t mask)
 	__gen6_disable_pm_irq(dev_priv, mask);
 }
 
-void gen6_reset_rps_interrupts(struct drm_device *dev)
+void gen6_reset_rps_interrupts(struct drm_i915_private *dev_priv)
 {
-	struct drm_i915_private *dev_priv = dev->dev_private;
 	i915_reg_t reg = gen6_pm_iir(dev_priv);
 
 	spin_lock_irq(&dev_priv->irq_lock);
@@ -349,14 +348,14 @@ void gen6_reset_rps_interrupts(struct drm_device *dev)
 	spin_unlock_irq(&dev_priv->irq_lock);
 }
 
-void gen6_enable_rps_interrupts(struct drm_device *dev)
+void gen6_enable_rps_interrupts(struct drm_i915_private *dev_priv)
 {
-	struct drm_i915_private *dev_priv = dev->dev_private;
+	if (dev_priv->rps.interrupts_enabled)
+		return;
 
 	spin_lock_irq(&dev_priv->irq_lock);
-
-	WARN_ON(dev_priv->rps.pm_iir);
-	WARN_ON(I915_READ(gen6_pm_iir(dev_priv)) & dev_priv->pm_rps_events);
+	WARN_ON_ONCE(dev_priv->rps.pm_iir);
+	WARN_ON_ONCE(I915_READ(gen6_pm_iir(dev_priv)) & dev_priv->pm_rps_events);
 	dev_priv->rps.interrupts_enabled = true;
 	I915_WRITE(gen6_pm_ier(dev_priv), I915_READ(gen6_pm_ier(dev_priv)) |
 				dev_priv->pm_rps_events);
@@ -382,17 +381,13 @@ u32 gen6_sanitize_rps_pm_mask(struct drm_i915_private *dev_priv, u32 mask)
 	return mask;
 }
 
-void gen6_disable_rps_interrupts(struct drm_device *dev)
+void gen6_disable_rps_interrupts(struct drm_i915_private *dev_priv)
 {
-	struct drm_i915_private *dev_priv = dev->dev_private;
+	if (!dev_priv->rps.interrupts_enabled)
+		return;
 
 	spin_lock_irq(&dev_priv->irq_lock);
 	dev_priv->rps.interrupts_enabled = false;
-	spin_unlock_irq(&dev_priv->irq_lock);
-
-	cancel_work_sync(&dev_priv->rps.work);
-
-	spin_lock_irq(&dev_priv->irq_lock);
 
 	I915_WRITE(GEN6_PMINTRMSK, gen6_sanitize_rps_pm_mask(dev_priv, ~0));
 
@@ -401,8 +396,15 @@ void gen6_disable_rps_interrupts(struct drm_device *dev)
 				~dev_priv->pm_rps_events);
 
 	spin_unlock_irq(&dev_priv->irq_lock);
+	synchronize_irq(dev_priv->dev->irq);
 
-	synchronize_irq(dev->irq);
+	/* Now that we will not be generating any more work, flush any
+	 * outstanding tasks. As we are called on the RPS idle path,
+	 * we will reset the GPU to minimum frequencies, so the current
+	 * state of the worker can be discarded.
+	 */
+	cancel_work_sync(&dev_priv->rps.work);
+	gen6_reset_rps_interrupts(dev_priv);
 }
 
 /**
@@ -1103,13 +1105,6 @@ static void gen6_pm_rps_work(struct work_struct *work)
 		return;
 	}
 
-	/*
-	 * The RPS work is synced during runtime suspend, we don't require a
-	 * wakeref. TODO: instead of disabling the asserts make sure that we
-	 * always hold an RPM reference while the work is running.
-	 */
-	DISABLE_RPM_WAKEREF_ASSERTS(dev_priv);
-
 	pm_iir = dev_priv->rps.pm_iir;
 	dev_priv->rps.pm_iir = 0;
 	/* Make sure not to corrupt PMIMR state used by ringbuffer on GEN6 */
@@ -1122,7 +1117,7 @@ static void gen6_pm_rps_work(struct work_struct *work)
 	WARN_ON(pm_iir & ~dev_priv->pm_rps_events);
 
 	if ((pm_iir & dev_priv->pm_rps_events) == 0 && !client_boost)
-		goto out;
+		return;
 
 	mutex_lock(&dev_priv->rps.hw_lock);
 
@@ -1177,8 +1172,6 @@ static void gen6_pm_rps_work(struct work_struct *work)
 	intel_set_rps(dev_priv->dev, new_delay);
 
 	mutex_unlock(&dev_priv->rps.hw_lock);
-out:
-	ENABLE_RPM_WAKEREF_ASSERTS(dev_priv);
 }
 
 
@@ -1618,7 +1611,7 @@ static void gen6_rps_irq_handler(struct drm_i915_private *dev_priv, u32 pm_iir)
 		gen6_disable_pm_irq(dev_priv, pm_iir & dev_priv->pm_rps_events);
 		if (dev_priv->rps.interrupts_enabled) {
 			dev_priv->rps.pm_iir |= pm_iir & dev_priv->pm_rps_events;
-			queue_work(dev_priv->wq, &dev_priv->rps.work);
+			schedule_work(&dev_priv->rps.work);
 		}
 		spin_unlock(&dev_priv->irq_lock);
 	}
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 8e646780c971..57c54c9bc82b 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -16069,6 +16069,7 @@ void intel_modeset_cleanup(struct drm_device *dev)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_connector *connector;
 
+	intel_suspend_gt_powersave(dev);
 	intel_disable_gt_powersave(dev);
 
 	intel_backlight_unregister(dev);
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index bdfe4035e074..1e082ab4f4d8 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -998,9 +998,9 @@ void gen5_enable_gt_irq(struct drm_i915_private *dev_priv, uint32_t mask);
 void gen5_disable_gt_irq(struct drm_i915_private *dev_priv, uint32_t mask);
 void gen6_enable_pm_irq(struct drm_i915_private *dev_priv, uint32_t mask);
 void gen6_disable_pm_irq(struct drm_i915_private *dev_priv, uint32_t mask);
-void gen6_reset_rps_interrupts(struct drm_device *dev);
-void gen6_enable_rps_interrupts(struct drm_device *dev);
-void gen6_disable_rps_interrupts(struct drm_device *dev);
+void gen6_reset_rps_interrupts(struct drm_i915_private *dev_priv);
+void gen6_enable_rps_interrupts(struct drm_i915_private *dev_priv);
+void gen6_disable_rps_interrupts(struct drm_i915_private *dev_priv);
 u32 gen6_sanitize_rps_pm_mask(struct drm_i915_private *dev_priv, u32 mask);
 void intel_runtime_pm_disable_interrupts(struct drm_i915_private *dev_priv);
 void intel_runtime_pm_enable_interrupts(struct drm_i915_private *dev_priv);
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 401c3770057d..e51ba529a97e 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -4475,17 +4475,24 @@ void gen6_rps_busy(struct drm_i915_private *dev_priv)
 			gen6_rps_reset_ei(dev_priv);
 		I915_WRITE(GEN6_PMINTRMSK,
 			   gen6_rps_pm_mask(dev_priv, dev_priv->rps.cur_freq));
+
+		gen6_enable_rps_interrupts(dev_priv);
 	}
 	mutex_unlock(&dev_priv->rps.hw_lock);
 }
 
 void gen6_rps_idle(struct drm_i915_private *dev_priv)
 {
-	struct drm_device *dev = dev_priv->dev;
+	/* Flush our bottom-half so that it does not race with us
+	 * setting the idle frequency and so that it is bounded by
+	 * our rpm wakeref. And then disable the interrupts to stop any
+	 * further RPS reclocking whilst we are asleep.
+	 */
+	gen6_disable_rps_interrupts(dev_priv);
 
 	mutex_lock(&dev_priv->rps.hw_lock);
 	if (dev_priv->rps.enabled) {
-		if (IS_VALLEYVIEW(dev) || IS_CHERRYVIEW(dev))
+		if (IS_VALLEYVIEW(dev_priv) || IS_CHERRYVIEW(dev_priv))
 			vlv_set_rps_idle(dev_priv);
 		else
 			gen6_set_rps(dev_priv->dev, dev_priv->rps.idle_freq);
@@ -4523,7 +4530,7 @@ void gen6_rps_boost(struct drm_i915_private *dev_priv,
 		spin_lock_irq(&dev_priv->irq_lock);
 		if (dev_priv->rps.interrupts_enabled) {
 			dev_priv->rps.client_boost = true;
-			queue_work(dev_priv->wq, &dev_priv->rps.work);
+			schedule_work(&dev_priv->rps.work);
 		}
 		spin_unlock_irq(&dev_priv->irq_lock);
 
@@ -6129,8 +6136,6 @@ static void gen6_suspend_rps(struct drm_device *dev)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
 	flush_delayed_work(&dev_priv->rps.delayed_resume_work);
-
-	gen6_disable_rps_interrupts(dev);
 }
 
 /**
@@ -6161,8 +6166,6 @@ void intel_disable_gt_powersave(struct drm_device *dev)
 	if (IS_IRONLAKE_M(dev)) {
 		ironlake_disable_drps(dev);
 	} else if (INTEL_INFO(dev)->gen >= 6) {
-		intel_suspend_gt_powersave(dev);
-
 		mutex_lock(&dev_priv->rps.hw_lock);
 		if (INTEL_INFO(dev)->gen >= 9)
 			gen9_disable_rps(dev);
@@ -6186,8 +6189,7 @@ static void intel_gen6_powersave_work(struct work_struct *work)
 	struct drm_device *dev = dev_priv->dev;
 
 	mutex_lock(&dev_priv->rps.hw_lock);
-
-	gen6_reset_rps_interrupts(dev);
+	gen6_reset_rps_interrupts(dev_priv);
 
 	if (IS_CHERRYVIEW(dev)) {
 		cherryview_enable_rps(dev);
@@ -6213,9 +6215,6 @@ static void intel_gen6_powersave_work(struct work_struct *work)
 	WARN_ON(dev_priv->rps.efficient_freq > dev_priv->rps.max_freq);
 
 	dev_priv->rps.enabled = true;
-
-	gen6_enable_rps_interrupts(dev);
-
 	mutex_unlock(&dev_priv->rps.hw_lock);
 
 	intel_runtime_pm_put(dev_priv);
-- 
2.7.0.rc3


* [PATCH 039/190] drm/i915: Remove stop-rings debugfs interface
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (36 preceding siblings ...)
  2016-01-11  9:16 ` [PATCH 038/190] drm/i915: Flush the RPS bottom-half when the GPU idles Chris Wilson
@ 2016-01-11  9:16 ` Chris Wilson
  2016-02-25 17:30   ` Arun Siluvery
  2016-01-11  9:16 ` [PATCH 040/190] drm/i915: Record the ringbuffer associated with the request Chris Wilson
                   ` (48 subsequent siblings)
  86 siblings, 1 reply; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:16 UTC (permalink / raw)
  To: intel-gfx

Now that we have (near) universal GPU recovery code, we can inject a
real hang from userspace and not need any fakery. Not only does this
mean that the testing is far more realistic, but we can simplify the
kernel in the process.

v2: Replace the i915_stop_rings with a dummy implementation as igt
codified its existence until we can release an update.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c     | 19 +------------------
 drivers/gpu/drm/i915/i915_drv.c         | 17 ++---------------
 drivers/gpu/drm/i915/i915_drv.h         | 19 -------------------
 drivers/gpu/drm/i915/i915_gem.c         | 13 +++----------
 drivers/gpu/drm/i915/intel_lrc.c        |  5 -----
 drivers/gpu/drm/i915/intel_ringbuffer.c |  8 --------
 drivers/gpu/drm/i915/intel_ringbuffer.h |  1 -
 7 files changed, 6 insertions(+), 76 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 567f8db4c70a..6172649b7e56 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -4752,30 +4752,13 @@ DEFINE_SIMPLE_ATTRIBUTE(i915_wedged_fops,
 static int
 i915_ring_stop_get(void *data, u64 *val)
 {
-	struct drm_device *dev = data;
-	struct drm_i915_private *dev_priv = dev->dev_private;
-
-	*val = dev_priv->gpu_error.stop_rings;
-
+	*val = 0;
 	return 0;
 }
 
 static int
 i915_ring_stop_set(void *data, u64 val)
 {
-	struct drm_device *dev = data;
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	int ret;
-
-	DRM_DEBUG_DRIVER("Stopping rings 0x%08llx\n", val);
-
-	ret = mutex_lock_interruptible(&dev->struct_mutex);
-	if (ret)
-		return ret;
-
-	dev_priv->gpu_error.stop_rings = val;
-	mutex_unlock(&dev->struct_mutex);
-
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 442e1217e442..e9f85fd0542f 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -891,24 +891,11 @@ int i915_reset(struct drm_device *dev)
 		goto error;
 	}
 
+	pr_notice("drm/i915: Resetting chip after gpu hang\n");
+
 	i915_gem_reset(dev);
 
 	ret = intel_gpu_reset(dev);
-
-	/* Also reset the gpu hangman. */
-	if (error->stop_rings != 0) {
-		DRM_INFO("Simulated gpu hang, resetting stop_rings\n");
-		error->stop_rings = 0;
-		if (ret == -ENODEV) {
-			DRM_INFO("Reset not implemented, but ignoring "
-				 "error for simulated gpu hangs\n");
-			ret = 0;
-		}
-	}
-
-	if (i915_stop_ring_allow_warn(dev_priv))
-		pr_notice("drm/i915: Resetting chip after gpu hang\n");
-
 	if (ret) {
 		if (ret != -ENODEV)
 			DRM_ERROR("Failed to reset chip: %i\n", ret);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 9ec6f3e9e74d..c3b795f1566b 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1371,13 +1371,6 @@ struct i915_gpu_error {
 	 */
 	wait_queue_head_t reset_queue;
 
-	/* Userspace knobs for gpu hang simulation;
-	 * combines both a ring mask, and extra flags
-	 */
-	u32 stop_rings;
-#define I915_STOP_RING_ALLOW_BAN       (1 << 31)
-#define I915_STOP_RING_ALLOW_WARN      (1 << 30)
-
 	/* For missed irq/seqno simulation. */
 	unsigned long test_irq_rings;
 };
@@ -3030,18 +3023,6 @@ static inline u32 i915_reset_count(struct i915_gpu_error *error)
 	return ((i915_reset_counter(error) & ~I915_WEDGED) + 1) / 2;
 }
 
-static inline bool i915_stop_ring_allow_ban(struct drm_i915_private *dev_priv)
-{
-	return dev_priv->gpu_error.stop_rings == 0 ||
-		dev_priv->gpu_error.stop_rings & I915_STOP_RING_ALLOW_BAN;
-}
-
-static inline bool i915_stop_ring_allow_warn(struct drm_i915_private *dev_priv)
-{
-	return dev_priv->gpu_error.stop_rings == 0 ||
-		dev_priv->gpu_error.stop_rings & I915_STOP_RING_ALLOW_WARN;
-}
-
 void i915_gem_reset(struct drm_device *dev);
 bool i915_gem_clflush_object(struct drm_i915_gem_object *obj, bool force);
 int __must_check i915_gem_init(struct drm_device *dev);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 3948e85eaa48..ea9344503bf6 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2633,21 +2633,14 @@ static bool i915_context_is_banned(struct drm_i915_private *dev_priv,
 {
 	unsigned long elapsed;
 
-	elapsed = get_seconds() - ctx->hang_stats.guilty_ts;
-
 	if (ctx->hang_stats.banned)
 		return true;
 
+	elapsed = get_seconds() - ctx->hang_stats.guilty_ts;
 	if (ctx->hang_stats.ban_period_seconds &&
 	    elapsed <= ctx->hang_stats.ban_period_seconds) {
-		if (!i915_gem_context_is_default(ctx)) {
-			DRM_DEBUG("context hanging too fast, banning!\n");
-			return true;
-		} else if (i915_stop_ring_allow_ban(dev_priv)) {
-			if (i915_stop_ring_allow_warn(dev_priv))
-				DRM_ERROR("gpu hanging too fast, banning!\n");
-			return true;
-		}
+		DRM_DEBUG("context hanging too fast, banning!\n");
+		return true;
 	}
 
 	return false;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index b1ede2e9b372..b634e7d7a92b 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -756,16 +756,11 @@ static int logical_ring_wait_for_space(struct drm_i915_gem_request *req,
 static void
 intel_logical_ring_advance_and_submit(struct drm_i915_gem_request *request)
 {
-	struct intel_engine_cs *ring = request->ring;
 	struct drm_i915_private *dev_priv = request->i915;
 
 	intel_logical_ring_advance(request->ringbuf);
-
 	request->tail = request->ringbuf->tail;
 
-	if (intel_ring_stopped(ring))
-		return;
-
 	if (dev_priv->guc.execbuf_client)
 		i915_guc_submit(dev_priv->guc.execbuf_client, request);
 	else
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 5625f56a2db1..d9bb6458fa60 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -59,18 +59,10 @@ int intel_ring_space(struct intel_ringbuffer *ringbuf)
 	return ringbuf->space;
 }
 
-bool intel_ring_stopped(struct intel_engine_cs *ring)
-{
-	struct drm_i915_private *dev_priv = ring->dev->dev_private;
-	return dev_priv->gpu_error.stop_rings & intel_ring_flag(ring);
-}
-
 static void __intel_ring_advance(struct intel_engine_cs *ring)
 {
 	struct intel_ringbuffer *ringbuf = ring->buffer;
 	ringbuf->tail &= ringbuf->size - 1;
-	if (intel_ring_stopped(ring))
-		return;
 	ring->write_tail(ring, ringbuf->tail);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 73da75fa47c1..eecf9c7ae2b8 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -487,7 +487,6 @@ static inline void intel_ring_advance(struct intel_engine_cs *ring)
 int __intel_ring_space(int head, int tail, int size);
 void intel_ring_update_space(struct intel_ringbuffer *ringbuf);
 int intel_ring_space(struct intel_ringbuffer *ringbuf);
-bool intel_ring_stopped(struct intel_engine_cs *ring);
 
 int __must_check intel_ring_idle(struct intel_engine_cs *ring);
 void intel_ring_init_seqno(struct intel_engine_cs *ring, u32 seqno);
-- 
2.7.0.rc3


* [PATCH 040/190] drm/i915: Record the ringbuffer associated with the request
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (37 preceding siblings ...)
  2016-01-11  9:16 ` [PATCH 039/190] drm/i915: Remove stop-rings debugfs interface Chris Wilson
@ 2016-01-11  9:16 ` Chris Wilson
  2016-01-11  9:16 ` [PATCH 041/190] drm/i915: Allow userspace to request no-error-capture upon GPU hangs Chris Wilson
                   ` (47 subsequent siblings)
  86 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:16 UTC (permalink / raw)
  To: intel-gfx

The request tells us where to read the ringbuf from, so use that
information to simplify the error capture. If no request was active at
the time of the hang, the ring is idle and there is no information
inside the ring pertaining to the hang.

Note carefully that this will reduce the amount of information stored in
the error state - any ring without an active request will not be
recorded.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
---
 drivers/gpu/drm/i915/i915_gpu_error.c | 28 ++++++++--------------------
 1 file changed, 8 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 3e137fc701cf..93da2c7581f6 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -995,7 +995,6 @@ static void i915_gem_record_rings(struct drm_device *dev,
 
 	for (i = 0; i < I915_NUM_RINGS; i++) {
 		struct intel_engine_cs *ring = &dev_priv->ring[i];
-		struct intel_ringbuffer *rbuf;
 
 		error->ring[i].pid = -1;
 
@@ -1009,6 +1008,7 @@ static void i915_gem_record_rings(struct drm_device *dev,
 		request = i915_gem_find_active_request(ring);
 		if (request) {
 			struct i915_address_space *vm;
+			struct intel_ringbuffer *rb;
 
 			vm = request->ctx && request->ctx->ppgtt ?
 				&request->ctx->ppgtt->base :
@@ -1039,26 +1039,14 @@ static void i915_gem_record_rings(struct drm_device *dev,
 				}
 				rcu_read_unlock();
 			}
-		}
 
-		if (i915.enable_execlists) {
-			/* TODO: This is only a small fix to keep basic error
-			 * capture working, but we need to add more information
-			 * for it to be useful (e.g. dump the context being
-			 * executed).
-			 */
-			if (request)
-				rbuf = request->ctx->engine[ring->id].ringbuf;
-			else
-				rbuf = ring->default_context->engine[ring->id].ringbuf;
-		} else
-			rbuf = ring->buffer;
-
-		error->ring[i].cpu_ring_head = rbuf->head;
-		error->ring[i].cpu_ring_tail = rbuf->tail;
-
-		error->ring[i].ringbuffer =
-			i915_error_ggtt_object_create(dev_priv, rbuf->obj);
+			rb = request->ringbuf;
+			error->ring[i].cpu_ring_head = rb->head;
+			error->ring[i].cpu_ring_tail = rb->tail;
+			error->ring[i].ringbuffer =
+				i915_error_ggtt_object_create(dev_priv,
+							      rb->obj);
+		}
 
 		error->ring[i].hws_page =
 			i915_error_ggtt_object_create(dev_priv, ring->status_page.obj);
-- 
2.7.0.rc3


* [PATCH 041/190] drm/i915: Allow userspace to request no-error-capture upon GPU hangs
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (38 preceding siblings ...)
  2016-01-11  9:16 ` [PATCH 040/190] drm/i915: Record the ringbuffer associated with the request Chris Wilson
@ 2016-01-11  9:16 ` Chris Wilson
  2016-01-11  9:16 ` [PATCH 042/190] drm/i915: Clean up GPU hang message Chris Wilson
                   ` (46 subsequent siblings)
  86 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:16 UTC (permalink / raw)
  To: intel-gfx

igt likes to inject GPU hangs into its command streams. However, as we
expect these hangs, we don't actually want them recorded in the dmesg
output or stored in the i915_error_state (usually). To accommodate this,
allow userspace to set a flag on the context that any hang emanating
from that context will not be recorded. We still do the error capture
(otherwise how do we find the guilty context and know its intent?) as
part of the reason for random GPU hang injection is to exercise the race
conditions between the error capture and normal execution.
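
From userspace the opt-out is a single context parameter; a hedged
sketch of how a test might use it (the ioctl and struct are the
existing context-param interface from i915_drm.h, extended by this
patch; the fd and ctx_id handling is illustrative):

	#include <sys/ioctl.h>
	#include <drm/i915_drm.h>

	static int context_set_no_error_capture(int fd, __u32 ctx_id)
	{
		struct drm_i915_gem_context_param p = {
			.ctx_id = ctx_id,
			/* .size is left 0, as required by the kernel */
			.param = I915_CONTEXT_PARAM_NO_ERROR_CAPTURE,
			.value = 1, /* hangs from this context are expected */
		};

		return ioctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &p);
	}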

v2: Split out the request->ringbuf error capture changes.
v3: Move the flag defines next to the intel_context->flags definition

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h         |  7 +++++--
 drivers/gpu/drm/i915/i915_gem_context.c | 13 +++++++++++++
 drivers/gpu/drm/i915/i915_gpu_error.c   | 14 +++++++++-----
 include/uapi/drm/i915_drm.h             |  1 +
 4 files changed, 28 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index c3b795f1566b..57e450e25ad6 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -486,6 +486,7 @@ struct drm_i915_error_state {
 	struct timeval time;
 
 	char error_msg[128];
+	bool simulated;
 	int iommu;
 	u32 reset_count;
 	u32 suspend_count;
@@ -842,7 +843,6 @@ struct i915_ctx_hang_stats {
 /* This must match up with the value previously used for execbuf2.rsvd1. */
 #define DEFAULT_CONTEXT_HANDLE 0
 
-#define CONTEXT_NO_ZEROMAP (1<<0)
 /**
  * struct intel_context - as the name implies, represents a context.
  * @ref: reference count.
@@ -867,11 +867,14 @@ struct intel_context {
 	int user_handle;
 	uint8_t remap_slice;
 	struct drm_i915_private *i915;
-	int flags;
 	struct drm_i915_file_private *file_priv;
 	struct i915_ctx_hang_stats hang_stats;
 	struct i915_hw_ppgtt *ppgtt;
 
+	unsigned flags;
+#define CONTEXT_NO_ZEROMAP		(1<<0)
+#define CONTEXT_NO_ERROR_CAPTURE	(1<<1)
+
 	/* Legacy ring buffer submission */
 	struct {
 		struct drm_i915_gem_object *rcs_state;
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index e5e9a8918f19..0aea5ccf6d68 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -939,6 +939,9 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 		else
 			args->value = to_i915(dev)->gtt.base.total;
 		break;
+	case I915_CONTEXT_PARAM_NO_ERROR_CAPTURE:
+		args->value = !!(ctx->flags & CONTEXT_NO_ERROR_CAPTURE);
+		break;
 	default:
 		ret = -EINVAL;
 		break;
@@ -984,6 +987,16 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 			ctx->flags |= args->value ? CONTEXT_NO_ZEROMAP : 0;
 		}
 		break;
+	case I915_CONTEXT_PARAM_NO_ERROR_CAPTURE:
+		if (args->size) {
+			ret = -EINVAL;
+		} else {
+			if (args->value)
+				ctx->flags |= CONTEXT_NO_ERROR_CAPTURE;
+			else
+				ctx->flags &= ~CONTEXT_NO_ERROR_CAPTURE;
+		}
+		break;
 	default:
 		ret = -EINVAL;
 		break;
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 93da2c7581f6..4f17d6847569 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1040,6 +1040,8 @@ static void i915_gem_record_rings(struct drm_device *dev,
 				rcu_read_unlock();
 			}
 
+			error->simulated |= request->ctx->flags & CONTEXT_NO_ERROR_CAPTURE;
+
 			rb = request->ringbuf;
 			error->ring[i].cpu_ring_head = rb->head;
 			error->ring[i].cpu_ring_tail = rb->tail;
@@ -1333,12 +1335,14 @@ void i915_capture_error_state(struct drm_device *dev, bool wedged,
 	i915_error_capture_msg(dev, error, wedged, error_msg);
 	DRM_INFO("%s\n", error->error_msg);
 
-	spin_lock_irqsave(&dev_priv->gpu_error.lock, flags);
-	if (dev_priv->gpu_error.first_error == NULL) {
-		dev_priv->gpu_error.first_error = error;
-		error = NULL;
+	if (!error->simulated) {
+		spin_lock_irqsave(&dev_priv->gpu_error.lock, flags);
+		if (dev_priv->gpu_error.first_error == NULL) {
+			dev_priv->gpu_error.first_error = error;
+			error = NULL;
+		}
+		spin_unlock_irqrestore(&dev_priv->gpu_error.lock, flags);
 	}
-	spin_unlock_irqrestore(&dev_priv->gpu_error.lock, flags);
 
 	if (error) {
 		i915_error_state_free(&error->ref);
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index acf21026c78a..7fee4416dcc7 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1140,6 +1140,7 @@ struct drm_i915_gem_context_param {
 #define I915_CONTEXT_PARAM_BAN_PERIOD	0x1
 #define I915_CONTEXT_PARAM_NO_ZEROMAP	0x2
 #define I915_CONTEXT_PARAM_GTT_SIZE	0x3
+#define I915_CONTEXT_PARAM_NO_ERROR_CAPTURE	0x4
 	__u64 value;
 };
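
For reference, a minimal userspace sketch of opting a context out of
error capture. This is illustrative only: it assumes libdrm's
drmIoctl(), an open device fd and a previously created context handle
ctx_id, none of which are part of this patch.

	#include <xf86drm.h>
	#include <drm/i915_drm.h>

	struct drm_i915_gem_context_param p = {
		.ctx_id = ctx_id,	/* assumed pre-created context */
		.size = 0,		/* must be 0, else setparam returns -EINVAL */
		.param = I915_CONTEXT_PARAM_NO_ERROR_CAPTURE,
		.value = 1,		/* skip error capture for this context */
	};
	if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &p))
		/* handle errno */;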
 
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 042/190] drm/i915: Clean up GPU hang message
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (39 preceding siblings ...)
  2016-01-11  9:16 ` [PATCH 041/190] drm/i915: Allow userspace to request no-error-capture upon GPU hangs Chris Wilson
@ 2016-01-11  9:16 ` Chris Wilson
  2016-02-25 17:40   ` Arun Siluvery
  2016-01-11  9:16 ` [PATCH 043/190] drm/i915: Skip capturing an error state if we already have one Chris Wilson
                   ` (45 subsequent siblings)
  86 siblings, 1 reply; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:16 UTC (permalink / raw)
  To: intel-gfx

Remove some redundant kernel messages as we deduce a hung GPU and
capture the error state.

v2: Fix "hang" vs "no progress" message whilst I was there
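
For context, the scoring that feeds the check below accumulates per
hangcheck sample (BUSY/KICK/HUNG are the defines in
i915_hangcheck_elapsed(); the exact HANGCHECK_SCORE_RING_HUNG
threshold, 31 at the time of writing, is an assumption worth checking
against i915_drv.h):

	/* Each hangcheck sample adds to the ring's score, roughly:
	 *   BUSY = 1   busy but making progress
	 *   KICK = 5   needed a kick to get going again
	 *   HUNG = 20  no progress at all
	 * so two consecutive HUNG samples (20 + 20 = 40 >= 31) cross
	 * HANGCHECK_SCORE_RING_HUNG and trigger i915_handle_error().
	 */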

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_irq.c | 21 +++++++--------------
 1 file changed, 7 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index d9757d227c86..ce52d7d9ad91 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -3031,8 +3031,7 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 	struct drm_device *dev = dev_priv->dev;
 	struct intel_engine_cs *ring;
 	int i;
-	int busy_count = 0, rings_hung = 0;
-	bool stuck[I915_NUM_RINGS] = { 0 };
+	int busy_count = 0;
 #define BUSY 1
 #define KICK 5
 #define HUNG 20
@@ -3108,7 +3107,6 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 					break;
 				case HANGCHECK_HUNG:
 					ring->hangcheck.score += HUNG;
-					stuck[i] = true;
 					break;
 				}
 			}
@@ -3134,17 +3132,12 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 		busy_count += busy;
 	}
 
-	for_each_ring(ring, dev_priv, i) {
-		if (ring->hangcheck.score >= HANGCHECK_SCORE_RING_HUNG) {
-			DRM_INFO("%s on %s\n",
-				 stuck[i] ? "stuck" : "no progress",
-				 ring->name);
-			rings_hung++;
-		}
-	}
-
-	if (rings_hung)
-		return i915_handle_error(dev, true, "Ring hung");
+	for_each_ring(ring, dev_priv, i)
+		if (ring->hangcheck.score >= HANGCHECK_SCORE_RING_HUNG)
+			return i915_handle_error(dev, true,
+						 "%s on %s",
+						 ring->hangcheck.action == HANGCHECK_HUNG ? "Hang" : "No progress",
+						 ring->name);
 
 	/* Reset timer in case GPU hangs without another request being added */
 	if (busy_count)
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 043/190] drm/i915: Skip capturing an error state if we already have one
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (40 preceding siblings ...)
  2016-01-11  9:16 ` [PATCH 042/190] drm/i915: Clean up GPU hang message Chris Wilson
@ 2016-01-11  9:16 ` Chris Wilson
  2016-01-11  9:16 ` [PATCH 044/190] drm/i915: Move GEM request routines to i915_gem_request.c Chris Wilson
                   ` (44 subsequent siblings)
  86 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:16 UTC (permalink / raw)
  To: intel-gfx

As we only ever keep the first error state around, we can avoid the
quite intrusive work of capturing a second error state while the first
is still pending. This does move the race whereby the user could
discard one error state just as the second is being captured, but that
race already exists in the current code and we hope that recapturing an
error state is only done whilst debugging.

Note that as we discard the error state for simulated errors, the igt
tests that exercise error capture continue to function.
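
The early-out added here is the usual unlocked fast path in front of a
locked check; a sketch of the resulting pattern (the locked half
already exists in i915_capture_error_state()):

	if (READ_ONCE(dev_priv->gpu_error.first_error))
		return; /* unlocked fast path: a capture is already pending */
	...
	spin_lock_irqsave(&dev_priv->gpu_error.lock, flags);
	if (dev_priv->gpu_error.first_error == NULL) /* authoritative check */
		dev_priv->gpu_error.first_error = error;
	spin_unlock_irqrestore(&dev_priv->gpu_error.lock, flags);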

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gpu_error.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 4f17d6847569..86f582115313 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1312,6 +1312,9 @@ void i915_capture_error_state(struct drm_device *dev, bool wedged,
 	struct drm_i915_error_state *error;
 	unsigned long flags;
 
+	if (READ_ONCE(dev_priv->gpu_error.first_error))
+		return;
+
 	/* Account for pipe specific data like PIPE*STAT */
 	error = kzalloc(sizeof(*error), GFP_ATOMIC);
 	if (!error) {
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 044/190] drm/i915: Move GEM request routines to i915_gem_request.c
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (41 preceding siblings ...)
  2016-01-11  9:16 ` [PATCH 043/190] drm/i915: Skip capturing an error state if we already have one Chris Wilson
@ 2016-01-11  9:16 ` Chris Wilson
  2016-02-25 17:52   ` Arun Siluvery
  2016-01-11  9:16 ` [PATCH 045/190] drm/i915: Move releasing of the GEM request from free to retire/cancel Chris Wilson
                   ` (43 subsequent siblings)
  86 siblings, 1 reply; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:16 UTC (permalink / raw)
  To: intel-gfx

Migrate the request operations out of the main body of i915_gem.c and
into their own C file for easier expansion.

v2: Move __i915_add_request() across as well

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/Makefile           |   1 +
 drivers/gpu/drm/i915/i915_drv.h         | 205 +---------
 drivers/gpu/drm/i915/i915_gem.c         | 652 +------------------------------
 drivers/gpu/drm/i915/i915_gem_request.c | 659 ++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_gem_request.h | 223 +++++++++++
 5 files changed, 895 insertions(+), 845 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_gem_request.c
 create mode 100644 drivers/gpu/drm/i915/i915_gem_request.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 99ce591c8574..b0a83215db80 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -31,6 +31,7 @@ i915-y += i915_cmd_parser.o \
 	  i915_gem_gtt.o \
 	  i915_gem.o \
 	  i915_gem_render_state.o \
+	  i915_gem_request.o \
 	  i915_gem_shrinker.o \
 	  i915_gem_stolen.o \
 	  i915_gem_tiling.o \
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 57e450e25ad6..ee146ce02412 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -41,6 +41,7 @@
 #include "intel_lrc.h"
 #include "i915_gem_gtt.h"
 #include "i915_gem_render_state.h"
+#include "i915_gem_request.h"
 #include <linux/io-mapping.h>
 #include <linux/i2c.h>
 #include <linux/i2c-algo-bit.h>
@@ -2162,179 +2163,15 @@ struct drm_i915_gem_object {
 };
 #define to_intel_bo(x) container_of(x, struct drm_i915_gem_object, base)
 
-void i915_gem_track_fb(struct drm_i915_gem_object *old,
-		       struct drm_i915_gem_object *new,
-		       unsigned frontbuffer_bits);
-
-/**
- * Request queue structure.
- *
- * The request queue allows us to note sequence numbers that have been emitted
- * and may be associated with active buffers to be retired.
- *
- * By keeping this list, we can avoid having to do questionable sequence
- * number comparisons on buffer last_read|write_seqno. It also allows an
- * emission time to be associated with the request for tracking how far ahead
- * of the GPU the submission is.
- *
- * The requests are reference counted, so upon creation they should have an
- * initial reference taken using kref_init
- */
-struct drm_i915_gem_request {
-	struct kref ref;
-
-	/** On Which ring this request was generated */
-	struct drm_i915_private *i915;
-	struct intel_engine_cs *ring;
-	unsigned reset_counter;
-
-	 /** GEM sequence number associated with the previous request,
-	  * when the HWS breadcrumb is equal to this the GPU is processing
-	  * this request.
-	  */
-	u32 previous_seqno;
-
-	 /** GEM sequence number associated with this request,
-	  * when the HWS breadcrumb is equal or greater than this the GPU
-	  * has finished processing this request.
-	  */
-	u32 seqno;
-
-	/** Position in the ringbuffer of the start of the request */
-	u32 head;
-
-	/**
-	 * Position in the ringbuffer of the start of the postfix.
-	 * This is required to calculate the maximum available ringbuffer
-	 * space without overwriting the postfix.
-	 */
-	 u32 postfix;
-
-	/** Position in the ringbuffer of the end of the whole request */
-	u32 tail;
-
-	/**
-	 * Context and ring buffer related to this request
-	 * Contexts are refcounted, so when this request is associated with a
-	 * context, we must increment the context's refcount, to guarantee that
-	 * it persists while any request is linked to it. Requests themselves
-	 * are also refcounted, so the request will only be freed when the last
-	 * reference to it is dismissed, and the code in
-	 * i915_gem_request_free() will then decrement the refcount on the
-	 * context.
-	 */
-	struct intel_context *ctx;
-	struct intel_ringbuffer *ringbuf;
-
-	/** Batch buffer related to this request if any (used for
-	    error state dump only) */
-	struct drm_i915_gem_object *batch_obj;
-
-	/** Time at which this request was emitted, in jiffies. */
-	unsigned long emitted_jiffies;
-
-	/** global list entry for this request */
-	struct list_head list;
-
-	struct drm_i915_file_private *file_priv;
-	/** file_priv list entry for this request */
-	struct list_head client_list;
-
-	/** process identifier submitting this request */
-	struct pid *pid;
-
-	/**
-	 * The ELSP only accepts two elements at a time, so we queue
-	 * context/tail pairs on a given queue (ring->execlist_queue) until the
-	 * hardware is available. The queue serves a double purpose: we also use
-	 * it to keep track of the up to 2 contexts currently in the hardware
-	 * (usually one in execution and the other queued up by the GPU): We
-	 * only remove elements from the head of the queue when the hardware
-	 * informs us that an element has been completed.
-	 *
-	 * All accesses to the queue are mediated by a spinlock
-	 * (ring->execlist_lock).
-	 */
-
-	/** Execlist link in the submission queue.*/
-	struct list_head execlist_link;
-
-	/** Execlists no. of times this request has been sent to the ELSP */
-	int elsp_submitted;
-
-};
-
 #ifdef CONFIG_DRM_I915_DEBUG_GEM
 #define GEM_BUG_ON(expr) BUG_ON(expr)
 #else
 #define GEM_BUG_ON(expr)
 #endif
 
-int i915_gem_request_alloc(struct intel_engine_cs *ring,
-			   struct intel_context *ctx,
-			   struct drm_i915_gem_request **req_out);
-void i915_gem_request_cancel(struct drm_i915_gem_request *req);
-void i915_gem_request_free(struct kref *req_ref);
-int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
-				   struct drm_file *file);
-
-static inline uint32_t
-i915_gem_request_get_seqno(struct drm_i915_gem_request *req)
-{
-	return req ? req->seqno : 0;
-}
-
-static inline struct intel_engine_cs *
-i915_gem_request_get_ring(struct drm_i915_gem_request *req)
-{
-	return req ? req->ring : NULL;
-}
-
-static inline struct drm_i915_gem_request *
-i915_gem_request_reference(struct drm_i915_gem_request *req)
-{
-	if (req)
-		kref_get(&req->ref);
-	return req;
-}
-
-static inline void
-i915_gem_request_unreference(struct drm_i915_gem_request *req)
-{
-	WARN_ON(!mutex_is_locked(&req->ring->dev->struct_mutex));
-	kref_put(&req->ref, i915_gem_request_free);
-}
-
-static inline void
-i915_gem_request_unreference__unlocked(struct drm_i915_gem_request *req)
-{
-	struct drm_device *dev;
-
-	if (!req)
-		return;
-
-	dev = req->ring->dev;
-	if (kref_put_mutex(&req->ref, i915_gem_request_free, &dev->struct_mutex))
-		mutex_unlock(&dev->struct_mutex);
-}
-
-static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
-					   struct drm_i915_gem_request *src)
-{
-	if (src)
-		i915_gem_request_reference(src);
-
-	if (*pdst)
-		i915_gem_request_unreference(*pdst);
-
-	*pdst = src;
-}
-
-/*
- * XXX: i915_gem_request_completed should be here but currently needs the
- * definition of i915_seqno_passed() which is below. It will be moved in
- * a later patch when the call to i915_seqno_passed() is obsoleted...
- */
+void i915_gem_track_fb(struct drm_i915_gem_object *old,
+		       struct drm_i915_gem_object *new,
+		       unsigned frontbuffer_bits);
 
 /*
  * A command that requires special handling by the command parser.
@@ -2956,28 +2793,6 @@ int i915_gem_dumb_create(struct drm_file *file_priv,
 			 struct drm_mode_create_dumb *args);
 int i915_gem_mmap_gtt(struct drm_file *file_priv, struct drm_device *dev,
 		      uint32_t handle, uint64_t *offset);
-/**
- * Returns true if seq1 is later than seq2.
- */
-static inline bool
-i915_seqno_passed(uint32_t seq1, uint32_t seq2)
-{
-	return (int32_t)(seq1 - seq2) >= 0;
-}
-
-static inline bool i915_gem_request_started(struct drm_i915_gem_request *req)
-{
-	return i915_seqno_passed(intel_ring_get_seqno(req->ring),
-				 req->previous_seqno);
-}
-
-static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
-{
-	return i915_seqno_passed(intel_ring_get_seqno(req->ring),
-				 req->seqno);
-}
-
-int __must_check i915_gem_get_seqno(struct drm_device *dev, u32 *seqno);
 int __must_check i915_gem_set_seqno(struct drm_device *dev, u32 seqno);
 
 struct drm_i915_gem_request *
@@ -3036,18 +2851,6 @@ void i915_gem_init_swizzling(struct drm_device *dev);
 void i915_gem_cleanup_ringbuffer(struct drm_device *dev);
 int __must_check i915_gpu_idle(struct drm_device *dev);
 int __must_check i915_gem_suspend(struct drm_device *dev);
-void __i915_add_request(struct drm_i915_gem_request *req,
-			struct drm_i915_gem_object *batch_obj,
-			bool flush_caches);
-#define i915_add_request(req) \
-	__i915_add_request(req, NULL, true)
-#define i915_add_request_no_flush(req) \
-	__i915_add_request(req, NULL, false)
-int __i915_wait_request(struct drm_i915_gem_request *req,
-			bool interruptible,
-			s64 *timeout,
-			struct intel_rps_client *rps);
-int __must_check i915_wait_request(struct drm_i915_gem_request *req);
 int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf);
 int __must_check
 i915_gem_object_wait_rendering(struct drm_i915_gem_object *obj,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index ea9344503bf6..68a25617ca7a 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1103,365 +1103,6 @@ put_rpm:
 	return ret;
 }
 
-static int
-i915_gem_check_wedge(unsigned reset_counter, bool interruptible)
-{
-	if (__i915_terminally_wedged(reset_counter))
-		return -EIO;
-
-	if (__i915_reset_in_progress(reset_counter)) {
-		/* Non-interruptible callers can't handle -EAGAIN, hence return
-		 * -EIO unconditionally for these. */
-		if (!interruptible)
-			return -EIO;
-
-		return -EAGAIN;
-	}
-
-	return 0;
-}
-
-static unsigned long local_clock_us(unsigned *cpu)
-{
-	unsigned long t;
-
-	/* Cheaply and approximately convert from nanoseconds to microseconds.
-	 * The result and subsequent calculations are also defined in the same
-	 * approximate microseconds units. The principal source of timing
-	 * error here is from the simple truncation.
-	 *
-	 * Note that local_clock() is only defined wrt to the current CPU;
-	 * the comparisons are no longer valid if we switch CPUs. Instead of
-	 * blocking preemption for the entire busywait, we can detect the CPU
-	 * switch and use that as indicator of system load and a reason to
-	 * stop busywaiting, see busywait_stop().
-	 */
-	*cpu = get_cpu();
-	t = local_clock() >> 10;
-	put_cpu();
-
-	return t;
-}
-
-static bool busywait_stop(unsigned long timeout, unsigned cpu)
-{
-	unsigned this_cpu;
-
-	if (time_after(local_clock_us(&this_cpu), timeout))
-		return true;
-
-	return this_cpu != cpu;
-}
-
-static bool __i915_spin_request(struct drm_i915_gem_request *req,
-				struct intel_wait *wait,
-				int state)
-{
-	unsigned long timeout;
-	unsigned cpu;
-
-	/* When waiting for high frequency requests, e.g. during synchronous
-	 * rendering split between the CPU and GPU, the finite amount of time
-	 * required to set up the irq and wait upon it limits the response
-	 * rate. By busywaiting on the request completion for a short while we
-	 * can service the high frequency waits as quick as possible. However,
-	 * if it is a slow request, we want to sleep as quickly as possible.
-	 * The tradeoff between waiting and sleeping is roughly the time it
-	 * takes to sleep on a request, on the order of a microsecond.
-	 */
-
-	/* Only spin if we know the GPU is processing this request */
-	if (!i915_gem_request_started(req))
-		return false;
-
-	timeout = local_clock_us(&cpu) + 5;
-	do {
-		if (i915_gem_request_completed(req))
-			return true;
-
-		if (signal_pending_state(state, wait->task))
-			break;
-
-		if (busywait_stop(timeout, cpu))
-			break;
-
-		cpu_relax_lowlatency();
-
-		/* Break the loop if we have consumed the timeslice (or been
-		 * preempted) or when either the background thread has
-		 * enabled the interrupt, or the IRQ itself has fired.
-		 */
-	} while (!need_resched() && wait->task->state == state);
-
-	return false;
-}
-
-/**
- * __i915_wait_request - wait until execution of request has finished
- * @req: duh!
- * @interruptible: do an interruptible wait (normally yes)
- * @timeout: in - how long to wait (NULL forever); out - how much time remaining
- *
- * Note: It is of utmost importance that the passed in seqno and reset_counter
- * values have been read by the caller in an smp safe manner. Where read-side
- * locks are involved, it is sufficient to read the reset_counter before
- * unlocking the lock that protects the seqno. For lockless tricks, the
- * reset_counter _must_ be read before, and an appropriate smp_rmb must be
- * inserted.
- *
- * Returns 0 if the request was found within the alloted time. Else returns the
- * errno with remaining time filled in timeout argument.
- */
-int __i915_wait_request(struct drm_i915_gem_request *req,
-			bool interruptible,
-			s64 *timeout,
-			struct intel_rps_client *rps)
-{
-	int state = interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
-	struct intel_wait wait;
-	unsigned long timeout_remain;
-	int ret = 0;
-
-	might_sleep();
-
-	if (list_empty(&req->list))
-		return 0;
-
-	if (i915_gem_request_completed(req))
-		return 0;
-
-	timeout_remain = MAX_SCHEDULE_TIMEOUT;
-	if (timeout) {
-		if (WARN_ON(*timeout < 0))
-			return -EINVAL;
-
-		if (*timeout == 0)
-			return -ETIME;
-
-		/* Record current time in case interrupted, or wedged */
-		timeout_remain = nsecs_to_jiffies_timeout(*timeout);
-		*timeout += ktime_get_raw_ns();
-	}
-
-	trace_i915_gem_request_wait_begin(req);
-
-	/* This client is about to stall waiting for the GPU. In many cases
-	 * this is undesirable and limits the throughput of the system, as
-	 * many clients cannot continue processing user input/output whilst
-	 * blocked. RPS autotuning may take tens of milliseconds to respond
-	 * to the GPU load and thus incurs additional latency for the client.
-	 * We can circumvent that by promoting the GPU frequency to maximum
-	 * before we wait. This makes the GPU throttle up much more quickly
-	 * (good for benchmarks and user experience, e.g. window animations),
-	 * but at a cost of spending more power processing the workload
-	 * (bad for battery). Not all clients even want their results
-	 * immediately and for them we should just let the GPU select its own
-	 * frequency to maximise efficiency. To prevent a single client from
-	 * forcing the clocks too high for the whole system, we only allow
-	 * each client to waitboost once in a busy period.
-	 */
-	if (INTEL_INFO(req->i915)->gen >= 6)
-		gen6_rps_boost(req->i915, rps, req->emitted_jiffies);
-
-	intel_wait_init(&wait, req->seqno);
-	set_task_state(wait.task, state);
-
-	/* Optimistic spin for the next ~jiffie before touching IRQs */
-	if (intel_engine_add_wait(req->ring, &wait)) {
-		if (__i915_spin_request(req, &wait, state))
-			goto complete;
-
-		/* In order to check that we haven't missed the interrupt
-		 * as we enabled it, we need to kick ourselves to do a
-		 * coherent check on the seqno before we sleep.
-		 */
-		if (intel_engine_enable_wait_irq(req->ring, &wait))
-			goto wakeup;
-	}
-
-	for (;;) {
-		if (signal_pending_state(state, wait.task)) {
-			ret = -ERESTARTSYS;
-			break;
-		}
-
-		/* Ensure that even if the GPU hangs, we get woken up. */
-		i915_queue_hangcheck(req->i915);
-
-		timeout_remain = io_schedule_timeout(timeout_remain);
-		if (timeout_remain == 0) {
-			ret = -ETIME;
-			break;
-		}
-
-		if (intel_wait_complete(&wait))
-			break;
-
-wakeup:
-		set_task_state(wait.task, state);
-
-		/* Carefully check if the request is complete, giving time
-		 * for the seqno to be visible following the interrupt.
-		 * We also have to check in case we are kicked by the GPU
-		 * reset in order to drop the struct_mutex.
-		 */
-		if (__i915_request_irq_complete(req))
-			break;
-	}
-
-complete:
-	intel_engine_remove_wait(req->ring, &wait);
-	__set_task_state(wait.task, TASK_RUNNING);
-	trace_i915_gem_request_wait_end(req);
-
-	if (timeout) {
-		*timeout -= ktime_get_raw_ns();
-		if (*timeout < 0)
-			*timeout = 0;
-
-		/*
-		 * Apparently ktime isn't accurate enough and occasionally has a
-		 * bit of mismatch in the jiffies<->nsecs<->ktime loop. So patch
-		 * things up to make the test happy. We allow up to 1 jiffy.
-		 *
-		 * This is a regrssion from the timespec->ktime conversion.
-		 */
-		if (ret == -ETIME && *timeout < jiffies_to_usecs(1)*1000)
-			*timeout = 0;
-	}
-
-	if (ret == 0 && rps && req->seqno == req->ring->last_submitted_seqno) {
-		/* The GPU is now idle and this client has stalled.
-		 * Since no other client has submitted a request in the
-		 * meantime, assume that this client is the only one
-		 * supplying work to the GPU but is unable to keep that
-		 * work supplied because it is waiting. Since the GPU is
-		 * then never kept fully busy, RPS autoclocking will
-		 * keep the clocks relatively low, causing further delays.
-		 * Compensate by giving the synchronous client credit for
-		 * a waitboost next time.
-		 */
-		spin_lock(&req->i915->rps.client_lock);
-		list_del_init(&rps->link);
-		spin_unlock(&req->i915->rps.client_lock);
-	}
-
-	return ret;
-}
-
-int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
-				   struct drm_file *file)
-{
-	struct drm_i915_private *dev_private;
-	struct drm_i915_file_private *file_priv;
-
-	WARN_ON(!req || !file || req->file_priv);
-
-	if (!req || !file)
-		return -EINVAL;
-
-	if (req->file_priv)
-		return -EINVAL;
-
-	dev_private = req->ring->dev->dev_private;
-	file_priv = file->driver_priv;
-
-	spin_lock(&file_priv->mm.lock);
-	req->file_priv = file_priv;
-	list_add_tail(&req->client_list, &file_priv->mm.request_list);
-	spin_unlock(&file_priv->mm.lock);
-
-	req->pid = get_pid(task_pid(current));
-
-	return 0;
-}
-
-static inline void
-i915_gem_request_remove_from_client(struct drm_i915_gem_request *request)
-{
-	struct drm_i915_file_private *file_priv = request->file_priv;
-
-	if (!file_priv)
-		return;
-
-	spin_lock(&file_priv->mm.lock);
-	list_del(&request->client_list);
-	request->file_priv = NULL;
-	spin_unlock(&file_priv->mm.lock);
-
-	put_pid(request->pid);
-	request->pid = NULL;
-}
-
-static void i915_gem_request_retire(struct drm_i915_gem_request *request)
-{
-	trace_i915_gem_request_retire(request);
-
-	/* We know the GPU must have read the request to have
-	 * sent us the seqno + interrupt, so use the position
-	 * of tail of the request to update the last known position
-	 * of the GPU head.
-	 *
-	 * Note this requires that we are always called in request
-	 * completion order.
-	 */
-	request->ringbuf->last_retired_head = request->postfix;
-
-	list_del_init(&request->list);
-	i915_gem_request_remove_from_client(request);
-
-	i915_gem_request_unreference(request);
-}
-
-static void
-__i915_gem_request_retire__upto(struct drm_i915_gem_request *req)
-{
-	struct intel_engine_cs *engine = req->ring;
-	struct drm_i915_gem_request *tmp;
-
-	lockdep_assert_held(&engine->dev->struct_mutex);
-
-	if (list_empty(&req->list))
-		return;
-
-	do {
-		tmp = list_first_entry(&engine->request_list,
-				       typeof(*tmp), list);
-
-		i915_gem_request_retire(tmp);
-	} while (tmp != req);
-
-	WARN_ON(i915_verify_lists(engine->dev));
-}
-
-/**
- * Waits for a request to be signaled, and cleans up the
- * request and object lists appropriately for that event.
- */
-int
-i915_wait_request(struct drm_i915_gem_request *req)
-{
-	struct drm_device *dev;
-	struct drm_i915_private *dev_priv;
-	bool interruptible;
-	int ret;
-
-	BUG_ON(req == NULL);
-
-	dev = req->ring->dev;
-	dev_priv = dev->dev_private;
-	interruptible = dev_priv->mm.interruptible;
-
-	BUG_ON(!mutex_is_locked(&dev->struct_mutex));
-
-	ret = __i915_wait_request(req, interruptible, NULL, NULL);
-	if (ret)
-		return ret;
-
-	__i915_gem_request_retire__upto(req);
-	return 0;
-}
-
 /**
  * Ensures that all rendering to the object has completed and the object is
  * safe to unbind from the GTT or access from the CPU.
@@ -1515,7 +1156,7 @@ i915_gem_object_retire_request(struct drm_i915_gem_object *obj,
 	else if (obj->last_write_req == req)
 		i915_gem_object_retire__write(obj);
 
-	__i915_gem_request_retire__upto(req);
+	i915_gem_request_retire_upto(req);
 }
 
 /* A nonblocking variant of the above wait. This is a highly dangerous routine
@@ -2441,94 +2082,6 @@ i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
 	drm_gem_object_unreference(&obj->base);
 }
 
-static int
-i915_gem_init_seqno(struct drm_device *dev, u32 seqno)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct intel_engine_cs *ring;
-	int ret, i, j;
-
-	/* Carefully retire all requests without writing to the rings */
-	for_each_ring(ring, dev_priv, i) {
-		ret = intel_ring_idle(ring);
-		if (ret)
-			return ret;
-	}
-	i915_gem_retire_requests(dev);
-
-	/* Finally reset hw state */
-	for_each_ring(ring, dev_priv, i) {
-		intel_ring_init_seqno(ring, seqno);
-
-		for (j = 0; j < ARRAY_SIZE(ring->semaphore.sync_seqno); j++)
-			ring->semaphore.sync_seqno[j] = 0;
-	}
-
-	return 0;
-}
-
-int i915_gem_set_seqno(struct drm_device *dev, u32 seqno)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	int ret;
-
-	if (seqno == 0)
-		return -EINVAL;
-
-	/* HWS page needs to be set less than what we
-	 * will inject to ring
-	 */
-	ret = i915_gem_init_seqno(dev, seqno - 1);
-	if (ret)
-		return ret;
-
-	/* Carefully set the last_seqno value so that wrap
-	 * detection still works
-	 */
-	dev_priv->next_seqno = seqno;
-	dev_priv->last_seqno = seqno - 1;
-	if (dev_priv->last_seqno == 0)
-		dev_priv->last_seqno--;
-
-	return 0;
-}
-
-int
-i915_gem_get_seqno(struct drm_device *dev, u32 *seqno)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
-
-	/* reserve 0 for non-seqno */
-	if (dev_priv->next_seqno == 0) {
-		int ret = i915_gem_init_seqno(dev, 0);
-		if (ret)
-			return ret;
-
-		dev_priv->next_seqno = 1;
-	}
-
-	*seqno = dev_priv->last_seqno = dev_priv->next_seqno++;
-	return 0;
-}
-
-static void i915_gem_mark_busy(struct drm_i915_private *dev_priv)
-{
-	if (dev_priv->mm.busy)
-		return;
-
-	intel_runtime_pm_get_noresume(dev_priv);
-
-	i915_update_gfx_val(dev_priv);
-	if (INTEL_INFO(dev_priv)->gen >= 6)
-		gen6_rps_busy(dev_priv);
-
-	queue_delayed_work(dev_priv->wq,
-			   &dev_priv->mm.retire_work,
-			   round_jiffies_up_relative(HZ));
-
-	dev_priv->mm.busy = true;
-}
-
 static void i915_gem_mark_idle(struct drm_i915_private *dev_priv)
 {
 	dev_priv->mm.busy = false;
@@ -2542,92 +2095,6 @@ static void i915_gem_mark_idle(struct drm_i915_private *dev_priv)
 	intel_runtime_pm_put(dev_priv);
 }
 
-/*
- * NB: This function is not allowed to fail. Doing so would mean the the
- * request is not being tracked for completion but the work itself is
- * going to happen on the hardware. This would be a Bad Thing(tm).
- */
-void __i915_add_request(struct drm_i915_gem_request *request,
-			struct drm_i915_gem_object *obj,
-			bool flush_caches)
-{
-	struct intel_engine_cs *ring;
-	struct drm_i915_private *dev_priv;
-	struct intel_ringbuffer *ringbuf;
-	u32 request_start;
-	int ret;
-
-	if (WARN_ON(request == NULL))
-		return;
-
-	ring = request->ring;
-	dev_priv = ring->dev->dev_private;
-	ringbuf = request->ringbuf;
-
-	/*
-	 * To ensure that this call will not fail, space for its emissions
-	 * should already have been reserved in the ring buffer. Let the ring
-	 * know that it is time to use that space up.
-	 */
-	intel_ring_reserved_space_use(ringbuf);
-
-	request_start = intel_ring_get_tail(ringbuf);
-	/*
-	 * Emit any outstanding flushes - execbuf can fail to emit the flush
-	 * after having emitted the batchbuffer command. Hence we need to fix
-	 * things up similar to emitting the lazy request. The difference here
-	 * is that the flush _must_ happen before the next request, no matter
-	 * what.
-	 */
-	if (flush_caches) {
-		if (i915.enable_execlists)
-			ret = logical_ring_flush_all_caches(request);
-		else
-			ret = intel_ring_flush_all_caches(request);
-		/* Not allowed to fail! */
-		WARN(ret, "*_ring_flush_all_caches failed: %d!\n", ret);
-	}
-
-	/* Record the position of the start of the request so that
-	 * should we detect the updated seqno part-way through the
-	 * GPU processing the request, we never over-estimate the
-	 * position of the head.
-	 */
-	request->postfix = intel_ring_get_tail(ringbuf);
-
-	if (i915.enable_execlists)
-		ret = ring->emit_request(request);
-	else {
-		ret = ring->add_request(request);
-
-		request->tail = intel_ring_get_tail(ringbuf);
-	}
-	/* Not allowed to fail! */
-	WARN(ret, "emit|add_request failed: %d!\n", ret);
-
-	request->head = request_start;
-
-	/* Whilst this request exists, batch_obj will be on the
-	 * active_list, and so will hold the active reference. Only when this
-	 * request is retired will the the batch_obj be moved onto the
-	 * inactive_list and lose its active reference. Hence we do not need
-	 * to explicitly hold another reference here.
-	 */
-	request->batch_obj = obj;
-
-	request->emitted_jiffies = jiffies;
-	request->previous_seqno = ring->last_submitted_seqno;
-	ring->last_submitted_seqno = request->seqno;
-	list_add_tail(&request->list, &ring->request_list);
-
-	trace_i915_gem_request_add(request);
-
-	i915_gem_mark_busy(dev_priv);
-
-	/* Sanity check that the reserved size was large enough. */
-	intel_ring_reserved_space_end(ringbuf);
-}
-
 static bool i915_context_is_banned(struct drm_i915_private *dev_priv,
 				   const struct intel_context *ctx)
 {
@@ -2666,109 +2133,6 @@ static void i915_set_reset_status(struct drm_i915_private *dev_priv,
 	}
 }
 
-void i915_gem_request_free(struct kref *req_ref)
-{
-	struct drm_i915_gem_request *req = container_of(req_ref,
-						 typeof(*req), ref);
-	struct intel_context *ctx = req->ctx;
-
-	if (req->file_priv)
-		i915_gem_request_remove_from_client(req);
-
-	if (ctx) {
-		if (i915.enable_execlists) {
-			if (ctx != req->ring->default_context)
-				intel_lr_context_unpin(req);
-		}
-
-		i915_gem_context_unreference(ctx);
-	}
-
-	kmem_cache_free(req->i915->requests, req);
-}
-
-int i915_gem_request_alloc(struct intel_engine_cs *ring,
-			   struct intel_context *ctx,
-			   struct drm_i915_gem_request **req_out)
-{
-	struct drm_i915_private *dev_priv = to_i915(ring->dev);
-	unsigned reset_counter = i915_reset_counter(&dev_priv->gpu_error);
-	struct drm_i915_gem_request *req;
-	int ret;
-
-	if (!req_out)
-		return -EINVAL;
-
-	*req_out = NULL;
-
-	/* ABI: Before userspace accesses the GPU (e.g. execbuffer), report
-	 * EIO if the GPU is already wedged, or EAGAIN to drop the struct_mutex
-	 * and restart.
-	 */
-	ret = i915_gem_check_wedge(reset_counter, dev_priv->mm.interruptible);
-	if (ret)
-		return ret;
-
-	req = kmem_cache_zalloc(dev_priv->requests, GFP_KERNEL);
-	if (req == NULL)
-		return -ENOMEM;
-
-	ret = i915_gem_get_seqno(ring->dev, &req->seqno);
-	if (ret)
-		goto err;
-
-	kref_init(&req->ref);
-	req->i915 = dev_priv;
-	req->ring = ring;
-	req->reset_counter = reset_counter;
-	req->ctx  = ctx;
-	i915_gem_context_reference(req->ctx);
-
-	if (i915.enable_execlists)
-		ret = intel_logical_ring_alloc_request_extras(req);
-	else
-		ret = intel_ring_alloc_request_extras(req);
-	if (ret) {
-		i915_gem_context_unreference(req->ctx);
-		goto err;
-	}
-
-	/*
-	 * Reserve space in the ring buffer for all the commands required to
-	 * eventually emit this request. This is to guarantee that the
-	 * i915_add_request() call can't fail. Note that the reserve may need
-	 * to be redone if the request is not actually submitted straight
-	 * away, e.g. because a GPU scheduler has deferred it.
-	 */
-	if (i915.enable_execlists)
-		ret = intel_logical_ring_reserve_space(req);
-	else
-		ret = intel_ring_reserve_space(req);
-	if (ret) {
-		/*
-		 * At this point, the request is fully allocated even if not
-		 * fully prepared. Thus it can be cleaned up using the proper
-		 * free code.
-		 */
-		i915_gem_request_cancel(req);
-		return ret;
-	}
-
-	*req_out = req;
-	return 0;
-
-err:
-	kmem_cache_free(dev_priv->requests, req);
-	return ret;
-}
-
-void i915_gem_request_cancel(struct drm_i915_gem_request *req)
-{
-	intel_ring_reserved_space_cancel(req->ringbuf);
-
-	i915_gem_request_unreference(req);
-}
-
 struct drm_i915_gem_request *
 i915_gem_find_active_request(struct intel_engine_cs *ring)
 {
@@ -2850,14 +2214,14 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
 	 * implicit references on things like e.g. ppgtt address spaces through
 	 * the request.
 	 */
-	while (!list_empty(&ring->request_list)) {
+	if (!list_empty(&ring->request_list)) {
 		struct drm_i915_gem_request *request;
 
-		request = list_first_entry(&ring->request_list,
-					   struct drm_i915_gem_request,
-					   list);
+		request = list_last_entry(&ring->request_list,
+					  struct drm_i915_gem_request,
+					  list);
 
-		i915_gem_request_retire(request);
+		i915_gem_request_retire_upto(request);
 	}
 
 	/* Having flushed all requests from all queues, we know that all
@@ -2922,7 +2286,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 		if (!i915_gem_request_completed(request))
 			break;
 
-		i915_gem_request_retire(request);
+		i915_gem_request_retire_upto(request);
 	}
 
 	/* Move any buffers on the active list that are no longer referenced
@@ -3053,7 +2417,7 @@ i915_gem_object_flush_active(struct drm_i915_gem_object *obj)
 			goto retire;
 
 		if (i915_gem_request_completed(req)) {
-			__i915_gem_request_retire__upto(req);
+			i915_gem_request_retire_upto(req);
 retire:
 			i915_gem_object_retire__read(obj, i);
 		}
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
new file mode 100644
index 000000000000..b4ede6dd7b20
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -0,0 +1,659 @@
+/*
+ * Copyright © 2008-2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#include "i915_drv.h"
+
+static int
+i915_gem_check_wedge(unsigned reset_counter, bool interruptible)
+{
+	if (__i915_terminally_wedged(reset_counter))
+		return -EIO;
+
+	if (__i915_reset_in_progress(reset_counter)) {
+		/* Non-interruptible callers can't handle -EAGAIN, hence return
+		 * -EIO unconditionally for these. */
+		if (!interruptible)
+			return -EIO;
+
+		return -EAGAIN;
+	}
+
+	return 0;
+}
+
+static int
+i915_gem_init_seqno(struct drm_i915_private *dev_priv, u32 seqno)
+{
+	struct intel_engine_cs *ring;
+	int ret, i, j;
+
+	/* Carefully retire all requests without writing to the rings */
+	for_each_ring(ring, dev_priv, i) {
+		ret = intel_ring_idle(ring);
+		if (ret)
+			return ret;
+	}
+	i915_gem_retire_requests(dev_priv->dev);
+
+	/* Finally reset hw state */
+	for_each_ring(ring, dev_priv, i) {
+		intel_ring_init_seqno(ring, seqno);
+
+		for (j = 0; j < ARRAY_SIZE(ring->semaphore.sync_seqno); j++)
+			ring->semaphore.sync_seqno[j] = 0;
+	}
+
+	return 0;
+}
+
+int i915_gem_set_seqno(struct drm_device *dev, u32 seqno)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	int ret;
+
+	if (seqno == 0)
+		return -EINVAL;
+
+	/* HWS page needs to be set less than what we
+	 * will inject to ring
+	 */
+	ret = i915_gem_init_seqno(dev_priv, seqno - 1);
+	if (ret)
+		return ret;
+
+	/* Carefully set the last_seqno value so that wrap
+	 * detection still works
+	 */
+	dev_priv->next_seqno = seqno;
+	dev_priv->last_seqno = seqno - 1;
+	if (dev_priv->last_seqno == 0)
+		dev_priv->last_seqno--;
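+	/* Illustrative note: seqno ordering uses wraparound-safe signed
+	 * arithmetic, i915_seqno_passed(a, b) == ((int32_t)(a - b) >= 0),
+	 * e.g. (int32_t)(0x00000002 - 0xfffffffe) == 4 >= 0, so seqno 2
+	 * is "after" 0xfffffffe across the wrap; bumping a zero
+	 * last_seqno to ~0U therefore keeps it ordered just before the
+	 * injected seqno.
+	 */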
+
+	return 0;
+}
+
+static int
+i915_gem_get_seqno(struct drm_i915_private *dev_priv, u32 *seqno)
+{
+	/* reserve 0 for non-seqno */
+	if (unlikely(dev_priv->next_seqno == 0)) {
+		int ret = i915_gem_init_seqno(dev_priv, 0);
+		if (ret)
+			return ret;
+
+		dev_priv->next_seqno = 1;
+	}
+
+	*seqno = dev_priv->last_seqno = dev_priv->next_seqno++;
+	return 0;
+}
+
+int i915_gem_request_alloc(struct intel_engine_cs *ring,
+			   struct intel_context *ctx,
+			   struct drm_i915_gem_request **req_out)
+{
+	struct drm_i915_private *dev_priv = to_i915(ring->dev);
+	unsigned reset_counter = i915_reset_counter(&dev_priv->gpu_error);
+	struct drm_i915_gem_request *req;
+	int ret;
+
+	if (!req_out)
+		return -EINVAL;
+
+	*req_out = NULL;
+
+	/* ABI: Before userspace accesses the GPU (e.g. execbuffer), report
+	 * EIO if the GPU is already wedged, or EAGAIN to drop the struct_mutex
+	 * and restart.
+	 */
+	ret = i915_gem_check_wedge(reset_counter, dev_priv->mm.interruptible);
+	if (ret)
+		return ret;
+
+	req = kmem_cache_zalloc(dev_priv->requests, GFP_KERNEL);
+	if (req == NULL)
+		return -ENOMEM;
+
+	ret = i915_gem_get_seqno(dev_priv, &req->seqno);
+	if (ret)
+		goto err;
+
+	kref_init(&req->ref);
+	req->i915 = dev_priv;
+	req->ring = ring;
+	req->reset_counter = reset_counter;
+	req->ctx  = ctx;
+	i915_gem_context_reference(req->ctx);
+
+	if (i915.enable_execlists)
+		ret = intel_logical_ring_alloc_request_extras(req);
+	else
+		ret = intel_ring_alloc_request_extras(req);
+	if (ret) {
+		i915_gem_context_unreference(req->ctx);
+		goto err;
+	}
+
+	/*
+	 * Reserve space in the ring buffer for all the commands required to
+	 * eventually emit this request. This is to guarantee that the
+	 * i915_add_request() call can't fail. Note that the reserve may need
+	 * to be redone if the request is not actually submitted straight
+	 * away, e.g. because a GPU scheduler has deferred it.
+	 */
+	if (i915.enable_execlists)
+		ret = intel_logical_ring_reserve_space(req);
+	else
+		ret = intel_ring_reserve_space(req);
+	if (ret) {
+		/*
+		 * At this point, the request is fully allocated even if not
+		 * fully prepared. Thus it can be cleaned up using the proper
+		 * free code.
+		 */
+		i915_gem_request_cancel(req);
+		return ret;
+	}
+
+	*req_out = req;
+	return 0;
+
+err:
+	kmem_cache_free(dev_priv->requests, req);
+	return ret;
+}
+
+void i915_gem_request_cancel(struct drm_i915_gem_request *req)
+{
+	intel_ring_reserved_space_cancel(req->ringbuf);
+
+	i915_gem_request_unreference(req);
+}
+
+int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
+				   struct drm_file *file)
+{
+	struct drm_i915_private *dev_private;
+	struct drm_i915_file_private *file_priv;
+
+	WARN_ON(!req || !file || req->file_priv);
+
+	if (!req || !file)
+		return -EINVAL;
+
+	if (req->file_priv)
+		return -EINVAL;
+
+	dev_private = req->ring->dev->dev_private;
+	file_priv = file->driver_priv;
+
+	spin_lock(&file_priv->mm.lock);
+	req->file_priv = file_priv;
+	list_add_tail(&req->client_list, &file_priv->mm.request_list);
+	spin_unlock(&file_priv->mm.lock);
+
+	req->pid = get_pid(task_pid(current));
+
+	return 0;
+}
+
+static inline void
+i915_gem_request_remove_from_client(struct drm_i915_gem_request *request)
+{
+	struct drm_i915_file_private *file_priv = request->file_priv;
+
+	if (!file_priv)
+		return;
+
+	spin_lock(&file_priv->mm.lock);
+	list_del(&request->client_list);
+	request->file_priv = NULL;
+	spin_unlock(&file_priv->mm.lock);
+
+	put_pid(request->pid);
+	request->pid = NULL;
+}
+
+static void i915_gem_request_retire(struct drm_i915_gem_request *request)
+{
+	trace_i915_gem_request_retire(request);
+
+	/* We know the GPU must have read the request to have
+	 * sent us the seqno + interrupt, so use the position
+	 * of tail of the request to update the last known position
+	 * of the GPU head.
+	 *
+	 * Note this requires that we are always called in request
+	 * completion order.
+	 */
+	request->ringbuf->last_retired_head = request->postfix;
+
+	list_del_init(&request->list);
+	i915_gem_request_remove_from_client(request);
+
+	i915_gem_request_unreference(request);
+}
+
+void
+i915_gem_request_retire_upto(struct drm_i915_gem_request *req)
+{
+	struct intel_engine_cs *engine = req->ring;
+	struct drm_i915_gem_request *tmp;
+
+	lockdep_assert_held(&engine->dev->struct_mutex);
+
+	if (list_empty(&req->list))
+		return;
+
+	do {
+		tmp = list_first_entry(&engine->request_list,
+				       typeof(*tmp), list);
+
+		i915_gem_request_retire(tmp);
+	} while (tmp != req);
+
+	WARN_ON(i915_verify_lists(engine->dev));
+}
+
+static void i915_gem_mark_busy(struct drm_i915_private *dev_priv)
+{
+	if (dev_priv->mm.busy)
+		return;
+
+	intel_runtime_pm_get_noresume(dev_priv);
+
+	i915_update_gfx_val(dev_priv);
+	if (INTEL_INFO(dev_priv)->gen >= 6)
+		gen6_rps_busy(dev_priv);
+
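+	/* Kick the retire worker roughly once per second (HZ jiffies);
+	 * round_jiffies_up_relative() aligns the timer to batch the
+	 * wakeup with other second-granularity work.
+	 */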
+	queue_delayed_work(dev_priv->wq,
+			   &dev_priv->mm.retire_work,
+			   round_jiffies_up_relative(HZ));
+
+	dev_priv->mm.busy = true;
+}
+
+/*
+ * NB: This function is not allowed to fail. Doing so would mean the
+ * request is not being tracked for completion but the work itself is
+ * going to happen on the hardware. This would be a Bad Thing(tm).
+ */
+void __i915_add_request(struct drm_i915_gem_request *request,
+			struct drm_i915_gem_object *obj,
+			bool flush_caches)
+{
+	struct intel_engine_cs *ring;
+	struct drm_i915_private *dev_priv;
+	struct intel_ringbuffer *ringbuf;
+	u32 request_start;
+	int ret;
+
+	if (WARN_ON(request == NULL))
+		return;
+
+	ring = request->ring;
+	dev_priv = ring->dev->dev_private;
+	ringbuf = request->ringbuf;
+
+	/*
+	 * To ensure that this call will not fail, space for its emissions
+	 * should already have been reserved in the ring buffer. Let the ring
+	 * know that it is time to use that space up.
+	 */
+	intel_ring_reserved_space_use(ringbuf);
+
+	request_start = intel_ring_get_tail(ringbuf);
+	/*
+	 * Emit any outstanding flushes - execbuf can fail to emit the flush
+	 * after having emitted the batchbuffer command. Hence we need to fix
+	 * things up similar to emitting the lazy request. The difference here
+	 * is that the flush _must_ happen before the next request, no matter
+	 * what.
+	 */
+	if (flush_caches) {
+		if (i915.enable_execlists)
+			ret = logical_ring_flush_all_caches(request);
+		else
+			ret = intel_ring_flush_all_caches(request);
+		/* Not allowed to fail! */
+		WARN(ret, "*_ring_flush_all_caches failed: %d!\n", ret);
+	}
+
+	/* Record the position of the start of the request so that
+	 * should we detect the updated seqno part-way through the
+	 * GPU processing the request, we never over-estimate the
+	 * position of the head.
+	 */
+	request->postfix = intel_ring_get_tail(ringbuf);
+
+	if (i915.enable_execlists)
+		ret = ring->emit_request(request);
+	else {
+		ret = ring->add_request(request);
+
+		request->tail = intel_ring_get_tail(ringbuf);
+	}
+	/* Not allowed to fail! */
+	WARN(ret, "emit|add_request failed: %d!\n", ret);
+
+	request->head = request_start;
+
+	/* Whilst this request exists, batch_obj will be on the
+	 * active_list, and so will hold the active reference. Only when this
+	 * request is retired will the batch_obj be moved onto the
+	 * inactive_list and lose its active reference. Hence we do not need
+	 * to explicitly hold another reference here.
+	 */
+	request->batch_obj = obj;
+
+	request->emitted_jiffies = jiffies;
+	request->previous_seqno = ring->last_submitted_seqno;
+	ring->last_submitted_seqno = request->seqno;
+	list_add_tail(&request->list, &ring->request_list);
+
+	trace_i915_gem_request_add(request);
+
+	i915_gem_mark_busy(dev_priv);
+
+	/* Sanity check that the reserved size was large enough. */
+	intel_ring_reserved_space_end(ringbuf);
+}
+
+static unsigned long local_clock_us(unsigned *cpu)
+{
+	unsigned long t;
+
+	/* Cheaply and approximately convert from nanoseconds to microseconds.
+	 * The result and subsequent calculations are also defined in the same
+	 * approximate microseconds units. The principal source of timing
+	 * error here is from the simple truncation.
+	 *
+	 * Note that local_clock() is only defined wrt the current CPU;
+	 * the comparisons are no longer valid if we switch CPUs. Instead of
+	 * blocking preemption for the entire busywait, we can detect the CPU
+	 * switch and use that as indicator of system load and a reason to
+	 * stop busywaiting, see busywait_stop().
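+	 *
+	 * As an aside, >>10 divides by 1024 rather than 1000, so the
+	 * result under-reads microseconds by roughly 2.4%, well within
+	 * the tolerance of a busywait bound.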
+	 */
+	*cpu = get_cpu();
+	t = local_clock() >> 10;
+	put_cpu();
+
+	return t;
+}
+
+static bool busywait_stop(unsigned long timeout, unsigned cpu)
+{
+	unsigned this_cpu;
+
+	if (time_after(local_clock_us(&this_cpu), timeout))
+		return true;
+
+	return this_cpu != cpu;
+}
+
+static bool __i915_spin_request(struct drm_i915_gem_request *req,
+				struct intel_wait *wait,
+				int state)
+{
+	unsigned long timeout;
+	unsigned cpu;
+
+	/* When waiting for high frequency requests, e.g. during synchronous
+	 * rendering split between the CPU and GPU, the finite amount of time
+	 * required to set up the irq and wait upon it limits the response
+	 * rate. By busywaiting on the request completion for a short while we
+	 * can service the high frequency waits as quickly as possible. However,
+	 * if it is a slow request, we want to sleep as quickly as possible.
+	 * The tradeoff between waiting and sleeping is roughly the time it
+	 * takes to sleep on a request, on the order of a microsecond.
+	 */
+
+	/* Only spin if we know the GPU is processing this request */
+	if (!i915_gem_request_started(req))
+		return false;
+
+	timeout = local_clock_us(&cpu) + 5;
+	do {
+		if (i915_gem_request_completed(req))
+			return true;
+
+		if (signal_pending_state(state, wait->task))
+			break;
+
+		if (busywait_stop(timeout, cpu))
+			break;
+
+		cpu_relax_lowlatency();
+
+		/* Break the loop if we have consumed the timeslice (or been
+		 * preempted) or when either the background thread has
+		 * enabled the interrupt, or the IRQ itself has fired.
+		 */
+	} while (!need_resched() && wait->task->state == state);
+
+	return false;
+}
+
+/**
+ * __i915_wait_request - wait until execution of request has finished
+ * @req: the request to wait upon
+ * @interruptible: do an interruptible wait (normally yes)
+ * @timeout: in - how long to wait (NULL forever); out - how much time remaining
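+ * @rps: RPS client to waitboost on behalf of (may be NULL)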
+ *
+ * Note: It is of utmost importance that the passed in seqno and reset_counter
+ * values have been read by the caller in an smp safe manner. Where read-side
+ * locks are involved, it is sufficient to read the reset_counter before
+ * unlocking the lock that protects the seqno. For lockless tricks, the
+ * reset_counter _must_ be read before, and an appropriate smp_rmb must be
+ * inserted.
+ *
+ * Returns 0 if the request completed within the allotted time. Else returns the
+ * errno with remaining time filled in timeout argument.
+ */
+int __i915_wait_request(struct drm_i915_gem_request *req,
+			bool interruptible,
+			s64 *timeout,
+			struct intel_rps_client *rps)
+{
+	int state = interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
+	struct intel_wait wait;
+	unsigned long timeout_remain;
+	int ret = 0;
+
+	might_sleep();
+
+	if (list_empty(&req->list))
+		return 0;
+
+	if (i915_gem_request_completed(req))
+		return 0;
+
+	timeout_remain = MAX_SCHEDULE_TIMEOUT;
+	if (timeout) {
+		if (WARN_ON(*timeout < 0))
+			return -EINVAL;
+
+		if (*timeout == 0)
+			return -ETIME;
+
+		/* Record current time in case interrupted, or wedged */
+		timeout_remain = nsecs_to_jiffies_timeout(*timeout);
+		*timeout += ktime_get_raw_ns();
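+		/* From here *timeout holds the absolute deadline in ns;
+		 * subtracting ktime_get_raw_ns() on completion converts
+		 * it back into time remaining.
+		 */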
+	}
+
+	trace_i915_gem_request_wait_begin(req);
+
+	/* This client is about to stall waiting for the GPU. In many cases
+	 * this is undesirable and limits the throughput of the system, as
+	 * many clients cannot continue processing user input/output whilst
+	 * blocked. RPS autotuning may take tens of milliseconds to respond
+	 * to the GPU load and thus incurs additional latency for the client.
+	 * We can circumvent that by promoting the GPU frequency to maximum
+	 * before we wait. This makes the GPU throttle up much more quickly
+	 * (good for benchmarks and user experience, e.g. window animations),
+	 * but at a cost of spending more power processing the workload
+	 * (bad for battery). Not all clients even want their results
+	 * immediately and for them we should just let the GPU select its own
+	 * frequency to maximise efficiency. To prevent a single client from
+	 * forcing the clocks too high for the whole system, we only allow
+	 * each client to waitboost once in a busy period.
+	 */
+	if (INTEL_INFO(req->i915)->gen >= 6)
+		gen6_rps_boost(req->i915, rps, req->emitted_jiffies);
+
+	intel_wait_init(&wait, req->seqno);
+	set_task_state(wait.task, state);
+
+	/* Optimistic spin for the next ~jiffie before touching IRQs */
+	if (intel_engine_add_wait(req->ring, &wait)) {
+		if (__i915_spin_request(req, &wait, state))
+			goto complete;
+
+		/* In order to check that we haven't missed the interrupt
+		 * as we enabled it, we need to kick ourselves to do a
+		 * coherent check on the seqno before we sleep.
+		 */
+		if (intel_engine_enable_wait_irq(req->ring, &wait))
+			goto wakeup;
+	}
+
+	for (;;) {
+		if (signal_pending_state(state, wait.task)) {
+			ret = -ERESTARTSYS;
+			break;
+		}
+
+		/* Ensure that even if the GPU hangs, we get woken up. */
+		i915_queue_hangcheck(req->i915);
+
+		timeout_remain = io_schedule_timeout(timeout_remain);
+		if (timeout_remain == 0) {
+			ret = -ETIME;
+			break;
+		}
+
+		if (intel_wait_complete(&wait))
+			break;
+
+wakeup:
+		set_task_state(wait.task, state);
+
+		/* Carefully check if the request is complete, giving time
+		 * for the seqno to be visible following the interrupt.
+		 * We also have to check in case we are kicked by the GPU
+		 * reset in order to drop the struct_mutex.
+		 */
+		if (__i915_request_irq_complete(req))
+			break;
+	}
+
+complete:
+	intel_engine_remove_wait(req->ring, &wait);
+	__set_task_state(wait.task, TASK_RUNNING);
+	trace_i915_gem_request_wait_end(req);
+
+	if (timeout) {
+		*timeout -= ktime_get_raw_ns();
+		if (*timeout < 0)
+			*timeout = 0;
+
+		/*
+		 * Apparently ktime isn't accurate enough and occasionally has a
+		 * bit of mismatch in the jiffies<->nsecs<->ktime loop. So patch
+		 * things up to make the test happy. We allow up to 1 jiffy.
+		 *
+		 * This is a regression from the timespec->ktime conversion.
+		 */
+		if (ret == -ETIME && *timeout < jiffies_to_usecs(1)*1000)
+			*timeout = 0;
+	}
+
+	if (ret == 0 && rps && req->seqno == req->ring->last_submitted_seqno) {
+		/* The GPU is now idle and this client has stalled.
+		 * Since no other client has submitted a request in the
+		 * meantime, assume that this client is the only one
+		 * supplying work to the GPU but is unable to keep that
+		 * work supplied because it is waiting. Since the GPU is
+		 * then never kept fully busy, RPS autoclocking will
+		 * keep the clocks relatively low, causing further delays.
+		 * Compensate by giving the synchronous client credit for
+		 * a waitboost next time.
+		 */
+		spin_lock(&req->i915->rps.client_lock);
+		list_del_init(&rps->link);
+		spin_unlock(&req->i915->rps.client_lock);
+	}
+
+	return ret;
+}
+
+/**
+ * Waits for a request to be signaled, and cleans up the
+ * request and object lists appropriately for that event.
+ */
+int
+i915_wait_request(struct drm_i915_gem_request *req)
+{
+	struct drm_device *dev;
+	struct drm_i915_private *dev_priv;
+	bool interruptible;
+	int ret;
+
+	BUG_ON(req == NULL);
+
+	dev = req->ring->dev;
+	dev_priv = dev->dev_private;
+	interruptible = dev_priv->mm.interruptible;
+
+	BUG_ON(!mutex_is_locked(&dev->struct_mutex));
+
+	ret = __i915_wait_request(req, interruptible, NULL, NULL);
+	if (ret)
+		return ret;
+
+	i915_gem_request_retire_upto(req);
+	return 0;
+}
+
+void i915_gem_request_free(struct kref *req_ref)
+{
+	struct drm_i915_gem_request *req = container_of(req_ref,
+						 typeof(*req), ref);
+	struct intel_context *ctx = req->ctx;
+
+	if (req->file_priv)
+		i915_gem_request_remove_from_client(req);
+
+	if (ctx) {
+		if (i915.enable_execlists) {
+			if (ctx != req->ring->default_context)
+				intel_lr_context_unpin(req);
+		}
+
+		i915_gem_context_unreference(ctx);
+	}
+
+	kmem_cache_free(req->i915->requests, req);
+}
diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
new file mode 100644
index 000000000000..d46f22f30b0a
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -0,0 +1,223 @@
+/*
+ * Copyright © 2008-2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#ifndef I915_GEM_REQUEST_H
+#define I915_GEM_REQUEST_H
+
+/**
+ * Request queue structure.
+ *
+ * The request queue allows us to note sequence numbers that have been emitted
+ * and may be associated with active buffers to be retired.
+ *
+ * By keeping this list, we can avoid having to do questionable sequence
+ * number comparisons on buffer last_read|write_seqno. It also allows an
+ * emission time to be associated with the request for tracking how far ahead
+ * of the GPU the submission is.
+ *
+ * The requests are reference counted, so upon creation they should have an
+ * initial reference taken using kref_init
+ */
+struct drm_i915_gem_request {
+	struct kref ref;
+
+	/** On which ring this request was generated */
+	struct drm_i915_private *i915;
+	struct intel_engine_cs *ring;
+	unsigned reset_counter;
+
+	 /** GEM sequence number associated with the previous request,
+	  * when the HWS breadcrumb is equal to this the GPU is processing
+	  * this request.
+	  */
+	u32 previous_seqno;
+
+	 /** GEM sequence number associated with this request,
+	  * when the HWS breadcrumb is equal or greater than this the GPU
+	  * has finished processing this request.
+	  */
+	u32 seqno;
+
+	/** Position in the ringbuffer of the start of the request */
+	u32 head;
+
+	/**
+	 * Position in the ringbuffer of the start of the postfix.
+	 * This is required to calculate the maximum available ringbuffer
+	 * space without overwriting the postfix.
+	 */
+	 u32 postfix;
+
+	/** Position in the ringbuffer of the end of the whole request */
+	u32 tail;
+
+	/**
+	 * Context and ring buffer related to this request
+	 * Contexts are refcounted, so when this request is associated with a
+	 * context, we must increment the context's refcount, to guarantee that
+	 * it persists while any request is linked to it. Requests themselves
+	 * are also refcounted, so the request will only be freed when the last
+	 * reference to it is dismissed, and the code in
+	 * i915_gem_request_free() will then decrement the refcount on the
+	 * context.
+	 */
+	struct intel_context *ctx;
+	struct intel_ringbuffer *ringbuf;
+
+	/** Batch buffer related to this request if any (used for
+	    error state dump only) */
+	struct drm_i915_gem_object *batch_obj;
+
+	/** Time at which this request was emitted, in jiffies. */
+	unsigned long emitted_jiffies;
+
+	/** global list entry for this request */
+	struct list_head list;
+
+	struct drm_i915_file_private *file_priv;
+	/** file_priv list entry for this request */
+	struct list_head client_list;
+
+	/** process identifier submitting this request */
+	struct pid *pid;
+
+	/**
+	 * The ELSP only accepts two elements at a time, so we queue
+	 * context/tail pairs on a given queue (ring->execlist_queue) until the
+	 * hardware is available. The queue serves a double purpose: we also use
+	 * it to keep track of the up to 2 contexts currently in the hardware
+	 * (usually one in execution and the other queued up by the GPU): We
+	 * only remove elements from the head of the queue when the hardware
+	 * informs us that an element has been completed.
+	 *
+	 * All accesses to the queue are mediated by a spinlock
+	 * (ring->execlist_lock).
+	 */
+
+	/** Execlist link in the submission queue.*/
+	struct list_head execlist_link;
+
+	/** Execlists no. of times this request has been sent to the ELSP */
+	int elsp_submitted;
+};
+
+int i915_gem_request_alloc(struct intel_engine_cs *ring,
+			   struct intel_context *ctx,
+			   struct drm_i915_gem_request **req_out);
+void i915_gem_request_cancel(struct drm_i915_gem_request *req);
+void i915_gem_request_free(struct kref *req_ref);
+int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
+				   struct drm_file *file);
+void i915_gem_request_retire_upto(struct drm_i915_gem_request *req);
+
+static inline uint32_t
+i915_gem_request_get_seqno(struct drm_i915_gem_request *req)
+{
+	return req ? req->seqno : 0;
+}
+
+static inline struct intel_engine_cs *
+i915_gem_request_get_ring(struct drm_i915_gem_request *req)
+{
+	return req ? req->ring : NULL;
+}
+
+static inline struct drm_i915_gem_request *
+i915_gem_request_reference(struct drm_i915_gem_request *req)
+{
+	if (req)
+		kref_get(&req->ref);
+	return req;
+}
+
+static inline void
+i915_gem_request_unreference(struct drm_i915_gem_request *req)
+{
+	WARN_ON(!mutex_is_locked(&req->ring->dev->struct_mutex));
+	kref_put(&req->ref, i915_gem_request_free);
+}
+
+static inline void
+i915_gem_request_unreference__unlocked(struct drm_i915_gem_request *req)
+{
+	struct drm_device *dev;
+
+	if (!req)
+		return;
+
+	dev = req->ring->dev;
+	if (kref_put_mutex(&req->ref, i915_gem_request_free, &dev->struct_mutex))
+		mutex_unlock(&dev->struct_mutex);
+}
+
+static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
+					   struct drm_i915_gem_request *src)
+{
+	if (src)
+		i915_gem_request_reference(src);
+
+	if (*pdst)
+		i915_gem_request_unreference(*pdst);
+
+	*pdst = src;
+}
+
+void __i915_add_request(struct drm_i915_gem_request *req,
+			struct drm_i915_gem_object *batch_obj,
+			bool flush_caches);
+#define i915_add_request(req) \
+	__i915_add_request(req, NULL, true)
+#define i915_add_request_no_flush(req) \
+	__i915_add_request(req, NULL, false)
+
+struct intel_rps_client;
+
+int __i915_wait_request(struct drm_i915_gem_request *req,
+			bool interruptible,
+			s64 *timeout,
+			struct intel_rps_client *rps);
+int __must_check i915_wait_request(struct drm_i915_gem_request *req);
+
+/**
+ * Returns true if seq1 is later than seq2.
+ */
+static inline bool
+i915_seqno_passed(uint32_t seq1, uint32_t seq2)
+{
+	return (int32_t)(seq1 - seq2) >= 0;
+}
+
+static inline bool i915_gem_request_started(struct drm_i915_gem_request *req)
+{
+	return i915_seqno_passed(intel_ring_get_seqno(req->ring),
+				 req->previous_seqno);
+}
+
+static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
+{
+	return i915_seqno_passed(intel_ring_get_seqno(req->ring),
+				 req->seqno);
+}
+
+#endif /* I915_GEM_REQUEST_H */
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 045/190] drm/i915: Move releasing of the GEM request from free to retire/cancel
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (42 preceding siblings ...)
  2016-01-11  9:16 ` [PATCH 044/190] drm/i915: Move GEM request routines to i915_gem_request.c Chris Wilson
@ 2016-01-11  9:16 ` Chris Wilson
  2016-03-08 13:15   ` Tvrtko Ursulin
  2016-01-11  9:16 ` [PATCH 046/190] drm/i915: Derive GEM requests from dma-fence Chris Wilson
                   ` (42 subsequent siblings)
  86 siblings, 1 reply; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:16 UTC (permalink / raw)
  To: intel-gfx

If we move the release of the GEM request (i.e. decoupling it from the
various lists used for client and context tracking) after it is complete
(either by the GPU retiring the request, or by the caller cancelling the
request), we can remove the requirement that the final unreference of
the GEM request needs to be under the struct_mutex.

v2: Execlists, as always, is badly asymmetric, and year-old patches
still haven't landed to fix it up.
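
For illustration only, not part of the diff: with release tied to
retire/cancel, the final unreference shrinks to a plain kref_put(),
where previously a caller outside the lock needed kref_put_mutex():

	/* before: free touched the client/context lists, so the last
	 * put had to take struct_mutex if not already held
	 */
	if (kref_put_mutex(&req->ref, i915_gem_request_free,
			   &dev->struct_mutex))
		mutex_unlock(&dev->struct_mutex);

	/* after: retire/cancel already decoupled the request from
	 * those lists, free just returns memory, no lock required
	 */
	kref_put(&req->ref, i915_gem_request_free);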

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c          |  4 +--
 drivers/gpu/drm/i915/i915_gem_request.c  | 50 ++++++++++++++------------------
 drivers/gpu/drm/i915/i915_gem_request.h  | 14 ---------
 drivers/gpu/drm/i915/intel_breadcrumbs.c |  2 +-
 drivers/gpu/drm/i915/intel_display.c     |  2 +-
 drivers/gpu/drm/i915/intel_lrc.c         |  6 ++--
 drivers/gpu/drm/i915/intel_pm.c          |  2 +-
 7 files changed, 30 insertions(+), 50 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 68a25617ca7a..6d8d65304abf 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2502,7 +2502,7 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 			ret = __i915_wait_request(req[i], true,
 						  args->timeout_ns > 0 ? &args->timeout_ns : NULL,
 						  to_rps_client(file));
-		i915_gem_request_unreference__unlocked(req[i]);
+		i915_gem_request_unreference(req[i]);
 	}
 	return ret;
 
@@ -3505,7 +3505,7 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
 		return 0;
 
 	ret = __i915_wait_request(target, true, NULL, NULL);
-	i915_gem_request_unreference__unlocked(target);
+	i915_gem_request_unreference(target);
 
 	return ret;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index b4ede6dd7b20..1c4f4d83a3c2 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -184,13 +184,6 @@ err:
 	return ret;
 }
 
-void i915_gem_request_cancel(struct drm_i915_gem_request *req)
-{
-	intel_ring_reserved_space_cancel(req->ringbuf);
-
-	i915_gem_request_unreference(req);
-}
-
 int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
 				   struct drm_file *file)
 {
@@ -235,9 +228,28 @@ i915_gem_request_remove_from_client(struct drm_i915_gem_request *request)
 	request->pid = NULL;
 }
 
+static void __i915_gem_request_release(struct drm_i915_gem_request *request)
+{
+	i915_gem_request_remove_from_client(request);
+
+	i915_gem_context_unreference(request->ctx);
+	i915_gem_request_unreference(request);
+}
+
+void i915_gem_request_cancel(struct drm_i915_gem_request *req)
+{
+	intel_ring_reserved_space_cancel(req->ringbuf);
+	if (i915.enable_execlists) {
+		if (req->ctx != req->ring->default_context)
+			intel_lr_context_unpin(req);
+	}
+	__i915_gem_request_release(req);
+}
+
 static void i915_gem_request_retire(struct drm_i915_gem_request *request)
 {
 	trace_i915_gem_request_retire(request);
+	list_del_init(&request->list);
 
 	/* We know the GPU must have read the request to have
 	 * sent us the seqno + interrupt, so use the position
@@ -248,11 +260,7 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
 	 * completion order.
 	 */
 	request->ringbuf->last_retired_head = request->postfix;
-
-	list_del_init(&request->list);
-	i915_gem_request_remove_from_client(request);
-
-	i915_gem_request_unreference(request);
+	__i915_gem_request_release(request);
 }
 
 void
@@ -639,21 +647,7 @@ i915_wait_request(struct drm_i915_gem_request *req)
 
 void i915_gem_request_free(struct kref *req_ref)
 {
-	struct drm_i915_gem_request *req = container_of(req_ref,
-						 typeof(*req), ref);
-	struct intel_context *ctx = req->ctx;
-
-	if (req->file_priv)
-		i915_gem_request_remove_from_client(req);
-
-	if (ctx) {
-		if (i915.enable_execlists) {
-			if (ctx != req->ring->default_context)
-				intel_lr_context_unpin(req);
-		}
-
-		i915_gem_context_unreference(ctx);
-	}
-
+	struct drm_i915_gem_request *req =
+		container_of(req_ref, typeof(*req), ref);
 	kmem_cache_free(req->i915->requests, req);
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index d46f22f30b0a..af1b825fce50 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -154,23 +154,9 @@ i915_gem_request_reference(struct drm_i915_gem_request *req)
 static inline void
 i915_gem_request_unreference(struct drm_i915_gem_request *req)
 {
-	WARN_ON(!mutex_is_locked(&req->ring->dev->struct_mutex));
 	kref_put(&req->ref, i915_gem_request_free);
 }
 
-static inline void
-i915_gem_request_unreference__unlocked(struct drm_i915_gem_request *req)
-{
-	struct drm_device *dev;
-
-	if (!req)
-		return;
-
-	dev = req->ring->dev;
-	if (kref_put_mutex(&req->ref, i915_gem_request_free, &dev->struct_mutex))
-		mutex_unlock(&dev->struct_mutex);
-}
-
 static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
 					   struct drm_i915_gem_request *src)
 {
diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
index 0ea01bd6811c..f6731aac7fcf 100644
--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -390,7 +390,7 @@ static int intel_breadcrumbs_signaler(void *arg)
 			 */
 			intel_engine_remove_wait(engine, &signal->wait);
 
-			i915_gem_request_unreference__unlocked(signal->request);
+			i915_gem_request_unreference(signal->request);
 
 			/* Find the next oldest signal. Note that as we have
 			 * not been holding the lock, another client may
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 57c54c9bc82b..32885b8d5c02 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -11431,7 +11431,7 @@ static void intel_mmio_flip_work_func(struct work_struct *work)
 		WARN_ON(__i915_wait_request(mmio_flip->req,
 					    false, NULL,
 					    &mmio_flip->i915->rps.mmioflips));
-		i915_gem_request_unreference__unlocked(mmio_flip->req);
+		i915_gem_request_unreference(mmio_flip->req);
 	}
 
 	/* For framebuffer backed by dmabuf, wait for fence */
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index b634e7d7a92b..7a3069a2beb2 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -587,9 +587,6 @@ static int execlists_context_queue(struct drm_i915_gem_request *request)
 	struct drm_i915_gem_request *cursor;
 	int num_elements = 0;
 
-	if (request->ctx != ring->default_context)
-		intel_lr_context_pin(request);
-
 	i915_gem_request_reference(request);
 
 	spin_lock_irq(&ring->execlist_lock);
@@ -1071,6 +1068,8 @@ static int intel_lr_context_pin(struct drm_i915_gem_request *rq)
 		ret = intel_lr_context_do_pin(ring, ctx_obj, ringbuf);
 		if (ret)
 			goto reset_pin_count;
+
+		i915_gem_context_reference(rq->ctx);
 	}
 	return ret;
 
@@ -1090,6 +1089,7 @@ void intel_lr_context_unpin(struct drm_i915_gem_request *rq)
 		if (--rq->ctx->engine[ring->id].pin_count == 0) {
 			intel_unpin_ringbuffer_obj(ringbuf);
 			i915_gem_object_ggtt_unpin(ctx_obj);
+			i915_gem_context_unreference(rq->ctx);
 		}
 	}
 }
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index e51ba529a97e..0e13135aefaa 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -7289,7 +7289,7 @@ static void __intel_rps_boost_work(struct work_struct *work)
 		gen6_rps_boost(to_i915(req->ring->dev), NULL,
 			       req->emitted_jiffies);
 
-	i915_gem_request_unreference__unlocked(req);
+	i915_gem_request_unreference(req);
 	kfree(boost);
 }
 
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 046/190] drm/i915: Derive GEM requests from dma-fence
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (43 preceding siblings ...)
  2016-01-11  9:16 ` [PATCH 045/190] drm/i915: Move releasing of the GEM request from free to retire/cancel Chris Wilson
@ 2016-01-11  9:16 ` Chris Wilson
  2016-01-11  9:16 ` [PATCH 047/190] drm/i915: Rename request reference/unreference to get/put Chris Wilson
                   ` (41 subsequent siblings)
  86 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:16 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter

dma-buf provides a generic fence class for interoperation between
drivers. Internally we use the request structure as a fence, and so with
only a little bit of interfacing we can rebase those requests on top of
dma-buf fences. This will allow us, in the future, to pass those fences
back to userspace or between drivers.

v2: The fence_context needs to be globally unique, not just unique to
this device.
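
As a sketch of the interoperability this buys (illustrative, not taken
from the diff): once the request embeds a struct fence, generic code
that knows nothing about i915 can wait on it through the common API:

	#include <linux/fence.h>

	/* 'fence' may have arrived from another driver entirely; the
	 * calls below dispatch into i915_fence_ops underneath
	 */
	static int wait_on_foreign_fence(struct fence *fence)
	{
		long ret;

		fence_enable_sw_signaling(fence);
		ret = fence_wait_timeout(fence, true, HZ); /* 1s, interruptible */
		if (ret == 0)
			return -ETIME;
		return ret < 0 ? ret : 0;
	}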

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_debugfs.c        |   2 +-
 drivers/gpu/drm/i915/i915_gem_request.c    | 111 +++++++++++++++++++++++++----
 drivers/gpu/drm/i915/i915_gem_request.h    |  33 ++++-----
 drivers/gpu/drm/i915/i915_gpu_error.c      |   2 +-
 drivers/gpu/drm/i915/i915_guc_submission.c |   2 +-
 drivers/gpu/drm/i915/i915_trace.h          |   2 +-
 drivers/gpu/drm/i915/intel_breadcrumbs.c   |   3 +-
 drivers/gpu/drm/i915/intel_lrc.c           |   3 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c    |  15 ++--
 drivers/gpu/drm/i915/intel_ringbuffer.h    |   1 +
 10 files changed, 133 insertions(+), 41 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 6172649b7e56..b82482573a8f 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -710,7 +710,7 @@ static int i915_gem_request_info(struct seq_file *m, void *data)
 			if (req->pid)
 				task = pid_task(req->pid, PIDTYPE_PID);
 			seq_printf(m, "    %x @ %d: %s [%d]\n",
-				   req->seqno,
+				   req->fence.seqno,
 				   (int) (jiffies - req->emitted_jiffies),
 				   task ? task->comm : "<unknown>",
 				   task ? task->pid : -1);
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 1c4f4d83a3c2..e366ca0dcd99 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -24,6 +24,92 @@
 
 #include "i915_drv.h"
 
+static inline struct drm_i915_gem_request *
+to_i915_request(struct fence *fence)
+{
+	return container_of(fence, struct drm_i915_gem_request, fence);
+}
+
+static const char *i915_fence_get_driver_name(struct fence *fence)
+{
+	return "i915";
+}
+
+static const char *i915_fence_get_timeline_name(struct fence *fence)
+{
+	return to_i915_request(fence)->ring->name;
+}
+
+static bool i915_fence_signaled(struct fence *fence)
+{
+	return i915_gem_request_completed(to_i915_request(fence));
+}
+
+static bool i915_fence_enable_signaling(struct fence *fence)
+{
+	if (i915_fence_signaled(fence))
+		return false;
+
+	return intel_engine_enable_signaling(to_i915_request(fence)) == 0;
+}
+
+static signed long i915_fence_wait(struct fence *fence,
+				   bool interruptible,
+				   signed long timeout_jiffies)
+{
+	s64 timeout_ns, *timeout;
+	int ret;
+
+	if (timeout_jiffies != MAX_SCHEDULE_TIMEOUT) {
+		timeout_ns = jiffies_to_nsecs(timeout_jiffies);
+		timeout = &timeout_ns;
+	} else
+		timeout = NULL;
+
+	ret = __i915_wait_request(to_i915_request(fence),
+				  interruptible, timeout,
+				  NULL);
+	if (ret == -ETIME)
+		return 0;
+
+	if (ret < 0)
+		return ret;
+
+	if (timeout_jiffies != MAX_SCHEDULE_TIMEOUT)
+		timeout_jiffies = nsecs_to_jiffies(timeout_ns);
+
+	return timeout_jiffies;
+}
+
+static void i915_fence_value_str(struct fence *fence, char *str, int size)
+{
+	snprintf(str, size, "%u", fence->seqno);
+}
+
+static void i915_fence_timeline_value_str(struct fence *fence, char *str,
+					  int size)
+{
+	snprintf(str, size, "%u",
+		 intel_ring_get_seqno(to_i915_request(fence)->ring));
+}
+
+static void i915_fence_release(struct fence *fence)
+{
+	struct drm_i915_gem_request *req = to_i915_request(fence);
+	kmem_cache_free(req->i915->requests, req);
+}
+
+static const struct fence_ops i915_fence_ops = {
+	.get_driver_name = i915_fence_get_driver_name,
+	.get_timeline_name = i915_fence_get_timeline_name,
+	.enable_signaling = i915_fence_enable_signaling,
+	.signaled = i915_fence_signaled,
+	.wait = i915_fence_wait,
+	.release = i915_fence_release,
+	.fence_value_str = i915_fence_value_str,
+	.timeline_value_str = i915_fence_timeline_value_str,
+};
+
 static int
 i915_gem_check_wedge(unsigned reset_counter, bool interruptible)
 {
@@ -116,6 +202,7 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 	struct drm_i915_private *dev_priv = to_i915(ring->dev);
 	unsigned reset_counter = i915_reset_counter(&dev_priv->gpu_error);
 	struct drm_i915_gem_request *req;
+	u32 seqno;
 	int ret;
 
 	if (!req_out)
@@ -135,11 +222,17 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 	if (req == NULL)
 		return -ENOMEM;
 
-	ret = i915_gem_get_seqno(dev_priv, &req->seqno);
+	ret = i915_gem_get_seqno(dev_priv, &seqno);
 	if (ret)
 		goto err;
 
-	kref_init(&req->ref);
+	spin_lock_init(&req->lock);
+	fence_init(&req->fence,
+		   &i915_fence_ops,
+		   &req->lock,
+		   ring->fence_context,
+		   seqno);
+
 	req->i915 = dev_priv;
 	req->ring = ring;
 	req->reset_counter = reset_counter;
@@ -377,7 +470,7 @@ void __i915_add_request(struct drm_i915_gem_request *request,
 
 	request->emitted_jiffies = jiffies;
 	request->previous_seqno = ring->last_submitted_seqno;
-	ring->last_submitted_seqno = request->seqno;
+	ring->last_submitted_seqno = request->fence.seqno;
 	list_add_tail(&request->list, &ring->request_list);
 
 	trace_i915_gem_request_add(request);
@@ -531,7 +624,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 	if (INTEL_INFO(req->i915)->gen >= 6)
 		gen6_rps_boost(req->i915, rps, req->emitted_jiffies);
 
-	intel_wait_init(&wait, req->seqno);
+	intel_wait_init(&wait, req->fence.seqno);
 	set_task_state(wait.task, state);
 
 	/* Optimistic spin for the next ~jiffie before touching IRQs */
@@ -598,7 +691,8 @@ complete:
 			*timeout = 0;
 	}
 
-	if (ret == 0 && rps && req->seqno == req->ring->last_submitted_seqno) {
+	if (ret == 0 && rps &&
+	    req->fence.seqno == req->ring->last_submitted_seqno) {
 		/* The GPU is now idle and this client has stalled.
 		 * Since no other client has submitted a request in the
 		 * meantime, assume that this client is the only one
@@ -644,10 +738,3 @@ i915_wait_request(struct drm_i915_gem_request *req)
 	i915_gem_request_retire_upto(req);
 	return 0;
 }
-
-void i915_gem_request_free(struct kref *req_ref)
-{
-	struct drm_i915_gem_request *req =
-		container_of(req_ref, typeof(*req), ref);
-	kmem_cache_free(req->i915->requests, req);
-}
diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index af1b825fce50..b55d0b7c7f2a 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -25,6 +25,8 @@
 #ifndef I915_GEM_REQUEST_H
 #define I915_GEM_REQUEST_H
 
+#include <linux/fence.h>
+
 /**
  * Request queue structure.
  *
@@ -36,11 +38,11 @@
  * emission time to be associated with the request for tracking how far ahead
  * of the GPU the submission is.
  *
- * The requests are reference counted, so upon creation they should have an
- * initial reference taken using kref_init
+ * The requests are reference counted.
  */
 struct drm_i915_gem_request {
-	struct kref ref;
+	struct fence fence;
+	spinlock_t lock;
 
 	/** On which ring this request was generated */
 	struct drm_i915_private *i915;
@@ -53,12 +55,6 @@ struct drm_i915_gem_request {
 	  */
 	u32 previous_seqno;
 
-	 /** GEM sequence number associated with this request,
-	  * when the HWS breadcrumb is equal or greater than this the GPU
-	  * has finished processing this request.
-	  */
-	u32 seqno;
-
 	/** Position in the ringbuffer of the start of the request */
 	u32 head;
 
@@ -126,7 +122,6 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 			   struct intel_context *ctx,
 			   struct drm_i915_gem_request **req_out);
 void i915_gem_request_cancel(struct drm_i915_gem_request *req);
-void i915_gem_request_free(struct kref *req_ref);
 int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
 				   struct drm_file *file);
 void i915_gem_request_retire_upto(struct drm_i915_gem_request *req);
@@ -134,7 +129,7 @@ void i915_gem_request_retire_upto(struct drm_i915_gem_request *req);
 static inline uint32_t
 i915_gem_request_get_seqno(struct drm_i915_gem_request *req)
 {
-	return req ? req->seqno : 0;
+	return req ? req->fence.seqno : 0;
 }
 
 static inline struct intel_engine_cs *
@@ -144,17 +139,23 @@ i915_gem_request_get_ring(struct drm_i915_gem_request *req)
 }
 
 static inline struct drm_i915_gem_request *
+to_request(struct fence *fence)
+{
+	/* We assume that NULL fence/request are interoperable */
+	BUILD_BUG_ON(offsetof(struct drm_i915_gem_request, fence) != 0);
+	return container_of(fence, struct drm_i915_gem_request, fence);
+}
+
+static inline struct drm_i915_gem_request *
 i915_gem_request_reference(struct drm_i915_gem_request *req)
 {
-	if (req)
-		kref_get(&req->ref);
-	return req;
+	return to_request(fence_get(&req->fence));
 }
 
 static inline void
 i915_gem_request_unreference(struct drm_i915_gem_request *req)
 {
-	kref_put(&req->ref, i915_gem_request_free);
+	fence_put(&req->fence);
 }
 
 static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
@@ -203,7 +204,7 @@ static inline bool i915_gem_request_started(struct drm_i915_gem_request *req)
 static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
 {
 	return i915_seqno_passed(intel_ring_get_seqno(req->ring),
-				 req->seqno);
+				 req->fence.seqno);
 }
 
 #endif /* I915_GEM_REQUEST_H */
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 86f582115313..05f054898a95 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1092,7 +1092,7 @@ static void i915_gem_record_rings(struct drm_device *dev,
 			}
 
 			erq = &error->ring[i].requests[count++];
-			erq->seqno = request->seqno;
+			erq->seqno = request->fence.seqno;
 			erq->jiffies = request->emitted_jiffies;
 			erq->tail = request->postfix;
 		}
diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index 9c244247c13e..56d3064d32ed 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -616,7 +616,7 @@ int i915_guc_submit(struct i915_guc_client *client,
 		client->retcode = 0;
 	}
 	guc->submissions[ring_id] += 1;
-	guc->last_seqno[ring_id] = rq->seqno;
+	guc->last_seqno[ring_id] = rq->fence.seqno;
 
 	return q_ret;
 }
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 43bb2e0bb949..dc2ff5cac2f4 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -503,7 +503,7 @@ TRACE_EVENT(i915_gem_ring_dispatch,
 			   __entry->ring = ring->id;
 			   __entry->seqno = i915_gem_request_get_seqno(req);
 			   __entry->flags = flags;
-			   intel_engine_enable_signaling(req);
+			   fence_enable_sw_signaling(&req->fence);
 			   ),
 
 	    TP_printk("dev=%u, ring=%u, seqno=%u, flags=%x",
diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
index f6731aac7fcf..61e18cb90850 100644
--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -390,6 +390,7 @@ static int intel_breadcrumbs_signaler(void *arg)
 			 */
 			intel_engine_remove_wait(engine, &signal->wait);
 
+			fence_signal(&signal->request->fence);
 			i915_gem_request_unreference(signal->request);
 
 			/* Find the next oldest signal. Note that as we have
@@ -456,7 +457,7 @@ int intel_engine_enable_signaling(struct drm_i915_gem_request *request)
 	}
 
 	signal->wait.task = task;
-	signal->wait.seqno = request->seqno;
+	signal->wait.seqno = request->fence.seqno;
 
 	signal->request = i915_gem_request_reference(request);
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 7a3069a2beb2..f43a94ae5c76 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1798,7 +1798,7 @@ static int gen8_emit_request(struct drm_i915_gem_request *request)
 				(ring->status_page.gfx_addr +
 				(I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT)));
 	intel_logical_ring_emit(ringbuf, 0);
-	intel_logical_ring_emit(ringbuf, request->seqno);
+	intel_logical_ring_emit(ringbuf, request->fence.seqno);
 	intel_logical_ring_emit(ringbuf, MI_USER_INTERRUPT);
 	intel_logical_ring_emit(ringbuf, MI_NOOP);
 	intel_logical_ring_advance_and_submit(request);
@@ -1909,6 +1909,7 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
 
 	ring->dev = dev;
 	ring->i915 = to_i915(dev);
+	ring->fence_context = fence_context_alloc(1);
 	INIT_LIST_HEAD(&ring->active_list);
 	INIT_LIST_HEAD(&ring->request_list);
 	i915_gem_batch_pool_init(dev, &ring->batch_pool);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index d9bb6458fa60..e8a7a1045c06 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1218,7 +1218,7 @@ static int gen8_rcs_signal(struct drm_i915_gem_request *signaller_req,
 					   PIPE_CONTROL_FLUSH_ENABLE);
 		intel_ring_emit(signaller, lower_32_bits(gtt_offset));
 		intel_ring_emit(signaller, upper_32_bits(gtt_offset));
-		intel_ring_emit(signaller, signaller_req->seqno);
+		intel_ring_emit(signaller, signaller_req->fence.seqno);
 		intel_ring_emit(signaller, 0);
 		intel_ring_emit(signaller, MI_SEMAPHORE_SIGNAL |
 					   MI_SEMAPHORE_TARGET(waiter->id));
@@ -1256,7 +1256,7 @@ static int gen8_xcs_signal(struct drm_i915_gem_request *signaller_req,
 		intel_ring_emit(signaller, lower_32_bits(gtt_offset) |
 					   MI_FLUSH_DW_USE_GTT);
 		intel_ring_emit(signaller, upper_32_bits(gtt_offset));
-		intel_ring_emit(signaller, signaller_req->seqno);
+		intel_ring_emit(signaller, signaller_req->fence.seqno);
 		intel_ring_emit(signaller, MI_SEMAPHORE_SIGNAL |
 					   MI_SEMAPHORE_TARGET(waiter->id));
 		intel_ring_emit(signaller, 0);
@@ -1289,7 +1289,7 @@ static int gen6_signal(struct drm_i915_gem_request *signaller_req,
 		if (i915_mmio_reg_valid(mbox_reg)) {
 			intel_ring_emit(signaller, MI_LOAD_REGISTER_IMM(1));
 			intel_ring_emit_reg(signaller, mbox_reg);
-			intel_ring_emit(signaller, signaller_req->seqno);
+			intel_ring_emit(signaller, signaller_req->fence.seqno);
 		}
 	}
 
@@ -1324,7 +1324,7 @@ gen6_add_request(struct drm_i915_gem_request *req)
 
 	intel_ring_emit(ring, MI_STORE_DWORD_INDEX);
 	intel_ring_emit(ring, I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT);
-	intel_ring_emit(ring, req->seqno);
+	intel_ring_emit(ring, req->fence.seqno);
 	intel_ring_emit(ring, MI_USER_INTERRUPT);
 	__intel_ring_advance(ring);
 
@@ -1448,7 +1448,7 @@ pc_render_add_request(struct drm_i915_gem_request *req)
 			PIPE_CONTROL_QW_WRITE |
 			PIPE_CONTROL_WRITE_FLUSH);
 	intel_ring_emit(ring, addr | PIPE_CONTROL_GLOBAL_GTT);
-	intel_ring_emit(ring, req->seqno);
+	intel_ring_emit(ring, req->fence.seqno);
 	intel_ring_emit(ring, 0);
 	PIPE_CONTROL_FLUSH(ring, scratch_addr);
 	scratch_addr += 2 * CACHELINE_BYTES; /* write to separate cachelines */
@@ -1467,7 +1467,7 @@ pc_render_add_request(struct drm_i915_gem_request *req)
 			PIPE_CONTROL_WRITE_FLUSH |
 			PIPE_CONTROL_NOTIFY);
 	intel_ring_emit(ring, addr | PIPE_CONTROL_GLOBAL_GTT);
-	intel_ring_emit(ring, req->seqno);
+	intel_ring_emit(ring, req->fence.seqno);
 	intel_ring_emit(ring, 0);
 	__intel_ring_advance(ring);
 
@@ -1577,7 +1577,7 @@ i9xx_add_request(struct drm_i915_gem_request *req)
 
 	intel_ring_emit(ring, MI_STORE_DWORD_INDEX);
 	intel_ring_emit(ring, I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT);
-	intel_ring_emit(ring, req->seqno);
+	intel_ring_emit(ring, req->fence.seqno);
 	intel_ring_emit(ring, MI_USER_INTERRUPT);
 	__intel_ring_advance(ring);
 
@@ -2010,6 +2010,7 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 
 	ring->dev = dev;
 	ring->i915 = to_i915(dev);
+	ring->fence_context = fence_context_alloc(1);
 	INIT_LIST_HEAD(&ring->active_list);
 	INIT_LIST_HEAD(&ring->request_list);
 	INIT_LIST_HEAD(&ring->execlist_queue);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index eecf9c7ae2b8..a1fcb6c7501f 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -159,6 +159,7 @@ struct  intel_engine_cs {
 	} id;
 #define I915_NUM_RINGS 5
 #define LAST_USER_RING (VECS + 1)
+	unsigned fence_context;
 	u32		mmio_base;
 	struct		drm_device *dev;
 	struct drm_i915_private *i915;
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 047/190] drm/i915: Rename request reference/unreference to get/put
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (44 preceding siblings ...)
  2016-01-11  9:16 ` [PATCH 046/190] drm/i915: Derive GEM requests from dma-fence Chris Wilson
@ 2016-01-11  9:16 ` Chris Wilson
  2016-01-11  9:16 ` [PATCH 048/190] drm/i915: Disable waitboosting for fence_wait() Chris Wilson
                   ` (40 subsequent siblings)
  86 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:16 UTC (permalink / raw)
  To: intel-gfx

Now that we derive requests from struct fence, swap over to its
nomenclature for references. It's shorter and more idiomatic across the
kernel.

s/i915_gem_request_reference/i915_gem_request_get/
s/i915_gem_request_unreference/i915_gem_request_put/

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c          | 14 +++++++-------
 drivers/gpu/drm/i915/i915_gem_request.c  |  2 +-
 drivers/gpu/drm/i915/i915_gem_request.h  |  8 ++++----
 drivers/gpu/drm/i915/intel_breadcrumbs.c |  4 ++--
 drivers/gpu/drm/i915/intel_display.c     |  4 ++--
 drivers/gpu/drm/i915/intel_lrc.c         |  4 ++--
 drivers/gpu/drm/i915/intel_pm.c          |  5 ++---
 7 files changed, 20 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 6d8d65304abf..fd61e722b595 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1185,7 +1185,7 @@ i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
 		if (req == NULL)
 			return 0;
 
-		requests[n++] = i915_gem_request_reference(req);
+		requests[n++] = i915_gem_request_get(req);
 	} else {
 		for (i = 0; i < I915_NUM_RINGS; i++) {
 			struct drm_i915_gem_request *req;
@@ -1194,7 +1194,7 @@ i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
 			if (req == NULL)
 				continue;
 
-			requests[n++] = i915_gem_request_reference(req);
+			requests[n++] = i915_gem_request_get(req);
 		}
 	}
 
@@ -1207,7 +1207,7 @@ i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
 	for (i = 0; i < n; i++) {
 		if (ret == 0)
 			i915_gem_object_retire_request(obj, requests[i]);
-		i915_gem_request_unreference(requests[i]);
+		i915_gem_request_put(requests[i]);
 	}
 
 	return ret;
@@ -2492,7 +2492,7 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 		if (obj->last_read_req[i] == NULL)
 			continue;
 
-		req[n++] = i915_gem_request_reference(obj->last_read_req[i]);
+		req[n++] = i915_gem_request_get(obj->last_read_req[i]);
 	}
 
 	mutex_unlock(&dev->struct_mutex);
@@ -2502,7 +2502,7 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 			ret = __i915_wait_request(req[i], true,
 						  args->timeout_ns > 0 ? &args->timeout_ns : NULL,
 						  to_rps_client(file));
-		i915_gem_request_unreference(req[i]);
+		i915_gem_request_put(req[i]);
 	}
 	return ret;
 
@@ -3498,14 +3498,14 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
 		target = request;
 	}
 	if (target)
-		i915_gem_request_reference(target);
+		i915_gem_request_get(target);
 	spin_unlock(&file_priv->mm.lock);
 
 	if (target == NULL)
 		return 0;
 
 	ret = __i915_wait_request(target, true, NULL, NULL);
-	i915_gem_request_unreference(target);
+	i915_gem_request_put(target);
 
 	return ret;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index e366ca0dcd99..a796dbd1b0e4 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -326,7 +326,7 @@ static void __i915_gem_request_release(struct drm_i915_gem_request *request)
 	i915_gem_request_remove_from_client(request);
 
 	i915_gem_context_unreference(request->ctx);
-	i915_gem_request_unreference(request);
+	i915_gem_request_put(request);
 }
 
 void i915_gem_request_cancel(struct drm_i915_gem_request *req)
diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index b55d0b7c7f2a..0ab14fd0fce0 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -147,13 +147,13 @@ to_request(struct fence *fence)
 }
 
 static inline struct drm_i915_gem_request *
-i915_gem_request_reference(struct drm_i915_gem_request *req)
+i915_gem_request_get(struct drm_i915_gem_request *req)
 {
 	return to_request(fence_get(&req->fence));
 }
 
 static inline void
-i915_gem_request_unreference(struct drm_i915_gem_request *req)
+i915_gem_request_put(struct drm_i915_gem_request *req)
 {
 	fence_put(&req->fence);
 }
@@ -162,10 +162,10 @@ static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
 					   struct drm_i915_gem_request *src)
 {
 	if (src)
-		i915_gem_request_reference(src);
+		i915_gem_request_get(src);
 
 	if (*pdst)
-		i915_gem_request_unreference(*pdst);
+		i915_gem_request_put(*pdst);
 
 	*pdst = src;
 }
diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
index 61e18cb90850..aca1b72edcd8 100644
--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -391,7 +391,7 @@ static int intel_breadcrumbs_signaler(void *arg)
 			intel_engine_remove_wait(engine, &signal->wait);
 
 			fence_signal(&signal->request->fence);
-			i915_gem_request_unreference(signal->request);
+			i915_gem_request_put(signal->request);
 
 			/* Find the next oldest signal. Note that as we have
 			 * not been holding the lock, another client may
@@ -459,7 +459,7 @@ int intel_engine_enable_signaling(struct drm_i915_gem_request *request)
 	signal->wait.task = task;
 	signal->wait.seqno = request->fence.seqno;
 
-	signal->request = i915_gem_request_reference(request);
+	signal->request = i915_gem_request_get(request);
 
 	/* Insert ourselves into the retirement ordered list of signals
 	 * on this engine. We track the oldest seqno as that will be the
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 32885b8d5c02..ae247927e931 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -11431,7 +11431,7 @@ static void intel_mmio_flip_work_func(struct work_struct *work)
 		WARN_ON(__i915_wait_request(mmio_flip->req,
 					    false, NULL,
 					    &mmio_flip->i915->rps.mmioflips));
-		i915_gem_request_unreference(mmio_flip->req);
+		i915_gem_request_put(mmio_flip->req);
 	}
 
 	/* For framebuffer backed by dmabuf, wait for fence */
@@ -11455,7 +11455,7 @@ static int intel_queue_mmio_flip(struct drm_device *dev,
 		return -ENOMEM;
 
 	mmio_flip->i915 = to_i915(dev);
-	mmio_flip->req = i915_gem_request_reference(obj->last_write_req);
+	mmio_flip->req = i915_gem_request_get(obj->last_write_req);
 	mmio_flip->crtc = to_intel_crtc(crtc);
 	mmio_flip->rotation = crtc->primary->state->rotation;
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index f43a94ae5c76..433e9f60e926 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -587,7 +587,7 @@ static int execlists_context_queue(struct drm_i915_gem_request *request)
 	struct drm_i915_gem_request *cursor;
 	int num_elements = 0;
 
-	i915_gem_request_reference(request);
+	i915_gem_request_get(request);
 
 	spin_lock_irq(&ring->execlist_lock);
 
@@ -983,7 +983,7 @@ void intel_execlists_retire_requests(struct intel_engine_cs *ring)
 		if (ctx_obj && (ctx != ring->default_context))
 			intel_lr_context_unpin(req);
 		list_del(&req->execlist_link);
-		i915_gem_request_unreference(req);
+		i915_gem_request_put(req);
 	}
 }
 
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 0e13135aefaa..39b7ca9c3e66 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -7289,7 +7289,7 @@ static void __intel_rps_boost_work(struct work_struct *work)
 		gen6_rps_boost(to_i915(req->ring->dev), NULL,
 			       req->emitted_jiffies);
 
-	i915_gem_request_unreference(req);
+	i915_gem_request_put(req);
 	kfree(boost);
 }
 
@@ -7308,8 +7308,7 @@ void intel_queue_rps_boost_for_request(struct drm_device *dev,
 	if (boost == NULL)
 		return;
 
-	i915_gem_request_reference(req);
-	boost->req = req;
+	boost->req = i915_gem_request_get(req);
 
 	INIT_WORK(&boost->work, __intel_rps_boost_work);
 	queue_work(to_i915(dev)->wq, &boost->work);
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 048/190] drm/i915: Disable waitboosting for fence_wait()
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (45 preceding siblings ...)
  2016-01-11  9:16 ` [PATCH 047/190] drm/i915: Rename request reference/unreference to get/put Chris Wilson
@ 2016-01-11  9:16 ` Chris Wilson
  2016-01-11  9:17 ` [PATCH 049/190] drm/i915: Disable waitboosting for mmioflips/semaphores Chris Wilson
                   ` (39 subsequent siblings)
  86 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:16 UTC (permalink / raw)
  To: intel-gfx

We want to restrict waitboosting to known process contexts, where we can
track which clients are receiving waitboosts and prevent excessive power
waste. For fence_wait() we have no client tracking at all, which
leaves it open to abuse.
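
The mechanism, sketched for clarity (maybe_boost is a hypothetical
helper, the other names match this series): the rps argument becomes a
tristate discriminated with the ERR_PTR helpers.

	#define NO_WAITBOOST ERR_PTR(-1)

	/* rps: a real client (track and boost it), NULL (anonymous
	 * kernel boost), or NO_WAITBOOST (never boost)
	 */
	static void maybe_boost(struct intel_rps_client *rps,
				struct drm_i915_gem_request *req)
	{
		if (IS_ERR(rps)) /* NO_WAITBOOST, e.g. fence_wait() */
			return;

		if (INTEL_INFO(req->i915)->gen >= 6)
			gen6_rps_boost(req->i915, rps, req->emitted_jiffies);
	}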

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_request.c | 6 +++---
 drivers/gpu/drm/i915/i915_gem_request.h | 1 +
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index a796dbd1b0e4..01893d847dfd 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -68,7 +68,7 @@ static signed long i915_fence_wait(struct fence *fence,
 
 	ret = __i915_wait_request(to_i915_request(fence),
 				  interruptible, timeout,
-				  NULL);
+				  NO_WAITBOOST);
 	if (ret == -ETIME)
 		return 0;
 
@@ -621,7 +621,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 	 * forcing the clocks too high for the whole system, we only allow
 	 * each client to waitboost once in a busy period.
 	 */
-	if (INTEL_INFO(req->i915)->gen >= 6)
+	if (!IS_ERR(rps) && INTEL_INFO(req->i915)->gen >= 6)
 		gen6_rps_boost(req->i915, rps, req->emitted_jiffies);
 
 	intel_wait_init(&wait, req->fence.seqno);
@@ -691,7 +691,7 @@ complete:
 			*timeout = 0;
 	}
 
-	if (ret == 0 && rps &&
+	if (ret == 0 && !IS_ERR_OR_NULL(rps) &&
 	    req->fence.seqno == req->ring->last_submitted_seqno) {
 		/* The GPU is now idle and this client has stalled.
 		 * Since no other client has submitted a request in the
diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index 0ab14fd0fce0..6b3de827929a 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -179,6 +179,7 @@ void __i915_add_request(struct drm_i915_gem_request *req,
 	__i915_add_request(req, NULL, false)
 
 struct intel_rps_client;
+#define NO_WAITBOOST ERR_PTR(-1)
 
 int __i915_wait_request(struct drm_i915_gem_request *req,
 			bool interruptible,
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 049/190] drm/i915: Disable waitboosting for mmioflips/semaphores
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (46 preceding siblings ...)
  2016-01-11  9:16 ` [PATCH 048/190] drm/i915: Disable waitboosting for fence_wait() Chris Wilson
@ 2016-01-11  9:17 ` Chris Wilson
  2016-01-11  9:17 ` [PATCH 050/190] drm/i915: Refactor duplicate object vmap functions Chris Wilson
                   ` (38 subsequent siblings)
  86 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:17 UTC (permalink / raw)
  To: intel-gfx

Since

commit a6f766f3975185af66a31a2cea2cd38721645999
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Apr 27 13:41:20 2015 +0100

    drm/i915: Limit ring synchronisation (sw sempahores) RPS boosts

and

commit bcafc4e38b6ad03f48989b7ecaff03845b5b7acf
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Apr 27 13:41:21 2015 +0100

    drm/i915: Limit mmio flip RPS boosts

we have limited the waitboosting for semaphores and flips. Ideally we do
not want to boost in either of these instances as no consumer is waiting
upon the results. With the introduction of NO_WAITBOOST in the previous
patch, we can finally disable these needless boosts.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c  | 8 +-------
 drivers/gpu/drm/i915/i915_drv.h      | 2 --
 drivers/gpu/drm/i915/i915_gem.c      | 2 +-
 drivers/gpu/drm/i915/intel_display.c | 2 +-
 drivers/gpu/drm/i915/intel_pm.c      | 2 --
 5 files changed, 3 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index b82482573a8f..5335072f2047 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2398,13 +2398,7 @@ static int i915_rps_boost_info(struct seq_file *m, void *data)
 			   list_empty(&file_priv->rps.link) ? "" : ", active");
 		rcu_read_unlock();
 	}
-	seq_printf(m, "Semaphore boosts: %d%s\n",
-		   dev_priv->rps.semaphores.boosts,
-		   list_empty(&dev_priv->rps.semaphores.link) ? "" : ", active");
-	seq_printf(m, "MMIO flip boosts: %d%s\n",
-		   dev_priv->rps.mmioflips.boosts,
-		   list_empty(&dev_priv->rps.mmioflips.link) ? "" : ", active");
-	seq_printf(m, "Kernel boosts: %d\n", dev_priv->rps.boosts);
+	seq_printf(m, "Kernel (anonymous) boosts: %d\n", dev_priv->rps.boosts);
 	spin_unlock(&dev_priv->rps.client_lock);
 
 	return 0;
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index ee146ce02412..49a151126b2a 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1136,8 +1136,6 @@ struct intel_gen6_power_mgmt {
 	struct delayed_work delayed_resume_work;
 	unsigned boosts;
 
-	struct intel_rps_client semaphores, mmioflips;
-
 	/* manual wa residency calculations */
 	struct intel_rps_ei up_ei, down_ei;
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index fd61e722b595..9df00e694cd9 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2533,7 +2533,7 @@ __i915_gem_object_sync(struct drm_i915_gem_object *obj,
 		ret = __i915_wait_request(from_req,
 					  i915->mm.interruptible,
 					  NULL,
-					  &i915->rps.semaphores);
+					  NO_WAITBOOST);
 		if (ret)
 			return ret;
 
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index ae247927e931..e2822530af25 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -11430,7 +11430,7 @@ static void intel_mmio_flip_work_func(struct work_struct *work)
 	if (mmio_flip->req) {
 		WARN_ON(__i915_wait_request(mmio_flip->req,
 					    false, NULL,
-					    &mmio_flip->i915->rps.mmioflips));
+					    NO_WAITBOOST));
 		i915_gem_request_put(mmio_flip->req);
 	}
 
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 39b7ca9c3e66..b340f2a1f110 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -7324,8 +7324,6 @@ void intel_pm_setup(struct drm_device *dev)
 	INIT_DELAYED_WORK(&dev_priv->rps.delayed_resume_work,
 			  intel_gen6_powersave_work);
 	INIT_LIST_HEAD(&dev_priv->rps.clients);
-	INIT_LIST_HEAD(&dev_priv->rps.semaphores.link);
-	INIT_LIST_HEAD(&dev_priv->rps.mmioflips.link);
 
 	dev_priv->pm.suspended = false;
 	atomic_set(&dev_priv->pm.wakeref_count, 0);
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 050/190] drm/i915: Refactor duplicate object vmap functions
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (47 preceding siblings ...)
  2016-01-11  9:17 ` [PATCH 049/190] drm/i915: Disable waitboosting for mmioflips/semaphores Chris Wilson
@ 2016-01-11  9:17 ` Chris Wilson
  2016-01-11  9:17 ` [PATCH 051/190] drm,i915: Introduce drm_malloc_gfp() Chris Wilson
                   ` (37 subsequent siblings)
  86 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:17 UTC (permalink / raw)
  To: intel-gfx

We now have two implementations for vmapping a whole object, one for
dma-buf and one for the ringbuffer. If we couple the vmapping into the
obj->pages lifetime, then we can share a single obj->vmapping between
both users and at the same time couple it into the shrinker.

v2: Mark the failable kmalloc() as __GFP_NOWARN (vsyrjala)
v3: Call unpin_vmap from the right dmabuf unmapper
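
Usage sketch (illustrative, not part of the diff): both the dma-buf and
ringbuffer paths collapse to the same pin/map, use, unpin pattern, with
the cached obj->vmapping surviving until the pages are released:

	/* takes a pages pin so the shrinker cannot reap the object
	 * while it is mapped; call under struct_mutex
	 */
	void *vaddr = i915_gem_object_pin_vmap(obj);
	if (IS_ERR(vaddr))
		return PTR_ERR(vaddr);

	memcpy(vaddr, data, len); /* CPU access through the vmapping */

	/* drops the pages pin only; the vmap itself stays cached on
	 * the object until i915_gem_object_put_pages()
	 */
	i915_gem_object_unpin_vmap(obj);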

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h         | 12 +++++---
 drivers/gpu/drm/i915/i915_gem.c         | 41 +++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_gem_dmabuf.c  | 53 ++++-----------------------------
 drivers/gpu/drm/i915/intel_ringbuffer.c | 53 ++++++++++-----------------------
 4 files changed, 71 insertions(+), 88 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 49a151126b2a..56cf2ffc1eac 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2114,10 +2114,7 @@ struct drm_i915_gem_object {
 		struct scatterlist *sg;
 		int last;
 	} get_page;
-
-	/* prime dma-buf support */
-	void *dma_buf_vmapping;
-	int vmapping_count;
+	void *vmapping;
 
 	/** Breadcrumb of last rendering to the buffer.
 	 * There can only be one writer, but we allow for multiple readers.
@@ -2774,12 +2771,19 @@ static inline void i915_gem_object_pin_pages(struct drm_i915_gem_object *obj)
 	BUG_ON(obj->pages == NULL);
 	obj->pages_pin_count++;
 }
+
 static inline void i915_gem_object_unpin_pages(struct drm_i915_gem_object *obj)
 {
 	BUG_ON(obj->pages_pin_count == 0);
 	obj->pages_pin_count--;
 }
 
+void *__must_check i915_gem_object_pin_vmap(struct drm_i915_gem_object *obj);
+static inline void i915_gem_object_unpin_vmap(struct drm_i915_gem_object *obj)
+{
+	i915_gem_object_unpin_pages(obj);
+}
+
 int __must_check i915_mutex_lock_interruptible(struct drm_device *dev);
 int i915_gem_object_sync(struct drm_i915_gem_object *obj,
 			 struct intel_engine_cs *to,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 9df00e694cd9..2912e8714f5b 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1854,6 +1854,11 @@ i915_gem_object_put_pages(struct drm_i915_gem_object *obj)
 	ops->put_pages(obj);
 	obj->pages = NULL;
 
+	if (obj->vmapping) {
+		vunmap(obj->vmapping);
+		obj->vmapping = NULL;
+	}
+
 	i915_gem_object_invalidate(obj);
 
 	return 0;
@@ -2019,6 +2024,42 @@ i915_gem_object_get_pages(struct drm_i915_gem_object *obj)
 	return 0;
 }
 
+void *i915_gem_object_pin_vmap(struct drm_i915_gem_object *obj)
+{
+	int ret;
+
+	ret = i915_gem_object_get_pages(obj);
+	if (ret)
+		return ERR_PTR(ret);
+
+	i915_gem_object_pin_pages(obj);
+
+	if (obj->vmapping == NULL) {
+		struct sg_page_iter sg_iter;
+		struct page **pages;
+		int n;
+
+		n = obj->base.size >> PAGE_SHIFT;
+		pages = kmalloc(n*sizeof(*pages), GFP_TEMPORARY | __GFP_NOWARN);
+		if (pages == NULL)
+			pages = drm_malloc_ab(n, sizeof(*pages));
+		if (pages != NULL) {
+			n = 0;
+			for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, 0)
+				pages[n++] = sg_page_iter_page(&sg_iter);
+
+			obj->vmapping = vmap(pages, n, 0, PAGE_KERNEL);
+			drm_free_large(pages);
+		}
+		if (obj->vmapping == NULL) {
+			i915_gem_object_unpin_pages(obj);
+			return ERR_PTR(-ENOMEM);
+		}
+	}
+
+	return obj->vmapping;
+}
+
 void i915_vma_move_to_active(struct i915_vma *vma,
 			     struct drm_i915_gem_request *req)
 {
diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
index e9c2bfd85b52..8894648acee0 100644
--- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
@@ -95,14 +95,12 @@ static void i915_gem_unmap_dma_buf(struct dma_buf_attachment *attachment,
 {
 	struct drm_i915_gem_object *obj = dma_buf_to_obj(attachment->dmabuf);
 
-	mutex_lock(&obj->base.dev->struct_mutex);
-
 	dma_unmap_sg(attachment->dev, sg->sgl, sg->nents, dir);
 	sg_free_table(sg);
 	kfree(sg);
 
+	mutex_lock(&obj->base.dev->struct_mutex);
 	i915_gem_object_unpin_pages(obj);
-
 	mutex_unlock(&obj->base.dev->struct_mutex);
 }
 
@@ -110,51 +108,17 @@ static void *i915_gem_dmabuf_vmap(struct dma_buf *dma_buf)
 {
 	struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf);
 	struct drm_device *dev = obj->base.dev;
-	struct sg_page_iter sg_iter;
-	struct page **pages;
-	int ret, i;
+	void *addr;
+	int ret;
 
 	ret = i915_mutex_lock_interruptible(dev);
 	if (ret)
 		return ERR_PTR(ret);
 
-	if (obj->dma_buf_vmapping) {
-		obj->vmapping_count++;
-		goto out_unlock;
-	}
-
-	ret = i915_gem_object_get_pages(obj);
-	if (ret)
-		goto err;
-
-	i915_gem_object_pin_pages(obj);
-
-	ret = -ENOMEM;
-
-	pages = drm_malloc_ab(obj->base.size >> PAGE_SHIFT, sizeof(*pages));
-	if (pages == NULL)
-		goto err_unpin;
-
-	i = 0;
-	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, 0)
-		pages[i++] = sg_page_iter_page(&sg_iter);
-
-	obj->dma_buf_vmapping = vmap(pages, i, 0, PAGE_KERNEL);
-	drm_free_large(pages);
-
-	if (!obj->dma_buf_vmapping)
-		goto err_unpin;
-
-	obj->vmapping_count = 1;
-out_unlock:
+	addr = i915_gem_object_pin_vmap(obj);
 	mutex_unlock(&dev->struct_mutex);
-	return obj->dma_buf_vmapping;
 
-err_unpin:
-	i915_gem_object_unpin_pages(obj);
-err:
-	mutex_unlock(&dev->struct_mutex);
-	return ERR_PTR(ret);
+	return addr;
 }
 
 static void i915_gem_dmabuf_vunmap(struct dma_buf *dma_buf, void *vaddr)
@@ -163,12 +127,7 @@ static void i915_gem_dmabuf_vunmap(struct dma_buf *dma_buf, void *vaddr)
 	struct drm_device *dev = obj->base.dev;
 
 	mutex_lock(&dev->struct_mutex);
-	if (--obj->vmapping_count == 0) {
-		vunmap(obj->dma_buf_vmapping);
-		obj->dma_buf_vmapping = NULL;
-
-		i915_gem_object_unpin_pages(obj);
-	}
+	i915_gem_object_unpin_vmap(obj);
 	mutex_unlock(&dev->struct_mutex);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index e8a7a1045c06..2728c0ca0871 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1852,34 +1852,12 @@ static int init_phys_status_page(struct intel_engine_cs *ring)
 void intel_unpin_ringbuffer_obj(struct intel_ringbuffer *ringbuf)
 {
 	if (HAS_LLC(ringbuf->obj->base.dev) && !ringbuf->obj->stolen)
-		vunmap(ringbuf->virtual_start);
+		i915_gem_object_unpin_vmap(ringbuf->obj);
 	else
 		iounmap(ringbuf->virtual_start);
-	ringbuf->virtual_start = NULL;
 	i915_gem_object_ggtt_unpin(ringbuf->obj);
 }
 
-static u32 *vmap_obj(struct drm_i915_gem_object *obj)
-{
-	struct sg_page_iter sg_iter;
-	struct page **pages;
-	void *addr;
-	int i;
-
-	pages = drm_malloc_ab(obj->base.size >> PAGE_SHIFT, sizeof(*pages));
-	if (pages == NULL)
-		return NULL;
-
-	i = 0;
-	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, 0)
-		pages[i++] = sg_page_iter_page(&sg_iter);
-
-	addr = vmap(pages, i, 0, PAGE_KERNEL);
-	drm_free_large(pages);
-
-	return addr;
-}
-
 int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
 				     struct intel_ringbuffer *ringbuf)
 {
@@ -1893,15 +1871,14 @@ int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
 			return ret;
 
 		ret = i915_gem_object_set_to_cpu_domain(obj, true);
-		if (ret) {
-			i915_gem_object_ggtt_unpin(obj);
-			return ret;
-		}
+		if (ret)
+			goto unpin;
 
-		ringbuf->virtual_start = vmap_obj(obj);
-		if (ringbuf->virtual_start == NULL) {
-			i915_gem_object_ggtt_unpin(obj);
-			return -ENOMEM;
+		ringbuf->virtual_start = i915_gem_object_pin_vmap(obj);
+		if (IS_ERR(ringbuf->virtual_start)) {
+			ret = PTR_ERR(ringbuf->virtual_start);
+			ringbuf->virtual_start = NULL;
+			goto unpin;
 		}
 	} else {
 		ret = i915_gem_obj_ggtt_pin(obj, PAGE_SIZE, PIN_MAPPABLE);
@@ -1909,20 +1886,22 @@ int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
 			return ret;
 
 		ret = i915_gem_object_set_to_gtt_domain(obj, true);
-		if (ret) {
-			i915_gem_object_ggtt_unpin(obj);
-			return ret;
-		}
+		if (ret)
+			goto unpin;
 
 		ringbuf->virtual_start = ioremap_wc(dev_priv->gtt.mappable_base +
 						    i915_gem_obj_ggtt_offset(obj), ringbuf->size);
 		if (ringbuf->virtual_start == NULL) {
-			i915_gem_object_ggtt_unpin(obj);
-			return -EINVAL;
+			ret = -ENOMEM;
+			goto unpin;
 		}
 	}
 
 	return 0;
+
+unpin:
+	i915_gem_object_ggtt_unpin(obj);
+	return ret;
 }
 
 static void intel_destroy_ringbuffer_obj(struct intel_ringbuffer *ringbuf)
-- 
2.7.0.rc3


* [PATCH 051/190] drm,i915: Introduce drm_malloc_gfp()
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (48 preceding siblings ...)
  2016-01-11  9:17 ` [PATCH 050/190] drm/i915: Refactor duplicate object vmap functions Chris Wilson
@ 2016-01-11  9:17 ` Chris Wilson
  2016-01-11  9:17 ` [PATCH 052/190] drm/i915: Treat ringbuffer writes as writes to normal memory Chris Wilson
                   ` (36 subsequent siblings)
  86 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:17 UTC (permalink / raw)
  To: intel-gfx; +Cc: dri-devel

I have instances where I want to use drm_malloc_ab() but with a custom
gfp mask. And for those where I want a temporary allocation, I want to
try a high-order kmalloc() before falling back to vmalloc().

So refactor my usage into drm_malloc_gfp().
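
A minimal usage sketch (editorial illustration; obj and the error path
are hypothetical, but the call pattern mirrors the i915_gem.c hunk
below):

	/* Temporary array of page pointers. drm_malloc_gfp() first tries
	 * a high-order kmalloc() (GFP_TEMPORARY carries __GFP_RECLAIMABLE,
	 * so the attempt runs with __GFP_NOWARN | __GFP_NORETRY and fails
	 * fast) and then falls back to __vmalloc() for larger requests.
	 */
	struct page **pages;
	int n = obj->base.size >> PAGE_SHIFT;

	pages = drm_malloc_gfp(n, sizeof(*pages), GFP_TEMPORARY);
	if (pages == NULL)
		return -ENOMEM;

	/* ... fill and use pages[0..n-1] ... */

	drm_free_large(pages); /* kvfree() handles both allocation paths */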

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: dri-devel@lists.freedesktop.org
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Acked-by: Dave Airlie <airlied@redhat.com>
---
 drivers/gpu/drm/i915/i915_gem.c            |  4 +---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  8 +++-----
 drivers/gpu/drm/i915/i915_gem_gtt.c        |  5 +++--
 drivers/gpu/drm/i915/i915_gem_userptr.c    | 15 ++++-----------
 include/drm/drm_mem_util.h                 | 19 +++++++++++++++++++
 5 files changed, 30 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 2912e8714f5b..a4f9c5bbb883 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2040,9 +2040,7 @@ void *i915_gem_object_pin_vmap(struct drm_i915_gem_object *obj)
 		int n;
 
 		n = obj->base.size >> PAGE_SHIFT;
-		pages = kmalloc(n*sizeof(*pages), GFP_TEMPORARY | __GFP_NOWARN);
-		if (pages == NULL)
-			pages = drm_malloc_ab(n, sizeof(*pages));
+		pages = drm_malloc_gfp(n, sizeof(*pages), GFP_TEMPORARY);
 		if (pages != NULL) {
 			n = 0;
 			for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, 0)
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index da1c6fe5b40e..dfabeee2ff0b 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1766,11 +1766,9 @@ i915_gem_execbuffer2(struct drm_device *dev, void *data,
 		return -EINVAL;
 	}
 
-	exec2_list = kmalloc(sizeof(*exec2_list)*args->buffer_count,
-			     GFP_TEMPORARY | __GFP_NOWARN | __GFP_NORETRY);
-	if (exec2_list == NULL)
-		exec2_list = drm_malloc_ab(sizeof(*exec2_list),
-					   args->buffer_count);
+	exec2_list = drm_malloc_gfp(sizeof(*exec2_list),
+				    args->buffer_count,
+				    GFP_TEMPORARY);
 	if (exec2_list == NULL) {
 		DRM_DEBUG("Failed to allocate exec list for %d buffers\n",
 			  args->buffer_count);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 56f4f2e58d53..224fe89baca3 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -3376,8 +3376,9 @@ intel_rotate_fb_obj_pages(struct i915_ggtt_view *ggtt_view,
 	int ret = -ENOMEM;
 
 	/* Allocate a temporary list of source pages for random access. */
-	page_addr_list = drm_malloc_ab(obj->base.size / PAGE_SIZE,
-				       sizeof(dma_addr_t));
+	page_addr_list = drm_malloc_gfp(obj->base.size / PAGE_SIZE,
+					sizeof(dma_addr_t),
+					GFP_TEMPORARY);
 	if (!page_addr_list)
 		return ERR_PTR(ret);
 
diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
index 1a5f89dba4af..251e81c4b0ea 100644
--- a/drivers/gpu/drm/i915/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
@@ -573,10 +573,7 @@ __i915_gem_userptr_get_pages_worker(struct work_struct *_work)
 	ret = -ENOMEM;
 	pinned = 0;
 
-	pvec = kmalloc(npages*sizeof(struct page *),
-		       GFP_TEMPORARY | __GFP_NOWARN | __GFP_NORETRY);
-	if (pvec == NULL)
-		pvec = drm_malloc_ab(npages, sizeof(struct page *));
+	pvec = drm_malloc_gfp(npages, sizeof(struct page *), GFP_TEMPORARY);
 	if (pvec != NULL) {
 		struct mm_struct *mm = obj->userptr.mm->mm;
 
@@ -713,14 +710,10 @@ i915_gem_userptr_get_pages(struct drm_i915_gem_object *obj)
 	pvec = NULL;
 	pinned = 0;
 	if (obj->userptr.mm->mm == current->mm) {
-		pvec = kmalloc(num_pages*sizeof(struct page *),
-			       GFP_TEMPORARY | __GFP_NOWARN | __GFP_NORETRY);
+		pvec = drm_malloc_gfp(num_pages, sizeof(struct page *), GFP_TEMPORARY);
 		if (pvec == NULL) {
-			pvec = drm_malloc_ab(num_pages, sizeof(struct page *));
-			if (pvec == NULL) {
-				__i915_gem_userptr_set_active(obj, false);
-				return -ENOMEM;
-			}
+			__i915_gem_userptr_set_active(obj, false);
+			return -ENOMEM;
 		}
 
 		pinned = __get_user_pages_fast(obj->userptr.ptr, num_pages,
diff --git a/include/drm/drm_mem_util.h b/include/drm/drm_mem_util.h
index e42495ad8136..741ce75a72b4 100644
--- a/include/drm/drm_mem_util.h
+++ b/include/drm/drm_mem_util.h
@@ -54,6 +54,25 @@ static __inline__ void *drm_malloc_ab(size_t nmemb, size_t size)
 			 GFP_KERNEL | __GFP_HIGHMEM, PAGE_KERNEL);
 }
 
+static __inline__ void *drm_malloc_gfp(size_t nmemb, size_t size, gfp_t gfp)
+{
+	if (size != 0 && nmemb > SIZE_MAX / size)
+		return NULL;
+
+	if (size * nmemb <= PAGE_SIZE)
+		return kmalloc(nmemb * size, gfp);
+
+	if (gfp & __GFP_RECLAIMABLE) {
+		void *ptr = kmalloc(nmemb * size,
+				    gfp | __GFP_NOWARN | __GFP_NORETRY);
+		if (ptr)
+			return ptr;
+	}
+
+	return __vmalloc(size * nmemb,
+			 gfp | __GFP_HIGHMEM, PAGE_KERNEL);
+}
+
 static __inline void drm_free_large(void *ptr)
 {
 	kvfree(ptr);
-- 
2.7.0.rc3


* [PATCH 052/190] drm/i915: Treat ringbuffer writes as writes to normal memory
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (49 preceding siblings ...)
  2016-01-11  9:17 ` [PATCH 051/190] drm,i915: Introduce drm_malloc_gfp() Chris Wilson
@ 2016-01-11  9:17 ` Chris Wilson
  2016-01-11  9:17 ` [PATCH 053/190] drm/i915: Convert i915_semaphore_is_enabled over to early sanitize Chris Wilson
                   ` (35 subsequent siblings)
  86 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:17 UTC (permalink / raw)
  To: intel-gfx

Ringbuffers are now written either through the LLC or through WC paths,
so treating them as simply iomem is no longer adequate. However, on the
older !llc hardware the TAIL register update is documented as
serialising, so we can relax the barriers when filling the rings (but
even if it were not, it is still an uncached register write and so
serialising anyway).

For simplicity, let's ignore the iomem annotation.

v2: Remove iomem from ringbuffer->virtual_start
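
An editorial sketch of the resulting emit sequence (assuming the
helpers introduced below; req and ring are stand-ins for a real
request/engine pair):

	/* Plain CPU stores into the vmap'ed ring, no iowrite32(). On
	 * !llc parts the later uncached TAIL register write that submits
	 * the ring serialises these stores anyway.
	 */
	ret = intel_ring_begin(req, 2);
	if (ret)
		return ret;

	intel_ring_emit(ring, MI_NOOP);
	intel_ring_emit(ring, MI_NOOP);
	intel_ring_advance(ring);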

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c        |  7 +------
 drivers/gpu/drm/i915/intel_lrc.h        |  6 +++---
 drivers/gpu/drm/i915/intel_ringbuffer.c |  7 +------
 drivers/gpu/drm/i915/intel_ringbuffer.h | 19 +++++++++++++------
 4 files changed, 18 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 433e9f60e926..527eaf59be25 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -766,13 +766,8 @@ intel_logical_ring_advance_and_submit(struct drm_i915_gem_request *request)
 
 static void __wrap_ring_buffer(struct intel_ringbuffer *ringbuf)
 {
-	uint32_t __iomem *virt;
 	int rem = ringbuf->size - ringbuf->tail;
-
-	virt = ringbuf->virtual_start + ringbuf->tail;
-	rem /= 4;
-	while (rem--)
-		iowrite32(MI_NOOP, virt++);
+	memset(ringbuf->virtual_start + ringbuf->tail, 0, rem);
 
 	ringbuf->tail = 0;
 	intel_ring_update_space(ringbuf);
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index de41ad6cd63d..1e58f2550777 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -71,8 +71,9 @@ int logical_ring_flush_all_caches(struct drm_i915_gem_request *req);
  */
 static inline void intel_logical_ring_advance(struct intel_ringbuffer *ringbuf)
 {
-	ringbuf->tail &= ringbuf->size - 1;
+	intel_ringbuffer_advance(ringbuf);
 }
+
 /**
  * intel_logical_ring_emit() - write a DWORD to the ringbuffer.
  * @ringbuf: Ringbuffer to write to.
@@ -81,8 +82,7 @@ static inline void intel_logical_ring_advance(struct intel_ringbuffer *ringbuf)
 static inline void intel_logical_ring_emit(struct intel_ringbuffer *ringbuf,
 					   u32 data)
 {
-	iowrite32(data, ringbuf->virtual_start + ringbuf->tail);
-	ringbuf->tail += 4;
+	intel_ringbuffer_emit(ringbuf, data);
 }
 static inline void intel_logical_ring_emit_reg(struct intel_ringbuffer *ringbuf,
 					       i915_reg_t reg)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 2728c0ca0871..02b7032e16e0 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2099,13 +2099,8 @@ static int ring_wait_for_space(struct intel_engine_cs *ring, int n)
 
 static void __wrap_ring_buffer(struct intel_ringbuffer *ringbuf)
 {
-	uint32_t __iomem *virt;
 	int rem = ringbuf->size - ringbuf->tail;
-
-	virt = ringbuf->virtual_start + ringbuf->tail;
-	rem /= 4;
-	while (rem--)
-		iowrite32(MI_NOOP, virt++);
+	memset(ringbuf->virtual_start + ringbuf->tail, 0, rem);
 
 	ringbuf->tail = 0;
 	intel_ring_update_space(ringbuf);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index a1fcb6c7501f..7669a8d30f27 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -99,7 +99,7 @@ struct intel_ring_hangcheck {
 
 struct intel_ringbuffer {
 	struct drm_i915_gem_object *obj;
-	void __iomem *virtual_start;
+	void *virtual_start;
 
 	struct intel_engine_cs *ring;
 	struct list_head link;
@@ -468,12 +468,20 @@ int intel_ring_alloc_request_extras(struct drm_i915_gem_request *request);
 
 int __must_check intel_ring_begin(struct drm_i915_gem_request *req, int n);
 int __must_check intel_ring_cacheline_align(struct drm_i915_gem_request *req);
+static inline void intel_ringbuffer_emit(struct intel_ringbuffer *rb,
+					 u32 data)
+{
+	*(uint32_t *)(rb->virtual_start + rb->tail) = data;
+	rb->tail += 4;
+}
+static inline void intel_ringbuffer_advance(struct intel_ringbuffer *rb)
+{
+	rb->tail &= rb->size - 1;
+}
 static inline void intel_ring_emit(struct intel_engine_cs *ring,
 				   u32 data)
 {
-	struct intel_ringbuffer *ringbuf = ring->buffer;
-	iowrite32(data, ringbuf->virtual_start + ringbuf->tail);
-	ringbuf->tail += 4;
+	intel_ringbuffer_emit(ring->buffer, data);
 }
 static inline void intel_ring_emit_reg(struct intel_engine_cs *ring,
 				       i915_reg_t reg)
@@ -482,8 +490,7 @@ static inline void intel_ring_emit_reg(struct intel_engine_cs *ring,
 }
 static inline void intel_ring_advance(struct intel_engine_cs *ring)
 {
-	struct intel_ringbuffer *ringbuf = ring->buffer;
-	ringbuf->tail &= ringbuf->size - 1;
+	intel_ringbuffer_advance(ring->buffer);
 }
 int __intel_ring_space(int head, int tail, int size);
 void intel_ring_update_space(struct intel_ringbuffer *ringbuf);
-- 
2.7.0.rc3


* [PATCH 053/190] drm/i915: Convert i915_semaphore_is_enabled over to early sanitize
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (50 preceding siblings ...)
  2016-01-11  9:17 ` [PATCH 052/190] drm/i915: Treat ringbuffer writes as writes to normal memory Chris Wilson
@ 2016-01-11  9:17 ` Chris Wilson
  2016-01-12 19:07   ` Dave Gordon
  2016-01-11  9:17 ` [PATCH 054/190] drm/i915: Use the new rq->i915 field where appropriate Chris Wilson
                   ` (34 subsequent siblings)
  86 siblings, 1 reply; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:17 UTC (permalink / raw)
  To: intel-gfx

Rather than recomputing whether semaphores are enabled, we can do that
computation once during early initialisation as the i915.semaphores
module parameter is now read-only.
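
Sketch of the resulting pattern (editorial illustration, lifted in
shape from the hunks below): the tristate parameter (-1 selects auto)
is resolved to a boolean once in i915_gem_init(), and every later
check reads i915.semaphores directly:

	i915.semaphores =
		i915_gem_sanitize_semaphore(dev_priv, i915.semaphores);

	/* ... later, instead of calling i915_semaphore_is_enabled(dev): */
	if (i915.semaphores)
		ring->semaphore.signal = gen6_signal;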

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c     |  2 +-
 drivers/gpu/drm/i915/i915_dma.c         |  2 +-
 drivers/gpu/drm/i915/i915_drv.c         | 25 -----------------------
 drivers/gpu/drm/i915/i915_drv.h         |  1 -
 drivers/gpu/drm/i915/i915_gem.c         | 35 ++++++++++++++++++++++++++++++---
 drivers/gpu/drm/i915/i915_gem_context.c |  2 +-
 drivers/gpu/drm/i915/i915_gpu_error.c   |  2 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c | 20 +++++++++----------
 8 files changed, 46 insertions(+), 43 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 5335072f2047..387ae77d3c29 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -3146,7 +3146,7 @@ static int i915_semaphore_status(struct seq_file *m, void *unused)
 	int num_rings = hweight32(INTEL_INFO(dev)->ring_mask);
 	int i, j, ret;
 
-	if (!i915_semaphore_is_enabled(dev)) {
+	if (!i915.semaphores) {
 		seq_puts(m, "Semaphores are disabled\n");
 		return 0;
 	}
diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 9e49e304dd8e..4c72c83cfa28 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -126,7 +126,7 @@ static int i915_getparam(struct drm_device *dev, void *data,
 		value = 1;
 		break;
 	case I915_PARAM_HAS_SEMAPHORES:
-		value = i915_semaphore_is_enabled(dev);
+		value = i915.semaphores;
 		break;
 	case I915_PARAM_HAS_PRIME_VMAP_FLUSH:
 		value = 1;
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index e9f85fd0542f..cc831a34f7bb 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -515,31 +515,6 @@ void intel_detect_pch(struct drm_device *dev)
 	pci_dev_put(pch);
 }
 
-bool i915_semaphore_is_enabled(struct drm_device *dev)
-{
-	if (INTEL_INFO(dev)->gen < 6)
-		return false;
-
-	if (i915.semaphores >= 0)
-		return i915.semaphores;
-
-	/* TODO: make semaphores and Execlists play nicely together */
-	if (i915.enable_execlists)
-		return false;
-
-	/* Until we get further testing... */
-	if (IS_GEN8(dev))
-		return false;
-
-#ifdef CONFIG_INTEL_IOMMU
-	/* Enable semaphores on SNB when IO remapping is off */
-	if (INTEL_INFO(dev)->gen == 6 && intel_iommu_gfx_mapped)
-		return false;
-#endif
-
-	return true;
-}
-
 static void intel_suspend_encoders(struct drm_i915_private *dev_priv)
 {
 	struct drm_device *dev = dev_priv->dev;
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 56cf2ffc1eac..58e9e5e50769 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3226,7 +3226,6 @@ extern void intel_set_memory_cxsr(struct drm_i915_private *dev_priv,
 extern void intel_detect_pch(struct drm_device *dev);
 extern int intel_enable_rc6(const struct drm_device *dev);
 
-extern bool i915_semaphore_is_enabled(struct drm_device *dev);
 int i915_reg_read_ioctl(struct drm_device *dev, void *data,
 			struct drm_file *file);
 int i915_get_reset_stats_ioctl(struct drm_device *dev, void *data,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index a4f9c5bbb883..31926a4fb42a 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2567,7 +2567,7 @@ __i915_gem_object_sync(struct drm_i915_gem_object *obj,
 	if (i915_gem_request_completed(from_req))
 		return 0;
 
-	if (!i915_semaphore_is_enabled(obj->base.dev)) {
+	if (!i915.semaphores) {
 		struct drm_i915_private *i915 = to_i915(obj->base.dev);
 		ret = __i915_wait_request(from_req,
 					  i915->mm.interruptible,
@@ -4304,13 +4304,42 @@ out:
 	return ret;
 }
 
+static bool i915_gem_sanitize_semaphore(struct drm_i915_private *dev_priv,
+					int param_value)
+{
+	if (INTEL_INFO(dev_priv)->gen < 6)
+		return false;
+
+	if (param_value >= 0)
+		return param_value;
+
+	/* TODO: make semaphores and Execlists play nicely together */
+	if (i915.enable_execlists)
+		return false;
+
+	/* Until we get further testing... */
+	if (IS_GEN8(dev_priv))
+		return false;
+
+#ifdef CONFIG_INTEL_IOMMU
+	/* Enable semaphores on SNB when IO remapping is off */
+	if (INTEL_INFO(dev_priv)->gen == 6 && intel_iommu_gfx_mapped)
+		return false;
+#endif
+
+	return true;
+}
+
 int i915_gem_init(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	int ret;
 
-	i915.enable_execlists = intel_sanitize_enable_execlists(dev,
-			i915.enable_execlists);
+	i915.enable_execlists =
+		intel_sanitize_enable_execlists(dev, i915.enable_execlists);
+
+	i915.semaphores =
+		i915_gem_sanitize_semaphore(dev_priv, i915.semaphores);
 
 	mutex_lock(&dev->struct_mutex);
 
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 0aea5ccf6d68..361be1085a18 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -523,7 +523,7 @@ mi_set_context(struct drm_i915_gem_request *req, u32 hw_flags)
 	u32 flags = hw_flags | MI_MM_SPACE_GTT;
 	const int num_rings =
 		/* Use an extended w/a on ivb+ if signalling from other rings */
-		i915_semaphore_is_enabled(ring->dev) ?
+		i915.semaphores ?
 		hweight32(INTEL_INFO(ring->dev)->ring_mask) - 1 :
 		0;
 	int len, i, ret;
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 05f054898a95..84ce91275fdd 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -823,7 +823,7 @@ static void gen8_record_semaphore_state(struct drm_i915_private *dev_priv,
 	struct intel_engine_cs *to;
 	int i;
 
-	if (!i915_semaphore_is_enabled(dev_priv->dev))
+	if (!i915.semaphores)
 		return;
 
 	if (!error->semaphore_obj)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 02b7032e16e0..e143da96dcfa 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2510,7 +2510,7 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 	ring->mmio_base = RENDER_RING_BASE;
 
 	if (INTEL_INFO(dev)->gen >= 8) {
-		if (i915_semaphore_is_enabled(dev)) {
+		if (i915.semaphores) {
 			obj = i915_gem_alloc_object(dev, 4096);
 			if (obj == NULL) {
 				DRM_ERROR("Failed to allocate semaphore bo. Disabling semaphores\n");
@@ -2534,7 +2534,7 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 		ring->irq_disable = gen8_ring_disable_irq;
 		ring->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
 		ring->irq_seqno_barrier = gen6_seqno_barrier;
-		if (i915_semaphore_is_enabled(dev)) {
+		if (i915.semaphores) {
 			WARN_ON(!dev_priv->semaphore_obj);
 			ring->semaphore.sync_to = gen8_ring_sync;
 			ring->semaphore.signal = gen8_rcs_signal;
@@ -2550,7 +2550,7 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 		ring->irq_disable = gen6_ring_disable_irq;
 		ring->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
 		ring->irq_seqno_barrier = gen6_seqno_barrier;
-		if (i915_semaphore_is_enabled(dev)) {
+		if (i915.semaphores) {
 			ring->semaphore.sync_to = gen6_ring_sync;
 			ring->semaphore.signal = gen6_signal;
 			/*
@@ -2666,7 +2666,7 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev)
 			ring->irq_disable = gen8_ring_disable_irq;
 			ring->dispatch_execbuffer =
 				gen8_ring_dispatch_execbuffer;
-			if (i915_semaphore_is_enabled(dev)) {
+			if (i915.semaphores) {
 				ring->semaphore.sync_to = gen8_ring_sync;
 				ring->semaphore.signal = gen8_xcs_signal;
 				GEN8_RING_SEMAPHORE_INIT;
@@ -2677,7 +2677,7 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev)
 			ring->irq_disable = gen6_ring_disable_irq;
 			ring->dispatch_execbuffer =
 				gen6_ring_dispatch_execbuffer;
-			if (i915_semaphore_is_enabled(dev)) {
+			if (i915.semaphores) {
 				ring->semaphore.sync_to = gen6_ring_sync;
 				ring->semaphore.signal = gen6_signal;
 				ring->semaphore.mbox.wait[RCS] = MI_SEMAPHORE_SYNC_VR;
@@ -2734,7 +2734,7 @@ int intel_init_bsd2_ring_buffer(struct drm_device *dev)
 	ring->irq_disable = gen8_ring_disable_irq;
 	ring->dispatch_execbuffer =
 			gen8_ring_dispatch_execbuffer;
-	if (i915_semaphore_is_enabled(dev)) {
+	if (i915.semaphores) {
 		ring->semaphore.sync_to = gen8_ring_sync;
 		ring->semaphore.signal = gen8_xcs_signal;
 		GEN8_RING_SEMAPHORE_INIT;
@@ -2763,7 +2763,7 @@ int intel_init_blt_ring_buffer(struct drm_device *dev)
 		ring->irq_enable = gen8_ring_enable_irq;
 		ring->irq_disable = gen8_ring_disable_irq;
 		ring->dispatch_execbuffer = gen8_ring_dispatch_execbuffer;
-		if (i915_semaphore_is_enabled(dev)) {
+		if (i915.semaphores) {
 			ring->semaphore.sync_to = gen8_ring_sync;
 			ring->semaphore.signal = gen8_xcs_signal;
 			GEN8_RING_SEMAPHORE_INIT;
@@ -2773,7 +2773,7 @@ int intel_init_blt_ring_buffer(struct drm_device *dev)
 		ring->irq_enable = gen6_ring_enable_irq;
 		ring->irq_disable = gen6_ring_disable_irq;
 		ring->dispatch_execbuffer = gen6_ring_dispatch_execbuffer;
-		if (i915_semaphore_is_enabled(dev)) {
+		if (i915.semaphores) {
 			ring->semaphore.signal = gen6_signal;
 			ring->semaphore.sync_to = gen6_ring_sync;
 			/*
@@ -2820,7 +2820,7 @@ int intel_init_vebox_ring_buffer(struct drm_device *dev)
 		ring->irq_enable = gen8_ring_enable_irq;
 		ring->irq_disable = gen8_ring_disable_irq;
 		ring->dispatch_execbuffer = gen8_ring_dispatch_execbuffer;
-		if (i915_semaphore_is_enabled(dev)) {
+		if (i915.semaphores) {
 			ring->semaphore.sync_to = gen8_ring_sync;
 			ring->semaphore.signal = gen8_xcs_signal;
 			GEN8_RING_SEMAPHORE_INIT;
@@ -2830,7 +2830,7 @@ int intel_init_vebox_ring_buffer(struct drm_device *dev)
 		ring->irq_enable = hsw_vebox_enable_irq;
 		ring->irq_disable = hsw_vebox_disable_irq;
 		ring->dispatch_execbuffer = gen6_ring_dispatch_execbuffer;
-		if (i915_semaphore_is_enabled(dev)) {
+		if (i915.semaphores) {
 			ring->semaphore.sync_to = gen6_ring_sync;
 			ring->semaphore.signal = gen6_signal;
 			ring->semaphore.mbox.wait[RCS] = MI_SEMAPHORE_SYNC_VER;
-- 
2.7.0.rc3


* [PATCH 054/190] drm/i915: Use the new rq->i915 field where appropriate
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (51 preceding siblings ...)
  2016-01-11  9:17 ` [PATCH 053/190] drm/i915: Convert i915_semaphore_is_enabled over to early sanitize Chris Wilson
@ 2016-01-11  9:17 ` Chris Wilson
  2016-01-11  9:17 ` [PATCH 055/190] drm/i915: Unify intel_logical_ring_emit and intel_ring_emit Chris Wilson
                   ` (33 subsequent siblings)
  86 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:17 UTC (permalink / raw)
  To: intel-gfx

In a few frequent cases, having a direct pointer to the drm_i915_private
from the request is very useful.
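
The shape of the change, as an editorial sketch (the struct is
abbreviated to the relevant field):

	struct drm_i915_gem_request {
		/* Cached at request allocation so that hot paths can use
		 * req->i915 instead of chasing ring->dev->dev_private.
		 */
		struct drm_i915_private *i915;
		/* ... */
	};

	/* Typical call-site conversion: */
	struct drm_i915_private *dev_priv = req->i915;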

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c            |  7 +++---
 drivers/gpu/drm/i915/i915_gem_context.c    | 21 +++++++++---------
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  3 +--
 drivers/gpu/drm/i915/i915_gem_request.c    |  2 +-
 drivers/gpu/drm/i915/intel_lrc.c           |  6 ++----
 drivers/gpu/drm/i915/intel_pm.c            |  3 +--
 drivers/gpu/drm/i915/intel_ringbuffer.c    | 34 ++++++++++++------------------
 7 files changed, 32 insertions(+), 44 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 31926a4fb42a..c2a1ec8abc11 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2568,7 +2568,7 @@ __i915_gem_object_sync(struct drm_i915_gem_object *obj,
 		return 0;
 
 	if (!i915.semaphores) {
-		struct drm_i915_private *i915 = to_i915(obj->base.dev);
+		struct drm_i915_private *i915 = from_req->i915;
 		ret = __i915_wait_request(from_req,
 					  i915->mm.interruptible,
 					  NULL,
@@ -4069,12 +4069,11 @@ err:
 int i915_gem_l3_remap(struct drm_i915_gem_request *req, int slice)
 {
 	struct intel_engine_cs *ring = req->ring;
-	struct drm_device *dev = ring->dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct drm_i915_private *dev_priv = req->i915;
 	u32 *remap_info = dev_priv->l3_parity.remap_info[slice];
 	int i, ret;
 
-	if (!HAS_L3_DPF(dev) || !remap_info)
+	if (!HAS_L3_DPF(dev_priv) || !remap_info)
 		return 0;
 
 	ret = intel_ring_begin(req, GEN7_L3LOG_SIZE / 4 * 3);
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 361be1085a18..3e3b4bf3fed1 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -524,7 +524,7 @@ mi_set_context(struct drm_i915_gem_request *req, u32 hw_flags)
 	const int num_rings =
 		/* Use an extended w/a on ivb+ if signalling from other rings */
 		i915.semaphores ?
-		hweight32(INTEL_INFO(ring->dev)->ring_mask) - 1 :
+		hweight32(INTEL_INFO(req->i915)->ring_mask) - 1 :
 		0;
 	int len, i, ret;
 
@@ -533,21 +533,21 @@ mi_set_context(struct drm_i915_gem_request *req, u32 hw_flags)
 	 * explicitly, so we rely on the value at ring init, stored in
 	 * itlb_before_ctx_switch.
 	 */
-	if (IS_GEN6(ring->dev)) {
+	if (IS_GEN6(req->i915)) {
 		ret = ring->flush(req, I915_GEM_GPU_DOMAINS, 0);
 		if (ret)
 			return ret;
 	}
 
 	/* These flags are for resource streamer on HSW+ */
-	if (IS_HASWELL(ring->dev) || INTEL_INFO(ring->dev)->gen >= 8)
+	if (IS_HASWELL(req->i915) || INTEL_INFO(req->i915)->gen >= 8)
 		flags |= (HSW_MI_RS_SAVE_STATE_EN | HSW_MI_RS_RESTORE_STATE_EN);
-	else if (INTEL_INFO(ring->dev)->gen < 8)
+	else if (INTEL_INFO(req->i915)->gen < 8)
 		flags |= (MI_SAVE_EXT_STATE_EN | MI_RESTORE_EXT_STATE_EN);
 
 
 	len = 4;
-	if (INTEL_INFO(ring->dev)->gen >= 7)
+	if (INTEL_INFO(req->i915)->gen >= 7)
 		len += 2 + (num_rings ? 4*num_rings + 2 : 0);
 
 	ret = intel_ring_begin(req, len);
@@ -555,13 +555,13 @@ mi_set_context(struct drm_i915_gem_request *req, u32 hw_flags)
 		return ret;
 
 	/* WaProgramMiArbOnOffAroundMiSetContext:ivb,vlv,hsw,bdw,chv */
-	if (INTEL_INFO(ring->dev)->gen >= 7) {
+	if (INTEL_INFO(req->i915)->gen >= 7) {
 		intel_ring_emit(ring, MI_ARB_ON_OFF | MI_ARB_DISABLE);
 		if (num_rings) {
 			struct intel_engine_cs *signaller;
 
 			intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(num_rings));
-			for_each_ring(signaller, to_i915(ring->dev), i) {
+			for_each_ring(signaller, req->i915, i) {
 				if (signaller == ring)
 					continue;
 
@@ -581,12 +581,12 @@ mi_set_context(struct drm_i915_gem_request *req, u32 hw_flags)
 	 */
 	intel_ring_emit(ring, MI_NOOP);
 
-	if (INTEL_INFO(ring->dev)->gen >= 7) {
+	if (INTEL_INFO(req->i915)->gen >= 7) {
 		if (num_rings) {
 			struct intel_engine_cs *signaller;
 
 			intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(num_rings));
-			for_each_ring(signaller, to_i915(ring->dev), i) {
+			for_each_ring(signaller, req->i915, i) {
 				if (signaller == ring)
 					continue;
 
@@ -827,10 +827,9 @@ unpin_out:
 int i915_switch_context(struct drm_i915_gem_request *req)
 {
 	struct intel_engine_cs *ring = req->ring;
-	struct drm_i915_private *dev_priv = ring->dev->dev_private;
 
 	WARN_ON(i915.enable_execlists);
-	WARN_ON(!mutex_is_locked(&dev_priv->dev->struct_mutex));
+	WARN_ON(!mutex_is_locked(&req->i915->dev->struct_mutex));
 
 	if (req->ctx->legacy_hw_ctx.rcs_state == NULL) { /* We have the fake context */
 		if (req->ctx != ring->last_context) {
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index dfabeee2ff0b..78b462956c78 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1099,7 +1099,6 @@ void
 i915_gem_execbuffer_move_to_active(struct list_head *vmas,
 				   struct drm_i915_gem_request *req)
 {
-	struct intel_engine_cs *ring = i915_gem_request_get_ring(req);
 	struct i915_vma *vma;
 
 	list_for_each_entry(vma, vmas, exec_list) {
@@ -1126,7 +1125,7 @@ i915_gem_execbuffer_move_to_active(struct list_head *vmas,
 		if (entry->flags & EXEC_OBJECT_NEEDS_FENCE) {
 			i915_gem_request_assign(&obj->last_fenced_req, req);
 			if (entry->flags & __EXEC_OBJECT_HAS_FENCE) {
-				struct drm_i915_private *dev_priv = to_i915(ring->dev);
+				struct drm_i915_private *dev_priv = req->i915;
 				list_move_tail(&dev_priv->fence_regs[obj->fence_reg].lru_list,
 					       &dev_priv->mm.fence_list);
 			}
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 01893d847dfd..619a9b063d9c 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -199,7 +199,7 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 			   struct intel_context *ctx,
 			   struct drm_i915_gem_request **req_out)
 {
-	struct drm_i915_private *dev_priv = to_i915(ring->dev);
+	struct drm_i915_private *dev_priv = ring->i915;
 	unsigned reset_counter = i915_reset_counter(&dev_priv->gpu_error);
 	struct drm_i915_gem_request *req;
 	u32 seqno;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 527eaf59be25..a369aa041522 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -329,8 +329,7 @@ static void execlists_elsp_write(struct drm_i915_gem_request *rq0,
 {
 
 	struct intel_engine_cs *ring = rq0->ring;
-	struct drm_device *dev = ring->dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct drm_i915_private *dev_priv = rq0->i915;
 	uint64_t desc[2];
 
 	if (rq1) {
@@ -1094,8 +1093,7 @@ static int intel_logical_ring_workarounds_emit(struct drm_i915_gem_request *req)
 	int ret, i;
 	struct intel_engine_cs *ring = req->ring;
 	struct intel_ringbuffer *ringbuf = req->ringbuf;
-	struct drm_device *dev = ring->dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct drm_i915_private *dev_priv = req->i915;
 	struct i915_workarounds *w = &dev_priv->workarounds;
 
 	if (w->count == 0)
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index b340f2a1f110..a082b4577599 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -7286,8 +7286,7 @@ static void __intel_rps_boost_work(struct work_struct *work)
 	struct drm_i915_gem_request *req = boost->req;
 
 	if (!i915_gem_request_completed(req))
-		gen6_rps_boost(to_i915(req->ring->dev), NULL,
-			       req->emitted_jiffies);
+		gen6_rps_boost(req->i915, NULL, req->emitted_jiffies);
 
 	i915_gem_request_put(req);
 	kfree(boost);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index e143da96dcfa..d17dd33ee94c 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -99,7 +99,6 @@ gen4_render_ring_flush(struct drm_i915_gem_request *req,
 		       u32	flush_domains)
 {
 	struct intel_engine_cs *ring = req->ring;
-	struct drm_device *dev = ring->dev;
 	u32 cmd;
 	int ret;
 
@@ -138,7 +137,7 @@ gen4_render_ring_flush(struct drm_i915_gem_request *req,
 		cmd |= MI_EXE_FLUSH;
 
 	if (invalidate_domains & I915_GEM_DOMAIN_COMMAND &&
-	    (IS_G4X(dev) || IS_GEN5(dev)))
+	    (IS_G4X(req->i915) || IS_GEN5(req->i915)))
 		cmd |= MI_INVALIDATE_ISP;
 
 	ret = intel_ring_begin(req, 2);
@@ -691,8 +690,7 @@ static int intel_ring_workarounds_emit(struct drm_i915_gem_request *req)
 {
 	int ret, i;
 	struct intel_engine_cs *ring = req->ring;
-	struct drm_device *dev = ring->dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct drm_i915_private *dev_priv = req->i915;
 	struct i915_workarounds *w = &dev_priv->workarounds;
 
 	if (w->count == 0)
@@ -1194,12 +1192,11 @@ static int gen8_rcs_signal(struct drm_i915_gem_request *signaller_req,
 {
 #define MBOX_UPDATE_DWORDS 8
 	struct intel_engine_cs *signaller = signaller_req->ring;
-	struct drm_device *dev = signaller->dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct drm_i915_private *dev_priv = signaller_req->i915;
 	struct intel_engine_cs *waiter;
 	int i, ret, num_rings;
 
-	num_rings = hweight32(INTEL_INFO(dev)->ring_mask);
+	num_rings = hweight32(INTEL_INFO(dev_priv)->ring_mask);
 	num_dwords += (num_rings-1) * MBOX_UPDATE_DWORDS;
 #undef MBOX_UPDATE_DWORDS
 
@@ -1233,12 +1230,11 @@ static int gen8_xcs_signal(struct drm_i915_gem_request *signaller_req,
 {
 #define MBOX_UPDATE_DWORDS 6
 	struct intel_engine_cs *signaller = signaller_req->ring;
-	struct drm_device *dev = signaller->dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct drm_i915_private *dev_priv = signaller_req->i915;
 	struct intel_engine_cs *waiter;
 	int i, ret, num_rings;
 
-	num_rings = hweight32(INTEL_INFO(dev)->ring_mask);
+	num_rings = hweight32(INTEL_INFO(dev_priv)->ring_mask);
 	num_dwords += (num_rings-1) * MBOX_UPDATE_DWORDS;
 #undef MBOX_UPDATE_DWORDS
 
@@ -1269,13 +1265,12 @@ static int gen6_signal(struct drm_i915_gem_request *signaller_req,
 		       unsigned int num_dwords)
 {
 	struct intel_engine_cs *signaller = signaller_req->ring;
-	struct drm_device *dev = signaller->dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct drm_i915_private *dev_priv = signaller_req->i915;
 	struct intel_engine_cs *useless;
 	int i, ret, num_rings;
 
 #define MBOX_UPDATE_DWORDS 3
-	num_rings = hweight32(INTEL_INFO(dev)->ring_mask);
+	num_rings = hweight32(INTEL_INFO(dev_priv)->ring_mask);
 	num_dwords += round_up((num_rings-1) * MBOX_UPDATE_DWORDS, 2);
 #undef MBOX_UPDATE_DWORDS
 
@@ -1352,7 +1347,7 @@ gen8_ring_sync(struct drm_i915_gem_request *waiter_req,
 	       u32 seqno)
 {
 	struct intel_engine_cs *waiter = waiter_req->ring;
-	struct drm_i915_private *dev_priv = waiter->dev->dev_private;
+	struct drm_i915_private *dev_priv = waiter_req->i915;
 	int ret;
 
 	ret = intel_ring_begin(waiter_req, 4);
@@ -2120,7 +2115,7 @@ int intel_ring_idle(struct intel_engine_cs *ring)
 
 	/* Make sure we do not trigger any retires */
 	return __i915_wait_request(req,
-				   to_i915(ring->dev)->mm.interruptible,
+				   req->i915->mm.interruptible,
 				   NULL, NULL);
 }
 
@@ -2248,7 +2243,7 @@ int intel_ring_begin(struct drm_i915_gem_request *req,
 
 	WARN_ON(req == NULL);
 	ring = req->ring;
-	dev_priv = ring->dev->dev_private;
+	dev_priv = req->i915;
 
 	ret = __intel_ring_prepare(ring, num_dwords * sizeof(uint32_t));
 	if (ret)
@@ -2383,7 +2378,7 @@ gen8_ring_dispatch_execbuffer(struct drm_i915_gem_request *req,
 			      unsigned dispatch_flags)
 {
 	struct intel_engine_cs *ring = req->ring;
-	bool ppgtt = USES_PPGTT(ring->dev) &&
+	bool ppgtt = USES_PPGTT(req->i915) &&
 			!(dispatch_flags & I915_DISPATCH_SECURE);
 	int ret;
 
@@ -2457,7 +2452,6 @@ static int gen6_ring_flush(struct drm_i915_gem_request *req,
 			   u32 invalidate, u32 flush)
 {
 	struct intel_engine_cs *ring = req->ring;
-	struct drm_device *dev = ring->dev;
 	uint32_t cmd;
 	int ret;
 
@@ -2466,7 +2460,7 @@ static int gen6_ring_flush(struct drm_i915_gem_request *req,
 		return ret;
 
 	cmd = MI_FLUSH_DW;
-	if (INTEL_INFO(dev)->gen >= 8)
+	if (INTEL_INFO(req->i915)->gen >= 8)
 		cmd += 1;
 
 	/* We always require a command barrier so that subsequent
@@ -2486,7 +2480,7 @@ static int gen6_ring_flush(struct drm_i915_gem_request *req,
 		cmd |= MI_INVALIDATE_TLB;
 	intel_ring_emit(ring, cmd);
 	intel_ring_emit(ring, I915_GEM_HWS_SCRATCH_ADDR | MI_FLUSH_DW_USE_GTT);
-	if (INTEL_INFO(dev)->gen >= 8) {
+	if (INTEL_INFO(req->i915)->gen >= 8) {
 		intel_ring_emit(ring, 0); /* upper addr */
 		intel_ring_emit(ring, 0); /* value */
 	} else  {
-- 
2.7.0.rc3


* [PATCH 055/190] drm/i915: Unify intel_logical_ring_emit and intel_ring_emit
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (52 preceding siblings ...)
  2016-01-11  9:17 ` [PATCH 054/190] drm/i915: Use the new rq->i915 field where appropriate Chris Wilson
@ 2016-01-11  9:17 ` Chris Wilson
  2016-01-12 17:29   ` Dave Gordon
  2016-01-11  9:17 ` [PATCH 056/190] drm/i915: Unify intel_ring_begin() Chris Wilson
                   ` (32 subsequent siblings)
  86 siblings, 1 reply; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:17 UTC (permalink / raw)
  To: intel-gfx

Both perform the same actions with more or less indirection, so just
unify the code.
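
After unification, a call site looks like this (editorial sketch,
following the execlists submission hunk below):

	struct intel_ringbuffer *ring = req->ringbuf;

	/* One emit API for both legacy and execlists submission. */
	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
	intel_ring_emit_reg(ring, INSTPM);
	intel_ring_emit(ring, instp_mask << 16 | instp_mode);
	intel_ring_advance(ring);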

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c            |   2 +-
 drivers/gpu/drm/i915/i915_gem_context.c    |   8 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  34 ++++-----
 drivers/gpu/drm/i915/i915_gem_gtt.c        |  26 +++----
 drivers/gpu/drm/i915/intel_display.c       |  26 +++----
 drivers/gpu/drm/i915/intel_lrc.c           | 114 ++++++++++++++---------------
 drivers/gpu/drm/i915/intel_lrc.h           |  26 -------
 drivers/gpu/drm/i915/intel_mocs.c          |  30 ++++----
 drivers/gpu/drm/i915/intel_overlay.c       |  42 +++++------
 drivers/gpu/drm/i915/intel_ringbuffer.c    | 101 ++++++++++++-------------
 drivers/gpu/drm/i915/intel_ringbuffer.h    |  21 ++----
 11 files changed, 194 insertions(+), 236 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index c2a1ec8abc11..247731672cb1 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4068,7 +4068,7 @@ err:
 
 int i915_gem_l3_remap(struct drm_i915_gem_request *req, int slice)
 {
-	struct intel_engine_cs *ring = req->ring;
+	struct intel_ringbuffer *ring = req->ringbuf;
 	struct drm_i915_private *dev_priv = req->i915;
 	u32 *remap_info = dev_priv->l3_parity.remap_info[slice];
 	int i, ret;
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 3e3b4bf3fed1..d58de7e084dc 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -519,7 +519,7 @@ i915_gem_context_get(struct drm_i915_file_private *file_priv, u32 id)
 static inline int
 mi_set_context(struct drm_i915_gem_request *req, u32 hw_flags)
 {
-	struct intel_engine_cs *ring = req->ring;
+	struct intel_ringbuffer *ring = req->ringbuf;
 	u32 flags = hw_flags | MI_MM_SPACE_GTT;
 	const int num_rings =
 		/* Use an extended w/a on ivb+ if signalling from other rings */
@@ -534,7 +534,7 @@ mi_set_context(struct drm_i915_gem_request *req, u32 hw_flags)
 	 * itlb_before_ctx_switch.
 	 */
 	if (IS_GEN6(req->i915)) {
-		ret = ring->flush(req, I915_GEM_GPU_DOMAINS, 0);
+		ret = req->ring->flush(req, I915_GEM_GPU_DOMAINS, 0);
 		if (ret)
 			return ret;
 	}
@@ -562,7 +562,7 @@ mi_set_context(struct drm_i915_gem_request *req, u32 hw_flags)
 
 			intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(num_rings));
 			for_each_ring(signaller, req->i915, i) {
-				if (signaller == ring)
+				if (signaller == req->ring)
 					continue;
 
 				intel_ring_emit_reg(ring, RING_PSMI_CTL(signaller->mmio_base));
@@ -587,7 +587,7 @@ mi_set_context(struct drm_i915_gem_request *req, u32 hw_flags)
 
 			intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(num_rings));
 			for_each_ring(signaller, req->i915, i) {
-				if (signaller == ring)
+				if (signaller == req->ring)
 					continue;
 
 				intel_ring_emit_reg(ring, RING_PSMI_CTL(signaller->mmio_base));
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 78b462956c78..603a247ac333 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1146,14 +1146,12 @@ i915_gem_execbuffer_retire_commands(struct i915_execbuffer_params *params)
 }
 
 static int
-i915_reset_gen7_sol_offsets(struct drm_device *dev,
-			    struct drm_i915_gem_request *req)
+i915_reset_gen7_sol_offsets(struct drm_i915_gem_request *req)
 {
-	struct intel_engine_cs *ring = req->ring;
-	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_ringbuffer *ring = req->ringbuf;
 	int ret, i;
 
-	if (!IS_GEN7(dev) || ring != &dev_priv->ring[RCS]) {
+	if (!IS_GEN7(req->i915) || req->ring->id != RCS) {
 		DRM_DEBUG("sol reset is gen7/rcs only\n");
 		return -EINVAL;
 	}
@@ -1231,9 +1229,8 @@ i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
 			       struct drm_i915_gem_execbuffer2 *args,
 			       struct list_head *vmas)
 {
-	struct drm_device *dev = params->dev;
-	struct intel_engine_cs *ring = params->ring;
-	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_ringbuffer *ring = params->request->ringbuf;
+	struct drm_i915_private *dev_priv = params->request->i915;
 	u64 exec_start, exec_len;
 	int instp_mode;
 	u32 instp_mask;
@@ -1247,34 +1244,31 @@ i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
 	if (ret)
 		return ret;
 
-	WARN(params->ctx->ppgtt && params->ctx->ppgtt->pd_dirty_rings & (1<<ring->id),
-	     "%s didn't clear reload\n", ring->name);
-
 	instp_mode = args->flags & I915_EXEC_CONSTANTS_MASK;
 	instp_mask = I915_EXEC_CONSTANTS_MASK;
 	switch (instp_mode) {
 	case I915_EXEC_CONSTANTS_REL_GENERAL:
 	case I915_EXEC_CONSTANTS_ABSOLUTE:
 	case I915_EXEC_CONSTANTS_REL_SURFACE:
-		if (instp_mode != 0 && ring != &dev_priv->ring[RCS]) {
+		if (instp_mode != 0 && params->ring->id != RCS) {
 			DRM_DEBUG("non-0 rel constants mode on non-RCS\n");
 			return -EINVAL;
 		}
 
 		if (instp_mode != dev_priv->relative_constants_mode) {
-			if (INTEL_INFO(dev)->gen < 4) {
+			if (INTEL_INFO(dev_priv)->gen < 4) {
 				DRM_DEBUG("no rel constants on pre-gen4\n");
 				return -EINVAL;
 			}
 
-			if (INTEL_INFO(dev)->gen > 5 &&
+			if (INTEL_INFO(dev_priv)->gen > 5 &&
 			    instp_mode == I915_EXEC_CONSTANTS_REL_SURFACE) {
 				DRM_DEBUG("rel surface constants mode invalid on gen5+\n");
 				return -EINVAL;
 			}
 
 			/* The HW changed the meaning on this bit on gen6 */
-			if (INTEL_INFO(dev)->gen >= 6)
+			if (INTEL_INFO(dev_priv)->gen >= 6)
 				instp_mask &= ~I915_EXEC_CONSTANTS_REL_SURFACE;
 		}
 		break;
@@ -1283,7 +1277,7 @@ i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
 		return -EINVAL;
 	}
 
-	if (ring == &dev_priv->ring[RCS] &&
+	if (params->ring->id == RCS &&
 	    instp_mode != dev_priv->relative_constants_mode) {
 		ret = intel_ring_begin(params->request, 4);
 		if (ret)
@@ -1299,7 +1293,7 @@ i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
 	}
 
 	if (args->flags & I915_EXEC_GEN7_SOL_RESET) {
-		ret = i915_reset_gen7_sol_offsets(dev, params->request);
+		ret = i915_reset_gen7_sol_offsets(params->request);
 		if (ret)
 			return ret;
 	}
@@ -1308,9 +1302,9 @@ i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
 	exec_start = params->batch_obj_vm_offset +
 		     params->args_batch_start_offset;
 
-	ret = ring->dispatch_execbuffer(params->request,
-					exec_start, exec_len,
-					params->dispatch_flags);
+	ret = params->ring->dispatch_execbuffer(params->request,
+						exec_start, exec_len,
+						params->dispatch_flags);
 	if (ret)
 		return ret;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 224fe89baca3..98841b05f764 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -656,7 +656,7 @@ static int gen8_write_pdp(struct drm_i915_gem_request *req,
 			  unsigned entry,
 			  dma_addr_t addr)
 {
-	struct intel_engine_cs *ring = req->ring;
+	struct intel_ringbuffer *ring = req->ringbuf;
 	int ret;
 
 	BUG_ON(entry >= 4);
@@ -666,10 +666,10 @@ static int gen8_write_pdp(struct drm_i915_gem_request *req,
 		return ret;
 
 	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
-	intel_ring_emit_reg(ring, GEN8_RING_PDP_UDW(ring, entry));
+	intel_ring_emit_reg(ring, GEN8_RING_PDP_UDW(req->ring, entry));
 	intel_ring_emit(ring, upper_32_bits(addr));
 	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
-	intel_ring_emit_reg(ring, GEN8_RING_PDP_LDW(ring, entry));
+	intel_ring_emit_reg(ring, GEN8_RING_PDP_LDW(req->ring, entry));
 	intel_ring_emit(ring, lower_32_bits(addr));
 	intel_ring_advance(ring);
 
@@ -1648,11 +1648,11 @@ static uint32_t get_pd_offset(struct i915_hw_ppgtt *ppgtt)
 static int hsw_mm_switch(struct i915_hw_ppgtt *ppgtt,
 			 struct drm_i915_gem_request *req)
 {
-	struct intel_engine_cs *ring = req->ring;
+	struct intel_ringbuffer *ring = req->ringbuf;
 	int ret;
 
 	/* NB: TLBs must be flushed and invalidated before a switch */
-	ret = ring->flush(req, I915_GEM_GPU_DOMAINS, I915_GEM_GPU_DOMAINS);
+	ret = req->ring->flush(req, I915_GEM_GPU_DOMAINS, I915_GEM_GPU_DOMAINS);
 	if (ret)
 		return ret;
 
@@ -1661,9 +1661,9 @@ static int hsw_mm_switch(struct i915_hw_ppgtt *ppgtt,
 		return ret;
 
 	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(2));
-	intel_ring_emit_reg(ring, RING_PP_DIR_DCLV(ring));
+	intel_ring_emit_reg(ring, RING_PP_DIR_DCLV(req->ring));
 	intel_ring_emit(ring, PP_DIR_DCLV_2G);
-	intel_ring_emit_reg(ring, RING_PP_DIR_BASE(ring));
+	intel_ring_emit_reg(ring, RING_PP_DIR_BASE(req->ring));
 	intel_ring_emit(ring, get_pd_offset(ppgtt));
 	intel_ring_emit(ring, MI_NOOP);
 	intel_ring_advance(ring);
@@ -1685,11 +1685,11 @@ static int vgpu_mm_switch(struct i915_hw_ppgtt *ppgtt,
 static int gen7_mm_switch(struct i915_hw_ppgtt *ppgtt,
 			  struct drm_i915_gem_request *req)
 {
-	struct intel_engine_cs *ring = req->ring;
+	struct intel_ringbuffer *ring = req->ringbuf;
 	int ret;
 
 	/* NB: TLBs must be flushed and invalidated before a switch */
-	ret = ring->flush(req, I915_GEM_GPU_DOMAINS, I915_GEM_GPU_DOMAINS);
+	ret = req->ring->flush(req, I915_GEM_GPU_DOMAINS, I915_GEM_GPU_DOMAINS);
 	if (ret)
 		return ret;
 
@@ -1698,16 +1698,16 @@ static int gen7_mm_switch(struct i915_hw_ppgtt *ppgtt,
 		return ret;
 
 	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(2));
-	intel_ring_emit_reg(ring, RING_PP_DIR_DCLV(ring));
+	intel_ring_emit_reg(ring, RING_PP_DIR_DCLV(req->ring));
 	intel_ring_emit(ring, PP_DIR_DCLV_2G);
-	intel_ring_emit_reg(ring, RING_PP_DIR_BASE(ring));
+	intel_ring_emit_reg(ring, RING_PP_DIR_BASE(req->ring));
 	intel_ring_emit(ring, get_pd_offset(ppgtt));
 	intel_ring_emit(ring, MI_NOOP);
 	intel_ring_advance(ring);
 
 	/* XXX: RCS is the only one to auto invalidate the TLBs? */
-	if (ring->id != RCS) {
-		ret = ring->flush(req, I915_GEM_GPU_DOMAINS, I915_GEM_GPU_DOMAINS);
+	if (req->ring->id != RCS) {
+		ret = req->ring->flush(req, I915_GEM_GPU_DOMAINS, I915_GEM_GPU_DOMAINS);
 		if (ret)
 			return ret;
 	}
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index e2822530af25..b28e783f6f04 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -11052,7 +11052,7 @@ static int intel_gen2_queue_flip(struct drm_device *dev,
 				 struct drm_i915_gem_request *req,
 				 uint32_t flags)
 {
-	struct intel_engine_cs *ring = req->ring;
+	struct intel_ringbuffer *ring = req->ringbuf;
 	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
 	u32 flip_mask;
 	int ret;
@@ -11087,7 +11087,7 @@ static int intel_gen3_queue_flip(struct drm_device *dev,
 				 struct drm_i915_gem_request *req,
 				 uint32_t flags)
 {
-	struct intel_engine_cs *ring = req->ring;
+	struct intel_ringbuffer *ring = req->ringbuf;
 	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
 	u32 flip_mask;
 	int ret;
@@ -11119,8 +11119,8 @@ static int intel_gen4_queue_flip(struct drm_device *dev,
 				 struct drm_i915_gem_request *req,
 				 uint32_t flags)
 {
-	struct intel_engine_cs *ring = req->ring;
-	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_ringbuffer *ring = req->ringbuf;
+	struct drm_i915_private *dev_priv = req->i915;
 	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
 	uint32_t pf, pipesrc;
 	int ret;
@@ -11158,8 +11158,8 @@ static int intel_gen6_queue_flip(struct drm_device *dev,
 				 struct drm_i915_gem_request *req,
 				 uint32_t flags)
 {
-	struct intel_engine_cs *ring = req->ring;
-	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_ringbuffer *ring = req->ringbuf;
+	struct drm_i915_private *dev_priv = req->i915;
 	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
 	uint32_t pf, pipesrc;
 	int ret;
@@ -11194,7 +11194,7 @@ static int intel_gen7_queue_flip(struct drm_device *dev,
 				 struct drm_i915_gem_request *req,
 				 uint32_t flags)
 {
-	struct intel_engine_cs *ring = req->ring;
+	struct intel_ringbuffer *ring = req->ringbuf;
 	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
 	uint32_t plane_bit = 0;
 	int len, ret;
@@ -11215,14 +11215,14 @@ static int intel_gen7_queue_flip(struct drm_device *dev,
 	}
 
 	len = 4;
-	if (ring->id == RCS) {
+	if (req->ring->id == RCS) {
 		len += 6;
 		/*
 		 * On Gen 8, SRM is now taking an extra dword to accommodate
 		 * 48bits addresses, and we need a NOOP for the batch size to
 		 * stay even.
 		 */
-		if (IS_GEN8(dev))
+		if (IS_GEN8(req->i915))
 			len += 2;
 	}
 
@@ -11253,21 +11253,21 @@ static int intel_gen7_queue_flip(struct drm_device *dev,
 	 * for the RCS also doesn't appear to drop events. Setting the DERRMR
 	 * to zero does lead to lockups within MI_DISPLAY_FLIP.
 	 */
-	if (ring->id == RCS) {
+	if (req->ring->id == RCS) {
 		intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
 		intel_ring_emit_reg(ring, DERRMR);
 		intel_ring_emit(ring, ~(DERRMR_PIPEA_PRI_FLIP_DONE |
 					DERRMR_PIPEB_PRI_FLIP_DONE |
 					DERRMR_PIPEC_PRI_FLIP_DONE));
-		if (IS_GEN8(dev))
+		if (IS_GEN8(req->i915))
 			intel_ring_emit(ring, MI_STORE_REGISTER_MEM_GEN8 |
 					      MI_SRM_LRM_GLOBAL_GTT);
 		else
 			intel_ring_emit(ring, MI_STORE_REGISTER_MEM |
 					      MI_SRM_LRM_GLOBAL_GTT);
 		intel_ring_emit_reg(ring, DERRMR);
-		intel_ring_emit(ring, ring->scratch.gtt_offset + 256);
-		if (IS_GEN8(dev)) {
+		intel_ring_emit(ring, req->ring->scratch.gtt_offset + 256);
+		if (IS_GEN8(req->i915)) {
 			intel_ring_emit(ring, 0);
 			intel_ring_emit(ring, MI_NOOP);
 		}
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index a369aa041522..dc4fc9d8612c 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -754,7 +754,7 @@ intel_logical_ring_advance_and_submit(struct drm_i915_gem_request *request)
 {
 	struct drm_i915_private *dev_priv = request->i915;
 
-	intel_logical_ring_advance(request->ringbuf);
+	intel_ring_advance(request->ringbuf);
 	request->tail = request->ringbuf->tail;
 
 	if (dev_priv->guc.execbuf_client)
@@ -932,11 +932,11 @@ int intel_execlists_submission(struct i915_execbuffer_params *params,
 		if (ret)
 			return ret;
 
-		intel_logical_ring_emit(ringbuf, MI_NOOP);
-		intel_logical_ring_emit(ringbuf, MI_LOAD_REGISTER_IMM(1));
-		intel_logical_ring_emit_reg(ringbuf, INSTPM);
-		intel_logical_ring_emit(ringbuf, instp_mask << 16 | instp_mode);
-		intel_logical_ring_advance(ringbuf);
+		intel_ring_emit(ringbuf, MI_NOOP);
+		intel_ring_emit(ringbuf, MI_LOAD_REGISTER_IMM(1));
+		intel_ring_emit_reg(ringbuf, INSTPM);
+		intel_ring_emit(ringbuf, instp_mask << 16 | instp_mode);
+		intel_ring_advance(ringbuf);
 
 		dev_priv->relative_constants_mode = instp_mode;
 	}
@@ -1108,14 +1108,14 @@ static int intel_logical_ring_workarounds_emit(struct drm_i915_gem_request *req)
 	if (ret)
 		return ret;
 
-	intel_logical_ring_emit(ringbuf, MI_LOAD_REGISTER_IMM(w->count));
+	intel_ring_emit(ringbuf, MI_LOAD_REGISTER_IMM(w->count));
 	for (i = 0; i < w->count; i++) {
-		intel_logical_ring_emit_reg(ringbuf, w->reg[i].addr);
-		intel_logical_ring_emit(ringbuf, w->reg[i].value);
+		intel_ring_emit_reg(ringbuf, w->reg[i].addr);
+		intel_ring_emit(ringbuf, w->reg[i].value);
 	}
-	intel_logical_ring_emit(ringbuf, MI_NOOP);
+	intel_ring_emit(ringbuf, MI_NOOP);
 
-	intel_logical_ring_advance(ringbuf);
+	intel_ring_advance(ringbuf);
 
 	ring->gpu_caches_dirty = true;
 	ret = logical_ring_flush_all_caches(req);
@@ -1570,18 +1570,18 @@ static int intel_logical_ring_emit_pdps(struct drm_i915_gem_request *req)
 	if (ret)
 		return ret;
 
-	intel_logical_ring_emit(ringbuf, MI_LOAD_REGISTER_IMM(num_lri_cmds));
+	intel_ring_emit(ringbuf, MI_LOAD_REGISTER_IMM(num_lri_cmds));
 	for (i = GEN8_LEGACY_PDPES - 1; i >= 0; i--) {
 		const dma_addr_t pd_daddr = i915_page_dir_dma_addr(ppgtt, i);
 
-		intel_logical_ring_emit_reg(ringbuf, GEN8_RING_PDP_UDW(ring, i));
-		intel_logical_ring_emit(ringbuf, upper_32_bits(pd_daddr));
-		intel_logical_ring_emit_reg(ringbuf, GEN8_RING_PDP_LDW(ring, i));
-		intel_logical_ring_emit(ringbuf, lower_32_bits(pd_daddr));
+		intel_ring_emit_reg(ringbuf, GEN8_RING_PDP_UDW(ring, i));
+		intel_ring_emit(ringbuf, upper_32_bits(pd_daddr));
+		intel_ring_emit_reg(ringbuf, GEN8_RING_PDP_LDW(ring, i));
+		intel_ring_emit(ringbuf, lower_32_bits(pd_daddr));
 	}
 
-	intel_logical_ring_emit(ringbuf, MI_NOOP);
-	intel_logical_ring_advance(ringbuf);
+	intel_ring_emit(ringbuf, MI_NOOP);
+	intel_ring_advance(ringbuf);
 
 	return 0;
 }
@@ -1616,14 +1616,14 @@ static int gen8_emit_bb_start(struct drm_i915_gem_request *req,
 		return ret;
 
 	/* FIXME(BDW): Address space and security selectors. */
-	intel_logical_ring_emit(ringbuf, MI_BATCH_BUFFER_START_GEN8 |
-				(ppgtt<<8) |
-				(dispatch_flags & I915_DISPATCH_RS ?
-				 MI_BATCH_RESOURCE_STREAMER : 0));
-	intel_logical_ring_emit(ringbuf, lower_32_bits(offset));
-	intel_logical_ring_emit(ringbuf, upper_32_bits(offset));
-	intel_logical_ring_emit(ringbuf, MI_NOOP);
-	intel_logical_ring_advance(ringbuf);
+	intel_ring_emit(ringbuf, MI_BATCH_BUFFER_START_GEN8 |
+			(ppgtt<<8) |
+			(dispatch_flags & I915_DISPATCH_RS ?
+			 MI_BATCH_RESOURCE_STREAMER : 0));
+	intel_ring_emit(ringbuf, lower_32_bits(offset));
+	intel_ring_emit(ringbuf, upper_32_bits(offset));
+	intel_ring_emit(ringbuf, MI_NOOP);
+	intel_ring_advance(ringbuf);
 
 	return 0;
 }
@@ -1674,13 +1674,13 @@ static int gen8_emit_flush(struct drm_i915_gem_request *request,
 			cmd |= MI_INVALIDATE_BSD;
 	}
 
-	intel_logical_ring_emit(ringbuf, cmd);
-	intel_logical_ring_emit(ringbuf,
-				I915_GEM_HWS_SCRATCH_ADDR |
-				MI_FLUSH_DW_USE_GTT);
-	intel_logical_ring_emit(ringbuf, 0); /* upper addr */
-	intel_logical_ring_emit(ringbuf, 0); /* value */
-	intel_logical_ring_advance(ringbuf);
+	intel_ring_emit(ringbuf, cmd);
+	intel_ring_emit(ringbuf,
+			I915_GEM_HWS_SCRATCH_ADDR |
+			MI_FLUSH_DW_USE_GTT);
+	intel_ring_emit(ringbuf, 0); /* upper addr */
+	intel_ring_emit(ringbuf, 0); /* value */
+	intel_ring_advance(ringbuf);
 
 	return 0;
 }
@@ -1727,21 +1727,21 @@ static int gen8_emit_flush_render(struct drm_i915_gem_request *request,
 		return ret;
 
 	if (vf_flush_wa) {
-		intel_logical_ring_emit(ringbuf, GFX_OP_PIPE_CONTROL(6));
-		intel_logical_ring_emit(ringbuf, 0);
-		intel_logical_ring_emit(ringbuf, 0);
-		intel_logical_ring_emit(ringbuf, 0);
-		intel_logical_ring_emit(ringbuf, 0);
-		intel_logical_ring_emit(ringbuf, 0);
+		intel_ring_emit(ringbuf, GFX_OP_PIPE_CONTROL(6));
+		intel_ring_emit(ringbuf, 0);
+		intel_ring_emit(ringbuf, 0);
+		intel_ring_emit(ringbuf, 0);
+		intel_ring_emit(ringbuf, 0);
+		intel_ring_emit(ringbuf, 0);
 	}
 
-	intel_logical_ring_emit(ringbuf, GFX_OP_PIPE_CONTROL(6));
-	intel_logical_ring_emit(ringbuf, flags);
-	intel_logical_ring_emit(ringbuf, scratch_addr);
-	intel_logical_ring_emit(ringbuf, 0);
-	intel_logical_ring_emit(ringbuf, 0);
-	intel_logical_ring_emit(ringbuf, 0);
-	intel_logical_ring_advance(ringbuf);
+	intel_ring_emit(ringbuf, GFX_OP_PIPE_CONTROL(6));
+	intel_ring_emit(ringbuf, flags);
+	intel_ring_emit(ringbuf, scratch_addr);
+	intel_ring_emit(ringbuf, 0);
+	intel_ring_emit(ringbuf, 0);
+	intel_ring_emit(ringbuf, 0);
+	intel_ring_advance(ringbuf);
 
 	return 0;
 }
@@ -1786,23 +1786,23 @@ static int gen8_emit_request(struct drm_i915_gem_request *request)
 	cmd = MI_STORE_DWORD_IMM_GEN4;
 	cmd |= MI_GLOBAL_GTT;
 
-	intel_logical_ring_emit(ringbuf, cmd);
-	intel_logical_ring_emit(ringbuf,
-				(ring->status_page.gfx_addr +
-				(I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT)));
-	intel_logical_ring_emit(ringbuf, 0);
-	intel_logical_ring_emit(ringbuf, request->fence.seqno);
-	intel_logical_ring_emit(ringbuf, MI_USER_INTERRUPT);
-	intel_logical_ring_emit(ringbuf, MI_NOOP);
+	intel_ring_emit(ringbuf, cmd);
+	intel_ring_emit(ringbuf,
+			(ring->status_page.gfx_addr +
+			 (I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT)));
+	intel_ring_emit(ringbuf, 0);
+	intel_ring_emit(ringbuf, request->fence.seqno);
+	intel_ring_emit(ringbuf, MI_USER_INTERRUPT);
+	intel_ring_emit(ringbuf, MI_NOOP);
 	intel_logical_ring_advance_and_submit(request);
 
 	/*
 	 * Here we add two extra NOOPs as padding to avoid
 	 * lite restore of a context with HEAD==TAIL.
 	 */
-	intel_logical_ring_emit(ringbuf, MI_NOOP);
-	intel_logical_ring_emit(ringbuf, MI_NOOP);
-	intel_logical_ring_advance(ringbuf);
+	intel_ring_emit(ringbuf, MI_NOOP);
+	intel_ring_emit(ringbuf, MI_NOOP);
+	intel_ring_advance(ringbuf);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 1e58f2550777..9d4aa699e593 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -63,32 +63,6 @@ int intel_logical_rings_init(struct drm_device *dev);
 int intel_logical_ring_begin(struct drm_i915_gem_request *req, int num_dwords);
 
 int logical_ring_flush_all_caches(struct drm_i915_gem_request *req);
-/**
- * intel_logical_ring_advance() - advance the ringbuffer tail
- * @ringbuf: Ringbuffer to advance.
- *
- * The tail is only updated in our logical ringbuffer struct.
- */
-static inline void intel_logical_ring_advance(struct intel_ringbuffer *ringbuf)
-{
-	intel_ringbuffer_advance(ringbuf);
-}
-
-/**
- * intel_logical_ring_emit() - write a DWORD to the ringbuffer.
- * @ringbuf: Ringbuffer to write to.
- * @data: DWORD to write.
- */
-static inline void intel_logical_ring_emit(struct intel_ringbuffer *ringbuf,
-					   u32 data)
-{
-	intel_ringbuffer_emit(ringbuf, data);
-}
-static inline void intel_logical_ring_emit_reg(struct intel_ringbuffer *ringbuf,
-					       i915_reg_t reg)
-{
-	intel_logical_ring_emit(ringbuf, i915_mmio_reg_offset(reg));
-}
 
 /* Logical Ring Contexts */
 
diff --git a/drivers/gpu/drm/i915/intel_mocs.c b/drivers/gpu/drm/i915/intel_mocs.c
index fed7bea19cc9..d8a7fdc7baeb 100644
--- a/drivers/gpu/drm/i915/intel_mocs.c
+++ b/drivers/gpu/drm/i915/intel_mocs.c
@@ -206,13 +206,11 @@ static int emit_mocs_control_table(struct drm_i915_gem_request *req,
 		return ret;
 	}
 
-	intel_logical_ring_emit(ringbuf,
-				MI_LOAD_REGISTER_IMM(GEN9_NUM_MOCS_ENTRIES));
+	intel_ring_emit(ringbuf, MI_LOAD_REGISTER_IMM(GEN9_NUM_MOCS_ENTRIES));
 
 	for (index = 0; index < table->size; index++) {
-		intel_logical_ring_emit_reg(ringbuf, mocs_register(ring, index));
-		intel_logical_ring_emit(ringbuf,
-					table->table[index].control_value);
+		intel_ring_emit_reg(ringbuf, mocs_register(ring, index));
+		intel_ring_emit(ringbuf, table->table[index].control_value);
 	}
 
 	/*
@@ -224,12 +222,12 @@ static int emit_mocs_control_table(struct drm_i915_gem_request *req,
 	 * that value to all the used entries.
 	 */
 	for (; index < GEN9_NUM_MOCS_ENTRIES; index++) {
-		intel_logical_ring_emit_reg(ringbuf, mocs_register(ring, index));
-		intel_logical_ring_emit(ringbuf, table->table[0].control_value);
+		intel_ring_emit_reg(ringbuf, mocs_register(ring, index));
+		intel_ring_emit(ringbuf, table->table[0].control_value);
 	}
 
-	intel_logical_ring_emit(ringbuf, MI_NOOP);
-	intel_logical_ring_advance(ringbuf);
+	intel_ring_emit(ringbuf, MI_NOOP);
+	intel_ring_advance(ringbuf);
 
 	return 0;
 }
@@ -265,15 +263,15 @@ static int emit_mocs_l3cc_table(struct drm_i915_gem_request *req,
 		return ret;
 	}
 
-	intel_logical_ring_emit(ringbuf,
+	intel_ring_emit(ringbuf,
 			MI_LOAD_REGISTER_IMM(GEN9_NUM_MOCS_ENTRIES / 2));
 
 	for (i = 0, count = 0; i < table->size / 2; i++, count += 2) {
 		value = (table->table[count].l3cc_value & 0xffff) |
 			((table->table[count + 1].l3cc_value & 0xffff) << 16);
 
-		intel_logical_ring_emit_reg(ringbuf, GEN9_LNCFCMOCS(i));
-		intel_logical_ring_emit(ringbuf, value);
+		intel_ring_emit_reg(ringbuf, GEN9_LNCFCMOCS(i));
+		intel_ring_emit(ringbuf, value);
 	}
 
 	if (table->size & 0x01) {
@@ -289,14 +287,14 @@ static int emit_mocs_l3cc_table(struct drm_i915_gem_request *req,
 	 * they are reserved by the hardware.
 	 */
 	for (; i < GEN9_NUM_MOCS_ENTRIES / 2; i++) {
-		intel_logical_ring_emit_reg(ringbuf, GEN9_LNCFCMOCS(i));
-		intel_logical_ring_emit(ringbuf, value);
+		intel_ring_emit_reg(ringbuf, GEN9_LNCFCMOCS(i));
+		intel_ring_emit(ringbuf, value);
 
 		value = filler;
 	}
 
-	intel_logical_ring_emit(ringbuf, MI_NOOP);
-	intel_logical_ring_advance(ringbuf);
+	intel_ring_emit(ringbuf, MI_NOOP);
+	intel_ring_advance(ringbuf);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/intel_overlay.c b/drivers/gpu/drm/i915/intel_overlay.c
index 76f1980a7541..6dca0e470e61 100644
--- a/drivers/gpu/drm/i915/intel_overlay.c
+++ b/drivers/gpu/drm/i915/intel_overlay.c
@@ -252,11 +252,11 @@ static int intel_overlay_on(struct intel_overlay *overlay)
 
 	overlay->active = true;
 
-	intel_ring_emit(ring, MI_OVERLAY_FLIP | MI_OVERLAY_ON);
-	intel_ring_emit(ring, overlay->flip_addr | OFC_UPDATE);
-	intel_ring_emit(ring, MI_WAIT_FOR_EVENT | MI_WAIT_FOR_OVERLAY_FLIP);
-	intel_ring_emit(ring, MI_NOOP);
-	intel_ring_advance(ring);
+	intel_ring_emit(req->ringbuf, MI_OVERLAY_FLIP | MI_OVERLAY_ON);
+	intel_ring_emit(req->ringbuf, overlay->flip_addr | OFC_UPDATE);
+	intel_ring_emit(req->ringbuf, MI_WAIT_FOR_EVENT | MI_WAIT_FOR_OVERLAY_FLIP);
+	intel_ring_emit(req->ringbuf, MI_NOOP);
+	intel_ring_advance(req->ringbuf);
 
 	return intel_overlay_do_wait_request(overlay, req, NULL);
 }
@@ -293,9 +293,9 @@ static int intel_overlay_continue(struct intel_overlay *overlay,
 		return ret;
 	}
 
-	intel_ring_emit(ring, MI_OVERLAY_FLIP | MI_OVERLAY_CONTINUE);
-	intel_ring_emit(ring, flip_addr);
-	intel_ring_advance(ring);
+	intel_ring_emit(req->ringbuf, MI_OVERLAY_FLIP | MI_OVERLAY_CONTINUE);
+	intel_ring_emit(req->ringbuf, flip_addr);
+	intel_ring_advance(req->ringbuf);
 
 	WARN_ON(overlay->last_flip_req);
 	i915_gem_request_assign(&overlay->last_flip_req, req);
@@ -360,22 +360,22 @@ static int intel_overlay_off(struct intel_overlay *overlay)
 	}
 
 	/* wait for overlay to go idle */
-	intel_ring_emit(ring, MI_OVERLAY_FLIP | MI_OVERLAY_CONTINUE);
-	intel_ring_emit(ring, flip_addr);
-	intel_ring_emit(ring, MI_WAIT_FOR_EVENT | MI_WAIT_FOR_OVERLAY_FLIP);
+	intel_ring_emit(req->ringbuf, MI_OVERLAY_FLIP | MI_OVERLAY_CONTINUE);
+	intel_ring_emit(req->ringbuf, flip_addr);
+	intel_ring_emit(req->ringbuf, MI_WAIT_FOR_EVENT | MI_WAIT_FOR_OVERLAY_FLIP);
 	/* turn overlay off */
 	if (IS_I830(dev)) {
 		/* Workaround: Don't disable the overlay fully, since otherwise
 		 * it dies on the next OVERLAY_ON cmd. */
-		intel_ring_emit(ring, MI_NOOP);
-		intel_ring_emit(ring, MI_NOOP);
-		intel_ring_emit(ring, MI_NOOP);
+		intel_ring_emit(req->ringbuf, MI_NOOP);
+		intel_ring_emit(req->ringbuf, MI_NOOP);
+		intel_ring_emit(req->ringbuf, MI_NOOP);
 	} else {
-		intel_ring_emit(ring, MI_OVERLAY_FLIP | MI_OVERLAY_OFF);
-		intel_ring_emit(ring, flip_addr);
-		intel_ring_emit(ring, MI_WAIT_FOR_EVENT | MI_WAIT_FOR_OVERLAY_FLIP);
+		intel_ring_emit(req->ringbuf, MI_OVERLAY_FLIP | MI_OVERLAY_OFF);
+		intel_ring_emit(req->ringbuf, flip_addr);
+		intel_ring_emit(req->ringbuf, MI_WAIT_FOR_EVENT | MI_WAIT_FOR_OVERLAY_FLIP);
 	}
-	intel_ring_advance(ring);
+	intel_ring_advance(req->ringbuf);
 
 	return intel_overlay_do_wait_request(overlay, req, intel_overlay_off_tail);
 }
@@ -433,9 +433,9 @@ static int intel_overlay_release_old_vid(struct intel_overlay *overlay)
 			return ret;
 		}
 
-		intel_ring_emit(ring, MI_WAIT_FOR_EVENT | MI_WAIT_FOR_OVERLAY_FLIP);
-		intel_ring_emit(ring, MI_NOOP);
-		intel_ring_advance(ring);
+		intel_ring_emit(req->ringbuf, MI_WAIT_FOR_EVENT | MI_WAIT_FOR_OVERLAY_FLIP);
+		intel_ring_emit(req->ringbuf, MI_NOOP);
+		intel_ring_advance(req->ringbuf);
 
 		ret = intel_overlay_do_wait_request(overlay, req,
 						    intel_overlay_release_old_vid_tail);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index d17dd33ee94c..86c54584f64a 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -71,7 +71,7 @@ gen2_render_ring_flush(struct drm_i915_gem_request *req,
 		       u32	invalidate_domains,
 		       u32	flush_domains)
 {
-	struct intel_engine_cs *ring = req->ring;
+	struct intel_ringbuffer *ring = req->ringbuf;
 	u32 cmd;
 	int ret;
 
@@ -98,7 +98,7 @@ gen4_render_ring_flush(struct drm_i915_gem_request *req,
 		       u32	invalidate_domains,
 		       u32	flush_domains)
 {
-	struct intel_engine_cs *ring = req->ring;
+	struct intel_ringbuffer *ring = req->ringbuf;
 	u32 cmd;
 	int ret;
 
@@ -191,8 +191,8 @@ gen4_render_ring_flush(struct drm_i915_gem_request *req,
 static int
 intel_emit_post_sync_nonzero_flush(struct drm_i915_gem_request *req)
 {
-	struct intel_engine_cs *ring = req->ring;
-	u32 scratch_addr = ring->scratch.gtt_offset + 2 * CACHELINE_BYTES;
+	struct intel_ringbuffer *ring = req->ringbuf;
+	u32 scratch_addr = req->ring->scratch.gtt_offset + 2 * CACHELINE_BYTES;
 	int ret;
 
 	ret = intel_ring_begin(req, 6);
@@ -227,9 +227,9 @@ static int
 gen6_render_ring_flush(struct drm_i915_gem_request *req,
 		       u32 invalidate_domains, u32 flush_domains)
 {
-	struct intel_engine_cs *ring = req->ring;
+	struct intel_ringbuffer *ring = req->ringbuf;
 	u32 flags = 0;
-	u32 scratch_addr = ring->scratch.gtt_offset + 2 * CACHELINE_BYTES;
+	u32 scratch_addr = req->ring->scratch.gtt_offset + 2 * CACHELINE_BYTES;
 	int ret;
 
 	/* Force SNB workarounds for PIPE_CONTROL flushes */
@@ -279,7 +279,7 @@ gen6_render_ring_flush(struct drm_i915_gem_request *req,
 static int
 gen7_render_ring_cs_stall_wa(struct drm_i915_gem_request *req)
 {
-	struct intel_engine_cs *ring = req->ring;
+	struct intel_ringbuffer *ring = req->ringbuf;
 	int ret;
 
 	ret = intel_ring_begin(req, 4);
@@ -300,9 +300,9 @@ static int
 gen7_render_ring_flush(struct drm_i915_gem_request *req,
 		       u32 invalidate_domains, u32 flush_domains)
 {
-	struct intel_engine_cs *ring = req->ring;
+	struct intel_ringbuffer *ring = req->ringbuf;
 	u32 flags = 0;
-	u32 scratch_addr = ring->scratch.gtt_offset + 2 * CACHELINE_BYTES;
+	u32 scratch_addr = req->ring->scratch.gtt_offset + 2 * CACHELINE_BYTES;
 	int ret;
 
 	/*
@@ -363,7 +363,7 @@ static int
 gen8_emit_pipe_control(struct drm_i915_gem_request *req,
 		       u32 flags, u32 scratch_addr)
 {
-	struct intel_engine_cs *ring = req->ring;
+	struct intel_ringbuffer *ring = req->ringbuf;
 	int ret;
 
 	ret = intel_ring_begin(req, 6);
@@ -688,15 +688,15 @@ err:
 
 static int intel_ring_workarounds_emit(struct drm_i915_gem_request *req)
 {
-	int ret, i;
-	struct intel_engine_cs *ring = req->ring;
+	struct intel_ringbuffer *ring = req->ringbuf;
 	struct drm_i915_private *dev_priv = req->i915;
 	struct i915_workarounds *w = &dev_priv->workarounds;
+	int ret, i;
 
 	if (w->count == 0)
 		return 0;
 
-	ring->gpu_caches_dirty = true;
+	req->ring->gpu_caches_dirty = true;
 	ret = intel_ring_flush_all_caches(req);
 	if (ret)
 		return ret;
@@ -714,7 +714,7 @@ static int intel_ring_workarounds_emit(struct drm_i915_gem_request *req)
 
 	intel_ring_advance(ring);
 
-	ring->gpu_caches_dirty = true;
+	req->ring->gpu_caches_dirty = true;
 	ret = intel_ring_flush_all_caches(req);
 	if (ret)
 		return ret;
@@ -1191,7 +1191,7 @@ static int gen8_rcs_signal(struct drm_i915_gem_request *signaller_req,
 			   unsigned int num_dwords)
 {
 #define MBOX_UPDATE_DWORDS 8
-	struct intel_engine_cs *signaller = signaller_req->ring;
+	struct intel_ringbuffer *signaller = signaller_req->ringbuf;
 	struct drm_i915_private *dev_priv = signaller_req->i915;
 	struct intel_engine_cs *waiter;
 	int i, ret, num_rings;
@@ -1205,7 +1205,7 @@ static int gen8_rcs_signal(struct drm_i915_gem_request *signaller_req,
 		return ret;
 
 	for_each_ring(waiter, dev_priv, i) {
-		u64 gtt_offset = signaller->semaphore.signal_ggtt[i];
+		u64 gtt_offset = signaller_req->ring->semaphore.signal_ggtt[i];
 		if (gtt_offset == MI_SEMAPHORE_SYNC_INVALID)
 			continue;
 
@@ -1229,7 +1229,7 @@ static int gen8_xcs_signal(struct drm_i915_gem_request *signaller_req,
 			   unsigned int num_dwords)
 {
 #define MBOX_UPDATE_DWORDS 6
-	struct intel_engine_cs *signaller = signaller_req->ring;
+	struct intel_ringbuffer *signaller = signaller_req->ringbuf;
 	struct drm_i915_private *dev_priv = signaller_req->i915;
 	struct intel_engine_cs *waiter;
 	int i, ret, num_rings;
@@ -1243,7 +1243,7 @@ static int gen8_xcs_signal(struct drm_i915_gem_request *signaller_req,
 		return ret;
 
 	for_each_ring(waiter, dev_priv, i) {
-		u64 gtt_offset = signaller->semaphore.signal_ggtt[i];
+		u64 gtt_offset = signaller_req->ring->semaphore.signal_ggtt[i];
 		if (gtt_offset == MI_SEMAPHORE_SYNC_INVALID)
 			continue;
 
@@ -1264,7 +1264,7 @@ static int gen8_xcs_signal(struct drm_i915_gem_request *signaller_req,
 static int gen6_signal(struct drm_i915_gem_request *signaller_req,
 		       unsigned int num_dwords)
 {
-	struct intel_engine_cs *signaller = signaller_req->ring;
+	struct intel_ringbuffer *signaller = signaller_req->ringbuf;
 	struct drm_i915_private *dev_priv = signaller_req->i915;
 	struct intel_engine_cs *useless;
 	int i, ret, num_rings;
@@ -1279,7 +1279,7 @@ static int gen6_signal(struct drm_i915_gem_request *signaller_req,
 		return ret;
 
 	for_each_ring(useless, dev_priv, i) {
-		i915_reg_t mbox_reg = signaller->semaphore.mbox.signal[i];
+		i915_reg_t mbox_reg = signaller_req->ring->semaphore.mbox.signal[i];
 
 		if (i915_mmio_reg_valid(mbox_reg)) {
 			intel_ring_emit(signaller, MI_LOAD_REGISTER_IMM(1));
@@ -1306,11 +1306,11 @@ static int gen6_signal(struct drm_i915_gem_request *signaller_req,
 static int
 gen6_add_request(struct drm_i915_gem_request *req)
 {
-	struct intel_engine_cs *ring = req->ring;
+	struct intel_ringbuffer *ring = req->ringbuf;
 	int ret;
 
-	if (ring->semaphore.signal)
-		ret = ring->semaphore.signal(req, 4);
+	if (req->ring->semaphore.signal)
+		ret = req->ring->semaphore.signal(req, 4);
 	else
 		ret = intel_ring_begin(req, 4);
 
@@ -1321,15 +1321,14 @@ gen6_add_request(struct drm_i915_gem_request *req)
 	intel_ring_emit(ring, I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT);
 	intel_ring_emit(ring, req->fence.seqno);
 	intel_ring_emit(ring, MI_USER_INTERRUPT);
-	__intel_ring_advance(ring);
+	__intel_ring_advance(req->ring);
 
 	return 0;
 }
 
-static inline bool i915_gem_has_seqno_wrapped(struct drm_device *dev,
+static inline bool i915_gem_has_seqno_wrapped(struct drm_i915_private *dev_priv,
 					      u32 seqno)
 {
-	struct drm_i915_private *dev_priv = dev->dev_private;
 	return dev_priv->last_seqno < seqno;
 }
 
@@ -1346,7 +1345,7 @@ gen8_ring_sync(struct drm_i915_gem_request *waiter_req,
 	       struct intel_engine_cs *signaller,
 	       u32 seqno)
 {
-	struct intel_engine_cs *waiter = waiter_req->ring;
+	struct intel_ringbuffer *waiter = waiter_req->ringbuf;
 	struct drm_i915_private *dev_priv = waiter_req->i915;
 	int ret;
 
@@ -1360,9 +1359,11 @@ gen8_ring_sync(struct drm_i915_gem_request *waiter_req,
 				MI_SEMAPHORE_SAD_GTE_SDD);
 	intel_ring_emit(waiter, seqno);
 	intel_ring_emit(waiter,
-			lower_32_bits(GEN8_WAIT_OFFSET(waiter, signaller->id)));
+			lower_32_bits(GEN8_WAIT_OFFSET(waiter_req->ring,
+						       signaller->id)));
 	intel_ring_emit(waiter,
-			upper_32_bits(GEN8_WAIT_OFFSET(waiter, signaller->id)));
+			upper_32_bits(GEN8_WAIT_OFFSET(waiter_req->ring,
+						       signaller->id)));
 	intel_ring_advance(waiter);
 	return 0;
 }
@@ -1372,11 +1373,11 @@ gen6_ring_sync(struct drm_i915_gem_request *waiter_req,
 	       struct intel_engine_cs *signaller,
 	       u32 seqno)
 {
-	struct intel_engine_cs *waiter = waiter_req->ring;
+	struct intel_ringbuffer *waiter = waiter_req->ringbuf;
 	u32 dw1 = MI_SEMAPHORE_MBOX |
 		  MI_SEMAPHORE_COMPARE |
 		  MI_SEMAPHORE_REGISTER;
-	u32 wait_mbox = signaller->semaphore.mbox.wait[waiter->id];
+	u32 wait_mbox = signaller->semaphore.mbox.wait[waiter_req->ring->id];
 	int ret;
 
 	/* Throughout all of the GEM code, seqno passed implies our current
@@ -1392,7 +1393,7 @@ gen6_ring_sync(struct drm_i915_gem_request *waiter_req,
 		return ret;
 
 	/* If seqno wrap happened, omit the wait with no-ops */
-	if (likely(!i915_gem_has_seqno_wrapped(waiter->dev, seqno))) {
+	if (likely(!i915_gem_has_seqno_wrapped(waiter_req->i915, seqno))) {
 		intel_ring_emit(waiter, dw1 | wait_mbox);
 		intel_ring_emit(waiter, seqno);
 		intel_ring_emit(waiter, 0);
@@ -1420,7 +1421,7 @@ do {									\
 static int
 pc_render_add_request(struct drm_i915_gem_request *req)
 {
-	struct intel_engine_cs *ring = req->ring;
+	struct intel_ringbuffer *ring = req->ringbuf;
 	u32 addr = req->ring->status_page.gfx_addr +
 		(I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT);
 	u32 scratch_addr = addr;
@@ -1464,7 +1465,7 @@ pc_render_add_request(struct drm_i915_gem_request *req)
 	intel_ring_emit(ring, addr | PIPE_CONTROL_GLOBAL_GTT);
 	intel_ring_emit(ring, req->fence.seqno);
 	intel_ring_emit(ring, 0);
-	__intel_ring_advance(ring);
+	__intel_ring_advance(req->ring);
 
 	return 0;
 }
@@ -1547,7 +1548,7 @@ bsd_ring_flush(struct drm_i915_gem_request *req,
 	       u32     invalidate_domains,
 	       u32     flush_domains)
 {
-	struct intel_engine_cs *ring = req->ring;
+	struct intel_ringbuffer *ring = req->ringbuf;
 	int ret;
 
 	ret = intel_ring_begin(req, 2);
@@ -1563,7 +1564,7 @@ bsd_ring_flush(struct drm_i915_gem_request *req,
 static int
 i9xx_add_request(struct drm_i915_gem_request *req)
 {
-	struct intel_engine_cs *ring = req->ring;
+	struct intel_ringbuffer *ring = req->ringbuf;
 	int ret;
 
 	ret = intel_ring_begin(req, 4);
@@ -1574,7 +1575,7 @@ i9xx_add_request(struct drm_i915_gem_request *req)
 	intel_ring_emit(ring, I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT);
 	intel_ring_emit(ring, req->fence.seqno);
 	intel_ring_emit(ring, MI_USER_INTERRUPT);
-	__intel_ring_advance(ring);
+	__intel_ring_advance(req->ring);
 
 	return 0;
 }
@@ -1657,7 +1658,7 @@ i965_dispatch_execbuffer(struct drm_i915_gem_request *req,
 			 u64 offset, u32 length,
 			 unsigned dispatch_flags)
 {
-	struct intel_engine_cs *ring = req->ring;
+	struct intel_ringbuffer *ring = req->ringbuf;
 	int ret;
 
 	ret = intel_ring_begin(req, 2);
@@ -1684,8 +1685,8 @@ i830_dispatch_execbuffer(struct drm_i915_gem_request *req,
 			 u64 offset, u32 len,
 			 unsigned dispatch_flags)
 {
-	struct intel_engine_cs *ring = req->ring;
-	u32 cs_offset = ring->scratch.gtt_offset;
+	struct intel_ringbuffer *ring = req->ringbuf;
+	u32 cs_offset = req->ring->scratch.gtt_offset;
 	int ret;
 
 	ret = intel_ring_begin(req, 6);
@@ -1747,7 +1748,7 @@ i915_dispatch_execbuffer(struct drm_i915_gem_request *req,
 			 u64 offset, u32 len,
 			 unsigned dispatch_flags)
 {
-	struct intel_engine_cs *ring = req->ring;
+	struct intel_ringbuffer *ring = req->ringbuf;
 	int ret;
 
 	ret = intel_ring_begin(req, 2);
@@ -2256,8 +2257,8 @@ int intel_ring_begin(struct drm_i915_gem_request *req,
 /* Align the ring tail to a cacheline boundary */
 int intel_ring_cacheline_align(struct drm_i915_gem_request *req)
 {
-	struct intel_engine_cs *ring = req->ring;
-	int num_dwords = (ring->buffer->tail & (CACHELINE_BYTES - 1)) / sizeof(uint32_t);
+	struct intel_ringbuffer *ring = req->ringbuf;
+	int num_dwords = (ring->tail & (CACHELINE_BYTES - 1)) / sizeof(uint32_t);
 	int ret;
 
 	if (num_dwords == 0)
@@ -2331,7 +2332,7 @@ static void gen6_bsd_ring_write_tail(struct intel_engine_cs *ring,
 static int gen6_bsd_ring_flush(struct drm_i915_gem_request *req,
 			       u32 invalidate, u32 flush)
 {
-	struct intel_engine_cs *ring = req->ring;
+	struct intel_ringbuffer *ring = req->ringbuf;
 	uint32_t cmd;
 	int ret;
 
@@ -2340,7 +2341,7 @@ static int gen6_bsd_ring_flush(struct drm_i915_gem_request *req,
 		return ret;
 
 	cmd = MI_FLUSH_DW;
-	if (INTEL_INFO(ring->dev)->gen >= 8)
+	if (INTEL_INFO(req->i915)->gen >= 8)
 		cmd += 1;
 
 	/* We always require a command barrier so that subsequent
@@ -2361,7 +2362,7 @@ static int gen6_bsd_ring_flush(struct drm_i915_gem_request *req,
 
 	intel_ring_emit(ring, cmd);
 	intel_ring_emit(ring, I915_GEM_HWS_SCRATCH_ADDR | MI_FLUSH_DW_USE_GTT);
-	if (INTEL_INFO(ring->dev)->gen >= 8) {
+	if (INTEL_INFO(req->i915)->gen >= 8) {
 		intel_ring_emit(ring, 0); /* upper addr */
 		intel_ring_emit(ring, 0); /* value */
 	} else  {
@@ -2377,7 +2378,7 @@ gen8_ring_dispatch_execbuffer(struct drm_i915_gem_request *req,
 			      u64 offset, u32 len,
 			      unsigned dispatch_flags)
 {
-	struct intel_engine_cs *ring = req->ring;
+	struct intel_ringbuffer *ring = req->ringbuf;
 	bool ppgtt = USES_PPGTT(req->i915) &&
 			!(dispatch_flags & I915_DISPATCH_SECURE);
 	int ret;
@@ -2403,7 +2404,7 @@ hsw_ring_dispatch_execbuffer(struct drm_i915_gem_request *req,
 			     u64 offset, u32 len,
 			     unsigned dispatch_flags)
 {
-	struct intel_engine_cs *ring = req->ring;
+	struct intel_ringbuffer *ring = req->ringbuf;
 	int ret;
 
 	ret = intel_ring_begin(req, 2);
@@ -2428,7 +2429,7 @@ gen6_ring_dispatch_execbuffer(struct drm_i915_gem_request *req,
 			      u64 offset, u32 len,
 			      unsigned dispatch_flags)
 {
-	struct intel_engine_cs *ring = req->ring;
+	struct intel_ringbuffer *ring = req->ringbuf;
 	int ret;
 
 	ret = intel_ring_begin(req, 2);
@@ -2451,7 +2452,7 @@ gen6_ring_dispatch_execbuffer(struct drm_i915_gem_request *req,
 static int gen6_ring_flush(struct drm_i915_gem_request *req,
 			   u32 invalidate, u32 flush)
 {
-	struct intel_engine_cs *ring = req->ring;
+	struct intel_ringbuffer *ring = req->ringbuf;
 	uint32_t cmd;
 	int ret;
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 7669a8d30f27..9c19a6ca8e7d 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -468,29 +468,20 @@ int intel_ring_alloc_request_extras(struct drm_i915_gem_request *request);
 
 int __must_check intel_ring_begin(struct drm_i915_gem_request *req, int n);
 int __must_check intel_ring_cacheline_align(struct drm_i915_gem_request *req);
-static inline void intel_ringbuffer_emit(struct intel_ringbuffer *rb,
-					 u32 data)
+static inline void intel_ring_emit(struct intel_ringbuffer *rb,
+				   u32 data)
 {
 	*(uint32_t *)(rb->virtual_start + rb->tail) = data;
 	rb->tail += 4;
 }
-static inline void intel_ringbuffer_advance(struct intel_ringbuffer *rb)
-{
-	rb->tail &= rb->size - 1;
-}
-static inline void intel_ring_emit(struct intel_engine_cs *ring,
-				   u32 data)
-{
-	intel_ringbuffer_emit(ring->buffer, data);
-}
-static inline void intel_ring_emit_reg(struct intel_engine_cs *ring,
+static inline void intel_ring_emit_reg(struct intel_ringbuffer *rb,
 				       i915_reg_t reg)
 {
-	intel_ring_emit(ring, i915_mmio_reg_offset(reg));
+	intel_ring_emit(rb, i915_mmio_reg_offset(reg));
 }
-static inline void intel_ring_advance(struct intel_engine_cs *ring)
+static inline void intel_ring_advance(struct intel_ringbuffer *rb)
 {
-	intel_ringbuffer_advance(ring->buffer);
+	rb->tail &= rb->size - 1;
 }
 int __intel_ring_space(int head, int tail, int size);
 void intel_ring_update_space(struct intel_ringbuffer *ringbuf);
-- 
2.7.0.rc3


* [PATCH 056/190] drm/i915: Unify intel_ring_begin()
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (53 preceding siblings ...)
  2016-01-11  9:17 ` [PATCH 055/190] drm/i915: Unify intel_logical_ring_emit and intel_ring_emit Chris Wilson
@ 2016-01-11  9:17 ` Chris Wilson
  2016-01-11  9:17 ` [PATCH 057/190] drm/i915: Remove the identical implementations of request space reservation Chris Wilson
                   ` (31 subsequent siblings)
  86 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:17 UTC (permalink / raw)
  To: intel-gfx

Combine the near-identical implementations of intel_logical_ring_begin()
and intel_ring_begin() - the only difference is that the logical wait
has to check for a matching ring (the legacy path can simply assume it).
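
After this unification both back ends share a single entry point. A
minimal usage sketch (emit_two_noops is a hypothetical helper, not part
of this patch; the calls are the ones this series converges on):

	static int emit_two_noops(struct drm_i915_gem_request *req)
	{
		int ret;

		/* Reserve space for 2 dwords; this now takes the same
		 * path for execlists and legacy submission. */
		ret = intel_ring_begin(req, 2);
		if (ret)
			return ret;

		intel_ring_emit(req->ringbuf, MI_NOOP);
		intel_ring_emit(req->ringbuf, MI_NOOP);
		intel_ring_advance(req->ringbuf);
		return 0;
	}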

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_lrc.c        | 141 ++------------------------------
 drivers/gpu/drm/i915/intel_lrc.h        |   1 -
 drivers/gpu/drm/i915/intel_mocs.c       |  12 +--
 drivers/gpu/drm/i915/intel_ringbuffer.c | 111 +++++++++++++------------
 4 files changed, 69 insertions(+), 196 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index dc4fc9d8612c..3d14b69632e8 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -698,48 +698,6 @@ int intel_logical_ring_alloc_request_extras(struct drm_i915_gem_request *request
 	return 0;
 }
 
-static int logical_ring_wait_for_space(struct drm_i915_gem_request *req,
-				       int bytes)
-{
-	struct intel_ringbuffer *ringbuf = req->ringbuf;
-	struct intel_engine_cs *ring = req->ring;
-	struct drm_i915_gem_request *target;
-	unsigned space;
-	int ret;
-
-	if (intel_ring_space(ringbuf) >= bytes)
-		return 0;
-
-	/* The whole point of reserving space is to not wait! */
-	WARN_ON(ringbuf->reserved_in_use);
-
-	list_for_each_entry(target, &ring->request_list, list) {
-		/*
-		 * The request queue is per-engine, so can contain requests
-		 * from multiple ringbuffers. Here, we must ignore any that
-		 * aren't from the ringbuffer we're considering.
-		 */
-		if (target->ringbuf != ringbuf)
-			continue;
-
-		/* Would completion of this request free enough space? */
-		space = __intel_ring_space(target->postfix, ringbuf->tail,
-					   ringbuf->size);
-		if (space >= bytes)
-			break;
-	}
-
-	if (WARN_ON(&target->list == &ring->request_list))
-		return -ENOSPC;
-
-	ret = i915_wait_request(target);
-	if (ret)
-		return ret;
-
-	ringbuf->space = space;
-	return 0;
-}
-
 /*
  * intel_logical_ring_advance_and_submit() - advance the tail and submit the workload
  * @request: Request to advance the logical ringbuffer of.
@@ -763,89 +721,6 @@ intel_logical_ring_advance_and_submit(struct drm_i915_gem_request *request)
 		execlists_context_queue(request);
 }
 
-static void __wrap_ring_buffer(struct intel_ringbuffer *ringbuf)
-{
-	int rem = ringbuf->size - ringbuf->tail;
-	memset(ringbuf->virtual_start + ringbuf->tail, 0, rem);
-
-	ringbuf->tail = 0;
-	intel_ring_update_space(ringbuf);
-}
-
-static int logical_ring_prepare(struct drm_i915_gem_request *req, int bytes)
-{
-	struct intel_ringbuffer *ringbuf = req->ringbuf;
-	int remain_usable = ringbuf->effective_size - ringbuf->tail;
-	int remain_actual = ringbuf->size - ringbuf->tail;
-	int ret, total_bytes, wait_bytes = 0;
-	bool need_wrap = false;
-
-	if (ringbuf->reserved_in_use)
-		total_bytes = bytes;
-	else
-		total_bytes = bytes + ringbuf->reserved_size;
-
-	if (unlikely(bytes > remain_usable)) {
-		/*
-		 * Not enough space for the basic request. So need to flush
-		 * out the remainder and then wait for base + reserved.
-		 */
-		wait_bytes = remain_actual + total_bytes;
-		need_wrap = true;
-	} else {
-		if (unlikely(total_bytes > remain_usable)) {
-			/*
-			 * The base request will fit but the reserved space
-			 * falls off the end. So only need to wait for the
-			 * reserved size after flushing out the remainder.
-			 */
-			wait_bytes = remain_actual + ringbuf->reserved_size;
-			need_wrap = true;
-		} else if (total_bytes > ringbuf->space) {
-			/* No wrapping required, just waiting. */
-			wait_bytes = total_bytes;
-		}
-	}
-
-	if (wait_bytes) {
-		ret = logical_ring_wait_for_space(req, wait_bytes);
-		if (unlikely(ret))
-			return ret;
-
-		if (need_wrap)
-			__wrap_ring_buffer(ringbuf);
-	}
-
-	return 0;
-}
-
-/**
- * intel_logical_ring_begin() - prepare the logical ringbuffer to accept some commands
- *
- * @req: The request to start some new work for
- * @num_dwords: number of DWORDs that we plan to write to the ringbuffer.
- *
- * The ringbuffer might not be ready to accept the commands right away (maybe it needs to
- * be wrapped, or wait a bit for the tail to be updated). This function takes care of that
- * and also preallocates a request (every workload submission is still mediated through
- * requests, same as it did with legacy ringbuffer submission).
- *
- * Return: non-zero if the ringbuffer is not ready to be written to.
- */
-int intel_logical_ring_begin(struct drm_i915_gem_request *req, int num_dwords)
-{
-	int ret;
-
-	WARN_ON(req == NULL);
-
-	ret = logical_ring_prepare(req, num_dwords * sizeof(uint32_t));
-	if (ret)
-		return ret;
-
-	req->ringbuf->space -= num_dwords * sizeof(uint32_t);
-	return 0;
-}
-
 int intel_logical_ring_reserve_space(struct drm_i915_gem_request *request)
 {
 	/*
@@ -858,7 +733,7 @@ int intel_logical_ring_reserve_space(struct drm_i915_gem_request *request)
 	 */
 	intel_ring_reserved_space_reserve(request->ringbuf, MIN_SPACE_FOR_ADD_REQUEST);
 
-	return intel_logical_ring_begin(request, 0);
+	return intel_ring_begin(request, 0);
 }
 
 /**
@@ -928,7 +803,7 @@ int intel_execlists_submission(struct i915_execbuffer_params *params,
 
 	if (ring == &dev_priv->ring[RCS] &&
 	    instp_mode != dev_priv->relative_constants_mode) {
-		ret = intel_logical_ring_begin(params->request, 4);
+		ret = intel_ring_begin(params->request, 4);
 		if (ret)
 			return ret;
 
@@ -1104,7 +979,7 @@ static int intel_logical_ring_workarounds_emit(struct drm_i915_gem_request *req)
 	if (ret)
 		return ret;
 
-	ret = intel_logical_ring_begin(req, w->count * 2 + 2);
+	ret = intel_ring_begin(req, w->count * 2 + 2);
 	if (ret)
 		return ret;
 
@@ -1566,7 +1441,7 @@ static int intel_logical_ring_emit_pdps(struct drm_i915_gem_request *req)
 	const int num_lri_cmds = GEN8_LEGACY_PDPES * 2;
 	int i, ret;
 
-	ret = intel_logical_ring_begin(req, num_lri_cmds * 2 + 2);
+	ret = intel_ring_begin(req, num_lri_cmds * 2 + 2);
 	if (ret)
 		return ret;
 
@@ -1611,7 +1486,7 @@ static int gen8_emit_bb_start(struct drm_i915_gem_request *req,
 		req->ctx->ppgtt->pd_dirty_rings &= ~intel_ring_flag(req->ring);
 	}
 
-	ret = intel_logical_ring_begin(req, 4);
+	ret = intel_ring_begin(req, 4);
 	if (ret)
 		return ret;
 
@@ -1655,7 +1530,7 @@ static int gen8_emit_flush(struct drm_i915_gem_request *request,
 	uint32_t cmd;
 	int ret;
 
-	ret = intel_logical_ring_begin(request, 4);
+	ret = intel_ring_begin(request, 4);
 	if (ret)
 		return ret;
 
@@ -1722,7 +1597,7 @@ static int gen8_emit_flush_render(struct drm_i915_gem_request *request,
 			vf_flush_wa = true;
 	}
 
-	ret = intel_logical_ring_begin(request, vf_flush_wa ? 12 : 6);
+	ret = intel_ring_begin(request, vf_flush_wa ? 12 : 6);
 	if (ret)
 		return ret;
 
@@ -1779,7 +1654,7 @@ static int gen8_emit_request(struct drm_i915_gem_request *request)
 	 * used as a workaround for not being allowed to do lite
 	 * restore with HEAD==TAIL (WaIdleLiteRestore).
 	 */
-	ret = intel_logical_ring_begin(request, 8);
+	ret = intel_ring_begin(request, 8);
 	if (ret)
 		return ret;
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 9d4aa699e593..32401e11cebe 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -60,7 +60,6 @@ int intel_logical_ring_reserve_space(struct drm_i915_gem_request *request);
 void intel_logical_ring_stop(struct intel_engine_cs *ring);
 void intel_logical_ring_cleanup(struct intel_engine_cs *ring);
 int intel_logical_rings_init(struct drm_device *dev);
-int intel_logical_ring_begin(struct drm_i915_gem_request *req, int num_dwords);
 
 int logical_ring_flush_all_caches(struct drm_i915_gem_request *req);
 
diff --git a/drivers/gpu/drm/i915/intel_mocs.c b/drivers/gpu/drm/i915/intel_mocs.c
index d8a7fdc7baeb..5d4f6f3b67cd 100644
--- a/drivers/gpu/drm/i915/intel_mocs.c
+++ b/drivers/gpu/drm/i915/intel_mocs.c
@@ -200,11 +200,9 @@ static int emit_mocs_control_table(struct drm_i915_gem_request *req,
 	if (WARN_ON(table->size > GEN9_NUM_MOCS_ENTRIES))
 		return -ENODEV;
 
-	ret = intel_logical_ring_begin(req, 2 + 2 * GEN9_NUM_MOCS_ENTRIES);
-	if (ret) {
-		DRM_DEBUG("intel_logical_ring_begin failed %d\n", ret);
+	ret = intel_ring_begin(req, 2 + 2 * GEN9_NUM_MOCS_ENTRIES);
+	if (ret)
 		return ret;
-	}
 
 	intel_ring_emit(ringbuf, MI_LOAD_REGISTER_IMM(GEN9_NUM_MOCS_ENTRIES));
 
@@ -257,11 +255,9 @@ static int emit_mocs_l3cc_table(struct drm_i915_gem_request *req,
 	if (WARN_ON(table->size > GEN9_NUM_MOCS_ENTRIES))
 		return -ENODEV;
 
-	ret = intel_logical_ring_begin(req, 2 + GEN9_NUM_MOCS_ENTRIES);
-	if (ret) {
-		DRM_DEBUG("intel_logical_ring_begin failed %d\n", ret);
+	ret = intel_ring_begin(req, 2 + GEN9_NUM_MOCS_ENTRIES);
+	if (ret)
 		return ret;
-	}
 
 	intel_ring_emit(ringbuf,
 			MI_LOAD_REGISTER_IMM(GEN9_NUM_MOCS_ENTRIES / 2));
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 86c54584f64a..c694f602a0b8 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2062,46 +2062,6 @@ void intel_cleanup_ring_buffer(struct intel_engine_cs *ring)
 	ring->dev = NULL;
 }
 
-static int ring_wait_for_space(struct intel_engine_cs *ring, int n)
-{
-	struct intel_ringbuffer *ringbuf = ring->buffer;
-	struct drm_i915_gem_request *request;
-	unsigned space;
-	int ret;
-
-	if (intel_ring_space(ringbuf) >= n)
-		return 0;
-
-	/* The whole point of reserving space is to not wait! */
-	WARN_ON(ringbuf->reserved_in_use);
-
-	list_for_each_entry(request, &ring->request_list, list) {
-		space = __intel_ring_space(request->postfix, ringbuf->tail,
-					   ringbuf->size);
-		if (space >= n)
-			break;
-	}
-
-	if (WARN_ON(&request->list == &ring->request_list))
-		return -ENOSPC;
-
-	ret = i915_wait_request(request);
-	if (ret)
-		return ret;
-
-	ringbuf->space = space;
-	return 0;
-}
-
-static void __wrap_ring_buffer(struct intel_ringbuffer *ringbuf)
-{
-	int rem = ringbuf->size - ringbuf->tail;
-	memset(ringbuf->virtual_start + ringbuf->tail, 0, rem);
-
-	ringbuf->tail = 0;
-	intel_ring_update_space(ringbuf);
-}
-
 int intel_ring_idle(struct intel_engine_cs *ring)
 {
 	struct drm_i915_gem_request *req;
@@ -2188,9 +2148,59 @@ void intel_ring_reserved_space_end(struct intel_ringbuffer *ringbuf)
 	ringbuf->reserved_in_use = false;
 }
 
-static int __intel_ring_prepare(struct intel_engine_cs *ring, int bytes)
+static int wait_for_space(struct drm_i915_gem_request *req, int bytes)
 {
-	struct intel_ringbuffer *ringbuf = ring->buffer;
+	struct intel_ringbuffer *ringbuf = req->ringbuf;
+	struct intel_engine_cs *ring = req->ring;
+	struct drm_i915_gem_request *target;
+	unsigned space;
+	int ret;
+
+	if (intel_ring_space(ringbuf) >= bytes)
+		return 0;
+
+	/* The whole point of reserving space is to not wait! */
+	WARN_ON(ringbuf->reserved_in_use);
+
+	list_for_each_entry(target, &ring->request_list, list) {
+		/*
+		 * The request queue is per-engine, so can contain requests
+		 * from multiple ringbuffers. Here, we must ignore any that
+		 * aren't from the ringbuffer we're considering.
+		 */
+		if (target->ringbuf != ringbuf)
+			continue;
+
+		/* Would completion of this request free enough space? */
+		space = __intel_ring_space(target->postfix, ringbuf->tail,
+					   ringbuf->size);
+		if (space >= bytes)
+			break;
+	}
+
+	if (WARN_ON(&target->list == &ring->request_list))
+		return -ENOSPC;
+
+	ret = i915_wait_request(target);
+	if (ret)
+		return ret;
+
+	ringbuf->space = space;
+	return 0;
+}
+
+static void ring_wrap(struct intel_ringbuffer *ringbuf)
+{
+	int rem = ringbuf->size - ringbuf->tail;
+	memset(ringbuf->virtual_start + ringbuf->tail, 0, rem);
+
+	ringbuf->tail = 0;
+	intel_ring_update_space(ringbuf);
+}
+
+static int ring_prepare(struct drm_i915_gem_request *req, int bytes)
+{
+	struct intel_ringbuffer *ringbuf = req->ringbuf;
 	int remain_usable = ringbuf->effective_size - ringbuf->tail;
 	int remain_actual = ringbuf->size - ringbuf->tail;
 	int ret, total_bytes, wait_bytes = 0;
@@ -2224,33 +2234,26 @@ static int __intel_ring_prepare(struct intel_engine_cs *ring, int bytes)
 	}
 
 	if (wait_bytes) {
-		ret = ring_wait_for_space(ring, wait_bytes);
+		ret = wait_for_space(req, wait_bytes);
 		if (unlikely(ret))
 			return ret;
 
 		if (need_wrap)
-			__wrap_ring_buffer(ringbuf);
+			ring_wrap(ringbuf);
 	}
 
 	return 0;
 }
 
-int intel_ring_begin(struct drm_i915_gem_request *req,
-		     int num_dwords)
+int intel_ring_begin(struct drm_i915_gem_request *req, int num_dwords)
 {
-	struct intel_engine_cs *ring;
-	struct drm_i915_private *dev_priv;
 	int ret;
 
-	WARN_ON(req == NULL);
-	ring = req->ring;
-	dev_priv = req->i915;
-
-	ret = __intel_ring_prepare(ring, num_dwords * sizeof(uint32_t));
+	ret = ring_prepare(req, num_dwords * sizeof(uint32_t));
 	if (ret)
 		return ret;
 
-	ring->buffer->space -= num_dwords * sizeof(uint32_t);
+	req->ringbuf->space -= num_dwords * sizeof(uint32_t);
 	return 0;
 }
 
-- 
2.7.0.rc3


* [PATCH 057/190] drm/i915: Remove the identical implementations of request space reservation
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (54 preceding siblings ...)
  2016-01-11  9:17 ` [PATCH 056/190] drm/i915: Unify intel_ring_begin() Chris Wilson
@ 2016-01-11  9:17 ` Chris Wilson
  2016-01-11  9:17 ` [PATCH 058/190] drm/i915: Rename request->ring to request->engine Chris Wilson
                   ` (30 subsequent siblings)
  86 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:17 UTC (permalink / raw)
  To: intel-gfx

Now that we share intel_ring_begin(), reserving space for the tail of
the request is identical between the legacy and execlists paths, so the
duplicated implementation can be removed.
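
In sketch form, the now-common reservation step (mirroring the
i915_gem_request_alloc() hunk below; reserve_request_tail is a
hypothetical name, not code from this patch):

	static int reserve_request_tail(struct drm_i915_gem_request *req)
	{
		/* Note how much space the request tail will need... */
		intel_ring_reserved_space_reserve(req->ringbuf,
						  MIN_SPACE_FOR_ADD_REQUEST);

		/* ...and wait until that much space is actually free. */
		return intel_ring_begin(req, 0);
	}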

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_request.c |  7 +++----
 drivers/gpu/drm/i915/intel_lrc.c        | 15 ---------------
 drivers/gpu/drm/i915/intel_lrc.h        |  1 -
 drivers/gpu/drm/i915/intel_ringbuffer.c | 15 ---------------
 drivers/gpu/drm/i915/intel_ringbuffer.h |  3 ---
 5 files changed, 3 insertions(+), 38 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 619a9b063d9c..85067069995e 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -255,10 +255,9 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 	 * to be redone if the request is not actually submitted straight
 	 * away, e.g. because a GPU scheduler has deferred it.
 	 */
-	if (i915.enable_execlists)
-		ret = intel_logical_ring_reserve_space(req);
-	else
-		ret = intel_ring_reserve_space(req);
+	intel_ring_reserved_space_reserve(req->ringbuf,
+					  MIN_SPACE_FOR_ADD_REQUEST);
+	ret = intel_ring_begin(req, 0);
 	if (ret) {
 		/*
 		 * At this point, the request is fully allocated even if not
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 3d14b69632e8..4f1944929330 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -721,21 +721,6 @@ intel_logical_ring_advance_and_submit(struct drm_i915_gem_request *request)
 		execlists_context_queue(request);
 }
 
-int intel_logical_ring_reserve_space(struct drm_i915_gem_request *request)
-{
-	/*
-	 * The first call merely notes the reserve request and is common for
-	 * all back ends. The subsequent localised _begin() call actually
-	 * ensures that the reservation is available. Without the begin, if
-	 * the request creator immediately submitted the request without
-	 * adding any commands to it then there might not actually be
-	 * sufficient room for the submission commands.
-	 */
-	intel_ring_reserved_space_reserve(request->ringbuf, MIN_SPACE_FOR_ADD_REQUEST);
-
-	return intel_ring_begin(request, 0);
-}
-
 /**
  * execlists_submission() - submit a batchbuffer for execution, Execlists style
  * @dev: DRM device.
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 32401e11cebe..c88988a41898 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -56,7 +56,6 @@
 
 /* Logical Rings */
 int intel_logical_ring_alloc_request_extras(struct drm_i915_gem_request *request);
-int intel_logical_ring_reserve_space(struct drm_i915_gem_request *request);
 void intel_logical_ring_stop(struct intel_engine_cs *ring);
 void intel_logical_ring_cleanup(struct intel_engine_cs *ring);
 int intel_logical_rings_init(struct drm_device *dev);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index c694f602a0b8..db5c407f7720 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2086,21 +2086,6 @@ int intel_ring_alloc_request_extras(struct drm_i915_gem_request *request)
 	return 0;
 }
 
-int intel_ring_reserve_space(struct drm_i915_gem_request *request)
-{
-	/*
-	 * The first call merely notes the reserve request and is common for
-	 * all back ends. The subsequent localised _begin() call actually
-	 * ensures that the reservation is available. Without the begin, if
-	 * the request creator immediately submitted the request without
-	 * adding any commands to it then there might not actually be
-	 * sufficient room for the submission commands.
-	 */
-	intel_ring_reserved_space_reserve(request->ringbuf, MIN_SPACE_FOR_ADD_REQUEST);
-
-	return intel_ring_begin(request, 0);
-}
-
 void intel_ring_reserved_space_reserve(struct intel_ringbuffer *ringbuf, int size)
 {
 	WARN_ON(ringbuf->reserved_size);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 9c19a6ca8e7d..bc6ceb54b1f3 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -536,9 +536,6 @@ void intel_ring_reserved_space_use(struct intel_ringbuffer *ringbuf);
 /* Finish with the reserved space - for use by i915_add_request() only. */
 void intel_ring_reserved_space_end(struct intel_ringbuffer *ringbuf);
 
-/* Legacy ringbuffer specific portion of reservation code: */
-int intel_ring_reserve_space(struct drm_i915_gem_request *request);
-
 /* intel_breadcrumbs.c -- user interrupt bottom-half for waiters */
 struct intel_wait {
 	struct rb_node node;
-- 
2.7.0.rc3


* [PATCH 058/190] drm/i915: Rename request->ring to request->engine
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (55 preceding siblings ...)
  2016-01-11  9:17 ` [PATCH 057/190] drm/i915: Remove the identical implementations of request space reservation Chris Wilson
@ 2016-01-11  9:17 ` Chris Wilson
  2016-01-28 11:45   ` Tvrtko Ursulin
  2016-01-11  9:17 ` [PATCH 059/190] drm/i915: Rename request->ringbuf to request->ring Chris Wilson
                   ` (29 subsequent siblings)
  86 siblings, 1 reply; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:17 UTC (permalink / raw)
  To: intel-gfx

In order to disambiguate between the pointer to the intel_engine_cs
(called ring) and the intel_ringbuffer (called ringbuf), rename the
former: s/ring/engine/.
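
The transformation is mechanical; a representative before/after pair,
following the pattern repeated throughout the diff:

	/* before */
	struct intel_engine_cs *ring = req->ring;

	/* after */
	struct intel_engine_cs *engine = req->engine;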

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c          |  11 +--
 drivers/gpu/drm/i915/i915_drv.h              |   2 +-
 drivers/gpu/drm/i915/i915_gem.c              |  32 +++----
 drivers/gpu/drm/i915/i915_gem_context.c      |  70 +++++++-------
 drivers/gpu/drm/i915/i915_gem_execbuffer.c   |   8 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c          |  47 +++++-----
 drivers/gpu/drm/i915/i915_gem_render_state.c |  18 ++--
 drivers/gpu/drm/i915/i915_gem_request.c      |  53 ++++-------
 drivers/gpu/drm/i915/i915_gem_request.h      |  10 +-
 drivers/gpu/drm/i915/i915_gpu_error.c        |   3 +-
 drivers/gpu/drm/i915/i915_guc_submission.c   |   8 +-
 drivers/gpu/drm/i915/i915_trace.h            |  32 +++----
 drivers/gpu/drm/i915/intel_breadcrumbs.c     |   2 +-
 drivers/gpu/drm/i915/intel_display.c         |  10 +-
 drivers/gpu/drm/i915/intel_lrc.c             | 134 +++++++++++++--------------
 drivers/gpu/drm/i915/intel_mocs.c            |  13 ++-
 drivers/gpu/drm/i915/intel_ringbuffer.c      |  62 ++++++-------
 17 files changed, 240 insertions(+), 275 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 387ae77d3c29..018076c89247 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -185,8 +185,7 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 		seq_printf(m, " (%s mappable)", s);
 	}
 	if (obj->last_write_req != NULL)
-		seq_printf(m, " (%s)",
-			   i915_gem_request_get_ring(obj->last_write_req)->name);
+		seq_printf(m, " (%s)", obj->last_write_req->engine->name);
 	if (obj->frontbuffer_bits)
 		seq_printf(m, " (frontbuffer: 0x%03x)", obj->frontbuffer_bits);
 }
@@ -593,14 +592,14 @@ static int i915_gem_pageflip_info(struct seq_file *m, void *data)
 					   pipe, plane);
 			}
 			if (work->flip_queued_req) {
-				struct intel_engine_cs *ring =
-					i915_gem_request_get_ring(work->flip_queued_req);
+				struct intel_engine_cs *engine =
+					work->flip_queued_req->engine;
 
 				seq_printf(m, "Flip queued on %s at seqno %x, next seqno %x [current breadcrumb %x], completed? %d\n",
-					   ring->name,
+					   engine->name,
 					   i915_gem_request_get_seqno(work->flip_queued_req),
 					   dev_priv->next_seqno,
-					   intel_ring_get_seqno(ring),
+					   intel_ring_get_seqno(engine),
 					   i915_gem_request_completed(work->flip_queued_req));
 			} else
 				seq_printf(m, "Flip not associated with any ring\n");
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 58e9e5e50769..baede4517c70 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3410,7 +3410,7 @@ wait_remaining_ms_from_jiffies(unsigned long timestamp_jiffies, int to_wait_ms)
 }
 static inline bool __i915_request_irq_complete(struct drm_i915_gem_request *req)
 {
-	struct intel_engine_cs *engine = req->ring;
+	struct intel_engine_cs *engine = req->engine;
 
 	/* Before we do the heavier coherent read of the seqno,
 	 * check the value (hopefully) in the CPU cacheline.
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 247731672cb1..6622c9bb3af8 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1122,7 +1122,7 @@ i915_gem_object_wait_rendering(struct drm_i915_gem_object *obj,
 			if (ret)
 				return ret;
 
-			i = obj->last_write_req->ring->id;
+			i = obj->last_write_req->engine->id;
 			if (obj->last_read_req[i] == obj->last_write_req)
 				i915_gem_object_retire__read(obj, i);
 			else
@@ -1149,7 +1149,7 @@ static void
 i915_gem_object_retire_request(struct drm_i915_gem_object *obj,
 			       struct drm_i915_gem_request *req)
 {
-	int ring = req->ring->id;
+	int ring = req->engine->id;
 
 	if (obj->last_read_req[ring] == req)
 		i915_gem_object_retire__read(obj, ring);
@@ -2062,17 +2062,15 @@ void i915_vma_move_to_active(struct i915_vma *vma,
 			     struct drm_i915_gem_request *req)
 {
 	struct drm_i915_gem_object *obj = vma->obj;
-	struct intel_engine_cs *ring;
-
-	ring = i915_gem_request_get_ring(req);
+	struct intel_engine_cs *engine = req->engine;
 
 	/* Add a reference if we're newly entering the active list. */
 	if (obj->active == 0)
 		drm_gem_object_reference(&obj->base);
-	obj->active |= intel_ring_flag(ring);
+	obj->active |= intel_ring_flag(engine);
 
-	list_move_tail(&obj->ring_list[ring->id], &ring->active_list);
-	i915_gem_request_assign(&obj->last_read_req[ring->id], req);
+	list_move_tail(&obj->ring_list[engine->id], &engine->active_list);
+	i915_gem_request_assign(&obj->last_read_req[engine->id], req);
 
 	list_move_tail(&vma->mm_list, &vma->vm->active_list);
 }
@@ -2081,7 +2079,7 @@ static void
 i915_gem_object_retire__write(struct drm_i915_gem_object *obj)
 {
 	GEM_BUG_ON(obj->last_write_req == NULL);
-	GEM_BUG_ON(!(obj->active & intel_ring_flag(obj->last_write_req->ring)));
+	GEM_BUG_ON(!(obj->active & intel_ring_flag(obj->last_write_req->engine)));
 
 	i915_gem_request_assign(&obj->last_write_req, NULL);
 	intel_fb_obj_flush(obj, true, ORIGIN_CS);
@@ -2098,7 +2096,7 @@ i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
 	list_del_init(&obj->ring_list[ring]);
 	i915_gem_request_assign(&obj->last_read_req[ring], NULL);
 
-	if (obj->last_write_req && obj->last_write_req->ring->id == ring)
+	if (obj->last_write_req && obj->last_write_req->engine->id == ring)
 		i915_gem_object_retire__write(obj);
 
 	obj->active &= ~(1 << ring);
@@ -2560,7 +2558,7 @@ __i915_gem_object_sync(struct drm_i915_gem_object *obj,
 	struct intel_engine_cs *from;
 	int ret;
 
-	from = i915_gem_request_get_ring(from_req);
+	from = from_req->engine;
 	if (to == from)
 		return 0;
 
@@ -3737,7 +3735,7 @@ i915_gem_busy_ioctl(struct drm_device *dev, void *data,
 	BUILD_BUG_ON(I915_NUM_RINGS > 16);
 	args->busy = obj->active << 16;
 	if (obj->last_write_req)
-		args->busy |= obj->last_write_req->ring->id;
+		args->busy |= obj->last_write_req->engine->id;
 
 unref:
 	drm_gem_object_unreference(&obj->base);
@@ -4068,7 +4066,6 @@ err:
 
 int i915_gem_l3_remap(struct drm_i915_gem_request *req, int slice)
 {
-	struct intel_ringbuffer *ring = req->ringbuf;
 	struct drm_i915_private *dev_priv = req->i915;
 	u32 *remap_info = dev_priv->l3_parity.remap_info[slice];
 	int i, ret;
@@ -4086,12 +4083,11 @@ int i915_gem_l3_remap(struct drm_i915_gem_request *req, int slice)
 	 * at initialization time.
 	 */
 	for (i = 0; i < GEN7_L3LOG_SIZE / 4; i++) {
-		intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
-		intel_ring_emit_reg(ring, GEN7_L3LOG(slice, i));
-		intel_ring_emit(ring, remap_info[i]);
+		intel_ring_emit(req->ringbuf, MI_LOAD_REGISTER_IMM(1));
+		intel_ring_emit_reg(req->ringbuf, GEN7_L3LOG(slice, i));
+		intel_ring_emit(req->ringbuf, remap_info[i]);
 	}
-
-	intel_ring_advance(ring);
+	intel_ring_advance(req->ringbuf);
 
 	return ret;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index d58de7e084dc..dece033cf604 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -450,14 +450,14 @@ void i915_gem_context_fini(struct drm_device *dev)
 
 int i915_gem_context_enable(struct drm_i915_gem_request *req)
 {
-	struct intel_engine_cs *ring = req->ring;
+	struct intel_engine_cs *engine = req->engine;
 	int ret;
 
 	if (i915.enable_execlists) {
-		if (ring->init_context == NULL)
+		if (engine->init_context == NULL)
 			return 0;
 
-		ret = ring->init_context(req);
+		ret = engine->init_context(req);
 	} else
 		ret = i915_switch_context(req);
 
@@ -534,7 +534,7 @@ mi_set_context(struct drm_i915_gem_request *req, u32 hw_flags)
 	 * itlb_before_ctx_switch.
 	 */
 	if (IS_GEN6(req->i915)) {
-		ret = req->ring->flush(req, I915_GEM_GPU_DOMAINS, 0);
+		ret = req->engine->flush(req, I915_GEM_GPU_DOMAINS, 0);
 		if (ret)
 			return ret;
 	}
@@ -562,7 +562,7 @@ mi_set_context(struct drm_i915_gem_request *req, u32 hw_flags)
 
 			intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(num_rings));
 			for_each_ring(signaller, req->i915, i) {
-				if (signaller == req->ring)
+				if (signaller == req->engine)
 					continue;
 
 				intel_ring_emit_reg(ring, RING_PSMI_CTL(signaller->mmio_base));
@@ -587,7 +587,7 @@ mi_set_context(struct drm_i915_gem_request *req, u32 hw_flags)
 
 			intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(num_rings));
 			for_each_ring(signaller, req->i915, i) {
-				if (signaller == req->ring)
+				if (signaller == req->engine)
 					continue;
 
 				intel_ring_emit_reg(ring, RING_PSMI_CTL(signaller->mmio_base));
@@ -657,24 +657,18 @@ needs_pd_load_post(struct intel_engine_cs *ring, struct intel_context *to,
 static int do_switch(struct drm_i915_gem_request *req)
 {
 	struct intel_context *to = req->ctx;
-	struct intel_engine_cs *ring = req->ring;
-	struct drm_i915_private *dev_priv = ring->dev->dev_private;
-	struct intel_context *from = ring->last_context;
+	struct intel_engine_cs *engine = req->engine;
+	struct intel_context *from = engine->last_context;
 	u32 hw_flags = 0;
 	int ret, i;
 
-	if (from != NULL && ring == &dev_priv->ring[RCS]) {
-		BUG_ON(from->legacy_hw_ctx.rcs_state == NULL);
-		BUG_ON(!i915_gem_obj_is_pinned(from->legacy_hw_ctx.rcs_state));
-	}
-
-	if (should_skip_switch(ring, from, to))
+	if (should_skip_switch(engine, from, to))
 		return 0;
 
 	/* Trying to pin first makes error handling easier. */
-	if (ring == &dev_priv->ring[RCS]) {
+	if (engine->id == RCS) {
 		ret = i915_gem_obj_ggtt_pin(to->legacy_hw_ctx.rcs_state,
-					    get_context_alignment(ring->dev), 0);
+					    get_context_alignment(engine->dev), 0);
 		if (ret)
 			return ret;
 	}
@@ -684,23 +678,23 @@ static int do_switch(struct drm_i915_gem_request *req)
 	 * evict_everything - as a last ditch gtt defrag effort that also
 	 * switches to the default context. Hence we need to reload from here.
 	 */
-	from = ring->last_context;
+	from = engine->last_context;
 
-	if (needs_pd_load_pre(ring, to)) {
+	if (needs_pd_load_pre(engine, to)) {
 		/* Older GENs and non render rings still want the load first,
 		 * "PP_DCLV followed by PP_DIR_BASE register through Load
 		 * Register Immediate commands in Ring Buffer before submitting
 		 * a context."*/
-		trace_switch_mm(ring, to);
+		trace_switch_mm(engine, to);
 		ret = to->ppgtt->switch_mm(to->ppgtt, req);
 		if (ret)
 			goto unpin_out;
 
 		/* Doing a PD load always reloads the page dirs */
-		to->ppgtt->pd_dirty_rings &= ~intel_ring_flag(ring);
+		to->ppgtt->pd_dirty_rings &= ~intel_ring_flag(engine);
 	}
 
-	if (ring != &dev_priv->ring[RCS]) {
+	if (engine->id != RCS) {
 		if (from)
 			i915_gem_context_unreference(from);
 		goto done;
@@ -725,14 +719,14 @@ static int do_switch(struct drm_i915_gem_request *req)
 		 * space. This means we must enforce that a page table load
 		 * occur when this occurs. */
 	} else if (to->ppgtt &&
-		   (intel_ring_flag(ring) & to->ppgtt->pd_dirty_rings)) {
+		   (intel_ring_flag(engine) & to->ppgtt->pd_dirty_rings)) {
 		hw_flags |= MI_FORCE_RESTORE;
-		to->ppgtt->pd_dirty_rings &= ~intel_ring_flag(ring);
+		to->ppgtt->pd_dirty_rings &= ~intel_ring_flag(engine);
 	}
 
 	/* We should never emit switch_mm more than once */
-	WARN_ON(needs_pd_load_pre(ring, to) &&
-		needs_pd_load_post(ring, to, hw_flags));
+	WARN_ON(needs_pd_load_pre(engine, to) &&
+		needs_pd_load_post(engine, to, hw_flags));
 
 	ret = mi_set_context(req, hw_flags);
 	if (ret)
@@ -741,8 +735,8 @@ static int do_switch(struct drm_i915_gem_request *req)
 	/* GEN8 does *not* require an explicit reload if the PDPs have been
 	 * setup, and we do not wish to move them.
 	 */
-	if (needs_pd_load_post(ring, to, hw_flags)) {
-		trace_switch_mm(ring, to);
+	if (needs_pd_load_post(engine, to, hw_flags)) {
+		trace_switch_mm(engine, to);
 		ret = to->ppgtt->switch_mm(to->ppgtt, req);
 		/* The hardware context switch is emitted, but we haven't
 		 * actually changed the state - so it's probably safe to bail
@@ -768,8 +762,8 @@ static int do_switch(struct drm_i915_gem_request *req)
 	}
 
 	if (!to->legacy_hw_ctx.initialized) {
-		if (ring->init_context) {
-			ret = ring->init_context(req);
+		if (engine->init_context) {
+			ret = engine->init_context(req);
 			if (ret)
 				goto unpin_out;
 		}
@@ -801,12 +795,11 @@ static int do_switch(struct drm_i915_gem_request *req)
 
 done:
 	i915_gem_context_reference(to);
-	ring->last_context = to;
-
+	engine->last_context = to;
 	return 0;
 
 unpin_out:
-	if (ring->id == RCS)
+	if (engine->id == RCS)
 		i915_gem_object_ggtt_unpin(to->legacy_hw_ctx.rcs_state);
 	return ret;
 }
@@ -826,17 +819,18 @@ unpin_out:
  */
 int i915_switch_context(struct drm_i915_gem_request *req)
 {
-	struct intel_engine_cs *ring = req->ring;
 
 	WARN_ON(i915.enable_execlists);
 	WARN_ON(!mutex_is_locked(&req->i915->dev->struct_mutex));
 
 	if (req->ctx->legacy_hw_ctx.rcs_state == NULL) { /* We have the fake context */
-		if (req->ctx != ring->last_context) {
+		struct intel_engine_cs *engine = req->engine;
+
+		if (req->ctx != engine->last_context) {
 			i915_gem_context_reference(req->ctx);
-			if (ring->last_context)
-				i915_gem_context_unreference(ring->last_context);
-			ring->last_context = req->ctx;
+			if (engine->last_context)
+				i915_gem_context_unreference(engine->last_context);
+			engine->last_context = req->ctx;
 		}
 		return 0;
 	}
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 603a247ac333..e7df91f9a51f 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -942,7 +942,7 @@ static int
 i915_gem_execbuffer_move_to_gpu(struct drm_i915_gem_request *req,
 				struct list_head *vmas)
 {
-	const unsigned other_rings = ~intel_ring_flag(req->ring);
+	const unsigned other_rings = ~intel_ring_flag(req->engine);
 	struct i915_vma *vma;
 	uint32_t flush_domains = 0;
 	bool flush_chipset = false;
@@ -952,7 +952,7 @@ i915_gem_execbuffer_move_to_gpu(struct drm_i915_gem_request *req,
 		struct drm_i915_gem_object *obj = vma->obj;
 
 		if (obj->active & other_rings) {
-			ret = i915_gem_object_sync(obj, req->ring, &req);
+			ret = i915_gem_object_sync(obj, req->engine, &req);
 			if (ret)
 				return ret;
 		}
@@ -964,7 +964,7 @@ i915_gem_execbuffer_move_to_gpu(struct drm_i915_gem_request *req,
 	}
 
 	if (flush_chipset)
-		i915_gem_chipset_flush(req->ring->dev);
+		i915_gem_chipset_flush(req->engine->dev);
 
 	if (flush_domains & I915_GEM_DOMAIN_GTT)
 		wmb();
@@ -1151,7 +1151,7 @@ i915_reset_gen7_sol_offsets(struct drm_i915_gem_request *req)
 	struct intel_ringbuffer *ring = req->ringbuf;
 	int ret, i;
 
-	if (!IS_GEN7(req->i915) || req->ring->id != RCS) {
+	if (!IS_GEN7(req->i915) || req->engine->id != RCS) {
 		DRM_DEBUG("sol reset is gen7/rcs only\n");
 		return -EINVAL;
 	}
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 98841b05f764..cb7cb59d4c4a 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -666,10 +666,10 @@ static int gen8_write_pdp(struct drm_i915_gem_request *req,
 		return ret;
 
 	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
-	intel_ring_emit_reg(ring, GEN8_RING_PDP_UDW(req->ring, entry));
+	intel_ring_emit_reg(ring, GEN8_RING_PDP_UDW(req->engine, entry));
 	intel_ring_emit(ring, upper_32_bits(addr));
 	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
-	intel_ring_emit_reg(ring, GEN8_RING_PDP_LDW(req->ring, entry));
+	intel_ring_emit_reg(ring, GEN8_RING_PDP_LDW(req->engine, entry));
 	intel_ring_emit(ring, lower_32_bits(addr));
 	intel_ring_advance(ring);
 
@@ -1652,7 +1652,9 @@ static int hsw_mm_switch(struct i915_hw_ppgtt *ppgtt,
 	int ret;
 
 	/* NB: TLBs must be flushed and invalidated before a switch */
-	ret = req->ring->flush(req, I915_GEM_GPU_DOMAINS, I915_GEM_GPU_DOMAINS);
+	ret = req->engine->flush(req,
+				 I915_GEM_GPU_DOMAINS,
+				 I915_GEM_GPU_DOMAINS);
 	if (ret)
 		return ret;
 
@@ -1661,9 +1663,9 @@ static int hsw_mm_switch(struct i915_hw_ppgtt *ppgtt,
 		return ret;
 
 	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(2));
-	intel_ring_emit_reg(ring, RING_PP_DIR_DCLV(req->ring));
+	intel_ring_emit_reg(ring, RING_PP_DIR_DCLV(req->engine));
 	intel_ring_emit(ring, PP_DIR_DCLV_2G);
-	intel_ring_emit_reg(ring, RING_PP_DIR_BASE(req->ring));
+	intel_ring_emit_reg(ring, RING_PP_DIR_BASE(req->engine));
 	intel_ring_emit(ring, get_pd_offset(ppgtt));
 	intel_ring_emit(ring, MI_NOOP);
 	intel_ring_advance(ring);
@@ -1674,11 +1676,10 @@ static int hsw_mm_switch(struct i915_hw_ppgtt *ppgtt,
 static int vgpu_mm_switch(struct i915_hw_ppgtt *ppgtt,
 			  struct drm_i915_gem_request *req)
 {
-	struct intel_engine_cs *ring = req->ring;
-	struct drm_i915_private *dev_priv = to_i915(ppgtt->base.dev);
+	struct drm_i915_private *dev_priv = req->i915;
 
-	I915_WRITE(RING_PP_DIR_DCLV(ring), PP_DIR_DCLV_2G);
-	I915_WRITE(RING_PP_DIR_BASE(ring), get_pd_offset(ppgtt));
+	I915_WRITE(RING_PP_DIR_DCLV(req->engine), PP_DIR_DCLV_2G);
+	I915_WRITE(RING_PP_DIR_BASE(req->engine), get_pd_offset(ppgtt));
 	return 0;
 }
 
@@ -1689,7 +1690,9 @@ static int gen7_mm_switch(struct i915_hw_ppgtt *ppgtt,
 	int ret;
 
 	/* NB: TLBs must be flushed and invalidated before a switch */
-	ret = req->ring->flush(req, I915_GEM_GPU_DOMAINS, I915_GEM_GPU_DOMAINS);
+	ret = req->engine->flush(req,
+				 I915_GEM_GPU_DOMAINS,
+				 I915_GEM_GPU_DOMAINS);
 	if (ret)
 		return ret;
 
@@ -1698,16 +1701,18 @@ static int gen7_mm_switch(struct i915_hw_ppgtt *ppgtt,
 		return ret;
 
 	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(2));
-	intel_ring_emit_reg(ring, RING_PP_DIR_DCLV(req->ring));
+	intel_ring_emit_reg(ring, RING_PP_DIR_DCLV(req->engine));
 	intel_ring_emit(ring, PP_DIR_DCLV_2G);
-	intel_ring_emit_reg(ring, RING_PP_DIR_BASE(req->ring));
+	intel_ring_emit_reg(ring, RING_PP_DIR_BASE(req->engine));
 	intel_ring_emit(ring, get_pd_offset(ppgtt));
 	intel_ring_emit(ring, MI_NOOP);
 	intel_ring_advance(ring);
 
 	/* XXX: RCS is the only one to auto invalidate the TLBs? */
-	if (req->ring->id != RCS) {
-		ret = req->ring->flush(req, I915_GEM_GPU_DOMAINS, I915_GEM_GPU_DOMAINS);
+	if (req->engine->id != RCS) {
+		ret = req->engine->flush(req,
+					 I915_GEM_GPU_DOMAINS,
+					 I915_GEM_GPU_DOMAINS);
 		if (ret)
 			return ret;
 	}
@@ -1718,15 +1723,12 @@ static int gen7_mm_switch(struct i915_hw_ppgtt *ppgtt,
 static int gen6_mm_switch(struct i915_hw_ppgtt *ppgtt,
 			  struct drm_i915_gem_request *req)
 {
-	struct intel_engine_cs *ring = req->ring;
-	struct drm_device *dev = ppgtt->base.dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
-
+	struct drm_i915_private *dev_priv = req->i915;
 
-	I915_WRITE(RING_PP_DIR_DCLV(ring), PP_DIR_DCLV_2G);
-	I915_WRITE(RING_PP_DIR_BASE(ring), get_pd_offset(ppgtt));
+	I915_WRITE(RING_PP_DIR_DCLV(req->engine), PP_DIR_DCLV_2G);
+	I915_WRITE(RING_PP_DIR_BASE(req->engine), get_pd_offset(ppgtt));
 
-	POSTING_READ(RING_PP_DIR_DCLV(ring));
+	POSTING_READ(RING_PP_DIR_DCLV(req->engine));
 
 	return 0;
 }
@@ -2169,8 +2171,7 @@ int i915_ppgtt_init_hw(struct drm_device *dev)
 
 int i915_ppgtt_init_ring(struct drm_i915_gem_request *req)
 {
-	struct drm_i915_private *dev_priv = req->ring->dev->dev_private;
-	struct i915_hw_ppgtt *ppgtt = dev_priv->mm.aliasing_ppgtt;
+	struct i915_hw_ppgtt *ppgtt = req->i915->mm.aliasing_ppgtt;
 
 	if (i915.enable_execlists)
 		return 0;
diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.c b/drivers/gpu/drm/i915/i915_gem_render_state.c
index fc7e6d5c6251..bee3f0ccd0cd 100644
--- a/drivers/gpu/drm/i915/i915_gem_render_state.c
+++ b/drivers/gpu/drm/i915/i915_gem_render_state.c
@@ -198,25 +198,25 @@ int i915_gem_render_state_init(struct drm_i915_gem_request *req)
 	struct render_state so;
 	int ret;
 
-	ret = i915_gem_render_state_prepare(req->ring, &so);
+	ret = i915_gem_render_state_prepare(req->engine, &so);
 	if (ret)
 		return ret;
 
 	if (so.rodata == NULL)
 		return 0;
 
-	ret = req->ring->dispatch_execbuffer(req, so.ggtt_offset,
-					     so.rodata->batch_items * 4,
-					     I915_DISPATCH_SECURE);
+	ret = req->engine->dispatch_execbuffer(req, so.ggtt_offset,
+					       so.rodata->batch_items * 4,
+					       I915_DISPATCH_SECURE);
 	if (ret)
 		goto out;
 
 	if (so.aux_batch_size > 8) {
-		ret = req->ring->dispatch_execbuffer(req,
-						     (so.ggtt_offset +
-						      so.aux_batch_offset),
-						     so.aux_batch_size,
-						     I915_DISPATCH_SECURE);
+		ret = req->engine->dispatch_execbuffer(req,
+						       (so.ggtt_offset +
+							so.aux_batch_offset),
+						       so.aux_batch_size,
+						       I915_DISPATCH_SECURE);
 		if (ret)
 			goto out;
 	}
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 85067069995e..8adf2c134048 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -37,7 +37,7 @@ static const char *i915_fence_get_driver_name(struct fence *fence)
 
 static const char *i915_fence_get_timeline_name(struct fence *fence)
 {
-	return to_i915_request(fence)->ring->name;
+	return to_i915_request(fence)->engine->name;
 }
 
 static bool i915_fence_signaled(struct fence *fence)
@@ -90,7 +90,7 @@ static void i915_fence_timeline_value_str(struct fence *fence, char *str,
 					  int size)
 {
 	snprintf(str, size, "%u",
-		 intel_ring_get_seqno(to_i915_request(fence)->ring));
+		 intel_ring_get_seqno(to_i915_request(fence)->engine));
 }
 
 static void i915_fence_release(struct fence *fence)
@@ -195,11 +195,11 @@ i915_gem_get_seqno(struct drm_i915_private *dev_priv, u32 *seqno)
 	return 0;
 }
 
-int i915_gem_request_alloc(struct intel_engine_cs *ring,
+int i915_gem_request_alloc(struct intel_engine_cs *engine,
 			   struct intel_context *ctx,
 			   struct drm_i915_gem_request **req_out)
 {
-	struct drm_i915_private *dev_priv = ring->i915;
+	struct drm_i915_private *dev_priv = engine->i915;
 	unsigned reset_counter = i915_reset_counter(&dev_priv->gpu_error);
 	struct drm_i915_gem_request *req;
 	u32 seqno;
@@ -230,11 +230,11 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 	fence_init(&req->fence,
 		   &i915_fence_ops,
 		   &req->lock,
-		   ring->fence_context,
+		   engine->fence_context,
 		   seqno);
 
 	req->i915 = dev_priv;
-	req->ring = ring;
+	req->engine = engine;
 	req->reset_counter = reset_counter;
 	req->ctx  = ctx;
 	i915_gem_context_reference(req->ctx);
@@ -279,7 +279,6 @@ err:
 int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
 				   struct drm_file *file)
 {
-	struct drm_i915_private *dev_private;
 	struct drm_i915_file_private *file_priv;
 
 	WARN_ON(!req || !file || req->file_priv);
@@ -290,7 +289,6 @@ int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
 	if (req->file_priv)
 		return -EINVAL;
 
-	dev_private = req->ring->dev->dev_private;
 	file_priv = file->driver_priv;
 
 	spin_lock(&file_priv->mm.lock);
@@ -332,7 +330,7 @@ void i915_gem_request_cancel(struct drm_i915_gem_request *req)
 {
 	intel_ring_reserved_space_cancel(req->ringbuf);
 	if (i915.enable_execlists) {
-		if (req->ctx != req->ring->default_context)
+		if (req->ctx != req->engine->default_context)
 			intel_lr_context_unpin(req);
 	}
 	__i915_gem_request_release(req);
@@ -358,7 +356,7 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
 void
 i915_gem_request_retire_upto(struct drm_i915_gem_request *req)
 {
-	struct intel_engine_cs *engine = req->ring;
+	struct intel_engine_cs *engine = req->engine;
 	struct drm_i915_gem_request *tmp;
 
 	lockdep_assert_held(&engine->dev->struct_mutex);
@@ -403,8 +401,6 @@ void __i915_add_request(struct drm_i915_gem_request *request,
 			struct drm_i915_gem_object *obj,
 			bool flush_caches)
 {
-	struct intel_engine_cs *ring;
-	struct drm_i915_private *dev_priv;
 	struct intel_ringbuffer *ringbuf;
 	u32 request_start;
 	int ret;
@@ -412,8 +408,6 @@ void __i915_add_request(struct drm_i915_gem_request *request,
 	if (WARN_ON(request == NULL))
 		return;
 
-	ring = request->ring;
-	dev_priv = ring->dev->dev_private;
 	ringbuf = request->ringbuf;
 
 	/*
@@ -448,9 +442,9 @@ void __i915_add_request(struct drm_i915_gem_request *request,
 	request->postfix = intel_ring_get_tail(ringbuf);
 
 	if (i915.enable_execlists)
-		ret = ring->emit_request(request);
+		ret = request->engine->emit_request(request);
 	else {
-		ret = ring->add_request(request);
+		ret = request->engine->add_request(request);
 
 		request->tail = intel_ring_get_tail(ringbuf);
 	}
@@ -468,13 +462,13 @@ void __i915_add_request(struct drm_i915_gem_request *request,
 	request->batch_obj = obj;
 
 	request->emitted_jiffies = jiffies;
-	request->previous_seqno = ring->last_submitted_seqno;
-	ring->last_submitted_seqno = request->fence.seqno;
-	list_add_tail(&request->list, &ring->request_list);
+	request->previous_seqno = request->engine->last_submitted_seqno;
+	request->engine->last_submitted_seqno = request->fence.seqno;
+	list_add_tail(&request->list, &request->engine->request_list);
 
 	trace_i915_gem_request_add(request);
 
-	i915_gem_mark_busy(dev_priv);
+	i915_gem_mark_busy(request->i915);
 
 	/* Sanity check that the reserved size was large enough. */
 	intel_ring_reserved_space_end(ringbuf);
@@ -627,7 +621,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 	set_task_state(wait.task, state);
 
 	/* Optimistic spin for the next ~jiffie before touching IRQs */
-	if (intel_engine_add_wait(req->ring, &wait)) {
+	if (intel_engine_add_wait(req->engine, &wait)) {
 		if (__i915_spin_request(req, &wait, state))
 			goto complete;
 
@@ -635,7 +629,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 		 * as we enabled it, we need to kick ourselves to do a
 		 * coherent check on the seqno before we sleep.
 		 */
-		if (intel_engine_enable_wait_irq(req->ring, &wait))
+		if (intel_engine_enable_wait_irq(req->engine, &wait))
 			goto wakeup;
 	}
 
@@ -670,7 +664,7 @@ wakeup:
 	}
 
 complete:
-	intel_engine_remove_wait(req->ring, &wait);
+	intel_engine_remove_wait(req->engine, &wait);
 	__set_task_state(wait.task, TASK_RUNNING);
 	trace_i915_gem_request_wait_end(req);
 
@@ -691,7 +685,7 @@ complete:
 	}
 
 	if (ret == 0 && !IS_ERR_OR_NULL(rps) &&
-	    req->fence.seqno == req->ring->last_submitted_seqno) {
+	    req->fence.seqno == req->engine->last_submitted_seqno) {
 		/* The GPU is now idle and this client has stalled.
 		 * Since no other client has submitted a request in the
 		 * meantime, assume that this client is the only one
@@ -717,20 +711,13 @@ complete:
 int
 i915_wait_request(struct drm_i915_gem_request *req)
 {
-	struct drm_device *dev;
-	struct drm_i915_private *dev_priv;
-	bool interruptible;
 	int ret;
 
 	BUG_ON(req == NULL);
 
-	dev = req->ring->dev;
-	dev_priv = dev->dev_private;
-	interruptible = dev_priv->mm.interruptible;
-
-	BUG_ON(!mutex_is_locked(&dev->struct_mutex));
+	BUG_ON(!mutex_is_locked(&req->i915->dev->struct_mutex));
 
-	ret = __i915_wait_request(req, interruptible, NULL, NULL);
+	ret = __i915_wait_request(req, req->i915->mm.interruptible, NULL, NULL);
 	if (ret)
 		return ret;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index 6b3de827929a..802862e5007d 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -46,7 +46,7 @@ struct drm_i915_gem_request {
 
 	/** On Which ring this request was generated */
 	struct drm_i915_private *i915;
-	struct intel_engine_cs *ring;
+	struct intel_engine_cs *engine;
 	unsigned reset_counter;
 
 	 /** GEM sequence number associated with the previous request,
@@ -133,9 +133,9 @@ i915_gem_request_get_seqno(struct drm_i915_gem_request *req)
 }
 
 static inline struct intel_engine_cs *
-i915_gem_request_get_ring(struct drm_i915_gem_request *req)
+i915_gem_request_get_engine(struct drm_i915_gem_request *req)
 {
-	return req ? req->ring : NULL;
+	return req ? req->engine : NULL;
 }
 
 static inline struct drm_i915_gem_request *
@@ -198,13 +198,13 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2)
 
 static inline bool i915_gem_request_started(struct drm_i915_gem_request *req)
 {
-	return i915_seqno_passed(intel_ring_get_seqno(req->ring),
+	return i915_seqno_passed(intel_ring_get_seqno(req->engine),
 				 req->previous_seqno);
 }
 
 static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
 {
-	return i915_seqno_passed(intel_ring_get_seqno(req->ring),
+	return i915_seqno_passed(intel_ring_get_seqno(req->engine),
 				 req->fence.seqno);
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 84ce91275fdd..5bf208d8009e 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -721,8 +721,7 @@ static void capture_bo(struct drm_i915_error_buffer *err,
 	err->dirty = obj->dirty;
 	err->purgeable = obj->madv != I915_MADV_WILLNEED;
 	err->userptr = obj->userptr.mm != NULL;
-	err->ring = obj->last_write_req ?
-			i915_gem_request_get_ring(obj->last_write_req)->id : -1;
+	err->ring = obj->last_write_req ?  obj->last_write_req->engine->id : -1;
 	err->cache_level = obj->cache_level;
 }
 
diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index 56d3064d32ed..eaf680ce5c9c 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -510,7 +510,7 @@ int i915_guc_wq_check_space(struct i915_guc_client *gc)
 static int guc_add_workqueue_item(struct i915_guc_client *gc,
 				  struct drm_i915_gem_request *rq)
 {
-	enum intel_ring_id ring_id = rq->ring->id;
+	enum intel_ring_id ring_id = rq->engine->id;
 	struct guc_wq_item *wqi;
 	void *base;
 	u32 tail, wq_len, wq_off, space;
@@ -548,7 +548,7 @@ static int guc_add_workqueue_item(struct i915_guc_client *gc,
 			WQ_NO_WCFLUSH_WAIT;
 
 	/* The GuC wants only the low-order word of the context descriptor */
-	wqi->context_desc = (u32)intel_lr_context_descriptor(rq->ctx, rq->ring);
+	wqi->context_desc = (u32)intel_lr_context_descriptor(rq->ctx, rq->engine);
 
 	/* The GuC firmware wants the tail index in QWords, not bytes */
 	tail = rq->ringbuf->tail >> 3;
@@ -565,7 +565,7 @@ static int guc_add_workqueue_item(struct i915_guc_client *gc,
 /* Update the ringbuffer pointer in a saved context image */
 static void lr_context_update(struct drm_i915_gem_request *rq)
 {
-	enum intel_ring_id ring_id = rq->ring->id;
+	enum intel_ring_id ring_id = rq->engine->id;
 	struct drm_i915_gem_object *ctx_obj = rq->ctx->engine[ring_id].state;
 	struct drm_i915_gem_object *rb_obj = rq->ringbuf->obj;
 	struct page *page;
@@ -594,7 +594,7 @@ int i915_guc_submit(struct i915_guc_client *client,
 		    struct drm_i915_gem_request *rq)
 {
 	struct intel_guc *guc = client->guc;
-	enum intel_ring_id ring_id = rq->ring->id;
+	enum intel_ring_id ring_id = rq->engine->id;
 	int q_ret, b_ret;
 
 	/* Need this because of the deferred pin ctx and ring */
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index dc2ff5cac2f4..0204ff72b3e4 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -475,7 +475,7 @@ TRACE_EVENT(i915_gem_ring_sync_to,
 	    TP_fast_assign(
 			   __entry->dev = from->dev->primary->index;
 			   __entry->sync_from = from->id;
-			   __entry->sync_to = to_req->ring->id;
+			   __entry->sync_to = to_req->engine->id;
 			   __entry->seqno = i915_gem_request_get_seqno(req);
 			   ),
 
@@ -497,11 +497,9 @@ TRACE_EVENT(i915_gem_ring_dispatch,
 			     ),
 
 	    TP_fast_assign(
-			   struct intel_engine_cs *ring =
-						i915_gem_request_get_ring(req);
-			   __entry->dev = ring->dev->primary->index;
-			   __entry->ring = ring->id;
-			   __entry->seqno = i915_gem_request_get_seqno(req);
+			   __entry->dev = req->i915->dev->primary->index;
+			   __entry->ring = req->engine->id;
+			   __entry->seqno = req->fence.seqno;
 			   __entry->flags = flags;
 			   fence_enable_sw_signaling(&req->fence);
 			   ),
@@ -522,8 +520,8 @@ TRACE_EVENT(i915_gem_ring_flush,
 			     ),
 
 	    TP_fast_assign(
-			   __entry->dev = req->ring->dev->primary->index;
-			   __entry->ring = req->ring->id;
+			   __entry->dev = req->engine->dev->primary->index;
+			   __entry->ring = req->engine->id;
 			   __entry->invalidate = invalidate;
 			   __entry->flush = flush;
 			   ),
@@ -544,11 +542,9 @@ DECLARE_EVENT_CLASS(i915_gem_request,
 			     ),
 
 	    TP_fast_assign(
-			   struct intel_engine_cs *ring =
-						i915_gem_request_get_ring(req);
-			   __entry->dev = ring->dev->primary->index;
-			   __entry->ring = ring->id;
-			   __entry->seqno = i915_gem_request_get_seqno(req);
+			   __entry->dev = req->i915->dev->primary->index;
+			   __entry->ring = req->engine->id;
+			   __entry->seqno = req->fence.seqno;
 			   ),
 
 	    TP_printk("dev=%u, ring=%u, seqno=%u",
@@ -608,13 +604,11 @@ TRACE_EVENT(i915_gem_request_wait_begin,
 	     * less desirable.
 	     */
 	    TP_fast_assign(
-			   struct intel_engine_cs *ring =
-						i915_gem_request_get_ring(req);
-			   __entry->dev = ring->dev->primary->index;
-			   __entry->ring = ring->id;
-			   __entry->seqno = i915_gem_request_get_seqno(req);
+			   __entry->dev = req->i915->dev->primary->index;
+			   __entry->ring = req->engine->id;
+			   __entry->seqno = req->fence.seqno;
 			   __entry->blocking =
-				     mutex_is_locked(&ring->dev->struct_mutex);
+				     mutex_is_locked(&req->i915->dev->struct_mutex);
 			   ),
 
 	    TP_printk("dev=%u, ring=%u, seqno=%u, blocking=%s",
diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
index aca1b72edcd8..5ba8b4cd8a18 100644
--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -419,7 +419,7 @@ static int intel_breadcrumbs_signaler(void *arg)
 
 int intel_engine_enable_signaling(struct drm_i915_gem_request *request)
 {
-	struct intel_engine_cs *engine = request->ring;
+	struct intel_engine_cs *engine = request->engine;
 	struct intel_breadcrumbs *b = &engine->breadcrumbs;
 	struct rb_node *parent, **p;
 	struct task_struct *task;
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index b28e783f6f04..323b0d905c89 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -11215,7 +11215,7 @@ static int intel_gen7_queue_flip(struct drm_device *dev,
 	}
 
 	len = 4;
-	if (req->ring->id == RCS) {
+	if (req->engine->id == RCS) {
 		len += 6;
 		/*
 		 * On Gen 8, SRM is now taking an extra dword to accommodate
@@ -11253,7 +11253,7 @@ static int intel_gen7_queue_flip(struct drm_device *dev,
 	 * for the RCS also doesn't appear to drop events. Setting the DERRMR
 	 * to zero does lead to lockups within MI_DISPLAY_FLIP.
 	 */
-	if (req->ring->id == RCS) {
+	if (req->engine->id == RCS) {
 		intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
 		intel_ring_emit_reg(ring, DERRMR);
 		intel_ring_emit(ring, ~(DERRMR_PIPEA_PRI_FLIP_DONE |
@@ -11266,7 +11266,7 @@ static int intel_gen7_queue_flip(struct drm_device *dev,
 			intel_ring_emit(ring, MI_STORE_REGISTER_MEM |
 					      MI_SRM_LRM_GLOBAL_GTT);
 		intel_ring_emit_reg(ring, DERRMR);
-		intel_ring_emit(ring, req->ring->scratch.gtt_offset + 256);
+		intel_ring_emit(ring, req->engine->scratch.gtt_offset + 256);
 		if (IS_GEN8(req->i915)) {
 			intel_ring_emit(ring, 0);
 			intel_ring_emit(ring, MI_NOOP);
@@ -11310,7 +11310,7 @@ static bool use_mmio_flip(struct intel_engine_cs *ring,
 						       false))
 		return true;
 	else
-		return ring != i915_gem_request_get_ring(obj->last_write_req);
+		return ring != i915_gem_request_get_engine(obj->last_write_req);
 }
 
 static void skl_do_mmio_flip(struct intel_crtc *intel_crtc,
@@ -11654,7 +11654,7 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
 	} else if (IS_IVYBRIDGE(dev) || IS_HASWELL(dev)) {
 		ring = &dev_priv->ring[BCS];
 	} else if (INTEL_INFO(dev)->gen >= 7) {
-		ring = i915_gem_request_get_ring(obj->last_write_req);
+		ring = i915_gem_request_get_engine(obj->last_write_req);
 		if (ring == NULL || ring->id != RCS)
 			ring = &dev_priv->ring[BCS];
 	} else {
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 4f1944929330..1b70a76df31d 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -287,11 +287,9 @@ u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj)
 
 static bool disable_lite_restore_wa(struct intel_engine_cs *ring)
 {
-	struct drm_device *dev = ring->dev;
-
-	return (IS_SKL_REVID(dev, 0, SKL_REVID_B0) ||
-		IS_BXT_REVID(dev, 0, BXT_REVID_A1)) &&
-	       (ring->id == VCS || ring->id == VCS2);
+	return (IS_SKL_REVID(ring->dev, 0, SKL_REVID_B0) ||
+		IS_BXT_REVID(ring->dev, 0, BXT_REVID_A1)) &&
+		(ring->id == VCS || ring->id == VCS2);
 }
 
 uint64_t intel_lr_context_descriptor(struct intel_context *ctx,
@@ -305,8 +303,8 @@ uint64_t intel_lr_context_descriptor(struct intel_context *ctx,
 	WARN_ON(lrca & 0xFFFFFFFF00000FFFULL);
 
 	desc = GEN8_CTX_VALID;
-	desc |= GEN8_CTX_ADDRESSING_MODE(dev) << GEN8_CTX_ADDRESSING_MODE_SHIFT;
-	if (IS_GEN8(ctx_obj->base.dev))
+	desc |= GEN8_CTX_ADDRESSING_MODE(ring->i915) << GEN8_CTX_ADDRESSING_MODE_SHIFT;
+	if (IS_GEN8(ring->i915))
 		desc |= GEN8_CTX_L3LLC_COHERENT;
 	desc |= GEN8_CTX_PRIVILEGE;
 	desc |= lrca;
@@ -328,41 +326,40 @@ static void execlists_elsp_write(struct drm_i915_gem_request *rq0,
 				 struct drm_i915_gem_request *rq1)
 {
 
-	struct intel_engine_cs *ring = rq0->ring;
+	struct intel_engine_cs *engine = rq0->engine;
 	struct drm_i915_private *dev_priv = rq0->i915;
 	uint64_t desc[2];
 
 	if (rq1) {
-		desc[1] = intel_lr_context_descriptor(rq1->ctx, rq1->ring);
+		desc[1] = intel_lr_context_descriptor(rq1->ctx, rq1->engine);
 		rq1->elsp_submitted++;
 	} else {
 		desc[1] = 0;
 	}
 
-	desc[0] = intel_lr_context_descriptor(rq0->ctx, rq0->ring);
+	desc[0] = intel_lr_context_descriptor(rq0->ctx, rq0->engine);
 	rq0->elsp_submitted++;
 
 	/* You must always write both descriptors in the order below. */
 	spin_lock(&dev_priv->uncore.lock);
 	intel_uncore_forcewake_get__locked(dev_priv, FORCEWAKE_ALL);
-	I915_WRITE_FW(RING_ELSP(ring), upper_32_bits(desc[1]));
-	I915_WRITE_FW(RING_ELSP(ring), lower_32_bits(desc[1]));
+	I915_WRITE_FW(RING_ELSP(engine), upper_32_bits(desc[1]));
+	I915_WRITE_FW(RING_ELSP(engine), lower_32_bits(desc[1]));
 
-	I915_WRITE_FW(RING_ELSP(ring), upper_32_bits(desc[0]));
+	I915_WRITE_FW(RING_ELSP(engine), upper_32_bits(desc[0]));
 	/* The context is automatically loaded after the following */
-	I915_WRITE_FW(RING_ELSP(ring), lower_32_bits(desc[0]));
+	I915_WRITE_FW(RING_ELSP(engine), lower_32_bits(desc[0]));
 
 	/* ELSP is a wo register, use another nearby reg for posting */
-	POSTING_READ_FW(RING_EXECLIST_STATUS_LO(ring));
+	POSTING_READ_FW(RING_EXECLIST_STATUS_LO(engine));
 	intel_uncore_forcewake_put__locked(dev_priv, FORCEWAKE_ALL);
 	spin_unlock(&dev_priv->uncore.lock);
 }
 
 static int execlists_update_context(struct drm_i915_gem_request *rq)
 {
-	struct intel_engine_cs *ring = rq->ring;
 	struct i915_hw_ppgtt *ppgtt = rq->ctx->ppgtt;
-	struct drm_i915_gem_object *ctx_obj = rq->ctx->engine[ring->id].state;
+	struct drm_i915_gem_object *ctx_obj = rq->ctx->engine[rq->engine->id].state;
 	struct drm_i915_gem_object *rb_obj = rq->ringbuf->obj;
 	struct page *page;
 	uint32_t *reg_state;
@@ -377,7 +374,7 @@ static int execlists_update_context(struct drm_i915_gem_request *rq)
 	reg_state[CTX_RING_TAIL+1] = rq->tail;
 	reg_state[CTX_RING_BUFFER_START+1] = i915_gem_obj_ggtt_offset(rb_obj);
 
-	if (ppgtt && !USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
+	if (ppgtt && !USES_FULL_48BIT_PPGTT(rq->i915)) {
 		/* True 32b PPGTT with dynamic page allocation: update PDP
 		 * registers and point the unallocated PDPs to scratch page.
 		 * PML4 is allocated during ppgtt init, so this is not needed
@@ -582,22 +579,22 @@ void intel_lrc_irq_handler(struct intel_engine_cs *ring)
 
 static int execlists_context_queue(struct drm_i915_gem_request *request)
 {
-	struct intel_engine_cs *ring = request->ring;
+	struct intel_engine_cs *engine = request->engine;
 	struct drm_i915_gem_request *cursor;
 	int num_elements = 0;
 
 	i915_gem_request_get(request);
 
-	spin_lock_irq(&ring->execlist_lock);
+	spin_lock_irq(&engine->execlist_lock);
 
-	list_for_each_entry(cursor, &ring->execlist_queue, execlist_link)
+	list_for_each_entry(cursor, &engine->execlist_queue, execlist_link)
 		if (++num_elements > 2)
 			break;
 
 	if (num_elements > 2) {
 		struct drm_i915_gem_request *tail_req;
 
-		tail_req = list_last_entry(&ring->execlist_queue,
+		tail_req = list_last_entry(&engine->execlist_queue,
 					   struct drm_i915_gem_request,
 					   execlist_link);
 
@@ -606,41 +603,41 @@ static int execlists_context_queue(struct drm_i915_gem_request *request)
 				"More than 2 already-submitted reqs queued\n");
 			list_del(&tail_req->execlist_link);
 			list_add_tail(&tail_req->execlist_link,
-				&ring->execlist_retired_req_list);
+				&engine->execlist_retired_req_list);
 		}
 	}
 
-	list_add_tail(&request->execlist_link, &ring->execlist_queue);
+	list_add_tail(&request->execlist_link, &engine->execlist_queue);
 	if (num_elements == 0)
-		execlists_context_unqueue(ring);
+		execlists_context_unqueue(engine);
 
-	spin_unlock_irq(&ring->execlist_lock);
+	spin_unlock_irq(&engine->execlist_lock);
 
 	return 0;
 }
 
 static int logical_ring_invalidate_all_caches(struct drm_i915_gem_request *req)
 {
-	struct intel_engine_cs *ring = req->ring;
+	struct intel_engine_cs *engine = req->engine;
 	uint32_t flush_domains;
 	int ret;
 
 	flush_domains = 0;
-	if (ring->gpu_caches_dirty)
+	if (engine->gpu_caches_dirty)
 		flush_domains = I915_GEM_GPU_DOMAINS;
 
-	ret = ring->emit_flush(req, I915_GEM_GPU_DOMAINS, flush_domains);
+	ret = engine->emit_flush(req, I915_GEM_GPU_DOMAINS, flush_domains);
 	if (ret)
 		return ret;
 
-	ring->gpu_caches_dirty = false;
+	engine->gpu_caches_dirty = false;
 	return 0;
 }
 
 static int execlists_move_to_gpu(struct drm_i915_gem_request *req,
 				 struct list_head *vmas)
 {
-	const unsigned other_rings = ~intel_ring_flag(req->ring);
+	const unsigned other_rings = ~intel_ring_flag(req->engine);
 	struct i915_vma *vma;
 	uint32_t flush_domains = 0;
 	bool flush_chipset = false;
@@ -650,7 +647,7 @@ static int execlists_move_to_gpu(struct drm_i915_gem_request *req,
 		struct drm_i915_gem_object *obj = vma->obj;
 
 		if (obj->active & other_rings) {
-			ret = i915_gem_object_sync(obj, req->ring, &req);
+			ret = i915_gem_object_sync(obj, req->engine, &req);
 			if (ret)
 				return ret;
 		}
@@ -674,9 +671,9 @@ int intel_logical_ring_alloc_request_extras(struct drm_i915_gem_request *request
 {
 	int ret;
 
-	request->ringbuf = request->ctx->engine[request->ring->id].ringbuf;
+	request->ringbuf = request->ctx->engine[request->engine->id].ringbuf;
 
-	if (request->ctx != request->ring->default_context) {
+	if (request->ctx != request->engine->default_context) {
 		ret = intel_lr_context_pin(request);
 		if (ret)
 			return ret;
@@ -865,17 +862,17 @@ void intel_logical_ring_stop(struct intel_engine_cs *ring)
 
 int logical_ring_flush_all_caches(struct drm_i915_gem_request *req)
 {
-	struct intel_engine_cs *ring = req->ring;
+	struct intel_engine_cs *engine = req->engine;
 	int ret;
 
-	if (!ring->gpu_caches_dirty)
+	if (!engine->gpu_caches_dirty)
 		return 0;
 
-	ret = ring->emit_flush(req, 0, I915_GEM_GPU_DOMAINS);
+	ret = engine->emit_flush(req, 0, I915_GEM_GPU_DOMAINS);
 	if (ret)
 		return ret;
 
-	ring->gpu_caches_dirty = false;
+	engine->gpu_caches_dirty = false;
 	return 0;
 }
 
@@ -913,34 +910,33 @@ unpin_ctx_obj:
 
 static int intel_lr_context_pin(struct drm_i915_gem_request *rq)
 {
-	int ret = 0;
-	struct intel_engine_cs *ring = rq->ring;
-	struct drm_i915_gem_object *ctx_obj = rq->ctx->engine[ring->id].state;
-	struct intel_ringbuffer *ringbuf = rq->ringbuf;
+	int engine = rq->engine->id;
+	int ret;
 
-	if (rq->ctx->engine[ring->id].pin_count++ == 0) {
-		ret = intel_lr_context_do_pin(ring, ctx_obj, ringbuf);
-		if (ret)
-			goto reset_pin_count;
+	if (rq->ctx->engine[engine].pin_count++)
+		return 0;
 
-		i915_gem_context_reference(rq->ctx);
+	ret = intel_lr_context_do_pin(rq->engine,
+				      rq->ctx->engine[engine].state,
+				      rq->ringbuf);
+	if (ret) {
+		rq->ctx->engine[engine].pin_count = 0;
+		return ret;
 	}
-	return ret;
 
-reset_pin_count:
-	rq->ctx->engine[ring->id].pin_count = 0;
-	return ret;
+	i915_gem_context_reference(rq->ctx);
+	return 0;
 }
 
 void intel_lr_context_unpin(struct drm_i915_gem_request *rq)
 {
-	struct intel_engine_cs *ring = rq->ring;
-	struct drm_i915_gem_object *ctx_obj = rq->ctx->engine[ring->id].state;
+	int engine = rq->engine->id;
+	struct drm_i915_gem_object *ctx_obj = rq->ctx->engine[engine].state;
 	struct intel_ringbuffer *ringbuf = rq->ringbuf;
 
 	if (ctx_obj) {
-		WARN_ON(!mutex_is_locked(&ring->dev->struct_mutex));
-		if (--rq->ctx->engine[ring->id].pin_count == 0) {
+		WARN_ON(!mutex_is_locked(&rq->i915->dev->struct_mutex));
+		if (--rq->ctx->engine[engine].pin_count == 0) {
 			intel_unpin_ringbuffer_obj(ringbuf);
 			i915_gem_object_ggtt_unpin(ctx_obj);
 			i915_gem_context_unreference(rq->ctx);
@@ -951,7 +947,7 @@ void intel_lr_context_unpin(struct drm_i915_gem_request *rq)
 static int intel_logical_ring_workarounds_emit(struct drm_i915_gem_request *req)
 {
 	int ret, i;
-	struct intel_engine_cs *ring = req->ring;
+	struct intel_engine_cs *engine = req->engine;
 	struct intel_ringbuffer *ringbuf = req->ringbuf;
 	struct drm_i915_private *dev_priv = req->i915;
 	struct i915_workarounds *w = &dev_priv->workarounds;
@@ -959,7 +955,7 @@ static int intel_logical_ring_workarounds_emit(struct drm_i915_gem_request *req)
 	if (w->count == 0)
 		return 0;
 
-	ring->gpu_caches_dirty = true;
+	engine->gpu_caches_dirty = true;
 	ret = logical_ring_flush_all_caches(req);
 	if (ret)
 		return ret;
@@ -977,7 +973,7 @@ static int intel_logical_ring_workarounds_emit(struct drm_i915_gem_request *req)
 
 	intel_ring_advance(ringbuf);
 
-	ring->gpu_caches_dirty = true;
+	engine->gpu_caches_dirty = true;
 	ret = logical_ring_flush_all_caches(req);
 	if (ret)
 		return ret;
@@ -1421,7 +1417,7 @@ static int gen9_init_render_ring(struct intel_engine_cs *ring)
 static int intel_logical_ring_emit_pdps(struct drm_i915_gem_request *req)
 {
 	struct i915_hw_ppgtt *ppgtt = req->ctx->ppgtt;
-	struct intel_engine_cs *ring = req->ring;
+	struct intel_engine_cs *engine = req->engine;
 	struct intel_ringbuffer *ringbuf = req->ringbuf;
 	const int num_lri_cmds = GEN8_LEGACY_PDPES * 2;
 	int i, ret;
@@ -1434,9 +1430,9 @@ static int intel_logical_ring_emit_pdps(struct drm_i915_gem_request *req)
 	for (i = GEN8_LEGACY_PDPES - 1; i >= 0; i--) {
 		const dma_addr_t pd_daddr = i915_page_dir_dma_addr(ppgtt, i);
 
-		intel_ring_emit_reg(ringbuf, GEN8_RING_PDP_UDW(ring, i));
+		intel_ring_emit_reg(ringbuf, GEN8_RING_PDP_UDW(engine, i));
 		intel_ring_emit(ringbuf, upper_32_bits(pd_daddr));
-		intel_ring_emit_reg(ringbuf, GEN8_RING_PDP_LDW(ring, i));
+		intel_ring_emit_reg(ringbuf, GEN8_RING_PDP_LDW(engine, i));
 		intel_ring_emit(ringbuf, lower_32_bits(pd_daddr));
 	}
 
@@ -1460,7 +1456,7 @@ static int gen8_emit_bb_start(struct drm_i915_gem_request *req,
 	 * not idle). PML4 is allocated during ppgtt init so this is
 	 * not needed in 48-bit.*/
 	if (req->ctx->ppgtt &&
-	    (intel_ring_flag(req->ring) & req->ctx->ppgtt->pd_dirty_rings)) {
+	    (intel_ring_flag(req->engine) & req->ctx->ppgtt->pd_dirty_rings)) {
 		if (!USES_FULL_48BIT_PPGTT(req->i915) &&
 		    !intel_vgpu_active(req->i915->dev)) {
 			ret = intel_logical_ring_emit_pdps(req);
@@ -1468,7 +1464,7 @@ static int gen8_emit_bb_start(struct drm_i915_gem_request *req,
 				return ret;
 		}
 
-		req->ctx->ppgtt->pd_dirty_rings &= ~intel_ring_flag(req->ring);
+		req->ctx->ppgtt->pd_dirty_rings &= ~intel_ring_flag(req->engine);
 	}
 
 	ret = intel_ring_begin(req, 4);
@@ -1672,21 +1668,21 @@ static int intel_lr_context_render_state_init(struct drm_i915_gem_request *req)
 	struct render_state so;
 	int ret;
 
-	ret = i915_gem_render_state_prepare(req->ring, &so);
+	ret = i915_gem_render_state_prepare(req->engine, &so);
 	if (ret)
 		return ret;
 
 	if (so.rodata == NULL)
 		return 0;
 
-	ret = req->ring->emit_bb_start(req, so.ggtt_offset,
-				       I915_DISPATCH_SECURE);
+	ret = req->engine->emit_bb_start(req, so.ggtt_offset,
+					 I915_DISPATCH_SECURE);
 	if (ret)
 		goto out;
 
-	ret = req->ring->emit_bb_start(req,
-				       (so.ggtt_offset + so.aux_batch_offset),
-				       I915_DISPATCH_SECURE);
+	ret = req->engine->emit_bb_start(req,
+					 (so.ggtt_offset + so.aux_batch_offset),
+					 I915_DISPATCH_SECURE);
 	if (ret)
 		goto out;
 
diff --git a/drivers/gpu/drm/i915/intel_mocs.c b/drivers/gpu/drm/i915/intel_mocs.c
index 5d4f6f3b67cd..40041bebc3dc 100644
--- a/drivers/gpu/drm/i915/intel_mocs.c
+++ b/drivers/gpu/drm/i915/intel_mocs.c
@@ -138,21 +138,21 @@ static const struct drm_i915_mocs_entry broxton_mocs_table[] = {
  *
  * Return: true if there are applicable MOCS settings for the device.
  */
-static bool get_mocs_settings(struct drm_device *dev,
+static bool get_mocs_settings(struct drm_i915_private *dev_priv,
 			      struct drm_i915_mocs_table *table)
 {
 	bool result = false;
 
-	if (IS_SKYLAKE(dev) || IS_KABYLAKE(dev)) {
+	if (IS_SKYLAKE(dev_priv) || IS_KABYLAKE(dev_priv)) {
 		table->size  = ARRAY_SIZE(skylake_mocs_table);
 		table->table = skylake_mocs_table;
 		result = true;
-	} else if (IS_BROXTON(dev)) {
+	} else if (IS_BROXTON(dev_priv)) {
 		table->size  = ARRAY_SIZE(broxton_mocs_table);
 		table->table = broxton_mocs_table;
 		result = true;
 	} else {
-		WARN_ONCE(INTEL_INFO(dev)->gen >= 9,
+		WARN_ONCE(INTEL_INFO(dev_priv)->gen >= 9,
 			  "Platform that should have a MOCS table does not.\n");
 	}
 
@@ -316,13 +316,12 @@ int intel_rcs_context_init_mocs(struct drm_i915_gem_request *req)
 	struct drm_i915_mocs_table t;
 	int ret;
 
-	if (get_mocs_settings(req->ring->dev, &t)) {
-		struct drm_i915_private *dev_priv = req->i915;
+	if (get_mocs_settings(req->i915, &t)) {
 		struct intel_engine_cs *ring;
 		enum intel_ring_id ring_id;
 
 		/* Program the control registers */
-		for_each_ring(ring, dev_priv, ring_id) {
+		for_each_ring(ring, req->i915, ring_id) {
 			ret = emit_mocs_control_table(req, &t, ring_id);
 			if (ret)
 				return ret;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index db5c407f7720..072fd0fc7748 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -192,7 +192,7 @@ static int
 intel_emit_post_sync_nonzero_flush(struct drm_i915_gem_request *req)
 {
 	struct intel_ringbuffer *ring = req->ringbuf;
-	u32 scratch_addr = req->ring->scratch.gtt_offset + 2 * CACHELINE_BYTES;
+	u32 scratch_addr = req->engine->scratch.gtt_offset + 2 * CACHELINE_BYTES;
 	int ret;
 
 	ret = intel_ring_begin(req, 6);
@@ -229,7 +229,7 @@ gen6_render_ring_flush(struct drm_i915_gem_request *req,
 {
 	struct intel_ringbuffer *ring = req->ringbuf;
 	u32 flags = 0;
-	u32 scratch_addr = req->ring->scratch.gtt_offset + 2 * CACHELINE_BYTES;
+	u32 scratch_addr = req->engine->scratch.gtt_offset + 2 * CACHELINE_BYTES;
 	int ret;
 
 	/* Force SNB workarounds for PIPE_CONTROL flushes */
@@ -302,7 +302,7 @@ gen7_render_ring_flush(struct drm_i915_gem_request *req,
 {
 	struct intel_ringbuffer *ring = req->ringbuf;
 	u32 flags = 0;
-	u32 scratch_addr = req->ring->scratch.gtt_offset + 2 * CACHELINE_BYTES;
+	u32 scratch_addr = req->engine->scratch.gtt_offset + 2 * CACHELINE_BYTES;
 	int ret;
 
 	/*
@@ -386,7 +386,7 @@ gen8_render_ring_flush(struct drm_i915_gem_request *req,
 		       u32 invalidate_domains, u32 flush_domains)
 {
 	u32 flags = 0;
-	u32 scratch_addr = req->ring->scratch.gtt_offset + 2 * CACHELINE_BYTES;
+	u32 scratch_addr = req->engine->scratch.gtt_offset + 2 * CACHELINE_BYTES;
 	int ret;
 
 	flags |= PIPE_CONTROL_CS_STALL;
@@ -696,7 +696,7 @@ static int intel_ring_workarounds_emit(struct drm_i915_gem_request *req)
 	if (w->count == 0)
 		return 0;
 
-	req->ring->gpu_caches_dirty = true;
+	req->engine->gpu_caches_dirty = true;
 	ret = intel_ring_flush_all_caches(req);
 	if (ret)
 		return ret;
@@ -714,7 +714,7 @@ static int intel_ring_workarounds_emit(struct drm_i915_gem_request *req)
 
 	intel_ring_advance(ring);
 
-	req->ring->gpu_caches_dirty = true;
+	req->engine->gpu_caches_dirty = true;
 	ret = intel_ring_flush_all_caches(req);
 	if (ret)
 		return ret;
@@ -1205,7 +1205,7 @@ static int gen8_rcs_signal(struct drm_i915_gem_request *signaller_req,
 		return ret;
 
 	for_each_ring(waiter, dev_priv, i) {
-		u64 gtt_offset = signaller_req->ring->semaphore.signal_ggtt[i];
+		u64 gtt_offset = signaller_req->engine->semaphore.signal_ggtt[i];
 		if (gtt_offset == MI_SEMAPHORE_SYNC_INVALID)
 			continue;
 
@@ -1243,7 +1243,7 @@ static int gen8_xcs_signal(struct drm_i915_gem_request *signaller_req,
 		return ret;
 
 	for_each_ring(waiter, dev_priv, i) {
-		u64 gtt_offset = signaller_req->ring->semaphore.signal_ggtt[i];
+		u64 gtt_offset = signaller_req->engine->semaphore.signal_ggtt[i];
 		if (gtt_offset == MI_SEMAPHORE_SYNC_INVALID)
 			continue;
 
@@ -1279,7 +1279,7 @@ static int gen6_signal(struct drm_i915_gem_request *signaller_req,
 		return ret;
 
 	for_each_ring(useless, dev_priv, i) {
-		i915_reg_t mbox_reg = signaller_req->ring->semaphore.mbox.signal[i];
+		i915_reg_t mbox_reg = signaller_req->engine->semaphore.mbox.signal[i];
 
 		if (i915_mmio_reg_valid(mbox_reg)) {
 			intel_ring_emit(signaller, MI_LOAD_REGISTER_IMM(1));
@@ -1309,8 +1309,8 @@ gen6_add_request(struct drm_i915_gem_request *req)
 	struct intel_ringbuffer *ring = req->ringbuf;
 	int ret;
 
-	if (req->ring->semaphore.signal)
-		ret = req->ring->semaphore.signal(req, 4);
+	if (req->engine->semaphore.signal)
+		ret = req->engine->semaphore.signal(req, 4);
 	else
 		ret = intel_ring_begin(req, 4);
 
@@ -1321,7 +1321,7 @@ gen6_add_request(struct drm_i915_gem_request *req)
 	intel_ring_emit(ring, I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT);
 	intel_ring_emit(ring, req->fence.seqno);
 	intel_ring_emit(ring, MI_USER_INTERRUPT);
-	__intel_ring_advance(req->ring);
+	__intel_ring_advance(req->engine);
 
 	return 0;
 }
@@ -1359,10 +1359,10 @@ gen8_ring_sync(struct drm_i915_gem_request *waiter_req,
 				MI_SEMAPHORE_SAD_GTE_SDD);
 	intel_ring_emit(waiter, seqno);
 	intel_ring_emit(waiter,
-			lower_32_bits(GEN8_WAIT_OFFSET(waiter_req->ring,
+			lower_32_bits(GEN8_WAIT_OFFSET(waiter_req->engine,
 						       signaller->id)));
 	intel_ring_emit(waiter,
-			upper_32_bits(GEN8_WAIT_OFFSET(waiter_req->ring,
+			upper_32_bits(GEN8_WAIT_OFFSET(waiter_req->engine,
 						       signaller->id)));
 	intel_ring_advance(waiter);
 	return 0;
@@ -1377,7 +1377,7 @@ gen6_ring_sync(struct drm_i915_gem_request *waiter_req,
 	u32 dw1 = MI_SEMAPHORE_MBOX |
 		  MI_SEMAPHORE_COMPARE |
 		  MI_SEMAPHORE_REGISTER;
-	u32 wait_mbox = signaller->semaphore.mbox.wait[waiter_req->ring->id];
+	u32 wait_mbox = signaller->semaphore.mbox.wait[waiter_req->engine->id];
 	int ret;
 
 	/* Throughout all of the GEM code, seqno passed implies our current
@@ -1422,7 +1422,7 @@ static int
 pc_render_add_request(struct drm_i915_gem_request *req)
 {
 	struct intel_ringbuffer *ring = req->ringbuf;
-	u32 addr = req->ring->status_page.gfx_addr +
+	u32 addr = req->engine->status_page.gfx_addr +
 		(I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT);
 	u32 scratch_addr = addr;
 	int ret;
@@ -1465,7 +1465,7 @@ pc_render_add_request(struct drm_i915_gem_request *req)
 	intel_ring_emit(ring, addr | PIPE_CONTROL_GLOBAL_GTT);
 	intel_ring_emit(ring, req->fence.seqno);
 	intel_ring_emit(ring, 0);
-	__intel_ring_advance(req->ring);
+	__intel_ring_advance(req->engine);
 
 	return 0;
 }
@@ -1575,7 +1575,7 @@ i9xx_add_request(struct drm_i915_gem_request *req)
 	intel_ring_emit(ring, I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT);
 	intel_ring_emit(ring, req->fence.seqno);
 	intel_ring_emit(ring, MI_USER_INTERRUPT);
-	__intel_ring_advance(req->ring);
+	__intel_ring_advance(req->engine);
 
 	return 0;
 }
@@ -1686,7 +1686,7 @@ i830_dispatch_execbuffer(struct drm_i915_gem_request *req,
 			 unsigned dispatch_flags)
 {
 	struct intel_ringbuffer *ring = req->ringbuf;
-	u32 cs_offset = req->ring->scratch.gtt_offset;
+	u32 cs_offset = req->engine->scratch.gtt_offset;
 	int ret;
 
 	ret = intel_ring_begin(req, 6);
@@ -2082,7 +2082,7 @@ int intel_ring_idle(struct intel_engine_cs *ring)
 
 int intel_ring_alloc_request_extras(struct drm_i915_gem_request *request)
 {
-	request->ringbuf = request->ring->buffer;
+	request->ringbuf = request->engine->buffer;
 	return 0;
 }
 
@@ -2136,7 +2136,7 @@ void intel_ring_reserved_space_end(struct intel_ringbuffer *ringbuf)
 static int wait_for_space(struct drm_i915_gem_request *req, int bytes)
 {
 	struct intel_ringbuffer *ringbuf = req->ringbuf;
-	struct intel_engine_cs *ring = req->ring;
+	struct intel_engine_cs *engine = req->engine;
 	struct drm_i915_gem_request *target;
 	unsigned space;
 	int ret;
@@ -2147,7 +2147,7 @@ static int wait_for_space(struct drm_i915_gem_request *req, int bytes)
 	/* The whole point of reserving space is to not wait! */
 	WARN_ON(ringbuf->reserved_in_use);
 
-	list_for_each_entry(target, &ring->request_list, list) {
+	list_for_each_entry(target, &engine->request_list, list) {
 		/*
 		 * The request queue is per-engine, so can contain requests
 		 * from multiple ringbuffers. Here, we must ignore any that
@@ -2163,7 +2163,7 @@ static int wait_for_space(struct drm_i915_gem_request *req, int bytes)
 			break;
 	}
 
-	if (WARN_ON(&target->list == &ring->request_list))
+	if (WARN_ON(&target->list == &engine->request_list))
 		return -ENOSPC;
 
 	ret = i915_wait_request(target);
@@ -2836,40 +2836,40 @@ int intel_init_vebox_ring_buffer(struct drm_device *dev)
 int
 intel_ring_flush_all_caches(struct drm_i915_gem_request *req)
 {
-	struct intel_engine_cs *ring = req->ring;
+	struct intel_engine_cs *engine = req->engine;
 	int ret;
 
-	if (!ring->gpu_caches_dirty)
+	if (!engine->gpu_caches_dirty)
 		return 0;
 
-	ret = ring->flush(req, 0, I915_GEM_GPU_DOMAINS);
+	ret = engine->flush(req, 0, I915_GEM_GPU_DOMAINS);
 	if (ret)
 		return ret;
 
 	trace_i915_gem_ring_flush(req, 0, I915_GEM_GPU_DOMAINS);
 
-	ring->gpu_caches_dirty = false;
+	engine->gpu_caches_dirty = false;
 	return 0;
 }
 
 int
 intel_ring_invalidate_all_caches(struct drm_i915_gem_request *req)
 {
-	struct intel_engine_cs *ring = req->ring;
+	struct intel_engine_cs *engine = req->engine;
 	uint32_t flush_domains;
 	int ret;
 
 	flush_domains = 0;
-	if (ring->gpu_caches_dirty)
+	if (engine->gpu_caches_dirty)
 		flush_domains = I915_GEM_GPU_DOMAINS;
 
-	ret = ring->flush(req, I915_GEM_GPU_DOMAINS, flush_domains);
+	ret = engine->flush(req, I915_GEM_GPU_DOMAINS, flush_domains);
 	if (ret)
 		return ret;
 
 	trace_i915_gem_ring_flush(req, I915_GEM_GPU_DOMAINS, flush_domains);
 
-	ring->gpu_caches_dirty = false;
+	engine->gpu_caches_dirty = false;
 	return 0;
 }
 
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 059/190] drm/i915: Rename request->ringbuf to request->ring
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (56 preceding siblings ...)
  2016-01-11  9:17 ` [PATCH 058/190] drm/i915: Rename request->ring to request->engine Chris Wilson
@ 2016-01-11  9:17 ` Chris Wilson
  2016-01-28 11:48   ` Tvrtko Ursulin
  2016-01-11  9:17 ` [PATCH 060/190] drm/i915: Rename backpointer from intel_ringbuffer to intel_engine_cs Chris Wilson
                   ` (28 subsequent siblings)
  86 siblings, 1 reply; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:17 UTC (permalink / raw)
  To: intel-gfx

Now that we have disambiguated ring and engine, we can use the clearer
and more consistent name for the intel_ringbuffer pointer in the
request.
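
As a minimal sketch of the end result (illustrative only, fields
abbreviated from i915_gem_request.h; "engine" came from the previous
patch, "ring" from this one):

	struct drm_i915_gem_request {
		...
		struct intel_engine_cs *engine;	/* the hw engine, was "ring" */
		struct intel_ringbuffer *ring;	/* the command buffer, was "ringbuf" */
		...
	};

so callers consistently use req->engine for the engine and req->ring
for the ringbuffer the request writes into.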

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c            |   8 +-
 drivers/gpu/drm/i915/i915_gem_context.c    |   2 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   4 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c        |   6 +-
 drivers/gpu/drm/i915/i915_gem_request.c    |  20 ++--
 drivers/gpu/drm/i915/i915_gem_request.h    |   2 +-
 drivers/gpu/drm/i915/i915_gpu_error.c      |  31 +++---
 drivers/gpu/drm/i915/i915_guc_submission.c |   4 +-
 drivers/gpu/drm/i915/intel_display.c       |  10 +-
 drivers/gpu/drm/i915/intel_lrc.c           | 152 ++++++++++++++---------------
 drivers/gpu/drm/i915/intel_mocs.c          |  34 +++----
 drivers/gpu/drm/i915/intel_overlay.c       |  42 ++++----
 drivers/gpu/drm/i915/intel_ringbuffer.c    |  86 ++++++++--------
 13 files changed, 198 insertions(+), 203 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 6622c9bb3af8..430c439ece26 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4083,11 +4083,11 @@ int i915_gem_l3_remap(struct drm_i915_gem_request *req, int slice)
 	 * at initialization time.
 	 */
 	for (i = 0; i < GEN7_L3LOG_SIZE / 4; i++) {
-		intel_ring_emit(req->ringbuf, MI_LOAD_REGISTER_IMM(1));
-		intel_ring_emit_reg(req->ringbuf, GEN7_L3LOG(slice, i));
-		intel_ring_emit(req->ringbuf, remap_info[i]);
+		intel_ring_emit(req->ring, MI_LOAD_REGISTER_IMM(1));
+		intel_ring_emit_reg(req->ring, GEN7_L3LOG(slice, i));
+		intel_ring_emit(req->ring, remap_info[i]);
 	}
-	intel_ring_advance(req->ringbuf);
+	intel_ring_advance(req->ring);
 
 	return ret;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index dece033cf604..5b4e77a80c19 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -519,7 +519,7 @@ i915_gem_context_get(struct drm_i915_file_private *file_priv, u32 id)
 static inline int
 mi_set_context(struct drm_i915_gem_request *req, u32 hw_flags)
 {
-	struct intel_ringbuffer *ring = req->ringbuf;
+	struct intel_ringbuffer *ring = req->ring;
 	u32 flags = hw_flags | MI_MM_SPACE_GTT;
 	const int num_rings =
 		/* Use an extended w/a on ivb+ if signalling from other rings */
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index e7df91f9a51f..a0f5a997c2f2 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1148,7 +1148,7 @@ i915_gem_execbuffer_retire_commands(struct i915_execbuffer_params *params)
 static int
 i915_reset_gen7_sol_offsets(struct drm_i915_gem_request *req)
 {
-	struct intel_ringbuffer *ring = req->ringbuf;
+	struct intel_ringbuffer *ring = req->ring;
 	int ret, i;
 
 	if (!IS_GEN7(req->i915) || req->engine->id != RCS) {
@@ -1229,7 +1229,7 @@ i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
 			       struct drm_i915_gem_execbuffer2 *args,
 			       struct list_head *vmas)
 {
-	struct intel_ringbuffer *ring = params->request->ringbuf;
+	struct intel_ringbuffer *ring = params->request->ring;
 	struct drm_i915_private *dev_priv = params->request->i915;
 	u64 exec_start, exec_len;
 	int instp_mode;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index cb7cb59d4c4a..38c109cda904 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -656,7 +656,7 @@ static int gen8_write_pdp(struct drm_i915_gem_request *req,
 			  unsigned entry,
 			  dma_addr_t addr)
 {
-	struct intel_ringbuffer *ring = req->ringbuf;
+	struct intel_ringbuffer *ring = req->ring;
 	int ret;
 
 	BUG_ON(entry >= 4);
@@ -1648,7 +1648,7 @@ static uint32_t get_pd_offset(struct i915_hw_ppgtt *ppgtt)
 static int hsw_mm_switch(struct i915_hw_ppgtt *ppgtt,
 			 struct drm_i915_gem_request *req)
 {
-	struct intel_ringbuffer *ring = req->ringbuf;
+	struct intel_ringbuffer *ring = req->ring;
 	int ret;
 
 	/* NB: TLBs must be flushed and invalidated before a switch */
@@ -1686,7 +1686,7 @@ static int vgpu_mm_switch(struct i915_hw_ppgtt *ppgtt,
 static int gen7_mm_switch(struct i915_hw_ppgtt *ppgtt,
 			  struct drm_i915_gem_request *req)
 {
-	struct intel_ringbuffer *ring = req->ringbuf;
+	struct intel_ringbuffer *ring = req->ring;
 	int ret;
 
 	/* NB: TLBs must be flushed and invalidated before a switch */
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 8adf2c134048..4cc64d9cca12 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -255,7 +255,7 @@ int i915_gem_request_alloc(struct intel_engine_cs *engine,
 	 * to be redone if the request is not actually submitted straight
 	 * away, e.g. because a GPU scheduler has deferred it.
 	 */
-	intel_ring_reserved_space_reserve(req->ringbuf,
+	intel_ring_reserved_space_reserve(req->ring,
 					  MIN_SPACE_FOR_ADD_REQUEST);
 	ret = intel_ring_begin(req, 0);
 	if (ret) {
@@ -328,7 +328,7 @@ static void __i915_gem_request_release(struct drm_i915_gem_request *request)
 
 void i915_gem_request_cancel(struct drm_i915_gem_request *req)
 {
-	intel_ring_reserved_space_cancel(req->ringbuf);
+	intel_ring_reserved_space_cancel(req->ring);
 	if (i915.enable_execlists) {
 		if (req->ctx != req->engine->default_context)
 			intel_lr_context_unpin(req);
@@ -349,7 +349,7 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
 	 * Note this requires that we are always called in request
 	 * completion order.
 	 */
-	request->ringbuf->last_retired_head = request->postfix;
+	request->ring->last_retired_head = request->postfix;
 	__i915_gem_request_release(request);
 }
 
@@ -401,23 +401,23 @@ void __i915_add_request(struct drm_i915_gem_request *request,
 			struct drm_i915_gem_object *obj,
 			bool flush_caches)
 {
-	struct intel_ringbuffer *ringbuf;
+	struct intel_ringbuffer *ring;
 	u32 request_start;
 	int ret;
 
 	if (WARN_ON(request == NULL))
 		return;
 
-	ringbuf = request->ringbuf;
+	ring = request->ring;
 
 	/*
 	 * To ensure that this call will not fail, space for its emissions
 	 * should already have been reserved in the ring buffer. Let the ring
 	 * know that it is time to use that space up.
 	 */
-	intel_ring_reserved_space_use(ringbuf);
+	intel_ring_reserved_space_use(ring);
 
-	request_start = intel_ring_get_tail(ringbuf);
+	request_start = intel_ring_get_tail(ring);
 	/*
 	 * Emit any outstanding flushes - execbuf can fail to emit the flush
 	 * after having emitted the batchbuffer command. Hence we need to fix
@@ -439,14 +439,14 @@ void __i915_add_request(struct drm_i915_gem_request *request,
 	 * GPU processing the request, we never over-estimate the
 	 * position of the head.
 	 */
-	request->postfix = intel_ring_get_tail(ringbuf);
+	request->postfix = intel_ring_get_tail(ring);
 
 	if (i915.enable_execlists)
 		ret = request->engine->emit_request(request);
 	else {
 		ret = request->engine->add_request(request);
 
-		request->tail = intel_ring_get_tail(ringbuf);
+		request->tail = intel_ring_get_tail(ring);
 	}
 	/* Not allowed to fail! */
 	WARN(ret, "emit|add_request failed: %d!\n", ret);
@@ -471,7 +471,7 @@ void __i915_add_request(struct drm_i915_gem_request *request,
 	i915_gem_mark_busy(request->i915);
 
 	/* Sanity check that the reserved size was large enough. */
-	intel_ring_reserved_space_end(ringbuf);
+	intel_ring_reserved_space_end(ring);
 }
 
 
diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index 802862e5007d..bd17e3a9a71d 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -79,7 +79,7 @@ struct drm_i915_gem_request {
 	 * context.
 	 */
 	struct intel_context *ctx;
-	struct intel_ringbuffer *ringbuf;
+	struct intel_ringbuffer *ring;
 
 	/** Batch buffer related to this request if any (used for
 	    error state dump only) */
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 5bf208d8009e..b47ca1b7041f 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -993,21 +993,21 @@ static void i915_gem_record_rings(struct drm_device *dev,
 	int i, count;
 
 	for (i = 0; i < I915_NUM_RINGS; i++) {
-		struct intel_engine_cs *ring = &dev_priv->ring[i];
+		struct intel_engine_cs *engine = &dev_priv->ring[i];
 
 		error->ring[i].pid = -1;
 
-		if (ring->dev == NULL)
+		if (engine->dev == NULL)
 			continue;
 
 		error->ring[i].valid = true;
 
-		i915_record_ring_state(dev, error, ring, &error->ring[i]);
+		i915_record_ring_state(dev, error, engine, &error->ring[i]);
 
-		request = i915_gem_find_active_request(ring);
+		request = i915_gem_find_active_request(engine);
 		if (request) {
 			struct i915_address_space *vm;
-			struct intel_ringbuffer *rb;
+			struct intel_ringbuffer *ring;
 
 			vm = request->ctx && request->ctx->ppgtt ?
 				&request->ctx->ppgtt->base :
@@ -1022,10 +1022,10 @@ static void i915_gem_record_rings(struct drm_device *dev,
 							 request->batch_obj,
 							 vm);
 
-			if (HAS_BROKEN_CS_TLB(dev_priv->dev))
+			if (HAS_BROKEN_CS_TLB(dev_priv))
 				error->ring[i].wa_batchbuffer =
 					i915_error_ggtt_object_create(dev_priv,
-							     ring->scratch.obj);
+								      engine->scratch.obj);
 
 			if (request->pid) {
 				struct task_struct *task;
@@ -1041,21 +1041,22 @@ static void i915_gem_record_rings(struct drm_device *dev,
 
 			error->simulated |= request->ctx->flags & CONTEXT_NO_ERROR_CAPTURE;
 
-			rb = request->ringbuf;
-			error->ring[i].cpu_ring_head = rb->head;
-			error->ring[i].cpu_ring_tail = rb->tail;
+			ring = request->ring;
+			error->ring[i].cpu_ring_head = ring->head;
+			error->ring[i].cpu_ring_tail = ring->tail;
 			error->ring[i].ringbuffer =
 				i915_error_ggtt_object_create(dev_priv,
-							      rb->obj);
+							      ring->obj);
 		}
 
 		error->ring[i].hws_page =
-			i915_error_ggtt_object_create(dev_priv, ring->status_page.obj);
+			i915_error_ggtt_object_create(dev_priv,
+						      engine->status_page.obj);
 
-		i915_gem_record_active_context(ring, error, &error->ring[i]);
+		i915_gem_record_active_context(engine, error, &error->ring[i]);
 
 		count = 0;
-		list_for_each_entry(request, &ring->request_list, list)
+		list_for_each_entry(request, &engine->request_list, list)
 			count++;
 
 		error->ring[i].num_requests = count;
@@ -1068,7 +1069,7 @@ static void i915_gem_record_rings(struct drm_device *dev,
 		}
 
 		count = 0;
-		list_for_each_entry(request, &ring->request_list, list) {
+		list_for_each_entry(request, &engine->request_list, list) {
 			struct drm_i915_error_request *erq;
 
 			if (count >= error->ring[i].num_requests) {
diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index eaf680ce5c9c..e82cc9182dfa 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -551,7 +551,7 @@ static int guc_add_workqueue_item(struct i915_guc_client *gc,
 	wqi->context_desc = (u32)intel_lr_context_descriptor(rq->ctx, rq->engine);
 
 	/* The GuC firmware wants the tail index in QWords, not bytes */
-	tail = rq->ringbuf->tail >> 3;
+	tail = rq->ring->tail >> 3;
 	wqi->ring_tail = tail << WQ_RING_TAIL_SHIFT;
 	wqi->fence_id = 0; /*XXX: what fence to be here */
 
@@ -567,7 +567,7 @@ static void lr_context_update(struct drm_i915_gem_request *rq)
 {
 	enum intel_ring_id ring_id = rq->engine->id;
 	struct drm_i915_gem_object *ctx_obj = rq->ctx->engine[ring_id].state;
-	struct drm_i915_gem_object *rb_obj = rq->ringbuf->obj;
+	struct drm_i915_gem_object *rb_obj = rq->ring->obj;
 	struct page *page;
 	uint32_t *reg_state;
 
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 323b0d905c89..0d42356f15b4 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -11052,7 +11052,7 @@ static int intel_gen2_queue_flip(struct drm_device *dev,
 				 struct drm_i915_gem_request *req,
 				 uint32_t flags)
 {
-	struct intel_ringbuffer *ring = req->ringbuf;
+	struct intel_ringbuffer *ring = req->ring;
 	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
 	u32 flip_mask;
 	int ret;
@@ -11087,7 +11087,7 @@ static int intel_gen3_queue_flip(struct drm_device *dev,
 				 struct drm_i915_gem_request *req,
 				 uint32_t flags)
 {
-	struct intel_ringbuffer *ring = req->ringbuf;
+	struct intel_ringbuffer *ring = req->ring;
 	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
 	u32 flip_mask;
 	int ret;
@@ -11119,7 +11119,7 @@ static int intel_gen4_queue_flip(struct drm_device *dev,
 				 struct drm_i915_gem_request *req,
 				 uint32_t flags)
 {
-	struct intel_ringbuffer *ring = req->ringbuf;
+	struct intel_ringbuffer *ring = req->ring;
 	struct drm_i915_private *dev_priv = req->i915;
 	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
 	uint32_t pf, pipesrc;
@@ -11158,7 +11158,7 @@ static int intel_gen6_queue_flip(struct drm_device *dev,
 				 struct drm_i915_gem_request *req,
 				 uint32_t flags)
 {
-	struct intel_ringbuffer *ring = req->ringbuf;
+	struct intel_ringbuffer *ring = req->ring;
 	struct drm_i915_private *dev_priv = req->i915;
 	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
 	uint32_t pf, pipesrc;
@@ -11194,7 +11194,7 @@ static int intel_gen7_queue_flip(struct drm_device *dev,
 				 struct drm_i915_gem_request *req,
 				 uint32_t flags)
 {
-	struct intel_ringbuffer *ring = req->ringbuf;
+	struct intel_ringbuffer *ring = req->ring;
 	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
 	uint32_t plane_bit = 0;
 	int len, ret;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 1b70a76df31d..87d325b6e7dc 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -360,7 +360,7 @@ static int execlists_update_context(struct drm_i915_gem_request *rq)
 {
 	struct i915_hw_ppgtt *ppgtt = rq->ctx->ppgtt;
 	struct drm_i915_gem_object *ctx_obj = rq->ctx->engine[rq->engine->id].state;
-	struct drm_i915_gem_object *rb_obj = rq->ringbuf->obj;
+	struct drm_i915_gem_object *rb_obj = rq->ring->obj;
 	struct page *page;
 	uint32_t *reg_state;
 
@@ -671,7 +671,7 @@ int intel_logical_ring_alloc_request_extras(struct drm_i915_gem_request *request
 {
 	int ret;
 
-	request->ringbuf = request->ctx->engine[request->engine->id].ringbuf;
+	request->ring = request->ctx->engine[request->engine->id].ringbuf;
 
 	if (request->ctx != request->engine->default_context) {
 		ret = intel_lr_context_pin(request);
@@ -709,8 +709,8 @@ intel_logical_ring_advance_and_submit(struct drm_i915_gem_request *request)
 {
 	struct drm_i915_private *dev_priv = request->i915;
 
-	intel_ring_advance(request->ringbuf);
-	request->tail = request->ringbuf->tail;
+	intel_ring_advance(request->ring);
+	request->tail = request->ring->tail;
 
 	if (dev_priv->guc.execbuf_client)
 		i915_guc_submit(dev_priv->guc.execbuf_client, request);
@@ -740,9 +740,9 @@ int intel_execlists_submission(struct i915_execbuffer_params *params,
 			       struct list_head *vmas)
 {
 	struct drm_device       *dev = params->dev;
-	struct intel_engine_cs  *ring = params->ring;
+	struct intel_engine_cs  *engine = params->ring;
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct intel_ringbuffer *ringbuf = params->ctx->engine[ring->id].ringbuf;
+	struct intel_ringbuffer *ring = params->request->ring;
 	u64 exec_start;
 	int instp_mode;
 	u32 instp_mask;
@@ -754,7 +754,7 @@ int intel_execlists_submission(struct i915_execbuffer_params *params,
 	case I915_EXEC_CONSTANTS_REL_GENERAL:
 	case I915_EXEC_CONSTANTS_ABSOLUTE:
 	case I915_EXEC_CONSTANTS_REL_SURFACE:
-		if (instp_mode != 0 && ring != &dev_priv->ring[RCS]) {
+		if (instp_mode != 0 && engine->id != RCS) {
 			DRM_DEBUG("non-0 rel constants mode on non-RCS\n");
 			return -EINVAL;
 		}
@@ -783,17 +783,17 @@ int intel_execlists_submission(struct i915_execbuffer_params *params,
 	if (ret)
 		return ret;
 
-	if (ring == &dev_priv->ring[RCS] &&
+	if (engine->id == RCS &&
 	    instp_mode != dev_priv->relative_constants_mode) {
 		ret = intel_ring_begin(params->request, 4);
 		if (ret)
 			return ret;
 
-		intel_ring_emit(ringbuf, MI_NOOP);
-		intel_ring_emit(ringbuf, MI_LOAD_REGISTER_IMM(1));
-		intel_ring_emit_reg(ringbuf, INSTPM);
-		intel_ring_emit(ringbuf, instp_mask << 16 | instp_mode);
-		intel_ring_advance(ringbuf);
+		intel_ring_emit(ring, MI_NOOP);
+		intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
+		intel_ring_emit_reg(ring, INSTPM);
+		intel_ring_emit(ring, instp_mask << 16 | instp_mode);
+		intel_ring_advance(ring);
 
 		dev_priv->relative_constants_mode = instp_mode;
 	}
@@ -801,7 +801,7 @@ int intel_execlists_submission(struct i915_execbuffer_params *params,
 	exec_start = params->batch_obj_vm_offset +
 		     args->batch_start_offset;
 
-	ret = ring->emit_bb_start(params->request, exec_start, params->dispatch_flags);
+	ret = engine->emit_bb_start(params->request, exec_start, params->dispatch_flags);
 	if (ret)
 		return ret;
 
@@ -880,13 +880,12 @@ static int intel_lr_context_do_pin(struct intel_engine_cs *ring,
 		struct drm_i915_gem_object *ctx_obj,
 		struct intel_ringbuffer *ringbuf)
 {
-	struct drm_device *dev = ring->dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct drm_i915_private *dev_priv = ring->i915;
 	int ret = 0;
 
 	WARN_ON(!mutex_is_locked(&ring->dev->struct_mutex));
 	ret = i915_gem_obj_ggtt_pin(ctx_obj, GEN8_LR_CONTEXT_ALIGN,
-			PIN_OFFSET_BIAS | GUC_WOPCM_TOP);
+				    PIN_OFFSET_BIAS | GUC_WOPCM_TOP);
 	if (ret)
 		return ret;
 
@@ -918,7 +917,7 @@ static int intel_lr_context_pin(struct drm_i915_gem_request *rq)
 
 	ret = intel_lr_context_do_pin(rq->engine,
 				      rq->ctx->engine[engine].state,
-				      rq->ringbuf);
+				      rq->ring);
 	if (ret) {
 		rq->ctx->engine[engine].pin_count = 0;
 		return ret;
@@ -932,12 +931,12 @@ void intel_lr_context_unpin(struct drm_i915_gem_request *rq)
 {
 	int engine = rq->engine->id;
 	struct drm_i915_gem_object *ctx_obj = rq->ctx->engine[engine].state;
-	struct intel_ringbuffer *ringbuf = rq->ringbuf;
+	struct intel_ringbuffer *ring = rq->ring;
 
 	if (ctx_obj) {
 		WARN_ON(!mutex_is_locked(&rq->i915->dev->struct_mutex));
 		if (--rq->ctx->engine[engine].pin_count == 0) {
-			intel_unpin_ringbuffer_obj(ringbuf);
+			intel_unpin_ringbuffer_obj(ring);
 			i915_gem_object_ggtt_unpin(ctx_obj);
 			i915_gem_context_unreference(rq->ctx);
 		}
@@ -948,7 +947,7 @@ static int intel_logical_ring_workarounds_emit(struct drm_i915_gem_request *req)
 {
 	int ret, i;
 	struct intel_engine_cs *engine = req->engine;
-	struct intel_ringbuffer *ringbuf = req->ringbuf;
+	struct intel_ringbuffer *ring = req->ring;
 	struct drm_i915_private *dev_priv = req->i915;
 	struct i915_workarounds *w = &dev_priv->workarounds;
 
@@ -964,14 +963,14 @@ static int intel_logical_ring_workarounds_emit(struct drm_i915_gem_request *req)
 	if (ret)
 		return ret;
 
-	intel_ring_emit(ringbuf, MI_LOAD_REGISTER_IMM(w->count));
+	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(w->count));
 	for (i = 0; i < w->count; i++) {
-		intel_ring_emit_reg(ringbuf, w->reg[i].addr);
-		intel_ring_emit(ringbuf, w->reg[i].value);
+		intel_ring_emit_reg(ring, w->reg[i].addr);
+		intel_ring_emit(ring, w->reg[i].value);
 	}
-	intel_ring_emit(ringbuf, MI_NOOP);
+	intel_ring_emit(ring, MI_NOOP);
 
-	intel_ring_advance(ringbuf);
+	intel_ring_advance(ring);
 
 	engine->gpu_caches_dirty = true;
 	ret = logical_ring_flush_all_caches(req);
@@ -1418,7 +1417,7 @@ static int intel_logical_ring_emit_pdps(struct drm_i915_gem_request *req)
 {
 	struct i915_hw_ppgtt *ppgtt = req->ctx->ppgtt;
 	struct intel_engine_cs *engine = req->engine;
-	struct intel_ringbuffer *ringbuf = req->ringbuf;
+	struct intel_ringbuffer *ring = req->ring;
 	const int num_lri_cmds = GEN8_LEGACY_PDPES * 2;
 	int i, ret;
 
@@ -1426,18 +1425,18 @@ static int intel_logical_ring_emit_pdps(struct drm_i915_gem_request *req)
 	if (ret)
 		return ret;
 
-	intel_ring_emit(ringbuf, MI_LOAD_REGISTER_IMM(num_lri_cmds));
+	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(num_lri_cmds));
 	for (i = GEN8_LEGACY_PDPES - 1; i >= 0; i--) {
 		const dma_addr_t pd_daddr = i915_page_dir_dma_addr(ppgtt, i);
 
-		intel_ring_emit_reg(ringbuf, GEN8_RING_PDP_UDW(engine, i));
-		intel_ring_emit(ringbuf, upper_32_bits(pd_daddr));
-		intel_ring_emit_reg(ringbuf, GEN8_RING_PDP_LDW(engine, i));
-		intel_ring_emit(ringbuf, lower_32_bits(pd_daddr));
+		intel_ring_emit_reg(ring, GEN8_RING_PDP_UDW(engine, i));
+		intel_ring_emit(ring, upper_32_bits(pd_daddr));
+		intel_ring_emit_reg(ring, GEN8_RING_PDP_LDW(engine, i));
+		intel_ring_emit(ring, lower_32_bits(pd_daddr));
 	}
 
-	intel_ring_emit(ringbuf, MI_NOOP);
-	intel_ring_advance(ringbuf);
+	intel_ring_emit(ring, MI_NOOP);
+	intel_ring_advance(ring);
 
 	return 0;
 }
@@ -1445,7 +1444,7 @@ static int intel_logical_ring_emit_pdps(struct drm_i915_gem_request *req)
 static int gen8_emit_bb_start(struct drm_i915_gem_request *req,
 			      u64 offset, unsigned dispatch_flags)
 {
-	struct intel_ringbuffer *ringbuf = req->ringbuf;
+	struct intel_ringbuffer *ring = req->ring;
 	bool ppgtt = !(dispatch_flags & I915_DISPATCH_SECURE);
 	int ret;
 
@@ -1472,14 +1471,14 @@ static int gen8_emit_bb_start(struct drm_i915_gem_request *req,
 		return ret;
 
 	/* FIXME(BDW): Address space and security selectors. */
-	intel_ring_emit(ringbuf, MI_BATCH_BUFFER_START_GEN8 |
+	intel_ring_emit(ring, MI_BATCH_BUFFER_START_GEN8 |
 			(ppgtt<<8) |
 			(dispatch_flags & I915_DISPATCH_RS ?
 			 MI_BATCH_RESOURCE_STREAMER : 0));
-	intel_ring_emit(ringbuf, lower_32_bits(offset));
-	intel_ring_emit(ringbuf, upper_32_bits(offset));
-	intel_ring_emit(ringbuf, MI_NOOP);
-	intel_ring_advance(ringbuf);
+	intel_ring_emit(ring, lower_32_bits(offset));
+	intel_ring_emit(ring, upper_32_bits(offset));
+	intel_ring_emit(ring, MI_NOOP);
+	intel_ring_advance(ring);
 
 	return 0;
 }
@@ -1504,10 +1503,7 @@ static int gen8_emit_flush(struct drm_i915_gem_request *request,
 			   u32 invalidate_domains,
 			   u32 unused)
 {
-	struct intel_ringbuffer *ringbuf = request->ringbuf;
-	struct intel_engine_cs *ring = ringbuf->ring;
-	struct drm_device *dev = ring->dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_ringbuffer *ring = request->ring;
 	uint32_t cmd;
 	int ret;
 
@@ -1526,17 +1522,17 @@ static int gen8_emit_flush(struct drm_i915_gem_request *request,
 
 	if (invalidate_domains & I915_GEM_GPU_DOMAINS) {
 		cmd |= MI_INVALIDATE_TLB;
-		if (ring == &dev_priv->ring[VCS])
+		if (request->engine->id == VCS)
 			cmd |= MI_INVALIDATE_BSD;
 	}
 
-	intel_ring_emit(ringbuf, cmd);
-	intel_ring_emit(ringbuf,
+	intel_ring_emit(ring, cmd);
+	intel_ring_emit(ring,
 			I915_GEM_HWS_SCRATCH_ADDR |
 			MI_FLUSH_DW_USE_GTT);
-	intel_ring_emit(ringbuf, 0); /* upper addr */
-	intel_ring_emit(ringbuf, 0); /* value */
-	intel_ring_advance(ringbuf);
+	intel_ring_emit(ring, 0); /* upper addr */
+	intel_ring_emit(ring, 0); /* value */
+	intel_ring_advance(ring);
 
 	return 0;
 }
@@ -1545,9 +1541,8 @@ static int gen8_emit_flush_render(struct drm_i915_gem_request *request,
 				  u32 invalidate_domains,
 				  u32 flush_domains)
 {
-	struct intel_ringbuffer *ringbuf = request->ringbuf;
-	struct intel_engine_cs *ring = ringbuf->ring;
-	u32 scratch_addr = ring->scratch.gtt_offset + 2 * CACHELINE_BYTES;
+	struct intel_ringbuffer *ring = request->ring;
+	u32 scratch_addr = request->engine->scratch.gtt_offset + 2 * CACHELINE_BYTES;
 	bool vf_flush_wa = false;
 	u32 flags = 0;
 	int ret;
@@ -1574,7 +1569,7 @@ static int gen8_emit_flush_render(struct drm_i915_gem_request *request,
 		 * On GEN9: before VF_CACHE_INVALIDATE we need to emit a NULL
 		 * pipe control.
 		 */
-		if (IS_GEN9(ring->dev))
+		if (IS_GEN9(request->i915))
 			vf_flush_wa = true;
 	}
 
@@ -1583,21 +1578,21 @@ static int gen8_emit_flush_render(struct drm_i915_gem_request *request,
 		return ret;
 
 	if (vf_flush_wa) {
-		intel_ring_emit(ringbuf, GFX_OP_PIPE_CONTROL(6));
-		intel_ring_emit(ringbuf, 0);
-		intel_ring_emit(ringbuf, 0);
-		intel_ring_emit(ringbuf, 0);
-		intel_ring_emit(ringbuf, 0);
-		intel_ring_emit(ringbuf, 0);
+		intel_ring_emit(ring, GFX_OP_PIPE_CONTROL(6));
+		intel_ring_emit(ring, 0);
+		intel_ring_emit(ring, 0);
+		intel_ring_emit(ring, 0);
+		intel_ring_emit(ring, 0);
+		intel_ring_emit(ring, 0);
 	}
 
-	intel_ring_emit(ringbuf, GFX_OP_PIPE_CONTROL(6));
-	intel_ring_emit(ringbuf, flags);
-	intel_ring_emit(ringbuf, scratch_addr);
-	intel_ring_emit(ringbuf, 0);
-	intel_ring_emit(ringbuf, 0);
-	intel_ring_emit(ringbuf, 0);
-	intel_ring_advance(ringbuf);
+	intel_ring_emit(ring, GFX_OP_PIPE_CONTROL(6));
+	intel_ring_emit(ring, flags);
+	intel_ring_emit(ring, scratch_addr);
+	intel_ring_emit(ring, 0);
+	intel_ring_emit(ring, 0);
+	intel_ring_emit(ring, 0);
+	intel_ring_advance(ring);
 
 	return 0;
 }
@@ -1625,8 +1620,7 @@ gen6_seqno_barrier(struct intel_engine_cs *ring)
 
 static int gen8_emit_request(struct drm_i915_gem_request *request)
 {
-	struct intel_ringbuffer *ringbuf = request->ringbuf;
-	struct intel_engine_cs *ring = ringbuf->ring;
+	struct intel_ringbuffer *ring = request->ring;
 	u32 cmd;
 	int ret;
 
@@ -1642,23 +1636,23 @@ static int gen8_emit_request(struct drm_i915_gem_request *request)
 	cmd = MI_STORE_DWORD_IMM_GEN4;
 	cmd |= MI_GLOBAL_GTT;
 
-	intel_ring_emit(ringbuf, cmd);
-	intel_ring_emit(ringbuf,
-			(ring->status_page.gfx_addr +
+	intel_ring_emit(ring, cmd);
+	intel_ring_emit(ring,
+			(request->engine->status_page.gfx_addr +
 			 (I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT)));
-	intel_ring_emit(ringbuf, 0);
-	intel_ring_emit(ringbuf, request->fence.seqno);
-	intel_ring_emit(ringbuf, MI_USER_INTERRUPT);
-	intel_ring_emit(ringbuf, MI_NOOP);
+	intel_ring_emit(ring, 0);
+	intel_ring_emit(ring, request->fence.seqno);
+	intel_ring_emit(ring, MI_USER_INTERRUPT);
+	intel_ring_emit(ring, MI_NOOP);
 	intel_logical_ring_advance_and_submit(request);
 
 	/*
 	 * Here we add two extra NOOPs as padding to avoid
 	 * lite restore of a context with HEAD==TAIL.
 	 */
-	intel_ring_emit(ringbuf, MI_NOOP);
-	intel_ring_emit(ringbuf, MI_NOOP);
-	intel_ring_advance(ringbuf);
+	intel_ring_emit(ring, MI_NOOP);
+	intel_ring_emit(ring, MI_NOOP);
+	intel_ring_advance(ring);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/intel_mocs.c b/drivers/gpu/drm/i915/intel_mocs.c
index 40041bebc3dc..039c7405f640 100644
--- a/drivers/gpu/drm/i915/intel_mocs.c
+++ b/drivers/gpu/drm/i915/intel_mocs.c
@@ -191,9 +191,9 @@ static i915_reg_t mocs_register(enum intel_ring_id ring, int index)
  */
 static int emit_mocs_control_table(struct drm_i915_gem_request *req,
 				   const struct drm_i915_mocs_table *table,
-				   enum intel_ring_id ring)
+				   enum intel_ring_id id)
 {
-	struct intel_ringbuffer *ringbuf = req->ringbuf;
+	struct intel_ringbuffer *ring = req->ring;
 	unsigned int index;
 	int ret;
 
@@ -204,11 +204,11 @@ static int emit_mocs_control_table(struct drm_i915_gem_request *req,
 	if (ret)
 		return ret;
 
-	intel_ring_emit(ringbuf, MI_LOAD_REGISTER_IMM(GEN9_NUM_MOCS_ENTRIES));
+	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(GEN9_NUM_MOCS_ENTRIES));
 
 	for (index = 0; index < table->size; index++) {
-		intel_ring_emit_reg(ringbuf, mocs_register(ring, index));
-		intel_ring_emit(ringbuf, table->table[index].control_value);
+		intel_ring_emit_reg(ring, mocs_register(id, index));
+		intel_ring_emit(ring, table->table[index].control_value);
 	}
 
 	/*
@@ -220,12 +220,12 @@ static int emit_mocs_control_table(struct drm_i915_gem_request *req,
 	 * that value to all the used entries.
 	 */
 	for (; index < GEN9_NUM_MOCS_ENTRIES; index++) {
-		intel_ring_emit_reg(ringbuf, mocs_register(ring, index));
-		intel_ring_emit(ringbuf, table->table[0].control_value);
+		intel_ring_emit_reg(ring, mocs_register(id, index));
+		intel_ring_emit(ring, table->table[0].control_value);
 	}
 
-	intel_ring_emit(ringbuf, MI_NOOP);
-	intel_ring_advance(ringbuf);
+	intel_ring_emit(ring, MI_NOOP);
+	intel_ring_advance(ring);
 
 	return 0;
 }
@@ -244,7 +244,7 @@ static int emit_mocs_control_table(struct drm_i915_gem_request *req,
 static int emit_mocs_l3cc_table(struct drm_i915_gem_request *req,
 				const struct drm_i915_mocs_table *table)
 {
-	struct intel_ringbuffer *ringbuf = req->ringbuf;
+	struct intel_ringbuffer *ring = req->ring;
 	unsigned int count;
 	unsigned int i;
 	u32 value;
@@ -259,15 +259,15 @@ static int emit_mocs_l3cc_table(struct drm_i915_gem_request *req,
 	if (ret)
 		return ret;
 
-	intel_ring_emit(ringbuf,
+	intel_ring_emit(ring,
 			MI_LOAD_REGISTER_IMM(GEN9_NUM_MOCS_ENTRIES / 2));
 
 	for (i = 0, count = 0; i < table->size / 2; i++, count += 2) {
 		value = (table->table[count].l3cc_value & 0xffff) |
 			((table->table[count + 1].l3cc_value & 0xffff) << 16);
 
-		intel_ring_emit_reg(ringbuf, GEN9_LNCFCMOCS(i));
-		intel_ring_emit(ringbuf, value);
+		intel_ring_emit_reg(ring, GEN9_LNCFCMOCS(i));
+		intel_ring_emit(ring, value);
 	}
 
 	if (table->size & 0x01) {
@@ -283,14 +283,14 @@ static int emit_mocs_l3cc_table(struct drm_i915_gem_request *req,
 	 * they are reserved by the hardware.
 	 */
 	for (; i < GEN9_NUM_MOCS_ENTRIES / 2; i++) {
-		intel_ring_emit_reg(ringbuf, GEN9_LNCFCMOCS(i));
-		intel_ring_emit(ringbuf, value);
+		intel_ring_emit_reg(ring, GEN9_LNCFCMOCS(i));
+		intel_ring_emit(ring, value);
 
 		value = filler;
 	}
 
-	intel_ring_emit(ringbuf, MI_NOOP);
-	intel_ring_advance(ringbuf);
+	intel_ring_emit(ring, MI_NOOP);
+	intel_ring_advance(ring);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/intel_overlay.c b/drivers/gpu/drm/i915/intel_overlay.c
index 6dca0e470e61..cb73d16848b0 100644
--- a/drivers/gpu/drm/i915/intel_overlay.c
+++ b/drivers/gpu/drm/i915/intel_overlay.c
@@ -252,11 +252,11 @@ static int intel_overlay_on(struct intel_overlay *overlay)
 
 	overlay->active = true;
 
-	intel_ring_emit(req->ringbuf, MI_OVERLAY_FLIP | MI_OVERLAY_ON);
-	intel_ring_emit(req->ringbuf, overlay->flip_addr | OFC_UPDATE);
-	intel_ring_emit(req->ringbuf, MI_WAIT_FOR_EVENT | MI_WAIT_FOR_OVERLAY_FLIP);
-	intel_ring_emit(req->ringbuf, MI_NOOP);
-	intel_ring_advance(req->ringbuf);
+	intel_ring_emit(req->ring, MI_OVERLAY_FLIP | MI_OVERLAY_ON);
+	intel_ring_emit(req->ring, overlay->flip_addr | OFC_UPDATE);
+	intel_ring_emit(req->ring, MI_WAIT_FOR_EVENT | MI_WAIT_FOR_OVERLAY_FLIP);
+	intel_ring_emit(req->ring, MI_NOOP);
+	intel_ring_advance(req->ring);
 
 	return intel_overlay_do_wait_request(overlay, req, NULL);
 }
@@ -293,9 +293,9 @@ static int intel_overlay_continue(struct intel_overlay *overlay,
 		return ret;
 	}
 
-	intel_ring_emit(req->ringbuf, MI_OVERLAY_FLIP | MI_OVERLAY_CONTINUE);
-	intel_ring_emit(req->ringbuf, flip_addr);
-	intel_ring_advance(req->ringbuf);
+	intel_ring_emit(req->ring, MI_OVERLAY_FLIP | MI_OVERLAY_CONTINUE);
+	intel_ring_emit(req->ring, flip_addr);
+	intel_ring_advance(req->ring);
 
 	WARN_ON(overlay->last_flip_req);
 	i915_gem_request_assign(&overlay->last_flip_req, req);
@@ -360,22 +360,22 @@ static int intel_overlay_off(struct intel_overlay *overlay)
 	}
 
 	/* wait for overlay to go idle */
-	intel_ring_emit(req->ringbuf, MI_OVERLAY_FLIP | MI_OVERLAY_CONTINUE);
-	intel_ring_emit(req->ringbuf, flip_addr);
-	intel_ring_emit(req->ringbuf, MI_WAIT_FOR_EVENT | MI_WAIT_FOR_OVERLAY_FLIP);
+	intel_ring_emit(req->ring, MI_OVERLAY_FLIP | MI_OVERLAY_CONTINUE);
+	intel_ring_emit(req->ring, flip_addr);
+	intel_ring_emit(req->ring, MI_WAIT_FOR_EVENT | MI_WAIT_FOR_OVERLAY_FLIP);
 	/* turn overlay off */
 	if (IS_I830(dev)) {
 		/* Workaround: Don't disable the overlay fully, since otherwise
 		 * it dies on the next OVERLAY_ON cmd. */
-		intel_ring_emit(req->ringbuf, MI_NOOP);
-		intel_ring_emit(req->ringbuf, MI_NOOP);
-		intel_ring_emit(req->ringbuf, MI_NOOP);
+		intel_ring_emit(req->ring, MI_NOOP);
+		intel_ring_emit(req->ring, MI_NOOP);
+		intel_ring_emit(req->ring, MI_NOOP);
 	} else {
-		intel_ring_emit(req->ringbuf, MI_OVERLAY_FLIP | MI_OVERLAY_OFF);
-		intel_ring_emit(req->ringbuf, flip_addr);
-		intel_ring_emit(req->ringbuf, MI_WAIT_FOR_EVENT | MI_WAIT_FOR_OVERLAY_FLIP);
+		intel_ring_emit(req->ring, MI_OVERLAY_FLIP | MI_OVERLAY_OFF);
+		intel_ring_emit(req->ring, flip_addr);
+		intel_ring_emit(req->ring, MI_WAIT_FOR_EVENT | MI_WAIT_FOR_OVERLAY_FLIP);
 	}
-	intel_ring_advance(req->ringbuf);
+	intel_ring_advance(req->ring);
 
 	return intel_overlay_do_wait_request(overlay, req, intel_overlay_off_tail);
 }
@@ -433,9 +433,9 @@ static int intel_overlay_release_old_vid(struct intel_overlay *overlay)
 			return ret;
 		}
 
-		intel_ring_emit(req->ringbuf, MI_WAIT_FOR_EVENT | MI_WAIT_FOR_OVERLAY_FLIP);
-		intel_ring_emit(req->ringbuf, MI_NOOP);
-		intel_ring_advance(req->ringbuf);
+		intel_ring_emit(req->ring, MI_WAIT_FOR_EVENT | MI_WAIT_FOR_OVERLAY_FLIP);
+		intel_ring_emit(req->ring, MI_NOOP);
+		intel_ring_advance(req->ring);
 
 		ret = intel_overlay_do_wait_request(overlay, req,
 						    intel_overlay_release_old_vid_tail);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 072fd0fc7748..ae00e79c9c99 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -71,7 +71,7 @@ gen2_render_ring_flush(struct drm_i915_gem_request *req,
 		       u32	invalidate_domains,
 		       u32	flush_domains)
 {
-	struct intel_ringbuffer *ring = req->ringbuf;
+	struct intel_ringbuffer *ring = req->ring;
 	u32 cmd;
 	int ret;
 
@@ -98,7 +98,7 @@ gen4_render_ring_flush(struct drm_i915_gem_request *req,
 		       u32	invalidate_domains,
 		       u32	flush_domains)
 {
-	struct intel_ringbuffer *ring = req->ringbuf;
+	struct intel_ringbuffer *ring = req->ring;
 	u32 cmd;
 	int ret;
 
@@ -191,7 +191,7 @@ gen4_render_ring_flush(struct drm_i915_gem_request *req,
 static int
 intel_emit_post_sync_nonzero_flush(struct drm_i915_gem_request *req)
 {
-	struct intel_ringbuffer *ring = req->ringbuf;
+	struct intel_ringbuffer *ring = req->ring;
 	u32 scratch_addr = req->engine->scratch.gtt_offset + 2 * CACHELINE_BYTES;
 	int ret;
 
@@ -227,7 +227,7 @@ static int
 gen6_render_ring_flush(struct drm_i915_gem_request *req,
 		       u32 invalidate_domains, u32 flush_domains)
 {
-	struct intel_ringbuffer *ring = req->ringbuf;
+	struct intel_ringbuffer *ring = req->ring;
 	u32 flags = 0;
 	u32 scratch_addr = req->engine->scratch.gtt_offset + 2 * CACHELINE_BYTES;
 	int ret;
@@ -279,7 +279,7 @@ gen6_render_ring_flush(struct drm_i915_gem_request *req,
 static int
 gen7_render_ring_cs_stall_wa(struct drm_i915_gem_request *req)
 {
-	struct intel_ringbuffer *ring = req->ringbuf;
+	struct intel_ringbuffer *ring = req->ring;
 	int ret;
 
 	ret = intel_ring_begin(req, 4);
@@ -300,7 +300,7 @@ static int
 gen7_render_ring_flush(struct drm_i915_gem_request *req,
 		       u32 invalidate_domains, u32 flush_domains)
 {
-	struct intel_ringbuffer *ring = req->ringbuf;
+	struct intel_ringbuffer *ring = req->ring;
 	u32 flags = 0;
 	u32 scratch_addr = req->engine->scratch.gtt_offset + 2 * CACHELINE_BYTES;
 	int ret;
@@ -363,7 +363,7 @@ static int
 gen8_emit_pipe_control(struct drm_i915_gem_request *req,
 		       u32 flags, u32 scratch_addr)
 {
-	struct intel_ringbuffer *ring = req->ringbuf;
+	struct intel_ringbuffer *ring = req->ring;
 	int ret;
 
 	ret = intel_ring_begin(req, 6);
@@ -688,7 +688,7 @@ err:
 
 static int intel_ring_workarounds_emit(struct drm_i915_gem_request *req)
 {
-	struct intel_ringbuffer *ring = req->ringbuf;
+	struct intel_ringbuffer *ring = req->ring;
 	struct drm_i915_private *dev_priv = req->i915;
 	struct i915_workarounds *w = &dev_priv->workarounds;
 	int ret, i;
@@ -1191,7 +1191,7 @@ static int gen8_rcs_signal(struct drm_i915_gem_request *signaller_req,
 			   unsigned int num_dwords)
 {
 #define MBOX_UPDATE_DWORDS 8
-	struct intel_ringbuffer *signaller = signaller_req->ringbuf;
+	struct intel_ringbuffer *signaller = signaller_req->ring;
 	struct drm_i915_private *dev_priv = signaller_req->i915;
 	struct intel_engine_cs *waiter;
 	int i, ret, num_rings;
@@ -1229,7 +1229,7 @@ static int gen8_xcs_signal(struct drm_i915_gem_request *signaller_req,
 			   unsigned int num_dwords)
 {
 #define MBOX_UPDATE_DWORDS 6
-	struct intel_ringbuffer *signaller = signaller_req->ringbuf;
+	struct intel_ringbuffer *signaller = signaller_req->ring;
 	struct drm_i915_private *dev_priv = signaller_req->i915;
 	struct intel_engine_cs *waiter;
 	int i, ret, num_rings;
@@ -1264,7 +1264,7 @@ static int gen8_xcs_signal(struct drm_i915_gem_request *signaller_req,
 static int gen6_signal(struct drm_i915_gem_request *signaller_req,
 		       unsigned int num_dwords)
 {
-	struct intel_ringbuffer *signaller = signaller_req->ringbuf;
+	struct intel_ringbuffer *signaller = signaller_req->ring;
 	struct drm_i915_private *dev_priv = signaller_req->i915;
 	struct intel_engine_cs *useless;
 	int i, ret, num_rings;
@@ -1306,7 +1306,7 @@ static int gen6_signal(struct drm_i915_gem_request *signaller_req,
 static int
 gen6_add_request(struct drm_i915_gem_request *req)
 {
-	struct intel_ringbuffer *ring = req->ringbuf;
+	struct intel_ringbuffer *ring = req->ring;
 	int ret;
 
 	if (req->engine->semaphore.signal)
@@ -1345,7 +1345,7 @@ gen8_ring_sync(struct drm_i915_gem_request *waiter_req,
 	       struct intel_engine_cs *signaller,
 	       u32 seqno)
 {
-	struct intel_ringbuffer *waiter = waiter_req->ringbuf;
+	struct intel_ringbuffer *waiter = waiter_req->ring;
 	struct drm_i915_private *dev_priv = waiter_req->i915;
 	int ret;
 
@@ -1373,7 +1373,7 @@ gen6_ring_sync(struct drm_i915_gem_request *waiter_req,
 	       struct intel_engine_cs *signaller,
 	       u32 seqno)
 {
-	struct intel_ringbuffer *waiter = waiter_req->ringbuf;
+	struct intel_ringbuffer *waiter = waiter_req->ring;
 	u32 dw1 = MI_SEMAPHORE_MBOX |
 		  MI_SEMAPHORE_COMPARE |
 		  MI_SEMAPHORE_REGISTER;
@@ -1421,7 +1421,7 @@ do {									\
 static int
 pc_render_add_request(struct drm_i915_gem_request *req)
 {
-	struct intel_ringbuffer *ring = req->ringbuf;
+	struct intel_ringbuffer *ring = req->ring;
 	u32 addr = req->engine->status_page.gfx_addr +
 		(I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT);
 	u32 scratch_addr = addr;
@@ -1548,7 +1548,7 @@ bsd_ring_flush(struct drm_i915_gem_request *req,
 	       u32     invalidate_domains,
 	       u32     flush_domains)
 {
-	struct intel_ringbuffer *ring = req->ringbuf;
+	struct intel_ringbuffer *ring = req->ring;
 	int ret;
 
 	ret = intel_ring_begin(req, 2);
@@ -1564,7 +1564,7 @@ bsd_ring_flush(struct drm_i915_gem_request *req,
 static int
 i9xx_add_request(struct drm_i915_gem_request *req)
 {
-	struct intel_ringbuffer *ring = req->ringbuf;
+	struct intel_ringbuffer *ring = req->ring;
 	int ret;
 
 	ret = intel_ring_begin(req, 4);
@@ -1658,7 +1658,7 @@ i965_dispatch_execbuffer(struct drm_i915_gem_request *req,
 			 u64 offset, u32 length,
 			 unsigned dispatch_flags)
 {
-	struct intel_ringbuffer *ring = req->ringbuf;
+	struct intel_ringbuffer *ring = req->ring;
 	int ret;
 
 	ret = intel_ring_begin(req, 2);
@@ -1685,7 +1685,7 @@ i830_dispatch_execbuffer(struct drm_i915_gem_request *req,
 			 u64 offset, u32 len,
 			 unsigned dispatch_flags)
 {
-	struct intel_ringbuffer *ring = req->ringbuf;
+	struct intel_ringbuffer *ring = req->ring;
 	u32 cs_offset = req->engine->scratch.gtt_offset;
 	int ret;
 
@@ -1748,7 +1748,7 @@ i915_dispatch_execbuffer(struct drm_i915_gem_request *req,
 			 u64 offset, u32 len,
 			 unsigned dispatch_flags)
 {
-	struct intel_ringbuffer *ring = req->ringbuf;
+	struct intel_ringbuffer *ring = req->ring;
 	int ret;
 
 	ret = intel_ring_begin(req, 2);
@@ -2082,7 +2082,7 @@ int intel_ring_idle(struct intel_engine_cs *ring)
 
 int intel_ring_alloc_request_extras(struct drm_i915_gem_request *request)
 {
-	request->ringbuf = request->engine->buffer;
+	request->ring = request->engine->buffer;
 	return 0;
 }
 
@@ -2135,17 +2135,17 @@ void intel_ring_reserved_space_end(struct intel_ringbuffer *ringbuf)
 
 static int wait_for_space(struct drm_i915_gem_request *req, int bytes)
 {
-	struct intel_ringbuffer *ringbuf = req->ringbuf;
+	struct intel_ringbuffer *ring = req->ring;
 	struct intel_engine_cs *engine = req->engine;
 	struct drm_i915_gem_request *target;
 	unsigned space;
 	int ret;
 
-	if (intel_ring_space(ringbuf) >= bytes)
+	if (intel_ring_space(ring) >= bytes)
 		return 0;
 
 	/* The whole point of reserving space is to not wait! */
-	WARN_ON(ringbuf->reserved_in_use);
+	WARN_ON(ring->reserved_in_use);
 
 	list_for_each_entry(target, &engine->request_list, list) {
 		/*
@@ -2153,12 +2153,12 @@ static int wait_for_space(struct drm_i915_gem_request *req, int bytes)
 		 * from multiple ringbuffers. Here, we must ignore any that
 		 * aren't from the ringbuffer we're considering.
 		 */
-		if (target->ringbuf != ringbuf)
+		if (target->ring != ring)
 			continue;
 
 		/* Would completion of this request free enough space? */
-		space = __intel_ring_space(target->postfix, ringbuf->tail,
-					   ringbuf->size);
+		space = __intel_ring_space(target->postfix, ring->tail,
+					   ring->size);
 		if (space >= bytes)
 			break;
 	}
@@ -2170,7 +2170,7 @@ static int wait_for_space(struct drm_i915_gem_request *req, int bytes)
 	if (ret)
 		return ret;
 
-	ringbuf->space = space;
+	ring->space = space;
 	return 0;
 }
 
@@ -2185,16 +2185,16 @@ static void ring_wrap(struct intel_ringbuffer *ringbuf)
 
 static int ring_prepare(struct drm_i915_gem_request *req, int bytes)
 {
-	struct intel_ringbuffer *ringbuf = req->ringbuf;
-	int remain_usable = ringbuf->effective_size - ringbuf->tail;
-	int remain_actual = ringbuf->size - ringbuf->tail;
+	struct intel_ringbuffer *ring = req->ring;
+	int remain_usable = ring->effective_size - ring->tail;
+	int remain_actual = ring->size - ring->tail;
 	int ret, total_bytes, wait_bytes = 0;
 	bool need_wrap = false;
 
-	if (ringbuf->reserved_in_use)
+	if (ring->reserved_in_use)
 		total_bytes = bytes;
 	else
-		total_bytes = bytes + ringbuf->reserved_size;
+		total_bytes = bytes + ring->reserved_size;
 
 	if (unlikely(bytes > remain_usable)) {
 		/*
@@ -2210,9 +2210,9 @@ static int ring_prepare(struct drm_i915_gem_request *req, int bytes)
 			 * falls off the end. So only need to wait for the
 			 * reserved size after flushing out the remainder.
 			 */
-			wait_bytes = remain_actual + ringbuf->reserved_size;
+			wait_bytes = remain_actual + ring->reserved_size;
 			need_wrap = true;
-		} else if (total_bytes > ringbuf->space) {
+		} else if (total_bytes > ring->space) {
 			/* No wrapping required, just waiting. */
 			wait_bytes = total_bytes;
 		}
@@ -2224,7 +2224,7 @@ static int ring_prepare(struct drm_i915_gem_request *req, int bytes)
 			return ret;
 
 		if (need_wrap)
-			ring_wrap(ringbuf);
+			ring_wrap(ring);
 	}
 
 	return 0;
@@ -2238,14 +2238,14 @@ int intel_ring_begin(struct drm_i915_gem_request *req, int num_dwords)
 	if (ret)
 		return ret;
 
-	req->ringbuf->space -= num_dwords * sizeof(uint32_t);
+	req->ring->space -= num_dwords * sizeof(uint32_t);
 	return 0;
 }
 
 /* Align the ring tail to a cacheline boundary */
 int intel_ring_cacheline_align(struct drm_i915_gem_request *req)
 {
-	struct intel_ringbuffer *ring = req->ringbuf;
+	struct intel_ringbuffer *ring = req->ring;
 	int num_dwords = (ring->tail & (CACHELINE_BYTES - 1)) / sizeof(uint32_t);
 	int ret;
 
@@ -2320,7 +2320,7 @@ static void gen6_bsd_ring_write_tail(struct intel_engine_cs *ring,
 static int gen6_bsd_ring_flush(struct drm_i915_gem_request *req,
 			       u32 invalidate, u32 flush)
 {
-	struct intel_ringbuffer *ring = req->ringbuf;
+	struct intel_ringbuffer *ring = req->ring;
 	uint32_t cmd;
 	int ret;
 
@@ -2366,7 +2366,7 @@ gen8_ring_dispatch_execbuffer(struct drm_i915_gem_request *req,
 			      u64 offset, u32 len,
 			      unsigned dispatch_flags)
 {
-	struct intel_ringbuffer *ring = req->ringbuf;
+	struct intel_ringbuffer *ring = req->ring;
 	bool ppgtt = USES_PPGTT(req->i915) &&
 			!(dispatch_flags & I915_DISPATCH_SECURE);
 	int ret;
@@ -2392,7 +2392,7 @@ hsw_ring_dispatch_execbuffer(struct drm_i915_gem_request *req,
 			     u64 offset, u32 len,
 			     unsigned dispatch_flags)
 {
-	struct intel_ringbuffer *ring = req->ringbuf;
+	struct intel_ringbuffer *ring = req->ring;
 	int ret;
 
 	ret = intel_ring_begin(req, 2);
@@ -2417,7 +2417,7 @@ gen6_ring_dispatch_execbuffer(struct drm_i915_gem_request *req,
 			      u64 offset, u32 len,
 			      unsigned dispatch_flags)
 {
-	struct intel_ringbuffer *ring = req->ringbuf;
+	struct intel_ringbuffer *ring = req->ring;
 	int ret;
 
 	ret = intel_ring_begin(req, 2);
@@ -2440,7 +2440,7 @@ gen6_ring_dispatch_execbuffer(struct drm_i915_gem_request *req,
 static int gen6_ring_flush(struct drm_i915_gem_request *req,
 			   u32 invalidate, u32 flush)
 {
-	struct intel_ringbuffer *ring = req->ringbuf;
+	struct intel_ringbuffer *ring = req->ring;
 	uint32_t cmd;
 	int ret;
 
-- 
2.7.0.rc3
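
A note on the wait_for_space() hunk above: it asks how much room would
be freed once the target request retires, which is plain circular-buffer
arithmetic. A minimal self-contained sketch of that computation (an
illustrative stand-in, not the driver's __intel_ring_space()
implementation):

	/* Bytes available in a circular ring between a consumer at
	 * 'head' and a producer at 'tail', for a ring of 'size' bytes.
	 */
	static int ring_space(int head, int tail, int size)
	{
		int space = head - tail;
		if (space <= 0)
			space += size;	/* the tail has wrapped past the head */
		return space;
	}

	/* e.g. size=4096, tail=4000, head=128 (the target's postfix):
	 * space = 128 - 4000 + 4096 = 224 bytes freed on retirement. */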


* [PATCH 060/190] drm/i915: Rename backpointer from intel_ringbuffer to intel_engine_cs
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (57 preceding siblings ...)
  2016-01-11  9:17 ` [PATCH 059/190] drm/i915: Rename request->ringbuf to request->ring Chris Wilson
@ 2016-01-11  9:17 ` Chris Wilson
  2016-01-28 11:49   ` Tvrtko Ursulin
  2016-01-11  9:17 ` [PATCH 061/190] drm/i915: Rename intel_context[engine].ringbuf Chris Wilson
                   ` (27 subsequent siblings)
  86 siblings, 1 reply; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:17 UTC (permalink / raw)
  To: intel-gfx

Having ringbuf->ring point to an engine is confusing, so rename the
backpointer once more, from ringbuf->ring to ringbuf->engine.
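
For reference, the relevant part of struct intel_ringbuffer after this
patch, with unrelated members elided (see the intel_ringbuffer.h hunk
below):

	struct intel_ringbuffer {
		struct drm_i915_gem_object *obj;
		void *virtual_start;

		/* backpointer to the owning engine (was ->ring) */
		struct intel_engine_cs *engine;
		struct list_head link;
		...
	};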

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_guc_submission.c | 10 +++---
 drivers/gpu/drm/i915/intel_lrc.c           | 35 +++++++++----------
 drivers/gpu/drm/i915/intel_ringbuffer.c    | 54 +++++++++++++++---------------
 drivers/gpu/drm/i915/intel_ringbuffer.h    |  2 +-
 4 files changed, 49 insertions(+), 52 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index e82cc9182dfa..53abe2143f8a 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -391,7 +391,7 @@ static void guc_init_ctx_desc(struct intel_guc *guc,
 	for (i = 0; i < I915_NUM_RINGS; i++) {
 		struct guc_execlist_context *lrc = &desc.lrc[i];
 		struct intel_ringbuffer *ringbuf = ctx->engine[i].ringbuf;
-		struct intel_engine_cs *ring;
+		struct intel_engine_cs *engine;
 		struct drm_i915_gem_object *obj;
 		uint64_t ctx_desc;
 
@@ -406,15 +406,15 @@ static void guc_init_ctx_desc(struct intel_guc *guc,
 		if (!obj)
 			break;	/* XXX: continue? */
 
-		ring = ringbuf->ring;
-		ctx_desc = intel_lr_context_descriptor(ctx, ring);
+		engine = ringbuf->engine;
+		ctx_desc = intel_lr_context_descriptor(ctx, engine);
 		lrc->context_desc = (u32)ctx_desc;
 
 		/* The state page is after PPHWSP */
 		lrc->ring_lcra = i915_gem_obj_ggtt_offset(obj) +
 				LRC_STATE_PN * PAGE_SIZE;
 		lrc->context_id = (client->ctx_index << GUC_ELC_CTXID_OFFSET) |
-				(ring->id << GUC_ELC_ENGINE_OFFSET);
+				(engine->id << GUC_ELC_ENGINE_OFFSET);
 
 		obj = ringbuf->obj;
 
@@ -423,7 +423,7 @@ static void guc_init_ctx_desc(struct intel_guc *guc,
 		lrc->ring_next_free_location = lrc->ring_begin;
 		lrc->ring_current_tail_pointer_value = 0;
 
-		desc.engines_used |= (1 << ring->id);
+		desc.engines_used |= (1 << engine->id);
 	}
 
 	WARN_ON(desc.engines_used == 0);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 87d325b6e7dc..8639ebfab96f 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2179,13 +2179,13 @@ void intel_lr_context_free(struct intel_context *ctx)
 		if (ctx_obj) {
 			struct intel_ringbuffer *ringbuf =
 					ctx->engine[i].ringbuf;
-			struct intel_engine_cs *ring = ringbuf->ring;
+			struct intel_engine_cs *engine = ringbuf->engine;
 
-			if (ctx == ring->default_context) {
+			if (ctx == engine->default_context) {
 				intel_unpin_ringbuffer_obj(ringbuf);
 				i915_gem_object_ggtt_unpin(ctx_obj);
 			}
-			WARN_ON(ctx->engine[ring->id].pin_count);
+			WARN_ON(ctx->engine[engine->id].pin_count);
 			intel_ringbuffer_free(ringbuf);
 			drm_gem_object_unreference(&ctx_obj->base);
 		}
@@ -2261,57 +2261,54 @@ static void lrc_setup_hardware_status_page(struct intel_engine_cs *ring,
  *
  * Return: non-zero on error.
  */
-
 int intel_lr_context_deferred_alloc(struct intel_context *ctx,
-				     struct intel_engine_cs *ring)
+				    struct intel_engine_cs *engine)
 {
-	struct drm_device *dev = ring->dev;
 	struct drm_i915_gem_object *ctx_obj;
 	uint32_t context_size;
 	struct intel_ringbuffer *ringbuf;
 	int ret;
 
 	WARN_ON(ctx->legacy_hw_ctx.rcs_state != NULL);
-	WARN_ON(ctx->engine[ring->id].state);
+	WARN_ON(ctx->engine[engine->id].state);
 
-	context_size = round_up(intel_lr_context_size(ring), 4096);
+	context_size = round_up(intel_lr_context_size(engine), 4096);
 
 	/* One extra page as the sharing data between driver and GuC */
 	context_size += PAGE_SIZE * LRC_PPHWSP_PN;
 
-	ctx_obj = i915_gem_alloc_object(dev, context_size);
+	ctx_obj = i915_gem_alloc_object(engine->dev, context_size);
 	if (!ctx_obj) {
 		DRM_DEBUG_DRIVER("Alloc LRC backing obj failed.\n");
 		return -ENOMEM;
 	}
 
-	ringbuf = intel_engine_create_ringbuffer(ring, 4 * PAGE_SIZE);
+	ringbuf = intel_engine_create_ringbuffer(engine, 4 * PAGE_SIZE);
 	if (IS_ERR(ringbuf)) {
 		ret = PTR_ERR(ringbuf);
 		goto error_deref_obj;
 	}
 
-	ret = populate_lr_context(ctx, ctx_obj, ring, ringbuf);
+	ret = populate_lr_context(ctx, ctx_obj, engine, ringbuf);
 	if (ret) {
 		DRM_DEBUG_DRIVER("Failed to populate LRC: %d\n", ret);
 		goto error_ringbuf;
 	}
 
-	ctx->engine[ring->id].ringbuf = ringbuf;
-	ctx->engine[ring->id].state = ctx_obj;
+	ctx->engine[engine->id].ringbuf = ringbuf;
+	ctx->engine[engine->id].state = ctx_obj;
 
-	if (ctx != ring->default_context && ring->init_context) {
+	if (ctx != engine->default_context && engine->init_context) {
 		struct drm_i915_gem_request *req;
 
-		ret = i915_gem_request_alloc(ring,
-			ctx, &req);
+		ret = i915_gem_request_alloc(engine, ctx, &req);
 		if (ret) {
 			DRM_ERROR("ring create req: %d\n",
 				ret);
 			goto error_ringbuf;
 		}
 
-		ret = ring->init_context(req);
+		ret = engine->init_context(req);
 		if (ret) {
 			DRM_ERROR("ring init context: %d\n",
 				ret);
@@ -2326,8 +2323,8 @@ error_ringbuf:
 	intel_ringbuffer_free(ringbuf);
 error_deref_obj:
 	drm_gem_object_unreference(&ctx_obj->base);
-	ctx->engine[ring->id].ringbuf = NULL;
-	ctx->engine[ring->id].state = NULL;
+	ctx->engine[engine->id].ringbuf = NULL;
+	ctx->engine[engine->id].state = NULL;
 	return ret;
 }
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index ae00e79c9c99..c437b61ac1d0 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1940,7 +1940,7 @@ intel_engine_create_ringbuffer(struct intel_engine_cs *engine, int size)
 		return ERR_PTR(-ENOMEM);
 	}
 
-	ring->ring = engine;
+	ring->engine = engine;
 	list_add(&ring->link, &engine->buffers);
 
 	ring->size = size;
@@ -1975,40 +1975,40 @@ intel_ringbuffer_free(struct intel_ringbuffer *ring)
 	kfree(ring);
 }
 
-static int intel_init_ring_buffer(struct drm_device *dev,
-				  struct intel_engine_cs *ring)
+static int intel_init_engine(struct drm_device *dev,
+			     struct intel_engine_cs *engine)
 {
 	struct intel_ringbuffer *ringbuf;
 	int ret;
 
-	WARN_ON(ring->buffer);
+	WARN_ON(engine->buffer);
 
-	ring->dev = dev;
-	ring->i915 = to_i915(dev);
-	ring->fence_context = fence_context_alloc(1);
-	INIT_LIST_HEAD(&ring->active_list);
-	INIT_LIST_HEAD(&ring->request_list);
-	INIT_LIST_HEAD(&ring->execlist_queue);
-	INIT_LIST_HEAD(&ring->buffers);
-	i915_gem_batch_pool_init(dev, &ring->batch_pool);
-	memset(ring->semaphore.sync_seqno, 0, sizeof(ring->semaphore.sync_seqno));
+	engine->dev = dev;
+	engine->i915 = to_i915(dev);
+	engine->fence_context = fence_context_alloc(1);
+	INIT_LIST_HEAD(&engine->active_list);
+	INIT_LIST_HEAD(&engine->request_list);
+	INIT_LIST_HEAD(&engine->execlist_queue);
+	INIT_LIST_HEAD(&engine->buffers);
+	i915_gem_batch_pool_init(dev, &engine->batch_pool);
+	memset(engine->semaphore.sync_seqno, 0, sizeof(engine->semaphore.sync_seqno));
 
-	intel_engine_init_breadcrumbs(ring);
+	intel_engine_init_breadcrumbs(engine);
 
-	ringbuf = intel_engine_create_ringbuffer(ring, 32 * PAGE_SIZE);
+	ringbuf = intel_engine_create_ringbuffer(engine, 32 * PAGE_SIZE);
 	if (IS_ERR(ringbuf)) {
 		ret = PTR_ERR(ringbuf);
 		goto error;
 	}
-	ring->buffer = ringbuf;
+	engine->buffer = ringbuf;
 
 	if (I915_NEED_GFX_HWS(dev)) {
-		ret = init_status_page(ring);
+		ret = init_status_page(engine);
 		if (ret)
 			goto error;
 	} else {
-		BUG_ON(ring->id != RCS);
-		ret = init_phys_status_page(ring);
+		BUG_ON(engine->id != RCS);
+		ret = init_phys_status_page(engine);
 		if (ret)
 			goto error;
 	}
@@ -2016,19 +2016,19 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 	ret = intel_pin_and_map_ringbuffer_obj(dev, ringbuf);
 	if (ret) {
 		DRM_ERROR("Failed to pin and map ringbuffer %s: %d\n",
-				ring->name, ret);
+				engine->name, ret);
 		intel_destroy_ringbuffer_obj(ringbuf);
 		goto error;
 	}
 
-	ret = i915_cmd_parser_init_ring(ring);
+	ret = i915_cmd_parser_init_ring(engine);
 	if (ret)
 		goto error;
 
 	return 0;
 
 error:
-	intel_cleanup_ring_buffer(ring);
+	intel_cleanup_ring_buffer(engine);
 	return ret;
 }
 
@@ -2612,7 +2612,7 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 		ring->scratch.gtt_offset = i915_gem_obj_ggtt_offset(obj);
 	}
 
-	ret = intel_init_ring_buffer(dev, ring);
+	ret = intel_init_engine(dev, ring);
 	if (ret)
 		return ret;
 
@@ -2692,7 +2692,7 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev)
 	}
 	ring->init_hw = init_ring_common;
 
-	return intel_init_ring_buffer(dev, ring);
+	return intel_init_engine(dev, ring);
 }
 
 /**
@@ -2724,7 +2724,7 @@ int intel_init_bsd2_ring_buffer(struct drm_device *dev)
 	}
 	ring->init_hw = init_ring_common;
 
-	return intel_init_ring_buffer(dev, ring);
+	return intel_init_engine(dev, ring);
 }
 
 int intel_init_blt_ring_buffer(struct drm_device *dev)
@@ -2780,7 +2780,7 @@ int intel_init_blt_ring_buffer(struct drm_device *dev)
 	}
 	ring->init_hw = init_ring_common;
 
-	return intel_init_ring_buffer(dev, ring);
+	return intel_init_engine(dev, ring);
 }
 
 int intel_init_vebox_ring_buffer(struct drm_device *dev)
@@ -2830,7 +2830,7 @@ int intel_init_vebox_ring_buffer(struct drm_device *dev)
 	}
 	ring->init_hw = init_ring_common;
 
-	return intel_init_ring_buffer(dev, ring);
+	return intel_init_engine(dev, ring);
 }
 
 int
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index bc6ceb54b1f3..6bd9b356c95d 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -101,7 +101,7 @@ struct intel_ringbuffer {
 	struct drm_i915_gem_object *obj;
 	void *virtual_start;
 
-	struct intel_engine_cs *ring;
+	struct intel_engine_cs *engine;
 	struct list_head link;
 
 	u32 head;
-- 
2.7.0.rc3


* [PATCH 061/190] drm/i915: Rename intel_context[engine].ringbuf
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (58 preceding siblings ...)
  2016-01-11  9:17 ` [PATCH 060/190] drm/i915: Rename backpointer from intel_ringbuffer to intel_engine_cs Chris Wilson
@ 2016-01-11  9:17 ` Chris Wilson
  2016-01-11  9:17 ` [PATCH 062/190] drm/i915: Rename extern functions operating on intel_engine_cs Chris Wilson
                   ` (26 subsequent siblings)
  86 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:17 UTC (permalink / raw)
  To: intel-gfx

Perform s/ringbuf/ring/ on the context struct for consistency with the
ring/engine split.
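
For reference, the execlists state in struct intel_context after this
patch, as it appears in the i915_drv.h hunk below (other members
elided):

	struct intel_context {
		...
		/* Execlists */
		struct {
			struct drm_i915_gem_object *state;
			struct intel_ringbuffer *ring;	/* was ->ringbuf */
			int pin_count;
		} engine[I915_NUM_RINGS];
		...
	};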

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c        |  2 +-
 drivers/gpu/drm/i915/i915_drv.h            |  2 +-
 drivers/gpu/drm/i915/i915_guc_submission.c |  6 +--
 drivers/gpu/drm/i915/intel_lrc.c           | 63 ++++++++++++++----------------
 4 files changed, 35 insertions(+), 38 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 018076c89247..6e91726db8d3 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1988,7 +1988,7 @@ static int i915_context_status(struct seq_file *m, void *unused)
 				struct drm_i915_gem_object *ctx_obj =
 					ctx->engine[i].state;
 				struct intel_ringbuffer *ringbuf =
-					ctx->engine[i].ringbuf;
+					ctx->engine[i].ring;
 
 				seq_printf(m, "%s: ", ring->name);
 				if (ctx_obj)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index baede4517c70..9f06dd19bfb2 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -885,7 +885,7 @@ struct intel_context {
 	/* Execlists */
 	struct {
 		struct drm_i915_gem_object *state;
-		struct intel_ringbuffer *ringbuf;
+		struct intel_ringbuffer *ring;
 		int pin_count;
 	} engine[I915_NUM_RINGS];
 
diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index 53abe2143f8a..b47e630e048a 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -390,7 +390,7 @@ static void guc_init_ctx_desc(struct intel_guc *guc,
 
 	for (i = 0; i < I915_NUM_RINGS; i++) {
 		struct guc_execlist_context *lrc = &desc.lrc[i];
-		struct intel_ringbuffer *ringbuf = ctx->engine[i].ringbuf;
+		struct intel_ringbuffer *ring = ctx->engine[i].ring;
 		struct intel_engine_cs *engine;
 		struct drm_i915_gem_object *obj;
 		uint64_t ctx_desc;
@@ -406,7 +406,7 @@ static void guc_init_ctx_desc(struct intel_guc *guc,
 		if (!obj)
 			break;	/* XXX: continue? */
 
-		engine = ringbuf->engine;
+		engine = ring->engine;
 		ctx_desc = intel_lr_context_descriptor(ctx, engine);
 		lrc->context_desc = (u32)ctx_desc;
 
@@ -416,7 +416,7 @@ static void guc_init_ctx_desc(struct intel_guc *guc,
 		lrc->context_id = (client->ctx_index << GUC_ELC_CTXID_OFFSET) |
 				(engine->id << GUC_ELC_ENGINE_OFFSET);
 
-		obj = ringbuf->obj;
+		obj = ring->obj;
 
 		lrc->ring_begin = i915_gem_obj_ggtt_offset(obj);
 		lrc->ring_end = lrc->ring_begin + obj->base.size - 1;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 8639ebfab96f..65beb7267d1a 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -402,24 +402,24 @@ static void execlists_submit_requests(struct drm_i915_gem_request *rq0,
 	execlists_elsp_write(rq0, rq1);
 }
 
-static void execlists_context_unqueue(struct intel_engine_cs *ring)
+static void execlists_context_unqueue(struct intel_engine_cs *engine)
 {
 	struct drm_i915_gem_request *req0 = NULL, *req1 = NULL;
 	struct drm_i915_gem_request *cursor = NULL, *tmp = NULL;
 
-	assert_spin_locked(&ring->execlist_lock);
+	assert_spin_locked(&engine->execlist_lock);
 
 	/*
 	 * If irqs are not active generate a warning as batches that finish
 	 * without the irqs may get lost and a GPU Hang may occur.
 	 */
-	WARN_ON(!intel_irqs_enabled(ring->dev->dev_private));
+	WARN_ON(!intel_irqs_enabled(engine->dev->dev_private));
 
-	if (list_empty(&ring->execlist_queue))
+	if (list_empty(&engine->execlist_queue))
 		return;
 
 	/* Try to read in pairs */
-	list_for_each_entry_safe(cursor, tmp, &ring->execlist_queue,
+	list_for_each_entry_safe(cursor, tmp, &engine->execlist_queue,
 				 execlist_link) {
 		if (!req0) {
 			req0 = cursor;
@@ -429,7 +429,7 @@ static void execlists_context_unqueue(struct intel_engine_cs *ring)
 			cursor->elsp_submitted = req0->elsp_submitted;
 			list_del(&req0->execlist_link);
 			list_add_tail(&req0->execlist_link,
-				&ring->execlist_retired_req_list);
+				&engine->execlist_retired_req_list);
 			req0 = cursor;
 		} else {
 			req1 = cursor;
@@ -437,7 +437,7 @@ static void execlists_context_unqueue(struct intel_engine_cs *ring)
 		}
 	}
 
-	if (IS_GEN8(ring->dev) || IS_GEN9(ring->dev)) {
+	if (IS_GEN8(engine->dev) || IS_GEN9(engine->dev)) {
 		/*
 		 * WaIdleLiteRestore: make sure we never cause a lite
 		 * restore with HEAD==TAIL
@@ -449,11 +449,11 @@ static void execlists_context_unqueue(struct intel_engine_cs *ring)
 			 * for where we prepare the padding after the end of the
 			 * request.
 			 */
-			struct intel_ringbuffer *ringbuf;
+			struct intel_ringbuffer *ring;
 
-			ringbuf = req0->ctx->engine[ring->id].ringbuf;
+			ring = req0->ctx->engine[engine->id].ring;
 			req0->tail += 8;
-			req0->tail &= ringbuf->size - 1;
+			req0->tail &= ring->size - 1;
 		}
 	}
 
@@ -671,7 +671,7 @@ int intel_logical_ring_alloc_request_extras(struct drm_i915_gem_request *request
 {
 	int ret;
 
-	request->ring = request->ctx->engine[request->engine->id].ringbuf;
+	request->ring = request->ctx->engine[request->engine->id].ring;
 
 	if (request->ctx != request->engine->default_context) {
 		ret = intel_lr_context_pin(request);
@@ -1775,7 +1775,7 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
 	ret = intel_lr_context_do_pin(
 			ring,
 			ring->default_context->engine[ring->id].state,
-			ring->default_context->engine[ring->id].ringbuf);
+			ring->default_context->engine[ring->id].ring);
 	if (ret) {
 		DRM_ERROR(
 			"Failed to pin and map ringbuffer %s: %d\n",
@@ -2177,16 +2177,15 @@ void intel_lr_context_free(struct intel_context *ctx)
 		struct drm_i915_gem_object *ctx_obj = ctx->engine[i].state;
 
 		if (ctx_obj) {
-			struct intel_ringbuffer *ringbuf =
-					ctx->engine[i].ringbuf;
-			struct intel_engine_cs *engine = ringbuf->engine;
+			struct intel_ringbuffer *ring = ctx->engine[i].ring;
+			struct intel_engine_cs *engine = ring->engine;
 
 			if (ctx == engine->default_context) {
-				intel_unpin_ringbuffer_obj(ringbuf);
+				intel_unpin_ringbuffer_obj(ring);
 				i915_gem_object_ggtt_unpin(ctx_obj);
 			}
 			WARN_ON(ctx->engine[engine->id].pin_count);
-			intel_ringbuffer_free(ringbuf);
+			intel_ringbuffer_free(ring);
 			drm_gem_object_unreference(&ctx_obj->base);
 		}
 	}
@@ -2266,7 +2265,7 @@ int intel_lr_context_deferred_alloc(struct intel_context *ctx,
 {
 	struct drm_i915_gem_object *ctx_obj;
 	uint32_t context_size;
-	struct intel_ringbuffer *ringbuf;
+	struct intel_ringbuffer *ring;
 	int ret;
 
 	WARN_ON(ctx->legacy_hw_ctx.rcs_state != NULL);
@@ -2283,19 +2282,19 @@ int intel_lr_context_deferred_alloc(struct intel_context *ctx,
 		return -ENOMEM;
 	}
 
-	ringbuf = intel_engine_create_ringbuffer(engine, 4 * PAGE_SIZE);
-	if (IS_ERR(ringbuf)) {
-		ret = PTR_ERR(ringbuf);
+	ring = intel_engine_create_ringbuffer(engine, 4 * PAGE_SIZE);
+	if (IS_ERR(ring)) {
+		ret = PTR_ERR(ring);
 		goto error_deref_obj;
 	}
 
-	ret = populate_lr_context(ctx, ctx_obj, engine, ringbuf);
+	ret = populate_lr_context(ctx, ctx_obj, engine, ring);
 	if (ret) {
 		DRM_DEBUG_DRIVER("Failed to populate LRC: %d\n", ret);
 		goto error_ringbuf;
 	}
 
-	ctx->engine[engine->id].ringbuf = ringbuf;
+	ctx->engine[engine->id].ring = ring;
 	ctx->engine[engine->id].state = ctx_obj;
 
 	if (ctx != engine->default_context && engine->init_context) {
@@ -2320,10 +2319,10 @@ int intel_lr_context_deferred_alloc(struct intel_context *ctx,
 	return 0;
 
 error_ringbuf:
-	intel_ringbuffer_free(ringbuf);
+	intel_ringbuffer_free(ring);
 error_deref_obj:
 	drm_gem_object_unreference(&ctx_obj->base);
-	ctx->engine[engine->id].ringbuf = NULL;
+	ctx->engine[engine->id].ring = NULL;
 	ctx->engine[engine->id].state = NULL;
 	return ret;
 }
@@ -2332,14 +2331,12 @@ void intel_lr_context_reset(struct drm_device *dev,
 			struct intel_context *ctx)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct intel_engine_cs *ring;
+	struct intel_engine_cs *unused;
 	int i;
 
-	for_each_ring(ring, dev_priv, i) {
-		struct drm_i915_gem_object *ctx_obj =
-				ctx->engine[ring->id].state;
-		struct intel_ringbuffer *ringbuf =
-				ctx->engine[ring->id].ringbuf;
+	for_each_ring(unused, dev_priv, i) {
+		struct drm_i915_gem_object *ctx_obj = ctx->engine[i].state;
+		struct intel_ringbuffer *ring = ctx->engine[i].ring;
 		uint32_t *reg_state;
 		struct page *page;
 
@@ -2358,7 +2355,7 @@ void intel_lr_context_reset(struct drm_device *dev,
 
 		kunmap_atomic(reg_state);
 
-		ringbuf->head = 0;
-		ringbuf->tail = 0;
+		ring->head = 0;
+		ring->tail = 0;
 	}
 }
-- 
2.7.0.rc3


* [PATCH 062/190] drm/i915: Rename extern functions operating on intel_engine_cs
From: Chris Wilson @ 2016-01-11  9:17 UTC (permalink / raw)
  To: intel-gfx

Using the intel_ring_* prefix for functions that operate on struct
intel_engine_cs is most confusing! Rename those extern functions to use
an intel_engine_* prefix instead, and rename the associated
intel_ring_id and intel_ring_hangcheck types to match.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
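(Aside, not part of the patch: a minimal usage sketch of the renamed
helpers, mirroring the hangcheck loop further down. All of the helper
names are taken from the hunks below; the sample_engines() wrapper
itself is made up purely for illustration.)

static void sample_engines(struct drm_i915_private *dev_priv)
{
	struct intel_engine_cs *ring;
	int i;

	/* for_each_ring() now tests intel_engine_initialized() */
	for_each_ring(ring, dev_priv, i) {
		/* engine-wide queries carry the intel_engine_ prefix */
		u64 acthd = intel_engine_get_active_head(ring);
		u32 seqno = intel_engine_get_seqno(ring);

		DRM_DEBUG("%s: seqno=%x, acthd=0x%llx\n",
			  ring->name, seqno, acthd);
	}
}
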
 drivers/gpu/drm/i915/i915_debugfs.c        | 10 +++----
 drivers/gpu/drm/i915/i915_dma.c            |  8 +++---
 drivers/gpu/drm/i915/i915_drv.h            |  4 +--
 drivers/gpu/drm/i915/i915_gem.c            | 22 +++++++-------
 drivers/gpu/drm/i915/i915_gem_context.c    |  8 +++---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  6 ++--
 drivers/gpu/drm/i915/i915_gem_request.c    |  8 +++---
 drivers/gpu/drm/i915/i915_gem_request.h    |  4 +--
 drivers/gpu/drm/i915/i915_gpu_error.c      |  8 +++---
 drivers/gpu/drm/i915/i915_guc_submission.c |  6 ++--
 drivers/gpu/drm/i915/i915_irq.c            | 18 ++++++------
 drivers/gpu/drm/i915/i915_trace.h          |  2 +-
 drivers/gpu/drm/i915/intel_breadcrumbs.c   |  4 +--
 drivers/gpu/drm/i915/intel_lrc.c           | 17 +++++------
 drivers/gpu/drm/i915/intel_mocs.c          |  6 ++--
 drivers/gpu/drm/i915/intel_ringbuffer.c    | 46 ++++++++++++++----------------
 drivers/gpu/drm/i915/intel_ringbuffer.h    | 36 +++++++++++------------
 17 files changed, 104 insertions(+), 109 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 6e91726db8d3..dec10784c2bc 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -599,7 +599,7 @@ static int i915_gem_pageflip_info(struct seq_file *m, void *data)
 					   engine->name,
 					   i915_gem_request_get_seqno(work->flip_queued_req),
 					   dev_priv->next_seqno,
-					   intel_ring_get_seqno(engine),
+					   intel_engine_get_seqno(engine),
 					   i915_gem_request_completed(work->flip_queued_req));
 			} else
 				seq_printf(m, "Flip not associated with any ring\n");
@@ -732,7 +732,7 @@ static void i915_ring_seqno_info(struct seq_file *m,
 	struct rb_node *rb;
 
 	seq_printf(m, "Current sequence (%s): %x\n",
-		   ring->name, intel_ring_get_seqno(ring));
+		   ring->name, intel_engine_get_seqno(ring));
 
 	seq_printf(m, "Current user interrupts (%s): %x\n",
 		   ring->name, READ_ONCE(ring->user_interrupts));
@@ -1354,8 +1354,8 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
 	intel_runtime_pm_get(dev_priv);
 
 	for_each_ring(ring, dev_priv, i) {
-		acthd[i] = intel_ring_get_active_head(ring);
-		seqno[i] = intel_ring_get_seqno(ring);
+		acthd[i] = intel_engine_get_active_head(ring);
+		seqno[i] = intel_engine_get_seqno(ring);
 	}
 
 	i915_get_extra_instdone(dev, instdone);
@@ -2496,7 +2496,7 @@ static int i915_guc_info(struct seq_file *m, void *data)
 	struct intel_guc guc;
 	struct i915_guc_client client = {};
 	struct intel_engine_cs *ring;
-	enum intel_ring_id i;
+	enum intel_engine_id i;
 	u64 total = 0;
 
 	if (!HAS_GUC_SCHED(dev_priv->dev))
diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 4c72c83cfa28..c0242ce45e43 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -87,16 +87,16 @@ static int i915_getparam(struct drm_device *dev, void *data,
 		value = 1;
 		break;
 	case I915_PARAM_HAS_BSD:
-		value = intel_ring_initialized(&dev_priv->ring[VCS]);
+		value = intel_engine_initialized(&dev_priv->ring[VCS]);
 		break;
 	case I915_PARAM_HAS_BLT:
-		value = intel_ring_initialized(&dev_priv->ring[BCS]);
+		value = intel_engine_initialized(&dev_priv->ring[BCS]);
 		break;
 	case I915_PARAM_HAS_VEBOX:
-		value = intel_ring_initialized(&dev_priv->ring[VECS]);
+		value = intel_engine_initialized(&dev_priv->ring[VECS]);
 		break;
 	case I915_PARAM_HAS_BSD2:
-		value = intel_ring_initialized(&dev_priv->ring[VCS2]);
+		value = intel_engine_initialized(&dev_priv->ring[VCS2]);
 		break;
 	case I915_PARAM_HAS_RELAXED_FENCING:
 		value = 1;
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 9f06dd19bfb2..466adc6617f0 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -520,7 +520,7 @@ struct drm_i915_error_state {
 		/* Software tracked state */
 		bool waiting;
 		int hangcheck_score;
-		enum intel_ring_hangcheck_action hangcheck_action;
+		enum intel_engine_hangcheck_action hangcheck_action;
 		int num_requests;
 
 		/* our own tracking of ring head and tail */
@@ -1973,7 +1973,7 @@ static inline struct drm_i915_private *guc_to_i915(struct intel_guc *guc)
 /* Iterate over initialised rings */
 #define for_each_ring(ring__, dev_priv__, i__) \
 	for ((i__) = 0; (i__) < I915_NUM_RINGS; (i__)++) \
-		for_each_if ((((ring__) = &(dev_priv__)->ring[(i__)]), intel_ring_initialized((ring__))))
+		for_each_if ((((ring__) = &(dev_priv__)->ring[(i__)]), intel_engine_initialized((ring__))))
 
 enum hdmi_force_audio {
 	HDMI_AUDIO_OFF_DVI = -2,	/* no aux data for HDMI-DVI converter */
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 430c439ece26..a81cad666d3a 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2067,7 +2067,7 @@ void i915_vma_move_to_active(struct i915_vma *vma,
 	/* Add a reference if we're newly entering the active list. */
 	if (obj->active == 0)
 		drm_gem_object_reference(&obj->base);
-	obj->active |= intel_ring_flag(engine);
+	obj->active |= intel_engine_flag(engine);
 
 	list_move_tail(&obj->ring_list[engine->id], &engine->active_list);
 	i915_gem_request_assign(&obj->last_read_req[engine->id], req);
@@ -2079,7 +2079,7 @@ static void
 i915_gem_object_retire__write(struct drm_i915_gem_object *obj)
 {
 	GEM_BUG_ON(obj->last_write_req == NULL);
-	GEM_BUG_ON(!(obj->active & intel_ring_flag(obj->last_write_req->engine)));
+	GEM_BUG_ON(!(obj->active & intel_engine_flag(obj->last_write_req->engine)));
 
 	i915_gem_request_assign(&obj->last_write_req, NULL);
 	intel_fb_obj_flush(obj, true, ORIGIN_CS);
@@ -2273,7 +2273,7 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
 		intel_ring_update_space(buffer);
 	}
 
-	intel_ring_init_seqno(ring, ring->last_submitted_seqno);
+	intel_engine_init_seqno(ring, ring->last_submitted_seqno);
 }
 
 void i915_gem_reset(struct drm_device *dev)
@@ -2576,7 +2576,7 @@ __i915_gem_object_sync(struct drm_i915_gem_object *obj,
 
 		i915_gem_object_retire_request(obj, from_req);
 	} else {
-		int idx = intel_ring_sync_index(from, to);
+		int idx = intel_engine_sync_index(from, to);
 		u32 seqno = i915_gem_request_get_seqno(from_req);
 
 		WARN_ON(!to_req);
@@ -2794,7 +2794,7 @@ int i915_gpu_idle(struct drm_device *dev)
 				return ret;
 		}
 
-		ret = intel_ring_idle(ring);
+		ret = intel_engine_idle(ring);
 		if (ret)
 			return ret;
 	}
@@ -4180,13 +4180,13 @@ int i915_gem_init_rings(struct drm_device *dev)
 	return 0;
 
 cleanup_vebox_ring:
-	intel_cleanup_ring_buffer(&dev_priv->ring[VECS]);
+	intel_engine_cleanup(&dev_priv->ring[VECS]);
 cleanup_blt_ring:
-	intel_cleanup_ring_buffer(&dev_priv->ring[BCS]);
+	intel_engine_cleanup(&dev_priv->ring[BCS]);
 cleanup_bsd_ring:
-	intel_cleanup_ring_buffer(&dev_priv->ring[VCS]);
+	intel_engine_cleanup(&dev_priv->ring[VCS]);
 cleanup_render_ring:
-	intel_cleanup_ring_buffer(&dev_priv->ring[RCS]);
+	intel_engine_cleanup(&dev_priv->ring[RCS]);
 
 	return ret;
 }
@@ -4341,8 +4341,8 @@ int i915_gem_init(struct drm_device *dev)
 	if (!i915.enable_execlists) {
 		dev_priv->gt.execbuf_submit = i915_gem_ringbuffer_submission;
 		dev_priv->gt.init_rings = i915_gem_init_rings;
-		dev_priv->gt.cleanup_ring = intel_cleanup_ring_buffer;
-		dev_priv->gt.stop_ring = intel_stop_ring_buffer;
+		dev_priv->gt.cleanup_ring = intel_engine_cleanup;
+		dev_priv->gt.stop_ring = intel_engine_stop;
 	} else {
 		dev_priv->gt.execbuf_submit = intel_execlists_submission;
 		dev_priv->gt.init_rings = intel_logical_rings_init;
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 5b4e77a80c19..ac2e205fe3b4 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -610,7 +610,7 @@ static inline bool should_skip_switch(struct intel_engine_cs *ring,
 		return false;
 
 	if (to->ppgtt && from == to &&
-	    !(intel_ring_flag(ring) & to->ppgtt->pd_dirty_rings))
+	    !(intel_engine_flag(ring) & to->ppgtt->pd_dirty_rings))
 		return true;
 
 	return false;
@@ -691,7 +691,7 @@ static int do_switch(struct drm_i915_gem_request *req)
 			goto unpin_out;
 
 		/* Doing a PD load always reloads the page dirs */
-		to->ppgtt->pd_dirty_rings &= ~intel_ring_flag(engine);
+		to->ppgtt->pd_dirty_rings &= ~intel_engine_flag(engine);
 	}
 
 	if (engine->id != RCS) {
@@ -719,9 +719,9 @@ static int do_switch(struct drm_i915_gem_request *req)
 		 * space. This means we must enforce that a page table load
 		 * occur when this occurs. */
 	} else if (to->ppgtt &&
-		   (intel_ring_flag(engine) & to->ppgtt->pd_dirty_rings)) {
+		   (intel_engine_flag(engine) & to->ppgtt->pd_dirty_rings)) {
 		hw_flags |= MI_FORCE_RESTORE;
-		to->ppgtt->pd_dirty_rings &= ~intel_ring_flag(engine);
+		to->ppgtt->pd_dirty_rings &= ~intel_engine_flag(engine);
 	}
 
 	/* We should never emit switch_mm more than once */
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index a0f5a997c2f2..b7c90072f7d4 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -942,7 +942,7 @@ static int
 i915_gem_execbuffer_move_to_gpu(struct drm_i915_gem_request *req,
 				struct list_head *vmas)
 {
-	const unsigned other_rings = ~intel_ring_flag(req->engine);
+	const unsigned other_rings = ~intel_engine_flag(req->engine);
 	struct i915_vma *vma;
 	uint32_t flush_domains = 0;
 	bool flush_chipset = false;
@@ -972,7 +972,7 @@ i915_gem_execbuffer_move_to_gpu(struct drm_i915_gem_request *req,
 	/* Unconditionally invalidate gpu caches and ensure that we do flush
 	 * any residual writes from the previous batch.
 	 */
-	return intel_ring_invalidate_all_caches(req);
+	return intel_engine_invalidate_all_caches(req);
 }
 
 static bool
@@ -1443,7 +1443,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	} else
 		ring = &dev_priv->ring[(args->flags & I915_EXEC_RING_MASK) - 1];
 
-	if (!intel_ring_initialized(ring)) {
+	if (!intel_engine_initialized(ring)) {
 		DRM_DEBUG("execbuf with invalid ring: %d\n",
 			  (int)(args->flags & I915_EXEC_RING_MASK));
 		return -EINVAL;
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 4cc64d9cca12..54834ad1bf5e 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -90,7 +90,7 @@ static void i915_fence_timeline_value_str(struct fence *fence, char *str,
 					  int size)
 {
 	snprintf(str, size, "%u",
-		 intel_ring_get_seqno(to_i915_request(fence)->engine));
+		 intel_engine_get_seqno(to_i915_request(fence)->engine));
 }
 
 static void i915_fence_release(struct fence *fence)
@@ -136,7 +136,7 @@ i915_gem_init_seqno(struct drm_i915_private *dev_priv, u32 seqno)
 
 	/* Carefully retire all requests without writing to the rings */
 	for_each_ring(ring, dev_priv, i) {
-		ret = intel_ring_idle(ring);
+		ret = intel_engine_idle(ring);
 		if (ret)
 			return ret;
 	}
@@ -144,7 +144,7 @@ i915_gem_init_seqno(struct drm_i915_private *dev_priv, u32 seqno)
 
 	/* Finally reset hw state */
 	for_each_ring(ring, dev_priv, i) {
-		intel_ring_init_seqno(ring, seqno);
+		intel_engine_init_seqno(ring, seqno);
 
 		for (j = 0; j < ARRAY_SIZE(ring->semaphore.sync_seqno); j++)
 			ring->semaphore.sync_seqno[j] = 0;
@@ -429,7 +429,7 @@ void __i915_add_request(struct drm_i915_gem_request *request,
 		if (i915.enable_execlists)
 			ret = logical_ring_flush_all_caches(request);
 		else
-			ret = intel_ring_flush_all_caches(request);
+			ret = intel_engine_flush_all_caches(request);
 		/* Not allowed to fail! */
 		WARN(ret, "*_ring_flush_all_caches failed: %d!\n", ret);
 	}
diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index bd17e3a9a71d..cd4412f6e7e3 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -198,13 +198,13 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2)
 
 static inline bool i915_gem_request_started(struct drm_i915_gem_request *req)
 {
-	return i915_seqno_passed(intel_ring_get_seqno(req->engine),
+	return i915_seqno_passed(intel_engine_get_seqno(req->engine),
 				 req->previous_seqno);
 }
 
 static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
 {
-	return i915_seqno_passed(intel_ring_get_seqno(req->engine),
+	return i915_seqno_passed(intel_engine_get_seqno(req->engine),
 				 req->fence.seqno);
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index b47ca1b7041f..f27d6d1b64d6 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -221,7 +221,7 @@ static void print_error_buffers(struct drm_i915_error_state_buf *m,
 	}
 }
 
-static const char *hangcheck_action_to_str(enum intel_ring_hangcheck_action a)
+static const char *hangcheck_action_to_str(enum intel_engine_hangcheck_action a)
 {
 	switch (a) {
 	case HANGCHECK_IDLE:
@@ -841,7 +841,7 @@ static void gen8_record_semaphore_state(struct drm_i915_private *dev_priv,
 		signal_offset = (GEN8_SIGNAL_OFFSET(ring, i) & (PAGE_SIZE - 1))
 				/ 4;
 		tmp = error->semaphore_obj->pages[0];
-		idx = intel_ring_sync_index(ring, to);
+		idx = intel_engine_sync_index(ring, to);
 
 		ering->semaphore_mboxes[idx] = tmp[signal_offset];
 		ering->semaphore_seqno[idx] = ring->semaphore.sync_seqno[idx];
@@ -901,8 +901,8 @@ static void i915_record_ring_state(struct drm_device *dev,
 
 	ering->waiting = intel_engine_has_waiter(ring);
 	ering->instpm = I915_READ(RING_INSTPM(ring->mmio_base));
-	ering->acthd = intel_ring_get_active_head(ring);
-	ering->seqno = intel_ring_get_seqno(ring);
+	ering->acthd = intel_engine_get_active_head(ring);
+	ering->seqno = intel_engine_get_seqno(ring);
 	ering->start = I915_READ_START(ring);
 	ering->head = I915_READ_HEAD(ring);
 	ering->tail = I915_READ_TAIL(ring);
diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index b47e630e048a..39ccfa8934e3 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -510,7 +510,7 @@ int i915_guc_wq_check_space(struct i915_guc_client *gc)
 static int guc_add_workqueue_item(struct i915_guc_client *gc,
 				  struct drm_i915_gem_request *rq)
 {
-	enum intel_ring_id ring_id = rq->engine->id;
+	enum intel_engine_id ring_id = rq->engine->id;
 	struct guc_wq_item *wqi;
 	void *base;
 	u32 tail, wq_len, wq_off, space;
@@ -565,7 +565,7 @@ static int guc_add_workqueue_item(struct i915_guc_client *gc,
 /* Update the ringbuffer pointer in a saved context image */
 static void lr_context_update(struct drm_i915_gem_request *rq)
 {
-	enum intel_ring_id ring_id = rq->engine->id;
+	enum intel_engine_id ring_id = rq->engine->id;
 	struct drm_i915_gem_object *ctx_obj = rq->ctx->engine[ring_id].state;
 	struct drm_i915_gem_object *rb_obj = rq->ring->obj;
 	struct page *page;
@@ -594,7 +594,7 @@ int i915_guc_submit(struct i915_guc_client *client,
 		    struct drm_i915_gem_request *rq)
 {
 	struct intel_guc *guc = client->guc;
-	enum intel_ring_id ring_id = rq->engine->id;
+	enum intel_engine_id ring_id = rq->engine->id;
 	int q_ret, b_ret;
 
 	/* Need this because of the deferred pin ctx and ring */
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index ce52d7d9ad91..ce047ac84f5f 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2896,7 +2896,7 @@ static int semaphore_passed(struct intel_engine_cs *ring)
 	if (signaller->hangcheck.deadlock >= I915_NUM_RINGS)
 		return -1;
 
-	if (i915_seqno_passed(intel_ring_get_seqno(signaller), seqno))
+	if (i915_seqno_passed(intel_engine_get_seqno(signaller), seqno))
 		return 1;
 
 	/* cursory check for an unkickable deadlock */
@@ -2945,7 +2945,7 @@ static bool subunits_stuck(struct intel_engine_cs *ring)
 	return stuck;
 }
 
-static enum intel_ring_hangcheck_action
+static enum intel_engine_hangcheck_action
 head_stuck(struct intel_engine_cs *ring, u64 acthd)
 {
 	if (acthd != ring->hangcheck.acthd) {
@@ -2968,12 +2968,12 @@ head_stuck(struct intel_engine_cs *ring, u64 acthd)
 	return HANGCHECK_HUNG;
 }
 
-static enum intel_ring_hangcheck_action
-ring_stuck(struct intel_engine_cs *ring, u64 acthd)
+static enum intel_engine_hangcheck_action
+engine_stuck(struct intel_engine_cs *ring, u64 acthd)
 {
 	struct drm_device *dev = ring->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	enum intel_ring_hangcheck_action ha;
+	enum intel_engine_hangcheck_action ha;
 	u32 tmp;
 
 	ha = head_stuck(ring, acthd);
@@ -3053,8 +3053,8 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 
 		semaphore_clear_deadlocks(dev_priv);
 
-		acthd = intel_ring_get_active_head(ring);
-		seqno = intel_ring_get_seqno(ring);
+		acthd = intel_engine_get_active_head(ring);
+		seqno = intel_engine_get_seqno(ring);
 		user_interrupts = READ_ONCE(ring->user_interrupts);
 
 		if (ring->hangcheck.seqno == seqno) {
@@ -3091,8 +3091,8 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 				 * being repeatedly kicked and so responsible
 				 * for stalling the machine.
 				 */
-				ring->hangcheck.action = ring_stuck(ring,
-								    acthd);
+				ring->hangcheck.action =
+					engine_stuck(ring, acthd);
 
 				switch (ring->hangcheck.action) {
 				case HANGCHECK_IDLE:
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 0204ff72b3e4..95cab4776401 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -569,7 +569,7 @@ TRACE_EVENT(i915_gem_request_notify,
 	    TP_fast_assign(
 			   __entry->dev = ring->dev->primary->index;
 			   __entry->ring = ring->id;
-			   __entry->seqno = intel_ring_get_seqno(ring);
+			   __entry->seqno = intel_engine_get_seqno(ring);
 			   ),
 
 	    TP_printk("dev=%u, ring=%u, seqno=%u",
diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
index 5ba8b4cd8a18..b9366e6ca5ad 100644
--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -141,7 +141,7 @@ bool intel_engine_add_wait(struct intel_engine_cs *engine,
 			   struct intel_wait *wait)
 {
 	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-	u32 seqno = intel_ring_get_seqno(engine);
+	u32 seqno = intel_engine_get_seqno(engine);
 	struct rb_node **p, *parent, *completed;
 	bool first;
 
@@ -283,7 +283,7 @@ void intel_engine_remove_wait(struct intel_engine_cs *engine,
 			 * the first_waiter. This is undesirable if that
 			 * waiter is a high priority task.
 			 */
-			u32 seqno = intel_ring_get_seqno(engine);
+			u32 seqno = intel_engine_get_seqno(engine);
 			while (i915_seqno_passed(seqno,
 						 to_wait(next)->seqno)) {
 				struct rb_node *n = rb_next(next);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 65beb7267d1a..92ae7bc532ed 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -637,7 +637,7 @@ static int logical_ring_invalidate_all_caches(struct drm_i915_gem_request *req)
 static int execlists_move_to_gpu(struct drm_i915_gem_request *req,
 				 struct list_head *vmas)
 {
-	const unsigned other_rings = ~intel_ring_flag(req->engine);
+	const unsigned other_rings = ~intel_engine_flag(req->engine);
 	struct i915_vma *vma;
 	uint32_t flush_domains = 0;
 	bool flush_chipset = false;
@@ -843,10 +843,10 @@ void intel_logical_ring_stop(struct intel_engine_cs *ring)
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
 	int ret;
 
-	if (!intel_ring_initialized(ring))
+	if (!intel_engine_initialized(ring))
 		return;
 
-	ret = intel_ring_idle(ring);
+	ret = intel_engine_idle(ring);
 	if (ret)
 		DRM_ERROR("failed to quiesce %s whilst cleaning up: %d\n",
 			  ring->name, ret);
@@ -1455,7 +1455,7 @@ static int gen8_emit_bb_start(struct drm_i915_gem_request *req,
 	 * not idle). PML4 is allocated during ppgtt init so this is
 	 * not needed in 48-bit.*/
 	if (req->ctx->ppgtt &&
-	    (intel_ring_flag(req->engine) & req->ctx->ppgtt->pd_dirty_rings)) {
+	    (intel_engine_flag(req->engine) & req->ctx->ppgtt->pd_dirty_rings)) {
 		if (!USES_FULL_48BIT_PPGTT(req->i915) &&
 		    !intel_vgpu_active(req->i915->dev)) {
 			ret = intel_logical_ring_emit_pdps(req);
@@ -1463,7 +1463,7 @@ static int gen8_emit_bb_start(struct drm_i915_gem_request *req,
 				return ret;
 		}
 
-		req->ctx->ppgtt->pd_dirty_rings &= ~intel_ring_flag(req->engine);
+		req->ctx->ppgtt->pd_dirty_rings &= ~intel_engine_flag(req->engine);
 	}
 
 	ret = intel_ring_begin(req, 4);
@@ -1714,14 +1714,11 @@ static int gen8_init_rcs_context(struct drm_i915_gem_request *req)
  */
 void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
 {
-	struct drm_i915_private *dev_priv;
-
-	if (!intel_ring_initialized(ring))
+	if (!intel_engine_initialized(ring))
 		return;
 
-	dev_priv = ring->dev->dev_private;
-
 	if (ring->buffer) {
+		struct drm_i915_private *dev_priv = ring->i915;
 		intel_logical_ring_stop(ring);
 		WARN_ON((I915_READ_MODE(ring) & MODE_IDLE) == 0);
 	}
diff --git a/drivers/gpu/drm/i915/intel_mocs.c b/drivers/gpu/drm/i915/intel_mocs.c
index 039c7405f640..61e1704d7313 100644
--- a/drivers/gpu/drm/i915/intel_mocs.c
+++ b/drivers/gpu/drm/i915/intel_mocs.c
@@ -159,7 +159,7 @@ static bool get_mocs_settings(struct drm_i915_private *dev_priv,
 	return result;
 }
 
-static i915_reg_t mocs_register(enum intel_ring_id ring, int index)
+static i915_reg_t mocs_register(enum intel_engine_id ring, int index)
 {
 	switch (ring) {
 	case RCS:
@@ -191,7 +191,7 @@ static i915_reg_t mocs_register(enum intel_ring_id ring, int index)
  */
 static int emit_mocs_control_table(struct drm_i915_gem_request *req,
 				   const struct drm_i915_mocs_table *table,
-				   enum intel_ring_id id)
+				   enum intel_engine_id id)
 {
 	struct intel_ringbuffer *ring = req->ring;
 	unsigned int index;
@@ -318,7 +318,7 @@ int intel_rcs_context_init_mocs(struct drm_i915_gem_request *req)
 
 	if (get_mocs_settings(req->i915, &t)) {
 		struct intel_engine_cs *ring;
-		enum intel_ring_id ring_id;
+		enum intel_engine_id ring_id;
 
 		/* Program the control registers */
 		for_each_ring(ring, req->i915, ring_id) {
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index c437b61ac1d0..1bb9f376aa0b 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -425,16 +425,16 @@ static void ring_write_tail(struct intel_engine_cs *ring,
 	I915_WRITE_TAIL(ring, value);
 }
 
-u64 intel_ring_get_active_head(struct intel_engine_cs *ring)
+u64 intel_engine_get_active_head(struct intel_engine_cs *engine)
 {
-	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct drm_i915_private *dev_priv = engine->i915;
 	u64 acthd;
 
-	if (INTEL_INFO(ring->dev)->gen >= 8)
-		acthd = I915_READ64_2x32(RING_ACTHD(ring->mmio_base),
-					 RING_ACTHD_UDW(ring->mmio_base));
-	else if (INTEL_INFO(ring->dev)->gen >= 4)
-		acthd = I915_READ(RING_ACTHD(ring->mmio_base));
+	if (INTEL_INFO(dev_priv)->gen >= 8)
+		acthd = I915_READ64_2x32(RING_ACTHD(engine->mmio_base),
+					 RING_ACTHD_UDW(engine->mmio_base));
+	else if (INTEL_INFO(dev_priv)->gen >= 4)
+		acthd = I915_READ(RING_ACTHD(engine->mmio_base));
 	else
 		acthd = I915_READ(ACTHD);
 
@@ -697,7 +697,7 @@ static int intel_ring_workarounds_emit(struct drm_i915_gem_request *req)
 		return 0;
 
 	req->engine->gpu_caches_dirty = true;
-	ret = intel_ring_flush_all_caches(req);
+	ret = intel_engine_flush_all_caches(req);
 	if (ret)
 		return ret;
 
@@ -715,7 +715,7 @@ static int intel_ring_workarounds_emit(struct drm_i915_gem_request *req)
 	intel_ring_advance(ring);
 
 	req->engine->gpu_caches_dirty = true;
-	ret = intel_ring_flush_all_caches(req);
+	ret = intel_engine_flush_all_caches(req);
 	if (ret)
 		return ret;
 
@@ -2028,21 +2028,19 @@ static int intel_init_engine(struct drm_device *dev,
 	return 0;
 
 error:
-	intel_cleanup_ring_buffer(engine);
+	intel_engine_cleanup(engine);
 	return ret;
 }
 
-void intel_cleanup_ring_buffer(struct intel_engine_cs *ring)
+void intel_engine_cleanup(struct intel_engine_cs *ring)
 {
-	struct drm_i915_private *dev_priv;
-
-	if (!intel_ring_initialized(ring))
+	if (!intel_engine_initialized(ring))
 		return;
 
-	dev_priv = to_i915(ring->dev);
-
 	if (ring->buffer) {
-		intel_stop_ring_buffer(ring);
+		struct drm_i915_private *dev_priv = ring->i915;
+
+		intel_engine_stop(ring);
 		WARN_ON(!IS_GEN2(ring->dev) && (I915_READ_MODE(ring) & MODE_IDLE) == 0);
 
 		intel_unpin_ringbuffer_obj(ring->buffer);
@@ -2062,7 +2060,7 @@ void intel_cleanup_ring_buffer(struct intel_engine_cs *ring)
 	ring->dev = NULL;
 }
 
-int intel_ring_idle(struct intel_engine_cs *ring)
+int intel_engine_idle(struct intel_engine_cs *ring)
 {
 	struct drm_i915_gem_request *req;
 
@@ -2265,7 +2263,7 @@ int intel_ring_cacheline_align(struct drm_i915_gem_request *req)
 	return 0;
 }
 
-void intel_ring_init_seqno(struct intel_engine_cs *ring, u32 seqno)
+void intel_engine_init_seqno(struct intel_engine_cs *ring, u32 seqno)
 {
 	struct drm_device *dev = ring->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -2834,7 +2832,7 @@ int intel_init_vebox_ring_buffer(struct drm_device *dev)
 }
 
 int
-intel_ring_flush_all_caches(struct drm_i915_gem_request *req)
+intel_engine_flush_all_caches(struct drm_i915_gem_request *req)
 {
 	struct intel_engine_cs *engine = req->engine;
 	int ret;
@@ -2853,7 +2851,7 @@ intel_ring_flush_all_caches(struct drm_i915_gem_request *req)
 }
 
 int
-intel_ring_invalidate_all_caches(struct drm_i915_gem_request *req)
+intel_engine_invalidate_all_caches(struct drm_i915_gem_request *req)
 {
 	struct intel_engine_cs *engine = req->engine;
 	uint32_t flush_domains;
@@ -2874,14 +2872,14 @@ intel_ring_invalidate_all_caches(struct drm_i915_gem_request *req)
 }
 
 void
-intel_stop_ring_buffer(struct intel_engine_cs *ring)
+intel_engine_stop(struct intel_engine_cs *ring)
 {
 	int ret;
 
-	if (!intel_ring_initialized(ring))
+	if (!intel_engine_initialized(ring))
 		return;
 
-	ret = intel_ring_idle(ring);
+	ret = intel_engine_idle(ring);
 	if (ret)
 		DRM_ERROR("failed to quiesce %s whilst cleaning up: %d\n",
 			  ring->name, ret);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 6bd9b356c95d..6803e4820688 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -75,7 +75,7 @@ struct  intel_hw_status_page {
 	ring->semaphore.signal_ggtt[ring->id] = MI_SEMAPHORE_SYNC_INVALID; \
 	} while(0)
 
-enum intel_ring_hangcheck_action {
+enum intel_engine_hangcheck_action {
 	HANGCHECK_IDLE = 0,
 	HANGCHECK_WAIT,
 	HANGCHECK_ACTIVE,
@@ -86,13 +86,13 @@ enum intel_ring_hangcheck_action {
 
 #define HANGCHECK_SCORE_RING_HUNG 31
 
-struct intel_ring_hangcheck {
+struct intel_engine_hangcheck {
 	u64 acthd;
 	u64 max_acthd;
 	u32 seqno;
 	unsigned user_interrupts;
 	int score;
-	enum intel_ring_hangcheck_action action;
+	enum intel_engine_hangcheck_action action;
 	int deadlock;
 	u32 instdone[I915_NUM_INSTDONE_REG];
 };
@@ -148,9 +148,9 @@ struct  i915_ctx_workarounds {
 
 struct drm_i915_gem_request;
 
-struct  intel_engine_cs {
+struct intel_engine_cs {
 	const char	*name;
-	enum intel_ring_id {
+	enum intel_engine_id {
 		RCS = 0x0,
 		VCS,
 		BCS,
@@ -337,7 +337,7 @@ struct  intel_engine_cs {
 	struct intel_context *default_context;
 	struct intel_context *last_context;
 
-	struct intel_ring_hangcheck hangcheck;
+	struct intel_engine_hangcheck hangcheck;
 
 	struct {
 		struct drm_i915_gem_object *obj;
@@ -380,20 +380,20 @@ struct  intel_engine_cs {
 };
 
 static inline bool
-intel_ring_initialized(struct intel_engine_cs *ring)
+intel_engine_initialized(struct intel_engine_cs *ring)
 {
 	return ring->dev != NULL;
 }
 
 static inline unsigned
-intel_ring_flag(struct intel_engine_cs *ring)
+intel_engine_flag(struct intel_engine_cs *ring)
 {
 	return 1 << ring->id;
 }
 
 static inline u32
-intel_ring_sync_index(struct intel_engine_cs *ring,
-		      struct intel_engine_cs *other)
+intel_engine_sync_index(struct intel_engine_cs *ring,
+			struct intel_engine_cs *other)
 {
 	int idx;
 
@@ -461,8 +461,8 @@ int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
 void intel_unpin_ringbuffer_obj(struct intel_ringbuffer *ringbuf);
 void intel_ringbuffer_free(struct intel_ringbuffer *ring);
 
-void intel_stop_ring_buffer(struct intel_engine_cs *ring);
-void intel_cleanup_ring_buffer(struct intel_engine_cs *ring);
+void intel_engine_stop(struct intel_engine_cs *ring);
+void intel_engine_cleanup(struct intel_engine_cs *ring);
 
 int intel_ring_alloc_request_extras(struct drm_i915_gem_request *request);
 
@@ -487,10 +487,10 @@ int __intel_ring_space(int head, int tail, int size);
 void intel_ring_update_space(struct intel_ringbuffer *ringbuf);
 int intel_ring_space(struct intel_ringbuffer *ringbuf);
 
-int __must_check intel_ring_idle(struct intel_engine_cs *ring);
-void intel_ring_init_seqno(struct intel_engine_cs *ring, u32 seqno);
-int intel_ring_flush_all_caches(struct drm_i915_gem_request *req);
-int intel_ring_invalidate_all_caches(struct drm_i915_gem_request *req);
+int __must_check intel_engine_idle(struct intel_engine_cs *ring);
+void intel_engine_init_seqno(struct intel_engine_cs *ring, u32 seqno);
+int intel_engine_flush_all_caches(struct drm_i915_gem_request *req);
+int intel_engine_invalidate_all_caches(struct drm_i915_gem_request *req);
 
 void intel_fini_pipe_control(struct intel_engine_cs *ring);
 int intel_init_pipe_control(struct intel_engine_cs *ring);
@@ -501,8 +501,8 @@ int intel_init_bsd2_ring_buffer(struct drm_device *dev);
 int intel_init_blt_ring_buffer(struct drm_device *dev);
 int intel_init_vebox_ring_buffer(struct drm_device *dev);
 
-u64 intel_ring_get_active_head(struct intel_engine_cs *ring);
-static inline u32 intel_ring_get_seqno(struct intel_engine_cs *ring)
+u64 intel_engine_get_active_head(struct intel_engine_cs *ring);
+static inline u32 intel_engine_get_seqno(struct intel_engine_cs *ring)
 {
 	return intel_read_status_page(ring, I915_GEM_HWS_INDEX);
 }
-- 
2.7.0.rc3

* [PATCH 063/190] drm/i915: Rename struct intel_ringbuffer to intel_ring
From: Chris Wilson @ 2016-01-11  9:17 UTC (permalink / raw)
  To: intel-gfx

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
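(Aside, not part of the patch: a heavily abbreviated sketch of how the
two structures relate after the rename. The field list is trimmed to
the members visible in the hunks below; see intel_ringbuffer.h for the
real definitions.)

struct intel_ring {				/* was struct intel_ringbuffer */
	struct drm_i915_gem_object *obj;	/* backing pages */
	struct intel_engine_cs *engine;		/* engine fed by this ring */

	u32 head;
	u32 tail;
	int space;
	int last_retired_head;
};

struct intel_engine_cs {			/* hardware engine, unchanged */
	const char *name;
	struct intel_ring *buffer;		/* ring for legacy submission */
	/* ... */
};
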
 drivers/gpu/drm/i915/i915_debugfs.c        |  21 +++---
 drivers/gpu/drm/i915/i915_drv.h            |   2 +-
 drivers/gpu/drm/i915/i915_gem.c            |  43 ++++++------
 drivers/gpu/drm/i915/i915_gem_context.c    |   2 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   4 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c        |   6 +-
 drivers/gpu/drm/i915/i915_gem_request.c    |   2 +-
 drivers/gpu/drm/i915/i915_gem_request.h    |   2 +-
 drivers/gpu/drm/i915/i915_gpu_error.c      |   2 +-
 drivers/gpu/drm/i915/i915_guc_submission.c |   2 +-
 drivers/gpu/drm/i915/intel_display.c       |  10 +--
 drivers/gpu/drm/i915/intel_lrc.c           |  40 ++++++------
 drivers/gpu/drm/i915/intel_mocs.c          |   4 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c    | 101 ++++++++++++++---------------
 drivers/gpu/drm/i915/intel_ringbuffer.h    |  45 ++++++-------
 15 files changed, 138 insertions(+), 148 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index dec10784c2bc..8de944ed3369 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1948,12 +1948,11 @@ static int i915_gem_framebuffer_info(struct seq_file *m, void *data)
 	return 0;
 }
 
-static void describe_ctx_ringbuf(struct seq_file *m,
-				 struct intel_ringbuffer *ringbuf)
+static void describe_ctx_ring(struct seq_file *m, struct intel_ring *ring)
 {
 	seq_printf(m, " (ringbuffer, space: %d, head: %u, tail: %u, last head: %d)",
-		   ringbuf->space, ringbuf->head, ringbuf->tail,
-		   ringbuf->last_retired_head);
+		   ring->space, ring->head, ring->tail,
+		   ring->last_retired_head);
 }
 
 static int i915_context_status(struct seq_file *m, void *unused)
@@ -1985,16 +1984,12 @@ static int i915_context_status(struct seq_file *m, void *unused)
 		if (i915.enable_execlists) {
 			seq_putc(m, '\n');
 			for_each_ring(ring, dev_priv, i) {
-				struct drm_i915_gem_object *ctx_obj =
-					ctx->engine[i].state;
-				struct intel_ringbuffer *ringbuf =
-					ctx->engine[i].ring;
-
 				seq_printf(m, "%s: ", ring->name);
-				if (ctx_obj)
-					describe_obj(m, ctx_obj);
-				if (ringbuf)
-					describe_ctx_ringbuf(m, ringbuf);
+				if (ctx->engine[i].state)
+					describe_obj(m, ctx->engine[i].state);
+				if (ctx->engine[i].ring)
+					describe_ctx_ring(m,
+							  ctx->engine[i].ring);
 				seq_putc(m, '\n');
 			}
 		} else {
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 466adc6617f0..44e8738c5310 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -885,7 +885,7 @@ struct intel_context {
 	/* Execlists */
 	struct {
 		struct drm_i915_gem_object *state;
-		struct intel_ringbuffer *ring;
+		struct intel_ring *ring;
 		int pin_count;
 	} engine[I915_NUM_RINGS];
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index a81cad666d3a..1c6beb154d07 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2193,9 +2193,9 @@ i915_gem_find_active_request(struct intel_engine_cs *ring)
 	return NULL;
 }
 
-static void i915_gem_reset_ring_status(struct drm_i915_private *dev_priv,
-				       struct intel_engine_cs *ring)
+static void i915_gem_reset_ring_status(struct intel_engine_cs *ring)
 {
+	struct drm_i915_private *dev_priv = ring->i915;
 	struct drm_i915_gem_request *request;
 	bool ring_hung;
 
@@ -2212,19 +2212,18 @@ static void i915_gem_reset_ring_status(struct drm_i915_private *dev_priv,
 		i915_set_reset_status(dev_priv, request->ctx, false);
 }
 
-static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
-					struct intel_engine_cs *ring)
+static void i915_gem_reset_ring_cleanup(struct intel_engine_cs *engine)
 {
-	struct intel_ringbuffer *buffer;
+	struct intel_ring *ring;
 
-	while (!list_empty(&ring->active_list)) {
+	while (!list_empty(&engine->active_list)) {
 		struct drm_i915_gem_object *obj;
 
-		obj = list_first_entry(&ring->active_list,
+		obj = list_first_entry(&engine->active_list,
 				       struct drm_i915_gem_object,
-				       ring_list[ring->id]);
+				       ring_list[engine->id]);
 
-		i915_gem_object_retire__read(obj, ring->id);
+		i915_gem_object_retire__read(obj, engine->id);
 	}
 
 	/*
@@ -2234,14 +2233,14 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
 	 */
 
 	if (i915.enable_execlists) {
-		spin_lock_irq(&ring->execlist_lock);
+		spin_lock_irq(&engine->execlist_lock);
 
 		/* list_splice_tail_init checks for empty lists */
-		list_splice_tail_init(&ring->execlist_queue,
-				      &ring->execlist_retired_req_list);
+		list_splice_tail_init(&engine->execlist_queue,
+				      &engine->execlist_retired_req_list);
 
-		spin_unlock_irq(&ring->execlist_lock);
-		intel_execlists_retire_requests(ring);
+		spin_unlock_irq(&engine->execlist_lock);
+		intel_execlists_retire_requests(engine);
 	}
 
 	/*
@@ -2251,10 +2250,10 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
 	 * implicit references on things like e.g. ppgtt address spaces through
 	 * the request.
 	 */
-	if (!list_empty(&ring->request_list)) {
+	if (!list_empty(&engine->request_list)) {
 		struct drm_i915_gem_request *request;
 
-		request = list_last_entry(&ring->request_list,
+		request = list_last_entry(&engine->request_list,
 					  struct drm_i915_gem_request,
 					  list);
 
@@ -2268,12 +2267,12 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
 	 * upon reset is less than when we start. Do one more pass over
 	 * all the ringbuffers to reset last_retired_head.
 	 */
-	list_for_each_entry(buffer, &ring->buffers, link) {
-		buffer->last_retired_head = buffer->tail;
-		intel_ring_update_space(buffer);
+	list_for_each_entry(ring, &engine->buffers, link) {
+		ring->last_retired_head = ring->tail;
+		intel_ring_update_space(ring);
 	}
 
-	intel_engine_init_seqno(ring, ring->last_submitted_seqno);
+	intel_engine_init_seqno(engine, engine->last_submitted_seqno);
 }
 
 void i915_gem_reset(struct drm_device *dev)
@@ -2288,10 +2287,10 @@ void i915_gem_reset(struct drm_device *dev)
 	 * their reference to the objects, the inspection must be done first.
 	 */
 	for_each_ring(ring, dev_priv, i)
-		i915_gem_reset_ring_status(dev_priv, ring);
+		i915_gem_reset_ring_status(ring);
 
 	for_each_ring(ring, dev_priv, i)
-		i915_gem_reset_ring_cleanup(dev_priv, ring);
+		i915_gem_reset_ring_cleanup(ring);
 
 	i915_gem_context_reset(dev);
 
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index ac2e205fe3b4..17fe8ed991d6 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -519,7 +519,7 @@ i915_gem_context_get(struct drm_i915_file_private *file_priv, u32 id)
 static inline int
 mi_set_context(struct drm_i915_gem_request *req, u32 hw_flags)
 {
-	struct intel_ringbuffer *ring = req->ring;
+	struct intel_ring *ring = req->ring;
 	u32 flags = hw_flags | MI_MM_SPACE_GTT;
 	const int num_rings =
 		/* Use an extended w/a on ivb+ if signalling from other rings */
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index b7c90072f7d4..731ce13dbdbc 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1148,7 +1148,7 @@ i915_gem_execbuffer_retire_commands(struct i915_execbuffer_params *params)
 static int
 i915_reset_gen7_sol_offsets(struct drm_i915_gem_request *req)
 {
-	struct intel_ringbuffer *ring = req->ring;
+	struct intel_ring *ring = req->ring;
 	int ret, i;
 
 	if (!IS_GEN7(req->i915) || req->engine->id != RCS) {
@@ -1229,7 +1229,7 @@ i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
 			       struct drm_i915_gem_execbuffer2 *args,
 			       struct list_head *vmas)
 {
-	struct intel_ringbuffer *ring = params->request->ring;
+	struct intel_ring *ring = params->request->ring;
 	struct drm_i915_private *dev_priv = params->request->i915;
 	u64 exec_start, exec_len;
 	int instp_mode;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 38c109cda904..9a91451d66ac 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -656,7 +656,7 @@ static int gen8_write_pdp(struct drm_i915_gem_request *req,
 			  unsigned entry,
 			  dma_addr_t addr)
 {
-	struct intel_ringbuffer *ring = req->ring;
+	struct intel_ring *ring = req->ring;
 	int ret;
 
 	BUG_ON(entry >= 4);
@@ -1648,7 +1648,7 @@ static uint32_t get_pd_offset(struct i915_hw_ppgtt *ppgtt)
 static int hsw_mm_switch(struct i915_hw_ppgtt *ppgtt,
 			 struct drm_i915_gem_request *req)
 {
-	struct intel_ringbuffer *ring = req->ring;
+	struct intel_ring *ring = req->ring;
 	int ret;
 
 	/* NB: TLBs must be flushed and invalidated before a switch */
@@ -1686,7 +1686,7 @@ static int vgpu_mm_switch(struct i915_hw_ppgtt *ppgtt,
 static int gen7_mm_switch(struct i915_hw_ppgtt *ppgtt,
 			  struct drm_i915_gem_request *req)
 {
-	struct intel_ringbuffer *ring = req->ring;
+	struct intel_ring *ring = req->ring;
 	int ret;
 
 	/* NB: TLBs must be flushed and invalidated before a switch */
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 54834ad1bf5e..e1f2af046b6c 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -401,7 +401,7 @@ void __i915_add_request(struct drm_i915_gem_request *request,
 			struct drm_i915_gem_object *obj,
 			bool flush_caches)
 {
-	struct intel_ringbuffer *ring;
+	struct intel_ring *ring;
 	u32 request_start;
 	int ret;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index cd4412f6e7e3..086950567db4 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -79,7 +79,7 @@ struct drm_i915_gem_request {
 	 * context.
 	 */
 	struct intel_context *ctx;
-	struct intel_ringbuffer *ring;
+	struct intel_ring *ring;
 
 	/** Batch buffer related to this request if any (used for
 	    error state dump only) */
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index f27d6d1b64d6..2785f2d1f073 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1007,7 +1007,7 @@ static void i915_gem_record_rings(struct drm_device *dev,
 		request = i915_gem_find_active_request(engine);
 		if (request) {
 			struct i915_address_space *vm;
-			struct intel_ringbuffer *ring;
+			struct intel_ring *ring;
 
 			vm = request->ctx && request->ctx->ppgtt ?
 				&request->ctx->ppgtt->base :
diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index 39ccfa8934e3..5a6251926367 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -390,7 +390,7 @@ static void guc_init_ctx_desc(struct intel_guc *guc,
 
 	for (i = 0; i < I915_NUM_RINGS; i++) {
 		struct guc_execlist_context *lrc = &desc.lrc[i];
-		struct intel_ringbuffer *ring = ctx->engine[i].ring;
+		struct intel_ring *ring = ctx->engine[i].ring;
 		struct intel_engine_cs *engine;
 		struct drm_i915_gem_object *obj;
 		uint64_t ctx_desc;
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 0d42356f15b4..f8717c5627dd 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -11052,7 +11052,7 @@ static int intel_gen2_queue_flip(struct drm_device *dev,
 				 struct drm_i915_gem_request *req,
 				 uint32_t flags)
 {
-	struct intel_ringbuffer *ring = req->ring;
+	struct intel_ring *ring = req->ring;
 	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
 	u32 flip_mask;
 	int ret;
@@ -11087,7 +11087,7 @@ static int intel_gen3_queue_flip(struct drm_device *dev,
 				 struct drm_i915_gem_request *req,
 				 uint32_t flags)
 {
-	struct intel_ringbuffer *ring = req->ring;
+	struct intel_ring *ring = req->ring;
 	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
 	u32 flip_mask;
 	int ret;
@@ -11119,7 +11119,7 @@ static int intel_gen4_queue_flip(struct drm_device *dev,
 				 struct drm_i915_gem_request *req,
 				 uint32_t flags)
 {
-	struct intel_ringbuffer *ring = req->ring;
+	struct intel_ring *ring = req->ring;
 	struct drm_i915_private *dev_priv = req->i915;
 	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
 	uint32_t pf, pipesrc;
@@ -11158,7 +11158,7 @@ static int intel_gen6_queue_flip(struct drm_device *dev,
 				 struct drm_i915_gem_request *req,
 				 uint32_t flags)
 {
-	struct intel_ringbuffer *ring = req->ring;
+	struct intel_ring *ring = req->ring;
 	struct drm_i915_private *dev_priv = req->i915;
 	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
 	uint32_t pf, pipesrc;
@@ -11194,7 +11194,7 @@ static int intel_gen7_queue_flip(struct drm_device *dev,
 				 struct drm_i915_gem_request *req,
 				 uint32_t flags)
 {
-	struct intel_ringbuffer *ring = req->ring;
+	struct intel_ring *ring = req->ring;
 	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
 	uint32_t plane_bit = 0;
 	int len, ret;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 92ae7bc532ed..fa4c0c0db994 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -449,7 +449,7 @@ static void execlists_context_unqueue(struct intel_engine_cs *engine)
 			 * for where we prepare the padding after the end of the
 			 * request.
 			 */
-			struct intel_ringbuffer *ring;
+			struct intel_ring *ring;
 
 			ring = req0->ctx->engine[engine->id].ring;
 			req0->tail += 8;
@@ -742,7 +742,7 @@ int intel_execlists_submission(struct i915_execbuffer_params *params,
 	struct drm_device       *dev = params->dev;
 	struct intel_engine_cs  *engine = params->ring;
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct intel_ringbuffer *ring = params->request->ring;
+	struct intel_ring *ring = params->request->ring;
 	u64 exec_start;
 	int instp_mode;
 	u32 instp_mask;
@@ -878,7 +878,7 @@ int logical_ring_flush_all_caches(struct drm_i915_gem_request *req)
 
 static int intel_lr_context_do_pin(struct intel_engine_cs *ring,
 		struct drm_i915_gem_object *ctx_obj,
-		struct intel_ringbuffer *ringbuf)
+		struct intel_ring *ringbuf)
 {
 	struct drm_i915_private *dev_priv = ring->i915;
 	int ret = 0;
@@ -889,7 +889,7 @@ static int intel_lr_context_do_pin(struct intel_engine_cs *ring,
 	if (ret)
 		return ret;
 
-	ret = intel_pin_and_map_ringbuffer_obj(ring->dev, ringbuf);
+	ret = intel_pin_and_map_ring(ring->dev, ringbuf);
 	if (ret)
 		goto unpin_ctx_obj;
 
@@ -931,12 +931,12 @@ void intel_lr_context_unpin(struct drm_i915_gem_request *rq)
 {
 	int engine = rq->engine->id;
 	struct drm_i915_gem_object *ctx_obj = rq->ctx->engine[engine].state;
-	struct intel_ringbuffer *ring = rq->ring;
+	struct intel_ring *ring = rq->ring;
 
 	if (ctx_obj) {
 		WARN_ON(!mutex_is_locked(&rq->i915->dev->struct_mutex));
 		if (--rq->ctx->engine[engine].pin_count == 0) {
-			intel_unpin_ringbuffer_obj(ring);
+			intel_unpin_ring(ring);
 			i915_gem_object_ggtt_unpin(ctx_obj);
 			i915_gem_context_unreference(rq->ctx);
 		}
@@ -947,7 +947,7 @@ static int intel_logical_ring_workarounds_emit(struct drm_i915_gem_request *req)
 {
 	int ret, i;
 	struct intel_engine_cs *engine = req->engine;
-	struct intel_ringbuffer *ring = req->ring;
+	struct intel_ring *ring = req->ring;
 	struct drm_i915_private *dev_priv = req->i915;
 	struct i915_workarounds *w = &dev_priv->workarounds;
 
@@ -1417,7 +1417,7 @@ static int intel_logical_ring_emit_pdps(struct drm_i915_gem_request *req)
 {
 	struct i915_hw_ppgtt *ppgtt = req->ctx->ppgtt;
 	struct intel_engine_cs *engine = req->engine;
-	struct intel_ringbuffer *ring = req->ring;
+	struct intel_ring *ring = req->ring;
 	const int num_lri_cmds = GEN8_LEGACY_PDPES * 2;
 	int i, ret;
 
@@ -1444,7 +1444,7 @@ static int intel_logical_ring_emit_pdps(struct drm_i915_gem_request *req)
 static int gen8_emit_bb_start(struct drm_i915_gem_request *req,
 			      u64 offset, unsigned dispatch_flags)
 {
-	struct intel_ringbuffer *ring = req->ring;
+	struct intel_ring *ring = req->ring;
 	bool ppgtt = !(dispatch_flags & I915_DISPATCH_SECURE);
 	int ret;
 
@@ -1503,7 +1503,7 @@ static int gen8_emit_flush(struct drm_i915_gem_request *request,
 			   u32 invalidate_domains,
 			   u32 unused)
 {
-	struct intel_ringbuffer *ring = request->ring;
+	struct intel_ring *ring = request->ring;
 	uint32_t cmd;
 	int ret;
 
@@ -1541,7 +1541,7 @@ static int gen8_emit_flush_render(struct drm_i915_gem_request *request,
 				  u32 invalidate_domains,
 				  u32 flush_domains)
 {
-	struct intel_ringbuffer *ring = request->ring;
+	struct intel_ring *ring = request->ring;
 	u32 scratch_addr = request->engine->scratch.gtt_offset + 2 * CACHELINE_BYTES;
 	bool vf_flush_wa = false;
 	u32 flags = 0;
@@ -1620,7 +1620,7 @@ gen6_seqno_barrier(struct intel_engine_cs *ring)
 
 static int gen8_emit_request(struct drm_i915_gem_request *request)
 {
-	struct intel_ringbuffer *ring = request->ring;
+	struct intel_ring *ring = request->ring;
 	u32 cmd;
 	int ret;
 
@@ -2039,7 +2039,7 @@ make_rpcs(struct drm_device *dev)
 
 static int
 populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_obj,
-		    struct intel_engine_cs *ring, struct intel_ringbuffer *ringbuf)
+		    struct intel_engine_cs *ring, struct intel_ring *ringbuf)
 {
 	struct drm_device *dev = ring->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -2174,15 +2174,15 @@ void intel_lr_context_free(struct intel_context *ctx)
 		struct drm_i915_gem_object *ctx_obj = ctx->engine[i].state;
 
 		if (ctx_obj) {
-			struct intel_ringbuffer *ring = ctx->engine[i].ring;
+			struct intel_ring *ring = ctx->engine[i].ring;
 			struct intel_engine_cs *engine = ring->engine;
 
 			if (ctx == engine->default_context) {
-				intel_unpin_ringbuffer_obj(ring);
+				intel_unpin_ring(ring);
 				i915_gem_object_ggtt_unpin(ctx_obj);
 			}
 			WARN_ON(ctx->engine[engine->id].pin_count);
-			intel_ringbuffer_free(ring);
+			intel_ring_free(ring);
 			drm_gem_object_unreference(&ctx_obj->base);
 		}
 	}
@@ -2262,7 +2262,7 @@ int intel_lr_context_deferred_alloc(struct intel_context *ctx,
 {
 	struct drm_i915_gem_object *ctx_obj;
 	uint32_t context_size;
-	struct intel_ringbuffer *ring;
+	struct intel_ring *ring;
 	int ret;
 
 	WARN_ON(ctx->legacy_hw_ctx.rcs_state != NULL);
@@ -2279,7 +2279,7 @@ int intel_lr_context_deferred_alloc(struct intel_context *ctx,
 		return -ENOMEM;
 	}
 
-	ring = intel_engine_create_ringbuffer(engine, 4 * PAGE_SIZE);
+	ring = intel_engine_create_ring(engine, 4 * PAGE_SIZE);
 	if (IS_ERR(ring)) {
 		ret = PTR_ERR(ring);
 		goto error_deref_obj;
@@ -2316,7 +2316,7 @@ int intel_lr_context_deferred_alloc(struct intel_context *ctx,
 	return 0;
 
 error_ringbuf:
-	intel_ringbuffer_free(ring);
+	intel_ring_free(ring);
 error_deref_obj:
 	drm_gem_object_unreference(&ctx_obj->base);
 	ctx->engine[engine->id].ring = NULL;
@@ -2333,7 +2333,7 @@ void intel_lr_context_reset(struct drm_device *dev,
 
 	for_each_ring(unused, dev_priv, i) {
 		struct drm_i915_gem_object *ctx_obj = ctx->engine[i].state;
-		struct intel_ringbuffer *ring = ctx->engine[i].ring;
+		struct intel_ring *ring = ctx->engine[i].ring;
 		uint32_t *reg_state;
 		struct page *page;
 
diff --git a/drivers/gpu/drm/i915/intel_mocs.c b/drivers/gpu/drm/i915/intel_mocs.c
index 61e1704d7313..1b724c0a711e 100644
--- a/drivers/gpu/drm/i915/intel_mocs.c
+++ b/drivers/gpu/drm/i915/intel_mocs.c
@@ -193,7 +193,7 @@ static int emit_mocs_control_table(struct drm_i915_gem_request *req,
 				   const struct drm_i915_mocs_table *table,
 				   enum intel_engine_id id)
 {
-	struct intel_ringbuffer *ring = req->ring;
+	struct intel_ring *ring = req->ring;
 	unsigned int index;
 	int ret;
 
@@ -244,7 +244,7 @@ static int emit_mocs_control_table(struct drm_i915_gem_request *req,
 static int emit_mocs_l3cc_table(struct drm_i915_gem_request *req,
 				const struct drm_i915_mocs_table *table)
 {
-	struct intel_ringbuffer *ring = req->ring;
+	struct intel_ring *ring = req->ring;
 	unsigned int count;
 	unsigned int i;
 	u32 value;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 1bb9f376aa0b..95974156a1d9 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -42,7 +42,7 @@ int __intel_ring_space(int head, int tail, int size)
 	return space - I915_RING_FREE_SPACE;
 }
 
-void intel_ring_update_space(struct intel_ringbuffer *ringbuf)
+void intel_ring_update_space(struct intel_ring *ringbuf)
 {
 	if (ringbuf->last_retired_head != -1) {
 		ringbuf->head = ringbuf->last_retired_head;
@@ -53,7 +53,7 @@ void intel_ring_update_space(struct intel_ringbuffer *ringbuf)
 					    ringbuf->tail, ringbuf->size);
 }
 
-int intel_ring_space(struct intel_ringbuffer *ringbuf)
+int intel_ring_space(struct intel_ring *ringbuf)
 {
 	intel_ring_update_space(ringbuf);
 	return ringbuf->space;
@@ -61,7 +61,7 @@ int intel_ring_space(struct intel_ringbuffer *ringbuf)
 
 static void __intel_ring_advance(struct intel_engine_cs *ring)
 {
-	struct intel_ringbuffer *ringbuf = ring->buffer;
+	struct intel_ring *ringbuf = ring->buffer;
 	ringbuf->tail &= ringbuf->size - 1;
 	ring->write_tail(ring, ringbuf->tail);
 }
@@ -71,7 +71,7 @@ gen2_render_ring_flush(struct drm_i915_gem_request *req,
 		       u32	invalidate_domains,
 		       u32	flush_domains)
 {
-	struct intel_ringbuffer *ring = req->ring;
+	struct intel_ring *ring = req->ring;
 	u32 cmd;
 	int ret;
 
@@ -98,7 +98,7 @@ gen4_render_ring_flush(struct drm_i915_gem_request *req,
 		       u32	invalidate_domains,
 		       u32	flush_domains)
 {
-	struct intel_ringbuffer *ring = req->ring;
+	struct intel_ring *ring = req->ring;
 	u32 cmd;
 	int ret;
 
@@ -191,7 +191,7 @@ gen4_render_ring_flush(struct drm_i915_gem_request *req,
 static int
 intel_emit_post_sync_nonzero_flush(struct drm_i915_gem_request *req)
 {
-	struct intel_ringbuffer *ring = req->ring;
+	struct intel_ring *ring = req->ring;
 	u32 scratch_addr = req->engine->scratch.gtt_offset + 2 * CACHELINE_BYTES;
 	int ret;
 
@@ -227,7 +227,7 @@ static int
 gen6_render_ring_flush(struct drm_i915_gem_request *req,
 		       u32 invalidate_domains, u32 flush_domains)
 {
-	struct intel_ringbuffer *ring = req->ring;
+	struct intel_ring *ring = req->ring;
 	u32 flags = 0;
 	u32 scratch_addr = req->engine->scratch.gtt_offset + 2 * CACHELINE_BYTES;
 	int ret;
@@ -279,7 +279,7 @@ gen6_render_ring_flush(struct drm_i915_gem_request *req,
 static int
 gen7_render_ring_cs_stall_wa(struct drm_i915_gem_request *req)
 {
-	struct intel_ringbuffer *ring = req->ring;
+	struct intel_ring *ring = req->ring;
 	int ret;
 
 	ret = intel_ring_begin(req, 4);
@@ -300,7 +300,7 @@ static int
 gen7_render_ring_flush(struct drm_i915_gem_request *req,
 		       u32 invalidate_domains, u32 flush_domains)
 {
-	struct intel_ringbuffer *ring = req->ring;
+	struct intel_ring *ring = req->ring;
 	u32 flags = 0;
 	u32 scratch_addr = req->engine->scratch.gtt_offset + 2 * CACHELINE_BYTES;
 	int ret;
@@ -363,7 +363,7 @@ static int
 gen8_emit_pipe_control(struct drm_i915_gem_request *req,
 		       u32 flags, u32 scratch_addr)
 {
-	struct intel_ringbuffer *ring = req->ring;
+	struct intel_ring *ring = req->ring;
 	int ret;
 
 	ret = intel_ring_begin(req, 6);
@@ -547,7 +547,7 @@ static int init_ring_common(struct intel_engine_cs *ring)
 {
 	struct drm_device *dev = ring->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct intel_ringbuffer *ringbuf = ring->buffer;
+	struct intel_ring *ringbuf = ring->buffer;
 	struct drm_i915_gem_object *obj = ringbuf->obj;
 	int ret = 0;
 
@@ -688,7 +688,7 @@ err:
 
 static int intel_ring_workarounds_emit(struct drm_i915_gem_request *req)
 {
-	struct intel_ringbuffer *ring = req->ring;
+	struct intel_ring *ring = req->ring;
 	struct drm_i915_private *dev_priv = req->i915;
 	struct i915_workarounds *w = &dev_priv->workarounds;
 	int ret, i;
@@ -1191,7 +1191,7 @@ static int gen8_rcs_signal(struct drm_i915_gem_request *signaller_req,
 			   unsigned int num_dwords)
 {
 #define MBOX_UPDATE_DWORDS 8
-	struct intel_ringbuffer *signaller = signaller_req->ring;
+	struct intel_ring *signaller = signaller_req->ring;
 	struct drm_i915_private *dev_priv = signaller_req->i915;
 	struct intel_engine_cs *waiter;
 	int i, ret, num_rings;
@@ -1229,7 +1229,7 @@ static int gen8_xcs_signal(struct drm_i915_gem_request *signaller_req,
 			   unsigned int num_dwords)
 {
 #define MBOX_UPDATE_DWORDS 6
-	struct intel_ringbuffer *signaller = signaller_req->ring;
+	struct intel_ring *signaller = signaller_req->ring;
 	struct drm_i915_private *dev_priv = signaller_req->i915;
 	struct intel_engine_cs *waiter;
 	int i, ret, num_rings;
@@ -1264,7 +1264,7 @@ static int gen8_xcs_signal(struct drm_i915_gem_request *signaller_req,
 static int gen6_signal(struct drm_i915_gem_request *signaller_req,
 		       unsigned int num_dwords)
 {
-	struct intel_ringbuffer *signaller = signaller_req->ring;
+	struct intel_ring *signaller = signaller_req->ring;
 	struct drm_i915_private *dev_priv = signaller_req->i915;
 	struct intel_engine_cs *useless;
 	int i, ret, num_rings;
@@ -1306,7 +1306,7 @@ static int gen6_signal(struct drm_i915_gem_request *signaller_req,
 static int
 gen6_add_request(struct drm_i915_gem_request *req)
 {
-	struct intel_ringbuffer *ring = req->ring;
+	struct intel_ring *ring = req->ring;
 	int ret;
 
 	if (req->engine->semaphore.signal)
@@ -1345,7 +1345,7 @@ gen8_ring_sync(struct drm_i915_gem_request *waiter_req,
 	       struct intel_engine_cs *signaller,
 	       u32 seqno)
 {
-	struct intel_ringbuffer *waiter = waiter_req->ring;
+	struct intel_ring *waiter = waiter_req->ring;
 	struct drm_i915_private *dev_priv = waiter_req->i915;
 	int ret;
 
@@ -1373,7 +1373,7 @@ gen6_ring_sync(struct drm_i915_gem_request *waiter_req,
 	       struct intel_engine_cs *signaller,
 	       u32 seqno)
 {
-	struct intel_ringbuffer *waiter = waiter_req->ring;
+	struct intel_ring *waiter = waiter_req->ring;
 	u32 dw1 = MI_SEMAPHORE_MBOX |
 		  MI_SEMAPHORE_COMPARE |
 		  MI_SEMAPHORE_REGISTER;
@@ -1421,7 +1421,7 @@ do {									\
 static int
 pc_render_add_request(struct drm_i915_gem_request *req)
 {
-	struct intel_ringbuffer *ring = req->ring;
+	struct intel_ring *ring = req->ring;
 	u32 addr = req->engine->status_page.gfx_addr +
 		(I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT);
 	u32 scratch_addr = addr;
@@ -1548,7 +1548,7 @@ bsd_ring_flush(struct drm_i915_gem_request *req,
 	       u32     invalidate_domains,
 	       u32     flush_domains)
 {
-	struct intel_ringbuffer *ring = req->ring;
+	struct intel_ring *ring = req->ring;
 	int ret;
 
 	ret = intel_ring_begin(req, 2);
@@ -1564,7 +1564,7 @@ bsd_ring_flush(struct drm_i915_gem_request *req,
 static int
 i9xx_add_request(struct drm_i915_gem_request *req)
 {
-	struct intel_ringbuffer *ring = req->ring;
+	struct intel_ring *ring = req->ring;
 	int ret;
 
 	ret = intel_ring_begin(req, 4);
@@ -1658,7 +1658,7 @@ i965_dispatch_execbuffer(struct drm_i915_gem_request *req,
 			 u64 offset, u32 length,
 			 unsigned dispatch_flags)
 {
-	struct intel_ringbuffer *ring = req->ring;
+	struct intel_ring *ring = req->ring;
 	int ret;
 
 	ret = intel_ring_begin(req, 2);
@@ -1685,7 +1685,7 @@ i830_dispatch_execbuffer(struct drm_i915_gem_request *req,
 			 u64 offset, u32 len,
 			 unsigned dispatch_flags)
 {
-	struct intel_ringbuffer *ring = req->ring;
+	struct intel_ring *ring = req->ring;
 	u32 cs_offset = req->engine->scratch.gtt_offset;
 	int ret;
 
@@ -1748,7 +1748,7 @@ i915_dispatch_execbuffer(struct drm_i915_gem_request *req,
 			 u64 offset, u32 len,
 			 unsigned dispatch_flags)
 {
-	struct intel_ringbuffer *ring = req->ring;
+	struct intel_ring *ring = req->ring;
 	int ret;
 
 	ret = intel_ring_begin(req, 2);
@@ -1845,7 +1845,7 @@ static int init_phys_status_page(struct intel_engine_cs *ring)
 	return 0;
 }
 
-void intel_unpin_ringbuffer_obj(struct intel_ringbuffer *ringbuf)
+void intel_unpin_ring(struct intel_ring *ringbuf)
 {
 	if (HAS_LLC(ringbuf->obj->base.dev) && !ringbuf->obj->stolen)
 		i915_gem_object_unpin_vmap(ringbuf->obj);
@@ -1854,8 +1854,7 @@ void intel_unpin_ringbuffer_obj(struct intel_ringbuffer *ringbuf)
 	i915_gem_object_ggtt_unpin(ringbuf->obj);
 }
 
-int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
-				     struct intel_ringbuffer *ringbuf)
+int intel_pin_and_map_ring(struct drm_device *dev, struct intel_ring *ringbuf)
 {
 	struct drm_i915_private *dev_priv = to_i915(dev);
 	struct drm_i915_gem_object *obj = ringbuf->obj;
@@ -1900,14 +1899,14 @@ unpin:
 	return ret;
 }
 
-static void intel_destroy_ringbuffer_obj(struct intel_ringbuffer *ringbuf)
+static void intel_destroy_ringbuffer_obj(struct intel_ring *ringbuf)
 {
 	drm_gem_object_unreference(&ringbuf->obj->base);
 	ringbuf->obj = NULL;
 }
 
 static int intel_alloc_ringbuffer_obj(struct drm_device *dev,
-				      struct intel_ringbuffer *ringbuf)
+				      struct intel_ring *ringbuf)
 {
 	struct drm_i915_gem_object *obj;
 
@@ -1927,10 +1926,10 @@ static int intel_alloc_ringbuffer_obj(struct drm_device *dev,
 	return 0;
 }
 
-struct intel_ringbuffer *
-intel_engine_create_ringbuffer(struct intel_engine_cs *engine, int size)
+struct intel_ring *
+intel_engine_create_ring(struct intel_engine_cs *engine, int size)
 {
-	struct intel_ringbuffer *ring;
+	struct intel_ring *ring;
 	int ret;
 
 	ring = kzalloc(sizeof(*ring), GFP_KERNEL);
@@ -1968,7 +1967,7 @@ intel_engine_create_ringbuffer(struct intel_engine_cs *engine, int size)
 }
 
 void
-intel_ringbuffer_free(struct intel_ringbuffer *ring)
+intel_ring_free(struct intel_ring *ring)
 {
 	intel_destroy_ringbuffer_obj(ring);
 	list_del(&ring->link);
@@ -1978,7 +1977,7 @@ intel_ringbuffer_free(struct intel_ringbuffer *ring)
 static int intel_init_engine(struct drm_device *dev,
 			     struct intel_engine_cs *engine)
 {
-	struct intel_ringbuffer *ringbuf;
+	struct intel_ring *ringbuf;
 	int ret;
 
 	WARN_ON(engine->buffer);
@@ -1995,7 +1994,7 @@ static int intel_init_engine(struct drm_device *dev,
 
 	intel_engine_init_breadcrumbs(engine);
 
-	ringbuf = intel_engine_create_ringbuffer(engine, 32 * PAGE_SIZE);
+	ringbuf = intel_engine_create_ring(engine, 32 * PAGE_SIZE);
 	if (IS_ERR(ringbuf)) {
 		ret = PTR_ERR(ringbuf);
 		goto error;
@@ -2013,7 +2012,7 @@ static int intel_init_engine(struct drm_device *dev,
 			goto error;
 	}
 
-	ret = intel_pin_and_map_ringbuffer_obj(dev, ringbuf);
+	ret = intel_pin_and_map_ring(dev, ringbuf);
 	if (ret) {
 		DRM_ERROR("Failed to pin and map ringbuffer %s: %d\n",
 				engine->name, ret);
@@ -2043,8 +2042,8 @@ void intel_engine_cleanup(struct intel_engine_cs *ring)
 		intel_engine_stop(ring);
 		WARN_ON(!IS_GEN2(ring->dev) && (I915_READ_MODE(ring) & MODE_IDLE) == 0);
 
-		intel_unpin_ringbuffer_obj(ring->buffer);
-		intel_ringbuffer_free(ring->buffer);
+		intel_unpin_ring(ring->buffer);
+		intel_ring_free(ring->buffer);
 		ring->buffer = NULL;
 	}
 
@@ -2084,7 +2083,7 @@ int intel_ring_alloc_request_extras(struct drm_i915_gem_request *request)
 	return 0;
 }
 
-void intel_ring_reserved_space_reserve(struct intel_ringbuffer *ringbuf, int size)
+void intel_ring_reserved_space_reserve(struct intel_ring *ringbuf, int size)
 {
 	WARN_ON(ringbuf->reserved_size);
 	WARN_ON(ringbuf->reserved_in_use);
@@ -2092,7 +2091,7 @@ void intel_ring_reserved_space_reserve(struct intel_ringbuffer *ringbuf, int siz
 	ringbuf->reserved_size = size;
 }
 
-void intel_ring_reserved_space_cancel(struct intel_ringbuffer *ringbuf)
+void intel_ring_reserved_space_cancel(struct intel_ring *ringbuf)
 {
 	WARN_ON(ringbuf->reserved_in_use);
 
@@ -2100,7 +2099,7 @@ void intel_ring_reserved_space_cancel(struct intel_ringbuffer *ringbuf)
 	ringbuf->reserved_in_use = false;
 }
 
-void intel_ring_reserved_space_use(struct intel_ringbuffer *ringbuf)
+void intel_ring_reserved_space_use(struct intel_ring *ringbuf)
 {
 	WARN_ON(ringbuf->reserved_in_use);
 
@@ -2108,7 +2107,7 @@ void intel_ring_reserved_space_use(struct intel_ringbuffer *ringbuf)
 	ringbuf->reserved_tail   = ringbuf->tail;
 }
 
-void intel_ring_reserved_space_end(struct intel_ringbuffer *ringbuf)
+void intel_ring_reserved_space_end(struct intel_ring *ringbuf)
 {
 	WARN_ON(!ringbuf->reserved_in_use);
 	if (ringbuf->tail > ringbuf->reserved_tail) {
@@ -2133,7 +2132,7 @@ void intel_ring_reserved_space_end(struct intel_ringbuffer *ringbuf)
 
 static int wait_for_space(struct drm_i915_gem_request *req, int bytes)
 {
-	struct intel_ringbuffer *ring = req->ring;
+	struct intel_ring *ring = req->ring;
 	struct intel_engine_cs *engine = req->engine;
 	struct drm_i915_gem_request *target;
 	unsigned space;
@@ -2172,7 +2171,7 @@ static int wait_for_space(struct drm_i915_gem_request *req, int bytes)
 	return 0;
 }
 
-static void ring_wrap(struct intel_ringbuffer *ringbuf)
+static void ring_wrap(struct intel_ring *ringbuf)
 {
 	int rem = ringbuf->size - ringbuf->tail;
 	memset(ringbuf->virtual_start + ringbuf->tail, 0, rem);
@@ -2183,7 +2182,7 @@ static void ring_wrap(struct intel_ringbuffer *ringbuf)
 
 static int ring_prepare(struct drm_i915_gem_request *req, int bytes)
 {
-	struct intel_ringbuffer *ring = req->ring;
+	struct intel_ring *ring = req->ring;
 	int remain_usable = ring->effective_size - ring->tail;
 	int remain_actual = ring->size - ring->tail;
 	int ret, total_bytes, wait_bytes = 0;
@@ -2243,7 +2242,7 @@ int intel_ring_begin(struct drm_i915_gem_request *req, int num_dwords)
 /* Align the ring tail to a cacheline boundary */
 int intel_ring_cacheline_align(struct drm_i915_gem_request *req)
 {
-	struct intel_ringbuffer *ring = req->ring;
+	struct intel_ring *ring = req->ring;
 	int num_dwords = (ring->tail & (CACHELINE_BYTES - 1)) / sizeof(uint32_t);
 	int ret;
 
@@ -2318,7 +2317,7 @@ static void gen6_bsd_ring_write_tail(struct intel_engine_cs *ring,
 static int gen6_bsd_ring_flush(struct drm_i915_gem_request *req,
 			       u32 invalidate, u32 flush)
 {
-	struct intel_ringbuffer *ring = req->ring;
+	struct intel_ring *ring = req->ring;
 	uint32_t cmd;
 	int ret;
 
@@ -2364,7 +2363,7 @@ gen8_ring_dispatch_execbuffer(struct drm_i915_gem_request *req,
 			      u64 offset, u32 len,
 			      unsigned dispatch_flags)
 {
-	struct intel_ringbuffer *ring = req->ring;
+	struct intel_ring *ring = req->ring;
 	bool ppgtt = USES_PPGTT(req->i915) &&
 			!(dispatch_flags & I915_DISPATCH_SECURE);
 	int ret;
@@ -2390,7 +2389,7 @@ hsw_ring_dispatch_execbuffer(struct drm_i915_gem_request *req,
 			     u64 offset, u32 len,
 			     unsigned dispatch_flags)
 {
-	struct intel_ringbuffer *ring = req->ring;
+	struct intel_ring *ring = req->ring;
 	int ret;
 
 	ret = intel_ring_begin(req, 2);
@@ -2415,7 +2414,7 @@ gen6_ring_dispatch_execbuffer(struct drm_i915_gem_request *req,
 			      u64 offset, u32 len,
 			      unsigned dispatch_flags)
 {
-	struct intel_ringbuffer *ring = req->ring;
+	struct intel_ring *ring = req->ring;
 	int ret;
 
 	ret = intel_ring_begin(req, 2);
@@ -2438,7 +2437,7 @@ gen6_ring_dispatch_execbuffer(struct drm_i915_gem_request *req,
 static int gen6_ring_flush(struct drm_i915_gem_request *req,
 			   u32 invalidate, u32 flush)
 {
-	struct intel_ringbuffer *ring = req->ring;
+	struct intel_ring *ring = req->ring;
 	uint32_t cmd;
 	int ret;
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 6803e4820688..71941af13560 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -97,7 +97,7 @@ struct intel_engine_hangcheck {
 	u32 instdone[I915_NUM_INSTDONE_REG];
 };
 
-struct intel_ringbuffer {
+struct intel_ring {
 	struct drm_i915_gem_object *obj;
 	void *virtual_start;
 
@@ -163,7 +163,7 @@ struct intel_engine_cs {
 	u32		mmio_base;
 	struct		drm_device *dev;
 	struct drm_i915_private *i915;
-	struct intel_ringbuffer *buffer;
+	struct intel_ring *buffer;
 	struct list_head buffers;
 
 	/* Rather than have every client wait upon all user interrupts,
@@ -454,12 +454,11 @@ intel_write_status_page(struct intel_engine_cs *ring,
 #define I915_GEM_HWS_SCRATCH_INDEX	0x40
 #define I915_GEM_HWS_SCRATCH_ADDR (I915_GEM_HWS_SCRATCH_INDEX << MI_STORE_DWORD_INDEX_SHIFT)
 
-struct intel_ringbuffer *
-intel_engine_create_ringbuffer(struct intel_engine_cs *engine, int size);
-int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
-				     struct intel_ringbuffer *ringbuf);
-void intel_unpin_ringbuffer_obj(struct intel_ringbuffer *ringbuf);
-void intel_ringbuffer_free(struct intel_ringbuffer *ring);
+struct intel_ring *
+intel_engine_create_ring(struct intel_engine_cs *engine, int size);
+int intel_pin_and_map_ring(struct drm_device *dev, struct intel_ring *ring);
+void intel_unpin_ring(struct intel_ring *ring);
+void intel_ring_free(struct intel_ring *ring);
 
 void intel_engine_stop(struct intel_engine_cs *ring);
 void intel_engine_cleanup(struct intel_engine_cs *ring);
@@ -468,24 +467,22 @@ int intel_ring_alloc_request_extras(struct drm_i915_gem_request *request);
 
 int __must_check intel_ring_begin(struct drm_i915_gem_request *req, int n);
 int __must_check intel_ring_cacheline_align(struct drm_i915_gem_request *req);
-static inline void intel_ring_emit(struct intel_ringbuffer *rb,
-				   u32 data)
+static inline void intel_ring_emit(struct intel_ring *ring, u32 data)
 {
-	*(uint32_t *)(rb->virtual_start + rb->tail) = data;
-	rb->tail += 4;
+	*(uint32_t *)(ring->virtual_start + ring->tail) = data;
+	ring->tail += 4;
 }
-static inline void intel_ring_emit_reg(struct intel_ringbuffer *rb,
-				       i915_reg_t reg)
+static inline void intel_ring_emit_reg(struct intel_ring *ring, i915_reg_t reg)
 {
-	intel_ring_emit(rb, i915_mmio_reg_offset(reg));
+	intel_ring_emit(ring, i915_mmio_reg_offset(reg));
 }
-static inline void intel_ring_advance(struct intel_ringbuffer *rb)
+static inline void intel_ring_advance(struct intel_ring *ring)
 {
-	rb->tail &= rb->size - 1;
+	ring->tail &= ring->size - 1;
 }
 int __intel_ring_space(int head, int tail, int size);
-void intel_ring_update_space(struct intel_ringbuffer *ringbuf);
-int intel_ring_space(struct intel_ringbuffer *ringbuf);
+void intel_ring_update_space(struct intel_ring *ringbuf);
+int intel_ring_space(struct intel_ring *ringbuf);
 
 int __must_check intel_engine_idle(struct intel_engine_cs *ring);
 void intel_engine_init_seqno(struct intel_engine_cs *ring, u32 seqno);
@@ -509,7 +506,7 @@ static inline u32 intel_engine_get_seqno(struct intel_engine_cs *ring)
 
 int init_workarounds_ring(struct intel_engine_cs *ring);
 
-static inline u32 intel_ring_get_tail(struct intel_ringbuffer *ringbuf)
+static inline u32 intel_ring_get_tail(struct intel_ring *ringbuf)
 {
 	return ringbuf->tail;
 }
@@ -528,13 +525,13 @@ static inline u32 intel_ring_get_tail(struct intel_ringbuffer *ringbuf)
  * will always have sufficient room to do its stuff. The request creation
  * code calls this automatically.
  */
-void intel_ring_reserved_space_reserve(struct intel_ringbuffer *ringbuf, int size);
+void intel_ring_reserved_space_reserve(struct intel_ring *ringbuf, int size);
 /* Cancel the reservation, e.g. because the request is being discarded. */
-void intel_ring_reserved_space_cancel(struct intel_ringbuffer *ringbuf);
+void intel_ring_reserved_space_cancel(struct intel_ring *ringbuf);
 /* Use the reserved space - for use by i915_add_request() only. */
-void intel_ring_reserved_space_use(struct intel_ringbuffer *ringbuf);
+void intel_ring_reserved_space_use(struct intel_ring *ringbuf);
 /* Finish with the reserved space - for use by i915_add_request() only. */
-void intel_ring_reserved_space_end(struct intel_ringbuffer *ringbuf);
+void intel_ring_reserved_space_end(struct intel_ring *ringbuf);
 
 /* intel_breadcrumbs.c -- user interrupt bottom-half for waiters */
 struct intel_wait {
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 064/190] drm/i915: Rename intel_pin_and_map_ring()
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (61 preceding siblings ...)
  2016-01-11  9:17 ` [PATCH 063/190] drm/i915: Rename struct intel_ringbuffer to intel_ring Chris Wilson
@ 2016-01-11  9:17 ` Chris Wilson
  2016-01-11  9:17 ` [PATCH 065/190] drm/i915: Remove obsolete engine->gpu_caches_dirty Chris Wilson
                   ` (23 subsequent siblings)
  86 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:17 UTC (permalink / raw)
  To: intel-gfx

For more consistent object-verb naming, we want the ring operations to
read as intel_ring_<verb>(), so pick intel_ring_map().
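
As a rough illustration of the pattern (stand-in types only, not the
driver's own), the back-pointer from the ring to its engine is what
lets the drm_device parameter drop out of the map routine:

struct engine { void *i915; };            /* stand-in for intel_engine_cs */
struct ring   { struct engine *engine; }; /* stand-in for intel_ring */

/* The ring can reach all device state through ring->engine, so the
 * operation reads as intel_ring_<verb>(ring) with no extra argument. */
static int ring_map(struct ring *ring)
{
	void *i915 = ring->engine->i915;  /* was: to_i915(dev) */

	return i915 ? 0 : -1;
}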

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_lrc.c        |  6 ++---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 44 ++++++++++++++++-----------------
 drivers/gpu/drm/i915/intel_ringbuffer.h |  4 +--
 3 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index fa4c0c0db994..3a80d9d45f5c 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -889,7 +889,7 @@ static int intel_lr_context_do_pin(struct intel_engine_cs *ring,
 	if (ret)
 		return ret;
 
-	ret = intel_pin_and_map_ring(ring->dev, ringbuf);
+	ret = intel_ring_map(ringbuf);
 	if (ret)
 		goto unpin_ctx_obj;
 
@@ -936,7 +936,7 @@ void intel_lr_context_unpin(struct drm_i915_gem_request *rq)
 	if (ctx_obj) {
 		WARN_ON(!mutex_is_locked(&rq->i915->dev->struct_mutex));
 		if (--rq->ctx->engine[engine].pin_count == 0) {
-			intel_unpin_ring(ring);
+			intel_ring_unmap(ring);
 			i915_gem_object_ggtt_unpin(ctx_obj);
 			i915_gem_context_unreference(rq->ctx);
 		}
@@ -2178,7 +2178,7 @@ void intel_lr_context_free(struct intel_context *ctx)
 			struct intel_engine_cs *engine = ring->engine;
 
 			if (ctx == engine->default_context) {
-				intel_unpin_ring(ring);
+				intel_ring_unmap(ring);
 				i915_gem_object_ggtt_unpin(ctx_obj);
 			}
 			WARN_ON(ctx->engine[engine->id].pin_count);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 95974156a1d9..74a4a54e6ca5 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1845,22 +1845,12 @@ static int init_phys_status_page(struct intel_engine_cs *ring)
 	return 0;
 }
 
-void intel_unpin_ring(struct intel_ring *ringbuf)
+int intel_ring_map(struct intel_ring *ring)
 {
-	if (HAS_LLC(ringbuf->obj->base.dev) && !ringbuf->obj->stolen)
-		i915_gem_object_unpin_vmap(ringbuf->obj);
-	else
-		iounmap(ringbuf->virtual_start);
-	i915_gem_object_ggtt_unpin(ringbuf->obj);
-}
-
-int intel_pin_and_map_ring(struct drm_device *dev, struct intel_ring *ringbuf)
-{
-	struct drm_i915_private *dev_priv = to_i915(dev);
-	struct drm_i915_gem_object *obj = ringbuf->obj;
+	struct drm_i915_gem_object *obj = ring->obj;
 	int ret;
 
-	if (HAS_LLC(dev_priv) && !obj->stolen) {
+	if (HAS_LLC(ring->engine->i915) && !obj->stolen) {
 		ret = i915_gem_obj_ggtt_pin(obj, PAGE_SIZE, 0);
 		if (ret)
 			return ret;
@@ -1869,10 +1859,10 @@ int intel_pin_and_map_ring(struct drm_device *dev, struct intel_ring *ringbuf)
 		if (ret)
 			goto unpin;
 
-		ringbuf->virtual_start = i915_gem_object_pin_vmap(obj);
-		if (IS_ERR(ringbuf->virtual_start)) {
-			ret = PTR_ERR(ringbuf->virtual_start);
-			ringbuf->virtual_start = NULL;
+		ring->virtual_start = i915_gem_object_pin_vmap(obj);
+		if (IS_ERR(ring->virtual_start)) {
+			ret = PTR_ERR(ring->virtual_start);
+			ring->virtual_start = NULL;
 			goto unpin;
 		}
 	} else {
@@ -1884,9 +1874,10 @@ int intel_pin_and_map_ring(struct drm_device *dev, struct intel_ring *ringbuf)
 		if (ret)
 			goto unpin;
 
-		ringbuf->virtual_start = ioremap_wc(dev_priv->gtt.mappable_base +
-						    i915_gem_obj_ggtt_offset(obj), ringbuf->size);
-		if (ringbuf->virtual_start == NULL) {
+		ring->virtual_start = ioremap_wc(ring->engine->i915->gtt.mappable_base +
+						 i915_gem_obj_ggtt_offset(obj),
+						 ring->size);
+		if (ring->virtual_start == NULL) {
 			ret = -ENOMEM;
 			goto unpin;
 		}
@@ -1899,6 +1890,15 @@ unpin:
 	return ret;
 }
 
+void intel_ring_unmap(struct intel_ring *ring)
+{
+	if (HAS_LLC(ring->engine->i915) && !ring->obj->stolen)
+		i915_gem_object_unpin_vmap(ring->obj);
+	else
+		iounmap(ring->virtual_start);
+	i915_gem_object_ggtt_unpin(ring->obj);
+}
+
 static void intel_destroy_ringbuffer_obj(struct intel_ring *ringbuf)
 {
 	drm_gem_object_unreference(&ringbuf->obj->base);
@@ -2012,7 +2012,7 @@ static int intel_init_engine(struct drm_device *dev,
 			goto error;
 	}
 
-	ret = intel_pin_and_map_ring(dev, ringbuf);
+	ret = intel_ring_map(ringbuf);
 	if (ret) {
 		DRM_ERROR("Failed to pin and map ringbuffer %s: %d\n",
 				engine->name, ret);
@@ -2042,7 +2042,7 @@ void intel_engine_cleanup(struct intel_engine_cs *ring)
 		intel_engine_stop(ring);
 		WARN_ON(!IS_GEN2(ring->dev) && (I915_READ_MODE(ring) & MODE_IDLE) == 0);
 
-		intel_unpin_ring(ring->buffer);
+		intel_ring_unmap(ring->buffer);
 		intel_ring_free(ring->buffer);
 		ring->buffer = NULL;
 	}
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 71941af13560..15d067b9b8a2 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -456,8 +456,8 @@ intel_write_status_page(struct intel_engine_cs *ring,
 
 struct intel_ring *
 intel_engine_create_ring(struct intel_engine_cs *engine, int size);
-int intel_pin_and_map_ring(struct drm_device *dev, struct intel_ring *ring);
-void intel_unpin_ring(struct intel_ring *ring);
+int intel_ring_map(struct intel_ring *ring);
+void intel_ring_unmap(struct intel_ring *ring);
 void intel_ring_free(struct intel_ring *ring);
 
 void intel_engine_stop(struct intel_engine_cs *ring);
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 065/190] drm/i915: Remove obsolete engine->gpu_caches_dirty
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (62 preceding siblings ...)
  2016-01-11  9:17 ` [PATCH 064/190] drm/i915: Rename intel_pin_and_map_ring() Chris Wilson
@ 2016-01-11  9:17 ` Chris Wilson
  2016-01-11  9:17 ` [PATCH 066/190] drm/i915: Simplify request_alloc by returning the allocated request Chris Wilson
                   ` (22 subsequent siblings)
  86 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:17 UTC (permalink / raw)
  To: intel-gfx

Space for flushing the GPU cache prior to completing the request is
preallocated, so emitting that flush cannot fail.
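
As a minimal sketch of the resulting calling convention (hypothetical
stand-in types; the real vfunc lives on intel_engine_cs), callers now
invoke emit_flush directly and treat failure as a bug, rather than
tracking a gpu_caches_dirty flag and flushing lazily:

#include <stdio.h>

struct request;
struct engine {
	int (*emit_flush)(struct request *req,
			  unsigned invalidate, unsigned flush);
};
struct request { struct engine *engine; };

#define GPU_DOMAINS 0x3f	/* placeholder for I915_GEM_GPU_DOMAINS */

static void add_request(struct request *req)
{
	/* The ring space for this flush was reserved up front, so the
	 * emit cannot run out of room here; warn instead of failing. */
	int ret = req->engine->emit_flush(req, 0, GPU_DOMAINS);
	if (ret)
		fprintf(stderr, "emit_flush failed: %d!\n", ret);
}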

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_context.c    |  2 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  9 +---
 drivers/gpu/drm/i915/i915_gem_gtt.c        | 18 ++++----
 drivers/gpu/drm/i915/i915_gem_request.c    |  7 ++-
 drivers/gpu/drm/i915/intel_lrc.c           | 47 +++----------------
 drivers/gpu/drm/i915/intel_lrc.h           |  2 -
 drivers/gpu/drm/i915/intel_ringbuffer.c    | 72 +++++++-----------------------
 drivers/gpu/drm/i915/intel_ringbuffer.h    |  7 ---
 8 files changed, 39 insertions(+), 125 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 17fe8ed991d6..c078ebc29da5 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -534,7 +534,7 @@ mi_set_context(struct drm_i915_gem_request *req, u32 hw_flags)
 	 * itlb_before_ctx_switch.
 	 */
 	if (IS_GEN6(req->i915)) {
-		ret = req->engine->flush(req, I915_GEM_GPU_DOMAINS, 0);
+		ret = req->engine->emit_flush(req, I915_GEM_GPU_DOMAINS, 0);
 		if (ret)
 			return ret;
 	}
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 731ce13dbdbc..a56fae99a1bc 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -969,10 +969,8 @@ i915_gem_execbuffer_move_to_gpu(struct drm_i915_gem_request *req,
 	if (flush_domains & I915_GEM_DOMAIN_GTT)
 		wmb();
 
-	/* Unconditionally invalidate gpu caches and ensure that we do flush
-	 * any residual writes from the previous batch.
-	 */
-	return intel_engine_invalidate_all_caches(req);
+	/* Unconditionally invalidate gpu caches and TLBs. */
+	return req->engine->emit_flush(req, I915_GEM_GPU_DOMAINS, 0);
 }
 
 static bool
@@ -1138,9 +1136,6 @@ i915_gem_execbuffer_move_to_active(struct list_head *vmas,
 static void
 i915_gem_execbuffer_retire_commands(struct i915_execbuffer_params *params)
 {
-	/* Unconditionally force add_request to emit a full flush. */
-	params->ring->gpu_caches_dirty = true;
-
 	/* Add a breadcrumb for the completion of the batch buffer */
 	__i915_add_request(params->request, params->batch_obj, true);
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 9a91451d66ac..cddbd8c00663 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1652,9 +1652,9 @@ static int hsw_mm_switch(struct i915_hw_ppgtt *ppgtt,
 	int ret;
 
 	/* NB: TLBs must be flushed and invalidated before a switch */
-	ret = req->engine->flush(req,
-				 I915_GEM_GPU_DOMAINS,
-				 I915_GEM_GPU_DOMAINS);
+	ret = req->engine->emit_flush(req,
+				      I915_GEM_GPU_DOMAINS,
+				      I915_GEM_GPU_DOMAINS);
 	if (ret)
 		return ret;
 
@@ -1690,9 +1690,9 @@ static int gen7_mm_switch(struct i915_hw_ppgtt *ppgtt,
 	int ret;
 
 	/* NB: TLBs must be flushed and invalidated before a switch */
-	ret = req->engine->flush(req,
-				 I915_GEM_GPU_DOMAINS,
-				 I915_GEM_GPU_DOMAINS);
+	ret = req->engine->emit_flush(req,
+				      I915_GEM_GPU_DOMAINS,
+				      I915_GEM_GPU_DOMAINS);
 	if (ret)
 		return ret;
 
@@ -1710,9 +1710,9 @@ static int gen7_mm_switch(struct i915_hw_ppgtt *ppgtt,
 
 	/* XXX: RCS is the only one to auto invalidate the TLBs? */
 	if (req->engine->id != RCS) {
-		ret = req->engine->flush(req,
-					 I915_GEM_GPU_DOMAINS,
-					 I915_GEM_GPU_DOMAINS);
+		ret = req->engine->emit_flush(req,
+					      I915_GEM_GPU_DOMAINS,
+					      I915_GEM_GPU_DOMAINS);
 		if (ret)
 			return ret;
 	}
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index e1f2af046b6c..e911430575fe 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -426,10 +426,9 @@ void __i915_add_request(struct drm_i915_gem_request *request,
 	 * what.
 	 */
 	if (flush_caches) {
-		if (i915.enable_execlists)
-			ret = logical_ring_flush_all_caches(request);
-		else
-			ret = intel_engine_flush_all_caches(request);
+		ret = request->engine->emit_flush(request,
+						  0, I915_GEM_GPU_DOMAINS);
+
 		/* Not allowed to fail! */
 		WARN(ret, "*_ring_flush_all_caches failed: %d!\n", ret);
 	}
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 3a80d9d45f5c..b889680f7491 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -616,24 +616,6 @@ static int execlists_context_queue(struct drm_i915_gem_request *request)
 	return 0;
 }
 
-static int logical_ring_invalidate_all_caches(struct drm_i915_gem_request *req)
-{
-	struct intel_engine_cs *engine = req->engine;
-	uint32_t flush_domains;
-	int ret;
-
-	flush_domains = 0;
-	if (engine->gpu_caches_dirty)
-		flush_domains = I915_GEM_GPU_DOMAINS;
-
-	ret = engine->emit_flush(req, I915_GEM_GPU_DOMAINS, flush_domains);
-	if (ret)
-		return ret;
-
-	engine->gpu_caches_dirty = false;
-	return 0;
-}
-
 static int execlists_move_to_gpu(struct drm_i915_gem_request *req,
 				 struct list_head *vmas)
 {
@@ -664,7 +646,7 @@ static int execlists_move_to_gpu(struct drm_i915_gem_request *req,
 	/* Unconditionally invalidate gpu caches and ensure that we do flush
 	 * any residual writes from the previous batch.
 	 */
-	return logical_ring_invalidate_all_caches(req);
+	return req->engine->emit_flush(req, I915_GEM_GPU_DOMAINS, 0);
 }
 
 int intel_logical_ring_alloc_request_extras(struct drm_i915_gem_request *request)
@@ -860,22 +842,6 @@ void intel_logical_ring_stop(struct intel_engine_cs *ring)
 	I915_WRITE_MODE(ring, _MASKED_BIT_DISABLE(STOP_RING));
 }
 
-int logical_ring_flush_all_caches(struct drm_i915_gem_request *req)
-{
-	struct intel_engine_cs *engine = req->engine;
-	int ret;
-
-	if (!engine->gpu_caches_dirty)
-		return 0;
-
-	ret = engine->emit_flush(req, 0, I915_GEM_GPU_DOMAINS);
-	if (ret)
-		return ret;
-
-	engine->gpu_caches_dirty = false;
-	return 0;
-}
-
 static int intel_lr_context_do_pin(struct intel_engine_cs *ring,
 		struct drm_i915_gem_object *ctx_obj,
 		struct intel_ring *ringbuf)
@@ -946,7 +912,6 @@ void intel_lr_context_unpin(struct drm_i915_gem_request *rq)
 static int intel_logical_ring_workarounds_emit(struct drm_i915_gem_request *req)
 {
 	int ret, i;
-	struct intel_engine_cs *engine = req->engine;
 	struct intel_ring *ring = req->ring;
 	struct drm_i915_private *dev_priv = req->i915;
 	struct i915_workarounds *w = &dev_priv->workarounds;
@@ -954,8 +919,9 @@ static int intel_logical_ring_workarounds_emit(struct drm_i915_gem_request *req)
 	if (w->count == 0)
 		return 0;
 
-	engine->gpu_caches_dirty = true;
-	ret = logical_ring_flush_all_caches(req);
+	ret = req->engine->emit_flush(req,
+				      I915_GEM_GPU_DOMAINS,
+				      I915_GEM_GPU_DOMAINS);
 	if (ret)
 		return ret;
 
@@ -972,8 +938,9 @@ static int intel_logical_ring_workarounds_emit(struct drm_i915_gem_request *req)
 
 	intel_ring_advance(ring);
 
-	engine->gpu_caches_dirty = true;
-	ret = logical_ring_flush_all_caches(req);
+	ret = req->engine->emit_flush(req,
+				      I915_GEM_GPU_DOMAINS,
+				      I915_GEM_GPU_DOMAINS);
 	if (ret)
 		return ret;
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index c88988a41898..7f01d2ddacfa 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -60,8 +60,6 @@ void intel_logical_ring_stop(struct intel_engine_cs *ring);
 void intel_logical_ring_cleanup(struct intel_engine_cs *ring);
 int intel_logical_rings_init(struct drm_device *dev);
 
-int logical_ring_flush_all_caches(struct drm_i915_gem_request *req);
-
 /* Logical Ring Contexts */
 
 /* One extra page is added before LRC for GuC as shared data */
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 74a4a54e6ca5..e584b0f631f8 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -696,8 +696,9 @@ static int intel_ring_workarounds_emit(struct drm_i915_gem_request *req)
 	if (w->count == 0)
 		return 0;
 
-	req->engine->gpu_caches_dirty = true;
-	ret = intel_engine_flush_all_caches(req);
+	ret = req->engine->emit_flush(req,
+				      I915_GEM_GPU_DOMAINS,
+				      I915_GEM_GPU_DOMAINS);
 	if (ret)
 		return ret;
 
@@ -714,8 +715,9 @@ static int intel_ring_workarounds_emit(struct drm_i915_gem_request *req)
 
 	intel_ring_advance(ring);
 
-	req->engine->gpu_caches_dirty = true;
-	ret = intel_engine_flush_all_caches(req);
+	ret = req->engine->emit_flush(req,
+				      I915_GEM_GPU_DOMAINS,
+				      I915_GEM_GPU_DOMAINS);
 	if (ret)
 		return ret;
 
@@ -2509,7 +2511,7 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 
 		ring->init_context = intel_rcs_ctx_init;
 		ring->add_request = gen6_add_request;
-		ring->flush = gen8_render_ring_flush;
+		ring->emit_flush = gen8_render_ring_flush;
 		ring->irq_enable = gen8_ring_enable_irq;
 		ring->irq_disable = gen8_ring_disable_irq;
 		ring->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
@@ -2523,9 +2525,9 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 	} else if (INTEL_INFO(dev)->gen >= 6) {
 		ring->init_context = intel_rcs_ctx_init;
 		ring->add_request = gen6_add_request;
-		ring->flush = gen7_render_ring_flush;
+		ring->emit_flush = gen7_render_ring_flush;
 		if (INTEL_INFO(dev)->gen == 6)
-			ring->flush = gen6_render_ring_flush;
+			ring->emit_flush = gen6_render_ring_flush;
 		ring->irq_enable = gen6_ring_enable_irq;
 		ring->irq_disable = gen6_ring_disable_irq;
 		ring->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
@@ -2553,7 +2555,7 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 		}
 	} else if (IS_GEN5(dev)) {
 		ring->add_request = pc_render_add_request;
-		ring->flush = gen4_render_ring_flush;
+		ring->emit_flush = gen4_render_ring_flush;
 		ring->irq_enable = gen5_ring_enable_irq;
 		ring->irq_disable = gen5_ring_disable_irq;
 		ring->irq_enable_mask = GT_RENDER_USER_INTERRUPT |
@@ -2561,9 +2563,9 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 	} else {
 		ring->add_request = i9xx_add_request;
 		if (INTEL_INFO(dev)->gen < 4)
-			ring->flush = gen2_render_ring_flush;
+			ring->emit_flush = gen2_render_ring_flush;
 		else
-			ring->flush = gen4_render_ring_flush;
+			ring->emit_flush = gen4_render_ring_flush;
 		if (IS_GEN2(dev)) {
 			ring->irq_enable = i8xx_ring_enable_irq;
 			ring->irq_disable = i8xx_ring_disable_irq;
@@ -2636,7 +2638,7 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev)
 		/* gen6 bsd needs a special wa for tail updates */
 		if (IS_GEN6(dev))
 			ring->write_tail = gen6_bsd_ring_write_tail;
-		ring->flush = gen6_bsd_ring_flush;
+		ring->emit_flush = gen6_bsd_ring_flush;
 		ring->add_request = gen6_add_request;
 		ring->irq_seqno_barrier = gen6_seqno_barrier;
 		if (INTEL_INFO(dev)->gen >= 8) {
@@ -2674,7 +2676,7 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev)
 		}
 	} else {
 		ring->mmio_base = BSD_RING_BASE;
-		ring->flush = bsd_ring_flush;
+		ring->emit_flush = bsd_ring_flush;
 		ring->add_request = i9xx_add_request;
 		if (IS_GEN5(dev)) {
 			ring->irq_enable_mask = ILK_BSD_USER_INTERRUPT;
@@ -2705,7 +2707,7 @@ int intel_init_bsd2_ring_buffer(struct drm_device *dev)
 
 	ring->write_tail = ring_write_tail;
 	ring->mmio_base = GEN8_BSD2_RING_BASE;
-	ring->flush = gen6_bsd_ring_flush;
+	ring->emit_flush = gen6_bsd_ring_flush;
 	ring->add_request = gen6_add_request;
 	ring->irq_seqno_barrier = gen6_seqno_barrier;
 	ring->irq_enable_mask =
@@ -2734,7 +2736,7 @@ int intel_init_blt_ring_buffer(struct drm_device *dev)
 
 	ring->mmio_base = BLT_RING_BASE;
 	ring->write_tail = ring_write_tail;
-	ring->flush = gen6_ring_flush;
+	ring->emit_flush = gen6_ring_flush;
 	ring->add_request = gen6_add_request;
 	ring->irq_seqno_barrier = gen6_seqno_barrier;
 	if (INTEL_INFO(dev)->gen >= 8) {
@@ -2790,7 +2792,7 @@ int intel_init_vebox_ring_buffer(struct drm_device *dev)
 
 	ring->mmio_base = VEBOX_RING_BASE;
 	ring->write_tail = ring_write_tail;
-	ring->flush = gen6_ring_flush;
+	ring->emit_flush = gen6_ring_flush;
 	ring->add_request = gen6_add_request;
 	ring->irq_seqno_barrier = gen6_seqno_barrier;
 
@@ -2830,46 +2832,6 @@ int intel_init_vebox_ring_buffer(struct drm_device *dev)
 	return intel_init_engine(dev, ring);
 }
 
-int
-intel_engine_flush_all_caches(struct drm_i915_gem_request *req)
-{
-	struct intel_engine_cs *engine = req->engine;
-	int ret;
-
-	if (!engine->gpu_caches_dirty)
-		return 0;
-
-	ret = engine->flush(req, 0, I915_GEM_GPU_DOMAINS);
-	if (ret)
-		return ret;
-
-	trace_i915_gem_ring_flush(req, 0, I915_GEM_GPU_DOMAINS);
-
-	engine->gpu_caches_dirty = false;
-	return 0;
-}
-
-int
-intel_engine_invalidate_all_caches(struct drm_i915_gem_request *req)
-{
-	struct intel_engine_cs *engine = req->engine;
-	uint32_t flush_domains;
-	int ret;
-
-	flush_domains = 0;
-	if (engine->gpu_caches_dirty)
-		flush_domains = I915_GEM_GPU_DOMAINS;
-
-	ret = engine->flush(req, I915_GEM_GPU_DOMAINS, flush_domains);
-	if (ret)
-		return ret;
-
-	trace_i915_gem_ring_flush(req, I915_GEM_GPU_DOMAINS, flush_domains);
-
-	engine->gpu_caches_dirty = false;
-	return 0;
-}
-
 void
 intel_engine_stop(struct intel_engine_cs *ring)
 {
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 15d067b9b8a2..fdeadae726b8 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -215,9 +215,6 @@ struct intel_engine_cs {
 
 	void		(*write_tail)(struct intel_engine_cs *ring,
 				      u32 value);
-	int __must_check (*flush)(struct drm_i915_gem_request *req,
-				  u32	invalidate_domains,
-				  u32	flush_domains);
 	int		(*add_request)(struct drm_i915_gem_request *req);
 	/* Some chipsets are not quite as coherent as advertised and need
 	 * an expensive kick to force a true read of the up-to-date seqno.
@@ -332,8 +329,6 @@ struct intel_engine_cs {
 	u32 last_submitted_seqno;
 	unsigned user_interrupts;
 
-	bool gpu_caches_dirty;
-
 	struct intel_context *default_context;
 	struct intel_context *last_context;
 
@@ -486,8 +481,6 @@ int intel_ring_space(struct intel_ring *ringbuf);
 
 int __must_check intel_engine_idle(struct intel_engine_cs *ring);
 void intel_engine_init_seqno(struct intel_engine_cs *ring, u32 seqno);
-int intel_engine_flush_all_caches(struct drm_i915_gem_request *req);
-int intel_engine_invalidate_all_caches(struct drm_i915_gem_request *req);
 
 void intel_fini_pipe_control(struct intel_engine_cs *ring);
 int intel_init_pipe_control(struct intel_engine_cs *ring);
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 066/190] drm/i915: Simplify request_alloc by returning the allocated request
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (63 preceding siblings ...)
  2016-01-11  9:17 ` [PATCH 065/190] drm/i915: Remove obsolete engine->gpu_caches_dirty Chris Wilson
@ 2016-01-11  9:17 ` Chris Wilson
  2016-01-12 17:11   ` Dave Gordon
  2016-01-11  9:17 ` [PATCH 067/190] drm/i915: Unify legacy/execlists emission of MI_BATCHBUFFER_START Chris Wilson
                   ` (21 subsequent siblings)
  86 siblings, 1 reply; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:17 UTC (permalink / raw)
  To: intel-gfx

It is simpler, and leads to more readable code through the callstack,
if the allocation returns the allocated struct through the return
value.

The importance of this is that it no longer looks like we accidentally
allocate requests as a side-effect of calling certain functions.
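
For reference, the return convention being adopted is the kernel's
ERR_PTR()/IS_ERR() idiom; a self-contained userspace sketch of how it
encodes an errno in the pointer itself (request_alloc here is a toy
stand-in, not the driver function):

#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

static inline void *ERR_PTR(long err) { return (void *)err; }
static inline long PTR_ERR(const void *p) { return (long)p; }
static inline int IS_ERR(const void *p)
{
	/* the last page of the address space is reserved for errnos */
	return (uintptr_t)p >= (uintptr_t)-4095;
}

static void *request_alloc(int fail)
{
	if (fail)
		return ERR_PTR(-ENOMEM);	/* error rides in the pointer */
	return malloc(16);			/* success: a real object */
}

int main(void)
{
	void *req = request_alloc(1);

	if (IS_ERR(req))
		printf("alloc failed: %ld\n", PTR_ERR(req));
	else
		free(req);
	return 0;
}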

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h            |  3 +-
 drivers/gpu/drm/i915/i915_gem.c            | 82 ++++++++++--------------------
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  8 +--
 drivers/gpu/drm/i915/i915_gem_request.c    | 22 +++-----
 drivers/gpu/drm/i915/i915_gem_request.h    |  6 +--
 drivers/gpu/drm/i915/i915_trace.h          | 15 +++---
 drivers/gpu/drm/i915/intel_display.c       | 25 +++++----
 drivers/gpu/drm/i915/intel_lrc.c           |  6 +--
 drivers/gpu/drm/i915/intel_overlay.c       | 24 ++++-----
 9 files changed, 77 insertions(+), 114 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 44e8738c5310..0c580124d46d 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2786,8 +2786,7 @@ static inline void i915_gem_object_unpin_vmap(struct drm_i915_gem_object *obj)
 
 int __must_check i915_mutex_lock_interruptible(struct drm_device *dev);
 int i915_gem_object_sync(struct drm_i915_gem_object *obj,
-			 struct intel_engine_cs *to,
-			 struct drm_i915_gem_request **to_req);
+			 struct drm_i915_gem_request *to);
 void i915_vma_move_to_active(struct i915_vma *vma,
 			     struct drm_i915_gem_request *req);
 int i915_gem_dumb_create(struct drm_file *file_priv,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 1c6beb154d07..5b5afdcd9634 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2550,47 +2550,35 @@ out:
 
 static int
 __i915_gem_object_sync(struct drm_i915_gem_object *obj,
-		       struct intel_engine_cs *to,
-		       struct drm_i915_gem_request *from_req,
-		       struct drm_i915_gem_request **to_req)
+		       struct drm_i915_gem_request *to,
+		       struct drm_i915_gem_request *from)
 {
-	struct intel_engine_cs *from;
 	int ret;
 
-	from = from_req->engine;
-	if (to == from)
+	if (to->engine == from->engine)
 		return 0;
 
-	if (i915_gem_request_completed(from_req))
+	if (i915_gem_request_completed(from))
 		return 0;
 
 	if (!i915.semaphores) {
-		struct drm_i915_private *i915 = from_req->i915;
-		ret = __i915_wait_request(from_req,
-					  i915->mm.interruptible,
+		ret = __i915_wait_request(from,
+					  to->i915->mm.interruptible,
 					  NULL,
 					  NO_WAITBOOST);
 		if (ret)
 			return ret;
 
-		i915_gem_object_retire_request(obj, from_req);
+		i915_gem_object_retire_request(obj, from);
 	} else {
-		int idx = intel_engine_sync_index(from, to);
-		u32 seqno = i915_gem_request_get_seqno(from_req);
+		int idx = intel_engine_sync_index(from->engine, to->engine);
+		u32 seqno = i915_gem_request_get_seqno(from);
 
-		WARN_ON(!to_req);
-
-		if (seqno <= from->semaphore.sync_seqno[idx])
+		if (seqno <= from->engine->semaphore.sync_seqno[idx])
 			return 0;
 
-		if (*to_req == NULL) {
-			ret = i915_gem_request_alloc(to, to->default_context, to_req);
-			if (ret)
-				return ret;
-		}
-
-		trace_i915_gem_ring_sync_to(*to_req, from, from_req);
-		ret = to->semaphore.sync_to(*to_req, from, seqno);
+		trace_i915_gem_ring_sync_to(to, from);
+		ret = to->engine->semaphore.sync_to(to, from->engine, seqno);
 		if (ret)
 			return ret;
 
@@ -2598,8 +2586,8 @@ __i915_gem_object_sync(struct drm_i915_gem_object *obj,
 		 * might have just caused seqno wrap under
 		 * the radar.
 		 */
-		from->semaphore.sync_seqno[idx] =
-			i915_gem_request_get_seqno(obj->last_read_req[from->id]);
+		from->engine->semaphore.sync_seqno[idx] =
+			i915_gem_request_get_seqno(obj->last_read_req[from->engine->id]);
 	}
 
 	return 0;
@@ -2609,17 +2597,12 @@ __i915_gem_object_sync(struct drm_i915_gem_object *obj,
  * i915_gem_object_sync - sync an object to a ring.
  *
  * @obj: object which may be in use on another ring.
- * @to: ring we wish to use the object on. May be NULL.
- * @to_req: request we wish to use the object for. See below.
- *          This will be allocated and returned if a request is
- *          required but not passed in.
+ * @to: request we are wishing to use
  *
  * This code is meant to abstract object synchronization with the GPU.
- * Calling with NULL implies synchronizing the object with the CPU
- * rather than a particular GPU ring. Conceptually we serialise writes
- * between engines inside the GPU. We only allow one engine to write
- * into a buffer at any time, but multiple readers. To ensure each has
- * a coherent view of memory, we must:
+ * Conceptually we serialise writes between engines inside the GPU.
+ * We only allow one engine to write into a buffer at any time, but
+ * multiple readers. To ensure each has a coherent view of memory, we must:
  *
  * - If there is an outstanding write request to the object, the new
  *   request must wait for it to complete (either CPU or in hw, requests
@@ -2628,22 +2611,11 @@ __i915_gem_object_sync(struct drm_i915_gem_object *obj,
  * - If we are a write request (pending_write_domain is set), the new
  *   request must wait for outstanding read requests to complete.
  *
- * For CPU synchronisation (NULL to) no request is required. For syncing with
- * rings to_req must be non-NULL. However, a request does not have to be
- * pre-allocated. If *to_req is NULL and sync commands will be emitted then a
- * request will be allocated automatically and returned through *to_req. Note
- * that it is not guaranteed that commands will be emitted (because the system
- * might already be idle). Hence there is no need to create a request that
- * might never have any work submitted. Note further that if a request is
- * returned in *to_req, it is the responsibility of the caller to submit
- * that request (after potentially adding more work to it).
- *
  * Returns 0 if successful, else propagates up the lower layer error.
  */
 int
 i915_gem_object_sync(struct drm_i915_gem_object *obj,
-		     struct intel_engine_cs *to,
-		     struct drm_i915_gem_request **to_req)
+		     struct drm_i915_gem_request *to)
 {
 	const bool readonly = obj->base.pending_write_domain == 0;
 	struct drm_i915_gem_request *req[I915_NUM_RINGS];
@@ -2652,9 +2624,6 @@ i915_gem_object_sync(struct drm_i915_gem_object *obj,
 	if (!obj->active)
 		return 0;
 
-	if (to == NULL)
-		return i915_gem_object_wait_rendering(obj, readonly);
-
 	n = 0;
 	if (readonly) {
 		if (obj->last_write_req)
@@ -2665,7 +2634,7 @@ i915_gem_object_sync(struct drm_i915_gem_object *obj,
 				req[n++] = obj->last_read_req[i];
 	}
 	for (i = 0; i < n; i++) {
-		ret = __i915_gem_object_sync(obj, to, req[i], to_req);
+		ret = __i915_gem_object_sync(obj, to, req[i]);
 		if (ret)
 			return ret;
 	}
@@ -2783,9 +2752,9 @@ int i915_gpu_idle(struct drm_device *dev)
 		if (!i915.enable_execlists) {
 			struct drm_i915_gem_request *req;
 
-			ret = i915_gem_request_alloc(ring, ring->default_context, &req);
-			if (ret)
-				return ret;
+			req = i915_gem_request_alloc(ring, ring->default_context);
+			if (IS_ERR(req))
+				return PTR_ERR(req);
 
 			ret = i915_switch_context(req);
 			i915_add_request_no_flush(req);
@@ -4263,8 +4232,9 @@ i915_gem_init_hw(struct drm_device *dev)
 
 		WARN_ON(!ring->default_context);
 
-		ret = i915_gem_request_alloc(ring, ring->default_context, &req);
-		if (ret) {
+		req = i915_gem_request_alloc(ring, ring->default_context);
+		if (IS_ERR(req)) {
+			ret = PTR_ERR(req);
 			i915_gem_cleanup_ringbuffer(dev);
 			goto out;
 		}
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index a56fae99a1bc..3956d74d8c8c 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -952,7 +952,7 @@ i915_gem_execbuffer_move_to_gpu(struct drm_i915_gem_request *req,
 		struct drm_i915_gem_object *obj = vma->obj;
 
 		if (obj->active & other_rings) {
-			ret = i915_gem_object_sync(obj, req->engine, &req);
+			ret = i915_gem_object_sync(obj, req);
 			if (ret)
 				return ret;
 		}
@@ -1595,9 +1595,11 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		params->batch_obj_vm_offset = i915_gem_obj_offset(batch_obj, vm);
 
 	/* Allocate a request for this batch buffer nice and early. */
-	ret = i915_gem_request_alloc(ring, ctx, &params->request);
-	if (ret)
+	params->request = i915_gem_request_alloc(ring, ctx);
+	if (IS_ERR(params->request)) {
+		ret = PTR_ERR(params->request);
 		goto err_batch_unpin;
+	}
 
 	ret = i915_gem_request_add_to_client(params->request, file);
 	if (ret) {
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index e911430575fe..ce663acc9c7d 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -195,9 +195,9 @@ i915_gem_get_seqno(struct drm_i915_private *dev_priv, u32 *seqno)
 	return 0;
 }
 
-int i915_gem_request_alloc(struct intel_engine_cs *engine,
-			   struct intel_context *ctx,
-			   struct drm_i915_gem_request **req_out)
+struct drm_i915_gem_request *
+i915_gem_request_alloc(struct intel_engine_cs *engine,
+		       struct intel_context *ctx)
 {
 	struct drm_i915_private *dev_priv = engine->i915;
 	unsigned reset_counter = i915_reset_counter(&dev_priv->gpu_error);
@@ -205,22 +205,17 @@ int i915_gem_request_alloc(struct intel_engine_cs *engine,
 	u32 seqno;
 	int ret;
 
-	if (!req_out)
-		return -EINVAL;
-
-	*req_out = NULL;
-
 	/* ABI: Before userspace accesses the GPU (e.g. execbuffer), report
 	 * EIO if the GPU is already wedged, or EAGAIN to drop the struct_mutex
 	 * and restart.
 	 */
 	ret = i915_gem_check_wedge(reset_counter, dev_priv->mm.interruptible);
 	if (ret)
-		return ret;
+		return ERR_PTR(ret);
 
 	req = kmem_cache_zalloc(dev_priv->requests, GFP_KERNEL);
 	if (req == NULL)
-		return -ENOMEM;
+		return ERR_PTR(-ENOMEM);
 
 	ret = i915_gem_get_seqno(dev_priv, &seqno);
 	if (ret)
@@ -265,15 +260,14 @@ int i915_gem_request_alloc(struct intel_engine_cs *engine,
 		 * free code.
 		 */
 		i915_gem_request_cancel(req);
-		return ret;
+		return ERR_PTR(ret);
 	}
 
-	*req_out = req;
-	return 0;
+	return req;
 
 err:
 	kmem_cache_free(dev_priv->requests, req);
-	return ret;
+	return ERR_PTR(ret);
 }
 
 int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index 086950567db4..2da9e0b5dfc7 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -118,9 +118,9 @@ struct drm_i915_gem_request {
 	int elsp_submitted;
 };
 
-int i915_gem_request_alloc(struct intel_engine_cs *ring,
-			   struct intel_context *ctx,
-			   struct drm_i915_gem_request **req_out);
+struct drm_i915_gem_request *
+i915_gem_request_alloc(struct intel_engine_cs *ring,
+		       struct intel_context *ctx);
 void i915_gem_request_cancel(struct drm_i915_gem_request *req);
 int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
 				   struct drm_file *file);
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 95cab4776401..85469e3c740a 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -460,10 +460,9 @@ TRACE_EVENT(i915_gem_evict_vm,
 );
 
 TRACE_EVENT(i915_gem_ring_sync_to,
-	    TP_PROTO(struct drm_i915_gem_request *to_req,
-		     struct intel_engine_cs *from,
-		     struct drm_i915_gem_request *req),
-	    TP_ARGS(to_req, from, req),
+	    TP_PROTO(struct drm_i915_gem_request *to,
+		     struct drm_i915_gem_request *from),
+	    TP_ARGS(to, from),
 
 	    TP_STRUCT__entry(
 			     __field(u32, dev)
@@ -473,10 +472,10 @@ TRACE_EVENT(i915_gem_ring_sync_to,
 			     ),
 
 	    TP_fast_assign(
-			   __entry->dev = from->dev->primary->index;
-			   __entry->sync_from = from->id;
-			   __entry->sync_to = to_req->engine->id;
-			   __entry->seqno = i915_gem_request_get_seqno(req);
+			   __entry->dev = from->i915->dev->primary->index;
+			   __entry->sync_from = from->engine->id;
+			   __entry->sync_to = to->engine->id;
+			   __entry->seqno = from->fence.seqno;
 			   ),
 
 	    TP_printk("dev=%u, sync-from=%u, sync-to=%u, seqno=%u",
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index f8717c5627dd..ec52fff7e0b0 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -11669,15 +11669,21 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
 	 * into the display plane and skip any waits.
 	 */
 	if (!mmio_flip) {
-		ret = i915_gem_object_sync(obj, ring, &request);
-		if (ret)
+		request = i915_gem_request_alloc(ring, ring->default_context);
+		if (IS_ERR(request)) {
+			ret = PTR_ERR(request);
 			goto cleanup_pending;
+		}
+
+		ret = i915_gem_object_sync(obj, request);
+		if (ret)
+			goto cleanup_request;
 	}
 
 	ret = intel_pin_and_fence_fb_obj(crtc->primary, fb,
 					 crtc->primary->state);
 	if (ret)
-		goto cleanup_pending;
+		goto cleanup_request;
 
 	work->gtt_offset = intel_plane_obj_offset(to_intel_plane(primary),
 						  obj, 0);
@@ -11691,23 +11697,15 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
 		i915_gem_request_assign(&work->flip_queued_req,
 					obj->last_write_req);
 	} else {
-		if (!request) {
-			ret = i915_gem_request_alloc(ring, ring->default_context, &request);
-			if (ret)
-				goto cleanup_unpin;
-		}
-
 		ret = dev_priv->display.queue_flip(dev, crtc, fb, obj, request,
 						   page_flip_flags);
 		if (ret)
 			goto cleanup_unpin;
 
+		i915_add_request_no_flush(request);
 		i915_gem_request_assign(&work->flip_queued_req, request);
 	}
 
-	if (request)
-		i915_add_request_no_flush(request);
-
 	work->flip_queued_vblank = drm_crtc_vblank_count(crtc);
 	work->enable_stall_check = true;
 
@@ -11725,9 +11723,10 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
 
 cleanup_unpin:
 	intel_unpin_fb_obj(fb, crtc->primary->state);
-cleanup_pending:
+cleanup_request:
 	if (request)
 		i915_add_request_no_flush(request);
+cleanup_pending:
 	atomic_dec(&intel_crtc->unpin_work_count);
 	mutex_unlock(&dev->struct_mutex);
 cleanup:
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index b889680f7491..82b21a883732 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -629,7 +629,7 @@ static int execlists_move_to_gpu(struct drm_i915_gem_request *req,
 		struct drm_i915_gem_object *obj = vma->obj;
 
 		if (obj->active & other_rings) {
-			ret = i915_gem_object_sync(obj, req->engine, &req);
+			ret = i915_gem_object_sync(obj, req);
 			if (ret)
 				return ret;
 		}
@@ -2264,8 +2264,9 @@ int intel_lr_context_deferred_alloc(struct intel_context *ctx,
 	if (ctx != engine->default_context && engine->init_context) {
 		struct drm_i915_gem_request *req;
 
-		ret = i915_gem_request_alloc(engine, ctx, &req);
-		if (ret) {
+		req = i915_gem_request_alloc(engine, ctx);
+		if (IS_ERR(req)) {
+			ret = PTR_ERR(req);
 			DRM_ERROR("ring create req: %d\n",
 				ret);
 			goto error_ringbuf;
diff --git a/drivers/gpu/drm/i915/intel_overlay.c b/drivers/gpu/drm/i915/intel_overlay.c
index cb73d16848b0..df71c01f28f1 100644
--- a/drivers/gpu/drm/i915/intel_overlay.c
+++ b/drivers/gpu/drm/i915/intel_overlay.c
@@ -240,9 +240,9 @@ static int intel_overlay_on(struct intel_overlay *overlay)
 	WARN_ON(overlay->active);
 	WARN_ON(IS_I830(dev) && !(dev_priv->quirks & QUIRK_PIPEA_FORCE));
 
-	ret = i915_gem_request_alloc(ring, ring->default_context, &req);
-	if (ret)
-		return ret;
+	req = i915_gem_request_alloc(ring, ring->default_context);
+	if (IS_ERR(req))
+		return PTR_ERR(req);
 
 	ret = intel_ring_begin(req, 4);
 	if (ret) {
@@ -283,9 +283,9 @@ static int intel_overlay_continue(struct intel_overlay *overlay,
 	if (tmp & (1 << 17))
 		DRM_DEBUG("overlay underrun, DOVSTA: %x\n", tmp);
 
-	ret = i915_gem_request_alloc(ring, ring->default_context, &req);
-	if (ret)
-		return ret;
+	req = i915_gem_request_alloc(ring, ring->default_context);
+	if (IS_ERR(req))
+		return PTR_ERR(req);
 
 	ret = intel_ring_begin(req, 2);
 	if (ret) {
@@ -349,9 +349,9 @@ static int intel_overlay_off(struct intel_overlay *overlay)
 	 * of the hw. Do it in both cases */
 	flip_addr |= OFC_UPDATE;
 
-	ret = i915_gem_request_alloc(ring, ring->default_context, &req);
-	if (ret)
-		return ret;
+	req = i915_gem_request_alloc(ring, ring->default_context);
+	if (IS_ERR(req))
+		return PTR_ERR(req);
 
 	ret = intel_ring_begin(req, 6);
 	if (ret) {
@@ -423,9 +423,9 @@ static int intel_overlay_release_old_vid(struct intel_overlay *overlay)
 		/* synchronous slowpath */
 		struct drm_i915_gem_request *req;
 
-		ret = i915_gem_request_alloc(ring, ring->default_context, &req);
-		if (ret)
-			return ret;
+		req = i915_gem_request_alloc(ring, ring->default_context);
+		if (IS_ERR(req))
+			return PTR_ERR(req);
 
 		ret = intel_ring_begin(req, 2);
 		if (ret) {
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 067/190] drm/i915: Unify legacy/execlists emission of MI_BATCHBUFFER_START
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (64 preceding siblings ...)
  2016-01-11  9:17 ` [PATCH 066/190] drm/i915: Simplify request_alloc by returning the allocated request Chris Wilson
@ 2016-01-11  9:17 ` Chris Wilson
  2016-01-11  9:17 ` [PATCH 068/190] drm/i915: Unify adding requests between ringbuffer and execlists Chris Wilson
                   ` (20 subsequent siblings)
  86 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:17 UTC (permalink / raw)
  To: intel-gfx

Both the ->dispatch_execbuffer and ->emit_bb_start callbacks do exactly
the same thing: add MI_BATCHBUFFER_START to the request's ringbuffer.
We need only one vfunc.
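
A minimal sketch of the end state (stand-in types, not the driver's):
both submission backends fill in the same function pointer, so every
caller dispatches a batch through one vfunc:

struct request;
struct engine {
	int (*emit_bb_start)(struct request *req,
			     unsigned long long offset, unsigned len,
			     unsigned dispatch_flags);
};
struct request { struct engine *engine; };

static int submit_batch(struct request *req,
			unsigned long long start, unsigned len,
			unsigned flags)
{
	/* one call site, one vfunc - no ->dispatch_execbuffer twin */
	return req->engine->emit_bb_start(req, start, len, flags);
}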

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c   |  6 +--
 drivers/gpu/drm/i915/i915_gem_render_state.c | 16 +++----
 drivers/gpu/drm/i915/intel_lrc.c             |  9 +++-
 drivers/gpu/drm/i915/intel_ringbuffer.c      | 67 +++++++++++++---------------
 drivers/gpu/drm/i915/intel_ringbuffer.h      | 12 +++--
 5 files changed, 55 insertions(+), 55 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 3956d74d8c8c..3e6384deca65 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1297,9 +1297,9 @@ i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
 	exec_start = params->batch_obj_vm_offset +
 		     params->args_batch_start_offset;
 
-	ret = params->ring->dispatch_execbuffer(params->request,
-						exec_start, exec_len,
-						params->dispatch_flags);
+	ret = params->ring->emit_bb_start(params->request,
+					  exec_start, exec_len,
+					  params->dispatch_flags);
 	if (ret)
 		return ret;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.c b/drivers/gpu/drm/i915/i915_gem_render_state.c
index bee3f0ccd0cd..ccc988c2b226 100644
--- a/drivers/gpu/drm/i915/i915_gem_render_state.c
+++ b/drivers/gpu/drm/i915/i915_gem_render_state.c
@@ -205,18 +205,18 @@ int i915_gem_render_state_init(struct drm_i915_gem_request *req)
 	if (so.rodata == NULL)
 		return 0;
 
-	ret = req->engine->dispatch_execbuffer(req, so.ggtt_offset,
-					       so.rodata->batch_items * 4,
-					       I915_DISPATCH_SECURE);
+	ret = req->engine->emit_bb_start(req, so.ggtt_offset,
+					 so.rodata->batch_items * 4,
+					 I915_DISPATCH_SECURE);
 	if (ret)
 		goto out;
 
 	if (so.aux_batch_size > 8) {
-		ret = req->engine->dispatch_execbuffer(req,
-						       (so.ggtt_offset +
-							so.aux_batch_offset),
-						       so.aux_batch_size,
-						       I915_DISPATCH_SECURE);
+		ret = req->engine->emit_bb_start(req,
+						 (so.ggtt_offset +
+						  so.aux_batch_offset),
+						 so.aux_batch_size,
+						 I915_DISPATCH_SECURE);
 		if (ret)
 			goto out;
 	}
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 82b21a883732..30effca91184 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -783,7 +783,9 @@ int intel_execlists_submission(struct i915_execbuffer_params *params,
 	exec_start = params->batch_obj_vm_offset +
 		     args->batch_start_offset;
 
-	ret = engine->emit_bb_start(params->request, exec_start, params->dispatch_flags);
+	ret = engine->emit_bb_start(params->request,
+				    exec_start, args->batch_len,
+				    params->dispatch_flags);
 	if (ret)
 		return ret;
 
@@ -1409,7 +1411,8 @@ static int intel_logical_ring_emit_pdps(struct drm_i915_gem_request *req)
 }
 
 static int gen8_emit_bb_start(struct drm_i915_gem_request *req,
-			      u64 offset, unsigned dispatch_flags)
+			      u64 offset, u32 len,
+			      unsigned dispatch_flags)
 {
 	struct intel_ring *ring = req->ring;
 	bool ppgtt = !(dispatch_flags & I915_DISPATCH_SECURE);
@@ -1637,12 +1640,14 @@ static int intel_lr_context_render_state_init(struct drm_i915_gem_request *req)
 		return 0;
 
 	ret = req->engine->emit_bb_start(req, so.ggtt_offset,
+					 so.rodata->batch_items * 4,
 					 I915_DISPATCH_SECURE);
 	if (ret)
 		goto out;
 
 	ret = req->engine->emit_bb_start(req,
 					 (so.ggtt_offset + so.aux_batch_offset),
+					 so.aux_batch_size,
 					 I915_DISPATCH_SECURE);
 	if (ret)
 		goto out;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index e584b0f631f8..04f0a77d49cf 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1656,9 +1656,9 @@ gen8_ring_disable_irq(struct intel_engine_cs *ring)
 }
 
 static int
-i965_dispatch_execbuffer(struct drm_i915_gem_request *req,
-			 u64 offset, u32 length,
-			 unsigned dispatch_flags)
+i965_emit_bb_start(struct drm_i915_gem_request *req,
+		   u64 offset, u32 length,
+		   unsigned dispatch_flags)
 {
 	struct intel_ring *ring = req->ring;
 	int ret;
@@ -1683,9 +1683,9 @@ i965_dispatch_execbuffer(struct drm_i915_gem_request *req,
 #define I830_TLB_ENTRIES (2)
 #define I830_WA_SIZE max(I830_TLB_ENTRIES*4096, I830_BATCH_LIMIT)
 static int
-i830_dispatch_execbuffer(struct drm_i915_gem_request *req,
-			 u64 offset, u32 len,
-			 unsigned dispatch_flags)
+i830_emit_bb_start(struct drm_i915_gem_request *req,
+		   u64 offset, u32 len,
+		   unsigned dispatch_flags)
 {
 	struct intel_ring *ring = req->ring;
 	u32 cs_offset = req->engine->scratch.gtt_offset;
@@ -1746,9 +1746,9 @@ i830_dispatch_execbuffer(struct drm_i915_gem_request *req,
 }
 
 static int
-i915_dispatch_execbuffer(struct drm_i915_gem_request *req,
-			 u64 offset, u32 len,
-			 unsigned dispatch_flags)
+i915_emit_bb_start(struct drm_i915_gem_request *req,
+		   u64 offset, u32 len,
+		   unsigned dispatch_flags)
 {
 	struct intel_ring *ring = req->ring;
 	int ret;
@@ -2361,9 +2361,9 @@ static int gen6_bsd_ring_flush(struct drm_i915_gem_request *req,
 }
 
 static int
-gen8_ring_dispatch_execbuffer(struct drm_i915_gem_request *req,
-			      u64 offset, u32 len,
-			      unsigned dispatch_flags)
+gen8_emit_bb_start(struct drm_i915_gem_request *req,
+		   u64 offset, u32 len,
+		   unsigned dispatch_flags)
 {
 	struct intel_ring *ring = req->ring;
 	bool ppgtt = USES_PPGTT(req->i915) &&
@@ -2387,9 +2387,9 @@ gen8_ring_dispatch_execbuffer(struct drm_i915_gem_request *req,
 }
 
 static int
-hsw_ring_dispatch_execbuffer(struct drm_i915_gem_request *req,
-			     u64 offset, u32 len,
-			     unsigned dispatch_flags)
+hsw_emit_bb_start(struct drm_i915_gem_request *req,
+		  u64 offset, u32 len,
+		  unsigned dispatch_flags)
 {
 	struct intel_ring *ring = req->ring;
 	int ret;
@@ -2412,9 +2412,9 @@ hsw_ring_dispatch_execbuffer(struct drm_i915_gem_request *req,
 }
 
 static int
-gen6_ring_dispatch_execbuffer(struct drm_i915_gem_request *req,
-			      u64 offset, u32 len,
-			      unsigned dispatch_flags)
+gen6_emit_bb_start(struct drm_i915_gem_request *req,
+		   u64 offset, u32 len,
+		   unsigned dispatch_flags)
 {
 	struct intel_ring *ring = req->ring;
 	int ret;
@@ -2578,17 +2578,17 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 	ring->write_tail = ring_write_tail;
 
 	if (IS_HASWELL(dev))
-		ring->dispatch_execbuffer = hsw_ring_dispatch_execbuffer;
+		ring->emit_bb_start = hsw_emit_bb_start;
 	else if (IS_GEN8(dev))
-		ring->dispatch_execbuffer = gen8_ring_dispatch_execbuffer;
+		ring->emit_bb_start = gen8_emit_bb_start;
 	else if (INTEL_INFO(dev)->gen >= 6)
-		ring->dispatch_execbuffer = gen6_ring_dispatch_execbuffer;
+		ring->emit_bb_start = gen6_emit_bb_start;
 	else if (INTEL_INFO(dev)->gen >= 4)
-		ring->dispatch_execbuffer = i965_dispatch_execbuffer;
+		ring->emit_bb_start = i965_emit_bb_start;
 	else if (IS_I830(dev) || IS_845G(dev))
-		ring->dispatch_execbuffer = i830_dispatch_execbuffer;
+		ring->emit_bb_start = i830_emit_bb_start;
 	else
-		ring->dispatch_execbuffer = i915_dispatch_execbuffer;
+		ring->emit_bb_start = i915_emit_bb_start;
 	ring->init_hw = init_render_ring;
 	ring->cleanup = render_ring_cleanup;
 
@@ -2646,8 +2646,7 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev)
 				GT_RENDER_USER_INTERRUPT << GEN8_VCS1_IRQ_SHIFT;
 			ring->irq_enable = gen8_ring_enable_irq;
 			ring->irq_disable = gen8_ring_disable_irq;
-			ring->dispatch_execbuffer =
-				gen8_ring_dispatch_execbuffer;
+			ring->emit_bb_start = gen8_emit_bb_start;
 			if (i915.semaphores) {
 				ring->semaphore.sync_to = gen8_ring_sync;
 				ring->semaphore.signal = gen8_xcs_signal;
@@ -2657,8 +2656,7 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev)
 			ring->irq_enable_mask = GT_BSD_USER_INTERRUPT;
 			ring->irq_enable = gen6_ring_enable_irq;
 			ring->irq_disable = gen6_ring_disable_irq;
-			ring->dispatch_execbuffer =
-				gen6_ring_dispatch_execbuffer;
+			ring->emit_bb_start = gen6_emit_bb_start;
 			if (i915.semaphores) {
 				ring->semaphore.sync_to = gen6_ring_sync;
 				ring->semaphore.signal = gen6_signal;
@@ -2687,7 +2685,7 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev)
 			ring->irq_enable = i9xx_ring_enable_irq;
 			ring->irq_disable = i9xx_ring_disable_irq;
 		}
-		ring->dispatch_execbuffer = i965_dispatch_execbuffer;
+		ring->emit_bb_start = i965_emit_bb_start;
 	}
 	ring->init_hw = init_ring_common;
 
@@ -2714,8 +2712,7 @@ int intel_init_bsd2_ring_buffer(struct drm_device *dev)
 			GT_RENDER_USER_INTERRUPT << GEN8_VCS2_IRQ_SHIFT;
 	ring->irq_enable = gen8_ring_enable_irq;
 	ring->irq_disable = gen8_ring_disable_irq;
-	ring->dispatch_execbuffer =
-			gen8_ring_dispatch_execbuffer;
+	ring->emit_bb_start = gen8_emit_bb_start;
 	if (i915.semaphores) {
 		ring->semaphore.sync_to = gen8_ring_sync;
 		ring->semaphore.signal = gen8_xcs_signal;
@@ -2744,7 +2741,7 @@ int intel_init_blt_ring_buffer(struct drm_device *dev)
 			GT_RENDER_USER_INTERRUPT << GEN8_BCS_IRQ_SHIFT;
 		ring->irq_enable = gen8_ring_enable_irq;
 		ring->irq_disable = gen8_ring_disable_irq;
-		ring->dispatch_execbuffer = gen8_ring_dispatch_execbuffer;
+		ring->emit_bb_start = gen8_emit_bb_start;
 		if (i915.semaphores) {
 			ring->semaphore.sync_to = gen8_ring_sync;
 			ring->semaphore.signal = gen8_xcs_signal;
@@ -2754,7 +2751,7 @@ int intel_init_blt_ring_buffer(struct drm_device *dev)
 		ring->irq_enable_mask = GT_BLT_USER_INTERRUPT;
 		ring->irq_enable = gen6_ring_enable_irq;
 		ring->irq_disable = gen6_ring_disable_irq;
-		ring->dispatch_execbuffer = gen6_ring_dispatch_execbuffer;
+		ring->emit_bb_start = gen6_emit_bb_start;
 		if (i915.semaphores) {
 			ring->semaphore.signal = gen6_signal;
 			ring->semaphore.sync_to = gen6_ring_sync;
@@ -2801,7 +2798,7 @@ int intel_init_vebox_ring_buffer(struct drm_device *dev)
 			GT_RENDER_USER_INTERRUPT << GEN8_VECS_IRQ_SHIFT;
 		ring->irq_enable = gen8_ring_enable_irq;
 		ring->irq_disable = gen8_ring_disable_irq;
-		ring->dispatch_execbuffer = gen8_ring_dispatch_execbuffer;
+		ring->emit_bb_start = gen8_emit_bb_start;
 		if (i915.semaphores) {
 			ring->semaphore.sync_to = gen8_ring_sync;
 			ring->semaphore.signal = gen8_xcs_signal;
@@ -2811,7 +2808,7 @@ int intel_init_vebox_ring_buffer(struct drm_device *dev)
 		ring->irq_enable_mask = PM_VEBOX_USER_INTERRUPT;
 		ring->irq_enable = hsw_vebox_enable_irq;
 		ring->irq_disable = hsw_vebox_disable_irq;
-		ring->dispatch_execbuffer = gen6_ring_dispatch_execbuffer;
+		ring->emit_bb_start = gen6_emit_bb_start;
 		if (i915.semaphores) {
 			ring->semaphore.sync_to = gen6_ring_sync;
 			ring->semaphore.signal = gen6_signal;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index fdeadae726b8..3a10376b896f 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -223,12 +223,6 @@ struct intel_engine_cs {
 	 * monotonic, even if not coherent.
 	 */
 	void		(*irq_seqno_barrier)(struct intel_engine_cs *ring);
-	int		(*dispatch_execbuffer)(struct drm_i915_gem_request *req,
-					       u64 offset, u32 length,
-					       unsigned dispatch_flags);
-#define I915_DISPATCH_SECURE 0x1
-#define I915_DISPATCH_PINNED 0x2
-#define I915_DISPATCH_RS     0x4
 	void		(*cleanup)(struct intel_engine_cs *ring);
 
 	/* GEN8 signal/wait table - never trust comments!
@@ -301,7 +295,11 @@ struct intel_engine_cs {
 				      u32 invalidate_domains,
 				      u32 flush_domains);
 	int		(*emit_bb_start)(struct drm_i915_gem_request *req,
-					 u64 offset, unsigned dispatch_flags);
+					 u64 offset, u32 length,
+					 unsigned dispatch_flags);
+#define I915_DISPATCH_SECURE 0x1
+#define I915_DISPATCH_PINNED 0x2
+#define I915_DISPATCH_RS     0x4
 
 	/**
 	 * List of objects currently involved in rendering from the
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 068/190] drm/i915: Unify adding requests between ringbuffer and execlists
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (65 preceding siblings ...)
  2016-01-11  9:17 ` [PATCH 067/190] drm/i915: Unify legacy/execlists emission of MI_BATCHBUFFER_START Chris Wilson
@ 2016-01-11  9:17 ` Chris Wilson
  2016-01-11  9:17 ` [PATCH 069/190] drm/i915: Remove duplicate golden render state init from execlists Chris Wilson
                   ` (19 subsequent siblings)
  86 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:17 UTC (permalink / raw)
  To: intel-gfx
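
Replace the separate ->add_request (legacy) and ->emit_request
(execlists) vfuncs with a single ->add_request, and fold the legacy
->write_tail hook - including the gen6 BSD sleep/wake workaround -
into the per-engine add_request implementations.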

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_request.c |   8 +-
 drivers/gpu/drm/i915/intel_lrc.c        |  14 ++--
 drivers/gpu/drm/i915/intel_ringbuffer.c | 129 +++++++++++++++++---------------
 drivers/gpu/drm/i915/intel_ringbuffer.h |  21 +++---
 4 files changed, 87 insertions(+), 85 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index ce663acc9c7d..01443d8d9224 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -434,13 +434,7 @@ void __i915_add_request(struct drm_i915_gem_request *request,
 	 */
 	request->postfix = intel_ring_get_tail(ring);
 
-	if (i915.enable_execlists)
-		ret = request->engine->emit_request(request);
-	else {
-		ret = request->engine->add_request(request);
-
-		request->tail = intel_ring_get_tail(ring);
-	}
+	ret = request->engine->add_request(request);
 	/* Not allowed to fail! */
 	WARN(ret, "emit|add_request failed: %d!\n", ret);
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 30effca91184..9838503fafca 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -445,7 +445,7 @@ static void execlists_context_unqueue(struct intel_engine_cs *engine)
 		if (req0->elsp_submitted) {
 			/*
 			 * Apply the wa NOOPS to prevent ring:HEAD == req:TAIL
-			 * as we resubmit the request. See gen8_emit_request()
+			 * as we resubmit the request. See gen8_add_request()
 			 * for where we prepare the padding after the end of the
 			 * request.
 			 */
@@ -1588,7 +1588,7 @@ gen6_seqno_barrier(struct intel_engine_cs *ring)
 	intel_flush_status_page(ring, I915_GEM_HWS_INDEX);
 }
 
-static int gen8_emit_request(struct drm_i915_gem_request *request)
+static int gen8_add_request(struct drm_i915_gem_request *request)
 {
 	struct intel_ring *ring = request->ring;
 	u32 cmd;
@@ -1782,8 +1782,8 @@ static int logical_render_ring_init(struct drm_device *dev)
 	ring->init_context = gen8_init_rcs_context;
 	ring->cleanup = intel_fini_pipe_control;
 	ring->irq_seqno_barrier = gen6_seqno_barrier;
-	ring->emit_request = gen8_emit_request;
 	ring->emit_flush = gen8_emit_flush_render;
+	ring->add_request = gen8_add_request;
 	ring->irq_enable = gen8_logical_ring_enable_irq;
 	ring->irq_disable = gen8_logical_ring_disable_irq;
 	ring->emit_bb_start = gen8_emit_bb_start;
@@ -1828,8 +1828,8 @@ static int logical_bsd_ring_init(struct drm_device *dev)
 
 	ring->init_hw = gen8_init_common_ring;
 	ring->irq_seqno_barrier = gen6_seqno_barrier;
-	ring->emit_request = gen8_emit_request;
 	ring->emit_flush = gen8_emit_flush;
+	ring->add_request = gen8_add_request;
 	ring->irq_enable = gen8_logical_ring_enable_irq;
 	ring->irq_disable = gen8_logical_ring_disable_irq;
 	ring->emit_bb_start = gen8_emit_bb_start;
@@ -1852,8 +1852,8 @@ static int logical_bsd2_ring_init(struct drm_device *dev)
 
 	ring->init_hw = gen8_init_common_ring;
 	ring->irq_seqno_barrier = gen6_seqno_barrier;
-	ring->emit_request = gen8_emit_request;
 	ring->emit_flush = gen8_emit_flush;
+	ring->add_request = gen8_add_request;
 	ring->irq_enable = gen8_logical_ring_enable_irq;
 	ring->irq_disable = gen8_logical_ring_disable_irq;
 	ring->emit_bb_start = gen8_emit_bb_start;
@@ -1876,8 +1876,8 @@ static int logical_blt_ring_init(struct drm_device *dev)
 
 	ring->init_hw = gen8_init_common_ring;
 	ring->irq_seqno_barrier = gen6_seqno_barrier;
-	ring->emit_request = gen8_emit_request;
 	ring->emit_flush = gen8_emit_flush;
+	ring->add_request = gen8_add_request;
 	ring->irq_enable = gen8_logical_ring_enable_irq;
 	ring->irq_disable = gen8_logical_ring_disable_irq;
 	ring->emit_bb_start = gen8_emit_bb_start;
@@ -1900,8 +1900,8 @@ static int logical_vebox_ring_init(struct drm_device *dev)
 
 	ring->init_hw = gen8_init_common_ring;
 	ring->irq_seqno_barrier = gen6_seqno_barrier;
-	ring->emit_request = gen8_emit_request;
 	ring->emit_flush = gen8_emit_flush;
+	ring->add_request = gen8_add_request;
 	ring->irq_enable = gen8_logical_ring_enable_irq;
 	ring->irq_disable = gen8_logical_ring_disable_irq;
 	ring->emit_bb_start = gen8_emit_bb_start;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 04f0a77d49cf..556e9e2c1fec 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -59,13 +59,6 @@ int intel_ring_space(struct intel_ring *ringbuf)
 	return ringbuf->space;
 }
 
-static void __intel_ring_advance(struct intel_engine_cs *ring)
-{
-	struct intel_ring *ringbuf = ring->buffer;
-	ringbuf->tail &= ringbuf->size - 1;
-	ring->write_tail(ring, ringbuf->tail);
-}
-
 static int
 gen2_render_ring_flush(struct drm_i915_gem_request *req,
 		       u32	invalidate_domains,
@@ -418,13 +411,6 @@ gen8_render_ring_flush(struct drm_i915_gem_request *req,
 	return gen8_emit_pipe_control(req, flags, scratch_addr);
 }
 
-static void ring_write_tail(struct intel_engine_cs *ring,
-			    u32 value)
-{
-	struct drm_i915_private *dev_priv = ring->dev->dev_private;
-	I915_WRITE_TAIL(ring, value);
-}
-
 u64 intel_engine_get_active_head(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *dev_priv = engine->i915;
@@ -533,7 +519,7 @@ static bool stop_ring(struct intel_engine_cs *ring)
 
 	I915_WRITE_CTL(ring, 0);
 	I915_WRITE_HEAD(ring, 0);
-	ring->write_tail(ring, 0);
+	I915_WRITE_TAIL(ring, 0);
 
 	if (!IS_GEN2(ring->dev)) {
 		(void)I915_READ_CTL(ring);
@@ -1308,6 +1294,7 @@ static int gen6_signal(struct drm_i915_gem_request *signaller_req,
 static int
 gen6_add_request(struct drm_i915_gem_request *req)
 {
+	struct drm_i915_private *dev_priv = req->i915;
 	struct intel_ring *ring = req->ring;
 	int ret;
 
@@ -1323,7 +1310,61 @@ gen6_add_request(struct drm_i915_gem_request *req)
 	intel_ring_emit(ring, I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT);
 	intel_ring_emit(ring, req->fence.seqno);
 	intel_ring_emit(ring, MI_USER_INTERRUPT);
-	__intel_ring_advance(req->engine);
+	intel_ring_advance(ring);
+
+	req->tail = intel_ring_get_tail(ring);
+	I915_WRITE_TAIL(req->engine, req->tail);
+
+	return 0;
+}
+
+static int
+gen6_bsd_add_request(struct drm_i915_gem_request *req)
+{
+	struct drm_i915_private *dev_priv = req->i915;
+	struct intel_ring *ring = req->ring;
+	int ret;
+
+	if (req->engine->semaphore.signal)
+		ret = req->engine->semaphore.signal(req, 4);
+	else
+		ret = intel_ring_begin(req, 4);
+	if (ret)
+		return ret;
+
+	intel_ring_emit(ring, MI_STORE_DWORD_INDEX);
+	intel_ring_emit(ring, I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT);
+	intel_ring_emit(ring, req->fence.seqno);
+	intel_ring_emit(ring, MI_USER_INTERRUPT);
+	intel_ring_advance(ring);
+
+	/* Every tail move must follow the sequence below */
+
+	/* Disable notification that the ring is IDLE. The GT
+	 * will then assume that it is busy and bring it out of rc6.
+	 */
+	I915_WRITE(GEN6_BSD_SLEEP_PSMI_CONTROL,
+		   _MASKED_BIT_ENABLE(GEN6_BSD_SLEEP_MSG_DISABLE));
+
+	/* Clear the context id. Here be magic! */
+	I915_WRITE64(GEN6_BSD_RNCID, 0x0);
+
+	/* Wait for the ring not to be idle, i.e. for it to wake up. */
+	if (wait_for((I915_READ(GEN6_BSD_SLEEP_PSMI_CONTROL) &
+		      GEN6_BSD_SLEEP_INDICATOR) == 0,
+		     50))
+		DRM_ERROR("timed out waiting for the BSD ring to wake up\n");
+
+	/* Now that the ring is fully powered up, update the tail */
+	req->tail = intel_ring_get_tail(ring);
+	I915_WRITE_TAIL(req->engine, req->tail);
+	POSTING_READ(RING_TAIL(req->engine->mmio_base));
+
+	/* Let the ring send IDLE messages to the GT again,
+	 * and so let it sleep to conserve power when idle.
+	 */
+	I915_WRITE(GEN6_BSD_SLEEP_PSMI_CONTROL,
+		   _MASKED_BIT_DISABLE(GEN6_BSD_SLEEP_MSG_DISABLE));
 
 	return 0;
 }
@@ -1423,6 +1464,7 @@ do {									\
 static int
 pc_render_add_request(struct drm_i915_gem_request *req)
 {
+	struct drm_i915_private *dev_priv = req->i915;
 	struct intel_ring *ring = req->ring;
 	u32 addr = req->engine->status_page.gfx_addr +
 		(I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT);
@@ -1467,7 +1509,10 @@ pc_render_add_request(struct drm_i915_gem_request *req)
 	intel_ring_emit(ring, addr | PIPE_CONTROL_GLOBAL_GTT);
 	intel_ring_emit(ring, req->fence.seqno);
 	intel_ring_emit(ring, 0);
-	__intel_ring_advance(req->engine);
+	intel_ring_advance(ring);
+
+	req->tail = intel_ring_get_tail(ring);
+	I915_WRITE_TAIL(req->engine, req->tail);
 
 	return 0;
 }
@@ -1566,6 +1611,7 @@ bsd_ring_flush(struct drm_i915_gem_request *req,
 static int
 i9xx_add_request(struct drm_i915_gem_request *req)
 {
+	struct drm_i915_private *dev_priv = req->i915;
 	struct intel_ring *ring = req->ring;
 	int ret;
 
@@ -1577,7 +1623,10 @@ i9xx_add_request(struct drm_i915_gem_request *req)
 	intel_ring_emit(ring, I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT);
 	intel_ring_emit(ring, req->fence.seqno);
 	intel_ring_emit(ring, MI_USER_INTERRUPT);
-	__intel_ring_advance(req->engine);
+	intel_ring_advance(ring);
+
+	req->tail = intel_ring_get_tail(ring);
+	I915_WRITE_TAIL(req->engine, req->tail);
 
 	return 0;
 }
@@ -2283,39 +2332,6 @@ void intel_engine_init_seqno(struct intel_engine_cs *ring, u32 seqno)
 	ring->hangcheck.seqno = seqno;
 }
 
-static void gen6_bsd_ring_write_tail(struct intel_engine_cs *ring,
-				     u32 value)
-{
-	struct drm_i915_private *dev_priv = ring->dev->dev_private;
-
-       /* Every tail move must follow the sequence below */
-
-	/* Disable notification that the ring is IDLE. The GT
-	 * will then assume that it is busy and bring it out of rc6.
-	 */
-	I915_WRITE(GEN6_BSD_SLEEP_PSMI_CONTROL,
-		   _MASKED_BIT_ENABLE(GEN6_BSD_SLEEP_MSG_DISABLE));
-
-	/* Clear the context id. Here be magic! */
-	I915_WRITE64(GEN6_BSD_RNCID, 0x0);
-
-	/* Wait for the ring not to be idle, i.e. for it to wake up. */
-	if (wait_for((I915_READ(GEN6_BSD_SLEEP_PSMI_CONTROL) &
-		      GEN6_BSD_SLEEP_INDICATOR) == 0,
-		     50))
-		DRM_ERROR("timed out waiting for the BSD ring to wake up\n");
-
-	/* Now that the ring is fully powered up, update the tail */
-	I915_WRITE_TAIL(ring, value);
-	POSTING_READ(RING_TAIL(ring->mmio_base));
-
-	/* Let the ring send IDLE messages to the GT again,
-	 * and so let it sleep to conserve power when idle.
-	 */
-	I915_WRITE(GEN6_BSD_SLEEP_PSMI_CONTROL,
-		   _MASKED_BIT_DISABLE(GEN6_BSD_SLEEP_MSG_DISABLE));
-}
-
 static int gen6_bsd_ring_flush(struct drm_i915_gem_request *req,
 			       u32 invalidate, u32 flush)
 {
@@ -2575,7 +2591,6 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 		}
 		ring->irq_enable_mask = I915_USER_INTERRUPT;
 	}
-	ring->write_tail = ring_write_tail;
 
 	if (IS_HASWELL(dev))
 		ring->emit_bb_start = hsw_emit_bb_start;
@@ -2632,14 +2647,13 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev)
 	ring->name = "bsd ring";
 	ring->id = VCS;
 
-	ring->write_tail = ring_write_tail;
 	if (INTEL_INFO(dev)->gen >= 6) {
 		ring->mmio_base = GEN6_BSD_RING_BASE;
-		/* gen6 bsd needs a special wa for tail updates */
-		if (IS_GEN6(dev))
-			ring->write_tail = gen6_bsd_ring_write_tail;
 		ring->emit_flush = gen6_bsd_ring_flush;
+		/* gen6 bsd needs a special wa for tail updates */
 		ring->add_request = gen6_add_request;
+		if (IS_GEN6(dev))
+			ring->add_request = gen6_bsd_add_request;
 		ring->irq_seqno_barrier = gen6_seqno_barrier;
 		if (INTEL_INFO(dev)->gen >= 8) {
 			ring->irq_enable_mask =
@@ -2703,7 +2717,6 @@ int intel_init_bsd2_ring_buffer(struct drm_device *dev)
 	ring->name = "bsd2 ring";
 	ring->id = VCS2;
 
-	ring->write_tail = ring_write_tail;
 	ring->mmio_base = GEN8_BSD2_RING_BASE;
 	ring->emit_flush = gen6_bsd_ring_flush;
 	ring->add_request = gen6_add_request;
@@ -2732,7 +2745,6 @@ int intel_init_blt_ring_buffer(struct drm_device *dev)
 	ring->id = BCS;
 
 	ring->mmio_base = BLT_RING_BASE;
-	ring->write_tail = ring_write_tail;
 	ring->emit_flush = gen6_ring_flush;
 	ring->add_request = gen6_add_request;
 	ring->irq_seqno_barrier = gen6_seqno_barrier;
@@ -2788,7 +2800,6 @@ int intel_init_vebox_ring_buffer(struct drm_device *dev)
 	ring->id = VECS;
 
 	ring->mmio_base = VEBOX_RING_BASE;
-	ring->write_tail = ring_write_tail;
 	ring->emit_flush = gen6_ring_flush;
 	ring->add_request = gen6_add_request;
 	ring->irq_seqno_barrier = gen6_seqno_barrier;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 3a10376b896f..8147ce1379fb 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -213,8 +213,15 @@ struct intel_engine_cs {
 
 	int		(*init_context)(struct drm_i915_gem_request *req);
 
-	void		(*write_tail)(struct intel_engine_cs *ring,
-				      u32 value);
+	int		(*emit_flush)(struct drm_i915_gem_request *request,
+				      u32 invalidate_domains,
+				      u32 flush_domains);
+	int		(*emit_bb_start)(struct drm_i915_gem_request *req,
+					 u64 offset, u32 length,
+					 unsigned dispatch_flags);
+#define I915_DISPATCH_SECURE 0x1
+#define I915_DISPATCH_PINNED 0x2
+#define I915_DISPATCH_RS     0x4
 	int		(*add_request)(struct drm_i915_gem_request *req);
 	/* Some chipsets are not quite as coherent as advertised and need
 	 * an expensive kick to force a true read of the up-to-date seqno.
@@ -290,16 +297,6 @@ struct intel_engine_cs {
 	struct list_head execlist_retired_req_list;
 	u8 next_context_status_buffer;
 	u32             irq_keep_mask; /* bitmask for interrupts that should not be masked */
-	int		(*emit_request)(struct drm_i915_gem_request *request);
-	int		(*emit_flush)(struct drm_i915_gem_request *request,
-				      u32 invalidate_domains,
-				      u32 flush_domains);
-	int		(*emit_bb_start)(struct drm_i915_gem_request *req,
-					 u64 offset, u32 length,
-					 unsigned dispatch_flags);
-#define I915_DISPATCH_SECURE 0x1
-#define I915_DISPATCH_PINNED 0x2
-#define I915_DISPATCH_RS     0x4
 
 	/**
 	 * List of objects currently involved in rendering from the
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 069/190] drm/i915: Remove duplicate golden render state init from execlists
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (66 preceding siblings ...)
  2016-01-11  9:17 ` [PATCH 068/190] drm/i915: Unify adding requests between ringbuffer and execlists Chris Wilson
@ 2016-01-11  9:17 ` Chris Wilson
  2016-01-11  9:17 ` [PATCH 070/190] drm/i915: Unify legacy/execlists submit_execbuf callbacks Chris Wilson
                   ` (18 subsequent siblings)
  86 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:17 UTC (permalink / raw)
  To: intel-gfx

Now that we use the same vfuncs for emitting the batch buffer in both
execlists and legacy, the golden render state initialisation is
identical between the two paths, so the execlists copy can be removed.
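
As a sketch, gen8_init_rcs_context() now simply ends with

	return i915_gem_render_state_init(req);

and render_state_prepare()/render_state_fini() become private to
i915_gem_render_state.c.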

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_render_state.c | 22 ++++++++++++------
 drivers/gpu/drm/i915/i915_gem_render_state.h | 18 ---------------
 drivers/gpu/drm/i915/intel_lrc.c             | 34 +---------------------------
 drivers/gpu/drm/i915/intel_renderstate.h     | 16 +++++++++----
 4 files changed, 27 insertions(+), 63 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.c b/drivers/gpu/drm/i915/i915_gem_render_state.c
index ccc988c2b226..222f25777bb4 100644
--- a/drivers/gpu/drm/i915/i915_gem_render_state.c
+++ b/drivers/gpu/drm/i915/i915_gem_render_state.c
@@ -28,6 +28,15 @@
 #include "i915_drv.h"
 #include "intel_renderstate.h"
 
+struct render_state {
+	const struct intel_renderstate_rodata *rodata;
+	struct drm_i915_gem_object *obj;
+	u64 ggtt_offset;
+	int gen;
+	u32 aux_batch_size;
+	u32 aux_batch_offset;
+};
+
 static const struct intel_renderstate_rodata *
 render_state_get_rodata(struct drm_device *dev, const int gen)
 {
@@ -163,14 +172,14 @@ err_out:
 
 #undef OUT_BATCH
 
-void i915_gem_render_state_fini(struct render_state *so)
+static void render_state_fini(struct render_state *so)
 {
 	i915_gem_object_ggtt_unpin(so->obj);
 	drm_gem_object_unreference(&so->obj->base);
 }
 
-int i915_gem_render_state_prepare(struct intel_engine_cs *ring,
-				  struct render_state *so)
+static int render_state_prepare(struct intel_engine_cs *ring,
+				struct render_state *so)
 {
 	int ret;
 
@@ -186,7 +195,7 @@ int i915_gem_render_state_prepare(struct intel_engine_cs *ring,
 
 	ret = render_state_setup(so);
 	if (ret) {
-		i915_gem_render_state_fini(so);
+		render_state_fini(so);
 		return ret;
 	}
 
@@ -198,7 +207,7 @@ int i915_gem_render_state_init(struct drm_i915_gem_request *req)
 	struct render_state so;
 	int ret;
 
-	ret = i915_gem_render_state_prepare(req->engine, &so);
+	ret = render_state_prepare(req->engine, &so);
 	if (ret)
 		return ret;
 
@@ -222,8 +231,7 @@ int i915_gem_render_state_init(struct drm_i915_gem_request *req)
 	}
 
 	i915_vma_move_to_active(i915_gem_obj_to_ggtt(so.obj), req);
-
 out:
-	i915_gem_render_state_fini(&so);
+	render_state_fini(&so);
 	return ret;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.h b/drivers/gpu/drm/i915/i915_gem_render_state.h
index e641bb093a90..c44fca8599bb 100644
--- a/drivers/gpu/drm/i915/i915_gem_render_state.h
+++ b/drivers/gpu/drm/i915/i915_gem_render_state.h
@@ -26,24 +26,6 @@
 
 #include <linux/types.h>
 
-struct intel_renderstate_rodata {
-	const u32 *reloc;
-	const u32 *batch;
-	const u32 batch_items;
-};
-
-struct render_state {
-	const struct intel_renderstate_rodata *rodata;
-	struct drm_i915_gem_object *obj;
-	u64 ggtt_offset;
-	int gen;
-	u32 aux_batch_size;
-	u32 aux_batch_offset;
-};
-
 int i915_gem_render_state_init(struct drm_i915_gem_request *req);
-void i915_gem_render_state_fini(struct render_state *so);
-int i915_gem_render_state_prepare(struct intel_engine_cs *ring,
-				  struct render_state *so);
 
 #endif /* _I915_GEM_RENDER_STATE_H_ */
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 9838503fafca..2f92c43397eb 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1627,38 +1627,6 @@ static int gen8_add_request(struct drm_i915_gem_request *request)
 	return 0;
 }
 
-static int intel_lr_context_render_state_init(struct drm_i915_gem_request *req)
-{
-	struct render_state so;
-	int ret;
-
-	ret = i915_gem_render_state_prepare(req->engine, &so);
-	if (ret)
-		return ret;
-
-	if (so.rodata == NULL)
-		return 0;
-
-	ret = req->engine->emit_bb_start(req, so.ggtt_offset,
-					 so.rodata->batch_items * 4,
-					 I915_DISPATCH_SECURE);
-	if (ret)
-		goto out;
-
-	ret = req->engine->emit_bb_start(req,
-					 (so.ggtt_offset + so.aux_batch_offset),
-					 so.aux_batch_size,
-					 I915_DISPATCH_SECURE);
-	if (ret)
-		goto out;
-
-	i915_vma_move_to_active(i915_gem_obj_to_ggtt(so.obj), req);
-
-out:
-	i915_gem_render_state_fini(&so);
-	return ret;
-}
-
 static int gen8_init_rcs_context(struct drm_i915_gem_request *req)
 {
 	int ret;
@@ -1675,7 +1643,7 @@ static int gen8_init_rcs_context(struct drm_i915_gem_request *req)
 	if (ret)
 		DRM_ERROR("MOCS failed to program: expect performance issues.\n");
 
-	return intel_lr_context_render_state_init(req);
+	return i915_gem_render_state_init(req);
 }
 
 /**
diff --git a/drivers/gpu/drm/i915/intel_renderstate.h b/drivers/gpu/drm/i915/intel_renderstate.h
index 5bd69852752c..08f6fea05a2c 100644
--- a/drivers/gpu/drm/i915/intel_renderstate.h
+++ b/drivers/gpu/drm/i915/intel_renderstate.h
@@ -24,12 +24,13 @@
 #ifndef _INTEL_RENDERSTATE_H
 #define _INTEL_RENDERSTATE_H
 
-#include "i915_drv.h"
+#include <linux/types.h>
 
-extern const struct intel_renderstate_rodata gen6_null_state;
-extern const struct intel_renderstate_rodata gen7_null_state;
-extern const struct intel_renderstate_rodata gen8_null_state;
-extern const struct intel_renderstate_rodata gen9_null_state;
+struct intel_renderstate_rodata {
+	const u32 *reloc;
+	const u32 *batch;
+	const u32 batch_items;
+};
 
 #define RO_RENDERSTATE(_g)						\
 	const struct intel_renderstate_rodata gen ## _g ## _null_state = { \
@@ -38,4 +39,9 @@ extern const struct intel_renderstate_rodata gen9_null_state;
 		.batch_items = sizeof(gen ## _g ## _null_state_batch)/4, \
 	}
 
+extern const struct intel_renderstate_rodata gen6_null_state;
+extern const struct intel_renderstate_rodata gen7_null_state;
+extern const struct intel_renderstate_rodata gen8_null_state;
+extern const struct intel_renderstate_rodata gen9_null_state;
+
 #endif /* INTEL_RENDERSTATE_H */
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 070/190] drm/i915: Unify legacy/execlists submit_execbuf callbacks
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (67 preceding siblings ...)
  2016-01-11  9:17 ` [PATCH 069/190] drm/i915: Remove duplicate golden render state init from execlists Chris Wilson
@ 2016-01-11  9:17 ` Chris Wilson
  2016-01-11  9:17 ` [PATCH 071/190] drm/i915: Simplify calling engine->sync_to Chris Wilson
                   ` (17 subsequent siblings)
  86 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:17 UTC (permalink / raw)
  To: intel-gfx

Now that emitting requests is identical between legacy and execlists, we
can use the same function to build up the ring for submitting to either
engine. (With the exception of i915_switch_context(), but in time that
will also be handled gracefully.)
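
As a sketch of the result, the dev_priv->gt.execbuf_submit indirection
disappears and both modes call one static helper directly:

	ret = execbuf_submit(params, args, &eb->vmas);
	i915_gem_execbuffer_retire_commands(params);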

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h            |  20 -----
 drivers/gpu/drm/i915/i915_gem.c            |   2 -
 drivers/gpu/drm/i915/i915_gem_context.c    |   3 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  24 ++++--
 drivers/gpu/drm/i915/intel_lrc.c           | 129 -----------------------------
 drivers/gpu/drm/i915/intel_lrc.h           |   4 -
 6 files changed, 20 insertions(+), 162 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 0c580124d46d..cae448e238ca 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1655,18 +1655,6 @@ struct i915_virtual_gpu {
 	bool active;
 };
 
-struct i915_execbuffer_params {
-	struct drm_device               *dev;
-	struct drm_file                 *file;
-	uint32_t                        dispatch_flags;
-	uint32_t                        args_batch_start_offset;
-	uint64_t                        batch_obj_vm_offset;
-	struct intel_engine_cs          *ring;
-	struct drm_i915_gem_object      *batch_obj;
-	struct intel_context            *ctx;
-	struct drm_i915_gem_request     *request;
-};
-
 /* used in computing the new watermarks state */
 struct intel_wm_config {
 	unsigned int num_pipes_active;
@@ -1934,9 +1922,6 @@ struct drm_i915_private {
 
 	/* Abstract the submission mechanism (legacy ringbuffer or execlists) away */
 	struct {
-		int (*execbuf_submit)(struct i915_execbuffer_params *params,
-				      struct drm_i915_gem_execbuffer2 *args,
-				      struct list_head *vmas);
 		int (*init_rings)(struct drm_device *dev);
 		void (*cleanup_ring)(struct intel_engine_cs *ring);
 		void (*stop_ring)(struct intel_engine_cs *ring);
@@ -2656,11 +2641,6 @@ int i915_gem_set_domain_ioctl(struct drm_device *dev, void *data,
 			      struct drm_file *file_priv);
 int i915_gem_sw_finish_ioctl(struct drm_device *dev, void *data,
 			     struct drm_file *file_priv);
-void i915_gem_execbuffer_move_to_active(struct list_head *vmas,
-					struct drm_i915_gem_request *req);
-int i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
-				   struct drm_i915_gem_execbuffer2 *args,
-				   struct list_head *vmas);
 int i915_gem_execbuffer(struct drm_device *dev, void *data,
 			struct drm_file *file_priv);
 int i915_gem_execbuffer2(struct drm_device *dev, void *data,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 5b5afdcd9634..235a3de6e0a0 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4308,12 +4308,10 @@ int i915_gem_init(struct drm_device *dev)
 	mutex_lock(&dev->struct_mutex);
 
 	if (!i915.enable_execlists) {
-		dev_priv->gt.execbuf_submit = i915_gem_ringbuffer_submission;
 		dev_priv->gt.init_rings = i915_gem_init_rings;
 		dev_priv->gt.cleanup_ring = intel_engine_cleanup;
 		dev_priv->gt.stop_ring = intel_engine_stop;
 	} else {
-		dev_priv->gt.execbuf_submit = intel_execlists_submission;
 		dev_priv->gt.init_rings = intel_logical_rings_init;
 		dev_priv->gt.cleanup_ring = intel_logical_ring_cleanup;
 		dev_priv->gt.stop_ring = intel_logical_ring_stop;
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index c078ebc29da5..72b0875a95a4 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -819,8 +819,9 @@ unpin_out:
  */
 int i915_switch_context(struct drm_i915_gem_request *req)
 {
+	if (i915.enable_execlists)
+		return 0;
 
-	WARN_ON(i915.enable_execlists);
 	WARN_ON(!mutex_is_locked(&req->i915->dev->struct_mutex));
 
 	if (req->ctx->legacy_hw_ctx.rcs_state == NULL) { /* We have the fake context */
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 3e6384deca65..6dee27224ddb 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -41,6 +41,18 @@
 
 #define BATCH_OFFSET_BIAS (256*1024)
 
+struct i915_execbuffer_params {
+	struct drm_device               *dev;
+	struct drm_file                 *file;
+	uint32_t                        dispatch_flags;
+	uint32_t                        args_batch_start_offset;
+	uint64_t                        batch_obj_vm_offset;
+	struct intel_engine_cs          *ring;
+	struct drm_i915_gem_object      *batch_obj;
+	struct intel_context            *ctx;
+	struct drm_i915_gem_request     *request;
+};
+
 struct eb_vmas {
 	struct list_head vmas;
 	int and;
@@ -1093,7 +1105,7 @@ i915_gem_validate_context(struct drm_device *dev, struct drm_file *file,
 	return ctx;
 }
 
-void
+static void
 i915_gem_execbuffer_move_to_active(struct list_head *vmas,
 				   struct drm_i915_gem_request *req)
 {
@@ -1219,10 +1231,10 @@ err:
 		return ERR_PTR(ret);
 }
 
-int
-i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
-			       struct drm_i915_gem_execbuffer2 *args,
-			       struct list_head *vmas)
+static int
+execbuf_submit(struct i915_execbuffer_params *params,
+	       struct drm_i915_gem_execbuffer2 *args,
+	       struct list_head *vmas)
 {
 	struct intel_ring *ring = params->request->ring;
 	struct drm_i915_private *dev_priv = params->request->i915;
@@ -1620,7 +1632,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	params->batch_obj               = batch_obj;
 	params->ctx                     = ctx;
 
-	ret = dev_priv->gt.execbuf_submit(params, args, &eb->vmas);
+	ret = execbuf_submit(params, args, &eb->vmas);
 	i915_gem_execbuffer_retire_commands(params);
 
 err_batch_unpin:
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 2f92c43397eb..84a8bcc90d78 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -616,39 +616,6 @@ static int execlists_context_queue(struct drm_i915_gem_request *request)
 	return 0;
 }
 
-static int execlists_move_to_gpu(struct drm_i915_gem_request *req,
-				 struct list_head *vmas)
-{
-	const unsigned other_rings = ~intel_engine_flag(req->engine);
-	struct i915_vma *vma;
-	uint32_t flush_domains = 0;
-	bool flush_chipset = false;
-	int ret;
-
-	list_for_each_entry(vma, vmas, exec_list) {
-		struct drm_i915_gem_object *obj = vma->obj;
-
-		if (obj->active & other_rings) {
-			ret = i915_gem_object_sync(obj, req);
-			if (ret)
-				return ret;
-		}
-
-		if (obj->base.write_domain & I915_GEM_DOMAIN_CPU)
-			flush_chipset |= i915_gem_clflush_object(obj, false);
-
-		flush_domains |= obj->base.write_domain;
-	}
-
-	if (flush_domains & I915_GEM_DOMAIN_GTT)
-		wmb();
-
-	/* Unconditionally invalidate gpu caches and ensure that we do flush
-	 * any residual writes from the previous batch.
-	 */
-	return req->engine->emit_flush(req, I915_GEM_GPU_DOMAINS, 0);
-}
-
 int intel_logical_ring_alloc_request_extras(struct drm_i915_gem_request *request)
 {
 	int ret;
@@ -700,102 +667,6 @@ intel_logical_ring_advance_and_submit(struct drm_i915_gem_request *request)
 		execlists_context_queue(request);
 }
 
-/**
- * execlists_submission() - submit a batchbuffer for execution, Execlists style
- * @dev: DRM device.
- * @file: DRM file.
- * @ring: Engine Command Streamer to submit to.
- * @ctx: Context to employ for this submission.
- * @args: execbuffer call arguments.
- * @vmas: list of vmas.
- * @batch_obj: the batchbuffer to submit.
- * @exec_start: batchbuffer start virtual address pointer.
- * @dispatch_flags: translated execbuffer call flags.
- *
- * This is the evil twin version of i915_gem_ringbuffer_submission. It abstracts
- * away the submission details of the execbuffer ioctl call.
- *
- * Return: non-zero if the submission fails.
- */
-int intel_execlists_submission(struct i915_execbuffer_params *params,
-			       struct drm_i915_gem_execbuffer2 *args,
-			       struct list_head *vmas)
-{
-	struct drm_device       *dev = params->dev;
-	struct intel_engine_cs  *engine = params->ring;
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct intel_ring *ring = params->request->ring;
-	u64 exec_start;
-	int instp_mode;
-	u32 instp_mask;
-	int ret;
-
-	instp_mode = args->flags & I915_EXEC_CONSTANTS_MASK;
-	instp_mask = I915_EXEC_CONSTANTS_MASK;
-	switch (instp_mode) {
-	case I915_EXEC_CONSTANTS_REL_GENERAL:
-	case I915_EXEC_CONSTANTS_ABSOLUTE:
-	case I915_EXEC_CONSTANTS_REL_SURFACE:
-		if (instp_mode != 0 && engine->id != RCS) {
-			DRM_DEBUG("non-0 rel constants mode on non-RCS\n");
-			return -EINVAL;
-		}
-
-		if (instp_mode != dev_priv->relative_constants_mode) {
-			if (instp_mode == I915_EXEC_CONSTANTS_REL_SURFACE) {
-				DRM_DEBUG("rel surface constants mode invalid on gen5+\n");
-				return -EINVAL;
-			}
-
-			/* The HW changed the meaning on this bit on gen6 */
-			instp_mask &= ~I915_EXEC_CONSTANTS_REL_SURFACE;
-		}
-		break;
-	default:
-		DRM_DEBUG("execbuf with unknown constants: %d\n", instp_mode);
-		return -EINVAL;
-	}
-
-	if (args->flags & I915_EXEC_GEN7_SOL_RESET) {
-		DRM_DEBUG("sol reset is gen7 only\n");
-		return -EINVAL;
-	}
-
-	ret = execlists_move_to_gpu(params->request, vmas);
-	if (ret)
-		return ret;
-
-	if (engine->id == RCS &&
-	    instp_mode != dev_priv->relative_constants_mode) {
-		ret = intel_ring_begin(params->request, 4);
-		if (ret)
-			return ret;
-
-		intel_ring_emit(ring, MI_NOOP);
-		intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
-		intel_ring_emit_reg(ring, INSTPM);
-		intel_ring_emit(ring, instp_mask << 16 | instp_mode);
-		intel_ring_advance(ring);
-
-		dev_priv->relative_constants_mode = instp_mode;
-	}
-
-	exec_start = params->batch_obj_vm_offset +
-		     args->batch_start_offset;
-
-	ret = engine->emit_bb_start(params->request,
-				    exec_start, args->batch_len,
-				    params->dispatch_flags);
-	if (ret)
-		return ret;
-
-	trace_i915_gem_ring_dispatch(params->request, params->dispatch_flags);
-
-	i915_gem_execbuffer_move_to_active(vmas, params->request);
-
-	return 0;
-}
-
 void intel_execlists_retire_requests(struct intel_engine_cs *ring)
 {
 	struct drm_i915_gem_request *req, *tmp;
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 7f01d2ddacfa..87bc9acc4224 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -79,10 +79,6 @@ uint64_t intel_lr_context_descriptor(struct intel_context *ctx,
 
 /* Execlists */
 int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists);
-struct i915_execbuffer_params;
-int intel_execlists_submission(struct i915_execbuffer_params *params,
-			       struct drm_i915_gem_execbuffer2 *args,
-			       struct list_head *vmas);
 u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj);
 
 void intel_lrc_irq_handler(struct intel_engine_cs *ring);
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 071/190] drm/i915: Simplify calling engine->sync_to
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (68 preceding siblings ...)
  2016-01-11  9:17 ` [PATCH 070/190] drm/i915: Unify legacy/execlists submit_execbuf callbacks Chris Wilson
@ 2016-01-11  9:17 ` Chris Wilson
  2016-01-11  9:17 ` [PATCH 072/190] drm/i915: Execlists cannot pin a context without the object Chris Wilson
                   ` (16 subsequent siblings)
  86 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:17 UTC (permalink / raw)
  To: intel-gfx

Since requests can no longer be generated as a side-effect of
intel_ring_begin(), we know that the seqno will be unchanged during
ring-emission. This predictability means we do not have to check for
the seqno wrapping around whilst emitting the semaphore for
engine->sync_to().
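
The hardware semaphore comparison is strictly greater-than, so a
">= seqno" wait is emitted against seqno - 1; a sketch of the gen6
path from the diff below:

	intel_ring_emit(waiter, dw1 | wait_mbox);
	/* hw completes when mbox value > operand, so wait on seqno - 1 */
	intel_ring_emit(waiter, signal->fence.seqno - 1);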

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c         | 13 ++-----
 drivers/gpu/drm/i915/intel_ringbuffer.c | 67 ++++++++++++++-------------------
 drivers/gpu/drm/i915/intel_ringbuffer.h |  5 +--
 3 files changed, 33 insertions(+), 52 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 235a3de6e0a0..b0230e7151ce 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2572,22 +2572,15 @@ __i915_gem_object_sync(struct drm_i915_gem_object *obj,
 		i915_gem_object_retire_request(obj, from);
 	} else {
 		int idx = intel_engine_sync_index(from->engine, to->engine);
-		u32 seqno = i915_gem_request_get_seqno(from);
-
-		if (seqno <= from->engine->semaphore.sync_seqno[idx])
+		if (from->fence.seqno <= from->engine->semaphore.sync_seqno[idx])
 			return 0;
 
 		trace_i915_gem_ring_sync_to(to, from);
-		ret = to->engine->semaphore.sync_to(to, from->engine, seqno);
+		ret = to->engine->semaphore.sync_to(to, from);
 		if (ret)
 			return ret;
 
-		/* We use last_read_req because sync_to()
-		 * might have just caused seqno wrap under
-		 * the radar.
-		 */
-		from->engine->semaphore.sync_seqno[idx] =
-			i915_gem_request_get_seqno(obj->last_read_req[from->engine->id]);
+		from->engine->semaphore.sync_seqno[idx] = from->fence.seqno;
 	}
 
 	return 0;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 556e9e2c1fec..d37cdb2f9073 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1384,69 +1384,58 @@ static inline bool i915_gem_has_seqno_wrapped(struct drm_i915_private *dev_priv,
  */
 
 static int
-gen8_ring_sync(struct drm_i915_gem_request *waiter_req,
-	       struct intel_engine_cs *signaller,
-	       u32 seqno)
+gen8_ring_sync(struct drm_i915_gem_request *wait,
+	       struct drm_i915_gem_request *signal)
 {
-	struct intel_ring *waiter = waiter_req->ring;
-	struct drm_i915_private *dev_priv = waiter_req->i915;
+	struct intel_ring *waiter = wait->ring;
+	struct drm_i915_private *dev_priv = wait->i915;
 	int ret;
 
-	ret = intel_ring_begin(waiter_req, 4);
+	ret = intel_ring_begin(wait, 4);
 	if (ret)
 		return ret;
 
-	intel_ring_emit(waiter, MI_SEMAPHORE_WAIT |
-				MI_SEMAPHORE_GLOBAL_GTT |
-				MI_SEMAPHORE_POLL |
-				MI_SEMAPHORE_SAD_GTE_SDD);
-	intel_ring_emit(waiter, seqno);
 	intel_ring_emit(waiter,
-			lower_32_bits(GEN8_WAIT_OFFSET(waiter_req->engine,
-						       signaller->id)));
+			MI_SEMAPHORE_WAIT |
+			MI_SEMAPHORE_GLOBAL_GTT |
+			MI_SEMAPHORE_POLL |
+			MI_SEMAPHORE_SAD_GTE_SDD);
+	intel_ring_emit(waiter, signal->fence.seqno);
 	intel_ring_emit(waiter,
-			upper_32_bits(GEN8_WAIT_OFFSET(waiter_req->engine,
-						       signaller->id)));
+			lower_32_bits(GEN8_WAIT_OFFSET(wait->engine,
+						       signal->engine->id)));
+	intel_ring_emit(waiter,
+			upper_32_bits(GEN8_WAIT_OFFSET(wait->engine,
+						       signal->engine->id)));
 	intel_ring_advance(waiter);
 	return 0;
 }
 
 static int
-gen6_ring_sync(struct drm_i915_gem_request *waiter_req,
-	       struct intel_engine_cs *signaller,
-	       u32 seqno)
+gen6_ring_sync(struct drm_i915_gem_request *wait,
+	       struct drm_i915_gem_request *signal)
 {
-	struct intel_ring *waiter = waiter_req->ring;
+	struct intel_ring *waiter = wait->ring;
 	u32 dw1 = MI_SEMAPHORE_MBOX |
 		  MI_SEMAPHORE_COMPARE |
 		  MI_SEMAPHORE_REGISTER;
-	u32 wait_mbox = signaller->semaphore.mbox.wait[waiter_req->engine->id];
+	u32 wait_mbox = signal->engine->semaphore.mbox.wait[wait->engine->id];
 	int ret;
 
-	/* Throughout all of the GEM code, seqno passed implies our current
-	 * seqno is >= the last seqno executed. However for hardware the
-	 * comparison is strictly greater than.
-	 */
-	seqno -= 1;
-
 	WARN_ON(wait_mbox == MI_SEMAPHORE_SYNC_INVALID);
 
-	ret = intel_ring_begin(waiter_req, 4);
+	ret = intel_ring_begin(wait, 4);
 	if (ret)
 		return ret;
 
-	/* If seqno wrap happened, omit the wait with no-ops */
-	if (likely(!i915_gem_has_seqno_wrapped(waiter_req->i915, seqno))) {
-		intel_ring_emit(waiter, dw1 | wait_mbox);
-		intel_ring_emit(waiter, seqno);
-		intel_ring_emit(waiter, 0);
-		intel_ring_emit(waiter, MI_NOOP);
-	} else {
-		intel_ring_emit(waiter, MI_NOOP);
-		intel_ring_emit(waiter, MI_NOOP);
-		intel_ring_emit(waiter, MI_NOOP);
-		intel_ring_emit(waiter, MI_NOOP);
-	}
+	intel_ring_emit(waiter, dw1 | wait_mbox);
+	/* Throughout all of the GEM code, seqno passed implies our current
+	 * seqno is >= the last seqno executed. However for hardware the
+	 * comparison is strictly greater than.
+	 */
+	intel_ring_emit(waiter, signal->fence.seqno - 1);
+	intel_ring_emit(waiter, 0);
+	intel_ring_emit(waiter, MI_NOOP);
 	intel_ring_advance(waiter);
 
 	return 0;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 8147ce1379fb..fc9c1e453be1 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -283,9 +283,8 @@ struct intel_engine_cs {
 		};
 
 		/* AKA wait() */
-		int	(*sync_to)(struct drm_i915_gem_request *to_req,
-				   struct intel_engine_cs *from,
-				   u32 seqno);
+		int	(*sync_to)(struct drm_i915_gem_request *to,
+				   struct drm_i915_gem_request *from);
 		int	(*signal)(struct drm_i915_gem_request *signaller_req,
 				  /* num_dwords needed by caller */
 				  unsigned int num_dwords);
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 072/190] drm/i915: Execlists cannot pin a context without the object
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (69 preceding siblings ...)
  2016-01-11  9:17 ` [PATCH 071/190] drm/i915: Simplify calling engine->sync_to Chris Wilson
@ 2016-01-11  9:17 ` Chris Wilson
  2016-01-11 15:24   ` Tvrtko Ursulin
  2016-01-11  9:17 ` [PATCH 073/190] drm/i915: Introduce i915_gem_active for request tracking Chris Wilson
                   ` (15 subsequent siblings)
  86 siblings, 1 reply; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:17 UTC (permalink / raw)
  To: intel-gfx

Given that intel_lr_context_pin() cannot succeed without the object,
we cannot reach intel_lr_context_unpin() without first allocating that
object - so we can remove the redundant test.
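
If the invariant is worth documenting, a hypothetical assertion (not
part of this patch) could stand in for the removed NULL test:

	WARN_ON(rq->ctx->engine[engine].state == NULL);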

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_lrc.c | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 84a8bcc90d78..0f0bf97e4032 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -769,17 +769,14 @@ static int intel_lr_context_pin(struct drm_i915_gem_request *rq)
 void intel_lr_context_unpin(struct drm_i915_gem_request *rq)
 {
 	int engine = rq->engine->id;
-	struct drm_i915_gem_object *ctx_obj = rq->ctx->engine[engine].state;
-	struct intel_ring *ring = rq->ring;
-
-	if (ctx_obj) {
-		WARN_ON(!mutex_is_locked(&rq->i915->dev->struct_mutex));
-		if (--rq->ctx->engine[engine].pin_count == 0) {
-			intel_ring_unmap(ring);
-			i915_gem_object_ggtt_unpin(ctx_obj);
-			i915_gem_context_unreference(rq->ctx);
-		}
-	}
+
+	WARN_ON(!mutex_is_locked(&rq->i915->dev->struct_mutex));
+	if (--rq->ctx->engine[engine].pin_count)
+		return;
+
+	intel_ring_unmap(rq->ring);
+	i915_gem_object_ggtt_unpin(rq->ctx->engine[engine].state);
+	i915_gem_context_unreference(rq->ctx);
 }
 
 static int intel_logical_ring_workarounds_emit(struct drm_i915_gem_request *req)
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 073/190] drm/i915: Introduce i915_gem_active for request tracking
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (70 preceding siblings ...)
  2016-01-11  9:17 ` [PATCH 072/190] drm/i915: Execlists cannot pin a context without the object Chris Wilson
@ 2016-01-11  9:17 ` Chris Wilson
  2016-01-11 17:32   ` Tvrtko Ursulin
  2016-01-11  9:17 ` [PATCH 074/190] drm/i915: Rename request->list to link for consistency Chris Wilson
                   ` (14 subsequent siblings)
  86 siblings, 1 reply; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:17 UTC (permalink / raw)
  To: intel-gfx

In the next patch, request tracking is made more generic and for that we
need a new, expanded struct. To separate the logic changes from the
mechanical churn, we split out the structure renaming into this patch.

v2: Writer's block. Add some spiel about why we track requests.
v3: Now i915_gem_active.
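
A minimal sketch of the new tracker, inferred from the usage sites in
this patch (the actual definition lands in i915_gem_request.h):

	struct i915_gem_active {
		struct drm_i915_gem_request *request;
	};

	static inline void
	i915_gem_request_mark_active(struct drm_i915_gem_request *request,
				     struct i915_gem_active *active)
	{
		active->request = request;
	}

so that obj->last_read[ring].request and friends replace the old raw
request pointers.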

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c        | 10 +++---
 drivers/gpu/drm/i915/i915_drv.h            |  9 +++--
 drivers/gpu/drm/i915/i915_gem.c            | 56 +++++++++++++++---------------
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  4 +--
 drivers/gpu/drm/i915/i915_gem_fence.c      |  6 ++--
 drivers/gpu/drm/i915/i915_gem_request.h    | 38 ++++++++++++++++++++
 drivers/gpu/drm/i915/i915_gem_tiling.c     |  2 +-
 drivers/gpu/drm/i915/i915_gpu_error.c      |  6 ++--
 drivers/gpu/drm/i915/intel_display.c       | 10 +++---
 9 files changed, 89 insertions(+), 52 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 8de944ed3369..65cb1d6a5d64 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -146,10 +146,10 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 		   obj->base.write_domain);
 	for_each_ring(ring, dev_priv, i)
 		seq_printf(m, "%x ",
-				i915_gem_request_get_seqno(obj->last_read_req[i]));
+				i915_gem_request_get_seqno(obj->last_read[i].request));
 	seq_printf(m, "] %x %x%s%s%s",
-		   i915_gem_request_get_seqno(obj->last_write_req),
-		   i915_gem_request_get_seqno(obj->last_fenced_req),
+		   i915_gem_request_get_seqno(obj->last_write.request),
+		   i915_gem_request_get_seqno(obj->last_fence.request),
 		   i915_cache_level_str(to_i915(obj->base.dev), obj->cache_level),
 		   obj->dirty ? " dirty" : "",
 		   obj->madv == I915_MADV_DONTNEED ? " purgeable" : "");
@@ -184,8 +184,8 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 		*t = '\0';
 		seq_printf(m, " (%s mappable)", s);
 	}
-	if (obj->last_write_req != NULL)
-		seq_printf(m, " (%s)", obj->last_write_req->engine->name);
+	if (obj->last_write.request != NULL)
+		seq_printf(m, " (%s)", obj->last_write.request->engine->name);
 	if (obj->frontbuffer_bits)
 		seq_printf(m, " (frontbuffer: 0x%03x)", obj->frontbuffer_bits);
 }
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index cae448e238ca..c577f86d94f8 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2110,11 +2110,10 @@ struct drm_i915_gem_object {
 	 * requests on one ring where the write request is older than the
 	 * read request. This allows for the CPU to read from an active
 	 * buffer by only waiting for the write to complete.
-	 * */
-	struct drm_i915_gem_request *last_read_req[I915_NUM_RINGS];
-	struct drm_i915_gem_request *last_write_req;
-	/** Breadcrumb of last fenced GPU access to the buffer. */
-	struct drm_i915_gem_request *last_fenced_req;
+	 */
+	struct i915_gem_active last_read[I915_NUM_RINGS];
+	struct i915_gem_active last_write;
+	struct i915_gem_active last_fence;
 
 	/** Current tiling stride for the object, if it's tiled. */
 	uint32_t stride;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index b0230e7151ce..77c253ddf060 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1117,23 +1117,23 @@ i915_gem_object_wait_rendering(struct drm_i915_gem_object *obj,
 		return 0;
 
 	if (readonly) {
-		if (obj->last_write_req != NULL) {
-			ret = i915_wait_request(obj->last_write_req);
+		if (obj->last_write.request != NULL) {
+			ret = i915_wait_request(obj->last_write.request);
 			if (ret)
 				return ret;
 
-			i = obj->last_write_req->engine->id;
-			if (obj->last_read_req[i] == obj->last_write_req)
+			i = obj->last_write.request->engine->id;
+			if (obj->last_read[i].request == obj->last_write.request)
 				i915_gem_object_retire__read(obj, i);
 			else
 				i915_gem_object_retire__write(obj);
 		}
 	} else {
 		for (i = 0; i < I915_NUM_RINGS; i++) {
-			if (obj->last_read_req[i] == NULL)
+			if (obj->last_read[i].request == NULL)
 				continue;
 
-			ret = i915_wait_request(obj->last_read_req[i]);
+			ret = i915_wait_request(obj->last_read[i].request);
 			if (ret)
 				return ret;
 
@@ -1151,9 +1151,9 @@ i915_gem_object_retire_request(struct drm_i915_gem_object *obj,
 {
 	int ring = req->engine->id;
 
-	if (obj->last_read_req[ring] == req)
+	if (obj->last_read[ring].request == req)
 		i915_gem_object_retire__read(obj, ring);
-	else if (obj->last_write_req == req)
+	else if (obj->last_write.request == req)
 		i915_gem_object_retire__write(obj);
 
 	i915_gem_request_retire_upto(req);
@@ -1181,7 +1181,7 @@ i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
 	if (readonly) {
 		struct drm_i915_gem_request *req;
 
-		req = obj->last_write_req;
+		req = obj->last_write.request;
 		if (req == NULL)
 			return 0;
 
@@ -1190,7 +1190,7 @@ i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
 		for (i = 0; i < I915_NUM_RINGS; i++) {
 			struct drm_i915_gem_request *req;
 
-			req = obj->last_read_req[i];
+			req = obj->last_read[i].request;
 			if (req == NULL)
 				continue;
 
@@ -2070,7 +2070,7 @@ void i915_vma_move_to_active(struct i915_vma *vma,
 	obj->active |= intel_engine_flag(engine);
 
 	list_move_tail(&obj->ring_list[engine->id], &engine->active_list);
-	i915_gem_request_assign(&obj->last_read_req[engine->id], req);
+	i915_gem_request_mark_active(req, &obj->last_read[engine->id]);
 
 	list_move_tail(&vma->mm_list, &vma->vm->active_list);
 }
@@ -2078,10 +2078,10 @@ void i915_vma_move_to_active(struct i915_vma *vma,
 static void
 i915_gem_object_retire__write(struct drm_i915_gem_object *obj)
 {
-	GEM_BUG_ON(obj->last_write_req == NULL);
-	GEM_BUG_ON(!(obj->active & intel_engine_flag(obj->last_write_req->engine)));
+	GEM_BUG_ON(obj->last_write.request == NULL);
+	GEM_BUG_ON(!(obj->active & intel_engine_flag(obj->last_write.request->engine)));
 
-	i915_gem_request_assign(&obj->last_write_req, NULL);
+	i915_gem_request_assign(&obj->last_write.request, NULL);
 	intel_fb_obj_flush(obj, true, ORIGIN_CS);
 }
 
@@ -2090,13 +2090,13 @@ i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
 {
 	struct i915_vma *vma;
 
-	GEM_BUG_ON(obj->last_read_req[ring] == NULL);
+	GEM_BUG_ON(obj->last_read[ring].request == NULL);
 	GEM_BUG_ON(!(obj->active & (1 << ring)));
 
 	list_del_init(&obj->ring_list[ring]);
-	i915_gem_request_assign(&obj->last_read_req[ring], NULL);
+	i915_gem_request_assign(&obj->last_read[ring].request, NULL);
 
-	if (obj->last_write_req && obj->last_write_req->engine->id == ring)
+	if (obj->last_write.request && obj->last_write.request->engine->id == ring)
 		i915_gem_object_retire__write(obj);
 
 	obj->active &= ~(1 << ring);
@@ -2115,7 +2115,7 @@ i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
 			list_move_tail(&vma->mm_list, &vma->vm->inactive_list);
 	}
 
-	i915_gem_request_assign(&obj->last_fenced_req, NULL);
+	i915_gem_request_assign(&obj->last_fence.request, NULL);
 	drm_gem_object_unreference(&obj->base);
 }
 
@@ -2336,7 +2336,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 				      struct drm_i915_gem_object,
 				      ring_list[ring->id]);
 
-		if (!list_empty(&obj->last_read_req[ring->id]->list))
+		if (!list_empty(&obj->last_read[ring->id].request->list))
 			break;
 
 		i915_gem_object_retire__read(obj, ring->id);
@@ -2445,7 +2445,7 @@ i915_gem_object_flush_active(struct drm_i915_gem_object *obj)
 	for (i = 0; i < I915_NUM_RINGS; i++) {
 		struct drm_i915_gem_request *req;
 
-		req = obj->last_read_req[i];
+		req = obj->last_read[i].request;
 		if (req == NULL)
 			continue;
 
@@ -2525,10 +2525,10 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	drm_gem_object_unreference(&obj->base);
 
 	for (i = 0; i < I915_NUM_RINGS; i++) {
-		if (obj->last_read_req[i] == NULL)
+		if (obj->last_read[i].request == NULL)
 			continue;
 
-		req[n++] = i915_gem_request_get(obj->last_read_req[i]);
+		req[n++] = i915_gem_request_get(obj->last_read[i].request);
 	}
 
 	mutex_unlock(&dev->struct_mutex);
@@ -2619,12 +2619,12 @@ i915_gem_object_sync(struct drm_i915_gem_object *obj,
 
 	n = 0;
 	if (readonly) {
-		if (obj->last_write_req)
-			req[n++] = obj->last_write_req;
+		if (obj->last_write.request)
+			req[n++] = obj->last_write.request;
 	} else {
 		for (i = 0; i < I915_NUM_RINGS; i++)
-			if (obj->last_read_req[i])
-				req[n++] = obj->last_read_req[i];
+			if (obj->last_read[i].request)
+				req[n++] = obj->last_read[i].request;
 	}
 	for (i = 0; i < n; i++) {
 		ret = __i915_gem_object_sync(obj, to, req[i]);
@@ -3695,8 +3695,8 @@ i915_gem_busy_ioctl(struct drm_device *dev, void *data,
 
 	BUILD_BUG_ON(I915_NUM_RINGS > 16);
 	args->busy = obj->active << 16;
-	if (obj->last_write_req)
-		args->busy |= obj->last_write_req->engine->id;
+	if (obj->last_write.request)
+		args->busy |= obj->last_write.request->engine->id;
 
 unref:
 	drm_gem_object_unreference(&obj->base);
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 6dee27224ddb..56d6b5dbb121 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1125,7 +1125,7 @@ i915_gem_execbuffer_move_to_active(struct list_head *vmas,
 
 		i915_vma_move_to_active(vma, req);
 		if (obj->base.write_domain) {
-			i915_gem_request_assign(&obj->last_write_req, req);
+			i915_gem_request_mark_active(req, &obj->last_write);
 
 			intel_fb_obj_invalidate(obj, ORIGIN_CS);
 
@@ -1133,7 +1133,7 @@ i915_gem_execbuffer_move_to_active(struct list_head *vmas,
 			obj->base.write_domain &= ~I915_GEM_GPU_DOMAINS;
 		}
 		if (entry->flags & EXEC_OBJECT_NEEDS_FENCE) {
-			i915_gem_request_assign(&obj->last_fenced_req, req);
+			i915_gem_request_mark_active(req, &obj->last_fence);
 			if (entry->flags & __EXEC_OBJECT_HAS_FENCE) {
 				struct drm_i915_private *dev_priv = req->i915;
 				list_move_tail(&dev_priv->fence_regs[obj->fence_reg].lru_list,
diff --git a/drivers/gpu/drm/i915/i915_gem_fence.c b/drivers/gpu/drm/i915/i915_gem_fence.c
index 598198543dcd..ab29c237ffa9 100644
--- a/drivers/gpu/drm/i915/i915_gem_fence.c
+++ b/drivers/gpu/drm/i915/i915_gem_fence.c
@@ -261,12 +261,12 @@ static inline void i915_gem_object_fence_lost(struct drm_i915_gem_object *obj)
 static int
 i915_gem_object_wait_fence(struct drm_i915_gem_object *obj)
 {
-	if (obj->last_fenced_req) {
-		int ret = i915_wait_request(obj->last_fenced_req);
+	if (obj->last_fence.request) {
+		int ret = i915_wait_request(obj->last_fence.request);
 		if (ret)
 			return ret;
 
-		i915_gem_request_assign(&obj->last_fenced_req, NULL);
+		i915_gem_request_assign(&obj->last_fence.request, NULL);
 	}
 
 	return 0;
diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index 2da9e0b5dfc7..0a21986c332b 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -208,4 +208,42 @@ static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
 				 req->fence.seqno);
 }
 
+/* We treat requests as fences. This is not to be confused with our
+ * "fence registers", but with pipeline synchronisation objects a la GL_ARB_sync.
+ * We use the fences to synchronize access from the CPU with activity on the
+ * GPU, for example, we should not rewrite an object's PTEs whilst the GPU
+ * is reading them. We also track fences at a higher level to provide
+ * implicit synchronisation around GEM objects, e.g. set-domain will wait
+ * for outstanding GPU rendering before marking the object ready for CPU
+ * access, or a pageflip will wait until GPU rendering is complete before showing
+ * the frame on the scanout.
+ *
+ * In order to use a fence, the object must track the fence it needs to
+ * serialise with. For example, GEM objects want to track both read and
+ * write access so that we can perform concurrent read operations between
+ * the CPU and GPU engines, as well as waiting for all rendering to
+ * complete, or waiting for the last GPU user of a "fence register". The
+ * object then embeds an @i915_gem_active to track the most recent (in
+ * retirement order) request relevant for the desired mode of access.
+ * The @i915_gem_active is updated with i915_gem_request_mark_active() to
+ * track the most recent fence request, typically this is done as part of
+ * i915_vma_move_to_active().
+ *
+ * When the @i915_gem_active completes (is retired), it will
+ * signal its completion to the owner through a callback as well as mark
+ * itself as idle (i915_gem_active.request == NULL). The owner
+ * can then perform any action, such as delayed freeing of an active
+ * resource including itself.
+ */
+struct i915_gem_active {
+	struct drm_i915_gem_request *request;
+};
+
+static inline void
+i915_gem_request_mark_active(struct drm_i915_gem_request *request,
+			     struct i915_gem_active *active)
+{
+	i915_gem_request_assign(&active->request, request);
+}
+
 #endif /* I915_GEM_REQUEST_H */
diff --git a/drivers/gpu/drm/i915/i915_gem_tiling.c b/drivers/gpu/drm/i915/i915_gem_tiling.c
index 7410f6c962e7..c7588135a82d 100644
--- a/drivers/gpu/drm/i915/i915_gem_tiling.c
+++ b/drivers/gpu/drm/i915/i915_gem_tiling.c
@@ -242,7 +242,7 @@ i915_gem_set_tiling(struct drm_device *dev, void *data,
 			}
 
 			obj->fence_dirty =
-				obj->last_fenced_req ||
+				obj->last_fence.request ||
 				obj->fence_reg != I915_FENCE_REG_NONE;
 
 			obj->tiling_mode = args->tiling_mode;
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 2785f2d1f073..5027636e3624 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -708,8 +708,8 @@ static void capture_bo(struct drm_i915_error_buffer *err,
 	err->size = obj->base.size;
 	err->name = obj->base.name;
 	for (i = 0; i < I915_NUM_RINGS; i++)
-		err->rseqno[i] = i915_gem_request_get_seqno(obj->last_read_req[i]);
-	err->wseqno = i915_gem_request_get_seqno(obj->last_write_req);
+		err->rseqno[i] = i915_gem_request_get_seqno(obj->last_read[i].request);
+	err->wseqno = i915_gem_request_get_seqno(obj->last_write.request);
 	err->gtt_offset = vma->node.start;
 	err->read_domains = obj->base.read_domains;
 	err->write_domain = obj->base.write_domain;
@@ -721,7 +721,7 @@ static void capture_bo(struct drm_i915_error_buffer *err,
 	err->dirty = obj->dirty;
 	err->purgeable = obj->madv != I915_MADV_WILLNEED;
 	err->userptr = obj->userptr.mm != NULL;
-	err->ring = obj->last_write_req ?  obj->last_write_req->engine->id : -1;
+	err->ring = obj->last_write.request ? obj->last_write.request->engine->id : -1;
 	err->cache_level = obj->cache_level;
 }
 
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index ec52fff7e0b0..eef858d5376f 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -11310,7 +11310,7 @@ static bool use_mmio_flip(struct intel_engine_cs *ring,
 						       false))
 		return true;
 	else
-		return ring != i915_gem_request_get_engine(obj->last_write_req);
+		return ring != i915_gem_request_get_engine(obj->last_write.request);
 }
 
 static void skl_do_mmio_flip(struct intel_crtc *intel_crtc,
@@ -11455,7 +11455,7 @@ static int intel_queue_mmio_flip(struct drm_device *dev,
 		return -ENOMEM;
 
 	mmio_flip->i915 = to_i915(dev);
-	mmio_flip->req = i915_gem_request_get(obj->last_write_req);
+	mmio_flip->req = i915_gem_request_get(obj->last_write.request);
 	mmio_flip->crtc = to_intel_crtc(crtc);
 	mmio_flip->rotation = crtc->primary->state->rotation;
 
@@ -11654,7 +11654,7 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
 	} else if (IS_IVYBRIDGE(dev) || IS_HASWELL(dev)) {
 		ring = &dev_priv->ring[BCS];
 	} else if (INTEL_INFO(dev)->gen >= 7) {
-		ring = i915_gem_request_get_engine(obj->last_write_req);
+		ring = i915_gem_request_get_engine(obj->last_write.request);
 		if (ring == NULL || ring->id != RCS)
 			ring = &dev_priv->ring[BCS];
 	} else {
@@ -11695,7 +11695,7 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
 			goto cleanup_unpin;
 
 		i915_gem_request_assign(&work->flip_queued_req,
-					obj->last_write_req);
+					obj->last_write.request);
 	} else {
 		ret = dev_priv->display.queue_flip(dev, crtc, fb, obj, request,
 						   page_flip_flags);
@@ -13895,7 +13895,7 @@ intel_prepare_plane_fb(struct drm_plane *plane,
 				to_intel_plane_state(new_state);
 
 			i915_gem_request_assign(&plane_state->wait_req,
-						obj->last_write_req);
+						obj->last_write.request);
 		}
 
 		i915_gem_track_fb(old_obj, obj, intel_plane->frontbuffer_bit);
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 074/190] drm/i915: Rename request->list to link for consistency
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (71 preceding siblings ...)
  2016-01-11  9:17 ` [PATCH 073/190] drm/i915: Introduce i915_gem_active for request tracking Chris Wilson
@ 2016-01-11  9:17 ` Chris Wilson
  2016-01-12 13:47   ` Tvrtko Ursulin
  2016-01-11  9:17 ` [PATCH 075/190] drm/i915: Refactor activity tracking for requests Chris Wilson
                   ` (13 subsequent siblings)
  86 siblings, 1 reply; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:17 UTC (permalink / raw)
  To: intel-gfx

We use "list" to denote the list and "link" to denote an element on that
list. Rename request->list to match this idiom.
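
A minimal sketch of the idiom with the names used here (declarations
abridged to the relevant fields; see the diff below for the actual
changes):

    struct intel_engine_cs {
            struct list_head request_list;  /* the list itself */
    };

    struct drm_i915_gem_request {
            struct list_head link;          /* entry on engine->request_list */
    };

    list_for_each_entry(req, &engine->request_list, link)
            count++;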

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c     |  4 ++--
 drivers/gpu/drm/i915/i915_gem.c         | 12 ++++++------
 drivers/gpu/drm/i915/i915_gem_request.c | 10 +++++-----
 drivers/gpu/drm/i915/i915_gem_request.h |  4 ++--
 drivers/gpu/drm/i915/i915_gpu_error.c   |  4 ++--
 drivers/gpu/drm/i915/intel_ringbuffer.c |  6 +++---
 6 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 65cb1d6a5d64..efa9572fc217 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -695,13 +695,13 @@ static int i915_gem_request_info(struct seq_file *m, void *data)
 		int count;
 
 		count = 0;
-		list_for_each_entry(req, &ring->request_list, list)
+		list_for_each_entry(req, &ring->request_list, link)
 			count++;
 		if (count == 0)
 			continue;
 
 		seq_printf(m, "%s requests: %d\n", ring->name, count);
-		list_for_each_entry(req, &ring->request_list, list) {
+		list_for_each_entry(req, &ring->request_list, link) {
 			struct task_struct *task;
 
 			rcu_read_lock();
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 77c253ddf060..f314b3ea2726 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2183,7 +2183,7 @@ i915_gem_find_active_request(struct intel_engine_cs *ring)
 	 * extra delay for a recent interrupt is pointless. Hence, we do
 	 * not need an engine->irq_seqno_barrier() before the seqno reads.
 	 */
-	list_for_each_entry(request, &ring->request_list, list) {
+	list_for_each_entry(request, &ring->request_list, link) {
 		if (i915_gem_request_completed(request))
 			continue;
 
@@ -2208,7 +2208,7 @@ static void i915_gem_reset_ring_status(struct intel_engine_cs *ring)
 
 	i915_set_reset_status(dev_priv, request->ctx, ring_hung);
 
-	list_for_each_entry_continue(request, &ring->request_list, list)
+	list_for_each_entry_continue(request, &ring->request_list, link)
 		i915_set_reset_status(dev_priv, request->ctx, false);
 }
 
@@ -2255,7 +2255,7 @@ static void i915_gem_reset_ring_cleanup(struct intel_engine_cs *engine)
 
 		request = list_last_entry(&engine->request_list,
 					  struct drm_i915_gem_request,
-					  list);
+					  link);
 
 		i915_gem_request_retire_upto(request);
 	}
@@ -2317,7 +2317,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 
 		request = list_first_entry(&ring->request_list,
 					   struct drm_i915_gem_request,
-					   list);
+					   link);
 
 		if (!i915_gem_request_completed(request))
 			break;
@@ -2336,7 +2336,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 				      struct drm_i915_gem_object,
 				      ring_list[ring->id]);
 
-		if (!list_empty(&obj->last_read[ring->id].request->list))
+		if (!list_empty(&obj->last_read[ring->id].request->link))
 			break;
 
 		i915_gem_object_retire__read(obj, ring->id);
@@ -2449,7 +2449,7 @@ i915_gem_object_flush_active(struct drm_i915_gem_object *obj)
 		if (req == NULL)
 			continue;
 
-		if (list_empty(&req->list))
+		if (list_empty(&req->link))
 			goto retire;
 
 		if (i915_gem_request_completed(req)) {
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 01443d8d9224..7f38d8972721 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -333,7 +333,7 @@ void i915_gem_request_cancel(struct drm_i915_gem_request *req)
 static void i915_gem_request_retire(struct drm_i915_gem_request *request)
 {
 	trace_i915_gem_request_retire(request);
-	list_del_init(&request->list);
+	list_del_init(&request->link);
 
 	/* We know the GPU must have read the request to have
 	 * sent us the seqno + interrupt, so use the position
@@ -355,12 +355,12 @@ i915_gem_request_retire_upto(struct drm_i915_gem_request *req)
 
 	lockdep_assert_held(&engine->dev->struct_mutex);
 
-	if (list_empty(&req->list))
+	if (list_empty(&req->link))
 		return;
 
 	do {
 		tmp = list_first_entry(&engine->request_list,
-				       typeof(*tmp), list);
+				       typeof(*tmp), link);
 
 		i915_gem_request_retire(tmp);
 	} while (tmp != req);
@@ -451,7 +451,7 @@ void __i915_add_request(struct drm_i915_gem_request *request,
 	request->emitted_jiffies = jiffies;
 	request->previous_seqno = request->engine->last_submitted_seqno;
 	request->engine->last_submitted_seqno = request->fence.seqno;
-	list_add_tail(&request->list, &request->engine->request_list);
+	list_add_tail(&request->link, &request->engine->request_list);
 
 	trace_i915_gem_request_add(request);
 
@@ -565,7 +565,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 
 	might_sleep();
 
-	if (list_empty(&req->list))
+	if (list_empty(&req->link))
 		return 0;
 
 	if (i915_gem_request_completed(req))
diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index 0a21986c332b..01d589be95fd 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -88,8 +88,8 @@ struct drm_i915_gem_request {
 	/** Time at which this request was emitted, in jiffies. */
 	unsigned long emitted_jiffies;
 
-	/** global list entry for this request */
-	struct list_head list;
+	/** engine->request_list entry for this request */
+	struct list_head link;
 
 	struct drm_i915_file_private *file_priv;
 	/** file_priv list entry for this request */
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 5027636e3624..c812079bc25c 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1056,7 +1056,7 @@ static void i915_gem_record_rings(struct drm_device *dev,
 		i915_gem_record_active_context(engine, error, &error->ring[i]);
 
 		count = 0;
-		list_for_each_entry(request, &engine->request_list, list)
+		list_for_each_entry(request, &engine->request_list, link)
 			count++;
 
 		error->ring[i].num_requests = count;
@@ -1069,7 +1069,7 @@ static void i915_gem_record_rings(struct drm_device *dev,
 		}
 
 		count = 0;
-		list_for_each_entry(request, &engine->request_list, list) {
+		list_for_each_entry(request, &engine->request_list, link) {
 			struct drm_i915_error_request *erq;
 
 			if (count >= error->ring[i].num_requests) {
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index d37cdb2f9073..213540f92c9d 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2109,7 +2109,7 @@ int intel_engine_idle(struct intel_engine_cs *ring)
 
 	req = list_entry(ring->request_list.prev,
 			struct drm_i915_gem_request,
-			list);
+			link);
 
 	/* Make sure we do not trigger any retires */
 	return __i915_wait_request(req,
@@ -2184,7 +2184,7 @@ static int wait_for_space(struct drm_i915_gem_request *req, int bytes)
 	/* The whole point of reserving space is to not wait! */
 	WARN_ON(ring->reserved_in_use);
 
-	list_for_each_entry(target, &engine->request_list, list) {
+	list_for_each_entry(target, &engine->request_list, link) {
 		/*
 		 * The request queue is per-engine, so can contain requests
 		 * from multiple ringbuffers. Here, we must ignore any that
@@ -2200,7 +2200,7 @@ static int wait_for_space(struct drm_i915_gem_request *req, int bytes)
 			break;
 	}
 
-	if (WARN_ON(&target->list == &engine->request_list))
+	if (WARN_ON(&target->link == &engine->request_list))
 		return -ENOSPC;
 
 	ret = i915_wait_request(target);
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 075/190] drm/i915: Refactor activity tracking for requests
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (72 preceding siblings ...)
  2016-01-11  9:17 ` [PATCH 074/190] drm/i915: Rename request->list to link for consistency Chris Wilson
@ 2016-01-11  9:17 ` Chris Wilson
  2016-01-28 11:41   ` Tvrtko Ursulin
  2016-01-11  9:17 ` [PATCH 076/190] drm/i915: Rename vma->*_list to *_link for consistency Chris Wilson
                   ` (12 subsequent siblings)
  86 siblings, 1 reply; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:17 UTC (permalink / raw)
  To: intel-gfx

With the introduction of requests, we amplified the number of atomic
refcounted objects we use and update on every execbuffer; from none to
several references, and a set of references that need to be changed. We
also introduced interesting side-effects in the order of retiring
requests and objects.

Instead of independently tracking the last request for an object, track
the active objects for each request. The object will reside in the
buffer list of its most recent active request and so we reduce the kref
interchange to a list_move. Now retirements are entirely driven by the
request, dramatically simplifying activity tracking on the objects
themselves, and removing the ambiguity between retiring objects and
retiring requests.

All told, less code, simpler and faster, and more extensible.
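
Condensed from the diff below, the moving parts of the new scheme (not
a standalone snippet):

    struct i915_gem_active {
            struct drm_i915_gem_request *request;
            struct list_head link;
            void (*retire)(struct i915_gem_active *,
                           struct drm_i915_gem_request *);
    };

    /* activating: the kref exchange becomes a list_move */
    i915_gem_request_mark_active(req, &obj->last_read[engine->id]);

    /* retiring: the request walks its active list, firing the callbacks */
    list_for_each_entry_safe(active, next, &req->active_list, link) {
            INIT_LIST_HEAD(&active->link);
            active->request = NULL;
            active->retire(active, req);
    }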

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/Makefile           |   1 -
 drivers/gpu/drm/i915/i915_drv.h         |  10 --
 drivers/gpu/drm/i915/i915_gem.c         | 160 ++++++++------------------------
 drivers/gpu/drm/i915/i915_gem_debug.c   |  70 --------------
 drivers/gpu/drm/i915/i915_gem_fence.c   |  10 +-
 drivers/gpu/drm/i915/i915_gem_request.c |  44 +++++++--
 drivers/gpu/drm/i915/i915_gem_request.h |  16 +++-
 drivers/gpu/drm/i915/intel_lrc.c        |   1 -
 drivers/gpu/drm/i915/intel_ringbuffer.c |   1 -
 drivers/gpu/drm/i915/intel_ringbuffer.h |  12 ---
 10 files changed, 89 insertions(+), 236 deletions(-)
 delete mode 100644 drivers/gpu/drm/i915/i915_gem_debug.c

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index b0a83215db80..79d657f29241 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -23,7 +23,6 @@ i915-$(CONFIG_DEBUG_FS) += i915_debugfs.o
 i915-y += i915_cmd_parser.o \
 	  i915_gem_batch_pool.o \
 	  i915_gem_context.o \
-	  i915_gem_debug.o \
 	  i915_gem_dmabuf.o \
 	  i915_gem_evict.o \
 	  i915_gem_execbuffer.o \
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index c577f86d94f8..c9c1a5cdc1e5 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -435,8 +435,6 @@ void intel_link_compute_m_n(int bpp, int nlanes,
 #define DRIVER_MINOR		6
 #define DRIVER_PATCHLEVEL	0
 
-#define WATCH_LISTS	0
-
 struct opregion_header;
 struct opregion_acpi;
 struct opregion_swsci;
@@ -2024,7 +2022,6 @@ struct drm_i915_gem_object {
 	struct drm_mm_node *stolen;
 	struct list_head global_list;
 
-	struct list_head ring_list[I915_NUM_RINGS];
 	/** Used in execbuf to temporarily hold a ref */
 	struct list_head obj_exec_link;
 
@@ -3068,13 +3065,6 @@ static inline bool i915_gem_object_needs_bit17_swizzle(struct drm_i915_gem_objec
 		obj->tiling_mode != I915_TILING_NONE;
 }
 
-/* i915_gem_debug.c */
-#if WATCH_LISTS
-int i915_verify_lists(struct drm_device *dev);
-#else
-#define i915_verify_lists(dev) 0
-#endif
-
 /* i915_debugfs.c */
 int i915_debugfs_init(struct drm_minor *minor);
 void i915_debugfs_cleanup(struct drm_minor *minor);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index f314b3ea2726..4eef13ebdaf3 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -40,10 +40,6 @@
 
 static void i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj);
 static void i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj);
-static void
-i915_gem_object_retire__write(struct drm_i915_gem_object *obj);
-static void
-i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring);
 
 static bool cpu_cache_is_coherent(struct drm_device *dev,
 				  enum i915_cache_level level)
@@ -117,7 +113,6 @@ int i915_mutex_lock_interruptible(struct drm_device *dev)
 	if (ret)
 		return ret;
 
-	WARN_ON(i915_verify_lists(dev));
 	return 0;
 }
 
@@ -1117,27 +1112,14 @@ i915_gem_object_wait_rendering(struct drm_i915_gem_object *obj,
 		return 0;
 
 	if (readonly) {
-		if (obj->last_write.request != NULL) {
-			ret = i915_wait_request(obj->last_write.request);
-			if (ret)
-				return ret;
-
-			i = obj->last_write.request->engine->id;
-			if (obj->last_read[i].request == obj->last_write.request)
-				i915_gem_object_retire__read(obj, i);
-			else
-				i915_gem_object_retire__write(obj);
-		}
+		ret = i915_wait_request(obj->last_write.request);
+		if (ret)
+			return ret;
 	} else {
 		for (i = 0; i < I915_NUM_RINGS; i++) {
-			if (obj->last_read[i].request == NULL)
-				continue;
-
 			ret = i915_wait_request(obj->last_read[i].request);
 			if (ret)
 				return ret;
-
-			i915_gem_object_retire__read(obj, i);
 		}
 		GEM_BUG_ON(obj->active);
 	}
@@ -1145,20 +1127,6 @@ i915_gem_object_wait_rendering(struct drm_i915_gem_object *obj,
 	return 0;
 }
 
-static void
-i915_gem_object_retire_request(struct drm_i915_gem_object *obj,
-			       struct drm_i915_gem_request *req)
-{
-	int ring = req->engine->id;
-
-	if (obj->last_read[ring].request == req)
-		i915_gem_object_retire__read(obj, ring);
-	else if (obj->last_write.request == req)
-		i915_gem_object_retire__write(obj);
-
-	i915_gem_request_retire_upto(req);
-}
-
 /* A nonblocking variant of the above wait. This is a highly dangerous routine
  * as the object state may change during this call.
  */
@@ -1206,7 +1174,7 @@ i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
 
 	for (i = 0; i < n; i++) {
 		if (ret == 0)
-			i915_gem_object_retire_request(obj, requests[i]);
+			i915_gem_request_retire_upto(requests[i]);
 		i915_gem_request_put(requests[i]);
 	}
 
@@ -2069,35 +2037,37 @@ void i915_vma_move_to_active(struct i915_vma *vma,
 		drm_gem_object_reference(&obj->base);
 	obj->active |= intel_engine_flag(engine);
 
-	list_move_tail(&obj->ring_list[engine->id], &engine->active_list);
 	i915_gem_request_mark_active(req, &obj->last_read[engine->id]);
-
 	list_move_tail(&vma->mm_list, &vma->vm->active_list);
 }
 
 static void
-i915_gem_object_retire__write(struct drm_i915_gem_object *obj)
+i915_gem_object_retire__fence(struct i915_gem_active *active,
+			      struct drm_i915_gem_request *req)
 {
-	GEM_BUG_ON(obj->last_write.request == NULL);
-	GEM_BUG_ON(!(obj->active & intel_engine_flag(obj->last_write.request->engine)));
+}
 
-	i915_gem_request_assign(&obj->last_write.request, NULL);
-	intel_fb_obj_flush(obj, true, ORIGIN_CS);
+static void
+i915_gem_object_retire__write(struct i915_gem_active *active,
+			      struct drm_i915_gem_request *request)
+{
+	intel_fb_obj_flush(container_of(active,
+					struct drm_i915_gem_object,
+					last_write),
+			   true,
+			   ORIGIN_CS);
 }
 
 static void
-i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
+i915_gem_object_retire__read(struct i915_gem_active *active,
+			     struct drm_i915_gem_request *request)
 {
+	int ring = request->engine->id;
+	struct drm_i915_gem_object *obj =
+		container_of(active, struct drm_i915_gem_object, last_read[ring]);
 	struct i915_vma *vma;
 
-	GEM_BUG_ON(obj->last_read[ring].request == NULL);
-	GEM_BUG_ON(!(obj->active & (1 << ring)));
-
-	list_del_init(&obj->ring_list[ring]);
-	i915_gem_request_assign(&obj->last_read[ring].request, NULL);
-
-	if (obj->last_write.request && obj->last_write.request->engine->id == ring)
-		i915_gem_object_retire__write(obj);
+	GEM_BUG_ON((obj->active & (1 << ring)) == 0);
 
 	obj->active &= ~(1 << ring);
 	if (obj->active)
@@ -2107,15 +2077,13 @@ i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
 	 * so that we don't steal from recently used but inactive objects
 	 * (unless we are forced to ofc!)
 	 */
-	list_move_tail(&obj->global_list,
-		       &to_i915(obj->base.dev)->mm.bound_list);
+	list_move_tail(&obj->global_list, &request->i915->mm.bound_list);
 
 	list_for_each_entry(vma, &obj->vma_list, vma_link) {
 		if (!list_empty(&vma->mm_list))
 			list_move_tail(&vma->mm_list, &vma->vm->inactive_list);
 	}
 
-	i915_gem_request_assign(&obj->last_fence.request, NULL);
 	drm_gem_object_unreference(&obj->base);
 }
 
@@ -2216,16 +2184,6 @@ static void i915_gem_reset_ring_cleanup(struct intel_engine_cs *engine)
 {
 	struct intel_ring *ring;
 
-	while (!list_empty(&engine->active_list)) {
-		struct drm_i915_gem_object *obj;
-
-		obj = list_first_entry(&engine->active_list,
-				       struct drm_i915_gem_object,
-				       ring_list[engine->id]);
-
-		i915_gem_object_retire__read(obj, engine->id);
-	}
-
 	/*
 	 * Clear the execlists queue up before freeing the requests, as those
 	 * are the ones that keep the context and ringbuffer backing objects
@@ -2295,8 +2253,6 @@ void i915_gem_reset(struct drm_device *dev)
 	i915_gem_context_reset(dev);
 
 	i915_gem_restore_fences(dev);
-
-	WARN_ON(i915_verify_lists(dev));
 }
 
 /**
@@ -2305,13 +2261,6 @@ void i915_gem_reset(struct drm_device *dev)
 void
 i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 {
-	WARN_ON(i915_verify_lists(ring->dev));
-
-	/* Retire requests first as we use it above for the early return.
-	 * If we retire requests last, we may use a later seqno and so clear
-	 * the requests lists without clearing the active list, leading to
-	 * confusion.
-	 */
 	while (!list_empty(&ring->request_list)) {
 		struct drm_i915_gem_request *request;
 
@@ -2324,25 +2273,6 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 
 		i915_gem_request_retire_upto(request);
 	}
-
-	/* Move any buffers on the active list that are no longer referenced
-	 * by the ringbuffer to the flushing/inactive lists as appropriate,
-	 * before we free the context associated with the requests.
-	 */
-	while (!list_empty(&ring->active_list)) {
-		struct drm_i915_gem_object *obj;
-
-		obj = list_first_entry(&ring->active_list,
-				      struct drm_i915_gem_object,
-				      ring_list[ring->id]);
-
-		if (!list_empty(&obj->last_read[ring->id].request->link))
-			break;
-
-		i915_gem_object_retire__read(obj, ring->id);
-	}
-
-	WARN_ON(i915_verify_lists(ring->dev));
 }
 
 void
@@ -2434,13 +2364,13 @@ out:
  * write domains, emitting any outstanding lazy request and retiring and
  * completed requests.
  */
-static int
+static void
 i915_gem_object_flush_active(struct drm_i915_gem_object *obj)
 {
 	int i;
 
 	if (!obj->active)
-		return 0;
+		return;
 
 	for (i = 0; i < I915_NUM_RINGS; i++) {
 		struct drm_i915_gem_request *req;
@@ -2449,17 +2379,9 @@ i915_gem_object_flush_active(struct drm_i915_gem_object *obj)
 		if (req == NULL)
 			continue;
 
-		if (list_empty(&req->link))
-			goto retire;
-
-		if (i915_gem_request_completed(req)) {
+		if (i915_gem_request_completed(req))
 			i915_gem_request_retire_upto(req);
-retire:
-			i915_gem_object_retire__read(obj, i);
-		}
 	}
-
-	return 0;
 }
 
 /**
@@ -2507,10 +2429,7 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	}
 
 	/* Need to make sure the object gets inactive eventually. */
-	ret = i915_gem_object_flush_active(obj);
-	if (ret)
-		goto out;
-
+	i915_gem_object_flush_active(obj);
 	if (!obj->active)
 		goto out;
 
@@ -2522,8 +2441,6 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 		goto out;
 	}
 
-	drm_gem_object_unreference(&obj->base);
-
 	for (i = 0; i < I915_NUM_RINGS; i++) {
 		if (obj->last_read[i].request == NULL)
 			continue;
@@ -2531,6 +2448,8 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 		req[n++] = i915_gem_request_get(obj->last_read[i].request);
 	}
 
+out:
+	drm_gem_object_unreference(&obj->base);
 	mutex_unlock(&dev->struct_mutex);
 
 	for (i = 0; i < n; i++) {
@@ -2541,11 +2460,6 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 		i915_gem_request_put(req[i]);
 	}
 	return ret;
-
-out:
-	drm_gem_object_unreference(&obj->base);
-	mutex_unlock(&dev->struct_mutex);
-	return ret;
 }
 
 static int
@@ -2569,7 +2483,7 @@ __i915_gem_object_sync(struct drm_i915_gem_object *obj,
 		if (ret)
 			return ret;
 
-		i915_gem_object_retire_request(obj, from);
+		i915_gem_request_retire_upto(from);
 	} else {
 		int idx = intel_engine_sync_index(from->engine, to->engine);
 		if (from->fence.seqno <= from->engine->semaphore.sync_seqno[idx])
@@ -2760,7 +2674,6 @@ int i915_gpu_idle(struct drm_device *dev)
 			return ret;
 	}
 
-	WARN_ON(i915_verify_lists(dev));
 	return 0;
 }
 
@@ -3689,16 +3602,13 @@ i915_gem_busy_ioctl(struct drm_device *dev, void *data,
 	 * become non-busy without any further actions, therefore emit any
 	 * necessary flushes here.
 	 */
-	ret = i915_gem_object_flush_active(obj);
-	if (ret)
-		goto unref;
+	i915_gem_object_flush_active(obj);
 
 	BUILD_BUG_ON(I915_NUM_RINGS > 16);
 	args->busy = obj->active << 16;
 	if (obj->last_write.request)
 		args->busy |= obj->last_write.request->engine->id;
 
-unref:
 	drm_gem_object_unreference(&obj->base);
 unlock:
 	mutex_unlock(&dev->struct_mutex);
@@ -3776,7 +3686,12 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj,
 
 	INIT_LIST_HEAD(&obj->global_list);
 	for (i = 0; i < I915_NUM_RINGS; i++)
-		INIT_LIST_HEAD(&obj->ring_list[i]);
+		init_request_active(&obj->last_read[i],
+				    i915_gem_object_retire__read);
+	init_request_active(&obj->last_write,
+			    i915_gem_object_retire__write);
+	init_request_active(&obj->last_fence,
+			    i915_gem_object_retire__fence);
 	INIT_LIST_HEAD(&obj->obj_exec_link);
 	INIT_LIST_HEAD(&obj->vma_list);
 	INIT_LIST_HEAD(&obj->batch_pool_link);
@@ -4372,7 +4287,6 @@ i915_gem_cleanup_ringbuffer(struct drm_device *dev)
 static void
 init_ring_lists(struct intel_engine_cs *ring)
 {
-	INIT_LIST_HEAD(&ring->active_list);
 	INIT_LIST_HEAD(&ring->request_list);
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gem_debug.c b/drivers/gpu/drm/i915/i915_gem_debug.c
deleted file mode 100644
index 17299d04189f..000000000000
--- a/drivers/gpu/drm/i915/i915_gem_debug.c
+++ /dev/null
@@ -1,70 +0,0 @@
-/*
- * Copyright © 2008 Intel Corporation
- *
- * Permission is hereby granted, free of charge, to any person obtaining a
- * copy of this software and associated documentation files (the "Software"),
- * to deal in the Software without restriction, including without limitation
- * the rights to use, copy, modify, merge, publish, distribute, sublicense,
- * and/or sell copies of the Software, and to permit persons to whom the
- * Software is furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice (including the next
- * paragraph) shall be included in all copies or substantial portions of the
- * Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
- * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
- * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
- * IN THE SOFTWARE.
- *
- * Authors:
- *    Keith Packard <keithp@keithp.com>
- *
- */
-
-#include <drm/drmP.h>
-#include <drm/i915_drm.h>
-#include "i915_drv.h"
-
-#if WATCH_LISTS
-int
-i915_verify_lists(struct drm_device *dev)
-{
-	static int warned;
-	struct drm_i915_private *dev_priv = to_i915(dev);
-	struct drm_i915_gem_object *obj;
-	struct intel_engine_cs *ring;
-	int err = 0;
-	int i;
-
-	if (warned)
-		return 0;
-
-	for_each_ring(ring, dev_priv, i) {
-		list_for_each_entry(obj, &ring->active_list, ring_list[ring->id]) {
-			if (obj->base.dev != dev ||
-			    !atomic_read(&obj->base.refcount.refcount)) {
-				DRM_ERROR("%s: freed active obj %p\n",
-					  ring->name, obj);
-				err++;
-				break;
-			} else if (!obj->active ||
-				   obj->last_read_req[ring->id] == NULL) {
-				DRM_ERROR("%s: invalid active obj %p\n",
-					  ring->name, obj);
-				err++;
-			} else if (obj->base.write_domain) {
-				DRM_ERROR("%s: invalid write obj %p (w %x)\n",
-					  ring->name,
-					  obj, obj->base.write_domain);
-				err++;
-			}
-		}
-	}
-
-	return warned = err;
-}
-#endif /* WATCH_LIST */
diff --git a/drivers/gpu/drm/i915/i915_gem_fence.c b/drivers/gpu/drm/i915/i915_gem_fence.c
index ab29c237ffa9..ff085efcf0e5 100644
--- a/drivers/gpu/drm/i915/i915_gem_fence.c
+++ b/drivers/gpu/drm/i915/i915_gem_fence.c
@@ -261,15 +261,7 @@ static inline void i915_gem_object_fence_lost(struct drm_i915_gem_object *obj)
 static int
 i915_gem_object_wait_fence(struct drm_i915_gem_object *obj)
 {
-	if (obj->last_fence.request) {
-		int ret = i915_wait_request(obj->last_fence.request);
-		if (ret)
-			return ret;
-
-		i915_gem_request_assign(&obj->last_fence.request, NULL);
-	}
-
-	return 0;
+	return i915_wait_request(obj->last_fence.request);
 }
 
 /**
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 7f38d8972721..069c0b9dfd95 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -228,6 +228,7 @@ i915_gem_request_alloc(struct intel_engine_cs *engine,
 		   engine->fence_context,
 		   seqno);
 
+	INIT_LIST_HEAD(&req->active_list);
 	req->i915 = dev_priv;
 	req->engine = engine;
 	req->reset_counter = reset_counter;
@@ -320,6 +321,27 @@ static void __i915_gem_request_release(struct drm_i915_gem_request *request)
 	i915_gem_request_put(request);
 }
 
+static void __i915_gem_request_retire_active(struct drm_i915_gem_request *req)
+{
+	struct i915_gem_active *active, *next;
+
+	/* Walk through the active list, calling retire on each. This allows
+	 * objects to track their GPU activity and mark themselves as idle
+	 * when their *last* active request is completed (updating state
+	 * tracking lists for eviction, active references for GEM, etc).
+	 *
+	 * As the ->retire() may free the node, we decouple it first and
+	 * pass along the auxiliary information (to avoid dereferencing
+	 * the node after the callback).
+	 */
+	list_for_each_entry_safe(active, next, &req->active_list, link) {
+		INIT_LIST_HEAD(&active->link);
+		active->request = NULL;
+
+		active->retire(active, req);
+	}
+}
+
 void i915_gem_request_cancel(struct drm_i915_gem_request *req)
 {
 	intel_ring_reserved_space_cancel(req->ring);
@@ -327,6 +349,14 @@ void i915_gem_request_cancel(struct drm_i915_gem_request *req)
 		if (req->ctx != req->engine->default_context)
 			intel_lr_context_unpin(req);
 	}
+
+	/* If a request is to be discarded after actions have been queued upon
+	 * it, we cannot unwind that request and it must be submitted rather
+	 * than cancelled. This is not limited to activity tracking, but all
+	 * other state tracking (such as current register settings etc).
+	 */
+	GEM_BUG_ON(!list_empty(&req->active_list));
+
 	__i915_gem_request_release(req);
 }
 
@@ -344,6 +374,8 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
 	 * completion order.
 	 */
 	request->ring->last_retired_head = request->postfix;
+
+	__i915_gem_request_retire_active(request);
 	__i915_gem_request_release(request);
 }
 
@@ -354,7 +386,6 @@ i915_gem_request_retire_upto(struct drm_i915_gem_request *req)
 	struct drm_i915_gem_request *tmp;
 
 	lockdep_assert_held(&engine->dev->struct_mutex);
-
 	if (list_empty(&req->link))
 		return;
 
@@ -364,8 +395,6 @@ i915_gem_request_retire_upto(struct drm_i915_gem_request *req)
 
 		i915_gem_request_retire(tmp);
 	} while (tmp != req);
-
-	WARN_ON(i915_verify_lists(engine->dev));
 }
 
 static void i915_gem_mark_busy(struct drm_i915_private *dev_priv)
@@ -565,9 +594,6 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 
 	might_sleep();
 
-	if (list_empty(&req->link))
-		return 0;
-
 	if (i915_gem_request_completed(req))
 		return 0;
 
@@ -700,10 +726,12 @@ i915_wait_request(struct drm_i915_gem_request *req)
 {
 	int ret;
 
-	BUG_ON(req == NULL);
+	if (req == NULL)
+		return 0;
 
-	BUG_ON(!mutex_is_locked(&req->i915->dev->struct_mutex));
+	GEM_BUG_ON(list_empty(&req->link));
 
+	lockdep_assert_held(&req->i915->dev->struct_mutex);
 	ret = __i915_wait_request(req, req->i915->mm.interruptible, NULL, NULL);
 	if (ret)
 		return ret;
diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index 01d589be95fd..59957d5edfdb 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -84,6 +84,7 @@ struct drm_i915_gem_request {
 	/** Batch buffer related to this request if any (used for
 	    error state dump only) */
 	struct drm_i915_gem_object *batch_obj;
+	struct list_head active_list;
 
 	/** Time at which this request was emitted, in jiffies. */
 	unsigned long emitted_jiffies;
@@ -237,13 +238,26 @@ static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
  */
 struct i915_gem_active {
 	struct drm_i915_gem_request *request;
+	struct list_head link;
+	void (*retire)(struct i915_gem_active *,
+		       struct drm_i915_gem_request *);
 };
 
 static inline void
+init_request_active(struct i915_gem_active *active,
+		    void (*func)(struct i915_gem_active *,
+				 struct drm_i915_gem_request *))
+{
+	INIT_LIST_HEAD(&active->link);
+	active->retire = func;
+}
+
+static inline void
 i915_gem_request_mark_active(struct drm_i915_gem_request *request,
 			     struct i915_gem_active *active)
 {
-	i915_gem_request_assign(&active->request, request);
+	list_move(&active->link, &request->active_list);
+	active->request = request;
 }
 
 #endif /* I915_GEM_REQUEST_H */
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 0f0bf97e4032..b5f62b5f4913 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1558,7 +1558,6 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
 	ring->dev = dev;
 	ring->i915 = to_i915(dev);
 	ring->fence_context = fence_context_alloc(1);
-	INIT_LIST_HEAD(&ring->active_list);
 	INIT_LIST_HEAD(&ring->request_list);
 	i915_gem_batch_pool_init(dev, &ring->batch_pool);
 	intel_engine_init_breadcrumbs(ring);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 213540f92c9d..7ca4e1fc854d 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2025,7 +2025,6 @@ static int intel_init_engine(struct drm_device *dev,
 	engine->dev = dev;
 	engine->i915 = to_i915(dev);
 	engine->fence_context = fence_context_alloc(1);
-	INIT_LIST_HEAD(&engine->active_list);
 	INIT_LIST_HEAD(&engine->request_list);
 	INIT_LIST_HEAD(&engine->execlist_queue);
 	INIT_LIST_HEAD(&engine->buffers);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index fc9c1e453be1..bb92d831a100 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -298,18 +298,6 @@ struct intel_engine_cs {
 	u32             irq_keep_mask; /* bitmask for interrupts that should not be masked */
 
 	/**
-	 * List of objects currently involved in rendering from the
-	 * ringbuffer.
-	 *
-	 * Includes buffers having the contents of their GPU caches
-	 * flushed, not necessarily primitives.  last_read_req
-	 * represents when the rendering involved will be completed.
-	 *
-	 * A reference is held on the buffer while on this list.
-	 */
-	struct list_head active_list;
-
-	/**
 	 * List of breadcrumbs associated with GPU requests currently
 	 * outstanding.
 	 */
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 076/190] drm/i915: Rename vma->*_list to *_link for consistency
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (73 preceding siblings ...)
  2016-01-11  9:17 ` [PATCH 075/190] drm/i915: Refactor activity tracking for requests Chris Wilson
@ 2016-01-11  9:17 ` Chris Wilson
  2016-01-12 13:49   ` Tvrtko Ursulin
  2016-01-11  9:17 ` [PATCH 077/190] drm/i915: Amalgamate GGTT/ppGTT vma debug list walkers Chris Wilson
                   ` (11 subsequent siblings)
  86 siblings, 1 reply; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:17 UTC (permalink / raw)
  To: intel-gfx

Elsewhere we have adopted the convention of using '_link' to denote
elements in the list (and '_list' for the actual list_head itself), and
of having the name indicate which list the link belongs to (and
preferably not just where the link is being stored).

s/vma_link/obj_link/ (we iterate over obj->vma_list)
s/mm_list/vm_link/ (we iterate over vm->[in]active_list)
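
With the new names, the iterations read as (both lifted from the diff
below):

    /* links stored on the object, walking obj->vma_list */
    list_for_each_entry(vma, &obj->vma_list, obj_link)
            if (vma->pin_count > 0)
                    pin_count++;

    /* links stored on the address space, walking vm->[in]active_list */
    list_for_each_entry(vma, &ggtt->base.active_list, vm_link)
            if (vma->pin_count)
                    pinned += vma->node.size;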

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c      | 17 +++++------
 drivers/gpu/drm/i915/i915_gem.c          | 50 ++++++++++++++++----------------
 drivers/gpu/drm/i915/i915_gem_context.c  |  2 +-
 drivers/gpu/drm/i915/i915_gem_evict.c    |  6 ++--
 drivers/gpu/drm/i915/i915_gem_gtt.c      | 10 +++----
 drivers/gpu/drm/i915/i915_gem_gtt.h      |  4 +--
 drivers/gpu/drm/i915/i915_gem_shrinker.c |  4 +--
 drivers/gpu/drm/i915/i915_gem_stolen.c   |  2 +-
 drivers/gpu/drm/i915/i915_gem_userptr.c  |  2 +-
 drivers/gpu/drm/i915/i915_gpu_error.c    |  8 ++---
 10 files changed, 52 insertions(+), 53 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index efa9572fc217..f311df758195 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -117,9 +117,8 @@ static u64 i915_gem_obj_total_ggtt_size(struct drm_i915_gem_object *obj)
 	u64 size = 0;
 	struct i915_vma *vma;
 
-	list_for_each_entry(vma, &obj->vma_list, vma_link) {
-		if (i915_is_ggtt(vma->vm) &&
-		    drm_mm_node_allocated(&vma->node))
+	list_for_each_entry(vma, &obj->vma_list, obj_link) {
+		if (i915_is_ggtt(vma->vm) && drm_mm_node_allocated(&vma->node))
 			size += vma->node.size;
 	}
 
@@ -155,7 +154,7 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 		   obj->madv == I915_MADV_DONTNEED ? " purgeable" : "");
 	if (obj->base.name)
 		seq_printf(m, " (name: %d)", obj->base.name);
-	list_for_each_entry(vma, &obj->vma_list, vma_link) {
+	list_for_each_entry(vma, &obj->vma_list, obj_link) {
 		if (vma->pin_count > 0)
 			pin_count++;
 	}
@@ -164,7 +163,7 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 		seq_printf(m, " (display)");
 	if (obj->fence_reg != I915_FENCE_REG_NONE)
 		seq_printf(m, " (fence: %d)", obj->fence_reg);
-	list_for_each_entry(vma, &obj->vma_list, vma_link) {
+	list_for_each_entry(vma, &obj->vma_list, obj_link) {
 		seq_printf(m, " (%sgtt offset: %08llx, size: %08llx",
 			   i915_is_ggtt(vma->vm) ? "g" : "pp",
 			   vma->node.start, vma->node.size);
@@ -229,7 +228,7 @@ static int i915_gem_object_list_info(struct seq_file *m, void *data)
 	}
 
 	total_obj_size = total_gtt_size = count = 0;
-	list_for_each_entry(vma, head, mm_list) {
+	list_for_each_entry(vma, head, vm_link) {
 		seq_printf(m, "   ");
 		describe_obj(m, vma->obj);
 		seq_printf(m, "\n");
@@ -341,7 +340,7 @@ static int per_file_stats(int id, void *ptr, void *data)
 		stats->shared += obj->base.size;
 
 	if (USES_FULL_PPGTT(obj->base.dev)) {
-		list_for_each_entry(vma, &obj->vma_list, vma_link) {
+		list_for_each_entry(vma, &obj->vma_list, obj_link) {
 			struct i915_hw_ppgtt *ppgtt;
 
 			if (!drm_mm_node_allocated(&vma->node))
@@ -453,12 +452,12 @@ static int i915_gem_object_info(struct seq_file *m, void* data)
 		   count, mappable_count, size, mappable_size);
 
 	size = count = mappable_size = mappable_count = 0;
-	count_vmas(&vm->active_list, mm_list);
+	count_vmas(&vm->active_list, vm_link);
 	seq_printf(m, "  %u [%u] active objects, %llu [%llu] bytes\n",
 		   count, mappable_count, size, mappable_size);
 
 	size = count = mappable_size = mappable_count = 0;
-	count_vmas(&vm->inactive_list, mm_list);
+	count_vmas(&vm->inactive_list, vm_link);
 	seq_printf(m, "  %u [%u] inactive objects, %llu [%llu] bytes\n",
 		   count, mappable_count, size, mappable_size);
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 4eef13ebdaf3..e4d7c7f5aca2 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -128,10 +128,10 @@ i915_gem_get_aperture_ioctl(struct drm_device *dev, void *data,
 
 	pinned = 0;
 	mutex_lock(&dev->struct_mutex);
-	list_for_each_entry(vma, &ggtt->base.active_list, mm_list)
+	list_for_each_entry(vma, &ggtt->base.active_list, vm_link)
 		if (vma->pin_count)
 			pinned += vma->node.size;
-	list_for_each_entry(vma, &ggtt->base.inactive_list, mm_list)
+	list_for_each_entry(vma, &ggtt->base.inactive_list, vm_link)
 		if (vma->pin_count)
 			pinned += vma->node.size;
 	mutex_unlock(&dev->struct_mutex);
@@ -261,7 +261,7 @@ drop_pages(struct drm_i915_gem_object *obj)
 	int ret;
 
 	drm_gem_object_reference(&obj->base);
-	list_for_each_entry_safe(vma, next, &obj->vma_list, vma_link)
+	list_for_each_entry_safe(vma, next, &obj->vma_list, obj_link)
 		if (i915_vma_unbind(vma))
 			break;
 
@@ -2038,7 +2038,7 @@ void i915_vma_move_to_active(struct i915_vma *vma,
 	obj->active |= intel_engine_flag(engine);
 
 	i915_gem_request_mark_active(req, &obj->last_read[engine->id]);
-	list_move_tail(&vma->mm_list, &vma->vm->active_list);
+	list_move_tail(&vma->vm_link, &vma->vm->active_list);
 }
 
 static void
@@ -2079,9 +2079,9 @@ i915_gem_object_retire__read(struct i915_gem_active *active,
 	 */
 	list_move_tail(&obj->global_list, &request->i915->mm.bound_list);
 
-	list_for_each_entry(vma, &obj->vma_list, vma_link) {
-		if (!list_empty(&vma->mm_list))
-			list_move_tail(&vma->mm_list, &vma->vm->inactive_list);
+	list_for_each_entry(vma, &obj->vma_list, obj_link) {
+		if (!list_empty(&vma->vm_link))
+			list_move_tail(&vma->vm_link, &vma->vm->inactive_list);
 	}
 
 	drm_gem_object_unreference(&obj->base);
@@ -2576,7 +2576,7 @@ static int __i915_vma_unbind(struct i915_vma *vma, bool wait)
 	struct drm_i915_private *dev_priv = obj->base.dev->dev_private;
 	int ret;
 
-	if (list_empty(&vma->vma_link))
+	if (list_empty(&vma->obj_link))
 		return 0;
 
 	if (!drm_mm_node_allocated(&vma->node)) {
@@ -2610,7 +2610,7 @@ static int __i915_vma_unbind(struct i915_vma *vma, bool wait)
 	vma->vm->unbind_vma(vma);
 	vma->bound = 0;
 
-	list_del_init(&vma->mm_list);
+	list_del_init(&vma->vm_link);
 	if (i915_is_ggtt(vma->vm)) {
 		if (vma->ggtt_view.type == I915_GGTT_VIEW_NORMAL) {
 			obj->map_and_fenceable = false;
@@ -2864,7 +2864,7 @@ search_free:
 		goto err_remove_node;
 
 	list_move_tail(&obj->global_list, &dev_priv->mm.bound_list);
-	list_add_tail(&vma->mm_list, &vm->inactive_list);
+	list_add_tail(&vma->vm_link, &vm->inactive_list);
 
 	return vma;
 
@@ -3029,7 +3029,7 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
 	/* And bump the LRU for this access */
 	vma = i915_gem_obj_to_ggtt(obj);
 	if (vma && drm_mm_node_allocated(&vma->node) && !obj->active)
-		list_move_tail(&vma->mm_list,
+		list_move_tail(&vma->vm_link,
 			       &to_i915(obj->base.dev)->gtt.base.inactive_list);
 
 	return 0;
@@ -3064,7 +3064,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 	 * catch the issue of the CS prefetch crossing page boundaries and
 	 * reading an invalid PTE on older architectures.
 	 */
-	list_for_each_entry_safe(vma, next, &obj->vma_list, vma_link) {
+	list_for_each_entry_safe(vma, next, &obj->vma_list, obj_link) {
 		if (!drm_mm_node_allocated(&vma->node))
 			continue;
 
@@ -3127,7 +3127,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 			 */
 		}
 
-		list_for_each_entry(vma, &obj->vma_list, vma_link) {
+		list_for_each_entry(vma, &obj->vma_list, obj_link) {
 			if (!drm_mm_node_allocated(&vma->node))
 				continue;
 
@@ -3137,7 +3137,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 		}
 	}
 
-	list_for_each_entry(vma, &obj->vma_list, vma_link)
+	list_for_each_entry(vma, &obj->vma_list, obj_link)
 		vma->node.color = cache_level;
 	obj->cache_level = cache_level;
 
@@ -3797,7 +3797,7 @@ void i915_gem_free_object(struct drm_gem_object *gem_obj)
 
 	trace_i915_gem_object_destroy(obj);
 
-	list_for_each_entry_safe(vma, next, &obj->vma_list, vma_link) {
+	list_for_each_entry_safe(vma, next, &obj->vma_list, obj_link) {
 		int ret;
 
 		vma->pin_count = 0;
@@ -3854,7 +3854,7 @@ struct i915_vma *i915_gem_obj_to_vma(struct drm_i915_gem_object *obj,
 				     struct i915_address_space *vm)
 {
 	struct i915_vma *vma;
-	list_for_each_entry(vma, &obj->vma_list, vma_link) {
+	list_for_each_entry(vma, &obj->vma_list, obj_link) {
 		if (vma->ggtt_view.type == I915_GGTT_VIEW_NORMAL &&
 		    vma->vm == vm)
 			return vma;
@@ -3871,7 +3871,7 @@ struct i915_vma *i915_gem_obj_to_ggtt_view(struct drm_i915_gem_object *obj,
 	if (WARN_ONCE(!view, "no view specified"))
 		return ERR_PTR(-EINVAL);
 
-	list_for_each_entry(vma, &obj->vma_list, vma_link)
+	list_for_each_entry(vma, &obj->vma_list, obj_link)
 		if (vma->vm == ggtt &&
 		    i915_ggtt_view_equal(&vma->ggtt_view, view))
 			return vma;
@@ -3892,7 +3892,7 @@ void i915_gem_vma_destroy(struct i915_vma *vma)
 	if (!i915_is_ggtt(vm))
 		i915_ppgtt_put(i915_vm_to_ppgtt(vm));
 
-	list_del(&vma->vma_link);
+	list_del(&vma->obj_link);
 
 	kmem_cache_free(to_i915(vma->obj->base.dev)->vmas, vma);
 }
@@ -4444,7 +4444,7 @@ u64 i915_gem_obj_offset(struct drm_i915_gem_object *o,
 
 	WARN_ON(vm == &dev_priv->mm.aliasing_ppgtt->base);
 
-	list_for_each_entry(vma, &o->vma_list, vma_link) {
+	list_for_each_entry(vma, &o->vma_list, obj_link) {
 		if (i915_is_ggtt(vma->vm) &&
 		    vma->ggtt_view.type != I915_GGTT_VIEW_NORMAL)
 			continue;
@@ -4463,7 +4463,7 @@ u64 i915_gem_obj_ggtt_offset_view(struct drm_i915_gem_object *o,
 	struct i915_address_space *ggtt = i915_obj_to_ggtt(o);
 	struct i915_vma *vma;
 
-	list_for_each_entry(vma, &o->vma_list, vma_link)
+	list_for_each_entry(vma, &o->vma_list, obj_link)
 		if (vma->vm == ggtt &&
 		    i915_ggtt_view_equal(&vma->ggtt_view, view))
 			return vma->node.start;
@@ -4477,7 +4477,7 @@ bool i915_gem_obj_bound(struct drm_i915_gem_object *o,
 {
 	struct i915_vma *vma;
 
-	list_for_each_entry(vma, &o->vma_list, vma_link) {
+	list_for_each_entry(vma, &o->vma_list, obj_link) {
 		if (i915_is_ggtt(vma->vm) &&
 		    vma->ggtt_view.type != I915_GGTT_VIEW_NORMAL)
 			continue;
@@ -4494,7 +4494,7 @@ bool i915_gem_obj_ggtt_bound_view(struct drm_i915_gem_object *o,
 	struct i915_address_space *ggtt = i915_obj_to_ggtt(o);
 	struct i915_vma *vma;
 
-	list_for_each_entry(vma, &o->vma_list, vma_link)
+	list_for_each_entry(vma, &o->vma_list, obj_link)
 		if (vma->vm == ggtt &&
 		    i915_ggtt_view_equal(&vma->ggtt_view, view) &&
 		    drm_mm_node_allocated(&vma->node))
@@ -4507,7 +4507,7 @@ bool i915_gem_obj_bound_any(struct drm_i915_gem_object *o)
 {
 	struct i915_vma *vma;
 
-	list_for_each_entry(vma, &o->vma_list, vma_link)
+	list_for_each_entry(vma, &o->vma_list, obj_link)
 		if (drm_mm_node_allocated(&vma->node))
 			return true;
 
@@ -4524,7 +4524,7 @@ unsigned long i915_gem_obj_size(struct drm_i915_gem_object *o,
 
 	BUG_ON(list_empty(&o->vma_list));
 
-	list_for_each_entry(vma, &o->vma_list, vma_link) {
+	list_for_each_entry(vma, &o->vma_list, obj_link) {
 		if (i915_is_ggtt(vma->vm) &&
 		    vma->ggtt_view.type != I915_GGTT_VIEW_NORMAL)
 			continue;
@@ -4537,7 +4537,7 @@ unsigned long i915_gem_obj_size(struct drm_i915_gem_object *o,
 bool i915_gem_obj_is_pinned(struct drm_i915_gem_object *obj)
 {
 	struct i915_vma *vma;
-	list_for_each_entry(vma, &obj->vma_list, vma_link)
+	list_for_each_entry(vma, &obj->vma_list, obj_link)
 		if (vma->pin_count > 0)
 			return true;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 72b0875a95a4..05b4e0e85f24 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -142,7 +142,7 @@ static void i915_gem_context_clean(struct intel_context *ctx)
 		return;
 
 	list_for_each_entry_safe(vma, next, &ppgtt->base.inactive_list,
-				 mm_list) {
+				 vm_link) {
 		if (WARN_ON(__i915_vma_unbind_no_wait(vma)))
 			break;
 	}
diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index 07c6e4d320c9..ea1f8d1bd228 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -116,7 +116,7 @@ i915_gem_evict_something(struct drm_device *dev, struct i915_address_space *vm,
 
 search_again:
 	/* First see if there is a large enough contiguous idle region... */
-	list_for_each_entry(vma, &vm->inactive_list, mm_list) {
+	list_for_each_entry(vma, &vm->inactive_list, vm_link) {
 		if (mark_free(vma, &unwind_list))
 			goto found;
 	}
@@ -125,7 +125,7 @@ search_again:
 		goto none;
 
 	/* Now merge in the soon-to-be-expired objects... */
-	list_for_each_entry(vma, &vm->active_list, mm_list) {
+	list_for_each_entry(vma, &vm->active_list, vm_link) {
 		if (mark_free(vma, &unwind_list))
 			goto found;
 	}
@@ -270,7 +270,7 @@ int i915_gem_evict_vm(struct i915_address_space *vm, bool do_idle)
 		WARN_ON(!list_empty(&vm->active_list));
 	}
 
-	list_for_each_entry_safe(vma, next, &vm->inactive_list, mm_list)
+	list_for_each_entry_safe(vma, next, &vm->inactive_list, vm_link)
 		if (vma->pin_count == 0)
 			WARN_ON(i915_vma_unbind(vma));
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index cddbd8c00663..6168182a87d8 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2736,7 +2736,7 @@ static int i915_gem_setup_global_gtt(struct drm_device *dev,
 		}
 		vma->bound |= GLOBAL_BIND;
 		__i915_vma_set_map_and_fenceable(vma);
-		list_add_tail(&vma->mm_list, &ggtt_vm->inactive_list);
+		list_add_tail(&vma->vm_link, &ggtt_vm->inactive_list);
 	}
 
 	/* Clear any non-preallocated blocks */
@@ -3221,7 +3221,7 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev)
 	vm = &dev_priv->gtt.base;
 	list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list) {
 		flush = false;
-		list_for_each_entry(vma, &obj->vma_list, vma_link) {
+		list_for_each_entry(vma, &obj->vma_list, obj_link) {
 			if (vma->vm != vm)
 				continue;
 
@@ -3277,8 +3277,8 @@ __i915_gem_vma_create(struct drm_i915_gem_object *obj,
 	if (vma == NULL)
 		return ERR_PTR(-ENOMEM);
 
-	INIT_LIST_HEAD(&vma->vma_link);
-	INIT_LIST_HEAD(&vma->mm_list);
+	INIT_LIST_HEAD(&vma->vm_link);
+	INIT_LIST_HEAD(&vma->obj_link);
 	INIT_LIST_HEAD(&vma->exec_list);
 	vma->vm = vm;
 	vma->obj = obj;
@@ -3286,7 +3286,7 @@ __i915_gem_vma_create(struct drm_i915_gem_object *obj,
 	if (i915_is_ggtt(vm))
 		vma->ggtt_view = *ggtt_view;
 
-	list_add_tail(&vma->vma_link, &obj->vma_list);
+	list_add_tail(&vma->obj_link, &obj->vma_list);
 	if (!i915_is_ggtt(vm))
 		i915_ppgtt_get(i915_vm_to_ppgtt(vm));
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index b448ad832dcf..2497671d1e1a 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -195,9 +195,9 @@ struct i915_vma {
 	struct i915_ggtt_view ggtt_view;
 
 	/** This object's place on the active/inactive lists */
-	struct list_head mm_list;
+	struct list_head vm_link;
 
-	struct list_head vma_link; /* Link in the object's VMA list */
+	struct list_head obj_link; /* Link in the object's VMA list */
 
 	/** This vma's place in the batchbuffer or on the eviction list */
 	struct list_head exec_list;
diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
index 16da9c1422cc..777959b47ccf 100644
--- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
@@ -52,7 +52,7 @@ static int num_vma_bound(struct drm_i915_gem_object *obj)
 	struct i915_vma *vma;
 	int count = 0;
 
-	list_for_each_entry(vma, &obj->vma_list, vma_link) {
+	list_for_each_entry(vma, &obj->vma_list, obj_link) {
 		if (drm_mm_node_allocated(&vma->node))
 			count++;
 		if (vma->pin_count)
@@ -176,7 +176,7 @@ i915_gem_shrink(struct drm_i915_private *dev_priv,
 
 			/* For the unbound phase, this should be a no-op! */
 			list_for_each_entry_safe(vma, v,
-						 &obj->vma_list, vma_link)
+						 &obj->vma_list, obj_link)
 				if (i915_vma_unbind(vma))
 					break;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_gem_stolen.c
index c384dc9c8a63..590e635cb65c 100644
--- a/drivers/gpu/drm/i915/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
@@ -692,7 +692,7 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_device *dev,
 
 		vma->bound |= GLOBAL_BIND;
 		__i915_vma_set_map_and_fenceable(vma);
-		list_add_tail(&vma->mm_list, &ggtt->inactive_list);
+		list_add_tail(&vma->vm_link, &ggtt->inactive_list);
 	}
 
 	list_add_tail(&obj->global_list, &dev_priv->mm.bound_list);
diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
index 251e81c4b0ea..2f3638d02bdd 100644
--- a/drivers/gpu/drm/i915/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
@@ -81,7 +81,7 @@ static void __cancel_userptr__worker(struct work_struct *work)
 		was_interruptible = dev_priv->mm.interruptible;
 		dev_priv->mm.interruptible = false;
 
-		list_for_each_entry_safe(vma, tmp, &obj->vma_list, vma_link)
+		list_for_each_entry_safe(vma, tmp, &obj->vma_list, obj_link)
 			WARN_ON(i915_vma_unbind(vma));
 		WARN_ON(i915_gem_object_put_pages(obj));
 
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index c812079bc25c..706d956b6eb3 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -731,7 +731,7 @@ static u32 capture_active_bo(struct drm_i915_error_buffer *err,
 	struct i915_vma *vma;
 	int i = 0;
 
-	list_for_each_entry(vma, head, mm_list) {
+	list_for_each_entry(vma, head, vm_link) {
 		capture_bo(err++, vma);
 		if (++i == count)
 			break;
@@ -754,7 +754,7 @@ static u32 capture_pinned_bo(struct drm_i915_error_buffer *err,
 		if (err == last)
 			break;
 
-		list_for_each_entry(vma, &obj->vma_list, vma_link)
+		list_for_each_entry(vma, &obj->vma_list, obj_link)
 			if (vma->vm == vm && vma->pin_count > 0)
 				capture_bo(err++, vma);
 	}
@@ -1113,12 +1113,12 @@ static void i915_gem_capture_vm(struct drm_i915_private *dev_priv,
 	int i;
 
 	i = 0;
-	list_for_each_entry(vma, &vm->active_list, mm_list)
+	list_for_each_entry(vma, &vm->active_list, vm_link)
 		i++;
 	error->active_bo_count[ndx] = i;
 
 	list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list) {
-		list_for_each_entry(vma, &obj->vma_list, vma_link)
+		list_for_each_entry(vma, &obj->vma_list, obj_link)
 			if (vma->vm == vm && vma->pin_count > 0)
 				i++;
 	}
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 077/190] drm/i915: Amalgamate GGTT/ppGTT vma debug list walkers
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (74 preceding siblings ...)
  2016-01-11  9:17 ` [PATCH 076/190] drm/i915: Rename vma->*_list to *_link for consistency Chris Wilson
@ 2016-01-11  9:17 ` Chris Wilson
  2016-01-11  9:17 ` [PATCH 078/190] drm/i915: Split early global GTT initialisation Chris Wilson
                   ` (10 subsequent siblings)
  86 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:17 UTC (permalink / raw)
  To: intel-gfx

As we can now have multiple VMA inside the global GTT (with partial
mappings, rotations, etc), it is no longer true that an object has at
most a single GGTT entry, and so we should walk the full vma_list to
count up the actual usage. In addition to unifying the two walkers,
switch from charging the whole object size for each vma to summing the
sizes of the bound vma nodes.
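
For illustration, a minimal userspace sketch of the new accounting; the
struct and function names below are simplified stand-ins for the
driver's types, not the real i915 code:

#include <stdbool.h>
#include <stdio.h>

struct fake_vma {
	bool allocated;          /* stands in for drm_mm_node_allocated() */
	bool is_ggtt;
	unsigned long node_size; /* stands in for vma->node.size */
};

static unsigned long sum_global(const struct fake_vma *vmas, int count)
{
	unsigned long global = 0;
	int i;

	for (i = 0; i < count; i++) {
		if (!vmas[i].allocated)
			continue; /* only count what is actually bound */
		if (vmas[i].is_ggtt)
			global += vmas[i].node_size; /* per-vma size, not object size */
	}
	return global;
}

int main(void)
{
	/* e.g. a full view plus a partial view of the same object */
	struct fake_vma vmas[] = {
		{ true,  true, 4096 * 256 },
		{ true,  true, 4096 * 16 },
		{ false, true, 0 },
	};

	printf("global usage: %lu bytes\n", sum_global(vmas, 3));
	return 0;
}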

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 46 +++++++++++++++----------------------
 1 file changed, 18 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index f311df758195..dd1788c81b90 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -332,6 +332,7 @@ static int per_file_stats(int id, void *ptr, void *data)
 	struct drm_i915_gem_object *obj = ptr;
 	struct file_stats *stats = data;
 	struct i915_vma *vma;
+	int bound = 0;
 
 	stats->count++;
 	stats->total += obj->base.size;
@@ -339,41 +340,30 @@ static int per_file_stats(int id, void *ptr, void *data)
 	if (obj->base.name || obj->base.dma_buf)
 		stats->shared += obj->base.size;
 
-	if (USES_FULL_PPGTT(obj->base.dev)) {
-		list_for_each_entry(vma, &obj->vma_list, obj_link) {
-			struct i915_hw_ppgtt *ppgtt;
+	list_for_each_entry(vma, &obj->vma_list, obj_link) {
+		if (!drm_mm_node_allocated(&vma->node))
+			continue;
 
-			if (!drm_mm_node_allocated(&vma->node))
-				continue;
+		bound++;
 
-			if (i915_is_ggtt(vma->vm)) {
-				stats->global += obj->base.size;
-				continue;
-			}
-
-			ppgtt = container_of(vma->vm, struct i915_hw_ppgtt, base);
+		if (i915_is_ggtt(vma->vm)) {
+			stats->global += vma->node.size;
+		} else {
+			struct i915_hw_ppgtt *ppgtt
+				= container_of(vma->vm,
+					       struct i915_hw_ppgtt,
+					       base);
 			if (ppgtt->file_priv != stats->file_priv)
 				continue;
-
-			if (obj->active) /* XXX per-vma statistic */
-				stats->active += obj->base.size;
-			else
-				stats->inactive += obj->base.size;
-
-			return 0;
-		}
-	} else {
-		if (i915_gem_obj_ggtt_bound(obj)) {
-			stats->global += obj->base.size;
-			if (obj->active)
-				stats->active += obj->base.size;
-			else
-				stats->inactive += obj->base.size;
-			return 0;
 		}
+
+		if (obj->active) /* XXX per-vma statistic */
+			stats->active += vma->node.size;
+		else
+			stats->inactive += vma->node.size;
 	}
 
-	if (!list_empty(&obj->global_list))
+	if (!bound)
 		stats->unbound += obj->base.size;
 
 	return 0;
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 078/190] drm/i915: Split early global GTT initialisation
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (75 preceding siblings ...)
  2016-01-11  9:17 ` [PATCH 077/190] drm/i915: Amalgamate GGTT/ppGTT vma debug list walkers Chris Wilson
@ 2016-01-11  9:17 ` Chris Wilson
  2016-01-11  9:17 ` [PATCH 079/190] drm/i915: Reduce the pointer dance of i915_is_ggtt() Chris Wilson
                   ` (9 subsequent siblings)
  86 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:17 UTC (permalink / raw)
  To: intel-gfx

Initialising the global GTT is tricky, as we wish to use the drm_mm
range manager during the modesetting initialisation (to capture stolen
allocations from the BIOS) before we actually enable GEM. To overcome
this, we now set up the drm_mm first and then carefully rebind the
preallocated stolen objects afterwards.
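
A rough sketch of the resulting two-phase ordering; the function bodies
below are illustrative placeholders, not the driver's code (the real
entry points are i915_gem_init_global_gtt() during driver load and
i915_global_gtt_setup() during GEM init, as in the diff):

#include <stdio.h>

/* Phase 1: make the drm_mm range manager usable before GEM is up,
 * so modesetting can reserve the BIOS's stolen allocations. */
static void init_global_gtt(void)
{
	printf("phase 1: address space + range manager initialised\n");
}

/* Phase 2: once GEM initialises, clear the unused ranges and the
 * guard page around whatever modesetting has already reserved. */
static int global_gtt_setup(void)
{
	printf("phase 2: holes cleared, guard page scrubbed\n");
	return 0;
}

int main(void)
{
	init_global_gtt();         /* from driver load, before modeset init */
	/* ... modesetting reserves stolen/BIOS framebuffer nodes here ... */
	return global_gtt_setup(); /* from GEM init, after modeset init */
}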

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_dma.c        |  2 ++
 drivers/gpu/drm/i915/i915_gem.c        |  5 +--
 drivers/gpu/drm/i915/i915_gem_gtt.c    | 62 +++++++++++-----------------------
 drivers/gpu/drm/i915/i915_gem_gtt.h    |  1 +
 drivers/gpu/drm/i915/i915_gem_stolen.c | 17 +++++-----
 5 files changed, 33 insertions(+), 54 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index c0242ce45e43..4a24831a14fa 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -989,6 +989,8 @@ int i915_driver_load(struct drm_device *dev, unsigned long flags)
 	dev_priv->gtt.mtrr = arch_phys_wc_add(dev_priv->gtt.mappable_base,
 					      aperture_size);
 
+	i915_gem_init_global_gtt(dev);
+
 	/* The i915 workqueue is primarily used for batched retirement of
 	 * requests (and thus managing bo) once the task has been completed
 	 * by the GPU. i915_gem_retire_requests() is called directly when we
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index e4d7c7f5aca2..44bd514a6c2e 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4237,7 +4237,9 @@ int i915_gem_init(struct drm_device *dev)
 	if (ret)
 		goto out_unlock;
 
-	i915_gem_init_global_gtt(dev);
+	ret = i915_global_gtt_setup(dev);
+	if (ret)
+		goto out_unlock;
 
 	ret = i915_gem_context_init(dev);
 	if (ret)
@@ -4312,7 +4314,6 @@ i915_gem_load(struct drm_device *dev)
 				  SLAB_HWCACHE_ALIGN,
 				  NULL);
 
-	INIT_LIST_HEAD(&dev_priv->vm_list);
 	INIT_LIST_HEAD(&dev_priv->context_list);
 	INIT_LIST_HEAD(&dev_priv->mm.unbound_list);
 	INIT_LIST_HEAD(&dev_priv->mm.bound_list);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 6168182a87d8..b5c3bbe6dc2a 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2681,10 +2681,7 @@ static void i915_gtt_color_adjust(struct drm_mm_node *node,
 	}
 }
 
-static int i915_gem_setup_global_gtt(struct drm_device *dev,
-				     u64 start,
-				     u64 mappable_end,
-				     u64 end)
+int i915_global_gtt_setup(struct drm_device *dev)
 {
 	/* Let GEM Manage all of the aperture.
 	 *
@@ -2697,48 +2694,16 @@ static int i915_gem_setup_global_gtt(struct drm_device *dev,
 	 */
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct i915_address_space *ggtt_vm = &dev_priv->gtt.base;
-	struct drm_mm_node *entry;
-	struct drm_i915_gem_object *obj;
 	unsigned long hole_start, hole_end;
+	struct drm_mm_node *entry;
 	int ret;
 
-	BUG_ON(mappable_end > end);
-
-	ggtt_vm->start = start;
-
-	/* Subtract the guard page before address space initialization to
-	 * shrink the range used by drm_mm */
-	ggtt_vm->total = end - start - PAGE_SIZE;
-	i915_address_space_init(ggtt_vm, dev_priv);
-	ggtt_vm->total += PAGE_SIZE;
-
 	if (intel_vgpu_active(dev)) {
 		ret = intel_vgt_balloon(dev);
 		if (ret)
 			return ret;
 	}
 
-	if (!HAS_LLC(dev))
-		ggtt_vm->mm.color_adjust = i915_gtt_color_adjust;
-
-	/* Mark any preallocated objects as occupied */
-	list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list) {
-		struct i915_vma *vma = i915_gem_obj_to_vma(obj, ggtt_vm);
-
-		DRM_DEBUG_KMS("reserving preallocated space: %llx + %zx\n",
-			      i915_gem_obj_ggtt_offset(obj), obj->base.size);
-
-		WARN_ON(i915_gem_obj_ggtt_bound(obj));
-		ret = drm_mm_reserve_node(&ggtt_vm->mm, &vma->node);
-		if (ret) {
-			DRM_DEBUG_KMS("Reservation failed: %i\n", ret);
-			return ret;
-		}
-		vma->bound |= GLOBAL_BIND;
-		__i915_vma_set_map_and_fenceable(vma);
-		list_add_tail(&vma->vm_link, &ggtt_vm->inactive_list);
-	}
-
 	/* Clear any non-preallocated blocks */
 	drm_mm_for_each_hole(entry, &ggtt_vm->mm, hole_start, hole_end) {
 		DRM_DEBUG_KMS("clearing unused GTT space: [%lx, %lx]\n",
@@ -2748,7 +2713,9 @@ static int i915_gem_setup_global_gtt(struct drm_device *dev,
 	}
 
 	/* And finally clear the reserved guard page */
-	ggtt_vm->clear_range(ggtt_vm, end - PAGE_SIZE, PAGE_SIZE, true);
+	ggtt_vm->clear_range(ggtt_vm,
+			     ggtt_vm->total - PAGE_SIZE, PAGE_SIZE,
+			     true);
 
 	if (USES_PPGTT(dev) && !USES_FULL_PPGTT(dev)) {
 		struct i915_hw_ppgtt *ppgtt;
@@ -2788,13 +2755,22 @@ static int i915_gem_setup_global_gtt(struct drm_device *dev,
 
 void i915_gem_init_global_gtt(struct drm_device *dev)
 {
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	u64 gtt_size, mappable_size;
+	struct drm_i915_private *dev_priv = to_i915(dev);
+	struct i915_address_space *ggtt_vm = &dev_priv->gtt.base;
 
-	gtt_size = dev_priv->gtt.base.total;
-	mappable_size = dev_priv->gtt.mappable_end;
+	INIT_LIST_HEAD(&dev_priv->vm_list);
 
-	i915_gem_setup_global_gtt(dev, 0, mappable_size, gtt_size);
+	if (WARN_ON(dev_priv->gtt.mappable_end > ggtt_vm->total))
+		dev_priv->gtt.mappable_end = ggtt_vm->total;
+
+	if (!HAS_LLC(dev))
+		ggtt_vm->mm.color_adjust = i915_gtt_color_adjust;
+
+	/* Subtract the guard page before address space initialization to
+	 * shrink the range used by drm_mm */
+	ggtt_vm->total -= PAGE_SIZE;
+	i915_address_space_init(ggtt_vm, dev_priv);
+	ggtt_vm->total += PAGE_SIZE;
 }
 
 void i915_global_gtt_cleanup(struct drm_device *dev)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 2497671d1e1a..cb796c1ff6a5 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -514,6 +514,7 @@ i915_page_dir_dma_addr(const struct i915_hw_ppgtt *ppgtt, const unsigned n)
 
 int i915_gem_gtt_init(struct drm_device *dev);
 void i915_gem_init_global_gtt(struct drm_device *dev);
+int i915_global_gtt_setup(struct drm_device *dev);
 void i915_global_gtt_cleanup(struct drm_device *dev);
 
 
diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_gem_stolen.c
index 590e635cb65c..463be259a505 100644
--- a/drivers/gpu/drm/i915/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
@@ -683,18 +683,17 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_device *dev,
 	 */
 	vma->node.start = gtt_offset;
 	vma->node.size = size;
-	if (drm_mm_initialized(&ggtt->mm)) {
-		ret = drm_mm_reserve_node(&ggtt->mm, &vma->node);
-		if (ret) {
-			DRM_DEBUG_KMS("failed to allocate stolen GTT space\n");
-			goto err;
-		}
 
-		vma->bound |= GLOBAL_BIND;
-		__i915_vma_set_map_and_fenceable(vma);
-		list_add_tail(&vma->vm_link, &ggtt->inactive_list);
+	ret = drm_mm_reserve_node(&ggtt->mm, &vma->node);
+	if (ret) {
+		DRM_DEBUG_KMS("failed to allocate stolen GTT space\n");
+		goto err;
 	}
 
+	vma->bound |= GLOBAL_BIND;
+	__i915_vma_set_map_and_fenceable(vma);
+	list_add_tail(&vma->vm_link, &ggtt->inactive_list);
+
 	list_add_tail(&obj->global_list, &dev_priv->mm.bound_list);
 	i915_gem_object_pin_pages(obj);
 
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 079/190] drm/i915: Reduce the pointer dance of i915_is_ggtt()
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (76 preceding siblings ...)
  2016-01-11  9:17 ` [PATCH 078/190] drm/i915: Split early global GTT initialisation Chris Wilson
@ 2016-01-11  9:17 ` Chris Wilson
  2016-01-15 12:12   ` Dave Gordon
  2016-01-11  9:17 ` [PATCH 080/190] drm/i915: Store owning file on the i915_address_space Chris Wilson
                   ` (8 subsequent siblings)
  86 siblings, 1 reply; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:17 UTC (permalink / raw)
  To: intel-gfx

The multiple levels of indirection do nothing but hinder the compiler,
and the pointer chasing turns out to be quite painful, but is painless
to fix.
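
The shape of the fix, as a self-contained sketch with the types pared
down to the relevant fields: record the answer once at creation time so
the query becomes a single load instead of a chase through
vm->dev->dev_private:

#include <stdbool.h>

struct address_space {
	bool is_ggtt; /* set once when the address space is created */
};

struct vma {
	struct address_space *vm;
	bool is_ggtt; /* cached copy: one load, no pointer chasing */
};

static void vma_init(struct vma *vma, struct address_space *vm)
{
	vma->vm = vm;
	vma->is_ggtt = vm->is_ggtt; /* one-time computation at creation */
}

int main(void)
{
	struct address_space ggtt = { .is_ggtt = true };
	struct vma v;

	vma_init(&v, &ggtt);
	return v.is_ggtt ? 0 : 1;
}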

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c        | 13 ++++++-------
 drivers/gpu/drm/i915/i915_drv.h            |  7 -------
 drivers/gpu/drm/i915/i915_gem.c            | 18 +++++++-----------
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  5 ++---
 drivers/gpu/drm/i915/i915_gem_gtt.c        | 12 +++++-------
 drivers/gpu/drm/i915/i915_gem_gtt.h        |  5 +++++
 drivers/gpu/drm/i915/i915_trace.h          | 27 ++++++++-------------------
 7 files changed, 33 insertions(+), 54 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index dd1788c81b90..99a6181b012e 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -118,7 +118,7 @@ static u64 i915_gem_obj_total_ggtt_size(struct drm_i915_gem_object *obj)
 	struct i915_vma *vma;
 
 	list_for_each_entry(vma, &obj->vma_list, obj_link) {
-		if (i915_is_ggtt(vma->vm) && drm_mm_node_allocated(&vma->node))
+		if (vma->is_ggtt && drm_mm_node_allocated(&vma->node))
 			size += vma->node.size;
 	}
 
@@ -165,12 +165,11 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 		seq_printf(m, " (fence: %d)", obj->fence_reg);
 	list_for_each_entry(vma, &obj->vma_list, obj_link) {
 		seq_printf(m, " (%sgtt offset: %08llx, size: %08llx",
-			   i915_is_ggtt(vma->vm) ? "g" : "pp",
+			   vma->is_ggtt ? "g" : "pp",
 			   vma->node.start, vma->node.size);
-		if (i915_is_ggtt(vma->vm))
-			seq_printf(m, ", type: %u)", vma->ggtt_view.type);
-		else
-			seq_puts(m, ")");
+		if (vma->is_ggtt)
+			seq_printf(m, ", type: %u", vma->ggtt_view.type);
+		seq_puts(m, ")");
 	}
 	if (obj->stolen)
 		seq_printf(m, " (stolen: %08llx)", obj->stolen->start);
@@ -346,7 +345,7 @@ static int per_file_stats(int id, void *ptr, void *data)
 
 		bound++;
 
-		if (i915_is_ggtt(vma->vm)) {
+		if (vma->is_ggtt) {
 			stats->global += vma->node.size;
 		} else {
 			struct i915_hw_ppgtt *ppgtt
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index c9c1a5cdc1e5..f840cc55f1ab 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2905,18 +2905,11 @@ bool i915_gem_obj_is_pinned(struct drm_i915_gem_object *obj);
 /* Some GGTT VM helpers */
 #define i915_obj_to_ggtt(obj) \
 	(&((struct drm_i915_private *)(obj)->base.dev->dev_private)->gtt.base)
-static inline bool i915_is_ggtt(struct i915_address_space *vm)
-{
-	struct i915_address_space *ggtt =
-		&((struct drm_i915_private *)(vm)->dev->dev_private)->gtt.base;
-	return vm == ggtt;
-}
 
 static inline struct i915_hw_ppgtt *
 i915_vm_to_ppgtt(struct i915_address_space *vm)
 {
 	WARN_ON(i915_is_ggtt(vm));
-
 	return container_of(vm, struct i915_hw_ppgtt, base);
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 44bd514a6c2e..9a22fdd8a9f5 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2595,8 +2595,7 @@ static int __i915_vma_unbind(struct i915_vma *vma, bool wait)
 			return ret;
 	}
 
-	if (i915_is_ggtt(vma->vm) &&
-	    vma->ggtt_view.type == I915_GGTT_VIEW_NORMAL) {
+	if (vma->is_ggtt && vma->ggtt_view.type == I915_GGTT_VIEW_NORMAL) {
 		i915_gem_object_finish_gtt(obj);
 
 		/* release the fence reg _after_ flushing */
@@ -2611,7 +2610,7 @@ static int __i915_vma_unbind(struct i915_vma *vma, bool wait)
 	vma->bound = 0;
 
 	list_del_init(&vma->vm_link);
-	if (i915_is_ggtt(vma->vm)) {
+	if (vma->is_ggtt) {
 		if (vma->ggtt_view.type == I915_GGTT_VIEW_NORMAL) {
 			obj->map_and_fenceable = false;
 		} else if (vma->ggtt_view.pages) {
@@ -3880,17 +3879,14 @@ struct i915_vma *i915_gem_obj_to_ggtt_view(struct drm_i915_gem_object *obj,
 
 void i915_gem_vma_destroy(struct i915_vma *vma)
 {
-	struct i915_address_space *vm = NULL;
 	WARN_ON(vma->node.allocated);
 
 	/* Keep the vma as a placeholder in the execbuffer reservation lists */
 	if (!list_empty(&vma->exec_list))
 		return;
 
-	vm = vma->vm;
-
-	if (!i915_is_ggtt(vm))
-		i915_ppgtt_put(i915_vm_to_ppgtt(vm));
+	if (!vma->is_ggtt)
+		i915_ppgtt_put(i915_vm_to_ppgtt(vma->vm));
 
 	list_del(&vma->obj_link);
 
@@ -4446,7 +4442,7 @@ u64 i915_gem_obj_offset(struct drm_i915_gem_object *o,
 	WARN_ON(vm == &dev_priv->mm.aliasing_ppgtt->base);
 
 	list_for_each_entry(vma, &o->vma_list, obj_link) {
-		if (i915_is_ggtt(vma->vm) &&
+		if (vma->is_ggtt &&
 		    vma->ggtt_view.type != I915_GGTT_VIEW_NORMAL)
 			continue;
 		if (vma->vm == vm)
@@ -4479,7 +4475,7 @@ bool i915_gem_obj_bound(struct drm_i915_gem_object *o,
 	struct i915_vma *vma;
 
 	list_for_each_entry(vma, &o->vma_list, obj_link) {
-		if (i915_is_ggtt(vma->vm) &&
+		if (vma->is_ggtt &&
 		    vma->ggtt_view.type != I915_GGTT_VIEW_NORMAL)
 			continue;
 		if (vma->vm == vm && drm_mm_node_allocated(&vma->node))
@@ -4526,7 +4522,7 @@ unsigned long i915_gem_obj_size(struct drm_i915_gem_object *o,
 	BUG_ON(list_empty(&o->vma_list));
 
 	list_for_each_entry(vma, &o->vma_list, obj_link) {
-		if (i915_is_ggtt(vma->vm) &&
+		if (vma->is_ggtt &&
 		    vma->ggtt_view.type != I915_GGTT_VIEW_NORMAL)
 			continue;
 		if (vma->vm == vm)
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 56d6b5dbb121..c10795f58bfc 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -683,7 +683,7 @@ need_reloc_mappable(struct i915_vma *vma)
 	if (entry->relocation_count == 0)
 		return false;
 
-	if (!i915_is_ggtt(vma->vm))
+	if (!vma->is_ggtt)
 		return false;
 
 	/* See also use_cpu_reloc() */
@@ -702,8 +702,7 @@ eb_vma_misplaced(struct i915_vma *vma)
 	struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
 	struct drm_i915_gem_object *obj = vma->obj;
 
-	WARN_ON(entry->flags & __EXEC_OBJECT_NEEDS_MAP &&
-	       !i915_is_ggtt(vma->vm));
+	WARN_ON(entry->flags & __EXEC_OBJECT_NEEDS_MAP && !vma->is_ggtt);
 
 	if (entry->alignment &&
 	    vma->node.start & (entry->alignment - 1))
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index b5c3bbe6dc2a..06117bd0fc00 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -3150,6 +3150,7 @@ int i915_gem_gtt_init(struct drm_device *dev)
 	}
 
 	gtt->base.dev = dev;
+	gtt->base.is_ggtt = true;
 
 	ret = gtt->gtt_probe(dev, &gtt->base.total, &gtt->stolen_size,
 			     &gtt->mappable_base, &gtt->mappable_end);
@@ -3258,13 +3259,14 @@ __i915_gem_vma_create(struct drm_i915_gem_object *obj,
 	INIT_LIST_HEAD(&vma->exec_list);
 	vma->vm = vm;
 	vma->obj = obj;
+	vma->is_ggtt = i915_is_ggtt(vm);
 
 	if (i915_is_ggtt(vm))
 		vma->ggtt_view = *ggtt_view;
+	else
+		i915_ppgtt_get(i915_vm_to_ppgtt(vm));
 
 	list_add_tail(&vma->obj_link, &obj->vma_list);
-	if (!i915_is_ggtt(vm))
-		i915_ppgtt_get(i915_vm_to_ppgtt(vm));
 
 	return vma;
 }
@@ -3536,13 +3538,9 @@ int i915_vma_bind(struct i915_vma *vma, enum i915_cache_level cache_level,
 		return 0;
 
 	if (vma->bound == 0 && vma->vm->allocate_va_range) {
-		trace_i915_va_alloc(vma->vm,
-				    vma->node.start,
-				    vma->node.size,
-				    VM_TO_TRACE_NAME(vma->vm));
-
 		/* XXX: i915_vma_pin() will fix this +- hack */
 		vma->pin_count++;
+		trace_i915_va_alloc(vma);
 		ret = vma->vm->allocate_va_range(vma->vm,
 						 vma->node.start,
 						 vma->node.size);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index cb796c1ff6a5..633b9b2e1acb 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -184,6 +184,7 @@ struct i915_vma {
 #define GLOBAL_BIND	(1<<0)
 #define LOCAL_BIND	(1<<1)
 	unsigned int bound : 4;
+	bool is_ggtt : 1;
 
 	/**
 	 * Support different GGTT views into the same object.
@@ -276,6 +277,8 @@ struct i915_address_space {
 	u64 start;		/* Start offset always 0 for dri2 */
 	u64 total;		/* size addr space maps (ex. 2GB for ggtt) */
 
+	bool is_ggtt;
+
 	struct i915_page_scratch *scratch_page;
 	struct i915_page_table *scratch_pt;
 	struct i915_page_directory *scratch_pd;
@@ -331,6 +334,8 @@ struct i915_address_space {
 			u32 flags);
 };
 
+#define i915_is_ggtt(V) ((V)->is_ggtt)
+
 /* The Graphics Translation Table is the way in which GEN hardware translates a
  * Graphics Virtual Address into a Physical Address. In addition to the normal
  * collateral associated with any va->pa translations GEN hardware also has a
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 85469e3c740a..e486dcef508d 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -175,35 +175,24 @@ TRACE_EVENT(i915_vma_unbind,
 		      __entry->obj, __entry->offset, __entry->size, __entry->vm)
 );
 
-#define VM_TO_TRACE_NAME(vm) \
-	(i915_is_ggtt(vm) ? "G" : \
-		      "P")
-
-DECLARE_EVENT_CLASS(i915_va,
-	TP_PROTO(struct i915_address_space *vm, u64 start, u64 length, const char *name),
-	TP_ARGS(vm, start, length, name),
+TRACE_EVENT(i915_va_alloc,
+	TP_PROTO(struct i915_vma *vma),
+	TP_ARGS(vma),
 
 	TP_STRUCT__entry(
 		__field(struct i915_address_space *, vm)
 		__field(u64, start)
 		__field(u64, end)
-		__string(name, name)
 	),
 
 	TP_fast_assign(
-		__entry->vm = vm;
-		__entry->start = start;
-		__entry->end = start + length - 1;
-		__assign_str(name, name);
+		__entry->vm = vma->vm;
+		__entry->start = vma->node.start;
+		__entry->end = vma->node.start + vma->node.size - 1;
 	),
 
-	TP_printk("vm=%p (%s), 0x%llx-0x%llx",
-		  __entry->vm, __get_str(name),  __entry->start, __entry->end)
-);
-
-DEFINE_EVENT(i915_va, i915_va_alloc,
-	     TP_PROTO(struct i915_address_space *vm, u64 start, u64 length, const char *name),
-	     TP_ARGS(vm, start, length, name)
+	TP_printk("vm=%p (%c), 0x%llx-0x%llx",
+		  __entry->vm, i915_is_ggtt(__entry->vm) ? 'G' : 'P',  __entry->start, __entry->end)
 );
 
 DECLARE_EVENT_CLASS(i915_px_entry,
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 080/190] drm/i915: Store owning file on the i915_address_space
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (77 preceding siblings ...)
  2016-01-11  9:17 ` [PATCH 079/190] drm/i915: Reduce the pointer dance of i915_is_ggtt() Chris Wilson
@ 2016-01-11  9:17 ` Chris Wilson
  2016-01-11  9:17 ` [PATCH 081/190] drm/i915: i915_vma_move_to_active prep patch Chris Wilson
                   ` (7 subsequent siblings)
  86 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:17 UTC (permalink / raw)
  To: intel-gfx

For the global GTT (and aliasing GTT), the address space is owned by the
device (it is a global resource) and so the per-file owner field is
NULL. For per-process GTT (where we create an address space per
context), each is owned by the opening file. We can use this ownership
information both to distinguish GGTT from ppGTT address spaces and to
occasionally inspect the owner.
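
A minimal sketch of the idea, using simplified stand-in types: the same
field answers both "is this the GGTT?" and "which file owns this?":

#include <stdbool.h>
#include <stdio.h>

struct file_private { int id; };

struct address_space {
	struct file_private *file; /* NULL => device-owned global GTT */
};

static bool is_ggtt(const struct address_space *vm)
{
	return vm->file == NULL;
}

int main(void)
{
	struct file_private fpriv = { .id = 42 };
	struct address_space ggtt  = { .file = NULL };
	struct address_space ppgtt = { .file = &fpriv };

	printf("ggtt? %d; ppgtt owner id: %d\n",
	       is_ggtt(&ggtt),
	       is_ggtt(&ppgtt) ? -1 : ppgtt.file->id);
	return 0;
}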

v2: Whitespace, tells us who owns i915_address_space

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c     |  2 +-
 drivers/gpu/drm/i915/i915_drv.h         |  1 -
 drivers/gpu/drm/i915/i915_gem_context.c |  3 ++-
 drivers/gpu/drm/i915/i915_gem_gtt.c     | 27 ++++++++++++++-------------
 drivers/gpu/drm/i915/i915_gem_gtt.h     | 21 ++++++++++++++-------
 5 files changed, 31 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 99a6181b012e..0d1f470567b0 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -352,7 +352,7 @@ static int per_file_stats(int id, void *ptr, void *data)
 				= container_of(vma->vm,
 					       struct i915_hw_ppgtt,
 					       base);
-			if (ppgtt->file_priv != stats->file_priv)
+			if (ppgtt->base.file != stats->file_priv)
 				continue;
 		}
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index f840cc55f1ab..0cc3ee589dfb 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2913,7 +2913,6 @@ i915_vm_to_ppgtt(struct i915_address_space *vm)
 	return container_of(vm, struct i915_hw_ppgtt, base);
 }
 
-
 static inline bool i915_gem_obj_ggtt_bound(struct drm_i915_gem_object *obj)
 {
 	return i915_gem_obj_ggtt_bound_view(obj, &i915_ggtt_view_normal);
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 05b4e0e85f24..fab702abd1cb 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -296,7 +296,8 @@ i915_gem_create_context(struct drm_device *dev,
 	}
 
 	if (USES_FULL_PPGTT(dev)) {
-		struct i915_hw_ppgtt *ppgtt = i915_ppgtt_create(dev, file_priv);
+		struct i915_hw_ppgtt *ppgtt =
+			i915_ppgtt_create(to_i915(dev), file_priv);
 
 		if (IS_ERR_OR_NULL(ppgtt)) {
 			DRM_DEBUG_DRIVER("PPGTT setup failed (%ld)\n",
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 06117bd0fc00..3a07ff622bd6 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2112,11 +2112,12 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	return 0;
 }
 
-static int __hw_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
+static int __hw_ppgtt_init(struct i915_hw_ppgtt *ppgtt,
+			   struct drm_i915_private *dev_priv)
 {
-	ppgtt->base.dev = dev;
+	ppgtt->base.dev = dev_priv->dev;
 
-	if (INTEL_INFO(dev)->gen < 8)
+	if (INTEL_INFO(dev_priv)->gen < 8)
 		return gen6_ppgtt_init(ppgtt);
 	else
 		return gen8_ppgtt_init(ppgtt);
@@ -2132,15 +2133,17 @@ static void i915_address_space_init(struct i915_address_space *vm,
 	list_add_tail(&vm->global_link, &dev_priv->vm_list);
 }
 
-int i915_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
+int i915_ppgtt_init(struct i915_hw_ppgtt *ppgtt,
+		    struct drm_i915_private *dev_priv,
+		    struct drm_i915_file_private *file_priv)
 {
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	int ret = 0;
+	int ret;
 
-	ret = __hw_ppgtt_init(dev, ppgtt);
+	ret = __hw_ppgtt_init(ppgtt, dev_priv);
 	if (ret == 0) {
 		kref_init(&ppgtt->ref);
 		i915_address_space_init(&ppgtt->base, dev_priv);
+		ppgtt->base.file = file_priv;
 	}
 
 	return ret;
@@ -2183,7 +2186,8 @@ int i915_ppgtt_init_ring(struct drm_i915_gem_request *req)
 }
 
 struct i915_hw_ppgtt *
-i915_ppgtt_create(struct drm_device *dev, struct drm_i915_file_private *fpriv)
+i915_ppgtt_create(struct drm_i915_private *dev_priv,
+		  struct drm_i915_file_private *fpriv)
 {
 	struct i915_hw_ppgtt *ppgtt;
 	int ret;
@@ -2192,14 +2196,12 @@ i915_ppgtt_create(struct drm_device *dev, struct drm_i915_file_private *fpriv)
 	if (!ppgtt)
 		return ERR_PTR(-ENOMEM);
 
-	ret = i915_ppgtt_init(dev, ppgtt);
+	ret = i915_ppgtt_init(ppgtt, dev_priv, fpriv);
 	if (ret) {
 		kfree(ppgtt);
 		return ERR_PTR(ret);
 	}
 
-	ppgtt->file_priv = fpriv;
-
 	trace_i915_ppgtt_create(&ppgtt->base);
 
 	return ppgtt;
@@ -2724,7 +2726,7 @@ int i915_global_gtt_setup(struct drm_device *dev)
 		if (!ppgtt)
 			return -ENOMEM;
 
-		ret = __hw_ppgtt_init(dev, ppgtt);
+		ret = __hw_ppgtt_init(ppgtt, dev_priv);
 		if (ret) {
 			ppgtt->base.cleanup(&ppgtt->base);
 			kfree(ppgtt);
@@ -3150,7 +3152,6 @@ int i915_gem_gtt_init(struct drm_device *dev)
 	}
 
 	gtt->base.dev = dev;
-	gtt->base.is_ggtt = true;
 
 	ret = gtt->gtt_probe(dev, &gtt->base.total, &gtt->stolen_size,
 			     &gtt->mappable_base, &gtt->mappable_end);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 633b9b2e1acb..9d3984602d34 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -273,12 +273,19 @@ struct i915_pml4 {
 struct i915_address_space {
 	struct drm_mm mm;
 	struct drm_device *dev;
+	/* Every address space belongs to a struct file - except for the global
+	 * GTT that is owned by the driver (and so @file is set to NULL). In
+	 * principle, no information should leak from one context to another
+	 * (or between files/processes etc) unless explicitly shared by the
+	 * owner. Tracking the owner is important in order to free up per-file
+	 * objects along with the file, to aide resource tracking, and to
+	 * assign blame.
+	 */
+	struct drm_i915_file_private *file;
 	struct list_head global_link;
 	u64 start;		/* Start offset always 0 for dri2 */
 	u64 total;		/* size addr space maps (ex. 2GB for ggtt) */
 
-	bool is_ggtt;
-
 	struct i915_page_scratch *scratch_page;
 	struct i915_page_table *scratch_pt;
 	struct i915_page_directory *scratch_pd;
@@ -334,7 +341,7 @@ struct i915_address_space {
 			u32 flags);
 };
 
-#define i915_is_ggtt(V) ((V)->is_ggtt)
+#define i915_is_ggtt(V) ((V)->file == NULL)
 
 /* The Graphics Translation Table is the way in which GEN hardware translates a
  * Graphics Virtual Address into a Physical Address. In addition to the normal
@@ -376,8 +383,6 @@ struct i915_hw_ppgtt {
 		struct i915_page_directory pd;		/* GEN6-7 */
 	};
 
-	struct drm_i915_file_private *file_priv;
-
 	gen6_pte_t __iomem *pd_addr;
 
 	int (*enable)(struct i915_hw_ppgtt *ppgtt);
@@ -523,11 +528,13 @@ int i915_global_gtt_setup(struct drm_device *dev);
 void i915_global_gtt_cleanup(struct drm_device *dev);
 
 
-int i915_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt);
+int i915_ppgtt_init(struct i915_hw_ppgtt *ppgtt,
+		    struct drm_i915_private *dev_priv,
+		    struct drm_i915_file_private *file_priv);
 int i915_ppgtt_init_hw(struct drm_device *dev);
 int i915_ppgtt_init_ring(struct drm_i915_gem_request *req);
 void i915_ppgtt_release(struct kref *kref);
-struct i915_hw_ppgtt *i915_ppgtt_create(struct drm_device *dev,
+struct i915_hw_ppgtt *i915_ppgtt_create(struct drm_i915_private *dev_priv,
 					struct drm_i915_file_private *fpriv);
 static inline void i915_ppgtt_get(struct i915_hw_ppgtt *ppgtt)
 {
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 081/190] drm/i915: i915_vma_move_to_active prep patch
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (78 preceding siblings ...)
  2016-01-11  9:17 ` [PATCH 080/190] drm/i915: Store owning file on the i915_address_space Chris Wilson
@ 2016-01-11  9:17 ` Chris Wilson
  2016-01-11  9:17 ` [PATCH 082/190] drm/i915: Count how many VMA are bound for an object Chris Wilson
                   ` (6 subsequent siblings)
  86 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:17 UTC (permalink / raw)
  To: intel-gfx

This patch is broken out of the next just to remove the code motion from
that patch and make it more readable. What we do here is move
i915_vma_move_to_active() to i915_gem_execbuffer.c and put the three
stages (read, write, fenced) together so that future modifications to
active handling are all located in the same spot. The importance of this
is that we can more simply control the order in which the requests are
placed on the retirement list (i.e. control the order in which we retire
and so control the lifetimes, avoiding having to hold onto references).
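
A condensed sketch of the resulting call shape; the flag names follow
the execbuffer interface, but the values and bodies here are
illustrative placeholders:

#include <stdio.h>

#define EXEC_OBJECT_NEEDS_FENCE (1 << 0) /* illustrative value */
#define EXEC_OBJECT_WRITE       (1 << 2) /* illustrative value */

static void move_to_active(const char *name, unsigned int flags)
{
	/* read tracking happens unconditionally */
	printf("%s: mark last_read\n", name);

	if (flags & EXEC_OBJECT_WRITE)       /* write stage */
		printf("%s: mark last_write, invalidate fb\n", name);

	if (flags & EXEC_OBJECT_NEEDS_FENCE) /* fence stage */
		printf("%s: mark last_fence\n", name);
}

int main(void)
{
	move_to_active("batch", 0);
	move_to_active("render target",
		       EXEC_OBJECT_WRITE | EXEC_OBJECT_NEEDS_FENCE);
	return 0;
}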

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h              |  3 +-
 drivers/gpu/drm/i915/i915_gem.c              | 15 -------
 drivers/gpu/drm/i915/i915_gem_context.c      |  7 ++--
 drivers/gpu/drm/i915/i915_gem_execbuffer.c   | 63 ++++++++++++++++++----------
 drivers/gpu/drm/i915/i915_gem_render_state.c |  2 +-
 5 files changed, 49 insertions(+), 41 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 0cc3ee589dfb..aa9d3782107e 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2764,7 +2764,8 @@ int __must_check i915_mutex_lock_interruptible(struct drm_device *dev);
 int i915_gem_object_sync(struct drm_i915_gem_object *obj,
 			 struct drm_i915_gem_request *to);
 void i915_vma_move_to_active(struct i915_vma *vma,
-			     struct drm_i915_gem_request *req);
+			     struct drm_i915_gem_request *req,
+			     unsigned flags);
 int i915_gem_dumb_create(struct drm_file *file_priv,
 			 struct drm_device *dev,
 			 struct drm_mode_create_dumb *args);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 9a22fdd8a9f5..164ebdaa0369 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2026,21 +2026,6 @@ void *i915_gem_object_pin_vmap(struct drm_i915_gem_object *obj)
 	return obj->vmapping;
 }
 
-void i915_vma_move_to_active(struct i915_vma *vma,
-			     struct drm_i915_gem_request *req)
-{
-	struct drm_i915_gem_object *obj = vma->obj;
-	struct intel_engine_cs *engine = req->engine;
-
-	/* Add a reference if we're newly entering the active list. */
-	if (obj->active == 0)
-		drm_gem_object_reference(&obj->base);
-	obj->active |= intel_engine_flag(engine);
-
-	i915_gem_request_mark_active(req, &obj->last_read[engine->id]);
-	list_move_tail(&vma->vm_link, &vma->vm->active_list);
-}
-
 static void
 i915_gem_object_retire__fence(struct i915_gem_active *active,
 			      struct drm_i915_gem_request *req)
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index fab702abd1cb..310a770b7984 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -778,8 +778,8 @@ static int do_switch(struct drm_i915_gem_request *req)
 	 * MI_SET_CONTEXT instead of when the next seqno has completed.
 	 */
 	if (from != NULL) {
-		from->legacy_hw_ctx.rcs_state->base.read_domains = I915_GEM_DOMAIN_INSTRUCTION;
-		i915_vma_move_to_active(i915_gem_obj_to_ggtt(from->legacy_hw_ctx.rcs_state), req);
+		struct drm_i915_gem_object *obj = from->legacy_hw_ctx.rcs_state;
+
 		/* As long as MI_SET_CONTEXT is serializing, ie. it flushes the
 		 * whole damn pipeline, we don't need to explicitly mark the
 		 * object dirty. The only exception is that the context must be
@@ -787,7 +787,8 @@ static int do_switch(struct drm_i915_gem_request *req)
 		 * able to defer doing this until we know the object would be
 		 * swapped, but there is no way to do that yet.
 		 */
-		from->legacy_hw_ctx.rcs_state->dirty = 1;
+		obj->base.read_domains = I915_GEM_DOMAIN_INSTRUCTION;
+		i915_vma_move_to_active(i915_gem_obj_to_ggtt(obj), req, 0);
 
 		/* obj is kept alive until the next request by its active ref */
 		i915_gem_object_ggtt_unpin(from->legacy_hw_ctx.rcs_state);
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index c10795f58bfc..9e549bded186 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1104,6 +1104,44 @@ i915_gem_validate_context(struct drm_device *dev, struct drm_file *file,
 	return ctx;
 }
 
+void i915_vma_move_to_active(struct i915_vma *vma,
+			     struct drm_i915_gem_request *req,
+			     unsigned flags)
+{
+	struct drm_i915_gem_object *obj = vma->obj;
+	const unsigned engine = req->engine->id;
+
+	GEM_BUG_ON(!drm_mm_node_allocated(&vma->node));
+
+	obj->dirty = 1; /* be paranoid  */
+
+	/* Add a reference if we're newly entering the active list. */
+	if (obj->active == 0)
+		drm_gem_object_reference(&obj->base);
+	obj->active |= 1 << engine;
+	i915_gem_request_mark_active(req, &obj->last_read[engine]);
+
+	if (flags & EXEC_OBJECT_WRITE) {
+		i915_gem_request_mark_active(req, &obj->last_write);
+
+		intel_fb_obj_invalidate(obj, ORIGIN_CS);
+
+		/* update for the implicit flush after a batch */
+		obj->base.write_domain &= ~I915_GEM_GPU_DOMAINS;
+	}
+
+	if (flags & EXEC_OBJECT_NEEDS_FENCE) {
+		i915_gem_request_mark_active(req, &obj->last_fence);
+		if (flags & __EXEC_OBJECT_HAS_FENCE) {
+			struct drm_i915_private *dev_priv = req->i915;
+			list_move_tail(&dev_priv->fence_regs[obj->fence_reg].lru_list,
+				       &dev_priv->mm.fence_list);
+		}
+	}
+
+	list_move_tail(&vma->vm_link, &vma->vm->active_list);
+}
+
 static void
 i915_gem_execbuffer_move_to_active(struct list_head *vmas,
 				   struct drm_i915_gem_request *req)
@@ -1111,35 +1149,18 @@ i915_gem_execbuffer_move_to_active(struct list_head *vmas,
 	struct i915_vma *vma;
 
 	list_for_each_entry(vma, vmas, exec_list) {
-		struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
 		struct drm_i915_gem_object *obj = vma->obj;
 		u32 old_read = obj->base.read_domains;
 		u32 old_write = obj->base.write_domain;
 
-		obj->dirty = 1; /* be paranoid  */
 		obj->base.write_domain = obj->base.pending_write_domain;
-		if (obj->base.write_domain == 0)
+		if (obj->base.write_domain)
+			vma->exec_entry->flags |= EXEC_OBJECT_WRITE;
+		else
 			obj->base.pending_read_domains |= obj->base.read_domains;
 		obj->base.read_domains = obj->base.pending_read_domains;
 
-		i915_vma_move_to_active(vma, req);
-		if (obj->base.write_domain) {
-			i915_gem_request_mark_active(req, &obj->last_write);
-
-			intel_fb_obj_invalidate(obj, ORIGIN_CS);
-
-			/* update for the implicit flush after a batch */
-			obj->base.write_domain &= ~I915_GEM_GPU_DOMAINS;
-		}
-		if (entry->flags & EXEC_OBJECT_NEEDS_FENCE) {
-			i915_gem_request_mark_active(req, &obj->last_fence);
-			if (entry->flags & __EXEC_OBJECT_HAS_FENCE) {
-				struct drm_i915_private *dev_priv = req->i915;
-				list_move_tail(&dev_priv->fence_regs[obj->fence_reg].lru_list,
-					       &dev_priv->mm.fence_list);
-			}
-		}
-
+		i915_vma_move_to_active(vma, req, vma->exec_entry->flags);
 		trace_i915_gem_object_change_domain(obj, old_read, old_write);
 	}
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.c b/drivers/gpu/drm/i915/i915_gem_render_state.c
index 222f25777bb4..68054f5c4ab1 100644
--- a/drivers/gpu/drm/i915/i915_gem_render_state.c
+++ b/drivers/gpu/drm/i915/i915_gem_render_state.c
@@ -230,7 +230,7 @@ int i915_gem_render_state_init(struct drm_i915_gem_request *req)
 			goto out;
 	}
 
-	i915_vma_move_to_active(i915_gem_obj_to_ggtt(so.obj), req);
+	i915_vma_move_to_active(i915_gem_obj_to_ggtt(so.obj), req, 0);
 out:
 	render_state_fini(&so);
 	return ret;
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 082/190] drm/i915: Count how many VMA are bound for an object
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (79 preceding siblings ...)
  2016-01-11  9:17 ` [PATCH 081/190] drm/i915: i915_vma_move_to_active prep patch Chris Wilson
@ 2016-01-11  9:17 ` Chris Wilson
  2016-01-11  9:17 ` [PATCH 083/190] drm/i915: Be more careful when unbinding vma Chris Wilson
                   ` (5 subsequent siblings)
  86 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:17 UTC (permalink / raw)
  To: intel-gfx

Since we may have VMAs allocated for an object but have interrupted
their binding, there is a disparity between having elements on the
obj->vma_list and being bound. i915_gem_obj_bound_any() does this check,
but it is not rigorously observed; add an explicit count to make it
easier.
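
A toy model of the invariant, using simplified stand-in structures: the
count rises on every successful bind, falls on every unbind, and the
object migrates to the unbound list exactly when it hits zero:

#include <assert.h>
#include <stdio.h>

struct object {
	unsigned int bind_count;
	const char *list; /* "bound" or "unbound" */
};

static void vma_bound(struct object *obj)
{
	obj->bind_count++;
	obj->list = "bound";
}

static void vma_unbound(struct object *obj)
{
	assert(obj->bind_count > 0); /* cf. GEM_BUG_ON(obj->bind_count == 0) */
	if (--obj->bind_count == 0)
		obj->list = "unbound"; /* only when the last vma goes away */
}

int main(void)
{
	struct object obj = { 0, "unbound" };

	vma_bound(&obj);   /* e.g. a GGTT view */
	vma_bound(&obj);   /* e.g. a ppGTT view */
	vma_unbound(&obj);
	printf("%u vma bound, on %s list\n", obj.bind_count, obj.list);
	vma_unbound(&obj);
	printf("%u vma bound, on %s list\n", obj.bind_count, obj.list);
	return 0;
}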

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c      | 12 +++++------
 drivers/gpu/drm/i915/i915_drv.h          |  3 ++-
 drivers/gpu/drm/i915/i915_gem.c          | 34 +++++++++++++-------------------
 drivers/gpu/drm/i915/i915_gem_shrinker.c | 17 +---------------
 drivers/gpu/drm/i915/i915_gem_stolen.c   |  1 +
 5 files changed, 23 insertions(+), 44 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 0d1f470567b0..e2b1242e369b 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -164,6 +164,9 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 	if (obj->fence_reg != I915_FENCE_REG_NONE)
 		seq_printf(m, " (fence: %d)", obj->fence_reg);
 	list_for_each_entry(vma, &obj->vma_list, obj_link) {
+		if (!drm_mm_node_allocated(&vma->node))
+			continue;
+
 		seq_printf(m, " (%sgtt offset: %08llx, size: %08llx",
 			   vma->is_ggtt ? "g" : "pp",
 			   vma->node.start, vma->node.size);
@@ -331,11 +334,11 @@ static int per_file_stats(int id, void *ptr, void *data)
 	struct drm_i915_gem_object *obj = ptr;
 	struct file_stats *stats = data;
 	struct i915_vma *vma;
-	int bound = 0;
 
 	stats->count++;
 	stats->total += obj->base.size;
-
+	if (!obj->bind_count)
+		stats->unbound += obj->base.size;
 	if (obj->base.name || obj->base.dma_buf)
 		stats->shared += obj->base.size;
 
@@ -343,8 +346,6 @@ static int per_file_stats(int id, void *ptr, void *data)
 		if (!drm_mm_node_allocated(&vma->node))
 			continue;
 
-		bound++;
-
 		if (vma->is_ggtt) {
 			stats->global += vma->node.size;
 		} else {
@@ -362,9 +363,6 @@ static int per_file_stats(int id, void *ptr, void *data)
 			stats->inactive += vma->node.size;
 	}
 
-	if (!bound)
-		stats->unbound += obj->base.size;
-
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index aa9d3782107e..8f5cf244094e 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2088,6 +2088,8 @@ struct drm_i915_gem_object {
 
 	unsigned int frontbuffer_bits:INTEL_FRONTBUFFER_BITS;
 
+	/** Count of VMA actually bound by this object */
+	unsigned int bind_count;
 	unsigned int pin_display;
 
 	struct sg_table *pages;
@@ -2874,7 +2876,6 @@ i915_gem_obj_ggtt_offset(struct drm_i915_gem_object *o)
 	return i915_gem_obj_ggtt_offset_view(o, &i915_ggtt_view_normal);
 }
 
-bool i915_gem_obj_bound_any(struct drm_i915_gem_object *o);
 bool i915_gem_obj_ggtt_bound_view(struct drm_i915_gem_object *o,
 				  const struct i915_ggtt_view *view);
 bool i915_gem_obj_bound(struct drm_i915_gem_object *o,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 164ebdaa0369..ed3f306af42f 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1812,7 +1812,7 @@ i915_gem_object_put_pages(struct drm_i915_gem_object *obj)
 	if (obj->pages_pin_count)
 		return -EBUSY;
 
-	BUG_ON(i915_gem_obj_bound_any(obj));
+	BUG_ON(obj->bind_count);
 
 	/* ->put_pages might need to allocate memory for the bit17 swizzle
 	 * array, hence protect them from being reaped by removing them from gtt
@@ -2558,7 +2558,6 @@ static void i915_gem_object_finish_gtt(struct drm_i915_gem_object *obj)
 static int __i915_vma_unbind(struct i915_vma *vma, bool wait)
 {
 	struct drm_i915_gem_object *obj = vma->obj;
-	struct drm_i915_private *dev_priv = obj->base.dev->dev_private;
 	int ret;
 
 	if (list_empty(&vma->obj_link))
@@ -2572,7 +2571,8 @@ static int __i915_vma_unbind(struct i915_vma *vma, bool wait)
 	if (vma->pin_count)
 		return -EBUSY;
 
-	BUG_ON(obj->pages == NULL);
+	GEM_BUG_ON(obj->bind_count == 0);
+	GEM_BUG_ON(obj->pages == NULL);
 
 	if (wait) {
 		ret = i915_gem_object_wait_rendering(obj, false);
@@ -2610,8 +2610,9 @@ static int __i915_vma_unbind(struct i915_vma *vma, bool wait)
 
 	/* Since the unbound list is global, only move to that list if
 	 * no more VMAs exist. */
-	if (list_empty(&obj->vma_list))
-		list_move_tail(&obj->global_list, &dev_priv->mm.unbound_list);
+	if (--obj->bind_count == 0)
+		list_move_tail(&obj->global_list,
+			       &to_i915(obj->base.dev)->mm.unbound_list);
 
 	/* And finally now the object is completely decoupled from this vma,
 	 * we can drop its hold on the backing storage and allow it to be
@@ -2849,6 +2850,7 @@ search_free:
 
 	list_move_tail(&obj->global_list, &dev_priv->mm.bound_list);
 	list_add_tail(&vma->vm_link, &vm->inactive_list);
+	obj->bind_count++;
 
 	return vma;
 
@@ -3037,7 +3039,6 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 {
 	struct drm_device *dev = obj->base.dev;
 	struct i915_vma *vma, *next;
-	bool bound = false;
 	int ret = 0;
 
 	if (obj->cache_level == cache_level)
@@ -3061,8 +3062,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 			ret = i915_vma_unbind(vma);
 			if (ret)
 				return ret;
-		} else
-			bound = true;
+		}
 	}
 
 	/* We can reuse the existing drm_mm nodes but need to change the
@@ -3072,7 +3072,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 	 * rewrite the PTE in the belief that doing so tramples upon less
 	 * state and so involves less work.
 	 */
-	if (bound) {
+	if (obj->bind_count) {
 		/* Before we change the PTE, the GPU must not be accessing it.
 		 * If we wait upon the object, we know that all the bound
 		 * VMA are no longer active.
@@ -3281,6 +3281,9 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
 					    old_read_domains,
 					    old_write_domain);
 
+	/* Increment the pages_pin_count to guard against the shrinker */
+	obj->pages_pin_count++;
+
 	return 0;
 
 err_unpin_display:
@@ -3297,6 +3300,7 @@ i915_gem_object_unpin_from_display_plane(struct drm_i915_gem_object *obj,
 
 	i915_gem_object_ggtt_unpin_view(obj, view);
 
+	obj->pages_pin_count--;
 	obj->pin_display--;
 }
 
@@ -3797,6 +3801,7 @@ void i915_gem_free_object(struct drm_gem_object *gem_obj)
 			dev_priv->mm.interruptible = was_interruptible;
 		}
 	}
+	GEM_BUG_ON(obj->bind_count);
 
 	/* Stolen objects don't hold a ref, but do hold pin count. Fix that up
 	 * before progressing. */
@@ -4485,17 +4490,6 @@ bool i915_gem_obj_ggtt_bound_view(struct drm_i915_gem_object *o,
 	return false;
 }
 
-bool i915_gem_obj_bound_any(struct drm_i915_gem_object *o)
-{
-	struct i915_vma *vma;
-
-	list_for_each_entry(vma, &o->vma_list, obj_link)
-		if (drm_mm_node_allocated(&vma->node))
-			return true;
-
-	return false;
-}
-
 unsigned long i915_gem_obj_size(struct drm_i915_gem_object *o,
 				struct i915_address_space *vm)
 {
diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
index 777959b47ccf..fa190ef3f727 100644
--- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
@@ -47,21 +47,6 @@ static bool mutex_is_locked_by(struct mutex *mutex, struct task_struct *task)
 #endif
 }
 
-static int num_vma_bound(struct drm_i915_gem_object *obj)
-{
-	struct i915_vma *vma;
-	int count = 0;
-
-	list_for_each_entry(vma, &obj->vma_list, obj_link) {
-		if (drm_mm_node_allocated(&vma->node))
-			count++;
-		if (vma->pin_count)
-			count++;
-	}
-
-	return count;
-}
-
 static bool swap_available(void)
 {
 	return get_nr_swap_pages() > 0;
@@ -77,7 +62,7 @@ static bool can_release_pages(struct drm_i915_gem_object *obj)
 	 * to the GPU, simply unbinding from the GPU is not going to succeed
 	 * in releasing our pin count on the pages themselves.
 	 */
-	if (obj->pages_pin_count != num_vma_bound(obj))
+	if (obj->pages_pin_count != obj->bind_count)
 		return false;
 
 	/* We can only return physical pages to the system if we can either
diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_gem_stolen.c
index 463be259a505..1c81a1470baf 100644
--- a/drivers/gpu/drm/i915/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
@@ -693,6 +693,7 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_device *dev,
 	vma->bound |= GLOBAL_BIND;
 	__i915_vma_set_map_and_fenceable(vma);
 	list_add_tail(&vma->vm_link, &ggtt->inactive_list);
+	obj->bind_count++;
 
 	list_add_tail(&obj->global_list, &dev_priv->mm.bound_list);
 	i915_gem_object_pin_pages(obj);
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 083/190] drm/i915: Be more careful when unbinding vma
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (80 preceding siblings ...)
  2016-01-11  9:17 ` [PATCH 082/190] drm/i915: Count how many VMA are bound for an object Chris Wilson
@ 2016-01-11  9:17 ` Chris Wilson
  2016-01-11  9:17 ` [PATCH 084/190] drm/i915: Track active vma requests Chris Wilson
                   ` (4 subsequent siblings)
  86 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:17 UTC (permalink / raw)
  To: intel-gfx

When we call i915_vma_unbind(), we will wait upon outstanding rendering.
This will also trigger a retirement phase, which may update the object
lists. If we extend request tracking to the VMA itself (rather than
keeping it at the encompassing object), then there is the potential for
obj->vma_list to be modified for other elements upon i915_vma_unbind().
As a result, if we walk over the object list and call i915_vma_unbind(),
we need to be prepared for that list to change.
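
The defensive pattern, as a standalone sketch (struct node and the
drain_one callback are illustrative stand-ins, not driver API):

#include <linux/list.h>

struct node { struct list_head link; };

/* Drain a list whose membership may change underneath the callback:
 * park each node on a private list first so that a re-entrant walk
 * triggered by drain_one() never revisits it, then splice the
 * survivors back.
 */
static int drain_list(struct list_head *all,
		      int (*drain_one)(struct node *))
{
	LIST_HEAD(still_in_list);
	int ret = 0;

	while (!list_empty(all)) {
		struct node *n =
			list_first_entry(all, struct node, link);

		list_move_tail(&n->link, &still_in_list);
		ret = drain_one(n);
		if (ret)
			break;
	}
	list_splice(&still_in_list, all);

	return ret;
}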

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h          |  2 ++
 drivers/gpu/drm/i915/i915_gem.c          | 54 ++++++++++++++++++++++++--------
 drivers/gpu/drm/i915/i915_gem_shrinker.c |  6 +---
 drivers/gpu/drm/i915/i915_gem_userptr.c  |  4 +--
 4 files changed, 45 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 8f5cf244094e..9fa925389332 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2707,6 +2707,8 @@ int __must_check i915_vma_unbind(struct i915_vma *vma);
  * _guarantee_ VMA in question is _not in use_ anywhere.
  */
 int __must_check __i915_vma_unbind_no_wait(struct i915_vma *vma);
+
+int i915_gem_object_unbind(struct drm_i915_gem_object *obj);
 int i915_gem_object_put_pages(struct drm_i915_gem_object *obj);
 void i915_gem_release_all_mmaps(struct drm_i915_private *dev_priv);
 void i915_gem_release_mmap(struct drm_i915_gem_object *obj);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index ed3f306af42f..95e69dc47fc8 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -254,18 +254,38 @@ static const struct drm_i915_gem_object_ops i915_gem_phys_ops = {
 	.release = i915_gem_object_release_phys,
 };
 
+int
+i915_gem_object_unbind(struct drm_i915_gem_object *obj)
+{
+	struct list_head still_in_list;
+	int ret = 0;
+
+	INIT_LIST_HEAD(&still_in_list);
+	while (!list_empty(&obj->vma_list)) {
+		struct i915_vma *vma =
+			list_first_entry(&obj->vma_list,
+					 struct i915_vma,
+					 obj_link);
+
+		list_move_tail(&vma->obj_link, &still_in_list);
+		ret = i915_vma_unbind(vma);
+		if (ret)
+			break;
+	}
+	list_splice(&still_in_list, &obj->vma_list);
+
+	return ret;
+}
+
 static int
 drop_pages(struct drm_i915_gem_object *obj)
 {
-	struct i915_vma *vma, *next;
 	int ret;
 
 	drm_gem_object_reference(&obj->base);
-	list_for_each_entry_safe(vma, next, &obj->vma_list, obj_link)
-		if (i915_vma_unbind(vma))
-			break;
-
-	ret = i915_gem_object_put_pages(obj);
+	ret = i915_gem_object_unbind(obj);
+	if (ret == 0)
+		ret = i915_gem_object_put_pages(obj);
 	drm_gem_object_unreference(&obj->base);
 
 	return ret;
@@ -3038,7 +3058,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 				    enum i915_cache_level cache_level)
 {
 	struct drm_device *dev = obj->base.dev;
-	struct i915_vma *vma, *next;
+	struct i915_vma *vma;
 	int ret = 0;
 
 	if (obj->cache_level == cache_level)
@@ -3049,7 +3069,8 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 	 * catch the issue of the CS prefetch crossing page boundaries and
 	 * reading an invalid PTE on older architectures.
 	 */
-	list_for_each_entry_safe(vma, next, &obj->vma_list, obj_link) {
+restart:
+	list_for_each_entry(vma, &obj->vma_list, obj_link) {
 		if (!drm_mm_node_allocated(&vma->node))
 			continue;
 
@@ -3058,11 +3079,18 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 			return -EBUSY;
 		}
 
-		if (!i915_gem_valid_gtt_space(vma, cache_level)) {
-			ret = i915_vma_unbind(vma);
-			if (ret)
-				return ret;
-		}
+		if (i915_gem_valid_gtt_space(vma, cache_level))
+			continue;
+
+		ret = i915_vma_unbind(vma);
+		if (ret)
+			return ret;
+
+		/* As unbinding may affect other elements in the
+		 * obj->vma_list (due to side-effects from retiring
+		 * an active vma), play safe and restart the iterator.
+		 */
+		goto restart;
 	}
 
 	/* We can reuse the existing drm_mm nodes but need to change the
diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
index fa190ef3f727..e15fc7531f08 100644
--- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
@@ -141,7 +141,6 @@ i915_gem_shrink(struct drm_i915_private *dev_priv,
 		INIT_LIST_HEAD(&still_in_list);
 		while (count < target && !list_empty(phase->list)) {
 			struct drm_i915_gem_object *obj;
-			struct i915_vma *vma, *v;
 
 			obj = list_first_entry(phase->list,
 					       typeof(*obj), global_list);
@@ -160,10 +159,7 @@ i915_gem_shrink(struct drm_i915_private *dev_priv,
 			drm_gem_object_reference(&obj->base);
 
 			/* For the unbound phase, this should be a no-op! */
-			list_for_each_entry_safe(vma, v,
-						 &obj->vma_list, obj_link)
-				if (i915_vma_unbind(vma))
-					break;
+			i915_gem_object_unbind(obj);
 
 			if (i915_gem_object_put_pages(obj) == 0)
 				count += obj->base.size >> PAGE_SHIFT;
diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
index 2f3638d02bdd..a90392246471 100644
--- a/drivers/gpu/drm/i915/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
@@ -75,14 +75,12 @@ static void __cancel_userptr__worker(struct work_struct *work)
 
 	if (obj->pages != NULL) {
 		struct drm_i915_private *dev_priv = to_i915(dev);
-		struct i915_vma *vma, *tmp;
 		bool was_interruptible;
 
 		was_interruptible = dev_priv->mm.interruptible;
 		dev_priv->mm.interruptible = false;
 
-		list_for_each_entry_safe(vma, tmp, &obj->vma_list, obj_link)
-			WARN_ON(i915_vma_unbind(vma));
+		WARN_ON(i915_gem_object_unbind(obj));
 		WARN_ON(i915_gem_object_put_pages(obj));
 
 		dev_priv->mm.interruptible = was_interruptible;
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 084/190] drm/i915: Track active vma requests
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (81 preceding siblings ...)
  2016-01-11  9:17 ` [PATCH 083/190] drm/i915: Be more careful when unbinding vma Chris Wilson
@ 2016-01-11  9:17 ` Chris Wilson
  2016-01-11  9:17 ` [PATCH 085/190] drm/i915: Release vma when the handle is closed Chris Wilson
                   ` (3 subsequent siblings)
  86 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:17 UTC (permalink / raw)
  To: intel-gfx

Hook the vma itself into i915_gem_request_retire() so that we can
accurately track when a solitary vma is inactive (as opposed to having
to wait for the entire object to be idle). This improves the interaction
when using multiple contexts (with full-ppgtt) and eliminates some
frequent list walking when retiring objects after a completed request.

A side-effect is that we get an active vma reference for free. The
consequence of this is shown in the next patch...
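
The tracking scheme reduces to a small bitmask idiom; a sketch under
the assumption of a fixed engine count (all names here are
illustrative, not the driver's types):

#define NUM_ENGINES 5

struct vma_track {
	unsigned int active;	/* one bit per engine */
};

/* Set the engine's bit when a request using the vma is queued. */
static void mark_active(struct vma_track *v, unsigned int engine)
{
	v->active |= 1u << engine;
}

/* Clear the bit on retirement; only when the last bit drops is the
 * vma idle and eligible for the inactive list.
 */
static bool mark_retired(struct vma_track *v, unsigned int engine)
{
	v->active &= ~(1u << engine);
	return v->active == 0;
}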

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c        |  2 +-
 drivers/gpu/drm/i915/i915_gem.c            | 15 +++++----------
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 10 +++++++++-
 drivers/gpu/drm/i915/i915_gem_gtt.c        | 21 +++++++++++++++++++++
 drivers/gpu/drm/i915/i915_gem_gtt.h        |  5 +++++
 5 files changed, 41 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index e2b1242e369b..378bc73296aa 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -357,7 +357,7 @@ static int per_file_stats(int id, void *ptr, void *data)
 				continue;
 		}
 
-		if (obj->active) /* XXX per-vma statistic */
+		if (vma->active)
 			stats->active += vma->node.size;
 		else
 			stats->inactive += vma->node.size;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 95e69dc47fc8..7e4f7f2d18e4 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2070,7 +2070,6 @@ i915_gem_object_retire__read(struct i915_gem_active *active,
 	int ring = request->engine->id;
 	struct drm_i915_gem_object *obj =
 		container_of(active, struct drm_i915_gem_object, last_read[ring]);
-	struct i915_vma *vma;
 
 	GEM_BUG_ON((obj->active & (1 << ring)) == 0);
 
@@ -2082,12 +2081,9 @@ i915_gem_object_retire__read(struct i915_gem_active *active,
 	 * so that we don't steal from recently used but inactive objects
 	 * (unless we are forced to ofc!)
 	 */
-	list_move_tail(&obj->global_list, &request->i915->mm.bound_list);
-
-	list_for_each_entry(vma, &obj->vma_list, obj_link) {
-		if (!list_empty(&vma->vm_link))
-			list_move_tail(&vma->vm_link, &vma->vm->inactive_list);
-	}
+	if (obj->bind_count)
+		list_move_tail(&obj->global_list,
+			       &request->i915->mm.bound_list);
 
 	drm_gem_object_unreference(&obj->base);
 }
@@ -3034,9 +3030,8 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
 
 	/* And bump the LRU for this access */
 	vma = i915_gem_obj_to_ggtt(obj);
-	if (vma && drm_mm_node_allocated(&vma->node) && !obj->active)
-		list_move_tail(&vma->vm_link,
-			       &to_i915(obj->base.dev)->gtt.base.inactive_list);
+	if (vma && drm_mm_node_allocated(&vma->node) && !vma->active)
+		list_move_tail(&vma->vm_link, &vma->vm->inactive_list);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 9e549bded186..19d32f22f85d 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1115,7 +1115,13 @@ void i915_vma_move_to_active(struct i915_vma *vma,
 
 	obj->dirty = 1; /* be paranoid  */
 
-	/* Add a reference if we're newly entering the active list. */
+	/* Add a reference if we're newly entering the active list.
+	 * The order in which we add operations to the retirement queue is
+	 * vital here: mark_active adds to the start of the callback list,
+	 * such that subsequent callbacks are called first. Therefore we
+	 * add the active reference first and queue for it to be dropped
+	 * *last*.
+	 */
 	if (obj->active == 0)
 		drm_gem_object_reference(&obj->base);
 	obj->active |= 1 << engine;
@@ -1139,6 +1145,8 @@ void i915_vma_move_to_active(struct i915_vma *vma,
 		}
 	}
 
+	vma->active |= 1 << engine;
+	i915_gem_request_mark_active(req, &vma->last_read[engine]);
 	list_move_tail(&vma->vm_link, &vma->vm->active_list);
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 3a07ff622bd6..fd42b6491d28 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -3241,12 +3241,31 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev)
 	i915_ggtt_flush(dev_priv);
 }
 
+static void
+i915_vma_retire(struct i915_gem_active *active,
+		struct drm_i915_gem_request *rq)
+{
+	const unsigned engine = rq->engine->id;
+	struct i915_vma *vma =
+		container_of(active, struct i915_vma, last_read[engine]);
+
+	GEM_BUG_ON((vma->active & (1 << engine)) == 0);
+	GEM_BUG_ON((vma->obj->active & vma->active) != vma->active);
+
+	vma->active &= ~(1 << engine);
+	if (vma->active)
+		return;
+
+	list_move_tail(&vma->vm_link, &vma->vm->inactive_list);
+}
+
 static struct i915_vma *
 __i915_gem_vma_create(struct drm_i915_gem_object *obj,
 		      struct i915_address_space *vm,
 		      const struct i915_ggtt_view *ggtt_view)
 {
 	struct i915_vma *vma;
+	int i;
 
 	if (WARN_ON(i915_is_ggtt(vm) != !!ggtt_view))
 		return ERR_PTR(-EINVAL);
@@ -3258,6 +3277,8 @@ __i915_gem_vma_create(struct drm_i915_gem_object *obj,
 	INIT_LIST_HEAD(&vma->vm_link);
 	INIT_LIST_HEAD(&vma->obj_link);
 	INIT_LIST_HEAD(&vma->exec_list);
+	for (i = 0; i < ARRAY_SIZE(vma->last_read); i++)
+		init_request_active(&vma->last_read[i], i915_vma_retire);
 	vma->vm = vm;
 	vma->obj = obj;
 	vma->is_ggtt = i915_is_ggtt(vm);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 9d3984602d34..0a7867fa5a1f 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -34,6 +34,8 @@
 #ifndef __I915_GEM_GTT_H__
 #define __I915_GEM_GTT_H__
 
+#include "i915_gem_request.h"
+
 struct drm_i915_file_private;
 
 typedef uint32_t gen6_pte_t;
@@ -180,10 +182,13 @@ struct i915_vma {
 	struct drm_i915_gem_object *obj;
 	struct i915_address_space *vm;
 
+	struct i915_gem_active last_read[I915_NUM_RINGS];
+
 	/** Flags and address space this VMA is bound to */
 #define GLOBAL_BIND	(1<<0)
 #define LOCAL_BIND	(1<<1)
 	unsigned int bound : 4;
+	unsigned int active : I915_NUM_RINGS;
 	bool is_ggtt : 1;
 
 	/**
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 085/190] drm/i915: Release vma when the handle is closed
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (82 preceding siblings ...)
  2016-01-11  9:17 ` [PATCH 084/190] drm/i915: Track active vma requests Chris Wilson
@ 2016-01-11  9:17 ` Chris Wilson
  2016-01-11  9:17 ` [PATCH 086/190] drm/i915: Mark the context and address space as closed Chris Wilson
                   ` (2 subsequent siblings)
  86 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:17 UTC (permalink / raw)
  To: intel-gfx

In order to prevent a leak of the vma on shared objects, we need to
hook into the object_close callback to destroy the vma on the object for
this file. However, if we destroyed that vma immediately, we might cause
unexpected application stalls as we try to unbind a busy vma - hence we
defer the unbind until we retire the vma.

v2: Keep vma allocated until closed. This is useful for a later
optimisation, but it is required now in order to handle potential
recursion of i915_vma_unbind() by retiring itself.
v3: Comments are important.
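
The recursion mentioned in v2 is handled by briefly masking the closed
flag while waiting; a sketch of that guard (wait_for_vma_idle() is a
placeholder for the per-engine request waits):

static int unbind_sketch(struct i915_vma *vma)
{
	bool was_closed = vma->closed;
	int ret;

	/* Retiring a closed vma unbinds it, and waiting can retire
	 * requests - so hide the closed flag across the wait and
	 * perform the final destroy ourselves afterwards.
	 */
	vma->closed = false;
	ret = wait_for_vma_idle(vma);
	vma->closed = was_closed;

	return ret;
}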

Testcase: igt/gem_ppggtt/flink-and-close-vma-leak
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c     |   1 +
 drivers/gpu/drm/i915/i915_drv.h     |   2 +-
 drivers/gpu/drm/i915/i915_gem.c     | 126 ++++++++++++++++++++++--------------
 drivers/gpu/drm/i915/i915_gem_gtt.c |   2 +
 drivers/gpu/drm/i915/i915_gem_gtt.h |   1 +
 5 files changed, 84 insertions(+), 48 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index cc831a34f7bb..2a0882647c23 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1664,6 +1664,7 @@ static struct drm_driver driver = {
 	.debugfs_init = i915_debugfs_init,
 	.debugfs_cleanup = i915_debugfs_cleanup,
 #endif
+	.gem_close_object = i915_gem_close_object,
 	.gem_free_object = i915_gem_free_object,
 	.gem_vm_ops = &i915_gem_vm_ops,
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 9fa925389332..262d1b247344 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2673,8 +2673,8 @@ struct drm_i915_gem_object *i915_gem_alloc_object(struct drm_device *dev,
 						  size_t size);
 struct drm_i915_gem_object *i915_gem_object_create_from_data(
 		struct drm_device *dev, const void *data, size_t size);
+void i915_gem_close_object(struct drm_gem_object *gem, struct drm_file *file);
 void i915_gem_free_object(struct drm_gem_object *obj);
-void i915_gem_vma_destroy(struct i915_vma *vma);
 
 /* Flags used by pin/bind&friends. */
 #define PIN_MAPPABLE	(1<<0)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 7e4f7f2d18e4..1f95cf39b7d2 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2385,6 +2385,30 @@ i915_gem_object_flush_active(struct drm_i915_gem_object *obj)
 	}
 }
 
+static void i915_vma_close(struct i915_vma *vma)
+{
+	GEM_BUG_ON(vma->closed);
+	vma->closed = true;
+
+	list_del_init(&vma->obj_link);
+	if (!vma->active)
+		WARN_ON(i915_vma_unbind(vma));
+}
+
+void i915_gem_close_object(struct drm_gem_object *gem,
+			   struct drm_file *file)
+{
+	struct drm_i915_gem_object *obj = to_intel_bo(gem);
+	struct drm_i915_file_private *fpriv = file->driver_priv;
+	struct i915_vma *vma, *vn;
+
+	mutex_lock(&obj->base.dev->struct_mutex);
+	list_for_each_entry_safe(vma, vn, &obj->vma_list, obj_link)
+		if (vma->vm->file == fpriv)
+			i915_vma_close(vma);
+	mutex_unlock(&obj->base.dev->struct_mutex);
+}
+
 /**
  * i915_gem_wait_ioctl - implements DRM_IOCTL_I915_GEM_WAIT
  * @DRM_IOCTL_ARGS: standard ioctl arguments
@@ -2571,31 +2595,56 @@ static void i915_gem_object_finish_gtt(struct drm_i915_gem_object *obj)
 					    old_write_domain);
 }
 
+static void i915_vma_destroy(struct i915_vma *vma)
+{
+	GEM_BUG_ON(vma->node.allocated);
+	GEM_BUG_ON(vma->active);
+	GEM_BUG_ON(!vma->closed);
+
+	list_del(&vma->vm_link);
+	if (!vma->is_ggtt)
+		i915_ppgtt_put(i915_vm_to_ppgtt(vma->vm));
+
+	kmem_cache_free(to_i915(vma->obj->base.dev)->vmas, vma);
+}
+
 static int __i915_vma_unbind(struct i915_vma *vma, bool wait)
 {
 	struct drm_i915_gem_object *obj = vma->obj;
-	int ret;
+	int ret, i;
 
-	if (list_empty(&vma->obj_link))
-		return 0;
+	/* First wait upon any activity as retiring the request may
+	 * have side-effects such as unpinning or even unbinding this vma.
+	 */
+	if (vma->active && wait) {
+		bool was_closed;
 
-	if (!drm_mm_node_allocated(&vma->node)) {
-		i915_gem_vma_destroy(vma);
-		return 0;
+		/* When a closed VMA is retired, it is unbound - eek. */
+		was_closed = vma->closed;
+		vma->closed = false;
+
+		for (i = 0; i < ARRAY_SIZE(vma->last_read); i++) {
+			ret = i915_wait_request(vma->last_read[i].request);
+			if (ret)
+				break;
+		}
+
+		vma->closed = was_closed;
+		if (ret)
+			return ret;
+
+		GEM_BUG_ON(vma->active);
 	}
 
 	if (vma->pin_count)
 		return -EBUSY;
 
+	if (!drm_mm_node_allocated(&vma->node))
+		goto destroy;
+
 	GEM_BUG_ON(obj->bind_count == 0);
 	GEM_BUG_ON(obj->pages == NULL);
 
-	if (wait) {
-		ret = i915_gem_object_wait_rendering(obj, false);
-		if (ret)
-			return ret;
-	}
-
 	if (vma->is_ggtt && vma->ggtt_view.type == I915_GGTT_VIEW_NORMAL) {
 		i915_gem_object_finish_gtt(obj);
 
@@ -2622,7 +2671,6 @@ static int __i915_vma_unbind(struct i915_vma *vma, bool wait)
 	}
 
 	drm_mm_remove_node(&vma->node);
-	i915_gem_vma_destroy(vma);
 
 	/* Since the unbound list is global, only move to that list if
 	 * no more VMAs exist. */
@@ -2636,6 +2684,10 @@ static int __i915_vma_unbind(struct i915_vma *vma, bool wait)
 	 */
 	i915_gem_object_unpin_pages(obj);
 
+destroy:
+	if (unlikely(vma->closed))
+		i915_vma_destroy(vma);
+
 	return 0;
 }
 
@@ -2814,7 +2866,7 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
 
 		if (offset & (alignment - 1) || offset + size > end) {
 			ret = -EINVAL;
-			goto err_free_vma;
+			goto err_vma;
 		}
 		vma->node.start = offset;
 		vma->node.size = size;
@@ -2826,7 +2878,7 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
 				ret = drm_mm_reserve_node(&vm->mm, &vma->node);
 		}
 		if (ret)
-			goto err_free_vma;
+			goto err_vma;
 	} else {
 		if (flags & PIN_HIGH) {
 			search_flag = DRM_MM_SEARCH_BELOW;
@@ -2851,7 +2903,7 @@ search_free:
 			if (ret == 0)
 				goto search_free;
 
-			goto err_free_vma;
+			goto err_vma;
 		}
 	}
 	if (WARN_ON(!i915_gem_valid_gtt_space(vma, obj->cache_level))) {
@@ -2872,8 +2924,7 @@ search_free:
 
 err_remove_node:
 	drm_mm_remove_node(&vma->node);
-err_free_vma:
-	i915_gem_vma_destroy(vma);
+err_vma:
 	vma = ERR_PTR(ret);
 err_unpin:
 	i915_gem_object_unpin_pages(obj);
@@ -3808,21 +3859,18 @@ void i915_gem_free_object(struct drm_gem_object *gem_obj)
 
 	trace_i915_gem_object_destroy(obj);
 
+	/* All file-owned VMA should have been released by this point through
+	 * i915_gem_close_object(), or earlier by i915_gem_context_close().
+	 * However, the object may also be bound into the global GTT (e.g.
+	 * older GPUs without per-process support, or for direct access through
+	 * the GTT either for the user or for scanout). Those VMA still need
+	 * to be unbound now.
+	 */
 	list_for_each_entry_safe(vma, next, &obj->vma_list, obj_link) {
-		int ret;
-
+		GEM_BUG_ON(!i915_is_ggtt(vma->vm));
+		GEM_BUG_ON(vma->active);
 		vma->pin_count = 0;
-		ret = i915_vma_unbind(vma);
-		if (WARN_ON(ret == -ERESTARTSYS)) {
-			bool was_interruptible;
-
-			was_interruptible = dev_priv->mm.interruptible;
-			dev_priv->mm.interruptible = false;
-
-			WARN_ON(i915_vma_unbind(vma));
-
-			dev_priv->mm.interruptible = was_interruptible;
-		}
+		i915_vma_close(vma);
 	}
 	GEM_BUG_ON(obj->bind_count);
 
@@ -3890,22 +3938,6 @@ struct i915_vma *i915_gem_obj_to_ggtt_view(struct drm_i915_gem_object *obj,
 	return NULL;
 }
 
-void i915_gem_vma_destroy(struct i915_vma *vma)
-{
-	WARN_ON(vma->node.allocated);
-
-	/* Keep the vma as a placeholder in the execbuffer reservation lists */
-	if (!list_empty(&vma->exec_list))
-		return;
-
-	if (!vma->is_ggtt)
-		i915_ppgtt_put(i915_vm_to_ppgtt(vma->vm));
-
-	list_del(&vma->obj_link);
-
-	kmem_cache_free(to_i915(vma->obj->base.dev)->vmas, vma);
-}
-
 static void
 i915_gem_stop_ringbuffers(struct drm_device *dev)
 {
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index fd42b6491d28..ef093db6b8a6 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -3257,6 +3257,8 @@ i915_vma_retire(struct i915_gem_active *active,
 		return;
 
 	list_move_tail(&vma->vm_link, &vma->vm->inactive_list);
+	if (unlikely(vma->closed))
+		WARN_ON(i915_vma_unbind(vma));
 }
 
 static struct i915_vma *
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 0a7867fa5a1f..d68d5fd02923 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -190,6 +190,7 @@ struct i915_vma {
 	unsigned int bound : 4;
 	unsigned int active : I915_NUM_RINGS;
 	bool is_ggtt : 1;
+	bool closed : 1;
 
 	/**
 	 * Support different GGTT views into the same object.
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 086/190] drm/i915: Mark the context and address space as closed
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (83 preceding siblings ...)
  2016-01-11  9:17 ` [PATCH 085/190] drm/i915: Release vma when the handle is closed Chris Wilson
@ 2016-01-11  9:17 ` Chris Wilson
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
  2016-01-11 11:00 ` [PATCH 142/190] drm/i915: Fallback to using unmappable memory for scanout Chris Wilson
  86 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11  9:17 UTC (permalink / raw)
  To: intel-gfx

When the user closes the context, mark it and the dependent address space
as closed. As we use an asynchronous destruct method, this has two purposes.
First, it allows us to flag the closed context and detect internal errors if
we try to create any new objects for it (as it is removed from the user's
namespace, these should be internal bugs only). And second, it allows
us to immediately reap stale vma.
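
The vma lists are swept with a NULL-terminated array of list heads, the
same shape i915_ppgtt_close() uses below; as a sketch (close_one() is a
placeholder for the per-vma close):

static void close_all_vma(struct i915_address_space *vm,
			  void (*close_one)(struct i915_vma *))
{
	struct list_head *phases[] = {
		&vm->active_list,
		&vm->inactive_list,
		&vm->unbound_list,
		NULL,
	}, **phase;

	/* One loop body covers every phase; the _safe variant is
	 * required as close_one() may unlink the vma from its list.
	 */
	for (phase = phases; *phase; phase++) {
		struct i915_vma *vma, *vn;

		list_for_each_entry_safe(vma, vn, *phase, vm_link)
			close_one(vma);
	}
}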

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h         |  3 +++
 drivers/gpu/drm/i915/i915_gem.c         | 17 +++++++-------
 drivers/gpu/drm/i915/i915_gem_context.c | 40 +++++++++++++++++++++++++++++----
 drivers/gpu/drm/i915/i915_gem_gtt.c     |  9 ++++++--
 drivers/gpu/drm/i915/i915_gem_gtt.h     |  9 ++++++++
 drivers/gpu/drm/i915/i915_gem_stolen.c  |  2 +-
 6 files changed, 65 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 262d1b247344..fc35a9b8d910 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -888,6 +888,8 @@ struct intel_context {
 	} engine[I915_NUM_RINGS];
 
 	struct list_head link;
+
+	bool closed:1;
 };
 
 enum fb_op_origin {
@@ -2707,6 +2709,7 @@ int __must_check i915_vma_unbind(struct i915_vma *vma);
  * _guarantee_ VMA in question is _not in use_ anywhere.
  */
 int __must_check __i915_vma_unbind_no_wait(struct i915_vma *vma);
+void i915_vma_close(struct i915_vma *vma);
 
 int i915_gem_object_unbind(struct drm_i915_gem_object *obj);
 int i915_gem_object_put_pages(struct drm_i915_gem_object *obj);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 1f95cf39b7d2..16ee3bd7010e 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2385,7 +2385,7 @@ i915_gem_object_flush_active(struct drm_i915_gem_object *obj)
 	}
 }
 
-static void i915_vma_close(struct i915_vma *vma)
+void i915_vma_close(struct i915_vma *vma)
 {
 	GEM_BUG_ON(vma->closed);
 	vma->closed = true;
@@ -2654,12 +2654,15 @@ static int __i915_vma_unbind(struct i915_vma *vma, bool wait)
 			return ret;
 	}
 
-	trace_i915_vma_unbind(vma);
-
-	vma->vm->unbind_vma(vma);
+	if (likely(!vma->vm->closed)) {
+		trace_i915_vma_unbind(vma);
+		vma->vm->unbind_vma(vma);
+	}
 	vma->bound = 0;
 
-	list_del_init(&vma->vm_link);
+	drm_mm_remove_node(&vma->node);
+	list_move_tail(&vma->vm_link, &vma->vm->unbound_list);
+
 	if (vma->is_ggtt) {
 		if (vma->ggtt_view.type == I915_GGTT_VIEW_NORMAL) {
 			obj->map_and_fenceable = false;
@@ -2670,8 +2673,6 @@ static int __i915_vma_unbind(struct i915_vma *vma, bool wait)
 		vma->ggtt_view.pages = NULL;
 	}
 
-	drm_mm_remove_node(&vma->node);
-
 	/* Since the unbound list is global, only move to that list if
 	 * no more VMAs exist. */
 	if (--obj->bind_count == 0)
@@ -2917,7 +2918,7 @@ search_free:
 		goto err_remove_node;
 
 	list_move_tail(&obj->global_list, &dev_priv->mm.bound_list);
-	list_add_tail(&vma->vm_link, &vm->inactive_list);
+	list_move_tail(&vma->vm_link, &vm->inactive_list);
 	obj->bind_count++;
 
 	return vma;
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 310a770b7984..4583d8fe3585 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -153,6 +153,7 @@ void i915_gem_context_free(struct kref *ctx_ref)
 	struct intel_context *ctx = container_of(ctx_ref, typeof(*ctx), ref);
 
 	trace_i915_context_free(ctx);
+	GEM_BUG_ON(!ctx->closed);
 
 	if (i915.enable_execlists)
 		intel_lr_context_free(ctx);
@@ -209,6 +210,37 @@ i915_gem_alloc_context_obj(struct drm_device *dev, size_t size)
 	return obj;
 }
 
+static void i915_ppgtt_close(struct i915_address_space *vm)
+{
+	struct list_head *phases[] = {
+		&vm->active_list,
+		&vm->inactive_list,
+		&vm->unbound_list,
+		NULL,
+	}, **phase;
+
+	GEM_BUG_ON(i915_is_ggtt(vm));
+	GEM_BUG_ON(vm->closed);
+	vm->closed = true;
+
+	for (phase = phases; *phase; phase++) {
+		struct i915_vma *vma, *vn;
+
+		list_for_each_entry_safe(vma, vn, *phase, vm_link)
+			if (!vma->closed)
+				i915_vma_close(vma);
+	}
+}
+
+static void context_close(struct intel_context *ctx)
+{
+	GEM_BUG_ON(ctx->closed);
+	ctx->closed = true;
+	if (ctx->ppgtt)
+		i915_ppgtt_close(&ctx->ppgtt->base);
+	i915_gem_context_unreference(ctx);
+}
+
 static struct intel_context *
 __create_hw_context(struct drm_device *dev,
 		    struct drm_i915_file_private *file_priv)
@@ -256,7 +288,7 @@ __create_hw_context(struct drm_device *dev,
 	return ctx;
 
 err_out:
-	i915_gem_context_unreference(ctx);
+	context_close(ctx);
 	return ERR_PTR(ret);
 }
 
@@ -318,7 +350,7 @@ err_unpin:
 		i915_gem_object_ggtt_unpin(ctx->legacy_hw_ctx.rcs_state);
 err_destroy:
 	idr_remove(&file_priv->context_idr, ctx->user_handle);
-	i915_gem_context_unreference(ctx);
+	context_close(ctx);
 	return ERR_PTR(ret);
 }
 
@@ -474,7 +506,7 @@ static int context_idr_cleanup(int id, void *p, void *data)
 {
 	struct intel_context *ctx = p;
 
-	i915_gem_context_unreference(ctx);
+	context_close(ctx);
 	return 0;
 }
 
@@ -894,7 +926,7 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
 	}
 
 	idr_remove(&ctx->file_priv->context_idr, ctx->user_handle);
-	i915_gem_context_unreference(ctx);
+	context_close(ctx);
 	mutex_unlock(&dev->struct_mutex);
 
 	DRM_DEBUG_DRIVER("HW context %d destroyed\n", args->ctx_id);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index ef093db6b8a6..ad26c9e331aa 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2130,6 +2130,7 @@ static void i915_address_space_init(struct i915_address_space *vm,
 	vm->dev = dev_priv->dev;
 	INIT_LIST_HEAD(&vm->active_list);
 	INIT_LIST_HEAD(&vm->inactive_list);
+	INIT_LIST_HEAD(&vm->unbound_list);
 	list_add_tail(&vm->global_link, &dev_priv->vm_list);
 }
 
@@ -2214,9 +2215,10 @@ void  i915_ppgtt_release(struct kref *kref)
 
 	trace_i915_ppgtt_release(&ppgtt->base);
 
-	/* vmas should already be unbound */
+	/* vmas should already be unbound and destroyed */
 	WARN_ON(!list_empty(&ppgtt->base.active_list));
 	WARN_ON(!list_empty(&ppgtt->base.inactive_list));
+	WARN_ON(!list_empty(&ppgtt->base.unbound_list));
 
 	list_del(&ppgtt->base.global_link);
 	drm_mm_takedown(&ppgtt->base.mm);
@@ -3269,6 +3271,8 @@ __i915_gem_vma_create(struct drm_i915_gem_object *obj,
 	struct i915_vma *vma;
 	int i;
 
+	GEM_BUG_ON(vm->closed);
+
 	if (WARN_ON(i915_is_ggtt(vm) != !!ggtt_view))
 		return ERR_PTR(-EINVAL);
 
@@ -3276,11 +3280,11 @@ __i915_gem_vma_create(struct drm_i915_gem_object *obj,
 	if (vma == NULL)
 		return ERR_PTR(-ENOMEM);
 
-	INIT_LIST_HEAD(&vma->vm_link);
 	INIT_LIST_HEAD(&vma->obj_link);
 	INIT_LIST_HEAD(&vma->exec_list);
 	for (i = 0; i < ARRAY_SIZE(vma->last_read); i++)
 		init_request_active(&vma->last_read[i], i915_vma_retire);
+	list_add(&vma->vm_link, &vm->unbound_list);
 	vma->vm = vm;
 	vma->obj = obj;
 	vma->is_ggtt = i915_is_ggtt(vm);
@@ -3327,6 +3331,7 @@ i915_gem_obj_lookup_or_create_ggtt_vma(struct drm_i915_gem_object *obj,
 	if (!vma)
 		vma = __i915_gem_vma_create(obj, ggtt, view);
 
+	GEM_BUG_ON(vma->closed);
 	return vma;
 
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index d68d5fd02923..6346d1786d41 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -292,6 +292,8 @@ struct i915_address_space {
 	u64 start;		/* Start offset always 0 for dri2 */
 	u64 total;		/* size addr space maps (ex. 2GB for ggtt) */
 
+	bool closed;
+
 	struct i915_page_scratch *scratch_page;
 	struct i915_page_table *scratch_pt;
 	struct i915_page_directory *scratch_pd;
@@ -320,6 +322,13 @@ struct i915_address_space {
 	 */
 	struct list_head inactive_list;
 
+	/**
+	 * List of vma that have been unbound.
+	 *
+	 * A reference is not held on the buffer while on this list.
+	 */
+	struct list_head unbound_list;
+
 	/* FIXME: Need a more generic return type */
 	gen6_pte_t (*pte_encode)(dma_addr_t addr,
 				 enum i915_cache_level level,
diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_gem_stolen.c
index 1c81a1470baf..c110563823bd 100644
--- a/drivers/gpu/drm/i915/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
@@ -692,7 +692,7 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_device *dev,
 
 	vma->bound |= GLOBAL_BIND;
 	__i915_vma_set_map_and_fenceable(vma);
-	list_add_tail(&vma->vm_link, &ggtt->inactive_list);
+	list_move_tail(&vma->vm_link, &ggtt->inactive_list);
 	obj->bind_count++;
 
 	list_add_tail(&obj->global_list, &dev_priv->mm.bound_list);
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction"
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (84 preceding siblings ...)
  2016-01-11  9:17 ` [PATCH 086/190] drm/i915: Mark the context and address space as closed Chris Wilson
@ 2016-01-11 10:44 ` Chris Wilson
  2016-01-11 10:44   ` [PATCH 088/190] drm/i915: Move execlists interrupt based submission to a bottom-half Chris Wilson
                     ` (53 more replies)
  2016-01-11 11:00 ` [PATCH 142/190] drm/i915: Fallback to using unmappable memory for scanout Chris Wilson
  86 siblings, 54 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:44 UTC (permalink / raw)
  To: intel-gfx

This reverts commit e9f24d5fb7cf3628b195b18ff3ac4e37937ceeae.

The patch was only a stop-gap measure that fixed half the problem - the
leak of the fbcon when restarting X. A complete solution required
releasing the VMA when the object itself was closed rather than relying
on file/process exit. The previous patches add the VMA tracking necessary
to close them along with the object, context or file, and so the time
has come to remove the partial fix.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h         |  5 -----
 drivers/gpu/drm/i915/i915_gem.c         | 14 ++------------
 drivers/gpu/drm/i915/i915_gem_context.c | 22 ----------------------
 3 files changed, 2 insertions(+), 39 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index fc35a9b8d910..4e912fd3b8c6 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2704,11 +2704,6 @@ int i915_vma_bind(struct i915_vma *vma, enum i915_cache_level cache_level,
 		  u32 flags);
 void __i915_vma_set_map_and_fenceable(struct i915_vma *vma);
 int __must_check i915_vma_unbind(struct i915_vma *vma);
-/*
- * BEWARE: Do not use the function below unless you can _absolutely_
- * _guarantee_ VMA in question is _not in use_ anywhere.
- */
-int __must_check __i915_vma_unbind_no_wait(struct i915_vma *vma);
 void i915_vma_close(struct i915_vma *vma);
 
 int i915_gem_object_unbind(struct drm_i915_gem_object *obj);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 16ee3bd7010e..391f840d29b7 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2608,7 +2608,7 @@ static void i915_vma_destroy(struct i915_vma *vma)
 	kmem_cache_free(to_i915(vma->obj->base.dev)->vmas, vma);
 }
 
-static int __i915_vma_unbind(struct i915_vma *vma, bool wait)
+int i915_vma_unbind(struct i915_vma *vma)
 {
 	struct drm_i915_gem_object *obj = vma->obj;
 	int ret, i;
@@ -2616,7 +2616,7 @@ static int __i915_vma_unbind(struct i915_vma *vma, bool wait)
 	/* First wait upon any activity as retiring the request may
 	 * have side-effects such as unpinning or even unbinding this vma.
 	 */
-	if (vma->active && wait) {
+	if (vma->active) {
 		bool was_closed;
 
 		/* When a closed VMA is retired, it is unbound - eek. */
@@ -2692,16 +2692,6 @@ destroy:
 	return 0;
 }
 
-int i915_vma_unbind(struct i915_vma *vma)
-{
-	return __i915_vma_unbind(vma, true);
-}
-
-int __i915_vma_unbind_no_wait(struct i915_vma *vma)
-{
-	return __i915_vma_unbind(vma, false);
-}
-
 int i915_gpu_idle(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 4583d8fe3585..e0ecfdfb0c8c 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -133,21 +133,6 @@ static int get_context_size(struct drm_device *dev)
 	return ret;
 }
 
-static void i915_gem_context_clean(struct intel_context *ctx)
-{
-	struct i915_hw_ppgtt *ppgtt = ctx->ppgtt;
-	struct i915_vma *vma, *next;
-
-	if (!ppgtt)
-		return;
-
-	list_for_each_entry_safe(vma, next, &ppgtt->base.inactive_list,
-				 vm_link) {
-		if (WARN_ON(__i915_vma_unbind_no_wait(vma)))
-			break;
-	}
-}
-
 void i915_gem_context_free(struct kref *ctx_ref)
 {
 	struct intel_context *ctx = container_of(ctx_ref, typeof(*ctx), ref);
@@ -158,13 +143,6 @@ void i915_gem_context_free(struct kref *ctx_ref)
 	if (i915.enable_execlists)
 		intel_lr_context_free(ctx);
 
-	/*
-	 * This context is going away and we need to remove all VMAs still
-	 * around. This is to handle imported shared objects for which
-	 * destructor did not run when their handles were closed.
-	 */
-	i915_gem_context_clean(ctx);
-
 	i915_ppgtt_put(ctx->ppgtt);
 
 	if (ctx->legacy_hw_ctx.rcs_state)
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 088/190] drm/i915: Move execlists interrupt based submission to a bottom-half
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
@ 2016-01-11 10:44   ` Chris Wilson
  2016-02-19 12:08     ` Tvrtko Ursulin
  2016-01-11 10:44   ` [PATCH 089/190] drm/i915: Tidy execlists submission and tracking Chris Wilson
                     ` (52 subsequent siblings)
  53 siblings, 1 reply; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:44 UTC (permalink / raw)
  To: intel-gfx

[  196.988204] clocksource: timekeeping watchdog: Marking clocksource 'tsc' as unstable because the skew is too large:
[  196.988512] clocksource:                       'refined-jiffies' wd_now: ffff9b48 wd_last: ffff9acb mask: ffffffff
[  196.988559] clocksource:                       'tsc' cs_now: 4fcfa84354 cs_last: 4f95425e98 mask: ffffffffffffffff
[  196.992115] clocksource: Switched to clocksource refined-jiffies

Followed by a hard lockup.
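
The shape of the bottom-half this patch introduces, as a sketch
(struct engine, has_work() and do_work() are placeholders; the real
thread drains the context-status buffer):

#include <linux/kthread.h>
#include <linux/sched.h>

static int submit_thread(void *arg)
{
	struct engine *e = arg;

	for (;;) {
		/* Declare intent to sleep before checking for work so
		 * a wake_up_process() from the IRQ handler cannot be
		 * lost in between.
		 */
		set_current_state(TASK_INTERRUPTIBLE);
		if (!has_work(e)) {
			if (kthread_should_stop()) {
				__set_current_state(TASK_RUNNING);
				return 0;
			}
			schedule();
			continue;
		}
		__set_current_state(TASK_RUNNING);
		do_work(e);	/* outside hard-IRQ context */
	}
}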

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c     |   5 +-
 drivers/gpu/drm/i915/i915_gem.c         |  15 +--
 drivers/gpu/drm/i915/i915_irq.c         |   2 +-
 drivers/gpu/drm/i915/intel_lrc.c        | 164 +++++++++++++++++---------------
 drivers/gpu/drm/i915/intel_lrc.h        |   3 +-
 drivers/gpu/drm/i915/intel_ringbuffer.h |   1 +
 6 files changed, 98 insertions(+), 92 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 378bc73296aa..15a6fddfb79b 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2094,7 +2094,6 @@ static int i915_execlists(struct seq_file *m, void *data)
 	for_each_ring(ring, dev_priv, ring_id) {
 		struct drm_i915_gem_request *head_req = NULL;
 		int count = 0;
-		unsigned long flags;
 
 		seq_printf(m, "%s\n", ring->name);
 
@@ -2121,12 +2120,12 @@ static int i915_execlists(struct seq_file *m, void *data)
 				   i, status, ctx_id);
 		}
 
-		spin_lock_irqsave(&ring->execlist_lock, flags);
+		spin_lock(&ring->execlist_lock);
 		list_for_each(cursor, &ring->execlist_queue)
 			count++;
 		head_req = list_first_entry_or_null(&ring->execlist_queue,
 				struct drm_i915_gem_request, execlist_link);
-		spin_unlock_irqrestore(&ring->execlist_lock, flags);
+		spin_unlock(&ring->execlist_lock);
 
 		seq_printf(m, "\t%d requests in queue\n", count);
 		if (head_req) {
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 391f840d29b7..eb875ecd7907 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2192,13 +2192,13 @@ static void i915_gem_reset_ring_cleanup(struct intel_engine_cs *engine)
 	 */
 
 	if (i915.enable_execlists) {
-		spin_lock_irq(&engine->execlist_lock);
+		spin_lock(&engine->execlist_lock);
 
 		/* list_splice_tail_init checks for empty lists */
 		list_splice_tail_init(&engine->execlist_queue,
 				      &engine->execlist_retired_req_list);
 
-		spin_unlock_irq(&engine->execlist_lock);
+		spin_unlock(&engine->execlist_lock);
 		intel_execlists_retire_requests(engine);
 	}
 
@@ -2290,15 +2290,8 @@ i915_gem_retire_requests(struct drm_device *dev)
 	for_each_ring(ring, dev_priv, i) {
 		i915_gem_retire_requests_ring(ring);
 		idle &= list_empty(&ring->request_list);
-		if (i915.enable_execlists) {
-			unsigned long flags;
-
-			spin_lock_irqsave(&ring->execlist_lock, flags);
-			idle &= list_empty(&ring->execlist_queue);
-			spin_unlock_irqrestore(&ring->execlist_lock, flags);
-
-			intel_execlists_retire_requests(ring);
-		}
+		if (i915.enable_execlists)
+			idle &= intel_execlists_retire_requests(ring);
 	}
 
 	if (idle)
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index ce047ac84f5f..b2ef2d0c211b 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1316,7 +1316,7 @@ gen8_cs_irq_handler(struct intel_engine_cs *ring, u32 iir, int test_shift)
 	if (iir & (GT_RENDER_USER_INTERRUPT << test_shift))
 		notify_ring(ring);
 	if (iir & (GT_CONTEXT_SWITCH_INTERRUPT << test_shift))
-		intel_lrc_irq_handler(ring);
+		wake_up_process(ring->execlists_submit);
 }
 
 static irqreturn_t gen8_gt_irq_handler(struct drm_i915_private *dev_priv,
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index b5f62b5f4913..de5889e95d6d 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -132,6 +132,8 @@
  *
  */
 
+#include <linux/kthread.h>
+
 #include <drm/drmP.h>
 #include <drm/i915_drm.h>
 #include "i915_drv.h"
@@ -341,7 +343,7 @@ static void execlists_elsp_write(struct drm_i915_gem_request *rq0,
 	rq0->elsp_submitted++;
 
 	/* You must always write both descriptors in the order below. */
-	spin_lock(&dev_priv->uncore.lock);
+	spin_lock_irq(&dev_priv->uncore.lock);
 	intel_uncore_forcewake_get__locked(dev_priv, FORCEWAKE_ALL);
 	I915_WRITE_FW(RING_ELSP(engine), upper_32_bits(desc[1]));
 	I915_WRITE_FW(RING_ELSP(engine), lower_32_bits(desc[1]));
@@ -353,7 +355,7 @@ static void execlists_elsp_write(struct drm_i915_gem_request *rq0,
 	/* ELSP is a wo register, use another nearby reg for posting */
 	POSTING_READ_FW(RING_EXECLIST_STATUS_LO(engine));
 	intel_uncore_forcewake_put__locked(dev_priv, FORCEWAKE_ALL);
-	spin_unlock(&dev_priv->uncore.lock);
+	spin_unlock_irq(&dev_priv->uncore.lock);
 }
 
 static int execlists_update_context(struct drm_i915_gem_request *rq)
@@ -492,89 +494,84 @@ static bool execlists_check_remove_request(struct intel_engine_cs *ring,
 	return false;
 }
 
-static void get_context_status(struct intel_engine_cs *ring,
-			       u8 read_pointer,
-			       u32 *status, u32 *context_id)
+static void set_rtpriority(void)
 {
-	struct drm_i915_private *dev_priv = ring->dev->dev_private;
-
-	if (WARN_ON(read_pointer >= GEN8_CSB_ENTRIES))
-		return;
-
-	*status = I915_READ(RING_CONTEXT_STATUS_BUF_LO(ring, read_pointer));
-	*context_id = I915_READ(RING_CONTEXT_STATUS_BUF_HI(ring, read_pointer));
+	struct sched_param param = { .sched_priority = MAX_USER_RT_PRIO/2-1 };
+	sched_setscheduler_nocheck(current, SCHED_FIFO, &param);
 }
 
-/**
- * intel_lrc_irq_handler() - handle Context Switch interrupts
- * @ring: Engine Command Streamer to handle.
- *
- * Check the unread Context Status Buffers and manage the submission of new
- * contexts to the ELSP accordingly.
- */
-void intel_lrc_irq_handler(struct intel_engine_cs *ring)
+static int intel_execlists_submit(void *arg)
 {
-	struct drm_i915_private *dev_priv = ring->dev->dev_private;
-	u32 status_pointer;
-	u8 read_pointer;
-	u8 write_pointer;
-	u32 status = 0;
-	u32 status_id;
-	u32 submit_contexts = 0;
+	struct intel_engine_cs *ring = arg;
+	struct drm_i915_private *dev_priv = ring->i915;
 
-	status_pointer = I915_READ(RING_CONTEXT_STATUS_PTR(ring));
+	set_rtpriority();
 
-	read_pointer = ring->next_context_status_buffer;
-	write_pointer = GEN8_CSB_WRITE_PTR(status_pointer);
-	if (read_pointer > write_pointer)
-		write_pointer += GEN8_CSB_ENTRIES;
+	do {
+		u32 status;
+		u32 status_id;
+		u32 submit_contexts;
+		u8 head, tail;
 
-	spin_lock(&ring->execlist_lock);
+		set_current_state(TASK_INTERRUPTIBLE);
+		head = ring->next_context_status_buffer;
+		tail = I915_READ(RING_CONTEXT_STATUS_PTR(ring)) & GEN8_CSB_PTR_MASK;
+		if (head == tail) {
+			if (kthread_should_stop())
+				return 0;
 
-	while (read_pointer < write_pointer) {
+			schedule();
+			continue;
+		}
+		__set_current_state(TASK_RUNNING);
 
-		get_context_status(ring, ++read_pointer % GEN8_CSB_ENTRIES,
-				   &status, &status_id);
+		if (head > tail)
+			tail += GEN8_CSB_ENTRIES;
 
-		if (status & GEN8_CTX_STATUS_IDLE_ACTIVE)
-			continue;
+		status = 0;
+		submit_contexts = 0;
 
-		if (status & GEN8_CTX_STATUS_PREEMPTED) {
-			if (status & GEN8_CTX_STATUS_LITE_RESTORE) {
-				if (execlists_check_remove_request(ring, status_id))
-					WARN(1, "Lite Restored request removed from queue\n");
-			} else
-				WARN(1, "Preemption without Lite Restore\n");
-		}
+		spin_lock(&ring->execlist_lock);
 
-		if ((status & GEN8_CTX_STATUS_ACTIVE_IDLE) ||
-		    (status & GEN8_CTX_STATUS_ELEMENT_SWITCH)) {
-			if (execlists_check_remove_request(ring, status_id))
-				submit_contexts++;
-		}
-	}
+		while (head++ < tail) {
+			status = I915_READ(RING_CONTEXT_STATUS_BUF_LO(ring, head % GEN8_CSB_ENTRIES));
+			status_id = I915_READ(RING_CONTEXT_STATUS_BUF_HI(ring, head % GEN8_CSB_ENTRIES));
 
-	if (disable_lite_restore_wa(ring)) {
-		/* Prevent a ctx to preempt itself */
-		if ((status & GEN8_CTX_STATUS_ACTIVE_IDLE) &&
-		    (submit_contexts != 0))
-			execlists_context_unqueue(ring);
-	} else if (submit_contexts != 0) {
-		execlists_context_unqueue(ring);
-	}
+			if (status & GEN8_CTX_STATUS_IDLE_ACTIVE)
+				continue;
 
-	spin_unlock(&ring->execlist_lock);
+			if (status & GEN8_CTX_STATUS_PREEMPTED) {
+				if (status & GEN8_CTX_STATUS_LITE_RESTORE) {
+					if (execlists_check_remove_request(ring, status_id))
+						WARN(1, "Lite Restored request removed from queue\n");
+				} else
+					WARN(1, "Preemption without Lite Restore\n");
+			}
 
-	if (unlikely(submit_contexts > 2))
-		DRM_ERROR("More than two context complete events?\n");
+			if ((status & GEN8_CTX_STATUS_ACTIVE_IDLE) ||
+			    (status & GEN8_CTX_STATUS_ELEMENT_SWITCH)) {
+				if (execlists_check_remove_request(ring, status_id))
+					submit_contexts++;
+			}
+		}
+
+		if (disable_lite_restore_wa(ring)) {
+			/* Prevent a ctx from preempting itself */
+			if ((status & GEN8_CTX_STATUS_ACTIVE_IDLE) &&
+					(submit_contexts != 0))
+				execlists_context_unqueue(ring);
+		} else if (submit_contexts != 0) {
+			execlists_context_unqueue(ring);
+		}
 
-	ring->next_context_status_buffer = write_pointer % GEN8_CSB_ENTRIES;
+		spin_unlock(&ring->execlist_lock);
 
-	/* Update the read pointer to the old write pointer. Manual ringbuffer
-	 * management ftw </sarcasm> */
-	I915_WRITE(RING_CONTEXT_STATUS_PTR(ring),
-		   _MASKED_FIELD(GEN8_CSB_READ_PTR_MASK,
-				 ring->next_context_status_buffer << 8));
+		WARN(submit_contexts > 2, "More than two context complete events?\n");
+		ring->next_context_status_buffer = tail % GEN8_CSB_ENTRIES;
+		I915_WRITE(RING_CONTEXT_STATUS_PTR(ring),
+			   _MASKED_FIELD(GEN8_CSB_PTR_MASK << 8,
+					 ring->next_context_status_buffer<<8));
+	} while (1);
 }
 
 static int execlists_context_queue(struct drm_i915_gem_request *request)
@@ -585,7 +582,7 @@ static int execlists_context_queue(struct drm_i915_gem_request *request)
 
 	i915_gem_request_get(request);
 
-	spin_lock_irq(&engine->execlist_lock);
+	spin_lock(&engine->execlist_lock);
 
 	list_for_each_entry(cursor, &engine->execlist_queue, execlist_link)
 		if (++num_elements > 2)
@@ -611,7 +608,7 @@ static int execlists_context_queue(struct drm_i915_gem_request *request)
 	if (num_elements == 0)
 		execlists_context_unqueue(engine);
 
-	spin_unlock_irq(&engine->execlist_lock);
+	spin_unlock(&engine->execlist_lock);
 
 	return 0;
 }
@@ -667,19 +664,19 @@ intel_logical_ring_advance_and_submit(struct drm_i915_gem_request *request)
 		execlists_context_queue(request);
 }
 
-void intel_execlists_retire_requests(struct intel_engine_cs *ring)
+bool intel_execlists_retire_requests(struct intel_engine_cs *ring)
 {
 	struct drm_i915_gem_request *req, *tmp;
 	struct list_head retired_list;
 
 	WARN_ON(!mutex_is_locked(&ring->dev->struct_mutex));
 	if (list_empty(&ring->execlist_retired_req_list))
-		return;
+		goto out;
 
 	INIT_LIST_HEAD(&retired_list);
-	spin_lock_irq(&ring->execlist_lock);
+	spin_lock(&ring->execlist_lock);
 	list_replace_init(&ring->execlist_retired_req_list, &retired_list);
-	spin_unlock_irq(&ring->execlist_lock);
+	spin_unlock(&ring->execlist_lock);
 
 	list_for_each_entry_safe(req, tmp, &retired_list, execlist_link) {
 		struct intel_context *ctx = req->ctx;
@@ -691,6 +688,9 @@ void intel_execlists_retire_requests(struct intel_engine_cs *ring)
 		list_del(&req->execlist_link);
 		i915_gem_request_put(req);
 	}
+
+out:
+	return list_empty(&ring->execlist_queue);
 }
 
 void intel_logical_ring_stop(struct intel_engine_cs *ring)
@@ -1525,6 +1525,9 @@ void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
 	if (!intel_engine_initialized(ring))
 		return;
 
+	if (ring->execlists_submit)
+		kthread_stop(ring->execlists_submit);
+
 	if (ring->buffer) {
 		struct drm_i915_private *dev_priv = ring->i915;
 		intel_logical_ring_stop(ring);
@@ -1550,13 +1553,15 @@ void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
 
 static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *ring)
 {
+	struct drm_i915_private *dev_priv = to_i915(dev);
+	struct task_struct *task;
 	int ret;
 
 	/* Intentionally left blank. */
 	ring->buffer = NULL;
 
 	ring->dev = dev;
-	ring->i915 = to_i915(dev);
+	ring->i915 = dev_priv;
 	ring->fence_context = fence_context_alloc(1);
 	INIT_LIST_HEAD(&ring->request_list);
 	i915_gem_batch_pool_init(dev, &ring->batch_pool);
@@ -1587,6 +1592,15 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
 		goto error;
 	}
 
+	ring->next_context_status_buffer =
+			I915_READ(RING_CONTEXT_STATUS_PTR(ring)) & GEN8_CSB_PTR_MASK;
+	task = kthread_run(intel_execlists_submit, ring,
+			   "irq/i915:%de", ring->id);
+	if (IS_ERR(task))
+		goto error;
+
+	ring->execlists_submit = task;
+
 	return 0;
 
 error:
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 87bc9acc4224..33f82a84065a 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -81,7 +81,6 @@ uint64_t intel_lr_context_descriptor(struct intel_context *ctx,
 int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists);
 u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj);
 
-void intel_lrc_irq_handler(struct intel_engine_cs *ring);
-void intel_execlists_retire_requests(struct intel_engine_cs *ring);
+bool intel_execlists_retire_requests(struct intel_engine_cs *ring);
 
 #endif /* _INTEL_LRC_H_ */
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index bb92d831a100..edaf07b2292e 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -291,6 +291,7 @@ struct intel_engine_cs {
 	} semaphore;
 
 	/* Execlists */
+	struct task_struct *execlists_submit;
 	spinlock_t execlist_lock;
 	struct list_head execlist_queue;
 	struct list_head execlist_retired_req_list;
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 089/190] drm/i915: Tidy execlists submission and tracking
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
  2016-01-11 10:44   ` [PATCH 088/190] drm/i915: Move execlists interrupt based submission to a bottom-half Chris Wilson
@ 2016-01-11 10:44   ` Chris Wilson
  2016-01-11 10:44   ` [PATCH 090/190] drm/i915: Refactor execlists default context pinning Chris Wilson
                     ` (51 subsequent siblings)
  53 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:44 UTC (permalink / raw)
  To: intel-gfx

Other than dramatically simplifying the submission code (requests ftw),
we can reduce the execlist spinlock duration and importantly avoid
having to hold it across the context switch register reads.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c        |  20 +-
 drivers/gpu/drm/i915/i915_gem.c            |   8 +-
 drivers/gpu/drm/i915/i915_gem_request.h    |  21 +-
 drivers/gpu/drm/i915/i915_guc_submission.c |  31 +-
 drivers/gpu/drm/i915/intel_lrc.c           | 505 +++++++++++------------------
 drivers/gpu/drm/i915/intel_lrc.h           |   3 -
 drivers/gpu/drm/i915/intel_ringbuffer.h    |   8 +-
 7 files changed, 209 insertions(+), 387 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 15a6fddfb79b..a5ea90944bbb 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2005,8 +2005,7 @@ static void i915_dump_lrc_obj(struct seq_file *m,
 		return;
 	}
 
-	seq_printf(m, "CONTEXT: %s %u\n", ring->name,
-		   intel_execlists_ctx_id(ctx_obj));
+	seq_printf(m, "CONTEXT: %s\n", ring->name);
 
 	if (!i915_gem_obj_ggtt_bound(ctx_obj))
 		seq_puts(m, "\tNot bound in GGTT\n");
@@ -2092,7 +2091,6 @@ static int i915_execlists(struct seq_file *m, void *data)
 	intel_runtime_pm_get(dev_priv);
 
 	for_each_ring(ring, dev_priv, ring_id) {
-		struct drm_i915_gem_request *head_req = NULL;
 		int count = 0;
 
 		seq_printf(m, "%s\n", ring->name);
@@ -2105,8 +2103,8 @@ static int i915_execlists(struct seq_file *m, void *data)
 		status_pointer = I915_READ(RING_CONTEXT_STATUS_PTR(ring));
 		seq_printf(m, "\tStatus pointer: 0x%08X\n", status_pointer);
 
-		read_pointer = ring->next_context_status_buffer;
-		write_pointer = GEN8_CSB_WRITE_PTR(status_pointer);
+		read_pointer = (status_pointer >> 8) & GEN8_CSB_PTR_MASK;
+		write_pointer = status_pointer & GEN8_CSB_PTR_MASK;
 		if (read_pointer > write_pointer)
 			write_pointer += GEN8_CSB_ENTRIES;
 		seq_printf(m, "\tRead pointer: 0x%08X, write pointer 0x%08X\n",
@@ -2123,21 +2121,9 @@ static int i915_execlists(struct seq_file *m, void *data)
 		spin_lock(&ring->execlist_lock);
 		list_for_each(cursor, &ring->execlist_queue)
 			count++;
-		head_req = list_first_entry_or_null(&ring->execlist_queue,
-				struct drm_i915_gem_request, execlist_link);
 		spin_unlock(&ring->execlist_lock);
 
 		seq_printf(m, "\t%d requests in queue\n", count);
-		if (head_req) {
-			struct drm_i915_gem_object *ctx_obj;
-
-			ctx_obj = head_req->ctx->engine[ring_id].state;
-			seq_printf(m, "\tHead request id: %u\n",
-				   intel_execlists_ctx_id(ctx_obj));
-			seq_printf(m, "\tHead request tail: %u\n",
-				   head_req->tail);
-		}
-
 		seq_putc(m, '\n');
 	}
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index eb875ecd7907..054e11cff00f 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2193,12 +2193,12 @@ static void i915_gem_reset_ring_cleanup(struct intel_engine_cs *engine)
 
 	if (i915.enable_execlists) {
 		spin_lock(&engine->execlist_lock);
-
-		/* list_splice_tail_init checks for empty lists */
 		list_splice_tail_init(&engine->execlist_queue,
-				      &engine->execlist_retired_req_list);
-
+				      &engine->execlist_completed);
+		memset(&engine->execlist_port, 0,
+		       sizeof(engine->execlist_port));
 		spin_unlock(&engine->execlist_lock);
+
 		intel_execlists_retire_requests(engine);
 	}
 
diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index 59957d5edfdb..c2e83584f8a2 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -63,10 +63,11 @@ struct drm_i915_gem_request {
 	 * This is required to calculate the maximum available ringbuffer
 	 * space without overwriting the postfix.
 	 */
-	 u32 postfix;
+	u32 postfix;
 
 	/** Position in the ringbuffer of the end of the whole request */
 	u32 tail;
+	u32 wa_tail;
 
 	/**
 	 * Context and ring buffer related to this request
@@ -99,24 +100,8 @@ struct drm_i915_gem_request {
 	/** process identifier submitting this request */
 	struct pid *pid;
 
-	/**
-	 * The ELSP only accepts two elements at a time, so we queue
-	 * context/tail pairs on a given queue (ring->execlist_queue) until the
-	 * hardware is available. The queue serves a double purpose: we also use
-	 * it to keep track of the up to 2 contexts currently in the hardware
-	 * (usually one in execution and the other queued up by the GPU): We
-	 * only remove elements from the head of the queue when the hardware
-	 * informs us that an element has been completed.
-	 *
-	 * All accesses to the queue are mediated by a spinlock
-	 * (ring->execlist_lock).
-	 */
-
 	/** Execlist link in the submission queue.*/
-	struct list_head execlist_link;
-
-	/** Execlists no. of times this request has been sent to the ELSP */
-	int elsp_submitted;
+	struct list_head execlist_link; /* guarded by engine->execlist_lock */
 };
 
 struct drm_i915_gem_request *
diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index 5a6251926367..f4e09952d52c 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -393,7 +393,6 @@ static void guc_init_ctx_desc(struct intel_guc *guc,
 		struct intel_ring *ring = ctx->engine[i].ring;
 		struct intel_engine_cs *engine;
 		struct drm_i915_gem_object *obj;
-		uint64_t ctx_desc;
 
 		/* TODO: We have a design issue to be solved here. Only when we
 		 * receive the first batch, we know which engine is used by the
@@ -407,8 +406,7 @@ static void guc_init_ctx_desc(struct intel_guc *guc,
 			break;	/* XXX: continue? */
 
 		engine = ring->engine;
-		ctx_desc = intel_lr_context_descriptor(ctx, engine);
-		lrc->context_desc = (u32)ctx_desc;
+		lrc->context_desc = engine->execlist_context_descriptor;
 
 		/* The state page is after PPHWSP */
 		lrc->ring_lcra = i915_gem_obj_ggtt_offset(obj) +
@@ -548,7 +546,7 @@ static int guc_add_workqueue_item(struct i915_guc_client *gc,
 			WQ_NO_WCFLUSH_WAIT;
 
 	/* The GuC wants only the low-order word of the context descriptor */
-	wqi->context_desc = (u32)intel_lr_context_descriptor(rq->ctx, rq->engine);
+	wqi->context_desc = rq->engine->execlist_context_descriptor;
 
 	/* The GuC firmware wants the tail index in QWords, not bytes */
 	tail = rq->ring->tail >> 3;
@@ -562,27 +560,6 @@ static int guc_add_workqueue_item(struct i915_guc_client *gc,
 
 #define CTX_RING_BUFFER_START		0x08
 
-/* Update the ringbuffer pointer in a saved context image */
-static void lr_context_update(struct drm_i915_gem_request *rq)
-{
-	enum intel_engine_id ring_id = rq->engine->id;
-	struct drm_i915_gem_object *ctx_obj = rq->ctx->engine[ring_id].state;
-	struct drm_i915_gem_object *rb_obj = rq->ring->obj;
-	struct page *page;
-	uint32_t *reg_state;
-
-	BUG_ON(!ctx_obj);
-	WARN_ON(!i915_gem_obj_is_pinned(ctx_obj));
-	WARN_ON(!i915_gem_obj_is_pinned(rb_obj));
-
-	page = i915_gem_object_get_dirty_page(ctx_obj, LRC_STATE_PN);
-	reg_state = kmap_atomic(page);
-
-	reg_state[CTX_RING_BUFFER_START+1] = i915_gem_obj_ggtt_offset(rb_obj);
-
-	kunmap_atomic(reg_state);
-}
-
 /**
  * i915_guc_submit() - Submit commands through GuC
  * @client:	the guc client where commands will go through
@@ -597,10 +574,6 @@ int i915_guc_submit(struct i915_guc_client *client,
 	enum intel_engine_id ring_id = rq->engine->id;
 	int q_ret, b_ret;
 
-	/* Need this because of the deferred pin ctx and ring */
-	/* Shall we move this right after ring is pinned? */
-	lr_context_update(rq);
-
 	q_ret = guc_add_workqueue_item(client, rq);
 	if (q_ret == 0)
 		b_ret = guc_ring_doorbell(client);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index de5889e95d6d..80b346a3fd8a 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -265,233 +265,133 @@ int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists
 	return 0;
 }
 
-/**
- * intel_execlists_ctx_id() - get the Execlists Context ID
- * @ctx_obj: Logical Ring Context backing object.
- *
- * Do not confuse with ctx->id! Unfortunately we have a name overload
- * here: the old context ID we pass to userspace as a handler so that
- * they can refer to a context, and the new context ID we pass to the
- * ELSP so that the GPU can inform us of the context status via
- * interrupts.
- *
- * Return: 20-bits globally unique context ID.
- */
-u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj)
-{
-	u32 lrca = i915_gem_obj_ggtt_offset(ctx_obj) +
-			LRC_PPHWSP_PN * PAGE_SIZE;
-
-	/* LRCA is required to be 4K aligned so the more significant 20 bits
-	 * are globally unique */
-	return lrca >> 12;
-}
-
-static bool disable_lite_restore_wa(struct intel_engine_cs *ring)
-{
-	return (IS_SKL_REVID(ring->dev, 0, SKL_REVID_B0) ||
-		IS_BXT_REVID(ring->dev, 0, BXT_REVID_A1)) &&
-		(ring->id == VCS || ring->id == VCS2);
-}
-
-uint64_t intel_lr_context_descriptor(struct intel_context *ctx,
-				     struct intel_engine_cs *ring)
+static u32 execlists_request_write_tail(struct drm_i915_gem_request *req)
 {
-	struct drm_i915_gem_object *ctx_obj = ctx->engine[ring->id].state;
-	uint64_t desc;
-	uint64_t lrca = i915_gem_obj_ggtt_offset(ctx_obj) +
-			LRC_PPHWSP_PN * PAGE_SIZE;
-
-	WARN_ON(lrca & 0xFFFFFFFF00000FFFULL);
-
-	desc = GEN8_CTX_VALID;
-	desc |= GEN8_CTX_ADDRESSING_MODE(ring->i915) << GEN8_CTX_ADDRESSING_MODE_SHIFT;
-	if (IS_GEN8(ring->i915))
-		desc |= GEN8_CTX_L3LLC_COHERENT;
-	desc |= GEN8_CTX_PRIVILEGE;
-	desc |= lrca;
-	desc |= (u64)intel_execlists_ctx_id(ctx_obj) << GEN8_CTX_ID_SHIFT;
-
-	/* TODO: WaDisableLiteRestore when we start using semaphore
-	 * signalling between Command Streamers */
-	/* desc |= GEN8_CTX_FORCE_RESTORE; */
+	struct intel_ring *ring = req->ring;
+	struct i915_hw_ppgtt *ppgtt = req->ctx->ppgtt;
 
-	/* WaEnableForceRestoreInCtxtDescForVCS:skl */
-	/* WaEnableForceRestoreInCtxtDescForVCS:bxt */
-	if (disable_lite_restore_wa(ring))
-		desc |= GEN8_CTX_FORCE_RESTORE;
+	if (ppgtt && !USES_FULL_48BIT_PPGTT(req->i915)) {
+		/* True 32b PPGTT with dynamic page allocation: update PDP
+		 * registers and point the unallocated PDPs to scratch page.
+		 * PML4 is allocated during ppgtt init, so this is not needed
+		 * in 48-bit mode.
+		 */
+		if (ppgtt->pd_dirty_rings & intel_engine_flag(req->engine)) {
+			ASSIGN_CTX_PDP(ppgtt, ring->registers, 3);
+			ASSIGN_CTX_PDP(ppgtt, ring->registers, 2);
+			ASSIGN_CTX_PDP(ppgtt, ring->registers, 1);
+			ASSIGN_CTX_PDP(ppgtt, ring->registers, 0);
+			ppgtt->pd_dirty_rings &= ~intel_engine_flag(req->engine);
+		}
+	}
 
-	return desc;
+	ring->registers[CTX_RING_TAIL+1] = req->tail;
+	return ring->context_descriptor;
 }
 
-static void execlists_elsp_write(struct drm_i915_gem_request *rq0,
-				 struct drm_i915_gem_request *rq1)
+static void execlists_submit_pair(struct intel_engine_cs *ring)
 {
+	struct drm_i915_private *dev_priv = ring->i915;
+	uint32_t desc[4];
 
-	struct intel_engine_cs *engine = rq0->engine;
-	struct drm_i915_private *dev_priv = rq0->i915;
-	uint64_t desc[2];
-
-	if (rq1) {
-		desc[1] = intel_lr_context_descriptor(rq1->ctx, rq1->engine);
-		rq1->elsp_submitted++;
-	} else {
-		desc[1] = 0;
-	}
+	if (ring->execlist_port[1]) {
+		desc[0] = execlists_request_write_tail(ring->execlist_port[1]);
+		desc[1] = ring->execlist_port[1]->fence.seqno;
+	} else
+		desc[1] = desc[0] = 0;
 
-	desc[0] = intel_lr_context_descriptor(rq0->ctx, rq0->engine);
-	rq0->elsp_submitted++;
+	desc[2] = execlists_request_write_tail(ring->execlist_port[0]);
+	desc[3] = ring->execlist_port[0]->fence.seqno;
 
-	/* You must always write both descriptors in the order below. */
-	spin_lock_irq(&dev_priv->uncore.lock);
-	intel_uncore_forcewake_get__locked(dev_priv, FORCEWAKE_ALL);
-	I915_WRITE_FW(RING_ELSP(engine), upper_32_bits(desc[1]));
-	I915_WRITE_FW(RING_ELSP(engine), lower_32_bits(desc[1]));
+	/* Note: You must always write both descriptors in the order below. */
+	I915_WRITE_FW(RING_ELSP(ring), desc[1]);
+	I915_WRITE_FW(RING_ELSP(ring), desc[0]);
+	I915_WRITE_FW(RING_ELSP(ring), desc[3]);
 
-	I915_WRITE_FW(RING_ELSP(engine), upper_32_bits(desc[0]));
 	/* The context is automatically loaded after the following */
-	I915_WRITE_FW(RING_ELSP(engine), lower_32_bits(desc[0]));
-
-	/* ELSP is a wo register, use another nearby reg for posting */
-	POSTING_READ_FW(RING_EXECLIST_STATUS_LO(engine));
-	intel_uncore_forcewake_put__locked(dev_priv, FORCEWAKE_ALL);
-	spin_unlock_irq(&dev_priv->uncore.lock);
+	I915_WRITE_FW(RING_ELSP(ring), desc[2]);
 }
 
-static int execlists_update_context(struct drm_i915_gem_request *rq)
+static void execlists_context_unqueue(struct intel_engine_cs *engine)
 {
-	struct i915_hw_ppgtt *ppgtt = rq->ctx->ppgtt;
-	struct drm_i915_gem_object *ctx_obj = rq->ctx->engine[rq->engine->id].state;
-	struct drm_i915_gem_object *rb_obj = rq->ring->obj;
-	struct page *page;
-	uint32_t *reg_state;
-
-	BUG_ON(!ctx_obj);
-	WARN_ON(!i915_gem_obj_is_pinned(ctx_obj));
-	WARN_ON(!i915_gem_obj_is_pinned(rb_obj));
-
-	page = i915_gem_object_get_dirty_page(ctx_obj, LRC_STATE_PN);
-	reg_state = kmap_atomic(page);
+	struct drm_i915_gem_request *cursor;
+	bool submit = false;
+	int port = 0;
 
-	reg_state[CTX_RING_TAIL+1] = rq->tail;
-	reg_state[CTX_RING_BUFFER_START+1] = i915_gem_obj_ggtt_offset(rb_obj);
+	assert_spin_locked(&engine->execlist_lock);
 
-	if (ppgtt && !USES_FULL_48BIT_PPGTT(rq->i915)) {
-		/* True 32b PPGTT with dynamic page allocation: update PDP
-		 * registers and point the unallocated PDPs to scratch page.
-		 * PML4 is allocated during ppgtt init, so this is not needed
-		 * in 48-bit mode.
+	/* Try to read in pairs and fill both submission ports */
+	cursor = engine->execlist_port[port];
+	if (cursor != NULL) {
+		/* WaIdleLiteRestore:bdw,skl
+		 * Apply the wa NOOPs to prevent ring:HEAD == req:TAIL
+		 * as we resubmit the request. See gen8_add_request()
+		 * for where we prepare the padding after the end of the
+		 * request.
 		 */
-		ASSIGN_CTX_PDP(ppgtt, reg_state, 3);
-		ASSIGN_CTX_PDP(ppgtt, reg_state, 2);
-		ASSIGN_CTX_PDP(ppgtt, reg_state, 1);
-		ASSIGN_CTX_PDP(ppgtt, reg_state, 0);
-	}
-
-	kunmap_atomic(reg_state);
-
-	return 0;
-}
+		cursor->tail = cursor->wa_tail;
+		cursor = list_next_entry(cursor, execlist_link);
+	} else
+		cursor = list_first_entry(&engine->execlist_queue,
+					  typeof(*cursor),
+					  execlist_link);
+	while (&cursor->execlist_link != &engine->execlist_queue) {
+		/* Same ctx: ignore earlier request, as the
+		 * second request extends the first.
+		 */
+		if (engine->execlist_port[port] &&
+		    cursor->ctx != engine->execlist_port[port]->ctx) {
+			if (++port == ARRAY_SIZE(engine->execlist_port))
+				break;
+		}
 
-static void execlists_submit_requests(struct drm_i915_gem_request *rq0,
-				      struct drm_i915_gem_request *rq1)
-{
-	execlists_update_context(rq0);
+		engine->execlist_port[port] = cursor;
+		submit = true;
 
-	if (rq1)
-		execlists_update_context(rq1);
+		cursor = list_next_entry(cursor, execlist_link);
+	}
 
-	execlists_elsp_write(rq0, rq1);
+	if (submit)
+		execlists_submit_pair(engine);
 }
 
-static void execlists_context_unqueue(struct intel_engine_cs *engine)
+static bool execlists_complete_requests(struct intel_engine_cs *engine,
+					u32 seqno)
 {
-	struct drm_i915_gem_request *req0 = NULL, *req1 = NULL;
-	struct drm_i915_gem_request *cursor = NULL, *tmp = NULL;
-
 	assert_spin_locked(&engine->execlist_lock);
 
-	/*
-	 * If irqs are not active generate a warning as batches that finish
-	 * without the irqs may get lost and a GPU Hang may occur.
-	 */
-	WARN_ON(!intel_irqs_enabled(engine->dev->dev_private));
+	do {
+		struct drm_i915_gem_request *req;
 
-	if (list_empty(&engine->execlist_queue))
-		return;
+		req = engine->execlist_port[0];
+		if (req == NULL)
+			break;
 
-	/* Try to read in pairs */
-	list_for_each_entry_safe(cursor, tmp, &engine->execlist_queue,
-				 execlist_link) {
-		if (!req0) {
-			req0 = cursor;
-		} else if (req0->ctx == cursor->ctx) {
-			/* Same ctx: ignore first request, as second request
-			 * will update tail past first request's workload */
-			cursor->elsp_submitted = req0->elsp_submitted;
-			list_del(&req0->execlist_link);
-			list_add_tail(&req0->execlist_link,
-				&engine->execlist_retired_req_list);
-			req0 = cursor;
-		} else {
-			req1 = cursor;
+		if (!i915_seqno_passed(seqno, req->fence.seqno))
 			break;
-		}
-	}
 
-	if (IS_GEN8(engine->dev) || IS_GEN9(engine->dev)) {
-		/*
-		 * WaIdleLiteRestore: make sure we never cause a lite
-		 * restore with HEAD==TAIL
+		/* Move the completed set of requests from the start of the
+		 * execlist_queue over to the tail of the execlist_completed.
 		 */
-		if (req0->elsp_submitted) {
-			/*
-			 * Apply the wa NOOPS to prevent ring:HEAD == req:TAIL
-			 * as we resubmit the request. See gen8_add_request()
-			 * for where we prepare the padding after the end of the
-			 * request.
-			 */
-			struct intel_ring *ring;
-
-			ring = req0->ctx->engine[engine->id].ring;
-			req0->tail += 8;
-			req0->tail &= ring->size - 1;
-		}
-	}
-
-	WARN_ON(req1 && req1->elsp_submitted);
+		engine->execlist_completed.prev->next = engine->execlist_queue.next;
+		engine->execlist_completed.prev = &req->execlist_link;
 
-	execlists_submit_requests(req0, req1);
-}
-
-static bool execlists_check_remove_request(struct intel_engine_cs *ring,
-					   u32 request_id)
-{
-	struct drm_i915_gem_request *head_req;
+		engine->execlist_queue.next = req->execlist_link.next;
+		req->execlist_link.next->prev = &engine->execlist_queue;
 
-	assert_spin_locked(&ring->execlist_lock);
+		req->execlist_link.next = &engine->execlist_completed;
 
-	head_req = list_first_entry_or_null(&ring->execlist_queue,
-					    struct drm_i915_gem_request,
-					    execlist_link);
-
-	if (head_req != NULL) {
-		struct drm_i915_gem_object *ctx_obj =
-				head_req->ctx->engine[ring->id].state;
-		if (intel_execlists_ctx_id(ctx_obj) == request_id) {
-			WARN(head_req->elsp_submitted == 0,
-			     "Never submitted head request\n");
-
-			if (--head_req->elsp_submitted <= 0) {
-				list_del(&head_req->execlist_link);
-				list_add_tail(&head_req->execlist_link,
-					&ring->execlist_retired_req_list);
-				return true;
-			}
-		}
-	}
+		/* The hardware has completed the request on this port; it
+		 * will switch to the next.
+		 */
+		engine->execlist_port[0] = engine->execlist_port[1];
+		engine->execlist_port[1] = NULL;
+	} while (1);
 
-	return false;
+	if (engine->execlist_context_descriptor & GEN8_CTX_FORCE_RESTORE)
+		return engine->execlist_port[0] == NULL;
+	else
+		return engine->execlist_port[1] == NULL;
 }
 
 static void set_rtpriority(void)
@@ -504,23 +404,29 @@ static int intel_execlists_submit(void *arg)
 {
 	struct intel_engine_cs *ring = arg;
 	struct drm_i915_private *dev_priv = ring->i915;
+	const i915_reg_t ptrs = RING_CONTEXT_STATUS_PTR(ring);
 
 	set_rtpriority();
 
+	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
 	do {
-		u32 status;
-		u32 status_id;
-		u32 submit_contexts;
 		u8 head, tail;
+		u32 seqno;
 
 		set_current_state(TASK_INTERRUPTIBLE);
-		head = ring->next_context_status_buffer;
-		tail = I915_READ(RING_CONTEXT_STATUS_PTR(ring)) & GEN8_CSB_PTR_MASK;
+		head = tail = 0;
+		if (READ_ONCE(ring->execlist_port[0])) {
+			u32 x = I915_READ_FW(ptrs);
+			head = x >> 8;
+			tail = x;
+		}
 		if (head == tail) {
+			intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
 			if (kthread_should_stop())
 				return 0;
 
 			schedule();
+			intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
 			continue;
 		}
 		__set_current_state(TASK_RUNNING);
@@ -528,86 +434,46 @@ static int intel_execlists_submit(void *arg)
 		if (head > tail)
 			tail += GEN8_CSB_ENTRIES;
 
-		status = 0;
-		submit_contexts = 0;
-
-		spin_lock(&ring->execlist_lock);
-
+		seqno = 0;
 		while (head++ < tail) {
-			status = I915_READ(RING_CONTEXT_STATUS_BUF_LO(ring, head % GEN8_CSB_ENTRIES));
-			status_id = I915_READ(RING_CONTEXT_STATUS_BUF_HI(ring, head % GEN8_CSB_ENTRIES));
-
-			if (status & GEN8_CTX_STATUS_IDLE_ACTIVE)
-				continue;
-
-			if (status & GEN8_CTX_STATUS_PREEMPTED) {
-				if (status & GEN8_CTX_STATUS_LITE_RESTORE) {
-					if (execlists_check_remove_request(ring, status_id))
-						WARN(1, "Lite Restored request removed from queue\n");
-				} else
-					WARN(1, "Preemption without Lite Restore\n");
-			}
-
-			if ((status & GEN8_CTX_STATUS_ACTIVE_IDLE) ||
-			    (status & GEN8_CTX_STATUS_ELEMENT_SWITCH)) {
-				if (execlists_check_remove_request(ring, status_id))
-					submit_contexts++;
+			u32 status = I915_READ_FW(RING_CONTEXT_STATUS_BUF_LO(ring,
+									     head % GEN8_CSB_ENTRIES));
+			if (unlikely(status & GEN8_CTX_STATUS_PREEMPTED && 0)) {
+				DRM_ERROR("Pre-empted request %x %s Lite Restore\n",
+					  I915_READ_FW(RING_CONTEXT_STATUS_BUF_HI(ring, head % GEN8_CSB_ENTRIES)),
+					  status & GEN8_CTX_STATUS_LITE_RESTORE ? "with" : "without");
 			}
+			if (status & (GEN8_CTX_STATUS_ACTIVE_IDLE |
+				      GEN8_CTX_STATUS_ELEMENT_SWITCH))
+				seqno = I915_READ_FW(RING_CONTEXT_STATUS_BUF_HI(ring,
+										head % GEN8_CSB_ENTRIES));
 		}
 
-		if (disable_lite_restore_wa(ring)) {
-			/* Prevent a ctx to preempt itself */
-			if ((status & GEN8_CTX_STATUS_ACTIVE_IDLE) &&
-					(submit_contexts != 0))
+		I915_WRITE_FW(ptrs,
+			      _MASKED_FIELD(GEN8_CSB_PTR_MASK<<8,
+					    (tail % GEN8_CSB_ENTRIES) << 8));
+
+		if (seqno) {
+			spin_lock(&ring->execlist_lock);
+			if (execlists_complete_requests(ring, seqno))
 				execlists_context_unqueue(ring);
-		} else if (submit_contexts != 0) {
-			execlists_context_unqueue(ring);
+			spin_unlock(&ring->execlist_lock);
 		}
-
-		spin_unlock(&ring->execlist_lock);
-
-		WARN(submit_contexts > 2, "More than two context complete events?\n");
-		ring->next_context_status_buffer = tail % GEN8_CSB_ENTRIES;
-		I915_WRITE(RING_CONTEXT_STATUS_PTR(ring),
-			   _MASKED_FIELD(GEN8_CSB_PTR_MASK << 8,
-					 ring->next_context_status_buffer<<8));
 	} while (1);
 }
 
 static int execlists_context_queue(struct drm_i915_gem_request *request)
 {
 	struct intel_engine_cs *engine = request->engine;
-	struct drm_i915_gem_request *cursor;
-	int num_elements = 0;
 
 	i915_gem_request_get(request);
 
 	spin_lock(&engine->execlist_lock);
-
-	list_for_each_entry(cursor, &engine->execlist_queue, execlist_link)
-		if (++num_elements > 2)
-			break;
-
-	if (num_elements > 2) {
-		struct drm_i915_gem_request *tail_req;
-
-		tail_req = list_last_entry(&engine->execlist_queue,
-					   struct drm_i915_gem_request,
-					   execlist_link);
-
-		if (request->ctx == tail_req->ctx) {
-			WARN(tail_req->elsp_submitted != 0,
-				"More than 2 already-submitted reqs queued\n");
-			list_del(&tail_req->execlist_link);
-			list_add_tail(&tail_req->execlist_link,
-				&engine->execlist_retired_req_list);
-		}
-	}
-
 	list_add_tail(&request->execlist_link, &engine->execlist_queue);
-	if (num_elements == 0)
-		execlists_context_unqueue(engine);
-
+	if (engine->execlist_port[0] == NULL) {
+		engine->execlist_port[0] = request;
+		execlists_submit_pair(engine);
+	}
 	spin_unlock(&engine->execlist_lock);
 
 	return 0;
@@ -641,56 +507,32 @@ int intel_logical_ring_alloc_request_extras(struct drm_i915_gem_request *request
 	return 0;
 }
 
-/*
- * intel_logical_ring_advance_and_submit() - advance the tail and submit the workload
- * @request: Request to advance the logical ringbuffer of.
- *
- * The tail is updated in our logical ringbuffer struct, not in the actual context. What
- * really happens during submission is that the context and current tail will be placed
- * on a queue waiting for the ELSP to be ready to accept a new context submission. At that
- * point, the tail *inside* the context is updated and the ELSP written to.
- */
-static void
-intel_logical_ring_advance_and_submit(struct drm_i915_gem_request *request)
-{
-	struct drm_i915_private *dev_priv = request->i915;
-
-	intel_ring_advance(request->ring);
-	request->tail = request->ring->tail;
-
-	if (dev_priv->guc.execbuf_client)
-		i915_guc_submit(dev_priv->guc.execbuf_client, request);
-	else
-		execlists_context_queue(request);
-}
-
 bool intel_execlists_retire_requests(struct intel_engine_cs *ring)
 {
 	struct drm_i915_gem_request *req, *tmp;
-	struct list_head retired_list;
+	struct list_head list;
 
-	WARN_ON(!mutex_is_locked(&ring->dev->struct_mutex));
-	if (list_empty(&ring->execlist_retired_req_list))
+	lockdep_assert_held(&ring->dev->struct_mutex);
+	if (list_empty(&ring->execlist_completed))
 		goto out;
 
-	INIT_LIST_HEAD(&retired_list);
 	spin_lock(&ring->execlist_lock);
-	list_replace_init(&ring->execlist_retired_req_list, &retired_list);
+	list_replace_init(&ring->execlist_completed, &list);
 	spin_unlock(&ring->execlist_lock);
 
-	list_for_each_entry_safe(req, tmp, &retired_list, execlist_link) {
+	list_for_each_entry_safe(req, tmp, &list, execlist_link) {
 		struct intel_context *ctx = req->ctx;
 		struct drm_i915_gem_object *ctx_obj =
 				ctx->engine[ring->id].state;
 
 		if (ctx_obj && (ctx != ring->default_context))
 			intel_lr_context_unpin(req);
-		list_del(&req->execlist_link);
+
 		i915_gem_request_put(req);
 	}
 
 out:
-	return list_empty(&ring->execlist_queue);
+	return READ_ONCE(ring->execlist_port[0]) == NULL;
 }
 
 void intel_logical_ring_stop(struct intel_engine_cs *ring)
@@ -720,6 +562,7 @@ static int intel_lr_context_do_pin(struct intel_engine_cs *ring,
 		struct intel_ring *ringbuf)
 {
 	struct drm_i915_private *dev_priv = ring->i915;
+	u32 ggtt_offset;
 	int ret = 0;
 
 	WARN_ON(!mutex_is_locked(&ring->dev->struct_mutex));
@@ -734,6 +577,16 @@ static int intel_lr_context_do_pin(struct intel_engine_cs *ring,
 
 	ctx_obj->dirty = true;
 
+	ggtt_offset =
+		i915_gem_obj_ggtt_offset(ctx_obj) + LRC_PPHWSP_PN * PAGE_SIZE;
+	ringbuf->context_descriptor =
+		ggtt_offset | ring->execlist_context_descriptor;
+
+	ringbuf->registers =
+		kmap(i915_gem_object_get_dirty_page(ctx_obj, LRC_STATE_PN));
+	ringbuf->registers[CTX_RING_BUFFER_START+1] =
+		i915_gem_obj_ggtt_offset(ringbuf->obj);
+
 	/* Invalidate GuC TLB. */
 	if (i915.enable_guc_submission)
 		I915_WRITE(GEN8_GTCR, GEN8_GTCR_INVALIDATE);
@@ -768,6 +621,7 @@ static int intel_lr_context_pin(struct drm_i915_gem_request *rq)
 
 void intel_lr_context_unpin(struct drm_i915_gem_request *rq)
 {
+	struct drm_i915_gem_object *ctx_obj;
 	int engine = rq->engine->id;
 
 	WARN_ON(!mutex_is_locked(&rq->i915->dev->struct_mutex));
@@ -775,7 +629,10 @@ void intel_lr_context_unpin(struct drm_i915_gem_request *rq)
 		return;
 
 	intel_ring_unmap(rq->ring);
-	i915_gem_object_ggtt_unpin(rq->ctx->engine[engine].state);
+
+	ctx_obj = rq->ctx->engine[engine].state;
+	kunmap(i915_gem_object_get_page(ctx_obj, LRC_STATE_PN));
+	i915_gem_object_ggtt_unpin(ctx_obj);
 	i915_gem_context_unreference(rq->ctx);
 }
 
@@ -1168,12 +1025,39 @@ out:
 	return ret;
 }
 
+static bool disable_lite_restore_wa(struct intel_engine_cs *ring)
+{
+	return (IS_SKL_REVID(ring->i915, 0, SKL_REVID_B0) ||
+		IS_BXT_REVID(ring->i915, 0, BXT_REVID_A1)) &&
+		(ring->id == VCS || ring->id == VCS2);
+}
+
+static uint64_t lr_context_descriptor(struct intel_engine_cs *ring)
+{
+	uint64_t desc;
+
+	desc = GEN8_CTX_VALID;
+	desc |= GEN8_CTX_ADDRESSING_MODE(ring->i915) << GEN8_CTX_ADDRESSING_MODE_SHIFT;
+	if (IS_GEN8(ring->i915))
+		desc |= GEN8_CTX_L3LLC_COHERENT;
+	desc |= GEN8_CTX_PRIVILEGE;
+
+	/* TODO: WaDisableLiteRestore when we start using semaphore
+	 * signalling between Command Streamers */
+	/* desc |= GEN8_CTX_FORCE_RESTORE; */
+
+	/* WaEnableForceRestoreInCtxtDescForVCS:skl */
+	/* WaEnableForceRestoreInCtxtDescForVCS:bxt */
+	if (disable_lite_restore_wa(ring))
+		desc |= GEN8_CTX_FORCE_RESTORE;
+
+	return desc;
+}
+
 static int gen8_init_common_ring(struct intel_engine_cs *ring)
 {
 	struct drm_device *dev = ring->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	u8 next_context_status_buffer_hw;
-
 	lrc_setup_hardware_status_page(ring,
 				ring->default_context->engine[ring->id].state);
 
@@ -1197,18 +1081,6 @@ static int gen8_init_common_ring(struct intel_engine_cs *ring)
 	 * SKL  |         ?                |         ?            |
 	 * BXT  |         ?                |         ?            |
 	 */
-	next_context_status_buffer_hw =
-		GEN8_CSB_WRITE_PTR(I915_READ(RING_CONTEXT_STATUS_PTR(ring)));
-
-	/*
-	 * When the CSB registers are reset (also after power-up / gpu reset),
-	 * CSB write pointer is set to all 1's, which is not valid, use '5' in
-	 * this special case, so the first element read is CSB[0].
-	 */
-	if (next_context_status_buffer_hw == GEN8_CSB_PTR_MASK)
-		next_context_status_buffer_hw = (GEN8_CSB_ENTRIES - 1);
-
-	ring->next_context_status_buffer = next_context_status_buffer_hw;
 	DRM_DEBUG_DRIVER("Execlists enabled for %s\n", ring->name);
 
 	memset(&ring->hangcheck, 0, sizeof(ring->hangcheck));
@@ -1482,7 +1354,8 @@ static int gen8_add_request(struct drm_i915_gem_request *request)
 	intel_ring_emit(ring, request->fence.seqno);
 	intel_ring_emit(ring, MI_USER_INTERRUPT);
 	intel_ring_emit(ring, MI_NOOP);
-	intel_logical_ring_advance_and_submit(request);
+	intel_ring_advance(ring);
+	request->tail = ring->tail;
 
 	/*
 	 * Here we add two extra NOOPs as padding to avoid
@@ -1491,6 +1364,12 @@ static int gen8_add_request(struct drm_i915_gem_request *request)
 	intel_ring_emit(ring, MI_NOOP);
 	intel_ring_emit(ring, MI_NOOP);
 	intel_ring_advance(ring);
+	request->wa_tail = ring->tail;
+
+	if (request->i915->guc.execbuf_client)
+		i915_guc_submit(request->i915->guc.execbuf_client, request);
+	else
+		execlists_context_queue(request);
 
 	return 0;
 }
@@ -1569,9 +1448,11 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
 
 	INIT_LIST_HEAD(&ring->buffers);
 	INIT_LIST_HEAD(&ring->execlist_queue);
-	INIT_LIST_HEAD(&ring->execlist_retired_req_list);
+	INIT_LIST_HEAD(&ring->execlist_completed);
 	spin_lock_init(&ring->execlist_lock);
 
+	ring->execlist_context_descriptor = lr_context_descriptor(ring);
+
 	ret = i915_cmd_parser_init_ring(ring);
 	if (ret)
 		goto error;
@@ -1592,8 +1473,6 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
 		goto error;
 	}
 
-	ring->next_context_status_buffer =
-			I915_READ(RING_CONTEXT_STATUS_PTR(ring)) & GEN8_CSB_PTR_MASK;
 	task = kthread_run(intel_execlists_submit, ring,
 			   "irq/i915:%de", ring->id);
 	if (IS_ERR(task))
@@ -1904,9 +1783,7 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
 					  CTX_CTRL_RS_CTX_ENABLE));
 	ASSIGN_CTX_REG(reg_state, CTX_RING_HEAD, RING_HEAD(ring->mmio_base), 0);
 	ASSIGN_CTX_REG(reg_state, CTX_RING_TAIL, RING_TAIL(ring->mmio_base), 0);
-	/* Ring buffer start address is not known until the buffer is pinned.
-	 * It is written to the context image in execlists_update_context()
-	 */
+	/* Ring buffer start address is not known until the buffer is pinned. */
 	ASSIGN_CTX_REG(reg_state, CTX_RING_BUFFER_START, RING_START(ring->mmio_base), 0);
 	ASSIGN_CTX_REG(reg_state, CTX_RING_BUFFER_CONTROL, RING_CTL(ring->mmio_base),
 		       ((ringbuf->size - PAGE_SIZE) & RING_NR_PAGES) | RING_VALID);
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 33f82a84065a..37601a35d5fc 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -74,12 +74,9 @@ int intel_lr_context_deferred_alloc(struct intel_context *ctx,
 void intel_lr_context_unpin(struct drm_i915_gem_request *req);
 void intel_lr_context_reset(struct drm_device *dev,
 			struct intel_context *ctx);
-uint64_t intel_lr_context_descriptor(struct intel_context *ctx,
-				     struct intel_engine_cs *ring);
 
 /* Execlists */
 int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists);
-u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj);
 
 bool intel_execlists_retire_requests(struct intel_engine_cs *ring);
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index edaf07b2292e..3d4d5711aea9 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -122,6 +122,9 @@ struct intel_ring {
 	 * we can detect new retirements.
 	 */
 	u32 last_retired_head;
+
+	u32 context_descriptor;
+	u32 *registers;
 };
 
 struct	intel_context;
@@ -293,9 +296,10 @@ struct intel_engine_cs {
 	/* Execlists */
 	struct task_struct *execlists_submit;
 	spinlock_t execlist_lock;
+	struct drm_i915_gem_request *execlist_port[2];
 	struct list_head execlist_queue;
-	struct list_head execlist_retired_req_list;
-	u8 next_context_status_buffer;
+	struct list_head execlist_completed;
+	u32 execlist_context_descriptor;
 	u32             irq_keep_mask; /* bitmask for interrupts that should not be masked */
 
 	/**
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 090/190] drm/i915: Refactor execlists default context pinning
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
  2016-01-11 10:44   ` [PATCH 088/190] drm/i915: Move execlists interrupt based submission to a bottom-half Chris Wilson
  2016-01-11 10:44   ` [PATCH 089/190] drm/i915: Tidy execlists submission and tracking Chris Wilson
@ 2016-01-11 10:44   ` Chris Wilson
  2016-01-11 10:44   ` [PATCH 091/190] drm/i915: Move context initialisation to first-use Chris Wilson
                     ` (50 subsequent siblings)
  53 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:44 UTC (permalink / raw)
  To: intel-gfx

Refactor pinning and unpinning of contexts, such that the default
context for an engine is pinned during initialisation and unpinned
during teardown (pinning of the context handles the reference counting).
Thus we can eliminate the special-case handling of the default context
that was required to mask the fact that it was not being pinned normally.

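The pinning itself is the familiar counted-pin pattern; a minimal
sketch, assuming struct_mutex is held (ce, do_pin_and_map() and
do_unmap_and_unpin() are illustrative stand-ins for the helpers in
the patch):

  struct ctx_engine {
          int pin_count;
          /* ... state object, ring, mapped register page ... */
  };

  static int do_pin_and_map(struct ctx_engine *ce);      /* expensive setup */
  static void do_unmap_and_unpin(struct ctx_engine *ce); /* release */

  static int context_pin(struct ctx_engine *ce)
  {
          if (ce->pin_count++)
                  return 0;               /* already resident, just count */

          int ret = do_pin_and_map(ce);   /* first pin does the real work */
          if (ret)
                  ce->pin_count = 0;      /* unwind on failure */
          return ret;
  }

  static void context_unpin(struct ctx_engine *ce)
  {
          if (--ce->pin_count == 0)
                  do_unmap_and_unpin(ce); /* last unpin releases */
  }
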
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c     |   7 +-
 drivers/gpu/drm/i915/i915_gem_request.c |   6 +-
 drivers/gpu/drm/i915/intel_lrc.c        | 117 +++++++++++++-------------------
 drivers/gpu/drm/i915/intel_lrc.h        |   3 +-
 4 files changed, 53 insertions(+), 80 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index a5ea90944bbb..ea5b9f6d0fc9 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2052,11 +2052,8 @@ static int i915_dump_lrc(struct seq_file *m, void *unused)
 		return ret;
 
 	list_for_each_entry(ctx, &dev_priv->context_list, link) {
-		for_each_ring(ring, dev_priv, i) {
-			if (ring->default_context != ctx)
-				i915_dump_lrc_obj(m, ring,
-						  ctx->engine[i].state);
-		}
+		for_each_ring(ring, dev_priv, i)
+			i915_dump_lrc_obj(m, ring, ctx->engine[i].state);
 	}
 
 	mutex_unlock(&dev->struct_mutex);
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 069c0b9dfd95..61be8dda4a14 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -345,10 +345,8 @@ static void __i915_gem_request_retire_active(struct drm_i915_gem_request *req)
 void i915_gem_request_cancel(struct drm_i915_gem_request *req)
 {
 	intel_ring_reserved_space_cancel(req->ring);
-	if (i915.enable_execlists) {
-		if (req->ctx != req->engine->default_context)
-			intel_lr_context_unpin(req);
-	}
+	if (i915.enable_execlists)
+		intel_lr_context_unpin(req->ctx, req->engine);
 
 	/* If a request is to be discarded after actions have been queued upon
 	 * it, we cannot unwind that request and it must be submitted rather
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 80b346a3fd8a..31fbb482d15c 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -227,7 +227,8 @@ enum {
 #define GEN8_CTX_ID_SHIFT 32
 #define CTX_RCS_INDIRECT_CTX_OFFSET_DEFAULT  0x17
 
-static int intel_lr_context_pin(struct drm_i915_gem_request *rq);
+static int intel_lr_context_pin(struct intel_context *ctx,
+				struct intel_engine_cs *engine);
 static void lrc_setup_hardware_status_page(struct intel_engine_cs *ring,
 		struct drm_i915_gem_object *default_ctx_obj);
 
@@ -485,11 +486,9 @@ int intel_logical_ring_alloc_request_extras(struct drm_i915_gem_request *request
 
 	request->ring = request->ctx->engine[request->engine->id].ring;
 
-	if (request->ctx != request->engine->default_context) {
-		ret = intel_lr_context_pin(request);
-		if (ret)
-			return ret;
-	}
+	ret = intel_lr_context_pin(request->ctx, request->engine);
+	if (ret)
+		return ret;
 
 	if (i915.enable_guc_submission) {
 		/*
@@ -521,13 +520,7 @@ bool intel_execlists_retire_requests(struct intel_engine_cs *ring)
 	spin_unlock(&ring->execlist_lock);
 
 	list_for_each_entry_safe(req, tmp, &list, execlist_link) {
-		struct intel_context *ctx = req->ctx;
-		struct drm_i915_gem_object *ctx_obj =
-				ctx->engine[ring->id].state;
-
-		if (ctx_obj && (ctx != ring->default_context))
-			intel_lr_context_unpin(req);
-
+		intel_lr_context_unpin(req->ctx, req->engine);
 		i915_gem_request_put(req);
 	}
 
@@ -557,83 +550,73 @@ void intel_logical_ring_stop(struct intel_engine_cs *ring)
 	I915_WRITE_MODE(ring, _MASKED_BIT_DISABLE(STOP_RING));
 }
 
-static int intel_lr_context_do_pin(struct intel_engine_cs *ring,
-		struct drm_i915_gem_object *ctx_obj,
-		struct intel_ring *ringbuf)
+static int intel_lr_context_pin(struct intel_context *ctx,
+				struct intel_engine_cs *engine)
 {
-	struct drm_i915_private *dev_priv = ring->i915;
+	struct drm_i915_private *dev_priv = engine->i915;
+	struct drm_i915_gem_object *ctx_obj;
+	struct intel_ring *ring;
 	u32 ggtt_offset;
 	int ret = 0;
 
-	WARN_ON(!mutex_is_locked(&ring->dev->struct_mutex));
+	if (ctx->engine[engine->id].pin_count++)
+		return 0;
+
+	lockdep_assert_held(&engine->dev->struct_mutex);
+
+	ctx_obj = ctx->engine[engine->id].state;
 	ret = i915_gem_obj_ggtt_pin(ctx_obj, GEN8_LR_CONTEXT_ALIGN,
 				    PIN_OFFSET_BIAS | GUC_WOPCM_TOP);
 	if (ret)
-		return ret;
+		goto err;
 
-	ret = intel_ring_map(ringbuf);
+	ring = ctx->engine[engine->id].ring;
+	ret = intel_ring_map(ring);
 	if (ret)
 		goto unpin_ctx_obj;
 
+	i915_gem_context_reference(ctx);
 	ctx_obj->dirty = true;
 
 	ggtt_offset =
 		i915_gem_obj_ggtt_offset(ctx_obj) + LRC_PPHWSP_PN * PAGE_SIZE;
-	ringbuf->context_descriptor =
-		ggtt_offset | ring->execlist_context_descriptor;
+	ring->context_descriptor =
+		ggtt_offset | engine->execlist_context_descriptor;
 
-	ringbuf->registers =
+	ring->registers =
 		kmap(i915_gem_object_get_dirty_page(ctx_obj, LRC_STATE_PN));
-	ringbuf->registers[CTX_RING_BUFFER_START+1] =
-		i915_gem_obj_ggtt_offset(ringbuf->obj);
+	ring->registers[CTX_RING_BUFFER_START+1] =
+		i915_gem_obj_ggtt_offset(ring->obj);
 
 	/* Invalidate GuC TLB. */
 	if (i915.enable_guc_submission)
 		I915_WRITE(GEN8_GTCR, GEN8_GTCR_INVALIDATE);
 
-	return ret;
+	return 0;
 
 unpin_ctx_obj:
 	i915_gem_object_ggtt_unpin(ctx_obj);
-
+err:
+	ctx->engine[engine->id].pin_count = 0;
 	return ret;
 }
 
-static int intel_lr_context_pin(struct drm_i915_gem_request *rq)
-{
-	int engine = rq->engine->id;
-	int ret;
-
-	if (rq->ctx->engine[engine].pin_count++)
-		return 0;
-
-	ret = intel_lr_context_do_pin(rq->engine,
-				      rq->ctx->engine[engine].state,
-				      rq->ring);
-	if (ret) {
-		rq->ctx->engine[engine].pin_count = 0;
-		return ret;
-	}
-
-	i915_gem_context_reference(rq->ctx);
-	return 0;
-}
-
-void intel_lr_context_unpin(struct drm_i915_gem_request *rq)
+void intel_lr_context_unpin(struct intel_context *ctx,
+			    struct intel_engine_cs *engine)
 {
 	struct drm_i915_gem_object *ctx_obj;
-	int engine = rq->engine->id;
 
-	WARN_ON(!mutex_is_locked(&rq->i915->dev->struct_mutex));
-	if (--rq->ctx->engine[engine].pin_count)
+	lockdep_assert_held(&engine->dev->struct_mutex);
+	if (--ctx->engine[engine->id].pin_count)
 		return;
 
-	intel_ring_unmap(rq->ring);
+	intel_ring_unmap(ctx->engine[engine->id].ring);
 
-	ctx_obj = rq->ctx->engine[engine].state;
+	ctx_obj = ctx->engine[engine->id].state;
 	kunmap(i915_gem_object_get_page(ctx_obj, LRC_STATE_PN));
 	i915_gem_object_ggtt_unpin(ctx_obj);
-	i915_gem_context_unreference(rq->ctx);
+
+	i915_gem_context_unreference(ctx);
 }
 
 static int intel_logical_ring_workarounds_emit(struct drm_i915_gem_request *req)
@@ -1425,6 +1408,7 @@ void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
 		kunmap(sg_page(ring->status_page.obj->pages->sgl));
 		ring->status_page.obj = NULL;
 	}
+	intel_lr_context_unpin(ring->default_context, ring);
 
 	lrc_destroy_wa_ctx_obj(ring);
 	ring->dev = NULL;
@@ -1433,6 +1417,7 @@ void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
 static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *ring)
 {
 	struct drm_i915_private *dev_priv = to_i915(dev);
+	struct intel_context *ctx;
 	struct task_struct *task;
 	int ret;
 
@@ -1457,19 +1442,17 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
 	if (ret)
 		goto error;
 
-	ret = intel_lr_context_deferred_alloc(ring->default_context, ring);
+	ctx = ring->default_context;
+
+	ret = intel_lr_context_deferred_alloc(ctx, ring);
 	if (ret)
 		goto error;
 
 	/* As this is the default context, always pin it */
-	ret = intel_lr_context_do_pin(
-			ring,
-			ring->default_context->engine[ring->id].state,
-			ring->default_context->engine[ring->id].ring);
+	ret = intel_lr_context_pin(ctx, ring);
 	if (ret) {
-		DRM_ERROR(
-			"Failed to pin and map ringbuffer %s: %d\n",
-			ring->name, ret);
+		DRM_ERROR("Failed to pin context for %s: %d\n",
+			  ring->name, ret);
 		goto error;
 	}
 
@@ -1872,15 +1855,9 @@ void intel_lr_context_free(struct intel_context *ctx)
 		struct drm_i915_gem_object *ctx_obj = ctx->engine[i].state;
 
 		if (ctx_obj) {
-			struct intel_ring *ring = ctx->engine[i].ring;
-			struct intel_engine_cs *engine = ring->engine;
+			WARN_ON(ctx->engine[i].pin_count);
 
-			if (ctx == engine->default_context) {
-				intel_ring_unmap(ring);
-				i915_gem_object_ggtt_unpin(ctx_obj);
-			}
-			WARN_ON(ctx->engine[engine->id].pin_count);
-			intel_ring_free(ring);
+			intel_ring_free(ctx->engine[i].ring);
 			drm_gem_object_unreference(&ctx_obj->base);
 		}
 	}
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 37601a35d5fc..a43d1e5e5f5a 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -71,7 +71,8 @@ void intel_lr_context_free(struct intel_context *ctx);
 uint32_t intel_lr_context_size(struct intel_engine_cs *ring);
 int intel_lr_context_deferred_alloc(struct intel_context *ctx,
 				    struct intel_engine_cs *ring);
-void intel_lr_context_unpin(struct drm_i915_gem_request *req);
+void intel_lr_context_unpin(struct intel_context *ctx,
+			    struct intel_engine_cs *engine);
 void intel_lr_context_reset(struct drm_device *dev,
 			struct intel_context *ctx);
 
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 091/190] drm/i915: Move context initialisation to first-use
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
                     ` (2 preceding siblings ...)
  2016-01-11 10:44   ` [PATCH 090/190] drm/i915: Refactor execlists default context pinning Chris Wilson
@ 2016-01-11 10:44   ` Chris Wilson
  2016-01-11 10:44   ` [PATCH 092/190] drm/i915: Move the magical deferred context allocation into the request Chris Wilson
                     ` (49 subsequent siblings)
  53 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:44 UTC (permalink / raw)
  To: intel-gfx

Instead of allocating a new request when allocating a context, use the
request that initiated the allocation to emit the context
initialisation.

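The resulting request-construction path is roughly the following
sketch (prepare_context() and emit_init_context() are illustrative
names for intel_logical_ring_alloc_request_extras() and
engine->init_context() in the patch; pinning as in the previous
patch):

  /* Pin the context for the lifetime of the request and, on first
   * use only, emit the context initialisation into this very request
   * instead of allocating a dedicated one.
   */
  static int prepare_context(struct ctx_engine *ce, struct request *rq)
  {
          int ret;

          ret = context_pin(ce);
          if (ret)
                  return ret;

          if (!ce->initialised) {
                  ret = emit_init_context(rq);    /* records golden state */
                  if (ret) {
                          context_unpin(ce);
                          return ret;
                  }
                  ce->initialised = true;
          }

          return 0;
  }
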
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h  |  1 +
 drivers/gpu/drm/i915/intel_lrc.c | 42 ++++++++++++++++------------------------
 2 files changed, 18 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 4e912fd3b8c6..f5f457920944 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -885,6 +885,7 @@ struct intel_context {
 		struct drm_i915_gem_object *state;
 		struct intel_ring *ring;
 		int pin_count;
+		bool initialised;
 	} engine[I915_NUM_RINGS];
 
 	struct list_head link;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 31fbb482d15c..f892e658cd4b 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -482,14 +482,9 @@ static int execlists_context_queue(struct drm_i915_gem_request *request)
 
 int intel_logical_ring_alloc_request_extras(struct drm_i915_gem_request *request)
 {
+	struct intel_engine_cs *engine = request->engine;
 	int ret;
 
-	request->ring = request->ctx->engine[request->engine->id].ring;
-
-	ret = intel_lr_context_pin(request->ctx, request->engine);
-	if (ret)
-		return ret;
-
 	if (i915.enable_guc_submission) {
 		/*
 		 * Check that the GuC has space for the request before
@@ -503,6 +498,21 @@ int intel_logical_ring_alloc_request_extras(struct drm_i915_gem_request *request
 			return ret;
 	}
 
+	request->ring = request->ctx->engine[engine->id].ring;
+
+	ret = intel_lr_context_pin(request->ctx, engine);
+	if (ret)
+		return ret;
+
+	if (!request->ctx->engine[engine->id].initialised) {
+		ret = engine->init_context(request);
+		if (ret) {
+			intel_lr_context_unpin(request->ctx, engine);
+			return ret;
+		}
+		request->ctx->engine[engine->id].initialised = true;
+	}
+
 	return 0;
 }
 
@@ -1968,26 +1978,8 @@ int intel_lr_context_deferred_alloc(struct intel_context *ctx,
 
 	ctx->engine[engine->id].ring = ring;
 	ctx->engine[engine->id].state = ctx_obj;
+	ctx->engine[engine->id].initialised = engine->init_context == NULL;
 
-	if (ctx != engine->default_context && engine->init_context) {
-		struct drm_i915_gem_request *req;
-
-		req = i915_gem_request_alloc(engine, ctx);
-		if (IS_ERR(req)) {
-			DRM_ERROR("ring create req: %d\n",
-				ret);
-			goto error_ringbuf;
-		}
-
-		ret = engine->init_context(req);
-		if (ret) {
-			DRM_ERROR("ring init context: %d\n",
-				ret);
-			i915_gem_request_cancel(req);
-			goto error_ringbuf;
-		}
-		i915_add_request_no_flush(req);
-	}
 	return 0;
 
 error_ringbuf:
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 092/190] drm/i915: Move the magical deferred context allocation into the request
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
                     ` (3 preceding siblings ...)
  2016-01-11 10:44   ` [PATCH 091/190] drm/i915: Move context initialisation to first-use Chris Wilson
@ 2016-01-11 10:44   ` Chris Wilson
  2016-01-11 10:44   ` [PATCH 093/190] drm/i915: Move the forced switch back to the kernel context into eviction Chris Wilson
                     ` (48 subsequent siblings)
  53 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:44 UTC (permalink / raw)
  To: intel-gfx

We can hide more details of execlists from higher-level code by moving
the explicit creation of an execlist context into its first use.

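Distilled, the first-use hook added to the request-construction path
below is just:

  /* Allocate the LRC state for this (ctx, engine) pairing the first
   * time a request is constructed for it, rather than when userspace
   * creates the context.
   */
  if (request->ctx->engine[engine->id].state == NULL) {
          ret = execlists_context_deferred_alloc(request->ctx, engine);
          if (ret)
                  return ret;
  }
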
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  8 --------
 drivers/gpu/drm/i915/intel_lrc.c           | 14 ++++++++++----
 drivers/gpu/drm/i915/intel_lrc.h           |  2 --
 3 files changed, 10 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 19d32f22f85d..7a9d3f4732e9 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1093,14 +1093,6 @@ i915_gem_validate_context(struct drm_device *dev, struct drm_file *file,
 		return ERR_PTR(-EIO);
 	}
 
-	if (i915.enable_execlists && !ctx->engine[ring->id].state) {
-		int ret = intel_lr_context_deferred_alloc(ctx, ring);
-		if (ret) {
-			DRM_DEBUG("Could not create LRC %u: %d\n", ctx_id, ret);
-			return ERR_PTR(ret);
-		}
-	}
-
 	return ctx;
 }
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index f892e658cd4b..3a2088f9d0be 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -227,6 +227,8 @@ enum {
 #define GEN8_CTX_ID_SHIFT 32
 #define CTX_RCS_INDIRECT_CTX_OFFSET_DEFAULT  0x17
 
+static int execlists_context_deferred_alloc(struct intel_context *ctx,
+					    struct intel_engine_cs *engine);
 static int intel_lr_context_pin(struct intel_context *ctx,
 				struct intel_engine_cs *engine);
 static void lrc_setup_hardware_status_page(struct intel_engine_cs *ring,
@@ -494,6 +496,12 @@ int intel_logical_ring_alloc_request_extras(struct drm_i915_gem_request *request
 		struct intel_guc *guc = &request->i915->guc;
 
 		ret = i915_guc_wq_check_space(guc->execbuf_client);
+		if (ret)
+			return ret;
+	}
+
+	if (request->ctx->engine[engine->id].state == NULL) {
+		ret = execlists_context_deferred_alloc(request->ctx, engine);
 		if (ret)
 			return ret;
 	}
@@ -1454,7 +1460,7 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
 
 	ctx = ring->default_context;
 
-	ret = intel_lr_context_deferred_alloc(ctx, ring);
+	ret = execlists_context_deferred_alloc(ctx, ring);
 	if (ret)
 		goto error;
 
@@ -1930,7 +1936,7 @@ static void lrc_setup_hardware_status_page(struct intel_engine_cs *ring,
 }
 
 /**
- * intel_lr_context_deferred_alloc() - create the LRC specific bits of a context
+ * execlists_context_deferred_alloc() - create the LRC specific bits of a context
  * @ctx: LR context to create.
  * @ring: engine to be used with the context.
  *
@@ -1942,8 +1948,8 @@ static void lrc_setup_hardware_status_page(struct intel_engine_cs *ring,
  *
  * Return: non-zero on error.
  */
-int intel_lr_context_deferred_alloc(struct intel_context *ctx,
-				    struct intel_engine_cs *engine)
+static int execlists_context_deferred_alloc(struct intel_context *ctx,
+					    struct intel_engine_cs *engine)
 {
 	struct drm_i915_gem_object *ctx_obj;
 	uint32_t context_size;
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index a43d1e5e5f5a..a454372fe660 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -69,8 +69,6 @@ int intel_logical_rings_init(struct drm_device *dev);
 
 void intel_lr_context_free(struct intel_context *ctx);
 uint32_t intel_lr_context_size(struct intel_engine_cs *ring);
-int intel_lr_context_deferred_alloc(struct intel_context *ctx,
-				    struct intel_engine_cs *ring);
 void intel_lr_context_unpin(struct intel_context *ctx,
 			    struct intel_engine_cs *engine);
 void intel_lr_context_reset(struct drm_device *dev,
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 093/190] drm/i915: Move the forced switch back to the kernel context into eviction
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
                     ` (4 preceding siblings ...)
  2016-01-11 10:44   ` [PATCH 092/190] drm/i915: Move the magical deferred context allocation into the request Chris Wilson
@ 2016-01-11 10:44   ` Chris Wilson
  2016-01-11 10:44   ` [PATCH 094/190] drm/i915: Remove early l3-remap Chris Wilson
                     ` (47 subsequent siblings)
  53 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:44 UTC (permalink / raw)
  To: intel-gfx

Currently, we always switch back to the kernel context (if available,
i.e. legacy HW contexts, not execlists) whenever we try to idle the GPU.
We actually only require the switch when trying to evict everything (in
order to prevent fragmentation from placement of the currently active
context) from the global GTT, so move the forced switch into that one
callsite.

In the process, update the comments regarding mode of operation, in
particular the distinction between evicting from the global GTT (which
may contain untracked items and transient global pins) and the
per-process GTT.

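The new ordering when eviction finds nothing can be distilled from
the hunks below (a sketch; under execlists switch_to_pinned_context()
returns 0 and the idle is skipped entirely):

  if (!i915_is_ggtt(vm) || flags & PIN_NONBLOCK)
          return -ENOSPC;         /* a ppgtt holds no untracked pins */

  ret = switch_to_pinned_context(to_i915(dev));
  if (ret < 0)
          return ret;

  if (ret > 0) {                  /* at least one switch was emitted */
          ret = i915_gpu_idle(dev);
          if (ret)
                  return ret;

          i915_gem_retire_requests(dev);
          goto search_again;      /* context pins dropped; rescan */
  }
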
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c       |  14 ----
 drivers/gpu/drm/i915/i915_gem_evict.c | 140 +++++++++++++++++++++-------------
 2 files changed, 88 insertions(+), 66 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 054e11cff00f..989222eb107b 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2691,21 +2691,7 @@ int i915_gpu_idle(struct drm_device *dev)
 	struct intel_engine_cs *ring;
 	int ret, i;
 
-	/* Flush everything onto the inactive list. */
 	for_each_ring(ring, dev_priv, i) {
-		if (!i915.enable_execlists) {
-			struct drm_i915_gem_request *req;
-
-			req = i915_gem_request_alloc(ring, ring->default_context);
-			if (IS_ERR(req))
-				return PTR_ERR(req);
-
-			ret = i915_switch_context(req);
-			i915_add_request_no_flush(req);
-			if (ret)
-				return ret;
-		}
-
 		ret = intel_engine_idle(ring);
 		if (ret)
 			return ret;
diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index ea1f8d1bd228..b7bcc324a7a7 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -33,6 +33,36 @@
 #include "intel_drv.h"
 #include "i915_trace.h"
 
+static int switch_to_pinned_context(struct drm_i915_private *dev_priv)
+{
+	struct intel_engine_cs *ring;
+	int ret, i;
+	int count = 0;
+
+	if (i915.enable_execlists)
+		return 0;
+
+	for_each_ring(ring, dev_priv, i) {
+		struct drm_i915_gem_request *req;
+
+		if (ring->last_context == ring->default_context)
+			continue;
+
+		req = i915_gem_request_alloc(ring, ring->default_context);
+		if (IS_ERR(req))
+			return PTR_ERR(req);
+
+		ret = i915_switch_context(req);
+		i915_add_request_no_flush(req);
+		if (ret)
+			return ret;
+
+		count++;
+	}
+
+	return count;
+}
+
 static bool
 mark_free(struct i915_vma *vma, struct list_head *unwind)
 {
@@ -76,37 +106,33 @@ i915_gem_evict_something(struct drm_device *dev, struct i915_address_space *vm,
 			 unsigned long start, unsigned long end,
 			 unsigned flags)
 {
-	struct list_head eviction_list, unwind_list;
-	struct i915_vma *vma;
-	int ret = 0;
-	int pass = 0;
+	struct list_head eviction_list;
+	struct list_head *phases[] = {
+		&vm->inactive_list,
+		&vm->active_list,
+		NULL,
+	}, **phase;
+	struct i915_vma *vma, *next;
+	int ret;
 
 	trace_i915_gem_evict(dev, min_size, alignment, flags);
 
 	/*
 	 * The goal is to evict objects and amalgamate space in LRU order.
 	 * The oldest idle objects reside on the inactive list, which is in
-	 * retirement order. The next objects to retire are those on the (per
-	 * ring) active list that do not have an outstanding flush. Once the
-	 * hardware reports completion (the seqno is updated after the
-	 * batchbuffer has been finished) the clean buffer objects would
-	 * be retired to the inactive list. Any dirty objects would be added
-	 * to the tail of the flushing list. So after processing the clean
-	 * active objects we need to emit a MI_FLUSH to retire the flushing
-	 * list, hence the retirement order of the flushing list is in
-	 * advance of the dirty objects on the active lists.
+	 * retirement order. The next objects to retire are those in flight,
+	 * on the active list, again in retirement order.
 	 *
 	 * The retirement sequence is thus:
 	 *   1. Inactive objects (already retired)
-	 *   2. Clean active objects
-	 *   3. Flushing list
-	 *   4. Dirty active objects.
+	 *   2. Active objects (will stall on unbinding)
 	 *
 	 * On each list, the oldest objects lie at the HEAD with the freshest
 	 * object on the TAIL.
 	 */
 
-	INIT_LIST_HEAD(&unwind_list);
+search_again:
+	INIT_LIST_HEAD(&eviction_list);
 	if (start != 0 || end != vm->total) {
 		drm_mm_init_scan_with_range(&vm->mm, min_size,
 					    alignment, cache_level,
@@ -114,26 +140,19 @@ i915_gem_evict_something(struct drm_device *dev, struct i915_address_space *vm,
 	} else
 		drm_mm_init_scan(&vm->mm, min_size, alignment, cache_level);
 
-search_again:
-	/* First see if there is a large enough contiguous idle region... */
-	list_for_each_entry(vma, &vm->inactive_list, vm_link) {
-		if (mark_free(vma, &unwind_list))
-			goto found;
-	}
-
 	if (flags & PIN_NONBLOCK)
-		goto none;
+		phases[1] = NULL;
 
-	/* Now merge in the soon-to-be-expired objects... */
-	list_for_each_entry(vma, &vm->active_list, vm_link) {
-		if (mark_free(vma, &unwind_list))
-			goto found;
-	}
+	phase = phases;
+	do {
+		list_for_each_entry(vma, *phase, vm_link)
+			if (mark_free(vma, &eviction_list))
+				goto found;
+	} while (*++phase);
 
-none:
 	/* Nothing found, clean up and bail out! */
-	while (!list_empty(&unwind_list)) {
-		vma = list_first_entry(&unwind_list,
+	while (!list_empty(&eviction_list)) {
+		vma = list_first_entry(&eviction_list,
 				       struct i915_vma,
 				       exec_list);
 		ret = drm_mm_scan_remove_block(&vma->node);
@@ -143,13 +162,24 @@ none:
 	}
 
 	/* Can we unpin some objects such as idle hw contents,
-	 * or pending flips?
+	 * or pending flips? But since only the GGTT has global entries
+	 * such as scanouts, ringbuffers and contexts, we can skip the
+	 * purge when inspecting per-process local address spaces.
 	 */
-	if (flags & PIN_NONBLOCK)
+	if (!i915_is_ggtt(vm) || flags & PIN_NONBLOCK)
 		return -ENOSPC;
 
-	/* Only idle the GPU and repeat the search once */
-	if (pass++ == 0) {
+	/* Not everything in the GGTT is tracked via vma (otherwise we
+	 * could evict as required with minimal stalling) so we are forced
+	 * to idle the GPU and explicitly retire outstanding requests in
+	 * the hopes that we can then remove contexts and the like only
+	 * bound by their active reference.
+	 */
+	ret = switch_to_pinned_context(to_i915(dev));
+	if (ret < 0)
+		return ret;
+
+	if (ret > 0) {
 		ret = i915_gpu_idle(dev);
 		if (ret)
 			return ret;
@@ -166,19 +196,16 @@ none:
 
 found:
 	/* drm_mm doesn't allow any other operations while
-	 * scanning, therefore store to be evicted objects on a
-	 * temporary list. */
-	INIT_LIST_HEAD(&eviction_list);
-	while (!list_empty(&unwind_list)) {
-		vma = list_first_entry(&unwind_list,
-				       struct i915_vma,
-				       exec_list);
-		if (drm_mm_scan_remove_block(&vma->node)) {
-			list_move(&vma->exec_list, &eviction_list);
+	 * scanning, therefore store to-be-evicted objects on a
+	 * temporary list and take a reference for all before
+	 * calling unbind (which may remove the active reference
+	 * of any of our objects, thus corrupting the list).
+	 */
+	list_for_each_entry_safe(vma, next, &eviction_list, exec_list) {
+		if (drm_mm_scan_remove_block(&vma->node))
 			drm_gem_object_reference(&vma->obj->base);
-			continue;
-		}
-		list_del_init(&vma->exec_list);
+		else
+			list_del_init(&vma->exec_list);
 	}
 
 	/* Unbinding will emit any required flushes */
@@ -195,7 +222,6 @@ found:
 
 		drm_gem_object_unreference(obj);
 	}
-
 	return ret;
 }
 
@@ -261,12 +287,22 @@ int i915_gem_evict_vm(struct i915_address_space *vm, bool do_idle)
 	trace_i915_gem_evict_vm(vm);
 
 	if (do_idle) {
+		/* Switch back to the default context in order to unpin
+		 * the existing context objects. However, such objects only
+		 * pin themselves inside the global GTT and performing the
+		 * switch otherwise is ineffective.
+		 */
+		if (i915_is_ggtt(vm)) {
+			ret = switch_to_pinned_context(to_i915(vm->dev));
+			if (ret)
+				return ret;
+		}
+
 		ret = i915_gpu_idle(vm->dev);
-		if (ret)
+		if (ret < 0)
 			return ret;
 
 		i915_gem_retire_requests(vm->dev);
-
 		WARN_ON(!list_empty(&vm->active_list));
 	}
 
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 094/190] drm/i915: Remove early l3-remap
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
                     ` (5 preceding siblings ...)
  2016-01-11 10:44   ` [PATCH 093/190] drm/i915: Move the forced switch back to the kernel context into eviction Chris Wilson
@ 2016-01-11 10:44   ` Chris Wilson
  2016-01-11 10:44   ` [PATCH 095/190] drm/i915: Rearrange switch_context to load the aliasing ppgtt on first use Chris Wilson
                     ` (46 subsequent siblings)
  53 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:44 UTC (permalink / raw)
  To: intel-gfx

Since we do the l3-remap on context switch, and proceed to do a context
switch immediately after manually doing the l3-remap, we can remove the
redundant manual call.
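
As an aside (not part of the patch itself), a sketch of the ordering
that makes the manual call redundant, using names from the diff below:

	/* before: i915_gem_init_hw() remapped, then switched context */
	for (j = 0; j < NUM_L3_SLICES(dev); j++)
		i915_gem_l3_remap(req, j);	/* manual remap ... */
	i915_gem_context_enable(req);		/* ... then the context
						 * switch remaps again via
						 * ctx->remap_slice */

	/* after: only the context-switch path (remap_l3() below) is left */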

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h         |  1 -
 drivers/gpu/drm/i915/i915_gem.c         | 35 +--------------------------------
 drivers/gpu/drm/i915/i915_gem_context.c | 30 +++++++++++++++++++++++++++-
 3 files changed, 30 insertions(+), 36 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index f5f457920944..7dc3eed71eb3 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2827,7 +2827,6 @@ bool i915_gem_clflush_object(struct drm_i915_gem_object *obj, bool force);
 int __must_check i915_gem_init(struct drm_device *dev);
 int i915_gem_init_rings(struct drm_device *dev);
 int __must_check i915_gem_init_hw(struct drm_device *dev);
-int i915_gem_l3_remap(struct drm_i915_gem_request *req, int slice);
 void i915_gem_init_swizzling(struct drm_device *dev);
 void i915_gem_cleanup_ringbuffer(struct drm_device *dev);
 int __must_check i915_gpu_idle(struct drm_device *dev);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 989222eb107b..379913221ab1 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3951,34 +3951,6 @@ err:
 	return ret;
 }
 
-int i915_gem_l3_remap(struct drm_i915_gem_request *req, int slice)
-{
-	struct drm_i915_private *dev_priv = req->i915;
-	u32 *remap_info = dev_priv->l3_parity.remap_info[slice];
-	int i, ret;
-
-	if (!HAS_L3_DPF(dev_priv) || !remap_info)
-		return 0;
-
-	ret = intel_ring_begin(req, GEN7_L3LOG_SIZE / 4 * 3);
-	if (ret)
-		return ret;
-
-	/*
-	 * Note: We do not worry about the concurrent register cacheline hang
-	 * here because no other code should access these registers other than
-	 * at initialization time.
-	 */
-	for (i = 0; i < GEN7_L3LOG_SIZE / 4; i++) {
-		intel_ring_emit(req->ring, MI_LOAD_REGISTER_IMM(1));
-		intel_ring_emit_reg(req->ring, GEN7_L3LOG(slice, i));
-		intel_ring_emit(req->ring, remap_info[i]);
-	}
-	intel_ring_advance(req->ring);
-
-	return ret;
-}
-
 void i915_gem_init_swizzling(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -4083,7 +4055,7 @@ i915_gem_init_hw(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_engine_cs *ring;
-	int ret, i, j;
+	int ret, i;
 
 	if (INTEL_INFO(dev)->gen < 6 && !intel_enable_gtt())
 		return -EIO;
@@ -4158,11 +4130,6 @@ i915_gem_init_hw(struct drm_device *dev)
 			goto out;
 		}
 
-		if (ring->id == RCS) {
-			for (j = 0; j < NUM_L3_SLICES(dev); j++)
-				i915_gem_l3_remap(req, j);
-		}
-
 		ret = i915_ppgtt_init_ring(req);
 		if (ret && ret != -EIO) {
 			DRM_ERROR("PPGTT enable ring #%d failed %d\n", i, ret);
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index e0ecfdfb0c8c..15e2e2abd72d 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -665,6 +665,34 @@ needs_pd_load_post(struct intel_engine_cs *ring, struct intel_context *to,
 	return false;
 }
 
+static int remap_l3(struct drm_i915_gem_request *req, int slice)
+{
+	struct drm_i915_private *dev_priv = req->i915;
+	u32 *remap_info = dev_priv->l3_parity.remap_info[slice];
+	int i, ret;
+
+	if (!HAS_L3_DPF(dev_priv) || !remap_info)
+		return 0;
+
+	ret = intel_ring_begin(req, GEN7_L3LOG_SIZE / 4 * 3);
+	if (ret)
+		return ret;
+
+	/*
+	 * Note: We do not worry about the concurrent register cacheline hang
+	 * here because no other code should access these registers other than
+	 * at initialization time.
+	 */
+	for (i = 0; i < GEN7_L3LOG_SIZE / 4; i++) {
+		intel_ring_emit(req->ring, MI_LOAD_REGISTER_IMM(1));
+		intel_ring_emit_reg(req->ring, GEN7_L3LOG(slice, i));
+		intel_ring_emit(req->ring, remap_info[i]);
+	}
+	intel_ring_advance(req->ring);
+
+	return 0;
+}
+
 static int do_switch(struct drm_i915_gem_request *req)
 {
 	struct intel_context *to = req->ctx;
@@ -764,7 +792,7 @@ static int do_switch(struct drm_i915_gem_request *req)
 		if (!(to->remap_slice & (1<<i)))
 			continue;
 
-		ret = i915_gem_l3_remap(req, i);
+		ret = remap_l3(req, i);
 		/* If it failed, try again next round */
 		if (ret)
 			DRM_DEBUG_DRIVER("L3 remapping failed\n");
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 095/190] drm/i915: Rearrange switch_context to load the aliasing ppgtt on first use
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
                     ` (6 preceding siblings ...)
  2016-01-11 10:44   ` [PATCH 094/190] drm/i915: Remove early l3-remap Chris Wilson
@ 2016-01-11 10:44   ` Chris Wilson
  2016-01-11 10:44   ` [PATCH 096/190] drm/i915: Eliminate early submission of context enabling request Chris Wilson
                     ` (45 subsequent siblings)
  53 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:44 UTC (permalink / raw)
  To: intel-gfx

The switch_mm() call is already handled by i915_switch_context(); the
only difference required to set up the aliasing ppgtt is that we need to
emit the switch_mm() on the first context, i.e. when transitioning from
engine->last_context == NULL.
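
Distilled (a sketch, not a verbatim copy of the diff below), the switch
now reads:

	if (from == NULL || needs_pd_load_pre(engine, to)) {
		struct i915_hw_ppgtt *ppgtt =
			to->ppgtt ?: req->i915->mm.aliasing_ppgtt;

		ret = ppgtt->switch_mm(ppgtt, req);
		...
	}

so a context without a full ppgtt of its own loads the aliasing ppgtt
on its first use, and i915_ppgtt_init_ring() becomes redundant.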

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c         |  8 --------
 drivers/gpu/drm/i915/i915_gem_context.c | 10 +++++++---
 drivers/gpu/drm/i915/i915_gem_gtt.c     | 13 -------------
 drivers/gpu/drm/i915/i915_gem_gtt.h     |  1 -
 4 files changed, 7 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 379913221ab1..d157ae1e5c2a 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4130,14 +4130,6 @@ i915_gem_init_hw(struct drm_device *dev)
 			goto out;
 		}
 
-		ret = i915_ppgtt_init_ring(req);
-		if (ret && ret != -EIO) {
-			DRM_ERROR("PPGTT enable ring #%d failed %d\n", i, ret);
-			i915_gem_request_cancel(req);
-			i915_gem_cleanup_ringbuffer(dev);
-			goto out;
-		}
-
 		ret = i915_gem_context_enable(req);
 		if (ret && ret != -EIO) {
 			DRM_ERROR("Context enable ring #%d failed %d\n", i, ret);
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 15e2e2abd72d..87f86017ab26 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -719,18 +719,22 @@ static int do_switch(struct drm_i915_gem_request *req)
 	 */
 	from = engine->last_context;
 
-	if (needs_pd_load_pre(engine, to)) {
+	if (from == NULL || needs_pd_load_pre(engine, to)) {
+		struct i915_hw_ppgtt *ppgtt;
+
 		/* Older GENs and non render rings still want the load first,
 		 * "PP_DCLV followed by PP_DIR_BASE register through Load
 		 * Register Immediate commands in Ring Buffer before submitting
 		 * a context."*/
 		trace_switch_mm(engine, to);
-		ret = to->ppgtt->switch_mm(to->ppgtt, req);
+
+		ppgtt = to->ppgtt ?: req->i915->mm.aliasing_ppgtt;
+		ret = ppgtt->switch_mm(ppgtt, req);
 		if (ret)
 			goto unpin_out;
 
 		/* Doing a PD load always reloads the page dirs */
-		to->ppgtt->pd_dirty_rings &= ~intel_engine_flag(engine);
+		ppgtt->pd_dirty_rings &= ~intel_engine_flag(engine);
 	}
 
 	if (engine->id != RCS) {
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index ad26c9e331aa..61ec8f28be72 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2173,19 +2173,6 @@ int i915_ppgtt_init_hw(struct drm_device *dev)
 	return 0;
 }
 
-int i915_ppgtt_init_ring(struct drm_i915_gem_request *req)
-{
-	struct i915_hw_ppgtt *ppgtt = req->i915->mm.aliasing_ppgtt;
-
-	if (i915.enable_execlists)
-		return 0;
-
-	if (!ppgtt)
-		return 0;
-
-	return ppgtt->switch_mm(ppgtt, req);
-}
-
 struct i915_hw_ppgtt *
 i915_ppgtt_create(struct drm_i915_private *dev_priv,
 		  struct drm_i915_file_private *fpriv)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 6346d1786d41..bb3dd5fe1a3c 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -547,7 +547,6 @@ int i915_ppgtt_init(struct i915_hw_ppgtt *ppgtt,
 		    struct drm_i915_private *dev_priv,
 		    struct drm_i915_file_private *file_priv);
 int i915_ppgtt_init_hw(struct drm_device *dev);
-int i915_ppgtt_init_ring(struct drm_i915_gem_request *req);
 void i915_ppgtt_release(struct kref *kref);
 struct i915_hw_ppgtt *i915_ppgtt_create(struct drm_i915_private *dev_priv,
 					struct drm_i915_file_private *fpriv);
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 096/190] drm/i915: Eliminate early submission of context enabling request
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
                     ` (7 preceding siblings ...)
  2016-01-11 10:44   ` [PATCH 095/190] drm/i915: Rearrange switch_context to load the aliasing ppgtt on first use Chris Wilson
@ 2016-01-11 10:44   ` Chris Wilson
  2016-01-11 10:44   ` [PATCH 097/190] drm/i915/shrinker: Flush active on objects before counting Chris Wilson
                     ` (44 subsequent siblings)
  53 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:44 UTC (permalink / raw)
  To: intel-gfx

Now that the first request is simplified to a pure context enabling
request (i.e. any request will do the required initialisation as
appropriate), we can forgo explicitly sending that request during early
hw initialisation. The only reason we might want to do so is to enable
power contexts, i.e. if it is actually required we should move it to the
asynchronous power management enabling task.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h         |  1 -
 drivers/gpu/drm/i915/i915_gem.c         | 24 ------------------------
 drivers/gpu/drm/i915/i915_gem_context.c | 21 ---------------------
 3 files changed, 46 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 7dc3eed71eb3..be63eaf8764a 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2967,7 +2967,6 @@ int __must_check i915_gem_context_init(struct drm_device *dev);
 void i915_gem_context_fini(struct drm_device *dev);
 void i915_gem_context_reset(struct drm_device *dev);
 int i915_gem_context_open(struct drm_device *dev, struct drm_file *file);
-int i915_gem_context_enable(struct drm_i915_gem_request *req);
 void i915_gem_context_close(struct drm_device *dev, struct drm_file *file);
 int i915_switch_context(struct drm_i915_gem_request *req);
 struct intel_context *
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d157ae1e5c2a..a0207b9d1aea 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4117,30 +4117,6 @@ i915_gem_init_hw(struct drm_device *dev)
 		}
 	}
 
-	/* Now it is safe to go back round and do everything else: */
-	for_each_ring(ring, dev_priv, i) {
-		struct drm_i915_gem_request *req;
-
-		WARN_ON(!ring->default_context);
-
-		req = i915_gem_request_alloc(ring, ring->default_context);
-		if (IS_ERR(req)) {
-			ret = PTR_ERR(req);
-			i915_gem_cleanup_ringbuffer(dev);
-			goto out;
-		}
-
-		ret = i915_gem_context_enable(req);
-		if (ret && ret != -EIO) {
-			DRM_ERROR("Context enable ring #%d failed %d\n", i, ret);
-			i915_gem_request_cancel(req);
-			i915_gem_cleanup_ringbuffer(dev);
-			goto out;
-		}
-
-		i915_add_request_no_flush(req);
-	}
-
 out:
 	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
 	return ret;
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 87f86017ab26..9f9892525945 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -459,27 +459,6 @@ void i915_gem_context_fini(struct drm_device *dev)
 	i915_gem_context_unreference(dctx);
 }
 
-int i915_gem_context_enable(struct drm_i915_gem_request *req)
-{
-	struct intel_engine_cs *engine = req->engine;
-	int ret;
-
-	if (i915.enable_execlists) {
-		if (engine->init_context == NULL)
-			return 0;
-
-		ret = engine->init_context(req);
-	} else
-		ret = i915_switch_context(req);
-
-	if (ret) {
-		DRM_ERROR("ring init context: %d\n", ret);
-		return ret;
-	}
-
-	return 0;
-}
-
 static int context_idr_cleanup(int id, void *p, void *data)
 {
 	struct intel_context *ctx = p;
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 097/190] drm/i915/shrinker: Flush active on objects before counting
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
                     ` (8 preceding siblings ...)
  2016-01-11 10:44   ` [PATCH 096/190] drm/i915: Eliminate early submission of context enabling request Chris Wilson
@ 2016-01-11 10:44   ` Chris Wilson
  2016-01-11 10:44   ` [PATCH 098/190] drm/i915: Double check the active status on the batch pool Chris Wilson
                     ` (43 subsequent siblings)
  53 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:44 UTC (permalink / raw)
  To: intel-gfx

As we inspect obj->active to decide how many objects we can shrink (we
only shrink idle objects), it helps to flush the active lists first
in order to have a more accurate count of available objects.
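
A sketch of the count path with this change applied (the retire call is
the new line; the loop body is paraphrased from the surrounding code):

	i915_gem_retire_requests(dev);	/* flush completed work first */

	count = 0;
	list_for_each_entry(obj, &dev_priv->mm.unbound_list, global_list)
		if (obj->pages_pin_count == 0)
			count += obj->base.size >> PAGE_SHIFT;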

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_shrinker.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
index e15fc7531f08..67f3eb9a8391 100644
--- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
@@ -225,6 +225,8 @@ i915_gem_shrinker_count(struct shrinker *shrinker, struct shrink_control *sc)
 	if (!i915_gem_shrinker_lock(dev, &unlock))
 		return 0;
 
+	i915_gem_retire_requests(dev);
+
 	count = 0;
 	list_for_each_entry(obj, &dev_priv->mm.unbound_list, global_list)
 		if (obj->pages_pin_count == 0)
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 098/190] drm/i915: Double check the active status on the batch pool
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
                     ` (9 preceding siblings ...)
  2016-01-11 10:44   ` [PATCH 097/190] drm/i915/shrinker: Flush active on objects before counting Chris Wilson
@ 2016-01-11 10:44   ` Chris Wilson
  2016-01-11 10:44   ` [PATCH 099/190] drm/i915: Check for request completion before choosing CS flips Chris Wilson
                     ` (42 subsequent siblings)
  53 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:44 UTC (permalink / raw)
  To: intel-gfx

We should not rely on obj->active being up to date unless we manually
flush it. Instead, we can verify that the next available batch object is
idle by looking at its last active request (and checking it for
completion).
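
The reuse test, distilled from the diff below (a sketch, not a verbatim
copy):

	if (tmp->active) {
		struct drm_i915_gem_request *rq;

		rq = tmp->last_read[pool->engine->id].request;
		if (!i915_gem_request_completed(rq))
			break;	/* strictly LRU: nothing newer is idle */
	}
	/* tmp is idle and can be reused */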

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_batch_pool.c | 24 ++++++++++++++++--------
 drivers/gpu/drm/i915/i915_gem_batch_pool.h |  7 +++++--
 drivers/gpu/drm/i915/intel_lrc.c           |  2 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c    |  2 +-
 4 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_batch_pool.c b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
index 7bf2f3f2968e..d4318665ac6c 100644
--- a/drivers/gpu/drm/i915/i915_gem_batch_pool.c
+++ b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
@@ -41,15 +41,15 @@
 
 /**
  * i915_gem_batch_pool_init() - initialize a batch buffer pool
- * @dev: the drm device
+ * @engine: the associated request submission engine
  * @pool: the batch buffer pool
  */
-void i915_gem_batch_pool_init(struct drm_device *dev,
+void i915_gem_batch_pool_init(struct intel_engine_cs *engine,
 			      struct i915_gem_batch_pool *pool)
 {
 	int n;
 
-	pool->dev = dev;
+	pool->engine = engine;
 
 	for (n = 0; n < ARRAY_SIZE(pool->cache_list); n++)
 		INIT_LIST_HEAD(&pool->cache_list[n]);
@@ -65,7 +65,7 @@ void i915_gem_batch_pool_fini(struct i915_gem_batch_pool *pool)
 {
 	int n;
 
-	WARN_ON(!mutex_is_locked(&pool->dev->struct_mutex));
+	lockdep_assert_held(&pool->engine->dev->struct_mutex);
 
 	for (n = 0; n < ARRAY_SIZE(pool->cache_list); n++) {
 		while (!list_empty(&pool->cache_list[n])) {
@@ -102,7 +102,7 @@ i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool,
 	struct list_head *list;
 	int n;
 
-	WARN_ON(!mutex_is_locked(&pool->dev->struct_mutex));
+	lockdep_assert_held(&pool->engine->dev->struct_mutex);
 
 	/* Compute a power-of-two bucket, but throw everything greater than
 	 * 16KiB into the same bucket: i.e. the buckets hold objects of
@@ -115,8 +115,16 @@ i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool,
 
 	list_for_each_entry_safe(tmp, next, list, batch_pool_link) {
 		/* The batches are strictly LRU ordered */
-		if (tmp->active)
-			break;
+		if (tmp->active) {
+			struct drm_i915_gem_request *rq;
+
+			rq = tmp->last_read[pool->engine->id].request;
+			if (!i915_gem_request_completed(rq))
+				break;
+
+			GEM_BUG_ON(tmp->active & ~intel_engine_flag(pool->engine));
+			GEM_BUG_ON(tmp->last_write.request);
+		}
 
 		/* While we're looping, do some clean up */
 		if (tmp->madv == __I915_MADV_PURGED) {
@@ -134,7 +142,7 @@ i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool,
 	if (obj == NULL) {
 		int ret;
 
-		obj = i915_gem_alloc_object(pool->dev, size);
+		obj = i915_gem_alloc_object(pool->engine->dev, size);
 		if (obj == NULL)
 			return ERR_PTR(-ENOMEM);
 
diff --git a/drivers/gpu/drm/i915/i915_gem_batch_pool.h b/drivers/gpu/drm/i915/i915_gem_batch_pool.h
index 848e90703eed..7fd4df0a29fe 100644
--- a/drivers/gpu/drm/i915/i915_gem_batch_pool.h
+++ b/drivers/gpu/drm/i915/i915_gem_batch_pool.h
@@ -27,13 +27,16 @@
 
 #include "i915_drv.h"
 
+struct drm_device;
+struct intel_engine_cs;
+
 struct i915_gem_batch_pool {
-	struct drm_device *dev;
+	struct intel_engine_cs *engine;
 	struct list_head cache_list[4];
 };
 
 /* i915_gem_batch_pool.c */
-void i915_gem_batch_pool_init(struct drm_device *dev,
+void i915_gem_batch_pool_init(struct intel_engine_cs *engine,
 			      struct i915_gem_batch_pool *pool);
 void i915_gem_batch_pool_fini(struct i915_gem_batch_pool *pool);
 struct drm_i915_gem_object*
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 3a2088f9d0be..850cacdf6dda 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1444,7 +1444,7 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
 	ring->i915 = dev_priv;
 	ring->fence_context = fence_context_alloc(1);
 	INIT_LIST_HEAD(&ring->request_list);
-	i915_gem_batch_pool_init(dev, &ring->batch_pool);
+	i915_gem_batch_pool_init(ring, &ring->batch_pool);
 	intel_engine_init_breadcrumbs(ring);
 
 	INIT_LIST_HEAD(&ring->buffers);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 7ca4e1fc854d..09799ce72212 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2028,7 +2028,7 @@ static int intel_init_engine(struct drm_device *dev,
 	INIT_LIST_HEAD(&engine->request_list);
 	INIT_LIST_HEAD(&engine->execlist_queue);
 	INIT_LIST_HEAD(&engine->buffers);
-	i915_gem_batch_pool_init(dev, &engine->batch_pool);
+	i915_gem_batch_pool_init(engine, &engine->batch_pool);
 	memset(engine->semaphore.sync_seqno, 0, sizeof(engine->semaphore.sync_seqno));
 
 	intel_engine_init_breadcrumbs(engine);
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 099/190] drm/i915: Check for request completion before choosing CS flips
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
                     ` (10 preceding siblings ...)
  2016-01-11 10:44   ` [PATCH 098/190] drm/i915: Double check the active status on the batch pool Chris Wilson
@ 2016-01-11 10:44   ` Chris Wilson
  2016-01-11 10:44   ` [PATCH 100/190] drm/i915: Remove request retirement before each batch Chris Wilson
                     ` (41 subsequent siblings)
  53 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:44 UTC (permalink / raw)
  To: intel-gfx

Only queue a CS flip if the outstanding request is not complete, and in
particular do not rely on the request tracking being fresh (since it is
only updated when requests are retired).
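
The resulting decision, distilled from the diff below:

	if (!obj->last_write.request ||
	    i915_gem_request_completed(obj->last_write.request))
		return true;	/* no outstanding write: MMIO flip */

	/* otherwise queue a CS flip only on the engine that owns the
	 * outstanding write; a cross-engine wait falls back to MMIO */
	return ring != obj->last_write.request->engine;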

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_display.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index eef858d5376f..f227cdaf38ec 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -11309,8 +11309,11 @@ static bool use_mmio_flip(struct intel_engine_cs *ring,
 		 !reservation_object_test_signaled_rcu(obj->base.dma_buf->resv,
 						       false))
 		return true;
+	else if (!obj->last_write.request ||
+		 i915_gem_request_completed(obj->last_write.request))
+		return true;
 	else
-		return ring != i915_gem_request_get_engine(obj->last_write.request);
+		return ring != obj->last_write.request->engine;
 }
 
 static void skl_do_mmio_flip(struct intel_crtc *intel_crtc,
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 100/190] drm/i915: Remove request retirement before each batch
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
                     ` (11 preceding siblings ...)
  2016-01-11 10:44   ` [PATCH 099/190] drm/i915: Check for request completion before choosing CS flips Chris Wilson
@ 2016-01-11 10:44   ` Chris Wilson
  2016-01-11 10:44   ` [PATCH 101/190] drm/i915: Only retire if necessary when creating a userptr Chris Wilson
                     ` (40 subsequent siblings)
  53 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:44 UTC (permalink / raw)
  To: intel-gfx

This reimplements the denial-of-service protection against igt from

commit 227f782e4667fc622810bce8be8ccdeee45f89c2
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu May 15 10:41:42 2014 +0100

    drm/i915: Retire requests before creating a new one

and transfers the stall from before each batch into the close handler.
The issue is that the stall increases latency between batches, which in
some cases (especially coupled with execlists) is detrimental to
keeping the GPU well fed. Also we have made the observation that retiring
requests can of itself free objects (and requests) and therefore makes
a good first step when shrinking.

v2: Recycle objects prior to i915_gem_object_get_pages()
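
The v2 placement, sketched (names as in the diff below; elisions
marked with ...):

	int i915_gem_object_get_pages(struct drm_i915_gem_object *obj)
	{
		...
		/* Recycle as many active objects as possible first */
		i915_gem_retire_requests(obj->base.dev);

		ret = ops->get_pages(obj);
		...
	}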

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h            |  1 -
 drivers/gpu/drm/i915/i915_gem.c            | 23 +++++++++++++++--------
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  2 --
 3 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index be63eaf8764a..5711ae3a22a1 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2780,7 +2780,6 @@ struct drm_i915_gem_request *
 i915_gem_find_active_request(struct intel_engine_cs *ring);
 
 void i915_gem_retire_requests(struct drm_device *dev);
-void i915_gem_retire_requests_ring(struct intel_engine_cs *ring);
 
 static inline u32 i915_reset_counter(struct i915_gpu_error *error)
 {
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index a0207b9d1aea..d705005ca26e 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1986,7 +1986,6 @@ err_pages:
 int
 i915_gem_object_get_pages(struct drm_i915_gem_object *obj)
 {
-	struct drm_i915_private *dev_priv = obj->base.dev->dev_private;
 	const struct drm_i915_gem_object_ops *ops = obj->ops;
 	int ret;
 
@@ -2000,11 +1999,15 @@ i915_gem_object_get_pages(struct drm_i915_gem_object *obj)
 
 	BUG_ON(obj->pages_pin_count);
 
+	/* Recycle as many active objects as possible first */
+	i915_gem_retire_requests(obj->base.dev);
+
 	ret = ops->get_pages(obj);
 	if (ret)
 		return ret;
 
-	list_add_tail(&obj->global_list, &dev_priv->mm.unbound_list);
+	list_add_tail(&obj->global_list,
+		      &to_i915(obj->base.dev)->mm.unbound_list);
 
 	obj->get_page.sg = obj->pages->sgl;
 	obj->get_page.last = 0;
@@ -2259,7 +2262,7 @@ void i915_gem_reset(struct drm_device *dev)
 /**
  * This function clears the request list as sequence numbers are passed.
  */
-void
+static bool
 i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 {
 	while (!list_empty(&ring->request_list)) {
@@ -2270,10 +2273,12 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 					   link);
 
 		if (!i915_gem_request_completed(request))
-			break;
+			return false;
 
 		i915_gem_request_retire_upto(request);
 	}
+
+	return true;
 }
 
 void
@@ -2281,19 +2286,18 @@ i915_gem_retire_requests(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_engine_cs *ring;
-	bool idle = true;
+	bool idle;
 	int i;
 
 	if (!dev_priv->mm.busy)
 		return;
 
+	idle = true;
 	for_each_ring(ring, dev_priv, i) {
-		i915_gem_retire_requests_ring(ring);
-		idle &= list_empty(&ring->request_list);
+		idle &= i915_gem_retire_requests_ring(ring);
 		if (i915.enable_execlists)
 			idle &= intel_execlists_retire_requests(ring);
 	}
-
 	if (idle)
 		queue_delayed_work(dev_priv->wq,
 				   &dev_priv->mm.idle_work,
@@ -2399,6 +2403,7 @@ void i915_gem_close_object(struct drm_gem_object *gem,
 	list_for_each_entry_safe(vma, vn, &obj->vma_list, obj_link)
 		if (vma->vm->file == fpriv)
 			i915_vma_close(vma);
+	i915_gem_object_flush_active(obj);
 	mutex_unlock(&obj->base.dev->struct_mutex);
 }
 
@@ -4235,7 +4240,9 @@ i915_gem_cleanup_ringbuffer(struct drm_device *dev)
 static void
 init_ring_lists(struct intel_engine_cs *ring)
 {
+	/* Early initialisation so that core GEM works during engine setup */
 	INIT_LIST_HEAD(&ring->request_list);
+	INIT_LIST_HEAD(&ring->execlist_completed);
 }
 
 void
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 7a9d3f4732e9..90c5341506be 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -741,8 +741,6 @@ i915_gem_execbuffer_reserve(struct intel_engine_cs *ring,
 	bool has_fenced_gpu_access = INTEL_INFO(ring->dev)->gen < 4;
 	int retry;
 
-	i915_gem_retire_requests_ring(ring);
-
 	vm = list_first_entry(vmas, struct i915_vma, exec_list)->vm;
 
 	INIT_LIST_HEAD(&ordered_vmas);
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 101/190] drm/i915: Only retire if necessary when creating a userptr
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
                     ` (12 preceding siblings ...)
  2016-01-11 10:44   ` [PATCH 100/190] drm/i915: Remove request retirement before each batch Chris Wilson
@ 2016-01-11 10:44   ` Chris Wilson
  2016-01-11 10:44   ` [PATCH 102/190] drm/i915: Move the "per-ring" default_context to the device Chris Wilson
                     ` (39 subsequent siblings)
  53 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:44 UTC (permalink / raw)
  To: intel-gfx

We only want to retire requests if we have an existing object that
conflicts with the fresh userptr range in order to avoid unnecessary
work during creation of every userptr.
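
The shape of the change is the familiar check/expensive-op/re-check
pattern (a sketch using names from the diff below):

	spin_lock(&mn->lock);
	it = interval_tree_iter_first(&mn->objects, start, last);
	if (it) {
		spin_unlock(&mn->lock);

		/* pay for the retire only on a potential conflict */
		i915_gem_retire_requests(dev);

		spin_lock(&mn->lock);
		it = interval_tree_iter_first(&mn->objects, start, last);
	}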

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_userptr.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
index a90392246471..2f922392bd10 100644
--- a/drivers/gpu/drm/i915/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
@@ -187,17 +187,23 @@ i915_mmu_notifier_add(struct drm_device *dev,
 	 * using an interrupt timer is likely to get stuck in an EINTR loop).
 	 */
 	mutex_lock(&dev->struct_mutex);
-
-	/* Make sure we drop the final active reference (and thereby
-	 * remove the objects from the interval tree) before we do
-	 * the check for overlapping objects.
-	 */
-	i915_gem_retire_requests(dev);
-
 	spin_lock(&mn->lock);
 	it = interval_tree_iter_first(&mn->objects,
 				      mo->it.start, mo->it.last);
 	if (it) {
+		spin_unlock(&mn->lock);
+
+		/* Make sure we drop the final active reference (and thereby
+		 * remove the objects from the interval tree) before we do
+		 * the check for overlapping objects.
+		 */
+		i915_gem_retire_requests(dev);
+
+		spin_lock(&mn->lock);
+		it = interval_tree_iter_first(&mn->objects,
+					      mo->it.start, mo->it.last);
+	}
+	if (it) {
 		struct drm_i915_gem_object *obj;
 
 		/* We only need to check the first object in the range as it
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 102/190] drm/i915: Move the "per-ring" default_context to the device
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
                     ` (13 preceding siblings ...)
  2016-01-11 10:44   ` [PATCH 101/190] drm/i915: Only retire if necessary when creating a userptr Chris Wilson
@ 2016-01-11 10:44   ` Chris Wilson
  2016-01-11 14:40     ` Dave Gordon
  2016-01-11 10:44   ` [PATCH 103/190] drm/i915: Move pinning of dev_priv->kernel_context into its creator Chris Wilson
                     ` (38 subsequent siblings)
  53 siblings, 1 reply; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:44 UTC (permalink / raw)
  To: intel-gfx

We have a false notion of a default_context allocated per engine,
whereas actually it is a singular context reserved for kernel use.
Remove it from the engines, and rename it thus.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c        | 19 ++++++++++++++-----
 drivers/gpu/drm/i915/i915_drv.h            |  1 +
 drivers/gpu/drm/i915/i915_gem.c            |  2 +-
 drivers/gpu/drm/i915/i915_gem_context.c    | 28 +++++++++++-----------------
 drivers/gpu/drm/i915/i915_gem_evict.c      |  4 ++--
 drivers/gpu/drm/i915/i915_guc_submission.c |  9 +++++----
 drivers/gpu/drm/i915/intel_display.c       |  2 +-
 drivers/gpu/drm/i915/intel_lrc.c           |  6 +++---
 drivers/gpu/drm/i915/intel_overlay.c       |  8 ++++----
 drivers/gpu/drm/i915/intel_ringbuffer.h    |  1 -
 10 files changed, 42 insertions(+), 38 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index ea5b9f6d0fc9..dee66807c6bd 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1960,12 +1960,21 @@ static int i915_context_status(struct seq_file *m, void *unused)
 			continue;
 
 		seq_puts(m, "HW context ");
+		if (IS_ERR(ctx->file_priv)) {
+			seq_puts(m, "(deleted) ");
+		} else if (ctx->file_priv) {
+			struct pid *pid = ctx->file_priv->file->pid;
+			struct task_struct *task;
+
+			task = get_pid_task(pid, PIDTYPE_PID);
+			if (task) {
+				seq_printf(m, "(%s [%d]) ",
+					   task->comm, task->pid);
+				put_task_struct(task);
+			}
+		} else
+			seq_puts(m, "(kernel) ");
 		describe_ctx(m, ctx);
-		for_each_ring(ring, dev_priv, i) {
-			if (ring->default_context == ctx)
-				seq_printf(m, "(default context %s) ",
-					   ring->name);
-		}
 
 		if (i915.enable_execlists) {
 			seq_putc(m, '\n');
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 5711ae3a22a1..4ada625b751e 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1703,6 +1703,7 @@ struct drm_i915_private {
 
 	struct pci_dev *bridge_dev;
 	struct intel_engine_cs ring[I915_NUM_RINGS];
+	struct intel_context *kernel_context;
 	struct drm_i915_gem_object *semaphore_obj;
 	uint32_t last_seqno, next_seqno;
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d705005ca26e..a82a06a61262 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4097,7 +4097,7 @@ i915_gem_init_hw(struct drm_device *dev)
 	 */
 	init_unused_rings(dev);
 
-	BUG_ON(!dev_priv->ring[RCS].default_context);
+	BUG_ON(!dev_priv->kernel_context);
 
 	ret = i915_ppgtt_init_hw(dev);
 	if (ret) {
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 9f9892525945..593c22a702fa 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -216,6 +216,7 @@ static void context_close(struct intel_context *ctx)
 	ctx->closed = true;
 	if (ctx->ppgtt)
 		i915_ppgtt_close(&ctx->ppgtt->base);
+	ctx->file_priv = ERR_PTR(-ENOENT);
 	i915_gem_context_unreference(ctx);
 }
 
@@ -358,22 +359,21 @@ void i915_gem_context_reset(struct drm_device *dev)
 			i915_gem_context_unreference(lctx);
 			ring->last_context = NULL;
 		}
-
-		/* Force the GPU state to be reinitialised on enabling */
-		if (ring->default_context)
-			ring->default_context->legacy_hw_ctx.initialized = false;
 	}
+
+	/* Force the GPU state to be reinitialised on enabling */
+	if (dev_priv->kernel_context)
+		dev_priv->kernel_context->legacy_hw_ctx.initialized = false;
 }
 
 int i915_gem_context_init(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_context *ctx;
-	int i;
 
 	/* Init should only be called once per module load. Eventually the
 	 * restriction on the context_disabled check can be loosened. */
-	if (WARN_ON(dev_priv->ring[RCS].default_context))
+	if (WARN_ON(dev_priv->kernel_context))
 		return 0;
 
 	if (intel_vgpu_active(dev) && HAS_LOGICAL_RING_CONTEXTS(dev)) {
@@ -402,13 +402,7 @@ int i915_gem_context_init(struct drm_device *dev)
 			  PTR_ERR(ctx));
 		return PTR_ERR(ctx);
 	}
-
-	for (i = 0; i < I915_NUM_RINGS; i++) {
-		struct intel_engine_cs *ring = &dev_priv->ring[i];
-
-		/* NB: RCS will hold a ref for all rings */
-		ring->default_context = ctx;
-	}
+	dev_priv->kernel_context = ctx;
 
 	DRM_DEBUG_DRIVER("%s context support initialized\n",
 			i915.enable_execlists ? "LR" :
@@ -419,7 +413,7 @@ int i915_gem_context_init(struct drm_device *dev)
 void i915_gem_context_fini(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct intel_context *dctx = dev_priv->ring[RCS].default_context;
+	struct intel_context *dctx = dev_priv->kernel_context;
 	int i;
 
 	if (dctx->legacy_hw_ctx.rcs_state) {
@@ -449,10 +443,10 @@ void i915_gem_context_fini(struct drm_device *dev)
 	for (i = 0; i < I915_NUM_RINGS; i++) {
 		struct intel_engine_cs *ring = &dev_priv->ring[i];
 
-		if (ring->last_context)
-			i915_gem_context_unreference(ring->last_context);
+		if (ring->last_context == NULL)
+			continue;
 
-		ring->default_context = NULL;
+		i915_gem_context_unreference(ring->last_context);
 		ring->last_context = NULL;
 	}
 
diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index b7bcc324a7a7..679b7dd3a312 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -45,10 +45,10 @@ static int switch_to_pinned_context(struct drm_i915_private *dev_priv)
 	for_each_ring(ring, dev_priv, i) {
 		struct drm_i915_gem_request *req;
 
-		if (ring->last_context == ring->default_context)
+		if (ring->last_context == dev_priv->kernel_context)
 			continue;
 
-		req = i915_gem_request_alloc(ring, ring->default_context);
+		req = i915_gem_request_alloc(ring, dev_priv->kernel_context);
 		if (IS_ERR(req))
 			return PTR_ERR(req);
 
diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index f4e09952d52c..63e58253280b 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -937,11 +937,12 @@ int i915_guc_submission_enable(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_guc *guc = &dev_priv->guc;
-	struct intel_context *ctx = dev_priv->ring[RCS].default_context;
 	struct i915_guc_client *client;
 
 	/* client for execbuf submission */
-	client = guc_client_alloc(dev, GUC_CTX_PRIORITY_KMD_NORMAL, ctx);
+	client = guc_client_alloc(dev,
+				  GUC_CTX_PRIORITY_KMD_NORMAL,
+				  dev_priv->kernel_context);
 	if (!client) {
 		DRM_ERROR("Failed to create execbuf guc_client\n");
 		return -ENOMEM;
@@ -994,7 +995,7 @@ int intel_guc_suspend(struct drm_device *dev)
 	if (!i915.enable_guc_submission)
 		return 0;
 
-	ctx = dev_priv->ring[RCS].default_context;
+	ctx = dev_priv->kernel_context;
 
 	data[0] = HOST2GUC_ACTION_ENTER_S_STATE;
 	/* any value greater than GUC_POWER_D0 */
@@ -1020,7 +1021,7 @@ int intel_guc_resume(struct drm_device *dev)
 	if (!i915.enable_guc_submission)
 		return 0;
 
-	ctx = dev_priv->ring[RCS].default_context;
+	ctx = dev_priv->kernel_context;
 
 	data[0] = HOST2GUC_ACTION_EXIT_S_STATE;
 	data[1] = GUC_POWER_D0;
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index f227cdaf38ec..e8f957785a64 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -11672,7 +11672,7 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
 	 * into the display plane and skip any waits.
 	 */
 	if (!mmio_flip) {
-		request = i915_gem_request_alloc(ring, ring->default_context);
+		request = i915_gem_request_alloc(ring, ring->last_context);
 		if (IS_ERR(request)) {
 			ret = PTR_ERR(request);
 			goto cleanup_pending;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 850cacdf6dda..4d5196547e78 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1058,7 +1058,7 @@ static int gen8_init_common_ring(struct intel_engine_cs *ring)
 	struct drm_device *dev = ring->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	lrc_setup_hardware_status_page(ring,
-				ring->default_context->engine[ring->id].state);
+			dev_priv->kernel_context->engine[ring->id].state);
 
 	I915_WRITE_IMR(ring, ~(ring->irq_enable_mask | ring->irq_keep_mask));
 	I915_WRITE(RING_HWSTAM(ring->mmio_base), 0xffffffff);
@@ -1424,7 +1424,7 @@ void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
 		kunmap(sg_page(ring->status_page.obj->pages->sgl));
 		ring->status_page.obj = NULL;
 	}
-	intel_lr_context_unpin(ring->default_context, ring);
+	intel_lr_context_unpin(ring->i915->kernel_context, ring);
 
 	lrc_destroy_wa_ctx_obj(ring);
 	ring->dev = NULL;
@@ -1458,7 +1458,7 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
 	if (ret)
 		goto error;
 
-	ctx = ring->default_context;
+	ctx = ring->i915->kernel_context;
 
 	ret = execlists_context_deferred_alloc(ctx, ring);
 	if (ret)
diff --git a/drivers/gpu/drm/i915/intel_overlay.c b/drivers/gpu/drm/i915/intel_overlay.c
index df71c01f28f1..094ea87bf6be 100644
--- a/drivers/gpu/drm/i915/intel_overlay.c
+++ b/drivers/gpu/drm/i915/intel_overlay.c
@@ -240,7 +240,7 @@ static int intel_overlay_on(struct intel_overlay *overlay)
 	WARN_ON(overlay->active);
 	WARN_ON(IS_I830(dev) && !(dev_priv->quirks & QUIRK_PIPEA_FORCE));
 
-	req = i915_gem_request_alloc(ring, ring->default_context);
+	req = i915_gem_request_alloc(ring, dev_priv->kernel_context);
 	if (IS_ERR(req))
 		return PTR_ERR(req);
 
@@ -283,7 +283,7 @@ static int intel_overlay_continue(struct intel_overlay *overlay,
 	if (tmp & (1 << 17))
 		DRM_DEBUG("overlay underrun, DOVSTA: %x\n", tmp);
 
-	req = i915_gem_request_alloc(ring, ring->default_context);
+	req = i915_gem_request_alloc(ring, dev_priv->kernel_context);
 	if (IS_ERR(req))
 		return PTR_ERR(req);
 
@@ -349,7 +349,7 @@ static int intel_overlay_off(struct intel_overlay *overlay)
 	 * of the hw. Do it in both cases */
 	flip_addr |= OFC_UPDATE;
 
-	req = i915_gem_request_alloc(ring, ring->default_context);
+	req = i915_gem_request_alloc(ring, dev_priv->kernel_context);
 	if (IS_ERR(req))
 		return PTR_ERR(req);
 
@@ -423,7 +423,7 @@ static int intel_overlay_release_old_vid(struct intel_overlay *overlay)
 		/* synchronous slowpath */
 		struct drm_i915_gem_request *req;
 
-		req = i915_gem_request_alloc(ring, ring->default_context);
+		req = i915_gem_request_alloc(ring, dev_priv->kernel_context);
 		if (IS_ERR(req))
 			return PTR_ERR(req);
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 3d4d5711aea9..868cc8d5abb3 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -316,7 +316,6 @@ struct intel_engine_cs {
 	u32 last_submitted_seqno;
 	unsigned user_interrupts;
 
-	struct intel_context *default_context;
 	struct intel_context *last_context;
 
 	struct intel_engine_hangcheck hangcheck;
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 103/190] drm/i915: Move pinning of dev_priv->kernel_context into its creator
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
                     ` (14 preceding siblings ...)
  2016-01-11 10:44   ` [PATCH 102/190] drm/i915: Move the "per-ring" default_context to the device Chris Wilson
@ 2016-01-11 10:44   ` Chris Wilson
  2016-01-11 10:44   ` [PATCH 104/190] drm/i915: Remove i915_gem_execbuffer_retire_commands() Chris Wilson
                     ` (37 subsequent siblings)
  53 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:44 UTC (permalink / raw)
  To: intel-gfx

Rather than have every context ask "am I owned by the kernel? pin!",
move that logic into the creator of the kernel context, in order to
improve code comprehension.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 53 +++++++++++++++------------------
 1 file changed, 24 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 593c22a702fa..b7f5781a85ec 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -280,9 +280,7 @@ static struct intel_context *
 i915_gem_create_context(struct drm_device *dev,
 			struct drm_i915_file_private *file_priv)
 {
-	const bool is_global_default_ctx = file_priv == NULL;
 	struct intel_context *ctx;
-	int ret = 0;
 
 	BUG_ON(!mutex_is_locked(&dev->struct_mutex));
 
@@ -290,31 +288,15 @@ i915_gem_create_context(struct drm_device *dev,
 	if (IS_ERR(ctx))
 		return ctx;
 
-	if (is_global_default_ctx && ctx->legacy_hw_ctx.rcs_state) {
-		/* We may need to do things with the shrinker which
-		 * require us to immediately switch back to the default
-		 * context. This can cause a problem as pinning the
-		 * default context also requires GTT space which may not
-		 * be available. To avoid this we always pin the default
-		 * context.
-		 */
-		ret = i915_gem_obj_ggtt_pin(ctx->legacy_hw_ctx.rcs_state,
-					    get_context_alignment(dev), 0);
-		if (ret) {
-			DRM_DEBUG_DRIVER("Couldn't pin %d\n", ret);
-			goto err_destroy;
-		}
-	}
-
 	if (USES_FULL_PPGTT(dev)) {
 		struct i915_hw_ppgtt *ppgtt =
 			i915_ppgtt_create(to_i915(dev), file_priv);
 
-		if (IS_ERR_OR_NULL(ppgtt)) {
+		if (IS_ERR(ppgtt)) {
 			DRM_DEBUG_DRIVER("PPGTT setup failed (%ld)\n",
 					 PTR_ERR(ppgtt));
-			ret = PTR_ERR(ppgtt);
-			goto err_unpin;
+			i915_gem_context_unreference(ctx);
+			return ERR_CAST(ppgtt);
 		}
 
 		ctx->ppgtt = ppgtt;
@@ -323,14 +305,6 @@ i915_gem_create_context(struct drm_device *dev,
 	trace_i915_context_create(ctx);
 
 	return ctx;
-
-err_unpin:
-	if (is_global_default_ctx && ctx->legacy_hw_ctx.rcs_state)
-		i915_gem_object_ggtt_unpin(ctx->legacy_hw_ctx.rcs_state);
-err_destroy:
-	idr_remove(&file_priv->context_idr, ctx->user_handle);
-	context_close(ctx);
-	return ERR_PTR(ret);
 }
 
 void i915_gem_context_reset(struct drm_device *dev)
@@ -402,6 +376,27 @@ int i915_gem_context_init(struct drm_device *dev)
 			  PTR_ERR(ctx));
 		return PTR_ERR(ctx);
 	}
+
+	if (ctx->legacy_hw_ctx.rcs_state) {
+		int ret;
+
+		/* We may need to do things with the shrinker which
+		 * require us to immediately switch back to the default
+		 * context. This can cause a problem as pinning the
+		 * default context also requires GTT space which may not
+		 * be available. To avoid this we always pin the default
+		 * context.
+		 */
+		ret = i915_gem_obj_ggtt_pin(ctx->legacy_hw_ctx.rcs_state,
+					    get_context_alignment(dev), 0);
+		if (ret) {
+			DRM_ERROR("Failed to pinned default global context (error %d)\n",
+				  ret);
+			i915_gem_context_unreference(ctx);
+			return ret;
+		}
+	}
+
 	dev_priv->kernel_context = ctx;
 
 	DRM_DEBUG_DRIVER("%s context support initialized\n",
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 104/190] drm/i915: Remove i915_gem_execbuffer_retire_commands()
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
                     ` (15 preceding siblings ...)
  2016-01-11 10:44   ` [PATCH 103/190] drm/i915: Move pinning of dev_priv->kernel_context into its creator Chris Wilson
@ 2016-01-11 10:44   ` Chris Wilson
  2016-01-11 10:44   ` [PATCH 105/190] drm/i915: Pad GTT views of exec objects up to user specified size Chris Wilson
                     ` (36 subsequent siblings)
  53 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:44 UTC (permalink / raw)
  To: intel-gfx

Move the single line to the callsite as the name is now misleading, and
the purpose is solely to add the request to the execution queue.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 90c5341506be..d88be1d3cb86 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1163,13 +1163,6 @@ i915_gem_execbuffer_move_to_active(struct list_head *vmas,
 	}
 }
 
-static void
-i915_gem_execbuffer_retire_commands(struct i915_execbuffer_params *params)
-{
-	/* Add a breadcrumb for the completion of the batch buffer */
-	__i915_add_request(params->request, params->batch_obj, true);
-}
-
 static int
 i915_reset_gen7_sol_offsets(struct drm_i915_gem_request *req)
 {
@@ -1651,7 +1644,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	params->ctx                     = ctx;
 
 	ret = execbuf_submit(params, args, &eb->vmas);
-	i915_gem_execbuffer_retire_commands(params);
+	__i915_add_request(params->request, params->batch_obj, ret == 0);
 
 err_batch_unpin:
 	/*
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 105/190] drm/i915: Pad GTT views of exec objects up to user specified size
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
                     ` (16 preceding siblings ...)
  2016-01-11 10:44   ` [PATCH 104/190] drm/i915: Remove i915_gem_execbuffer_retire_commands() Chris Wilson
@ 2016-01-11 10:44   ` Chris Wilson
  2016-03-22 14:32     ` David Weinehall
  2016-01-11 10:44   ` [PATCH 106/190] drm/i915: Split insertion/binding of an object into the VM Chris Wilson
                     ` (35 subsequent siblings)
  53 siblings, 1 reply; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:44 UTC (permalink / raw)
  To: intel-gfx

Our GPUs impose certain requirements upon buffers that depend upon how
exactly they are used. Typically this is expressed as requiring a larger
surface than would be naively computed by pitch * height.
Normally such requirements are hidden away in the userspace driver, but
when we accept pointers from strangers and later impose extra conditions
on them, the original client allocator has no idea about the
monstrosities in the GPU and we require the userspace driver to inform
the kernel how many padding pages are required beyond the client
allocation.

v2: Long time, no see
v3: Try an anonymous union for uapi struct compatibility
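
A hypothetical userspace use of the new padding request, assuming the
uapi names that eventually landed upstream (a pad_to_size field in the
anonymous union, gated by an EXEC_OBJECT_PAD_TO_SIZE flag):

	struct drm_i915_gem_exec_object2 entry = {
		.handle = bo_handle,
		/* reserve GTT space beyond the client allocation, e.g.
		 * for hw that fetches past the last row of a surface;
		 * round up to the 4KiB page size by hand */
		.flags = EXEC_OBJECT_PAD_TO_SIZE,
		.pad_to_size = (surface_bytes + padding_bytes + 4095)
				& ~4095ull,
	};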

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h            |  6 ++-
 drivers/gpu/drm/i915/i915_gem.c            | 79 +++++++++++++++---------------
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 16 +++++-
 include/uapi/drm/i915_drm.h                |  8 ++-
 4 files changed, 64 insertions(+), 45 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 4ada625b751e..49b126e4191e 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2694,11 +2694,13 @@ void i915_gem_free_object(struct drm_gem_object *obj);
 int __must_check
 i915_gem_object_pin(struct drm_i915_gem_object *obj,
 		    struct i915_address_space *vm,
+		    uint64_t size,
 		    uint32_t alignment,
 		    uint64_t flags);
 int __must_check
 i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj,
 			 const struct i915_ggtt_view *view,
+			 uint64_t size,
 			 uint32_t alignment,
 			 uint64_t flags);
 
@@ -2931,8 +2933,8 @@ i915_gem_obj_ggtt_pin(struct drm_i915_gem_object *obj,
 		      uint32_t alignment,
 		      unsigned flags)
 {
-	return i915_gem_object_pin(obj, i915_obj_to_ggtt(obj),
-				   alignment, flags | PIN_GLOBAL);
+	return i915_gem_object_pin(obj, i915_obj_to_ggtt(obj), 0, alignment,
+				   flags | PIN_GLOBAL);
 }
 
 static inline int
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index a82a06a61262..2f14d2da75a5 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1440,7 +1440,7 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 	}
 
 	/* Now pin it into the GTT if needed */
-	ret = i915_gem_object_ggtt_pin(obj, &view, 0, PIN_MAPPABLE);
+	ret = i915_gem_object_ggtt_pin(obj, &view, 0, 0, PIN_MAPPABLE);
 	if (ret)
 		goto unlock;
 
@@ -2746,20 +2746,20 @@ static struct i915_vma *
 i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
 			   struct i915_address_space *vm,
 			   const struct i915_ggtt_view *ggtt_view,
+			   uint64_t size,
 			   unsigned alignment,
 			   uint64_t flags)
 {
 	struct drm_device *dev = obj->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	u32 fence_alignment, unfenced_alignment;
-	u32 search_flag, alloc_flag;
 	u64 start, end;
-	u64 size, fence_size;
+	u32 search_flag, alloc_flag;
 	struct i915_vma *vma;
 	int ret;
 
 	if (i915_is_ggtt(vm)) {
-		u32 view_size;
+		u32 fence_size, fence_alignment, unfenced_alignment;
+		u64 view_size;
 
 		if (WARN_ON(!ggtt_view))
 			return ERR_PTR(-EINVAL);
@@ -2777,21 +2777,22 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
 								view_size,
 								obj->tiling_mode,
 								false);
-		size = flags & PIN_MAPPABLE ? fence_size : view_size;
+		size = max(size, view_size);
+		if (flags & PIN_MAPPABLE)
+			size = max_t(u64, size, fence_size);
+
+		if (alignment == 0)
+			alignment = flags & PIN_MAPPABLE ? fence_alignment :
+				unfenced_alignment;
+		if (flags & PIN_MAPPABLE && alignment & (fence_alignment - 1)) {
+			DRM_DEBUG("Invalid object (view type=%u) alignment requested %u\n",
+				  ggtt_view ? ggtt_view->type : 0,
+				  alignment);
+			return ERR_PTR(-EINVAL);
+		}
 	} else {
-		fence_size = i915_gem_get_gtt_size(dev,
-						   obj->base.size,
-						   obj->tiling_mode);
-		fence_alignment = i915_gem_get_gtt_alignment(dev,
-							     obj->base.size,
-							     obj->tiling_mode,
-							     true);
-		unfenced_alignment =
-			i915_gem_get_gtt_alignment(dev,
-						   obj->base.size,
-						   obj->tiling_mode,
-						   false);
-		size = flags & PIN_MAPPABLE ? fence_size : obj->base.size;
+		size = max_t(u64, size, obj->base.size);
+		alignment = 4096;
 	}
 
 	start = flags & PIN_OFFSET_BIAS ? flags & PIN_OFFSET_MASK : 0;
@@ -2801,24 +2802,14 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
 	if (flags & PIN_ZONE_4G)
 		end = min_t(u64, end, (1ULL << 32));
 
-	if (alignment == 0)
-		alignment = flags & PIN_MAPPABLE ? fence_alignment :
-						unfenced_alignment;
-	if (flags & PIN_MAPPABLE && alignment & (fence_alignment - 1)) {
-		DRM_DEBUG("Invalid object (view type=%u) alignment requested %u\n",
-			  ggtt_view ? ggtt_view->type : 0,
-			  alignment);
-		return ERR_PTR(-EINVAL);
-	}
-
 	/* If binding the object/GGTT view requires more space than the entire
 	 * aperture has, reject it early before evicting everything in a vain
 	 * attempt to find space.
 	 */
 	if (size > end) {
-		DRM_DEBUG("Attempting to bind an object (view type=%u) larger than the aperture: size=%llu > %s aperture=%llu\n",
+		DRM_DEBUG("Attempting to bind an object (view type=%u) larger than the aperture: request=%llu [object=%zd] > %s aperture=%llu\n",
 			  ggtt_view ? ggtt_view->type : 0,
-			  size,
+			  size, obj->base.size,
 			  flags & PIN_MAPPABLE ? "mappable" : "total",
 			  end);
 		return ERR_PTR(-E2BIG);
@@ -3309,7 +3300,7 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
 	 * (e.g. libkms for the bootup splash), we have to ensure that we
 	 * always use map_and_fenceable for all scanout buffers.
 	 */
-	ret = i915_gem_object_ggtt_pin(obj, view, alignment,
+	ret = i915_gem_object_ggtt_pin(obj, view, 0, alignment,
 				       view->type == I915_GGTT_VIEW_NORMAL ?
 				       PIN_MAPPABLE : 0);
 	if (ret)
@@ -3459,12 +3450,17 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
 }
 
 static bool
-i915_vma_misplaced(struct i915_vma *vma, uint32_t alignment, uint64_t flags)
+i915_vma_misplaced(struct i915_vma *vma,
+		   uint64_t size,
+		   uint32_t alignment,
+		   uint64_t flags)
 {
 	struct drm_i915_gem_object *obj = vma->obj;
 
-	if (alignment &&
-	    vma->node.start & (alignment - 1))
+	if (vma->node.size < size)
+		return true;
+
+	if (alignment && vma->node.start & (alignment - 1))
 		return true;
 
 	if (flags & PIN_MAPPABLE && !obj->map_and_fenceable)
@@ -3508,6 +3504,7 @@ static int
 i915_gem_object_do_pin(struct drm_i915_gem_object *obj,
 		       struct i915_address_space *vm,
 		       const struct i915_ggtt_view *ggtt_view,
+		       uint64_t size,
 		       uint32_t alignment,
 		       uint64_t flags)
 {
@@ -3538,7 +3535,7 @@ i915_gem_object_do_pin(struct drm_i915_gem_object *obj,
 		if (WARN_ON(vma->pin_count == DRM_I915_GEM_OBJECT_MAX_PIN_COUNT))
 			return -EBUSY;
 
-		if (i915_vma_misplaced(vma, alignment, flags)) {
+		if (i915_vma_misplaced(vma, size, alignment, flags)) {
 			WARN(vma->pin_count,
 			     "bo is already pinned in %s with incorrect alignment:"
 			     " offset=%08x %08x, req.alignment=%x, req.map_and_fenceable=%d,"
@@ -3559,8 +3556,8 @@ i915_gem_object_do_pin(struct drm_i915_gem_object *obj,
 
 	bound = vma ? vma->bound : 0;
 	if (vma == NULL || !drm_mm_node_allocated(&vma->node)) {
-		vma = i915_gem_object_bind_to_vm(obj, vm, ggtt_view, alignment,
-						 flags);
+		vma = i915_gem_object_bind_to_vm(obj, vm, ggtt_view,
+						 size, alignment, flags);
 		if (IS_ERR(vma))
 			return PTR_ERR(vma);
 	} else {
@@ -3582,17 +3579,19 @@ i915_gem_object_do_pin(struct drm_i915_gem_object *obj,
 int
 i915_gem_object_pin(struct drm_i915_gem_object *obj,
 		    struct i915_address_space *vm,
+		    uint64_t size,
 		    uint32_t alignment,
 		    uint64_t flags)
 {
 	return i915_gem_object_do_pin(obj, vm,
 				      i915_is_ggtt(vm) ? &i915_ggtt_view_normal : NULL,
-				      alignment, flags);
+				      size, alignment, flags);
 }
 
 int
 i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj,
 			 const struct i915_ggtt_view *view,
+			 uint64_t size,
 			 uint32_t alignment,
 			 uint64_t flags)
 {
@@ -3600,7 +3599,7 @@ i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj,
 		return -EINVAL;
 
 	return i915_gem_object_do_pin(obj, i915_obj_to_ggtt(obj), view,
-				      alignment, flags | PIN_GLOBAL);
+				      size, alignment, flags | PIN_GLOBAL);
 }
 
 void
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index d88be1d3cb86..899220139a8a 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -642,10 +642,14 @@ i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
 			flags |= PIN_HIGH;
 	}
 
-	ret = i915_gem_object_pin(obj, vma->vm, entry->alignment, flags);
+	ret = i915_gem_object_pin(obj, vma->vm,
+				  entry->pad_to_size,
+				  entry->alignment,
+				  flags);
 	if ((ret == -ENOSPC  || ret == -E2BIG) &&
 	    only_mappable_for_reloc(entry->flags))
 		ret = i915_gem_object_pin(obj, vma->vm,
+					  entry->pad_to_size,
 					  entry->alignment,
 					  flags & ~PIN_MAPPABLE);
 	if (ret)
@@ -708,6 +712,9 @@ eb_vma_misplaced(struct i915_vma *vma)
 	    vma->node.start & (entry->alignment - 1))
 		return true;
 
+	if (vma->node.size < entry->pad_to_size)
+		return true;
+
 	if (entry->flags & EXEC_OBJECT_PINNED &&
 	    vma->node.start != entry->offset)
 		return true;
@@ -1044,6 +1051,13 @@ validate_exec_list(struct drm_device *dev,
 		if (exec[i].alignment && !is_power_of_2(exec[i].alignment))
 			return -EINVAL;
 
+		/* pad_to_size was once a reserved field, so sanitize it */
+		if (exec[i].flags & EXEC_OBJECT_PAD_TO_SIZE) {
+			if (offset_in_page(exec[i].pad_to_size))
+				return -EINVAL;
+		} else
+			exec[i].pad_to_size = 0;
+
 		/* First check for malicious input causing overflow in
 		 * the worst case where we need to allocate the entire
 		 * relocation tree as a single array.
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 7fee4416dcc7..ff7b438059da 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -697,10 +697,14 @@ struct drm_i915_gem_exec_object2 {
 #define EXEC_OBJECT_WRITE	(1<<2)
 #define EXEC_OBJECT_SUPPORTS_48B_ADDRESS (1<<3)
 #define EXEC_OBJECT_PINNED	(1<<4)
-#define __EXEC_OBJECT_UNKNOWN_FLAGS -(EXEC_OBJECT_PINNED<<1)
+#define EXEC_OBJECT_PAD_TO_SIZE	(1<<5)
+#define __EXEC_OBJECT_UNKNOWN_FLAGS -(EXEC_OBJECT_PAD_TO_SIZE<<1)
 	__u64 flags;
 
-	__u64 rsvd1;
+	union {
+		__u64 rsvd1;
+		__u64 pad_to_size;
+	};
 	__u64 rsvd2;
 };
 
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 106/190] drm/i915: Split insertion/binding of an object into the VM
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
                     ` (17 preceding siblings ...)
  2016-01-11 10:44   ` [PATCH 105/190] drm/i915: Pad GTT views of exec objects up to user specified size Chris Wilson
@ 2016-01-11 10:44   ` Chris Wilson
  2016-01-11 10:44   ` [PATCH 107/190] drm/i915: Record allocated vma size Chris Wilson
                     ` (34 subsequent siblings)
  53 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:44 UTC (permalink / raw)
  To: intel-gfx

Split the insertion of an object into the address space's range manager
from the binding of that object into the GTT, to simplify the code flow
when pinning a VMA.
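
In other words, after this patch the pin path decomposes into two
explicit steps; a simplified, non-compilable sketch of the resulting
flow in i915_gem_object_do_pin(), condensed from the hunks below:

	if (vma == NULL || !drm_mm_node_allocated(&vma->node)) {
		/* step 1: reserve an address range in the VM */
		vma = i915_gem_object_insert_into_vm(obj, vm, ggtt_view,
						     size, alignment, flags);
		if (IS_ERR(vma))
			return PTR_ERR(vma);
	}

	/* step 2: bind, i.e. write the PTEs -- now common to both the
	 * freshly inserted and the already-inserted cases */
	ret = i915_vma_bind(vma, obj->cache_level, flags);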

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 33 +++++++++++++++------------------
 1 file changed, 15 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 2f14d2da75a5..9c159e64a9a0 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2743,12 +2743,12 @@ static bool i915_gem_valid_gtt_space(struct i915_vma *vma,
  * there.
  */
 static struct i915_vma *
-i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
-			   struct i915_address_space *vm,
-			   const struct i915_ggtt_view *ggtt_view,
-			   uint64_t size,
-			   unsigned alignment,
-			   uint64_t flags)
+i915_gem_object_insert_into_vm(struct drm_i915_gem_object *obj,
+			       struct i915_address_space *vm,
+			       const struct i915_ggtt_view *ggtt_view,
+			       uint64_t size,
+			       unsigned alignment,
+			       uint64_t flags)
 {
 	struct drm_device *dev = obj->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -2877,11 +2877,6 @@ search_free:
 		goto err_remove_node;
 	}
 
-	trace_i915_vma_bind(vma, flags);
-	ret = i915_vma_bind(vma, obj->cache_level, flags);
-	if (ret)
-		goto err_remove_node;
-
 	list_move_tail(&obj->global_list, &dev_priv->mm.bound_list);
 	list_move_tail(&vma->vm_link, &vm->inactive_list);
 	obj->bind_count++;
@@ -3554,24 +3549,26 @@ i915_gem_object_do_pin(struct drm_i915_gem_object *obj,
 		}
 	}
 
-	bound = vma ? vma->bound : 0;
 	if (vma == NULL || !drm_mm_node_allocated(&vma->node)) {
-		vma = i915_gem_object_bind_to_vm(obj, vm, ggtt_view,
-						 size, alignment, flags);
+		vma = i915_gem_object_insert_into_vm(obj, vm, ggtt_view,
+						     size, alignment, flags);
 		if (IS_ERR(vma))
 			return PTR_ERR(vma);
-	} else {
-		ret = i915_vma_bind(vma, obj->cache_level, flags);
-		if (ret)
-			return ret;
 	}
 
+	bound = vma->bound;
+	ret = i915_vma_bind(vma, obj->cache_level, flags);
+	if (ret)
+		return ret;
+
 	if (ggtt_view && ggtt_view->type == I915_GGTT_VIEW_NORMAL &&
 	    (bound ^ vma->bound) & GLOBAL_BIND) {
 		__i915_vma_set_map_and_fenceable(vma);
 		WARN_ON(flags & PIN_MAPPABLE && !obj->map_and_fenceable);
 	}
 
+	GEM_BUG_ON(i915_vma_misplaced(vma, size, alignment, flags));
+
 	vma->pin_count++;
 	return 0;
 }
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 107/190] drm/i915: Record allocated vma size
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
                     ` (18 preceding siblings ...)
  2016-01-11 10:44   ` [PATCH 106/190] drm/i915: Split insertion/binding of an object into the VM Chris Wilson
@ 2016-01-11 10:44   ` Chris Wilson
  2016-01-11 10:44   ` [PATCH 108/190] drm/i915: Start passing around i915_vma from execbuffer Chris Wilson
                     ` (33 subsequent siblings)
  53 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:44 UTC (permalink / raw)
  To: intel-gfx

Tracking the size of the VMA as allocated allows us to dramatically
reduce the complexity of later functions (like inserting the VMA into
the drm_mm range manager).
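
The size is computed once, at vma creation; a condensed view of the
logic added to __i915_gem_vma_create() in the hunks below:

	vma->size = obj->base.size;
	if (i915_is_ggtt(vm)) {
		vma->ggtt_view = *ggtt_view;
		if (ggtt_view->type == I915_GGTT_VIEW_PARTIAL)
			vma->size = ggtt_view->params.partial.size << PAGE_SHIFT;
		else if (ggtt_view->type == I915_GGTT_VIEW_ROTATED)
			vma->size = ggtt_view->params.rotation_info.size;
	}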

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h     |  10 +--
 drivers/gpu/drm/i915/i915_gem.c     | 117 +++++++++++++++---------------------
 drivers/gpu/drm/i915/i915_gem_gtt.c |  56 +++++------------
 drivers/gpu/drm/i915/i915_gem_gtt.h |   6 +-
 4 files changed, 70 insertions(+), 119 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 49b126e4191e..7df6cfabe7fa 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2853,11 +2853,11 @@ int i915_gem_object_attach_phys(struct drm_i915_gem_object *obj,
 int i915_gem_open(struct drm_device *dev, struct drm_file *file);
 void i915_gem_release(struct drm_device *dev, struct drm_file *file);
 
-uint32_t
-i915_gem_get_gtt_size(struct drm_device *dev, uint32_t size, int tiling_mode);
-uint32_t
-i915_gem_get_gtt_alignment(struct drm_device *dev, uint32_t size,
-			    int tiling_mode, bool fenced);
+uint64_t
+i915_gem_get_gtt_size(struct drm_device *dev, uint64_t size, int tiling_mode);
+uint64_t
+i915_gem_get_gtt_alignment(struct drm_device *dev, uint64_t size,
+			   int tiling_mode, bool fenced);
 
 int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 				    enum i915_cache_level cache_level);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 9c159e64a9a0..0d4f358f4067 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1589,11 +1589,13 @@ i915_gem_release_all_mmaps(struct drm_i915_private *dev_priv)
 		i915_gem_release_mmap(obj);
 }
 
-uint32_t
-i915_gem_get_gtt_size(struct drm_device *dev, uint32_t size, int tiling_mode)
+uint64_t
+i915_gem_get_gtt_size(struct drm_device *dev, uint64_t size, int tiling_mode)
 {
 	uint32_t gtt_size;
 
+	GEM_BUG_ON(size == 0);
+
 	if (INTEL_INFO(dev)->gen >= 4 ||
 	    tiling_mode == I915_TILING_NONE)
 		return size;
@@ -1617,10 +1619,12 @@ i915_gem_get_gtt_size(struct drm_device *dev, uint32_t size, int tiling_mode)
  * Return the required GTT alignment for an object, taking into account
  * potential fence register mapping.
  */
-uint32_t
-i915_gem_get_gtt_alignment(struct drm_device *dev, uint32_t size,
+uint64_t
+i915_gem_get_gtt_alignment(struct drm_device *dev, uint64_t size,
 			   int tiling_mode, bool fenced)
 {
+	GEM_BUG_ON(size == 0);
+
 	/*
 	 * Minimum alignment is 4k (GTT page size), but might be greater
 	 * if a fence register is needed for the object.
@@ -2747,68 +2751,51 @@ i915_gem_object_insert_into_vm(struct drm_i915_gem_object *obj,
 			       struct i915_address_space *vm,
 			       const struct i915_ggtt_view *ggtt_view,
 			       uint64_t size,
-			       unsigned alignment,
+			       uint64_t alignment,
 			       uint64_t flags)
 {
 	struct drm_device *dev = obj->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	u64 start, end;
-	u32 search_flag, alloc_flag;
 	struct i915_vma *vma;
+	u64 start, end;
+	u64 min_alignment;
 	int ret;
 
-	if (i915_is_ggtt(vm)) {
-		u32 fence_size, fence_alignment, unfenced_alignment;
-		u64 view_size;
-
-		if (WARN_ON(!ggtt_view))
-			return ERR_PTR(-EINVAL);
-
-		view_size = i915_ggtt_view_size(obj, ggtt_view);
-
-		fence_size = i915_gem_get_gtt_size(dev,
-						   view_size,
-						   obj->tiling_mode);
-		fence_alignment = i915_gem_get_gtt_alignment(dev,
-							     view_size,
-							     obj->tiling_mode,
-							     true);
-		unfenced_alignment = i915_gem_get_gtt_alignment(dev,
-								view_size,
-								obj->tiling_mode,
-								false);
-		size = max(size, view_size);
-		if (flags & PIN_MAPPABLE)
-			size = max_t(u64, size, fence_size);
-
-		if (alignment == 0)
-			alignment = flags & PIN_MAPPABLE ? fence_alignment :
-				unfenced_alignment;
-		if (flags & PIN_MAPPABLE && alignment & (fence_alignment - 1)) {
-			DRM_DEBUG("Invalid object (view type=%u) alignment requested %u\n",
-				  ggtt_view ? ggtt_view->type : 0,
-				  alignment);
-			return ERR_PTR(-EINVAL);
-		}
-	} else {
-		size = max_t(u64, size, obj->base.size);
-		alignment = 4096;
+	vma = ggtt_view ?
+		i915_gem_obj_lookup_or_create_ggtt_vma(obj, ggtt_view) :
+		i915_gem_obj_lookup_or_create_vma(obj, vm);
+	if (IS_ERR(vma))
+		return vma;
+
+	size = max(size, vma->size);
+	if (flags & PIN_MAPPABLE)
+		size = i915_gem_get_gtt_size(dev, size, obj->tiling_mode);
+
+	min_alignment =
+		i915_gem_get_gtt_alignment(dev, size, obj->tiling_mode,
+					   flags & PIN_MAPPABLE);
+	if (alignment == 0)
+		alignment = min_alignment;
+	if (alignment & (min_alignment - 1)) {
+		DRM_DEBUG("Invalid object alignment requested %llu, minimum %llu\n",
+			  alignment, min_alignment);
+		return ERR_PTR(-EINVAL);
 	}
 
 	start = flags & PIN_OFFSET_BIAS ? flags & PIN_OFFSET_MASK : 0;
-	end = vm->total;
+
+	end = vma->vm->total;
 	if (flags & PIN_MAPPABLE)
 		end = min_t(u64, end, dev_priv->gtt.mappable_end);
 	if (flags & PIN_ZONE_4G)
-		end = min_t(u64, end, (1ULL << 32));
+		end = min_t(u64, end, 1ULL << 32);
 
 	/* If binding the object/GGTT view requires more space than the entire
 	 * aperture has, reject it early before evicting everything in a vain
 	 * attempt to find space.
 	 */
 	if (size > end) {
-		DRM_DEBUG("Attempting to bind an object (view type=%u) larger than the aperture: request=%llu [object=%zd] > %s aperture=%llu\n",
-			  ggtt_view ? ggtt_view->type : 0,
+		DRM_DEBUG("Attempting to bind an object larger than the aperture: request=%llu [object=%zd] > %s aperture=%llu\n",
 			  size, obj->base.size,
 			  flags & PIN_MAPPABLE ? "mappable" : "total",
 			  end);
@@ -2821,31 +2808,27 @@ i915_gem_object_insert_into_vm(struct drm_i915_gem_object *obj,
 
 	i915_gem_object_pin_pages(obj);
 
-	vma = ggtt_view ? i915_gem_obj_lookup_or_create_ggtt_vma(obj, ggtt_view) :
-			  i915_gem_obj_lookup_or_create_vma(obj, vm);
-
-	if (IS_ERR(vma))
-		goto err_unpin;
-
 	if (flags & PIN_OFFSET_FIXED) {
 		uint64_t offset = flags & PIN_OFFSET_MASK;
-
-		if (offset & (alignment - 1) || offset + size > end) {
+		if (offset & (alignment - 1) || offset > end - size) {
 			ret = -EINVAL;
-			goto err_vma;
+			goto err_unpin;
 		}
+
 		vma->node.start = offset;
 		vma->node.size = size;
 		vma->node.color = obj->cache_level;
-		ret = drm_mm_reserve_node(&vm->mm, &vma->node);
+		ret = drm_mm_reserve_node(&vma->vm->mm, &vma->node);
 		if (ret) {
 			ret = i915_gem_evict_for_vma(vma);
 			if (ret == 0)
-				ret = drm_mm_reserve_node(&vm->mm, &vma->node);
+				ret = drm_mm_reserve_node(&vma->vm->mm, &vma->node);
+			if (ret)
+				goto err_unpin;
 		}
-		if (ret)
-			goto err_vma;
 	} else {
+		u32 search_flag, alloc_flag;
+
 		if (flags & PIN_HIGH) {
 			search_flag = DRM_MM_SEARCH_BELOW;
 			alloc_flag = DRM_MM_CREATE_TOP;
@@ -2855,21 +2838,23 @@ i915_gem_object_insert_into_vm(struct drm_i915_gem_object *obj,
 		}
 
 search_free:
-		ret = drm_mm_insert_node_in_range_generic(&vm->mm, &vma->node,
+		ret = drm_mm_insert_node_in_range_generic(&vma->vm->mm,
+							  &vma->node,
 							  size, alignment,
 							  obj->cache_level,
 							  start, end,
 							  search_flag,
 							  alloc_flag);
 		if (ret) {
-			ret = i915_gem_evict_something(dev, vm, size, alignment,
+			ret = i915_gem_evict_something(dev, vma->vm,
+						       size, alignment,
 						       obj->cache_level,
 						       start, end,
 						       flags);
 			if (ret == 0)
 				goto search_free;
 
-			goto err_vma;
+			goto err_unpin;
 		}
 	}
 	if (WARN_ON(!i915_gem_valid_gtt_space(vma, obj->cache_level))) {
@@ -2878,18 +2863,16 @@ search_free:
 	}
 
 	list_move_tail(&obj->global_list, &dev_priv->mm.bound_list);
-	list_move_tail(&vma->vm_link, &vm->inactive_list);
+	list_move_tail(&vma->vm_link, &vma->vm->inactive_list);
 	obj->bind_count++;
 
 	return vma;
 
 err_remove_node:
 	drm_mm_remove_node(&vma->node);
-err_vma:
-	vma = ERR_PTR(ret);
 err_unpin:
 	i915_gem_object_unpin_pages(obj);
-	return vma;
+	return ERR_PTR(ret);
 }
 
 bool
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 61ec8f28be72..98b9730f4066 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -171,7 +171,7 @@ static void ppgtt_unbind_vma(struct i915_vma *vma)
 {
 	vma->vm->clear_range(vma->vm,
 			     vma->node.start,
-			     vma->obj->base.size,
+			     vma->size,
 			     true);
 }
 
@@ -2617,28 +2617,18 @@ static int aliasing_gtt_bind_vma(struct i915_vma *vma,
 
 static void ggtt_unbind_vma(struct i915_vma *vma)
 {
-	struct drm_device *dev = vma->vm->dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct drm_i915_gem_object *obj = vma->obj;
-	const uint64_t size = min_t(uint64_t,
-				    obj->base.size,
-				    vma->node.size);
+	struct i915_hw_ppgtt *appgtt = to_i915(vma->vm->dev)->mm.aliasing_ppgtt;
+	const uint64_t size = min(vma->size, vma->node.size);
 
-	if (vma->bound & GLOBAL_BIND) {
+	if (vma->bound & GLOBAL_BIND)
 		vma->vm->clear_range(vma->vm,
-				     vma->node.start,
-				     size,
+				     vma->node.start, size,
 				     true);
-	}
-
-	if (dev_priv->mm.aliasing_ppgtt && vma->bound & LOCAL_BIND) {
-		struct i915_hw_ppgtt *appgtt = dev_priv->mm.aliasing_ppgtt;
 
+	if (vma->bound & LOCAL_BIND && appgtt)
 		appgtt->base.clear_range(&appgtt->base,
-					 vma->node.start,
-					 size,
+					 vma->node.start, size,
 					 true);
-	}
 }
 
 void i915_gem_gtt_finish_object(struct drm_i915_gem_object *obj)
@@ -3274,11 +3264,16 @@ __i915_gem_vma_create(struct drm_i915_gem_object *obj,
 	list_add(&vma->vm_link, &vm->unbound_list);
 	vma->vm = vm;
 	vma->obj = obj;
+	vma->size = obj->base.size;
 	vma->is_ggtt = i915_is_ggtt(vm);
 
-	if (i915_is_ggtt(vm))
+	if (i915_is_ggtt(vm)) {
 		vma->ggtt_view = *ggtt_view;
-	else
+		if (ggtt_view->type == I915_GGTT_VIEW_PARTIAL)
+			vma->size = ggtt_view->params.partial.size << PAGE_SHIFT;
+		else if (ggtt_view->type == I915_GGTT_VIEW_ROTATED)
+			vma->size = ggtt_view->params.rotation_info.size;
+	} else
 		i915_ppgtt_get(i915_vm_to_ppgtt(vm));
 
 	list_add_tail(&vma->obj_link, &obj->vma_list);
@@ -3573,26 +3568,3 @@ int i915_vma_bind(struct i915_vma *vma, enum i915_cache_level cache_level,
 
 	return 0;
 }
-
-/**
- * i915_ggtt_view_size - Get the size of a GGTT view.
- * @obj: Object the view is of.
- * @view: The view in question.
- *
- * @return The size of the GGTT view in bytes.
- */
-size_t
-i915_ggtt_view_size(struct drm_i915_gem_object *obj,
-		    const struct i915_ggtt_view *view)
-{
-	if (view->type == I915_GGTT_VIEW_NORMAL) {
-		return obj->base.size;
-	} else if (view->type == I915_GGTT_VIEW_ROTATED) {
-		return view->params.rotation_info.size;
-	} else if (view->type == I915_GGTT_VIEW_PARTIAL) {
-		return view->params.partial.size << PAGE_SHIFT;
-	} else {
-		WARN_ONCE(1, "GGTT view %u not implemented!\n", view->type);
-		return obj->base.size;
-	}
-}
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index bb3dd5fe1a3c..8877dc48f028 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -181,6 +181,7 @@ struct i915_vma {
 	struct drm_mm_node node;
 	struct drm_i915_gem_object *obj;
 	struct i915_address_space *vm;
+	u64 size;
 
 	struct i915_gem_active last_read[I915_NUM_RINGS];
 
@@ -581,9 +582,4 @@ i915_ggtt_view_equal(const struct i915_ggtt_view *a,
 		return !memcmp(&a->params, &b->params, sizeof(a->params));
 	return true;
 }
-
-size_t
-i915_ggtt_view_size(struct drm_i915_gem_object *obj,
-		    const struct i915_ggtt_view *view);
-
 #endif
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 108/190] drm/i915: Start passing around i915_vma from execbuffer
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
                     ` (19 preceding siblings ...)
  2016-01-11 10:44   ` [PATCH 107/190] drm/i915: Record allocated vma size Chris Wilson
@ 2016-01-11 10:44   ` Chris Wilson
  2016-01-11 10:44   ` [PATCH 109/190] drm/i915: Remove highly confusing i915_gem_obj_ggtt_pin() Chris Wilson
                     ` (32 subsequent siblings)
  53 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:44 UTC (permalink / raw)
  To: intel-gfx; +Cc: Mika Kuoppala

During execbuffer we look up the i915_vma in order to reserve it in the
VM. However, we then do a second lookup of the vma in order to pin it,
all because we lack the necessary interfaces to operate directly on the
i915_vma.

v2: Tidy parameter lists to remove one level of redirection in the hot
path.
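
The net effect on the reservation path: instead of re-deriving the
object and address space for a second lookup, the vma we already hold
is pinned directly (sketch, condensed from the execbuffer hunk below):

	/* before: pin via the object, forcing a second vma lookup */
	ret = i915_gem_object_pin(vma->obj, vma->vm,
				  entry->pad_to_size, entry->alignment,
				  flags);

	/* after: operate on the vma in hand */
	ret = i915_vma_pin(vma, entry->pad_to_size, entry->alignment,
			   flags);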

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h            |  28 +++--
 drivers/gpu/drm/i915/i915_gem.c            | 159 ++++++++++++-----------------
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 127 +++++++++++------------
 drivers/gpu/drm/i915/i915_gem_gtt.c        |   3 -
 4 files changed, 144 insertions(+), 173 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 7df6cfabe7fa..f6e508e5aa5b 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2680,6 +2680,11 @@ struct drm_i915_gem_object *i915_gem_object_create_from_data(
 void i915_gem_close_object(struct drm_gem_object *gem, struct drm_file *file);
 void i915_gem_free_object(struct drm_gem_object *obj);
 
+int __must_check
+i915_vma_pin(struct i915_vma *vma,
+	     uint64_t size,
+	     uint64_t alignment,
+	     uint64_t flags);
 /* Flags used by pin/bind&friends. */
 #define PIN_MAPPABLE	(1<<0)
 #define PIN_NONBLOCK	(1<<1)
@@ -2691,12 +2696,19 @@ void i915_gem_free_object(struct drm_gem_object *obj);
 #define PIN_HIGH	(1<<7)
 #define PIN_OFFSET_FIXED	(1<<8)
 #define PIN_OFFSET_MASK (~4095)
-int __must_check
-i915_gem_object_pin(struct drm_i915_gem_object *obj,
-		    struct i915_address_space *vm,
-		    uint64_t size,
-		    uint32_t alignment,
-		    uint64_t flags);
+
+static inline void __i915_vma_unpin(struct i915_vma *vma)
+{
+	vma->pin_count--;
+}
+
+static inline void i915_vma_unpin(struct i915_vma *vma)
+{
+	GEM_BUG_ON(vma->pin_count == 0);
+	GEM_BUG_ON(!drm_mm_node_allocated(&vma->node));
+	__i915_vma_unpin(vma);
+}
+
 int __must_check
 i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj,
 			 const struct i915_ggtt_view *view,
@@ -2933,8 +2945,8 @@ i915_gem_obj_ggtt_pin(struct drm_i915_gem_object *obj,
 		      uint32_t alignment,
 		      unsigned flags)
 {
-	return i915_gem_object_pin(obj, i915_obj_to_ggtt(obj), 0, alignment,
-				   flags | PIN_GLOBAL);
+	return i915_gem_object_ggtt_pin(obj, &i915_ggtt_view_normal,
+					0, alignment, flags);
 }
 
 static inline int
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 0d4f358f4067..c6d7a78ab605 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2746,26 +2746,21 @@ static bool i915_gem_valid_gtt_space(struct i915_vma *vma,
  * Finds free space in the GTT aperture and binds the object or a view of it
  * there.
  */
-static struct i915_vma *
-i915_gem_object_insert_into_vm(struct drm_i915_gem_object *obj,
-			       struct i915_address_space *vm,
-			       const struct i915_ggtt_view *ggtt_view,
-			       uint64_t size,
-			       uint64_t alignment,
-			       uint64_t flags)
+static int
+i915_vma_insert(struct i915_vma *vma,
+		uint64_t size,
+		uint64_t alignment,
+		uint64_t flags)
 {
+	struct drm_i915_gem_object *obj = vma->obj;
 	struct drm_device *dev = obj->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct i915_vma *vma;
 	u64 start, end;
 	u64 min_alignment;
 	int ret;
 
-	vma = ggtt_view ?
-		i915_gem_obj_lookup_or_create_ggtt_vma(obj, ggtt_view) :
-		i915_gem_obj_lookup_or_create_vma(obj, vm);
-	if (IS_ERR(vma))
-		return vma;
+	GEM_BUG_ON(vma->bound);
+	GEM_BUG_ON(drm_mm_node_allocated(&vma->node));
 
 	size = max(size, vma->size);
 	if (flags & PIN_MAPPABLE)
@@ -2779,7 +2774,7 @@ i915_gem_object_insert_into_vm(struct drm_i915_gem_object *obj,
 	if (alignment & (min_alignment - 1)) {
 		DRM_DEBUG("Invalid object alignment requested %llu, minimum %llu\n",
 			  alignment, min_alignment);
-		return ERR_PTR(-EINVAL);
+		return -EINVAL;
 	}
 
 	start = flags & PIN_OFFSET_BIAS ? flags & PIN_OFFSET_MASK : 0;
@@ -2799,12 +2794,12 @@ i915_gem_object_insert_into_vm(struct drm_i915_gem_object *obj,
 			  size, obj->base.size,
 			  flags & PIN_MAPPABLE ? "mappable" : "total",
 			  end);
-		return ERR_PTR(-E2BIG);
+		return -E2BIG;
 	}
 
 	ret = i915_gem_object_get_pages(obj);
 	if (ret)
-		return ERR_PTR(ret);
+		return ret;
 
 	i915_gem_object_pin_pages(obj);
 
@@ -2866,13 +2861,13 @@ search_free:
 	list_move_tail(&vma->vm_link, &vma->vm->inactive_list);
 	obj->bind_count++;
 
-	return vma;
+	return 0;
 
 err_remove_node:
 	drm_mm_remove_node(&vma->node);
 err_unpin:
 	i915_gem_object_unpin_pages(obj);
-	return ERR_PTR(ret);
+	return ret;
 }
 
 bool
@@ -3435,6 +3430,9 @@ i915_vma_misplaced(struct i915_vma *vma,
 {
 	struct drm_i915_gem_object *obj = vma->obj;
 
+	if (!drm_mm_node_allocated(&vma->node))
+		return false;
+
 	if (vma->node.size < size)
 		return true;
 
@@ -3478,94 +3476,45 @@ void __i915_vma_set_map_and_fenceable(struct i915_vma *vma)
 	obj->map_and_fenceable = mappable && fenceable;
 }
 
-static int
-i915_gem_object_do_pin(struct drm_i915_gem_object *obj,
-		       struct i915_address_space *vm,
-		       const struct i915_ggtt_view *ggtt_view,
-		       uint64_t size,
-		       uint32_t alignment,
-		       uint64_t flags)
+int
+i915_vma_pin(struct i915_vma *vma,
+	     uint64_t size,
+	     uint64_t alignment,
+	     uint64_t flags)
 {
-	struct drm_i915_private *dev_priv = obj->base.dev->dev_private;
-	struct i915_vma *vma;
-	unsigned bound;
+	unsigned bound = vma->bound;
 	int ret;
 
-	if (WARN_ON(vm == &dev_priv->mm.aliasing_ppgtt->base))
-		return -ENODEV;
-
-	if (WARN_ON(flags & (PIN_GLOBAL | PIN_MAPPABLE) && !i915_is_ggtt(vm)))
-		return -EINVAL;
-
-	if (WARN_ON((flags & (PIN_MAPPABLE | PIN_GLOBAL)) == PIN_MAPPABLE))
-		return -EINVAL;
+	GEM_BUG_ON((flags & (PIN_GLOBAL | PIN_USER)) == 0);
+	GEM_BUG_ON((flags & PIN_GLOBAL) && !vma->is_ggtt);
 
-	if (WARN_ON(i915_is_ggtt(vm) != !!ggtt_view))
-		return -EINVAL;
-
-	vma = ggtt_view ? i915_gem_obj_to_ggtt_view(obj, ggtt_view) :
-			  i915_gem_obj_to_vma(obj, vm);
-
-	if (IS_ERR(vma))
-		return PTR_ERR(vma);
-
-	if (vma) {
-		if (WARN_ON(vma->pin_count == DRM_I915_GEM_OBJECT_MAX_PIN_COUNT))
-			return -EBUSY;
-
-		if (i915_vma_misplaced(vma, size, alignment, flags)) {
-			WARN(vma->pin_count,
-			     "bo is already pinned in %s with incorrect alignment:"
-			     " offset=%08x %08x, req.alignment=%x, req.map_and_fenceable=%d,"
-			     " obj->map_and_fenceable=%d\n",
-			     ggtt_view ? "ggtt" : "ppgtt",
-			     upper_32_bits(vma->node.start),
-			     lower_32_bits(vma->node.start),
-			     alignment,
-			     !!(flags & PIN_MAPPABLE),
-			     obj->map_and_fenceable);
-			ret = i915_vma_unbind(vma);
-			if (ret)
-				return ret;
+	if (WARN_ON(vma->pin_count == DRM_I915_GEM_OBJECT_MAX_PIN_COUNT))
+		return -EBUSY;
 
-			vma = NULL;
-		}
-	}
+	/* Pin early to prevent the shrinker/eviction logic from destroying
+	 * our vma as we insert and bind.
+	 */
+	vma->pin_count++;
 
-	if (vma == NULL || !drm_mm_node_allocated(&vma->node)) {
-		vma = i915_gem_object_insert_into_vm(obj, vm, ggtt_view,
-						     size, alignment, flags);
-		if (IS_ERR(vma))
-			return PTR_ERR(vma);
+	if (!bound) {
+		ret = i915_vma_insert(vma, size, alignment, flags);
+		if (ret)
+			goto err;
 	}
 
-	bound = vma->bound;
-	ret = i915_vma_bind(vma, obj->cache_level, flags);
+	ret = i915_vma_bind(vma, vma->obj->cache_level, flags);
 	if (ret)
-		return ret;
+		goto err;
 
-	if (ggtt_view && ggtt_view->type == I915_GGTT_VIEW_NORMAL &&
-	    (bound ^ vma->bound) & GLOBAL_BIND) {
+	if ((bound ^ vma->bound) & GLOBAL_BIND)
 		__i915_vma_set_map_and_fenceable(vma);
-		WARN_ON(flags & PIN_MAPPABLE && !obj->map_and_fenceable);
-	}
 
 	GEM_BUG_ON(i915_vma_misplaced(vma, size, alignment, flags));
-
-	vma->pin_count++;
 	return 0;
-}
 
-int
-i915_gem_object_pin(struct drm_i915_gem_object *obj,
-		    struct i915_address_space *vm,
-		    uint64_t size,
-		    uint32_t alignment,
-		    uint64_t flags)
-{
-	return i915_gem_object_do_pin(obj, vm,
-				      i915_is_ggtt(vm) ? &i915_ggtt_view_normal : NULL,
-				      size, alignment, flags);
+err:
+	vma->pin_count--;
+	return ret;
 }
 
 int
@@ -3575,11 +3524,35 @@ i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj,
 			 uint32_t alignment,
 			 uint64_t flags)
 {
+	struct i915_vma *vma;
+	int ret;
+
 	if (WARN_ONCE(!view, "no view specified"))
 		return -EINVAL;
 
-	return i915_gem_object_do_pin(obj, i915_obj_to_ggtt(obj), view,
-				      size, alignment, flags | PIN_GLOBAL);
+	vma = i915_gem_obj_lookup_or_create_ggtt_vma(obj, view);
+	if (IS_ERR(vma))
+		return PTR_ERR(vma);
+
+	if (i915_vma_misplaced(vma, size, alignment, flags)) {
+		if (flags & PIN_NONBLOCK && (vma->pin_count | vma->active))
+			return -ENOSPC;
+
+		WARN(vma->pin_count,
+		     "bo is already pinned in ggtt with incorrect alignment:"
+		     " offset=%08x %08x, req.alignment=%x, req.map_and_fenceable=%d,"
+		     " obj->map_and_fenceable=%d\n",
+		     upper_32_bits(vma->node.start),
+		     lower_32_bits(vma->node.start),
+		     alignment,
+		     !!(flags & PIN_MAPPABLE),
+		     obj->map_and_fenceable);
+		ret = i915_vma_unbind(vma);
+		if (ret)
+			return ret;
+	}
+
+	return i915_vma_pin(vma, size, alignment, flags | PIN_GLOBAL);
 }
 
 void
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 899220139a8a..d4dcc3e5d080 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -44,11 +44,10 @@
 struct i915_execbuffer_params {
 	struct drm_device               *dev;
 	struct drm_file                 *file;
+	struct i915_vma			*batch_vma;
 	uint32_t                        dispatch_flags;
 	uint32_t                        args_batch_start_offset;
-	uint64_t                        batch_obj_vm_offset;
 	struct intel_engine_cs          *ring;
-	struct drm_i915_gem_object      *batch_obj;
 	struct intel_context            *ctx;
 	struct drm_i915_gem_request     *request;
 };
@@ -101,6 +100,26 @@ eb_reset(struct eb_vmas *eb)
 		memset(eb->buckets, 0, (eb->and+1)*sizeof(struct hlist_head));
 }
 
+static struct i915_vma *
+eb_get_batch(struct eb_vmas *eb)
+{
+	struct i915_vma *vma = list_entry(eb->vmas.prev, typeof(*vma), exec_list);
+
+	/*
+	 * SNA is doing fancy tricks with compressing batch buffers, which leads
+	 * to negative relocation deltas. Usually that works out ok since the
+	 * relocate address is still positive, except when the batch is placed
+	 * very low in the GTT. Ensure this doesn't happen.
+	 *
+	 * Note that actual hangs have only been observed on gen7, but for
+	 * paranoia do it everywhere.
+	 */
+	if ((vma->exec_entry->flags & EXEC_OBJECT_PINNED) == 0)
+		vma->exec_entry->flags |= __EXEC_OBJECT_NEEDS_BIAS;
+
+	return vma;
+}
+
 static int
 eb_lookup_vmas(struct eb_vmas *eb,
 	       struct drm_i915_gem_exec_object2 *exec,
@@ -642,16 +661,16 @@ i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
 			flags |= PIN_HIGH;
 	}
 
-	ret = i915_gem_object_pin(obj, vma->vm,
-				  entry->pad_to_size,
-				  entry->alignment,
-				  flags);
-	if ((ret == -ENOSPC  || ret == -E2BIG) &&
+	ret = i915_vma_pin(vma,
+			   entry->pad_to_size,
+			   entry->alignment,
+			   flags);
+	if ((ret == -ENOSPC || ret == -E2BIG) &&
 	    only_mappable_for_reloc(entry->flags))
-		ret = i915_gem_object_pin(obj, vma->vm,
-					  entry->pad_to_size,
-					  entry->alignment,
-					  flags & ~PIN_MAPPABLE);
+		ret = i915_vma_pin(vma,
+				   entry->pad_to_size,
+				   entry->alignment,
+				   flags & ~PIN_MAPPABLE);
 	if (ret)
 		return ret;
 
@@ -1203,11 +1222,11 @@ i915_reset_gen7_sol_offsets(struct drm_i915_gem_request *req)
 	return 0;
 }
 
-static struct drm_i915_gem_object*
+static struct i915_vma*
 i915_gem_execbuffer_parse(struct intel_engine_cs *ring,
 			  struct drm_i915_gem_exec_object2 *shadow_exec_entry,
-			  struct eb_vmas *eb,
 			  struct drm_i915_gem_object *batch_obj,
+			  struct eb_vmas *eb,
 			  u32 batch_start_offset,
 			  u32 batch_len,
 			  bool is_master)
@@ -1219,7 +1238,7 @@ i915_gem_execbuffer_parse(struct intel_engine_cs *ring,
 	shadow_batch_obj = i915_gem_batch_pool_get(&ring->batch_pool,
 						   PAGE_ALIGN(batch_len));
 	if (IS_ERR(shadow_batch_obj))
-		return shadow_batch_obj;
+		return ERR_CAST(shadow_batch_obj);
 
 	ret = i915_parse_cmds(ring,
 			      batch_obj,
@@ -1244,14 +1263,12 @@ i915_gem_execbuffer_parse(struct intel_engine_cs *ring,
 	drm_gem_object_reference(&shadow_batch_obj->base);
 	list_add_tail(&vma->exec_list, &eb->vmas);
 
-	shadow_batch_obj->base.pending_read_domains = I915_GEM_DOMAIN_COMMAND;
-
-	return shadow_batch_obj;
+	return vma;
 
 err:
 	i915_gem_object_unpin_pages(shadow_batch_obj);
 	if (ret == -EACCES) /* unhandled chained batch */
-		return batch_obj;
+		return NULL;
 	else
 		return ERR_PTR(ret);
 }
@@ -1331,7 +1348,7 @@ execbuf_submit(struct i915_execbuffer_params *params,
 	}
 
 	exec_len   = args->batch_len;
-	exec_start = params->batch_obj_vm_offset +
+	exec_start = params->batch_vma->node.start +
 		     params->args_batch_start_offset;
 
 	ret = params->ring->emit_bb_start(params->request,
@@ -1378,26 +1395,6 @@ static int gen8_dispatch_bsd_ring(struct drm_device *dev,
 	}
 }
 
-static struct drm_i915_gem_object *
-eb_get_batch(struct eb_vmas *eb)
-{
-	struct i915_vma *vma = list_entry(eb->vmas.prev, typeof(*vma), exec_list);
-
-	/*
-	 * SNA is doing fancy tricks with compressing batch buffers, which leads
-	 * to negative relocation deltas. Usually that works out ok since the
-	 * relocate address is still positive, except when the batch is placed
-	 * very low in the GTT. Ensure this doesn't happen.
-	 *
-	 * Note that actual hangs have only been observed on gen7, but for
-	 * paranoia do it everywhere.
-	 */
-	if ((vma->exec_entry->flags & EXEC_OBJECT_PINNED) == 0)
-		vma->exec_entry->flags |= __EXEC_OBJECT_NEEDS_BIAS;
-
-	return vma->obj;
-}
-
 static int
 i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		       struct drm_file *file,
@@ -1406,7 +1403,6 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct eb_vmas *eb;
-	struct drm_i915_gem_object *batch_obj;
 	struct drm_i915_gem_exec_object2 shadow_exec_entry;
 	struct intel_engine_cs *ring;
 	struct intel_context *ctx;
@@ -1542,7 +1538,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		goto err;
 
 	/* take note of the batch buffer before we might reorder the lists */
-	batch_obj = eb_get_batch(eb);
+	params->batch_vma = eb_get_batch(eb);
 
 	/* Move the objects en-masse into the GTT, evicting if necessary. */
 	need_relocs = (args->flags & I915_EXEC_NO_RELOC) == 0;
@@ -1564,7 +1560,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	}
 
 	/* Set the pending read domains for the batch buffer to COMMAND */
-	if (batch_obj->base.pending_write_domain) {
+	if (params->batch_vma->obj->base.pending_write_domain) {
 		DRM_DEBUG("Attempting to use self-modifying batch buffer\n");
 		ret = -EINVAL;
 		goto err;
@@ -1572,26 +1568,20 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 
 	params->args_batch_start_offset = args->batch_start_offset;
 	if (i915_needs_cmd_parser(ring) && args->batch_len) {
-		struct drm_i915_gem_object *parsed_batch_obj;
-
-		parsed_batch_obj = i915_gem_execbuffer_parse(ring,
-						      &shadow_exec_entry,
-						      eb,
-						      batch_obj,
-						      args->batch_start_offset,
-						      args->batch_len,
-						      file->is_master);
-		if (IS_ERR(parsed_batch_obj)) {
-			ret = PTR_ERR(parsed_batch_obj);
+		struct i915_vma *vma;
+
+		vma = i915_gem_execbuffer_parse(ring, &shadow_exec_entry,
+						params->batch_vma->obj,
+						eb,
+						args->batch_start_offset,
+						args->batch_len,
+						file->is_master);
+		if (IS_ERR(vma)) {
+			ret = PTR_ERR(vma);
 			goto err;
 		}
 
-		/*
-		 * parsed_batch_obj == batch_obj means batch not fully parsed:
-		 * Accept, but don't promote to secure.
-		 */
-
-		if (parsed_batch_obj != batch_obj) {
+		if (vma) {
 			/*
 			 * Batch parsed and accepted:
 			 *
@@ -1603,16 +1593,18 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 			 */
 			dispatch_flags |= I915_DISPATCH_SECURE;
 			params->args_batch_start_offset = 0;
-			batch_obj = parsed_batch_obj;
+			params->batch_vma = vma;
 		}
 	}
 
-	batch_obj->base.pending_read_domains |= I915_GEM_DOMAIN_COMMAND;
+	params->batch_vma->obj->base.pending_read_domains |= I915_GEM_DOMAIN_COMMAND;
 
 	/* snb/ivb/vlv conflate the "batch in ppgtt" bit with the "non-secure
 	 * batch" bit. Hence we need to pin secure batches into the global gtt.
 	 * hsw should have this fixed, but bdw mucks it up again. */
 	if (dispatch_flags & I915_DISPATCH_SECURE) {
+		struct drm_i915_gem_object *obj = params->batch_vma->obj;
+
 		/*
 		 * So on first glance it looks freaky that we pin the batch here
 		 * outside of the reservation loop. But:
@@ -1623,13 +1615,12 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		 *   fitting due to fragmentation.
 		 * So this is actually safe.
 		 */
-		ret = i915_gem_obj_ggtt_pin(batch_obj, 0, 0);
+		ret = i915_gem_object_ggtt_pin(obj, NULL, 0, 0, 0);
 		if (ret)
 			goto err;
 
-		params->batch_obj_vm_offset = i915_gem_obj_ggtt_offset(batch_obj);
-	} else
-		params->batch_obj_vm_offset = i915_gem_obj_offset(batch_obj, vm);
+		params->batch_vma = i915_gem_obj_to_ggtt(obj);
+	}
 
 	/* Allocate a request for this batch buffer nice and early. */
 	params->request = i915_gem_request_alloc(ring, ctx);
@@ -1654,11 +1645,10 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	params->file                    = file;
 	params->ring                    = ring;
 	params->dispatch_flags          = dispatch_flags;
-	params->batch_obj               = batch_obj;
 	params->ctx                     = ctx;
 
 	ret = execbuf_submit(params, args, &eb->vmas);
-	__i915_add_request(params->request, params->batch_obj, ret == 0);
+	__i915_add_request(params->request, params->batch_vma->obj, ret == 0);
 
 err_batch_unpin:
 	/*
@@ -1668,8 +1658,7 @@ err_batch_unpin:
 	 * active.
 	 */
 	if (dispatch_flags & I915_DISPATCH_SECURE)
-		i915_gem_object_ggtt_unpin(batch_obj);
-
+		i915_vma_unpin(params->batch_vma);
 err:
 	/* the request owns the ref now */
 	i915_gem_context_unreference(ctx);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 98b9730f4066..8f3b2f051918 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -3549,13 +3549,10 @@ int i915_vma_bind(struct i915_vma *vma, enum i915_cache_level cache_level,
 		return 0;
 
 	if (vma->bound == 0 && vma->vm->allocate_va_range) {
-		/* XXX: i915_vma_pin() will fix this +- hack */
-		vma->pin_count++;
 		trace_i915_va_alloc(vma);
 		ret = vma->vm->allocate_va_range(vma->vm,
 						 vma->node.start,
 						 vma->node.size);
-		vma->pin_count--;
 		if (ret)
 			return ret;
 	}
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 109/190] drm/i915: Remove highly confusing i915_gem_obj_ggtt_pin()
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
                     ` (20 preceding siblings ...)
  2016-01-11 10:44   ` [PATCH 108/190] drm/i915: Start passing around i915_vma from execbuffer Chris Wilson
@ 2016-01-11 10:44   ` Chris Wilson
  2016-01-11 10:44   ` [PATCH 110/190] drm/i915: Move vma->pin_count:4 to vma->flags Chris Wilson
                     ` (31 subsequent siblings)
  53 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:44 UTC (permalink / raw)
  To: intel-gfx

Since i915_gem_obj_ggtt_pin() is an idiom-breaking curried wrapper
around i915_gem_object_ggtt_pin(), spare us the confusion and remove
it. Removing it now simplifies later patches that change the
i915_vma_pin() (and friends) interface.
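
For reference, the curried form and its spelled-out replacement, as
applied in the conversions below:

	/* removed helper: implied the normal GGTT view and no padding */
	ret = i915_gem_obj_ggtt_pin(obj, alignment, flags);

	/* callers now state every argument explicitly */
	ret = i915_gem_object_ggtt_pin(obj, NULL, 0, alignment, flags);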

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h              | 11 +----------
 drivers/gpu/drm/i915/i915_gem.c              | 24 ++++++++++--------------
 drivers/gpu/drm/i915/i915_gem_context.c      | 10 ++++++----
 drivers/gpu/drm/i915/i915_gem_execbuffer.c   |  2 +-
 drivers/gpu/drm/i915/i915_gem_render_state.c |  2 +-
 drivers/gpu/drm/i915/i915_guc_submission.c   |  4 ++--
 drivers/gpu/drm/i915/intel_guc_loader.c      |  2 +-
 drivers/gpu/drm/i915/intel_lrc.c             |  7 ++++---
 drivers/gpu/drm/i915/intel_overlay.c         |  3 ++-
 drivers/gpu/drm/i915/intel_ringbuffer.c      | 14 ++++++++------
 10 files changed, 36 insertions(+), 43 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index f6e508e5aa5b..0e3ff0b24d4d 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2713,7 +2713,7 @@ int __must_check
 i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj,
 			 const struct i915_ggtt_view *view,
 			 uint64_t size,
-			 uint32_t alignment,
+			 uint64_t alignment,
 			 uint64_t flags);
 
 int i915_vma_bind(struct i915_vma *vma, enum i915_cache_level cache_level,
@@ -2940,15 +2940,6 @@ i915_gem_obj_ggtt_size(struct drm_i915_gem_object *obj)
 	return i915_gem_obj_size(obj, i915_obj_to_ggtt(obj));
 }
 
-static inline int __must_check
-i915_gem_obj_ggtt_pin(struct drm_i915_gem_object *obj,
-		      uint32_t alignment,
-		      unsigned flags)
-{
-	return i915_gem_object_ggtt_pin(obj, &i915_ggtt_view_normal,
-					0, alignment, flags);
-}
-
 static inline int
 i915_gem_object_ggtt_unbind(struct drm_i915_gem_object *obj)
 {
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index c6d7a78ab605..495fb80edee0 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -780,7 +780,9 @@ i915_gem_gtt_pwrite_fast(struct drm_device *dev,
 	char __user *user_data;
 	int page_offset, page_length, ret;
 
-	ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE | PIN_NONBLOCK);
+	ret = i915_gem_object_ggtt_pin(obj, NULL,
+				       0, 0,
+				       PIN_MAPPABLE | PIN_NONBLOCK);
 	if (ret)
 		goto out;
 
@@ -3425,7 +3427,7 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
 static bool
 i915_vma_misplaced(struct i915_vma *vma,
 		   uint64_t size,
-		   uint32_t alignment,
+		   uint64_t alignment,
 		   uint64_t flags)
 {
 	struct drm_i915_gem_object *obj = vma->obj;
@@ -3521,14 +3523,14 @@ int
 i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj,
 			 const struct i915_ggtt_view *view,
 			 uint64_t size,
-			 uint32_t alignment,
+			 uint64_t alignment,
 			 uint64_t flags)
 {
 	struct i915_vma *vma;
 	int ret;
 
-	if (WARN_ONCE(!view, "no view specified"))
-		return -EINVAL;
+	if (view == NULL)
+		view = &i915_ggtt_view_normal;
 
 	vma = i915_gem_obj_lookup_or_create_ggtt_vma(obj, view);
 	if (IS_ERR(vma))
@@ -3540,11 +3542,11 @@ i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj,
 
 		WARN(vma->pin_count,
 		     "bo is already pinned in ggtt with incorrect alignment:"
-		     " offset=%08x %08x, req.alignment=%x, req.map_and_fenceable=%d,"
+		     " offset=%08x %08x, req.alignment=%llx, req.map_and_fenceable=%d,"
 		     " obj->map_and_fenceable=%d\n",
 		     upper_32_bits(vma->node.start),
 		     lower_32_bits(vma->node.start),
-		     alignment,
+		     (long long)alignment,
 		     !!(flags & PIN_MAPPABLE),
 		     obj->map_and_fenceable);
 		ret = i915_vma_unbind(vma);
@@ -3559,13 +3561,7 @@ void
 i915_gem_object_ggtt_unpin_view(struct drm_i915_gem_object *obj,
 				const struct i915_ggtt_view *view)
 {
-	struct i915_vma *vma = i915_gem_obj_to_ggtt_view(obj, view);
-
-	BUG_ON(!vma);
-	WARN_ON(vma->pin_count == 0);
-	WARN_ON(!i915_gem_obj_ggtt_bound_view(obj, view));
-
-	--vma->pin_count;
+	i915_vma_unpin(i915_gem_obj_to_ggtt_view(obj, view));
 }
 
 int
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index b7f5781a85ec..15d5a5d247e0 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -378,6 +378,7 @@ int i915_gem_context_init(struct drm_device *dev)
 	}
 
 	if (ctx->legacy_hw_ctx.rcs_state) {
+		u32 alignment = get_context_alignment(dev);
 		int ret;
 
 		/* We may need to do things with the shrinker which
@@ -387,8 +388,8 @@ int i915_gem_context_init(struct drm_device *dev)
 		 * be available. To avoid this we always pin the default
 		 * context.
 		 */
-		ret = i915_gem_obj_ggtt_pin(ctx->legacy_hw_ctx.rcs_state,
-					    get_context_alignment(dev), 0);
+		ret = i915_gem_object_ggtt_pin(ctx->legacy_hw_ctx.rcs_state,
+					       NULL, 0, alignment, 0);
 		if (ret) {
 			DRM_ERROR("Failed to pinned default global context (error %d)\n",
 				  ret);
@@ -674,8 +675,9 @@ static int do_switch(struct drm_i915_gem_request *req)
 
 	/* Trying to pin first makes error handling easier. */
 	if (engine->id == RCS) {
-		ret = i915_gem_obj_ggtt_pin(to->legacy_hw_ctx.rcs_state,
-					    get_context_alignment(engine->dev), 0);
+		u32 alignment = get_context_alignment(engine->dev);
+		ret = i915_gem_object_ggtt_pin(to->legacy_hw_ctx.rcs_state,
+					       NULL, 0, alignment, 0);
 		if (ret)
 			return ret;
 	}
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index d4dcc3e5d080..be90d907f890 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1249,7 +1249,7 @@ i915_gem_execbuffer_parse(struct intel_engine_cs *ring,
 	if (ret)
 		goto err;
 
-	ret = i915_gem_obj_ggtt_pin(shadow_batch_obj, 0, 0);
+	ret = i915_gem_object_ggtt_pin(shadow_batch_obj, NULL, 0, 0, 0);
 	if (ret)
 		goto err;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.c b/drivers/gpu/drm/i915/i915_gem_render_state.c
index 68054f5c4ab1..830c0d24b11e 100644
--- a/drivers/gpu/drm/i915/i915_gem_render_state.c
+++ b/drivers/gpu/drm/i915/i915_gem_render_state.c
@@ -70,7 +70,7 @@ static int render_state_init(struct render_state *so, struct drm_device *dev)
 	if (so->obj == NULL)
 		return -ENOMEM;
 
-	ret = i915_gem_obj_ggtt_pin(so->obj, 4096, 0);
+	ret = i915_gem_object_ggtt_pin(so->obj, NULL, 0, 0, 0);
 	if (ret)
 		goto free_gem;
 
diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index 63e58253280b..c4d8c34092a9 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -626,8 +626,8 @@ static struct drm_i915_gem_object *gem_allocate_guc_obj(struct drm_device *dev,
 		return NULL;
 	}
 
-	if (i915_gem_obj_ggtt_pin(obj, PAGE_SIZE,
-			PIN_OFFSET_BIAS | GUC_WOPCM_TOP)) {
+	if (i915_gem_object_ggtt_pin(obj, NULL, 0, PAGE_SIZE,
+				     PIN_OFFSET_BIAS | GUC_WOPCM_TOP)) {
 		drm_gem_object_unreference(&obj->base);
 		return NULL;
 	}
diff --git a/drivers/gpu/drm/i915/intel_guc_loader.c b/drivers/gpu/drm/i915/intel_guc_loader.c
index d20788ffd341..dded672d5599 100644
--- a/drivers/gpu/drm/i915/intel_guc_loader.c
+++ b/drivers/gpu/drm/i915/intel_guc_loader.c
@@ -296,7 +296,7 @@ static int guc_ucode_xfer(struct drm_i915_private *dev_priv)
 		return ret;
 	}
 
-	ret = i915_gem_obj_ggtt_pin(guc_fw->guc_fw_obj, 0, 0);
+	ret = i915_gem_object_ggtt_pin(guc_fw->guc_fw_obj, NULL, 0, 0, 0);
 	if (ret) {
 		DRM_DEBUG_DRIVER("pin failed %d\n", ret);
 		return ret;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 4d5196547e78..86fa41770ff1 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -581,8 +581,9 @@ static int intel_lr_context_pin(struct intel_context *ctx,
 	lockdep_assert_held(&engine->dev->struct_mutex);
 
 	ctx_obj = ctx->engine[engine->id].state;
-	ret = i915_gem_obj_ggtt_pin(ctx_obj, GEN8_LR_CONTEXT_ALIGN,
-				    PIN_OFFSET_BIAS | GUC_WOPCM_TOP);
+	ret = i915_gem_object_ggtt_pin(ctx_obj, NULL,
+				       0, GEN8_LR_CONTEXT_ALIGN,
+				       PIN_OFFSET_BIAS | GUC_WOPCM_TOP);
 	if (ret)
 		goto err;
 
@@ -933,7 +934,7 @@ static int lrc_setup_wa_ctx_obj(struct intel_engine_cs *ring, u32 size)
 		return -ENOMEM;
 	}
 
-	ret = i915_gem_obj_ggtt_pin(ring->wa_ctx.obj, PAGE_SIZE, 0);
+	ret = i915_gem_object_ggtt_pin(ring->wa_ctx.obj, NULL, 0, PAGE_SIZE, 0);
 	if (ret) {
 		DRM_DEBUG_DRIVER("pin LRC WA ctx backing obj failed: %d\n",
 				 ret);
diff --git a/drivers/gpu/drm/i915/intel_overlay.c b/drivers/gpu/drm/i915/intel_overlay.c
index 094ea87bf6be..414a321b752f 100644
--- a/drivers/gpu/drm/i915/intel_overlay.c
+++ b/drivers/gpu/drm/i915/intel_overlay.c
@@ -1406,7 +1406,8 @@ void intel_setup_overlay(struct drm_device *dev)
 		}
 		overlay->flip_addr = reg_bo->phys_handle->busaddr;
 	} else {
-		ret = i915_gem_obj_ggtt_pin(reg_bo, PAGE_SIZE, PIN_MAPPABLE);
+		ret = i915_gem_object_ggtt_pin(reg_bo, NULL,
+					       0, PAGE_SIZE, PIN_MAPPABLE);
 		if (ret) {
 			DRM_ERROR("failed to pin overlay register bo\n");
 			goto out_free_bo;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 09799ce72212..ba3631d216fe 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -649,7 +649,7 @@ intel_init_pipe_control(struct intel_engine_cs *ring)
 	if (ret)
 		goto err_unref;
 
-	ret = i915_gem_obj_ggtt_pin(ring->scratch.obj, 4096, 0);
+	ret = i915_gem_object_ggtt_pin(ring->scratch.obj, NULL, 0, 4096, 0);
 	if (ret)
 		goto err_unref;
 
@@ -1848,7 +1848,7 @@ static int init_status_page(struct intel_engine_cs *ring)
 			 * actualy map it).
 			 */
 			flags |= PIN_MAPPABLE;
-		ret = i915_gem_obj_ggtt_pin(obj, 4096, flags);
+		ret = i915_gem_object_ggtt_pin(obj, NULL, 0, 4096, flags);
 		if (ret) {
 err_unref:
 			drm_gem_object_unreference(&obj->base);
@@ -1891,7 +1891,7 @@ int intel_ring_map(struct intel_ring *ring)
 	int ret;
 
 	if (HAS_LLC(ring->engine->i915) && !obj->stolen) {
-		ret = i915_gem_obj_ggtt_pin(obj, PAGE_SIZE, 0);
+		ret = i915_gem_object_ggtt_pin(obj, NULL, 0, PAGE_SIZE, 0);
 		if (ret)
 			return ret;
 
@@ -1906,7 +1906,8 @@ int intel_ring_map(struct intel_ring *ring)
 			goto unpin;
 		}
 	} else {
-		ret = i915_gem_obj_ggtt_pin(obj, PAGE_SIZE, PIN_MAPPABLE);
+		ret = i915_gem_object_ggtt_pin(obj, NULL, 0, PAGE_SIZE,
+					       PIN_MAPPABLE);
 		if (ret)
 			return ret;
 
@@ -2503,7 +2504,8 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 				i915.semaphores = 0;
 			} else {
 				i915_gem_object_set_cache_level(obj, I915_CACHE_LLC);
-				ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_NONBLOCK);
+				ret = i915_gem_object_ggtt_pin(obj, NULL,
+							       0, 0, 0);
 				if (ret != 0) {
 					drm_gem_object_unreference(&obj->base);
 					DRM_ERROR("Failed to pin semaphore bo. Disabling semaphores\n");
@@ -2603,7 +2605,7 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 			return -ENOMEM;
 		}
 
-		ret = i915_gem_obj_ggtt_pin(obj, 0, 0);
+		ret = i915_gem_object_ggtt_pin(obj, NULL, 0, 0, 0);
 		if (ret != 0) {
 			drm_gem_object_unreference(&obj->base);
 			DRM_ERROR("Failed to ping batch bo\n");
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 110/190] drm/i915: Move vma->pin_count:4 to vma->flags
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
                     ` (21 preceding siblings ...)
  2016-01-11 10:44   ` [PATCH 109/190] drm/i915: Remove highly confusing i915_gem_obj_ggtt_pin() Chris Wilson
@ 2016-01-11 10:44   ` Chris Wilson
  2016-01-11 10:44   ` [PATCH 111/190] drm/i915: Make fb_tracking.lock a spinlock Chris Wilson
                     ` (30 subsequent siblings)
  53 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:44 UTC (permalink / raw)
  To: intel-gfx

Let's aid gcc in our pin_count tracking, as
i915_vma_pin()/i915_vma_unpin() are some of the hottest of the hot
functions and gcc doesn't like bitfields that much!
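
A sketch of the packed encoding, inferred from the masks used by
i915_vma_pin() in the hunks below (not a definitive layout):

	/*
	 * vma->flags, replacing the pin_count:4 bitfield:
	 *
	 *   bits 0-3: pin count, so pin/unpin become a plain
	 *             vma->flags++ / vma->flags--
	 *   bits 4+:  bound state, i.e. GLOBAL_BIND/LOCAL_BIND
	 *             shifted left by 4
	 *
	 * Hence "(bound & 0xf) == DRM_I915_GEM_OBJECT_MAX_PIN_COUNT-1"
	 * detects pin-count overflow, "(bound & 0xff) == 0" tests for a
	 * wholly unused vma, and "bound >> 4" recovers the bind flags
	 * for comparison against the requested PIN_* bits.
	 */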

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h            | 20 +++++++--------
 drivers/gpu/drm/i915/i915_gem.c            | 27 ++++++++++++---------
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 10 ++++----
 drivers/gpu/drm/i915/i915_gem_gtt.h        | 39 +++++++++++++++++-------------
 4 files changed, 52 insertions(+), 44 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 0e3ff0b24d4d..a81e0f6de593 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2686,20 +2686,20 @@ i915_vma_pin(struct i915_vma *vma,
 	     uint64_t alignment,
 	     uint64_t flags);
 /* Flags used by pin/bind&friends. */
-#define PIN_MAPPABLE	(1<<0)
-#define PIN_NONBLOCK	(1<<1)
-#define PIN_GLOBAL	(1<<2)
-#define PIN_OFFSET_BIAS	(1<<3)
-#define PIN_USER	(1<<4)
-#define PIN_UPDATE	(1<<5)
-#define PIN_ZONE_4G	(1<<6)
-#define PIN_HIGH	(1<<7)
-#define PIN_OFFSET_FIXED	(1<<8)
+#define PIN_GLOBAL	(1<<0)
+#define PIN_USER	(1<<1)
+#define PIN_UPDATE	(1<<2)
+#define PIN_MAPPABLE	(1<<3)
+#define PIN_ZONE_4G	(1<<4)
+#define PIN_NONBLOCK	(1<<5)
+#define PIN_HIGH	(1<<6)
+#define PIN_OFFSET_BIAS	(1<<7)
+#define PIN_OFFSET_FIXED (1<<8)
 #define PIN_OFFSET_MASK (~4095)
 
 static inline void __i915_vma_unpin(struct i915_vma *vma)
 {
-	vma->pin_count--;
+	vma->flags--;
 }
 
 static inline void i915_vma_unpin(struct i915_vma *vma)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 495fb80edee0..9bbabc21d3e0 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3484,38 +3484,41 @@ i915_vma_pin(struct i915_vma *vma,
 	     uint64_t alignment,
 	     uint64_t flags)
 {
-	unsigned bound = vma->bound;
+	unsigned bound;
 	int ret;
 
 	GEM_BUG_ON((flags & (PIN_GLOBAL | PIN_USER)) == 0);
 	GEM_BUG_ON((flags & PIN_GLOBAL) && !vma->is_ggtt);
 
-	if (WARN_ON(vma->pin_count == DRM_I915_GEM_OBJECT_MAX_PIN_COUNT))
-		return -EBUSY;
-
 	/* Pin early to prevent the shrinker/eviction logic from destroying
 	 * our vma as we insert and bind.
 	 */
-	vma->pin_count++;
+	bound = vma->flags++;
+	if (WARN_ON((bound & 0xf) == (DRM_I915_GEM_OBJECT_MAX_PIN_COUNT-1))) {
+		ret = -EBUSY;
+		goto err;
+	}
 
-	if (!bound) {
+	if ((bound & 0xff) == 0) {
 		ret = i915_vma_insert(vma, size, alignment, flags);
 		if (ret)
 			goto err;
 	}
 
-	ret = i915_vma_bind(vma, vma->obj->cache_level, flags);
-	if (ret)
-		goto err;
+	if (~(bound >> 4) & (flags & (GLOBAL_BIND | LOCAL_BIND))) {
+		ret = i915_vma_bind(vma, vma->obj->cache_level, flags);
+		if (ret)
+			goto err;
 
-	if ((bound ^ vma->bound) & GLOBAL_BIND)
-		__i915_vma_set_map_and_fenceable(vma);
+		if ((bound ^ vma->flags) & (GLOBAL_BIND << 4))
+			__i915_vma_set_map_and_fenceable(vma);
+	}
 
 	GEM_BUG_ON(i915_vma_misplaced(vma, size, alignment, flags));
 	return 0;
 
 err:
-	vma->pin_count--;
+	__i915_vma_unpin(vma);
 	return ret;
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index be90d907f890..79dbd74b73c2 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -34,10 +34,10 @@
 #include <linux/dma_remapping.h>
 #include <linux/uaccess.h>
 
-#define  __EXEC_OBJECT_HAS_PIN (1<<31)
-#define  __EXEC_OBJECT_HAS_FENCE (1<<30)
-#define  __EXEC_OBJECT_NEEDS_MAP (1<<29)
-#define  __EXEC_OBJECT_NEEDS_BIAS (1<<28)
+#define  __EXEC_OBJECT_HAS_PIN (1U<<31)
+#define  __EXEC_OBJECT_HAS_FENCE (1U<<30)
+#define  __EXEC_OBJECT_NEEDS_MAP (1U<<29)
+#define  __EXEC_OBJECT_NEEDS_BIAS (1U<<28)
 
 #define BATCH_OFFSET_BIAS (256*1024)
 
@@ -253,7 +253,7 @@ i915_gem_execbuffer_unreserve_vma(struct i915_vma *vma)
 		i915_gem_object_unpin_fence(obj);
 
 	if (entry->flags & __EXEC_OBJECT_HAS_PIN)
-		vma->pin_count--;
+		__i915_vma_unpin(vma);
 
 	entry->flags &= ~(__EXEC_OBJECT_HAS_FENCE | __EXEC_OBJECT_HAS_PIN);
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 8877dc48f028..e6f64dcb2e77 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -185,13 +185,30 @@ struct i915_vma {
 
 	struct i915_gem_active last_read[I915_NUM_RINGS];
 
-	/** Flags and address space this VMA is bound to */
+	union {
+		struct {
+			/**
+			 * How many users have pinned this object in GTT space. The following
+			 * users can each hold at most one reference: pwrite/pread, execbuffer
+			 * (objects are not allowed multiple times for the same batchbuffer),
+			 * and the framebuffer code. When switching/pageflipping, the
+			 * framebuffer code has at most two buffers pinned per crtc.
+			 *
+			 * In the worst case this is 1 + 1 + 1 + 2*2 = 7. That would fit into 3
+			 * bits with absolutely no headroom. So use 4 bits. */
+			unsigned int pin_count : 4;
+#define DRM_I915_GEM_OBJECT_MAX_PIN_COUNT 0xf
+
+			/** Flags and address space this VMA is bound to */
 #define GLOBAL_BIND	(1<<0)
 #define LOCAL_BIND	(1<<1)
-	unsigned int bound : 4;
-	unsigned int active : I915_NUM_RINGS;
-	bool is_ggtt : 1;
-	bool closed : 1;
+			unsigned int bound : 4;
+			unsigned int active : I915_NUM_RINGS;
+			bool is_ggtt : 1;
+			bool closed : 1;
+		};
+		unsigned int flags;
+	};
 
 	/**
 	 * Support different GGTT views into the same object.
@@ -216,18 +233,6 @@ struct i915_vma {
 	struct hlist_node exec_node;
 	unsigned long exec_handle;
 	struct drm_i915_gem_exec_object2 *exec_entry;
-
-	/**
-	 * How many users have pinned this object in GTT space. The following
-	 * users can each hold at most one reference: pwrite/pread, execbuffer
-	 * (objects are not allowed multiple times for the same batchbuffer),
-	 * and the framebuffer code. When switching/pageflipping, the
-	 * framebuffer code has at most two buffers pinned per crtc.
-	 *
-	 * In the worst case this is 1 + 1 + 1 + 2*2 = 7. That would fit into 3
-	 * bits with absolutely no headroom. So use 4 bits. */
-	unsigned int pin_count:4;
-#define DRM_I915_GEM_OBJECT_MAX_PIN_COUNT 0xf
 };
 
 struct i915_page_dma {
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 111/190] drm/i915: Make fb_tracking.lock a spinlock
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
                     ` (22 preceding siblings ...)
  2016-01-11 10:44   ` [PATCH 110/190] drm/i915: Move vma->pin_count:4 to vma->flags Chris Wilson
@ 2016-01-11 10:44   ` Chris Wilson
  2016-01-11 10:44   ` [PATCH 112/190] drm/i915: Move obj->active:5 to obj->flags Chris Wilson
                     ` (29 subsequent siblings)
  53 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:44 UTC (permalink / raw)
  To: intel-gfx

We only need a very lightweight mechanism here as the locking is only
used for co-ordinating a bitfield.

Also double check that the object is still pinned to the display plane
before processing the state change.

v2: Move the cheap unlikely tests into the caller
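
A minimal userspace sketch of the pattern (pthread_spinlock_t standing
in for the kernel spinlock_t; the toy_* names are illustrative): the
critical sections only flip bits in a couple of masks and never sleep,
so a spinlock suffices:

#include <assert.h>
#include <pthread.h>

struct toy_fb_tracking {
	pthread_spinlock_t lock;
	unsigned int busy_bits;
	unsigned int flip_bits;
};

static void toy_invalidate(struct toy_fb_tracking *t, unsigned int bits)
{
	pthread_spin_lock(&t->lock);
	t->busy_bits |= bits;	/* frontbuffer is being rendered to */
	t->flip_bits &= ~bits;	/* cancel any pending flip on it */
	pthread_spin_unlock(&t->lock);
}

int main(void)
{
	struct toy_fb_tracking t = { .busy_bits = 0, .flip_bits = 2 };

	pthread_spin_init(&t.lock, PTHREAD_PROCESS_PRIVATE);
	toy_invalidate(&t, 2);
	assert(t.busy_bits == 2 && t.flip_bits == 0);
	pthread_spin_destroy(&t.lock);
	return 0;
}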

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h          |  2 +-
 drivers/gpu/drm/i915/i915_gem.c          |  2 +-
 drivers/gpu/drm/i915/intel_drv.h         | 29 ++++++++++++++---
 drivers/gpu/drm/i915/intel_frontbuffer.c | 54 ++++++++++++++------------------
 4 files changed, 51 insertions(+), 36 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index a81e0f6de593..efa43411f0eb 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1628,7 +1628,7 @@ struct intel_pipe_crc {
 };
 
 struct i915_frontbuffer_tracking {
-	struct mutex lock;
+	spinlock_t lock;
 
 	/*
 	 * Tracking bits for delayed frontbuffer flushing due to gpu activity or
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 9bbabc21d3e0..74c56716a304 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4258,7 +4258,7 @@ i915_gem_load(struct drm_device *dev)
 
 	i915_gem_shrinker_init(dev_priv);
 
-	mutex_init(&dev_priv->fb_tracking.lock);
+	spin_lock_init(&dev_priv->fb_tracking.lock);
 }
 
 void i915_gem_release(struct drm_device *dev, struct drm_file *file)
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index 1e082ab4f4d8..41e2e1c4d052 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -1055,8 +1055,6 @@ void intel_ddi_set_vc_payload_alloc(struct drm_crtc *crtc, bool state);
 uint32_t ddi_signal_levels(struct intel_dp *intel_dp);
 
 /* intel_frontbuffer.c */
-void intel_fb_obj_invalidate(struct drm_i915_gem_object *obj,
-			     enum fb_op_origin origin);
 void intel_frontbuffer_flip_prepare(struct drm_device *dev,
 				    unsigned frontbuffer_bits);
 void intel_frontbuffer_flip_complete(struct drm_device *dev,
@@ -1067,8 +1065,31 @@ unsigned int intel_fb_align_height(struct drm_device *dev,
 				   unsigned int height,
 				   uint32_t pixel_format,
 				   uint64_t fb_format_modifier);
-void intel_fb_obj_flush(struct drm_i915_gem_object *obj, bool retire,
-			enum fb_op_origin origin);
+
+void __intel_fb_obj_invalidate(struct drm_i915_gem_object *obj,
+			       enum fb_op_origin origin);
+static inline void intel_fb_obj_invalidate(struct drm_i915_gem_object *obj,
+					   enum fb_op_origin origin)
+{
+	if (!obj->frontbuffer_bits || !obj->pin_display)
+		return;
+
+	__intel_fb_obj_invalidate(obj, origin);
+}
+
+void __intel_fb_obj_flush(struct drm_i915_gem_object *obj,
+			  bool retire,
+			  enum fb_op_origin origin);
+static inline void intel_fb_obj_flush(struct drm_i915_gem_object *obj,
+				      bool retire,
+				      enum fb_op_origin origin)
+{
+	if (!obj->frontbuffer_bits || !obj->pin_display)
+		return;
+
+	__intel_fb_obj_flush(obj, retire, origin);
+}
+
 u32 intel_fb_stride_alignment(struct drm_device *dev, uint64_t fb_modifier,
 			      uint32_t pixel_format);
 
diff --git a/drivers/gpu/drm/i915/intel_frontbuffer.c b/drivers/gpu/drm/i915/intel_frontbuffer.c
index ac85357010b4..a38ccfe4894a 100644
--- a/drivers/gpu/drm/i915/intel_frontbuffer.c
+++ b/drivers/gpu/drm/i915/intel_frontbuffer.c
@@ -76,24 +76,19 @@
  * until the rendering completes or a flip on this frontbuffer plane is
  * scheduled.
  */
-void intel_fb_obj_invalidate(struct drm_i915_gem_object *obj,
-			     enum fb_op_origin origin)
+void __intel_fb_obj_invalidate(struct drm_i915_gem_object *obj,
+			       enum fb_op_origin origin)
 {
 	struct drm_device *dev = obj->base.dev;
 	struct drm_i915_private *dev_priv = to_i915(dev);
 
 	WARN_ON(!mutex_is_locked(&dev->struct_mutex));
 
-	if (!obj->frontbuffer_bits)
-		return;
-
 	if (origin == ORIGIN_CS) {
-		mutex_lock(&dev_priv->fb_tracking.lock);
-		dev_priv->fb_tracking.busy_bits
-			|= obj->frontbuffer_bits;
-		dev_priv->fb_tracking.flip_bits
-			&= ~obj->frontbuffer_bits;
-		mutex_unlock(&dev_priv->fb_tracking.lock);
+		spin_lock(&dev_priv->fb_tracking.lock);
+		dev_priv->fb_tracking.busy_bits |= obj->frontbuffer_bits;
+		dev_priv->fb_tracking.flip_bits &= ~obj->frontbuffer_bits;
+		spin_unlock(&dev_priv->fb_tracking.lock);
 	}
 
 	intel_psr_invalidate(dev, obj->frontbuffer_bits);
@@ -120,11 +115,11 @@ static void intel_frontbuffer_flush(struct drm_device *dev,
 	struct drm_i915_private *dev_priv = to_i915(dev);
 
 	/* Delay flushing when rings are still busy.*/
-	mutex_lock(&dev_priv->fb_tracking.lock);
+	spin_lock(&dev_priv->fb_tracking.lock);
 	frontbuffer_bits &= ~dev_priv->fb_tracking.busy_bits;
-	mutex_unlock(&dev_priv->fb_tracking.lock);
+	spin_unlock(&dev_priv->fb_tracking.lock);
 
-	if (!frontbuffer_bits)
+	if (frontbuffer_bits == 0)
 		return;
 
 	intel_edp_drrs_flush(dev, frontbuffer_bits);
@@ -142,8 +137,9 @@ static void intel_frontbuffer_flush(struct drm_device *dev,
  * completed and frontbuffer caching can be started again. If @retire is true
  * then any delayed flushes will be unblocked.
  */
-void intel_fb_obj_flush(struct drm_i915_gem_object *obj,
-			bool retire, enum fb_op_origin origin)
+void __intel_fb_obj_flush(struct drm_i915_gem_object *obj,
+			  bool retire,
+			  enum fb_op_origin origin)
 {
 	struct drm_device *dev = obj->base.dev;
 	struct drm_i915_private *dev_priv = to_i915(dev);
@@ -151,21 +147,18 @@ void intel_fb_obj_flush(struct drm_i915_gem_object *obj,
 
 	WARN_ON(!mutex_is_locked(&dev->struct_mutex));
 
-	if (!obj->frontbuffer_bits)
-		return;
-
 	frontbuffer_bits = obj->frontbuffer_bits;
 
 	if (retire) {
-		mutex_lock(&dev_priv->fb_tracking.lock);
+		spin_lock(&dev_priv->fb_tracking.lock);
 		/* Filter out new bits since rendering started. */
 		frontbuffer_bits &= dev_priv->fb_tracking.busy_bits;
-
 		dev_priv->fb_tracking.busy_bits &= ~frontbuffer_bits;
-		mutex_unlock(&dev_priv->fb_tracking.lock);
+		spin_unlock(&dev_priv->fb_tracking.lock);
 	}
 
-	intel_frontbuffer_flush(dev, frontbuffer_bits, origin);
+	if (frontbuffer_bits)
+		intel_frontbuffer_flush(dev, frontbuffer_bits, origin);
 }
 
 /**
@@ -185,11 +178,11 @@ void intel_frontbuffer_flip_prepare(struct drm_device *dev,
 {
 	struct drm_i915_private *dev_priv = to_i915(dev);
 
-	mutex_lock(&dev_priv->fb_tracking.lock);
+	spin_lock(&dev_priv->fb_tracking.lock);
 	dev_priv->fb_tracking.flip_bits |= frontbuffer_bits;
 	/* Remove stale busy bits due to the old buffer. */
 	dev_priv->fb_tracking.busy_bits &= ~frontbuffer_bits;
-	mutex_unlock(&dev_priv->fb_tracking.lock);
+	spin_unlock(&dev_priv->fb_tracking.lock);
 
 	intel_psr_single_frame_update(dev, frontbuffer_bits);
 }
@@ -209,13 +202,14 @@ void intel_frontbuffer_flip_complete(struct drm_device *dev,
 {
 	struct drm_i915_private *dev_priv = to_i915(dev);
 
-	mutex_lock(&dev_priv->fb_tracking.lock);
+	spin_lock(&dev_priv->fb_tracking.lock);
 	/* Mask any cancelled flips. */
 	frontbuffer_bits &= dev_priv->fb_tracking.flip_bits;
 	dev_priv->fb_tracking.flip_bits &= ~frontbuffer_bits;
-	mutex_unlock(&dev_priv->fb_tracking.lock);
+	spin_unlock(&dev_priv->fb_tracking.lock);
 
-	intel_frontbuffer_flush(dev, frontbuffer_bits, ORIGIN_FLIP);
+	if (frontbuffer_bits)
+		intel_frontbuffer_flush(dev, frontbuffer_bits, ORIGIN_FLIP);
 }
 
 /**
@@ -234,10 +228,10 @@ void intel_frontbuffer_flip(struct drm_device *dev,
 {
 	struct drm_i915_private *dev_priv = to_i915(dev);
 
-	mutex_lock(&dev_priv->fb_tracking.lock);
+	spin_lock(&dev_priv->fb_tracking.lock);
 	/* Remove stale busy bits due to the old buffer. */
 	dev_priv->fb_tracking.busy_bits &= ~frontbuffer_bits;
-	mutex_unlock(&dev_priv->fb_tracking.lock);
+	spin_unlock(&dev_priv->fb_tracking.lock);
 
 	intel_frontbuffer_flush(dev, frontbuffer_bits, ORIGIN_FLIP);
 }
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 112/190] drm/i915: Move obj->active:5 to obj->flags
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
                     ` (23 preceding siblings ...)
  2016-01-11 10:44   ` [PATCH 111/190] drm/i915: Make fb_tracking.lock a spinlock Chris Wilson
@ 2016-01-11 10:44   ` Chris Wilson
  2016-03-24 12:00     ` David Weinehall
  2016-01-11 10:44   ` [PATCH 113/190] drm/i915: Enable lockless lookup of request tracking via RCU Chris Wilson
                     ` (28 subsequent siblings)
  53 siblings, 1 reply; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:44 UTC (permalink / raw)
  To: intel-gfx

We are motivated to avoid using a bitfield for obj->active for a couple
of reasons. Firstly, we wish to document our lockless read of obj->active
using READ_ONCE inside i915_gem_busy_ioctl() and that requires an
integral type (i.e. not a bitfield). Secondly, gcc produces abysmal code
when presented with a bitfield, and that shows up high in the profiles of
request tracking, mainly due to excess memory traffic as it converts the
bitfield to a register and back, generating frequent AGI (address
generation interlock) stalls in the process.
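
To illustrate the first point, a hedged userspace sketch (READ_ONCE
modelled with a volatile cast, as commonly done for GCC/Clang; the
toy_* names are illustrative): this kind of single-load lockless read
only works on an integral field, never on a bitfield:

#include <stdio.h>

#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

#define BO_ACTIVE_MASK ((1ul << 4) - 1)	/* assumes 4 rings */

struct toy_bo {
	unsigned long flags;	/* low bits: per-ring active mask */
};

static unsigned long toy_bo_active(const struct toy_bo *bo)
{
	/* a single racy-but-atomic load; READ_ONCE() cannot be
	 * applied to a bitfield */
	return READ_ONCE(bo->flags) & BO_ACTIVE_MASK;
}

int main(void)
{
	struct toy_bo bo = { .flags = 1ul << 2 };

	printf("active mask: %#lx\n", toy_bo_active(&bo));
	return 0;
}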

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c        |  2 +-
 drivers/gpu/drm/i915/i915_drv.h            | 31 +++++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_gem.c            | 20 +++++++++----------
 drivers/gpu/drm/i915/i915_gem_batch_pool.c |  4 ++--
 drivers/gpu/drm/i915/i915_gem_context.c    |  2 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 10 +++++-----
 drivers/gpu/drm/i915/i915_gem_gtt.c        |  2 +-
 drivers/gpu/drm/i915/i915_gem_shrinker.c   |  5 +++--
 8 files changed, 53 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index dee66807c6bd..6b14c59828e3 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -136,7 +136,7 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 
 	seq_printf(m, "%pK: %s%s%s%s %8zdKiB %02x %02x [ ",
 		   &obj->base,
-		   obj->active ? "*" : " ",
+		   i915_gem_object_is_active(obj) ? "*" : " ",
 		   get_pin_flag(obj),
 		   get_tiling_flag(obj),
 		   get_global_flag(obj),
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index efa43411f0eb..1ecff535973e 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2031,12 +2031,16 @@ struct drm_i915_gem_object {
 
 	struct list_head batch_pool_link;
 
+	unsigned long flags;
 	/**
 	 * This is set if the object is on the active lists (has pending
 	 * rendering and so a non-zero seqno), and is not set if it is on
 	 * inactive (ready to be unbound) list.
 	 */
-	unsigned int active:I915_NUM_RINGS;
+#define I915_BO_ACTIVE_SHIFT 0
+#define I915_BO_ACTIVE_MASK ((1 << I915_NUM_RINGS) - 1)
+#define I915_BO_ACTIVE(bo) ((bo)->flags & (I915_BO_ACTIVE_MASK << I915_BO_ACTIVE_SHIFT))
+#define __I915_BO_ACTIVE(bo) (READ_ONCE((bo)->flags) & (I915_BO_ACTIVE_MASK << I915_BO_ACTIVE_SHIFT))
 
 	/**
 	 * This is set if the object has been written to since last bound
@@ -2151,6 +2155,31 @@ struct drm_i915_gem_object {
 #define GEM_BUG_ON(expr)
 #endif
 
+static inline bool
+i915_gem_object_is_active(const struct drm_i915_gem_object *obj)
+{
+	return obj->flags & (I915_BO_ACTIVE_MASK << I915_BO_ACTIVE_SHIFT);
+}
+
+static inline void
+i915_gem_object_set_active(struct drm_i915_gem_object *obj, int engine)
+{
+	obj->flags |= 1 << (engine + I915_BO_ACTIVE_SHIFT);
+}
+
+static inline void
+i915_gem_object_unset_active(struct drm_i915_gem_object *obj, int engine)
+{
+	obj->flags &= ~(1 << (engine + I915_BO_ACTIVE_SHIFT));
+}
+
+static inline bool
+i915_gem_object_has_active_engine(const struct drm_i915_gem_object *obj,
+				  int engine)
+{
+	return obj->flags & (1 << (engine + I915_BO_ACTIVE_SHIFT));
+}
+
 void i915_gem_track_fb(struct drm_i915_gem_object *old,
 		       struct drm_i915_gem_object *new,
 		       unsigned frontbuffer_bits);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 74c56716a304..6712ecf1239b 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1130,7 +1130,7 @@ i915_gem_object_wait_rendering(struct drm_i915_gem_object *obj,
 {
 	int ret, i;
 
-	if (!obj->active)
+	if (!i915_gem_object_is_active(obj))
 		return 0;
 
 	if (readonly) {
@@ -1143,7 +1143,7 @@ i915_gem_object_wait_rendering(struct drm_i915_gem_object *obj,
 			if (ret)
 				return ret;
 		}
-		GEM_BUG_ON(obj->active);
+		GEM_BUG_ON(i915_gem_object_is_active(obj));
 	}
 
 	return 0;
@@ -1165,7 +1165,7 @@ i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
 	BUG_ON(!mutex_is_locked(&dev->struct_mutex));
 	BUG_ON(!dev_priv->mm.interruptible);
 
-	if (!obj->active)
+	if (!i915_gem_object_is_active(obj))
 		return 0;
 
 	if (readonly) {
@@ -2080,10 +2080,10 @@ i915_gem_object_retire__read(struct i915_gem_active *active,
 	struct drm_i915_gem_object *obj =
 		container_of(active, struct drm_i915_gem_object, last_read[ring]);
 
-	GEM_BUG_ON((obj->active & (1 << ring)) == 0);
+	GEM_BUG_ON(!i915_gem_object_has_active_engine(obj, ring));
 
-	obj->active &= ~(1 << ring);
-	if (obj->active)
+	i915_gem_object_unset_active(obj, ring);
+	if (i915_gem_object_is_active(obj))
 		return;
 
 	/* Bump our place on the bound list to keep it roughly in LRU order
@@ -2373,7 +2373,7 @@ i915_gem_object_flush_active(struct drm_i915_gem_object *obj)
 {
 	int i;
 
-	if (!obj->active)
+	if (!i915_gem_object_is_active(obj))
 		return;
 
 	for (i = 0; i < I915_NUM_RINGS; i++) {
@@ -2459,7 +2459,7 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 
 	/* Need to make sure the object gets inactive eventually. */
 	i915_gem_object_flush_active(obj);
-	if (!obj->active)
+	if (!i915_gem_object_is_active(obj))
 		goto out;
 
 	/* Do this after OLR check to make sure we make forward progress polling
@@ -2557,7 +2557,7 @@ i915_gem_object_sync(struct drm_i915_gem_object *obj,
 	struct drm_i915_gem_request *req[I915_NUM_RINGS];
 	int ret, i, n;
 
-	if (!obj->active)
+	if (!i915_gem_object_is_active(obj))
 		return 0;
 
 	n = 0;
@@ -3593,7 +3593,7 @@ i915_gem_busy_ioctl(struct drm_device *dev, void *data,
 	i915_gem_object_flush_active(obj);
 
 	BUILD_BUG_ON(I915_NUM_RINGS > 16);
-	args->busy = obj->active << 16;
+	args->busy = I915_BO_ACTIVE(obj) << 16;
 	if (obj->last_write.request)
 		args->busy |= obj->last_write.request->engine->id;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_batch_pool.c b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
index d4318665ac6c..5ec5b1439e1f 100644
--- a/drivers/gpu/drm/i915/i915_gem_batch_pool.c
+++ b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
@@ -115,14 +115,14 @@ i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool,
 
 	list_for_each_entry_safe(tmp, next, list, batch_pool_link) {
 		/* The batches are strictly LRU ordered */
-		if (tmp->active) {
+		if (i915_gem_object_is_active(tmp)) {
 			struct drm_i915_gem_request *rq;
 
 			rq = tmp->last_read[pool->engine->id].request;
 			if (!i915_gem_request_completed(rq))
 				break;
 
-			GEM_BUG_ON(tmp->active & ~intel_engine_flag(pool->engine));
+			GEM_BUG_ON((tmp->flags >> I915_BO_ACTIVE_SHIFT) & (~intel_engine_flag(pool->engine) & I915_BO_ACTIVE_MASK));
 			GEM_BUG_ON(tmp->last_write.request);
 		}
 
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 15d5a5d247e0..9250a7405807 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -427,7 +427,7 @@ void i915_gem_context_fini(struct drm_device *dev)
 		WARN_ON(!dev_priv->ring[RCS].last_context);
 		if (dev_priv->ring[RCS].last_context == dctx) {
 			/* Fake switch to NULL context */
-			WARN_ON(dctx->legacy_hw_ctx.rcs_state->active);
+			WARN_ON(i915_gem_object_is_active(dctx->legacy_hw_ctx.rcs_state));
 			i915_gem_object_ggtt_unpin(dctx->legacy_hw_ctx.rcs_state);
 			i915_gem_context_unreference(dctx);
 			dev_priv->ring[RCS].last_context = NULL;
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 79dbd74b73c2..e66864bdbfb4 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -515,7 +515,7 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
 	}
 
 	/* We can't wait for rendering with pagefaults disabled */
-	if (obj->active && pagefault_disabled())
+	if (i915_gem_object_is_active(obj) && pagefault_disabled())
 		return -EFAULT;
 
 	if (use_cpu_reloc(obj))
@@ -977,7 +977,7 @@ static int
 i915_gem_execbuffer_move_to_gpu(struct drm_i915_gem_request *req,
 				struct list_head *vmas)
 {
-	const unsigned other_rings = ~intel_engine_flag(req->engine);
+	const unsigned other_rings = (~intel_engine_flag(req->engine) & I915_BO_ACTIVE_MASK) << I915_BO_ACTIVE_SHIFT;
 	struct i915_vma *vma;
 	uint32_t flush_domains = 0;
 	bool flush_chipset = false;
@@ -986,7 +986,7 @@ i915_gem_execbuffer_move_to_gpu(struct drm_i915_gem_request *req,
 	list_for_each_entry(vma, vmas, exec_list) {
 		struct drm_i915_gem_object *obj = vma->obj;
 
-		if (obj->active & other_rings) {
+		if (obj->flags & other_rings) {
 			ret = i915_gem_object_sync(obj, req);
 			if (ret)
 				return ret;
@@ -1145,9 +1145,9 @@ void i915_vma_move_to_active(struct i915_vma *vma,
 	 * add the active reference first and queue for it to be dropped
 	 * *last*.
 	 */
-	if (obj->active == 0)
+	if (!i915_gem_object_is_active(obj))
 		drm_gem_object_reference(&obj->base);
-	obj->active |= 1 << engine;
+	i915_gem_object_set_active(obj, engine);
 	i915_gem_request_mark_active(req, &obj->last_read[engine]);
 
 	if (flags & EXEC_OBJECT_WRITE) {
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 8f3b2f051918..6652df57e5b0 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -3229,7 +3229,7 @@ i915_vma_retire(struct i915_gem_active *active,
 		container_of(active, struct i915_vma, last_read[engine]);
 
 	GEM_BUG_ON((vma->active & (1 << engine)) == 0);
-	GEM_BUG_ON((vma->obj->active & vma->active) != vma->active);
+	GEM_BUG_ON(((vma->obj->flags >> I915_BO_ACTIVE_SHIFT) & vma->active) != vma->active);
 
 	vma->active &= ~(1 << engine);
 	if (vma->active)
diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
index 67f3eb9a8391..4d44def8fb03 100644
--- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
@@ -150,7 +150,8 @@ i915_gem_shrink(struct drm_i915_private *dev_priv,
 			    obj->madv != I915_MADV_DONTNEED)
 				continue;
 
-			if ((flags & I915_SHRINK_ACTIVE) == 0 && obj->active)
+			if ((flags & I915_SHRINK_ACTIVE) == 0 &&
+			    i915_gem_object_is_active(obj))
 				continue;
 
 			if (!can_release_pages(obj))
@@ -233,7 +234,7 @@ i915_gem_shrinker_count(struct shrinker *shrinker, struct shrink_control *sc)
 			count += obj->base.size >> PAGE_SHIFT;
 
 	list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list) {
-		if (!obj->active && can_release_pages(obj))
+		if (!i915_gem_object_is_active(obj) && can_release_pages(obj))
 			count += obj->base.size >> PAGE_SHIFT;
 	}
 
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 113/190] drm/i915: Enable lockless lookup of request tracking via RCU
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
                     ` (24 preceding siblings ...)
  2016-01-11 10:44   ` [PATCH 112/190] drm/i915: Move obj->active:5 to obj->flags Chris Wilson
@ 2016-01-11 10:44   ` Chris Wilson
  2016-01-11 10:44   ` [PATCH 114/190] drm/i915: Remove (struct_mutex) locking for wait-ioctl Chris Wilson
                     ` (27 subsequent siblings)
  53 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:44 UTC (permalink / raw)
  To: intel-gfx

If we enable RCU for the requests (providing a grace period where we can
inspect a "dead" request before it is freed), we can allow callers to
carefully perform lockless lookup of an active request.

However, by enabling deferred freeing of requests, we can potentially
hog a lot of memory when dealing with tens of thousands of requests per
second - with a quick insertion of a synchronize_rcu() inside our
shrinker callback, that issue disappears.

v2: Currently, it is our responsibility to handle reclaim, i.e. to avoid
hogging memory with the delayed slab frees. At the moment, we wait for a
grace period in the shrinker, and block for all RCU callbacks on oom.
Suggested alternatives focus on flushing our RCU callback when we have a
certain number of outstanding request frees, and blocking on that flush
after a second high watermark. (So rather than wait for the system to
run out of memory, we stop issuing requests - both are nondeterministic.)

Paul E. McKenney wrote:

Another approach is synchronize_rcu() after some largish number of
requests.  The advantage of this approach is that it throttles the
production of callbacks at the source.  The corresponding disadvantage
is that it slows things up.

Another approach is to use call_rcu(), but if the previous call_rcu()
is still in flight, block waiting for it.  Yet another approach is
the get_state_synchronize_rcu() / cond_synchronize_rcu() pair.  The
idea is to do something like this:

        cond_synchronize_rcu(cookie);
        cookie = get_state_synchronize_rcu();

You would of course do an initial get_state_synchronize_rcu() to
get things going.  This would not block unless there was less than
one grace period's worth of time between invocations.  But this
assumes a busy system, where there is almost always a grace period
in flight.  But you can make that happen as follows:

        cond_synchronize_rcu(cookie);
        cookie = get_state_synchronize_rcu();
        call_rcu(&my_rcu_head, noop_function);

Note that you need additional code to make sure that the old callback
has completed before doing a new one.  Setting and clearing a flag
with appropriate memory ordering control suffices (e.g,. smp_load_acquire()
and smp_store_release()).
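
For reference, a hedged userspace sketch of the lookup pattern that
SLAB_DESTROY_BY_RCU permits (C11 atomics standing in for
rcu_dereference()/fence_get_rcu(); the toy_* names are illustrative):
the slab may recycle a freed request as a new one of the same type
within the grace period, so the reader takes a reference that fails on
a zero refcount and retries:

#include <stdatomic.h>
#include <stddef.h>

struct toy_request {
	atomic_int refcount;	/* 0 means the request is being freed */
};

struct toy_active {
	_Atomic(struct toy_request *) request;
};

/* like kref_get_unless_zero(): never resurrect a dying request */
static int toy_request_get(struct toy_request *req)
{
	int old = atomic_load(&req->refcount);

	while (old > 0)
		if (atomic_compare_exchange_weak(&req->refcount,
						 &old, old + 1))
			return 1;
	return 0;
}

static struct toy_request *toy_active_get(struct toy_active *active)
{
	for (;;) {
		struct toy_request *req = atomic_load(&active->request);

		if (req == NULL || toy_request_get(req))
			return req;
		/* req was mid-free; reload the slot and retry */
	}
}

int main(void)
{
	struct toy_request req = { .refcount = 1 };
	struct toy_active active = { .request = &req };

	return toy_active_get(&active) == &req ? 0 : 1;
}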

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c          |  3 ++-
 drivers/gpu/drm/i915/i915_gem_request.c  |  2 +-
 drivers/gpu/drm/i915/i915_gem_request.h  | 24 +++++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_gem_shrinker.c | 15 +++++++++++----
 4 files changed, 37 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 6712ecf1239b..ee715558ecea 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4215,7 +4215,8 @@ i915_gem_load(struct drm_device *dev)
 	dev_priv->requests =
 		kmem_cache_create("i915_gem_request",
 				  sizeof(struct drm_i915_gem_request), 0,
-				  SLAB_HWCACHE_ALIGN,
+				  SLAB_HWCACHE_ALIGN |
+				  SLAB_DESTROY_BY_RCU,
 				  NULL);
 
 	INIT_LIST_HEAD(&dev_priv->context_list);
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 61be8dda4a14..be24bde2e602 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -336,7 +336,7 @@ static void __i915_gem_request_retire_active(struct drm_i915_gem_request *req)
 	 */
 	list_for_each_entry_safe(active, next, &req->active_list, link) {
 		INIT_LIST_HEAD(&active->link);
-		active->request = NULL;
+		rcu_assign_pointer(active->request, NULL);
 
 		active->retire(active, req);
 	}
diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index c2e83584f8a2..f035db7c97cd 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -138,6 +138,12 @@ i915_gem_request_get(struct drm_i915_gem_request *req)
 	return to_request(fence_get(&req->fence));
 }
 
+static inline struct drm_i915_gem_request *
+i915_gem_request_get_rcu(struct drm_i915_gem_request *req)
+{
+	return to_request(fence_get_rcu(&req->fence));
+}
+
 static inline void
 i915_gem_request_put(struct drm_i915_gem_request *req)
 {
@@ -242,7 +248,23 @@ i915_gem_request_mark_active(struct drm_i915_gem_request *request,
 			     struct i915_gem_active *active)
 {
 	list_move(&active->link, &request->active_list);
-	active->request = request;
+	rcu_assign_pointer(active->request, request);
+}
+
+static inline struct drm_i915_gem_request *
+i915_gem_active_get_request_rcu(struct i915_gem_active *active)
+{
+	do {
+		struct drm_i915_gem_request *request;
+
+		request = rcu_dereference(active->request);
+		if (request == NULL)
+			return NULL;
+
+		request = i915_gem_request_get_rcu(request);
+		if (request)
+			return request;
+	} while (1);
 }
 
 #endif /* I915_GEM_REQUEST_H */
diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
index 4d44def8fb03..ee689c373a91 100644
--- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
@@ -171,6 +171,8 @@ i915_gem_shrink(struct drm_i915_private *dev_priv,
 	}
 
 	i915_gem_retire_requests(dev_priv->dev);
+	/* expedite the RCU grace period to free some request slabs */
+	synchronize_rcu_expedited();
 
 	return count;
 }
@@ -191,10 +193,15 @@ i915_gem_shrink(struct drm_i915_private *dev_priv,
  */
 unsigned long i915_gem_shrink_all(struct drm_i915_private *dev_priv)
 {
-	return i915_gem_shrink(dev_priv, -1UL,
-			       I915_SHRINK_BOUND |
-			       I915_SHRINK_UNBOUND |
-			       I915_SHRINK_ACTIVE);
+	unsigned long freed;
+
+	freed = i915_gem_shrink(dev_priv, -1UL,
+				I915_SHRINK_BOUND |
+				I915_SHRINK_UNBOUND |
+				I915_SHRINK_ACTIVE);
+	rcu_barrier(); /* wait until our RCU delayed slab frees are completed */
+
+	return freed;
 }
 
 static bool i915_gem_shrinker_lock(struct drm_device *dev, bool *unlock)
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 114/190] drm/i915: Remove (struct_mutex) locking for wait-ioctl
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
                     ` (25 preceding siblings ...)
  2016-01-11 10:44   ` [PATCH 113/190] drm/i915: Enable lockless lookup of request tracking via RCU Chris Wilson
@ 2016-01-11 10:44   ` Chris Wilson
  2016-01-11 10:44   ` [PATCH 115/190] drm/i915: Remove (struct_mutex) locking for busy-ioctl Chris Wilson
                     ` (26 subsequent siblings)
  53 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:44 UTC (permalink / raw)
  To: intel-gfx

With a bit of care (and leniency) we can iterate over the object and
wait for previous rendering to complete with judicious use of atomic
reference counting. The ABI requires us to ensure that an active object
is eventually flushed (like the busy-ioctl), which is guaranteed by our
management of requests (i.e. everything that is submitted to hardware is
flushed in the same request). All we have to do is ensure that we can
detect when the requests are complete for reporting when the object is
idle (without triggering ETIME) - this is handled by
__i915_wait_request.

The biggest danger in the code is walking the object without holding any
locks. We iterate over the set of last requests and carefully grab a
reference upon it. (If it is changing beneath us, that is the usual
userspace race and even with locking you get the same indeterminate
results.) If the request is unreferenced beneath us, it will be disposed
of into the request cache - so we have to carefully order the retrieval
of the request pointer with its removal, and to do this we employ RCU on
the request cache and upon the last_request pointer tracking.

The impact of this is actually quite small - the return to userspace
following the wait was already lockless. What we achieve here is
completing an already finished wait without hitting the struct_mutex;
our hold is quite short, so we are typically just a victim of
contention rather than a cause.
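
The wait loop below follows a simple shape, sketched here in compilable
(stubbed) userspace form; the stubbed rcu_read_lock()/rcu_read_unlock()
and the toy_* names are illustrative, not the kernel API. The key point
is that waiting can sleep, so the RCU read section is exited around the
wait while the acquired reference keeps the request alive:

#include <stddef.h>

#define rcu_read_lock()   do { } while (0)	/* stub */
#define rcu_read_unlock() do { } while (0)	/* stub */
#define NUM_ENGINES 4

struct toy_request { int dummy; };
struct toy_obj { struct toy_request *last_read[NUM_ENGINES]; };

/* stands in for the RCU + reference-count dance described above */
static struct toy_request *toy_active_get(struct toy_request **slot)
{
	return *slot;
}
static void toy_request_put(struct toy_request *req) { (void)req; }
static void toy_wait(struct toy_request *req) { (void)req; } /* may sleep */

static int toy_wait_ioctl(struct toy_obj *obj)
{
	int i;

	rcu_read_lock();
	for (i = 0; i < NUM_ENGINES; i++) {
		struct toy_request *req =
			toy_active_get(&obj->last_read[i]);

		if (req == NULL)
			continue;

		rcu_read_unlock();	/* never sleep in an RCU section */
		toy_wait(req);
		toy_request_put(req);
		rcu_read_lock();
	}
	rcu_read_unlock();
	return 0;
}

int main(void)
{
	struct toy_obj obj = { { NULL } };

	return toy_wait_ioctl(&obj);
}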

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 52 +++++++++++++++--------------------------
 1 file changed, 19 insertions(+), 33 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index ee715558ecea..f30207596ec6 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2440,54 +2440,40 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 {
 	struct drm_i915_gem_wait *args = data;
 	struct drm_i915_gem_object *obj;
-	struct drm_i915_gem_request *req[I915_NUM_RINGS];
-	int i, n = 0;
-	int ret;
+	int i, ret = 0;
 
 	if (args->flags != 0)
 		return -EINVAL;
 
-	ret = i915_mutex_lock_interruptible(dev);
-	if (ret)
-		return ret;
-
 	obj = to_intel_bo(drm_gem_object_lookup(dev, file, args->bo_handle));
-	if (&obj->base == NULL) {
-		mutex_unlock(&dev->struct_mutex);
+	if (&obj->base == NULL)
 		return -ENOENT;
-	}
 
-	/* Need to make sure the object gets inactive eventually. */
-	i915_gem_object_flush_active(obj);
-	if (!i915_gem_object_is_active(obj))
+	if (!__I915_BO_ACTIVE(obj))
 		goto out;
 
-	/* Do this after OLR check to make sure we make forward progress polling
-	 * on this IOCTL with a timeout == 0 (like busy ioctl)
-	 */
-	if (args->timeout_ns == 0) {
-		ret = -ETIME;
-		goto out;
-	}
-
+	rcu_read_lock();
 	for (i = 0; i < I915_NUM_RINGS; i++) {
-		if (obj->last_read[i].request == NULL)
+		struct drm_i915_gem_request *req;
+
+		req = i915_gem_active_get_request_rcu(&obj->last_read[i]);
+		if (req == NULL)
 			continue;
 
-		req[n++] = i915_gem_request_get(obj->last_read[i].request);
+		rcu_read_unlock();
+		ret = __i915_wait_request(req, true,
+					  args->timeout_ns >= 0 ? &args->timeout_ns : NULL,
+					  to_rps_client(file));
+		i915_gem_request_put(req);
+		if (ret)
+			goto out;
+
+		rcu_read_lock();
 	}
+	rcu_read_unlock();
 
 out:
-	drm_gem_object_unreference(&obj->base);
-	mutex_unlock(&dev->struct_mutex);
-
-	for (i = 0; i < n; i++) {
-		if (ret == 0)
-			ret = __i915_wait_request(req[i], true,
-						  args->timeout_ns > 0 ? &args->timeout_ns : NULL,
-						  to_rps_client(file));
-		i915_gem_request_put(req[i]);
-	}
+	drm_gem_object_unreference_unlocked(&obj->base);
 	return ret;
 }
 
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 115/190] drm/i915: Remove (struct_mutex) locking for busy-ioctl
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
                     ` (26 preceding siblings ...)
  2016-01-11 10:44   ` [PATCH 114/190] drm/i915: Remove (struct_mutex) locking for wait-ioctl Chris Wilson
@ 2016-01-11 10:44   ` Chris Wilson
  2016-01-11 10:45   ` [PATCH 116/190] drm/i915: Reduce locking inside swfinish ioctl Chris Wilson
                     ` (25 subsequent siblings)
  53 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:44 UTC (permalink / raw)
  To: intel-gfx

By applying the same logic as for wait-ioctl, we can query whether a
request has completed without holding struct_mutex. The biggest impact
system-wide is removing the flush_active and the contention it causes.
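
A hedged sketch of how userspace might decode the busy word as built by
the code below (the toy_* names are illustrative): the high 16 bits
carry a per-engine read-busy mask, and the low 16 bits a one-bit mask
for the last writer:

#include <stdio.h>

static void toy_decode_busy(unsigned int busy)
{
	unsigned int reads = busy >> 16;	/* per-engine read mask */
	unsigned int write = busy & 0xffff;	/* last writer, as a mask */
	int i;

	if (busy == 0) {
		printf("idle\n");
		return;
	}
	for (i = 0; i < 16; i++)
		if (reads & (1u << i))
			printf("read busy on engine %d\n", i);
	for (i = 0; i < 16; i++)
		if (write & (1u << i))
			printf("write busy on engine %d\n", i);
}

int main(void)
{
	toy_decode_busy((1u << 16) | (1u << 0));	/* engine 0 R+W */
	return 0;
}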

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 51 ++++++++++++++++++++++-------------------
 1 file changed, 28 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index f30207596ec6..95d4d2460f6a 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3559,34 +3559,39 @@ i915_gem_busy_ioctl(struct drm_device *dev, void *data,
 {
 	struct drm_i915_gem_busy *args = data;
 	struct drm_i915_gem_object *obj;
-	int ret;
-
-	ret = i915_mutex_lock_interruptible(dev);
-	if (ret)
-		return ret;
 
 	obj = to_intel_bo(drm_gem_object_lookup(dev, file, args->handle));
-	if (&obj->base == NULL) {
-		ret = -ENOENT;
-		goto unlock;
-	}
+	if (&obj->base == NULL)
+		return -ENOENT;
 
-	/* Count all active objects as busy, even if they are currently not used
-	 * by the gpu. Users of this interface expect objects to eventually
-	 * become non-busy without any further actions, therefore emit any
-	 * necessary flushes here.
-	 */
-	i915_gem_object_flush_active(obj);
+	args->busy = 0;
+	if (__I915_BO_ACTIVE(obj)) {
+		struct drm_i915_gem_request *req;
+		int i;
 
-	BUILD_BUG_ON(I915_NUM_RINGS > 16);
-	args->busy = I915_BO_ACTIVE(obj) << 16;
-	if (obj->last_write.request)
-		args->busy |= obj->last_write.request->engine->id;
+		BUILD_BUG_ON(I915_NUM_RINGS > 16);
+		rcu_read_lock();
+		for (i = 0; i < I915_NUM_RINGS; i++) {
+			req = i915_gem_active_get_request_rcu(&obj->last_read[i]);
+			if (req == NULL)
+				continue;
 
-	drm_gem_object_unreference(&obj->base);
-unlock:
-	mutex_unlock(&dev->struct_mutex);
-	return ret;
+			if (!i915_gem_request_completed(req))
+				args->busy |= 1 << (16 + i);
+			i915_gem_request_put(req);
+		}
+
+		req = i915_gem_active_get_request_rcu(&obj->last_write);
+		if (req) {
+			if (!i915_gem_request_completed(req))
+				args->busy |= 1 << req->engine->id;
+			i915_gem_request_put(req);
+		}
+		rcu_read_unlock();
+	}
+
+	drm_gem_object_unreference_unlocked(&obj->base);
+	return 0;
 }
 
 int
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 116/190] drm/i915: Reduce locking inside swfinish ioctl
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
                     ` (27 preceding siblings ...)
  2016-01-11 10:44   ` [PATCH 115/190] drm/i915: Remove (struct_mutex) locking for busy-ioctl Chris Wilson
@ 2016-01-11 10:45   ` Chris Wilson
  2016-01-11 10:45   ` [PATCH 117/190] drm/i915: Remove pinned check from madvise ioctl Chris Wilson
                     ` (24 subsequent siblings)
  53 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:45 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter

We only need to take the struct_mutex if the object is pinned to the
display engine and so requires checking for clflush. (The race with
userspace pinning the object to a framebuffer is irrelevant.)

v2: Use ACCESS_ONCE for compiler hints (or not, as it is a bitfield)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_gem.c | 29 ++++++++++++++++-------------
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 95d4d2460f6a..f87e558a7233 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1282,25 +1282,28 @@ i915_gem_sw_finish_ioctl(struct drm_device *dev, void *data,
 {
 	struct drm_i915_gem_sw_finish *args = data;
 	struct drm_i915_gem_object *obj;
-	int ret = 0;
-
-	ret = i915_mutex_lock_interruptible(dev);
-	if (ret)
-		return ret;
+	int ret;
 
 	obj = to_intel_bo(drm_gem_object_lookup(dev, file, args->handle));
-	if (&obj->base == NULL) {
-		ret = -ENOENT;
-		goto unlock;
-	}
+	if (&obj->base == NULL)
+		return -ENOENT;
 
 	/* Pinned buffers may be scanout, so flush the cache */
-	if (obj->pin_display)
+	if (obj->pin_display) {
+		ret = i915_mutex_lock_interruptible(dev);
+		if (ret)
+			goto unref;
+
 		i915_gem_object_flush_cpu_write_domain(obj);
 
-	drm_gem_object_unreference(&obj->base);
-unlock:
-	mutex_unlock(&dev->struct_mutex);
+		drm_gem_object_unreference(&obj->base);
+		mutex_unlock(&dev->struct_mutex);
+	} else {
+		ret = 0;
+unref:
+		drm_gem_object_unreference_unlocked(&obj->base);
+	}
+
 	return ret;
 }
 
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 117/190] drm/i915: Remove pinned check from madvise ioctl
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
                     ` (28 preceding siblings ...)
  2016-01-11 10:45   ` [PATCH 116/190] drm/i915: Reduce locking inside swfinish ioctl Chris Wilson
@ 2016-01-11 10:45   ` Chris Wilson
  2016-01-11 10:45   ` [PATCH 118/190] drm/i915: Remove locking for get_tiling Chris Wilson
                     ` (23 subsequent siblings)
  53 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:45 UTC (permalink / raw)
  To: intel-gfx

We don't need to incur the overhead of checking whether the object is
pinned prior to changing its madvise. If the object is pinned, the
madvise will not take effect until it is unpinned and so we cannot free
the pages being pointed at by hardware. Marking a pinned object with
allocated pages as DONTNEED will not trigger any undue warnings. The check
is therefore superfluous, and by removing it we can remove a linear walk
over all the VMAs the object has.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index f87e558a7233..24e6e4773ac8 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3631,11 +3631,6 @@ i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
 		goto unlock;
 	}
 
-	if (i915_gem_obj_is_pinned(obj)) {
-		ret = -EINVAL;
-		goto out;
-	}
-
 	if (obj->pages &&
 	    obj->tiling_mode != I915_TILING_NONE &&
 	    dev_priv->quirks & QUIRK_PIN_SWIZZLED_PAGES) {
@@ -3654,7 +3649,6 @@ i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
 
 	args->retained = obj->madv != __I915_MADV_PURGED;
 
-out:
 	drm_gem_object_unreference(&obj->base);
 unlock:
 	mutex_unlock(&dev->struct_mutex);
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 118/190] drm/i915: Remove locking for get_tiling
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
                     ` (29 preceding siblings ...)
  2016-01-11 10:45   ` [PATCH 117/190] drm/i915: Remove pinned check from madvise ioctl Chris Wilson
@ 2016-01-11 10:45   ` Chris Wilson
  2016-01-11 10:45   ` [PATCH 119/190] drm/i915: Reduce amount of duplicate buffer information captured on error Chris Wilson
                     ` (22 subsequent siblings)
  53 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:45 UTC (permalink / raw)
  To: intel-gfx

Since we are not concerned with userspace racing itself with set-tiling
(the order is indeterminate even if we take a lock), we can safely
read back the single obj->tiling_mode and do the static lookup of
swizzle mode without having to take a lock.

get-tiling is reasonably frequent due to the back-channel passing around
of tiling parameters in DRI2/DRI3.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_tiling.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_tiling.c b/drivers/gpu/drm/i915/i915_gem_tiling.c
index c7588135a82d..387246f19ce2 100644
--- a/drivers/gpu/drm/i915/i915_gem_tiling.c
+++ b/drivers/gpu/drm/i915/i915_gem_tiling.c
@@ -301,10 +301,8 @@ i915_gem_get_tiling(struct drm_device *dev, void *data,
 	if (&obj->base == NULL)
 		return -ENOENT;
 
-	mutex_lock(&dev->struct_mutex);
-
 	args->tiling_mode = obj->tiling_mode;
-	switch (obj->tiling_mode) {
+	switch (args->tiling_mode) {
 	case I915_TILING_X:
 		args->swizzle_mode = dev_priv->mm.bit_6_swizzle_x;
 		break;
@@ -328,8 +326,6 @@ i915_gem_get_tiling(struct drm_device *dev, void *data,
 	if (args->swizzle_mode == I915_BIT_6_SWIZZLE_9_10_17)
 		args->swizzle_mode = I915_BIT_6_SWIZZLE_9_10;
 
-	drm_gem_object_unreference(&obj->base);
-	mutex_unlock(&dev->struct_mutex);
-
+	drm_gem_object_unreference_unlocked(&obj->base);
 	return 0;
 }
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 119/190] drm/i915: Reduce amount of duplicate buffer information captured on error
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
                     ` (30 preceding siblings ...)
  2016-01-11 10:45   ` [PATCH 118/190] drm/i915: Remove locking for get_tiling Chris Wilson
@ 2016-01-11 10:45   ` Chris Wilson
  2016-01-11 10:45   ` [PATCH 120/190] drm/i915: Stop the machine whilst capturing the GPU crash dump Chris Wilson
                     ` (21 subsequent siblings)
  53 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:45 UTC (permalink / raw)
  To: intel-gfx

When capturing the error state, we do not need to know about every
address space - just those that are related to the error. We know which
context is active at the time, and therefore which VMs are implicated
in the error. We can then restrict the VMs we report to the
relevant subset.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h       |   9 +-
 drivers/gpu/drm/i915/i915_gpu_error.c | 197 ++++++++++++++--------------------
 2 files changed, 87 insertions(+), 119 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 1ecff535973e..f1447dbb3c55 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -519,6 +519,7 @@ struct drm_i915_error_state {
 		bool waiting;
 		int hangcheck_score;
 		enum intel_engine_hangcheck_action hangcheck_action;
+		struct i915_address_space *vm;
 		int num_requests;
 
 		/* our own tracking of ring head and tail */
@@ -579,17 +580,15 @@ struct drm_i915_error_state {
 		u32 read_domains;
 		u32 write_domain;
 		s32 fence_reg:I915_MAX_NUM_FENCE_BITS;
-		s32 pinned:2;
 		u32 tiling:2;
 		u32 dirty:1;
 		u32 purgeable:1;
 		u32 userptr:1;
 		s32 ring:4;
 		u32 cache_level:3;
-	} **active_bo, **pinned_bo;
-
-	u32 *active_bo_count, *pinned_bo_count;
-	u32 vm_count;
+	} *active_bo[I915_NUM_RINGS], *pinned_bo;
+	u32 active_bo_count[I915_NUM_RINGS], pinned_bo_count;
+	struct i915_address_space *active_vm[I915_NUM_RINGS];
 };
 
 struct intel_connector;
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 706d956b6eb3..98d0a6a53cc7 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -42,16 +42,6 @@ static const char *ring_str(int ring)
 	}
 }
 
-static const char *pin_flag(int pinned)
-{
-	if (pinned > 0)
-		return " P";
-	else if (pinned < 0)
-		return " p";
-	else
-		return "";
-}
-
 static const char *tiling_flag(int tiling)
 {
 	switch (tiling) {
@@ -189,7 +179,7 @@ static void print_error_buffers(struct drm_i915_error_state_buf *m,
 {
 	int i;
 
-	err_printf(m, "  %s [%d]:\n", name, count);
+	err_printf(m, "%s [%d]:\n", name, count);
 
 	while (count--) {
 		err_printf(m, "    %08x_%08x %8u %02x %02x [ ",
@@ -202,7 +192,6 @@ static void print_error_buffers(struct drm_i915_error_state_buf *m,
 			err_printf(m, "%02x ", err->rseqno[i]);
 
 		err_printf(m, "] %02x", err->wseqno);
-		err_puts(m, pin_flag(err->pinned));
 		err_puts(m, tiling_flag(err->tiling));
 		err_puts(m, dirty_flag(err->dirty));
 		err_puts(m, purgeable_flag(err->purgeable));
@@ -414,18 +403,25 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
 	for (i = 0; i < ARRAY_SIZE(error->ring); i++)
 		i915_ring_error_state(m, dev, error, i);
 
-	for (i = 0; i < error->vm_count; i++) {
-		err_printf(m, "vm[%d]\n", i);
+	for (i = 0; i < I915_NUM_RINGS; i++) {
+		if (error->active_vm[i] == NULL)
+			break;
 
-		print_error_buffers(m, "Active",
+		err_printf(m, "Active vm[%d]\n", i);
+		for (j = 0; j < I915_NUM_RINGS; j++) {
+			if (error->ring[j].vm == error->active_vm[i])
+				err_printf(m, "    %s\n",
+					   dev_priv->ring[j].name);
+		}
+		print_error_buffers(m, "  Buffers",
 				    error->active_bo[i],
 				    error->active_bo_count[i]);
-
-		print_error_buffers(m, "Pinned",
-				    error->pinned_bo[i],
-				    error->pinned_bo_count[i]);
 	}
 
+	print_error_buffers(m, "Pinned (global)",
+			    error->pinned_bo,
+			    error->pinned_bo_count);
+
 	for (i = 0; i < ARRAY_SIZE(error->ring); i++) {
 		obj = error->ring[i].batchbuffer;
 		if (obj) {
@@ -585,13 +581,10 @@ static void i915_error_state_free(struct kref *error_ref)
 
 	i915_error_object_free(error->semaphore_obj);
 
-	for (i = 0; i < error->vm_count; i++)
+	for (i = 0; i < ARRAY_SIZE(error->active_bo); i++)
 		kfree(error->active_bo[i]);
-
-	kfree(error->active_bo);
-	kfree(error->active_bo_count);
 	kfree(error->pinned_bo);
-	kfree(error->pinned_bo_count);
+
 	kfree(error->overlay);
 	kfree(error->display);
 	kfree(error);
@@ -714,9 +707,6 @@ static void capture_bo(struct drm_i915_error_buffer *err,
 	err->read_domains = obj->base.read_domains;
 	err->write_domain = obj->base.write_domain;
 	err->fence_reg = obj->fence_reg;
-	err->pinned = 0;
-	if (i915_gem_obj_is_pinned(obj))
-		err->pinned = 1;
 	err->tiling = obj->tiling_mode;
 	err->dirty = obj->dirty;
 	err->purgeable = obj->madv != I915_MADV_WILLNEED;
@@ -725,13 +715,17 @@ static void capture_bo(struct drm_i915_error_buffer *err,
 	err->cache_level = obj->cache_level;
 }
 
-static u32 capture_active_bo(struct drm_i915_error_buffer *err,
-			     int count, struct list_head *head)
+static u32 capture_error_bo(struct drm_i915_error_buffer *err,
+			    int count, struct list_head *head,
+			    bool pinned_only)
 {
 	struct i915_vma *vma;
 	int i = 0;
 
 	list_for_each_entry(vma, head, vm_link) {
+		if (pinned_only && !vma->pin_count)
+			continue;
+
 		capture_bo(err++, vma);
 		if (++i == count)
 			break;
@@ -740,28 +734,6 @@ static u32 capture_active_bo(struct drm_i915_error_buffer *err,
 	return i;
 }
 
-static u32 capture_pinned_bo(struct drm_i915_error_buffer *err,
-			     int count, struct list_head *head,
-			     struct i915_address_space *vm)
-{
-	struct drm_i915_gem_object *obj;
-	struct drm_i915_error_buffer * const first = err;
-	struct drm_i915_error_buffer * const last = err + count;
-
-	list_for_each_entry(obj, head, global_list) {
-		struct i915_vma *vma;
-
-		if (err == last)
-			break;
-
-		list_for_each_entry(vma, &obj->vma_list, obj_link)
-			if (vma->vm == vm && vma->pin_count > 0)
-				capture_bo(err++, vma);
-	}
-
-	return err - first;
-}
-
 /* Generate a semi-unique error code. The code is not meant to have meaning. The
  * code's only purpose is to try to prevent false duplicated bug reports by
  * grossly estimating a GPU error state.
@@ -1009,9 +981,10 @@ static void i915_gem_record_rings(struct drm_device *dev,
 			struct i915_address_space *vm;
 			struct intel_ring *ring;
 
-			vm = request->ctx && request->ctx->ppgtt ?
+			vm = request->ctx->ppgtt ?
 				&request->ctx->ppgtt->base :
 				&dev_priv->gtt.base;
+			error->ring[i].vm = vm;
 
 			/* We need to copy these to an anonymous buffer
 			 * as the simplest method to avoid being overwritten
@@ -1099,89 +1072,83 @@ static void i915_gem_record_rings(struct drm_device *dev,
 	}
 }
 
-/* FIXME: Since pin count/bound list is global, we duplicate what we capture per
- * VM.
- */
 static void i915_gem_capture_vm(struct drm_i915_private *dev_priv,
 				struct drm_i915_error_state *error,
 				struct i915_address_space *vm,
 				const int ndx)
 {
-	struct drm_i915_error_buffer *active_bo = NULL, *pinned_bo = NULL;
-	struct drm_i915_gem_object *obj;
+	struct drm_i915_error_buffer *active_bo;
 	struct i915_vma *vma;
 	int i;
 
 	i = 0;
 	list_for_each_entry(vma, &vm->active_list, vm_link)
 		i++;
-	error->active_bo_count[ndx] = i;
 
-	list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list) {
-		list_for_each_entry(vma, &obj->vma_list, obj_link)
-			if (vma->vm == vm && vma->pin_count > 0)
-				i++;
-	}
-	error->pinned_bo_count[ndx] = i - error->active_bo_count[ndx];
-
-	if (i) {
+	active_bo = NULL;
+	if (i)
 		active_bo = kcalloc(i, sizeof(*active_bo), GFP_ATOMIC);
-		if (active_bo)
-			pinned_bo = active_bo + error->active_bo_count[ndx];
-	}
-
 	if (active_bo)
-		error->active_bo_count[ndx] =
-			capture_active_bo(active_bo,
-					  error->active_bo_count[ndx],
-					  &vm->active_list);
-
-	if (pinned_bo)
-		error->pinned_bo_count[ndx] =
-			capture_pinned_bo(pinned_bo,
-					  error->pinned_bo_count[ndx],
-					  &dev_priv->mm.bound_list, vm);
+		i = capture_error_bo(active_bo, i, &vm->active_list, false);
+	else
+		i = 0;
+
 	error->active_bo[ndx] = active_bo;
-	error->pinned_bo[ndx] = pinned_bo;
+	error->active_bo_count[ndx] = i;
+	error->active_vm[ndx] = vm;
 }
 
-static void i915_gem_capture_buffers(struct drm_i915_private *dev_priv,
-				     struct drm_i915_error_state *error)
+static void i915_capture_active_buffers(struct drm_i915_private *dev_priv,
+					struct drm_i915_error_state *error)
 {
-	struct i915_address_space *vm;
-	int cnt = 0, i = 0;
-
-	list_for_each_entry(vm, &dev_priv->vm_list, global_link)
-		cnt++;
-
-	error->active_bo = kcalloc(cnt, sizeof(*error->active_bo), GFP_ATOMIC);
-	error->pinned_bo = kcalloc(cnt, sizeof(*error->pinned_bo), GFP_ATOMIC);
-	error->active_bo_count = kcalloc(cnt, sizeof(*error->active_bo_count),
-					 GFP_ATOMIC);
-	error->pinned_bo_count = kcalloc(cnt, sizeof(*error->pinned_bo_count),
-					 GFP_ATOMIC);
-
-	if (error->active_bo == NULL ||
-	    error->pinned_bo == NULL ||
-	    error->active_bo_count == NULL ||
-	    error->pinned_bo_count == NULL) {
-		kfree(error->active_bo);
-		kfree(error->active_bo_count);
-		kfree(error->pinned_bo);
-		kfree(error->pinned_bo_count);
-
-		error->active_bo = NULL;
-		error->active_bo_count = NULL;
-		error->pinned_bo = NULL;
-		error->pinned_bo_count = NULL;
-	} else {
-		list_for_each_entry(vm, &dev_priv->vm_list, global_link)
-			i915_gem_capture_vm(dev_priv, error, vm, i++);
+	int cnt = 0, i, j;
+
+	BUILD_BUG_ON(ARRAY_SIZE(error->ring) > ARRAY_SIZE(error->active_bo));
+	BUILD_BUG_ON(ARRAY_SIZE(error->active_bo) != ARRAY_SIZE(error->active_vm));
+	BUILD_BUG_ON(ARRAY_SIZE(error->active_bo) != ARRAY_SIZE(error->active_bo_count));
+
+	for (i = 0; i < I915_NUM_RINGS; i++) {
+		if (error->ring[i].vm == NULL)
+			continue;
+
+		for (j = 0; j < i; j++)
+			if (error->ring[j].vm == error->ring[i].vm)
+				break;
+		if (j != i)
+			continue;
 
-		error->vm_count = cnt;
+		i915_gem_capture_vm(dev_priv, error, error->ring[i].vm, cnt++);
 	}
 }
 
+static void i915_capture_pinned_buffers(struct drm_i915_private *dev_priv,
+					struct drm_i915_error_state *error)
+{
+	struct i915_address_space *vm = &dev_priv->gtt.base;
+	struct drm_i915_error_buffer *bo;
+	struct i915_vma *vma;
+	int i, j;
+
+	i = 0;
+	list_for_each_entry(vma, &vm->active_list, vm_link)
+		i++;
+
+	j = 0;
+	list_for_each_entry(vma, &vm->inactive_list, vm_link)
+		j++;
+
+	bo = NULL;
+	if (i + j)
+		bo = kcalloc(i + j, sizeof(*bo), GFP_ATOMIC);
+	if (bo == NULL)
+		return;
+
+	i = capture_error_bo(bo, i, &vm->active_list, true);
+	j = capture_error_bo(bo + i, j, &vm->inactive_list, true);
+	error->pinned_bo_count = i + j;
+	error->pinned_bo = bo;
+}
+
 /* Capture all registers which don't fit into another category. */
 static void i915_capture_reg_state(struct drm_i915_private *dev_priv,
 				   struct drm_i915_error_state *error)
@@ -1326,10 +1293,12 @@ void i915_capture_error_state(struct drm_device *dev, bool wedged,
 
 	i915_capture_gen_state(dev_priv, error);
 	i915_capture_reg_state(dev_priv, error);
-	i915_gem_capture_buffers(dev_priv, error);
 	i915_gem_record_fences(dev, error);
 	i915_gem_record_rings(dev, error);
 
+	i915_capture_active_buffers(dev_priv, error);
+	i915_capture_pinned_buffers(dev_priv, error);
+
 	do_gettimeofday(&error->time);
 
 	error->overlay = intel_overlay_capture_error_state(dev);
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 120/190] drm/i915: Stop the machine whilst capturing the GPU crash dump
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
                     ` (31 preceding siblings ...)
  2016-01-11 10:45   ` [PATCH 119/190] drm/i915: Reduce amount of duplicate buffer information captured on error Chris Wilson
@ 2016-01-11 10:45   ` Chris Wilson
  2016-01-11 10:45   ` [PATCH 121/190] drm/i915: Scan GGTT active list for context object Chris Wilson
                     ` (20 subsequent siblings)
  53 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:45 UTC (permalink / raw)
  To: intel-gfx

The error state is purposefully racy as we expect it to be called at any
time and so have avoided any locking whilst capturing the crash dump.
However, with multi-engine GPUs and multiple CPUs, those races can
manifest into OOPSes as we attempt to chase dangling pointers freed on
other CPUs. Under discussion are lots of ways to slow down normal
operation in order to protect the post-mortem error capture, but what if
we take the opposite approach and freeze the machine whilst the error
capture runs? (Note the GPU may still be running, but as long as we
don't process any of the results, the driver's bookkeeping will be
static.)

Note that by itself, this is not a complete fix. It also depends on
the compiler barriers in list_add/list_del to prevent traversing the
lists into the void.
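
As a sketch of the resulting shape (taken from the diff below), the
whole capture body becomes the stop_machine() callback, so every other
CPU is parked with interrupts disabled while we walk the lists:

    static int capture(void *data)
    {
            struct drm_i915_error_state *error = data;

            /* all list walking happens here, with the machine frozen */
            i915_capture_active_buffers(error->i915, error);
            i915_capture_pinned_buffers(error->i915, error);

            return 0;
    }

    stop_machine(capture, error, NULL);

The NULL cpumask asks stop_machine() to run capture() on any one online
CPU; hence the new select STOP_MACHINE in Kconfig.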

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/Kconfig          |  1 +
 drivers/gpu/drm/i915/i915_drv.h       |  2 ++
 drivers/gpu/drm/i915/i915_gpu_error.c | 35 +++++++++++++++++++++++------------
 3 files changed, 26 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
index 33e8563c2f99..17841b36b1df 100644
--- a/drivers/gpu/drm/i915/Kconfig
+++ b/drivers/gpu/drm/i915/Kconfig
@@ -6,6 +6,7 @@ config DRM_I915
 	select INTEL_GTT
 	select AGP_INTEL if AGP
 	select INTERVAL_TREE
+	select STOP_MACHINE
 	# we need shmfs for the swappable backing store, and in particular
 	# the shmem_readpage() which depends upon tmpfs
 	select SHMEM
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index f1447dbb3c55..89da35105a33 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -484,6 +484,8 @@ struct drm_i915_error_state {
 	struct kref ref;
 	struct timeval time;
 
+	struct drm_i915_private *i915;
+
 	char error_msg[128];
 	bool simulated;
 	int iommu;
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 98d0a6a53cc7..a3090d7ac20a 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -28,6 +28,7 @@
  */
 
 #include <generated/utsrelease.h>
+#include <linux/stop_machine.h>
 #include "i915_drv.h"
 
 static const char *ring_str(int ring)
@@ -1262,6 +1263,26 @@ static void i915_capture_gen_state(struct drm_i915_private *dev_priv,
 	error->suspend_count = dev_priv->suspend_count;
 }
 
+static int capture(void *data)
+{
+	struct drm_i915_error_state *error = data;
+
+	i915_capture_gen_state(error->i915, error);
+	i915_capture_reg_state(error->i915, error);
+	i915_gem_record_fences(error->i915->dev, error);
+	i915_gem_record_rings(error->i915->dev, error);
+
+	i915_capture_active_buffers(error->i915, error);
+	i915_capture_pinned_buffers(error->i915, error);
+
+	do_gettimeofday(&error->time);
+
+	error->overlay = intel_overlay_capture_error_state(error->i915->dev);
+	error->display = intel_display_capture_error_state(error->i915->dev);
+
+	return 0;
+}
+
 /**
  * i915_capture_error_state - capture an error record for later analysis
  * @dev: drm device
@@ -1290,19 +1311,9 @@ void i915_capture_error_state(struct drm_device *dev, bool wedged,
 	}
 
 	kref_init(&error->ref);
+	error->i915 = dev_priv;
 
-	i915_capture_gen_state(dev_priv, error);
-	i915_capture_reg_state(dev_priv, error);
-	i915_gem_record_fences(dev, error);
-	i915_gem_record_rings(dev, error);
-
-	i915_capture_active_buffers(dev_priv, error);
-	i915_capture_pinned_buffers(dev_priv, error);
-
-	do_gettimeofday(&error->time);
-
-	error->overlay = intel_overlay_capture_error_state(dev);
-	error->display = intel_display_capture_error_state(dev);
+	stop_machine(capture, error, NULL);
 
 	i915_error_capture_msg(dev, error, wedged, error_msg);
 	DRM_INFO("%s\n", error->error_msg);
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 121/190] drm/i915: Scan GGTT active list for context object
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
                     ` (32 preceding siblings ...)
  2016-01-11 10:45   ` [PATCH 120/190] drm/i915: Stop the machine whilst capturing the GPU crash dump Chris Wilson
@ 2016-01-11 10:45   ` Chris Wilson
  2016-01-11 10:45   ` [PATCH 122/190] drm/i915: Move setting of request->batch into its single callsite Chris Wilson
                     ` (19 subsequent siblings)
  53 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:45 UTC (permalink / raw)
  To: intel-gfx

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gpu_error.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index a3090d7ac20a..9a18fc502145 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -940,19 +940,18 @@ static void i915_gem_record_active_context(struct intel_engine_cs *ring,
 					   struct drm_i915_error_state *error,
 					   struct drm_i915_error_ring *ering)
 {
-	struct drm_i915_private *dev_priv = ring->dev->dev_private;
-	struct drm_i915_gem_object *obj;
+	struct drm_i915_private *dev_priv = ring->i915;
+	struct i915_vma *vma;
 
 	/* Currently render ring is the only HW context user */
 	if (ring->id != RCS || !error->ccid)
 		return;
 
-	list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list) {
-		if (!i915_gem_obj_ggtt_bound(obj))
-			continue;
-
-		if ((error->ccid & PAGE_MASK) == i915_gem_obj_ggtt_offset(obj)) {
-			ering->ctx = i915_error_ggtt_object_create(dev_priv, obj);
+	list_for_each_entry(vma, &dev_priv->gtt.base.active_list, vm_link) {
+		if ((error->ccid & PAGE_MASK) == vma->node.start) {
+			ering->ctx = i915_error_object_create(dev_priv,
+							      vma->obj,
+							      vma->vm);
 			break;
 		}
 	}
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 122/190] drm/i915: Move setting of request->batch into its single callsite
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
                     ` (33 preceding siblings ...)
  2016-01-11 10:45   ` [PATCH 121/190] drm/i915: Scan GGTT active list for context object Chris Wilson
@ 2016-01-11 10:45   ` Chris Wilson
  2016-01-11 10:45   ` [PATCH 123/190] drm/i915: Mark unmappable GGTT entries as PIN_HIGH Chris Wilson
                     ` (18 subsequent siblings)
  53 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:45 UTC (permalink / raw)
  To: intel-gfx

request->batch_obj is only set by execbuffer for the convenience of
debugging hangs. By moving that operation to the callsite, we can
simplify all other callers and future patches. We also move the
complications of reference handling for request->batch_obj next to
where the active tracking is set up for the request.
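
As the diff below shows, callers that have no batch are reduced to:

    i915_add_request(req);

while execbuffer, the single site that cares, records the batch itself
just before submission:

    params->request->batch_obj = params->batch_vma->obj;
    /* ... */
    ret = execbuf_submit(params, args, &eb->vmas);
    __i915_add_request(params->request, ret == 0);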

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 10 +++++++++-
 drivers/gpu/drm/i915/i915_gem_request.c    | 12 +-----------
 drivers/gpu/drm/i915/i915_gem_request.h    |  8 +++-----
 3 files changed, 13 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index e66864bdbfb4..a1b6678fb075 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1629,6 +1629,14 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		goto err_batch_unpin;
 	}
 
+	/* Whilst this request exists, batch_obj will be on the
+	 * active_list, and so will hold the active reference. Only when this
+	 * request is retired will the batch_obj be moved onto the
+	 * inactive_list and lose its active reference. Hence we do not need
+	 * to explicitly hold another reference here.
+	 */
+	params->request->batch_obj = params->batch_vma->obj;
+
 	ret = i915_gem_request_add_to_client(params->request, file);
 	if (ret) {
 		i915_gem_request_cancel(params->request);
@@ -1648,7 +1656,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	params->ctx                     = ctx;
 
 	ret = execbuf_submit(params, args, &eb->vmas);
-	__i915_add_request(params->request, params->batch_vma->obj, ret == 0);
+	__i915_add_request(params->request, ret == 0);
 
 err_batch_unpin:
 	/*
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index be24bde2e602..1886048f0acd 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -418,9 +418,7 @@ static void i915_gem_mark_busy(struct drm_i915_private *dev_priv)
  * request is not being tracked for completion but the work itself is
  * going to happen on the hardware. This would be a Bad Thing(tm).
  */
-void __i915_add_request(struct drm_i915_gem_request *request,
-			struct drm_i915_gem_object *obj,
-			bool flush_caches)
+void __i915_add_request(struct drm_i915_gem_request *request, bool flush_caches)
 {
 	struct intel_ring *ring;
 	u32 request_start;
@@ -467,14 +465,6 @@ void __i915_add_request(struct drm_i915_gem_request *request,
 
 	request->head = request_start;
 
-	/* Whilst this request exists, batch_obj will be on the
-	 * active_list, and so will hold the active reference. Only when this
-	 * request is retired will the the batch_obj be moved onto the
-	 * inactive_list and lose its active reference. Hence we do not need
-	 * to explicitly hold another reference here.
-	 */
-	request->batch_obj = obj;
-
 	request->emitted_jiffies = jiffies;
 	request->previous_seqno = request->engine->last_submitted_seqno;
 	request->engine->last_submitted_seqno = request->fence.seqno;
diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index f035db7c97cd..4b38cd731124 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -162,13 +162,11 @@ static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
 	*pdst = src;
 }
 
-void __i915_add_request(struct drm_i915_gem_request *req,
-			struct drm_i915_gem_object *batch_obj,
-			bool flush_caches);
+void __i915_add_request(struct drm_i915_gem_request *req, bool flush_caches);
 #define i915_add_request(req) \
-	__i915_add_request(req, NULL, true)
+	__i915_add_request(req, true)
 #define i915_add_request_no_flush(req) \
-	__i915_add_request(req, NULL, false)
+	__i915_add_request(req, false)
 
 struct intel_rps_client;
 #define NO_WAITBOOST ERR_PTR(-1)
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 123/190] drm/i915: Mark unmappable GGTT entries as PIN_HIGH
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
                     ` (34 preceding siblings ...)
  2016-01-11 10:45   ` [PATCH 122/190] drm/i915: Move setting of request->batch into its single callsite Chris Wilson
@ 2016-01-11 10:45   ` Chris Wilson
  2016-01-11 10:45   ` [PATCH 124/190] drm/i915: Track pinned vma inside guc Chris Wilson
                     ` (17 subsequent siblings)
  53 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:45 UTC (permalink / raw)
  To: intel-gfx

We allocate a few objects into the GGTT that we never need to access via
the mappable aperture (such as contexts, status pages). We can request
that these are bound high in the VM to increase the amount of mappable
aperture available.
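
For example (taken from the diff below), pinning the legacy render
context becomes:

    ret = i915_gem_object_ggtt_pin(ctx->legacy_hw_ctx.rcs_state,
                                   NULL, 0, alignment, PIN_HIGH);

which asks the allocator to place the node top-down, leaving the low,
CPU-mappable range of the GGTT free for buffers that genuinely need
the aperture.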

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_context.c |  4 ++--
 drivers/gpu/drm/i915/intel_lrc.c        |  3 ++-
 drivers/gpu/drm/i915/intel_ringbuffer.c | 13 +++++++++----
 3 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 9250a7405807..c54c17944796 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -389,7 +389,7 @@ int i915_gem_context_init(struct drm_device *dev)
 		 * context.
 		 */
 		ret = i915_gem_object_ggtt_pin(ctx->legacy_hw_ctx.rcs_state,
-					       NULL, 0, alignment, 0);
+					       NULL, 0, alignment, PIN_HIGH);
 		if (ret) {
 			DRM_ERROR("Failed to pinned default global context (error %d)\n",
 				  ret);
@@ -677,7 +677,7 @@ static int do_switch(struct drm_i915_gem_request *req)
 	if (engine->id == RCS) {
 		u32 alignment = get_context_alignment(engine->dev);
 		ret = i915_gem_object_ggtt_pin(to->legacy_hw_ctx.rcs_state,
-					       NULL, 0, alignment, 0);
+					       NULL, 0, alignment, PIN_HIGH);
 		if (ret)
 			return ret;
 	}
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 86fa41770ff1..206311b55e71 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -583,7 +583,8 @@ static int intel_lr_context_pin(struct intel_context *ctx,
 	ctx_obj = ctx->engine[engine->id].state;
 	ret = i915_gem_object_ggtt_pin(ctx_obj, NULL,
 				       0, GEN8_LR_CONTEXT_ALIGN,
-				       PIN_OFFSET_BIAS | GUC_WOPCM_TOP);
+				       PIN_OFFSET_BIAS | GUC_WOPCM_TOP |
+				       PIN_HIGH);
 	if (ret)
 		goto err;
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index ba3631d216fe..6db7f93a3c1d 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -649,7 +649,8 @@ intel_init_pipe_control(struct intel_engine_cs *ring)
 	if (ret)
 		goto err_unref;
 
-	ret = i915_gem_object_ggtt_pin(ring->scratch.obj, NULL, 0, 4096, 0);
+	ret = i915_gem_object_ggtt_pin(ring->scratch.obj, NULL,
+				       0, 4096, PIN_HIGH);
 	if (ret)
 		goto err_unref;
 
@@ -1891,7 +1892,9 @@ int intel_ring_map(struct intel_ring *ring)
 	int ret;
 
 	if (HAS_LLC(ring->engine->i915) && !obj->stolen) {
-		ret = i915_gem_object_ggtt_pin(obj, NULL, 0, PAGE_SIZE, 0);
+		ret = i915_gem_object_ggtt_pin(obj, NULL,
+					       0, PAGE_SIZE,
+					       PIN_HIGH);
 		if (ret)
 			return ret;
 
@@ -1906,7 +1909,8 @@ int intel_ring_map(struct intel_ring *ring)
 			goto unpin;
 		}
 	} else {
-		ret = i915_gem_object_ggtt_pin(obj, NULL, 0, PAGE_SIZE,
+		ret = i915_gem_object_ggtt_pin(obj, NULL,
+					       0, PAGE_SIZE,
 					       PIN_MAPPABLE);
 		if (ret)
 			return ret;
@@ -2505,7 +2509,8 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 			} else {
 				i915_gem_object_set_cache_level(obj, I915_CACHE_LLC);
 				ret = i915_gem_object_ggtt_pin(obj, NULL,
-							       0, 0, 0);
+							       0, 0,
+							       PIN_HIGH);
 				if (ret != 0) {
 					drm_gem_object_unreference(&obj->base);
 					DRM_ERROR("Failed to pin semaphore bo. Disabling semaphores\n");
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 124/190] drm/i915: Track pinned vma inside guc
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
                     ` (35 preceding siblings ...)
  2016-01-11 10:45   ` [PATCH 123/190] drm/i915: Mark unmappable GGTT entries as PIN_HIGH Chris Wilson
@ 2016-01-11 10:45   ` Chris Wilson
  2016-01-11 10:45   ` [PATCH 125/190] drm/i915: Track pinned VMA Chris Wilson
                     ` (16 subsequent siblings)
  53 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:45 UTC (permalink / raw)
  To: intel-gfx

Since the guc allocates and pins an object into the GGTT for its own use,
it is more natural to use that pinned VMA as our resource cookie.
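
A sketch of the resulting pattern (taken from the diff below):

    vma = guc_allocate_vma(dev, size);  /* allocates + pins into GGTT */
    if (IS_ERR(vma))
            return PTR_ERR(vma);
    guc->ctx_pool = vma;                /* the vma is the cookie we keep */

    /* GGTT addresses then come straight off the cookie */
    offset = vma->node.start >> PAGE_SHIFT;

Teardown is guc_release_vma(vma), which unpins exactly what was pinned
and drops the object reference.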

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c        |  10 +-
 drivers/gpu/drm/i915/i915_guc_submission.c | 142 ++++++++++++++---------------
 drivers/gpu/drm/i915/intel_guc.h           |   9 +-
 drivers/gpu/drm/i915/intel_guc_loader.c    |   7 +-
 4 files changed, 78 insertions(+), 90 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 6b14c59828e3..d186d256f467 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2512,15 +2512,15 @@ static int i915_guc_log_dump(struct seq_file *m, void *data)
 	struct drm_info_node *node = m->private;
 	struct drm_device *dev = node->minor->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct drm_i915_gem_object *log_obj = dev_priv->guc.log_obj;
-	u32 *log;
+	struct drm_i915_gem_object *obj;
 	int i = 0, pg;
 
-	if (!log_obj)
+	if (dev_priv->guc.log == NULL)
 		return 0;
 
-	for (pg = 0; pg < log_obj->base.size / PAGE_SIZE; pg++) {
-		log = kmap_atomic(i915_gem_object_get_page(log_obj, pg));
+	obj = dev_priv->guc.log->obj;
+	for (pg = 0; pg < obj->base.size / PAGE_SIZE; pg++) {
+		u32 *log = kmap_atomic(i915_gem_object_get_page(obj, pg));
 
 		for (i = 0; i < PAGE_SIZE / sizeof(u32); i += 4)
 			seq_printf(m, "0x%08x 0x%08x 0x%08x 0x%08x\n",
diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index c4d8c34092a9..baa5c34757ba 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -181,7 +181,7 @@ static void guc_init_doorbell(struct intel_guc *guc,
 	struct guc_doorbell_info *doorbell;
 	void *base;
 
-	base = kmap_atomic(i915_gem_object_get_page(client->client_obj, 0));
+	base = kmap_atomic(i915_gem_object_get_page(client->client->obj, 0));
 	doorbell = base + client->doorbell_offset;
 
 	doorbell->db_status = 1;
@@ -198,7 +198,7 @@ static int guc_ring_doorbell(struct i915_guc_client *gc)
 	void *base;
 	int attempt = 2, ret = -EAGAIN;
 
-	base = kmap_atomic(i915_gem_object_get_page(gc->client_obj, 0));
+	base = kmap_atomic(i915_gem_object_get_page(gc->client->obj, 0));
 	desc = base + gc->proc_desc_offset;
 
 	/* Update the tail so it is visible to GuC */
@@ -260,7 +260,7 @@ static void guc_disable_doorbell(struct intel_guc *guc,
 	i915_reg_t drbreg = GEN8_DRBREGL(client->doorbell_id);
 	int value;
 
-	base = kmap_atomic(i915_gem_object_get_page(client->client_obj, 0));
+	base = kmap_atomic(i915_gem_object_get_page(client->client->obj, 0));
 	doorbell = base + client->doorbell_offset;
 
 	doorbell->db_status = 0;
@@ -343,7 +343,7 @@ static void guc_init_proc_desc(struct intel_guc *guc,
 	struct guc_process_desc *desc;
 	void *base;
 
-	base = kmap_atomic(i915_gem_object_get_page(client->client_obj, 0));
+	base = kmap_atomic(i915_gem_object_get_page(client->client->obj, 0));
 	desc = base + client->proc_desc_offset;
 
 	memset(desc, 0, sizeof(*desc));
@@ -432,17 +432,15 @@ static void guc_init_ctx_desc(struct intel_guc *guc,
 	 * XXX: May make debug easier to have it mapped
 	 */
 	desc.db_trigger_cpu = 0;
-	desc.db_trigger_uk = client->doorbell_offset +
-		i915_gem_obj_ggtt_offset(client->client_obj);
+	desc.db_trigger_uk =
+		client->doorbell_offset + client->client->node.start;
 	desc.db_trigger_phy = client->doorbell_offset +
-		sg_dma_address(client->client_obj->pages->sgl);
+		sg_dma_address(client->client->obj->pages->sgl);
 
-	desc.process_desc = client->proc_desc_offset +
-		i915_gem_obj_ggtt_offset(client->client_obj);
-
-	desc.wq_addr = client->wq_offset +
-		i915_gem_obj_ggtt_offset(client->client_obj);
+	desc.process_desc =
+		client->proc_desc_offset + client->client->node.start;
 
+	desc.wq_addr = client->wq_offset + client->client->node.start;
 	desc.wq_size = client->wq_size;
 
 	/*
@@ -452,7 +450,7 @@ static void guc_init_ctx_desc(struct intel_guc *guc,
 	desc.desc_private = (uintptr_t)client;
 
 	/* Pool context is pinned already */
-	sg = guc->ctx_pool_obj->pages;
+	sg = guc->ctx_pool->obj->pages;
 	sg_pcopy_from_buffer(sg->sgl, sg->nents, &desc, sizeof(desc),
 			     sizeof(desc) * client->ctx_index);
 }
@@ -465,7 +463,7 @@ static void guc_fini_ctx_desc(struct intel_guc *guc,
 
 	memset(&desc, 0, sizeof(desc));
 
-	sg = guc->ctx_pool_obj->pages;
+	sg = guc->ctx_pool->obj->pages;
 	sg_pcopy_from_buffer(sg->sgl, sg->nents, &desc, sizeof(desc),
 			     sizeof(desc) * client->ctx_index);
 }
@@ -485,7 +483,7 @@ int i915_guc_wq_check_space(struct i915_guc_client *gc)
 	if (CIRC_SPACE(gc->wq_tail, gc->wq_head, gc->wq_size) >= size)
 		return 0;
 
-	base = kmap_atomic(i915_gem_object_get_page(gc->client_obj, 0));
+	base = kmap_atomic(i915_gem_object_get_page(gc->client->obj, 0));
 	desc = base + gc->proc_desc_offset;
 
 	while (timeout_counter-- > 0) {
@@ -533,7 +531,7 @@ static int guc_add_workqueue_item(struct i915_guc_client *gc,
 	WARN_ON(wq_off & 3);
 
 	/* wq starts from the page after doorbell / process_desc */
-	base = kmap_atomic(i915_gem_object_get_page(gc->client_obj,
+	base = kmap_atomic(i915_gem_object_get_page(gc->client->obj,
 			(wq_off + GUC_DB_SIZE) >> PAGE_SHIFT));
 	wq_off &= PAGE_SIZE - 1;
 	wqi = (struct guc_wq_item *)((char *)base + wq_off);
@@ -601,7 +599,7 @@ int i915_guc_submit(struct i915_guc_client *client,
  */
 
 /**
- * gem_allocate_guc_obj() - Allocate gem object for GuC usage
+ * guc_allocate_vma() - Allocate gem object for GuC usage
  * @dev:	drm device
  * @size:	size of object
  *
@@ -611,46 +609,40 @@ int i915_guc_submit(struct i915_guc_client *client,
  *
  * Return:	A drm_i915_gem_object if successful, otherwise NULL.
  */
-static struct drm_i915_gem_object *gem_allocate_guc_obj(struct drm_device *dev,
-							u32 size)
+static struct i915_vma *guc_allocate_vma(struct drm_device *dev, u32 size)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_i915_gem_object *obj;
+	int ret;
 
 	obj = i915_gem_alloc_object(dev, size);
 	if (!obj)
-		return NULL;
-
-	if (i915_gem_object_get_pages(obj)) {
-		drm_gem_object_unreference(&obj->base);
-		return NULL;
-	}
+		return ERR_PTR(-ENOMEM);
 
-	if (i915_gem_object_ggtt_pin(obj, NULL, 0, PAGE_SIZE,
-				     PIN_OFFSET_BIAS | GUC_WOPCM_TOP)) {
+	ret = i915_gem_object_ggtt_pin(obj, NULL, 0, PAGE_SIZE,
+				       PIN_OFFSET_BIAS | GUC_WOPCM_TOP);
+	if (ret) {
 		drm_gem_object_unreference(&obj->base);
-		return NULL;
+		return ERR_PTR(ret);
 	}
 
 	/* Invalidate GuC TLB to let GuC take the latest updates to GTT. */
 	I915_WRITE(GEN8_GTCR, GEN8_GTCR_INVALIDATE);
 
-	return obj;
+	return i915_gem_obj_to_ggtt(obj);
 }
 
 /**
- * gem_release_guc_obj() - Release gem object allocated for GuC usage
- * @obj:	gem obj to be released
+ * guc_release_vma() - Release gem object allocated for GuC usage
+ * @vma:	gem obj to be released
  */
-static void gem_release_guc_obj(struct drm_i915_gem_object *obj)
+static void guc_release_vma(struct i915_vma *vma)
 {
-	if (!obj)
+	if (vma == NULL)
 		return;
 
-	if (i915_gem_obj_is_pinned(obj))
-		i915_gem_object_ggtt_unpin(obj);
-
-	drm_gem_object_unreference(&obj->base);
+	i915_vma_unpin(vma);
+	drm_gem_object_unreference(&vma->obj->base);
 }
 
 static void guc_client_free(struct drm_device *dev,
@@ -677,7 +669,7 @@ static void guc_client_free(struct drm_device *dev,
 	 * Be sure to drop any locks
 	 */
 
-	gem_release_guc_obj(client->client_obj);
+	guc_release_vma(client->client);
 
 	if (client->ctx_index != GUC_INVALID_CTX_ID) {
 		guc_fini_ctx_desc(guc, client);
@@ -706,7 +698,7 @@ static struct i915_guc_client *guc_client_alloc(struct drm_device *dev,
 	struct i915_guc_client *client;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_guc *guc = &dev_priv->guc;
-	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma;
 
 	client = kzalloc(sizeof(*client), GFP_KERNEL);
 	if (!client)
@@ -725,11 +717,11 @@ static struct i915_guc_client *guc_client_alloc(struct drm_device *dev,
 	}
 
 	/* The first page is doorbell/proc_desc. Two followed pages are wq. */
-	obj = gem_allocate_guc_obj(dev, GUC_DB_SIZE + GUC_WQ_SIZE);
-	if (!obj)
+	vma = guc_allocate_vma(dev, GUC_DB_SIZE + GUC_WQ_SIZE);
+	if (IS_ERR(vma))
 		goto err;
 
-	client->client_obj = obj;
+	client->client = vma;
 	client->wq_offset = GUC_DB_SIZE;
 	client->wq_size = GUC_WQ_SIZE;
 
@@ -774,7 +766,7 @@ err:
 static void guc_create_log(struct intel_guc *guc)
 {
 	struct drm_i915_private *dev_priv = guc_to_i915(guc);
-	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma;
 	unsigned long offset;
 	uint32_t size, flags;
 
@@ -790,16 +782,16 @@ static void guc_create_log(struct intel_guc *guc)
 		GUC_LOG_ISR_PAGES + 1 +
 		GUC_LOG_CRASH_PAGES + 1) << PAGE_SHIFT;
 
-	obj = guc->log_obj;
-	if (!obj) {
-		obj = gem_allocate_guc_obj(dev_priv->dev, size);
-		if (!obj) {
+	vma = guc->log;
+	if (vma == NULL) {
+		vma = guc_allocate_vma(dev_priv->dev, size);
+		if (IS_ERR(vma)) {
 			/* logging will be off */
 			i915.guc_log_level = -1;
 			return;
 		}
 
-		guc->log_obj = obj;
+		guc->log = vma;
 	}
 
 	/* each allocated unit is a page */
@@ -808,7 +800,7 @@ static void guc_create_log(struct intel_guc *guc)
 		(GUC_LOG_ISR_PAGES << GUC_LOG_ISR_SHIFT) |
 		(GUC_LOG_CRASH_PAGES << GUC_LOG_CRASH_SHIFT);
 
-	offset = i915_gem_obj_ggtt_offset(obj) >> PAGE_SHIFT; /* in pages */
+	offset = vma->node.start >> PAGE_SHIFT; /* in pages */
 	guc->log_flags = (offset << GUC_LOG_BUF_ADDR_SHIFT) | flags;
 }
 
@@ -837,7 +829,7 @@ static void init_guc_policies(struct guc_policies *policies)
 static void guc_create_ads(struct intel_guc *guc)
 {
 	struct drm_i915_private *dev_priv = guc_to_i915(guc);
-	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma;
 	struct guc_ads *ads;
 	struct guc_policies *policies;
 	struct guc_mmio_reg_state *reg_state;
@@ -850,16 +842,16 @@ static void guc_create_ads(struct intel_guc *guc)
 			sizeof(struct guc_mmio_reg_state) +
 			GUC_S3_SAVE_SPACE_PAGES * PAGE_SIZE;
 
-	obj = guc->ads_obj;
-	if (!obj) {
-		obj = gem_allocate_guc_obj(dev_priv->dev, PAGE_ALIGN(size));
-		if (!obj)
+	vma = guc->ads;
+	if (vma == NULL) {
+		vma = guc_allocate_vma(dev_priv->dev, PAGE_ALIGN(size));
+		if (IS_ERR(vma))
 			return;
 
-		guc->ads_obj = obj;
+		guc->ads = vma;
 	}
 
-	page = i915_gem_object_get_page(obj, 0);
+	page = i915_gem_object_get_page(vma->obj, 0);
 	ads = kmap(page);
 
 	/*
@@ -879,8 +871,7 @@ static void guc_create_ads(struct intel_guc *guc)
 	policies = (void *)ads + sizeof(struct guc_ads);
 	init_guc_policies(policies);
 
-	ads->scheduler_policies = i915_gem_obj_ggtt_offset(obj) +
-			sizeof(struct guc_ads);
+	ads->scheduler_policies = vma->node.start + sizeof(struct guc_ads);
 
 	/* MMIO reg state */
 	reg_state = (void *)policies + sizeof(struct guc_policies);
@@ -908,22 +899,22 @@ static void guc_create_ads(struct intel_guc *guc)
  */
 int i915_guc_submission_init(struct drm_device *dev)
 {
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	const size_t ctxsize = sizeof(struct guc_context_desc);
-	const size_t poolsize = GUC_MAX_GPU_CONTEXTS * ctxsize;
-	const size_t gemsize = round_up(poolsize, PAGE_SIZE);
-	struct intel_guc *guc = &dev_priv->guc;
+	struct intel_guc *guc = &to_i915(dev)->guc;
+	struct i915_vma *vma;
+	u32 size;
 
 	if (!i915.enable_guc_submission)
 		return 0; /* not enabled  */
 
-	if (guc->ctx_pool_obj)
+	if (guc->ctx_pool)
 		return 0; /* already allocated */
 
-	guc->ctx_pool_obj = gem_allocate_guc_obj(dev_priv->dev, gemsize);
-	if (!guc->ctx_pool_obj)
-		return -ENOMEM;
+	size = PAGE_ALIGN(GUC_MAX_GPU_CONTEXTS*sizeof(struct guc_context_desc));
+	vma = guc_allocate_vma(dev, size);
+	if (IS_ERR(vma))
+		return PTR_ERR(vma);
 
+	guc->ctx_pool  = vma;
 	ida_init(&guc->ctx_ids);
 
 	guc_create_log(guc);
@@ -966,19 +957,18 @@ void i915_guc_submission_disable(struct drm_device *dev)
 
 void i915_guc_submission_fini(struct drm_device *dev)
 {
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct intel_guc *guc = &dev_priv->guc;
+	struct intel_guc *guc = &to_i915(dev)->guc;
 
-	gem_release_guc_obj(dev_priv->guc.ads_obj);
-	guc->ads_obj = NULL;
+	guc_release_vma(guc->ads);
+	guc->ads = NULL;
 
-	gem_release_guc_obj(dev_priv->guc.log_obj);
-	guc->log_obj = NULL;
+	guc_release_vma(guc->log);
+	guc->log = NULL;
 
-	if (guc->ctx_pool_obj)
+	if (guc->ctx_pool)
 		ida_destroy(&guc->ctx_ids);
-	gem_release_guc_obj(guc->ctx_pool_obj);
-	guc->ctx_pool_obj = NULL;
+	guc_release_vma(guc->ctx_pool);
+	guc->ctx_pool = NULL;
 }
 
 /**
diff --git a/drivers/gpu/drm/i915/intel_guc.h b/drivers/gpu/drm/i915/intel_guc.h
index 045b1491ff7a..9ea410614b3f 100644
--- a/drivers/gpu/drm/i915/intel_guc.h
+++ b/drivers/gpu/drm/i915/intel_guc.h
@@ -28,7 +28,7 @@
 #include "i915_guc_reg.h"
 
 struct i915_guc_client {
-	struct drm_i915_gem_object *client_obj;
+	struct i915_vma *client;
 	struct intel_context *owner;
 	struct intel_guc *guc;
 	uint32_t priority;
@@ -87,11 +87,10 @@ struct intel_guc_fw {
 struct intel_guc {
 	struct intel_guc_fw guc_fw;
 	uint32_t log_flags;
-	struct drm_i915_gem_object *log_obj;
+	struct i915_vma *log;
 
-	struct drm_i915_gem_object *ads_obj;
-
-	struct drm_i915_gem_object *ctx_pool_obj;
+	struct i915_vma *ads;
+	struct i915_vma *ctx_pool;
 	struct ida ctx_ids;
 
 	struct i915_guc_client *execbuf_client;
diff --git a/drivers/gpu/drm/i915/intel_guc_loader.c b/drivers/gpu/drm/i915/intel_guc_loader.c
index dded672d5599..b447cfd58361 100644
--- a/drivers/gpu/drm/i915/intel_guc_loader.c
+++ b/drivers/gpu/drm/i915/intel_guc_loader.c
@@ -165,16 +165,15 @@ static void set_guc_init_params(struct drm_i915_private *dev_priv)
 			i915.guc_log_level << GUC_LOG_VERBOSITY_SHIFT;
 	}
 
-	if (guc->ads_obj) {
-		u32 ads = (u32)i915_gem_obj_ggtt_offset(guc->ads_obj)
-				>> PAGE_SHIFT;
+	if (guc->ads) {
+		u32 ads = (u32)guc->ads->node.start >> PAGE_SHIFT;
 		params[GUC_CTL_DEBUG] |= ads << GUC_ADS_ADDR_SHIFT;
 		params[GUC_CTL_DEBUG] |= GUC_ADS_ENABLED;
 	}
 
 	/* If GuC submission is enabled, set up additional parameters here */
 	if (i915.enable_guc_submission) {
-		u32 pgs = i915_gem_obj_ggtt_offset(dev_priv->guc.ctx_pool_obj);
+		u32 pgs = dev_priv->guc.ctx_pool->node.start;
 		u32 ctx_in_16 = GUC_MAX_GPU_CONTEXTS / 16;
 
 		pgs >>= PAGE_SHIFT;
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 125/190] drm/i915: Track pinned VMA
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
                     ` (36 preceding siblings ...)
  2016-01-11 10:45   ` [PATCH 124/190] drm/i915: Track pinned vma inside guc Chris Wilson
@ 2016-01-11 10:45   ` Chris Wilson
  2016-01-11 10:45   ` [PATCH 126/190] drm/i915: Print the batchbuffer offset next to BBADDR in error state Chris Wilson
                     ` (15 subsequent siblings)
  53 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:45 UTC (permalink / raw)
  To: intel-gfx

Treat the VMA as the primary struct responsible for tracking bindings
into the GPU's VM. That is, we want to treat the VMA returned after we
pin an object into the VM as the cookie we hold and eventually release
when unpinning. Doing so eliminates the ambiguity in pinning the object
and then searching for the relevant pin later.
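
The net effect on callers is a sketch like:

    vma = i915_gem_object_ggtt_pin(obj, view, size, alignment, flags);
    if (IS_ERR(vma))
            return PTR_ERR(vma);

    /* use vma->node.start for GTT addresses while the pin is held */

    i915_vma_unpin(vma);  /* release exactly what we pinned */

rather than pinning the object and then calling
i915_gem_object_to_ggtt() afterwards to rediscover which vma the pin
landed on.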

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c          |  80 ++++++-----
 drivers/gpu/drm/i915/i915_drv.h              |  79 +++-------
 drivers/gpu/drm/i915/i915_gem.c              | 208 ++++++---------------------
 drivers/gpu/drm/i915/i915_gem_context.c      |  56 ++++----
 drivers/gpu/drm/i915/i915_gem_execbuffer.c   |  55 ++++---
 drivers/gpu/drm/i915/i915_gem_fence.c        |  61 ++++----
 drivers/gpu/drm/i915/i915_gem_gtt.c          |  62 ++++----
 drivers/gpu/drm/i915/i915_gem_gtt.h          |  14 --
 drivers/gpu/drm/i915/i915_gem_render_state.c |  30 ++--
 drivers/gpu/drm/i915/i915_gem_render_state.h |   2 +-
 drivers/gpu/drm/i915/i915_gem_request.c      |   7 +-
 drivers/gpu/drm/i915/i915_gem_request.h      |   2 +-
 drivers/gpu/drm/i915/i915_gem_stolen.c       |   2 +-
 drivers/gpu/drm/i915/i915_gem_tiling.c       |  40 +++---
 drivers/gpu/drm/i915/i915_gpu_error.c        |  52 +++----
 drivers/gpu/drm/i915/i915_guc_submission.c   |  30 ++--
 drivers/gpu/drm/i915/intel_display.c         |  64 +++++----
 drivers/gpu/drm/i915/intel_drv.h             |   8 +-
 drivers/gpu/drm/i915/intel_fbc.c             |   2 +-
 drivers/gpu/drm/i915/intel_fbdev.c           |  42 +++---
 drivers/gpu/drm/i915/intel_guc_loader.c      |  30 ++--
 drivers/gpu/drm/i915/intel_lrc.c             | 106 +++++++-------
 drivers/gpu/drm/i915/intel_overlay.c         |  50 ++++---
 drivers/gpu/drm/i915/intel_ringbuffer.c      | 181 ++++++++++++-----------
 drivers/gpu/drm/i915/intel_ringbuffer.h      |  15 +-
 drivers/gpu/drm/i915/intel_sprite.c          |   8 +-
 26 files changed, 576 insertions(+), 710 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index d186d256f467..e923dc192f54 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -109,7 +109,7 @@ static const char *get_tiling_flag(struct drm_i915_gem_object *obj)
 
 static inline const char *get_global_flag(struct drm_i915_gem_object *obj)
 {
-	return i915_gem_obj_to_ggtt(obj) ? "g" : " ";
+	return i915_gem_object_to_ggtt(obj, NULL) ? "g" : " ";
 }
 
 static u64 i915_gem_obj_total_ggtt_size(struct drm_i915_gem_object *obj)
@@ -266,7 +266,7 @@ static int i915_gem_stolen_list_info(struct seq_file *m, void *data)
 	struct drm_device *dev = node->minor->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_i915_gem_object *obj;
-	u64 total_obj_size, total_gtt_size;
+	u64 total_obj_size;
 	LIST_HEAD(stolen);
 	int count, ret;
 
@@ -274,7 +274,7 @@ static int i915_gem_stolen_list_info(struct seq_file *m, void *data)
 	if (ret)
 		return ret;
 
-	total_obj_size = total_gtt_size = count = 0;
+	total_obj_size = count = 0;
 	list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list) {
 		if (obj->stolen == NULL)
 			continue;
@@ -282,7 +282,6 @@ static int i915_gem_stolen_list_info(struct seq_file *m, void *data)
 		list_add(&obj->obj_exec_link, &stolen);
 
 		total_obj_size += obj->base.size;
-		total_gtt_size += i915_gem_obj_total_ggtt_size(obj);
 		count++;
 	}
 	list_for_each_entry(obj, &dev_priv->mm.unbound_list, global_list) {
@@ -305,8 +304,8 @@ static int i915_gem_stolen_list_info(struct seq_file *m, void *data)
 	}
 	mutex_unlock(&dev->struct_mutex);
 
-	seq_printf(m, "Total %d objects, %llu bytes, %llu GTT size\n",
-		   count, total_obj_size, total_gtt_size);
+	seq_printf(m, "Total %d objects, %llu bytes\n",
+		   count, total_obj_size);
 	return 0;
 }
 
@@ -315,7 +314,7 @@ static int i915_gem_stolen_list_info(struct seq_file *m, void *data)
 		size += i915_gem_obj_total_ggtt_size(obj); \
 		++count; \
 		if (obj->map_and_fenceable) { \
-			mappable_size += i915_gem_obj_ggtt_size(obj); \
+			mappable_size += obj->base.size; \
 			++mappable_count; \
 		} \
 	} \
@@ -403,10 +402,10 @@ static void print_batch_pool_stats(struct seq_file *m,
 
 #define count_vmas(list, member) do { \
 	list_for_each_entry(vma, list, member) { \
-		size += i915_gem_obj_total_ggtt_size(vma->obj); \
+		size += vma->size; \
 		++count; \
 		if (vma->obj->map_and_fenceable) { \
-			mappable_size += i915_gem_obj_ggtt_size(vma->obj); \
+			mappable_size += vma->size; \
 			++mappable_count; \
 		} \
 	} \
@@ -459,11 +458,11 @@ static int i915_gem_object_info(struct seq_file *m, void* data)
 	size = count = mappable_size = mappable_count = 0;
 	list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list) {
 		if (obj->fault_mappable) {
-			size += i915_gem_obj_ggtt_size(obj);
+			size += obj->base.size;
 			++count;
 		}
 		if (obj->pin_display) {
-			mappable_size += i915_gem_obj_ggtt_size(obj);
+			mappable_size += obj->base.size;
 			++mappable_count;
 		}
 		if (obj->madv == I915_MADV_DONTNEED) {
@@ -517,30 +516,29 @@ static int i915_gem_gtt_info(struct seq_file *m, void *data)
 	uintptr_t list = (uintptr_t) node->info_ent->data;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_i915_gem_object *obj;
-	u64 total_obj_size, total_gtt_size;
+	u64 total_obj_size;
 	int count, ret;
 
 	ret = mutex_lock_interruptible(&dev->struct_mutex);
 	if (ret)
 		return ret;
 
-	total_obj_size = total_gtt_size = count = 0;
+	total_obj_size = count = 0;
 	list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list) {
-		if (list == PINNED_LIST && !i915_gem_obj_is_pinned(obj))
+		if (list == PINNED_LIST && !obj->pin_display)
 			continue;
 
 		seq_puts(m, "   ");
 		describe_obj(m, obj);
 		seq_putc(m, '\n');
 		total_obj_size += obj->base.size;
-		total_gtt_size += i915_gem_obj_total_ggtt_size(obj);
 		count++;
 	}
 
 	mutex_unlock(&dev->struct_mutex);
 
-	seq_printf(m, "Total %d objects, %llu bytes, %llu GTT size\n",
-		   count, total_obj_size, total_gtt_size);
+	seq_printf(m, "Total %d objects, %llu bytes\n",
+		   count, total_obj_size);
 
 	return 0;
 }
@@ -2001,40 +1999,44 @@ static int i915_context_status(struct seq_file *m, void *unused)
 
 static void i915_dump_lrc_obj(struct seq_file *m,
 			      struct intel_engine_cs *ring,
-			      struct drm_i915_gem_object *ctx_obj)
+			      struct intel_context *ctx)
 {
+	struct drm_i915_gem_object *obj = ctx->engine[ring->id].state;
+	struct i915_vma *vma = ctx->engine[ring->id].vma;
 	struct page *page;
-	uint32_t *reg_state;
 	int j;
-	unsigned long ggtt_offset = 0;
 
-	if (ctx_obj == NULL) {
-		seq_printf(m, "Context on %s with no gem object\n",
-			   ring->name);
+	seq_printf(m, "CONTEXT: %s\n", ring->name);
+
+	if (obj == NULL) {
+		seq_printf(m, "\tUnallocated\n\n");
 		return;
 	}
 
-	seq_printf(m, "CONTEXT: %s\n", ring->name);
-
-	if (!i915_gem_obj_ggtt_bound(ctx_obj))
+	if (vma == NULL) {
 		seq_puts(m, "\tNot bound in GGTT\n");
-	else
-		ggtt_offset = i915_gem_obj_ggtt_offset(ctx_obj);
+	} else {
+		seq_printf(m, "\tBound in GGTT at %x\n",
+			   lower_32_bits(vma->node.start));
+	}
 
-	if (i915_gem_object_get_pages(ctx_obj)) {
-		seq_puts(m, "\tFailed to get pages for context object\n");
+	if (i915_gem_object_get_pages(obj)) {
+		seq_puts(m, "\tFailed to get pages for context object\n\n");
 		return;
 	}
 
-	page = i915_gem_object_get_page(ctx_obj, LRC_STATE_PN);
-	if (!WARN_ON(page == NULL)) {
-		reg_state = kmap_atomic(page);
-
+	page = i915_gem_object_get_page(obj, LRC_STATE_PN);
+	if (page != NULL) {
+		uint32_t *reg_state = kmap_atomic(page);
 		for (j = 0; j < 0x600 / sizeof(u32) / 4; j += 4) {
-			seq_printf(m, "\t[0x%08lx] 0x%08x 0x%08x 0x%08x 0x%08x\n",
-				   ggtt_offset + 4096 + (j * 4),
-				   reg_state[j], reg_state[j + 1],
-				   reg_state[j + 2], reg_state[j + 3]);
+			seq_printf(m,
+				   "\t[0x%08x] 0x%08x 0x%08x 0x%08x 0x%08x\n",
+				   j * 4,
+				   reg_state[j],
+				   reg_state[j + 1],
+				   reg_state[j + 2],
+				   reg_state[j + 3]);
 		}
 		kunmap_atomic(reg_state);
 	}
@@ -2062,7 +2064,7 @@ static int i915_dump_lrc(struct seq_file *m, void *unused)
 
 	list_for_each_entry(ctx, &dev_priv->context_list, link) {
 		for_each_ring(ring, dev_priv, i)
-			i915_dump_lrc_obj(m, ring, ctx->engine[i].state);
+			i915_dump_lrc_obj(m, ring, ctx);
 	}
 
 	mutex_unlock(&dev->struct_mutex);
@@ -3131,7 +3133,7 @@ static int i915_semaphore_status(struct seq_file *m, void *unused)
 		struct page *page;
 		uint64_t *seqno;
 
-		page = i915_gem_object_get_page(dev_priv->semaphore_obj, 0);
+		page = i915_gem_object_get_page(dev_priv->semaphore_vma->obj, 0);
 
 		seqno = (uint64_t *)kmap_atomic(page);
 		for_each_ring(ring, dev_priv, i) {
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 89da35105a33..6b729baf6503 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -878,12 +878,14 @@ struct intel_context {
 	/* Legacy ring buffer submission */
 	struct {
 		struct drm_i915_gem_object *rcs_state;
+		struct i915_vma *rcs_vma;
 		bool initialized;
 	} legacy_hw_ctx;
 
 	/* Execlists */
 	struct {
 		struct drm_i915_gem_object *state;
+		struct i915_vma *vma;
 		struct intel_ring *ring;
 		int pin_count;
 		bool initialised;
@@ -1705,7 +1707,7 @@ struct drm_i915_private {
 	struct pci_dev *bridge_dev;
 	struct intel_engine_cs ring[I915_NUM_RINGS];
 	struct intel_context *kernel_context;
-	struct drm_i915_gem_object *semaphore_obj;
+	struct i915_vma *semaphore_vma;
 	uint32_t last_seqno, next_seqno;
 
 	struct drm_dma_handle *status_page_dmah;
@@ -2739,7 +2741,7 @@ static inline void i915_vma_unpin(struct i915_vma *vma)
 	__i915_vma_unpin(vma);
 }
 
-int __must_check
+struct i915_vma * __must_check
 i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj,
 			 const struct i915_ggtt_view *view,
 			 uint64_t size,
@@ -2884,12 +2886,11 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj,
 				  bool write);
 int __must_check
 i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write);
-int __must_check
+struct i915_vma * __must_check
 i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
 				     u32 alignment,
 				     const struct i915_ggtt_view *view);
-void i915_gem_object_unpin_from_display_plane(struct drm_i915_gem_object *obj,
-					      const struct i915_ggtt_view *view);
+void i915_gem_object_unpin_from_display_plane(struct i915_vma *vma);
 int i915_gem_object_attach_phys(struct drm_i915_gem_object *obj,
 				int align);
 int i915_gem_open(struct drm_device *dev, struct drm_file *file);
@@ -2910,47 +2911,15 @@ struct drm_gem_object *i915_gem_prime_import(struct drm_device *dev,
 struct dma_buf *i915_gem_prime_export(struct drm_device *dev,
 				struct drm_gem_object *gem_obj, int flags);
 
-u64 i915_gem_obj_ggtt_offset_view(struct drm_i915_gem_object *o,
-				  const struct i915_ggtt_view *view);
-u64 i915_gem_obj_offset(struct drm_i915_gem_object *o,
-			struct i915_address_space *vm);
-static inline u64
-i915_gem_obj_ggtt_offset(struct drm_i915_gem_object *o)
-{
-	return i915_gem_obj_ggtt_offset_view(o, &i915_ggtt_view_normal);
-}
-
-bool i915_gem_obj_ggtt_bound_view(struct drm_i915_gem_object *o,
-				  const struct i915_ggtt_view *view);
-bool i915_gem_obj_bound(struct drm_i915_gem_object *o,
-			struct i915_address_space *vm);
-
-unsigned long i915_gem_obj_size(struct drm_i915_gem_object *o,
-				struct i915_address_space *vm);
 struct i915_vma *
 i915_gem_obj_to_vma(struct drm_i915_gem_object *obj,
-		    struct i915_address_space *vm);
-struct i915_vma *
-i915_gem_obj_to_ggtt_view(struct drm_i915_gem_object *obj,
-			  const struct i915_ggtt_view *view);
+		     struct i915_address_space *vm,
+		     const struct i915_ggtt_view *view);
 
 struct i915_vma *
 i915_gem_obj_lookup_or_create_vma(struct drm_i915_gem_object *obj,
-				  struct i915_address_space *vm);
-struct i915_vma *
-i915_gem_obj_lookup_or_create_ggtt_vma(struct drm_i915_gem_object *obj,
-				       const struct i915_ggtt_view *view);
-
-static inline struct i915_vma *
-i915_gem_obj_to_ggtt(struct drm_i915_gem_object *obj)
-{
-	return i915_gem_obj_to_ggtt_view(obj, &i915_ggtt_view_normal);
-}
-bool i915_gem_obj_is_pinned(struct drm_i915_gem_object *obj);
-
-/* Some GGTT VM helpers */
-#define i915_obj_to_ggtt(obj) \
-	(&((struct drm_i915_private *)(obj)->base.dev->dev_private)->gtt.base)
+				  struct i915_address_space *vm,
+				  const struct i915_ggtt_view *view);
 
 static inline struct i915_hw_ppgtt *
 i915_vm_to_ppgtt(struct i915_address_space *vm)
@@ -2959,29 +2928,21 @@ i915_vm_to_ppgtt(struct i915_address_space *vm)
 	return container_of(vm, struct i915_hw_ppgtt, base);
 }
 
-static inline bool i915_gem_obj_ggtt_bound(struct drm_i915_gem_object *obj)
-{
-	return i915_gem_obj_ggtt_bound_view(obj, &i915_ggtt_view_normal);
-}
-
-static inline unsigned long
-i915_gem_obj_ggtt_size(struct drm_i915_gem_object *obj)
-{
-	return i915_gem_obj_size(obj, i915_obj_to_ggtt(obj));
-}
+/* Some GGTT VM helpers */
+#define i915_obj_to_ggtt(obj) (&(to_i915((obj)->base.dev)->gtt.base))
 
-static inline int
-i915_gem_object_ggtt_unbind(struct drm_i915_gem_object *obj)
+static inline struct i915_vma *
+i915_gem_object_to_ggtt(struct drm_i915_gem_object *obj,
+			const struct i915_ggtt_view *view)
 {
-	return i915_vma_unbind(i915_gem_obj_to_ggtt(obj));
+	return i915_gem_obj_to_vma(obj, i915_obj_to_ggtt(obj), view);
 }
 
-void i915_gem_object_ggtt_unpin_view(struct drm_i915_gem_object *obj,
-				     const struct i915_ggtt_view *view);
-static inline void
-i915_gem_object_ggtt_unpin(struct drm_i915_gem_object *obj)
+static inline unsigned long
+i915_gem_object_ggtt_offset(struct drm_i915_gem_object *o,
+			    const struct i915_ggtt_view *view)
 {
-	i915_gem_object_ggtt_unpin_view(obj, &i915_ggtt_view_normal);
+	return i915_gem_object_to_ggtt(o, view)->node.start;
 }
 
 /* i915_gem_fence.c */
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 24e6e4773ac8..01c20a336c04 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -775,16 +775,18 @@ i915_gem_gtt_pwrite_fast(struct drm_device *dev,
 			 struct drm_file *file)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct i915_vma *vma;
 	ssize_t remain;
 	loff_t offset, page_base;
 	char __user *user_data;
 	int page_offset, page_length, ret;
 
-	ret = i915_gem_object_ggtt_pin(obj, NULL,
-				       0, 0,
+	vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0, 
 				       PIN_MAPPABLE | PIN_NONBLOCK);
-	if (ret)
+	if (IS_ERR(vma)) {
+		ret = PTR_ERR(vma);
 		goto out;
+	}
 
 	ret = i915_gem_object_set_to_gtt_domain(obj, true);
 	if (ret)
@@ -797,7 +799,7 @@ i915_gem_gtt_pwrite_fast(struct drm_device *dev,
 	user_data = to_user_ptr(args->data_ptr);
 	remain = args->size;
 
-	offset = i915_gem_obj_ggtt_offset(obj) + args->offset;
+	offset = vma->node.start + args->offset;
 
 	intel_fb_obj_invalidate(obj, ORIGIN_GTT);
 
@@ -832,7 +834,7 @@ i915_gem_gtt_pwrite_fast(struct drm_device *dev,
 out_flush:
 	intel_fb_obj_flush(obj, false, ORIGIN_GTT);
 out_unpin:
-	i915_gem_object_ggtt_unpin(obj);
+	i915_vma_unpin(vma);
 out:
 	return ret;
 }
@@ -1397,6 +1399,7 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 	struct drm_device *dev = obj->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct i915_ggtt_view view = i915_ggtt_view_normal;
+	struct i915_vma *ggtt;
 	pgoff_t page_offset;
 	unsigned long pfn;
 	int ret = 0;
@@ -1445,9 +1448,11 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 	}
 
 	/* Now pin it into the GTT if needed */
-	ret = i915_gem_object_ggtt_pin(obj, &view, 0, 0, PIN_MAPPABLE);
-	if (ret)
+	ggtt = i915_gem_object_ggtt_pin(obj, &view, 0, 0, PIN_MAPPABLE);
+	if (IS_ERR(ggtt)) {
+		ret = PTR_ERR(ggtt);
 		goto unlock;
+	}
 
 	ret = i915_gem_object_set_to_gtt_domain(obj, write);
 	if (ret)
@@ -1458,8 +1463,7 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 		goto unpin;
 
 	/* Finally, remap it using the new GTT offset */
-	pfn = dev_priv->gtt.mappable_base +
-		i915_gem_obj_ggtt_offset_view(obj, &view);
+	pfn = dev_priv->gtt.mappable_base + ggtt->node.start;
 	pfn >>= PAGE_SHIFT;
 
 	if (unlikely(view.type == I915_GGTT_VIEW_PARTIAL)) {
@@ -1501,7 +1505,7 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 					    pfn + page_offset);
 	}
 unpin:
-	i915_gem_object_ggtt_unpin_view(obj, &view);
+	__i915_vma_unpin(ggtt);
 unlock:
 	mutex_unlock(&dev->struct_mutex);
 out:
@@ -3010,7 +3014,7 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
 					    old_write_domain);
 
 	/* And bump the LRU for this access */
-	vma = i915_gem_obj_to_ggtt(obj);
+	vma = i915_gem_object_to_ggtt(obj, NULL);
 	if (vma && drm_mm_node_allocated(&vma->node) && !vma->active)
 		list_move_tail(&vma->vm_link, &vma->vm->inactive_list);
 
@@ -3233,11 +3237,12 @@ rpm_put:
  * Can be called from an uninterruptible phase (modesetting) and allows
  * any flushes to be pipelined (for pageflips).
  */
-int
+struct i915_vma *
 i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
 				     u32 alignment,
 				     const struct i915_ggtt_view *view)
 {
+	struct i915_vma *vma;
 	u32 old_read_domains, old_write_domain;
 	int ret;
 
@@ -3257,19 +3262,23 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
 	 */
 	ret = i915_gem_object_set_cache_level(obj,
 					      HAS_WT(obj->base.dev) ? I915_CACHE_WT : I915_CACHE_NONE);
-	if (ret)
+	if (ret) {
+		vma = ERR_PTR(ret);
 		goto err_unpin_display;
+	}
 
 	/* As the user may map the buffer once pinned in the display plane
 	 * (e.g. libkms for the bootup splash), we have to ensure that we
 	 * always use map_and_fenceable for all scanout buffers.
 	 */
-	ret = i915_gem_object_ggtt_pin(obj, view, 0, alignment,
+	vma = i915_gem_object_ggtt_pin(obj, view, 0, alignment,
 				       view->type == I915_GGTT_VIEW_NORMAL ?
 				       PIN_MAPPABLE : 0);
-	if (ret)
+	if (IS_ERR(vma))
 		goto err_unpin_display;
 
+	WARN_ON(obj->pin_display > vma->pin_count);
+
 	i915_gem_object_flush_cpu_write_domain(obj);
 
 	old_write_domain = obj->base.write_domain;
@@ -3288,24 +3297,24 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
 	/* Increment the pages_pin_count to guard against the shrinker */
 	obj->pages_pin_count++;
 
-	return 0;
+	return vma;
 
 err_unpin_display:
 	obj->pin_display--;
-	return ret;
+	return vma;
 }
 
 void
-i915_gem_object_unpin_from_display_plane(struct drm_i915_gem_object *obj,
-					 const struct i915_ggtt_view *view)
+i915_gem_object_unpin_from_display_plane(struct i915_vma *vma)
 {
-	if (WARN_ON(obj->pin_display == 0))
+	if (WARN_ON(vma->obj->pin_display == 0))
 		return;
 
-	i915_gem_object_ggtt_unpin_view(obj, view);
+	vma->obj->pin_display--;
+	vma->obj->pages_pin_count--;
 
-	obj->pages_pin_count--;
-	obj->pin_display--;
+	i915_vma_unpin(vma);
+	WARN_ON(vma->obj->pin_display > vma->pin_count);
 }
 
 /**
@@ -3511,26 +3520,24 @@ err:
 	return ret;
 }
 
-int
+struct i915_vma *
 i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj,
-			 const struct i915_ggtt_view *view,
+			 const struct i915_ggtt_view *ggtt_view,
 			 uint64_t size,
 			 uint64_t alignment,
 			 uint64_t flags)
 {
+	struct i915_address_space *vm = i915_obj_to_ggtt(obj);
 	struct i915_vma *vma;
 	int ret;
 
-	if (view == NULL)
-		view = &i915_ggtt_view_normal;
-
-	vma = i915_gem_obj_lookup_or_create_ggtt_vma(obj, view);
+	vma = i915_gem_obj_lookup_or_create_vma(obj, vm, ggtt_view);
 	if (IS_ERR(vma))
-		return PTR_ERR(vma);
+		return vma;
 
 	if (i915_vma_misplaced(vma, size, alignment, flags)) {
 		if (flags & PIN_NONBLOCK && (vma->pin_count | vma->active))
-			return -ENOSPC;
+			return ERR_PTR(-ENOSPC);
 
 		WARN(vma->pin_count,
 		     "bo is already pinned in ggtt with incorrect alignment:"
@@ -3543,17 +3550,14 @@ i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj,
 		     obj->map_and_fenceable);
 		ret = i915_vma_unbind(vma);
 		if (ret)
-			return ret;
+			return ERR_PTR(ret);
 	}
 
-	return i915_vma_pin(vma, size, alignment, flags | PIN_GLOBAL);
-}
+	ret = i915_vma_pin(vma, size, alignment, flags | PIN_GLOBAL);
+	if (ret)
+		return ERR_PTR(ret);
 
-void
-i915_gem_object_ggtt_unpin_view(struct drm_i915_gem_object *obj,
-				const struct i915_ggtt_view *view)
-{
-	i915_vma_unpin(i915_gem_obj_to_ggtt_view(obj, view));
+	return vma;
 }
 
 int
@@ -3824,34 +3828,6 @@ void i915_gem_free_object(struct drm_gem_object *gem_obj)
 	intel_runtime_pm_put(dev_priv);
 }
 
-struct i915_vma *i915_gem_obj_to_vma(struct drm_i915_gem_object *obj,
-				     struct i915_address_space *vm)
-{
-	struct i915_vma *vma;
-	list_for_each_entry(vma, &obj->vma_list, obj_link) {
-		if (vma->ggtt_view.type == I915_GGTT_VIEW_NORMAL &&
-		    vma->vm == vm)
-			return vma;
-	}
-	return NULL;
-}
-
-struct i915_vma *i915_gem_obj_to_ggtt_view(struct drm_i915_gem_object *obj,
-					   const struct i915_ggtt_view *view)
-{
-	struct i915_address_space *ggtt = i915_obj_to_ggtt(obj);
-	struct i915_vma *vma;
-
-	if (WARN_ONCE(!view, "no view specified"))
-		return ERR_PTR(-EINVAL);
-
-	list_for_each_entry(vma, &obj->vma_list, obj_link)
-		if (vma->vm == ggtt &&
-		    i915_ggtt_view_equal(&vma->ggtt_view, view))
-			return vma;
-	return NULL;
-}
-
 static void
 i915_gem_stop_ringbuffers(struct drm_device *dev)
 {
@@ -4329,104 +4305,6 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
 	}
 }
 
-/* All the new VM stuff */
-u64 i915_gem_obj_offset(struct drm_i915_gem_object *o,
-			struct i915_address_space *vm)
-{
-	struct drm_i915_private *dev_priv = o->base.dev->dev_private;
-	struct i915_vma *vma;
-
-	WARN_ON(vm == &dev_priv->mm.aliasing_ppgtt->base);
-
-	list_for_each_entry(vma, &o->vma_list, obj_link) {
-		if (vma->is_ggtt &&
-		    vma->ggtt_view.type != I915_GGTT_VIEW_NORMAL)
-			continue;
-		if (vma->vm == vm)
-			return vma->node.start;
-	}
-
-	WARN(1, "%s vma for this object not found.\n",
-	     i915_is_ggtt(vm) ? "global" : "ppgtt");
-	return -1;
-}
-
-u64 i915_gem_obj_ggtt_offset_view(struct drm_i915_gem_object *o,
-				  const struct i915_ggtt_view *view)
-{
-	struct i915_address_space *ggtt = i915_obj_to_ggtt(o);
-	struct i915_vma *vma;
-
-	list_for_each_entry(vma, &o->vma_list, obj_link)
-		if (vma->vm == ggtt &&
-		    i915_ggtt_view_equal(&vma->ggtt_view, view))
-			return vma->node.start;
-
-	WARN(1, "global vma for this object not found. (view=%u)\n", view->type);
-	return -1;
-}
-
-bool i915_gem_obj_bound(struct drm_i915_gem_object *o,
-			struct i915_address_space *vm)
-{
-	struct i915_vma *vma;
-
-	list_for_each_entry(vma, &o->vma_list, obj_link) {
-		if (vma->is_ggtt &&
-		    vma->ggtt_view.type != I915_GGTT_VIEW_NORMAL)
-			continue;
-		if (vma->vm == vm && drm_mm_node_allocated(&vma->node))
-			return true;
-	}
-
-	return false;
-}
-
-bool i915_gem_obj_ggtt_bound_view(struct drm_i915_gem_object *o,
-				  const struct i915_ggtt_view *view)
-{
-	struct i915_address_space *ggtt = i915_obj_to_ggtt(o);
-	struct i915_vma *vma;
-
-	list_for_each_entry(vma, &o->vma_list, obj_link)
-		if (vma->vm == ggtt &&
-		    i915_ggtt_view_equal(&vma->ggtt_view, view) &&
-		    drm_mm_node_allocated(&vma->node))
-			return true;
-
-	return false;
-}
-
-unsigned long i915_gem_obj_size(struct drm_i915_gem_object *o,
-				struct i915_address_space *vm)
-{
-	struct drm_i915_private *dev_priv = o->base.dev->dev_private;
-	struct i915_vma *vma;
-
-	WARN_ON(vm == &dev_priv->mm.aliasing_ppgtt->base);
-
-	BUG_ON(list_empty(&o->vma_list));
-
-	list_for_each_entry(vma, &o->vma_list, obj_link) {
-		if (vma->is_ggtt &&
-		    vma->ggtt_view.type != I915_GGTT_VIEW_NORMAL)
-			continue;
-		if (vma->vm == vm)
-			return vma->node.size;
-	}
-	return 0;
-}
-
-bool i915_gem_obj_is_pinned(struct drm_i915_gem_object *obj)
-{
-	struct i915_vma *vma;
-	list_for_each_entry(vma, &obj->vma_list, obj_link)
-		if (vma->pin_count > 0)
-			return true;
-
-	return false;
-}
-
 /* Like i915_gem_object_get_page(), but mark the returned page dirty */
 struct page *
 i915_gem_object_get_dirty_page(struct drm_i915_gem_object *obj, int n)
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index c54c17944796..0a5f1d5fa788 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -327,8 +327,10 @@ void i915_gem_context_reset(struct drm_device *dev)
 		struct intel_context *lctx = ring->last_context;
 
 		if (lctx) {
-			if (lctx->legacy_hw_ctx.rcs_state && i == RCS)
-				i915_gem_object_ggtt_unpin(lctx->legacy_hw_ctx.rcs_state);
+			if (lctx->legacy_hw_ctx.rcs_vma) {
+				i915_vma_unpin(lctx->legacy_hw_ctx.rcs_vma);
+				lctx->legacy_hw_ctx.rcs_vma = NULL;
+			}
 
 			i915_gem_context_unreference(lctx);
 			ring->last_context = NULL;
@@ -379,7 +381,7 @@ int i915_gem_context_init(struct drm_device *dev)
 
 	if (ctx->legacy_hw_ctx.rcs_state) {
 		u32 alignment = get_context_alignment(dev);
-		int ret;
+		struct i915_vma *vma;
 
 		/* We may need to do things with the shrinker which
 		 * require us to immediately switch back to the default
@@ -388,13 +390,13 @@ int i915_gem_context_init(struct drm_device *dev)
 		 * be available. To avoid this we always pin the default
 		 * context.
 		 */
-		ret = i915_gem_object_ggtt_pin(ctx->legacy_hw_ctx.rcs_state,
+		vma = i915_gem_object_ggtt_pin(ctx->legacy_hw_ctx.rcs_state,
 					       NULL, 0, alignment, PIN_HIGH);
-		if (ret) {
+		if (IS_ERR(vma)) {
 			DRM_ERROR("Failed to pinned default global context (error %d)\n",
-				  ret);
+				  (int)PTR_ERR(vma));
 			i915_gem_context_unreference(ctx);
-			return ret;
+			return PTR_ERR(vma);
 		}
 	}
 
@@ -427,13 +429,13 @@ void i915_gem_context_fini(struct drm_device *dev)
 		WARN_ON(!dev_priv->ring[RCS].last_context);
 		if (dev_priv->ring[RCS].last_context == dctx) {
 			/* Fake switch to NULL context */
-			WARN_ON(i915_gem_object_is_active(dctx->legacy_hw_ctx.rcs_state));
-			i915_gem_object_ggtt_unpin(dctx->legacy_hw_ctx.rcs_state);
+			WARN_ON(dctx->legacy_hw_ctx.rcs_vma->active);
+			i915_vma_unpin(dctx->legacy_hw_ctx.rcs_vma);
 			i915_gem_context_unreference(dctx);
 			dev_priv->ring[RCS].last_context = NULL;
 		}
 
-		i915_gem_object_ggtt_unpin(dctx->legacy_hw_ctx.rcs_state);
+		i915_vma_unpin(dctx->legacy_hw_ctx.rcs_vma);
 	}
 
 	for (i = 0; i < I915_NUM_RINGS; i++) {
@@ -553,8 +555,8 @@ mi_set_context(struct drm_i915_gem_request *req, u32 hw_flags)
 
 	intel_ring_emit(ring, MI_NOOP);
 	intel_ring_emit(ring, MI_SET_CONTEXT);
-	intel_ring_emit(ring, i915_gem_obj_ggtt_offset(req->ctx->legacy_hw_ctx.rcs_state) |
-			flags);
+	intel_ring_emit(ring,
+			req->ctx->legacy_hw_ctx.rcs_vma->node.start | flags);
 	/*
 	 * w/a: MI_SET_CONTEXT must always be followed by MI_NOOP
 	 * WaMiSetContext_Hang:snb,ivb,vlv
@@ -666,20 +668,29 @@ static int do_switch(struct drm_i915_gem_request *req)
 {
 	struct intel_context *to = req->ctx;
 	struct intel_engine_cs *engine = req->engine;
-	struct intel_context *from = engine->last_context;
+	struct intel_context *from;
 	u32 hw_flags = 0;
 	int ret, i;
 
-	if (should_skip_switch(engine, from, to))
+	if (should_skip_switch(engine, engine->last_context, to))
 		return 0;
 
 	/* Trying to pin first makes error handling easier. */
 	if (engine->id == RCS) {
 		u32 alignment = get_context_alignment(engine->dev);
-		ret = i915_gem_object_ggtt_pin(to->legacy_hw_ctx.rcs_state,
+		struct i915_vma *vma;
+
+		vma = i915_gem_object_ggtt_pin(to->legacy_hw_ctx.rcs_state,
 					       NULL, 0, alignment, PIN_HIGH);
-		if (ret)
-			return ret;
+		if (IS_ERR(vma))
+			return PTR_ERR(vma);
+
+		to->legacy_hw_ctx.rcs_vma = vma;
+
+		if (WARN_ON(!(vma->bound & GLOBAL_BIND))) {
+			ret = -ENODEV;
+			goto unpin_out;
+		}
 	}
 
 	/*
@@ -790,8 +801,6 @@ static int do_switch(struct drm_i915_gem_request *req)
 	 * MI_SET_CONTEXT instead of when the next seqno has completed.
 	 */
 	if (from != NULL) {
-		struct drm_i915_gem_object *obj = from->legacy_hw_ctx.rcs_state;
-
 		/* As long as MI_SET_CONTEXT is serializing, ie. it flushes the
 		 * whole damn pipeline, we don't need to explicitly mark the
 		 * object dirty. The only exception is that the context must be
@@ -799,11 +808,10 @@ static int do_switch(struct drm_i915_gem_request *req)
 		 * able to defer doing this until we know the object would be
 		 * swapped, but there is no way to do that yet.
 		 */
-		obj->base.read_domains = I915_GEM_DOMAIN_INSTRUCTION;
-		i915_vma_move_to_active(i915_gem_obj_to_ggtt(obj), req, 0);
-
+		i915_vma_move_to_active(from->legacy_hw_ctx.rcs_vma, req, 0);
 		/* obj is kept alive until the next request by its active ref */
-		i915_gem_object_ggtt_unpin(from->legacy_hw_ctx.rcs_state);
+		i915_vma_unpin(from->legacy_hw_ctx.rcs_vma);
+
 		i915_gem_context_unreference(from);
 	}
 
@@ -814,7 +822,7 @@ done:
 
 unpin_out:
 	if (engine->id == RCS)
-		i915_gem_object_ggtt_unpin(to->legacy_hw_ctx.rcs_state);
+		i915_vma_unpin(to->legacy_hw_ctx.rcs_vma);
 	return ret;
 }
 
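The legacy context switch follows the same pattern: the vma returned by
the pin is cached in legacy_hw_ctx.rcs_vma, so the later unpin paths use
the cached pointer instead of looking the vma up again. Roughly (sketch
only, surrounding logic omitted):

	vma = i915_gem_object_ggtt_pin(to->legacy_hw_ctx.rcs_state,
				       NULL, 0, alignment, PIN_HIGH);
	if (IS_ERR(vma))
		return PTR_ERR(vma);
	to->legacy_hw_ctx.rcs_vma = vma;
	/* ... emit the switch ... */
	i915_vma_unpin(to->legacy_hw_ctx.rcs_vma);
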
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index a1b6678fb075..4d15dd32e365 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -174,8 +174,8 @@ eb_lookup_vmas(struct eb_vmas *eb,
 		 * from the (obj, vm) we don't run the risk of creating
 		 * duplicated vmas for the same vm.
 		 */
-		vma = i915_gem_obj_lookup_or_create_vma(obj, vm);
-		if (IS_ERR(vma)) {
+		vma = i915_gem_obj_lookup_or_create_vma(obj, vm, NULL);
+		if (unlikely(IS_ERR(vma))) {
 			DRM_DEBUG("Failed to lookup VMA\n");
 			ret = PTR_ERR(vma);
 			goto err;
@@ -348,21 +348,26 @@ relocate_entry_gtt(struct drm_i915_gem_object *obj,
 {
 	struct drm_device *dev = obj->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct i915_vma *vma;
 	uint64_t delta = relocation_target(reloc, target_offset);
 	uint64_t offset;
 	void __iomem *reloc_page;
 	int ret;
 
+	vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0, PIN_MAPPABLE);
+	if (IS_ERR(vma))
+		return PTR_ERR(vma);
+
 	ret = i915_gem_object_set_to_gtt_domain(obj, true);
 	if (ret)
-		return ret;
+		goto unpin;
 
 	ret = i915_gem_object_put_fence(obj);
 	if (ret)
-		return ret;
+		goto unpin;
 
 	/* Map the page containing the relocation we're going to perform.  */
-	offset = i915_gem_obj_ggtt_offset(obj);
+	offset = vma->node.start;
 	offset += reloc->offset;
 	reloc_page = io_mapping_map_atomic_wc(dev_priv->gtt.mappable,
 					      offset & PAGE_MASK);
@@ -384,7 +389,9 @@ relocate_entry_gtt(struct drm_i915_gem_object *obj,
 
 	io_mapping_unmap_atomic(reloc_page);
 
-	return 0;
+unpin:
+	i915_vma_unpin(vma);
+	return ret;
 }
 
 static void
@@ -1222,7 +1229,7 @@ i915_reset_gen7_sol_offsets(struct drm_i915_gem_request *req)
 	return 0;
 }
 
-static struct i915_vma*
+static struct i915_vma *
 i915_gem_execbuffer_parse(struct intel_engine_cs *ring,
 			  struct drm_i915_gem_exec_object2 *shadow_exec_entry,
 			  struct drm_i915_gem_object *batch_obj,
@@ -1246,31 +1253,30 @@ i915_gem_execbuffer_parse(struct intel_engine_cs *ring,
 			      batch_start_offset,
 			      batch_len,
 			      is_master);
-	if (ret)
+	if (ret) {
+		if (ret == -EACCES) /* unhandled chained batch */
+			vma = NULL;
+		else
+			vma = ERR_PTR(ret);
 		goto err;
+	}
 
-	ret = i915_gem_object_ggtt_pin(shadow_batch_obj, NULL, 0, 0, 0);
-	if (ret)
+	vma = i915_gem_object_ggtt_pin(shadow_batch_obj, NULL, 0, 0, 0);
+	if (IS_ERR(vma)) {
+		ret = PTR_ERR(vma);
 		goto err;
-
-	i915_gem_object_unpin_pages(shadow_batch_obj);
+	}
 
 	memset(shadow_exec_entry, 0, sizeof(*shadow_exec_entry));
 
-	vma = i915_gem_obj_to_ggtt(shadow_batch_obj);
 	vma->exec_entry = shadow_exec_entry;
 	vma->exec_entry->flags = __EXEC_OBJECT_HAS_PIN;
 	drm_gem_object_reference(&shadow_batch_obj->base);
 	list_add_tail(&vma->exec_list, &eb->vmas);
 
-	return vma;
-
 err:
 	i915_gem_object_unpin_pages(shadow_batch_obj);
-	if (ret == -EACCES) /* unhandled chained batch */
-		return NULL;
-	else
-		return ERR_PTR(ret);
+	return vma;
 }
 
 static int
@@ -1604,6 +1610,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	 * hsw should have this fixed, but bdw mucks it up again. */
 	if (dispatch_flags & I915_DISPATCH_SECURE) {
 		struct drm_i915_gem_object *obj = params->batch_vma->obj;
+		struct i915_vma *vma;
 
 		/*
 		 * So on first glance it looks freaky that we pin the batch here
@@ -1615,11 +1622,13 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		 *   fitting due to fragmentation.
 		 * So this is actually safe.
 		 */
-		ret = i915_gem_object_ggtt_pin(obj, NULL, 0, 0, 0);
-		if (ret)
+		vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0, 0);
+		if (IS_ERR(vma)) {
+			ret = PTR_ERR(vma);
 			goto err;
+		}
 
-		params->batch_vma = i915_gem_obj_to_ggtt(obj);
+		params->batch_vma = vma;
 	}
 
 	/* Allocate a request for this batch buffer nice and early. */
@@ -1635,7 +1644,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	 * inactive_list and lose its active reference. Hence we do not need
 	 * to explicitly hold another reference here.
 	 */
-	params->request->batch_obj = params->batch_vma->obj;
+	params->request->batch = params->batch_vma;
 
 	ret = i915_gem_request_add_to_client(params->request, file);
 	if (ret) {
diff --git a/drivers/gpu/drm/i915/i915_gem_fence.c b/drivers/gpu/drm/i915/i915_gem_fence.c
index ff085efcf0e5..8ba05a0f15d2 100644
--- a/drivers/gpu/drm/i915/i915_gem_fence.c
+++ b/drivers/gpu/drm/i915/i915_gem_fence.c
@@ -85,20 +85,14 @@ static void i965_write_fence_reg(struct drm_device *dev, int reg,
 	POSTING_READ(fence_reg_lo);
 
 	if (obj) {
-		u32 size = i915_gem_obj_ggtt_size(obj);
-		uint64_t val;
-
-		/* Adjust fence size to match tiled area */
-		if (obj->tiling_mode != I915_TILING_NONE) {
-			uint32_t row_size = obj->stride *
-				(obj->tiling_mode == I915_TILING_Y ? 32 : 8);
-			size = (size / row_size) * row_size;
-		}
-
-		val = (uint64_t)((i915_gem_obj_ggtt_offset(obj) + size - 4096) &
-				 0xfffff000) << 32;
-		val |= i915_gem_obj_ggtt_offset(obj) & 0xfffff000;
-		val |= (uint64_t)((obj->stride / 128) - 1) << fence_pitch_shift;
+		struct i915_vma *vma = i915_gem_object_to_ggtt(obj, NULL);
+		u32 row_size = obj->stride * (obj->tiling_mode == I915_TILING_Y ? 32 : 8);
+		u32 size = (u32)vma->node.size / row_size * row_size;
+		u64 val;
+
+		val = ((vma->node.start + size - 4096) & 0xfffff000) << 32;
+		val |= vma->node.start & 0xfffff000;
+		val |= (u64)((obj->stride / 128) - 1) << fence_pitch_shift;
 		if (obj->tiling_mode == I915_TILING_Y)
 			val |= 1 << I965_FENCE_TILING_Y_SHIFT;
 		val |= I965_FENCE_REG_VALID;
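The fence size here is the vma size rounded down to a whole number of
tile rows. As a worked example (illustrative numbers only):

	/* Y-tiled, stride = 512: row_size = 512 * 32 = 16384 bytes.
	 * With vma->node.size = 40960: size = 40960 / 16384 * 16384
	 * = 32768, so the fence covers only complete tile rows.
	 */
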
@@ -121,15 +115,17 @@ static void i915_write_fence_reg(struct drm_device *dev, int reg,
 	u32 val;
 
 	if (obj) {
-		u32 size = i915_gem_obj_ggtt_size(obj);
+		struct i915_vma *vma = i915_gem_object_to_ggtt(obj, NULL);
 		int pitch_val;
 		int tile_width;
 
-		WARN((i915_gem_obj_ggtt_offset(obj) & ~I915_FENCE_START_MASK) ||
-		     (size & -size) != size ||
-		     (i915_gem_obj_ggtt_offset(obj) & (size - 1)),
-		     "object 0x%08llx [fenceable? %d] not 1M or pot-size (0x%08x) aligned\n",
-		     i915_gem_obj_ggtt_offset(obj), obj->map_and_fenceable, size);
+		WARN((vma->node.start & ~I915_FENCE_START_MASK) ||
+		     !is_power_of_2(vma->node.size) ||
+		     (vma->node.start & (vma->node.size - 1)),
+		     "object 0x%08lx [fenceable? %d] not 1M or pot-size (0x%08lx) aligned\n",
+		     (long)vma->node.start,
+		     obj->map_and_fenceable,
+		     (long)vma->node.size);
 
 		if (obj->tiling_mode == I915_TILING_Y && HAS_128_BYTE_Y_TILING(dev))
 			tile_width = 128;
@@ -140,10 +136,10 @@ static void i915_write_fence_reg(struct drm_device *dev, int reg,
 		pitch_val = obj->stride / tile_width;
 		pitch_val = ffs(pitch_val) - 1;
 
-		val = i915_gem_obj_ggtt_offset(obj);
+		val = vma->node.start;
 		if (obj->tiling_mode == I915_TILING_Y)
 			val |= 1 << I830_FENCE_TILING_Y_SHIFT;
-		val |= I915_FENCE_SIZE_BITS(size);
+		val |= I915_FENCE_SIZE_BITS(vma->node.size);
 		val |= pitch_val << I830_FENCE_PITCH_SHIFT;
 		val |= I830_FENCE_REG_VALID;
 	} else
@@ -160,22 +156,22 @@ static void i830_write_fence_reg(struct drm_device *dev, int reg,
 	uint32_t val;
 
 	if (obj) {
-		u32 size = i915_gem_obj_ggtt_size(obj);
+		struct i915_vma *vma = i915_gem_object_to_ggtt(obj, NULL);
 		uint32_t pitch_val;
 
-		WARN((i915_gem_obj_ggtt_offset(obj) & ~I830_FENCE_START_MASK) ||
-		     (size & -size) != size ||
-		     (i915_gem_obj_ggtt_offset(obj) & (size - 1)),
-		     "object 0x%08llx not 512K or pot-size 0x%08x aligned\n",
-		     i915_gem_obj_ggtt_offset(obj), size);
+		WARN((vma->node.start & ~I830_FENCE_START_MASK) ||
+		     !is_power_of_2(vma->node.size) ||
+		     (vma->node.start & (vma->node.size - 1)),
+		     "object 0x%08lx not 512K or pot-size 0x%08lx aligned\n",
+		     (long)vma->node.start, (long)vma->node.size);
 
 		pitch_val = obj->stride / 128;
 		pitch_val = ffs(pitch_val) - 1;
 
-		val = i915_gem_obj_ggtt_offset(obj);
+		val = vma->node.start;
 		if (obj->tiling_mode == I915_TILING_Y)
 			val |= 1 << I830_FENCE_TILING_Y_SHIFT;
-		val |= I830_FENCE_SIZE_BITS(size);
+		val |= I830_FENCE_SIZE_BITS(vma->node.size);
 		val |= pitch_val << I830_FENCE_PITCH_SHIFT;
 		val |= I830_FENCE_REG_VALID;
 	} else
@@ -426,11 +422,6 @@ i915_gem_object_pin_fence(struct drm_i915_gem_object *obj)
 {
 	if (obj->fence_reg != I915_FENCE_REG_NONE) {
 		struct drm_i915_private *dev_priv = obj->base.dev->dev_private;
-		struct i915_vma *ggtt_vma = i915_gem_obj_to_ggtt(obj);
-
-		WARN_ON(!ggtt_vma ||
-			dev_priv->fence_regs[obj->fence_reg].pin_count >
-			ggtt_vma->pin_count);
 		dev_priv->fence_regs[obj->fence_reg].pin_count++;
 		return true;
 	} else
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 6652df57e5b0..0aadfaee2150 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -3250,14 +3250,10 @@ __i915_gem_vma_create(struct drm_i915_gem_object *obj,
 
 	GEM_BUG_ON(vm->closed);
 
-	if (WARN_ON(i915_is_ggtt(vm) != !!ggtt_view))
-		return ERR_PTR(-EINVAL);
-
 	vma = kmem_cache_zalloc(to_i915(obj->base.dev)->vmas, GFP_KERNEL);
 	if (vma == NULL)
 		return ERR_PTR(-ENOMEM);
 
-	INIT_LIST_HEAD(&vma->obj_link);
 	INIT_LIST_HEAD(&vma->exec_list);
 	for (i = 0; i < ARRAY_SIZE(vma->last_read); i++)
 		init_request_active(&vma->last_read[i], i915_vma_retire);
@@ -3267,55 +3263,69 @@ __i915_gem_vma_create(struct drm_i915_gem_object *obj,
 	vma->size = obj->base.size;
 	vma->is_ggtt = i915_is_ggtt(vm);
 
-	if (i915_is_ggtt(vm)) {
+	if (ggtt_view) {
 		vma->ggtt_view = *ggtt_view;
 		if (ggtt_view->type == I915_GGTT_VIEW_PARTIAL)
 			vma->size = ggtt_view->params.partial.size << PAGE_SHIFT;
 		else if (ggtt_view->type == I915_GGTT_VIEW_ROTATED)
 			vma->size = ggtt_view->params.rotation_info.size;
-	} else
+	}
+
+	if (!vma->is_ggtt)
 		i915_ppgtt_get(i915_vm_to_ppgtt(vm));
 
 	list_add_tail(&vma->obj_link, &obj->vma_list);
-
 	return vma;
 }
 
-struct i915_vma *
-i915_gem_obj_lookup_or_create_vma(struct drm_i915_gem_object *obj,
-				  struct i915_address_space *vm)
+static inline bool vma_matches(struct i915_vma *vma,
+			       struct i915_address_space *vm,
+			       const struct i915_ggtt_view *view)
 {
-	struct i915_vma *vma;
+	if (vma->vm != vm)
+		return false;
 
-	vma = i915_gem_obj_to_vma(obj, vm);
-	if (!vma)
-		vma = __i915_gem_vma_create(obj, vm,
-					    i915_is_ggtt(vm) ? &i915_ggtt_view_normal : NULL);
+	if (!vma->is_ggtt)
+		return true;
 
-	return vma;
+	if (view == NULL)
+		return vma->ggtt_view.type == I915_GGTT_VIEW_NORMAL;
+
+	if (vma->ggtt_view.type != view->type)
+		return false;
+
+	return memcmp(&vma->ggtt_view.params,
+		      &view->params,
+		      sizeof(view->params)) == 0;
 }
 
 struct i915_vma *
-i915_gem_obj_lookup_or_create_ggtt_vma(struct drm_i915_gem_object *obj,
-				       const struct i915_ggtt_view *view)
+i915_gem_obj_to_vma(struct drm_i915_gem_object *obj,
+		    struct i915_address_space *vm,
+		    const struct i915_ggtt_view *view)
 {
-	struct i915_address_space *ggtt = i915_obj_to_ggtt(obj);
 	struct i915_vma *vma;
 
-	if (WARN_ON(!view))
-		return ERR_PTR(-EINVAL);
+	list_for_each_entry_reverse(vma, &obj->vma_list, obj_link)
+		if (vma_matches(vma, vm, view))
+			return vma;
 
-	vma = i915_gem_obj_to_ggtt_view(obj, view);
+	return NULL;
+}
 
-	if (IS_ERR(vma))
-		return vma;
+struct i915_vma *
+i915_gem_obj_lookup_or_create_vma(struct drm_i915_gem_object *obj,
+				  struct i915_address_space *vm,
+				  const struct i915_ggtt_view *view)
+{
+	struct i915_vma *vma;
 
+	vma = i915_gem_obj_to_vma(obj, vm, view);
 	if (!vma)
-		vma = __i915_gem_vma_create(obj, ggtt, view);
+		vma = __i915_gem_vma_create(obj, vm, view);
 
 	GEM_BUG_ON(vma->closed);
 	return vma;
-
 }
 
 static struct scatterlist *
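With the view folded into the lookup, a NULL view is shorthand for the
normal GGTT view, so the two lookups below are expected to resolve to the
same vma (sketch, assuming the object already has a normal GGTT binding):

	struct i915_ggtt_view view = { .type = I915_GGTT_VIEW_NORMAL };

	vma = i915_gem_obj_to_vma(obj, ggtt, NULL);
	WARN_ON(vma != i915_gem_obj_to_vma(obj, ggtt, &view));
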
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index e6f64dcb2e77..7f57dea246d8 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -573,18 +573,4 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev);
 
 int __must_check i915_gem_gtt_prepare_object(struct drm_i915_gem_object *obj);
 void i915_gem_gtt_finish_object(struct drm_i915_gem_object *obj);
-
-static inline bool
-i915_ggtt_view_equal(const struct i915_ggtt_view *a,
-                     const struct i915_ggtt_view *b)
-{
-	if (WARN_ON(!a || !b))
-		return false;
-
-	if (a->type != b->type)
-		return false;
-	if (a->type != I915_GGTT_VIEW_NORMAL)
-		return !memcmp(&a->params, &b->params, sizeof(a->params));
-	return true;
-}
 #endif
diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.c b/drivers/gpu/drm/i915/i915_gem_render_state.c
index 830c0d24b11e..89b5c99bbb02 100644
--- a/drivers/gpu/drm/i915/i915_gem_render_state.c
+++ b/drivers/gpu/drm/i915/i915_gem_render_state.c
@@ -31,7 +31,7 @@
 struct render_state {
 	const struct intel_renderstate_rodata *rodata;
 	struct drm_i915_gem_object *obj;
-	u64 ggtt_offset;
+	struct i915_vma *vma;
 	int gen;
 	u32 aux_batch_size;
 	u32 aux_batch_offset;
@@ -56,7 +56,7 @@ render_state_get_rodata(struct drm_device *dev, const int gen)
 
 static int render_state_init(struct render_state *so, struct drm_device *dev)
 {
-	int ret;
+	struct i915_vma *vma;
 
 	so->gen = INTEL_INFO(dev)->gen;
 	so->rodata = render_state_get_rodata(dev, so->gen);
@@ -70,16 +70,14 @@ static int render_state_init(struct render_state *so, struct drm_device *dev)
 	if (so->obj == NULL)
 		return -ENOMEM;
 
-	ret = i915_gem_object_ggtt_pin(so->obj, NULL, 0, 0, 0);
-	if (ret)
-		goto free_gem;
+	vma = i915_gem_object_ggtt_pin(so->obj, NULL, 0, 0, 0);
+	if (IS_ERR(vma)) {
+		drm_gem_object_unreference(&so->obj->base);
+		return PTR_ERR(vma);
+	}
 
-	so->ggtt_offset = i915_gem_obj_ggtt_offset(so->obj);
+	so->vma = vma;
 	return 0;
-
-free_gem:
-	drm_gem_object_unreference(&so->obj->base);
-	return ret;
 }
 
 /*
@@ -119,7 +117,7 @@ static int render_state_setup(struct render_state *so)
 		u32 s = rodata->batch[i];
 
 		if (i * 4  == rodata->reloc[reloc_index]) {
-			u64 r = s + so->ggtt_offset;
+			u64 r = s + so->vma->node.start;
 			s = lower_32_bits(r);
 			if (so->gen >= 8) {
 				if (i + 1 >= rodata->batch_items ||
@@ -174,7 +172,7 @@ err_out:
 
 static void render_state_fini(struct render_state *so)
 {
-	i915_gem_object_ggtt_unpin(so->obj);
+	i915_vma_unpin(so->vma);
 	drm_gem_object_unreference(&so->obj->base);
 }
 
@@ -207,14 +205,14 @@ int i915_gem_render_state_init(struct drm_i915_gem_request *req)
 	struct render_state so;
 	int ret;
 
-	ret = render_state_prepare(req->engine, &so);
+	ret = render_state_prepare(req->engine, memset(&so, 0, sizeof(so)));
 	if (ret)
 		return ret;
 
 	if (so.rodata == NULL)
 		return 0;
 
-	ret = req->engine->emit_bb_start(req, so.ggtt_offset,
+	ret = req->engine->emit_bb_start(req, so.vma->node.start,
 					 so.rodata->batch_items * 4,
 					 I915_DISPATCH_SECURE);
 	if (ret)
@@ -222,7 +220,7 @@ int i915_gem_render_state_init(struct drm_i915_gem_request *req)
 
 	if (so.aux_batch_size > 8) {
 		ret = req->engine->emit_bb_start(req,
-						 (so.ggtt_offset +
+						 (so.vma->node.start +
 						  so.aux_batch_offset),
 						 so.aux_batch_size,
 						 I915_DISPATCH_SECURE);
@@ -230,7 +228,7 @@ int i915_gem_render_state_init(struct drm_i915_gem_request *req)
 			goto out;
 	}
 
-	i915_vma_move_to_active(i915_gem_obj_to_ggtt(so.obj), req, 0);
+	i915_vma_move_to_active(so.vma, req, 0);
 out:
 	render_state_fini(&so);
 	return ret;
diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.h b/drivers/gpu/drm/i915/i915_gem_render_state.h
index c44fca8599bb..18cce3f06e9c 100644
--- a/drivers/gpu/drm/i915/i915_gem_render_state.h
+++ b/drivers/gpu/drm/i915/i915_gem_render_state.h
@@ -24,7 +24,7 @@
 #ifndef _I915_GEM_RENDER_STATE_H_
 #define _I915_GEM_RENDER_STATE_H_
 
-#include <linux/types.h>
+struct drm_i915_gem_request;
 
 int i915_gem_render_state_init(struct drm_i915_gem_request *req);
 
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 1886048f0acd..4ebe4b7e02d0 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -420,15 +420,10 @@ static void i915_gem_mark_busy(struct drm_i915_private *dev_priv)
  */
 void __i915_add_request(struct drm_i915_gem_request *request, bool flush_caches)
 {
-	struct intel_ring *ring;
+	struct intel_ring *ring = request->ring;
 	u32 request_start;
 	int ret;
 
-	if (WARN_ON(request == NULL))
-		return;
-
-	ring = request->ring;
-
 	/*
 	 * To ensure that this call will not fail, space for its emissions
 	 * should already have been reserved in the ring buffer. Let the ring
diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index 4b38cd731124..2294234b4bf5 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -84,7 +84,7 @@ struct drm_i915_gem_request {
 
 	/** Batch buffer related to this request if any (used for
 	    error state dump only) */
-	struct drm_i915_gem_object *batch_obj;
+	struct i915_vma *batch;
 	struct list_head active_list;
 
 	/** Time at which this request was emitted, in jiffies. */
diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_gem_stolen.c
index c110563823bd..401fa603b3e3 100644
--- a/drivers/gpu/drm/i915/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
@@ -670,7 +670,7 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_device *dev,
 	if (gtt_offset == I915_GTT_OFFSET_NONE)
 		return obj;
 
-	vma = i915_gem_obj_lookup_or_create_vma(obj, ggtt);
+	vma = i915_gem_obj_lookup_or_create_vma(obj, ggtt, NULL);
 	if (IS_ERR(vma)) {
 		ret = PTR_ERR(vma);
 		goto err;
diff --git a/drivers/gpu/drm/i915/i915_gem_tiling.c b/drivers/gpu/drm/i915/i915_gem_tiling.c
index 387246f19ce2..f83cb4329c8d 100644
--- a/drivers/gpu/drm/i915/i915_gem_tiling.c
+++ b/drivers/gpu/drm/i915/i915_gem_tiling.c
@@ -114,33 +114,44 @@ i915_tiling_ok(struct drm_device *dev, int stride, int size, int tiling_mode)
 }
 
 /* Is the current GTT allocation valid for the change in tiling? */
-static bool
+static int
 i915_gem_object_fence_ok(struct drm_i915_gem_object *obj, int tiling_mode)
 {
+	struct i915_vma *vma;
 	u32 size;
 
 	if (tiling_mode == I915_TILING_NONE)
-		return true;
+		return 0;
 
 	if (INTEL_INFO(obj->base.dev)->gen >= 4)
-		return true;
+		return 0;
+
+	vma = i915_gem_object_to_ggtt(obj, NULL);
+	if (vma == NULL)
+		return 0;
+
+	if (!obj->map_and_fenceable)
+		return 0;
 
 	if (INTEL_INFO(obj->base.dev)->gen == 3) {
-		if (i915_gem_obj_ggtt_offset(obj) & ~I915_FENCE_START_MASK)
-			return false;
+		if (vma->node.start & ~I915_FENCE_START_MASK)
+			goto bad;
 	} else {
-		if (i915_gem_obj_ggtt_offset(obj) & ~I830_FENCE_START_MASK)
-			return false;
+		if (vma->node.start & ~I830_FENCE_START_MASK)
+			goto bad;
 	}
 
 	size = i915_gem_get_gtt_size(obj->base.dev, obj->base.size, tiling_mode);
-	if (i915_gem_obj_ggtt_size(obj) != size)
-		return false;
+	if (vma->node.size < size)
+		goto bad;
 
-	if (i915_gem_obj_ggtt_offset(obj) & (size - 1))
-		return false;
+	if (vma->node.start & (size - 1))
+		goto bad;
 
-	return true;
+	return 0;
+
+bad:
+	return i915_vma_unbind(vma);
 }
 
 /**
@@ -227,10 +238,7 @@ i915_gem_set_tiling(struct drm_device *dev, void *data,
 		 * has to also include the unfenced register the GPU uses
 		 * whilst executing a fenced command for an untiled object.
 		 */
-		if (obj->map_and_fenceable &&
-		    !i915_gem_object_fence_ok(obj, args->tiling_mode))
-			ret = i915_gem_object_ggtt_unbind(obj);
-
+		ret = i915_gem_object_fence_ok(obj, args->tiling_mode);
 		if (ret == 0) {
 			if (obj->pages &&
 			    obj->madv == I915_MADV_WILLNEED &&
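i915_gem_object_fence_ok() now performs the unbind itself: it returns 0
when the current placement is compatible (or there is nothing bound to
fix up) and only reports an error if i915_vma_unbind() fails, so the
caller reduces to (sketch):

	ret = i915_gem_object_fence_ok(obj, args->tiling_mode);
	if (ret == 0) {
		/* safe to commit the new tiling mode and stride */
	}
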
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 9a18fc502145..7fe9281bf37e 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -593,17 +593,20 @@ static void i915_error_state_free(struct kref *error_ref)
 
 static struct drm_i915_error_object *
 i915_error_object_create(struct drm_i915_private *dev_priv,
-			 struct drm_i915_gem_object *src,
-			 struct i915_address_space *vm)
+			 struct i915_vma *vma)
 {
+	struct drm_i915_gem_object *src;
 	struct drm_i915_error_object *dst;
-	struct i915_vma *vma = NULL;
 	int num_pages;
 	bool use_ggtt;
 	int i = 0;
 	u64 reloc_offset;
 
-	if (src == NULL || src->pages == NULL)
+	if (vma == NULL)
+		return NULL;
+
+	src = vma->obj;
+	if (src->pages == NULL)
 		return NULL;
 
 	num_pages = src->base.size >> PAGE_SHIFT;
@@ -612,26 +615,19 @@ i915_error_object_create(struct drm_i915_private *dev_priv,
 	if (dst == NULL)
 		return NULL;
 
-	if (i915_gem_obj_bound(src, vm))
-		dst->gtt_offset = i915_gem_obj_offset(src, vm);
-	else
-		dst->gtt_offset = -1;
-
-	reloc_offset = dst->gtt_offset;
-	if (i915_is_ggtt(vm))
-		vma = i915_gem_obj_to_ggtt(src);
+	reloc_offset = dst->gtt_offset = vma->node.start;
 	use_ggtt = (src->cache_level == I915_CACHE_NONE &&
-		   vma && (vma->bound & GLOBAL_BIND) &&
+		   (vma->bound & GLOBAL_BIND) &&
 		   reloc_offset + num_pages * PAGE_SIZE <= dev_priv->gtt.mappable_end);
 
 	/* Cannot access stolen address directly, try to use the aperture */
 	if (src->stolen) {
 		use_ggtt = true;
 
-		if (!(vma && vma->bound & GLOBAL_BIND))
+		if (!(vma->bound & GLOBAL_BIND))
 			goto unwind;
 
-		reloc_offset = i915_gem_obj_ggtt_offset(src);
+		reloc_offset = vma->node.start;
 		if (reloc_offset + num_pages * PAGE_SIZE > dev_priv->gtt.mappable_end)
 			goto unwind;
 	}
@@ -690,8 +686,6 @@ unwind:
 	kfree(dst);
 	return NULL;
 }
-#define i915_error_ggtt_object_create(dev_priv, src) \
-	i915_error_object_create((dev_priv), (src), &(dev_priv)->gtt.base)
 
 static void capture_bo(struct drm_i915_error_buffer *err,
 		       struct i915_vma *vma)
@@ -798,10 +792,10 @@ static void gen8_record_semaphore_state(struct drm_i915_private *dev_priv,
 	if (!i915.semaphores)
 		return;
 
-	if (!error->semaphore_obj)
+	if (!error->semaphore_obj && dev_priv->semaphore_vma)
 		error->semaphore_obj =
-			i915_error_ggtt_object_create(dev_priv,
-						      dev_priv->semaphore_obj);
+			i915_error_object_create(dev_priv,
+						 dev_priv->semaphore_vma);
 
 	for_each_ring(to, dev_priv, i) {
 		int idx;
@@ -949,9 +943,7 @@ static void i915_gem_record_active_context(struct intel_engine_cs *ring,
 
 	list_for_each_entry(vma, &dev_priv->gtt.base.active_list, vm_link) {
 		if ((error->ccid & PAGE_MASK) == vma->node.start) {
-			ering->ctx = i915_error_object_create(dev_priv,
-							      vma->obj,
-							      vma->vm);
+			ering->ctx = i915_error_object_create(dev_priv, vma);
 			break;
 		}
 	}
@@ -992,13 +984,12 @@ static void i915_gem_record_rings(struct drm_device *dev,
 			 */
 			error->ring[i].batchbuffer =
 				i915_error_object_create(dev_priv,
-							 request->batch_obj,
-							 vm);
+							 request->batch);
 
 			if (HAS_BROKEN_CS_TLB(dev_priv))
 				error->ring[i].wa_batchbuffer =
-					i915_error_ggtt_object_create(dev_priv,
-								      engine->scratch.obj);
+					i915_error_object_create(dev_priv,
+								 engine->scratch.vma);
 
 			if (request->pid) {
 				struct task_struct *task;
@@ -1018,13 +1009,12 @@ static void i915_gem_record_rings(struct drm_device *dev,
 			error->ring[i].cpu_ring_head = ring->head;
 			error->ring[i].cpu_ring_tail = ring->tail;
 			error->ring[i].ringbuffer =
-				i915_error_ggtt_object_create(dev_priv,
-							      ring->obj);
+				i915_error_object_create(dev_priv, ring->vma);
 		}
 
 		error->ring[i].hws_page =
-			i915_error_ggtt_object_create(dev_priv,
-						      engine->status_page.obj);
+			i915_error_object_create(dev_priv,
+						 engine->status_page.vma);
 
 		i915_gem_record_active_context(engine, error, &error->ring[i]);
 
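Since i915_error_object_create() now takes the vma and bails out early on
a NULL vma, the capture sites can be called unconditionally; for example
(sketch):

	/* request->batch may be NULL; the helper simply returns NULL */
	error->ring[i].batchbuffer =
		i915_error_object_create(dev_priv, request->batch);
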
diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index baa5c34757ba..d6df94129796 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -392,7 +392,6 @@ static void guc_init_ctx_desc(struct intel_guc *guc,
 		struct guc_execlist_context *lrc = &desc.lrc[i];
 		struct intel_ring *ring = ctx->engine[i].ring;
 		struct intel_engine_cs *engine;
-		struct drm_i915_gem_object *obj;
 
 		/* TODO: We have a design issue to be solved here. Only when we
 		 * receive the first batch, we know which engine is used by the
@@ -401,23 +400,20 @@ static void guc_init_ctx_desc(struct intel_guc *guc,
 		 * for now who owns a GuC client. But for future owner of GuC
 		 * client, need to make sure lrc is pinned prior to enter here.
 		 */
-		obj = ctx->engine[i].state;
-		if (!obj)
+		if (ctx->engine[i].state == NULL)
 			break;	/* XXX: continue? */
 
 		engine = ring->engine;
 		lrc->context_desc = engine->execlist_context_descriptor;
 
 		/* The state page is after PPHWSP */
-		lrc->ring_lcra = i915_gem_obj_ggtt_offset(obj) +
+		lrc->ring_lcra = ctx->engine[i].vma->node.start +
 				LRC_STATE_PN * PAGE_SIZE;
 		lrc->context_id = (client->ctx_index << GUC_ELC_CTXID_OFFSET) |
 				(engine->id << GUC_ELC_ENGINE_OFFSET);
 
-		obj = ring->obj;
-
-		lrc->ring_begin = i915_gem_obj_ggtt_offset(obj);
-		lrc->ring_end = lrc->ring_begin + obj->base.size - 1;
+		lrc->ring_begin = ring->vma->node.start;
+		lrc->ring_end = lrc->ring_begin + ring->size - 1;
 		lrc->ring_next_free_location = lrc->ring_begin;
 		lrc->ring_current_tail_pointer_value = 0;
 
@@ -496,7 +492,7 @@ int i915_guc_wq_check_space(struct i915_guc_client *gc)
 
 		if (timeout_counter)
 			usleep_range(1000, 2000);
-	};
+	}
 
 	kunmap_atomic(base);
 
@@ -611,25 +607,25 @@ int i915_guc_submit(struct i915_guc_client *client,
  */
 static struct i915_vma *guc_allocate_vma(struct drm_device *dev, u32 size)
 {
-	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct drm_i915_private *dev_priv = to_i915(dev);
 	struct drm_i915_gem_object *obj;
-	int ret;
+	struct i915_vma *vma;
 
 	obj = i915_gem_alloc_object(dev, size);
 	if (!obj)
 		return ERR_PTR(-ENOMEM);
 
-	ret = i915_gem_object_ggtt_pin(obj, NULL, 0, PAGE_SIZE,
+	vma = i915_gem_object_ggtt_pin(obj, NULL, 0, PAGE_SIZE,
 				       PIN_OFFSET_BIAS | GUC_WOPCM_TOP);
-	if (ret) {
+	if (IS_ERR(vma)) {
 		drm_gem_object_unreference(&obj->base);
-		return ERR_PTR(ret);
+		return vma;
 	}
 
 	/* Invalidate GuC TLB to let GuC take the latest updates to GTT. */
 	I915_WRITE(GEN8_GTCR, GEN8_GTCR_INVALIDATE);
 
-	return i915_gem_obj_to_ggtt(obj);
+	return vma;
 }
 
 /**
@@ -991,7 +987,7 @@ int intel_guc_suspend(struct drm_device *dev)
 	/* any value greater than GUC_POWER_D0 */
 	data[1] = GUC_POWER_D1;
 	/* first page is shared data with GuC */
-	data[2] = i915_gem_obj_ggtt_offset(ctx->engine[RCS].state);
+	data[2] = ctx->engine[RCS].vma->node.start;
 
 	return host2guc_action(guc, data, ARRAY_SIZE(data));
 }
@@ -1016,7 +1012,7 @@ int intel_guc_resume(struct drm_device *dev)
 	data[0] = HOST2GUC_ACTION_EXIT_S_STATE;
 	data[1] = GUC_POWER_D0;
 	/* first page is shared data with GuC */
-	data[2] = i915_gem_obj_ggtt_offset(ctx->engine[RCS].state);
+	data[2] = ctx->engine[RCS].vma->node.start;
 
 	return host2guc_action(guc, data, ARRAY_SIZE(data));
 }
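As guc_allocate_vma() hands back the pinned vma, its users can take the
GGTT address straight from it. A sketch of the intended usage (the
gfx_addr variable is illustrative):

	vma = guc_allocate_vma(dev, PAGE_SIZE);
	if (IS_ERR(vma))
		return PTR_ERR(vma);
	gfx_addr = vma->node.start; /* offset as seen by the GuC */
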
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index e8f957785a64..313f1fb144b9 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -2322,7 +2322,7 @@ static unsigned int intel_linear_alignment(struct drm_i915_private *dev_priv)
 		return 0;
 }
 
-int
+struct i915_vma *
 intel_pin_and_fence_fb_obj(struct drm_plane *plane,
 			   struct drm_framebuffer *fb,
 			   const struct drm_plane_state *plane_state)
@@ -2331,6 +2331,7 @@ intel_pin_and_fence_fb_obj(struct drm_plane *plane,
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_i915_gem_object *obj = intel_fb_obj(fb);
 	struct i915_ggtt_view view;
+	struct i915_vma *vma;
 	u32 alignment;
 	int ret;
 
@@ -2352,12 +2353,12 @@ intel_pin_and_fence_fb_obj(struct drm_plane *plane,
 	case I915_FORMAT_MOD_Yf_TILED:
 		if (WARN_ONCE(INTEL_INFO(dev)->gen < 9,
 			  "Y tiling bo slipped through, driver bug!\n"))
-			return -EINVAL;
+			return ERR_PTR(-ENODEV);
 		alignment = 1 * 1024 * 1024;
 		break;
 	default:
 		MISSING_CASE(fb->modifier[0]);
-		return -EINVAL;
+		return ERR_PTR(-ENODEV);
 	}
 
 	intel_fill_fb_ggtt_view(&view, fb, plane_state);
@@ -2379,10 +2380,11 @@ intel_pin_and_fence_fb_obj(struct drm_plane *plane,
 	 */
 	intel_runtime_pm_get(dev_priv);
 
-	ret = i915_gem_object_pin_to_display_plane(obj, alignment,
-						   &view);
-	if (ret)
+	vma = i915_gem_object_pin_to_display_plane(obj, alignment, &view);
+	if (IS_ERR(vma)) {
+		ret = PTR_ERR(vma);
 		goto err_pm;
+	}
 
 	/* Install a fence for tiled scan-out. Pre-i965 always needs a
 	 * fence, whereas 965+ only requires a fence if using
@@ -2409,29 +2411,31 @@ intel_pin_and_fence_fb_obj(struct drm_plane *plane,
 	}
 
 	intel_runtime_pm_put(dev_priv);
-	return 0;
+	return vma;
 
 err_unpin:
-	i915_gem_object_unpin_from_display_plane(obj, &view);
+	i915_gem_object_unpin_from_display_plane(vma);
 err_pm:
 	intel_runtime_pm_put(dev_priv);
-	return ret;
+	return ERR_PTR(ret);
 }
 
 static void intel_unpin_fb_obj(struct drm_framebuffer *fb,
-			       const struct drm_plane_state *plane_state)
+			       const struct drm_plane_state *state)
 {
 	struct drm_i915_gem_object *obj = intel_fb_obj(fb);
 	struct i915_ggtt_view view;
+	struct i915_vma *vma;
 
 	WARN_ON(!mutex_is_locked(&obj->base.dev->struct_mutex));
 
-	intel_fill_fb_ggtt_view(&view, fb, plane_state);
+	intel_fill_fb_ggtt_view(&view, fb, state);
 
 	if (view.type == I915_GGTT_VIEW_NORMAL)
 		i915_gem_object_unpin_fence(obj);
 
-	i915_gem_object_unpin_from_display_plane(obj, &view);
+	vma = i915_gem_object_to_ggtt(obj, &view);
+	i915_gem_object_unpin_from_display_plane(vma);
 }
 
 /* Computes the linear offset to the base tile and adjusts x, y. bytes per pixel
@@ -2628,7 +2632,7 @@ intel_find_initial_plane_obj(struct intel_crtc *intel_crtc,
 			continue;
 
 		obj = intel_fb_obj(fb);
-		if (i915_gem_obj_ggtt_offset(obj) == plane_config->base) {
+		if (i915_gem_object_ggtt_offset(obj, NULL) == plane_config->base) {
 			drm_framebuffer_reference(fb);
 			goto valid_fb;
 		}
@@ -2788,11 +2792,11 @@ static void i9xx_update_primary_plane(struct drm_plane *primary,
 	I915_WRITE(DSPSTRIDE(plane), fb->pitches[0]);
 	if (INTEL_INFO(dev)->gen >= 4) {
 		I915_WRITE(DSPSURF(plane),
-			   i915_gem_obj_ggtt_offset(obj) + intel_crtc->dspaddr_offset);
+			   i915_gem_object_ggtt_offset(obj, NULL) + intel_crtc->dspaddr_offset);
 		I915_WRITE(DSPTILEOFF(plane), (y << 16) | x);
 		I915_WRITE(DSPLINOFF(plane), linear_offset);
 	} else
-		I915_WRITE(DSPADDR(plane), i915_gem_obj_ggtt_offset(obj) + linear_offset);
+		I915_WRITE(DSPADDR(plane), i915_gem_object_ggtt_offset(obj, NULL) + linear_offset);
 	POSTING_READ(reg);
 }
 
@@ -2893,7 +2897,7 @@ static void ironlake_update_primary_plane(struct drm_plane *primary,
 
 	I915_WRITE(DSPSTRIDE(plane), fb->pitches[0]);
 	I915_WRITE(DSPSURF(plane),
-		   i915_gem_obj_ggtt_offset(obj) + intel_crtc->dspaddr_offset);
+		   i915_gem_object_ggtt_offset(obj, NULL) + intel_crtc->dspaddr_offset);
 	if (IS_HASWELL(dev) || IS_BROADWELL(dev)) {
 		I915_WRITE(DSPOFFSET(plane), (y << 16) | x);
 	} else {
@@ -2948,7 +2952,7 @@ u32 intel_plane_obj_offset(struct intel_plane *intel_plane,
 	intel_fill_fb_ggtt_view(&view, intel_plane->base.fb,
 				intel_plane->base.state);
 
-	vma = i915_gem_obj_to_ggtt_view(obj, &view);
+	vma = i915_gem_object_to_ggtt(obj, &view);
 	if (WARN(!vma, "ggtt vma for display object not found! (view=%u)\n",
 		view.type))
 		return -1;
@@ -11562,6 +11566,7 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
 	struct intel_engine_cs *ring;
 	bool mmio_flip;
 	struct drm_i915_gem_request *request = NULL;
+	struct i915_vma *vma;
 	int ret;
 
 	/*
@@ -11683,13 +11688,14 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
 			goto cleanup_request;
 	}
 
-	ret = intel_pin_and_fence_fb_obj(crtc->primary, fb,
+	vma = intel_pin_and_fence_fb_obj(crtc->primary, fb,
 					 crtc->primary->state);
-	if (ret)
+	if (IS_ERR(vma)) {
+		ret = PTR_ERR(vma);
 		goto cleanup_request;
+	}
 
-	work->gtt_offset = intel_plane_obj_offset(to_intel_plane(primary),
-						  obj, 0);
+	work->gtt_offset = vma->node.start;
 	work->gtt_offset += intel_crtc->dspaddr_offset;
 
 	if (mmio_flip) {
@@ -13889,7 +13895,12 @@ intel_prepare_plane_fb(struct drm_plane *plane,
 		if (ret)
 			DRM_DEBUG_KMS("failed to attach phys object\n");
 	} else {
-		ret = intel_pin_and_fence_fb_obj(plane, fb, new_state);
+		struct i915_vma *vma;
+
+		vma = intel_pin_and_fence_fb_obj(plane, fb, new_state);
+
+		if (IS_ERR(vma))
+			ret = PTR_ERR(vma);
 	}
 
 	if (ret == 0) {
@@ -14229,7 +14240,7 @@ intel_update_cursor_plane(struct drm_plane *plane,
 	if (!obj)
 		addr = 0;
 	else if (!INTEL_INFO(dev)->cursor_needs_physical)
-		addr = i915_gem_obj_ggtt_offset(obj);
+		addr = i915_gem_object_ggtt_offset(obj, NULL);
 	else
 		addr = obj->phys_handle->busaddr;
 
@@ -16019,7 +16030,6 @@ void intel_modeset_gem_init(struct drm_device *dev)
 {
 	struct drm_crtc *c;
 	struct drm_i915_gem_object *obj;
-	int ret;
 
 	mutex_lock(&dev->struct_mutex);
 	intel_init_gt_powersave(dev);
@@ -16035,16 +16045,18 @@ void intel_modeset_gem_init(struct drm_device *dev)
 	 * for this.
 	 */
 	for_each_crtc(dev, c) {
+		struct i915_vma *vma;
+
 		obj = intel_fb_obj(c->primary->fb);
 		if (obj == NULL)
 			continue;
 
 		mutex_lock(&dev->struct_mutex);
-		ret = intel_pin_and_fence_fb_obj(c->primary,
+		vma = intel_pin_and_fence_fb_obj(c->primary,
 						 c->primary->fb,
 						 c->primary->state);
 		mutex_unlock(&dev->struct_mutex);
-		if (ret) {
+		if (IS_ERR(vma)) {
 			DRM_ERROR("failed to pin boot fb on pipe %d\n",
 				  to_intel_crtc(c)->pipe);
 			drm_framebuffer_unreference(c->primary->fb);
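intel_pin_and_fence_fb_obj() now returns the pinned vma and the release
goes through the vma as well, so a paired pin/unpin looks roughly like
this (sketch, error paths trimmed):

	vma = intel_pin_and_fence_fb_obj(plane, fb, plane_state);
	if (IS_ERR(vma))
		return PTR_ERR(vma);

	work->gtt_offset = vma->node.start + intel_crtc->dspaddr_offset;
	/* ... */
	i915_gem_object_unpin_from_display_plane(vma);
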
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index 41e2e1c4d052..d33aebd2ed4e 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -123,6 +123,7 @@ struct intel_framebuffer {
 struct intel_fbdev {
 	struct drm_fb_helper helper;
 	struct intel_framebuffer *fb;
+	struct i915_vma *vma;
 	int preferred_bpp;
 };
 
@@ -1149,9 +1150,10 @@ bool intel_get_load_detect_pipe(struct drm_connector *connector,
 void intel_release_load_detect_pipe(struct drm_connector *connector,
 				    struct intel_load_detect_pipe *old,
 				    struct drm_modeset_acquire_ctx *ctx);
-int intel_pin_and_fence_fb_obj(struct drm_plane *plane,
-			       struct drm_framebuffer *fb,
-			       const struct drm_plane_state *plane_state);
+struct i915_vma *
+intel_pin_and_fence_fb_obj(struct drm_plane *plane,
+			   struct drm_framebuffer *fb,
+			   const struct drm_plane_state *plane_state);
 struct drm_framebuffer *
 __intel_framebuffer_create(struct drm_device *dev,
 			   struct drm_mode_fb_cmd2 *mode_cmd,
diff --git a/drivers/gpu/drm/i915/intel_fbc.c b/drivers/gpu/drm/i915/intel_fbc.c
index a1988a486b92..8d8f1ce7f1ae 100644
--- a/drivers/gpu/drm/i915/intel_fbc.c
+++ b/drivers/gpu/drm/i915/intel_fbc.c
@@ -263,7 +263,7 @@ static void ilk_fbc_activate(struct intel_crtc *crtc)
 
 	y_offset = get_crtc_fence_y_offset(crtc);
 	I915_WRITE(ILK_DPFC_FENCE_YOFF, y_offset);
-	I915_WRITE(ILK_FBC_RT_BASE, i915_gem_obj_ggtt_offset(obj) | ILK_FBC_RT_VALID);
+	I915_WRITE(ILK_FBC_RT_BASE, i915_gem_object_ggtt_offset(obj, NULL) | ILK_FBC_RT_VALID);
 	/* enable it... */
 	I915_WRITE(ILK_DPFC_CONTROL, dpfc_ctl | DPFC_CTL_EN);
 
diff --git a/drivers/gpu/drm/i915/intel_fbdev.c b/drivers/gpu/drm/i915/intel_fbdev.c
index 09840f4380f9..7decbca25dbb 100644
--- a/drivers/gpu/drm/i915/intel_fbdev.c
+++ b/drivers/gpu/drm/i915/intel_fbdev.c
@@ -184,9 +184,9 @@ static int intelfb_create(struct drm_fb_helper *helper,
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct fb_info *info;
 	struct drm_framebuffer *fb;
-	struct drm_i915_gem_object *obj;
-	int size, ret;
+	struct i915_vma *vma;
 	bool prealloc = false;
+	int ret;
 
 	if (intel_fb &&
 	    (sizes->fb_width > intel_fb->base.width ||
@@ -211,18 +211,17 @@ static int intelfb_create(struct drm_fb_helper *helper,
 		sizes->fb_height = intel_fb->base.height;
 	}
 
-	obj = intel_fb->obj;
-	size = obj->base.size;
-
 	mutex_lock(&dev->struct_mutex);
 
 	/* Pin the GGTT vma for our access via info->screen_base.
 	 * This also validates that any existing fb inherited from the
 	 * BIOS is suitable for own access.
 	 */
-	ret = intel_pin_and_fence_fb_obj(NULL, &ifbdev->fb->base, NULL);
-	if (ret)
+	vma = intel_pin_and_fence_fb_obj(NULL, &ifbdev->fb->base, NULL);
+	if (IS_ERR(vma)) {
+		ret = PTR_ERR(vma);
 		goto out_unlock;
+	}
 
 	info = drm_fb_helper_alloc_fbi(helper);
 	if (IS_ERR(info)) {
@@ -246,18 +245,19 @@ static int intelfb_create(struct drm_fb_helper *helper,
 	info->apertures->ranges[0].base = dev->mode_config.fb_base;
 	info->apertures->ranges[0].size = dev_priv->gtt.mappable_end;
 
-	info->fix.smem_start = dev->mode_config.fb_base + i915_gem_obj_ggtt_offset(obj);
-	info->fix.smem_len = size;
+	info->fix.smem_start = dev->mode_config.fb_base + vma->node.start;
+	info->fix.smem_len = vma->node.size;
 
 	info->screen_base =
-		ioremap_wc(dev_priv->gtt.mappable_base + i915_gem_obj_ggtt_offset(obj),
-			   size);
+		ioremap_wc(dev_priv->gtt.mappable_base + vma->node.start,
+			   vma->node.size);
 	if (!info->screen_base) {
 		DRM_ERROR("Failed to remap framebuffer into virtual memory\n");
 		ret = -ENOSPC;
 		goto out_destroy_fbi;
 	}
-	info->screen_size = size;
+	info->screen_size = vma->node.size;
+	ifbdev->vma = vma;
 
 	/* This driver doesn't need a VT switch to restore the mode on resume */
 	info->skip_vt_switch = true;
@@ -269,14 +269,13 @@ static int intelfb_create(struct drm_fb_helper *helper,
 	 * If the object is stolen however, it will be full of whatever
 	 * garbage was left in there.
 	 */
-	if (ifbdev->fb->obj->stolen && !prealloc)
+	if (intel_fb->obj->stolen && !prealloc)
 		memset_io(info->screen_base, 0, info->screen_size);
 
 	/* Use default scratch pixmap (info->pixmap.flags = FB_PIXMAP_SYSTEM) */
 
-	DRM_DEBUG_KMS("allocated %dx%d fb: 0x%08llx, bo %p\n",
-		      fb->width, fb->height,
-		      i915_gem_obj_ggtt_offset(obj), obj);
+	DRM_DEBUG_KMS("allocated %dx%d fb: 0x%08llx\n",
+		      fb->width, fb->height, vma->node.start);
 
 	mutex_unlock(&dev->struct_mutex);
 	vga_switcheroo_client_fb_set(dev->pdev, info);
@@ -285,7 +284,8 @@ static int intelfb_create(struct drm_fb_helper *helper,
 out_destroy_fbi:
 	drm_fb_helper_release_fbi(helper);
 out_unpin:
-	i915_gem_object_ggtt_unpin(obj);
+	i915_gem_object_unpin_fence(vma->obj);
+	i915_gem_object_unpin_from_display_plane(vma);
 out_unlock:
 	mutex_unlock(&dev->struct_mutex);
 	return ret;
@@ -524,10 +524,10 @@ static const struct drm_fb_helper_funcs intel_fb_helper_funcs = {
 static void intel_fbdev_destroy(struct drm_device *dev,
 				struct intel_fbdev *ifbdev)
 {
-	/* We rely on the object-free to release the VMA pinning for
-	 * the info->screen_base mmaping. Leaking the VMA is simpler than
-	 * trying to rectify all the possible error paths leading here.
-	 */
+	if (ifbdev->vma) {
+		i915_gem_object_unpin_fence(ifbdev->vma->obj);
+		i915_gem_object_unpin_from_display_plane(ifbdev->vma);
+	}
 
 	drm_fb_helper_unregister_fbi(&ifbdev->helper);
 	drm_fb_helper_release_fbi(&ifbdev->helper);
diff --git a/drivers/gpu/drm/i915/intel_guc_loader.c b/drivers/gpu/drm/i915/intel_guc_loader.c
index b447cfd58361..d1f3d0582d00 100644
--- a/drivers/gpu/drm/i915/intel_guc_loader.c
+++ b/drivers/gpu/drm/i915/intel_guc_loader.c
@@ -221,12 +221,12 @@ static inline bool guc_ucode_response(struct drm_i915_private *dev_priv,
  * Note that GuC needs the CSS header plus uKernel code to be copied by the
  * DMA engine in one operation, whereas the RSA signature is loaded via MMIO.
  */
-static int guc_ucode_xfer_dma(struct drm_i915_private *dev_priv)
+static int guc_ucode_xfer_dma(struct drm_i915_private *dev_priv,
+			      struct i915_vma *vma)
 {
 	struct intel_guc_fw *guc_fw = &dev_priv->guc.guc_fw;
-	struct drm_i915_gem_object *fw_obj = guc_fw->guc_fw_obj;
 	unsigned long offset;
-	struct sg_table *sg = fw_obj->pages;
+	struct sg_table *sg = vma->obj->pages;
 	u32 status, rsa[UOS_RSA_SCRATCH_MAX_COUNT];
 	int i, ret = 0;
 
@@ -243,7 +243,7 @@ static int guc_ucode_xfer_dma(struct drm_i915_private *dev_priv)
 	I915_WRITE(DMA_COPY_SIZE, guc_fw->header_size + guc_fw->ucode_size);
 
 	/* Set the source address for the new blob */
-	offset = i915_gem_obj_ggtt_offset(fw_obj) + guc_fw->header_offset;
+	offset = vma->node.start + guc_fw->header_offset;
 	I915_WRITE(DMA_ADDR_0_LOW, lower_32_bits(offset));
 	I915_WRITE(DMA_ADDR_0_HIGH, upper_32_bits(offset) & 0xFFFF);
 
@@ -287,6 +287,7 @@ static int guc_ucode_xfer(struct drm_i915_private *dev_priv)
 {
 	struct intel_guc_fw *guc_fw = &dev_priv->guc.guc_fw;
 	struct drm_device *dev = dev_priv->dev;
+	struct i915_vma *vma;
 	int ret;
 
 	ret = i915_gem_object_set_to_gtt_domain(guc_fw->guc_fw_obj, false);
@@ -295,10 +296,10 @@ static int guc_ucode_xfer(struct drm_i915_private *dev_priv)
 		return ret;
 	}
 
-	ret = i915_gem_object_ggtt_pin(guc_fw->guc_fw_obj, NULL, 0, 0, 0);
-	if (ret) {
-		DRM_DEBUG_DRIVER("pin failed %d\n", ret);
-		return ret;
+	vma = i915_gem_object_ggtt_pin(guc_fw->guc_fw_obj, NULL, 0, 0, 0);
+	if (IS_ERR(vma)) {
+		DRM_DEBUG_DRIVER("pin failed %d\n", (int)PTR_ERR(vma));
+		return PTR_ERR(vma);
 	}
 
 	/* Invalidate GuC TLB to let GuC take the latest updates to GTT. */
@@ -339,7 +340,7 @@ static int guc_ucode_xfer(struct drm_i915_private *dev_priv)
 
 	set_guc_init_params(dev_priv);
 
-	ret = guc_ucode_xfer_dma(dev_priv);
+	ret = guc_ucode_xfer_dma(dev_priv, vma);
 
 	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
 
@@ -347,7 +348,7 @@ static int guc_ucode_xfer(struct drm_i915_private *dev_priv)
 	 * We keep the object pages for reuse during resume. But we can unpin it
 	 * now that DMA has completed, so it doesn't continue to take up space.
 	 */
-	i915_gem_object_ggtt_unpin(guc_fw->guc_fw_obj);
+	i915_vma_unpin(vma);
 
 	return ret;
 }
@@ -560,9 +561,7 @@ fail:
 	DRM_ERROR("Failed to fetch GuC firmware from %s (error %d)\n",
 		  guc_fw->guc_fw_path, err);
 
-	obj = guc_fw->guc_fw_obj;
-	if (obj)
-		drm_gem_object_unreference(&obj->base);
+	drm_gem_object_unreference_unlocked(&guc_fw->guc_fw_obj->base);
 	guc_fw->guc_fw_obj = NULL;
 
 	release_firmware(fw);		/* OK even if fw is NULL */
@@ -633,11 +632,8 @@ void intel_guc_ucode_fini(struct drm_device *dev)
 	direct_interrupts_to_host(dev_priv);
 	i915_guc_submission_fini(dev);
 
-	mutex_lock(&dev->struct_mutex);
-	if (guc_fw->guc_fw_obj)
-		drm_gem_object_unreference(&guc_fw->guc_fw_obj->base);
+	drm_gem_object_unreference_unlocked(&guc_fw->guc_fw_obj->base);
 	guc_fw->guc_fw_obj = NULL;
-	mutex_unlock(&dev->struct_mutex);
 
 	guc_fw->guc_fw_fetch_status = GUC_FIRMWARE_NONE;
 }
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 206311b55e71..68d06ab6acdc 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -232,7 +232,7 @@ static int execlists_context_deferred_alloc(struct intel_context *ctx,
 static int intel_lr_context_pin(struct intel_context *ctx,
 				struct intel_engine_cs *engine);
 static void lrc_setup_hardware_status_page(struct intel_engine_cs *ring,
-		struct drm_i915_gem_object *default_ctx_obj);
+					   struct i915_vma *vma);
 
 
 /**
@@ -570,41 +570,41 @@ static int intel_lr_context_pin(struct intel_context *ctx,
 				struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *dev_priv = engine->i915;
-	struct drm_i915_gem_object *ctx_obj;
+	struct i915_vma *vma;
 	struct intel_ring *ring;
 	u32 ggtt_offset;
-	int ret = 0;
+	int ret;
 
 	if (ctx->engine[engine->id].pin_count++)
 		return 0;
 
 	lockdep_assert_held(&engine->dev->struct_mutex);
 
-	ctx_obj = ctx->engine[engine->id].state;
-	ret = i915_gem_object_ggtt_pin(ctx_obj, NULL,
+	vma = i915_gem_object_ggtt_pin(ctx->engine[engine->id].state, NULL,
 				       0, GEN8_LR_CONTEXT_ALIGN,
 				       PIN_OFFSET_BIAS | GUC_WOPCM_TOP |
 				       PIN_HIGH);
-	if (ret)
+	if (IS_ERR(vma)) {
+		ret = PTR_ERR(vma);
 		goto err;
+	}
 
 	ring = ctx->engine[engine->id].ring;
 	ret = intel_ring_map(ring);
 	if (ret)
-		goto unpin_ctx_obj;
+		goto unpin;
 
 	i915_gem_context_reference(ctx);
-	ctx_obj->dirty = true;
+	ctx->engine[engine->id].vma = vma;
+	vma->obj->dirty = true;
 
-	ggtt_offset =
-		i915_gem_obj_ggtt_offset(ctx_obj) + LRC_PPHWSP_PN * PAGE_SIZE;
+	ggtt_offset = vma->node.start + LRC_PPHWSP_PN * PAGE_SIZE;
 	ring->context_descriptor =
 		ggtt_offset | engine->execlist_context_descriptor;
 
 	ring->registers =
-		kmap(i915_gem_object_get_dirty_page(ctx_obj, LRC_STATE_PN));
-	ring->registers[CTX_RING_BUFFER_START+1] =
-		i915_gem_obj_ggtt_offset(ring->obj);
+		kmap(i915_gem_object_get_dirty_page(vma->obj, LRC_STATE_PN));
+	ring->registers[CTX_RING_BUFFER_START+1] = ring->vma->node.start;
 
 	/* Invalidate GuC TLB. */
 	if (i915.enable_guc_submission)
@@ -612,8 +612,8 @@ static int intel_lr_context_pin(struct intel_context *ctx,
 
 	return 0;
 
-unpin_ctx_obj:
-	i915_gem_object_ggtt_unpin(ctx_obj);
+unpin:
+	__i915_vma_unpin(vma);
 err:
 	ctx->engine[engine->id].pin_count = 0;
 	return ret;
@@ -622,7 +622,7 @@ err:
 void intel_lr_context_unpin(struct intel_context *ctx,
 			    struct intel_engine_cs *engine)
 {
-	struct drm_i915_gem_object *ctx_obj;
+	struct i915_vma *vma;
 
 	lockdep_assert_held(&engine->dev->struct_mutex);
 	if (--ctx->engine[engine->id].pin_count)
@@ -630,9 +630,9 @@ void intel_lr_context_unpin(struct intel_context *ctx,
 
 	intel_ring_unmap(ctx->engine[engine->id].ring);
 
-	ctx_obj = ctx->engine[engine->id].state;
-	kunmap(i915_gem_object_get_page(ctx_obj, LRC_STATE_PN));
-	i915_gem_object_ggtt_unpin(ctx_obj);
+	vma = ctx->engine[engine->id].vma;
+	kunmap(i915_gem_object_get_page(vma->obj, LRC_STATE_PN));
+	i915_vma_unpin(vma);
 
 	i915_gem_context_unreference(ctx);
 }
@@ -925,43 +925,41 @@ static int gen9_init_perctx_bb(struct intel_engine_cs *ring,
 	return wa_ctx_end(wa_ctx, *offset = index, 1);
 }
 
-static int lrc_setup_wa_ctx_obj(struct intel_engine_cs *ring, u32 size)
+static struct i915_vma *
+lrc_setup_wa_ctx_obj(struct intel_engine_cs *ring, u32 size)
 {
-	int ret;
+	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma;
 
-	ring->wa_ctx.obj = i915_gem_alloc_object(ring->dev, PAGE_ALIGN(size));
-	if (!ring->wa_ctx.obj) {
-		DRM_DEBUG_DRIVER("alloc LRC WA ctx backing obj failed.\n");
-		return -ENOMEM;
-	}
+	obj = i915_gem_alloc_object(ring->dev, PAGE_ALIGN(size));
+	if (!obj)
+		return ERR_PTR(-ENOMEM);
 
-	ret = i915_gem_object_ggtt_pin(ring->wa_ctx.obj, NULL, 0, PAGE_SIZE, 0);
-	if (ret) {
-		DRM_DEBUG_DRIVER("pin LRC WA ctx backing obj failed: %d\n",
-				 ret);
-		drm_gem_object_unreference(&ring->wa_ctx.obj->base);
-		return ret;
+	vma = i915_gem_object_ggtt_pin(obj, NULL, 0, PAGE_SIZE, 0);
+	if (IS_ERR(vma)) {
+		drm_gem_object_unreference(&obj->base);
+		return vma;
 	}
 
-	return 0;
+	return vma;
 }
 
 static void lrc_destroy_wa_ctx_obj(struct intel_engine_cs *ring)
 {
-	if (ring->wa_ctx.obj) {
-		i915_gem_object_ggtt_unpin(ring->wa_ctx.obj);
-		drm_gem_object_unreference(&ring->wa_ctx.obj->base);
-		ring->wa_ctx.obj = NULL;
+	if (ring->wa_ctx.vma) {
+		i915_vma_unpin(ring->wa_ctx.vma);
+		drm_gem_object_unreference(&ring->wa_ctx.vma->obj->base);
+		ring->wa_ctx.vma = NULL;
 	}
 }
 
 static int intel_init_workaround_bb(struct intel_engine_cs *ring)
 {
-	int ret;
+	struct i915_ctx_workarounds *wa_ctx = &ring->wa_ctx;
 	uint32_t *batch;
 	uint32_t offset;
 	struct page *page;
-	struct i915_ctx_workarounds *wa_ctx = &ring->wa_ctx;
+	int ret;
 
 	WARN_ON(ring->id != RCS);
 
@@ -978,15 +976,17 @@ static int intel_init_workaround_bb(struct intel_engine_cs *ring)
 		return -EINVAL;
 	}
 
-	ret = lrc_setup_wa_ctx_obj(ring, PAGE_SIZE);
-	if (ret) {
+	wa_ctx->vma = lrc_setup_wa_ctx_obj(ring, PAGE_SIZE);
+	if (IS_ERR(wa_ctx->vma)) {
+		ret = PTR_ERR(wa_ctx->vma);
 		DRM_DEBUG_DRIVER("Failed to setup context WA page: %d\n", ret);
 		return ret;
 	}
 
-	page = i915_gem_object_get_dirty_page(wa_ctx->obj, 0);
+	page = i915_gem_object_get_dirty_page(wa_ctx->vma->obj, 0);
 	batch = kmap_atomic(page);
 	offset = 0;
+	ret = 0;
 
 	if (INTEL_INFO(ring->dev)->gen == 8) {
 		ret = gen8_init_indirectctx_bb(ring,
@@ -1060,7 +1060,7 @@ static int gen8_init_common_ring(struct intel_engine_cs *ring)
 	struct drm_device *dev = ring->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	lrc_setup_hardware_status_page(ring,
-			dev_priv->kernel_context->engine[ring->id].state);
+			dev_priv->kernel_context->engine[ring->id].vma);
 
 	I915_WRITE_IMR(ring, ~(ring->irq_enable_mask | ring->irq_keep_mask));
 	I915_WRITE(RING_HWSTAM(ring->mmio_base), 0xffffffff);
@@ -1422,9 +1422,9 @@ void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
 
 	intel_engine_fini_breadcrumbs(ring);
 
-	if (ring->status_page.obj) {
-		kunmap(sg_page(ring->status_page.obj->pages->sgl));
-		ring->status_page.obj = NULL;
+	if (ring->status_page.vma) {
+		kunmap(sg_page(ring->status_page.vma->obj->pages->sgl));
+		ring->status_page.vma = NULL;
 	}
 	intel_lr_context_unpin(ring->i915->kernel_context, ring);
 
@@ -1799,9 +1799,9 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
 		ASSIGN_CTX_REG(reg_state, CTX_BB_PER_CTX_PTR, RING_BB_PER_CTX_PTR(ring->mmio_base), 0);
 		ASSIGN_CTX_REG(reg_state, CTX_RCS_INDIRECT_CTX, RING_INDIRECT_CTX(ring->mmio_base), 0);
 		ASSIGN_CTX_REG(reg_state, CTX_RCS_INDIRECT_CTX_OFFSET, RING_INDIRECT_CTX_OFFSET(ring->mmio_base), 0);
-		if (ring->wa_ctx.obj) {
+		if (ring->wa_ctx.vma) {
 			struct i915_ctx_workarounds *wa_ctx = &ring->wa_ctx;
-			uint32_t ggtt_offset = i915_gem_obj_ggtt_offset(wa_ctx->obj);
+			uint32_t ggtt_offset = wa_ctx->vma->node.start;
 
 			reg_state[CTX_RCS_INDIRECT_CTX+1] =
 				(ggtt_offset + wa_ctx->indirect_ctx.offset * sizeof(uint32_t)) |
@@ -1920,17 +1920,17 @@ uint32_t intel_lr_context_size(struct intel_engine_cs *ring)
 }
 
 static void lrc_setup_hardware_status_page(struct intel_engine_cs *ring,
-		struct drm_i915_gem_object *default_ctx_obj)
+					   struct i915_vma *vma)
 {
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
 	struct page *page;
 
 	/* The HWSP is part of the default context object in LRC mode. */
-	ring->status_page.gfx_addr = i915_gem_obj_ggtt_offset(default_ctx_obj)
-			+ LRC_PPHWSP_PN * PAGE_SIZE;
-	page = i915_gem_object_get_page(default_ctx_obj, LRC_PPHWSP_PN);
+	ring->status_page.gfx_addr =
+		vma->node.start + LRC_PPHWSP_PN * PAGE_SIZE;
+	page = i915_gem_object_get_page(vma->obj, LRC_PPHWSP_PN);
 	ring->status_page.page_addr = kmap(page);
-	ring->status_page.obj = default_ctx_obj;
+	ring->status_page.vma = vma;
 
 	I915_WRITE(RING_HWS_PGA(ring->mmio_base),
 			(u32)ring->status_page.gfx_addr);
diff --git a/drivers/gpu/drm/i915/intel_overlay.c b/drivers/gpu/drm/i915/intel_overlay.c
index 414a321b752f..d1401f4c4762 100644
--- a/drivers/gpu/drm/i915/intel_overlay.c
+++ b/drivers/gpu/drm/i915/intel_overlay.c
@@ -170,8 +170,8 @@ struct overlay_registers {
 struct intel_overlay {
 	struct drm_device *dev;
 	struct intel_crtc *crtc;
-	struct drm_i915_gem_object *vid_bo;
-	struct drm_i915_gem_object *old_vid_bo;
+	struct drm_i915_gem_object *vid_bo, *old_vid_bo;
+	struct i915_vma *vid_vma, *old_vid_vma;
 	bool active;
 	bool pfit_active;
 	u32 pfit_vscale_ratio; /* shifted-point number, (1<<12) == 1.0 */
@@ -197,7 +197,7 @@ intel_overlay_map_regs(struct intel_overlay *overlay)
 		regs = (struct overlay_registers __iomem *)overlay->reg_bo->phys_handle->vaddr;
 	else
 		regs = io_mapping_map_wc(dev_priv->gtt.mappable,
-					 i915_gem_obj_ggtt_offset(overlay->reg_bo));
+					 overlay->flip_addr);
 
 	return regs;
 }
@@ -308,7 +308,7 @@ static void intel_overlay_release_old_vid_tail(struct intel_overlay *overlay)
 {
 	struct drm_i915_gem_object *obj = overlay->old_vid_bo;
 
-	i915_gem_object_ggtt_unpin(obj);
+	i915_gem_object_unpin_from_display_plane(overlay->old_vid_vma);
 	drm_gem_object_unreference(&obj->base);
 
 	overlay->old_vid_bo = NULL;
@@ -316,14 +316,13 @@ static void intel_overlay_release_old_vid_tail(struct intel_overlay *overlay)
 
 static void intel_overlay_off_tail(struct intel_overlay *overlay)
 {
-	struct drm_i915_gem_object *obj = overlay->vid_bo;
-
 	/* never have the overlay hw on without showing a frame */
-	if (WARN_ON(!obj))
+	if (WARN_ON(!overlay->vid_vma))
 		return;
 
-	i915_gem_object_ggtt_unpin(obj);
-	drm_gem_object_unreference(&obj->base);
+	i915_gem_object_unpin_from_display_plane(overlay->vid_vma);
+	drm_gem_object_unreference(&overlay->vid_bo->base);
+	overlay->vid_vma = NULL;
 	overlay->vid_bo = NULL;
 
 	overlay->crtc->overlay = NULL;
@@ -741,6 +740,7 @@ static int intel_overlay_do_put_image(struct intel_overlay *overlay,
 	struct drm_device *dev = overlay->dev;
 	u32 swidth, swidthsw, sheight, ostride;
 	enum pipe pipe = overlay->crtc->pipe;
+	struct i915_vma *vma;
 
 	WARN_ON(!mutex_is_locked(&dev->struct_mutex));
 	WARN_ON(!drm_modeset_is_locked(&dev->mode_config.connection_mutex));
@@ -749,10 +749,10 @@ static int intel_overlay_do_put_image(struct intel_overlay *overlay,
 	if (ret != 0)
 		return ret;
 
-	ret = i915_gem_object_pin_to_display_plane(new_bo, 0,
+	vma = i915_gem_object_pin_to_display_plane(new_bo, 0,
 						   &i915_ggtt_view_normal);
-	if (ret != 0)
-		return ret;
+	if (IS_ERR(vma))
+		return PTR_ERR(vma);
 
 	ret = i915_gem_object_put_fence(new_bo);
 	if (ret)
@@ -795,7 +795,7 @@ static int intel_overlay_do_put_image(struct intel_overlay *overlay,
 	swidth = params->src_w;
 	swidthsw = calc_swidthsw(overlay->dev, params->offset_Y, tmp_width);
 	sheight = params->src_h;
-	iowrite32(i915_gem_obj_ggtt_offset(new_bo) + params->offset_Y, &regs->OBUF_0Y);
+	iowrite32(vma->node.start + params->offset_Y, &regs->OBUF_0Y);
 	ostride = params->stride_Y;
 
 	if (params->format & I915_OVERLAY_YUV_PLANAR) {
@@ -809,8 +809,8 @@ static int intel_overlay_do_put_image(struct intel_overlay *overlay,
 				      params->src_w/uv_hscale);
 		swidthsw |= max_t(u32, tmp_U, tmp_V) << 16;
 		sheight |= (params->src_h/uv_vscale) << 16;
-		iowrite32(i915_gem_obj_ggtt_offset(new_bo) + params->offset_U, &regs->OBUF_0U);
-		iowrite32(i915_gem_obj_ggtt_offset(new_bo) + params->offset_V, &regs->OBUF_0V);
+		iowrite32(vma->node.start + params->offset_U, &regs->OBUF_0U);
+		iowrite32(vma->node.start + params->offset_V, &regs->OBUF_0V);
 		ostride |= params->stride_UV << 16;
 	}
 
@@ -835,7 +835,9 @@ static int intel_overlay_do_put_image(struct intel_overlay *overlay,
 			  INTEL_FRONTBUFFER_OVERLAY(pipe));
 
 	overlay->old_vid_bo = overlay->vid_bo;
+	overlay->old_vid_vma = overlay->vid_vma;
 	overlay->vid_bo = new_bo;
+	overlay->vid_vma = vma;
 
 	intel_frontbuffer_flip(dev,
 			       INTEL_FRONTBUFFER_OVERLAY(pipe));
@@ -843,7 +845,7 @@ static int intel_overlay_do_put_image(struct intel_overlay *overlay,
 	return 0;
 
 out_unpin:
-	i915_gem_object_ggtt_unpin(new_bo);
+	i915_gem_object_unpin_from_display_plane(vma);
 	return ret;
 }
 
@@ -1374,6 +1376,7 @@ void intel_setup_overlay(struct drm_device *dev)
 	struct intel_overlay *overlay;
 	struct drm_i915_gem_object *reg_bo;
 	struct overlay_registers __iomem *regs;
+	struct i915_vma *vma = NULL;
 	int ret;
 
 	if (!HAS_OVERLAY(dev))
@@ -1406,13 +1409,14 @@ void intel_setup_overlay(struct drm_device *dev)
 		}
 		overlay->flip_addr = reg_bo->phys_handle->busaddr;
 	} else {
-		ret = i915_gem_object_ggtt_pin(reg_bo, NULL,
+		vma = i915_gem_object_ggtt_pin(reg_bo, NULL,
 					       0, PAGE_SIZE, PIN_MAPPABLE);
-		if (ret) {
+		if (IS_ERR(vma)) {
 			DRM_ERROR("failed to pin overlay register bo\n");
+			ret = PTR_ERR(vma);
 			goto out_free_bo;
 		}
-		overlay->flip_addr = i915_gem_obj_ggtt_offset(reg_bo);
+		overlay->flip_addr = vma->node.start;
 
 		ret = i915_gem_object_set_to_gtt_domain(reg_bo, true);
 		if (ret) {
@@ -1444,8 +1448,8 @@ void intel_setup_overlay(struct drm_device *dev)
 	return;
 
 out_unpin_bo:
-	if (!OVERLAY_NEEDS_PHYSICAL(dev))
-		i915_gem_object_ggtt_unpin(reg_bo);
+	if (vma)
+		i915_vma_unpin(vma);
 out_free_bo:
 	drm_gem_object_unreference(&reg_bo->base);
 out_free:
@@ -1490,7 +1494,7 @@ intel_overlay_map_regs_atomic(struct intel_overlay *overlay)
 			overlay->reg_bo->phys_handle->vaddr;
 	else
 		regs = io_mapping_map_atomic_wc(dev_priv->gtt.mappable,
-						i915_gem_obj_ggtt_offset(overlay->reg_bo));
+						overlay->flip_addr);
 
 	return regs;
 }
@@ -1523,7 +1527,7 @@ intel_overlay_capture_error_state(struct drm_device *dev)
 	if (OVERLAY_NEEDS_PHYSICAL(overlay->dev))
 		error->base = (__force long)overlay->reg_bo->phys_handle->vaddr;
 	else
-		error->base = i915_gem_obj_ggtt_offset(overlay->reg_bo);
+		error->base = overlay->flip_addr;
 
 	regs = intel_overlay_map_regs_atomic(overlay);
 	if (!regs)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 6db7f93a3c1d..dbc76cd54c3e 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -534,7 +534,7 @@ static int init_ring_common(struct intel_engine_cs *ring)
 	struct drm_device *dev = ring->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_ring *ringbuf = ring->buffer;
-	struct drm_i915_gem_object *obj = ringbuf->obj;
+	struct i915_vma *vma = ringbuf->vma;
 	int ret = 0;
 
 	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
@@ -574,7 +574,7 @@ static int init_ring_common(struct intel_engine_cs *ring)
 	 * registers with the above sequence (the readback of the HEAD registers
 	 * also enforces ordering), otherwise the hw might lose the new ring
 	 * register values. */
-	I915_WRITE_START(ring, i915_gem_obj_ggtt_offset(obj));
+	I915_WRITE_START(ring, vma->node.start);
 
 	/* WaClearRingBufHeadRegAtInit:ctg,elk */
 	if (I915_READ_HEAD(ring))
@@ -589,14 +589,14 @@ static int init_ring_common(struct intel_engine_cs *ring)
 
 	/* If the head is still not zero, the ring is dead */
 	if (wait_for((I915_READ_CTL(ring) & RING_VALID) != 0 &&
-		     I915_READ_START(ring) == i915_gem_obj_ggtt_offset(obj) &&
+		     I915_READ_START(ring) == vma->node.start &&
 		     (I915_READ_HEAD(ring) & HEAD_ADDR) == 0, 50)) {
 		DRM_ERROR("%s initialization failed "
 			  "ctl %08x (valid? %d) head %08x tail %08x start %08x [expected %08lx]\n",
 			  ring->name,
 			  I915_READ_CTL(ring), I915_READ_CTL(ring) & RING_VALID,
 			  I915_READ_HEAD(ring), I915_READ_TAIL(ring),
-			  I915_READ_START(ring), (unsigned long)i915_gem_obj_ggtt_offset(obj));
+			  I915_READ_START(ring), (unsigned long)vma->node.start);
 		ret = -EIO;
 		goto out;
 	}
@@ -622,10 +622,11 @@ intel_fini_pipe_control(struct intel_engine_cs *ring)
 	if (ring->scratch.obj == NULL)
 		return;
 
-	if (INTEL_INFO(dev)->gen >= 5) {
+	if (INTEL_INFO(dev)->gen >= 5)
 		kunmap(sg_page(ring->scratch.obj->pages->sgl));
-		i915_gem_object_ggtt_unpin(ring->scratch.obj);
-	}
+
+	if (ring->scratch.vma)
+		i915_vma_unpin(ring->scratch.vma);
 
 	drm_gem_object_unreference(&ring->scratch.obj->base);
 	ring->scratch.obj = NULL;
@@ -634,6 +635,7 @@ intel_fini_pipe_control(struct intel_engine_cs *ring)
 int
 intel_init_pipe_control(struct intel_engine_cs *ring)
 {
+	struct i915_vma *vma;
 	int ret;
 
 	WARN_ON(ring->scratch.obj);
@@ -649,12 +651,14 @@ intel_init_pipe_control(struct intel_engine_cs *ring)
 	if (ret)
 		goto err_unref;
 
-	ret = i915_gem_object_ggtt_pin(ring->scratch.obj, NULL,
+	vma = i915_gem_object_ggtt_pin(ring->scratch.obj, NULL,
 				       0, 4096, PIN_HIGH);
-	if (ret)
+	if (IS_ERR(vma)) {
+		ret = PTR_ERR(vma);
 		goto err_unref;
+	}
 
-	ring->scratch.gtt_offset = i915_gem_obj_ggtt_offset(ring->scratch.obj);
+	ring->scratch.gtt_offset = vma->node.start;
 	ring->scratch.cpu_page = kmap(sg_page(ring->scratch.obj->pages->sgl));
 	if (ring->scratch.cpu_page == NULL) {
 		ret = -ENOMEM;
@@ -663,10 +667,11 @@ intel_init_pipe_control(struct intel_engine_cs *ring)
 
 	DRM_DEBUG_DRIVER("%s pipe control offset: 0x%08x\n",
 			 ring->name, ring->scratch.gtt_offset);
+	ring->scratch.vma = vma;
 	return 0;
 
 err_unpin:
-	i915_gem_object_ggtt_unpin(ring->scratch.obj);
+	i915_vma_unpin(vma);
 err_unref:
 	drm_gem_object_unreference(&ring->scratch.obj->base);
 err:
@@ -1167,10 +1172,13 @@ static void render_ring_cleanup(struct intel_engine_cs *ring)
 	struct drm_device *dev = ring->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
-	if (dev_priv->semaphore_obj) {
-		i915_gem_object_ggtt_unpin(dev_priv->semaphore_obj);
-		drm_gem_object_unreference(&dev_priv->semaphore_obj->base);
-		dev_priv->semaphore_obj = NULL;
+	if (dev_priv->semaphore_vma) {
+		struct drm_i915_gem_object *obj = dev_priv->semaphore_vma->obj;
+
+		i915_vma_unpin(dev_priv->semaphore_vma);
+		dev_priv->semaphore_vma = NULL;
+
+		drm_gem_object_unreference(&obj->base);
 	}
 
 	intel_fini_pipe_control(ring);
@@ -1806,67 +1814,70 @@ i915_emit_bb_start(struct drm_i915_gem_request *req,
 
 static void cleanup_status_page(struct intel_engine_cs *ring)
 {
-	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma;
 
-	obj = ring->status_page.obj;
-	if (obj == NULL)
+	vma = ring->status_page.vma;
+	if (vma == NULL)
 		return;
+	ring->status_page.vma = NULL;
 
-	kunmap(sg_page(obj->pages->sgl));
-	i915_gem_object_ggtt_unpin(obj);
-	drm_gem_object_unreference(&obj->base);
-	ring->status_page.obj = NULL;
+	kunmap(sg_page(vma->obj->pages->sgl));
+	i915_vma_unpin(vma);
+
+	drm_gem_object_unreference(&vma->obj->base);
 }
 
 static int init_status_page(struct intel_engine_cs *ring)
 {
 	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma;
+	unsigned flags;
+	int ret;
 
-	if ((obj = ring->status_page.obj) == NULL) {
-		unsigned flags;
-		int ret;
+	if (ring->status_page.vma)
+		return 0;
 
-		obj = i915_gem_alloc_object(ring->dev, 4096);
-		if (obj == NULL) {
-			DRM_ERROR("Failed to allocate status page\n");
-			return -ENOMEM;
-		}
+	obj = i915_gem_alloc_object(ring->dev, 4096);
+	if (obj == NULL) {
+		DRM_ERROR("Failed to allocate status page\n");
+		return -ENOMEM;
+	}
 
-		ret = i915_gem_object_set_cache_level(obj, I915_CACHE_LLC);
-		if (ret)
-			goto err_unref;
-
-		flags = 0;
-		if (!HAS_LLC(ring->dev))
-			/* On g33, we cannot place HWS above 256MiB, so
-			 * restrict its pinning to the low mappable arena.
-			 * Though this restriction is not documented for
-			 * gen4, gen5, or byt, they also behave similarly
-			 * and hang if the HWS is placed at the top of the
-			 * GTT. To generalise, it appears that all !llc
-			 * platforms have issues with us placing the HWS
-			 * above the mappable region (even though we never
-			 * actualy map it).
-			 */
-			flags |= PIN_MAPPABLE;
-		ret = i915_gem_object_ggtt_pin(obj, NULL, 0, 4096, flags);
-		if (ret) {
-err_unref:
-			drm_gem_object_unreference(&obj->base);
-			return ret;
-		}
+	ret = i915_gem_object_set_cache_level(obj, I915_CACHE_LLC);
+	if (ret)
+		goto err_unref;
 
-		ring->status_page.obj = obj;
+	flags = 0;
+	if (!HAS_LLC(ring->dev))
+		/* On g33, we cannot place HWS above 256MiB, so
+		 * restrict its pinning to the low mappable arena.
+		 * Though this restriction is not documented for
+		 * gen4, gen5, or byt, they also behave similarly
+		 * and hang if the HWS is placed at the top of the
+		 * GTT. To generalise, it appears that all !llc
+		 * platforms have issues with us placing the HWS
+		 * above the mappable region (even though we never
+	 * actually map it).
+		 */
+		flags |= PIN_MAPPABLE;
+	vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 4096, flags);
+	if (IS_ERR(vma)) {
+		ret = PTR_ERR(vma);
+		goto err_unref;
 	}
 
-	ring->status_page.gfx_addr = i915_gem_obj_ggtt_offset(obj);
+	ring->status_page.vma = vma;
+	ring->status_page.gfx_addr = vma->node.start;
 	ring->status_page.page_addr = kmap(sg_page(obj->pages->sgl));
-	memset(ring->status_page.page_addr, 0, PAGE_SIZE);
 
 	DRM_DEBUG_DRIVER("%s hws offset: 0x%08x\n",
 			ring->name, ring->status_page.gfx_addr);
 
 	return 0;
+
+err_unref:
+	drm_gem_object_unreference(&obj->base);
+	return ret;
 }
 
 static int init_phys_status_page(struct intel_engine_cs *ring)
@@ -1889,14 +1900,15 @@ static int init_phys_status_page(struct intel_engine_cs *ring)
 int intel_ring_map(struct intel_ring *ring)
 {
 	struct drm_i915_gem_object *obj = ring->obj;
+	struct i915_vma *vma;
 	int ret;
 
 	if (HAS_LLC(ring->engine->i915) && !obj->stolen) {
-		ret = i915_gem_object_ggtt_pin(obj, NULL,
+		vma = i915_gem_object_ggtt_pin(obj, NULL,
 					       0, PAGE_SIZE,
 					       PIN_HIGH);
-		if (ret)
-			return ret;
+		if (IS_ERR(vma))
+			return PTR_ERR(vma);
 
 		ret = i915_gem_object_set_to_cpu_domain(obj, true);
 		if (ret)
@@ -1909,18 +1921,18 @@ int intel_ring_map(struct intel_ring *ring)
 			goto unpin;
 		}
 	} else {
-		ret = i915_gem_object_ggtt_pin(obj, NULL,
+		vma = i915_gem_object_ggtt_pin(obj, NULL,
 					       0, PAGE_SIZE,
 					       PIN_MAPPABLE);
-		if (ret)
-			return ret;
+		if (IS_ERR(vma))
+			return PTR_ERR(vma);
 
 		ret = i915_gem_object_set_to_gtt_domain(obj, true);
 		if (ret)
 			goto unpin;
 
 		ring->virtual_start = ioremap_wc(ring->engine->i915->gtt.mappable_base +
-						 i915_gem_obj_ggtt_offset(obj),
+						 vma->node.start,
 						 ring->size);
 		if (ring->virtual_start == NULL) {
 			ret = -ENOMEM;
@@ -1928,10 +1940,11 @@ int intel_ring_map(struct intel_ring *ring)
 		}
 	}
 
+	ring->vma = vma;
 	return 0;
 
 unpin:
-	i915_gem_object_ggtt_unpin(obj);
+	i915_vma_unpin(vma);
 	return ret;
 }
 
@@ -1941,7 +1954,9 @@ void intel_ring_unmap(struct intel_ring *ring)
 		i915_gem_object_unpin_vmap(ring->obj);
 	else
 		iounmap(ring->virtual_start);
-	i915_gem_object_ggtt_unpin(ring->obj);
+
+	i915_vma_unpin(ring->vma);
+	ring->vma = NULL;
 }
 
 static void intel_destroy_ringbuffer_obj(struct intel_ring *ringbuf)
@@ -2507,16 +2522,20 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 				DRM_ERROR("Failed to allocate semaphore bo. Disabling semaphores\n");
 				i915.semaphores = 0;
 			} else {
+				struct i915_vma *vma;
+
 				i915_gem_object_set_cache_level(obj, I915_CACHE_LLC);
-				ret = i915_gem_object_ggtt_pin(obj, NULL,
+				vma = i915_gem_object_ggtt_pin(obj, NULL,
 							       0, 0,
 							       PIN_HIGH);
-				if (ret != 0) {
+				if (IS_ERR(vma)) {
 					drm_gem_object_unreference(&obj->base);
 					DRM_ERROR("Failed to pin semaphore bo. Disabling semaphores\n");
 					i915.semaphores = 0;
-				} else
-					dev_priv->semaphore_obj = obj;
+					vma = NULL;
+				}
+
+				dev_priv->semaphore_vma = vma;
 			}
 		}
 
@@ -2527,8 +2546,7 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 		ring->irq_disable = gen8_ring_disable_irq;
 		ring->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
 		ring->irq_seqno_barrier = gen6_seqno_barrier;
-		if (i915.semaphores) {
-			WARN_ON(!dev_priv->semaphore_obj);
+		if (dev_priv->semaphore_vma) {
 			ring->semaphore.sync_to = gen8_ring_sync;
 			ring->semaphore.signal = gen8_rcs_signal;
 			GEN8_RING_SEMAPHORE_INIT;
@@ -2604,21 +2622,24 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 
 	/* Workaround batchbuffer to combat CS tlb bug. */
 	if (HAS_BROKEN_CS_TLB(dev)) {
+		struct i915_vma *vma;
+
 		obj = i915_gem_alloc_object(dev, I830_WA_SIZE);
 		if (obj == NULL) {
 			DRM_ERROR("Failed to allocate batch bo\n");
 			return -ENOMEM;
 		}
 
-		ret = i915_gem_object_ggtt_pin(obj, NULL, 0, 0, 0);
-		if (ret != 0) {
+		vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0, 0);
+		if (IS_ERR(vma)) {
 			drm_gem_object_unreference(&obj->base);
-			DRM_ERROR("Failed to ping batch bo\n");
-			return ret;
+			DRM_ERROR("Failed to pin batch bo\n");
+			return PTR_ERR(vma);
 		}
 
 		ring->scratch.obj = obj;
-		ring->scratch.gtt_offset = i915_gem_obj_ggtt_offset(obj);
+		ring->scratch.vma = vma;
+		ring->scratch.gtt_offset = vma->node.start;
 	}
 
 	ret = intel_init_engine(dev, ring);
@@ -2656,7 +2677,7 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev)
 			ring->irq_enable = gen8_ring_enable_irq;
 			ring->irq_disable = gen8_ring_disable_irq;
 			ring->emit_bb_start = gen8_emit_bb_start;
-			if (i915.semaphores) {
+			if (dev_priv->semaphore_vma) {
 				ring->semaphore.sync_to = gen8_ring_sync;
 				ring->semaphore.signal = gen8_xcs_signal;
 				GEN8_RING_SEMAPHORE_INIT;
@@ -2721,7 +2742,7 @@ int intel_init_bsd2_ring_buffer(struct drm_device *dev)
 	ring->irq_enable = gen8_ring_enable_irq;
 	ring->irq_disable = gen8_ring_disable_irq;
 	ring->emit_bb_start = gen8_emit_bb_start;
-	if (i915.semaphores) {
+	if (dev_priv->semaphore_vma) {
 		ring->semaphore.sync_to = gen8_ring_sync;
 		ring->semaphore.signal = gen8_xcs_signal;
 		GEN8_RING_SEMAPHORE_INIT;
@@ -2749,7 +2770,7 @@ int intel_init_blt_ring_buffer(struct drm_device *dev)
 		ring->irq_enable = gen8_ring_enable_irq;
 		ring->irq_disable = gen8_ring_disable_irq;
 		ring->emit_bb_start = gen8_emit_bb_start;
-		if (i915.semaphores) {
+		if (dev_priv->semaphore_vma) {
 			ring->semaphore.sync_to = gen8_ring_sync;
 			ring->semaphore.signal = gen8_xcs_signal;
 			GEN8_RING_SEMAPHORE_INIT;
@@ -2805,7 +2826,7 @@ int intel_init_vebox_ring_buffer(struct drm_device *dev)
 		ring->irq_enable = gen8_ring_enable_irq;
 		ring->irq_disable = gen8_ring_disable_irq;
 		ring->emit_bb_start = gen8_emit_bb_start;
-		if (i915.semaphores) {
+		if (dev_priv->semaphore_vma) {
 			ring->semaphore.sync_to = gen8_ring_sync;
 			ring->semaphore.signal = gen8_xcs_signal;
 			GEN8_RING_SEMAPHORE_INIT;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 868cc8d5abb3..894eb8089296 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -25,10 +25,10 @@
  */
 #define I915_RING_FREE_SPACE 64
 
-struct  intel_hw_status_page {
+struct intel_hw_status_page {
 	u32		*page_addr;
 	unsigned int	gfx_addr;
-	struct		drm_i915_gem_object *obj;
+	struct		i915_vma *vma;
 };
 
 #define I915_READ_TAIL(ring) I915_READ(RING_TAIL((ring)->mmio_base))
@@ -54,19 +54,16 @@ struct  intel_hw_status_page {
  */
 #define i915_semaphore_seqno_size sizeof(uint64_t)
 #define GEN8_SIGNAL_OFFSET(__ring, to)			     \
-	(i915_gem_obj_ggtt_offset(dev_priv->semaphore_obj) + \
+	(dev_priv->semaphore_vma->node.start + \
 	((__ring)->id * I915_NUM_RINGS * i915_semaphore_seqno_size) +	\
 	(i915_semaphore_seqno_size * (to)))
 
 #define GEN8_WAIT_OFFSET(__ring, from)			     \
-	(i915_gem_obj_ggtt_offset(dev_priv->semaphore_obj) + \
+	(dev_priv->semaphore_vma->node.start + \
 	((from) * I915_NUM_RINGS * i915_semaphore_seqno_size) + \
 	(i915_semaphore_seqno_size * (__ring)->id))
 
 #define GEN8_RING_SEMAPHORE_INIT do { \
-	if (!dev_priv->semaphore_obj) { \
-		break; \
-	} \
 	ring->semaphore.signal_ggtt[RCS] = GEN8_SIGNAL_OFFSET(ring, RCS); \
 	ring->semaphore.signal_ggtt[VCS] = GEN8_SIGNAL_OFFSET(ring, VCS); \
 	ring->semaphore.signal_ggtt[BCS] = GEN8_SIGNAL_OFFSET(ring, BCS); \
@@ -99,6 +96,7 @@ struct intel_engine_hangcheck {
 
 struct intel_ring {
 	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma;
 	void *virtual_start;
 
 	struct intel_engine_cs *engine;
@@ -146,7 +144,7 @@ struct  i915_ctx_workarounds {
 		u32 offset;
 		u32 size;
 	} indirect_ctx, per_ctx;
-	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma;
 };
 
 struct drm_i915_gem_request;
@@ -322,6 +320,7 @@ struct intel_engine_cs {
 
 	struct {
 		struct drm_i915_gem_object *obj;
+		struct i915_vma *vma;
 		u32 gtt_offset;
 		volatile u32 *cpu_page;
 	} scratch;
diff --git a/drivers/gpu/drm/i915/intel_sprite.c b/drivers/gpu/drm/i915/intel_sprite.c
index 4d448b990c50..768989f578cb 100644
--- a/drivers/gpu/drm/i915/intel_sprite.c
+++ b/drivers/gpu/drm/i915/intel_sprite.c
@@ -461,8 +461,8 @@ vlv_update_plane(struct drm_plane *dplane,
 
 	I915_WRITE(SPSIZE(pipe, plane), (crtc_h << 16) | crtc_w);
 	I915_WRITE(SPCNTR(pipe, plane), sprctl);
-	I915_WRITE(SPSURF(pipe, plane), i915_gem_obj_ggtt_offset(obj) +
-		   sprsurf_offset);
+	I915_WRITE(SPSURF(pipe, plane),
+		   i915_gem_object_ggtt_offset(obj, NULL) + sprsurf_offset);
 	POSTING_READ(SPSURF(pipe, plane));
 }
 
@@ -603,7 +603,7 @@ ivb_update_plane(struct drm_plane *plane,
 		I915_WRITE(SPRSCALE(pipe), sprscale);
 	I915_WRITE(SPRCTL(pipe), sprctl);
 	I915_WRITE(SPRSURF(pipe),
-		   i915_gem_obj_ggtt_offset(obj) + sprsurf_offset);
+		   i915_gem_object_ggtt_offset(obj, NULL) + sprsurf_offset);
 	POSTING_READ(SPRSURF(pipe));
 }
 
@@ -733,7 +733,7 @@ ilk_update_plane(struct drm_plane *plane,
 	I915_WRITE(DVSSCALE(pipe), dvsscale);
 	I915_WRITE(DVSCNTR(pipe), dvscntr);
 	I915_WRITE(DVSSURF(pipe),
-		   i915_gem_obj_ggtt_offset(obj) + dvssurf_offset);
+		   i915_gem_object_ggtt_offset(obj, NULL) + dvssurf_offset);
 	POSTING_READ(DVSSURF(pipe));
 }
 
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 126/190] drm/i915: Print the batchbuffer offset next to BBADDR in error state
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
                     ` (37 preceding siblings ...)
  2016-01-11 10:45   ` [PATCH 125/190] drm/i915: Track pinned VMA Chris Wilson
@ 2016-01-11 10:45   ` Chris Wilson
  2016-01-11 10:45   ` [PATCH 127/190] drm/i915: Cache kmap between relocations Chris Wilson
                     ` (14 subsequent siblings)
  53 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:45 UTC (permalink / raw)
  To: intel-gfx

It is useful when looking at captured error states to check the
recorded BBADDR register (the address of the last batchbuffer
instruction loaded) against the expected offset of the batch buffer, as
a quick check that (a) the capture is accurate and (b) HEAD hasn't
wandered off into the badlands.
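
As a quick illustration (the helper and its name are hypothetical, not
part of this patch), the check this enables amounts to:

	/* does the recorded BBADDR fall within the captured batch range? */
	static inline bool bbaddr_in_batch(u64 bbaddr, u64 start, u64 size)
	{
		return bbaddr >= start && bbaddr - start < size;
	}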

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h       |  1 +
 drivers/gpu/drm/i915/i915_gpu_error.c | 12 +++++++++++-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 6b729baf6503..693f472bd604 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -553,6 +553,7 @@ struct drm_i915_error_state {
 		struct drm_i915_error_object {
 			int page_count;
 			u64 gtt_offset;
+			u64 gtt_size;
 			u32 *pages[0];
 		} *ringbuffer, *batchbuffer, *wa_batchbuffer, *ctx, *hws_page;
 
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 7fe9281bf37e..69ce355e00ea 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -251,6 +251,13 @@ static void i915_ring_error_state(struct drm_i915_error_state_buf *m,
 	err_printf(m, "  IPEIR: 0x%08x\n", ring->ipeir);
 	err_printf(m, "  IPEHR: 0x%08x\n", ring->ipehr);
 	err_printf(m, "  INSTDONE: 0x%08x\n", ring->instdone);
+	if (ring->batchbuffer) {
+		u64 start = ring->batchbuffer->gtt_offset;
+		u64 end = start + ring->batchbuffer->gtt_size;
+		err_printf(m, "  batch: [0x%08x %08x, 0x%08x %08x]\n",
+			   upper_32_bits(start), lower_32_bits(start),
+			   upper_32_bits(end), lower_32_bits(end));
+	}
 	if (INTEL_INFO(dev)->gen >= 4) {
 		err_printf(m, "  BBADDR: 0x%08x %08x\n", (u32)(ring->bbaddr>>32), (u32)ring->bbaddr);
 		err_printf(m, "  BB_STATE: 0x%08x\n", ring->bbstate);
@@ -615,7 +622,10 @@ i915_error_object_create(struct drm_i915_private *dev_priv,
 	if (dst == NULL)
 		return NULL;
 
-	reloc_offset = dst->gtt_offset = vma->node.start;
+	dst->gtt_offset = vma->node.start;
+	dst->gtt_size = vma->node.size;
+
+	reloc_offset = dst->gtt_offset;
 	use_ggtt = (src->cache_level == I915_CACHE_NONE &&
 		   (vma->bound & GLOBAL_BIND) &&
 		   reloc_offset + num_pages * PAGE_SIZE <= dev_priv->gtt.mappable_end);
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 127/190] drm/i915: Cache kmap between relocations
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
                     ` (38 preceding siblings ...)
  2016-01-11 10:45   ` [PATCH 126/190] drm/i915: Print the batchbuffer offset next to BBADDR in error state Chris Wilson
@ 2016-01-11 10:45   ` Chris Wilson
  2016-01-11 10:45   ` [PATCH 128/190] drm/i915: Extract i915_gem_obj_prepare_shmem_write() Chris Wilson
                     ` (13 subsequent siblings)
  53 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:45 UTC (permalink / raw)
  To: intel-gfx

When doing relocations, we have to obtain a mapping to the page
containing the target address. This is either a kmap or an iomap,
depending on the GPU and its cache coherency. Neighbouring relocation
entries are
typically within the same page and so we can cache our kmapping between
them and avoid those pesky TLB flushes.

Note that there is some sleight-of-hand in how the slow relocate works
as the reloc_cache implies that pagefaults are disabled (as we are inside a
kmap_atomic section). However, the slow relocate code is meant to be the
fallback from the atomic fast path failing. Fortunately it works as we
already have performed the copy_from_user for the relocation array (no
more pagefaults there) and the kmap_atomic cache is enabled after we
have waited upon an active buffer (so no more sleeping in atomic).
Magic!
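
For illustration, the resulting pattern in the relocation loops is
roughly the following sketch (using the helpers added below; obj, reloc
and delta stand in for the surrounding locals):

	struct reloc_cache cache;
	uint32_t *vaddr;

	reloc_cache_init(&cache);

	/* relocations hitting the same page reuse one kmap_atomic */
	vaddr = reloc_kmap(obj, &cache, reloc->offset >> PAGE_SHIFT);
	vaddr[offset_in_page(reloc->offset) / sizeof(*vaddr)] =
		lower_32_bits(delta);

	reloc_cache_fini(&cache); /* drops the final kmap_atomic, if any */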

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 152 +++++++++++++++++++----------
 1 file changed, 102 insertions(+), 50 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 4d15dd32e365..f1dfb51ae4e3 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -305,9 +305,50 @@ relocation_target(struct drm_i915_gem_relocation_entry *reloc,
 	return gen8_canonical_addr((int)reloc->delta + target_offset);
 }
 
+struct reloc_cache {
+	void *vaddr;
+	unsigned page;
+	enum { KMAP, IOMAP } type;
+};
+
+static void reloc_cache_init(struct reloc_cache *cache)
+{
+	cache->page = -1;
+	cache->vaddr = NULL;
+}
+
+static void reloc_cache_fini(struct reloc_cache *cache)
+{
+	if (cache->vaddr == NULL)
+		return;
+
+	switch (cache->type) {
+	case KMAP: kunmap_atomic(cache->vaddr); break;
+	case IOMAP: io_mapping_unmap_atomic(cache->vaddr); break;
+	}
+}
+
+static void *reloc_kmap(struct drm_i915_gem_object *obj,
+			struct reloc_cache *cache,
+			int page)
+{
+	if (cache->page == page)
+		return cache->vaddr;
+
+	if (cache->vaddr)
+		kunmap_atomic(cache->vaddr);
+
+	cache->page = page;
+	cache->vaddr = kmap_atomic(i915_gem_object_get_dirty_page(obj, page));
+	cache->type = KMAP;
+
+	return cache->vaddr;
+}
+
 static int
 relocate_entry_cpu(struct drm_i915_gem_object *obj,
 		   struct drm_i915_gem_relocation_entry *reloc,
+		   struct reloc_cache *cache,
 		   uint64_t target_offset)
 {
 	struct drm_device *dev = obj->base.dev;
@@ -320,30 +361,44 @@ relocate_entry_cpu(struct drm_i915_gem_object *obj,
 	if (ret)
 		return ret;
 
-	vaddr = kmap_atomic(i915_gem_object_get_dirty_page(obj,
-				reloc->offset >> PAGE_SHIFT));
+	vaddr = reloc_kmap(obj, cache, reloc->offset >> PAGE_SHIFT);
 	*(uint32_t *)(vaddr + page_offset) = lower_32_bits(delta);
 
 	if (INTEL_INFO(dev)->gen >= 8) {
-		page_offset = offset_in_page(page_offset + sizeof(uint32_t));
-
-		if (page_offset == 0) {
-			kunmap_atomic(vaddr);
-			vaddr = kmap_atomic(i915_gem_object_get_dirty_page(obj,
-			    (reloc->offset + sizeof(uint32_t)) >> PAGE_SHIFT));
+		page_offset += sizeof(uint32_t);
+		if (page_offset == PAGE_SIZE) {
+			vaddr = reloc_kmap(obj, cache, cache->page + 1);
+			page_offset = 0;
 		}
-
 		*(uint32_t *)(vaddr + page_offset) = upper_32_bits(delta);
 	}
 
-	kunmap_atomic(vaddr);
-
 	return 0;
 }
 
+static void *reloc_iomap(struct drm_i915_private *i915,
+			 struct reloc_cache *cache,
+			 uint64_t offset)
+{
+	if (cache->page == offset >> PAGE_SHIFT)
+		return cache->vaddr;
+
+	if (cache->vaddr)
+		io_mapping_unmap_atomic(cache->vaddr);
+
+	cache->page = offset >> PAGE_SHIFT;
+	cache->vaddr =
+		io_mapping_map_atomic_wc(i915->gtt.mappable,
+					 offset & PAGE_MASK);
+	cache->type = IOMAP;
+
+	return cache->vaddr;
+}
+
 static int
 relocate_entry_gtt(struct drm_i915_gem_object *obj,
 		   struct drm_i915_gem_relocation_entry *reloc,
+		   struct reloc_cache *cache,
 		   uint64_t target_offset)
 {
 	struct drm_device *dev = obj->base.dev;
@@ -369,28 +424,19 @@ relocate_entry_gtt(struct drm_i915_gem_object *obj,
 	/* Map the page containing the relocation we're going to perform.  */
 	offset = vma->node.start;
 	offset += reloc->offset;
-	reloc_page = io_mapping_map_atomic_wc(dev_priv->gtt.mappable,
-					      offset & PAGE_MASK);
+	reloc_page = reloc_iomap(dev_priv, cache, offset);
 	iowrite32(lower_32_bits(delta), reloc_page + offset_in_page(offset));
 
 	if (INTEL_INFO(dev)->gen >= 8) {
 		offset += sizeof(uint32_t);
-
-		if (offset_in_page(offset) == 0) {
-			io_mapping_unmap_atomic(reloc_page);
-			reloc_page =
-				io_mapping_map_atomic_wc(dev_priv->gtt.mappable,
-							 offset);
-		}
-
+		if (offset_in_page(offset) == 0)
+			reloc_page = reloc_iomap(dev_priv, cache, offset);
 		iowrite32(upper_32_bits(delta),
 			  reloc_page + offset_in_page(offset));
 	}
 
-	io_mapping_unmap_atomic(reloc_page);
-
 unpin:
-	i915_vma_unpin(vma);
+	__i915_vma_unpin(vma);
 	return ret;
 }
 
@@ -406,6 +452,7 @@ clflush_write32(void *addr, uint32_t value)
 static int
 relocate_entry_clflush(struct drm_i915_gem_object *obj,
 		       struct drm_i915_gem_relocation_entry *reloc,
+		       struct reloc_cache *cache,
 		       uint64_t target_offset)
 {
 	struct drm_device *dev = obj->base.dev;
@@ -418,31 +465,26 @@ relocate_entry_clflush(struct drm_i915_gem_object *obj,
 	if (ret)
 		return ret;
 
-	vaddr = kmap_atomic(i915_gem_object_get_dirty_page(obj,
-				reloc->offset >> PAGE_SHIFT));
+	vaddr = reloc_kmap(obj, cache, reloc->offset >> PAGE_SHIFT);
 	clflush_write32(vaddr + page_offset, lower_32_bits(delta));
 
 	if (INTEL_INFO(dev)->gen >= 8) {
-		page_offset = offset_in_page(page_offset + sizeof(uint32_t));
-
-		if (page_offset == 0) {
-			kunmap_atomic(vaddr);
-			vaddr = kmap_atomic(i915_gem_object_get_dirty_page(obj,
-			    (reloc->offset + sizeof(uint32_t)) >> PAGE_SHIFT));
+		page_offset += sizeof(uint32_t);
+		if (page_offset == PAGE_SIZE) {
+			vaddr = reloc_kmap(obj, cache, cache->page + 1);
+			page_offset = 0;
 		}
-
 		clflush_write32(vaddr + page_offset, upper_32_bits(delta));
 	}
 
-	kunmap_atomic(vaddr);
-
 	return 0;
 }
 
 static int
 i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
 				   struct eb_vmas *eb,
-				   struct drm_i915_gem_relocation_entry *reloc)
+				   struct drm_i915_gem_relocation_entry *reloc,
+				   struct reloc_cache *cache)
 {
 	struct drm_device *dev = obj->base.dev;
 	struct drm_gem_object *target_obj;
@@ -526,11 +568,11 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
 		return -EFAULT;
 
 	if (use_cpu_reloc(obj))
-		ret = relocate_entry_cpu(obj, reloc, target_offset);
+		ret = relocate_entry_cpu(obj, reloc, cache, target_offset);
 	else if (obj->map_and_fenceable)
-		ret = relocate_entry_gtt(obj, reloc, target_offset);
+		ret = relocate_entry_gtt(obj, reloc, cache, target_offset);
 	else if (cpu_has_clflush)
-		ret = relocate_entry_clflush(obj, reloc, target_offset);
+		ret = relocate_entry_clflush(obj, reloc, cache, target_offset);
 	else {
 		WARN_ONCE(1, "Impossible case in relocation handling\n");
 		ret = -ENODEV;
@@ -553,9 +595,11 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
 	struct drm_i915_gem_relocation_entry stack_reloc[N_RELOC(512)];
 	struct drm_i915_gem_relocation_entry __user *user_relocs;
 	struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
-	int remain, ret;
+	struct reloc_cache cache;
+	int remain, ret = 0;
 
 	user_relocs = to_user_ptr(entry->relocs_ptr);
+	reloc_cache_init(&cache);
 
 	remain = entry->relocation_count;
 	while (remain) {
@@ -565,21 +609,24 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
 			count = ARRAY_SIZE(stack_reloc);
 		remain -= count;
 
-		if (__copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0])))
-			return -EFAULT;
+		if (__copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0]))) {
+			ret = -EFAULT;
+			goto out;
+		}
 
 		do {
 			u64 offset = r->presumed_offset;
 
-			ret = i915_gem_execbuffer_relocate_entry(vma->obj, eb, r);
+			ret = i915_gem_execbuffer_relocate_entry(vma->obj, eb, r, &cache);
 			if (ret)
-				return ret;
+				goto out;
 
 			if (r->presumed_offset != offset &&
 			    __copy_to_user_inatomic(&user_relocs->presumed_offset,
 						    &r->presumed_offset,
 						    sizeof(r->presumed_offset))) {
-				return -EFAULT;
+				ret = -EFAULT;
+				goto out;
 			}
 
 			user_relocs++;
@@ -587,7 +634,9 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
 		} while (--count);
 	}
 
-	return 0;
+out:
+	reloc_cache_fini(&cache);
+	return ret;
 #undef N_RELOC
 }
 
@@ -597,15 +646,18 @@ i915_gem_execbuffer_relocate_vma_slow(struct i915_vma *vma,
 				      struct drm_i915_gem_relocation_entry *relocs)
 {
 	const struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
-	int i, ret;
+	struct reloc_cache cache;
+	int i, ret = 0;
 
+	reloc_cache_init(&cache);
 	for (i = 0; i < entry->relocation_count; i++) {
-		ret = i915_gem_execbuffer_relocate_entry(vma->obj, eb, &relocs[i]);
+		ret = i915_gem_execbuffer_relocate_entry(vma->obj, eb, &relocs[i], &cache);
 		if (ret)
-			return ret;
+			break;
 	}
+	reloc_cache_fini(&cache);
 
-	return 0;
+	return ret;
 }
 
 static int
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 128/190] drm/i915: Extract i915_gem_obj_prepare_shmem_write()
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
                     ` (39 preceding siblings ...)
  2016-01-11 10:45   ` [PATCH 127/190] drm/i915: Cache kmap between relocations Chris Wilson
@ 2016-01-11 10:45   ` Chris Wilson
  2016-01-11 10:45   ` [PATCH 129/190] drm/i915: Before accessing an object via the cpu, flush GTT writes Chris Wilson
                     ` (12 subsequent siblings)
  53 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:45 UTC (permalink / raw)
  To: intel-gfx

This is a companion to i915_gem_obj_prepare_shmem_read() that prepares
the backing storage for direct writes. It first serialises with the GPU,
pins the backing storage and then indicates what clfushes are required in
order for the writes to be coherent.

Whilst here, fix support for ancient CPUs without clflush for which we
cannot do the GTT+clflush tricks.
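
For reference, a minimal sketch of the intended caller pattern (vaddr,
user_data and len are illustrative locals, not part of this patch):

	unsigned needs_clflush;
	int ret;

	ret = i915_gem_obj_prepare_shmem_write(obj, &needs_clflush);
	if (ret)
		return ret;

	if (needs_clflush & CLFLUSH_BEFORE) /* invalidate stale lines */
		drm_clflush_virt_range(vaddr, len);

	memcpy(vaddr, user_data, len); /* the CPU write itself */

	if (needs_clflush & CLFLUSH_AFTER) /* make the write GPU-coherent */
		drm_clflush_virt_range(vaddr, len);

	i915_gem_object_unpin_pages(obj); /* prepare left the pages pinned */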

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c |   2 +-
 drivers/gpu/drm/i915/i915_drv.h        |   6 +-
 drivers/gpu/drm/i915/i915_gem.c        | 112 +++++++++++++++++++++------------
 3 files changed, 79 insertions(+), 41 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 814d894ed925..fae127166e2c 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -902,7 +902,7 @@ static u32 *copy_batch(struct drm_i915_gem_object *dest_obj,
 		       u32 batch_start_offset,
 		       u32 batch_len)
 {
-	int needs_clflush = 0;
+	unsigned needs_clflush;
 	void *src_base, *src;
 	void *dst = NULL;
 	int ret;
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 693f472bd604..45a0da0947cd 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2761,7 +2761,11 @@ void i915_gem_release_all_mmaps(struct drm_i915_private *dev_priv);
 void i915_gem_release_mmap(struct drm_i915_gem_object *obj);
 
 int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
-				    int *needs_clflush);
+				    unsigned *needs_clflush);
+int i915_gem_obj_prepare_shmem_write(struct drm_i915_gem_object *obj,
+				     unsigned *needs_clflush);
+#define CLFLUSH_BEFORE 0x1
+#define CLFLUSH_AFTER 0x2
 
 int __must_check i915_gem_object_get_pages(struct drm_i915_gem_object *obj);
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 01c20a336c04..e18c0d4d24ad 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -492,34 +492,92 @@ __copy_from_user_swizzled(char *gpu_vaddr, int gpu_offset,
  * flush the object from the CPU cache.
  */
 int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
-				    int *needs_clflush)
+				    unsigned *needs_clflush)
 {
 	int ret;
 
 	*needs_clflush = 0;
-
 	if (!obj->base.filp)
 		return -EINVAL;
 
 	if (!(obj->base.read_domains & I915_GEM_DOMAIN_CPU)) {
+		ret = i915_gem_object_wait_rendering(obj, true);
+		if (ret)
+			return ret;
+
 		/* If we're not in the cpu read domain, set ourself into the gtt
 		 * read domain and manually flush cachelines (if required). This
 		 * optimizes for the case when the gpu will dirty the data
 		 * anyway again before the next pread happens. */
 		*needs_clflush = !cpu_cache_is_coherent(obj->base.dev,
 							obj->cache_level);
-		ret = i915_gem_object_wait_rendering(obj, true);
+	}
+
+	ret = i915_gem_object_get_pages(obj);
+	if (ret)
+		return ret;
+
+	i915_gem_object_pin_pages(obj);
+
+	if (*needs_clflush && !cpu_has_clflush) {
+		ret = i915_gem_object_set_to_cpu_domain(obj, false);
+		if (ret) {
+			i915_gem_object_unpin_pages(obj);
+			return ret;
+		}
+		*needs_clflush = 0;
+	}
+
+	return 0;
+}
+
+int i915_gem_obj_prepare_shmem_write(struct drm_i915_gem_object *obj,
+				    unsigned *needs_clflush)
+{
+	int ret;
+
+	*needs_clflush = 0;
+	if (!obj->base.filp)
+		return -EINVAL;
+
+	if (obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
+		ret = i915_gem_object_wait_rendering(obj, false);
 		if (ret)
 			return ret;
+
+		/* If we're not in the cpu write domain, set ourself into the
+		 * gtt write domain and manually flush cachelines (as required).
+		 * This optimizes for the case when the gpu will use the data
+		 * right away and we therefore have to clflush anyway.
+		 */
+		*needs_clflush |= cpu_write_needs_clflush(obj) << 1;
 	}
 
+	/* Same trick applies to invalidate partially written cachelines read
+	 * before writing.
+	 */
+	if ((obj->base.read_domains & I915_GEM_DOMAIN_CPU) == 0)
+		*needs_clflush |= !cpu_cache_is_coherent(obj->base.dev,
+							 obj->cache_level);
+
 	ret = i915_gem_object_get_pages(obj);
 	if (ret)
 		return ret;
 
 	i915_gem_object_pin_pages(obj);
 
-	return ret;
+	if (*needs_clflush && !cpu_has_clflush) {
+		ret = i915_gem_object_set_to_cpu_domain(obj, true);
+		if (ret) {
+			i915_gem_object_unpin_pages(obj);
+			return ret;
+		}
+		*needs_clflush = 0;
+	}
+
+	intel_fb_obj_invalidate(obj, ORIGIN_CPU);
+	obj->dirty = 1;
+	return 0;
 }
 
 /* Per-page copy function for the shmem pread fastpath.
@@ -916,41 +974,17 @@ i915_gem_shmem_pwrite(struct drm_device *dev,
 	int shmem_page_offset, page_length, ret = 0;
 	int obj_do_bit17_swizzling, page_do_bit17_swizzling;
 	int hit_slowpath = 0;
-	int needs_clflush_after = 0;
-	int needs_clflush_before = 0;
+	unsigned needs_clflush;
 	struct sg_page_iter sg_iter;
 
-	user_data = to_user_ptr(args->data_ptr);
-	remain = args->size;
-
-	obj_do_bit17_swizzling = i915_gem_object_needs_bit17_swizzle(obj);
-
-	if (obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
-		/* If we're not in the cpu write domain, set ourself into the gtt
-		 * write domain and manually flush cachelines (if required). This
-		 * optimizes for the case when the gpu will use the data
-		 * right away and we therefore have to clflush anyway. */
-		needs_clflush_after = cpu_write_needs_clflush(obj);
-		ret = i915_gem_object_wait_rendering(obj, false);
-		if (ret)
-			return ret;
-	}
-	/* Same trick applies to invalidate partially written cachelines read
-	 * before writing. */
-	if ((obj->base.read_domains & I915_GEM_DOMAIN_CPU) == 0)
-		needs_clflush_before =
-			!cpu_cache_is_coherent(dev, obj->cache_level);
-
-	ret = i915_gem_object_get_pages(obj);
+	ret = i915_gem_obj_prepare_shmem_write(obj, &needs_clflush);
 	if (ret)
 		return ret;
 
-	intel_fb_obj_invalidate(obj, ORIGIN_CPU);
-
-	i915_gem_object_pin_pages(obj);
-
+	obj_do_bit17_swizzling = i915_gem_object_needs_bit17_swizzle(obj);
+	user_data = to_user_ptr(args->data_ptr);
 	offset = args->offset;
-	obj->dirty = 1;
+	remain = args->size;
 
 	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents,
 			 offset >> PAGE_SHIFT) {
@@ -974,7 +1008,7 @@ i915_gem_shmem_pwrite(struct drm_device *dev,
 		/* If we don't overwrite a cacheline completely we need to be
 		 * careful to have up-to-date data by first clflushing. Don't
 		 * overcomplicate things and flush the entire patch. */
-		partial_cacheline_write = needs_clflush_before &&
+		partial_cacheline_write = needs_clflush & CLFLUSH_BEFORE &&
 			((shmem_page_offset | page_length)
 				& (boot_cpu_data.x86_clflush_size - 1));
 
@@ -984,7 +1018,7 @@ i915_gem_shmem_pwrite(struct drm_device *dev,
 		ret = shmem_pwrite_fast(page, shmem_page_offset, page_length,
 					user_data, page_do_bit17_swizzling,
 					partial_cacheline_write,
-					needs_clflush_after);
+					needs_clflush & CLFLUSH_AFTER);
 		if (ret == 0)
 			goto next_page;
 
@@ -993,7 +1027,7 @@ i915_gem_shmem_pwrite(struct drm_device *dev,
 		ret = shmem_pwrite_slow(page, shmem_page_offset, page_length,
 					user_data, page_do_bit17_swizzling,
 					partial_cacheline_write,
-					needs_clflush_after);
+					needs_clflush & CLFLUSH_AFTER);
 
 		mutex_lock(&dev->struct_mutex);
 
@@ -1015,14 +1049,14 @@ out:
 		 * cachelines in-line while writing and the object moved
 		 * out of the cpu write domain while we've dropped the lock.
 		 */
-		if (!needs_clflush_after &&
+		if (!(needs_clflush & CLFLUSH_AFTER) &&
 		    obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
 			if (i915_gem_clflush_object(obj, obj->pin_display))
-				needs_clflush_after = true;
+				needs_clflush |= CLFLUSH_AFTER;
 		}
 	}
 
-	if (needs_clflush_after)
+	if (needs_clflush & CLFLUSH_AFTER)
 		i915_gem_chipset_flush(dev);
 	else
 		obj->cache_dirty = true;
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 129/190] drm/i915: Before accessing an object via the cpu, flush GTT writes
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
                     ` (40 preceding siblings ...)
  2016-01-11 10:45   ` [PATCH 128/190] drm/i915: Extract i915_gem_obj_prepare_shmem_write() Chris Wilson
@ 2016-01-11 10:45   ` Chris Wilson
  2016-01-11 10:45   ` [PATCH 130/190] drm/i915: Wait for writes through the GTT to land before reading back Chris Wilson
                     ` (11 subsequent siblings)
  53 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:45 UTC (permalink / raw)
  To: intel-gfx

If we want to read the pages directly via the CPU, we have to be sure
that we first flush any writes made via the GTT (as the CPU cannot see
the address aliasing).
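
A sketch of the access pattern being protected (gtt_vaddr and cpu_vaddr
are illustrative mappings of the same object, not code from this patch):

	iowrite32(value, gtt_vaddr + offset);	/* write through the GTT */
	i915_gem_object_flush_gtt_write_domain(obj); /* now done in prepare */
	check = *(u32 *)(cpu_vaddr + offset);	/* direct CPU read */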

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index e18c0d4d24ad..c12bda7a4277 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -500,6 +500,8 @@ int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
 	if (!obj->base.filp)
 		return -EINVAL;
 
+	i915_gem_object_flush_gtt_write_domain(obj);
+
 	if (!(obj->base.read_domains & I915_GEM_DOMAIN_CPU)) {
 		ret = i915_gem_object_wait_rendering(obj, true);
 		if (ret)
@@ -540,6 +542,8 @@ int i915_gem_obj_prepare_shmem_write(struct drm_i915_gem_object *obj,
 	if (!obj->base.filp)
 		return -EINVAL;
 
+	i915_gem_object_flush_gtt_write_domain(obj);
+
 	if (obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
 		ret = i915_gem_object_wait_rendering(obj, false);
 		if (ret)
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 130/190] drm/i915: Wait for writes through the GTT to land before reading back
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
                     ` (41 preceding siblings ...)
  2016-01-11 10:45   ` [PATCH 129/190] drm/i915: Before accessing an object via the cpu, flush GTT writes Chris Wilson
@ 2016-01-11 10:45   ` Chris Wilson
  2016-01-11 10:45   ` [PATCH 131/190] drm/i915: Pin the pages first in shmem prepare read/write Chris Wilson
                     ` (10 subsequent siblings)
  53 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:45 UTC (permalink / raw)
  To: intel-gfx

If we quickly switch from writing through the GTT to a read of the
physical page directly with the CPU (e.g. performing relocations through
the GTT and then running the command parser), we can observe that the
writes are not visible to the CPU. It is not a coherency problem, as
extensive investigations with clflush have demonstrated, but a mere
timing issue - we have to wait for the GTT to complete its write before
we start our read from the CPU.

The issue can be illustrated in userspace with:

	gtt = gem_mmap__gtt(fd, handle, 0, OBJECT_SIZE, PROT_READ | PROT_WRITE);
	cpu = gem_mmap__cpu(fd, handle, 0, OBJECT_SIZE, PROT_READ | PROT_WRITE);
	gem_set_domain(fd, handle, I915_GEM_DOMAIN_GTT, I915_GEM_DOMAIN_GTT);

	for (i = 0; i < OBJECT_SIZE / 64; i++) {
		int x = 16*i + (i%16);
		gtt[x] = i;
		clflush(&cpu[x], sizeof(cpu[x]));
		assert(cpu[x] == i);
	}

Experimenting with that shows that this behaviour is indeed limited to
recent Atom-class hardware.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index c12bda7a4277..edc00b7c82b1 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2945,20 +2945,30 @@ i915_gem_clflush_object(struct drm_i915_gem_object *obj,
 static void
 i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj)
 {
+	struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
 	uint32_t old_write_domain;
 
 	if (obj->base.write_domain != I915_GEM_DOMAIN_GTT)
 		return;
 
 	/* No actual flushing is required for the GTT write domain.  Writes
-	 * to it immediately go to main memory as far as we know, so there's
+	 * to it "immediately" go to main memory as far as we know, so there's
 	 * no chipset flush.  It also doesn't land in render cache.
 	 *
 	 * However, we do have to enforce the order so that all writes through
 	 * the GTT land before any writes to the device, such as updates to
 	 * the GATT itself.
+	 *
+	 * We also have to wait a bit for the writes to land from the GTT.
+	 * An uncached read (i.e. mmio) seems to be ideal for the round-trip
+	 * timing. This issue has only been observed when switching quickly
+	 * between GTT writes and CPU reads from inside the kernel on recent hw,
+	 * and it appears to only affect discrete GTT blocks (i.e. on LLC
+	 * system agents we cannot reproduce this behaviour).
 	 */
 	wmb();
+	if (INTEL_INFO(dev_priv)->gen >= 6 && !HAS_LLC(dev_priv))
+		POSTING_READ_FW(RING_ACTHD(dev_priv->ring[RCS].mmio_base));
 
 	old_write_domain = obj->base.write_domain;
 	obj->base.write_domain = 0;
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 131/190] drm/i915: Pin the pages first in shmem prepare read/write
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
                     ` (42 preceding siblings ...)
  2016-01-11 10:45   ` [PATCH 130/190] drm/i915: Wait for writes through the GTT to land before reading back Chris Wilson
@ 2016-01-11 10:45   ` Chris Wilson
  2016-01-11 10:45   ` [PATCH 132/190] drm/i915: Tidy up flush cpu/gtt write domains Chris Wilson
                     ` (9 subsequent siblings)
  53 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:45 UTC (permalink / raw)
  To: intel-gfx

There is an improbable, but not impossible, case that if we leave the
pages unpinned whilst we operate on the object, then somebody may steal the
lock and change the cache domains after we have already inspected them.

(Whilst here, avail ourselves of the opportunity to take a couple of
steps to make the two functions look more similar.)
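
The resulting ordering in both prepare helpers is then (condensed
sketch):

	ret = i915_gem_object_get_pages(obj);
	if (ret)
		return ret;

	i915_gem_object_pin_pages(obj); /* pin before inspecting domains */

	/* only now is it safe to inspect obj->base.write_domain et al */
	if (obj->base.write_domain == I915_GEM_DOMAIN_CPU)
		goto out;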

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 88 ++++++++++++++++++++++++-----------------
 1 file changed, 51 insertions(+), 37 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index edc00b7c82b1..dcdc5c8a5ba8 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -500,13 +500,22 @@ int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
 	if (!obj->base.filp)
 		return -EINVAL;
 
+	ret = i915_gem_object_get_pages(obj);
+	if (ret)
+		return ret;
+
+	i915_gem_object_pin_pages(obj);
+
+	if (obj->base.write_domain == I915_GEM_DOMAIN_CPU)
+		goto out;
+
+	ret = i915_gem_object_wait_rendering(obj, true);
+	if (ret)
+		goto err_unpin;
+
 	i915_gem_object_flush_gtt_write_domain(obj);
 
 	if (!(obj->base.read_domains & I915_GEM_DOMAIN_CPU)) {
-		ret = i915_gem_object_wait_rendering(obj, true);
-		if (ret)
-			return ret;
-
 		/* If we're not in the cpu read domain, set ourself into the gtt
 		 * read domain and manually flush cachelines (if required). This
 		 * optimizes for the case when the gpu will dirty the data
@@ -515,26 +524,25 @@ int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
 							obj->cache_level);
 	}
 
-	ret = i915_gem_object_get_pages(obj);
-	if (ret)
-		return ret;
-
-	i915_gem_object_pin_pages(obj);
-
 	if (*needs_clflush && !cpu_has_clflush) {
 		ret = i915_gem_object_set_to_cpu_domain(obj, false);
-		if (ret) {
-			i915_gem_object_unpin_pages(obj);
-			return ret;
-		}
+		if (ret)
+			goto err_unpin;
+
 		*needs_clflush = 0;
 	}
 
+out:
+	/* return with the pages pinned */
 	return 0;
+
+err_unpin:
+	i915_gem_object_unpin_pages(obj);
+	return ret;
 }
 
 int i915_gem_obj_prepare_shmem_write(struct drm_i915_gem_object *obj,
-				    unsigned *needs_clflush)
+				     unsigned *needs_clflush)
 {
 	int ret;
 
@@ -542,20 +550,27 @@ int i915_gem_obj_prepare_shmem_write(struct drm_i915_gem_object *obj,
 	if (!obj->base.filp)
 		return -EINVAL;
 
-	i915_gem_object_flush_gtt_write_domain(obj);
+	ret = i915_gem_object_get_pages(obj);
+	if (ret)
+		return ret;
 
-	if (obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
-		ret = i915_gem_object_wait_rendering(obj, false);
-		if (ret)
-			return ret;
+	i915_gem_object_pin_pages(obj);
 
-		/* If we're not in the cpu write domain, set ourself into the
-		 * gtt write domain and manually flush cachelines (as required).
-		 * This optimizes for the case when the gpu will use the data
-		 * right away and we therefore have to clflush anyway.
-		 */
-		*needs_clflush |= cpu_write_needs_clflush(obj) << 1;
-	}
+	if (obj->base.write_domain == I915_GEM_DOMAIN_CPU)
+		goto out;
+
+	ret = i915_gem_object_wait_rendering(obj, false);
+	if (ret)
+		goto err_unpin;
+
+	i915_gem_object_flush_gtt_write_domain(obj);
+
+	/* If we're not in the cpu write domain, set ourself into the
+	 * gtt write domain and manually flush cachelines (as required).
+	 * This optimizes for the case when the gpu will use the data
+	 * right away and we therefore have to clflush anyway.
+	 */
+	*needs_clflush |= cpu_write_needs_clflush(obj) << 1;
 
 	/* Same trick applies to invalidate partially written cachelines read
 	 * before writing.
@@ -564,24 +579,23 @@ int i915_gem_obj_prepare_shmem_write(struct drm_i915_gem_object *obj,
 		*needs_clflush |= !cpu_cache_is_coherent(obj->base.dev,
 							 obj->cache_level);
 
-	ret = i915_gem_object_get_pages(obj);
-	if (ret)
-		return ret;
-
-	i915_gem_object_pin_pages(obj);
-
 	if (*needs_clflush && !cpu_has_clflush) {
 		ret = i915_gem_object_set_to_cpu_domain(obj, true);
-		if (ret) {
-			i915_gem_object_unpin_pages(obj);
-			return ret;
-		}
+		if (ret)
+			goto err_unpin;
+
 		*needs_clflush = 0;
 	}
 
+out:
 	intel_fb_obj_invalidate(obj, ORIGIN_CPU);
 	obj->dirty = 1;
+	/* return with the pages pinned */
 	return 0;
+
+err_unpin:
+	i915_gem_object_unpin_pages(obj);
+	return ret;
 }
 
 /* Per-page copy function for the shmem pread fastpath.
-- 
2.7.0.rc3


* [PATCH 132/190] drm/i915: Tidy up flush cpu/gtt write domains
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
                     ` (43 preceding siblings ...)
  2016-01-11 10:45   ` [PATCH 131/190] drm/i915: Pin the pages first in shmem prepare read/write Chris Wilson
@ 2016-01-11 10:45   ` Chris Wilson
  2016-01-11 10:45   ` [PATCH 133/190] drm/i915: Convert known clflush paths over to clflush_cache_range() Chris Wilson
                     ` (8 subsequent siblings)
  53 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:45 UTC (permalink / raw)
  To: intel-gfx

Since we know the write domain, we can drop the local variable and make
the code look a tiny bit simpler.
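
The shape of the change, in outline; the GTT flusher is sketched, the
CPU one is identical with I915_GEM_DOMAIN_CPU:

  /* before: cache the write domain, as it is cleared before tracing */
  old_write_domain = obj->base.write_domain;
  obj->base.write_domain = 0;
  trace_i915_gem_object_change_domain(obj, obj->base.read_domains,
                                      old_write_domain);

  /* after: the flusher handles exactly one domain, so pass the
   * constant straight to the tracepoint
   */
  obj->base.write_domain = 0;
  trace_i915_gem_object_change_domain(obj, obj->base.read_domains,
                                      I915_GEM_DOMAIN_GTT);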

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 15 ++++-----------
 1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index dcdc5c8a5ba8..c3d43921bc98 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2960,7 +2960,6 @@ static void
 i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj)
 {
 	struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
-	uint32_t old_write_domain;
 
 	if (obj->base.write_domain != I915_GEM_DOMAIN_GTT)
 		return;
@@ -2984,36 +2983,30 @@ i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj)
 	if (INTEL_INFO(dev_priv)->gen >= 6 && !HAS_LLC(dev_priv))
 		POSTING_READ_FW(RING_ACTHD(dev_priv->ring[RCS].mmio_base));
 
-	old_write_domain = obj->base.write_domain;
-	obj->base.write_domain = 0;
-
 	intel_fb_obj_flush(obj, false, ORIGIN_GTT);
 
+	obj->base.write_domain = 0;
 	trace_i915_gem_object_change_domain(obj,
 					    obj->base.read_domains,
-					    old_write_domain);
+					    I915_GEM_DOMAIN_GTT);
 }
 
 /** Flushes the CPU write domain for the object if it's dirty. */
 static void
 i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj)
 {
-	uint32_t old_write_domain;
-
 	if (obj->base.write_domain != I915_GEM_DOMAIN_CPU)
 		return;
 
 	if (i915_gem_clflush_object(obj, obj->pin_display))
 		i915_gem_chipset_flush(obj->base.dev);
 
-	old_write_domain = obj->base.write_domain;
-	obj->base.write_domain = 0;
-
 	intel_fb_obj_flush(obj, false, ORIGIN_CPU);
 
+	obj->base.write_domain = 0;
 	trace_i915_gem_object_change_domain(obj,
 					    obj->base.read_domains,
-					    old_write_domain);
+					    I915_GEM_DOMAIN_CPU);
 }
 
 /**
-- 
2.7.0.rc3


* [PATCH 133/190] drm/i915: Convert known clflush paths over to clflush_cache_range()
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
                     ` (44 preceding siblings ...)
  2016-01-11 10:45   ` [PATCH 132/190] drm/i915: Tidy up flush cpu/gtt write domains Chris Wilson
@ 2016-01-11 10:45   ` Chris Wilson
  2016-01-11 10:45   ` [PATCH 134/190] drm/i915: Refactor execbuffer relocation writing Chris Wilson
                     ` (7 subsequent siblings)
  53 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:45 UTC (permalink / raw)
  To: intel-gfx

A step towards removing redundant functions from the kernel: in this
case, both drm and arch/x86 define a clflush(addr, range) operation. The
difference is that drm_clflush_virt_range() provides a wbinvd()
fallback, but along most paths, we only clflush when we know we can.
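
For reference, a sketch (not verbatim drm_cache.c) of the x86 fallback
behaviour that drm_clflush_virt_range() carries and clflush_cache_range()
does not:

  static void clflush_virt_range_sketch(void *addr, unsigned long length)
  {
          if (static_cpu_has(X86_FEATURE_CLFLUSH)) {
                  const int size = boot_cpu_data.x86_clflush_size;
                  void *end = addr + length;

                  /* flush each cacheline covering [addr, addr+length) */
                  addr = (void *)((unsigned long)addr & ~(size - 1));
                  mb();
                  for (; addr < end; addr += size)
                          clflushopt(addr); /* degrades to clflush */
                  mb();
                  return;
          }

          /* no clflush at all: flush the entire cache on every cpu */
          wbinvd_on_all_cpus();
  }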

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c         | 13 +++++--------
 drivers/gpu/drm/i915/i915_gem_gtt.c     |  2 +-
 drivers/gpu/drm/i915/intel_ringbuffer.h |  4 ++--
 3 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index c3d43921bc98..d81821c6f9a1 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -614,8 +614,7 @@ shmem_pread_fast(struct page *page, int shmem_page_offset, int page_length,
 
 	vaddr = kmap_atomic(page);
 	if (needs_clflush)
-		drm_clflush_virt_range(vaddr + shmem_page_offset,
-				       page_length);
+		clflush_cache_range(vaddr + shmem_page_offset, page_length);
 	ret = __copy_to_user_inatomic(user_data,
 				      vaddr + shmem_page_offset,
 				      page_length);
@@ -639,9 +638,9 @@ shmem_clflush_swizzled_range(char *addr, unsigned long length,
 		start = round_down(start, 128);
 		end = round_up(end, 128);
 
-		drm_clflush_virt_range((void *)start, end - start);
+		clflush_cache_range((void *)start, end - start);
 	} else {
-		drm_clflush_virt_range(addr, length);
+		clflush_cache_range(addr, length);
 	}
 
 }
@@ -934,13 +933,11 @@ shmem_pwrite_fast(struct page *page, int shmem_page_offset, int page_length,
 
 	vaddr = kmap_atomic(page);
 	if (needs_clflush_before)
-		drm_clflush_virt_range(vaddr + shmem_page_offset,
-				       page_length);
+		clflush_cache_range(vaddr + shmem_page_offset, page_length);
 	ret = __copy_from_user_inatomic(vaddr + shmem_page_offset,
 					user_data, page_length);
 	if (needs_clflush_after)
-		drm_clflush_virt_range(vaddr + shmem_page_offset,
-				       page_length);
+		clflush_cache_range(vaddr + shmem_page_offset, page_length);
 	kunmap_atomic(vaddr);
 
 	return ret ? -EFAULT : 0;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 0aadfaee2150..b8af904ad12c 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -357,7 +357,7 @@ static void kunmap_page_dma(struct drm_device *dev, void *vaddr)
 	 * And we are not sure about the latter so play safe for now.
 	 */
 	if (IS_CHERRYVIEW(dev) || IS_BROXTON(dev))
-		drm_clflush_virt_range(vaddr, PAGE_SIZE);
+		clflush_cache_range(vaddr, PAGE_SIZE);
 
 	kunmap_atomic(vaddr);
 }
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 894eb8089296..a66213b2450e 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -395,8 +395,8 @@ intel_engine_sync_index(struct intel_engine_cs *ring,
 static inline void
 intel_flush_status_page(struct intel_engine_cs *ring, int reg)
 {
-	drm_clflush_virt_range(&ring->status_page.page_addr[reg],
-			       sizeof(uint32_t));
+	clflush_cache_range(&ring->status_page.page_addr[reg],
+			    sizeof(uint32_t));
 }
 
 static inline u32
-- 
2.7.0.rc3


* [PATCH 134/190] drm/i915: Refactor execbuffer relocation writing
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
                     ` (45 preceding siblings ...)
  2016-01-11 10:45   ` [PATCH 133/190] drm/i915: Convert known clflush paths over to clflush_cache_range() Chris Wilson
@ 2016-01-11 10:45   ` Chris Wilson
  2016-01-11 10:45   ` [PATCH 135/190] drm/i915: Move map-and-fenceable tracking to the VMA Chris Wilson
                     ` (6 subsequent siblings)
  53 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:45 UTC (permalink / raw)
  To: intel-gfx

With the introduction of the reloc page cache, we are just one step
away from refactoring the relocation write functions into one. Not only
does it tidy the code (slightly), but it greatly simplifies the control
logic, much to gcc's satisfaction.
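
One detail worth calling out (names as in the diff below): the cache now
packs its bookkeeping into the low bits of the page-aligned mapping, so
a single unsigned long replaces the old vaddr/type pair. Roughly:

  /* the kmap/io mapping is page aligned, so the low bits are free */
  cache->vaddr = (unsigned long)vaddr | KMAP | flushes;

  vaddr = unmask_page(cache->vaddr);   /* cache->vaddr & PAGE_MASK */
  flags = unmask_flags(cache->vaddr);  /* cache->vaddr & ~PAGE_MASK */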

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 290 +++++++++++++++--------------
 1 file changed, 150 insertions(+), 140 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index f1dfb51ae4e3..569be409c049 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -34,6 +34,8 @@
 #include <linux/dma_remapping.h>
 #include <linux/uaccess.h>
 
+#define DBG_USE_CPU_RELOC 0 /* force relocations to use the CPU write path */
+
 #define  __EXEC_OBJECT_HAS_PIN (1U<<31)
 #define  __EXEC_OBJECT_HAS_FENCE (1U<<30)
 #define  __EXEC_OBJECT_NEEDS_MAP (1U<<29)
@@ -53,6 +55,7 @@ struct i915_execbuffer_params {
 };
 
 struct eb_vmas {
+	struct drm_i915_private *i915;
 	struct list_head vmas;
 	int and;
 	union {
@@ -62,7 +65,8 @@ struct eb_vmas {
 };
 
 static struct eb_vmas *
-eb_create(struct drm_i915_gem_execbuffer2 *args)
+eb_create(struct drm_i915_private *i915,
+	  struct drm_i915_gem_execbuffer2 *args)
 {
 	struct eb_vmas *eb = NULL;
 
@@ -89,6 +93,7 @@ eb_create(struct drm_i915_gem_execbuffer2 *args)
 	} else
 		eb->and = -args->buffer_count;
 
+	eb->i915 = i915;
 	INIT_LIST_HEAD(&eb->vmas);
 	return eb;
 }
@@ -275,7 +280,8 @@ static void eb_destroy(struct eb_vmas *eb)
 
 static inline int use_cpu_reloc(struct drm_i915_gem_object *obj)
 {
-	return (HAS_LLC(obj->base.dev) ||
+	return (DBG_USE_CPU_RELOC ||
+		HAS_LLC(obj->base.dev) ||
 		obj->base.write_domain == I915_GEM_DOMAIN_CPU ||
 		obj->cache_level != I915_CACHE_NONE);
 }
@@ -299,32 +305,58 @@ static inline uint64_t gen8_noncanonical_addr(uint64_t address)
 }
 
 static inline uint64_t
-relocation_target(struct drm_i915_gem_relocation_entry *reloc,
+relocation_target(const struct drm_i915_gem_relocation_entry *reloc,
 		  uint64_t target_offset)
 {
 	return gen8_canonical_addr((int)reloc->delta + target_offset);
 }
 
 struct reloc_cache {
-	void *vaddr;
+	struct drm_i915_private *i915;
+	unsigned long vaddr;
 	unsigned page;
-	enum { KMAP, IOMAP } type;
+	struct drm_mm_node node;
+	bool use_64bit_reloc;
 };
 
-static void reloc_cache_init(struct reloc_cache *cache)
+static void reloc_cache_init(struct reloc_cache *cache,
+			     struct drm_i915_private *i915)
 {
 	cache->page = -1;
-	cache->vaddr = NULL;
+	cache->vaddr = 0;
+	cache->i915 = i915;
+	cache->use_64bit_reloc = INTEL_INFO(cache->i915)->gen >= 8;
+}
+
+static inline void *unmask_page(unsigned long p)
+{
+	return (void *)(uintptr_t)(p & PAGE_MASK);
 }
 
+static inline unsigned unmask_flags(unsigned long p)
+{
+	return p & ~PAGE_MASK;
+}
+
+#define KMAP 0x4
+
 static void reloc_cache_fini(struct reloc_cache *cache)
 {
-	if (cache->vaddr == NULL)
+	void *vaddr;
+
+	if (cache->vaddr == 0)
 		return;
 
-	switch (cache->type) {
-	case KMAP: kunmap_atomic(cache->vaddr); break;
-	case IOMAP: io_mapping_unmap_atomic(cache->vaddr); break;
+	vaddr = unmask_page(cache->vaddr);
+	if (cache->vaddr & KMAP) {
+		if (cache->vaddr & CLFLUSH_AFTER)
+			mb();
+
+		kunmap_atomic(vaddr);
+		i915_gem_object_unpin_pages((struct drm_i915_gem_object *)cache->node.mm);
+	} else {
+		io_mapping_unmap_atomic(vaddr);
+		i915_vma_unpin((struct i915_vma *)cache->node.mm);
 	}
 }
 
@@ -332,149 +364,138 @@ static void *reloc_kmap(struct drm_i915_gem_object *obj,
 			struct reloc_cache *cache,
 			int page)
 {
-	if (cache->page == page)
-		return cache->vaddr;
+	void *vaddr;
+
+	if (cache->vaddr) {
+		kunmap_atomic(unmask_page(cache->vaddr));
+	} else {
+		unsigned flushes;
+		int ret;
+
+		ret = i915_gem_obj_prepare_shmem_write(obj, &flushes);
+		if (ret)
+			return ERR_PTR(ret);
 
-	if (cache->vaddr)
-		kunmap_atomic(cache->vaddr);
+		cache->vaddr = flushes | KMAP;
+		cache->node.mm = (void *)obj;
+		if (flushes)
+			mb();
+	}
 
+	vaddr = kmap_atomic(i915_gem_object_get_dirty_page(obj, page));
+	cache->vaddr = unmask_flags(cache->vaddr) | (unsigned long)vaddr;
 	cache->page = page;
-	cache->vaddr = kmap_atomic(i915_gem_object_get_dirty_page(obj, page));
-	cache->type = KMAP;
 
-	return cache->vaddr;
+	return vaddr;
 }
 
-static int
-relocate_entry_cpu(struct drm_i915_gem_object *obj,
-		   struct drm_i915_gem_relocation_entry *reloc,
-		   struct reloc_cache *cache,
-		   uint64_t target_offset)
+static void *reloc_iomap(struct drm_i915_gem_object *obj,
+			 struct reloc_cache *cache,
+			 int page)
 {
-	struct drm_device *dev = obj->base.dev;
-	uint32_t page_offset = offset_in_page(reloc->offset);
-	uint64_t delta = relocation_target(reloc, target_offset);
-	char *vaddr;
-	int ret;
+	void *vaddr;
 
-	ret = i915_gem_object_set_to_cpu_domain(obj, true);
-	if (ret)
-		return ret;
+	if (cache->vaddr) {
+		io_mapping_unmap_atomic(unmask_page(cache->vaddr));
+	} else {
+		struct i915_vma *vma;
+		int ret;
 
-	vaddr = reloc_kmap(obj, cache, reloc->offset >> PAGE_SHIFT);
-	*(uint32_t *)(vaddr + page_offset) = lower_32_bits(delta);
+		if (use_cpu_reloc(obj))
+			return NULL;
 
-	if (INTEL_INFO(dev)->gen >= 8) {
-		page_offset += sizeof(uint32_t);
-		if (page_offset == PAGE_SIZE) {
-			vaddr = reloc_kmap(obj, cache, cache->page + 1);
-			page_offset = 0;
-		}
-		*(uint32_t *)(vaddr + page_offset) = upper_32_bits(delta);
-	}
+		vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0,
+					       PIN_MAPPABLE | PIN_NONBLOCK);
+		if (IS_ERR(vma))
+			return NULL;
 
-	return 0;
-}
+		ret = i915_gem_object_set_to_gtt_domain(obj, true);
+		if (ret)
+			return ERR_PTR(ret);
 
-static void *reloc_iomap(struct drm_i915_private *i915,
-			 struct reloc_cache *cache,
-			 uint64_t offset)
-{
-	if (cache->page == offset >> PAGE_SHIFT)
-		return cache->vaddr;
+		ret = i915_gem_object_put_fence(obj);
+		if (ret)
+			return ERR_PTR(ret);
 
-	if (cache->vaddr)
-		io_mapping_unmap_atomic(cache->vaddr);
+		cache->node.start = vma->node.start;
+		cache->node.mm = (void *)vma;
+	}
 
-	cache->page = offset >> PAGE_SHIFT;
-	cache->vaddr =
-		io_mapping_map_atomic_wc(i915->gtt.mappable,
-					 offset & PAGE_MASK);
-	cache->type = IOMAP;
+	vaddr = io_mapping_map_atomic_wc(cache->i915->gtt.mappable,
+					 cache->node.start + (page << PAGE_SHIFT));
+	cache->page = page;
+	cache->vaddr = (unsigned long)vaddr;
 
-	return cache->vaddr;
+	return vaddr;
 }
 
-static int
-relocate_entry_gtt(struct drm_i915_gem_object *obj,
-		   struct drm_i915_gem_relocation_entry *reloc,
-		   struct reloc_cache *cache,
-		   uint64_t target_offset)
+static void *reloc_vaddr(struct drm_i915_gem_object *obj,
+			 struct reloc_cache *cache,
+			 int page)
 {
-	struct drm_device *dev = obj->base.dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct i915_vma *vma;
-	uint64_t delta = relocation_target(reloc, target_offset);
-	uint64_t offset;
-	void __iomem *reloc_page;
-	int ret;
-
-	vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0, PIN_MAPPABLE);
-	if (IS_ERR(vma))
-		return PTR_ERR(vma);
-
-	ret = i915_gem_object_set_to_gtt_domain(obj, true);
-	if (ret)
-		goto unpin;
-
-	ret = i915_gem_object_put_fence(obj);
-	if (ret)
-		goto unpin;
-
-	/* Map the page containing the relocation we're going to perform.  */
-	offset = vma->node.start;
-	offset += reloc->offset;
-	reloc_page = reloc_iomap(dev_priv, cache, offset);
-	iowrite32(lower_32_bits(delta), reloc_page + offset_in_page(offset));
+	void *vaddr;
 
-	if (INTEL_INFO(dev)->gen >= 8) {
-		offset += sizeof(uint32_t);
-		if (offset_in_page(offset) == 0)
-			reloc_page = reloc_iomap(dev_priv, cache, offset);
-		iowrite32(upper_32_bits(delta),
-			  reloc_page + offset_in_page(offset));
+	if (cache->page == page) {
+		vaddr = unmask_page(cache->vaddr);
+	} else {
+		vaddr = NULL;
+		if ((cache->vaddr & KMAP) == 0)
+			vaddr = reloc_iomap(obj, cache, page);
+		if (vaddr == NULL)
+			vaddr = reloc_kmap(obj, cache, page);
 	}
 
-unpin:
-	__i915_vma_unpin(vma);
-	return ret;
+	return vaddr;
 }
 
 static void
-clflush_write32(void *addr, uint32_t value)
+clflush_write32(uint32_t *addr, uint32_t value, unsigned flushes)
 {
-	/* This is not a fast path, so KISS. */
-	drm_clflush_virt_range(addr, sizeof(uint32_t));
-	*(uint32_t *)addr = value;
-	drm_clflush_virt_range(addr, sizeof(uint32_t));
+	if (unlikely(flushes)) {
+		if (flushes & CLFLUSH_BEFORE) {
+			clflushopt(addr);
+			mb();
+		}
+
+		*addr = value;
+
+		/* Writes to the same cacheline are serialised by the CPU
+		 * (including clflush). On the write path, we only require
+		 * that it hits memory in an orderly fashion and place
+		 * mb barriers at the start and end of the relocation phase
+		 * to ensure ordering of clflush wrt to the system.
+		 */
+		if (flushes & CLFLUSH_AFTER)
+			clflushopt(addr);
+	} else
+		*addr = value;
 }
 
 static int
-relocate_entry_clflush(struct drm_i915_gem_object *obj,
-		       struct drm_i915_gem_relocation_entry *reloc,
-		       struct reloc_cache *cache,
-		       uint64_t target_offset)
+relocate_entry(struct drm_i915_gem_object *obj,
+	       const struct drm_i915_gem_relocation_entry *reloc,
+	       struct reloc_cache *cache,
+	       uint64_t target_offset)
 {
-	struct drm_device *dev = obj->base.dev;
-	uint32_t page_offset = offset_in_page(reloc->offset);
-	uint64_t delta = relocation_target(reloc, target_offset);
-	char *vaddr;
-	int ret;
+	uint32_t offset = reloc->offset;
+	bool wide = cache->use_64bit_reloc;
+	void *vaddr;
 
-	ret = i915_gem_object_set_to_gtt_domain(obj, true);
-	if (ret)
-		return ret;
+	target_offset = relocation_target(reloc, target_offset);
+repeat:
+	vaddr = reloc_vaddr(obj, cache, offset >> PAGE_SHIFT);
+	if (IS_ERR(vaddr))
+		return PTR_ERR(vaddr);
 
-	vaddr = reloc_kmap(obj, cache, reloc->offset >> PAGE_SHIFT);
-	clflush_write32(vaddr + page_offset, lower_32_bits(delta));
+	clflush_write32(vaddr + offset_in_page(offset),
+			lower_32_bits(target_offset),
+			unmask_flags(cache->vaddr) & (CLFLUSH_BEFORE | CLFLUSH_AFTER));
 
-	if (INTEL_INFO(dev)->gen >= 8) {
-		page_offset += sizeof(uint32_t);
-		if (page_offset == PAGE_SIZE) {
-			vaddr = reloc_kmap(obj, cache, cache->page + 1);
-			page_offset = 0;
-		}
-		clflush_write32(vaddr + page_offset, upper_32_bits(delta));
+	if (wide) {
+		offset += sizeof(uint32_t);
+		target_offset >>= 32;
+		wide = false;
+		goto repeat;
 	}
 
 	return 0;
@@ -547,7 +568,7 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
 
 	/* Check that the relocation address is valid... */
 	if (unlikely(reloc->offset >
-		obj->base.size - (INTEL_INFO(dev)->gen >= 8 ? 8 : 4))) {
+		     obj->base.size - (cache->use_64bit_reloc ? 8 : 4))) {
 		DRM_DEBUG("Relocation beyond object bounds: "
 			  "obj %p target %d offset %d size %d.\n",
 			  obj, reloc->target_handle,
@@ -567,23 +588,12 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
 	if (i915_gem_object_is_active(obj) && pagefault_disabled())
 		return -EFAULT;
 
-	if (use_cpu_reloc(obj))
-		ret = relocate_entry_cpu(obj, reloc, cache, target_offset);
-	else if (obj->map_and_fenceable)
-		ret = relocate_entry_gtt(obj, reloc, cache, target_offset);
-	else if (cpu_has_clflush)
-		ret = relocate_entry_clflush(obj, reloc, cache, target_offset);
-	else {
-		WARN_ONCE(1, "Impossible case in relocation handling\n");
-		ret = -ENODEV;
-	}
-
+	ret = relocate_entry(obj, reloc, cache, target_offset);
 	if (ret)
 		return ret;
 
 	/* and update the user's relocation entry */
 	reloc->presumed_offset = target_offset;
-
 	return 0;
 }
 
@@ -599,7 +609,7 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
 	int remain, ret = 0;
 
 	user_relocs = to_user_ptr(entry->relocs_ptr);
-	reloc_cache_init(&cache);
+	reloc_cache_init(&cache, eb->i915);
 
 	remain = entry->relocation_count;
 	while (remain) {
@@ -649,7 +659,7 @@ i915_gem_execbuffer_relocate_vma_slow(struct i915_vma *vma,
 	struct reloc_cache cache;
 	int i, ret = 0;
 
-	reloc_cache_init(&cache);
+	reloc_cache_init(&cache, eb->i915);
 	for (i = 0; i < entry->relocation_count; i++) {
 		ret = i915_gem_execbuffer_relocate_entry(vma->obj, eb, &relocs[i], &cache);
 		if (ret)
@@ -1060,8 +1070,8 @@ i915_gem_execbuffer_move_to_gpu(struct drm_i915_gem_request *req,
 	if (flush_chipset)
 		i915_gem_chipset_flush(req->engine->dev);
 
-	if (flush_domains & I915_GEM_DOMAIN_GTT)
-		wmb();
+	/* Make sure (untracked) CPU relocs/parsing are flushed */
+	wmb();
 
 	/* Unconditionally invalidate gpu caches and TLBs. */
 	return req->engine->emit_flush(req, I915_GEM_GPU_DOMAINS, 0);
@@ -1582,7 +1592,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 
 	memset(&params_master, 0x00, sizeof(params_master));
 
-	eb = eb_create(args);
+	eb = eb_create(dev_priv, args);
 	if (eb == NULL) {
 		i915_gem_context_unreference(ctx);
 		mutex_unlock(&dev->struct_mutex);
-- 
2.7.0.rc3


* [PATCH 135/190] drm/i915: Move map-and-fenceable tracking to the VMA
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
                     ` (46 preceding siblings ...)
  2016-01-11 10:45   ` [PATCH 134/190] drm/i915: Refactor execbuffer relocation writing Chris Wilson
@ 2016-01-11 10:45   ` Chris Wilson
  2016-01-11 10:45   ` [PATCH 136/190] drm/i915: Move ioremap_wc tracking onto VMA Chris Wilson
                     ` (5 subsequent siblings)
  53 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:45 UTC (permalink / raw)
  To: intel-gfx

By moving map-and-fenceable tracking from the object to the VMA, we gain
fine-grained tracking and the ability to track individual fences on the VMA
(subsequent patch).
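
The practical effect is visible in i915_vma_misplaced() below: pin
checks now consult the binding actually being pinned rather than a
single object-wide flag, e.g.:

  if (flags & PIN_MAPPABLE && !vma->map_and_fenceable)
          return true; /* only this vma needs to be rebound */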

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c        | 46 +++++++++++++-----------------
 drivers/gpu/drm/i915/i915_drv.h            |  6 ----
 drivers/gpu/drm/i915/i915_gem.c            | 31 +++++++++-----------
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  3 +-
 drivers/gpu/drm/i915/i915_gem_fence.c      |  5 +---
 drivers/gpu/drm/i915/i915_gem_gtt.h        |  6 ++++
 drivers/gpu/drm/i915/i915_gem_tiling.c     |  4 +--
 drivers/gpu/drm/i915/intel_display.c       |  6 ++--
 8 files changed, 46 insertions(+), 61 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index e923dc192f54..418b80de5246 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -112,19 +112,6 @@ static inline const char *get_global_flag(struct drm_i915_gem_object *obj)
 	return i915_gem_object_to_ggtt(obj, NULL) ? "g" : " ";
 }
 
-static u64 i915_gem_obj_total_ggtt_size(struct drm_i915_gem_object *obj)
-{
-	u64 size = 0;
-	struct i915_vma *vma;
-
-	list_for_each_entry(vma, &obj->vma_list, obj_link) {
-		if (vma->is_ggtt && drm_mm_node_allocated(&vma->node))
-			size += vma->node.size;
-	}
-
-	return size;
-}
-
 static void
 describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 {
@@ -309,17 +296,6 @@ static int i915_gem_stolen_list_info(struct seq_file *m, void *data)
 	return 0;
 }
 
-#define count_objects(list, member) do { \
-	list_for_each_entry(obj, list, member) { \
-		size += i915_gem_obj_total_ggtt_size(obj); \
-		++count; \
-		if (obj->map_and_fenceable) { \
-			mappable_size += obj->base.size; \
-			++mappable_count; \
-		} \
-	} \
-} while (0)
-
 struct file_stats {
 	struct drm_i915_file_private *file_priv;
 	unsigned long count;
@@ -404,7 +380,7 @@ static void print_batch_pool_stats(struct seq_file *m,
 	list_for_each_entry(vma, list, member) { \
 		size += vma->size; \
 		++count; \
-		if (vma->obj->map_and_fenceable) { \
+		if (vma->map_and_fenceable) { \
 			mappable_size += vma->size; \
 			++mappable_count; \
 		} \
@@ -433,7 +409,25 @@ static int i915_gem_object_info(struct seq_file *m, void* data)
 		   dev_priv->mm.object_memory);
 
 	size = count = mappable_size = mappable_count = 0;
-	count_objects(&dev_priv->mm.bound_list, global_list);
+	list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list) {
+		bool allocated = false, mappable = false;
+
+		list_for_each_entry(vma, &obj->vma_list, obj_link) {
+			if (!vma->is_ggtt)
+				continue;
+
+			allocated = true;
+			size += vma->node.size;
+
+			if (vma->map_and_fenceable) {
+				mappable = true;
+				mappable_size += vma->node.size;
+			}
+		}
+
+		count += allocated;
+		mappable_count += mappable;
+	}
 	seq_printf(m, "%u [%u] objects, %llu [%llu] bytes in gtt\n",
 		   count, mappable_count, size, mappable_size);
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 45a0da0947cd..cfc4430d3b50 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2078,12 +2078,6 @@ struct drm_i915_gem_object {
 	unsigned int fence_dirty:1;
 
 	/**
-	 * Is the object at the current location in the gtt mappable and
-	 * fenceable? Used to avoid costly recalculations.
-	 */
-	unsigned int map_and_fenceable:1;
-
-	/**
 	 * Whether the current gtt mapping needs to be mappable (and isn't just
 	 * mappable by accident). Track pin and fault separate for a more
 	 * accurate mappable working set.
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d81821c6f9a1..0c4e8e1aeeff 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2691,13 +2691,15 @@ int i915_vma_unbind(struct i915_vma *vma)
 	GEM_BUG_ON(obj->bind_count == 0);
 	GEM_BUG_ON(obj->pages == NULL);
 
-	if (vma->is_ggtt && vma->ggtt_view.type == I915_GGTT_VIEW_NORMAL) {
+	if (vma->map_and_fenceable) {
 		i915_gem_object_finish_gtt(obj);
 
 		/* release the fence reg _after_ flushing */
 		ret = i915_gem_object_put_fence(obj);
 		if (ret)
 			return ret;
+
+		vma->map_and_fenceable = false;
 	}
 
 	if (likely(!vma->vm->closed)) {
@@ -2709,10 +2711,8 @@ int i915_vma_unbind(struct i915_vma *vma)
 	drm_mm_remove_node(&vma->node);
 	list_move_tail(&vma->vm_link, &vma->vm->unbound_list);
 
-	if (vma->is_ggtt) {
-		if (vma->ggtt_view.type == I915_GGTT_VIEW_NORMAL) {
-			obj->map_and_fenceable = false;
-		} else if (vma->ggtt_view.pages) {
+	if (vma->ggtt_view.pages) {
+		if (vma->ggtt_view.type != I915_GGTT_VIEW_NORMAL) {
 			sg_free_table(vma->ggtt_view.pages);
 			kfree(vma->ggtt_view.pages);
 		}
@@ -3480,8 +3480,6 @@ i915_vma_misplaced(struct i915_vma *vma,
 		   uint64_t alignment,
 		   uint64_t flags)
 {
-	struct drm_i915_gem_object *obj = vma->obj;
-
 	if (!drm_mm_node_allocated(&vma->node))
 		return false;
 
@@ -3491,7 +3489,7 @@ i915_vma_misplaced(struct i915_vma *vma,
 	if (alignment && vma->node.start & (alignment - 1))
 		return true;
 
-	if (flags & PIN_MAPPABLE && !obj->map_and_fenceable)
+	if (flags & PIN_MAPPABLE && !vma->map_and_fenceable)
 		return true;
 
 	if (flags & PIN_OFFSET_BIAS &&
@@ -3511,13 +3509,10 @@ void __i915_vma_set_map_and_fenceable(struct i915_vma *vma)
 	bool mappable, fenceable;
 	u32 fence_size, fence_alignment;
 
-	fence_size = i915_gem_get_gtt_size(obj->base.dev,
-					   obj->base.size,
+	fence_size = i915_gem_get_gtt_size(obj->base.dev, vma->size,
 					   obj->tiling_mode);
-	fence_alignment = i915_gem_get_gtt_alignment(obj->base.dev,
-						     obj->base.size,
-						     obj->tiling_mode,
-						     true);
+	fence_alignment = i915_gem_get_gtt_alignment(obj->base.dev, vma->size,
+						     obj->tiling_mode, true);
 
 	fenceable = (vma->node.size == fence_size &&
 		     (vma->node.start & (fence_alignment - 1)) == 0);
@@ -3525,7 +3520,7 @@ void __i915_vma_set_map_and_fenceable(struct i915_vma *vma)
 	mappable = (vma->node.start + fence_size <=
 		    to_i915(obj->base.dev)->gtt.mappable_end);
 
-	obj->map_and_fenceable = mappable && fenceable;
+	vma->map_and_fenceable = mappable && fenceable;
 }
 
 int
@@ -3593,13 +3588,13 @@ i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj,
 
 		WARN(vma->pin_count,
 		     "bo is already pinned in ggtt with incorrect alignment:"
-		     " offset=%08x %08x, req.alignment=%llx, req.map_and_fenceable=%d,"
-		     " obj->map_and_fenceable=%d\n",
+		     " offset=%08x %08x, req.alignment=%llx,"
+		     " req.map_and_fenceable=%d, vma->map_and_fenceable=%d\n",
 		     upper_32_bits(vma->node.start),
 		     lower_32_bits(vma->node.start),
 		     (long long)alignment,
 		     !!(flags & PIN_MAPPABLE),
-		     obj->map_and_fenceable);
+		     vma->map_and_fenceable);
 		ret = i915_vma_unbind(vma);
 		if (ret)
 			return ERR_PTR(ret);
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 569be409c049..d13b7e507b3d 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -792,7 +792,6 @@ static bool
 eb_vma_misplaced(struct i915_vma *vma)
 {
 	struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
-	struct drm_i915_gem_object *obj = vma->obj;
 
 	WARN_ON(entry->flags & __EXEC_OBJECT_NEEDS_MAP && !vma->is_ggtt);
 
@@ -812,7 +811,7 @@ eb_vma_misplaced(struct i915_vma *vma)
 		return true;
 
 	/* avoid costly ping-pong once a batch bo ended up non-mappable */
-	if (entry->flags & __EXEC_OBJECT_NEEDS_MAP && !obj->map_and_fenceable)
+	if (entry->flags & __EXEC_OBJECT_NEEDS_MAP && !vma->map_and_fenceable)
 		return !only_mappable_for_reloc(entry->flags);
 
 	if ((entry->flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS) == 0 &&
diff --git a/drivers/gpu/drm/i915/i915_gem_fence.c b/drivers/gpu/drm/i915/i915_gem_fence.c
index 8ba05a0f15d2..e0f5fba22931 100644
--- a/drivers/gpu/drm/i915/i915_gem_fence.c
+++ b/drivers/gpu/drm/i915/i915_gem_fence.c
@@ -124,7 +124,7 @@ static void i915_write_fence_reg(struct drm_device *dev, int reg,
 		     (vma->node.start & (vma->node.size - 1)),
 		     "object 0x%08lx [fenceable? %d] not 1M or pot-size (0x%08lx) aligned\n",
 		     (long)vma->node.start,
-		     obj->map_and_fenceable,
+		     vma->map_and_fenceable,
 		     (long)vma->node.size);
 
 		if (obj->tiling_mode == I915_TILING_Y && HAS_128_BYTE_Y_TILING(dev))
@@ -378,9 +378,6 @@ i915_gem_object_get_fence(struct drm_i915_gem_object *obj)
 			return 0;
 		}
 	} else if (enable) {
-		if (WARN_ON(!obj->map_and_fenceable))
-			return -EINVAL;
-
 		reg = i915_find_fence_reg(dev);
 		if (IS_ERR(reg))
 			return PTR_ERR(reg);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 7f57dea246d8..6b0f557982d5 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -205,6 +205,12 @@ struct i915_vma {
 			unsigned int bound : 4;
 			unsigned int active : I915_NUM_RINGS;
 			bool is_ggtt : 1;
+			/**
+			 * Is the vma/object at the current location in the gtt
+			 * mappable and fenceable? Used to avoid costly
+			 * recalculations.
+			 */
+			bool map_and_fenceable : 1;
 			bool closed : 1;
 		};
 		unsigned int flags;
diff --git a/drivers/gpu/drm/i915/i915_gem_tiling.c b/drivers/gpu/drm/i915/i915_gem_tiling.c
index f83cb4329c8d..7c2da8060757 100644
--- a/drivers/gpu/drm/i915/i915_gem_tiling.c
+++ b/drivers/gpu/drm/i915/i915_gem_tiling.c
@@ -130,7 +130,7 @@ i915_gem_object_fence_ok(struct drm_i915_gem_object *obj, int tiling_mode)
 	if (vma == NULL)
 		return 0;
 
-	if (!obj->map_and_fenceable)
+	if (!vma->map_and_fenceable)
 		return 0;
 
 	if (INTEL_INFO(obj->base.dev)->gen == 3) {
@@ -141,7 +141,7 @@ i915_gem_object_fence_ok(struct drm_i915_gem_object *obj, int tiling_mode)
 			goto bad;
 	}
 
-	size = i915_gem_get_gtt_size(obj->base.dev, obj->base.size, tiling_mode);
+	size = i915_gem_get_gtt_size(obj->base.dev, vma->size, tiling_mode);
 	if (vma->node.size < size)
 		goto bad;
 
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 313f1fb144b9..218bfd3c99fc 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -2391,7 +2391,7 @@ intel_pin_and_fence_fb_obj(struct drm_plane *plane,
 	 * framebuffer compression.  For simplicity, we always install
 	 * a fence as the cost is not that onerous.
 	 */
-	if (view.type == I915_GGTT_VIEW_NORMAL) {
+	if (vma->map_and_fenceable) {
 		ret = i915_gem_object_get_fence(obj);
 		if (ret == -EDEADLK) {
 			/*
@@ -2430,11 +2430,11 @@ static void intel_unpin_fb_obj(struct drm_framebuffer *fb,
 	WARN_ON(!mutex_is_locked(&obj->base.dev->struct_mutex));
 
 	intel_fill_fb_ggtt_view(&view, fb, state);
+	vma = i915_gem_object_to_ggtt(obj, &view);
 
-	if (view.type == I915_GGTT_VIEW_NORMAL)
+	if (vma->map_and_fenceable)
 		i915_gem_object_unpin_fence(obj);
 
-	vma = i915_gem_object_to_ggtt(obj, &view);
 	i915_gem_object_unpin_from_display_plane(vma);
 }
 
-- 
2.7.0.rc3


* [PATCH 136/190] drm/i915: Move ioremap_wc tracking onto VMA
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
                     ` (47 preceding siblings ...)
  2016-01-11 10:45   ` [PATCH 135/190] drm/i915: Move map-and-fenceable tracking to the VMA Chris Wilson
@ 2016-01-11 10:45   ` Chris Wilson
  2016-02-11 13:20     ` Tvrtko Ursulin
  2016-01-11 10:45   ` [PATCH 137/190] drm/i915: Shrink pages around failure to dma map Chris Wilson
                     ` (4 subsequent siblings)
  53 siblings, 1 reply; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:45 UTC (permalink / raw)
  To: intel-gfx

By tracking the iomapping on the VMA itself, we can share that area
between multiple users. Also, by only revoking the iomapping upon
unbinding from the mappable portion of the GGTT, we can keep that iomap
across multiple invocations (e.g. execlists context pinning).
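
A caller then follows the pattern of the fbdev hunk below: map once,
reuse on every invocation, and leave the revoke to i915_vma_unbind():

  void *ptr;

  ptr = i915_vma_iomap(dev_priv, vma); /* cached on the vma */
  if (IS_ERR(ptr))
          return PTR_ERR(ptr);

  /* use ptr freely; no iounmap here, unbind drops the mapping */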

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c     |  5 +++++
 drivers/gpu/drm/i915/i915_gem_gtt.c | 33 +++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_gem_gtt.h |  4 ++++
 drivers/gpu/drm/i915/intel_fbdev.c  |  8 +++-----
 4 files changed, 45 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 0c4e8e1aeeff..5bb21b20c36a 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2699,6 +2699,11 @@ int i915_vma_unbind(struct i915_vma *vma)
 		if (ret)
 			return ret;
 
+		if (vma->iomap) {
+			iounmap(vma->iomap);
+			vma->iomap = NULL;
+		}
+
 		vma->map_and_fenceable = false;
 	}
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index b8af904ad12c..3fcf2fd73453 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -3575,3 +3575,36 @@ int i915_vma_bind(struct i915_vma *vma, enum i915_cache_level cache_level,
 
 	return 0;
 }
+
+void *i915_vma_iomap(struct drm_i915_private *dev_priv,
+		     struct i915_vma *vma)
+{
+	if (WARN_ON(!vma->map_and_fenceable))
+		return ERR_PTR(-ENODEV);
+
+	GEM_BUG_ON(!vma->is_ggtt);
+	GEM_BUG_ON((vma->bound & GLOBAL_BIND) == 0);
+
+	if (vma->iomap == NULL) {
+		u32 base = dev_priv->gtt.mappable_base + vma->node.start;
+		void *ptr;
+
+		ptr = ioremap_wc(base, vma->size);
+		if (ptr == NULL) {
+			int ret;
+
+			/* Too many areas already allocated? */
+			ret = i915_gem_evict_vm(vma->vm, true);
+			if (ret)
+				return ERR_PTR(ret);
+
+			ptr = ioremap_wc(base, vma->size);
+			if (ptr == NULL)
+				return ERR_PTR(-ENOMEM);
+		}
+
+		vma->iomap = ptr;
+	}
+
+	return vma->iomap;
+}
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 6b0f557982d5..0e0570e13a68 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -181,6 +181,7 @@ struct i915_vma {
 	struct drm_mm_node node;
 	struct drm_i915_gem_object *obj;
 	struct i915_address_space *vm;
+	void *iomap;
 	u64 size;
 
 	struct i915_gem_active last_read[I915_NUM_RINGS];
@@ -579,4 +580,7 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev);
 
 int __must_check i915_gem_gtt_prepare_object(struct drm_i915_gem_object *obj);
 void i915_gem_gtt_finish_object(struct drm_i915_gem_object *obj);
+
+void *i915_vma_iomap(struct drm_i915_private *dev_priv,
+		     struct i915_vma *vma);
 #endif
diff --git a/drivers/gpu/drm/i915/intel_fbdev.c b/drivers/gpu/drm/i915/intel_fbdev.c
index 7decbca25dbb..8e7c341951fd 100644
--- a/drivers/gpu/drm/i915/intel_fbdev.c
+++ b/drivers/gpu/drm/i915/intel_fbdev.c
@@ -248,12 +248,10 @@ static int intelfb_create(struct drm_fb_helper *helper,
 	info->fix.smem_start = dev->mode_config.fb_base + vma->node.start;
 	info->fix.smem_len = vma->node.size;
 
-	info->screen_base =
-		ioremap_wc(dev_priv->gtt.mappable_base + vma->node.start,
-			   vma->node.size);
-	if (!info->screen_base) {
+	info->screen_base = i915_vma_iomap(dev_priv, vma);
+	if (IS_ERR(info->screen_base)) {
 		DRM_ERROR("Failed to remap framebuffer into virtual memory\n");
-		ret = -ENOSPC;
+		ret = PTR_ERR(info->screen_base);
 		goto out_destroy_fbi;
 	}
 	info->screen_size = vma->node.size;
-- 
2.7.0.rc3


* [PATCH 137/190] drm/i915: Shrink pages around failure to dma map
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
                     ` (48 preceding siblings ...)
  2016-01-11 10:45   ` [PATCH 136/190] drm/i915: Move ioremap_wc tracking onto VMA Chris Wilson
@ 2016-01-11 10:45   ` Chris Wilson
  2016-01-11 10:45   ` [PATCH 138/190] drm/i915/userptr: Make gup errors stickier Chris Wilson
                     ` (3 subsequent siblings)
  53 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:45 UTC (permalink / raw)
  To: intel-gfx

Similar to how we handle resource allocation failure of both physical
memory and GGTT mmap space, if we fail to allocate our DMAR remapping,
shrink some of our other objects and try again.
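
The retry ladder, in outline (matching the diff below):

  /* 1. dma_map_sg()                                  -> success? done
   * 2. i915_gem_shrink(size, BOUND|UNBOUND|PURGEABLE), then retry
   * 3. i915_gem_shrink_all(), then retry one final time
   * 4. still failing                                 -> -ENOSPC
   */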

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 35 ++++++++++++++++++++++++++++++-----
 1 file changed, 30 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 3fcf2fd73453..59e7b11bf0ac 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2313,12 +2313,37 @@ void i915_gem_suspend_gtt_mappings(struct drm_device *dev)
 
 int i915_gem_gtt_prepare_object(struct drm_i915_gem_object *obj)
 {
-	if (!dma_map_sg(&obj->base.dev->pdev->dev,
-			obj->pages->sgl, obj->pages->nents,
-			PCI_DMA_BIDIRECTIONAL))
-		return -ENOSPC;
+	struct drm_i915_private *i915 = to_i915(obj->base.dev);
+	struct device *dma_device = &obj->base.dev->pdev->dev;
 
-	return 0;
+	if (dma_map_sg(dma_device,
+		       obj->pages->sgl, obj->pages->nents,
+		       PCI_DMA_BIDIRECTIONAL))
+		return 0;
+
+	/* If we fail here, it is likely we couldn't allocate the map, or
+	 * we have exhausted the available DMAR space. First throw out
+	 * objects to make enough room for the mapping and try again,
+	 * though this doesn't take fragmentation into account. So if we
+	 * still fail, throw out everything we can and start afresh.
+	 */
+	i915_gem_shrink(i915,
+			obj->base.size >> PAGE_SHIFT,
+			I915_SHRINK_BOUND |
+			I915_SHRINK_UNBOUND |
+			I915_SHRINK_PURGEABLE);
+	if (dma_map_sg(dma_device,
+		       obj->pages->sgl, obj->pages->nents,
+		       PCI_DMA_BIDIRECTIONAL))
+		return 0;
+
+	i915_gem_shrink_all(i915);
+	if (dma_map_sg(dma_device,
+		       obj->pages->sgl, obj->pages->nents,
+		       PCI_DMA_BIDIRECTIONAL))
+		return 0;
+
+	return -ENOSPC;
 }
 
 static void gen8_set_pte(void __iomem *addr, gen8_pte_t pte)
-- 
2.7.0.rc3


* [PATCH 138/190] drm/i915/userptr: Make gup errors stickier
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
                     ` (49 preceding siblings ...)
  2016-01-11 10:45   ` [PATCH 137/190] drm/i915: Shrink pages around failure to dma map Chris Wilson
@ 2016-01-11 10:45   ` Chris Wilson
  2016-01-11 10:45   ` [PATCH 139/190] drm/i915: Move fence tracking from object to vma Chris Wilson
                     ` (2 subsequent siblings)
  53 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:45 UTC (permalink / raw)
  To: intel-gfx

Keep any error reported by the gup_worker until we are notified that the
arena has changed (via the mmu-notifier). This ensures that two
consecutive calls to i915_gem_object_get_pages() report the same error,
curtailing a loop of detecting a fault and requeueing a gup_worker.
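
The resulting states of obj->userptr.work in
i915_gem_userptr_get_pages() are then (a summary, not new code):

  /* NULL        -> no worker pending: queue one (takes the active flag)
   * ERR_PTR(e)  -> sticky: report e again until the mmu-notifier
   *                invalidates the range
   * otherwise   -> worker still queued: -EAGAIN
   */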

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_userptr.c | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
index 2f922392bd10..53f8094b3198 100644
--- a/drivers/gpu/drm/i915/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
@@ -609,8 +609,6 @@ __i915_gem_userptr_get_pages_worker(struct work_struct *_work)
 			}
 		}
 		obj->userptr.work = ERR_PTR(ret);
-		if (ret)
-			__i915_gem_userptr_set_active(obj, false);
 	}
 
 	obj->userptr.workers--;
@@ -696,15 +694,14 @@ i915_gem_userptr_get_pages(struct drm_i915_gem_object *obj)
 	 * to the vma (discard or cloning) which should prevent the more
 	 * egregious cases from causing harm.
 	 */
-	if (IS_ERR(obj->userptr.work)) {
-		/* active flag will have been dropped already by the worker */
-		ret = PTR_ERR(obj->userptr.work);
-		obj->userptr.work = NULL;
-		return ret;
-	}
-	if (obj->userptr.work)
+
+	if (obj->userptr.work) {
 		/* active flag should still be held for the pending work */
-		return -EAGAIN;
+		if (IS_ERR(obj->userptr.work))
+			return PTR_ERR(obj->userptr.work);
+		else
+			return -EAGAIN;
+	}
 
 	/* Let the mmu-notifier know that we have begun and need cancellation */
 	ret = __i915_gem_userptr_set_active(obj, true);
-- 
2.7.0.rc3


* [PATCH 139/190] drm/i915: Move fence tracking from object to vma
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
                     ` (50 preceding siblings ...)
  2016-01-11 10:45   ` [PATCH 138/190] drm/i915/userptr: Make gup errors stickier Chris Wilson
@ 2016-01-11 10:45   ` Chris Wilson
  2016-01-11 10:45   ` [PATCH 140/190] drm/i915: Fix partial GGTT faulting Chris Wilson
  2016-01-11 10:45   ` [PATCH 141/190] drm/i915: Choose not to evict faultable objects from the GGTT Chris Wilson
  53 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:45 UTC (permalink / raw)
  To: intel-gfx

In order to handle tiled partial GTT mmappings, we need to associate the
fence with an individual vma.
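
The per-VMA API mirrors the old per-object one; the execbuffer hunks
below become the canonical usage:

  ret = i915_vma_get_fence(vma); /* synchronise fence state first */
  if (ret)
          return ret;

  if (i915_vma_pin_fence(vma)) /* true only if a fence is attached */
          entry->flags |= __EXEC_OBJECT_HAS_FENCE;

  /* ... and later, on unreserve: */
  if (entry->flags & __EXEC_OBJECT_HAS_FENCE)
          i915_vma_unpin_fence(vma);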

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c        |  15 +-
 drivers/gpu/drm/i915/i915_drv.h            |  81 ++++--
 drivers/gpu/drm/i915/i915_gem.c            |  34 ++-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  21 +-
 drivers/gpu/drm/i915/i915_gem_fence.c      | 381 +++++++++++------------------
 drivers/gpu/drm/i915/i915_gem_gtt.c        |   7 +
 drivers/gpu/drm/i915/i915_gem_gtt.h        |   9 +
 drivers/gpu/drm/i915/i915_gem_tiling.c     |  65 +++--
 drivers/gpu/drm/i915/i915_gpu_error.c      |   2 +-
 drivers/gpu/drm/i915/intel_display.c       |  57 ++---
 drivers/gpu/drm/i915/intel_fbc.c           |  30 ++-
 drivers/gpu/drm/i915/intel_fbdev.c         |   4 +-
 drivers/gpu/drm/i915/intel_overlay.c       |   2 +-
 13 files changed, 324 insertions(+), 384 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 418b80de5246..f15ed7793969 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -133,9 +133,8 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 	for_each_ring(ring, dev_priv, i)
 		seq_printf(m, "%x ",
 				i915_gem_request_get_seqno(obj->last_read[i].request));
-	seq_printf(m, "] %x %x%s%s%s",
+	seq_printf(m, "] %x %s%s%s",
 		   i915_gem_request_get_seqno(obj->last_write.request),
-		   i915_gem_request_get_seqno(obj->last_fence.request),
 		   i915_cache_level_str(to_i915(obj->base.dev), obj->cache_level),
 		   obj->dirty ? " dirty" : "",
 		   obj->madv == I915_MADV_DONTNEED ? " purgeable" : "");
@@ -148,8 +147,6 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 	seq_printf(m, " (pinned x %d)", pin_count);
 	if (obj->pin_display)
 		seq_printf(m, " (display)");
-	if (obj->fence_reg != I915_FENCE_REG_NONE)
-		seq_printf(m, " (fence: %d)", obj->fence_reg);
 	list_for_each_entry(vma, &obj->vma_list, obj_link) {
 		if (!drm_mm_node_allocated(&vma->node))
 			continue;
@@ -159,6 +156,10 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 			   vma->node.start, vma->node.size);
 		if (vma->is_ggtt)
 			seq_printf(m, ", type: %u", vma->ggtt_view.type);
+		if (vma->fence)
+			seq_printf(m, " , fence: %d%s",
+				   vma->fence->id,
+				   vma->last_fence.request ? "*" : "");
 		seq_puts(m, ")");
 	}
 	if (obj->stolen)
@@ -948,14 +949,14 @@ static int i915_gem_fence_regs_info(struct seq_file *m, void *data)
 
 	seq_printf(m, "Total fences = %d\n", dev_priv->num_fence_regs);
 	for (i = 0; i < dev_priv->num_fence_regs; i++) {
-		struct drm_i915_gem_object *obj = dev_priv->fence_regs[i].obj;
+		struct i915_vma *vma = dev_priv->fence_regs[i].vma;
 
 		seq_printf(m, "Fence %d, pin count = %d, object = ",
 			   i, dev_priv->fence_regs[i].pin_count);
-		if (obj == NULL)
+		if (vma == NULL)
 			seq_puts(m, "unused");
 		else
-			describe_obj(m, obj);
+			describe_obj(m, vma->obj);
 		seq_putc(m, '\n');
 	}
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index cfc4430d3b50..bb0f750bb5b5 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -458,15 +458,21 @@ struct intel_opregion {
 struct intel_overlay;
 struct intel_overlay_error_state;
 
-#define I915_FENCE_REG_NONE -1
-#define I915_MAX_NUM_FENCES 32
-/* 32 fences + sign bit for FENCE_REG_NONE */
-#define I915_MAX_NUM_FENCE_BITS 6
-
 struct drm_i915_fence_reg {
 	struct list_head lru_list;
-	struct drm_i915_gem_object *obj;
+	struct drm_i915_private *i915;
+	struct i915_vma *vma;
 	int pin_count;
+	int id;
+	/**
+	 * Whether the tiling parameters for the currently
+	 * associated fence register have changed. Note that
+	 * for the purposes of tracking tiling changes we also
+	 * treat the unfenced register, the register slot that
+	 * the object occupies whilst it executes a fenced
+	 * command (such as BLT on gen2/3), as a "fence".
+	 */
+	bool dirty;
 };
 
 struct sdvo_device_mapping {
@@ -2053,13 +2059,6 @@ struct drm_i915_gem_object {
 	unsigned int dirty:1;
 
 	/**
-	 * Fence register bits (if any) for this object.  Will be set
-	 * as needed when mapped into the GTT.
-	 * Protected by dev->struct_mutex.
-	 */
-	signed int fence_reg:I915_MAX_NUM_FENCE_BITS;
-
-	/**
 	 * Advice: are the backing pages purgeable?
 	 */
 	unsigned int madv:2;
@@ -2068,14 +2067,6 @@ struct drm_i915_gem_object {
 	 * Current tiling mode for the object.
 	 */
 	unsigned int tiling_mode:2;
-	/**
-	 * Whether the tiling parameters for the currently associated fence
-	 * register have changed. Note that for the purposes of tracking
-	 * tiling changes we also treat the unfenced register, the register
-	 * slot that the object occupies whilst it executes a fenced
-	 * command (such as BLT on gen2/3), as a "fence".
-	 */
-	unsigned int fence_dirty:1;
 
 	/**
 	 * Whether the current gtt mapping needs to be mappable (and isn't just
@@ -2118,7 +2109,6 @@ struct drm_i915_gem_object {
 	 */
 	struct i915_gem_active last_read[I915_NUM_RINGS];
 	struct i915_gem_active last_write;
-	struct i915_gem_active last_fence;
 
 	/** Current tiling stride for the object, if it's tiled. */
 	uint32_t stride;
@@ -2945,11 +2935,50 @@ i915_gem_object_ggtt_offset(struct drm_i915_gem_object *o,
 }
 
 /* i915_gem_fence.c */
-int __must_check i915_gem_object_get_fence(struct drm_i915_gem_object *obj);
-int __must_check i915_gem_object_put_fence(struct drm_i915_gem_object *obj);
+int __must_check i915_vma_get_fence(struct i915_vma *vma);
+int __must_check i915_vma_put_fence(struct i915_vma *vma);
 
-bool i915_gem_object_pin_fence(struct drm_i915_gem_object *obj);
-void i915_gem_object_unpin_fence(struct drm_i915_gem_object *obj);
+/**
+ * i915_vma_pin_fence - pin fencing state
+ * @vma: vma to pin fencing for
+ *
+ * This pins the fencing state (whether tiled or untiled) to make sure the
+ * vma (and its object) is ready to be used as a scanout target. Fencing
+ * status must be synchronize first by calling i915_vma_get_fence():
+ *
+ * The resulting fence pin reference must be released again with
+ * i915_vma_unpin_fence().
+ *
+ * Returns:
+ *
+ * True if the vma has a fence, false otherwise.
+ */
+static inline bool
+i915_vma_pin_fence(struct i915_vma *vma)
+{
+	if (vma->fence) {
+		vma->fence->pin_count++;
+		return true;
+	} else
+		return false;
+}
+
+/**
+ * i915_vma_unpin_fence - unpin fencing state
+ * @vma: vma to unpin fencing for
+ *
+ * This releases the fence pin reference acquired through
+ * i915_vma_pin_fence. It will handle both objects with and without an
+ * attached fence correctly, callers do not need to distinguish this.
+ */
+static inline void
+i915_vma_unpin_fence(struct i915_vma *vma)
+{
+	if (vma->fence) {
+		GEM_BUG_ON(vma->fence->pin_count <= 0);
+		vma->fence->pin_count--;
+	}
+}
 
 void i915_gem_restore_fences(struct drm_device *dev);
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 5bb21b20c36a..70397c1022d1 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -863,11 +863,11 @@ i915_gem_gtt_pwrite_fast(struct drm_device *dev,
 		goto out;
 	}
 
-	ret = i915_gem_object_set_to_gtt_domain(obj, true);
+	ret = i915_vma_put_fence(vma);
 	if (ret)
 		goto out_unpin;
 
-	ret = i915_gem_object_put_fence(obj);
+	ret = i915_gem_object_set_to_gtt_domain(obj, true);
 	if (ret)
 		goto out_unpin;
 
@@ -1507,7 +1507,7 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 	if (ret)
 		goto unpin;
 
-	ret = i915_gem_object_get_fence(obj);
+	ret = i915_vma_get_fence(ggtt);
 	if (ret)
 		goto unpin;
 
@@ -2112,12 +2112,6 @@ void *i915_gem_object_pin_vmap(struct drm_i915_gem_object *obj)
 }
 
 static void
-i915_gem_object_retire__fence(struct i915_gem_active *active,
-			      struct drm_i915_gem_request *req)
-{
-}
-
-static void
 i915_gem_object_retire__write(struct i915_gem_active *active,
 			      struct drm_i915_gem_request *request)
 {
@@ -2646,6 +2640,7 @@ static void i915_vma_destroy(struct i915_vma *vma)
 	GEM_BUG_ON(vma->node.allocated);
 	GEM_BUG_ON(vma->active);
 	GEM_BUG_ON(!vma->closed);
+	GEM_BUG_ON(vma->fence);
 
 	list_del(&vma->vm_link);
 	if (!vma->is_ggtt)
@@ -2695,7 +2690,7 @@ int i915_vma_unbind(struct i915_vma *vma)
 		i915_gem_object_finish_gtt(obj);
 
 		/* release the fence reg _after_ flushing */
-		ret = i915_gem_object_put_fence(obj);
+		ret = i915_vma_put_fence(vma);
 		if (ret)
 			return ret;
 
@@ -3163,9 +3158,11 @@ restart:
 			 * dropped the fence as all snoopable access is
 			 * supposed to be linear.
 			 */
-			ret = i915_gem_object_put_fence(obj);
-			if (ret)
-				return ret;
+			list_for_each_entry(vma, &obj->vma_list, obj_link) {
+				ret = i915_vma_put_fence(vma);
+				if (ret)
+					return ret;
+			}
 		} else {
 			/* We either have incoherent backing store and
 			 * so no GTT access or the architecture is fully
@@ -3722,15 +3719,12 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj,
 				    i915_gem_object_retire__read);
 	init_request_active(&obj->last_write,
 			    i915_gem_object_retire__write);
-	init_request_active(&obj->last_fence,
-			    i915_gem_object_retire__fence);
 	INIT_LIST_HEAD(&obj->obj_exec_link);
 	INIT_LIST_HEAD(&obj->vma_list);
 	INIT_LIST_HEAD(&obj->batch_pool_link);
 
 	obj->ops = ops;
 
-	obj->fence_reg = I915_FENCE_REG_NONE;
 	obj->madv = I915_MADV_WILLNEED;
 
 	i915_gem_info_add_obj(obj->base.dev->dev_private, obj->base.size);
@@ -4241,8 +4235,6 @@ i915_gem_load(struct drm_device *dev)
 	INIT_LIST_HEAD(&dev_priv->mm.fence_list);
 	for (i = 0; i < I915_NUM_RINGS; i++)
 		init_ring_lists(&dev_priv->ring[i]);
-	for (i = 0; i < I915_MAX_NUM_FENCES; i++)
-		INIT_LIST_HEAD(&dev_priv->fence_regs[i].lru_list);
 	INIT_DELAYED_WORK(&dev_priv->mm.retire_work,
 			  i915_gem_retire_work_handler);
 	INIT_DELAYED_WORK(&dev_priv->mm.idle_work,
@@ -4266,6 +4258,12 @@ i915_gem_load(struct drm_device *dev)
 
 	/* Initialize fence registers to zero */
 	INIT_LIST_HEAD(&dev_priv->mm.fence_list);
+	for (i = 0; i < dev_priv->num_fence_regs; i++) {
+		struct drm_i915_fence_reg *fence = &dev_priv->fence_regs[i];
+		fence->i915 = dev_priv;
+		fence->id = i;
+		list_add_tail(&fence->lru_list, &dev_priv->mm.fence_list);
+	}
 	i915_gem_restore_fences(dev);
 
 	i915_gem_detect_bit_6_swizzle(dev);
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index d13b7e507b3d..691da0085ff4 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -247,7 +247,6 @@ static void
 i915_gem_execbuffer_unreserve_vma(struct i915_vma *vma)
 {
 	struct drm_i915_gem_exec_object2 *entry;
-	struct drm_i915_gem_object *obj = vma->obj;
 
 	if (!drm_mm_node_allocated(&vma->node))
 		return;
@@ -255,7 +254,7 @@ i915_gem_execbuffer_unreserve_vma(struct i915_vma *vma)
 	entry = vma->exec_entry;
 
 	if (entry->flags & __EXEC_OBJECT_HAS_FENCE)
-		i915_gem_object_unpin_fence(obj);
+		i915_vma_unpin_fence(vma);
 
 	if (entry->flags & __EXEC_OBJECT_HAS_PIN)
 		__i915_vma_unpin(vma);
@@ -409,11 +408,11 @@ static void *reloc_iomap(struct drm_i915_gem_object *obj,
 		if (IS_ERR(vma))
 			return NULL;
 
-		ret = i915_gem_object_set_to_gtt_domain(obj, true);
+		ret = i915_vma_put_fence(vma);
 		if (ret)
 			return ERR_PTR(ret);
 
-		ret = i915_gem_object_put_fence(obj);
+		ret = i915_gem_object_set_to_gtt_domain(obj, true);
 		if (ret)
 			return ERR_PTR(ret);
 
@@ -746,11 +745,11 @@ i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
 	entry->flags |= __EXEC_OBJECT_HAS_PIN;
 
 	if (entry->flags & EXEC_OBJECT_NEEDS_FENCE) {
-		ret = i915_gem_object_get_fence(obj);
+		ret = i915_vma_get_fence(vma);
 		if (ret)
 			return ret;
 
-		if (i915_gem_object_pin_fence(obj))
+		if (i915_vma_pin_fence(vma))
 			entry->flags |= __EXEC_OBJECT_HAS_FENCE;
 	}
 
@@ -1227,14 +1226,8 @@ void i915_vma_move_to_active(struct i915_vma *vma,
 		obj->base.write_domain &= ~I915_GEM_GPU_DOMAINS;
 	}
 
-	if (flags & EXEC_OBJECT_NEEDS_FENCE) {
-		i915_gem_request_mark_active(req, &obj->last_fence);
-		if (flags & __EXEC_OBJECT_HAS_FENCE) {
-			struct drm_i915_private *dev_priv = req->i915;
-			list_move_tail(&dev_priv->fence_regs[obj->fence_reg].lru_list,
-				       &dev_priv->mm.fence_list);
-		}
-	}
+	if (flags & EXEC_OBJECT_NEEDS_FENCE)
+		i915_gem_request_mark_active(req, &vma->last_fence);
 
 	vma->active |= 1 << engine;
 	i915_gem_request_mark_active(req, &vma->last_read[engine]);
diff --git a/drivers/gpu/drm/i915/i915_gem_fence.c b/drivers/gpu/drm/i915/i915_gem_fence.c
index e0f5fba22931..073601ec227a 100644
--- a/drivers/gpu/drm/i915/i915_gem_fence.c
+++ b/drivers/gpu/drm/i915/i915_gem_fence.c
@@ -55,67 +55,66 @@
  * CPU ptes into GTT mmaps (not the GTT ptes themselves) as needed.
  */
 
-static void i965_write_fence_reg(struct drm_device *dev, int reg,
-				 struct drm_i915_gem_object *obj)
+static void i965_write_fence_reg(struct drm_i915_fence_reg *fence,
+				 struct i915_vma *vma)
 {
-	struct drm_i915_private *dev_priv = dev->dev_private;
 	i915_reg_t fence_reg_lo, fence_reg_hi;
 	int fence_pitch_shift;
+	u64 val;
 
-	if (INTEL_INFO(dev)->gen >= 6) {
-		fence_reg_lo = FENCE_REG_GEN6_LO(reg);
-		fence_reg_hi = FENCE_REG_GEN6_HI(reg);
+	if (INTEL_INFO(fence->i915)->gen >= 6) {
+		fence_reg_lo = FENCE_REG_GEN6_LO(fence->id);
+		fence_reg_hi = FENCE_REG_GEN6_HI(fence->id);
 		fence_pitch_shift = GEN6_FENCE_PITCH_SHIFT;
+
 	} else {
-		fence_reg_lo = FENCE_REG_965_LO(reg);
-		fence_reg_hi = FENCE_REG_965_HI(reg);
+		fence_reg_lo = FENCE_REG_965_LO(fence->id);
+		fence_reg_hi = FENCE_REG_965_HI(fence->id);
 		fence_pitch_shift = I965_FENCE_PITCH_SHIFT;
 	}
 
-	/* To w/a incoherency with non-atomic 64-bit register updates,
-	 * we split the 64-bit update into two 32-bit writes. In order
-	 * for a partial fence not to be evaluated between writes, we
-	 * precede the update with write to turn off the fence register,
-	 * and only enable the fence as the last step.
-	 *
-	 * For extra levels of paranoia, we make sure each step lands
-	 * before applying the next step.
-	 */
-	I915_WRITE(fence_reg_lo, 0);
-	POSTING_READ(fence_reg_lo);
-
-	if (obj) {
-		struct i915_vma *vma = i915_gem_object_to_ggtt(obj, NULL);
-		u32 row_size = obj->stride * (obj->tiling_mode == I915_TILING_Y  ? 32 : 8);
+	if (vma) {
+		u32 stride = vma->obj->stride;
+		unsigned tiling_y = vma->obj->tiling_mode == I915_TILING_Y;
+		u32 row_size = stride * (tiling_y ? 32 : 8);
 		u32 size = (u32)vma->node.size / row_size * row_size;
-		u64 val;
 
 		val = ((vma->node.start + size - 4096) & 0xfffff000) << 32;
 		val |= vma->node.start & 0xfffff000;
-		val |= (u64)((obj->stride / 128) - 1) << fence_pitch_shift;
-		if (obj->tiling_mode == I915_TILING_Y)
+		val |= (u64)((stride / 128) - 1) << fence_pitch_shift;
+		if (tiling_y)
 			val |= 1 << I965_FENCE_TILING_Y_SHIFT;
 		val |= I965_FENCE_REG_VALID;
+	} else
+		val = 0;
 
-		I915_WRITE(fence_reg_hi, val >> 32);
-		POSTING_READ(fence_reg_hi);
+	if (1) {
+		struct drm_i915_private *dev_priv = fence->i915;
 
-		I915_WRITE(fence_reg_lo, val);
+		/* To w/a incoherency with non-atomic 64-bit register updates,
+		 * we split the 64-bit update into two 32-bit writes. In order
+		 * for a partial fence not to be evaluated between writes, we
+		 * precede the update with write to turn off the fence register,
+		 * and only enable the fence as the last step.
+		 *
+		 * For extra levels of paranoia, we make sure each step lands
+		 * before applying the next step.
+		 */
+		I915_WRITE(fence_reg_lo, 0);
+		POSTING_READ(fence_reg_lo);
+
+		I915_WRITE(fence_reg_hi, upper_32_bits(val));
+		I915_WRITE(fence_reg_lo, lower_32_bits(val));
 		POSTING_READ(fence_reg_lo);
-	} else {
-		I915_WRITE(fence_reg_hi, 0);
-		POSTING_READ(fence_reg_hi);
 	}
 }
 
-static void i915_write_fence_reg(struct drm_device *dev, int reg,
-				 struct drm_i915_gem_object *obj)
+static void i915_write_fence_reg(struct drm_i915_fence_reg *fence,
+				 struct i915_vma *vma)
 {
-	struct drm_i915_private *dev_priv = dev->dev_private;
 	u32 val;
 
-	if (obj) {
-		struct i915_vma *vma = i915_gem_object_to_ggtt(obj, NULL);
+	if (vma) {
 		int pitch_val;
 		int tile_width;
 
@@ -127,17 +126,18 @@ static void i915_write_fence_reg(struct drm_device *dev, int reg,
 		     vma->map_and_fenceable,
 		     (long)vma->node.size);
 
-		if (obj->tiling_mode == I915_TILING_Y && HAS_128_BYTE_Y_TILING(dev))
+		if (vma->obj->tiling_mode == I915_TILING_Y &&
+		    HAS_128_BYTE_Y_TILING(fence->i915))
 			tile_width = 128;
 		else
 			tile_width = 512;
 
 		/* Note: pitch better be a power of two tile widths */
-		pitch_val = obj->stride / tile_width;
+		pitch_val = vma->obj->stride / tile_width;
 		pitch_val = ffs(pitch_val) - 1;
 
 		val = vma->node.start;
-		if (obj->tiling_mode == I915_TILING_Y)
+		if (vma->obj->tiling_mode == I915_TILING_Y)
 			val |= 1 << I830_FENCE_TILING_Y_SHIFT;
 		val |= I915_FENCE_SIZE_BITS(vma->node.size);
 		val |= pitch_val << I830_FENCE_PITCH_SHIFT;
@@ -145,18 +145,20 @@ static void i915_write_fence_reg(struct drm_device *dev, int reg,
 	} else
 		val = 0;
 
-	I915_WRITE(FENCE_REG(reg), val);
-	POSTING_READ(FENCE_REG(reg));
+	if (1) {
+		struct drm_i915_private *dev_priv = fence->i915;
+		i915_reg_t reg = FENCE_REG(fence->id);
+		I915_WRITE(reg, val);
+		POSTING_READ(reg);
+	}
 }
 
-static void i830_write_fence_reg(struct drm_device *dev, int reg,
-				struct drm_i915_gem_object *obj)
+static void i830_write_fence_reg(struct drm_i915_fence_reg *fence,
+				 struct i915_vma *vma)
 {
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	uint32_t val;
+	u32 val;
 
-	if (obj) {
-		struct i915_vma *vma = i915_gem_object_to_ggtt(obj, NULL);
+	if (vma) {
 		uint32_t pitch_val;
 
 		WARN((vma->node.start & ~I830_FENCE_START_MASK) ||
@@ -165,11 +167,11 @@ static void i830_write_fence_reg(struct drm_device *dev, int reg,
 		     "object 0x%08lx not 512K or pot-size 0x%08lx aligned\n",
 		     (long)vma->node.start, (long)vma->node.size);
 
-		pitch_val = obj->stride / 128;
+		pitch_val = vma->obj->stride / 128;
 		pitch_val = ffs(pitch_val) - 1;
 
 		val = vma->node.start;
-		if (obj->tiling_mode == I915_TILING_Y)
+		if (vma->obj->tiling_mode == I915_TILING_Y)
 			val |= 1 << I830_FENCE_TILING_Y_SHIFT;
 		val |= I830_FENCE_SIZE_BITS(vma->node.size);
 		val |= pitch_val << I830_FENCE_PITCH_SHIFT;
@@ -177,87 +179,85 @@ static void i830_write_fence_reg(struct drm_device *dev, int reg,
 	} else
 		val = 0;
 
-	I915_WRITE(FENCE_REG(reg), val);
-	POSTING_READ(FENCE_REG(reg));
+	if (1) {
+		struct drm_i915_private *dev_priv = fence->i915;
+		i915_reg_t reg = FENCE_REG(fence->id);
+		I915_WRITE(reg, val);
+		POSTING_READ(reg);
+	}
 }
 
-inline static bool i915_gem_object_needs_mb(struct drm_i915_gem_object *obj)
+static void fence_write(struct drm_i915_fence_reg *fence,
+			struct i915_vma *vma)
 {
-	return obj && obj->base.read_domains & I915_GEM_DOMAIN_GTT;
-}
+	/* Previous access through the fence register is marshalled by
+	 * the mb() inside the fault handlers (i915_gem_release_mmaps)
+	 * and explicitly managed for internal users.
+	 */
 
-static void i915_gem_write_fence(struct drm_device *dev, int reg,
-				 struct drm_i915_gem_object *obj)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
+	if (IS_GEN2(fence->i915))
+		i830_write_fence_reg(fence, vma);
+	else if (IS_GEN3(fence->i915))
+		i915_write_fence_reg(fence, vma);
+	else
+		i965_write_fence_reg(fence, vma);
 
-	/* Ensure that all CPU reads are completed before installing a fence
-	 * and all writes before removing the fence.
-	 */
-	if (i915_gem_object_needs_mb(dev_priv->fence_regs[reg].obj))
-		mb();
-
-	WARN(obj && (!obj->stride || !obj->tiling_mode),
-	     "bogus fence setup with stride: 0x%x, tiling mode: %i\n",
-	     obj->stride, obj->tiling_mode);
-
-	if (IS_GEN2(dev))
-		i830_write_fence_reg(dev, reg, obj);
-	else if (IS_GEN3(dev))
-		i915_write_fence_reg(dev, reg, obj);
-	else if (INTEL_INFO(dev)->gen >= 4)
-		i965_write_fence_reg(dev, reg, obj);
-
-	/* And similarly be paranoid that no direct access to this region
-	 * is reordered to before the fence is installed.
+	/* Access through the fenced region afterwards is
+	 * ordered by the posting reads whilst writing the registers.
 	 */
-	if (i915_gem_object_needs_mb(obj))
-		mb();
-}
 
-static inline int fence_number(struct drm_i915_private *dev_priv,
-			       struct drm_i915_fence_reg *fence)
-{
-	return fence - dev_priv->fence_regs;
+	fence->dirty = false;
 }
 
-static void i915_gem_object_update_fence(struct drm_i915_gem_object *obj,
-					 struct drm_i915_fence_reg *fence,
-					 bool enable)
+static int fence_update(struct drm_i915_fence_reg *fence,
+			struct i915_vma *vma)
 {
-	struct drm_i915_private *dev_priv = obj->base.dev->dev_private;
-	int reg = fence_number(dev_priv, fence);
+	int ret;
 
-	i915_gem_write_fence(obj->base.dev, reg, enable ? obj : NULL);
+	if (vma) {
+		if (!vma->map_and_fenceable)
+			return -EINVAL;
 
-	if (enable) {
-		obj->fence_reg = reg;
-		fence->obj = obj;
-		list_move_tail(&fence->lru_list, &dev_priv->mm.fence_list);
-	} else {
-		obj->fence_reg = I915_FENCE_REG_NONE;
-		fence->obj = NULL;
-		list_del_init(&fence->lru_list);
+		if (WARN(!vma->obj->stride || !vma->obj->tiling_mode,
+			 "bogus fence setup with stride: 0x%x, tiling mode: %i\n",
+			 vma->obj->stride, vma->obj->tiling_mode))
+			return -EINVAL;
+
+		ret = i915_wait_request(vma->last_fence.request);
+		if (ret)
+			return ret;
 	}
-	obj->fence_dirty = false;
-}
 
-static inline void i915_gem_object_fence_lost(struct drm_i915_gem_object *obj)
-{
-	if (obj->tiling_mode)
-		i915_gem_release_mmap(obj);
+	if (fence->vma) {
+		ret = i915_wait_request(fence->vma->last_fence.request);
+		if (ret)
+			return ret;
+	}
 
-	/* As we do not have an associated fence register, we will force
-	 * a tiling change if we ever need to acquire one.
-	 */
-	obj->fence_dirty = false;
-	obj->fence_reg = I915_FENCE_REG_NONE;
-}
+	if (fence->vma && fence->vma != vma) {
+		/* Ensure that all userspace CPU access is completed before
+		 * stealing the fence.
+		 */
+		i915_gem_release_mmap(fence->vma->obj);
 
-static int
-i915_gem_object_wait_fence(struct drm_i915_gem_object *obj)
-{
-	return i915_wait_request(obj->last_fence.request);
+		fence->vma->fence = NULL;
+		fence->vma = NULL;
+
+		list_move(&fence->lru_list, &fence->i915->mm.fence_list);
+	}
+
+	fence_write(fence, vma);
+
+	if (vma) {
+		if (fence->vma != vma) {
+			vma->fence = fence;
+			fence->vma = vma;
+		}
+
+		list_move_tail(&fence->lru_list, &fence->i915->mm.fence_list);
+	}
+
+	return 0;
 }
 
 /**
@@ -272,62 +272,32 @@ i915_gem_object_wait_fence(struct drm_i915_gem_object *obj)
  * 0 on success, negative error code on failure.
  */
 int
-i915_gem_object_put_fence(struct drm_i915_gem_object *obj)
+i915_vma_put_fence(struct i915_vma *vma)
 {
-	struct drm_i915_private *dev_priv = obj->base.dev->dev_private;
-	struct drm_i915_fence_reg *fence;
-	int ret;
+	struct drm_i915_fence_reg *fence = vma->fence;
 
-	ret = i915_gem_object_wait_fence(obj);
-	if (ret)
-		return ret;
-
-	if (obj->fence_reg == I915_FENCE_REG_NONE)
+	if (fence == NULL)
 		return 0;
 
-	fence = &dev_priv->fence_regs[obj->fence_reg];
-
 	if (WARN_ON(fence->pin_count))
 		return -EBUSY;
 
-	i915_gem_object_fence_lost(obj);
-	i915_gem_object_update_fence(obj, fence, false);
-
-	return 0;
+	return fence_update(fence, NULL);
 }
 
-static struct drm_i915_fence_reg *
-i915_find_fence_reg(struct drm_device *dev)
+static struct drm_i915_fence_reg *fence_find(struct drm_i915_private *dev_priv)
 {
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct drm_i915_fence_reg *reg, *avail;
-	int i;
-
-	/* First try to find a free reg */
-	avail = NULL;
-	for (i = 0; i < dev_priv->num_fence_regs; i++) {
-		reg = &dev_priv->fence_regs[i];
-		if (!reg->obj)
-			return reg;
-
-		if (!reg->pin_count)
-			avail = reg;
-	}
-
-	if (avail == NULL)
-		goto deadlock;
+	struct drm_i915_fence_reg *fence;
 
-	/* None available, try to steal one or wait for a user to finish */
-	list_for_each_entry(reg, &dev_priv->mm.fence_list, lru_list) {
-		if (reg->pin_count)
+	list_for_each_entry(fence, &dev_priv->mm.fence_list, lru_list) {
+		if (fence->pin_count)
 			continue;
 
-		return reg;
+		return fence;
 	}
 
-deadlock:
 	/* Wait for completion of pending flips which consume fences */
-	if (intel_has_pending_fb_unpin(dev))
+	if (intel_has_pending_fb_unpin(dev_priv->dev))
 		return ERR_PTR(-EAGAIN);
 
 	return ERR_PTR(-EDEADLK);
@@ -352,95 +322,27 @@ deadlock:
  * 0 on success, negative error code on failure.
  */
 int
-i915_gem_object_get_fence(struct drm_i915_gem_object *obj)
+i915_vma_get_fence(struct i915_vma *vma)
 {
-	struct drm_device *dev = obj->base.dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	bool enable = obj->tiling_mode != I915_TILING_NONE;
-	struct drm_i915_fence_reg *reg;
-	int ret;
-
-	/* Have we updated the tiling parameters upon the object and so
-	 * will need to serialise the write to the associated fence register?
-	 */
-	if (obj->fence_dirty) {
-		ret = i915_gem_object_wait_fence(obj);
-		if (ret)
-			return ret;
-	}
+	struct drm_i915_fence_reg *fence;
+	struct i915_vma *set = vma->obj->tiling_mode ? vma : NULL;
 
 	/* Just update our place in the LRU if our fence is getting reused. */
-	if (obj->fence_reg != I915_FENCE_REG_NONE) {
-		reg = &dev_priv->fence_regs[obj->fence_reg];
-		if (!obj->fence_dirty) {
-			list_move_tail(&reg->lru_list,
-				       &dev_priv->mm.fence_list);
+	if (vma->fence) {
+		fence = vma->fence;
+		if (!fence->dirty) {
+			list_move_tail(&fence->lru_list,
+				       &fence->i915->mm.fence_list);
 			return 0;
 		}
-	} else if (enable) {
-		reg = i915_find_fence_reg(dev);
-		if (IS_ERR(reg))
-			return PTR_ERR(reg);
-
-		if (reg->obj) {
-			struct drm_i915_gem_object *old = reg->obj;
-
-			ret = i915_gem_object_wait_fence(old);
-			if (ret)
-				return ret;
-
-			i915_gem_object_fence_lost(old);
-		}
+	} else if (set) {
+		fence = fence_find(to_i915(vma->vm->dev));
+		if (IS_ERR(fence))
+			return PTR_ERR(fence);
 	} else
 		return 0;
 
-	i915_gem_object_update_fence(obj, reg, enable);
-
-	return 0;
-}
-
-/**
- * i915_gem_object_pin_fence - pin fencing state
- * @obj: object to pin fencing for
- *
- * This pins the fencing state (whether tiled or untiled) to make sure the
- * object is ready to be used as a scanout target. Fencing status must be
- * synchronize first by calling i915_gem_object_get_fence():
- *
- * The resulting fence pin reference must be released again with
- * i915_gem_object_unpin_fence().
- *
- * Returns:
- *
- * True if the object has a fence, false otherwise.
- */
-bool
-i915_gem_object_pin_fence(struct drm_i915_gem_object *obj)
-{
-	if (obj->fence_reg != I915_FENCE_REG_NONE) {
-		struct drm_i915_private *dev_priv = obj->base.dev->dev_private;
-		dev_priv->fence_regs[obj->fence_reg].pin_count++;
-		return true;
-	} else
-		return false;
-}
-
-/**
- * i915_gem_object_unpin_fence - unpin fencing state
- * @obj: object to unpin fencing for
- *
- * This releases the fence pin reference acquired through
- * i915_gem_object_pin_fence. It will handle both objects with and without an
- * attached fence correctly, callers do not need to distinguish this.
- */
-void
-i915_gem_object_unpin_fence(struct drm_i915_gem_object *obj)
-{
-	if (obj->fence_reg != I915_FENCE_REG_NONE) {
-		struct drm_i915_private *dev_priv = obj->base.dev->dev_private;
-		WARN_ON(dev_priv->fence_regs[obj->fence_reg].pin_count <= 0);
-		dev_priv->fence_regs[obj->fence_reg].pin_count--;
-	}
+	return fence_update(fence, set);
 }
 
 /**
@@ -462,12 +364,7 @@ void i915_gem_restore_fences(struct drm_device *dev)
 		 * Commit delayed tiling changes if we have an object still
 		 * attached to the fence, otherwise just clear the fence.
 		 */
-		if (reg->obj) {
-			i915_gem_object_update_fence(reg->obj, reg,
-						     reg->obj->tiling_mode);
-		} else {
-			i915_gem_write_fence(dev, i, NULL);
-		}
+		fence_write(reg, reg->vma);
 	}
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 59e7b11bf0ac..3db8cdf56dcc 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -3265,6 +3265,12 @@ i915_vma_retire(struct i915_gem_active *active,
 		WARN_ON(i915_vma_unbind(vma));
 }
 
+static void
+i915_vma_retire__fence(struct i915_gem_active *active,
+		       struct drm_i915_gem_request *request)
+{
+}
+
 static struct i915_vma *
 __i915_gem_vma_create(struct drm_i915_gem_object *obj,
 		      struct i915_address_space *vm,
@@ -3282,6 +3288,7 @@ __i915_gem_vma_create(struct drm_i915_gem_object *obj,
 	INIT_LIST_HEAD(&vma->exec_list);
 	for (i = 0; i < ARRAY_SIZE(vma->last_read); i++)
 		init_request_active(&vma->last_read[i], i915_vma_retire);
+	init_request_active(&vma->last_fence, i915_vma_retire__fence);
 	list_add(&vma->vm_link, &vm->unbound_list);
 	vma->vm = vm;
 	vma->obj = obj;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 0e0570e13a68..c0ada0402335 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -36,7 +36,14 @@
 
 #include "i915_gem_request.h"
 
+#define I915_FENCE_REG_NONE -1
+#define I915_MAX_NUM_FENCES 32
+/* 32 fences + sign bit for FENCE_REG_NONE */
+#define I915_MAX_NUM_FENCE_BITS 6
+
+
 struct drm_i915_file_private;
+struct drm_i915_fence_reg;
 
 typedef uint32_t gen6_pte_t;
 typedef uint64_t gen8_pte_t;
@@ -181,10 +188,12 @@ struct i915_vma {
 	struct drm_mm_node node;
 	struct drm_i915_gem_object *obj;
 	struct i915_address_space *vm;
+	struct drm_i915_fence_reg *fence;
 	void *iomap;
 	u64 size;
 
 	struct i915_gem_active last_read[I915_NUM_RINGS];
+	struct i915_gem_active last_fence;
 
 	union {
 		struct {
diff --git a/drivers/gpu/drm/i915/i915_gem_tiling.c b/drivers/gpu/drm/i915/i915_gem_tiling.c
index 7c2da8060757..57aab59c6a5c 100644
--- a/drivers/gpu/drm/i915/i915_gem_tiling.c
+++ b/drivers/gpu/drm/i915/i915_gem_tiling.c
@@ -113,12 +113,37 @@ i915_tiling_ok(struct drm_device *dev, int stride, int size, int tiling_mode)
 	return true;
 }
 
+static bool i915_vma_fence_ok(struct i915_vma *vma, int tiling_mode)
+{
+	u32 size;
+
+	if (!vma->map_and_fenceable)
+		return true;
+
+	if (INTEL_INFO(vma->vm->dev)->gen == 3) {
+		if (vma->node.start & ~I915_FENCE_START_MASK)
+			return false;
+	} else {
+		if (vma->node.start & ~I830_FENCE_START_MASK)
+			return false;
+	}
+
+	size = i915_gem_get_gtt_size(vma->vm->dev, vma->size, tiling_mode);
+	if (vma->node.size < size)
+		return false;
+
+	if (vma->node.start & (size - 1))
+		return false;
+
+	return true;
+}
+
 /* Is the current GTT allocation valid for the change in tiling? */
 static int
 i915_gem_object_fence_ok(struct drm_i915_gem_object *obj, int tiling_mode)
 {
 	struct i915_vma *vma;
-	u32 size;
+	int ret;
 
 	if (tiling_mode == I915_TILING_NONE)
 		return 0;
@@ -126,32 +151,16 @@ i915_gem_object_fence_ok(struct drm_i915_gem_object *obj, int tiling_mode)
 	if (INTEL_INFO(obj->base.dev)->gen >= 4)
 		return 0;
 
-	vma = i915_gem_object_to_ggtt(obj, NULL);
-	if (vma == NULL)
-		return 0;
+	list_for_each_entry(vma, &obj->vma_list, obj_link) {
+		if (i915_vma_fence_ok(vma, tiling_mode))
+			continue;
 
-	if (!vma->map_and_fenceable)
-		return 0;
-
-	if (INTEL_INFO(obj->base.dev)->gen == 3) {
-		if (vma->node.start & ~I915_FENCE_START_MASK)
-			goto bad;
-	} else {
-		if (vma->node.start & ~I830_FENCE_START_MASK)
-			goto bad;
+		ret = i915_vma_unbind(vma);
+		if (ret)
+			return ret;
 	}
 
-	size = i915_gem_get_gtt_size(obj->base.dev, vma->size, tiling_mode);
-	if (vma->node.size < size)
-		goto bad;
-
-	if (vma->node.start & (size - 1))
-		goto bad;
-
 	return 0;
-
-bad:
-	return i915_vma_unbind(vma);
 }
 
 /**
@@ -240,6 +249,8 @@ i915_gem_set_tiling(struct drm_device *dev, void *data,
 		 */
 		ret = i915_gem_object_fence_ok(obj, args->tiling_mode);
 		if (ret == 0) {
+			struct i915_vma *vma;
+
 			if (obj->pages &&
 			    obj->madv == I915_MADV_WILLNEED &&
 			    dev_priv->quirks & QUIRK_PIN_SWIZZLED_PAGES) {
@@ -249,10 +260,12 @@ i915_gem_set_tiling(struct drm_device *dev, void *data,
 					i915_gem_object_pin_pages(obj);
 			}
 
-			obj->fence_dirty =
-				obj->last_fence.request ||
-				obj->fence_reg != I915_FENCE_REG_NONE;
+			list_for_each_entry(vma, &obj->vma_list, obj_link) {
+				if (!vma->fence)
+					continue;
 
+				vma->fence->dirty = true;
+			}
 			obj->tiling_mode = args->tiling_mode;
 			obj->stride = args->stride;
 
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 69ce355e00ea..e5907ac666ad 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -711,7 +711,7 @@ static void capture_bo(struct drm_i915_error_buffer *err,
 	err->gtt_offset = vma->node.start;
 	err->read_domains = obj->base.read_domains;
 	err->write_domain = obj->base.write_domain;
-	err->fence_reg = obj->fence_reg;
+	err->fence_reg = vma->fence ? vma->fence->id : -1;
 	err->tiling = obj->tiling_mode;
 	err->dirty = obj->dirty;
 	err->purgeable = obj->madv != I915_MADV_WILLNEED;
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 218bfd3c99fc..13d283e4b0a3 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -2333,7 +2333,6 @@ intel_pin_and_fence_fb_obj(struct drm_plane *plane,
 	struct i915_ggtt_view view;
 	struct i915_vma *vma;
 	u32 alignment;
-	int ret;
 
 	WARN_ON(!mutex_is_locked(&dev->struct_mutex));
 
@@ -2381,43 +2380,33 @@ intel_pin_and_fence_fb_obj(struct drm_plane *plane,
 	intel_runtime_pm_get(dev_priv);
 
 	vma = i915_gem_object_pin_to_display_plane(obj, alignment, &view);
-	if (IS_ERR(vma)) {
-		ret = PTR_ERR(vma);
-		goto err_pm;
-	}
+	if (IS_ERR(vma))
+		goto err;
 
-	/* Install a fence for tiled scan-out. Pre-i965 always needs a
-	 * fence, whereas 965+ only requires a fence if using
-	 * framebuffer compression.  For simplicity, we always install
-	 * a fence as the cost is not that onerous.
-	 */
 	if (vma->map_and_fenceable) {
-		ret = i915_gem_object_get_fence(obj);
-		if (ret == -EDEADLK) {
-			/*
-			 * -EDEADLK means there are no free fences
-			 * no pending flips.
-			 *
-			 * This is propagated to atomic, but it uses
-			 * -EDEADLK to force a locking recovery, so
-			 * change the returned error to -EBUSY.
-			 */
-			ret = -EBUSY;
-			goto err_unpin;
-		} else if (ret)
-			goto err_unpin;
-
-		i915_gem_object_pin_fence(obj);
+		/* Install a fence for tiled scan-out. Pre-i965 always needs a
+		 * fence, whereas 965+ only requires a fence if using
+		 * framebuffer compression.  For simplicity, we always, when
+		 * possible, install a fence as the cost is not that onerous.
+		 *
+		 * If we fail to fence the tiled scanout, then either the
+		 * modeset will reject the change (which is highly unlikely as
+		 * the affected systems, all but one, do not have unmappable
+		 * space) or we will not be able to enable full powersaving
+		 * techniques (also likely not to apply due to various limits
+		 * FBC and the like impose on the size of the buffer, which
+		 * presumably we violated anyway with this unmappable buffer).
+		 * Anyway, it is presumably better to stumble onwards with
+		 * something and try to run the system in a "less than optimal"
+		 * mode that matches the user configuration.
+		 */
+		if (i915_vma_get_fence(vma) == 0)
+			i915_vma_pin_fence(vma);
 	}
 
+err:
 	intel_runtime_pm_put(dev_priv);
 	return vma;
-
-err_unpin:
-	i915_gem_object_unpin_from_display_plane(vma);
-err_pm:
-	intel_runtime_pm_put(dev_priv);
-	return ERR_PTR(ret);
 }
 
 static void intel_unpin_fb_obj(struct drm_framebuffer *fb,
@@ -2432,9 +2421,7 @@ static void intel_unpin_fb_obj(struct drm_framebuffer *fb,
 	intel_fill_fb_ggtt_view(&view, fb, state);
 	vma = i915_gem_object_to_ggtt(obj, &view);
 
-	if (vma->map_and_fenceable)
-		i915_gem_object_unpin_fence(obj);
-
+	i915_vma_unpin_fence(vma);
 	i915_gem_object_unpin_from_display_plane(vma);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_fbc.c b/drivers/gpu/drm/i915/intel_fbc.c
index 8d8f1ce7f1ae..db48e3ccd7f7 100644
--- a/drivers/gpu/drm/i915/intel_fbc.c
+++ b/drivers/gpu/drm/i915/intel_fbc.c
@@ -130,11 +130,17 @@ static void i8xx_fbc_deactivate(struct drm_i915_private *dev_priv)
 	}
 }
 
+/* XXX replace me when we have VMA tracking for intel_plane_state */
+static int get_fence_id(struct drm_framebuffer *fb)
+{
+	struct i915_vma *vma = i915_gem_object_to_ggtt(intel_fb_obj(fb), NULL);
+	return vma->fence ? vma->fence->id : I915_FENCE_REG_NONE;
+}
+
 static void i8xx_fbc_activate(struct intel_crtc *crtc)
 {
 	struct drm_i915_private *dev_priv = crtc->base.dev->dev_private;
 	struct drm_framebuffer *fb = crtc->base.primary->fb;
-	struct drm_i915_gem_object *obj = intel_fb_obj(fb);
 	int cfb_pitch;
 	int i;
 	u32 fbc_ctl;
@@ -173,7 +179,7 @@ static void i8xx_fbc_activate(struct intel_crtc *crtc)
 	if (IS_I945GM(dev_priv))
 		fbc_ctl |= FBC_CTL_C3_IDLE; /* 945 needs special SR handling */
 	fbc_ctl |= (cfb_pitch & 0xff) << FBC_CTL_STRIDE_SHIFT;
-	fbc_ctl |= obj->fence_reg;
+	fbc_ctl |= get_fence_id(fb);
 	I915_WRITE(FBC_CONTROL, fbc_ctl);
 }
 
@@ -186,7 +192,6 @@ static void g4x_fbc_activate(struct intel_crtc *crtc)
 {
 	struct drm_i915_private *dev_priv = crtc->base.dev->dev_private;
 	struct drm_framebuffer *fb = crtc->base.primary->fb;
-	struct drm_i915_gem_object *obj = intel_fb_obj(fb);
 	u32 dpfc_ctl;
 
 	dev_priv->fbc.active = true;
@@ -196,7 +201,8 @@ static void g4x_fbc_activate(struct intel_crtc *crtc)
 		dpfc_ctl |= DPFC_CTL_LIMIT_2X;
 	else
 		dpfc_ctl |= DPFC_CTL_LIMIT_1X;
-	dpfc_ctl |= DPFC_CTL_FENCE_EN | obj->fence_reg;
+	dpfc_ctl |= get_fence_id(fb);
+	dpfc_ctl |= DPFC_CTL_FENCE_EN;
 
 	I915_WRITE(DPFC_FENCE_YOFF, get_crtc_fence_y_offset(crtc));
 
@@ -234,7 +240,6 @@ static void ilk_fbc_activate(struct intel_crtc *crtc)
 {
 	struct drm_i915_private *dev_priv = crtc->base.dev->dev_private;
 	struct drm_framebuffer *fb = crtc->base.primary->fb;
-	struct drm_i915_gem_object *obj = intel_fb_obj(fb);
 	u32 dpfc_ctl;
 	int threshold = dev_priv->fbc.threshold;
 	unsigned int y_offset;
@@ -257,19 +262,21 @@ static void ilk_fbc_activate(struct intel_crtc *crtc)
 		dpfc_ctl |= DPFC_CTL_LIMIT_1X;
 		break;
 	}
-	dpfc_ctl |= DPFC_CTL_FENCE_EN;
 	if (IS_GEN5(dev_priv))
-		dpfc_ctl |= obj->fence_reg;
+		dpfc_ctl |= get_fence_id(fb);
+	dpfc_ctl |= DPFC_CTL_FENCE_EN;
 
 	y_offset = get_crtc_fence_y_offset(crtc);
 	I915_WRITE(ILK_DPFC_FENCE_YOFF, y_offset);
-	I915_WRITE(ILK_FBC_RT_BASE, i915_gem_object_ggtt_offset(obj, NULL) | ILK_FBC_RT_VALID);
+	I915_WRITE(ILK_FBC_RT_BASE,
+		   i915_gem_object_ggtt_offset(intel_fb_obj(fb), NULL) |
+		   ILK_FBC_RT_VALID);
 	/* enable it... */
 	I915_WRITE(ILK_DPFC_CONTROL, dpfc_ctl | DPFC_CTL_EN);
 
 	if (IS_GEN6(dev_priv)) {
 		I915_WRITE(SNB_DPFC_CTL_SA,
-			   SNB_CPU_FENCE_ENABLE | obj->fence_reg);
+			   SNB_CPU_FENCE_ENABLE | get_fence_id(fb));
 		I915_WRITE(DPFC_CPU_FENCE_OFFSET, y_offset);
 	}
 
@@ -299,7 +306,6 @@ static void gen7_fbc_activate(struct intel_crtc *crtc)
 {
 	struct drm_i915_private *dev_priv = crtc->base.dev->dev_private;
 	struct drm_framebuffer *fb = crtc->base.primary->fb;
-	struct drm_i915_gem_object *obj = intel_fb_obj(fb);
 	u32 dpfc_ctl;
 	int threshold = dev_priv->fbc.threshold;
 
@@ -345,7 +351,7 @@ static void gen7_fbc_activate(struct intel_crtc *crtc)
 	I915_WRITE(ILK_DPFC_CONTROL, dpfc_ctl | DPFC_CTL_EN);
 
 	I915_WRITE(SNB_DPFC_CTL_SA,
-		   SNB_CPU_FENCE_ENABLE | obj->fence_reg);
+		   SNB_CPU_FENCE_ENABLE | get_fence_id(fb));
 	I915_WRITE(DPFC_CPU_FENCE_OFFSET, get_crtc_fence_y_offset(crtc));
 
 	intel_fbc_recompress(dev_priv);
@@ -781,7 +787,7 @@ static void __intel_fbc_update(struct intel_crtc *crtc)
 	 * by the CPU to the scanout and trigger updates to the FBC.
 	 */
 	if (obj->tiling_mode != I915_TILING_X ||
-	    obj->fence_reg == I915_FENCE_REG_NONE) {
+	    get_fence_id(fb) == I915_FENCE_REG_NONE) {
 		set_no_fbc_reason(dev_priv, "framebuffer not tiled or fenced");
 		goto out_disable;
 	}
diff --git a/drivers/gpu/drm/i915/intel_fbdev.c b/drivers/gpu/drm/i915/intel_fbdev.c
index 8e7c341951fd..0c8de9420776 100644
--- a/drivers/gpu/drm/i915/intel_fbdev.c
+++ b/drivers/gpu/drm/i915/intel_fbdev.c
@@ -282,7 +282,7 @@ static int intelfb_create(struct drm_fb_helper *helper,
 out_destroy_fbi:
 	drm_fb_helper_release_fbi(helper);
 out_unpin:
-	i915_gem_object_unpin_fence(vma->obj);
+	i915_vma_unpin_fence(vma);
 	i915_gem_object_unpin_from_display_plane(vma);
 out_unlock:
 	mutex_unlock(&dev->struct_mutex);
@@ -523,7 +523,7 @@ static void intel_fbdev_destroy(struct drm_device *dev,
 				struct intel_fbdev *ifbdev)
 {
 	if (ifbdev->vma) {
-		i915_gem_object_unpin_fence(ifbdev->vma->obj);
+		i915_vma_unpin_fence(ifbdev->vma);
 		i915_gem_object_unpin_from_display_plane(ifbdev->vma);
 	}
 
diff --git a/drivers/gpu/drm/i915/intel_overlay.c b/drivers/gpu/drm/i915/intel_overlay.c
index d1401f4c4762..97b75414263d 100644
--- a/drivers/gpu/drm/i915/intel_overlay.c
+++ b/drivers/gpu/drm/i915/intel_overlay.c
@@ -754,7 +754,7 @@ static int intel_overlay_do_put_image(struct intel_overlay *overlay,
 	if (IS_ERR(vma))
 		return PTR_ERR(vma);
 
-	ret = i915_gem_object_put_fence(new_bo);
+	ret = i915_vma_put_fence(vma);
 	if (ret)
 		goto out_unpin;
 
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 140/190] drm/i915: Fix partial GGTT faulting
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
                     ` (51 preceding siblings ...)
  2016-01-11 10:45   ` [PATCH 139/190] drm/i915: Move fence tracking from object to vma Chris Wilson
@ 2016-01-11 10:45   ` Chris Wilson
  2016-01-11 10:45   ` [PATCH 141/190] drm/i915: Choose not to evict faultable objects from the GGTT Chris Wilson
  53 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:45 UTC (permalink / raw)
  To: intel-gfx

We want to always use the partial VMA as a fallback for a failure to
bind the object into the GGTT. This extends the support for partial
objects in the GGTT to cover everything, not just objects that are too
large.
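
In outline, the fault handler now tries a full, mappable binding first
and only falls back to a partial chunk on failure (a condensed sketch
of the hunk below; error paths and locking are trimmed, and "vma" is
the CPU-side vm_area_struct as in the fault handler):

    ggtt = i915_gem_object_ggtt_pin(obj, NULL, 0, 0,
                                    PIN_MAPPABLE | PIN_NONBLOCK);
    if (IS_ERR(ggtt)) {
            struct i915_ggtt_view partial;

            /* Bind just a chunk around the faulting page instead. */
            memset(&partial, 0, sizeof(partial));
            partial.type = I915_GGTT_VIEW_PARTIAL;
            partial.params.partial.offset =
                    rounddown(page_offset, chunk_size);
            /* Clamp the chunk to the end of the CPU mapping. */
            partial.params.partial.size =
                    min_t(unsigned int, chunk_size,
                          (vma->vm_end - vma->vm_start) / PAGE_SIZE -
                          partial.params.partial.offset);
            ggtt = i915_gem_object_ggtt_pin(obj, &partial, 0, 0,
                                            PIN_MAPPABLE);
    }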

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 64 +++++++++++++++++++++--------------------
 1 file changed, 33 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 70397c1022d1..a8f4d4633bdb 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1447,7 +1447,6 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 	struct drm_i915_gem_object *obj = to_intel_bo(vma->vm_private_data);
 	struct drm_device *dev = obj->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct i915_ggtt_view view = i915_ggtt_view_normal;
 	struct i915_vma *ggtt;
 	pgoff_t page_offset;
 	unsigned long pfn;
@@ -1482,22 +1481,26 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 	}
 
 	/* Use a partial view if the object is bigger than the aperture. */
-	if (obj->base.size >= dev_priv->gtt.mappable_end &&
-	    obj->tiling_mode == I915_TILING_NONE) {
+	/* Now pin it into the GTT if needed */
+	ggtt = i915_gem_object_ggtt_pin(obj, NULL, 0, 0,
+					PIN_MAPPABLE | PIN_NONBLOCK);
+	if (IS_ERR(ggtt)) {
 		static const unsigned int chunk_size = 256; // 1 MiB
+		struct i915_ggtt_view partial;
 
-		memset(&view, 0, sizeof(view));
-		view.type = I915_GGTT_VIEW_PARTIAL;
-		view.params.partial.offset = rounddown(page_offset, chunk_size);
-		view.params.partial.size =
+		memset(&partial, 0, sizeof(partial));
+		partial.type = I915_GGTT_VIEW_PARTIAL;
+		partial.params.partial.offset =
+			rounddown(page_offset, chunk_size);
+		partial.params.partial.size =
 			min_t(unsigned int,
 			      chunk_size,
 			      (vma->vm_end - vma->vm_start)/PAGE_SIZE -
-			      view.params.partial.offset);
-	}
+			      partial.params.partial.offset);
 
-	/* Now pin it into the GTT if needed */
-	ggtt = i915_gem_object_ggtt_pin(obj, &view, 0, 0, PIN_MAPPABLE);
+		ggtt = i915_gem_object_ggtt_pin(obj, &partial, 0, 0,
+						PIN_MAPPABLE);
+	}
 	if (IS_ERR(ggtt)) {
 		ret = PTR_ERR(ggtt);
 		goto unlock;
@@ -1515,24 +1518,7 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 	pfn = dev_priv->gtt.mappable_base + ggtt->node.start;
 	pfn >>= PAGE_SHIFT;
 
-	if (unlikely(view.type == I915_GGTT_VIEW_PARTIAL)) {
-		/* Overriding existing pages in partial view does not cause
-		 * us any trouble as TLBs are still valid because the fault
-		 * is due to userspace losing part of the mapping or never
-		 * having accessed it before (at this partials' range).
-		 */
-		unsigned long base = vma->vm_start +
-				     (view.params.partial.offset << PAGE_SHIFT);
-		unsigned int i;
-
-		for (i = 0; i < view.params.partial.size; i++) {
-			ret = vm_insert_pfn(vma, base + i * PAGE_SIZE, pfn + i);
-			if (ret)
-				break;
-		}
-
-		obj->fault_mappable = true;
-	} else {
+	if (ggtt->ggtt_view.type == I915_GGTT_VIEW_NORMAL) {
 		if (!obj->fault_mappable) {
 			unsigned long size = min_t(unsigned long,
 						   vma->vm_end - vma->vm_start,
@@ -1546,13 +1532,29 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 				if (ret)
 					break;
 			}
-
-			obj->fault_mappable = true;
 		} else
 			ret = vm_insert_pfn(vma,
 					    (unsigned long)vmf->virtual_address,
 					    pfn + page_offset);
+	} else {
+		/* Overriding existing pages in partial view does not cause
+		 * us any trouble as TLBs are still valid because the fault
+		 * is due to userspace losing part of the mapping or never
+		 * having accessed it before (at this partial's range).
+		 */
+		const struct i915_ggtt_view *view = &ggtt->ggtt_view;
+		unsigned long base = vma->vm_start +
+				     (view->params.partial.offset << PAGE_SHIFT);
+		unsigned int i;
+
+		for (i = 0; i < view->params.partial.size; i++) {
+			ret = vm_insert_pfn(vma, base + i * PAGE_SIZE, pfn + i);
+			if (ret)
+				break;
+		}
 	}
+
+	obj->fault_mappable = true;
 unpin:
 	__i915_vma_unpin(ggtt);
 unlock:
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 141/190] drm/i915: Choose not to evict faultable objects from the GGTT
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
                     ` (52 preceding siblings ...)
  2016-01-11 10:45   ` [PATCH 140/190] drm/i915: Fix partial GGTT faulting Chris Wilson
@ 2016-01-11 10:45   ` Chris Wilson
  53 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 10:45 UTC (permalink / raw)
  To: intel-gfx

Oftentimes we do not want to evict mapped objects from the GGTT as
these are quite expensive to tear down and frequently reused (causing an
equally, if not more, expensive setup). In particular, when faulting
in a new object we want to avoid evicting an active object, or else we
may trigger a page-fault-of-doom as we ping-pong between evicting two
objects.
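
The mechanism is a single extra filter in the eviction scan, gated by a
new pin flag (sketch of the mark_free() test added below):

    /* On behalf of a faulting pin (PIN_NOFAULT), never evict an
     * object that itself has live userspace mmaps; this is what
     * breaks the fault/evict ping-pong.
     */
    if (flags & PIN_NOFAULT && vma->obj->fault_mappable)
            return false;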

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h       | 7 ++++---
 drivers/gpu/drm/i915/i915_gem.c       | 4 +++-
 drivers/gpu/drm/i915/i915_gem_evict.c | 7 +++++--
 3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index bb0f750bb5b5..45b8cbdfab55 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2709,9 +2709,10 @@ i915_vma_pin(struct i915_vma *vma,
 #define PIN_MAPPABLE	(1<<3)
 #define PIN_ZONE_4G	(1<<4)
 #define PIN_NONBLOCK	(1<<5)
-#define PIN_HIGH	(1<<6)
-#define PIN_OFFSET_BIAS	(1<<7)
-#define PIN_OFFSET_FIXED (1<<8)
+#define PIN_NOFAULT	(1<<6)
+#define PIN_HIGH	(1<<7)
+#define PIN_OFFSET_BIAS	(1<<8)
+#define PIN_OFFSET_FIXED (1<<9)
 #define PIN_OFFSET_MASK (~4095)
 
 static inline void __i915_vma_unpin(struct i915_vma *vma)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index a8f4d4633bdb..60dfee56f6ef 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1483,7 +1483,9 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 	/* Use a partial view if the object is bigger than the aperture. */
 	/* Now pin it into the GTT if needed */
 	ggtt = i915_gem_object_ggtt_pin(obj, NULL, 0, 0,
-					PIN_MAPPABLE | PIN_NONBLOCK);
+					PIN_MAPPABLE |
+					PIN_NONBLOCK |
+					PIN_NOFAULT);
 	if (IS_ERR(ggtt)) {
 		static const unsigned int chunk_size = 256; // 1 MiB
 		struct i915_ggtt_view partial;
diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index 679b7dd3a312..fdc4941be15a 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -64,7 +64,7 @@ static int switch_to_pinned_context(struct drm_i915_private *dev_priv)
 }
 
 static bool
-mark_free(struct i915_vma *vma, struct list_head *unwind)
+mark_free(struct i915_vma *vma, unsigned flags, struct list_head *unwind)
 {
 	if (vma->pin_count)
 		return false;
@@ -72,6 +72,9 @@ mark_free(struct i915_vma *vma, struct list_head *unwind)
 	if (WARN_ON(!list_empty(&vma->exec_list)))
 		return false;
 
+	if (flags & PIN_NOFAULT && vma->obj->fault_mappable)
+		return false;
+
 	list_add(&vma->exec_list, unwind);
 	return drm_mm_scan_add_block(&vma->node);
 }
@@ -146,7 +149,7 @@ search_again:
 	phase = phases;
 	do {
 		list_for_each_entry(vma, *phase, vm_link)
-			if (mark_free(vma, &eviction_list))
+			if (mark_free(vma, flags, &eviction_list))
 				goto found;
 	} while (*++phase);
 
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 142/190] drm/i915: Fallback to using unmappable memory for scanout
  2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
                   ` (85 preceding siblings ...)
  2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
@ 2016-01-11 11:00 ` Chris Wilson
  2016-01-11 11:00   ` [PATCH 143/190] drm/i915: Track display alignment on VMA Chris Wilson
                     ` (47 more replies)
  86 siblings, 48 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 11:00 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter

The existing ABI says that scanouts are pinned into the mappable region
so that legacy clients (e.g. old Xorg or plymouthd) can write directly
into the scanout through a GTT mapping. However if the surface does not
fit into the mappable region, we are better off just trying to fit it
anywhere and hoping for the best. (Any userspace that is capable of
using ginormous scanouts is also likely not to rely on pure GTT
updates.) With the partial vma fault support, we are no longer
restricted to only using scanouts that we can pin (though it is still
preferred for performance reasons and for powersaving features like
FBC).
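
The pinning policy thus becomes "mappable if cheap, anywhere otherwise"
(condensed from the i915_gem.c hunk below):

    vma = ERR_PTR(-ENOSPC);
    if (view->type == I915_GGTT_VIEW_NORMAL)
            vma = i915_gem_object_ggtt_pin(obj, view, 0, alignment,
                                           PIN_MAPPABLE | PIN_NONBLOCK);
    if (IS_ERR(vma)) /* too large, or mappable space contended */
            vma = i915_gem_object_ggtt_pin(obj, view, 0, alignment, 0);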

v2: Skip fence pinning when not mappable.
v3: Add a comment to explain the possible ramifications of not being
    able to use fences for unmappable scanouts.
v4: Rebase to skip over some local patches
v5: Rebase to defer until after we have unmappable GTT fault support

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Deepak S <deepak.s@linux.intel.com>
Cc: Damien Lespiau <damien.lespiau@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c  | 14 ++++++++++----
 drivers/gpu/drm/i915/intel_fbc.c |  4 ++++
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 60dfee56f6ef..52e099ac29bf 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3327,11 +3327,17 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
 
 	/* As the user may map the buffer once pinned in the display plane
 	 * (e.g. libkms for the bootup splash), we have to ensure that we
-	 * always use map_and_fenceable for all scanout buffers.
+	 * always use map_and_fenceable for all scanout buffers. However,
+	 * it may simply be too big to fit into mappable, in which case
+	 * put it anyway and hope that userspace can cope (but always first
+	 * try to preserve the existing ABI).
 	 */
-	vma = i915_gem_object_ggtt_pin(obj, view, 0, alignment,
-				       view->type == I915_GGTT_VIEW_NORMAL ?
-				       PIN_MAPPABLE : 0);
+	vma = ERR_PTR(-ENOSPC);
+	if (view->type == I915_GGTT_VIEW_NORMAL)
+		vma = i915_gem_object_ggtt_pin(obj, view, 0, alignment,
+					       PIN_MAPPABLE | PIN_NONBLOCK);
+	if (IS_ERR(vma))
+		vma = i915_gem_object_ggtt_pin(obj, view, 0, alignment, 0);
 	if (IS_ERR(vma))
 		goto err_unpin_display;
 
diff --git a/drivers/gpu/drm/i915/intel_fbc.c b/drivers/gpu/drm/i915/intel_fbc.c
index db48e3ccd7f7..bed4680639d1 100644
--- a/drivers/gpu/drm/i915/intel_fbc.c
+++ b/drivers/gpu/drm/i915/intel_fbc.c
@@ -785,6 +785,10 @@ static void __intel_fbc_update(struct intel_crtc *crtc)
 
 	/* The use of a CPU fence is mandatory in order to detect writes
 	 * by the CPU to the scanout and trigger updates to the FBC.
+	 *
+	 * Note that it is possible for a tiled surface to be unmappable (and
+	 * so have no fence associated with it) due to aperture constraints
+	 * at the time of pinning.
 	 */
 	if (obj->tiling_mode != I915_TILING_X ||
 	    get_fence_id(fb) == I915_FENCE_REG_NONE) {
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 143/190] drm/i915: Track display alignment on VMA
  2016-01-11 11:00 ` [PATCH 142/190] drm/i915: Fallback to using unmappable memory for scanout Chris Wilson
@ 2016-01-11 11:00   ` Chris Wilson
  2016-01-11 11:00   ` [PATCH 144/190] drm/i915: Bump the inactive MRU tracking for all VMA accessed Chris Wilson
                     ` (46 subsequent siblings)
  47 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 11:00 UTC (permalink / raw)
  To: intel-gfx

When using the aliasing ppgtt and pageflipping with the shrinker/eviction
active, we note that we often have to rebind the backbuffer before
flipping onto the scanout because it has an invalid alignment. If we
store the worst-case alignment required for a VMA, we can avoid having
to rebind at critical junctures.
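
The fix reduces to folding the remembered scanout alignment into the
minimum computed at (re)insertion time (as in the i915_vma_insert()
hunk below):

    /* vma->display_alignment is the worst case any display pin has
     * requested; it starts at 0 and only grows, so later rebinds
     * automatically satisfy the scanout requirements.
     */
    alignment = max_t(u64, max(alignment, vma->display_alignment),
                      i915_gem_get_gtt_alignment(dev, size,
                                                 obj->tiling_mode,
                                                 flags & PIN_MAPPABLE));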

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c     | 20 ++++++++------------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  2 +-
 2 files changed, 9 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 52e099ac29bf..fa518764c32c 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2804,7 +2804,6 @@ i915_vma_insert(struct i915_vma *vma,
 	struct drm_device *dev = obj->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	u64 start, end;
-	u64 min_alignment;
 	int ret;
 
 	GEM_BUG_ON(vma->bound);
@@ -2814,16 +2813,10 @@ i915_vma_insert(struct i915_vma *vma,
 	if (flags & PIN_MAPPABLE)
 		size = i915_gem_get_gtt_size(dev, size, obj->tiling_mode);
 
-	min_alignment =
-		i915_gem_get_gtt_alignment(dev, size, obj->tiling_mode,
-					   flags & PIN_MAPPABLE);
-	if (alignment == 0)
-		alignment = min_alignment;
-	if (alignment & (min_alignment - 1)) {
-		DRM_DEBUG("Invalid object alignment requested %llu, minimum %llu\n",
-			  alignment, min_alignment);
-		return -EINVAL;
-	}
+	alignment = max_t(u64, max(alignment, vma->display_alignment),
+			  i915_gem_get_gtt_alignment(dev, size,
+						     obj->tiling_mode,
+						     flags & PIN_MAPPABLE));
 
 	start = flags & PIN_OFFSET_BIAS ? flags & PIN_OFFSET_MASK : 0;
 
@@ -3341,6 +3334,8 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
 	if (IS_ERR(vma))
 		goto err_unpin_display;
 
+	vma->display_alignment = max_t(u64, vma->display_alignment, alignment);
+
 	WARN_ON(obj->pin_display > vma->pin_count);
 
 	i915_gem_object_flush_cpu_write_domain(obj);
@@ -3374,8 +3369,9 @@ i915_gem_object_unpin_from_display_plane(struct i915_vma *vma)
 	if (WARN_ON(vma->obj->pin_display == 0))
 		return;
 
-	vma->obj->pin_display--;
 	vma->obj->pages_pin_count--;
+	if (--vma->obj->pin_display == 0)
+		vma->display_alignment = 0;
 
 	i915_vma_unpin(vma);
 	WARN_ON(vma->obj->pin_display > vma->pin_count);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index c0ada0402335..dd446b69921b 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -190,7 +190,7 @@ struct i915_vma {
 	struct i915_address_space *vm;
 	struct drm_i915_fence_reg *fence;
 	void *iomap;
-	u64 size;
+	u64 size, display_alignment;
 
 	struct i915_gem_active last_read[I915_NUM_RINGS];
 	struct i915_gem_active last_fence;
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 144/190] drm/i915: Bump the inactive MRU tracking for all VMA accessed
  2016-01-11 11:00 ` [PATCH 142/190] drm/i915: Fallback to using unmappable memory for scanout Chris Wilson
  2016-01-11 11:00   ` [PATCH 143/190] drm/i915: Track display alignment on VMA Chris Wilson
@ 2016-01-11 11:00   ` Chris Wilson
  2016-01-11 11:00   ` [PATCH 145/190] drm/i915: Stop discarding GTT cache-domain on unbind vma Chris Wilson
                     ` (45 subsequent siblings)
  47 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 11:00 UTC (permalink / raw)
  To: intel-gfx

When we bump the MRU access tracking on set-to-gtt, we need to bump not
only the primary GGTT VMA but all partials as well. Similarly, we want
to bump the MRU when unpinning an object from the scanout.
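
In essence the LRU bump becomes a walk over every bound, idle GGTT VMA
rather than just the normal view (sketch of the helper added below):

    list_for_each_entry(vma, &obj->vma_list, obj_link) {
            if (!vma->is_ggtt || vma->active)
                    continue;
            if (!drm_mm_node_allocated(&vma->node))
                    continue;
            /* Partial views now ride along with the primary binding. */
            list_move_tail(&vma->vm_link, &vma->vm->inactive_list);
    }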

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 27 +++++++++++++++++++++++----
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index fa518764c32c..6ceed074f738 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3003,6 +3003,24 @@ i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj)
 					    I915_GEM_DOMAIN_CPU);
 }
 
+static void i915_gem_object_bump_inactive_ggtt(struct drm_i915_gem_object *obj)
+{
+	struct i915_vma *vma;
+
+	list_for_each_entry(vma, &obj->vma_list, obj_link) {
+		if (!vma->is_ggtt)
+			continue;
+
+		if (vma->active)
+			continue;
+
+		if (!drm_mm_node_allocated(&vma->node))
+			continue;
+
+		list_move_tail(&vma->vm_link, &vma->vm->inactive_list);
+	}
+}
+
 /**
  * Moves a single object to the GTT read, and possibly write domain.
  *
@@ -3013,7 +3031,6 @@ int
 i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
 {
 	uint32_t old_write_domain, old_read_domains;
-	struct i915_vma *vma;
 	int ret;
 
 	if (obj->base.write_domain == I915_GEM_DOMAIN_GTT)
@@ -3063,9 +3080,7 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
 					    old_write_domain);
 
 	/* And bump the LRU for this access */
-	vma = i915_gem_object_to_ggtt(obj, NULL);
-	if (vma && drm_mm_node_allocated(&vma->node) && !vma->active)
-		list_move_tail(&vma->vm_link, &vma->vm->inactive_list);
+	i915_gem_object_bump_inactive_ggtt(obj);
 
 	return 0;
 }
@@ -3373,6 +3388,10 @@ i915_gem_object_unpin_from_display_plane(struct i915_vma *vma)
 	if (--vma->obj->pin_display == 0)
 		vma->display_alignment = 0;
 
+	/* Bump the LRU to try and avoid premature eviction whilst flipping  */
+	if (!vma->active)
+		list_move_tail(&vma->vm_link, &vma->vm->inactive_list);
+
 	i915_vma_unpin(vma);
 	WARN_ON(vma->obj->pin_display > vma->pin_count);
 }
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 145/190] drm/i915: Stop discarding GTT cache-domain on unbind vma
  2016-01-11 11:00 ` [PATCH 142/190] drm/i915: Fallback to using unmappable memory for scanout Chris Wilson
  2016-01-11 11:00   ` [PATCH 143/190] drm/i915: Track display alignment on VMA Chris Wilson
  2016-01-11 11:00   ` [PATCH 144/190] drm/i915: Bump the inactive MRU tracking for all VMA accessed Chris Wilson
@ 2016-01-11 11:00   ` Chris Wilson
  2016-01-12 13:22     ` Joonas Lahtinen
  2016-01-11 11:00   ` [PATCH 146/190] io-mapping: Always create a struct to hold metadata about the io-mapping Chris Wilson
                     ` (44 subsequent siblings)
  47 siblings, 1 reply; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 11:00 UTC (permalink / raw)
  To: intel-gfx; +Cc: Akash Goel

Since

commit 43566dedde54f9729113f5f9fde77d53e75e61e9
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Jan 2 16:29:29 2015 +0530

    drm/i915: Broaden application of set-domain(GTT)

we allowed objects to be in the GTT domain, but unbound. Therefore
removing the GTT cache domain when removing the GGTT vma is no longer
semantically correct.

An unfortunate side-effect is that we lose the wondrously named
i915_gem_object_finish_gtt(), not to be confused with
i915_gem_gtt_finish_object()!
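
What remains of the unbind path is simply revoking CPU access before
the PTEs disappear, without touching the domain bits (sketch of
i915_vma_unbind() after this patch):

    if (vma->map_and_fenceable) {
            /* release the fence reg _after_ flushing */
            ret = i915_vma_put_fence(vma);
            if (ret)
                    return ret;

            /* Force a pagefault for domain tracking on next user
             * access; the GTT read/write domains are left intact.
             */
            i915_gem_release_mmap(obj);
    }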

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Akash Goel <akash.goel@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 26 +++-----------------------
 1 file changed, 3 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 6ceed074f738..08287d8857c9 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2618,27 +2618,6 @@ i915_gem_object_sync(struct drm_i915_gem_object *obj,
 	return 0;
 }
 
-static void i915_gem_object_finish_gtt(struct drm_i915_gem_object *obj)
-{
-	u32 old_write_domain, old_read_domains;
-
-	/* Force a pagefault for domain tracking on next user access */
-	i915_gem_release_mmap(obj);
-
-	if ((obj->base.read_domains & I915_GEM_DOMAIN_GTT) == 0)
-		return;
-
-	old_read_domains = obj->base.read_domains;
-	old_write_domain = obj->base.write_domain;
-
-	obj->base.read_domains &= ~I915_GEM_DOMAIN_GTT;
-	obj->base.write_domain &= ~I915_GEM_DOMAIN_GTT;
-
-	trace_i915_gem_object_change_domain(obj,
-					    old_read_domains,
-					    old_write_domain);
-}
-
 static void i915_vma_destroy(struct i915_vma *vma)
 {
 	GEM_BUG_ON(vma->node.allocated);
@@ -2691,13 +2670,14 @@ int i915_vma_unbind(struct i915_vma *vma)
 	GEM_BUG_ON(obj->pages == NULL);
 
 	if (vma->map_and_fenceable) {
-		i915_gem_object_finish_gtt(obj);
-
 		/* release the fence reg _after_ flushing */
 		ret = i915_vma_put_fence(vma);
 		if (ret)
 			return ret;
 
+		/* Force a pagefault for domain tracking on next user access */
+		i915_gem_release_mmap(obj);
+
 		if (vma->iomap) {
 			iounmap(vma->iomap);
 			vma->iomap = NULL;
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 146/190] io-mapping: Always create a struct to hold metadata about the io-mapping
  2016-01-11 11:00 ` [PATCH 142/190] drm/i915: Fallback to using unmappable memory for scanout Chris Wilson
                     ` (2 preceding siblings ...)
  2016-01-11 11:00   ` [PATCH 145/190] drm/i915: Stop discarding GTT cache-domain on unbind vma Chris Wilson
@ 2016-01-11 11:00   ` Chris Wilson
  2016-01-11 11:00   ` [PATCH 147/190] drm/i915: Use remap_io_mapping() to prefault all PTE in a single pass Chris Wilson
                     ` (43 subsequent siblings)
  47 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 11:00 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson, linux-mm

Currently, we only allocate a structure to hold metadata if we need to
create an ioremap for every access, such as on x86-32. However, it
would be useful to store basic information about the io-mapping, such as
its page protection, on all platforms.
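
In use, the difference is one of ownership: the mapping is embedded in
its container rather than returned from an allocator (a minimal sketch
using the io_mapping_init_wc()/io_mapping_fini() pair introduced below;
"base", "size" and "offset" are placeholders):

    struct io_mapping map;
    void __iomem *vaddr;

    if (!io_mapping_init_wc(&map, base, size))
            return -EIO;

    /* base, size and prot now travel with the caller-owned struct
     * on every platform, not just CONFIG_HAVE_ATOMIC_IOMAP.
     */
    vaddr = io_mapping_map_wc(&map, offset);
    /* ... access the BAR ... */
    io_mapping_unmap(vaddr);

    io_mapping_fini(&map);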

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: linux-mm@kvack.org
---
 drivers/gpu/drm/i915/i915_dma.c            | 11 ++--
 drivers/gpu/drm/i915/i915_gem.c            |  2 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  2 +-
 drivers/gpu/drm/i915/i915_gem_gtt.h        |  3 +-
 drivers/gpu/drm/i915/i915_gpu_error.c      |  2 +-
 drivers/gpu/drm/i915/intel_overlay.c       |  4 +-
 include/linux/io-mapping.h                 | 84 ++++++++++++++++++------------
 7 files changed, 63 insertions(+), 45 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 4a24831a14fa..7d85c3bea02a 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -978,10 +978,9 @@ int i915_driver_load(struct drm_device *dev, unsigned long flags)
 
 	aperture_size = dev_priv->gtt.mappable_end;
 
-	dev_priv->gtt.mappable =
-		io_mapping_create_wc(dev_priv->gtt.mappable_base,
-				     aperture_size);
-	if (dev_priv->gtt.mappable == NULL) {
+	if (!io_mapping_init_wc(&dev_priv->gtt.mappable,
+				dev_priv->gtt.mappable_base,
+				aperture_size)) {
 		ret = -EIO;
 		goto out_gtt;
 	}
@@ -1104,7 +1103,7 @@ out_freewq:
 	destroy_workqueue(dev_priv->wq);
 out_mtrrfree:
 	arch_phys_wc_del(dev_priv->gtt.mtrr);
-	io_mapping_free(dev_priv->gtt.mappable);
+	io_mapping_fini(&dev_priv->gtt.mappable);
 out_gtt:
 	i915_global_gtt_cleanup(dev);
 out_freecsr:
@@ -1148,7 +1147,7 @@ int i915_driver_unload(struct drm_device *dev)
 	WARN_ON(unregister_oom_notifier(&dev_priv->mm.oom_notifier));
 	unregister_shrinker(&dev_priv->mm.shrinker);
 
-	io_mapping_free(dev_priv->gtt.mappable);
+	io_mapping_fini(&dev_priv->gtt.mappable);
 	arch_phys_wc_del(dev_priv->gtt.mtrr);
 
 	acpi_video_unregister();
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 08287d8857c9..7e321fdd90d2 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -895,7 +895,7 @@ i915_gem_gtt_pwrite_fast(struct drm_device *dev,
 		 * source page isn't available.  Return the error and we'll
 		 * retry in the slow path.
 		 */
-		if (fast_user_write(dev_priv->gtt.mappable, page_base,
+		if (fast_user_write(&dev_priv->gtt.mappable, page_base,
 				    page_offset, user_data, page_length)) {
 			ret = -EFAULT;
 			goto out_flush;
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 691da0085ff4..6ccce848f3e2 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -420,7 +420,7 @@ static void *reloc_iomap(struct drm_i915_gem_object *obj,
 		cache->node.mm = (void *)vma;
 	}
 
-	vaddr = io_mapping_map_atomic_wc(cache->i915->gtt.mappable,
+	vaddr = io_mapping_map_atomic_wc(&cache->i915->gtt.mappable,
 					 cache->node.start + (page << PAGE_SHIFT));
 	cache->page = page;
 	cache->vaddr = (unsigned long)vaddr;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index dd446b69921b..16319835df10 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -34,6 +34,7 @@
 #ifndef __I915_GEM_GTT_H__
 #define __I915_GEM_GTT_H__
 
+#include <linux/io-mapping.h>
 #include "i915_gem_request.h"
 
 #define I915_FENCE_REG_NONE -1
@@ -389,11 +390,11 @@ struct i915_address_space {
  */
 struct i915_gtt {
 	struct i915_address_space base;
+	struct io_mapping mappable;	/* Mapping to our CPU mappable region */
 
 	size_t stolen_size;		/* Total size of stolen memory */
 	size_t stolen_usable_size;	/* Total size minus BIOS reserved */
 	u64 mappable_end;		/* End offset that we can CPU map */
-	struct io_mapping *mappable;	/* Mapping to our CPU mappable region */
 	phys_addr_t mappable_base;	/* PA of our GMADR */
 
 	/** "Graphics Stolen Memory" holds the global PTEs */
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index e5907ac666ad..7c7e0e76260c 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -664,7 +664,7 @@ i915_error_object_create(struct drm_i915_private *dev_priv,
 			 * captures what the GPU read.
 			 */
 
-			s = io_mapping_map_atomic_wc(dev_priv->gtt.mappable,
+			s = io_mapping_map_atomic_wc(&dev_priv->gtt.mappable,
 						     reloc_offset);
 			memcpy_fromio(d, s, PAGE_SIZE);
 			io_mapping_unmap_atomic(s);
diff --git a/drivers/gpu/drm/i915/intel_overlay.c b/drivers/gpu/drm/i915/intel_overlay.c
index 97b75414263d..dd4d17e8c2cf 100644
--- a/drivers/gpu/drm/i915/intel_overlay.c
+++ b/drivers/gpu/drm/i915/intel_overlay.c
@@ -196,7 +196,7 @@ intel_overlay_map_regs(struct intel_overlay *overlay)
 	if (OVERLAY_NEEDS_PHYSICAL(overlay->dev))
 		regs = (struct overlay_registers __iomem *)overlay->reg_bo->phys_handle->vaddr;
 	else
-		regs = io_mapping_map_wc(dev_priv->gtt.mappable,
+		regs = io_mapping_map_wc(&dev_priv->gtt.mappable,
 					 overlay->flip_addr);
 
 	return regs;
@@ -1493,7 +1493,7 @@ intel_overlay_map_regs_atomic(struct intel_overlay *overlay)
 		regs = (struct overlay_registers __iomem *)
 			overlay->reg_bo->phys_handle->vaddr;
 	else
-		regs = io_mapping_map_atomic_wc(dev_priv->gtt.mappable,
+		regs = io_mapping_map_atomic_wc(&dev_priv->gtt.mappable,
 						overlay->flip_addr);
 
 	return regs;
diff --git a/include/linux/io-mapping.h b/include/linux/io-mapping.h
index e399029b68c5..34bb310c9a0e 100644
--- a/include/linux/io-mapping.h
+++ b/include/linux/io-mapping.h
@@ -31,16 +31,17 @@
  * See Documentation/io-mapping.txt
  */
 
-#ifdef CONFIG_HAVE_ATOMIC_IOMAP
-
-#include <asm/iomap.h>
-
 struct io_mapping {
 	resource_size_t base;
 	unsigned long size;
 	pgprot_t prot;
+	void __iomem *iomem;
 };
 
+
+#ifdef CONFIG_HAVE_ATOMIC_IOMAP
+
+#include <asm/iomap.h>
 /*
  * For small address space machines, mapping large objects
  * into the kernel virtual space isn't practical. Where
@@ -49,34 +50,25 @@ struct io_mapping {
  */
 
 static inline struct io_mapping *
-io_mapping_create_wc(resource_size_t base, unsigned long size)
+io_mapping_init_wc(struct io_mapping *iomap,
+		   resource_size_t base,
+		   unsigned long size)
 {
-	struct io_mapping *iomap;
 	pgprot_t prot;
 
-	iomap = kmalloc(sizeof(*iomap), GFP_KERNEL);
-	if (!iomap)
-		goto out_err;
-
 	if (iomap_create_wc(base, size, &prot))
-		goto out_free;
+		return NULL;
 
 	iomap->base = base;
 	iomap->size = size;
 	iomap->prot = prot;
 	return iomap;
-
-out_free:
-	kfree(iomap);
-out_err:
-	return NULL;
 }
 
 static inline void
-io_mapping_free(struct io_mapping *mapping)
+io_mapping_fini(struct io_mapping *mapping)
 {
 	iomap_free(mapping->base, mapping->size);
-	kfree(mapping);
 }
 
 /* Atomic map/unmap */
@@ -119,21 +111,38 @@ io_mapping_unmap(void __iomem *vaddr)
 #else
 
 #include <linux/uaccess.h>
-
-/* this struct isn't actually defined anywhere */
-struct io_mapping;
+#include <asm/pgtable_types.h>
 
 /* Create the io_mapping object*/
 static inline struct io_mapping *
-io_mapping_create_wc(resource_size_t base, unsigned long size)
+io_mapping_init_wc(struct io_mapping *iomap,
+		   resource_size_t base,
+		   unsigned long size)
 {
-	return (struct io_mapping __force *) ioremap_wc(base, size);
+	iomap->base = base;
+	iomap->size = size;
+	iomap->iomem = ioremap_wc(base, size);
+	if (!iomap->iomem)
+		return NULL;
+	iomap->prot = pgprot_writecombine(PAGE_KERNEL_IO);
+
+	return iomap;
 }
 
 static inline void
-io_mapping_free(struct io_mapping *mapping)
+io_mapping_fini(struct io_mapping *mapping)
+{
+	iounmap(mapping->iomem);
+}
+
+/* Non-atomic map/unmap */
+static inline void __iomem *
+io_mapping_map_wc(struct io_mapping *mapping, unsigned long offset)
+{
+	return mapping->iomem + offset;
+}
+
+static inline void
+io_mapping_unmap(void __iomem *vaddr)
 {
-	iounmap((void __force __iomem *) mapping);
 }
 
 /* Atomic map/unmap */
@@ -143,28 +152,37 @@ io_mapping_map_atomic_wc(struct io_mapping *mapping,
 {
 	preempt_disable();
 	pagefault_disable();
-	return ((char __force __iomem *) mapping) + offset;
+	return io_mapping_map_wc(mapping, offset);
 }
 
 static inline void
 io_mapping_unmap_atomic(void __iomem *vaddr)
 {
+	io_mapping_unmap(vaddr);
 	pagefault_enable();
 	preempt_enable();
 }
 
-/* Non-atomic map/unmap */
-static inline void __iomem *
-io_mapping_map_wc(struct io_mapping *mapping, unsigned long offset)
+#endif /* HAVE_ATOMIC_IOMAP */
+
+static inline struct io_mapping *
+io_mapping_create_wc(resource_size_t base,
+		     unsigned long size)
 {
-	return ((char __force __iomem *) mapping) + offset;
+	struct io_mapping *iomap;
+
+	iomap = kmalloc(sizeof(*iomap), GFP_KERNEL);
+	if (iomap == NULL)
+		return NULL;
+
+	if (!io_mapping_init_wc(iomap, base, size)) {
+		kfree(iomap);
+		return NULL;
+	}
+
+	return iomap;
 }
 
 static inline void
-io_mapping_unmap(void __iomem *vaddr)
+io_mapping_free(struct io_mapping *iomap)
 {
+	io_mapping_fini(iomap);
+	kfree(iomap);
 }
 
-#endif /* HAVE_ATOMIC_IOMAP */
-
 #endif /* _LINUX_IO_MAPPING_H */
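
For reference, a minimal sketch of how a driver uses the embedded form
of the API above (the struct and function names here are hypothetical):

	struct example_dev {
		struct io_mapping iomap; /* embedded, no separate allocation */
	};

	static int example_init(struct example_dev *dev,
				resource_size_t base, unsigned long size)
	{
		if (!io_mapping_init_wc(&dev->iomap, base, size))
			return -EIO;
		return 0;
	}

	static void example_fini(struct example_dev *dev)
	{
		io_mapping_fini(&dev->iomap);
	}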
-- 
2.7.0.rc3


* [PATCH 147/190] drm/i915: Use remap_io_mapping() to prefault all PTE in a single pass
  2016-01-11 11:00 ` [PATCH 142/190] drm/i915: Fallback to using unmappable memory for scanout Chris Wilson
                     ` (3 preceding siblings ...)
  2016-01-11 11:00   ` [PATCH 146/190] io-mapping: Always create a struct to hold metadata about the io-mapping Chris Wilson
@ 2016-01-11 11:00   ` Chris Wilson
  2016-01-11 11:00   ` [PATCH 148/190] drm/i915: Stop marking the unaccessible scratch page as UC Chris Wilson
                     ` (42 subsequent siblings)
  47 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 11:00 UTC (permalink / raw)
  To: intel-gfx

On an Ivybridge i7-3720qm with 1600MHz DDR3, with 32 fences,
Upload rate for 2 linear surfaces:  8134MiB/s -> 8154MiB/s
Upload rate for 2 tiled surfaces:   8625MiB/s -> 8632MiB/s
Upload rate for 4 linear surfaces:  8127MiB/s -> 8134MiB/s
Upload rate for 4 tiled surfaces:   8602MiB/s -> 8629MiB/s
Upload rate for 8 linear surfaces:  8124MiB/s -> 8137MiB/s
Upload rate for 8 tiled surfaces:   8603MiB/s -> 8624MiB/s
Upload rate for 16 linear surfaces: 8123MiB/s -> 8128MiB/s
Upload rate for 16 tiled surfaces:  8606MiB/s -> 8618MiB/s
Upload rate for 32 linear surfaces: 8121MiB/s -> 8128MiB/s
Upload rate for 32 tiled surfaces:  8605MiB/s -> 8614MiB/s
Upload rate for 64 linear surfaces: 8121MiB/s -> 8127MiB/s
Upload rate for 64 tiled surfaces:  3017MiB/s -> 5202MiB/s
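
The win comes from replacing the per-page vm_insert_pfn() loop in the
fault handler with a single pagetable walk. A simplified sketch of the
difference (error handling elided; see the diff below for the real
code):

	/* before: one PTE at a time, retaking the pagetable locks per page */
	for (i = 0; i < size >> PAGE_SHIFT; i++) {
		ret = vm_insert_pfn(vma, vma->vm_start + i * PAGE_SIZE, pfn + i);
		if (ret)
			break;
	}

	/* after: one call fills every PTE in a single walk */
	ret = remap_io_mapping(vma, vma->vm_start, pfn, size,
			       &dev_priv->gtt.mappable);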

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Testcase: igt/gem_fence_upload/performance
Testcase: igt/gem_mmap_gtt
---
 drivers/gpu/drm/Makefile           |   2 +-
 drivers/gpu/drm/i915/Makefile      |   5 +-
 drivers/gpu/drm/i915/i915_drv.h    |   4 ++
 drivers/gpu/drm/i915/i915_gem.c    |  46 +++-----------
 drivers/gpu/drm/i915/i915_memory.c | 122 +++++++++++++++++++++++++++++++++++++
 5 files changed, 138 insertions(+), 41 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_memory.c

diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
index f858aa25fbb2..6834d0e33741 100644
--- a/drivers/gpu/drm/Makefile
+++ b/drivers/gpu/drm/Makefile
@@ -43,7 +43,7 @@ obj-$(CONFIG_DRM_RADEON)+= radeon/
 obj-$(CONFIG_DRM_AMDGPU)+= amd/amdgpu/
 obj-$(CONFIG_DRM_MGA)	+= mga/
 obj-$(CONFIG_DRM_I810)	+= i810/
-obj-$(CONFIG_DRM_I915)  += i915/
+obj-y += i915/
 obj-$(CONFIG_DRM_MGAG200) += mgag200/
 obj-$(CONFIG_DRM_VC4)  += vc4/
 obj-$(CONFIG_DRM_CIRRUS_QEMU) += cirrus/
diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 79d657f29241..a362425ef862 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -100,6 +100,9 @@ i915-y += i915_vgpu.o
 # legacy horrors
 i915-y += i915_dma.o
 
-obj-$(CONFIG_DRM_I915)  += i915.o
+obj-$(CONFIG_DRM_I915) += i915.o
+ifdef CONFIG_DRM_I915
+obj-y += i915_memory.o
+endif
 
 CFLAGS_i915_trace_points.o := -I$(src)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 45b8cbdfab55..e6f49175af1b 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3447,4 +3447,8 @@ static inline bool __i915_request_irq_complete(struct drm_i915_gem_request *req)
 	return false;
 }
 
+int remap_io_mapping(struct vm_area_struct *vma,
+		     unsigned long addr, unsigned long pfn, unsigned long size,
+		     struct io_mapping *iomap);
+
 #endif
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 7e321fdd90d2..1fa4752682d6 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1449,7 +1449,6 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct i915_vma *ggtt;
 	pgoff_t page_offset;
-	unsigned long pfn;
 	int ret = 0;
 	bool write = !!(vmf->flags & FAULT_FLAG_WRITE);
 
@@ -1517,44 +1516,13 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 		goto unpin;
 
 	/* Finally, remap it using the new GTT offset */
-	pfn = dev_priv->gtt.mappable_base + ggtt->node.start;
-	pfn >>= PAGE_SHIFT;
-
-	if (ggtt->ggtt_view.type == I915_GGTT_VIEW_NORMAL) {
-		if (!obj->fault_mappable) {
-			unsigned long size = min_t(unsigned long,
-						   vma->vm_end - vma->vm_start,
-						   obj->base.size);
-			int i;
-
-			for (i = 0; i < size >> PAGE_SHIFT; i++) {
-				ret = vm_insert_pfn(vma,
-						    (unsigned long)vma->vm_start + i * PAGE_SIZE,
-						    pfn + i);
-				if (ret)
-					break;
-			}
-		} else
-			ret = vm_insert_pfn(vma,
-					    (unsigned long)vmf->virtual_address,
-					    pfn + page_offset);
-	} else {
-		/* Overriding existing pages in partial view does not cause
-		 * us any trouble as TLBs are still valid because the fault
-		 * is due to userspace losing part of the mapping or never
-		 * having accessed it before (at this partials' range).
-		 */
-		const struct i915_ggtt_view *view = &ggtt->ggtt_view;
-		unsigned long base = vma->vm_start +
-				     (view->params.partial.offset << PAGE_SHIFT);
-		unsigned int i;
-
-		for (i = 0; i < view->params.partial.size; i++) {
-			ret = vm_insert_pfn(vma, base + i * PAGE_SIZE, pfn + i);
-			if (ret)
-				break;
-		}
-	}
+	ret = remap_io_mapping(vma,
+			       vma->vm_start + (ggtt->ggtt_view.params.partial.offset << PAGE_SHIFT),
+			       (dev_priv->gtt.mappable_base + ggtt->node.start) >> PAGE_SHIFT,
+			       min_t(u64, ggtt->size, vma->vm_end - vma->vm_start),
+			       &dev_priv->gtt.mappable);
+	if (ret)
+		goto unpin;
 
 	obj->fault_mappable = true;
 unpin:
diff --git a/drivers/gpu/drm/i915/i915_memory.c b/drivers/gpu/drm/i915/i915_memory.c
new file mode 100644
index 000000000000..f684576022f3
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_memory.c
@@ -0,0 +1,122 @@
+#include <linux/mm.h>
+#include <linux/io-mapping.h>
+
+#include <asm/io.h>
+#include <asm/pgalloc.h>
+#include <asm/uaccess.h>
+#include <asm/tlb.h>
+#include <asm/tlbflush.h>
+#include <asm/pgtable.h>
+
+#include "i915_drv.h"
+
+struct remap_pfn {
+	struct mm_struct *mm;
+	unsigned long addr;
+	unsigned long pfn;
+	pgprot_t prot;
+};
+
+static inline void remap_pfn(struct remap_pfn *r, pte_t *pte)
+{
+	set_pte_at(r->mm, r->addr, pte,
+		   pte_mkspecial(pfn_pte(r->pfn, r->prot)));
+	r->pfn++;
+	r->addr += PAGE_SIZE;
+}
+
+static inline int remap_pte_range(struct remap_pfn *r, pmd_t *pmd, unsigned long end)
+{
+	pte_t *pte;
+	spinlock_t *ptl;
+
+	pte = pte_alloc_map_lock(r->mm, pmd, r->addr, &ptl);
+	if (!pte)
+		return -ENOMEM;
+
+	arch_enter_lazy_mmu_mode();
+	do
+		remap_pfn(r, pte++);
+	while (r->addr < end);
+	arch_leave_lazy_mmu_mode();
+
+	pte_unmap_unlock(pte - 1, ptl);
+	return 0;
+}
+
+static inline int remap_pmd_range(struct remap_pfn *r, pud_t *pud, unsigned long end)
+{
+	pmd_t *pmd;
+	int err;
+
+	pmd = pmd_alloc(r->mm, pud, r->addr);
+	if (!pmd)
+		return -ENOMEM;
+	VM_BUG_ON(pmd_trans_huge(*pmd));
+
+	do
+		err = remap_pte_range(r, pmd++, pmd_addr_end(r->addr, end));
+	while (err == 0 && r->addr < end);
+
+	return err;
+}
+
+static inline int remap_pud_range(struct remap_pfn *r, pgd_t *pgd, unsigned long end)
+{
+	pud_t *pud;
+	int err;
+
+	pud = pud_alloc(r->mm, pgd, r->addr);
+	if (!pud)
+		return -ENOMEM;
+
+	do
+		err = remap_pmd_range(r, pud++, pud_addr_end(r->addr, end));
+	while (err == 0 && r->addr < end);
+
+	return err;
+}
+
+/**
+ * remap_io_mapping - remap an IO mapping to userspace
+ * @vma: user vma to map to
+ * @addr: target user address to start at
+ * @pfn: physical address of kernel memory
+ * @size: size of map area
+ * @iomap: the source io_mapping
+ *
+ *  Note: this is only safe if the mm semaphore is held when called.
+ */
+int remap_io_mapping(struct vm_area_struct *vma,
+		     unsigned long addr, unsigned long pfn, unsigned long size,
+		     struct io_mapping *iomap)
+{
+	unsigned long end = addr + PAGE_ALIGN(size);
+	struct remap_pfn r;
+	pgd_t *pgd;
+	int err;
+
+	if (WARN_ON(addr >= end))
+		return -EINVAL;
+
+#define MUST_SET (VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP)
+	BUG_ON((vma->vm_flags & MUST_SET) != MUST_SET);
+#undef MUST_SET
+
+	r.mm = vma->vm_mm;
+	r.addr = addr;
+	r.pfn = pfn;
+	r.prot = __pgprot((pgprot_val(iomap->prot) & _PAGE_CACHE_MASK) |
+			  (pgprot_val(vma->vm_page_prot) & ~_PAGE_CACHE_MASK));
+
+	pgd = pgd_offset(r.mm, addr);
+	do
+		err = remap_pud_range(&r, pgd++, pgd_addr_end(r.addr, end));
+	while (err == 0 && r.addr < end);
+
+	if (err)
+		zap_vma_ptes(vma, addr, r.addr - addr);
+
+	return err;
+}
+EXPORT_SYMBOL_GPL(remap_io_mapping);
-- 
2.7.0.rc3


* [PATCH 148/190] drm/i915: Stop marking the unaccessible scratch page as UC
  2016-01-11 11:00 ` [PATCH 142/190] drm/i915: Fallback to using unmappable memory for scanout Chris Wilson
                     ` (4 preceding siblings ...)
  2016-01-11 11:00   ` [PATCH 147/190] drm/i915: Use remap_io_mapping() to prefault all PTE in a single pass Chris Wilson
@ 2016-01-11 11:00   ` Chris Wilson
  2016-01-11 11:00   ` [PATCH 149/190] drm/i915: Use i915_vm_to_ppgtt() Chris Wilson
                     ` (41 subsequent siblings)
  47 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 11:00 UTC (permalink / raw)
  To: intel-gfx

Since by design, if not entirely in practice, nothing is allowed to
access the scratch page we use to background-fill the VM, we do not
need to ensure that it is coherent between the CPU and GPU.
set_pages_uc() does a stop_machine() after changing the PAT, and that
significantly impacts context creation throughput.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 3db8cdf56dcc..caf573fd2dde 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -407,16 +407,12 @@ static struct i915_page_scratch *alloc_scratch_page(struct drm_device *dev)
 		return ERR_PTR(ret);
 	}
 
-	set_pages_uc(px_page(sp), 1);
-
 	return sp;
 }
 
 static void free_scratch_page(struct drm_device *dev,
 			      struct i915_page_scratch *sp)
 {
-	set_pages_wb(px_page(sp), 1);
-
 	cleanup_px(dev, sp);
 	kfree(sp);
 }
-- 
2.7.0.rc3


* [PATCH 149/190] drm/i915: Use i915_vm_to_ppgtt()
  2016-01-11 11:00 ` [PATCH 142/190] drm/i915: Fallback to using unmappable memory for scanout Chris Wilson
                     ` (5 preceding siblings ...)
  2016-01-11 11:00   ` [PATCH 148/190] drm/i915: Stop marking the unaccessible scratch page as UC Chris Wilson
@ 2016-01-11 11:00   ` Chris Wilson
  2016-01-11 11:00   ` [PATCH 150/190] drm/i915: Embed the scratch page struct into each VM Chris Wilson
                     ` (40 subsequent siblings)
  47 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 11:00 UTC (permalink / raw)
  To: intel-gfx

We have a typesafe wrapper to extract the ppgtt from a generic address
space, but it was used in only one of a few dozen places.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h     |  1 -
 drivers/gpu/drm/i915/i915_gem_gtt.c | 40 ++++++++++++-------------------------
 2 files changed, 13 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index e6f49175af1b..c460dc0c14e1 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2914,7 +2914,6 @@ i915_gem_obj_lookup_or_create_vma(struct drm_i915_gem_object *obj,
 static inline struct i915_hw_ppgtt *
 i915_vm_to_ppgtt(struct i915_address_space *vm)
 {
-	WARN_ON(i915_is_ggtt(vm));
 	return container_of(vm, struct i915_hw_ppgtt, base);
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index caf573fd2dde..4fea8d221ba7 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -700,8 +700,7 @@ static void gen8_ppgtt_clear_pte_range(struct i915_address_space *vm,
 				       uint64_t length,
 				       gen8_pte_t scratch_pte)
 {
-	struct i915_hw_ppgtt *ppgtt =
-		container_of(vm, struct i915_hw_ppgtt, base);
+	struct i915_hw_ppgtt *ppgtt = i915_vm_to_ppgtt(vm);
 	gen8_pte_t *pt_vaddr;
 	unsigned pdpe = gen8_pdpe_index(start);
 	unsigned pde = gen8_pde_index(start);
@@ -756,8 +755,7 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 				   uint64_t length,
 				   bool use_scratch)
 {
-	struct i915_hw_ppgtt *ppgtt =
-		container_of(vm, struct i915_hw_ppgtt, base);
+	struct i915_hw_ppgtt *ppgtt = i915_vm_to_ppgtt(vm);
 	gen8_pte_t scratch_pte = gen8_pte_encode(px_dma(vm->scratch_page),
 						 I915_CACHE_LLC, use_scratch);
 
@@ -782,8 +780,7 @@ gen8_ppgtt_insert_pte_entries(struct i915_address_space *vm,
 			      uint64_t start,
 			      enum i915_cache_level cache_level)
 {
-	struct i915_hw_ppgtt *ppgtt =
-		container_of(vm, struct i915_hw_ppgtt, base);
+	struct i915_hw_ppgtt *ppgtt = i915_vm_to_ppgtt(vm);
 	gen8_pte_t *pt_vaddr;
 	unsigned pdpe = gen8_pdpe_index(start);
 	unsigned pde = gen8_pde_index(start);
@@ -823,8 +820,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 				      enum i915_cache_level cache_level,
 				      u32 unused)
 {
-	struct i915_hw_ppgtt *ppgtt =
-		container_of(vm, struct i915_hw_ppgtt, base);
+	struct i915_hw_ppgtt *ppgtt = i915_vm_to_ppgtt(vm);
 	struct sg_page_iter sg_iter;
 
 	__sg_page_iter_start(&sg_iter, pages->sgl, sg_nents(pages->sgl), 0);
@@ -975,8 +971,7 @@ static void gen8_ppgtt_cleanup_4lvl(struct i915_hw_ppgtt *ppgtt)
 
 static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 {
-	struct i915_hw_ppgtt *ppgtt =
-		container_of(vm, struct i915_hw_ppgtt, base);
+	struct i915_hw_ppgtt *ppgtt = i915_vm_to_ppgtt(vm);
 
 	if (intel_vgpu_active(vm->dev))
 		gen8_ppgtt_notify_vgt(ppgtt, false);
@@ -1210,8 +1205,7 @@ static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm,
 				    uint64_t start,
 				    uint64_t length)
 {
-	struct i915_hw_ppgtt *ppgtt =
-		container_of(vm, struct i915_hw_ppgtt, base);
+	struct i915_hw_ppgtt *ppgtt = i915_vm_to_ppgtt(vm);
 	unsigned long *new_page_dirs, *new_page_tables;
 	struct drm_device *dev = vm->dev;
 	struct i915_page_directory *pd;
@@ -1323,8 +1317,7 @@ static int gen8_alloc_va_range_4lvl(struct i915_address_space *vm,
 				    uint64_t length)
 {
 	DECLARE_BITMAP(new_pdps, GEN8_PML4ES_PER_PML4);
-	struct i915_hw_ppgtt *ppgtt =
-			container_of(vm, struct i915_hw_ppgtt, base);
+	struct i915_hw_ppgtt *ppgtt = i915_vm_to_ppgtt(vm);
 	struct i915_page_directory_pointer *pdp;
 	uint64_t pml4e;
 	int ret = 0;
@@ -1370,8 +1363,7 @@ err_out:
 static int gen8_alloc_va_range(struct i915_address_space *vm,
 			       uint64_t start, uint64_t length)
 {
-	struct i915_hw_ppgtt *ppgtt =
-		container_of(vm, struct i915_hw_ppgtt, base);
+	struct i915_hw_ppgtt *ppgtt = i915_vm_to_ppgtt(vm);
 
 	if (USES_FULL_48BIT_PPGTT(vm->dev))
 		return gen8_alloc_va_range_4lvl(vm, &ppgtt->pml4, start, length);
@@ -1792,8 +1784,7 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
 				   uint64_t length,
 				   bool use_scratch)
 {
-	struct i915_hw_ppgtt *ppgtt =
-		container_of(vm, struct i915_hw_ppgtt, base);
+	struct i915_hw_ppgtt *ppgtt = i915_vm_to_ppgtt(vm);
 	gen6_pte_t *pt_vaddr, scratch_pte;
 	unsigned first_entry = start >> PAGE_SHIFT;
 	unsigned num_entries = length >> PAGE_SHIFT;
@@ -1827,8 +1818,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 				      uint64_t start,
 				      enum i915_cache_level cache_level, u32 flags)
 {
-	struct i915_hw_ppgtt *ppgtt =
-		container_of(vm, struct i915_hw_ppgtt, base);
+	struct i915_hw_ppgtt *ppgtt = i915_vm_to_ppgtt(vm);
 	gen6_pte_t *pt_vaddr;
 	unsigned first_entry = start >> PAGE_SHIFT;
 	unsigned act_pt = first_entry / GEN6_PTES;
@@ -1861,8 +1851,7 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
 	DECLARE_BITMAP(new_page_tables, I915_PDES);
 	struct drm_device *dev = vm->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct i915_hw_ppgtt *ppgtt =
-				container_of(vm, struct i915_hw_ppgtt, base);
+	struct i915_hw_ppgtt *ppgtt = i915_vm_to_ppgtt(vm);
 	struct i915_page_table *pt;
 	uint32_t start, length, start_save, length_save;
 	uint32_t pde, temp;
@@ -1974,8 +1963,7 @@ static void gen6_free_scratch(struct i915_address_space *vm)
 
 static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
 {
-	struct i915_hw_ppgtt *ppgtt =
-		container_of(vm, struct i915_hw_ppgtt, base);
+	struct i915_hw_ppgtt *ppgtt = i915_vm_to_ppgtt(vm);
 	struct i915_page_table *pt;
 	uint32_t pde;
 
@@ -3226,9 +3214,7 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev)
 		list_for_each_entry(vm, &dev_priv->vm_list, global_link) {
 			/* TODO: Perhaps it shouldn't be gen6 specific */
 
-			struct i915_hw_ppgtt *ppgtt =
-					container_of(vm, struct i915_hw_ppgtt,
-						     base);
+			struct i915_hw_ppgtt *ppgtt = i915_vm_to_ppgtt(vm);
 
 			if (i915_is_ggtt(vm))
 				ppgtt = dev_priv->mm.aliasing_ppgtt;
-- 
2.7.0.rc3


* [PATCH 150/190] drm/i915: Embed the scratch page struct into each VM
  2016-01-11 11:00 ` [PATCH 142/190] drm/i915: Fallback to using unmappable memory for scanout Chris Wilson
                     ` (6 preceding siblings ...)
  2016-01-11 11:00   ` [PATCH 149/190] drm/i915: Use i915_vm_to_ppgtt() Chris Wilson
@ 2016-01-11 11:00   ` Chris Wilson
  2016-01-11 11:00   ` [PATCH 151/190] drm/i915: Allow DMA pagetables to use highmem Chris Wilson
                     ` (39 subsequent siblings)
  47 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 11:00 UTC (permalink / raw)
  To: intel-gfx

As the scratch page is no longer shared between all VMs, and each has
its own, forgo the small allocation and simply embed the scratch page
struct into the i915_address_space.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 83 +++++++++++++++----------------------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  6 +--
 2 files changed, 35 insertions(+), 54 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 4fea8d221ba7..fa7dedd395ee 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -392,29 +392,16 @@ static void fill_page_dma_32(struct drm_device *dev, struct i915_page_dma *p,
 	fill_page_dma(dev, p, v);
 }
 
-static struct i915_page_scratch *alloc_scratch_page(struct drm_device *dev)
+static int
+setup_scratch_page(struct drm_device *dev, struct i915_page_dma *scratch)
 {
-	struct i915_page_scratch *sp;
-	int ret;
-
-	sp = kzalloc(sizeof(*sp), GFP_KERNEL);
-	if (sp == NULL)
-		return ERR_PTR(-ENOMEM);
-
-	ret = __setup_page_dma(dev, px_base(sp), GFP_DMA32 | __GFP_ZERO);
-	if (ret) {
-		kfree(sp);
-		return ERR_PTR(ret);
-	}
-
-	return sp;
+	return __setup_page_dma(dev, scratch, GFP_DMA32 | __GFP_ZERO);
 }
 
-static void free_scratch_page(struct drm_device *dev,
-			      struct i915_page_scratch *sp)
+static void cleanup_scratch_page(struct drm_device *dev,
+				 struct i915_page_dma *scratch)
 {
-	cleanup_px(dev, sp);
-	kfree(sp);
+	cleanup_page_dma(dev, scratch);
 }
 
 static struct i915_page_table *alloc_pt(struct drm_device *dev)
@@ -460,7 +447,7 @@ static void gen8_initialize_pt(struct i915_address_space *vm,
 {
 	gen8_pte_t scratch_pte;
 
-	scratch_pte = gen8_pte_encode(px_dma(vm->scratch_page),
+	scratch_pte = gen8_pte_encode(vm->scratch_page.daddr,
 				      I915_CACHE_LLC, true);
 
 	fill_px(vm->dev, pt, scratch_pte);
@@ -471,9 +458,9 @@ static void gen6_initialize_pt(struct i915_address_space *vm,
 {
 	gen6_pte_t scratch_pte;
 
-	WARN_ON(px_dma(vm->scratch_page) == 0);
+	WARN_ON(vm->scratch_page.daddr == 0);
 
-	scratch_pte = vm->pte_encode(px_dma(vm->scratch_page),
+	scratch_pte = vm->pte_encode(vm->scratch_page.daddr,
 				     I915_CACHE_LLC, true, 0);
 
 	fill32_px(vm->dev, pt, scratch_pte);
@@ -756,7 +743,7 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 				   bool use_scratch)
 {
 	struct i915_hw_ppgtt *ppgtt = i915_vm_to_ppgtt(vm);
-	gen8_pte_t scratch_pte = gen8_pte_encode(px_dma(vm->scratch_page),
+	gen8_pte_t scratch_pte = gen8_pte_encode(vm->scratch_page.daddr,
 						 I915_CACHE_LLC, use_scratch);
 
 	if (!USES_FULL_48BIT_PPGTT(vm->dev)) {
@@ -860,21 +847,22 @@ static void gen8_free_page_tables(struct drm_device *dev,
 static int gen8_init_scratch(struct i915_address_space *vm)
 {
 	struct drm_device *dev = vm->dev;
+	int ret;
 
-	vm->scratch_page = alloc_scratch_page(dev);
-	if (IS_ERR(vm->scratch_page))
-		return PTR_ERR(vm->scratch_page);
+	ret = setup_scratch_page(dev, &vm->scratch_page);
+	if (ret)
+		return ret;
 
 	vm->scratch_pt = alloc_pt(dev);
 	if (IS_ERR(vm->scratch_pt)) {
-		free_scratch_page(dev, vm->scratch_page);
+		cleanup_scratch_page(dev, &vm->scratch_page);
 		return PTR_ERR(vm->scratch_pt);
 	}
 
 	vm->scratch_pd = alloc_pd(dev);
 	if (IS_ERR(vm->scratch_pd)) {
 		free_pt(dev, vm->scratch_pt);
-		free_scratch_page(dev, vm->scratch_page);
+		cleanup_scratch_page(dev, &vm->scratch_page);
 		return PTR_ERR(vm->scratch_pd);
 	}
 
@@ -883,7 +871,7 @@ static int gen8_init_scratch(struct i915_address_space *vm)
 		if (IS_ERR(vm->scratch_pdp)) {
 			free_pd(dev, vm->scratch_pd);
 			free_pt(dev, vm->scratch_pt);
-			free_scratch_page(dev, vm->scratch_page);
+			cleanup_scratch_page(dev, &vm->scratch_page);
 			return PTR_ERR(vm->scratch_pdp);
 		}
 	}
@@ -936,7 +924,7 @@ static void gen8_free_scratch(struct i915_address_space *vm)
 		free_pdp(dev, vm->scratch_pdp);
 	free_pd(dev, vm->scratch_pd);
 	free_pt(dev, vm->scratch_pt);
-	free_scratch_page(dev, vm->scratch_page);
+	cleanup_scratch_page(dev, &vm->scratch_page);
 }
 
 static void gen8_ppgtt_cleanup_3lvl(struct drm_device *dev,
@@ -1433,7 +1421,7 @@ static void gen8_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 	struct i915_address_space *vm = &ppgtt->base;
 	uint64_t start = ppgtt->base.start;
 	uint64_t length = ppgtt->base.total;
-	gen8_pte_t scratch_pte = gen8_pte_encode(px_dma(vm->scratch_page),
+	gen8_pte_t scratch_pte = gen8_pte_encode(vm->scratch_page.daddr,
 						 I915_CACHE_LLC, true);
 
 	if (!USES_FULL_48BIT_PPGTT(vm->dev)) {
@@ -1550,7 +1538,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 	uint32_t  pte, pde, temp;
 	uint32_t start = ppgtt->base.start, length = ppgtt->base.total;
 
-	scratch_pte = vm->pte_encode(px_dma(vm->scratch_page),
+	scratch_pte = vm->pte_encode(vm->scratch_page.daddr,
 				     I915_CACHE_LLC, true, 0);
 
 	gen6_for_each_pde(unused, &ppgtt->pd, start, length, temp, pde) {
@@ -1792,7 +1780,7 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
 	unsigned first_pte = first_entry % GEN6_PTES;
 	unsigned last_pte, i;
 
-	scratch_pte = vm->pte_encode(px_dma(vm->scratch_page),
+	scratch_pte = vm->pte_encode(vm->scratch_page.daddr,
 				     I915_CACHE_LLC, true, 0);
 
 	while (num_entries) {
@@ -1937,14 +1925,15 @@ unwind_out:
 static int gen6_init_scratch(struct i915_address_space *vm)
 {
 	struct drm_device *dev = vm->dev;
+	int ret;
 
-	vm->scratch_page = alloc_scratch_page(dev);
-	if (IS_ERR(vm->scratch_page))
-		return PTR_ERR(vm->scratch_page);
+	ret = setup_scratch_page(dev, &vm->scratch_page);
+	if (ret)
+		return ret;
 
 	vm->scratch_pt = alloc_pt(dev);
 	if (IS_ERR(vm->scratch_pt)) {
-		free_scratch_page(dev, vm->scratch_page);
+		cleanup_scratch_page(dev, &vm->scratch_page);
 		return PTR_ERR(vm->scratch_pt);
 	}
 
@@ -1958,7 +1947,7 @@ static void gen6_free_scratch(struct i915_address_space *vm)
 	struct drm_device *dev = vm->dev;
 
 	free_pt(dev, vm->scratch_pt);
-	free_scratch_page(dev, vm->scratch_page);
+	cleanup_scratch_page(dev, &vm->scratch_page);
 }
 
 static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
@@ -2481,7 +2470,7 @@ static void gen8_ggtt_clear_range(struct i915_address_space *vm,
 		 first_entry, num_entries, max_entries))
 		num_entries = max_entries;
 
-	scratch_pte = gen8_pte_encode(px_dma(vm->scratch_page),
+	scratch_pte = gen8_pte_encode(vm->scratch_page.daddr,
 				      I915_CACHE_LLC,
 				      use_scratch);
 	for (i = 0; i < num_entries; i++)
@@ -2512,7 +2501,7 @@ static void gen6_ggtt_clear_range(struct i915_address_space *vm,
 		 first_entry, num_entries, max_entries))
 		num_entries = max_entries;
 
-	scratch_pte = vm->pte_encode(px_dma(vm->scratch_page),
+	scratch_pte = vm->pte_encode(vm->scratch_page.daddr,
 				     I915_CACHE_LLC, use_scratch, 0);
 
 	for (i = 0; i < num_entries; i++)
@@ -2867,8 +2856,8 @@ static int ggtt_probe_common(struct drm_device *dev,
 			     size_t gtt_size)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct i915_page_scratch *scratch_page;
 	phys_addr_t gtt_phys_addr;
+	int ret;
 
 	/* For Modern GENs the PTEs and register space are split in the BAR */
 	gtt_phys_addr = pci_resource_start(dev->pdev, 0) +
@@ -2890,16 +2879,14 @@ static int ggtt_probe_common(struct drm_device *dev,
 		return -ENOMEM;
 	}
 
-	scratch_page = alloc_scratch_page(dev);
-	if (IS_ERR(scratch_page)) {
+	ret = setup_scratch_page(dev, &dev_priv->gtt.base.scratch_page);
+	if (ret) {
 		DRM_ERROR("Scratch setup failed\n");
 		/* iounmap will also get called at remove, but meh */
 		iounmap(dev_priv->gtt.gsm);
-		return PTR_ERR(scratch_page);
+		return ret;
 	}
 
-	dev_priv->gtt.base.scratch_page = scratch_page;
-
 	return 0;
 }
 
@@ -3071,11 +3058,10 @@ static int gen6_gmch_probe(struct drm_device *dev,
 
 static void gen6_gmch_remove(struct i915_address_space *vm)
 {
-
 	struct i915_gtt *gtt = container_of(vm, struct i915_gtt, base);
 
 	iounmap(gtt->gsm);
-	free_scratch_page(vm->dev, vm->scratch_page);
+	cleanup_scratch_page(vm->dev, &vm->scratch_page);
 }
 
 static int i915_gmch_probe(struct drm_device *dev,
@@ -3213,7 +3199,6 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev)
 	if (USES_PPGTT(dev)) {
 		list_for_each_entry(vm, &dev_priv->vm_list, global_link) {
 			/* TODO: Perhaps it shouldn't be gen6 specific */
-
 			struct i915_hw_ppgtt *ppgtt = i915_vm_to_ppgtt(vm);
 
 			if (i915_is_ggtt(vm))
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 16319835df10..06d11f941056 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -268,10 +268,6 @@ struct i915_page_dma {
 #define px_page(px) (px_base(px)->page)
 #define px_dma(px) (px_base(px)->daddr)
 
-struct i915_page_scratch {
-	struct i915_page_dma base;
-};
-
 struct i915_page_table {
 	struct i915_page_dma base;
 
@@ -317,7 +313,7 @@ struct i915_address_space {
 
 	bool closed;
 
-	struct i915_page_scratch *scratch_page;
+	struct i915_page_dma scratch_page;
 	struct i915_page_table *scratch_pt;
 	struct i915_page_directory *scratch_pd;
 	struct i915_page_directory_pointer *scratch_pdp; /* GEN8+ & 48b PPGTT */
-- 
2.7.0.rc3


* [PATCH 151/190] drm/i915: Allow DMA pagetables to use highmem
  2016-01-11 11:00 ` [PATCH 142/190] drm/i915: Fallback to using unmappable memory for scanout Chris Wilson
                     ` (7 preceding siblings ...)
  2016-01-11 11:00   ` [PATCH 150/190] drm/i915: Embed the scratch page struct into each VM Chris Wilson
@ 2016-01-11 11:00   ` Chris Wilson
  2016-01-11 11:00   ` [PATCH 152/190] drm/i915: Replace request->postfix with ->head for space searching Chris Wilson
                     ` (38 subsequent siblings)
  47 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 11:00 UTC (permalink / raw)
  To: intel-gfx

As we never need to directly access the pages we allocate for scratch and
the pagetables, and always remap them into the GTT through the DMA
remapper, we do not need to limit the allocations to lowmem, i.e. we can
pass in the __GFP_HIGHMEM flag to the page allocation.

For backwards compatibility, e.g. certain old GPUs not liking highmem
for certain functions that may be accidentally mapped to the scratch
page by userspace, keep the GMCH probe allocating only from DMA32.
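
As a sketch of why highmem is safe here: the CPU never needs a
permanent kernel mapping of these pages, only the GPU does, via the DMA
API -

	page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM);
	daddr = dma_map_page(dev, page, 0, PAGE_SIZE, DMA_BIDIRECTIONAL);
	/* daddr is what gets written into the GPU's PTEs; no kmap() needed */

and any transient CPU access goes through kmap_atomic().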

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index fa7dedd395ee..faee28c807f2 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -32,6 +32,8 @@
 #include "i915_trace.h"
 #include "intel_drv.h"
 
+#define I915_GFP_DMA (GFP_KERNEL | __GFP_HIGHMEM)
+
 /**
  * DOC: Global GTT views
  *
@@ -330,7 +332,7 @@ static int __setup_page_dma(struct drm_device *dev,
 
 static int setup_page_dma(struct drm_device *dev, struct i915_page_dma *p)
 {
-	return __setup_page_dma(dev, p, GFP_KERNEL);
+	return __setup_page_dma(dev, p, I915_GFP_DMA);
 }
 
 static void cleanup_page_dma(struct drm_device *dev, struct i915_page_dma *p)
@@ -393,9 +395,9 @@ static void fill_page_dma_32(struct drm_device *dev, struct i915_page_dma *p,
 }
 
 static int
-setup_scratch_page(struct drm_device *dev, struct i915_page_dma *scratch)
+setup_scratch_page(struct drm_device *dev, struct i915_page_dma *scratch, gfp_t gfp)
 {
-	return __setup_page_dma(dev, scratch, GFP_DMA32 | __GFP_ZERO);
+	return __setup_page_dma(dev, scratch, gfp | __GFP_ZERO);
 }
 
 static void cleanup_scratch_page(struct drm_device *dev,
@@ -849,7 +851,7 @@ static int gen8_init_scratch(struct i915_address_space *vm)
 	struct drm_device *dev = vm->dev;
 	int ret;
 
-	ret = setup_scratch_page(dev, &vm->scratch_page);
+	ret = setup_scratch_page(dev, &vm->scratch_page, I915_GFP_DMA);
 	if (ret)
 		return ret;
 
@@ -1927,7 +1929,7 @@ static int gen6_init_scratch(struct i915_address_space *vm)
 	struct drm_device *dev = vm->dev;
 	int ret;
 
-	ret = setup_scratch_page(dev, &vm->scratch_page);
+	ret = setup_scratch_page(dev, &vm->scratch_page, I915_GFP_DMA);
 	if (ret)
 		return ret;
 
@@ -2879,7 +2881,7 @@ static int ggtt_probe_common(struct drm_device *dev,
 		return -ENOMEM;
 	}
 
-	ret = setup_scratch_page(dev, &dev_priv->gtt.base.scratch_page);
+	ret = setup_scratch_page(dev, &dev_priv->gtt.base.scratch_page, GFP_DMA32);
 	if (ret) {
 		DRM_ERROR("Scratch setup failed\n");
 		/* iounmap will also get called at remove, but meh */
-- 
2.7.0.rc3


* [PATCH 152/190] drm/i915: Replace request->postfix with ->head for space searching
  2016-01-11 11:00 ` [PATCH 142/190] drm/i915: Fallback to using unmappable memory for scanout Chris Wilson
                     ` (8 preceding siblings ...)
  2016-01-11 11:00   ` [PATCH 151/190] drm/i915: Allow DMA pagetables to use highmem Chris Wilson
@ 2016-01-11 11:00   ` Chris Wilson
  2016-01-11 11:00   ` [PATCH 153/190] drm/i915: Record the position of the start of the request Chris Wilson
                     ` (37 subsequent siblings)
  47 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 11:00 UTC (permalink / raw)
  To: intel-gfx

We can simplify the request code slightly by removing the postfix marker
and simply using the head of the request when calculating how much space
will be available once we retire up to that request. (We ignore the end
of the request in case the interrupt arrives before the ring has actually
moved past the tail, which would risk overwriting an active part of the
ringbuffer.) Using the head for the space calculation limits us to
having requests such that any two can fit into the ringbuffer.
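
For reference, the space check against a request's head reduces to a
wrap-around subtraction; a simplified sketch of __intel_ring_space()
(ignoring the small slack the real helper reserves):

	space = target->head - ring->tail;
	if (space <= 0)
		space += ring->size;	/* the ring wraps */
	/* retiring up to 'target' frees 'space' bytes for new commands */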

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_request.c | 12 +++++-------
 drivers/gpu/drm/i915/i915_gem_request.h | 15 ++-------------
 drivers/gpu/drm/i915/i915_gpu_error.c   |  2 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c |  4 ++--
 4 files changed, 10 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 4ebe4b7e02d0..9e8e594ce2bd 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -371,7 +371,7 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
 	 * Note this requires that we are always called in request
 	 * completion order.
 	 */
-	request->ring->last_retired_head = request->postfix;
+	request->ring->last_retired_head = request->head;
 
 	__i915_gem_request_retire_active(request);
 	__i915_gem_request_release(request);
@@ -447,17 +447,15 @@ void __i915_add_request(struct drm_i915_gem_request *request, bool flush_caches)
 		WARN(ret, "*_ring_flush_all_caches failed: %d!\n", ret);
 	}
 
+	ret = request->engine->add_request(request);
+	/* Not allowed to fail! */
+	WARN(ret, "emit|add_request failed: %d!\n", ret);
+
 	/* Record the position of the start of the request so that
 	 * should we detect the updated seqno part-way through the
 	 * GPU processing the request, we never over-estimate the
 	 * position of the head.
 	 */
-	request->postfix = intel_ring_get_tail(ring);
-
-	ret = request->engine->add_request(request);
-	/* Not allowed to fail! */
-	WARN(ret, "emit|add_request failed: %d!\n", ret);
-
 	request->head = request_start;
 
 	request->emitted_jiffies = jiffies;
diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index 2294234b4bf5..d87136edf117 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -55,19 +55,8 @@ struct drm_i915_gem_request {
 	  */
 	u32 previous_seqno;
 
-	/** Position in the ringbuffer of the start of the request */
-	u32 head;
-
-	/**
-	 * Position in the ringbuffer of the start of the postfix.
-	 * This is required to calculate the maximum available ringbuffer
-	 * space without overwriting the postfix.
-	 */
-	u32 postfix;
-
-	/** Position in the ringbuffer of the end of the whole request */
-	u32 tail;
-	u32 wa_tail;
+	/** Position in the ringbuffer of the request */
+	u32 head, tail, wa_tail;
 
 	/**
 	 * Context and ring buffer related to this request
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 7c7e0e76260c..a2935d7e9278 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1067,7 +1067,7 @@ static void i915_gem_record_rings(struct drm_device *dev,
 			erq = &error->ring[i].requests[count++];
 			erq->seqno = request->fence.seqno;
 			erq->jiffies = request->emitted_jiffies;
-			erq->tail = request->postfix;
+			erq->tail = request->tail;
 		}
 	}
 }
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index dbc76cd54c3e..be2207f551e3 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2213,8 +2213,8 @@ static int wait_for_space(struct drm_i915_gem_request *req, int bytes)
 			continue;
 
 		/* Would completion of this request free enough space? */
-		space = __intel_ring_space(target->postfix, ring->tail,
-					   ring->size);
+		space = __intel_ring_space(target->head,
+					   ring->tail, ring->size);
 		if (space >= bytes)
 			break;
 	}
-- 
2.7.0.rc3


* [PATCH 153/190] drm/i915: Record the position of the start of the request
  2016-01-11 11:00 ` [PATCH 142/190] drm/i915: Fallback to using unmappable memory for scanout Chris Wilson
                     ` (9 preceding siblings ...)
  2016-01-11 11:00   ` [PATCH 152/190] drm/i915: Replace request->postfix with ->head for space searching Chris Wilson
@ 2016-01-11 11:00   ` Chris Wilson
  2016-01-11 11:00   ` [PATCH 154/190] drm/i915: Move per-request pid from request to ctx Chris Wilson
                     ` (36 subsequent siblings)
  47 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 11:00 UTC (permalink / raw)
  To: intel-gfx

Not only does it make for good documentation and a debugging aid, but it
is also vital for when we want to unwind requests - such as when
throwing away an incomplete request.
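
As a hypothetical sketch, such an unwind reduces to rewinding the ring
to the recorded start:

	/* discard everything emitted on behalf of the failed request */
	ring->tail = request->head;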

v2: Rebase

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h         |  1 +
 drivers/gpu/drm/i915/i915_gem_request.c | 16 +++++++---------
 drivers/gpu/drm/i915/i915_gpu_error.c   |  4 +++-
 3 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index c460dc0c14e1..84693d4c4e52 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -566,6 +566,7 @@ struct drm_i915_error_state {
 		struct drm_i915_error_request {
 			long jiffies;
 			u32 seqno;
+			u32 head;
 			u32 tail;
 		} *requests;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 9e8e594ce2bd..74be71e7d113 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -244,6 +244,13 @@ i915_gem_request_alloc(struct intel_engine_cs *engine,
 		goto err;
 	}
 
+	/* Record the position of the start of the request so that
+	 * should we detect the updated seqno part-way through the
+	 * GPU processing the request, we never over-estimate the
+	 * position of the head.
+	 */
+	req->head = intel_ring_get_tail(req->ring);
+
 	/*
 	 * Reserve space in the ring buffer for all the commands required to
 	 * eventually emit this request. This is to guarantee that the
@@ -421,7 +428,6 @@ static void i915_gem_mark_busy(struct drm_i915_private *dev_priv)
 void __i915_add_request(struct drm_i915_gem_request *request, bool flush_caches)
 {
 	struct intel_ring *ring = request->ring;
-	u32 request_start;
 	int ret;
 
 	/*
@@ -431,7 +437,6 @@ void __i915_add_request(struct drm_i915_gem_request *request, bool flush_caches)
 	 */
 	intel_ring_reserved_space_use(ring);
 
-	request_start = intel_ring_get_tail(ring);
 	/*
 	 * Emit any outstanding flushes - execbuf can fail to emit the flush
 	 * after having emitted the batchbuffer command. Hence we need to fix
@@ -451,13 +456,6 @@ void __i915_add_request(struct drm_i915_gem_request *request, bool flush_caches)
 	/* Not allowed to fail! */
 	WARN(ret, "emit|add_request failed: %d!\n", ret);
 
-	/* Record the position of the start of the request so that
-	 * should we detect the updated seqno part-way through the
-	 * GPU processing the request, we never over-estimate the
-	 * position of the head.
-	 */
-	request->head = request_start;
-
 	request->emitted_jiffies = jiffies;
 	request->previous_seqno = request->engine->last_submitted_seqno;
 	request->engine->last_submitted_seqno = request->fence.seqno;
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index a2935d7e9278..494dee1f724d 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -457,9 +457,10 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
 				   dev_priv->ring[i].name,
 				   error->ring[i].num_requests);
 			for (j = 0; j < error->ring[i].num_requests; j++) {
-				err_printf(m, "  seqno 0x%08x, emitted %ld, tail 0x%08x\n",
+				err_printf(m, "  seqno 0x%08x, emitted %ld, head 0x%08x tail 0x%08x\n",
 					   error->ring[i].requests[j].seqno,
 					   error->ring[i].requests[j].jiffies,
+					   error->ring[i].requests[j].head,
 					   error->ring[i].requests[j].tail);
 			}
 		}
@@ -1067,6 +1068,7 @@ static void i915_gem_record_rings(struct drm_device *dev,
 			erq = &error->ring[i].requests[count++];
 			erq->seqno = request->fence.seqno;
 			erq->jiffies = request->emitted_jiffies;
+			erq->head = request->head;
 			erq->tail = request->tail;
 		}
 	}
-- 
2.7.0.rc3


* [PATCH 154/190] drm/i915: Move per-request pid from request to ctx
  2016-01-11 11:00 ` [PATCH 142/190] drm/i915: Fallback to using unmappable memory for scanout Chris Wilson
                     ` (10 preceding siblings ...)
  2016-01-11 11:00   ` [PATCH 153/190] drm/i915: Record the position of the start of the request Chris Wilson
@ 2016-01-11 11:00   ` Chris Wilson
  2016-01-11 11:00   ` [PATCH 155/190] drm/i915: Merge legacy+execlists context structs Chris Wilson
                     ` (35 subsequent siblings)
  47 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 11:00 UTC (permalink / raw)
  To: intel-gfx

Since contexts are not currently shared between userspace processes, we
have an exact correspondence between context creator and guilty batch
submitter. Therefore we can save some per-batch work by inspecting the
context->pid upon error instead. Note that we take the context's
creator's pid rather than the file's pid in order to better track fds
passed over sockets.
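
The context's pid follows the usual get/put refcounting idiom; a
minimal sketch of the lifetime relied upon here (the pr_info() is just
for illustration):

	/* at context creation: take a reference on the creator's pid */
	ctx->pid = get_task_pid(current, PIDTYPE_PID);

	/* at lookup: the creator may have exited, so resolve under RCU */
	rcu_read_lock();
	task = pid_task(ctx->pid, PIDTYPE_PID);
	if (task)
		pr_info("%s [%d]\n", task->comm, task->pid);
	rcu_read_unlock();

	/* at context destruction: drop the reference */
	put_pid(ctx->pid);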

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c     | 21 ++++++++++++---------
 drivers/gpu/drm/i915/i915_drv.h         |  2 ++
 drivers/gpu/drm/i915/i915_gem_context.c |  5 +++++
 drivers/gpu/drm/i915/i915_gem_request.c |  5 -----
 drivers/gpu/drm/i915/i915_gem_request.h |  3 ---
 drivers/gpu/drm/i915/i915_gpu_error.c   | 13 ++++++++++---
 6 files changed, 29 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index f15ed7793969..4cd05b730b4c 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -480,6 +480,8 @@ static int i915_gem_object_info(struct seq_file *m, void* data)
 	print_batch_pool_stats(m, dev_priv);
 	list_for_each_entry_reverse(file, &dev->filelist, lhead) {
 		struct file_stats stats;
+		struct drm_i915_file_private *file_priv = file->driver_priv;
+		struct drm_i915_gem_request *request;
 		struct task_struct *task;
 
 		memset(&stats, 0, sizeof(stats));
@@ -493,8 +495,13 @@ static int i915_gem_object_info(struct seq_file *m, void* data)
 		 * still alive (e.g. get_pid(current) => fork() => exit()).
 		 * Therefore, we need to protect this ->comm access using RCU.
 		 */
+		request = list_first_entry_or_null(&file_priv->mm.request_list,
+						   struct drm_i915_gem_request,
+						   client_list);
 		rcu_read_lock();
-		task = pid_task(file->pid, PIDTYPE_PID);
+		task = pid_task(request && request->ctx->pid ?
+				request->ctx->pid : file->pid,
+				PIDTYPE_PID);
 		print_file_stats(m, task ? task->comm : "<unknown>", stats);
 		rcu_read_unlock();
 	}
@@ -681,12 +688,11 @@ static int i915_gem_request_info(struct seq_file *m, void *data)
 
 		seq_printf(m, "%s requests: %d\n", ring->name, count);
 		list_for_each_entry(req, &ring->request_list, link) {
+			struct pid *pid = req->ctx->pid;
 			struct task_struct *task;
 
 			rcu_read_lock();
-			task = NULL;
-			if (req->pid)
-				task = pid_task(req->pid, PIDTYPE_PID);
+			task = pid ? pid_task(pid, PIDTYPE_PID) : NULL;
 			seq_printf(m, "    %x @ %d: %s [%d]\n",
 				   req->fence.seqno,
 				   (int) (jiffies - req->emitted_jiffies),
@@ -1953,13 +1959,10 @@ static int i915_context_status(struct seq_file *m, void *unused)
 			continue;
 
 		seq_puts(m, "HW context ");
-		if (IS_ERR(ctx->file_priv)) {
-			seq_puts(m, "(deleted) ");
-		} else if (ctx->file_priv) {
-			struct pid *pid = ctx->file_priv->file->pid;
+		if (ctx->pid) {
 			struct task_struct *task;
 
-			task = get_pid_task(pid, PIDTYPE_PID);
+			task = get_pid_task(ctx->pid, PIDTYPE_PID);
 			if (task) {
 				seq_printf(m, "(%s [%d]) ",
 					   task->comm, task->pid);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 84693d4c4e52..dcff2f2066d0 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -565,6 +565,7 @@ struct drm_i915_error_state {
 
 		struct drm_i915_error_request {
 			long jiffies;
+			pid_t pid;
 			u32 seqno;
 			u32 head;
 			u32 tail;
@@ -878,6 +879,7 @@ struct intel_context {
 	struct drm_i915_file_private *file_priv;
 	struct i915_ctx_hang_stats hang_stats;
 	struct i915_hw_ppgtt *ppgtt;
+	struct pid *pid;
 
 	unsigned flags;
 #define CONTEXT_NO_ZEROMAP		(1<<0)
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 0a5f1d5fa788..b57112db1c3f 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -147,6 +147,8 @@ void i915_gem_context_free(struct kref *ctx_ref)
 
 	if (ctx->legacy_hw_ctx.rcs_state)
 		drm_gem_object_unreference(&ctx->legacy_hw_ctx.rcs_state->base);
+
+	put_pid(ctx->pid);
 	list_del(&ctx->link);
 	kfree(ctx);
 }
@@ -256,6 +258,9 @@ __create_hw_context(struct drm_device *dev,
 		ret = DEFAULT_CONTEXT_HANDLE;
 
 	ctx->file_priv = file_priv;
+	if (file_priv)
+		ctx->pid = get_task_pid(current, PIDTYPE_PID);
+
 	ctx->user_handle = ret;
 	/* NB: Mark all slices as needing a remap so that when the context first
 	 * loads it will restore whatever remap state already exists. If there
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 74be71e7d113..d922b78614bd 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -298,8 +298,6 @@ int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
 	list_add_tail(&req->client_list, &file_priv->mm.request_list);
 	spin_unlock(&file_priv->mm.lock);
 
-	req->pid = get_pid(task_pid(current));
-
 	return 0;
 }
 
@@ -315,9 +313,6 @@ i915_gem_request_remove_from_client(struct drm_i915_gem_request *request)
 	list_del(&request->client_list);
 	request->file_priv = NULL;
 	spin_unlock(&file_priv->mm.lock);
-
-	put_pid(request->pid);
-	request->pid = NULL;
 }
 
 static void __i915_gem_request_release(struct drm_i915_gem_request *request)
diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index d87136edf117..67bc1f919af0 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -86,9 +86,6 @@ struct drm_i915_gem_request {
 	/** file_priv list entry for this request */
 	struct list_head client_list;
 
-	/** process identifier submitting this request */
-	struct pid *pid;
-
 	/** Execlist link in the submission queue.*/
 	struct list_head execlist_link; /* guarded by engine->execlist_lock */
 };
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 494dee1f724d..f3c428d5627b 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -457,7 +457,8 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
 				   dev_priv->ring[i].name,
 				   error->ring[i].num_requests);
 			for (j = 0; j < error->ring[i].num_requests; j++) {
-				err_printf(m, "  seqno 0x%08x, emitted %ld, head 0x%08x tail 0x%08x\n",
+				err_printf(m, "  pid %d, seqno 0x%08x, emitted %ld, head 0x%08x tail 0x%08x\n",
+					   error->ring[i].requests[j].pid,
 					   error->ring[i].requests[j].seqno,
 					   error->ring[i].requests[j].jiffies,
 					   error->ring[i].requests[j].head,
@@ -983,6 +984,7 @@ static void i915_gem_record_rings(struct drm_device *dev,
 		if (request) {
 			struct i915_address_space *vm;
 			struct intel_ring *ring;
+			struct pid *pid;
 
 			vm = request->ctx->ppgtt ?
 				&request->ctx->ppgtt->base :
@@ -1002,11 +1004,12 @@ static void i915_gem_record_rings(struct drm_device *dev,
 					i915_error_object_create(dev_priv,
 								 engine->scratch.vma);
 
-			if (request->pid) {
+			pid = request->ctx->pid;
+			if (pid) {
 				struct task_struct *task;
 
 				rcu_read_lock();
-				task = pid_task(request->pid, PIDTYPE_PID);
+				task = pid_task(pid, PIDTYPE_PID);
 				if (task) {
 					strcpy(error->ring[i].comm, task->comm);
 					error->ring[i].pid = task->pid;
@@ -1070,6 +1073,10 @@ static void i915_gem_record_rings(struct drm_device *dev,
 			erq->jiffies = request->emitted_jiffies;
 			erq->head = request->head;
 			erq->tail = request->tail;
+
+			rcu_read_lock();
+			erq->pid = request->ctx ? pid_nr(request->ctx->pid) : 0;
+			rcu_read_unlock();
 		}
 	}
 }
-- 
2.7.0.rc3


* [PATCH 155/190] drm/i915: Merge legacy+execlists context structs
  2016-01-11 11:00 ` [PATCH 142/190] drm/i915: Fallback to using unmappable memory for scanout Chris Wilson
                     ` (11 preceding siblings ...)
  2016-01-11 11:00   ` [PATCH 154/190] drm/i915: Move per-request pid from request to ctx Chris Wilson
@ 2016-01-11 11:00   ` Chris Wilson
  2016-01-11 11:00   ` [PATCH 156/190] drm/i915: Store the active context object on all engines upon error Chris Wilson
                     ` (34 subsequent siblings)
  47 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 11:00 UTC (permalink / raw)
  To: intel-gfx

struct intel_context contains two substructs, one for the legacy RCS and
one for every execlists engine. Since legacy RCS is a subset of the
execlists engine support, just combine the two substructs.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c     | 34 +++++---------
 drivers/gpu/drm/i915/i915_drv.h         | 10 +----
 drivers/gpu/drm/i915/i915_gem_context.c | 78 ++++++++++++++++++---------------
 drivers/gpu/drm/i915/intel_lrc.c        | 25 -----------
 drivers/gpu/drm/i915/intel_lrc.h        |  1 -
 5 files changed, 54 insertions(+), 94 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 4cd05b730b4c..558d79b63e6c 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -179,13 +179,6 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 		seq_printf(m, " (frontbuffer: 0x%03x)", obj->frontbuffer_bits);
 }
 
-static void describe_ctx(struct seq_file *m, struct intel_context *ctx)
-{
-	seq_putc(m, ctx->legacy_hw_ctx.initialized ? 'I' : 'i');
-	seq_putc(m, ctx->remap_slice ? 'R' : 'r');
-	seq_putc(m, ' ');
-}
-
 static int i915_gem_object_list_info(struct seq_file *m, void *data)
 {
 	struct drm_info_node *node = m->private;
@@ -1954,10 +1947,6 @@ static int i915_context_status(struct seq_file *m, void *unused)
 		return ret;
 
 	list_for_each_entry(ctx, &dev_priv->context_list, link) {
-		if (!i915.enable_execlists &&
-		    ctx->legacy_hw_ctx.rcs_state == NULL)
-			continue;
-
 		seq_puts(m, "HW context ");
 		if (ctx->pid) {
 			struct task_struct *task;
@@ -1970,21 +1959,18 @@ static int i915_context_status(struct seq_file *m, void *unused)
 			}
 		} else
 			seq_puts(m, "(kernel) ");
-		describe_ctx(m, ctx);
 
-		if (i915.enable_execlists) {
+		seq_putc(m, ctx->remap_slice ? 'R' : 'r');
+		seq_putc(m, '\n');
+
+		for_each_ring(ring, dev_priv, i) {
+			seq_printf(m, "%s: ", ring->name);
+			seq_putc(m, ctx->engine[i].initialised ? 'I' : 'i');
+			if (ctx->engine[i].state)
+				describe_obj(m, ctx->engine[i].state);
+			if (ctx->engine[i].ring)
+				describe_ctx_ring(m, ctx->engine[i].ring);
 			seq_putc(m, '\n');
-			for_each_ring(ring, dev_priv, i) {
-				seq_printf(m, "%s: ", ring->name);
-				if (ctx->engine[i].state)
-					describe_obj(m, ctx->engine[i].state);
-				if (ctx->engine[i].ring)
-					describe_ctx_ring(m,
-							  ctx->engine[i].ring);
-				seq_putc(m, '\n');
-			}
-		} else {
-			describe_obj(m, ctx->legacy_hw_ctx.rcs_state);
 		}
 
 		seq_putc(m, '\n');
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index dcff2f2066d0..917686eed962 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -885,15 +885,7 @@ struct intel_context {
 #define CONTEXT_NO_ZEROMAP		(1<<0)
 #define CONTEXT_NO_ERROR_CAPTURE	(1<<1)
 
-	/* Legacy ring buffer submission */
-	struct {
-		struct drm_i915_gem_object *rcs_state;
-		struct i915_vma *rcs_vma;
-		bool initialized;
-	} legacy_hw_ctx;
-
-	/* Execlists */
-	struct {
+	struct intel_context_engine {
 		struct drm_i915_gem_object *state;
 		struct i915_vma *vma;
 		struct intel_ring *ring;
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index b57112db1c3f..f261d9e0929d 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -136,17 +136,25 @@ static int get_context_size(struct drm_device *dev)
 void i915_gem_context_free(struct kref *ctx_ref)
 {
 	struct intel_context *ctx = container_of(ctx_ref, typeof(*ctx), ref);
+	int i;
 
 	trace_i915_context_free(ctx);
 	GEM_BUG_ON(!ctx->closed);
 
-	if (i915.enable_execlists)
-		intel_lr_context_free(ctx);
+	for (i = 0; i < I915_NUM_RINGS; i++) {
+		struct intel_context_engine *ce = &ctx->engine[i];
 
-	i915_ppgtt_put(ctx->ppgtt);
+		if (ce->state == NULL)
+			continue;
+
+		WARN_ON(ce->pin_count);
+		if (ce->ring)
+			intel_ring_free(ce->ring);
 
-	if (ctx->legacy_hw_ctx.rcs_state)
-		drm_gem_object_unreference(&ctx->legacy_hw_ctx.rcs_state->base);
+		drm_gem_object_unreference(&ce->state->base);
+	}
+
+	i915_ppgtt_put(ctx->ppgtt);
 
 	put_pid(ctx->pid);
 	list_del(&ctx->link);
@@ -245,7 +253,7 @@ __create_hw_context(struct drm_device *dev,
 			ret = PTR_ERR(obj);
 			goto err_out;
 		}
-		ctx->legacy_hw_ctx.rcs_state = obj;
+		ctx->engine[RCS].state = obj;
 	}
 
 	/* Default context will never have a file_priv */
@@ -331,20 +339,21 @@ void i915_gem_context_reset(struct drm_device *dev)
 		struct intel_engine_cs *ring = &dev_priv->ring[i];
 		struct intel_context *lctx = ring->last_context;
 
-		if (lctx) {
-			if (lctx->legacy_hw_ctx.rcs_vma) {
-				i915_vma_unpin(lctx->legacy_hw_ctx.rcs_vma);
-				lctx->legacy_hw_ctx.rcs_vma = NULL;
-			}
+		/* Force the GPU state to be reinitialised on enabling */
+		if (dev_priv->kernel_context)
+			dev_priv->kernel_context->engine[i].initialised = false;
+
+		if (lctx == NULL)
+			continue;
 
-			i915_gem_context_unreference(lctx);
-			ring->last_context = NULL;
+		if (lctx->engine[i].vma) {
+			i915_vma_unpin(lctx->engine[i].vma);
+			lctx->engine[i].vma = NULL;
 		}
-	}
 
-	/* Force the GPU state to be reinitialised on enabling */
-	if (dev_priv->kernel_context)
-		dev_priv->kernel_context->legacy_hw_ctx.initialized = false;
+		i915_gem_context_unreference(lctx);
+		ring->last_context = NULL;
+	}
 }
 
 int i915_gem_context_init(struct drm_device *dev)
@@ -384,7 +393,7 @@ int i915_gem_context_init(struct drm_device *dev)
 		return PTR_ERR(ctx);
 	}
 
-	if (ctx->legacy_hw_ctx.rcs_state) {
+	if (ctx->engine[RCS].state) {
 		u32 alignment = get_context_alignment(dev);
 		struct i915_vma *vma;
 
@@ -395,7 +404,7 @@ int i915_gem_context_init(struct drm_device *dev)
 		 * be available. To avoid this we always pin the default
 		 * context.
 		 */
-		vma = i915_gem_object_ggtt_pin(ctx->legacy_hw_ctx.rcs_state,
+		vma = i915_gem_object_ggtt_pin(ctx->engine[RCS].state,
 					       NULL, 0, alignment, PIN_HIGH);
 		if (IS_ERR(vma)) {
 			DRM_ERROR("Failed to pinned default global context (error %d)\n",
@@ -419,7 +428,7 @@ void i915_gem_context_fini(struct drm_device *dev)
 	struct intel_context *dctx = dev_priv->kernel_context;
 	int i;
 
-	if (dctx->legacy_hw_ctx.rcs_state) {
+	if (!i915.enable_execlists) {
 		/* The only known way to stop the gpu from accessing the hw context is
 		 * to reset it. Do this as the very last operation to avoid confusing
 		 * other code, leading to spurious errors. */
@@ -434,13 +443,13 @@ void i915_gem_context_fini(struct drm_device *dev)
 		WARN_ON(!dev_priv->ring[RCS].last_context);
 		if (dev_priv->ring[RCS].last_context == dctx) {
 			/* Fake switch to NULL context */
-			WARN_ON(dctx->legacy_hw_ctx.rcs_vma->active);
-			i915_vma_unpin(dctx->legacy_hw_ctx.rcs_vma);
+			WARN_ON(dctx->engine[RCS].vma->active);
+			i915_vma_unpin(dctx->engine[RCS].vma);
 			i915_gem_context_unreference(dctx);
 			dev_priv->ring[RCS].last_context = NULL;
 		}
 
-		i915_vma_unpin(dctx->legacy_hw_ctx.rcs_vma);
+		i915_vma_unpin(dctx->engine[RCS].vma);
 	}
 
 	for (i = 0; i < I915_NUM_RINGS; i++) {
@@ -560,8 +569,7 @@ mi_set_context(struct drm_i915_gem_request *req, u32 hw_flags)
 
 	intel_ring_emit(ring, MI_NOOP);
 	intel_ring_emit(ring, MI_SET_CONTEXT);
-	intel_ring_emit(ring,
-			req->ctx->legacy_hw_ctx.rcs_vma->node.start | flags);
+	intel_ring_emit(ring, req->ctx->engine[RCS].vma->node.start | flags);
 	/*
 	 * w/a: MI_SET_CONTEXT must always be followed by MI_NOOP
 	 * WaMiSetContext_Hang:snb,ivb,vlv
@@ -685,12 +693,12 @@ static int do_switch(struct drm_i915_gem_request *req)
 		u32 alignment = get_context_alignment(engine->dev);
 		struct i915_vma *vma;
 
-		vma = i915_gem_object_ggtt_pin(to->legacy_hw_ctx.rcs_state,
+		vma = i915_gem_object_ggtt_pin(to->engine[RCS].state,
 					       NULL, 0, alignment, PIN_HIGH);
 		if (IS_ERR(vma))
 			return PTR_ERR(vma);
 
-		to->legacy_hw_ctx.rcs_vma = vma;
+		to->engine[RCS].vma = vma;
 
 		if (WARN_ON(!(vma->bound & GLOBAL_BIND))) {
 			ret = -ENODEV;
@@ -737,11 +745,11 @@ static int do_switch(struct drm_i915_gem_request *req)
 	 *
 	 * XXX: We need a real interface to do this instead of trickery.
 	 */
-	ret = i915_gem_object_set_to_gtt_domain(to->legacy_hw_ctx.rcs_state, false);
+	ret = i915_gem_object_set_to_gtt_domain(to->engine[RCS].state, false);
 	if (ret)
 		goto unpin_out;
 
-	if (!to->legacy_hw_ctx.initialized || i915_gem_context_is_default(to)) {
+	if (!to->engine[RCS].initialised || i915_gem_context_is_default(to)) {
 		hw_flags |= MI_RESTORE_INHIBIT;
 		/* NB: If we inhibit the restore, the context is not allowed to
 		 * die because future work may end up depending on valid address
@@ -790,13 +798,13 @@ static int do_switch(struct drm_i915_gem_request *req)
 			to->remap_slice &= ~(1<<i);
 	}
 
-	if (!to->legacy_hw_ctx.initialized) {
+	if (!to->engine[RCS].initialised) {
 		if (engine->init_context) {
 			ret = engine->init_context(req);
 			if (ret)
 				goto unpin_out;
 		}
-		to->legacy_hw_ctx.initialized = true;
+		to->engine[RCS].initialised = true;
 	}
 
 	/* The backing object for the context is done after switching to the
@@ -813,9 +821,9 @@ static int do_switch(struct drm_i915_gem_request *req)
 		 * able to defer doing this until we know the object would be
 		 * swapped, but there is no way to do that yet.
 		 */
-		i915_vma_move_to_active(from->legacy_hw_ctx.rcs_vma, req, 0);
+		i915_vma_move_to_active(from->engine[RCS].vma, req, 0);
 		/* obj is kept alive until the next request by its active ref */
-		i915_vma_unpin(from->legacy_hw_ctx.rcs_vma);
+		i915_vma_unpin(from->engine[RCS].vma);
 
 		i915_gem_context_unreference(from);
 	}
@@ -827,7 +835,7 @@ done:
 
 unpin_out:
 	if (engine->id == RCS)
-		i915_vma_unpin(to->legacy_hw_ctx.rcs_vma);
+		i915_vma_unpin(to->engine[RCS].vma);
 	return ret;
 }
 
@@ -851,7 +859,7 @@ int i915_switch_context(struct drm_i915_gem_request *req)
 
 	WARN_ON(!mutex_is_locked(&req->i915->dev->struct_mutex));
 
-	if (req->ctx->legacy_hw_ctx.rcs_state == NULL) { /* We have the fake context */
+	if (req->ctx->engine[RCS].state == NULL) { /* We have the fake context */
 		struct intel_engine_cs *engine = req->engine;
 
 		if (req->ctx != engine->last_context) {
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 68d06ab6acdc..c2a45f48da66 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1858,30 +1858,6 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
 }
 
 /**
- * intel_lr_context_free() - free the LRC specific bits of a context
- * @ctx: the LR context to free.
- *
- * The real context freeing is done in i915_gem_context_free: this only
- * takes care of the bits that are LRC related: the per-engine backing
- * objects and the logical ringbuffer.
- */
-void intel_lr_context_free(struct intel_context *ctx)
-{
-	int i;
-
-	for (i = 0; i < I915_NUM_RINGS; i++) {
-		struct drm_i915_gem_object *ctx_obj = ctx->engine[i].state;
-
-		if (ctx_obj) {
-			WARN_ON(ctx->engine[i].pin_count);
-
-			intel_ring_free(ctx->engine[i].ring);
-			drm_gem_object_unreference(&ctx_obj->base);
-		}
-	}
-}
-
-/**
  * intel_lr_context_size() - return the size of the context for an engine
  * @ring: which engine to find the context size for
  *
@@ -1958,7 +1934,6 @@ static int execlists_context_deferred_alloc(struct intel_context *ctx,
 	struct intel_ring *ring;
 	int ret;
 
-	WARN_ON(ctx->legacy_hw_ctx.rcs_state != NULL);
 	WARN_ON(ctx->engine[engine->id].state);
 
 	context_size = round_up(intel_lr_context_size(engine), 4096);
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index a454372fe660..28ac50494b1e 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -67,7 +67,6 @@ int intel_logical_rings_init(struct drm_device *dev);
 #define LRC_PPHWSP_PN	(LRC_GUCSHR_PN + 1)
 #define LRC_STATE_PN	(LRC_PPHWSP_PN + 1)
 
-void intel_lr_context_free(struct intel_context *ctx);
 uint32_t intel_lr_context_size(struct intel_engine_cs *ring);
 void intel_lr_context_unpin(struct intel_context *ctx,
 			    struct intel_engine_cs *engine);
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 156/190] drm/i915: Store the active context object on all engines upon error
  2016-01-11 11:00 ` [PATCH 142/190] drm/i915: Fallback to using unmappable memory for scanout Chris Wilson
                     ` (12 preceding siblings ...)
  2016-01-11 11:00   ` [PATCH 155/190] drm/i915: Merge legacy+execlists context structs Chris Wilson
@ 2016-01-11 11:00   ` Chris Wilson
  2016-01-11 11:00   ` [PATCH 157/190] drm/i915: Tidy execlists by using intel_context_engine locals Chris Wilson
                     ` (33 subsequent siblings)
  47 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 11:00 UTC (permalink / raw)
  To: intel-gfx

With execlists, we have context objects everywhere, not just RCS. So
store them for post-mortem debugging.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gpu_error.c | 26 ++++----------------------
 1 file changed, 4 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index f3c428d5627b..e9ef6b25c696 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -941,26 +941,6 @@ static void i915_record_ring_state(struct drm_device *dev,
 	}
 }
 
-
-static void i915_gem_record_active_context(struct intel_engine_cs *ring,
-					   struct drm_i915_error_state *error,
-					   struct drm_i915_error_ring *ering)
-{
-	struct drm_i915_private *dev_priv = ring->i915;
-	struct i915_vma *vma;
-
-	/* Currently render ring is the only HW context user */
-	if (ring->id != RCS || !error->ccid)
-		return;
-
-	list_for_each_entry(vma, &dev_priv->gtt.base.active_list, vm_link) {
-		if ((error->ccid & PAGE_MASK) == vma->node.start) {
-			ering->ctx = i915_error_object_create(dev_priv, vma);
-			break;
-		}
-	}
-}
-
 static void i915_gem_record_rings(struct drm_device *dev,
 				  struct drm_i915_error_state *error)
 {
@@ -1004,6 +984,10 @@ static void i915_gem_record_rings(struct drm_device *dev,
 					i915_error_object_create(dev_priv,
 								 engine->scratch.vma);
 
+			error->ring[i].ctx =
+				i915_error_object_create(dev_priv,
+							 request->ctx->engine[i].vma);
+
 			pid = request->ctx->pid;
 			if (pid) {
 				struct task_struct *task;
@@ -1030,8 +1014,6 @@ static void i915_gem_record_rings(struct drm_device *dev,
 			i915_error_object_create(dev_priv,
 						 engine->status_page.vma);
 
-		i915_gem_record_active_context(engine, error, &error->ring[i]);
-
 		count = 0;
 		list_for_each_entry(request, &engine->request_list, link)
 			count++;
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 157/190] drm/i915: Tidy execlists by using intel_context_engine locals
  2016-01-11 11:00 ` [PATCH 142/190] drm/i915: Fallback to using unmappable memory for scanout Chris Wilson
                     ` (13 preceding siblings ...)
  2016-01-11 11:00   ` [PATCH 156/190] drm/i915: Store the active context object on all engines upon error Chris Wilson
@ 2016-01-11 11:00   ` Chris Wilson
  2016-01-11 11:00   ` [PATCH 158/190] drm/i915: Skip holding an object reference for execbuf preparation Chris Wilson
                     ` (32 subsequent siblings)
  47 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 11:00 UTC (permalink / raw)
  To: intel-gfx

No functional changes, just less typing.
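
The transformation is purely mechanical: hoist the repeated per-engine
lookup into a local and use that throughout. An illustrative fragment,
taken from the pinning hunk below:

	struct intel_context_engine *ce = &ctx->engine[engine->id];

	if (ce->pin_count++)
		return 0;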

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_lrc.c | 63 ++++++++++++++++++++--------------------
 1 file changed, 32 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index c2a45f48da66..62f19ed51fb2 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -485,6 +485,7 @@ static int execlists_context_queue(struct drm_i915_gem_request *request)
 int intel_logical_ring_alloc_request_extras(struct drm_i915_gem_request *request)
 {
 	struct intel_engine_cs *engine = request->engine;
+	struct intel_context_engine *ce = &request->ctx->engine[engine->id];
 	int ret;
 
 	if (i915.enable_guc_submission) {
@@ -498,25 +499,25 @@ int intel_logical_ring_alloc_request_extras(struct drm_i915_gem_request *request
 		ret = i915_guc_wq_check_space(guc->execbuf_client);
 	}
 
-	if (request->ctx->engine[engine->id].state == NULL) {
+	if (ce->state == NULL) {
 		ret = execlists_context_deferred_alloc(request->ctx, engine);
 		if (ret)
 			return ret;
 	}
 
-	request->ring = request->ctx->engine[engine->id].ring;
+	request->ring = ce->ring;
 
 	ret = intel_lr_context_pin(request->ctx, engine);
 	if (ret)
 		return ret;
 
-	if (!request->ctx->engine[engine->id].initialised) {
+	if (!ce->initialised) {
 		ret = engine->init_context(request);
 		if (ret) {
 			intel_lr_context_unpin(request->ctx, engine);
 			return ret;
 		}
-		request->ctx->engine[engine->id].initialised = true;
+		ce->initialised = true;
 	}
 
 	return 0;
@@ -569,18 +570,18 @@ void intel_logical_ring_stop(struct intel_engine_cs *ring)
 static int intel_lr_context_pin(struct intel_context *ctx,
 				struct intel_engine_cs *engine)
 {
-	struct drm_i915_private *dev_priv = engine->i915;
+	struct intel_context_engine *ce = &ctx->engine[engine->id];
 	struct i915_vma *vma;
 	struct intel_ring *ring;
 	u32 ggtt_offset;
 	int ret;
 
-	if (ctx->engine[engine->id].pin_count++)
+	if (ce->pin_count++)
 		return 0;
 
 	lockdep_assert_held(&engine->dev->struct_mutex);
 
-	vma = i915_gem_object_ggtt_pin(ctx->engine[engine->id].state, NULL,
+	vma = i915_gem_object_ggtt_pin(ce->state, NULL,
 				       0, GEN8_LR_CONTEXT_ALIGN,
 				       PIN_OFFSET_BIAS | GUC_WOPCM_TOP |
 				       PIN_HIGH);
@@ -589,13 +590,13 @@ static int intel_lr_context_pin(struct intel_context *ctx,
 		goto err;
 	}
 
-	ring = ctx->engine[engine->id].ring;
+	ring = ce->ring;
 	ret = intel_ring_map(ring);
 	if (ret)
 		goto unpin;
 
 	i915_gem_context_reference(ctx);
-	ctx->engine[engine->id].vma = vma;
+	ce->vma = vma;
 	vma->obj->dirty = true;
 
 	ggtt_offset = vma->node.start + LRC_PPHWSP_PN * PAGE_SIZE;
@@ -607,30 +608,33 @@ static int intel_lr_context_pin(struct intel_context *ctx,
 	ring->registers[CTX_RING_BUFFER_START+1] = ring->vma->node.start;
 
 	/* Invalidate GuC TLB. */
-	if (i915.enable_guc_submission)
+	if (i915.enable_guc_submission) {
+		struct drm_i915_private *dev_priv = engine->i915;
 		I915_WRITE(GEN8_GTCR, GEN8_GTCR_INVALIDATE);
+	}
 
 	return 0;
 
 unpin:
 	__i915_vma_unpin(vma);
 err:
-	ctx->engine[engine->id].pin_count = 0;
+	ce->pin_count = 0;
 	return ret;
 }
 
 void intel_lr_context_unpin(struct intel_context *ctx,
 			    struct intel_engine_cs *engine)
 {
+	struct intel_context_engine *ce = &ctx->engine[engine->id];
 	struct i915_vma *vma;
 
 	lockdep_assert_held(&engine->dev->struct_mutex);
-	if (--ctx->engine[engine->id].pin_count)
+	if (--ce->pin_count)
 		return;
 
-	intel_ring_unmap(ctx->engine[engine->id].ring);
+	intel_ring_unmap(ce->ring);
 
-	vma = ctx->engine[engine->id].vma;
+	vma = ce->vma;
 	kunmap(i915_gem_object_get_page(vma->obj, LRC_STATE_PN));
 	i915_vma_unpin(vma);
 
@@ -1929,12 +1933,13 @@ static void lrc_setup_hardware_status_page(struct intel_engine_cs *ring,
 static int execlists_context_deferred_alloc(struct intel_context *ctx,
 					    struct intel_engine_cs *engine)
 {
+	struct intel_context_engine *ce = &ctx->engine[engine->id];
 	struct drm_i915_gem_object *ctx_obj;
 	uint32_t context_size;
 	struct intel_ring *ring;
 	int ret;
 
-	WARN_ON(ctx->engine[engine->id].state);
+	WARN_ON(ce->state);
 
 	context_size = round_up(intel_lr_context_size(engine), 4096);
 
@@ -1959,9 +1964,9 @@ static int execlists_context_deferred_alloc(struct intel_context *ctx,
 		goto error_ringbuf;
 	}
 
-	ctx->engine[engine->id].ring = ring;
-	ctx->engine[engine->id].state = ctx_obj;
-	ctx->engine[engine->id].initialised = engine->init_context == NULL;
+	ce->ring = ring;
+	ce->state = ctx_obj;
+	ce->initialised = engine->init_context == NULL;
 
 	return 0;
 
@@ -1969,40 +1974,36 @@ error_ringbuf:
 	intel_ring_free(ring);
 error_deref_obj:
 	drm_gem_object_unreference(&ctx_obj->base);
-	ctx->engine[engine->id].ring = NULL;
-	ctx->engine[engine->id].state = NULL;
+	ce->ring = NULL;
+	ce->state = NULL;
 	return ret;
 }
 
 void intel_lr_context_reset(struct drm_device *dev,
-			struct intel_context *ctx)
+			    struct intel_context *ctx)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_engine_cs *unused;
 	int i;
 
 	for_each_ring(unused, dev_priv, i) {
-		struct drm_i915_gem_object *ctx_obj = ctx->engine[i].state;
-		struct intel_ring *ring = ctx->engine[i].ring;
+		struct intel_context_engine *ce = &ctx->engine[i];
 		uint32_t *reg_state;
 		struct page *page;
 
-		if (!ctx_obj)
+		if (ce->state == NULL)
 			continue;
 
-		if (i915_gem_object_get_pages(ctx_obj)) {
+		if (i915_gem_object_get_pages(ce->state)) {
 			WARN(1, "Failed get_pages for context obj\n");
 			continue;
 		}
-		page = i915_gem_object_get_dirty_page(ctx_obj, LRC_STATE_PN);
+		page = i915_gem_object_get_dirty_page(ce->state, LRC_STATE_PN);
 		reg_state = kmap_atomic(page);
 
-		reg_state[CTX_RING_HEAD+1] = 0;
-		reg_state[CTX_RING_TAIL+1] = 0;
+		reg_state[CTX_RING_HEAD+1] = ce->ring->head;
+		reg_state[CTX_RING_TAIL+1] = ce->ring->tail;
 
 		kunmap_atomic(reg_state);
-
-		ring->head = 0;
-		ring->tail = 0;
 	}
 }
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 158/190] drm/i915: Skip holding an object reference for execbuf preparation
  2016-01-11 11:00 ` [PATCH 142/190] drm/i915: Fallback to using unmappable memory for scanout Chris Wilson
                     ` (14 preceding siblings ...)
  2016-01-11 11:00   ` [PATCH 157/190] drm/i915: Tidy execlists by using intel_context_engine locals Chris Wilson
@ 2016-01-11 11:00   ` Chris Wilson
  2016-01-11 11:01   ` [PATCH 159/190] drm/i915: Defer active reference until required Chris Wilson
                     ` (31 subsequent siblings)
  47 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 11:00 UTC (permalink / raw)
  To: intel-gfx

This is a golden oldie! We can shave a couple of locked instructions for
about 10% of the per-object overhead by not taking an extra kref whilst
reserving objects for an execbuf. Due to lock management this is safe,
as the original object reference cannot be dropped without taking the
lock. Equally, because this relies on the heavy BKL^W struct_mutex, it
is also likely to be only a temporary optimisation until we have
fine-grained locking. (That's what we said 5 years ago, so there's
probably another 10 years before we get around to finer-grained
locking!)
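
A minimal sketch of the locking argument as I read it (the patch does
not spell this out): gem_close must itself take struct_mutex before it
can drop the handle's reference, so while execbuf holds the mutex the
borrowed reference cannot disappear:

	mutex_lock(&dev->struct_mutex);
	/* the handle lookup takes no extra kref; the handle's own
	 * reference keeps obj alive until we unlock */
	list_add_tail(&obj->obj_exec_link, &objects);
	...
	mutex_unlock(&dev->struct_mutex); /* objects unreserved by now */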

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 6ccce848f3e2..b7424f1b1293 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -158,7 +158,6 @@ eb_lookup_vmas(struct eb_vmas *eb,
 			goto err;
 		}
 
-		drm_gem_object_reference(&obj->base);
 		list_add_tail(&obj->obj_exec_link, &objects);
 	}
 	spin_unlock(&file->table_lock);
@@ -272,7 +271,6 @@ static void eb_destroy(struct eb_vmas *eb)
 				       exec_list);
 		list_del_init(&vma->exec_list);
 		i915_gem_execbuffer_unreserve_vma(vma);
-		drm_gem_object_unreference(&vma->obj->base);
 	}
 	kfree(eb);
 }
@@ -947,7 +945,6 @@ i915_gem_execbuffer_relocate_slow(struct drm_device *dev,
 		vma = list_first_entry(&eb->vmas, struct i915_vma, exec_list);
 		list_del_init(&vma->exec_list);
 		i915_gem_execbuffer_unreserve_vma(vma);
-		drm_gem_object_unreference(&vma->obj->base);
 	}
 
 	mutex_unlock(&dev->struct_mutex);
@@ -1325,7 +1322,6 @@ i915_gem_execbuffer_parse(struct intel_engine_cs *ring,
 
 	vma->exec_entry = shadow_exec_entry;
 	vma->exec_entry->flags = __EXEC_OBJECT_HAS_PIN;
-	drm_gem_object_reference(&shadow_batch_obj->base);
 	list_add_tail(&vma->exec_list, &eb->vmas);
 
 err:
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 159/190] drm/i915: Defer active reference until required
  2016-01-11 11:00 ` [PATCH 142/190] drm/i915: Fallback to using unmappable memory for scanout Chris Wilson
                     ` (15 preceding siblings ...)
  2016-01-11 11:00   ` [PATCH 158/190] drm/i915: Skip holding an object reference for execbuf preparation Chris Wilson
@ 2016-01-11 11:01   ` Chris Wilson
  2016-01-11 11:01   ` [PATCH 160/190] drm: Track drm_mm nodes with an interval tree Chris Wilson
                     ` (30 subsequent siblings)
  47 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 11:01 UTC (permalink / raw)
  To: intel-gfx

We only need the active reference to keep the object alive after the
handle has been deleted (so as to prevent a synchronous gem_close). Why
pay the price of a kref on every execbuf when we can insert that
final active ref just in time for the handle deletion?
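
Roughly, the scheme (using the helpers added below): at gem_close time,
if the object is still active and no deferred reference has been taken
yet, take one and flag it; final retirement then drops that reference:

	/* on handle close */
	if (i915_gem_object_flush_active(obj) &&
	    !i915_gem_object_has_active_reference(obj)) {
		i915_gem_object_set_active_reference(obj);
		drm_gem_object_reference(&obj->base);
	}

	/* on final retirement */
	if (i915_gem_object_has_active_reference(obj)) {
		i915_gem_object_unset_active_reference(obj);
		drm_gem_object_unreference(&obj->base);
	}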

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h              | 26 +++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_gem.c              | 31 +++++++++++++++++++++++-----
 drivers/gpu/drm/i915/i915_gem_batch_pool.c   |  2 +-
 drivers/gpu/drm/i915/i915_gem_context.c      |  2 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c   |  5 +----
 drivers/gpu/drm/i915/i915_gem_render_state.c |  2 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c      |  2 +-
 7 files changed, 57 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 917686eed962..addd33bbc847 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2048,6 +2048,13 @@ struct drm_i915_gem_object {
 #define __I915_BO_ACTIVE(bo) (READ_ONCE((bo)->flags) & (I915_BO_ACTIVE_MASK << I915_BO_ACTIVE_SHIFT))
 
 	/**
+	 * Have we taken a reference for the object for incomplete GPU
+	 * activity?
+	 */
+#define I915_BO_ACTIVE_REF_SHIFT (I915_BO_ACTIVE_SHIFT + I915_NUM_RINGS)
+#define I915_BO_ACTIVE_REF_BIT (1 << I915_BO_ACTIVE_REF_SHIFT)
+
+	/**
 	 * This is set if the object has been written to since last bound
 	 * to the GTT
 	 */
@@ -2163,6 +2170,25 @@ i915_gem_object_has_active_engine(const struct drm_i915_gem_object *obj,
 	return obj->flags & (1 << (engine + I915_BO_ACTIVE_SHIFT));
 }
 
+static inline bool
+i915_gem_object_has_active_reference(const struct drm_i915_gem_object *obj)
+{
+	return obj->flags & I915_BO_ACTIVE_REF_BIT;
+}
+
+static inline void
+i915_gem_object_set_active_reference(struct drm_i915_gem_object *obj)
+{
+	obj->flags |= I915_BO_ACTIVE_REF_BIT;
+}
+
+static inline void
+i915_gem_object_unset_active_reference(struct drm_i915_gem_object *obj)
+{
+	obj->flags &= ~I915_BO_ACTIVE_REF_BIT;
+}
+void __i915_gem_object_release_unless_active(struct drm_i915_gem_object *obj);
+
 void i915_gem_track_fb(struct drm_i915_gem_object *old,
 		       struct drm_i915_gem_object *new,
 		       unsigned frontbuffer_bits);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 1fa4752682d6..962fd81ce26c 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2116,7 +2116,10 @@ i915_gem_object_retire__read(struct i915_gem_active *active,
 		list_move_tail(&obj->global_list,
 			       &request->i915->mm.bound_list);
 
-	drm_gem_object_unreference(&obj->base);
+	if (i915_gem_object_has_active_reference(obj)) {
+		i915_gem_object_unset_active_reference(obj);
+		drm_gem_object_unreference(&obj->base);
+	}
 }
 
 static void i915_gem_mark_idle(struct drm_i915_private *dev_priv)
@@ -2390,13 +2393,12 @@ out:
  * write domains, emitting any outstanding lazy request and retiring and
  * completed requests.
  */
-static void
-i915_gem_object_flush_active(struct drm_i915_gem_object *obj)
+static bool i915_gem_object_flush_active(struct drm_i915_gem_object *obj)
 {
 	int i;
 
 	if (!i915_gem_object_is_active(obj))
-		return;
+		return false;
 
 	for (i = 0; i < I915_NUM_RINGS; i++) {
 		struct drm_i915_gem_request *req;
@@ -2408,6 +2410,8 @@ i915_gem_object_flush_active(struct drm_i915_gem_object *obj)
 		if (i915_gem_request_completed(req))
 			i915_gem_request_retire_upto(req);
 	}
+
+	return i915_gem_object_is_active(obj);
 }
 
 void i915_vma_close(struct i915_vma *vma)
@@ -2431,7 +2435,12 @@ void i915_gem_close_object(struct drm_gem_object *gem,
 	list_for_each_entry_safe(vma, vn, &obj->vma_list, obj_link)
 		if (vma->vm->file == fpriv)
 			i915_vma_close(vma);
-	i915_gem_object_flush_active(obj);
+
+	if (i915_gem_object_flush_active(obj) &&
+	    !i915_gem_object_has_active_reference(obj)) {
+		i915_gem_object_set_active_reference(obj);
+		drm_gem_object_reference(&obj->base);
+	}
 	mutex_unlock(&obj->base.dev->struct_mutex);
 }
 
@@ -3847,6 +3856,18 @@ void i915_gem_free_object(struct drm_gem_object *gem_obj)
 	intel_runtime_pm_put(dev_priv);
 }
 
+void __i915_gem_object_release_unless_active(struct drm_i915_gem_object *obj)
+{
+	if (obj == NULL)
+		return;
+
+	GEM_BUG_ON(i915_gem_object_has_active_reference(obj));
+	if (i915_gem_object_is_active(obj))
+		i915_gem_object_set_active_reference(obj);
+	else
+		drm_gem_object_unreference(&obj->base);
+}
+
 static void
 i915_gem_stop_ringbuffers(struct drm_device *dev)
 {
diff --git a/drivers/gpu/drm/i915/i915_gem_batch_pool.c b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
index 5ec5b1439e1f..d46012234db1 100644
--- a/drivers/gpu/drm/i915/i915_gem_batch_pool.c
+++ b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
@@ -75,7 +75,7 @@ void i915_gem_batch_pool_fini(struct i915_gem_batch_pool *pool)
 						 batch_pool_link);
 
 			list_del(&obj->batch_pool_link);
-			drm_gem_object_unreference(&obj->base);
+			__i915_gem_object_release_unless_active(obj);
 		}
 	}
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index f261d9e0929d..e619cdadaeb6 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -151,7 +151,7 @@ void i915_gem_context_free(struct kref *ctx_ref)
 		if (ce->ring)
 			intel_ring_free(ce->ring);
 
-		drm_gem_object_unreference(&ce->state->base);
+		__i915_gem_object_release_unless_active(ce->state);
 	}
 
 	i915_ppgtt_put(ctx->ppgtt);
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index b7424f1b1293..00f3529a3560 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1202,15 +1202,12 @@ void i915_vma_move_to_active(struct i915_vma *vma,
 
 	obj->dirty = 1; /* be paranoid  */
 
-	/* Add a reference if we're newly entering the active list.
-	 * The order in which we add operations to the retirement queue is
+	/* The order in which we add operations to the retirement queue is
 	 * vital here: mark_active adds to the start of the callback list,
 	 * such that subsequent callbacks are called first. Therefore we
 	 * add the active reference first and queue for it to be dropped
 	 * *last*.
 	 */
-	if (!i915_gem_object_is_active(obj))
-		drm_gem_object_reference(&obj->base);
 	i915_gem_object_set_active(obj, engine);
 	i915_gem_request_mark_active(req, &obj->last_read[engine]);
 
diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.c b/drivers/gpu/drm/i915/i915_gem_render_state.c
index 89b5c99bbb02..2fac95b0ba44 100644
--- a/drivers/gpu/drm/i915/i915_gem_render_state.c
+++ b/drivers/gpu/drm/i915/i915_gem_render_state.c
@@ -173,7 +173,7 @@ err_out:
 static void render_state_fini(struct render_state *so)
 {
 	i915_vma_unpin(so->vma);
-	drm_gem_object_unreference(&so->obj->base);
+	__i915_gem_object_release_unless_active(so->obj);
 }
 
 static int render_state_prepare(struct intel_engine_cs *ring,
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index be2207f551e3..41c52cdcbe4a 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1961,7 +1961,7 @@ void intel_ring_unmap(struct intel_ring *ring)
 
 static void intel_destroy_ringbuffer_obj(struct intel_ring *ringbuf)
 {
-	drm_gem_object_unreference(&ringbuf->obj->base);
+	__i915_gem_object_release_unless_active(ringbuf->obj);
 	ringbuf->obj = NULL;
 }
 
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 160/190] drm: Track drm_mm nodes with an interval tree
  2016-01-11 11:00 ` [PATCH 142/190] drm/i915: Fallback to using unmappable memory for scanout Chris Wilson
                     ` (16 preceding siblings ...)
  2016-01-11 11:01   ` [PATCH 159/190] drm/i915: Defer active reference until required Chris Wilson
@ 2016-01-11 11:01   ` Chris Wilson
  2016-01-11 11:01   ` [PATCH 161/190] drm: Convert drm_vma_manager to embedded interval-tree in drm_mm Chris Wilson
                     ` (29 subsequent siblings)
  47 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 11:01 UTC (permalink / raw)
  To: intel-gfx; +Cc: dri-devel

In addition to the last-in/first-out stack for accessing drm_mm nodes,
we occasionally want to find a drm_mm_node by address, and in future we
will often want to do so. For that to be efficient we need to track the
nodes in an interval tree: lookups for a particular address are then
O(lg(N)), where N is the number of nodes in the range manager, as
opposed to O(N). Insertion, however, gains an extra O(lg(N)) step for
all nodes
irrespective of whether the interval tree is in use. For future i915
patches, eliminating the linear walk is a significant improvement.

v2: Use generic interval-tree template for u64 and faster insertion.
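
A usage sketch for the new helpers (hypothetical caller; note that the
interval is addressed by its first and last byte, matching the
INTERVAL_TREE_DEFINE convention used below):

	struct drm_mm_node *node;

	for (node = drm_mm_interval_first(mm, start, end - 1);
	     node;
	     node = drm_mm_interval_next(node, start, end - 1)) {
		/* node overlaps [start, end - 1] */
	}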

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: dri-devel@lists.freedesktop.org
---
 drivers/gpu/drm/drm_mm.c | 124 ++++++++++++++++++++++++++++++++++++++---------
 include/drm/drm_mm.h     |  12 +++++
 2 files changed, 113 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/drm_mm.c b/drivers/gpu/drm/drm_mm.c
index 04de6fd88f8c..fff084f266f9 100644
--- a/drivers/gpu/drm/drm_mm.c
+++ b/drivers/gpu/drm/drm_mm.c
@@ -46,6 +46,7 @@
 #include <linux/slab.h>
 #include <linux/seq_file.h>
 #include <linux/export.h>
+#include <linux/interval_tree_generic.h>
 
 /**
  * DOC: Overview
@@ -103,6 +104,63 @@ static struct drm_mm_node *drm_mm_search_free_in_range_generic(const struct drm_
 						u64 end,
 						enum drm_mm_search_flags flags);
 
+#define START(node) ((node)->start)
+#define LAST(node)  ((node)->start + (node)->size - 1)
+
+INTERVAL_TREE_DEFINE(struct drm_mm_node, rb,
+		     u64, __subtree_last,
+		     START, LAST, static inline, drm_mm_interval_tree)
+
+struct drm_mm_node *
+drm_mm_interval_first(struct drm_mm *mm, u64 start, u64 last)
+{
+	return drm_mm_interval_tree_iter_first(&mm->interval_tree,
+					       start, last);
+}
+EXPORT_SYMBOL(drm_mm_interval_first);
+
+struct drm_mm_node *
+drm_mm_interval_next(struct drm_mm_node *node, u64 start, u64 last)
+{
+	return drm_mm_interval_tree_iter_next(node, start, last);
+}
+EXPORT_SYMBOL(drm_mm_interval_next);
+
+static void drm_mm_interval_tree_add_node(struct drm_mm_node *hole_node,
+					  struct drm_mm_node *node)
+{
+	struct drm_mm *mm = hole_node->mm;
+	struct rb_node **link, *rb_parent;
+	struct drm_mm_node *parent;
+
+	node->__subtree_last = LAST(node);
+
+	if (hole_node->allocated) {
+		hole_node->__subtree_last = node->__subtree_last;
+		rb_parent = &hole_node->rb;
+		link = &hole_node->rb.rb_right;
+	} else {
+		rb_parent = NULL;
+		link = &mm->interval_tree.rb_node;
+	}
+
+	while (*link) {
+		rb_parent = *link;
+		parent = rb_entry(rb_parent, struct drm_mm_node, rb);
+		if (parent->__subtree_last < node->__subtree_last)
+			parent->__subtree_last = node->__subtree_last;
+		if (node->start < parent->start)
+			link = &parent->rb.rb_left;
+		else
+			link = &parent->rb.rb_right;
+	}
+
+	rb_link_node(&node->rb, rb_parent, link);
+	rb_insert_augmented(&node->rb,
+			    &mm->interval_tree,
+			    &drm_mm_interval_tree_augment);
+}
+
 static void drm_mm_insert_helper(struct drm_mm_node *hole_node,
 				 struct drm_mm_node *node,
 				 u64 size, unsigned alignment,
@@ -153,6 +211,8 @@ static void drm_mm_insert_helper(struct drm_mm_node *hole_node,
 	INIT_LIST_HEAD(&node->hole_stack);
 	list_add(&node->node_list, &hole_node->node_list);
 
+	drm_mm_interval_tree_add_node(hole_node, node);
+
 	BUG_ON(node->start + node->size > adj_end);
 
 	node->hole_follows = 0;
@@ -178,39 +238,50 @@ static void drm_mm_insert_helper(struct drm_mm_node *hole_node,
  */
 int drm_mm_reserve_node(struct drm_mm *mm, struct drm_mm_node *node)
 {
-	struct drm_mm_node *hole;
 	u64 end = node->start + node->size;
-	u64 hole_start;
-	u64 hole_end;
-
-	BUG_ON(node == NULL);
+	struct drm_mm_node *hole;
+	u64 hole_start, hole_end;
 
 	/* Find the relevant hole to add our node to */
-	drm_mm_for_each_hole(hole, mm, hole_start, hole_end) {
-		if (hole_start > node->start || hole_end < end)
-			continue;
+	hole = drm_mm_interval_tree_iter_first(&mm->interval_tree,
+					       node->start, ~(u64)0);
+	if (hole) {
+		if (hole->start <= node->start)
+			return -ENOSPC;
+	} else {
+		hole = list_entry(&mm->head_node.node_list,
+				  typeof(*hole), node_list);
+	}
 
-		node->mm = mm;
-		node->allocated = 1;
+	hole = list_last_entry(&hole->node_list, typeof(*hole), node_list);
+	if (!hole->hole_follows)
+		return -ENOSPC;
 
-		INIT_LIST_HEAD(&node->hole_stack);
-		list_add(&node->node_list, &hole->node_list);
+	hole_start = __drm_mm_hole_node_start(hole);
+	hole_end = __drm_mm_hole_node_end(hole);
+	if (hole_start > node->start || hole_end < end)
+		return -ENOSPC;
 
-		if (node->start == hole_start) {
-			hole->hole_follows = 0;
-			list_del_init(&hole->hole_stack);
-		}
+	node->mm = mm;
+	node->allocated = 1;
 
-		node->hole_follows = 0;
-		if (end != hole_end) {
-			list_add(&node->hole_stack, &mm->hole_stack);
-			node->hole_follows = 1;
-		}
+	INIT_LIST_HEAD(&node->hole_stack);
+	list_add(&node->node_list, &hole->node_list);
 
-		return 0;
+	drm_mm_interval_tree_add_node(hole, node);
+
+	if (node->start == hole_start) {
+		hole->hole_follows = 0;
+		list_del_init(&hole->hole_stack);
+	}
+
+	node->hole_follows = 0;
+	if (end != hole_end) {
+		list_add(&node->hole_stack, &mm->hole_stack);
+		node->hole_follows = 1;
 	}
 
-	return -ENOSPC;
+	return 0;
 }
 EXPORT_SYMBOL(drm_mm_reserve_node);
 
@@ -300,6 +371,8 @@ static void drm_mm_insert_helper_range(struct drm_mm_node *hole_node,
 	INIT_LIST_HEAD(&node->hole_stack);
 	list_add(&node->node_list, &hole_node->node_list);
 
+	drm_mm_interval_tree_add_node(hole_node, node);
+
 	BUG_ON(node->start < start);
 	BUG_ON(node->start < adj_start);
 	BUG_ON(node->start + node->size > adj_end);
@@ -388,6 +461,7 @@ void drm_mm_remove_node(struct drm_mm_node *node)
 	} else
 		list_move(&prev_node->hole_stack, &mm->hole_stack);
 
+	drm_mm_interval_tree_remove(node, &mm->interval_tree);
 	list_del(&node->node_list);
 	node->allocated = 0;
 }
@@ -514,11 +588,13 @@ void drm_mm_replace_node(struct drm_mm_node *old, struct drm_mm_node *new)
 {
 	list_replace(&old->node_list, &new->node_list);
 	list_replace(&old->hole_stack, &new->hole_stack);
+	rb_replace_node(&old->rb, &new->rb, &old->mm->interval_tree);
 	new->hole_follows = old->hole_follows;
 	new->mm = old->mm;
 	new->start = old->start;
 	new->size = old->size;
 	new->color = old->color;
+	new->__subtree_last = old->__subtree_last;
 
 	old->allocated = 0;
 	new->allocated = 1;
@@ -756,6 +832,8 @@ void drm_mm_init(struct drm_mm * mm, u64 start, u64 size)
 	mm->head_node.size = start - mm->head_node.start;
 	list_add_tail(&mm->head_node.hole_stack, &mm->hole_stack);
 
+	mm->interval_tree = RB_ROOT;
+
 	mm->color_adjust = NULL;
 }
 EXPORT_SYMBOL(drm_mm_init);
diff --git a/include/drm/drm_mm.h b/include/drm/drm_mm.h
index fc65118e5077..205ddcf6d55d 100644
--- a/include/drm/drm_mm.h
+++ b/include/drm/drm_mm.h
@@ -37,6 +37,7 @@
  * Generic range manager structs
  */
 #include <linux/bug.h>
+#include <linux/rbtree.h>
 #include <linux/kernel.h>
 #include <linux/list.h>
 #include <linux/spinlock.h>
@@ -61,6 +62,7 @@ enum drm_mm_allocator_flags {
 struct drm_mm_node {
 	struct list_head node_list;
 	struct list_head hole_stack;
+	struct rb_node rb;
 	unsigned hole_follows : 1;
 	unsigned scanned_block : 1;
 	unsigned scanned_prev_free : 1;
@@ -70,6 +72,7 @@ struct drm_mm_node {
 	unsigned long color;
 	u64 start;
 	u64 size;
+	u64 __subtree_last;
 	struct drm_mm *mm;
 };
 
@@ -79,6 +82,9 @@ struct drm_mm {
 	/* head_node.node_list is the list of all memory nodes, ordered
 	 * according to the (increasing) start address of the memory node. */
 	struct drm_mm_node head_node;
+	/* Keep an interval_tree for fast lookup of drm_mm_nodes by address. */
+	struct rb_root interval_tree;
+
 	unsigned int scan_check_range : 1;
 	unsigned scan_alignment;
 	unsigned long scan_color;
@@ -295,6 +301,12 @@ void drm_mm_init(struct drm_mm *mm,
 void drm_mm_takedown(struct drm_mm *mm);
 bool drm_mm_clean(struct drm_mm *mm);
 
+struct drm_mm_node *
+drm_mm_interval_first(struct drm_mm *mm, u64 start, u64 last);
+
+struct drm_mm_node *
+drm_mm_interval_next(struct drm_mm_node *node, u64 start, u64 last);
+
 void drm_mm_init_scan(struct drm_mm *mm,
 		      u64 size,
 		      unsigned alignment,
-- 
2.7.0.rc3

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 161/190] drm: Convert drm_vma_manager to embedded interval-tree in drm_mm
  2016-01-11 11:00 ` [PATCH 142/190] drm/i915: Fallback to using unmappable memory for scanout Chris Wilson
                     ` (17 preceding siblings ...)
  2016-01-11 11:01   ` [PATCH 160/190] drm: Track drm_mm nodes with an interval tree Chris Wilson
@ 2016-01-11 11:01   ` Chris Wilson
  2016-01-11 11:01   ` [PATCH 162/190] drm/i915: Allow the user to pass a context to any ring Chris Wilson
                     ` (28 subsequent siblings)
  47 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 11:01 UTC (permalink / raw)
  To: intel-gfx; +Cc: dri-devel

Having added an interval-tree to struct drm_mm, we can replace the
auxiliary rb-tree inside the drm_vma_manager with it.
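
The lookup now walks mm->interval_tree directly and recovers the offset
node by containment; the key step is the final container_of() (fragment
from the hunk below):

	return container_of(best, struct drm_vma_offset_node, vm_node);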

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: dri-devel@lists.freedesktop.org
---
 drivers/gpu/drm/drm_vma_manager.c | 43 ++++++++-------------------------------
 include/drm/drm_vma_manager.h     |  2 --
 2 files changed, 9 insertions(+), 36 deletions(-)

diff --git a/drivers/gpu/drm/drm_vma_manager.c b/drivers/gpu/drm/drm_vma_manager.c
index 2f2ecde8285b..42727ceae7a9 100644
--- a/drivers/gpu/drm/drm_vma_manager.c
+++ b/drivers/gpu/drm/drm_vma_manager.c
@@ -86,7 +86,6 @@ void drm_vma_offset_manager_init(struct drm_vma_offset_manager *mgr,
 				 unsigned long page_offset, unsigned long size)
 {
 	rwlock_init(&mgr->vm_lock);
-	mgr->vm_addr_space_rb = RB_ROOT;
 	drm_mm_init(&mgr->vm_addr_space_mm, page_offset, size);
 }
 EXPORT_SYMBOL(drm_vma_offset_manager_init);
@@ -142,16 +141,16 @@ struct drm_vma_offset_node *drm_vma_offset_lookup_locked(struct drm_vma_offset_m
 							 unsigned long start,
 							 unsigned long pages)
 {
-	struct drm_vma_offset_node *node, *best;
+	struct drm_mm_node *node, *best;
 	struct rb_node *iter;
 	unsigned long offset;
 
-	iter = mgr->vm_addr_space_rb.rb_node;
+	iter = mgr->vm_addr_space_mm.interval_tree.rb_node;
 	best = NULL;
 
 	while (likely(iter)) {
-		node = rb_entry(iter, struct drm_vma_offset_node, vm_rb);
-		offset = node->vm_node.start;
+		node = rb_entry(iter, struct drm_mm_node, rb);
+		offset = node->start;
 		if (start >= offset) {
 			iter = iter->rb_right;
 			best = node;
@@ -164,38 +163,17 @@ struct drm_vma_offset_node *drm_vma_offset_lookup_locked(struct drm_vma_offset_m
 
 	/* verify that the node spans the requested area */
 	if (best) {
-		offset = best->vm_node.start + best->vm_node.size;
+		offset = best->start + best->size;
 		if (offset < start + pages)
 			best = NULL;
 	}
 
-	return best;
-}
-EXPORT_SYMBOL(drm_vma_offset_lookup_locked);
-
-/* internal helper to link @node into the rb-tree */
-static void _drm_vma_offset_add_rb(struct drm_vma_offset_manager *mgr,
-				   struct drm_vma_offset_node *node)
-{
-	struct rb_node **iter = &mgr->vm_addr_space_rb.rb_node;
-	struct rb_node *parent = NULL;
-	struct drm_vma_offset_node *iter_node;
-
-	while (likely(*iter)) {
-		parent = *iter;
-		iter_node = rb_entry(*iter, struct drm_vma_offset_node, vm_rb);
+	if (best == NULL)
+		return NULL;
 
-		if (node->vm_node.start < iter_node->vm_node.start)
-			iter = &(*iter)->rb_left;
-		else if (node->vm_node.start > iter_node->vm_node.start)
-			iter = &(*iter)->rb_right;
-		else
-			BUG();
-	}
-
-	rb_link_node(&node->vm_rb, parent, iter);
-	rb_insert_color(&node->vm_rb, &mgr->vm_addr_space_rb);
+	return container_of(best, struct drm_vma_offset_node, vm_node);
 }
+EXPORT_SYMBOL(drm_vma_offset_lookup_locked);
 
 /**
  * drm_vma_offset_add() - Add offset node to manager
@@ -237,8 +215,6 @@ int drm_vma_offset_add(struct drm_vma_offset_manager *mgr,
 	if (ret)
 		goto out_unlock;
 
-	_drm_vma_offset_add_rb(mgr, node);
-
 out_unlock:
 	write_unlock(&mgr->vm_lock);
 	return ret;
@@ -262,7 +238,6 @@ void drm_vma_offset_remove(struct drm_vma_offset_manager *mgr,
 	write_lock(&mgr->vm_lock);
 
 	if (drm_mm_node_allocated(&node->vm_node)) {
-		rb_erase(&node->vm_rb, &mgr->vm_addr_space_rb);
 		drm_mm_remove_node(&node->vm_node);
 		memset(&node->vm_node, 0, sizeof(node->vm_node));
 	}
diff --git a/include/drm/drm_vma_manager.h b/include/drm/drm_vma_manager.h
index 2f63dd5e05eb..c671093a0157 100644
--- a/include/drm/drm_vma_manager.h
+++ b/include/drm/drm_vma_manager.h
@@ -40,13 +40,11 @@ struct drm_vma_offset_file {
 struct drm_vma_offset_node {
 	rwlock_t vm_lock;
 	struct drm_mm_node vm_node;
-	struct rb_node vm_rb;
 	struct rb_root vm_files;
 };
 
 struct drm_vma_offset_manager {
 	rwlock_t vm_lock;
-	struct rb_root vm_addr_space_rb;
 	struct drm_mm vm_addr_space_mm;
 };
 
-- 
2.7.0.rc3

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 162/190] drm/i915: Allow the user to pass a context to any ring
  2016-01-11 11:00 ` [PATCH 142/190] drm/i915: Fallback to using unmappable memory for scanout Chris Wilson
                     ` (18 preceding siblings ...)
  2016-01-11 11:01   ` [PATCH 161/190] drm: Convert drm_vma_manager to embedded interval-tree in drm_mm Chris Wilson
@ 2016-01-11 11:01   ` Chris Wilson
  2016-01-11 11:01   ` [PATCH 163/190] drm/i915: Fix i915_gem_evict_for_vma (soft-pinning) Chris Wilson
                     ` (27 subsequent siblings)
  47 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 11:01 UTC (permalink / raw)
  To: intel-gfx

With full-ppgtt, we want the user to have full control over their memory
layout, with a separate instance per context. Forcing them to use a
shared memory layout for !RCS not only duplicates the amount of work we
have to do, but also defeats the memory segregation on offer.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 00f3529a3560..7af562996767 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1172,12 +1172,9 @@ static struct intel_context *
 i915_gem_validate_context(struct drm_device *dev, struct drm_file *file,
 			  struct intel_engine_cs *ring, const u32 ctx_id)
 {
-	struct intel_context *ctx = NULL;
+	struct intel_context *ctx;
 	struct i915_ctx_hang_stats *hs;
 
-	if (ring->id != RCS && ctx_id != DEFAULT_CONTEXT_HANDLE)
-		return ERR_PTR(-EINVAL);
-
 	ctx = i915_gem_context_get(file->driver_priv, ctx_id);
 	if (IS_ERR(ctx))
 		return ctx;
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 163/190] drm/i915: Fix i915_gem_evict_for_vma (soft-pinning)
  2016-01-11 11:00 ` [PATCH 142/190] drm/i915: Fallback to using unmappable memory for scanout Chris Wilson
                     ` (19 preceding siblings ...)
  2016-01-11 11:01   ` [PATCH 162/190] drm/i915: Allow the user to pass a context to any ring Chris Wilson
@ 2016-01-11 11:01   ` Chris Wilson
  2016-01-11 11:01   ` [PATCH 164/190] drm/i915: Move obj->dirty:1 to obj->flags Chris Wilson
                     ` (26 subsequent siblings)
  47 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 11:01 UTC (permalink / raw)
  To: intel-gfx

Soft-pinning depends upon being able to check for availability of an
interval and to evict overlapping objects from a drm_mm range manager
very quickly. Currently it uses a linear list, which makes performance
dire and soft-pinning not a suitable replacement.

It also helps if the routine reports the correct error codes as expected
by its callers and emits a tracepoint upon use.

For posterity, since the wrong patch was pushed (i.e. one that missed
these key points), this is the changelog that should have been on

commit 506a8e87d8d2746b9e9d2433503fe237c54e4750
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Dec 8 11:55:07 2015 +0000

    drm/i915: Add soft-pinning API for execbuffer

Userspace can pass in an offset that it presumes the object is located
at. The kernel will then do its utmost to fit the object into that
location. The assumption is that userspace is handling its own object
locations (for example along with full-ppgtt) and that the kernel will
rarely have to make space for the user's requests.

This extends the DRM_IOCTL_I915_GEM_EXECBUFFER2 to do the following:
* if the user supplies a virtual address via the execobject->offset
  *and* sets the EXEC_OBJECT_PINNED flag in execobject->flags, then
  that object is placed at that offset in the address space selected
  by the context specifier in execbuffer.
* the location must be aligned to the GTT page size, 4096 bytes
* as the object is placed exactly as specified, it may be used by this
  execbuffer call without relocations pointing to it

It may fail to do so if:
* EINVAL is returned if the object does not have a 4096 byte aligned
  address
* the object conflicts with another pinned object (either pinned by
  hardware in that address space, e.g. scanouts in the aliasing ppgtt)
  or within the same batch.
  EBUSY is returned if the location is pinned by hardware
  EINVAL is returned if the location is already in use by the batch
* EINVAL is returned if the object conflicts with its own alignment (as
  required by the hardware) or if the placement of the object does not
  fit within the address space

All other execbuffer errors apply.

Presence of this execbuf extension may be queried by passing
I915_PARAM_HAS_EXEC_SOFTPIN to DRM_IOCTL_I915_GETPARAM and checking for
a reported value of 1 (or greater).
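
As a hypothetical userspace sketch of the extension (the handle and
address are placeholders):

	struct drm_i915_gem_exec_object2 entry = {
		.handle = bo_handle,          /* assumed GEM handle */
		.offset = 0x10000000,         /* 4096-byte aligned */
		.flags  = EXEC_OBJECT_PINNED, /* place exactly at .offset */
	};
	/* relocations pointing at this object may then be omitted */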

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h       |  2 +-
 drivers/gpu/drm/i915/i915_gem.c       |  2 +-
 drivers/gpu/drm/i915/i915_gem_evict.c | 69 ++++++++++++++++++++++-------------
 drivers/gpu/drm/i915/i915_trace.h     | 23 ++++++++++++
 4 files changed, 69 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index addd33bbc847..62a024a7225b 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3052,7 +3052,7 @@ int __must_check i915_gem_evict_something(struct drm_device *dev,
 					  unsigned long start,
 					  unsigned long end,
 					  unsigned flags);
-int __must_check i915_gem_evict_for_vma(struct i915_vma *target);
+int __must_check i915_gem_evict_for_vma(struct i915_vma *vma, unsigned flags);
 int i915_gem_evict_vm(struct i915_address_space *vm, bool do_idle);
 
 /* belongs in i915_gem_gtt.h */
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 962fd81ce26c..497b68849d09 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2813,7 +2813,7 @@ i915_vma_insert(struct i915_vma *vma,
 		vma->node.color = obj->cache_level;
 		ret = drm_mm_reserve_node(&vma->vm->mm, &vma->node);
 		if (ret) {
-			ret = i915_gem_evict_for_vma(vma);
+			ret = i915_gem_evict_for_vma(vma, flags);
 			if (ret == 0)
 				ret = drm_mm_reserve_node(&vma->vm->mm, &vma->node);
 			if (ret)
diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index fdc4941be15a..b48839fc2996 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -229,42 +229,61 @@ found:
 }
 
 int
-i915_gem_evict_for_vma(struct i915_vma *target)
+i915_gem_evict_for_vma(struct i915_vma *target, unsigned flags)
 {
-	struct drm_mm_node *node, *next;
+	struct list_head eviction_list;
+	struct drm_mm_node *node;
+	u64 end = target->node.start + target->node.size;
+	struct i915_vma *vma, *next;
+	int ret;
 
-	list_for_each_entry_safe(node, next,
-			&target->vm->mm.head_node.node_list,
-			node_list) {
-		struct i915_vma *vma;
-		int ret;
+	trace_i915_gem_evict_vma(target, flags);
 
-		if (node->start + node->size <= target->node.start)
-			continue;
-		if (node->start >= target->node.start + target->node.size)
-			break;
+	node = drm_mm_interval_first(&target->vm->mm,
+				     target->node.start, end - 1);
+	if (node == NULL)
+		return 0;
 
-		vma = container_of(node, typeof(*vma), node);
+	INIT_LIST_HEAD(&eviction_list);
+	vma = container_of(node, typeof(*vma), node);
+	list_for_each_entry_from(vma,
+				 &target->vm->mm.head_node.node_list,
+				 node.node_list) {
+		if (vma->node.start >= end)
+			break;
 
-		if (vma->pin_count) {
-			if (!vma->exec_entry || (vma->pin_count > 1))
-				/* Object is pinned for some other use */
-				return -EBUSY;
+		if (flags & PIN_NONBLOCK && (vma->pin_count | vma->active)) {
+			ret = -ENOSPC;
+			break;
+		}
 
-			/* We need to evict a buffer in the same batch */
-			if (vma->exec_entry->flags & EXEC_OBJECT_PINNED)
-				/* Overlapping fixed objects in the same batch */
-				return -EINVAL;
+		if (vma->exec_entry &&
+		    vma->exec_entry->flags & EXEC_OBJECT_PINNED) {
+			/* Overlapping pinned objects in the same batch */
+			ret = -EINVAL;
+			break;
+		}
 
-			return -ENOSPC;
+		if (vma->pin_count) {
+			/* We may need to evict a buffer in the same batch */
+			ret = vma->exec_entry ? -ENOSPC : -EBUSY;
+			break;
 		}
 
-		ret = i915_vma_unbind(vma);
-		if (ret)
-			return ret;
+		list_add(&vma->exec_list, &eviction_list);
+		drm_gem_object_reference(&vma->obj->base);
 	}
 
-	return 0;
+	ret = 0;
+	list_for_each_entry_safe(vma, next, &eviction_list, exec_list) {
+		struct drm_i915_gem_object *obj = vma->obj;
+		list_del_init(&vma->exec_list);
+		if (ret == 0)
+			ret = i915_vma_unbind(vma);
+		drm_gem_object_unreference(&obj->base);
+	}
+
+	return ret;
 }
 
 /**
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index e486dcef508d..396943fa53ea 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -448,6 +448,29 @@ TRACE_EVENT(i915_gem_evict_vm,
 	    TP_printk("dev=%d, vm=%p", __entry->dev, __entry->vm)
 );
 
+TRACE_EVENT(i915_gem_evict_vma,
+	    TP_PROTO(struct i915_vma *vma, unsigned flags),
+	    TP_ARGS(vma, flags),
+
+	    TP_STRUCT__entry(
+			     __field(u32, dev)
+			     __field(struct i915_address_space *, vm)
+			     __field(u64, start)
+			     __field(u64, size)
+			     __field(unsigned, flags)
+			    ),
+
+	    TP_fast_assign(
+			   __entry->dev = vma->vm->dev->primary->index;
+			   __entry->vm = vma->vm;
+			   __entry->start = vma->node.start;
+			   __entry->size = vma->node.size;
+			   __entry->flags = flags;
+			  ),
+
+	    TP_printk("dev=%d, vm=%p, start=%llx size=%llx, flags=%x", __entry->dev, __entry->vm, (long long)__entry->start, (long long)__entry->size, __entry->flags)
+);
+
 TRACE_EVENT(i915_gem_ring_sync_to,
 	    TP_PROTO(struct drm_i915_gem_request *to,
 		     struct drm_i915_gem_request *from),
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 164/190] drm/i915: Move obj->dirty:1 to obj->flags
  2016-01-11 11:00 ` [PATCH 142/190] drm/i915: Fallback to using unmappable memory for scanout Chris Wilson
                     ` (20 preceding siblings ...)
  2016-01-11 11:01   ` [PATCH 163/190] drm/i915: Fix i915_gem_evict_for_vma (soft-pinning) Chris Wilson
@ 2016-01-11 11:01   ` Chris Wilson
  2016-03-24  8:17     ` David Weinehall
  2016-01-11 11:01   ` [PATCH 165/190] drm/i915: Use the precomputed value for whether to enable command parsing Chris Wilson
                     ` (25 subsequent siblings)
  47 siblings, 1 reply; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 11:01 UTC (permalink / raw)
  To: intel-gfx

The obj->dirty bit is a companion to the obj->active bits that were
moved to the obj->flags bitmask. Since we also update this bit inside
the i915_vma_move_to_active() hotpath, we can aid gcc by also moving
the obj->dirty bit into the obj->flags bitmask.
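
For illustration, a minimal standalone sketch of the flags-word
pattern being applied; the names here are hypothetical, the driver's
real helpers are in the diff below:

    #include <linux/types.h>

    #define SKETCH_DIRTY_BIT (1u << 0)  /* hypothetical bit assignment */

    struct sketch_bo {
            unsigned int flags;  /* replaces "unsigned int dirty:1" */
    };

    static inline bool sketch_bo_is_dirty(const struct sketch_bo *bo)
    {
            return bo->flags & SKETCH_DIRTY_BIT;
    }

    static inline void sketch_bo_set_dirty(struct sketch_bo *bo)
    {
            bo->flags |= SKETCH_DIRTY_BIT;
    }

    static inline void sketch_bo_unset_dirty(struct sketch_bo *bo)
    {
            bo->flags &= ~SKETCH_DIRTY_BIT;
    }

Keeping all the bits in one word lets the compiler load and test
obj->flags once in the hotpath instead of touching separate bitfields.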

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c        |  2 +-
 drivers/gpu/drm/i915/i915_drv.h            | 21 ++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_gem.c            | 18 +++++++++---------
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  3 +--
 drivers/gpu/drm/i915/i915_gem_userptr.c    |  6 +++---
 drivers/gpu/drm/i915/i915_gpu_error.c      |  2 +-
 drivers/gpu/drm/i915/intel_lrc.c           |  2 +-
 7 files changed, 36 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 558d79b63e6c..8a59630fe5fb 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -136,7 +136,7 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 	seq_printf(m, "] %x %s%s%s",
 		   i915_gem_request_get_seqno(obj->last_write.request),
 		   i915_cache_level_str(to_i915(obj->base.dev), obj->cache_level),
-		   obj->dirty ? " dirty" : "",
+		   i915_gem_object_is_dirty(obj) ? " dirty" : "",
 		   obj->madv == I915_MADV_DONTNEED ? " purgeable" : "");
 	if (obj->base.name)
 		seq_printf(m, " (name: %d)", obj->base.name);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 62a024a7225b..d664a67cda7b 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2058,7 +2058,8 @@ struct drm_i915_gem_object {
 	 * This is set if the object has been written to since last bound
 	 * to the GTT
 	 */
-	unsigned int dirty:1;
+#define I915_BO_DIRTY_SHIFT (I915_BO_ACTIVE_REF_SHIFT + 1)
+#define I915_BO_DIRTY_BIT (1 << I915_BO_DIRTY_SHIFT)
 
 	/**
 	 * Advice: are the backing pages purgeable?
@@ -2189,6 +2190,24 @@ i915_gem_object_unset_active_reference(struct drm_i915_gem_object *obj)
 }
 void __i915_gem_object_release_unless_active(struct drm_i915_gem_object *obj);
 
+static inline bool
+i915_gem_object_is_dirty(const struct drm_i915_gem_object *obj)
+{
+	return obj->flags & I915_BO_DIRTY_BIT;
+}
+
+static inline void
+i915_gem_object_set_dirty(struct drm_i915_gem_object *obj)
+{
+	obj->flags |= I915_BO_DIRTY_BIT;
+}
+
+static inline void
+i915_gem_object_unset_dirty(struct drm_i915_gem_object *obj)
+{
+	obj->flags &= ~I915_BO_DIRTY_BIT;
+}
+
 void i915_gem_track_fb(struct drm_i915_gem_object *old,
 		       struct drm_i915_gem_object *new,
 		       unsigned frontbuffer_bits);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 497b68849d09..5347469bbea1 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -209,9 +209,9 @@ i915_gem_object_put_pages_phys(struct drm_i915_gem_object *obj)
 	}
 
 	if (obj->madv == I915_MADV_DONTNEED)
-		obj->dirty = 0;
+		i915_gem_object_unset_dirty(obj);
 
-	if (obj->dirty) {
+	if (i915_gem_object_is_dirty(obj)) {
 		struct address_space *mapping = file_inode(obj->base.filp)->i_mapping;
 		char *vaddr = obj->phys_handle->vaddr;
 		int i;
@@ -235,7 +235,7 @@ i915_gem_object_put_pages_phys(struct drm_i915_gem_object *obj)
 			page_cache_release(page);
 			vaddr += PAGE_SIZE;
 		}
-		obj->dirty = 0;
+		i915_gem_object_unset_dirty(obj);
 	}
 
 	sg_free_table(obj->pages);
@@ -589,7 +589,7 @@ int i915_gem_obj_prepare_shmem_write(struct drm_i915_gem_object *obj,
 
 out:
 	intel_fb_obj_invalidate(obj, ORIGIN_CPU);
-	obj->dirty = 1;
+	i915_gem_object_set_dirty(obj);
 	/* return with the pages pinned */
 	return 0;
 
@@ -1836,12 +1836,12 @@ i915_gem_object_put_pages_gtt(struct drm_i915_gem_object *obj)
 		i915_gem_object_save_bit_17_swizzle(obj);
 
 	if (obj->madv == I915_MADV_DONTNEED)
-		obj->dirty = 0;
+		i915_gem_object_unset_dirty(obj);
 
 	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, 0) {
 		struct page *page = sg_page_iter_page(&sg_iter);
 
-		if (obj->dirty)
+		if (i915_gem_object_is_dirty(obj))
 			set_page_dirty(page);
 
 		if (obj->madv == I915_MADV_WILLNEED)
@@ -1849,7 +1849,7 @@ i915_gem_object_put_pages_gtt(struct drm_i915_gem_object *obj)
 
 		page_cache_release(page);
 	}
-	obj->dirty = 0;
+	i915_gem_object_unset_dirty(obj);
 
 	sg_free_table(obj->pages);
 	kfree(obj->pages);
@@ -3029,7 +3029,7 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
 	if (write) {
 		obj->base.read_domains = I915_GEM_DOMAIN_GTT;
 		obj->base.write_domain = I915_GEM_DOMAIN_GTT;
-		obj->dirty = 1;
+		i915_gem_object_set_dirty(obj);
 	}
 
 	trace_i915_gem_object_change_domain(obj,
@@ -4389,7 +4389,7 @@ i915_gem_object_create_from_data(struct drm_device *dev,
 	i915_gem_object_pin_pages(obj);
 	sg = obj->pages;
 	bytes = sg_copy_from_buffer(sg->sgl, sg->nents, (void *)data, size);
-	obj->dirty = 1;		/* Backing store is now out of date */
+	i915_gem_object_set_dirty(obj); /* Backing store is now out of date */
 	i915_gem_object_unpin_pages(obj);
 
 	if (WARN_ON(bytes != size)) {
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 7af562996767..185fbf45a5d2 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1197,14 +1197,13 @@ void i915_vma_move_to_active(struct i915_vma *vma,
 
 	GEM_BUG_ON(!drm_mm_node_allocated(&vma->node));
 
-	obj->dirty = 1; /* be paranoid  */
-
 	/* The order in which we add operations to the retirement queue is
 	 * vital here: mark_active adds to the start of the callback list,
 	 * such that subsequent callbacks are called first. Therefore we
 	 * add the active reference first and queue for it to be dropped
 	 * *last*.
 	 */
+	i915_gem_object_set_dirty(obj); /* be paranoid */
 	i915_gem_object_set_active(obj, engine);
 	i915_gem_request_mark_active(req, &obj->last_read[engine]);
 
diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
index 53f8094b3198..232ce85b39db 100644
--- a/drivers/gpu/drm/i915/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
@@ -745,20 +745,20 @@ i915_gem_userptr_put_pages(struct drm_i915_gem_object *obj)
 	__i915_gem_userptr_set_active(obj, false);
 
 	if (obj->madv != I915_MADV_WILLNEED)
-		obj->dirty = 0;
+		i915_gem_object_unset_dirty(obj);
 
 	i915_gem_gtt_finish_object(obj);
 
 	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, 0) {
 		struct page *page = sg_page_iter_page(&sg_iter);
 
-		if (obj->dirty)
+		if (i915_gem_object_is_dirty(obj))
 			set_page_dirty(page);
 
 		mark_page_accessed(page);
 		page_cache_release(page);
 	}
-	obj->dirty = 0;
+	i915_gem_object_unset_dirty(obj);
 
 	sg_free_table(obj->pages);
 	kfree(obj->pages);
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index e9ef6b25c696..6fbb11a53b60 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -715,7 +715,7 @@ static void capture_bo(struct drm_i915_error_buffer *err,
 	err->write_domain = obj->base.write_domain;
 	err->fence_reg = vma->fence ? vma->fence->id : -1;
 	err->tiling = obj->tiling_mode;
-	err->dirty = obj->dirty;
+	err->dirty = i915_gem_object_is_dirty(obj);
 	err->purgeable = obj->madv != I915_MADV_WILLNEED;
 	err->userptr = obj->userptr.mm != NULL;
 	err->ring = obj->last_write.request ? obj->last_write.request->engine->id : -1;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 62f19ed51fb2..3e61fce1326e 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -597,7 +597,7 @@ static int intel_lr_context_pin(struct intel_context *ctx,
 
 	i915_gem_context_reference(ctx);
 	ce->vma = vma;
-	vma->obj->dirty = true;
+	i915_gem_object_set_dirty(vma->obj);
 
 	ggtt_offset = vma->node.start + LRC_PPHWSP_PN * PAGE_SIZE;
 	ring->context_descriptor =
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 165/190] drm/i915: Use the precomputed value for whether to enable command parsing
  2016-01-11 11:00 ` [PATCH 142/190] drm/i915: Fallback to using unmappable memory for scanout Chris Wilson
                     ` (21 preceding siblings ...)
  2016-01-11 11:01   ` [PATCH 164/190] drm/i915: Move obj->dirty:1 to obj->flags Chris Wilson
@ 2016-01-11 11:01   ` Chris Wilson
  2016-01-11 11:01   ` [PATCH 166/190] drm/i915: Drop spinlocks around adding to the client request list Chris Wilson
                     ` (24 subsequent siblings)
  47 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 11:01 UTC (permalink / raw)
  To: intel-gfx

As i915.enable_cmd_parser is an unsafe option, make it read-only at
runtime. Now that it is constant, we can use the value determined during
initialisation to decide whether we need the command parser at
execbuffer time.
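
Schematically, making an unsafe module parameter immutable at runtime
is just a permissions change on its sysfs node; a minimal sketch with
a hypothetical parameter name, not the driver's own:

    #include <linux/module.h>

    static bool sketch_enable __read_mostly = true;

    /* mode 0400: the sysfs node stays readable, but is no longer
     * writable after module load (the old 0600 let root flip the
     * value at any time)
     */
    module_param(sketch_enable, bool, 0400);
    MODULE_PARM_DESC(sketch_enable, "Enable the feature (default: true)");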

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c     | 36 +++++++++---------------------
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  2 +-
 drivers/gpu/drm/i915/i915_params.c         |  6 ++---
 drivers/gpu/drm/i915/i915_params.h         |  2 +-
 drivers/gpu/drm/i915/intel_ringbuffer.h    | 15 +++++++++++++
 5 files changed, 31 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index fae127166e2c..84340eb42e1b 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -696,12 +696,18 @@ int i915_cmd_parser_init_ring(struct intel_engine_cs *ring)
 	int cmd_table_count;
 	int ret;
 
-	if (!IS_GEN7(ring->dev))
+	if (!i915.enable_cmd_parser)
+		return 0;
+
+	if (!USES_PPGTT(ring->i915))
+		return 0;
+
+	if (!IS_GEN7(ring->i915))
 		return 0;
 
 	switch (ring->id) {
 	case RCS:
-		if (IS_HASWELL(ring->dev)) {
+		if (IS_HASWELL(ring->i915)) {
 			cmd_tables = hsw_render_ring_cmds;
 			cmd_table_count =
 				ARRAY_SIZE(hsw_render_ring_cmds);
@@ -713,7 +719,7 @@ int i915_cmd_parser_init_ring(struct intel_engine_cs *ring)
 		ring->reg_table = gen7_render_regs;
 		ring->reg_count = ARRAY_SIZE(gen7_render_regs);
 
-		if (IS_HASWELL(ring->dev)) {
+		if (IS_HASWELL(ring->i915)) {
 			ring->master_reg_table = hsw_master_regs;
 			ring->master_reg_count = ARRAY_SIZE(hsw_master_regs);
 		} else {
@@ -729,7 +735,7 @@ int i915_cmd_parser_init_ring(struct intel_engine_cs *ring)
 		ring->get_cmd_length_mask = gen7_bsd_get_cmd_length_mask;
 		break;
 	case BCS:
-		if (IS_HASWELL(ring->dev)) {
+		if (IS_HASWELL(ring->i915)) {
 			cmd_tables = hsw_blt_ring_cmds;
 			cmd_table_count = ARRAY_SIZE(hsw_blt_ring_cmds);
 		} else {
@@ -740,7 +746,7 @@ int i915_cmd_parser_init_ring(struct intel_engine_cs *ring)
 		ring->reg_table = gen7_blt_regs;
 		ring->reg_count = ARRAY_SIZE(gen7_blt_regs);
 
-		if (IS_HASWELL(ring->dev)) {
+		if (IS_HASWELL(ring->i915)) {
 			ring->master_reg_table = hsw_master_regs;
 			ring->master_reg_count = ARRAY_SIZE(hsw_master_regs);
 		} else {
@@ -954,26 +960,6 @@ unpin_src:
 	return ret ? ERR_PTR(ret) : dst;
 }
 
-/**
- * i915_needs_cmd_parser() - should a given ring use software command parsing?
- * @ring: the ring in question
- *
- * Only certain platforms require software batch buffer command parsing, and
- * only when enabled via module parameter.
- *
- * Return: true if the ring requires software command parsing
- */
-bool i915_needs_cmd_parser(struct intel_engine_cs *ring)
-{
-	if (!ring->needs_cmd_parser)
-		return false;
-
-	if (!USES_PPGTT(ring->dev))
-		return false;
-
-	return (i915.enable_cmd_parser == 1);
-}
-
 static bool check_cmd(const struct intel_engine_cs *ring,
 		      const struct drm_i915_cmd_descriptor *desc,
 		      const u32 *cmd, u32 length,
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 185fbf45a5d2..e60f559696d9 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1616,7 +1616,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	}
 
 	params->args_batch_start_offset = args->batch_start_offset;
-	if (i915_needs_cmd_parser(ring) && args->batch_len) {
+	if (intel_engine_needs_cmd_parser(ring) && args->batch_len) {
 		struct i915_vma *vma;
 
 		vma = i915_gem_execbuffer_parse(ring, &shadow_exec_entry,
diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
index 8d90c256520a..e6998efd9cae 100644
--- a/drivers/gpu/drm/i915/i915_params.c
+++ b/drivers/gpu/drm/i915/i915_params.c
@@ -48,7 +48,7 @@ struct i915_params i915 __read_mostly = {
 	.reset = true,
 	.invert_brightness = 0,
 	.disable_display = 0,
-	.enable_cmd_parser = 1,
+	.enable_cmd_parser = true,
 	.disable_vtd_wa = 0,
 	.use_mmio_flip = 0,
 	.mmio_debug = 0,
@@ -169,9 +169,9 @@ MODULE_PARM_DESC(disable_display, "Disable display (default: false)");
 module_param_named_unsafe(disable_vtd_wa, i915.disable_vtd_wa, bool, 0600);
 MODULE_PARM_DESC(disable_vtd_wa, "Disable all VT-d workarounds (default: false)");
 
-module_param_named_unsafe(enable_cmd_parser, i915.enable_cmd_parser, int, 0600);
+module_param_named_unsafe(enable_cmd_parser, i915.enable_cmd_parser, bool, 0400);
 MODULE_PARM_DESC(enable_cmd_parser,
-		 "Enable command parsing (1=enabled [default], 0=disabled)");
+		 "Enable command parsing (true=enabled [default], false=disabled)");
 
 module_param_named_unsafe(use_mmio_flip, i915.use_mmio_flip, int, 0600);
 MODULE_PARM_DESC(use_mmio_flip,
diff --git a/drivers/gpu/drm/i915/i915_params.h b/drivers/gpu/drm/i915/i915_params.h
index 529929073120..218e49cb469e 100644
--- a/drivers/gpu/drm/i915/i915_params.h
+++ b/drivers/gpu/drm/i915/i915_params.h
@@ -44,12 +44,12 @@ struct i915_params {
 	int disable_power_well;
 	int enable_ips;
 	int invert_brightness;
-	int enable_cmd_parser;
 	int guc_log_level;
 	int use_mmio_flip;
 	int mmio_debug;
 	int edp_vswing;
 	/* leave bools at the end to not create holes */
+	bool enable_cmd_parser;
 	bool enable_hangcheck;
 	bool fastboot;
 	bool prefault_disable;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index a66213b2450e..d24d0e438f49 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -371,6 +371,21 @@ intel_engine_flag(struct intel_engine_cs *ring)
 	return 1 << ring->id;
 }
 
+/**
+ * intel_engine_needs_cmd_parser() - should a given ring use software command
+ * parsing?
+ * @ring: the ring in question
+ *
+ * Only certain platforms require software batch buffer command parsing, and
+ * only when enabled via module parameter.
+ *
+ * Return: true if the ring requires software command parsing
+ */
+static inline bool intel_engine_needs_cmd_parser(struct intel_engine_cs *ring)
+{
+	return ring->needs_cmd_parser;
+}
+
 static inline u32
 intel_engine_sync_index(struct intel_engine_cs *ring,
 			struct intel_engine_cs *other)
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 166/190] drm/i915: Drop spinlocks around adding to the client request list
  2016-01-11 11:00 ` [PATCH 142/190] drm/i915: Fallback to using unmappable memory for scanout Chris Wilson
                     ` (22 preceding siblings ...)
  2016-01-11 11:01   ` [PATCH 165/190] drm/i915: Use the precomputed value for whether to enable command parsing Chris Wilson
@ 2016-01-11 11:01   ` Chris Wilson
  2016-01-11 11:01   ` [PATCH 167/190] drm/i915: Amalgamate execbuffer parameter structures Chris Wilson
                     ` (23 subsequent siblings)
  47 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 11:01 UTC (permalink / raw)
  To: intel-gfx

Adding to the tail of the client request list is safe without the
spinlock, as the only other user is the throttle ioctl, which iterates
forwards over the list. It only needs protection against deletion of a
request as it reads it; it simply won't see a new request added to the
end of the list, and even if it did, that request would be too recent
and rejected. We can further reduce the number of spinlocks required
when throttling by removing stale requests from the client_list as we
throttle.
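
The throttle side becomes a prune-as-you-scan loop; a schematic sketch
with hypothetical types and field names (the real hunk is in
i915_gem_ring_throttle below):

    #include <linux/list.h>
    #include <linux/jiffies.h>

    struct sketch_request {
            struct list_head client_list;
            unsigned long emitted_jiffies;
    };

    static struct sketch_request *
    sketch_throttle_scan(struct list_head *requests,
                         unsigned long recent_enough)
    {
            struct sketch_request *target = NULL, *req;

            list_for_each_entry(req, requests, client_list) {
                    if (time_after_eq(req->emitted_jiffies, recent_enough))
                            break;

                    /* unlink the stale entry we already stepped past;
                     * deleting target (never the cursor) keeps the
                     * iteration valid
                     */
                    if (target)
                            list_del(&target->client_list);

                    target = req;
            }

            return target;
    }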

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c            | 10 ++++------
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 15 +++++++++------
 drivers/gpu/drm/i915/i915_gem_request.c    | 23 -----------------------
 drivers/gpu/drm/i915/i915_gem_request.h    |  2 --
 4 files changed, 13 insertions(+), 37 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 5347469bbea1..e3d83e10918b 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3436,12 +3436,10 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
 		if (time_after_eq(request->emitted_jiffies, recent_enough))
 			break;
 
-		/*
-		 * Note that the request might not have been submitted yet.
-		 * In which case emitted_jiffies will be zero.
-		 */
-		if (!request->emitted_jiffies)
-			continue;
+		if (target) {
+			list_del(&target->client_list);
+			target->file_priv = NULL;
+		}
 
 		target = request;
 	}
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index e60f559696d9..7014523cd890 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1322,6 +1322,14 @@ err:
 	return vma;
 }
 
+static void
+add_to_client(struct drm_i915_gem_request *req,
+	      struct drm_file *file)
+{
+	req->file_priv = file->driver_priv;
+	list_add_tail(&req->client_list, &req->file_priv->mm.request_list);
+}
+
 static int
 execbuf_submit(struct i915_execbuffer_params *params,
 	       struct drm_i915_gem_execbuffer2 *args,
@@ -1409,6 +1417,7 @@ execbuf_submit(struct i915_execbuffer_params *params,
 	trace_i915_gem_ring_dispatch(params->request, params->dispatch_flags);
 
 	i915_gem_execbuffer_move_to_active(vmas, params->request);
+	add_to_client(params->request, params->file);
 
 	return 0;
 }
@@ -1689,12 +1698,6 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	 */
 	params->request->batch = params->batch_vma;
 
-	ret = i915_gem_request_add_to_client(params->request, file);
-	if (ret) {
-		i915_gem_request_cancel(params->request);
-		goto err_batch_unpin;
-	}
-
 	/*
 	 * Save assorted stuff away to pass through to *_submission().
 	 * NB: This data should be 'persistent' and not local as it will
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index d922b78614bd..28977edfbb83 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -278,29 +278,6 @@ err:
 	return ERR_PTR(ret);
 }
 
-int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
-				   struct drm_file *file)
-{
-	struct drm_i915_file_private *file_priv;
-
-	WARN_ON(!req || !file || req->file_priv);
-
-	if (!req || !file)
-		return -EINVAL;
-
-	if (req->file_priv)
-		return -EINVAL;
-
-	file_priv = file->driver_priv;
-
-	spin_lock(&file_priv->mm.lock);
-	req->file_priv = file_priv;
-	list_add_tail(&req->client_list, &file_priv->mm.request_list);
-	spin_unlock(&file_priv->mm.lock);
-
-	return 0;
-}
-
 static inline void
 i915_gem_request_remove_from_client(struct drm_i915_gem_request *request)
 {
diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index 67bc1f919af0..1e7c4fff5257 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -94,8 +94,6 @@ struct drm_i915_gem_request *
 i915_gem_request_alloc(struct intel_engine_cs *ring,
 		       struct intel_context *ctx);
 void i915_gem_request_cancel(struct drm_i915_gem_request *req);
-int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
-				   struct drm_file *file);
 void i915_gem_request_retire_upto(struct drm_i915_gem_request *req);
 
 static inline uint32_t
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 167/190] drm/i915: Amalgamate execbuffer parameter structures
  2016-01-11 11:00 ` [PATCH 142/190] drm/i915: Fallback to using unmappable memory for scanout Chris Wilson
                     ` (23 preceding siblings ...)
  2016-01-11 11:01   ` [PATCH 166/190] drm/i915: Drop spinlocks around adding to the client request list Chris Wilson
@ 2016-01-11 11:01   ` Chris Wilson
  2016-01-11 11:01   ` [PATCH 168/190] drm/i915: Skip holding context reference for duration of execbuffer call Chris Wilson
                     ` (22 subsequent siblings)
  47 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 11:01 UTC (permalink / raw)
  To: intel-gfx

Combine the two slightly overlapping parameter structures we pass around
the execbuffer routines into one.
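
As a rough sketch of the shape of the change (hypothetical, simplified
types; the real struct i915_execbuffer is at the top of the diff
below), the per-call state moves into one context that every helper
receives:

    #include <linux/list.h>
    #include <linux/errno.h>

    struct sketch_eb {
            struct list_head vmas;        /* was struct eb_vmas */
            unsigned int dispatch_flags;  /* was i915_execbuffer_params */
            unsigned int batch_start_offset;
    };

    /* before: sketch_reserve(file, args, vm, &vmas, &need_relocs); */
    static int sketch_reserve(struct sketch_eb *eb)
    {
            return list_empty(&eb->vmas) ? -ENOENT : 0;
    }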

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 539 +++++++++++++----------------
 1 file changed, 231 insertions(+), 308 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 7014523cd890..2a535eb35dff 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -43,70 +43,71 @@
 
 #define BATCH_OFFSET_BIAS (256*1024)
 
-struct i915_execbuffer_params {
-	struct drm_device               *dev;
-	struct drm_file                 *file;
-	struct i915_vma			*batch_vma;
-	uint32_t                        dispatch_flags;
-	uint32_t                        args_batch_start_offset;
-	struct intel_engine_cs          *ring;
-	struct intel_context            *ctx;
-	struct drm_i915_gem_request     *request;
-};
-
-struct eb_vmas {
+struct i915_execbuffer {
 	struct drm_i915_private *i915;
+	struct drm_file *file;
+	struct drm_i915_gem_execbuffer2 *args;
+	struct drm_i915_gem_exec_object2 *exec;
+	struct intel_engine_cs *engine;
+	struct intel_context *ctx;
+	struct i915_address_space *vm;
+	struct i915_vma *batch_vma;
+	uint32_t batch_start_offset;
+	struct drm_i915_gem_request *request;
+	unsigned dispatch_flags;
+	bool need_relocs;
 	struct list_head vmas;
+	struct reloc_cache {
+		unsigned long vaddr;
+		unsigned page;
+		struct drm_mm_node node;
+		bool use_64bit_reloc;
+	} reloc_cache;
 	int and;
 	union {
-		struct i915_vma *lut[0];
-		struct hlist_head buckets[0];
+		struct i915_vma **lut;
+		struct hlist_head *buckets;
 	};
 };
 
-static struct eb_vmas *
-eb_create(struct drm_i915_private *i915,
-	  struct drm_i915_gem_execbuffer2 *args)
+static int
+eb_create(struct i915_execbuffer *eb)
 {
-	struct eb_vmas *eb = NULL;
-
-	if (args->flags & I915_EXEC_HANDLE_LUT) {
-		unsigned size = args->buffer_count;
+	eb->lut = NULL;
+	if (eb->args->flags & I915_EXEC_HANDLE_LUT) {
+		unsigned size = eb->args->buffer_count;
 		size *= sizeof(struct i915_vma *);
-		size += sizeof(struct eb_vmas);
-		eb = kmalloc(size, GFP_TEMPORARY | __GFP_NOWARN | __GFP_NORETRY);
+		eb->lut = kmalloc(size, GFP_TEMPORARY | __GFP_NOWARN | __GFP_NORETRY);
 	}
 
-	if (eb == NULL) {
-		unsigned size = args->buffer_count;
+	if (eb->lut == NULL) {
+		unsigned size = eb->args->buffer_count;
 		unsigned count = PAGE_SIZE / sizeof(struct hlist_head) / 2;
 		BUILD_BUG_ON_NOT_POWER_OF_2(PAGE_SIZE / sizeof(struct hlist_head));
 		while (count > 2*size)
 			count >>= 1;
-		eb = kzalloc(count*sizeof(struct hlist_head) +
-			     sizeof(struct eb_vmas),
-			     GFP_TEMPORARY);
-		if (eb == NULL)
-			return eb;
+		eb->lut = kzalloc(count*sizeof(struct hlist_head),
+				  GFP_TEMPORARY);
+		if (eb->lut == NULL)
+			return -ENOMEM;
 
 		eb->and = count - 1;
 	} else
-		eb->and = -args->buffer_count;
+		eb->and = -eb->args->buffer_count;
 
-	eb->i915 = i915;
 	INIT_LIST_HEAD(&eb->vmas);
-	return eb;
+	return 0;
 }
 
 static void
-eb_reset(struct eb_vmas *eb)
+eb_reset(struct i915_execbuffer *eb)
 {
 	if (eb->and >= 0)
 		memset(eb->buckets, 0, (eb->and+1)*sizeof(struct hlist_head));
 }
 
 static struct i915_vma *
-eb_get_batch(struct eb_vmas *eb)
+eb_get_batch(struct i915_execbuffer *eb)
 {
 	struct i915_vma *vma = list_entry(eb->vmas.prev, typeof(*vma), exec_list);
 
@@ -126,41 +127,37 @@ eb_get_batch(struct eb_vmas *eb)
 }
 
 static int
-eb_lookup_vmas(struct eb_vmas *eb,
-	       struct drm_i915_gem_exec_object2 *exec,
-	       const struct drm_i915_gem_execbuffer2 *args,
-	       struct i915_address_space *vm,
-	       struct drm_file *file)
+eb_lookup_vmas(struct i915_execbuffer *eb)
 {
 	struct drm_i915_gem_object *obj;
 	struct list_head objects;
 	int i, ret;
 
 	INIT_LIST_HEAD(&objects);
-	spin_lock(&file->table_lock);
+	spin_lock(&eb->file->table_lock);
 	/* Grab a reference to the object and release the lock so we can lookup
 	 * or create the VMA without using GFP_ATOMIC */
-	for (i = 0; i < args->buffer_count; i++) {
-		obj = to_intel_bo(idr_find(&file->object_idr, exec[i].handle));
+	for (i = 0; i < eb->args->buffer_count; i++) {
+		obj = to_intel_bo(idr_find(&eb->file->object_idr, eb->exec[i].handle));
 		if (obj == NULL) {
-			spin_unlock(&file->table_lock);
+			spin_unlock(&eb->file->table_lock);
 			DRM_DEBUG("Invalid object handle %d at index %d\n",
-				   exec[i].handle, i);
+				   eb->exec[i].handle, i);
 			ret = -ENOENT;
 			goto err;
 		}
 
 		if (!list_empty(&obj->obj_exec_link)) {
-			spin_unlock(&file->table_lock);
+			spin_unlock(&eb->file->table_lock);
 			DRM_DEBUG("Object %p [handle %d, index %d] appears more than once in object list\n",
-				   obj, exec[i].handle, i);
+				   obj, eb->exec[i].handle, i);
 			ret = -EINVAL;
 			goto err;
 		}
 
 		list_add_tail(&obj->obj_exec_link, &objects);
 	}
-	spin_unlock(&file->table_lock);
+	spin_unlock(&eb->file->table_lock);
 
 	i = 0;
 	while (!list_empty(&objects)) {
@@ -178,7 +175,7 @@ eb_lookup_vmas(struct eb_vmas *eb,
 		 * from the (obj, vm) we don't run the risk of creating
 		 * duplicated vmas for the same vm.
 		 */
-		vma = i915_gem_obj_lookup_or_create_vma(obj, vm, NULL);
+		vma = i915_gem_obj_lookup_or_create_vma(obj, eb->vm, NULL);
 		if (unlikely(IS_ERR(vma))) {
 			DRM_DEBUG("Failed to lookup VMA\n");
 			ret = PTR_ERR(vma);
@@ -189,11 +186,13 @@ eb_lookup_vmas(struct eb_vmas *eb,
 		list_add_tail(&vma->exec_list, &eb->vmas);
 		list_del_init(&obj->obj_exec_link);
 
-		vma->exec_entry = &exec[i];
+		vma->exec_entry = &eb->exec[i];
 		if (eb->and < 0) {
 			eb->lut[i] = vma;
 		} else {
-			uint32_t handle = args->flags & I915_EXEC_HANDLE_LUT ? i : exec[i].handle;
+			u32 handle =
+				eb->args->flags & I915_EXEC_HANDLE_LUT ?
+				i : eb->exec[i].handle;
 			vma->exec_handle = handle;
 			hlist_add_head(&vma->exec_node,
 				       &eb->buckets[handle & eb->and]);
@@ -220,7 +219,7 @@ err:
 	return ret;
 }
 
-static struct i915_vma *eb_get_vma(struct eb_vmas *eb, unsigned long handle)
+static struct i915_vma *eb_get_vma(struct i915_execbuffer *eb, unsigned long handle)
 {
 	if (eb->and < 0) {
 		if (handle >= -eb->and)
@@ -243,7 +242,7 @@ static struct i915_vma *eb_get_vma(struct eb_vmas *eb, unsigned long handle)
 }
 
 static void
-i915_gem_execbuffer_unreserve_vma(struct i915_vma *vma)
+eb_unreserve_vma(struct i915_vma *vma)
 {
 	struct drm_i915_gem_exec_object2 *entry;
 
@@ -261,8 +260,10 @@ i915_gem_execbuffer_unreserve_vma(struct i915_vma *vma)
 	entry->flags &= ~(__EXEC_OBJECT_HAS_FENCE | __EXEC_OBJECT_HAS_PIN);
 }
 
-static void eb_destroy(struct eb_vmas *eb)
+static void eb_destroy(struct i915_execbuffer *eb)
 {
+	i915_gem_context_unreference(eb->ctx);
+
 	while (!list_empty(&eb->vmas)) {
 		struct i915_vma *vma;
 
@@ -270,9 +271,8 @@ static void eb_destroy(struct eb_vmas *eb)
 				       struct i915_vma,
 				       exec_list);
 		list_del_init(&vma->exec_list);
-		i915_gem_execbuffer_unreserve_vma(vma);
+		eb_unreserve_vma(vma);
 	}
-	kfree(eb);
 }
 
 static inline int use_cpu_reloc(struct drm_i915_gem_object *obj)
@@ -308,21 +308,12 @@ relocation_target(const struct drm_i915_gem_relocation_entry *reloc,
 	return gen8_canonical_addr((int)reloc->delta + target_offset);
 }
 
-struct reloc_cache {
-	struct drm_i915_private *i915;
-	unsigned long vaddr;
-	unsigned page;
-	struct drm_mm_node node;
-	bool use_64bit_reloc;
-};
-
 static void reloc_cache_init(struct reloc_cache *cache,
 			     struct drm_i915_private *i915)
 {
 	cache->page = -1;
 	cache->vaddr = 0;
-	cache->i915 = i915;
-	cache->use_64bit_reloc = INTEL_INFO(cache->i915)->gen >= 8;
+	cache->use_64bit_reloc = INTEL_INFO(i915)->gen >= 8;
 }
 
 static inline void *unmask_page(unsigned long p)
@@ -337,7 +328,7 @@ static inline unsigned unmask_flags(unsigned long p)
 
 #define KMAP 0x4
 
-static void reloc_cache_fini(struct reloc_cache *cache)
+static void reloc_cache_reset(struct reloc_cache *cache)
 {
 	void *vaddr;
 
@@ -355,6 +346,9 @@ static void reloc_cache_fini(struct reloc_cache *cache)
 		io_mapping_unmap_atomic(vaddr);
 		i915_vma_unpin((struct i915_vma *)cache->node.mm);
 	}
+
+	cache->vaddr = 0;
+	cache->page = -1;
 }
 
 static void *reloc_kmap(struct drm_i915_gem_object *obj,
@@ -386,6 +380,14 @@ static void *reloc_kmap(struct drm_i915_gem_object *obj,
 	return vaddr;
 }
 
+static struct io_mapping *
+to_io_mapping(struct reloc_cache *cache)
+{
+	struct drm_i915_private *i915 =
+		container_of(cache, struct i915_execbuffer, reloc_cache)->i915;
+	return &i915->gtt.mappable;
+}
+
 static void *reloc_iomap(struct drm_i915_gem_object *obj,
 			 struct reloc_cache *cache,
 			 int page)
@@ -418,7 +420,7 @@ static void *reloc_iomap(struct drm_i915_gem_object *obj,
 		cache->node.mm = (void *)vma;
 	}
 
-	vaddr = io_mapping_map_atomic_wc(&cache->i915->gtt.mappable,
+	vaddr = io_mapping_map_atomic_wc(to_io_mapping(cache),
 					 cache->node.start + (page << PAGE_SHIFT));
 	cache->page = page;
 	cache->vaddr = (unsigned long)vaddr;
@@ -499,12 +501,10 @@ repeat:
 }
 
 static int
-i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
-				   struct eb_vmas *eb,
-				   struct drm_i915_gem_relocation_entry *reloc,
-				   struct reloc_cache *cache)
+eb_relocate_entry(struct drm_i915_gem_object *obj,
+		  struct i915_execbuffer *eb,
+		  struct drm_i915_gem_relocation_entry *reloc)
 {
-	struct drm_device *dev = obj->base.dev;
 	struct drm_gem_object *target_obj;
 	struct drm_i915_gem_object *target_i915_obj;
 	struct i915_vma *target_vma;
@@ -523,7 +523,7 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
 	/* Sandybridge PPGTT errata: We need a global gtt mapping for MI and
 	 * pipe_control writes because the gpu doesn't properly redirect them
 	 * through the ppgtt for non_secure batchbuffers. */
-	if (unlikely(IS_GEN6(dev) &&
+	if (unlikely(IS_GEN6(eb->i915) &&
 	    reloc->write_domain == I915_GEM_DOMAIN_INSTRUCTION)) {
 		ret = i915_vma_bind(target_vma, target_i915_obj->cache_level,
 				    PIN_GLOBAL);
@@ -565,7 +565,7 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
 
 	/* Check that the relocation address is valid... */
 	if (unlikely(reloc->offset >
-		     obj->base.size - (cache->use_64bit_reloc ? 8 : 4))) {
+		     obj->base.size - (eb->reloc_cache.use_64bit_reloc ? 8 : 4))) {
 		DRM_DEBUG("Relocation beyond object bounds: "
 			  "obj %p target %d offset %d size %d.\n",
 			  obj, reloc->target_handle,
@@ -585,7 +585,7 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
 	if (i915_gem_object_is_active(obj) && pagefault_disabled())
 		return -EFAULT;
 
-	ret = relocate_entry(obj, reloc, cache, target_offset);
+	ret = relocate_entry(obj, reloc, &eb->reloc_cache, target_offset);
 	if (ret)
 		return ret;
 
@@ -594,19 +594,15 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
 	return 0;
 }
 
-static int
-i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
-				 struct eb_vmas *eb)
+static int eb_relocate_vma(struct i915_vma *vma, struct i915_execbuffer *eb)
 {
 #define N_RELOC(x) ((x) / sizeof(struct drm_i915_gem_relocation_entry))
 	struct drm_i915_gem_relocation_entry stack_reloc[N_RELOC(512)];
 	struct drm_i915_gem_relocation_entry __user *user_relocs;
 	struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
-	struct reloc_cache cache;
 	int remain, ret = 0;
 
 	user_relocs = to_user_ptr(entry->relocs_ptr);
-	reloc_cache_init(&cache, eb->i915);
 
 	remain = entry->relocation_count;
 	while (remain) {
@@ -624,7 +620,7 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
 		do {
 			u64 offset = r->presumed_offset;
 
-			ret = i915_gem_execbuffer_relocate_entry(vma->obj, eb, r, &cache);
+			ret = eb_relocate_entry(vma->obj, eb, r);
 			if (ret)
 				goto out;
 
@@ -642,33 +638,29 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
 	}
 
 out:
-	reloc_cache_fini(&cache);
+	reloc_cache_reset(&eb->reloc_cache);
 	return ret;
 #undef N_RELOC
 }
 
 static int
-i915_gem_execbuffer_relocate_vma_slow(struct i915_vma *vma,
-				      struct eb_vmas *eb,
-				      struct drm_i915_gem_relocation_entry *relocs)
+eb_relocate_vma_slow(struct i915_vma *vma,
+		     struct i915_execbuffer *eb,
+		     struct drm_i915_gem_relocation_entry *relocs)
 {
 	const struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
-	struct reloc_cache cache;
 	int i, ret = 0;
 
-	reloc_cache_init(&cache, eb->i915);
 	for (i = 0; i < entry->relocation_count; i++) {
-		ret = i915_gem_execbuffer_relocate_entry(vma->obj, eb, &relocs[i], &cache);
+		ret = eb_relocate_entry(vma->obj, eb, &relocs[i]);
 		if (ret)
 			break;
 	}
-	reloc_cache_fini(&cache);
-
+	reloc_cache_reset(&eb->reloc_cache);
 	return ret;
 }
 
-static int
-i915_gem_execbuffer_relocate(struct eb_vmas *eb)
+static int eb_relocate(struct i915_execbuffer *eb)
 {
 	struct i915_vma *vma;
 	int ret = 0;
@@ -682,7 +674,7 @@ i915_gem_execbuffer_relocate(struct eb_vmas *eb)
 	 */
 	pagefault_disable();
 	list_for_each_entry(vma, &eb->vmas, exec_list) {
-		ret = i915_gem_execbuffer_relocate_vma(vma, eb);
+		ret = eb_relocate_vma(vma, eb);
 		if (ret)
 			break;
 	}
@@ -698,9 +690,9 @@ static bool only_mappable_for_reloc(unsigned int flags)
 }
 
 static int
-i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
-				struct intel_engine_cs *ring,
-				bool *need_reloc)
+eb_reserve_vma(struct i915_vma *vma,
+	       struct intel_engine_cs *ring,
+	       bool *need_reloc)
 {
 	struct drm_i915_gem_object *obj = vma->obj;
 	struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
@@ -818,33 +810,26 @@ eb_vma_misplaced(struct i915_vma *vma)
 	return false;
 }
 
-static int
-i915_gem_execbuffer_reserve(struct intel_engine_cs *ring,
-			    struct list_head *vmas,
-			    struct intel_context *ctx,
-			    bool *need_relocs)
+static int eb_reserve(struct i915_execbuffer *eb)
 {
+	const bool has_fenced_gpu_access = INTEL_INFO(eb->i915)->gen < 4;
 	struct drm_i915_gem_object *obj;
 	struct i915_vma *vma;
-	struct i915_address_space *vm;
 	struct list_head ordered_vmas;
 	struct list_head pinned_vmas;
-	bool has_fenced_gpu_access = INTEL_INFO(ring->dev)->gen < 4;
 	int retry;
 
-	vm = list_first_entry(vmas, struct i915_vma, exec_list)->vm;
-
 	INIT_LIST_HEAD(&ordered_vmas);
 	INIT_LIST_HEAD(&pinned_vmas);
-	while (!list_empty(vmas)) {
+	while (!list_empty(&eb->vmas)) {
 		struct drm_i915_gem_exec_object2 *entry;
 		bool need_fence, need_mappable;
 
-		vma = list_first_entry(vmas, struct i915_vma, exec_list);
+		vma = list_first_entry(&eb->vmas, struct i915_vma, exec_list);
 		obj = vma->obj;
 		entry = vma->exec_entry;
 
-		if (ctx->flags & CONTEXT_NO_ZEROMAP)
+		if (eb->ctx->flags & CONTEXT_NO_ZEROMAP)
 			entry->flags |= __EXEC_OBJECT_NEEDS_BIAS;
 
 		if (!has_fenced_gpu_access)
@@ -865,8 +850,8 @@ i915_gem_execbuffer_reserve(struct intel_engine_cs *ring,
 		obj->base.pending_read_domains = I915_GEM_GPU_DOMAINS & ~I915_GEM_DOMAIN_COMMAND;
 		obj->base.pending_write_domain = 0;
 	}
-	list_splice(&ordered_vmas, vmas);
-	list_splice(&pinned_vmas, vmas);
+	list_splice(&ordered_vmas, &eb->vmas);
+	list_splice(&pinned_vmas, &eb->vmas);
 
 	/* Attempt to pin all of the buffers into the GTT.
 	 * This is done in 3 phases:
@@ -885,24 +870,24 @@ i915_gem_execbuffer_reserve(struct intel_engine_cs *ring,
 		int ret = 0;
 
 		/* Unbind any ill-fitting objects or pin. */
-		list_for_each_entry(vma, vmas, exec_list) {
+		list_for_each_entry(vma, &eb->vmas, exec_list) {
 			if (!drm_mm_node_allocated(&vma->node))
 				continue;
 
 			if (eb_vma_misplaced(vma))
 				ret = i915_vma_unbind(vma);
 			else
-				ret = i915_gem_execbuffer_reserve_vma(vma, ring, need_relocs);
+				ret = eb_reserve_vma(vma, eb->engine, &eb->need_relocs);
 			if (ret)
 				goto err;
 		}
 
 		/* Bind fresh objects */
-		list_for_each_entry(vma, vmas, exec_list) {
+		list_for_each_entry(vma, &eb->vmas, exec_list) {
 			if (drm_mm_node_allocated(&vma->node))
 				continue;
 
-			ret = i915_gem_execbuffer_reserve_vma(vma, ring, need_relocs);
+			ret = eb_reserve_vma(vma, eb->engine, &eb->need_relocs);
 			if (ret)
 				goto err;
 		}
@@ -912,46 +897,37 @@ err:
 			return ret;
 
 		/* Decrement pin count for bound objects */
-		list_for_each_entry(vma, vmas, exec_list)
-			i915_gem_execbuffer_unreserve_vma(vma);
+		list_for_each_entry(vma, &eb->vmas, exec_list)
+			eb_unreserve_vma(vma);
 
-		ret = i915_gem_evict_vm(vm, true);
+		ret = i915_gem_evict_vm(eb->vm, true);
 		if (ret)
 			return ret;
 	} while (1);
 }
 
 static int
-i915_gem_execbuffer_relocate_slow(struct drm_device *dev,
-				  struct drm_i915_gem_execbuffer2 *args,
-				  struct drm_file *file,
-				  struct intel_engine_cs *ring,
-				  struct eb_vmas *eb,
-				  struct drm_i915_gem_exec_object2 *exec,
-				  struct intel_context *ctx)
+eb_relocate_slow(struct i915_execbuffer *eb)
 {
+	const unsigned count = eb->args->buffer_count;
+	struct drm_device *dev = eb->i915->dev;
 	struct drm_i915_gem_relocation_entry *reloc;
-	struct i915_address_space *vm;
 	struct i915_vma *vma;
-	bool need_relocs;
 	int *reloc_offset;
 	int i, total, ret;
-	unsigned count = args->buffer_count;
-
-	vm = list_first_entry(&eb->vmas, struct i915_vma, exec_list)->vm;
 
 	/* We may process another execbuffer during the unlock... */
 	while (!list_empty(&eb->vmas)) {
 		vma = list_first_entry(&eb->vmas, struct i915_vma, exec_list);
 		list_del_init(&vma->exec_list);
-		i915_gem_execbuffer_unreserve_vma(vma);
+		eb_unreserve_vma(vma);
 	}
 
 	mutex_unlock(&dev->struct_mutex);
 
 	total = 0;
 	for (i = 0; i < count; i++)
-		total += exec[i].relocation_count;
+		total += eb->exec[i].relocation_count;
 
 	reloc_offset = drm_malloc_ab(count, sizeof(*reloc_offset));
 	reloc = drm_malloc_ab(total, sizeof(*reloc));
@@ -968,10 +944,10 @@ i915_gem_execbuffer_relocate_slow(struct drm_device *dev,
 		u64 invalid_offset = (u64)-1;
 		int j;
 
-		user_relocs = to_user_ptr(exec[i].relocs_ptr);
+		user_relocs = to_user_ptr(eb->exec[i].relocs_ptr);
 
 		if (copy_from_user(reloc+total, user_relocs,
-				   exec[i].relocation_count * sizeof(*reloc))) {
+				   eb->exec[i].relocation_count * sizeof(*reloc))) {
 			ret = -EFAULT;
 			mutex_lock(&dev->struct_mutex);
 			goto err;
@@ -986,7 +962,7 @@ i915_gem_execbuffer_relocate_slow(struct drm_device *dev,
 		 * happened we would make the mistake of assuming that the
 		 * relocations were valid.
 		 */
-		for (j = 0; j < exec[i].relocation_count; j++) {
+		for (j = 0; j < eb->exec[i].relocation_count; j++) {
 			if (__copy_to_user(&user_relocs[j].presumed_offset,
 					   &invalid_offset,
 					   sizeof(invalid_offset))) {
@@ -997,7 +973,7 @@ i915_gem_execbuffer_relocate_slow(struct drm_device *dev,
 		}
 
 		reloc_offset[i] = total;
-		total += exec[i].relocation_count;
+		total += eb->exec[i].relocation_count;
 	}
 
 	ret = i915_mutex_lock_interruptible(dev);
@@ -1008,19 +984,17 @@ i915_gem_execbuffer_relocate_slow(struct drm_device *dev,
 
 	/* reacquire the objects */
 	eb_reset(eb);
-	ret = eb_lookup_vmas(eb, exec, args, vm, file);
+	ret = eb_lookup_vmas(eb);
 	if (ret)
 		goto err;
 
-	need_relocs = (args->flags & I915_EXEC_NO_RELOC) == 0;
-	ret = i915_gem_execbuffer_reserve(ring, &eb->vmas, ctx, &need_relocs);
+	ret = eb_reserve(eb);
 	if (ret)
 		goto err;
 
 	list_for_each_entry(vma, &eb->vmas, exec_list) {
-		int offset = vma->exec_entry - exec;
-		ret = i915_gem_execbuffer_relocate_vma_slow(vma, eb,
-							    reloc + reloc_offset[offset]);
+		int offset = vma->exec_entry - eb->exec;
+		ret = eb_relocate_vma_slow(vma, eb, reloc + reloc_offset[offset]);
 		if (ret)
 			goto err;
 	}
@@ -1038,20 +1012,19 @@ err:
 }
 
 static int
-i915_gem_execbuffer_move_to_gpu(struct drm_i915_gem_request *req,
-				struct list_head *vmas)
+eb_move_to_gpu(struct i915_execbuffer *eb)
 {
-	const unsigned other_rings = (~intel_engine_flag(req->engine) & I915_BO_ACTIVE_MASK) << I915_BO_ACTIVE_SHIFT;
+	const unsigned other_rings = (~intel_engine_flag(eb->engine) & I915_BO_ACTIVE_MASK) << I915_BO_ACTIVE_SHIFT;
 	struct i915_vma *vma;
 	uint32_t flush_domains = 0;
 	bool flush_chipset = false;
 	int ret;
 
-	list_for_each_entry(vma, vmas, exec_list) {
+	list_for_each_entry(vma, &eb->vmas, exec_list) {
 		struct drm_i915_gem_object *obj = vma->obj;
 
 		if (obj->flags & other_rings) {
-			ret = i915_gem_object_sync(obj, req);
+			ret = i915_gem_object_sync(obj, eb->request);
 			if (ret)
 				return ret;
 		}
@@ -1063,13 +1036,13 @@ i915_gem_execbuffer_move_to_gpu(struct drm_i915_gem_request *req,
 	}
 
 	if (flush_chipset)
-		i915_gem_chipset_flush(req->engine->dev);
+		i915_gem_chipset_flush(eb->i915->dev);
 
 	/* Make sure (untracked) CPU relocs/parsing are flushed */
 	wmb();
 
 	/* Unconditionally invalidate gpu caches and TLBs. */
-	return req->engine->emit_flush(req, I915_GEM_GPU_DOMAINS, 0);
+	return eb->engine->emit_flush(eb->request, I915_GEM_GPU_DOMAINS, 0);
 }
 
 static bool
@@ -1168,24 +1141,26 @@ validate_exec_list(struct drm_device *dev,
 	return 0;
 }
 
-static struct intel_context *
-i915_gem_validate_context(struct drm_device *dev, struct drm_file *file,
-			  struct intel_engine_cs *ring, const u32 ctx_id)
+static int eb_select_context(struct i915_execbuffer *eb)
 {
 	struct intel_context *ctx;
-	struct i915_ctx_hang_stats *hs;
+	unsigned ctx_id;
 
-	ctx = i915_gem_context_get(file->driver_priv, ctx_id);
-	if (IS_ERR(ctx))
-		return ctx;
+	ctx_id = i915_execbuffer2_get_context_id(*eb->args);
+	ctx = i915_gem_context_get(eb->file->driver_priv, ctx_id);
+	if (unlikely(IS_ERR(ctx)))
+		return PTR_ERR(ctx);
 
-	hs = &ctx->hang_stats;
-	if (hs->banned) {
+	if (unlikely(ctx->hang_stats.banned)) {
 		DRM_DEBUG("Context %u tried to submit while banned\n", ctx_id);
-		return ERR_PTR(-EIO);
+		return -EIO;
 	}
 
-	return ctx;
+	eb->ctx = ctx;
+	i915_gem_context_reference(ctx);
+	eb->vm = ctx->ppgtt ? &ctx->ppgtt->base : &eb->i915->gtt.base;
+
+	return 0;
 }
 
 void i915_vma_move_to_active(struct i915_vma *vma,
@@ -1225,12 +1200,11 @@ void i915_vma_move_to_active(struct i915_vma *vma,
 }
 
 static void
-i915_gem_execbuffer_move_to_active(struct list_head *vmas,
-				   struct drm_i915_gem_request *req)
+eb_move_to_active(struct i915_execbuffer *eb)
 {
 	struct i915_vma *vma;
 
-	list_for_each_entry(vma, vmas, exec_list) {
+	list_for_each_entry(vma, &eb->vmas, exec_list) {
 		struct drm_i915_gem_object *obj = vma->obj;
 		u32 old_read = obj->base.read_domains;
 		u32 old_write = obj->base.write_domain;
@@ -1242,7 +1216,7 @@ i915_gem_execbuffer_move_to_active(struct list_head *vmas,
 			obj->base.pending_read_domains |= obj->base.read_domains;
 		obj->base.read_domains = obj->base.pending_read_domains;
 
-		i915_vma_move_to_active(vma, req, vma->exec_entry->flags);
+		i915_vma_move_to_active(vma, eb->request, vma->exec_entry->flags);
 		trace_i915_gem_object_change_domain(obj, old_read, old_write);
 	}
 }
@@ -1274,28 +1248,24 @@ i915_reset_gen7_sol_offsets(struct drm_i915_gem_request *req)
 }
 
 static struct i915_vma *
-i915_gem_execbuffer_parse(struct intel_engine_cs *ring,
-			  struct drm_i915_gem_exec_object2 *shadow_exec_entry,
-			  struct drm_i915_gem_object *batch_obj,
-			  struct eb_vmas *eb,
-			  u32 batch_start_offset,
-			  u32 batch_len,
-			  bool is_master)
+eb_parse(struct i915_execbuffer *eb,
+	 struct drm_i915_gem_exec_object2 *shadow_exec_entry,
+	 bool is_master)
 {
 	struct drm_i915_gem_object *shadow_batch_obj;
 	struct i915_vma *vma;
 	int ret;
 
-	shadow_batch_obj = i915_gem_batch_pool_get(&ring->batch_pool,
-						   PAGE_ALIGN(batch_len));
+	shadow_batch_obj = i915_gem_batch_pool_get(&eb->engine->batch_pool,
+						   PAGE_ALIGN(eb->args->batch_len));
 	if (IS_ERR(shadow_batch_obj))
 		return ERR_CAST(shadow_batch_obj);
 
-	ret = i915_parse_cmds(ring,
-			      batch_obj,
+	ret = i915_parse_cmds(eb->engine,
+			      eb->batch_vma->obj,
 			      shadow_batch_obj,
-			      batch_start_offset,
-			      batch_len,
+			      eb->args->batch_start_offset,
+			      eb->args->batch_len,
 			      is_master);
 	if (ret) {
 		if (ret == -EACCES) /* unhandled chained batch */
@@ -1331,32 +1301,29 @@ add_to_client(struct drm_i915_gem_request *req,
 }
 
 static int
-execbuf_submit(struct i915_execbuffer_params *params,
-	       struct drm_i915_gem_execbuffer2 *args,
-	       struct list_head *vmas)
+execbuf_submit(struct i915_execbuffer *eb)
 {
-	struct intel_ring *ring = params->request->ring;
-	struct drm_i915_private *dev_priv = params->request->i915;
-	u64 exec_start, exec_len;
+	struct intel_ring *ring = eb->request->ring;
+	struct drm_i915_private *dev_priv = eb->i915;
 	int instp_mode;
 	u32 instp_mask;
 	int ret;
 
-	ret = i915_gem_execbuffer_move_to_gpu(params->request, vmas);
+	ret = eb_move_to_gpu(eb);
 	if (ret)
 		return ret;
 
-	ret = i915_switch_context(params->request);
+	ret = i915_switch_context(eb->request);
 	if (ret)
 		return ret;
 
-	instp_mode = args->flags & I915_EXEC_CONSTANTS_MASK;
+	instp_mode = eb->args->flags & I915_EXEC_CONSTANTS_MASK;
 	instp_mask = I915_EXEC_CONSTANTS_MASK;
 	switch (instp_mode) {
 	case I915_EXEC_CONSTANTS_REL_GENERAL:
 	case I915_EXEC_CONSTANTS_ABSOLUTE:
 	case I915_EXEC_CONSTANTS_REL_SURFACE:
-		if (instp_mode != 0 && params->ring->id != RCS) {
+		if (instp_mode != 0 && eb->engine->id != RCS) {
 			DRM_DEBUG("non-0 rel constants mode on non-RCS\n");
 			return -EINVAL;
 		}
@@ -1383,9 +1350,9 @@ execbuf_submit(struct i915_execbuffer_params *params,
 		return -EINVAL;
 	}
 
-	if (params->ring->id == RCS &&
+	if (eb->engine->id == RCS &&
 	    instp_mode != dev_priv->relative_constants_mode) {
-		ret = intel_ring_begin(params->request, 4);
+		ret = intel_ring_begin(eb->request, 4);
 		if (ret)
 			return ret;
 
@@ -1398,26 +1365,24 @@ execbuf_submit(struct i915_execbuffer_params *params,
 		dev_priv->relative_constants_mode = instp_mode;
 	}
 
-	if (args->flags & I915_EXEC_GEN7_SOL_RESET) {
-		ret = i915_reset_gen7_sol_offsets(params->request);
+	if (eb->args->flags & I915_EXEC_GEN7_SOL_RESET) {
+		ret = i915_reset_gen7_sol_offsets(eb->request);
 		if (ret)
 			return ret;
 	}
 
-	exec_len   = args->batch_len;
-	exec_start = params->batch_vma->node.start +
-		     params->args_batch_start_offset;
-
-	ret = params->ring->emit_bb_start(params->request,
-					  exec_start, exec_len,
-					  params->dispatch_flags);
+	ret = eb->engine->emit_bb_start(eb->request,
+					eb->batch_vma->node.start +
+					eb->batch_start_offset,
+					eb->args->batch_len,
+					eb->dispatch_flags);
 	if (ret)
 		return ret;
 
-	trace_i915_gem_ring_dispatch(params->request, params->dispatch_flags);
+	trace_i915_gem_ring_dispatch(eb->request, eb->dispatch_flags);
 
-	i915_gem_execbuffer_move_to_active(vmas, params->request);
-	add_to_client(params->request, params->file);
+	eb_move_to_active(eb);
+	add_to_client(eb->request, eb->file);
 
 	return 0;
 }
@@ -1426,19 +1391,18 @@ execbuf_submit(struct i915_execbuffer_params *params,
  * Find one BSD ring to dispatch the corresponding BSD command.
  * The Ring ID is returned.
  */
-static int gen8_dispatch_bsd_ring(struct drm_device *dev,
-				  struct drm_file *file)
+static struct intel_engine_cs *
+gen8_dispatch_bsd_ring(struct drm_device *dev,
+		       struct drm_file *file)
 {
-	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_i915_file_private *file_priv = file->driver_priv;
 
 	/* Check whether the file_priv is using one ring */
-	if (file_priv->bsd_ring)
-		return file_priv->bsd_ring->id;
-	else {
-		/* If no, use the ping-pong mechanism to select one ring */
+	if (file_priv->bsd_ring == NULL) {
+		struct drm_i915_private *dev_priv = dev->dev_private;
 		int ring_id;
 
+		/* If not, use the ping-pong mechanism to select one ring */
 		mutex_lock(&dev->struct_mutex);
 		if (dev_priv->mm.bsd_ring_dispatch_index == 0) {
 			ring_id = VCS;
@@ -1449,28 +1413,20 @@ static int gen8_dispatch_bsd_ring(struct drm_device *dev,
 		}
 		file_priv->bsd_ring = &dev_priv->ring[ring_id];
 		mutex_unlock(&dev->struct_mutex);
-		return ring_id;
 	}
+
+	return file_priv->bsd_ring;
 }
 
 static int
-i915_gem_do_execbuffer(struct drm_device *dev, void *data,
+i915_gem_do_execbuffer(struct drm_device *dev,
 		       struct drm_file *file,
 		       struct drm_i915_gem_execbuffer2 *args,
 		       struct drm_i915_gem_exec_object2 *exec)
 {
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct eb_vmas *eb;
+	struct i915_execbuffer eb;
 	struct drm_i915_gem_exec_object2 shadow_exec_entry;
-	struct intel_engine_cs *ring;
-	struct intel_context *ctx;
-	struct i915_address_space *vm;
-	struct i915_execbuffer_params params_master; /* XXX: will be removed later */
-	struct i915_execbuffer_params *params = &params_master;
-	const u32 ctx_id = i915_execbuffer2_get_context_id(*args);
-	u32 dispatch_flags;
 	int ret;
-	bool need_relocs;
 
 	if (!i915_gem_check_execbuffer(args))
 		return -EINVAL;
@@ -1479,15 +1435,15 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	if (ret)
 		return ret;
 
-	dispatch_flags = 0;
+	eb.dispatch_flags = 0;
 	if (args->flags & I915_EXEC_SECURE) {
 		if (!file->is_master || !capable(CAP_SYS_ADMIN))
 		    return -EPERM;
 
-		dispatch_flags |= I915_DISPATCH_SECURE;
+		eb.dispatch_flags |= I915_DISPATCH_SECURE;
 	}
 	if (args->flags & I915_EXEC_IS_PINNED)
-		dispatch_flags |= I915_DISPATCH_PINNED;
+		eb.dispatch_flags |= I915_DISPATCH_PINNED;
 
 	if ((args->flags & I915_EXEC_RING_MASK) > LAST_USER_RING) {
 		DRM_DEBUG("execbuf with unknown ring: %d\n",
@@ -1502,22 +1458,26 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		return -EINVAL;
 	} 
 
+	eb.i915 = to_i915(dev);
+	eb.file = file;
+	eb.args = args;
+	eb.exec = exec;
+	eb.need_relocs = (args->flags & I915_EXEC_NO_RELOC) == 0;
+	reloc_cache_init(&eb.reloc_cache, eb.i915);
+
 	if ((args->flags & I915_EXEC_RING_MASK) == I915_EXEC_DEFAULT)
-		ring = &dev_priv->ring[RCS];
+		eb.engine = &eb.i915->ring[RCS];
 	else if ((args->flags & I915_EXEC_RING_MASK) == I915_EXEC_BSD) {
 		if (HAS_BSD2(dev)) {
-			int ring_id;
-
 			switch (args->flags & I915_EXEC_BSD_MASK) {
 			case I915_EXEC_BSD_DEFAULT:
-				ring_id = gen8_dispatch_bsd_ring(dev, file);
-				ring = &dev_priv->ring[ring_id];
+				eb.engine = gen8_dispatch_bsd_ring(dev, file);
 				break;
 			case I915_EXEC_BSD_RING1:
-				ring = &dev_priv->ring[VCS];
+				eb.engine = &eb.i915->ring[VCS];
 				break;
 			case I915_EXEC_BSD_RING2:
-				ring = &dev_priv->ring[VCS2];
+				eb.engine = &eb.i915->ring[VCS2];
 				break;
 			default:
 				DRM_DEBUG("execbuf with unknown bsd ring: %d\n",
@@ -1525,33 +1485,28 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 				return -EINVAL;
 			}
 		} else
-			ring = &dev_priv->ring[VCS];
+			eb.engine = &eb.i915->ring[VCS];
 	} else
-		ring = &dev_priv->ring[(args->flags & I915_EXEC_RING_MASK) - 1];
+		eb.engine = &eb.i915->ring[(args->flags & I915_EXEC_RING_MASK) - 1];
 
-	if (!intel_engine_initialized(ring)) {
+	if (!intel_engine_initialized(eb.engine)) {
 		DRM_DEBUG("execbuf with invalid ring: %d\n",
 			  (int)(args->flags & I915_EXEC_RING_MASK));
 		return -EINVAL;
 	}
 
-	if (args->buffer_count < 1) {
-		DRM_DEBUG("execbuf with %d buffers\n", args->buffer_count);
-		return -EINVAL;
-	}
-
 	if (args->flags & I915_EXEC_RESOURCE_STREAMER) {
-		if (!HAS_RESOURCE_STREAMER(dev)) {
+		if (!HAS_RESOURCE_STREAMER(eb.i915)) {
 			DRM_DEBUG("RS is only allowed for Haswell, Gen8 and above\n");
 			return -EINVAL;
 		}
-		if (ring->id != RCS) {
+		if (eb.engine->id != RCS) {
 			DRM_DEBUG("RS is not available on %s\n",
-				 ring->name);
+				 eb.engine->name);
 			return -EINVAL;
 		}
 
-		dispatch_flags |= I915_DISPATCH_RS;
+		eb.dispatch_flags |= I915_DISPATCH_RS;
 	}
 
 	/* Take a local wakeref for preparing to dispatch the execbuf as
@@ -1560,57 +1515,44 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	 * wakeref that we hold until the GPU has been idle for at least
 	 * 100ms.
 	 */
-	intel_runtime_pm_get(dev_priv);
+	intel_runtime_pm_get(eb.i915);
 
 	ret = i915_mutex_lock_interruptible(dev);
 	if (ret)
 		goto pre_mutex_err;
 
-	ctx = i915_gem_validate_context(dev, file, ring, ctx_id);
-	if (IS_ERR(ctx)) {
+	ret = eb_select_context(&eb);
+	if (ret) {
 		mutex_unlock(&dev->struct_mutex);
-		ret = PTR_ERR(ctx);
 		goto pre_mutex_err;
 	}
 
-	i915_gem_context_reference(ctx);
-
-	if (ctx->ppgtt)
-		vm = &ctx->ppgtt->base;
-	else
-		vm = &dev_priv->gtt.base;
-
-	memset(&params_master, 0x00, sizeof(params_master));
-
-	eb = eb_create(dev_priv, args);
-	if (eb == NULL) {
-		i915_gem_context_unreference(ctx);
+	if (eb_create(&eb)) {
+		i915_gem_context_unreference(eb.ctx);
 		mutex_unlock(&dev->struct_mutex);
 		ret = -ENOMEM;
 		goto pre_mutex_err;
 	}
 
 	/* Look up object handles */
-	ret = eb_lookup_vmas(eb, exec, args, vm, file);
+	ret = eb_lookup_vmas(&eb);
 	if (ret)
 		goto err;
 
 	/* take note of the batch buffer before we might reorder the lists */
-	params->batch_vma = eb_get_batch(eb);
+	eb.batch_vma = eb_get_batch(&eb);
 
 	/* Move the objects en-masse into the GTT, evicting if necessary. */
-	need_relocs = (args->flags & I915_EXEC_NO_RELOC) == 0;
-	ret = i915_gem_execbuffer_reserve(ring, &eb->vmas, ctx, &need_relocs);
+	ret = eb_reserve(&eb);
 	if (ret)
 		goto err;
 
 	/* The objects are in their final locations, apply the relocations. */
-	if (need_relocs)
-		ret = i915_gem_execbuffer_relocate(eb);
+	if (eb.need_relocs)
+		ret = eb_relocate(&eb);
 	if (ret) {
 		if (ret == -EFAULT) {
-			ret = i915_gem_execbuffer_relocate_slow(dev, args, file, ring,
-								eb, exec, ctx);
+			ret = eb_relocate_slow(&eb);
 			BUG_ON(!mutex_is_locked(&dev->struct_mutex));
 		}
 		if (ret)
@@ -1618,22 +1560,17 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	}
 
 	/* Set the pending read domains for the batch buffer to COMMAND */
-	if (params->batch_vma->obj->base.pending_write_domain) {
+	if (eb.batch_vma->obj->base.pending_write_domain) {
 		DRM_DEBUG("Attempting to use self-modifying batch buffer\n");
 		ret = -EINVAL;
 		goto err;
 	}
 
-	params->args_batch_start_offset = args->batch_start_offset;
-	if (intel_engine_needs_cmd_parser(ring) && args->batch_len) {
+	eb.batch_start_offset = args->batch_start_offset;
+	if (intel_engine_needs_cmd_parser(eb.engine) && args->batch_len) {
 		struct i915_vma *vma;
 
-		vma = i915_gem_execbuffer_parse(ring, &shadow_exec_entry,
-						params->batch_vma->obj,
-						eb,
-						args->batch_start_offset,
-						args->batch_len,
-						file->is_master);
+		vma = eb_parse(&eb, &shadow_exec_entry, file->is_master);
 		if (IS_ERR(vma)) {
 			ret = PTR_ERR(vma);
 			goto err;
@@ -1649,19 +1586,19 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 			 * specifically don't want that set on batches the
 			 * command parser has accepted.
 			 */
-			dispatch_flags |= I915_DISPATCH_SECURE;
-			params->args_batch_start_offset = 0;
-			params->batch_vma = vma;
+			eb.dispatch_flags |= I915_DISPATCH_SECURE;
+			eb.batch_start_offset = 0;
+			eb.batch_vma = vma;
 		}
 	}
 
-	params->batch_vma->obj->base.pending_read_domains |= I915_GEM_DOMAIN_COMMAND;
+	eb.batch_vma->obj->base.pending_read_domains |= I915_GEM_DOMAIN_COMMAND;
 
 	/* snb/ivb/vlv conflate the "batch in ppgtt" bit with the "non-secure
 	 * batch" bit. Hence we need to pin secure batches into the global gtt.
 	 * hsw should have this fixed, but bdw mucks it up again. */
-	if (dispatch_flags & I915_DISPATCH_SECURE) {
-		struct drm_i915_gem_object *obj = params->batch_vma->obj;
+	if (eb.dispatch_flags & I915_DISPATCH_SECURE) {
+		struct drm_i915_gem_object *obj = eb.batch_vma->obj;
 		struct i915_vma *vma;
 
 		/*
@@ -1680,13 +1617,13 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 			goto err;
 		}
 
-		params->batch_vma = vma;
+		eb.batch_vma = vma;
 	}
 
 	/* Allocate a request for this batch buffer nice and early. */
-	params->request = i915_gem_request_alloc(ring, ctx);
-	if (IS_ERR(params->request)) {
-		ret = PTR_ERR(params->request);
+	eb.request = i915_gem_request_alloc(eb.engine, eb.ctx);
+	if (IS_ERR(eb.request)) {
+		ret = PTR_ERR(eb.request);
 		goto err_batch_unpin;
 	}
 
@@ -1696,22 +1633,10 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	 * inactive_list and lose its active reference. Hence we do not need
 	 * to explicitly hold another reference here.
 	 */
-	params->request->batch = params->batch_vma;
+	eb.request->batch = eb.batch_vma;
 
-	/*
-	 * Save assorted stuff away to pass through to *_submission().
-	 * NB: This data should be 'persistent' and not local as it will
-	 * kept around beyond the duration of the IOCTL once the GPU
-	 * scheduler arrives.
-	 */
-	params->dev                     = dev;
-	params->file                    = file;
-	params->ring                    = ring;
-	params->dispatch_flags          = dispatch_flags;
-	params->ctx                     = ctx;
-
-	ret = execbuf_submit(params, args, &eb->vmas);
-	__i915_add_request(params->request, ret == 0);
+	ret = execbuf_submit(&eb);
+	__i915_add_request(eb.request, ret == 0);
 
 err_batch_unpin:
 	/*
@@ -1720,19 +1645,17 @@ err_batch_unpin:
 	 * needs to be adjusted to also track the ggtt batch vma properly as
 	 * active.
 	 */
-	if (dispatch_flags & I915_DISPATCH_SECURE)
-		i915_vma_unpin(params->batch_vma);
+	if (eb.dispatch_flags & I915_DISPATCH_SECURE)
+		i915_vma_unpin(eb.batch_vma);
 err:
 	/* the request owns the ref now */
-	i915_gem_context_unreference(ctx);
-	eb_destroy(eb);
-
+	eb_destroy(&eb);
 	mutex_unlock(&dev->struct_mutex);
 
 pre_mutex_err:
 	/* intel_gpu_busy should also get a ref, so it will free when the device
 	 * is really idle. */
-	intel_runtime_pm_put(dev_priv);
+	intel_runtime_pm_put(eb.i915);
 	return ret;
 }
 
@@ -1799,7 +1722,7 @@ i915_gem_execbuffer(struct drm_device *dev, void *data,
 	exec2.flags = I915_EXEC_RENDER;
 	i915_execbuffer2_set_context_id(exec2, 0);
 
-	ret = i915_gem_do_execbuffer(dev, data, file, &exec2, exec2_list);
+	ret = i915_gem_do_execbuffer(dev, file, &exec2, exec2_list);
 	if (!ret) {
 		struct drm_i915_gem_exec_object __user *user_exec_list =
 			to_user_ptr(args->buffers_ptr);
@@ -1863,7 +1786,7 @@ i915_gem_execbuffer2(struct drm_device *dev, void *data,
 		return -EFAULT;
 	}
 
-	ret = i915_gem_do_execbuffer(dev, data, file, args, exec2_list);
+	ret = i915_gem_do_execbuffer(dev, file, args, exec2_list);
 	if (!ret) {
 		/* Copy the new buffer offsets back to the user's exec list. */
 		struct drm_i915_gem_exec_object2 __user *user_exec_list =
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 168/190] drm/i915: Skip holding context reference for duration of execbuffer call
  2016-01-11 11:00 ` [PATCH 142/190] drm/i915: Fallback to using unmappable memory for scanout Chris Wilson
                     ` (24 preceding siblings ...)
  2016-01-11 11:01   ` [PATCH 167/190] drm/i915: Amalgamate execbuffer parameter structures Chris Wilson
@ 2016-01-11 11:01   ` Chris Wilson
  2016-01-11 11:01   ` [PATCH 169/190] drm/i915: Use vma->exec_entry as our double-entry placeholder Chris Wilson
                     ` (21 subsequent siblings)
  47 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 11:01 UTC (permalink / raw)
  To: intel-gfx

Since the context can only be referenced and unreferenced whilst holding
the BKL, we can safely forgo holding a reference on the context for the
duration of our lock inside the execbuffer. After dropping the lock for
the slow path, we then need to take care to reacquire the context, which
has the benefit of rechecking whether we were gazumped and the GPU is
now wedged.
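
As a userspace sketch of the pattern (the toy context table, helper
names and error codes below are illustrative, not the driver's API):
the lookup is only valid while the lock is held, so any path that drops
the lock must repeat the lookup afterwards.

  #include <errno.h>
  #include <pthread.h>
  #include <stdbool.h>
  #include <stddef.h>

  struct ctx { bool banned; };

  static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
  static struct ctx contexts[16];

  /* Valid only while big_lock is held; no reference is taken. */
  static struct ctx *select_ctx(unsigned int id)
  {
          if (id >= 16 || contexts[id].banned)
                  return NULL;
          return &contexts[id];
  }

  static int submit(unsigned int id, bool need_slow_path)
  {
          struct ctx *c;
          int ret = 0;

          pthread_mutex_lock(&big_lock);
          c = select_ctx(id);
          if (!c) {
                  ret = -ENOENT;
                  goto unlock;
          }

          /* ... fast-path work using c under the lock ... */

          if (need_slow_path) {
                  pthread_mutex_unlock(&big_lock);
                  /* ... slow work without the lock; c is now stale ... */
                  pthread_mutex_lock(&big_lock);
                  c = select_ctx(id);     /* revalidate: may be banned now */
                  if (!c)
                          ret = -EIO;
          }
  unlock:
          pthread_mutex_unlock(&big_lock);
          return ret;
  }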

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 50 +++++++++++++++---------------
 1 file changed, 25 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 2a535eb35dff..7d758610a095 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -262,8 +262,6 @@ eb_unreserve_vma(struct i915_vma *vma)
 
 static void eb_destroy(struct i915_execbuffer *eb)
 {
-	i915_gem_context_unreference(eb->ctx);
-
 	while (!list_empty(&eb->vmas)) {
 		struct i915_vma *vma;
 
@@ -906,6 +904,27 @@ err:
 	} while (1);
 }
 
+static int eb_select_context(struct i915_execbuffer *eb)
+{
+	struct intel_context *ctx;
+	unsigned ctx_id;
+
+	ctx_id = i915_execbuffer2_get_context_id(*eb->args);
+	ctx = i915_gem_context_get(eb->file->driver_priv, ctx_id);
+	if (unlikely(IS_ERR(ctx)))
+		return PTR_ERR(ctx);
+
+	if (unlikely(ctx->hang_stats.banned)) {
+		DRM_DEBUG("Context %u tried to submit while banned\n", ctx_id);
+		return -EIO;
+	}
+
+	eb->ctx = ctx;
+	eb->vm = ctx->ppgtt ? &ctx->ppgtt->base : &eb->i915->gtt.base;
+
+	return 0;
+}
+
 static int
 eb_relocate_slow(struct i915_execbuffer *eb)
 {
@@ -982,6 +1001,10 @@ eb_relocate_slow(struct i915_execbuffer *eb)
 		goto err;
 	}
 
+	ret = eb_select_context(eb);
+	if (ret)
+		goto err;
+
 	/* reacquire the objects */
 	eb_reset(eb);
 	ret = eb_lookup_vmas(eb);
@@ -1141,28 +1164,6 @@ validate_exec_list(struct drm_device *dev,
 	return 0;
 }
 
-static int eb_select_context(struct i915_execbuffer *eb)
-{
-	struct intel_context *ctx;
-	unsigned ctx_id;
-
-	ctx_id = i915_execbuffer2_get_context_id(*eb->args);
-	ctx = i915_gem_context_get(eb->file->driver_priv, ctx_id);
-	if (unlikely(IS_ERR(ctx)))
-		return PTR_ERR(ctx);
-
-	if (unlikely(ctx->hang_stats.banned)) {
-		DRM_DEBUG("Context %u tried to submit while banned\n", ctx_id);
-		return -EIO;
-	}
-
-	eb->ctx = ctx;
-	i915_gem_context_reference(ctx);
-	eb->vm = ctx->ppgtt ? &ctx->ppgtt->base : &eb->i915->gtt.base;
-
-	return 0;
-}
-
 void i915_vma_move_to_active(struct i915_vma *vma,
 			     struct drm_i915_gem_request *req,
 			     unsigned flags)
@@ -1528,7 +1529,6 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	}
 
 	if (eb_create(&eb)) {
-		i915_gem_context_unreference(eb.ctx);
 		mutex_unlock(&dev->struct_mutex);
 		ret = -ENOMEM;
 		goto pre_mutex_err;
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 169/190] drm/i915: Use vma->exec_entry as our double-entry placeholder
  2016-01-11 11:00 ` [PATCH 142/190] drm/i915: Fallback to using unmappable memory for scanout Chris Wilson
                     ` (25 preceding siblings ...)
  2016-01-11 11:01   ` [PATCH 168/190] drm/i915: Skip holding context reference for duration of execbuffer call Chris Wilson
@ 2016-01-11 11:01   ` Chris Wilson
  2016-01-11 11:01   ` [PATCH 170/190] drm/i915: Store a direct lookup from object handle to vma Chris Wilson
                     ` (20 subsequent siblings)
  47 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 11:01 UTC (permalink / raw)
  To: intel-gfx

This has the benefit of not requiring us to manipulate the
vma->exec_link list when tearing down the execbuffer, and gives us a
marginally cheaper test for detecting the user error of listing the
same object twice.
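
A minimal sketch of the placeholder trick with simplified types (these
are not the driver's real structures): membership in the execbuf is
encoded by whether vma->exec_entry is set, so both the duplicate-handle
check and the teardown reduce to plain pointer operations.

  #include <stdbool.h>
  #include <stddef.h>
  #include <stdint.h>

  struct exec_entry { uint32_t handle; uint64_t flags; };

  struct vma {
          /* non-NULL only while the vma is in the current execbuf */
          struct exec_entry *exec_entry;
  };

  /* Listing the same object twice is the user error to catch. */
  static bool eb_add(struct vma *vma, struct exec_entry *entry)
  {
          if (vma->exec_entry)    /* cheaper than a list_empty() test */
                  return false;
          vma->exec_entry = entry;
          return true;
  }

  /* Teardown clears the placeholder; no list manipulation needed. */
  static void eb_teardown(struct vma **vmas, size_t count)
  {
          size_t i;

          for (i = 0; i < count; i++)
                  vmas[i]->exec_entry = NULL;
  }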

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_evict.c      | 23 +++-------
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 73 ++++++++++++++++--------------
 drivers/gpu/drm/i915/i915_gem_gtt.c        |  1 -
 3 files changed, 46 insertions(+), 51 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index b48839fc2996..d40bcb81c922 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -69,9 +69,6 @@ mark_free(struct i915_vma *vma, unsigned flags, struct list_head *unwind)
 	if (vma->pin_count)
 		return false;
 
-	if (WARN_ON(!list_empty(&vma->exec_list)))
-		return false;
-
 	if (flags & PIN_NOFAULT && vma->obj->fault_mappable)
 		return false;
 
@@ -161,7 +158,7 @@ search_again:
 		ret = drm_mm_scan_remove_block(&vma->node);
 		BUG_ON(ret);
 
-		list_del_init(&vma->exec_list);
+		list_del(&vma->exec_list);
 	}
 
 	/* Can we unpin some objects such as idle hw contents,
@@ -208,22 +205,16 @@ found:
 		if (drm_mm_scan_remove_block(&vma->node))
 			drm_gem_object_reference(&vma->obj->base);
 		else
-			list_del_init(&vma->exec_list);
+			list_del(&vma->exec_list);
 	}
 
 	/* Unbinding will emit any required flushes */
-	while (!list_empty(&eviction_list)) {
-		struct drm_gem_object *obj;
-		vma = list_first_entry(&eviction_list,
-				       struct i915_vma,
-				       exec_list);
-
-		obj =  &vma->obj->base;
-		list_del_init(&vma->exec_list);
+	ret = 0;
+	list_for_each_entry_safe(vma, next, &eviction_list, exec_list) {
+		struct drm_i915_gem_object *obj = vma->obj;
 		if (ret == 0)
 			ret = i915_vma_unbind(vma);
-
-		drm_gem_object_unreference(obj);
+		drm_gem_object_unreference(&obj->base);
 	}
 	return ret;
 }
@@ -277,7 +268,7 @@ i915_gem_evict_for_vma(struct i915_vma *target, unsigned flags)
 	ret = 0;
 	list_for_each_entry_safe(vma, next, &eviction_list, exec_list) {
 		struct drm_i915_gem_object *obj = vma->obj;
-		list_del_init(&vma->exec_list);
+		list_del(&vma->exec_list);
 		if (ret == 0)
 			ret = i915_vma_unbind(vma);
 		drm_gem_object_unreference(&obj->base);
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 7d758610a095..af55d56ec00a 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -95,13 +95,38 @@ eb_create(struct i915_execbuffer *eb)
 	} else
 		eb->and = -eb->args->buffer_count;
 
-	INIT_LIST_HEAD(&eb->vmas);
 	return 0;
 }
 
+static inline void
+__eb_unreserve_vma(struct i915_vma *vma,
+		   const struct drm_i915_gem_exec_object2 *entry)
+{
+	if (unlikely(entry->flags & __EXEC_OBJECT_HAS_FENCE))
+		i915_vma_unpin_fence(vma);
+
+	if (entry->flags & __EXEC_OBJECT_HAS_PIN)
+		__i915_vma_unpin(vma);
+}
+
+static void
+eb_unreserve_vma(struct i915_vma *vma)
+{
+	struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
+	__eb_unreserve_vma(vma, entry);
+	entry->flags &= ~(__EXEC_OBJECT_HAS_FENCE | __EXEC_OBJECT_HAS_PIN);
+}
+
 static void
 eb_reset(struct i915_execbuffer *eb)
 {
+	struct i915_vma *vma;
+
+	list_for_each_entry(vma, &eb->vmas, exec_list) {
+		eb_unreserve_vma(vma);
+		vma->exec_entry = NULL;
+	}
+
 	if (eb->and >= 0)
 		memset(eb->buckets, 0, (eb->and+1)*sizeof(struct hlist_head));
 }
@@ -133,6 +158,8 @@ eb_lookup_vmas(struct i915_execbuffer *eb)
 	struct list_head objects;
 	int i, ret;
 
+	INIT_LIST_HEAD(&eb->vmas);
+
 	INIT_LIST_HEAD(&objects);
 	spin_lock(&eb->file->table_lock);
 	/* Grab a reference to the object and release the lock so we can lookup
@@ -241,36 +268,20 @@ static struct i915_vma *eb_get_vma(struct i915_execbuffer *eb, unsigned long han
 	}
 }
 
-static void
-eb_unreserve_vma(struct i915_vma *vma)
-{
-	struct drm_i915_gem_exec_object2 *entry;
-
-	if (!drm_mm_node_allocated(&vma->node))
-		return;
-
-	entry = vma->exec_entry;
-
-	if (entry->flags & __EXEC_OBJECT_HAS_FENCE)
-		i915_vma_unpin_fence(vma);
-
-	if (entry->flags & __EXEC_OBJECT_HAS_PIN)
-		__i915_vma_unpin(vma);
-
-	entry->flags &= ~(__EXEC_OBJECT_HAS_FENCE | __EXEC_OBJECT_HAS_PIN);
-}
-
 static void eb_destroy(struct i915_execbuffer *eb)
 {
-	while (!list_empty(&eb->vmas)) {
-		struct i915_vma *vma;
+	struct i915_vma *vma;
 
-		vma = list_first_entry(&eb->vmas,
-				       struct i915_vma,
-				       exec_list);
-		list_del_init(&vma->exec_list);
-		eb_unreserve_vma(vma);
+	list_for_each_entry(vma, &eb->vmas, exec_list) {
+		if (vma->exec_entry == NULL)
+			continue;
+
+		__eb_unreserve_vma(vma, vma->exec_entry);
+		vma->exec_entry = NULL;
 	}
+
+	if (eb->buckets)
+		kfree(eb->buckets);
 }
 
 static inline int use_cpu_reloc(struct drm_i915_gem_object *obj)
@@ -936,12 +947,7 @@ eb_relocate_slow(struct i915_execbuffer *eb)
 	int i, total, ret;
 
 	/* We may process another execbuffer during the unlock... */
-	while (!list_empty(&eb->vmas)) {
-		vma = list_first_entry(&eb->vmas, struct i915_vma, exec_list);
-		list_del_init(&vma->exec_list);
-		eb_unreserve_vma(vma);
-	}
-
+	eb_reset(eb);
 	mutex_unlock(&dev->struct_mutex);
 
 	total = 0;
@@ -1006,7 +1012,6 @@ eb_relocate_slow(struct i915_execbuffer *eb)
 		goto err;
 
 	/* reacquire the objects */
-	eb_reset(eb);
 	ret = eb_lookup_vmas(eb);
 	if (ret)
 		goto err;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index faee28c807f2..cb3a6e272e22 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -3254,7 +3254,6 @@ __i915_gem_vma_create(struct drm_i915_gem_object *obj,
 	if (vma == NULL)
 		return ERR_PTR(-ENOMEM);
 
-	INIT_LIST_HEAD(&vma->exec_list);
 	for (i = 0; i < ARRAY_SIZE(vma->last_read); i++)
 		init_request_active(&vma->last_read[i], i915_vma_retire);
 	init_request_active(&vma->last_fence, i915_vma_retire__fence);
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 170/190] drm/i915: Store a direct lookup from object handle to vma
  2016-01-11 11:00 ` [PATCH 142/190] drm/i915: Fallback to using unmappable memory for scanout Chris Wilson
                     ` (26 preceding siblings ...)
  2016-01-11 11:01   ` [PATCH 169/190] drm/i915: Use vma->exec_entry as our double-entry placeholder Chris Wilson
@ 2016-01-11 11:01   ` Chris Wilson
  2016-01-11 11:01   ` [PATCH 171/190] drm/i915: Pass vma to relocate entry Chris Wilson
                     ` (19 subsequent siblings)
  47 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 11:01 UTC (permalink / raw)
  To: intel-gfx

The advent of full-ppgtt led to an extra indirection between the object
and its binding. That extra indirection has a noticeable impact on how
fast we can convert from the user handles to our internal vma for
execbuffer. In order to bypass the extra indirection, we use a
resizeable hashtable to jump from the object to the per-ctx vma.
rhashtable was considered, but we don't need the online resizing feature
and the extra complexity proved to undermine its usefulness. Instead, we
simply reallocate the hashtable on demand in a background task and
serialize it before iterating.

In non-full-ppgtt modes, multiple files and multiple contexts can share
the same vma. This leads to having multiple possible handle->vma links,
so we only use the first to establish the fast path. The majority of
buffers are not shared and so we should still be able to realise
speedups with multiple clients.
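
A synchronous userspace sketch of the grow-on-demand table (the patch
itself defers the resize to a worker and flushes that work before
iterating; the helpers below are illustrative, not the kernel's):

  #include <stdint.h>
  #include <stdlib.h>

  struct node { uint32_t handle; struct node *next; };
  struct table { struct node **heads; unsigned int bits, count; };

  /* Multiplicative hash in the style of the kernel's hash_32(). */
  static unsigned int hash32(uint32_t v, unsigned int bits)
  {
          return (v * 0x61C88647u) >> (32 - bits);
  }

  static void grow(struct table *t)
  {
          unsigned int bits = t->bits + 1, i;
          struct node **heads = calloc(1u << bits, sizeof(*heads));

          if (!heads)
                  return; /* pretend it worked and retry later */

          /* rehash every chain into the larger table */
          for (i = 0; i < 1u << t->bits; i++) {
                  while (t->heads[i]) {
                          struct node *n = t->heads[i];
                          unsigned int h = hash32(n->handle, bits);

                          t->heads[i] = n->next;
                          n->next = heads[h];
                          heads[h] = n;
                  }
          }
          free(t->heads);
          t->heads = heads;
          t->bits = bits;
  }

  static void insert(struct table *t, struct node *n)
  {
          unsigned int h;

          /* grow when the load factor exceeds 3/4, as in the patch */
          if (4 * ++t->count > 3 * (1u << t->bits))
                  grow(t);

          h = hash32(n->handle, t->bits);
          n->next = t->heads[h];
          t->heads[h] = n;
  }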

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c        |  17 ++-
 drivers/gpu/drm/i915/i915_drv.h            |  13 +-
 drivers/gpu/drm/i915/i915_gem.c            |  30 +++-
 drivers/gpu/drm/i915/i915_gem_context.c    |  75 ++++++++++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 233 ++++++++++++++++-------------
 drivers/gpu/drm/i915/i915_gem_gtt.h        |   8 +-
 6 files changed, 264 insertions(+), 112 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 8a59630fe5fb..19b0d6a7680d 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -230,9 +230,9 @@ static int obj_rank_by_stolen(void *priv,
 			      struct list_head *A, struct list_head *B)
 {
 	struct drm_i915_gem_object *a =
-		container_of(A, struct drm_i915_gem_object, obj_exec_link);
+		container_of(A, struct drm_i915_gem_object, tmp_link);
 	struct drm_i915_gem_object *b =
-		container_of(B, struct drm_i915_gem_object, obj_exec_link);
+		container_of(B, struct drm_i915_gem_object, tmp_link);
 
 	if (a->stolen->start < b->stolen->start)
 		return -1;
@@ -260,7 +260,7 @@ static int i915_gem_stolen_list_info(struct seq_file *m, void *data)
 		if (obj->stolen == NULL)
 			continue;
 
-		list_add(&obj->obj_exec_link, &stolen);
+		list_add(&obj->tmp_link, &stolen);
 
 		total_obj_size += obj->base.size;
 		count++;
@@ -269,7 +269,7 @@ static int i915_gem_stolen_list_info(struct seq_file *m, void *data)
 		if (obj->stolen == NULL)
 			continue;
 
-		list_add(&obj->obj_exec_link, &stolen);
+		list_add(&obj->tmp_link, &stolen);
 
 		total_obj_size += obj->base.size;
 		count++;
@@ -277,11 +277,11 @@ static int i915_gem_stolen_list_info(struct seq_file *m, void *data)
 	list_sort(NULL, &stolen, obj_rank_by_stolen);
 	seq_puts(m, "Stolen:\n");
 	while (!list_empty(&stolen)) {
-		obj = list_first_entry(&stolen, typeof(*obj), obj_exec_link);
+		obj = list_first_entry(&stolen, typeof(*obj), tmp_link);
 		seq_puts(m, "   ");
 		describe_obj(m, obj);
 		seq_putc(m, '\n');
-		list_del_init(&obj->obj_exec_link);
+		list_del(&obj->tmp_link);
 	}
 	mutex_unlock(&dev->struct_mutex);
 
@@ -1973,6 +1973,11 @@ static int i915_context_status(struct seq_file *m, void *unused)
 			seq_putc(m, '\n');
 		}
 
+		seq_printf(m, "\tvma hashtable size=%u (actual %u), count=%u\n",
+			   ctx->vma_ht_size,
+			   1 << ctx->vma_ht_bits,
+			   ctx->vma_ht_count);
+
 		seq_putc(m, '\n');
 	}
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index d664a67cda7b..29e1d2ed8b05 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -49,7 +49,7 @@
 #include <drm/drm_legacy.h> /* for struct drm_dma_handle */
 #include <drm/drm_gem.h>
 #include <linux/backlight.h>
-#include <linux/hashtable.h>
+#include <linux/hash.h>
 #include <linux/intel-iommu.h>
 #include <linux/kref.h>
 #include <linux/pm_qos.h>
@@ -895,6 +895,12 @@ struct intel_context {
 
 	struct list_head link;
 
+	struct work_struct vma_ht_resize;
+	struct hlist_head *vma_ht;
+	unsigned vma_ht_bits;
+	unsigned vma_ht_size;
+	unsigned vma_ht_count;
+
 	bool closed:1;
 };
 
@@ -2026,15 +2032,14 @@ struct drm_i915_gem_object {
 
 	/** List of VMAs backed by this object */
 	struct list_head vma_list;
+	struct i915_vma *vma_hashed;
 
 	/** Stolen memory for this object, instead of being backed by shmem. */
 	struct drm_mm_node *stolen;
 	struct list_head global_list;
 
-	/** Used in execbuf to temporarily hold a ref */
-	struct list_head obj_exec_link;
-
 	struct list_head batch_pool_link;
+	struct list_head tmp_link;
 
 	unsigned long flags;
 	/**
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index e3d83e10918b..91f764e9dff2 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2414,11 +2414,36 @@ static bool i915_gem_object_flush_active(struct drm_i915_gem_object *obj)
 	return i915_gem_object_is_active(obj);
 }
 
+static void i915_vma_unlink_ctx(struct i915_vma *vma)
+{
+	struct intel_context *ctx = vma->ctx;
+
+	if (ctx->vma_ht_size & 1) {
+		cancel_work_sync(&ctx->vma_ht_resize);
+		ctx->vma_ht_size &= ~1;
+	}
+
+	__hlist_del(&vma->ctx_node);
+	ctx->vma_ht_count--;
+
+	if (4*ctx->vma_ht_count < ctx->vma_ht_size) {
+		ctx->vma_ht_size |= 1;
+		schedule_work(&ctx->vma_ht_resize);
+	}
+
+	if (vma->is_ggtt)
+		vma->obj->vma_hashed = NULL;
+	vma->ctx = NULL;
+}
+
 void i915_vma_close(struct i915_vma *vma)
 {
 	GEM_BUG_ON(vma->closed);
 	vma->closed = true;
 
+	if (vma->ctx)
+		i915_vma_unlink_ctx(vma);
+
 	list_del_init(&vma->obj_link);
 	if (!vma->active)
 		WARN_ON(i915_vma_unbind(vma));
@@ -2436,6 +2461,10 @@ void i915_gem_close_object(struct drm_gem_object *gem,
 		if (vma->vm->file == fpriv)
 			i915_vma_close(vma);
 
+	vma = obj->vma_hashed;
+	if (vma && vma->ctx->file_priv == fpriv)
+		i915_vma_unlink_ctx(vma);
+
 	if (i915_gem_object_flush_active(obj) &&
 	    !i915_gem_object_has_active_reference(obj)) {
 		i915_gem_object_set_active_reference(obj);
@@ -3699,7 +3728,6 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj,
 				    i915_gem_object_retire__read);
 	init_request_active(&obj->last_write,
 			    i915_gem_object_retire__write);
-	INIT_LIST_HEAD(&obj->obj_exec_link);
 	INIT_LIST_HEAD(&obj->vma_list);
 	INIT_LIST_HEAD(&obj->batch_pool_link);
 
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index e619cdadaeb6..4e0c5e161e84 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -85,6 +85,7 @@
  *
  */
 
+#include <linux/log2.h>
 #include <drm/drmP.h>
 #include <drm/i915_drm.h>
 #include "i915_drv.h"
@@ -97,6 +98,9 @@
 #define GEN6_CONTEXT_ALIGN (64<<10)
 #define GEN7_CONTEXT_ALIGN 4096
 
+/* Initial size (as log2) to preallocate the handle->object hashtable */
+#define VMA_HT_BITS 2u /* 4 x 2 pointers, 64 bytes minimum */
+
 static size_t get_context_alignment(struct drm_device *dev)
 {
 	if (IS_GEN6(dev))
@@ -133,6 +137,65 @@ static int get_context_size(struct drm_device *dev)
 	return ret;
 }
 
+static void resize_vma_ht(struct work_struct *work)
+{
+	struct intel_context *ctx =
+		container_of(work, typeof(*ctx), vma_ht_resize);
+	unsigned size, bits, new_bits, i;
+	struct hlist_head *new_ht;
+
+	bits = 1 + ilog2(ctx->vma_ht_count);
+	new_bits = min_t(unsigned, max(bits, VMA_HT_BITS), sizeof(unsigned)*8);
+	if (new_bits == ctx->vma_ht_bits)
+		goto out;
+
+	new_ht = kzalloc(sizeof(*new_ht)<<new_bits, GFP_KERNEL | __GFP_NOWARN);
+	if (new_ht == NULL)
+		new_ht = vzalloc(sizeof(*new_ht)<<new_bits);
+	if (new_ht == NULL)
+		/* pretend the resize succeeded and stop calling us for a bit! */
+		goto out;
+
+	size = 1 << ctx->vma_ht_bits;
+	for (i = 0; i < size; i++) {
+		struct i915_vma *vma;
+		struct hlist_node *tmp;
+
+		hlist_for_each_entry_safe(vma, tmp, &ctx->vma_ht[i], ctx_node) {
+			__hlist_del(&vma->ctx_node);
+			hlist_add_head(&vma->ctx_node,
+				       &new_ht[hash_32(vma->ctx_handle,
+						       new_bits)]);
+		}
+	}
+	kvfree(ctx->vma_ht);
+	ctx->vma_ht = new_ht;
+	ctx->vma_ht_bits = new_bits;
+	smp_wmb();
+out:
+	ctx->vma_ht_size = 1 << bits;
+}
+
+static void decouple_vma(struct intel_context *ctx)
+{
+	unsigned i, size;
+
+	if (ctx->vma_ht_size & 1)
+		cancel_work_sync(&ctx->vma_ht_resize);
+
+	size = 1 << ctx->vma_ht_bits;
+	for (i = 0; i < size; i++) {
+		struct i915_vma *vma;
+		struct hlist_node *tmp;
+
+		hlist_for_each_entry_safe(vma, tmp, &ctx->vma_ht[i], ctx_node) {
+			vma->obj->vma_hashed = NULL;
+			vma->ctx = NULL;
+		}
+	}
+	kvfree(ctx->vma_ht);
+}
+
 void i915_gem_context_free(struct kref *ctx_ref)
 {
 	struct intel_context *ctx = container_of(ctx_ref, typeof(*ctx), ref);
@@ -154,9 +217,11 @@ void i915_gem_context_free(struct kref *ctx_ref)
 		__i915_gem_object_release_unless_active(ce->state);
 	}
 
+	decouple_vma(ctx);
 	i915_ppgtt_put(ctx->ppgtt);
 
 	put_pid(ctx->pid);
+
 	list_del(&ctx->link);
 	kfree(ctx);
 }
@@ -246,6 +311,16 @@ __create_hw_context(struct drm_device *dev,
 	list_add_tail(&ctx->link, &dev_priv->context_list);
 	ctx->i915 = dev_priv;
 
+	ctx->vma_ht_bits = VMA_HT_BITS;
+	ctx->vma_ht_size = 1 << ctx->vma_ht_bits;
+	ctx->vma_ht = kzalloc(sizeof(*ctx->vma_ht)*ctx->vma_ht_size,
+			      GFP_KERNEL);
+	if (ctx->vma_ht == NULL) {
+		kfree(ctx);
+		return ERR_PTR(-ENOMEM);
+	}
+	INIT_WORK(&ctx->vma_ht_resize, resize_vma_ht);
+
 	if (dev_priv->hw_context_size) {
 		struct drm_i915_gem_object *obj =
 				i915_gem_alloc_context_obj(dev, dev_priv->hw_context_size);
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index af55d56ec00a..891c4593b8eb 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -63,37 +63,30 @@ struct i915_execbuffer {
 		struct drm_mm_node node;
 		bool use_64bit_reloc;
 	} reloc_cache;
-	int and;
-	union {
-		struct i915_vma **lut;
-		struct hlist_head *buckets;
-	};
+	int lut_mask;
+	struct hlist_head *buckets;
 };
 
 static int
 eb_create(struct i915_execbuffer *eb)
 {
-	eb->lut = NULL;
-	if (eb->args->flags & I915_EXEC_HANDLE_LUT) {
-		unsigned size = eb->args->buffer_count;
-		size *= sizeof(struct i915_vma *);
-		eb->lut = kmalloc(size, GFP_TEMPORARY | __GFP_NOWARN | __GFP_NORETRY);
-	}
-
-	if (eb->lut == NULL) {
-		unsigned size = eb->args->buffer_count;
-		unsigned count = PAGE_SIZE / sizeof(struct hlist_head) / 2;
-		BUILD_BUG_ON_NOT_POWER_OF_2(PAGE_SIZE / sizeof(struct hlist_head));
-		while (count > 2*size)
-			count >>= 1;
-		eb->lut = kzalloc(count*sizeof(struct hlist_head),
-				  GFP_TEMPORARY);
-		if (eb->lut == NULL)
+	if ((eb->args->flags & I915_EXEC_HANDLE_LUT) == 0) {
+		unsigned size = 1 + ilog2(eb->args->buffer_count);
+		do {
+			eb->buckets = kzalloc((1<<size)*sizeof(struct hlist_head),
+					     GFP_TEMPORARY | __GFP_NOWARN | __GFP_NORETRY);
+			if (eb->buckets)
+				break;
+		} while (--size > 2);
+		if (eb->buckets == NULL)
+			eb->buckets = kzalloc((1<<size)*sizeof(struct hlist_head),
+					      GFP_TEMPORARY);
+		if (eb->buckets == NULL)
 			return -ENOMEM;
 
-		eb->and = count - 1;
+		eb->lut_mask = size;
 	} else
-		eb->and = -eb->args->buffer_count;
+		eb->lut_mask = -eb->args->buffer_count;
 
 	return 0;
 }
@@ -127,72 +120,102 @@ eb_reset(struct i915_execbuffer *eb)
 		vma->exec_entry = NULL;
 	}
 
-	if (eb->and >= 0)
-		memset(eb->buckets, 0, (eb->and+1)*sizeof(struct hlist_head));
+	if (eb->lut_mask >= 0)
+		memset(eb->buckets, 0,
+		       (1<<eb->lut_mask)*sizeof(struct hlist_head));
 }
 
-static struct i915_vma *
-eb_get_batch(struct i915_execbuffer *eb)
+#define to_ptr(T, x) ((T *)(uintptr_t)(x))
+
+static bool
+eb_add_vma(struct i915_execbuffer *eb, struct i915_vma *vma, int i)
 {
-	struct i915_vma *vma = list_entry(eb->vmas.prev, typeof(*vma), exec_list);
+	if (unlikely(vma->exec_entry)) {
+		DRM_DEBUG("Object [handle %d, index %d] appears more than once in object list\n",
+			  eb->exec[i].handle, i);
+		return false;
+	}
+	list_add_tail(&vma->exec_list, &eb->vmas);
 
-	/*
-	 * SNA is doing fancy tricks with compressing batch buffers, which leads
-	 * to negative relocation deltas. Usually that works out ok since the
-	 * relocate address is still positive, except when the batch is placed
-	 * very low in the GTT. Ensure this doesn't happen.
-	 *
-	 * Note that actual hangs have only been observed on gen7, but for
-	 * paranoia do it everywhere.
-	 */
-	if ((vma->exec_entry->flags & EXEC_OBJECT_PINNED) == 0)
-		vma->exec_entry->flags |= __EXEC_OBJECT_NEEDS_BIAS;
+	vma->exec_entry = &eb->exec[i];
+	if (eb->lut_mask >= 0) {
+		vma->exec_handle = eb->exec[i].handle;
+		hlist_add_head(&vma->exec_node,
+			       &eb->buckets[hash_32(vma->exec_handle,
+						    eb->lut_mask)]);
+	}
 
-	return vma;
+	eb->exec[i].rsvd2 = (uintptr_t)vma;
+	return true;
+}
+
+static inline struct hlist_head *ht_head(struct intel_context *ctx, u32 handle)
+{
+	return &ctx->vma_ht[hash_32(handle, ctx->vma_ht_bits)];
 }
 
 static int
 eb_lookup_vmas(struct i915_execbuffer *eb)
 {
-	struct drm_i915_gem_object *obj;
-	struct list_head objects;
-	int i, ret;
+	const int count = eb->args->buffer_count;
+	struct i915_vma *vma;
+	int slow_pass = -1;
+	int i;
 
 	INIT_LIST_HEAD(&eb->vmas);
 
-	INIT_LIST_HEAD(&objects);
+	if (unlikely(eb->ctx->vma_ht_size & 1))
+		flush_work(&eb->ctx->vma_ht_resize);
+	for (i = 0; i < count; i++) {
+		eb->exec[i].rsvd2 = 0;
+
+		hlist_for_each_entry(vma,
+				     ht_head(eb->ctx, eb->exec[i].handle),
+				     ctx_node) {
+			if (vma->ctx_handle != eb->exec[i].handle)
+				continue;
+
+			if (!eb_add_vma(eb, vma, i))
+				return -EINVAL;
+
+			goto next_vma;
+		}
+
+		if (slow_pass < 0)
+			slow_pass = i;
+next_vma: ;
+	}
+
+	if (slow_pass < 0)
+		return 0;
+
 	spin_lock(&eb->file->table_lock);
 	/* Grab a reference to the object and release the lock so we can lookup
 	 * or create the VMA without using GFP_ATOMIC */
-	for (i = 0; i < eb->args->buffer_count; i++) {
-		obj = to_intel_bo(idr_find(&eb->file->object_idr, eb->exec[i].handle));
-		if (obj == NULL) {
-			spin_unlock(&eb->file->table_lock);
-			DRM_DEBUG("Invalid object handle %d at index %d\n",
-				   eb->exec[i].handle, i);
-			ret = -ENOENT;
-			goto err;
-		}
+	for (i = slow_pass; i < count; i++) {
+		struct drm_i915_gem_object *obj;
 
-		if (!list_empty(&obj->obj_exec_link)) {
+		if (eb->exec[i].rsvd2)
+			continue;
+
+		obj = to_intel_bo(idr_find(&eb->file->object_idr,
+					   eb->exec[i].handle));
+		if (unlikely(obj == NULL)) {
 			spin_unlock(&eb->file->table_lock);
-			DRM_DEBUG("Object %p [handle %d, index %d] appears more than once in object list\n",
-				   obj, eb->exec[i].handle, i);
-			ret = -EINVAL;
-			goto err;
+			DRM_DEBUG("Invalid object handle %d at index %d\n",
+				  eb->exec[i].handle, i);
+			return -ENOENT;
 		}
 
-		list_add_tail(&obj->obj_exec_link, &objects);
+		eb->exec[i].rsvd2 = 1 | (uintptr_t)obj;
 	}
 	spin_unlock(&eb->file->table_lock);
 
-	i = 0;
-	while (!list_empty(&objects)) {
-		struct i915_vma *vma;
+	for (i = slow_pass; i < count; i++) {
+		struct drm_i915_gem_object *obj;
 
-		obj = list_first_entry(&objects,
-				       struct drm_i915_gem_object,
-				       obj_exec_link);
+		if ((eb->exec[i].rsvd2 & 1) == 0)
+			continue;
 
 		/*
 		 * NOTE: We can leak any vmas created here when something fails
@@ -202,61 +225,71 @@ eb_lookup_vmas(struct i915_execbuffer *eb)
 		 * from the (obj, vm) we don't run the risk of creating
 		 * duplicated vmas for the same vm.
 		 */
+		obj = to_ptr(struct drm_i915_gem_object, eb->exec[i].rsvd2 & ~1);
 		vma = i915_gem_obj_lookup_or_create_vma(obj, eb->vm, NULL);
 		if (unlikely(IS_ERR(vma))) {
 			DRM_DEBUG("Failed to lookup VMA\n");
-			ret = PTR_ERR(vma);
-			goto err;
+			return PTR_ERR(vma);
 		}
 
-		/* Transfer ownership from the objects list to the vmas list. */
-		list_add_tail(&vma->exec_list, &eb->vmas);
-		list_del_init(&obj->obj_exec_link);
-
-		vma->exec_entry = &eb->exec[i];
-		if (eb->and < 0) {
-			eb->lut[i] = vma;
-		} else {
-			u32 handle =
-				eb->args->flags & I915_EXEC_HANDLE_LUT ?
-				i : eb->exec[i].handle;
-			vma->exec_handle = handle;
-			hlist_add_head(&vma->exec_node,
-				       &eb->buckets[handle & eb->and]);
+		/* First come, first served */
+		if (vma->ctx == NULL) {
+			vma->ctx = eb->ctx;
+			vma->ctx_handle = eb->exec[i].handle;
+			hlist_add_head(&vma->ctx_node,
+				       ht_head(eb->ctx, eb->exec[i].handle));
+			eb->ctx->vma_ht_count++;
+			if (vma->is_ggtt) {
+				BUG_ON(obj->vma_hashed);
+				obj->vma_hashed = vma;
+			}
 		}
-		++i;
+
+		if (!eb_add_vma(eb, vma, i))
+			return -EINVAL;
+	}
+	if (4*eb->ctx->vma_ht_count > 3*eb->ctx->vma_ht_size) {
+		eb->ctx->vma_ht_size |= 1;
+		queue_work(system_highpri_wq, &eb->ctx->vma_ht_resize);
 	}
 
 	return 0;
+}
 
+static struct i915_vma *
+eb_get_batch(struct i915_execbuffer *eb)
+{
+	struct i915_vma *vma;
+
+	vma = to_ptr(struct i915_vma, eb->exec[eb->args->buffer_count-1].rsvd2);
 
-err:
-	while (!list_empty(&objects)) {
-		obj = list_first_entry(&objects,
-				       struct drm_i915_gem_object,
-				       obj_exec_link);
-		list_del_init(&obj->obj_exec_link);
-		drm_gem_object_unreference(&obj->base);
-	}
 	/*
-	 * Objects already transfered to the vmas list will be unreferenced by
-	 * eb_destroy.
+	 * SNA is doing fancy tricks with compressing batch buffers, which leads
+	 * to negative relocation deltas. Usually that works out ok since the
+	 * relocate address is still positive, except when the batch is placed
+	 * very low in the GTT. Ensure this doesn't happen.
+	 *
+	 * Note that actual hangs have only been observed on gen7, but for
+	 * paranoia do it everywhere.
 	 */
+	if ((vma->exec_entry->flags & EXEC_OBJECT_PINNED) == 0)
+		vma->exec_entry->flags |= __EXEC_OBJECT_NEEDS_BIAS;
 
-	return ret;
+	return vma;
 }
 
-static struct i915_vma *eb_get_vma(struct i915_execbuffer *eb, unsigned long handle)
+static struct i915_vma *
+eb_get_vma(struct i915_execbuffer *eb, unsigned long handle)
 {
-	if (eb->and < 0) {
-		if (handle >= -eb->and)
+	if (eb->lut_mask < 0) {
+		if (handle >= -eb->lut_mask)
 			return NULL;
-		return eb->lut[handle];
+		return to_ptr(struct i915_vma, eb->exec[handle].rsvd2);
 	} else {
 		struct hlist_head *head;
 		struct hlist_node *node;
 
-		head = &eb->buckets[handle & eb->and];
+		head = &eb->buckets[hash_32(handle, eb->lut_mask)];
 		hlist_for_each(node, head) {
 			struct i915_vma *vma;
 
@@ -280,7 +313,7 @@ static void eb_destroy(struct i915_execbuffer *eb)
 		vma->exec_entry = NULL;
 	}
 
-	if (eb->buckets)
+	if (eb->lut_mask >= 0)
 		kfree(eb->buckets);
 }
 
@@ -845,7 +878,7 @@ static int eb_reserve(struct i915_execbuffer *eb)
 			entry->flags &= ~EXEC_OBJECT_NEEDS_FENCE;
 		need_fence =
 			entry->flags & EXEC_OBJECT_NEEDS_FENCE &&
-			obj->tiling_mode != I915_TILING_NONE;
+			vma->obj->tiling_mode != I915_TILING_NONE;
 		need_mappable = need_fence || need_reloc_mappable(vma);
 
 		if (entry->flags & EXEC_OBJECT_PINNED)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 06d11f941056..3080033b722c 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -175,6 +175,7 @@ struct i915_ggtt_view {
 extern const struct i915_ggtt_view i915_ggtt_view_normal;
 extern const struct i915_ggtt_view i915_ggtt_view_rotated;
 
+struct intel_context;
 enum i915_cache_level;
 
 /**
@@ -240,6 +241,7 @@ struct i915_vma {
 	struct list_head vm_link;
 
 	struct list_head obj_link; /* Link in the object's VMA list */
+	struct hlist_node obj_node;
 
 	/** This vma's place in the batchbuffer or on the eviction list */
 	struct list_head exec_list;
@@ -248,8 +250,12 @@ struct i915_vma {
 	 * Used for performing relocations during execbuffer insertion.
 	 */
 	struct hlist_node exec_node;
-	unsigned long exec_handle;
 	struct drm_i915_gem_exec_object2 *exec_entry;
+	u32 exec_handle;
+
+	struct intel_context *ctx;
+	struct hlist_node ctx_node;
+	u32 ctx_handle;
 };
 
 struct i915_page_dma {
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 171/190] drm/i915: Pass vma to relocate entry
  2016-01-11 11:00 ` [PATCH 142/190] drm/i915: Fallback to using unmappable memory for scanout Chris Wilson
                     ` (27 preceding siblings ...)
  2016-01-11 11:01   ` [PATCH 170/190] drm/i915: Store a direct lookup from object handle to vma Chris Wilson
@ 2016-01-11 11:01   ` Chris Wilson
  2016-01-11 11:01   ` [PATCH 172/190] drm/i915: Eliminate lots of iterations over the execobjects array Chris Wilson
                     ` (18 subsequent siblings)
  47 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 11:01 UTC (permalink / raw)
  To: intel-gfx

We can simplify our tracking of pending writes in an execbuf to a
single bit in vma->exec_entry->flags, but that requires the relocation
function to know the object's vma. Pass it along.
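
Reduced to a sketch (simplified types; only the flag value comes from
the uapi header), the write tracking collapses to a single bit on the
exec entry, which is why the relocation path wants the vma rather than
the bare object:

  #include <stdbool.h>
  #include <stdint.h>

  #define EXEC_OBJECT_WRITE (1u << 2)

  struct exec_entry { uint64_t flags; };
  struct vma { struct exec_entry *exec_entry; };

  /* The relocation handler marks writes on the target's entry... */
  static void note_reloc(struct vma *target, uint32_t write_domain)
  {
          if (write_domain)
                  target->exec_entry->flags |= EXEC_OBJECT_WRITE;
  }

  /* ...and every later decision keys off that one bit. */
  static bool is_written(const struct vma *vma)
  {
          return vma->exec_entry->flags & EXEC_OBJECT_WRITE;
  }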

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h            |   3 +-
 drivers/gpu/drm/i915/i915_gem.c            |  12 ++--
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 104 ++++++++++++-----------------
 drivers/gpu/drm/i915/intel_display.c       |   2 +-
 4 files changed, 52 insertions(+), 69 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 29e1d2ed8b05..2ceefce0e731 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2847,7 +2847,8 @@ static inline void i915_gem_object_unpin_vmap(struct drm_i915_gem_object *obj)
 
 int __must_check i915_mutex_lock_interruptible(struct drm_device *dev);
 int i915_gem_object_sync(struct drm_i915_gem_object *obj,
-			 struct drm_i915_gem_request *to);
+			 struct drm_i915_gem_request *to,
+			 bool write);
 void i915_vma_move_to_active(struct i915_vma *vma,
 			     struct drm_i915_gem_request *req,
 			     unsigned flags);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 91f764e9dff2..3eeca1fb89d2 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2597,9 +2597,9 @@ __i915_gem_object_sync(struct drm_i915_gem_object *obj,
  */
 int
 i915_gem_object_sync(struct drm_i915_gem_object *obj,
-		     struct drm_i915_gem_request *to)
+		     struct drm_i915_gem_request *to,
+		     bool write)
 {
-	const bool readonly = obj->base.pending_write_domain == 0;
 	struct drm_i915_gem_request *req[I915_NUM_RINGS];
 	int ret, i, n;
 
@@ -2607,13 +2607,13 @@ i915_gem_object_sync(struct drm_i915_gem_object *obj,
 		return 0;
 
 	n = 0;
-	if (readonly) {
-		if (obj->last_write.request)
-			req[n++] = obj->last_write.request;
-	} else {
+	if (write) {
 		for (i = 0; i < I915_NUM_RINGS; i++)
 			if (obj->last_read[i].request)
 				req[n++] = obj->last_read[i].request;
+	} else {
+		if (obj->last_write.request)
+			req[n++] = obj->last_write.request;
 	}
 	for (i = 0; i < n; i++) {
 		ret = __i915_gem_object_sync(obj, to, req[i]);
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 891c4593b8eb..2868e094f67c 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -543,42 +543,25 @@ repeat:
 }
 
 static int
-eb_relocate_entry(struct drm_i915_gem_object *obj,
+eb_relocate_entry(struct i915_vma *vma,
 		  struct i915_execbuffer *eb,
 		  struct drm_i915_gem_relocation_entry *reloc)
 {
-	struct drm_gem_object *target_obj;
-	struct drm_i915_gem_object *target_i915_obj;
-	struct i915_vma *target_vma;
-	uint64_t target_offset;
+	struct i915_vma *target;
+	u64 target_offset;
 	int ret;
 
 	/* we've already hold a reference to all valid objects */
-	target_vma = eb_get_vma(eb, reloc->target_handle);
-	if (unlikely(target_vma == NULL))
+	target = eb_get_vma(eb, reloc->target_handle);
+	if (unlikely(target == NULL))
 		return -ENOENT;
-	target_i915_obj = target_vma->obj;
-	target_obj = &target_vma->obj->base;
-
-	target_offset = gen8_canonical_addr(target_vma->node.start);
-
-	/* Sandybridge PPGTT errata: We need a global gtt mapping for MI and
-	 * pipe_control writes because the gpu doesn't properly redirect them
-	 * through the ppgtt for non_secure batchbuffers. */
-	if (unlikely(IS_GEN6(eb->i915) &&
-	    reloc->write_domain == I915_GEM_DOMAIN_INSTRUCTION)) {
-		ret = i915_vma_bind(target_vma, target_i915_obj->cache_level,
-				    PIN_GLOBAL);
-		if (WARN_ONCE(ret, "Unexpected failure to bind target VMA!"))
-			return ret;
-	}
 
 	/* Validate that the target is in a valid r/w GPU domain */
 	if (unlikely(reloc->write_domain & (reloc->write_domain - 1))) {
 		DRM_DEBUG("reloc with multiple write domains: "
-			  "obj %p target %d offset %d "
+			  "target %d offset %d "
 			  "read %08x write %08x",
-			  obj, reloc->target_handle,
+			  reloc->target_handle,
 			  (int) reloc->offset,
 			  reloc->read_domains,
 			  reloc->write_domain);
@@ -587,47 +570,59 @@ eb_relocate_entry(struct drm_i915_gem_object *obj,
 	if (unlikely((reloc->write_domain | reloc->read_domains)
 		     & ~I915_GEM_GPU_DOMAINS)) {
 		DRM_DEBUG("reloc with read/write non-GPU domains: "
-			  "obj %p target %d offset %d "
+			  "target %d offset %d "
 			  "read %08x write %08x",
-			  obj, reloc->target_handle,
+			  reloc->target_handle,
 			  (int) reloc->offset,
 			  reloc->read_domains,
 			  reloc->write_domain);
 		return -EINVAL;
 	}
 
-	target_obj->pending_read_domains |= reloc->read_domains;
-	target_obj->pending_write_domain |= reloc->write_domain;
+	if (reloc->write_domain)
+		target->exec_entry->flags |= EXEC_OBJECT_WRITE;
+
+	/* Sandybridge PPGTT errata: We need a global gtt mapping for MI and
+	 * pipe_control writes because the gpu doesn't properly redirect them
+	 * through the ppgtt for non_secure batchbuffers. */
+	if (unlikely(IS_GEN6(eb->i915) &&
+	    reloc->write_domain == I915_GEM_DOMAIN_INSTRUCTION)) {
+		ret = i915_vma_bind(target, target->obj->cache_level,
+				    PIN_GLOBAL);
+		if (WARN_ONCE(ret, "Unexpected failure to bind target VMA!"))
+			return ret;
+	}
 
 	/* If the relocation already has the right value in it, no
 	 * more work needs to be done.
 	 */
+	target_offset = gen8_canonical_addr(target->node.start);
 	if (target_offset == reloc->presumed_offset)
 		return 0;
 
 	/* Check that the relocation address is valid... */
 	if (unlikely(reloc->offset >
-		     obj->base.size - (eb->reloc_cache.use_64bit_reloc ? 8 : 4))) {
+		     vma->size - (eb->reloc_cache.use_64bit_reloc ? 8 : 4))) {
 		DRM_DEBUG("Relocation beyond object bounds: "
-			  "obj %p target %d offset %d size %d.\n",
-			  obj, reloc->target_handle,
-			  (int) reloc->offset,
-			  (int) obj->base.size);
+			  "target %d offset %d size %d.\n",
+			  reloc->target_handle,
+			  (int)reloc->offset,
+			  (int)vma->size);
 		return -EINVAL;
 	}
 	if (unlikely(reloc->offset & 3)) {
 		DRM_DEBUG("Relocation not 4-byte aligned: "
-			  "obj %p target %d offset %d.\n",
-			  obj, reloc->target_handle,
-			  (int) reloc->offset);
+			  "target %d offset %d.\n",
+			  reloc->target_handle,
+			  (int)reloc->offset);
 		return -EINVAL;
 	}
 
 	/* We can't wait for rendering with pagefaults disabled */
-	if (i915_gem_object_is_active(obj) && pagefault_disabled())
+	if (i915_gem_object_is_active(vma->obj) && pagefault_disabled())
 		return -EFAULT;
 
-	ret = relocate_entry(obj, reloc, &eb->reloc_cache, target_offset);
+	ret = relocate_entry(vma->obj, reloc, &eb->reloc_cache, target_offset);
 	if (ret)
 		return ret;
 
@@ -662,7 +657,7 @@ static int eb_relocate_vma(struct i915_vma *vma, struct i915_execbuffer *eb)
 		do {
 			u64 offset = r->presumed_offset;
 
-			ret = eb_relocate_entry(vma->obj, eb, r);
+			ret = eb_relocate_entry(vma, eb, r);
 			if (ret)
 				goto out;
 
@@ -694,7 +689,7 @@ eb_relocate_vma_slow(struct i915_vma *vma,
 	int i, ret = 0;
 
 	for (i = 0; i < entry->relocation_count; i++) {
-		ret = eb_relocate_entry(vma->obj, eb, &relocs[i]);
+		ret = eb_relocate_entry(vma, eb, &relocs[i]);
 		if (ret)
 			break;
 	}
@@ -736,7 +731,6 @@ eb_reserve_vma(struct i915_vma *vma,
 	       struct intel_engine_cs *ring,
 	       bool *need_reloc)
 {
-	struct drm_i915_gem_object *obj = vma->obj;
 	struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
 	uint64_t flags;
 	int ret;
@@ -790,11 +784,6 @@ eb_reserve_vma(struct i915_vma *vma,
 		*need_reloc = true;
 	}
 
-	if (entry->flags & EXEC_OBJECT_WRITE) {
-		obj->base.pending_read_domains = I915_GEM_DOMAIN_RENDER;
-		obj->base.pending_write_domain = I915_GEM_DOMAIN_RENDER;
-	}
-
 	return 0;
 }
 
@@ -855,7 +844,6 @@ eb_vma_misplaced(struct i915_vma *vma)
 static int eb_reserve(struct i915_execbuffer *eb)
 {
 	const bool has_fenced_gpu_access = INTEL_INFO(eb->i915)->gen < 4;
-	struct drm_i915_gem_object *obj;
 	struct i915_vma *vma;
 	struct list_head ordered_vmas;
 	struct list_head pinned_vmas;
@@ -868,7 +856,6 @@ static int eb_reserve(struct i915_execbuffer *eb)
 		bool need_fence, need_mappable;
 
 		vma = list_first_entry(&eb->vmas, struct i915_vma, exec_list);
-		obj = vma->obj;
 		entry = vma->exec_entry;
 
 		if (eb->ctx->flags & CONTEXT_NO_ZEROMAP)
@@ -888,9 +875,6 @@ static int eb_reserve(struct i915_execbuffer *eb)
 			list_move(&vma->exec_list, &ordered_vmas);
 		} else
 			list_move_tail(&vma->exec_list, &ordered_vmas);
-
-		obj->base.pending_read_domains = I915_GEM_GPU_DOMAINS & ~I915_GEM_DOMAIN_COMMAND;
-		obj->base.pending_write_domain = 0;
 	}
 	list_splice(&ordered_vmas, &eb->vmas);
 	list_splice(&pinned_vmas, &eb->vmas);
@@ -1085,7 +1069,9 @@ eb_move_to_gpu(struct i915_execbuffer *eb)
 		struct drm_i915_gem_object *obj = vma->obj;
 
 		if (obj->flags & other_rings) {
-			ret = i915_gem_object_sync(obj, eb->request);
+			ret = i915_gem_object_sync(obj,
+						   eb->request,
+						   vma->exec_entry->flags & EXEC_OBJECT_WRITE);
 			if (ret)
 				return ret;
 		}
@@ -1248,12 +1234,10 @@ eb_move_to_active(struct i915_execbuffer *eb)
 		u32 old_read = obj->base.read_domains;
 		u32 old_write = obj->base.write_domain;
 
-		obj->base.write_domain = obj->base.pending_write_domain;
-		if (obj->base.write_domain)
-			vma->exec_entry->flags |= EXEC_OBJECT_WRITE;
-		else
-			obj->base.pending_read_domains |= obj->base.read_domains;
-		obj->base.read_domains = obj->base.pending_read_domains;
+		obj->base.write_domain = 0;
+		if (vma->exec_entry->flags & EXEC_OBJECT_WRITE)
+			obj->base.read_domains = 0;
+		obj->base.read_domains |= I915_GEM_GPU_DOMAINS;
 
 		i915_vma_move_to_active(vma, eb->request, vma->exec_entry->flags);
 		trace_i915_gem_object_change_domain(obj, old_read, old_write);
@@ -1598,7 +1582,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	}
 
 	/* Set the pending read domains for the batch buffer to COMMAND */
-	if (eb.batch_vma->obj->base.pending_write_domain) {
+	if (eb.batch_vma->exec_entry->flags & EXEC_OBJECT_WRITE) {
 		DRM_DEBUG("Attempting to use self-modifying batch buffer\n");
 		ret = -EINVAL;
 		goto err;
@@ -1630,8 +1614,6 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 		}
 	}
 
-	eb.batch_vma->obj->base.pending_read_domains |= I915_GEM_DOMAIN_COMMAND;
-
 	/* snb/ivb/vlv conflate the "batch in ppgtt" bit with the "non-secure
 	 * batch" bit. Hence we need to pin secure batches into the global gtt.
 	 * hsw should have this fixed, but bdw mucks it up again. */
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 13d283e4b0a3..e518d3300a3e 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -11670,7 +11670,7 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
 			goto cleanup_pending;
 		}
 
-		ret = i915_gem_object_sync(obj, request);
+		ret = i915_gem_object_sync(obj, request, false);
 		if (ret)
 			goto cleanup_request;
 	}
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 172/190] drm/i915: Eliminate lots of iterations over the execobjects array
  2016-01-11 11:00 ` [PATCH 142/190] drm/i915: Fallback to using unmappable memory for scanout Chris Wilson
                     ` (28 preceding siblings ...)
  2016-01-11 11:01   ` [PATCH 171/190] drm/i915: Pass vma to relocate entry Chris Wilson
@ 2016-01-11 11:01   ` Chris Wilson
  2016-01-11 11:01   ` [PATCH 173/190] drm/i915: Wait upon userptr get-user-pages within execbuffer Chris Wilson
                     ` (17 subsequent siblings)
  47 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 11:01 UTC (permalink / raw)
  To: intel-gfx

The major scaling bottleneck in execbuffer is the processing of the
execobjects. Creating an auxiliary list is inefficient when compared to
using the execobject array we already have allocated.

Reservation is then split into phases. As we look up each VMA, we try to
bind it back into its active location. Only if that fails do we add it
to the unbound list for phase 2. In phase 2, we try to add all those
objects that could not fit into their previous location, with fallback
to retrying all objects and evicting the VM in case of severe
fragmentation. (This is the same as before, except that phase 1 is now
done inline with looking up the VMA to avoid an iteration over the
execobject array. In the ideal case, we eliminate the separate
reservation phase.) During the reservation phase, we only evict from the
VM between passes (rather than, as currently, every time we fail to fit
a new VMA). In testing with Unreal Engine's Atlantis demo, which
stresses the eviction logic on gen7-class hardware, this speeds up the
framerate by a factor of 2.

The second loop amalgamation is between move_to_gpu and move_to_active.
As we always submit the request, even if incomplete, we can use the
current request to track the active VMAs as we perform the required
flushes and synchronisation.

The next big advancement is to avoid copying back to the user any
execobjects and relocations that are not changed.
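
A compressed sketch of the two-phase reservation (the helpers are
hypothetical stubs; note the real code re-reserves everything after the
eviction pass, whereas this sketch only retries the leftovers):

  #include <stdbool.h>
  #include <stddef.h>

  struct vma { bool bound; };

  /* Hypothetical helpers, stubbed so the sketch is self-contained. */
  static bool try_pin(struct vma *v) { return v->bound; }
  static int place(struct vma *v) { v->bound = true; return 0; }
  static void evict_vm(void) { }

  static int reserve(struct vma **vmas, size_t count)
  {
          struct vma *unbound[count]; /* VLA, for brevity */
          size_t i, nr_unbound = 0;
          int pass;

          /* Phase 1, done inline with the lookup: keep whatever
           * still fits in its previous location. */
          for (i = 0; i < count; i++) {
                  if (try_pin(vmas[i]))
                          vmas[i]->bound = true;
                  else
                          unbound[nr_unbound++] = vmas[i];
          }

          /* Phase 2: place the leftovers, evicting only between
           * passes rather than for every vma that fails to fit. */
          for (pass = 0; ; pass++) {
                  size_t failed = 0;

                  for (i = 0; i < nr_unbound; i++)
                          if (place(unbound[i]))
                                  unbound[failed++] = unbound[i];
                  if (!failed)
                          return 0;
                  if (pass)
                          return -1; /* severe fragmentation */
                  nr_unbound = failed;
                  evict_vm();
          }
  }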

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h            |    3 +-
 drivers/gpu/drm/i915/i915_gem.c            |    4 +-
 drivers/gpu/drm/i915/i915_gem_evict.c      |   71 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 1310 ++++++++++++++--------------
 drivers/gpu/drm/i915/i915_gem_gtt.c        |    2 +-
 drivers/gpu/drm/i915/i915_gem_gtt.h        |    4 +-
 6 files changed, 713 insertions(+), 681 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 2ceefce0e731..601ef7412cf9 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2733,6 +2733,7 @@ int i915_gem_wait_ioctl(struct drm_device *dev, void *data,
 void i915_gem_load(struct drm_device *dev);
 void *i915_gem_object_alloc(struct drm_device *dev);
 void i915_gem_object_free(struct drm_i915_gem_object *obj);
+bool i915_gem_object_flush_active(struct drm_i915_gem_object *obj);
 void i915_gem_object_init(struct drm_i915_gem_object *obj,
 			 const struct drm_i915_gem_object_ops *ops);
 struct drm_i915_gem_object *i915_gem_alloc_object(struct drm_device *dev,
@@ -3078,7 +3079,7 @@ int __must_check i915_gem_evict_something(struct drm_device *dev,
 					  unsigned long end,
 					  unsigned flags);
 int __must_check i915_gem_evict_for_vma(struct i915_vma *vma, unsigned flags);
-int i915_gem_evict_vm(struct i915_address_space *vm, bool do_idle);
+int i915_gem_evict_vm(struct i915_address_space *vm);
 
 /* belongs in i915_gem_gtt.h */
 static inline void i915_gem_chipset_flush(struct drm_device *dev)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 3eeca1fb89d2..0bd6db4e83d9 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2393,7 +2393,7 @@ out:
  * write domains, emitting any outstanding lazy request and retiring and
  * completed requests.
  */
-static bool i915_gem_object_flush_active(struct drm_i915_gem_object *obj)
+bool i915_gem_object_flush_active(struct drm_i915_gem_object *obj)
 {
 	int i;
 
@@ -2821,7 +2821,7 @@ i915_vma_insert(struct i915_vma *vma,
 			  size, obj->base.size,
 			  flags & PIN_MAPPABLE ? "mappable" : "total",
 			  end);
-		return -E2BIG;
+		return -ENOSPC;
 	}
 
 	ret = i915_gem_object_get_pages(obj);
diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index d40bcb81c922..e71b89bac168 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -72,7 +72,7 @@ mark_free(struct i915_vma *vma, unsigned flags, struct list_head *unwind)
 	if (flags & PIN_NOFAULT && vma->obj->fault_mappable)
 		return false;
 
-	list_add(&vma->exec_list, unwind);
+	list_add(&vma->evict_link, unwind);
 	return drm_mm_scan_add_block(&vma->node);
 }
 
@@ -154,11 +154,11 @@ search_again:
 	while (!list_empty(&eviction_list)) {
 		vma = list_first_entry(&eviction_list,
 				       struct i915_vma,
-				       exec_list);
+				       evict_link);
 		ret = drm_mm_scan_remove_block(&vma->node);
 		BUG_ON(ret);
 
-		list_del(&vma->exec_list);
+		list_del(&vma->evict_link);
 	}
 
 	/* Can we unpin some objects such as idle hw contents,
@@ -201,16 +201,16 @@ found:
 	 * calling unbind (which may remove the active reference
 	 * of any of our objects, thus corrupting the list).
 	 */
-	list_for_each_entry_safe(vma, next, &eviction_list, exec_list) {
+	list_for_each_entry_safe(vma, next, &eviction_list, evict_link) {
 		if (drm_mm_scan_remove_block(&vma->node))
 			drm_gem_object_reference(&vma->obj->base);
 		else
-			list_del(&vma->exec_list);
+			list_del(&vma->evict_link);
 	}
 
 	/* Unbinding will emit any required flushes */
 	ret = 0;
-	list_for_each_entry_safe(vma, next, &eviction_list, exec_list) {
+	list_for_each_entry_safe(vma, next, &eviction_list, evict_link) {
 		struct drm_i915_gem_object *obj = vma->obj;
 		if (ret == 0)
 			ret = i915_vma_unbind(vma);
@@ -261,14 +261,13 @@ i915_gem_evict_for_vma(struct i915_vma *target, unsigned flags)
 			break;
 		}
 
-		list_add(&vma->exec_list, &eviction_list);
+		list_add(&vma->evict_link, &eviction_list);
 		drm_gem_object_reference(&vma->obj->base);
 	}
 
 	ret = 0;
-	list_for_each_entry_safe(vma, next, &eviction_list, exec_list) {
+	list_for_each_entry_safe(vma, next, &eviction_list, evict_link) {
 		struct drm_i915_gem_object *obj = vma->obj;
-		list_del(&vma->exec_list);
 		if (ret == 0)
 			ret = i915_vma_unbind(vma);
 		drm_gem_object_unreference(&obj->base);
@@ -291,37 +290,49 @@ i915_gem_evict_for_vma(struct i915_vma *target, unsigned flags)
  * To clarify: This is for freeing up virtual address space, not for freeing
  * memory in e.g. the shrinker.
  */
-int i915_gem_evict_vm(struct i915_address_space *vm, bool do_idle)
+int i915_gem_evict_vm(struct i915_address_space *vm)
 {
+	struct list_head *phases[] = {
+		&vm->inactive_list,
+		&vm->active_list,
+		NULL
+	}, **phase;
+	struct list_head eviction_list;
 	struct i915_vma *vma, *next;
 	int ret;
 
 	WARN_ON(!mutex_is_locked(&vm->dev->struct_mutex));
 	trace_i915_gem_evict_vm(vm);
 
-	if (do_idle) {
-		/* Switch back to the default context in order to unpin
-		 * the existing context objects. However, such objects only
-		 * pin themselves inside the global GTT and performing the
-		 * switch otherwise is ineffective.
-		 */
-		if (i915_is_ggtt(vm)) {
-			ret = switch_to_pinned_context(to_i915(vm->dev));
-			if (ret)
-				return ret;
-		}
-
-		ret = i915_gpu_idle(vm->dev);
+	/* Switch back to the default context in order to unpin
+	 * the existing context objects. However, such objects only
+	 * pin themselves inside the global GTT and performing the
+	 * switch otherwise is ineffective.
+	 */
+	if (i915_is_ggtt(vm)) {
+		ret = switch_to_pinned_context(to_i915(vm->dev));
 		if (ret < 0)
 			return ret;
-
-		i915_gem_retire_requests(vm->dev);
-		WARN_ON(!list_empty(&vm->active_list));
 	}
 
-	list_for_each_entry_safe(vma, next, &vm->inactive_list, vm_link)
-		if (vma->pin_count == 0)
-			WARN_ON(i915_vma_unbind(vma));
+	INIT_LIST_HEAD(&eviction_list);
+	phase = phases;
+	do {
+		list_for_each_entry(vma, *phase, vm_link) {
+			if (vma->pin_count)
+				continue;
 
-	return 0;
+			list_add(&vma->evict_link, &eviction_list);
+			drm_gem_object_reference(&vma->obj->base);
+		}
+	} while (*++phase);
+
+	ret = 0;
+	list_for_each_entry_safe(vma, next, &eviction_list, evict_link) {
+		struct drm_i915_gem_object *obj = vma->obj;
+		if (ret == 0)
+			ret = i915_vma_unbind(vma);
+		drm_gem_object_unreference(&obj->base);
+	}
+	return ret;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 2868e094f67c..f40d3254249a 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -40,6 +40,10 @@
 #define  __EXEC_OBJECT_HAS_FENCE (1U<<30)
 #define  __EXEC_OBJECT_NEEDS_MAP (1U<<29)
 #define  __EXEC_OBJECT_NEEDS_BIAS (1U<<28)
+#define __EB_RESERVED (__EXEC_OBJECT_HAS_PIN | __EXEC_OBJECT_HAS_FENCE)
+
+#define __EXEC_HAS_RELOC (1ULL<<31)
+#define UPDATE PIN_OFFSET_FIXED
 
 #define BATCH_OFFSET_BIAS (256*1024)
 
@@ -52,21 +56,44 @@ struct i915_execbuffer {
 	struct intel_context *ctx;
 	struct i915_address_space *vm;
 	struct i915_vma *batch_vma;
-	uint32_t batch_start_offset;
 	struct drm_i915_gem_request *request;
-	unsigned dispatch_flags;
-	bool need_relocs;
-	struct list_head vmas;
+	struct list_head unbound;
+	struct list_head relocs;
 	struct reloc_cache {
 		unsigned long vaddr;
 		unsigned page;
 		struct drm_mm_node node;
 		bool use_64bit_reloc;
+		bool has_llc;
+		bool has_fence;
 	} reloc_cache;
+	u64 invalid_flags;
+	u32 context_flags;
+	u32 dispatch_flags;
 	int lut_mask;
 	struct hlist_head *buckets;
 };
 
+#define to_ptr(T, x) ((T *)(uintptr_t)(x))
+
+/* Used to convert any address to canonical form.
+ * Starting from gen8, some commands (e.g. STATE_BASE_ADDRESS,
+ * MI_LOAD_REGISTER_MEM and others, see Broadwell PRM Vol2a) require the
+ * addresses to be in a canonical form:
+ * "GraphicsAddress[63:48] are ignored by the HW and assumed to be in correct
+ * canonical form [63:48] == [47]."
+ */
+#define GEN8_HIGH_ADDRESS_BIT 47
+static inline uint64_t gen8_canonical_addr(uint64_t address)
+{
+	return sign_extend64(address, GEN8_HIGH_ADDRESS_BIT);
+}
+
+static inline uint64_t gen8_noncanonical_addr(uint64_t address)
+{
+	return address & ((1ULL << (GEN8_HIGH_ADDRESS_BIT + 1)) - 1);
+}
+
 static int
 eb_create(struct i915_execbuffer *eb)
 {
@@ -91,78 +118,317 @@ eb_create(struct i915_execbuffer *eb)
 	return 0;
 }
 
+static bool
+eb_vma_misplaced(const struct drm_i915_gem_exec_object2 *entry,
+		 const struct i915_vma *vma)
+{
+	if ((entry->flags & __EXEC_OBJECT_HAS_PIN) == 0)
+		return true;
+
+	if (vma->node.size < entry->pad_to_size)
+		return true;
+
+	if (entry->alignment && vma->node.start & (entry->alignment - 1))
+		return true;
+
+	if (entry->flags & __EXEC_OBJECT_NEEDS_BIAS &&
+	    vma->node.start < BATCH_OFFSET_BIAS)
+		return true;
+
+	if ((entry->flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS) == 0 &&
+	    (vma->node.start + vma->node.size - 1) >> 32)
+		return true;
+
+	return false;
+}
+
+static void
+eb_pin_vma(struct i915_execbuffer *eb,
+	   struct drm_i915_gem_exec_object2 *entry,
+	   struct i915_vma *vma)
+{
+	u64 flags;
+
+	flags = PIN_USER | PIN_NONBLOCK | PIN_OFFSET_FIXED;
+	if (entry->flags & EXEC_OBJECT_NEEDS_GTT)
+		flags |= PIN_GLOBAL;
+	if (entry->flags & EXEC_OBJECT_PINNED)
+		flags |= entry->offset;
+	else
+		flags |= vma->node.start;
+	if (unlikely(i915_vma_pin(vma, 0, 0, flags)))
+		return;
+
+	if (entry->flags & EXEC_OBJECT_NEEDS_FENCE) {
+		if (unlikely(i915_vma_get_fence(vma))) {
+			i915_vma_unpin(vma);
+			return;
+		}
+
+		if (i915_vma_pin_fence(vma))
+			entry->flags |= __EXEC_OBJECT_HAS_FENCE;
+	}
+
+	if (entry->offset != vma->node.start) {
+		entry->offset = vma->node.start | UPDATE;
+		eb->args->flags |= __EXEC_HAS_RELOC;
+	}
+	entry->flags |= __EXEC_OBJECT_HAS_PIN;
+}
+
 static inline void
 __eb_unreserve_vma(struct i915_vma *vma,
-		   const struct drm_i915_gem_exec_object2 *entry)
+		 const struct drm_i915_gem_exec_object2 *entry)
 {
+	GEM_BUG_ON((entry->flags & __EXEC_OBJECT_HAS_PIN) == 0);
+
 	if (unlikely(entry->flags & __EXEC_OBJECT_HAS_FENCE))
 		i915_vma_unpin_fence(vma);
 
-	if (entry->flags & __EXEC_OBJECT_HAS_PIN)
-		__i915_vma_unpin(vma);
+	__i915_vma_unpin(vma);
 }
 
-static void
-eb_unreserve_vma(struct i915_vma *vma)
+static inline void
+eb_unreserve_vma(struct i915_vma *vma,
+		 struct drm_i915_gem_exec_object2 *entry)
 {
-	struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
-	__eb_unreserve_vma(vma, entry);
-	entry->flags &= ~(__EXEC_OBJECT_HAS_FENCE | __EXEC_OBJECT_HAS_PIN);
+	if (entry->flags & __EXEC_OBJECT_HAS_PIN) {
+		__eb_unreserve_vma(vma, entry);
+		entry->flags &= ~__EB_RESERVED;
+	}
 }
 
-static void
-eb_reset(struct i915_execbuffer *eb)
+static int
+eb_add_vma(struct i915_execbuffer *eb,
+	   struct drm_i915_gem_exec_object2 *entry,
+	   struct i915_vma *vma)
 {
-	struct i915_vma *vma;
+	int ret;
 
-	list_for_each_entry(vma, &eb->vmas, exec_list) {
-		eb_unreserve_vma(vma);
-		vma->exec_entry = NULL;
-	}
+	GEM_BUG_ON(vma->closed);
 
-	if (eb->lut_mask >= 0)
-		memset(eb->buckets, 0,
-		       (1<<eb->lut_mask)*sizeof(struct hlist_head));
-}
+	if (unlikely(entry->flags & eb->invalid_flags))
+		return -EINVAL;
 
-#define to_ptr(T, x) ((T *)(uintptr_t)(x))
+	if (unlikely(entry->alignment && !is_power_of_2(entry->alignment)))
+		return -EINVAL;
+
+	/* Offset can be used as input (EXEC_OBJECT_PINNED), reject
+	 * any non-page-aligned or non-canonical addresses.
+	 */
+	if (entry->flags & EXEC_OBJECT_PINNED) {
+		if (unlikely(entry->offset !=
+			     gen8_canonical_addr(entry->offset & PAGE_MASK)))
+			return -EINVAL;
+
+		/* From drm_mm perspective address space is continuous,
+		 * so from this point we're always using non-canonical
+		 * form internally.
+		 */
+		entry->offset = gen8_noncanonical_addr(entry->offset);
+	}
+
+	/* pad_to_size was once a reserved field, so sanitize it */
+	if (entry->flags & EXEC_OBJECT_PAD_TO_SIZE) {
+		if (unlikely(offset_in_page(entry->pad_to_size)))
+			return -EINVAL;
+	} else
+		entry->pad_to_size = 0;
 
-static bool
-eb_add_vma(struct i915_execbuffer *eb, struct i915_vma *vma, int i)
-{
 	if (unlikely(vma->exec_entry)) {
 		DRM_DEBUG("Object [handle %d, index %d] appears more than once in object list\n",
-			  eb->exec[i].handle, i);
-		return false;
+			  entry->handle, (int)(entry - eb->exec));
+		return -EINVAL;
 	}
-	list_add_tail(&vma->exec_list, &eb->vmas);
 
-	vma->exec_entry = &eb->exec[i];
+	vma->exec_entry = entry;
+	entry->rsvd2 = (uintptr_t)vma;
+
 	if (eb->lut_mask >= 0) {
-		vma->exec_handle = eb->exec[i].handle;
+		vma->exec_handle = entry->handle;
 		hlist_add_head(&vma->exec_node,
-			       &eb->buckets[hash_32(vma->exec_handle,
+			       &eb->buckets[hash_32(entry->handle,
 						    eb->lut_mask)]);
 	}
 
-	eb->exec[i].rsvd2 = (uintptr_t)vma;
-	return true;
+	if (entry->relocation_count)
+		list_add_tail(&vma->reloc_link, &eb->relocs);
+
+	if (!eb->reloc_cache.has_fence) {
+		entry->flags &= ~EXEC_OBJECT_NEEDS_FENCE;
+	} else {
+		if (entry->flags & EXEC_OBJECT_NEEDS_FENCE &&
+		    vma->obj->tiling_mode != I915_TILING_NONE)
+			entry->flags |= EXEC_OBJECT_NEEDS_GTT | __EXEC_OBJECT_NEEDS_MAP;
+	}
+
+	if ((entry->flags & EXEC_OBJECT_PINNED) == 0)
+		entry->flags |= eb->context_flags;
+
+	ret = 0;
+	if (vma->node.size)
+		eb_pin_vma(eb, entry, vma);
+	if (eb_vma_misplaced(entry, vma)) {
+		eb_unreserve_vma(vma, entry);
+
+		list_add_tail(&vma->exec_link, &eb->unbound);
+		if (drm_mm_node_allocated(&vma->node))
+			ret = i915_vma_unbind(vma);
+	}
+	return ret;
+}
+
+static inline int use_cpu_reloc(const struct reloc_cache *cache,
+				const struct drm_i915_gem_object *obj)
+{
+	return (DBG_USE_CPU_RELOC ||
+		cache->has_llc ||
+		obj->base.write_domain == I915_GEM_DOMAIN_CPU ||
+		obj->cache_level != I915_CACHE_NONE);
+}
+
+static int
+eb_reserve_vma(struct i915_execbuffer *eb, struct i915_vma *vma, int pass)
+{
+	struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
+	u64 flags;
+	int ret;
+
+	flags = PIN_USER | PIN_NONBLOCK;
+	if (entry->flags & EXEC_OBJECT_NEEDS_GTT)
+		flags |= PIN_GLOBAL;
+
+	if (!drm_mm_node_allocated(&vma->node)) {
+		/* Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset,
+		 * limit address to the first 4GBs for unflagged objects.
+		 */
+		if ((entry->flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS) == 0)
+			flags |= PIN_ZONE_4G;
+
+		if (vma->is_ggtt) {
+			if (entry->flags & __EXEC_OBJECT_NEEDS_MAP) {
+				flags |= PIN_MAPPABLE;
+			} else if (entry->relocation_count &&
+				   !use_cpu_reloc(&eb->reloc_cache, vma->obj)) {
+				if (pass <= 0)
+					flags |= PIN_MAPPABLE;
+			} else {
+				flags |= PIN_HIGH;
+			}
+		}
+
+		if (entry->flags & EXEC_OBJECT_PINNED)
+			flags |= entry->offset | PIN_OFFSET_FIXED;
+		else if (entry->flags & __EXEC_OBJECT_NEEDS_BIAS)
+			flags |= BATCH_OFFSET_BIAS | PIN_OFFSET_BIAS;
+	}
+
+	ret = i915_vma_pin(vma, entry->pad_to_size, entry->alignment, flags);
+	if (ret)
+		return ret;
+
+	if ((entry->offset & PIN_OFFSET_MASK) != vma->node.start) {
+		entry->offset = vma->node.start | UPDATE;
+		eb->args->flags |= __EXEC_HAS_RELOC;
+	}
+	entry->flags |= __EXEC_OBJECT_HAS_PIN;
+
+	if (entry->flags & EXEC_OBJECT_NEEDS_FENCE) {
+		ret = i915_vma_get_fence(vma);
+		if (ret)
+			return ret;
+
+		if (i915_vma_pin_fence(vma))
+			entry->flags |= __EXEC_OBJECT_HAS_FENCE;
+	}
+
+	return 0;
 }
 
-static inline struct hlist_head *ht_head(struct intel_context *ctx, u32 handle)
+static int eb_reserve(struct i915_execbuffer *eb)
+{
+	const unsigned count = eb->args->buffer_count;
+	struct i915_vma *vma;
+	unsigned i;
+	int pass;
+	int ret;
+
+	/* Attempt to pin all of the buffers into the GTT.
+	 * This is done in 3 phases:
+	 *
+	 * 1a. Unbind all objects that do not match the GTT constraints for
+	 *     the execbuffer (fenceable, mappable, alignment etc).
+	 * 1b. Increment pin count for already bound objects.
+	 * 2.  Bind new objects.
+	 * 3.  Decrement pin count.
+	 *
+	 * This avoids unnecessary unbinding of later objects in order to make
+	 * room for the earlier objects *unless* we need to defragment.
+	 */
+	if (list_empty(&eb->unbound))
+		return 0;
+
+	pass = -1;
+	do {
+		struct list_head last;
+
+		i915_gem_retire_requests(eb->i915->dev);
+
+		list_for_each_entry(vma, &eb->unbound, exec_link) {
+			ret = eb_reserve_vma(eb, vma, pass);
+			if (ret)
+				break;
+		}
+		if (ret != -ENOSPC || pass++ > 0)
+			return ret;
+
+		/* Resort *all* the objects into priority order */
+		INIT_LIST_HEAD(&eb->unbound);
+		INIT_LIST_HEAD(&last);
+		for (i = 0; i < count; i++) {
+			struct drm_i915_gem_exec_object2 *entry = &eb->exec[i];
+			vma = to_ptr(struct i915_vma, entry->rsvd2);
+
+			eb_unreserve_vma(vma, entry);
+
+			if (entry->flags & EXEC_OBJECT_PINNED)
+				list_add(&vma->exec_link, &eb->unbound);
+			else if (entry->flags & __EXEC_OBJECT_NEEDS_MAP)
+				list_add_tail(&vma->exec_link, &eb->unbound);
+			else
+				list_add_tail(&vma->exec_link, &last);
+		}
+		list_splice_tail(&last, &eb->unbound);
+
+		/* Too fragmented, unbind everything and retry */
+		ret = i915_gem_evict_vm(eb->vm);
+		if (ret)
+			return ret;
+	} while (1);
+}
+
+static inline struct hlist_head *
+ht_head(const struct intel_context *ctx, u32 handle)
 {
 	return &ctx->vma_ht[hash_32(handle, ctx->vma_ht_bits)];
 }
 
+static int eb_batch_index(const struct i915_execbuffer *eb)
+{
+	return eb->args->buffer_count - 1;
+}
+
 static int
 eb_lookup_vmas(struct i915_execbuffer *eb)
 {
 	const int count = eb->args->buffer_count;
 	struct i915_vma *vma;
+	struct idr *idr;
 	int slow_pass = -1;
-	int i;
+	int i, ret;
 
-	INIT_LIST_HEAD(&eb->vmas);
+	INIT_LIST_HEAD(&eb->relocs);
+	INIT_LIST_HEAD(&eb->unbound);
 
 	if (unlikely(eb->ctx->vma_ht_size & 1))
 		flush_work(&eb->ctx->vma_ht_resize);
@@ -175,8 +441,9 @@ eb_lookup_vmas(struct i915_execbuffer *eb)
 			if (vma->ctx_handle != eb->exec[i].handle)
 				continue;
 
-			if (!eb_add_vma(eb, vma, i))
-				return -EINVAL;
+			ret = eb_add_vma(eb, &eb->exec[i], vma);
+			if (unlikely(ret))
+				return ret;
 
 			goto next_vma;
 		}
@@ -187,24 +454,25 @@ next_vma: ;
 	}
 
 	if (slow_pass < 0)
-		return 0;
+		goto out;
 
 	spin_lock(&eb->file->table_lock);
 	/* Grab a reference to the object and release the lock so we can lookup
 	 * or create the VMA without using GFP_ATOMIC */
+	idr = &eb->file->object_idr;
 	for (i = slow_pass; i < count; i++) {
 		struct drm_i915_gem_object *obj;
 
 		if (eb->exec[i].rsvd2)
 			continue;
 
-		obj = to_intel_bo(idr_find(&eb->file->object_idr,
-					   eb->exec[i].handle));
+		obj = to_intel_bo(idr_find(idr, eb->exec[i].handle));
 		if (unlikely(obj == NULL)) {
 			spin_unlock(&eb->file->table_lock);
 			DRM_DEBUG("Invalid object handle %d at index %d\n",
 				  eb->exec[i].handle, i);
-			return -ENOENT;
+			ret = -ENOENT;
+			goto err;
 		}
 
 		eb->exec[i].rsvd2 = 1 | (uintptr_t)obj;
@@ -225,11 +493,12 @@ next_vma: ;
 		 * from the (obj, vm) we don't run the risk of creating
 		 * duplicated vmas for the same vm.
 		 */
-		obj = to_ptr(struct drm_i915_gem_object, eb->exec[i].rsvd2 & ~1);
+		obj = to_ptr(typeof(*obj), eb->exec[i].rsvd2 & ~1);
 		vma = i915_gem_obj_lookup_or_create_vma(obj, eb->vm, NULL);
 		if (unlikely(IS_ERR(vma))) {
 			DRM_DEBUG("Failed to lookup VMA\n");
-			return PTR_ERR(vma);
+			ret = PTR_ERR(vma);
+			goto err;
 		}
 
 		/* First come, first served */
@@ -240,28 +509,24 @@ next_vma: ;
 				       ht_head(eb->ctx, eb->exec[i].handle));
 			eb->ctx->vma_ht_count++;
 			if (vma->is_ggtt) {
-				BUG_ON(obj->vma_hashed);
+				GEM_BUG_ON(obj->vma_hashed);
 				obj->vma_hashed = vma;
 			}
 		}
 
-		if (!eb_add_vma(eb, vma, i))
-			return -EINVAL;
+		ret = eb_add_vma(eb, &eb->exec[i], vma);
+		if (unlikely(ret))
+			goto err;
 	}
 	if (4*eb->ctx->vma_ht_count > 3*eb->ctx->vma_ht_size) {
 		eb->ctx->vma_ht_size |= 1;
 		queue_work(system_highpri_wq, &eb->ctx->vma_ht_resize);
 	}
 
-	return 0;
-}
-
-static struct i915_vma *
-eb_get_batch(struct i915_execbuffer *eb)
-{
-	struct i915_vma *vma;
-
-	vma = to_ptr(struct i915_vma, eb->exec[eb->args->buffer_count-1].rsvd2);
+out:
+	/* take note of the batch buffer before we might reorder the lists */
+	i = eb_batch_index(eb);
+	eb->batch_vma = to_ptr(struct i915_vma, eb->exec[i].rsvd2);
 
 	/*
 	 * SNA is doing fancy tricks with compressing batch buffers, which leads
@@ -272,14 +537,23 @@ eb_get_batch(struct i915_execbuffer *eb)
 	 * Note that actual hangs have only been observed on gen7, but for
 	 * paranoia do it everywhere.
 	 */
-	if ((vma->exec_entry->flags & EXEC_OBJECT_PINNED) == 0)
-		vma->exec_entry->flags |= __EXEC_OBJECT_NEEDS_BIAS;
+	if ((eb->exec[i].flags & EXEC_OBJECT_PINNED) == 0)
+		eb->exec[i].flags |= __EXEC_OBJECT_NEEDS_BIAS;
+	if (eb->reloc_cache.has_fence)
+		eb->exec[i].flags |= EXEC_OBJECT_NEEDS_FENCE;
 
-	return vma;
+	return eb_reserve(eb);
+
+err:
+	for (i = slow_pass; i < count; i++) {
+		if (eb->exec[i].rsvd2 & 1)
+			eb->exec[i].rsvd2 = 0;
+	}
+	return ret;
 }
 
 static struct i915_vma *
-eb_get_vma(struct i915_execbuffer *eb, unsigned long handle)
+eb_get_vma(const struct i915_execbuffer *eb, unsigned long handle)
 {
 	if (eb->lut_mask < 0) {
 		if (handle >= -eb->lut_mask)
@@ -301,46 +575,51 @@ eb_get_vma(struct i915_execbuffer *eb, unsigned long handle)
 	}
 }
 
-static void eb_destroy(struct i915_execbuffer *eb)
+static void
+eb_reset(const struct i915_execbuffer *eb)
 {
-	struct i915_vma *vma;
+	const unsigned count = eb->args->buffer_count;
+	unsigned i;
 
-	list_for_each_entry(vma, &eb->vmas, exec_list) {
-		if (vma->exec_entry == NULL)
-			continue;
+	for (i = 0; i < count; i++) {
+		struct drm_i915_gem_exec_object2 *entry = &eb->exec[i];
+		struct i915_vma *vma = to_ptr(struct i915_vma, entry->rsvd2);
+
+		if (entry->flags & __EXEC_OBJECT_HAS_PIN)
+			__eb_unreserve_vma(vma, entry);
+		entry->flags &= ~eb->invalid_flags;
 
-		__eb_unreserve_vma(vma, vma->exec_entry);
 		vma->exec_entry = NULL;
 	}
 
 	if (eb->lut_mask >= 0)
-		kfree(eb->buckets);
+		memset(eb->buckets, 0,
+		       (1<<eb->lut_mask)*sizeof(struct hlist_head));
 }
 
-static inline int use_cpu_reloc(struct drm_i915_gem_object *obj)
+static void eb_destroy(const struct i915_execbuffer *eb)
 {
-	return (DBG_USE_CPU_RELOC ||
-		HAS_LLC(obj->base.dev) ||
-		obj->base.write_domain == I915_GEM_DOMAIN_CPU ||
-		obj->cache_level != I915_CACHE_NONE);
-}
+	const unsigned count = eb->args->buffer_count;
+	unsigned i;
 
-/* Used to convert any address to canonical form.
- * Starting from gen8, some commands (e.g. STATE_BASE_ADDRESS,
- * MI_LOAD_REGISTER_MEM and others, see Broadwell PRM Vol2a) require the
- * addresses to be in a canonical form:
- * "GraphicsAddress[63:48] are ignored by the HW and assumed to be in correct
- * canonical form [63:48] == [47]."
- */
-#define GEN8_HIGH_ADDRESS_BIT 47
-static inline uint64_t gen8_canonical_addr(uint64_t address)
-{
-	return sign_extend64(address, GEN8_HIGH_ADDRESS_BIT);
-}
+	if (eb->lut_mask >= 0)
+		kfree(eb->buckets);
 
-static inline uint64_t gen8_noncanonical_addr(uint64_t address)
-{
-	return address & ((1ULL << (GEN8_HIGH_ADDRESS_BIT + 1)) - 1);
+	if (eb->exec == NULL)
+		return;
+
+	for (i = 0; i < count; i++) {
+		struct drm_i915_gem_exec_object2 *entry = &eb->exec[i];
+		struct i915_vma *vma = to_ptr(struct i915_vma, entry->rsvd2);
+
+		if (vma == NULL || vma->exec_entry == NULL)
+			continue;
+
+		GEM_BUG_ON(vma->exec_entry != entry);
+		if (entry->flags & __EXEC_OBJECT_HAS_PIN)
+			__eb_unreserve_vma(vma, entry);
+		vma->exec_entry = NULL;
+	}
 }
 
 static inline uint64_t
@@ -355,7 +634,9 @@ static void reloc_cache_init(struct reloc_cache *cache,
 {
 	cache->page = -1;
 	cache->vaddr = 0;
+	cache->has_llc = HAS_LLC(i915);
 	cache->use_64bit_reloc = INTEL_INFO(i915)->gen >= 8;
+	cache->has_fence = INTEL_INFO(i915)->gen < 4;
 }
 
 static inline void *unmask_page(unsigned long p)
@@ -442,7 +723,7 @@ static void *reloc_iomap(struct drm_i915_gem_object *obj,
 		struct i915_vma *vma;
 		int ret;
 
-		if (use_cpu_reloc(obj))
+		if (use_cpu_reloc(cache, obj))
 			return NULL;
 
 		vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0,
@@ -542,10 +823,10 @@ repeat:
 	return 0;
 }
 
-static int
-eb_relocate_entry(struct i915_vma *vma,
-		  struct i915_execbuffer *eb,
-		  struct drm_i915_gem_relocation_entry *reloc)
+static uint64_t
+eb_relocate_entry(struct i915_execbuffer *eb,
+		  const struct i915_vma *vma,
+		  const struct drm_i915_gem_relocation_entry *reloc)
 {
 	struct i915_vma *target;
 	u64 target_offset;
@@ -618,318 +899,127 @@ eb_relocate_entry(struct i915_vma *vma,
 		return -EINVAL;
 	}
 
-	/* We can't wait for rendering with pagefaults disabled */
-	if (i915_gem_object_is_active(vma->obj) && pagefault_disabled())
-		return -EFAULT;
-
-	ret = relocate_entry(vma->obj, reloc, &eb->reloc_cache, target_offset);
-	if (ret)
-		return ret;
-
-	/* and update the user's relocation entry */
-	reloc->presumed_offset = target_offset;
-	return 0;
-}
-
-static int eb_relocate_vma(struct i915_vma *vma, struct i915_execbuffer *eb)
-{
-#define N_RELOC(x) ((x) / sizeof(struct drm_i915_gem_relocation_entry))
-	struct drm_i915_gem_relocation_entry stack_reloc[N_RELOC(512)];
-	struct drm_i915_gem_relocation_entry __user *user_relocs;
-	struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
-	int remain, ret = 0;
-
-	user_relocs = to_user_ptr(entry->relocs_ptr);
-
-	remain = entry->relocation_count;
-	while (remain) {
-		struct drm_i915_gem_relocation_entry *r = stack_reloc;
-		int count = remain;
-		if (count > ARRAY_SIZE(stack_reloc))
-			count = ARRAY_SIZE(stack_reloc);
-		remain -= count;
-
-		if (__copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0]))) {
-			ret = -EFAULT;
-			goto out;
-		}
-
-		do {
-			u64 offset = r->presumed_offset;
-
-			ret = eb_relocate_entry(vma, eb, r);
-			if (ret)
-				goto out;
-
-			if (r->presumed_offset != offset &&
-			    __copy_to_user_inatomic(&user_relocs->presumed_offset,
-						    &r->presumed_offset,
-						    sizeof(r->presumed_offset))) {
-				ret = -EFAULT;
-				goto out;
-			}
-
-			user_relocs++;
-			r++;
-		} while (--count);
-	}
-
-out:
-	reloc_cache_reset(&eb->reloc_cache);
-	return ret;
-#undef N_RELOC
-}
-
-static int
-eb_relocate_vma_slow(struct i915_vma *vma,
-		     struct i915_execbuffer *eb,
-		     struct drm_i915_gem_relocation_entry *relocs)
-{
-	const struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
-	int i, ret = 0;
-
-	for (i = 0; i < entry->relocation_count; i++) {
-		ret = eb_relocate_entry(vma, eb, &relocs[i]);
-		if (ret)
-			break;
-	}
-	reloc_cache_reset(&eb->reloc_cache);
-	return ret;
-}
-
-static int eb_relocate(struct i915_execbuffer *eb)
-{
-	struct i915_vma *vma;
-	int ret = 0;
-
-	/* This is the fast path and we cannot handle a pagefault whilst
-	 * holding the struct mutex lest the user pass in the relocations
-	 * contained within a mmaped bo. For in such a case we, the page
-	 * fault handler would call i915_gem_fault() and we would try to
-	 * acquire the struct mutex again. Obviously this is bad and so
-	 * lockdep complains vehemently.
-	 */
-	pagefault_disable();
-	list_for_each_entry(vma, &eb->vmas, exec_list) {
-		ret = eb_relocate_vma(vma, eb);
-		if (ret)
-			break;
-	}
-	pagefault_enable();
-
-	return ret;
-}
-
-static bool only_mappable_for_reloc(unsigned int flags)
-{
-	return (flags & (EXEC_OBJECT_NEEDS_FENCE | __EXEC_OBJECT_NEEDS_MAP)) ==
-		__EXEC_OBJECT_NEEDS_MAP;
-}
-
-static int
-eb_reserve_vma(struct i915_vma *vma,
-	       struct intel_engine_cs *ring,
-	       bool *need_reloc)
-{
-	struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
-	uint64_t flags;
-	int ret;
-
-	flags = PIN_USER;
-	if (entry->flags & EXEC_OBJECT_NEEDS_GTT)
-		flags |= PIN_GLOBAL;
-
-	if (!drm_mm_node_allocated(&vma->node)) {
-		/* Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset,
-		 * limit address to the first 4GBs for unflagged objects.
-		 */
-		if ((entry->flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS) == 0)
-			flags |= PIN_ZONE_4G;
-		if (entry->flags & __EXEC_OBJECT_NEEDS_MAP)
-			flags |= PIN_GLOBAL | PIN_MAPPABLE;
-		if (entry->flags & __EXEC_OBJECT_NEEDS_BIAS)
-			flags |= BATCH_OFFSET_BIAS | PIN_OFFSET_BIAS;
-		if (entry->flags & EXEC_OBJECT_PINNED)
-			flags |= entry->offset | PIN_OFFSET_FIXED;
-		if ((flags & PIN_MAPPABLE) == 0)
-			flags |= PIN_HIGH;
-	}
-
-	ret = i915_vma_pin(vma,
-			   entry->pad_to_size,
-			   entry->alignment,
-			   flags);
-	if ((ret == -ENOSPC || ret == -E2BIG) &&
-	    only_mappable_for_reloc(entry->flags))
-		ret = i915_vma_pin(vma,
-				   entry->pad_to_size,
-				   entry->alignment,
-				   flags & ~PIN_MAPPABLE);
-	if (ret)
-		return ret;
-
-	entry->flags |= __EXEC_OBJECT_HAS_PIN;
-
-	if (entry->flags & EXEC_OBJECT_NEEDS_FENCE) {
-		ret = i915_vma_get_fence(vma);
-		if (ret)
-			return ret;
-
-		if (i915_vma_pin_fence(vma))
-			entry->flags |= __EXEC_OBJECT_HAS_FENCE;
-	}
-
-	if (entry->offset != vma->node.start) {
-		entry->offset = vma->node.start;
-		*need_reloc = true;
-	}
-
-	return 0;
-}
-
-static bool
-need_reloc_mappable(struct i915_vma *vma)
-{
-	struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
-
-	if (entry->relocation_count == 0)
-		return false;
-
-	if (!vma->is_ggtt)
-		return false;
-
-	/* See also use_cpu_reloc() */
-	if (HAS_LLC(vma->obj->base.dev))
-		return false;
-
-	if (vma->obj->base.write_domain == I915_GEM_DOMAIN_CPU)
-		return false;
-
-	return true;
-}
-
-static bool
-eb_vma_misplaced(struct i915_vma *vma)
-{
-	struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
-
-	WARN_ON(entry->flags & __EXEC_OBJECT_NEEDS_MAP && !vma->is_ggtt);
-
-	if (entry->alignment &&
-	    vma->node.start & (entry->alignment - 1))
-		return true;
-
-	if (vma->node.size < entry->pad_to_size)
-		return true;
-
-	if (entry->flags & EXEC_OBJECT_PINNED &&
-	    vma->node.start != entry->offset)
-		return true;
-
-	if (entry->flags & __EXEC_OBJECT_NEEDS_BIAS &&
-	    vma->node.start < BATCH_OFFSET_BIAS)
-		return true;
-
-	/* avoid costly ping-pong once a batch bo ended up non-mappable */
-	if (entry->flags & __EXEC_OBJECT_NEEDS_MAP && !vma->map_and_fenceable)
-		return !only_mappable_for_reloc(entry->flags);
+	/* We can't wait for rendering with pagefaults disabled */
+	if (i915_gem_object_is_active(vma->obj) && pagefault_disabled()) {
+		if (i915_gem_object_flush_active(vma->obj))
+			return -EBUSY;
+	}
 
-	if ((entry->flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS) == 0 &&
-	    (vma->node.start + vma->node.size - 1) >> 32)
-		return true;
+	ret = relocate_entry(vma->obj, reloc, &eb->reloc_cache, target_offset);
+	if (ret)
+		return ret;
 
-	return false;
+	/* and update the user's relocation entry */
+	return target_offset | 1;
 }
 
-static int eb_reserve(struct i915_execbuffer *eb)
+static int eb_relocate_vma(struct i915_execbuffer *eb,
+			   const struct i915_vma *vma)
 {
-	const bool has_fenced_gpu_access = INTEL_INFO(eb->i915)->gen < 4;
-	struct i915_vma *vma;
-	struct list_head ordered_vmas;
-	struct list_head pinned_vmas;
-	int retry;
-
-	INIT_LIST_HEAD(&ordered_vmas);
-	INIT_LIST_HEAD(&pinned_vmas);
-	while (!list_empty(&eb->vmas)) {
-		struct drm_i915_gem_exec_object2 *entry;
-		bool need_fence, need_mappable;
-
-		vma = list_first_entry(&eb->vmas, struct i915_vma, exec_list);
-		entry = vma->exec_entry;
-
-		if (eb->ctx->flags & CONTEXT_NO_ZEROMAP)
-			entry->flags |= __EXEC_OBJECT_NEEDS_BIAS;
-
-		if (!has_fenced_gpu_access)
-			entry->flags &= ~EXEC_OBJECT_NEEDS_FENCE;
-		need_fence =
-			entry->flags & EXEC_OBJECT_NEEDS_FENCE &&
-			vma->obj->tiling_mode != I915_TILING_NONE;
-		need_mappable = need_fence || need_reloc_mappable(vma);
+#define N_RELOC(x) ((x) / sizeof(struct drm_i915_gem_relocation_entry))
+	struct drm_i915_gem_relocation_entry stack_reloc[N_RELOC(512)];
+	struct drm_i915_gem_relocation_entry __user *user_relocs;
+	const struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
+	int remain;
 
-		if (entry->flags & EXEC_OBJECT_PINNED)
-			list_move_tail(&vma->exec_list, &pinned_vmas);
-		else if (need_mappable) {
-			entry->flags |= __EXEC_OBJECT_NEEDS_MAP;
-			list_move(&vma->exec_list, &ordered_vmas);
-		} else
-			list_move_tail(&vma->exec_list, &ordered_vmas);
-	}
-	list_splice(&ordered_vmas, &eb->vmas);
-	list_splice(&pinned_vmas, &eb->vmas);
+	user_relocs = to_user_ptr(entry->relocs_ptr);
+	remain = entry->relocation_count;
 
-	/* Attempt to pin all of the buffers into the GTT.
-	 * This is done in 3 phases:
-	 *
-	 * 1a. Unbind all objects that do not match the GTT constraints for
-	 *     the execbuffer (fenceable, mappable, alignment etc).
-	 * 1b. Increment pin count for already bound objects.
-	 * 2.  Bind new objects.
-	 * 3.  Decrement pin count.
-	 *
-	 * This avoid unnecessary unbinding of later objects in order to make
-	 * room for the earlier objects *unless* we need to defragment.
+	/*
+	 * We must check that the entire relocation array is safe
+	 * to read, but since we may need to update the presumed
+	 * offsets during execution, check for full write access.
 	 */
-	retry = 0;
-	do {
-		int ret = 0;
+	if (!access_ok(VERIFY_WRITE, user_relocs, remain*sizeof(*user_relocs)))
+		return -EFAULT;
 
-		/* Unbind any ill-fitting objects or pin. */
-		list_for_each_entry(vma, &eb->vmas, exec_list) {
-			if (!drm_mm_node_allocated(&vma->node))
-				continue;
+	do {
+		struct drm_i915_gem_relocation_entry *r = stack_reloc;
+		int count = min_t(int, remain, ARRAY_SIZE(stack_reloc));
 
-			if (eb_vma_misplaced(vma))
-				ret = i915_vma_unbind(vma);
-			else
-				ret = eb_reserve_vma(vma, eb->engine, &eb->need_relocs);
-			if (ret)
-				goto err;
+		if (__copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0]))) {
+			remain = -EFAULT;
+			goto out;
 		}
 
-		/* Bind fresh objects */
-		list_for_each_entry(vma, &eb->vmas, exec_list) {
-			if (drm_mm_node_allocated(&vma->node))
-				continue;
+		remain -= count;
+		do {
+			uint64_t offset = eb_relocate_entry(eb, vma, r);
+			if (offset == 0) {
+			} else if ((int64_t)offset < 0) {
+				remain = (int64_t)offset;
+				goto out;
+			} else {
+				offset &= ~1;
+				if (__copy_to_user_inatomic(&user_relocs[r-stack_reloc].presumed_offset,
+							    &offset,
+							    sizeof(offset))) {
+					remain = -EFAULT;
+					goto out;
+				}
+			}
+		} while (r++, --count);
+		user_relocs += ARRAY_SIZE(stack_reloc);
+	} while (remain);
+out:
+	reloc_cache_reset(&eb->reloc_cache);
+	return remain;
+#undef N_RELOC
+}
 
-			ret = eb_reserve_vma(vma, eb->engine, &eb->need_relocs);
-			if (ret)
-				goto err;
-		}
+static int
+eb_relocate_vma_slow(struct i915_execbuffer *eb,
+		     const struct i915_vma *vma)
+{
+	const struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
+	struct drm_i915_gem_relocation_entry *relocs =
+		to_ptr(typeof(*relocs), entry->relocs_ptr);
+	int i, ret;
 
+	for (i = 0; i < entry->relocation_count; i++) {
+		uint64_t offset = eb_relocate_entry(eb, vma, &relocs[i]);
+		if ((int64_t)offset < 0) {
+			ret = (int64_t)offset;
+			goto err;
+		}
+	}
+	ret = 0;
 err:
-		if (ret != -ENOSPC || retry++)
-			return ret;
+	reloc_cache_reset(&eb->reloc_cache);
+	return ret;
+}
 
-		/* Decrement pin count for bound objects */
-		list_for_each_entry(vma, &eb->vmas, exec_list)
-			eb_unreserve_vma(vma);
+static int eb_relocate(struct i915_execbuffer *eb)
+{
+	const struct i915_vma *vma;
+	int ret = 0;
 
-		ret = i915_gem_evict_vm(eb->vm, true);
-		if (ret)
-			return ret;
-	} while (1);
+	/* This is the fast path and we cannot handle a pagefault whilst
+	 * holding the struct mutex lest the user pass in the relocations
+	 * contained within a mmaped bo. For in such a case, the page
+	 * fault handler would call i915_gem_fault() and we would try to
+	 * acquire the struct mutex again. Obviously this is bad and so
+	 * lockdep complains vehemently.
+	 */
+	pagefault_disable();
+	list_for_each_entry(vma, &eb->relocs, reloc_link) {
+retry:
+		ret = eb_relocate_vma(eb, vma);
+		if (ret == 0)
+			continue;
+
+		if (ret == -EBUSY) {
+			pagefault_enable();
+			ret = i915_gem_object_wait_rendering(vma->obj, false);
+			pagefault_disable();
+			if (ret == 0)
+				goto retry;
+		}
+		break;
+	}
+	pagefault_enable();
+
+	return ret;
 }
 
 static int eb_select_context(struct i915_execbuffer *eb)
@@ -950,49 +1040,52 @@ static int eb_select_context(struct i915_execbuffer *eb)
 	eb->ctx = ctx;
 	eb->vm = ctx->ppgtt ? &ctx->ppgtt->base : &eb->i915->gtt.base;
 
+	eb->context_flags = 0;
+	if (ctx->flags & CONTEXT_NO_ZEROMAP)
+		eb->context_flags |= __EXEC_OBJECT_NEEDS_BIAS;
+
 	return 0;
 }
 
-static int
-eb_relocate_slow(struct i915_execbuffer *eb)
+static struct drm_i915_gem_relocation_entry *
+eb_copy_relocations(const struct i915_execbuffer *eb)
 {
+	const unsigned relocs_max = UINT_MAX / sizeof(struct drm_i915_gem_relocation_entry);
 	const unsigned count = eb->args->buffer_count;
-	struct drm_device *dev = eb->i915->dev;
 	struct drm_i915_gem_relocation_entry *reloc;
-	struct i915_vma *vma;
-	int *reloc_offset;
-	int i, total, ret;
-
-	/* We may process another execbuffer during the unlock... */
-	eb_reset(eb);
-	mutex_unlock(&dev->struct_mutex);
+	unsigned total, i;
 
 	total = 0;
-	for (i = 0; i < count; i++)
-		total += eb->exec[i].relocation_count;
-
-	reloc_offset = drm_malloc_ab(count, sizeof(*reloc_offset));
-	reloc = drm_malloc_ab(total, sizeof(*reloc));
-	if (reloc == NULL || reloc_offset == NULL) {
-		drm_free_large(reloc);
-		drm_free_large(reloc_offset);
-		mutex_lock(&dev->struct_mutex);
-		return -ENOMEM;
+	for (i = 0; i < count; i++) {
+		unsigned nreloc = eb->exec[i].relocation_count;
+
+		if (total > relocs_max - nreloc)
+			return ERR_PTR(-EINVAL);
+
+		total += nreloc;
 	}
+	if (total == 0)
+		return NULL;
+
+	reloc = drm_malloc_gfp(total, sizeof(*reloc), GFP_TEMPORARY);
+	if (reloc == NULL)
+		return ERR_PTR(-ENOMEM);
 
 	total = 0;
 	for (i = 0; i < count; i++) {
 		struct drm_i915_gem_relocation_entry __user *user_relocs;
+		unsigned nreloc = eb->exec[i].relocation_count, j;
 		u64 invalid_offset = (u64)-1;
-		int j;
+
+		if (nreloc == 0)
+			continue;
 
 		user_relocs = to_user_ptr(eb->exec[i].relocs_ptr);
 
 		if (copy_from_user(reloc+total, user_relocs,
-				   eb->exec[i].relocation_count * sizeof(*reloc))) {
-			ret = -EFAULT;
-			mutex_lock(&dev->struct_mutex);
-			goto err;
+				   nreloc * sizeof(*reloc))) {
+			drm_free_large(reloc);
+			return ERR_PTR(-EFAULT);
 		}
 
 		/* As we do not update the known relocation offsets after
@@ -1004,18 +1097,40 @@ eb_relocate_slow(struct i915_execbuffer *eb)
 		 * happened we would make the mistake of assuming that the
 		 * relocations were valid.
 		 */
-		for (j = 0; j < eb->exec[i].relocation_count; j++) {
+		for (j = 0; j < nreloc; j++) {
 			if (__copy_to_user(&user_relocs[j].presumed_offset,
 					   &invalid_offset,
 					   sizeof(invalid_offset))) {
-				ret = -EFAULT;
-				mutex_lock(&dev->struct_mutex);
-				goto err;
+				drm_free_large(reloc);
+				return ERR_PTR(-EFAULT);
 			}
 		}
 
-		reloc_offset[i] = total;
-		total += eb->exec[i].relocation_count;
+		eb->exec[i].relocs_ptr = (uintptr_t)(reloc + total);
+		total += nreloc;
+	}
+
+	return reloc;
+}
+
+static int eb_relocate_slow(struct i915_execbuffer *eb)
+{
+	struct drm_device *dev = eb->i915->dev;
+	struct drm_i915_gem_relocation_entry *reloc = NULL;
+	const struct i915_vma *vma;
+	int ret;
+
+repeat:
+	/* We may process another execbuffer during the unlock... */
+	eb_reset(eb);
+	mutex_unlock(&dev->struct_mutex);
+
+	if (reloc == NULL) {
+		reloc = eb_copy_relocations(eb);
+		if (IS_ERR(reloc)) {
+			mutex_lock(&dev->struct_mutex);
+			return PTR_ERR(reloc);
+		}
 	}
 
 	ret = i915_mutex_lock_interruptible(dev);
@@ -1033,13 +1148,8 @@ eb_relocate_slow(struct i915_execbuffer *eb)
 	if (ret)
 		goto err;
 
-	ret = eb_reserve(eb);
-	if (ret)
-		goto err;
-
-	list_for_each_entry(vma, &eb->vmas, exec_list) {
-		int offset = vma->exec_entry - eb->exec;
-		ret = eb_relocate_vma_slow(vma, eb, reloc + reloc_offset[offset]);
+	list_for_each_entry(vma, &eb->relocs, reloc_link) {
+		ret = eb_relocate_vma_slow(eb, vma);
 		if (ret)
 			goto err;
 	}
@@ -1051,8 +1161,12 @@ eb_relocate_slow(struct i915_execbuffer *eb)
 	 */
 
 err:
+	if (ret == -EAGAIN) {
+		cond_resched();
+		goto repeat;
+	}
 	drm_free_large(reloc);
-	drm_free_large(reloc_offset);
+
 	return ret;
 }
 
@@ -1060,27 +1174,38 @@ static int
 eb_move_to_gpu(struct i915_execbuffer *eb)
 {
 	const unsigned other_rings = (~intel_engine_flag(eb->engine) & I915_BO_ACTIVE_MASK) << I915_BO_ACTIVE_SHIFT;
-	struct i915_vma *vma;
-	uint32_t flush_domains = 0;
+	const unsigned count = eb->args->buffer_count;
 	bool flush_chipset = false;
+	unsigned i;
 	int ret;
 
-	list_for_each_entry(vma, &eb->vmas, exec_list) {
+	for (i = 0; i < count; i++) {
+		const struct drm_i915_gem_exec_object2 *entry = &eb->exec[i];
+		struct i915_vma *vma = to_ptr(struct i915_vma, entry->rsvd2);
 		struct drm_i915_gem_object *obj = vma->obj;
 
 		if (obj->flags & other_rings) {
-			ret = i915_gem_object_sync(obj,
-						   eb->request,
-						   vma->exec_entry->flags & EXEC_OBJECT_WRITE);
+			ret = i915_gem_object_sync(obj, eb->request,
+						   entry->flags & EXEC_OBJECT_WRITE);
 			if (ret)
 				return ret;
 		}
 
-		if (obj->base.write_domain & I915_GEM_DOMAIN_CPU)
-			flush_chipset |= i915_gem_clflush_object(obj, false);
+		if (!i915_gem_object_is_active(obj)) {
+			if (obj->base.write_domain & I915_GEM_DOMAIN_CPU)
+				flush_chipset |= i915_gem_clflush_object(obj, false);
+
+			obj->base.write_domain = 0;
+			if (entry->flags & EXEC_OBJECT_WRITE)
+				obj->base.read_domains = 0;
+			obj->base.read_domains |= I915_GEM_GPU_DOMAINS;
+		}
 
-		flush_domains |= obj->base.write_domain;
+		i915_vma_move_to_active(vma, eb->request, entry->flags);
+		__eb_unreserve_vma(vma, entry);
+		vma->exec_entry = NULL;
 	}
+	eb->exec = NULL;
 
 	if (flush_chipset)
 		i915_gem_chipset_flush(eb->i915->dev);
@@ -1115,79 +1240,6 @@ i915_gem_check_execbuffer(struct drm_i915_gem_execbuffer2 *exec)
 	return true;
 }
 
-static int
-validate_exec_list(struct drm_device *dev,
-		   struct drm_i915_gem_exec_object2 *exec,
-		   int count)
-{
-	unsigned relocs_total = 0;
-	unsigned relocs_max = UINT_MAX / sizeof(struct drm_i915_gem_relocation_entry);
-	unsigned invalid_flags;
-	int i;
-
-	invalid_flags = __EXEC_OBJECT_UNKNOWN_FLAGS;
-	if (USES_FULL_PPGTT(dev))
-		invalid_flags |= EXEC_OBJECT_NEEDS_GTT;
-
-	for (i = 0; i < count; i++) {
-		char __user *ptr = to_user_ptr(exec[i].relocs_ptr);
-		int length; /* limited by fault_in_pages_readable() */
-
-		if (exec[i].flags & invalid_flags)
-			return -EINVAL;
-
-		/* Offset can be used as input (EXEC_OBJECT_PINNED), reject
-		 * any non-page-aligned or non-canonical addresses.
-		 */
-		if (exec[i].flags & EXEC_OBJECT_PINNED) {
-			if (exec[i].offset !=
-			    gen8_canonical_addr(exec[i].offset & PAGE_MASK))
-				return -EINVAL;
-
-			/* From drm_mm perspective address space is continuous,
-			 * so from this point we're always using non-canonical
-			 * form internally.
-			 */
-			exec[i].offset = gen8_noncanonical_addr(exec[i].offset);
-		}
-
-		if (exec[i].alignment && !is_power_of_2(exec[i].alignment))
-			return -EINVAL;
-
-		/* pad_to_size was once a reserved field, so sanitize it */
-		if (exec[i].flags & EXEC_OBJECT_PAD_TO_SIZE) {
-			if (offset_in_page(exec[i].pad_to_size))
-				return -EINVAL;
-		} else
-			exec[i].pad_to_size = 0;
-
-		/* First check for malicious input causing overflow in
-		 * the worst case where we need to allocate the entire
-		 * relocation tree as a single array.
-		 */
-		if (exec[i].relocation_count > relocs_max - relocs_total)
-			return -EINVAL;
-		relocs_total += exec[i].relocation_count;
-
-		length = exec[i].relocation_count *
-			sizeof(struct drm_i915_gem_relocation_entry);
-		/*
-		 * We must check that the entire relocation array is safe
-		 * to read, but since we may need to update the presumed
-		 * offsets during execution, check for full write access.
-		 */
-		if (!access_ok(VERIFY_WRITE, ptr, length))
-			return -EFAULT;
-
-		if (likely(!i915.prefault_disable)) {
-			if (fault_in_multipages_readable(ptr, length))
-				return -EFAULT;
-		}
-	}
-
-	return 0;
-}
-
 void i915_vma_move_to_active(struct i915_vma *vma,
 			     struct drm_i915_gem_request *req,
 			     unsigned flags)
@@ -1224,26 +1276,6 @@ void i915_vma_move_to_active(struct i915_vma *vma,
 	list_move_tail(&vma->vm_link, &vma->vm->active_list);
 }
 
-static void
-eb_move_to_active(struct i915_execbuffer *eb)
-{
-	struct i915_vma *vma;
-
-	list_for_each_entry(vma, &eb->vmas, exec_list) {
-		struct drm_i915_gem_object *obj = vma->obj;
-		u32 old_read = obj->base.read_domains;
-		u32 old_write = obj->base.write_domain;
-
-		obj->base.write_domain = 0;
-		if (vma->exec_entry->flags & EXEC_OBJECT_WRITE)
-			obj->base.read_domains = 0;
-		obj->base.read_domains |= I915_GEM_GPU_DOMAINS;
-
-		i915_vma_move_to_active(vma, eb->request, vma->exec_entry->flags);
-		trace_i915_gem_object_change_domain(obj, old_read, old_write);
-	}
-}
-
 static int
 i915_reset_gen7_sol_offsets(struct drm_i915_gem_request *req)
 {
@@ -1255,25 +1287,22 @@ i915_reset_gen7_sol_offsets(struct drm_i915_gem_request *req)
 		return -EINVAL;
 	}
 
-	ret = intel_ring_begin(req, 4 * 3);
+	ret = intel_ring_begin(req, 4 * 2 + 2);
 	if (ret)
 		return ret;
 
+	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(4));
 	for (i = 0; i < 4; i++) {
-		intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
 		intel_ring_emit_reg(ring, GEN7_SO_WRITE_OFFSET(i));
 		intel_ring_emit(ring, 0);
 	}
-
+	intel_ring_emit(ring, MI_NOOP);
 	intel_ring_advance(ring);
 
 	return 0;
 }
 
-static struct i915_vma *
-eb_parse(struct i915_execbuffer *eb,
-	 struct drm_i915_gem_exec_object2 *shadow_exec_entry,
-	 bool is_master)
+static struct i915_vma *eb_parse(struct i915_execbuffer *eb, bool is_master)
 {
 	struct drm_i915_gem_object *shadow_batch_obj;
 	struct i915_vma *vma;
@@ -1304,11 +1333,11 @@ eb_parse(struct i915_execbuffer *eb,
 		goto err;
 	}
 
-	memset(shadow_exec_entry, 0, sizeof(*shadow_exec_entry));
-
-	vma->exec_entry = shadow_exec_entry;
+	vma->exec_entry =
+		memset(&eb->exec[eb->args->buffer_count++],
+		       0, sizeof(*vma->exec_entry));
 	vma->exec_entry->flags = __EXEC_OBJECT_HAS_PIN;
-	list_add_tail(&vma->exec_list, &eb->vmas);
+	vma->exec_entry->rsvd2 = (uintptr_t)vma;
 
 err:
 	i915_gem_object_unpin_pages(shadow_batch_obj);
@@ -1324,70 +1353,81 @@ add_to_client(struct drm_i915_gem_request *req,
 }
 
 static int
-execbuf_submit(struct i915_execbuffer *eb)
+eb_set_constants_offset(struct i915_execbuffer *eb)
 {
-	struct intel_ring *ring = eb->request->ring;
 	struct drm_i915_private *dev_priv = eb->i915;
-	int instp_mode;
-	u32 instp_mask;
+	struct intel_ring *ring;
+	u32 mode, mask;
 	int ret;
 
-	ret = eb_move_to_gpu(eb);
-	if (ret)
-		return ret;
-
-	ret = i915_switch_context(eb->request);
-	if (ret)
-		return ret;
-
-	instp_mode = eb->args->flags & I915_EXEC_CONSTANTS_MASK;
-	instp_mask = I915_EXEC_CONSTANTS_MASK;
-	switch (instp_mode) {
+	mode = eb->args->flags & I915_EXEC_CONSTANTS_MASK;
+	switch (mode) {
 	case I915_EXEC_CONSTANTS_REL_GENERAL:
 	case I915_EXEC_CONSTANTS_ABSOLUTE:
 	case I915_EXEC_CONSTANTS_REL_SURFACE:
-		if (instp_mode != 0 && eb->engine->id != RCS) {
-			DRM_DEBUG("non-0 rel constants mode on non-RCS\n");
-			return -EINVAL;
-		}
-
-		if (instp_mode != dev_priv->relative_constants_mode) {
-			if (INTEL_INFO(dev_priv)->gen < 4) {
-				DRM_DEBUG("no rel constants on pre-gen4\n");
-				return -EINVAL;
-			}
-
-			if (INTEL_INFO(dev_priv)->gen > 5 &&
-			    instp_mode == I915_EXEC_CONSTANTS_REL_SURFACE) {
-				DRM_DEBUG("rel surface constants mode invalid on gen5+\n");
-				return -EINVAL;
-			}
-
-			/* The HW changed the meaning on this bit on gen6 */
-			if (INTEL_INFO(dev_priv)->gen >= 6)
-				instp_mask &= ~I915_EXEC_CONSTANTS_REL_SURFACE;
-		}
 		break;
 	default:
-		DRM_DEBUG("execbuf with unknown constants: %d\n", instp_mode);
+		DRM_DEBUG("execbuf with unknown constants: %d\n", mode);
 		return -EINVAL;
 	}
 
-	if (eb->engine->id == RCS &&
-	    instp_mode != dev_priv->relative_constants_mode) {
-		ret = intel_ring_begin(eb->request, 4);
-		if (ret)
-			return ret;
+	if (mode == dev_priv->relative_constants_mode)
+		return 0;
+
+	if (eb->engine->id != RCS) {
+		DRM_DEBUG("non-0 rel constants mode on non-RCS\n");
+		return -EINVAL;
+	}
 
-		intel_ring_emit(ring, MI_NOOP);
-		intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
-		intel_ring_emit_reg(ring, INSTPM);
-		intel_ring_emit(ring, instp_mask << 16 | instp_mode);
-		intel_ring_advance(ring);
+	if (INTEL_INFO(dev_priv)->gen < 4) {
+		DRM_DEBUG("no rel constants on pre-gen4\n");
+		return -EINVAL;
+	}
 
-		dev_priv->relative_constants_mode = instp_mode;
+	if (INTEL_INFO(dev_priv)->gen > 5 &&
+	    mode == I915_EXEC_CONSTANTS_REL_SURFACE) {
+		DRM_DEBUG("rel surface constants mode invalid on gen5+\n");
+		return -EINVAL;
 	}
 
+	/* The HW changed the meaning on this bit on gen6 */
+	mask = I915_EXEC_CONSTANTS_MASK;
+	if (INTEL_INFO(dev_priv)->gen >= 6)
+		mask &= ~I915_EXEC_CONSTANTS_REL_SURFACE;
+
+	ret = intel_ring_begin(eb->request, 4);
+	if (ret)
+		return ret;
+
+	ring = eb->request->ring;
+	intel_ring_emit(ring, MI_NOOP);
+	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
+	intel_ring_emit_reg(ring, INSTPM);
+	intel_ring_emit(ring, mask << 16 | mode);
+	intel_ring_advance(ring);
+
+	dev_priv->relative_constants_mode = mode;
+
+	return 0;
+}
+
+static int
+execbuf_submit(struct i915_execbuffer *eb)
+{
+	int ret;
+
+	ret = eb_move_to_gpu(eb);
+	if (ret)
+		return ret;
+
+	ret = i915_switch_context(eb->request);
+	if (ret)
+		return ret;
+
+	ret = eb_set_constants_offset(eb);
+	if (ret)
+		return ret;
+
 	if (eb->args->flags & I915_EXEC_GEN7_SOL_RESET) {
 		ret = i915_reset_gen7_sol_offsets(eb->request);
 		if (ret)
@@ -1396,15 +1436,13 @@ execbuf_submit(struct i915_execbuffer *eb)
 
 	ret = eb->engine->emit_bb_start(eb->request,
 					eb->batch_vma->node.start +
-					eb->batch_start_offset,
+					eb->args->batch_start_offset,
 					eb->args->batch_len,
 					eb->dispatch_flags);
 	if (ret)
 		return ret;
 
 	trace_i915_gem_ring_dispatch(eb->request, eb->dispatch_flags);
-
-	eb_move_to_active(eb);
 	add_to_client(eb->request, eb->file);
 
 	return 0;
@@ -1448,16 +1486,11 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 		       struct drm_i915_gem_exec_object2 *exec)
 {
 	struct i915_execbuffer eb;
-	struct drm_i915_gem_exec_object2 shadow_exec_entry;
 	int ret;
 
 	if (!i915_gem_check_execbuffer(args))
 		return -EINVAL;
 
-	ret = validate_exec_list(dev, exec, args->buffer_count);
-	if (ret)
-		return ret;
-
 	eb.dispatch_flags = 0;
 	if (args->flags & I915_EXEC_SECURE) {
 		if (!file->is_master || !capable(CAP_SYS_ADMIN))
@@ -1485,7 +1518,11 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	eb.file = file;
 	eb.args = args;
 	eb.exec = exec;
-	eb.need_relocs = (args->flags & I915_EXEC_NO_RELOC) == 0;
+	if ((args->flags & I915_EXEC_NO_RELOC) == 0)
+		args->flags |= __EXEC_HAS_RELOC;
+	eb.invalid_flags = __EXEC_OBJECT_UNKNOWN_FLAGS;
+	if (USES_FULL_PPGTT(eb.i915))
+		eb.invalid_flags |= EXEC_OBJECT_NEEDS_GTT;
 	reloc_cache_init(&eb.reloc_cache, eb.i915);
 
 	if ((args->flags & I915_EXEC_RING_MASK) == I915_EXEC_DEFAULT)
@@ -1561,22 +1598,11 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	if (ret)
 		goto err;
 
-	/* take note of the batch buffer before we might reorder the lists */
-	eb.batch_vma = eb_get_batch(&eb);
-
-	/* Move the objects en-masse into the GTT, evicting if necessary. */
-	ret = eb_reserve(&eb);
-	if (ret)
-		goto err;
-
 	/* The objects are in their final locations, apply the relocations. */
-	if (eb.need_relocs)
+	if (args->flags & __EXEC_HAS_RELOC) {
 		ret = eb_relocate(&eb);
-	if (ret) {
-		if (ret == -EFAULT) {
+		if (ret == -EAGAIN || ret == -EFAULT)
 			ret = eb_relocate_slow(&eb);
-			BUG_ON(!mutex_is_locked(&dev->struct_mutex));
-		}
 		if (ret)
 			goto err;
 	}
@@ -1588,11 +1614,10 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 		goto err;
 	}
 
-	eb.batch_start_offset = args->batch_start_offset;
 	if (intel_engine_needs_cmd_parser(eb.engine) && args->batch_len) {
 		struct i915_vma *vma;
 
-		vma = eb_parse(&eb, &shadow_exec_entry, file->is_master);
+		vma = eb_parse(&eb, file->is_master);
 		if (IS_ERR(vma)) {
 			ret = PTR_ERR(vma);
 			goto err;
@@ -1609,7 +1634,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 			 * command parser has accepted.
 			 */
 			eb.dispatch_flags |= I915_DISPATCH_SECURE;
-			eb.batch_start_offset = 0;
+			eb.args->batch_start_offset = 0;
 			eb.batch_vma = vma;
 		}
 	}
@@ -1700,7 +1725,7 @@ i915_gem_execbuffer(struct drm_device *dev, void *data,
 
 	/* Copy in the exec list from userland */
 	exec_list = drm_malloc_ab(sizeof(*exec_list), args->buffer_count);
-	exec2_list = drm_malloc_ab(sizeof(*exec2_list), args->buffer_count);
+	exec2_list = drm_malloc_ab(sizeof(*exec2_list), args->buffer_count + 1);
 	if (exec_list == NULL || exec2_list == NULL) {
 		DRM_DEBUG("Failed to allocate exec list for %d buffers\n",
 			  args->buffer_count);
@@ -1743,24 +1768,22 @@ i915_gem_execbuffer(struct drm_device *dev, void *data,
 	i915_execbuffer2_set_context_id(exec2, 0);
 
 	ret = i915_gem_do_execbuffer(dev, file, &exec2, exec2_list);
-	if (!ret) {
+	if (exec2.flags & __EXEC_HAS_RELOC) {
 		struct drm_i915_gem_exec_object __user *user_exec_list =
 			to_user_ptr(args->buffers_ptr);
 
 		/* Copy the new buffer offsets back to the user's exec list. */
 		for (i = 0; i < args->buffer_count; i++) {
+			if ((exec2_list[i].offset & UPDATE) == 0)
+				continue;
+
 			exec2_list[i].offset =
-				gen8_canonical_addr(exec2_list[i].offset);
-			ret = __copy_to_user(&user_exec_list[i].offset,
-					     &exec2_list[i].offset,
-					     sizeof(user_exec_list[i].offset));
-			if (ret) {
-				ret = -EFAULT;
-				DRM_DEBUG("failed to copy %d exec entries "
-					  "back to user (%d)\n",
-					  args->buffer_count, ret);
+				gen8_canonical_addr(exec2_list[i].offset & PIN_OFFSET_MASK);
+			exec2_list[i].offset &= PIN_OFFSET_MASK;
+			if (__copy_to_user(&user_exec_list[i].offset,
+					   &exec2_list[i].offset,
+					   sizeof(user_exec_list[i].offset)))
 				break;
-			}
 		}
 	}
 
@@ -1789,43 +1812,38 @@ i915_gem_execbuffer2(struct drm_device *dev, void *data,
 	}
 
 	exec2_list = drm_malloc_gfp(sizeof(*exec2_list),
-				    args->buffer_count,
+				    args->buffer_count + 1,
 				    GFP_TEMPORARY);
 	if (exec2_list == NULL) {
 		DRM_DEBUG("Failed to allocate exec list for %d buffers\n",
 			  args->buffer_count);
 		return -ENOMEM;
 	}
-	ret = copy_from_user(exec2_list,
-			     to_user_ptr(args->buffers_ptr),
-			     sizeof(*exec2_list) * args->buffer_count);
-	if (ret != 0) {
-		DRM_DEBUG("copy %d exec entries failed %d\n",
-			  args->buffer_count, ret);
+	if (copy_from_user(exec2_list,
+			   to_user_ptr(args->buffers_ptr),
+			   sizeof(*exec2_list) * args->buffer_count)) {
+		DRM_DEBUG("copy %d exec entries failed\n", args->buffer_count);
 		drm_free_large(exec2_list);
 		return -EFAULT;
 	}
 
 	ret = i915_gem_do_execbuffer(dev, file, args, exec2_list);
-	if (!ret) {
+	if (args->flags & __EXEC_HAS_RELOC) {
 		/* Copy the new buffer offsets back to the user's exec list. */
 		struct drm_i915_gem_exec_object2 __user *user_exec_list =
-				   to_user_ptr(args->buffers_ptr);
+			to_user_ptr(args->buffers_ptr);
 		int i;
 
 		for (i = 0; i < args->buffer_count; i++) {
+			if ((exec2_list[i].offset & UPDATE) == 0)
+				continue;
+
 			exec2_list[i].offset =
-				gen8_canonical_addr(exec2_list[i].offset);
-			ret = __copy_to_user(&user_exec_list[i].offset,
-					     &exec2_list[i].offset,
-					     sizeof(user_exec_list[i].offset));
-			if (ret) {
-				ret = -EFAULT;
-				DRM_DEBUG("failed to copy %d exec entries "
-					  "back to user\n",
-					  args->buffer_count);
+				gen8_canonical_addr(exec2_list[i].offset & PIN_OFFSET_MASK);
+			if (__copy_to_user(&user_exec_list[i].offset,
+					   &exec2_list[i].offset,
+					   sizeof(user_exec_list[i].offset)))
 				break;
-			}
 		}
 	}
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index cb3a6e272e22..a9b547e4ea6f 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -3594,7 +3594,7 @@ void *i915_vma_iomap(struct drm_i915_private *dev_priv,
 			int ret;
 
 			/* Too many areas already allocated? */
-			ret = i915_gem_evict_vm(vma->vm, true);
+			ret = i915_gem_evict_vm(vma->vm);
 			if (ret)
 				return ERR_PTR(ret);
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 3080033b722c..6996d79175a0 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -244,7 +244,9 @@ struct i915_vma {
 	struct hlist_node obj_node;
 
 	/** This vma's place in the batchbuffer or on the eviction list */
-	struct list_head exec_list;
+	struct list_head exec_link;
+	struct list_head reloc_link;
+	struct list_head evict_link;
 
 	/**
 	 * Used for performing relocations during execbuffer insertion.
-- 
2.7.0.rc3
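
An aside on the canonical-address helpers this patch moves to the top of
the file: the rule quoted from the Broadwell PRM just says that bits
[63:48] must replicate bit 47. A self-contained sketch of the round trip
(plain userspace C, not driver code; sign_extend64() is open-coded here
since the kernel helper is unavailable in userspace):

#include <stdint.h>
#include <stdio.h>

#define GEN8_HIGH_ADDRESS_BIT 47

/* Equivalent of the kernel's sign_extend64(value, index). */
static uint64_t sign_extend64(uint64_t value, int index)
{
	int shift = 63 - index;
	return (uint64_t)((int64_t)(value << shift) >> shift);
}

static uint64_t gen8_canonical_addr(uint64_t address)
{
	return sign_extend64(address, GEN8_HIGH_ADDRESS_BIT);
}

static uint64_t gen8_noncanonical_addr(uint64_t address)
{
	return address & ((1ULL << (GEN8_HIGH_ADDRESS_BIT + 1)) - 1);
}

int main(void)
{
	uint64_t addr = 0x0000800000001000ULL;	/* bit 47 set */

	/* prints 0000800000001000 -> ffff800000001000 -> 0000800000001000 */
	printf("%016llx -> %016llx -> %016llx\n",
	       (unsigned long long)addr,
	       (unsigned long long)gen8_canonical_addr(addr),
	       (unsigned long long)gen8_noncanonical_addr(gen8_canonical_addr(addr)));
	return 0;
}

The lossless round trip is why eb_add_vma() can store pinned offsets in
non-canonical form for drm_mm internally, while the updated offsets are
converted back with gen8_canonical_addr() before being copied out to
userspace.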


* [PATCH 173/190] drm/i915: Wait upon userptr get-user-pages within execbuffer
  2016-01-11 11:00 ` [PATCH 142/190] drm/i915: Fallback to using unmappable memory for scanout Chris Wilson
                     ` (29 preceding siblings ...)
  2016-01-11 11:01   ` [PATCH 172/190] drm/i915: Eliminate lots of iterations over the execobjects array Chris Wilson
@ 2016-01-11 11:01   ` Chris Wilson
  2016-01-11 11:01   ` [PATCH 174/190] drm/i915: Show context objects in debugfs/i915_gem_objects Chris Wilson
                     ` (16 subsequent siblings)
  47 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 11:01 UTC (permalink / raw)
  To: intel-gfx

This simply hides the EAGAIN reported by userptr objects when userspace
causes resource contention: before retaking struct_mutex, the execbuffer
slow path now flushes the outstanding get-user-pages workers.
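
In outline, the fix moves the userptr get-user-pages workers from the
system workqueue onto a dedicated queue, which the execbuffer slow path
can then flush before retrying. A minimal sketch of the pattern (the
userptr_init/userptr_submit/execbuf_slow_path wrappers are illustrative
names, not the driver functions; the workqueue calls are the real kernel
API):

#include <linux/workqueue.h>

static struct workqueue_struct *userptr_wq;

static int userptr_init(void)
{
	/* A dedicated queue means flushing it only waits for our own
	 * workers, not for unrelated work on the system workqueue.
	 */
	userptr_wq = alloc_workqueue("i915-userptr", WQ_HIGHPRI, 0);
	return userptr_wq ? 0 : -ENOMEM;
}

static void userptr_submit(struct work_struct *work)
{
	queue_work(userptr_wq, work);	/* previously schedule_work(work) */
}

static void execbuf_slow_path(void)
{
	/* Wait for all outstanding get-user-pages workers to finish
	 * before retrying, instead of bouncing EAGAIN to userspace.
	 */
	flush_workqueue(userptr_wq);
}

static void userptr_cleanup(void)
{
	destroy_workqueue(userptr_wq);
}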

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_dma.c            |  1 +
 drivers/gpu/drm/i915/i915_drv.h            |  8 ++++++++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  3 +++
 drivers/gpu/drm/i915/i915_gem_userptr.c    | 16 +++++++++++++---
 4 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 7d85c3bea02a..c1afbd873197 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1192,6 +1192,7 @@ int i915_driver_unload(struct drm_device *dev)
 	mutex_unlock(&dev->struct_mutex);
 	intel_fbc_cleanup_cfb(dev_priv);
 	i915_gem_cleanup_stolen(dev);
+	i915_gem_cleanup_userptr(dev);
 
 	intel_csr_ucode_fini(dev_priv);
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 601ef7412cf9..a4311e2d2140 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1295,6 +1295,13 @@ struct i915_gem_mm {
 	struct delayed_work idle_work;
 
 	/**
+	 * Workqueue to fault in userptr pages, flushed by the execbuf
+	 * when required but otherwise left to userspace to try again
+	 * on EAGAIN.
+	 */
+	struct workqueue_struct *userptr_wq;
+
+	/**
 	 * Are we in a non-interruptible section of code like
 	 * modesetting?
 	 */
@@ -2724,6 +2731,7 @@ int i915_gem_set_tiling(struct drm_device *dev, void *data,
 int i915_gem_get_tiling(struct drm_device *dev, void *data,
 			struct drm_file *file_priv);
 int i915_gem_init_userptr(struct drm_device *dev);
+void i915_gem_cleanup_userptr(struct drm_device *dev);
 int i915_gem_userptr_ioctl(struct drm_device *dev, void *data,
 			   struct drm_file *file);
 int i915_gem_get_aperture_ioctl(struct drm_device *dev, void *data,
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index f40d3254249a..733250afa139 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1133,6 +1133,9 @@ repeat:
 		}
 	}
 
+	/* A frequent cause of EAGAIN is currently unavailable client pages */
+	flush_workqueue(eb->i915->mm.userptr_wq);
+
 	ret = i915_mutex_lock_interruptible(dev);
 	if (ret) {
 		mutex_lock(&dev->struct_mutex);
diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
index 232ce85b39db..54385f6c7e14 100644
--- a/drivers/gpu/drm/i915/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
@@ -102,7 +102,8 @@ static unsigned long cancel_userptr(struct i915_mmu_object *mo)
 	 * is freed and then double free it.
 	 */
 	if (mo->active && kref_get_unless_zero(&mo->obj->base.refcount)) {
-		schedule_work(&mo->work);
+		queue_work(to_i915(mo->obj->base.dev)->mm.userptr_wq,
+			   &mo->work);
 		/* only schedule one work packet to avoid the refleak */
 		mo->active = false;
 	}
@@ -450,7 +451,7 @@ __i915_mm_struct_free(struct kref *kref)
 	mutex_unlock(&to_i915(mm->dev)->mm_lock);
 
 	INIT_WORK(&mm->work, __i915_mm_struct_free__worker);
-	schedule_work(&mm->work);
+	queue_work(to_i915(mm->dev)->mm.userptr_wq, &mm->work);
 }
 
 static void
@@ -664,7 +665,7 @@ __i915_gem_userptr_get_pages_schedule(struct drm_i915_gem_object *obj,
 	get_task_struct(work->task);
 
 	INIT_WORK(&work->work, __i915_gem_userptr_get_pages_worker);
-	schedule_work(&work->work);
+	queue_work(to_i915(obj->base.dev)->mm.userptr_wq, &work->work);
 
 	*active = true;
 	return -EAGAIN;
@@ -886,5 +887,14 @@ i915_gem_init_userptr(struct drm_device *dev)
 	struct drm_i915_private *dev_priv = to_i915(dev);
 	mutex_init(&dev_priv->mm_lock);
 	hash_init(dev_priv->mm_structs);
+	dev_priv->mm.userptr_wq =
+		alloc_workqueue("i915-userptr", WQ_HIGHPRI, 0);
 	return 0;
 }
+
+void
+i915_gem_cleanup_userptr(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = to_i915(dev);
+	destroy_workqueue(dev_priv->mm.userptr_wq);
+}
-- 
2.7.0.rc3


* [PATCH 174/190] drm/i915: Show context objects in debugfs/i915_gem_objects
  2016-01-11 11:00 ` [PATCH 142/190] drm/i915: Fallback to using unmappable memory for scanout Chris Wilson
                     ` (30 preceding siblings ...)
  2016-01-11 11:01   ` [PATCH 173/190] drm/i915: Wait upon userptr get-user-pages within execbuffer Chris Wilson
@ 2016-01-11 11:01   ` Chris Wilson
  2016-03-24  7:58     ` David Weinehall
  2016-01-11 11:01   ` [PATCH 175/190] drm/i915: Remove superfluous i915_add_request_no_flush() helper Chris Wilson
                     ` (15 subsequent siblings)
  47 siblings, 1 reply; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 11:01 UTC (permalink / raw)
  To: intel-gfx

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 19b0d6a7680d..f8ca00ce986e 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -370,6 +370,40 @@ static void print_batch_pool_stats(struct seq_file *m,
 	print_file_stats(m, "[k]batch pool", stats);
 }
 
+static int per_file_ctx_stats(int id, void *ptr, void *data)
+{
+	struct intel_context *ctx = ptr;
+	int n;
+
+	for (n = 0; n < ARRAY_SIZE(ctx->engine); n++) {
+		if (ctx->engine[n].state)
+			per_file_stats(0, ctx->engine[n].state, data);
+		if (ctx->engine[n].ring)
+			per_file_stats(0, ctx->engine[n].ring->obj, data);
+	}
+
+	return 0;
+}
+
+static void print_context_stats(struct seq_file *m,
+				struct drm_i915_private *dev_priv)
+{
+	struct file_stats stats;
+	struct drm_file *file;
+
+	memset(&stats, 0, sizeof(stats));
+
+	if (dev_priv->kernel_context)
+		per_file_ctx_stats(0, dev_priv->kernel_context, &stats);
+
+	list_for_each_entry(file, &dev_priv->dev->filelist, lhead) {
+		struct drm_i915_file_private *fpriv = file->driver_priv;
+		idr_for_each(&fpriv->context_idr, per_file_ctx_stats, &stats);
+	}
+
+	print_file_stats(m, "[k]contexts", stats);
+}
+
 #define count_vmas(list, member) do { \
 	list_for_each_entry(vma, list, member) { \
 		size += vma->size; \
@@ -471,6 +505,7 @@ static int i915_gem_object_info(struct seq_file *m, void* data)
 
 	seq_putc(m, '\n');
 	print_batch_pool_stats(m, dev_priv);
+	print_context_stats(m, dev_priv);
 	list_for_each_entry_reverse(file, &dev->filelist, lhead) {
 		struct file_stats stats;
 		struct drm_i915_file_private *file_priv = file->driver_priv;
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 175/190] drm/i915: Remove superfluous i915_add_request_no_flush() helper
  2016-01-11 11:00 ` [PATCH 142/190] drm/i915: Fallback to using unmappable memory for scanout Chris Wilson
                     ` (31 preceding siblings ...)
  2016-01-11 11:01   ` [PATCH 174/190] drm/i915: Show context objects in debugfs/i915_gem_objects Chris Wilson
@ 2016-01-11 11:01   ` Chris Wilson
  2016-01-11 11:01   ` [PATCH 176/190] drm/i915: Use the MRU stack search after evicting Chris Wilson
                     ` (14 subsequent siblings)
  47 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 11:01 UTC (permalink / raw)
  To: intel-gfx

The only time we need to emit a flush inside request emission is after
an execbuffer, for which we can use the full __i915_add_request(). All
other instances want the simpler i915_add_request() without flushing, so
remove the useless helper.
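
As a quick sketch of the resulting convention (the call sites below are
illustrative, not taken from the diff; only the macro is):

	#define i915_add_request(req) \
		__i915_add_request(req, false)

	__i915_add_request(req, true);	/* execbuffer: the one flushing site */
	i915_add_request(req);		/* every other request emission site */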

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_evict.c   | 2 +-
 drivers/gpu/drm/i915/i915_gem_request.h | 2 --
 drivers/gpu/drm/i915/intel_display.c    | 4 ++--
 3 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index e71b89bac168..56b57bdf22ab 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -53,7 +53,7 @@ static int switch_to_pinned_context(struct drm_i915_private *dev_priv)
 			return PTR_ERR(req);
 
 		ret = i915_switch_context(req);
-		i915_add_request_no_flush(req);
+		i915_add_request(req);
 		if (ret)
 			return ret;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index 1e7c4fff5257..434e028f0411 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -148,8 +148,6 @@ static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
 
 void __i915_add_request(struct drm_i915_gem_request *req, bool flush_caches);
 #define i915_add_request(req) \
-	__i915_add_request(req, true)
-#define i915_add_request_no_flush(req) \
 	__i915_add_request(req, false)
 
 struct intel_rps_client;
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index e518d3300a3e..b1fb43fcfeea 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -11698,7 +11698,7 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
 		if (ret)
 			goto cleanup_unpin;
 
-		i915_add_request_no_flush(request);
+		i915_add_request(request);
 		i915_gem_request_assign(&work->flip_queued_req, request);
 	}
 
@@ -11721,7 +11721,7 @@ cleanup_unpin:
 	intel_unpin_fb_obj(fb, crtc->primary->state);
 cleanup_request:
 	if (request)
-		i915_add_request_no_flush(request);
+		i915_add_request(request);
 cleanup_pending:
 	atomic_dec(&intel_crtc->unpin_work_count);
 	mutex_unlock(&dev->struct_mutex);
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 176/190] drm/i915: Use the MRU stack search after evicting
  2016-01-11 11:00 ` [PATCH 142/190] drm/i915: Fallback to using unmappable memory for scanout Chris Wilson
                     ` (32 preceding siblings ...)
  2016-01-11 11:01   ` [PATCH 175/190] drm/i915: Remove superfluous i915_add_request_no_flush() helper Chris Wilson
@ 2016-01-11 11:01   ` Chris Wilson
  2016-01-11 11:01   ` [PATCH 177/190] drm/i915: Use VMA as the primary object for context state Chris Wilson
                     ` (13 subsequent siblings)
  47 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 11:01 UTC (permalink / raw)
  To: intel-gfx

When we evict from the GTT to make room for an object, the hole we
create is put onto the MRU stack inside the drm_mm range manager. On the
next search pass, we can speed up a PIN_HIGH allocation by referencing
that stack for the new hole.
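
For context, the allocation path this hunk touches looks roughly like
the sketch below (heavily simplified; the PIN_HIGH branch choosing
DRM_MM_SEARCH_BELOW is an assumption drawn from the surrounding code,
and the elided arguments are marked as such):

	search_flag = (flags & PIN_HIGH) ?
		      DRM_MM_SEARCH_BELOW : DRM_MM_SEARCH_DEFAULT;
search_free:
	ret = drm_mm_insert_node_in_range_generic(/* mm, node, size, ... */
						  search_flag, alloc_flag);
	if (ret) {
		ret = i915_gem_evict_something(/* vm, size, ... */);
		if (ret == 0) {
			/* eviction just pushed a fresh hole onto the
			 * drm_mm MRU stack; a DEFAULT search hits it
			 * first instead of rescanning from the top */
			search_flag = DRM_MM_SEARCH_DEFAULT;
			goto search_free;
		}
		goto err_unpin;
	}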

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 0bd6db4e83d9..a7cad2c7c034 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2873,8 +2873,10 @@ search_free:
 						       obj->cache_level,
 						       start, end,
 						       flags);
-			if (ret == 0)
+			if (ret == 0) {
+				search_flag = DRM_MM_SEARCH_DEFAULT;
 				goto search_free;
+			}
 
 			goto err_unpin;
 		}
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 177/190] drm/i915: Use VMA as the primary object for context state
  2016-01-11 11:00 ` [PATCH 142/190] drm/i915: Fallback to using unmappable memory for scanout Chris Wilson
                     ` (33 preceding siblings ...)
  2016-01-11 11:01   ` [PATCH 176/190] drm/i915: Use the MRU stack search after evicting Chris Wilson
@ 2016-01-11 11:01   ` Chris Wilson
  2016-01-11 11:01   ` [PATCH 178/190] drm/i915: Do an inline flush-active before dropping the mutex when waiting Chris Wilson
                     ` (12 subsequent siblings)
  47 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 11:01 UTC (permalink / raw)
  To: intel-gfx

When working with contexts, what we want first and foremost is the GGTT
VMA for the context state. Since the object is always available via the
VMA, we need only store the VMA.
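
A hedged sketch of the resulting access pattern (field names as in the
diff; the RCS index is just an example):

	struct intel_context_engine *ce = &ctx->engine[RCS];

	u64 offset = ce->state->node.start;	/* the VMA is what we store */
	struct drm_i915_gem_object *obj = ce->state->obj; /* object via VMA */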

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c        | 13 ++--
 drivers/gpu/drm/i915/i915_drv.h            |  3 +-
 drivers/gpu/drm/i915/i915_gem_context.c    | 99 ++++++++++++++++--------------
 drivers/gpu/drm/i915/i915_gpu_error.c      |  2 +-
 drivers/gpu/drm/i915/i915_guc_submission.c |  6 +-
 drivers/gpu/drm/i915/intel_lrc.c           | 53 ++++++++--------
 6 files changed, 93 insertions(+), 83 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index f8ca00ce986e..7fb4088b3966 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -377,7 +377,7 @@ static int per_file_ctx_stats(int id, void *ptr, void *data)
 
 	for (n = 0; n < ARRAY_SIZE(ctx->engine); n++) {
 		if (ctx->engine[n].state)
-			per_file_stats(0, ctx->engine[n].state, data);
+			per_file_stats(0, ctx->engine[n].state->obj, data);
 		if (ctx->engine[n].ring)
 			per_file_stats(0, ctx->engine[n].ring->obj, data);
 	}
@@ -2002,7 +2002,7 @@ static int i915_context_status(struct seq_file *m, void *unused)
 			seq_printf(m, "%s: ", ring->name);
 			seq_putc(m, ctx->engine[i].initialised ? 'I' : 'i');
 			if (ctx->engine[i].state)
-				describe_obj(m, ctx->engine[i].state);
+				describe_obj(m, ctx->engine[i].state->obj);
 			if (ctx->engine[i].ring)
 				describe_ctx_ring(m, ctx->engine[i].ring);
 			seq_putc(m, '\n');
@@ -2025,14 +2025,13 @@ static void i915_dump_lrc_obj(struct seq_file *m,
 			      struct intel_engine_cs *ring,
 			      struct intel_context *ctx)
 {
-	struct drm_i915_gem_object *obj = ctx->engine[ring->id].state;
-	struct i915_vma *vma = ctx->engine[ring->id].vma;
+	struct i915_vma *vma = ctx->engine[ring->id].state;
 	struct page *page;
 	int j;
 
 	seq_printf(m, "CONTEXT: %s\n", ring->name);
 
-	if (obj == NULL) {
+	if (vma == NULL) {
 		seq_printf(m, "\tUnallocated\n\n");
 		return;
 	}
@@ -2045,12 +2044,12 @@ static void i915_dump_lrc_obj(struct seq_file *m,
 			   lower_32_bits(vma->node.start));
 	}
 
-	if (i915_gem_object_get_pages(obj)) {
+	if (i915_gem_object_get_pages(vma->obj)) {
 		seq_puts(m, "\tFailed to get pages for context object\n\n");
 		return;
 	}
 
-	page = i915_gem_object_get_page(obj, LRC_STATE_PN);
+	page = i915_gem_object_get_page(vma->obj, LRC_STATE_PN);
 	if (page != NULL) {
 		uint32_t *reg_state = kmap_atomic(page);
 		for (j = 0; j < 0x600 / sizeof(u32) / 4; j += 4) {
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index a4311e2d2140..6827e26b5681 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -886,8 +886,7 @@ struct intel_context {
 #define CONTEXT_NO_ERROR_CAPTURE	(1<<1)
 
 	struct intel_context_engine {
-		struct drm_i915_gem_object *state;
-		struct i915_vma *vma;
+		struct i915_vma *state;
 		struct intel_ring *ring;
 		int pin_count;
 		bool initialised;
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 4e0c5e161e84..0c4864eca5f6 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -214,7 +214,7 @@ void i915_gem_context_free(struct kref *ctx_ref)
 		if (ce->ring)
 			intel_ring_free(ce->ring);
 
-		__i915_gem_object_release_unless_active(ce->state);
+		__i915_gem_object_release_unless_active(ce->state->obj);
 	}
 
 	decouple_vma(ctx);
@@ -322,13 +322,26 @@ __create_hw_context(struct drm_device *dev,
 	INIT_WORK(&ctx->vma_ht_resize, resize_vma_ht);
 
 	if (dev_priv->hw_context_size) {
-		struct drm_i915_gem_object *obj =
-				i915_gem_alloc_context_obj(dev, dev_priv->hw_context_size);
+		struct drm_i915_gem_object *obj;
+		struct i915_vma *vma;
+
+		obj = i915_gem_alloc_context_obj(dev,
+						 dev_priv->hw_context_size);
 		if (IS_ERR(obj)) {
 			ret = PTR_ERR(obj);
 			goto err_out;
 		}
-		ctx->engine[RCS].state = obj;
+
+		vma = i915_gem_obj_lookup_or_create_vma(obj,
+							&dev_priv->gtt.base,
+							NULL);
+		if (IS_ERR(vma)) {
+			drm_gem_object_unreference(&obj->base);
+			ret = PTR_ERR(vma);
+			goto err_out;
+		}
+
+		ctx->engine[RCS].state = vma;
 	}
 
 	/* Default context will never have a file_priv */
@@ -421,10 +434,8 @@ void i915_gem_context_reset(struct drm_device *dev)
 		if (lctx == NULL)
 			continue;
 
-		if (lctx->engine[i].vma) {
-			i915_vma_unpin(lctx->engine[i].vma);
-			lctx->engine[i].vma = NULL;
-		}
+		if (lctx->engine[i].state)
+			i915_vma_unpin(lctx->engine[i].state);
 
 		i915_gem_context_unreference(lctx);
 		ring->last_context = NULL;
@@ -469,8 +480,7 @@ int i915_gem_context_init(struct drm_device *dev)
 	}
 
 	if (ctx->engine[RCS].state) {
-		u32 alignment = get_context_alignment(dev);
-		struct i915_vma *vma;
+		int ret;
 
 		/* We may need to do things with the shrinker which
 		 * require us to immediately switch back to the default
@@ -479,13 +489,14 @@ int i915_gem_context_init(struct drm_device *dev)
 		 * be available. To avoid this we always pin the default
 		 * context.
 		 */
-		vma = i915_gem_object_ggtt_pin(ctx->engine[RCS].state,
-					       NULL, 0, alignment, PIN_HIGH);
-		if (IS_ERR(vma)) {
+		ret = i915_vma_pin(ctx->engine[RCS].state,
+				   0, get_context_alignment(dev),
+				   PIN_GLOBAL | PIN_HIGH);
+		if (ret) {
 			DRM_ERROR("Failed to pinned default global context (error %d)\n",
-				  (int)PTR_ERR(vma));
+				  ret);
 			i915_gem_context_unreference(ctx);
-			return PTR_ERR(vma);
+			return ret;
 		}
 	}
 
@@ -518,13 +529,13 @@ void i915_gem_context_fini(struct drm_device *dev)
 		WARN_ON(!dev_priv->ring[RCS].last_context);
 		if (dev_priv->ring[RCS].last_context == dctx) {
 			/* Fake switch to NULL context */
-			WARN_ON(dctx->engine[RCS].vma->active);
-			i915_vma_unpin(dctx->engine[RCS].vma);
+			WARN_ON(dctx->engine[RCS].state->active);
+			i915_vma_unpin(dctx->engine[RCS].state);
 			i915_gem_context_unreference(dctx);
 			dev_priv->ring[RCS].last_context = NULL;
 		}
 
-		i915_vma_unpin(dctx->engine[RCS].vma);
+		i915_vma_unpin(dctx->engine[RCS].state);
 	}
 
 	for (i = 0; i < I915_NUM_RINGS; i++) {
@@ -644,7 +655,7 @@ mi_set_context(struct drm_i915_gem_request *req, u32 hw_flags)
 
 	intel_ring_emit(ring, MI_NOOP);
 	intel_ring_emit(ring, MI_SET_CONTEXT);
-	intel_ring_emit(ring, req->ctx->engine[RCS].vma->node.start | flags);
+	intel_ring_emit(ring, req->ctx->engine[RCS].state->node.start | flags);
 	/*
 	 * w/a: MI_SET_CONTEXT must always be followed by MI_NOOP
 	 * WaMiSetContext_Hang:snb,ivb,vlv
@@ -752,6 +763,20 @@ static int remap_l3(struct drm_i915_gem_request *req, int slice)
 	return 0;
 }
 
+static void flush_cpu_writes(struct drm_i915_gem_object *obj)
+{
+	if (obj->base.write_domain == 0)
+		return;
+
+	if (obj->base.write_domain & I915_GEM_DOMAIN_CPU) {
+		if (i915_gem_clflush_object(obj, false))
+			i915_gem_chipset_flush(obj->base.dev);
+	}
+
+	wmb();
+	obj->base.write_domain = 0;
+}
+
 static int do_switch(struct drm_i915_gem_request *req)
 {
 	struct intel_context *to = req->ctx;
@@ -765,20 +790,11 @@ static int do_switch(struct drm_i915_gem_request *req)
 
 	/* Trying to pin first makes error handling easier. */
 	if (engine->id == RCS) {
-		u32 alignment = get_context_alignment(engine->dev);
-		struct i915_vma *vma;
-
-		vma = i915_gem_object_ggtt_pin(to->engine[RCS].state,
-					       NULL, 0, alignment, PIN_HIGH);
-		if (IS_ERR(vma))
-			return PTR_ERR(vma);
-
-		to->engine[RCS].vma = vma;
-
-		if (WARN_ON(!(vma->bound & GLOBAL_BIND))) {
-			ret = -ENODEV;
-			goto unpin_out;
-		}
+		ret = i915_vma_pin(to->engine[RCS].state,
+				   0, get_context_alignment(engine->dev),
+				   PIN_GLOBAL | PIN_HIGH);
+		if (ret)
+			return ret;
 	}
 
 	/*
@@ -813,16 +829,9 @@ static int do_switch(struct drm_i915_gem_request *req)
 	}
 
 	/*
-	 * Clear this page out of any CPU caches for coherent swap-in/out. Note
-	 * that thanks to write = false in this call and us not setting any gpu
-	 * write domains when putting a context object onto the active list
-	 * (when switching away from it), this won't block.
-	 *
-	 * XXX: We need a real interface to do this instead of trickery.
+	 * Clear this page out of any CPU caches for coherent swap-in/out.
 	 */
-	ret = i915_gem_object_set_to_gtt_domain(to->engine[RCS].state, false);
-	if (ret)
-		goto unpin_out;
+	flush_cpu_writes(to->engine[RCS].state->obj);
 
 	if (!to->engine[RCS].initialised || i915_gem_context_is_default(to)) {
 		hw_flags |= MI_RESTORE_INHIBIT;
@@ -896,9 +905,9 @@ static int do_switch(struct drm_i915_gem_request *req)
 		 * able to defer doing this until we know the object would be
 		 * swapped, but there is no way to do that yet.
 		 */
-		i915_vma_move_to_active(from->engine[RCS].vma, req, 0);
+		i915_vma_move_to_active(from->engine[RCS].state, req, 0);
 		/* obj is kept alive until the next request by its active ref */
-		i915_vma_unpin(from->engine[RCS].vma);
+		i915_vma_unpin(from->engine[RCS].state);
 
 		i915_gem_context_unreference(from);
 	}
@@ -910,7 +919,7 @@ done:
 
 unpin_out:
 	if (engine->id == RCS)
-		i915_vma_unpin(to->engine[RCS].vma);
+		i915_vma_unpin(to->engine[RCS].state);
 	return ret;
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 6fbb11a53b60..f0249dafda07 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -986,7 +986,7 @@ static void i915_gem_record_rings(struct drm_device *dev,
 
 			error->ring[i].ctx =
 				i915_error_object_create(dev_priv,
-							 request->ctx->engine[i].vma);
+							 request->ctx->engine[i].state);
 
 			pid = request->ctx->pid;
 			if (pid) {
diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index d6df94129796..17d23f55367c 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -407,7 +407,7 @@ static void guc_init_ctx_desc(struct intel_guc *guc,
 		lrc->context_desc = engine->execlist_context_descriptor;
 
 		/* The state page is after PPHWSP */
-		lrc->ring_lcra = ctx->engine[i].vma->node.start +
+		lrc->ring_lcra = ctx->engine[i].state->node.start +
 				LRC_STATE_PN * PAGE_SIZE;
 		lrc->context_id = (client->ctx_index << GUC_ELC_CTXID_OFFSET) |
 				(engine->id << GUC_ELC_ENGINE_OFFSET);
@@ -987,7 +987,7 @@ int intel_guc_suspend(struct drm_device *dev)
 	/* any value greater than GUC_POWER_D0 */
 	data[1] = GUC_POWER_D1;
 	/* first page is shared data with GuC */
-	data[2] = ctx->engine[RCS].vma->node.start;
+	data[2] = ctx->engine[RCS].state->node.start;
 
 	return host2guc_action(guc, data, ARRAY_SIZE(data));
 }
@@ -1012,7 +1012,7 @@ int intel_guc_resume(struct drm_device *dev)
 	data[0] = HOST2GUC_ACTION_EXIT_S_STATE;
 	data[1] = GUC_POWER_D0;
 	/* first page is shared data with GuC */
-	data[2] = ctx->engine[RCS].vma->node.start;
+	data[2] = ctx->engine[RCS].state->node.start;
 
 	return host2guc_action(guc, data, ARRAY_SIZE(data));
 }
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 3e61fce1326e..e687b191453a 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -571,7 +571,6 @@ static int intel_lr_context_pin(struct intel_context *ctx,
 				struct intel_engine_cs *engine)
 {
 	struct intel_context_engine *ce = &ctx->engine[engine->id];
-	struct i915_vma *vma;
 	struct intel_ring *ring;
 	u32 ggtt_offset;
 	int ret;
@@ -581,14 +580,11 @@ static int intel_lr_context_pin(struct intel_context *ctx,
 
 	lockdep_assert_held(&engine->dev->struct_mutex);
 
-	vma = i915_gem_object_ggtt_pin(ce->state, NULL,
-				       0, GEN8_LR_CONTEXT_ALIGN,
-				       PIN_OFFSET_BIAS | GUC_WOPCM_TOP |
-				       PIN_HIGH);
-	if (IS_ERR(vma)) {
-		ret = PTR_ERR(vma);
+	ret = i915_vma_pin(ce->state, 0, GEN8_LR_CONTEXT_ALIGN,
+			   PIN_OFFSET_BIAS | GUC_WOPCM_TOP |
+			   PIN_GLOBAL | PIN_HIGH);
+	if (ret)
 		goto err;
-	}
 
 	ring = ce->ring;
 	ret = intel_ring_map(ring);
@@ -596,15 +592,15 @@ static int intel_lr_context_pin(struct intel_context *ctx,
 		goto unpin;
 
 	i915_gem_context_reference(ctx);
-	ce->vma = vma;
-	i915_gem_object_set_dirty(vma->obj);
+	i915_gem_object_set_dirty(ce->state->obj);
 
-	ggtt_offset = vma->node.start + LRC_PPHWSP_PN * PAGE_SIZE;
+	ggtt_offset = ce->state->node.start + LRC_PPHWSP_PN * PAGE_SIZE;
 	ring->context_descriptor =
 		ggtt_offset | engine->execlist_context_descriptor;
 
 	ring->registers =
-		kmap(i915_gem_object_get_dirty_page(vma->obj, LRC_STATE_PN));
+		kmap(i915_gem_object_get_dirty_page(ce->state->obj,
+						    LRC_STATE_PN));
 	ring->registers[CTX_RING_BUFFER_START+1] = ring->vma->node.start;
 
 	/* Invalidate GuC TLB. */
@@ -616,7 +612,7 @@ static int intel_lr_context_pin(struct intel_context *ctx,
 	return 0;
 
 unpin:
-	__i915_vma_unpin(vma);
+	__i915_vma_unpin(ce->state);
 err:
 	ce->pin_count = 0;
 	return ret;
@@ -626,7 +622,6 @@ void intel_lr_context_unpin(struct intel_context *ctx,
 			    struct intel_engine_cs *engine)
 {
 	struct intel_context_engine *ce = &ctx->engine[engine->id];
-	struct i915_vma *vma;
 
 	lockdep_assert_held(&engine->dev->struct_mutex);
 	if (--ce->pin_count)
@@ -634,9 +629,8 @@ void intel_lr_context_unpin(struct intel_context *ctx,
 
 	intel_ring_unmap(ce->ring);
 
-	vma = ce->vma;
-	kunmap(i915_gem_object_get_page(vma->obj, LRC_STATE_PN));
-	i915_vma_unpin(vma);
+	kunmap(i915_gem_object_get_page(ce->state->obj, LRC_STATE_PN));
+	i915_vma_unpin(ce->state);
 
 	i915_gem_context_unreference(ctx);
 }
@@ -1064,7 +1058,7 @@ static int gen8_init_common_ring(struct intel_engine_cs *ring)
 	struct drm_device *dev = ring->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	lrc_setup_hardware_status_page(ring,
-			dev_priv->kernel_context->engine[ring->id].vma);
+			dev_priv->kernel_context->engine[ring->id].state);
 
 	I915_WRITE_IMR(ring, ~(ring->irq_enable_mask | ring->irq_keep_mask));
 	I915_WRITE(RING_HWSTAM(ring->mmio_base), 0xffffffff);
@@ -1935,6 +1929,7 @@ static int execlists_context_deferred_alloc(struct intel_context *ctx,
 {
 	struct intel_context_engine *ce = &ctx->engine[engine->id];
 	struct drm_i915_gem_object *ctx_obj;
+	struct i915_vma *vma;
 	uint32_t context_size;
 	struct intel_ring *ring;
 	int ret;
@@ -1952,6 +1947,14 @@ static int execlists_context_deferred_alloc(struct intel_context *ctx,
 		return -ENOMEM;
 	}
 
+	vma = i915_gem_obj_lookup_or_create_vma(ctx_obj,
+						&engine->i915->gtt.base,
+						NULL);
+	if (IS_ERR(vma)) {
+		ret = PTR_ERR(vma);
+		goto error_deref_obj;
+	}
+
 	ring = intel_engine_create_ring(engine, 4 * PAGE_SIZE);
 	if (IS_ERR(ring)) {
 		ret = PTR_ERR(ring);
@@ -1965,7 +1968,7 @@ static int execlists_context_deferred_alloc(struct intel_context *ctx,
 	}
 
 	ce->ring = ring;
-	ce->state = ctx_obj;
+	ce->state = vma;
 	ce->initialised = engine->init_context == NULL;
 
 	return 0;
@@ -1974,8 +1977,6 @@ error_ringbuf:
 	intel_ring_free(ring);
 error_deref_obj:
 	drm_gem_object_unreference(&ctx_obj->base);
-	ce->ring = NULL;
-	ce->state = NULL;
 	return ret;
 }
 
@@ -1988,17 +1989,19 @@ void intel_lr_context_reset(struct drm_device *dev,
 
 	for_each_ring(unused, dev_priv, i) {
 		struct intel_context_engine *ce = &ctx->engine[i];
+		struct drm_i915_gem_object *obj;
 		uint32_t *reg_state;
 		struct page *page;
 
 		if (ce->state == NULL)
 			continue;
 
-		if (i915_gem_object_get_pages(ce->state)) {
-			WARN(1, "Failed get_pages for context obj\n");
+		obj = ce->state->obj;
+		if (WARN(i915_gem_object_get_pages(obj),
+			 "Failed get_pages for context obj\n"))
 			continue;
-		}
-		page = i915_gem_object_get_dirty_page(ce->state, LRC_STATE_PN);
+
+		page = i915_gem_object_get_dirty_page(obj, LRC_STATE_PN);
 		reg_state = kmap_atomic(page);
 
 		reg_state[CTX_RING_HEAD+1] = ce->ring->head;
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 178/190] drm/i915: Do an inline flush-active before dropping the mutex when waiting
  2016-01-11 11:00 ` [PATCH 142/190] drm/i915: Fallback to using unmappable memory for scanout Chris Wilson
                     ` (34 preceding siblings ...)
  2016-01-11 11:01   ` [PATCH 177/190] drm/i915: Use VMA as the primary object for context state Chris Wilson
@ 2016-01-11 11:01   ` Chris Wilson
  2016-01-11 11:01   ` [PATCH 179/190] drm/i915: Skip MI_SET_CONTEXT for the same context Chris Wilson
                     ` (11 subsequent siblings)
  47 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 11:01 UTC (permalink / raw)
  To: intel-gfx

In a nonblocking wait, we gather up all the outstanding requests and
then drop the mutex. However, if all those requests have already
completed, we do not need to wait upon them and can exit early without
having to drop and reacquire the struct_mutex.
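
Schematically, the fast path this adds (condensed from the diff below):

	/* under struct_mutex: keep only the still-busy requests */
	if (i915_gem_request_completed(req))
		i915_gem_request_retire_upto(req);
	else
		requests[n++] = i915_gem_request_get(req);

	/* everything already completed? never drop the lock at all */
	if (n == 0)
		return 0;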

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index a7cad2c7c034..d452499ae5a9 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1226,7 +1226,10 @@ i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
 		if (req == NULL)
 			return 0;
 
-		requests[n++] = i915_gem_request_get(req);
+		if (i915_gem_request_completed(req))
+			i915_gem_request_retire_upto(req);
+		else
+			requests[n++] = i915_gem_request_get(req);
 	} else {
 		for (i = 0; i < I915_NUM_RINGS; i++) {
 			struct drm_i915_gem_request *req;
@@ -1235,10 +1238,16 @@ i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
 			if (req == NULL)
 				continue;
 
-			requests[n++] = i915_gem_request_get(req);
+			if (i915_gem_request_completed(req))
+				i915_gem_request_retire_upto(req);
+			else
+				requests[n++] = i915_gem_request_get(req);
 		}
 	}
 
+	if (n == 0)
+		return 0;
+
 	mutex_unlock(&dev->struct_mutex);
 	ret = 0;
 	for (i = 0; ret == 0 && i < n; i++)
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 179/190] drm/i915: Skip MI_SET_CONTEXT for the same context
  2016-01-11 11:00 ` [PATCH 142/190] drm/i915: Fallback to using unmappable memory for scanout Chris Wilson
                     ` (35 preceding siblings ...)
  2016-01-11 11:01   ` [PATCH 178/190] drm/i915: Do an inline flush-active before dropping the mutex when waiting Chris Wilson
@ 2016-01-11 11:01   ` Chris Wilson
  2016-01-11 11:01   ` [PATCH 180/190] drm/i915: Micro-optimise i915_gem_object_get_dirty_page() Chris Wilson
                     ` (10 subsequent siblings)
  47 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 11:01 UTC (permalink / raw)
  To: intel-gfx

Fixes regression from

commit 71b7e54f71b899db9f8def67a0e976969384e699
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Tue Apr 14 17:35:18 2015 +0200

    drm/i915: Don't look at pg_dirty_rings for aliasing ppgtt

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 0c4864eca5f6..060e902afd1c 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -680,7 +680,7 @@ mi_set_context(struct drm_i915_gem_request *req, u32 hw_flags)
 
 	intel_ring_advance(ring);
 
-	return ret;
+	return 0;
 }
 
 static inline bool should_skip_switch(struct intel_engine_cs *ring,
@@ -690,9 +690,13 @@ static inline bool should_skip_switch(struct intel_engine_cs *ring,
 	if (to->remap_slice)
 		return false;
 
-	if (to->ppgtt && from == to &&
-	    !(intel_engine_flag(ring) & to->ppgtt->pd_dirty_rings))
-		return true;
+	if (from == to) {
+		if (to->ppgtt == NULL)
+			return true;
+
+		if (!(intel_engine_flag(ring) & to->ppgtt->pd_dirty_rings))
+			return true;
+	}
 
 	return false;
 }
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 180/190] drm/i915: Micro-optimise i915_gem_object_get_dirty_page()
  2016-01-11 11:00 ` [PATCH 142/190] drm/i915: Fallback to using unmappable memory for scanout Chris Wilson
                     ` (36 preceding siblings ...)
  2016-01-11 11:01   ` [PATCH 179/190] drm/i915: Skip MI_SET_CONTEXT for the same context Chris Wilson
@ 2016-01-11 11:01   ` Chris Wilson
  2016-01-11 11:01   ` [PATCH 181/190] drm/i915: Introduce an internal allocator for disposable private objects Chris Wilson
                     ` (9 subsequent siblings)
  47 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 11:01 UTC (permalink / raw)
  To: intel-gfx

We can skip the set_page_dirty() calls if we already know that the
entire object is dirty. Furthermore, the WARN is redundant (we'll crash
shortly afterwards) but adds substantial overhead to the function
(roughly increasing the per-page relocation cost by 10%).

Fixes regression from
commit 033908aed5a596f6202c848c6bbc8a40fb1a8490
Author: Dave Gordon <david.s.gordon@intel.com>
Date:   Thu Dec 10 18:51:23 2015 +0000

    drm/i915: mark GEM object pages dirty when mapped & written by the CPU

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h |  8 +++++---
 drivers/gpu/drm/i915/i915_gem.c | 14 +++++---------
 2 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 6827e26b5681..2f8b5e7f9320 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2807,16 +2807,18 @@ int i915_gem_obj_prepare_shmem_write(struct drm_i915_gem_object *obj,
 
 int __must_check i915_gem_object_get_pages(struct drm_i915_gem_object *obj);
 
-static inline int __sg_page_count(struct scatterlist *sg)
+static inline int __sg_page_count(const struct scatterlist *sg)
 {
 	return sg->length >> PAGE_SHIFT;
 }
 
 struct page *
-i915_gem_object_get_dirty_page(struct drm_i915_gem_object *obj, int n);
+i915_gem_object_get_dirty_page(struct drm_i915_gem_object *obj,
+			       unsigned int n);
 
 static inline struct page *
-i915_gem_object_get_page(struct drm_i915_gem_object *obj, int n)
+i915_gem_object_get_page(struct drm_i915_gem_object *obj,
+			 unsigned int n)
 {
 	if (WARN_ON(n >= obj->base.size >> PAGE_SHIFT))
 		return NULL;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d452499ae5a9..9cd161645041 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4388,16 +4388,12 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
 
 /* Like i915_gem_object_get_page(), but mark the returned page dirty */
 struct page *
-i915_gem_object_get_dirty_page(struct drm_i915_gem_object *obj, int n)
+i915_gem_object_get_dirty_page(struct drm_i915_gem_object *obj,
+			       unsigned int n)
 {
-	struct page *page;
-
-	/* Only default objects have per-page dirty tracking */
-	if (WARN_ON(obj->ops != &i915_gem_object_ops))
-		return NULL;
-
-	page = i915_gem_object_get_page(obj, n);
-	set_page_dirty(page);
+	struct page *page = i915_gem_object_get_page(obj, n);
+	if (!i915_gem_object_is_dirty(obj))
+		set_page_dirty(page);
 	return page;
 }
 
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 181/190] drm/i915: Introduce an internal allocator for disposable private objects
  2016-01-11 11:00 ` [PATCH 142/190] drm/i915: Fallback to using unmappable memory for scanout Chris Wilson
                     ` (37 preceding siblings ...)
  2016-01-11 11:01   ` [PATCH 180/190] drm/i915: Micro-optimise i915_gem_object_get_dirty_page() Chris Wilson
@ 2016-01-11 11:01   ` Chris Wilson
  2016-01-11 11:01   ` [PATCH 182/190] drm/i915: Avoid allocating a vmap arena for a single page Chris Wilson
                     ` (8 subsequent siblings)
  47 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 11:01 UTC (permalink / raw)
  To: intel-gfx

Quite a few of our objects used for internal hardware programming do not
benefit from being swappable or from being zero initialised. As such
they do not benefit from using a shmemfs backing storage and, since they
are internal and never directly exposed to the user, we do not need to
worry about providing a filp. For these we can use a
drm_i915_gem_object wrapper around an sg_table of plain struct page.
They are not swap backed and not automatically pinned. If they are
reaped by the shrinker, the pages are released and the contents
discarded. For the internal use case, this is fine as, for example,
ringbuffers are pinned from the time a request is written to them until
the hardware stops reading from them. Once they are idle, they can be
discarded entirely. As such they are a good match for execlist
ringbuffers and a small variety of other internal objects.

In the first iteration, this is limited to the scratch batch buffers we
use (for command parsing and state initialisation).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/Makefile                |   1 +
 drivers/gpu/drm/i915/i915_drv.h              |   8 ++
 drivers/gpu/drm/i915/i915_gem.c              |   9 +-
 drivers/gpu/drm/i915/i915_gem_batch_pool.c   |  27 ++---
 drivers/gpu/drm/i915/i915_gem_internal.c     | 157 +++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_gem_render_state.c |   2 +-
 6 files changed, 180 insertions(+), 24 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_gem_internal.c

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index a362425ef862..8d0fae65a5bd 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -28,6 +28,7 @@ i915-y += i915_cmd_parser.o \
 	  i915_gem_execbuffer.o \
 	  i915_gem_fence.o \
 	  i915_gem_gtt.o \
+	  i915_gem_internal.o \
 	  i915_gem.o \
 	  i915_gem_render_state.o \
 	  i915_gem_request.o \
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 2f8b5e7f9320..e3c77d245a6b 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1989,6 +1989,9 @@ enum hdmi_force_audio {
 #define I915_GTT_OFFSET_NONE ((u32)-1)
 
 struct drm_i915_gem_object_ops {
+	unsigned int flags;
+#define I915_GEM_OBJECT_HAS_STRUCT_PAGE 0x1
+
 	/* Interface between the GEM object and its backing storage.
 	 * get_pages() is called once prior to the use of the associated set
 	 * of pages before to binding them into the GTT, and put_pages() is
@@ -3117,6 +3120,11 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_device *dev,
 					       u32 gtt_offset,
 					       u32 size);
 
+/* i915_gem_internal.c */
+struct drm_i915_gem_object *
+i915_gem_object_create_internal(struct drm_device *dev,
+				unsigned size);
+
 /* i915_gem_shrinker.c */
 unsigned long i915_gem_shrink(struct drm_i915_private *dev_priv,
 			      unsigned long target,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 9cd161645041..cca45f60d0bd 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -497,7 +497,7 @@ int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
 	int ret;
 
 	*needs_clflush = 0;
-	if (!obj->base.filp)
+	if ((obj->ops->flags & I915_GEM_OBJECT_HAS_STRUCT_PAGE) == 0)
 		return -EINVAL;
 
 	ret = i915_gem_object_get_pages(obj);
@@ -547,7 +547,7 @@ int i915_gem_obj_prepare_shmem_write(struct drm_i915_gem_object *obj,
 	int ret;
 
 	*needs_clflush = 0;
-	if (!obj->base.filp)
+	if ((obj->ops->flags & I915_GEM_OBJECT_HAS_STRUCT_PAGE) == 0)
 		return -EINVAL;
 
 	ret = i915_gem_object_get_pages(obj);
@@ -800,7 +800,7 @@ i915_gem_pread_ioctl(struct drm_device *dev, void *data,
 	/* prime objects have no backing filp to GEM pread/pwrite
 	 * pages from.
 	 */
-	if (!obj->base.filp) {
+	if ((obj->ops->flags & I915_GEM_OBJECT_HAS_STRUCT_PAGE) == 0) {
 		ret = -EINVAL;
 		goto out;
 	}
@@ -1131,7 +1131,7 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
 	/* prime objects have no backing filp to GEM pread/pwrite
 	 * pages from.
 	 */
-	if (!obj->base.filp) {
+	if ((obj->ops->flags & I915_GEM_OBJECT_HAS_STRUCT_PAGE) == 0) {
 		ret = -EINVAL;
 		goto out;
 	}
@@ -3750,6 +3750,7 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj,
 }
 
 static const struct drm_i915_gem_object_ops i915_gem_object_ops = {
+	.flags = I915_GEM_OBJECT_HAS_STRUCT_PAGE,
 	.get_pages = i915_gem_object_get_pages_gtt,
 	.put_pages = i915_gem_object_put_pages_gtt,
 };
diff --git a/drivers/gpu/drm/i915/i915_gem_batch_pool.c b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
index d46012234db1..b876e334da58 100644
--- a/drivers/gpu/drm/i915/i915_gem_batch_pool.c
+++ b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
@@ -98,9 +98,9 @@ i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool,
 			size_t size)
 {
 	struct drm_i915_gem_object *obj = NULL;
-	struct drm_i915_gem_object *tmp, *next;
+	struct drm_i915_gem_object *tmp;
 	struct list_head *list;
-	int n;
+	int n, ret;
 
 	lockdep_assert_held(&pool->engine->dev->struct_mutex);
 
@@ -113,7 +113,7 @@ i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool,
 		n = ARRAY_SIZE(pool->cache_list) - 1;
 	list = &pool->cache_list[n];
 
-	list_for_each_entry_safe(tmp, next, list, batch_pool_link) {
+	list_for_each_entry(tmp, list, batch_pool_link) {
 		/* The batches are strictly LRU ordered */
 		if (i915_gem_object_is_active(tmp)) {
 			struct drm_i915_gem_request *rq;
@@ -126,13 +126,6 @@ i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool,
 			GEM_BUG_ON(tmp->last_write.request);
 		}
 
-		/* While we're looping, do some clean up */
-		if (tmp->madv == __I915_MADV_PURGED) {
-			list_del(&tmp->batch_pool_link);
-			drm_gem_object_unreference(&tmp->base);
-			continue;
-		}
-
 		if (tmp->base.size >= size) {
 			obj = tmp;
 			break;
@@ -140,19 +133,15 @@ i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool,
 	}
 
 	if (obj == NULL) {
-		int ret;
-
-		obj = i915_gem_alloc_object(pool->engine->dev, size);
+		obj = i915_gem_object_create_internal(pool->engine->dev, size);
 		if (obj == NULL)
 			return ERR_PTR(-ENOMEM);
-
-		ret = i915_gem_object_get_pages(obj);
-		if (ret)
-			return ERR_PTR(ret);
-
-		obj->madv = I915_MADV_DONTNEED;
 	}
 
+	ret = i915_gem_object_get_pages(obj);
+	if (ret)
+		return ERR_PTR(ret);
+
 	list_move_tail(&obj->batch_pool_link, list);
 	i915_gem_object_pin_pages(obj);
 	return obj;
diff --git a/drivers/gpu/drm/i915/i915_gem_internal.c b/drivers/gpu/drm/i915/i915_gem_internal.c
new file mode 100644
index 000000000000..0685a304f3ab
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_gem_internal.c
@@ -0,0 +1,157 @@
+/*
+ * Copyright © 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#include <drm/drmP.h>
+#include <drm/i915_drm.h>
+#include "i915_drv.h"
+
+static void __i915_gem_object_free_pages(struct sg_table *st)
+{
+	struct sg_page_iter sg_iter;
+
+	for_each_sg_page(st->sgl, &sg_iter, st->nents, 0)
+		page_cache_release(sg_page_iter_page(&sg_iter));
+
+	sg_free_table(st);
+	kfree(st);
+}
+
+static int i915_gem_object_get_pages_internal(struct drm_i915_gem_object *obj)
+{
+	const unsigned npages = obj->base.size / PAGE_SIZE;
+	struct sg_table *st;
+	struct scatterlist *sg;
+	unsigned long last_pfn = 0;	/* suppress gcc warning */
+	gfp_t gfp;
+	int i;
+
+	st = kmalloc(sizeof(*st), GFP_KERNEL);
+	if (st == NULL)
+		return -ENOMEM;
+
+	if (sg_alloc_table(st, npages, GFP_KERNEL)) {
+		kfree(st);
+		return -ENOMEM;
+	}
+
+	sg = st->sgl;
+	st->nents = 0;
+
+	gfp = GFP_KERNEL | __GFP_HIGHMEM;
+	gfp |= __GFP_NORETRY | __GFP_NOWARN;
+	gfp &= ~(__GFP_IO | __GFP_RECLAIM);
+	for (i = 0; i < npages; i++) {
+		struct page *page;
+
+		page = alloc_page(gfp);
+		if (page == NULL) {
+			i915_gem_shrink_all(to_i915(obj->base.dev));
+			page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM);
+			if (page == NULL)
+				goto err;
+		}
+
+#ifdef CONFIG_SWIOTLB
+		if (swiotlb_nr_tbl()) {
+			st->nents++;
+			sg_set_page(sg, page, PAGE_SIZE, 0);
+			sg = sg_next(sg);
+			continue;
+		}
+#endif
+		if (!i || page_to_pfn(page) != last_pfn + 1) {
+			if (i)
+				sg = sg_next(sg);
+			st->nents++;
+			sg_set_page(sg, page, PAGE_SIZE, 0);
+		} else {
+			sg->length += PAGE_SIZE;
+		}
+		last_pfn = page_to_pfn(page);
+	}
+#ifdef CONFIG_SWIOTLB
+	if (!swiotlb_nr_tbl())
+#endif
+		sg_mark_end(sg);
+	obj->pages = st;
+
+	if (i915_gem_gtt_prepare_object(obj)) {
+		obj->pages = NULL;
+		goto err;
+	}
+
+	obj->madv = I915_MADV_DONTNEED;
+	i915_gem_object_set_dirty(obj);
+	return 0;
+
+err:
+	sg_mark_end(sg);
+	__i915_gem_object_free_pages(st);
+	return -ENOMEM;
+}
+
+static void i915_gem_object_put_pages_internal(struct drm_i915_gem_object *obj)
+{
+	__i915_gem_object_free_pages(obj->pages);
+
+	i915_gem_object_unset_dirty(obj);
+	obj->madv = I915_MADV_WILLNEED;
+}
+
+static const struct drm_i915_gem_object_ops i915_gem_object_internal_ops = {
+	.flags = I915_GEM_OBJECT_HAS_STRUCT_PAGE,
+	.get_pages = i915_gem_object_get_pages_internal,
+	.put_pages = i915_gem_object_put_pages_internal,
+};
+
+/**
+ * Creates a new object that wraps some internal memory for private use.
+ * This object is not backed by swappable storage, and as such its contents
+ * are volatile and only valid whilst pinned. If the object is reaped by the
+ * shrinker, its pages and data will be discarded. Equally, it is not a full
+ * GEM object and so not valid for access from userspace. This makes it useful
+ * for hardware interfaces like ringbuffers (which are pinned from the time
+ * the request is written to the time the hardware stops accessing it), but
+ * not for contexts (which need to be preserved when not active for later
+ * reuse).
+ */
+struct drm_i915_gem_object *
+i915_gem_object_create_internal(struct drm_device *dev,
+				unsigned size)
+{
+	struct drm_i915_gem_object *obj;
+
+	obj = i915_gem_object_alloc(dev);
+	if (obj == NULL)
+		return NULL;
+
+	drm_gem_private_object_init(dev, &obj->base, size);
+	i915_gem_object_init(obj, &i915_gem_object_internal_ops);
+
+	obj->base.write_domain = I915_GEM_DOMAIN_CPU;
+	obj->base.read_domains = I915_GEM_DOMAIN_CPU;
+	obj->cache_level = HAS_LLC(dev) ? I915_CACHE_LLC : I915_CACHE_NONE;
+
+	return obj;
+}
diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.c b/drivers/gpu/drm/i915/i915_gem_render_state.c
index 2fac95b0ba44..436f097d1907 100644
--- a/drivers/gpu/drm/i915/i915_gem_render_state.c
+++ b/drivers/gpu/drm/i915/i915_gem_render_state.c
@@ -66,7 +66,7 @@ static int render_state_init(struct render_state *so, struct drm_device *dev)
 	if (so->rodata->batch_items * 4 > 4096)
 		return -EINVAL;
 
-	so->obj = i915_gem_alloc_object(dev, 4096);
+	so->obj = i915_gem_object_create_internal(dev, 4096);
 	if (so->obj == NULL)
 		return -ENOMEM;
 
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 182/190] drm/i915: Avoid allocating a vmap arena for a single page
  2016-01-11 11:00 ` [PATCH 142/190] drm/i915: Fallback to using unmappable memory for scanout Chris Wilson
                     ` (38 preceding siblings ...)
  2016-01-11 11:01   ` [PATCH 181/190] drm/i915: Introduce an internal allocator for disposable private objects Chris Wilson
@ 2016-01-11 11:01   ` Chris Wilson
  2016-01-11 11:01   ` [PATCH 183/190] drm/i915/cmdparser: Use cached vmappings Chris Wilson
                     ` (7 subsequent siblings)
  47 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 11:01 UTC (permalink / raw)
  To: intel-gfx

If we want a contiguous mapping of a single page sized object, we can
forgo using vmap() and just use a regular kmap().

(This may be worth lifting to the core, with the additional proviso that
the pgprot_t is compatible.)
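
The core of the idea, as a sketch for a pinned object backed by exactly
one struct page:

	struct page *page = sg_page(obj->pages->sgl);
	void *vaddr = kmap(page);	/* no vmap arena allocated */

	/* ... use the mapping ... */

	kunmap(page);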

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 28 +++++++++++++++++++---------
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index cca45f60d0bd..7be5a8fb9180 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1882,14 +1882,17 @@ i915_gem_object_put_pages(struct drm_i915_gem_object *obj)
 	 * lists early. */
 	list_del(&obj->global_list);
 
-	ops->put_pages(obj);
-	obj->pages = NULL;
-
 	if (obj->vmapping) {
-		vunmap(obj->vmapping);
+		if (obj->base.size == PAGE_SIZE)
+			kunmap(sg_page(obj->pages->sgl));
+		else
+			vunmap(obj->vmapping);
 		obj->vmapping = NULL;
 	}
 
+	ops->put_pages(obj);
+	obj->pages = NULL;
+
 	i915_gem_object_invalidate(obj);
 
 	return 0;
@@ -2069,15 +2072,22 @@ void *i915_gem_object_pin_vmap(struct drm_i915_gem_object *obj)
 	i915_gem_object_pin_pages(obj);
 
 	if (obj->vmapping == NULL) {
-		struct sg_page_iter sg_iter;
 		struct page **pages;
-		int n;
 
-		n = obj->base.size >> PAGE_SHIFT;
-		pages = drm_malloc_gfp(n, sizeof(*pages), GFP_TEMPORARY);
+		pages = NULL;
+		if (obj->base.size == PAGE_SIZE)
+			obj->vmapping = kmap(sg_page(obj->pages->sgl));
+		else
+			pages = drm_malloc_gfp(obj->base.size >> PAGE_SHIFT,
+					       sizeof(*pages),
+					       GFP_TEMPORARY);
 		if (pages != NULL) {
+			struct sg_page_iter sg_iter;
+			int n;
+
 			n = 0;
-			for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, 0)
+			for_each_sg_page(obj->pages->sgl, &sg_iter,
+					 obj->pages->nents, 0)
 				pages[n++] = sg_page_iter_page(&sg_iter);
 
 			obj->vmapping = vmap(pages, n, 0, PAGE_KERNEL);
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 183/190] drm/i915/cmdparser: Use cached vmappings
  2016-01-11 11:00 ` [PATCH 142/190] drm/i915: Fallback to using unmappable memory for scanout Chris Wilson
                     ` (39 preceding siblings ...)
  2016-01-11 11:01   ` [PATCH 182/190] drm/i915: Avoid allocating a vmap arena for a single page Chris Wilson
@ 2016-01-11 11:01   ` Chris Wilson
  2016-01-11 11:01   ` [PATCH 184/190] drm/i915/cmdparser: Only cache the dst vmap Chris Wilson
                     ` (6 subsequent siblings)
  47 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 11:01 UTC (permalink / raw)
  To: intel-gfx

The single largest factor in the overhead of parsing the commands is the
setup of the virtual mapping to provide a continuous block for the batch
buffer. If we keep those vmappings around (against the better judgement
of mm/vmalloc.c, which we offset by handwaving and looking suggestively
at the shrinker) we can dramatically improve the performance of the
parser for small batches (such as media workloads). Furthermore, we can
use the prepare shmem read/write functions to determine how best we
need to clflush the range (rather than every page of the object).

The impact of caching both src/dst vmaps is +80% on ivb and +140% on byt
for the throughput on small batches. (Caching just the dst vmap and
iterating over the src, doing a page by page copy is roughly 5% slower
on both platforms. That may be an acceptable trade-off to eliminate one
cached vmapping, and we may be able to reduce the per-page copying overhead
further.) For *this* simple test case, the cmdparser is now within a
factor of 2 of ideal performance.
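
From the caller's side the pattern becomes (a sketch; the pin/unpin
helpers are the ones used in the diff, and the mapping persists between
calls until the object is reaped):

	u32 *vaddr = i915_gem_object_pin_vmap(obj);
	if (IS_ERR(vaddr))
		return PTR_ERR(vaddr);

	/* parse through the contiguous mapping; a subsequent call
	 * reuses the cached vmap instead of rebuilding it */

	i915_gem_object_unpin_vmap(obj);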

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c     | 123 ++++++++++-------------------
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   6 ++
 2 files changed, 48 insertions(+), 81 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 84340eb42e1b..32b83369ae4e 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -866,98 +866,57 @@ find_reg(const struct drm_i915_reg_descriptor *table,
 	return NULL;
 }
 
-static u32 *vmap_batch(struct drm_i915_gem_object *obj,
-		       unsigned start, unsigned len)
-{
-	int i;
-	void *addr = NULL;
-	struct sg_page_iter sg_iter;
-	int first_page = start >> PAGE_SHIFT;
-	int last_page = (len + start + 4095) >> PAGE_SHIFT;
-	int npages = last_page - first_page;
-	struct page **pages;
-
-	pages = drm_malloc_ab(npages, sizeof(*pages));
-	if (pages == NULL) {
-		DRM_DEBUG_DRIVER("Failed to get space for pages\n");
-		goto finish;
-	}
-
-	i = 0;
-	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, first_page) {
-		pages[i++] = sg_page_iter_page(&sg_iter);
-		if (i == npages)
-			break;
-	}
-
-	addr = vmap(pages, i, 0, PAGE_KERNEL);
-	if (addr == NULL) {
-		DRM_DEBUG_DRIVER("Failed to vmap pages\n");
-		goto finish;
-	}
-
-finish:
-	if (pages)
-		drm_free_large(pages);
-	return (u32*)addr;
-}
-
-/* Returns a vmap'd pointer to dest_obj, which the caller must unmap */
-static u32 *copy_batch(struct drm_i915_gem_object *dest_obj,
+/* Returns a vmap'd pointer to dst_obj, which the caller must unmap */
+static u32 *copy_batch(struct drm_i915_gem_object *dst_obj,
 		       struct drm_i915_gem_object *src_obj,
 		       u32 batch_start_offset,
-		       u32 batch_len)
+		       u32 batch_len,
+		       bool *needs_clflush_after)
 {
-	unsigned needs_clflush;
-	void *src_base, *src;
-	void *dst = NULL;
+	unsigned src_needs_clflush;
+	unsigned dst_needs_clflush;
+	void *src, *dst;
 	int ret;
 
-	if (batch_len > dest_obj->base.size ||
-	    batch_len + batch_start_offset > src_obj->base.size)
-		return ERR_PTR(-E2BIG);
-
-	if (WARN_ON(dest_obj->pages_pin_count == 0))
-		return ERR_PTR(-ENODEV);
-
-	ret = i915_gem_obj_prepare_shmem_read(src_obj, &needs_clflush);
-	if (ret) {
-		DRM_DEBUG_DRIVER("CMD: failed to prepare shadow batch\n");
+	ret = i915_gem_obj_prepare_shmem_read(src_obj, &src_needs_clflush);
+	if (ret)
 		return ERR_PTR(ret);
-	}
 
-	src_base = vmap_batch(src_obj, batch_start_offset, batch_len);
-	if (!src_base) {
-		DRM_DEBUG_DRIVER("CMD: Failed to vmap batch\n");
-		ret = -ENOMEM;
+	ret = i915_gem_obj_prepare_shmem_write(dst_obj, &dst_needs_clflush);
+	if (ret) {
+		dst = ERR_PTR(ret);
 		goto unpin_src;
 	}
 
-	ret = i915_gem_object_set_to_cpu_domain(dest_obj, true);
-	if (ret) {
-		DRM_DEBUG_DRIVER("CMD: Failed to set shadow batch to CPU\n");
-		goto unmap_src;
+	src = i915_gem_object_pin_vmap(src_obj);
+	if (IS_ERR(src)) {
+		dst = src;
+		goto unpin_dst;
 	}
 
-	dst = vmap_batch(dest_obj, 0, batch_len);
-	if (!dst) {
-		DRM_DEBUG_DRIVER("CMD: Failed to vmap shadow batch\n");
-		ret = -ENOMEM;
+	dst = i915_gem_object_pin_vmap(dst_obj);
+	if (IS_ERR(dst))
 		goto unmap_src;
-	}
 
-	src = src_base + offset_in_page(batch_start_offset);
-	if (needs_clflush)
-		drm_clflush_virt_range(src, batch_len);
+	src += batch_start_offset;
+	if (src_needs_clflush)
+		clflush_cache_range(src, batch_len);
+
+	if (dst_needs_clflush & CLFLUSH_BEFORE)
+		batch_len = roundup(batch_len, boot_cpu_data.x86_clflush_size);
 
 	memcpy(dst, src, batch_len);
 
+	/* dst_obj is returned with vmap pinned */
+	*needs_clflush_after = dst_needs_clflush & CLFLUSH_AFTER;
+
 unmap_src:
-	vunmap(src_base);
+	i915_gem_object_unpin_vmap(src_obj);
+unpin_dst:
+	i915_gem_object_unpin_pages(dst_obj);
 unpin_src:
 	i915_gem_object_unpin_pages(src_obj);
-
-	return ret ? ERR_PTR(ret) : dst;
+	return dst;
 }
 
 static bool check_cmd(const struct intel_engine_cs *ring,
@@ -1106,16 +1065,18 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
 		    u32 batch_len,
 		    bool is_master)
 {
-	u32 *cmd, *batch_base, *batch_end;
+	u32 *cmd, *batch_end;
 	struct drm_i915_cmd_descriptor default_desc = { 0 };
 	bool oacontrol_set = false; /* OACONTROL tracking. See check_cmd() */
+	bool needs_clflush_after = false;
 	int ret = 0;
 
-	batch_base = copy_batch(shadow_batch_obj, batch_obj,
-				batch_start_offset, batch_len);
-	if (IS_ERR(batch_base)) {
+	cmd = copy_batch(shadow_batch_obj, batch_obj,
+			 batch_start_offset, batch_len,
+			 &needs_clflush_after);
+	if (IS_ERR(cmd)) {
 		DRM_DEBUG_DRIVER("CMD: Failed to copy batch\n");
-		return PTR_ERR(batch_base);
+		return PTR_ERR(cmd);
 	}
 
 	/*
@@ -1123,9 +1084,7 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
 	 * large or larger and copy_batch() will write MI_NOPs to the extra
 	 * space. Parsing should be faster in some cases this way.
 	 */
-	batch_end = batch_base + (batch_len / sizeof(*batch_end));
-
-	cmd = batch_base;
+	batch_end = cmd + (batch_len / sizeof(*batch_end));
 	while (cmd < batch_end) {
 		const struct drm_i915_cmd_descriptor *desc;
 		u32 length;
@@ -1184,7 +1143,9 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
 		ret = -EINVAL;
 	}
 
-	vunmap(batch_base);
+	if (ret == 0 && needs_clflush_after)
+		clflush_cache_range(shadow_batch_obj->vmapping, batch_len);
+	i915_gem_object_unpin_vmap(shadow_batch_obj);
 
 	return ret;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 733250afa139..eac3d52f790d 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1616,6 +1616,12 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 		ret = -EINVAL;
 		goto err;
 	}
+	if (args->batch_start_offset > eb.batch_vma->size ||
+	    args->batch_len > eb.batch_vma->size - args->batch_start_offset) {
+		DRM_DEBUG("Attempting to use out-of-bounds batch\n");
+		ret = -EINVAL;
+		goto err;
+	}
 
 	if (intel_engine_needs_cmd_parser(eb.engine) && args->batch_len) {
 		struct i915_vma *vma;
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 184/190] drm/i915/cmdparser: Only cache the dst vmap
  2016-01-11 11:00 ` [PATCH 142/190] drm/i915: Fallback to using unmappable memory for scanout Chris Wilson
                     ` (40 preceding siblings ...)
  2016-01-11 11:01   ` [PATCH 183/190] drm/i915/cmdparser: Use cached vmappings Chris Wilson
@ 2016-01-11 11:01   ` Chris Wilson
  2016-01-11 11:01   ` [PATCH 185/190] drm/i915/cmdparser: Improve hash function Chris Wilson
                     ` (5 subsequent siblings)
  47 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 11:01 UTC (permalink / raw)
  To: intel-gfx

For simplicity, we want to continue using a contiguous mapping of the
command buffer, but we can reduce the number of vmappings we hold by
switching over to a page-by-page copy from the user batch buffer to the
shadow. The cost for saving one linear mapping is about 5% in trivial
workloads - which is more or less the overhead in calling kmap_atomic().
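
The replacement copy loop boils down to this pattern (sketch; note that
the intra-page offset only applies to the first page):

	unsigned int offset = offset_in_page(batch_start_offset);
	int n;

	for (n = batch_start_offset >> PAGE_SHIFT; batch_len; n++) {
		int len = min_t(int, batch_len, PAGE_SIZE - offset);
		void *vaddr;

		vaddr = kmap_atomic(i915_gem_object_get_page(src_obj, n));
		memcpy(ptr, vaddr + offset, len);
		kunmap_atomic(vaddr);

		ptr += len;
		batch_len -= len;
		offset = 0;	/* pages after the first start at 0 */
	}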

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 35 +++++++++++++++++++---------------
 1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 32b83369ae4e..05221581887e 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -875,7 +875,8 @@ static u32 *copy_batch(struct drm_i915_gem_object *dst_obj,
 {
 	unsigned src_needs_clflush;
 	unsigned dst_needs_clflush;
-	void *src, *dst;
+	void *dst, *ptr;
+	int offset, n;
 	int ret;
 
 	ret = i915_gem_obj_prepare_shmem_read(src_obj, &src_needs_clflush);
@@ -888,30 +889,34 @@ static u32 *copy_batch(struct drm_i915_gem_object *dst_obj,
 		goto unpin_src;
 	}
 
-	src = i915_gem_object_pin_vmap(src_obj);
-	if (IS_ERR(src)) {
-		dst = src;
-		goto unpin_dst;
-	}
-
 	dst = i915_gem_object_pin_vmap(dst_obj);
 	if (IS_ERR(dst))
-		goto unmap_src;
-
-	src += batch_start_offset;
-	if (src_needs_clflush)
-		clflush_cache_range(src, batch_len);
+		goto unpin_dst;
 
+	ptr = dst;
+	offset = offset_in_page(batch_start_offset);
 	if (dst_needs_clflush & CLFLUSH_BEFORE)
 		batch_len = roundup(batch_len, boot_cpu_data.x86_clflush_size);
 
-	memcpy(dst, src, batch_len);
+	for (n = batch_start_offset >> PAGE_SHIFT; batch_len; n++) {
+		int len = min_t(int, batch_len, PAGE_SIZE - offset);
+		void *vaddr;
+
+		batch_len -= len;
+
+		vaddr = kmap_atomic(i915_gem_object_get_page(src_obj, n));
+		if (src_needs_clflush)
+			clflush_cache_range(vaddr + offset, len);
+		memcpy(ptr, vaddr + offset, len);
+		kunmap_atomic(vaddr);
+
+		ptr += len;
+		offset = 0;
+	}
 
 	/* dst_obj is returned with vmap pinned */
 	*needs_clflush_after = dst_needs_clflush & CLFLUSH_AFTER;
 
-unmap_src:
-	i915_gem_object_unpin_vmap(src_obj);
 unpin_dst:
 	i915_gem_object_unpin_pages(dst_obj);
 unpin_src:
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 185/190] drm/i915/cmdparser: Improve hash function
  2016-01-11 11:00 ` [PATCH 142/190] drm/i915: Fallback to using unmappable memory for scanout Chris Wilson
                     ` (41 preceding siblings ...)
  2016-01-11 11:01   ` [PATCH 184/190] drm/i915/cmdparser: Only cache the dst vmap Chris Wilson
@ 2016-01-11 11:01   ` Chris Wilson
  2016-01-11 11:01   ` [PATCH 186/190] drm/i915/cmdparser: Compare against the previous command descriptor Chris Wilson
                     ` (4 subsequent siblings)
  47 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 11:01 UTC (permalink / raw)
  To: intel-gfx

The existing code's hash function is very suboptimal (most 3D commands
use the same bucket, degrading the hash to a long list). The code even
acknowledges that the issue was known and the fix simple:

/*
 * If we attempt to generate a perfect hash, we should be able to look at bits
 * 31:29 of a command from a batch buffer and use the full mask for that
 * client. The existing INSTR_CLIENT_MASK/SHIFT defines can be used for this.
 */
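
With the per-client shifts, the key keeps exactly the client and opcode
bits, so distinct opcodes land in distinct buckets; for example (header
values illustrative):

	cmd_header_key(0x05000000); /* MI_BATCH_BUFFER_END: MI client,
				     * keep top 9 bits -> key 0x00a */
	cmd_header_key(0x7a000004); /* PIPE_CONTROL: RC client,
				     * keep top 16 bits -> key 0x7a00 */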

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 49 ++++++++++++++++++++--------------
 1 file changed, 29 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 05221581887e..4fdcb19012e5 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -86,24 +86,24 @@
  * general bitmasking mechanism.
  */
 
-#define STD_MI_OPCODE_MASK  0xFF800000
-#define STD_3D_OPCODE_MASK  0xFFFF0000
-#define STD_2D_OPCODE_MASK  0xFFC00000
-#define STD_MFX_OPCODE_MASK 0xFFFF0000
+#define STD_MI_OPCODE_SHIFT  (32 - 9)
+#define STD_3D_OPCODE_SHIFT  (32 - 16)
+#define STD_2D_OPCODE_SHIFT  (32 - 10)
+#define STD_MFX_OPCODE_SHIFT (32 - 16)
 
 #define CMD(op, opm, f, lm, fl, ...)				\
 	{							\
 		.flags = (fl) | ((f) ? CMD_DESC_FIXED : 0),	\
-		.cmd = { (op), (opm) },				\
+		.cmd = { (op), ~0u << (opm) }, 			\
 		.length = { (lm) },				\
 		__VA_ARGS__					\
 	}
 
 /* Convenience macros to compress the tables */
-#define SMI STD_MI_OPCODE_MASK
-#define S3D STD_3D_OPCODE_MASK
-#define S2D STD_2D_OPCODE_MASK
-#define SMFX STD_MFX_OPCODE_MASK
+#define SMI STD_MI_OPCODE_SHIFT
+#define S3D STD_3D_OPCODE_SHIFT
+#define S2D STD_2D_OPCODE_SHIFT
+#define SMFX STD_MFX_OPCODE_SHIFT
 #define F true
 #define S CMD_DESC_SKIP
 #define R CMD_DESC_REJECT
@@ -632,12 +632,24 @@ struct cmd_node {
  * non-opcode bits being set. But if we don't include those bits, some 3D
  * commands may hash to the same bucket due to not including opcode bits that
  * make the command unique. For now, we will risk hashing to the same bucket.
- *
- * If we attempt to generate a perfect hash, we should be able to look at bits
- * 31:29 of a command from a batch buffer and use the full mask for that
- * client. The existing INSTR_CLIENT_MASK/SHIFT defines can be used for this.
  */
-#define CMD_HASH_MASK STD_MI_OPCODE_MASK
+static inline u32 cmd_header_key(u32 x)
+{
+	u32 shift;
+	switch (x >> INSTR_CLIENT_SHIFT) {
+	default:
+	case INSTR_MI_CLIENT:
+		shift = STD_MI_OPCODE_SHIFT;
+		break;
+	case INSTR_RC_CLIENT:
+		shift = STD_3D_OPCODE_SHIFT;
+		break;
+	case INSTR_BC_CLIENT:
+		shift = STD_2D_OPCODE_SHIFT;
+		break;
+	}
+	return x >> shift;
+}
 
 static int init_hash_table(struct intel_engine_cs *ring,
 			   const struct drm_i915_cmd_table *cmd_tables,
@@ -661,7 +673,7 @@ static int init_hash_table(struct intel_engine_cs *ring,
 
 			desc_node->desc = desc;
 			hash_add(ring->cmd_hash, &desc_node->node,
-				 desc->cmd.value & CMD_HASH_MASK);
+				 cmd_header_key(desc->cmd.value));
 		}
 	}
 
@@ -807,12 +819,9 @@ find_cmd_in_table(struct intel_engine_cs *ring,
 	struct cmd_node *desc_node;
 
 	hash_for_each_possible(ring->cmd_hash, desc_node, node,
-			       cmd_header & CMD_HASH_MASK) {
+			       cmd_header_key(cmd_header)) {
 		const struct drm_i915_cmd_descriptor *desc = desc_node->desc;
-		u32 masked_cmd = desc->cmd.mask & cmd_header;
-		u32 masked_value = desc->cmd.value & desc->cmd.mask;
-
-		if (masked_cmd == masked_value)
+		if (((cmd_header ^ desc->cmd.value) & desc->cmd.mask) == 0)
 			return desc;
 	}
 
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 186/190] drm/i915/cmdparser: Compare against the previous command descriptor
  2016-01-11 11:00 ` [PATCH 142/190] drm/i915: Fallback to using unmappable memory for scanout Chris Wilson
                     ` (42 preceding siblings ...)
  2016-01-11 11:01   ` [PATCH 185/190] drm/i915/cmdparser: Improve hash function Chris Wilson
@ 2016-01-11 11:01   ` Chris Wilson
  2016-01-11 11:01   ` [PATCH 187/190] drm/i915: Allow execbuffer to use the first object as the batch Chris Wilson
                     ` (3 subsequent siblings)
  47 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 11:01 UTC (permalink / raw)
  To: intel-gfx

On the blitter (and in test code), we see long sequences of repeated
commands, e.g. XY_PIXEL_BLT, XY_SCANLINE_BLT, or XY_SRC_COPY. For these,
we can skip the hashtable lookup by remembering the previous command
descriptor and doing a straightforward compare of the command header.
The corollary is that we need to do one extra comparison before looking
up new commands.
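
That extra comparison relies on ((a ^ b) & m) == 0 being equivalent to
(a & m) == (b & m), so the check is simply (example headers
illustrative):

	/* e.g. two XY_SRC_COPY_BLT headers differing only in length:
	 * 0x54c00004 ^ 0x54c00006 == 0x2, which vanishes under the 2D
	 * opcode mask (~0u << 22 == 0xffc00000), so the cached
	 * descriptor is reused without touching the hashtable.
	 */
	if (((cmd_header ^ desc->cmd.value) & desc->cmd.mask) == 0)
		return desc;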

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 4fdcb19012e5..c0b034171b52 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -349,6 +349,9 @@ static const struct drm_i915_cmd_descriptor hsw_blt_cmds[] = {
 	CMD(  MI_LOAD_SCAN_LINES_EXCL,          SMI,   !F,  0x3F,   R  ),
 };
 
+static const struct drm_i915_cmd_descriptor noop_desc =
+	CMD(MI_NOOP, SMI, F, 1, S);
+
 #undef CMD
 #undef SMI
 #undef S3D
@@ -839,11 +842,14 @@ find_cmd_in_table(struct intel_engine_cs *ring,
 static const struct drm_i915_cmd_descriptor*
 find_cmd(struct intel_engine_cs *ring,
 	 u32 cmd_header,
+	 const struct drm_i915_cmd_descriptor *desc,
 	 struct drm_i915_cmd_descriptor *default_desc)
 {
-	const struct drm_i915_cmd_descriptor *desc;
 	u32 mask;
 
+	if (((cmd_header ^ desc->cmd.value) & desc->cmd.mask) == 0)
+		return desc;
+
 	desc = find_cmd_in_table(ring, cmd_header);
 	if (desc)
 		return desc;
@@ -852,10 +858,10 @@ find_cmd(struct intel_engine_cs *ring,
 	if (!mask)
 		return NULL;
 
-	BUG_ON(!default_desc);
-	default_desc->flags = CMD_DESC_SKIP;
+	default_desc->cmd.value = cmd_header;
+	default_desc->cmd.mask = 0xffff0000;
 	default_desc->length.mask = mask;
-
+	default_desc->flags = CMD_DESC_SKIP;
 	return default_desc;
 }
 
@@ -1080,7 +1086,8 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
 		    bool is_master)
 {
 	u32 *cmd, *batch_end;
-	struct drm_i915_cmd_descriptor default_desc = { 0 };
+	struct drm_i915_cmd_descriptor default_desc = noop_desc;
+	const struct drm_i915_cmd_descriptor *desc = &default_desc;
 	bool oacontrol_set = false; /* OACONTROL tracking. See check_cmd() */
 	bool needs_clflush_after = false;
 	int ret = 0;
@@ -1100,13 +1107,12 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
 	 */
 	batch_end = cmd + (batch_len / sizeof(*batch_end));
 	while (cmd < batch_end) {
-		const struct drm_i915_cmd_descriptor *desc;
 		u32 length;
 
 		if (*cmd == MI_BATCH_BUFFER_END)
 			break;
 
-		desc = find_cmd(ring, *cmd, &default_desc);
+		desc = find_cmd(ring, *cmd, desc, &default_desc);
 		if (!desc) {
 			DRM_DEBUG_DRIVER("CMD: Unrecognized command: 0x%08X\n",
 					 *cmd);
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 187/190] drm/i915: Allow execbuffer to use the first object as the batch
  2016-01-11 11:00 ` [PATCH 142/190] drm/i915: Fallback to using unmappable memory for scanout Chris Wilson
                     ` (43 preceding siblings ...)
  2016-01-11 11:01   ` [PATCH 186/190] drm/i915/cmdparser: Compare against the previous command descriptor Chris Wilson
@ 2016-01-11 11:01   ` Chris Wilson
  2016-01-11 11:01   ` [PATCH 188/190] drm/i915: Use VMA for ringbuffer tracking Chris Wilson
                     ` (2 subsequent siblings)
  47 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 11:01 UTC (permalink / raw)
  To: intel-gfx

Currently, the last object in the execlist is always the batch.
However, when building the batch buffer we often know the batch object
first and if we can use the first slot in the execlist we can emit
relocation instructions relative to it immediately and avoid a separate
pass to adjust the relocations to point to the last execlist slot.
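
Userspace opts in per execbuf after checking the new getparam; a
minimal sketch (hypothetical userspace snippet, error handling
elided):

	int has_batch_first = 0;
	struct drm_i915_getparam gp = {
		.param = I915_PARAM_HAS_EXEC_BATCH_FIRST,
		.value = &has_batch_first,
	};

	drmIoctl(fd, DRM_IOCTL_I915_GETPARAM, &gp);
	if (has_batch_first)
		execbuf.flags |= I915_EXEC_BATCH_FIRST; /* object 0 is the batch */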

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_dma.c            | 3 +++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 5 ++++-
 include/uapi/drm/i915_drm.h                | 4 +++-
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index c1afbd873197..85926c03c3cf 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -172,6 +172,9 @@ static int i915_getparam(struct drm_device *dev, void *data,
 	case I915_PARAM_HAS_EXEC_SOFTPIN:
 		value = 1;
 		break;
+	case I915_PARAM_HAS_EXEC_BATCH_FIRST:
+		value = 1;
+		break;
 	default:
 		DRM_DEBUG("Unknown parameter %d\n", param->param);
 		return -EINVAL;
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index eac3d52f790d..817c0cc054d2 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -415,7 +415,10 @@ ht_head(const struct intel_context *ctx, u32 handle)
 
 static int eb_batch_index(const struct i915_execbuffer *eb)
 {
-	return eb->args->buffer_count - 1;
+	if (eb->args->flags & I915_EXEC_BATCH_FIRST)
+		return 0;
+	else
+		return eb->args->buffer_count - 1;
 }
 
 static int
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index ff7b438059da..27bb4668af05 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -357,6 +357,7 @@ typedef struct drm_i915_irq_wait {
 #define I915_PARAM_HAS_GPU_RESET	 35
 #define I915_PARAM_HAS_RESOURCE_STREAMER 36
 #define I915_PARAM_HAS_EXEC_SOFTPIN	 37
+#define I915_PARAM_HAS_EXEC_BATCH_FIRST	 38
 
 typedef struct drm_i915_getparam {
 	__s32 param;
@@ -786,7 +787,8 @@ struct drm_i915_gem_execbuffer2 {
  */
 #define I915_EXEC_RESOURCE_STREAMER     (1<<15)
 
-#define __I915_EXEC_UNKNOWN_FLAGS -(I915_EXEC_RESOURCE_STREAMER<<1)
+#define I915_EXEC_BATCH_FIRST		(1<<16)
+#define __I915_EXEC_UNKNOWN_FLAGS (-(I915_EXEC_BATCH_FIRST<<1))
 
 #define I915_EXEC_CONTEXT_ID_MASK	(0xffffffff)
 #define i915_execbuffer2_set_context_id(eb2, context) \
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 188/190] drm/i915: Use VMA for ringbuffer tracking
  2016-01-11 11:00 ` [PATCH 142/190] drm/i915: Fallback to using unmappable memory for scanout Chris Wilson
                     ` (44 preceding siblings ...)
  2016-01-11 11:01   ` [PATCH 187/190] drm/i915: Allow execbuffer to use the first object as the batch Chris Wilson
@ 2016-01-11 11:01   ` Chris Wilson
  2016-01-11 11:01   ` [PATCH 189/190] drm/i915: Skip clearing the GGTT on full-ppgtt systems Chris Wilson
  2016-01-11 11:01   ` [PATCH 190/190] drm/i915: Do a nonblocking wait first in pread/pwrite Chris Wilson
  47 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 11:01 UTC (permalink / raw)
  To: intel-gfx

Use the GGTT VMA as the primary cookie for handling ring objects, as
the most common actions upon the ring are mapping and unmapping, which act
upon the VMA itself. By restructuring the code to work with the ring
VMA, we can shrink the code and remove a few cycles from context pinning.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c     |   2 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c | 135 ++++++++++++++------------------
 drivers/gpu/drm/i915/intel_ringbuffer.h |   2 +-
 3 files changed, 61 insertions(+), 78 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 7fb4088b3966..af2ec70dd7ab 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -379,7 +379,7 @@ static int per_file_ctx_stats(int id, void *ptr, void *data)
 		if (ctx->engine[n].state)
 			per_file_stats(0, ctx->engine[n].state->obj, data);
 		if (ctx->engine[n].ring)
-			per_file_stats(0, ctx->engine[n].ring->obj, data);
+			per_file_stats(0, ctx->engine[n].ring->vma->obj, data);
 	}
 
 	return 0;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 41c52cdcbe4a..512841df2527 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1899,108 +1899,91 @@ static int init_phys_status_page(struct intel_engine_cs *ring)
 
 int intel_ring_map(struct intel_ring *ring)
 {
-	struct drm_i915_gem_object *obj = ring->obj;
-	struct i915_vma *vma;
+	void *ptr;
 	int ret;
 
-	if (HAS_LLC(ring->engine->i915) && !obj->stolen) {
-		vma = i915_gem_object_ggtt_pin(obj, NULL,
-					       0, PAGE_SIZE,
-					       PIN_HIGH);
-		if (IS_ERR(vma))
-			return PTR_ERR(vma);
+	GEM_BUG_ON(ring->virtual_start);
 
-		ret = i915_gem_object_set_to_cpu_domain(obj, true);
-		if (ret)
-			goto unpin;
-
-		ring->virtual_start = i915_gem_object_pin_vmap(obj);
-		if (IS_ERR(ring->virtual_start)) {
-			ret = PTR_ERR(ring->virtual_start);
-			ring->virtual_start = NULL;
-			goto unpin;
-		}
-	} else {
-		vma = i915_gem_object_ggtt_pin(obj, NULL,
-					       0, PAGE_SIZE,
-					       PIN_MAPPABLE);
-		if (IS_ERR(vma))
-			return PTR_ERR(vma);
+	ret = i915_vma_pin(ring->vma, 0, PAGE_SIZE,
+			   PIN_GLOBAL | (ring->vmap ? PIN_HIGH : PIN_MAPPABLE));
+	if (unlikely(ret))
+		return ret;
 
-		ret = i915_gem_object_set_to_gtt_domain(obj, true);
-		if (ret)
-			goto unpin;
-
-		ring->virtual_start = ioremap_wc(ring->engine->i915->gtt.mappable_base +
-						 vma->node.start,
-						 ring->size);
-		if (ring->virtual_start == NULL) {
-			ret = -ENOMEM;
-			goto unpin;
-		}
+	if (ring->vmap)
+		ptr = i915_gem_object_pin_vmap(ring->vma->obj);
+	else
+		ptr = i915_vma_iomap(ring->engine->i915, ring->vma);
+	if (IS_ERR(ptr)) {
+		i915_vma_unpin(ring->vma);
+		return PTR_ERR(ptr);
 	}
 
-	ring->vma = vma;
+	ring->virtual_start = ptr;
 	return 0;
-
-unpin:
-	i915_vma_unpin(vma);
-	return ret;
 }
 
 void intel_ring_unmap(struct intel_ring *ring)
 {
-	if (HAS_LLC(ring->engine->i915) && !ring->obj->stolen)
-		i915_gem_object_unpin_vmap(ring->obj);
-	else
-		iounmap(ring->virtual_start);
+	GEM_BUG_ON(ring->virtual_start == NULL);
 
-	i915_vma_unpin(ring->vma);
-	ring->vma = NULL;
-}
+	if (ring->vmap)
+		i915_gem_object_unpin_vmap(ring->vma->obj);
+	ring->virtual_start = NULL;
 
-static void intel_destroy_ringbuffer_obj(struct intel_ring *ringbuf)
-{
-	__i915_gem_object_release_unless_active(ringbuf->obj);
-	ringbuf->obj = NULL;
+	i915_vma_unpin(ring->vma);
 }
 
-static int intel_alloc_ringbuffer_obj(struct drm_device *dev,
-				      struct intel_ring *ringbuf)
+static struct i915_vma *
+intel_ring_create_vma(struct drm_device *dev, int size)
 {
 	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma;
+	int ret;
 
 	obj = NULL;
 	if (!HAS_LLC(dev))
-		obj = i915_gem_object_create_stolen(dev, ringbuf->size);
+		obj = i915_gem_object_create_stolen(dev, size);
 	if (obj == NULL)
-		obj = i915_gem_alloc_object(dev, ringbuf->size);
+		obj = i915_gem_alloc_object(dev, size);
 	if (obj == NULL)
-		return -ENOMEM;
+		return ERR_PTR(-ENOMEM);
 
 	/* mark ring buffers as read-only from GPU side by default */
 	obj->gt_ro = 1;
 
-	ringbuf->obj = obj;
+	if (HAS_LLC(dev) && !obj->stolen)
+		ret = i915_gem_object_set_to_cpu_domain(obj, true);
+	else
+		ret = i915_gem_object_set_to_gtt_domain(obj, true);
+	if (ret) {
+		vma = ERR_PTR(ret);
+		goto err;
+	}
+
+	vma = i915_gem_obj_lookup_or_create_vma(obj,
+						&to_i915(dev)->gtt.base,
+						NULL);
+	if (IS_ERR(vma))
+		goto err;
+
+	return vma;
 
-	return 0;
+err:
+	drm_gem_object_unreference(&obj->base);
+	return vma;
 }
 
 struct intel_ring *
 intel_engine_create_ring(struct intel_engine_cs *engine, int size)
 {
 	struct intel_ring *ring;
-	int ret;
+	struct i915_vma *vma;
 
 	ring = kzalloc(sizeof(*ring), GFP_KERNEL);
-	if (ring == NULL) {
-		DRM_DEBUG_DRIVER("Failed to allocate ringbuffer %s\n",
-				 engine->name);
+	if (ring == NULL)
 		return ERR_PTR(-ENOMEM);
-	}
 
 	ring->engine = engine;
-	list_add(&ring->link, &engine->buffers);
 
 	ring->size = size;
 	/* Workaround an erratum on the i830 which causes a hang if
@@ -2008,28 +1991,29 @@ intel_engine_create_ring(struct intel_engine_cs *engine, int size)
 	 * of the buffer.
 	 */
 	ring->effective_size = size;
-	if (IS_I830(engine->dev) || IS_845G(engine->dev))
+	if (IS_I830(engine->i915) || IS_845G(engine->i915))
 		ring->effective_size -= 2 * CACHELINE_BYTES;
 
 	ring->last_retired_head = -1;
 	intel_ring_update_space(ring);
 
-	ret = intel_alloc_ringbuffer_obj(engine->dev, ring);
-	if (ret) {
-		DRM_DEBUG_DRIVER("Failed to allocate ringbuffer %s: %d\n",
-				 engine->name, ret);
-		list_del(&ring->link);
+	vma = intel_ring_create_vma(engine->dev, size);
+	if (IS_ERR(vma)) {
 		kfree(ring);
-		return ERR_PTR(ret);
+		return ERR_CAST(vma);
 	}
+	ring->vma = vma;
+	if (HAS_LLC(engine->i915) && !vma->obj->stolen)
+		ring->vmap = true;
 
+	list_add(&ring->link, &engine->buffers);
 	return ring;
 }
 
 void
 intel_ring_free(struct intel_ring *ring)
 {
-	intel_destroy_ringbuffer_obj(ring);
+	__i915_gem_object_release_unless_active(ring->vma->obj);
 	list_del(&ring->link);
 	kfree(ring);
 }
@@ -2058,7 +2042,6 @@ static int intel_init_engine(struct drm_device *dev,
 		ret = PTR_ERR(ringbuf);
 		goto error;
 	}
-	engine->buffer = ringbuf;
 
 	if (I915_NEED_GFX_HWS(dev)) {
 		ret = init_status_page(engine);
@@ -2073,12 +2056,12 @@ static int intel_init_engine(struct drm_device *dev,
 
 	ret = intel_ring_map(ringbuf);
 	if (ret) {
-		DRM_ERROR("Failed to pin and map ringbuffer %s: %d\n",
-				engine->name, ret);
-		intel_destroy_ringbuffer_obj(ringbuf);
+		intel_ring_free(ringbuf);
 		goto error;
 	}
 
+	engine->buffer = ringbuf;
+
 	ret = i915_cmd_parser_init_ring(engine);
 	if (ret)
 		goto error;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index d24d0e438f49..3ae941b338ca 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -95,7 +95,6 @@ struct intel_engine_hangcheck {
 };
 
 struct intel_ring {
-	struct drm_i915_gem_object *obj;
 	struct i915_vma *vma;
 	void *virtual_start;
 
@@ -110,6 +109,7 @@ struct intel_ring {
 	int reserved_size;
 	int reserved_tail;
 	bool reserved_in_use;
+	bool vmap;
 
 	/** We track the position of the requests in the ring buffer, and
 	 * when each is retired we increment last_retired_head as the GPU
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 189/190] drm/i915: Skip clearing the GGTT on full-ppgtt systems
  2016-01-11 11:00 ` [PATCH 142/190] drm/i915: Fallback to using unmappable memory for scanout Chris Wilson
                     ` (45 preceding siblings ...)
  2016-01-11 11:01   ` [PATCH 188/190] drm/i915: Use VMA for ringbuffer tracking Chris Wilson
@ 2016-01-11 11:01   ` Chris Wilson
  2016-01-11 11:01   ` [PATCH 190/190] drm/i915: Do a nonblocking wait first in pread/pwrite Chris Wilson
  47 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 11:01 UTC (permalink / raw)
  To: intel-gfx

Under full-ppgtt, access to the global GTT is carefully regulated
through hardware functions (i.e. userspace cannot read or write
arbitrary locations in the GGTT via the GPU). With this restriction in
place, we can forgo clearing stale entries from the GGTT as they will
not be accessed.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index a9b547e4ea6f..2e460b369e82 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2451,6 +2451,13 @@ static void gen6_ggtt_insert_entries(struct i915_address_space *vm,
 	assert_rpm_atomic_end(dev_priv, rpm_atomic_seq);
 }
 
+static void nop_clear_range(struct i915_address_space *vm,
+			    uint64_t start,
+			    uint64_t length,
+			    bool use_scratch)
+{
+}
+
 static void gen8_ggtt_clear_range(struct i915_address_space *vm,
 				  uint64_t start,
 				  uint64_t length,
@@ -3005,7 +3012,9 @@ static int gen8_gmch_probe(struct drm_device *dev,
 
 	ret = ggtt_probe_common(dev, gtt_size);
 
-	dev_priv->gtt.base.clear_range = gen8_ggtt_clear_range;
+	dev_priv->gtt.base.clear_range = nop_clear_range;
+	if (!USES_FULL_PPGTT(dev))
+		dev_priv->gtt.base.clear_range = gen8_ggtt_clear_range;
 	dev_priv->gtt.base.insert_entries = gen8_ggtt_insert_entries;
 	dev_priv->gtt.base.bind_vma = ggtt_bind_vma;
 	dev_priv->gtt.base.unbind_vma = ggtt_unbind_vma;
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* [PATCH 190/190] drm/i915: Do a nonblocking wait first in pread/pwrite
  2016-01-11 11:00 ` [PATCH 142/190] drm/i915: Fallback to using unmappable memory for scanout Chris Wilson
                     ` (46 preceding siblings ...)
  2016-01-11 11:01   ` [PATCH 189/190] drm/i915: Skip clearing the GGTT on full-ppgtt systems Chris Wilson
@ 2016-01-11 11:01   ` Chris Wilson
  47 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 11:01 UTC (permalink / raw)
  To: intel-gfx

If we try to read or write an object with an active request, we must
first wait upon the GPU completing that request. Let's do that without
holding the mutex (and so allow someone else to access the GPU whilst
we wait). Upon
completion, we will reacquire the mutex and only then start the
operation (i.e. we do not rely on state from before dropping the mutex).
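
The shape is the usual take-references, drop the lock, wait, reacquire
pattern; roughly (simplified, struct_mutex held on entry,
for_each_active_request() is an illustrative iterator):

	n = 0;
	for_each_active_request(obj, req)
		requests[n++] = i915_gem_request_get(req);

	mutex_unlock(&dev->struct_mutex);
	for (i = 0; ret == 0 && i < n; i++)
		ret = __i915_wait_request(requests[i], true, NULL, rps);
	mutex_lock(&dev->struct_mutex);

	for (i = 0; i < n; i++) {
		if (ret == 0)
			i915_gem_request_retire_upto(requests[i]);
		i915_gem_request_put(requests[i]);
	}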

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 208 +++++++++++++++++++++-------------------
 1 file changed, 110 insertions(+), 98 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 7be5a8fb9180..e51118473df6 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -434,6 +434,104 @@ i915_gem_create_ioctl(struct drm_device *dev, void *data,
 			       args->size, &args->handle);
 }
 
+/**
+ * Ensures that all rendering to the object has completed and the object is
+ * safe to unbind from the GTT or access from the CPU.
+ */
+int
+i915_gem_object_wait_rendering(struct drm_i915_gem_object *obj,
+			       bool readonly)
+{
+	int ret, i;
+
+	if (!i915_gem_object_is_active(obj))
+		return 0;
+
+	if (readonly) {
+		ret = i915_wait_request(obj->last_write.request);
+		if (ret)
+			return ret;
+	} else {
+		for (i = 0; i < I915_NUM_RINGS; i++) {
+			ret = i915_wait_request(obj->last_read[i].request);
+			if (ret)
+				return ret;
+		}
+		GEM_BUG_ON(i915_gem_object_is_active(obj));
+	}
+
+	return 0;
+}
+
+/* A nonblocking variant of the above wait. This is a highly dangerous routine
+ * as the object state may change during this call.
+ */
+static __must_check int
+i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
+					    struct intel_rps_client *rps,
+					    bool readonly)
+{
+	struct drm_device *dev = obj->base.dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct drm_i915_gem_request *requests[I915_NUM_RINGS];
+	int ret, i, n = 0;
+
+	BUG_ON(!mutex_is_locked(&dev->struct_mutex));
+	BUG_ON(!dev_priv->mm.interruptible);
+
+	if (!i915_gem_object_is_active(obj))
+		return 0;
+
+	if (readonly) {
+		struct drm_i915_gem_request *req;
+
+		req = obj->last_write.request;
+		if (req == NULL)
+			return 0;
+
+		if (i915_gem_request_completed(req))
+			i915_gem_request_retire_upto(req);
+		else
+			requests[n++] = i915_gem_request_get(req);
+	} else {
+		for (i = 0; i < I915_NUM_RINGS; i++) {
+			struct drm_i915_gem_request *req;
+
+			req = obj->last_read[i].request;
+			if (req == NULL)
+				continue;
+
+			if (i915_gem_request_completed(req))
+				i915_gem_request_retire_upto(req);
+			else
+				requests[n++] = i915_gem_request_get(req);
+		}
+	}
+
+	if (n == 0)
+		return 0;
+
+	mutex_unlock(&dev->struct_mutex);
+	ret = 0;
+	for (i = 0; ret == 0 && i < n; i++)
+		ret = __i915_wait_request(requests[i], true, NULL, rps);
+	mutex_lock(&dev->struct_mutex);
+
+	for (i = 0; i < n; i++) {
+		if (ret == 0)
+			i915_gem_request_retire_upto(requests[i]);
+		i915_gem_request_put(requests[i]);
+	}
+
+	return ret;
+}
+
+static struct intel_rps_client *to_rps_client(struct drm_file *file)
+{
+	struct drm_i915_file_private *fpriv = file->driver_priv;
+	return &fpriv->rps;
+}
+
 static inline int
 __copy_to_user_swizzled(char __user *cpu_vaddr,
 			const char *gpu_vaddr, int gpu_offset,
@@ -805,6 +903,12 @@ i915_gem_pread_ioctl(struct drm_device *dev, void *data,
 		goto out;
 	}
 
+	ret = i915_gem_object_wait_rendering__nonblocking(obj,
+							  to_rps_client(file),
+							  true);
+	if (ret)
+		goto out;
+
 	trace_i915_gem_object_pread(obj, args->offset, args->size);
 
 	ret = i915_gem_shmem_pread(dev, obj, args, file);
@@ -1136,6 +1240,12 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
 		goto out;
 	}
 
+	ret = i915_gem_object_wait_rendering__nonblocking(obj,
+							  to_rps_client(file),
+							  false);
+	if (ret)
+		goto out;
+
 	trace_i915_gem_object_pwrite(obj, args->offset, args->size);
 
 	ret = -EFAULT;
@@ -1172,104 +1282,6 @@ put_rpm:
 }
 
 /**
- * Ensures that all rendering to the object has completed and the object is
- * safe to unbind from the GTT or access from the CPU.
- */
-int
-i915_gem_object_wait_rendering(struct drm_i915_gem_object *obj,
-			       bool readonly)
-{
-	int ret, i;
-
-	if (!i915_gem_object_is_active(obj))
-		return 0;
-
-	if (readonly) {
-		ret = i915_wait_request(obj->last_write.request);
-		if (ret)
-			return ret;
-	} else {
-		for (i = 0; i < I915_NUM_RINGS; i++) {
-			ret = i915_wait_request(obj->last_read[i].request);
-			if (ret)
-				return ret;
-		}
-		GEM_BUG_ON(i915_gem_object_is_active(obj));
-	}
-
-	return 0;
-}
-
-/* A nonblocking variant of the above wait. This is a highly dangerous routine
- * as the object state may change during this call.
- */
-static __must_check int
-i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
-					    struct intel_rps_client *rps,
-					    bool readonly)
-{
-	struct drm_device *dev = obj->base.dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct drm_i915_gem_request *requests[I915_NUM_RINGS];
-	int ret, i, n = 0;
-
-	BUG_ON(!mutex_is_locked(&dev->struct_mutex));
-	BUG_ON(!dev_priv->mm.interruptible);
-
-	if (!i915_gem_object_is_active(obj))
-		return 0;
-
-	if (readonly) {
-		struct drm_i915_gem_request *req;
-
-		req = obj->last_write.request;
-		if (req == NULL)
-			return 0;
-
-		if (i915_gem_request_completed(req))
-			i915_gem_request_retire_upto(req);
-		else
-			requests[n++] = i915_gem_request_get(req);
-	} else {
-		for (i = 0; i < I915_NUM_RINGS; i++) {
-			struct drm_i915_gem_request *req;
-
-			req = obj->last_read[i].request;
-			if (req == NULL)
-				continue;
-
-			if (i915_gem_request_completed(req))
-				i915_gem_request_retire_upto(req);
-			else
-				requests[n++] = i915_gem_request_get(req);
-		}
-	}
-
-	if (n == 0)
-		return 0;
-
-	mutex_unlock(&dev->struct_mutex);
-	ret = 0;
-	for (i = 0; ret == 0 && i < n; i++)
-		ret = __i915_wait_request(requests[i], true, NULL, rps);
-	mutex_lock(&dev->struct_mutex);
-
-	for (i = 0; i < n; i++) {
-		if (ret == 0)
-			i915_gem_request_retire_upto(requests[i]);
-		i915_gem_request_put(requests[i]);
-	}
-
-	return ret;
-}
-
-static struct intel_rps_client *to_rps_client(struct drm_file *file)
-{
-	struct drm_i915_file_private *fpriv = file->driver_priv;
-	return &fpriv->rps;
-}
-
-/**
  * Called when user space prepares to use an object with the CPU, either
  * through the mmap ioctl's mapping or a GTT mapping.
  */
-- 
2.7.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 263+ messages in thread

* Re: [PATCH 017/190] drm/i915: Remove forcewake dance from seqno/irq barrier on legacy gen6+
  2016-01-11  9:16 ` [PATCH 017/190] drm/i915: Remove forcewake dance from seqno/irq barrier on legacy gen6+ Chris Wilson
@ 2016-01-11 14:02   ` Dave Gordon
  2016-01-21 16:27     ` Mika Kuoppala
  2016-03-24  6:39   ` David Weinehall
  1 sibling, 1 reply; 263+ messages in thread
From: Dave Gordon @ 2016-01-11 14:02 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Daniel Vetter

On 11/01/16 09:16, Chris Wilson wrote:
> In order to ensure seqno/irq coherency, we currently read a ring register.
> We are not sure quite how it works, only that it does. Experiments show
> that e.g. doing a clflush(seqno) instead is not sufficient, but we can
> remove the forcewake dance from the mmio access.
>
> v2: Baytrail wants a clflush too.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
>   drivers/gpu/drm/i915/intel_ringbuffer.c | 15 +++++++++++++--
>   1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 99780b674311..a1d43b2c7077 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1490,10 +1490,21 @@ gen6_ring_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency)
>   {
>   	/* Workaround to force correct ordering between irq and seqno writes on
>   	 * ivb (and maybe also on snb) by reading from a CS register (like
> -	 * ACTHD) before reading the status page. */
> +	 * ACTHD) before reading the status page.
> +	 *
> +	 * Note that this effectively stalls the read by the time
> +	 * it takes to do a memory transaction, which more or less ensures
> +	 * that the write from the GPU has sufficient time to invalidate
> +	 * the CPU cacheline. Alternatively we could delay the interrupt from
> +	 * the CS ring to give the write time to land, but that would incur
> +	 * a delay after every batch i.e. much more frequent than a delay
> +	 * when waiting for the interrupt (with the same net latency).
> +	 */
>   	if (!lazy_coherency) {
>   		struct drm_i915_private *dev_priv = ring->dev->dev_private;
> -		POSTING_READ(RING_ACTHD(ring->mmio_base));
> +		POSTING_READ_FW(RING_ACTHD(ring->mmio_base));
> +
> +		intel_flush_status_page(ring, I915_GEM_HWS_INDEX);
>   	}
>
>   	return intel_read_status_page(ring, I915_GEM_HWS_INDEX);

Well, I generally like this, but my previous questions of 2015-01-05 
were not answered:

> Hmm ... would putting the flush /before/ the POSTING_READ be better?
>
> Depending on how the h/w implements the cacheline invalidation, it
> might allow some overlap between the cache controller's internal
> activities and the MMIO cycle ...
>
> Also, previously we only had the flush on BXT, whereas now you're
> doing it on all gen6+. I think this is probably a good thing, but just
> wondered whether there's any downside to it?
>
> Also ... are we sure that no-one calls this without having a
> forcewake in effect at the time, in particular debugfs? Or is it not
> going to end up going through here once lazy_coherency is abolished?

.Dave.

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: [PATCH 102/190] drm/i915: Move the "per-ring" default_context to the device
  2016-01-11 10:44   ` [PATCH 102/190] drm/i915: Move the "per-ring" default_context to the device Chris Wilson
@ 2016-01-11 14:40     ` Dave Gordon
  0 siblings, 0 replies; 263+ messages in thread
From: Dave Gordon @ 2016-01-11 14:40 UTC (permalink / raw)
  To: intel-gfx, Chris Wilson

On 11/01/16 10:44, Chris Wilson wrote:
> We have a false notion of a default_context allocated per engine,
> whereas actually it is a singular context reserved for kernel use.
> Remove it from the engines, and rename it thus.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_debugfs.c        | 19 ++++++++++++++-----
>   drivers/gpu/drm/i915/i915_drv.h            |  1 +
>   drivers/gpu/drm/i915/i915_gem.c            |  2 +-
>   drivers/gpu/drm/i915/i915_gem_context.c    | 28 +++++++++++-----------------
>   drivers/gpu/drm/i915/i915_gem_evict.c      |  4 ++--
>   drivers/gpu/drm/i915/i915_guc_submission.c |  9 +++++----
>   drivers/gpu/drm/i915/intel_display.c       |  2 +-
>   drivers/gpu/drm/i915/intel_lrc.c           |  6 +++---
>   drivers/gpu/drm/i915/intel_overlay.c       |  8 ++++----
>   drivers/gpu/drm/i915/intel_ringbuffer.h    |  1 -
>   10 files changed, 42 insertions(+), 38 deletions(-)

Well generally, I like this, 'cos it looks just like my patch that 
strangely was rejected, but ...

> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index ea5b9f6d0fc9..dee66807c6bd 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -1960,12 +1960,21 @@ static int i915_context_status(struct seq_file *m, void *unused)
>   			continue;
>
>   		seq_puts(m, "HW context ");
> +		if (IS_ERR(ctx->file_priv)) {
> +			seq_puts(m, "(deleted) ");
> +		} else if (ctx->file_priv) {
> +			struct pid *pid = ctx->file_priv->file->pid;
> +			struct task_struct *task;
> +
> +			task = get_pid_task(pid, PIDTYPE_PID);
> +			if (task) {
> +				seq_printf(m, "(%s [%d]) ",
> +					   task->comm, task->pid);
> +				put_task_struct(task);
> +			}
> +		} else
> +			seq_puts(m, "(kernel) ");

Improper formatting, needs {} round else clause. Would look prettier to 
put "if (!ctx->file_priv)" first of all in this if-ladder, so the 
trivial cases (kernel, deleted) are dealt with first.

>   		describe_ctx(m, ctx);
> -		for_each_ring(ring, dev_priv, i) {
> -			if (ring->default_context == ctx)
> -				seq_printf(m, "(default context %s) ",
> -					   ring->name);
> -		}
>
>   		if (i915.enable_execlists) {
>   			seq_putc(m, '\n');
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 5711ae3a22a1..4ada625b751e 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1703,6 +1703,7 @@ struct drm_i915_private {
>
>   	struct pci_dev *bridge_dev;
>   	struct intel_engine_cs ring[I915_NUM_RINGS];
> +	struct intel_context *kernel_context;
>   	struct drm_i915_gem_object *semaphore_obj;
>   	uint32_t last_seqno, next_seqno;
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index d705005ca26e..a82a06a61262 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -4097,7 +4097,7 @@ i915_gem_init_hw(struct drm_device *dev)
>   	 */
>   	init_unused_rings(dev);
>
> -	BUG_ON(!dev_priv->ring[RCS].default_context);
> +	BUG_ON(!dev_priv->kernel_context);
>
>   	ret = i915_ppgtt_init_hw(dev);
>   	if (ret) {
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 9f9892525945..593c22a702fa 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -216,6 +216,7 @@ static void context_close(struct intel_context *ctx)
>   	ctx->closed = true;
>   	if (ctx->ppgtt)
>   		i915_ppgtt_close(&ctx->ppgtt->base);
> +	ctx->file_priv = ERR_PTR(-ENOENT);

On the one hand, yes, I agree with zapping ctx->file_priv, if it's going 
to remain at all. I don't care whether it's NULLed or set to any other 
not-a-pointer (I had a version where it was POISONed).

On the other, I can't apply this patch to drm-intel-nightly without 
first taking at least some large subset of the 101 preceding patches; 
which I won't do because I don't have the time or knowledge to review 
every one of them, but if I take one without reviewing it then that 
invalidates reviewing subsequent dependants.

It's surely much easier for reviewers if patchsets are arranged in a 
short broad bush rather than a single long chain? I would probably give 
this my R-B if it were near the base of the tree and I could follow the 
(short) chain from root to tip; but here it's queued behind lots of 
things I don't want to check :(

>   	i915_gem_context_unreference(ctx);
>   }
>
> @@ -358,22 +359,21 @@ void i915_gem_context_reset(struct drm_device *dev)
>   			i915_gem_context_unreference(lctx);
>   			ring->last_context = NULL;
>   		}
> -
> -		/* Force the GPU state to be reinitialised on enabling */
> -		if (ring->default_context)
> -			ring->default_context->legacy_hw_ctx.initialized = false;
>   	}
> +
> +	/* Force the GPU state to be reinitialised on enabling */
> +	if (dev_priv->kernel_context)
> +		dev_priv->kernel_context->legacy_hw_ctx.initialized = false;
>   }
>
>   int i915_gem_context_init(struct drm_device *dev)
>   {
>   	struct drm_i915_private *dev_priv = dev->dev_private;
>   	struct intel_context *ctx;
> -	int i;
>
>   	/* Init should only be called once per module load. Eventually the
>   	 * restriction on the context_disabled check can be loosened. */
> -	if (WARN_ON(dev_priv->ring[RCS].default_context))
> +	if (WARN_ON(dev_priv->kernel_context))
>   		return 0;
>
>   	if (intel_vgpu_active(dev) && HAS_LOGICAL_RING_CONTEXTS(dev)) {
> @@ -402,13 +402,7 @@ int i915_gem_context_init(struct drm_device *dev)
>   			  PTR_ERR(ctx));
>   		return PTR_ERR(ctx);
>   	}
> -
> -	for (i = 0; i < I915_NUM_RINGS; i++) {
> -		struct intel_engine_cs *ring = &dev_priv->ring[i];
> -
> -		/* NB: RCS will hold a ref for all rings */
> -		ring->default_context = ctx;
> -	}
> +	dev_priv->kernel_context = ctx;
>
>   	DRM_DEBUG_DRIVER("%s context support initialized\n",
>   			i915.enable_execlists ? "LR" :
> @@ -419,7 +413,7 @@ int i915_gem_context_init(struct drm_device *dev)
>   void i915_gem_context_fini(struct drm_device *dev)
>   {
>   	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_context *dctx = dev_priv->ring[RCS].default_context;
> +	struct intel_context *dctx = dev_priv->kernel_context;
>   	int i;
>
>   	if (dctx->legacy_hw_ctx.rcs_state) {
> @@ -449,10 +443,10 @@ void i915_gem_context_fini(struct drm_device *dev)
>   	for (i = 0; i < I915_NUM_RINGS; i++) {
>   		struct intel_engine_cs *ring = &dev_priv->ring[i];
>
> -		if (ring->last_context)
> -			i915_gem_context_unreference(ring->last_context);
> +		if (ring->last_context == NULL)
> +			continue;
>
> -		ring->default_context = NULL;
> +		i915_gem_context_unreference(ring->last_context);
>   		ring->last_context = NULL;
>   	}
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
> index b7bcc324a7a7..679b7dd3a312 100644
> --- a/drivers/gpu/drm/i915/i915_gem_evict.c
> +++ b/drivers/gpu/drm/i915/i915_gem_evict.c
> @@ -45,10 +45,10 @@ static int switch_to_pinned_context(struct drm_i915_private *dev_priv)
>   	for_each_ring(ring, dev_priv, i) {
>   		struct drm_i915_gem_request *req;
>
> -		if (ring->last_context == ring->default_context)
> +		if (ring->last_context == dev_priv->kernel_context)
>   			continue;
>
> -		req = i915_gem_request_alloc(ring, ring->default_context);
> +		req = i915_gem_request_alloc(ring, dev_priv->kernel_context);
>   		if (IS_ERR(req))
>   			return PTR_ERR(req);
>
> diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
> index f4e09952d52c..63e58253280b 100644
> --- a/drivers/gpu/drm/i915/i915_guc_submission.c
> +++ b/drivers/gpu/drm/i915/i915_guc_submission.c
> @@ -937,11 +937,12 @@ int i915_guc_submission_enable(struct drm_device *dev)
>   {
>   	struct drm_i915_private *dev_priv = dev->dev_private;
>   	struct intel_guc *guc = &dev_priv->guc;
> -	struct intel_context *ctx = dev_priv->ring[RCS].default_context;
>   	struct i915_guc_client *client;
>
>   	/* client for execbuf submission */
> -	client = guc_client_alloc(dev, GUC_CTX_PRIORITY_KMD_NORMAL, ctx);
> +	client = guc_client_alloc(dev,
> +				  GUC_CTX_PRIORITY_KMD_NORMAL,
> +				  dev_priv->kernel_context);
>   	if (!client) {
>   		DRM_ERROR("Failed to create execbuf guc_client\n");
>   		return -ENOMEM;
> @@ -994,7 +995,7 @@ int intel_guc_suspend(struct drm_device *dev)
>   	if (!i915.enable_guc_submission)
>   		return 0;
>
> -	ctx = dev_priv->ring[RCS].default_context;
> +	ctx = dev_priv->kernel_context;
>
>   	data[0] = HOST2GUC_ACTION_ENTER_S_STATE;
>   	/* any value greater than GUC_POWER_D0 */
> @@ -1020,7 +1021,7 @@ int intel_guc_resume(struct drm_device *dev)
>   	if (!i915.enable_guc_submission)
>   		return 0;
>
> -	ctx = dev_priv->ring[RCS].default_context;
> +	ctx = dev_priv->kernel_context;
>
>   	data[0] = HOST2GUC_ACTION_EXIT_S_STATE;
>   	data[1] = GUC_POWER_D0;
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index f227cdaf38ec..e8f957785a64 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -11672,7 +11672,7 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
>   	 * into the display plane and skip any waits.
>   	 */
>   	if (!mmio_flip) {
> -		request = i915_gem_request_alloc(ring, ring->default_context);
> +		request = i915_gem_request_alloc(ring, ring->last_context);

Why is this ring->last_context rather than dev_priv->kernel_context? Is 
it guaranteed that the last context active on this engine is somehow 
connected with the flip activity? Could it not have been an earlier 
workload that prepared the framebuffer associated with the flip, then an 
unrelated batch in a different context ran on this engine, then the flip 
that relates to the earlier batch?

.Dave.

>   		if (IS_ERR(request)) {
>   			ret = PTR_ERR(request);
>   			goto cleanup_pending;
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 850cacdf6dda..4d5196547e78 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1058,7 +1058,7 @@ static int gen8_init_common_ring(struct intel_engine_cs *ring)
>   	struct drm_device *dev = ring->dev;
>   	struct drm_i915_private *dev_priv = dev->dev_private;
>   	lrc_setup_hardware_status_page(ring,
> -				ring->default_context->engine[ring->id].state);
> +			dev_priv->kernel_context->engine[ring->id].state);
>
>   	I915_WRITE_IMR(ring, ~(ring->irq_enable_mask | ring->irq_keep_mask));
>   	I915_WRITE(RING_HWSTAM(ring->mmio_base), 0xffffffff);
> @@ -1424,7 +1424,7 @@ void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
>   		kunmap(sg_page(ring->status_page.obj->pages->sgl));
>   		ring->status_page.obj = NULL;
>   	}
> -	intel_lr_context_unpin(ring->default_context, ring);
> +	intel_lr_context_unpin(ring->i915->kernel_context, ring);
>
>   	lrc_destroy_wa_ctx_obj(ring);
>   	ring->dev = NULL;
> @@ -1458,7 +1458,7 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
>   	if (ret)
>   		goto error;
>
> -	ctx = ring->default_context;
> +	ctx = ring->i915->kernel_context;
>
>   	ret = execlists_context_deferred_alloc(ctx, ring);
>   	if (ret)
> diff --git a/drivers/gpu/drm/i915/intel_overlay.c b/drivers/gpu/drm/i915/intel_overlay.c
> index df71c01f28f1..094ea87bf6be 100644
> --- a/drivers/gpu/drm/i915/intel_overlay.c
> +++ b/drivers/gpu/drm/i915/intel_overlay.c
> @@ -240,7 +240,7 @@ static int intel_overlay_on(struct intel_overlay *overlay)
>   	WARN_ON(overlay->active);
>   	WARN_ON(IS_I830(dev) && !(dev_priv->quirks & QUIRK_PIPEA_FORCE));
>
> -	req = i915_gem_request_alloc(ring, ring->default_context);
> +	req = i915_gem_request_alloc(ring, dev_priv->kernel_context);
>   	if (IS_ERR(req))
>   		return PTR_ERR(req);
>
> @@ -283,7 +283,7 @@ static int intel_overlay_continue(struct intel_overlay *overlay,
>   	if (tmp & (1 << 17))
>   		DRM_DEBUG("overlay underrun, DOVSTA: %x\n", tmp);
>
> -	req = i915_gem_request_alloc(ring, ring->default_context);
> +	req = i915_gem_request_alloc(ring, dev_priv->kernel_context);
>   	if (IS_ERR(req))
>   		return PTR_ERR(req);
>
> @@ -349,7 +349,7 @@ static int intel_overlay_off(struct intel_overlay *overlay)
>   	 * of the hw. Do it in both cases */
>   	flip_addr |= OFC_UPDATE;
>
> -	req = i915_gem_request_alloc(ring, ring->default_context);
> +	req = i915_gem_request_alloc(ring, dev_priv->kernel_context);
>   	if (IS_ERR(req))
>   		return PTR_ERR(req);
>
> @@ -423,7 +423,7 @@ static int intel_overlay_release_old_vid(struct intel_overlay *overlay)
>   		/* synchronous slowpath */
>   		struct drm_i915_gem_request *req;
>
> -		req = i915_gem_request_alloc(ring, ring->default_context);
> +		req = i915_gem_request_alloc(ring, dev_priv->kernel_context);
>   		if (req)
>   			return PTR_ERR(req);
>
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 3d4d5711aea9..868cc8d5abb3 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -316,7 +316,6 @@ struct intel_engine_cs {
>   	u32 last_submitted_seqno;
>   	unsigned user_interrupts;
>
> -	struct intel_context *default_context;
>   	struct intel_context *last_context;
>
>   	struct intel_engine_hangcheck hangcheck;
>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: [PATCH 072/190] drm/i915: Execlists cannot pin a context without the object
  2016-01-11  9:17 ` [PATCH 072/190] drm/i915: Execlists cannot pin a context without the object Chris Wilson
@ 2016-01-11 15:24   ` Tvrtko Ursulin
  0 siblings, 0 replies; 263+ messages in thread
From: Tvrtko Ursulin @ 2016-01-11 15:24 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 11/01/16 09:17, Chris Wilson wrote:
> Given that the intel_lr_context_pin cannot succeed without the object,
> we cannot reach intel_lr_context_unpin() without first allocating that
> object - so we can remove the redundant test.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/intel_lrc.c | 19 ++++++++-----------
>   1 file changed, 8 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 84a8bcc90d78..0f0bf97e4032 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -769,17 +769,14 @@ static int intel_lr_context_pin(struct drm_i915_gem_request *rq)
>   void intel_lr_context_unpin(struct drm_i915_gem_request *rq)
>   {
>   	int engine = rq->engine->id;
> -	struct drm_i915_gem_object *ctx_obj = rq->ctx->engine[engine].state;
> -	struct intel_ring *ring = rq->ring;
> -
> -	if (ctx_obj) {
> -		WARN_ON(!mutex_is_locked(&rq->i915->dev->struct_mutex));
> -		if (--rq->ctx->engine[engine].pin_count == 0) {
> -			intel_ring_unmap(ring);
> -			i915_gem_object_ggtt_unpin(ctx_obj);
> -			i915_gem_context_unreference(rq->ctx);
> -		}
> -	}
> +
> +	WARN_ON(!mutex_is_locked(&rq->i915->dev->struct_mutex));
> +	if (--rq->ctx->engine[engine].pin_count)
> +		return;
> +
> +	intel_ring_unmap(rq->ring);
> +	i915_gem_object_ggtt_unpin(rq->ctx->engine[engine].state);
> +	i915_gem_context_unreference(rq->ctx);
>   }
>
>   static int intel_logical_ring_workarounds_emit(struct drm_i915_gem_request *req)
>

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

There is another ctx_obj check in intel_execlists_retire_requests which 
could go.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: [PATCH 019/190] drm/i915: Separate out the seqno-barrier from engine->get_seqno
  2016-01-11  9:16 ` [PATCH 019/190] drm/i915: Separate out the seqno-barrier from engine->get_seqno Chris Wilson
@ 2016-01-11 15:43   ` Dave Gordon
  0 siblings, 0 replies; 263+ messages in thread
From: Dave Gordon @ 2016-01-11 15:43 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 11/01/16 09:16, Chris Wilson wrote:
> In order to simplify the next couple of patches, extract the
> lazy_coherency optimisation out of the engine->get_seqno() vfunc into
> its own callback.
>
> v2: Rename the barrier to engine->irq_seqno_barrier to try and better
> reflect that the barrier is only required after the user interrupt before
> reading the seqno (to ensure that the seqno update lands in time as we
> do not have strict seqno-irq ordering on all platforms).
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_debugfs.c      |  6 ++---
>   drivers/gpu/drm/i915/i915_drv.h          | 12 ++++++----
>   drivers/gpu/drm/i915/i915_gpu_error.c    |  2 +-
>   drivers/gpu/drm/i915/i915_irq.c          |  4 ++--
>   drivers/gpu/drm/i915/i915_trace.h        |  2 +-
>   drivers/gpu/drm/i915/intel_breadcrumbs.c |  4 ++--
>   drivers/gpu/drm/i915/intel_lrc.c         | 39 ++++++++++++--------------------
>   drivers/gpu/drm/i915/intel_ringbuffer.c  | 36 +++++++++++++++--------------
>   drivers/gpu/drm/i915/intel_ringbuffer.h  |  4 ++--
>   9 files changed, 53 insertions(+), 56 deletions(-)

All looks OK, so

Reviewed-by: Dave Gordon <david.s.gordon@intel.com>

> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 9396597b136d..1499e2337e5d 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -600,7 +600,7 @@ static int i915_gem_pageflip_info(struct seq_file *m, void *data)
>   					   ring->name,
>   					   i915_gem_request_get_seqno(work->flip_queued_req),
>   					   dev_priv->next_seqno,
> -					   ring->get_seqno(ring, true),
> +					   ring->get_seqno(ring),
>   					   i915_gem_request_completed(work->flip_queued_req, true));
>   			} else
>   				seq_printf(m, "Flip not associated with any ring\n");
> @@ -734,7 +734,7 @@ static void i915_ring_seqno_info(struct seq_file *m,
>
>   	if (ring->get_seqno) {
>   		seq_printf(m, "Current sequence (%s): %x\n",
> -			   ring->name, ring->get_seqno(ring, false));
> +			   ring->name, ring->get_seqno(ring));
>   	}
>
>   	spin_lock(&ring->breadcrumbs.lock);
> @@ -1354,7 +1354,7 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
>   	intel_runtime_pm_get(dev_priv);
>
>   	for_each_ring(ring, dev_priv, i) {
> -		seqno[i] = ring->get_seqno(ring, false);
> +		seqno[i] = ring->get_seqno(ring);
>   		acthd[i] = intel_ring_get_active_head(ring);
>   	}
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index a9e8de57e848..9762aa76bb0a 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2972,15 +2972,19 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2)
>   static inline bool i915_gem_request_started(struct drm_i915_gem_request *req,
>   					   bool lazy_coherency)
>   {
> -	u32 seqno = req->ring->get_seqno(req->ring, lazy_coherency);
> -	return i915_seqno_passed(seqno, req->previous_seqno);
> +	if (!lazy_coherency && req->ring->irq_seqno_barrier)
> +		req->ring->irq_seqno_barrier(req->ring);
> +	return i915_seqno_passed(req->ring->get_seqno(req->ring),
> +				 req->previous_seqno);
>   }

We have a different implementation of request_started() with the 
scheduler, but we'll just update this when the scheduler goes in.

.Dave.

>   static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
>   					      bool lazy_coherency)
>   {
> -	u32 seqno = req->ring->get_seqno(req->ring, lazy_coherency);
> -	return i915_seqno_passed(seqno, req->seqno);
> +	if (!lazy_coherency && req->ring->irq_seqno_barrier)
> +		req->ring->irq_seqno_barrier(req->ring);
> +	return i915_seqno_passed(req->ring->get_seqno(req->ring),
> +				 req->seqno);
>   }
>
>   int __must_check i915_gem_get_seqno(struct drm_device *dev, u32 *seqno);
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index f805d117f3d1..01d0206ca4dd 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -902,8 +902,8 @@ static void i915_record_ring_state(struct drm_device *dev,
>
>   	ering->waiting = intel_engine_has_waiter(ring);
>   	ering->instpm = I915_READ(RING_INSTPM(ring->mmio_base));
> -	ering->seqno = ring->get_seqno(ring, false);
>   	ering->acthd = intel_ring_get_active_head(ring);
> +	ering->seqno = ring->get_seqno(ring);
>   	ering->start = I915_READ_START(ring);
>   	ering->head = I915_READ_HEAD(ring);
>   	ering->tail = I915_READ_TAIL(ring);
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 95b997a57da8..d73669783045 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -2903,7 +2903,7 @@ static int semaphore_passed(struct intel_engine_cs *ring)
>   	if (signaller->hangcheck.deadlock >= I915_NUM_RINGS)
>   		return -1;
>
> -	if (i915_seqno_passed(signaller->get_seqno(signaller, false), seqno))
> +	if (i915_seqno_passed(signaller->get_seqno(signaller), seqno))
>   		return 1;
>
>   	/* cursory check for an unkickable deadlock */
> @@ -3067,8 +3067,8 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
>
>   		semaphore_clear_deadlocks(dev_priv);
>
> -		seqno = ring->get_seqno(ring, false);
>   		acthd = intel_ring_get_active_head(ring);
> +		seqno = ring->get_seqno(ring);
>
>   		if (ring->hangcheck.seqno == seqno) {
>   			if (ring_idle(ring, seqno)) {
> diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
> index 52b2d409945d..cfb5f78a6e84 100644
> --- a/drivers/gpu/drm/i915/i915_trace.h
> +++ b/drivers/gpu/drm/i915/i915_trace.h
> @@ -573,7 +573,7 @@ TRACE_EVENT(i915_gem_request_notify,
>   	    TP_fast_assign(
>   			   __entry->dev = ring->dev->primary->index;
>   			   __entry->ring = ring->id;
> -			   __entry->seqno = ring->get_seqno(ring, false);
> +			   __entry->seqno = ring->get_seqno(ring);
>   			   ),
>
>   	    TP_printk("dev=%u, ring=%u, seqno=%u",
> diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> index 9f756583a44e..10b0add54acf 100644
> --- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
> +++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> @@ -127,7 +127,7 @@ bool intel_engine_add_wait(struct intel_engine_cs *engine,
>   			   struct intel_wait *wait)
>   {
>   	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> -	u32 seqno = engine->get_seqno(engine, true);
> +	u32 seqno = engine->get_seqno(engine);
>   	struct rb_node **p, *parent, *completed;
>   	bool first;
>
> @@ -269,7 +269,7 @@ void intel_engine_remove_wait(struct intel_engine_cs *engine,
>   			 * the first_waiter. This is undesirable if that
>   			 * waiter is a high priority task.
>   			 */
> -			u32 seqno = engine->get_seqno(engine, true);
> +			u32 seqno = engine->get_seqno(engine);
>   			while (i915_seqno_passed(seqno,
>   						 to_wait(next)->seqno)) {
>   				struct rb_node *n = rb_next(next);
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 16fa58a0a930..333e95bda78a 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1775,7 +1775,7 @@ static int gen8_emit_flush_render(struct drm_i915_gem_request *request,
>   	return 0;
>   }
>
> -static u32 gen8_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency)
> +static u32 gen8_get_seqno(struct intel_engine_cs *ring)
>   {
>   	return intel_read_status_page(ring, I915_GEM_HWS_INDEX);
>   }
> @@ -1785,9 +1785,8 @@ static void gen8_set_seqno(struct intel_engine_cs *ring, u32 seqno)
>   	intel_write_status_page(ring, I915_GEM_HWS_INDEX, seqno);
>   }
>
> -static u32 bxt_a_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency)
> +static void bxt_seqno_barrier(struct intel_engine_cs *ring)
>   {
> -
>   	/*
>   	 * On BXT A steppings there is a HW coherency issue whereby the
>   	 * MI_STORE_DATA_IMM storing the completed request's seqno
> @@ -1798,11 +1797,7 @@ static u32 bxt_a_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency)
>   	 * bxt_a_set_seqno(), where we also do a clflush after the write. So
>   	 * this clflush in practice becomes an invalidate operation.
>   	 */
> -
> -	if (!lazy_coherency)
> -		intel_flush_status_page(ring, I915_GEM_HWS_INDEX);
> -
> -	return intel_read_status_page(ring, I915_GEM_HWS_INDEX);
> +	intel_flush_status_page(ring, I915_GEM_HWS_INDEX);
>   }
>
>   static void bxt_a_set_seqno(struct intel_engine_cs *ring, u32 seqno)
> @@ -2007,12 +2002,11 @@ static int logical_render_ring_init(struct drm_device *dev)
>   		ring->init_hw = gen8_init_render_ring;
>   	ring->init_context = gen8_init_rcs_context;
>   	ring->cleanup = intel_fini_pipe_control;
> +	ring->get_seqno = gen8_get_seqno;
> +	ring->set_seqno = gen8_set_seqno;
>   	if (IS_BXT_REVID(dev, 0, BXT_REVID_A1)) {
> -		ring->get_seqno = bxt_a_get_seqno;
> +		ring->irq_seqno_barrier = bxt_seqno_barrier;
>   		ring->set_seqno = bxt_a_set_seqno;
> -	} else {
> -		ring->get_seqno = gen8_get_seqno;
> -		ring->set_seqno = gen8_set_seqno;
>   	}
>   	ring->emit_request = gen8_emit_request;
>   	ring->emit_flush = gen8_emit_flush_render;
> @@ -2059,12 +2053,11 @@ static int logical_bsd_ring_init(struct drm_device *dev)
>   		GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS1_IRQ_SHIFT;
>
>   	ring->init_hw = gen8_init_common_ring;
> +	ring->get_seqno = gen8_get_seqno;
> +	ring->set_seqno = gen8_set_seqno;
>   	if (IS_BXT_REVID(dev, 0, BXT_REVID_A1)) {
> -		ring->get_seqno = bxt_a_get_seqno;
> +		ring->irq_seqno_barrier = bxt_seqno_barrier;
>   		ring->set_seqno = bxt_a_set_seqno;
> -	} else {
> -		ring->get_seqno = gen8_get_seqno;
> -		ring->set_seqno = gen8_set_seqno;
>   	}
>   	ring->emit_request = gen8_emit_request;
>   	ring->emit_flush = gen8_emit_flush;
> @@ -2114,12 +2107,11 @@ static int logical_blt_ring_init(struct drm_device *dev)
>   		GT_CONTEXT_SWITCH_INTERRUPT << GEN8_BCS_IRQ_SHIFT;
>
>   	ring->init_hw = gen8_init_common_ring;
> +	ring->get_seqno = gen8_get_seqno;
> +	ring->set_seqno = gen8_set_seqno;
>   	if (IS_BXT_REVID(dev, 0, BXT_REVID_A1)) {
> -		ring->get_seqno = bxt_a_get_seqno;
> +		ring->irq_seqno_barrier = bxt_seqno_barrier;
>   		ring->set_seqno = bxt_a_set_seqno;
> -	} else {
> -		ring->get_seqno = gen8_get_seqno;
> -		ring->set_seqno = gen8_set_seqno;
>   	}
>   	ring->emit_request = gen8_emit_request;
>   	ring->emit_flush = gen8_emit_flush;
> @@ -2144,12 +2136,11 @@ static int logical_vebox_ring_init(struct drm_device *dev)
>   		GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VECS_IRQ_SHIFT;
>
>   	ring->init_hw = gen8_init_common_ring;
> +	ring->get_seqno = gen8_get_seqno;
> +	ring->set_seqno = gen8_set_seqno;
>   	if (IS_BXT_REVID(dev, 0, BXT_REVID_A1)) {
> -		ring->get_seqno = bxt_a_get_seqno;
> +		ring->irq_seqno_barrier = bxt_seqno_barrier;
>   		ring->set_seqno = bxt_a_set_seqno;
> -	} else {
> -		ring->get_seqno = gen8_get_seqno;
> -		ring->set_seqno = gen8_set_seqno;
>   	}
>   	ring->emit_request = gen8_emit_request;
>   	ring->emit_flush = gen8_emit_flush;
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 60b0df2c5399..57ec21c5b1ab 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1485,8 +1485,8 @@ pc_render_add_request(struct drm_i915_gem_request *req)
>   	return 0;
>   }
>
> -static u32
> -gen6_ring_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency)
> +static void
> +gen6_seqno_barrier(struct intel_engine_cs *ring)
>   {
>   	/* Workaround to force correct ordering between irq and seqno writes on
>   	 * ivb (and maybe also on snb) by reading from a CS register (like
> @@ -1500,18 +1500,14 @@ gen6_ring_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency)
>   	 * a delay after every batch i.e. much more frequent than a delay
>   	 * when waiting for the interrupt (with the same net latency).
>   	 */
> -	if (!lazy_coherency) {
> -		struct drm_i915_private *dev_priv = ring->dev->dev_private;
> -		POSTING_READ_FW(RING_ACTHD(ring->mmio_base));
> -
> -		intel_flush_status_page(ring, I915_GEM_HWS_INDEX);
> -	}
> +	struct drm_i915_private *dev_priv = ring->i915;
> +	POSTING_READ_FW(RING_ACTHD(ring->mmio_base));
>
> -	return intel_read_status_page(ring, I915_GEM_HWS_INDEX);
> +	intel_flush_status_page(ring, I915_GEM_HWS_INDEX);
>   }
>
>   static u32
> -ring_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency)
> +ring_get_seqno(struct intel_engine_cs *ring)
>   {
>   	return intel_read_status_page(ring, I915_GEM_HWS_INDEX);
>   }
> @@ -1523,7 +1519,7 @@ ring_set_seqno(struct intel_engine_cs *ring, u32 seqno)
>   }
>
>   static u32
> -pc_render_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency)
> +pc_render_get_seqno(struct intel_engine_cs *ring)
>   {
>   	return ring->scratch.cpu_page[0];
>   }
> @@ -2698,7 +2694,8 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
>   		ring->irq_get = gen8_ring_get_irq;
>   		ring->irq_put = gen8_ring_put_irq;
>   		ring->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
> -		ring->get_seqno = gen6_ring_get_seqno;
> +		ring->irq_seqno_barrier = gen6_seqno_barrier;
> +		ring->get_seqno = ring_get_seqno;
>   		ring->set_seqno = ring_set_seqno;
>   		if (i915_semaphore_is_enabled(dev)) {
>   			WARN_ON(!dev_priv->semaphore_obj);
> @@ -2715,7 +2712,8 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
>   		ring->irq_get = gen6_ring_get_irq;
>   		ring->irq_put = gen6_ring_put_irq;
>   		ring->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
> -		ring->get_seqno = gen6_ring_get_seqno;
> +		ring->irq_seqno_barrier = gen6_seqno_barrier;
> +		ring->get_seqno = ring_get_seqno;
>   		ring->set_seqno = ring_set_seqno;
>   		if (i915_semaphore_is_enabled(dev)) {
>   			ring->semaphore.sync_to = gen6_ring_sync;
> @@ -2829,7 +2827,8 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev)
>   			ring->write_tail = gen6_bsd_ring_write_tail;
>   		ring->flush = gen6_bsd_ring_flush;
>   		ring->add_request = gen6_add_request;
> -		ring->get_seqno = gen6_ring_get_seqno;
> +		ring->irq_seqno_barrier = gen6_seqno_barrier;
> +		ring->get_seqno = ring_get_seqno;
>   		ring->set_seqno = ring_set_seqno;
>   		if (INTEL_INFO(dev)->gen >= 8) {
>   			ring->irq_enable_mask =
> @@ -2901,7 +2900,8 @@ int intel_init_bsd2_ring_buffer(struct drm_device *dev)
>   	ring->mmio_base = GEN8_BSD2_RING_BASE;
>   	ring->flush = gen6_bsd_ring_flush;
>   	ring->add_request = gen6_add_request;
> -	ring->get_seqno = gen6_ring_get_seqno;
> +	ring->irq_seqno_barrier = gen6_seqno_barrier;
> +	ring->get_seqno = ring_get_seqno;
>   	ring->set_seqno = ring_set_seqno;
>   	ring->irq_enable_mask =
>   			GT_RENDER_USER_INTERRUPT << GEN8_VCS2_IRQ_SHIFT;
> @@ -2931,7 +2931,8 @@ int intel_init_blt_ring_buffer(struct drm_device *dev)
>   	ring->write_tail = ring_write_tail;
>   	ring->flush = gen6_ring_flush;
>   	ring->add_request = gen6_add_request;
> -	ring->get_seqno = gen6_ring_get_seqno;
> +	ring->irq_seqno_barrier = gen6_seqno_barrier;
> +	ring->get_seqno = ring_get_seqno;
>   	ring->set_seqno = ring_set_seqno;
>   	if (INTEL_INFO(dev)->gen >= 8) {
>   		ring->irq_enable_mask =
> @@ -2988,7 +2989,8 @@ int intel_init_vebox_ring_buffer(struct drm_device *dev)
>   	ring->write_tail = ring_write_tail;
>   	ring->flush = gen6_ring_flush;
>   	ring->add_request = gen6_add_request;
> -	ring->get_seqno = gen6_ring_get_seqno;
> +	ring->irq_seqno_barrier = gen6_seqno_barrier;
> +	ring->get_seqno = ring_get_seqno;
>   	ring->set_seqno = ring_set_seqno;
>
>   	if (INTEL_INFO(dev)->gen >= 8) {
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 51fcb66bfc4a..3b49726b1732 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -219,8 +219,8 @@ struct  intel_engine_cs {
>   	 * seen value is good enough. Note that the seqno will always be
>   	 * monotonic, even if not coherent.
>   	 */
> -	u32		(*get_seqno)(struct intel_engine_cs *ring,
> -				     bool lazy_coherency);
> +	void		(*irq_seqno_barrier)(struct intel_engine_cs *ring);
> +	u32		(*get_seqno)(struct intel_engine_cs *ring);
>   	void		(*set_seqno)(struct intel_engine_cs *ring,
>   				     u32 seqno);
>   	int		(*dispatch_execbuffer)(struct drm_i915_gem_request *req,
>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: [PATCH 020/190] drm/i915: Remove the lazy_coherency parameter from request-completed?
  2016-01-11  9:16 ` [PATCH 020/190] drm/i915: Remove the lazy_coherency parameter from request-completed? Chris Wilson
@ 2016-01-11 15:45   ` Dave Gordon
  2016-01-11 16:24     ` Chris Wilson
  2016-01-12 10:27   ` Mika Kuoppala
  1 sibling, 1 reply; 263+ messages in thread
From: Dave Gordon @ 2016-01-11 15:45 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 11/01/16 09:16, Chris Wilson wrote:
> Now that we have split out the seqno-barrier from the
> engine->get_seqno() callback itself, we can move the users of the
> seqno-barrier to the required callsites, simplifying the common code and
> making the required workaround handling much more explicit.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_debugfs.c  |  4 ++--
>   drivers/gpu/drm/i915/i915_drv.h      | 17 ++++++++---------
>   drivers/gpu/drm/i915/i915_gem.c      | 24 ++++++++++++++++--------
>   drivers/gpu/drm/i915/intel_display.c |  2 +-
>   drivers/gpu/drm/i915/intel_pm.c      |  4 ++--
>   5 files changed, 29 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 1499e2337e5d..d09e48455dcb 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -601,7 +601,7 @@ static int i915_gem_pageflip_info(struct seq_file *m, void *data)
>   					   i915_gem_request_get_seqno(work->flip_queued_req),
>   					   dev_priv->next_seqno,
>   					   ring->get_seqno(ring),
> -					   i915_gem_request_completed(work->flip_queued_req, true));
> +					   i915_gem_request_completed(work->flip_queued_req));
>   			} else
>   				seq_printf(m, "Flip not associated with any ring\n");
>   			seq_printf(m, "Flip queued on frame %d, (was ready on frame %d), now %d\n",
> @@ -1354,8 +1354,8 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
>   	intel_runtime_pm_get(dev_priv);
>
>   	for_each_ring(ring, dev_priv, i) {
> -		seqno[i] = ring->get_seqno(ring);
>   		acthd[i] = intel_ring_get_active_head(ring);
> +		seqno[i] = ring->get_seqno(ring);
>   	}
>
>   	i915_get_extra_instdone(dev, instdone);
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 9762aa76bb0a..44d46018ee13 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2969,20 +2969,14 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2)
>   	return (int32_t)(seq1 - seq2) >= 0;
>   }
>
> -static inline bool i915_gem_request_started(struct drm_i915_gem_request *req,
> -					   bool lazy_coherency)
> +static inline bool i915_gem_request_started(struct drm_i915_gem_request *req)
>   {
> -	if (!lazy_coherency && req->ring->irq_seqno_barrier)
> -		req->ring->irq_seqno_barrier(req->ring);
>   	return i915_seqno_passed(req->ring->get_seqno(req->ring),
>   				 req->previous_seqno);
>   }
> -static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
> -					      bool lazy_coherency)
> +static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
>   {
> -	if (!lazy_coherency && req->ring->irq_seqno_barrier)
> -		req->ring->irq_seqno_barrier(req->ring);
>   	return i915_seqno_passed(req->ring->get_seqno(req->ring),
>   				 req->seqno);
>   }
> @@ -3636,6 +3630,8 @@ static inline void i915_trace_irq_get(struct intel_engine_cs *ring,
>
>   static inline bool __i915_request_irq_complete(struct drm_i915_gem_request *req)
>   {
> +	struct intel_engine_cs *engine = req->ring;
> +
>   	/* Ensure our read of the seqno is coherent so that we
>   	 * do not "miss an interrupt" (i.e. if this is the last
>   	 * request and the seqno write from the GPU is not visible
> @@ -3647,7 +3643,10 @@ static inline bool __i915_request_irq_complete(struct drm_i915_gem_request *req)
>   	 * but it is easier and safer to do it every time the waiter
>   	 * is woken.
>   	 */
> -	if (i915_gem_request_completed(req, false))
> +	if (engine->irq_seqno_barrier)
> +		engine->irq_seqno_barrier(engine);

I'm still not convinced that this is the right place for the magic, but 
at least it's preferable to having a lazy_coherency parameter. So on the 
basis that the proper review criterion is "better than before, and 
creates no new problems", and not "fixes all known issues", then

Reviewed-by: Dave Gordon <david.s.gordon@intel.com>

> +
> +	if (i915_gem_request_completed(req))
>   		return true;
>
>   	/* We need to check whether any gpu reset happened in between
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 4b26529f1f44..d125820c6309 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1171,12 +1171,12 @@ static bool __i915_spin_request(struct drm_i915_gem_request *req,
>   	 */
>
>   	/* Only spin if we know the GPU is processing this request */
> -	if (!i915_gem_request_started(req, true))
> +	if (!i915_gem_request_started(req))
>   		return false;
>
>   	timeout = local_clock_us(&cpu) + 5;
>   	do {
> -		if (i915_gem_request_completed(req, true))
> +		if (i915_gem_request_completed(req))
>   			return true;
>
>   		if (signal_pending_state(state, wait->task))
> @@ -1228,7 +1228,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>   	if (list_empty(&req->list))
>   		return 0;
>
> -	if (i915_gem_request_completed(req, true))
> +	if (i915_gem_request_completed(req))
>   		return 0;
>
>   	timeout_remain = MAX_SCHEDULE_TIMEOUT;
> @@ -2724,8 +2724,16 @@ i915_gem_find_active_request(struct intel_engine_cs *ring)
>   {
>   	struct drm_i915_gem_request *request;
>
> +	/* We are called by the error capture and reset at a random
> +	 * point in time. In particular, note that neither is crucially
> +	 * ordered with an interrupt. After a hang, the GPU is dead and we
> +	 * assume that no more writes can happen (we waited long enough for
> +	 * all writes that were in transaction to be flushed) - adding an
> +	 * extra delay for a recent interrupt is pointless. Hence, we do
> +	 * not need an engine->irq_seqno_barrier() before the seqno reads.
> +	 */
>   	list_for_each_entry(request, &ring->request_list, list) {
> -		if (i915_gem_request_completed(request, false))
> +		if (i915_gem_request_completed(request))
>   			continue;
>
>   		return request;
> @@ -2859,7 +2867,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
>   					   struct drm_i915_gem_request,
>   					   list);
>
> -		if (!i915_gem_request_completed(request, true))
> +		if (!i915_gem_request_completed(request))
>   			break;
>
>   		i915_gem_request_retire(request);
> @@ -2883,7 +2891,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
>   	}
>
>   	if (unlikely(ring->trace_irq_req &&
> -		     i915_gem_request_completed(ring->trace_irq_req, true))) {
> +		     i915_gem_request_completed(ring->trace_irq_req))) {
>   		ring->irq_put(ring);
>   		i915_gem_request_assign(&ring->trace_irq_req, NULL);
>   	}
> @@ -2995,7 +3003,7 @@ i915_gem_object_flush_active(struct drm_i915_gem_object *obj)
>   		if (list_empty(&req->list))
>   			goto retire;
>
> -		if (i915_gem_request_completed(req, true)) {
> +		if (i915_gem_request_completed(req)) {
>   			__i915_gem_request_retire__upto(req);
>   retire:
>   			i915_gem_object_retire__read(obj, i);
> @@ -3104,7 +3112,7 @@ __i915_gem_object_sync(struct drm_i915_gem_object *obj,
>   	if (to == from)
>   		return 0;
>
> -	if (i915_gem_request_completed(from_req, true))
> +	if (i915_gem_request_completed(from_req))
>   		return 0;
>
>   	if (!i915_semaphore_is_enabled(obj->base.dev)) {
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 7e36f85d3109..de4d4a0d923a 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -11523,7 +11523,7 @@ static bool __intel_pageflip_stall_check(struct drm_device *dev,
>
>   	if (work->flip_ready_vblank == 0) {
>   		if (work->flip_queued_req &&
> -		    !i915_gem_request_completed(work->flip_queued_req, true))
> +		    !i915_gem_request_completed(work->flip_queued_req))
>   			return false;
>
>   		work->flip_ready_vblank = drm_crtc_vblank_count(crtc);
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index 9df9e9a22f3c..401c3770057d 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -7286,7 +7286,7 @@ static void __intel_rps_boost_work(struct work_struct *work)
>   	struct request_boost *boost = container_of(work, struct request_boost, work);
>   	struct drm_i915_gem_request *req = boost->req;
>
> -	if (!i915_gem_request_completed(req, true))
> +	if (!i915_gem_request_completed(req))
>   		gen6_rps_boost(to_i915(req->ring->dev), NULL,
>   			       req->emitted_jiffies);
>
> @@ -7302,7 +7302,7 @@ void intel_queue_rps_boost_for_request(struct drm_device *dev,
>   	if (req == NULL || INTEL_INFO(dev)->gen < 6)
>   		return;
>
> -	if (i915_gem_request_completed(req, true))
> +	if (i915_gem_request_completed(req))
>   		return;
>
>   	boost = kmalloc(sizeof(*boost), GFP_ATOMIC);
>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: [PATCH 036/190] drm/i915: Restore waitboost credit to the synchronous waiter
  2016-01-11  9:16 ` [PATCH 036/190] drm/i915: Restore waitboost credit to the synchronous waiter Chris Wilson
@ 2016-01-11 16:10   ` Jesse Barnes
  0 siblings, 0 replies; 263+ messages in thread
From: Jesse Barnes @ 2016-01-11 16:10 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 01/11/2016 01:16 AM, Chris Wilson wrote:
> Ideally, we want to automagically have the GPU respond to the
> instantaneous load by reclocking itself. However, reclocking occurs
> relatively slowly, and to the client waiting for a result from the GPU,
> too late. To compensate and reduce the client latency, we allow the
> first wait from a client to boost the GPU clocks to maximum. This
> overcomes the lag in autoreclocking, at the expense of forcing the GPU
> clocks too high. So to offset the excessive power usage, we currently
> allow a client to only boost the clocks once before we detect the GPU
> is idle again. This works reasonably for, say, the first frame in a
> benchmark, but for many more synchronous workloads (like OpenCL) we find
> the GPU clocks remain too low. By noting a wait which would idle the GPU
> (i.e. we just waited upon the last known request), we can give that
> client the idle boost credit (for their next wait) without the 100ms
> delay required for us to detect the GPU idle state. The intention is to
> boost clients that are stalling in the process of feeding the GPU more
> work (and who in doing so let the GPU idle), without granting boost
> credits to clients that are throttling themselves (such as compositors).
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: "Zou, Nanhai" <nanhai.zou@intel.com>
> Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index e9f5ca7ea835..3fea582768e9 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1314,6 +1314,22 @@ complete:
>  			*timeout = 0;
>  	}
>  
> +	if (ret == 0 && rps && req->seqno == req->ring->last_submitted_seqno) {
> +		/* The GPU is now idle and this client has stalled.
> +		 * Since no other client has submitted a request in the
> +		 * meantime, assume that this client is the only one
> +		 * supplying work to the GPU but is unable to keep that
> +		 * work supplied because it is waiting. Since the GPU is
> +		 * then never kept fully busy, RPS autoclocking will
> +		 * keep the clocks relatively low, causing further delays.
> +		 * Compensate by giving the synchronous client credit for
> +		 * a waitboost next time.
> +		 */
> +		spin_lock(&req->i915->rps.client_lock);
> +		list_del_init(&rps->link);
> +		spin_unlock(&req->i915->rps.client_lock);
> +	}
> +
>  	return ret;
>  }
>  
> 

Assuming this works for the OCL guys, it seems ok.  Doing the
list_del_init(&rps->link) is a bit of an obfuscated way of doing it, but
I guess the comment makes it pretty clear.
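
i.e. as I read it, spelled out with a comment (my gloss on the rps
client tracking this builds on, not authoritative):

	spin_lock(&req->i915->rps.client_lock);
	/* Forget this client's earlier boost so that its next wait is
	 * eligible to waitboost again, without the ~100ms wait for the
	 * idle detection to re-arm it.
	 */
	list_del_init(&rps->link);
	spin_unlock(&req->i915->rps.client_lock);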

Jesse
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: [PATCH 020/190] drm/i915: Remove the lazy_coherency parameter from request-completed?
  2016-01-11 15:45   ` Dave Gordon
@ 2016-01-11 16:24     ` Chris Wilson
  0 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 16:24 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx

On Mon, Jan 11, 2016 at 03:45:08PM +0000, Dave Gordon wrote:
> >@@ -3647,7 +3643,10 @@ static inline bool __i915_request_irq_complete(struct drm_i915_gem_request *req)
> >  	 * but it is easier and safer to do it every time the waiter
> >  	 * is woken.
> >  	 */
> >-	if (i915_gem_request_completed(req, false))
> >+	if (engine->irq_seqno_barrier)
> >+		engine->irq_seqno_barrier(engine);
> 
> I'm still not convinced that this is the right place for the magic,
> but at least it's preferable to having a lazy_coherency parameter.
> So on the basis that the proper review criterion is "better than
> before, and creates no new problems", and not "fixes all known
> issues", then

I've tried very hard to explain why this must be, and only be, in the
interrupt handler bottom-half. The basic premise is that we need some
delay between the interrupt and the seqno read (on certain platforms) to
compensate for the unordered nature of the MI_STORE_DWORD_INDEX vs
MI_USER_INTERRUPT. There are other ways of inserting that delay, mostly
by adding it in the ring between the data store and the interrupt, but
my preference is for inserting the delay only as required on the waiter
side rather than after every single batch (based on the current state of
affairs where waits should be rare). Even more so if we can convince
userspace to switch over to userspace fences, rather than hitting the
ioctls every time. If you are interested, the mesa side looks like
http://cgit.freedesktop.org/~ickle/mesa/commit/?h=brw-batch2&id=f2c13b212dd30562db7b405d6ea79c87456ead51
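
For completeness, the waiter-side ordering I mean reduces to something
like this sketch (the function name is illustrative; irq_seqno_barrier
is an optional per-platform hook, hence the NULL check):

static bool example_seqno_passed(struct drm_i915_gem_request *req)
{
	struct intel_engine_cs *engine = req->ring;

	/* Make the posted seqno write visible before we trust the
	 * cached read from the status page.
	 */
	if (engine->irq_seqno_barrier)
		engine->irq_seqno_barrier(engine);

	return i915_seqno_passed(engine->get_seqno(engine), req->seqno);
}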

The opposite question is where do you think is the right place to insert
the interrupt delay?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: [PATCH 073/190] drm/i915: Introduce i915_gem_active for request tracking
  2016-01-11  9:17 ` [PATCH 073/190] drm/i915: Introduce i915_gem_active for request tracking Chris Wilson
@ 2016-01-11 17:32   ` Tvrtko Ursulin
  2016-01-11 22:49     ` Chris Wilson
  0 siblings, 1 reply; 263+ messages in thread
From: Tvrtko Ursulin @ 2016-01-11 17:32 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 11/01/16 09:17, Chris Wilson wrote:
> In the next patch, request tracking is made more generic and for that we
> need a new expanded struct and to separate out the logic changes from
> the mechanical churn, we split out the structure renaming into this
> patch.
>
> v2: Writer's block. Add some spiel about why we track requests.
> v3: Now i915_gem_active.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_debugfs.c        | 10 +++---
>   drivers/gpu/drm/i915/i915_drv.h            |  9 +++--
>   drivers/gpu/drm/i915/i915_gem.c            | 56 +++++++++++++++---------------
>   drivers/gpu/drm/i915/i915_gem_execbuffer.c |  4 +--
>   drivers/gpu/drm/i915/i915_gem_fence.c      |  6 ++--
>   drivers/gpu/drm/i915/i915_gem_request.h    | 38 ++++++++++++++++++++
>   drivers/gpu/drm/i915/i915_gem_tiling.c     |  2 +-
>   drivers/gpu/drm/i915/i915_gpu_error.c      |  6 ++--
>   drivers/gpu/drm/i915/intel_display.c       | 10 +++---
>   9 files changed, 89 insertions(+), 52 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 8de944ed3369..65cb1d6a5d64 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -146,10 +146,10 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
>   		   obj->base.write_domain);
>   	for_each_ring(ring, dev_priv, i)
>   		seq_printf(m, "%x ",
> -				i915_gem_request_get_seqno(obj->last_read_req[i]));
> +				i915_gem_request_get_seqno(obj->last_read[i].request));
>   	seq_printf(m, "] %x %x%s%s%s",
> -		   i915_gem_request_get_seqno(obj->last_write_req),
> -		   i915_gem_request_get_seqno(obj->last_fenced_req),
> +		   i915_gem_request_get_seqno(obj->last_write.request),
> +		   i915_gem_request_get_seqno(obj->last_fence.request),
>   		   i915_cache_level_str(to_i915(obj->base.dev), obj->cache_level),
>   		   obj->dirty ? " dirty" : "",
>   		   obj->madv == I915_MADV_DONTNEED ? " purgeable" : "");
> @@ -184,8 +184,8 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
>   		*t = '\0';
>   		seq_printf(m, " (%s mappable)", s);
>   	}
> -	if (obj->last_write_req != NULL)
> -		seq_printf(m, " (%s)", obj->last_write_req->engine->name);
> +	if (obj->last_write.request != NULL)
> +		seq_printf(m, " (%s)", obj->last_write.request->engine->name);
>   	if (obj->frontbuffer_bits)
>   		seq_printf(m, " (frontbuffer: 0x%03x)", obj->frontbuffer_bits);
>   }
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index cae448e238ca..c577f86d94f8 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2110,11 +2110,10 @@ struct drm_i915_gem_object {
>   	 * requests on one ring where the write request is older than the
>   	 * read request. This allows for the CPU to read from an active
>   	 * buffer by only waiting for the write to complete.
> -	 * */
> -	struct drm_i915_gem_request *last_read_req[I915_NUM_RINGS];
> -	struct drm_i915_gem_request *last_write_req;
> -	/** Breadcrumb of last fenced GPU access to the buffer. */
> -	struct drm_i915_gem_request *last_fenced_req;
> +	 */
> +	struct i915_gem_active last_read[I915_NUM_RINGS];
> +	struct i915_gem_active last_write;
> +	struct i915_gem_active last_fence;
>
>   	/** Current tiling stride for the object, if it's tiled. */
>   	uint32_t stride;
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index b0230e7151ce..77c253ddf060 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1117,23 +1117,23 @@ i915_gem_object_wait_rendering(struct drm_i915_gem_object *obj,
>   		return 0;
>
>   	if (readonly) {
> -		if (obj->last_write_req != NULL) {
> -			ret = i915_wait_request(obj->last_write_req);
> +		if (obj->last_write.request != NULL) {
> +			ret = i915_wait_request(obj->last_write.request);
>   			if (ret)
>   				return ret;
>
> -			i = obj->last_write_req->engine->id;
> -			if (obj->last_read_req[i] == obj->last_write_req)
> +			i = obj->last_write.request->engine->id;
> +			if (obj->last_read[i].request == obj->last_write.request)
>   				i915_gem_object_retire__read(obj, i);
>   			else
>   				i915_gem_object_retire__write(obj);
>   		}
>   	} else {
>   		for (i = 0; i < I915_NUM_RINGS; i++) {
> -			if (obj->last_read_req[i] == NULL)
> +			if (obj->last_read[i].request == NULL)
>   				continue;
>
> -			ret = i915_wait_request(obj->last_read_req[i]);
> +			ret = i915_wait_request(obj->last_read[i].request);
>   			if (ret)
>   				return ret;
>
> @@ -1151,9 +1151,9 @@ i915_gem_object_retire_request(struct drm_i915_gem_object *obj,
>   {
>   	int ring = req->engine->id;
>
> -	if (obj->last_read_req[ring] == req)
> +	if (obj->last_read[ring].request == req)
>   		i915_gem_object_retire__read(obj, ring);
> -	else if (obj->last_write_req == req)
> +	else if (obj->last_write.request == req)
>   		i915_gem_object_retire__write(obj);
>
>   	i915_gem_request_retire_upto(req);
> @@ -1181,7 +1181,7 @@ i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
>   	if (readonly) {
>   		struct drm_i915_gem_request *req;
>
> -		req = obj->last_write_req;
> +		req = obj->last_write.request;
>   		if (req == NULL)
>   			return 0;
>
> @@ -1190,7 +1190,7 @@ i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
>   		for (i = 0; i < I915_NUM_RINGS; i++) {
>   			struct drm_i915_gem_request *req;
>
> -			req = obj->last_read_req[i];
> +			req = obj->last_read[i].request;
>   			if (req == NULL)
>   				continue;
>
> @@ -2070,7 +2070,7 @@ void i915_vma_move_to_active(struct i915_vma *vma,
>   	obj->active |= intel_engine_flag(engine);
>
>   	list_move_tail(&obj->ring_list[engine->id], &engine->active_list);
> -	i915_gem_request_assign(&obj->last_read_req[engine->id], req);
> +	i915_gem_request_mark_active(req, &obj->last_read[engine->id]);
>
>   	list_move_tail(&vma->mm_list, &vma->vm->active_list);
>   }
> @@ -2078,10 +2078,10 @@ void i915_vma_move_to_active(struct i915_vma *vma,
>   static void
>   i915_gem_object_retire__write(struct drm_i915_gem_object *obj)
>   {
> -	GEM_BUG_ON(obj->last_write_req == NULL);
> -	GEM_BUG_ON(!(obj->active & intel_engine_flag(obj->last_write_req->engine)));
> +	GEM_BUG_ON(obj->last_write.request == NULL);
> +	GEM_BUG_ON(!(obj->active & intel_engine_flag(obj->last_write.request->engine)));
>
> -	i915_gem_request_assign(&obj->last_write_req, NULL);
> +	i915_gem_request_assign(&obj->last_write.request, NULL);
>   	intel_fb_obj_flush(obj, true, ORIGIN_CS);
>   }
>
> @@ -2090,13 +2090,13 @@ i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
>   {
>   	struct i915_vma *vma;
>
> -	GEM_BUG_ON(obj->last_read_req[ring] == NULL);
> +	GEM_BUG_ON(obj->last_read[ring].request == NULL);
>   	GEM_BUG_ON(!(obj->active & (1 << ring)));
>
>   	list_del_init(&obj->ring_list[ring]);
> -	i915_gem_request_assign(&obj->last_read_req[ring], NULL);
> +	i915_gem_request_assign(&obj->last_read[ring].request, NULL);
>
> -	if (obj->last_write_req && obj->last_write_req->engine->id == ring)
> +	if (obj->last_write.request && obj->last_write.request->engine->id == ring)
>   		i915_gem_object_retire__write(obj);
>
>   	obj->active &= ~(1 << ring);
> @@ -2115,7 +2115,7 @@ i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
>   			list_move_tail(&vma->mm_list, &vma->vm->inactive_list);
>   	}
>
> -	i915_gem_request_assign(&obj->last_fenced_req, NULL);
> +	i915_gem_request_assign(&obj->last_fence.request, NULL);
>   	drm_gem_object_unreference(&obj->base);
>   }
>
> @@ -2336,7 +2336,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
>   				      struct drm_i915_gem_object,
>   				      ring_list[ring->id]);
>
> -		if (!list_empty(&obj->last_read_req[ring->id]->list))
> +		if (!list_empty(&obj->last_read[ring->id].request->list))
>   			break;
>
>   		i915_gem_object_retire__read(obj, ring->id);
> @@ -2445,7 +2445,7 @@ i915_gem_object_flush_active(struct drm_i915_gem_object *obj)
>   	for (i = 0; i < I915_NUM_RINGS; i++) {
>   		struct drm_i915_gem_request *req;
>
> -		req = obj->last_read_req[i];
> +		req = obj->last_read[i].request;
>   		if (req == NULL)
>   			continue;
>
> @@ -2525,10 +2525,10 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>   	drm_gem_object_unreference(&obj->base);
>
>   	for (i = 0; i < I915_NUM_RINGS; i++) {
> -		if (obj->last_read_req[i] == NULL)
> +		if (obj->last_read[i].request == NULL)
>   			continue;
>
> -		req[n++] = i915_gem_request_get(obj->last_read_req[i]);
> +		req[n++] = i915_gem_request_get(obj->last_read[i].request);
>   	}
>
>   	mutex_unlock(&dev->struct_mutex);
> @@ -2619,12 +2619,12 @@ i915_gem_object_sync(struct drm_i915_gem_object *obj,
>
>   	n = 0;
>   	if (readonly) {
> -		if (obj->last_write_req)
> -			req[n++] = obj->last_write_req;
> +		if (obj->last_write.request)
> +			req[n++] = obj->last_write.request;
>   	} else {
>   		for (i = 0; i < I915_NUM_RINGS; i++)
> -			if (obj->last_read_req[i])
> -				req[n++] = obj->last_read_req[i];
> +			if (obj->last_read[i].request)
> +				req[n++] = obj->last_read[i].request;
>   	}
>   	for (i = 0; i < n; i++) {
>   		ret = __i915_gem_object_sync(obj, to, req[i]);
> @@ -3695,8 +3695,8 @@ i915_gem_busy_ioctl(struct drm_device *dev, void *data,
>
>   	BUILD_BUG_ON(I915_NUM_RINGS > 16);
>   	args->busy = obj->active << 16;
> -	if (obj->last_write_req)
> -		args->busy |= obj->last_write_req->engine->id;
> +	if (obj->last_write.request)
> +		args->busy |= obj->last_write.request->engine->id;
>
>   unref:
>   	drm_gem_object_unreference(&obj->base);
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 6dee27224ddb..56d6b5dbb121 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -1125,7 +1125,7 @@ i915_gem_execbuffer_move_to_active(struct list_head *vmas,
>
>   		i915_vma_move_to_active(vma, req);
>   		if (obj->base.write_domain) {
> -			i915_gem_request_assign(&obj->last_write_req, req);
> +			i915_gem_request_mark_active(req, &obj->last_write);
>
>   			intel_fb_obj_invalidate(obj, ORIGIN_CS);
>
> @@ -1133,7 +1133,7 @@ i915_gem_execbuffer_move_to_active(struct list_head *vmas,
>   			obj->base.write_domain &= ~I915_GEM_GPU_DOMAINS;
>   		}
>   		if (entry->flags & EXEC_OBJECT_NEEDS_FENCE) {
> -			i915_gem_request_assign(&obj->last_fenced_req, req);
> +			i915_gem_request_mark_active(req, &obj->last_fence);
>   			if (entry->flags & __EXEC_OBJECT_HAS_FENCE) {
>   				struct drm_i915_private *dev_priv = req->i915;
>   				list_move_tail(&dev_priv->fence_regs[obj->fence_reg].lru_list,
> diff --git a/drivers/gpu/drm/i915/i915_gem_fence.c b/drivers/gpu/drm/i915/i915_gem_fence.c
> index 598198543dcd..ab29c237ffa9 100644
> --- a/drivers/gpu/drm/i915/i915_gem_fence.c
> +++ b/drivers/gpu/drm/i915/i915_gem_fence.c
> @@ -261,12 +261,12 @@ static inline void i915_gem_object_fence_lost(struct drm_i915_gem_object *obj)
>   static int
>   i915_gem_object_wait_fence(struct drm_i915_gem_object *obj)
>   {
> -	if (obj->last_fenced_req) {
> -		int ret = i915_wait_request(obj->last_fenced_req);
> +	if (obj->last_fence.request) {
> +		int ret = i915_wait_request(obj->last_fence.request);
>   		if (ret)
>   			return ret;
>
> -		i915_gem_request_assign(&obj->last_fenced_req, NULL);
> +		i915_gem_request_assign(&obj->last_fence.request, NULL);
>   	}
>
>   	return 0;
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
> index 2da9e0b5dfc7..0a21986c332b 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.h
> +++ b/drivers/gpu/drm/i915/i915_gem_request.h
> @@ -208,4 +208,42 @@ static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
>   				 req->fence.seqno);
>   }
>
> +/* We treat requests as fences. This is not to be confused with our
> + * "fence registers" but pipeline synchronisation objects ala GL_ARB_sync.
> + * We use the fences to synchronize access from the CPU with activity on the
> + * GPU, for example, we should not rewrite an object's PTE whilst the GPU
> + * is reading them. We also track fences at a higher level to provide
> + * implicit synchronisation around GEM objects, e.g. set-domain will wait
> + * for outstanding GPU rendering before marking the object ready for CPU
> + * access, or a pageflip will wait until the GPU is complete before showing
> + * the frame on the scanout.
> + *
> + * In order to use a fence, the object must track the fence it needs to
> + * serialise with. For example, GEM objects want to track both read and
> + * write access so that we can perform concurrent read operations between
> + * the CPU and GPU engines, as well as waiting for all rendering to
> + * complete, or waiting for the last GPU user of a "fence register". The
> + * object then embeds a @i915_gem_active to track the most recent (in
> + * retirement order) request relevant for the desired mode of access.
> + * The @i915_gem_active is updated with i915_gem_request_mark_active() to
> + * track the most recent fence request; typically this is done as part of
> + * i915_vma_move_to_active().
> + *
> + * When the @i915_gem_active completes (is retired), it will
> + * signal its completion to the owner through a callback as well as mark
> + * itself as idle (i915_gem_active.request == NULL). The owner
> + * can then perform any action, such as delayed freeing of an active
> + * resource including itself.
> + */
> +struct i915_gem_active {
> +	struct drm_i915_gem_request *request;
> +};
> +
> +static inline void
> +i915_gem_request_mark_active(struct drm_i915_gem_request *request,
> +			     struct i915_gem_active *active)
> +{
> +	i915_gem_request_assign(&active->request, request);
> +}

This function name bothers me since I think the name is misleading and 
unintuitive. It is not marking a request as active but associating it 
with the second data structure.

Maybe i915_gem_request_move_to_active to keep the mental association 
with the well-established vma_move_to_active?

Maybe struct i915_gem_active could also be better called 
i915_gem_active_list?

It is not ideal because of the future little reversal of who is in who's 
list, so maybe there is something even better. But I think an intuitive 
name is really important for code clarity and maintainability.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: [PATCH 021/190] drm/i915: Use HWS for seqno tracking everywhere
  2016-01-11  9:16 ` [PATCH 021/190] drm/i915: Use HWS for seqno tracking everywhere Chris Wilson
@ 2016-01-11 20:03   ` Dave Gordon
  2016-01-12 10:05   ` Mika Kuoppala
  1 sibling, 0 replies; 263+ messages in thread
From: Dave Gordon @ 2016-01-11 20:03 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 11/01/16 09:16, Chris Wilson wrote:
> By using the same address for storing the HWS on every platform, we can
> remove the platform specific vfuncs and reduce the get-seqno routine to
> a single read of a cached memory location.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_debugfs.c      | 10 ++--
>   drivers/gpu/drm/i915/i915_drv.h          |  4 +-
>   drivers/gpu/drm/i915/i915_gpu_error.c    |  2 +-
>   drivers/gpu/drm/i915/i915_irq.c          |  4 +-
>   drivers/gpu/drm/i915/i915_trace.h        |  2 +-
>   drivers/gpu/drm/i915/intel_breadcrumbs.c |  4 +-
>   drivers/gpu/drm/i915/intel_lrc.c         | 46 ++---------------
>   drivers/gpu/drm/i915/intel_ringbuffer.c  | 86 ++++++++------------------------
>   drivers/gpu/drm/i915/intel_ringbuffer.h  |  7 +--
>   9 files changed, 43 insertions(+), 122 deletions(-)

Generally I like this, except the changes to pc_render_add_request() (as 
previously noted) ...
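
(For reference, the "single read of a cached memory location" that the
commit message promises comes down to something along these lines -- a
sketch built on the existing status-page helper, not necessarily the
exact definition in the patch:

static inline u32 intel_ring_get_seqno(struct intel_engine_cs *ring)
{
	return intel_read_status_page(ring, I915_GEM_HWS_INDEX);
}

which is what lets all the per-platform get_seqno vfuncs disappear.)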

> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 57ec21c5b1ab..c86d0e17d785 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1216,19 +1216,17 @@ static int gen8_rcs_signal(struct drm_i915_gem_request *signaller_req,
>   		return ret;
>
>   	for_each_ring(waiter, dev_priv, i) {
> -		u32 seqno;
>   		u64 gtt_offset = signaller->semaphore.signal_ggtt[i];
>   		if (gtt_offset == MI_SEMAPHORE_SYNC_INVALID)
>   			continue;
>
> -		seqno = i915_gem_request_get_seqno(signaller_req);
>   		intel_ring_emit(signaller, GFX_OP_PIPE_CONTROL(6));
>   		intel_ring_emit(signaller, PIPE_CONTROL_GLOBAL_GTT_IVB |
>   					   PIPE_CONTROL_QW_WRITE |
>   					   PIPE_CONTROL_FLUSH_ENABLE);
>   		intel_ring_emit(signaller, lower_32_bits(gtt_offset));
>   		intel_ring_emit(signaller, upper_32_bits(gtt_offset));
> -		intel_ring_emit(signaller, seqno);
> +		intel_ring_emit(signaller, signaller_req->seqno);
>   		intel_ring_emit(signaller, 0);
>   		intel_ring_emit(signaller, MI_SEMAPHORE_SIGNAL |
>   					   MI_SEMAPHORE_TARGET(waiter->id));
> @@ -1257,18 +1255,16 @@ static int gen8_xcs_signal(struct drm_i915_gem_request *signaller_req,
>   		return ret;
>
>   	for_each_ring(waiter, dev_priv, i) {
> -		u32 seqno;
>   		u64 gtt_offset = signaller->semaphore.signal_ggtt[i];
>   		if (gtt_offset == MI_SEMAPHORE_SYNC_INVALID)
>   			continue;
>
> -		seqno = i915_gem_request_get_seqno(signaller_req);
>   		intel_ring_emit(signaller, (MI_FLUSH_DW + 1) |
>   					   MI_FLUSH_DW_OP_STOREDW);
>   		intel_ring_emit(signaller, lower_32_bits(gtt_offset) |
>   					   MI_FLUSH_DW_USE_GTT);
>   		intel_ring_emit(signaller, upper_32_bits(gtt_offset));
> -		intel_ring_emit(signaller, seqno);
> +		intel_ring_emit(signaller, signaller_req->seqno);
>   		intel_ring_emit(signaller, MI_SEMAPHORE_SIGNAL |
>   					   MI_SEMAPHORE_TARGET(waiter->id));
>   		intel_ring_emit(signaller, 0);
> @@ -1299,11 +1295,9 @@ static int gen6_signal(struct drm_i915_gem_request *signaller_req,
>   		i915_reg_t mbox_reg = signaller->semaphore.mbox.signal[i];
>
>   		if (i915_mmio_reg_valid(mbox_reg)) {
> -			u32 seqno = i915_gem_request_get_seqno(signaller_req);
> -
>   			intel_ring_emit(signaller, MI_LOAD_REGISTER_IMM(1));
>   			intel_ring_emit_reg(signaller, mbox_reg);
> -			intel_ring_emit(signaller, seqno);
> +			intel_ring_emit(signaller, signaller_req->seqno);
>   		}
>   	}
>
> @@ -1338,7 +1332,7 @@ gen6_add_request(struct drm_i915_gem_request *req)
>
>   	intel_ring_emit(ring, MI_STORE_DWORD_INDEX);
>   	intel_ring_emit(ring, I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT);
> -	intel_ring_emit(ring, i915_gem_request_get_seqno(req));
> +	intel_ring_emit(ring, req->seqno);
>   	intel_ring_emit(ring, MI_USER_INTERRUPT);
>   	__intel_ring_advance(ring);
>
> @@ -1440,7 +1434,9 @@ static int
>   pc_render_add_request(struct drm_i915_gem_request *req)
>   {
>   	struct intel_engine_cs *ring = req->ring;
> -	u32 scratch_addr = ring->scratch.gtt_offset + 2 * CACHELINE_BYTES;
> +	u32 addr = req->ring->status_page.gfx_addr +
> +		(I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT);
> +	u32 scratch_addr = addr;
>   	int ret;

What I don't like is that this will now splat dummy writes across the 
HWSP. I'd rather the HWSP were kept for communicating /useful/ messages 
from the GPU to the CPU, and the scratch page used for dummy writes (as 
it previously was).

If you really needed to switch those writes from scratch to HWSP -- 
maybe so that the scratch page could be eliminated -- you could #define 
six separate offsets within the status page to be used for dummy writes, 
and use those values in the emit() sequence below. But that really 
wouldn't be pretty :( And since AFAIK we're not planning to get rid of 
the scratch page, we should continue to use it for /all/ dummy writes.
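
To be concrete about how that alternative would look (offsets purely
hypothetical, cacheline-spaced like the current scratch writes):

/* Hypothetical reserved dummy-write slots in the status page, one per
 * PIPE_CONTROL_FLUSH in pc_render_add_request(); dword indices made up
 * for illustration, spaced 2 cachelines apart as the code does today.
 */
#define I915_GEM_HWS_DUMMY0	0x20
#define I915_GEM_HWS_DUMMY1	0x40
/* ... through I915_GEM_HWS_DUMMY5 at 0xc0 ... */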

.Dave.

>   	/* For Ironlake, MI_USER_INTERRUPT was deprecated and apparently
> @@ -1455,11 +1451,12 @@ pc_render_add_request(struct drm_i915_gem_request *req)
>   	if (ret)
>   		return ret;
>
> -	intel_ring_emit(ring, GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE |
> -			PIPE_CONTROL_WRITE_FLUSH |
> -			PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE);
> -	intel_ring_emit(ring, ring->scratch.gtt_offset | PIPE_CONTROL_GLOBAL_GTT);
> -	intel_ring_emit(ring, i915_gem_request_get_seqno(req));
> +	intel_ring_emit(ring,
> +			GFX_OP_PIPE_CONTROL(4) |
> +			PIPE_CONTROL_QW_WRITE |
> +			PIPE_CONTROL_WRITE_FLUSH);
> +	intel_ring_emit(ring, addr | PIPE_CONTROL_GLOBAL_GTT);
> +	intel_ring_emit(ring, req->seqno);
>   	intel_ring_emit(ring, 0);
>   	PIPE_CONTROL_FLUSH(ring, scratch_addr);
>   	scratch_addr += 2 * CACHELINE_BYTES; /* write to separate cachelines */
> @@ -1473,12 +1470,12 @@ pc_render_add_request(struct drm_i915_gem_request *req)
>   	scratch_addr += 2 * CACHELINE_BYTES;
>   	PIPE_CONTROL_FLUSH(ring, scratch_addr);
>
> -	intel_ring_emit(ring, GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE |
> +	intel_ring_emit(ring, GFX_OP_PIPE_CONTROL(4) |
> +			PIPE_CONTROL_QW_WRITE |
>   			PIPE_CONTROL_WRITE_FLUSH |
> -			PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE |
>   			PIPE_CONTROL_NOTIFY);
> -	intel_ring_emit(ring, ring->scratch.gtt_offset | PIPE_CONTROL_GLOBAL_GTT);
> -	intel_ring_emit(ring, i915_gem_request_get_seqno(req));
> +	intel_ring_emit(ring, addr | PIPE_CONTROL_GLOBAL_GTT);
> +	intel_ring_emit(ring, req->seqno);
>   	intel_ring_emit(ring, 0);
>   	__intel_ring_advance(ring);

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: [PATCH 073/190] drm/i915: Introduce i915_gem_active for request tracking
  2016-01-11 17:32   ` Tvrtko Ursulin
@ 2016-01-11 22:49     ` Chris Wilson
  2016-01-12 10:04       ` Tvrtko Ursulin
  0 siblings, 1 reply; 263+ messages in thread
From: Chris Wilson @ 2016-01-11 22:49 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Mon, Jan 11, 2016 at 05:32:27PM +0000, Tvrtko Ursulin wrote:
> >+struct i915_gem_active {
> >+	struct drm_i915_gem_request *request;
> >+};
> >+
> >+static inline void
> >+i915_gem_request_mark_active(struct drm_i915_gem_request *request,
> >+			     struct i915_gem_active *active)
> >+{
> >+	i915_gem_request_assign(&active->request, request);
> >+}
> 
> This function name bothers me since I think the name is misleading
> and unintuitive. It is not marking a request as active but
> associating it with the second data structure.
> 
> Maybe i915_gem_request_move_to_active to keep the mental association
> with the well-established vma_move_to_active?

That's backwards imo, since it is the i915_gem_active that gets added to
the request. (Or at least will be.)
 
> Maybe struct i915_gem_active could also be better called
> i915_gem_active_list?

It's not a list but a node. I started with drm_i915_gem_request_node,
but that's too unwieldy and, I felt, even more confusing.

> It is not ideal because of the future little reversal of who is in
> whose list, so maybe there is something even better. But I think an
> intuitive name is really important for code clarity and
> maintainability.

In userspace, I have the request (which is actually a userspace fence
itself) containing a list of fences (that are identical to i915_gem_active,
they track which request contains the reference and a callback for
signalling), and those fences have a direct correspondence to
ARB_sync_objects, for example. But we already have plenty of conflict
regarding the term fence, so that's no go.

i915_gem_active, for me, made the association with the active-reference
tracking that is ingrained into the objects and beyond. I quite like the
connection with GPU activity.

i915_gem_retire_notifier? Hmm, I still like how
i915_gem_active.request != NULL is quite self-descriptive.
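
i.e. the usage pattern condensed from this patch reads quite naturally:

	struct i915_gem_active last_write; /* embedded in the object */

	/* on submission */
	i915_gem_request_mark_active(req, &obj->last_write);

	/* and after retirement the idle check is simply */
	bool busy = obj->last_write.request != NULL;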
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: [PATCH 073/190] drm/i915: Introduce i915_gem_active for request tracking
  2016-01-11 22:49     ` Chris Wilson
@ 2016-01-12 10:04       ` Tvrtko Ursulin
  2016-01-12 11:01         ` Chris Wilson
  0 siblings, 1 reply; 263+ messages in thread
From: Tvrtko Ursulin @ 2016-01-12 10:04 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 11/01/16 22:49, Chris Wilson wrote:
> On Mon, Jan 11, 2016 at 05:32:27PM +0000, Tvrtko Ursulin wrote:
>>> +struct i915_gem_active {
>>> +	struct drm_i915_gem_request *request;
>>> +};
>>> +
>>> +static inline void
>>> +i915_gem_request_mark_active(struct drm_i915_gem_request *request,
>>> +			     struct i915_gem_active *active)
>>> +{
>>> +	i915_gem_request_assign(&active->request, request);
>>> +}
>>
>> This function name bothers me since I think the name is misleading
>> and unintuitive. It is not marking a request as active but
>> associating it with the second data structure.
>>
>> Maybe i915_gem_request_move_to_active to keep the mental association
>> with the well-established vma_move_to_active?
>
> That's backwards imo, since it is the i915_gem_active that gets added to
> the request. (Or at least will be.)
>
>> Maybe struct i915_gem_active could also be better called
>> i915_gem_active_list?
>
> It's not a list but a node. I started with drm_i915_gem_request_node,
> but that's too unwieldy and, I felt, even more confusing.
>
>> It is not ideal because of the future little reversal of who is in
>> whose list, so maybe there is something even better. But I think an
>> intuitive name is really important for code clarity and
>> maintainability.
>
> In userspace, I have the request (which is actually a userspace fence
> itself) containing a list of fences (that are identical to i915_gem_active,
> they track which request contains the reference and a callback for
> signalling), and those fences have a direct correspondence to
> ARB_sync_objects, for example. But we already have plenty of conflict
> regarding the term fence, so that's no go.
>
> i915_gem_active, for me, made the association with the active-reference
> tracking that is ingrained into the objects and beyond. I quite like the
> connection with GPU activity.
>
> i915_gem_retire_notifier? Hmm, I still like how
> i915_gem_active.request != NULL is quite self-descriptive.

Yes, the last bit is neat.

Perhaps then leave the structure name as is and just rename the function 
to i915_gem_request_assign_active? I think that better describes what is
actually happening.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: [PATCH 021/190] drm/i915: Use HWS for seqno tracking everywhere
  2016-01-11  9:16 ` [PATCH 021/190] drm/i915: Use HWS for seqno tracking everywhere Chris Wilson
  2016-01-11 20:03   ` Dave Gordon
@ 2016-01-12 10:05   ` Mika Kuoppala
  2016-01-12 11:03     ` Chris Wilson
  1 sibling, 1 reply; 263+ messages in thread
From: Mika Kuoppala @ 2016-01-12 10:05 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> By using the same address for storing the HWS on every platform, we can
> remove the platform specific vfuncs and reduce the get-seqno routine to
> a single read of a cached memory location.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c      | 10 ++--
>  drivers/gpu/drm/i915/i915_drv.h          |  4 +-
>  drivers/gpu/drm/i915/i915_gpu_error.c    |  2 +-
>  drivers/gpu/drm/i915/i915_irq.c          |  4 +-
>  drivers/gpu/drm/i915/i915_trace.h        |  2 +-
>  drivers/gpu/drm/i915/intel_breadcrumbs.c |  4 +-
>  drivers/gpu/drm/i915/intel_lrc.c         | 46 ++---------------
>  drivers/gpu/drm/i915/intel_ringbuffer.c  | 86 ++++++++------------------------
>  drivers/gpu/drm/i915/intel_ringbuffer.h  |  7 +--
>  9 files changed, 43 insertions(+), 122 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index d09e48455dcb..5a706c700684 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -600,7 +600,7 @@ static int i915_gem_pageflip_info(struct seq_file *m, void *data)
>  					   ring->name,
>  					   i915_gem_request_get_seqno(work->flip_queued_req),
>  					   dev_priv->next_seqno,
> -					   ring->get_seqno(ring),
> +					   intel_ring_get_seqno(ring),
>  					   i915_gem_request_completed(work->flip_queued_req));
>  			} else
>  				seq_printf(m, "Flip not associated with any ring\n");
> @@ -732,10 +732,8 @@ static void i915_ring_seqno_info(struct seq_file *m,
>  {
>  	struct rb_node *rb;
>  
> -	if (ring->get_seqno) {
> -		seq_printf(m, "Current sequence (%s): %x\n",
> -			   ring->name, ring->get_seqno(ring));
> -	}
> +	seq_printf(m, "Current sequence (%s): %x\n",
> +		   ring->name, intel_ring_get_seqno(ring));
>  
>  	spin_lock(&ring->breadcrumbs.lock);
>  	for (rb = rb_first(&ring->breadcrumbs.waiters);
> @@ -1355,7 +1353,7 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
>  
>  	for_each_ring(ring, dev_priv, i) {
>  		acthd[i] = intel_ring_get_active_head(ring);
> -		seqno[i] = ring->get_seqno(ring);
> +		seqno[i] = intel_ring_get_seqno(ring);
>  	}
>  
>  	i915_get_extra_instdone(dev, instdone);
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 44d46018ee13..fcedcbc50834 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2971,13 +2971,13 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2)
>  
>  static inline bool i915_gem_request_started(struct drm_i915_gem_request *req)
>  {
> -	return i915_seqno_passed(req->ring->get_seqno(req->ring),
> +	return i915_seqno_passed(intel_ring_get_seqno(req->ring),
>  				 req->previous_seqno);
>  }
>  
>  static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
>  {
> -	return i915_seqno_passed(req->ring->get_seqno(req->ring),
> +	return i915_seqno_passed(intel_ring_get_seqno(req->ring),
>  				 req->seqno);
>  }
>  
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 01d0206ca4dd..3e137fc701cf 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -903,7 +903,7 @@ static void i915_record_ring_state(struct drm_device *dev,
>  	ering->waiting = intel_engine_has_waiter(ring);
>  	ering->instpm = I915_READ(RING_INSTPM(ring->mmio_base));
>  	ering->acthd = intel_ring_get_active_head(ring);
> -	ering->seqno = ring->get_seqno(ring);
> +	ering->seqno = intel_ring_get_seqno(ring);
>  	ering->start = I915_READ_START(ring);
>  	ering->head = I915_READ_HEAD(ring);
>  	ering->tail = I915_READ_TAIL(ring);
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index d73669783045..627c7fb6aa9b 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -2903,7 +2903,7 @@ static int semaphore_passed(struct intel_engine_cs *ring)
>  	if (signaller->hangcheck.deadlock >= I915_NUM_RINGS)
>  		return -1;
>  
> -	if (i915_seqno_passed(signaller->get_seqno(signaller), seqno))
> +	if (i915_seqno_passed(intel_ring_get_seqno(signaller), seqno))
>  		return 1;
>  
>  	/* cursory check for an unkickable deadlock */
> @@ -3068,7 +3068,7 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
>  		semaphore_clear_deadlocks(dev_priv);
>  
>  		acthd = intel_ring_get_active_head(ring);
> -		seqno = ring->get_seqno(ring);
> +		seqno = intel_ring_get_seqno(ring);
>  
>  		if (ring->hangcheck.seqno == seqno) {
>  			if (ring_idle(ring, seqno)) {
> diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
> index cfb5f78a6e84..efca75bcace3 100644
> --- a/drivers/gpu/drm/i915/i915_trace.h
> +++ b/drivers/gpu/drm/i915/i915_trace.h
> @@ -573,7 +573,7 @@ TRACE_EVENT(i915_gem_request_notify,
>  	    TP_fast_assign(
>  			   __entry->dev = ring->dev->primary->index;
>  			   __entry->ring = ring->id;
> -			   __entry->seqno = ring->get_seqno(ring);
> +			   __entry->seqno = intel_ring_get_seqno(ring);
>  			   ),
>  
>  	    TP_printk("dev=%u, ring=%u, seqno=%u",
> diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> index 10b0add54acf..f66acf820c40 100644
> --- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
> +++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> @@ -127,7 +127,7 @@ bool intel_engine_add_wait(struct intel_engine_cs *engine,
>  			   struct intel_wait *wait)
>  {
>  	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> -	u32 seqno = engine->get_seqno(engine);
> +	u32 seqno = intel_ring_get_seqno(engine);
>  	struct rb_node **p, *parent, *completed;
>  	bool first;
>  
> @@ -269,7 +269,7 @@ void intel_engine_remove_wait(struct intel_engine_cs *engine,
>  			 * the first_waiter. This is undesirable if that
>  			 * waiter is a high priority task.
>  			 */
> -			u32 seqno = engine->get_seqno(engine);
> +			u32 seqno = intel_ring_get_seqno(engine);
>  			while (i915_seqno_passed(seqno,
>  						 to_wait(next)->seqno)) {
>  				struct rb_node *n = rb_next(next);
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 333e95bda78a..ad51b1fc37cd 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1775,16 +1775,6 @@ static int gen8_emit_flush_render(struct drm_i915_gem_request *request,
>  	return 0;
>  }
>  
> -static u32 gen8_get_seqno(struct intel_engine_cs *ring)
> -{
> -	return intel_read_status_page(ring, I915_GEM_HWS_INDEX);
> -}
> -
> -static void gen8_set_seqno(struct intel_engine_cs *ring, u32 seqno)
> -{
> -	intel_write_status_page(ring, I915_GEM_HWS_INDEX, seqno);
> -}
> -
>  static void bxt_seqno_barrier(struct intel_engine_cs *ring)
>  {
>  	/*
> @@ -1800,14 +1790,6 @@ static void bxt_seqno_barrier(struct intel_engine_cs *ring)
>  	intel_flush_status_page(ring, I915_GEM_HWS_INDEX);
>  }
>  
> -static void bxt_a_set_seqno(struct intel_engine_cs *ring, u32 seqno)
> -{
> -	intel_write_status_page(ring, I915_GEM_HWS_INDEX, seqno);
> -
> -	/* See bxt_a_get_seqno() explaining the reason for the clflush. */
> -	intel_flush_status_page(ring, I915_GEM_HWS_INDEX);
> -}
> -
>  static int gen8_emit_request(struct drm_i915_gem_request *request)
>  {
>  	struct intel_ringbuffer *ringbuf = request->ringbuf;
> @@ -1832,7 +1814,7 @@ static int gen8_emit_request(struct drm_i915_gem_request *request)
>  				(ring->status_page.gfx_addr +
>  				(I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT)));
>  	intel_logical_ring_emit(ringbuf, 0);
> -	intel_logical_ring_emit(ringbuf, i915_gem_request_get_seqno(request));
> +	intel_logical_ring_emit(ringbuf, request->seqno);
>  	intel_logical_ring_emit(ringbuf, MI_USER_INTERRUPT);
>  	intel_logical_ring_emit(ringbuf, MI_NOOP);
>  	intel_logical_ring_advance_and_submit(request);
> @@ -2002,12 +1984,8 @@ static int logical_render_ring_init(struct drm_device *dev)
>  		ring->init_hw = gen8_init_render_ring;
>  	ring->init_context = gen8_init_rcs_context;
>  	ring->cleanup = intel_fini_pipe_control;
> -	ring->get_seqno = gen8_get_seqno;
> -	ring->set_seqno = gen8_set_seqno;
> -	if (IS_BXT_REVID(dev, 0, BXT_REVID_A1)) {
> +	if (IS_BXT_REVID(dev, 0, BXT_REVID_A1))
>  		ring->irq_seqno_barrier = bxt_seqno_barrier;
> -		ring->set_seqno = bxt_a_set_seqno;
> -	}
>  	ring->emit_request = gen8_emit_request;
>  	ring->emit_flush = gen8_emit_flush_render;
>  	ring->irq_get = gen8_logical_ring_get_irq;
> @@ -2053,12 +2031,8 @@ static int logical_bsd_ring_init(struct drm_device *dev)
>  		GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS1_IRQ_SHIFT;
>  
>  	ring->init_hw = gen8_init_common_ring;
> -	ring->get_seqno = gen8_get_seqno;
> -	ring->set_seqno = gen8_set_seqno;
> -	if (IS_BXT_REVID(dev, 0, BXT_REVID_A1)) {
> +	if (IS_BXT_REVID(dev, 0, BXT_REVID_A1))
>  		ring->irq_seqno_barrier = bxt_seqno_barrier;
> -		ring->set_seqno = bxt_a_set_seqno;
> -	}
>  	ring->emit_request = gen8_emit_request;
>  	ring->emit_flush = gen8_emit_flush;
>  	ring->irq_get = gen8_logical_ring_get_irq;
> @@ -2082,8 +2056,6 @@ static int logical_bsd2_ring_init(struct drm_device *dev)
>  		GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS2_IRQ_SHIFT;
>  
>  	ring->init_hw = gen8_init_common_ring;
> -	ring->get_seqno = gen8_get_seqno;
> -	ring->set_seqno = gen8_set_seqno;
>  	ring->emit_request = gen8_emit_request;
>  	ring->emit_flush = gen8_emit_flush;
>  	ring->irq_get = gen8_logical_ring_get_irq;
> @@ -2107,12 +2079,8 @@ static int logical_blt_ring_init(struct drm_device *dev)
>  		GT_CONTEXT_SWITCH_INTERRUPT << GEN8_BCS_IRQ_SHIFT;
>  
>  	ring->init_hw = gen8_init_common_ring;
> -	ring->get_seqno = gen8_get_seqno;
> -	ring->set_seqno = gen8_set_seqno;
> -	if (IS_BXT_REVID(dev, 0, BXT_REVID_A1)) {
> +	if (IS_BXT_REVID(dev, 0, BXT_REVID_A1))
>  		ring->irq_seqno_barrier = bxt_seqno_barrier;
> -		ring->set_seqno = bxt_a_set_seqno;
> -	}
>  	ring->emit_request = gen8_emit_request;
>  	ring->emit_flush = gen8_emit_flush;
>  	ring->irq_get = gen8_logical_ring_get_irq;
> @@ -2136,12 +2104,8 @@ static int logical_vebox_ring_init(struct drm_device *dev)
>  		GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VECS_IRQ_SHIFT;
>  
>  	ring->init_hw = gen8_init_common_ring;
> -	ring->get_seqno = gen8_get_seqno;
> -	ring->set_seqno = gen8_set_seqno;
> -	if (IS_BXT_REVID(dev, 0, BXT_REVID_A1)) {
> +	if (IS_BXT_REVID(dev, 0, BXT_REVID_A1))
>  		ring->irq_seqno_barrier = bxt_seqno_barrier;
> -		ring->set_seqno = bxt_a_set_seqno;
> -	}
>  	ring->emit_request = gen8_emit_request;
>  	ring->emit_flush = gen8_emit_flush;
>  	ring->irq_get = gen8_logical_ring_get_irq;
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 57ec21c5b1ab..c86d0e17d785 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1216,19 +1216,17 @@ static int gen8_rcs_signal(struct drm_i915_gem_request *signaller_req,
>  		return ret;
>  
>  	for_each_ring(waiter, dev_priv, i) {
> -		u32 seqno;
>  		u64 gtt_offset = signaller->semaphore.signal_ggtt[i];
>  		if (gtt_offset == MI_SEMAPHORE_SYNC_INVALID)
>  			continue;
>  
> -		seqno = i915_gem_request_get_seqno(signaller_req);
>  		intel_ring_emit(signaller, GFX_OP_PIPE_CONTROL(6));
>  		intel_ring_emit(signaller, PIPE_CONTROL_GLOBAL_GTT_IVB |
>  					   PIPE_CONTROL_QW_WRITE |
>  					   PIPE_CONTROL_FLUSH_ENABLE);
>  		intel_ring_emit(signaller, lower_32_bits(gtt_offset));
>  		intel_ring_emit(signaller, upper_32_bits(gtt_offset));
> -		intel_ring_emit(signaller, seqno);
> +		intel_ring_emit(signaller, signaller_req->seqno);
>  		intel_ring_emit(signaller, 0);
>  		intel_ring_emit(signaller, MI_SEMAPHORE_SIGNAL |
>  					   MI_SEMAPHORE_TARGET(waiter->id));
> @@ -1257,18 +1255,16 @@ static int gen8_xcs_signal(struct drm_i915_gem_request *signaller_req,
>  		return ret;
>  
>  	for_each_ring(waiter, dev_priv, i) {
> -		u32 seqno;
>  		u64 gtt_offset = signaller->semaphore.signal_ggtt[i];
>  		if (gtt_offset == MI_SEMAPHORE_SYNC_INVALID)
>  			continue;
>  
> -		seqno = i915_gem_request_get_seqno(signaller_req);
>  		intel_ring_emit(signaller, (MI_FLUSH_DW + 1) |
>  					   MI_FLUSH_DW_OP_STOREDW);
>  		intel_ring_emit(signaller, lower_32_bits(gtt_offset) |
>  					   MI_FLUSH_DW_USE_GTT);
>  		intel_ring_emit(signaller, upper_32_bits(gtt_offset));
> -		intel_ring_emit(signaller, seqno);
> +		intel_ring_emit(signaller, signaller_req->seqno);
>  		intel_ring_emit(signaller, MI_SEMAPHORE_SIGNAL |
>  					   MI_SEMAPHORE_TARGET(waiter->id));
>  		intel_ring_emit(signaller, 0);
> @@ -1299,11 +1295,9 @@ static int gen6_signal(struct drm_i915_gem_request *signaller_req,
>  		i915_reg_t mbox_reg = signaller->semaphore.mbox.signal[i];
>  
>  		if (i915_mmio_reg_valid(mbox_reg)) {
> -			u32 seqno = i915_gem_request_get_seqno(signaller_req);
> -
>  			intel_ring_emit(signaller, MI_LOAD_REGISTER_IMM(1));
>  			intel_ring_emit_reg(signaller, mbox_reg);
> -			intel_ring_emit(signaller, seqno);
> +			intel_ring_emit(signaller, signaller_req->seqno);
>  		}
>  	}
>  
> @@ -1338,7 +1332,7 @@ gen6_add_request(struct drm_i915_gem_request *req)
>  
>  	intel_ring_emit(ring, MI_STORE_DWORD_INDEX);
>  	intel_ring_emit(ring, I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT);
> -	intel_ring_emit(ring, i915_gem_request_get_seqno(req));
> +	intel_ring_emit(ring, req->seqno);
>  	intel_ring_emit(ring, MI_USER_INTERRUPT);
>  	__intel_ring_advance(ring);
>  
> @@ -1440,7 +1434,9 @@ static int
>  pc_render_add_request(struct drm_i915_gem_request *req)
>  {
>  	struct intel_engine_cs *ring = req->ring;
> -	u32 scratch_addr = ring->scratch.gtt_offset + 2 * CACHELINE_BYTES;
> +	u32 addr = req->ring->status_page.gfx_addr +
> +		(I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT);
> +	u32 scratch_addr = addr;
>  	int ret;
>  
>  	/* For Ironlake, MI_USER_INTERRUPT was deprecated and apparently
> @@ -1455,11 +1451,12 @@ pc_render_add_request(struct drm_i915_gem_request *req)
>  	if (ret)
>  		return ret;
>  
> -	intel_ring_emit(ring, GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE |
> -			PIPE_CONTROL_WRITE_FLUSH |
> -			PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE);
> -	intel_ring_emit(ring, ring->scratch.gtt_offset | PIPE_CONTROL_GLOBAL_GTT);
> -	intel_ring_emit(ring, i915_gem_request_get_seqno(req));
> +	intel_ring_emit(ring,
> +			GFX_OP_PIPE_CONTROL(4) |
> +			PIPE_CONTROL_QW_WRITE |
> +			PIPE_CONTROL_WRITE_FLUSH);

Why no more PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE?
-Mika

> +	intel_ring_emit(ring, addr | PIPE_CONTROL_GLOBAL_GTT);
> +	intel_ring_emit(ring, req->seqno);
>  	intel_ring_emit(ring, 0);
>  	PIPE_CONTROL_FLUSH(ring, scratch_addr);
>  	scratch_addr += 2 * CACHELINE_BYTES; /* write to separate cachelines */
> @@ -1473,12 +1470,12 @@ pc_render_add_request(struct drm_i915_gem_request *req)
>  	scratch_addr += 2 * CACHELINE_BYTES;
>  	PIPE_CONTROL_FLUSH(ring, scratch_addr);
>  
> -	intel_ring_emit(ring, GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE |
> +	intel_ring_emit(ring, GFX_OP_PIPE_CONTROL(4) |
> +			PIPE_CONTROL_QW_WRITE |
>  			PIPE_CONTROL_WRITE_FLUSH |
> -			PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE |
>  			PIPE_CONTROL_NOTIFY);
> -	intel_ring_emit(ring, ring->scratch.gtt_offset | PIPE_CONTROL_GLOBAL_GTT);
> -	intel_ring_emit(ring, i915_gem_request_get_seqno(req));
> +	intel_ring_emit(ring, addr | PIPE_CONTROL_GLOBAL_GTT);
> +	intel_ring_emit(ring, req->seqno);
>  	intel_ring_emit(ring, 0);
>  	__intel_ring_advance(ring);
>  
> @@ -1506,30 +1503,6 @@ gen6_seqno_barrier(struct intel_engine_cs *ring)
>  	intel_flush_status_page(ring, I915_GEM_HWS_INDEX);
>  }
>  
> -static u32
> -ring_get_seqno(struct intel_engine_cs *ring)
> -{
> -	return intel_read_status_page(ring, I915_GEM_HWS_INDEX);
> -}
> -
> -static void
> -ring_set_seqno(struct intel_engine_cs *ring, u32 seqno)
> -{
> -	intel_write_status_page(ring, I915_GEM_HWS_INDEX, seqno);
> -}
> -
> -static u32
> -pc_render_get_seqno(struct intel_engine_cs *ring)
> -{
> -	return ring->scratch.cpu_page[0];
> -}
> -
> -static void
> -pc_render_set_seqno(struct intel_engine_cs *ring, u32 seqno)
> -{
> -	ring->scratch.cpu_page[0] = seqno;
> -}
> -
>  static bool
>  gen5_ring_get_irq(struct intel_engine_cs *ring)
>  {
> @@ -1665,7 +1638,7 @@ i9xx_add_request(struct drm_i915_gem_request *req)
>  
>  	intel_ring_emit(ring, MI_STORE_DWORD_INDEX);
>  	intel_ring_emit(ring, I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT);
> -	intel_ring_emit(ring, i915_gem_request_get_seqno(req));
> +	intel_ring_emit(ring, req->seqno);
>  	intel_ring_emit(ring, MI_USER_INTERRUPT);
>  	__intel_ring_advance(ring);
>  
> @@ -2457,7 +2430,10 @@ void intel_ring_init_seqno(struct intel_engine_cs *ring, u32 seqno)
>  			I915_WRITE(RING_SYNC_2(ring->mmio_base), 0);
>  	}
>  
> -	ring->set_seqno(ring, seqno);
> +	intel_write_status_page(ring, I915_GEM_HWS_INDEX, seqno);
> +	if (ring->irq_seqno_barrier)
> +		ring->irq_seqno_barrier(ring);
> +
>  	ring->hangcheck.seqno = seqno;
>  }
>  
> @@ -2695,8 +2671,6 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
>  		ring->irq_put = gen8_ring_put_irq;
>  		ring->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
>  		ring->irq_seqno_barrier = gen6_seqno_barrier;
> -		ring->get_seqno = ring_get_seqno;
> -		ring->set_seqno = ring_set_seqno;
>  		if (i915_semaphore_is_enabled(dev)) {
>  			WARN_ON(!dev_priv->semaphore_obj);
>  			ring->semaphore.sync_to = gen8_ring_sync;
> @@ -2713,8 +2687,6 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
>  		ring->irq_put = gen6_ring_put_irq;
>  		ring->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
>  		ring->irq_seqno_barrier = gen6_seqno_barrier;
> -		ring->get_seqno = ring_get_seqno;
> -		ring->set_seqno = ring_set_seqno;
>  		if (i915_semaphore_is_enabled(dev)) {
>  			ring->semaphore.sync_to = gen6_ring_sync;
>  			ring->semaphore.signal = gen6_signal;
> @@ -2739,8 +2711,6 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
>  	} else if (IS_GEN5(dev)) {
>  		ring->add_request = pc_render_add_request;
>  		ring->flush = gen4_render_ring_flush;
> -		ring->get_seqno = pc_render_get_seqno;
> -		ring->set_seqno = pc_render_set_seqno;
>  		ring->irq_get = gen5_ring_get_irq;
>  		ring->irq_put = gen5_ring_put_irq;
>  		ring->irq_enable_mask = GT_RENDER_USER_INTERRUPT |
> @@ -2751,8 +2721,6 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
>  			ring->flush = gen2_render_ring_flush;
>  		else
>  			ring->flush = gen4_render_ring_flush;
> -		ring->get_seqno = ring_get_seqno;
> -		ring->set_seqno = ring_set_seqno;
>  		if (IS_GEN2(dev)) {
>  			ring->irq_get = i8xx_ring_get_irq;
>  			ring->irq_put = i8xx_ring_put_irq;
> @@ -2828,8 +2796,6 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev)
>  		ring->flush = gen6_bsd_ring_flush;
>  		ring->add_request = gen6_add_request;
>  		ring->irq_seqno_barrier = gen6_seqno_barrier;
> -		ring->get_seqno = ring_get_seqno;
> -		ring->set_seqno = ring_set_seqno;
>  		if (INTEL_INFO(dev)->gen >= 8) {
>  			ring->irq_enable_mask =
>  				GT_RENDER_USER_INTERRUPT << GEN8_VCS1_IRQ_SHIFT;
> @@ -2867,8 +2833,6 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev)
>  		ring->mmio_base = BSD_RING_BASE;
>  		ring->flush = bsd_ring_flush;
>  		ring->add_request = i9xx_add_request;
> -		ring->get_seqno = ring_get_seqno;
> -		ring->set_seqno = ring_set_seqno;
>  		if (IS_GEN5(dev)) {
>  			ring->irq_enable_mask = ILK_BSD_USER_INTERRUPT;
>  			ring->irq_get = gen5_ring_get_irq;
> @@ -2901,8 +2865,6 @@ int intel_init_bsd2_ring_buffer(struct drm_device *dev)
>  	ring->flush = gen6_bsd_ring_flush;
>  	ring->add_request = gen6_add_request;
>  	ring->irq_seqno_barrier = gen6_seqno_barrier;
> -	ring->get_seqno = ring_get_seqno;
> -	ring->set_seqno = ring_set_seqno;
>  	ring->irq_enable_mask =
>  			GT_RENDER_USER_INTERRUPT << GEN8_VCS2_IRQ_SHIFT;
>  	ring->irq_get = gen8_ring_get_irq;
> @@ -2932,8 +2894,6 @@ int intel_init_blt_ring_buffer(struct drm_device *dev)
>  	ring->flush = gen6_ring_flush;
>  	ring->add_request = gen6_add_request;
>  	ring->irq_seqno_barrier = gen6_seqno_barrier;
> -	ring->get_seqno = ring_get_seqno;
> -	ring->set_seqno = ring_set_seqno;
>  	if (INTEL_INFO(dev)->gen >= 8) {
>  		ring->irq_enable_mask =
>  			GT_RENDER_USER_INTERRUPT << GEN8_BCS_IRQ_SHIFT;
> @@ -2990,8 +2950,6 @@ int intel_init_vebox_ring_buffer(struct drm_device *dev)
>  	ring->flush = gen6_ring_flush;
>  	ring->add_request = gen6_add_request;
>  	ring->irq_seqno_barrier = gen6_seqno_barrier;
> -	ring->get_seqno = ring_get_seqno;
> -	ring->set_seqno = ring_set_seqno;
>  
>  	if (INTEL_INFO(dev)->gen >= 8) {
>  		ring->irq_enable_mask =
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 3b49726b1732..28ab07b38c05 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -220,9 +220,6 @@ struct  intel_engine_cs {
>  	 * monotonic, even if not coherent.
>  	 */
>  	void		(*irq_seqno_barrier)(struct intel_engine_cs *ring);
> -	u32		(*get_seqno)(struct intel_engine_cs *ring);
> -	void		(*set_seqno)(struct intel_engine_cs *ring,
> -				     u32 seqno);
>  	int		(*dispatch_execbuffer)(struct drm_i915_gem_request *req,
>  					       u64 offset, u32 length,
>  					       unsigned dispatch_flags);
> @@ -502,6 +499,10 @@ int intel_init_blt_ring_buffer(struct drm_device *dev);
>  int intel_init_vebox_ring_buffer(struct drm_device *dev);
>  
>  u64 intel_ring_get_active_head(struct intel_engine_cs *ring);
> +static inline u32 intel_ring_get_seqno(struct intel_engine_cs *ring)
> +{
> +	return intel_read_status_page(ring, I915_GEM_HWS_INDEX);
> +}
>  
>  int init_workarounds_ring(struct intel_engine_cs *ring);
>  
> -- 
> 2.7.0.rc3
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 263+ messages in thread
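
A minimal standalone sketch of the pattern the patch above converges on:
the seqno read becomes a plain status-page load, and any coherency
workaround is an explicit barrier applied at the callsite. The types
below are simplified stand-ins for the i915 structures, kept
self-contained for illustration; they are not the driver's actual
definitions, and the HWS index value is assumed.

	#include <stdint.h>

	#define I915_GEM_HWS_INDEX 0x30	/* value assumed for the sketch */

	struct intel_engine_cs {
		/* Hardware status page, written by the GPU. */
		volatile uint32_t *status_page;
		/* Optional platform workaround (e.g. bxt_seqno_barrier). */
		void (*irq_seqno_barrier)(struct intel_engine_cs *engine);
	};

	/* After the patch, reading the seqno is always a plain HWS read. */
	static inline uint32_t intel_ring_get_seqno(struct intel_engine_cs *engine)
	{
		return engine->status_page[I915_GEM_HWS_INDEX];
	}

	/* Callsites that need coherency apply the barrier explicitly,
	 * rather than hiding it inside a per-engine get_seqno() vfunc. */
	static uint32_t coherent_seqno(struct intel_engine_cs *engine)
	{
		if (engine->irq_seqno_barrier)
			engine->irq_seqno_barrier(engine);
		return intel_ring_get_seqno(engine);
	}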

* Re: [PATCH 020/190] drm/i915: Remove the lazy_coherency parameter from request-completed?
  2016-01-11  9:16 ` [PATCH 020/190] drm/i915: Remove the lazy_coherency parameter from request-completed? Chris Wilson
  2016-01-11 15:45   ` Dave Gordon
@ 2016-01-12 10:27   ` Mika Kuoppala
  2016-01-12 10:51     ` Chris Wilson
  1 sibling, 1 reply; 263+ messages in thread
From: Mika Kuoppala @ 2016-01-12 10:27 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> Now that we have split out the seqno-barrier from the
> engine->get_seqno() callback itself, we can move the users of the
> seqno-barrier to the required callsites simplifying the common code and
> making the required workaround handling much more explicit.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c  |  4 ++--
>  drivers/gpu/drm/i915/i915_drv.h      | 17 ++++++++---------
>  drivers/gpu/drm/i915/i915_gem.c      | 24 ++++++++++++++++--------
>  drivers/gpu/drm/i915/intel_display.c |  2 +-
>  drivers/gpu/drm/i915/intel_pm.c      |  4 ++--
>  5 files changed, 29 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 1499e2337e5d..d09e48455dcb 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -601,7 +601,7 @@ static int i915_gem_pageflip_info(struct seq_file *m, void *data)
>  					   i915_gem_request_get_seqno(work->flip_queued_req),
>  					   dev_priv->next_seqno,
>  					   ring->get_seqno(ring),
> -					   i915_gem_request_completed(work->flip_queued_req, true));
> +					   i915_gem_request_completed(work->flip_queued_req));
>  			} else
>  				seq_printf(m, "Flip not associated with any ring\n");
>  			seq_printf(m, "Flip queued on frame %d, (was ready on frame %d), now %d\n",
> @@ -1354,8 +1354,8 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
>  	intel_runtime_pm_get(dev_priv);
>  
>  	for_each_ring(ring, dev_priv, i) {
> -		seqno[i] = ring->get_seqno(ring);
>  		acthd[i] = intel_ring_get_active_head(ring);
> +		seqno[i] = ring->get_seqno(ring);

Oh! Perhaps I am overly optimistic, but did you just show how to solve
the 'hangcheck elapsed <random> ring idle..' coherency issue
in hangcheck?

I would like to have a separate patch that does this ordering
in here and also in i915_hangcheck_elapsed, to be on the safe
side, regardless.

Or at minimum, copy the acthd-read-then-seqno-read ordering
into i915_hangcheck_elapsed as well.

-Mika


>  	}
>  
>  	i915_get_extra_instdone(dev, instdone);
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 9762aa76bb0a..44d46018ee13 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2969,20 +2969,14 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2)
>  	return (int32_t)(seq1 - seq2) >= 0;
>  }
>  
> -static inline bool i915_gem_request_started(struct drm_i915_gem_request *req,
> -					   bool lazy_coherency)
> +static inline bool i915_gem_request_started(struct drm_i915_gem_request *req)
>  {
> -	if (!lazy_coherency && req->ring->irq_seqno_barrier)
> -		req->ring->irq_seqno_barrier(req->ring);
>  	return i915_seqno_passed(req->ring->get_seqno(req->ring),
>  				 req->previous_seqno);
>  }
>  
> -static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
> -					      bool lazy_coherency)
> +static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
>  {
> -	if (!lazy_coherency && req->ring->irq_seqno_barrier)
> -		req->ring->irq_seqno_barrier(req->ring);
>  	return i915_seqno_passed(req->ring->get_seqno(req->ring),
>  				 req->seqno);
>  }
> @@ -3636,6 +3630,8 @@ static inline void i915_trace_irq_get(struct intel_engine_cs *ring,
>  
>  static inline bool __i915_request_irq_complete(struct drm_i915_gem_request *req)
>  {
> +	struct intel_engine_cs *engine = req->ring;
> +
>  	/* Ensure our read of the seqno is coherent so that we
>  	 * do not "miss an interrupt" (i.e. if this is the last
>  	 * request and the seqno write from the GPU is not visible
> @@ -3647,7 +3643,10 @@ static inline bool __i915_request_irq_complete(struct drm_i915_gem_request *req)
>  	 * but it is easier and safer to do it every time the waiter
>  	 * is woken.
>  	 */
> -	if (i915_gem_request_completed(req, false))
> +	if (engine->irq_seqno_barrier)
> +		engine->irq_seqno_barrier(engine);
> +
> +	if (i915_gem_request_completed(req))
>  		return true;
>  
>  	/* We need to check whether any gpu reset happened in between
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 4b26529f1f44..d125820c6309 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1171,12 +1171,12 @@ static bool __i915_spin_request(struct drm_i915_gem_request *req,
>  	 */
>  
>  	/* Only spin if we know the GPU is processing this request */
> -	if (!i915_gem_request_started(req, true))
> +	if (!i915_gem_request_started(req))
>  		return false;
>  
>  	timeout = local_clock_us(&cpu) + 5;
>  	do {
> -		if (i915_gem_request_completed(req, true))
> +		if (i915_gem_request_completed(req))
>  			return true;
>  
>  		if (signal_pending_state(state, wait->task))
> @@ -1228,7 +1228,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>  	if (list_empty(&req->list))
>  		return 0;
>  
> -	if (i915_gem_request_completed(req, true))
> +	if (i915_gem_request_completed(req))
>  		return 0;
>  
>  	timeout_remain = MAX_SCHEDULE_TIMEOUT;
> @@ -2724,8 +2724,16 @@ i915_gem_find_active_request(struct intel_engine_cs *ring)
>  {
>  	struct drm_i915_gem_request *request;
>  
> +	/* We are called by the error capture and reset at a random
> +	 * point in time. In particular, note that neither is crucially
> +	 * ordered with an interrupt. After a hang, the GPU is dead and we
> +	 * assume that no more writes can happen (we waited long enough for
> +	 * all writes that were in transaction to be flushed) - adding an
> +	 * extra delay for a recent interrupt is pointless. Hence, we do
> +	 * not need an engine->irq_seqno_barrier() before the seqno reads.
> +	 */
>  	list_for_each_entry(request, &ring->request_list, list) {
> -		if (i915_gem_request_completed(request, false))
> +		if (i915_gem_request_completed(request))
>  			continue;
>  
>  		return request;
> @@ -2859,7 +2867,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
>  					   struct drm_i915_gem_request,
>  					   list);
>  
> -		if (!i915_gem_request_completed(request, true))
> +		if (!i915_gem_request_completed(request))
>  			break;
>  
>  		i915_gem_request_retire(request);
> @@ -2883,7 +2891,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
>  	}
>  
>  	if (unlikely(ring->trace_irq_req &&
> -		     i915_gem_request_completed(ring->trace_irq_req, true))) {
> +		     i915_gem_request_completed(ring->trace_irq_req))) {
>  		ring->irq_put(ring);
>  		i915_gem_request_assign(&ring->trace_irq_req, NULL);
>  	}
> @@ -2995,7 +3003,7 @@ i915_gem_object_flush_active(struct drm_i915_gem_object *obj)
>  		if (list_empty(&req->list))
>  			goto retire;
>  
> -		if (i915_gem_request_completed(req, true)) {
> +		if (i915_gem_request_completed(req)) {
>  			__i915_gem_request_retire__upto(req);
>  retire:
>  			i915_gem_object_retire__read(obj, i);
> @@ -3104,7 +3112,7 @@ __i915_gem_object_sync(struct drm_i915_gem_object *obj,
>  	if (to == from)
>  		return 0;
>  
> -	if (i915_gem_request_completed(from_req, true))
> +	if (i915_gem_request_completed(from_req))
>  		return 0;
>  
>  	if (!i915_semaphore_is_enabled(obj->base.dev)) {
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 7e36f85d3109..de4d4a0d923a 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -11523,7 +11523,7 @@ static bool __intel_pageflip_stall_check(struct drm_device *dev,
>  
>  	if (work->flip_ready_vblank == 0) {
>  		if (work->flip_queued_req &&
> -		    !i915_gem_request_completed(work->flip_queued_req, true))
> +		    !i915_gem_request_completed(work->flip_queued_req))
>  			return false;
>  
>  		work->flip_ready_vblank = drm_crtc_vblank_count(crtc);
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index 9df9e9a22f3c..401c3770057d 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -7286,7 +7286,7 @@ static void __intel_rps_boost_work(struct work_struct *work)
>  	struct request_boost *boost = container_of(work, struct request_boost, work);
>  	struct drm_i915_gem_request *req = boost->req;
>  
> -	if (!i915_gem_request_completed(req, true))
> +	if (!i915_gem_request_completed(req))
>  		gen6_rps_boost(to_i915(req->ring->dev), NULL,
>  			       req->emitted_jiffies);
>  
> @@ -7302,7 +7302,7 @@ void intel_queue_rps_boost_for_request(struct drm_device *dev,
>  	if (req == NULL || INTEL_INFO(dev)->gen < 6)
>  		return;
>  
> -	if (i915_gem_request_completed(req, true))
> +	if (i915_gem_request_completed(req))
>  		return;
>  
>  	boost = kmalloc(sizeof(*boost), GFP_ATOMIC);
> -- 
> 2.7.0.rc3
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: [PATCH 020/190] drm/i915: Remove the lazy_coherency parameter from request-completed?
  2016-01-12 10:27   ` Mika Kuoppala
@ 2016-01-12 10:51     ` Chris Wilson
  0 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-12 10:51 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: intel-gfx

On Tue, Jan 12, 2016 at 12:27:05PM +0200, Mika Kuoppala wrote:
> >  	for_each_ring(ring, dev_priv, i) {
> > -		seqno[i] = ring->get_seqno(ring);
> >  		acthd[i] = intel_ring_get_active_head(ring);
> > +		seqno[i] = ring->get_seqno(ring);
> 
> Oh! Perhaps I am overly optimistic but did you just show how to solve
> the 'hangcheck elapsed <random> ring idle..' coherency issue
> in hangcheck?

No. There are two causes: one is that, in the interrupt bottom-half, we
genuinely inspect the seqno before it is written, and the other is
indeed the race you keep mentioning between hangcheck looking at the
waiters and a waiter setting itself up.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 263+ messages in thread
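
For reference, a sketch of the sampling order under discussion, with
the i915 hangcheck loop reduced to stand-in declarations. This is
illustrative only; as the reply above notes, the ordering by itself
does not close the waiter-setup race.

	#include <stdint.h>

	struct intel_engine_cs;		/* opaque stand-in */

	extern uint64_t intel_ring_get_active_head(struct intel_engine_cs *engine);
	extern uint32_t intel_ring_get_seqno(struct intel_engine_cs *engine);

	static void sample_engine(struct intel_engine_cs *engine,
				  uint64_t *acthd, uint32_t *seqno)
	{
		/* Sample ACTHD first, then the seqno, so the seqno is at
		 * least as recent as the ACTHD value it is compared
		 * against. As noted above, this alone does not fix the
		 * race with a waiter setting itself up.
		 */
		*acthd = intel_ring_get_active_head(engine);
		*seqno = intel_ring_get_seqno(engine);
	}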

* Re: [PATCH 073/190] drm/i915: Introduce i915_gem_active for request tracking
  2016-01-12 10:04       ` Tvrtko Ursulin
@ 2016-01-12 11:01         ` Chris Wilson
  2016-01-12 13:42           ` Tvrtko Ursulin
  2016-01-12 13:44           ` Tvrtko Ursulin
  0 siblings, 2 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-12 11:01 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Tue, Jan 12, 2016 at 10:04:20AM +0000, Tvrtko Ursulin wrote:
> Perhaps then leave the structure name as is and just rename the
> function to i915_gem_request_assign_active? I think that describes
> better what is actually happening.

i915_gem_request_update_active()?

request_assign_active() says to me that it is the request we are acting
on, and that it can have only one active entity. "update" could go either way.

i915_gem_active_add_to_request() is the full version I guess, or just
i915_gem_active_set().

i915_gem_request_mark_active() -> i915_gem_active_set()
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: [PATCH 021/190] drm/i915: Use HWS for seqno tracking everywhere
  2016-01-12 10:05   ` Mika Kuoppala
@ 2016-01-12 11:03     ` Chris Wilson
  2016-01-12 14:30       ` Mika Kuoppala
  0 siblings, 1 reply; 263+ messages in thread
From: Chris Wilson @ 2016-01-12 11:03 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: intel-gfx

On Tue, Jan 12, 2016 at 12:05:06PM +0200, Mika Kuoppala wrote:
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> > -	intel_ring_emit(ring, GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE |
> > -			PIPE_CONTROL_WRITE_FLUSH |
> > -			PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE);
> > -	intel_ring_emit(ring, ring->scratch.gtt_offset | PIPE_CONTROL_GLOBAL_GTT);
> > -	intel_ring_emit(ring, i915_gem_request_get_seqno(req));
> > +	intel_ring_emit(ring,
> > +			GFX_OP_PIPE_CONTROL(4) |
> > +			PIPE_CONTROL_QW_WRITE |
> > +			PIPE_CONTROL_WRITE_FLUSH);
> 
> Why no more PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE?

I opened vim to add it back in and I couldn't bring myself to commit
that atrocity.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: [PATCH 145/190] drm/i915: Stop discarding GTT cache-domain on unbind vma
  2016-01-11 11:00   ` [PATCH 145/190] drm/i915: Stop discarding GTT cache-domain on unbind vma Chris Wilson
@ 2016-01-12 13:22     ` Joonas Lahtinen
  0 siblings, 0 replies; 263+ messages in thread
From: Joonas Lahtinen @ 2016-01-12 13:22 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Akash Goel

On ma, 2016-01-11 at 11:00 +0000, Chris Wilson wrote:
> Since
> 
> commit 43566dedde54f9729113f5f9fde77d53e75e61e9
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Fri Jan 2 16:29:29 2015 +0530
> 
>     drm/i915: Broaden application of set-domain(GTT)
> 
> we allowed objects to be in the GTT domain, but unbound. Therefore
> removing the GTT cache domain when removing the GGTT vma is no longer
> semantically correct.
> 
> An unfortunate side-effect is we lose the wondrously named
> i915_gem_object_finish_gtt(), not to be confused with
> i915_gem_gtt_finish_object()!
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Akash Goel <akash.goel@intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

I'm fairly sure I did this already in the past, but here goes again...

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 26 +++-----------------------
>  1 file changed, 3 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 6ceed074f738..08287d8857c9 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2618,27 +2618,6 @@ i915_gem_object_sync(struct drm_i915_gem_object *obj,
>  	return 0;
>  }
>  
> -static void i915_gem_object_finish_gtt(struct drm_i915_gem_object *obj)
> -{
> -	u32 old_write_domain, old_read_domains;
> -
> -	/* Force a pagefault for domain tracking on next user access */
> -	i915_gem_release_mmap(obj);
> -
> -	if ((obj->base.read_domains & I915_GEM_DOMAIN_GTT) == 0)
> -		return;
> -
> -	old_read_domains = obj->base.read_domains;
> -	old_write_domain = obj->base.write_domain;
> -
> -	obj->base.read_domains &= ~I915_GEM_DOMAIN_GTT;
> -	obj->base.write_domain &= ~I915_GEM_DOMAIN_GTT;
> -
> -	trace_i915_gem_object_change_domain(obj,
> -					    old_read_domains,
> -					    old_write_domain);
> -}
> -
>  static void i915_vma_destroy(struct i915_vma *vma)
>  {
>  	GEM_BUG_ON(vma->node.allocated);
> @@ -2691,13 +2670,14 @@ int i915_vma_unbind(struct i915_vma *vma)
>  	GEM_BUG_ON(obj->pages == NULL);
>  
>  	if (vma->map_and_fenceable) {
> -		i915_gem_object_finish_gtt(obj);
> -
>  		/* release the fence reg _after_ flushing */
>  		ret = i915_vma_put_fence(vma);
>  		if (ret)
>  			return ret;
>  
> +		/* Force a pagefault for domain tracking on next user access */
> +		i915_gem_release_mmap(obj);
> +
>  		if (vma->iomap) {
>  			iounmap(vma->iomap);
>  			vma->iomap = NULL;
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: [PATCH 073/190] drm/i915: Introduce i915_gem_active for request tracking
  2016-01-12 11:01         ` Chris Wilson
@ 2016-01-12 13:42           ` Tvrtko Ursulin
  2016-01-12 13:44           ` Tvrtko Ursulin
  1 sibling, 0 replies; 263+ messages in thread
From: Tvrtko Ursulin @ 2016-01-12 13:42 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx



On 12/01/16 11:01, Chris Wilson wrote:
> On Tue, Jan 12, 2016 at 10:04:20AM +0000, Tvrtko Ursulin wrote:
>> Perhaps then leave the structure name as is and just rename the
>> function to i915_gem_request_assign_active? I think that describes
>> better what is actually happening.
>
> i915_gem_request_update_active()?
>
> request_assign_active() says to me that it is the request we are acting
> on and it can have only one active entity. "update" could go either way.
>
> i915_gem_active_add_to_request() is the full version I guess, or just
> i915_gem_active_set().

i915_gem_active_add_to_request sounds good, but we would need to
reorder the parameters then, I think.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: [PATCH 073/190] drm/i915: Introduce i915_gem_active for request tracking
  2016-01-12 11:01         ` Chris Wilson
  2016-01-12 13:42           ` Tvrtko Ursulin
@ 2016-01-12 13:44           ` Tvrtko Ursulin
  2016-01-12 14:08             ` Chris Wilson
  1 sibling, 1 reply; 263+ messages in thread
From: Tvrtko Ursulin @ 2016-01-12 13:44 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 12/01/16 11:01, Chris Wilson wrote:
> On Tue, Jan 12, 2016 at 10:04:20AM +0000, Tvrtko Ursulin wrote:
>> Perhaps then leave the structure name as is and just rename the
>> function to i915_gem_request_assign_active? I think that describes
>> better what is actually happening.
>
> i915_gem_request_update_active()?
>
> request_assign_active() says to me that it is the request we are acting
> on and it can have only one active entity. "update" could go either way.
>
> i915_gem_active_add_to_request() is the full version I guess, or just
> i915_gem_active_set().
>
> i915_gem_request_mark_active() -> i915_gem_active_set()

Sorry, or the short version might be good enough, and better in the code
since it's shorter. I just think the parameters need to be re-ordered.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 263+ messages in thread
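
A hypothetical sketch of the short form the thread converges on, with
the parameters ordered so that the tracker being acted on comes first,
per the remark above. The struct layout here is a guess made for
illustration, not the series' actual definition.

	struct drm_i915_gem_request;	/* opaque stand-in */

	/* Guessed minimal layout: the tracker records the request that
	 * last used the object on a given engine. */
	struct i915_gem_active {
		struct drm_i915_gem_request *request;
	};

	/* The active tracker is the object being acted on, so it takes
	 * the first parameter slot. */
	static inline void
	i915_gem_active_set(struct i915_gem_active *active,
			    struct drm_i915_gem_request *request)
	{
		active->request = request;
	}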

* Re: [PATCH 074/190] drm/i915: Rename request->list to link for consistency
  2016-01-11  9:17 ` [PATCH 074/190] drm/i915: Rename request->list to link for consistency Chris Wilson
@ 2016-01-12 13:47   ` Tvrtko Ursulin
  0 siblings, 0 replies; 263+ messages in thread
From: Tvrtko Ursulin @ 2016-01-12 13:47 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 11/01/16 09:17, Chris Wilson wrote:
> We use "list" to denote the list and "link" to denote an element on that
> list. Rename request->list to match this idiom.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_debugfs.c     |  4 ++--
>   drivers/gpu/drm/i915/i915_gem.c         | 12 ++++++------
>   drivers/gpu/drm/i915/i915_gem_request.c | 10 +++++-----
>   drivers/gpu/drm/i915/i915_gem_request.h |  4 ++--
>   drivers/gpu/drm/i915/i915_gpu_error.c   |  4 ++--
>   drivers/gpu/drm/i915/intel_ringbuffer.c |  6 +++---
>   6 files changed, 20 insertions(+), 20 deletions(-)

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko


> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 65cb1d6a5d64..efa9572fc217 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -695,13 +695,13 @@ static int i915_gem_request_info(struct seq_file *m, void *data)
>   		int count;
>
>   		count = 0;
> -		list_for_each_entry(req, &ring->request_list, list)
> +		list_for_each_entry(req, &ring->request_list, link)
>   			count++;
>   		if (count == 0)
>   			continue;
>
>   		seq_printf(m, "%s requests: %d\n", ring->name, count);
> -		list_for_each_entry(req, &ring->request_list, list) {
> +		list_for_each_entry(req, &ring->request_list, link) {
>   			struct task_struct *task;
>
>   			rcu_read_lock();
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 77c253ddf060..f314b3ea2726 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2183,7 +2183,7 @@ i915_gem_find_active_request(struct intel_engine_cs *ring)
>   	 * extra delay for a recent interrupt is pointless. Hence, we do
>   	 * not need an engine->irq_seqno_barrier() before the seqno reads.
>   	 */
> -	list_for_each_entry(request, &ring->request_list, list) {
> +	list_for_each_entry(request, &ring->request_list, link) {
>   		if (i915_gem_request_completed(request))
>   			continue;
>
> @@ -2208,7 +2208,7 @@ static void i915_gem_reset_ring_status(struct intel_engine_cs *ring)
>
>   	i915_set_reset_status(dev_priv, request->ctx, ring_hung);
>
> -	list_for_each_entry_continue(request, &ring->request_list, list)
> +	list_for_each_entry_continue(request, &ring->request_list, link)
>   		i915_set_reset_status(dev_priv, request->ctx, false);
>   }
>
> @@ -2255,7 +2255,7 @@ static void i915_gem_reset_ring_cleanup(struct intel_engine_cs *engine)
>
>   		request = list_last_entry(&engine->request_list,
>   					  struct drm_i915_gem_request,
> -					  list);
> +					  link);
>
>   		i915_gem_request_retire_upto(request);
>   	}
> @@ -2317,7 +2317,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
>
>   		request = list_first_entry(&ring->request_list,
>   					   struct drm_i915_gem_request,
> -					   list);
> +					   link);
>
>   		if (!i915_gem_request_completed(request))
>   			break;
> @@ -2336,7 +2336,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
>   				      struct drm_i915_gem_object,
>   				      ring_list[ring->id]);
>
> -		if (!list_empty(&obj->last_read[ring->id].request->list))
> +		if (!list_empty(&obj->last_read[ring->id].request->link))
>   			break;
>
>   		i915_gem_object_retire__read(obj, ring->id);
> @@ -2449,7 +2449,7 @@ i915_gem_object_flush_active(struct drm_i915_gem_object *obj)
>   		if (req == NULL)
>   			continue;
>
> -		if (list_empty(&req->list))
> +		if (list_empty(&req->link))
>   			goto retire;
>
>   		if (i915_gem_request_completed(req)) {
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> index 01443d8d9224..7f38d8972721 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.c
> +++ b/drivers/gpu/drm/i915/i915_gem_request.c
> @@ -333,7 +333,7 @@ void i915_gem_request_cancel(struct drm_i915_gem_request *req)
>   static void i915_gem_request_retire(struct drm_i915_gem_request *request)
>   {
>   	trace_i915_gem_request_retire(request);
> -	list_del_init(&request->list);
> +	list_del_init(&request->link);
>
>   	/* We know the GPU must have read the request to have
>   	 * sent us the seqno + interrupt, so use the position
> @@ -355,12 +355,12 @@ i915_gem_request_retire_upto(struct drm_i915_gem_request *req)
>
>   	lockdep_assert_held(&engine->dev->struct_mutex);
>
> -	if (list_empty(&req->list))
> +	if (list_empty(&req->link))
>   		return;
>
>   	do {
>   		tmp = list_first_entry(&engine->request_list,
> -				       typeof(*tmp), list);
> +				       typeof(*tmp), link);
>
>   		i915_gem_request_retire(tmp);
>   	} while (tmp != req);
> @@ -451,7 +451,7 @@ void __i915_add_request(struct drm_i915_gem_request *request,
>   	request->emitted_jiffies = jiffies;
>   	request->previous_seqno = request->engine->last_submitted_seqno;
>   	request->engine->last_submitted_seqno = request->fence.seqno;
> -	list_add_tail(&request->list, &request->engine->request_list);
> +	list_add_tail(&request->link, &request->engine->request_list);
>
>   	trace_i915_gem_request_add(request);
>
> @@ -565,7 +565,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>
>   	might_sleep();
>
> -	if (list_empty(&req->list))
> +	if (list_empty(&req->link))
>   		return 0;
>
>   	if (i915_gem_request_completed(req))
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
> index 0a21986c332b..01d589be95fd 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.h
> +++ b/drivers/gpu/drm/i915/i915_gem_request.h
> @@ -88,8 +88,8 @@ struct drm_i915_gem_request {
>   	/** Time at which this request was emitted, in jiffies. */
>   	unsigned long emitted_jiffies;
>
> -	/** global list entry for this request */
> -	struct list_head list;
> +	/** engine->request_list entry for this request */
> +	struct list_head link;
>
>   	struct drm_i915_file_private *file_priv;
>   	/** file_priv list entry for this request */
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 5027636e3624..c812079bc25c 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -1056,7 +1056,7 @@ static void i915_gem_record_rings(struct drm_device *dev,
>   		i915_gem_record_active_context(engine, error, &error->ring[i]);
>
>   		count = 0;
> -		list_for_each_entry(request, &engine->request_list, list)
> +		list_for_each_entry(request, &engine->request_list, link)
>   			count++;
>
>   		error->ring[i].num_requests = count;
> @@ -1069,7 +1069,7 @@ static void i915_gem_record_rings(struct drm_device *dev,
>   		}
>
>   		count = 0;
> -		list_for_each_entry(request, &engine->request_list, list) {
> +		list_for_each_entry(request, &engine->request_list, link) {
>   			struct drm_i915_error_request *erq;
>
>   			if (count >= error->ring[i].num_requests) {
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index d37cdb2f9073..213540f92c9d 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -2109,7 +2109,7 @@ int intel_engine_idle(struct intel_engine_cs *ring)
>
>   	req = list_entry(ring->request_list.prev,
>   			struct drm_i915_gem_request,
> -			list);
> +			link);
>
>   	/* Make sure we do not trigger any retires */
>   	return __i915_wait_request(req,
> @@ -2184,7 +2184,7 @@ static int wait_for_space(struct drm_i915_gem_request *req, int bytes)
>   	/* The whole point of reserving space is to not wait! */
>   	WARN_ON(ring->reserved_in_use);
>
> -	list_for_each_entry(target, &engine->request_list, list) {
> +	list_for_each_entry(target, &engine->request_list, link) {
>   		/*
>   		 * The request queue is per-engine, so can contain requests
>   		 * from multiple ringbuffers. Here, we must ignore any that
> @@ -2200,7 +2200,7 @@ static int wait_for_space(struct drm_i915_gem_request *req, int bytes)
>   			break;
>   	}
>
> -	if (WARN_ON(&target->list == &engine->request_list))
> +	if (WARN_ON(&target->link == &engine->request_list))
>   		return -ENOSPC;
>
>   	ret = i915_wait_request(target);
>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: [PATCH 076/190] drm/i915: Rename vma->*_list to *_link for consistency
  2016-01-11  9:17 ` [PATCH 076/190] drm/i915: Rename vma->*_list to *_link for consistency Chris Wilson
@ 2016-01-12 13:49   ` Tvrtko Ursulin
  0 siblings, 0 replies; 263+ messages in thread
From: Tvrtko Ursulin @ 2016-01-12 13:49 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx



On 11/01/16 09:17, Chris Wilson wrote:
> Elsewhere we have adopted the convention of using '_link' to denote
> elements in the list (and '_list' for the actual list_head itself), and
> that the name should indicate which list the link belongs to (and
> preferably not just where the link is being stored)
>
> s/vma_link/obj_link/ (we iterate over obj->vma_list)
> s/mm_list/vm_link/ (we iterate over vm->[in]active_list)
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_debugfs.c      | 17 +++++------
>   drivers/gpu/drm/i915/i915_gem.c          | 50 ++++++++++++++++----------------
>   drivers/gpu/drm/i915/i915_gem_context.c  |  2 +-
>   drivers/gpu/drm/i915/i915_gem_evict.c    |  6 ++--
>   drivers/gpu/drm/i915/i915_gem_gtt.c      | 10 +++----
>   drivers/gpu/drm/i915/i915_gem_gtt.h      |  4 +--
>   drivers/gpu/drm/i915/i915_gem_shrinker.c |  4 +--
>   drivers/gpu/drm/i915/i915_gem_stolen.c   |  2 +-
>   drivers/gpu/drm/i915/i915_gem_userptr.c  |  2 +-
>   drivers/gpu/drm/i915/i915_gpu_error.c    |  8 ++---
>   10 files changed, 52 insertions(+), 53 deletions(-)

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index efa9572fc217..f311df758195 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -117,9 +117,8 @@ static u64 i915_gem_obj_total_ggtt_size(struct drm_i915_gem_object *obj)
>   	u64 size = 0;
>   	struct i915_vma *vma;
>
> -	list_for_each_entry(vma, &obj->vma_list, vma_link) {
> -		if (i915_is_ggtt(vma->vm) &&
> -		    drm_mm_node_allocated(&vma->node))
> +	list_for_each_entry(vma, &obj->vma_list, obj_link) {
> +		if (i915_is_ggtt(vma->vm) && drm_mm_node_allocated(&vma->node))
>   			size += vma->node.size;
>   	}
>
> @@ -155,7 +154,7 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
>   		   obj->madv == I915_MADV_DONTNEED ? " purgeable" : "");
>   	if (obj->base.name)
>   		seq_printf(m, " (name: %d)", obj->base.name);
> -	list_for_each_entry(vma, &obj->vma_list, vma_link) {
> +	list_for_each_entry(vma, &obj->vma_list, obj_link) {
>   		if (vma->pin_count > 0)
>   			pin_count++;
>   	}
> @@ -164,7 +163,7 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
>   		seq_printf(m, " (display)");
>   	if (obj->fence_reg != I915_FENCE_REG_NONE)
>   		seq_printf(m, " (fence: %d)", obj->fence_reg);
> -	list_for_each_entry(vma, &obj->vma_list, vma_link) {
> +	list_for_each_entry(vma, &obj->vma_list, obj_link) {
>   		seq_printf(m, " (%sgtt offset: %08llx, size: %08llx",
>   			   i915_is_ggtt(vma->vm) ? "g" : "pp",
>   			   vma->node.start, vma->node.size);
> @@ -229,7 +228,7 @@ static int i915_gem_object_list_info(struct seq_file *m, void *data)
>   	}
>
>   	total_obj_size = total_gtt_size = count = 0;
> -	list_for_each_entry(vma, head, mm_list) {
> +	list_for_each_entry(vma, head, vm_link) {
>   		seq_printf(m, "   ");
>   		describe_obj(m, vma->obj);
>   		seq_printf(m, "\n");
> @@ -341,7 +340,7 @@ static int per_file_stats(int id, void *ptr, void *data)
>   		stats->shared += obj->base.size;
>
>   	if (USES_FULL_PPGTT(obj->base.dev)) {
> -		list_for_each_entry(vma, &obj->vma_list, vma_link) {
> +		list_for_each_entry(vma, &obj->vma_list, obj_link) {
>   			struct i915_hw_ppgtt *ppgtt;
>
>   			if (!drm_mm_node_allocated(&vma->node))
> @@ -453,12 +452,12 @@ static int i915_gem_object_info(struct seq_file *m, void* data)
>   		   count, mappable_count, size, mappable_size);
>
>   	size = count = mappable_size = mappable_count = 0;
> -	count_vmas(&vm->active_list, mm_list);
> +	count_vmas(&vm->active_list, vm_link);
>   	seq_printf(m, "  %u [%u] active objects, %llu [%llu] bytes\n",
>   		   count, mappable_count, size, mappable_size);
>
>   	size = count = mappable_size = mappable_count = 0;
> -	count_vmas(&vm->inactive_list, mm_list);
> +	count_vmas(&vm->inactive_list, vm_link);
>   	seq_printf(m, "  %u [%u] inactive objects, %llu [%llu] bytes\n",
>   		   count, mappable_count, size, mappable_size);
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 4eef13ebdaf3..e4d7c7f5aca2 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -128,10 +128,10 @@ i915_gem_get_aperture_ioctl(struct drm_device *dev, void *data,
>
>   	pinned = 0;
>   	mutex_lock(&dev->struct_mutex);
> -	list_for_each_entry(vma, &ggtt->base.active_list, mm_list)
> +	list_for_each_entry(vma, &ggtt->base.active_list, vm_link)
>   		if (vma->pin_count)
>   			pinned += vma->node.size;
> -	list_for_each_entry(vma, &ggtt->base.inactive_list, mm_list)
> +	list_for_each_entry(vma, &ggtt->base.inactive_list, vm_link)
>   		if (vma->pin_count)
>   			pinned += vma->node.size;
>   	mutex_unlock(&dev->struct_mutex);
> @@ -261,7 +261,7 @@ drop_pages(struct drm_i915_gem_object *obj)
>   	int ret;
>
>   	drm_gem_object_reference(&obj->base);
> -	list_for_each_entry_safe(vma, next, &obj->vma_list, vma_link)
> +	list_for_each_entry_safe(vma, next, &obj->vma_list, obj_link)
>   		if (i915_vma_unbind(vma))
>   			break;
>
> @@ -2038,7 +2038,7 @@ void i915_vma_move_to_active(struct i915_vma *vma,
>   	obj->active |= intel_engine_flag(engine);
>
>   	i915_gem_request_mark_active(req, &obj->last_read[engine->id]);
> -	list_move_tail(&vma->mm_list, &vma->vm->active_list);
> +	list_move_tail(&vma->vm_link, &vma->vm->active_list);
>   }
>
>   static void
> @@ -2079,9 +2079,9 @@ i915_gem_object_retire__read(struct i915_gem_active *active,
>   	 */
>   	list_move_tail(&obj->global_list, &request->i915->mm.bound_list);
>
> -	list_for_each_entry(vma, &obj->vma_list, vma_link) {
> -		if (!list_empty(&vma->mm_list))
> -			list_move_tail(&vma->mm_list, &vma->vm->inactive_list);
> +	list_for_each_entry(vma, &obj->vma_list, obj_link) {
> +		if (!list_empty(&vma->vm_link))
> +			list_move_tail(&vma->vm_link, &vma->vm->inactive_list);
>   	}
>
>   	drm_gem_object_unreference(&obj->base);
> @@ -2576,7 +2576,7 @@ static int __i915_vma_unbind(struct i915_vma *vma, bool wait)
>   	struct drm_i915_private *dev_priv = obj->base.dev->dev_private;
>   	int ret;
>
> -	if (list_empty(&vma->vma_link))
> +	if (list_empty(&vma->obj_link))
>   		return 0;
>
>   	if (!drm_mm_node_allocated(&vma->node)) {
> @@ -2610,7 +2610,7 @@ static int __i915_vma_unbind(struct i915_vma *vma, bool wait)
>   	vma->vm->unbind_vma(vma);
>   	vma->bound = 0;
>
> -	list_del_init(&vma->mm_list);
> +	list_del_init(&vma->vm_link);
>   	if (i915_is_ggtt(vma->vm)) {
>   		if (vma->ggtt_view.type == I915_GGTT_VIEW_NORMAL) {
>   			obj->map_and_fenceable = false;
> @@ -2864,7 +2864,7 @@ search_free:
>   		goto err_remove_node;
>
>   	list_move_tail(&obj->global_list, &dev_priv->mm.bound_list);
> -	list_add_tail(&vma->mm_list, &vm->inactive_list);
> +	list_add_tail(&vma->vm_link, &vm->inactive_list);
>
>   	return vma;
>
> @@ -3029,7 +3029,7 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
>   	/* And bump the LRU for this access */
>   	vma = i915_gem_obj_to_ggtt(obj);
>   	if (vma && drm_mm_node_allocated(&vma->node) && !obj->active)
> -		list_move_tail(&vma->mm_list,
> +		list_move_tail(&vma->vm_link,
>   			       &to_i915(obj->base.dev)->gtt.base.inactive_list);
>
>   	return 0;
> @@ -3064,7 +3064,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>   	 * catch the issue of the CS prefetch crossing page boundaries and
>   	 * reading an invalid PTE on older architectures.
>   	 */
> -	list_for_each_entry_safe(vma, next, &obj->vma_list, vma_link) {
> +	list_for_each_entry_safe(vma, next, &obj->vma_list, obj_link) {
>   		if (!drm_mm_node_allocated(&vma->node))
>   			continue;
>
> @@ -3127,7 +3127,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>   			 */
>   		}
>
> -		list_for_each_entry(vma, &obj->vma_list, vma_link) {
> +		list_for_each_entry(vma, &obj->vma_list, obj_link) {
>   			if (!drm_mm_node_allocated(&vma->node))
>   				continue;
>
> @@ -3137,7 +3137,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>   		}
>   	}
>
> -	list_for_each_entry(vma, &obj->vma_list, vma_link)
> +	list_for_each_entry(vma, &obj->vma_list, obj_link)
>   		vma->node.color = cache_level;
>   	obj->cache_level = cache_level;
>
> @@ -3797,7 +3797,7 @@ void i915_gem_free_object(struct drm_gem_object *gem_obj)
>
>   	trace_i915_gem_object_destroy(obj);
>
> -	list_for_each_entry_safe(vma, next, &obj->vma_list, vma_link) {
> +	list_for_each_entry_safe(vma, next, &obj->vma_list, obj_link) {
>   		int ret;
>
>   		vma->pin_count = 0;
> @@ -3854,7 +3854,7 @@ struct i915_vma *i915_gem_obj_to_vma(struct drm_i915_gem_object *obj,
>   				     struct i915_address_space *vm)
>   {
>   	struct i915_vma *vma;
> -	list_for_each_entry(vma, &obj->vma_list, vma_link) {
> +	list_for_each_entry(vma, &obj->vma_list, obj_link) {
>   		if (vma->ggtt_view.type == I915_GGTT_VIEW_NORMAL &&
>   		    vma->vm == vm)
>   			return vma;
> @@ -3871,7 +3871,7 @@ struct i915_vma *i915_gem_obj_to_ggtt_view(struct drm_i915_gem_object *obj,
>   	if (WARN_ONCE(!view, "no view specified"))
>   		return ERR_PTR(-EINVAL);
>
> -	list_for_each_entry(vma, &obj->vma_list, vma_link)
> +	list_for_each_entry(vma, &obj->vma_list, obj_link)
>   		if (vma->vm == ggtt &&
>   		    i915_ggtt_view_equal(&vma->ggtt_view, view))
>   			return vma;
> @@ -3892,7 +3892,7 @@ void i915_gem_vma_destroy(struct i915_vma *vma)
>   	if (!i915_is_ggtt(vm))
>   		i915_ppgtt_put(i915_vm_to_ppgtt(vm));
>
> -	list_del(&vma->vma_link);
> +	list_del(&vma->obj_link);
>
>   	kmem_cache_free(to_i915(vma->obj->base.dev)->vmas, vma);
>   }
> @@ -4444,7 +4444,7 @@ u64 i915_gem_obj_offset(struct drm_i915_gem_object *o,
>
>   	WARN_ON(vm == &dev_priv->mm.aliasing_ppgtt->base);
>
> -	list_for_each_entry(vma, &o->vma_list, vma_link) {
> +	list_for_each_entry(vma, &o->vma_list, obj_link) {
>   		if (i915_is_ggtt(vma->vm) &&
>   		    vma->ggtt_view.type != I915_GGTT_VIEW_NORMAL)
>   			continue;
> @@ -4463,7 +4463,7 @@ u64 i915_gem_obj_ggtt_offset_view(struct drm_i915_gem_object *o,
>   	struct i915_address_space *ggtt = i915_obj_to_ggtt(o);
>   	struct i915_vma *vma;
>
> -	list_for_each_entry(vma, &o->vma_list, vma_link)
> +	list_for_each_entry(vma, &o->vma_list, obj_link)
>   		if (vma->vm == ggtt &&
>   		    i915_ggtt_view_equal(&vma->ggtt_view, view))
>   			return vma->node.start;
> @@ -4477,7 +4477,7 @@ bool i915_gem_obj_bound(struct drm_i915_gem_object *o,
>   {
>   	struct i915_vma *vma;
>
> -	list_for_each_entry(vma, &o->vma_list, vma_link) {
> +	list_for_each_entry(vma, &o->vma_list, obj_link) {
>   		if (i915_is_ggtt(vma->vm) &&
>   		    vma->ggtt_view.type != I915_GGTT_VIEW_NORMAL)
>   			continue;
> @@ -4494,7 +4494,7 @@ bool i915_gem_obj_ggtt_bound_view(struct drm_i915_gem_object *o,
>   	struct i915_address_space *ggtt = i915_obj_to_ggtt(o);
>   	struct i915_vma *vma;
>
> -	list_for_each_entry(vma, &o->vma_list, vma_link)
> +	list_for_each_entry(vma, &o->vma_list, obj_link)
>   		if (vma->vm == ggtt &&
>   		    i915_ggtt_view_equal(&vma->ggtt_view, view) &&
>   		    drm_mm_node_allocated(&vma->node))
> @@ -4507,7 +4507,7 @@ bool i915_gem_obj_bound_any(struct drm_i915_gem_object *o)
>   {
>   	struct i915_vma *vma;
>
> -	list_for_each_entry(vma, &o->vma_list, vma_link)
> +	list_for_each_entry(vma, &o->vma_list, obj_link)
>   		if (drm_mm_node_allocated(&vma->node))
>   			return true;
>
> @@ -4524,7 +4524,7 @@ unsigned long i915_gem_obj_size(struct drm_i915_gem_object *o,
>
>   	BUG_ON(list_empty(&o->vma_list));
>
> -	list_for_each_entry(vma, &o->vma_list, vma_link) {
> +	list_for_each_entry(vma, &o->vma_list, obj_link) {
>   		if (i915_is_ggtt(vma->vm) &&
>   		    vma->ggtt_view.type != I915_GGTT_VIEW_NORMAL)
>   			continue;
> @@ -4537,7 +4537,7 @@ unsigned long i915_gem_obj_size(struct drm_i915_gem_object *o,
>   bool i915_gem_obj_is_pinned(struct drm_i915_gem_object *obj)
>   {
>   	struct i915_vma *vma;
> -	list_for_each_entry(vma, &obj->vma_list, vma_link)
> +	list_for_each_entry(vma, &obj->vma_list, obj_link)
>   		if (vma->pin_count > 0)
>   			return true;
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 72b0875a95a4..05b4e0e85f24 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -142,7 +142,7 @@ static void i915_gem_context_clean(struct intel_context *ctx)
>   		return;
>
>   	list_for_each_entry_safe(vma, next, &ppgtt->base.inactive_list,
> -				 mm_list) {
> +				 vm_link) {
>   		if (WARN_ON(__i915_vma_unbind_no_wait(vma)))
>   			break;
>   	}
> diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
> index 07c6e4d320c9..ea1f8d1bd228 100644
> --- a/drivers/gpu/drm/i915/i915_gem_evict.c
> +++ b/drivers/gpu/drm/i915/i915_gem_evict.c
> @@ -116,7 +116,7 @@ i915_gem_evict_something(struct drm_device *dev, struct i915_address_space *vm,
>
>   search_again:
>   	/* First see if there is a large enough contiguous idle region... */
> -	list_for_each_entry(vma, &vm->inactive_list, mm_list) {
> +	list_for_each_entry(vma, &vm->inactive_list, vm_link) {
>   		if (mark_free(vma, &unwind_list))
>   			goto found;
>   	}
> @@ -125,7 +125,7 @@ search_again:
>   		goto none;
>
>   	/* Now merge in the soon-to-be-expired objects... */
> -	list_for_each_entry(vma, &vm->active_list, mm_list) {
> +	list_for_each_entry(vma, &vm->active_list, vm_link) {
>   		if (mark_free(vma, &unwind_list))
>   			goto found;
>   	}
> @@ -270,7 +270,7 @@ int i915_gem_evict_vm(struct i915_address_space *vm, bool do_idle)
>   		WARN_ON(!list_empty(&vm->active_list));
>   	}
>
> -	list_for_each_entry_safe(vma, next, &vm->inactive_list, mm_list)
> +	list_for_each_entry_safe(vma, next, &vm->inactive_list, vm_link)
>   		if (vma->pin_count == 0)
>   			WARN_ON(i915_vma_unbind(vma));
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index cddbd8c00663..6168182a87d8 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -2736,7 +2736,7 @@ static int i915_gem_setup_global_gtt(struct drm_device *dev,
>   		}
>   		vma->bound |= GLOBAL_BIND;
>   		__i915_vma_set_map_and_fenceable(vma);
> -		list_add_tail(&vma->mm_list, &ggtt_vm->inactive_list);
> +		list_add_tail(&vma->vm_link, &ggtt_vm->inactive_list);
>   	}
>
>   	/* Clear any non-preallocated blocks */
> @@ -3221,7 +3221,7 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev)
>   	vm = &dev_priv->gtt.base;
>   	list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list) {
>   		flush = false;
> -		list_for_each_entry(vma, &obj->vma_list, vma_link) {
> +		list_for_each_entry(vma, &obj->vma_list, obj_link) {
>   			if (vma->vm != vm)
>   				continue;
>
> @@ -3277,8 +3277,8 @@ __i915_gem_vma_create(struct drm_i915_gem_object *obj,
>   	if (vma == NULL)
>   		return ERR_PTR(-ENOMEM);
>
> -	INIT_LIST_HEAD(&vma->vma_link);
> -	INIT_LIST_HEAD(&vma->mm_list);
> +	INIT_LIST_HEAD(&vma->vm_link);
> +	INIT_LIST_HEAD(&vma->obj_link);
>   	INIT_LIST_HEAD(&vma->exec_list);
>   	vma->vm = vm;
>   	vma->obj = obj;
> @@ -3286,7 +3286,7 @@ __i915_gem_vma_create(struct drm_i915_gem_object *obj,
>   	if (i915_is_ggtt(vm))
>   		vma->ggtt_view = *ggtt_view;
>
> -	list_add_tail(&vma->vma_link, &obj->vma_list);
> +	list_add_tail(&vma->obj_link, &obj->vma_list);
>   	if (!i915_is_ggtt(vm))
>   		i915_ppgtt_get(i915_vm_to_ppgtt(vm));
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index b448ad832dcf..2497671d1e1a 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -195,9 +195,9 @@ struct i915_vma {
>   	struct i915_ggtt_view ggtt_view;
>
>   	/** This object's place on the active/inactive lists */
> -	struct list_head mm_list;
> +	struct list_head vm_link;
>
> -	struct list_head vma_link; /* Link in the object's VMA list */
> +	struct list_head obj_link; /* Link in the object's VMA list */
>
>   	/** This vma's place in the batchbuffer or on the eviction list */
>   	struct list_head exec_list;
> diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> index 16da9c1422cc..777959b47ccf 100644
> --- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
> +++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> @@ -52,7 +52,7 @@ static int num_vma_bound(struct drm_i915_gem_object *obj)
>   	struct i915_vma *vma;
>   	int count = 0;
>
> -	list_for_each_entry(vma, &obj->vma_list, vma_link) {
> +	list_for_each_entry(vma, &obj->vma_list, obj_link) {
>   		if (drm_mm_node_allocated(&vma->node))
>   			count++;
>   		if (vma->pin_count)
> @@ -176,7 +176,7 @@ i915_gem_shrink(struct drm_i915_private *dev_priv,
>
>   			/* For the unbound phase, this should be a no-op! */
>   			list_for_each_entry_safe(vma, v,
> -						 &obj->vma_list, vma_link)
> +						 &obj->vma_list, obj_link)
>   				if (i915_vma_unbind(vma))
>   					break;
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_gem_stolen.c
> index c384dc9c8a63..590e635cb65c 100644
> --- a/drivers/gpu/drm/i915/i915_gem_stolen.c
> +++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
> @@ -692,7 +692,7 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_device *dev,
>
>   		vma->bound |= GLOBAL_BIND;
>   		__i915_vma_set_map_and_fenceable(vma);
> -		list_add_tail(&vma->mm_list, &ggtt->inactive_list);
> +		list_add_tail(&vma->vm_link, &ggtt->inactive_list);
>   	}
>
>   	list_add_tail(&obj->global_list, &dev_priv->mm.bound_list);
> diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
> index 251e81c4b0ea..2f3638d02bdd 100644
> --- a/drivers/gpu/drm/i915/i915_gem_userptr.c
> +++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
> @@ -81,7 +81,7 @@ static void __cancel_userptr__worker(struct work_struct *work)
>   		was_interruptible = dev_priv->mm.interruptible;
>   		dev_priv->mm.interruptible = false;
>
> -		list_for_each_entry_safe(vma, tmp, &obj->vma_list, vma_link)
> +		list_for_each_entry_safe(vma, tmp, &obj->vma_list, obj_link)
>   			WARN_ON(i915_vma_unbind(vma));
>   		WARN_ON(i915_gem_object_put_pages(obj));
>
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index c812079bc25c..706d956b6eb3 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -731,7 +731,7 @@ static u32 capture_active_bo(struct drm_i915_error_buffer *err,
>   	struct i915_vma *vma;
>   	int i = 0;
>
> -	list_for_each_entry(vma, head, mm_list) {
> +	list_for_each_entry(vma, head, vm_link) {
>   		capture_bo(err++, vma);
>   		if (++i == count)
>   			break;
> @@ -754,7 +754,7 @@ static u32 capture_pinned_bo(struct drm_i915_error_buffer *err,
>   		if (err == last)
>   			break;
>
> -		list_for_each_entry(vma, &obj->vma_list, vma_link)
> +		list_for_each_entry(vma, &obj->vma_list, obj_link)
>   			if (vma->vm == vm && vma->pin_count > 0)
>   				capture_bo(err++, vma);
>   	}
> @@ -1113,12 +1113,12 @@ static void i915_gem_capture_vm(struct drm_i915_private *dev_priv,
>   	int i;
>
>   	i = 0;
> -	list_for_each_entry(vma, &vm->active_list, mm_list)
> +	list_for_each_entry(vma, &vm->active_list, vm_link)
>   		i++;
>   	error->active_bo_count[ndx] = i;
>
>   	list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list) {
> -		list_for_each_entry(vma, &obj->vma_list, vma_link)
> +		list_for_each_entry(vma, &obj->vma_list, obj_link)
>   			if (vma->vm == vm && vma->pin_count > 0)
>   				i++;
>   	}
>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: [PATCH 073/190] drm/i915: Introduce i915_gem_active for request tracking
  2016-01-12 13:44           ` Tvrtko Ursulin
@ 2016-01-12 14:08             ` Chris Wilson
  0 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-12 14:08 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Tue, Jan 12, 2016 at 01:44:13PM +0000, Tvrtko Ursulin wrote:
> 
> On 12/01/16 11:01, Chris Wilson wrote:
> >On Tue, Jan 12, 2016 at 10:04:20AM +0000, Tvrtko Ursulin wrote:
> >>Perhaps then leave the structure name as is and just rename the
> >>function to i915_gem_request_assign_active? I think that describes
> >>better what is actually happening.
> >
> >i915_gem_request_update_active()?
> >
> >request_assign_active() says to me that it is the request we are acting
> >on and it can have only one active entity. "update" could go either way.
> >
> >i915_gem_active_add_to_request() is the full version I guess, or just
> >i915_gem_active_set().
> >
> >i915_gem_request_mark_active() -> i915_gem_active_set()
> 
> Sorry; the short version might be good enough, and better in the
> code since it is shorter. I just think the parameters need to be
> re-ordered.

Yes, i915_gem_active_set() would operate on the i915_gem_active and take
i915_gem_request as its parameter.
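
Roughly, as a sketch of just the ordering (the actual implementation is
still to be written):

	static inline void
	i915_gem_active_set(struct i915_gem_active *active,
			    struct drm_i915_gem_request *request);
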
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: [PATCH 024/190] drm/i915: Replace manual barrier() with READ_ONCE() in HWS accessor
  2016-01-11  9:16 ` [PATCH 024/190] drm/i915: Replace manual barrier() with READ_ONCE() in HWS accessor Chris Wilson
@ 2016-01-12 14:17   ` Mika Kuoppala
  0 siblings, 0 replies; 263+ messages in thread
From: Mika Kuoppala @ 2016-01-12 14:17 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Daniel Vetter

Chris Wilson <chris@chris-wilson.co.uk> writes:

> When reading from the HWS page, we use barrier() to prevent the compiler
> optimising away the read from the volatile (may be updated by the GPU)
> memory address. This is more suited to READ_ONCE(); make it so.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>

After reading Documentation/memory-barriers.txt I feel that the
deodorant failed. I confirmed with Chris my confusion about the HWS page
cacheability; it is snooped.

The barrier here is a superset of what we need. We need to instruct the
compiler to throw out its cached value of this particular address, not
everything it had cached. So it makes sense to me.
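
To illustrate, a simplified sketch of the difference (not the actual
kernel macros):

	u32 seqno;

	/* barrier() makes the compiler discard *all* its cached values: */
	barrier();
	seqno = ring->status_page.page_addr[reg];

	/* READ_ONCE() forces a fresh load of just this one address, in
	 * essence by making the single access volatile: */
	seqno = *(volatile u32 *)&ring->status_page.page_addr[reg];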

Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>

> ---
>  drivers/gpu/drm/i915/intel_ringbuffer.h | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 6cc8e9c5f8d6..8f305ce253ae 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -418,8 +418,7 @@ intel_read_status_page(struct intel_engine_cs *ring,
>  		       int reg)
>  {
>  	/* Ensure that the compiler doesn't optimize away the load. */
> -	barrier();
> -	return ring->status_page.page_addr[reg];
> +	return READ_ONCE(ring->status_page.page_addr[reg]);
>  }
>  
>  static inline void
> -- 
> 2.7.0.rc3
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: [PATCH 021/190] drm/i915: Use HWS for seqno tracking everywhere
  2016-01-12 11:03     ` Chris Wilson
@ 2016-01-12 14:30       ` Mika Kuoppala
  2016-01-12 14:46         ` Chris Wilson
  0 siblings, 1 reply; 263+ messages in thread
From: Mika Kuoppala @ 2016-01-12 14:30 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> On Tue, Jan 12, 2016 at 12:05:06PM +0200, Mika Kuoppala wrote:
>> Chris Wilson <chris@chris-wilson.co.uk> writes:
>> > -	intel_ring_emit(ring, GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE |
>> > -			PIPE_CONTROL_WRITE_FLUSH |
>> > -			PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE);
>> > -	intel_ring_emit(ring, ring->scratch.gtt_offset | PIPE_CONTROL_GLOBAL_GTT);
>> > -	intel_ring_emit(ring, i915_gem_request_get_seqno(req));
>> > +	intel_ring_emit(ring,
>> > +			GFX_OP_PIPE_CONTROL(4) |
>> > +			PIPE_CONTROL_QW_WRITE |
>> > +			PIPE_CONTROL_WRITE_FLUSH);
>> 
>> Why no more PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE?
>
> I opened vim to add it back in and I couldn't bring myself to commit
> that atrocity.

I just noticed the asymmetry. Ilk doesn't need it?

-Mika

> -Chris
>
> -- 
> Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: [PATCH 021/190] drm/i915: Use HWS for seqno tracking everywhere
  2016-01-12 14:30       ` Mika Kuoppala
@ 2016-01-12 14:46         ` Chris Wilson
  0 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-12 14:46 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: intel-gfx

On Tue, Jan 12, 2016 at 04:30:03PM +0200, Mika Kuoppala wrote:
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > On Tue, Jan 12, 2016 at 12:05:06PM +0200, Mika Kuoppala wrote:
> >> Chris Wilson <chris@chris-wilson.co.uk> writes:
> >> > -	intel_ring_emit(ring, GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE |
> >> > -			PIPE_CONTROL_WRITE_FLUSH |
> >> > -			PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE);
> >> > -	intel_ring_emit(ring, ring->scratch.gtt_offset | PIPE_CONTROL_GLOBAL_GTT);
> >> > -	intel_ring_emit(ring, i915_gem_request_get_seqno(req));
> >> > +	intel_ring_emit(ring,
> >> > +			GFX_OP_PIPE_CONTROL(4) |
> >> > +			PIPE_CONTROL_QW_WRITE |
> >> > +			PIPE_CONTROL_WRITE_FLUSH);
> >> 
> >> Why no more PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE?
> >
> > I opened vim to add it back in and I couldn't bring myself to commit
> > that atrocity.
> 
> I just noticed the asymmetry. Ilk doesn't need it?

Here and now, we are doing 8 writes (1 seqno, 6 scratch, 1 seqno again
for good luck) simply to try and flush the pipecontrol queue, to ensure
the seqno write lands before the notify interrupt is asserted. The TC
invalidate on the first (and only the first) write is superfluous - it
imposes a top-of-pipe stall when we already have a bottom-of-pipe stall
in the flush (the write will not occur until the pipeline is drained).
We do the TC invalidate along with the rest of the cache invalidations
before the next batch.
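
In sketch form (simplified from the hunk quoted above):

	/* The seqno write needs only the bottom-of-pipe stall: */
	intel_ring_emit(ring,
			GFX_OP_PIPE_CONTROL(4) |
			PIPE_CONTROL_QW_WRITE |
			PIPE_CONTROL_WRITE_FLUSH);
	/* PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE is emitted with the
	 * rest of the cache invalidations before the next batch. */
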
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: [PATCH 066/190] drm/i915: Simplify request_alloc by returning the allocated request
  2016-01-11  9:17 ` [PATCH 066/190] drm/i915: Simplify request_alloc by returning the allocated request Chris Wilson
@ 2016-01-12 17:11   ` Dave Gordon
  0 siblings, 0 replies; 263+ messages in thread
From: Dave Gordon @ 2016-01-12 17:11 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 11/01/16 09:17, Chris Wilson wrote:
> It is simpler and leads to more readable code through the callstack if
> the allocation returns the allocated struct through the return value.
>
> The importance of this is that it no longer looks like we accidentally
> allocate requests as side-effect of calling certain functions.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_drv.h            |  3 +-
>   drivers/gpu/drm/i915/i915_gem.c            | 82 ++++++++++--------------------
>   drivers/gpu/drm/i915/i915_gem_execbuffer.c |  8 +--
>   drivers/gpu/drm/i915/i915_gem_request.c    | 22 +++-----
>   drivers/gpu/drm/i915/i915_gem_request.h    |  6 +--
>   drivers/gpu/drm/i915/i915_trace.h          | 15 +++---
>   drivers/gpu/drm/i915/intel_display.c       | 25 +++++----
>   drivers/gpu/drm/i915/intel_lrc.c           |  6 +--
>   drivers/gpu/drm/i915/intel_overlay.c       | 24 ++++-----
>   9 files changed, 77 insertions(+), 114 deletions(-)

I would quite like to have reviewed this, since I think simplifying 
{i915_gem_}request_alloc() is a really good idea, and have submitted a 
patch to do just that. But I can't review it because it doesn't apply, 
even if I start from nightly and try to apply the whole series of 190 
patches (it fails at patch 018/190, Separate out the seqno-barrier from 
engine->get_seqno).

Actually, it looks like the request_alloc() changes *would* apply 
cleanly without any of the rest of the pile of patches, except that this 
patch does more than it admits; in addition to the improvement to 
request_alloc() it also rewrites i915_gem_object_sync(), and *those* 
changes don't apply to nightly.

So this should be two separate patches, one to improve request_alloc() 
and then a separate one to update i915_gem_object_sync().

.Dave.

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: [PATCH 055/190] drm/i915: Unify intel_logical_ring_emit and intel_ring_emit
  2016-01-11  9:17 ` [PATCH 055/190] drm/i915: Unify intel_logical_ring_emit and intel_ring_emit Chris Wilson
@ 2016-01-12 17:29   ` Dave Gordon
  0 siblings, 0 replies; 263+ messages in thread
From: Dave Gordon @ 2016-01-12 17:29 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 11/01/16 09:17, Chris Wilson wrote:
> Both perform the same actions with more or less indirection, so just
> unify the code.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_gem.c            |   2 +-
>   drivers/gpu/drm/i915/i915_gem_context.c    |   8 +-
>   drivers/gpu/drm/i915/i915_gem_execbuffer.c |  34 ++++-----
>   drivers/gpu/drm/i915/i915_gem_gtt.c        |  26 +++----
>   drivers/gpu/drm/i915/intel_display.c       |  26 +++----
>   drivers/gpu/drm/i915/intel_lrc.c           | 114 ++++++++++++++---------------
>   drivers/gpu/drm/i915/intel_lrc.h           |  26 -------
>   drivers/gpu/drm/i915/intel_mocs.c          |  30 ++++----
>   drivers/gpu/drm/i915/intel_overlay.c       |  42 +++++------
>   drivers/gpu/drm/i915/intel_ringbuffer.c    | 101 ++++++++++++-------------
>   drivers/gpu/drm/i915/intel_ringbuffer.h    |  21 ++----
>   11 files changed, 194 insertions(+), 236 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index c2a1ec8abc11..247731672cb1 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -4068,7 +4068,7 @@ err:
>
>   int i915_gem_l3_remap(struct drm_i915_gem_request *req, int slice)
>   {
> -	struct intel_engine_cs *ring = req->ring;
> +	struct intel_ringbuffer *ring = req->ringbuf;

NAK. (regardless of the fact that I'd like them unified!)

Until you have purged the last use of the name "ring" as a reference to 
an engine, adding new things called "ring" but of a different type will 
be too confusing.

The variable, at least for now, should be called "ringbuf", which makes 
it obvious that it caches the 'req->ringbuf' field and NOT the
         struct intel_engine_cs *ring;
that is also found in struct drm_i915_gem_request.

You can only start reusing names with a new meaning /after/ the old
meaning has been eliminated from the code, and after some interval for
everyone's mental cache to be updated. But it is probably better never
to reuse an old name for a different thing; why not just make up a new
one, as we did with "engine"?
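
That is, something like:

	struct intel_ringbuffer *ringbuf = req->ringbuf;	/* the buffer */
	struct intel_engine_cs *engine = req->ring;		/* the engine */

so that neither local is ambiguously called "ring".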

>   	struct drm_i915_private *dev_priv = req->i915;
>   	u32 *remap_info = dev_priv->l3_parity.remap_info[slice];
>   	int i, ret;
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 3e3b4bf3fed1..d58de7e084dc 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -519,7 +519,7 @@ i915_gem_context_get(struct drm_i915_file_private *file_priv, u32 id)
>   static inline int
>   mi_set_context(struct drm_i915_gem_request *req, u32 hw_flags)
>   {
> -	struct intel_engine_cs *ring = req->ring;
> +	struct intel_ringbuffer *ring = req->ringbuf;
>   	u32 flags = hw_flags | MI_MM_SPACE_GTT;
>   	const int num_rings =
>   		/* Use an extended w/a on ivb+ if signalling from other rings */
> @@ -534,7 +534,7 @@ mi_set_context(struct drm_i915_gem_request *req, u32 hw_flags)
>   	 * itlb_before_ctx_switch.
>   	 */
>   	if (IS_GEN6(req->i915)) {
> -		ret = ring->flush(req, I915_GEM_GPU_DOMAINS, 0);
> +		ret = req->ring->flush(req, I915_GEM_GPU_DOMAINS, 0);

Hmm ... what is this "ring"? Oh, this one's not a ringbuffer, it's an 
ENGINE!

>   		if (ret)
>   			return ret;
>   	}
> @@ -562,7 +562,7 @@ mi_set_context(struct drm_i915_gem_request *req, u32 hw_flags)
>
>   			intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(num_rings));
>   			for_each_ring(signaller, req->i915, i) {
> -				if (signaller == ring)
> +				if (signaller == req->ring)

another engine

>   					continue;
>
>   				intel_ring_emit_reg(ring, RING_PSMI_CTL(signaller->mmio_base));
> @@ -587,7 +587,7 @@ mi_set_context(struct drm_i915_gem_request *req, u32 hw_flags)
>
>   			intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(num_rings));
>   			for_each_ring(signaller, req->i915, i) {
> -				if (signaller == ring)
> +				if (signaller == req->ring)

and this one too

>   					continue;
>
>   				intel_ring_emit_reg(ring, RING_PSMI_CTL(signaller->mmio_base));
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 78b462956c78..603a247ac333 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -1146,14 +1146,12 @@ i915_gem_execbuffer_retire_commands(struct i915_execbuffer_params *params)
>   }
>
>   static int
> -i915_reset_gen7_sol_offsets(struct drm_device *dev,
> -			    struct drm_i915_gem_request *req)
> +i915_reset_gen7_sol_offsets(struct drm_i915_gem_request *req)
>   {
> -	struct intel_engine_cs *ring = req->ring;
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct intel_ringbuffer *ring = req->ringbuf;

But this 'ring' is a ringbuffer ...

>   	int ret, i;
>
> -	if (!IS_GEN7(dev) || ring != &dev_priv->ring[RCS]) {
> +	if (!IS_GEN7(req->i915) || req->ring->id != RCS) {

... and this one, in the same function, is an engine!

>   		DRM_DEBUG("sol reset is gen7/rcs only\n");
>   		return -EINVAL;
>   	}

So please submit a version that does just what it says ('cos I think 
unification would be good) but without the confusing repurposing of 
local variable names.

.Dave.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: [PATCH 006/190] drm/i915: Add GEM debugging Kconfig option
  2016-01-11  9:16 ` [PATCH 006/190] drm/i915: Add GEM debugging Kconfig option Chris Wilson
@ 2016-01-12 17:44   ` Dave Gordon
  0 siblings, 0 replies; 263+ messages in thread
From: Dave Gordon @ 2016-01-12 17:44 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 11/01/16 09:16, Chris Wilson wrote:
> Currently there is a #define to enable extra BUG_ON for debugging
> requests and associated activities. I want to expand its use to cover
> all of GEM internals (so that we can saturate the code with asserts).
> We can add a Kconfig option to make it easier to enable - with the usual
> caveats of not enabling unless explicitly requested.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/Kconfig.debug |  8 ++++++++
>   drivers/gpu/drm/i915/i915_drv.h    |  6 ++++++
>   drivers/gpu/drm/i915/i915_gem.c    | 12 +++++-------
>   3 files changed, 19 insertions(+), 7 deletions(-)
> diff --git a/drivers/gpu/drm/i915/Kconfig.debug b/drivers/gpu/drm/i915/Kconfig.debug
> index 1f10ee228eda..7fa6b97635e5 100644
> --- a/drivers/gpu/drm/i915/Kconfig.debug
> +++ b/drivers/gpu/drm/i915/Kconfig.debug
> @@ -10,3 +10,11 @@ config DRM_I915_WERROR
>   	---help---
>   	  Add -Werror to the build flags for (and only for) i915.ko.
>   	  Do not enable this unless you are writing code for the i915.ko module.
> +
> +config DRM_I915_DEBUG_GEM
> +	bool "Insert extra checks into the GEM internals"
> +	default n
> +	depends on DRM_I915_WERROR

This comes up as an option only if DRM_I915_WERROR is already selected?
Surely it should be orthogonal to the compile-time checks, with each as
an independent option but both restricted to EXPERT mode. So the line
above should be "depends on EXPERT", not "depends on DRM_I915_WERROR"?
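
i.e. something like:

	config DRM_I915_DEBUG_GEM
		bool "Insert extra checks into the GEM internals"
		default n
		depends on EXPERT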

> +	---help---
> +	  Enable extra sanity checks (including BUGs) that may slow the
> +          system down and if hit hang the machine.

"hang the machine if hit". Unless you want commas round "if hit"?

Otherwise looks OK.

.Dave.

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: [Intel-gfx] [PATCH 004/190] drm/i915: Fix some invalid requests cancellations
  2016-01-11  9:16 ` [PATCH 004/190] drm/i915: Fix some invalid requests cancellations Chris Wilson
@ 2016-01-12 18:16     ` Dave Gordon
  0 siblings, 0 replies; 263+ messages in thread
From: Dave Gordon @ 2016-01-12 18:16 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Daniel Vetter, stable

On 11/01/16 09:16, Chris Wilson wrote:
> As we add the VMA to the request early, it may be cancelled during
> execbuf reservation. This will leave the context object pointing to a
> dangling request; i915_wait_request() simply skips the wait and so we
> may unbind the object whilst it is still active.

I don't understand this; context objects don't point to requests, it's 
vice versa. The request has a pointer to the engine, and to the context, 
and adds +1 to the refcount on the latter.
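
i.e. the pointers run in this direction (simplified):

	struct drm_i915_gem_request {
		struct intel_engine_cs *ring;	/* the engine; no reference taken */
		struct intel_context *ctx;	/* holds +1 reference */
		...
	};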

> However, if at any point we make a change to the hardware (and equally
> importantly our bookkeeping in the driver), we cannot cancel the request
> as what has already been written must be submitted. Submitting a partial
> request is far easier than trying to unwind the incomplete change.

What change could be made to the hardware? Engine reset, perhaps, but 
even that doesn't necessarily invalidate a request in preparation for 
sending to the hardware.

Submitting a partial change seems likely to leave the h/w in an 
undefined state. Cancelling a request is (or ought to be) trivial; just 
reset the ringbuffer's tail pointer to where it was when the request was 
allocated. The engine doesn't read anything past the tail of the 
previous request until the new request is submitted.
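
i.e. cancellation could be as simple as this sketch (assuming the
request records the ringbuffer tail position at allocation in 'head'):

	static void cancel_request(struct drm_i915_gem_request *req)
	{
		/* The engine reads nothing past the old tail until the
		 * request is submitted, so rewinding the tail discards
		 * the partially-written commands. */
		req->ringbuf->tail = req->head;
	}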

> Unfortunately this patch undoes the excess breadcrumb usage that olr
> prevented, e.g. if we interrupt batchbuffer submission then we submit
> the requests along with the memory writes and interrupt (even though we
> do no real work). Disassociating requests from breadcrumbs (and
> semaphores) is a topic for a past/future series, but now much more
> important.

Another incomprehensible comment? OLR went away a long time ago now. And 
AFAIK batchbuffer submission cannot be interrupted by anything (or more 
accurately, anything that interrupts submission won't be able to write 
to the ringbuffer or submit new work).

> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: stable@vger.kernel.org
> ---
>   drivers/gpu/drm/i915/i915_drv.h            |  1 -
>   drivers/gpu/drm/i915/i915_gem.c            |  7 ++-----
>   drivers/gpu/drm/i915/i915_gem_context.c    | 21 +++++++++------------
>   drivers/gpu/drm/i915/i915_gem_execbuffer.c | 16 +++++-----------
>   drivers/gpu/drm/i915/intel_display.c       |  2 +-
>   drivers/gpu/drm/i915/intel_lrc.c           |  1 -
>   6 files changed, 17 insertions(+), 31 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 747d2d84a18c..ec20814adb0c 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2813,7 +2813,6 @@ int i915_gem_sw_finish_ioctl(struct drm_device *dev, void *data,
>   			     struct drm_file *file_priv);
>   void i915_gem_execbuffer_move_to_active(struct list_head *vmas,
>   					struct drm_i915_gem_request *req);
> -void i915_gem_execbuffer_retire_commands(struct i915_execbuffer_params *params);
>   int i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
>   				   struct drm_i915_gem_execbuffer2 *args,
>   				   struct list_head *vmas);
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 3ab529669448..fd24877eb0a0 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3384,12 +3384,9 @@ int i915_gpu_idle(struct drm_device *dev)
>   				return ret;
>
>   			ret = i915_switch_context(req);
> -			if (ret) {
> -				i915_gem_request_cancel(req);
> -				return ret;
> -			}
> -
>   			i915_add_request_no_flush(req);
> +			if (ret)
> +				return ret;

This seems like a bad idea. Looking at how we could get here (i.e. how 
could switch_context() return an error), we see things such as "failed 
to pin" or i915_gem_object_get_pages() failing.

With no real idea of what the GPU and/or CPU address spaces contain at 
this point, it seems unwise to charge ahead regardless.

.Dave.

>   		}
>
>   		ret = intel_ring_idle(ring);
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index c25083c78ba7..e5e9a8918f19 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -661,7 +661,6 @@ static int do_switch(struct drm_i915_gem_request *req)
>   	struct drm_i915_private *dev_priv = ring->dev->dev_private;
>   	struct intel_context *from = ring->last_context;
>   	u32 hw_flags = 0;
> -	bool uninitialized = false;
>   	int ret, i;
>
>   	if (from != NULL && ring == &dev_priv->ring[RCS]) {
> @@ -768,6 +767,15 @@ static int do_switch(struct drm_i915_gem_request *req)
>   			to->remap_slice &= ~(1<<i);
>   	}
>
> +	if (!to->legacy_hw_ctx.initialized) {
> +		if (ring->init_context) {
> +			ret = ring->init_context(req);
> +			if (ret)
> +				goto unpin_out;
> +		}
> +		to->legacy_hw_ctx.initialized = true;
> +	}
> +
>   	/* The backing object for the context is done after switching to the
>   	 * *next* context. Therefore we cannot retire the previous context until
>   	 * the next context has already started running. In fact, the below code
> @@ -791,21 +799,10 @@ static int do_switch(struct drm_i915_gem_request *req)
>   		i915_gem_context_unreference(from);
>   	}
>
> -	uninitialized = !to->legacy_hw_ctx.initialized;
> -	to->legacy_hw_ctx.initialized = true;
> -
>   done:
>   	i915_gem_context_reference(to);
>   	ring->last_context = to;
>
> -	if (uninitialized) {
> -		if (ring->init_context) {
> -			ret = ring->init_context(req);
> -			if (ret)
> -				DRM_ERROR("ring init context: %d\n", ret);
> -		}
> -	}
> -
>   	return 0;
>
>   unpin_out:
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index dccb517361b3..b8186bd061c1 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -1136,7 +1136,7 @@ i915_gem_execbuffer_move_to_active(struct list_head *vmas,
>   	}
>   }
>
> -void
> +static void
>   i915_gem_execbuffer_retire_commands(struct i915_execbuffer_params *params)
>   {
>   	/* Unconditionally force add_request to emit a full flush. */
> @@ -1318,7 +1318,6 @@ i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
>   	trace_i915_gem_ring_dispatch(params->request, params->dispatch_flags);
>
>   	i915_gem_execbuffer_move_to_active(vmas, params->request);
> -	i915_gem_execbuffer_retire_commands(params);
>
>   	return 0;
>   }
> @@ -1607,8 +1606,10 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>   		goto err_batch_unpin;
>
>   	ret = i915_gem_request_add_to_client(params->request, file);
> -	if (ret)
> +	if (ret) {
> +		i915_gem_request_cancel(params->request);
>   		goto err_batch_unpin;
> +	}
>
>   	/*
>   	 * Save assorted stuff away to pass through to *_submission().
> @@ -1624,6 +1625,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>   	params->ctx                     = ctx;
>
>   	ret = dev_priv->gt.execbuf_submit(params, args, &eb->vmas);
> +	i915_gem_execbuffer_retire_commands(params);
>
>   err_batch_unpin:
>   	/*
> @@ -1640,14 +1642,6 @@ err:
>   	i915_gem_context_unreference(ctx);
>   	eb_destroy(eb);
>
> -	/*
> -	 * If the request was created but not successfully submitted then it
> -	 * must be freed again. If it was submitted then it is being tracked
> -	 * on the active request list and no clean up is required here.
> -	 */
> -	if (ret && params->request)
> -		i915_gem_request_cancel(params->request);
> -
>   	mutex_unlock(&dev->struct_mutex);
>
>   pre_mutex_err:
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index b4cf9ce16155..959868c40018 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -11751,7 +11751,7 @@ cleanup_unpin:
>   	intel_unpin_fb_obj(fb, crtc->primary->state);
>   cleanup_pending:
>   	if (request)
> -		i915_gem_request_cancel(request);
> +		i915_add_request_no_flush(request);
>   	atomic_dec(&intel_crtc->unpin_work_count);
>   	mutex_unlock(&dev->struct_mutex);
>   cleanup:
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index f7fac5f3b5ce..7f17ba852b8a 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -972,7 +972,6 @@ int intel_execlists_submission(struct i915_execbuffer_params *params,
>   	trace_i915_gem_ring_dispatch(params->request, params->dispatch_flags);
>
>   	i915_gem_execbuffer_move_to_active(vmas, params->request);
> -	i915_gem_execbuffer_retire_commands(params);
>
>   	return 0;
>   }
>


^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: [PATCH 053/190] drm/i915: Convert i915_semaphores_is_enabled over to early sanitize
  2016-01-11  9:17 ` [PATCH 053/190] drm/i915: Convert i915_semaphores_is_enabled over to early sanitize Chris Wilson
@ 2016-01-12 19:07   ` Dave Gordon
  0 siblings, 0 replies; 263+ messages in thread
From: Dave Gordon @ 2016-01-12 19:07 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 11/01/16 09:17, Chris Wilson wrote:
> Rather than recomputing whether semaphores are enabled, we can do that
> computation once during early initialisation as the i915.semaphores
> module parameter is now read-only.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_debugfs.c     |  2 +-
>   drivers/gpu/drm/i915/i915_dma.c         |  2 +-
>   drivers/gpu/drm/i915/i915_drv.c         | 25 -----------------------
>   drivers/gpu/drm/i915/i915_drv.h         |  1 -
>   drivers/gpu/drm/i915/i915_gem.c         | 35 ++++++++++++++++++++++++++++++---
>   drivers/gpu/drm/i915/i915_gem_context.c |  2 +-
>   drivers/gpu/drm/i915/i915_gpu_error.c   |  2 +-
>   drivers/gpu/drm/i915/intel_ringbuffer.c | 20 +++++++++----------
>   8 files changed, 46 insertions(+), 43 deletions(-)

LGTM.

Reviewed-by: Dave Gordon <david.s.gordon@intel.com>

[aside]
The conditions below seem to exclude a lot of systems. It looks like 
semaphores can only be used on GEN 6 (if no IOMMU) and GEN 7. Nothing 
before or after that range, as GEN9+ supports execlists only.

So is it even worth continuing to support semaphores at all? Especially 
as we have customers who say that the scheduler gives more performance 
gain than semaphores ...
[/aside]

.Dave.

> +static bool i915_gem_sanitize_semaphore(struct drm_i915_private *dev_priv,
> +					int param_value)
> +{
> +	if (INTEL_INFO(dev_priv)->gen < 6)
> +		return false;
> +
> +	if (param_value >= 0)
> +		return param_value;
> +
> +	/* TODO: make semaphores and Execlists play nicely together */
> +	if (i915.enable_execlists)
> +		return false;
> +
> +	/* Until we get further testing... */
> +	if (IS_GEN8(dev_priv))
> +		return false;
> +
> +#ifdef CONFIG_INTEL_IOMMU
> +	/* Enable semaphores on SNB when IO remapping is off */
> +	if (INTEL_INFO(dev_priv)->gen == 6 && intel_iommu_gfx_mapped)
> +		return false;
> +#endif
> +
> +	return true;
> +}

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: [Intel-gfx] [PATCH 004/190] drm/i915: Fix some invalid requests cancellations
  2016-01-12 18:16     ` Dave Gordon
@ 2016-01-13 20:06     ` Chris Wilson
  -1 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-13 20:06 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx, Daniel Vetter, stable

On Tue, Jan 12, 2016 at 06:16:21PM +0000, Dave Gordon wrote:
> On 11/01/16 09:16, Chris Wilson wrote:
> >As we add the VMA to the request early, it may be cancelled during
> >execbuf reservation. This will leave the context object pointing to a
> >dangling request; i915_wait_request() simply skips the wait and so we
> >may unbind the object whilst it is still active.
> 
> I don't understand this; context objects don't point to requests,
> it's vice versa. The request has a pointer to the engine, and to the
> context, and adds +1 to the refcount on the latter.

The context object has an active reference to prevent it being freed
whilst still in use. That is combined with the pin on last_context until
after the switch is complete to prevent the hardware writing to stale
pages. The golden render state also uses the active reference to prevent
being reaped whilst it is still in the ringbuffer; without it, we could
return those pages back to the system and then execute them on the GPU.

> >However, if at any point we make a change to the hardware (and equally
> >importantly our bookkeeping in the driver), we cannot cancel the request
> >as what has already been written must be submitted. Submitting a partial
> >request is far easier than trying to unwind the incomplete change.
> 
> What change could be made to the hardware? Engine reset, perhaps,
> but even that doesn't necessarily invalidate a request in
> preparation for sending to the hardware.

Sloppy terminology, but we don't unwind the request when we cancel it,
so the commands we have written into the ringbuffer persist and will be
executed by the next request. We would also need to unwind the changes
we made to our state tracker.
 
> Submitting a partial change seems likely to leave the h/w in an
> undefined state. Cancelling a request is (or ought to be) trivial;
> just reset the ringbuffer's tail pointer to where it was when the
> request was allocated. The engine doesn't read anything past the
> tail of the previous request until the new request is submitted.

The opposite, submitting the changes that we have made so far ensures
that the hardware state matches our bookkeeping.
 
> >Unfortunately this patch undoes the excess breadcrumb usage that olr
> >prevented, e.g. if we interrupt batchbuffer submission then we submit
> >the requests along with the memory writes and interrupt (even though we
> >do no real work). Disassociating requests from breadcrumbs (and
> >semaphores) is a topic for a past/future series, but now much more
> >important.
> 
> Another incomprehensible comment? OLR went away a long time ago now.

OLR was introduced to avoid exactly this scenario - where we would emit
commands into the ring and then have to cancel the operation. Rather
than emit the breadcrumb and complete the seqno/request, we kept the
request open and reused it for the newcomer.

> And AFAIK batchbuffer submission cannot be interrupted by anything
> (or more accurately, anything that interrupts submission won't be
> able to write to the ringbuffer or submit new work).

You mean after the request is constructed. Whilst the request is being
constructed it is very easy to interrupt, and apparently fragile.

> >Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> >Cc: stable@vger.kernel.org

Do I need to mention that igt finds these bugs?

> >---
> >  drivers/gpu/drm/i915/i915_drv.h            |  1 -
> >  drivers/gpu/drm/i915/i915_gem.c            |  7 ++-----
> >  drivers/gpu/drm/i915/i915_gem_context.c    | 21 +++++++++------------
> >  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 16 +++++-----------
> >  drivers/gpu/drm/i915/intel_display.c       |  2 +-
> >  drivers/gpu/drm/i915/intel_lrc.c           |  1 -
> >  6 files changed, 17 insertions(+), 31 deletions(-)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> >index 747d2d84a18c..ec20814adb0c 100644
> >--- a/drivers/gpu/drm/i915/i915_drv.h
> >+++ b/drivers/gpu/drm/i915/i915_drv.h
> >@@ -2813,7 +2813,6 @@ int i915_gem_sw_finish_ioctl(struct drm_device *dev, void *data,
> >  			     struct drm_file *file_priv);
> >  void i915_gem_execbuffer_move_to_active(struct list_head *vmas,
> >  					struct drm_i915_gem_request *req);
> >-void i915_gem_execbuffer_retire_commands(struct i915_execbuffer_params *params);
> >  int i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
> >  				   struct drm_i915_gem_execbuffer2 *args,
> >  				   struct list_head *vmas);
> >diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> >index 3ab529669448..fd24877eb0a0 100644
> >--- a/drivers/gpu/drm/i915/i915_gem.c
> >+++ b/drivers/gpu/drm/i915/i915_gem.c
> >@@ -3384,12 +3384,9 @@ int i915_gpu_idle(struct drm_device *dev)
> >  				return ret;
> >
> >  			ret = i915_switch_context(req);
> >-			if (ret) {
> >-				i915_gem_request_cancel(req);
> >-				return ret;
> >-			}
> >-
> >  			i915_add_request_no_flush(req);
> >+			if (ret)
> >+				return ret;
> 
> This seems like a bad idea. Looking at how we could get here (i.e.
> how could switch_context() return an error), we see things such as
> "failed to pin" or i915_gem_object_get_pages() failed.

The point is that we can hit the error path after having done part of
the request setup and have objects pointing to the request. It cannot be
cancelled at this point.
 
> With no real idea of what the GPU and/or CPU address spaces contain
> at this point, it seems unwise to charge ahead regardless.

? We know the state. We know that objects are using this request to
track their changes. Throwing those away means that we throw away
storage being used by the HW.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: [PATCH 079/190] drm/i915: Reduce the pointer dance of i915_is_ggtt()
  2016-01-11  9:17 ` [PATCH 079/190] drm/i915: Reduce the pointer dance of i915_is_ggtt() Chris Wilson
@ 2016-01-15 12:12   ` Dave Gordon
  2016-01-15 12:24     ` Chris Wilson
  0 siblings, 1 reply; 263+ messages in thread
From: Dave Gordon @ 2016-01-15 12:12 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 11/01/16 09:17, Chris Wilson wrote:
> The multiple levels of indirection do nothing but hinder the compiler,
> and the pointer chasing turns out to be quite painful, but painless to fix.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_debugfs.c        | 13 ++++++-------
>   drivers/gpu/drm/i915/i915_drv.h            |  7 -------
>   drivers/gpu/drm/i915/i915_gem.c            | 18 +++++++-----------
>   drivers/gpu/drm/i915/i915_gem_execbuffer.c |  5 ++---
>   drivers/gpu/drm/i915/i915_gem_gtt.c        | 12 +++++-------
>   drivers/gpu/drm/i915/i915_gem_gtt.h        |  5 +++++
>   drivers/gpu/drm/i915/i915_trace.h          | 27 ++++++++-------------------
>   7 files changed, 33 insertions(+), 54 deletions(-)

[snip]

> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index c9c1a5cdc1e5..f840cc55f1ab 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2905,18 +2905,11 @@ bool i915_gem_obj_is_pinned(struct drm_i915_gem_object *obj);
>   /* Some GGTT VM helpers */
>   #define i915_obj_to_ggtt(obj) \
>   	(&((struct drm_i915_private *)(obj)->base.dev->dev_private)->gtt.base)
> -static inline bool i915_is_ggtt(struct i915_address_space *vm)
> -{
> -	struct i915_address_space *ggtt =
> -		&((struct drm_i915_private *)(vm)->dev->dev_private)->gtt.base;
> -	return vm == ggtt;
> -}

> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index b5c3bbe6dc2a..06117bd0fc00 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -3150,6 +3150,7 @@ int i915_gem_gtt_init(struct drm_device *dev)
>   	}
>
>   	gtt->base.dev = dev;
> +	gtt->base.is_ggtt = true;

So, it looks like the plan here is that when we need to determine 
whether something is the special distinguished instance of a type, then 
instead of comparing its address against the global pointer to the 
distinguished instance, we'll just look for a flag /inside/ the object 
itself, which is set /only/ on the distinguished instance.

Now why didn't I think of that? That looks like such a good idea, we 
should apply it in other CONTEXTs!
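
i.e. the generic pattern (per the hunks above):

	struct i915_address_space {
		...
		bool is_ggtt;	/* set only on the one global GTT */
	};

	static inline bool i915_is_ggtt(struct i915_address_space *vm)
	{
		return vm->is_ggtt;	/* no pointer chase back to dev_priv */
	}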

Reviewed-by: Dave Gordon <david.s.gordon@intel.com>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: [PATCH 079/190] drm/i915: Reduce the pointer dance of i915_is_ggtt()
  2016-01-15 12:12   ` Dave Gordon
@ 2016-01-15 12:24     ` Chris Wilson
  0 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-01-15 12:24 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx

On Fri, Jan 15, 2016 at 12:12:15PM +0000, Dave Gordon wrote:
> On 11/01/16 09:17, Chris Wilson wrote:
> >The multiple levels of indirection do nothing but hinder the compiler and
> >the pointer chasing turns out to be quite painful, but is painless to fix.
> >
> >Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >---
> >  drivers/gpu/drm/i915/i915_debugfs.c        | 13 ++++++-------
> >  drivers/gpu/drm/i915/i915_drv.h            |  7 -------
> >  drivers/gpu/drm/i915/i915_gem.c            | 18 +++++++-----------
> >  drivers/gpu/drm/i915/i915_gem_execbuffer.c |  5 ++---
> >  drivers/gpu/drm/i915/i915_gem_gtt.c        | 12 +++++-------
> >  drivers/gpu/drm/i915/i915_gem_gtt.h        |  5 +++++
> >  drivers/gpu/drm/i915/i915_trace.h          | 27 ++++++++-------------------
> >  7 files changed, 33 insertions(+), 54 deletions(-)
> 
> [snip]
> 
> >diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> >index c9c1a5cdc1e5..f840cc55f1ab 100644
> >--- a/drivers/gpu/drm/i915/i915_drv.h
> >+++ b/drivers/gpu/drm/i915/i915_drv.h
> >@@ -2905,18 +2905,11 @@ bool i915_gem_obj_is_pinned(struct drm_i915_gem_object *obj);
> >  /* Some GGTT VM helpers */
> >  #define i915_obj_to_ggtt(obj) \
> >  	(&((struct drm_i915_private *)(obj)->base.dev->dev_private)->gtt.base)
> >-static inline bool i915_is_ggtt(struct i915_address_space *vm)
> >-{
> >-	struct i915_address_space *ggtt =
> >-		&((struct drm_i915_private *)(vm)->dev->dev_private)->gtt.base;
> >-	return vm == ggtt;
> >-}
> 
> >diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> >index b5c3bbe6dc2a..06117bd0fc00 100644
> >--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> >+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> >@@ -3150,6 +3150,7 @@ int i915_gem_gtt_init(struct drm_device *dev)
> >  	}
> >
> >  	gtt->base.dev = dev;
> >+	gtt->base.is_ggtt = true;
> 
> So, it looks like the plan here is that when we need to determine
> whether something is the special distinguished instance of a type,
> then instead of comparing its address against the global pointer to
> the distinguished instance, we'll just look for a flag /inside/ the
> object itself, which is set /only/ on the distinguished instance.
> 
> Now why didn't I think of that? That looks like such a good idea, we
> should apply it in other CONTEXTs!

But we already have that flag in contexts! It also happens to be useful
for other tracking. And we demonstrated that we didn't even need
the checks for the kernel context anyway.

You will also note this is a small stepping patch after which we
transition away from i915_address_space.is_ggtt to using the owner.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: [PATCH 017/190] drm/i915: Remove forcewake dance from seqno/irq barrier on legacy gen6+
  2016-01-11 14:02   ` Dave Gordon
@ 2016-01-21 16:27     ` Mika Kuoppala
  0 siblings, 0 replies; 263+ messages in thread
From: Mika Kuoppala @ 2016-01-21 16:27 UTC (permalink / raw)
  To: Dave Gordon, Chris Wilson, intel-gfx; +Cc: Daniel Vetter

Dave Gordon <david.s.gordon@intel.com> writes:

> On 11/01/16 09:16, Chris Wilson wrote:
>> In order to ensure seqno/irq coherency, we currently read a ring register.
>> We are not quite sure how it works, only that it does. Experiments show
>> that e.g. doing a clflush(seqno) instead is not sufficient, but we can
>> remove the forcewake dance from the mmio access.
>>
>> v2: Baytrail wants a clflush too.
>>
>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>> ---
>>   drivers/gpu/drm/i915/intel_ringbuffer.c | 15 +++++++++++++--
>>   1 file changed, 13 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> index 99780b674311..a1d43b2c7077 100644
>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> @@ -1490,10 +1490,21 @@ gen6_ring_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency)
>>   {
>>   	/* Workaround to force correct ordering between irq and seqno writes on
>>   	 * ivb (and maybe also on snb) by reading from a CS register (like
>> -	 * ACTHD) before reading the status page. */
>> +	 * ACTHD) before reading the status page.
>> +	 *
>> +	 * Note that this effectively stalls the read by the time
>> +	 * it takes to do a memory transaction, which more or less ensures
>> +	 * that the write from the GPU has sufficient time to invalidate
>> +	 * the CPU cacheline. Alternatively we could delay the interrupt from
>> +	 * the CS ring to give the write time to land, but that would incur
>> +	 * a delay after every batch i.e. much more frequent than a delay
>> +	 * when waiting for the interrupt (with the same net latency).
>> +	 */
>>   	if (!lazy_coherency) {
>>   		struct drm_i915_private *dev_priv = ring->dev->dev_private;
>> -		POSTING_READ(RING_ACTHD(ring->mmio_base));
>> +		POSTING_READ_FW(RING_ACTHD(ring->mmio_base));
>> +
>> +		intel_flush_status_page(ring, I915_GEM_HWS_INDEX);
>>   	}
>>
>>   	return intel_read_status_page(ring, I915_GEM_HWS_INDEX);
>
> Well, I generally like this, but my previous questions of 2015-01-05 
> were not answered:
>
>> Hmm ... would putting the flush /before/ the POSTING_READ be better?
>>
>> Depending on how the h/w implements the cacheline invalidation, it
>> might allow some overlap between the cache controller's internal
>> activities and the MMIO cycle ...

I thought of the sequence of events like this:

#1: read(acthd) -> gpu flushes the write (if cpu snoop fails,
we still have the correct one in dram)

#2: flush_status_page() -> is actually going to be an invalidate for the
cpu, as it is only written from the cpu side on init (explained in
bxt_a_get_seqno)

#3: reading the status page will then get the coherent value from dram.
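
A rough sketch of that three-step sequence (the struct, helper names and
the register offset are assumptions of mine; the real code is
gen6_ring_get_seqno quoted above, using intel_flush_status_page):

struct engine_sketch {
	void __iomem *mmio_base;	/* engine mmio window */
	volatile u32 *status_page;	/* CPU mapping of the HWS */
};

static u32 coherent_seqno_read(struct engine_sketch *e)
{
	/* #1: the mmio read stalls until the CS has flushed the write */
	(void)readl(e->mmio_base + 0x74 /* ACTHD */);

	/* #2: invalidate the stale CPU cacheline holding the seqno */
	clflush(&e->status_page[I915_GEM_HWS_INDEX]);
	mb();

	/* #3: this read now observes the coherent value from dram */
	return e->status_page[I915_GEM_HWS_INDEX];
}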

>>
>> Also, previously we only had the flush on BXT, whereas now you're
>> doing it on all gen6+. I think this is probably a good thing, but just
>> wondered whether there's any downside to it?

A perf hit. But when seqno coherence is essential, we need to be
absolutely sure, due to the way we handle irqs. The only place where we
force coherence is __i915_wait_request, and there only before the final
decision to go to sleep. (Well, hangcheck also does the coherent read,
but that has no relevance to perf.)

Our irq handling is built on the principle that if we enable interrupts
while there are waiters, absolutely nothing can get lost. There is no
leeway after irq_get().

If we are going to put the task to sleep, we can afford a cacheline
flush.

So guarantee a coherent read when asked to be coherent, and relax the
rules on a per-gen basis if someone can show passing tests with improved
perf metrics.
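
The waiter pattern this protects, reduced to a sketch (helper names are
assumptions; the real loop lives in __i915_wait_request):

DEFINE_WAIT(wait);

for (;;) {
	prepare_to_wait(&engine->irq_queue, &wait, TASK_INTERRUPTIBLE);

	/* the one forced-coherent read: just before sleeping, so a
	 * seqno that has already landed can never be missed */
	if (i915_seqno_passed(coherent_seqno_read(engine), target))
		break;

	io_schedule();	/* the enabled irq wakes us on seqno advance */
}
finish_wait(&engine->irq_queue, &wait);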

>>
>> Also ... are we sure that no-one calls this without having a
>> forcewake in effect at the time, in particular debugfs? Or is it not
>> going to end up going through here once lazy_coherency is abolished?
>

I asked Chris about this on irc. The actual forcewake status doesn't
matter. If it was not on, we just get zero back, but the side effect
still happens: the seqno is written.

Thanks,
-Mika

> .Dave.
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: [PATCH 075/190] drm/i915: Refactor activity tracking for requests
  2016-01-11  9:17 ` [PATCH 075/190] drm/i915: Refactor activity tracking for requests Chris Wilson
@ 2016-01-28 11:41   ` Tvrtko Ursulin
  2016-01-28 11:46     ` Chris Wilson
  0 siblings, 1 reply; 263+ messages in thread
From: Tvrtko Ursulin @ 2016-01-28 11:41 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


Hi,

On 11/01/16 09:17, Chris Wilson wrote:
> With the introduction of requests, we amplified the number of atomic
> refcounted objects we use and update every execbuffer; from none to
> several references, and a set of references that need to be changed. We
> also introduced interesting side-effects in the order of retiring
> requests and objects.
>
> Instead of independently tracking the last request for an object, track
> the active objects for each request. The object will reside in the
> buffer list of its most recent active request and so we reduce the kref
> interchange to a list_move. Now retirements are entirely driven by the
> request, dramatically simplifying activity tracking on the objects
> themselves, and removing the ambiguity between retiring objects and
> retiring requests.
>
> All told, less code, simpler and faster, and more extensible.
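
The two halves of the new mechanism, lifted from the diff below into a
minimal sketch (surrounding types simplified):

/* mark active: a list_move onto the request, no kref traffic */
static inline void
i915_gem_request_mark_active(struct drm_i915_gem_request *request,
			     struct i915_gem_active *active)
{
	list_move(&active->link, &request->active_list);
	active->request = request;
}

/* retire: the request walks its own list and calls each retire() hook;
 * decouple the node first, as retire() may free it */
list_for_each_entry_safe(active, next, &req->active_list, link) {
	INIT_LIST_HEAD(&active->link);
	active->request = NULL;
	active->retire(active, req);
}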

I looked at this in detail before the holidays and unfortunately a lot
of it has evaporated from my head since. I remember I thought the idea
was good and that it really simplifies things.

But it is also difficult to apply just this subset of patches to look at
the resulting code in more detail.

So would it be possible to extract and rebase the relevant patches? I
think that would be 73 to 76. (Together with the renaming we already
agreed on. And those trivial renames of list/link already have r-b's.)

That would be step one, rework of active tracking.

Then the next step could be adding VMA tracking, patches 81 to 87 I think.

Regards,

Tvrtko


>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/Makefile           |   1 -
>   drivers/gpu/drm/i915/i915_drv.h         |  10 --
>   drivers/gpu/drm/i915/i915_gem.c         | 160 ++++++++------------------------
>   drivers/gpu/drm/i915/i915_gem_debug.c   |  70 --------------
>   drivers/gpu/drm/i915/i915_gem_fence.c   |  10 +-
>   drivers/gpu/drm/i915/i915_gem_request.c |  44 +++++++--
>   drivers/gpu/drm/i915/i915_gem_request.h |  16 +++-
>   drivers/gpu/drm/i915/intel_lrc.c        |   1 -
>   drivers/gpu/drm/i915/intel_ringbuffer.c |   1 -
>   drivers/gpu/drm/i915/intel_ringbuffer.h |  12 ---
>   10 files changed, 89 insertions(+), 236 deletions(-)
>   delete mode 100644 drivers/gpu/drm/i915/i915_gem_debug.c
>
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index b0a83215db80..79d657f29241 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -23,7 +23,6 @@ i915-$(CONFIG_DEBUG_FS) += i915_debugfs.o
>   i915-y += i915_cmd_parser.o \
>   	  i915_gem_batch_pool.o \
>   	  i915_gem_context.o \
> -	  i915_gem_debug.o \
>   	  i915_gem_dmabuf.o \
>   	  i915_gem_evict.o \
>   	  i915_gem_execbuffer.o \
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index c577f86d94f8..c9c1a5cdc1e5 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -435,8 +435,6 @@ void intel_link_compute_m_n(int bpp, int nlanes,
>   #define DRIVER_MINOR		6
>   #define DRIVER_PATCHLEVEL	0
>
> -#define WATCH_LISTS	0
> -
>   struct opregion_header;
>   struct opregion_acpi;
>   struct opregion_swsci;
> @@ -2024,7 +2022,6 @@ struct drm_i915_gem_object {
>   	struct drm_mm_node *stolen;
>   	struct list_head global_list;
>
> -	struct list_head ring_list[I915_NUM_RINGS];
>   	/** Used in execbuf to temporarily hold a ref */
>   	struct list_head obj_exec_link;
>
> @@ -3068,13 +3065,6 @@ static inline bool i915_gem_object_needs_bit17_swizzle(struct drm_i915_gem_objec
>   		obj->tiling_mode != I915_TILING_NONE;
>   }
>
> -/* i915_gem_debug.c */
> -#if WATCH_LISTS
> -int i915_verify_lists(struct drm_device *dev);
> -#else
> -#define i915_verify_lists(dev) 0
> -#endif
> -
>   /* i915_debugfs.c */
>   int i915_debugfs_init(struct drm_minor *minor);
>   void i915_debugfs_cleanup(struct drm_minor *minor);
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index f314b3ea2726..4eef13ebdaf3 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -40,10 +40,6 @@
>
>   static void i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj);
>   static void i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj);
> -static void
> -i915_gem_object_retire__write(struct drm_i915_gem_object *obj);
> -static void
> -i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring);
>
>   static bool cpu_cache_is_coherent(struct drm_device *dev,
>   				  enum i915_cache_level level)
> @@ -117,7 +113,6 @@ int i915_mutex_lock_interruptible(struct drm_device *dev)
>   	if (ret)
>   		return ret;
>
> -	WARN_ON(i915_verify_lists(dev));
>   	return 0;
>   }
>
> @@ -1117,27 +1112,14 @@ i915_gem_object_wait_rendering(struct drm_i915_gem_object *obj,
>   		return 0;
>
>   	if (readonly) {
> -		if (obj->last_write.request != NULL) {
> -			ret = i915_wait_request(obj->last_write.request);
> -			if (ret)
> -				return ret;
> -
> -			i = obj->last_write.request->engine->id;
> -			if (obj->last_read[i].request == obj->last_write.request)
> -				i915_gem_object_retire__read(obj, i);
> -			else
> -				i915_gem_object_retire__write(obj);
> -		}
> +		ret = i915_wait_request(obj->last_write.request);
> +		if (ret)
> +			return ret;
>   	} else {
>   		for (i = 0; i < I915_NUM_RINGS; i++) {
> -			if (obj->last_read[i].request == NULL)
> -				continue;
> -
>   			ret = i915_wait_request(obj->last_read[i].request);
>   			if (ret)
>   				return ret;
> -
> -			i915_gem_object_retire__read(obj, i);
>   		}
>   		GEM_BUG_ON(obj->active);
>   	}
> @@ -1145,20 +1127,6 @@ i915_gem_object_wait_rendering(struct drm_i915_gem_object *obj,
>   	return 0;
>   }
>
> -static void
> -i915_gem_object_retire_request(struct drm_i915_gem_object *obj,
> -			       struct drm_i915_gem_request *req)
> -{
> -	int ring = req->engine->id;
> -
> -	if (obj->last_read[ring].request == req)
> -		i915_gem_object_retire__read(obj, ring);
> -	else if (obj->last_write.request == req)
> -		i915_gem_object_retire__write(obj);
> -
> -	i915_gem_request_retire_upto(req);
> -}
> -
>   /* A nonblocking variant of the above wait. This is a highly dangerous routine
>    * as the object state may change during this call.
>    */
> @@ -1206,7 +1174,7 @@ i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
>
>   	for (i = 0; i < n; i++) {
>   		if (ret == 0)
> -			i915_gem_object_retire_request(obj, requests[i]);
> +			i915_gem_request_retire_upto(requests[i]);
>   		i915_gem_request_put(requests[i]);
>   	}
>
> @@ -2069,35 +2037,37 @@ void i915_vma_move_to_active(struct i915_vma *vma,
>   		drm_gem_object_reference(&obj->base);
>   	obj->active |= intel_engine_flag(engine);
>
> -	list_move_tail(&obj->ring_list[engine->id], &engine->active_list);
>   	i915_gem_request_mark_active(req, &obj->last_read[engine->id]);
> -
>   	list_move_tail(&vma->mm_list, &vma->vm->active_list);
>   }
>
>   static void
> -i915_gem_object_retire__write(struct drm_i915_gem_object *obj)
> +i915_gem_object_retire__fence(struct i915_gem_active *active,
> +			      struct drm_i915_gem_request *req)
>   {
> -	GEM_BUG_ON(obj->last_write.request == NULL);
> -	GEM_BUG_ON(!(obj->active & intel_engine_flag(obj->last_write.request->engine)));
> +}
>
> -	i915_gem_request_assign(&obj->last_write.request, NULL);
> -	intel_fb_obj_flush(obj, true, ORIGIN_CS);
> +static void
> +i915_gem_object_retire__write(struct i915_gem_active *active,
> +			      struct drm_i915_gem_request *request)
> +{
> +	intel_fb_obj_flush(container_of(active,
> +					struct drm_i915_gem_object,
> +					last_write),
> +			   true,
> +			   ORIGIN_CS);
>   }
>
>   static void
> -i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
> +i915_gem_object_retire__read(struct i915_gem_active *active,
> +			     struct drm_i915_gem_request *request)
>   {
> +	int ring = request->engine->id;
> +	struct drm_i915_gem_object *obj =
> +		container_of(active, struct drm_i915_gem_object, last_read[ring]);
>   	struct i915_vma *vma;
>
> -	GEM_BUG_ON(obj->last_read[ring].request == NULL);
> -	GEM_BUG_ON(!(obj->active & (1 << ring)));
> -
> -	list_del_init(&obj->ring_list[ring]);
> -	i915_gem_request_assign(&obj->last_read[ring].request, NULL);
> -
> -	if (obj->last_write.request && obj->last_write.request->engine->id == ring)
> -		i915_gem_object_retire__write(obj);
> +	GEM_BUG_ON((obj->active & (1 << ring)) == 0);
>
>   	obj->active &= ~(1 << ring);
>   	if (obj->active)
> @@ -2107,15 +2077,13 @@ i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
>   	 * so that we don't steal from recently used but inactive objects
>   	 * (unless we are forced to ofc!)
>   	 */
> -	list_move_tail(&obj->global_list,
> -		       &to_i915(obj->base.dev)->mm.bound_list);
> +	list_move_tail(&obj->global_list, &request->i915->mm.bound_list);
>
>   	list_for_each_entry(vma, &obj->vma_list, vma_link) {
>   		if (!list_empty(&vma->mm_list))
>   			list_move_tail(&vma->mm_list, &vma->vm->inactive_list);
>   	}
>
> -	i915_gem_request_assign(&obj->last_fence.request, NULL);
>   	drm_gem_object_unreference(&obj->base);
>   }
>
> @@ -2216,16 +2184,6 @@ static void i915_gem_reset_ring_cleanup(struct intel_engine_cs *engine)
>   {
>   	struct intel_ring *ring;
>
> -	while (!list_empty(&engine->active_list)) {
> -		struct drm_i915_gem_object *obj;
> -
> -		obj = list_first_entry(&engine->active_list,
> -				       struct drm_i915_gem_object,
> -				       ring_list[engine->id]);
> -
> -		i915_gem_object_retire__read(obj, engine->id);
> -	}
> -
>   	/*
>   	 * Clear the execlists queue up before freeing the requests, as those
>   	 * are the ones that keep the context and ringbuffer backing objects
> @@ -2295,8 +2253,6 @@ void i915_gem_reset(struct drm_device *dev)
>   	i915_gem_context_reset(dev);
>
>   	i915_gem_restore_fences(dev);
> -
> -	WARN_ON(i915_verify_lists(dev));
>   }
>
>   /**
> @@ -2305,13 +2261,6 @@ void i915_gem_reset(struct drm_device *dev)
>   void
>   i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
>   {
> -	WARN_ON(i915_verify_lists(ring->dev));
> -
> -	/* Retire requests first as we use it above for the early return.
> -	 * If we retire requests last, we may use a later seqno and so clear
> -	 * the requests lists without clearing the active list, leading to
> -	 * confusion.
> -	 */
>   	while (!list_empty(&ring->request_list)) {
>   		struct drm_i915_gem_request *request;
>
> @@ -2324,25 +2273,6 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
>
>   		i915_gem_request_retire_upto(request);
>   	}
> -
> -	/* Move any buffers on the active list that are no longer referenced
> -	 * by the ringbuffer to the flushing/inactive lists as appropriate,
> -	 * before we free the context associated with the requests.
> -	 */
> -	while (!list_empty(&ring->active_list)) {
> -		struct drm_i915_gem_object *obj;
> -
> -		obj = list_first_entry(&ring->active_list,
> -				      struct drm_i915_gem_object,
> -				      ring_list[ring->id]);
> -
> -		if (!list_empty(&obj->last_read[ring->id].request->link))
> -			break;
> -
> -		i915_gem_object_retire__read(obj, ring->id);
> -	}
> -
> -	WARN_ON(i915_verify_lists(ring->dev));
>   }
>
>   void
> @@ -2434,13 +2364,13 @@ out:
>    * write domains, emitting any outstanding lazy request and retiring and
>    * completed requests.
>    */
> -static int
> +static void
>   i915_gem_object_flush_active(struct drm_i915_gem_object *obj)
>   {
>   	int i;
>
>   	if (!obj->active)
> -		return 0;
> +		return;
>
>   	for (i = 0; i < I915_NUM_RINGS; i++) {
>   		struct drm_i915_gem_request *req;
> @@ -2449,17 +2379,9 @@ i915_gem_object_flush_active(struct drm_i915_gem_object *obj)
>   		if (req == NULL)
>   			continue;
>
> -		if (list_empty(&req->link))
> -			goto retire;
> -
> -		if (i915_gem_request_completed(req)) {
> +		if (i915_gem_request_completed(req))
>   			i915_gem_request_retire_upto(req);
> -retire:
> -			i915_gem_object_retire__read(obj, i);
> -		}
>   	}
> -
> -	return 0;
>   }
>
>   /**
> @@ -2507,10 +2429,7 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>   	}
>
>   	/* Need to make sure the object gets inactive eventually. */
> -	ret = i915_gem_object_flush_active(obj);
> -	if (ret)
> -		goto out;
> -
> +	i915_gem_object_flush_active(obj);
>   	if (!obj->active)
>   		goto out;
>
> @@ -2522,8 +2441,6 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>   		goto out;
>   	}
>
> -	drm_gem_object_unreference(&obj->base);
> -
>   	for (i = 0; i < I915_NUM_RINGS; i++) {
>   		if (obj->last_read[i].request == NULL)
>   			continue;
> @@ -2531,6 +2448,8 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>   		req[n++] = i915_gem_request_get(obj->last_read[i].request);
>   	}
>
> +out:
> +	drm_gem_object_unreference(&obj->base);
>   	mutex_unlock(&dev->struct_mutex);
>
>   	for (i = 0; i < n; i++) {
> @@ -2541,11 +2460,6 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>   		i915_gem_request_put(req[i]);
>   	}
>   	return ret;
> -
> -out:
> -	drm_gem_object_unreference(&obj->base);
> -	mutex_unlock(&dev->struct_mutex);
> -	return ret;
>   }
>
>   static int
> @@ -2569,7 +2483,7 @@ __i915_gem_object_sync(struct drm_i915_gem_object *obj,
>   		if (ret)
>   			return ret;
>
> -		i915_gem_object_retire_request(obj, from);
> +		i915_gem_request_retire_upto(from);
>   	} else {
>   		int idx = intel_engine_sync_index(from->engine, to->engine);
>   		if (from->fence.seqno <= from->engine->semaphore.sync_seqno[idx])
> @@ -2760,7 +2674,6 @@ int i915_gpu_idle(struct drm_device *dev)
>   			return ret;
>   	}
>
> -	WARN_ON(i915_verify_lists(dev));
>   	return 0;
>   }
>
> @@ -3689,16 +3602,13 @@ i915_gem_busy_ioctl(struct drm_device *dev, void *data,
>   	 * become non-busy without any further actions, therefore emit any
>   	 * necessary flushes here.
>   	 */
> -	ret = i915_gem_object_flush_active(obj);
> -	if (ret)
> -		goto unref;
> +	i915_gem_object_flush_active(obj);
>
>   	BUILD_BUG_ON(I915_NUM_RINGS > 16);
>   	args->busy = obj->active << 16;
>   	if (obj->last_write.request)
>   		args->busy |= obj->last_write.request->engine->id;
>
> -unref:
>   	drm_gem_object_unreference(&obj->base);
>   unlock:
>   	mutex_unlock(&dev->struct_mutex);
> @@ -3776,7 +3686,12 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj,
>
>   	INIT_LIST_HEAD(&obj->global_list);
>   	for (i = 0; i < I915_NUM_RINGS; i++)
> -		INIT_LIST_HEAD(&obj->ring_list[i]);
> +		init_request_active(&obj->last_read[i],
> +				    i915_gem_object_retire__read);
> +	init_request_active(&obj->last_write,
> +			    i915_gem_object_retire__write);
> +	init_request_active(&obj->last_fence,
> +			    i915_gem_object_retire__fence);
>   	INIT_LIST_HEAD(&obj->obj_exec_link);
>   	INIT_LIST_HEAD(&obj->vma_list);
>   	INIT_LIST_HEAD(&obj->batch_pool_link);
> @@ -4372,7 +4287,6 @@ i915_gem_cleanup_ringbuffer(struct drm_device *dev)
>   static void
>   init_ring_lists(struct intel_engine_cs *ring)
>   {
> -	INIT_LIST_HEAD(&ring->active_list);
>   	INIT_LIST_HEAD(&ring->request_list);
>   }
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_debug.c b/drivers/gpu/drm/i915/i915_gem_debug.c
> deleted file mode 100644
> index 17299d04189f..000000000000
> --- a/drivers/gpu/drm/i915/i915_gem_debug.c
> +++ /dev/null
> @@ -1,70 +0,0 @@
> -/*
> - * Copyright © 2008 Intel Corporation
> - *
> - * Permission is hereby granted, free of charge, to any person obtaining a
> - * copy of this software and associated documentation files (the "Software"),
> - * to deal in the Software without restriction, including without limitation
> - * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> - * and/or sell copies of the Software, and to permit persons to whom the
> - * Software is furnished to do so, subject to the following conditions:
> - *
> - * The above copyright notice and this permission notice (including the next
> - * paragraph) shall be included in all copies or substantial portions of the
> - * Software.
> - *
> - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> - * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> - * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> - * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> - * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> - * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> - * IN THE SOFTWARE.
> - *
> - * Authors:
> - *    Keith Packard <keithp@keithp.com>
> - *
> - */
> -
> -#include <drm/drmP.h>
> -#include <drm/i915_drm.h>
> -#include "i915_drv.h"
> -
> -#if WATCH_LISTS
> -int
> -i915_verify_lists(struct drm_device *dev)
> -{
> -	static int warned;
> -	struct drm_i915_private *dev_priv = to_i915(dev);
> -	struct drm_i915_gem_object *obj;
> -	struct intel_engine_cs *ring;
> -	int err = 0;
> -	int i;
> -
> -	if (warned)
> -		return 0;
> -
> -	for_each_ring(ring, dev_priv, i) {
> -		list_for_each_entry(obj, &ring->active_list, ring_list[ring->id]) {
> -			if (obj->base.dev != dev ||
> -			    !atomic_read(&obj->base.refcount.refcount)) {
> -				DRM_ERROR("%s: freed active obj %p\n",
> -					  ring->name, obj);
> -				err++;
> -				break;
> -			} else if (!obj->active ||
> -				   obj->last_read_req[ring->id] == NULL) {
> -				DRM_ERROR("%s: invalid active obj %p\n",
> -					  ring->name, obj);
> -				err++;
> -			} else if (obj->base.write_domain) {
> -				DRM_ERROR("%s: invalid write obj %p (w %x)\n",
> -					  ring->name,
> -					  obj, obj->base.write_domain);
> -				err++;
> -			}
> -		}
> -	}
> -
> -	return warned = err;
> -}
> -#endif /* WATCH_LIST */
> diff --git a/drivers/gpu/drm/i915/i915_gem_fence.c b/drivers/gpu/drm/i915/i915_gem_fence.c
> index ab29c237ffa9..ff085efcf0e5 100644
> --- a/drivers/gpu/drm/i915/i915_gem_fence.c
> +++ b/drivers/gpu/drm/i915/i915_gem_fence.c
> @@ -261,15 +261,7 @@ static inline void i915_gem_object_fence_lost(struct drm_i915_gem_object *obj)
>   static int
>   i915_gem_object_wait_fence(struct drm_i915_gem_object *obj)
>   {
> -	if (obj->last_fence.request) {
> -		int ret = i915_wait_request(obj->last_fence.request);
> -		if (ret)
> -			return ret;
> -
> -		i915_gem_request_assign(&obj->last_fence.request, NULL);
> -	}
> -
> -	return 0;
> +	return i915_wait_request(obj->last_fence.request);
>   }
>
>   /**
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> index 7f38d8972721..069c0b9dfd95 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.c
> +++ b/drivers/gpu/drm/i915/i915_gem_request.c
> @@ -228,6 +228,7 @@ i915_gem_request_alloc(struct intel_engine_cs *engine,
>   		   engine->fence_context,
>   		   seqno);
>
> +	INIT_LIST_HEAD(&req->active_list);
>   	req->i915 = dev_priv;
>   	req->engine = engine;
>   	req->reset_counter = reset_counter;
> @@ -320,6 +321,27 @@ static void __i915_gem_request_release(struct drm_i915_gem_request *request)
>   	i915_gem_request_put(request);
>   }
>
> +static void __i915_gem_request_retire_active(struct drm_i915_gem_request *req)
> +{
> +	struct i915_gem_active *active, *next;
> +
> +	/* Walk through the active list, calling retire on each. This allows
> +	 * objects to track their GPU activity and mark themselves as idle
> +	 * when their *last* active request is completed (updating state
> +	 * tracking lists for eviction, active references for GEM, etc).
> +	 *
> +	 * As the ->retire() may free the node, we decouple it first and
> +	 * pass along the auxiliary information (to avoid dereferencing
> +	 * the node after the callback).
> +	 */
> +	list_for_each_entry_safe(active, next, &req->active_list, link) {
> +		INIT_LIST_HEAD(&active->link);
> +		active->request = NULL;
> +
> +		active->retire(active, req);
> +	}
> +}
> +
>   void i915_gem_request_cancel(struct drm_i915_gem_request *req)
>   {
>   	intel_ring_reserved_space_cancel(req->ring);
> @@ -327,6 +349,14 @@ void i915_gem_request_cancel(struct drm_i915_gem_request *req)
>   		if (req->ctx != req->engine->default_context)
>   			intel_lr_context_unpin(req);
>   	}
> +
> +	/* If a request is to be discarded after actions have been queued upon
> +	 * it, we cannot unwind that request and it must be submitted rather
> +	 * than cancelled. This is not limited to activity tracking, but all
> +	 * other state tracking (such as current register settings etc).
> +	 */
> +	GEM_BUG_ON(!list_empty(&req->active_list));
> +
>   	__i915_gem_request_release(req);
>   }
>
> @@ -344,6 +374,8 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
>   	 * completion order.
>   	 */
>   	request->ring->last_retired_head = request->postfix;
> +
> +	__i915_gem_request_retire_active(request);
>   	__i915_gem_request_release(request);
>   }
>
> @@ -354,7 +386,6 @@ i915_gem_request_retire_upto(struct drm_i915_gem_request *req)
>   	struct drm_i915_gem_request *tmp;
>
>   	lockdep_assert_held(&engine->dev->struct_mutex);
> -
>   	if (list_empty(&req->link))
>   		return;
>
> @@ -364,8 +395,6 @@ i915_gem_request_retire_upto(struct drm_i915_gem_request *req)
>
>   		i915_gem_request_retire(tmp);
>   	} while (tmp != req);
> -
> -	WARN_ON(i915_verify_lists(engine->dev));
>   }
>
>   static void i915_gem_mark_busy(struct drm_i915_private *dev_priv)
> @@ -565,9 +594,6 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>
>   	might_sleep();
>
> -	if (list_empty(&req->link))
> -		return 0;
> -
>   	if (i915_gem_request_completed(req))
>   		return 0;
>
> @@ -700,10 +726,12 @@ i915_wait_request(struct drm_i915_gem_request *req)
>   {
>   	int ret;
>
> -	BUG_ON(req == NULL);
> +	if (req == NULL)
> +		return 0;
>
> -	BUG_ON(!mutex_is_locked(&req->i915->dev->struct_mutex));
> +	GEM_BUG_ON(list_empty(&req->link));
>
> +	lockdep_assert_held(&req->i915->dev->struct_mutex);
>   	ret = __i915_wait_request(req, req->i915->mm.interruptible, NULL, NULL);
>   	if (ret)
>   		return ret;
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
> index 01d589be95fd..59957d5edfdb 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.h
> +++ b/drivers/gpu/drm/i915/i915_gem_request.h
> @@ -84,6 +84,7 @@ struct drm_i915_gem_request {
>   	/** Batch buffer related to this request if any (used for
>   	    error state dump only) */
>   	struct drm_i915_gem_object *batch_obj;
> +	struct list_head active_list;
>
>   	/** Time at which this request was emitted, in jiffies. */
>   	unsigned long emitted_jiffies;
> @@ -237,13 +238,26 @@ static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
>    */
>   struct i915_gem_active {
>   	struct drm_i915_gem_request *request;
> +	struct list_head link;
> +	void (*retire)(struct i915_gem_active *,
> +		       struct drm_i915_gem_request *);
>   };
>
>   static inline void
> +init_request_active(struct i915_gem_active *active,
> +		    void (*func)(struct i915_gem_active *,
> +				 struct drm_i915_gem_request *))
> +{
> +	INIT_LIST_HEAD(&active->link);
> +	active->retire = func;
> +}
> +
> +static inline void
>   i915_gem_request_mark_active(struct drm_i915_gem_request *request,
>   			     struct i915_gem_active *active)
>   {
> -	i915_gem_request_assign(&active->request, request);
> +	list_move(&active->link, &request->active_list);
> +	active->request = request;
>   }
>
>   #endif /* I915_GEM_REQUEST_H */
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 0f0bf97e4032..b5f62b5f4913 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1558,7 +1558,6 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
>   	ring->dev = dev;
>   	ring->i915 = to_i915(dev);
>   	ring->fence_context = fence_context_alloc(1);
> -	INIT_LIST_HEAD(&ring->active_list);
>   	INIT_LIST_HEAD(&ring->request_list);
>   	i915_gem_batch_pool_init(dev, &ring->batch_pool);
>   	intel_engine_init_breadcrumbs(ring);
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 213540f92c9d..7ca4e1fc854d 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -2025,7 +2025,6 @@ static int intel_init_engine(struct drm_device *dev,
>   	engine->dev = dev;
>   	engine->i915 = to_i915(dev);
>   	engine->fence_context = fence_context_alloc(1);
> -	INIT_LIST_HEAD(&engine->active_list);
>   	INIT_LIST_HEAD(&engine->request_list);
>   	INIT_LIST_HEAD(&engine->execlist_queue);
>   	INIT_LIST_HEAD(&engine->buffers);
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index fc9c1e453be1..bb92d831a100 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -298,18 +298,6 @@ struct intel_engine_cs {
>   	u32             irq_keep_mask; /* bitmask for interrupts that should not be masked */
>
>   	/**
> -	 * List of objects currently involved in rendering from the
> -	 * ringbuffer.
> -	 *
> -	 * Includes buffers having the contents of their GPU caches
> -	 * flushed, not necessarily primitives.  last_read_req
> -	 * represents when the rendering involved will be completed.
> -	 *
> -	 * A reference is held on the buffer while on this list.
> -	 */
> -	struct list_head active_list;
> -
> -	/**
>   	 * List of breadcrumbs associated with GPU requests currently
>   	 * outstanding.
>   	 */
>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: [PATCH 058/190] drm/i915: Rename request->ring to request->engine
  2016-01-11  9:17 ` [PATCH 058/190] drm/i915: Rename request->ring to request->engine Chris Wilson
@ 2016-01-28 11:45   ` Tvrtko Ursulin
  0 siblings, 0 replies; 263+ messages in thread
From: Tvrtko Ursulin @ 2016-01-28 11:45 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 11/01/16 09:17, Chris Wilson wrote:
> In order to disambiguate between the pointer to the intel_engine_cs
> (called ring) and the intel_ringbuffer (called ringbuf), rename
> s/ring/engine/.

How about just extracting this and doing it straight away?

I sneaked in some engines already, so it would be good to do it before
any other big work.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: [PATCH 075/190] drm/i915: Refactor activity tracking for requests
  2016-01-28 11:41   ` Tvrtko Ursulin
@ 2016-01-28 11:46     ` Chris Wilson
  2016-01-28 11:56       ` Tvrtko Ursulin
  0 siblings, 1 reply; 263+ messages in thread
From: Chris Wilson @ 2016-01-28 11:46 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Thu, Jan 28, 2016 at 11:41:37AM +0000, Tvrtko Ursulin wrote:
> 
> Hi,
> 
> On 11/01/16 09:17, Chris Wilson wrote:
> >With the introduction of requests, we amplified the number of atomic
> >refcounted objects we use and update every execbuffer; from none to
> >several references, and a set of references that need to be changed. We
> >also introduced interesting side-effects in the order of retiring
> >requests and objects.
> >
> >Instead of independently tracking the last request for an object, track
> >the active objects for each request. The object will reside in the
> >buffer list of its most recent active request and so we reduce the kref
> >interchange to a list_move. Now retirements are entirely driven by the
> >request, dramatically simplifying activity tracking on the objects
> >themselves, and removing the ambiguity between retiring objects and
> >retiring requests.
> >
> >All told, less code, simpler and faster, and more extensible.
> 
> I looked at this in detail before the holidays and unfortunately a
> lot of it has evaporated from my head since. I remember I thought the
> idea was good and that it really simplifies things.
> 
> But it is also difficult to apply just this subset of patches to look
> at the resulting code in more detail.
> 
> So would it be possible to extract and rebase the relevant patches? I
> think that would be 73 to 76. (Together with the renaming we already
> agreed on. And those trivial renames of list/link already have
> r-b's.)

Actually no, if you read some of the earlier patches you will see the
required bug fixes.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: [PATCH 059/190] drm/i915: Rename request->ringbuf to request->ring
  2016-01-11  9:17 ` [PATCH 059/190] drm/i915: Rename request->ringbuf to request->ring Chris Wilson
@ 2016-01-28 11:48   ` Tvrtko Ursulin
  0 siblings, 0 replies; 263+ messages in thread
From: Tvrtko Ursulin @ 2016-01-28 11:48 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 11/01/16 09:17, Chris Wilson wrote:
> Now that we have disambiguated ring and engine, we can use the clearer
> and more consistent name for the intel_ringbuffer pointer in the
> request.

I am unsure about this one. There seem to be more ringbufs in the code
base, and the structure is called a ring buffer, so I would not be so
keen on renaming this.

Regards,

Tvrtko

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: [PATCH 060/190] drm/i915: Rename backpointer from intel_ringbuffer to intel_engine_cs
  2016-01-11  9:17 ` [PATCH 060/190] drm/i915: Rename backpointer from intel_ringbuffer to intel_engine_cs Chris Wilson
@ 2016-01-28 11:49   ` Tvrtko Ursulin
  0 siblings, 0 replies; 263+ messages in thread
From: Tvrtko Ursulin @ 2016-01-28 11:49 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 11/01/16 09:17, Chris Wilson wrote:
> Having ringbuf->ring point to an engine is confusing, so rename it once
> again to ring->engine.

Another one to extract, rebase and merge ahead of the big work.

Regards,

Tvrtko

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: [PATCH 063/190] drm/i915: Rename struct intel_ringbuffer to intel_ring
  2016-01-11  9:17 ` [PATCH 063/190] drm/i915: Rename struct intel_ringbuffer to intel_ring Chris Wilson
@ 2016-01-28 11:54   ` Tvrtko Ursulin
  0 siblings, 0 replies; 263+ messages in thread
From: Tvrtko Ursulin @ 2016-01-28 11:54 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


Oh right, I suppose patch 59 is OK then if it goes together with this one.

Regards,

Tvrtko

On 11/01/16 09:17, Chris Wilson wrote:
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_debugfs.c        |  21 +++---
>   drivers/gpu/drm/i915/i915_drv.h            |   2 +-
>   drivers/gpu/drm/i915/i915_gem.c            |  43 ++++++------
>   drivers/gpu/drm/i915/i915_gem_context.c    |   2 +-
>   drivers/gpu/drm/i915/i915_gem_execbuffer.c |   4 +-
>   drivers/gpu/drm/i915/i915_gem_gtt.c        |   6 +-
>   drivers/gpu/drm/i915/i915_gem_request.c    |   2 +-
>   drivers/gpu/drm/i915/i915_gem_request.h    |   2 +-
>   drivers/gpu/drm/i915/i915_gpu_error.c      |   2 +-
>   drivers/gpu/drm/i915/i915_guc_submission.c |   2 +-
>   drivers/gpu/drm/i915/intel_display.c       |  10 +--
>   drivers/gpu/drm/i915/intel_lrc.c           |  40 ++++++------
>   drivers/gpu/drm/i915/intel_mocs.c          |   4 +-
>   drivers/gpu/drm/i915/intel_ringbuffer.c    | 101 ++++++++++++++---------------
>   drivers/gpu/drm/i915/intel_ringbuffer.h    |  45 ++++++-------
>   15 files changed, 138 insertions(+), 148 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index dec10784c2bc..8de944ed3369 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -1948,12 +1948,11 @@ static int i915_gem_framebuffer_info(struct seq_file *m, void *data)
>   	return 0;
>   }
>
> -static void describe_ctx_ringbuf(struct seq_file *m,
> -				 struct intel_ringbuffer *ringbuf)
> +static void describe_ctx_ring(struct seq_file *m, struct intel_ring *ring)
>   {
>   	seq_printf(m, " (ringbuffer, space: %d, head: %u, tail: %u, last head: %d)",
> -		   ringbuf->space, ringbuf->head, ringbuf->tail,
> -		   ringbuf->last_retired_head);
> +		   ring->space, ring->head, ring->tail,
> +		   ring->last_retired_head);
>   }
>
>   static int i915_context_status(struct seq_file *m, void *unused)
> @@ -1985,16 +1984,12 @@ static int i915_context_status(struct seq_file *m, void *unused)
>   		if (i915.enable_execlists) {
>   			seq_putc(m, '\n');
>   			for_each_ring(ring, dev_priv, i) {
> -				struct drm_i915_gem_object *ctx_obj =
> -					ctx->engine[i].state;
> -				struct intel_ringbuffer *ringbuf =
> -					ctx->engine[i].ring;
> -
>   				seq_printf(m, "%s: ", ring->name);
> -				if (ctx_obj)
> -					describe_obj(m, ctx_obj);
> -				if (ringbuf)
> -					describe_ctx_ringbuf(m, ringbuf);
> +				if (ctx->engine[i].state)
> +					describe_obj(m, ctx->engine[i].state);
> +				if (ctx->engine[i].ring)
> +					describe_ctx_ring(m,
> +							  ctx->engine[i].ring);
>   				seq_putc(m, '\n');
>   			}
>   		} else {
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 466adc6617f0..44e8738c5310 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -885,7 +885,7 @@ struct intel_context {
>   	/* Execlists */
>   	struct {
>   		struct drm_i915_gem_object *state;
> -		struct intel_ringbuffer *ring;
> +		struct intel_ring *ring;
>   		int pin_count;
>   	} engine[I915_NUM_RINGS];
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index a81cad666d3a..1c6beb154d07 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2193,9 +2193,9 @@ i915_gem_find_active_request(struct intel_engine_cs *ring)
>   	return NULL;
>   }
>
> -static void i915_gem_reset_ring_status(struct drm_i915_private *dev_priv,
> -				       struct intel_engine_cs *ring)
> +static void i915_gem_reset_ring_status(struct intel_engine_cs *ring)
>   {
> +	struct drm_i915_private *dev_priv = ring->i915;
>   	struct drm_i915_gem_request *request;
>   	bool ring_hung;
>
> @@ -2212,19 +2212,18 @@ static void i915_gem_reset_ring_status(struct drm_i915_private *dev_priv,
>   		i915_set_reset_status(dev_priv, request->ctx, false);
>   }
>
> -static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
> -					struct intel_engine_cs *ring)
> +static void i915_gem_reset_ring_cleanup(struct intel_engine_cs *engine)
>   {
> -	struct intel_ringbuffer *buffer;
> +	struct intel_ring *ring;
>
> -	while (!list_empty(&ring->active_list)) {
> +	while (!list_empty(&engine->active_list)) {
>   		struct drm_i915_gem_object *obj;
>
> -		obj = list_first_entry(&ring->active_list,
> +		obj = list_first_entry(&engine->active_list,
>   				       struct drm_i915_gem_object,
> -				       ring_list[ring->id]);
> +				       ring_list[engine->id]);
>
> -		i915_gem_object_retire__read(obj, ring->id);
> +		i915_gem_object_retire__read(obj, engine->id);
>   	}
>
>   	/*
> @@ -2234,14 +2233,14 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
>   	 */
>
>   	if (i915.enable_execlists) {
> -		spin_lock_irq(&ring->execlist_lock);
> +		spin_lock_irq(&engine->execlist_lock);
>
>   		/* list_splice_tail_init checks for empty lists */
> -		list_splice_tail_init(&ring->execlist_queue,
> -				      &ring->execlist_retired_req_list);
> +		list_splice_tail_init(&engine->execlist_queue,
> +				      &engine->execlist_retired_req_list);
>
> -		spin_unlock_irq(&ring->execlist_lock);
> -		intel_execlists_retire_requests(ring);
> +		spin_unlock_irq(&engine->execlist_lock);
> +		intel_execlists_retire_requests(engine);
>   	}
>
>   	/*
> @@ -2251,10 +2250,10 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
>   	 * implicit references on things like e.g. ppgtt address spaces through
>   	 * the request.
>   	 */
> -	if (!list_empty(&ring->request_list)) {
> +	if (!list_empty(&engine->request_list)) {
>   		struct drm_i915_gem_request *request;
>
> -		request = list_last_entry(&ring->request_list,
> +		request = list_last_entry(&engine->request_list,
>   					  struct drm_i915_gem_request,
>   					  list);
>
> @@ -2268,12 +2267,12 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
>   	 * upon reset is less than when we start. Do one more pass over
>   	 * all the ringbuffers to reset last_retired_head.
>   	 */
> -	list_for_each_entry(buffer, &ring->buffers, link) {
> -		buffer->last_retired_head = buffer->tail;
> -		intel_ring_update_space(buffer);
> +	list_for_each_entry(ring, &engine->buffers, link) {
> +		ring->last_retired_head = ring->tail;
> +		intel_ring_update_space(ring);
>   	}
>
> -	intel_engine_init_seqno(ring, ring->last_submitted_seqno);
> +	intel_engine_init_seqno(engine, engine->last_submitted_seqno);
>   }
>
>   void i915_gem_reset(struct drm_device *dev)
> @@ -2288,10 +2287,10 @@ void i915_gem_reset(struct drm_device *dev)
>   	 * their reference to the objects, the inspection must be done first.
>   	 */
>   	for_each_ring(ring, dev_priv, i)
> -		i915_gem_reset_ring_status(dev_priv, ring);
> +		i915_gem_reset_ring_status(ring);
>
>   	for_each_ring(ring, dev_priv, i)
> -		i915_gem_reset_ring_cleanup(dev_priv, ring);
> +		i915_gem_reset_ring_cleanup(ring);
>
>   	i915_gem_context_reset(dev);
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index ac2e205fe3b4..17fe8ed991d6 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -519,7 +519,7 @@ i915_gem_context_get(struct drm_i915_file_private *file_priv, u32 id)
>   static inline int
>   mi_set_context(struct drm_i915_gem_request *req, u32 hw_flags)
>   {
> -	struct intel_ringbuffer *ring = req->ring;
> +	struct intel_ring *ring = req->ring;
>   	u32 flags = hw_flags | MI_MM_SPACE_GTT;
>   	const int num_rings =
>   		/* Use an extended w/a on ivb+ if signalling from other rings */
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index b7c90072f7d4..731ce13dbdbc 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -1148,7 +1148,7 @@ i915_gem_execbuffer_retire_commands(struct i915_execbuffer_params *params)
>   static int
>   i915_reset_gen7_sol_offsets(struct drm_i915_gem_request *req)
>   {
> -	struct intel_ringbuffer *ring = req->ring;
> +	struct intel_ring *ring = req->ring;
>   	int ret, i;
>
>   	if (!IS_GEN7(req->i915) || req->engine->id != RCS) {
> @@ -1229,7 +1229,7 @@ i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
>   			       struct drm_i915_gem_execbuffer2 *args,
>   			       struct list_head *vmas)
>   {
> -	struct intel_ringbuffer *ring = params->request->ring;
> +	struct intel_ring *ring = params->request->ring;
>   	struct drm_i915_private *dev_priv = params->request->i915;
>   	u64 exec_start, exec_len;
>   	int instp_mode;
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 38c109cda904..9a91451d66ac 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -656,7 +656,7 @@ static int gen8_write_pdp(struct drm_i915_gem_request *req,
>   			  unsigned entry,
>   			  dma_addr_t addr)
>   {
> -	struct intel_ringbuffer *ring = req->ring;
> +	struct intel_ring *ring = req->ring;
>   	int ret;
>
>   	BUG_ON(entry >= 4);
> @@ -1648,7 +1648,7 @@ static uint32_t get_pd_offset(struct i915_hw_ppgtt *ppgtt)
>   static int hsw_mm_switch(struct i915_hw_ppgtt *ppgtt,
>   			 struct drm_i915_gem_request *req)
>   {
> -	struct intel_ringbuffer *ring = req->ring;
> +	struct intel_ring *ring = req->ring;
>   	int ret;
>
>   	/* NB: TLBs must be flushed and invalidated before a switch */
> @@ -1686,7 +1686,7 @@ static int vgpu_mm_switch(struct i915_hw_ppgtt *ppgtt,
>   static int gen7_mm_switch(struct i915_hw_ppgtt *ppgtt,
>   			  struct drm_i915_gem_request *req)
>   {
> -	struct intel_ringbuffer *ring = req->ring;
> +	struct intel_ring *ring = req->ring;
>   	int ret;
>
>   	/* NB: TLBs must be flushed and invalidated before a switch */
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> index 54834ad1bf5e..e1f2af046b6c 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.c
> +++ b/drivers/gpu/drm/i915/i915_gem_request.c
> @@ -401,7 +401,7 @@ void __i915_add_request(struct drm_i915_gem_request *request,
>   			struct drm_i915_gem_object *obj,
>   			bool flush_caches)
>   {
> -	struct intel_ringbuffer *ring;
> +	struct intel_ring *ring;
>   	u32 request_start;
>   	int ret;
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
> index cd4412f6e7e3..086950567db4 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.h
> +++ b/drivers/gpu/drm/i915/i915_gem_request.h
> @@ -79,7 +79,7 @@ struct drm_i915_gem_request {
>   	 * context.
>   	 */
>   	struct intel_context *ctx;
> -	struct intel_ringbuffer *ring;
> +	struct intel_ring *ring;
>
>   	/** Batch buffer related to this request if any (used for
>   	    error state dump only) */
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index f27d6d1b64d6..2785f2d1f073 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -1007,7 +1007,7 @@ static void i915_gem_record_rings(struct drm_device *dev,
>   		request = i915_gem_find_active_request(engine);
>   		if (request) {
>   			struct i915_address_space *vm;
> -			struct intel_ringbuffer *ring;
> +			struct intel_ring *ring;
>
>   			vm = request->ctx && request->ctx->ppgtt ?
>   				&request->ctx->ppgtt->base :
> diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
> index 39ccfa8934e3..5a6251926367 100644
> --- a/drivers/gpu/drm/i915/i915_guc_submission.c
> +++ b/drivers/gpu/drm/i915/i915_guc_submission.c
> @@ -390,7 +390,7 @@ static void guc_init_ctx_desc(struct intel_guc *guc,
>
>   	for (i = 0; i < I915_NUM_RINGS; i++) {
>   		struct guc_execlist_context *lrc = &desc.lrc[i];
> -		struct intel_ringbuffer *ring = ctx->engine[i].ring;
> +		struct intel_ring *ring = ctx->engine[i].ring;
>   		struct intel_engine_cs *engine;
>   		struct drm_i915_gem_object *obj;
>   		uint64_t ctx_desc;
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 0d42356f15b4..f8717c5627dd 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -11052,7 +11052,7 @@ static int intel_gen2_queue_flip(struct drm_device *dev,
>   				 struct drm_i915_gem_request *req,
>   				 uint32_t flags)
>   {
> -	struct intel_ringbuffer *ring = req->ring;
> +	struct intel_ring *ring = req->ring;
>   	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
>   	u32 flip_mask;
>   	int ret;
> @@ -11087,7 +11087,7 @@ static int intel_gen3_queue_flip(struct drm_device *dev,
>   				 struct drm_i915_gem_request *req,
>   				 uint32_t flags)
>   {
> -	struct intel_ringbuffer *ring = req->ring;
> +	struct intel_ring *ring = req->ring;
>   	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
>   	u32 flip_mask;
>   	int ret;
> @@ -11119,7 +11119,7 @@ static int intel_gen4_queue_flip(struct drm_device *dev,
>   				 struct drm_i915_gem_request *req,
>   				 uint32_t flags)
>   {
> -	struct intel_ringbuffer *ring = req->ring;
> +	struct intel_ring *ring = req->ring;
>   	struct drm_i915_private *dev_priv = req->i915;
>   	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
>   	uint32_t pf, pipesrc;
> @@ -11158,7 +11158,7 @@ static int intel_gen6_queue_flip(struct drm_device *dev,
>   				 struct drm_i915_gem_request *req,
>   				 uint32_t flags)
>   {
> -	struct intel_ringbuffer *ring = req->ring;
> +	struct intel_ring *ring = req->ring;
>   	struct drm_i915_private *dev_priv = req->i915;
>   	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
>   	uint32_t pf, pipesrc;
> @@ -11194,7 +11194,7 @@ static int intel_gen7_queue_flip(struct drm_device *dev,
>   				 struct drm_i915_gem_request *req,
>   				 uint32_t flags)
>   {
> -	struct intel_ringbuffer *ring = req->ring;
> +	struct intel_ring *ring = req->ring;
>   	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
>   	uint32_t plane_bit = 0;
>   	int len, ret;
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 92ae7bc532ed..fa4c0c0db994 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -449,7 +449,7 @@ static void execlists_context_unqueue(struct intel_engine_cs *engine)
>   			 * for where we prepare the padding after the end of the
>   			 * request.
>   			 */
> -			struct intel_ringbuffer *ring;
> +			struct intel_ring *ring;
>
>   			ring = req0->ctx->engine[engine->id].ring;
>   			req0->tail += 8;
> @@ -742,7 +742,7 @@ int intel_execlists_submission(struct i915_execbuffer_params *params,
>   	struct drm_device       *dev = params->dev;
>   	struct intel_engine_cs  *engine = params->ring;
>   	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_ringbuffer *ring = params->request->ring;
> +	struct intel_ring *ring = params->request->ring;
>   	u64 exec_start;
>   	int instp_mode;
>   	u32 instp_mask;
> @@ -878,7 +878,7 @@ int logical_ring_flush_all_caches(struct drm_i915_gem_request *req)
>
>   static int intel_lr_context_do_pin(struct intel_engine_cs *ring,
>   		struct drm_i915_gem_object *ctx_obj,
> -		struct intel_ringbuffer *ringbuf)
> +		struct intel_ring *ringbuf)
>   {
>   	struct drm_i915_private *dev_priv = ring->i915;
>   	int ret = 0;
> @@ -889,7 +889,7 @@ static int intel_lr_context_do_pin(struct intel_engine_cs *ring,
>   	if (ret)
>   		return ret;
>
> -	ret = intel_pin_and_map_ringbuffer_obj(ring->dev, ringbuf);
> +	ret = intel_pin_and_map_ring(ring->dev, ringbuf);
>   	if (ret)
>   		goto unpin_ctx_obj;
>
> @@ -931,12 +931,12 @@ void intel_lr_context_unpin(struct drm_i915_gem_request *rq)
>   {
>   	int engine = rq->engine->id;
>   	struct drm_i915_gem_object *ctx_obj = rq->ctx->engine[engine].state;
> -	struct intel_ringbuffer *ring = rq->ring;
> +	struct intel_ring *ring = rq->ring;
>
>   	if (ctx_obj) {
>   		WARN_ON(!mutex_is_locked(&rq->i915->dev->struct_mutex));
>   		if (--rq->ctx->engine[engine].pin_count == 0) {
> -			intel_unpin_ringbuffer_obj(ring);
> +			intel_unpin_ring(ring);
>   			i915_gem_object_ggtt_unpin(ctx_obj);
>   			i915_gem_context_unreference(rq->ctx);
>   		}
> @@ -947,7 +947,7 @@ static int intel_logical_ring_workarounds_emit(struct drm_i915_gem_request *req)
>   {
>   	int ret, i;
>   	struct intel_engine_cs *engine = req->engine;
> -	struct intel_ringbuffer *ring = req->ring;
> +	struct intel_ring *ring = req->ring;
>   	struct drm_i915_private *dev_priv = req->i915;
>   	struct i915_workarounds *w = &dev_priv->workarounds;
>
> @@ -1417,7 +1417,7 @@ static int intel_logical_ring_emit_pdps(struct drm_i915_gem_request *req)
>   {
>   	struct i915_hw_ppgtt *ppgtt = req->ctx->ppgtt;
>   	struct intel_engine_cs *engine = req->engine;
> -	struct intel_ringbuffer *ring = req->ring;
> +	struct intel_ring *ring = req->ring;
>   	const int num_lri_cmds = GEN8_LEGACY_PDPES * 2;
>   	int i, ret;
>
> @@ -1444,7 +1444,7 @@ static int intel_logical_ring_emit_pdps(struct drm_i915_gem_request *req)
>   static int gen8_emit_bb_start(struct drm_i915_gem_request *req,
>   			      u64 offset, unsigned dispatch_flags)
>   {
> -	struct intel_ringbuffer *ring = req->ring;
> +	struct intel_ring *ring = req->ring;
>   	bool ppgtt = !(dispatch_flags & I915_DISPATCH_SECURE);
>   	int ret;
>
> @@ -1503,7 +1503,7 @@ static int gen8_emit_flush(struct drm_i915_gem_request *request,
>   			   u32 invalidate_domains,
>   			   u32 unused)
>   {
> -	struct intel_ringbuffer *ring = request->ring;
> +	struct intel_ring *ring = request->ring;
>   	uint32_t cmd;
>   	int ret;
>
> @@ -1541,7 +1541,7 @@ static int gen8_emit_flush_render(struct drm_i915_gem_request *request,
>   				  u32 invalidate_domains,
>   				  u32 flush_domains)
>   {
> -	struct intel_ringbuffer *ring = request->ring;
> +	struct intel_ring *ring = request->ring;
>   	u32 scratch_addr = request->engine->scratch.gtt_offset + 2 * CACHELINE_BYTES;
>   	bool vf_flush_wa = false;
>   	u32 flags = 0;
> @@ -1620,7 +1620,7 @@ gen6_seqno_barrier(struct intel_engine_cs *ring)
>
>   static int gen8_emit_request(struct drm_i915_gem_request *request)
>   {
> -	struct intel_ringbuffer *ring = request->ring;
> +	struct intel_ring *ring = request->ring;
>   	u32 cmd;
>   	int ret;
>
> @@ -2039,7 +2039,7 @@ make_rpcs(struct drm_device *dev)
>
>   static int
>   populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_obj,
> -		    struct intel_engine_cs *ring, struct intel_ringbuffer *ringbuf)
> +		    struct intel_engine_cs *ring, struct intel_ring *ringbuf)
>   {
>   	struct drm_device *dev = ring->dev;
>   	struct drm_i915_private *dev_priv = dev->dev_private;
> @@ -2174,15 +2174,15 @@ void intel_lr_context_free(struct intel_context *ctx)
>   		struct drm_i915_gem_object *ctx_obj = ctx->engine[i].state;
>
>   		if (ctx_obj) {
> -			struct intel_ringbuffer *ring = ctx->engine[i].ring;
> +			struct intel_ring *ring = ctx->engine[i].ring;
>   			struct intel_engine_cs *engine = ring->engine;
>
>   			if (ctx == engine->default_context) {
> -				intel_unpin_ringbuffer_obj(ring);
> +				intel_unpin_ring(ring);
>   				i915_gem_object_ggtt_unpin(ctx_obj);
>   			}
>   			WARN_ON(ctx->engine[engine->id].pin_count);
> -			intel_ringbuffer_free(ring);
> +			intel_ring_free(ring);
>   			drm_gem_object_unreference(&ctx_obj->base);
>   		}
>   	}
> @@ -2262,7 +2262,7 @@ int intel_lr_context_deferred_alloc(struct intel_context *ctx,
>   {
>   	struct drm_i915_gem_object *ctx_obj;
>   	uint32_t context_size;
> -	struct intel_ringbuffer *ring;
> +	struct intel_ring *ring;
>   	int ret;
>
>   	WARN_ON(ctx->legacy_hw_ctx.rcs_state != NULL);
> @@ -2279,7 +2279,7 @@ int intel_lr_context_deferred_alloc(struct intel_context *ctx,
>   		return -ENOMEM;
>   	}
>
> -	ring = intel_engine_create_ringbuffer(engine, 4 * PAGE_SIZE);
> +	ring = intel_engine_create_ring(engine, 4 * PAGE_SIZE);
>   	if (IS_ERR(ring)) {
>   		ret = PTR_ERR(ring);
>   		goto error_deref_obj;
> @@ -2316,7 +2316,7 @@ int intel_lr_context_deferred_alloc(struct intel_context *ctx,
>   	return 0;
>
>   error_ringbuf:
> -	intel_ringbuffer_free(ring);
> +	intel_ring_free(ring);
>   error_deref_obj:
>   	drm_gem_object_unreference(&ctx_obj->base);
>   	ctx->engine[engine->id].ring = NULL;
> @@ -2333,7 +2333,7 @@ void intel_lr_context_reset(struct drm_device *dev,
>
>   	for_each_ring(unused, dev_priv, i) {
>   		struct drm_i915_gem_object *ctx_obj = ctx->engine[i].state;
> -		struct intel_ringbuffer *ring = ctx->engine[i].ring;
> +		struct intel_ring *ring = ctx->engine[i].ring;
>   		uint32_t *reg_state;
>   		struct page *page;
>
> diff --git a/drivers/gpu/drm/i915/intel_mocs.c b/drivers/gpu/drm/i915/intel_mocs.c
> index 61e1704d7313..1b724c0a711e 100644
> --- a/drivers/gpu/drm/i915/intel_mocs.c
> +++ b/drivers/gpu/drm/i915/intel_mocs.c
> @@ -193,7 +193,7 @@ static int emit_mocs_control_table(struct drm_i915_gem_request *req,
>   				   const struct drm_i915_mocs_table *table,
>   				   enum intel_engine_id id)
>   {
> -	struct intel_ringbuffer *ring = req->ring;
> +	struct intel_ring *ring = req->ring;
>   	unsigned int index;
>   	int ret;
>
> @@ -244,7 +244,7 @@ static int emit_mocs_control_table(struct drm_i915_gem_request *req,
>   static int emit_mocs_l3cc_table(struct drm_i915_gem_request *req,
>   				const struct drm_i915_mocs_table *table)
>   {
> -	struct intel_ringbuffer *ring = req->ring;
> +	struct intel_ring *ring = req->ring;
>   	unsigned int count;
>   	unsigned int i;
>   	u32 value;
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 1bb9f376aa0b..95974156a1d9 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -42,7 +42,7 @@ int __intel_ring_space(int head, int tail, int size)
>   	return space - I915_RING_FREE_SPACE;
>   }
>
> -void intel_ring_update_space(struct intel_ringbuffer *ringbuf)
> +void intel_ring_update_space(struct intel_ring *ringbuf)
>   {
>   	if (ringbuf->last_retired_head != -1) {
>   		ringbuf->head = ringbuf->last_retired_head;
> @@ -53,7 +53,7 @@ void intel_ring_update_space(struct intel_ringbuffer *ringbuf)
>   					    ringbuf->tail, ringbuf->size);
>   }
>
> -int intel_ring_space(struct intel_ringbuffer *ringbuf)
> +int intel_ring_space(struct intel_ring *ringbuf)
>   {
>   	intel_ring_update_space(ringbuf);
>   	return ringbuf->space;
> @@ -61,7 +61,7 @@ int intel_ring_space(struct intel_ringbuffer *ringbuf)
>
>   static void __intel_ring_advance(struct intel_engine_cs *ring)
>   {
> -	struct intel_ringbuffer *ringbuf = ring->buffer;
> +	struct intel_ring *ringbuf = ring->buffer;
>   	ringbuf->tail &= ringbuf->size - 1;
>   	ring->write_tail(ring, ringbuf->tail);
>   }
> @@ -71,7 +71,7 @@ gen2_render_ring_flush(struct drm_i915_gem_request *req,
>   		       u32	invalidate_domains,
>   		       u32	flush_domains)
>   {
> -	struct intel_ringbuffer *ring = req->ring;
> +	struct intel_ring *ring = req->ring;
>   	u32 cmd;
>   	int ret;
>
> @@ -98,7 +98,7 @@ gen4_render_ring_flush(struct drm_i915_gem_request *req,
>   		       u32	invalidate_domains,
>   		       u32	flush_domains)
>   {
> -	struct intel_ringbuffer *ring = req->ring;
> +	struct intel_ring *ring = req->ring;
>   	u32 cmd;
>   	int ret;
>
> @@ -191,7 +191,7 @@ gen4_render_ring_flush(struct drm_i915_gem_request *req,
>   static int
>   intel_emit_post_sync_nonzero_flush(struct drm_i915_gem_request *req)
>   {
> -	struct intel_ringbuffer *ring = req->ring;
> +	struct intel_ring *ring = req->ring;
>   	u32 scratch_addr = req->engine->scratch.gtt_offset + 2 * CACHELINE_BYTES;
>   	int ret;
>
> @@ -227,7 +227,7 @@ static int
>   gen6_render_ring_flush(struct drm_i915_gem_request *req,
>   		       u32 invalidate_domains, u32 flush_domains)
>   {
> -	struct intel_ringbuffer *ring = req->ring;
> +	struct intel_ring *ring = req->ring;
>   	u32 flags = 0;
>   	u32 scratch_addr = req->engine->scratch.gtt_offset + 2 * CACHELINE_BYTES;
>   	int ret;
> @@ -279,7 +279,7 @@ gen6_render_ring_flush(struct drm_i915_gem_request *req,
>   static int
>   gen7_render_ring_cs_stall_wa(struct drm_i915_gem_request *req)
>   {
> -	struct intel_ringbuffer *ring = req->ring;
> +	struct intel_ring *ring = req->ring;
>   	int ret;
>
>   	ret = intel_ring_begin(req, 4);
> @@ -300,7 +300,7 @@ static int
>   gen7_render_ring_flush(struct drm_i915_gem_request *req,
>   		       u32 invalidate_domains, u32 flush_domains)
>   {
> -	struct intel_ringbuffer *ring = req->ring;
> +	struct intel_ring *ring = req->ring;
>   	u32 flags = 0;
>   	u32 scratch_addr = req->engine->scratch.gtt_offset + 2 * CACHELINE_BYTES;
>   	int ret;
> @@ -363,7 +363,7 @@ static int
>   gen8_emit_pipe_control(struct drm_i915_gem_request *req,
>   		       u32 flags, u32 scratch_addr)
>   {
> -	struct intel_ringbuffer *ring = req->ring;
> +	struct intel_ring *ring = req->ring;
>   	int ret;
>
>   	ret = intel_ring_begin(req, 6);
> @@ -547,7 +547,7 @@ static int init_ring_common(struct intel_engine_cs *ring)
>   {
>   	struct drm_device *dev = ring->dev;
>   	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_ringbuffer *ringbuf = ring->buffer;
> +	struct intel_ring *ringbuf = ring->buffer;
>   	struct drm_i915_gem_object *obj = ringbuf->obj;
>   	int ret = 0;
>
> @@ -688,7 +688,7 @@ err:
>
>   static int intel_ring_workarounds_emit(struct drm_i915_gem_request *req)
>   {
> -	struct intel_ringbuffer *ring = req->ring;
> +	struct intel_ring *ring = req->ring;
>   	struct drm_i915_private *dev_priv = req->i915;
>   	struct i915_workarounds *w = &dev_priv->workarounds;
>   	int ret, i;
> @@ -1191,7 +1191,7 @@ static int gen8_rcs_signal(struct drm_i915_gem_request *signaller_req,
>   			   unsigned int num_dwords)
>   {
>   #define MBOX_UPDATE_DWORDS 8
> -	struct intel_ringbuffer *signaller = signaller_req->ring;
> +	struct intel_ring *signaller = signaller_req->ring;
>   	struct drm_i915_private *dev_priv = signaller_req->i915;
>   	struct intel_engine_cs *waiter;
>   	int i, ret, num_rings;
> @@ -1229,7 +1229,7 @@ static int gen8_xcs_signal(struct drm_i915_gem_request *signaller_req,
>   			   unsigned int num_dwords)
>   {
>   #define MBOX_UPDATE_DWORDS 6
> -	struct intel_ringbuffer *signaller = signaller_req->ring;
> +	struct intel_ring *signaller = signaller_req->ring;
>   	struct drm_i915_private *dev_priv = signaller_req->i915;
>   	struct intel_engine_cs *waiter;
>   	int i, ret, num_rings;
> @@ -1264,7 +1264,7 @@ static int gen8_xcs_signal(struct drm_i915_gem_request *signaller_req,
>   static int gen6_signal(struct drm_i915_gem_request *signaller_req,
>   		       unsigned int num_dwords)
>   {
> -	struct intel_ringbuffer *signaller = signaller_req->ring;
> +	struct intel_ring *signaller = signaller_req->ring;
>   	struct drm_i915_private *dev_priv = signaller_req->i915;
>   	struct intel_engine_cs *useless;
>   	int i, ret, num_rings;
> @@ -1306,7 +1306,7 @@ static int gen6_signal(struct drm_i915_gem_request *signaller_req,
>   static int
>   gen6_add_request(struct drm_i915_gem_request *req)
>   {
> -	struct intel_ringbuffer *ring = req->ring;
> +	struct intel_ring *ring = req->ring;
>   	int ret;
>
>   	if (req->engine->semaphore.signal)
> @@ -1345,7 +1345,7 @@ gen8_ring_sync(struct drm_i915_gem_request *waiter_req,
>   	       struct intel_engine_cs *signaller,
>   	       u32 seqno)
>   {
> -	struct intel_ringbuffer *waiter = waiter_req->ring;
> +	struct intel_ring *waiter = waiter_req->ring;
>   	struct drm_i915_private *dev_priv = waiter_req->i915;
>   	int ret;
>
> @@ -1373,7 +1373,7 @@ gen6_ring_sync(struct drm_i915_gem_request *waiter_req,
>   	       struct intel_engine_cs *signaller,
>   	       u32 seqno)
>   {
> -	struct intel_ringbuffer *waiter = waiter_req->ring;
> +	struct intel_ring *waiter = waiter_req->ring;
>   	u32 dw1 = MI_SEMAPHORE_MBOX |
>   		  MI_SEMAPHORE_COMPARE |
>   		  MI_SEMAPHORE_REGISTER;
> @@ -1421,7 +1421,7 @@ do {									\
>   static int
>   pc_render_add_request(struct drm_i915_gem_request *req)
>   {
> -	struct intel_ringbuffer *ring = req->ring;
> +	struct intel_ring *ring = req->ring;
>   	u32 addr = req->engine->status_page.gfx_addr +
>   		(I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT);
>   	u32 scratch_addr = addr;
> @@ -1548,7 +1548,7 @@ bsd_ring_flush(struct drm_i915_gem_request *req,
>   	       u32     invalidate_domains,
>   	       u32     flush_domains)
>   {
> -	struct intel_ringbuffer *ring = req->ring;
> +	struct intel_ring *ring = req->ring;
>   	int ret;
>
>   	ret = intel_ring_begin(req, 2);
> @@ -1564,7 +1564,7 @@ bsd_ring_flush(struct drm_i915_gem_request *req,
>   static int
>   i9xx_add_request(struct drm_i915_gem_request *req)
>   {
> -	struct intel_ringbuffer *ring = req->ring;
> +	struct intel_ring *ring = req->ring;
>   	int ret;
>
>   	ret = intel_ring_begin(req, 4);
> @@ -1658,7 +1658,7 @@ i965_dispatch_execbuffer(struct drm_i915_gem_request *req,
>   			 u64 offset, u32 length,
>   			 unsigned dispatch_flags)
>   {
> -	struct intel_ringbuffer *ring = req->ring;
> +	struct intel_ring *ring = req->ring;
>   	int ret;
>
>   	ret = intel_ring_begin(req, 2);
> @@ -1685,7 +1685,7 @@ i830_dispatch_execbuffer(struct drm_i915_gem_request *req,
>   			 u64 offset, u32 len,
>   			 unsigned dispatch_flags)
>   {
> -	struct intel_ringbuffer *ring = req->ring;
> +	struct intel_ring *ring = req->ring;
>   	u32 cs_offset = req->engine->scratch.gtt_offset;
>   	int ret;
>
> @@ -1748,7 +1748,7 @@ i915_dispatch_execbuffer(struct drm_i915_gem_request *req,
>   			 u64 offset, u32 len,
>   			 unsigned dispatch_flags)
>   {
> -	struct intel_ringbuffer *ring = req->ring;
> +	struct intel_ring *ring = req->ring;
>   	int ret;
>
>   	ret = intel_ring_begin(req, 2);
> @@ -1845,7 +1845,7 @@ static int init_phys_status_page(struct intel_engine_cs *ring)
>   	return 0;
>   }
>
> -void intel_unpin_ringbuffer_obj(struct intel_ringbuffer *ringbuf)
> +void intel_unpin_ring(struct intel_ring *ringbuf)
>   {
>   	if (HAS_LLC(ringbuf->obj->base.dev) && !ringbuf->obj->stolen)
>   		i915_gem_object_unpin_vmap(ringbuf->obj);
> @@ -1854,8 +1854,7 @@ void intel_unpin_ringbuffer_obj(struct intel_ringbuffer *ringbuf)
>   	i915_gem_object_ggtt_unpin(ringbuf->obj);
>   }
>
> -int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
> -				     struct intel_ringbuffer *ringbuf)
> +int intel_pin_and_map_ring(struct drm_device *dev, struct intel_ring *ringbuf)
>   {
>   	struct drm_i915_private *dev_priv = to_i915(dev);
>   	struct drm_i915_gem_object *obj = ringbuf->obj;
> @@ -1900,14 +1899,14 @@ unpin:
>   	return ret;
>   }
>
> -static void intel_destroy_ringbuffer_obj(struct intel_ringbuffer *ringbuf)
> +static void intel_destroy_ringbuffer_obj(struct intel_ring *ringbuf)
>   {
>   	drm_gem_object_unreference(&ringbuf->obj->base);
>   	ringbuf->obj = NULL;
>   }
>
>   static int intel_alloc_ringbuffer_obj(struct drm_device *dev,
> -				      struct intel_ringbuffer *ringbuf)
> +				      struct intel_ring *ringbuf)
>   {
>   	struct drm_i915_gem_object *obj;
>
> @@ -1927,10 +1926,10 @@ static int intel_alloc_ringbuffer_obj(struct drm_device *dev,
>   	return 0;
>   }
>
> -struct intel_ringbuffer *
> -intel_engine_create_ringbuffer(struct intel_engine_cs *engine, int size)
> +struct intel_ring *
> +intel_engine_create_ring(struct intel_engine_cs *engine, int size)
>   {
> -	struct intel_ringbuffer *ring;
> +	struct intel_ring *ring;
>   	int ret;
>
>   	ring = kzalloc(sizeof(*ring), GFP_KERNEL);
> @@ -1968,7 +1967,7 @@ intel_engine_create_ringbuffer(struct intel_engine_cs *engine, int size)
>   }
>
>   void
> -intel_ringbuffer_free(struct intel_ringbuffer *ring)
> +intel_ring_free(struct intel_ring *ring)
>   {
>   	intel_destroy_ringbuffer_obj(ring);
>   	list_del(&ring->link);
> @@ -1978,7 +1977,7 @@ intel_ringbuffer_free(struct intel_ringbuffer *ring)
>   static int intel_init_engine(struct drm_device *dev,
>   			     struct intel_engine_cs *engine)
>   {
> -	struct intel_ringbuffer *ringbuf;
> +	struct intel_ring *ringbuf;
>   	int ret;
>
>   	WARN_ON(engine->buffer);
> @@ -1995,7 +1994,7 @@ static int intel_init_engine(struct drm_device *dev,
>
>   	intel_engine_init_breadcrumbs(engine);
>
> -	ringbuf = intel_engine_create_ringbuffer(engine, 32 * PAGE_SIZE);
> +	ringbuf = intel_engine_create_ring(engine, 32 * PAGE_SIZE);
>   	if (IS_ERR(ringbuf)) {
>   		ret = PTR_ERR(ringbuf);
>   		goto error;
> @@ -2013,7 +2012,7 @@ static int intel_init_engine(struct drm_device *dev,
>   			goto error;
>   	}
>
> -	ret = intel_pin_and_map_ringbuffer_obj(dev, ringbuf);
> +	ret = intel_pin_and_map_ring(dev, ringbuf);
>   	if (ret) {
>   		DRM_ERROR("Failed to pin and map ringbuffer %s: %d\n",
>   				engine->name, ret);
> @@ -2043,8 +2042,8 @@ void intel_engine_cleanup(struct intel_engine_cs *ring)
>   		intel_engine_stop(ring);
>   		WARN_ON(!IS_GEN2(ring->dev) && (I915_READ_MODE(ring) & MODE_IDLE) == 0);
>
> -		intel_unpin_ringbuffer_obj(ring->buffer);
> -		intel_ringbuffer_free(ring->buffer);
> +		intel_unpin_ring(ring->buffer);
> +		intel_ring_free(ring->buffer);
>   		ring->buffer = NULL;
>   	}
>
> @@ -2084,7 +2083,7 @@ int intel_ring_alloc_request_extras(struct drm_i915_gem_request *request)
>   	return 0;
>   }
>
> -void intel_ring_reserved_space_reserve(struct intel_ringbuffer *ringbuf, int size)
> +void intel_ring_reserved_space_reserve(struct intel_ring *ringbuf, int size)
>   {
>   	WARN_ON(ringbuf->reserved_size);
>   	WARN_ON(ringbuf->reserved_in_use);
> @@ -2092,7 +2091,7 @@ void intel_ring_reserved_space_reserve(struct intel_ringbuffer *ringbuf, int siz
>   	ringbuf->reserved_size = size;
>   }
>
> -void intel_ring_reserved_space_cancel(struct intel_ringbuffer *ringbuf)
> +void intel_ring_reserved_space_cancel(struct intel_ring *ringbuf)
>   {
>   	WARN_ON(ringbuf->reserved_in_use);
>
> @@ -2100,7 +2099,7 @@ void intel_ring_reserved_space_cancel(struct intel_ringbuffer *ringbuf)
>   	ringbuf->reserved_in_use = false;
>   }
>
> -void intel_ring_reserved_space_use(struct intel_ringbuffer *ringbuf)
> +void intel_ring_reserved_space_use(struct intel_ring *ringbuf)
>   {
>   	WARN_ON(ringbuf->reserved_in_use);
>
> @@ -2108,7 +2107,7 @@ void intel_ring_reserved_space_use(struct intel_ringbuffer *ringbuf)
>   	ringbuf->reserved_tail   = ringbuf->tail;
>   }
>
> -void intel_ring_reserved_space_end(struct intel_ringbuffer *ringbuf)
> +void intel_ring_reserved_space_end(struct intel_ring *ringbuf)
>   {
>   	WARN_ON(!ringbuf->reserved_in_use);
>   	if (ringbuf->tail > ringbuf->reserved_tail) {
> @@ -2133,7 +2132,7 @@ void intel_ring_reserved_space_end(struct intel_ringbuffer *ringbuf)
>
>   static int wait_for_space(struct drm_i915_gem_request *req, int bytes)
>   {
> -	struct intel_ringbuffer *ring = req->ring;
> +	struct intel_ring *ring = req->ring;
>   	struct intel_engine_cs *engine = req->engine;
>   	struct drm_i915_gem_request *target;
>   	unsigned space;
> @@ -2172,7 +2171,7 @@ static int wait_for_space(struct drm_i915_gem_request *req, int bytes)
>   	return 0;
>   }
>
> -static void ring_wrap(struct intel_ringbuffer *ringbuf)
> +static void ring_wrap(struct intel_ring *ringbuf)
>   {
>   	int rem = ringbuf->size - ringbuf->tail;
>   	memset(ringbuf->virtual_start + ringbuf->tail, 0, rem);
> @@ -2183,7 +2182,7 @@ static void ring_wrap(struct intel_ringbuffer *ringbuf)
>
>   static int ring_prepare(struct drm_i915_gem_request *req, int bytes)
>   {
> -	struct intel_ringbuffer *ring = req->ring;
> +	struct intel_ring *ring = req->ring;
>   	int remain_usable = ring->effective_size - ring->tail;
>   	int remain_actual = ring->size - ring->tail;
>   	int ret, total_bytes, wait_bytes = 0;
> @@ -2243,7 +2242,7 @@ int intel_ring_begin(struct drm_i915_gem_request *req, int num_dwords)
>   /* Align the ring tail to a cacheline boundary */
>   int intel_ring_cacheline_align(struct drm_i915_gem_request *req)
>   {
> -	struct intel_ringbuffer *ring = req->ring;
> +	struct intel_ring *ring = req->ring;
>   	int num_dwords = (ring->tail & (CACHELINE_BYTES - 1)) / sizeof(uint32_t);
>   	int ret;
>
> @@ -2318,7 +2317,7 @@ static void gen6_bsd_ring_write_tail(struct intel_engine_cs *ring,
>   static int gen6_bsd_ring_flush(struct drm_i915_gem_request *req,
>   			       u32 invalidate, u32 flush)
>   {
> -	struct intel_ringbuffer *ring = req->ring;
> +	struct intel_ring *ring = req->ring;
>   	uint32_t cmd;
>   	int ret;
>
> @@ -2364,7 +2363,7 @@ gen8_ring_dispatch_execbuffer(struct drm_i915_gem_request *req,
>   			      u64 offset, u32 len,
>   			      unsigned dispatch_flags)
>   {
> -	struct intel_ringbuffer *ring = req->ring;
> +	struct intel_ring *ring = req->ring;
>   	bool ppgtt = USES_PPGTT(req->i915) &&
>   			!(dispatch_flags & I915_DISPATCH_SECURE);
>   	int ret;
> @@ -2390,7 +2389,7 @@ hsw_ring_dispatch_execbuffer(struct drm_i915_gem_request *req,
>   			     u64 offset, u32 len,
>   			     unsigned dispatch_flags)
>   {
> -	struct intel_ringbuffer *ring = req->ring;
> +	struct intel_ring *ring = req->ring;
>   	int ret;
>
>   	ret = intel_ring_begin(req, 2);
> @@ -2415,7 +2414,7 @@ gen6_ring_dispatch_execbuffer(struct drm_i915_gem_request *req,
>   			      u64 offset, u32 len,
>   			      unsigned dispatch_flags)
>   {
> -	struct intel_ringbuffer *ring = req->ring;
> +	struct intel_ring *ring = req->ring;
>   	int ret;
>
>   	ret = intel_ring_begin(req, 2);
> @@ -2438,7 +2437,7 @@ gen6_ring_dispatch_execbuffer(struct drm_i915_gem_request *req,
>   static int gen6_ring_flush(struct drm_i915_gem_request *req,
>   			   u32 invalidate, u32 flush)
>   {
> -	struct intel_ringbuffer *ring = req->ring;
> +	struct intel_ring *ring = req->ring;
>   	uint32_t cmd;
>   	int ret;
>
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 6803e4820688..71941af13560 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -97,7 +97,7 @@ struct intel_engine_hangcheck {
>   	u32 instdone[I915_NUM_INSTDONE_REG];
>   };
>
> -struct intel_ringbuffer {
> +struct intel_ring {
>   	struct drm_i915_gem_object *obj;
>   	void *virtual_start;
>
> @@ -163,7 +163,7 @@ struct intel_engine_cs {
>   	u32		mmio_base;
>   	struct		drm_device *dev;
>   	struct drm_i915_private *i915;
> -	struct intel_ringbuffer *buffer;
> +	struct intel_ring *buffer;
>   	struct list_head buffers;
>
>   	/* Rather than have every client wait upon all user interrupts,
> @@ -454,12 +454,11 @@ intel_write_status_page(struct intel_engine_cs *ring,
>   #define I915_GEM_HWS_SCRATCH_INDEX	0x40
>   #define I915_GEM_HWS_SCRATCH_ADDR (I915_GEM_HWS_SCRATCH_INDEX << MI_STORE_DWORD_INDEX_SHIFT)
>
> -struct intel_ringbuffer *
> -intel_engine_create_ringbuffer(struct intel_engine_cs *engine, int size);
> -int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
> -				     struct intel_ringbuffer *ringbuf);
> -void intel_unpin_ringbuffer_obj(struct intel_ringbuffer *ringbuf);
> -void intel_ringbuffer_free(struct intel_ringbuffer *ring);
> +struct intel_ring *
> +intel_engine_create_ring(struct intel_engine_cs *engine, int size);
> +int intel_pin_and_map_ring(struct drm_device *dev, struct intel_ring *ring);
> +void intel_unpin_ring(struct intel_ring *ring);
> +void intel_ring_free(struct intel_ring *ring);
>
>   void intel_engine_stop(struct intel_engine_cs *ring);
>   void intel_engine_cleanup(struct intel_engine_cs *ring);
> @@ -468,24 +467,22 @@ int intel_ring_alloc_request_extras(struct drm_i915_gem_request *request);
>
>   int __must_check intel_ring_begin(struct drm_i915_gem_request *req, int n);
>   int __must_check intel_ring_cacheline_align(struct drm_i915_gem_request *req);
> -static inline void intel_ring_emit(struct intel_ringbuffer *rb,
> -				   u32 data)
> +static inline void intel_ring_emit(struct intel_ring *ring, u32 data)
>   {
> -	*(uint32_t *)(rb->virtual_start + rb->tail) = data;
> -	rb->tail += 4;
> +	*(uint32_t *)(ring->virtual_start + ring->tail) = data;
> +	ring->tail += 4;
>   }
> -static inline void intel_ring_emit_reg(struct intel_ringbuffer *rb,
> -				       i915_reg_t reg)
> +static inline void intel_ring_emit_reg(struct intel_ring *ring, i915_reg_t reg)
>   {
> -	intel_ring_emit(rb, i915_mmio_reg_offset(reg));
> +	intel_ring_emit(ring, i915_mmio_reg_offset(reg));
>   }
> -static inline void intel_ring_advance(struct intel_ringbuffer *rb)
> +static inline void intel_ring_advance(struct intel_ring *ring)
>   {
> -	rb->tail &= rb->size - 1;
> +	ring->tail &= ring->size - 1;
>   }
>   int __intel_ring_space(int head, int tail, int size);
> -void intel_ring_update_space(struct intel_ringbuffer *ringbuf);
> -int intel_ring_space(struct intel_ringbuffer *ringbuf);
> +void intel_ring_update_space(struct intel_ring *ringbuf);
> +int intel_ring_space(struct intel_ring *ringbuf);
>
>   int __must_check intel_engine_idle(struct intel_engine_cs *ring);
>   void intel_engine_init_seqno(struct intel_engine_cs *ring, u32 seqno);
> @@ -509,7 +506,7 @@ static inline u32 intel_engine_get_seqno(struct intel_engine_cs *ring)
>
>   int init_workarounds_ring(struct intel_engine_cs *ring);
>
> -static inline u32 intel_ring_get_tail(struct intel_ringbuffer *ringbuf)
> +static inline u32 intel_ring_get_tail(struct intel_ring *ringbuf)
>   {
>   	return ringbuf->tail;
>   }
> @@ -528,13 +525,13 @@ static inline u32 intel_ring_get_tail(struct intel_ringbuffer *ringbuf)
>    * will always have sufficient room to do its stuff. The request creation
>    * code calls this automatically.
>    */
> -void intel_ring_reserved_space_reserve(struct intel_ringbuffer *ringbuf, int size);
> +void intel_ring_reserved_space_reserve(struct intel_ring *ringbuf, int size);
>   /* Cancel the reservation, e.g. because the request is being discarded. */
> -void intel_ring_reserved_space_cancel(struct intel_ringbuffer *ringbuf);
> +void intel_ring_reserved_space_cancel(struct intel_ring *ringbuf);
>   /* Use the reserved space - for use by i915_add_request() only. */
> -void intel_ring_reserved_space_use(struct intel_ringbuffer *ringbuf);
> +void intel_ring_reserved_space_use(struct intel_ring *ringbuf);
>   /* Finish with the reserved space - for use by i915_add_request() only. */
> -void intel_ring_reserved_space_end(struct intel_ringbuffer *ringbuf);
> +void intel_ring_reserved_space_end(struct intel_ring *ringbuf);
>
>   /* intel_breadcrumbs.c -- user interrupt bottom-half for waiters */
>   struct intel_wait {
>
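
For reference, the calling convention is unchanged by the rename; here is a
minimal sketch of a typical emitter under the new names (a made-up function,
modelled on the flush routines in the diff above):

static int emit_noops(struct drm_i915_gem_request *req)
{
	struct intel_ring *ring = req->ring;
	int ret;

	/* Reserve two dwords in the ring before writing them. */
	ret = intel_ring_begin(req, 2);
	if (ret)
		return ret;

	intel_ring_emit(ring, MI_NOOP);
	intel_ring_emit(ring, MI_NOOP);
	intel_ring_advance(ring);

	return 0;
}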

* Re: [PATCH 075/190] drm/i915: Refactor activity tracking for requests
  2016-01-28 11:46     ` Chris Wilson
@ 2016-01-28 11:56       ` Tvrtko Ursulin
  0 siblings, 0 replies; 263+ messages in thread
From: Tvrtko Ursulin @ 2016-01-28 11:56 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 28/01/16 11:46, Chris Wilson wrote:
> On Thu, Jan 28, 2016 at 11:41:37AM +0000, Tvrtko Ursulin wrote:
>>
>> Hi,
>>
>> On 11/01/16 09:17, Chris Wilson wrote:
>>> With the introduction of requests, we amplified the number of atomic
>>> refcounted objects we use and update every execbuffer; from none to
>>> several references, and a set of references that need to be changed. We
>>> also introduced interesting side-effects in the order of retiring
>>> requests and objects.
>>>
>>> Instead of independently tracking the last request for an object, track
>>> the active objects for each request. The object will reside in the
>>> buffer list of its most recent active request and so we reduce the kref
>>> interchange to a list_move. Now retirements are entirely driven by the
>>> request, dramatically simplifying activity tracking on the object
>>> themselves, and removing the ambiguity between retiring objects and
>>> retiring requests.
>>>
>>> All told, less code, simpler and faster, and more extensible.
>>
>> I've looked into this in detail before the holidays and unfortunately a
>> lot of it has evaporated from my head since. I remember I thought the
>> idea was good and that it really simplifies things.
>>
>> But it is also difficult to apply the subset of patches to look at
>> the resulting code in more detail.
>>
>> So would it be possible to extract and rebase the relevant patches? I
>> think that would be from 73 to 76. (Together with the renaming we
>> already agreed on. And those trivial renames of list/link already have
>> r-b's.)
>
> Actually no, if you read some of the earlier patches you will see the
> required bug fixes.

How many / which ones? Can you extract them into a smaller series 
(rebased so it can be applied and tested) ending with 76?

Regards,

Tvrtko
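
As an aside, the scheme described in the quoted commit message boils down
to something like this rough sketch (hypothetical names, not the actual
i915 structures):

#include <linux/list.h>

/* Each request owns the list of objects whose last use it represents. */
struct request {
	struct list_head active_objects;
};

struct object {
	struct list_head active_link;	/* lives on its newest request */
};

/* Retaining an object for a newer request is just a list_move. */
static void mark_active(struct object *obj, struct request *rq)
{
	list_move_tail(&obj->active_link, &rq->active_objects);
}

/* Retiring the request retires every object still resident on it. */
static void retire(struct request *rq)
{
	struct object *obj, *next;

	list_for_each_entry_safe(obj, next, &rq->active_objects, active_link)
		list_del_init(&obj->active_link);
}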


* Re: [PATCH 136/190] drm/i915: Move ioremap_wc tracking onto VMA
  2016-01-11 10:45   ` [PATCH 136/190] drm/i915: Move ioremap_wc tracking onto VMA Chris Wilson
@ 2016-02-11 13:20     ` Tvrtko Ursulin
  2016-02-11 13:29       ` Chris Wilson
  0 siblings, 1 reply; 263+ messages in thread
From: Tvrtko Ursulin @ 2016-02-11 13:20 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx



On 11/01/16 10:45, Chris Wilson wrote:
> By tracking the iomapping on the VMA itself, we can share that area
> between multiple users. Also by only revoking the iomapping upon
> unbinding from the mappable portion of the GGTT, we can keep that iomap
> across multiple invocations (e.g. execlists context pinning).

Reading between the lines and from some IRC discussion, it seems the goal
of this is to fix an address space memory leak with fbcon?

But I don't know fbdev, so I can't see who would do the unpin on the VMA
to allow unbind to eventually unmap it?

Regards,

Tvrtko

> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_gem.c     |  5 +++++
>   drivers/gpu/drm/i915/i915_gem_gtt.c | 33 +++++++++++++++++++++++++++++++++
>   drivers/gpu/drm/i915/i915_gem_gtt.h |  4 ++++
>   drivers/gpu/drm/i915/intel_fbdev.c  |  8 +++-----
>   4 files changed, 45 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 0c4e8e1aeeff..5bb21b20c36a 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2699,6 +2699,11 @@ int i915_vma_unbind(struct i915_vma *vma)
>   		if (ret)
>   			return ret;
>
> +		if (vma->iomap) {
> +			iounmap(vma->iomap);
> +			vma->iomap = NULL;
> +		}
> +
>   		vma->map_and_fenceable = false;
>   	}
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index b8af904ad12c..3fcf2fd73453 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -3575,3 +3575,36 @@ int i915_vma_bind(struct i915_vma *vma, enum i915_cache_level cache_level,
>
>   	return 0;
>   }
> +
> +void *i915_vma_iomap(struct drm_i915_private *dev_priv,
> +		     struct i915_vma *vma)
> +{
> +	if (WARN_ON(!vma->map_and_fenceable))
> +		return ERR_PTR(-ENODEV);
> +
> +	GEM_BUG_ON(!vma->is_ggtt);
> +	GEM_BUG_ON((vma->bound & GLOBAL_BIND) == 0);
> +
> +	if (vma->iomap == NULL) {
> +		u32 base = dev_priv->gtt.mappable_base + vma->node.start;
> +		void *ptr;
> +
> +		ptr = ioremap_wc(base, vma->size);
> +		if (ptr == NULL) {
> +			int ret;
> +
> +			/* Too many areas already allocated? */
> +			ret = i915_gem_evict_vm(vma->vm, true);
> +			if (ret)
> +				return ERR_PTR(ret);
> +
> +			ptr = ioremap_wc(base, vma->size);
> +			if (ptr == NULL)
> +				return ERR_PTR(-ENOMEM);
> +		}
> +
> +		vma->iomap = ptr;
> +	}
> +
> +	return vma->iomap;
> +}
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index 6b0f557982d5..0e0570e13a68 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -181,6 +181,7 @@ struct i915_vma {
>   	struct drm_mm_node node;
>   	struct drm_i915_gem_object *obj;
>   	struct i915_address_space *vm;
> +	void *iomap;
>   	u64 size;
>
>   	struct i915_gem_active last_read[I915_NUM_RINGS];
> @@ -579,4 +580,7 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev);
>
>   int __must_check i915_gem_gtt_prepare_object(struct drm_i915_gem_object *obj);
>   void i915_gem_gtt_finish_object(struct drm_i915_gem_object *obj);
> +
> +void *i915_vma_iomap(struct drm_i915_private *dev_priv,
> +		     struct i915_vma *vma);
>   #endif
> diff --git a/drivers/gpu/drm/i915/intel_fbdev.c b/drivers/gpu/drm/i915/intel_fbdev.c
> index 7decbca25dbb..8e7c341951fd 100644
> --- a/drivers/gpu/drm/i915/intel_fbdev.c
> +++ b/drivers/gpu/drm/i915/intel_fbdev.c
> @@ -248,12 +248,10 @@ static int intelfb_create(struct drm_fb_helper *helper,
>   	info->fix.smem_start = dev->mode_config.fb_base + vma->node.start;
>   	info->fix.smem_len = vma->node.size;
>
> -	info->screen_base =
> -		ioremap_wc(dev_priv->gtt.mappable_base + vma->node.start,
> -			   vma->node.size);
> -	if (!info->screen_base) {
> +	info->screen_base = i915_vma_iomap(dev_priv, vma);
> +	if (IS_ERR(info->screen_base)) {
>   		DRM_ERROR("Failed to remap framebuffer into virtual memory\n");
> -		ret = -ENOSPC;
> +		ret = PTR_ERR(info->screen_base);
>   		goto out_destroy_fbi;
>   	}
>   	info->screen_size = vma->node.size;
>

* Re: [PATCH 136/190] drm/i915: Move ioremap_wc tracking onto VMA
  2016-02-11 13:20     ` Tvrtko Ursulin
@ 2016-02-11 13:29       ` Chris Wilson
  2016-02-11 14:10         ` Tvrtko Ursulin
  0 siblings, 1 reply; 263+ messages in thread
From: Chris Wilson @ 2016-02-11 13:29 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Thu, Feb 11, 2016 at 01:20:46PM +0000, Tvrtko Ursulin wrote:
> 
> 
> On 11/01/16 10:45, Chris Wilson wrote:
> >By tracking the iomapping on the VMA itself, we can share that area
> >between multiple users. Also by only revoking the iomapping upon
> >unbinding from the mappable portion of the GGTT, we can keep that iomap
> >across multiple invocations (e.g. execlists context pinning).
> 
> Reading between the lines and from some IRC discussion, it seems the
> goal of this is to fix an address space memory leak with fbcon?

The goal is to prevent an issue from hastily dropping iomappings (and
vmappings elsewhere) when unpinning contexts. That comes into play when
we track the ring->vma (not just tracking it as we do now, but being able
to rely on the vma being persistent). Fixing a leak on driver unload is an
interesting side-effect.
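
A rough sketch of the reuse this enables, assuming the i915_vma_iomap()
helper from the quoted patch (the function below is made up for
illustration):

static int touch_vma_twice(struct drm_i915_private *dev_priv,
			   struct i915_vma *vma)
{
	void *a, *b;

	a = i915_vma_iomap(dev_priv, vma);	/* first call: ioremap_wc() */
	if (IS_ERR(a))
		return PTR_ERR(a);

	b = i915_vma_iomap(dev_priv, vma);	/* cached vma->iomap, no remap */
	WARN_ON(a != b);

	/* The mapping persists until i915_vma_unbind() calls iounmap(). */
	return 0;
}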
 
> But I don't know fbdev, so I can't see who would do the unpin on the
> VMA to allow unbind to eventually unmap it?

That actually gets fixed in another patch when we teach intel_fbdev to
actually track the VMA it allocates, as right now we deliberately leak
it to keep the code simple.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 136/190] drm/i915: Move ioremap_wc tracking onto VMA
  2016-02-11 13:29       ` Chris Wilson
@ 2016-02-11 14:10         ` Tvrtko Ursulin
  2016-02-19 15:11           ` Chris Wilson
  0 siblings, 1 reply; 263+ messages in thread
From: Tvrtko Ursulin @ 2016-02-11 14:10 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 11/02/16 13:29, Chris Wilson wrote:
> On Thu, Feb 11, 2016 at 01:20:46PM +0000, Tvrtko Ursulin wrote:
>>
>>
>> On 11/01/16 10:45, Chris Wilson wrote:
>>> By tracking the iomapping on the VMA itself, we can share that area
>>> between multiple users. Also by only revoking the iomapping upon
>>> unbinding from the mappable portion of the GGTT, we can keep that iomap
>>> across multiple invocations (e.g. execlists context pinning).
>>
>> Reading between the lines and from some IRC discussion, it seems the
>> goal of this is to fix an address space memory leak with fbcon?
>
> The goal is to prevent an issue from hastily dropping iomappings (and
> vmappings elsewhere) when unpinning contexts. That comes into play when
> we track the ring->vma (not just tracking it as we do now, but being able
> to rely on the vma being persistent). Fixing a leak on driver unload is an
> interesting side-effect.
>
>> But I don't know fbdev, so I can't see who would do the unpin on the
>> VMA to allow unbind to eventually unmap it?
>
> That actually gets fixed in another patch when we teach intel_fbdev to
> actually track the VMA it allocates, as right now we deliberately leak
> it to keep the code simple.

Ok in that case ack on the patch.

I think it will need to assert that the VMA is pinned, plus some minor
changes when you rebase it on top of nightly as a standalone patch, so I
can properly review it then.

Regards,

Tvrtko


* Re: [PATCH 003/190] drm/i915: Add an optional selection from i915 of CONFIG_MMU_NOTIFIER
  2016-01-11  9:16 ` [PATCH 003/190] drm/i915: Add an optional selection from i915 of CONFIG_MMU_NOTIFIER Chris Wilson
@ 2016-02-17 12:59   ` Daniel Vetter
  0 siblings, 0 replies; 263+ messages in thread
From: Daniel Vetter @ 2016-02-17 12:59 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Mon, Jan 11, 2016 at 09:16:14AM +0000, Chris Wilson wrote:
> userptr requires mmu-notifier for full unprivileged support. Most
> systems have mmu-notifier support already enabled as a requirement for
> virtualisation support, but we should make the option for i915 to take
> advantage of mmu-notifiers explicit (and enable it by default so that
> regular userspace can take advantage of passing client memory to the
> GPU).
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>

Queued for -next, thanks for the patch.
-Daniel

> ---
>  drivers/gpu/drm/i915/Kconfig | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
> index fcd77b27514d..b979295aab82 100644
> --- a/drivers/gpu/drm/i915/Kconfig
> +++ b/drivers/gpu/drm/i915/Kconfig
> @@ -48,3 +48,14 @@ config DRM_I915_PRELIMINARY_HW_SUPPORT
>  	  option changes the default for that module option.
>  
>  	  If in doubt, say "N".
> +
> +config DRM_I915_USERPTR
> +	bool "Always enable userptr support"
> +	depends on DRM_I915
> +	select MMU_NOTIFIER
> +	default y
> +	help
> +	  This option selects CONFIG_MMU_NOTIFIER if it isn't already
> +	  selected, to enable full userptr support.
> +
> +	  If in doubt, say "Y".
> -- 
> 2.7.0.rc3
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH 088/190] drm/i915: Move execlists interrupt based submission to a bottom-half
  2016-01-11 10:44   ` [PATCH 088/190] drm/i915: Move execlists interrupt based submission to a bottom-half Chris Wilson
@ 2016-02-19 12:08     ` Tvrtko Ursulin
  2016-02-19 12:29       ` Chris Wilson
  0 siblings, 1 reply; 263+ messages in thread
From: Tvrtko Ursulin @ 2016-02-19 12:08 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


Hi,

On 11/01/16 10:44, Chris Wilson wrote:
> [  196.988204] clocksource: timekeeping watchdog: Marking clocksource 'tsc' as unstable because the skew is too large:
> [  196.988512] clocksource:                       'refined-jiffies' wd_now: ffff9b48 wd_last: ffff9acb mask: ffffffff
> [  196.988559] clocksource:                       'tsc' cs_now: 4fcfa84354 cs_last: 4f95425e98 mask: ffffffffffffffff
> [  196.992115] clocksource: Switched to clocksource refined-jiffies
>
> Followed by a hard lockup.

What does the above mean? Does the irq handler taking too long interfere
with timekeeping?

I like it BTW. Just the commit message needs more work. :)

What is the performance impact of just this patch in isolation? Is it
worth progressing on its own?

> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_debugfs.c     |   5 +-
>   drivers/gpu/drm/i915/i915_gem.c         |  15 +--
>   drivers/gpu/drm/i915/i915_irq.c         |   2 +-
>   drivers/gpu/drm/i915/intel_lrc.c        | 164 +++++++++++++++++---------------
>   drivers/gpu/drm/i915/intel_lrc.h        |   3 +-
>   drivers/gpu/drm/i915/intel_ringbuffer.h |   1 +
>   6 files changed, 98 insertions(+), 92 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 378bc73296aa..15a6fddfb79b 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -2094,7 +2094,6 @@ static int i915_execlists(struct seq_file *m, void *data)
>   	for_each_ring(ring, dev_priv, ring_id) {
>   		struct drm_i915_gem_request *head_req = NULL;
>   		int count = 0;
> -		unsigned long flags;
>
>   		seq_printf(m, "%s\n", ring->name);
>
> @@ -2121,12 +2120,12 @@ static int i915_execlists(struct seq_file *m, void *data)
>   				   i, status, ctx_id);
>   		}
>
> -		spin_lock_irqsave(&ring->execlist_lock, flags);
> +		spin_lock(&ring->execlist_lock);
>   		list_for_each(cursor, &ring->execlist_queue)
>   			count++;
>   		head_req = list_first_entry_or_null(&ring->execlist_queue,
>   				struct drm_i915_gem_request, execlist_link);
> -		spin_unlock_irqrestore(&ring->execlist_lock, flags);
> +		spin_unlock(&ring->execlist_lock);
>
>   		seq_printf(m, "\t%d requests in queue\n", count);
>   		if (head_req) {
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 391f840d29b7..eb875ecd7907 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2192,13 +2192,13 @@ static void i915_gem_reset_ring_cleanup(struct intel_engine_cs *engine)
>   	 */
>
>   	if (i915.enable_execlists) {
> -		spin_lock_irq(&engine->execlist_lock);
> +		spin_lock(&engine->execlist_lock);
>
>   		/* list_splice_tail_init checks for empty lists */
>   		list_splice_tail_init(&engine->execlist_queue,
>   				      &engine->execlist_retired_req_list);
>
> -		spin_unlock_irq(&engine->execlist_lock);
> +		spin_unlock(&engine->execlist_lock);
>   		intel_execlists_retire_requests(engine);
>   	}
>
> @@ -2290,15 +2290,8 @@ i915_gem_retire_requests(struct drm_device *dev)
>   	for_each_ring(ring, dev_priv, i) {
>   		i915_gem_retire_requests_ring(ring);
>   		idle &= list_empty(&ring->request_list);
> -		if (i915.enable_execlists) {
> -			unsigned long flags;
> -
> -			spin_lock_irqsave(&ring->execlist_lock, flags);
> -			idle &= list_empty(&ring->execlist_queue);
> -			spin_unlock_irqrestore(&ring->execlist_lock, flags);
> -
> -			intel_execlists_retire_requests(ring);
> -		}
> +		if (i915.enable_execlists)
> +			idle &= intel_execlists_retire_requests(ring);
>   	}
>
>   	if (idle)
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index ce047ac84f5f..b2ef2d0c211b 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -1316,7 +1316,7 @@ gen8_cs_irq_handler(struct intel_engine_cs *ring, u32 iir, int test_shift)
>   	if (iir & (GT_RENDER_USER_INTERRUPT << test_shift))
>   		notify_ring(ring);
>   	if (iir & (GT_CONTEXT_SWITCH_INTERRUPT << test_shift))
> -		intel_lrc_irq_handler(ring);
> +		wake_up_process(ring->execlists_submit);
>   }
>
>   static irqreturn_t gen8_gt_irq_handler(struct drm_i915_private *dev_priv,
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index b5f62b5f4913..de5889e95d6d 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -132,6 +132,8 @@
>    *
>    */
>
> +#include <linux/kthread.h>
> +
>   #include <drm/drmP.h>
>   #include <drm/i915_drm.h>
>   #include "i915_drv.h"
> @@ -341,7 +343,7 @@ static void execlists_elsp_write(struct drm_i915_gem_request *rq0,
>   	rq0->elsp_submitted++;
>
>   	/* You must always write both descriptors in the order below. */
> -	spin_lock(&dev_priv->uncore.lock);
> +	spin_lock_irq(&dev_priv->uncore.lock);
>   	intel_uncore_forcewake_get__locked(dev_priv, FORCEWAKE_ALL);
>   	I915_WRITE_FW(RING_ELSP(engine), upper_32_bits(desc[1]));
>   	I915_WRITE_FW(RING_ELSP(engine), lower_32_bits(desc[1]));
> @@ -353,7 +355,7 @@ static void execlists_elsp_write(struct drm_i915_gem_request *rq0,
>   	/* ELSP is a wo register, use another nearby reg for posting */
>   	POSTING_READ_FW(RING_EXECLIST_STATUS_LO(engine));
>   	intel_uncore_forcewake_put__locked(dev_priv, FORCEWAKE_ALL);
> -	spin_unlock(&dev_priv->uncore.lock);
> +	spin_unlock_irq(&dev_priv->uncore.lock);
>   }
>
>   static int execlists_update_context(struct drm_i915_gem_request *rq)
> @@ -492,89 +494,84 @@ static bool execlists_check_remove_request(struct intel_engine_cs *ring,
>   	return false;
>   }
>
> -static void get_context_status(struct intel_engine_cs *ring,
> -			       u8 read_pointer,
> -			       u32 *status, u32 *context_id)
> +static void set_rtpriority(void)
>   {
> -	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> -
> -	if (WARN_ON(read_pointer >= GEN8_CSB_ENTRIES))
> -		return;
> -
> -	*status = I915_READ(RING_CONTEXT_STATUS_BUF_LO(ring, read_pointer));
> -	*context_id = I915_READ(RING_CONTEXT_STATUS_BUF_HI(ring, read_pointer));
> +	 struct sched_param param = { .sched_priority = MAX_USER_RT_PRIO/2-1 };
> +	 sched_setscheduler_nocheck(current, SCHED_FIFO, &param);
>   }
>
> -/**
> - * intel_lrc_irq_handler() - handle Context Switch interrupts
> - * @ring: Engine Command Streamer to handle.
> - *
> - * Check the unread Context Status Buffers and manage the submission of new
> - * contexts to the ELSP accordingly.
> - */
> -void intel_lrc_irq_handler(struct intel_engine_cs *ring)
> +static int intel_execlists_submit(void *arg)
>   {
> -	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> -	u32 status_pointer;
> -	u8 read_pointer;
> -	u8 write_pointer;
> -	u32 status = 0;
> -	u32 status_id;
> -	u32 submit_contexts = 0;
> +	struct intel_engine_cs *ring = arg;
> +	struct drm_i915_private *dev_priv = ring->i915;
>
> -	status_pointer = I915_READ(RING_CONTEXT_STATUS_PTR(ring));
> +	set_rtpriority();
>
> -	read_pointer = ring->next_context_status_buffer;
> -	write_pointer = GEN8_CSB_WRITE_PTR(status_pointer);
> -	if (read_pointer > write_pointer)
> -		write_pointer += GEN8_CSB_ENTRIES;
> +	do {
> +		u32 status;
> +		u32 status_id;
> +		u32 submit_contexts;
> +		u8 head, tail;
>
> -	spin_lock(&ring->execlist_lock);
> +		set_current_state(TASK_INTERRUPTIBLE);

Hm, what is the effect of setting TASK_INTERRUPTIBLE at this stage 
rather than just before the call to schedule?

And why interruptible?

> +		head = ring->next_context_status_buffer;
> +		tail = I915_READ(RING_CONTEXT_STATUS_PTR(ring)) & GEN8_CSB_PTR_MASK;
> +		if (head == tail) {
> +			if (kthread_should_stop())
> +				return 0;
>
> -	while (read_pointer < write_pointer) {
> +			schedule();
> +			continue;
> +		}
> +		__set_current_state(TASK_RUNNING);
>
> -		get_context_status(ring, ++read_pointer % GEN8_CSB_ENTRIES,
> -				   &status, &status_id);
> +		if (head > tail)
> +			tail += GEN8_CSB_ENTRIES;
>
> -		if (status & GEN8_CTX_STATUS_IDLE_ACTIVE)
> -			continue;
> +		status = 0;
> +		submit_contexts = 0;
>
> -		if (status & GEN8_CTX_STATUS_PREEMPTED) {
> -			if (status & GEN8_CTX_STATUS_LITE_RESTORE) {
> -				if (execlists_check_remove_request(ring, status_id))
> -					WARN(1, "Lite Restored request removed from queue\n");
> -			} else
> -				WARN(1, "Preemption without Lite Restore\n");
> -		}
> +		spin_lock(&ring->execlist_lock);
>
> -		if ((status & GEN8_CTX_STATUS_ACTIVE_IDLE) ||
> -		    (status & GEN8_CTX_STATUS_ELEMENT_SWITCH)) {
> -			if (execlists_check_remove_request(ring, status_id))
> -				submit_contexts++;
> -		}
> -	}
> +		while (head++ < tail) {
> +			status = I915_READ(RING_CONTEXT_STATUS_BUF_LO(ring, head % GEN8_CSB_ENTRIES));
> +			status_id = I915_READ(RING_CONTEXT_STATUS_BUF_HI(ring, head % GEN8_CSB_ENTRIES));

One potentially cheap improvement I've been thinking of is to move the 
CSB reading outside the execlist_lock, to avoid slow MMIO contending the 
lock.

We could fetch all valid entries into a temporary local buffer, then 
grab the execlist_lock and process completions and submission from there.
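
Something along these lines, perhaps (rough sketch only;
process_csb_entry() is a made-up stand-in for the completion checks in
the loop above):

u32 csb[GEN8_CSB_ENTRIES][2];
unsigned int i, n = 0;

while (head++ < tail) {	/* MMIO reads done without the lock held */
	csb[n][0] = I915_READ(RING_CONTEXT_STATUS_BUF_LO(ring, head % GEN8_CSB_ENTRIES));
	csb[n][1] = I915_READ(RING_CONTEXT_STATUS_BUF_HI(ring, head % GEN8_CSB_ENTRIES));
	n++;
}

spin_lock(&ring->execlist_lock);
for (i = 0; i < n; i++)
	process_csb_entry(ring, csb[i][0], csb[i][1]);
spin_unlock(&ring->execlist_lock);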

> -	if (disable_lite_restore_wa(ring)) {
> -		/* Prevent a ctx to preempt itself */
> -		if ((status & GEN8_CTX_STATUS_ACTIVE_IDLE) &&
> -		    (submit_contexts != 0))
> -			execlists_context_unqueue(ring);
> -	} else if (submit_contexts != 0) {
> -		execlists_context_unqueue(ring);
> -	}
> +			if (status & GEN8_CTX_STATUS_IDLE_ACTIVE)
> +				continue;
>
> -	spin_unlock(&ring->execlist_lock);
> +			if (status & GEN8_CTX_STATUS_PREEMPTED) {
> +				if (status & GEN8_CTX_STATUS_LITE_RESTORE) {
> +					if (execlists_check_remove_request(ring, status_id))
> +						WARN(1, "Lite Restored request removed from queue\n");
> +				} else
> +					WARN(1, "Preemption without Lite Restore\n");
> +			}
>
> -	if (unlikely(submit_contexts > 2))
> -		DRM_ERROR("More than two context complete events?\n");
> +			if ((status & GEN8_CTX_STATUS_ACTIVE_IDLE) ||
> +			    (status & GEN8_CTX_STATUS_ELEMENT_SWITCH)) {
> +				if (execlists_check_remove_request(ring, status_id))
> +					submit_contexts++;
> +			}
> +		}
> +
> +		if (disable_lite_restore_wa(ring)) {
> +			/* Prevent a ctx to preempt itself */
> +			if ((status & GEN8_CTX_STATUS_ACTIVE_IDLE) &&
> +					(submit_contexts != 0))
> +				execlists_context_unqueue(ring);
> +		} else if (submit_contexts != 0) {
> +			execlists_context_unqueue(ring);
> +		}
>
> -	ring->next_context_status_buffer = write_pointer % GEN8_CSB_ENTRIES;
> +		spin_unlock(&ring->execlist_lock);

Would it be worth trying to grab the mutex and unpin the LRCs at this
point? It would slow down the thread a bit but would get rid of the
retired work queue. With the vma->iomap thingy it could be quite cheap?

> -	/* Update the read pointer to the old write pointer. Manual ringbuffer
> -	 * management ftw </sarcasm> */
> -	I915_WRITE(RING_CONTEXT_STATUS_PTR(ring),
> -		   _MASKED_FIELD(GEN8_CSB_READ_PTR_MASK,
> -				 ring->next_context_status_buffer << 8));
> +		WARN(submit_contexts > 2, "More than two context complete events?\n");
> +		ring->next_context_status_buffer = tail % GEN8_CSB_ENTRIES;
> +		I915_WRITE(RING_CONTEXT_STATUS_PTR(ring),
> +			   _MASKED_FIELD(GEN8_CSB_PTR_MASK << 8,
> +					 ring->next_context_status_buffer<<8));
> +	} while (1);
>   }
>
>   static int execlists_context_queue(struct drm_i915_gem_request *request)
> @@ -585,7 +582,7 @@ static int execlists_context_queue(struct drm_i915_gem_request *request)
>
>   	i915_gem_request_get(request);
>
> -	spin_lock_irq(&engine->execlist_lock);
> +	spin_lock(&engine->execlist_lock);
>
>   	list_for_each_entry(cursor, &engine->execlist_queue, execlist_link)
>   		if (++num_elements > 2)
> @@ -611,7 +608,7 @@ static int execlists_context_queue(struct drm_i915_gem_request *request)
>   	if (num_elements == 0)
>   		execlists_context_unqueue(engine);
>
> -	spin_unlock_irq(&engine->execlist_lock);
> +	spin_unlock(&engine->execlist_lock);

Hm, another thing which could be quite cool is if we could just wake the
thread here and let it submit the request, instead of doing it directly
from a third-party context.

That would make all ELSP accesses serialized for free since they would 
only be happening from a single thread. And potentially could reduce the 
scope of the lock even more.

But it would mean extra latency when transitioning the idle engine to 
busy. Maybe it wouldn't matter for such workloads.

Regards,

Tvrtko

* Re: [PATCH 088/190] drm/i915: Move execlists interrupt based submission to a bottom-half
  2016-02-19 12:08     ` Tvrtko Ursulin
@ 2016-02-19 12:29       ` Chris Wilson
  2016-02-19 14:10         ` Tvrtko Ursulin
  0 siblings, 1 reply; 263+ messages in thread
From: Chris Wilson @ 2016-02-19 12:29 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Fri, Feb 19, 2016 at 12:08:14PM +0000, Tvrtko Ursulin wrote:
> 
> Hi,
> 
> On 11/01/16 10:44, Chris Wilson wrote:
> >[  196.988204] clocksource: timekeeping watchdog: Marking clocksource 'tsc' as unstable because the skew is too large:
> >[  196.988512] clocksource:                       'refined-jiffies' wd_now: ffff9b48 wd_last: ffff9acb mask: ffffffff
> >[  196.988559] clocksource:                       'tsc' cs_now: 4fcfa84354 cs_last: 4f95425e98 mask: ffffffffffffffff
> >[  196.992115] clocksource: Switched to clocksource refined-jiffies
> >
> >Followed by a hard lockup.
> 
> What does the above mean? Does the irq handler taking too long
> interfere with timekeeping?

That's exactly what it means: we run for too long in interrupt context
(i.e. with interrupts disabled).
 
> I like it BTW. Just the commit message needs more work. :)
> 
> What is the performance impact of just this patch in isolation? Is it
> worth progressing on its own?

I only looked for regressions, which I didn't find. It fixes a machine
freeze/panic, so I wasn't looking for any other reason to justify the
patch!

> >-void intel_lrc_irq_handler(struct intel_engine_cs *ring)
> >+static int intel_execlists_submit(void *arg)
> >  {
> >-	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> >-	u32 status_pointer;
> >-	u8 read_pointer;
> >-	u8 write_pointer;
> >-	u32 status = 0;
> >-	u32 status_id;
> >-	u32 submit_contexts = 0;
> >+	struct intel_engine_cs *ring = arg;
> >+	struct drm_i915_private *dev_priv = ring->i915;
> >
> >-	status_pointer = I915_READ(RING_CONTEXT_STATUS_PTR(ring));
> >+	set_rtpriority();
> >
> >-	read_pointer = ring->next_context_status_buffer;
> >-	write_pointer = GEN8_CSB_WRITE_PTR(status_pointer);
> >-	if (read_pointer > write_pointer)
> >-		write_pointer += GEN8_CSB_ENTRIES;
> >+	do {
> >+		u32 status;
> >+		u32 status_id;
> >+		u32 submit_contexts;
> >+		u8 head, tail;
> >
> >-	spin_lock(&ring->execlist_lock);
> >+		set_current_state(TASK_INTERRUPTIBLE);
> 
> Hm, what is the effect of setting TASK_INTERRUPTIBLE at this stage
> rather than just before the call to schedule?

We need to set the task state before doing the checks, so that we do not
miss a wakeup from the interrupt handler. Otherwise, we may check the
register, decide there is nothing to do, be interrupted by the irq
handler which then sets us to TASK_RUNNING, and then proceed with going
to sleep by setting TASK_INTERRUPTIBLE (and so miss that the
context-switch had just occurred, sleeping forever).
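
In sketch form, the body of the thread loop is (csb_pending() and
process_csb() are made-up stand-ins for the head/tail comparison and the
buffer drain in the patch):

do {
	set_current_state(TASK_INTERRUPTIBLE);	/* publish intent to sleep */
	if (csb_pending(ring)) {		/* ...then (re)check */
		__set_current_state(TASK_RUNNING);
		process_csb(ring);
		continue;
	}
	if (kthread_should_stop())
		return 0;
	schedule();	/* a wakeup racing with the check is not lost */
} while (1);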
 
> And why interruptible?

To hide ourselves from contributing to the system load and appear as
sleeping (rather than blocked) in the process lists.

> >+		while (head++ < tail) {
> >+			status = I915_READ(RING_CONTEXT_STATUS_BUF_LO(ring, head % GEN8_CSB_ENTRIES));
> >+			status_id = I915_READ(RING_CONTEXT_STATUS_BUF_HI(ring, head % GEN8_CSB_ENTRIES));
> 
> One potentially cheap improvement I've been thinking of is to move
> the CSB reading outside the execlist_lock, to avoid slow MMIO
> contending the lock.

Yes, that should be a small but noticeable improvement.
 
> We could fetch all valid entries into a temporary local buffer, then
> grab the execlist_lock and process completions and submission from
> there.

If you look at the next patch, you will see that's what I did do :)

> >-	ring->next_context_status_buffer = write_pointer % GEN8_CSB_ENTRIES;
> >+		spin_unlock(&ring->execlist_lock);
> 
> Would it be worth trying to grab the mutex and unpin the LRCs at
> this point? It would slow down the thread a bit but would get rid of
> the retired work queue. With the vma->iomap thingy it could be quite
> cheap?

Exactly, we want the iomap/vmap caching thingy first :) But the
retired work queue disappears as a fallout of your previous-context idea
anyway, plus the fix to avoid the struct_mutex when freeing requests.

> >-	/* Update the read pointer to the old write pointer. Manual ringbuffer
> >-	 * management ftw </sarcasm> */
> >-	I915_WRITE(RING_CONTEXT_STATUS_PTR(ring),
> >-		   _MASKED_FIELD(GEN8_CSB_READ_PTR_MASK,
> >-				 ring->next_context_status_buffer << 8));
> >+		WARN(submit_contexts > 2, "More than two context complete events?\n");
> >+		ring->next_context_status_buffer = tail % GEN8_CSB_ENTRIES;
> >+		I915_WRITE(RING_CONTEXT_STATUS_PTR(ring),
> >+			   _MASKED_FIELD(GEN8_CSB_PTR_MASK << 8,
> >+					 ring->next_context_status_buffer<<8));
> >+	} while (1);
> >  }
> >
> >  static int execlists_context_queue(struct drm_i915_gem_request *request)
> >@@ -585,7 +582,7 @@ static int execlists_context_queue(struct drm_i915_gem_request *request)
> >
> >  	i915_gem_request_get(request);
> >
> >-	spin_lock_irq(&engine->execlist_lock);
> >+	spin_lock(&engine->execlist_lock);
> >
> >  	list_for_each_entry(cursor, &engine->execlist_queue, execlist_link)
> >  		if (++num_elements > 2)
> >@@ -611,7 +608,7 @@ static int execlists_context_queue(struct drm_i915_gem_request *request)
> >  	if (num_elements == 0)
> >  		execlists_context_unqueue(engine);
> >
> >-	spin_unlock_irq(&engine->execlist_lock);
> >+	spin_unlock(&engine->execlist_lock);
> 
> Hm, another thing which could be quite cool is if here we could just
> wake the thread and let it submit the request instead of doing it
> directly from 3rd party context.

Yes, this is something I played around with, and my conclusion was that
the extra overhead of calling ttwu (try_to_wake_up) hurt the majority of
workloads more than it helped the few that benefitted.
 
> That would make all ELSP accesses serialized for free since they
> would only be happening from a single thread. And potentially could
> reduce the scope of the lock even more.
> 
> But it would mean extra latency when transitioning the idle engine
> to busy. Maybe it wouldn't matter for such workloads.

Yup. I saw greater improvement from reducing the overhead along the
execlists context-switch handling path than along the request-add path.

There is certainly plenty of scope for improvement though.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: [PATCH 088/190] drm/i915: Move execlists interrupt based submission to a bottom-half
  2016-02-19 12:29       ` Chris Wilson
@ 2016-02-19 14:10         ` Tvrtko Ursulin
  2016-02-19 14:34           ` Chris Wilson
  2016-02-19 14:41           ` Chris Wilson
  0 siblings, 2 replies; 263+ messages in thread
From: Tvrtko Ursulin @ 2016-02-19 14:10 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 19/02/16 12:29, Chris Wilson wrote:
> On Fri, Feb 19, 2016 at 12:08:14PM +0000, Tvrtko Ursulin wrote:
>>
>> Hi,
>>
>> On 11/01/16 10:44, Chris Wilson wrote:
>>> [  196.988204] clocksource: timekeeping watchdog: Marking clocksource 'tsc' as unstable because the skew is too large:
>>> [  196.988512] clocksource:                       'refined-jiffies' wd_now: ffff9b48 wd_last: ffff9acb mask: ffffffff
>>> [  196.988559] clocksource:                       'tsc' cs_now: 4fcfa84354 cs_last: 4f95425e98 mask: ffffffffffffffff
>>> [  196.992115] clocksource: Switched to clocksource refined-jiffies
>>>
>>> Followed by a hard lockup.
>>
>> What does the above mean? Irq handler taking too long interferes
>> with time keeping ?
>
> That's exactly what it means, we run for too long in interrupt context
> (i.e. with interrupts disabled).

Okay, just please spell it out in the commit.

>> I like it BTW. Just the commit message needs more work. :)
>>
>> How is performance impact with just this patch in isolation? Worth
>> progressing it on its own?
>
> I only looked for regressions, which I didn't find. It fixes a machine
> freeze/panic, so I wasn't looking for any other reason to justify the
> patch!

Then both of the above also need to be documented in the commit message.

>>> -void intel_lrc_irq_handler(struct intel_engine_cs *ring)
>>> +static int intel_execlists_submit(void *arg)
>>>   {
>>> -	struct drm_i915_private *dev_priv = ring->dev->dev_private;
>>> -	u32 status_pointer;
>>> -	u8 read_pointer;
>>> -	u8 write_pointer;
>>> -	u32 status = 0;
>>> -	u32 status_id;
>>> -	u32 submit_contexts = 0;
>>> +	struct intel_engine_cs *ring = arg;
>>> +	struct drm_i915_private *dev_priv = ring->i915;
>>>
>>> -	status_pointer = I915_READ(RING_CONTEXT_STATUS_PTR(ring));
>>> +	set_rtpriority();
>>>
>>> -	read_pointer = ring->next_context_status_buffer;
>>> -	write_pointer = GEN8_CSB_WRITE_PTR(status_pointer);
>>> -	if (read_pointer > write_pointer)
>>> -		write_pointer += GEN8_CSB_ENTRIES;
>>> +	do {
>>> +		u32 status;
>>> +		u32 status_id;
>>> +		u32 submit_contexts;
>>> +		u8 head, tail;
>>>
>>> -	spin_lock(&ring->execlist_lock);
>>> +		set_current_state(TASK_INTERRUPTIBLE);
>>
>> Hm, what is the effect of setting TASK_INTERRUPTIBLE at this stage
>> rather than just before the call to schedule?
>
> We need to set the task state before doing the checks, so that we do not
> miss a wakeup from the interrupt handler. Otherwise, we may check the
> register, decide there is nothing to do, be interrupted by the irq
> handler which then sets us to TASK_RUNNING, and then proceed with going
> to sleep by setting TASK_INTERRUPTIBLE (and so missing that the
> context-switch had just occurred and sleep forever).

Doh of course - haven't been exposed to this for a while.

>> And why interruptible?
>
> To hide ourselves from contributing to the system load and appear as
> sleeping (rather than blocked) in the process lists.

Oh yes, definitely.

>>> +		while (head++ < tail) {
>>> +			status = I915_READ(RING_CONTEXT_STATUS_BUF_LO(ring, head % GEN8_CSB_ENTRIES));
>>> +			status_id = I915_READ(RING_CONTEXT_STATUS_BUF_HI(ring, head % GEN8_CSB_ENTRIES));
>>
>> One potentially cheap improvement I've been thinking of is to move
>> the CSB reading outside the execlist_lock, to avoid slow MMIO
>> contending the lock.
>
> Yes, that should be a small but noticeable improvement.
>
>> We could fetch all valid entries into a temporary local buffer, then
>> grab the execlist_lock and process completions and submission from
>> there.
>
> If you look at the next patch, you will see that's what I did do :)

I have just about started to understand (or pretend to understand) the 
current code. I don't dare to look at a complete rewrite.

So maybe start with this one and go with small incremental changes?

>>> -	ring->next_context_status_buffer = write_pointer % GEN8_CSB_ENTRIES;
>>> +		spin_unlock(&ring->execlist_lock);
>>
>> Would it be worth trying to grab the mutex and unpin the LRCs at
>> this point? It would slow down the thread a bit but would get rid of
>> the retired work queue. With the vma->iomap thingy could be quite
>> cheap?
>
> Exactly, we want the iomap/vmap caching thingy first :) But the
> retired work queue disappears as a fallout of your previous-context idea
> anyway plus the fix to avoid the struct_mutex when freeing requests.

I did not get that working yet. I think it needs N previous contexts 
pinned in a request, where N is equal to the CSB size, since the most 
pessimistic assumption is that we could get that many context-complete 
events in a single interrupt.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: [PATCH 088/190] drm/i915: Move execlists interrupt based submission to a bottom-half
  2016-02-19 14:10         ` Tvrtko Ursulin
@ 2016-02-19 14:34           ` Chris Wilson
  2016-02-19 14:52             ` Tvrtko Ursulin
  2016-02-19 14:41           ` Chris Wilson
  1 sibling, 1 reply; 263+ messages in thread
From: Chris Wilson @ 2016-02-19 14:34 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Fri, Feb 19, 2016 at 02:10:44PM +0000, Tvrtko Ursulin wrote:
> On 19/02/16 12:29, Chris Wilson wrote:
> >Exactly, we want the iomap/vmap caching thingy first :) But the
> >retired work queue disappears as a fallout of your previous-context idea
> >anyway plus the fix to avoid the struct_mutex when freeing requests.
> 
> I did not get that working yet. I think it needs N previous contexts
> pinned in a request, where N is equal to the CSB size, since the most
> pessimistic assumption is that we could get that many context-complete
> events in a single interrupt.

The completion order is still the same as our execution order (it has
to be, otherwise it violates our serialisation rules), so in the worst
case you only need to keep the engine->last_context pinned along with
active references on the outstanding contexts (and those active
references are held by the following request).
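
As a sketch of that scheme (the previous_context field and its
management here are assumptions for illustration, not lifted from the
series):

	/* On submission: the new request keeps the context we are
	 * switching away from alive until the request is retired.
	 */
	request->previous_context = engine->last_context;
	if (request->previous_context)
		i915_gem_context_reference(request->previous_context);
	engine->last_context = request->ctx;

	/* On retirement: the hardware has long since switched away,
	 * so the extra reference can be dropped.
	 */
	if (request->previous_context)
		i915_gem_context_unreference(request->previous_context);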
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: [PATCH 088/190] drm/i915: Move execlists interrupt based submission to a bottom-half
  2016-02-19 14:10         ` Tvrtko Ursulin
  2016-02-19 14:34           ` Chris Wilson
@ 2016-02-19 14:41           ` Chris Wilson
  1 sibling, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-02-19 14:41 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Fri, Feb 19, 2016 at 02:10:44PM +0000, Tvrtko Ursulin wrote:
> 
> On 19/02/16 12:29, Chris Wilson wrote:
> >On Fri, Feb 19, 2016 at 12:08:14PM +0000, Tvrtko Ursulin wrote:
> >>
> >>Hi,
> >>
> >>On 11/01/16 10:44, Chris Wilson wrote:
> >>>[  196.988204] clocksource: timekeeping watchdog: Marking clocksource 'tsc' as unstable because the skew is too large:
> >>>[  196.988512] clocksource:                       'refined-jiffies' wd_now: ffff9b48 wd_last: ffff9acb mask: ffffffff
> >>>[  196.988559] clocksource:                       'tsc' cs_now: 4fcfa84354 cs_last: 4f95425e98 mask: ffffffffffffffff
> >>>[  196.992115] clocksource: Switched to clocksource refined-jiffies
> >>>
> >>>Followed by a hard lockup.
> >>
> >>What does the above mean? Irq handler taking too long interferes
> >>with time keeping ?
> >
> >That's exactly what it means, we run for too long in interrupt context
> >(i.e. with interrupts disabled).
> 
> Okay, just please spell it out in the commit.
> 
> >>I like it BTW. Just the commit message needs more work. :)
> >>
> >>How is performance impact with just this patch in isolation? Worth
> >>progressing it on its own?
> >
> >I only looked for regressions, which I didn't find. It fixes a machine
> >freeze/panic, so I wasn't looking for any other reason to justify the
> >patch!
> 
> Then both of the above also need to be documented in the commit message.

Hah, one thing I just rediscovered was that the benchmarks for this kill
the machine without the patch.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: [PATCH 088/190] drm/i915: Move execlists interrupt based submission to a bottom-half
  2016-02-19 14:34           ` Chris Wilson
@ 2016-02-19 14:52             ` Tvrtko Ursulin
  2016-02-19 15:02               ` Chris Wilson
  0 siblings, 1 reply; 263+ messages in thread
From: Tvrtko Ursulin @ 2016-02-19 14:52 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 19/02/16 14:34, Chris Wilson wrote:
> On Fri, Feb 19, 2016 at 02:10:44PM +0000, Tvrtko Ursulin wrote:
>> On 19/02/16 12:29, Chris Wilson wrote:
>>> Exactly, we want the iomap/vmap caching thingy first :) But the
>>> retired work queue disappears as a fallout of your previous-context idea
>>> anyway plus the fix to avoid the struct_mutex when freeing requests.
>>
>> I did not get that working yet. I think it needs N previous contexts
>> pinned in a request, where N is equal to the CSB size, since the most
>> pessimistic assumption is that we could get that many context-complete
>> events in a single interrupt.
>
> The completion order is still the same as our execution order (it has
> to be, otherwise it violates our serialisation rules), so in the worst
> case you only need to keep the engine->last_context pinned along with
> active references on the outstanding contexts (and those active
> references are held by the following request).

Yes, the completion order is the same, but I thought the seqno in the 
HWS could theoretically run ahead of the context-complete events by the 
size of the CSB.

The theory is that the GPU could have completed N contexts and stuffed 
the notifications into the CSB, and then the actual interrupt got 
coalesced, delayed, or something. At least I thought this was what I 
was observing; not 100% sure.

But of course it was only a problem in the current code, where I tried 
to unpin the previous contexts from the seqno-based timeline.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: [PATCH 088/190] drm/i915: Move execlists interrupt based submission to a bottom-half
  2016-02-19 14:52             ` Tvrtko Ursulin
@ 2016-02-19 15:02               ` Chris Wilson
  0 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-02-19 15:02 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Fri, Feb 19, 2016 at 02:52:18PM +0000, Tvrtko Ursulin wrote:
> 
> On 19/02/16 14:34, Chris Wilson wrote:
> >On Fri, Feb 19, 2016 at 02:10:44PM +0000, Tvrtko Ursulin wrote:
> >>On 19/02/16 12:29, Chris Wilson wrote:
> >>>Exactly, we want the iomap/vmap caching thingy first :) But the
> >>>retired work queue disappears as a fallout of your previous-context idea
> >>>anyway plus the fix to avoid the struct_mutex when freeing requests.
> >>
> >>I did not get that working yet. I think it needs N previous contexts
> >>pinned in a request, where N is equal to the CSB size, since the most
> >>pessimistic assumption is that we could get that many context-complete
> >>events in a single interrupt.
> >
> >The completion order is still the same as our execution order (it has
> >to be, otherwise it violates our serialisation rules), so in the worst
> >case you only need to keep the engine->last_context pinned along with
> >active references on the outstanding contexts (and those active
> >references are held by the following request).
> 
> Yes, the completion order is the same, but I thought the seqno in the
> HWS could theoretically run ahead of the context-complete events by
> the size of the CSB.
> 
> The theory is that the GPU could have completed N contexts and stuffed
> the notifications into the CSB, and then the actual interrupt got
> coalesced, delayed, or something. At least I thought this was what I
> was observing; not 100% sure.

Aiui, the issue is where we unpin (because of breadcrumb completion)
before the HW completes (finishes writing out the context). For legacy,
we rely on MI_SET_CONTEXT being a serialising instruction (i.e.
all writes to the old context are completed before execution continues)
and so only need to keep the old context alive until the request
containing the MI_SET_CONTEXT away from it is complete. For the same
technique to be applicable to execlists, we just need the same guarantee
that the hardware will not execute from the next context until it has
finished saving state from the previous context.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: [PATCH 136/190] drm/i915: Move ioremap_wc tracking onto VMA
  2016-02-11 14:10         ` Tvrtko Ursulin
@ 2016-02-19 15:11           ` Chris Wilson
  2016-02-22 15:29             ` Tvrtko Ursulin
  0 siblings, 1 reply; 263+ messages in thread
From: Chris Wilson @ 2016-02-19 15:11 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Thu, Feb 11, 2016 at 02:10:19PM +0000, Tvrtko Ursulin wrote:
> 
> On 11/02/16 13:29, Chris Wilson wrote:
> >On Thu, Feb 11, 2016 at 01:20:46PM +0000, Tvrtko Ursulin wrote:
> >>
> >>
> >>On 11/01/16 10:45, Chris Wilson wrote:
> >>>By tracking the iomapping on the VMA itself, we can share that area
> >>>between multiple users. Also by only revoking the iomapping upon
> >>>unbinding from the mappable portion of the GGTT, we can keep that iomap
> >>>across multiple invocations (e.g. execlists context pinning).
> >>
> >>Between the lines and from some IRC discussion it seems the goal of
> >>this is to fix an address space memory leak with fbcon?
> >
> >The goal is to prevent an issue from hastily dropping iomappings (and
> >vmappings elsewhere) when unpinning contexts. That comes into play when
> >we track the ring->vma (not just track ring->vma as we do now, but can
> >rely on the vma being persistent). Fixing a leak on driver unload is an
> >interesting side-effect.
> >
> >>But I don't know fbdev so can't find who would do the unpin on the
> >>VMA to allow unbind to eventually unmap it?
> >
> >That actually gets fixed in another patch when we teach intel_fbdev to
> >actually track the VMA it allocates, as right now we deliberately leak
> >it to keep the code simple.
> 
> Ok in that case ack on the patch.
> 
> It will need to assert on the VMA being pinned, I think, and some
> minor changes when you rebase it on top of nightly as a standalone
> patch, so I can properly review it then.

Had some more fun recently where the iomap was showing up again, and
realised that for 64b systems we can keep a pointer to inside our
dev_priv->gtt.mappable iomapping. Makes the patch a little more ugly
(requires #ifdeffery) but eliminates all of the runtime iomapping
overhead!
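
A sketch of the trick (it assumes the mappable aperture is kept
permanently ioremapped on 64b, so an iomap is just base plus offset):

	void __iomem *ptr;

#if BITS_PER_LONG == 64
	/* Point into the existing aperture mapping: no runtime
	 * ioremap_wc() at all.
	 */
	ptr = (void __iomem *)dev_priv->gtt.mappable + vma->node.start;
#else
	/* 32b cannot afford to map the whole aperture permanently */
	ptr = ioremap_wc(dev_priv->gtt.mappable_base + vma->node.start,
			 vma->node.size);
#endif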

However, that wasn't enough for the test case as the limitation to the
mappable aperture was too severe...
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: [PATCH 136/190] drm/i915: Move ioremap_wc tracking onto VMA
  2016-02-19 15:11           ` Chris Wilson
@ 2016-02-22 15:29             ` Tvrtko Ursulin
  2016-02-23 10:21               ` Chris Wilson
  0 siblings, 1 reply; 263+ messages in thread
From: Tvrtko Ursulin @ 2016-02-22 15:29 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 19/02/16 15:11, Chris Wilson wrote:
> On Thu, Feb 11, 2016 at 02:10:19PM +0000, Tvrtko Ursulin wrote:
>>
>> On 11/02/16 13:29, Chris Wilson wrote:
>>> On Thu, Feb 11, 2016 at 01:20:46PM +0000, Tvrtko Ursulin wrote:
>>>>
>>>>
>>>> On 11/01/16 10:45, Chris Wilson wrote:
>>>>> By tracking the iomapping on the VMA itself, we can share that area
>>>>> between multiple users. Also by only revoking the iomapping upon
>>>>> unbinding from the mappable portion of the GGTT, we can keep that iomap
>>>>> across multiple invocations (e.g. execlists context pinning).
>>>>
>>>> Between the lines and from some IRC discussion it seems the goal of
>>>> this is to fix an address space memory leak with fbcon?
>>>
>>> The goal is to prevent an issue from hastily dropping iomappings (and
>>> vmappings elsewhere) when unpinning contexts. That comes into play when
>>> we track the ring->vma (not just track ring->vma as we do now, but can
>>> rely on the vma being persistent). Fixing a leak on driver unload is an
>>> interesting side-effect.
>>>
>>>> But I don't know fbdev so can't find who would do the unpin on the
>>>> VMA to allow unbind to eventually unmap it?
>>>
>>> That actually gets fixed in another patch when we teach intel_fbdev to
>>> actually track the VMA it allocates, as right now we deliberately leak
>>> it to keep the code simple.
>>
>> Ok in that case ack on the patch.
>>
>> It will need to assert on the VMA being pinned, I think, and some
>> minor changes when you rebase it on top of nightly as a standalone
>> patch, so I can properly review it then.
>
> Had some more fun recently where the iomap was showing up again, and
> realised that for 64b systems we can keep a pointer to inside our
> dev_priv->gtt.mappable iomapping. Makes the patch a little more ugly
> (requires #ifdeffery) but eliminates all of the runtime iomapping
> overhead!
>
> However, that wasn't enough for the test case as the limitation to the
> mappable aperture was too severe...

Could we use kmap then and not go through the aperture? I had a patch 
with similar semantics to your vma->iomap somewhere which, to start 
with, adds the ability to kmap one page. Or it could map whole objects 
for simplicity, which for LRCs should be OK.
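
For the single-page case, something like this sketch (n being whichever
page we want; i915_gem_object_get_page() looks up the backing page):

	struct page *page = i915_gem_object_get_page(obj, n);
	void *vaddr = kmap(page);

	/* ... CPU access to one page of the object, no GTT aperture
	 * involved ... */

	kunmap(page);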

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: [PATCH 136/190] drm/i915: Move ioremap_wc tracking onto VMA
  2016-02-22 15:29             ` Tvrtko Ursulin
@ 2016-02-23 10:21               ` Chris Wilson
  0 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-02-23 10:21 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Mon, Feb 22, 2016 at 03:29:57PM +0000, Tvrtko Ursulin wrote:
> 
> On 19/02/16 15:11, Chris Wilson wrote:
> >On Thu, Feb 11, 2016 at 02:10:19PM +0000, Tvrtko Ursulin wrote:
> >>
> >>On 11/02/16 13:29, Chris Wilson wrote:
> >>>On Thu, Feb 11, 2016 at 01:20:46PM +0000, Tvrtko Ursulin wrote:
> >>>>
> >>>>
> >>>>On 11/01/16 10:45, Chris Wilson wrote:
> >>>>>By tracking the iomapping on the VMA itself, we can share that area
> >>>>>between multiple users. Also by only revoking the iomapping upon
> >>>>>unbinding from the mappable portion of the GGTT, we can keep that iomap
> >>>>>across multiple invocations (e.g. execlists context pinning).
> >>>>
> >>>>Between the lines and from some IRC discussion it seems the goal of
> >>>>this is to fix an address space memory leak with fbcon?
> >>>
> >>>The goal is to prevent an issue from hastily dropping iomappings (and
> >>>vmappings elsewhere) when unpinning contexts. That comes into play when
> >>>we track the ring->vma (not just track ring->vma as we do now, but can
> >>>rely on the vma being persistent). Fixing a leak on driver unload is an
> >>>interesting side-effect.
> >>>
> >>>>But I don't know fbdev so can't find who would do the unpin on the
> >>>>VMA to allow unbind to eventually unmap it?
> >>>
> >>>That actually gets fixed in another patch when we teach intel_fbdev to
> >>>actually track the VMA it allocates, as right now we deliberately leak
> >>>it to keep the code simple.
> >>
> >>Ok in that case ack on the patch.
> >>
> >>It will need to assert on the VMA being pinned, I think, and some
> >>minor changes when you rebase it on top of nightly as a standalone
> >>patch, so I can properly review it then.
> >
> >Had some more fun recently where the iomap was showing up again, and
> >realised that for 64b systems we can keep a pointer to inside our
> >dev_priv->gtt.mappable iomapping. Makes the patch a little more ugly
> >(requires #ifdeffery) but eliminates all of the runtime iomapping
> >overhead!
> >
> >However, that wasn't enough for the test case as the limitation to the
> >mappable aperture was too severe...
> 
> Could we use kmap then and not go through the aperture? I had a patch
> with similar semantics to your vma->iomap somewhere which, to start
> with, adds the ability to kmap one page. Or it could map whole
> objects for simplicity, which for LRCs should be OK.

Actually I used vmap to avoid the limitation. But yes, for LRC we really
don't need to keep the whole state object mapped (and we don't, we just
kmap the registers), but we would want to reduce the ringbuffer to a
single page, avoid commands spanning page boundaries, or just use
vmap.
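
A vmap sketch over an object's backing pages (error handling elided,
npages assumed known):

	struct page **pages;
	void *vaddr;
	int i;

	pages = drm_malloc_ab(npages, sizeof(*pages));
	for (i = 0; i < npages; i++)
		pages[i] = i915_gem_object_get_page(obj, i);

	/* One contiguous kernel mapping, independent of the aperture */
	vaddr = vmap(pages, npages, 0, PAGE_KERNEL);
	drm_free_large(pages);	/* only needed to build the mapping */

	/* ... use vaddr; vunmap(vaddr) tears it down ... */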
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: [PATCH 039/190] drm/i915: Remove stop-rings debugfs interface
  2016-01-11  9:16 ` [PATCH 039/190] drm/i915: Remove stop-rings debugfs interface Chris Wilson
@ 2016-02-25 17:30   ` Arun Siluvery
  0 siblings, 0 replies; 263+ messages in thread
From: Arun Siluvery @ 2016-02-25 17:30 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 11/01/2016 09:16, Chris Wilson wrote:
> Now that we have (near) universal GPU recovery code, we can inject a
> real hang from userspace and not need any fakery. Not only does this
> mean that the testing is far more realistic, but we can simplify the
> kernel in the process.
>
> v2: Replace the i915_stop_rings with a dummy implementation as igt
> codified its existence until we can release an update.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_debugfs.c     | 19 +------------------
>   drivers/gpu/drm/i915/i915_drv.c         | 17 ++---------------
>   drivers/gpu/drm/i915/i915_drv.h         | 19 -------------------
>   drivers/gpu/drm/i915/i915_gem.c         | 13 +++----------
>   drivers/gpu/drm/i915/intel_lrc.c        |  5 -----
>   drivers/gpu/drm/i915/intel_ringbuffer.c |  8 --------
>   drivers/gpu/drm/i915/intel_ringbuffer.h |  1 -
>   7 files changed, 6 insertions(+), 76 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 567f8db4c70a..6172649b7e56 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -4752,30 +4752,13 @@ DEFINE_SIMPLE_ATTRIBUTE(i915_wedged_fops,
>   static int
>   i915_ring_stop_get(void *data, u64 *val)
>   {
> -	struct drm_device *dev = data;
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -
> -	*val = dev_priv->gpu_error.stop_rings;
> -
> +	*val = 0;
>   	return 0;
>   }
>
>   static int
>   i915_ring_stop_set(void *data, u64 val)
>   {
> -	struct drm_device *dev = data;
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	int ret;
> -
> -	DRM_DEBUG_DRIVER("Stopping rings 0x%08llx\n", val);
> -
> -	ret = mutex_lock_interruptible(&dev->struct_mutex);
> -	if (ret)
> -		return ret;
> -
> -	dev_priv->gpu_error.stop_rings = val;
> -	mutex_unlock(&dev->struct_mutex);
> -
>   	return 0;
>   }
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index 442e1217e442..e9f85fd0542f 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -891,24 +891,11 @@ int i915_reset(struct drm_device *dev)
>   		goto error;
>   	}
>
> +	pr_notice("drm/i915: Resetting chip after gpu hang\n");
> +
>   	i915_gem_reset(dev);
>
>   	ret = intel_gpu_reset(dev);
> -
> -	/* Also reset the gpu hangman. */
> -	if (error->stop_rings != 0) {
> -		DRM_INFO("Simulated gpu hang, resetting stop_rings\n");
> -		error->stop_rings = 0;
> -		if (ret == -ENODEV) {
> -			DRM_INFO("Reset not implemented, but ignoring "
> -				 "error for simulated gpu hangs\n");
> -			ret = 0;
> -		}
> -	}
> -
> -	if (i915_stop_ring_allow_warn(dev_priv))
> -		pr_notice("drm/i915: Resetting chip after gpu hang\n");
> -
>   	if (ret) {
>   		if (ret != -ENODEV)
>   			DRM_ERROR("Failed to reset chip: %i\n", ret);
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 9ec6f3e9e74d..c3b795f1566b 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1371,13 +1371,6 @@ struct i915_gpu_error {
>   	 */
>   	wait_queue_head_t reset_queue;
>
> -	/* Userspace knobs for gpu hang simulation;
> -	 * combines both a ring mask, and extra flags
> -	 */
> -	u32 stop_rings;
> -#define I915_STOP_RING_ALLOW_BAN       (1 << 31)
> -#define I915_STOP_RING_ALLOW_WARN      (1 << 30)
> -
>   	/* For missed irq/seqno simulation. */
>   	unsigned long test_irq_rings;
>   };
> @@ -3030,18 +3023,6 @@ static inline u32 i915_reset_count(struct i915_gpu_error *error)
>   	return ((i915_reset_counter(error) & ~I915_WEDGED) + 1) / 2;
>   }
>
> -static inline bool i915_stop_ring_allow_ban(struct drm_i915_private *dev_priv)
> -{
> -	return dev_priv->gpu_error.stop_rings == 0 ||
> -		dev_priv->gpu_error.stop_rings & I915_STOP_RING_ALLOW_BAN;
> -}
> -
> -static inline bool i915_stop_ring_allow_warn(struct drm_i915_private *dev_priv)
> -{
> -	return dev_priv->gpu_error.stop_rings == 0 ||
> -		dev_priv->gpu_error.stop_rings & I915_STOP_RING_ALLOW_WARN;
> -}
> -
>   void i915_gem_reset(struct drm_device *dev);
>   bool i915_gem_clflush_object(struct drm_i915_gem_object *obj, bool force);
>   int __must_check i915_gem_init(struct drm_device *dev);
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 3948e85eaa48..ea9344503bf6 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2633,21 +2633,14 @@ static bool i915_context_is_banned(struct drm_i915_private *dev_priv,
>   {
>   	unsigned long elapsed;
>
> -	elapsed = get_seconds() - ctx->hang_stats.guilty_ts;
> -
>   	if (ctx->hang_stats.banned)
>   		return true;
>
> +	elapsed = get_seconds() - ctx->hang_stats.guilty_ts;
>   	if (ctx->hang_stats.ban_period_seconds &&
>   	    elapsed <= ctx->hang_stats.ban_period_seconds) {
> -		if (!i915_gem_context_is_default(ctx)) {
> -			DRM_DEBUG("context hanging too fast, banning!\n");
> -			return true;
> -		} else if (i915_stop_ring_allow_ban(dev_priv)) {
> -			if (i915_stop_ring_allow_warn(dev_priv))
> -				DRM_ERROR("gpu hanging too fast, banning!\n");
> -			return true;
> -		}
> +		DRM_DEBUG("context hanging too fast, banning!\n");
> +		return true;
>   	}
>
>   	return false;
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index b1ede2e9b372..b634e7d7a92b 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -756,16 +756,11 @@ static int logical_ring_wait_for_space(struct drm_i915_gem_request *req,
>   static void
>   intel_logical_ring_advance_and_submit(struct drm_i915_gem_request *request)
>   {
> -	struct intel_engine_cs *ring = request->ring;
>   	struct drm_i915_private *dev_priv = request->i915;
>
>   	intel_logical_ring_advance(request->ringbuf);
> -
>   	request->tail = request->ringbuf->tail;
>
> -	if (intel_ring_stopped(ring))
> -		return;
> -
>   	if (dev_priv->guc.execbuf_client)
>   		i915_guc_submit(dev_priv->guc.execbuf_client, request);
>   	else
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 5625f56a2db1..d9bb6458fa60 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -59,18 +59,10 @@ int intel_ring_space(struct intel_ringbuffer *ringbuf)
>   	return ringbuf->space;
>   }
>
> -bool intel_ring_stopped(struct intel_engine_cs *ring)
> -{
> -	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> -	return dev_priv->gpu_error.stop_rings & intel_ring_flag(ring);
> -}
> -
>   static void __intel_ring_advance(struct intel_engine_cs *ring)
>   {
>   	struct intel_ringbuffer *ringbuf = ring->buffer;
>   	ringbuf->tail &= ringbuf->size - 1;
> -	if (intel_ring_stopped(ring))
> -		return;
>   	ring->write_tail(ring, ringbuf->tail);
>   }
>
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 73da75fa47c1..eecf9c7ae2b8 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -487,7 +487,6 @@ static inline void intel_ring_advance(struct intel_engine_cs *ring)
>   int __intel_ring_space(int head, int tail, int size);
>   void intel_ring_update_space(struct intel_ringbuffer *ringbuf);
>   int intel_ring_space(struct intel_ringbuffer *ringbuf);
> -bool intel_ring_stopped(struct intel_engine_cs *ring);
>
>   int __must_check intel_ring_idle(struct intel_engine_cs *ring);
>   void intel_ring_init_seqno(struct intel_engine_cs *ring, u32 seqno);
>
One less thing to worry about, looks good to me,
Reviewed-by: Arun Siluvery <arun.siluvery@linux.intel.com>

regards
Arun

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: [PATCH 042/190] drm/i915: Clean up GPU hang message
  2016-01-11  9:16 ` [PATCH 042/190] drm/i915: Clean up GPU hang message Chris Wilson
@ 2016-02-25 17:40   ` Arun Siluvery
  0 siblings, 0 replies; 263+ messages in thread
From: Arun Siluvery @ 2016-02-25 17:40 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 11/01/2016 09:16, Chris Wilson wrote:
> Remove some redundant kernel messages as we deduce a hung GPU and
> capture the error state.
>
> v2: Fix "hang" vs "no progress" message whilst I was there
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_irq.c | 21 +++++++--------------
>   1 file changed, 7 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index d9757d227c86..ce52d7d9ad91 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -3031,8 +3031,7 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
>   	struct drm_device *dev = dev_priv->dev;
>   	struct intel_engine_cs *ring;
>   	int i;
> -	int busy_count = 0, rings_hung = 0;
> -	bool stuck[I915_NUM_RINGS] = { 0 };
> +	int busy_count = 0;
>   #define BUSY 1
>   #define KICK 5
>   #define HUNG 20
> @@ -3108,7 +3107,6 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
>   					break;
>   				case HANGCHECK_HUNG:
>   					ring->hangcheck.score += HUNG;
> -					stuck[i] = true;
>   					break;
>   				}
>   			}
> @@ -3134,17 +3132,12 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
>   		busy_count += busy;
>   	}
>
> -	for_each_ring(ring, dev_priv, i) {
> -		if (ring->hangcheck.score >= HANGCHECK_SCORE_RING_HUNG) {
> -			DRM_INFO("%s on %s\n",
> -				 stuck[i] ? "stuck" : "no progress",
> -				 ring->name);
> -			rings_hung++;

This is required when engine resets are supported. I am converting this 
to an engine_mask and sending it directly to i915_handle_error().
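
Roughly (a sketch of that conversion; it assumes i915_handle_error()
grows an engine_mask parameter):

	u32 engine_mask = 0;

	for_each_ring(ring, dev_priv, i)
		if (ring->hangcheck.score >= HANGCHECK_SCORE_RING_HUNG)
			engine_mask |= intel_ring_flag(ring);

	if (engine_mask)
		i915_handle_error(dev, engine_mask, "Ring hung");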

regards
Arun

> -		}
> -	}
> -
> -	if (rings_hung)
> -		return i915_handle_error(dev, true, "Ring hung");
> +	for_each_ring(ring, dev_priv, i)
> +		if (ring->hangcheck.score >= HANGCHECK_SCORE_RING_HUNG)
> +			return i915_handle_error(dev, true,
> +						 "%s on %s",
> +						 ring->hangcheck.action == HANGCHECK_HUNG ? "Hang" : "No progress",
> +						 ring->name);
>
>   	/* Reset timer in case GPU hangs without another request being added */
>   	if (busy_count)
>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: [PATCH 044/190] drm/i915: Move GEM request routines to i915_gem_request.c
  2016-01-11  9:16 ` [PATCH 044/190] drm/i915: Move GEM request routines to i915_gem_request.c Chris Wilson
@ 2016-02-25 17:52   ` Arun Siluvery
  2016-03-08 12:58     ` Tvrtko Ursulin
  0 siblings, 1 reply; 263+ messages in thread
From: Arun Siluvery @ 2016-02-25 17:52 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 11/01/2016 09:16, Chris Wilson wrote:
> Migrate the request operations out of the main body of i915_gem.c and
> into their own C file for easier expansion.
>
> v2: Move __i915_add_request() across as well
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---

Don't we lose the history in git blame when code is moved to a new 
file? Is that OK, especially for files like i915_gem.c?
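
(For reference, the history can still be dug out with copy detection,
e.g.

  git blame -C -C drivers/gpu/drm/i915/i915_gem_request.c

where the doubled -C asks blame to also attribute lines moved in from
other files, though it is extra friction.)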

regards
Arun

>   drivers/gpu/drm/i915/Makefile           |   1 +
>   drivers/gpu/drm/i915/i915_drv.h         | 205 +---------
>   drivers/gpu/drm/i915/i915_gem.c         | 652 +------------------------------
>   drivers/gpu/drm/i915/i915_gem_request.c | 659 ++++++++++++++++++++++++++++++++
>   drivers/gpu/drm/i915/i915_gem_request.h | 223 +++++++++++
>   5 files changed, 895 insertions(+), 845 deletions(-)
>   create mode 100644 drivers/gpu/drm/i915/i915_gem_request.c
>   create mode 100644 drivers/gpu/drm/i915/i915_gem_request.h
>
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index 99ce591c8574..b0a83215db80 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -31,6 +31,7 @@ i915-y += i915_cmd_parser.o \
>   	  i915_gem_gtt.o \
>   	  i915_gem.o \
>   	  i915_gem_render_state.o \
> +	  i915_gem_request.o \
>   	  i915_gem_shrinker.o \
>   	  i915_gem_stolen.o \
>   	  i915_gem_tiling.o \
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 57e450e25ad6..ee146ce02412 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -41,6 +41,7 @@
>   #include "intel_lrc.h"
>   #include "i915_gem_gtt.h"
>   #include "i915_gem_render_state.h"
> +#include "i915_gem_request.h"
>   #include <linux/io-mapping.h>
>   #include <linux/i2c.h>
>   #include <linux/i2c-algo-bit.h>
> @@ -2162,179 +2163,15 @@ struct drm_i915_gem_object {
>   };
>   #define to_intel_bo(x) container_of(x, struct drm_i915_gem_object, base)
>
> -void i915_gem_track_fb(struct drm_i915_gem_object *old,
> -		       struct drm_i915_gem_object *new,
> -		       unsigned frontbuffer_bits);
> -
> -/**
> - * Request queue structure.
> - *
> - * The request queue allows us to note sequence numbers that have been emitted
> - * and may be associated with active buffers to be retired.
> - *
> - * By keeping this list, we can avoid having to do questionable sequence
> - * number comparisons on buffer last_read|write_seqno. It also allows an
> - * emission time to be associated with the request for tracking how far ahead
> - * of the GPU the submission is.
> - *
> - * The requests are reference counted, so upon creation they should have an
> - * initial reference taken using kref_init
> - */
> -struct drm_i915_gem_request {
> -	struct kref ref;
> -
> -	/** On Which ring this request was generated */
> -	struct drm_i915_private *i915;
> -	struct intel_engine_cs *ring;
> -	unsigned reset_counter;
> -
> -	 /** GEM sequence number associated with the previous request,
> -	  * when the HWS breadcrumb is equal to this the GPU is processing
> -	  * this request.
> -	  */
> -	u32 previous_seqno;
> -
> -	 /** GEM sequence number associated with this request,
> -	  * when the HWS breadcrumb is equal or greater than this the GPU
> -	  * has finished processing this request.
> -	  */
> -	u32 seqno;
> -
> -	/** Position in the ringbuffer of the start of the request */
> -	u32 head;
> -
> -	/**
> -	 * Position in the ringbuffer of the start of the postfix.
> -	 * This is required to calculate the maximum available ringbuffer
> -	 * space without overwriting the postfix.
> -	 */
> -	 u32 postfix;
> -
> -	/** Position in the ringbuffer of the end of the whole request */
> -	u32 tail;
> -
> -	/**
> -	 * Context and ring buffer related to this request
> -	 * Contexts are refcounted, so when this request is associated with a
> -	 * context, we must increment the context's refcount, to guarantee that
> -	 * it persists while any request is linked to it. Requests themselves
> -	 * are also refcounted, so the request will only be freed when the last
> -	 * reference to it is dismissed, and the code in
> -	 * i915_gem_request_free() will then decrement the refcount on the
> -	 * context.
> -	 */
> -	struct intel_context *ctx;
> -	struct intel_ringbuffer *ringbuf;
> -
> -	/** Batch buffer related to this request if any (used for
> -	    error state dump only) */
> -	struct drm_i915_gem_object *batch_obj;
> -
> -	/** Time at which this request was emitted, in jiffies. */
> -	unsigned long emitted_jiffies;
> -
> -	/** global list entry for this request */
> -	struct list_head list;
> -
> -	struct drm_i915_file_private *file_priv;
> -	/** file_priv list entry for this request */
> -	struct list_head client_list;
> -
> -	/** process identifier submitting this request */
> -	struct pid *pid;
> -
> -	/**
> -	 * The ELSP only accepts two elements at a time, so we queue
> -	 * context/tail pairs on a given queue (ring->execlist_queue) until the
> -	 * hardware is available. The queue serves a double purpose: we also use
> -	 * it to keep track of the up to 2 contexts currently in the hardware
> -	 * (usually one in execution and the other queued up by the GPU): We
> -	 * only remove elements from the head of the queue when the hardware
> -	 * informs us that an element has been completed.
> -	 *
> -	 * All accesses to the queue are mediated by a spinlock
> -	 * (ring->execlist_lock).
> -	 */
> -
> -	/** Execlist link in the submission queue.*/
> -	struct list_head execlist_link;
> -
> -	/** Execlists no. of times this request has been sent to the ELSP */
> -	int elsp_submitted;
> -
> -};
> -
>   #ifdef CONFIG_DRM_I915_DEBUG_GEM
>   #define GEM_BUG_ON(expr) BUG_ON(expr)
>   #else
>   #define GEM_BUG_ON(expr)
>   #endif
>
> -int i915_gem_request_alloc(struct intel_engine_cs *ring,
> -			   struct intel_context *ctx,
> -			   struct drm_i915_gem_request **req_out);
> -void i915_gem_request_cancel(struct drm_i915_gem_request *req);
> -void i915_gem_request_free(struct kref *req_ref);
> -int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
> -				   struct drm_file *file);
> -
> -static inline uint32_t
> -i915_gem_request_get_seqno(struct drm_i915_gem_request *req)
> -{
> -	return req ? req->seqno : 0;
> -}
> -
> -static inline struct intel_engine_cs *
> -i915_gem_request_get_ring(struct drm_i915_gem_request *req)
> -{
> -	return req ? req->ring : NULL;
> -}
> -
> -static inline struct drm_i915_gem_request *
> -i915_gem_request_reference(struct drm_i915_gem_request *req)
> -{
> -	if (req)
> -		kref_get(&req->ref);
> -	return req;
> -}
> -
> -static inline void
> -i915_gem_request_unreference(struct drm_i915_gem_request *req)
> -{
> -	WARN_ON(!mutex_is_locked(&req->ring->dev->struct_mutex));
> -	kref_put(&req->ref, i915_gem_request_free);
> -}
> -
> -static inline void
> -i915_gem_request_unreference__unlocked(struct drm_i915_gem_request *req)
> -{
> -	struct drm_device *dev;
> -
> -	if (!req)
> -		return;
> -
> -	dev = req->ring->dev;
> -	if (kref_put_mutex(&req->ref, i915_gem_request_free, &dev->struct_mutex))
> -		mutex_unlock(&dev->struct_mutex);
> -}
> -
> -static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
> -					   struct drm_i915_gem_request *src)
> -{
> -	if (src)
> -		i915_gem_request_reference(src);
> -
> -	if (*pdst)
> -		i915_gem_request_unreference(*pdst);
> -
> -	*pdst = src;
> -}
> -
> -/*
> - * XXX: i915_gem_request_completed should be here but currently needs the
> - * definition of i915_seqno_passed() which is below. It will be moved in
> - * a later patch when the call to i915_seqno_passed() is obsoleted...
> - */
> +void i915_gem_track_fb(struct drm_i915_gem_object *old,
> +		       struct drm_i915_gem_object *new,
> +		       unsigned frontbuffer_bits);
>
>   /*
>    * A command that requires special handling by the command parser.
> @@ -2956,28 +2793,6 @@ int i915_gem_dumb_create(struct drm_file *file_priv,
>   			 struct drm_mode_create_dumb *args);
>   int i915_gem_mmap_gtt(struct drm_file *file_priv, struct drm_device *dev,
>   		      uint32_t handle, uint64_t *offset);
> -/**
> - * Returns true if seq1 is later than seq2.
> - */
> -static inline bool
> -i915_seqno_passed(uint32_t seq1, uint32_t seq2)
> -{
> -	return (int32_t)(seq1 - seq2) >= 0;
> -}
> -
> -static inline bool i915_gem_request_started(struct drm_i915_gem_request *req)
> -{
> -	return i915_seqno_passed(intel_ring_get_seqno(req->ring),
> -				 req->previous_seqno);
> -}
> -
> -static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
> -{
> -	return i915_seqno_passed(intel_ring_get_seqno(req->ring),
> -				 req->seqno);
> -}
> -
> -int __must_check i915_gem_get_seqno(struct drm_device *dev, u32 *seqno);
>   int __must_check i915_gem_set_seqno(struct drm_device *dev, u32 seqno);
>
>   struct drm_i915_gem_request *
> @@ -3036,18 +2851,6 @@ void i915_gem_init_swizzling(struct drm_device *dev);
>   void i915_gem_cleanup_ringbuffer(struct drm_device *dev);
>   int __must_check i915_gpu_idle(struct drm_device *dev);
>   int __must_check i915_gem_suspend(struct drm_device *dev);
> -void __i915_add_request(struct drm_i915_gem_request *req,
> -			struct drm_i915_gem_object *batch_obj,
> -			bool flush_caches);
> -#define i915_add_request(req) \
> -	__i915_add_request(req, NULL, true)
> -#define i915_add_request_no_flush(req) \
> -	__i915_add_request(req, NULL, false)
> -int __i915_wait_request(struct drm_i915_gem_request *req,
> -			bool interruptible,
> -			s64 *timeout,
> -			struct intel_rps_client *rps);
> -int __must_check i915_wait_request(struct drm_i915_gem_request *req);
>   int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf);
>   int __must_check
>   i915_gem_object_wait_rendering(struct drm_i915_gem_object *obj,
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index ea9344503bf6..68a25617ca7a 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1103,365 +1103,6 @@ put_rpm:
>   	return ret;
>   }
>
> -static int
> -i915_gem_check_wedge(unsigned reset_counter, bool interruptible)
> -{
> -	if (__i915_terminally_wedged(reset_counter))
> -		return -EIO;
> -
> -	if (__i915_reset_in_progress(reset_counter)) {
> -		/* Non-interruptible callers can't handle -EAGAIN, hence return
> -		 * -EIO unconditionally for these. */
> -		if (!interruptible)
> -			return -EIO;
> -
> -		return -EAGAIN;
> -	}
> -
> -	return 0;
> -}
> -
> -static unsigned long local_clock_us(unsigned *cpu)
> -{
> -	unsigned long t;
> -
> -	/* Cheaply and approximately convert from nanoseconds to microseconds.
> -	 * The result and subsequent calculations are also defined in the same
> -	 * approximate microseconds units. The principal source of timing
> -	 * error here is from the simple truncation.
> -	 *
> -	 * Note that local_clock() is only defined wrt to the current CPU;
> -	 * the comparisons are no longer valid if we switch CPUs. Instead of
> -	 * blocking preemption for the entire busywait, we can detect the CPU
> -	 * switch and use that as indicator of system load and a reason to
> -	 * stop busywaiting, see busywait_stop().
> -	 */
> -	*cpu = get_cpu();
> -	t = local_clock() >> 10;
> -	put_cpu();
> -
> -	return t;
> -}
> -
> -static bool busywait_stop(unsigned long timeout, unsigned cpu)
> -{
> -	unsigned this_cpu;
> -
> -	if (time_after(local_clock_us(&this_cpu), timeout))
> -		return true;
> -
> -	return this_cpu != cpu;
> -}
> -
> -static bool __i915_spin_request(struct drm_i915_gem_request *req,
> -				struct intel_wait *wait,
> -				int state)
> -{
> -	unsigned long timeout;
> -	unsigned cpu;
> -
> -	/* When waiting for high frequency requests, e.g. during synchronous
> -	 * rendering split between the CPU and GPU, the finite amount of time
> -	 * required to set up the irq and wait upon it limits the response
> -	 * rate. By busywaiting on the request completion for a short while we
> -	 * can service the high frequency waits as quick as possible. However,
> -	 * if it is a slow request, we want to sleep as quickly as possible.
> -	 * The tradeoff between waiting and sleeping is roughly the time it
> -	 * takes to sleep on a request, on the order of a microsecond.
> -	 */
> -
> -	/* Only spin if we know the GPU is processing this request */
> -	if (!i915_gem_request_started(req))
> -		return false;
> -
> -	timeout = local_clock_us(&cpu) + 5;
> -	do {
> -		if (i915_gem_request_completed(req))
> -			return true;
> -
> -		if (signal_pending_state(state, wait->task))
> -			break;
> -
> -		if (busywait_stop(timeout, cpu))
> -			break;
> -
> -		cpu_relax_lowlatency();
> -
> -		/* Break the loop if we have consumed the timeslice (or been
> -		 * preempted) or when either the background thread has
> -		 * enabled the interrupt, or the IRQ itself has fired.
> -		 */
> -	} while (!need_resched() && wait->task->state == state);
> -
> -	return false;
> -}
> -
> -/**
> - * __i915_wait_request - wait until execution of request has finished
> - * @req: duh!
> - * @interruptible: do an interruptible wait (normally yes)
> - * @timeout: in - how long to wait (NULL forever); out - how much time remaining
> - *
> - * Note: It is of utmost importance that the passed in seqno and reset_counter
> - * values have been read by the caller in an smp safe manner. Where read-side
> - * locks are involved, it is sufficient to read the reset_counter before
> - * unlocking the lock that protects the seqno. For lockless tricks, the
> - * reset_counter _must_ be read before, and an appropriate smp_rmb must be
> - * inserted.
> - *
> - * Returns 0 if the request was found within the alloted time. Else returns the
> - * errno with remaining time filled in timeout argument.
> - */
> -int __i915_wait_request(struct drm_i915_gem_request *req,
> -			bool interruptible,
> -			s64 *timeout,
> -			struct intel_rps_client *rps)
> -{
> -	int state = interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
> -	struct intel_wait wait;
> -	unsigned long timeout_remain;
> -	int ret = 0;
> -
> -	might_sleep();
> -
> -	if (list_empty(&req->list))
> -		return 0;
> -
> -	if (i915_gem_request_completed(req))
> -		return 0;
> -
> -	timeout_remain = MAX_SCHEDULE_TIMEOUT;
> -	if (timeout) {
> -		if (WARN_ON(*timeout < 0))
> -			return -EINVAL;
> -
> -		if (*timeout == 0)
> -			return -ETIME;
> -
> -		/* Record current time in case interrupted, or wedged */
> -		timeout_remain = nsecs_to_jiffies_timeout(*timeout);
> -		*timeout += ktime_get_raw_ns();
> -	}
> -
> -	trace_i915_gem_request_wait_begin(req);
> -
> -	/* This client is about to stall waiting for the GPU. In many cases
> -	 * this is undesirable and limits the throughput of the system, as
> -	 * many clients cannot continue processing user input/output whilst
> -	 * blocked. RPS autotuning may take tens of milliseconds to respond
> -	 * to the GPU load and thus incurs additional latency for the client.
> -	 * We can circumvent that by promoting the GPU frequency to maximum
> -	 * before we wait. This makes the GPU throttle up much more quickly
> -	 * (good for benchmarks and user experience, e.g. window animations),
> -	 * but at a cost of spending more power processing the workload
> -	 * (bad for battery). Not all clients even want their results
> -	 * immediately and for them we should just let the GPU select its own
> -	 * frequency to maximise efficiency. To prevent a single client from
> -	 * forcing the clocks too high for the whole system, we only allow
> -	 * each client to waitboost once in a busy period.
> -	 */
> -	if (INTEL_INFO(req->i915)->gen >= 6)
> -		gen6_rps_boost(req->i915, rps, req->emitted_jiffies);
> -
> -	intel_wait_init(&wait, req->seqno);
> -	set_task_state(wait.task, state);
> -
> -	/* Optimistic spin for the next ~jiffie before touching IRQs */
> -	if (intel_engine_add_wait(req->ring, &wait)) {
> -		if (__i915_spin_request(req, &wait, state))
> -			goto complete;
> -
> -		/* In order to check that we haven't missed the interrupt
> -		 * as we enabled it, we need to kick ourselves to do a
> -		 * coherent check on the seqno before we sleep.
> -		 */
> -		if (intel_engine_enable_wait_irq(req->ring, &wait))
> -			goto wakeup;
> -	}
> -
> -	for (;;) {
> -		if (signal_pending_state(state, wait.task)) {
> -			ret = -ERESTARTSYS;
> -			break;
> -		}
> -
> -		/* Ensure that even if the GPU hangs, we get woken up. */
> -		i915_queue_hangcheck(req->i915);
> -
> -		timeout_remain = io_schedule_timeout(timeout_remain);
> -		if (timeout_remain == 0) {
> -			ret = -ETIME;
> -			break;
> -		}
> -
> -		if (intel_wait_complete(&wait))
> -			break;
> -
> -wakeup:
> -		set_task_state(wait.task, state);
> -
> -		/* Carefully check if the request is complete, giving time
> -		 * for the seqno to be visible following the interrupt.
> -		 * We also have to check in case we are kicked by the GPU
> -		 * reset in order to drop the struct_mutex.
> -		 */
> -		if (__i915_request_irq_complete(req))
> -			break;
> -	}
> -
> -complete:
> -	intel_engine_remove_wait(req->ring, &wait);
> -	__set_task_state(wait.task, TASK_RUNNING);
> -	trace_i915_gem_request_wait_end(req);
> -
> -	if (timeout) {
> -		*timeout -= ktime_get_raw_ns();
> -		if (*timeout < 0)
> -			*timeout = 0;
> -
> -		/*
> -		 * Apparently ktime isn't accurate enough and occasionally has a
> -		 * bit of mismatch in the jiffies<->nsecs<->ktime loop. So patch
> -		 * things up to make the test happy. We allow up to 1 jiffy.
> -		 *
> -		 * This is a regrssion from the timespec->ktime conversion.
> -		 */
> -		if (ret == -ETIME && *timeout < jiffies_to_usecs(1)*1000)
> -			*timeout = 0;
> -	}
> -
> -	if (ret == 0 && rps && req->seqno == req->ring->last_submitted_seqno) {
> -		/* The GPU is now idle and this client has stalled.
> -		 * Since no other client has submitted a request in the
> -		 * meantime, assume that this client is the only one
> -		 * supplying work to the GPU but is unable to keep that
> -		 * work supplied because it is waiting. Since the GPU is
> -		 * then never kept fully busy, RPS autoclocking will
> -		 * keep the clocks relatively low, causing further delays.
> -		 * Compensate by giving the synchronous client credit for
> -		 * a waitboost next time.
> -		 */
> -		spin_lock(&req->i915->rps.client_lock);
> -		list_del_init(&rps->link);
> -		spin_unlock(&req->i915->rps.client_lock);
> -	}
> -
> -	return ret;
> -}
> -
> -int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
> -				   struct drm_file *file)
> -{
> -	struct drm_i915_private *dev_private;
> -	struct drm_i915_file_private *file_priv;
> -
> -	WARN_ON(!req || !file || req->file_priv);
> -
> -	if (!req || !file)
> -		return -EINVAL;
> -
> -	if (req->file_priv)
> -		return -EINVAL;
> -
> -	dev_private = req->ring->dev->dev_private;
> -	file_priv = file->driver_priv;
> -
> -	spin_lock(&file_priv->mm.lock);
> -	req->file_priv = file_priv;
> -	list_add_tail(&req->client_list, &file_priv->mm.request_list);
> -	spin_unlock(&file_priv->mm.lock);
> -
> -	req->pid = get_pid(task_pid(current));
> -
> -	return 0;
> -}
> -
> -static inline void
> -i915_gem_request_remove_from_client(struct drm_i915_gem_request *request)
> -{
> -	struct drm_i915_file_private *file_priv = request->file_priv;
> -
> -	if (!file_priv)
> -		return;
> -
> -	spin_lock(&file_priv->mm.lock);
> -	list_del(&request->client_list);
> -	request->file_priv = NULL;
> -	spin_unlock(&file_priv->mm.lock);
> -
> -	put_pid(request->pid);
> -	request->pid = NULL;
> -}
> -
> -static void i915_gem_request_retire(struct drm_i915_gem_request *request)
> -{
> -	trace_i915_gem_request_retire(request);
> -
> -	/* We know the GPU must have read the request to have
> -	 * sent us the seqno + interrupt, so use the position
> -	 * of tail of the request to update the last known position
> -	 * of the GPU head.
> -	 *
> -	 * Note this requires that we are always called in request
> -	 * completion order.
> -	 */
> -	request->ringbuf->last_retired_head = request->postfix;
> -
> -	list_del_init(&request->list);
> -	i915_gem_request_remove_from_client(request);
> -
> -	i915_gem_request_unreference(request);
> -}
> -
> -static void
> -__i915_gem_request_retire__upto(struct drm_i915_gem_request *req)
> -{
> -	struct intel_engine_cs *engine = req->ring;
> -	struct drm_i915_gem_request *tmp;
> -
> -	lockdep_assert_held(&engine->dev->struct_mutex);
> -
> -	if (list_empty(&req->list))
> -		return;
> -
> -	do {
> -		tmp = list_first_entry(&engine->request_list,
> -				       typeof(*tmp), list);
> -
> -		i915_gem_request_retire(tmp);
> -	} while (tmp != req);
> -
> -	WARN_ON(i915_verify_lists(engine->dev));
> -}
> -
> -/**
> - * Waits for a request to be signaled, and cleans up the
> - * request and object lists appropriately for that event.
> - */
> -int
> -i915_wait_request(struct drm_i915_gem_request *req)
> -{
> -	struct drm_device *dev;
> -	struct drm_i915_private *dev_priv;
> -	bool interruptible;
> -	int ret;
> -
> -	BUG_ON(req == NULL);
> -
> -	dev = req->ring->dev;
> -	dev_priv = dev->dev_private;
> -	interruptible = dev_priv->mm.interruptible;
> -
> -	BUG_ON(!mutex_is_locked(&dev->struct_mutex));
> -
> -	ret = __i915_wait_request(req, interruptible, NULL, NULL);
> -	if (ret)
> -		return ret;
> -
> -	__i915_gem_request_retire__upto(req);
> -	return 0;
> -}
> -
>   /**
>    * Ensures that all rendering to the object has completed and the object is
>    * safe to unbind from the GTT or access from the CPU.
> @@ -1515,7 +1156,7 @@ i915_gem_object_retire_request(struct drm_i915_gem_object *obj,
>   	else if (obj->last_write_req == req)
>   		i915_gem_object_retire__write(obj);
>
> -	__i915_gem_request_retire__upto(req);
> +	i915_gem_request_retire_upto(req);
>   }
>
>   /* A nonblocking variant of the above wait. This is a highly dangerous routine
> @@ -2441,94 +2082,6 @@ i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
>   	drm_gem_object_unreference(&obj->base);
>   }
>
> -static int
> -i915_gem_init_seqno(struct drm_device *dev, u32 seqno)
> -{
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_engine_cs *ring;
> -	int ret, i, j;
> -
> -	/* Carefully retire all requests without writing to the rings */
> -	for_each_ring(ring, dev_priv, i) {
> -		ret = intel_ring_idle(ring);
> -		if (ret)
> -			return ret;
> -	}
> -	i915_gem_retire_requests(dev);
> -
> -	/* Finally reset hw state */
> -	for_each_ring(ring, dev_priv, i) {
> -		intel_ring_init_seqno(ring, seqno);
> -
> -		for (j = 0; j < ARRAY_SIZE(ring->semaphore.sync_seqno); j++)
> -			ring->semaphore.sync_seqno[j] = 0;
> -	}
> -
> -	return 0;
> -}
> -
> -int i915_gem_set_seqno(struct drm_device *dev, u32 seqno)
> -{
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	int ret;
> -
> -	if (seqno == 0)
> -		return -EINVAL;
> -
> -	/* HWS page needs to be set less than what we
> -	 * will inject to ring
> -	 */
> -	ret = i915_gem_init_seqno(dev, seqno - 1);
> -	if (ret)
> -		return ret;
> -
> -	/* Carefully set the last_seqno value so that wrap
> -	 * detection still works
> -	 */
> -	dev_priv->next_seqno = seqno;
> -	dev_priv->last_seqno = seqno - 1;
> -	if (dev_priv->last_seqno == 0)
> -		dev_priv->last_seqno--;
> -
> -	return 0;
> -}
> -
> -int
> -i915_gem_get_seqno(struct drm_device *dev, u32 *seqno)
> -{
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -
> -	/* reserve 0 for non-seqno */
> -	if (dev_priv->next_seqno == 0) {
> -		int ret = i915_gem_init_seqno(dev, 0);
> -		if (ret)
> -			return ret;
> -
> -		dev_priv->next_seqno = 1;
> -	}
> -
> -	*seqno = dev_priv->last_seqno = dev_priv->next_seqno++;
> -	return 0;
> -}
> -
> -static void i915_gem_mark_busy(struct drm_i915_private *dev_priv)
> -{
> -	if (dev_priv->mm.busy)
> -		return;
> -
> -	intel_runtime_pm_get_noresume(dev_priv);
> -
> -	i915_update_gfx_val(dev_priv);
> -	if (INTEL_INFO(dev_priv)->gen >= 6)
> -		gen6_rps_busy(dev_priv);
> -
> -	queue_delayed_work(dev_priv->wq,
> -			   &dev_priv->mm.retire_work,
> -			   round_jiffies_up_relative(HZ));
> -
> -	dev_priv->mm.busy = true;
> -}
> -
>   static void i915_gem_mark_idle(struct drm_i915_private *dev_priv)
>   {
>   	dev_priv->mm.busy = false;
> @@ -2542,92 +2095,6 @@ static void i915_gem_mark_idle(struct drm_i915_private *dev_priv)
>   	intel_runtime_pm_put(dev_priv);
>   }
>
> -/*
> - * NB: This function is not allowed to fail. Doing so would mean the
> - * request is not being tracked for completion but the work itself is
> - * going to happen on the hardware. This would be a Bad Thing(tm).
> - */
> -void __i915_add_request(struct drm_i915_gem_request *request,
> -			struct drm_i915_gem_object *obj,
> -			bool flush_caches)
> -{
> -	struct intel_engine_cs *ring;
> -	struct drm_i915_private *dev_priv;
> -	struct intel_ringbuffer *ringbuf;
> -	u32 request_start;
> -	int ret;
> -
> -	if (WARN_ON(request == NULL))
> -		return;
> -
> -	ring = request->ring;
> -	dev_priv = ring->dev->dev_private;
> -	ringbuf = request->ringbuf;
> -
> -	/*
> -	 * To ensure that this call will not fail, space for its emissions
> -	 * should already have been reserved in the ring buffer. Let the ring
> -	 * know that it is time to use that space up.
> -	 */
> -	intel_ring_reserved_space_use(ringbuf);
> -
> -	request_start = intel_ring_get_tail(ringbuf);
> -	/*
> -	 * Emit any outstanding flushes - execbuf can fail to emit the flush
> -	 * after having emitted the batchbuffer command. Hence we need to fix
> -	 * things up similarly to emitting the lazy request. The difference here
> -	 * is that the flush _must_ happen before the next request, no matter
> -	 * what.
> -	 */
> -	if (flush_caches) {
> -		if (i915.enable_execlists)
> -			ret = logical_ring_flush_all_caches(request);
> -		else
> -			ret = intel_ring_flush_all_caches(request);
> -		/* Not allowed to fail! */
> -		WARN(ret, "*_ring_flush_all_caches failed: %d!\n", ret);
> -	}
> -
> -	/* Record the position of the start of the request so that
> -	 * should we detect the updated seqno part-way through the
> -	 * GPU processing the request, we never over-estimate the
> -	 * position of the head.
> -	 */
> -	request->postfix = intel_ring_get_tail(ringbuf);
> -
> -	if (i915.enable_execlists)
> -		ret = ring->emit_request(request);
> -	else {
> -		ret = ring->add_request(request);
> -
> -		request->tail = intel_ring_get_tail(ringbuf);
> -	}
> -	/* Not allowed to fail! */
> -	WARN(ret, "emit|add_request failed: %d!\n", ret);
> -
> -	request->head = request_start;
> -
> -	/* Whilst this request exists, batch_obj will be on the
> -	 * active_list, and so will hold the active reference. Only when this
> -	 * request is retired will the batch_obj be moved onto the
> -	 * inactive_list and lose its active reference. Hence we do not need
> -	 * to explicitly hold another reference here.
> -	 */
> -	request->batch_obj = obj;
> -
> -	request->emitted_jiffies = jiffies;
> -	request->previous_seqno = ring->last_submitted_seqno;
> -	ring->last_submitted_seqno = request->seqno;
> -	list_add_tail(&request->list, &ring->request_list);
> -
> -	trace_i915_gem_request_add(request);
> -
> -	i915_gem_mark_busy(dev_priv);
> -
> -	/* Sanity check that the reserved size was large enough. */
> -	intel_ring_reserved_space_end(ringbuf);
> -}
> -
>   static bool i915_context_is_banned(struct drm_i915_private *dev_priv,
>   				   const struct intel_context *ctx)
>   {
> @@ -2666,109 +2133,6 @@ static void i915_set_reset_status(struct drm_i915_private *dev_priv,
>   	}
>   }
>
> -void i915_gem_request_free(struct kref *req_ref)
> -{
> -	struct drm_i915_gem_request *req = container_of(req_ref,
> -						 typeof(*req), ref);
> -	struct intel_context *ctx = req->ctx;
> -
> -	if (req->file_priv)
> -		i915_gem_request_remove_from_client(req);
> -
> -	if (ctx) {
> -		if (i915.enable_execlists) {
> -			if (ctx != req->ring->default_context)
> -				intel_lr_context_unpin(req);
> -		}
> -
> -		i915_gem_context_unreference(ctx);
> -	}
> -
> -	kmem_cache_free(req->i915->requests, req);
> -}
> -
> -int i915_gem_request_alloc(struct intel_engine_cs *ring,
> -			   struct intel_context *ctx,
> -			   struct drm_i915_gem_request **req_out)
> -{
> -	struct drm_i915_private *dev_priv = to_i915(ring->dev);
> -	unsigned reset_counter = i915_reset_counter(&dev_priv->gpu_error);
> -	struct drm_i915_gem_request *req;
> -	int ret;
> -
> -	if (!req_out)
> -		return -EINVAL;
> -
> -	*req_out = NULL;
> -
> -	/* ABI: Before userspace accesses the GPU (e.g. execbuffer), report
> -	 * EIO if the GPU is already wedged, or EAGAIN to drop the struct_mutex
> -	 * and restart.
> -	 */
> -	ret = i915_gem_check_wedge(reset_counter, dev_priv->mm.interruptible);
> -	if (ret)
> -		return ret;
> -
> -	req = kmem_cache_zalloc(dev_priv->requests, GFP_KERNEL);
> -	if (req == NULL)
> -		return -ENOMEM;
> -
> -	ret = i915_gem_get_seqno(ring->dev, &req->seqno);
> -	if (ret)
> -		goto err;
> -
> -	kref_init(&req->ref);
> -	req->i915 = dev_priv;
> -	req->ring = ring;
> -	req->reset_counter = reset_counter;
> -	req->ctx  = ctx;
> -	i915_gem_context_reference(req->ctx);
> -
> -	if (i915.enable_execlists)
> -		ret = intel_logical_ring_alloc_request_extras(req);
> -	else
> -		ret = intel_ring_alloc_request_extras(req);
> -	if (ret) {
> -		i915_gem_context_unreference(req->ctx);
> -		goto err;
> -	}
> -
> -	/*
> -	 * Reserve space in the ring buffer for all the commands required to
> -	 * eventually emit this request. This is to guarantee that the
> -	 * i915_add_request() call can't fail. Note that the reserve may need
> -	 * to be redone if the request is not actually submitted straight
> -	 * away, e.g. because a GPU scheduler has deferred it.
> -	 */
> -	if (i915.enable_execlists)
> -		ret = intel_logical_ring_reserve_space(req);
> -	else
> -		ret = intel_ring_reserve_space(req);
> -	if (ret) {
> -		/*
> -		 * At this point, the request is fully allocated even if not
> -		 * fully prepared. Thus it can be cleaned up using the proper
> -		 * free code.
> -		 */
> -		i915_gem_request_cancel(req);
> -		return ret;
> -	}
> -
> -	*req_out = req;
> -	return 0;
> -
> -err:
> -	kmem_cache_free(dev_priv->requests, req);
> -	return ret;
> -}
> -
> -void i915_gem_request_cancel(struct drm_i915_gem_request *req)
> -{
> -	intel_ring_reserved_space_cancel(req->ringbuf);
> -
> -	i915_gem_request_unreference(req);
> -}
> -
>   struct drm_i915_gem_request *
>   i915_gem_find_active_request(struct intel_engine_cs *ring)
>   {
> @@ -2850,14 +2214,14 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
>   	 * implicit references on things like e.g. ppgtt address spaces through
>   	 * the request.
>   	 */
> -	while (!list_empty(&ring->request_list)) {
> +	if (!list_empty(&ring->request_list)) {
>   		struct drm_i915_gem_request *request;
>
> -		request = list_first_entry(&ring->request_list,
> -					   struct drm_i915_gem_request,
> -					   list);
> +		request = list_last_entry(&ring->request_list,
> +					  struct drm_i915_gem_request,
> +					  list);
>
> -		i915_gem_request_retire(request);
> +		i915_gem_request_retire_upto(request);
>   	}
>
>   	/* Having flushed all requests from all queues, we know that all
> @@ -2922,7 +2286,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
>   		if (!i915_gem_request_completed(request))
>   			break;
>
> -		i915_gem_request_retire(request);
> +		i915_gem_request_retire_upto(request);
>   	}
>
>   	/* Move any buffers on the active list that are no longer referenced
> @@ -3053,7 +2417,7 @@ i915_gem_object_flush_active(struct drm_i915_gem_object *obj)
>   			goto retire;
>
>   		if (i915_gem_request_completed(req)) {
> -			__i915_gem_request_retire__upto(req);
> +			i915_gem_request_retire_upto(req);
>   retire:
>   			i915_gem_object_retire__read(obj, i);
>   		}
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> new file mode 100644
> index 000000000000..b4ede6dd7b20
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/i915_gem_request.c
> @@ -0,0 +1,659 @@
> +/*
> + * Copyright © 2008-2015 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + *
> + */
> +
> +#include "i915_drv.h"
> +
> +static int
> +i915_gem_check_wedge(unsigned reset_counter, bool interruptible)
> +{
> +	if (__i915_terminally_wedged(reset_counter))
> +		return -EIO;
> +
> +	if (__i915_reset_in_progress(reset_counter)) {
> +		/* Non-interruptible callers can't handle -EAGAIN, hence return
> +		 * -EIO unconditionally for these. */
> +		if (!interruptible)
> +			return -EIO;
> +
> +		return -EAGAIN;
> +	}
> +
> +	return 0;
> +}
> +
> +static int
> +i915_gem_init_seqno(struct drm_i915_private *dev_priv, u32 seqno)
> +{
> +	struct intel_engine_cs *ring;
> +	int ret, i, j;
> +
> +	/* Carefully retire all requests without writing to the rings */
> +	for_each_ring(ring, dev_priv, i) {
> +		ret = intel_ring_idle(ring);
> +		if (ret)
> +			return ret;
> +	}
> +	i915_gem_retire_requests(dev_priv->dev);
> +
> +	/* Finally reset hw state */
> +	for_each_ring(ring, dev_priv, i) {
> +		intel_ring_init_seqno(ring, seqno);
> +
> +		for (j = 0; j < ARRAY_SIZE(ring->semaphore.sync_seqno); j++)
> +			ring->semaphore.sync_seqno[j] = 0;
> +	}
> +
> +	return 0;
> +}
> +
> +int i915_gem_set_seqno(struct drm_device *dev, u32 seqno)
> +{
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	int ret;
> +
> +	if (seqno == 0)
> +		return -EINVAL;
> +
> +	/* HWS page needs to be set less than what we
> +	 * will inject to ring
> +	 */
> +	ret = i915_gem_init_seqno(dev_priv, seqno - 1);
> +	if (ret)
> +		return ret;
> +
> +	/* Carefully set the last_seqno value so that wrap
> +	 * detection still works
> +	 */
> +	dev_priv->next_seqno = seqno;
> +	dev_priv->last_seqno = seqno - 1;
> +	if (dev_priv->last_seqno == 0)
> +		dev_priv->last_seqno--;
> +
> +	return 0;
> +}
> +
> +static int
> +i915_gem_get_seqno(struct drm_i915_private *dev_priv, u32 *seqno)
> +{
> +	/* reserve 0 for non-seqno */
> +	if (unlikely(dev_priv->next_seqno == 0)) {
> +		int ret = i915_gem_init_seqno(dev_priv, 0);
> +		if (ret)
> +			return ret;
> +
> +		dev_priv->next_seqno = 1;
> +	}
> +
> +	*seqno = dev_priv->last_seqno = dev_priv->next_seqno++;
> +	return 0;
> +}
> +
> +int i915_gem_request_alloc(struct intel_engine_cs *ring,
> +			   struct intel_context *ctx,
> +			   struct drm_i915_gem_request **req_out)
> +{
> +	struct drm_i915_private *dev_priv = to_i915(ring->dev);
> +	unsigned reset_counter = i915_reset_counter(&dev_priv->gpu_error);
> +	struct drm_i915_gem_request *req;
> +	int ret;
> +
> +	if (!req_out)
> +		return -EINVAL;
> +
> +	*req_out = NULL;
> +
> +	/* ABI: Before userspace accesses the GPU (e.g. execbuffer), report
> +	 * EIO if the GPU is already wedged, or EAGAIN to drop the struct_mutex
> +	 * and restart.
> +	 */
> +	ret = i915_gem_check_wedge(reset_counter, dev_priv->mm.interruptible);
> +	if (ret)
> +		return ret;
> +
> +	req = kmem_cache_zalloc(dev_priv->requests, GFP_KERNEL);
> +	if (req == NULL)
> +		return -ENOMEM;
> +
> +	ret = i915_gem_get_seqno(dev_priv, &req->seqno);
> +	if (ret)
> +		goto err;
> +
> +	kref_init(&req->ref);
> +	req->i915 = dev_priv;
> +	req->ring = ring;
> +	req->reset_counter = reset_counter;
> +	req->ctx  = ctx;
> +	i915_gem_context_reference(req->ctx);
> +
> +	if (i915.enable_execlists)
> +		ret = intel_logical_ring_alloc_request_extras(req);
> +	else
> +		ret = intel_ring_alloc_request_extras(req);
> +	if (ret) {
> +		i915_gem_context_unreference(req->ctx);
> +		goto err;
> +	}
> +
> +	/*
> +	 * Reserve space in the ring buffer for all the commands required to
> +	 * eventually emit this request. This is to guarantee that the
> +	 * i915_add_request() call can't fail. Note that the reserve may need
> +	 * to be redone if the request is not actually submitted straight
> +	 * away, e.g. because a GPU scheduler has deferred it.
> +	 */
> +	if (i915.enable_execlists)
> +		ret = intel_logical_ring_reserve_space(req);
> +	else
> +		ret = intel_ring_reserve_space(req);
> +	if (ret) {
> +		/*
> +		 * At this point, the request is fully allocated even if not
> +		 * fully prepared. Thus it can be cleaned up using the proper
> +		 * free code.
> +		 */
> +		i915_gem_request_cancel(req);
> +		return ret;
> +	}
> +
> +	*req_out = req;
> +	return 0;
> +
> +err:
> +	kmem_cache_free(dev_priv->requests, req);
> +	return ret;
> +}
> +
> +void i915_gem_request_cancel(struct drm_i915_gem_request *req)
> +{
> +	intel_ring_reserved_space_cancel(req->ringbuf);
> +
> +	i915_gem_request_unreference(req);
> +}
> +
> +int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
> +				   struct drm_file *file)
> +{
> +	struct drm_i915_private *dev_private;
> +	struct drm_i915_file_private *file_priv;
> +
> +	WARN_ON(!req || !file || req->file_priv);
> +
> +	if (!req || !file)
> +		return -EINVAL;
> +
> +	if (req->file_priv)
> +		return -EINVAL;
> +
> +	dev_private = req->ring->dev->dev_private;
> +	file_priv = file->driver_priv;
> +
> +	spin_lock(&file_priv->mm.lock);
> +	req->file_priv = file_priv;
> +	list_add_tail(&req->client_list, &file_priv->mm.request_list);
> +	spin_unlock(&file_priv->mm.lock);
> +
> +	req->pid = get_pid(task_pid(current));
> +
> +	return 0;
> +}
> +
> +static inline void
> +i915_gem_request_remove_from_client(struct drm_i915_gem_request *request)
> +{
> +	struct drm_i915_file_private *file_priv = request->file_priv;
> +
> +	if (!file_priv)
> +		return;
> +
> +	spin_lock(&file_priv->mm.lock);
> +	list_del(&request->client_list);
> +	request->file_priv = NULL;
> +	spin_unlock(&file_priv->mm.lock);
> +
> +	put_pid(request->pid);
> +	request->pid = NULL;
> +}
> +
> +static void i915_gem_request_retire(struct drm_i915_gem_request *request)
> +{
> +	trace_i915_gem_request_retire(request);
> +
> +	/* We know the GPU must have read the request to have
> +	 * sent us the seqno + interrupt, so use the position
> +	 * of the tail of the request to update the last known position
> +	 * of the GPU head.
> +	 *
> +	 * Note this requires that we are always called in request
> +	 * completion order.
> +	 */
> +	request->ringbuf->last_retired_head = request->postfix;
> +
> +	list_del_init(&request->list);
> +	i915_gem_request_remove_from_client(request);
> +
> +	i915_gem_request_unreference(request);
> +}
> +
> +void
> +i915_gem_request_retire_upto(struct drm_i915_gem_request *req)
> +{
> +	struct intel_engine_cs *engine = req->ring;
> +	struct drm_i915_gem_request *tmp;
> +
> +	lockdep_assert_held(&engine->dev->struct_mutex);
> +
> +	if (list_empty(&req->list))
> +		return;
> +
> +	do {
> +		tmp = list_first_entry(&engine->request_list,
> +				       typeof(*tmp), list);
> +
> +		i915_gem_request_retire(tmp);
> +	} while (tmp != req);
> +
> +	WARN_ON(i915_verify_lists(engine->dev));
> +}
> +
> +static void i915_gem_mark_busy(struct drm_i915_private *dev_priv)
> +{
> +	if (dev_priv->mm.busy)
> +		return;
> +
> +	intel_runtime_pm_get_noresume(dev_priv);
> +
> +	i915_update_gfx_val(dev_priv);
> +	if (INTEL_INFO(dev_priv)->gen >= 6)
> +		gen6_rps_busy(dev_priv);
> +
> +	queue_delayed_work(dev_priv->wq,
> +			   &dev_priv->mm.retire_work,
> +			   round_jiffies_up_relative(HZ));
> +
> +	dev_priv->mm.busy = true;
> +}
> +
> +/*
> + * NB: This function is not allowed to fail. Doing so would mean the
> + * request is not being tracked for completion but the work itself is
> + * going to happen on the hardware. This would be a Bad Thing(tm).
> + */
> +void __i915_add_request(struct drm_i915_gem_request *request,
> +			struct drm_i915_gem_object *obj,
> +			bool flush_caches)
> +{
> +	struct intel_engine_cs *ring;
> +	struct drm_i915_private *dev_priv;
> +	struct intel_ringbuffer *ringbuf;
> +	u32 request_start;
> +	int ret;
> +
> +	if (WARN_ON(request == NULL))
> +		return;
> +
> +	ring = request->ring;
> +	dev_priv = ring->dev->dev_private;
> +	ringbuf = request->ringbuf;
> +
> +	/*
> +	 * To ensure that this call will not fail, space for its emissions
> +	 * should already have been reserved in the ring buffer. Let the ring
> +	 * know that it is time to use that space up.
> +	 */
> +	intel_ring_reserved_space_use(ringbuf);
> +
> +	request_start = intel_ring_get_tail(ringbuf);
> +	/*
> +	 * Emit any outstanding flushes - execbuf can fail to emit the flush
> +	 * after having emitted the batchbuffer command. Hence we need to fix
> +	 * things up similarly to emitting the lazy request. The difference here
> +	 * is that the flush _must_ happen before the next request, no matter
> +	 * what.
> +	 */
> +	if (flush_caches) {
> +		if (i915.enable_execlists)
> +			ret = logical_ring_flush_all_caches(request);
> +		else
> +			ret = intel_ring_flush_all_caches(request);
> +		/* Not allowed to fail! */
> +		WARN(ret, "*_ring_flush_all_caches failed: %d!\n", ret);
> +	}
> +
> +	/* Record the position of the start of the request so that
> +	 * should we detect the updated seqno part-way through the
> +	 * GPU processing the request, we never over-estimate the
> +	 * position of the head.
> +	 */
> +	request->postfix = intel_ring_get_tail(ringbuf);
> +
> +	if (i915.enable_execlists)
> +		ret = ring->emit_request(request);
> +	else {
> +		ret = ring->add_request(request);
> +
> +		request->tail = intel_ring_get_tail(ringbuf);
> +	}
> +	/* Not allowed to fail! */
> +	WARN(ret, "emit|add_request failed: %d!\n", ret);
> +
> +	request->head = request_start;
> +
> +	/* Whilst this request exists, batch_obj will be on the
> +	 * active_list, and so will hold the active reference. Only when this
> +	 * request is retired will the batch_obj be moved onto the
> +	 * inactive_list and lose its active reference. Hence we do not need
> +	 * to explicitly hold another reference here.
> +	 */
> +	request->batch_obj = obj;
> +
> +	request->emitted_jiffies = jiffies;
> +	request->previous_seqno = ring->last_submitted_seqno;
> +	ring->last_submitted_seqno = request->seqno;
> +	list_add_tail(&request->list, &ring->request_list);
> +
> +	trace_i915_gem_request_add(request);
> +
> +	i915_gem_mark_busy(dev_priv);
> +
> +	/* Sanity check that the reserved size was large enough. */
> +	intel_ring_reserved_space_end(ringbuf);
> +}
> +
> +
> +static unsigned long local_clock_us(unsigned *cpu)
> +{
> +	unsigned long t;
> +
> +	/* Cheaply and approximately convert from nanoseconds to microseconds.
> +	 * The result and subsequent calculations are also defined in the same
> +	 * approximate microsecond units. The principal source of timing
> +	 * error here is from the simple truncation.
> +	 *
> +	 * Note that local_clock() is only defined wrt the current CPU;
> +	 * the comparisons are no longer valid if we switch CPUs. Instead of
> +	 * blocking preemption for the entire busywait, we can detect the CPU
> +	 * switch and use that as indicator of system load and a reason to
> +	 * stop busywaiting, see busywait_stop().
> +	 */
> +	*cpu = get_cpu();
> +	t = local_clock() >> 10;
> +	put_cpu();
> +
> +	return t;
> +}
> +
> +static bool busywait_stop(unsigned long timeout, unsigned cpu)
> +{
> +	unsigned this_cpu;
> +
> +	if (time_after(local_clock_us(&this_cpu), timeout))
> +		return true;
> +
> +	return this_cpu != cpu;
> +}
> +
> +static bool __i915_spin_request(struct drm_i915_gem_request *req,
> +				struct intel_wait *wait,
> +				int state)
> +{
> +	unsigned long timeout;
> +	unsigned cpu;
> +
> +	/* When waiting for high frequency requests, e.g. during synchronous
> +	 * rendering split between the CPU and GPU, the finite amount of time
> +	 * required to set up the irq and wait upon it limits the response
> +	 * rate. By busywaiting on the request completion for a short while we
> +	 * can service the high frequency waits as quickly as possible. However,
> +	 * if it is a slow request, we want to sleep as quickly as possible.
> +	 * The tradeoff between waiting and sleeping is roughly the time it
> +	 * takes to sleep on a request, on the order of a microsecond.
> +	 */
> +
> +	/* Only spin if we know the GPU is processing this request */
> +	if (!i915_gem_request_started(req))
> +		return false;
> +
> +	timeout = local_clock_us(&cpu) + 5;
> +	do {
> +		if (i915_gem_request_completed(req))
> +			return true;
> +
> +		if (signal_pending_state(state, wait->task))
> +			break;
> +
> +		if (busywait_stop(timeout, cpu))
> +			break;
> +
> +		cpu_relax_lowlatency();
> +
> +		/* Break the loop if we have consumed the timeslice (or been
> +		 * preempted), or if either the background thread has
> +		 * enabled the interrupt or the IRQ itself has fired.
> +		 */
> +	} while (!need_resched() && wait->task->state == state);
> +
> +	return false;
> +}
> +
> +/**
> + * __i915_wait_request - wait until execution of request has finished
> + * @req: duh!
> + * @interruptible: do an interruptible wait (normally yes)
> + * @timeout: in - how long to wait (NULL forever); out - how much time remaining
> + *
> + * Note: It is of utmost importance that the passed in seqno and reset_counter
> + * values have been read by the caller in an smp safe manner. Where read-side
> + * locks are involved, it is sufficient to read the reset_counter before
> + * unlocking the lock that protects the seqno. For lockless tricks, the
> + * reset_counter _must_ be read before, and an appropriate smp_rmb must be
> + * inserted.
> + *
> + * Returns 0 if the request was found within the allotted time. Else returns the
> + * errno, with the remaining time filled in the timeout argument.
> + */
> +int __i915_wait_request(struct drm_i915_gem_request *req,
> +			bool interruptible,
> +			s64 *timeout,
> +			struct intel_rps_client *rps)
> +{
> +	int state = interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
> +	struct intel_wait wait;
> +	unsigned long timeout_remain;
> +	int ret = 0;
> +
> +	might_sleep();
> +
> +	if (list_empty(&req->list))
> +		return 0;
> +
> +	if (i915_gem_request_completed(req))
> +		return 0;
> +
> +	timeout_remain = MAX_SCHEDULE_TIMEOUT;
> +	if (timeout) {
> +		if (WARN_ON(*timeout < 0))
> +			return -EINVAL;
> +
> +		if (*timeout == 0)
> +			return -ETIME;
> +
> +		/* Record current time in case interrupted, or wedged */
> +		timeout_remain = nsecs_to_jiffies_timeout(*timeout);
> +		*timeout += ktime_get_raw_ns();
> +	}
> +
> +	trace_i915_gem_request_wait_begin(req);
> +
> +	/* This client is about to stall waiting for the GPU. In many cases
> +	 * this is undesirable and limits the throughput of the system, as
> +	 * many clients cannot continue processing user input/output whilst
> +	 * blocked. RPS autotuning may take tens of milliseconds to respond
> +	 * to the GPU load and thus incurs additional latency for the client.
> +	 * We can circumvent that by promoting the GPU frequency to maximum
> +	 * before we wait. This makes the GPU throttle up much more quickly
> +	 * (good for benchmarks and user experience, e.g. window animations),
> +	 * but at a cost of spending more power processing the workload
> +	 * (bad for battery). Not all clients even want their results
> +	 * immediately and for them we should just let the GPU select its own
> +	 * frequency to maximise efficiency. To prevent a single client from
> +	 * forcing the clocks too high for the whole system, we only allow
> +	 * each client to waitboost once in a busy period.
> +	 */
> +	if (INTEL_INFO(req->i915)->gen >= 6)
> +		gen6_rps_boost(req->i915, rps, req->emitted_jiffies);
> +
> +	intel_wait_init(&wait, req->seqno);
> +	set_task_state(wait.task, state);
> +
> +	/* Optimistic spin for the next ~jiffie before touching IRQs */
> +	if (intel_engine_add_wait(req->ring, &wait)) {
> +		if (__i915_spin_request(req, &wait, state))
> +			goto complete;
> +
> +		/* In order to check that we haven't missed the interrupt
> +		 * as we enabled it, we need to kick ourselves to do a
> +		 * coherent check on the seqno before we sleep.
> +		 */
> +		if (intel_engine_enable_wait_irq(req->ring, &wait))
> +			goto wakeup;
> +	}
> +
> +	for (;;) {
> +		if (signal_pending_state(state, wait.task)) {
> +			ret = -ERESTARTSYS;
> +			break;
> +		}
> +
> +		/* Ensure that even if the GPU hangs, we get woken up. */
> +		i915_queue_hangcheck(req->i915);
> +
> +		timeout_remain = io_schedule_timeout(timeout_remain);
> +		if (timeout_remain == 0) {
> +			ret = -ETIME;
> +			break;
> +		}
> +
> +		if (intel_wait_complete(&wait))
> +			break;
> +
> +wakeup:
> +		set_task_state(wait.task, state);
> +
> +		/* Carefully check if the request is complete, giving time
> +		 * for the seqno to be visible following the interrupt.
> +		 * We also have to check in case we are kicked by the GPU
> +		 * reset in order to drop the struct_mutex.
> +		 */
> +		if (__i915_request_irq_complete(req))
> +			break;
> +	}
> +
> +complete:
> +	intel_engine_remove_wait(req->ring, &wait);
> +	__set_task_state(wait.task, TASK_RUNNING);
> +	trace_i915_gem_request_wait_end(req);
> +
> +	if (timeout) {
> +		*timeout -= ktime_get_raw_ns();
> +		if (*timeout < 0)
> +			*timeout = 0;
> +
> +		/*
> +		 * Apparently ktime isn't accurate enough and occasionally has a
> +		 * bit of mismatch in the jiffies<->nsecs<->ktime loop. So patch
> +		 * things up to make the test happy. We allow up to 1 jiffy.
> +		 *
> +		 * This is a regression from the timespec->ktime conversion.
> +		 */
> +		if (ret == -ETIME && *timeout < jiffies_to_usecs(1)*1000)
> +			*timeout = 0;
> +	}
> +
> +	if (ret == 0 && rps && req->seqno == req->ring->last_submitted_seqno) {
> +		/* The GPU is now idle and this client has stalled.
> +		 * Since no other client has submitted a request in the
> +		 * meantime, assume that this client is the only one
> +		 * supplying work to the GPU but is unable to keep that
> +		 * work supplied because it is waiting. Since the GPU is
> +		 * then never kept fully busy, RPS autoclocking will
> +		 * keep the clocks relatively low, causing further delays.
> +		 * Compensate by giving the synchronous client credit for
> +		 * a waitboost next time.
> +		 */
> +		spin_lock(&req->i915->rps.client_lock);
> +		list_del_init(&rps->link);
> +		spin_unlock(&req->i915->rps.client_lock);
> +	}
> +
> +	return ret;
> +}
> +
> +/**
> + * Waits for a request to be signaled, and cleans up the
> + * request and object lists appropriately for that event.
> + */
> +int
> +i915_wait_request(struct drm_i915_gem_request *req)
> +{
> +	struct drm_device *dev;
> +	struct drm_i915_private *dev_priv;
> +	bool interruptible;
> +	int ret;
> +
> +	BUG_ON(req == NULL);
> +
> +	dev = req->ring->dev;
> +	dev_priv = dev->dev_private;
> +	interruptible = dev_priv->mm.interruptible;
> +
> +	BUG_ON(!mutex_is_locked(&dev->struct_mutex));
> +
> +	ret = __i915_wait_request(req, interruptible, NULL, NULL);
> +	if (ret)
> +		return ret;
> +
> +	i915_gem_request_retire_upto(req);
> +	return 0;
> +}
> +
> +void i915_gem_request_free(struct kref *req_ref)
> +{
> +	struct drm_i915_gem_request *req = container_of(req_ref,
> +						 typeof(*req), ref);
> +	struct intel_context *ctx = req->ctx;
> +
> +	if (req->file_priv)
> +		i915_gem_request_remove_from_client(req);
> +
> +	if (ctx) {
> +		if (i915.enable_execlists) {
> +			if (ctx != req->ring->default_context)
> +				intel_lr_context_unpin(req);
> +		}
> +
> +		i915_gem_context_unreference(ctx);
> +	}
> +
> +	kmem_cache_free(req->i915->requests, req);
> +}
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
> new file mode 100644
> index 000000000000..d46f22f30b0a
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/i915_gem_request.h
> @@ -0,0 +1,223 @@
> +/*
> + * Copyright © 2008-2015 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + *
> + */
> +
> +#ifndef I915_GEM_REQUEST_H
> +#define I915_GEM_REQUEST_H
> +
> +/**
> + * Request queue structure.
> + *
> + * The request queue allows us to note sequence numbers that have been emitted
> + * and may be associated with active buffers to be retired.
> + *
> + * By keeping this list, we can avoid having to do questionable sequence
> + * number comparisons on buffer last_read|write_seqno. It also allows an
> + * emission time to be associated with the request for tracking how far ahead
> + * of the GPU the submission is.
> + *
> + * The requests are reference counted, so upon creation they should have an
> + * initial reference taken using kref_init
> + */
> +struct drm_i915_gem_request {
> +	struct kref ref;
> +
> +	/** On which ring this request was generated */
> +	struct drm_i915_private *i915;
> +	struct intel_engine_cs *ring;
> +	unsigned reset_counter;
> +
> +	 /** GEM sequence number associated with the previous request,
> +	  * when the HWS breadcrumb is equal to this the GPU is processing
> +	  * this request.
> +	  */
> +	u32 previous_seqno;
> +
> +	 /** GEM sequence number associated with this request,
> +	  * when the HWS breadcrumb is equal or greater than this the GPU
> +	  * has finished processing this request.
> +	  */
> +	u32 seqno;
> +
> +	/** Position in the ringbuffer of the start of the request */
> +	u32 head;
> +
> +	/**
> +	 * Position in the ringbuffer of the start of the postfix.
> +	 * This is required to calculate the maximum available ringbuffer
> +	 * space without overwriting the postfix.
> +	 */
> +	 u32 postfix;
> +
> +	/** Position in the ringbuffer of the end of the whole request */
> +	u32 tail;
> +
> +	/**
> +	 * Context and ring buffer related to this request
> +	 * Contexts are refcounted, so when this request is associated with a
> +	 * context, we must increment the context's refcount, to guarantee that
> +	 * it persists while any request is linked to it. Requests themselves
> +	 * are also refcounted, so the request will only be freed when the last
> +	 * reference to it is dismissed, and the code in
> +	 * i915_gem_request_free() will then decrement the refcount on the
> +	 * context.
> +	 */
> +	struct intel_context *ctx;
> +	struct intel_ringbuffer *ringbuf;
> +
> +	/** Batch buffer related to this request if any (used for
> +	    error state dump only) */
> +	struct drm_i915_gem_object *batch_obj;
> +
> +	/** Time at which this request was emitted, in jiffies. */
> +	unsigned long emitted_jiffies;
> +
> +	/** global list entry for this request */
> +	struct list_head list;
> +
> +	struct drm_i915_file_private *file_priv;
> +	/** file_priv list entry for this request */
> +	struct list_head client_list;
> +
> +	/** process identifier submitting this request */
> +	struct pid *pid;
> +
> +	/**
> +	 * The ELSP only accepts two elements at a time, so we queue
> +	 * context/tail pairs on a given queue (ring->execlist_queue) until the
> +	 * hardware is available. The queue serves a double purpose: we also use
> +	 * it to keep track of the up to 2 contexts currently in the hardware
> +	 * (usually one in execution and the other queued up by the GPU): We
> +	 * only remove elements from the head of the queue when the hardware
> +	 * informs us that an element has been completed.
> +	 *
> +	 * All accesses to the queue are mediated by a spinlock
> +	 * (ring->execlist_lock).
> +	 */
> +
> +	/** Execlist link in the submission queue. */
> +	struct list_head execlist_link;
> +
> +	/** Execlists no. of times this request has been sent to the ELSP */
> +	int elsp_submitted;
> +};
> +
> +int i915_gem_request_alloc(struct intel_engine_cs *ring,
> +			   struct intel_context *ctx,
> +			   struct drm_i915_gem_request **req_out);
> +void i915_gem_request_cancel(struct drm_i915_gem_request *req);
> +void i915_gem_request_free(struct kref *req_ref);
> +int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
> +				   struct drm_file *file);
> +void i915_gem_request_retire_upto(struct drm_i915_gem_request *req);
> +
> +static inline uint32_t
> +i915_gem_request_get_seqno(struct drm_i915_gem_request *req)
> +{
> +	return req ? req->seqno : 0;
> +}
> +
> +static inline struct intel_engine_cs *
> +i915_gem_request_get_ring(struct drm_i915_gem_request *req)
> +{
> +	return req ? req->ring : NULL;
> +}
> +
> +static inline struct drm_i915_gem_request *
> +i915_gem_request_reference(struct drm_i915_gem_request *req)
> +{
> +	if (req)
> +		kref_get(&req->ref);
> +	return req;
> +}
> +
> +static inline void
> +i915_gem_request_unreference(struct drm_i915_gem_request *req)
> +{
> +	WARN_ON(!mutex_is_locked(&req->ring->dev->struct_mutex));
> +	kref_put(&req->ref, i915_gem_request_free);
> +}
> +
> +static inline void
> +i915_gem_request_unreference__unlocked(struct drm_i915_gem_request *req)
> +{
> +	struct drm_device *dev;
> +
> +	if (!req)
> +		return;
> +
> +	dev = req->ring->dev;
> +	if (kref_put_mutex(&req->ref, i915_gem_request_free, &dev->struct_mutex))
> +		mutex_unlock(&dev->struct_mutex);
> +}
> +
> +static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
> +					   struct drm_i915_gem_request *src)
> +{
> +	if (src)
> +		i915_gem_request_reference(src);
> +
> +	if (*pdst)
> +		i915_gem_request_unreference(*pdst);
> +
> +	*pdst = src;
> +}
> +
> +void __i915_add_request(struct drm_i915_gem_request *req,
> +			struct drm_i915_gem_object *batch_obj,
> +			bool flush_caches);
> +#define i915_add_request(req) \
> +	__i915_add_request(req, NULL, true)
> +#define i915_add_request_no_flush(req) \
> +	__i915_add_request(req, NULL, false)
> +
> +struct intel_rps_client;
> +
> +int __i915_wait_request(struct drm_i915_gem_request *req,
> +			bool interruptible,
> +			s64 *timeout,
> +			struct intel_rps_client *rps);
> +int __must_check i915_wait_request(struct drm_i915_gem_request *req);
> +
> +/**
> + * Returns true if seq1 is later than seq2.
> + */
> +static inline bool
> +i915_seqno_passed(uint32_t seq1, uint32_t seq2)
> +{
> +	return (int32_t)(seq1 - seq2) >= 0;
> +}
> +
> +static inline bool i915_gem_request_started(struct drm_i915_gem_request *req)
> +{
> +	return i915_seqno_passed(intel_ring_get_seqno(req->ring),
> +				 req->previous_seqno);
> +}
> +
> +static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
> +{
> +	return i915_seqno_passed(intel_ring_get_seqno(req->ring),
> +				 req->seqno);
> +}
> +
> +#endif /* I915_GEM_REQUEST_H */
>


* Re: [PATCH 044/190] drm/i915: Move GEM request routines to i915_gem_request.c
  2016-02-25 17:52   ` Arun Siluvery
@ 2016-03-08 12:58     ` Tvrtko Ursulin
  2016-03-08 13:35       ` Arun Siluvery
  0 siblings, 1 reply; 263+ messages in thread
From: Tvrtko Ursulin @ 2016-03-08 12:58 UTC (permalink / raw)
  To: Arun Siluvery, Chris Wilson, intel-gfx


On 25/02/16 17:52, Arun Siluvery wrote:
> On 11/01/2016 09:16, Chris Wilson wrote:
>> Migrate the request operations out of the main body of i915_gem.c and
>> into their own C file for easier expansion.
>>
>> v2: Move __i915_add_request() across as well
>>
>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> ---
>
> don't we lose the history in git blame when code is moved to a new file?
> is this ok, especially for files like i915_gem.c?

Could git blame -C be the answer to this downside? If it works as 
advertised:

"""
        -C|<num>|
            In addition to -M, detect lines moved or copied from other 
files that were modified in the same commit. This is useful when you 
reorganize your program and move code around across files. When this 
option is given
            twice, the command additionally looks for copies from other 
files in the commit that creates the file. When this option is given 
three times, the command additionally looks for copies from other files 
in any
            commit.
"""

I think there would be value in separating the request bits from the 
memory management bits. It is easier to work with smaller files, 
especially when they are logically organised.

So ack on this from me.

Regards,

Tvrtko

* Re: [PATCH 045/190] drm/i915: Move releasing of the GEM request from free to retire/cancel
  2016-01-11  9:16 ` [PATCH 045/190] drm/i915: Move releasing of the GEM request from free to retire/cancel Chris Wilson
@ 2016-03-08 13:15   ` Tvrtko Ursulin
  2016-04-05 13:42     ` Tvrtko Ursulin
  0 siblings, 1 reply; 263+ messages in thread
From: Tvrtko Ursulin @ 2016-03-08 13:15 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 11/01/16 09:16, Chris Wilson wrote:
> If we move the release of the GEM request (i.e. decoupling it from the
> various lists used for client and context tracking) after it is complete
> (either by the GPU retiring the request, or by the caller cancelling the
> request), we can remove the requirement that the final unreference of
> the GEM request be under the struct_mutex.
>
> v2: Execlists, as always, is badly asymmetric and year-old patches still
> haven't landed to fix it up.

Looks good and pretty standalone to me (depends on i915_gem_request code 
movement only I think). Just one question below.

> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_gem.c          |  4 +--
>   drivers/gpu/drm/i915/i915_gem_request.c  | 50 ++++++++++++++------------------
>   drivers/gpu/drm/i915/i915_gem_request.h  | 14 ---------
>   drivers/gpu/drm/i915/intel_breadcrumbs.c |  2 +-
>   drivers/gpu/drm/i915/intel_display.c     |  2 +-
>   drivers/gpu/drm/i915/intel_lrc.c         |  6 ++--
>   drivers/gpu/drm/i915/intel_pm.c          |  2 +-
>   7 files changed, 30 insertions(+), 50 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 68a25617ca7a..6d8d65304abf 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2502,7 +2502,7 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>   			ret = __i915_wait_request(req[i], true,
>   						  args->timeout_ns > 0 ? &args->timeout_ns : NULL,
>   						  to_rps_client(file));
> -		i915_gem_request_unreference__unlocked(req[i]);
> +		i915_gem_request_unreference(req[i]);
>   	}
>   	return ret;
>
> @@ -3505,7 +3505,7 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
>   		return 0;
>
>   	ret = __i915_wait_request(target, true, NULL, NULL);
> -	i915_gem_request_unreference__unlocked(target);
> +	i915_gem_request_unreference(target);
>
>   	return ret;
>   }
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> index b4ede6dd7b20..1c4f4d83a3c2 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.c
> +++ b/drivers/gpu/drm/i915/i915_gem_request.c
> @@ -184,13 +184,6 @@ err:
>   	return ret;
>   }
>
> -void i915_gem_request_cancel(struct drm_i915_gem_request *req)
> -{
> -	intel_ring_reserved_space_cancel(req->ringbuf);
> -
> -	i915_gem_request_unreference(req);
> -}
> -
>   int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
>   				   struct drm_file *file)
>   {
> @@ -235,9 +228,28 @@ i915_gem_request_remove_from_client(struct drm_i915_gem_request *request)
>   	request->pid = NULL;
>   }
>
> +static void __i915_gem_request_release(struct drm_i915_gem_request *request)
> +{
> +	i915_gem_request_remove_from_client(request);
> +
> +	i915_gem_context_unreference(request->ctx);
> +	i915_gem_request_unreference(request);
> +}
> +
> +void i915_gem_request_cancel(struct drm_i915_gem_request *req)
> +{
> +	intel_ring_reserved_space_cancel(req->ringbuf);
> +	if (i915.enable_execlists) {
> +		if (req->ctx != req->ring->default_context)
> +			intel_lr_context_unpin(req);
> +	}
> +	__i915_gem_request_release(req);
> +}
> +
>   static void i915_gem_request_retire(struct drm_i915_gem_request *request)
>   {
>   	trace_i915_gem_request_retire(request);
> +	list_del_init(&request->list);
>
>   	/* We know the GPU must have read the request to have
>   	 * sent us the seqno + interrupt, so use the position
> @@ -248,11 +260,7 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
>   	 * completion order.
>   	 */
>   	request->ringbuf->last_retired_head = request->postfix;
> -
> -	list_del_init(&request->list);
> -	i915_gem_request_remove_from_client(request);
> -
> -	i915_gem_request_unreference(request);
> +	__i915_gem_request_release(request);
>   }
>
>   void
> @@ -639,21 +647,7 @@ i915_wait_request(struct drm_i915_gem_request *req)
>
>   void i915_gem_request_free(struct kref *req_ref)
>   {
> -	struct drm_i915_gem_request *req = container_of(req_ref,
> -						 typeof(*req), ref);
> -	struct intel_context *ctx = req->ctx;
> -
> -	if (req->file_priv)
> -		i915_gem_request_remove_from_client(req);
> -
> -	if (ctx) {
> -		if (i915.enable_execlists) {
> -			if (ctx != req->ring->default_context)
> -				intel_lr_context_unpin(req);
> -		}
> -
> -		i915_gem_context_unreference(ctx);
> -	}
> -
> +	struct drm_i915_gem_request *req =
> +		container_of(req_ref, typeof(*req), ref);
>   	kmem_cache_free(req->i915->requests, req);
>   }
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
> index d46f22f30b0a..af1b825fce50 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.h
> +++ b/drivers/gpu/drm/i915/i915_gem_request.h
> @@ -154,23 +154,9 @@ i915_gem_request_reference(struct drm_i915_gem_request *req)
>   static inline void
>   i915_gem_request_unreference(struct drm_i915_gem_request *req)
>   {
> -	WARN_ON(!mutex_is_locked(&req->ring->dev->struct_mutex));
>   	kref_put(&req->ref, i915_gem_request_free);
>   }
>
> -static inline void
> -i915_gem_request_unreference__unlocked(struct drm_i915_gem_request *req)
> -{
> -	struct drm_device *dev;
> -
> -	if (!req)
> -		return;
> -
> -	dev = req->ring->dev;
> -	if (kref_put_mutex(&req->ref, i915_gem_request_free, &dev->struct_mutex))
> -		mutex_unlock(&dev->struct_mutex);
> -}
> -
>   static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
>   					   struct drm_i915_gem_request *src)
>   {
> diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> index 0ea01bd6811c..f6731aac7fcf 100644
> --- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
> +++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> @@ -390,7 +390,7 @@ static int intel_breadcrumbs_signaler(void *arg)
>   			 */
>   			intel_engine_remove_wait(engine, &signal->wait);
>
> -			i915_gem_request_unreference__unlocked(signal->request);
> +			i915_gem_request_unreference(signal->request);
>
>   			/* Find the next oldest signal. Note that as we have
>   			 * not been holding the lock, another client may
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 57c54c9bc82b..32885b8d5c02 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -11431,7 +11431,7 @@ static void intel_mmio_flip_work_func(struct work_struct *work)
>   		WARN_ON(__i915_wait_request(mmio_flip->req,
>   					    false, NULL,
>   					    &mmio_flip->i915->rps.mmioflips));
> -		i915_gem_request_unreference__unlocked(mmio_flip->req);
> +		i915_gem_request_unreference(mmio_flip->req);
>   	}
>
>   	/* For framebuffer backed by dmabuf, wait for fence */
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index b634e7d7a92b..7a3069a2beb2 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -587,9 +587,6 @@ static int execlists_context_queue(struct drm_i915_gem_request *request)
>   	struct drm_i915_gem_request *cursor;
>   	int num_elements = 0;
>
> -	if (request->ctx != ring->default_context)
> -		intel_lr_context_pin(request);
> -

Since you remove LRC pin from queue, the lifetime is now either:

1. From request create to cancel.
2. From request create to execlist retirement.

Would it be more logical to leave the LRC pin in queue, but remove it 
from request creation instead? That would make the LRC pin lifetime only 
a single possibility, from queue to execlist retire.

>   	i915_gem_request_reference(request);
>
>   	spin_lock_irq(&ring->execlist_lock);
> @@ -1071,6 +1068,8 @@ static int intel_lr_context_pin(struct drm_i915_gem_request *rq)
>   		ret = intel_lr_context_do_pin(ring, ctx_obj, ringbuf);
>   		if (ret)
>   			goto reset_pin_count;
> +
> +		i915_gem_context_reference(rq->ctx);
>   	}
>   	return ret;
>
> @@ -1090,6 +1089,7 @@ void intel_lr_context_unpin(struct drm_i915_gem_request *rq)
>   		if (--rq->ctx->engine[ring->id].pin_count == 0) {
>   			intel_unpin_ringbuffer_obj(ringbuf);
>   			i915_gem_object_ggtt_unpin(ctx_obj);
> +			i915_gem_context_unreference(rq->ctx);
>   		}
>   	}
>   }
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index e51ba529a97e..0e13135aefaa 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -7289,7 +7289,7 @@ static void __intel_rps_boost_work(struct work_struct *work)
>   		gen6_rps_boost(to_i915(req->ring->dev), NULL,
>   			       req->emitted_jiffies);
>
> -	i915_gem_request_unreference__unlocked(req);
> +	i915_gem_request_unreference(req);
>   	kfree(boost);
>   }
>
>

Regards,

Tvrtko

* Re: [PATCH 044/190] drm/i915: Move GEM request routines to i915_gem_request.c
  2016-03-08 12:58     ` Tvrtko Ursulin
@ 2016-03-08 13:35       ` Arun Siluvery
  0 siblings, 0 replies; 263+ messages in thread
From: Arun Siluvery @ 2016-03-08 13:35 UTC (permalink / raw)
  To: Tvrtko Ursulin, Chris Wilson, intel-gfx

On 08/03/2016 12:58, Tvrtko Ursulin wrote:
>
> On 25/02/16 17:52, Arun Siluvery wrote:
>> On 11/01/2016 09:16, Chris Wilson wrote:
>>> Migrate the request operations out of the main body of i915_gem.c and
>>> into their own C file for easier expansion.
>>>
>>> v2: Move __i915_add_request() across as well
>>>
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> ---
>>
>> don't we lose the history in git blame when code is moved to a new file?
>> is this ok, especially for files like i915_gem.c?
>
> Could git blame -C be the answer to this downside? If it works as
> advertised:
>
> """
>         -C|<num>|
>             In addition to -M, detect lines moved or copied from other
> files that were modified in the same commit. This is useful when you
> reorganize your program and move code around across files. When this
> option is given
>             twice, the command additionally looks for copies from other
> files in the commit that creates the file. When this option is given
> three times, the command additionally looks for copies from other files
> in any
>             commit.
> """
>
> I think there would be value in separating the request bits from the
> memory management bits. It is easier to work with smaller files,
> especially when they are logically organised.

I agree, working with smaller, logically organised files is definitely
easier; I just wanted to understand the effect on history when code is
moved around. I was not aware of 'git blame -C<n>', but it seems to help.

As a test, I moved a fn from one file to another and I could see the 
original commit where that fn was introduced.
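
A rough sketch of that kind of check, with hypothetical file names:

    # cut a helper out of a.c, paste it into b.c, then:
    git commit -am "move helper from a.c to b.c"
    git blame -C b.c    # moved lines keep their original commit ids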

regards
Arun

>
> So ack on this from me.
>
> Regards,
>
> Tvrtko
>


* Re: [PATCH 105/190] drm/i915: Pad GTT views of exec objects up to user specified size
  2016-01-11 10:44   ` [PATCH 105/190] drm/i915: Pad GTT views of exec objects up to user specified size Chris Wilson
@ 2016-03-22 14:32     ` David Weinehall
  0 siblings, 0 replies; 263+ messages in thread
From: David Weinehall @ 2016-03-22 14:32 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Mon, Jan 11, 2016 at 10:44:49AM +0000, Chris Wilson wrote:
> Our GPUs impose certain requirements upon buffers that depend upon how
> exactly they are used. Typically this is expressed as a requirement for
> a larger surface than would naively be computed from pitch * height.
> Normally such requirements are hidden away in the userspace driver, but
> when we accept pointers from strangers and later impose extra conditions
> on them, the original client allocator has no idea about the
> monstrosities in the GPU and we require the userspace driver to inform
> the kernel how many padding pages are required beyond the client
> allocation.
> 
> v2: Long time, no see
> v3: Try an anonymous union for uapi struct compatibility
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Testcase: gem_mmap_gtt --run-subtest huge-bo-tiledX
Tested-by: David Weinehall <david.weinehall@intel.com>

This patch fixes an OOPS on gen8+ triggered by the mentioned testcase.

> ---
>  drivers/gpu/drm/i915/i915_drv.h            |  6 ++-
>  drivers/gpu/drm/i915/i915_gem.c            | 79 +++++++++++++++---------------
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 16 +++++-
>  include/uapi/drm/i915_drm.h                |  8 ++-
>  4 files changed, 64 insertions(+), 45 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 4ada625b751e..49b126e4191e 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2694,11 +2694,13 @@ void i915_gem_free_object(struct drm_gem_object *obj);
>  int __must_check
>  i915_gem_object_pin(struct drm_i915_gem_object *obj,
>  		    struct i915_address_space *vm,
> +		    uint64_t size,
>  		    uint32_t alignment,
>  		    uint64_t flags);
>  int __must_check
>  i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj,
>  			 const struct i915_ggtt_view *view,
> +			 uint64_t size,
>  			 uint32_t alignment,
>  			 uint64_t flags);
>  
> @@ -2931,8 +2933,8 @@ i915_gem_obj_ggtt_pin(struct drm_i915_gem_object *obj,
>  		      uint32_t alignment,
>  		      unsigned flags)
>  {
> -	return i915_gem_object_pin(obj, i915_obj_to_ggtt(obj),
> -				   alignment, flags | PIN_GLOBAL);
> +	return i915_gem_object_pin(obj, i915_obj_to_ggtt(obj), 0, alignment,
> +				   flags | PIN_GLOBAL);
>  }
>  
>  static inline int
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index a82a06a61262..2f14d2da75a5 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1440,7 +1440,7 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
>  	}
>  
>  	/* Now pin it into the GTT if needed */
> -	ret = i915_gem_object_ggtt_pin(obj, &view, 0, PIN_MAPPABLE);
> +	ret = i915_gem_object_ggtt_pin(obj, &view, 0, 0, PIN_MAPPABLE);
>  	if (ret)
>  		goto unlock;
>  
> @@ -2746,20 +2746,20 @@ static struct i915_vma *
>  i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
>  			   struct i915_address_space *vm,
>  			   const struct i915_ggtt_view *ggtt_view,
> +			   uint64_t size,
>  			   unsigned alignment,
>  			   uint64_t flags)
>  {
>  	struct drm_device *dev = obj->base.dev;
>  	struct drm_i915_private *dev_priv = dev->dev_private;
> -	u32 fence_alignment, unfenced_alignment;
> -	u32 search_flag, alloc_flag;
>  	u64 start, end;
> -	u64 size, fence_size;
> +	u32 search_flag, alloc_flag;
>  	struct i915_vma *vma;
>  	int ret;
>  
>  	if (i915_is_ggtt(vm)) {
> -		u32 view_size;
> +		u32 fence_size, fence_alignment, unfenced_alignment;
> +		u64 view_size;
>  
>  		if (WARN_ON(!ggtt_view))
>  			return ERR_PTR(-EINVAL);
> @@ -2777,21 +2777,22 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
>  								view_size,
>  								obj->tiling_mode,
>  								false);
> -		size = flags & PIN_MAPPABLE ? fence_size : view_size;
> +		size = max(size, view_size);
> +		if (flags & PIN_MAPPABLE)
> +			size = max_t(u64, size, fence_size);
> +
> +		if (alignment == 0)
> +			alignment = flags & PIN_MAPPABLE ? fence_alignment :
> +				unfenced_alignment;
> +		if (flags & PIN_MAPPABLE && alignment & (fence_alignment - 1)) {
> +			DRM_DEBUG("Invalid object (view type=%u) alignment requested %u\n",
> +				  ggtt_view ? ggtt_view->type : 0,
> +				  alignment);
> +			return ERR_PTR(-EINVAL);
> +		}
>  	} else {
> -		fence_size = i915_gem_get_gtt_size(dev,
> -						   obj->base.size,
> -						   obj->tiling_mode);
> -		fence_alignment = i915_gem_get_gtt_alignment(dev,
> -							     obj->base.size,
> -							     obj->tiling_mode,
> -							     true);
> -		unfenced_alignment =
> -			i915_gem_get_gtt_alignment(dev,
> -						   obj->base.size,
> -						   obj->tiling_mode,
> -						   false);
> -		size = flags & PIN_MAPPABLE ? fence_size : obj->base.size;
> +		size = max_t(u64, size, obj->base.size);
> +		alignment = 4096;
>  	}
>  
>  	start = flags & PIN_OFFSET_BIAS ? flags & PIN_OFFSET_MASK : 0;
> @@ -2801,24 +2802,14 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
>  	if (flags & PIN_ZONE_4G)
>  		end = min_t(u64, end, (1ULL << 32));
>  
> -	if (alignment == 0)
> -		alignment = flags & PIN_MAPPABLE ? fence_alignment :
> -						unfenced_alignment;
> -	if (flags & PIN_MAPPABLE && alignment & (fence_alignment - 1)) {
> -		DRM_DEBUG("Invalid object (view type=%u) alignment requested %u\n",
> -			  ggtt_view ? ggtt_view->type : 0,
> -			  alignment);
> -		return ERR_PTR(-EINVAL);
> -	}
> -
>  	/* If binding the object/GGTT view requires more space than the entire
>  	 * aperture has, reject it early before evicting everything in a vain
>  	 * attempt to find space.
>  	 */
>  	if (size > end) {
> -		DRM_DEBUG("Attempting to bind an object (view type=%u) larger than the aperture: size=%llu > %s aperture=%llu\n",
> +		DRM_DEBUG("Attempting to bind an object (view type=%u) larger than the aperture: request=%llu [object=%zd] > %s aperture=%llu\n",
>  			  ggtt_view ? ggtt_view->type : 0,
> -			  size,
> +			  size, obj->base.size,
>  			  flags & PIN_MAPPABLE ? "mappable" : "total",
>  			  end);
>  		return ERR_PTR(-E2BIG);
> @@ -3309,7 +3300,7 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
>  	 * (e.g. libkms for the bootup splash), we have to ensure that we
>  	 * always use map_and_fenceable for all scanout buffers.
>  	 */
> -	ret = i915_gem_object_ggtt_pin(obj, view, alignment,
> +	ret = i915_gem_object_ggtt_pin(obj, view, 0, alignment,
>  				       view->type == I915_GGTT_VIEW_NORMAL ?
>  				       PIN_MAPPABLE : 0);
>  	if (ret)
> @@ -3459,12 +3450,17 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
>  }
>  
>  static bool
> -i915_vma_misplaced(struct i915_vma *vma, uint32_t alignment, uint64_t flags)
> +i915_vma_misplaced(struct i915_vma *vma,
> +		   uint64_t size,
> +		   uint32_t alignment,
> +		   uint64_t flags)
>  {
>  	struct drm_i915_gem_object *obj = vma->obj;
>  
> -	if (alignment &&
> -	    vma->node.start & (alignment - 1))
> +	if (vma->node.size < size)
> +		return true;
> +
> +	if (alignment && vma->node.start & (alignment - 1))
>  		return true;
>  
>  	if (flags & PIN_MAPPABLE && !obj->map_and_fenceable)
> @@ -3508,6 +3504,7 @@ static int
>  i915_gem_object_do_pin(struct drm_i915_gem_object *obj,
>  		       struct i915_address_space *vm,
>  		       const struct i915_ggtt_view *ggtt_view,
> +		       uint64_t size,
>  		       uint32_t alignment,
>  		       uint64_t flags)
>  {
> @@ -3538,7 +3535,7 @@ i915_gem_object_do_pin(struct drm_i915_gem_object *obj,
>  		if (WARN_ON(vma->pin_count == DRM_I915_GEM_OBJECT_MAX_PIN_COUNT))
>  			return -EBUSY;
>  
> -		if (i915_vma_misplaced(vma, alignment, flags)) {
> +		if (i915_vma_misplaced(vma, size, alignment, flags)) {
>  			WARN(vma->pin_count,
>  			     "bo is already pinned in %s with incorrect alignment:"
>  			     " offset=%08x %08x, req.alignment=%x, req.map_and_fenceable=%d,"
> @@ -3559,8 +3556,8 @@ i915_gem_object_do_pin(struct drm_i915_gem_object *obj,
>  
>  	bound = vma ? vma->bound : 0;
>  	if (vma == NULL || !drm_mm_node_allocated(&vma->node)) {
> -		vma = i915_gem_object_bind_to_vm(obj, vm, ggtt_view, alignment,
> -						 flags);
> +		vma = i915_gem_object_bind_to_vm(obj, vm, ggtt_view,
> +						 size, alignment, flags);
>  		if (IS_ERR(vma))
>  			return PTR_ERR(vma);
>  	} else {
> @@ -3582,17 +3579,19 @@ i915_gem_object_do_pin(struct drm_i915_gem_object *obj,
>  int
>  i915_gem_object_pin(struct drm_i915_gem_object *obj,
>  		    struct i915_address_space *vm,
> +		    uint64_t size,
>  		    uint32_t alignment,
>  		    uint64_t flags)
>  {
>  	return i915_gem_object_do_pin(obj, vm,
>  				      i915_is_ggtt(vm) ? &i915_ggtt_view_normal : NULL,
> -				      alignment, flags);
> +				      size, alignment, flags);
>  }
>  
>  int
>  i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj,
>  			 const struct i915_ggtt_view *view,
> +			 uint64_t size,
>  			 uint32_t alignment,
>  			 uint64_t flags)
>  {
> @@ -3600,7 +3599,7 @@ i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj,
>  		return -EINVAL;
>  
>  	return i915_gem_object_do_pin(obj, i915_obj_to_ggtt(obj), view,
> -				      alignment, flags | PIN_GLOBAL);
> +				      size, alignment, flags | PIN_GLOBAL);
>  }
>  
>  void
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index d88be1d3cb86..899220139a8a 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -642,10 +642,14 @@ i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
>  			flags |= PIN_HIGH;
>  	}
>  
> -	ret = i915_gem_object_pin(obj, vma->vm, entry->alignment, flags);
> +	ret = i915_gem_object_pin(obj, vma->vm,
> +				  entry->pad_to_size,
> +				  entry->alignment,
> +				  flags);
>  	if ((ret == -ENOSPC  || ret == -E2BIG) &&
>  	    only_mappable_for_reloc(entry->flags))
>  		ret = i915_gem_object_pin(obj, vma->vm,
> +					  entry->pad_to_size,
>  					  entry->alignment,
>  					  flags & ~PIN_MAPPABLE);
>  	if (ret)
> @@ -708,6 +712,9 @@ eb_vma_misplaced(struct i915_vma *vma)
>  	    vma->node.start & (entry->alignment - 1))
>  		return true;
>  
> +	if (vma->node.size < entry->pad_to_size)
> +		return true;
> +
>  	if (entry->flags & EXEC_OBJECT_PINNED &&
>  	    vma->node.start != entry->offset)
>  		return true;
> @@ -1044,6 +1051,13 @@ validate_exec_list(struct drm_device *dev,
>  		if (exec[i].alignment && !is_power_of_2(exec[i].alignment))
>  			return -EINVAL;
>  
> +		/* pad_to_size was once a reserved field, so sanitize it */
> +		if (exec[i].flags & EXEC_OBJECT_PAD_TO_SIZE) {
> +			if (offset_in_page(exec[i].pad_to_size))
> +				return -EINVAL;
> +		} else
> +			exec[i].pad_to_size = 0;
> +
>  		/* First check for malicious input causing overflow in
>  		 * the worst case where we need to allocate the entire
>  		 * relocation tree as a single array.
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 7fee4416dcc7..ff7b438059da 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -697,10 +697,14 @@ struct drm_i915_gem_exec_object2 {
>  #define EXEC_OBJECT_WRITE	(1<<2)
>  #define EXEC_OBJECT_SUPPORTS_48B_ADDRESS (1<<3)
>  #define EXEC_OBJECT_PINNED	(1<<4)
> -#define __EXEC_OBJECT_UNKNOWN_FLAGS -(EXEC_OBJECT_PINNED<<1)
> +#define EXEC_OBJECT_PAD_TO_SIZE	(1<<5)
> +#define __EXEC_OBJECT_UNKNOWN_FLAGS -(EXEC_OBJECT_PAD_TO_SIZE<<1)
>  	__u64 flags;
>  
> -	__u64 rsvd1;
> +	union {
> +		__u64 rsvd1;
> +		__u64 pad_to_size;
> +	};
>  	__u64 rsvd2;
>  };
>  
> -- 
> 2.7.0.rc3
> 
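
As an aside for anyone picking up the new uapi: a minimal sketch of how
userspace might request the padding (the handle and the 1 MiB figure
are invented for illustration; only the flag and the field come from
the patch itself):

#include <stdint.h>
#include <string.h>
#include <drm/i915_drm.h>

/* Fill one execbuf entry so that its GTT node is padded out to 1 MiB.
 * pad_to_size aliases the old rsvd1 field and must be page-aligned,
 * otherwise validate_exec_list() rejects the call with -EINVAL. */
static void fill_padded_entry(struct drm_i915_gem_exec_object2 *entry,
			      uint32_t bo_handle)
{
	memset(entry, 0, sizeof(*entry));
	entry->handle = bo_handle;	/* caller's GEM handle (assumed) */
	entry->flags = EXEC_OBJECT_PAD_TO_SIZE;
	entry->pad_to_size = 1024 * 1024;	/* multiple of 4096 */
}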

* Re: [PATCH 017/190] drm/i915: Remove forcewake dance from seqno/irq barrier on legacy gen6+
  2016-01-11  9:16 ` [PATCH 017/190] drm/i915: Remove forcewake dance from seqno/irq barrier on legacy gen6+ Chris Wilson
  2016-01-11 14:02   ` Dave Gordon
@ 2016-03-24  6:39   ` David Weinehall
  1 sibling, 0 replies; 263+ messages in thread
From: David Weinehall @ 2016-03-24  6:39 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Daniel Vetter, intel-gfx

On Mon, Jan 11, 2016 at 09:16:28AM +0000, Chris Wilson wrote:
> In order to ensure seqno/irq coherency, we current read a ring register.

currently

> We are not sure quite how it works, only that it does. Experiments show
> that e.g. doing a clflush(seqno) instead is not sufficient, but we can
> remove the forcewake dance from the mmio access.
> 
> v2: Baytrail wants a clflush too.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 99780b674311..a1d43b2c7077 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1490,10 +1490,21 @@ gen6_ring_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency)
>  {
>  	/* Workaround to force correct ordering between irq and seqno writes on
>  	 * ivb (and maybe also on snb) by reading from a CS register (like
> -	 * ACTHD) before reading the status page. */
> +	 * ACTHD) before reading the status page.
> +	 *
> +	 * Note that this effectively effectively stalls the read by the time

s/effectively//

> +	 * it takes to do a memory transaction, which more or less ensures
> +	 * that the write from the GPU has sufficient time to invalidate
> +	 * the CPU cacheline. Alternatively we could delay the interrupt from
> +	 * the CS ring to give the write time to land, but that would incur
> +	 * a delay after every batch i.e. much more frequent than a delay
> +	 * when waiting for the interrupt (with the same net latency).
> +	 */
>  	if (!lazy_coherency) {
>  		struct drm_i915_private *dev_priv = ring->dev->dev_private;
> -		POSTING_READ(RING_ACTHD(ring->mmio_base));
> +		POSTING_READ_FW(RING_ACTHD(ring->mmio_base));
> +
> +		intel_flush_status_page(ring, I915_GEM_HWS_INDEX);
>  	}
>  
>  	return intel_read_status_page(ring, I915_GEM_HWS_INDEX);
> -- 
> 2.7.0.rc3
> 
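
To see how a waiter would consume that barrier, a hedged sketch (the
helper below is invented; gen6_ring_get_seqno() and its lazy_coherency
parameter are taken from the patch, and the wrap-safe comparison is the
driver's usual seqno idiom):

/* Ask for full coherency so that gen6_ring_get_seqno() performs the
 * posting read and the status-page clflush before sampling the seqno. */
static bool toy_seqno_passed(struct intel_engine_cs *ring, u32 seqno)
{
	u32 completed = gen6_ring_get_seqno(ring, false /* lazy_coherency */);

	/* Wrap-safe: true once the hardware seqno has reached ours. */
	return (s32)(completed - seqno) >= 0;
}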

* Re: [PATCH 174/190] drm/i915: Show context objects in debugfs/i915_gem_objects
  2016-01-11 11:01   ` [PATCH 174/190] drm/i915: Show context objects in debugfs/i915_gem_objects Chris Wilson
@ 2016-03-24  7:58     ` David Weinehall
  0 siblings, 0 replies; 263+ messages in thread
From: David Weinehall @ 2016-03-24  7:58 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Mon, Jan 11, 2016 at 11:01:15AM +0000, Chris Wilson wrote:
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

With ring --> ringbuf, unless your s/ringbuf/ring/ patch is merged, obviously:

Reviewed-by: David Weinehall <david.weinehall@intel.com>

> ---
>  drivers/gpu/drm/i915/i915_debugfs.c | 35 +++++++++++++++++++++++++++++++++++
>  1 file changed, 35 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 19b0d6a7680d..f8ca00ce986e 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -370,6 +370,40 @@ static void print_batch_pool_stats(struct seq_file *m,
>  	print_file_stats(m, "[k]batch pool", stats);
>  }
>  
> +static int per_file_ctx_stats(int id, void *ptr, void *data)
> +{
> +	struct intel_context *ctx = ptr;
> +	int n;
> +
> +	for (n = 0; n < ARRAY_SIZE(ctx->engine); n++) {
> +		if (ctx->engine[n].state)
> +			per_file_stats(0, ctx->engine[n].state, data);
> +		if (ctx->engine[n].ring)
> +			per_file_stats(0, ctx->engine[n].ring->obj, data);
> +	}
> +
> +	return 0;
> +}
> +
> +static void print_context_stats(struct seq_file *m,
> +				struct drm_i915_private *dev_priv)
> +{
> +	struct file_stats stats;
> +	struct drm_file *file;
> +
> +	memset(&stats, 0, sizeof(stats));
> +
> +	if (dev_priv->kernel_context)
> +		per_file_ctx_stats(0, dev_priv->kernel_context, &stats);
> +
> +	list_for_each_entry(file, &dev_priv->dev->filelist, lhead) {
> +		struct drm_i915_file_private *fpriv = file->driver_priv;
> +		idr_for_each(&fpriv->context_idr, per_file_ctx_stats, &stats);
> +	}
> +
> +	print_file_stats(m, "[k]contexts", stats);
> +}
> +
>  #define count_vmas(list, member) do { \
>  	list_for_each_entry(vma, list, member) { \
>  		size += vma->size; \
> @@ -471,6 +505,7 @@ static int i915_gem_object_info(struct seq_file *m, void* data)
>  
>  	seq_putc(m, '\n');
>  	print_batch_pool_stats(m, dev_priv);
> +	print_context_stats(m, dev_priv);
>  	list_for_each_entry_reverse(file, &dev->filelist, lhead) {
>  		struct file_stats stats;
>  		struct drm_i915_file_private *file_priv = file->driver_priv;
> -- 
> 2.7.0.rc3
> 

* Re: [PATCH 164/190] drm/i915: Move obj->dirty:1 to obj->flags
  2016-01-11 11:01   ` [PATCH 164/190] drm/i915: Move obj->dirty:1 to obj->flags Chris Wilson
@ 2016-03-24  8:17     ` David Weinehall
  0 siblings, 0 replies; 263+ messages in thread
From: David Weinehall @ 2016-03-24  8:17 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Mon, Jan 11, 2016 at 11:01:05AM +0000, Chris Wilson wrote:
> The obj->dirty bit is a companion to the obj->active bits that were
> moved to the obj->flags bitmask. Since we also update this bit inside
> the i915_vma_move_to_active() hotpath, we can aid gcc by also moving
> the obj->dirty bit to obj->flags bitmask.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Needs rebasing -- assuming such a rebase:

Reviewed-by: David Weinehall <david.weinehall@intel.com>

> ---
>  drivers/gpu/drm/i915/i915_debugfs.c        |  2 +-
>  drivers/gpu/drm/i915/i915_drv.h            | 21 ++++++++++++++++++++-
>  drivers/gpu/drm/i915/i915_gem.c            | 18 +++++++++---------
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c |  3 +--
>  drivers/gpu/drm/i915/i915_gem_userptr.c    |  6 +++---
>  drivers/gpu/drm/i915/i915_gpu_error.c      |  2 +-
>  drivers/gpu/drm/i915/intel_lrc.c           |  2 +-
>  7 files changed, 36 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 558d79b63e6c..8a59630fe5fb 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -136,7 +136,7 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
>  	seq_printf(m, "] %x %s%s%s",
>  		   i915_gem_request_get_seqno(obj->last_write.request),
>  		   i915_cache_level_str(to_i915(obj->base.dev), obj->cache_level),
> -		   obj->dirty ? " dirty" : "",
> +		   i915_gem_object_is_dirty(obj) ? " dirty" : "",
>  		   obj->madv == I915_MADV_DONTNEED ? " purgeable" : "");
>  	if (obj->base.name)
>  		seq_printf(m, " (name: %d)", obj->base.name);
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 62a024a7225b..d664a67cda7b 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2058,7 +2058,8 @@ struct drm_i915_gem_object {
>  	 * This is set if the object has been written to since last bound
>  	 * to the GTT
>  	 */
> -	unsigned int dirty:1;
> +#define I915_BO_DIRTY_SHIFT (I915_BO_ACTIVE_REF_SHIFT + 1)
> +#define I915_BO_DIRTY_BIT (1 << I915_BO_DIRTY_SHIFT)
>  
>  	/**
>  	 * Advice: are the backing pages purgeable?
> @@ -2189,6 +2190,24 @@ i915_gem_object_unset_active_reference(struct drm_i915_gem_object *obj)
>  }
>  void __i915_gem_object_release_unless_active(struct drm_i915_gem_object *obj);
>  
> +static inline bool
> +i915_gem_object_is_dirty(const struct drm_i915_gem_object *obj)
> +{
> +	return obj->flags & I915_BO_DIRTY_BIT;
> +}
> +
> +static inline void
> +i915_gem_object_set_dirty(struct drm_i915_gem_object *obj)
> +{
> +	obj->flags |= I915_BO_DIRTY_BIT;
> +}
> +
> +static inline void
> +i915_gem_object_unset_dirty(struct drm_i915_gem_object *obj)
> +{
> +	obj->flags &= ~I915_BO_DIRTY_BIT;
> +}
> +
>  void i915_gem_track_fb(struct drm_i915_gem_object *old,
>  		       struct drm_i915_gem_object *new,
>  		       unsigned frontbuffer_bits);
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 497b68849d09..5347469bbea1 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -209,9 +209,9 @@ i915_gem_object_put_pages_phys(struct drm_i915_gem_object *obj)
>  	}
>  
>  	if (obj->madv == I915_MADV_DONTNEED)
> -		obj->dirty = 0;
> +		i915_gem_object_unset_dirty(obj);
>  
> -	if (obj->dirty) {
> +	if (i915_gem_object_is_dirty(obj)) {
>  		struct address_space *mapping = file_inode(obj->base.filp)->i_mapping;
>  		char *vaddr = obj->phys_handle->vaddr;
>  		int i;
> @@ -235,7 +235,7 @@ i915_gem_object_put_pages_phys(struct drm_i915_gem_object *obj)
>  			page_cache_release(page);
>  			vaddr += PAGE_SIZE;
>  		}
> -		obj->dirty = 0;
> +		i915_gem_object_unset_dirty(obj);
>  	}
>  
>  	sg_free_table(obj->pages);
> @@ -589,7 +589,7 @@ int i915_gem_obj_prepare_shmem_write(struct drm_i915_gem_object *obj,
>  
>  out:
>  	intel_fb_obj_invalidate(obj, ORIGIN_CPU);
> -	obj->dirty = 1;
> +	i915_gem_object_set_dirty(obj);
>  	/* return with the pages pinned */
>  	return 0;
>  
> @@ -1836,12 +1836,12 @@ i915_gem_object_put_pages_gtt(struct drm_i915_gem_object *obj)
>  		i915_gem_object_save_bit_17_swizzle(obj);
>  
>  	if (obj->madv == I915_MADV_DONTNEED)
> -		obj->dirty = 0;
> +		i915_gem_object_unset_dirty(obj);
>  
>  	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, 0) {
>  		struct page *page = sg_page_iter_page(&sg_iter);
>  
> -		if (obj->dirty)
> +		if (i915_gem_object_is_dirty(obj))
>  			set_page_dirty(page);
>  
>  		if (obj->madv == I915_MADV_WILLNEED)
> @@ -1849,7 +1849,7 @@ i915_gem_object_put_pages_gtt(struct drm_i915_gem_object *obj)
>  
>  		page_cache_release(page);
>  	}
> -	obj->dirty = 0;
> +	i915_gem_object_unset_dirty(obj);
>  
>  	sg_free_table(obj->pages);
>  	kfree(obj->pages);
> @@ -3029,7 +3029,7 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
>  	if (write) {
>  		obj->base.read_domains = I915_GEM_DOMAIN_GTT;
>  		obj->base.write_domain = I915_GEM_DOMAIN_GTT;
> -		obj->dirty = 1;
> +		i915_gem_object_set_dirty(obj);
>  	}
>  
>  	trace_i915_gem_object_change_domain(obj,
> @@ -4389,7 +4389,7 @@ i915_gem_object_create_from_data(struct drm_device *dev,
>  	i915_gem_object_pin_pages(obj);
>  	sg = obj->pages;
>  	bytes = sg_copy_from_buffer(sg->sgl, sg->nents, (void *)data, size);
> -	obj->dirty = 1;		/* Backing store is now out of date */
> +	i915_gem_object_set_dirty(obj); /* Backing store is now out of date */
>  	i915_gem_object_unpin_pages(obj);
>  
>  	if (WARN_ON(bytes != size)) {
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 7af562996767..185fbf45a5d2 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -1197,14 +1197,13 @@ void i915_vma_move_to_active(struct i915_vma *vma,
>  
>  	GEM_BUG_ON(!drm_mm_node_allocated(&vma->node));
>  
> -	obj->dirty = 1; /* be paranoid  */
> -
>  	/* The order in which we add operations to the retirement queue is
>  	 * vital here: mark_active adds to the start of the callback list,
>  	 * such that subsequent callbacks are called first. Therefore we
>  	 * add the active reference first and queue for it to be dropped
>  	 * *last*.
>  	 */
> +	i915_gem_object_set_dirty(obj); /* be paranoid */
>  	i915_gem_object_set_active(obj, engine);
>  	i915_gem_request_mark_active(req, &obj->last_read[engine]);
>  
> diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
> index 53f8094b3198..232ce85b39db 100644
> --- a/drivers/gpu/drm/i915/i915_gem_userptr.c
> +++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
> @@ -745,20 +745,20 @@ i915_gem_userptr_put_pages(struct drm_i915_gem_object *obj)
>  	__i915_gem_userptr_set_active(obj, false);
>  
>  	if (obj->madv != I915_MADV_WILLNEED)
> -		obj->dirty = 0;
> +		i915_gem_object_unset_dirty(obj);
>  
>  	i915_gem_gtt_finish_object(obj);
>  
>  	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, 0) {
>  		struct page *page = sg_page_iter_page(&sg_iter);
>  
> -		if (obj->dirty)
> +		if (i915_gem_object_is_dirty(obj))
>  			set_page_dirty(page);
>  
>  		mark_page_accessed(page);
>  		page_cache_release(page);
>  	}
> -	obj->dirty = 0;
> +	i915_gem_object_unset_dirty(obj);
>  
>  	sg_free_table(obj->pages);
>  	kfree(obj->pages);
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index e9ef6b25c696..6fbb11a53b60 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -715,7 +715,7 @@ static void capture_bo(struct drm_i915_error_buffer *err,
>  	err->write_domain = obj->base.write_domain;
>  	err->fence_reg = vma->fence ? vma->fence->id : -1;
>  	err->tiling = obj->tiling_mode;
> -	err->dirty = obj->dirty;
> +	err->dirty = i915_gem_object_is_dirty(obj);
>  	err->purgeable = obj->madv != I915_MADV_WILLNEED;
>  	err->userptr = obj->userptr.mm != NULL;
>  	err->ring = obj->last_write.request ? obj->last_write.request->engine->id : -1;
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 62f19ed51fb2..3e61fce1326e 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -597,7 +597,7 @@ static int intel_lr_context_pin(struct intel_context *ctx,
>  
>  	i915_gem_context_reference(ctx);
>  	ce->vma = vma;
> -	vma->obj->dirty = true;
> +	i915_gem_object_set_dirty(vma->obj);
>  
>  	ggtt_offset = vma->node.start + LRC_PPHWSP_PN * PAGE_SIZE;
>  	ring->context_descriptor =
> -- 
> 2.7.0.rc3
> 

* Re: [PATCH 112/190] drm/i915: Move obj->active:5 to obj->flags
  2016-01-11 10:44   ` [PATCH 112/190] drm/i915: Move obj->active:5 to obj->flags Chris Wilson
@ 2016-03-24 12:00     ` David Weinehall
  0 siblings, 0 replies; 263+ messages in thread
From: David Weinehall @ 2016-03-24 12:00 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Mon, Jan 11, 2016 at 10:44:56AM +0000, Chris Wilson wrote:
> We are motivated to avoid using a bitfield for obj->active for a couple
> of reasons. Firstly, we wish to document our lockless read of obj->active
> using READ_ONCE inside i915_gem_busy_ioctl() and that requires an
> integral type (i.e. not a bitfield). Secondly, gcc produces abysmal code
> when presented with a bitfield and that shows up high on the profiles of
> request tracking (mainly due to excess memory traffic as it converts
> the bitfield to a register and back and generates frequent AGI in the
> process).
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

This patch, together with its dirty counterpart, seems to make a lot of
sense, but the pre-requisites to make it apply are rather extensive;
I tried to tweak it to apply to a nightly, but that's not trivial.

Still, the concept seems sound. I dunno if there's much point to this
right now, since it cannot be merged without the pre-requisites, but:

Reviewed-by: David Weinehall <david.weinehall@intel.com>
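
As an illustration of the READ_ONCE() constraint mentioned in the
commit message, a toy example (the struct below is invented, not the
driver's): READ_ONCE() takes the address of its argument, and C forbids
taking the address of a bit-field, so the lockless read needs a plain
integral member.

#include <linux/compiler.h>

struct toy_obj {
	unsigned int active_old:5;	/* old layout */
	unsigned long flags;		/* new layout */
};

static unsigned long toy_lockless_active(struct toy_obj *obj)
{
	/* Fine: flags is an ordinary, addressable integer. */
	return READ_ONCE(obj->flags) & 0x1f;
	/* READ_ONCE(obj->active_old) would not compile:
	 * "cannot take address of bit-field 'active_old'". */
}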

> ---
>  drivers/gpu/drm/i915/i915_debugfs.c        |  2 +-
>  drivers/gpu/drm/i915/i915_drv.h            | 31 +++++++++++++++++++++++++++++-
>  drivers/gpu/drm/i915/i915_gem.c            | 20 +++++++++----------
>  drivers/gpu/drm/i915/i915_gem_batch_pool.c |  4 ++--
>  drivers/gpu/drm/i915/i915_gem_context.c    |  2 +-
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 10 +++++-----
>  drivers/gpu/drm/i915/i915_gem_gtt.c        |  2 +-
>  drivers/gpu/drm/i915/i915_gem_shrinker.c   |  5 +++--
>  8 files changed, 53 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index dee66807c6bd..6b14c59828e3 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -136,7 +136,7 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
>  
>  	seq_printf(m, "%pK: %s%s%s%s %8zdKiB %02x %02x [ ",
>  		   &obj->base,
> -		   obj->active ? "*" : " ",
> +		   i915_gem_object_is_active(obj) ? "*" : " ",
>  		   get_pin_flag(obj),
>  		   get_tiling_flag(obj),
>  		   get_global_flag(obj),
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index efa43411f0eb..1ecff535973e 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2031,12 +2031,16 @@ struct drm_i915_gem_object {
>  
>  	struct list_head batch_pool_link;
>  
> +	unsigned long flags;
>  	/**
>  	 * This is set if the object is on the active lists (has pending
>  	 * rendering and so a non-zero seqno), and is not set if it i s on
>  	 * inactive (ready to be unbound) list.
>  	 */
> -	unsigned int active:I915_NUM_RINGS;
> +#define I915_BO_ACTIVE_SHIFT 0
> +#define I915_BO_ACTIVE_MASK ((1 << I915_NUM_RINGS) - 1)
> +#define I915_BO_ACTIVE(bo) ((bo)->flags & (I915_BO_ACTIVE_MASK << I915_BO_ACTIVE_SHIFT))
> +#define __I915_BO_ACTIVE(bo) (READ_ONCE((bo)->flags) & (I915_BO_ACTIVE_MASK << I915_BO_ACTIVE_SHIFT))
>  
>  	/**
>  	 * This is set if the object has been written to since last bound
> @@ -2151,6 +2155,31 @@ struct drm_i915_gem_object {
>  #define GEM_BUG_ON(expr)
>  #endif
>  
> +static inline bool
> +i915_gem_object_is_active(const struct drm_i915_gem_object *obj)
> +{
> +	return obj->flags & (I915_BO_ACTIVE_MASK << I915_BO_ACTIVE_SHIFT);
> +}
> +
> +static inline void
> +i915_gem_object_set_active(struct drm_i915_gem_object *obj, int engine)
> +{
> +	obj->flags |= 1 << (engine + I915_BO_ACTIVE_SHIFT);
> +}
> +
> +static inline void
> +i915_gem_object_unset_active(struct drm_i915_gem_object *obj, int engine)
> +{
> +	obj->flags &= ~(1 << (engine + I915_BO_ACTIVE_SHIFT));
> +}
> +
> +static inline bool
> +i915_gem_object_has_active_engine(const struct drm_i915_gem_object *obj,
> +				  int engine)
> +{
> +	return obj->flags & (1 << (engine + I915_BO_ACTIVE_SHIFT));
> +}
> +
>  void i915_gem_track_fb(struct drm_i915_gem_object *old,
>  		       struct drm_i915_gem_object *new,
>  		       unsigned frontbuffer_bits);
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 74c56716a304..6712ecf1239b 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1130,7 +1130,7 @@ i915_gem_object_wait_rendering(struct drm_i915_gem_object *obj,
>  {
>  	int ret, i;
>  
> -	if (!obj->active)
> +	if (!i915_gem_object_is_active(obj))
>  		return 0;
>  
>  	if (readonly) {
> @@ -1143,7 +1143,7 @@ i915_gem_object_wait_rendering(struct drm_i915_gem_object *obj,
>  			if (ret)
>  				return ret;
>  		}
> -		GEM_BUG_ON(obj->active);
> +		GEM_BUG_ON(i915_gem_object_is_active(obj));
>  	}
>  
>  	return 0;
> @@ -1165,7 +1165,7 @@ i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
>  	BUG_ON(!mutex_is_locked(&dev->struct_mutex));
>  	BUG_ON(!dev_priv->mm.interruptible);
>  
> -	if (!obj->active)
> +	if (!i915_gem_object_is_active(obj))
>  		return 0;
>  
>  	if (readonly) {
> @@ -2080,10 +2080,10 @@ i915_gem_object_retire__read(struct i915_gem_active *active,
>  	struct drm_i915_gem_object *obj =
>  		container_of(active, struct drm_i915_gem_object, last_read[ring]);
>  
> -	GEM_BUG_ON((obj->active & (1 << ring)) == 0);
> +	GEM_BUG_ON(!i915_gem_object_has_active_engine(obj, ring));
>  
> -	obj->active &= ~(1 << ring);
> -	if (obj->active)
> +	i915_gem_object_unset_active(obj, ring);
> +	if (i915_gem_object_is_active(obj))
>  		return;
>  
>  	/* Bump our place on the bound list to keep it roughly in LRU order
> @@ -2373,7 +2373,7 @@ i915_gem_object_flush_active(struct drm_i915_gem_object *obj)
>  {
>  	int i;
>  
> -	if (!obj->active)
> +	if (!i915_gem_object_is_active(obj))
>  		return;
>  
>  	for (i = 0; i < I915_NUM_RINGS; i++) {
> @@ -2459,7 +2459,7 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>  
>  	/* Need to make sure the object gets inactive eventually. */
>  	i915_gem_object_flush_active(obj);
> -	if (!obj->active)
> +	if (!i915_gem_object_is_active(obj))
>  		goto out;
>  
>  	/* Do this after OLR check to make sure we make forward progress polling
> @@ -2557,7 +2557,7 @@ i915_gem_object_sync(struct drm_i915_gem_object *obj,
>  	struct drm_i915_gem_request *req[I915_NUM_RINGS];
>  	int ret, i, n;
>  
> -	if (!obj->active)
> +	if (!i915_gem_object_is_active(obj))
>  		return 0;
>  
>  	n = 0;
> @@ -3593,7 +3593,7 @@ i915_gem_busy_ioctl(struct drm_device *dev, void *data,
>  	i915_gem_object_flush_active(obj);
>  
>  	BUILD_BUG_ON(I915_NUM_RINGS > 16);
> -	args->busy = obj->active << 16;
> +	args->busy = I915_BO_ACTIVE(obj) << 16;
>  	if (obj->last_write.request)
>  		args->busy |= obj->last_write.request->engine->id;
>  
> diff --git a/drivers/gpu/drm/i915/i915_gem_batch_pool.c b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
> index d4318665ac6c..5ec5b1439e1f 100644
> --- a/drivers/gpu/drm/i915/i915_gem_batch_pool.c
> +++ b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
> @@ -115,14 +115,14 @@ i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool,
>  
>  	list_for_each_entry_safe(tmp, next, list, batch_pool_link) {
>  		/* The batches are strictly LRU ordered */
> -		if (tmp->active) {
> +		if (i915_gem_object_is_active(tmp)) {
>  			struct drm_i915_gem_request *rq;
>  
>  			rq = tmp->last_read[pool->engine->id].request;
>  			if (!i915_gem_request_completed(rq))
>  				break;
>  
> -			GEM_BUG_ON(tmp->active & ~intel_engine_flag(pool->engine));
> +			GEM_BUG_ON((tmp->flags >> I915_BO_ACTIVE_SHIFT) & (~intel_engine_flag(pool->engine) & I915_BO_ACTIVE_MASK));
>  			GEM_BUG_ON(tmp->last_write.request);
>  		}
>  
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 15d5a5d247e0..9250a7405807 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -427,7 +427,7 @@ void i915_gem_context_fini(struct drm_device *dev)
>  		WARN_ON(!dev_priv->ring[RCS].last_context);
>  		if (dev_priv->ring[RCS].last_context == dctx) {
>  			/* Fake switch to NULL context */
> -			WARN_ON(dctx->legacy_hw_ctx.rcs_state->active);
> +			WARN_ON(i915_gem_object_is_active(dctx->legacy_hw_ctx.rcs_state));
>  			i915_gem_object_ggtt_unpin(dctx->legacy_hw_ctx.rcs_state);
>  			i915_gem_context_unreference(dctx);
>  			dev_priv->ring[RCS].last_context = NULL;
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 79dbd74b73c2..e66864bdbfb4 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -515,7 +515,7 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
>  	}
>  
>  	/* We can't wait for rendering with pagefaults disabled */
> -	if (obj->active && pagefault_disabled())
> +	if (i915_gem_object_is_active(obj) && pagefault_disabled())
>  		return -EFAULT;
>  
>  	if (use_cpu_reloc(obj))
> @@ -977,7 +977,7 @@ static int
>  i915_gem_execbuffer_move_to_gpu(struct drm_i915_gem_request *req,
>  				struct list_head *vmas)
>  {
> -	const unsigned other_rings = ~intel_engine_flag(req->engine);
> +	const unsigned other_rings = (~intel_engine_flag(req->engine) & I915_BO_ACTIVE_MASK) << I915_BO_ACTIVE_SHIFT;
>  	struct i915_vma *vma;
>  	uint32_t flush_domains = 0;
>  	bool flush_chipset = false;
> @@ -986,7 +986,7 @@ i915_gem_execbuffer_move_to_gpu(struct drm_i915_gem_request *req,
>  	list_for_each_entry(vma, vmas, exec_list) {
>  		struct drm_i915_gem_object *obj = vma->obj;
>  
> -		if (obj->active & other_rings) {
> +		if (obj->flags & other_rings) {
>  			ret = i915_gem_object_sync(obj, req);
>  			if (ret)
>  				return ret;
> @@ -1145,9 +1145,9 @@ void i915_vma_move_to_active(struct i915_vma *vma,
>  	 * add the active reference first and queue for it to be dropped
>  	 * *last*.
>  	 */
> -	if (obj->active == 0)
> +	if (!i915_gem_object_is_active(obj))
>  		drm_gem_object_reference(&obj->base);
> -	obj->active |= 1 << engine;
> +	i915_gem_object_set_active(obj, engine);
>  	i915_gem_request_mark_active(req, &obj->last_read[engine]);
>  
>  	if (flags & EXEC_OBJECT_WRITE) {
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 8f3b2f051918..6652df57e5b0 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -3229,7 +3229,7 @@ i915_vma_retire(struct i915_gem_active *active,
>  		container_of(active, struct i915_vma, last_read[engine]);
>  
>  	GEM_BUG_ON((vma->active & (1 << engine)) == 0);
> -	GEM_BUG_ON((vma->obj->active & vma->active) != vma->active);
> +	GEM_BUG_ON(((vma->obj->flags >> I915_BO_ACTIVE_SHIFT) & vma->active) != vma->active);
>  
>  	vma->active &= ~(1 << engine);
>  	if (vma->active)
> diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> index 67f3eb9a8391..4d44def8fb03 100644
> --- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
> +++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> @@ -150,7 +150,8 @@ i915_gem_shrink(struct drm_i915_private *dev_priv,
>  			    obj->madv != I915_MADV_DONTNEED)
>  				continue;
>  
> -			if ((flags & I915_SHRINK_ACTIVE) == 0 && obj->active)
> +			if ((flags & I915_SHRINK_ACTIVE) == 0 &&
> +			    i915_gem_object_is_active(obj))
>  				continue;
>  
>  			if (!can_release_pages(obj))
> @@ -233,7 +234,7 @@ i915_gem_shrinker_count(struct shrinker *shrinker, struct shrink_control *sc)
>  			count += obj->base.size >> PAGE_SHIFT;
>  
>  	list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list) {
> -		if (!obj->active && can_release_pages(obj))
> +		if (!i915_gem_object_is_active(obj) && can_release_pages(obj))
>  			count += obj->base.size >> PAGE_SHIFT;
>  	}
>  
> -- 
> 2.7.0.rc3
> 

* Re: [PATCH 045/190] drm/i915: Move releasing of the GEM request from free to retire/cancel
  2016-03-08 13:15   ` Tvrtko Ursulin
@ 2016-04-05 13:42     ` Tvrtko Ursulin
  2016-04-05 14:09       ` Chris Wilson
  2016-04-05 14:10       ` Chris Wilson
  0 siblings, 2 replies; 263+ messages in thread
From: Tvrtko Ursulin @ 2016-04-05 13:42 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 08/03/16 13:15, Tvrtko Ursulin wrote:
>
> On 11/01/16 09:16, Chris Wilson wrote:
>> If we move the release of the GEM request (i.e. decoupling it from the
>> various lists used for client and context tracking) after it is complete
>> (either by the GPU retiring the request, or by the caller cancelling the
>> request), we can remove the requirement that the final unreference of
>> the GEM request need to be under the struct_mutex.
>>
>> v2: Execlists as always is badly asymmetric and year-old patches still
>> haven't landed to fix it up.
>
> Looks good and pretty standalone to me (depends on i915_gem_request code
> movement only I think). Just one question below.
>
>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> ---
>>   drivers/gpu/drm/i915/i915_gem.c          |  4 +--
>>   drivers/gpu/drm/i915/i915_gem_request.c  | 50
>> ++++++++++++++------------------
>>   drivers/gpu/drm/i915/i915_gem_request.h  | 14 ---------
>>   drivers/gpu/drm/i915/intel_breadcrumbs.c |  2 +-
>>   drivers/gpu/drm/i915/intel_display.c     |  2 +-
>>   drivers/gpu/drm/i915/intel_lrc.c         |  6 ++--
>>   drivers/gpu/drm/i915/intel_pm.c          |  2 +-
>>   7 files changed, 30 insertions(+), 50 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c
>> b/drivers/gpu/drm/i915/i915_gem.c
>> index 68a25617ca7a..6d8d65304abf 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -2502,7 +2502,7 @@ i915_gem_wait_ioctl(struct drm_device *dev, void
>> *data, struct drm_file *file)
>>               ret = __i915_wait_request(req[i], true,
>>                             args->timeout_ns > 0 ? &args->timeout_ns :
>> NULL,
>>                             to_rps_client(file));
>> -        i915_gem_request_unreference__unlocked(req[i]);
>> +        i915_gem_request_unreference(req[i]);
>>       }
>>       return ret;
>>
>> @@ -3505,7 +3505,7 @@ i915_gem_ring_throttle(struct drm_device *dev,
>> struct drm_file *file)
>>           return 0;
>>
>>       ret = __i915_wait_request(target, true, NULL, NULL);
>> -    i915_gem_request_unreference__unlocked(target);
>> +    i915_gem_request_unreference(target);
>>
>>       return ret;
>>   }
>> diff --git a/drivers/gpu/drm/i915/i915_gem_request.c
>> b/drivers/gpu/drm/i915/i915_gem_request.c
>> index b4ede6dd7b20..1c4f4d83a3c2 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_request.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_request.c
>> @@ -184,13 +184,6 @@ err:
>>       return ret;
>>   }
>>
>> -void i915_gem_request_cancel(struct drm_i915_gem_request *req)
>> -{
>> -    intel_ring_reserved_space_cancel(req->ringbuf);
>> -
>> -    i915_gem_request_unreference(req);
>> -}
>> -
>>   int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
>>                      struct drm_file *file)
>>   {
>> @@ -235,9 +228,28 @@ i915_gem_request_remove_from_client(struct
>> drm_i915_gem_request *request)
>>       request->pid = NULL;
>>   }
>>
>> +static void __i915_gem_request_release(struct drm_i915_gem_request
>> *request)
>> +{
>> +    i915_gem_request_remove_from_client(request);
>> +
>> +    i915_gem_context_unreference(request->ctx);
>> +    i915_gem_request_unreference(request);
>> +}
>> +
>> +void i915_gem_request_cancel(struct drm_i915_gem_request *req)
>> +{
>> +    intel_ring_reserved_space_cancel(req->ringbuf);
>> +    if (i915.enable_execlists) {
>> +        if (req->ctx != req->ring->default_context)
>> +            intel_lr_context_unpin(req);
>> +    }
>> +    __i915_gem_request_release(req);
>> +}
>> +
>>   static void i915_gem_request_retire(struct drm_i915_gem_request
>> *request)
>>   {
>>       trace_i915_gem_request_retire(request);
>> +    list_del_init(&request->list);
>>
>>       /* We know the GPU must have read the request to have
>>        * sent us the seqno + interrupt, so use the position
>> @@ -248,11 +260,7 @@ static void i915_gem_request_retire(struct
>> drm_i915_gem_request *request)
>>        * completion order.
>>        */
>>       request->ringbuf->last_retired_head = request->postfix;
>> -
>> -    list_del_init(&request->list);
>> -    i915_gem_request_remove_from_client(request);
>> -
>> -    i915_gem_request_unreference(request);
>> +    __i915_gem_request_release(request);
>>   }
>>
>>   void
>> @@ -639,21 +647,7 @@ i915_wait_request(struct drm_i915_gem_request *req)
>>
>>   void i915_gem_request_free(struct kref *req_ref)
>>   {
>> -    struct drm_i915_gem_request *req = container_of(req_ref,
>> -                         typeof(*req), ref);
>> -    struct intel_context *ctx = req->ctx;
>> -
>> -    if (req->file_priv)
>> -        i915_gem_request_remove_from_client(req);
>> -
>> -    if (ctx) {
>> -        if (i915.enable_execlists) {
>> -            if (ctx != req->ring->default_context)
>> -                intel_lr_context_unpin(req);
>> -        }
>> -
>> -        i915_gem_context_unreference(ctx);
>> -    }
>> -
>> +    struct drm_i915_gem_request *req =
>> +        container_of(req_ref, typeof(*req), ref);
>>       kmem_cache_free(req->i915->requests, req);
>>   }
>> diff --git a/drivers/gpu/drm/i915/i915_gem_request.h
>> b/drivers/gpu/drm/i915/i915_gem_request.h
>> index d46f22f30b0a..af1b825fce50 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_request.h
>> +++ b/drivers/gpu/drm/i915/i915_gem_request.h
>> @@ -154,23 +154,9 @@ i915_gem_request_reference(struct
>> drm_i915_gem_request *req)
>>   static inline void
>>   i915_gem_request_unreference(struct drm_i915_gem_request *req)
>>   {
>> -    WARN_ON(!mutex_is_locked(&req->ring->dev->struct_mutex));
>>       kref_put(&req->ref, i915_gem_request_free);
>>   }
>>
>> -static inline void
>> -i915_gem_request_unreference__unlocked(struct drm_i915_gem_request *req)
>> -{
>> -    struct drm_device *dev;
>> -
>> -    if (!req)
>> -        return;
>> -
>> -    dev = req->ring->dev;
>> -    if (kref_put_mutex(&req->ref, i915_gem_request_free,
>> &dev->struct_mutex))
>> -        mutex_unlock(&dev->struct_mutex);
>> -}
>> -
>>   static inline void i915_gem_request_assign(struct
>> drm_i915_gem_request **pdst,
>>                          struct drm_i915_gem_request *src)
>>   {
>> diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c
>> b/drivers/gpu/drm/i915/intel_breadcrumbs.c
>> index 0ea01bd6811c..f6731aac7fcf 100644
>> --- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
>> +++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
>> @@ -390,7 +390,7 @@ static int intel_breadcrumbs_signaler(void *arg)
>>                */
>>               intel_engine_remove_wait(engine, &signal->wait);
>>
>> -            i915_gem_request_unreference__unlocked(signal->request);
>> +            i915_gem_request_unreference(signal->request);
>>
>>               /* Find the next oldest signal. Note that as we have
>>                * not been holding the lock, another client may
>> diff --git a/drivers/gpu/drm/i915/intel_display.c
>> b/drivers/gpu/drm/i915/intel_display.c
>> index 57c54c9bc82b..32885b8d5c02 100644
>> --- a/drivers/gpu/drm/i915/intel_display.c
>> +++ b/drivers/gpu/drm/i915/intel_display.c
>> @@ -11431,7 +11431,7 @@ static void intel_mmio_flip_work_func(struct
>> work_struct *work)
>>           WARN_ON(__i915_wait_request(mmio_flip->req,
>>                           false, NULL,
>>                           &mmio_flip->i915->rps.mmioflips));
>> -        i915_gem_request_unreference__unlocked(mmio_flip->req);
>> +        i915_gem_request_unreference(mmio_flip->req);
>>       }
>>
>>       /* For framebuffer backed by dmabuf, wait for fence */
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c
>> b/drivers/gpu/drm/i915/intel_lrc.c
>> index b634e7d7a92b..7a3069a2beb2 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -587,9 +587,6 @@ static int execlists_context_queue(struct
>> drm_i915_gem_request *request)
>>       struct drm_i915_gem_request *cursor;
>>       int num_elements = 0;
>>
>> -    if (request->ctx != ring->default_context)
>> -        intel_lr_context_pin(request);
>> -
>
> Since you remove LRC pin from queue, the lifetime is now either:
>
> 1. From request create to cancel.
> 2. From request create to execlist retirement.
>
> Would it be more logical to leave the LRC pin in queue, but remove it
> from request creation instead? That would make the LRC pin lifetime only
> a single possibility, from queue to execlist retire.

I felt I was so close to getting rid of execlist_retired_req_queue,
using this patch as a starting point, when I realised this patch does
not play nicely with the GuC. Back to the drawing board. :(

Regards,

Tvrtko


* Re: [PATCH 045/190] drm/i915: Move releasing of the GEM request from free to retire/cancel
  2016-04-05 13:42     ` Tvrtko Ursulin
@ 2016-04-05 14:09       ` Chris Wilson
  2016-04-05 14:17         ` Tvrtko Ursulin
  2016-04-05 14:10       ` Chris Wilson
  1 sibling, 1 reply; 263+ messages in thread
From: Chris Wilson @ 2016-04-05 14:09 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Tue, Apr 05, 2016 at 02:42:16PM +0100, Tvrtko Ursulin wrote:
> >>@@ -587,9 +587,6 @@ static int execlists_context_queue(struct
> >>drm_i915_gem_request *request)
> >>      struct drm_i915_gem_request *cursor;
> >>      int num_elements = 0;
> >>
> >>-    if (request->ctx != ring->default_context)
> >>-        intel_lr_context_pin(request);
> >>-
> >
> >Since you remove LRC pin from queue, the lifetime is now either:
> >
> >1. From request create to cancel.
> >2. From request create to execlist retirement.
> >
> >Would it be more logical to leave the LRC pin in queue, but remove it
> >from request creation instead? That would make the LRC pin lifetime only
> >a single possibility, from queue to execlist retire.

Well what we actually need in request allocation is pinning the
ringbuffer. At the moment we do that by pinning the request. We also
need to pin the VM in order to manipulate it. We could leave pinning the
logical context object til actual submission.
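
Spelled out as a toy sketch (every name below is a hypothetical
stand-in, not the driver's API; only the ordering of the pins is the
point):

#include <stdbool.h>

struct toy_request {
	bool ring_pinned;	/* ringbuffer: needed to emit commands */
	bool vm_pinned;		/* VM: needed to manipulate it */
	bool ctx_pinned;	/* logical context object */
};

static void toy_request_alloc(struct toy_request *req)
{
	req->ring_pinned = true;
	req->vm_pinned = true;
	/* The context object is deliberately left unpinned here... */
}

static void toy_request_submit(struct toy_request *req)
{
	req->ctx_pinned = true;	/* ...and pinned only at submission. */
}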

> I felt I was so close to getting rid of execlist_retired_req_queue,
> using this patch as a starting point, when I realised this patch
> does not play nicely with the GuC. Back to the drawing board. :(

If you mean what happens if the GuC executes requests out-of-order - it
can't in the current model since we only have a single timeline and that
would break it badly - then nothing changes. We are only moving the
release in time, not decoupling it from any serialisation.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 045/190] drm/i915: Move releasing of the GEM request from free to retire/cancel
  2016-04-05 13:42     ` Tvrtko Ursulin
  2016-04-05 14:09       ` Chris Wilson
@ 2016-04-05 14:10       ` Chris Wilson
  1 sibling, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-04-05 14:10 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Tue, Apr 05, 2016 at 02:42:16PM +0100, Tvrtko Ursulin wrote:
> I felt was so close in getting rid of execlist_retired_req_queue,
> using this patch as a starting point, when I realised this patch
> does not play nicely with the GuC. Back to the drawing board. :(

I will also say that we need to rid requests of struct_mutex far more
than we need the GuC.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 045/190] drm/i915: Move releasing of the GEM request from free to retire/cancel
  2016-04-05 14:09       ` Chris Wilson
@ 2016-04-05 14:17         ` Tvrtko Ursulin
  2016-04-05 14:27           ` Chris Wilson
  0 siblings, 1 reply; 263+ messages in thread
From: Tvrtko Ursulin @ 2016-04-05 14:17 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 05/04/16 15:09, Chris Wilson wrote:
> On Tue, Apr 05, 2016 at 02:42:16PM +0100, Tvrtko Ursulin wrote:
>>>> @@ -587,9 +587,6 @@ static int execlists_context_queue(struct
>>>> drm_i915_gem_request *request)
>>>>       struct drm_i915_gem_request *cursor;
>>>>       int num_elements = 0;
>>>>
>>>> -    if (request->ctx != ring->default_context)
>>>> -        intel_lr_context_pin(request);
>>>> -
>>>
>>> Since you remove LRC pin from queue, the lifetime is now either:
>>>
>>> 1. From request create to cancel.
>>> 2. From request create to execlist retirement.
>>>
>>> Would it be more logical to leave the LRC pin in queue, but remove it
>>> from request creation instead? That would make the LRC pin lifetime only
>>> a single possibility, from queue to execlist retire.
>
> Well what we actually need in request allocation is pinning the
> ringbuffer. At the moment we do that by pinning the request. We also
> need to pin the VM in order to manipulate it. We could leave pinning the
> logical context object til actual submission.

Hmmm, yes. Have to think how complicated or not that would be.

>> I felt I was so close to getting rid of execlist_retired_req_queue,
>> using this patch as a starting point, when I realised this patch
>> does not play nicely with the GuC. Back to the drawing board. :(
>
> If you mean what happens if the GuC executes requests out-of-order - it
> can't in the current model since we only have a single timeline and that
> would break it badly - then nothing changes. We are only moving the
> release in time, not decoupling it from any serialisation.

No, there is no LRC unpin, it is unbalanced in the GuC mode with this 
patch. Unless I am missing something... ?

Regards,

Tvrtko


* Re: [PATCH 045/190] drm/i915: Move releasing of the GEM request from free to retire/cancel
  2016-04-05 14:17         ` Tvrtko Ursulin
@ 2016-04-05 14:27           ` Chris Wilson
  2016-04-05 14:45             ` Chris Wilson
  0 siblings, 1 reply; 263+ messages in thread
From: Chris Wilson @ 2016-04-05 14:27 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Tue, Apr 05, 2016 at 03:17:30PM +0100, Tvrtko Ursulin wrote:
> 
> On 05/04/16 15:09, Chris Wilson wrote:
> >On Tue, Apr 05, 2016 at 02:42:16PM +0100, Tvrtko Ursulin wrote:
> >>>>@@ -587,9 +587,6 @@ static int execlists_context_queue(struct
> >>>>drm_i915_gem_request *request)
> >>>>      struct drm_i915_gem_request *cursor;
> >>>>      int num_elements = 0;
> >>>>
> >>>>-    if (request->ctx != ring->default_context)
> >>>>-        intel_lr_context_pin(request);
> >>>>-
> >>>
> >>>Since you remove LRC pin from queue, the lifetime is now either:
> >>>
> >>>1. From request create to cancel.
> >>>2. From request create to execlist retirement.
> >>>
> >>>Would it be more logical to leave the LRC pin in queue, but remove it
> >>>from request creation instead? That would make the LRC pin lifetime only
> >>>a single possibility, from queue to execlist retire.
> >
> >Well what we actually need in request allocation is pinning the
> >ringbuffer. At the moment we do that by pinning the request. We also
> >need to pin the VM in order to manipulate it. We could leave pinning the
> >logical context object til actual submission.
> 
> Hmmm, yes. Have to think how complicated or not that would be.
> 
> >>I felt I was so close to getting rid of execlist_retired_req_queue,
> >>using this patch as a starting point, when I realised this patch
> >>does not play nicely with the GuC. Back to the drawing board. :(
> >
> >If you mean what happens if the GuC executes requests out-of-order - it
> >can't in the current model since we only have a single timeline and that
> >would break it badly - then nothing changes. We are only moving the
> >release in time, not decoupling it from any serialisation.
> 
> No, there is no LRC unpin, it is unbalanced in the GuC mode with
> this patch. Unless I am missing something... ?

Under GuC submission mode, we acquire the context with

i915_gem_request_alloc ->
	intel_logical_ring_alloc_request_extras ->
	intel_lr_context_pin

intel_logical_ring_advance_and_submit() keeps track of last_context for
both execlists and GuC

and we release the context from

i915_gem_request_retire or i915_gem_request_cancel ->
	__i915_gem_request_release ->
	intel_lr_context_unpin

(We need to fix that piece of insider knowledge.)

i.e. GuC submission LRC tracking behaves identically to execlists.
There's no feedback from requests to the GuC at the moment to track
active GuC clients though.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 045/190] drm/i915: Move releasing of the GEM request from free to retire/cancel
  2016-04-05 14:27           ` Chris Wilson
@ 2016-04-05 14:45             ` Chris Wilson
  0 siblings, 0 replies; 263+ messages in thread
From: Chris Wilson @ 2016-04-05 14:45 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

On Tue, Apr 05, 2016 at 03:27:24PM +0100, Chris Wilson wrote:
> On Tue, Apr 05, 2016 at 03:17:30PM +0100, Tvrtko Ursulin wrote:
> > 
> > On 05/04/16 15:09, Chris Wilson wrote:
> > >On Tue, Apr 05, 2016 at 02:42:16PM +0100, Tvrtko Ursulin wrote:
> > >>>>@@ -587,9 +587,6 @@ static int execlists_context_queue(struct
> > >>>>drm_i915_gem_request *request)
> > >>>>      struct drm_i915_gem_request *cursor;
> > >>>>      int num_elements = 0;
> > >>>>
> > >>>>-    if (request->ctx != ring->default_context)
> > >>>>-        intel_lr_context_pin(request);
> > >>>>-
> > >>>
> > >>>Since you remove LRC pin from queue, the lifetime is now either:
> > >>>
> > >>>1. From request create to cancel.
> > >>>2. From request create to execlist retirement.
> > >>>
> > >>>Would it be more logical to leave the LRC pin in queue, but remove it
> > >>>from request creation instead? That would make the LRC pin lifetime only
> > >>>a single possibility, from queue to execlist retire.
> > >
> > >Well what we actually need in request allocation is pinning the
> > >ringbuffer. At the moment we do that by pinning the request. We also
> > >need to pin the VM in order to manipulate it. We could leave pinning the
> > >logical context object til actual submission.
> > 
> > Hmmm, yes. Have to think how complicated or not that would be.
> > 
> > >>I felt I was so close to getting rid of execlist_retired_req_queue,
> > >>using this patch as a starting point, when I realised this patch
> > >>does not play nicely with the GuC. Back to the drawing board. :(
> > >
> > >If you mean what happens if the GuC executes requests out-of-order - it
> > >can't in the current model since we only have a single timeline and that
> > >would break it badly - then nothing changes. We are only moving the
> > >release in time, not decoupling it from any serialisation.
> > 
> > No, there is no LRC unpin, it is unbalanced in the GuC mode with
> > this patch. Unless I am missing something... ?
> 
> Under GuC submission mode, we acquire the context with
> 
> i915_gem_request_alloc ->
> 	intel_logical_ring_alloc_request_extras ->
> 	intel_lr_context_pin
> 
> intel_logical_ring_advance_and_submit() keeps track of last_context for
> both execlists and GuC
> 
> and we release the context from
> 
> i915_gem_request_retire or i915_gem_request_cancel ->
> 	__i915_gem_request_release ->
> 	intel_lr_context_unpin
> 
> (We need to fix that piece of insider knowledge.)
> 
> i.e. GuC submission LRC tracking behaves identically to execlists.
> There's no feedback from requests to the GuC at the moment to track
> active GuC clients though.

From IRC, this is what is in my tree, not what was there at the time of
the patch. Sorry, so what I need to rebalance context-pinning is your
suggestion of tracking the pinned-context on the request:

https://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=tasklet&id=5c57c9d270b5fcbf9e89804c623c96027746ed86

Thanks, I hope that will slot in nicely....
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

end of thread

Thread overview: 263+ messages
2016-01-11  9:16 [PATCH 001/190] drm: Release driver references to handle before making it available again Chris Wilson
2016-01-11  9:16 ` [PATCH 002/190] drm/i915: Move the mb() following release-mmap into release-mmap Chris Wilson
2016-01-11  9:16 ` [PATCH 003/190] drm/i915: Add an optional selection from i915 of CONFIG_MMU_NOTIFIER Chris Wilson
2016-02-17 12:59   ` Daniel Vetter
2016-01-11  9:16 ` [PATCH 004/190] drm/i915: Fix some invalid requests cancellations Chris Wilson
2016-01-12 18:16   ` [Intel-gfx] " Dave Gordon
2016-01-12 18:16     ` Dave Gordon
2016-01-13 20:06     ` [Intel-gfx] " Chris Wilson
2016-01-11  9:16 ` [PATCH 005/190] drm/i915: Force clean compilation with -Werror Chris Wilson
2016-01-11  9:16 ` [PATCH 006/190] drm/i915: Add GEM debugging Kconfig option Chris Wilson
2016-01-12 17:44   ` Dave Gordon
2016-01-11  9:16 ` [PATCH 007/190] drm/i915: Hide the atomic_read(reset_counter) behind a helper Chris Wilson
2016-01-11  9:16 ` [PATCH 008/190] drm/i915: Simplify checking of GPU reset_counter in display pageflips Chris Wilson
2016-01-11  9:16 ` [PATCH 009/190] drm/i915: Tighten reset_counter for reset status Chris Wilson
2016-01-11  9:16 ` [PATCH 010/190] drm/i915: Store the reset counter when constructing a request Chris Wilson
2016-01-11  9:16 ` [PATCH 011/190] drm/i915: Simplify reset_counter handling during atomic modesetting Chris Wilson
2016-01-11  9:16 ` [PATCH 012/190] drm/i915: Prevent leaking of -EIO from i915_wait_request() Chris Wilson
2016-01-11  9:16 ` [PATCH 013/190] drm/i915: Suppress error message when GPU resets are disabled Chris Wilson
2016-01-11  9:16 ` [PATCH 014/190] drm/i915: Delay queuing hangcheck to wait-request Chris Wilson
2016-01-11  9:16 ` [PATCH 015/190] drm/i915: Remove the dedicated hangcheck workqueue Chris Wilson
2016-01-11  9:16 ` [PATCH 016/190] drm/i915: Make queueing the hangcheck work inline Chris Wilson
2016-01-11  9:16 ` [PATCH 017/190] drm/i915: Remove forcewake dance from seqno/irq barrier on legacy gen6+ Chris Wilson
2016-01-11 14:02   ` Dave Gordon
2016-01-21 16:27     ` Mika Kuoppala
2016-03-24  6:39   ` David Weinehall
2016-01-11  9:16 ` [PATCH 018/190] drm/i915: Slaughter the thundering i915_wait_request herd Chris Wilson
2016-01-11  9:16 ` [PATCH 019/190] drm/i915: Separate out the seqno-barrier from engine->get_seqno Chris Wilson
2016-01-11 15:43   ` Dave Gordon
2016-01-11  9:16 ` [PATCH 020/190] drm/i915: Remove the lazy_coherency parameter from request-completed? Chris Wilson
2016-01-11 15:45   ` Dave Gordon
2016-01-11 16:24     ` Chris Wilson
2016-01-12 10:27   ` Mika Kuoppala
2016-01-12 10:51     ` Chris Wilson
2016-01-11  9:16 ` [PATCH 021/190] drm/i915: Use HWS for seqno tracking everywhere Chris Wilson
2016-01-11 20:03   ` Dave Gordon
2016-01-12 10:05   ` Mika Kuoppala
2016-01-12 11:03     ` Chris Wilson
2016-01-12 14:30       ` Mika Kuoppala
2016-01-12 14:46         ` Chris Wilson
2016-01-11  9:16 ` [PATCH 022/190] drm/i915: Check the CPU cached value of seqno after waking the waiter Chris Wilson
2016-01-11  9:16 ` [PATCH 023/190] drm/i915: Only apply one barrier after a breadcrumb interrupt is posted Chris Wilson
2016-01-11  9:16 ` [PATCH 024/190] drm/i915: Replace manual barrier() with READ_ONCE() in HWS accessor Chris Wilson
2016-01-12 14:17   ` Mika Kuoppala
2016-01-11  9:16 ` [PATCH 025/190] drm/i915: Broadwell execlists needs exactly the same seqno w/a as legacy Chris Wilson
2016-01-11  9:16 ` [PATCH 026/190] drm/i915: Stop setting wraparound seqno on initialisation Chris Wilson
2016-01-11  9:16 ` [PATCH 027/190] drm/i915: Only query timestamp when measuring elapsed time Chris Wilson
2016-01-11  9:16 ` [PATCH 028/190] drm/i915: On GPU reset, set the HWS breadcrumb to the last seqno Chris Wilson
2016-01-11  9:16 ` [PATCH 029/190] drm/i915: Convert trace-irq to the breadcrumb waiter Chris Wilson
2016-01-11  9:16 ` [PATCH 030/190] drm/i915: Move the get/put irq locking into the caller Chris Wilson
2016-01-11  9:16 ` [PATCH 031/190] drm/i915: Harden detection of missed interrupts Chris Wilson
2016-01-11  9:16 ` [PATCH 032/190] drm/i915: Remove debug noise on detecting fault-injection " Chris Wilson
2016-01-11  9:16 ` [PATCH 033/190] drm/i915: Only start retire worker when idle Chris Wilson
2016-01-11  9:16 ` [PATCH 034/190] drm/i915: Do not keep postponing the idle-work Chris Wilson
2016-01-11  9:16 ` [PATCH 035/190] drm/i915: Remove redundant queue_delayed_work() from throttle ioctl Chris Wilson
2016-01-11  9:16 ` [PATCH 036/190] drm/i915: Restore waitboost credit to the synchronous waiter Chris Wilson
2016-01-11 16:10   ` Jesse Barnes
2016-01-11  9:16 ` [PATCH 037/190] drm/i915: Add background commentary to "waitboosting" Chris Wilson
2016-01-11  9:16 ` [PATCH 038/190] drm/i915: Flush the RPS bottom-half when the GPU idles Chris Wilson
2016-01-11  9:16 ` [PATCH 039/190] drm/i915: Remove stop-rings debugfs interface Chris Wilson
2016-02-25 17:30   ` Arun Siluvery
2016-01-11  9:16 ` [PATCH 040/190] drm/i915: Record the ringbuffer associated with the request Chris Wilson
2016-01-11  9:16 ` [PATCH 041/190] drm/i915: Allow userspace to request no-error-capture upon GPU hangs Chris Wilson
2016-01-11  9:16 ` [PATCH 042/190] drm/i915: Clean up GPU hang message Chris Wilson
2016-02-25 17:40   ` Arun Siluvery
2016-01-11  9:16 ` [PATCH 043/190] drm/i915: Skip capturing an error state if we already have one Chris Wilson
2016-01-11  9:16 ` [PATCH 044/190] drm/i915: Move GEM request routines to i915_gem_request.c Chris Wilson
2016-02-25 17:52   ` Arun Siluvery
2016-03-08 12:58     ` Tvrtko Ursulin
2016-03-08 13:35       ` Arun Siluvery
2016-01-11  9:16 ` [PATCH 045/190] drm/i915: Move releasing of the GEM request from free to retire/cancel Chris Wilson
2016-03-08 13:15   ` Tvrtko Ursulin
2016-04-05 13:42     ` Tvrtko Ursulin
2016-04-05 14:09       ` Chris Wilson
2016-04-05 14:17         ` Tvrtko Ursulin
2016-04-05 14:27           ` Chris Wilson
2016-04-05 14:45             ` Chris Wilson
2016-04-05 14:10       ` Chris Wilson
2016-01-11  9:16 ` [PATCH 046/190] drm/i915: Derive GEM requests from dma-fence Chris Wilson
2016-01-11  9:16 ` [PATCH 047/190] drm/i915: Rename request reference/unreference to get/put Chris Wilson
2016-01-11  9:16 ` [PATCH 048/190] drm/i915: Disable waitboosting for fence_wait() Chris Wilson
2016-01-11  9:17 ` [PATCH 049/190] drm/i915: Disable waitboosting for mmioflips/semaphores Chris Wilson
2016-01-11  9:17 ` [PATCH 050/190] drm/i915: Refactor duplicate object vmap functions Chris Wilson
2016-01-11  9:17 ` [PATCH 051/190] drm,i915: Introduce drm_malloc_gfp() Chris Wilson
2016-01-11  9:17 ` [PATCH 052/190] drm/i915: Treat ringbuffer writes as write to normal memory Chris Wilson
2016-01-11  9:17 ` [PATCH 053/190] drm/i915: Convert i915_semaphores_is_enabled over to early sanitize Chris Wilson
2016-01-12 19:07   ` Dave Gordon
2016-01-11  9:17 ` [PATCH 054/190] drm/i915: Use the new rq->i915 field where appropriate Chris Wilson
2016-01-11  9:17 ` [PATCH 055/190] drm/i915: Unify intel_logical_ring_emit and intel_ring_emit Chris Wilson
2016-01-12 17:29   ` Dave Gordon
2016-01-11  9:17 ` [PATCH 056/190] drm/i915: Unify intel_ring_begin() Chris Wilson
2016-01-11  9:17 ` [PATCH 057/190] drm/i915: Remove the identical implementations of request space reservation Chris Wilson
2016-01-11  9:17 ` [PATCH 058/190] drm/i915: Rename request->ring to request->engine Chris Wilson
2016-01-28 11:45   ` Tvrtko Ursulin
2016-01-11  9:17 ` [PATCH 059/190] drm/i915: Rename request->ringbuf to request->ring Chris Wilson
2016-01-28 11:48   ` Tvrtko Ursulin
2016-01-11  9:17 ` [PATCH 060/190] drm/i915: Rename backpointer from intel_ringbuffer to intel_engine_cs Chris Wilson
2016-01-28 11:49   ` Tvrtko Ursulin
2016-01-11  9:17 ` [PATCH 061/190] drm/i915: Rename intel_context[engine].ringbuf Chris Wilson
2016-01-11  9:17 ` [PATCH 062/190] drm/i915: Rename extern functions operating on intel_engine_cs Chris Wilson
2016-01-11  9:17 ` [PATCH 063/190] drm/i915: Rename struct intel_ringbuffer to intel_ring Chris Wilson
2016-01-28 11:54   ` Tvrtko Ursulin
2016-01-11  9:17 ` [PATCH 064/190] drm/i915: Rename intel_pin_and_map_ring() Chris Wilson
2016-01-11  9:17 ` [PATCH 065/190] drm/i915: Remove obsolete engine->gpu_caches_dirty Chris Wilson
2016-01-11  9:17 ` [PATCH 066/190] drm/i915: Simplify request_alloc by returning the allocated request Chris Wilson
2016-01-12 17:11   ` Dave Gordon
2016-01-11  9:17 ` [PATCH 067/190] drm/i915: Unify legacy/execlists emission of MI_BATCHBUFFER_START Chris Wilson
2016-01-11  9:17 ` [PATCH 068/190] drm/i915: Unify adding requests between ringbuffer and execlists Chris Wilson
2016-01-11  9:17 ` [PATCH 069/190] drm/i915: Remove duplicate golden render state init from execlists Chris Wilson
2016-01-11  9:17 ` [PATCH 070/190] drm/i915: Unify legacy/execlists submit_execbuf callbacks Chris Wilson
2016-01-11  9:17 ` [PATCH 071/190] drm/i915: Simplify calling engine->sync_to Chris Wilson
2016-01-11  9:17 ` [PATCH 072/190] drm/i915: Execlists cannot pin a context without the object Chris Wilson
2016-01-11 15:24   ` Tvrtko Ursulin
2016-01-11  9:17 ` [PATCH 073/190] drm/i915: Introduce i915_gem_active for request tracking Chris Wilson
2016-01-11 17:32   ` Tvrtko Ursulin
2016-01-11 22:49     ` Chris Wilson
2016-01-12 10:04       ` Tvrtko Ursulin
2016-01-12 11:01         ` Chris Wilson
2016-01-12 13:42           ` Tvrtko Ursulin
2016-01-12 13:44           ` Tvrtko Ursulin
2016-01-12 14:08             ` Chris Wilson
2016-01-11  9:17 ` [PATCH 074/190] drm/i915: Rename request->list to link for consistency Chris Wilson
2016-01-12 13:47   ` Tvrtko Ursulin
2016-01-11  9:17 ` [PATCH 075/190] drm/i915: Refactor activity tracking for requests Chris Wilson
2016-01-28 11:41   ` Tvrtko Ursulin
2016-01-28 11:46     ` Chris Wilson
2016-01-28 11:56       ` Tvrtko Ursulin
2016-01-11  9:17 ` [PATCH 076/190] drm/i915: Rename vma->*_list to *_link for consistency Chris Wilson
2016-01-12 13:49   ` Tvrtko Ursulin
2016-01-11  9:17 ` [PATCH 077/190] drm/i915: Amalgamate GGTT/ppGTT vma debug list walkers Chris Wilson
2016-01-11  9:17 ` [PATCH 078/190] drm/i915: Split early global GTT initialisation Chris Wilson
2016-01-11  9:17 ` [PATCH 079/190] drm/i915: Reduce the pointer dance of i915_is_ggtt() Chris Wilson
2016-01-15 12:12   ` Dave Gordon
2016-01-15 12:24     ` Chris Wilson
2016-01-11  9:17 ` [PATCH 080/190] drm/i915: Store owning file on the i915_address_space Chris Wilson
2016-01-11  9:17 ` [PATCH 081/190] drm/i915: i915_vma_move_to_active prep patch Chris Wilson
2016-01-11  9:17 ` [PATCH 082/190] drm/i915: Count how many VMA are bound for an object Chris Wilson
2016-01-11  9:17 ` [PATCH 083/190] drm/i915: Be more careful when unbinding vma Chris Wilson
2016-01-11  9:17 ` [PATCH 084/190] drm/i915: Track active vma requests Chris Wilson
2016-01-11  9:17 ` [PATCH 085/190] drm/i915: Release vma when the handle is closed Chris Wilson
2016-01-11  9:17 ` [PATCH 086/190] drm/i915: Mark the context and address space as closed Chris Wilson
2016-01-11 10:44 ` [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
2016-01-11 10:44   ` [PATCH 088/190] drm/i915: Move execlists interrupt based submission to a bottom-half Chris Wilson
2016-02-19 12:08     ` Tvrtko Ursulin
2016-02-19 12:29       ` Chris Wilson
2016-02-19 14:10         ` Tvrtko Ursulin
2016-02-19 14:34           ` Chris Wilson
2016-02-19 14:52             ` Tvrtko Ursulin
2016-02-19 15:02               ` Chris Wilson
2016-02-19 14:41           ` Chris Wilson
2016-01-11 10:44   ` [PATCH 089/190] drm/i915: Tidy execlists submission and tracking Chris Wilson
2016-01-11 10:44   ` [PATCH 090/190] drm/i915: Refactor execlists default context pinning Chris Wilson
2016-01-11 10:44   ` [PATCH 091/190] drm/i915: Move context initialisation to first-use Chris Wilson
2016-01-11 10:44   ` [PATCH 092/190] drm/i915: Move the magical deferred context allocation into the request Chris Wilson
2016-01-11 10:44   ` [PATCH 093/190] drm/i915: Move the forced switch back to the kernel context into eviction Chris Wilson
2016-01-11 10:44   ` [PATCH 094/190] drm/i915: Remove early l3-remap Chris Wilson
2016-01-11 10:44   ` [PATCH 095/190] drm/i915: Rearrange switch_context to load the aliasing ppgtt on first use Chris Wilson
2016-01-11 10:44   ` [PATCH 096/190] drm/i915: Eliminate early submission of context enabling request Chris Wilson
2016-01-11 10:44   ` [PATCH 097/190] drm/i915/shrinker: Flush active on objects before counting Chris Wilson
2016-01-11 10:44   ` [PATCH 098/190] drm/i915: Double check the active status on the batch pool Chris Wilson
2016-01-11 10:44   ` [PATCH 099/190] drm/i915: Check for request completion before choosing CS flips Chris Wilson
2016-01-11 10:44   ` [PATCH 100/190] drm/i915: Remove request retirement before each batch Chris Wilson
2016-01-11 10:44   ` [PATCH 101/190] drm/i915: Only retire if necessary when creating a userptr Chris Wilson
2016-01-11 10:44   ` [PATCH 102/190] drm/i915: Move the "per-ring" default_context to the device Chris Wilson
2016-01-11 14:40     ` Dave Gordon
2016-01-11 10:44   ` [PATCH 103/190] drm/i915: Move pinning of dev_priv->kernel_context into its creator Chris Wilson
2016-01-11 10:44   ` [PATCH 104/190] drm/i915: Remove i915_gem_execbuffer_retire_commands() Chris Wilson
2016-01-11 10:44   ` [PATCH 105/190] drm/i915: Pad GTT views of exec objects up to user specified size Chris Wilson
2016-03-22 14:32     ` David Weinehall
2016-01-11 10:44   ` [PATCH 106/190] drm/i915: Split insertion/binding of an object into the VM Chris Wilson
2016-01-11 10:44   ` [PATCH 107/190] drm/i915: Record allocated vma size Chris Wilson
2016-01-11 10:44   ` [PATCH 108/190] drm/i915: Start passing around i915_vma from execbuffer Chris Wilson
2016-01-11 10:44   ` [PATCH 109/190] drm/i915: Remove highly confusing i915_gem_obj_ggtt_pin() Chris Wilson
2016-01-11 10:44   ` [PATCH 110/190] drm/i915: Move vma->pin_count:4 to vma->flags Chris Wilson
2016-01-11 10:44   ` [PATCH 111/190] drm/i915: Make fb_tracking.lock a spinlock Chris Wilson
2016-01-11 10:44   ` [PATCH 112/190] drm/i915: Move obj->active:5 to obj->flags Chris Wilson
2016-03-24 12:00     ` David Weinehall
2016-01-11 10:44   ` [PATCH 113/190] drm/i915: Enable lockless lookup of request tracking via RCU Chris Wilson
2016-01-11 10:44   ` [PATCH 114/190] drm/i915: Remove (struct_mutex) locking for wait-ioctl Chris Wilson
2016-01-11 10:44   ` [PATCH 115/190] drm/i915: Remove (struct_mutex) locking for busy-ioctl Chris Wilson
2016-01-11 10:45   ` [PATCH 116/190] drm/i915: Reduce locking inside swfinish ioctl Chris Wilson
2016-01-11 10:45   ` [PATCH 117/190] drm/i915: Remove pinned check from madvise ioctl Chris Wilson
2016-01-11 10:45   ` [PATCH 118/190] drm/i915: Remove locking for get_tiling Chris Wilson
2016-01-11 10:45   ` [PATCH 119/190] drm/i915: Reduce amount of duplicate buffer information captured on error Chris Wilson
2016-01-11 10:45   ` [PATCH 120/190] drm/i915: Stop the machine whilst capturing the GPU crash dump Chris Wilson
2016-01-11 10:45   ` [PATCH 121/190] drm/i915: Scan GGTT active list for context object Chris Wilson
2016-01-11 10:45   ` [PATCH 122/190] drm/i915: Move setting of request->batch into its single callsite Chris Wilson
2016-01-11 10:45   ` [PATCH 123/190] drm/i915: Mark unmappable GGTT entries as PIN_HIGH Chris Wilson
2016-01-11 10:45   ` [PATCH 124/190] drm/i915: Track pinned vma inside guc Chris Wilson
2016-01-11 10:45   ` [PATCH 125/190] drm/i915: Track pinned VMA Chris Wilson
2016-01-11 10:45   ` [PATCH 126/190] drm/i915: Print the batchbuffer offset next to BBADDR in error state Chris Wilson
2016-01-11 10:45   ` [PATCH 127/190] drm/i915: Cache kmap between relocations Chris Wilson
2016-01-11 10:45   ` [PATCH 128/190] drm/i915: Extract i915_gem_obj_prepare_shmem_write() Chris Wilson
2016-01-11 10:45   ` [PATCH 129/190] drm/i915: Before accessing an object via the cpu, flush GTT writes Chris Wilson
2016-01-11 10:45   ` [PATCH 130/190] drm/i915: Wait for writes through the GTT to land before reading back Chris Wilson
2016-01-11 10:45   ` [PATCH 131/190] drm/i915: Pin the pages first in shmem prepare read/write Chris Wilson
2016-01-11 10:45   ` [PATCH 132/190] drm/i915: Tidy up flush cpu/gtt write domains Chris Wilson
2016-01-11 10:45   ` [PATCH 133/190] drm/i915: Convert known clflush paths over to clflush_cache_range() Chris Wilson
2016-01-11 10:45   ` [PATCH 134/190] drm/i915: Refactor execbuffer relocation writing Chris Wilson
2016-01-11 10:45   ` [PATCH 135/190] drm/i915: Move map-and-fenceable tracking to the VMA Chris Wilson
2016-01-11 10:45   ` [PATCH 136/190] drm/i915: Move ioremap_wc tracking onto VMA Chris Wilson
2016-02-11 13:20     ` Tvrtko Ursulin
2016-02-11 13:29       ` Chris Wilson
2016-02-11 14:10         ` Tvrtko Ursulin
2016-02-19 15:11           ` Chris Wilson
2016-02-22 15:29             ` Tvrtko Ursulin
2016-02-23 10:21               ` Chris Wilson
2016-01-11 10:45   ` [PATCH 137/190] drm/i915: Shrink pages around failure to dma map Chris Wilson
2016-01-11 10:45   ` [PATCH 138/190] drm/i915/userptr: Make gup errors stickier Chris Wilson
2016-01-11 10:45   ` [PATCH 139/190] drm/i915: Move fence tracking from object to vma Chris Wilson
2016-01-11 10:45   ` [PATCH 140/190] drm/i915: Fix partial GGTT faulting Chris Wilson
2016-01-11 10:45   ` [PATCH 141/190] drm/i915: Choose not to evict faultable objects from the GGTT Chris Wilson
2016-01-11 11:00 ` [PATCH 142/190] drm/i915: Fallback to using unmappable memory for scanout Chris Wilson
2016-01-11 11:00   ` [PATCH 143/190] drm/i915: Track display alignment on VMA Chris Wilson
2016-01-11 11:00   ` [PATCH 144/190] drm/i915: Bump the inactive MRU tracking for all VMA accessed Chris Wilson
2016-01-11 11:00   ` [PATCH 145/190] drm/i915: Stop discarding GTT cache-domain on unbind vma Chris Wilson
2016-01-12 13:22     ` Joonas Lahtinen
2016-01-11 11:00   ` [PATCH 146/190] io-mapping: Always create a struct to hold metadata about the io-mapping Chris Wilson
2016-01-11 11:00   ` [PATCH 147/190] drm/i915: Use remap_io_mapping() to prefault all PTE in a single pass Chris Wilson
2016-01-11 11:00   ` [PATCH 148/190] drm/i915: Stop marking the unaccessible scratch page as UC Chris Wilson
2016-01-11 11:00   ` [PATCH 149/190] drm/i915: Use i915_vm_to_ppgtt() Chris Wilson
2016-01-11 11:00   ` [PATCH 150/190] drm/i915: Embed the scratch page struct into each VM Chris Wilson
2016-01-11 11:00   ` [PATCH 151/190] drm/i915: Allow DMA pagetables to use highmem Chris Wilson
2016-01-11 11:00   ` [PATCH 152/190] drm/i915: Replace request->postfix with ->head for space searching Chris Wilson
2016-01-11 11:00   ` [PATCH 153/190] drm/i915: Record the position of the start of the request Chris Wilson
2016-01-11 11:00   ` [PATCH 154/190] drm/i915: Move per-request pid from request to ctx Chris Wilson
2016-01-11 11:00   ` [PATCH 155/190] drm/i915: Merge legacy+execlists context structs Chris Wilson
2016-01-11 11:00   ` [PATCH 156/190] drm/i915: Store the active context object on all engines upon error Chris Wilson
2016-01-11 11:00   ` [PATCH 157/190] drm/i915: Tidy execlists by using intel_context_engine locals Chris Wilson
2016-01-11 11:00   ` [PATCH 158/190] drm/i915: Skip holding an object reference for execbuf preparation Chris Wilson
2016-01-11 11:01   ` [PATCH 159/190] drm/i915: Defer active reference until required Chris Wilson
2016-01-11 11:01   ` [PATCH 160/190] drm: Track drm_mm nodes with an interval tree Chris Wilson
2016-01-11 11:01   ` [PATCH 161/190] drm: Convert drm_vma_manager to embedded interval-tree in drm_mm Chris Wilson
2016-01-11 11:01   ` [PATCH 162/190] drm/i915: Allow the user to pass a context to any ring Chris Wilson
2016-01-11 11:01   ` [PATCH 163/190] drm/i915: Fix i915_gem_evict_for_vma (soft-pinning) Chris Wilson
2016-01-11 11:01   ` [PATCH 164/190] drm/i915: Move obj->dirty:1 to obj->flags Chris Wilson
2016-03-24  8:17     ` David Weinehall
2016-01-11 11:01   ` [PATCH 165/190] drm/i915: Use the precomputed value for whether to enable command parsing Chris Wilson
2016-01-11 11:01   ` [PATCH 166/190] drm/i915: Drop spinlocks around adding to the client request list Chris Wilson
2016-01-11 11:01   ` [PATCH 167/190] drm/i915: Amalgamate execbuffer parameter structures Chris Wilson
2016-01-11 11:01   ` [PATCH 168/190] drm/i915: Skip holding context reference for duration of execbuffer call Chris Wilson
2016-01-11 11:01   ` [PATCH 169/190] drm/i915: Use vma->exec_entry as our double-entry placeholder Chris Wilson
2016-01-11 11:01   ` [PATCH 170/190] drm/i915: Store a direct lookup from object handle to vma Chris Wilson
2016-01-11 11:01   ` [PATCH 171/190] drm/i915: Pass vma to relocate entry Chris Wilson
2016-01-11 11:01   ` [PATCH 172/190] drm/i915: Eliminate lots of iterations over the execobjects array Chris Wilson
2016-01-11 11:01   ` [PATCH 173/190] drm/i915: Wait upon userptr get-user-pages within execbuffer Chris Wilson
2016-01-11 11:01   ` [PATCH 174/190] drm/i915: Show context objects in debugfs/i915_gem_objects Chris Wilson
2016-03-24  7:58     ` David Weinehall
2016-01-11 11:01   ` [PATCH 175/190] drm/i915: Remove superfluous i915_add_request_no_flush() helper Chris Wilson
2016-01-11 11:01   ` [PATCH 176/190] drm/i915: Use the MRU stack search after evicting Chris Wilson
2016-01-11 11:01   ` [PATCH 177/190] drm/i915: Use VMA as the primary object for context state Chris Wilson
2016-01-11 11:01   ` [PATCH 178/190] drm/i915: Do an inline flush-active before dropping the mutex when waiting Chris Wilson
2016-01-11 11:01   ` [PATCH 179/190] drm/i915: Skip MI_SET_CONTEXT for the same context Chris Wilson
2016-01-11 11:01   ` [PATCH 180/190] drm/i915: Micro-optimise i915_gem_object_get_dirty_page() Chris Wilson
2016-01-11 11:01   ` [PATCH 181/190] drm/i915: Introduce an internal allocator for disposable private objects Chris Wilson
2016-01-11 11:01   ` [PATCH 182/190] drm/i915: Avoid allocating a vmap arena for a single page Chris Wilson
2016-01-11 11:01   ` [PATCH 183/190] drm/i915/cmdparser: Use cached vmappings Chris Wilson
2016-01-11 11:01   ` [PATCH 184/190] drm/i915/cmdparser: Only cache the dst vmap Chris Wilson
2016-01-11 11:01   ` [PATCH 185/190] drm/i915/cmdparser: Improve hash function Chris Wilson
2016-01-11 11:01   ` [PATCH 186/190] drm/i915/cmdparser: Compare against the previous command descriptor Chris Wilson
2016-01-11 11:01   ` [PATCH 187/190] drm/i915: Allow execbuffer to use the first object as the batch Chris Wilson
2016-01-11 11:01   ` [PATCH 188/190] drm/i915: Use VMA for ringbuffer tracking Chris Wilson
2016-01-11 11:01   ` [PATCH 189/190] drm/i915: Skip clearing the GGTT on full-ppgtt systems Chris Wilson
2016-01-11 11:01   ` [PATCH 190/190] drm/i915: Do a nonblocking wait first in pread/pwrite Chris Wilson
