* [RFC 00/44] GPU scheduler for i915 driver
@ 2014-06-26 17:23 John.C.Harrison
  2014-06-26 17:23 ` [RFC 01/44] drm/i915: Corrected 'file_priv' to 'file' in 'i915_driver_preclose()' John.C.Harrison
                   ` (45 more replies)
  0 siblings, 46 replies; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:23 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

Implemented a batch buffer submission scheduler for the i915 DRM driver.

The general theory of operation is that when batch buffers are submitted to the
driver, the execbuffer() code assigns a unique seqno value and then packages up
all the information required to execute the batch buffer at a later time. This
package is given over to the scheduler which adds it to an internal node list.
The scheduler also scans the list of objects associated with the batch buffer
and compares them against the objects already in use by other buffers in the
node list. If matches are found then the new batch buffer node is marked as
being dependent upon the matching nodes. The same is done for the context object.
The scheduler also bumps up the priority of such matching nodes, on the grounds
that the more other batches depend on a given batch buffer, the more important
that buffer is likely to be.
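
Purely for illustration (this is not code from the series), the dependency scan
boils down to something like the fragment below. The node layout and the
'shares_object_or_context()' / 'node_add_dependency()' helpers are hypothetical
stand-ins for the real scheduler structures introduced later in the series; the
fragments in this cover letter assume the usual kernel helpers from
<linux/list.h> and <linux/workqueue.h>.

	/* Hypothetical node: one per batch buffer handed to the scheduler */
	struct sched_node {
		struct list_head	link;		/* position in queued/in-flight lists */
		u32			seqno;		/* seqno assigned at execbuffer() time */
		int			priority;
		bool			completed;
		/* ... saved execbuffer state, object list, context, deps ... */
	};

	static void scan_dependencies(struct sched_node *new_node,
				      struct list_head *node_list)
	{
		struct sched_node *node;

		list_for_each_entry(node, node_list, link) {
			/* Does the new batch touch an object (or the context)
			 * already referenced by an existing node? */
			if (!shares_object_or_context(new_node, node))
				continue;

			/* Record the dependency ... */
			node_add_dependency(new_node, node);

			/* ... and bump the priority of the node being waited on */
			node->priority++;
		}
	}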

The scheduler aims to have a given (tuneable) number of batch buffers in flight
on the hardware at any given time. If fewer than this are currently executing
when a new node is queued, then the node is passed straight through to the
submit function. Otherwise it is simply added to the queue and the driver
returns to user space.
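
Again as a hypothetical sketch, the queue-or-submit decision is essentially:

	/* Hypothetical per-device scheduler state */
	struct sched_state {
		struct list_head	queue;		/* not yet sent to the hardware */
		struct list_head	in_flight;	/* sent, in hardware submission order */
		int			in_flight_count;
		int			in_flight_limit;	/* the tuneable */
		struct work_struct	work;		/* deferred completion handler */
	};

	static int sched_queue_node(struct sched_state *sched,
				    struct sched_node *node)
	{
		if (sched->in_flight_count < sched->in_flight_limit)
			return sched_submit_node(sched, node);	/* hypothetical: straight to hardware */

		/* Hardware is busy enough: just queue it and return to user space */
		list_add_tail(&node->link, &sched->queue);
		return 0;
	}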

As each batch buffer completes, it raises an interrupt which wakes up the
scheduler. Note that it is possible for multiple buffers to complete before the
IRQ handler gets to run. Further, the seqno values of the individual buffers are
not necessarily incrementing, as the scheduler may have re-ordered their
submission. However, the scheduler keeps the list of executing buffers in order
of hardware submission. Thus it can scan through the list until a matching seqno
is found and then mark all in flight nodes from that point on as completed.
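
Roughly (hypothetical fields again, and assuming the in-flight list is kept
oldest-first, so that everything submitted at or before the completed batch has
also finished):

	static void sched_mark_completed(struct sched_state *sched, u32 hw_seqno)
	{
		struct sched_node *node;

		/* Walk in hardware submission order; seqno values may not be
		 * monotonic but completion still follows submission order. */
		list_for_each_entry(node, &sched->in_flight, link) {
			node->completed = true;
			if (node->seqno == hw_seqno)
				break;
		}
	}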

A deferred work queue is also poked by the interrupt handler. When this wakes up
it can do more involved processing such as actually removing completed nodes
from the queue and freeing up the resources associated with them (internal
memory allocations, DRM object references, context reference, etc.). The work
handler also checks the in flight count and calls the submission code if a new
slot has appeared.
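
The work handler itself then reduces to roughly the following sketch:

	static void sched_work_handler(struct work_struct *work)
	{
		struct sched_state *sched =
			container_of(work, struct sched_state, work);
		struct sched_node *node, *next;

		list_for_each_entry_safe(node, next, &sched->in_flight, link) {
			if (!node->completed)
				continue;

			/* Drop internal allocations, DRM object references,
			 * the context reference, etc. */
			list_del(&node->link);
			sched->in_flight_count--;
			sched_free_node(node);			/* hypothetical */
		}

		/* A slot may have opened up: try to submit more queued work */
		if (sched->in_flight_count < sched->in_flight_limit)
			sched_submit_best(sched);		/* see the sketch below */
	}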

When the scheduler's submit code is called, it scans the queued node list for
the highest priority node that has no unmet dependencies. Note that the
dependency calculation is complex as it must take inter-ring dependencies and
potential preemptions into account. Note also that in the future this will be
extended to include external dependencies such as the Android Native Sync file
descriptors and/or the Linux dma-buf synchronisation scheme.

If a suitable node is found then it is sent to execbuff_final() for submission
to the hardware. The in flight count is then re-checked and a new node popped
from the list if appropriate.
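
Ignoring the inter-ring and pre-emption complications, the selection loop is
conceptually just (hypothetical helpers again):

	static void sched_submit_best(struct sched_state *sched)
	{
		struct sched_node *node, *best;

		while (sched->in_flight_count < sched->in_flight_limit) {
			/* Highest priority queued node with no unmet dependencies */
			best = NULL;
			list_for_each_entry(node, &sched->queue, link) {
				if (!sched_deps_completed(node))	/* hypothetical */
					continue;
				if (!best || node->priority > best->priority)
					best = node;
			}
			if (!best)
				break;

			list_move_tail(&best->link, &sched->in_flight);
			sched->in_flight_count++;
			sched_send_to_hardware(best);	/* the execbuff_final() path */
		}
	}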

The scheduler also allows high priority batch buffers (e.g. from a desktop
compositor) to jump ahead of whatever is already running if the underlying
hardware supports pre-emption. In this situation, any work that was pre-empted
is returned to the queued list ready to be resubmitted when no more high
priority work is outstanding.
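
Again as a rough sketch, returning pre-empted work to the queue is just a matter
of moving the not-yet-completed in-flight nodes back:

	static void sched_requeue_preempted(struct sched_state *sched)
	{
		struct sched_node *node, *next;

		list_for_each_entry_safe(node, next, &sched->in_flight, link) {
			if (node->completed)
				continue;

			list_move_tail(&node->link, &sched->queue);
			sched->in_flight_count--;
		}
	}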

[Patches against drm-intel-nightly tree fetched 30/05/2014]

John Harrison (44):
  drm/i915: Corrected 'file_priv' to 'file' in 'i915_driver_preclose()'
  drm/i915: Added getparam for native sync
  drm/i915: Add extra add_request calls
  drm/i915: Fix null pointer dereference in error capture
  drm/i915: Updating assorted register and status page definitions
  drm/i915: Fixes for FIFO space queries
  drm/i915: Disable 'get seqno' workaround for VLV
  drm/i915: Added GPU scheduler config option
  drm/i915: Start of GPU scheduler
  drm/i915: Prepare retire_requests to handle out-of-order seqnos
  drm/i915: Added scheduler hook into i915_seqno_passed()
  drm/i915: Disable hardware semaphores when GPU scheduler is enabled
  drm/i915: Added scheduler hook when closing DRM file handles
  drm/i915: Added getparam for GPU scheduler
  drm/i915: Added deferred work handler for scheduler
  drm/i915: Alloc early seqno
  drm/i915: Prelude to splitting i915_gem_do_execbuffer in two
  drm/i915: Added scheduler debug macro
  drm/i915: Split i915_gem_do_execbuffer() in half
  drm/i915: Redirect execbuffer_final() via scheduler
  drm/i915: Added tracking/locking of batch buffer objects
  drm/i915: Ensure OLS & PLR are always in sync
  drm/i915: Added manipulation of OLS/PLR
  drm/i915: Added scheduler interrupt handler hook
  drm/i915: Added hook to catch 'unexpected' ring submissions
  drm/i915: Added scheduler support to __wait_seqno() calls
  drm/i915: Added scheduler support to page fault handler
  drm/i915: Added scheduler flush calls to ring throttle and idle functions
  drm/i915: Hook scheduler into intel_ring_idle()
  drm/i915: Added a module parameter for allowing scheduler overrides
  drm/i915: Implemented the GPU scheduler
  drm/i915: Added immediate submission override to scheduler
  drm/i915: Added trace points to scheduler
  drm/i915: Added scheduler queue throttling by DRM file handle
  drm/i915: Added debugfs interface to scheduler tuning parameters
  drm/i915: Added debug state dump facilities to scheduler
  drm/i915: Added facility for cancelling an outstanding request
  drm/i915: Add early exit to execbuff_final() if insufficient ring space
  drm/i915: Added support for pre-emptive scheduling
  drm/i915: REVERTME Hack to allow IGT to test pre-emption
  drm/i915: Added validation callback to trace points
  drm/i915: Added scheduler statistic reporting to debugfs
  drm/i915: Added support for submitting out-of-batch ring commands
  drm/i915: Fake batch support for page flips

 drivers/gpu/drm/i915/Kconfig                 |   16 +
 drivers/gpu/drm/i915/Makefile                |    1 +
 drivers/gpu/drm/i915/i915_debugfs.c          |  202 +++
 drivers/gpu/drm/i915/i915_dma.c              |   27 +-
 drivers/gpu/drm/i915/i915_drv.c              |    9 +
 drivers/gpu/drm/i915/i915_drv.h              |   61 +-
 drivers/gpu/drm/i915/i915_gem.c              |  256 +++-
 drivers/gpu/drm/i915/i915_gem_context.c      |    9 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c   |  658 ++++++++-
 drivers/gpu/drm/i915/i915_gem_render_state.c |    2 +-
 drivers/gpu/drm/i915/i915_gpu_error.c        |   13 +-
 drivers/gpu/drm/i915/i915_irq.c              |    7 +-
 drivers/gpu/drm/i915/i915_params.c           |    4 +
 drivers/gpu/drm/i915/i915_reg.h              |   30 +-
 drivers/gpu/drm/i915/i915_scheduler.c        | 1979 ++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_scheduler.h        |  277 ++++
 drivers/gpu/drm/i915/i915_trace.h            |  223 +++
 drivers/gpu/drm/i915/intel_display.c         |   92 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c      |   80 +-
 drivers/gpu/drm/i915/intel_ringbuffer.h      |   68 +-
 drivers/gpu/drm/i915/intel_uncore.c          |   49 +-
 include/drm/drmP.h                           |    7 +
 include/uapi/drm/i915_drm.h                  |    7 +
 23 files changed, 3880 insertions(+), 197 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_scheduler.c
 create mode 100644 drivers/gpu/drm/i915/i915_scheduler.h

-- 
1.7.9.5

* [RFC 01/44] drm/i915: Corrected 'file_priv' to 'file' in 'i915_driver_preclose()'
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
@ 2014-06-26 17:23 ` John.C.Harrison
  2014-06-30 21:03   ` Jesse Barnes
  2014-06-26 17:23 ` [RFC 02/44] drm/i915: Added getparam for native sync John.C.Harrison
                   ` (44 subsequent siblings)
  45 siblings, 1 reply; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:23 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The 'i915_driver_preclose()' function has a parameter called 'file_priv'.
However, this is misleading as the structure it points to is a 'drm_file' not a
'drm_i915_file_private'. It should be named just 'file' to avoid confusion.
---
 drivers/gpu/drm/i915/i915_dma.c |    6 +++---
 drivers/gpu/drm/i915/i915_drv.h |    6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index b9159ad..6cce55b 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1916,11 +1916,11 @@ void i915_driver_lastclose(struct drm_device * dev)
 	i915_dma_cleanup(dev);
 }
 
-void i915_driver_preclose(struct drm_device * dev, struct drm_file *file_priv)
+void i915_driver_preclose(struct drm_device *dev, struct drm_file *file)
 {
 	mutex_lock(&dev->struct_mutex);
-	i915_gem_context_close(dev, file_priv);
-	i915_gem_release(dev, file_priv);
+	i915_gem_context_close(dev, file);
+	i915_gem_release(dev, file);
 	mutex_unlock(&dev->struct_mutex);
 }
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index bea9ab40..7a96ca0 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2044,12 +2044,12 @@ void i915_update_dri1_breadcrumb(struct drm_device *dev);
 extern void i915_kernel_lost_context(struct drm_device * dev);
 extern int i915_driver_load(struct drm_device *, unsigned long flags);
 extern int i915_driver_unload(struct drm_device *);
-extern int i915_driver_open(struct drm_device *dev, struct drm_file *file_priv);
+extern int i915_driver_open(struct drm_device *dev, struct drm_file *file);
 extern void i915_driver_lastclose(struct drm_device * dev);
 extern void i915_driver_preclose(struct drm_device *dev,
-				 struct drm_file *file_priv);
+				 struct drm_file *file);
 extern void i915_driver_postclose(struct drm_device *dev,
-				  struct drm_file *file_priv);
+				  struct drm_file *file);
 extern int i915_driver_device_is_agp(struct drm_device * dev);
 #ifdef CONFIG_COMPAT
 extern long i915_compat_ioctl(struct file *filp, unsigned int cmd,
-- 
1.7.9.5

* [RFC 02/44] drm/i915: Added getparam for native sync
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
  2014-06-26 17:23 ` [RFC 01/44] drm/i915: Corrected 'file_priv' to 'file' in 'i915_driver_preclose()' John.C.Harrison
@ 2014-06-26 17:23 ` John.C.Harrison
  2014-07-07 18:52   ` Daniel Vetter
  2014-06-26 17:23 ` [RFC 03/44] drm/i915: Add extra add_request calls John.C.Harrison
                   ` (43 subsequent siblings)
  45 siblings, 1 reply; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:23 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

Validation tests need a run time mechanism for querying whether or not the
driver supports the Android native sync facility.
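
Not part of this patch, but for reference, a test could query the new parameter
through the usual libdrm getparam path, something along these lines:

	#include <xf86drm.h>
	#include <i915_drm.h>

	static int has_native_sync(int fd)
	{
		drm_i915_getparam_t gp = { .param = I915_PARAM_HAS_NATIVE_SYNC };
		int value = 0;

		gp.value = &value;
		if (drmIoctl(fd, DRM_IOCTL_I915_GETPARAM, &gp))
			return 0;	/* older kernel: parameter not recognised */

		return value;
	}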
---
 drivers/gpu/drm/i915/i915_dma.c |    7 +++++++
 include/uapi/drm/i915_drm.h     |    1 +
 2 files changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 6cce55b..67f2918 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1022,6 +1022,13 @@ static int i915_getparam(struct drm_device *dev, void *data,
 	case I915_PARAM_CMD_PARSER_VERSION:
 		value = i915_cmd_parser_get_version();
 		break;
+	case I915_PARAM_HAS_NATIVE_SYNC:
+#ifdef CONFIG_DRM_I915_SYNC
+		value = 1;
+#else
+		value = 0;
+#endif
+		break;
 	default:
 		DRM_DEBUG("Unknown parameter %d\n", param->param);
 		return -EINVAL;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index ff57f07..bf54c78 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -340,6 +340,7 @@ typedef struct drm_i915_irq_wait {
 #define I915_PARAM_HAS_EXEC_HANDLE_LUT   26
 #define I915_PARAM_HAS_WT     	 	 27
 #define I915_PARAM_CMD_PARSER_VERSION	 28
+#define I915_PARAM_HAS_NATIVE_SYNC	 30
 
 typedef struct drm_i915_getparam {
 	int param;
-- 
1.7.9.5

* [RFC 03/44] drm/i915: Add extra add_request calls
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
  2014-06-26 17:23 ` [RFC 01/44] drm/i915: Corrected 'file_priv' to 'file' in 'i915_driver_preclose()' John.C.Harrison
  2014-06-26 17:23 ` [RFC 02/44] drm/i915: Added getparam for native sync John.C.Harrison
@ 2014-06-26 17:23 ` John.C.Harrison
  2014-06-30 21:10   ` Jesse Barnes
  2014-06-26 17:23 ` [RFC 04/44] drm/i915: Fix null pointer dereference in error capture John.C.Harrison
                   ` (42 subsequent siblings)
  45 siblings, 1 reply; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:23 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The scheduler needs to track batch buffers by seqno without extra, non-batch
buffer work being attached to the same seqno. This means that any code which
adds work to the ring should explicitly call i915_add_request() when it has
finished writing to the ring.

The add_request() function does extra work, such as flushing caches, that is
not always wanted. Instead, a new
i915_add_request_wo_flush() function has been added which skips the cache flush
and just tidies up request structures and seqno values.

Note, much of this patch was implemented by Naresh Kumar Kachhi for pending
power management improvements. However, it is also directly applicable to the
scheduler work as noted above.
---
 drivers/gpu/drm/i915/i915_dma.c              |    5 +++++
 drivers/gpu/drm/i915/i915_drv.h              |    9 +++++---
 drivers/gpu/drm/i915/i915_gem.c              |   31 ++++++++++++++++++++------
 drivers/gpu/drm/i915/i915_gem_context.c      |    9 ++++++++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c   |    4 ++--
 drivers/gpu/drm/i915/i915_gem_render_state.c |    2 +-
 drivers/gpu/drm/i915/intel_display.c         |   10 ++++-----
 7 files changed, 52 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 67f2918..494b156 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -456,6 +456,7 @@ static int i915_dispatch_cmdbuffer(struct drm_device * dev,
 				   struct drm_clip_rect *cliprects,
 				   void *cmdbuf)
 {
+	struct drm_i915_private *dev_priv = dev->dev_private;
 	int nbox = cmd->num_cliprects;
 	int i = 0, count, ret;
 
@@ -482,6 +483,7 @@ static int i915_dispatch_cmdbuffer(struct drm_device * dev,
 	}
 
 	i915_emit_breadcrumb(dev);
+	i915_add_request_wo_flush(LP_RING(dev_priv));
 	return 0;
 }
 
@@ -544,6 +546,7 @@ static int i915_dispatch_batchbuffer(struct drm_device * dev,
 	}
 
 	i915_emit_breadcrumb(dev);
+	i915_add_request_wo_flush(LP_RING(dev_priv));
 	return 0;
 }
 
@@ -597,6 +600,7 @@ static int i915_dispatch_flip(struct drm_device * dev)
 		ADVANCE_LP_RING();
 	}
 
+	i915_add_request_wo_flush(LP_RING(dev_priv));
 	master_priv->sarea_priv->pf_current_page = dev_priv->dri1.current_page;
 	return 0;
 }
@@ -774,6 +778,7 @@ static int i915_emit_irq(struct drm_device * dev)
 		OUT_RING(dev_priv->dri1.counter);
 		OUT_RING(MI_USER_INTERRUPT);
 		ADVANCE_LP_RING();
+		i915_add_request_wo_flush(LP_RING(dev_priv));
 	}
 
 	return dev_priv->dri1.counter;
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 7a96ca0..e3295cb 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2199,7 +2199,7 @@ static inline void i915_gem_object_unpin_pages(struct drm_i915_gem_object *obj)
 
 int __must_check i915_mutex_lock_interruptible(struct drm_device *dev);
 int i915_gem_object_sync(struct drm_i915_gem_object *obj,
-			 struct intel_engine_cs *to);
+			 struct intel_engine_cs *to, bool add_request);
 void i915_vma_move_to_active(struct i915_vma *vma,
 			     struct intel_engine_cs *ring);
 int i915_gem_dumb_create(struct drm_file *file_priv,
@@ -2272,9 +2272,12 @@ int __must_check i915_gem_suspend(struct drm_device *dev);
 int __i915_add_request(struct intel_engine_cs *ring,
 		       struct drm_file *file,
 		       struct drm_i915_gem_object *batch_obj,
-		       u32 *seqno);
+		       u32 *seqno,
+		       bool flush_caches);
 #define i915_add_request(ring, seqno) \
-	__i915_add_request(ring, NULL, NULL, seqno)
+	__i915_add_request(ring, NULL, NULL, seqno, true)
+#define i915_add_request_wo_flush(ring) \
+	__i915_add_request(ring, NULL, NULL, NULL, false)
 int __must_check i915_wait_seqno(struct intel_engine_cs *ring,
 				 uint32_t seqno);
 int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 5a13d9e..898660c 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2320,7 +2320,8 @@ i915_gem_get_seqno(struct drm_device *dev, u32 *seqno)
 int __i915_add_request(struct intel_engine_cs *ring,
 		       struct drm_file *file,
 		       struct drm_i915_gem_object *obj,
-		       u32 *out_seqno)
+		       u32 *out_seqno,
+		       bool flush_caches)
 {
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
 	struct drm_i915_gem_request *request;
@@ -2335,9 +2336,11 @@ int __i915_add_request(struct intel_engine_cs *ring,
 	 * is that the flush _must_ happen before the next request, no matter
 	 * what.
 	 */
-	ret = intel_ring_flush_all_caches(ring);
-	if (ret)
-		return ret;
+	if (flush_caches) {
+		ret = intel_ring_flush_all_caches(ring);
+		if (ret)
+			return ret;
+	}
 
 	request = ring->preallocated_lazy_request;
 	if (WARN_ON(request == NULL))
@@ -2815,6 +2818,8 @@ out:
  *
  * @obj: object which may be in use on another ring.
  * @to: ring we wish to use the object on. May be NULL.
+ * @add_request: do we need to add a request to track operations
+ *    submitted on ring with sync_to function
  *
  * This code is meant to abstract object synchronization with the GPU.
  * Calling with NULL implies synchronizing the object with the CPU
@@ -2824,7 +2829,7 @@ out:
  */
 int
 i915_gem_object_sync(struct drm_i915_gem_object *obj,
-		     struct intel_engine_cs *to)
+		     struct intel_engine_cs *to, bool add_request)
 {
 	struct intel_engine_cs *from = obj->ring;
 	u32 seqno;
@@ -2848,12 +2853,15 @@ i915_gem_object_sync(struct drm_i915_gem_object *obj,
 
 	trace_i915_gem_ring_sync_to(from, to, seqno);
 	ret = to->semaphore.sync_to(to, from, seqno);
-	if (!ret)
+	if (!ret) {
 		/* We use last_read_seqno because sync_to()
 		 * might have just caused seqno wrap under
 		 * the radar.
 		 */
 		from->semaphore.sync_seqno[idx] = obj->last_read_seqno;
+		if (add_request)
+			i915_add_request_wo_flush(to);
+	}
 
 	return ret;
 }
@@ -2958,6 +2966,15 @@ int i915_gpu_idle(struct drm_device *dev)
 		if (ret)
 			return ret;
 
+		/* Make sure the context switch (if one actually happened)
+		 * gets wrapped up and finished rather than hanging around
+		 * and confusing things later. */
+		if (ring->outstanding_lazy_seqno) {
+			ret = i915_add_request(ring, NULL);
+			if (ret)
+				return ret;
+		}
+
 		ret = intel_ring_idle(ring);
 		if (ret)
 			return ret;
@@ -3832,7 +3849,7 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
 	int ret;
 
 	if (pipelined != obj->ring) {
-		ret = i915_gem_object_sync(obj, pipelined);
+		ret = i915_gem_object_sync(obj, pipelined, true);
 		if (ret)
 			return ret;
 	}
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 3ffe308..d1d2ee0 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -488,6 +488,15 @@ int i915_gem_context_enable(struct drm_i915_private *dev_priv)
 		ret = i915_switch_context(ring, ring->default_context);
 		if (ret)
 			return ret;
+
+		/* Make sure the context switch (if one actually happened)
+		 * gets wrapped up and finished rather than hanging around
+		 * and confusing things later. */
+		if(ring->outstanding_lazy_seqno) {
+			ret = i915_add_request_wo_flush(ring);
+			if (ret)
+				return ret;
+		}
 	}
 
 	return 0;
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 3a30133..ee836a6 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -858,7 +858,7 @@ i915_gem_execbuffer_move_to_gpu(struct intel_engine_cs *ring,
 
 	list_for_each_entry(vma, vmas, exec_list) {
 		struct drm_i915_gem_object *obj = vma->obj;
-		ret = i915_gem_object_sync(obj, ring);
+		ret = i915_gem_object_sync(obj, ring, false);
 		if (ret)
 			return ret;
 
@@ -998,7 +998,7 @@ i915_gem_execbuffer_retire_commands(struct drm_device *dev,
 	ring->gpu_caches_dirty = true;
 
 	/* Add a breadcrumb for the completion of the batch buffer */
-	(void)__i915_add_request(ring, file, obj, NULL);
+	(void)__i915_add_request(ring, file, obj, NULL, true);
 }
 
 static int
diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.c b/drivers/gpu/drm/i915/i915_gem_render_state.c
index 3521f99..50118cb 100644
--- a/drivers/gpu/drm/i915/i915_gem_render_state.c
+++ b/drivers/gpu/drm/i915/i915_gem_render_state.c
@@ -190,7 +190,7 @@ int i915_gem_render_state_init(struct intel_engine_cs *ring)
 
 	i915_vma_move_to_active(i915_gem_obj_to_ggtt(so->obj), ring);
 
-	ret = __i915_add_request(ring, NULL, so->obj, NULL);
+	ret = __i915_add_request(ring, NULL, so->obj, NULL, true);
 	/* __i915_add_request moves object to inactive if it fails */
 out:
 	render_state_free(so);
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 54095d4..fa1ffbb 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -8980,7 +8980,7 @@ static int intel_gen2_queue_flip(struct drm_device *dev,
 	intel_ring_emit(ring, 0); /* aux display base address, unused */
 
 	intel_mark_page_flip_active(intel_crtc);
-	__intel_ring_advance(ring);
+	i915_add_request_wo_flush(ring);
 	return 0;
 }
 
@@ -9012,7 +9012,7 @@ static int intel_gen3_queue_flip(struct drm_device *dev,
 	intel_ring_emit(ring, MI_NOOP);
 
 	intel_mark_page_flip_active(intel_crtc);
-	__intel_ring_advance(ring);
+	i915_add_request_wo_flush(ring);
 	return 0;
 }
 
@@ -9051,7 +9051,7 @@ static int intel_gen4_queue_flip(struct drm_device *dev,
 	intel_ring_emit(ring, pf | pipesrc);
 
 	intel_mark_page_flip_active(intel_crtc);
-	__intel_ring_advance(ring);
+	i915_add_request_wo_flush(ring);
 	return 0;
 }
 
@@ -9087,7 +9087,7 @@ static int intel_gen6_queue_flip(struct drm_device *dev,
 	intel_ring_emit(ring, pf | pipesrc);
 
 	intel_mark_page_flip_active(intel_crtc);
-	__intel_ring_advance(ring);
+	i915_add_request_wo_flush(ring);
 	return 0;
 }
 
@@ -9182,7 +9182,7 @@ static int intel_gen7_queue_flip(struct drm_device *dev,
 	intel_ring_emit(ring, (MI_NOOP));
 
 	intel_mark_page_flip_active(intel_crtc);
-	__intel_ring_advance(ring);
+	i915_add_request_wo_flush(ring);
 	return 0;
 }
 
-- 
1.7.9.5

* [RFC 04/44] drm/i915: Fix null pointer dereference in error capture
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (2 preceding siblings ...)
  2014-06-26 17:23 ` [RFC 03/44] drm/i915: Add extra add_request calls John.C.Harrison
@ 2014-06-26 17:23 ` John.C.Harrison
  2014-06-30 21:40   ` Jesse Barnes
  2014-07-01  7:20   ` [PATCH] drm/i915: Remove num_pages parameter to i915_error_object_create() Chris Wilson
  2014-06-26 17:23 ` [RFC 05/44] drm/i915: Updating assorted register and status page definitions John.C.Harrison
                   ` (41 subsequent siblings)
  45 siblings, 2 replies; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:23 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The i915_gem_record_rings() code was unconditionally querying and saving state
for the batch_obj of a request structure. This is not necessarily set. Thus a
null pointer dereference can occur.
---
 drivers/gpu/drm/i915/i915_gpu_error.c |   13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 87ec60e..0738f21 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -902,12 +902,13 @@ static void i915_gem_record_rings(struct drm_device *dev,
 			 * as the simplest method to avoid being overwritten
 			 * by userspace.
 			 */
-			error->ring[i].batchbuffer =
-				i915_error_object_create(dev_priv,
-							 request->batch_obj,
-							 request->ctx ?
-							 request->ctx->vm :
-							 &dev_priv->gtt.base);
+			if(request->batch_obj)
+				error->ring[i].batchbuffer =
+					i915_error_object_create(dev_priv,
+								 request->batch_obj,
+								 request->ctx ?
+								 request->ctx->vm :
+								 &dev_priv->gtt.base);
 
 			if (HAS_BROKEN_CS_TLB(dev_priv->dev) &&
 			    ring->scratch.obj)
-- 
1.7.9.5

* [RFC 05/44] drm/i915: Updating assorted register and status page definitions
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (3 preceding siblings ...)
  2014-06-26 17:23 ` [RFC 04/44] drm/i915: Fix null pointer dereference in error capture John.C.Harrison
@ 2014-06-26 17:23 ` John.C.Harrison
  2014-07-02 17:49   ` Jesse Barnes
  2014-06-26 17:23 ` [RFC 06/44] drm/i915: Fixes for FIFO space queries John.C.Harrison
                   ` (40 subsequent siblings)
  45 siblings, 1 reply; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:23 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

Added various definitions that will be useful for the scheduler in general and
pre-emptive context switching in particular.
---
 drivers/gpu/drm/i915/i915_drv.h         |    5 ++-
 drivers/gpu/drm/i915/i915_reg.h         |   30 ++++++++++++++-
 drivers/gpu/drm/i915/intel_ringbuffer.h |   61 ++++++++++++++++++++++++++++++-
 3 files changed, 92 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index e3295cb..53f6fe5 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -584,7 +584,10 @@ struct i915_ctx_hang_stats {
 };
 
 /* This must match up with the value previously used for execbuf2.rsvd1. */
-#define DEFAULT_CONTEXT_ID 0
+#define DEFAULT_CONTEXT_ID		0
+/* This must not match any user context */
+#define PREEMPTION_CONTEXT_ID		(-1)
+
 struct intel_context {
 	struct kref ref;
 	int id;
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 242df99..cfc918d 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -205,6 +205,10 @@
 #define  MI_GLOBAL_GTT    (1<<22)
 
 #define MI_NOOP			MI_INSTR(0, 0)
+#define   MI_NOOP_WRITE_ID		(1<<22)
+#define   MI_NOOP_ID_MASK		((1<<22) - 1)
+#define   MI_NOOP_MID(id)		((id) & MI_NOOP_ID_MASK)
+#define MI_NOOP_WITH_ID(id)	MI_INSTR(0, MI_NOOP_WRITE_ID|MI_NOOP_MID(id))
 #define MI_USER_INTERRUPT	MI_INSTR(0x02, 0)
 #define MI_WAIT_FOR_EVENT       MI_INSTR(0x03, 0)
 #define   MI_WAIT_FOR_OVERLAY_FLIP	(1<<16)
@@ -222,6 +226,7 @@
 #define MI_ARB_ON_OFF		MI_INSTR(0x08, 0)
 #define   MI_ARB_ENABLE			(1<<0)
 #define   MI_ARB_DISABLE		(0<<0)
+#define MI_ARB_CHECK		MI_INSTR(0x05, 0)
 #define MI_BATCH_BUFFER_END	MI_INSTR(0x0a, 0)
 #define MI_SUSPEND_FLUSH	MI_INSTR(0x0b, 0)
 #define   MI_SUSPEND_FLUSH_EN	(1<<0)
@@ -260,6 +265,8 @@
 #define   MI_SEMAPHORE_SYNC_INVALID (3<<16)
 #define   MI_SEMAPHORE_SYNC_MASK    (3<<16)
 #define MI_SET_CONTEXT		MI_INSTR(0x18, 0)
+#define   MI_CONTEXT_ADDR_MASK		((~0)<<12)
+#define   MI_SET_CONTEXT_FLAG_MASK	((1<<12)-1)
 #define   MI_MM_SPACE_GTT		(1<<8)
 #define   MI_MM_SPACE_PHYSICAL		(0<<8)
 #define   MI_SAVE_EXT_STATE_EN		(1<<3)
@@ -270,6 +277,10 @@
 #define   MI_MEM_VIRTUAL	(1 << 22) /* 965+ only */
 #define MI_STORE_DWORD_INDEX	MI_INSTR(0x21, 1)
 #define   MI_STORE_DWORD_INDEX_SHIFT 2
+#define MI_STORE_REG_MEM	MI_INSTR(0x24, 1)
+#define   MI_STORE_REG_MEM_GTT		(1 << 22)
+#define   MI_STORE_REG_MEM_PREDICATE	(1 << 21)
+
 /* Official intel docs are somewhat sloppy concerning MI_LOAD_REGISTER_IMM:
  * - Always issue a MI_NOOP _before_ the MI_LOAD_REGISTER_IMM - otherwise hw
  *   simply ignores the register load under certain conditions.
@@ -283,7 +294,10 @@
 #define MI_FLUSH_DW		MI_INSTR(0x26, 1) /* for GEN6 */
 #define   MI_FLUSH_DW_STORE_INDEX	(1<<21)
 #define   MI_INVALIDATE_TLB		(1<<18)
+#define   MI_FLUSH_DW_OP_NONE		(0<<14)
 #define   MI_FLUSH_DW_OP_STOREDW	(1<<14)
+#define   MI_FLUSH_DW_OP_RSVD		(2<<14)
+#define   MI_FLUSH_DW_OP_STAMP		(3<<14)
 #define   MI_FLUSH_DW_OP_MASK		(3<<14)
 #define   MI_FLUSH_DW_NOTIFY		(1<<8)
 #define   MI_INVALIDATE_BSD		(1<<7)
@@ -1005,6 +1019,19 @@ enum punit_power_well {
 #define GEN6_VERSYNC	(RING_SYNC_1(VEBOX_RING_BASE))
 #define GEN6_VEVSYNC	(RING_SYNC_2(VEBOX_RING_BASE))
 #define GEN6_NOSYNC 0
+
+/*
+ * Premption-related registers
+ */
+#define RING_UHPTR(base)	((base)+0x134)
+#define   UHPTR_GFX_ADDR_ALIGN		(0x7)
+#define   UHPTR_VALID			(0x1)
+#define RING_PREEMPT_ADDR	0x0214c
+#define   PREEMPT_BATCH_LEVEL_MASK	(0x3)
+#define BB_PREEMPT_ADDR		0x02148
+#define SBB_PREEMPT_ADDR	0x0213c
+#define RS_PREEMPT_STATUS	0x0215c
+
 #define RING_MAX_IDLE(base)	((base)+0x54)
 #define RING_HWS_PGA(base)	((base)+0x80)
 #define RING_HWS_PGA_GEN6(base)	((base)+0x2080)
@@ -5383,7 +5410,8 @@ enum punit_power_well {
 #define  VLV_SPAREG2H				0xA194
 
 #define  GTFIFODBG				0x120000
-#define    GT_FIFO_SBDROPERR			(1<<6)
+#define    GT_FIFO_CPU_ERROR_MASK		0xf
+#define    GT_FIFO_SDDROPERR			(1<<6)
 #define    GT_FIFO_BLOBDROPERR			(1<<5)
 #define    GT_FIFO_SB_READ_ABORTERR		(1<<4)
 #define    GT_FIFO_DROPERR			(1<<3)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 910c83c..30841ea 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -40,6 +40,12 @@ struct  intel_hw_status_page {
 #define I915_READ_MODE(ring) I915_READ(RING_MI_MODE((ring)->mmio_base))
 #define I915_WRITE_MODE(ring, val) I915_WRITE(RING_MI_MODE((ring)->mmio_base), val)
 
+#define I915_READ_UHPTR(ring) \
+		I915_READ(RING_UHPTR((ring)->mmio_base))
+#define I915_WRITE_UHPTR(ring, val) \
+		I915_WRITE(RING_UHPTR((ring)->mmio_base), val)
+#define I915_READ_NOPID(ring) I915_READ(RING_NOPID((ring)->mmio_base))
+
 enum intel_ring_hangcheck_action {
 	HANGCHECK_IDLE = 0,
 	HANGCHECK_WAIT,
@@ -280,10 +286,61 @@ intel_write_status_page(struct intel_engine_cs *ring,
  * 0x1f: Last written status offset. (GM45)
  *
  * The area from dword 0x20 to 0x3ff is available for driver usage.
+ *
+ * Note: in general the allocation of these indices is arbitrary, as long
+ * as they're all unique. But a few of them are used with instructions that
+ * have specific alignment requirements, those particular indices must be
+ * chosen carefully to meet those requirements. The list below shows the
+ * currently-known alignment requirements:
+ *
+ *	I915_GEM_SCRATCH_INDEX	    must be EVEN
  */
 #define I915_GEM_HWS_INDEX		0x20
-#define I915_GEM_HWS_SCRATCH_INDEX	0x30
-#define I915_GEM_HWS_SCRATCH_ADDR (I915_GEM_HWS_SCRATCH_INDEX << MI_STORE_DWORD_INDEX_SHIFT)
+#define I915_GEM_ACTIVE_SEQNO_INDEX	0x21  /* Executing seqno for TDR only */
+#define I915_GEM_PGFLIP_INDEX		0x22
+#define I915_GEM_BREADCRUMB_INDEX	0x23
+
+#define I915_GEM_HWS_SCRATCH_INDEX	0x24  /* QWord */
+#define I915_GEM_HWS_SCRATCH_ADDR	(I915_GEM_HWS_SCRATCH_INDEX << MI_STORE_DWORD_INDEX_SHIFT)
+
+/*
+ * Software (CPU) tracking of batch start/end addresses in the ring
+ */
+#define I915_GEM_BATCH_START_ADDR	0x2e  /* Start of batch in ring     */
+#define I915_GEM_BATCH_END_ADDR		0x2f  /* End of batch in ring       */
+
+/*
+ * Tracking; these are updated by the GPU at the beginning and/or end of every batch
+ */
+#define I915_BATCH_DONE_SEQNO		0x30  /* Last completed batch seqno  */
+#define I915_BATCH_ACTIVE_SEQNO		0x31  /* Seqno of batch in progress  */
+#define I915_BATCH_ACTIVE_ADDR		0x32  /* Addr of batch cmds in ring  */
+#define I915_BATCH_ACTIVE_END		0x33  /* End of batch cmds in ring   */
+
+/*
+ * Tracking; these are updated by the GPU at the beginning and/or end of a preemptive batch
+ */
+#define I915_PREEMPTIVE_DONE_SEQNO	0x34  /* Last completed preemptive batch seqno  */
+#define I915_PREEMPTIVE_ACTIVE_SEQNO	0x35  /* Seqno of preemptive batch in progress  */
+#define I915_PREEMPTIVE_ACTIVE_ADDR	0x36  /* Addr of preemptive batch cmds in ring  */
+#define I915_PREEMPTIVE_ACTIVE_END	0x37  /* End of preemptive batch cmds in ring   */
+
+/*
+ * Preemption; these are used by the GPU to save important registers
+ */
+#define I915_SAVE_PREEMPTED_RING_PTR	0x38  /* HEAD before preemption     */
+#define I915_SAVE_PREEMPTED_BB_PTR	0x39  /* BB ptr before preemption   */
+#define I915_SAVE_PREEMPTED_SBB_PTR	0x3a  /* SBB before preemption      */
+#define I915_SAVE_PREEMPTED_UHPTR	0x3b  /* UHPTR after preemption     */
+#define I915_SAVE_PREEMPTED_HEAD	0x3c  /* HEAD after preemption      */
+#define I915_SAVE_PREEMPTED_TAIL	0x3d  /* TAIL after preemption      */
+#define I915_SAVE_PREEMPTED_STATUS	0x3e  /* RS preemption status       */
+#define I915_SAVE_PREEMPTED_NOPID	0x3f  /* Dummy                      */
+
+/* Range of DWORDs to snapshot in the interrupt handler */
+#define	I915_IRQ_SNAP_START		I915_GEM_HWS_INDEX
+#define	I915_IRQ_SNAP_SPLIT		(I915_SAVE_PREEMPTED_NOPID/4*4+4)
+#define	I915_IRQ_SNAP_END		((I915_SAVE_PREEMPTED_NOPID+128)/4*4+4)
 
 void intel_stop_ring_buffer(struct intel_engine_cs *ring);
 void intel_cleanup_ring_buffer(struct intel_engine_cs *ring);
-- 
1.7.9.5

* [RFC 06/44] drm/i915: Fixes for FIFO space queries
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (4 preceding siblings ...)
  2014-06-26 17:23 ` [RFC 05/44] drm/i915: Updating assorted register and status page definitions John.C.Harrison
@ 2014-06-26 17:23 ` John.C.Harrison
  2014-07-02 17:50   ` Jesse Barnes
  2014-06-26 17:23 ` [RFC 07/44] drm/i915: Disable 'get seqno' workaround for VLV John.C.Harrison
                   ` (39 subsequent siblings)
  45 siblings, 1 reply; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:23 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The previous code was not correctly masking the value of the GTFIFOCTL register,
leading to overruns and the message "MMIO read or write has been dropped". In
addition, the checks were repeated in several different places. This commit
replaces these various checks with a simple (inline) function to encapsulate the
read-and-mask operation. In addition, it adds a custom wait-for-fifo function
for VLV, as the timing parameters are somewhat different from those on earlier
chips.
---
 drivers/gpu/drm/i915/intel_uncore.c |   49 ++++++++++++++++++++++++++++++-----
 1 file changed, 42 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index 871c284..6a3dddf 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -47,6 +47,12 @@ assert_device_not_suspended(struct drm_i915_private *dev_priv)
 	     "Device suspended\n");
 }
 
+static inline u32 fifo_free_entries(struct drm_i915_private *dev_priv)
+{
+	u32 count = __raw_i915_read32(dev_priv, GTFIFOCTL);
+	return count & GT_FIFO_FREE_ENTRIES_MASK;
+}
+
 static void __gen6_gt_wait_for_thread_c0(struct drm_i915_private *dev_priv)
 {
 	u32 gt_thread_status_mask;
@@ -154,6 +160,28 @@ static void __gen7_gt_force_wake_mt_put(struct drm_i915_private *dev_priv,
 		gen6_gt_check_fifodbg(dev_priv);
 }
 
+static int __vlv_gt_wait_for_fifo(struct drm_i915_private *dev_priv)
+{
+	u32 free = fifo_free_entries(dev_priv);
+	int loop1, loop2;
+
+	for (loop1 = 0; loop1 < 5000 && free < GT_FIFO_NUM_RESERVED_ENTRIES; ) {
+		for (loop2 = 0; loop2 < 1000 && free < GT_FIFO_NUM_RESERVED_ENTRIES; loop2 += 10) {
+			udelay(10);
+			free = fifo_free_entries(dev_priv);
+		}
+		loop1 += loop2;
+		if (loop1 > 1000 || free < 48)
+			DRM_DEBUG("after %d us, the FIFO has %d slots", loop1, free);
+	}
+
+	dev_priv->uncore.fifo_count = free;
+	if (WARN(free < GT_FIFO_NUM_RESERVED_ENTRIES,
+		"FIFO has insufficient space (%d slots)", free))
+		return -1;
+	return 0;
+}
+
 static int __gen6_gt_wait_for_fifo(struct drm_i915_private *dev_priv)
 {
 	int ret = 0;
@@ -161,16 +189,15 @@ static int __gen6_gt_wait_for_fifo(struct drm_i915_private *dev_priv)
 	/* On VLV, FIFO will be shared by both SW and HW.
 	 * So, we need to read the FREE_ENTRIES everytime */
 	if (IS_VALLEYVIEW(dev_priv->dev))
-		dev_priv->uncore.fifo_count =
-			__raw_i915_read32(dev_priv, GTFIFOCTL) &
-						GT_FIFO_FREE_ENTRIES_MASK;
+		return __vlv_gt_wait_for_fifo(dev_priv);
 
 	if (dev_priv->uncore.fifo_count < GT_FIFO_NUM_RESERVED_ENTRIES) {
 		int loop = 500;
-		u32 fifo = __raw_i915_read32(dev_priv, GTFIFOCTL) & GT_FIFO_FREE_ENTRIES_MASK;
+		u32 fifo = fifo_free_entries(dev_priv);
+
 		while (fifo <= GT_FIFO_NUM_RESERVED_ENTRIES && loop--) {
 			udelay(10);
-			fifo = __raw_i915_read32(dev_priv, GTFIFOCTL) & GT_FIFO_FREE_ENTRIES_MASK;
+			fifo = fifo_free_entries(dev_priv);
 		}
 		if (WARN_ON(loop < 0 && fifo <= GT_FIFO_NUM_RESERVED_ENTRIES))
 			++ret;
@@ -194,6 +221,11 @@ static void vlv_force_wake_reset(struct drm_i915_private *dev_priv)
 static void __vlv_force_wake_get(struct drm_i915_private *dev_priv,
 						int fw_engine)
 {
+#if	1
+	if (__gen6_gt_wait_for_fifo(dev_priv))
+		gen6_gt_check_fifodbg(dev_priv);
+#endif
+
 	/* Check for Render Engine */
 	if (FORCEWAKE_RENDER & fw_engine) {
 		if (wait_for_atomic((__raw_i915_read32(dev_priv,
@@ -238,6 +270,10 @@ static void __vlv_force_wake_get(struct drm_i915_private *dev_priv,
 static void __vlv_force_wake_put(struct drm_i915_private *dev_priv,
 					int fw_engine)
 {
+#if	1
+	if (__gen6_gt_wait_for_fifo(dev_priv))
+		gen6_gt_check_fifodbg(dev_priv);
+#endif
 
 	/* Check for Render Engine */
 	if (FORCEWAKE_RENDER & fw_engine)
@@ -355,8 +391,7 @@ static void intel_uncore_forcewake_reset(struct drm_device *dev, bool restore)
 
 		if (IS_GEN6(dev) || IS_GEN7(dev))
 			dev_priv->uncore.fifo_count =
-				__raw_i915_read32(dev_priv, GTFIFOCTL) &
-				GT_FIFO_FREE_ENTRIES_MASK;
+				fifo_free_entries(dev_priv);
 	} else {
 		dev_priv->uncore.forcewake_count = 0;
 		dev_priv->uncore.fw_rendercount = 0;
-- 
1.7.9.5

* [RFC 07/44] drm/i915: Disable 'get seqno' workaround for VLV
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (5 preceding siblings ...)
  2014-06-26 17:23 ` [RFC 06/44] drm/i915: Fixes for FIFO space queries John.C.Harrison
@ 2014-06-26 17:23 ` John.C.Harrison
  2014-07-02 17:51   ` Jesse Barnes
  2014-06-26 17:23 ` [RFC 08/44] drm/i915: Added GPU scheduler config option John.C.Harrison
                   ` (38 subsequent siblings)
  45 siblings, 1 reply; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:23 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

There is a workaround for a hardware bug when reading the seqno from the status
page. The bug does not exist on VLV; however, the workaround was still being
applied.
---
 drivers/gpu/drm/i915/intel_ringbuffer.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 279488a..bad5db0 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1960,7 +1960,10 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 			ring->irq_put = gen6_ring_put_irq;
 		}
 		ring->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
-		ring->get_seqno = gen6_ring_get_seqno;
+		if (IS_VALLEYVIEW(dev))
+			ring->get_seqno = ring_get_seqno;
+		else
+			ring->get_seqno = gen6_ring_get_seqno;
 		ring->set_seqno = ring_set_seqno;
 		ring->semaphore.sync_to = gen6_ring_sync;
 		ring->semaphore.signal = gen6_signal;
-- 
1.7.9.5

* [RFC 08/44] drm/i915: Added GPU scheduler config option
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (6 preceding siblings ...)
  2014-06-26 17:23 ` [RFC 07/44] drm/i915: Disable 'get seqno' workaround for VLV John.C.Harrison
@ 2014-06-26 17:23 ` John.C.Harrison
  2014-07-07 18:58   ` Daniel Vetter
  2014-06-26 17:24 ` [RFC 09/44] drm/i915: Start of GPU scheduler John.C.Harrison
                   ` (37 subsequent siblings)
  45 siblings, 1 reply; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:23 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

Added a Kconfig option for enabling/disabling the GPU scheduler.
---
 drivers/gpu/drm/i915/Kconfig |    8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
index 437e182..22a036b 100644
--- a/drivers/gpu/drm/i915/Kconfig
+++ b/drivers/gpu/drm/i915/Kconfig
@@ -81,3 +81,11 @@ config DRM_I915_UMS
 	  enable this only if you have ancient versions of the DDX drivers.
 
 	  If in doubt, say "N".
+
+config DRM_I915_SCHEDULER
+	bool "Enable GPU scheduler on Intel hardware"
+	depends on DRM_I915
+	default y
+	help
+	  Choose this option to enable GPU task scheduling for improved
+	  performance and efficiency.
-- 
1.7.9.5

* [RFC 09/44] drm/i915: Start of GPU scheduler
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (7 preceding siblings ...)
  2014-06-26 17:23 ` [RFC 08/44] drm/i915: Added GPU scheduler config option John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-07-02 17:55   ` Jesse Barnes
  2014-07-07 19:02   ` Daniel Vetter
  2014-06-26 17:24 ` [RFC 10/44] drm/i915: Prepare retire_requests to handle out-of-order seqnos John.C.Harrison
                   ` (36 subsequent siblings)
  45 siblings, 2 replies; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

Created GPU scheduler source files with only a basic init function.
---
 drivers/gpu/drm/i915/Makefile         |    1 +
 drivers/gpu/drm/i915/i915_drv.h       |    4 +++
 drivers/gpu/drm/i915/i915_gem.c       |    3 ++
 drivers/gpu/drm/i915/i915_scheduler.c |   59 +++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_scheduler.h |   40 ++++++++++++++++++++++
 5 files changed, 107 insertions(+)
 create mode 100644 drivers/gpu/drm/i915/i915_scheduler.c
 create mode 100644 drivers/gpu/drm/i915/i915_scheduler.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index cad1683..12817a8 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -11,6 +11,7 @@ i915-y := i915_drv.o \
 	  i915_params.o \
           i915_suspend.o \
 	  i915_sysfs.o \
+	  i915_scheduler.o \
 	  intel_pm.o
 i915-$(CONFIG_COMPAT)   += i915_ioc32.o
 i915-$(CONFIG_DEBUG_FS) += i915_debugfs.o
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 53f6fe5..6e592d3 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1331,6 +1331,8 @@ struct intel_pipe_crc {
 	wait_queue_head_t wq;
 };
 
+struct i915_scheduler;
+
 struct drm_i915_private {
 	struct drm_device *dev;
 	struct kmem_cache *slab;
@@ -1540,6 +1542,8 @@ struct drm_i915_private {
 
 	struct i915_runtime_pm pm;
 
+	struct i915_scheduler *scheduler;
+
 	/* Old dri1 support infrastructure, beware the dragons ya fools entering
 	 * here! */
 	struct i915_dri1_state dri1;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 898660c..b784eb2 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -37,6 +37,7 @@
 #include <linux/swap.h>
 #include <linux/pci.h>
 #include <linux/dma-buf.h>
+#include "i915_scheduler.h"
 
 static void i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj);
 static void i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj,
@@ -4669,6 +4670,8 @@ static int i915_gem_init_rings(struct drm_device *dev)
 			goto cleanup_vebox_ring;
 	}
 
+	i915_scheduler_init(dev);
+
 	ret = i915_gem_set_seqno(dev, ((u32)~0 - 0x1000));
 	if (ret)
 		goto cleanup_bsd2_ring;
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
new file mode 100644
index 0000000..9ec0225
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -0,0 +1,59 @@
+/*
+ * Copyright (c) 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#include "i915_drv.h"
+#include "intel_drv.h"
+#include "i915_scheduler.h"
+
+#ifdef CONFIG_DRM_I915_SCHEDULER
+
+int i915_scheduler_init(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+
+	if (scheduler)
+		return 0;
+
+	scheduler = kzalloc(sizeof(*scheduler), GFP_KERNEL);
+	if (!scheduler)
+		return -ENOMEM;
+
+	spin_lock_init(&scheduler->lock);
+
+	scheduler->index = 1;
+
+	dev_priv->scheduler = scheduler;
+
+	return 0;
+}
+
+#else   /* CONFIG_DRM_I915_SCHEDULER */
+
+int i915_scheduler_init(struct drm_device *dev)
+{
+	return 0;
+}
+
+#endif  /* CONFIG_DRM_I915_SCHEDULER */
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
new file mode 100644
index 0000000..bbe1934
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -0,0 +1,40 @@
+/*
+ * Copyright (c) 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#ifndef _I915_SCHEDULER_H_
+#define _I915_SCHEDULER_H_
+
+int         i915_scheduler_init(struct drm_device *dev);
+
+#ifdef CONFIG_DRM_I915_SCHEDULER
+
+struct i915_scheduler {
+	uint32_t    flags[I915_NUM_RINGS];
+	spinlock_t  lock;
+	uint32_t    index;
+};
+
+#endif  /* CONFIG_DRM_I915_SCHEDULER */
+
+#endif  /* _I915_SCHEDULER_H_ */
-- 
1.7.9.5

* [RFC 10/44] drm/i915: Prepare retire_requests to handle out-of-order seqnos
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (8 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 09/44] drm/i915: Start of GPU scheduler John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-07-02 18:11   ` Jesse Barnes
  2014-07-07 19:05   ` Daniel Vetter
  2014-06-26 17:24 ` [RFC 11/44] drm/i915: Added scheduler hook into i915_seqno_passed() John.C.Harrison
                   ` (35 subsequent siblings)
  45 siblings, 2 replies; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

A major point of the GPU scheduler is that it re-orders batch buffers after they
have been submitted to the driver. Rather than attempting to re-assign seqno
values, it is much simpler to have each batch buffer keep its initially assigned
number and modify the rest of the driver to cope with seqnos being returned out
of order. In practice, very little code actually needs updating to cope.

One such place is the retire request handler. Rather than stopping as soon as an
uncompleted seqno is found, it must now keep iterating through the requests in
case later seqnos have completed. There is also a problem with freeing the
request before the move to inactive. Thus the requests are now moved to a
temporary list first, then the objects are de-activated, and finally the requests
on the temporary list are freed.
---
 drivers/gpu/drm/i915/i915_gem.c |   60 +++++++++++++++++++++------------------
 1 file changed, 32 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index b784eb2..7e53446 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2602,7 +2602,10 @@ void i915_gem_reset(struct drm_device *dev)
 void
 i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 {
+	struct drm_i915_gem_object *obj, *obj_next;
+	struct drm_i915_gem_request *req, *req_next;
 	uint32_t seqno;
+	LIST_HEAD(deferred_request_free);
 
 	if (list_empty(&ring->request_list))
 		return;
@@ -2611,43 +2614,35 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 
 	seqno = ring->get_seqno(ring, true);
 
-	/* Move any buffers on the active list that are no longer referenced
-	 * by the ringbuffer to the flushing/inactive lists as appropriate,
-	 * before we free the context associated with the requests.
+	/* Note that seqno values might be out of order due to rescheduling and
+	 * pre-emption. Thus both lists must be processed in their entirety
+	 * rather than stopping at the first 'non-passed' entry.
 	 */
-	while (!list_empty(&ring->active_list)) {
-		struct drm_i915_gem_object *obj;
-
-		obj = list_first_entry(&ring->active_list,
-				      struct drm_i915_gem_object,
-				      ring_list);
-
-		if (!i915_seqno_passed(seqno, obj->last_read_seqno))
-			break;
 
-		i915_gem_object_move_to_inactive(obj);
-	}
-
-
-	while (!list_empty(&ring->request_list)) {
-		struct drm_i915_gem_request *request;
-
-		request = list_first_entry(&ring->request_list,
-					   struct drm_i915_gem_request,
-					   list);
-
-		if (!i915_seqno_passed(seqno, request->seqno))
-			break;
+	list_for_each_entry_safe(req, req_next, &ring->request_list, list) {
+		if (!i915_seqno_passed(seqno, req->seqno))
+			continue;
 
-		trace_i915_gem_request_retire(ring, request->seqno);
+		trace_i915_gem_request_retire(ring, req->seqno);
 		/* We know the GPU must have read the request to have
 		 * sent us the seqno + interrupt, so use the position
 		 * of tail of the request to update the last known position
 		 * of the GPU head.
 		 */
-		ring->buffer->last_retired_head = request->tail;
+		ring->buffer->last_retired_head = req->tail;
 
-		i915_gem_free_request(request);
+		list_move_tail(&req->list, &deferred_request_free);
+	}
+
+	/* Move any buffers on the active list that are no longer referenced
+	 * by the ringbuffer to the flushing/inactive lists as appropriate,
+	 * before we free the context associated with the requests.
+	 */
+	list_for_each_entry_safe(obj, obj_next, &ring->active_list, ring_list) {
+		if (!i915_seqno_passed(seqno, obj->last_read_seqno))
+			continue;
+
+		i915_gem_object_move_to_inactive(obj);
 	}
 
 	if (unlikely(ring->trace_irq_seqno &&
@@ -2656,6 +2651,15 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 		ring->trace_irq_seqno = 0;
 	}
 
+	/* Finish processing active list before freeing request */
+	while (!list_empty(&deferred_request_free)) {
+		req = list_first_entry(&deferred_request_free,
+	                               struct drm_i915_gem_request,
+	                               list);
+
+		i915_gem_free_request(req);
+	}
+
 	WARN_ON(i915_verify_lists(ring->dev));
 }
 
-- 
1.7.9.5

* [RFC 11/44] drm/i915: Added scheduler hook into i915_seqno_passed()
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (9 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 10/44] drm/i915: Prepare retire_requests to handle out-of-order seqnos John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-07-02 18:14   ` Jesse Barnes
  2014-06-26 17:24 ` [RFC 12/44] drm/i915: Disable hardware semaphores when GPU scheduler is enabled John.C.Harrison
                   ` (34 subsequent siblings)
  45 siblings, 1 reply; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The GPU scheduler can cause seqno values to become out of order. This means that
a straightforward 'is seqno X > seqno Y' test is no longer valid. Instead, a
call into the scheduler must be made to see if the value being queried is known
to be out of order.
---
 drivers/gpu/drm/i915/i915_drv.h       |   23 ++++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_gem.c       |   14 +++++++-------
 drivers/gpu/drm/i915/i915_irq.c       |    4 ++--
 drivers/gpu/drm/i915/i915_scheduler.c |   20 ++++++++++++++++++++
 drivers/gpu/drm/i915/i915_scheduler.h |    3 +++
 5 files changed, 54 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 6e592d3..0977653 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2214,14 +2214,35 @@ int i915_gem_dumb_create(struct drm_file *file_priv,
 			 struct drm_mode_create_dumb *args);
 int i915_gem_mmap_gtt(struct drm_file *file_priv, struct drm_device *dev,
 		      uint32_t handle, uint64_t *offset);
+
+bool i915_scheduler_is_seqno_in_flight(struct intel_engine_cs *ring,
+				uint32_t seqno, bool *completed);
+
 /**
  * Returns true if seq1 is later than seq2.
  */
 static inline bool
-i915_seqno_passed(uint32_t seq1, uint32_t seq2)
+i915_seqno_passed(struct intel_engine_cs *ring, uint32_t seq1, uint32_t seq2)
 {
+#ifdef CONFIG_DRM_I915_SCHEDULER
+	bool    completed;
+
+	if (i915_scheduler_is_seqno_in_flight(ring, seq2, &completed))
+		return completed;
+#endif
+
 	return (int32_t)(seq1 - seq2) >= 0;
 }
+static inline int32_t
+i915_compare_seqno_values(uint32_t seq1, uint32_t seq2)
+{
+	int32_t	diff = seq1 - seq2;
+
+	if (!diff)
+		return 0;
+
+	return (diff > 0) ? 1 : -1;
+}
 
 int __must_check i915_gem_get_seqno(struct drm_device *dev, u32 *seqno);
 int __must_check i915_gem_set_seqno(struct drm_device *dev, u32 seqno);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 7e53446..fece5e7 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1165,7 +1165,7 @@ static int __wait_seqno(struct intel_engine_cs *ring, u32 seqno,
 
 	WARN(dev_priv->pm.irqs_disabled, "IRQs disabled\n");
 
-	if (i915_seqno_passed(ring->get_seqno(ring, true), seqno))
+	if (i915_seqno_passed(ring, ring->get_seqno(ring, true), seqno))
 		return 0;
 
 	timeout_expire = timeout ? jiffies + timespec_to_jiffies_timeout(timeout) : 0;
@@ -1201,7 +1201,7 @@ static int __wait_seqno(struct intel_engine_cs *ring, u32 seqno,
 			break;
 		}
 
-		if (i915_seqno_passed(ring->get_seqno(ring, false), seqno)) {
+		if (i915_seqno_passed(ring, ring->get_seqno(ring, false), seqno)) {
 			ret = 0;
 			break;
 		}
@@ -2243,7 +2243,7 @@ i915_gem_object_retire(struct drm_i915_gem_object *obj)
 	if (ring == NULL)
 		return;
 
-	if (i915_seqno_passed(ring->get_seqno(ring, true),
+	if (i915_seqno_passed(ring, ring->get_seqno(ring, true),
 			      obj->last_read_seqno))
 		i915_gem_object_move_to_inactive(obj);
 }
@@ -2489,7 +2489,7 @@ i915_gem_find_active_request(struct intel_engine_cs *ring)
 	completed_seqno = ring->get_seqno(ring, false);
 
 	list_for_each_entry(request, &ring->request_list, list) {
-		if (i915_seqno_passed(completed_seqno, request->seqno))
+		if (i915_seqno_passed(ring, completed_seqno, request->seqno))
 			continue;
 
 		return request;
@@ -2620,7 +2620,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 	 */
 
 	list_for_each_entry_safe(req, req_next, &ring->request_list, list) {
-		if (!i915_seqno_passed(seqno, req->seqno))
+		if (!i915_seqno_passed(ring, seqno, req->seqno))
 			continue;
 
 		trace_i915_gem_request_retire(ring, req->seqno);
@@ -2639,14 +2639,14 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 	 * before we free the context associated with the requests.
 	 */
 	list_for_each_entry_safe(obj, obj_next, &ring->active_list, ring_list) {
-		if (!i915_seqno_passed(seqno, obj->last_read_seqno))
+		if (!i915_seqno_passed(ring, seqno, obj->last_read_seqno))
 			continue;
 
 		i915_gem_object_move_to_inactive(obj);
 	}
 
 	if (unlikely(ring->trace_irq_seqno &&
-		     i915_seqno_passed(seqno, ring->trace_irq_seqno))) {
+		     i915_seqno_passed(ring, seqno, ring->trace_irq_seqno))) {
 		ring->irq_put(ring);
 		ring->trace_irq_seqno = 0;
 	}
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 00358f9..eff08a3e 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2750,7 +2750,7 @@ static bool
 ring_idle(struct intel_engine_cs *ring, u32 seqno)
 {
 	return (list_empty(&ring->request_list) ||
-		i915_seqno_passed(seqno, ring_last_seqno(ring)));
+		i915_seqno_passed(ring, seqno, ring_last_seqno(ring)));
 }
 
 static bool
@@ -2862,7 +2862,7 @@ static int semaphore_passed(struct intel_engine_cs *ring)
 	if (ctl & RING_WAIT_SEMAPHORE && semaphore_passed(signaller) < 0)
 		return -1;
 
-	return i915_seqno_passed(signaller->get_seqno(signaller, false), seqno);
+	return i915_seqno_passed(ring, signaller->get_seqno(signaller, false), seqno);
 }
 
 static void semaphore_clear_deadlocks(struct drm_i915_private *dev_priv)
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 9ec0225..e9aa566 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -49,6 +49,26 @@ int i915_scheduler_init(struct drm_device *dev)
 	return 0;
 }
 
+bool i915_scheduler_is_seqno_in_flight(struct intel_engine_cs *ring,
+			       uint32_t seqno, bool *completed)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+	bool                    found = false;
+	unsigned long           flags;
+
+	if (!scheduler)
+		return false;
+
+	spin_lock_irqsave(&scheduler->lock, flags);
+
+	/* Do stuff... */
+
+	spin_unlock_irqrestore(&scheduler->lock, flags);
+
+	return found;
+}
+
 #else   /* CONFIG_DRM_I915_SCHEDULER */
 
 int i915_scheduler_init(struct drm_device *dev)
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index bbe1934..67260b7 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -35,6 +35,9 @@ struct i915_scheduler {
 	uint32_t    index;
 };
 
+bool        i915_scheduler_is_seqno_in_flight(struct intel_engine_cs *ring,
+					      uint32_t seqno, bool *completed);
+
 #endif  /* CONFIG_DRM_I915_SCHEDULER */
 
 #endif  /* _I915_SCHEDULER_H_ */
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC 12/44] drm/i915: Disable hardware semaphores when GPU scheduler is enabled
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (10 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 11/44] drm/i915: Added scheduler hook into i915_seqno_passed() John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-07-02 18:16   ` Jesse Barnes
  2014-06-26 17:24 ` [RFC 13/44] drm/i915: Added scheduler hook when closing DRM file handles John.C.Harrison
                   ` (33 subsequent siblings)
  45 siblings, 1 reply; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

Hardware semaphores require seqno values to be monotonically incrementing.
However, the scheduler's reordering of batch buffers means that the seqno values
going through the hardware could be out of order. Thus semaphores cannot be
used.

On the other hand, the scheduler supersedes the need for hardware semaphores
anyway. Having one ring stall waiting for something to complete on another ring
is inefficient if that ring could be working on some other, independent task.
This is what the scheduler is meant to do - keep the hardware as busy as
possible by reordering batch buffers to avoid dependency stalls.
---
 drivers/gpu/drm/i915/i915_drv.c         |    9 +++++++++
 drivers/gpu/drm/i915/i915_scheduler.c   |    9 +++++++++
 drivers/gpu/drm/i915/i915_scheduler.h   |    1 +
 drivers/gpu/drm/i915/intel_ringbuffer.c |    4 ++++
 4 files changed, 23 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index e2bfdda..748b13a 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -33,6 +33,7 @@
 #include "i915_drv.h"
 #include "i915_trace.h"
 #include "intel_drv.h"
+#include "i915_scheduler.h"
 
 #include <linux/console.h>
 #include <linux/module.h>
@@ -468,6 +469,14 @@ void intel_detect_pch(struct drm_device *dev)
 
 bool i915_semaphore_is_enabled(struct drm_device *dev)
 {
+	/* Hardware semaphores are not compatible with the scheduler due to the
+	 * seqno values being potentially out of order. However, semaphores are
+	 * also not required as the scheduler will handle inter-ring dependencies
+	 * and try to do so in a way that does not cause dead time on the hardware.
+	 */
+	if (i915_scheduler_is_enabled(dev))
+		return false;
+
 	if (INTEL_INFO(dev)->gen < 6)
 		return false;
 
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index e9aa566..d9c1879 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -26,6 +26,15 @@
 #include "intel_drv.h"
 #include "i915_scheduler.h"
 
+bool i915_scheduler_is_enabled(struct drm_device *dev)
+{
+#ifdef CONFIG_DRM_I915_SCHEDULER
+	return true;
+#else
+	return false;
+#endif
+}
+
 #ifdef CONFIG_DRM_I915_SCHEDULER
 
 int i915_scheduler_init(struct drm_device *dev)
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 67260b7..4044b6e 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -25,6 +25,7 @@
 #ifndef _I915_SCHEDULER_H_
 #define _I915_SCHEDULER_H_
 
+bool        i915_scheduler_is_enabled(struct drm_device *dev);
 int         i915_scheduler_init(struct drm_device *dev);
 
 #ifdef CONFIG_DRM_I915_SCHEDULER
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index bad5db0..34d6d6e 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -32,6 +32,7 @@
 #include <drm/i915_drm.h>
 #include "i915_trace.h"
 #include "intel_drv.h"
+#include "i915_scheduler.h"
 
 /* Early gen2 devices have a cacheline of just 32 bytes, using 64 is overkill,
  * but keeps the logic simple. Indeed, the whole purpose of this macro is just
@@ -765,6 +766,9 @@ gen6_ring_sync(struct intel_engine_cs *waiter,
 	u32 wait_mbox = signaller->semaphore.mbox.wait[waiter->id];
 	int ret;
 
+	/* Arithmetic on sequence numbers is unreliable with a scheduler. */
+	BUG_ON(i915_scheduler_is_enabled(signaller->dev));
+
 	/* Throughout all of the GEM code, seqno passed implies our current
 	 * seqno is >= the last seqno executed. However for hardware the
 	 * comparison is strictly greater than.
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC 13/44] drm/i915: Added scheduler hook when closing DRM file handles
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (11 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 12/44] drm/i915: Disable hardware semaphores when GPU scheduler is enabled John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-07-02 18:20   ` Jesse Barnes
  2014-06-26 17:24 ` [RFC 14/44] drm/i915: Added getparam for GPU scheduler John.C.Harrison
                   ` (32 subsequent siblings)
  45 siblings, 1 reply; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The scheduler decouples the submission of batch buffers to the driver from their
submission to the hardware. Thus it is possible for an
application to submit work, then close the DRM handle and free up all the
resources that piece of work wishes to use before the work has even been
submitted to the hardware. To prevent this, the scheduler needs to be informed
of the DRM close event so that it can force through any outstanding work
attributed to that file handle.
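
As an illustration of the intent only (the real implementation arrives later in
the series), the hook could walk the scheduler's queue and make sure nothing
still references the closing file. The 'node_queue'/'link' list and the
per-node 'file' pointer are assumed names, not defined by this patch:

int i915_scheduler_closefile(struct drm_device *dev, struct drm_file *file)
{
	struct drm_i915_private           *dev_priv  = dev->dev_private;
	struct i915_scheduler             *scheduler = dev_priv->scheduler;
	struct i915_scheduler_queue_entry *node;
	struct intel_engine_cs            *ring;
	int                               i;

	if (!scheduler)
		return 0;

	for_each_ring(ring, dev_priv, i) {
		list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
			if (node->params.file != file)
				continue;

			/* Nothing may touch the file once the close has
			 * completed, so either flush this node through to
			 * the hardware now or drop the pointer. */
			node->params.file = NULL;
		}
	}

	return 0;
}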
---
 drivers/gpu/drm/i915/i915_dma.c       |    3 +++
 drivers/gpu/drm/i915/i915_scheduler.c |   18 ++++++++++++++++++
 drivers/gpu/drm/i915/i915_scheduler.h |    2 ++
 3 files changed, 23 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 494b156..6c9ce82 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -42,6 +42,7 @@
 #include <linux/vga_switcheroo.h>
 #include <linux/slab.h>
 #include <acpi/video.h>
+#include "i915_scheduler.h"
 #include <linux/pm.h>
 #include <linux/pm_runtime.h>
 #include <linux/oom.h>
@@ -1930,6 +1931,8 @@ void i915_driver_lastclose(struct drm_device * dev)
 
 void i915_driver_preclose(struct drm_device *dev, struct drm_file *file)
 {
+	i915_scheduler_closefile(dev, file);
+
 	mutex_lock(&dev->struct_mutex);
 	i915_gem_context_close(dev, file);
 	i915_gem_release(dev, file);
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index d9c1879..66a6568 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -78,6 +78,19 @@ bool i915_scheduler_is_seqno_in_flight(struct intel_engine_cs *ring,
 	return found;
 }
 
+int i915_scheduler_closefile(struct drm_device *dev, struct drm_file *file)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+
+	if (!scheduler)
+		return 0;
+
+	/* Do stuff... */
+
+	return 0;
+}
+
 #else   /* CONFIG_DRM_I915_SCHEDULER */
 
 int i915_scheduler_init(struct drm_device *dev)
@@ -85,4 +98,9 @@ int i915_scheduler_init(struct drm_device *dev)
 	return 0;
 }
 
+int i915_scheduler_closefile(struct drm_device *dev, struct drm_file *file)
+{
+	return 0;
+}
+
 #endif  /* CONFIG_DRM_I915_SCHEDULER */
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 4044b6e..95641f6 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -27,6 +27,8 @@
 
 bool        i915_scheduler_is_enabled(struct drm_device *dev);
 int         i915_scheduler_init(struct drm_device *dev);
+int         i915_scheduler_closefile(struct drm_device *dev,
+				     struct drm_file *file);
 
 #ifdef CONFIG_DRM_I915_SCHEDULER
 
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC 14/44] drm/i915: Added getparam for GPU scheduler
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (12 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 13/44] drm/i915: Added scheduler hook when closing DRM file handles John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-07-02 18:21   ` Jesse Barnes
  2014-06-26 17:24 ` [RFC 15/44] drm/i915: Added deferred work handler for scheduler John.C.Harrison
                   ` (31 subsequent siblings)
  45 siblings, 1 reply; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

This is required by user land validation programs that need to know whether the
scheduler is available for testing or not.
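
For example, a test could probe the new parameter from user land along these
lines (a libdrm-based sketch with error handling trimmed; 'fd' is assumed to be
an open i915 DRM device node):

#include <xf86drm.h>
#include <i915_drm.h>

/* Returns non-zero if the kernel reports the GPU scheduler as present. */
static int has_gpu_scheduler(int fd)
{
	drm_i915_getparam_t gp;
	int value = 0;

	gp.param = I915_PARAM_HAS_GPU_SCHEDULER;
	gp.value = &value;

	if (drmIoctl(fd, DRM_IOCTL_I915_GETPARAM, &gp))
		return 0;	/* older kernel: parameter not recognised */

	return value;
}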
---
 drivers/gpu/drm/i915/i915_dma.c |    3 +++
 include/uapi/drm/i915_drm.h     |    1 +
 2 files changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 6c9ce82..1668316 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1035,6 +1035,9 @@ static int i915_getparam(struct drm_device *dev, void *data,
 		value = 0;
 #endif
 		break;
+	case I915_PARAM_HAS_GPU_SCHEDULER:
+		value = i915_scheduler_is_enabled(dev);
+		break;
 	default:
 		DRM_DEBUG("Unknown parameter %d\n", param->param);
 		return -EINVAL;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index bf54c78..de6f603 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -341,6 +341,7 @@ typedef struct drm_i915_irq_wait {
 #define I915_PARAM_HAS_WT     	 	 27
 #define I915_PARAM_CMD_PARSER_VERSION	 28
 #define I915_PARAM_HAS_NATIVE_SYNC	 30
+#define I915_PARAM_HAS_GPU_SCHEDULER	 31
 
 typedef struct drm_i915_getparam {
 	int param;
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC 15/44] drm/i915: Added deferred work handler for scheduler
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (13 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 14/44] drm/i915: Added getparam for GPU scheduler John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-07-07 19:14   ` Daniel Vetter
  2014-06-26 17:24 ` [RFC 16/44] drm/i915: Alloc early seqno John.C.Harrison
                   ` (30 subsequent siblings)
  45 siblings, 1 reply; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The scheduler needs to do interrupt-triggered work that is too complex to do in
the interrupt handler. Thus it requires a deferred work handler to process this
work asynchronously.
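
To make the split of responsibilities concrete, below is a sketch of the kind
of clean-up the deferred handler is expected to grow. The node list, its
'status' field and the exact references released are placeholders for later
patches in the series; this patch only adds the work queue plumbing:

int i915_scheduler_remove(struct intel_engine_cs *ring)
{
	struct drm_i915_private           *dev_priv  = ring->dev->dev_private;
	struct i915_scheduler             *scheduler = dev_priv->scheduler;
	struct i915_scheduler_queue_entry *node, *node_next;

	if (!scheduler)
		return 0;

	/* Called from the work handler with struct_mutex held, so the
	 * heavyweight clean-up is safe here but not in the IRQ handler. */
	list_for_each_entry_safe(node, node_next,
				 &scheduler->node_queue[ring->id], link) {
		if (node->status != i915_sqs_complete)
			continue;

		/* Release whatever references were taken at queue time. */
		i915_gem_context_unreference(node->params.ctx);
		kfree(node->params.cliprects);

		list_del(&node->link);
		kfree(node);
	}

	return 0;
}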
---
 drivers/gpu/drm/i915/i915_dma.c       |    3 +++
 drivers/gpu/drm/i915/i915_drv.h       |   10 ++++++++++
 drivers/gpu/drm/i915/i915_gem.c       |   27 +++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_scheduler.c |    7 +++++++
 drivers/gpu/drm/i915/i915_scheduler.h |    1 +
 5 files changed, 48 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 1668316..d1356f3 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1813,6 +1813,9 @@ int i915_driver_unload(struct drm_device *dev)
 	WARN_ON(unregister_oom_notifier(&dev_priv->mm.oom_notifier));
 	unregister_shrinker(&dev_priv->mm.shrinker);
 
+	/* Cancel the scheduler work handler, which should be idle now. */
+	cancel_work_sync(&dev_priv->mm.scheduler_work);
+
 	io_mapping_free(dev_priv->gtt.mappable);
 	arch_phys_wc_del(dev_priv->gtt.mtrr);
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 0977653..fbafa68 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1075,6 +1075,16 @@ struct i915_gem_mm {
 	struct delayed_work idle_work;
 
 	/**
+	 * New scheme is to get an interrupt after every work packet
+	 * in order to allow the low latency scheduling of pending
+	 * packets. The idea behind adding new packets to a pending
+	 * queue rather than directly into the hardware ring buffer
+	 * is to allow high priority packets to overtake low priority
+	 * ones.
+	 */
+	struct work_struct scheduler_work;
+
+	/**
 	 * Are we in a non-interruptible section of code like
 	 * modesetting?
 	 */
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index fece5e7..57b24f0 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2712,6 +2712,29 @@ i915_gem_idle_work_handler(struct work_struct *work)
 	intel_mark_idle(dev_priv->dev);
 }
 
+#ifdef CONFIG_DRM_I915_SCHEDULER
+static void
+i915_gem_scheduler_work_handler(struct work_struct *work)
+{
+	struct intel_engine_cs  *ring;
+	struct drm_i915_private *dev_priv;
+	struct drm_device       *dev;
+	int                     i;
+
+	dev_priv = container_of(work, struct drm_i915_private, mm.scheduler_work);
+	dev = dev_priv->dev;
+
+	mutex_lock(&dev->struct_mutex);
+
+	/* Do stuff: */
+	for_each_ring(ring, dev_priv, i) {
+		i915_scheduler_remove(ring);
+	}
+
+	mutex_unlock(&dev->struct_mutex);
+}
+#endif
+
 /**
  * Ensures that an object will eventually get non-busy by flushing any required
  * write domains, emitting any outstanding lazy request and retiring and
@@ -4916,6 +4939,10 @@ i915_gem_load(struct drm_device *dev)
 			  i915_gem_retire_work_handler);
 	INIT_DELAYED_WORK(&dev_priv->mm.idle_work,
 			  i915_gem_idle_work_handler);
+#ifdef CONFIG_DRM_I915_SCHEDULER
+	INIT_WORK(&dev_priv->mm.scheduler_work,
+				i915_gem_scheduler_work_handler);
+#endif
 	init_waitqueue_head(&dev_priv->gpu_error.reset_queue);
 
 	/* On GEN3 we really need to make sure the ARB C3 LP bit is set */
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 66a6568..37f8a98 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -58,6 +58,13 @@ int i915_scheduler_init(struct drm_device *dev)
 	return 0;
 }
 
+int i915_scheduler_remove(struct intel_engine_cs *ring)
+{
+	/* Do stuff... */
+
+	return 0;
+}
+
 bool i915_scheduler_is_seqno_in_flight(struct intel_engine_cs *ring,
 			       uint32_t seqno, bool *completed)
 {
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 95641f6..6b2cc51 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -38,6 +38,7 @@ struct i915_scheduler {
 	uint32_t    index;
 };
 
+int         i915_scheduler_remove(struct intel_engine_cs *ring);
 bool        i915_scheduler_is_seqno_in_flight(struct intel_engine_cs *ring,
 					      uint32_t seqno, bool *completed);
 
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC 16/44] drm/i915: Alloc early seqno
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (14 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 15/44] drm/i915: Added deferred work handler for scheduler John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-07-02 18:29   ` Jesse Barnes
  2014-06-26 17:24 ` [RFC 17/44] drm/i915: Prelude to splitting i915_gem_do_execbuffer in two John.C.Harrison
                   ` (29 subsequent siblings)
  45 siblings, 1 reply; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The scheduler needs to explicitly allocate a seqno to track each submitted batch
buffer. This must happen a long time before any commands are actually written to
the ring.
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |    5 +++++
 drivers/gpu/drm/i915/intel_ringbuffer.c    |    2 +-
 drivers/gpu/drm/i915/intel_ringbuffer.h    |    1 +
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index ee836a6..ec274ef 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1317,6 +1317,11 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		vma->bind_vma(vma, batch_obj->cache_level, GLOBAL_BIND);
 	}
 
+	/* Allocate a seqno for this batch buffer nice and early. */
+	ret = intel_ring_alloc_seqno(ring);
+	if (ret)
+		goto err;
+
 	if (flags & I915_DISPATCH_SECURE)
 		exec_start += i915_gem_obj_ggtt_offset(batch_obj);
 	else
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 34d6d6e..737c41b 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1662,7 +1662,7 @@ int intel_ring_idle(struct intel_engine_cs *ring)
 	return i915_wait_seqno(ring, seqno);
 }
 
-static int
+int
 intel_ring_alloc_seqno(struct intel_engine_cs *ring)
 {
 	if (ring->outstanding_lazy_seqno)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 30841ea..cc92de2 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -347,6 +347,7 @@ void intel_cleanup_ring_buffer(struct intel_engine_cs *ring);
 
 int __must_check intel_ring_begin(struct intel_engine_cs *ring, int n);
 int __must_check intel_ring_cacheline_align(struct intel_engine_cs *ring);
+int __must_check intel_ring_alloc_seqno(struct intel_engine_cs *ring);
 static inline void intel_ring_emit(struct intel_engine_cs *ring,
 				   u32 data)
 {
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC 17/44] drm/i915: Prelude to splitting i915_gem_do_execbuffer in two
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (15 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 16/44] drm/i915: Alloc early seqno John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-07-02 18:34   ` Jesse Barnes
  2014-06-26 17:24 ` [RFC 18/44] drm/i915: Added scheduler debug macro John.C.Harrison
                   ` (28 subsequent siblings)
  45 siblings, 1 reply; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The scheduler decouples the submission of batch buffers to the driver from their
submission to the hardware. This basically means splitting the execbuffer()
function in half. This change rearranges some code ready for the split to occur.
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index ec274ef..fda9187 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -32,6 +32,7 @@
 #include "i915_trace.h"
 #include "intel_drv.h"
 #include <linux/dma_remapping.h>
+#include "i915_scheduler.h"
 
 #define  __EXEC_OBJECT_HAS_PIN (1<<31)
 #define  __EXEC_OBJECT_HAS_FENCE (1<<30)
@@ -874,10 +875,7 @@ i915_gem_execbuffer_move_to_gpu(struct intel_engine_cs *ring,
 	if (flush_domains & I915_GEM_DOMAIN_GTT)
 		wmb();
 
-	/* Unconditionally invalidate gpu caches and ensure that we do flush
-	 * any residual writes from the previous batch.
-	 */
-	return intel_ring_invalidate_all_caches(ring);
+	return 0;
 }
 
 static bool
@@ -1219,8 +1217,6 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		}
 	}
 
-	intel_runtime_pm_get(dev_priv);
-
 	ret = i915_mutex_lock_interruptible(dev);
 	if (ret)
 		goto pre_mutex_err;
@@ -1331,6 +1327,20 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	if (ret)
 		goto err;
 
+	i915_gem_execbuffer_move_to_active(&eb->vmas, ring);
+
+	/* To be split into two functions here... */
+
+	intel_runtime_pm_get(dev_priv);
+
+	/* Unconditionally invalidate gpu caches and ensure that we do flush
+	 * any residual writes from the previous batch.
+	 */
+	ret = intel_ring_invalidate_all_caches(ring);
+	if (ret)
+		goto err;
+
+	/* Switch to the correct context for the batch */
 	ret = i915_switch_context(ring, ctx);
 	if (ret)
 		goto err;
@@ -1381,7 +1391,6 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 
 	trace_i915_gem_ring_dispatch(ring, intel_ring_get_seqno(ring), flags);
 
-	i915_gem_execbuffer_move_to_active(&eb->vmas, ring);
 	i915_gem_execbuffer_retire_commands(dev, file, ring, batch_obj);
 
 err:
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC 18/44] drm/i915: Added scheduler debug macro
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (16 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 17/44] drm/i915: Prelude to splitting i915_gem_do_execbuffer in two John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-07-02 18:37   ` Jesse Barnes
  2014-06-26 17:24 ` [RFC 19/44] drm/i915: Split i915_dem_do_execbuffer() in half John.C.Harrison
                   ` (27 subsequent siblings)
  45 siblings, 1 reply; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

Added a DRM debug facility for use by the scheduler.
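
Usage mirrors the existing DRM_DEBUG_* macros; the arguments below are purely
illustrative:

	DRM_DEBUG_SCHED("queued batch buffer, seqno = %d, ring = %s\n",
			seqno, ring->name);

The output only appears when the new DRM_UT_SCHED bit (0x40) is set in the
drm.debug module parameter, e.g. by booting with drm.debug=0x40 or writing
0x40 to /sys/module/drm/parameters/debug.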
---
 include/drm/drmP.h |    7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/include/drm/drmP.h b/include/drm/drmP.h
index 76ccaab..2f477c9 100644
--- a/include/drm/drmP.h
+++ b/include/drm/drmP.h
@@ -120,6 +120,7 @@ struct videomode;
 #define DRM_UT_DRIVER		0x02
 #define DRM_UT_KMS		0x04
 #define DRM_UT_PRIME		0x08
+#define DRM_UT_SCHED		0x40
 
 extern __printf(2, 3)
 void drm_ut_debug_printk(const char *function_name,
@@ -221,10 +222,16 @@ int drm_err(const char *func, const char *format, ...);
 		if (unlikely(drm_debug & DRM_UT_PRIME))			\
 			drm_ut_debug_printk(__func__, fmt, ##args);	\
 	} while (0)
+#define DRM_DEBUG_SCHED(fmt, args...)					\
+	do {								\
+		if (unlikely(drm_debug & DRM_UT_SCHED))			\
+			drm_ut_debug_printk(__func__, fmt, ##args);	\
+	} while (0)
 #else
 #define DRM_DEBUG_DRIVER(fmt, args...) do { } while (0)
 #define DRM_DEBUG_KMS(fmt, args...)	do { } while (0)
 #define DRM_DEBUG_PRIME(fmt, args...)	do { } while (0)
+#define DRM_DEBUG_SCHED(fmt, args...)	do { } while (0)
 #define DRM_DEBUG(fmt, arg...)		 do { } while (0)
 #endif
 
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC 19/44] drm/i915: Split i915_dem_do_execbuffer() in half
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (17 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 18/44] drm/i915: Added scheduler debug macro John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-06-26 17:24 ` [RFC 20/44] drm/i915: Redirect execbuffer_final() via scheduler John.C.Harrison
                   ` (26 subsequent siblings)
  45 siblings, 0 replies; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

Split the execbuffer() function in half. The first half collects and validates
all the information required to process the batch buffer. It also does all the
object pinning, relocations, active list management, etc - basically anything
that must be done upfront before the IOCTL returns and allows the user land side
to start changing/freeing things. The second half does the actual ring
submission.

This change implements the split but leaves the back half being called directly
from the end of the front half.
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  125 +++++++++++++++++++++-------
 drivers/gpu/drm/i915/i915_scheduler.h      |   25 ++++++
 2 files changed, 121 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index fda9187..334e8c6 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1090,10 +1090,10 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	struct intel_context *ctx;
 	struct i915_address_space *vm;
 	const u32 ctx_id = i915_execbuffer2_get_context_id(*args);
-	u64 exec_start = args->batch_start_offset, exec_len;
 	u32 mask, flags;
-	int ret, mode, i;
+	int ret, mode;
 	bool need_relocs;
+	struct i915_scheduler_queue_entry qe;
 
 	if (!i915_gem_check_execbuffer(args))
 		return -EINVAL;
@@ -1240,6 +1240,8 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	if (!USES_FULL_PPGTT(dev))
 		vm = &dev_priv->gtt.base;
 
+	memset(&qe, 0x00, sizeof(qe));
+
 	eb = eb_create(args);
 	if (eb == NULL) {
 		i915_gem_context_unreference(ctx);
@@ -1318,10 +1320,27 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	if (ret)
 		goto err;
 
+	/* Save assorted stuff away to pass through to execbuffer_final() */
+	qe.params.dev                     = dev;
+	qe.params.file                    = file;
+	qe.params.ring                    = ring;
+	qe.params.eb_flags                = flags;
+	qe.params.args_flags              = args->flags;
+	qe.params.args_batch_start_offset = args->batch_start_offset;
+	qe.params.args_batch_len          = args->batch_len;
+	qe.params.args_num_cliprects      = args->num_cliprects;
+	qe.params.args_DR1                = args->DR1;
+	qe.params.args_DR4                = args->DR4;
+	qe.params.batch_obj               = batch_obj;
+	qe.params.cliprects               = cliprects;
+	qe.params.ctx                     = ctx;
+	qe.params.mask                    = mask;
+	qe.params.mode                    = mode;
+
 	if (flags & I915_DISPATCH_SECURE)
-		exec_start += i915_gem_obj_ggtt_offset(batch_obj);
+		qe.params.batch_obj_vm_offset = i915_gem_obj_ggtt_offset(batch_obj);
 	else
-		exec_start += i915_gem_obj_offset(batch_obj, vm);
+		qe.params.batch_obj_vm_offset = i915_gem_obj_offset(batch_obj, vm);
 
 	ret = i915_gem_execbuffer_move_to_gpu(ring, &eb->vmas);
 	if (ret)
@@ -1329,7 +1348,58 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 
 	i915_gem_execbuffer_move_to_active(&eb->vmas, ring);
 
-	/* To be split into two functions here... */
+	ret = i915_gem_do_execbuffer_final(&qe.params);
+	if (ret)
+		goto err;
+
+	/* Free everything that was stored in the QE structure (until the
+	 * scheduler arrives and does it instead): */
+	kfree(qe.params.cliprects);
+
+	/* The eb list is no longer required. The scheduler has extracted all
+	 * the information that needs to persist. */
+	eb_destroy(eb);
+
+	/*
+	 * Don't clean up everything that is now saved away in the queue.
+	 * Just unlock and return immediately.
+	 */
+	mutex_unlock(&dev->struct_mutex);
+
+	return ret;
+
+err:
+	/* the request owns the ref now */
+	i915_gem_context_unreference(ctx);
+
+	eb_destroy(eb);
+
+	mutex_unlock(&dev->struct_mutex);
+
+pre_mutex_err:
+	kfree(cliprects);
+
+	return ret;
+}
+
+/*
+ * This is the main function for adding a batch to the ring.
+ * It is called from the scheduler, with the struct_mutex already held.
+ */
+int i915_gem_do_execbuffer_final(struct i915_execbuffer_params *params)
+{
+	struct drm_i915_private *dev_priv = params->dev->dev_private;
+	struct intel_engine_cs  *ring = params->ring;
+	u64 exec_start, exec_len;
+	int ret, i;
+
+	/* The mutex must be acquired before calling this function */
+	BUG_ON(!mutex_is_locked(&params->dev->struct_mutex));
+
+	if (dev_priv->ums.mm_suspended) {
+		ret = -EBUSY;
+		goto early_err;
+	}
 
 	intel_runtime_pm_get(dev_priv);
 
@@ -1341,12 +1411,12 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		goto err;
 
 	/* Switch to the correct context for the batch */
-	ret = i915_switch_context(ring, ctx);
+	ret = i915_switch_context(ring, params->ctx);
 	if (ret)
 		goto err;
 
 	if (ring == &dev_priv->ring[RCS] &&
-	    mode != dev_priv->relative_constants_mode) {
+	    params->mode != dev_priv->relative_constants_mode) {
 		ret = intel_ring_begin(ring, 4);
 		if (ret)
 				goto err;
@@ -1354,58 +1424,55 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		intel_ring_emit(ring, MI_NOOP);
 		intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
 		intel_ring_emit(ring, INSTPM);
-		intel_ring_emit(ring, mask << 16 | mode);
+		intel_ring_emit(ring, params->mask << 16 | params->mode);
 		intel_ring_advance(ring);
 
-		dev_priv->relative_constants_mode = mode;
+		dev_priv->relative_constants_mode = params->mode;
 	}
 
-	if (args->flags & I915_EXEC_GEN7_SOL_RESET) {
-		ret = i915_reset_gen7_sol_offsets(dev, ring);
+	if (params->args_flags & I915_EXEC_GEN7_SOL_RESET) {
+		ret = i915_reset_gen7_sol_offsets(params->dev, ring);
 		if (ret)
 			goto err;
 	}
 
 
-	exec_len = args->batch_len;
-	if (cliprects) {
-		for (i = 0; i < args->num_cliprects; i++) {
-			ret = i915_emit_box(dev, &cliprects[i],
-					    args->DR1, args->DR4);
+	exec_len   = params->args_batch_len;
+	exec_start = params->batch_obj_vm_offset +
+		     params->args_batch_start_offset;
+
+	if (params->cliprects) {
+		for (i = 0; i < params->args_num_cliprects; i++) {
+			ret = i915_emit_box(params->dev, &params->cliprects[i],
+					    params->args_DR1, params->args_DR4);
 			if (ret)
 				goto err;
 
 			ret = ring->dispatch_execbuffer(ring,
 							exec_start, exec_len,
-							flags);
+							params->eb_flags);
 			if (ret)
 				goto err;
 		}
 	} else {
 		ret = ring->dispatch_execbuffer(ring,
 						exec_start, exec_len,
-						flags);
+						params->eb_flags);
 		if (ret)
 			goto err;
 	}
 
-	trace_i915_gem_ring_dispatch(ring, intel_ring_get_seqno(ring), flags);
+	trace_i915_gem_ring_dispatch(ring, intel_ring_get_seqno(ring), params->eb_flags);
 
-	i915_gem_execbuffer_retire_commands(dev, file, ring, batch_obj);
+	i915_gem_execbuffer_retire_commands(params->dev, params->file, ring,
+					    params->batch_obj);
 
 err:
-	/* the request owns the ref now */
-	i915_gem_context_unreference(ctx);
-	eb_destroy(eb);
-
-	mutex_unlock(&dev->struct_mutex);
-
-pre_mutex_err:
-	kfree(cliprects);
-
 	/* intel_gpu_busy should also get a ref, so it will free when the device
 	 * is really idle. */
 	intel_runtime_pm_put(dev_priv);
+
+early_err:
 	return ret;
 }
 
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 6b2cc51..68a9543 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -25,6 +25,29 @@
 #ifndef _I915_SCHEDULER_H_
 #define _I915_SCHEDULER_H_
 
+struct i915_execbuffer_params {
+	struct drm_device               *dev;
+	struct drm_file                 *file;
+	uint32_t                        eb_flags;
+	uint32_t                        args_flags;
+	uint32_t                        args_batch_start_offset;
+	uint32_t                        args_batch_len;
+	uint32_t                        args_num_cliprects;
+	uint32_t                        args_DR1;
+	uint32_t                        args_DR4;
+	uint32_t                        batch_obj_vm_offset;
+	struct intel_engine_cs          *ring;
+	struct drm_i915_gem_object      *batch_obj;
+	struct drm_clip_rect            *cliprects;
+	uint32_t                        mask;
+	int                             mode;
+	struct intel_context            *ctx;
+};
+
+struct i915_scheduler_queue_entry {
+	struct i915_execbuffer_params       params;
+};
+
 bool        i915_scheduler_is_enabled(struct drm_device *dev);
 int         i915_scheduler_init(struct drm_device *dev);
 int         i915_scheduler_closefile(struct drm_device *dev,
@@ -44,4 +67,6 @@ bool        i915_scheduler_is_seqno_in_flight(struct intel_engine_cs *ring,
 
 #endif  /* CONFIG_DRM_I915_SCHEDULER */
 
+int i915_gem_do_execbuffer_final(struct i915_execbuffer_params *params);
+
 #endif  /* _I915_SCHEDULER_H_ */
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC 20/44] drm/i915: Redirect execbuffer_final() via scheduler
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (18 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 19/44] drm/i915: Split i915_dem_do_execbuffer() in half John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-06-26 17:24 ` [RFC 21/44] drm/i915: Added tracking/locking of batch buffer objects John.C.Harrison
                   ` (25 subsequent siblings)
  45 siblings, 0 replies; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

Updated the execbuffer() code to pass the packaged up batch buffer information
to the scheduler rather than calling execbuffer_final() directly. The scheduler
queue() code is currently a stub which simply chains on to _final() immediately.
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |    6 +-----
 drivers/gpu/drm/i915/i915_scheduler.c      |   23 +++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_scheduler.h      |    2 ++
 3 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 334e8c6..f73c936 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1348,14 +1348,10 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 
 	i915_gem_execbuffer_move_to_active(&eb->vmas, ring);
 
-	ret = i915_gem_do_execbuffer_final(&qe.params);
+	ret = i915_scheduler_queue_execbuffer(&qe);
 	if (ret)
 		goto err;
 
-	/* Free everything that was stored in the QE structure (until the
-	 * scheduler arrives and does it instead): */
-	kfree(qe.params.cliprects);
-
 	/* The eb list is no longer required. The scheduler has extracted all
 	 * the information that needs to persist. */
 	eb_destroy(eb);
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 37f8a98..d95c789 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -58,6 +58,24 @@ int i915_scheduler_init(struct drm_device *dev)
 	return 0;
 }
 
+int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
+{
+	struct drm_i915_private     *dev_priv = qe->params.dev->dev_private;
+	struct i915_scheduler       *scheduler = dev_priv->scheduler;
+	int ret;
+
+	BUG_ON(!scheduler);
+
+	qe->params.scheduler_index = scheduler->index++;
+
+	ret = i915_gem_do_execbuffer_final(&qe->params);
+
+	/* Free everything that is owned by the QE structure: */
+	kfree(qe->params.cliprects);
+
+	return ret;
+}
+
 int i915_scheduler_remove(struct intel_engine_cs *ring)
 {
 	/* Do stuff... */
@@ -110,4 +128,9 @@ int i915_scheduler_closefile(struct drm_device *dev, struct drm_file *file)
 	return 0;
 }
 
+int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
+{
+	return i915_gem_do_execbuffer_final(&qe->params);
+}
+
 #endif  /* CONFIG_DRM_I915_SCHEDULER */
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 68a9543..4c3e081 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -42,6 +42,7 @@ struct i915_execbuffer_params {
 	uint32_t                        mask;
 	int                             mode;
 	struct intel_context            *ctx;
+	uint32_t                        scheduler_index;
 };
 
 struct i915_scheduler_queue_entry {
@@ -52,6 +53,7 @@ bool        i915_scheduler_is_enabled(struct drm_device *dev);
 int         i915_scheduler_init(struct drm_device *dev);
 int         i915_scheduler_closefile(struct drm_device *dev,
 				     struct drm_file *file);
+int         i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe);
 
 #ifdef CONFIG_DRM_I915_SCHEDULER
 
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC 21/44] drm/i915: Added tracking/locking of batch buffer objects
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (19 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 20/44] drm/i915: Redirect execbuffer_final() via scheduler John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-06-26 17:24 ` [RFC 22/44] drm/i915: Ensure OLS & PLR are always in sync John.C.Harrison
                   ` (24 subsequent siblings)
  45 siblings, 0 replies; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The scheduler needs to track interdependencies between batch buffers. These are
calculated by analysing the object lists of the buffers and looking for
commonality. The scheduler also needs to keep those buffers locked long after
the initial IOCTL call has returned to user land.
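
As a rough illustration of how the saved object lists can be used for the
dependency calculation (the helper below is not part of this patch and the real
logic arrives later in the series), two queue entries would be considered
dependent if any GEM object appears in both:

static bool i915_scheduler_entries_overlap(struct i915_scheduler_queue_entry *a,
					   struct i915_scheduler_queue_entry *b)
{
	int i, j;

	/* O(n*m) scan of the two saved object lists; fine for the small
	 * buffer counts seen in practice and purely illustrative here. */
	for (i = 0; i < a->num_objs; i++)
		for (j = 0; j < b->num_objs; j++)
			if (a->saved_objects[i].obj == b->saved_objects[j].obj)
				return true;

	return false;
}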
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   57 +++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_scheduler.c      |   20 +++++++++-
 drivers/gpu/drm/i915/i915_scheduler.h      |    6 +++
 3 files changed, 80 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index f73c936..6bb1fd6 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1094,6 +1094,9 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	int ret, mode;
 	bool need_relocs;
 	struct i915_scheduler_queue_entry qe;
+#ifdef CONFIG_DRM_I915_SCHEDULER
+	int i;
+#endif
 
 	if (!i915_gem_check_execbuffer(args))
 		return -EINVAL;
@@ -1250,6 +1253,16 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		goto pre_mutex_err;
 	}
 
+#ifdef CONFIG_DRM_I915_SCHEDULER
+	qe.saved_objects = kzalloc(
+			sizeof(*qe.saved_objects) * args->buffer_count,
+			GFP_KERNEL);
+	if (!qe.saved_objects) {
+		ret = -ENOMEM;
+		goto err;
+	}
+#endif
+
 	/* Look up object handles */
 	ret = eb_lookup_vmas(eb, exec, args, vm, file);
 	if (ret)
@@ -1333,10 +1346,33 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	qe.params.args_DR4                = args->DR4;
 	qe.params.batch_obj               = batch_obj;
 	qe.params.cliprects               = cliprects;
-	qe.params.ctx                     = ctx;
 	qe.params.mask                    = mask;
 	qe.params.mode                    = mode;
 
+#ifdef CONFIG_DRM_I915_SCHEDULER
+	/*
+	 * Save away the list of objects used by this batch buffer for the
+	 * purpose of tracking inter-buffer dependencies.
+	 */
+	for (i = 0; i < args->buffer_count; i++) {
+		/*
+		 * NB: 'drm_gem_object_lookup()' increments the object's
+		 * reference count and so must be matched by a
+		 * 'drm_gem_object_unreference' call.
+		 */
+		qe.saved_objects[i].obj =
+			to_intel_bo(drm_gem_object_lookup(dev, file,
+							  exec[i].handle));
+	}
+	qe.num_objs = i;
+
+	/* Lock and save the context object as well. */
+	i915_gem_context_reference(ctx);
+	qe.params.ctx = ctx;
+#else  // CONFIG_DRM_I915_SCHEDULER
+	qe.params.ctx = ctx;
+#endif // CONFIG_DRM_I915_SCHEDULER
+
 	if (flags & I915_DISPATCH_SECURE)
 		qe.params.batch_obj_vm_offset = i915_gem_obj_ggtt_offset(batch_obj);
 	else
@@ -1370,6 +1406,25 @@ err:
 
 	eb_destroy(eb);
 
+#ifdef CONFIG_DRM_I915_SCHEDULER
+	if (qe.saved_objects) {
+		/* Need to release the objects: */
+		for (i = 0; i < qe.num_objs; i++) {
+			if (!qe.saved_objects[i].obj)
+				continue;
+
+			drm_gem_object_unreference(
+					&qe.saved_objects[i].obj->base);
+		}
+
+		kfree(qe.saved_objects);
+
+		/* Context too */
+		if (qe.params.ctx)
+			i915_gem_context_unreference(qe.params.ctx);
+	}
+#endif // CONFIG_DRM_I915_SCHEDULER
+
 	mutex_unlock(&dev->struct_mutex);
 
 pre_mutex_err:
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index d95c789..fc165c2 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -62,7 +62,7 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 {
 	struct drm_i915_private     *dev_priv = qe->params.dev->dev_private;
 	struct i915_scheduler       *scheduler = dev_priv->scheduler;
-	int ret;
+	int ret, i;
 
 	BUG_ON(!scheduler);
 
@@ -70,7 +70,23 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 
 	ret = i915_gem_do_execbuffer_final(&qe->params);
 
-	/* Free everything that is owned by the QE structure: */
+	/* Need to release the objects: */
+	for (i = 0; i < qe->num_objs; i++) {
+		if (!qe->saved_objects[i].obj)
+			continue;
+
+		drm_gem_object_unreference(&qe->saved_objects[i].obj->base);
+	}
+
+	kfree(qe->saved_objects);
+	qe->saved_objects = NULL;
+	qe->num_objs = 0;
+
+	/* Free the context object too: */
+	if (qe->params.ctx)
+		i915_gem_context_unreference(qe->params.ctx);
+
+	/* And anything else owned by the QE structure: */
 	kfree(qe->params.cliprects);
 
 	return ret;
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 4c3e081..7c88a26 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -45,8 +45,14 @@ struct i915_execbuffer_params {
 	uint32_t                        scheduler_index;
 };
 
+struct i915_scheduler_obj_entry {
+	struct drm_i915_gem_object          *obj;
+};
+
 struct i915_scheduler_queue_entry {
 	struct i915_execbuffer_params       params;
+	struct i915_scheduler_obj_entry     *saved_objects;
+	int                                 num_objs;
 };
 
 bool        i915_scheduler_is_enabled(struct drm_device *dev);
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC 22/44] drm/i915: Ensure OLS & PLR are always in sync
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (20 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 21/44] drm/i915: Added tracking/locking of batch buffer objects John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-06-26 17:24 ` [RFC 23/44] drm/i915: Added manipulation of OLS/PLR John.C.Harrison
                   ` (23 subsequent siblings)
  45 siblings, 0 replies; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The new seqno allocation code pre-allocates a 'lazy' request structure and then
tries to allocate the 'lazy' seqno. The seqno allocation can potentially wrap
around zero and, when doing so, tries to idle the ring by waiting for all
outstanding work to complete. With a scheduler in place, this can mean first
submitting extra work to the ring. However, at this point in time, the lazy
request is valid but the lazy seqno is not. Some existing code was getting
confused by this state and Bad Things would happen.

The safest solution is to still allocate the lazy request in advance (to avoid
having to roll back in an out-of-memory situation) but to save the pointer in a
local variable rather than immediately updating the lazy pointer. Only after a
valid seqno has been acquired is the lazy request pointer actually updated.

This guarantees that either both lazy values are invalid or both are valid. There
can no longer be an inconsistent state.
---
 drivers/gpu/drm/i915/intel_ringbuffer.c |   27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 737c41b..1ef0cbd 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1665,20 +1665,31 @@ int intel_ring_idle(struct intel_engine_cs *ring)
 int
 intel_ring_alloc_seqno(struct intel_engine_cs *ring)
 {
-	if (ring->outstanding_lazy_seqno)
+	int ret;
+	struct drm_i915_gem_request *request;
+
+	/* NB: Some code seems to test the OLS and other code tests the PLR.
+	 * Therefore it is only safe if the two are kept in step. */
+
+	if (ring->outstanding_lazy_seqno) {
+		BUG_ON(ring->preallocated_lazy_request == NULL);
 		return 0;
+	}
 
-	if (ring->preallocated_lazy_request == NULL) {
-		struct drm_i915_gem_request *request;
+	BUG_ON(ring->preallocated_lazy_request != NULL);
 
-		request = kmalloc(sizeof(*request), GFP_KERNEL);
-		if (request == NULL)
-			return -ENOMEM;
+	request = kmalloc(sizeof(*request), GFP_KERNEL);
+	if (request == NULL)
+		return -ENOMEM;
 
-		ring->preallocated_lazy_request = request;
+	ret = i915_gem_get_seqno(ring->dev, &ring->outstanding_lazy_seqno);
+	if (ret) {
+		kfree(request);
+		return ret;
 	}
 
-	return i915_gem_get_seqno(ring->dev, &ring->outstanding_lazy_seqno);
+	ring->preallocated_lazy_request = request;
+	return 0;
 }
 
 static int __intel_ring_prepare(struct intel_engine_cs *ring,
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC 23/44] drm/i915: Added manipulation of OLS/PLR
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (21 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 22/44] drm/i915: Ensure OLS & PLR are always in sync John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-06-26 17:24 ` [RFC 24/44] drm/i915: Added scheduler interrupt handler hook John.C.Harrison
                   ` (22 subsequent siblings)
  45 siblings, 0 replies; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The scheduler requires each batch buffer to be tagged with the seqno it has been
assigned and for that seqno to only be attached to the given batch buffer. Note
that the seqno assigned to a batch buffer that is being submitted to the
hardware might be very different to the next seqno that would be assigned
automatically on ring submission.

This means manipulating the lazy seqno and request values around batch buffer
submission. At the start of execbuffer() the lazy seqno should be zero; if not,
it means that something has been written to the ring without a request being
added. The lazy seqno also needs to be reset back to zero at the end ready for
the next request to start.

Then, execbuffer_final() needs to manually set the lazy seqno to the batch
buffer's pre-assigned value rather than grabbing the next available value. There
is no need to explicitly clear the lazy seqno at the end of _final() as the
add_request() call within _retire_commands() will do that automatically.
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   68 +++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_scheduler.h      |    2 +
 2 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 6bb1fd6..98cc95e 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1328,10 +1328,22 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		vma->bind_vma(vma, batch_obj->cache_level, GLOBAL_BIND);
 	}
 
+	/* OLS should be zero at this point. If not then this buffer is going
+	 * to be tagged as someone else's work! */
+	BUG_ON(ring->outstanding_lazy_seqno    != 0);
+	BUG_ON(ring->preallocated_lazy_request != NULL);
+
 	/* Allocate a seqno for this batch buffer nice and early. */
 	ret = intel_ring_alloc_seqno(ring);
 	if (ret)
 		goto err;
+	qe.params.seqno   = ring->outstanding_lazy_seqno;
+	qe.params.request = ring->preallocated_lazy_request;
+
+	BUG_ON(ring->outstanding_lazy_seqno    == 0);
+	BUG_ON(ring->outstanding_lazy_seqno    != qe.params.seqno);
+	BUG_ON(ring->preallocated_lazy_request != qe.params.request);
+	BUG_ON(ring->preallocated_lazy_request == NULL);
 
 	/* Save assorted stuff away to pass through to execbuffer_final() */
 	qe.params.dev                     = dev;
@@ -1373,6 +1385,10 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	qe.params.ctx = ctx;
 #endif // CONFIG_DRM_I915_SCHEDULER
 
+	/* OLS should have been set to something useful above */
+	BUG_ON(ring->outstanding_lazy_seqno    != qe.params.seqno);
+	BUG_ON(ring->preallocated_lazy_request != qe.params.request);
+
 	if (flags & I915_DISPATCH_SECURE)
 		qe.params.batch_obj_vm_offset = i915_gem_obj_ggtt_offset(batch_obj);
 	else
@@ -1384,6 +1400,19 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 
 	i915_gem_execbuffer_move_to_active(&eb->vmas, ring);
 
+	/* Make sure the OLS hasn't advanced (which would indicate a flush
+	 * of the work in progress which in turn would be a Bad Thing). */
+	BUG_ON(ring->outstanding_lazy_seqno    != qe.params.seqno);
+	BUG_ON(ring->preallocated_lazy_request != qe.params.request);
+
+	/*
+	 * A new seqno has been assigned to the buffer and saved away for
+	 * future reference. So clear the OLS to ensure that any further
+	 * work is assigned a brand new seqno:
+	 */
+	ring->outstanding_lazy_seqno    = 0;
+	ring->preallocated_lazy_request = NULL;
+
 	ret = i915_scheduler_queue_execbuffer(&qe);
 	if (ret)
 		goto err;
@@ -1425,6 +1454,12 @@ err:
 	}
 #endif // CONFIG_DRM_I915_SCHEDULER
 
+	/* Clear the OLS again in case the failure occurred after it had been
+	 * assigned. */
+	kfree(ring->preallocated_lazy_request);
+	ring->preallocated_lazy_request = NULL;
+	ring->outstanding_lazy_seqno    = 0;
+
 	mutex_unlock(&dev->struct_mutex);
 
 pre_mutex_err:
@@ -1443,6 +1478,7 @@ int i915_gem_do_execbuffer_final(struct i915_execbuffer_params *params)
 	struct intel_engine_cs  *ring = params->ring;
 	u64 exec_start, exec_len;
 	int ret, i;
+	u32 seqno;
 
 	/* The mutex must be acquired before calling this function */
 	BUG_ON(!mutex_is_locked(&params->dev->struct_mutex));
@@ -1454,6 +1490,14 @@ int i915_gem_do_execbuffer_final(struct i915_execbuffer_params *params)
 
 	intel_runtime_pm_get(dev_priv);
 
+	/* Ensure the correct seqno gets assigned to the correct buffer: */
+	BUG_ON(ring->outstanding_lazy_seqno    != 0);
+	BUG_ON(ring->preallocated_lazy_request != NULL);
+	ring->outstanding_lazy_seqno    = params->seqno;
+	ring->preallocated_lazy_request = params->request;
+
+	seqno = params->seqno;
+
 	/* Unconditionally invalidate gpu caches and ensure that we do flush
 	 * any residual writes from the previous batch.
 	 */
@@ -1466,6 +1510,10 @@ int i915_gem_do_execbuffer_final(struct i915_execbuffer_params *params)
 	if (ret)
 		goto err;
 
+	/* Seqno matches? */
+	BUG_ON(seqno != params->seqno);
+	BUG_ON(ring->outstanding_lazy_seqno != params->seqno);
+
 	if (ring == &dev_priv->ring[RCS] &&
 	    params->mode != dev_priv->relative_constants_mode) {
 		ret = intel_ring_begin(ring, 4);
@@ -1487,6 +1535,9 @@ int i915_gem_do_execbuffer_final(struct i915_execbuffer_params *params)
 			goto err;
 	}
 
+	/* Seqno matches? */
+	BUG_ON(ring->outstanding_lazy_seqno    != params->seqno);
+	BUG_ON(ring->preallocated_lazy_request != params->request);
 
 	exec_len   = params->args_batch_len;
 	exec_start = params->batch_obj_vm_offset +
@@ -1513,12 +1564,27 @@ int i915_gem_do_execbuffer_final(struct i915_execbuffer_params *params)
 			goto err;
 	}
 
-	trace_i915_gem_ring_dispatch(ring, intel_ring_get_seqno(ring), params->eb_flags);
+	trace_i915_gem_ring_dispatch(ring, seqno, params->eb_flags);
+
+	/* Seqno matches? */
+	BUG_ON(params->seqno   != ring->outstanding_lazy_seqno);
+	BUG_ON(params->request != ring->preallocated_lazy_request);
 
 	i915_gem_execbuffer_retire_commands(params->dev, params->file, ring,
 					    params->batch_obj);
 
+	/* OLS should be zero by now! */
+	BUG_ON(ring->outstanding_lazy_seqno);
+	BUG_ON(ring->preallocated_lazy_request);
+
 err:
+	if (ret) {
+		/* Reset the OLS ready to try again later. */
+		kfree(ring->preallocated_lazy_request);
+		ring->preallocated_lazy_request = NULL;
+		ring->outstanding_lazy_seqno    = 0;
+	}
+
 	/* intel_gpu_busy should also get a ref, so it will free when the device
 	 * is really idle. */
 	intel_runtime_pm_put(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 7c88a26..e62254a 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -42,6 +42,8 @@ struct i915_execbuffer_params {
 	uint32_t                        mask;
 	int                             mode;
 	struct intel_context            *ctx;
+	int                             seqno;
+	struct drm_i915_gem_request     *request;
 	uint32_t                        scheduler_index;
 };
 
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC 24/44] drm/i915: Added scheduler interrupt handler hook
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (22 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 23/44] drm/i915: Added manipulation of OLS/PLR John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-06-26 17:24 ` [RFC 25/44] drm/i915: Added hook to catch 'unexpected' ring submissions John.C.Harrison
                   ` (21 subsequent siblings)
  45 siblings, 0 replies; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The scheduler needs to be informed of each batch buffer completion. This is done
via the user interrupt mechanism. The epilogue of each batch buffer submission
updates a sequence number value (seqno) and triggers a user interrupt.

This change hooks the scheduler into the processing of that interrupt via the
notify_ring() function. The scheduler also has clean-up work that needs to be
done outside of the interrupt context, so notify_ring() now also pokes the
scheduler's work queue.
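
As a rough standalone sketch (plain C, not driver code) of the split being
described: the interrupt-time hook only records what happened and requests
deferred work, while the worker does the heavier clean-up later. The
fake_notify_ring()/fake_worker() names are purely illustrative.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_uint last_completed_seqno;
static atomic_bool work_pending;

/* "Top half": cheap enough to run from interrupt context. */
static void fake_notify_ring(unsigned int hw_seqno)
{
	atomic_store(&last_completed_seqno, hw_seqno);
	atomic_store(&work_pending, true);	/* stands in for queue_work() */
}

/* Deferred "bottom half": where node removal and freeing would happen. */
static void fake_worker(void)
{
	if (!atomic_exchange(&work_pending, false))
		return;
	printf("retire everything up to seqno %u\n",
	       atomic_load(&last_completed_seqno));
}

int main(void)
{
	fake_notify_ring(42);
	fake_notify_ring(43);	/* two completions, one deferred pass */
	fake_worker();
	return 0;
}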
---
 drivers/gpu/drm/i915/i915_irq.c       |    3 +++
 drivers/gpu/drm/i915/i915_scheduler.c |   16 ++++++++++++++++
 drivers/gpu/drm/i915/i915_scheduler.h |    1 +
 3 files changed, 20 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index eff08a3e..7089242 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -36,6 +36,7 @@
 #include "i915_drv.h"
 #include "i915_trace.h"
 #include "intel_drv.h"
+#include "i915_scheduler.h"
 
 static const u32 hpd_ibx[] = {
 	[HPD_CRT] = SDE_CRT_HOTPLUG,
@@ -1218,6 +1219,8 @@ static void notify_ring(struct drm_device *dev,
 
 	trace_i915_gem_request_complete(ring);
 
+	i915_scheduler_handle_IRQ(ring);
+
 	wake_up_all(&ring->irq_queue);
 	i915_queue_hangcheck(dev);
 }
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index fc165c2..1e4d7c313 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -92,6 +92,17 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 	return ret;
 }
 
+int i915_scheduler_handle_IRQ(struct intel_engine_cs *ring)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+
+	/* Do stuff... */
+
+	queue_work(dev_priv->wq, &dev_priv->mm.scheduler_work);
+
+	return 0;
+}
+
 int i915_scheduler_remove(struct intel_engine_cs *ring)
 {
 	/* Do stuff... */
@@ -149,4 +160,9 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 	return i915_gem_do_execbuffer_final(&qe->params);
 }
 
+int i915_scheduler_handle_IRQ(struct intel_engine_cs *ring)
+{
+	return 0;
+}
+
 #endif  /* CONFIG_DRM_I915_SCHEDULER */
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index e62254a..dd7d699 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -62,6 +62,7 @@ int         i915_scheduler_init(struct drm_device *dev);
 int         i915_scheduler_closefile(struct drm_device *dev,
 				     struct drm_file *file);
 int         i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe);
+int         i915_scheduler_handle_IRQ(struct intel_engine_cs *ring);
 
 #ifdef CONFIG_DRM_I915_SCHEDULER
 
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC 25/44] drm/i915: Added hook to catch 'unexpected' ring submissions
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (23 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 24/44] drm/i915: Added scheduler interrupt handler hook John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-06-26 17:24 ` [RFC 26/44] drm/i915: Added scheduler support to __wait_seqno() calls John.C.Harrison
                   ` (20 subsequent siblings)
  45 siblings, 0 replies; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The scheduler needs to know what each seqno that pops out of the ring is
referring to. This change adds a hook into the 'submit some random work that
got forgotten about' clean-up code to inform the scheduler that a new seqno has
been sent to the ring for some non-batch-buffer operation.
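
To illustrate why (a standalone sketch in plain C, not the driver code): the
scheduler detects completion by scanning its in-flight list, held newest-first
in hardware submission order, for the seqno the ring reports, then marks that
node and every older one as done. If the ring signals a seqno the scheduler
never saw, the scan finds no match and batches that actually completed just
beforehand are left unretired.

#include <stdbool.h>
#include <stdio.h>

struct node { unsigned int seqno; bool complete; };

/* flight[] is ordered newest-first, mirroring how flying nodes are kept. */
static void mark_completed(struct node *flight, int count, unsigned int hw_seqno)
{
	int i, match = -1;

	for (i = 0; i < count; i++) {
		if (flight[i].seqno == hw_seqno) {
			match = i;
			break;
		}
	}

	if (match < 0)
		return;		/* unknown seqno: nothing can be retired */

	for (i = match; i < count; i++)
		flight[i].complete = true;	/* this node and all older ones */
}

int main(void)
{
	struct node flight[] = { {7, false}, {5, false}, {3, false} };

	mark_completed(flight, 3, 5);	/* ring reports seqno 5 */
	printf("3:%d 5:%d 7:%d\n",
	       flight[2].complete, flight[1].complete, flight[0].complete);
	/* prints "3:1 5:1 7:0"; reporting an untracked seqno would mark none */
	return 0;
}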
---
 drivers/gpu/drm/i915/i915_gem.c       |   20 +++++++++++++++++++-
 drivers/gpu/drm/i915/i915_scheduler.c |    7 +++++++
 drivers/gpu/drm/i915/i915_scheduler.h |    1 +
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 57b24f0..7727f0f 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2347,6 +2347,25 @@ int __i915_add_request(struct intel_engine_cs *ring,
 	if (WARN_ON(request == NULL))
 		return -ENOMEM;
 
+	request->seqno = intel_ring_get_seqno(ring);
+
+#ifdef CONFIG_DRM_I915_SCHEDULER
+	/* The scheduler needs to know about all seqno values that can pop out
+	 * of the ring. Otherwise, things can get confused when batch buffers
+	 * are re-ordered. Specifically, the scheduler has to work out which
+	 * buffers have completed by matching the last completed seqno with its
+	 * internal list of all seqnos ordered by when they were sent to the
+	 * ring. If an unknown seqno appears, the scheduler is unable to process
+	 * any batch buffers that might have completed just before the unknown
+	 * one.
+	 * NB:  The scheduler must be told before the request is actually sent
+	 * to the ring as it needs to know about it before the interrupt occurs.
+	 */
+	ret = i915_scheduler_fly_seqno(ring, request->seqno);
+	if (ret)
+		return ret;
+#endif
+
 	/* Record the position of the start of the request so that
 	 * should we detect the updated seqno part-way through the
 	 * GPU processing the request, we never over-estimate the
@@ -2358,7 +2377,6 @@ int __i915_add_request(struct intel_engine_cs *ring,
 	if (ret)
 		return ret;
 
-	request->seqno = intel_ring_get_seqno(ring);
 	request->ring = ring;
 	request->head = request_start;
 	request->tail = request_ring_position;
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 1e4d7c313..b5d391c 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -92,6 +92,13 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 	return ret;
 }
 
+int i915_scheduler_fly_seqno(struct intel_engine_cs *ring, uint32_t seqno)
+{
+	/* Do stuff... */
+
+	return 0;
+}
+
 int i915_scheduler_handle_IRQ(struct intel_engine_cs *ring)
 {
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index dd7d699..57e001a 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -72,6 +72,7 @@ struct i915_scheduler {
 	uint32_t    index;
 };
 
+int         i915_scheduler_fly_seqno(struct intel_engine_cs *ring, uint32_t seqno);
 int         i915_scheduler_remove(struct intel_engine_cs *ring);
 bool        i915_scheduler_is_seqno_in_flight(struct intel_engine_cs *ring,
 					      uint32_t seqno, bool *completed);
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC 26/44] drm/i915: Added scheduler support to __wait_seqno() calls
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (24 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 25/44] drm/i915: Added hook to catch 'unexpected' ring submissions John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-06-26 17:24 ` [RFC 27/44] drm/i915: Added scheduler support to page fault handler John.C.Harrison
                   ` (19 subsequent siblings)
  45 siblings, 0 replies; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The scheduler can cause batch buffers, and hence seqno values, to be submitted
to the ring out of order and asynchronously to their submission to the driver.
Thus waiting for the completion of a given seqno value is not as simple as
saying 'is my value <= current ring value'. Not only might the arithmetic
comparison be invalid but the seqno in question might not even have been sent to
the hardware yet.

This change hooks the scheduler into the wait_seqno() code to ensure correct
behaviour. That is, flush the target batch buffer through to the hardware and do
an out-of-order-safe comparison. Note that pre-emptive scheduling adds the
further complication that even though the batch buffer might have been sent at
the start of the wait call, it could be thrown off the hardware and back into
the software queue during the wait. This means that waiting indefinitely with
the driver-wide mutex lock held is a very Bad Idea. Instead, the wait call must
return -EAGAIN at least as far back as necessary to release the mutex lock and
allow the scheduler's asynchronous processing to get in and handle the
pre-emption operation and eventually re-submit the work.
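
As a standalone sketch of what an out-of-order-safe comparison means here
(this is not the actual i915_compare_seqno_values() helper, whose
implementation lives elsewhere in the series; it just shows the usual
wrap-safe signed-difference trick, valid while the two values are less than
2^31 apart):

#include <stdint.h>
#include <stdio.h>

static int compare_seqno(uint32_t a, uint32_t b)
{
	int32_t diff = (int32_t)(a - b);	/* wrap-safe signed distance */

	return (diff > 0) - (diff < 0);		/* -1, 0 or +1 */
}

int main(void)
{
	printf("%d\n", compare_seqno(10, 5));		/* prints  1 */
	printf("%d\n", compare_seqno(5, 10));		/* prints -1 */
	printf("%d\n", compare_seqno(3, 0xFFFFFFF0u));	/* prints  1: 3 is 'after' a wrapped value */
	return 0;
}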
---
 drivers/gpu/drm/i915/i915_gem.c       |   53 +++++++++++++++++++++++++++------
 drivers/gpu/drm/i915/i915_scheduler.c |    8 +++++
 drivers/gpu/drm/i915/i915_scheduler.h |    9 ++++++
 3 files changed, 61 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 7727f0f..5ed5f66 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1152,7 +1152,8 @@ static int __wait_seqno(struct intel_engine_cs *ring, u32 seqno,
 			unsigned reset_counter,
 			bool interruptible,
 			struct timespec *timeout,
-			struct drm_i915_file_private *file_priv)
+			struct drm_i915_file_private *file_priv,
+			bool is_locked)
 {
 	struct drm_device *dev = ring->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -1161,9 +1162,14 @@ static int __wait_seqno(struct intel_engine_cs *ring, u32 seqno,
 	struct timespec before, now;
 	DEFINE_WAIT(wait);
 	unsigned long timeout_expire;
-	int ret;
+	int ret = 0;
+#ifdef CONFIG_DRM_I915_SCHEDULER
+	bool    completed;
+#endif
 
+	might_sleep();
 	WARN(dev_priv->pm.irqs_disabled, "IRQs disabled\n");
+	BUG_ON(seqno == 0);
 
 	if (i915_seqno_passed(ring, ring->get_seqno(ring, true), seqno))
 		return 0;
@@ -1201,9 +1207,34 @@ static int __wait_seqno(struct intel_engine_cs *ring, u32 seqno,
 			break;
 		}
 
-		if (i915_seqno_passed(ring, ring->get_seqno(ring, false), seqno)) {
-			ret = 0;
-			break;
+#ifdef CONFIG_DRM_I915_SCHEDULER
+		if (is_locked) {
+			/* If this seqno is being tracked by the scheduler then
+			 * it is unsafe to sleep with the mutex lock held as the
+			 * scheduler may require the lock in order to progress
+			 * the seqno. */
+			if (i915_scheduler_is_seqno_in_flight(ring, seqno, &completed)) {
+				ret = completed ? 0 : -EAGAIN;
+				break;
+			}
+
+			/* If the seqno is not tracked by the scheduler then a
+			 * straight arithmetic comparison test can be done. */
+			if (i915_compare_seqno_values(ring->get_seqno(ring, false), seqno) >= 0) {
+				ret = 0;
+				break;
+			}
+		} else
+#endif
+		{
+			/* The regular 'is seqno passed' test is fine if the
+			 * mutex lock is not held. Even if the seqno is stuck
+			 * in the scheduler, it will be able to progress while
+			 * this thread waits. */
+			if (i915_seqno_passed(ring, ring->get_seqno(ring, false), seqno)) {
+				ret = 0;
+				break;
+			}
 		}
 
 		if (interruptible && signal_pending(current)) {
@@ -1265,6 +1296,10 @@ i915_wait_seqno(struct intel_engine_cs *ring, uint32_t seqno)
 	BUG_ON(!mutex_is_locked(&dev->struct_mutex));
 	BUG_ON(seqno == 0);
 
+	ret = I915_SCHEDULER_FLUSH_SEQNO(ring, true, seqno);
+	if (ret < 0)
+		return ret;
+
 	ret = i915_gem_check_wedge(&dev_priv->gpu_error, interruptible);
 	if (ret)
 		return ret;
@@ -1275,7 +1310,7 @@ i915_wait_seqno(struct intel_engine_cs *ring, uint32_t seqno)
 
 	return __wait_seqno(ring, seqno,
 			    atomic_read(&dev_priv->gpu_error.reset_counter),
-			    interruptible, NULL, NULL);
+			    interruptible, NULL, NULL, true);
 }
 
 static int
@@ -1352,7 +1387,7 @@ i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
 
 	reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
 	mutex_unlock(&dev->struct_mutex);
-	ret = __wait_seqno(ring, seqno, reset_counter, true, NULL, file_priv);
+	ret = __wait_seqno(ring, seqno, reset_counter, true, NULL, file_priv, false);
 	mutex_lock(&dev->struct_mutex);
 	if (ret)
 		return ret;
@@ -2848,7 +2883,7 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
 	mutex_unlock(&dev->struct_mutex);
 
-	ret = __wait_seqno(ring, seqno, reset_counter, true, timeout, file->driver_priv);
+	ret = __wait_seqno(ring, seqno, reset_counter, true, timeout, file->driver_priv, false);
 	if (timeout)
 		args->timeout_ns = timespec_to_ns(timeout);
 	return ret;
@@ -4071,7 +4106,7 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
 	if (seqno == 0)
 		return 0;
 
-	ret = __wait_seqno(ring, seqno, reset_counter, true, NULL, NULL);
+	ret = __wait_seqno(ring, seqno, reset_counter, true, NULL, NULL, false);
 	if (ret == 0)
 		queue_delayed_work(dev_priv->wq, &dev_priv->mm.retire_work, 0);
 
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index b5d391c..d579bab 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -117,6 +117,14 @@ int i915_scheduler_remove(struct intel_engine_cs *ring)
 	return 0;
 }
 
+int i915_scheduler_flush_seqno(struct intel_engine_cs *ring, bool is_locked,
+			       uint32_t seqno)
+{
+	/* Do stuff... */
+
+	return 0;
+}
+
 bool i915_scheduler_is_seqno_in_flight(struct intel_engine_cs *ring,
 			       uint32_t seqno, bool *completed)
 {
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 57e001a..3811359 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -57,6 +57,13 @@ struct i915_scheduler_queue_entry {
 	int                                 num_objs;
 };
 
+#ifdef CONFIG_DRM_I915_SCHEDULER
+#   define I915_SCHEDULER_FLUSH_SEQNO(ring, locked, seqno)                   \
+		i915_scheduler_flush_seqno(ring, locked, seqno)
+#else
+#   define I915_SCHEDULER_FLUSH_SEQNO(ring, locked, seqno)      0
+#endif
+
 bool        i915_scheduler_is_enabled(struct drm_device *dev);
 int         i915_scheduler_init(struct drm_device *dev);
 int         i915_scheduler_closefile(struct drm_device *dev,
@@ -74,6 +81,8 @@ struct i915_scheduler {
 
 int         i915_scheduler_fly_seqno(struct intel_engine_cs *ring, uint32_t seqno);
 int         i915_scheduler_remove(struct intel_engine_cs *ring);
+int         i915_scheduler_flush_seqno(struct intel_engine_cs *ring,
+				       bool is_locked, uint32_t seqno);
 bool        i915_scheduler_is_seqno_in_flight(struct intel_engine_cs *ring,
 					      uint32_t seqno, bool *completed);
 
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC 27/44] drm/i915: Added scheduler support to page fault handler
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (25 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 26/44] drm/i915: Added scheduler support to __wait_seqno() calls John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-06-26 17:24 ` [RFC 28/44] drm/i915: Added scheduler flush calls to ring throttle and idle functions John.C.Harrison
                   ` (18 subsequent siblings)
  45 siblings, 0 replies; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

GPU page faults can now require scheduler operation in order to complete. For
example, in order to free up sufficient memory to handle the fault, the handler
must wait for a batch buffer to complete that has not even been sent to the
hardware yet. Thus -EAGAIN no longer necessarily means a GPU hang; it can also
occur under normal operation.
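
As a rough illustration (plain C, not driver code) of what callers are now
expected to do with -EAGAIN, i.e. yield and retry the whole operation rather
than treat it as fatal; try_fault() is a made-up stand-in for the re-faulting
path:

#include <errno.h>
#include <sched.h>
#include <stdio.h>

/* Stand-in for the re-faulting operation: fails twice, then succeeds. */
static int try_fault(void)
{
	static int attempts;

	return (attempts++ < 2) ? -EAGAIN : 0;
}

static int fault_with_retry(void)
{
	int ret;

	do {
		ret = try_fault();
		if (ret == -EAGAIN)
			sched_yield();	/* give other threads a chance to run */
	} while (ret == -EAGAIN);

	return ret;
}

int main(void)
{
	printf("result: %d\n", fault_with_retry());	/* prints "result: 0" */
	return 0;
}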
---
 drivers/gpu/drm/i915/i915_gem.c |   10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 5ed5f66..aa1e0b2 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1622,10 +1622,16 @@ out:
 		}
 	case -EAGAIN:
 		/*
-		 * EAGAIN means the gpu is hung and we'll wait for the error
-		 * handler to reset everything when re-faulting in
+		 * EAGAIN can mean the gpu is hung and we'll have to wait for
+		 * the error handler to reset everything when re-faulting in
 		 * i915_mutex_lock_interruptible.
+		 *
+		 * It can also indicate various other nonfatal errors for which
+		 * the best response is to give other threads a chance to run,
+		 * and then retry the failing operation in its entirety.
 		 */
+		set_need_resched();
+		/*FALLTHRU*/
 	case 0:
 	case -ERESTARTSYS:
 	case -EINTR:
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC 28/44] drm/i915: Added scheduler flush calls to ring throttle and idle functions
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (26 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 27/44] drm/i915: Added scheduler support to page fault handler John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-06-26 17:24 ` [RFC 29/44] drm/i915: Hook scheduler into intel_ring_idle() John.C.Harrison
                   ` (17 subsequent siblings)
  45 siblings, 0 replies; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

When requesting that all GPU work is completed, it is now necessary to get the
scheduler involved in order to flush out work that is queued but not yet
submitted.
---
 drivers/gpu/drm/i915/i915_gem.c       |   16 +++++++++++++++-
 drivers/gpu/drm/i915/i915_scheduler.c |    7 +++++++
 drivers/gpu/drm/i915/i915_scheduler.h |    5 +++++
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index aa1e0b2..1c508b7 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3049,6 +3049,10 @@ int i915_gpu_idle(struct drm_device *dev)
 
 	/* Flush everything onto the inactive list. */
 	for_each_ring(ring, dev_priv, i) {
+		ret = I915_SCHEDULER_FLUSH_ALL(ring, true);
+		if (ret < 0)
+			return ret;
+
 		ret = i915_switch_context(ring, ring->default_context);
 		if (ret)
 			return ret;
@@ -4088,7 +4092,7 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
 	struct intel_engine_cs *ring = NULL;
 	unsigned reset_counter;
 	u32 seqno = 0;
-	int ret;
+	int i, ret;
 
 	ret = i915_gem_wait_for_error(&dev_priv->gpu_error);
 	if (ret)
@@ -4098,6 +4102,16 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
 	if (ret)
 		return ret;
 
+	for_each_ring(ring, dev_priv, i) {
+		/* Need a mechanism to flush out scheduler entries that were
+		 * submitted more than 'recent_enough' time ago as well! In the
+		 * meantime, just flush everything out to ensure that entries
+		 * cannot sit around indefinitely. */
+		ret = I915_SCHEDULER_FLUSH_ALL(ring, false);
+		if (ret < 0)
+			return ret;
+	}
+
 	spin_lock(&file_priv->mm.lock);
 	list_for_each_entry(request, &file_priv->mm.request_list, client_list) {
 		if (time_after_eq(request->emitted_jiffies, recent_enough))
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index d579bab..6b6827f 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -125,6 +125,13 @@ int i915_scheduler_flush_seqno(struct intel_engine_cs *ring, bool is_locked,
 	return 0;
 }
 
+int i915_scheduler_flush(struct intel_engine_cs *ring, bool is_locked)
+{
+	/* Do stuff... */
+
+	return 0;
+}
+
 bool i915_scheduler_is_seqno_in_flight(struct intel_engine_cs *ring,
 			       uint32_t seqno, bool *completed)
 {
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 3811359..898d2bb 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -58,9 +58,13 @@ struct i915_scheduler_queue_entry {
 };
 
 #ifdef CONFIG_DRM_I915_SCHEDULER
+#   define I915_SCHEDULER_FLUSH_ALL(ring, locked)                            \
+		i915_scheduler_flush(ring, locked)
+
 #   define I915_SCHEDULER_FLUSH_SEQNO(ring, locked, seqno)                   \
 		i915_scheduler_flush_seqno(ring, locked, seqno)
 #else
+#   define I915_SCHEDULER_FLUSH_ALL(ring, locked)               0
 #   define I915_SCHEDULER_FLUSH_SEQNO(ring, locked, seqno)      0
 #endif
 
@@ -81,6 +85,7 @@ struct i915_scheduler {
 
 int         i915_scheduler_fly_seqno(struct intel_engine_cs *ring, uint32_t seqno);
 int         i915_scheduler_remove(struct intel_engine_cs *ring);
+int         i915_scheduler_flush(struct intel_engine_cs *ring, bool is_locked);
 int         i915_scheduler_flush_seqno(struct intel_engine_cs *ring,
 				       bool is_locked, uint32_t seqno);
 bool        i915_scheduler_is_seqno_in_flight(struct intel_engine_cs *ring,
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC 29/44] drm/i915: Hook scheduler into intel_ring_idle()
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (27 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 28/44] drm/i915: Added scheduler flush calls to ring throttle and idle functions John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-06-26 17:24 ` [RFC 30/44] drm/i915: Added a module parameter for allowing scheduler overrides John.C.Harrison
                   ` (16 subsequent siblings)
  45 siblings, 0 replies; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The code to wait for a ring to be idle ends by calling __wait_seqno() on the
value in the last request structure. However, with a scheduler, there may be
work queued up but not yet submitted. There is also the possibility of
pre-emption re-ordering work after it has been submitted. Thus the last request
structure at the current moment is not necessarily the last piece of work by the
time that particular seqno has completed.

It is not possible to force the scheduler to submit all work from inside the
ring idle function as it might not be a safe place to do so. Instead, it must
simply return early if the scheduler has outstanding work and roll back as far
as releasing the driver mutex lock and returning the system to a consistent
state.
---
 drivers/gpu/drm/i915/i915_scheduler.c   |   12 ++++++++++++
 drivers/gpu/drm/i915/i915_scheduler.h   |    1 +
 drivers/gpu/drm/i915/intel_ringbuffer.c |    8 ++++++++
 3 files changed, 21 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 6b6827f..6a10a76 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -165,6 +165,13 @@ int i915_scheduler_closefile(struct drm_device *dev, struct drm_file *file)
 	return 0;
 }
 
+bool i915_scheduler_is_idle(struct intel_engine_cs *ring)
+{
+	/* Do stuff... */
+
+	return true;
+}
+
 #else   /* CONFIG_DRM_I915_SCHEDULER */
 
 int i915_scheduler_init(struct drm_device *dev)
@@ -177,6 +184,11 @@ int i915_scheduler_closefile(struct drm_device *dev, struct drm_file *file)
 	return 0;
 }
 
+bool i915_scheduler_is_idle(struct intel_engine_cs *ring)
+{
+	return true;
+}
+
 int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 {
 	return i915_gem_do_execbuffer_final(&qe->params);
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 898d2bb..1b3d51a 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -74,6 +74,7 @@ int         i915_scheduler_closefile(struct drm_device *dev,
 				     struct drm_file *file);
 int         i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe);
 int         i915_scheduler_handle_IRQ(struct intel_engine_cs *ring);
+bool        i915_scheduler_is_idle(struct intel_engine_cs *ring);
 
 #ifdef CONFIG_DRM_I915_SCHEDULER
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 1ef0cbd..1ad162b 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1651,6 +1651,14 @@ int intel_ring_idle(struct intel_engine_cs *ring)
 			return ret;
 	}
 
+	/* If there is anything outstanding within the scheduler then give up
+	 * now as the submission of such work requires the mutex lock. While
+	 * the lock is definitely held at this point (i915_wait_seqno will BUG
+	 * if called without it), the driver is not necessarily at a safe point
+	 * to start submitting ring work. */
+	if (!i915_scheduler_is_idle(ring))
+		return -EAGAIN;
+
 	/* Wait upon the last request to be completed */
 	if (list_empty(&ring->request_list))
 		return 0;
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC 30/44] drm/i915: Added a module parameter for allowing scheduler overrides
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (28 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 29/44] drm/i915: Hook scheduler into intel_ring_idle() John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-06-26 17:24 ` [RFC 31/44] drm/i915: Implemented the GPU scheduler John.C.Harrison
                   ` (15 subsequent siblings)
  45 siblings, 0 replies; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

It can be useful to be able to disable certain features (e.g. the entire
scheduler) via a module parameter for debugging purposes. A module parameter
has the advantage of not being a compile-time switch, while not implying that
it can be changed dynamically at runtime.
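
As a usage note (not part of the patch): a parameter declared this way is
typically set at module load time, e.g. i915.scheduler_override=<n> on the
kernel command line or scheduler_override=<n> to modprobe; with the 0600
permission below it also appears under
/sys/module/i915/parameters/scheduler_override, though, as stated above, a
value written there after load is only honoured wherever the driver happens to
re-read it.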
---
 drivers/gpu/drm/i915/i915_drv.h       |    1 +
 drivers/gpu/drm/i915/i915_params.c    |    4 ++++
 drivers/gpu/drm/i915/i915_scheduler.h |    5 +++++
 3 files changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index fbafa68..4d52c67 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2053,6 +2053,7 @@ struct i915_params {
 	bool reset;
 	bool disable_display;
 	bool disable_vtd_wa;
+	int scheduler_override;
 };
 extern struct i915_params i915 __read_mostly;
 
diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
index d05a2af..ce99733 100644
--- a/drivers/gpu/drm/i915/i915_params.c
+++ b/drivers/gpu/drm/i915/i915_params.c
@@ -48,6 +48,7 @@ struct i915_params i915 __read_mostly = {
 	.disable_display = 0,
 	.enable_cmd_parser = 1,
 	.disable_vtd_wa = 0,
+	.scheduler_override = 0,
 };
 
 module_param_named(modeset, i915.modeset, int, 0400);
@@ -156,3 +157,6 @@ MODULE_PARM_DESC(disable_vtd_wa, "Disable all VT-d workarounds (default: false)"
 module_param_named(enable_cmd_parser, i915.enable_cmd_parser, int, 0600);
 MODULE_PARM_DESC(enable_cmd_parser,
 		 "Enable command parsing (1=enabled [default], 0=disabled)");
+
+module_param_named(scheduler_override, i915.scheduler_override, int, 0600);
+MODULE_PARM_DESC(scheduler_override, "Scheduler override option (default: 0)");
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 1b3d51a..6dd4fea 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -84,6 +84,11 @@ struct i915_scheduler {
 	uint32_t    index;
 };
 
+/* Options for 'scheduler_override' module parameter: */
+enum {
+	i915_so_normal              = 0,
+};
+
 int         i915_scheduler_fly_seqno(struct intel_engine_cs *ring, uint32_t seqno);
 int         i915_scheduler_remove(struct intel_engine_cs *ring);
 int         i915_scheduler_flush(struct intel_engine_cs *ring, bool is_locked);
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC 31/44] drm/i915: Implemented the GPU scheduler
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (29 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 30/44] drm/i915: Added a module parameter for allowing scheduler overrides John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-06-26 17:24 ` [RFC 32/44] drm/i915: Added immediate submission override to scheduler John.C.Harrison
                   ` (14 subsequent siblings)
  45 siblings, 0 replies; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

Filled in all the 'do stuff here' blanks...

The general theory of operation is that when batch buffers are submitted to the
driver, the execbuffer() code assigns a unique seqno value and then packages up
all the information required to execute the batch buffer at a later time. This
package is given over to the scheduler which adds it to an internal node list.
The scheduler also scans the list of objects associated with the batch buffer
and compares them against the objects already in use by other buffers in the
node list. If matches are found then the new batch buffer node is marked as
being dependent upon the matching node. The same is done for the context object.
The scheduler also bumps up the priority of such matching nodes on the grounds
that the more dependencies a given batch buffer has the more important it is
likely to be.

The scheduler aims to have a given (tuneable) number of batch buffers in flight
on the hardware at any given time. If fewer than this are currently executing
when a new node is queued, then the node is passed straight through to the
submit function. Otherwise it is simply added to the queue and the driver
returns back to user land.

As each batch buffer completes, it raises an interrupt which wakes up the
scheduler. Note that it is possible for multiple buffers to complete before the
IRQ handler gets to run. Further, the seqno values of the individual buffers are
not necessary incrementing as the scheduler may have re-ordered their
submission. However, the scheduler keeps the list of executing buffers in order
of hardware submission. Thus it can scan through the list until a matching seqno
is found and then mark all in flight nodes from that point on as completed.

A deferred work queue is also poked by the interrupt handler. When this wakes up
it can do more involved processing such as actually removing completed nodes
from the queue and freeing up the resources associated with them (internal
memory allocations, DRM object references, context reference, etc.). The work
handler also checks the in flight count and calls the submission code if a new
slot has appeared.

When the scheduler's submit code is called, it scans the queued node list for
the highest priority node that has no unmet dependencies. Note that the
dependency calculation is complex as it must take inter-ring dependencies and
potential preemptions into account. Note also that in the future this will be
extended to include external dependencies such as the Android Native Sync file
descriptors and/or the linux dma-buff synchronisation scheme.

If a suitable node is found then it is sent to execbuff_final() for submission
to the hardware. The in flight count is then re-checked and a new node popped
from the list if appropriate.

Note that this change does not implement pre-emptive scheduling. Only basic
scheduling by re-ordering batch buffer submission is currently implemented.
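
As a compact standalone model (plain C, not the patch code below) of the
selection rule the submit path implements: among queued nodes on a ring, pick
the highest-priority one none of whose dependencies still stand in its way,
where a dependency only counts if it is still queued anywhere or flying on a
different ring. All names here are illustrative only.

#include <stdio.h>

enum status { QUEUED, FLYING, COMPLETE };

struct node {
	enum status status;
	int         priority;
	int         ring;
	int         ndeps;
	int         deps[4];	/* indices of the nodes this one depends on */
};

static int dep_blocks(const struct node *all, const struct node *n, int dep)
{
	const struct node *d = &all[dep];

	if (d->status == QUEUED)
		return 1;			/* not yet sent to any ring */
	if (d->status == FLYING && d->ring != n->ring)
		return 1;			/* other ring: a real cross-ring wait */
	return 0;				/* complete, or ahead of us on our own ring */
}

static int pick_next(const struct node *all, int count, int ring)
{
	int best = -1, i, j, blocked;

	for (i = 0; i < count; i++) {
		if (all[i].status != QUEUED || all[i].ring != ring)
			continue;

		blocked = 0;
		for (j = 0; j < all[i].ndeps; j++)
			blocked |= dep_blocks(all, &all[i], all[i].deps[j]);

		if (!blocked && (best < 0 || all[i].priority > all[best].priority))
			best = i;
	}

	return best;	/* -1: nothing on this ring is ready to fly */
}

int main(void)
{
	struct node nodes[] = {
		{ COMPLETE, 0, 0, 0, {0} },	/* 0: already finished         */
		{ QUEUED,   5, 0, 1, {0} },	/* 1: ready, priority 5        */
		{ QUEUED,   9, 0, 1, {3} },	/* 2: blocked by queued node 3 */
		{ QUEUED,   1, 0, 0, {0} },	/* 3: ready, priority 1        */
	};

	printf("next node: %d\n", pick_next(nodes, 4, 0));	/* prints "next node: 1" */
	return 0;
}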
---
 drivers/gpu/drm/i915/i915_scheduler.c |  945 +++++++++++++++++++++++++++++++--
 drivers/gpu/drm/i915/i915_scheduler.h |   59 +-
 2 files changed, 965 insertions(+), 39 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 6a10a76..1816f1d 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -41,6 +41,7 @@ int i915_scheduler_init(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+	int                     r;
 
 	if (scheduler)
 		return 0;
@@ -51,8 +52,16 @@ int i915_scheduler_init(struct drm_device *dev)
 
 	spin_lock_init(&scheduler->lock);
 
+	for (r = 0; r < I915_NUM_RINGS; r++)
+		INIT_LIST_HEAD(&scheduler->node_queue[r]);
+
 	scheduler->index = 1;
 
+	/* Default tuning values: */
+	scheduler->priority_level_max     = ~0U;
+	scheduler->priority_level_preempt = 900;
+	scheduler->min_flying             = 2;
+
 	dev_priv->scheduler = scheduler;
 
 	return 0;
@@ -60,50 +69,371 @@ int i915_scheduler_init(struct drm_device *dev)
 
 int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 {
-	struct drm_i915_private     *dev_priv = qe->params.dev->dev_private;
-	struct i915_scheduler       *scheduler = dev_priv->scheduler;
-	int ret, i;
+	struct drm_i915_private *dev_priv = qe->params.dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+	struct intel_engine_cs  *ring = qe->params.ring;
+	struct i915_scheduler_queue_entry  *node;
+	struct i915_scheduler_queue_entry  *test;
+	struct timespec     stamp;
+	unsigned long       flags;
+	bool                not_flying, found;
+	int                 i, j, r, got_batch = 0;
+	int                 incomplete = 0;
 
 	BUG_ON(!scheduler);
 
-	qe->params.scheduler_index = scheduler->index++;
+	if (i915.scheduler_override & i915_so_direct_submit) {
+		int ret;
 
-	ret = i915_gem_do_execbuffer_final(&qe->params);
+		qe->params.scheduler_index = scheduler->index++;
 
-	/* Need to release the objects: */
-	for (i = 0; i < qe->num_objs; i++) {
-		if (!qe->saved_objects[i].obj)
-			continue;
+		scheduler->flags[qe->params.ring->id] |= i915_sf_submitting;
+		ret = i915_gem_do_execbuffer_final(&qe->params);
+		scheduler->flags[qe->params.ring->id] &= ~i915_sf_submitting;
+
+		/* Need to release the objects: */
+		for (i = 0; i < qe->num_objs; i++) {
+			if (!qe->saved_objects[i].obj)
+				continue;
 
-		drm_gem_object_unreference(&qe->saved_objects[i].obj->base);
+			drm_gem_object_unreference(&qe->saved_objects[i].obj->base);
+		}
+
+		kfree(qe->saved_objects);
+		qe->saved_objects = NULL;
+		qe->num_objs = 0;
+
+		/* Free the context object too: */
+		if (qe->params.ctx)
+			i915_gem_context_unreference(qe->params.ctx);
+
+		/* And anything else owned by the QE structure: */
+		kfree(qe->params.cliprects);
+
+		return ret;
 	}
 
-	kfree(qe->saved_objects);
-	qe->saved_objects = NULL;
-	qe->num_objs = 0;
+	getrawmonotonic(&stamp);
 
-	/* Free the context object too: */
-	if (qe->params.ctx)
-		i915_gem_context_unreference(qe->params.ctx);
+	node = kmalloc(sizeof(*node), GFP_KERNEL);
+	if (!node)
+		return -ENOMEM;
 
-	/* And anything else owned by the QE structure: */
-	kfree(qe->params.cliprects);
+	*node = *qe;
+	INIT_LIST_HEAD(&node->link);
+	node->status = i915_sqs_queued;
+	node->stamp  = stamp;
 
-	return ret;
+	/*
+	 * Verify that the batch buffer itself is included in the object list.
+	 */
+	for (i = 0; i < node->num_objs; i++) {
+		if (node->saved_objects[i].obj == node->params.batch_obj)
+			got_batch++;
+	}
+
+	BUG_ON(got_batch != 1);
+
+	/* Need to determine the number of incomplete entries in the list as
+	 * that will be the maximum size of the dependency list.
+	 *
+	 * Note that the allocation must not be made with the spinlock acquired
+	 * as kmalloc can sleep. However, the unlock/relock is safe because no
+	 * new entries can be queued up during the unlock as the i915 driver
+	 * mutex is still held. Entries could be removed from the list but that
+	 * just means the dep_list will be over-allocated which is fine.
+	 */
+	spin_lock_irqsave(&scheduler->lock, flags);
+	for (r = 0; r < I915_NUM_RINGS; r++) {
+		list_for_each_entry(test, &scheduler->node_queue[r], link) {
+			if (I915_SQS_IS_COMPLETE(test))
+				continue;
+
+			incomplete++;
+		}
+	}
+
+	/* Temporarily unlock to allocate memory: */
+	spin_unlock_irqrestore(&scheduler->lock, flags);
+	if (incomplete) {
+		node->dep_list = kmalloc(sizeof(node->dep_list[0]) * incomplete,
+					 GFP_KERNEL);
+		if (!node->dep_list) {
+			kfree(node);
+			return -ENOMEM;
+		}
+	} else
+		node->dep_list = NULL;
+
+	spin_lock_irqsave(&scheduler->lock, flags);
+	node->num_deps = 0;
+
+	if (node->dep_list) {
+		for (r = 0; r < I915_NUM_RINGS; r++) {
+			list_for_each_entry(test, &scheduler->node_queue[r], link) {
+				if (I915_SQS_IS_COMPLETE(test))
+					continue;
+
+				found = (node->params.ctx == test->params.ctx);
+
+				for (i = 0; (i < node->num_objs) && !found; i++) {
+					for (j = 0; j < test->num_objs; j++) {
+						if (node->saved_objects[i].obj !=
+							    test->saved_objects[j].obj)
+							continue;
+
+						found = true;
+						break;
+					}
+				}
+
+				if (found) {
+					node->dep_list[node->num_deps] = test;
+					node->num_deps++;
+				}
+			}
+		}
+
+		BUG_ON(node->num_deps > incomplete);
+	}
+
+	if (node->priority && node->num_deps) {
+		i915_scheduler_priority_bump_clear(scheduler, ring);
+
+		for (i = 0; i < node->num_deps; i++)
+			i915_scheduler_priority_bump(scheduler,
+					node->dep_list[i], node->priority);
+	}
+
+	node->params.scheduler_index = scheduler->index++;
+
+	list_add_tail(&node->link, &scheduler->node_queue[ring->id]);
+
+	not_flying = i915_scheduler_count_flying(scheduler, ring) <
+						 scheduler->min_flying;
+
+	spin_unlock_irqrestore(&scheduler->lock, flags);
+
+	if (not_flying)
+		i915_scheduler_submit(ring, true);
+
+	return 0;
 }
 
 int i915_scheduler_fly_seqno(struct intel_engine_cs *ring, uint32_t seqno)
 {
-	/* Do stuff... */
+	struct i915_scheduler_queue_entry *node;
+	struct drm_i915_private           *dev_priv = ring->dev->dev_private;
+	struct i915_scheduler             *scheduler = dev_priv->scheduler;
+	struct timespec stamp;
+	unsigned long   flags;
+	int             ret;
+
+	BUG_ON(!scheduler);
+
+	/* No need to add if this request is due to a scheduler submission */
+	if (scheduler->flags[ring->id] & i915_sf_submitting)
+		return 0;
+
+	getrawmonotonic(&stamp);
+
+	/* Need to allocate a new node. Note that kzalloc can sleep, so
+	 * the spinlock must not be held yet. */
+	node = kzalloc(sizeof(*node), GFP_KERNEL);
+	if (!node)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&node->link);
+	node->params.ring  = ring;
+	node->params.seqno = seqno;
+	node->params.dev   = ring->dev;
+	node->stamp        = stamp;
+	node->status       = i915_sqs_none;
+
+	spin_lock_irqsave(&scheduler->lock, flags);
+	ret = i915_scheduler_fly_node(node);
+	spin_unlock_irqrestore(&scheduler->lock, flags);
+
+	return ret;
+}
+
+int i915_scheduler_fly_node(struct i915_scheduler_queue_entry *node)
+{
+	struct drm_i915_private *dev_priv = node->params.dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+	struct intel_engine_cs  *ring;
+
+	BUG_ON(!scheduler);
+	BUG_ON(!node);
+	BUG_ON(node->status != i915_sqs_none);
+
+	ring = node->params.ring;
+
+	/* Add the node (which should currently be in state none) to the front
+	 * of the queue. This ensure that flying nodes are always held in
+	 * of the queue. This ensures that flying nodes are always held in
+	list_add(&node->link, &scheduler->node_queue[ring->id]);
+
+	node->status = i915_sqs_flying;
+
+	if (!(scheduler->flags[ring->id] & i915_sf_interrupts_enabled)) {
+		bool    success = true;
+
+		success = ring->irq_get(ring);
+		if (success)
+			scheduler->flags[ring->id] |= i915_sf_interrupts_enabled;
+		else
+			return -EINVAL;
+	}
+
+	return 0;
+}
+
+/*
+ * Nodes are considered valid dependencies if they are queued on any ring or
+ * if they are in flight on a different ring. In flight on the same ring is no
+ * longer interesting for non-pre-emptive nodes as the ring serialises execution.
+ * For pre-empting nodes, all in flight dependencies are valid as they must not
+ * be jumped by the act of pre-empting.
+ *
+ * Anything that is neither queued nor flying is uninteresting.
+ */
+static inline bool i915_scheduler_is_dependency_valid(
+			struct i915_scheduler_queue_entry *node, uint32_t idx)
+{
+	struct i915_scheduler_queue_entry *dep;
+
+	dep = node->dep_list[idx];
+	if (!dep)
+		return false;
+
+	if (I915_SQS_IS_QUEUED(dep))
+		return true;
+
+	if (I915_SQS_IS_FLYING(dep)) {
+		if (node->params.ring != dep->params.ring)
+			return true;
+	}
+
+	return false;
+}
+
+uint32_t i915_scheduler_count_flying(struct i915_scheduler *scheduler,
+				     struct intel_engine_cs *ring)
+{
+	struct i915_scheduler_queue_entry *node;
+	uint32_t                          flying = 0;
+
+	list_for_each_entry(node, &scheduler->node_queue[ring->id], link)
+		if (I915_SQS_IS_FLYING(node))
+			flying++;
+
+	return flying;
+}
+
+/* Add a popped node back into the queue. For example, because the ring
+ * was hung when execbuff_final() was called and thus the ring submission
+ * needs to be retried later. */
+static void i915_scheduler_node_requeue(struct i915_scheduler_queue_entry *node)
+{
+	BUG_ON(!node);
+	BUG_ON(!I915_SQS_IS_FLYING(node));
+
+	node->status = i915_sqs_queued;
+}
+
+/* Give up on a popped node completely. For example, because it is causing the
+ * ring to hang or is using some resource that no longer exists. */
+static void i915_scheduler_node_kill(struct i915_scheduler_queue_entry *node)
+{
+	BUG_ON(!node);
+	BUG_ON(!I915_SQS_IS_FLYING(node));
+
+	node->status = i915_sqs_complete;
+}
+
+/*
+ * The batch tagged with the indicated sequence number has completed.
+ * Search the queue for it, update its status and those of any batches
+ * submitted earlier, which must also have completed or been pre-empted
+ * as appropriate.
+ *
+ * Called with spinlock already held.
+ */
+static int i915_scheduler_seqno_complete(struct intel_engine_cs *ring, uint32_t seqno)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+	struct i915_scheduler_queue_entry *node;
+
+	/*
+	 * Batch buffers are added to the head of the list in execution order,
+	 * thus seqno values, although not necessarily incrementing, will be
+	 * met in completion order when scanning the list. So when a match is
+	 * found, all subsequent entries must have also popped out. Conversely,
+	 * if a completed entry is found then there is no need to scan further.
+	 */
+	list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
+		if (I915_SQS_IS_COMPLETE(node))
+			goto done;
+
+		if (seqno == node->params.seqno)
+			break;
+	}
+
+	/*
+	 * NB: Lots of extra seqnos get added to the ring to track things
+	 * like cache flushes and page flips. So don't complain if no node
+	 * was found.
+	 */
+	if (&node->link == &scheduler->node_queue[ring->id])
+		goto done;
+
+	BUG_ON(!I915_SQS_IS_FLYING(node));
+
+	/* Everything from here can be marked as done: */
+	list_for_each_entry_from(node, &scheduler->node_queue[ring->id], link) {
+		/* Check if the marking has already been done: */
+		if (I915_SQS_IS_COMPLETE(node))
+			break;
+
+		if (!I915_SQS_IS_FLYING(node))
+			continue;
+
+		/* Node was in flight so mark it as complete. */
+		node->status = i915_sqs_complete;
+	}
+
+	/* Should submit new work here if flight list is empty but the DRM
+	 * mutex lock might not be available if a '__wait_seqno()' call is
+	 * blocking the system. */
 
+done:
 	return 0;
 }
 
 int i915_scheduler_handle_IRQ(struct intel_engine_cs *ring)
 {
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+	unsigned long       flags;
+	static uint32_t     last_seqno;
+	uint32_t            seqno;
+
+	seqno = ring->get_seqno(ring, false);
+
+	if (i915.scheduler_override & i915_so_direct_submit)
+		return 0;
+
+	if (seqno == last_seqno) {
+		/* Why are there sometimes multiple interrupts per seqno? */
+		return 0;
+	}
+	last_seqno = seqno;
 
-	/* Do stuff... */
+	spin_lock_irqsave(&scheduler->lock, flags);
+	i915_scheduler_seqno_complete(ring, seqno);
+	spin_unlock_irqrestore(&scheduler->lock, flags);
 
 	queue_work(dev_priv->wq, &dev_priv->mm.scheduler_work);
 
@@ -112,22 +442,506 @@ int i915_scheduler_handle_IRQ(struct intel_engine_cs *ring)
 
 int i915_scheduler_remove(struct intel_engine_cs *ring)
 {
-	/* Do stuff... */
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+	struct i915_scheduler_queue_entry  *node, *node_next;
+	unsigned long       flags;
+	int                 flying = 0, queued = 0;
+	int                 ret = 0;
+	bool                do_submit;
+	uint32_t            i, min_seqno;
+	struct list_head    remove;
 
-	return 0;
+	if (list_empty(&scheduler->node_queue[ring->id]))
+		return 0;
+
+	spin_lock_irqsave(&scheduler->lock, flags);
+
+	/* /i915_scheduler_dump_locked(ring, "remove/pre");/ */
+
+	/*
+	 * In the case where the system is idle, starting 'min_seqno' from a big
+	 * number will cause all nodes to be removed as they are now back to
+	 * being in-order. However, this will be a problem if the last one to
+	 * complete was actually out-of-order as the ring seqno value will be
+	 * lower than one or more completed buffers. Thus code looking for the
+	 * completion of said buffers will wait forever.
+	 * Instead, use the hardware seqno as the starting point. This means
+	 * that some buffers might be kept around even in a completely idle
+	 * system but it should guarantee that no-one ever gets confused when
+	 * waiting for buffer completion.
+	 */
+	min_seqno = ring->get_seqno(ring, true);
+
+	list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
+		if (I915_SQS_IS_QUEUED(node))
+			queued++;
+		else if (I915_SQS_IS_FLYING(node))
+			flying++;
+		else if (I915_SQS_IS_COMPLETE(node))
+			continue;
+
+		if (i915_compare_seqno_values(node->params.seqno, min_seqno) < 0)
+			min_seqno = node->params.seqno;
+	}
+
+	INIT_LIST_HEAD(&remove);
+	list_for_each_entry_safe(node, node_next, &scheduler->node_queue[ring->id], link) {
+		/*
+		 * Only remove completed nodes which have a lower seqno than
+		 * all pending nodes. While there is the possibility of the
+		 * ring's seqno counting backwards, all higher buffers must
+		 * be remembered so that the 'i915_seqno_passed()' test can
+		 * report that they have in fact passed.
+		 */
+		if (!I915_SQS_IS_COMPLETE(node))
+			continue;
+
+		if (i915_compare_seqno_values(node->params.seqno, min_seqno) > 0)
+			continue;
+
+		list_del(&node->link);
+		list_add(&node->link, &remove);
+
+		/* Strip the dependency info while the mutex is still locked */
+		i915_scheduler_remove_dependent(scheduler, node);
+
+		continue;
+	}
+
+	/*
+	 * No idea why but this seems to cause problems occasionally.
+	 * Note that the 'irq_put' code is internally reference counted
+	 * and spin_locked so it should be safe to call.
+	 */
+	/*if ((scheduler->flags[ring->id] & i915_sf_interrupts_enabled) &&
+	    (first_flight[ring->id] == NULL)) {
+		ring->irq_put(ring);
+		scheduler->flags[ring->id] &= ~i915_sf_interrupts_enabled;
+	}*/
+
+	/* Launch more packets now? */
+	do_submit = (queued > 0) && (flying < scheduler->min_flying);
+
+	spin_unlock_irqrestore(&scheduler->lock, flags);
+
+	if (do_submit)
+		ret = i915_scheduler_submit(ring, true);
+
+	while (!list_empty(&remove)) {
+		node = list_first_entry(&remove, typeof(*node), link);
+		list_del(&node->link);
+
+		/* Release the locked buffers: */
+		for (i = 0; i < node->num_objs; i++) {
+			drm_gem_object_unreference(
+					    &node->saved_objects[i].obj->base);
+		}
+		kfree(node->saved_objects);
+
+		/* Context too: */
+		if (node->params.ctx)
+			i915_gem_context_unreference(node->params.ctx);
+
+		/* And anything else owned by the node: */
+		kfree(node->params.cliprects);
+		kfree(node->dep_list);
+		kfree(node);
+	}
+
+	return ret;
 }
 
 int i915_scheduler_flush_seqno(struct intel_engine_cs *ring, bool is_locked,
 			       uint32_t seqno)
 {
-	/* Do stuff... */
+	struct i915_scheduler_queue_entry  *node;
+	struct drm_i915_private            *dev_priv;
+	struct i915_scheduler              *scheduler;
+	unsigned long       flags;
+	int                 flush_count = 0;
 
-	return 0;
+	if (!ring)
+		return -EINVAL;
+
+	dev_priv  = ring->dev->dev_private;
+	scheduler = dev_priv->scheduler;
+
+	if (!scheduler)
+		return 0;
+
+	BUG_ON(is_locked && (scheduler->flags[ring->id] & i915_sf_submitting));
+
+	if (list_empty(&scheduler->node_queue[ring->id]))
+		return 0;
+
+	spin_lock_irqsave(&scheduler->lock, flags);
+
+	i915_scheduler_priority_bump_clear(scheduler, ring);
+
+	list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
+		if (!I915_SQS_IS_QUEUED(node))
+			continue;
+
+		if (node->params.seqno != seqno)
+			continue;
+
+		flush_count += i915_scheduler_priority_bump(scheduler,
+					node, scheduler->priority_level_max);
+	}
+
+	spin_unlock_irqrestore(&scheduler->lock, flags);
+
+	if (flush_count) {
+		DRM_DEBUG_SCHED("<%s> Bumped %d entries\n", ring->name, flush_count);
+		flush_count = i915_scheduler_submit_max_priority(ring, is_locked);
+	}
+
+	return flush_count;
 }
 
 int i915_scheduler_flush(struct intel_engine_cs *ring, bool is_locked)
 {
-	/* Do stuff... */
+	struct i915_scheduler_queue_entry *node;
+	struct drm_i915_private           *dev_priv;
+	struct i915_scheduler             *scheduler;
+	unsigned long       flags;
+	bool        found;
+	int         ret;
+	uint32_t    count = 0;
+
+	if (!ring)
+		return -EINVAL;
+
+	dev_priv  = ring->dev->dev_private;
+	scheduler = dev_priv->scheduler;
+
+	if (!scheduler)
+		return 0;
+
+	BUG_ON(is_locked && (scheduler->flags[ring->id] & i915_sf_submitting));
+
+	do {
+		found = false;
+		spin_lock_irqsave(&scheduler->lock, flags);
+		list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
+			if (!I915_SQS_IS_QUEUED(node))
+				continue;
+
+			found = true;
+			break;
+		}
+		spin_unlock_irqrestore(&scheduler->lock, flags);
+
+		if (found) {
+			ret = i915_scheduler_submit(ring, is_locked);
+			if (ret < 0)
+				return ret;
+
+			count += ret;
+		}
+	} while (found);
+
+	return count;
+}
+
+void i915_scheduler_priority_bump_clear(struct i915_scheduler *scheduler,
+					struct intel_engine_cs *ring)
+{
+	struct i915_scheduler_queue_entry *node;
+	int i;
+
+	/*
+	 * Ensure circular dependencies don't cause problems and that a bump
+	 * by object usage only bumps each using buffer once:
+	 */
+	for (i = 0; i < I915_NUM_RINGS; i++) {
+		list_for_each_entry(node, &scheduler->node_queue[i], link)
+			node->bumped = false;
+	}
+}
+
+int i915_scheduler_priority_bump(struct i915_scheduler *scheduler,
+				 struct i915_scheduler_queue_entry *target,
+				 uint32_t bump)
+{
+	uint32_t new_priority;
+	int      i, count;
+
+	if (target->priority >= scheduler->priority_level_max)
+		return 1;
+
+	if (target->bumped)
+		return 0;
+
+	new_priority = target->priority + bump;
+	if ((new_priority <= target->priority) ||
+	    (new_priority > scheduler->priority_level_max))
+		target->priority = scheduler->priority_level_max;
+	else
+		target->priority = new_priority;
+
+	count = 1;
+	target->bumped = true;
+
+	for (i = 0; i < target->num_deps; i++) {
+		if (!target->dep_list[i])
+			continue;
+
+		if (target->dep_list[i]->bumped)
+			continue;
+
+		count += i915_scheduler_priority_bump(scheduler,
+						      target->dep_list[i],
+						      bump);
+	}
+
+	return count;
+}
+
+int i915_scheduler_submit_max_priority(struct intel_engine_cs *ring,
+				       bool is_locked)
+{
+	struct i915_scheduler_queue_entry  *node;
+	struct drm_i915_private            *dev_priv = ring->dev->dev_private;
+	struct i915_scheduler              *scheduler = dev_priv->scheduler;
+	unsigned long	flags;
+	int             ret, count = 0;
+	bool            found;
+
+	do {
+		found = false;
+		spin_lock_irqsave(&scheduler->lock, flags);
+		list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
+			if (!I915_SQS_IS_QUEUED(node))
+				continue;
+
+			if (node->priority < scheduler->priority_level_max)
+				continue;
+
+			found = true;
+			break;
+		}
+		spin_unlock_irqrestore(&scheduler->lock, flags);
+
+		if (!found)
+			break;
+
+		ret = i915_scheduler_submit(ring, is_locked);
+		if (ret < 0)
+			return ret;
+
+		count += ret;
+	} while (found);
+
+	return count;
+}
+
+static int i915_scheduler_pop_from_queue_locked(struct intel_engine_cs *ring,
+				    struct i915_scheduler_queue_entry **pop_node,
+				    unsigned long *flags)
+{
+	struct drm_i915_private            *dev_priv = ring->dev->dev_private;
+	struct i915_scheduler              *scheduler = dev_priv->scheduler;
+	struct i915_scheduler_queue_entry  *best;
+	struct i915_scheduler_queue_entry  *node;
+	int     ret;
+	int     i;
+	bool	any_queued;
+	bool	has_local, has_remote, only_remote;
+
+	*pop_node = NULL;
+	ret = -ENODATA;
+
+	any_queued = false;
+	only_remote = false;
+	best = NULL;
+
+	list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
+		if (!I915_SQS_IS_QUEUED(node))
+			continue;
+		any_queued = true;
+
+		has_local  = false;
+		has_remote = false;
+		for (i = 0; i < node->num_deps; i++) {
+			if (!i915_scheduler_is_dependency_valid(node, i))
+				continue;
+
+			if (node->dep_list[i]->params.ring == node->params.ring)
+				has_local = true;
+			else
+				has_remote = true;
+		}
+
+		if (has_remote && !has_local)
+			only_remote = true;
+
+		if (!has_local && !has_remote) {
+			if (!best ||
+			    (node->priority > best->priority))
+				best = node;
+		}
+	}
+
+	if (best) {
+		list_del(&best->link);
+
+		INIT_LIST_HEAD(&best->link);
+		best->status  = i915_sqs_none;
+
+		ret = 0;
+	} else {
+		/* Can only get here if:
+		 * (a) there are no buffers in the queue
+		 * (b) all queued buffers are dependent on other buffers
+		 *     e.g. on a buffer that is in flight on a different ring
+		 */
+		if (only_remote) {
+			/* The only dependent buffers are on another ring. */
+			ret = -EAGAIN;
+		} else if (any_queued) {
+			/* It seems that something has gone horribly wrong! */
+			DRM_ERROR("Broken dependency tracking on ring %d!\n",
+				  (int) ring->id);
+		}
+	}
+
+	/* i915_scheduler_dump_queue_pop(ring, best); */
+
+	*pop_node = best;
+	return ret;
+}
+
+int i915_scheduler_submit(struct intel_engine_cs *ring, bool was_locked)
+{
+	struct drm_device   *dev = ring->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+	struct i915_scheduler_queue_entry  *node;
+	unsigned long       flags;
+	int                 ret = 0, count = 0;
+
+	if (!was_locked) {
+		ret = i915_mutex_lock_interruptible(dev);
+		if (ret)
+			return ret;
+	}
+
+	BUG_ON(!mutex_is_locked(&dev->struct_mutex));
+
+	spin_lock_irqsave(&scheduler->lock, flags);
+
+	/* First time around, complain if anything unexpected occurs: */
+	ret = i915_scheduler_pop_from_queue_locked(ring, &node, &flags);
+	if (ret) {
+		spin_unlock_irqrestore(&scheduler->lock, flags);
+
+		if (!was_locked)
+			mutex_unlock(&dev->struct_mutex);
+
+		return ret;
+	}
+
+	do {
+		BUG_ON(!node);
+		BUG_ON(node->params.ring != ring);
+		BUG_ON(node->status != i915_sqs_none);
+		count++;
+
+		/* The call to pop above will have removed the node from the
+		 * list. So add it back in and mark it as in flight. */
+		i915_scheduler_fly_node(node);
+
+		scheduler->flags[ring->id] |= i915_sf_submitting;
+		spin_unlock_irqrestore(&scheduler->lock, flags);
+		ret = i915_gem_do_execbuffer_final(&node->params);
+		spin_lock_irqsave(&scheduler->lock, flags);
+		scheduler->flags[ring->id] &= ~i915_sf_submitting;
+
+		if (ret) {
+			bool requeue = false;
+
+			/* Oh dear! Either the node is broken or the ring is
+			 * busy. So need to kill the node or requeue it and try
+			 * again later as appropriate. */
+
+			switch (-ret) {
+			case EAGAIN:
+			case EBUSY:
+			case EIO:
+			case ENOMEM:
+			case ERESTARTSYS:
+				/* Supposedly recoverable errors. */
+				requeue = true;
+			break;
+
+			case ENODEV:
+			case ENOENT:
+				/* Fatal errors. Kill the node. */
+			break;
+
+			default:
+				DRM_DEBUG_SCHED("<%s> Got unexpected error from execbuff_final(): %d!\n",
+						ring->name, ret);
+				/* Assume it is recoverable and hope for the best. */
+				requeue = true;
+			break;
+			}
+
+			if (requeue) {
+				i915_scheduler_node_requeue(node);
+				/* No point spinning if the ring is currently
+				 * unavailable so just give up and come back
+				 * later. */
+				break;
+			} else
+				i915_scheduler_node_kill(node);
+		}
+
+		/* Keep launching until the sky is sufficiently full. */
+		if (i915_scheduler_count_flying(scheduler, ring) >=
+						scheduler->min_flying)
+			break;
+
+		ret = i915_scheduler_pop_from_queue_locked(ring, &node, &flags);
+	} while (ret == 0);
+
+	spin_unlock_irqrestore(&scheduler->lock, flags);
+
+	if (!was_locked)
+		mutex_unlock(&dev->struct_mutex);
+
+	/* Don't complain about not being able to submit extra entries */
+	if (ret == -ENODATA)
+		ret = 0;
+
+	return (ret < 0) ? ret : count;
+}
+
+int i915_scheduler_remove_dependent(struct i915_scheduler *scheduler,
+				    struct i915_scheduler_queue_entry *remove)
+{
+	struct i915_scheduler_queue_entry  *node;
+	int     i, r;
+	int     count = 0;
+
+	for (i = 0; i < remove->num_deps; i++)
+		if ((remove->dep_list[i]) &&
+		    (!I915_SQS_IS_COMPLETE(remove->dep_list[i])))
+			count++;
+	BUG_ON(count);
+
+	for (r = 0; r < I915_NUM_RINGS; r++) {
+		list_for_each_entry(node, &scheduler->node_queue[r], link) {
+			for (i = 0; i < node->num_deps; i++) {
+				if (node->dep_list[i] != remove)
+					continue;
+
+				node->dep_list[i] = NULL;
+			}
+		}
+	}
 
 	return 0;
 }
@@ -135,17 +949,25 @@ int i915_scheduler_flush(struct intel_engine_cs *ring, bool is_locked)
 bool i915_scheduler_is_seqno_in_flight(struct intel_engine_cs *ring,
 			       uint32_t seqno, bool *completed)
 {
-	struct drm_i915_private *dev_priv = ring->dev->dev_private;
-	struct i915_scheduler   *scheduler = dev_priv->scheduler;
-	bool                    found = false;
-	unsigned long           flags;
+	struct i915_scheduler_queue_entry  *node;
+	struct drm_i915_private            *dev_priv = ring->dev->dev_private;
+	struct i915_scheduler              *scheduler = dev_priv->scheduler;
+	bool            found = false;
+	unsigned long   flags;
 
 	if (!scheduler)
 		return false;
 
 	spin_lock_irqsave(&scheduler->lock, flags);
 
-	/* Do stuff... */
+	list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
+		if (node->params.seqno != seqno)
+			continue;
+
+		found = true;
+		*completed = I915_SQS_IS_COMPLETE(node);
+		break;
+	}
 
 	spin_unlock_irqrestore(&scheduler->lock, flags);
 
@@ -154,20 +976,73 @@ bool i915_scheduler_is_seqno_in_flight(struct intel_engine_cs *ring,
 
 int i915_scheduler_closefile(struct drm_device *dev, struct drm_file *file)
 {
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+	struct i915_scheduler_queue_entry  *node;
+	struct drm_i915_private            *dev_priv = dev->dev_private;
+	struct i915_scheduler              *scheduler = dev_priv->scheduler;
+	struct intel_engine_cs  *ring;
+	int                     i, ret;
+	uint32_t                seqno;
+	unsigned long           flags;
+	bool                    found;
 
 	if (!scheduler)
 		return 0;
 
-	/* Do stuff... */
+	for_each_ring(ring, dev_priv, i) {
+		do {
+			spin_lock_irqsave(&scheduler->lock, flags);
+
+			found = false;
+			list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
+				if (I915_SQS_IS_COMPLETE(node))
+					continue;
+
+				if (node->params.file != file)
+					continue;
+
+				found = true;
+				seqno = node->params.seqno;
+				break;
+			}
+
+			spin_unlock_irqrestore(&scheduler->lock, flags);
+
+			if (found) {
+				do {
+					mutex_lock(&dev->struct_mutex);
+					ret = i915_wait_seqno(ring, seqno);
+					mutex_unlock(&dev->struct_mutex);
+				} while (ret == -EAGAIN);
+			}
+		} while (found);
+	}
+
+	spin_lock_irqsave(&scheduler->lock, flags);
+	for_each_ring(ring, dev_priv, i) {
+		list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
+			if (node->params.file != file)
+				continue;
+
+			WARN_ON(!I915_SQS_IS_COMPLETE(node));
+
+			node->params.file = NULL;
+		}
+	}
+	spin_unlock_irqrestore(&scheduler->lock, flags);
 
 	return 0;
 }
 
 bool i915_scheduler_is_idle(struct intel_engine_cs *ring)
 {
-	/* Do stuff... */
+	struct i915_scheduler_queue_entry *node;
+	struct drm_device       *dev = ring->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+
+	list_for_each_entry(node, &scheduler->node_queue[ring->id], link)
+		if (!I915_SQS_IS_COMPLETE(node))
+			return false;
 
 	return true;
 }
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 6dd4fea..f93d57d 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -47,14 +47,38 @@ struct i915_execbuffer_params {
 	uint32_t                        scheduler_index;
 };
 
+enum i915_scheduler_queue_status {
+	/* Limbo, between other states: */
+	i915_sqs_none = 0,
+	/* Not yet submitted to hardware: */
+	i915_sqs_queued,
+	/* Sent to hardware for processing: */
+	i915_sqs_flying,
+	/* Finished processing on the hardware: */
+	i915_sqs_complete,
+	/* Limit value for use with arrays/loops */
+	i915_sqs_MAX
+};
+
+#define I915_SQS_IS_QUEUED(node)	(((node)->status == i915_sqs_queued))
+#define I915_SQS_IS_FLYING(node)	(((node)->status == i915_sqs_flying))
+#define I915_SQS_IS_COMPLETE(node)	((node)->status == i915_sqs_complete)
+
 struct i915_scheduler_obj_entry {
 	struct drm_i915_gem_object          *obj;
 };
 
 struct i915_scheduler_queue_entry {
 	struct i915_execbuffer_params       params;
+	uint32_t                            priority;
 	struct i915_scheduler_obj_entry     *saved_objects;
 	int                                 num_objs;
+	bool                                bumped;
+	struct i915_scheduler_queue_entry   **dep_list;
+	int                                 num_deps;
+	enum i915_scheduler_queue_status    status;
+	struct timespec                     stamp;
+	struct list_head                    link;
 };
 
 #ifdef CONFIG_DRM_I915_SCHEDULER
@@ -79,21 +103,48 @@ bool        i915_scheduler_is_idle(struct intel_engine_cs *ring);
 #ifdef CONFIG_DRM_I915_SCHEDULER
 
 struct i915_scheduler {
-	uint32_t    flags[I915_NUM_RINGS];
-	spinlock_t  lock;
-	uint32_t    index;
+	struct list_head    node_queue[I915_NUM_RINGS];
+	uint32_t            flags[I915_NUM_RINGS];
+	spinlock_t          lock;
+	uint32_t            index;
+
+	/* Tuning parameters: */
+	uint32_t            priority_level_max;
+	uint32_t            priority_level_preempt;
+	uint32_t            min_flying;
+};
+
+/* Flag bits for i915_scheduler::flags */
+enum {
+	i915_sf_interrupts_enabled  = (1 << 0),
+	i915_sf_submitting          = (1 << 1),
 };
 
 /* Options for 'scheduler_override' module parameter: */
 enum {
-	i915_so_normal              = 0,
+	i915_so_direct_submit       = (1 << 0),
 };
 
+bool        i915_scheduler_is_busy(struct intel_engine_cs *ring);
+int         i915_scheduler_fly_node(struct i915_scheduler_queue_entry *node);
 int         i915_scheduler_fly_seqno(struct intel_engine_cs *ring, uint32_t seqno);
 int         i915_scheduler_remove(struct intel_engine_cs *ring);
+int         i915_scheduler_remove_dependent(struct i915_scheduler *scheduler,
+				struct i915_scheduler_queue_entry *remove);
 int         i915_scheduler_flush(struct intel_engine_cs *ring, bool is_locked);
 int         i915_scheduler_flush_seqno(struct intel_engine_cs *ring,
 				       bool is_locked, uint32_t seqno);
+int         i915_scheduler_submit(struct intel_engine_cs *ring,
+				  bool is_locked);
+int         i915_scheduler_submit_max_priority(struct intel_engine_cs *ring,
+					       bool is_locked);
+uint32_t    i915_scheduler_count_flying(struct i915_scheduler *scheduler,
+					struct intel_engine_cs *ring);
+void        i915_scheduler_priority_bump_clear(struct i915_scheduler *scheduler,
+					       struct intel_engine_cs *ring);
+int         i915_scheduler_priority_bump(struct i915_scheduler *scheduler,
+				struct i915_scheduler_queue_entry *target,
+				uint32_t bump);
 bool        i915_scheduler_is_seqno_in_flight(struct intel_engine_cs *ring,
 					      uint32_t seqno, bool *completed);
 
-- 
1.7.9.5


* [RFC 32/44] drm/i915: Added immediate submission override to scheduler
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

To aid with debugging scheduler issues, it can be useful to ensure that all
batch buffers are submitted immediately rather than being queued until later.
This change adds an override flag to the 'scheduler_override' module parameter
to force instant submission.
---
 drivers/gpu/drm/i915/i915_scheduler.c |    7 +++++--
 drivers/gpu/drm/i915/i915_scheduler.h |    1 +
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 1816f1d..71d8db4 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -209,8 +209,11 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 
 	list_add_tail(&node->link, &scheduler->node_queue[ring->id]);
 
-	not_flying = i915_scheduler_count_flying(scheduler, ring) <
-						 scheduler->min_flying;
+	if (i915.scheduler_override & i915_so_submit_on_queue)
+		not_flying = true;
+	else
+		not_flying = i915_scheduler_count_flying(scheduler, ring) <
+							 scheduler->min_flying;
 
 	spin_unlock_irqrestore(&scheduler->lock, flags);
 
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index f93d57d..e824e700 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -123,6 +123,7 @@ enum {
 /* Options for 'scheduler_override' module parameter: */
 enum {
 	i915_so_direct_submit       = (1 << 0),
+	i915_so_submit_on_queue     = (1 << 1),
 };
 
 bool        i915_scheduler_is_busy(struct intel_engine_cs *ring);
-- 
1.7.9.5


* [RFC 33/44] drm/i915: Added trace points to scheduler
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

Added trace points to the scheduler to track batch buffers being queued, sent
to the hardware, requeued, completing and being removed, together with every
node state transition and the scheduler's interrupt handling.
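For reference, the new events can be consumed through the normal tracing
interface. A small illustrative helper follows; the tracefs paths are
assumptions about the target system and the helper itself is not part of this
patch:

    #include <fcntl.h>
    #include <unistd.h>

    /* Enable one of the new scheduler events and copy the resulting trace
     * to stdout, assuming tracefs is mounted under
     * /sys/kernel/debug/tracing. */
    static int watch_scheduler_queue_events(void)
    {
        const char *enable =
            "/sys/kernel/debug/tracing/events/i915/i915_scheduler_queue/enable";
        char buf[4096];
        ssize_t n;
        int fd;

        fd = open(enable, O_WRONLY);
        if (fd < 0)
            return -1;
        if (write(fd, "1", 1) != 1) {
            close(fd);
            return -1;
        }
        close(fd);

        fd = open("/sys/kernel/debug/tracing/trace_pipe", O_RDONLY);
        if (fd < 0)
            return -1;
        while ((n = read(fd, buf, sizeof(buf))) > 0)
            if (write(STDOUT_FILENO, buf, n) != n)
                break;
        close(fd);

        return 0;
    }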
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |    2 +
 drivers/gpu/drm/i915/i915_scheduler.c      |   31 ++++-
 drivers/gpu/drm/i915/i915_trace.h          |  194 ++++++++++++++++++++++++++++
 3 files changed, 226 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 98cc95e..bf19e02 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1413,6 +1413,8 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	ring->outstanding_lazy_seqno    = 0;
 	ring->preallocated_lazy_request = NULL;
 
+	trace_i915_gem_ring_queue(ring, &qe);
+
 	ret = i915_scheduler_queue_execbuffer(&qe);
 	if (ret)
 		goto err;
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 71d8db4..6d0f4cb 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -87,6 +87,8 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 
 		qe->params.scheduler_index = scheduler->index++;
 
+		trace_i915_scheduler_queue(qe->params.ring, qe);
+
 		scheduler->flags[qe->params.ring->id] |= i915_sf_submitting;
 		ret = i915_gem_do_execbuffer_final(&qe->params);
 		scheduler->flags[qe->params.ring->id] &= ~i915_sf_submitting;
@@ -215,6 +217,9 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 		not_flying = i915_scheduler_count_flying(scheduler, ring) <
 							 scheduler->min_flying;
 
+	trace_i915_scheduler_queue(ring, node);
+	trace_i915_scheduler_node_state_change(ring, node);
+
 	spin_unlock_irqrestore(&scheduler->lock, flags);
 
 	if (not_flying)
@@ -253,6 +258,8 @@ int i915_scheduler_fly_seqno(struct intel_engine_cs *ring, uint32_t seqno)
 	node->stamp        = stamp;
 	node->status       = i915_sqs_none;
 
+	trace_i915_scheduler_node_state_change(ring, node);
+
 	spin_lock_irqsave(&scheduler->lock, flags);
 	ret = i915_scheduler_fly_node(node);
 	spin_unlock_irqrestore(&scheduler->lock, flags);
@@ -279,6 +286,9 @@ int i915_scheduler_fly_node(struct i915_scheduler_queue_entry *node)
 
 	node->status = i915_sqs_flying;
 
+	trace_i915_scheduler_fly(ring, node);
+	trace_i915_scheduler_node_state_change(ring, node);
+
 	if (!(scheduler->flags[ring->id] & i915_sf_interrupts_enabled)) {
 		bool    success = true;
 
@@ -343,6 +353,8 @@ static void i915_scheduler_node_requeue(struct i915_scheduler_queue_entry *node)
 	BUG_ON(!I915_SQS_IS_FLYING(node));
 
 	node->status = i915_sqs_queued;
+	trace_i915_scheduler_unfly(node->params.ring, node);
+	trace_i915_scheduler_node_state_change(node->params.ring, node);
 }
 
 /* Give up on a popped node completely. For example, because it is causing the
@@ -353,6 +365,8 @@ static void i915_scheduler_node_kill(struct i915_scheduler_queue_entry *node)
 	BUG_ON(!I915_SQS_IS_FLYING(node));
 
 	node->status = i915_sqs_complete;
+	trace_i915_scheduler_unfly(node->params.ring, node);
+	trace_i915_scheduler_node_state_change(node->params.ring, node);
 }
 
 /*
@@ -377,13 +391,17 @@ static int i915_scheduler_seqno_complete(struct intel_engine_cs *ring, uint32_t
 	 * if a completed entry is found then there is no need to scan further.
 	 */
 	list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
-		if (I915_SQS_IS_COMPLETE(node))
+		if (I915_SQS_IS_COMPLETE(node)) {
+			trace_i915_scheduler_landing(ring, seqno, node);
 			goto done;
+		}
 
 		if (seqno == node->params.seqno)
 			break;
 	}
 
+	trace_i915_scheduler_landing(ring, seqno, node);
+
 	/*
 	 * NB: Lots of extra seqnos get added to the ring to track things
 	 * like cache flushes and page flips. So don't complain about if
@@ -405,6 +423,7 @@ static int i915_scheduler_seqno_complete(struct intel_engine_cs *ring, uint32_t
 
 		/* Node was in flight so mark it as complete. */
 		node->status = i915_sqs_complete;
+		trace_i915_scheduler_node_state_change(ring, node);
 	}
 
 	/* Should submit new work here if flight list is empty but the DRM
@@ -425,6 +444,8 @@ int i915_scheduler_handle_IRQ(struct intel_engine_cs *ring)
 
 	seqno = ring->get_seqno(ring, false);
 
+	trace_i915_scheduler_irq(ring, seqno);
+
 	if (i915.scheduler_override & i915_so_direct_submit)
 		return 0;
 
@@ -526,6 +547,8 @@ int i915_scheduler_remove(struct intel_engine_cs *ring)
 	/* Launch more packets now? */
 	do_submit = (queued > 0) && (flying < scheduler->min_flying);
 
+	trace_i915_scheduler_remove(ring, min_seqno, do_submit);
+
 	spin_unlock_irqrestore(&scheduler->lock, flags);
 
 	if (do_submit)
@@ -535,6 +558,8 @@ int i915_scheduler_remove(struct intel_engine_cs *ring)
 		node = list_first_entry(&remove, typeof(*node), link);
 		list_del(&node->link);
 
+		trace_i915_scheduler_destroy(ring, node);
+
 		/* Release the locked buffers: */
 		for (i = 0; i < node->num_objs; i++) {
 			drm_gem_object_unreference(
@@ -793,6 +818,8 @@ static int i915_scheduler_pop_from_queue_locked(struct intel_engine_cs *ring,
 		INIT_LIST_HEAD(&best->link);
 		best->status  = i915_sqs_none;
 
+		trace_i915_scheduler_node_state_change(ring, best);
+
 		ret = 0;
 	} else {
 		/* Can only get here if:
@@ -812,6 +839,8 @@ static int i915_scheduler_pop_from_queue_locked(struct intel_engine_cs *ring,
 
 	/* i915_scheduler_dump_queue_pop(ring, best); */
 
+	trace_i915_scheduler_pop_from_queue(ring, best);
+
 	*pop_node = best;
 	return ret;
 }
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index f5aa006..bea2a49 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -9,6 +9,7 @@
 #include "i915_drv.h"
 #include "intel_drv.h"
 #include "intel_ringbuffer.h"
+#include "i915_scheduler.h"
 
 #undef TRACE_SYSTEM
 #define TRACE_SYSTEM i915
@@ -587,6 +588,199 @@ TRACE_EVENT(intel_gpu_freq_change,
 	    TP_printk("new_freq=%u", __entry->freq)
 );
 
+TRACE_EVENT(i915_scheduler_queue,
+	    TP_PROTO(struct intel_engine_cs *ring,
+		     struct i915_scheduler_queue_entry *node),
+	    TP_ARGS(ring, node),
+
+	    TP_STRUCT__entry(
+			     __field(u32, ring)
+			     __field(u32, seqno)
+			     ),
+
+	    TP_fast_assign(
+			   __entry->ring      = ring->id;
+			   __entry->seqno     = node ? node->params.seqno : 0;
+			   ),
+
+	    TP_printk("ring=%d, seqno=%d",
+		      __entry->ring, __entry->seqno)
+);
+
+TRACE_EVENT(i915_scheduler_fly,
+	    TP_PROTO(struct intel_engine_cs *ring,
+		     struct i915_scheduler_queue_entry *node),
+	    TP_ARGS(ring, node),
+
+	    TP_STRUCT__entry(
+			     __field(u32, ring)
+			     __field(u32, seqno)
+			     ),
+
+	    TP_fast_assign(
+			   __entry->ring      = ring->id;
+			   __entry->seqno     = node ? node->params.seqno : 0;
+			   ),
+
+	    TP_printk("ring=%d, seqno=%d",
+		      __entry->ring, __entry->seqno)
+);
+
+TRACE_EVENT(i915_scheduler_unfly,
+	    TP_PROTO(struct intel_engine_cs *ring,
+		     struct i915_scheduler_queue_entry *node),
+	    TP_ARGS(ring, node),
+
+	    TP_STRUCT__entry(
+			     __field(u32, ring)
+			     __field(u32, seqno)
+			     ),
+
+	    TP_fast_assign(
+			   __entry->ring      = ring->id;
+			   __entry->seqno     = node ? node->params.seqno : 0;
+			   ),
+
+	    TP_printk("ring=%d, seqno=%d",
+		      __entry->ring, __entry->seqno)
+);
+
+TRACE_EVENT(i915_scheduler_landing,
+	    TP_PROTO(struct intel_engine_cs *ring, u32 seqno,
+		     struct i915_scheduler_queue_entry *node),
+	    TP_ARGS(ring, seqno, node),
+
+	    TP_STRUCT__entry(
+			     __field(u32, ring)
+			     __field(u32, seqno)
+			     __field(u32, status)
+			     ),
+
+	    TP_fast_assign(
+			   __entry->ring   = ring->id;
+			   __entry->seqno  = seqno;
+			   __entry->status = node ? node->status : ~0U;
+			   ),
+
+	    TP_printk("ring=%d, seqno=%d, status=%d",
+		      __entry->ring, __entry->seqno, __entry->status)
+);
+
+TRACE_EVENT(i915_scheduler_remove,
+	    TP_PROTO(struct intel_engine_cs *ring,
+		     u32 min_seqno, bool do_submit),
+	    TP_ARGS(ring, min_seqno, do_submit),
+
+	    TP_STRUCT__entry(
+			     __field(u32, ring)
+			     __field(u32, min_seqno)
+			     __field(bool, do_submit)
+			     ),
+
+	    TP_fast_assign(
+			   __entry->ring      = ring->id;
+			   __entry->min_seqno = min_seqno;
+			   __entry->do_submit = do_submit;
+			   ),
+
+	    TP_printk("ring=%d, min_seqno = %d, do_submit=%d",
+		      __entry->ring, __entry->min_seqno, __entry->do_submit)
+);
+
+TRACE_EVENT(i915_scheduler_destroy,
+	    TP_PROTO(struct intel_engine_cs *ring,
+		     struct i915_scheduler_queue_entry *node),
+	    TP_ARGS(ring, node),
+
+	    TP_STRUCT__entry(
+			     __field(u32, ring)
+			     __field(u32, seqno)
+			     ),
+
+	    TP_fast_assign(
+			   __entry->ring      = ring->id;
+			   __entry->seqno     = node ? node->params.seqno : 0;
+			   ),
+
+	    TP_printk("ring=%d, seqno=%d",
+		      __entry->ring, __entry->seqno)
+);
+
+TRACE_EVENT(i915_scheduler_pop_from_queue,
+	    TP_PROTO(struct intel_engine_cs *ring,
+		     struct i915_scheduler_queue_entry *node),
+	    TP_ARGS(ring, node),
+
+	    TP_STRUCT__entry(
+			     __field(u32, ring)
+			     __field(u32, seqno)
+			     ),
+
+	    TP_fast_assign(
+			   __entry->ring   = ring->id;
+			   __entry->seqno  = node ? node->params.seqno : 0;
+			   ),
+
+	    TP_printk("ring=%d, seqno=%d",
+		      __entry->ring, __entry->seqno)
+);
+
+TRACE_EVENT(i915_scheduler_node_state_change,
+	    TP_PROTO(struct intel_engine_cs *ring,
+		     struct i915_scheduler_queue_entry *node),
+	    TP_ARGS(ring, node),
+
+	    TP_STRUCT__entry(
+			     __field(u32, ring)
+			     __field(u32, seqno)
+			     __field(u32, status)
+			     ),
+
+	    TP_fast_assign(
+			   __entry->ring   = ring->id;
+			   __entry->seqno  = node->params.seqno;
+			   __entry->status = node->status;
+			   ),
+
+	    TP_printk("ring=%d, seqno=%d, status=%d",
+		      __entry->ring, __entry->seqno, __entry->status)
+);
+
+TRACE_EVENT(i915_scheduler_irq,
+	    TP_PROTO(struct intel_engine_cs *ring, uint32_t seqno),
+	    TP_ARGS(ring, seqno),
+
+	    TP_STRUCT__entry(
+			     __field(u32, ring)
+			     __field(u32, seqno)
+			     ),
+
+	    TP_fast_assign(
+			   __entry->ring   = ring->id;
+			   __entry->seqno  = seqno;
+			   ),
+
+	    TP_printk("ring=%d, seqno=%d", __entry->ring, __entry->seqno)
+);
+
+TRACE_EVENT(i915_gem_ring_queue,
+	    TP_PROTO(struct intel_engine_cs *ring,
+		     struct i915_scheduler_queue_entry *node),
+	    TP_ARGS(ring, node),
+
+	    TP_STRUCT__entry(
+			     __field(u32, ring)
+			     __field(u32, seqno)
+			     ),
+
+	    TP_fast_assign(
+			   __entry->ring   = ring->id;
+			   __entry->seqno  = node->params.seqno;
+			   ),
+
+	    TP_printk("ring=%d, seqno=%d", __entry->ring, __entry->seqno)
+);
+
 #endif /* _I915_TRACE_H_ */
 
 /* This part must be outside protection */
-- 
1.7.9.5


* [RFC 34/44] drm/i915: Added scheduler queue throttling by DRM file handle
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The scheduler decouples the submission of batch buffers to the driver from their
subsequent submission to the hardware. This means that an application which
continuously submits buffers as fast as it can could potentially flood the
driver. To prevent this, the driver now tracks how many buffers are in progress
per DRM file handle (queued in software or executing in hardware) and limits
this to a given (tunable) number. If that limit is exceeded then the
execbuffer() call returns -EAGAIN, which prevents the scheduler's queue from
growing arbitrarily large.
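From user space the throttle simply shows up as an -EAGAIN error from the
execbuffer ioctl. A minimal sketch of how a client might cope, assuming the
standard DRM_IOCTL_I915_GEM_EXECBUFFER2 path and an already-populated
execbuffer2 structure (the helper and its back-off period are illustrative,
not part of this patch):

    #include <errno.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <drm/i915_drm.h>

    /* Submit a batch, backing off briefly whenever the per-file queue is
     * full and the kernel returns -EAGAIN. */
    static int submit_with_throttle(int drm_fd,
                                    struct drm_i915_gem_execbuffer2 *execbuf)
    {
        for (;;) {
            if (ioctl(drm_fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, execbuf) == 0)
                return 0;

            if (errno == EINTR)
                continue;          /* interrupted, retry immediately */
            if (errno != EAGAIN)
                return -errno;     /* genuine failure */

            usleep(1000);          /* queue full, let the GPU drain a little */
        }
    }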
---
 drivers/gpu/drm/i915/i915_drv.h            |    2 ++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   12 +++++++++++
 drivers/gpu/drm/i915/i915_scheduler.c      |   32 ++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_scheduler.h      |    5 +++++
 4 files changed, 51 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 4d52c67..872e869 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1785,6 +1785,8 @@ struct drm_i915_file_private {
 
 	atomic_t rps_wait_boost;
 	struct  intel_engine_cs *bsd_ring;
+
+	u32 scheduler_queue_length;
 };
 
 /*
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index bf19e02..3227a39 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1614,6 +1614,12 @@ i915_gem_execbuffer(struct drm_device *dev, void *data,
 		return -EINVAL;
 	}
 
+#ifdef CONFIG_DRM_I915_SCHEDULER
+	/* Throttle batch requests per device file */
+	if (i915_scheduler_file_queue_is_full(file))
+		return -EAGAIN;
+#endif
+
 	/* Copy in the exec list from userland */
 	exec_list = drm_malloc_ab(sizeof(*exec_list), args->buffer_count);
 	exec2_list = drm_malloc_ab(sizeof(*exec2_list), args->buffer_count);
@@ -1702,6 +1708,12 @@ i915_gem_execbuffer2(struct drm_device *dev, void *data,
 		return -EINVAL;
 	}
 
+#ifdef CONFIG_DRM_I915_SCHEDULER
+	/* Throttle batch requests per device file */
+	if (i915_scheduler_file_queue_is_full(file))
+		return -EAGAIN;
+#endif
+
 	exec2_list = kmalloc(sizeof(*exec2_list)*args->buffer_count,
 			     GFP_TEMPORARY | __GFP_NOWARN | __GFP_NORETRY);
 	if (exec2_list == NULL)
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 6d0f4cb..6782249 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -61,6 +61,7 @@ int i915_scheduler_init(struct drm_device *dev)
 	scheduler->priority_level_max     = ~0U;
 	scheduler->priority_level_preempt = 900;
 	scheduler->min_flying             = 2;
+	scheduler->file_queue_max         = 64;
 
 	dev_priv->scheduler = scheduler;
 
@@ -211,6 +212,8 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 
 	list_add_tail(&node->link, &scheduler->node_queue[ring->id]);
 
+	i915_scheduler_file_queue_inc(node->params.file);
+
 	if (i915.scheduler_override & i915_so_submit_on_queue)
 		not_flying = true;
 	else
@@ -530,6 +533,12 @@ int i915_scheduler_remove(struct intel_engine_cs *ring)
 		/* Strip the dependency info while the mutex is still locked */
 		i915_scheduler_remove_dependent(scheduler, node);
 
+		/* Likewise clean up the file descriptor before it might disappear. */
+		if (node->params.file) {
+			i915_scheduler_file_queue_dec(node->params.file);
+			node->params.file = NULL;
+		}
+
 		continue;
 	}
 
@@ -1079,6 +1088,29 @@ bool i915_scheduler_is_idle(struct intel_engine_cs *ring)
 	return true;
 }
 
+bool i915_scheduler_file_queue_is_full(struct drm_file *file)
+{
+	struct drm_i915_file_private *file_priv = file->driver_priv;
+	struct drm_i915_private      *dev_priv  = file_priv->dev_priv;
+	struct i915_scheduler        *scheduler = dev_priv->scheduler;
+
+	return (file_priv->scheduler_queue_length >= scheduler->file_queue_max);
+}
+
+void i915_scheduler_file_queue_inc(struct drm_file *file)
+{
+	struct drm_i915_file_private *file_priv = file->driver_priv;
+
+	file_priv->scheduler_queue_length++;
+}
+
+void i915_scheduler_file_queue_dec(struct drm_file *file)
+{
+	struct drm_i915_file_private *file_priv = file->driver_priv;
+
+	file_priv->scheduler_queue_length--;
+}
+
 #else   /* CONFIG_DRM_I915_SCHEDULER */
 
 int i915_scheduler_init(struct drm_device *dev)
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index e824e700..78a92c9 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -112,6 +112,7 @@ struct i915_scheduler {
 	uint32_t            priority_level_max;
 	uint32_t            priority_level_preempt;
 	uint32_t            min_flying;
+	uint32_t            file_queue_max;
 };
 
 /* Flag bits for i915_scheduler::flags */
@@ -149,6 +150,10 @@ int         i915_scheduler_priority_bump(struct i915_scheduler *scheduler,
 bool        i915_scheduler_is_seqno_in_flight(struct intel_engine_cs *ring,
 					      uint32_t seqno, bool *completed);
 
+bool i915_scheduler_file_queue_is_full(struct drm_file *file);
+void i915_scheduler_file_queue_inc(struct drm_file *file);
+void i915_scheduler_file_queue_dec(struct drm_file *file);
+
 #endif  /* CONFIG_DRM_I915_SCHEDULER */
 
 int i915_gem_do_execbuffer_final(struct i915_execbuffer_params *params);
-- 
1.7.9.5


* [RFC 35/44] drm/i915: Added debugfs interface to scheduler tuning parameters
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

There are various parameters within the scheduler which can be tuned to improve
performance, reduce memory footprint, etc. This change adds support for altering
these via debugfs.
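As a usage illustration only (the debugfs mount point and DRM card number are
assumptions about the target system, and the helper itself is not part of this
patch), a tuning value could be changed from user space along these lines:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Write a new value to one of the scheduler tuning files, e.g.
     * "i915_scheduler_min_flying", assuming debugfs is mounted at
     * /sys/kernel/debug and the i915 device is DRM card 0. */
    static int set_scheduler_param(const char *name, unsigned int value)
    {
        char path[128], buf[32];
        int fd, len, ret = 0;

        snprintf(path, sizeof(path), "/sys/kernel/debug/dri/0/%s", name);
        len = snprintf(buf, sizeof(buf), "%u\n", value);

        fd = open(path, O_WRONLY);
        if (fd < 0)
            return -1;
        if (write(fd, buf, len) != len)
            ret = -1;
        close(fd);

        return ret;
    }

Note that reading the same files back reports the current values in hex, per
the "0x%llx" format used below.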
---
 drivers/gpu/drm/i915/i915_debugfs.c |  117 +++++++++++++++++++++++++++++++++++
 1 file changed, 117 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 5858cbb..1c20c8c 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -39,6 +39,7 @@
 #include "intel_ringbuffer.h"
 #include <drm/i915_drm.h>
 #include "i915_drv.h"
+#include "i915_scheduler.h"
 
 enum {
 	ACTIVE_LIST,
@@ -983,6 +984,116 @@ DEFINE_SIMPLE_ATTRIBUTE(i915_next_seqno_fops,
 			i915_next_seqno_get, i915_next_seqno_set,
 			"0x%llx\n");
 
+#ifdef CONFIG_DRM_I915_SCHEDULER
+static int
+i915_scheduler_priority_max_get(void *data, u64 *val)
+{
+	struct drm_device       *dev       = data;
+	struct drm_i915_private *dev_priv  = dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+
+	*val = (u64) scheduler->priority_level_max;
+	return 0;
+}
+
+static int
+i915_scheduler_priority_max_set(void *data, u64 val)
+{
+	struct drm_device       *dev       = data;
+	struct drm_i915_private *dev_priv  = dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+
+	scheduler->priority_level_max = (u32) val;
+	return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(i915_scheduler_priority_max_fops,
+			i915_scheduler_priority_max_get,
+			i915_scheduler_priority_max_set,
+			"0x%llx\n");
+
+static int
+i915_scheduler_priority_preempt_get(void *data, u64 *val)
+{
+	struct drm_device       *dev       = data;
+	struct drm_i915_private *dev_priv  = dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+
+	*val = (u64) scheduler->priority_level_preempt;
+	return 0;
+}
+
+static int
+i915_scheduler_priority_preempt_set(void *data, u64 val)
+{
+	struct drm_device       *dev       = data;
+	struct drm_i915_private *dev_priv  = dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+
+	scheduler->priority_level_preempt = (u32) val;
+	return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(i915_scheduler_priority_preempt_fops,
+			i915_scheduler_priority_preempt_get,
+			i915_scheduler_priority_preempt_set,
+			"0x%llx\n");
+
+static int
+i915_scheduler_min_flying_get(void *data, u64 *val)
+{
+	struct drm_device       *dev       = data;
+	struct drm_i915_private *dev_priv  = dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+
+	*val = (u64) scheduler->min_flying;
+	return 0;
+}
+
+static int
+i915_scheduler_min_flying_set(void *data, u64 val)
+{
+	struct drm_device       *dev       = data;
+	struct drm_i915_private *dev_priv  = dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+
+	scheduler->min_flying = (u32) val;
+	return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(i915_scheduler_min_flying_fops,
+			i915_scheduler_min_flying_get,
+			i915_scheduler_min_flying_set,
+			"0x%llx\n");
+
+static int
+i915_scheduler_file_queue_max_get(void *data, u64 *val)
+{
+	struct drm_device       *dev       = data;
+	struct drm_i915_private *dev_priv  = dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+
+	*val = (u64) scheduler->file_queue_max;
+	return 0;
+}
+
+static int
+i915_scheduler_file_queue_max_set(void *data, u64 val)
+{
+	struct drm_device       *dev       = data;
+	struct drm_i915_private *dev_priv  = dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+
+	scheduler->file_queue_max = (u32) val;
+	return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(i915_scheduler_file_queue_max_fops,
+			i915_scheduler_file_queue_max_get,
+			i915_scheduler_file_queue_max_set,
+			"0x%llx\n");
+#endif  /* CONFIG_DRM_I915_SCHEDULER */
+
 static int i915_rstdby_delays(struct seq_file *m, void *unused)
 {
 	struct drm_info_node *node = m->private;
@@ -3834,6 +3945,12 @@ static const struct i915_debugfs_files {
 	{"i915_gem_drop_caches", &i915_drop_caches_fops},
 	{"i915_error_state", &i915_error_state_fops},
 	{"i915_next_seqno", &i915_next_seqno_fops},
+#ifdef CONFIG_DRM_I915_SCHEDULER
+	{"i915_scheduler_priority_max", &i915_scheduler_priority_max_fops},
+	{"i915_scheduler_priority_preempt", &i915_scheduler_priority_preempt_fops},
+	{"i915_scheduler_min_flying", &i915_scheduler_min_flying_fops},
+	{"i915_scheduler_file_queue_max", &i915_scheduler_file_queue_max_fops},
+#endif
 	{"i915_display_crc_ctl", &i915_display_crc_ctl_fops},
 	{"i915_pri_wm_latency", &i915_pri_wm_latency_fops},
 	{"i915_spr_wm_latency", &i915_spr_wm_latency_fops},
-- 
1.7.9.5


* [RFC 36/44] drm/i915: Added debug state dump facilities to scheduler
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

When debugging batch buffer submission issues, it is useful to be able to see
what the current state of the scheduler is. This change adds functions for
decoding the internal scheduler state and reporting it.
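For illustration, a hypothetical call site (only the dump functions themselves
are added by this patch) might look like:

    /* Report the full scheduler state when a hang is suspected. This dumps
     * every ring with per-node detail and dependency lists by setting the
     * i915_sf_dump_* flags internally. */
    static void i915_report_scheduler_on_hang(struct drm_device *dev)
    {
        i915_scheduler_dump_all(dev, "hang check");
    }

A single ring can be dumped with i915_scheduler_dump(), or with
i915_scheduler_dump_locked() if the scheduler lock is already held.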
---
 drivers/gpu/drm/i915/i915_scheduler.c |  255 +++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_scheduler.h |   17 +++
 2 files changed, 272 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 6782249..7c03fb7 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -37,6 +37,101 @@ bool i915_scheduler_is_enabled(struct drm_device *dev)
 
 #ifdef CONFIG_DRM_I915_SCHEDULER
 
+const char *i915_qe_state_str(struct i915_scheduler_queue_entry *node)
+{
+	static char	str[50];
+	char		*ptr = str;
+
+	*(ptr++) = node->bumped ? 'B' : '-';
+
+	*ptr = 0;
+
+	return str;
+}
+
+char i915_scheduler_queue_status_chr(enum i915_scheduler_queue_status status)
+{
+	switch (status) {
+	case i915_sqs_none:
+	return 'N';
+
+	case i915_sqs_queued:
+	return 'Q';
+
+	case i915_sqs_flying:
+	return 'F';
+
+	case i915_sqs_complete:
+	return 'C';
+
+	default:
+	break;
+	}
+
+	return '?';
+}
+
+const char *i915_scheduler_queue_status_str(
+				enum i915_scheduler_queue_status status)
+{
+	static char	str[50];
+
+	switch (status) {
+	case i915_sqs_none:
+	return "None";
+
+	case i915_sqs_queued:
+	return "Queued";
+
+	case i915_sqs_flying:
+	return "Flying";
+
+	case i915_sqs_complete:
+	return "Complete";
+
+	default:
+	break;
+	}
+
+	sprintf(str, "[Unknown_%d!]", status);
+	return str;
+}
+
+const char *i915_scheduler_flag_str(uint32_t flags)
+{
+	static char     str[100];
+	char           *ptr = str;
+
+	*ptr = 0;
+
+#define TEST_FLAG(flag, msg)						\
+	if (flags & (flag)) {						\
+		strcpy(ptr, msg);					\
+		ptr += strlen(ptr);					\
+		flags &= ~(flag);					\
+	}
+
+	TEST_FLAG(i915_sf_interrupts_enabled, "IntOn|");
+	TEST_FLAG(i915_sf_submitting,         "Submitting|");
+	TEST_FLAG(i915_sf_dump_force,         "DumpForce|");
+	TEST_FLAG(i915_sf_dump_details,       "DumpDetails|");
+	TEST_FLAG(i915_sf_dump_dependencies,  "DumpDeps|");
+
+#undef TEST_FLAG
+
+	if (flags) {
+		sprintf(ptr, "Unknown_0x%X!", flags);
+		ptr += strlen(ptr);
+	}
+
+	if (ptr == str)
+		strcpy(str, "-");
+	else
+		ptr[-1] = 0;
+
+	return str;
+};
+
 int i915_scheduler_init(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -589,6 +684,166 @@ int i915_scheduler_remove(struct intel_engine_cs *ring)
 	return ret;
 }
 
+int i915_scheduler_dump_all(struct drm_device *dev, const char *msg)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+	unsigned long   flags;
+	int             ret;
+
+	spin_lock_irqsave(&scheduler->lock, flags);
+	ret = i915_scheduler_dump_all_locked(dev, msg);
+	spin_unlock_irqrestore(&scheduler->lock, flags);
+
+	return ret;
+}
+
+int i915_scheduler_dump_all_locked(struct drm_device *dev, const char *msg)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+	struct intel_engine_cs  *ring;
+	int                     i, r, ret = 0;
+
+	for_each_ring(ring, dev_priv, i) {
+		scheduler->flags[ring->id] |= i915_sf_dump_force   |
+					      i915_sf_dump_details |
+					      i915_sf_dump_dependencies;
+		r = i915_scheduler_dump_locked(ring, msg);
+		if (ret == 0)
+			ret = r;
+	}
+
+	return ret;
+}
+
+int i915_scheduler_dump(struct intel_engine_cs *ring, const char *msg)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+	unsigned long   flags;
+	int             ret;
+
+	spin_lock_irqsave(&scheduler->lock, flags);
+	ret = i915_scheduler_dump_locked(ring, msg);
+	spin_unlock_irqrestore(&scheduler->lock, flags);
+
+	return ret;
+}
+
+int i915_scheduler_dump_locked(struct intel_engine_cs *ring, const char *msg)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+	struct i915_scheduler_queue_entry  *node;
+	int                 flying = 0, queued = 0, complete = 0, other = 0;
+	static int          old_flying = -1, old_queued = -1, old_complete = -1;
+	bool                b_dumped = false, b_dump;
+	char                brkt[2] = { '<', '>' };
+
+	if (!ring)
+		return -EINVAL;
+
+	list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
+		if (I915_SQS_IS_QUEUED(node))
+			queued++;
+		else if (I915_SQS_IS_FLYING(node))
+			flying++;
+		else if (I915_SQS_IS_COMPLETE(node))
+			complete++;
+		else
+			other++;
+	}
+
+	b_dump = (flying != old_flying) ||
+		 (queued != old_queued) ||
+		 (complete != old_complete);
+	if (scheduler->flags[ring->id] & i915_sf_dump_force) {
+		if (!b_dump) {
+			b_dump = true;
+			brkt[0] = '{';
+			brkt[1] = '}';
+		}
+
+		scheduler->flags[ring->id] &= ~i915_sf_dump_force;
+	}
+
+	if (b_dump) {
+		old_flying   = flying;
+		old_queued   = queued;
+		old_complete = complete;
+		DRM_DEBUG_SCHED("<%s> Q:%02d, F:%02d, C:%02d, O:%02d, " \
+				"Flags = %s, OLR = %d %c%s%c\n",
+				ring->name, queued, flying, complete, other,
+				i915_scheduler_flag_str(scheduler->flags[ring->id]),
+				ring->outstanding_lazy_seqno,
+				brkt[0], msg, brkt[1]);
+		b_dumped = true;
+	} /* else
+		DRM_DEBUG_SCHED("<%s> Q:%02d, F:%02d, C:%02d, O:%02d" \
+				", Flags = %s, OLR = %d [%s]\n",
+				ring->name,
+				queued, flying, complete, other,
+				i915_scheduler_flag_str(scheduler->flags[ring->id]),
+				ring->outstanding_lazy_seqno, msg); */
+
+	if (b_dumped && (scheduler->flags[ring->id] & i915_sf_dump_details)) {
+		uint32_t    seqno;
+		int         i, deps;
+		uint32_t    count, counts[i915_sqs_MAX];
+
+		memset(counts, 0x00, sizeof(counts));
+
+		seqno = ring->get_seqno(ring, true);
+		list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
+			if (node->status < i915_sqs_MAX) {
+				count = counts[node->status]++;
+			} else {
+				DRM_DEBUG_SCHED("<%s>   Unknown status: %d!\n",
+						ring->name, node->status);
+				count = -1;
+			}
+
+			deps = 0;
+			for (i = 0; i < node->num_deps; i++)
+				if (i915_scheduler_is_dependency_valid(node, i))
+					deps++;
+
+			DRM_DEBUG_SCHED("<%s>   %c:%02d> index = %d, seqno" \
+					" = %d/%s, deps = %d / %d, %s [pri = " \
+					"%4d]\n", ring->name,
+					i915_scheduler_queue_status_chr(node->status),
+					count,
+					node->params.scheduler_index,
+					node->params.seqno,
+					node->params.ring->name,
+					deps, node->num_deps,
+					i915_qe_state_str(node),
+					node->priority);
+
+			if ((scheduler->flags[ring->id] & i915_sf_dump_dependencies)
+				== 0)
+				continue;
+
+			for (i = 0; i < node->num_deps; i++)
+				if (node->dep_list[i])
+					DRM_DEBUG_SCHED("<%s>       |-%c:" \
+						"%02d%c seqno = %d/%s, %s [pri = %4d]\n",
+						ring->name,
+						i915_scheduler_queue_status_chr(node->dep_list[i]->status),
+						i,
+						i915_scheduler_is_dependency_valid(node, i)
+							? '>' : '#',
+						node->dep_list[i]->params.seqno,
+						node->dep_list[i]->params.ring->name,
+						i915_qe_state_str(node->dep_list[i]),
+						node->dep_list[i]->priority);
+		}
+	}
+
+	return 0;
+}
+
 int i915_scheduler_flush_seqno(struct intel_engine_cs *ring, bool is_locked,
 			       uint32_t seqno)
 {
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 78a92c9..bbfd13c 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -59,6 +59,9 @@ enum i915_scheduler_queue_status {
 	/* Limit value for use with arrays/loops */
 	i915_sqs_MAX
 };
+char i915_scheduler_queue_status_chr(enum i915_scheduler_queue_status status);
+const char *i915_scheduler_queue_status_str(
+				enum i915_scheduler_queue_status status);
 
 #define I915_SQS_IS_QUEUED(node)	(((node)->status == i915_sqs_queued))
 #define I915_SQS_IS_FLYING(node)	(((node)->status == i915_sqs_flying))
@@ -80,6 +83,7 @@ struct i915_scheduler_queue_entry {
 	struct timespec                     stamp;
 	struct list_head                    link;
 };
+const char *i915_qe_state_str(struct i915_scheduler_queue_entry *node);
 
 #ifdef CONFIG_DRM_I915_SCHEDULER
 #   define I915_SCHEDULER_FLUSH_ALL(ring, locked)                            \
@@ -117,9 +121,16 @@ struct i915_scheduler {
 
 /* Flag bits for i915_scheduler::flags */
 enum {
+	/* Internal state */
 	i915_sf_interrupts_enabled  = (1 << 0),
 	i915_sf_submitting          = (1 << 1),
+
+	/* Dump/debug flags */
+	i915_sf_dump_force          = (1 << 8),
+	i915_sf_dump_details        = (1 << 9),
+	i915_sf_dump_dependencies   = (1 << 10),
 };
+const char *i915_scheduler_flag_str(uint32_t flags);
 
 /* Options for 'scheduler_override' module parameter: */
 enum {
@@ -142,6 +153,12 @@ int         i915_scheduler_submit_max_priority(struct intel_engine_cs *ring,
 					       bool is_locked);
 uint32_t    i915_scheduler_count_flying(struct i915_scheduler *scheduler,
 					struct intel_engine_cs *ring);
+int         i915_scheduler_dump(struct intel_engine_cs *ring,
+				const char *msg);
+int         i915_scheduler_dump_locked(struct intel_engine_cs *ring,
+				       const char *msg);
+int         i915_scheduler_dump_all(struct drm_device *dev, const char *msg);
+int         i915_scheduler_dump_all_locked(struct drm_device *dev, const char *msg);
 void        i915_scheduler_priority_bump_clear(struct i915_scheduler *scheduler,
 					       struct intel_engine_cs *ring);
 int         i915_scheduler_priority_bump(struct i915_scheduler *scheduler,
-- 
1.7.9.5


* [RFC 37/44] drm/i915: Added facility for cancelling an outstanding request
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

If the scheduler pre-empts a batch buffer that is queued in the ring, or even
already executing in the ring, then that buffer must be returned to the 'queued
in software' state. Part of this re-queueing is cleaning up the request
structure associated with the original submission.
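For illustration, the roll-back path might use the new call roughly as follows
(the helper below is a sketch, not code from this patch):

    /* Sketch: when a flying batch is unwound after a pre-emption, drop the
     * request that was created for it so that a fresh one can be generated
     * when the batch is resubmitted. */
    static void i915_scheduler_unwind_request(struct i915_scheduler_queue_entry *node)
    {
        struct intel_engine_cs *ring = node->params.ring;

        if (i915_gem_cancel_request(ring, node->params.seqno) == 0)
            DRM_DEBUG_SCHED("<%s> no request found for seqno %u\n",
                            ring->name, node->params.seqno);
    }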
---
 drivers/gpu/drm/i915/i915_drv.h |    1 +
 drivers/gpu/drm/i915/i915_gem.c |   16 ++++++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 872e869..f8980c0 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2319,6 +2319,7 @@ int __i915_add_request(struct intel_engine_cs *ring,
 	__i915_add_request(ring, NULL, NULL, seqno, true)
 #define i915_add_request_wo_flush(ring) \
 	__i915_add_request(ring, NULL, NULL, NULL, false)
+int i915_gem_cancel_request(struct intel_engine_cs *ring, u32 seqno);
 int __must_check i915_wait_seqno(struct intel_engine_cs *ring,
 				 uint32_t seqno);
 int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 1c508b7..dd0fac8 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2655,6 +2655,22 @@ void i915_gem_reset(struct drm_device *dev)
 	i915_gem_restore_fences(dev);
 }
 
+int
+i915_gem_cancel_request(struct intel_engine_cs *ring, u32 seqno)
+{
+	struct drm_i915_gem_request *req, *next;
+	int found = 0;
+
+	list_for_each_entry_safe(req, next, &ring->request_list, list) {
+		if (req->seqno == seqno) {
+			found += 1;
+			i915_gem_free_request(req);
+		}
+	}
+
+	return found;
+}
+
 /**
  * This function clears the request list as sequence numbers are passed.
  */
-- 
1.7.9.5


* [RFC 38/44] drm/i915: Add early exit to execbuff_final() if insufficient ring space
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

One of the major purposes of the GPU scheduler is to avoid stalling the CPU when
the GPU is busy and unable to accept more work. This change adds support to the
ring submission code to allow a ring space check to be performed before
attempting to submit a batch buffer to the hardware. If insufficient space is
available then the scheduler can go away and come back later, letting the CPU
get on with other work, rather than stalling and waiting for the hardware to
catch up.
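The contract is 'ask first, then reserve': check for the worst-case amount of
space and bail out with -EAGAIN, for the scheduler to retry later, before
emitting anything at all. A condensed sketch of the pattern (the helper is
illustrative; 256 dwords is the same worst-case figure used in the patch
below):

    /* Hypothetical submission helper showing the non-blocking pattern. */
    static int submit_batch_nonblocking(struct intel_engine_cs *ring)
    {
        const int max_dwords = 256;
        int ret;

        /* Double the requirement because the reserved block must not wrap
         * around the end of the ring. */
        ret = intel_ring_test_space(ring, max_dwords * 2 * sizeof(uint32_t));
        if (ret)
            return ret;    /* -EAGAIN: come back when space has freed up */

        ret = intel_ring_begin(ring, max_dwords);
        if (ret)
            return ret;

        /* ... emit the pre-batch commands, the batch start and the
         * post-batch commands here ... */

        intel_ring_advance(ring);
        return 0;
    }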
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   44 +++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/intel_ringbuffer.c    |   34 +++++++++++++++++----
 drivers/gpu/drm/i915/intel_ringbuffer.h    |    2 ++
 3 files changed, 73 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 3227a39..a9570ff 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1490,6 +1490,36 @@ int i915_gem_do_execbuffer_final(struct i915_execbuffer_params *params)
 		goto early_err;
 	}
 
+#ifdef CONFIG_DRM_I915_SCHEDULER
+{
+	uint32_t min_space;
+
+	/*
+	 * It would be a bad idea to run out of space while writing commands
+	 * to the ring. One of the major aims of the scheduler is to not stall
+	 * at any point for any reason. However, doing an early exit half way
+	 * through submission could result in a partial sequence being written
+	 * which would leave the engine in an unknown state. Therefore, check in
+	 * advance that there will be enough space for the entire submission
+	 * whether emitted by the code below OR by any other functions that may
+	 * be executed before the end of final().
+	 *
+	 * NB: This test deliberately overestimates, because that's easier than
+	 * tracing every potential path that could be taken!
+	 *
+	 * Current measurements suggest that we may need to emit up to 744 bytes
+	 * (186 dwords), so this is rounded up to 256 dwords here. Then we double
+	 * that to get the free space requirement, because the block isn't allowed
+	 * to span the transition from the end to the beginning of the ring.
+	 */
+#define I915_BATCH_EXEC_MAX_LEN         256	/* max dwords emitted here	*/
+	min_space = I915_BATCH_EXEC_MAX_LEN * 2 * sizeof(uint32_t);
+	ret = intel_ring_test_space(ring, min_space);
+	if (ret)
+		goto early_err;
+}
+#endif
+
 	intel_runtime_pm_get(dev_priv);
 
 	/* Ensure the correct seqno gets assigned to the correct buffer: */
@@ -1500,6 +1530,16 @@ int i915_gem_do_execbuffer_final(struct i915_execbuffer_params *params)
 
 	seqno = params->seqno;
 
+#ifdef CONFIG_DRM_I915_SCHEDULER
+	ret = intel_ring_begin(ring, I915_BATCH_EXEC_MAX_LEN);
+	if (ret)
+		goto err;
+#endif
+
+	/* Seqno matches? */
+	BUG_ON(ring->outstanding_lazy_seqno    != params->seqno);
+	BUG_ON(ring->preallocated_lazy_request != params->request);
+
 	/* Unconditionally invalidate gpu caches and ensure that we do flush
 	 * any residual writes from the previous batch.
 	 */
@@ -1518,9 +1558,11 @@ int i915_gem_do_execbuffer_final(struct i915_execbuffer_params *params)
 
 	if (ring == &dev_priv->ring[RCS] &&
 	    params->mode != dev_priv->relative_constants_mode) {
+#ifndef CONFIG_DRM_I915_SCHEDULER
 		ret = intel_ring_begin(ring, 4);
 		if (ret)
-				goto err;
+			goto err;
+#endif
 
 		intel_ring_emit(ring, MI_NOOP);
 		intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 1ad162b..640f26f 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -49,7 +49,7 @@ static inline int __ring_space(int head, int tail, int size)
 	return space;
 }
 
-static inline int ring_space(struct intel_engine_cs *ring)
+inline int intel_ring_space(struct intel_engine_cs *ring)
 {
 	struct intel_ringbuffer *ringbuf = ring->buffer;
 	return __ring_space(ringbuf->head & HEAD_ADDR, ringbuf->tail, ringbuf->size);
@@ -546,7 +546,7 @@ static int init_ring_common(struct intel_engine_cs *ring)
 	else {
 		ringbuf->head = I915_READ_HEAD(ring);
 		ringbuf->tail = I915_READ_TAIL(ring) & TAIL_ADDR;
-		ringbuf->space = ring_space(ring);
+		ringbuf->space = intel_ring_space(ring);
 		ringbuf->last_retired_head = -1;
 	}
 
@@ -1530,7 +1530,7 @@ static int intel_ring_wait_request(struct intel_engine_cs *ring, int n)
 		ringbuf->head = ringbuf->last_retired_head;
 		ringbuf->last_retired_head = -1;
 
-		ringbuf->space = ring_space(ring);
+		ringbuf->space = intel_ring_space(ring);
 		if (ringbuf->space >= n)
 			return 0;
 	}
@@ -1553,7 +1553,7 @@ static int intel_ring_wait_request(struct intel_engine_cs *ring, int n)
 	ringbuf->head = ringbuf->last_retired_head;
 	ringbuf->last_retired_head = -1;
 
-	ringbuf->space = ring_space(ring);
+	ringbuf->space = intel_ring_space(ring);
 	return 0;
 }
 
@@ -1582,7 +1582,7 @@ static int ring_wait_for_space(struct intel_engine_cs *ring, int n)
 	trace_i915_ring_wait_begin(ring);
 	do {
 		ringbuf->head = I915_READ_HEAD(ring);
-		ringbuf->space = ring_space(ring);
+		ringbuf->space = intel_ring_space(ring);
 		if (ringbuf->space >= n) {
 			ret = 0;
 			break;
@@ -1634,7 +1634,7 @@ static int intel_wrap_ring_buffer(struct intel_engine_cs *ring)
 		iowrite32(MI_NOOP, virt++);
 
 	ringbuf->tail = 0;
-	ringbuf->space = ring_space(ring);
+	ringbuf->space = intel_ring_space(ring);
 
 	return 0;
 }
@@ -1767,6 +1767,28 @@ int intel_ring_cacheline_align(struct intel_engine_cs *ring)
 	return 0;
 }
 
+/* Test to see if the ring has sufficient space to submit a given piece of work
+ * without causing a stall */
+int intel_ring_test_space(struct intel_engine_cs *ring, int min_space)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct intel_ringbuffer *ringbuf  = ring->buffer;
+
+	if (ringbuf->space < min_space) {
+		/* Need to update the actual ring space. Otherwise, the system
+		 * hangs forever testing a software copy of the space value that
+		 * never changes!
+		 */
+		ringbuf->head  = I915_READ_HEAD(ring);
+		ringbuf->space = intel_ring_space(ring);
+
+		if (ringbuf->space < min_space)
+			return -EAGAIN;
+	}
+
+	return 0;
+}
+
 void intel_ring_init_seqno(struct intel_engine_cs *ring, u32 seqno)
 {
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index cc92de2..cf9a535 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -345,6 +345,8 @@ intel_write_status_page(struct intel_engine_cs *ring,
 void intel_stop_ring_buffer(struct intel_engine_cs *ring);
 void intel_cleanup_ring_buffer(struct intel_engine_cs *ring);
 
+int intel_ring_space(struct intel_engine_cs *ring);
+int intel_ring_test_space(struct intel_engine_cs *ring, int min_space);
 int __must_check intel_ring_begin(struct intel_engine_cs *ring, int n);
 int __must_check intel_ring_cacheline_align(struct intel_engine_cs *ring);
 int __must_check intel_ring_alloc_seqno(struct intel_engine_cs *ring);
-- 
1.7.9.5


* [RFC 39/44] drm/i915: Added support for pre-emptive scheduling
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

Added support for pre-empting batch buffers that have already been submitted to
the ring. Currently this implements Gen7-level pre-emption, which means
pre-empting only at voluntary points within the batch buffer. The ring
submission code itself adds such points between batch buffers, and the OpenCL
driver should be adding them within GPGPU-specific batch buffers. Other types
of workload cannot be pre-empted by the hardware and so will not have
pre-emption points added to their buffers.

When a pre-emption occurs, the scheduler must work out which buffers were
pre-empted and which actually managed to complete first and, for the last
buffer that was pre-empted, whether it was stopped mid-batch or had not yet
begun to execute. This is done by extending the seqno mechanism to four slots:
batch buffer start, batch buffer end, pre-emption start and pre-emption end. By
querying these four numbers (and only allowing a single pre-emption event at a
time) the scheduler can guarantee to work out exactly what happened to every
batch buffer that had been submitted to the ring.

A Kconfig option has also been added to allow pre-emption support to be enabled
or disabled.
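A rough sketch of the kind of test the four status page slots make possible
(the helper and its name are illustrative; the real classification logic is
added to i915_scheduler.c by this patch):

    /* Was the most recent regular batch stopped part way through? The
     * preamble writes the seqno of the batch that is starting into the
     * 'active' slot and the postamble moves it to the 'done' slot, so an
     * 'active' value that never reached 'done' means the batch was
     * pre-empted mid-execution and must be resubmitted. The
     * I915_PREEMPTIVE_*_SEQNO slots answer the same question for the
     * (single) preemptive batch itself. */
    static bool preempted_mid_batch(struct intel_engine_cs *ring)
    {
        u32 done   = intel_read_status_page(ring, I915_BATCH_DONE_SEQNO);
        u32 active = intel_read_status_page(ring, I915_BATCH_ACTIVE_SEQNO);

        return active != 0 && active != done;
    }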
---
 drivers/gpu/drm/i915/Kconfig               |    8 +
 drivers/gpu/drm/i915/i915_gem.c            |   12 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  273 ++++++++++++++++
 drivers/gpu/drm/i915/i915_scheduler.c      |  467 +++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_scheduler.h      |   25 +-
 drivers/gpu/drm/i915/i915_trace.h          |   23 +-
 drivers/gpu/drm/i915/intel_ringbuffer.h    |    4 +
 7 files changed, 797 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
index 22a036b..b94d4c7 100644
--- a/drivers/gpu/drm/i915/Kconfig
+++ b/drivers/gpu/drm/i915/Kconfig
@@ -89,3 +89,11 @@ config DRM_I915_SCHEDULER
 	help
 	  Choose this option to enable GPU task scheduling for improved
 	  performance and efficiency.
+
+config DRM_I915_SCHEDULER_PREEMPTION
+	bool "Enable pre-emption within the GPU scheduler"
+	depends on DRM_I915_SCHEDULER
+	default y
+	help
+	  Choose this option to enable pre-emptive context switching within the
+	  GPU scheduler for even more performance and efficiency improvements.
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index dd0fac8..2cb4484 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2312,6 +2312,18 @@ i915_gem_init_seqno(struct drm_device *dev, u32 seqno)
 			ring->semaphore.sync_seqno[j] = 0;
 	}
 
+#ifdef CONFIG_DRM_I915_SCHEDULER_PREEMPTION
+	/* Also reset sw batch tracking state */
+	for_each_ring(ring, dev_priv, i) {
+		ring->last_regular_batch = 0;
+		ring->last_preemptive_batch = 0;
+		intel_write_status_page(ring, I915_BATCH_DONE_SEQNO, 0);
+		intel_write_status_page(ring, I915_BATCH_ACTIVE_SEQNO, 0);
+		intel_write_status_page(ring, I915_PREEMPTIVE_DONE_SEQNO, 0);
+		intel_write_status_page(ring, I915_PREEMPTIVE_ACTIVE_SEQNO, 0);
+	}
+#endif
+
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index a9570ff..81acdf2 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1470,6 +1470,238 @@ pre_mutex_err:
 	return ret;
 }
 
+#ifdef CONFIG_DRM_I915_SCHEDULER_PREEMPTION
+/*
+ * The functions below emit opcodes into the ring buffer.
+ * The simpler ones insert a single instruction, whereas the
+ * prequel/preamble/postamble functions generate a sequence
+ * of operations according to the nature of the current batch.
+ * Top among them is i915_gem_do_execbuffer_final() which is
+ * called by the scheduler to pass a batch to the hardware.
+ *
+ * There are three different types of batch handled here:
+ * 1.	non-preemptible batches (using the default context)
+ * 2.	preemptible batches (using a non-default context)
+ * 3.	preemptive batches (using a non-default context)
+ * and three points at which the code paths vary (prequel, at the very
+ * start of per-batch processing; preamble, just before the call to the
+ * batch buffer; and postamble, which runs after the batch buffer completes).
+ *
+ * The preamble is simple; it logs the sequence number of the batch that's
+ * about to start, and enables or disables preemption for the duration of
+ * the batch. The postamble is similar: it logs the sequence number of the
+ * batch that's just finished, and clears the in-progress sequence number
+ * (except for preemptive batches, where this is deferred to the interrupt
+ * handler).
+ *
+ * The prequel is the part that differs most. In the case of a regular batch,
+ * it contains an ARB ON/ARB CHECK sequence that allows preemption before
+ * the batch starts. The preemptive prequel, on the other hand, is more
+ * complex; see the description below ...
+ */
+
+/*
+ * Emit an MI_STORE_DWORD_INDEX instruction.
+ * This stores the specified value in the (index)th DWORD of the hardware status page.
+ */
+static uint32_t
+emit_store_dw_index(struct intel_engine_cs *ring, uint32_t value, uint32_t index)
+{
+	uint32_t vptr;
+	intel_ring_emit(ring, MI_STORE_DWORD_INDEX);
+	intel_ring_emit(ring, index << MI_STORE_DWORD_INDEX_SHIFT);
+	vptr = intel_ring_get_tail(ring);
+	intel_ring_emit(ring, value);
+	return vptr;
+}
+
+/*
+ * Emit an MI_STORE_REGISTER_MEM instruction.
+ * This stores the specified register in the (index)th DWORD of the memory
+ * area pointed to by base (which is actually the hardware status page).
+ */
+static void
+emit_store_reg_index(struct intel_engine_cs *ring, uint32_t reg, uint32_t base, uint32_t index)
+{
+	intel_ring_emit(ring, MI_STORE_REG_MEM | MI_STORE_REG_MEM_GTT);
+	intel_ring_emit(ring, reg);
+	intel_ring_emit(ring, base+(index << MI_STORE_DWORD_INDEX_SHIFT));
+}
+
+/*
+ * Emit the commands to check for preemption before starting a regular batch
+ */
+static void
+emit_regular_prequel(struct intel_engine_cs *ring, uint32_t seqno, uint32_t start)
+{
+	/* Log the ring address of the batch we're starting BEFORE the ARB CHECK */
+	emit_store_dw_index(ring, start, I915_BATCH_ACTIVE_ADDR);
+	intel_ring_emit(ring, MI_REPORT_HEAD);
+
+	/* Ensure Arbitration is enabled, then check for pending preemption */
+	intel_ring_emit(ring, MI_ARB_ON_OFF | MI_ARB_ENABLE);
+	intel_ring_emit(ring, MI_ARB_CHECK);
+	/* 6 dwords so far */
+}
+
+/*
+ * Emit the commands that prefix a preemptive batch.
+ *
+ * The difficulty here is that the engine is asynchronous. It may have already
+ * stopped with HEAD == TAIL, or it may still be running. If still running, it
+ * could execute an ARB CHECK instruction at ANY time.
+ *
+ * Therefore, it is unsafe to write UHPTR first and then update TAIL because
+ * an ARB_CHECK might trigger a jump between the two. This would set HEAD to
+ * be *after* TAIL which the engine would interpret as being a VERY looooong
+ * way *BEHIND* TAIL.
+ *
+ * OTOH, if TAIL is written first and then UHPTR, the engine might run the new
+ * code before the update of UHPTR has occurred. It would then stop when
+ * HEAD == (new) TAIL and the updated UHPTR would be ignored leaving the
+ * preemption pending until later!
+ *
+ * In addition, it is necessary to distinguish in the interrupt handler whether
+ * the ring was in fact idle by the time preemption took place. I.e. there were
+ * no ARB CHECK commands between HEAD at the time when UHPTR was set and the
+ * start of the preemptive batch that is being constructed.
+ *
+ * The solution is to first construct a 'landing zone' containing at least one
+ * instruction whose execution can be detected (in this case, a STORE and an
+ * ARB_ENABLE) and advance TAIL over it. Then set UHPTR to the same value as
+ * the new TAIL.
+ *
+ * If an (enabled) ARB_CHECK instruction is executed before the next update to
+ * TAIL, the engine will update HEAD to the value of UHPTR and then stop as the
+ * new value of HEAD will match TAIL. OTOH if no further ARB_CHECK instructions
+ * are reached, the engine will eventually run into the landing zone and again
+ * stop at the same point (but with preemption still pending).
+ *
+ * Thus, a second zone is added that *starts* with an ARB_CHECK. If (and only
+ * if) preemption has not yet occurred, this will cause a jump to the location
+ * given by UHPTR (which is its own address!). As a side effect, the VALID bit
+ * of UHPTR is cleared, so when the same ARB_CHECK is executed again, it now
+ * has no effect.
+ *
+ * Either way, the engine reaches the end of the second landing zone with
+ * preemption having occurred exactly once, so there's no surprise left lurking
+ * for later. If the new batch work has already been added by the time this
+ * happens, it can continue immediately. Otherwise, the engine will stop until
+ * the next update to TAIL after the batch call is added.
+ */
+static void
+emit_preemptive_prequel(struct intel_engine_cs *ring, uint32_t seqno, uint32_t start)
+{
+	/* 'dev_priv' is required by the WRITE_UHPTR() macro! :-( */
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	uint32_t i, hwpa, jump;
+
+	/* Part 1, reached only if the ring is idle */
+	emit_store_dw_index(ring, seqno, I915_BATCH_ACTIVE_SEQNO);
+	intel_ring_emit(ring, MI_ARB_ON_OFF | MI_ARB_ENABLE);
+	/* 4 dwords so far */
+	intel_ring_advance(ring);
+	jump = intel_ring_get_tail(ring);
+	BUG_ON(jump & UHPTR_GFX_ADDR_ALIGN);
+
+	I915_WRITE_UHPTR(ring, jump | UHPTR_VALID);
+
+	/* May jump to itself! */
+	intel_ring_emit(ring, MI_ARB_CHECK);
+
+	/* Log the ring address of the batch we're starting AFTER the ARB CHECK */
+	emit_store_dw_index(ring, start, I915_PREEMPTIVE_ACTIVE_ADDR);
+	/* 8 dwords so far */
+
+	{
+		/*
+		 * Unfortunately not everything we need is automatically saved by a
+		 * context switch, so we have to explicitly save some registers here.
+		 */
+		static const u32 regs[][2] = {
+			{	RING_PREEMPT_ADDR,		I915_SAVE_PREEMPTED_RING_PTR	},
+			{	BB_PREEMPT_ADDR,		I915_SAVE_PREEMPTED_BB_PTR	},
+			{	SBB_PREEMPT_ADDR,		I915_SAVE_PREEMPTED_SBB_PTR	},
+			{	RS_PREEMPT_STATUS,		I915_SAVE_PREEMPTED_STATUS	},
+
+			{	RING_HEAD(RENDER_RING_BASE),	I915_SAVE_PREEMPTED_HEAD	},
+			{	RING_TAIL(RENDER_RING_BASE),	I915_SAVE_PREEMPTED_TAIL	},
+			{	RING_UHPTR(RENDER_RING_BASE),	I915_SAVE_PREEMPTED_UHPTR	},
+			{	NOPID,				I915_SAVE_PREEMPTED_NOPID	}
+		};
+
+		/* This loop generates another 24 dwords, for a total of 36 so far */
+		hwpa = i915_gem_obj_ggtt_offset(ring->status_page.obj);
+		for (i = 0; i < ARRAY_SIZE(regs); ++i)
+			emit_store_reg_index(ring, regs[i][0], hwpa, regs[i][1]);
+	}
+}
+
+/*
+ * Emit the commands that immediately prefix execution of a batch.
+ *
+ * The GPU will log the seqno of the batch as it starts running it,
+ * then enable or disable preemption checks during this batch.
+ */
+static void
+emit_preamble(struct intel_engine_cs *ring, uint32_t seqno, struct intel_context *ctx, bool preemptive)
+{
+	emit_store_dw_index(ring, seqno, preemptive ? I915_PREEMPTIVE_ACTIVE_SEQNO : I915_BATCH_ACTIVE_SEQNO);
+	if (preemptive || i915_gem_context_is_default(ctx))
+		intel_ring_emit(ring, MI_ARB_ON_OFF | MI_ARB_DISABLE);
+	else
+		intel_ring_emit(ring, MI_ARB_ON_OFF | MI_ARB_ENABLE);
+	/* 4 dwords so far */
+}
+
+/*
+ * Emit the commands that immediately follow execution of a batch.
+ *
+ * The GPU will:
+ * 1) log the end address of the batch we've completed
+ * 2) log the seqno of the batch we've just completed.
+ * 3) in the case of a non-preemptive batch, clear the in-progress sequence
+ *    number; otherwise, issue a dummy register store to flush the above
+ *    writes before the interrupt happens.
+ */
+static void
+emit_postamble(struct intel_engine_cs *ring, uint32_t seqno, uint32_t start, bool preemptive)
+{
+	uint32_t eptr, end;
+
+	if (intel_ring_begin(ring, 10))
+		return;
+
+	/*
+	 * Note that the '~0u' in this call is a placeholder - the actual address
+	 * will be calculated later in this function and retroactively patched
+	 * into this dword!
+	 */
+	eptr = emit_store_dw_index(ring, ~0u, preemptive ? I915_PREEMPTIVE_ACTIVE_END : I915_BATCH_ACTIVE_END);
+	emit_store_dw_index(ring, seqno, preemptive ? I915_PREEMPTIVE_DONE_SEQNO : I915_BATCH_DONE_SEQNO);
+	if (preemptive) {
+		uint32_t hwpa = i915_gem_obj_ggtt_offset(ring->status_page.obj);
+		emit_store_reg_index(ring, NOPID, hwpa, I915_SAVE_PREEMPTED_NOPID);
+	} else {
+		emit_store_dw_index(ring, 0, I915_BATCH_ACTIVE_SEQNO);
+	}
+	intel_ring_emit(ring, MI_NOOP);
+	/* 10 dwords so far */
+
+	end = intel_ring_get_tail(ring);
+
+	/* Stash the batch bounds for use by the interrupt handler */
+	intel_write_status_page(ring, I915_GEM_BATCH_START_ADDR, start);
+	intel_write_status_page(ring, I915_GEM_BATCH_END_ADDR, end);
+
+	BUG_ON(eptr & UHPTR_GFX_ADDR_ALIGN);
+	BUG_ON(end & UHPTR_GFX_ADDR_ALIGN);
+
+	/* Go back and patch the end-batch address inserted above */
+	iowrite32(end, ring->buffer->virtual_start + eptr);
+}
+#endif  /* CONFIG_DRM_I915_SCHEDULER_PREEMPTION */
+
 /*
  * This is the main function for adding a batch to the ring.
  * It is called from the scheduler, with the struct_mutex already held.
@@ -1480,6 +1712,10 @@ int i915_gem_do_execbuffer_final(struct i915_execbuffer_params *params)
 	struct intel_engine_cs  *ring = params->ring;
 	u64 exec_start, exec_len;
 	int ret, i;
+	bool preemptive;
+#ifdef CONFIG_DRM_I915_SCHEDULER_PREEMPTION
+	u32 start;
+#endif
 	u32 seqno;
 
 	/* The mutex must be acquired before calling this function */
@@ -1547,6 +1783,22 @@ int i915_gem_do_execbuffer_final(struct i915_execbuffer_params *params)
 	if (ret)
 		goto err;
 
+	preemptive = (params->scheduler_flags & i915_ebp_sf_preempt) != 0;
+#ifndef CONFIG_DRM_I915_SCHEDULER_PREEMPTION
+	/* The scheduler must not request preemption if support wasn't compiled in */
+	BUG_ON(preemptive);
+#endif
+
+#ifdef CONFIG_DRM_I915_SCHEDULER_PREEMPTION
+	start = intel_ring_get_tail(ring);
+	BUG_ON(start & UHPTR_GFX_ADDR_ALIGN);
+
+	if (preemptive)
+		emit_preemptive_prequel(ring, seqno, start);
+	else
+		emit_regular_prequel(ring, seqno, start);
+#endif
+
 	/* Switch to the correct context for the batch */
 	ret = i915_switch_context(ring, params->ctx);
 	if (ret)
@@ -1583,10 +1835,26 @@ int i915_gem_do_execbuffer_final(struct i915_execbuffer_params *params)
 	BUG_ON(ring->outstanding_lazy_seqno    != params->seqno);
 	BUG_ON(ring->preallocated_lazy_request != params->request);
 
+#ifdef CONFIG_DRM_I915_SCHEDULER_PREEMPTION
+	/*
+	 * Log the seqno of the batch we're starting
+	 * Enable/disable preemption checks during this batch
+	 */
+	emit_preamble(ring, seqno, params->ctx, preemptive);
+#endif
+
 	exec_len   = params->args_batch_len;
 	exec_start = params->batch_obj_vm_offset +
 		     params->args_batch_start_offset;
 
+#ifdef CONFIG_DRM_I915_SCHEDULER_PREEMPTION
+	if (params->preemption_point) {
+		uint32_t preemption_offset = params->preemption_point - exec_start;
+		exec_start += preemption_offset;
+		exec_len   -= preemption_offset;
+	}
+#endif
+
 	if (params->cliprects) {
 		for (i = 0; i < params->args_num_cliprects; i++) {
 			ret = i915_emit_box(params->dev, &params->cliprects[i],
@@ -1608,6 +1876,11 @@ int i915_gem_do_execbuffer_final(struct i915_execbuffer_params *params)
 			goto err;
 	}
 
+#ifdef CONFIG_DRM_I915_SCHEDULER_PREEMPTION
+	emit_postamble(ring, seqno, start, preemptive);
+	intel_ring_advance(ring);
+#endif
+
 	trace_i915_gem_ring_dispatch(ring, seqno, params->eb_flags);
 
 	/* Seqno matches? */
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 7c03fb7..0eb6a31 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -43,6 +43,8 @@ const char *i915_qe_state_str(struct i915_scheduler_queue_entry *node)
 	char		*ptr = str;
 
 	*(ptr++) = node->bumped ? 'B' : '-',
+	*(ptr++) = (node->params.scheduler_flags & i915_ebp_sf_preempt) ? 'P' : '-';
+	*(ptr++) = (node->params.scheduler_flags & i915_ebp_sf_was_preempt) ? 'p' : '-';
 
 	*ptr = 0;
 
@@ -61,9 +63,15 @@ char i915_scheduler_queue_status_chr(enum i915_scheduler_queue_status status)
 	case i915_sqs_flying:
 	return 'F';
 
+	case i915_sqs_overtaking:
+	return 'O';
+
 	case i915_sqs_complete:
 	return 'C';
 
+	case i915_sqs_preempted:
+	return 'P';
+
 	default:
 	break;
 	}
@@ -86,9 +94,15 @@ const char *i915_scheduler_queue_status_str(
 	case i915_sqs_flying:
 	return "Flying";
 
+	case i915_sqs_overtaking:
+	return "Overtaking";
+
 	case i915_sqs_complete:
 	return "Complete";
 
+	case i915_sqs_preempted:
+	return "Preempted";
+
 	default:
 	break;
 	}
@@ -155,7 +169,11 @@ int i915_scheduler_init(struct drm_device *dev)
 	/* Default tuning values: */
 	scheduler->priority_level_max     = ~0U;
 	scheduler->priority_level_preempt = 900;
+#ifdef CONFIG_DRM_I915_SCHEDULER_PREEMPTION
+	scheduler->min_flying             = 8;
+#else
 	scheduler->min_flying             = 2;
+#endif
 	scheduler->file_queue_max         = 64;
 
 	dev_priv->scheduler = scheduler;
@@ -172,7 +190,7 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 	struct i915_scheduler_queue_entry  *test;
 	struct timespec     stamp;
 	unsigned long       flags;
-	bool                not_flying, found;
+	bool                not_flying, want_preempt, found;
 	int                 i, j, r, got_batch = 0;
 	int                 incomplete = 0;
 
@@ -315,12 +333,22 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 		not_flying = i915_scheduler_count_flying(scheduler, ring) <
 							 scheduler->min_flying;
 
+#ifdef CONFIG_DRM_I915_SCHEDULER_PREEMPTION
+	want_preempt = node->priority >= scheduler->priority_level_preempt;
+#else
+	want_preempt = false;
+#endif
+
+	if (want_preempt)
+		node->params.scheduler_flags |= i915_ebp_sf_preempt |
+						i915_ebp_sf_was_preempt;
+
 	trace_i915_scheduler_queue(ring, node);
 	trace_i915_scheduler_node_state_change(ring, node);
 
 	spin_unlock_irqrestore(&scheduler->lock, flags);
 
-	if (not_flying)
+	if (not_flying || want_preempt)
 		i915_scheduler_submit(ring, true);
 
 	return 0;
@@ -341,6 +369,14 @@ int i915_scheduler_fly_seqno(struct intel_engine_cs *ring, uint32_t seqno)
 	if (scheduler->flags[ring->id] & i915_sf_submitting)
 		return 0;
 
+#ifdef CONFIG_DRM_I915_SCHEDULER_PREEMPTION
+	/* Does not work with preemption as that requires the extra seqno status
+	 * words to be updated rather than just the one original word! */
+	DRM_DEBUG_SCHED("<%s> Got non-batch ring submission! [seqno = %d]\n",
+			ring->name, seqno);
+	return 0;
+#endif
+
 	getrawmonotonic(&stamp);
 
 	/* Need to allocate a new node. Note that kzalloc can sleep
@@ -382,7 +418,10 @@ int i915_scheduler_fly_node(struct i915_scheduler_queue_entry *node)
 	 * hardware submission order. */
 	list_add(&node->link, &scheduler->node_queue[ring->id]);
 
-	node->status = i915_sqs_flying;
+	if (node->params.scheduler_flags & i915_ebp_sf_preempt)
+		node->status = i915_sqs_overtaking;
+	else
+		node->status = i915_sqs_flying;
 
 	trace_i915_scheduler_fly(ring, node);
 	trace_i915_scheduler_node_state_change(ring, node);
@@ -424,6 +463,9 @@ static inline bool i915_scheduler_is_dependency_valid(
 	if (I915_SQS_IS_FLYING(dep)) {
 		if (node->params.ring != dep->params.ring)
 			return true;
+
+		if (node->params.scheduler_flags & i915_ebp_sf_preempt)
+			return true;
 	}
 
 	return false;
@@ -467,6 +509,309 @@ static void i915_scheduler_node_kill(struct i915_scheduler_queue_entry *node)
 	trace_i915_scheduler_node_state_change(node->params.ring, node);
 }
 
+#ifdef CONFIG_DRM_I915_SCHEDULER_PREEMPTION
+
+/*
+ * The batch tagged with the indicated sequence number has been started
+ * (but not yet completed). Must be called with spinlock already held.
+ *
+ * This handles two distinct cases: preemptED and preemptIVE. In both
+ * cases, the associated batch MUST exist and be FLYING. Because batch
+ * buffers are moved to the head of the queue as they are submitted to
+ * the hardware, no FLYING batch can come later than the first COMPLETED
+ * batch, even with preemption, so we can quit the search early if we
+ * find a COMPLETED batch -- which would be a BUG.
+ *
+ * In the case of mid_batch == true, the batch buffer itself was
+ * non-preemptive and has been preempted part way through (at the given
+ * address). The address must be saved away so that the starting point can be
+ * adjusted when the batch is resubmitted.
+ *
+ * In the case of mid_batch == false, the batch buffer is the preempting one
+ * and has started executing (potentially pre-empting other batch buffers part
+ * way through) but not yet completed (at the time of analysis). At this point
+ * it should, in theory, be safe to reallow ring submission rather than waiting
+ * for the preemptive batch to fully complete.
+ */
+static void i915_scheduler_seqno_started(struct intel_engine_cs *ring,
+					 uint32_t seqno, bool mid_batch,
+					 uint32_t bb_addr)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+	struct i915_scheduler_queue_entry *node;
+	bool   found = false;
+
+	list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
+		if (seqno == node->params.seqno) {
+			found = true;
+			break;
+		}
+
+		BUG_ON(I915_SQS_IS_COMPLETE(node));
+	}
+
+	BUG_ON(!found);
+
+	if (mid_batch) {
+		BUG_ON(node->status != i915_sqs_flying);
+		node->params.preemption_point = bb_addr;
+	} else {
+		BUG_ON(node->status != i915_sqs_overtaking);
+	}
+}
+
+/*
+ * The batch tagged with the indicated sequence number has completed.
+ * Search the queue for it, update its status and those of any batches
+ * submitted earlier, which must also have completed or been preempted
+ * as appropriate.
+ *
+ * Called with spinlock already held.
+ */
+static void i915_scheduler_seqno_complete(struct intel_engine_cs *ring,
+					  uint32_t seqno, bool preemptive)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+	struct i915_scheduler_queue_entry *node;
+	bool   found = false;
+
+	/*
+	 * Batch buffers are added to the head of the list in execution order,
+	 * thus seqno values, although not necessarily incrementing, will be
+	 * met in completion order when scanning the list. So when a match is
+	 * found, all subsequent entries must have either also popped or been
+	 * preempted.
+	 */
+	list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
+		if (seqno == node->params.seqno) {
+			found = true;
+			break;
+		}
+	}
+
+	trace_i915_scheduler_landing(ring, seqno, found ? node : NULL);
+	BUG_ON(!found);
+
+	if (preemptive) {
+		BUG_ON(node->status != i915_sqs_overtaking);
+
+		/*
+		 * This batch has overtaken and preempted those still on the
+		 * list. All batches in flight will need to be resubmitted.
+		 */
+		node->status = i915_sqs_complete;
+		trace_i915_scheduler_node_state_change(ring, node);
+
+		list_for_each_entry_continue(node, &scheduler->node_queue[ring->id], link) {
+			BUG_ON(node->status == i915_sqs_overtaking);
+
+			if (I915_SQS_IS_COMPLETE(node))
+				break;
+
+			if (node->status != i915_sqs_flying)
+				continue;
+
+			node->status = i915_sqs_preempted;
+			trace_i915_scheduler_unfly(ring, node);
+			trace_i915_scheduler_node_state_change(ring, node);
+		}
+
+		/*
+		 * Preemption finished:
+		 *
+		 * The 'preempting' flag prevented submissions to the ring
+		 * while a preemptive batch was in flight. Now that it is
+		 * complete, the flag can be cleared and submissions may be
+		 * resumed.
+		 *
+		 * The 'preempted' flag, OTOH, tells waiters who may be holding
+		 * the 'struct_mutex' that preemption has occurred, and they
+		 * should wake up (or not go to sleep) and release the mutex so
+		 * that the scheduler's delayed-work task can postprocess the
+		 * request queue and initiate submission of more batches.
+		 * Without this, a thread that is waiting for a batch that has
+		 * been preempted (or has not yet been submitted to the hardware)
+		 * could sleep while holding the mutex but would never receive
+		 * a wakeup, resulting in a device hang.
+		 */
+		scheduler->flags[ring->id] &= ~i915_sf_preempting;
+		scheduler->flags[ring->id] |=  i915_sf_preempted;
+	} else {
+		BUG_ON(node->status != i915_sqs_flying);
+
+		/* Everything from here can be marked as done: */
+		list_for_each_entry_from(node, &scheduler->node_queue[ring->id], link) {
+			BUG_ON(node->status == i915_sqs_overtaking);
+
+			/* Check if the marking has already been done: */
+			if (I915_SQS_IS_COMPLETE(node))
+				break;
+
+			if (node->status != i915_sqs_flying)
+				continue;
+
+			/* Node was in flight so mark it as complete. */
+			node->status = i915_sqs_complete;
+			trace_i915_scheduler_node_state_change(ring, node);
+		}
+	}
+
+	/* Should submit new work here if flight list is empty but the DRM
+	 * mutex lock might not be available if a '__wait_seqno()' call is
+	 * blocking the system. */
+}
+
+/*
+ * In the non-preemption case, the last seqno processed by the ring is
+ * sufficient information to keep track of what has or has not completed.
+ *
+ * However, it is insufficient in the preemption case as much historical
+ * information can be lost. Instead, four separate seqno values are required
+ * to distinguish between batches that have completed versus ones that have
+ * been preempted:
+ *   p_active  sequence number of currently executing preemptive batch or
+ *             zero if no such batch is executing
+ *   b_active  sequence number of currently executing non-preemptive batch
+ *             or zero if no such batch is executing
+ *   p_done    sequence number of last completed preemptive batch
+ *   b_done    sequence number of last completed non-preemptive batch
+ *
+ * NB: Zero is not a valid sequence number and is therefore safe to use as an
+ *     'N/A' type value.
+ *
+ * Only one preemptive batch can be in flight at a time. No more
+ * batches can be submitted until it completes, at which time there should
+ * be no further activity. Completion of a preemptive batch is indicated
+ * by (p_done == p_active != 0).
+ *
+ * At any other time, the GPU may still be running additional tasks after the
+ * one that initiated the interrupt, so any values read from the hardware
+ * status page may not reflect a single coherent state!
+ *
+ * In particular, the following cases can occur while handling the completion
+ * of a preemptive batch:
+ *
+ * 1.  The regular case is that 'seqno' == 'p_done', and 'b_done' differs
+ *     from them, being from an earlier non-preemptive batch.
+ *
+ * 2.  The interrupt was generated by an earlier non-preemptive batch. In this
+ *     case, 'seqno' should match 'b_done' and 'p_done' should differ.
+ *     There should also be another interrupt still on its way!
+ *       GPU: seq 1, intr 1 ...
+ *       CPU:              intr 1, reads seqno
+ *       GPU:                                seq 2
+ *       CPU:                                    reads p_done, b_done
+ *       GPU:                                                intr 2
+ *     This can happen when 1 is regular and 2 is preemptive. Most other
+ *     strange cases should not happen simply because of the requirement
+ *     that no more batches are submitted after a preemptive one until the
+ *     preemption completes.
+ *
+ * In the case of handling completion of a NON-preemptive batch, the following
+ * may be observed:
+ *
+ * 1.  The regular case is that 'seqno' == 'b_done' and the interrupt was
+ *     generated by the completion of the most recent (non-preemptive) batch.
+ *
+ * 2.  The interrupt was generated by an earlier non-preemptive batch. In this
+ *     case, 'seqno' should be earlier than 'b_done'. There should be another
+ *     interrupt still on its way!
+ *		GPU: seq 1, intr 1 ...
+ *		CPU:              intr 1, reads seqno
+ *		GPU:                                seq 2
+ *		CPU:                                    reads b_done
+ *		GPU:                                                 intr 2
+ *     This can easily happen when 1 and 2 are both regular batches.
+ *
+ * 3.  Updates to the sequence number can overtake interrupts:
+ *		GPU: seq 1, intr 1 (delayed), seq 2 ...
+ *		CPU:                              intr 1, reads/processes seq 2
+ *		GPU:                                    intr 2
+ *		CPU:                                         intr 2, reads seq 2 again
+ *     This can only happen when 1 and 2 are both regular batches i.e. not
+ *     the preemptive case where nothing can be queued until preemption is
+ *     seen to have completed.
+ *
+ * 4.  If there are non-batch commands (with sequence numbers) in the ring,
+ *     then 'seqno' could be updated by such a command while 'b_done' remains
+ *     at the number of the last non-preemptive batch.
+ *
+ * 5.  'seqno' could also be left over from an already-serviced preemptive batch.
+ *
+ * All of which basically means that 'seqno' as read via 'ring->get_seqno()' is
+ * not especially useful. Thus the four batch buffer bookend values are all that
+ * is used to determine exactly what has or has not occurred between this ISR
+ * execution and the last.
+ */
+int i915_scheduler_handle_IRQ(struct intel_engine_cs *ring)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+	unsigned long   flags;
+	uint32_t        b_active, b_done, p_active, p_done;
+
+	spin_lock_irqsave(&scheduler->lock, flags);
+
+	p_done   = intel_read_status_page(ring, I915_PREEMPTIVE_DONE_SEQNO);
+	p_active = intel_read_status_page(ring, I915_PREEMPTIVE_ACTIVE_SEQNO);
+	b_done   = intel_read_status_page(ring, I915_BATCH_DONE_SEQNO);
+	b_active = intel_read_status_page(ring, I915_BATCH_ACTIVE_SEQNO);
+
+	trace_i915_scheduler_irq(ring, ring->get_seqno(ring, false),
+				 b_active, b_done, p_active, p_done);
+
+	if (i915.scheduler_override & i915_so_direct_submit) {
+		spin_unlock_irqrestore(&scheduler->lock, flags);
+		return 0;
+	}
+
+	/* All regular batches up to 'b_done' have completed */
+	if (b_done != ring->last_regular_batch) {
+		i915_scheduler_seqno_complete(ring, b_done, false);
+		ring->last_regular_batch = b_done;
+	}
+
+	if (p_done) {
+		/*
+		 * The preemptive batch identified by 'p_done' has completed.
+		 * If 'b_active' is different from 'p_active' and nonzero, that
+		 * batch has been preempted mid-batch. All other batches still
+		 * in flight have been preempted before starting.
+		 */
+		BUG_ON(p_active != p_done);
+		if (b_active == p_active) {
+			/* null preemption (ring was idle) */
+		} else if (b_active == 0) {
+			/* interbatch preemption (ring was busy) */
+		} else /* any other value of b_active */ {
+			/* midbatch preemption (batch was running) */
+			uint32_t b_addr = intel_read_status_page(ring, I915_SAVE_PREEMPTED_BB_PTR);
+			i915_scheduler_seqno_started(ring, b_active, true, b_addr);
+		}
+
+		i915_scheduler_seqno_complete(ring, p_done, true);
+		ring->last_preemptive_batch = p_done;
+
+		/* Clear the active-batch and preemptive-batch-done sequence
+		 * numbers in the status page */
+		intel_write_status_page(ring, I915_BATCH_ACTIVE_SEQNO, 0);
+		intel_write_status_page(ring, I915_PREEMPTIVE_DONE_SEQNO, 0);
+	} else if (p_active && p_active != ring->last_preemptive_batch) {
+		/* new preemptive batch started but not yet finished */
+		i915_scheduler_seqno_started(ring, p_active, false, 0);
+	}
+
+	spin_unlock_irqrestore(&scheduler->lock, flags);
+
+	queue_work(dev_priv->wq, &dev_priv->mm.scheduler_work);
+
+	return 0;
+}
+
+#else  /* CONFIG_DRM_I915_SCHEDULER_PREEMPTION */
+
 /*
  * The batch tagged with the indicated seqence number has completed.
  * Search the queue for it, update its status and those of any batches
@@ -542,7 +887,7 @@ int i915_scheduler_handle_IRQ(struct intel_engine_cs *ring)
 
 	seqno = ring->get_seqno(ring, false);
 
-	trace_i915_scheduler_irq(ring, seqno);
+	trace_i915_scheduler_irq(ring, seqno, 0, 0, 0, 0);
 
 	if (i915.scheduler_override & i915_so_direct_submit)
 		return 0;
@@ -562,6 +907,8 @@ int i915_scheduler_handle_IRQ(struct intel_engine_cs *ring)
 	return 0;
 }
 
+#endif  /* CONFIG_DRM_I915_SCHEDULER_PREEMPTION */
+
 int i915_scheduler_remove(struct intel_engine_cs *ring)
 {
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
@@ -1040,7 +1387,8 @@ static int i915_scheduler_pop_from_queue_locked(struct intel_engine_cs *ring,
 	int     ret;
 	int     i;
 	bool	any_queued;
-	bool	has_local, has_remote, only_remote;
+	bool	has_local, has_remote, only_remote, local_preempt_only;
+	bool	was_preempted = false;
 
 	*pop_node = NULL;
 	ret = -ENODATA;
@@ -1054,18 +1402,44 @@ static int i915_scheduler_pop_from_queue_locked(struct intel_engine_cs *ring,
 			continue;
 		any_queued = true;
 
+		/* Attempt to re-enable pre-emption if a node wants to pre-empt
+		 * but previously got downgraded. */
+		if ((node->params.scheduler_flags &
+		     (i915_ebp_sf_preempt |
+		      i915_ebp_sf_was_preempt)) ==
+		    i915_ebp_sf_was_preempt)
+			node->params.scheduler_flags |=
+				i915_ebp_sf_preempt;
+
 		has_local  = false;
 		has_remote = false;
+		local_preempt_only = true;
 		for (i = 0; i < node->num_deps; i++) {
 			if (!i915_scheduler_is_dependency_valid(node, i))
 				continue;
 
-			if (node->dep_list[i]->params.ring == node->params.ring)
+			if (node->dep_list[i]->params.ring == node->params.ring) {
 				has_local = true;
-			else
+
+				if (local_preempt_only &&
+				    (node->params.scheduler_flags & i915_ebp_sf_preempt)) {
+					node->params.scheduler_flags &= ~i915_ebp_sf_preempt;
+					if (i915_scheduler_is_dependency_valid(node, i))
+						local_preempt_only = false;
+					node->params.scheduler_flags |= i915_ebp_sf_preempt;
+				}
+			} else
 				has_remote = true;
 		}
 
+		if (has_local && local_preempt_only) {
+			/* If a preemptive node's local dependencies are all
+			 * flying, then they can be ignored by un-preempting the
+			 * node. */
+			node->params.scheduler_flags &= ~i915_ebp_sf_preempt;
+			has_local = false;
+		}
+
 		if (has_remote && !has_local)
 			only_remote = true;
 
@@ -1080,6 +1454,7 @@ static int i915_scheduler_pop_from_queue_locked(struct intel_engine_cs *ring,
 		list_del(&best->link);
 
 		INIT_LIST_HEAD(&best->link);
+		was_preempted = best->status == i915_sqs_preempted;
 		best->status  = i915_sqs_none;
 
 		trace_i915_scheduler_node_state_change(ring, best);
@@ -1105,6 +1480,13 @@ static int i915_scheduler_pop_from_queue_locked(struct intel_engine_cs *ring,
 
 	trace_i915_scheduler_pop_from_queue(ring, best);
 
+	if (was_preempted) {
+		/* Previously submitted - cancel outstanding request */
+		spin_unlock_irqrestore(&scheduler->lock, *flags);
+		i915_gem_cancel_request(ring, best->params.seqno);
+		spin_lock_irqsave(&scheduler->lock, *flags);
+	}
+
 	*pop_node = best;
 	return ret;
 }
@@ -1118,6 +1500,12 @@ int i915_scheduler_submit(struct intel_engine_cs *ring, bool was_locked)
 	unsigned long       flags;
 	int                 ret = 0, count = 0;
 
+	if (scheduler->flags[ring->id] & i915_sf_preempting) {
+		/* If a pre-emption event is in progress then no other work may
+		 * be submitted to that ring. Come back later... */
+		return -EAGAIN;
+	}
+
 	if (!was_locked) {
 		ret = i915_mutex_lock_interruptible(dev);
 		if (ret)
@@ -1145,10 +1533,45 @@ int i915_scheduler_submit(struct intel_engine_cs *ring, bool was_locked)
 		BUG_ON(node->status != i915_sqs_none);
 		count++;
 
+		if (node->params.scheduler_flags & i915_ebp_sf_preempt) {
+			struct i915_scheduler_queue_entry  *fly;
+			bool    got_flying = false;
+
+			list_for_each_entry(fly, &scheduler->node_queue[ring->id], link) {
+				if (!I915_SQS_IS_FLYING(fly))
+					continue;
+
+				got_flying = true;
+				if (fly->priority >= node->priority) {
+					/* Already working on something at least
+					 * as important, so don't interrupt it. */
+					node->params.scheduler_flags &=
+						~i915_ebp_sf_preempt;
+					break;
+				}
+			}
+
+			if (!got_flying) {
+				/* Nothing to preempt so don't bother. */
+				node->params.scheduler_flags &=
+					~i915_ebp_sf_preempt;
+			}
+		}
+
 		/* The call to pop above will have removed the node from the
 		 * list. So add it back in and mark it as in flight. */
 		i915_scheduler_fly_node(node);
 
+		/* If the submission code path is being called then the
+		 * scheduler must be out of the 'post-preemption' state. */
+		scheduler->flags[ring->id] &= ~i915_sf_preempted;
+		/* If this batch is pre-emptive then it will tie the hardware
+		 * up until it has at least begun to be executed. That is,
+		 * if a pre-emption request is in flight then no other work
+		 * may be submitted until it resolves. */
+		if (node->params.scheduler_flags & i915_ebp_sf_preempt)
+			scheduler->flags[ring->id] |= i915_sf_preempting;
+
 		scheduler->flags[ring->id] |= i915_sf_submitting;
 		spin_unlock_irqrestore(&scheduler->lock, flags);
 		ret = i915_gem_do_execbuffer_final(&node->params);
@@ -1160,7 +1583,9 @@ int i915_scheduler_submit(struct intel_engine_cs *ring, bool was_locked)
 
 			/* Oh dear! Either the node is broken or the ring is
 			 * busy. So need to kill the node or requeue it and try
-			 * again later as appropriate. */
+			 * again later as appropriate. Either way, clear the
+			 * pre-emption flag as it ain't happening. */
+			scheduler->flags[ring->id] &= ~i915_sf_preempting;
 
 			switch (-ret) {
 			case EAGAIN:
@@ -1195,6 +1620,10 @@ int i915_scheduler_submit(struct intel_engine_cs *ring, bool was_locked)
 				i915_scheduler_node_kill(node);
 		}
 
+		/* If pre-emption is in progress then give up and go home. */
+		if (scheduler->flags[ring->id] & i915_sf_preempting)
+			break;
+
 		/* Keep launching until the sky is sufficiently full. */
 		if (i915_scheduler_count_flying(scheduler, ring) >=
 						scheduler->min_flying)
@@ -1329,6 +1758,28 @@ int i915_scheduler_closefile(struct drm_device *dev, struct drm_file *file)
 	return 0;
 }
 
+bool i915_scheduler_is_busy(struct intel_engine_cs *ring)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+
+	/*
+	 * The scheduler is prevented from sending batches to the hardware
+	 * while preemption is in progress (i915_sf_preempting).
+	 *
+	 * Post-preemption (i915_sf_preempted), the hardware ring will be
+	 * empty, and the scheduler therefore needs a chance to run the
+	 * delayed work task to retire completed work and restart submission.
+	 *
+	 * Therefore, if either flag is set, the scheduler is busy.
+	 */
+	if (scheduler->flags[ring->id] & (i915_sf_preempting |
+					  i915_sf_preempted))
+		return true;
+
+	return false;
+}
+
 bool i915_scheduler_is_idle(struct intel_engine_cs *ring)
 {
 	struct i915_scheduler_queue_entry *node;
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index bbfd13c..f86b687 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -42,9 +42,19 @@ struct i915_execbuffer_params {
 	uint32_t                        mask;
 	int                             mode;
 	struct intel_context            *ctx;
+	uint32_t                        preemption_point;
 	int                             seqno;
 	struct drm_i915_gem_request     *request;
 	uint32_t                        scheduler_index;
+	uint32_t                        scheduler_flags;
+};
+
+/* Flag bits for i915_execbuffer_params::scheduler_flags */
+enum {
+	/* Preemption is currently enabled */
+	i915_ebp_sf_preempt          = (1 << 0),
+	/* Preemption was originally requested */
+	i915_ebp_sf_was_preempt      = (1 << 1),
 };
 
 enum i915_scheduler_queue_status {
@@ -54,8 +64,13 @@ enum i915_scheduler_queue_status {
 	i915_sqs_queued,
 	/* Sent to hardware for processing: */
 	i915_sqs_flying,
+	/* Sent to hardware for high-priority processing: */
+	i915_sqs_overtaking,
 	/* Finished processing on the hardware: */
 	i915_sqs_complete,
+	/* Was submitted, may or may not have started processing, now being
+	 * evicted: */
+	i915_sqs_preempted,
 	/* Limit value for use with arrays/loops */
 	i915_sqs_MAX
 };
@@ -63,8 +78,10 @@ char i915_scheduler_queue_status_chr(enum i915_scheduler_queue_status status);
 const char *i915_scheduler_queue_status_str(
 				enum i915_scheduler_queue_status status);
 
-#define I915_SQS_IS_QUEUED(node)	(((node)->status == i915_sqs_queued))
-#define I915_SQS_IS_FLYING(node)	(((node)->status == i915_sqs_flying))
+#define I915_SQS_IS_QUEUED(node)	(((node)->status == i915_sqs_queued) || \
+					 ((node)->status == i915_sqs_preempted))
+#define I915_SQS_IS_FLYING(node)	(((node)->status == i915_sqs_flying) || \
+					 ((node)->status == i915_sqs_overtaking))
 #define I915_SQS_IS_COMPLETE(node)	((node)->status == i915_sqs_complete)
 
 struct i915_scheduler_obj_entry {
@@ -125,6 +142,10 @@ enum {
 	i915_sf_interrupts_enabled  = (1 << 0),
 	i915_sf_submitting          = (1 << 1),
 
+	/* Preemption-related state */
+	i915_sf_preempting          = (1 << 4),
+	i915_sf_preempted           = (1 << 5),
+
 	/* Dump/debug flags */
 	i915_sf_dump_force          = (1 << 8),
 	i915_sf_dump_details        = (1 << 9),
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index bea2a49..40b1c6f 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -747,20 +747,33 @@ TRACE_EVENT(i915_scheduler_node_state_change,
 );
 
 TRACE_EVENT(i915_scheduler_irq,
-	    TP_PROTO(struct intel_engine_cs *ring, uint32_t seqno),
-	    TP_ARGS(ring, seqno),
+	    TP_PROTO(struct intel_engine_cs *ring, uint32_t seqno,
+		     uint32_t b_active, uint32_t b_done,
+		     uint32_t p_active, uint32_t p_done),
+	    TP_ARGS(ring, seqno, b_active, b_done, p_active, p_done),
 
 	    TP_STRUCT__entry(
 			     __field(u32, ring)
 			     __field(u32, seqno)
+			     __field(u32, b_active)
+			     __field(u32, b_done)
+			     __field(u32, p_active)
+			     __field(u32, p_done)
 			     ),
 
 	    TP_fast_assign(
-			   __entry->ring   = ring->id;
-			   __entry->seqno  = seqno;
+			   __entry->ring     = ring->id;
+			   __entry->seqno    = seqno;
+			   __entry->b_active = b_active;
+			   __entry->b_done   = b_done;
+			   __entry->p_active = p_active;
+			   __entry->p_done   = p_done;
 			   ),
 
-	    TP_printk("ring=%d, seqno=%d", __entry->ring, __entry->seqno)
+	    TP_printk("ring=%d, seqno=%d, b_active = %d, b_done = %d, p_active = %d, p_done = %d",
+		      __entry->ring, __entry->seqno,
+		      __entry->b_active, __entry->b_done,
+		      __entry->p_active, __entry->p_done)
 );
 
 TRACE_EVENT(i915_gem_ring_queue,
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index cf9a535..17d91e9 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -182,6 +182,10 @@ struct  intel_engine_cs {
 
 	struct intel_context *default_context;
 	struct intel_context *last_context;
+#ifdef CONFIG_DRM_I915_SCHEDULER_PREEMPTION
+	uint32_t last_regular_batch;
+	uint32_t last_preemptive_batch;
+#endif
 
 	struct intel_ring_hangcheck hangcheck;
 
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC 40/44] drm/i915: REVERTME Hack to allow IGT to test pre-emption
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (38 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 39/44] drm/i915: Added support for pre-emptive scheduling John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-06-26 17:24 ` [RFC 41/44] drm/i915: Added validation callback to trace points John.C.Harrison
                   ` (5 subsequent siblings)
  45 siblings, 0 replies; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

In order to test pre-emption, a flag has been added to the execbuffer() API to
explicitly request that a given batch buffer is made pre-emptive. This is purely
a temporary measure to allow an IGT test to queue pre-emptive and non-preemptive
workloads.

Note that the final solution will be to add an IOCTL to set the priority of a
batch buffer (work in progress elsewhere). The scheduler will then decide to
pre-empt or not based on the assigned priority level.
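
For illustration, an IGT test might request a pre-emptive submission roughly as
follows (sketch only; 'fd', 'exec_objects', 'num_objects' and 'batch_len' are
assumed to be set up by the usual execbuffer plumbing):

	struct drm_i915_gem_execbuffer2 execbuf;

	memset(&execbuf, 0, sizeof(execbuf));
	execbuf.buffers_ptr  = (uintptr_t)exec_objects;
	execbuf.buffer_count = num_objects;
	execbuf.batch_len    = batch_len;
	execbuf.flags        = I915_EXEC_RENDER | I915_EXEC_PREEMPT;

	drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, &execbuf);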
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   12 ++++++++++++
 include/uapi/drm/i915_drm.h                |    5 +++++
 2 files changed, 17 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 81acdf2..b7d0737 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1361,6 +1361,18 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	qe.params.mask                    = mask;
 	qe.params.mode                    = mode;
 
+	/* Hack for testing pre-empting prior to having an official priority API */
+	if (qe.params.args_flags & I915_EXEC_PREEMPT) {
+#ifdef CONFIG_DRM_I915_SCHEDULER_PREEMPTION
+		struct i915_scheduler   *scheduler = dev_priv->scheduler;
+		qe.priority += scheduler->priority_level_preempt;
+#else
+		DRM_DEBUG("Buffer flags 0x%X includes PREEMPT!\n",
+			  qe.params.args_flags);
+#endif
+	}
+	/* Hack for testing pre-empting prior to having an official priority API */
+
 #ifdef CONFIG_DRM_I915_SCHEDULER
 	/*
 	 * Save away the list of objects used by this batch buffer for the
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index de6f603..d391222 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -694,6 +694,11 @@ struct drm_i915_gem_execbuffer2 {
 #define I915_EXEC_BLT                    (3<<0)
 #define I915_EXEC_VEBOX                  (4<<0)
 
+/* Pre-emption flag
+ * If this flag is set, this batchbuffer preempts those already submitted
+ */
+#define I915_EXEC_PREEMPT                (1<<5)
+
 /* Used for switching the constants addressing mode on gen4+ RENDER ring.
  * Gen6+ only supports relative addressing to dynamic state (default) and
  * absolute addressing.
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC 41/44] drm/i915: Added validation callback to trace points
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (39 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 40/44] drm/i915: REVERTME Hack to allow IGT to test pre-emption John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-06-26 17:24 ` [RFC 42/44] drm/i915: Added scheduler statistic reporting to debugfs John.C.Harrison
                   ` (4 subsequent siblings)
  45 siblings, 0 replies; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The validation tests require hooks into the GPU scheduler to allow them to
analyse what the scheduler is doing internally.
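
For example (sketch only, not part of this patch), a validation module could
log dispatch events by assigning the exported pointer from its init function:

	static int validation_hook(enum i915_scheduler_validation_op op,
				   struct intel_engine_cs *ring,
				   uint32_t seqno,
				   struct i915_scheduler_queue_entry *node)
	{
		if (op == i915_scheduler_validation_op_dispatch)
			pr_info("scheduler dispatch: ring=%d, seqno=%u\n",
				ring->id, seqno);
		return 0;
	}

	/* in the test module's init ... */
	i915_scheduler_validation_callback = validation_hook;
	/* ... and cleared again on unload */
	i915_scheduler_validation_callback = NULL;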
---
 drivers/gpu/drm/i915/i915_scheduler.c |    4 ++++
 drivers/gpu/drm/i915/i915_scheduler.h |   16 ++++++++++++++++
 drivers/gpu/drm/i915/i915_trace.h     |   16 ++++++++++++++++
 3 files changed, 36 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 0eb6a31..8d45b73 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -26,6 +26,10 @@
 #include "intel_drv.h"
 #include "i915_scheduler.h"
 
+i915_scheduler_validation_callback_type
+				i915_scheduler_validation_callback = NULL;
+EXPORT_SYMBOL(i915_scheduler_validation_callback);
+
 bool i915_scheduler_is_enabled(struct drm_device *dev)
 {
 #ifdef CONFIG_DRM_I915_SCHEDULER
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index f86b687..2f8c566 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -196,4 +196,20 @@ void i915_scheduler_file_queue_dec(struct drm_file *file);
 
 int i915_gem_do_execbuffer_final(struct i915_execbuffer_params *params);
 
+/* A callback mechanism to allow validation tests to hook into the internal
+ * state of the scheduler. */
+enum i915_scheduler_validation_op {
+	i915_scheduler_validation_op_state_change	= 1,
+	i915_scheduler_validation_op_queue,
+	i915_scheduler_validation_op_dispatch,
+	i915_scheduler_validation_op_complete,
+};
+typedef int (*i915_scheduler_validation_callback_type)
+	(enum i915_scheduler_validation_op op,
+	 struct intel_engine_cs *ring,
+	 uint32_t seqno,
+	 struct i915_scheduler_queue_entry *node);
+extern i915_scheduler_validation_callback_type
+				i915_scheduler_validation_callback;
+
 #endif  /* _I915_SCHEDULER_H_ */
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 40b1c6f..2029d8b 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -369,6 +369,10 @@ TRACE_EVENT(i915_gem_ring_dispatch,
 			   __entry->seqno = seqno;
 			   __entry->flags = flags;
 			   i915_trace_irq_get(ring, seqno);
+			   if (i915_scheduler_validation_callback)
+				i915_scheduler_validation_callback(
+				      i915_scheduler_validation_op_dispatch,
+				      ring, seqno, NULL);
 			   ),
 
 	    TP_printk("dev=%u, ring=%u, seqno=%u, flags=%x",
@@ -660,6 +664,10 @@ TRACE_EVENT(i915_scheduler_landing,
 			   __entry->ring   = ring->id;
 			   __entry->seqno  = seqno;
 			   __entry->status = node ? node->status : ~0U;
+			   if (i915_scheduler_validation_callback)
+				i915_scheduler_validation_callback(
+				      i915_scheduler_validation_op_complete,
+				      ring, seqno, node);
 			   ),
 
 	    TP_printk("ring=%d, seqno=%d, status=%d",
@@ -740,6 +748,10 @@ TRACE_EVENT(i915_scheduler_node_state_change,
 			   __entry->ring   = ring->id;
 			   __entry->seqno  = node->params.seqno;
 			   __entry->status = node->status;
+			   if (i915_scheduler_validation_callback)
+				i915_scheduler_validation_callback(
+				      i915_scheduler_validation_op_state_change,
+				      ring, node->params.seqno, node);
 			   ),
 
 	    TP_printk("ring=%d, seqno=%d, status=%d",
@@ -789,6 +801,10 @@ TRACE_EVENT(i915_gem_ring_queue,
 	    TP_fast_assign(
 			   __entry->ring   = ring->id;
 			   __entry->seqno  = node->params.seqno;
+			   if (i915_scheduler_validation_callback)
+				i915_scheduler_validation_callback(
+				      i915_scheduler_validation_op_queue,
+				      ring, node->params.seqno, node);
 			   ),
 
 	    TP_printk("ring=%d, seqno=%d", __entry->ring, __entry->seqno)
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC 42/44] drm/i915: Added scheduler statistic reporting to debugfs
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (40 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 41/44] drm/i915: Added validation callback to trace points John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-06-26 17:24 ` [RFC 43/44] drm/i915: Added support for submitting out-of-batch ring commands John.C.Harrison
                   ` (3 subsequent siblings)
  45 siblings, 0 replies; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

It is useful to know what the scheduler is doing for both debugging and
performance analysis purposes. This change adds a bunch of counters and such
that keep track of various scheduler operations (batches submitted, preempted,
interrupts processed, flush requests, etc.). The data can then be read in
userland via the debugfs mechanism.
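
Assuming debugfs is mounted in the usual place and the GPU is DRM minor 0, the
counters can then be read with something like:

	cat /sys/kernel/debug/dri/0/i915_scheduler_info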
---
 drivers/gpu/drm/i915/i915_debugfs.c   |   85 +++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_scheduler.c |   66 +++++++++++++++++++++++--
 drivers/gpu/drm/i915/i915_scheduler.h |   50 +++++++++++++++++++
 3 files changed, 198 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 1c20c8c..cb9839b 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2482,6 +2482,88 @@ static int i915_display_info(struct seq_file *m, void *unused)
 	return 0;
 }
 
+#ifdef CONFIG_DRM_I915_SCHEDULER
+static int i915_scheduler_info(struct seq_file *m, void *unused)
+{
+	struct drm_info_node *node = (struct drm_info_node *) m->private;
+	struct drm_device *dev = node->minor->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+	struct i915_scheduler_stats *stats = scheduler->stats;
+	struct i915_scheduler_stats_nodes node_stats[I915_NUM_RINGS];
+	struct intel_engine_cs *ring;
+	char   str[50 * (I915_NUM_RINGS + 1)], name[50], *ptr;
+	int ret, i, r;
+
+	ret = mutex_lock_interruptible(&dev->mode_config.mutex);
+	if (ret)
+		return ret;
+
+#define PRINT_VAR(name, fmt, var)					\
+	do {								\
+		sprintf(str, "%-22s", name );				\
+		ptr = str + strlen(str);				\
+		for_each_ring(ring, dev_priv, r) {			\
+			sprintf(ptr, " %10" fmt, var);			\
+			ptr += strlen(ptr);				\
+		}							\
+		seq_printf(m, "%s\n", str);				\
+	} while(0)
+
+	PRINT_VAR("Ring name:",             "s", dev_priv->ring[r].name);
+	seq_printf(m, "Batch submissions:\n");
+	PRINT_VAR("  Queued",               "u", stats[r].queued);
+	PRINT_VAR("  Queued preemptive",    "u", stats[r].queued_preemptive);
+	PRINT_VAR("  Submitted",            "u", stats[r].submitted);
+	PRINT_VAR("  Submitted preemptive", "u", stats[r].submitted_preemptive);
+	PRINT_VAR("  Preempted",            "u", stats[r].preempted);
+	PRINT_VAR("  Completed",            "u", stats[r].completed);
+	PRINT_VAR("  Completed preemptive", "u", stats[r].completed_preemptive);
+	PRINT_VAR("  Expired",              "u", stats[r].expired);
+	seq_putc(m, '\n');
+
+	seq_printf(m, "Flush counts:\n");
+	PRINT_VAR("  By object",            "u", stats[r].flush_obj);
+	PRINT_VAR("  By seqno",             "u", stats[r].flush_seqno);
+	PRINT_VAR("  Blanket",              "u", stats[r].flush_all);
+	PRINT_VAR("  Entries bumped",       "u", stats[r].flush_bump);
+	PRINT_VAR("  Entries submitted",    "u", stats[r].flush_submit);
+	seq_putc(m, '\n');
+
+	seq_printf(m, "Interrupt counts:\n");
+	PRINT_VAR("  Regular",              "llu", stats[r].irq.regular);
+	PRINT_VAR("  Preemptive",           "llu", stats[r].irq.preemptive);
+	PRINT_VAR("  Idle",                 "llu", stats[r].irq.idle);
+	PRINT_VAR("  Inter-batch",          "llu", stats[r].irq.interbatch);
+	PRINT_VAR("  Mid-batch",            "llu", stats[r].irq.midbatch);
+	seq_putc(m, '\n');
+
+	seq_printf(m, "Seqno values at last IRQ:\n");
+	PRINT_VAR("  Seqno",                "d", stats[r].irq.last_seqno);
+	PRINT_VAR("  Batch done",           "d", stats[r].irq.last_b_done);
+	PRINT_VAR("  Preemptive done",      "d", stats[r].irq.last_p_done);
+	PRINT_VAR("  Batch active",         "d", stats[r].irq.last_b_active);
+	PRINT_VAR("  Preemptive active",    "d", stats[r].irq.last_p_active);
+	seq_putc(m, '\n');
+
+	seq_printf(m, "Queue contents:\n");
+	for_each_ring(ring, dev_priv, i)
+		i915_scheduler_query_stats(ring, node_stats + ring->id);
+
+	for (i = 0; i < i915_sqs_MAX; i++) {
+		sprintf(name, "  %s", i915_scheduler_queue_status_str(i));
+		PRINT_VAR(name, "d", node_stats[r].counts[i]);
+	}
+	seq_putc(m, '\n');
+
+#undef PRINT_VAR
+
+	mutex_unlock(&dev->mode_config.mutex);
+
+	return 0;
+}
+#endif
+
 struct pipe_crc_info {
 	const char *name;
 	struct drm_device *dev;
@@ -3928,6 +4010,9 @@ static const struct drm_info_list i915_debugfs_list[] = {
 	{"i915_pc8_status", i915_pc8_status, 0},
 	{"i915_power_domain_info", i915_power_domain_info, 0},
 	{"i915_display_info", i915_display_info, 0},
+#ifdef CONFIG_DRM_I915_SCHEDULER
+	{"i915_scheduler_info", i915_scheduler_info, 0},
+#endif
 };
 #define I915_DEBUGFS_ENTRIES ARRAY_SIZE(i915_debugfs_list)
 
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 8d45b73..c679513 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -204,11 +204,13 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 		int ret;
 
 		qe->params.scheduler_index = scheduler->index++;
+		scheduler->stats[qe->params.ring->id].queued++;
 
 		trace_i915_scheduler_queue(qe->params.ring, qe);
 
 		scheduler->flags[qe->params.ring->id] |= i915_sf_submitting;
 		ret = i915_gem_do_execbuffer_final(&qe->params);
+		scheduler->stats[qe->params.ring->id].submitted++;
 		scheduler->flags[qe->params.ring->id] &= ~i915_sf_submitting;
 
 		/* Need to release the objects: */
@@ -229,6 +231,7 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 
 		/* And anything else owned by the QE structure: */
 		kfree(qe->params.cliprects);
+		scheduler->stats[qe->params.ring->id].expired++;
 
 		return ret;
 	}
@@ -343,10 +346,14 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 	want_preempt = false;
 #endif
 
-	if (want_preempt)
+	if (want_preempt) {
 		node->params.scheduler_flags |= i915_ebp_sf_preempt |
 						i915_ebp_sf_was_preempt;
 
+		scheduler->stats[ring->id].queued_preemptive++;
+	} else
+		scheduler->stats[ring->id].queued++;
+
 	trace_i915_scheduler_queue(ring, node);
 	trace_i915_scheduler_node_state_change(ring, node);
 
@@ -607,6 +614,7 @@ static void i915_scheduler_seqno_complete(struct intel_engine_cs *ring,
 		 */
 		node->status = i915_sqs_complete;
 		trace_i915_scheduler_node_state_change(ring, node);
+		scheduler->stats[ring->id].completed_preemptive++;
 
 		list_for_each_entry_continue(node, &scheduler->node_queue[ring->id], link) {
 			BUG_ON(node->status == i915_sqs_overtaking);
@@ -620,6 +628,7 @@ static void i915_scheduler_seqno_complete(struct intel_engine_cs *ring,
 			node->status = i915_sqs_preempted;
 			trace_i915_scheduler_unfly(ring, node);
 			trace_i915_scheduler_node_state_change(ring, node);
+			scheduler->stats[ring->id].preempted++;
 		}
 
 		/*
@@ -659,6 +668,7 @@ static void i915_scheduler_seqno_complete(struct intel_engine_cs *ring,
 			/* Node was in flight so mark it as complete. */
 			node->status = i915_sqs_complete;
 			trace_i915_scheduler_node_state_change(ring, node);
+			scheduler->stats[ring->id].completed++;
 		}
 	}
 
@@ -753,6 +763,7 @@ int i915_scheduler_handle_IRQ(struct intel_engine_cs *ring)
 {
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
 	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+	struct i915_scheduler_stats *stats = scheduler->stats + ring->id;
 	unsigned long   flags;
 	uint32_t        b_active, b_done, p_active, p_done;
 
@@ -763,7 +774,13 @@ int i915_scheduler_handle_IRQ(struct intel_engine_cs *ring)
 	b_done   = intel_read_status_page(ring, I915_BATCH_DONE_SEQNO);
 	b_active = intel_read_status_page(ring, I915_BATCH_ACTIVE_SEQNO);
 
-	trace_i915_scheduler_irq(ring, ring->get_seqno(ring, false),
+	stats->irq.last_b_done   = b_done;
+	stats->irq.last_p_done   = p_done;
+	stats->irq.last_b_active = b_active;
+	stats->irq.last_p_active = p_active;
+	stats->irq.last_seqno    = ring->get_seqno(ring, false);
+
+	trace_i915_scheduler_irq(ring, stats->irq.last_seqno,
 				 b_active, b_done, p_active, p_done);
 
 	if (i915.scheduler_override & i915_so_direct_submit) {
@@ -775,6 +792,7 @@ int i915_scheduler_handle_IRQ(struct intel_engine_cs *ring)
 	if (b_done != ring->last_regular_batch) {
 		i915_scheduler_seqno_complete(ring, b_done, false);
 		ring->last_regular_batch = b_done;
+		stats->irq.regular += 1;
 	}
 
 	if (p_done) {
@@ -784,15 +802,19 @@ int i915_scheduler_handle_IRQ(struct intel_engine_cs *ring)
 		 * batch has been preempted mid-batch. All other batches still
 		 * in flight have been preempted before starting.
 		 */
+		stats->irq.preemptive += 1;
 		BUG_ON(p_active != p_done);
 		if (b_active == p_active) {
 			/* null preemption (ring was idle) */
+			stats->irq.idle += 1;
 		} else if (b_active == 0) {
 			/* interbatch preemption (ring was busy) */
+			stats->irq.interbatch += 1;
 		} else /* any other value of b_active */ {
 			/* midbatch preemption (batch was running) */
 			uint32_t b_addr = intel_read_status_page(ring, I915_SAVE_PREEMPTED_BB_PTR);
 			i915_scheduler_seqno_started(ring, b_active, true, b_addr);
+			stats->irq.midbatch += 1;
 		}
 
 		i915_scheduler_seqno_complete(ring, p_done, true);
@@ -871,6 +893,7 @@ static int i915_scheduler_seqno_complete(struct intel_engine_cs *ring, uint32_t
 		/* Node was in flight so mark it as complete. */
 		node->status = i915_sqs_complete;
 		trace_i915_scheduler_node_state_change(ring, node);
+		scheduler->stats[ring->id].completed++;
 	}
 
 	/* Should submit new work here if flight list is empty but the DRM
@@ -975,6 +998,7 @@ int i915_scheduler_remove(struct intel_engine_cs *ring)
 
 		list_del(&node->link);
 		list_add(&node->link, &remove);
+		scheduler->stats[ring->id].expired++;
 
 		/* Strip the dependency info while the mutex is still locked */
 		i915_scheduler_remove_dependent(scheduler, node);
@@ -1195,6 +1219,32 @@ int i915_scheduler_dump_locked(struct intel_engine_cs *ring, const char *msg)
 	return 0;
 }
 
+int i915_scheduler_query_stats(struct intel_engine_cs *ring,
+			       struct i915_scheduler_stats_nodes *stats)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+	struct i915_scheduler_queue_entry  *node;
+	unsigned long   flags;
+
+	memset(stats, 0x00, sizeof(*stats));
+
+	spin_lock_irqsave(&scheduler->lock, flags);
+
+	list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
+		if (node->status >= i915_sqs_MAX) {
+			DRM_DEBUG_SCHED("Invalid node state: %d! [seqno = %d]\n",
+					node->status, node->params.seqno);
+		}
+
+		stats->counts[node->status]++;
+	}
+
+	spin_unlock_irqrestore(&scheduler->lock, flags);
+
+	return 0;
+}
+
 int i915_scheduler_flush_seqno(struct intel_engine_cs *ring, bool is_locked,
 			       uint32_t seqno)
 {
@@ -1220,6 +1270,8 @@ int i915_scheduler_flush_seqno(struct intel_engine_cs *ring, bool is_locked,
 
 	spin_lock_irqsave(&scheduler->lock, flags);
 
+	scheduler->stats[ring->id].flush_seqno++;
+
 	i915_scheduler_priority_bump_clear(scheduler, ring);
 
 	list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
@@ -1231,6 +1283,7 @@ int i915_scheduler_flush_seqno(struct intel_engine_cs *ring, bool is_locked,
 
 		flush_count += i915_scheduler_priority_bump(scheduler,
 					node, scheduler->priority_level_max);
+		scheduler->stats[ring->id].flush_bump += flush_count;
 	}
 
 	spin_unlock_irqrestore(&scheduler->lock, flags);
@@ -1238,6 +1291,7 @@ int i915_scheduler_flush_seqno(struct intel_engine_cs *ring, bool is_locked,
 	if (flush_count) {
 		DRM_DEBUG_SCHED("<%s> Bumped %d entries\n", ring->name, flush_count);
 		flush_count = i915_scheduler_submit_max_priority(ring, is_locked);
+		scheduler->stats[ring->id].flush_submit += flush_count;
 	}
 
 	return flush_count;
@@ -1264,6 +1318,8 @@ int i915_scheduler_flush(struct intel_engine_cs *ring, bool is_locked)
 
 	BUG_ON(is_locked && (scheduler->flags[ring->id] & i915_sf_submitting));
 
+	scheduler->stats[ring->id].flush_all++;
+
 	do {
 		found = false;
 		spin_lock_irqsave(&scheduler->lock, flags);
@@ -1278,6 +1334,7 @@ int i915_scheduler_flush(struct intel_engine_cs *ring, bool is_locked)
 
 		if (found) {
 			ret = i915_scheduler_submit(ring, is_locked);
+			scheduler->stats[ring->id].flush_submit++;
 			if (ret < 0)
 				return ret;
 
@@ -1573,8 +1630,11 @@ int i915_scheduler_submit(struct intel_engine_cs *ring, bool was_locked)
 		 * up until it has at least begun to be executed. That is,
 		 * if a pre-emption request is in flight then no other work
 		 * may be submitted until it resolves. */
-		if (node->params.scheduler_flags & i915_ebp_sf_preempt)
+		if (node->params.scheduler_flags & i915_ebp_sf_preempt) {
 			scheduler->flags[ring->id] |= i915_sf_preempting;
+			scheduler->stats[ring->id].submitted_preemptive++;
+		} else
+			scheduler->stats[ring->id].submitted++;
 
 		scheduler->flags[ring->id] |= i915_sf_submitting;
 		spin_unlock_irqrestore(&scheduler->lock, flags);
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 2f8c566..8d2289f 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -123,6 +123,51 @@ bool        i915_scheduler_is_idle(struct intel_engine_cs *ring);
 
 #ifdef CONFIG_DRM_I915_SCHEDULER
 
+struct i915_scheduler_stats_nodes
+{
+	uint32_t	counts[i915_sqs_MAX];
+};
+
+struct i915_scheduler_stats_irq
+{
+	/* Counts of various interrupt types */
+	uint64_t            regular;
+	uint64_t            preemptive;
+	uint64_t            idle;
+	uint64_t            interbatch;
+	uint64_t            midbatch;
+
+	/* Sequence numbers seen at last IRQ */
+	uint32_t            last_seqno;
+	uint32_t            last_b_done;
+	uint32_t            last_p_done;
+	uint32_t            last_b_active;
+	uint32_t            last_p_active;
+};
+
+struct i915_scheduler_stats
+{
+	/* Batch buffer counts: */
+	uint32_t            queued;
+	uint32_t            queued_preemptive;
+	uint32_t            submitted;
+	uint32_t            submitted_preemptive;
+	uint32_t            preempted;
+	uint32_t            completed;
+	uint32_t            completed_preemptive;
+	uint32_t            expired;
+
+	/* Other stuff: */
+	uint32_t            flush_obj;
+	uint32_t            flush_seqno;
+	uint32_t            flush_all;
+	uint32_t            flush_bump;
+	uint32_t            flush_submit;
+
+	/* Interrupts: */
+	struct i915_scheduler_stats_irq irq;
+};
+
 struct i915_scheduler {
 	struct list_head    node_queue[I915_NUM_RINGS];
 	uint32_t            flags[I915_NUM_RINGS];
@@ -134,6 +179,9 @@ struct i915_scheduler {
 	uint32_t            priority_level_preempt;
 	uint32_t            min_flying;
 	uint32_t            file_queue_max;
+
+	/* Statistics: */
+	struct i915_scheduler_stats     stats[I915_NUM_RINGS];
 };
 
 /* Flag bits for i915_scheduler::flags */
@@ -187,6 +235,8 @@ int         i915_scheduler_priority_bump(struct i915_scheduler *scheduler,
 				uint32_t bump);
 bool        i915_scheduler_is_seqno_in_flight(struct intel_engine_cs *ring,
 					      uint32_t seqno, bool *completed);
+int         i915_scheduler_query_stats(struct intel_engine_cs *ring,
+				       struct i915_scheduler_stats_nodes *stats);
 
 bool i915_scheduler_file_queue_is_full(struct drm_file *file);
 void i915_scheduler_file_queue_inc(struct drm_file *file);
-- 
1.7.9.5


* [RFC 43/44] drm/i915: Added support for submitting out-of-batch ring commands
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (41 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 42/44] drm/i915: Added scheduler statistic reporting to debugfs John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-06-26 17:24 ` [RFC 44/44] drm/i915: Fake batch support for page flips John.C.Harrison
                   ` (2 subsequent siblings)
  45 siblings, 0 replies; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

There is a problem with any commands written to the ring without the scheduler's
knowledge: they can be lost if the scheduler issues a pre-emption, because the
pre-emption mechanism discards the current ring contents. Thus any non-batch
buffer submission has the potential to be skipped.

The solution is to make sure that nothing is written to the ring that did not
come from the scheduler. Not many pieces of code write to the ring directly;
the only such path still exercised on modern systems is the page flip code.

This checkin adds scheduler support for command submission without a batch
buffer - just an arbitrarily sized block of data to be written to the ring.
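
For illustration, a driver-internal caller could package a few raw ring dwords
and hand them to the scheduler roughly as follows. This is only a minimal
sketch against the interface added below; the dword values and the single
tracked object are made up for the example:

	static int example_queue_ring_words(struct intel_engine_cs *ring,
					    struct drm_i915_gem_object *obj)
	{
		/* Arbitrary commands (here just NOOPs) that must reach the
		 * ring via the scheduler so they survive a pre-emption. */
		uint32_t cmds[] = {
			MI_NOOP,
			MI_NOOP,
		};

		/*
		 * The scheduler copies the dwords and emits them from its own
		 * submission path, so a pre-emption cannot discard them.
		 */
		return i915_scheduler_queue_nonbatch(ring, cmds, ARRAY_SIZE(cmds),
						     &obj, 1, 0);
	}
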
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  109 ++++++++++++++++------------
 drivers/gpu/drm/i915/i915_scheduler.c      |   88 +++++++++++++++++++---
 drivers/gpu/drm/i915/i915_scheduler.h      |   12 +++
 3 files changed, 153 insertions(+), 56 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index b7d0737..48379fb 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -987,16 +987,18 @@ i915_gem_execbuffer_move_to_active(struct list_head *vmas,
 }
 
 static void
-i915_gem_execbuffer_retire_commands(struct drm_device *dev,
-				    struct drm_file *file,
-				    struct intel_engine_cs *ring,
-				    struct drm_i915_gem_object *obj)
+i915_gem_execbuffer_retire_commands(struct i915_execbuffer_params *params)
 {
+	if (params->scheduler_flags & i915_ebp_sf_not_a_batch) {
+		i915_add_request_wo_flush(params->ring);
+		return;
+	}
+
 	/* Unconditionally force add_request to emit a full flush. */
-	ring->gpu_caches_dirty = true;
+	params->ring->gpu_caches_dirty = true;
 
 	/* Add a breadcrumb for the completion of the batch buffer */
-	(void)__i915_add_request(ring, file, obj, NULL, true);
+	(void)__i915_add_request(params->ring, params->file, params->batch_obj, NULL, true);
 }
 
 static int
@@ -1659,7 +1661,7 @@ static void
 emit_preamble(struct intel_engine_cs *ring, uint32_t seqno, struct intel_context *ctx, bool preemptive)
 {
 	emit_store_dw_index(ring, seqno, preemptive ? I915_PREEMPTIVE_ACTIVE_SEQNO : I915_BATCH_ACTIVE_SEQNO);
-	if (preemptive || i915_gem_context_is_default(ctx))
+	if (preemptive || !ctx || i915_gem_context_is_default(ctx))
 		intel_ring_emit(ring, MI_ARB_ON_OFF | MI_ARB_DISABLE);
 	else
 		intel_ring_emit(ring, MI_ARB_ON_OFF | MI_ARB_ENABLE);
@@ -1761,7 +1763,8 @@ int i915_gem_do_execbuffer_final(struct i915_execbuffer_params *params)
 	 * to span the transition from the end to the beginning of the ring.
 	 */
 #define I915_BATCH_EXEC_MAX_LEN         256	/* max dwords emitted here	*/
-	min_space = I915_BATCH_EXEC_MAX_LEN * 2 * sizeof(uint32_t);
+	min_space = I915_BATCH_EXEC_MAX_LEN + params->emit_len;
+	min_space = min_space * 2 * sizeof(uint32_t);
 	ret = intel_ring_test_space(ring, min_space);
 	if (ret)
 		goto early_err;
@@ -1811,30 +1814,34 @@ int i915_gem_do_execbuffer_final(struct i915_execbuffer_params *params)
 		emit_regular_prequel(ring, seqno, start);
 #endif
 
-	/* Switch to the correct context for the batch */
-	ret = i915_switch_context(ring, params->ctx);
-	if (ret)
-		goto err;
+	if (params->ctx) {
+		/* Switch to the correct context for the batch */
+		ret = i915_switch_context(ring, params->ctx);
+		if (ret)
+			goto err;
+	}
 
 	/* Seqno matches? */
 	BUG_ON(seqno != params->seqno);
 	BUG_ON(ring->outstanding_lazy_seqno != params->seqno);
 
-	if (ring == &dev_priv->ring[RCS] &&
-	    params->mode != dev_priv->relative_constants_mode) {
+	if ((params->scheduler_flags & i915_ebp_sf_not_a_batch) == 0) {
+		if (ring == &dev_priv->ring[RCS] &&
+		    params->mode != dev_priv->relative_constants_mode) {
 #ifndef CONFIG_DRM_I915_SCHEDULER
-		ret = intel_ring_begin(ring, 4);
-		if (ret)
-			goto err;
+			ret = intel_ring_begin(ring, 4);
+			if (ret)
+				goto err;
 #endif
 
-		intel_ring_emit(ring, MI_NOOP);
-		intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
-		intel_ring_emit(ring, INSTPM);
-		intel_ring_emit(ring, params->mask << 16 | params->mode);
-		intel_ring_advance(ring);
+			intel_ring_emit(ring, MI_NOOP);
+			intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
+			intel_ring_emit(ring, INSTPM);
+			intel_ring_emit(ring, params->mask << 16 | params->mode);
+			intel_ring_advance(ring);
 
-		dev_priv->relative_constants_mode = params->mode;
+			dev_priv->relative_constants_mode = params->mode;
+		}
 	}
 
 	if (params->args_flags & I915_EXEC_GEN7_SOL_RESET) {
@@ -1855,37 +1862,48 @@ int i915_gem_do_execbuffer_final(struct i915_execbuffer_params *params)
 	emit_preamble(ring, seqno, params->ctx, preemptive);
 #endif
 
-	exec_len   = params->args_batch_len;
-	exec_start = params->batch_obj_vm_offset +
-		     params->args_batch_start_offset;
+	if (params->scheduler_flags & i915_ebp_sf_not_a_batch) {
+		if (params->scheduler_flags & i915_ebp_sf_cacheline_align) {
+			ret = intel_ring_cacheline_align(ring);
+			if (ret)
+				goto err;
+		}
+
+		for (i = 0; i < params->emit_len; i++)
+			intel_ring_emit(ring, params->emit_data[i]);
+	} else {
+		exec_len   = params->args_batch_len;
+		exec_start = params->batch_obj_vm_offset +
+			     params->args_batch_start_offset;
 
 #ifdef CONFIG_DRM_I915_SCHEDULER_PREEMPTION
-	if (params->preemption_point) {
-		uint32_t preemption_offset = params->preemption_point - exec_start;
-		exec_start += preemption_offset;
-		exec_len   -= preemption_offset;
-	}
+		if (params->preemption_point) {
+			uint32_t preemption_offset = params->preemption_point - exec_start;
+			exec_start += preemption_offset;
+			exec_len   -= preemption_offset;
+		}
 #endif
 
-	if (params->cliprects) {
-		for (i = 0; i < params->args_num_cliprects; i++) {
-			ret = i915_emit_box(params->dev, &params->cliprects[i],
-					    params->args_DR1, params->args_DR4);
-			if (ret)
-				goto err;
-
+		if (params->cliprects) {
+			for (i = 0; i < params->args_num_cliprects; i++) {
+				ret = i915_emit_box(params->dev, &params->cliprects[i],
+						    params->args_DR1, params->args_DR4);
+				if (ret)
+					goto err;
+
+				ret = ring->dispatch_execbuffer(ring,
+								exec_start, exec_len,
+								params->eb_flags);
+				if (ret)
+					goto err;
+			}
+		} else {
 			ret = ring->dispatch_execbuffer(ring,
 							exec_start, exec_len,
 							params->eb_flags);
 			if (ret)
 				goto err;
 		}
-	} else {
-		ret = ring->dispatch_execbuffer(ring,
-						exec_start, exec_len,
-						params->eb_flags);
-		if (ret)
-			goto err;
 	}
 
 #ifdef CONFIG_DRM_I915_SCHEDULER_PREEMPTION
@@ -1899,8 +1917,7 @@ int i915_gem_do_execbuffer_final(struct i915_execbuffer_params *params)
 	BUG_ON(params->seqno   != ring->outstanding_lazy_seqno);
 	BUG_ON(params->request != ring->preallocated_lazy_request);
 
-	i915_gem_execbuffer_retire_commands(params->dev, params->file, ring,
-					    params->batch_obj);
+	i915_gem_execbuffer_retire_commands(params);
 
 	/* OLS should be zero by now! */
 	BUG_ON(ring->outstanding_lazy_seqno);
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index c679513..127ded9 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -49,6 +49,7 @@ const char *i915_qe_state_str(struct i915_scheduler_queue_entry *node)
 	*(ptr++) = node->bumped ? 'B' : '-',
 	*(ptr++) = (node->params.scheduler_flags & i915_ebp_sf_preempt) ? 'P' : '-';
 	*(ptr++) = (node->params.scheduler_flags & i915_ebp_sf_was_preempt) ? 'p' : '-';
+	*(ptr++) = (node->params.scheduler_flags & i915_ebp_sf_not_a_batch) ? '!' : '-';
 
 	*ptr = 0;
 
@@ -247,15 +248,30 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 	node->status = i915_sqs_queued;
 	node->stamp  = stamp;
 
-	/*
-	 * Verify that the batch buffer itself is included in the object list.
-	 */
-	for (i = 0; i < node->num_objs; i++) {
-		if (node->saved_objects[i].obj == node->params.batch_obj)
-			got_batch++;
-	}
+	if (node->params.scheduler_flags & i915_ebp_sf_not_a_batch) {
+		uint32_t size;
+
+		size = sizeof(*node->params.emit_data) * node->params.emit_len;
+		node->params.emit_data = kmalloc(size, GFP_KERNEL);
+		if (!node->params.emit_data) {
+			kfree(node);
+			return -ENOMEM;
+		}
+
+		memcpy(node->params.emit_data, qe->params.emit_data, size);
+	} else {
+		BUG_ON(node->params.emit_len || node->params.emit_data);
 
-	BUG_ON(got_batch != 1);
+		/*
+		 * Verify that the batch buffer itself is included in the object list.
+		 */
+		for (i = 0; i < node->num_objs; i++) {
+			if (node->saved_objects[i].obj == node->params.batch_obj)
+				got_batch++;
+		}
+
+		BUG_ON(got_batch != 1);
+	}
 
 	/* Need to determine the number of incomplete entries in the list as
 	 * that will be the maximum size of the dependency list.
@@ -282,6 +298,7 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 		node->dep_list = kmalloc(sizeof(node->dep_list[0]) * incomplete,
 					 GFP_KERNEL);
 		if (!node->dep_list) {
+			kfree(node->params.emit_data);
 			kfree(node);
 			return -ENOMEM;
 		}
@@ -297,7 +314,10 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 				if (I915_SQS_IS_COMPLETE(test))
 					continue;
 
-				found = (node->params.ctx == test->params.ctx);
+				if (node->params.ctx && test->params.ctx)
+					found = (node->params.ctx == test->params.ctx);
+				else
+					found = false;
 
 				for (i = 0; (i < node->num_objs) && !found; i++) {
 					for (j = 0; j < test->num_objs; j++) {
@@ -332,7 +352,8 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 
 	list_add_tail(&node->link, &scheduler->node_queue[ring->id]);
 
-	i915_scheduler_file_queue_inc(node->params.file);
+	if (node->params.file)
+		i915_scheduler_file_queue_inc(node->params.file);
 
 	if (i915.scheduler_override & i915_so_submit_on_queue)
 		not_flying = true;
@@ -1051,6 +1072,7 @@ int i915_scheduler_remove(struct intel_engine_cs *ring)
 			i915_gem_context_unreference(node->params.ctx);
 
 		/* And anything else owned by the node: */
+		kfree(node->params.emit_data);
 		kfree(node->params.cliprects);
 		kfree(node->dep_list);
 		kfree(node);
@@ -1909,3 +1931,49 @@ int i915_scheduler_handle_IRQ(struct intel_engine_cs *ring)
 }
 
 #endif  /* CONFIG_DRM_I915_SCHEDULER */
+
+int i915_scheduler_queue_nonbatch(struct intel_engine_cs *ring,
+				  uint32_t *data, uint32_t len,
+				  struct drm_i915_gem_object *objs[],
+				  uint32_t num_objs, uint32_t flags)
+{
+	struct i915_scheduler_queue_entry qe;
+	int ret;
+
+	memset(&qe, 0x00, sizeof(qe));
+
+	ret = intel_ring_alloc_seqno(ring);
+	if (ret)
+		return ret;
+
+	qe.params.ring            = ring;
+	qe.params.dev             = ring->dev;
+	qe.params.seqno           = ring->outstanding_lazy_seqno;
+	qe.params.request         = ring->preallocated_lazy_request;
+	qe.params.emit_len        = len;
+	qe.params.emit_data       = data;
+	qe.params.scheduler_flags = flags | i915_ebp_sf_not_a_batch;
+
+#ifdef CONFIG_DRM_I915_SCHEDULER
+{
+	int i;
+
+	qe.num_objs      = num_objs;
+	qe.saved_objects = kmalloc(sizeof(qe.saved_objects[0]) * num_objs, GFP_KERNEL);
+	if (!qe.saved_objects)
+		return -ENOMEM;
+
+	for (i = 0; i < num_objs; i++) {
+		qe.saved_objects[i].obj = objs[i];
+		drm_gem_object_reference(&objs[i]->base);
+	}
+}
+#endif
+
+	ring->outstanding_lazy_seqno    = 0;
+	ring->preallocated_lazy_request = NULL;
+
+	trace_i915_gem_ring_queue(ring, &qe);
+
+	return i915_scheduler_queue_execbuffer(&qe);
+}
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 8d2289f..f2a9243 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -47,6 +47,8 @@ struct i915_execbuffer_params {
 	struct drm_i915_gem_request     *request;
 	uint32_t                        scheduler_index;
 	uint32_t                        scheduler_flags;
+	uint32_t                        *emit_data;
+	uint32_t                        emit_len;
 };
 
 /* Flag bits for i915_execbuffer_params::scheduler_flags */
@@ -55,6 +57,12 @@ enum {
 	i915_ebp_sf_preempt          = (1 << 0),
 	/* Preemption was originally requested */
 	i915_ebp_sf_was_preempt      = (1 << 1),
+
+	/* Non-batch internal driver submissions */
+	i915_ebp_sf_not_a_batch      = (1 << 2),
+
+	/* Payload should be cacheline aligned in ring */
+	i915_ebp_sf_cacheline_align  = (1 << 3),
 };
 
 enum i915_scheduler_queue_status {
@@ -118,6 +126,10 @@ int         i915_scheduler_init(struct drm_device *dev);
 int         i915_scheduler_closefile(struct drm_device *dev,
 				     struct drm_file *file);
 int         i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe);
+int         i915_scheduler_queue_nonbatch(struct intel_engine_cs *ring,
+					  uint32_t *data, uint32_t len,
+					  struct drm_i915_gem_object *objs[],
+					  uint32_t num_objs, uint32_t flags);
 int         i915_scheduler_handle_IRQ(struct intel_engine_cs *ring);
 bool        i915_scheduler_is_idle(struct intel_engine_cs *ring);
 
-- 
1.7.9.5


* [RFC 44/44] drm/i915: Fake batch support for page flips
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (42 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 43/44] drm/i915: Added support for submitting out-of-batch ring commands John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-07-07 19:25   ` Daniel Vetter
  2014-06-26 20:44 ` [RFC 00/44] GPU scheduler for i915 driver Dave Airlie
  2014-10-10 10:35 ` Steven Newbury
  45 siblings, 1 reply; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

Any commands written to the ring without the scheduler's knowledge can get lost
during a pre-emption event. This checkin updates the page flip code to send the
ring commands via the scheduler's 'fake batch' interface. Thus the page flip is
kept safe from being clobbered.
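
As a condensed illustration of the pattern adopted in the patch below (shown
here for the simple non-render-ring case, with the surrounding function and
error handling omitted), the flip commands are gathered into an array and
queued through the fake-batch interface instead of being emitted directly:

	uint32_t cmds[] = {
		MI_DISPLAY_FLIP_I915 | plane_bit,
		fb->pitches[0] | obj->tiling_mode,
		intel_crtc->unpin_work->gtt_offset,
		MI_NOOP
	};

	/*
	 * The scheduler now owns emission of these dwords, so a pre-emption
	 * cannot throw them away.
	 */
	ret = i915_scheduler_queue_nonbatch(ring, cmds, ARRAY_SIZE(cmds),
					    &obj, 1, sched_flags);
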
---
 drivers/gpu/drm/i915/intel_display.c |   84 ++++++++++++++++------------------
 1 file changed, 40 insertions(+), 44 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index fa1ffbb..8bbc5d3 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -9099,8 +9099,8 @@ static int intel_gen7_queue_flip(struct drm_device *dev,
 				 uint32_t flags)
 {
 	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
-	uint32_t plane_bit = 0;
-	int len, ret;
+	uint32_t plane_bit = 0, sched_flags;
+	int ret;
 
 	switch (intel_crtc->plane) {
 	case PLANE_A:
@@ -9117,18 +9117,6 @@ static int intel_gen7_queue_flip(struct drm_device *dev,
 		return -ENODEV;
 	}
 
-	len = 4;
-	if (ring->id == RCS) {
-		len += 6;
-		/*
-		 * On Gen 8, SRM is now taking an extra dword to accommodate
-		 * 48bits addresses, and we need a NOOP for the batch size to
-		 * stay even.
-		 */
-		if (IS_GEN8(dev))
-			len += 2;
-	}
-
 	/*
 	 * BSpec MI_DISPLAY_FLIP for IVB:
 	 * "The full packet must be contained within the same cache line."
@@ -9139,13 +9127,7 @@ static int intel_gen7_queue_flip(struct drm_device *dev,
 	 * then do the cacheline alignment, and finally emit the
 	 * MI_DISPLAY_FLIP.
 	 */
-	ret = intel_ring_cacheline_align(ring);
-	if (ret)
-		return ret;
-
-	ret = intel_ring_begin(ring, len);
-	if (ret)
-		return ret;
+	sched_flags = i915_ebp_sf_cacheline_align;
 
 	/* Unmask the flip-done completion message. Note that the bspec says that
 	 * we should do this for both the BCS and RCS, and that we must not unmask
@@ -9157,32 +9139,46 @@ static int intel_gen7_queue_flip(struct drm_device *dev,
 	 * to zero does lead to lockups within MI_DISPLAY_FLIP.
 	 */
 	if (ring->id == RCS) {
-		intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
-		intel_ring_emit(ring, DERRMR);
-		intel_ring_emit(ring, ~(DERRMR_PIPEA_PRI_FLIP_DONE |
-					DERRMR_PIPEB_PRI_FLIP_DONE |
-					DERRMR_PIPEC_PRI_FLIP_DONE));
-		if (IS_GEN8(dev))
-			intel_ring_emit(ring, MI_STORE_REGISTER_MEM_GEN8(1) |
-					      MI_SRM_LRM_GLOBAL_GTT);
-		else
-			intel_ring_emit(ring, MI_STORE_REGISTER_MEM(1) |
-					      MI_SRM_LRM_GLOBAL_GTT);
-		intel_ring_emit(ring, DERRMR);
-		intel_ring_emit(ring, ring->scratch.gtt_offset + 256);
-		if (IS_GEN8(dev)) {
-			intel_ring_emit(ring, 0);
-			intel_ring_emit(ring, MI_NOOP);
-		}
-	}
+		uint32_t cmds[] = {
+			MI_LOAD_REGISTER_IMM(1),
+			DERRMR,
+			~(DERRMR_PIPEA_PRI_FLIP_DONE |
+				DERRMR_PIPEB_PRI_FLIP_DONE |
+				DERRMR_PIPEC_PRI_FLIP_DONE),
+			IS_GEN8(dev) ? (MI_STORE_REGISTER_MEM_GEN8(1) |
+					MI_SRM_LRM_GLOBAL_GTT) :
+				       (MI_STORE_REGISTER_MEM(1) |
+					MI_SRM_LRM_GLOBAL_GTT),
+			DERRMR,
+			ring->scratch.gtt_offset + 256,
+//		if (IS_GEN8(dev)) {
+			0,
+			MI_NOOP,
+//		}
+			MI_DISPLAY_FLIP_I915 | plane_bit,
+			fb->pitches[0] | obj->tiling_mode,
+			intel_crtc->unpin_work->gtt_offset,
+			MI_NOOP
+		};
+		uint32_t len = sizeof(cmds) / sizeof(*cmds);
+
+		ret = i915_scheduler_queue_nonbatch(ring, cmds, len, &obj, 1, sched_flags);
+	} else {
+		uint32_t cmds[] = {
+			MI_DISPLAY_FLIP_I915 | plane_bit,
+			fb->pitches[0] | obj->tiling_mode,
+			intel_crtc->unpin_work->gtt_offset,
+			MI_NOOP
+		};
+		uint32_t len = sizeof(cmds) / sizeof(*cmds);
 
-	intel_ring_emit(ring, MI_DISPLAY_FLIP_I915 | plane_bit);
-	intel_ring_emit(ring, (fb->pitches[0] | obj->tiling_mode));
-	intel_ring_emit(ring, intel_crtc->unpin_work->gtt_offset);
-	intel_ring_emit(ring, (MI_NOOP));
+		ret = i915_scheduler_queue_nonbatch(ring, cmds, len, &obj, 1, sched_flags);
+	}
+	if (ret)
+		return ret;
 
 	intel_mark_page_flip_active(intel_crtc);
-	i915_add_request_wo_flush(ring);
+
 	return 0;
 }
 
-- 
1.7.9.5


* Re: [RFC 00/44] GPU scheduler for i915 driver
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (43 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 44/44] drm/i915: Fake batch support for page flips John.C.Harrison
@ 2014-06-26 20:44 ` Dave Airlie
  2014-07-07 15:57   ` Daniel Vetter
  2014-10-10 10:35 ` Steven Newbury
  45 siblings, 1 reply; 90+ messages in thread
From: Dave Airlie @ 2014-06-26 20:44 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: intel-gfx@lists.freedesktop.org

>
> Implemented a batch buffer submission scheduler for the i915 DRM driver.
>

While this seems very interesting, you might want to address in the commit msg
or the cover email

a) why this is needed,
b) any improvements in speed, power consumption or throughput it generates,
i.e. benchmarks.

also some notes on what hw supports preemption.

Dave.


* Re: [RFC 01/44] drm/i915: Corrected 'file_priv' to 'file' in 'i915_driver_preclose()'
  2014-06-26 17:23 ` [RFC 01/44] drm/i915: Corrected 'file_priv' to 'file' in 'i915_driver_preclose()' John.C.Harrison
@ 2014-06-30 21:03   ` Jesse Barnes
  2014-07-07 18:02     ` Daniel Vetter
  0 siblings, 1 reply; 90+ messages in thread
From: Jesse Barnes @ 2014-06-30 21:03 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Thu, 26 Jun 2014 18:23:52 +0100
John.C.Harrison@Intel.com wrote:

> From: John Harrison <John.C.Harrison@Intel.com>
> 
> The 'i915_driver_preclose()' function has a parameter called 'file_priv'.
> However, this is misleading as the structure it points to is a 'drm_file' not a
> 'drm_i915_file_private'. It should be named just 'file' to avoid confusion.
> ---
>  drivers/gpu/drm/i915/i915_dma.c |    6 +++---
>  drivers/gpu/drm/i915/i915_drv.h |    6 +++---
>  2 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index b9159ad..6cce55b 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -1916,11 +1916,11 @@ void i915_driver_lastclose(struct drm_device * dev)
>  	i915_dma_cleanup(dev);
>  }
>  
> -void i915_driver_preclose(struct drm_device * dev, struct drm_file *file_priv)
> +void i915_driver_preclose(struct drm_device *dev, struct drm_file *file)
>  {
>  	mutex_lock(&dev->struct_mutex);
> -	i915_gem_context_close(dev, file_priv);
> -	i915_gem_release(dev, file_priv);
> +	i915_gem_context_close(dev, file);
> +	i915_gem_release(dev, file);
>  	mutex_unlock(&dev->struct_mutex);
>  }
>  
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index bea9ab40..7a96ca0 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2044,12 +2044,12 @@ void i915_update_dri1_breadcrumb(struct drm_device *dev);
>  extern void i915_kernel_lost_context(struct drm_device * dev);
>  extern int i915_driver_load(struct drm_device *, unsigned long flags);
>  extern int i915_driver_unload(struct drm_device *);
> -extern int i915_driver_open(struct drm_device *dev, struct drm_file *file_priv);
> +extern int i915_driver_open(struct drm_device *dev, struct drm_file *file);
>  extern void i915_driver_lastclose(struct drm_device * dev);
>  extern void i915_driver_preclose(struct drm_device *dev,
> -				 struct drm_file *file_priv);
> +				 struct drm_file *file);
>  extern void i915_driver_postclose(struct drm_device *dev,
> -				  struct drm_file *file_priv);
> +				  struct drm_file *file);
>  extern int i915_driver_device_is_agp(struct drm_device * dev);
>  #ifdef CONFIG_COMPAT
>  extern long i915_compat_ioctl(struct file *filp, unsigned int cmd,

Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>

-- 
Jesse Barnes, Intel Open Source Technology Center


* Re: [RFC 03/44] drm/i915: Add extra add_request calls
  2014-06-26 17:23 ` [RFC 03/44] drm/i915: Add extra add_request calls John.C.Harrison
@ 2014-06-30 21:10   ` Jesse Barnes
  2014-07-07 18:41     ` Daniel Vetter
  0 siblings, 1 reply; 90+ messages in thread
From: Jesse Barnes @ 2014-06-30 21:10 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Thu, 26 Jun 2014 18:23:54 +0100
John.C.Harrison@Intel.com wrote:

> From: John Harrison <John.C.Harrison@Intel.com>
> 
> The scheduler needs to track batch buffers by seqno without extra, non-batch
> buffer work being attached to the same seqno. This means that anywhere which
> adds work to the ring should explicitly call i915_add_request() when it has
> finished writing to the ring.
> 
> The add_request() function does extra work, such as flushing caches, that does
> not necessarily want to be done everywhere. Instead, a new
> i915_add_request_wo_flush() function has been added which skips the cache flush
> and just tidies up request structures and seqno values.
> 
> Note, much of this patch was implemented by Naresh Kumar Kachhi for pending
> power management improvements. However, it is also directly applicable to the
> scheduler work as noted above.
> ---
>  drivers/gpu/drm/i915/i915_dma.c              |    5 +++++
>  drivers/gpu/drm/i915/i915_drv.h              |    9 +++++---
>  drivers/gpu/drm/i915/i915_gem.c              |   31 ++++++++++++++++++++------
>  drivers/gpu/drm/i915/i915_gem_context.c      |    9 ++++++++
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c   |    4 ++--
>  drivers/gpu/drm/i915/i915_gem_render_state.c |    2 +-
>  drivers/gpu/drm/i915/intel_display.c         |   10 ++++-----
>  7 files changed, 52 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index 67f2918..494b156 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -456,6 +456,7 @@ static int i915_dispatch_cmdbuffer(struct drm_device * dev,
>  				   struct drm_clip_rect *cliprects,
>  				   void *cmdbuf)
>  {
> +	struct drm_i915_private *dev_priv = dev->dev_private;
>  	int nbox = cmd->num_cliprects;
>  	int i = 0, count, ret;
>  
> @@ -482,6 +483,7 @@ static int i915_dispatch_cmdbuffer(struct drm_device * dev,
>  	}
>  
>  	i915_emit_breadcrumb(dev);
> +	i915_add_request_wo_flush(LP_RING(dev_priv));
>  	return 0;
>  }
>  
> @@ -544,6 +546,7 @@ static int i915_dispatch_batchbuffer(struct drm_device * dev,
>  	}
>  
>  	i915_emit_breadcrumb(dev);
> +	i915_add_request_wo_flush(LP_RING(dev_priv));
>  	return 0;
>  }
>  
> @@ -597,6 +600,7 @@ static int i915_dispatch_flip(struct drm_device * dev)
>  		ADVANCE_LP_RING();
>  	}
>  
> +	i915_add_request_wo_flush(LP_RING(dev_priv));
>  	master_priv->sarea_priv->pf_current_page = dev_priv->dri1.current_page;
>  	return 0;
>  }
> @@ -774,6 +778,7 @@ static int i915_emit_irq(struct drm_device * dev)
>  		OUT_RING(dev_priv->dri1.counter);
>  		OUT_RING(MI_USER_INTERRUPT);
>  		ADVANCE_LP_RING();
> +		i915_add_request_wo_flush(LP_RING(dev_priv));
>  	}
>  
>  	return dev_priv->dri1.counter;
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 7a96ca0..e3295cb 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2199,7 +2199,7 @@ static inline void i915_gem_object_unpin_pages(struct drm_i915_gem_object *obj)
>  
>  int __must_check i915_mutex_lock_interruptible(struct drm_device *dev);
>  int i915_gem_object_sync(struct drm_i915_gem_object *obj,
> -			 struct intel_engine_cs *to);
> +			 struct intel_engine_cs *to, bool add_request);
>  void i915_vma_move_to_active(struct i915_vma *vma,
>  			     struct intel_engine_cs *ring);
>  int i915_gem_dumb_create(struct drm_file *file_priv,
> @@ -2272,9 +2272,12 @@ int __must_check i915_gem_suspend(struct drm_device *dev);
>  int __i915_add_request(struct intel_engine_cs *ring,
>  		       struct drm_file *file,
>  		       struct drm_i915_gem_object *batch_obj,
> -		       u32 *seqno);
> +		       u32 *seqno,
> +		       bool flush_caches);
>  #define i915_add_request(ring, seqno) \
> -	__i915_add_request(ring, NULL, NULL, seqno)
> +	__i915_add_request(ring, NULL, NULL, seqno, true)
> +#define i915_add_request_wo_flush(ring) \
> +	__i915_add_request(ring, NULL, NULL, NULL, false)
>  int __must_check i915_wait_seqno(struct intel_engine_cs *ring,
>  				 uint32_t seqno);
>  int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf);
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 5a13d9e..898660c 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2320,7 +2320,8 @@ i915_gem_get_seqno(struct drm_device *dev, u32 *seqno)
>  int __i915_add_request(struct intel_engine_cs *ring,
>  		       struct drm_file *file,
>  		       struct drm_i915_gem_object *obj,
> -		       u32 *out_seqno)
> +		       u32 *out_seqno,
> +		       bool flush_caches)
>  {
>  	struct drm_i915_private *dev_priv = ring->dev->dev_private;
>  	struct drm_i915_gem_request *request;
> @@ -2335,9 +2336,11 @@ int __i915_add_request(struct intel_engine_cs *ring,
>  	 * is that the flush _must_ happen before the next request, no matter
>  	 * what.
>  	 */
> -	ret = intel_ring_flush_all_caches(ring);
> -	if (ret)
> -		return ret;
> +	if (flush_caches) {
> +		ret = intel_ring_flush_all_caches(ring);
> +		if (ret)
> +			return ret;
> +	}
>  
>  	request = ring->preallocated_lazy_request;
>  	if (WARN_ON(request == NULL))
> @@ -2815,6 +2818,8 @@ out:
>   *
>   * @obj: object which may be in use on another ring.
>   * @to: ring we wish to use the object on. May be NULL.
> + * @add_request: do we need to add a request to track operations
> + *    submitted on ring with sync_to function
>   *
>   * This code is meant to abstract object synchronization with the GPU.
>   * Calling with NULL implies synchronizing the object with the CPU
> @@ -2824,7 +2829,7 @@ out:
>   */
>  int
>  i915_gem_object_sync(struct drm_i915_gem_object *obj,
> -		     struct intel_engine_cs *to)
> +		     struct intel_engine_cs *to, bool add_request)
>  {
>  	struct intel_engine_cs *from = obj->ring;
>  	u32 seqno;
> @@ -2848,12 +2853,15 @@ i915_gem_object_sync(struct drm_i915_gem_object *obj,
>  
>  	trace_i915_gem_ring_sync_to(from, to, seqno);
>  	ret = to->semaphore.sync_to(to, from, seqno);
> -	if (!ret)
> +	if (!ret) {
>  		/* We use last_read_seqno because sync_to()
>  		 * might have just caused seqno wrap under
>  		 * the radar.
>  		 */
>  		from->semaphore.sync_seqno[idx] = obj->last_read_seqno;
> +		if (add_request)
> +			i915_add_request_wo_flush(to);
> +	}
>  
>  	return ret;
>  }
> @@ -2958,6 +2966,15 @@ int i915_gpu_idle(struct drm_device *dev)
>  		if (ret)
>  			return ret;
>  
> +		/* Make sure the context switch (if one actually happened)
> +		 * gets wrapped up and finished rather than hanging around
> +		 * and confusing things later. */
> +		if (ring->outstanding_lazy_seqno) {
> +			ret = i915_add_request(ring, NULL);
> +			if (ret)
> +				return ret;
> +		}
> +
>  		ret = intel_ring_idle(ring);
>  		if (ret)
>  			return ret;
> @@ -3832,7 +3849,7 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
>  	int ret;
>  
>  	if (pipelined != obj->ring) {
> -		ret = i915_gem_object_sync(obj, pipelined);
> +		ret = i915_gem_object_sync(obj, pipelined, true);
>  		if (ret)
>  			return ret;
>  	}
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 3ffe308..d1d2ee0 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -488,6 +488,15 @@ int i915_gem_context_enable(struct drm_i915_private *dev_priv)
>  		ret = i915_switch_context(ring, ring->default_context);
>  		if (ret)
>  			return ret;
> +
> +		/* Make sure the context switch (if one actually happened)
> +		 * gets wrapped up and finished rather than hanging around
> +		 * and confusing things later. */
> +		if(ring->outstanding_lazy_seqno) {
> +			ret = i915_add_request_wo_flush(ring);
> +			if (ret)
> +				return ret;
> +		}
>  	}
>  
>  	return 0;
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 3a30133..ee836a6 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -858,7 +858,7 @@ i915_gem_execbuffer_move_to_gpu(struct intel_engine_cs *ring,
>  
>  	list_for_each_entry(vma, vmas, exec_list) {
>  		struct drm_i915_gem_object *obj = vma->obj;
> -		ret = i915_gem_object_sync(obj, ring);
> +		ret = i915_gem_object_sync(obj, ring, false);
>  		if (ret)
>  			return ret;
>  
> @@ -998,7 +998,7 @@ i915_gem_execbuffer_retire_commands(struct drm_device *dev,
>  	ring->gpu_caches_dirty = true;
>  
>  	/* Add a breadcrumb for the completion of the batch buffer */
> -	(void)__i915_add_request(ring, file, obj, NULL);
> +	(void)__i915_add_request(ring, file, obj, NULL, true);
>  }
>  
>  static int
> diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.c b/drivers/gpu/drm/i915/i915_gem_render_state.c
> index 3521f99..50118cb 100644
> --- a/drivers/gpu/drm/i915/i915_gem_render_state.c
> +++ b/drivers/gpu/drm/i915/i915_gem_render_state.c
> @@ -190,7 +190,7 @@ int i915_gem_render_state_init(struct intel_engine_cs *ring)
>  
>  	i915_vma_move_to_active(i915_gem_obj_to_ggtt(so->obj), ring);
>  
> -	ret = __i915_add_request(ring, NULL, so->obj, NULL);
> +	ret = __i915_add_request(ring, NULL, so->obj, NULL, true);
>  	/* __i915_add_request moves object to inactive if it fails */
>  out:
>  	render_state_free(so);
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 54095d4..fa1ffbb 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -8980,7 +8980,7 @@ static int intel_gen2_queue_flip(struct drm_device *dev,
>  	intel_ring_emit(ring, 0); /* aux display base address, unused */
>  
>  	intel_mark_page_flip_active(intel_crtc);
> -	__intel_ring_advance(ring);
> +	i915_add_request_wo_flush(ring);
>  	return 0;
>  }
>  
> @@ -9012,7 +9012,7 @@ static int intel_gen3_queue_flip(struct drm_device *dev,
>  	intel_ring_emit(ring, MI_NOOP);
>  
>  	intel_mark_page_flip_active(intel_crtc);
> -	__intel_ring_advance(ring);
> +	i915_add_request_wo_flush(ring);
>  	return 0;
>  }
>  
> @@ -9051,7 +9051,7 @@ static int intel_gen4_queue_flip(struct drm_device *dev,
>  	intel_ring_emit(ring, pf | pipesrc);
>  
>  	intel_mark_page_flip_active(intel_crtc);
> -	__intel_ring_advance(ring);
> +	i915_add_request_wo_flush(ring);
>  	return 0;
>  }
>  
> @@ -9087,7 +9087,7 @@ static int intel_gen6_queue_flip(struct drm_device *dev,
>  	intel_ring_emit(ring, pf | pipesrc);
>  
>  	intel_mark_page_flip_active(intel_crtc);
> -	__intel_ring_advance(ring);
> +	i915_add_request_wo_flush(ring);
>  	return 0;
>  }
>  
> @@ -9182,7 +9182,7 @@ static int intel_gen7_queue_flip(struct drm_device *dev,
>  	intel_ring_emit(ring, (MI_NOOP));
>  
>  	intel_mark_page_flip_active(intel_crtc);
> -	__intel_ring_advance(ring);
> +	i915_add_request_wo_flush(ring);
>  	return 0;
>  }
>  

I think "no_flush" would be more in line with some of the other
functions in the kernel.  "wo" makes me think of "write only".  But
it's not a big deal.

I do wonder about the rules for when add_request is needed though, and
I need to look later in the series for the usage.  When I looked at it
in relation to fences, it didn't seem to be a good fit since it looked
like requests got freed when the active list was cleared, vs when they
were actually consumed by some user.

But this patch seems straightforward enough, so:

Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>

-- 
Jesse Barnes, Intel Open Source Technology Center


* Re: [RFC 04/44] drm/i915: Fix null pointer dereference in error capture
  2014-06-26 17:23 ` [RFC 04/44] drm/i915: Fix null pointer dereference in error capture John.C.Harrison
@ 2014-06-30 21:40   ` Jesse Barnes
  2014-07-01  7:12     ` Chris Wilson
  2014-07-01  7:20   ` [PATCH] drm/i915: Remove num_pages parameter to i915_error_object_create() Chris Wilson
  1 sibling, 1 reply; 90+ messages in thread
From: Jesse Barnes @ 2014-06-30 21:40 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Thu, 26 Jun 2014 18:23:55 +0100
John.C.Harrison@Intel.com wrote:

> From: John Harrison <John.C.Harrison@Intel.com>
> 
> The i915_gem_record_rings() code was unconditionally querying and saving state
> for the batch_obj of a request structure. This is not necessarily set. Thus a
> null pointer dereference can occur.
> ---
>  drivers/gpu/drm/i915/i915_gpu_error.c |   13 +++++++------
>  1 file changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 87ec60e..0738f21 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -902,12 +902,13 @@ static void i915_gem_record_rings(struct drm_device *dev,
>  			 * as the simplest method to avoid being overwritten
>  			 * by userspace.
>  			 */
> -			error->ring[i].batchbuffer =
> -				i915_error_object_create(dev_priv,
> -							 request->batch_obj,
> -							 request->ctx ?
> -							 request->ctx->vm :
> -							 &dev_priv->gtt.base);
> +			if(request->batch_obj)
> +				error->ring[i].batchbuffer =
> +					i915_error_object_create(dev_priv,
> +								 request->batch_obj,
> +								 request->ctx ?
> +								 request->ctx->vm :
> +								 &dev_priv->gtt.base);
>  
>  			if (HAS_BROKEN_CS_TLB(dev_priv->dev) &&
>  			    ring->scratch.obj)

Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>

-- 
Jesse Barnes, Intel Open Source Technology Center


* Re: [RFC 04/44] drm/i915: Fix null pointer dereference in error capture
  2014-06-30 21:40   ` Jesse Barnes
@ 2014-07-01  7:12     ` Chris Wilson
  2014-07-07 18:49       ` Daniel Vetter
  0 siblings, 1 reply; 90+ messages in thread
From: Chris Wilson @ 2014-07-01  7:12 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: Intel-GFX

On Mon, Jun 30, 2014 at 02:40:05PM -0700, Jesse Barnes wrote:
> On Thu, 26 Jun 2014 18:23:55 +0100
> John.C.Harrison@Intel.com wrote:
> 
> > From: John Harrison <John.C.Harrison@Intel.com>
> > 
> > The i915_gem_record_rings() code was unconditionally querying and saving state
> > for the batch_obj of a request structure. This is not necessarily set. Thus a
> > null pointer dereference can occur.
> > ---
> >  drivers/gpu/drm/i915/i915_gpu_error.c |   13 +++++++------
> >  1 file changed, 7 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> > index 87ec60e..0738f21 100644
> > --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> > +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> > @@ -902,12 +902,13 @@ static void i915_gem_record_rings(struct drm_device *dev,
> >  			 * as the simplest method to avoid being overwritten
> >  			 * by userspace.
> >  			 */
> > -			error->ring[i].batchbuffer =
> > -				i915_error_object_create(dev_priv,
> > -							 request->batch_obj,
> > -							 request->ctx ?
> > -							 request->ctx->vm :
> > -							 &dev_priv->gtt.base);
> > +			if(request->batch_obj)
> > +				error->ring[i].batchbuffer =
> > +					i915_error_object_create(dev_priv,
> > +								 request->batch_obj,
> > +								 request->ctx ?
> > +								 request->ctx->vm :
> > +								 &dev_priv->gtt.base);
> >  
> >  			if (HAS_BROKEN_CS_TLB(dev_priv->dev) &&
> >  			    ring->scratch.obj)
> 
> Reviewed-by: Jesse Barnes <jbarnes@virtuosugeek.org>

Nah, put the NULL check into the macro. i915_error_object_create() was
originally written as a no-op on NULL pointers for cleanliness, we may
as well do the check centrally and remove the extras we have grown.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


* [PATCH] drm/i915: Remove num_pages parameter to i915_error_object_create()
  2014-06-26 17:23 ` [RFC 04/44] drm/i915: Fix null pointer dereference in error capture John.C.Harrison
  2014-06-30 21:40   ` Jesse Barnes
@ 2014-07-01  7:20   ` Chris Wilson
  1 sibling, 0 replies; 90+ messages in thread
From: Chris Wilson @ 2014-07-01  7:20 UTC (permalink / raw)
  To: intel-gfx

For cleanliness, i915_error_object_create() was written to handle the
NULL pointer in a central location. The macro that wrapped it and passed
it a num_pages to use was not safe. As we now never limit the num_pages
to use (we did so at one point to only capture the first page of the
context), we can remove the redundant macro and be NULL safe again.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_gpu_error.c | 25 ++++++++++---------------
 1 file changed, 10 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 394e283970a8..f1581a4af7a7 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -538,12 +538,12 @@ static void i915_error_state_free(struct kref *error_ref)
 }
 
 static struct drm_i915_error_object *
-i915_error_object_create_sized(struct drm_i915_private *dev_priv,
-			       struct drm_i915_gem_object *src,
-			       struct i915_address_space *vm,
-			       int num_pages)
+i915_error_object_create(struct drm_i915_private *dev_priv,
+			 struct drm_i915_gem_object *src,
+			 struct i915_address_space *vm)
 {
 	struct drm_i915_error_object *dst;
+	int num_pages;
 	bool use_ggtt;
 	int i = 0;
 	u32 reloc_offset;
@@ -551,6 +551,8 @@ i915_error_object_create_sized(struct drm_i915_private *dev_priv,
 	if (src == NULL || src->pages == NULL)
 		return NULL;
 
+	num_pages = src->base.size >> PAGE_SHIFT;
+
 	dst = kmalloc(sizeof(*dst) + num_pages * sizeof(u32 *), GFP_ATOMIC);
 	if (dst == NULL)
 		return NULL;
@@ -629,13 +631,8 @@ unwind:
 	kfree(dst);
 	return NULL;
 }
-#define i915_error_object_create(dev_priv, src, vm) \
-	i915_error_object_create_sized((dev_priv), (src), (vm), \
-				       (src)->base.size>>PAGE_SHIFT)
-
 #define i915_error_ggtt_object_create(dev_priv, src) \
-	i915_error_object_create_sized((dev_priv), (src), &(dev_priv)->gtt.base, \
-				       (src)->base.size>>PAGE_SHIFT)
+	i915_error_object_create((dev_priv), (src), &(dev_priv)->gtt.base)
 
 static void capture_bo(struct drm_i915_error_buffer *err,
 		       struct i915_vma *vma)
@@ -932,8 +929,7 @@ static void i915_gem_record_rings(struct drm_device *dev,
 							 request->ctx->vm :
 							 &dev_priv->gtt.base);
 
-			if (HAS_BROKEN_CS_TLB(dev_priv->dev) &&
-			    ring->scratch.obj)
+			if (HAS_BROKEN_CS_TLB(dev_priv->dev))
 				error->ring[i].wa_batchbuffer =
 					i915_error_ggtt_object_create(dev_priv,
 							     ring->scratch.obj);
@@ -955,9 +951,8 @@ static void i915_gem_record_rings(struct drm_device *dev,
 		error->ring[i].ringbuffer =
 			i915_error_ggtt_object_create(dev_priv, ring->buffer->obj);
 
-		if (ring->status_page.obj)
-			error->ring[i].hws_page =
-				i915_error_ggtt_object_create(dev_priv, ring->status_page.obj);
+		error->ring[i].hws_page =
+			i915_error_ggtt_object_create(dev_priv, ring->status_page.obj);
 
 		i915_gem_record_active_context(ring, error, &error->ring[i]);
 
-- 
2.0.0


* Re: [RFC 05/44] drm/i915: Updating assorted register and status page definitions
  2014-06-26 17:23 ` [RFC 05/44] drm/i915: Updating assorted register and status page definitions John.C.Harrison
@ 2014-07-02 17:49   ` Jesse Barnes
  0 siblings, 0 replies; 90+ messages in thread
From: Jesse Barnes @ 2014-07-02 17:49 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Thu, 26 Jun 2014 18:23:56 +0100
John.C.Harrison@Intel.com wrote:

> + * Preemption-related registers
> + */
> +#define RING_UHPTR(base)	((base)+0x134)
> +#define   UHPTR_GFX_ADDR_ALIGN		(0x7)
> +#define   UHPTR_VALID			(0x1)
> +#define RING_PREEMPT_ADDR	0x0214c
> +#define   PREEMPT_BATCH_LEVEL_MASK	(0x3)
> +#define BB_PREEMPT_ADDR		0x02148
> +#define SBB_PREEMPT_ADDR	0x0213c
> +#define RS_PREEMPT_STATUS	0x0215c

I couldn't find these easily, and the GFX_ADDR_ALIGN is just page
alignment right?  So you might not need that one.  But overall looks
fine.

Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>

-- 
Jesse Barnes, Intel Open Source Technology Center


* Re: [RFC 06/44] drm/i915: Fixes for FIFO space queries
  2014-06-26 17:23 ` [RFC 06/44] drm/i915: Fixes for FIFO space queries John.C.Harrison
@ 2014-07-02 17:50   ` Jesse Barnes
  0 siblings, 0 replies; 90+ messages in thread
From: Jesse Barnes @ 2014-07-02 17:50 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Thu, 26 Jun 2014 18:23:57 +0100
John.C.Harrison@Intel.com wrote:

> From: John Harrison <John.C.Harrison@Intel.com>
> 
> The previous code was not correctly masking the value of the GTFIFOCTL register,
> leading to overruns and the message "MMIO read or write has been dropped". In
> addition, the checks were repeated in several different places. This commit
> replaces these various checks with a simple (inline) function to encapsulate the
> read-and-mask operation. In addition, it adds a custom wait-for-fifo function
> for VLV, as the timing parameters are somewhat different from those on earlier
> chips.
> ---
>  drivers/gpu/drm/i915/intel_uncore.c |   49 ++++++++++++++++++++++++++++++-----
>  1 file changed, 42 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> index 871c284..6a3dddf 100644
> --- a/drivers/gpu/drm/i915/intel_uncore.c
> +++ b/drivers/gpu/drm/i915/intel_uncore.c
> @@ -47,6 +47,12 @@ assert_device_not_suspended(struct drm_i915_private *dev_priv)
>  	     "Device suspended\n");
>  }
>  
> +static inline u32 fifo_free_entries(struct drm_i915_private *dev_priv)
> +{
> +	u32 count = __raw_i915_read32(dev_priv, GTFIFOCTL);
> +	return count & GT_FIFO_FREE_ENTRIES_MASK;
> +}
> +
>  static void __gen6_gt_wait_for_thread_c0(struct drm_i915_private *dev_priv)
>  {
>  	u32 gt_thread_status_mask;
> @@ -154,6 +160,28 @@ static void __gen7_gt_force_wake_mt_put(struct drm_i915_private *dev_priv,
>  		gen6_gt_check_fifodbg(dev_priv);
>  }
>  
> +static int __vlv_gt_wait_for_fifo(struct drm_i915_private *dev_priv)
> +{
> +	u32 free = fifo_free_entries(dev_priv);
> +	int loop1, loop2;
> +
> +	for (loop1 = 0; loop1 < 5000 && free < GT_FIFO_NUM_RESERVED_ENTRIES; ) {
> +		for (loop2 = 0; loop2 < 1000 && free < GT_FIFO_NUM_RESERVED_ENTRIES; loop2 += 10) {
> +			udelay(10);
> +			free = fifo_free_entries(dev_priv);
> +		}
> +		loop1 += loop2;
> +		if (loop1 > 1000 || free < 48)
> +			DRM_DEBUG("after %d us, the FIFO has %d slots", loop1, free);
> +	}
> +
> +	dev_priv->uncore.fifo_count = free;
> +	if (WARN(free < GT_FIFO_NUM_RESERVED_ENTRIES,
> +		"FIFO has insufficient space (%d slots)", free))
> +		return -1;
> +	return 0;
> +}
> +
>  static int __gen6_gt_wait_for_fifo(struct drm_i915_private *dev_priv)
>  {
>  	int ret = 0;
> @@ -161,16 +189,15 @@ static int __gen6_gt_wait_for_fifo(struct drm_i915_private *dev_priv)
>  	/* On VLV, FIFO will be shared by both SW and HW.
>  	 * So, we need to read the FREE_ENTRIES everytime */
>  	if (IS_VALLEYVIEW(dev_priv->dev))
> -		dev_priv->uncore.fifo_count =
> -			__raw_i915_read32(dev_priv, GTFIFOCTL) &
> -						GT_FIFO_FREE_ENTRIES_MASK;
> +		return __vlv_gt_wait_for_fifo(dev_priv);
>  
>  	if (dev_priv->uncore.fifo_count < GT_FIFO_NUM_RESERVED_ENTRIES) {
>  		int loop = 500;
> -		u32 fifo = __raw_i915_read32(dev_priv, GTFIFOCTL) & GT_FIFO_FREE_ENTRIES_MASK;
> +		u32 fifo = fifo_free_entries(dev_priv);
> +
>  		while (fifo <= GT_FIFO_NUM_RESERVED_ENTRIES && loop--) {
>  			udelay(10);
> -			fifo = __raw_i915_read32(dev_priv, GTFIFOCTL) & GT_FIFO_FREE_ENTRIES_MASK;
> +			fifo = fifo_free_entries(dev_priv);
>  		}
>  		if (WARN_ON(loop < 0 && fifo <= GT_FIFO_NUM_RESERVED_ENTRIES))
>  			++ret;
> @@ -194,6 +221,11 @@ static void vlv_force_wake_reset(struct drm_i915_private *dev_priv)
>  static void __vlv_force_wake_get(struct drm_i915_private *dev_priv,
>  						int fw_engine)
>  {
> +#if	1
> +	if (__gen6_gt_wait_for_fifo(dev_priv))
> +		gen6_gt_check_fifodbg(dev_priv);
> +#endif
> +
>  	/* Check for Render Engine */
>  	if (FORCEWAKE_RENDER & fw_engine) {
>  		if (wait_for_atomic((__raw_i915_read32(dev_priv,
> @@ -238,6 +270,10 @@ static void __vlv_force_wake_get(struct drm_i915_private *dev_priv,
>  static void __vlv_force_wake_put(struct drm_i915_private *dev_priv,
>  					int fw_engine)
>  {
> +#if	1
> +	if (__gen6_gt_wait_for_fifo(dev_priv))
> +		gen6_gt_check_fifodbg(dev_priv);
> +#endif
>  
>  	/* Check for Render Engine */
>  	if (FORCEWAKE_RENDER & fw_engine)
> @@ -355,8 +391,7 @@ static void intel_uncore_forcewake_reset(struct drm_device *dev, bool restore)
>  
>  		if (IS_GEN6(dev) || IS_GEN7(dev))
>  			dev_priv->uncore.fifo_count =
> -				__raw_i915_read32(dev_priv, GTFIFOCTL) &
> -				GT_FIFO_FREE_ENTRIES_MASK;
> +				fifo_free_entries(dev_priv);
>  	} else {
>  		dev_priv->uncore.forcewake_count = 0;
>  		dev_priv->uncore.fw_rendercount = 0;

It would be best to split out the free_entries cleanup (a good one)
from the vlv bug fix, and also drop the #if 1s.

With that done:

Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>

-- 
Jesse Barnes, Intel Open Source Technology Center


* Re: [RFC 07/44] drm/i915: Disable 'get seqno' workaround for VLV
  2014-06-26 17:23 ` [RFC 07/44] drm/i915: Disable 'get seqno' workaround for VLV John.C.Harrison
@ 2014-07-02 17:51   ` Jesse Barnes
  2014-07-07 18:56     ` Daniel Vetter
  0 siblings, 1 reply; 90+ messages in thread
From: Jesse Barnes @ 2014-07-02 17:51 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Thu, 26 Jun 2014 18:23:58 +0100
John.C.Harrison@Intel.com wrote:

> From: John Harrison <John.C.Harrison@Intel.com>
> 
> There is a workaround for a hardware bug when reading the seqno from the status
> page. The bug does not exist on VLV; however, the workaround was still being
> applied.
> ---
>  drivers/gpu/drm/i915/intel_ringbuffer.c |    5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 279488a..bad5db0 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1960,7 +1960,10 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
>  			ring->irq_put = gen6_ring_put_irq;
>  		}
>  		ring->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
> -		ring->get_seqno = gen6_ring_get_seqno;
> +		if (IS_VALLEYVIEW(dev))
> +			ring->get_seqno = ring_get_seqno;
> +		else
> +			ring->get_seqno = gen6_ring_get_seqno;
>  		ring->set_seqno = ring_set_seqno;
>  		ring->semaphore.sync_to = gen6_ring_sync;
>  		ring->semaphore.signal = gen6_signal;

Assuming this has been well tested:
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>

-- 
Jesse Barnes, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 09/44] drm/i915: Start of GPU scheduler
  2014-06-26 17:24 ` [RFC 09/44] drm/i915: Start of GPU scheduler John.C.Harrison
@ 2014-07-02 17:55   ` Jesse Barnes
  2014-07-07 19:02   ` Daniel Vetter
  1 sibling, 0 replies; 90+ messages in thread
From: Jesse Barnes @ 2014-07-02 17:55 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Thu, 26 Jun 2014 18:24:00 +0100
John.C.Harrison@Intel.com wrote:

> From: John Harrison <John.C.Harrison@Intel.com>
> 
> Created GPU scheduler source files with only a basic init function.
> ---
>  drivers/gpu/drm/i915/Makefile         |    1 +
>  drivers/gpu/drm/i915/i915_drv.h       |    4 +++
>  drivers/gpu/drm/i915/i915_gem.c       |    3 ++
>  drivers/gpu/drm/i915/i915_scheduler.c |   59 +++++++++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/i915_scheduler.h |   40 ++++++++++++++++++++++
>  5 files changed, 107 insertions(+)
>  create mode 100644 drivers/gpu/drm/i915/i915_scheduler.c
>  create mode 100644 drivers/gpu/drm/i915/i915_scheduler.h
> 
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index cad1683..12817a8 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -11,6 +11,7 @@ i915-y := i915_drv.o \
>  	  i915_params.o \
>            i915_suspend.o \
>  	  i915_sysfs.o \
> +	  i915_scheduler.o \
>  	  intel_pm.o
>  i915-$(CONFIG_COMPAT)   += i915_ioc32.o
>  i915-$(CONFIG_DEBUG_FS) += i915_debugfs.o
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 53f6fe5..6e592d3 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1331,6 +1331,8 @@ struct intel_pipe_crc {
>  	wait_queue_head_t wq;
>  };
>  
> +struct i915_scheduler;
> +
>  struct drm_i915_private {
>  	struct drm_device *dev;
>  	struct kmem_cache *slab;
> @@ -1540,6 +1542,8 @@ struct drm_i915_private {
>  
>  	struct i915_runtime_pm pm;
>  
> +	struct i915_scheduler *scheduler;
> +
>  	/* Old dri1 support infrastructure, beware the dragons ya fools entering
>  	 * here! */
>  	struct i915_dri1_state dri1;
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 898660c..b784eb2 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -37,6 +37,7 @@
>  #include <linux/swap.h>
>  #include <linux/pci.h>
>  #include <linux/dma-buf.h>
> +#include "i915_scheduler.h"
>  
>  static void i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj);
>  static void i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj,
> @@ -4669,6 +4670,8 @@ static int i915_gem_init_rings(struct drm_device *dev)
>  			goto cleanup_vebox_ring;
>  	}
>  
> +	i915_scheduler_init(dev);
> +
>  	ret = i915_gem_set_seqno(dev, ((u32)~0 - 0x1000));
>  	if (ret)
>  		goto cleanup_bsd2_ring;
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
> new file mode 100644
> index 0000000..9ec0225
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/i915_scheduler.c
> @@ -0,0 +1,59 @@
> +/*
> + * Copyright (c) 2014 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + *
> + */
> +
> +#include "i915_drv.h"
> +#include "intel_drv.h"
> +#include "i915_scheduler.h"
> +
> +#ifdef CONFIG_DRM_I915_SCHEDULER
> +
> +int i915_scheduler_init(struct drm_device *dev)
> +{
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct i915_scheduler   *scheduler = dev_priv->scheduler;
> +
> +	if (scheduler)
> +		return 0;
> +
> +	scheduler = kzalloc(sizeof(*scheduler), GFP_KERNEL);
> +	if (!scheduler)
> +		return -ENOMEM;
> +
> +	spin_lock_init(&scheduler->lock);
> +
> +	scheduler->index = 1;
> +
> +	dev_priv->scheduler = scheduler;
> +
> +	return 0;
> +}
> +
> +#else   /* CONFIG_DRM_I915_SCHEDULER */
> +
> +int i915_scheduler_init(struct drm_device *dev)
> +{
> +	return 0;
> +}
> +
> +#endif  /* CONFIG_DRM_I915_SCHEDULER */

Usually these bits are hidden in a header, and the source file isn't
compiled in if the config isn't set.  But I think once we get it in,
we might just want a runtime option rather than a config option anyway,
so I'd say you could just drop the config option.
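
(For illustration, a sketch of that conventional pattern, reusing the names from
the patch: the Makefile would add i915_scheduler.o only when
CONFIG_DRM_I915_SCHEDULER is set, and the header would carry static inline stubs
for the disabled case, so the .c file needs no #ifdef at all.)

/* i915_scheduler.h -- sketch of the usual config-stub pattern */
#ifdef CONFIG_DRM_I915_SCHEDULER
int i915_scheduler_init(struct drm_device *dev);
#else
static inline int i915_scheduler_init(struct drm_device *dev)
{
	/* Scheduler compiled out: nothing to set up */
	return 0;
}
#endif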

-- 
Jesse Barnes, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 10/44] drm/i915: Prepare retire_requests to handle out-of-order seqnos
  2014-06-26 17:24 ` [RFC 10/44] drm/i915: Prepare retire_requests to handle out-of-order seqnos John.C.Harrison
@ 2014-07-02 18:11   ` Jesse Barnes
  2014-07-07 19:05   ` Daniel Vetter
  1 sibling, 0 replies; 90+ messages in thread
From: Jesse Barnes @ 2014-07-02 18:11 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Thu, 26 Jun 2014 18:24:01 +0100
John.C.Harrison@Intel.com wrote:

> From: John Harrison <John.C.Harrison@Intel.com>
> 
> A major point of the GPU scheduler is that it re-orders batch buffers after they
> have been submitted to the driver. Rather than attempting to re-assign seqno
> values, it is much simpler to have each batch buffer keep its initially assigned
> number and modify the rest of the driver to cope with seqnos being returned out
> of order. In practice, very little code actually needs updating to cope.
> 
> One such place is the retire request handler. Rather than stopping as soon as an
> uncompleted seqno is found, it must now keep iterating through the requests in
> case later seqnos have completed. There is also a problem with doing the free of
> the request before the move to inactive. Thus the requests are now moved to a
> temporary list first, then the objects de-activated and finally the requests on
> the temporary list are freed.
> ---
>  drivers/gpu/drm/i915/i915_gem.c |   60 +++++++++++++++++++++------------------
>  1 file changed, 32 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index b784eb2..7e53446 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2602,7 +2602,10 @@ void i915_gem_reset(struct drm_device *dev)
>  void
>  i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
>  {
> +	struct drm_i915_gem_object *obj, *obj_next;
> +	struct drm_i915_gem_request *req, *req_next;
>  	uint32_t seqno;
> +	LIST_HEAD(deferred_request_free);
>  
>  	if (list_empty(&ring->request_list))
>  		return;
> @@ -2611,43 +2614,35 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
>  
>  	seqno = ring->get_seqno(ring, true);
>  
> -	/* Move any buffers on the active list that are no longer referenced
> -	 * by the ringbuffer to the flushing/inactive lists as appropriate,
> -	 * before we free the context associated with the requests.
> +	/* Note that seqno values might be out of order due to rescheduling and
> +	 * pre-emption. Thus both lists must be processed in their entirety
> +	 * rather than stopping at the first 'non-passed' entry.
>  	 */
> -	while (!list_empty(&ring->active_list)) {
> -		struct drm_i915_gem_object *obj;
> -
> -		obj = list_first_entry(&ring->active_list,
> -				      struct drm_i915_gem_object,
> -				      ring_list);
> -
> -		if (!i915_seqno_passed(seqno, obj->last_read_seqno))
> -			break;
>  
> -		i915_gem_object_move_to_inactive(obj);
> -	}
> -
> -
> -	while (!list_empty(&ring->request_list)) {
> -		struct drm_i915_gem_request *request;
> -
> -		request = list_first_entry(&ring->request_list,
> -					   struct drm_i915_gem_request,
> -					   list);
> -
> -		if (!i915_seqno_passed(seqno, request->seqno))
> -			break;
> +	list_for_each_entry_safe(req, req_next, &ring->request_list, list) {
> +		if (!i915_seqno_passed(seqno, req->seqno))
> +			continue;
>  
> -		trace_i915_gem_request_retire(ring, request->seqno);
> +		trace_i915_gem_request_retire(ring, req->seqno);
>  		/* We know the GPU must have read the request to have
>  		 * sent us the seqno + interrupt, so use the position
>  		 * of tail of the request to update the last known position
>  		 * of the GPU head.
>  		 */
> -		ring->buffer->last_retired_head = request->tail;
> +		ring->buffer->last_retired_head = req->tail;
>  
> -		i915_gem_free_request(request);
> +		list_move_tail(&req->list, &deferred_request_free);
> +	}
> +
> +	/* Move any buffers on the active list that are no longer referenced
> +	 * by the ringbuffer to the flushing/inactive lists as appropriate,
> +	 * before we free the context associated with the requests.
> +	 */
> +	list_for_each_entry_safe(obj, obj_next, &ring->active_list, ring_list) {
> +		if (!i915_seqno_passed(seqno, obj->last_read_seqno))
> +			continue;
> +
> +		i915_gem_object_move_to_inactive(obj);
>  	}
>  
>  	if (unlikely(ring->trace_irq_seqno &&
> @@ -2656,6 +2651,15 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
>  		ring->trace_irq_seqno = 0;
>  	}
>  
> +	/* Finish processing active list before freeing request */
> +	while (!list_empty(&deferred_request_free)) {
> +		req = list_first_entry(&deferred_request_free,
> +	                               struct drm_i915_gem_request,
> +	                               list);
> +
> +		i915_gem_free_request(req);
> +	}
> +
>  	WARN_ON(i915_verify_lists(ring->dev));
>  }
>  

I think this looks ok, but I don't look at this code much...  Seems
like it should be fine to go in as-is, though I do worry a little about
the additional time we'll spend walking the list if we have lots of
outstanding requests.  But since this is just called in a work queue,
maybe that's fine.

Going forward, I guess we might want per-context seqno tracking
instead, with more limited preemption within a context (or maybe
none?), which might make things easier.  But that would require a bit
more restructuring...

Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>

-- 
Jesse Barnes, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 11/44] drm/i915: Added scheduler hook into i915_seqno_passed()
  2014-06-26 17:24 ` [RFC 11/44] drm/i915: Added scheduler hook into i915_seqno_passed() John.C.Harrison
@ 2014-07-02 18:14   ` Jesse Barnes
  0 siblings, 0 replies; 90+ messages in thread
From: Jesse Barnes @ 2014-07-02 18:14 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Thu, 26 Jun 2014 18:24:02 +0100
John.C.Harrison@Intel.com wrote:

> +bool i915_scheduler_is_seqno_in_flight(struct intel_engine_cs *ring,
> +				uint32_t seqno, bool *completed);
> +

In what cases might the return value not match the completed value?  I
guess I'll see in a later patch...

Same comment about the ifdef applies here; looks like you have some
runtime checking in place too, which seems sufficient to me.

-- 
Jesse Barnes, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 12/44] drm/i915: Disable hardware semaphores when GPU scheduler is enabled
  2014-06-26 17:24 ` [RFC 12/44] drm/i915: Disable hardware semaphores when GPU scheduler is enabled John.C.Harrison
@ 2014-07-02 18:16   ` Jesse Barnes
  0 siblings, 0 replies; 90+ messages in thread
From: Jesse Barnes @ 2014-07-02 18:16 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Thu, 26 Jun 2014 18:24:03 +0100
John.C.Harrison@Intel.com wrote:

> From: John Harrison <John.C.Harrison@Intel.com>
> 
> Hardware semaphores require seqno values to be continuously incrementing.
> However, the scheduler's reordering of batch buffers means that the seqno values
> going through the hardware could be out of order. Thus semaphores cannot be
> used.
> 
> On the other hand, the scheduler supersedes the need for hardware semaphores
> anyway. Having one ring stall waiting for something to complete on another ring
> is inefficient if that ring could be working on some other, independent task.
> This is what the scheduler is meant to do - keep the hardware as busy as
> possible by reordering batch buffers to avoid dependency stalls.
> ---
>  drivers/gpu/drm/i915/i915_drv.c         |    9 +++++++++
>  drivers/gpu/drm/i915/i915_scheduler.c   |    9 +++++++++
>  drivers/gpu/drm/i915/i915_scheduler.h   |    1 +
>  drivers/gpu/drm/i915/intel_ringbuffer.c |    4 ++++
>  4 files changed, 23 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index e2bfdda..748b13a 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -33,6 +33,7 @@
>  #include "i915_drv.h"
>  #include "i915_trace.h"
>  #include "intel_drv.h"
> +#include "i915_scheduler.h"
>  
>  #include <linux/console.h>
>  #include <linux/module.h>
> @@ -468,6 +469,14 @@ void intel_detect_pch(struct drm_device *dev)
>  
>  bool i915_semaphore_is_enabled(struct drm_device *dev)
>  {
> +	/* Hardware semaphores are not compatible with the scheduler due to the
> +	 * seqno values being potentially out of order. However, semaphores are
> +	 * also not required as the scheduler will handle interring dependencies
> +	 * and try do so in a way that does not cause dead time on the hardware.
> +	 */
> +	if (i915_scheduler_is_enabled(dev))
> +		return 0;
> +
>  	if (INTEL_INFO(dev)->gen < 6)
>  		return false;
>  
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
> index e9aa566..d9c1879 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler.c
> +++ b/drivers/gpu/drm/i915/i915_scheduler.c
> @@ -26,6 +26,15 @@
>  #include "intel_drv.h"
>  #include "i915_scheduler.h"
>  
> +bool i915_scheduler_is_enabled(struct drm_device *dev)
> +{
> +#ifdef CONFIG_DRM_I915_SCHEDULER
> +	return true;
> +#else
> +	return false;
> +#endif
> +}

I think this should be:
	if (dev_priv->scheduler)
		return true;
	return false;

instead?
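
(Spelled out as a complete sketch, assuming dev_priv is reached from dev as
elsewhere in the series:)

bool i915_scheduler_is_enabled(struct drm_device *dev)
{
	struct drm_i915_private *dev_priv = dev->dev_private;

	/* Enabled iff init actually allocated a scheduler instance */
	return dev_priv->scheduler != NULL;
}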

Otherwise looks fine.

-- 
Jesse Barnes, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 13/44] drm/i915: Added scheduler hook when closing DRM file handles
  2014-06-26 17:24 ` [RFC 13/44] drm/i915: Added scheduler hook when closing DRM file handles John.C.Harrison
@ 2014-07-02 18:20   ` Jesse Barnes
  2014-07-23 15:10     ` John Harrison
  0 siblings, 1 reply; 90+ messages in thread
From: Jesse Barnes @ 2014-07-02 18:20 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Thu, 26 Jun 2014 18:24:04 +0100
John.C.Harrison@Intel.com wrote:

> From: John Harrison <John.C.Harrison@Intel.com>
> 
> The scheduler decouples the submission of batch buffers to the driver from
> submission of batch buffers to the hardware. Thus it is possible for an
> application to submit work, then close the DRM handle and free up all the
> resources that piece of work wishes to use before the work has even been
> submitted to the hardware. To prevent this, the scheduler needs to be informed
> of the DRM close event so that it can force through any outstanding work
> attributed to that file handle.
> ---
>  drivers/gpu/drm/i915/i915_dma.c       |    3 +++
>  drivers/gpu/drm/i915/i915_scheduler.c |   18 ++++++++++++++++++
>  drivers/gpu/drm/i915/i915_scheduler.h |    2 ++
>  3 files changed, 23 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index 494b156..6c9ce82 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -42,6 +42,7 @@
>  #include <linux/vga_switcheroo.h>
>  #include <linux/slab.h>
>  #include <acpi/video.h>
> +#include "i915_scheduler.h"
>  #include <linux/pm.h>
>  #include <linux/pm_runtime.h>
>  #include <linux/oom.h>
> @@ -1930,6 +1931,8 @@ void i915_driver_lastclose(struct drm_device * dev)
>  
>  void i915_driver_preclose(struct drm_device *dev, struct drm_file *file)
>  {
> +	i915_scheduler_closefile(dev, file);
> +
>  	mutex_lock(&dev->struct_mutex);
>  	i915_gem_context_close(dev, file);
>  	i915_gem_release(dev, file);
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
> index d9c1879..66a6568 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler.c
> +++ b/drivers/gpu/drm/i915/i915_scheduler.c
> @@ -78,6 +78,19 @@ bool i915_scheduler_is_seqno_in_flight(struct intel_engine_cs *ring,
>  	return found;
>  }
>  
> +int i915_scheduler_closefile(struct drm_device *dev, struct drm_file *file)
> +{
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct i915_scheduler   *scheduler = dev_priv->scheduler;
> +
> +	if (!scheduler)
> +		return 0;
> +
> +	/* Do stuff... */
> +
> +	return 0;
> +}
> +
>  #else   /* CONFIG_DRM_I915_SCHEDULER */
>  
>  int i915_scheduler_init(struct drm_device *dev)
> @@ -85,4 +98,9 @@ int i915_scheduler_init(struct drm_device *dev)
>  	return 0;
>  }
>  
> +int i915_scheduler_closefile(struct drm_device *dev, struct drm_file *file)
> +{
> +	return 0;
> +}
> +
>  #endif  /* CONFIG_DRM_I915_SCHEDULER */
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
> index 4044b6e..95641f6 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler.h
> +++ b/drivers/gpu/drm/i915/i915_scheduler.h
> @@ -27,6 +27,8 @@
>  
>  bool        i915_scheduler_is_enabled(struct drm_device *dev);
>  int         i915_scheduler_init(struct drm_device *dev);
> +int         i915_scheduler_closefile(struct drm_device *dev,
> +				     struct drm_file *file);
>  
>  #ifdef CONFIG_DRM_I915_SCHEDULER
>  

Yeah I guess the client could have passed a ref to some other process
for tracking the outstanding work, so we need to complete it.

But shouldn't that happen as part of the clearing of the outstanding
requests in i915_gem_suspend() which is called from lastclose()?  We do
a gpu_idle() and retire_requests() in there already...

-- 
Jesse Barnes, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 14/44] drm/i915: Added getparam for GPU scheduler
  2014-06-26 17:24 ` [RFC 14/44] drm/i915: Added getparam for GPU scheduler John.C.Harrison
@ 2014-07-02 18:21   ` Jesse Barnes
  2014-07-07 19:11     ` Daniel Vetter
  0 siblings, 1 reply; 90+ messages in thread
From: Jesse Barnes @ 2014-07-02 18:21 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Thu, 26 Jun 2014 18:24:05 +0100
John.C.Harrison@Intel.com wrote:

> From: John Harrison <John.C.Harrison@Intel.com>
> 
> This is required by user land validation programs that need to know whether the
> scheduler is available for testing or not.
> ---
>  drivers/gpu/drm/i915/i915_dma.c |    3 +++
>  include/uapi/drm/i915_drm.h     |    1 +
>  2 files changed, 4 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index 6c9ce82..1668316 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -1035,6 +1035,9 @@ static int i915_getparam(struct drm_device *dev, void *data,
>  		value = 0;
>  #endif
>  		break;
> +	case I915_PARAM_HAS_GPU_SCHEDULER:
> +		value = i915_scheduler_is_enabled(dev);
> +		break;
>  	default:
>  		DRM_DEBUG("Unknown parameter %d\n", param->param);
>  		return -EINVAL;
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index bf54c78..de6f603 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -341,6 +341,7 @@ typedef struct drm_i915_irq_wait {
>  #define I915_PARAM_HAS_WT     	 	 27
>  #define I915_PARAM_CMD_PARSER_VERSION	 28
>  #define I915_PARAM_HAS_NATIVE_SYNC	 30
> +#define I915_PARAM_HAS_GPU_SCHEDULER	 31
>  
>  typedef struct drm_i915_getparam {
>  	int param;

I guess we have plenty of getparam space available.  But another option
would be for tests to check for a debugfs file that dumps scheduler
info instead, and save the get params for non-debug applications.
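
(A sketch of such a debugfs entry, following the usual i915_debugfs.c pattern;
the file name and contents here are hypothetical:)

static int i915_scheduler_info(struct seq_file *m, void *unused)
{
	struct drm_info_node *node = m->private;
	struct drm_device *dev = node->minor->dev;
	struct drm_i915_private *dev_priv = dev->dev_private;

	/* Dump whatever the tests need; here just the enable state */
	seq_printf(m, "Scheduler enabled: %s\n",
		   dev_priv->scheduler ? "yes" : "no");
	return 0;
}

/* plus one extra row in i915_debugfs_list:
 *	{"i915_scheduler_info", i915_scheduler_info, 0},
 */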

Either way though:
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>

-- 
Jesse Barnes, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 16/44] drm/i915: Alloc early seqno
  2014-06-26 17:24 ` [RFC 16/44] drm/i915: Alloc early seqno John.C.Harrison
@ 2014-07-02 18:29   ` Jesse Barnes
  2014-07-23 15:11     ` John Harrison
  0 siblings, 1 reply; 90+ messages in thread
From: Jesse Barnes @ 2014-07-02 18:29 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Thu, 26 Jun 2014 18:24:07 +0100
John.C.Harrison@Intel.com wrote:

> From: John Harrison <John.C.Harrison@Intel.com>
> 
> The scheduler needs to explicitly allocate a seqno to track each submitted batch
> buffer. This must happen a long time before any commands are actually written to
> the ring.
> ---
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c |    5 +++++
>  drivers/gpu/drm/i915/intel_ringbuffer.c    |    2 +-
>  drivers/gpu/drm/i915/intel_ringbuffer.h    |    1 +
>  3 files changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index ee836a6..ec274ef 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -1317,6 +1317,11 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>  		vma->bind_vma(vma, batch_obj->cache_level, GLOBAL_BIND);
>  	}
>  
> +	/* Allocate a seqno for this batch buffer nice and early. */
> +	ret = intel_ring_alloc_seqno(ring);
> +	if (ret)
> +		goto err;
> +
>  	if (flags & I915_DISPATCH_SECURE)
>  		exec_start += i915_gem_obj_ggtt_offset(batch_obj);
>  	else
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 34d6d6e..737c41b 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1662,7 +1662,7 @@ int intel_ring_idle(struct intel_engine_cs *ring)
>  	return i915_wait_seqno(ring, seqno);
>  }
>  
> -static int
> +int
>  intel_ring_alloc_seqno(struct intel_engine_cs *ring)
>  {
>  	if (ring->outstanding_lazy_seqno)
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 30841ea..cc92de2 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -347,6 +347,7 @@ void intel_cleanup_ring_buffer(struct intel_engine_cs *ring);
>  
>  int __must_check intel_ring_begin(struct intel_engine_cs *ring, int n);
>  int __must_check intel_ring_cacheline_align(struct intel_engine_cs *ring);
> +int __must_check intel_ring_alloc_seqno(struct intel_engine_cs *ring);
>  static inline void intel_ring_emit(struct intel_engine_cs *ring,
>  				   u32 data)
>  {

This ought to be ok even w/o the scheduler; we'll just pick up the
lazy_seqno later on rather than allocating a new one at ring_begin
right?

Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>

-- 
Jesse Barnes, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 17/44] drm/i915: Prelude to splitting i915_gem_do_execbuffer in two
  2014-06-26 17:24 ` [RFC 17/44] drm/i915: Prelude to splitting i915_gem_do_execbuffer in two John.C.Harrison
@ 2014-07-02 18:34   ` Jesse Barnes
  2014-07-07 19:21     ` Daniel Vetter
  0 siblings, 1 reply; 90+ messages in thread
From: Jesse Barnes @ 2014-07-02 18:34 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Thu, 26 Jun 2014 18:24:08 +0100
John.C.Harrison@Intel.com wrote:

> From: John Harrison <John.C.Harrison@Intel.com>
> 
> The scheduler decouples the submission of batch buffers to the driver from their
> submission to the hardware. This basically means splitting the execbuffer()
> function in half. This change rearranges some code ready for the split to occur.
> ---
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c |   23 ++++++++++++++++-------
>  1 file changed, 16 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index ec274ef..fda9187 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -32,6 +32,7 @@
>  #include "i915_trace.h"
>  #include "intel_drv.h"
>  #include <linux/dma_remapping.h>
> +#include "i915_scheduler.h"
>  
>  #define  __EXEC_OBJECT_HAS_PIN (1<<31)
>  #define  __EXEC_OBJECT_HAS_FENCE (1<<30)
> @@ -874,10 +875,7 @@ i915_gem_execbuffer_move_to_gpu(struct intel_engine_cs *ring,
>  	if (flush_domains & I915_GEM_DOMAIN_GTT)
>  		wmb();
>  
> -	/* Unconditionally invalidate gpu caches and ensure that we do flush
> -	 * any residual writes from the previous batch.
> -	 */
> -	return intel_ring_invalidate_all_caches(ring);
> +	return 0;
>  }
>  
>  static bool
> @@ -1219,8 +1217,6 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>  		}
>  	}
>  
> -	intel_runtime_pm_get(dev_priv);
> -
>  	ret = i915_mutex_lock_interruptible(dev);
>  	if (ret)
>  		goto pre_mutex_err;
> @@ -1331,6 +1327,20 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>  	if (ret)
>  		goto err;
>  
> +	i915_gem_execbuffer_move_to_active(&eb->vmas, ring);
> +
> +	/* To be split into two functions here... */
> +
> +	intel_runtime_pm_get(dev_priv);
> +
> +	/* Unconditionally invalidate gpu caches and ensure that we do flush
> +	 * any residual writes from the previous batch.
> +	 */
> +	ret = intel_ring_invalidate_all_caches(ring);
> +	if (ret)
> +		goto err;
> +
> +	/* Switch to the correct context for the batch */
>  	ret = i915_switch_context(ring, ctx);
>  	if (ret)
>  		goto err;
> @@ -1381,7 +1391,6 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>  
>  	trace_i915_gem_ring_dispatch(ring, intel_ring_get_seqno(ring), flags);
>  
> -	i915_gem_execbuffer_move_to_active(&eb->vmas, ring);
>  	i915_gem_execbuffer_retire_commands(dev, file, ring, batch_obj);
>  
>  err:

I'd like Chris to take a look too, but it looks safe afaict.

Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>

-- 
Jesse Barnes, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 18/44] drm/i915: Added scheduler debug macro
  2014-06-26 17:24 ` [RFC 18/44] drm/i915: Added scheduler debug macro John.C.Harrison
@ 2014-07-02 18:37   ` Jesse Barnes
  2014-07-07 19:23     ` Daniel Vetter
  0 siblings, 1 reply; 90+ messages in thread
From: Jesse Barnes @ 2014-07-02 18:37 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Thu, 26 Jun 2014 18:24:09 +0100
John.C.Harrison@Intel.com wrote:

> From: John Harrison <John.C.Harrison@Intel.com>
> 
> Added a DRM debug facility for use by the scheduler.
> ---
>  include/drm/drmP.h |    7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/include/drm/drmP.h b/include/drm/drmP.h
> index 76ccaab..2f477c9 100644
> --- a/include/drm/drmP.h
> +++ b/include/drm/drmP.h
> @@ -120,6 +120,7 @@ struct videomode;
>  #define DRM_UT_DRIVER		0x02
>  #define DRM_UT_KMS		0x04
>  #define DRM_UT_PRIME		0x08
> +#define DRM_UT_SCHED		0x40

What's wrong with 0x10?  We should probably define these in terms of
shifts anyway, since this is just a bitmask really.
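
(A sketch of the shift-based layout; the values match the existing definitions,
with DRM_UT_SCHED taking the next free bit, i.e. 0x10:)

#define DRM_UT_CORE	(1 << 0)
#define DRM_UT_DRIVER	(1 << 1)
#define DRM_UT_KMS	(1 << 2)
#define DRM_UT_PRIME	(1 << 3)
#define DRM_UT_SCHED	(1 << 4)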

>  extern __printf(2, 3)
>  void drm_ut_debug_printk(const char *function_name,
> @@ -221,10 +222,16 @@ int drm_err(const char *func, const char *format, ...);
>  		if (unlikely(drm_debug & DRM_UT_PRIME))			\
>  			drm_ut_debug_printk(__func__, fmt, ##args);	\
>  	} while (0)
> +#define DRM_DEBUG_SCHED(fmt, args...)					\
> +	do {								\
> +		if (unlikely(drm_debug & DRM_UT_SCHED))			\
> +			drm_ut_debug_printk(__func__, fmt, ##args);	\
> +	} while (0)
>  #else
>  #define DRM_DEBUG_DRIVER(fmt, args...) do { } while (0)
>  #define DRM_DEBUG_KMS(fmt, args...)	do { } while (0)
>  #define DRM_DEBUG_PRIME(fmt, args...)	do { } while (0)
> +#define DRM_DEBUG_SCHED(fmt, args...)	do { } while (0)
>  #define DRM_DEBUG(fmt, arg...)		 do { } while (0)
>  #endif
>  

Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>

-- 
Jesse Barnes, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 00/44] GPU scheduler for i915 driver
  2014-06-26 20:44 ` [RFC 00/44] GPU scheduler for i915 driver Dave Airlie
@ 2014-07-07 15:57   ` Daniel Vetter
  0 siblings, 0 replies; 90+ messages in thread
From: Daniel Vetter @ 2014-07-07 15:57 UTC (permalink / raw)
  To: Dave Airlie; +Cc: intel-gfx@lists.freedesktop.org

On Fri, Jun 27, 2014 at 06:44:04AM +1000, Dave Airlie wrote:
> >
> > Implemented a batch buffer submission scheduler for the i915 DRM driver.
> >
> 
> While this seems very interesting, you might want to address in the commit msg
> or the cover email
> 
> a) why this is needed,
> b) any improvements in speed, power consumption or throughput it generates,
> i.e. benchmarks.
> 
> also some notes on what hw supports preemption.

Also tests to both exercise the rescheduling (i.e. let a high-prio batch
compete against a large pile of low-prio batches) and augment our gpu
sync tests (from Damien) with that, too.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 01/44] drm/i915: Corrected 'file_priv' to 'file' in 'i915_driver_preclose()'
  2014-06-30 21:03   ` Jesse Barnes
@ 2014-07-07 18:02     ` Daniel Vetter
  0 siblings, 0 replies; 90+ messages in thread
From: Daniel Vetter @ 2014-07-07 18:02 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: Intel-GFX

On Mon, Jun 30, 2014 at 02:03:18PM -0700, Jesse Barnes wrote:
> On Thu, 26 Jun 2014 18:23:52 +0100
> John.C.Harrison@Intel.com wrote:
> 
> > From: John Harrison <John.C.Harrison@Intel.com>
> > 
> > The 'i915_driver_preclose()' function has a parameter called 'file_priv'.
> > However, this is misleading as the structure it points to is a 'drm_file' not a
> > 'drm_i915_file_private'. It should be named just 'file' to avoid confusion.

sob line is missing, but I've added that since we work for the same
company ;-) Please make sure you'll get these details right, checkpatch.pl
will help. Queued for -next, thanks for the patch.
-Daniel

> > ---
> >  drivers/gpu/drm/i915/i915_dma.c |    6 +++---
> >  drivers/gpu/drm/i915/i915_drv.h |    6 +++---
> >  2 files changed, 6 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> > index b9159ad..6cce55b 100644
> > --- a/drivers/gpu/drm/i915/i915_dma.c
> > +++ b/drivers/gpu/drm/i915/i915_dma.c
> > @@ -1916,11 +1916,11 @@ void i915_driver_lastclose(struct drm_device * dev)
> >  	i915_dma_cleanup(dev);
> >  }
> >  
> > -void i915_driver_preclose(struct drm_device * dev, struct drm_file *file_priv)
> > +void i915_driver_preclose(struct drm_device *dev, struct drm_file *file)
> >  {
> >  	mutex_lock(&dev->struct_mutex);
> > -	i915_gem_context_close(dev, file_priv);
> > -	i915_gem_release(dev, file_priv);
> > +	i915_gem_context_close(dev, file);
> > +	i915_gem_release(dev, file);
> >  	mutex_unlock(&dev->struct_mutex);
> >  }
> >  
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index bea9ab40..7a96ca0 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -2044,12 +2044,12 @@ void i915_update_dri1_breadcrumb(struct drm_device *dev);
> >  extern void i915_kernel_lost_context(struct drm_device * dev);
> >  extern int i915_driver_load(struct drm_device *, unsigned long flags);
> >  extern int i915_driver_unload(struct drm_device *);
> > -extern int i915_driver_open(struct drm_device *dev, struct drm_file *file_priv);
> > +extern int i915_driver_open(struct drm_device *dev, struct drm_file *file);
> >  extern void i915_driver_lastclose(struct drm_device * dev);
> >  extern void i915_driver_preclose(struct drm_device *dev,
> > -				 struct drm_file *file_priv);
> > +				 struct drm_file *file);
> >  extern void i915_driver_postclose(struct drm_device *dev,
> > -				  struct drm_file *file_priv);
> > +				  struct drm_file *file);
> >  extern int i915_driver_device_is_agp(struct drm_device * dev);
> >  #ifdef CONFIG_COMPAT
> >  extern long i915_compat_ioctl(struct file *filp, unsigned int cmd,
> 
> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
> 
> -- 
> Jesse Barnes, Intel Open Source Technology Center
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 03/44] drm/i915: Add extra add_request calls
  2014-06-30 21:10   ` Jesse Barnes
@ 2014-07-07 18:41     ` Daniel Vetter
  2014-07-08  7:44       ` Chris Wilson
  0 siblings, 1 reply; 90+ messages in thread
From: Daniel Vetter @ 2014-07-07 18:41 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: Intel-GFX

On Mon, Jun 30, 2014 at 02:10:16PM -0700, Jesse Barnes wrote:
> On Thu, 26 Jun 2014 18:23:54 +0100
> John.C.Harrison@Intel.com wrote:
> I think "no_flush" would be more in line with some of the other
> functions in the kernel.  "wo" makes me think of "write only".  But
> it's not a big deal.
> 
> I do wonder about the rules for when add_request is needed though, and
> I need to look later in the series for the usage.  When I looked at it
> in relation to fences, it didn't seem to be a good fit since it looked
> like requests got freed when the active list was cleared, vs when they
> were actually consumed by some user.

Yeah, wo_flush is highly confusing while no_flush is rather clear. There's
also the question of how this all will interfere with execlists since
those patches also have the need to keep track of stuff, but slightly
different.

I'll go through your rfc for some light reading but I think we should
settle execlists first before proceeding with the scheduler in earnest.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 04/44] drm/i915: Fix null pointer dereference in error capture
  2014-07-01  7:12     ` Chris Wilson
@ 2014-07-07 18:49       ` Daniel Vetter
  0 siblings, 0 replies; 90+ messages in thread
From: Daniel Vetter @ 2014-07-07 18:49 UTC (permalink / raw)
  To: Chris Wilson, Jesse Barnes, John.C.Harrison, Intel-GFX

On Tue, Jul 01, 2014 at 08:12:11AM +0100, Chris Wilson wrote:
> On Mon, Jun 30, 2014 at 02:40:05PM -0700, Jesse Barnes wrote:
> > On Thu, 26 Jun 2014 18:23:55 +0100
> > John.C.Harrison@Intel.com wrote:
> > 
> > > From: John Harrison <John.C.Harrison@Intel.com>
> > > 
> > > The i915_gem_record_rings() code was unconditionally querying and saving state
> > > for the batch_obj of a request structure. This is not necessarily set. Thus a
> > > null pointer dereference can occur.
> > > ---
> > >  drivers/gpu/drm/i915/i915_gpu_error.c |   13 +++++++------
> > >  1 file changed, 7 insertions(+), 6 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> > > index 87ec60e..0738f21 100644
> > > --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> > > +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> > > @@ -902,12 +902,13 @@ static void i915_gem_record_rings(struct drm_device *dev,
> > >  			 * as the simplest method to avoid being overwritten
> > >  			 * by userspace.
> > >  			 */
> > > -			error->ring[i].batchbuffer =
> > > -				i915_error_object_create(dev_priv,
> > > -							 request->batch_obj,
> > > -							 request->ctx ?
> > > -							 request->ctx->vm :
> > > -							 &dev_priv->gtt.base);
> > > +			if(request->batch_obj)
> > > +				error->ring[i].batchbuffer =
> > > +					i915_error_object_create(dev_priv,
> > > +								 request->batch_obj,
> > > +								 request->ctx ?
> > > +								 request->ctx->vm :
> > > +								 &dev_priv->gtt.base);
> > >  
> > >  			if (HAS_BROKEN_CS_TLB(dev_priv->dev) &&
> > >  			    ring->scratch.obj)
> > 
> > Reviewed-by: Jesse Barnes <jbarnes@virtuosugeek.org>
> 
> Nah, put the NULL check into the macro. i915_error_object_create() was
> originally written as a no-op on NULL pointers for cleanliness, we may
> as well do the check centrally and remove the extras we have grown.
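
A minimal sketch of that centralised check, using the three-argument form seen
in the hunk above; __i915_error_object_create() below is just a hypothetical
stand-in for the existing capture code:

static struct drm_i915_error_object *
i915_error_object_create(struct drm_i915_private *dev_priv,
			 struct drm_i915_gem_object *src,
			 struct i915_address_space *vm)
{
	/* Be a no-op on NULL so callers need no checks of their own */
	if (src == NULL)
		return NULL;

	return __i915_error_object_create(dev_priv, src, vm);
}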

Also the usual broken record from your maintainer: How does this blow up
and can we please have a testcase for it? Oscar provided a basic error
state check test, so the infrastructure for a new subtest is now there. I
hope ;-)
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 02/44] drm/i915: Added getparam for native sync
  2014-06-26 17:23 ` [RFC 02/44] drm/i915: Added getparam for native sync John.C.Harrison
@ 2014-07-07 18:52   ` Daniel Vetter
  0 siblings, 0 replies; 90+ messages in thread
From: Daniel Vetter @ 2014-07-07 18:52 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Thu, Jun 26, 2014 at 06:23:53PM +0100, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> Validation tests need a run time mechanism for querying whether or not the
> driver supports the Android native sync facility.
> ---
>  drivers/gpu/drm/i915/i915_dma.c |    7 +++++++
>  include/uapi/drm/i915_drm.h     |    1 +
>  2 files changed, 8 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index 6cce55b..67f2918 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -1022,6 +1022,13 @@ static int i915_getparam(struct drm_device *dev, void *data,
>  	case I915_PARAM_CMD_PARSER_VERSION:
>  		value = i915_cmd_parser_get_version();
>  		break;
> +	case I915_PARAM_HAS_NATIVE_SYNC:
> +#ifdef CONFIG_DRM_I915_SYNC
> +		value = 1;
> +#else
> +		value = 0;
> +#endif

New userspace ABI (which this is) needs to come with open-source users.
Also we do the "announce new features to userspace" patch generally last
in a series to avoid unnecessary test failures.

Finally infrastructure only used by tests should be done in debugfs, which
has more lax abi guarantees.

And one more: syncpt support and the scheduler are orthogonal imo, and
as part of proper syncpt support we also need to destage the android
syncpt stuff first (since i915 can't depend upon stuff from
drivers/staging). Thus far I have seen negligible efforts from Android
people to make this happen :(
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 07/44] drm/i915: Disable 'get seqno' workaround for VLV
  2014-07-02 17:51   ` Jesse Barnes
@ 2014-07-07 18:56     ` Daniel Vetter
  0 siblings, 0 replies; 90+ messages in thread
From: Daniel Vetter @ 2014-07-07 18:56 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: Intel-GFX

On Wed, Jul 02, 2014 at 10:51:23AM -0700, Jesse Barnes wrote:
> On Thu, 26 Jun 2014 18:23:58 +0100
> John.C.Harrison@Intel.com wrote:
> 
> > From: John Harrison <John.C.Harrison@Intel.com>
> > 
> > There is a workaround for a hardware bug when reading the seqno from the status
> > page. The bug does not exist on VLV; however, the workaround was still being
> > applied.
> > ---
> >  drivers/gpu/drm/i915/intel_ringbuffer.c |    5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > index 279488a..bad5db0 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > @@ -1960,7 +1960,10 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
> >  			ring->irq_put = gen6_ring_put_irq;
> >  		}
> >  		ring->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
> > -		ring->get_seqno = gen6_ring_get_seqno;
> > +		if (IS_VALLEYVIEW(dev))
> > +			ring->get_seqno = ring_get_seqno;
> > +		else
> > +			ring->get_seqno = gen6_ring_get_seqno;
> >  		ring->set_seqno = ring_set_seqno;
> >  		ring->semaphore.sync_to = gen6_ring_sync;
> >  		ring->semaphore.signal = gen6_signal;
> 
> Assuming this has been well tested:
> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>

I have my doubts ... the seqno race is fairly hard to reproduce really and
needs some serious beating. Also highly timing dependent.

My best guess is that Oscar's irq handling race fixes fixed the underlying
bug on gen6+, so I think we should instead dare to rip out this w/a
completely and see what happens. Doing this on gen6+ will at least give us
serious amounts of test coverage.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 08/44] drm/i915: Added GPU scheduler config option
  2014-06-26 17:23 ` [RFC 08/44] drm/i915: Added GPU scheduler config option John.C.Harrison
@ 2014-07-07 18:58   ` Daniel Vetter
  0 siblings, 0 replies; 90+ messages in thread
From: Daniel Vetter @ 2014-07-07 18:58 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Thu, Jun 26, 2014 at 06:23:59PM +0100, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> Added a Kconfig option for enabling/disabling the GPU scheduler.
> ---
>  drivers/gpu/drm/i915/Kconfig |    8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
> index 437e182..22a036b 100644
> --- a/drivers/gpu/drm/i915/Kconfig
> +++ b/drivers/gpu/drm/i915/Kconfig
> @@ -81,3 +81,11 @@ config DRM_I915_UMS
>  	  enable this only if you have ancient versions of the DDX drivers.
>  
>  	  If in doubt, say "N".
> +
> +config DRM_I915_SCHEDULER
> +	bool "Enable GPU scheduler on Intel hardware"
> +	depends on DRM_I915
> +	default y
> +	help
> +	  Choose this option to enable GPU task scheduling for improved
> +	  performance and efficiency.

NACK. We ship one driver in one well tested config, everything else is a
nightmare. There's very few exceptions (currently MMU_NOTIFIER and
optional FBDEV support which have some really good reasons attached to
them). And I'm still grumpy about the MMU_NOTIFIER one ;-)
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 09/44] drm/i915: Start of GPU scheduler
  2014-06-26 17:24 ` [RFC 09/44] drm/i915: Start of GPU scheduler John.C.Harrison
  2014-07-02 17:55   ` Jesse Barnes
@ 2014-07-07 19:02   ` Daniel Vetter
  1 sibling, 0 replies; 90+ messages in thread
From: Daniel Vetter @ 2014-07-07 19:02 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Thu, Jun 26, 2014 at 06:24:00PM +0100, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> Created GPU scheduler source files with only a basic init function.

Same critique as for Oscar's execlist: Please don't order patches by
adding unused leaf code and structures first, but start by wiring up
the (maybe still partially stubbed-out) code.

The aim is to make review of individual patches possible with as little
context as required - for otherwise (i.e. if you have to keep all the code
in mind till the end since only then it really gets plugged in) splitting
up the patches is a superficial exercise and doesn't really help the
reviewer.

/rant
-Daniel

> ---
>  drivers/gpu/drm/i915/Makefile         |    1 +
>  drivers/gpu/drm/i915/i915_drv.h       |    4 +++
>  drivers/gpu/drm/i915/i915_gem.c       |    3 ++
>  drivers/gpu/drm/i915/i915_scheduler.c |   59 +++++++++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/i915_scheduler.h |   40 ++++++++++++++++++++++
>  5 files changed, 107 insertions(+)
>  create mode 100644 drivers/gpu/drm/i915/i915_scheduler.c
>  create mode 100644 drivers/gpu/drm/i915/i915_scheduler.h
> 
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index cad1683..12817a8 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -11,6 +11,7 @@ i915-y := i915_drv.o \
>  	  i915_params.o \
>            i915_suspend.o \
>  	  i915_sysfs.o \
> +	  i915_scheduler.o \
>  	  intel_pm.o
>  i915-$(CONFIG_COMPAT)   += i915_ioc32.o
>  i915-$(CONFIG_DEBUG_FS) += i915_debugfs.o
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 53f6fe5..6e592d3 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1331,6 +1331,8 @@ struct intel_pipe_crc {
>  	wait_queue_head_t wq;
>  };
>  
> +struct i915_scheduler;
> +
>  struct drm_i915_private {
>  	struct drm_device *dev;
>  	struct kmem_cache *slab;
> @@ -1540,6 +1542,8 @@ struct drm_i915_private {
>  
>  	struct i915_runtime_pm pm;
>  
> +	struct i915_scheduler *scheduler;
> +
>  	/* Old dri1 support infrastructure, beware the dragons ya fools entering
>  	 * here! */
>  	struct i915_dri1_state dri1;
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 898660c..b784eb2 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -37,6 +37,7 @@
>  #include <linux/swap.h>
>  #include <linux/pci.h>
>  #include <linux/dma-buf.h>
> +#include "i915_scheduler.h"
>  
>  static void i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj);
>  static void i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj,
> @@ -4669,6 +4670,8 @@ static int i915_gem_init_rings(struct drm_device *dev)
>  			goto cleanup_vebox_ring;
>  	}
>  
> +	i915_scheduler_init(dev);
> +
>  	ret = i915_gem_set_seqno(dev, ((u32)~0 - 0x1000));
>  	if (ret)
>  		goto cleanup_bsd2_ring;
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
> new file mode 100644
> index 0000000..9ec0225
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/i915_scheduler.c
> @@ -0,0 +1,59 @@
> +/*
> + * Copyright (c) 2014 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + *
> + */
> +
> +#include "i915_drv.h"
> +#include "intel_drv.h"
> +#include "i915_scheduler.h"
> +
> +#ifdef CONFIG_DRM_I915_SCHEDULER
> +
> +int i915_scheduler_init(struct drm_device *dev)
> +{
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct i915_scheduler   *scheduler = dev_priv->scheduler;
> +
> +	if (scheduler)
> +		return 0;
> +
> +	scheduler = kzalloc(sizeof(*scheduler), GFP_KERNEL);
> +	if (!scheduler)
> +		return -ENOMEM;
> +
> +	spin_lock_init(&scheduler->lock);
> +
> +	scheduler->index = 1;
> +
> +	dev_priv->scheduler = scheduler;
> +
> +	return 0;
> +}
> +
> +#else   /* CONFIG_DRM_I915_SCHEDULER */
> +
> +int i915_scheduler_init(struct drm_device *dev)
> +{
> +	return 0;
> +}
> +
> +#endif  /* CONFIG_DRM_I915_SCHEDULER */
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
> new file mode 100644
> index 0000000..bbe1934
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/i915_scheduler.h
> @@ -0,0 +1,40 @@
> +/*
> + * Copyright (c) 2014 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + *
> + */
> +
> +#ifndef _I915_SCHEDULER_H_
> +#define _I915_SCHEDULER_H_
> +
> +int         i915_scheduler_init(struct drm_device *dev);
> +
> +#ifdef CONFIG_DRM_I915_SCHEDULER
> +
> +struct i915_scheduler {
> +	uint32_t    flags[I915_NUM_RINGS];
> +	spinlock_t  lock;
> +	uint32_t    index;
> +};
> +
> +#endif  /* CONFIG_DRM_I915_SCHEDULER */
> +
> +#endif  /* _I915_SCHEDULER_H_ */
> -- 
> 1.7.9.5
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 10/44] drm/i915: Prepare retire_requests to handle out-of-order seqnos
  2014-06-26 17:24 ` [RFC 10/44] drm/i915: Prepare retire_requests to handle out-of-order seqnos John.C.Harrison
  2014-07-02 18:11   ` Jesse Barnes
@ 2014-07-07 19:05   ` Daniel Vetter
  2014-07-09 14:08     ` Daniel Vetter
  1 sibling, 1 reply; 90+ messages in thread
From: Daniel Vetter @ 2014-07-07 19:05 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Thu, Jun 26, 2014 at 06:24:01PM +0100, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> A major point of the GPU scheduler is that it re-orders batch buffers after they
> have been submitted to the driver. Rather than attempting to re-assign seqno
> values, it is much simpler to have each batch buffer keep its initially assigned
> number and modify the rest of the driver to cope with seqnos being returned out
> of order. In practice, very little code actually needs updating to cope.
> 
> One such place is the retire request handler. Rather than stopping as soon as an
> uncompleted seqno is found, it must now keep iterating through the requests in
> case later seqnos have completed. There is also a problem with doing the free of
> the request before the move to inactive. Thus the requests are now moved to a
> temporary list first, then the objects de-activated and finally the requests on
> the temporary list are freed.

I still hold that we should track requests, not seqno+ring pairs. At least
the plan with Maarten's fencing patches is to embed the generic struct
fence into our i915_gem_request structure. And struct fence will also be
the kernel-internal representation of an Android native sync fence.

So splattering ring+seqno->request/fence lookups all over the place isn't a
good way forward. It's ok for bring up, but for merging we should do that
kind of large-scale refactoring upfront to reduce rebase churn. Oscar
knows how this works.
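
(Roughly the direction being suggested here — purely illustrative, with field
names taken from the hunks above plus the embedded generic fence:)

struct drm_i915_gem_request {
	struct fence fence;		/* generic fence embedded in the request */
	struct intel_engine_cs *ring;	/* engine the request was submitted on */
	u32 seqno;			/* breadcrumb, now private to the request */
	u32 tail;			/* ring tail after the request was emitted */
	struct list_head list;		/* linkage on the per-ring request list */
};
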
-Daniel

> ---
>  drivers/gpu/drm/i915/i915_gem.c |   60 +++++++++++++++++++++------------------
>  1 file changed, 32 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index b784eb2..7e53446 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2602,7 +2602,10 @@ void i915_gem_reset(struct drm_device *dev)
>  void
>  i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
>  {
> +	struct drm_i915_gem_object *obj, *obj_next;
> +	struct drm_i915_gem_request *req, *req_next;
>  	uint32_t seqno;
> +	LIST_HEAD(deferred_request_free);
>  
>  	if (list_empty(&ring->request_list))
>  		return;
> @@ -2611,43 +2614,35 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
>  
>  	seqno = ring->get_seqno(ring, true);
>  
> -	/* Move any buffers on the active list that are no longer referenced
> -	 * by the ringbuffer to the flushing/inactive lists as appropriate,
> -	 * before we free the context associated with the requests.
> +	/* Note that seqno values might be out of order due to rescheduling and
> +	 * pre-emption. Thus both lists must be processed in their entirety
> +	 * rather than stopping at the first 'non-passed' entry.
>  	 */
> -	while (!list_empty(&ring->active_list)) {
> -		struct drm_i915_gem_object *obj;
> -
> -		obj = list_first_entry(&ring->active_list,
> -				      struct drm_i915_gem_object,
> -				      ring_list);
> -
> -		if (!i915_seqno_passed(seqno, obj->last_read_seqno))
> -			break;
>  
> -		i915_gem_object_move_to_inactive(obj);
> -	}
> -
> -
> -	while (!list_empty(&ring->request_list)) {
> -		struct drm_i915_gem_request *request;
> -
> -		request = list_first_entry(&ring->request_list,
> -					   struct drm_i915_gem_request,
> -					   list);
> -
> -		if (!i915_seqno_passed(seqno, request->seqno))
> -			break;
> +	list_for_each_entry_safe(req, req_next, &ring->request_list, list) {
> +		if (!i915_seqno_passed(seqno, req->seqno))
> +			continue;
>  
> -		trace_i915_gem_request_retire(ring, request->seqno);
> +		trace_i915_gem_request_retire(ring, req->seqno);
>  		/* We know the GPU must have read the request to have
>  		 * sent us the seqno + interrupt, so use the position
>  		 * of tail of the request to update the last known position
>  		 * of the GPU head.
>  		 */
> -		ring->buffer->last_retired_head = request->tail;
> +		ring->buffer->last_retired_head = req->tail;
>  
> -		i915_gem_free_request(request);
> +		list_move_tail(&req->list, &deferred_request_free);
> +	}
> +
> +	/* Move any buffers on the active list that are no longer referenced
> +	 * by the ringbuffer to the flushing/inactive lists as appropriate,
> +	 * before we free the context associated with the requests.
> +	 */
> +	list_for_each_entry_safe(obj, obj_next, &ring->active_list, ring_list) {
> +		if (!i915_seqno_passed(seqno, obj->last_read_seqno))
> +			continue;
> +
> +		i915_gem_object_move_to_inactive(obj);
>  	}
>  
>  	if (unlikely(ring->trace_irq_seqno &&
> @@ -2656,6 +2651,15 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
>  		ring->trace_irq_seqno = 0;
>  	}
>  
> +	/* Finish processing active list before freeing request */
> +	while (!list_empty(&deferred_request_free)) {
> +		req = list_first_entry(&deferred_request_free,
> +	                               struct drm_i915_gem_request,
> +	                               list);
> +
> +		i915_gem_free_request(req);
> +	}
> +
>  	WARN_ON(i915_verify_lists(ring->dev));
>  }
>  
> -- 
> 1.7.9.5
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 14/44] drm/i915: Added getparam for GPU scheduler
  2014-07-02 18:21   ` Jesse Barnes
@ 2014-07-07 19:11     ` Daniel Vetter
  0 siblings, 0 replies; 90+ messages in thread
From: Daniel Vetter @ 2014-07-07 19:11 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: Intel-GFX

On Wed, Jul 02, 2014 at 11:21:42AM -0700, Jesse Barnes wrote:
> On Thu, 26 Jun 2014 18:24:05 +0100
> John.C.Harrison@Intel.com wrote:
> 
> > From: John Harrison <John.C.Harrison@Intel.com>
> > 
> > This is required by user land validation programs that need to know whether the
> > scheduler is available for testing or not.
> > ---
> >  drivers/gpu/drm/i915/i915_dma.c |    3 +++
> >  include/uapi/drm/i915_drm.h     |    1 +
> >  2 files changed, 4 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> > index 6c9ce82..1668316 100644
> > --- a/drivers/gpu/drm/i915/i915_dma.c
> > +++ b/drivers/gpu/drm/i915/i915_dma.c
> > @@ -1035,6 +1035,9 @@ static int i915_getparam(struct drm_device *dev, void *data,
> >  		value = 0;
> >  #endif
> >  		break;
> > +	case I915_PARAM_HAS_GPU_SCHEDULER:
> > +		value = i915_scheduler_is_enabled(dev);
> > +		break;
> >  	default:
> >  		DRM_DEBUG("Unknown parameter %d\n", param->param);
> >  		return -EINVAL;
> > diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> > index bf54c78..de6f603 100644
> > --- a/include/uapi/drm/i915_drm.h
> > +++ b/include/uapi/drm/i915_drm.h
> > @@ -341,6 +341,7 @@ typedef struct drm_i915_irq_wait {
> >  #define I915_PARAM_HAS_WT     	 	 27
> >  #define I915_PARAM_CMD_PARSER_VERSION	 28
> >  #define I915_PARAM_HAS_NATIVE_SYNC	 30
> > +#define I915_PARAM_HAS_GPU_SCHEDULER	 31
> >  
> >  typedef struct drm_i915_getparam {
> >  	int param;
> 
> I guess we have plenty of getparam space available.  But another option
> would be for tests to check for a debugfs file that dumps scheduler
> info instead, and save the get params for non-debug applications.

Yeah, pure testing interfaces should reside in debugfs - much less
stringent abi compatibility requirements for that stuff.
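
As a rough sketch (not from this series) of that debugfs alternative: a
read-only entry in i915's debugfs directory that dumps scheduler state. The
entry name and what it prints are invented for illustration.

static int i915_scheduler_info(struct seq_file *m, void *unused)
{
	struct drm_info_node *node = m->private;
	struct drm_device *dev = node->minor->dev;

	seq_printf(m, "scheduler enabled: %d\n",
		   i915_scheduler_is_enabled(dev));
	return 0;
}

/* plus one extra row in the i915_debugfs_list[] table: */
	{"i915_scheduler_info", i915_scheduler_info, 0},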

Also, I want to see these validation tests as igt patches.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 15/44] drm/i915: Added deferred work handler for scheduler
  2014-06-26 17:24 ` [RFC 15/44] drm/i915: Added deferred work handler for scheduler John.C.Harrison
@ 2014-07-07 19:14   ` Daniel Vetter
  2014-07-23 15:37     ` John Harrison
  0 siblings, 1 reply; 90+ messages in thread
From: Daniel Vetter @ 2014-07-07 19:14 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Thu, Jun 26, 2014 at 06:24:06PM +0100, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> The scheduler needs to do interrupt triggered work that is too complex to do in
> the interrupt handler. Thus it requires a deferred work handler to process this
> work asynchronously.
> ---
>  drivers/gpu/drm/i915/i915_dma.c       |    3 +++
>  drivers/gpu/drm/i915/i915_drv.h       |   10 ++++++++++
>  drivers/gpu/drm/i915/i915_gem.c       |   27 +++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/i915_scheduler.c |    7 +++++++
>  drivers/gpu/drm/i915/i915_scheduler.h |    1 +
>  5 files changed, 48 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index 1668316..d1356f3 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -1813,6 +1813,9 @@ int i915_driver_unload(struct drm_device *dev)
>  	WARN_ON(unregister_oom_notifier(&dev_priv->mm.oom_notifier));
>  	unregister_shrinker(&dev_priv->mm.shrinker);
>  
> +	/* Cancel the scheduler work handler, which should be idle now. */
> +	cancel_work_sync(&dev_priv->mm.scheduler_work);
> +
>  	io_mapping_free(dev_priv->gtt.mappable);
>  	arch_phys_wc_del(dev_priv->gtt.mtrr);
>  
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 0977653..fbafa68 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1075,6 +1075,16 @@ struct i915_gem_mm {
>  	struct delayed_work idle_work;
>  
>  	/**
> +	 * New scheme is to get an interrupt after every work packet
> +	 * in order to allow the low latency scheduling of pending
> +	 * packets. The idea behind adding new packets to a pending
> +	 * queue rather than directly into the hardware ring buffer
> +	 * is to allow high priority packets to over take low priority
> +	 * ones.
> +	 */
> +	struct work_struct scheduler_work;

Latency for work items isn't too awesome, and e.g. Oscar's execlist code
latches the next context right away from the irq handler. Why can't we do
something similar for the scheduler? Fishing the next item out of a
priority queue shouldn't be expensive ...
-Daniel

> +
> +	/**
>  	 * Are we in a non-interruptible section of code like
>  	 * modesetting?
>  	 */
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index fece5e7..57b24f0 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2712,6 +2712,29 @@ i915_gem_idle_work_handler(struct work_struct *work)
>  	intel_mark_idle(dev_priv->dev);
>  }
>  
> +#ifdef CONFIG_DRM_I915_SCHEDULER
> +static void
> +i915_gem_scheduler_work_handler(struct work_struct *work)
> +{
> +	struct intel_engine_cs  *ring;
> +	struct drm_i915_private *dev_priv;
> +	struct drm_device       *dev;
> +	int                     i;
> +
> +	dev_priv = container_of(work, struct drm_i915_private, mm.scheduler_work);
> +	dev = dev_priv->dev;
> +
> +	mutex_lock(&dev->struct_mutex);
> +
> +	/* Do stuff: */
> +	for_each_ring(ring, dev_priv, i) {
> +		i915_scheduler_remove(ring);
> +	}
> +
> +	mutex_unlock(&dev->struct_mutex);
> +}
> +#endif
> +
>  /**
>   * Ensures that an object will eventually get non-busy by flushing any required
>   * write domains, emitting any outstanding lazy request and retiring and
> @@ -4916,6 +4939,10 @@ i915_gem_load(struct drm_device *dev)
>  			  i915_gem_retire_work_handler);
>  	INIT_DELAYED_WORK(&dev_priv->mm.idle_work,
>  			  i915_gem_idle_work_handler);
> +#ifdef CONFIG_DRM_I915_SCHEDULER
> +	INIT_WORK(&dev_priv->mm.scheduler_work,
> +				i915_gem_scheduler_work_handler);
> +#endif
>  	init_waitqueue_head(&dev_priv->gpu_error.reset_queue);
>  
>  	/* On GEN3 we really need to make sure the ARB C3 LP bit is set */
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
> index 66a6568..37f8a98 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler.c
> +++ b/drivers/gpu/drm/i915/i915_scheduler.c
> @@ -58,6 +58,13 @@ int i915_scheduler_init(struct drm_device *dev)
>  	return 0;
>  }
>  
> +int i915_scheduler_remove(struct intel_engine_cs *ring)
> +{
> +	/* Do stuff... */
> +
> +	return 0;
> +}
> +
>  bool i915_scheduler_is_seqno_in_flight(struct intel_engine_cs *ring,
>  			       uint32_t seqno, bool *completed)
>  {
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
> index 95641f6..6b2cc51 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler.h
> +++ b/drivers/gpu/drm/i915/i915_scheduler.h
> @@ -38,6 +38,7 @@ struct i915_scheduler {
>  	uint32_t    index;
>  };
>  
> +int         i915_scheduler_remove(struct intel_engine_cs *ring);
>  bool        i915_scheduler_is_seqno_in_flight(struct intel_engine_cs *ring,
>  					      uint32_t seqno, bool *completed);
>  
> -- 
> 1.7.9.5
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 17/44] drm/i915: Prelude to splitting i915_gem_do_execbuffer in two
  2014-07-02 18:34   ` Jesse Barnes
@ 2014-07-07 19:21     ` Daniel Vetter
  2014-07-23 16:33       ` John Harrison
  0 siblings, 1 reply; 90+ messages in thread
From: Daniel Vetter @ 2014-07-07 19:21 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: Intel-GFX

On Wed, Jul 02, 2014 at 11:34:23AM -0700, Jesse Barnes wrote:
> On Thu, 26 Jun 2014 18:24:08 +0100
> John.C.Harrison@Intel.com wrote:
> 
> > From: John Harrison <John.C.Harrison@Intel.com>
> > 
> > The scheduler decouples the submission of batch buffers to the driver with their
> > submission to the hardware. This basically means splitting the execbuffer()
> > function in half. This change rearranges some code ready for the split to occur.
> > ---
> >  drivers/gpu/drm/i915/i915_gem_execbuffer.c |   23 ++++++++++++++++-------
> >  1 file changed, 16 insertions(+), 7 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > index ec274ef..fda9187 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > @@ -32,6 +32,7 @@
> >  #include "i915_trace.h"
> >  #include "intel_drv.h"
> >  #include <linux/dma_remapping.h>
> > +#include "i915_scheduler.h"
> >  
> >  #define  __EXEC_OBJECT_HAS_PIN (1<<31)
> >  #define  __EXEC_OBJECT_HAS_FENCE (1<<30)
> > @@ -874,10 +875,7 @@ i915_gem_execbuffer_move_to_gpu(struct intel_engine_cs *ring,
> >  	if (flush_domains & I915_GEM_DOMAIN_GTT)
> >  		wmb();
> >  
> > -	/* Unconditionally invalidate gpu caches and ensure that we do flush
> > -	 * any residual writes from the previous batch.
> > -	 */
> > -	return intel_ring_invalidate_all_caches(ring);
> > +	return 0;
> >  }
> >  
> >  static bool
> > @@ -1219,8 +1217,6 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
> >  		}
> >  	}
> >  
> > -	intel_runtime_pm_get(dev_priv);
> > -
> >  	ret = i915_mutex_lock_interruptible(dev);
> >  	if (ret)
> >  		goto pre_mutex_err;
> > @@ -1331,6 +1327,20 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
> >  	if (ret)
> >  		goto err;
> >  
> > +	i915_gem_execbuffer_move_to_active(&eb->vmas, ring);
> > +
> > +	/* To be split into two functions here... */
> > +
> > +	intel_runtime_pm_get(dev_priv);
> > +
> > +	/* Unconditionally invalidate gpu caches and ensure that we do flush
> > +	 * any residual writes from the previous batch.
> > +	 */
> > +	ret = intel_ring_invalidate_all_caches(ring);
> > +	if (ret)
> > +		goto err;
> > +
> > +	/* Switch to the correct context for the batch */
> >  	ret = i915_switch_context(ring, ctx);
> >  	if (ret)
> >  		goto err;
> > @@ -1381,7 +1391,6 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
> >  
> >  	trace_i915_gem_ring_dispatch(ring, intel_ring_get_seqno(ring), flags);
> >  
> > -	i915_gem_execbuffer_move_to_active(&eb->vmas, ring);
> >  	i915_gem_execbuffer_retire_commands(dev, file, ring, batch_obj);
> >  
> >  err:
> 
> I'd like Chris to take a look too, but it looks safe afaict.
> 
> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>

switch_context can fail with EINTR so we really can't move stuff to the
active list before that point. Or we need to make sure that all the stuff
between the old and new move_to_active callsite can't fail.

Or we need to track this and tell userspace with an EIO and adjusted reset
stats that something between our point of no return where the kernel
committed to executing the batch failed.

Or we need to unroll move_to_active (which is currently not really
possible).
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 18/44] drm/i915: Added scheduler debug macro
  2014-07-02 18:37   ` Jesse Barnes
@ 2014-07-07 19:23     ` Daniel Vetter
  0 siblings, 0 replies; 90+ messages in thread
From: Daniel Vetter @ 2014-07-07 19:23 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: Intel-GFX

On Wed, Jul 02, 2014 at 11:37:29AM -0700, Jesse Barnes wrote:
> On Thu, 26 Jun 2014 18:24:09 +0100
> John.C.Harrison@Intel.com wrote:
> 
> > From: John Harrison <John.C.Harrison@Intel.com>
> > 
> > Added a DRM debug facility for use by the scheduler.
> > ---
> >  include/drm/drmP.h |    7 +++++++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/include/drm/drmP.h b/include/drm/drmP.h
> > index 76ccaab..2f477c9 100644
> > --- a/include/drm/drmP.h
> > +++ b/include/drm/drmP.h
> > @@ -120,6 +120,7 @@ struct videomode;
> >  #define DRM_UT_DRIVER		0x02
> >  #define DRM_UT_KMS		0x04
> >  #define DRM_UT_PRIME		0x08
> > +#define DRM_UT_SCHED		0x40
> 
> What's wrong with 0x10?  We should probably define these in terms of
> shifts anyway, since this is just a bitmask really.

If we want more fine-grained logging we need to use real infrastructure
like dynamic printk or similar things. The current drm_debug stuff
flat-out doesn't scale for debugging random issues and I always use
drm.debug=0xe anyway. Also the i915 scheduler isn't core drm code so it really
should be DRM_DEBUG_DRIVER or so.
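
For illustration, the existing driver-level macro already covers this
without touching drm core (the message text here is made up):

	DRM_DEBUG_DRIVER("scheduler: queued seqno %u on ring %s\n",
			 seqno, ring->name);

which is then enabled at runtime with drm.debug=0x2 (or the usual 0xe).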
-Daniel

> 
> >  extern __printf(2, 3)
> >  void drm_ut_debug_printk(const char *function_name,
> > @@ -221,10 +222,16 @@ int drm_err(const char *func, const char *format, ...);
> >  		if (unlikely(drm_debug & DRM_UT_PRIME))			\
> >  			drm_ut_debug_printk(__func__, fmt, ##args);	\
> >  	} while (0)
> > +#define DRM_DEBUG_SCHED(fmt, args...)					\
> > +	do {								\
> > +		if (unlikely(drm_debug & DRM_UT_SCHED))			\
> > +			drm_ut_debug_printk(__func__, fmt, ##args);	\
> > +	} while (0)
> >  #else
> >  #define DRM_DEBUG_DRIVER(fmt, args...) do { } while (0)
> >  #define DRM_DEBUG_KMS(fmt, args...)	do { } while (0)
> >  #define DRM_DEBUG_PRIME(fmt, args...)	do { } while (0)
> > +#define DRM_DEBUG_SCHED(fmt, args...)	do { } while (0)
> >  #define DRM_DEBUG(fmt, arg...)		 do { } while (0)
> >  #endif
> >  
> 
> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
> 
> -- 
> Jesse Barnes, Intel Open Source Technology Center
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 44/44] drm/i915: Fake batch support for page flips
  2014-06-26 17:24 ` [RFC 44/44] drm/i915: Fake batch support for page flips John.C.Harrison
@ 2014-07-07 19:25   ` Daniel Vetter
  0 siblings, 0 replies; 90+ messages in thread
From: Daniel Vetter @ 2014-07-07 19:25 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Thu, Jun 26, 2014 at 06:24:35PM +0100, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> Any commands written to the ring without the scheduler's knowledge can get lost
> during a pre-emption event. This checkin updates the page flip code to send the
> ring commands via the scheduler's 'fake batch' interface. Thus the page flip is
> kept safe from being clobbered.

Same comment as with the execlist series: Can't we just use mmio flips
instead? We could just restrict the scheduler to more recent platforms if
mmio flips aren't available on all platforms ...
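
A minimal sketch of the mmio flip idea (illustrative, not code from either
series; the helper name is invented): once the new frame is known to be
ready, the display surface base register is rewritten from the CPU instead
of emitting MI_DISPLAY_FLIP on a ring.

static void intel_do_mmio_flip(struct intel_crtc *intel_crtc)
{
	struct drm_device *dev = intel_crtc->base.dev;
	struct drm_i915_private *dev_priv = dev->dev_private;
	u32 reg = DSPSURF(intel_crtc->plane);

	I915_WRITE(reg, intel_crtc->unpin_work->gtt_offset);
	POSTING_READ(reg);
}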
-Daniel
> ---
>  drivers/gpu/drm/i915/intel_display.c |   84 ++++++++++++++++------------------
>  1 file changed, 40 insertions(+), 44 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index fa1ffbb..8bbc5d3 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -9099,8 +9099,8 @@ static int intel_gen7_queue_flip(struct drm_device *dev,
>  				 uint32_t flags)
>  {
>  	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
> -	uint32_t plane_bit = 0;
> -	int len, ret;
> +	uint32_t plane_bit = 0, sched_flags;
> +	int ret;
>  
>  	switch (intel_crtc->plane) {
>  	case PLANE_A:
> @@ -9117,18 +9117,6 @@ static int intel_gen7_queue_flip(struct drm_device *dev,
>  		return -ENODEV;
>  	}
>  
> -	len = 4;
> -	if (ring->id == RCS) {
> -		len += 6;
> -		/*
> -		 * On Gen 8, SRM is now taking an extra dword to accommodate
> -		 * 48bits addresses, and we need a NOOP for the batch size to
> -		 * stay even.
> -		 */
> -		if (IS_GEN8(dev))
> -			len += 2;
> -	}
> -
>  	/*
>  	 * BSpec MI_DISPLAY_FLIP for IVB:
>  	 * "The full packet must be contained within the same cache line."
> @@ -9139,13 +9127,7 @@ static int intel_gen7_queue_flip(struct drm_device *dev,
>  	 * then do the cacheline alignment, and finally emit the
>  	 * MI_DISPLAY_FLIP.
>  	 */
> -	ret = intel_ring_cacheline_align(ring);
> -	if (ret)
> -		return ret;
> -
> -	ret = intel_ring_begin(ring, len);
> -	if (ret)
> -		return ret;
> +	sched_flags = i915_ebp_sf_cacheline_align;
>  
>  	/* Unmask the flip-done completion message. Note that the bspec says that
>  	 * we should do this for both the BCS and RCS, and that we must not unmask
> @@ -9157,32 +9139,46 @@ static int intel_gen7_queue_flip(struct drm_device *dev,
>  	 * to zero does lead to lockups within MI_DISPLAY_FLIP.
>  	 */
>  	if (ring->id == RCS) {
> -		intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
> -		intel_ring_emit(ring, DERRMR);
> -		intel_ring_emit(ring, ~(DERRMR_PIPEA_PRI_FLIP_DONE |
> -					DERRMR_PIPEB_PRI_FLIP_DONE |
> -					DERRMR_PIPEC_PRI_FLIP_DONE));
> -		if (IS_GEN8(dev))
> -			intel_ring_emit(ring, MI_STORE_REGISTER_MEM_GEN8(1) |
> -					      MI_SRM_LRM_GLOBAL_GTT);
> -		else
> -			intel_ring_emit(ring, MI_STORE_REGISTER_MEM(1) |
> -					      MI_SRM_LRM_GLOBAL_GTT);
> -		intel_ring_emit(ring, DERRMR);
> -		intel_ring_emit(ring, ring->scratch.gtt_offset + 256);
> -		if (IS_GEN8(dev)) {
> -			intel_ring_emit(ring, 0);
> -			intel_ring_emit(ring, MI_NOOP);
> -		}
> -	}
> +		uint32_t cmds[] = {
> +			MI_LOAD_REGISTER_IMM(1),
> +			DERRMR,
> +			~(DERRMR_PIPEA_PRI_FLIP_DONE |
> +				DERRMR_PIPEB_PRI_FLIP_DONE |
> +				DERRMR_PIPEC_PRI_FLIP_DONE),
> +			IS_GEN8(dev) ? (MI_STORE_REGISTER_MEM_GEN8(1) |
> +					MI_SRM_LRM_GLOBAL_GTT) :
> +				       (MI_STORE_REGISTER_MEM(1) |
> +					MI_SRM_LRM_GLOBAL_GTT),
> +			DERRMR,
> +			ring->scratch.gtt_offset + 256,
> +//		if (IS_GEN8(dev)) {
> +			0,
> +			MI_NOOP,
> +//		}
> +			MI_DISPLAY_FLIP_I915 | plane_bit,
> +			fb->pitches[0] | obj->tiling_mode,
> +			intel_crtc->unpin_work->gtt_offset,
> +			MI_NOOP
> +		};
> +		uint32_t len = sizeof(cmds) / sizeof(*cmds);
> +
> +		ret = i915_scheduler_queue_nonbatch(ring, cmds, len, &obj, 1, sched_flags);
> +	} else {
> +		uint32_t cmds[] = {
> +			MI_DISPLAY_FLIP_I915 | plane_bit,
> +			fb->pitches[0] | obj->tiling_mode,
> +			intel_crtc->unpin_work->gtt_offset,
> +			MI_NOOP
> +		};
> +		uint32_t len = sizeof(cmds) / sizeof(*cmds);
>  
> -	intel_ring_emit(ring, MI_DISPLAY_FLIP_I915 | plane_bit);
> -	intel_ring_emit(ring, (fb->pitches[0] | obj->tiling_mode));
> -	intel_ring_emit(ring, intel_crtc->unpin_work->gtt_offset);
> -	intel_ring_emit(ring, (MI_NOOP));
> +		ret = i915_scheduler_queue_nonbatch(ring, cmds, len, &obj, 1, sched_flags);
> +	}
> +	if (ret)
> +		return ret;
>  
>  	intel_mark_page_flip_active(intel_crtc);
> -	i915_add_request_wo_flush(ring);
> +
>  	return 0;
>  }
>  
> -- 
> 1.7.9.5
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 03/44] drm/i915: Add extra add_request calls
  2014-07-07 18:41     ` Daniel Vetter
@ 2014-07-08  7:44       ` Chris Wilson
  0 siblings, 0 replies; 90+ messages in thread
From: Chris Wilson @ 2014-07-08  7:44 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Intel-GFX

On Mon, Jul 07, 2014 at 08:41:47PM +0200, Daniel Vetter wrote:
> On Mon, Jun 30, 2014 at 02:10:16PM -0700, Jesse Barnes wrote:
> > On Thu, 26 Jun 2014 18:23:54 +0100
> > John.C.Harrison@Intel.com wrote:
> > I think "no_flush" would be more in line with some of the other
> > functions in the kernel.  "wo" makes me think of "write only".  But
> > it's not a big deal.
> > 
> > I do wonder about the rules for when add_request is needed though, and
> > I need to look later in the series for the usage.  When I looked at it
> > in relation to fences, it didn't seem to be a good fit since it looked
> > like requests got freed when the active list was cleared, vs when they
> > were actually consumed by some user.
> 
> Yeah, wo_flush is highly confusing while no_flush is rather clear. There's
> also the question of how this all will interfere with execlists since
> those patches also have the need to keep track of stuff, but slightly
> different.
> 
> I'll go through your rfc for some light reading but I think we should
settle execlists first before proceeding with the scheduler in earnest.

On top of these extra requests, it is time to worry about read-read
optimisations. I would like for busy_ioctl to tell me that a flip is
pending on a particular pipe (though that probably requires extending
the ioctl to pass back separate busy/write/read rings) and at that point
I start to worry about undue synchronisation. That seems fitting for a
request overhaul.
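
Sketching that extension (the bit layout below is invented, purely for
illustration; only the struct itself is the existing uapi):

struct drm_i915_gem_busy {
	__u32 handle;
	__u32 busy;	/* hypothetically: bit 0 = busy,
			 * bits 16..23 = rings with pending writes,
			 * bits 24..31 = rings with pending reads */
};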
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 10/44] drm/i915: Prepare retire_requests to handle out-of-order seqnos
  2014-07-07 19:05   ` Daniel Vetter
@ 2014-07-09 14:08     ` Daniel Vetter
  0 siblings, 0 replies; 90+ messages in thread
From: Daniel Vetter @ 2014-07-09 14:08 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Mon, Jul 07, 2014 at 09:05:50PM +0200, Daniel Vetter wrote:
> On Thu, Jun 26, 2014 at 06:24:01PM +0100, John.C.Harrison@Intel.com wrote:
> > From: John Harrison <John.C.Harrison@Intel.com>
> > 
> > A major point of the GPU scheduler is that it re-orders batch buffers after they
> > have been submitted to the driver. Rather than attempting to re-assign seqno
> > values, it is much simpler to have each batch buffer keep its initially assigned
> > number and modify the rest of the driver to cope with seqnos being returned out
> > of order. In practice, very little code actually needs updating to cope.
> > 
> > One such place is the retire request handler. Rather than stopping as soon as an
> > uncompleted seqno is found, it must now keep iterating through the requests in
> > case later seqnos have completed. There is also a problem with doing the free of
> > the request before the move to inactive. Thus the requests are now moved to a
> > temporary list first, then the objects de-activated and finally the requests on
> > the temporary list are freed.
> 
> I still hold that we should track requests, not seqno+ring pairs. At least
> the plan with Maarten's fencing patches is to embed the generic struct
> fence into our i915_gem_request structure. And struct fence will also be
> the kernel-internal representation of an android native sync fence.
> 
> So splattering ring+seqno->request/fence lookups all over the place isn't a
> good way forward. It's ok for bring-up, but for merging we should do that
> kind of large-scale refactoring upfront to reduce rebase churn. Oscar
> knows how this works.

Aside: Maarten's driver core patches to add a struct fence have been
merged for 3.17, so it is now possible to go directly to the right
solution: embed struct fence into i915_gem_request and start to refcount
the pointers properly, without any intermediate hack that adds a struct
kref directly to i915_gem_request as a stepping stone.
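
A sketch of that shape (field and helper names are illustrative only): the
request embeds the new struct fence and the rest of the driver holds
refcounted request pointers rather than (ring, seqno) pairs.

#include <linux/fence.h>

struct drm_i915_gem_request {
	struct fence fence;		/* refcount + completion live here */
	struct intel_engine_cs *ring;
	u32 seqno;
	struct list_head list;
};

static inline struct drm_i915_gem_request *
i915_gem_request_get(struct drm_i915_gem_request *req)
{
	fence_get(&req->fence);
	return req;
}

static inline void
i915_gem_request_put(struct drm_i915_gem_request *req)
{
	fence_put(&req->fence);
}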
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 13/44] drm/i915: Added scheduler hook when closing DRM file handles
  2014-07-02 18:20   ` Jesse Barnes
@ 2014-07-23 15:10     ` John Harrison
  2014-07-23 15:39       ` Jesse Barnes
  0 siblings, 1 reply; 90+ messages in thread
From: John Harrison @ 2014-07-23 15:10 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: Intel-GFX

On 02/07/2014 19:20, Jesse Barnes wrote:
> On Thu, 26 Jun 2014 18:24:04 +0100
> John.C.Harrison@Intel.com wrote:
>
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> The scheduler decouples the submission of batch buffers to the driver with
>> submission of batch buffers to the hardware. Thus it is possible for an
>> application to submit work, then close the DRM handle and free up all the
>> resources that piece of work wishes to use before the work has even been
>> submitted to the hardware. To prevent this, the scheduler needs to be informed
>> of the DRM close event so that it can force through any outstanding work
>> attributed to that file handle.
>> ---
>>   drivers/gpu/drm/i915/i915_dma.c       |    3 +++
>>   drivers/gpu/drm/i915/i915_scheduler.c |   18 ++++++++++++++++++
>>   drivers/gpu/drm/i915/i915_scheduler.h |    2 ++
>>   3 files changed, 23 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
>> index 494b156..6c9ce82 100644
>> --- a/drivers/gpu/drm/i915/i915_dma.c
>> +++ b/drivers/gpu/drm/i915/i915_dma.c
>> @@ -42,6 +42,7 @@
>>   #include <linux/vga_switcheroo.h>
>>   #include <linux/slab.h>
>>   #include <acpi/video.h>
>> +#include "i915_scheduler.h"
>>   #include <linux/pm.h>
>>   #include <linux/pm_runtime.h>
>>   #include <linux/oom.h>
>> @@ -1930,6 +1931,8 @@ void i915_driver_lastclose(struct drm_device * dev)
>>   
>>   void i915_driver_preclose(struct drm_device *dev, struct drm_file *file)
>>   {
>> +	i915_scheduler_closefile(dev, file);
>> +
>>   	mutex_lock(&dev->struct_mutex);
>>   	i915_gem_context_close(dev, file);
>>   	i915_gem_release(dev, file);
>> diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
>> index d9c1879..66a6568 100644
>> --- a/drivers/gpu/drm/i915/i915_scheduler.c
>> +++ b/drivers/gpu/drm/i915/i915_scheduler.c
>> @@ -78,6 +78,19 @@ bool i915_scheduler_is_seqno_in_flight(struct intel_engine_cs *ring,
>>   	return found;
>>   }
>>   
>> +int i915_scheduler_closefile(struct drm_device *dev, struct drm_file *file)
>> +{
>> +	struct drm_i915_private *dev_priv = dev->dev_private;
>> +	struct i915_scheduler   *scheduler = dev_priv->scheduler;
>> +
>> +	if (!scheduler)
>> +		return 0;
>> +
>> +	/* Do stuff... */
>> +
>> +	return 0;
>> +}
>> +
>>   #else   /* CONFIG_DRM_I915_SCHEDULER */
>>   
>>   int i915_scheduler_init(struct drm_device *dev)
>> @@ -85,4 +98,9 @@ int i915_scheduler_init(struct drm_device *dev)
>>   	return 0;
>>   }
>>   
>> +int i915_scheduler_closefile(struct drm_device *dev, struct drm_file *file)
>> +{
>> +	return 0;
>> +}
>> +
>>   #endif  /* CONFIG_DRM_I915_SCHEDULER */
>> diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
>> index 4044b6e..95641f6 100644
>> --- a/drivers/gpu/drm/i915/i915_scheduler.h
>> +++ b/drivers/gpu/drm/i915/i915_scheduler.h
>> @@ -27,6 +27,8 @@
>>   
>>   bool        i915_scheduler_is_enabled(struct drm_device *dev);
>>   int         i915_scheduler_init(struct drm_device *dev);
>> +int         i915_scheduler_closefile(struct drm_device *dev,
>> +				     struct drm_file *file);
>>   
>>   #ifdef CONFIG_DRM_I915_SCHEDULER
>>   
> Yeah I guess the client could have passed a ref to some other process
> for tracking the outstanding work, so we need to complete it.
>
> But shouldn't that happen as part of the clearing of the outstanding
> requests in i915_gem_suspend() which is called from lastclose()?  We do
> a gpu_idle() and retire_requests() in there already...
>

Note that this is the per-file close, not the global close.  Individual DRM 
file handles are closed whenever a user land app stops using DRM. When 
that happens, the scheduler needs to clean up all references to that 
handle. It is not just to ensure all work belonging to that handle has 
completed but also to ensure the scheduler does not attempt to dereference 
dodgy file pointers later on.
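
Purely illustrative (the queue/node field names are invented, not those
used in the series), the close hook amounts to something like:

int i915_scheduler_closefile(struct drm_device *dev, struct drm_file *file)
{
	struct drm_i915_private *dev_priv = dev->dev_private;
	struct i915_scheduler *scheduler = dev_priv->scheduler;
	struct i915_scheduler_queue_entry *node;

	if (!scheduler)
		return 0;

	list_for_each_entry(node, &scheduler->node_queue, link) {
		if (node->params.file != file)
			continue;

		/* force the work through (or wait for it), then drop the
		 * soon-to-be-stale handle so it is never dereferenced */
		node->params.file = NULL;
	}

	return 0;
}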

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 16/44] drm/i915: Alloc early seqno
  2014-07-02 18:29   ` Jesse Barnes
@ 2014-07-23 15:11     ` John Harrison
  0 siblings, 0 replies; 90+ messages in thread
From: John Harrison @ 2014-07-23 15:11 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: Intel-GFX

On 02/07/2014 19:29, Jesse Barnes wrote:
> On Thu, 26 Jun 2014 18:24:07 +0100
> John.C.Harrison@Intel.com wrote:
>
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> The scheduler needs to explicitly allocate a seqno to track each submitted batch
>> buffer. This must happen a long time before any commands are actually written to
>> the ring.
>> ---
>>   drivers/gpu/drm/i915/i915_gem_execbuffer.c |    5 +++++
>>   drivers/gpu/drm/i915/intel_ringbuffer.c    |    2 +-
>>   drivers/gpu/drm/i915/intel_ringbuffer.h    |    1 +
>>   3 files changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>> index ee836a6..ec274ef 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>> @@ -1317,6 +1317,11 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>>   		vma->bind_vma(vma, batch_obj->cache_level, GLOBAL_BIND);
>>   	}
>>   
>> +	/* Allocate a seqno for this batch buffer nice and early. */
>> +	ret = intel_ring_alloc_seqno(ring);
>> +	if (ret)
>> +		goto err;
>> +
>>   	if (flags & I915_DISPATCH_SECURE)
>>   		exec_start += i915_gem_obj_ggtt_offset(batch_obj);
>>   	else
>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> index 34d6d6e..737c41b 100644
>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> @@ -1662,7 +1662,7 @@ int intel_ring_idle(struct intel_engine_cs *ring)
>>   	return i915_wait_seqno(ring, seqno);
>>   }
>>   
>> -static int
>> +int
>>   intel_ring_alloc_seqno(struct intel_engine_cs *ring)
>>   {
>>   	if (ring->outstanding_lazy_seqno)
>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
>> index 30841ea..cc92de2 100644
>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
>> @@ -347,6 +347,7 @@ void intel_cleanup_ring_buffer(struct intel_engine_cs *ring);
>>   
>>   int __must_check intel_ring_begin(struct intel_engine_cs *ring, int n);
>>   int __must_check intel_ring_cacheline_align(struct intel_engine_cs *ring);
>> +int __must_check intel_ring_alloc_seqno(struct intel_engine_cs *ring);
>>   static inline void intel_ring_emit(struct intel_engine_cs *ring,
>>   				   u32 data)
>>   {
> This ought to be ok even w/o the scheduler, we'll just pick up the
> lazy_seqno later on rather than allocating a new one at ring_begin
> right?
>
> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
>

Yes. The early allocation is completely benign.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 15/44] drm/i915: Added deferred work handler for scheduler
  2014-07-07 19:14   ` Daniel Vetter
@ 2014-07-23 15:37     ` John Harrison
  2014-07-23 18:50       ` Daniel Vetter
  0 siblings, 1 reply; 90+ messages in thread
From: John Harrison @ 2014-07-23 15:37 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Intel-GFX


On 07/07/2014 20:14, Daniel Vetter wrote:
> On Thu, Jun 26, 2014 at 06:24:06PM +0100, John.C.Harrison@Intel.com wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> The scheduler needs to do interrupt triggered work that is too complex to do in
>> the interrupt handler. Thus it requires a deferred work handler to process this
>> work asynchronously.
>> ---
>>   drivers/gpu/drm/i915/i915_dma.c       |    3 +++
>>   drivers/gpu/drm/i915/i915_drv.h       |   10 ++++++++++
>>   drivers/gpu/drm/i915/i915_gem.c       |   27 +++++++++++++++++++++++++++
>>   drivers/gpu/drm/i915/i915_scheduler.c |    7 +++++++
>>   drivers/gpu/drm/i915/i915_scheduler.h |    1 +
>>   5 files changed, 48 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
>> index 1668316..d1356f3 100644
>> --- a/drivers/gpu/drm/i915/i915_dma.c
>> +++ b/drivers/gpu/drm/i915/i915_dma.c
>> @@ -1813,6 +1813,9 @@ int i915_driver_unload(struct drm_device *dev)
>>   	WARN_ON(unregister_oom_notifier(&dev_priv->mm.oom_notifier));
>>   	unregister_shrinker(&dev_priv->mm.shrinker);
>>   
>> +	/* Cancel the scheduler work handler, which should be idle now. */
>> +	cancel_work_sync(&dev_priv->mm.scheduler_work);
>> +
>>   	io_mapping_free(dev_priv->gtt.mappable);
>>   	arch_phys_wc_del(dev_priv->gtt.mtrr);
>>   
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>> index 0977653..fbafa68 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -1075,6 +1075,16 @@ struct i915_gem_mm {
>>   	struct delayed_work idle_work;
>>   
>>   	/**
>> +	 * New scheme is to get an interrupt after every work packet
>> +	 * in order to allow the low latency scheduling of pending
>> +	 * packets. The idea behind adding new packets to a pending
>> +	 * queue rather than directly into the hardware ring buffer
>> +	 * is to allow high priority packets to over take low priority
>> +	 * ones.
>> +	 */
>> +	struct work_struct scheduler_work;
> Latency for work items isn't too awesome, and e.g. Oscar's execlist code
> latches the next context right away from the irq handler. Why can't we do
> something similar for the scheduler? Fishing the next item out of a
> priority queue shouldn't be expensive ...
> -Daniel

The problem is that taking batch buffers from the scheduler's queue and 
submitting them to the hardware requires lots of processing that is not 
IRQ compatible. It isn't just a simple register write. Half of the code 
in 'i915_gem_do_execbuffer()' must be executed. Probably/possibly it 
could be made IRQ friendly but that would place a lot of restrictions on 
a lot of code that currently doesn't expect to be restricted. Instead, 
the submission is done via a work handler that acquires the driver mutex 
lock.

In order to cover the extra latency, the scheduler operates in a 
multi-buffered mode and aims to keep eight batch buffers in flight at 
all times. That number was obtained empirically by running lots of 
benchmarks on Android with lots of different settings and seeing where 
the buffer size stopped making a difference.
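
The submission side of that is roughly (names invented for illustration):

	/* work handler, under struct_mutex */
	while (scheduler->in_flight < scheduler->min_flying) {	/* ~8 */
		node = i915_scheduler_pop_best_node(scheduler, ring);
		if (node == NULL)
			break;

		/* the back half of the old execbuffer path */
		i915_scheduler_submit_node(ring, node);
	}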

John.


>
>> +
>> +	/**
>>   	 * Are we in a non-interruptible section of code like
>>   	 * modesetting?
>>   	 */
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>> index fece5e7..57b24f0 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -2712,6 +2712,29 @@ i915_gem_idle_work_handler(struct work_struct *work)
>>   	intel_mark_idle(dev_priv->dev);
>>   }
>>   
>> +#ifdef CONFIG_DRM_I915_SCHEDULER
>> +static void
>> +i915_gem_scheduler_work_handler(struct work_struct *work)
>> +{
>> +	struct intel_engine_cs  *ring;
>> +	struct drm_i915_private *dev_priv;
>> +	struct drm_device       *dev;
>> +	int                     i;
>> +
>> +	dev_priv = container_of(work, struct drm_i915_private, mm.scheduler_work);
>> +	dev = dev_priv->dev;
>> +
>> +	mutex_lock(&dev->struct_mutex);
>> +
>> +	/* Do stuff: */
>> +	for_each_ring(ring, dev_priv, i) {
>> +		i915_scheduler_remove(ring);
>> +	}
>> +
>> +	mutex_unlock(&dev->struct_mutex);
>> +}
>> +#endif
>> +
>>   /**
>>    * Ensures that an object will eventually get non-busy by flushing any required
>>    * write domains, emitting any outstanding lazy request and retiring and
>> @@ -4916,6 +4939,10 @@ i915_gem_load(struct drm_device *dev)
>>   			  i915_gem_retire_work_handler);
>>   	INIT_DELAYED_WORK(&dev_priv->mm.idle_work,
>>   			  i915_gem_idle_work_handler);
>> +#ifdef CONFIG_DRM_I915_SCHEDULER
>> +	INIT_WORK(&dev_priv->mm.scheduler_work,
>> +				i915_gem_scheduler_work_handler);
>> +#endif
>>   	init_waitqueue_head(&dev_priv->gpu_error.reset_queue);
>>   
>>   	/* On GEN3 we really need to make sure the ARB C3 LP bit is set */
>> diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
>> index 66a6568..37f8a98 100644
>> --- a/drivers/gpu/drm/i915/i915_scheduler.c
>> +++ b/drivers/gpu/drm/i915/i915_scheduler.c
>> @@ -58,6 +58,13 @@ int i915_scheduler_init(struct drm_device *dev)
>>   	return 0;
>>   }
>>   
>> +int i915_scheduler_remove(struct intel_engine_cs *ring)
>> +{
>> +	/* Do stuff... */
>> +
>> +	return 0;
>> +}
>> +
>>   bool i915_scheduler_is_seqno_in_flight(struct intel_engine_cs *ring,
>>   			       uint32_t seqno, bool *completed)
>>   {
>> diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
>> index 95641f6..6b2cc51 100644
>> --- a/drivers/gpu/drm/i915/i915_scheduler.h
>> +++ b/drivers/gpu/drm/i915/i915_scheduler.h
>> @@ -38,6 +38,7 @@ struct i915_scheduler {
>>   	uint32_t    index;
>>   };
>>   
>> +int         i915_scheduler_remove(struct intel_engine_cs *ring);
>>   bool        i915_scheduler_is_seqno_in_flight(struct intel_engine_cs *ring,
>>   					      uint32_t seqno, bool *completed);
>>   
>> -- 
>> 1.7.9.5
>>
>> _______________________________________________
>> Intel-gfx mailing list
>> Intel-gfx@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 13/44] drm/i915: Added scheduler hook when closing DRM file handles
  2014-07-23 15:10     ` John Harrison
@ 2014-07-23 15:39       ` Jesse Barnes
  0 siblings, 0 replies; 90+ messages in thread
From: Jesse Barnes @ 2014-07-23 15:39 UTC (permalink / raw)
  To: John Harrison; +Cc: Intel-GFX

On Wed, 23 Jul 2014 16:10:32 +0100
John Harrison <John.C.Harrison@Intel.com> wrote:

> On 02/07/2014 19:20, Jesse Barnes wrote:
> > On Thu, 26 Jun 2014 18:24:04 +0100
> > John.C.Harrison@Intel.com wrote:
> >
> >> From: John Harrison <John.C.Harrison@Intel.com>
> >>
> >> The scheduler decouples the submission of batch buffers to the driver with
> >> submission of batch buffers to the hardware. Thus it is possible for an
> >> application to submit work, then close the DRM handle and free up all the
> >> resources that piece of work wishes to use before the work has even been
> >> submitted to the hardware. To prevent this, the scheduler needs to be informed
> >> of the DRM close event so that it can force through any outstanding work
> >> attributed to that file handle.
> >> ---
> >>   drivers/gpu/drm/i915/i915_dma.c       |    3 +++
> >>   drivers/gpu/drm/i915/i915_scheduler.c |   18 ++++++++++++++++++
> >>   drivers/gpu/drm/i915/i915_scheduler.h |    2 ++
> >>   3 files changed, 23 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> >> index 494b156..6c9ce82 100644
> >> --- a/drivers/gpu/drm/i915/i915_dma.c
> >> +++ b/drivers/gpu/drm/i915/i915_dma.c
> >> @@ -42,6 +42,7 @@
> >>   #include <linux/vga_switcheroo.h>
> >>   #include <linux/slab.h>
> >>   #include <acpi/video.h>
> >> +#include "i915_scheduler.h"
> >>   #include <linux/pm.h>
> >>   #include <linux/pm_runtime.h>
> >>   #include <linux/oom.h>
> >> @@ -1930,6 +1931,8 @@ void i915_driver_lastclose(struct drm_device * dev)
> >>   
> >>   void i915_driver_preclose(struct drm_device *dev, struct drm_file *file)
> >>   {
> >> +	i915_scheduler_closefile(dev, file);
> >> +
> >>   	mutex_lock(&dev->struct_mutex);
> >>   	i915_gem_context_close(dev, file);
> >>   	i915_gem_release(dev, file);
> >> diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
> >> index d9c1879..66a6568 100644
> >> --- a/drivers/gpu/drm/i915/i915_scheduler.c
> >> +++ b/drivers/gpu/drm/i915/i915_scheduler.c
> >> @@ -78,6 +78,19 @@ bool i915_scheduler_is_seqno_in_flight(struct intel_engine_cs *ring,
> >>   	return found;
> >>   }
> >>   
> >> +int i915_scheduler_closefile(struct drm_device *dev, struct drm_file *file)
> >> +{
> >> +	struct drm_i915_private *dev_priv = dev->dev_private;
> >> +	struct i915_scheduler   *scheduler = dev_priv->scheduler;
> >> +
> >> +	if (!scheduler)
> >> +		return 0;
> >> +
> >> +	/* Do stuff... */
> >> +
> >> +	return 0;
> >> +}
> >> +
> >>   #else   /* CONFIG_DRM_I915_SCHEDULER */
> >>   
> >>   int i915_scheduler_init(struct drm_device *dev)
> >> @@ -85,4 +98,9 @@ int i915_scheduler_init(struct drm_device *dev)
> >>   	return 0;
> >>   }
> >>   
> >> +int i915_scheduler_closefile(struct drm_device *dev, struct drm_file *file)
> >> +{
> >> +	return 0;
> >> +}
> >> +
> >>   #endif  /* CONFIG_DRM_I915_SCHEDULER */
> >> diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
> >> index 4044b6e..95641f6 100644
> >> --- a/drivers/gpu/drm/i915/i915_scheduler.h
> >> +++ b/drivers/gpu/drm/i915/i915_scheduler.h
> >> @@ -27,6 +27,8 @@
> >>   
> >>   bool        i915_scheduler_is_enabled(struct drm_device *dev);
> >>   int         i915_scheduler_init(struct drm_device *dev);
> >> +int         i915_scheduler_closefile(struct drm_device *dev,
> >> +				     struct drm_file *file);
> >>   
> >>   #ifdef CONFIG_DRM_I915_SCHEDULER
> >>   
> > Yeah I guess the client could have passed a ref to some other process
> > for tracking the outstanding work, so we need to complete it.
> >
> > But shouldn't that happen as part of the clearing of the outstanding
> > requests in i915_gem_suspend() which is called from lastclose()?  We do
> > a gpu_idle() and retire_requests() in there already...
> >
> 
> Note that this is per file close not the global close.  Individual DRM 
> file handles are closed whenever a user land app stops using DRM. When 
> that happens, the scheduler needs to clean up all references to that 
> handle. It is not just to ensure all work belonging to that handle has 
> completed but also to ensure the scheduler does not attempt to dereference 
> dodgy file pointers later on.

Ah yeah sorry, mixed it up with lastclose.  Looks fine for per-client
close.

-- 
Jesse Barnes, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 17/44] drm/i915: Prelude to splitting i915_gem_do_execbuffer in two
  2014-07-07 19:21     ` Daniel Vetter
@ 2014-07-23 16:33       ` John Harrison
  2014-07-23 18:14         ` Daniel Vetter
  0 siblings, 1 reply; 90+ messages in thread
From: John Harrison @ 2014-07-23 16:33 UTC (permalink / raw)
  To: Daniel Vetter, Jesse Barnes; +Cc: Intel-GFX


On 07/07/2014 20:21, Daniel Vetter wrote:
> On Wed, Jul 02, 2014 at 11:34:23AM -0700, Jesse Barnes wrote:
>> On Thu, 26 Jun 2014 18:24:08 +0100
>> John.C.Harrison@Intel.com wrote:
>>
>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>
>>> The scheduler decouples the submission of batch buffers to the driver with their
>>> submission to the hardware. This basically means splitting the execbuffer()
>>> function in half. This change rearranges some code ready for the split to occur.
>>> ---
>>>   drivers/gpu/drm/i915/i915_gem_execbuffer.c |   23 ++++++++++++++++-------
>>>   1 file changed, 16 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>> index ec274ef..fda9187 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>> @@ -32,6 +32,7 @@
>>>   #include "i915_trace.h"
>>>   #include "intel_drv.h"
>>>   #include <linux/dma_remapping.h>
>>> +#include "i915_scheduler.h"
>>>   
>>>   #define  __EXEC_OBJECT_HAS_PIN (1<<31)
>>>   #define  __EXEC_OBJECT_HAS_FENCE (1<<30)
>>> @@ -874,10 +875,7 @@ i915_gem_execbuffer_move_to_gpu(struct intel_engine_cs *ring,
>>>   	if (flush_domains & I915_GEM_DOMAIN_GTT)
>>>   		wmb();
>>>   
>>> -	/* Unconditionally invalidate gpu caches and ensure that we do flush
>>> -	 * any residual writes from the previous batch.
>>> -	 */
>>> -	return intel_ring_invalidate_all_caches(ring);
>>> +	return 0;
>>>   }
>>>   
>>>   static bool
>>> @@ -1219,8 +1217,6 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>>>   		}
>>>   	}
>>>   
>>> -	intel_runtime_pm_get(dev_priv);
>>> -
>>>   	ret = i915_mutex_lock_interruptible(dev);
>>>   	if (ret)
>>>   		goto pre_mutex_err;
>>> @@ -1331,6 +1327,20 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>>>   	if (ret)
>>>   		goto err;
>>>   
>>> +	i915_gem_execbuffer_move_to_active(&eb->vmas, ring);
>>> +
>>> +	/* To be split into two functions here... */
>>> +
>>> +	intel_runtime_pm_get(dev_priv);
>>> +
>>> +	/* Unconditionally invalidate gpu caches and ensure that we do flush
>>> +	 * any residual writes from the previous batch.
>>> +	 */
>>> +	ret = intel_ring_invalidate_all_caches(ring);
>>> +	if (ret)
>>> +		goto err;
>>> +
>>> +	/* Switch to the correct context for the batch */
>>>   	ret = i915_switch_context(ring, ctx);
>>>   	if (ret)
>>>   		goto err;
>>> @@ -1381,7 +1391,6 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>>>   
>>>   	trace_i915_gem_ring_dispatch(ring, intel_ring_get_seqno(ring), flags);
>>>   
>>> -	i915_gem_execbuffer_move_to_active(&eb->vmas, ring);
>>>   	i915_gem_execbuffer_retire_commands(dev, file, ring, batch_obj);
>>>   
>>>   err:
>> I'd like Chris to take a look too, but it looks safe afaict.
>>
>> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
> switch_context can fail with EINTR so we really can't move stuff to the
> active list before that point. Or we need to make sure that all the stuff
> between the old and new move_to_active callsite can't fail.
>
> Or we need to track this and tell userspace with an EIO and adjusted reset
> stats that something between our point of no return where the kernel
> committed to executing the batch failed.
>
> >Or we need to unroll move_to_active (which is currently not really
> possible).
> -Daniel

switch_context can fail with quite a lot of different error codes. Is 
there anything particularly special about EINTR? I can't spot that 
particular code path at the moment.

The context switch is done at the point of submission to the hardware. 
As batch buffers can be re-ordered between submission to driver and 
submission to hardware, there is no point choosing a context any 
earlier. Whereas the move to active needs to be done at the point of 
submission to the driver. The object needs to be marked as in use even 
though the batch buffer that actually uses it might not be executed for 
some time. From the software viewpoint, the object is in use and all the 
synchronisation code needs to know that.

The scheduler makes the batch buffer execution asynchronous to its 
submission to the driver. There is no way to communicate back a return 
code to user land. Instead, it is up to the scheduler to check the 
return codes from all the execution paths and to retry later if 
something fails for a temporary reason. Or to discard the buffer if it 
is truly toast.

John.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 17/44] drm/i915: Prelude to splitting i915_gem_do_execbuffer in two
  2014-07-23 16:33       ` John Harrison
@ 2014-07-23 18:14         ` Daniel Vetter
  0 siblings, 0 replies; 90+ messages in thread
From: Daniel Vetter @ 2014-07-23 18:14 UTC (permalink / raw)
  To: John Harrison; +Cc: Intel-GFX

On Wed, Jul 23, 2014 at 05:33:42PM +0100, John Harrison wrote:
> 
> On 07/07/2014 20:21, Daniel Vetter wrote:
> >On Wed, Jul 02, 2014 at 11:34:23AM -0700, Jesse Barnes wrote:
> >>On Thu, 26 Jun 2014 18:24:08 +0100
> >>John.C.Harrison@Intel.com wrote:
> >>
> >>>From: John Harrison <John.C.Harrison@Intel.com>
> >>>
> >>>The scheduler decouples the submission of batch buffers to the driver with their
> >>>submission to the hardware. This basically means splitting the execbuffer()
> >>>function in half. This change rearranges some code ready for the split to occur.
> >>>---
> >>>  drivers/gpu/drm/i915/i915_gem_execbuffer.c |   23 ++++++++++++++++-------
> >>>  1 file changed, 16 insertions(+), 7 deletions(-)
> >>>
> >>>diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> >>>index ec274ef..fda9187 100644
> >>>--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> >>>+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> >>>@@ -32,6 +32,7 @@
> >>>  #include "i915_trace.h"
> >>>  #include "intel_drv.h"
> >>>  #include <linux/dma_remapping.h>
> >>>+#include "i915_scheduler.h"
> >>>  #define  __EXEC_OBJECT_HAS_PIN (1<<31)
> >>>  #define  __EXEC_OBJECT_HAS_FENCE (1<<30)
> >>>@@ -874,10 +875,7 @@ i915_gem_execbuffer_move_to_gpu(struct intel_engine_cs *ring,
> >>>  	if (flush_domains & I915_GEM_DOMAIN_GTT)
> >>>  		wmb();
> >>>-	/* Unconditionally invalidate gpu caches and ensure that we do flush
> >>>-	 * any residual writes from the previous batch.
> >>>-	 */
> >>>-	return intel_ring_invalidate_all_caches(ring);
> >>>+	return 0;
> >>>  }
> >>>  static bool
> >>>@@ -1219,8 +1217,6 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
> >>>  		}
> >>>  	}
> >>>-	intel_runtime_pm_get(dev_priv);
> >>>-
> >>>  	ret = i915_mutex_lock_interruptible(dev);
> >>>  	if (ret)
> >>>  		goto pre_mutex_err;
> >>>@@ -1331,6 +1327,20 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
> >>>  	if (ret)
> >>>  		goto err;
> >>>+	i915_gem_execbuffer_move_to_active(&eb->vmas, ring);
> >>>+
> >>>+	/* To be split into two functions here... */
> >>>+
> >>>+	intel_runtime_pm_get(dev_priv);
> >>>+
> >>>+	/* Unconditionally invalidate gpu caches and ensure that we do flush
> >>>+	 * any residual writes from the previous batch.
> >>>+	 */
> >>>+	ret = intel_ring_invalidate_all_caches(ring);
> >>>+	if (ret)
> >>>+		goto err;
> >>>+
> >>>+	/* Switch to the correct context for the batch */
> >>>  	ret = i915_switch_context(ring, ctx);
> >>>  	if (ret)
> >>>  		goto err;
> >>>@@ -1381,7 +1391,6 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
> >>>  	trace_i915_gem_ring_dispatch(ring, intel_ring_get_seqno(ring), flags);
> >>>-	i915_gem_execbuffer_move_to_active(&eb->vmas, ring);
> >>>  	i915_gem_execbuffer_retire_commands(dev, file, ring, batch_obj);
> >>>  err:
> >>I'd like Chris to take a look too, but it looks safe afaict.
> >>
> >>Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
> >switch_context can fail with EINTR so we really can't move stuff to the
> >active list before that point. Or we need to make sure that all the stuff
> >between the old and new move_to_active callsite can't fail.
> >
> >Or we need to track this and tell userspace with an EIO and adjusted reset
> >stats that something between our point of no return where the kernel
> >committed to executing the batch failed.
> >
> >Or we need to unroll move_to_active (which is currently not really
> >possible).
> >-Daniel
> 
> switch_context can fail with quite a lot of different error codes. Is there
> anything particularly special about EINTR? I can't spot that particular code
> path at the moment.
> 
> The context switch is done at the point of submission to the hardware. As
> batch buffers can be re-ordered between submission to driver and submission
> to hardware, there is no point choosing a context any earlier. Whereas the
> move to active needs to be done at the point of submission to the driver.
> The object needs to be marked as in use even though the batch buffer that
> actually uses it might not be executed for some time. From the software
> viewpoint, the object is in use and all the synchronisation code needs to
> know that.
> 
> The scheduler makes the batch buffer execution asynchronous to its
> submission to the driver. There is no way to communicate back a return code
> to user land. Instead, it is up to the scheduler to check the return codes
> from all the execution paths and to retry later if something fails for a
> temporary reason. Or to discard the buffer if it is truly toast.

EINTR is simply really easy to test&hit since you can provoke it with
signals. And X uses signals excessively. One point where EINTR might
happen is in intel_ring_begin, the other when we try to pin the context
into ggtt. The other error codes are true exceptions and will be much harder
to hit.
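
The general pattern for those interruptible paths, for example (taking the
mutex here just as an illustration of how a pending signal bounces the
whole ioctl back to userspace):

	ret = i915_mutex_lock_interruptible(dev);
	if (ret)	/* -EINTR if a signal is pending */
		return ret;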
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 15/44] drm/i915: Added deferred work handler for scheduler
  2014-07-23 15:37     ` John Harrison
@ 2014-07-23 18:50       ` Daniel Vetter
  2014-07-24 15:42         ` John Harrison
  0 siblings, 1 reply; 90+ messages in thread
From: Daniel Vetter @ 2014-07-23 18:50 UTC (permalink / raw)
  To: John Harrison; +Cc: intel-gfx

On Wed, Jul 23, 2014 at 5:37 PM, John Harrison
<John.C.Harrison@intel.com> wrote:
>>>   diff --git a/drivers/gpu/drm/i915/i915_drv.h
>>> b/drivers/gpu/drm/i915/i915_drv.h
>>> index 0977653..fbafa68 100644
>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>> @@ -1075,6 +1075,16 @@ struct i915_gem_mm {
>>>         struct delayed_work idle_work;
>>>         /**
>>> +        * New scheme is to get an interrupt after every work packet
>>> +        * in order to allow the low latency scheduling of pending
>>> +        * packets. The idea behind adding new packets to a pending
>>> +        * queue rather than directly into the hardware ring buffer
>>> +        * is to allow high priority packets to over take low priority
>>> +        * ones.
>>> +        */
>>> +       struct work_struct scheduler_work;
>>
>> Latency for work items isn't too awesome, and e.g. Oscar's execlist code
>> latches the next context right away from the irq handler. Why can't we do
>> something similar for the scheduler? Fishing the next item out of a
>> priority queue shouldn't be expensive ...
>> -Daniel
>
>
> The problem is that taking batch buffers from the scheduler's queue and
> submitting them to the hardware requires lots of processing that is not IRQ
> compatible. It isn't just a simple register write. Half of the code in
> 'i915_gem_do_execbuffer()' must be executed. Probably/possibly it could be
> made IRQ friendly but that would place a lot of restrictions on a lot of
> code that currently doesn't expect to be restricted. Instead, the submission
> is done via a work handler that acquires the driver mutex lock.
>
> In order to cover the extra latency, the scheduler operates in a
> multi-buffered mode and aims to keep eight batch buffers in flight at all
> times. That number being obtained empirically by running lots of benchmarks
> on Android with lots of different settings and seeing where the buffer size
> stopped making a difference.

So I've tried to stitch together that part of the scheduler from the
patch series. Afaics you do the actual scheduling under the protection
of irqsave spinlocks (well you also hold the dev->struct_mutex). That
means you disable local interrupts. Up to the actual submit point I
spotted two such critical sections encompassing pretty much all the
code.

If we'd run the same code from the interrupt handler then only our own
interrupt handler is blocked, all other interrupt processing can
continue. So that's actually a lot nicer than what you have. In any
case you can't do expensive operations under an irqsave spinlock
anyway.

So either I've missed something big here, or this justification doesn't hold up.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 15/44] drm/i915: Added deferred work handler for scheduler
  2014-07-23 18:50       ` Daniel Vetter
@ 2014-07-24 15:42         ` John Harrison
  2014-07-25  7:18           ` Daniel Vetter
  0 siblings, 1 reply; 90+ messages in thread
From: John Harrison @ 2014-07-24 15:42 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx


On 23/07/2014 19:50, Daniel Vetter wrote:
> On Wed, Jul 23, 2014 at 5:37 PM, John Harrison
> <John.C.Harrison@intel.com> wrote:
>>>>    diff --git a/drivers/gpu/drm/i915/i915_drv.h
>>>> b/drivers/gpu/drm/i915/i915_drv.h
>>>> index 0977653..fbafa68 100644
>>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>>> @@ -1075,6 +1075,16 @@ struct i915_gem_mm {
>>>>          struct delayed_work idle_work;
>>>>          /**
>>>> +        * New scheme is to get an interrupt after every work packet
>>>> +        * in order to allow the low latency scheduling of pending
>>>> +        * packets. The idea behind adding new packets to a pending
>>>> +        * queue rather than directly into the hardware ring buffer
>>>> +        * is to allow high priority packets to over take low priority
>>>> +        * ones.
>>>> +        */
>>>> +       struct work_struct scheduler_work;
>>> Latency for work items isn't too awesome, and e.g. Oscar's execlist code
>>> latches the next context right away from the irq handler. Why can't we do
>>> something similar for the scheduler? Fishing the next item out of a
>>> priority queue shouldn't be expensive ...
>>> -Daniel
>>
>> The problem is that taking batch buffers from the scheduler's queue and
>> submitting them to the hardware requires lots of processing that is not IRQ
>> compatible. It isn't just a simple register write. Half of the code in
>> 'i915_gem_do_execbuffer()' must be executed. Probably/possibly it could be
>> made IRQ friendly but that would place a lot of restrictions on a lot of
>> code that currently doesn't expect to be restricted. Instead, the submission
>> is done via a work handler that acquires the driver mutex lock.
>>
>> In order to cover the extra latency, the scheduler operates in a
>> multi-buffered mode and aims to keep eight batch buffers in flight at all
>> times. That number being obtained empirically by running lots of benchmarks
>> on Android with lots of different settings and seeing where the buffer size
>> stopped making a difference.
> So I've tried to stitch together that part of the scheduler from the
> patch series. Afaics you do the actual scheduling under the protection
> of irqsave spinlocks (well you also hold the dev->struct_mutex). That
> means you disable local interrupts. Up to the actual submit point I
> spotted two such critical sections encompassing pretty much all the
> code.
>
> If we'd run the same code from the interrupt handler then only our own
> interrupt handler is blocked, all other interrupt processing can
> continue. So that's actually a lot nicer than what you have. In any
> case you can't do expensive operations under an irqsave spinlock
> anyway.
>
> So either I've missed something big here, or this justification doesn't hold up.
> -Daniel

The irqsave spinlock is only held while manipulating the internal 
scheduler data structures. It is released immediately prior to calling 
i915_gem_do_execbuffer_final(). So the actual submission code path is 
done with the driver mutex but no spinlocks. I'm sure I got 'scheduling 
while atomic' bug checks the one time I accidentally left the spinlock held.
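
To make that lock scope concrete, a minimal sketch of the pattern described
above; the queue layout and helper names are invented stand-ins for the real
list scan and submission code:

#include <linux/list.h>
#include <linux/spinlock.h>

/* Illustrative only; names do not match the patches. */
struct sched_node {
        struct list_head link;
        /* batch buffer, context, priority, ... */
};

struct sched_queue {
        spinlock_t lock;
        struct list_head queued;
};

/* Hypothetical helpers standing in for the real queue scan and submit. */
struct sched_node *pick_best_ready_node(struct list_head *queued);
int submit_node_under_mutex(struct sched_node *node);

static void sched_submit_one(struct sched_queue *sq)
{
        struct sched_node *node;
        unsigned long flags;

        /* The irqsave spinlock covers only the internal list manipulation ... */
        spin_lock_irqsave(&sq->lock, flags);
        node = pick_best_ready_node(&sq->queued);
        if (node)
                list_del(&node->link);
        spin_unlock_irqrestore(&sq->lock, flags);

        if (!node)
                return;

        /*
         * ... and is dropped before the expensive, sleeping submission
         * path.  Only dev->struct_mutex (held by the caller) protects the
         * real i915_gem_do_execbuffer_final() call.
         */
        submit_node_under_mutex(node);
}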

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 15/44] drm/i915: Added deferred work handler for scheduler
  2014-07-24 15:42         ` John Harrison
@ 2014-07-25  7:18           ` Daniel Vetter
  0 siblings, 0 replies; 90+ messages in thread
From: Daniel Vetter @ 2014-07-25  7:18 UTC (permalink / raw)
  To: John Harrison; +Cc: intel-gfx

On Thu, Jul 24, 2014 at 04:42:55PM +0100, John Harrison wrote:
> 
> On 23/07/2014 19:50, Daniel Vetter wrote:
> >On Wed, Jul 23, 2014 at 5:37 PM, John Harrison
> ><John.C.Harrison@intel.com> wrote:
> >>>>   diff --git a/drivers/gpu/drm/i915/i915_drv.h
> >>>>b/drivers/gpu/drm/i915/i915_drv.h
> >>>>index 0977653..fbafa68 100644
> >>>>--- a/drivers/gpu/drm/i915/i915_drv.h
> >>>>+++ b/drivers/gpu/drm/i915/i915_drv.h
> >>>>@@ -1075,6 +1075,16 @@ struct i915_gem_mm {
> >>>>         struct delayed_work idle_work;
> >>>>         /**
> >>>>+        * New scheme is to get an interrupt after every work packet
> >>>>+        * in order to allow the low latency scheduling of pending
> >>>>+        * packets. The idea behind adding new packets to a pending
> >>>>+        * queue rather than directly into the hardware ring buffer
> >>>>+        * is to allow high priority packets to over take low priority
> >>>>+        * ones.
> >>>>+        */
> >>>>+       struct work_struct scheduler_work;
> >>>Latency for work items isn't too awesome, and e.g. Oscar's execlist code
> >>>latches the next context right away from the irq handler. Why can't we do
> >>>something similar for the scheduler? Fishing the next item out of a
> >>>priority queue shouldn't be expensive ...
> >>>-Daniel
> >>
> >>The problem is that taking batch buffers from the scheduler's queue and
> >>submitting them to the hardware requires lots of processing that is not IRQ
> >>compatible. It isn't just a simple register write. Half of the code in
> >>'i915_gem_do_execbuffer()' must be executed. Probably/possibly it could be
> >>made IRQ friendly but that would place a lot of restrictions on a lot of
> >>code that currently doesn't expect to be restricted. Instead, the submission
> >>is done via a work handler that acquires the driver mutex lock.
> >>
> >>In order to cover the extra latency, the scheduler operates in a
> >>multi-buffered mode and aims to keep eight batch buffers in flight at all
> >>times. That number being obtained empirically by running lots of benchmarks
> >>on Android with lots of different settings and seeing where the buffer size
> >>stopped making a difference.
> >So I've tried to stitch together that part of the scheduler from the
> >patch series. Afaics you do the actual scheduling under the protection
> >of irqsave spinlocks (well you also hold the dev->struct_mutex). That
> >means you disable local interrupts. Up to the actual submit point I
> >spotted two such critical sections encompassing pretty much all the
> >code.
> >
> >If we'd run the same code from the interrupt handler then only our own
> >interrupt handler is blocked, all other interrupt processing can
> >continue. So that's actually a lot nicer than what you have. In any
> >case you can't do expensive operations under an irqsave spinlock
> >anyway.
> >
> >So either I've missed something big here, or this justification doesn't hold up.
> 
> The irqsave spinlock is only held while manipulating the internal scheduler
> data structures. It is released immediately prior to calling
> i915_gem_do_execbuffer_final(). So the actual submission code path is done
> with the driver mutex but no spinlocks. I'm sure I got 'scheduling while
> atomic' bug checks the one time I accidentally left the spinlock held.

Ok, missed something ;-)

btw for big patch series please upload them somewhere on a public git
(github or so). Generally I review patches only by reading them because
trying to apply them usually results in some conflicts. But with big patch
series like this here that doesn't work, especially when everything is
tightly coupled (iirc I had to jump around in about 10 different patches
to figure out what the work handler looks like). Or maybe I didn't
understand the patch split flow and read it backwards ;-)
-Daniel
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 00/44] GPU scheduler for i915 driver
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (44 preceding siblings ...)
  2014-06-26 20:44 ` [RFC 00/44] GPU scheduler for i915 driver Dave Airlie
@ 2014-10-10 10:35 ` Steven Newbury
  2014-10-20 10:31   ` John Harrison
  45 siblings, 1 reply; 90+ messages in thread
From: Steven Newbury @ 2014-10-10 10:35 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX


[-- Attachment #1.1: Type: text/plain, Size: 332 bytes --]

On Thu, 2014-06-26 at 18:23 +0100, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> Implemented a batch buffer submission scheduler for the i915 DRM 
> driver.

Hi John,

I was just wondering what's happening with this patch series?  Are you 
still working on it?  Does it need any testing?

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 00/44] GPU scheduler for i915 driver
  2014-10-10 10:35 ` Steven Newbury
@ 2014-10-20 10:31   ` John Harrison
  0 siblings, 0 replies; 90+ messages in thread
From: John Harrison @ 2014-10-20 10:31 UTC (permalink / raw)
  To: Steven Newbury; +Cc: Intel-GFX

On 10/10/2014 11:35, Steven Newbury wrote:
> On Thu, 2014-06-26 at 18:23 +0100, John.C.Harrison@Intel.com wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> Implemented a batch buffer submission scheduler for the i915 DRM
>> driver.
> Hi John,
>
> I was just wondering what's happening with this patch series?  Are you
> still working on it?  Does it need any testing?

It is still in progress, although currently stalled because it was 
decided to first replace the driver's seqno usage with request 
structures wherever possible. The theory is that this is safer and 
more sensible than having the scheduler cause out-of-order seqnos and 
the various work-arounds needed to cope with that.
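
A much simplified illustration of the difference (not the actual i915 request
API): a raw seqno comparison only works while completion order matches
assignment order, whereas per-request completion state survives reordering:

#include <linux/types.h>

/* Simplified illustration only; neither function is real i915 code. */

/* Global-seqno model: "done" is inferred by comparison. */
static bool seqno_passed(u32 hw_seqno, u32 my_seqno)
{
        /* Breaks down once a scheduler reorders submission. */
        return (s32)(hw_seqno - my_seqno) >= 0;
}

/* Request model: "done" is a property of this specific submission. */
struct gpu_request {
        bool completed;         /* set when this particular batch retires */
        /* ring, context, fence, ... */
};

static bool request_completed(const struct gpu_request *req)
{
        return req->completed;
}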

^ permalink raw reply	[flat|nested] 90+ messages in thread

end of thread, other threads:[~2014-10-20 10:31 UTC | newest]

Thread overview: 90+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
2014-06-26 17:23 ` [RFC 01/44] drm/i915: Corrected 'file_priv' to 'file' in 'i915_driver_preclose()' John.C.Harrison
2014-06-30 21:03   ` Jesse Barnes
2014-07-07 18:02     ` Daniel Vetter
2014-06-26 17:23 ` [RFC 02/44] drm/i915: Added getparam for native sync John.C.Harrison
2014-07-07 18:52   ` Daniel Vetter
2014-06-26 17:23 ` [RFC 03/44] drm/i915: Add extra add_request calls John.C.Harrison
2014-06-30 21:10   ` Jesse Barnes
2014-07-07 18:41     ` Daniel Vetter
2014-07-08  7:44       ` Chris Wilson
2014-06-26 17:23 ` [RFC 04/44] drm/i915: Fix null pointer dereference in error capture John.C.Harrison
2014-06-30 21:40   ` Jesse Barnes
2014-07-01  7:12     ` Chris Wilson
2014-07-07 18:49       ` Daniel Vetter
2014-07-01  7:20   ` [PATCH] drm/i915: Remove num_pages parameter to i915_error_object_create() Chris Wilson
2014-06-26 17:23 ` [RFC 05/44] drm/i915: Updating assorted register and status page definitions John.C.Harrison
2014-07-02 17:49   ` Jesse Barnes
2014-06-26 17:23 ` [RFC 06/44] drm/i915: Fixes for FIFO space queries John.C.Harrison
2014-07-02 17:50   ` Jesse Barnes
2014-06-26 17:23 ` [RFC 07/44] drm/i915: Disable 'get seqno' workaround for VLV John.C.Harrison
2014-07-02 17:51   ` Jesse Barnes
2014-07-07 18:56     ` Daniel Vetter
2014-06-26 17:23 ` [RFC 08/44] drm/i915: Added GPU scheduler config option John.C.Harrison
2014-07-07 18:58   ` Daniel Vetter
2014-06-26 17:24 ` [RFC 09/44] drm/i915: Start of GPU scheduler John.C.Harrison
2014-07-02 17:55   ` Jesse Barnes
2014-07-07 19:02   ` Daniel Vetter
2014-06-26 17:24 ` [RFC 10/44] drm/i915: Prepare retire_requests to handle out-of-order seqnos John.C.Harrison
2014-07-02 18:11   ` Jesse Barnes
2014-07-07 19:05   ` Daniel Vetter
2014-07-09 14:08     ` Daniel Vetter
2014-06-26 17:24 ` [RFC 11/44] drm/i915: Added scheduler hook into i915_seqno_passed() John.C.Harrison
2014-07-02 18:14   ` Jesse Barnes
2014-06-26 17:24 ` [RFC 12/44] drm/i915: Disable hardware semaphores when GPU scheduler is enabled John.C.Harrison
2014-07-02 18:16   ` Jesse Barnes
2014-06-26 17:24 ` [RFC 13/44] drm/i915: Added scheduler hook when closing DRM file handles John.C.Harrison
2014-07-02 18:20   ` Jesse Barnes
2014-07-23 15:10     ` John Harrison
2014-07-23 15:39       ` Jesse Barnes
2014-06-26 17:24 ` [RFC 14/44] drm/i915: Added getparam for GPU scheduler John.C.Harrison
2014-07-02 18:21   ` Jesse Barnes
2014-07-07 19:11     ` Daniel Vetter
2014-06-26 17:24 ` [RFC 15/44] drm/i915: Added deferred work handler for scheduler John.C.Harrison
2014-07-07 19:14   ` Daniel Vetter
2014-07-23 15:37     ` John Harrison
2014-07-23 18:50       ` Daniel Vetter
2014-07-24 15:42         ` John Harrison
2014-07-25  7:18           ` Daniel Vetter
2014-06-26 17:24 ` [RFC 16/44] drm/i915: Alloc early seqno John.C.Harrison
2014-07-02 18:29   ` Jesse Barnes
2014-07-23 15:11     ` John Harrison
2014-06-26 17:24 ` [RFC 17/44] drm/i915: Prelude to splitting i915_gem_do_execbuffer in two John.C.Harrison
2014-07-02 18:34   ` Jesse Barnes
2014-07-07 19:21     ` Daniel Vetter
2014-07-23 16:33       ` John Harrison
2014-07-23 18:14         ` Daniel Vetter
2014-06-26 17:24 ` [RFC 18/44] drm/i915: Added scheduler debug macro John.C.Harrison
2014-07-02 18:37   ` Jesse Barnes
2014-07-07 19:23     ` Daniel Vetter
2014-06-26 17:24 ` [RFC 19/44] drm/i915: Split i915_dem_do_execbuffer() in half John.C.Harrison
2014-06-26 17:24 ` [RFC 20/44] drm/i915: Redirect execbuffer_final() via scheduler John.C.Harrison
2014-06-26 17:24 ` [RFC 21/44] drm/i915: Added tracking/locking of batch buffer objects John.C.Harrison
2014-06-26 17:24 ` [RFC 22/44] drm/i915: Ensure OLS & PLR are always in sync John.C.Harrison
2014-06-26 17:24 ` [RFC 23/44] drm/i915: Added manipulation of OLS/PLR John.C.Harrison
2014-06-26 17:24 ` [RFC 24/44] drm/i915: Added scheduler interrupt handler hook John.C.Harrison
2014-06-26 17:24 ` [RFC 25/44] drm/i915: Added hook to catch 'unexpected' ring submissions John.C.Harrison
2014-06-26 17:24 ` [RFC 26/44] drm/i915: Added scheduler support to __wait_seqno() calls John.C.Harrison
2014-06-26 17:24 ` [RFC 27/44] drm/i915: Added scheduler support to page fault handler John.C.Harrison
2014-06-26 17:24 ` [RFC 28/44] drm/i915: Added scheduler flush calls to ring throttle and idle functions John.C.Harrison
2014-06-26 17:24 ` [RFC 29/44] drm/i915: Hook scheduler into intel_ring_idle() John.C.Harrison
2014-06-26 17:24 ` [RFC 30/44] drm/i915: Added a module parameter for allowing scheduler overrides John.C.Harrison
2014-06-26 17:24 ` [RFC 31/44] drm/i915: Implemented the GPU scheduler John.C.Harrison
2014-06-26 17:24 ` [RFC 32/44] drm/i915: Added immediate submission override to scheduler John.C.Harrison
2014-06-26 17:24 ` [RFC 33/44] drm/i915: Added trace points " John.C.Harrison
2014-06-26 17:24 ` [RFC 34/44] drm/i915: Added scheduler queue throttling by DRM file handle John.C.Harrison
2014-06-26 17:24 ` [RFC 35/44] drm/i915: Added debugfs interface to scheduler tuning parameters John.C.Harrison
2014-06-26 17:24 ` [RFC 36/44] drm/i915: Added debug state dump facilities to scheduler John.C.Harrison
2014-06-26 17:24 ` [RFC 37/44] drm/i915: Added facility for cancelling an outstanding request John.C.Harrison
2014-06-26 17:24 ` [RFC 38/44] drm/i915: Add early exit to execbuff_final() if insufficient ring space John.C.Harrison
2014-06-26 17:24 ` [RFC 39/44] drm/i915: Added support for pre-emptive scheduling John.C.Harrison
2014-06-26 17:24 ` [RFC 40/44] drm/i915: REVERTME Hack to allow IGT to test pre-emption John.C.Harrison
2014-06-26 17:24 ` [RFC 41/44] drm/i915: Added validation callback to trace points John.C.Harrison
2014-06-26 17:24 ` [RFC 42/44] drm/i915: Added scheduler statistic reporting to debugfs John.C.Harrison
2014-06-26 17:24 ` [RFC 43/44] drm/i915: Added support for submitting out-of-batch ring commands John.C.Harrison
2014-06-26 17:24 ` [RFC 44/44] drm/i915: Fake batch support for page flips John.C.Harrison
2014-07-07 19:25   ` Daniel Vetter
2014-06-26 20:44 ` [RFC 00/44] GPU scheduler for i915 driver Dave Airlie
2014-07-07 15:57   ` Daniel Vetter
2014-10-10 10:35 ` Steven Newbury
2014-10-20 10:31   ` John Harrison
