* [RFC 00/39] GPU scheduler for i915 driver
@ 2015-07-17 14:33 John.C.Harrison
  2015-07-17 14:33 ` [RFC 01/39] drm/i915: Add total count to context status debugfs output John.C.Harrison
                   ` (39 more replies)
  0 siblings, 40 replies; 48+ messages in thread
From: John.C.Harrison @ 2015-07-17 14:33 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

This series implements a batch buffer submission scheduler for the i915 DRM driver.

The general theory of operation is that when batch buffers are submitted to the
driver, the execbuffer() code assigns a unique seqno value and then packages up
all the information required to execute the batch buffer at a later time. This
package is given over to the scheduler which adds it to an internal node list.
The scheduler also scans the list of objects associated with the batch buffer
and compares them against the objects already in use by other buffers in the
node list. If matches are found then the new batch buffer node is marked as
being dependent upon the matching node. The same is done for the context object.
The scheduler also bumps up the priority of such matching nodes on the grounds
that the more dependencies a given batch buffer has, the more important it is
likely to be.
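
To give a flavour of the bookkeeping being described, the sketch below shows one
possible shape for it. It is purely illustrative - the names used here
(sched_node, sched_state, sched_shares_object(), MAX_DEPS and so on) are
placeholders for this cover letter, not the structures and functions actually
added by the patches, and locking and error handling are omitted:

#define MAX_DEPS 16     /* arbitrary, for this sketch only */

enum sched_node_state {
        NODE_QUEUED,    /* waiting in the software queue */
        NODE_FLYING,    /* submitted to the hardware */
        NODE_COMPLETED, /* finished, awaiting clean up */
        NODE_PREEMPTED, /* kicked off the hardware */
};

struct sched_node {
        struct list_head link;          /* queued or flying list */
        enum sched_node_state state;
        uint32_t seqno;                 /* hardware seqno, once known */
        uint32_t priority;
        struct sched_node *deps[MAX_DEPS]; /* nodes this batch waits for */
        int num_deps;
};

struct sched_state {
        spinlock_t lock;
        struct list_head queue;         /* not yet sent to the hardware */
        struct list_head flying;        /* submitted, not yet retired */
        uint32_t flying_count;
        uint32_t in_flight_limit;       /* the tuneable mentioned below */
        struct work_struct work;        /* deferred completion handler */
};

/* Record a dependency on every queued node that touches one of the same
 * objects (or the same context), and bump the priority of that node. */
static void sched_mark_dependencies(struct sched_state *sched,
                                    struct sched_node *node)
{
        struct sched_node *other;

        list_for_each_entry(other, &sched->queue, link) {
                if (node->num_deps == MAX_DEPS)
                        break;
                if (!sched_shares_object(node, other)) /* placeholder test */
                        continue;

                node->deps[node->num_deps++] = other;
                other->priority++;
        }
}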

The scheduler aims to have a given (tuneable) number of batch buffers in flight
on the hardware at any given time. If fewer than this are currently executing
when a new node is queued, then the node is passed straight through to the
submit function. Otherwise it is simply added to the queue and the driver
returns back to user land.
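
Schematically, and still using only the placeholder names from the sketch
above, the queue-or-submit decision amounts to:

static int sched_queue_node(struct sched_state *sched, struct sched_node *node)
{
        node->state = NODE_QUEUED;
        list_add_tail(&node->link, &sched->queue);

        /* Below the in-flight limit: push work to the hardware right away. */
        if (sched->flying_count < sched->in_flight_limit)
                return sched_submit(sched);     /* placeholder */

        /* Otherwise leave it queued; the IOCTL returns to user land now. */
        return 0;
}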

As each batch buffer completes, it raises an interrupt which wakes up the
scheduler. Note that it is possible for multiple buffers to complete before the
IRQ handler gets to run. Further, the seqno values of the individual buffers are
not necessarily in increasing order, as the scheduler may have re-ordered their
submission. However, the scheduler keeps the list of executing buffers in order
of hardware submission. Thus it can scan through the list until a matching seqno
is found and then mark all in flight nodes from that point on as completed.
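
Continuing the illustrative sketch, the completion scan is roughly:

static void sched_mark_completed(struct sched_state *sched, uint32_t seqno)
{
        struct sched_node *node;

        /* The flying list is kept in hardware submission order, so everything
         * up to and including the entry whose seqno just completed must have
         * finished, whatever order the seqno values themselves are in. */
        list_for_each_entry(node, &sched->flying, link) {
                node->state = NODE_COMPLETED;
                if (node->seqno == seqno)
                        break;
        }
}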

A deferred work queue is also poked by the interrupt handler. When this wakes up
it can do more involved processing such as actually removing completed nodes
from the queue and freeing up the resources associated with them (internal
memory allocations, DRM object references, context reference, etc.). The work
handler also checks the in flight count and calls the submission code if a new
slot has appeared.
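
Roughly, again in terms of the placeholder names above (and with locking
omitted), the work handler does something like:

static void sched_work_handler(struct work_struct *work)
{
        struct sched_state *sched = container_of(work, struct sched_state, work);
        struct sched_node *node, *next;

        /* Retire everything the interrupt handler marked as completed. */
        list_for_each_entry_safe(node, next, &sched->flying, link) {
                if (node->state != NODE_COMPLETED)
                        continue;

                list_del(&node->link);
                sched->flying_count--;
                sched_free_node(node);  /* placeholder: drop refs, free memory */
        }

        /* A slot may have opened up, so try to submit more work. */
        if (sched->flying_count < sched->in_flight_limit)
                sched_submit(sched);    /* placeholder */
}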

When the scheduler's submit code is called, it scans the queued node list for
the highest priority node that has no unmet dependencies. Note that the
dependency calculation is complex as it must take inter-ring dependencies and
potential preemptions into account. Note also that in the future this will be
extended to include external dependencies such as the Android Native Sync file
descriptors and/or the Linux dma-buf synchronisation scheme.

If a suitable node is found then it is sent to execbuff_final() for submission
to the hardware. The in flight count is then re-checked and a new node popped
from the list if appropriate.
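
In outline, the selection step looks like the sketch below. Note that it
glosses over node lifetime (the real code has to cope with dependencies that
have already been retired) and over the inter-ring and preemption rules
mentioned above:

static struct sched_node *sched_pick_next(struct sched_state *sched)
{
        struct sched_node *node, *best = NULL;
        int i;

        list_for_each_entry(node, &sched->queue, link) {
                bool blocked = false;

                for (i = 0; i < node->num_deps; i++) {
                        if (node->deps[i]->state != NODE_COMPLETED) {
                                blocked = true;
                                break;
                        }
                }

                if (blocked)
                        continue;

                if (!best || node->priority > best->priority)
                        best = node;
        }

        return best;    /* NULL means nothing is currently runnable */
}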

The scheduler also allows high priority batch buffers (e.g. from a desktop
compositor) to jump ahead of whatever is already running if the underlying
hardware supports pre-emption. In this situation, any work that was pre-empted
is returned to the queued list ready to be resubmitted when no more high
priority work is outstanding.
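
A sketch of that re-queue step, once more with placeholder names only:

static void sched_requeue_preempted(struct sched_state *sched)
{
        struct sched_node *node, *next;

        /* Anything knocked off the hardware goes back on the queue so it is
         * resubmitted once the high priority work has drained. */
        list_for_each_entry_safe(node, next, &sched->flying, link) {
                if (node->state != NODE_PREEMPTED)
                        continue;

                node->state = NODE_QUEUED;
                sched->flying_count--;
                list_move_tail(&node->link, &sched->queue);
        }
}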

[Patches against drm-intel-nightly tree fetched 15/07/2015 with struct fence
conversion patches applied]

Dave Gordon (1):
  drm/i915: Updating assorted register and status page definitions

John Harrison (38):
  drm/i915: Add total count to context status debugfs output
  drm/i915: Explicit power enable during deferred context initialisation
  drm/i915: Prelude to splitting i915_gem_do_execbuffer in two
  drm/i915: Split i915_gem_do_execbuffer() in half
  drm/i915: Re-instate request->uniq because it is extremely useful
  drm/i915: Start of GPU scheduler
  drm/i915: Prepare retire_requests to handle out-of-order seqnos
  drm/i915: Added scheduler hook into i915_gem_complete_requests_ring()
  drm/i915: Disable hardware semaphores when GPU scheduler is enabled
  drm/i915: Force MMIO flips when scheduler enabled
  drm/i915: Added scheduler hook when closing DRM file handles
  drm/i915: Added deferred work handler for scheduler
  drm/i915: Redirect execbuffer_final() via scheduler
  drm/i915: Keep the reserved space mechanism happy
  drm/i915: Added tracking/locking of batch buffer objects
  drm/i915: Hook scheduler node clean up into retire requests
  drm/i915: Added scheduler interrupt handler hook
  drm/i915: Added scheduler support to __wait_request() calls
  drm/i915: Added scheduler support to page fault handler
  drm/i915: Added scheduler flush calls to ring throttle and idle functions
  drm/i915: Add scheduler hook to GPU reset
  drm/i915: Added a module parameter for allowing scheduler overrides
  drm/i915: Support for 'unflushed' ring idle
  drm/i915: Defer seqno allocation until actual hardware submission time
  drm/i915: Added immediate submission override to scheduler
  drm/i915: Add sync wait support to scheduler
  drm/i915: Connecting execbuff fences to scheduler
  drm/i915: Added trace points to scheduler
  drm/i915: Added scheduler queue throttling by DRM file handle
  drm/i915: Added debugfs interface to scheduler tuning parameters
  drm/i915: Added debug state dump facilities to scheduler
  drm/i915: Add early exit to execbuff_final() if insufficient ring space
  drm/i915: Added scheduler statistic reporting to debugfs
  drm/i915: Added seqno values to scheduler status dump
  drm/i915: Add scheduler support functions for TDR
  drm/i915: GPU priority bumping to prevent starvation
  drm/i915: Enable GPU scheduler by default
  drm/i915: Allow scheduler to manage inter-ring object synchronisation

 drivers/gpu/drm/i915/Makefile              |    1 +
 drivers/gpu/drm/i915/i915_debugfs.c        |  221 ++++
 drivers/gpu/drm/i915/i915_dma.c            |    6 +
 drivers/gpu/drm/i915/i915_drv.c            |    9 +
 drivers/gpu/drm/i915/i915_drv.h            |   42 +-
 drivers/gpu/drm/i915/i915_gem.c            |  191 ++-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  338 ++++--
 drivers/gpu/drm/i915/i915_irq.c            |    3 +
 drivers/gpu/drm/i915/i915_params.c         |    4 +
 drivers/gpu/drm/i915/i915_reg.h            |   30 +-
 drivers/gpu/drm/i915/i915_scheduler.c      | 1762 ++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_scheduler.h      |  178 +++
 drivers/gpu/drm/i915/i915_trace.h          |  233 +++-
 drivers/gpu/drm/i915/intel_display.c       |    6 +-
 drivers/gpu/drm/i915/intel_lrc.c           |  163 ++-
 drivers/gpu/drm/i915/intel_lrc.h           |    1 +
 drivers/gpu/drm/i915/intel_ringbuffer.c    |   47 +-
 drivers/gpu/drm/i915/intel_ringbuffer.h    |   43 +-
 18 files changed, 3110 insertions(+), 168 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_scheduler.c
 create mode 100644 drivers/gpu/drm/i915/i915_scheduler.h

-- 
1.9.1


* [RFC 01/39] drm/i915: Add total count to context status debugfs output
  2015-07-17 14:33 [RFC 00/39] GPU scheduler for i915 driver John.C.Harrison
@ 2015-07-17 14:33 ` John.C.Harrison
  2015-07-17 14:33 ` [RFC 02/39] drm/i915: Updating assorted register and status page definitions John.C.Harrison
                   ` (38 subsequent siblings)
  39 siblings, 0 replies; 48+ messages in thread
From: John.C.Harrison @ 2015-07-17 14:33 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

When there are lots and lots and even more lots of contexts (e.g. when running
with execlists) it is useful to be able to immediately see what the total
context count is.

Change-Id: If9726d4df86567100ecf53867b43f4753f08bf84
For: VIZ-1587
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index c50a798..05646fe 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1942,6 +1942,7 @@ static int i915_context_status(struct seq_file *m, void *unused)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_engine_cs *ring;
 	struct intel_context *ctx;
+	uint32_t count = 0;
 	int ret, i;
 
 	ret = mutex_lock_interruptible(&dev->struct_mutex);
@@ -1955,6 +1956,7 @@ static int i915_context_status(struct seq_file *m, void *unused)
 
 		seq_puts(m, "HW context ");
 		describe_ctx(m, ctx);
+		count++;
 		for_each_ring(ring, dev_priv, i) {
 			if (ring->default_context == ctx)
 				seq_printf(m, "(default context %s) ",
@@ -1983,6 +1985,8 @@ static int i915_context_status(struct seq_file *m, void *unused)
 		seq_putc(m, '\n');
 	}
 
+	seq_printf(m, "Total: %d contexts\n", count);
+
 	mutex_unlock(&dev->struct_mutex);
 
 	return 0;
-- 
1.9.1


* [RFC 02/39] drm/i915: Updating assorted register and status page definitions
  2015-07-17 14:33 [RFC 00/39] GPU scheduler for i915 driver John.C.Harrison
  2015-07-17 14:33 ` [RFC 01/39] drm/i915: Add total count to context status debugfs output John.C.Harrison
@ 2015-07-17 14:33 ` John.C.Harrison
  2015-07-17 14:33 ` [RFC 03/39] drm/i915: Explicit power enable during deferred context initialisation John.C.Harrison
                   ` (37 subsequent siblings)
  39 siblings, 0 replies; 48+ messages in thread
From: John.C.Harrison @ 2015-07-17 14:33 UTC (permalink / raw)
  To: Intel-GFX

From: Dave Gordon <david.s.gordon@intel.com>

Added various definitions that will be useful for the scheduler in general and
pre-emptive context switching in particular.

Change-Id: Ica805b94160426def51f5d520f5ce51c60864a98
For: VIZ-1587
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
---
 drivers/gpu/drm/i915/i915_reg.h         | 30 ++++++++++++++++++++++++-
 drivers/gpu/drm/i915/intel_ringbuffer.h | 40 +++++++++++++++++++++++++++++++--
 2 files changed, 67 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index e9a95df..ae3e9f7 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -250,6 +250,10 @@
 #define  MI_GLOBAL_GTT    (1<<22)
 
 #define MI_NOOP			MI_INSTR(0, 0)
+#define   MI_NOOP_WRITE_ID		(1<<22)
+#define   MI_NOOP_ID_MASK		((1<<22) - 1)
+#define   MI_NOOP_MID(id)		((id) & MI_NOOP_ID_MASK)
+#define MI_NOOP_WITH_ID(id)	MI_INSTR(0, MI_NOOP_WRITE_ID|MI_NOOP_MID(id))
 #define MI_USER_INTERRUPT	MI_INSTR(0x02, 0)
 #define MI_WAIT_FOR_EVENT       MI_INSTR(0x03, 0)
 #define   MI_WAIT_FOR_OVERLAY_FLIP	(1<<16)
@@ -267,6 +271,7 @@
 #define MI_ARB_ON_OFF		MI_INSTR(0x08, 0)
 #define   MI_ARB_ENABLE			(1<<0)
 #define   MI_ARB_DISABLE		(0<<0)
+#define MI_ARB_CHECK		MI_INSTR(0x05, 0)
 #define MI_BATCH_BUFFER_END	MI_INSTR(0x0a, 0)
 #define MI_SUSPEND_FLUSH	MI_INSTR(0x0b, 0)
 #define   MI_SUSPEND_FLUSH_EN	(1<<0)
@@ -316,6 +321,8 @@
 #define   MI_SEMAPHORE_SYNC_INVALID (3<<16)
 #define   MI_SEMAPHORE_SYNC_MASK    (3<<16)
 #define MI_SET_CONTEXT		MI_INSTR(0x18, 0)
+#define   MI_CONTEXT_ADDR_MASK		((~0)<<12)
+#define   MI_SET_CONTEXT_FLAG_MASK	((1<<12)-1)
 #define   MI_MM_SPACE_GTT		(1<<8)
 #define   MI_MM_SPACE_PHYSICAL		(0<<8)
 #define   MI_SAVE_EXT_STATE_EN		(1<<3)
@@ -335,6 +342,10 @@
 #define   MI_USE_GGTT		(1 << 22) /* g4x+ */
 #define MI_STORE_DWORD_INDEX	MI_INSTR(0x21, 1)
 #define   MI_STORE_DWORD_INDEX_SHIFT 2
+#define MI_STORE_REG_MEM	MI_INSTR(0x24, 1)
+#define   MI_STORE_REG_MEM_GTT		(1 << 22)
+#define   MI_STORE_REG_MEM_PREDICATE	(1 << 21)
+
 /* Official intel docs are somewhat sloppy concerning MI_LOAD_REGISTER_IMM:
  * - Always issue a MI_NOOP _before_ the MI_LOAD_REGISTER_IMM - otherwise hw
  *   simply ignores the register load under certain conditions.
@@ -349,7 +360,10 @@
 #define MI_FLUSH_DW		MI_INSTR(0x26, 1) /* for GEN6 */
 #define   MI_FLUSH_DW_STORE_INDEX	(1<<21)
 #define   MI_INVALIDATE_TLB		(1<<18)
+#define   MI_FLUSH_DW_OP_NONE		(0<<14)
 #define   MI_FLUSH_DW_OP_STOREDW	(1<<14)
+#define   MI_FLUSH_DW_OP_RSVD		(2<<14)
+#define   MI_FLUSH_DW_OP_STAMP		(3<<14)
 #define   MI_FLUSH_DW_OP_MASK		(3<<14)
 #define   MI_FLUSH_DW_NOTIFY		(1<<8)
 #define   MI_INVALIDATE_BSD		(1<<7)
@@ -1491,6 +1505,19 @@ enum skl_disp_power_wells {
 
 #define HSW_GTT_CACHE_EN	0x4024
 #define   GTT_CACHE_EN_ALL	0xF0007FFF
+
+/*
+ * Preemption-related registers
+ */
+#define RING_UHPTR(base)	((base)+0x134)
+#define   UHPTR_GFX_ADDR_ALIGN		(0x7)
+#define   UHPTR_VALID			(0x1)
+#define RING_PREEMPT_ADDR	0x0214c
+#define   PREEMPT_BATCH_LEVEL_MASK	(0x3)
+#define BB_PREEMPT_ADDR		0x02148
+#define SBB_PREEMPT_ADDR	0x0213c
+#define RS_PREEMPT_STATUS	0x0215c
+
 #define GEN7_WR_WATERMARK	0x4028
 #define GEN7_GFX_PRIO_CTRL	0x402C
 #define ARB_MODE		0x4030
@@ -6612,7 +6639,8 @@ enum skl_disp_power_wells {
 #define  VLV_SPAREG2H				0xA194
 
 #define  GTFIFODBG				0x120000
-#define    GT_FIFO_SBDROPERR			(1<<6)
+#define    GT_FIFO_CPU_ERROR_MASK		0xf
+#define    GT_FIFO_SDDROPERR			(1<<6)
 #define    GT_FIFO_BLOBDROPERR			(1<<5)
 #define    GT_FIFO_SB_READ_ABORTERR		(1<<4)
 #define    GT_FIFO_DROPERR			(1<<3)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 2e68b73..9457774 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -49,6 +49,12 @@ struct  intel_hw_status_page {
 #define I915_READ_MODE(ring) I915_READ(RING_MI_MODE((ring)->mmio_base))
 #define I915_WRITE_MODE(ring, val) I915_WRITE(RING_MI_MODE((ring)->mmio_base), val)
 
+#define I915_READ_UHPTR(ring) \
+		I915_READ(RING_UHPTR((ring)->mmio_base))
+#define I915_WRITE_UHPTR(ring, val) \
+		I915_WRITE(RING_UHPTR((ring)->mmio_base), val)
+#define I915_READ_NOPID(ring) I915_READ(RING_NOPID((ring)->mmio_base))
+
 /* seqno size is actually only a uint32, but since we plan to use MI_FLUSH_DW to
  * do the writes, and that must have qw aligned offsets, simply pretend it's 8b.
  */
@@ -415,10 +421,40 @@ intel_write_status_page(struct intel_engine_cs *ring,
  * 0x20-0x2f: Reserved (Gen6+)
  *
  * The area from dword 0x30 to 0x3ff is available for driver usage.
+ *
+ * Note: in general the allocation of these indices is arbitrary, as long
+ * as they are all unique. But a few of them are used with instructions that
+ * have specific alignment requirements, those particular indices must be
+ * chosen carefully to meet those requirements. The list below shows the
+ * currently-known alignment requirements:
+ *
+ *	I915_GEM_SCRATCH_INDEX	    must be EVEN
  */
 #define I915_GEM_HWS_INDEX		0x30
-#define I915_GEM_HWS_SCRATCH_INDEX	0x40
-#define I915_GEM_HWS_SCRATCH_ADDR (I915_GEM_HWS_SCRATCH_INDEX << MI_STORE_DWORD_INDEX_SHIFT)
+
+#define I915_GEM_HWS_SCRATCH_INDEX	0x34  /* QWord */
+#define I915_GEM_HWS_SCRATCH_ADDR	(I915_GEM_HWS_SCRATCH_INDEX << MI_STORE_DWORD_INDEX_SHIFT)
+
+/*
+ * Tracking; these are updated by the GPU at the beginning and/or end of every
+ * batch. One pair for regular buffers, the other for preemptive ones.
+ */
+#define I915_BATCH_DONE_SEQNO		0x40  /* Completed batch seqno        */
+#define I915_BATCH_ACTIVE_SEQNO		0x41  /* In progress batch seqno      */
+#define I915_PREEMPTIVE_DONE_SEQNO	0x42  /* Completed preemptive batch   */
+#define I915_PREEMPTIVE_ACTIVE_SEQNO	0x43  /* In progress preemptive batch */
+
+/*
+ * Preemption; these are used by the GPU to save important registers
+ */
+#define I915_SAVE_PREEMPTED_RING_PTR	0x44  /* HEAD before preemption     */
+#define I915_SAVE_PREEMPTED_BB_PTR	0x45  /* BB ptr before preemption   */
+#define I915_SAVE_PREEMPTED_SBB_PTR	0x46  /* SBB before preemption      */
+#define I915_SAVE_PREEMPTED_UHPTR	0x47  /* UHPTR after preemption     */
+#define I915_SAVE_PREEMPTED_HEAD	0x48  /* HEAD after preemption      */
+#define I915_SAVE_PREEMPTED_TAIL	0x49  /* TAIL after preemption      */
+#define I915_SAVE_PREEMPTED_STATUS	0x4A  /* RS preemption status       */
+#define I915_SAVE_PREEMPTED_NOPID	0x4B  /* Dummy                      */
 
 void intel_unpin_ringbuffer_obj(struct intel_ringbuffer *ringbuf);
 int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
-- 
1.9.1


* [RFC 03/39] drm/i915: Explicit power enable during deferred context initialisation
  2015-07-17 14:33 [RFC 00/39] GPU scheduler for i915 driver John.C.Harrison
  2015-07-17 14:33 ` [RFC 01/39] drm/i915: Add total count to context status debugfs output John.C.Harrison
  2015-07-17 14:33 ` [RFC 02/39] drm/i915: Updating assorted register and status page definitions John.C.Harrison
@ 2015-07-17 14:33 ` John.C.Harrison
  2015-07-21  7:54   ` Daniel Vetter
  2015-07-17 14:33 ` [RFC 04/39] drm/i915: Prelude to splitting i915_gem_do_execbuffer in two John.C.Harrison
                   ` (36 subsequent siblings)
  39 siblings, 1 reply; 48+ messages in thread
From: John.C.Harrison @ 2015-07-17 14:33 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

A later patch in this series re-organises the batch buffer submission
code. Part of that is to reduce the scope of a pm_get/put pair.
Specifically, they previously wrapped the entire submission path from
the very start to the very end; now they only wrap the actual hardware
submission part in the back half.

While that is a good thing in general, it causes a problem with the
deferred context initialisation. That is done quite early on in the
execbuf code path - it happens at context validation time rather than
context switch time. Some of the deferred work requires the power to
be enabled. Hence this patch adds an explicit power reference count to
the deferred initialisation code itself.

Change-Id: Id7b1535dfd8809a2bd5546272de2bbec39da2868
Issue: GMINL-5159
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 18dbd5c..8aa9a18 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2317,12 +2317,15 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 	WARN_ON(ctx->legacy_hw_ctx.rcs_state != NULL);
 	WARN_ON(ctx->engine[ring->id].state);
 
+	intel_runtime_pm_get(dev->dev_private);
+
 	context_size = round_up(get_lr_context_size(ring), 4096);
 
 	ctx_obj = i915_gem_alloc_object(dev, context_size);
 	if (!ctx_obj) {
 		DRM_DEBUG_DRIVER("Alloc LRC backing obj failed.\n");
-		return -ENOMEM;
+		ret = -ENOMEM;
+		goto error_pm;
 	}
 
 	if (is_global_default_ctx) {
@@ -2331,7 +2334,7 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 			DRM_DEBUG_DRIVER("Pin LRC backing obj failed: %d\n",
 					ret);
 			drm_gem_object_unreference(&ctx_obj->base);
-			return ret;
+			goto error_pm;
 		}
 	}
 
@@ -2415,6 +2418,7 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 		ctx->rcs_initialized = true;
 	}
 
+	intel_runtime_pm_put(dev->dev_private);
 	return 0;
 
 error:
@@ -2428,6 +2432,8 @@ error_unpin_ctx:
 	if (is_global_default_ctx)
 		i915_gem_object_ggtt_unpin(ctx_obj);
 	drm_gem_object_unreference(&ctx_obj->base);
+error_pm:
+	intel_runtime_pm_put(dev->dev_private);
 	return ret;
 }
 
-- 
1.9.1


* [RFC 04/39] drm/i915: Prelude to splitting i915_gem_do_execbuffer in two
  2015-07-17 14:33 [RFC 00/39] GPU scheduler for i915 driver John.C.Harrison
                   ` (2 preceding siblings ...)
  2015-07-17 14:33 ` [RFC 03/39] drm/i915: Explicit power enable during deferred context initialisation John.C.Harrison
@ 2015-07-17 14:33 ` John.C.Harrison
  2015-07-21  8:06   ` Daniel Vetter
  2015-07-17 14:33 ` [RFC 05/39] drm/i915: Split i915_gem_do_execbuffer() in half John.C.Harrison
                   ` (35 subsequent siblings)
  39 siblings, 1 reply; 48+ messages in thread
From: John.C.Harrison @ 2015-07-17 14:33 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The scheduler decouples the submission of batch buffers to the driver from their
submission to the hardware. This basically means splitting the execbuffer()
function in half. This change rearranges some code ready for the split to occur.

Change-Id: Icc9c8afaac18821f3eb8a151a49f918f90c068a3
For: VIZ-1587
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 57 ++++++++++++++++++------------
 drivers/gpu/drm/i915/intel_lrc.c           | 18 +++++++---
 2 files changed, 47 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index d95d472..988ecd4 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -926,10 +926,7 @@ i915_gem_execbuffer_move_to_gpu(struct drm_i915_gem_request *req,
 	if (flush_domains & I915_GEM_DOMAIN_GTT)
 		wmb();
 
-	/* Unconditionally invalidate gpu caches and ensure that we do flush
-	 * any residual writes from the previous batch.
-	 */
-	return intel_ring_invalidate_all_caches(req);
+	return 0;
 }
 
 static bool
@@ -1253,17 +1250,6 @@ i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
 		}
 	}
 
-	ret = i915_gem_execbuffer_move_to_gpu(params->request, vmas);
-	if (ret)
-		goto error;
-
-	ret = i915_switch_context(params->request);
-	if (ret)
-		goto error;
-
-	WARN(params->ctx->ppgtt && params->ctx->ppgtt->pd_dirty_rings & (1<<ring->id),
-	     "%s didn't clear reload\n", ring->name);
-
 	instp_mode = args->flags & I915_EXEC_CONSTANTS_MASK;
 	instp_mask = I915_EXEC_CONSTANTS_MASK;
 	switch (instp_mode) {
@@ -1301,6 +1287,32 @@ i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
 		goto error;
 	}
 
+	ret = i915_gem_execbuffer_move_to_gpu(params->request, vmas);
+	if (ret)
+		goto error;
+
+	i915_gem_execbuffer_move_to_active(vmas, params->request);
+
+	/* To be split into two functions here... */
+
+	intel_runtime_pm_get(dev_priv);
+
+	/*
+	 * Unconditionally invalidate gpu caches and ensure that we do flush
+	 * any residual writes from the previous batch.
+	 */
+	ret = intel_ring_invalidate_all_caches(params->request);
+	if (ret)
+		goto error;
+
+	/* Switch to the correct context for the batch */
+	ret = i915_switch_context(params->request);
+	if (ret)
+		goto error;
+
+	WARN(params->ctx->ppgtt && params->ctx->ppgtt->pd_dirty_rings & (1<<ring->id),
+	     "%s didn't clear reload\n", ring->name);
+
 	if (ring == &dev_priv->ring[RCS] &&
 			instp_mode != dev_priv->relative_constants_mode) {
 		ret = intel_ring_begin(params->request, 4);
@@ -1344,15 +1356,20 @@ i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
 						exec_start, exec_len,
 						params->dispatch_flags);
 		if (ret)
-			return ret;
+			goto error;
 	}
 
 	trace_i915_gem_ring_dispatch(params->request, params->dispatch_flags);
 
-	i915_gem_execbuffer_move_to_active(vmas, params->request);
 	i915_gem_execbuffer_retire_commands(params);
 
 error:
+	/*
+	 * intel_gpu_busy should also get a ref, so it will free when the device
+	 * is really idle.
+	 */
+	intel_runtime_pm_put(dev_priv);
+
 	kfree(cliprects);
 	return ret;
 }
@@ -1563,8 +1580,6 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	}
 #endif
 
-	intel_runtime_pm_get(dev_priv);
-
 	ret = i915_mutex_lock_interruptible(dev);
 	if (ret)
 		goto pre_mutex_err;
@@ -1759,10 +1774,6 @@ err:
 	mutex_unlock(&dev->struct_mutex);
 
 pre_mutex_err:
-	/* intel_gpu_busy should also get a ref, so it will free when the device
-	 * is really idle. */
-	intel_runtime_pm_put(dev_priv);
-
 	if (fd_fence_complete != -1) {
 		sys_close(fd_fence_complete);
 		args->rsvd2 = (__u64) -1;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 8aa9a18..89f3bcd 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -613,10 +613,7 @@ static int execlists_move_to_gpu(struct drm_i915_gem_request *req,
 	if (flush_domains & I915_GEM_DOMAIN_GTT)
 		wmb();
 
-	/* Unconditionally invalidate gpu caches and ensure that we do flush
-	 * any residual writes from the previous batch.
-	 */
-	return logical_ring_invalidate_all_caches(req);
+	return 0;
 }
 
 int intel_logical_ring_alloc_request_extras(struct drm_i915_gem_request *request)
@@ -889,6 +886,18 @@ int intel_execlists_submission(struct i915_execbuffer_params *params,
 	if (ret)
 		return ret;
 
+	i915_gem_execbuffer_move_to_active(vmas, params->request);
+
+	/* To be split into two functions here... */
+
+	/*
+	 * Unconditionally invalidate gpu caches and ensure that we do flush
+	 * any residual writes from the previous batch.
+	 */
+	ret = logical_ring_invalidate_all_caches(params->request);
+	if (ret)
+		return ret;
+
 	if (ring == &dev_priv->ring[RCS] &&
 	    instp_mode != dev_priv->relative_constants_mode) {
 		ret = intel_logical_ring_begin(params->request, 4);
@@ -913,7 +922,6 @@ int intel_execlists_submission(struct i915_execbuffer_params *params,
 
 	trace_i915_gem_ring_dispatch(params->request, params->dispatch_flags);
 
-	i915_gem_execbuffer_move_to_active(vmas, params->request);
 	i915_gem_execbuffer_retire_commands(params);
 
 	return 0;
-- 
1.9.1


* [RFC 05/39] drm/i915: Split i915_gem_do_execbuffer() in half
  2015-07-17 14:33 [RFC 00/39] GPU scheduler for i915 driver John.C.Harrison
                   ` (3 preceding siblings ...)
  2015-07-17 14:33 ` [RFC 04/39] drm/i915: Prelude to splitting i915_gem_do_execbuffer in two John.C.Harrison
@ 2015-07-17 14:33 ` John.C.Harrison
  2015-07-21  8:00   ` Daniel Vetter
  2015-07-17 14:33 ` [RFC 06/39] drm/i915: Re-instate request->uniq because it is extremely useful John.C.Harrison
                   ` (34 subsequent siblings)
  39 siblings, 1 reply; 48+ messages in thread
From: John.C.Harrison @ 2015-07-17 14:33 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

Split the execbuffer() function in half. The first half collects and validates
all the information required to process the batch buffer. It also does all the
object pinning, relocations, active list management, etc. - basically anything
that must be done upfront before the IOCTL returns and allows the user land side
to start changing/freeing things. The second half does the actual ring
submission.

This change implements the split but leaves the back half being called directly
from the end of the front half.

Change-Id: I5e1c77639ce526ab2401b0323186c518bf13da0a
For: VIZ-1587
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h            |  11 +++
 drivers/gpu/drm/i915/i915_gem.c            |   2 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 130 ++++++++++++++++++++---------
 drivers/gpu/drm/i915/intel_lrc.c           |  58 +++++++++----
 drivers/gpu/drm/i915/intel_lrc.h           |   1 +
 5 files changed, 147 insertions(+), 55 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 289ddd6..28d51ac 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1684,10 +1684,18 @@ struct i915_execbuffer_params {
 	struct drm_device               *dev;
 	struct drm_file                 *file;
 	uint32_t                        dispatch_flags;
+	uint32_t                        args_flags;
 	uint32_t                        args_batch_start_offset;
+	uint32_t                        args_batch_len;
+	uint32_t                        args_num_cliprects;
+	uint32_t                        args_DR1;
+	uint32_t                        args_DR4;
 	uint32_t                        batch_obj_vm_offset;
 	struct intel_engine_cs          *ring;
 	struct drm_i915_gem_object      *batch_obj;
+	struct drm_clip_rect            *cliprects;
+	uint32_t                        instp_mask;
+	int                             instp_mode;
 	struct intel_context            *ctx;
 	struct drm_i915_gem_request     *request;
 };
@@ -1929,6 +1937,7 @@ struct drm_i915_private {
 		int (*execbuf_submit)(struct i915_execbuffer_params *params,
 				      struct drm_i915_gem_execbuffer2 *args,
 				      struct list_head *vmas);
+		int (*execbuf_final)(struct i915_execbuffer_params *params);
 		int (*init_rings)(struct drm_device *dev);
 		void (*cleanup_ring)(struct intel_engine_cs *ring);
 		void (*stop_ring)(struct intel_engine_cs *ring);
@@ -2743,9 +2752,11 @@ int i915_gem_sw_finish_ioctl(struct drm_device *dev, void *data,
 void i915_gem_execbuffer_move_to_active(struct list_head *vmas,
 					struct drm_i915_gem_request *req);
 void i915_gem_execbuffer_retire_commands(struct i915_execbuffer_params *params);
+void i915_gem_execbuff_release_batch_obj(struct drm_i915_gem_object *batch_obj);
 int i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
 				   struct drm_i915_gem_execbuffer2 *args,
 				   struct list_head *vmas);
+int i915_gem_ringbuffer_submission_final(struct i915_execbuffer_params *params);
 int i915_gem_execbuffer(struct drm_device *dev, void *data,
 			struct drm_file *file_priv);
 int i915_gem_execbuffer2(struct drm_device *dev, void *data,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 8150820..2a5667b 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -5481,11 +5481,13 @@ int i915_gem_init(struct drm_device *dev)
 
 	if (!i915.enable_execlists) {
 		dev_priv->gt.execbuf_submit = i915_gem_ringbuffer_submission;
+		dev_priv->gt.execbuf_final = i915_gem_ringbuffer_submission_final;
 		dev_priv->gt.init_rings = i915_gem_init_rings;
 		dev_priv->gt.cleanup_ring = intel_cleanup_ring_buffer;
 		dev_priv->gt.stop_ring = intel_stop_ring_buffer;
 	} else {
 		dev_priv->gt.execbuf_submit = intel_execlists_submission;
+		dev_priv->gt.execbuf_final = intel_execlists_submission_final;
 		dev_priv->gt.init_rings = intel_logical_rings_init;
 		dev_priv->gt.cleanup_ring = intel_logical_ring_cleanup;
 		dev_priv->gt.stop_ring = intel_logical_ring_stop;
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 988ecd4..ba9d595 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1198,14 +1198,10 @@ i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
 			       struct drm_i915_gem_execbuffer2 *args,
 			       struct list_head *vmas)
 {
-	struct drm_clip_rect *cliprects = NULL;
 	struct drm_device *dev = params->dev;
 	struct intel_engine_cs *ring = params->ring;
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	u64 exec_start, exec_len;
-	int instp_mode;
-	u32 instp_mask;
-	int i, ret = 0;
+	int ret = 0;
 
 	if (args->num_cliprects != 0) {
 		if (ring != &dev_priv->ring[RCS]) {
@@ -1218,23 +1214,23 @@ i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
 			return -EINVAL;
 		}
 
-		if (args->num_cliprects > UINT_MAX / sizeof(*cliprects)) {
+		if (args->num_cliprects > UINT_MAX / sizeof(*params->cliprects)) {
 			DRM_DEBUG("execbuf with %u cliprects\n",
 				  args->num_cliprects);
 			return -EINVAL;
 		}
 
-		cliprects = kcalloc(args->num_cliprects,
-				    sizeof(*cliprects),
+		params->cliprects = kcalloc(args->num_cliprects,
+				    sizeof(*params->cliprects),
 				    GFP_KERNEL);
-		if (cliprects == NULL) {
+		if (params->cliprects == NULL) {
 			ret = -ENOMEM;
 			goto error;
 		}
 
-		if (copy_from_user(cliprects,
+		if (copy_from_user(params->cliprects,
 				   to_user_ptr(args->cliprects_ptr),
-				   sizeof(*cliprects)*args->num_cliprects)) {
+				   sizeof(*params->cliprects)*args->num_cliprects)) {
 			ret = -EFAULT;
 			goto error;
 		}
@@ -1250,19 +1246,19 @@ i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
 		}
 	}
 
-	instp_mode = args->flags & I915_EXEC_CONSTANTS_MASK;
-	instp_mask = I915_EXEC_CONSTANTS_MASK;
-	switch (instp_mode) {
+	params->instp_mode = args->flags & I915_EXEC_CONSTANTS_MASK;
+	params->instp_mask = I915_EXEC_CONSTANTS_MASK;
+	switch (params->instp_mode) {
 	case I915_EXEC_CONSTANTS_REL_GENERAL:
 	case I915_EXEC_CONSTANTS_ABSOLUTE:
 	case I915_EXEC_CONSTANTS_REL_SURFACE:
-		if (instp_mode != 0 && ring != &dev_priv->ring[RCS]) {
+		if (params->instp_mode != 0 && ring != &dev_priv->ring[RCS]) {
 			DRM_DEBUG("non-0 rel constants mode on non-RCS\n");
 			ret = -EINVAL;
 			goto error;
 		}
 
-		if (instp_mode != dev_priv->relative_constants_mode) {
+		if (params->instp_mode != dev_priv->relative_constants_mode) {
 			if (INTEL_INFO(dev)->gen < 4) {
 				DRM_DEBUG("no rel constants on pre-gen4\n");
 				ret = -EINVAL;
@@ -1270,7 +1266,7 @@ i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
 			}
 
 			if (INTEL_INFO(dev)->gen > 5 &&
-			    instp_mode == I915_EXEC_CONSTANTS_REL_SURFACE) {
+			    params->instp_mode == I915_EXEC_CONSTANTS_REL_SURFACE) {
 				DRM_DEBUG("rel surface constants mode invalid on gen5+\n");
 				ret = -EINVAL;
 				goto error;
@@ -1278,11 +1274,11 @@ i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
 
 			/* The HW changed the meaning on this bit on gen6 */
 			if (INTEL_INFO(dev)->gen >= 6)
-				instp_mask &= ~I915_EXEC_CONSTANTS_REL_SURFACE;
+				params->instp_mask &= ~I915_EXEC_CONSTANTS_REL_SURFACE;
 		}
 		break;
 	default:
-		DRM_DEBUG("execbuf with unknown constants: %d\n", instp_mode);
+		DRM_DEBUG("execbuf with unknown constants: %d\n", params->instp_mode);
 		ret = -EINVAL;
 		goto error;
 	}
@@ -1293,7 +1289,38 @@ i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
 
 	i915_gem_execbuffer_move_to_active(vmas, params->request);
 
-	/* To be split into two functions here... */
+	ret = dev_priv->gt.execbuf_final(params);
+	if (ret)
+		goto error;
+
+	/*
+	 * Free everything that was stored in the QE structure (until the
+	 * scheduler arrives and does it instead):
+	 */
+	kfree(params->cliprects);
+	if (params->dispatch_flags & I915_DISPATCH_SECURE)
+		i915_gem_execbuff_release_batch_obj(params->batch_obj);
+
+	return 0;
+
+error:
+	kfree(params->cliprects);
+	return ret;
+}
+
+/*
+ * This is the main function for adding a batch to the ring.
+ * It is called from the scheduler, with the struct_mutex already held.
+ */
+int i915_gem_ringbuffer_submission_final(struct i915_execbuffer_params *params)
+{
+	struct drm_i915_private *dev_priv = params->dev->dev_private;
+	struct intel_engine_cs  *ring = params->ring;
+	u64 exec_start, exec_len;
+	int ret, i;
+
+	/* The mutex must be acquired before calling this function */
+	BUG_ON(!mutex_is_locked(&params->dev->struct_mutex));
 
 	intel_runtime_pm_get(dev_priv);
 
@@ -1314,7 +1341,7 @@ i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
 	     "%s didn't clear reload\n", ring->name);
 
 	if (ring == &dev_priv->ring[RCS] &&
-			instp_mode != dev_priv->relative_constants_mode) {
+			params->instp_mode != dev_priv->relative_constants_mode) {
 		ret = intel_ring_begin(params->request, 4);
 		if (ret)
 			goto error;
@@ -1322,26 +1349,26 @@ i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
 		intel_ring_emit(ring, MI_NOOP);
 		intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
 		intel_ring_emit(ring, INSTPM);
-		intel_ring_emit(ring, instp_mask << 16 | instp_mode);
+		intel_ring_emit(ring, params->instp_mask << 16 | params->instp_mode);
 		intel_ring_advance(ring);
 
-		dev_priv->relative_constants_mode = instp_mode;
+		dev_priv->relative_constants_mode = params->instp_mode;
 	}
 
-	if (args->flags & I915_EXEC_GEN7_SOL_RESET) {
-		ret = i915_reset_gen7_sol_offsets(dev, params->request);
+	if (params->args_flags & I915_EXEC_GEN7_SOL_RESET) {
+		ret = i915_reset_gen7_sol_offsets(params->dev, params->request);
 		if (ret)
 			goto error;
 	}
 
-	exec_len   = args->batch_len;
+	exec_len   = params->args_batch_len;
 	exec_start = params->batch_obj_vm_offset +
 		     params->args_batch_start_offset;
 
-	if (cliprects) {
-		for (i = 0; i < args->num_cliprects; i++) {
-			ret = i915_emit_box(params->request, &cliprects[i],
-					    args->DR1, args->DR4);
+	if (params->cliprects) {
+		for (i = 0; i < params->args_num_cliprects; i++) {
+			ret = i915_emit_box(params->request, &params->cliprects[i],
+					    params->args_DR1, params->args_DR4);
 			if (ret)
 				goto error;
 
@@ -1370,7 +1397,6 @@ error:
 	 */
 	intel_runtime_pm_put(dev_priv);
 
-	kfree(cliprects);
 	return ret;
 }
 
@@ -1722,6 +1748,11 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	params->file                    = file;
 	params->ring                    = ring;
 	params->dispatch_flags          = dispatch_flags;
+	params->args_flags              = args->flags;
+	params->args_batch_len          = args->batch_len;
+	params->args_num_cliprects      = args->num_cliprects;
+	params->args_DR1                = args->DR1;
+	params->args_DR4                = args->DR4;
 	params->batch_obj               = batch_obj;
 	params->ctx                     = ctx;
 
@@ -1747,19 +1778,31 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 #endif
 
 	ret = dev_priv->gt.execbuf_submit(params, args, &eb->vmas);
+	if (ret)
+		goto err;
+
+	/* the request owns the ref now */
+	i915_gem_context_unreference(ctx);
 
-err_batch_unpin:
 	/*
-	 * FIXME: We crucially rely upon the active tracking for the (ppgtt)
-	 * batch vma for correctness. For less ugly and less fragility this
-	 * needs to be adjusted to also track the ggtt batch vma properly as
-	 * active.
+	 * The eb list is no longer required. The scheduler has extracted all
+	 * the information that needs to persist.
+	 */
+	eb_destroy(eb);
+
+	/*
+	 * Don't clean up everything that is now saved away in the queue.
+	 * Just unlock and return immediately.
 	 */
+	mutex_unlock(&dev->struct_mutex);
+
+	return 0;
+
+err_batch_unpin:
 	if (dispatch_flags & I915_DISPATCH_SECURE)
-		i915_gem_object_ggtt_unpin(batch_obj);
+		i915_gem_execbuff_release_batch_obj(batch_obj);
 
 err:
-	/* the request owns the ref now */
 	i915_gem_context_unreference(ctx);
 	eb_destroy(eb);
 
@@ -1782,6 +1825,17 @@ pre_mutex_err:
 	return ret;
 }
 
+void i915_gem_execbuff_release_batch_obj(struct drm_i915_gem_object *batch_obj)
+{
+	/*
+	 * FIXME: We crucially rely upon the active tracking for the (ppgtt)
+	 * batch vma for correctness. For less ugly and less fragility this
+	 * needs to be adjusted to also track the ggtt batch vma properly as
+	 * active.
+	 */
+	i915_gem_object_ggtt_unpin(batch_obj);
+}
+
 /*
  * Legacy execbuffer just creates an exec2 list from the original exec object
  * list array and passes it to the real function.
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 89f3bcd..bba1152 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -830,35 +830,31 @@ int intel_execlists_submission(struct i915_execbuffer_params *params,
 	struct drm_device       *dev = params->dev;
 	struct intel_engine_cs  *ring = params->ring;
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct intel_ringbuffer *ringbuf = params->ctx->engine[ring->id].ringbuf;
-	u64 exec_start;
-	int instp_mode;
-	u32 instp_mask;
 	int ret;
 
-	instp_mode = args->flags & I915_EXEC_CONSTANTS_MASK;
-	instp_mask = I915_EXEC_CONSTANTS_MASK;
-	switch (instp_mode) {
+	params->instp_mode = args->flags & I915_EXEC_CONSTANTS_MASK;
+	params->instp_mask = I915_EXEC_CONSTANTS_MASK;
+	switch (params->instp_mode) {
 	case I915_EXEC_CONSTANTS_REL_GENERAL:
 	case I915_EXEC_CONSTANTS_ABSOLUTE:
 	case I915_EXEC_CONSTANTS_REL_SURFACE:
-		if (instp_mode != 0 && ring != &dev_priv->ring[RCS]) {
+		if (params->instp_mode != 0 && ring != &dev_priv->ring[RCS]) {
 			DRM_DEBUG("non-0 rel constants mode on non-RCS\n");
 			return -EINVAL;
 		}
 
-		if (instp_mode != dev_priv->relative_constants_mode) {
-			if (instp_mode == I915_EXEC_CONSTANTS_REL_SURFACE) {
+		if (params->instp_mode != dev_priv->relative_constants_mode) {
+			if (params->instp_mode == I915_EXEC_CONSTANTS_REL_SURFACE) {
 				DRM_DEBUG("rel surface constants mode invalid on gen5+\n");
 				return -EINVAL;
 			}
 
 			/* The HW changed the meaning on this bit on gen6 */
-			instp_mask &= ~I915_EXEC_CONSTANTS_REL_SURFACE;
+			params->instp_mask &= ~I915_EXEC_CONSTANTS_REL_SURFACE;
 		}
 		break;
 	default:
-		DRM_DEBUG("execbuf with unknown constants: %d\n", instp_mode);
+		DRM_DEBUG("execbuf with unknown constants: %d\n", params->instp_mode);
 		return -EINVAL;
 	}
 
@@ -888,7 +884,33 @@ int intel_execlists_submission(struct i915_execbuffer_params *params,
 
 	i915_gem_execbuffer_move_to_active(vmas, params->request);
 
-	/* To be split into two functions here... */
+	ret = dev_priv->gt.execbuf_final(params);
+	if (ret)
+		return ret;
+
+	/*
+	 * Free everything that was stored in the QE structure (until the
+	 * scheduler arrives and does it instead):
+	 */
+	if (params->dispatch_flags & I915_DISPATCH_SECURE)
+		i915_gem_execbuff_release_batch_obj(params->batch_obj);
+
+	return 0;
+}
+
+/*
+ * This is the main function for adding a batch to the ring.
+ * It is called from the scheduler, with the struct_mutex already held.
+ */
+int intel_execlists_submission_final(struct i915_execbuffer_params *params)
+{
+	struct drm_i915_private *dev_priv = params->dev->dev_private;
+	struct intel_engine_cs  *ring = params->ring;
+	u64 exec_start;
+	int ret;
+
+	/* The mutex must be acquired before calling this function */
+	BUG_ON(!mutex_is_locked(&params->dev->struct_mutex));
 
 	/*
 	 * Unconditionally invalidate gpu caches and ensure that we do flush
@@ -899,7 +921,9 @@ int intel_execlists_submission(struct i915_execbuffer_params *params,
 		return ret;
 
 	if (ring == &dev_priv->ring[RCS] &&
-	    instp_mode != dev_priv->relative_constants_mode) {
+	    params->instp_mode != dev_priv->relative_constants_mode) {
+		struct intel_ringbuffer *ringbuf = params->request->ringbuf;
+
 		ret = intel_logical_ring_begin(params->request, 4);
 		if (ret)
 			return ret;
@@ -907,14 +931,14 @@ int intel_execlists_submission(struct i915_execbuffer_params *params,
 		intel_logical_ring_emit(ringbuf, MI_NOOP);
 		intel_logical_ring_emit(ringbuf, MI_LOAD_REGISTER_IMM(1));
 		intel_logical_ring_emit(ringbuf, INSTPM);
-		intel_logical_ring_emit(ringbuf, instp_mask << 16 | instp_mode);
+		intel_logical_ring_emit(ringbuf, params->instp_mask << 16 | params->instp_mode);
 		intel_logical_ring_advance(ringbuf);
 
-		dev_priv->relative_constants_mode = instp_mode;
+		dev_priv->relative_constants_mode = params->instp_mode;
 	}
 
 	exec_start = params->batch_obj_vm_offset +
-		     args->batch_start_offset;
+		     params->args_batch_start_offset;
 
 	ret = ring->emit_bb_start(params->request, exec_start, params->dispatch_flags);
 	if (ret)
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 64f89f99..07d402b 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -81,6 +81,7 @@ struct i915_execbuffer_params;
 int intel_execlists_submission(struct i915_execbuffer_params *params,
 			       struct drm_i915_gem_execbuffer2 *args,
 			       struct list_head *vmas);
+int intel_execlists_submission_final(struct i915_execbuffer_params *params);
 u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj);
 
 void intel_lrc_irq_handler(struct intel_engine_cs *ring);
-- 
1.9.1


* [RFC 06/39] drm/i915: Re-instate request->uniq because it is extremely useful
  2015-07-17 14:33 [RFC 00/39] GPU scheduler for i915 driver John.C.Harrison
                   ` (4 preceding siblings ...)
  2015-07-17 14:33 ` [RFC 05/39] drm/i915: Split i915_gem_do_execbuffer() in half John.C.Harrison
@ 2015-07-17 14:33 ` John.C.Harrison
  2015-07-17 14:33 ` [RFC 07/39] drm/i915: Start of GPU scheduler John.C.Harrison
                   ` (33 subsequent siblings)
  39 siblings, 0 replies; 48+ messages in thread
From: John.C.Harrison @ 2015-07-17 14:33 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The seqno value cannot always be used when debugging issues via trace
points. This is because it can be reset back to the start, especially
during TDR-type tests. Also, when the scheduler arrives the seqno is
only valid while a given request is executing on the hardware. While
the request is simply queued waiting for submission, its seqno value
will be zero (meaning invalid).

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h   |  5 +++++
 drivers/gpu/drm/i915/i915_gem.c   |  3 ++-
 drivers/gpu/drm/i915/i915_trace.h | 25 +++++++++++++++++--------
 3 files changed, 24 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 28d51ac..a680778 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1945,6 +1945,8 @@ struct drm_i915_private {
 
 	bool edp_low_vswing;
 
+	uint32_t request_uniq;
+
 	/*
 	 * NOTE: This is the dri1/ums dungeon, don't add stuff here. Your patch
 	 * will be rejected. Instead look for a better place.
@@ -2186,6 +2188,9 @@ struct drm_i915_gem_request {
 	/** GEM sequence number associated with this request. */
 	uint32_t seqno;
 
+	/* Unique identifier which can be used for trace points & debug */
+	uint32_t uniq;
+
 	/** Position in the ringbuffer of the start of the request */
 	u32 head;
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 2a5667b..0c407ae 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2828,7 +2828,7 @@ static void i915_fence_value_str(struct fence *fence, char *str, int size)
 
 	req = container_of(fence, typeof(*req), fence);
 
-	snprintf(str, size, "%d [%d]", req->fence.seqno, req->seqno);
+	snprintf(str, size, "%d [%d:%d]", req->fence.seqno, req->uniq, req->seqno);
 }
 
 static const struct fence_ops i915_gem_request_fops = {
@@ -2974,6 +2974,7 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 
 	req->i915 = dev_priv;
 	req->ring = ring;
+	req->uniq = dev_priv->request_uniq++;
 	req->ctx  = ctx;
 	i915_gem_context_reference(req->ctx);
 
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index f455194..796c630 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -433,6 +433,7 @@ TRACE_EVENT(i915_gem_ring_sync_to,
 			     __field(u32, dev)
 			     __field(u32, sync_from)
 			     __field(u32, sync_to)
+			     __field(u32, uniq_to)
 			     __field(u32, seqno)
 			     ),
 
@@ -440,13 +441,14 @@ TRACE_EVENT(i915_gem_ring_sync_to,
 			   __entry->dev = from->dev->primary->index;
 			   __entry->sync_from = from->id;
 			   __entry->sync_to = to_req->ring->id;
+			   __entry->uniq_to = to_req->uniq;
 			   __entry->seqno = i915_gem_request_get_seqno(req);
 			   ),
 
-	    TP_printk("dev=%u, sync-from=%u, sync-to=%u, seqno=%u",
+	    TP_printk("dev=%u, sync-from=%u, sync-to=%u, seqno=%u, to_uniq=%u",
 		      __entry->dev,
 		      __entry->sync_from, __entry->sync_to,
-		      __entry->seqno)
+		      __entry->seqno, __entry->uniq_to)
 );
 
 TRACE_EVENT(i915_gem_ring_dispatch,
@@ -481,6 +483,7 @@ TRACE_EVENT(i915_gem_ring_flush,
 	    TP_STRUCT__entry(
 			     __field(u32, dev)
 			     __field(u32, ring)
+			     __field(u32, uniq)
 			     __field(u32, invalidate)
 			     __field(u32, flush)
 			     ),
@@ -488,12 +491,13 @@ TRACE_EVENT(i915_gem_ring_flush,
 	    TP_fast_assign(
 			   __entry->dev = req->ring->dev->primary->index;
 			   __entry->ring = req->ring->id;
+			   __entry->uniq = req->uniq;
 			   __entry->invalidate = invalidate;
 			   __entry->flush = flush;
 			   ),
 
-	    TP_printk("dev=%u, ring=%x, invalidate=%04x, flush=%04x",
-		      __entry->dev, __entry->ring,
+	    TP_printk("dev=%u, ring=%x, request=%u, invalidate=%04x, flush=%04x",
+		      __entry->dev, __entry->ring, __entry->uniq,
 		      __entry->invalidate, __entry->flush)
 );
 
@@ -504,6 +508,7 @@ DECLARE_EVENT_CLASS(i915_gem_request,
 	    TP_STRUCT__entry(
 			     __field(u32, dev)
 			     __field(u32, ring)
+			     __field(u32, uniq)
 			     __field(u32, seqno)
 			     ),
 
@@ -512,11 +517,13 @@ DECLARE_EVENT_CLASS(i915_gem_request,
 						i915_gem_request_get_ring(req);
 			   __entry->dev = ring->dev->primary->index;
 			   __entry->ring = ring->id;
+			   __entry->uniq = req ? req->uniq : 0;
 			   __entry->seqno = i915_gem_request_get_seqno(req);
 			   ),
 
-	    TP_printk("dev=%u, ring=%u, seqno=%u",
-		      __entry->dev, __entry->ring, __entry->seqno)
+	    TP_printk("dev=%u, ring=%u, uniq=%u, seqno=%u",
+		      __entry->dev, __entry->ring, __entry->uniq,
+		      __entry->seqno)
 );
 
 DEFINE_EVENT(i915_gem_request, i915_gem_request_add,
@@ -564,6 +571,7 @@ TRACE_EVENT(i915_gem_request_wait_begin,
 	    TP_STRUCT__entry(
 			     __field(u32, dev)
 			     __field(u32, ring)
+			     __field(u32, uniq)
 			     __field(u32, seqno)
 			     __field(bool, blocking)
 			     ),
@@ -579,13 +587,14 @@ TRACE_EVENT(i915_gem_request_wait_begin,
 						i915_gem_request_get_ring(req);
 			   __entry->dev = ring->dev->primary->index;
 			   __entry->ring = ring->id;
+			   __entry->uniq = req ? req->uniq : 0;
 			   __entry->seqno = i915_gem_request_get_seqno(req);
 			   __entry->blocking =
 				     mutex_is_locked(&ring->dev->struct_mutex);
 			   ),
 
-	    TP_printk("dev=%u, ring=%u, seqno=%u, blocking=%s",
-		      __entry->dev, __entry->ring,
+	    TP_printk("dev=%u, ring=%u, uniq=%u, seqno=%u, blocking=%s",
+		      __entry->dev, __entry->ring, __entry->uniq,
 		      __entry->seqno, __entry->blocking ?  "yes (NB)" : "no")
 );
 
-- 
1.9.1


* [RFC 07/39] drm/i915: Start of GPU scheduler
  2015-07-17 14:33 [RFC 00/39] GPU scheduler for i915 driver John.C.Harrison
                   ` (5 preceding siblings ...)
  2015-07-17 14:33 ` [RFC 06/39] drm/i915: Re-instate request->uniq because it is extremely useful John.C.Harrison
@ 2015-07-17 14:33 ` John.C.Harrison
  2015-07-21  9:40   ` Daniel Vetter
  2015-07-17 14:33 ` [RFC 08/39] drm/i915: Prepare retire_requests to handle out-of-order seqnos John.C.Harrison
                   ` (32 subsequent siblings)
  39 siblings, 1 reply; 48+ messages in thread
From: John.C.Harrison @ 2015-07-17 14:33 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

Initial creation of scheduler source files. Note that this patch implements most
of the scheduler functionality but does not hook it into the driver yet. It
also leaves the scheduler code in 'pass through' mode so that even when it is
hooked in, it will not actually do very much. This allows the hooks to be added
one at a time in bite-sized chunks, and only when the scheduler is finally enabled
at the end does anything start happening.

The general theory of operation is that when batch buffers are submitted to the
driver, the execbuffer() code assigns a unique request and then packages up all
the information required to execute the batch buffer at a later time. This
package is given over to the scheduler which adds it to an internal node list.
The scheduler also scans the list of objects associated with the batch buffer
and compares them against the objects already in use by other buffers in the
node list. If matches are found then the new batch buffer node is marked as
being dependent upon the matching node. The same is done for the context object.
The scheduler also bumps up the priority of such matching nodes on the grounds
that the more dependencies a given batch buffer has, the more important it is
likely to be.

The scheduler aims to have a given (tuneable) number of batch buffers in flight
on the hardware at any given time. If fewer than this are currently executing
when a new node is queued, then the node is passed straight through to the
submit function. Otherwise it is simply added to the queue and the driver
returns back to user land.

As each batch buffer completes, it raises an interrupt which wakes up the
scheduler. Note that it is possible for multiple buffers to complete before the
IRQ handler gets to run. Further, it is possible for the seqno values to be
out of order (particularly once pre-emption is enabled). However, the scheduler
keeps the list of executing buffers in order of hardware submission. Thus it can
scan through the list until a matching seqno is found and then mark all in
flight nodes from that point on as completed.

A deferred work queue is also poked by the interrupt handler. When this wakes up
it can do more involved processing such as actually removing completed nodes
from the queue and freeing up the resources associated with them (internal
memory allocations, DRM object references, context reference, etc.). The work
handler also checks the in flight count and calls the submission code if a new
slot has appeared.

When the scheduler's submit code is called, it scans the queued node list for
the highest priority node that has no unmet dependencies. Note that the
dependency calculation is complex as it must take inter-ring dependencies and
potential preemptions into account. Note also that in the future this will be
extended to include external dependencies such as the Android Native Sync file
descriptors and/or the Linux dma-buf synchronisation scheme.

If a suitable node is found then it is sent to execbuff_final() for submission
to the hardware. The in flight count is then re-checked and a new node popped
from the list if appropriate.

Note that this patch does not implement pre-emptive scheduling. Only basic
scheduling by re-ordering batch buffer submission is currently implemented.

Change-Id: I1e08f59e650a3c2bbaaa9de7627da33849b06106
For: VIZ-1587
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/Makefile         |   1 +
 drivers/gpu/drm/i915/i915_drv.h       |   4 +
 drivers/gpu/drm/i915/i915_gem.c       |   5 +
 drivers/gpu/drm/i915/i915_scheduler.c | 776 ++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_scheduler.h |  91 ++++
 5 files changed, 877 insertions(+)
 create mode 100644 drivers/gpu/drm/i915/i915_scheduler.c
 create mode 100644 drivers/gpu/drm/i915/i915_scheduler.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 47a74114..c367b39 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -9,6 +9,7 @@ ccflags-y := -Werror
 # core driver code
 i915-y := i915_drv.o \
 	  i915_params.o \
+	  i915_scheduler.o \
           i915_suspend.o \
 	  i915_sysfs.o \
 	  intel_pm.o \
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index a680778..7d2a494 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1700,6 +1700,8 @@ struct i915_execbuffer_params {
 	struct drm_i915_gem_request     *request;
 };
 
+struct i915_scheduler;
+
 struct drm_i915_private {
 	struct drm_device *dev;
 	struct kmem_cache *objects;
@@ -1932,6 +1934,8 @@ struct drm_i915_private {
 
 	struct i915_runtime_pm pm;
 
+	struct i915_scheduler *scheduler;
+
 	/* Abstract the submission mechanism (legacy ringbuffer or execlists) away */
 	struct {
 		int (*execbuf_submit)(struct i915_execbuffer_params *params,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 0c407ae..3fbc6ec 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -40,6 +40,7 @@
 #ifdef CONFIG_SYNC
 #include <../drivers/staging/android/sync.h>
 #endif
+#include "i915_scheduler.h"
 
 #define RQ_BUG_ON(expr)
 
@@ -5398,6 +5399,10 @@ i915_gem_init_hw(struct drm_device *dev)
 
 	i915_gem_init_swizzling(dev);
 
+	ret = i915_scheduler_init(dev);
+	if (ret)
+		return ret;
+
 	/*
 	 * At least 830 can leave some of the unused rings
 	 * "active" (ie. head != tail) after resume which
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
new file mode 100644
index 0000000..71d8df7
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -0,0 +1,776 @@
+/*
+ * Copyright (c) 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#include "i915_drv.h"
+#include "intel_drv.h"
+#include "i915_scheduler.h"
+
+static int         i915_scheduler_fly_node(struct i915_scheduler_queue_entry *node);
+static int         i915_scheduler_remove_dependent(struct i915_scheduler *scheduler,
+						   struct i915_scheduler_queue_entry *remove);
+static int         i915_scheduler_submit(struct intel_engine_cs *ring,
+					 bool is_locked);
+static uint32_t    i915_scheduler_count_flying(struct i915_scheduler *scheduler,
+					       struct intel_engine_cs *ring);
+static void        i915_scheduler_priority_bump_clear(struct i915_scheduler *scheduler);
+static int         i915_scheduler_priority_bump(struct i915_scheduler *scheduler,
+						struct i915_scheduler_queue_entry *target,
+						uint32_t bump);
+
+int i915_scheduler_init(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+	int                     r;
+
+	if (scheduler)
+		return 0;
+
+	scheduler = kzalloc(sizeof(*scheduler), GFP_KERNEL);
+	if (!scheduler)
+		return -ENOMEM;
+
+	spin_lock_init(&scheduler->lock);
+
+	for (r = 0; r < I915_NUM_RINGS; r++)
+		INIT_LIST_HEAD(&scheduler->node_queue[r]);
+
+	scheduler->index = 1;
+
+	/* Default tuning values: */
+	scheduler->priority_level_max     = ~0U;
+	scheduler->priority_level_preempt = 900;
+	scheduler->min_flying             = 2;
+
+	dev_priv->scheduler = scheduler;
+
+	return 0;
+}
+
+int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
+{
+	struct drm_i915_private *dev_priv = qe->params.dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+	struct intel_engine_cs  *ring = qe->params.ring;
+	struct i915_scheduler_queue_entry  *node;
+	struct i915_scheduler_queue_entry  *test;
+	struct timespec     stamp;
+	unsigned long       flags;
+	bool                not_flying, found;
+	int                 i, j, r;
+	int                 incomplete = 0;
+
+	BUG_ON(!scheduler);
+
+	if (1/*i915.scheduler_override & i915_so_direct_submit*/) {
+		int ret;
+
+		qe->scheduler_index = scheduler->index++;
+
+		scheduler->flags[qe->params.ring->id] |= i915_sf_submitting;
+		ret = dev_priv->gt.execbuf_final(&qe->params);
+		scheduler->flags[qe->params.ring->id] &= ~i915_sf_submitting;
+
+		/*
+		 * Don't do any clean up on failure because the caller will
+		 * do it all anyway.
+		 */
+		if (ret)
+			return ret;
+
+		/* Free everything that is owned by the QE structure: */
+		kfree(qe->params.cliprects);
+		if (qe->params.dispatch_flags & I915_DISPATCH_SECURE)
+			i915_gem_execbuff_release_batch_obj(qe->params.batch_obj);
+
+		return 0;
+	}
+
+	getrawmonotonic(&stamp);
+
+	node = kmalloc(sizeof(*node), GFP_KERNEL);
+	if (!node)
+		return -ENOMEM;
+
+	*node = *qe;
+	INIT_LIST_HEAD(&node->link);
+	node->status = i915_sqs_queued;
+	node->stamp  = stamp;
+	i915_gem_request_reference(node->params.request);
+
+	/* Need to determine the number of incomplete entries in the list as
+	 * that will be the maximum size of the dependency list.
+	 *
+	 * Note that the allocation must not be made with the spinlock acquired
+	 * as kmalloc can sleep. However, the unlock/relock is safe because no
+	 * new entries can be queued up during the unlock as the i915 driver
+	 * mutex is still held. Entries could be removed from the list but that
+	 * just means the dep_list will be over-allocated which is fine.
+	 */
+	spin_lock_irqsave(&scheduler->lock, flags);
+	for (r = 0; r < I915_NUM_RINGS; r++) {
+		list_for_each_entry(test, &scheduler->node_queue[r], link) {
+			if (I915_SQS_IS_COMPLETE(test))
+				continue;
+
+			incomplete++;
+		}
+	}
+
+	/* Temporarily unlock to allocate memory: */
+	spin_unlock_irqrestore(&scheduler->lock, flags);
+	if (incomplete) {
+		node->dep_list = kmalloc(sizeof(node->dep_list[0]) * incomplete,
+					 GFP_KERNEL);
+		if (!node->dep_list) {
+			kfree(node);
+			return -ENOMEM;
+		}
+	} else
+		node->dep_list = NULL;
+
+	spin_lock_irqsave(&scheduler->lock, flags);
+	node->num_deps = 0;
+
+	if (node->dep_list) {
+		for (r = 0; r < I915_NUM_RINGS; r++) {
+			list_for_each_entry(test, &scheduler->node_queue[r], link) {
+				if (I915_SQS_IS_COMPLETE(test))
+					continue;
+
+				/*
+				 * Batches on the same ring for the same
+				 * context must be kept in order.
+				 */
+				found = (node->params.ctx == test->params.ctx) &&
+					(node->params.ring == test->params.ring);
+
+				/*
+				 * Batches working on the same objects must
+				 * be kept in order.
+				 */
+				for (i = 0; (i < node->num_objs) && !found; i++) {
+					for (j = 0; j < test->num_objs; j++) {
+						if (node->saved_objects[i].obj !=
+							    test->saved_objects[j].obj)
+							continue;
+
+						found = true;
+						break;
+					}
+				}
+
+				if (found) {
+					node->dep_list[node->num_deps] = test;
+					node->num_deps++;
+				}
+			}
+		}
+
+		BUG_ON(node->num_deps > incomplete);
+	}
+
+	if (node->priority && node->num_deps) {
+		i915_scheduler_priority_bump_clear(scheduler);
+
+		for (i = 0; i < node->num_deps; i++)
+			i915_scheduler_priority_bump(scheduler,
+					node->dep_list[i], node->priority);
+	}
+
+	node->scheduler_index = scheduler->index++;
+
+	list_add_tail(&node->link, &scheduler->node_queue[ring->id]);
+
+	not_flying = i915_scheduler_count_flying(scheduler, ring) <
+						 scheduler->min_flying;
+
+	spin_unlock_irqrestore(&scheduler->lock, flags);
+
+	if (not_flying)
+		i915_scheduler_submit(ring, true);
+
+	return 0;
+}
+
+static int i915_scheduler_fly_node(struct i915_scheduler_queue_entry *node)
+{
+	struct drm_i915_private *dev_priv = node->params.dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+	struct intel_engine_cs  *ring;
+
+	BUG_ON(!scheduler);
+	BUG_ON(!node);
+	BUG_ON(node->status != i915_sqs_popped);
+
+	ring = node->params.ring;
+
+	/* Add the node (which should currently be in the popped state) to the
+	 * front of the queue. This ensures that flying nodes are always held in
+	 * hardware submission order. */
+	list_add(&node->link, &scheduler->node_queue[ring->id]);
+
+	node->status = i915_sqs_flying;
+
+	if (!(scheduler->flags[ring->id] & i915_sf_interrupts_enabled)) {
+		bool    success = true;
+
+		success = ring->irq_get(ring);
+		if (success)
+			scheduler->flags[ring->id] |= i915_sf_interrupts_enabled;
+		else
+			return -EINVAL;
+	}
+
+	return 0;
+}
+
+/*
+ * Nodes are considered valid dependencies if they are queued on any ring or
+ * if they are in flight on a different ring. In flight on the same ring is no
+ * longer interesting for non-pre-emptive nodes as the ring serialises execution.
+ * For pre-empting nodes, all in flight dependencies are valid as they must not
+ * be jumped by the act of pre-empting.
+ *
+ * Anything that is neither queued nor flying is uninteresting.
+ */
+static inline bool i915_scheduler_is_dependency_valid(
+			struct i915_scheduler_queue_entry *node, uint32_t idx)
+{
+	struct i915_scheduler_queue_entry *dep;
+
+	dep = node->dep_list[idx];
+	if (!dep)
+		return false;
+
+	if (I915_SQS_IS_QUEUED(dep))
+		return true;
+
+	if (I915_SQS_IS_FLYING(dep)) {
+		if (node->params.ring != dep->params.ring)
+			return true;
+	}
+
+	return false;
+}
+
+static uint32_t i915_scheduler_count_flying(struct i915_scheduler *scheduler,
+					    struct intel_engine_cs *ring)
+{
+	struct i915_scheduler_queue_entry *node;
+	uint32_t                          flying = 0;
+
+	list_for_each_entry(node, &scheduler->node_queue[ring->id], link)
+		if (I915_SQS_IS_FLYING(node))
+			flying++;
+
+	return flying;
+}
+
+/* Add a popped node back in to the queue. For example, because the ring was
+ * hung when execfinal() was called and thus the ring submission needs to be
+ * retried later. */
+static void i915_scheduler_node_requeue(struct i915_scheduler_queue_entry *node)
+{
+	BUG_ON(!node);
+	BUG_ON(!I915_SQS_IS_FLYING(node));
+
+	node->status = i915_sqs_queued;
+}
+
+/* Give up on a popped node completely. For example, because it is causing the
+ * ring to hang or is using some resource that no longer exists. */
+static void i915_scheduler_node_kill(struct i915_scheduler_queue_entry *node)
+{
+	BUG_ON(!node);
+	BUG_ON(!I915_SQS_IS_FLYING(node));
+
+	node->status = i915_sqs_dead;
+}
+
+/*
+ * The batch tagged with the indicated sequence number has completed.
+ * Search the queue for it, update its status and those of any batches
+ * submitted earlier, which must also have completed or been pre-empted
+ * as appropriate.
+ *
+ * Called with spinlock already held.
+ */
+static void i915_scheduler_seqno_complete(struct intel_engine_cs *ring, uint32_t seqno)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+	struct i915_scheduler_queue_entry *node;
+	bool got_changes = false;
+
+	/*
+	 * Batch buffers are added to the head of the list in execution order,
+	 * thus seqno values, although not necessarily incrementing, will be
+	 * met in completion order when scanning the list. So when a match is
+	 * found, all subsequent entries must have also popped out. Conversely,
+	 * if a completed entry is found then there is no need to scan further.
+	 */
+	list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
+		if (I915_SQS_IS_COMPLETE(node))
+			return;
+
+		if (seqno == node->params.request->seqno)
+			break;
+	}
+
+	/*
+	 * NB: Lots of extra seqnos get added to the ring to track things
+	 * like cache flushes and page flips. So don't complain if
+	 * no node was found.
+	 */
+	if (&node->link == &scheduler->node_queue[ring->id])
+		return;
+
+	WARN_ON(!I915_SQS_IS_FLYING(node));
+
+	/* Everything from here can be marked as done: */
+	list_for_each_entry_from(node, &scheduler->node_queue[ring->id], link) {
+		/* Check if the marking has already been done: */
+		if (I915_SQS_IS_COMPLETE(node))
+			break;
+
+		if (!I915_SQS_IS_FLYING(node))
+			continue;
+
+		/* Node was in flight so mark it as complete. */
+		node->status = i915_sqs_complete;
+		got_changes = true;
+	}
+
+	/* Should submit new work here if flight list is empty but the DRM
+	 * mutex lock might not be available if a '__wait_request()' call is
+	 * blocking the system. */
+}
+
+int i915_scheduler_handle_irq(struct intel_engine_cs *ring)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+	unsigned long       flags;
+	uint32_t            seqno;
+
+	seqno = ring->get_seqno(ring, false);
+
+	if (1/*i915.scheduler_override & i915_so_direct_submit*/)
+		return 0;
+
+	if (seqno == scheduler->last_irq_seqno[ring->id]) {
+		/* Why are there sometimes multiple interrupts per seqno? */
+		return 0;
+	}
+	scheduler->last_irq_seqno[ring->id] = seqno;
+
+	spin_lock_irqsave(&scheduler->lock, flags);
+	i915_scheduler_seqno_complete(ring, seqno);
+	spin_unlock_irqrestore(&scheduler->lock, flags);
+
+	/* XXX: Need to also call i915_scheduler_remove() via work handler. */
+
+	return 0;
+}
+
+int i915_scheduler_remove(struct intel_engine_cs *ring)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+	struct i915_scheduler_queue_entry  *node, *node_next;
+	unsigned long       flags;
+	int                 flying = 0, queued = 0;
+	int                 ret = 0;
+	bool                do_submit;
+	uint32_t            min_seqno;
+	struct list_head    remove;
+
+	if (list_empty(&scheduler->node_queue[ring->id]))
+		return 0;
+
+	spin_lock_irqsave(&scheduler->lock, flags);
+
+	/* /i915_scheduler_dump_locked(ring, "remove/pre");/ */
+
+	/*
+	 * In the case where the system is idle, starting 'min_seqno' from a big
+	 * number will cause all nodes to be removed as they are now back to
+	 * being in-order. However, this will be a problem if the last one to
+	 * complete was actually out-of-order as the ring seqno value will be
+	 * lower than one or more completed buffers. Thus code looking for the
+	 * completion of said buffers will wait forever.
+	 * Instead, use the hardware seqno as the starting point. This means
+	 * that some buffers might be kept around even in a completely idle
+	 * system but it should guarantee that no-one ever gets confused when
+	 * waiting for buffer completion.
+	 */
+	min_seqno = ring->get_seqno(ring, true);
+
+	list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
+		if (I915_SQS_IS_QUEUED(node))
+			queued++;
+		else if (I915_SQS_IS_FLYING(node))
+			flying++;
+		else if (I915_SQS_IS_COMPLETE(node))
+			continue;
+
+		if (node->params.request->seqno == 0)
+			continue;
+
+		if (!i915_seqno_passed(node->params.request->seqno, min_seqno))
+			min_seqno = node->params.request->seqno;
+	}
+
+	INIT_LIST_HEAD(&remove);
+	list_for_each_entry_safe(node, node_next, &scheduler->node_queue[ring->id], link) {
+		/*
+		 * Only remove completed nodes which have a lower seqno than
+		 * all pending nodes. While there is the possibility of the
+		 * ring's seqno counting backwards, all higher buffers must
+		 * be remembered so that the 'i915_seqno_passed()' test can
+		 * report that they have in fact passed.
+		 *
+		 * NB: This is not true for 'dead' nodes. The GPU reset causes
+		 * the software seqno to restart from its initial value. Thus
+		 * the dead nodes must be removed even though their seqno values
+		 * are potentially vastly greater than the current ring seqno.
+		 */
+		if (!I915_SQS_IS_COMPLETE(node))
+			continue;
+
+		if (node->status != i915_sqs_dead) {
+			if (i915_seqno_passed(node->params.request->seqno, min_seqno) &&
+			    (node->params.request->seqno != min_seqno))
+				continue;
+		}
+
+		list_del(&node->link);
+		list_add(&node->link, &remove);
+
+		/* Strip the dependency info while the mutex is still locked */
+		i915_scheduler_remove_dependent(scheduler, node);
+
+		continue;
+	}
+
+	/*
+	 * No idea why but this seems to cause problems occasionally.
+	 * Note that the 'irq_put' code is internally reference counted
+	 * and spin_locked so it should be safe to call.
+	 */
+	/*if ((scheduler->flags[ring->id] & i915_sf_interrupts_enabled) &&
+	    (first_flight[ring->id] == NULL)) {
+		ring->irq_put(ring);
+		scheduler->flags[ring->id] &= ~i915_sf_interrupts_enabled;
+	}*/
+
+	/* Launch more packets now? */
+	do_submit = (queued > 0) && (flying < scheduler->min_flying);
+
+	spin_unlock_irqrestore(&scheduler->lock, flags);
+
+	if (do_submit)
+		ret = i915_scheduler_submit(ring, true);
+
+	while (!list_empty(&remove)) {
+		node = list_first_entry(&remove, typeof(*node), link);
+		list_del(&node->link);
+
+		/* The batch buffer must be unpinned before it is unreferenced
+		 * otherwise the unpin fails with a missing vma!? */
+		if (node->params.dispatch_flags & I915_DISPATCH_SECURE)
+			i915_gem_execbuff_release_batch_obj(node->params.batch_obj);
+
+		/* Free everything that is owned by the node: */
+		i915_gem_request_unreference(node->params.request);
+		kfree(node->params.cliprects);
+		kfree(node->dep_list);
+		kfree(node);
+	}
+
+	return ret;
+}
+
+static void i915_scheduler_priority_bump_clear(struct i915_scheduler *scheduler)
+{
+	struct i915_scheduler_queue_entry *node;
+	int i;
+
+	/*
+	 * Ensure circular dependencies don't cause problems and that a bump
+	 * by object usage only bumps each using buffer once:
+	 */
+	for (i = 0; i < I915_NUM_RINGS; i++) {
+		list_for_each_entry(node, &scheduler->node_queue[i], link)
+			node->bumped = false;
+	}
+}
+
+static int i915_scheduler_priority_bump(struct i915_scheduler *scheduler,
+					struct i915_scheduler_queue_entry *target,
+					uint32_t bump)
+{
+	uint32_t new_priority;
+	int      i, count;
+
+	if (target->priority >= scheduler->priority_level_max)
+		return 1;
+
+	if (target->bumped)
+		return 0;
+
+	new_priority = target->priority + bump;
+	if ((new_priority <= target->priority) ||
+	    (new_priority > scheduler->priority_level_max))
+		target->priority = scheduler->priority_level_max;
+	else
+		target->priority = new_priority;
+
+	count = 1;
+	target->bumped = true;
+
+	for (i = 0; i < target->num_deps; i++) {
+		if (!target->dep_list[i])
+			continue;
+
+		if (target->dep_list[i]->bumped)
+			continue;
+
+		count += i915_scheduler_priority_bump(scheduler,
+						      target->dep_list[i],
+						      bump);
+	}
+
+	return count;
+}
+
+static int i915_scheduler_pop_from_queue_locked(struct intel_engine_cs *ring,
+				    struct i915_scheduler_queue_entry **pop_node,
+				    unsigned long *flags)
+{
+	struct drm_i915_private            *dev_priv = ring->dev->dev_private;
+	struct i915_scheduler              *scheduler = dev_priv->scheduler;
+	struct i915_scheduler_queue_entry  *best;
+	struct i915_scheduler_queue_entry  *node;
+	int     ret;
+	int     i;
+	bool	any_queued;
+	bool	has_local, has_remote, only_remote;
+
+	*pop_node = NULL;
+	ret = -ENODATA;
+
+	any_queued = false;
+	only_remote = false;
+	best = NULL;
+
+	list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
+		if (!I915_SQS_IS_QUEUED(node))
+			continue;
+		any_queued = true;
+
+		has_local  = false;
+		has_remote = false;
+		for (i = 0; i < node->num_deps; i++) {
+			if (!i915_scheduler_is_dependency_valid(node, i))
+				continue;
+
+			if (node->dep_list[i]->params.ring == node->params.ring)
+				has_local = true;
+			else
+				has_remote = true;
+		}
+
+		if (has_remote && !has_local)
+			only_remote = true;
+
+		if (!has_local && !has_remote) {
+			if (!best ||
+			    (node->priority > best->priority))
+				best = node;
+		}
+	}
+
+	if (best) {
+		list_del(&best->link);
+
+		INIT_LIST_HEAD(&best->link);
+		best->status  = i915_sqs_popped;
+
+		ret = 0;
+	} else {
+		/* Can only get here if:
+		 * (a) there are no buffers in the queue
+		 * (b) all queued buffers are dependent on other buffers
+		 *     e.g. on a buffer that is in flight on a different ring
+		 */
+		if (only_remote) {
+			/* The only dependent buffers are on another ring. */
+			ret = -EAGAIN;
+		} else if (any_queued) {
+			/* It seems that something has gone horribly wrong! */
+			DRM_ERROR("Broken dependency tracking on ring %d!\n",
+				  (int) ring->id);
+		}
+	}
+
+	/* i915_scheduler_dump_queue_pop(ring, best); */
+
+	*pop_node = best;
+	return ret;
+}
+
+static int i915_scheduler_submit(struct intel_engine_cs *ring, bool was_locked)
+{
+	struct drm_device   *dev = ring->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+	struct i915_scheduler_queue_entry  *node;
+	unsigned long       flags;
+	int                 ret = 0, count = 0;
+
+	if (!was_locked) {
+		ret = i915_mutex_lock_interruptible(dev);
+		if (ret)
+			return ret;
+	}
+
+	BUG_ON(!mutex_is_locked(&dev->struct_mutex));
+
+	spin_lock_irqsave(&scheduler->lock, flags);
+
+	/* First time around, complain if anything unexpected occurs: */
+	ret = i915_scheduler_pop_from_queue_locked(ring, &node, &flags);
+	if (ret) {
+		spin_unlock_irqrestore(&scheduler->lock, flags);
+
+		if (!was_locked)
+			mutex_unlock(&dev->struct_mutex);
+
+		return ret;
+	}
+
+	do {
+		BUG_ON(!node);
+		BUG_ON(node->params.ring != ring);
+		BUG_ON(node->status != i915_sqs_popped);
+		count++;
+
+		/* The call to pop above will have removed the node from the
+		 * list. So add it back in and mark it as in flight. */
+		i915_scheduler_fly_node(node);
+
+		scheduler->flags[ring->id] |= i915_sf_submitting;
+		spin_unlock_irqrestore(&scheduler->lock, flags);
+		ret = dev_priv->gt.execbuf_final(&node->params);
+		spin_lock_irqsave(&scheduler->lock, flags);
+		scheduler->flags[ring->id] &= ~i915_sf_submitting;
+
+		if (ret) {
+			bool requeue = true;
+
+			/* Oh dear! Either the node is broken or the ring is
+			 * busy. So need to kill the node or requeue it and try
+			 * again later as appropriate. */
+
+			switch (-ret) {
+			case ENODEV:
+			case ENOENT:
+				/* Fatal errors. Kill the node. */
+				requeue = false;
+			break;
+
+			case EAGAIN:
+			case EBUSY:
+			case EIO:
+			case ENOMEM:
+			case ERESTARTSYS:
+			case EINTR:
+				/* Supposedly recoverable errors. */
+			break;
+
+			default:
+				DRM_DEBUG_DRIVER("<%s> Got unexpected error from execfinal(): %d!\n",
+						 ring->name, ret);
+				/* Assume it is recoverable and hope for the best. */
+			break;
+			}
+
+			if (requeue) {
+				i915_scheduler_node_requeue(node);
+				/* No point spinning if the ring is currently
+				 * unavailable so just give up and come back
+				 * later. */
+				break;
+			} else
+				i915_scheduler_node_kill(node);
+		}
+
+		/* Keep launching until the sky is sufficiently full. */
+		if (i915_scheduler_count_flying(scheduler, ring) >=
+						scheduler->min_flying)
+			break;
+
+		ret = i915_scheduler_pop_from_queue_locked(ring, &node, &flags);
+	} while (ret == 0);
+
+	spin_unlock_irqrestore(&scheduler->lock, flags);
+
+	if (!was_locked)
+		mutex_unlock(&dev->struct_mutex);
+
+	/* Don't complain about not being able to submit extra entries */
+	if (ret == -ENODATA)
+		ret = 0;
+
+	return (ret < 0) ? ret : count;
+}
+
+static int i915_scheduler_remove_dependent(struct i915_scheduler *scheduler,
+					   struct i915_scheduler_queue_entry *remove)
+{
+	struct i915_scheduler_queue_entry  *node;
+	int     i, r;
+	int     count = 0;
+
+	for (i = 0; i < remove->num_deps; i++)
+		if ((remove->dep_list[i]) &&
+		    (!I915_SQS_IS_COMPLETE(remove->dep_list[i])))
+			count++;
+	BUG_ON(count);
+
+	for (r = 0; r < I915_NUM_RINGS; r++) {
+		list_for_each_entry(node, &scheduler->node_queue[r], link) {
+			for (i = 0; i < node->num_deps; i++) {
+				if (node->dep_list[i] != remove)
+					continue;
+
+				node->dep_list[i] = NULL;
+			}
+		}
+	}
+
+	return 0;
+}
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
new file mode 100644
index 0000000..0c5fc7f
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -0,0 +1,91 @@
+/*
+ * Copyright (c) 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#ifndef _I915_SCHEDULER_H_
+#define _I915_SCHEDULER_H_
+
+enum i915_scheduler_queue_status {
+	/* Limbo: */
+	i915_sqs_none = 0,
+	/* Not yet submitted to hardware: */
+	i915_sqs_queued,
+	/* Popped from queue, ready to fly: */
+	i915_sqs_popped,
+	/* Sent to hardware for processing: */
+	i915_sqs_flying,
+	/* Finished processing on the hardware: */
+	i915_sqs_complete,
+	/* Killed by catastrophic submission failure: */
+	i915_sqs_dead,
+	/* Limit value for use with arrays/loops */
+	i915_sqs_MAX
+};
+
+#define I915_SQS_IS_QUEUED(node)	(((node)->status == i915_sqs_queued))
+#define I915_SQS_IS_FLYING(node)	(((node)->status == i915_sqs_flying))
+#define I915_SQS_IS_COMPLETE(node)	(((node)->status == i915_sqs_complete) || \
+					 ((node)->status == i915_sqs_dead))
+
+struct i915_scheduler_obj_entry {
+	struct drm_i915_gem_object          *obj;
+};
+
+struct i915_scheduler_queue_entry {
+	struct i915_execbuffer_params       params;
+	uint32_t                            priority;
+	struct i915_scheduler_obj_entry     *saved_objects;
+	int                                 num_objs;
+	bool                                bumped;
+	struct i915_scheduler_queue_entry   **dep_list;
+	int                                 num_deps;
+	enum i915_scheduler_queue_status    status;
+	struct timespec                     stamp;
+	struct list_head                    link;
+	uint32_t                            scheduler_index;
+};
+
+struct i915_scheduler {
+	struct list_head    node_queue[I915_NUM_RINGS];
+	uint32_t            flags[I915_NUM_RINGS];
+	spinlock_t          lock;
+	uint32_t            index;
+	uint32_t            last_irq_seqno[I915_NUM_RINGS];
+
+	/* Tuning parameters: */
+	uint32_t            priority_level_max;
+	uint32_t            priority_level_preempt;
+	uint32_t            min_flying;
+};
+
+/* Flag bits for i915_scheduler::flags */
+enum {
+	i915_sf_interrupts_enabled  = (1 << 0),
+	i915_sf_submitting          = (1 << 1),
+};
+
+int         i915_scheduler_init(struct drm_device *dev);
+int         i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe);
+int         i915_scheduler_handle_irq(struct intel_engine_cs *ring);
+
+#endif  /* _I915_SCHEDULER_H_ */
-- 
1.9.1


* [RFC 08/39] drm/i915: Prepare retire_requests to handle out-of-order seqnos
  2015-07-17 14:33 [RFC 00/39] GPU scheduler for i915 driver John.C.Harrison
                   ` (6 preceding siblings ...)
  2015-07-17 14:33 ` [RFC 07/39] drm/i915: Start of GPU scheduler John.C.Harrison
@ 2015-07-17 14:33 ` John.C.Harrison
  2015-07-17 14:33 ` [RFC 09/39] drm/i915: Added scheduler hook into i915_gem_complete_requests_ring() John.C.Harrison
                   ` (31 subsequent siblings)
  39 siblings, 0 replies; 48+ messages in thread
From: John.C.Harrison @ 2015-07-17 14:33 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

A major point of the GPU scheduler is that it re-orders batch buffers after they
have been submitted to the driver. This leads to requests completing out of
order. In turn, this means that the retire processing can no longer assume that
all completed entries are at the front of the list. Rather than attempting to
re-order the request list on a regular basis, it is better to simply scan the
entire list.

There is also a problem with freeing a request before the corresponding
objects have been moved to the inactive list. Thus the requests are now moved
to a temporary list first, then the objects are de-activated and finally the
requests on the temporary list are freed.
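
Abridged from the diff below, the retire loop becomes:

    LIST_HEAD(deferred_request_free);

    /* Completed requests can now be anywhere in the list, so walk it all. */
    list_for_each_entry_safe(req, req_next, &ring->request_list, list) {
        if (!i915_gem_request_completed(req))
            continue;

        list_move_tail(&req->list, &deferred_request_free);
    }

    /* ... de-activate objects, process the delayed free list ... */

    /* It should now be safe to actually free the requests */
    while (!list_empty(&deferred_request_free)) {
        req = list_first_entry(&deferred_request_free,
                               struct drm_i915_gem_request, list);
        i915_gem_request_retire(req);
    }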

Change-Id: I7eb6793581d9d28eb832e0e94c116b7202fa1b26
For: VIZ-1587
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 54 +++++++++++++++++++++++------------------
 1 file changed, 30 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 3fbc6ec..56405cd 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3171,6 +3171,10 @@ void i915_gem_reset(struct drm_device *dev)
 void
 i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 {
+	struct drm_i915_gem_object *obj, *obj_next;
+	struct drm_i915_gem_request *req, *req_next;
+	LIST_HEAD(deferred_request_free);
+
 	WARN_ON(i915_verify_lists(ring->dev));
 
 	/*
@@ -3180,37 +3184,31 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 	 */
 	i915_gem_request_notify(ring);
 
+	/*
+	 * Note that request entries might be out of order due to rescheduling
+	 * and pre-emption. Thus both lists must be processed in their entirety
+	 * rather than stopping at the first non-complete entry.
+	 */
+
 	/* Retire requests first as we use it above for the early return.
 	 * If we retire requests last, we may use a later seqno and so clear
 	 * the requests lists without clearing the active list, leading to
 	 * confusion.
 	 */
-	while (!list_empty(&ring->request_list)) {
-		struct drm_i915_gem_request *request;
-
-		request = list_first_entry(&ring->request_list,
-					   struct drm_i915_gem_request,
-					   list);
-
-		if (!i915_gem_request_completed(request))
-			break;
+	list_for_each_entry_safe(req, req_next, &ring->request_list, list) {
+		if (!i915_gem_request_completed(req))
+			continue;
 
-		i915_gem_request_retire(request);
+		list_move_tail(&req->list, &deferred_request_free);
 	}
 
 	/* Move any buffers on the active list that are no longer referenced
 	 * by the ringbuffer to the flushing/inactive lists as appropriate,
 	 * before we free the context associated with the requests.
 	 */
-	while (!list_empty(&ring->active_list)) {
-		struct drm_i915_gem_object *obj;
-
-		obj = list_first_entry(&ring->active_list,
-				      struct drm_i915_gem_object,
-				      ring_list[ring->id]);
-
+	list_for_each_entry_safe(obj, obj_next, &ring->active_list, ring_list[ring->id]) {
 		if (!list_empty(&obj->last_read_req[ring->id]->list))
-			break;
+			continue;
 
 		i915_gem_object_retire__read(obj, ring->id);
 	}
@@ -3222,18 +3220,26 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 	}
 
 	while (!list_empty(&ring->delayed_free_list)) {
-		struct drm_i915_gem_request *request;
 		unsigned long flags;
 
-		request = list_first_entry(&ring->delayed_free_list,
-					   struct drm_i915_gem_request,
-					   delay_free_list);
+		req = list_first_entry(&ring->delayed_free_list,
+				       struct drm_i915_gem_request,
+				       delay_free_list);
 
 		spin_lock_irqsave(&ring->delayed_free_lock, flags);
-		list_del(&request->delay_free_list);
+		list_del(&req->delay_free_list);
 		spin_unlock_irqrestore(&ring->delayed_free_lock, flags);
 
-		i915_gem_request_free(request);
+		i915_gem_request_free(req);
+	}
+
+	/* It should now be safe to actually free the requests */
+	while (!list_empty(&deferred_request_free)) {
+		req = list_first_entry(&deferred_request_free,
+				       struct drm_i915_gem_request,
+				       list);
+
+		i915_gem_request_retire(req);
 	}
 
 	WARN_ON(i915_verify_lists(ring->dev));
-- 
1.9.1


* [RFC 09/39] drm/i915: Added scheduler hook into i915_gem_complete_requests_ring()
  2015-07-17 14:33 [RFC 00/39] GPU scheduler for i915 driver John.C.Harrison
                   ` (7 preceding siblings ...)
  2015-07-17 14:33 ` [RFC 08/39] drm/i915: Prepare retire_requests to handle out-of-order seqnos John.C.Harrison
@ 2015-07-17 14:33 ` John.C.Harrison
  2015-07-17 14:33 ` [RFC 10/39] drm/i915: Disable hardware semaphores when GPU scheduler is enabled John.C.Harrison
                   ` (30 subsequent siblings)
  39 siblings, 0 replies; 48+ messages in thread
From: John.C.Harrison @ 2015-07-17 14:33 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The GPU scheduler can cause requests to complete out of order, for example
because one request pre-empted others that had already been submitted. This
means the simple seqno comparison is no longer necessarily valid. Instead, a
check against what the scheduler is currently doing must be made to determine
whether a request has really completed.
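
In outline, the test in i915_gem_request_notify() becomes (see the hunk below):

    if (i915_scheduler_is_request_tracked(req, &complete, NULL)) {
        /* The scheduler knows definitively whether this request is done. */
        if (!complete)
            continue;
    } else {
        /* No scheduler entry, so fall back to the seqno comparison. */
        if (!i915_seqno_passed(seqno, req->seqno))
            continue;
    }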

Change-Id: I149250a8f9382586514ca324aba1c53063b83e19
For: VIZ-1587
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h       |  2 ++
 drivers/gpu/drm/i915/i915_gem.c       | 13 +++++++++++--
 drivers/gpu/drm/i915/i915_scheduler.c | 31 +++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_scheduler.h |  2 ++
 4 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 7d2a494..58f53ec 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2238,6 +2238,8 @@ struct drm_i915_gem_request {
 	/** process identifier submitting this request */
 	struct pid *pid;
 
+	struct i915_scheduler_queue_entry	*scheduler_qe;
+
 	/**
 	 * The ELSP only accepts two elements at a time, so we queue
 	 * context/tail pairs on a given queue (ring->execlist_queue) until the
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 56405cd..e3c4032 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2772,6 +2772,7 @@ void i915_gem_request_notify(struct intel_engine_cs *ring)
 {
 	struct drm_i915_gem_request *req, *req_next;
 	unsigned long flags;
+	bool complete;
 	u32 seqno;
 	LIST_HEAD(free_list);
 
@@ -2785,8 +2786,13 @@ void i915_gem_request_notify(struct intel_engine_cs *ring)
 	spin_lock_irqsave(&ring->fence_lock, flags);
 	list_for_each_entry_safe(req, req_next, &ring->fence_signal_list, signal_list) {
 		if (!req->cancelled) {
-			if (!i915_seqno_passed(seqno, req->seqno))
-				continue;
+			if (i915_scheduler_is_request_tracked(req, &complete, NULL)) {
+				if (!complete)
+					continue;
+			} else {
+				if (!i915_seqno_passed(seqno, req->seqno))
+					continue;
+			}
 
 			fence_signal_locked(&req->fence);
 			trace_i915_gem_request_complete(req);
@@ -2811,6 +2817,9 @@ void i915_gem_request_notify(struct intel_engine_cs *ring)
 
 		i915_gem_request_unreference(req);
 	}
+
+	/* Necessary? Or does the fence_signal() call do an implicit wakeup? */
+	wake_up_all(&ring->irq_queue);
 }
 
 static void i915_fence_timeline_value_str(struct fence *fence, char *str, int size)
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 71d8df7..0d1cbe3 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -119,6 +119,9 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 	node->stamp  = stamp;
 	i915_gem_request_reference(node->params.request);
 
+	BUG_ON(node->params.request->scheduler_qe);
+	node->params.request->scheduler_qe = node;
+
 	/* Need to determine the number of incomplete entries in the list as
 	 * that will be the maximum size of the dependency list.
 	 *
@@ -363,6 +366,13 @@ static void i915_scheduler_seqno_complete(struct intel_engine_cs *ring, uint32_t
 		got_changes = true;
 	}
 
+	/*
+	 * Avoid issues with requests not being signalled because their
+	 * interrupt has already passed.
+	 */
+	if (got_changes)
+		i915_gem_request_notify(ring);
+
 	/* Should submit new work here if flight list is empty but the DRM
 	 * mutex lock might not be available if a '__wait_request()' call is
 	 * blocking the system. */
@@ -504,6 +514,7 @@ int i915_scheduler_remove(struct intel_engine_cs *ring)
 			i915_gem_execbuff_release_batch_obj(node->params.batch_obj);
 
 		/* Free everything that is owned by the node: */
+		node->params.request->scheduler_qe = NULL;
 		i915_gem_request_unreference(node->params.request);
 		kfree(node->params.cliprects);
 		kfree(node->dep_list);
@@ -774,3 +785,23 @@ static int i915_scheduler_remove_dependent(struct i915_scheduler *scheduler,
 
 	return 0;
 }
+
+bool i915_scheduler_is_request_tracked(struct drm_i915_gem_request *req,
+				       bool *completed, bool *busy)
+{
+	struct drm_i915_private *dev_priv = req->ring->dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+
+	if (!scheduler)
+		return false;
+
+	if (req->scheduler_qe == NULL)
+		return false;
+
+	if (completed)
+		*completed = I915_SQS_IS_COMPLETE(req->scheduler_qe);
+	if (busy)
+		*busy      = I915_SQS_IS_QUEUED(req->scheduler_qe);
+
+	return true;
+}
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 0c5fc7f..6b2585a 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -87,5 +87,7 @@ enum {
 int         i915_scheduler_init(struct drm_device *dev);
 int         i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe);
 int         i915_scheduler_handle_irq(struct intel_engine_cs *ring);
+bool        i915_scheduler_is_request_tracked(struct drm_i915_gem_request *req,
+					      bool *completed, bool *busy);
 
 #endif  /* _I915_SCHEDULER_H_ */
-- 
1.9.1


* [RFC 10/39] drm/i915: Disable hardware semaphores when GPU scheduler is enabled
  2015-07-17 14:33 [RFC 00/39] GPU scheduler for i915 driver John.C.Harrison
                   ` (8 preceding siblings ...)
  2015-07-17 14:33 ` [RFC 09/39] drm/i915: Added scheduler hook into i915_gem_complete_requests_ring() John.C.Harrison
@ 2015-07-17 14:33 ` John.C.Harrison
  2015-07-17 14:33 ` [RFC 11/39] drm/i915: Force MMIO flips when scheduler enabled John.C.Harrison
                   ` (29 subsequent siblings)
  39 siblings, 0 replies; 48+ messages in thread
From: John.C.Harrison @ 2015-07-17 14:33 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

Hardware semaphores require seqno values to be continuously incrementing.
However, the scheduler's reordering of batch buffers means that the seqno
values going through the hardware could be out of order. Thus semaphores
cannot be used.

On the other hand, the scheduler supersedes the need for hardware semaphores
anyway. Having one ring stall waiting for something to complete on another ring
is inefficient if that ring could be working on some other, independent task.
This is what the scheduler is meant to do - keep the hardware as busy as
possible by reordering batch buffers to avoid dependency stalls.
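
The check itself is a simple early-out, as in the i915_semaphore_is_enabled()
hunk below:

    if (i915_scheduler_is_enabled(dev))
        return false;

with a corresponding BUG_ON() in gen6_ring_sync() to catch any semaphore use
that slips through while the scheduler is active.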

Change-Id: I95d1fceacd370455a9720d7dca55cfd0a1f6beaa
For: VIZ-1587
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c         | 9 +++++++++
 drivers/gpu/drm/i915/i915_scheduler.c   | 7 +++++++
 drivers/gpu/drm/i915/i915_scheduler.h   | 1 +
 drivers/gpu/drm/i915/intel_ringbuffer.c | 4 ++++
 4 files changed, 21 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index db48aee..abd7efc 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -34,6 +34,7 @@
 #include "i915_drv.h"
 #include "i915_trace.h"
 #include "intel_drv.h"
+#include "i915_scheduler.h"
 
 #include <linux/console.h>
 #include <linux/module.h>
@@ -516,6 +517,14 @@ void intel_detect_pch(struct drm_device *dev)
 
 bool i915_semaphore_is_enabled(struct drm_device *dev)
 {
+	/* Hardware semaphores are not compatible with the scheduler due to the
+	 * seqno values being potentially out of order. However, semaphores are
+	 * also not required as the scheduler will handle inter-ring dependencies
+	 * and try to do so in a way that does not cause dead time on the hardware.
+	 */
+	if (i915_scheduler_is_enabled(dev))
+		return false;
+
 	if (INTEL_INFO(dev)->gen < 6)
 		return false;
 
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 0d1cbe3..f7fd9a4 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -38,6 +38,13 @@ static int         i915_scheduler_priority_bump(struct i915_scheduler *scheduler
 						struct i915_scheduler_queue_entry *target,
 						uint32_t bump);
 
+bool i915_scheduler_is_enabled(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+
+	return dev_priv->scheduler != NULL;
+}
+
 int i915_scheduler_init(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 6b2585a..88cbfba 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -84,6 +84,7 @@ enum {
 	i915_sf_submitting          = (1 << 1),
 };
 
+bool        i915_scheduler_is_enabled(struct drm_device *dev);
 int         i915_scheduler_init(struct drm_device *dev);
 int         i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe);
 int         i915_scheduler_handle_irq(struct intel_engine_cs *ring);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 83a5254..df0cd48 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -32,6 +32,7 @@
 #include <drm/i915_drm.h>
 #include "i915_trace.h"
 #include "intel_drv.h"
+#include "i915_scheduler.h"
 
 bool
 intel_ring_initialized(struct intel_engine_cs *ring)
@@ -1379,6 +1380,9 @@ gen6_ring_sync(struct drm_i915_gem_request *waiter_req,
 	u32 wait_mbox = signaller->semaphore.mbox.wait[waiter->id];
 	int ret;
 
+	/* Arithmetic on sequence numbers is unreliable with a scheduler. */
+	BUG_ON(i915_scheduler_is_enabled(signaller->dev));
+
 	/* Throughout all of the GEM code, seqno passed implies our current
 	 * seqno is >= the last seqno executed. However for hardware the
 	 * comparison is strictly greater than.
-- 
1.9.1


* [RFC 11/39] drm/i915: Force MMIO flips when scheduler enabled
  2015-07-17 14:33 [RFC 00/39] GPU scheduler for i915 driver John.C.Harrison
                   ` (9 preceding siblings ...)
  2015-07-17 14:33 ` [RFC 10/39] drm/i915: Disable hardware semaphores when GPU scheduler is enabled John.C.Harrison
@ 2015-07-17 14:33 ` John.C.Harrison
  2015-07-17 14:33 ` [RFC 12/39] drm/i915: Added scheduler hook when closing DRM file handles John.C.Harrison
                   ` (28 subsequent siblings)
  39 siblings, 0 replies; 48+ messages in thread
From: John.C.Harrison @ 2015-07-17 14:33 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

Change-Id: Ice071af6d88306b0d1c53bdb651a1a3e20bdc1af
For: VIZ-1587
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/intel_display.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index b9e8113..9629dab 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -44,6 +44,7 @@
 #include <drm/drm_plane_helper.h>
 #include <drm/drm_rect.h>
 #include <linux/dma_remapping.h>
+#include "i915_scheduler.h"
 
 /* Primary plane formats for gen <= 3 */
 static const uint32_t i8xx_primary_formats[] = {
@@ -11180,6 +11181,8 @@ static bool use_mmio_flip(struct intel_engine_cs *ring,
 		return true;
 	else if (i915.enable_execlists)
 		return true;
+	else if (i915_scheduler_is_enabled(ring->dev))
+		return true;
 	else
 		return ring != i915_gem_request_get_ring(obj->last_write_req);
 }
-- 
1.9.1


* [RFC 12/39] drm/i915: Added scheduler hook when closing DRM file handles
  2015-07-17 14:33 [RFC 00/39] GPU scheduler for i915 driver John.C.Harrison
                   ` (10 preceding siblings ...)
  2015-07-17 14:33 ` [RFC 11/39] drm/i915: Force MMIO flips when scheduler enabled John.C.Harrison
@ 2015-07-17 14:33 ` John.C.Harrison
  2015-07-17 14:33 ` [RFC 13/39] drm/i915: Added deferred work handler for scheduler John.C.Harrison
                   ` (27 subsequent siblings)
  39 siblings, 0 replies; 48+ messages in thread
From: John.C.Harrison @ 2015-07-17 14:33 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The scheduler decouples the submission of batch buffers to the driver from
their submission to the hardware. Thus it is possible for an
application to submit work, then close the DRM handle and free up all the
resources that piece of work wishes to use before the work has even been
submitted to the hardware. To prevent this, the scheduler needs to be informed
of the DRM close event so that it can force through any outstanding work
attributed to that file handle.
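
The core of the new i915_scheduler_closefile() below is, with the locking and
reference handling trimmed:

    for_each_ring(ring, dev_priv, i) {
        /* Keep waiting until no incomplete node for this file remains. */
        do {
            found = false;
            list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
                if (I915_SQS_IS_COMPLETE(node) ||
                    (node->params.file != file))
                    continue;

                found = true;
                req = node->params.request;
                break;
            }

            if (found)
                i915_wait_request(req);
        } while (found);
    }

    /* Finally, detach the file pointer from any nodes that remain. */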

Change-Id: I24ac056c062b075ff1cc5e2ed2d3fa8e17e85951
For: VIZ-1587
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_dma.c       |  3 ++
 drivers/gpu/drm/i915/i915_scheduler.c | 66 +++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_scheduler.h |  2 ++
 3 files changed, 71 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 5e63076..0a25017 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -47,6 +47,7 @@
 #include <linux/vga_switcheroo.h>
 #include <linux/slab.h>
 #include <acpi/video.h>
+#include "i915_scheduler.h"
 #include <linux/pm.h>
 #include <linux/pm_runtime.h>
 #include <linux/oom.h>
@@ -1186,6 +1187,8 @@ void i915_driver_lastclose(struct drm_device *dev)
 
 void i915_driver_preclose(struct drm_device *dev, struct drm_file *file)
 {
+	i915_scheduler_closefile(dev, file);
+
 	mutex_lock(&dev->struct_mutex);
 	i915_gem_context_close(dev, file);
 	i915_gem_release(dev, file);
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index f7fd9a4..50bcccb 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -812,3 +812,69 @@ bool i915_scheduler_is_request_tracked(struct drm_i915_gem_request *req,
 
 	return true;
 }
+
+int i915_scheduler_closefile(struct drm_device *dev, struct drm_file *file)
+{
+	struct i915_scheduler_queue_entry  *node;
+	struct drm_i915_private            *dev_priv = dev->dev_private;
+	struct i915_scheduler              *scheduler = dev_priv->scheduler;
+	struct drm_i915_gem_request        *req;
+	struct intel_engine_cs  *ring;
+	int                     i, ret;
+	unsigned long           flags;
+	bool                    found;
+
+	if (!scheduler)
+		return 0;
+
+	for_each_ring(ring, dev_priv, i) {
+		do {
+			spin_lock_irqsave(&scheduler->lock, flags);
+
+			found = false;
+			list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
+				if (I915_SQS_IS_COMPLETE(node))
+					continue;
+
+				if (node->params.file != file)
+					continue;
+
+				found = true;
+				req = node->params.request;
+				i915_gem_request_reference(req);
+				break;
+			}
+
+			spin_unlock_irqrestore(&scheduler->lock, flags);
+
+			if (found) {
+				do {
+					mutex_lock(&dev->struct_mutex);
+					ret = i915_wait_request(req);
+					mutex_unlock(&dev->struct_mutex);
+					if (ret == -EAGAIN)
+						msleep(20);
+				} while (ret == -EAGAIN);
+
+				mutex_lock(&dev->struct_mutex);
+				i915_gem_request_unreference(req);
+				mutex_unlock(&dev->struct_mutex);
+			}
+		} while (found);
+	}
+
+	spin_lock_irqsave(&scheduler->lock, flags);
+	for_each_ring(ring, dev_priv, i) {
+		list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
+			if (node->params.file != file)
+				continue;
+
+			WARN_ON(!I915_SQS_IS_COMPLETE(node));
+
+			node->params.file = NULL;
+		}
+	}
+	spin_unlock_irqrestore(&scheduler->lock, flags);
+
+	return 0;
+}
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 88cbfba..fbb6f7b 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -86,6 +86,8 @@ enum {
 
 bool        i915_scheduler_is_enabled(struct drm_device *dev);
 int         i915_scheduler_init(struct drm_device *dev);
+int         i915_scheduler_closefile(struct drm_device *dev,
+				     struct drm_file *file);
 int         i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe);
 int         i915_scheduler_handle_irq(struct intel_engine_cs *ring);
 bool        i915_scheduler_is_request_tracked(struct drm_i915_gem_request *req,
-- 
1.9.1


* [RFC 13/39] drm/i915: Added deferred work handler for scheduler
  2015-07-17 14:33 [RFC 00/39] GPU scheduler for i915 driver John.C.Harrison
                   ` (11 preceding siblings ...)
  2015-07-17 14:33 ` [RFC 12/39] drm/i915: Added scheduler hook when closing DRM file handles John.C.Harrison
@ 2015-07-17 14:33 ` John.C.Harrison
  2015-07-17 14:33 ` [RFC 14/39] drm/i915: Redirect execbuffer_final() via scheduler John.C.Harrison
                   ` (26 subsequent siblings)
  39 siblings, 0 replies; 48+ messages in thread
From: John.C.Harrison @ 2015-07-17 14:33 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The scheduler needs to do interrupt-triggered work that is too complex to do in
the interrupt handler. Thus it requires a deferred work handler to process this
work asynchronously.
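
The handler itself (added below) just takes the driver mutex and runs the
removal code on each ring; the interrupt handler now queues it in place of the
earlier XXX placeholder:

    void i915_gem_scheduler_work_handler(struct work_struct *work)
    {
        struct drm_i915_private *dev_priv =
            container_of(work, struct drm_i915_private, mm.scheduler_work);
        struct intel_engine_cs *ring;
        int i;

        mutex_lock(&dev_priv->dev->struct_mutex);

        for_each_ring(ring, dev_priv, i)
            i915_scheduler_remove(ring);

        mutex_unlock(&dev_priv->dev->struct_mutex);
    }

    /* From i915_scheduler_handle_irq(): */
    queue_work(dev_priv->wq, &dev_priv->mm.scheduler_work);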

Change-Id: I0f7cc2b6f034a50bf8f7e368b60ad8bafd00f993
For: VIZ-1587
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_dma.c       |  3 +++
 drivers/gpu/drm/i915/i915_drv.h       | 10 ++++++++++
 drivers/gpu/drm/i915/i915_gem.c       |  2 ++
 drivers/gpu/drm/i915/i915_scheduler.c | 23 +++++++++++++++++++++--
 drivers/gpu/drm/i915/i915_scheduler.h |  1 +
 5 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 0a25017..4d3370f 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1084,6 +1084,9 @@ int i915_driver_unload(struct drm_device *dev)
 	WARN_ON(unregister_oom_notifier(&dev_priv->mm.oom_notifier));
 	unregister_shrinker(&dev_priv->mm.shrinker);
 
+	/* Cancel the scheduler work handler, which should be idle now. */
+	cancel_work_sync(&dev_priv->mm.scheduler_work);
+
 	io_mapping_free(dev_priv->gtt.mappable);
 	arch_phys_wc_del(dev_priv->gtt.mtrr);
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 58f53ec..2b3fab6 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1299,6 +1299,16 @@ struct i915_gem_mm {
 	struct delayed_work retire_work;
 
 	/**
+	 * New scheme is to get an interrupt after every work packet
+	 * in order to allow the low latency scheduling of pending
+	 * packets. The idea behind adding new packets to a pending
+	 * queue rather than directly into the hardware ring buffer
+	 * is to allow high priority packets to overtake low priority
+	 * ones.
+	 */
+	struct work_struct scheduler_work;
+
+	/**
 	 * When we detect an idle GPU, we want to turn on
 	 * powersaving features. So once we see that there
 	 * are no more requests outstanding and no more
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index e3c4032..77a3b27 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -5631,6 +5631,8 @@ i915_gem_load(struct drm_device *dev)
 			  i915_gem_retire_work_handler);
 	INIT_DELAYED_WORK(&dev_priv->mm.idle_work,
 			  i915_gem_idle_work_handler);
+	INIT_WORK(&dev_priv->mm.scheduler_work,
+				i915_gem_scheduler_work_handler);
 	init_waitqueue_head(&dev_priv->gpu_error.reset_queue);
 
 	dev_priv->relative_constants_mode = I915_EXEC_CONSTANTS_REL_GENERAL;
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 50bcccb..3494fd5 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -407,12 +407,12 @@ int i915_scheduler_handle_irq(struct intel_engine_cs *ring)
 	i915_scheduler_seqno_complete(ring, seqno);
 	spin_unlock_irqrestore(&scheduler->lock, flags);
 
-	/* XXX: Need to also call i915_scheduler_remove() via work handler. */
+	queue_work(dev_priv->wq, &dev_priv->mm.scheduler_work);
 
 	return 0;
 }
 
-int i915_scheduler_remove(struct intel_engine_cs *ring)
+static int i915_scheduler_remove(struct intel_engine_cs *ring)
 {
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
 	struct i915_scheduler   *scheduler = dev_priv->scheduler;
@@ -531,6 +531,25 @@ int i915_scheduler_remove(struct intel_engine_cs *ring)
 	return ret;
 }
 
+void i915_gem_scheduler_work_handler(struct work_struct *work)
+{
+	struct intel_engine_cs  *ring;
+	struct drm_i915_private *dev_priv;
+	struct drm_device       *dev;
+	int                     i;
+
+	dev_priv = container_of(work, struct drm_i915_private, mm.scheduler_work);
+	dev = dev_priv->dev;
+
+	mutex_lock(&dev->struct_mutex);
+
+	for_each_ring(ring, dev_priv, i) {
+		i915_scheduler_remove(ring);
+	}
+
+	mutex_unlock(&dev->struct_mutex);
+}
+
 static void i915_scheduler_priority_bump_clear(struct i915_scheduler *scheduler)
 {
 	struct i915_scheduler_queue_entry *node;
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index fbb6f7b..15878a4 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -90,6 +90,7 @@ int         i915_scheduler_closefile(struct drm_device *dev,
 				     struct drm_file *file);
 int         i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe);
 int         i915_scheduler_handle_irq(struct intel_engine_cs *ring);
+void        i915_gem_scheduler_work_handler(struct work_struct *work);
 bool        i915_scheduler_is_request_tracked(struct drm_i915_gem_request *req,
 					      bool *completed, bool *busy);
 
-- 
1.9.1


* [RFC 14/39] drm/i915: Redirect execbuffer_final() via scheduler
  2015-07-17 14:33 [RFC 00/39] GPU scheduler for i915 driver John.C.Harrison
                   ` (12 preceding siblings ...)
  2015-07-17 14:33 ` [RFC 13/39] drm/i915: Added deferred work handler for scheduler John.C.Harrison
@ 2015-07-17 14:33 ` John.C.Harrison
  2015-07-17 14:33 ` [RFC 15/39] drm/i915: Keep the reserved space mechanism happy John.C.Harrison
                   ` (25 subsequent siblings)
  39 siblings, 0 replies; 48+ messages in thread
From: John.C.Harrison @ 2015-07-17 14:33 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

Updated the execbuffer() code to pass the packaged up batch buffer information
to the scheduler rather than calling execbuffer_final() directly. The scheduler
queue() code is currently a stub which simply chains on to _final() immediately.

Change-Id: I2a19062a9e66845f2e886332fc4b5fc7ac992864
For: VIZ-1587
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 19 +++++++------------
 drivers/gpu/drm/i915/intel_lrc.c           | 12 ++++--------
 2 files changed, 11 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index ba9d595..364e9cc 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -37,6 +37,7 @@
 #ifdef CONFIG_SYNC
 #include <../drivers/staging/android/sync.h>
 #endif
+#include "i915_scheduler.h"
 
 #define  __EXEC_OBJECT_HAS_PIN (1<<31)
 #define  __EXEC_OBJECT_HAS_FENCE (1<<30)
@@ -1198,6 +1199,7 @@ i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
 			       struct drm_i915_gem_execbuffer2 *args,
 			       struct list_head *vmas)
 {
+	struct i915_scheduler_queue_entry *qe;
 	struct drm_device *dev = params->dev;
 	struct intel_engine_cs *ring = params->ring;
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -1289,18 +1291,11 @@ i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
 
 	i915_gem_execbuffer_move_to_active(vmas, params->request);
 
-	ret = dev_priv->gt.execbuf_final(params);
+	qe = container_of(params, typeof(*qe), params);
+	ret = i915_scheduler_queue_execbuffer(qe);
 	if (ret)
 		goto error;
 
-	/*
-	 * Free everything that was stored in the QE structure (until the
-	 * scheduler arrives and does it instead):
-	 */
-	kfree(params->cliprects);
-	if (params->dispatch_flags & I915_DISPATCH_SECURE)
-		i915_gem_execbuff_release_batch_obj(params->batch_obj);
-
 	return 0;
 
 error:
@@ -1492,8 +1487,8 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	struct intel_engine_cs *ring;
 	struct intel_context *ctx;
 	struct i915_address_space *vm;
-	struct i915_execbuffer_params params_master; /* XXX: will be removed later */
-	struct i915_execbuffer_params *params = &params_master;
+	struct i915_scheduler_queue_entry qe;
+	struct i915_execbuffer_params *params = &qe.params;
 	const u32 ctx_id = i915_execbuffer2_get_context_id(*args);
 	u32 dispatch_flags;
 	int ret;
@@ -1624,7 +1619,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	else
 		vm = &dev_priv->gtt.base;
 
-	memset(&params_master, 0x00, sizeof(params_master));
+	memset(&qe, 0x00, sizeof(qe));
 
 	eb = eb_create(args);
 	if (eb == NULL) {
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index bba1152..a8c78ec 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -136,6 +136,7 @@
 #include <drm/i915_drm.h>
 #include "i915_drv.h"
 #include "intel_mocs.h"
+#include "i915_scheduler.h"
 
 #define GEN9_LR_CONTEXT_RENDER_SIZE (22 * PAGE_SIZE)
 #define GEN8_LR_CONTEXT_RENDER_SIZE (20 * PAGE_SIZE)
@@ -827,6 +828,7 @@ int intel_execlists_submission(struct i915_execbuffer_params *params,
 			       struct drm_i915_gem_execbuffer2 *args,
 			       struct list_head *vmas)
 {
+	struct i915_scheduler_queue_entry *qe;
 	struct drm_device       *dev = params->dev;
 	struct intel_engine_cs  *ring = params->ring;
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -884,17 +886,11 @@ int intel_execlists_submission(struct i915_execbuffer_params *params,
 
 	i915_gem_execbuffer_move_to_active(vmas, params->request);
 
-	ret = dev_priv->gt.execbuf_final(params);
+	qe = container_of(params, typeof(*qe), params);
+	ret = i915_scheduler_queue_execbuffer(qe);
 	if (ret)
 		return ret;
 
-	/*
-	 * Free everything that was stored in the QE structure (until the
-	 * scheduler arrives and does it instead):
-	 */
-	if (params->dispatch_flags & I915_DISPATCH_SECURE)
-		i915_gem_execbuff_release_batch_obj(params->batch_obj);
-
 	return 0;
 }
 
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [RFC 15/39] drm/i915: Keep the reserved space mechanism happy
  2015-07-17 14:33 [RFC 00/39] GPU scheduler for i915 driver John.C.Harrison
                   ` (13 preceding siblings ...)
  2015-07-17 14:33 ` [RFC 14/39] drm/i915: Redirect execbuffer_final() via scheduler John.C.Harrison
@ 2015-07-17 14:33 ` John.C.Harrison
  2015-07-17 14:33 ` [RFC 16/39] drm/i915: Added tracking/locking of batch buffer objects John.C.Harrison
                   ` (24 subsequent siblings)
  39 siblings, 0 replies; 48+ messages in thread
From: John.C.Harrison @ 2015-07-17 14:33 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

Ring space is reserved when constructing a request to ensure that the
subsequent 'add_request()' call cannot fail due to waiting for space
on a busy or broken GPU. However, the scheduler jumps into the middle
of the execbuffer process between request creation and request
submission. Thus it needs to cancel the reserved space when the
request is simply added to the scheduler's queue and not yet
submitted. Similarly, it needs to re-reserve the space when it finally
does want to send the batch buffer to the hardware.
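
For reference, the intended life cycle of the reservation is roughly as
follows (a sketch only; the helper names are the ones used by this series and
the exact call sites are in the hunks below):

	/*
	 *   request creation      - ring space reserved, as before
	 *   scheduler queues work - intel_ring_reserved_space_cancel(ringbuf)
	 *   *_submission_final()  - intel_ring_reserve_space() /
	 *                           intel_logical_ring_reserve_space()
	 *   submission error      - reservation cancelled again so the node
	 *                           can be re-queued without leaking space
	 */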

For: VIZ-1587
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  7 +++++++
 drivers/gpu/drm/i915/i915_scheduler.c      |  4 ++++
 drivers/gpu/drm/i915/intel_lrc.c           | 13 +++++++++++--
 3 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 364e9cc..75d018d 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1317,6 +1317,10 @@ int i915_gem_ringbuffer_submission_final(struct i915_execbuffer_params *params)
 	/* The mutex must be acquired before calling this function */
 	BUG_ON(!mutex_is_locked(&params->dev->struct_mutex));
 
+	ret = intel_ring_reserve_space(params->request);
+	if (ret)
+		return ret;
+
 	intel_runtime_pm_get(dev_priv);
 
 	/*
@@ -1392,6 +1396,9 @@ error:
 	 */
 	intel_runtime_pm_put(dev_priv);
 
+	if (ret)
+		intel_ring_reserved_space_cancel(params->request->ringbuf);
+
 	return ret;
 }
 
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 3494fd5..e145829 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -95,6 +95,8 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 
 		qe->scheduler_index = scheduler->index++;
 
+		intel_ring_reserved_space_cancel(qe->params.request->ringbuf);
+
 		scheduler->flags[qe->params.ring->id] |= i915_sf_submitting;
 		ret = dev_priv->gt.execbuf_final(&qe->params);
 		scheduler->flags[qe->params.ring->id] &= ~i915_sf_submitting;
@@ -126,6 +128,8 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 	node->stamp  = stamp;
 	i915_gem_request_reference(node->params.request);
 
+	intel_ring_reserved_space_cancel(node->params.request->ringbuf);
+
 	BUG_ON(node->params.request->scheduler_qe);
 	node->params.request->scheduler_qe = node;
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index a8c78ec..76d5023 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -908,13 +908,17 @@ int intel_execlists_submission_final(struct i915_execbuffer_params *params)
 	/* The mutex must be acquired before calling this function */
 	BUG_ON(!mutex_is_locked(&params->dev->struct_mutex));
 
+	ret = intel_logical_ring_reserve_space(params->request);
+	if (ret)
+		return ret;
+
 	/*
 	 * Unconditionally invalidate gpu caches and ensure that we do flush
 	 * any residual writes from the previous batch.
 	 */
 	ret = logical_ring_invalidate_all_caches(params->request);
 	if (ret)
-		return ret;
+		goto err;
 
 	if (ring == &dev_priv->ring[RCS] &&
 	    params->instp_mode != dev_priv->relative_constants_mode) {
@@ -938,13 +942,18 @@ int intel_execlists_submission_final(struct i915_execbuffer_params *params)
 
 	ret = ring->emit_bb_start(params->request, exec_start, params->dispatch_flags);
 	if (ret)
-		return ret;
+		goto err;
 
 	trace_i915_gem_ring_dispatch(params->request, params->dispatch_flags);
 
 	i915_gem_execbuffer_retire_commands(params);
 
 	return 0;
+
+err:
+	intel_ring_reserved_space_cancel(params->request->ringbuf);
+
+	return ret;
 }
 
 void intel_execlists_retire_requests(struct intel_engine_cs *ring)
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [RFC 16/39] drm/i915: Added tracking/locking of batch buffer objects
  2015-07-17 14:33 [RFC 00/39] GPU scheduler for i915 driver John.C.Harrison
                   ` (14 preceding siblings ...)
  2015-07-17 14:33 ` [RFC 15/39] drm/i915: Keep the reserved space mechanism happy John.C.Harrison
@ 2015-07-17 14:33 ` John.C.Harrison
  2015-07-17 14:33 ` [RFC 17/39] drm/i915: Hook scheduler node clean up into retire requests John.C.Harrison
                   ` (23 subsequent siblings)
  39 siblings, 0 replies; 48+ messages in thread
From: John.C.Harrison @ 2015-07-17 14:33 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The scheduler needs to track interdependencies between batch buffers. These are
calculated by analysing the object lists of the buffers and looking for
commonality. The scheduler also needs to keep those buffers locked long after
the initial IOCTL call has returned to user land.
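
For illustration only, the kind of comparison the saved object lists make
possible looks roughly like this (a hypothetical helper; the actual dependency
calculation lives in the scheduler's queueing logic rather than in this patch):

static bool sketch_batches_share_objects(struct i915_scheduler_queue_entry *a,
					 struct i915_scheduler_queue_entry *b)
{
	int i, j;

	/* Any object present in both saved lists implies a dependency. */
	for (i = 0; i < a->num_objs; i++)
		for (j = 0; j < b->num_objs; j++)
			if (a->saved_objects[i].obj == b->saved_objects[j].obj)
				return true;

	return false;
}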

Change-Id: I31e3677ecfc2c9b5a908bda6acc4850432d55f1e
For: VIZ-1587
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 48 ++++++++++++++++++++++++++++--
 drivers/gpu/drm/i915/i915_scheduler.c      | 33 ++++++++++++++++++--
 2 files changed, 76 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 75d018d..61a5498 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1498,7 +1498,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	struct i915_execbuffer_params *params = &qe.params;
 	const u32 ctx_id = i915_execbuffer2_get_context_id(*args);
 	u32 dispatch_flags;
-	int ret;
+	int ret, i;
 	bool need_relocs;
 	int fd_fence_complete = -1;
 #ifdef CONFIG_SYNC
@@ -1636,6 +1636,14 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		goto pre_mutex_err;
 	}
 
+	qe.saved_objects = kzalloc(
+			sizeof(*qe.saved_objects) * args->buffer_count,
+			GFP_KERNEL);
+	if (!qe.saved_objects) {
+		ret = -ENOMEM;
+		goto err;
+	}
+
 	/* Look up object handles */
 	ret = eb_lookup_vmas(eb, exec, args, vm, file);
 	if (ret)
@@ -1756,7 +1764,26 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	params->args_DR1                = args->DR1;
 	params->args_DR4                = args->DR4;
 	params->batch_obj               = batch_obj;
-	params->ctx                     = ctx;
+
+	/*
+	 * Save away the list of objects used by this batch buffer for the
+	 * purpose of tracking inter-buffer dependencies.
+	 */
+	for (i = 0; i < args->buffer_count; i++) {
+		/*
+		 * NB: 'drm_gem_object_lookup()' increments the object's
+		 * reference count and so must be matched by a
+		 * 'drm_gem_object_unreference' call.
+		 */
+		qe.saved_objects[i].obj =
+			to_intel_bo(drm_gem_object_lookup(dev, file,
+							  exec[i].handle));
+	}
+	qe.num_objs = i;
+
+	/* Lock and save the context object as well. */
+	i915_gem_context_reference(ctx);
+	params->ctx = ctx;
 
 #ifdef CONFIG_SYNC
 	if (args->flags & I915_EXEC_CREATE_FENCE) {
@@ -1808,6 +1835,23 @@ err:
 	i915_gem_context_unreference(ctx);
 	eb_destroy(eb);
 
+	if (qe.saved_objects) {
+		/* Need to release the objects: */
+		for (i = 0; i < qe.num_objs; i++) {
+			if (!qe.saved_objects[i].obj)
+				continue;
+
+			drm_gem_object_unreference(
+					&qe.saved_objects[i].obj->base);
+		}
+
+		kfree(qe.saved_objects);
+
+		/* Context too */
+		if (params->ctx)
+			i915_gem_context_unreference(params->ctx);
+	}
+
 	/*
 	 * If the request was created but not successfully submitted then it
 	 * must be freed again. If it was submitted then it is being tracked
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index e145829..f5fa968 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -108,7 +108,23 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 		if (ret)
 			return ret;
 
-		/* Free everything that is owned by the QE structure: */
+		/* Need to release the objects: */
+		for (i = 0; i < qe->num_objs; i++) {
+			if (!qe->saved_objects[i].obj)
+				continue;
+
+			drm_gem_object_unreference(&qe->saved_objects[i].obj->base);
+		}
+
+		kfree(qe->saved_objects);
+		qe->saved_objects = NULL;
+		qe->num_objs = 0;
+
+		/* Free the context object too: */
+		if (qe->params.ctx)
+			i915_gem_context_unreference(qe->params.ctx);
+
+		/* And anything else owned by the QE structure: */
 		kfree(qe->params.cliprects);
 		if (qe->params.dispatch_flags & I915_DISPATCH_SECURE)
 			i915_gem_execbuff_release_batch_obj(qe->params.batch_obj);
@@ -425,7 +441,7 @@ static int i915_scheduler_remove(struct intel_engine_cs *ring)
 	int                 flying = 0, queued = 0;
 	int                 ret = 0;
 	bool                do_submit;
-	uint32_t            min_seqno;
+	uint32_t            i, min_seqno;
 	struct list_head    remove;
 
 	if (list_empty(&scheduler->node_queue[ring->id]))
@@ -524,7 +540,18 @@ static int i915_scheduler_remove(struct intel_engine_cs *ring)
 		if (node->params.dispatch_flags & I915_DISPATCH_SECURE)
 			i915_gem_execbuff_release_batch_obj(node->params.batch_obj);
 
-		/* Free everything that is owned by the node: */
+		/* Release the locked buffers: */
+		for (i = 0; i < node->num_objs; i++) {
+			drm_gem_object_unreference(
+					    &node->saved_objects[i].obj->base);
+		}
+		kfree(node->saved_objects);
+
+		/* Context too: */
+		if (node->params.ctx)
+			i915_gem_context_unreference(node->params.ctx);
+
+		/* And anything else owned by the node: */
 		node->params.request->scheduler_qe = NULL;
 		i915_gem_request_unreference(node->params.request);
 		kfree(node->params.cliprects);
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [RFC 17/39] drm/i915: Hook scheduler node clean up into retire requests
  2015-07-17 14:33 [RFC 00/39] GPU scheduler for i915 driver John.C.Harrison
                   ` (15 preceding siblings ...)
  2015-07-17 14:33 ` [RFC 16/39] drm/i915: Added tracking/locking of batch buffer objects John.C.Harrison
@ 2015-07-17 14:33 ` John.C.Harrison
  2015-07-17 14:33 ` [RFC 18/39] drm/i915: Added scheduler interrupt handler hook John.C.Harrison
                   ` (22 subsequent siblings)
  39 siblings, 0 replies; 48+ messages in thread
From: John.C.Harrison @ 2015-07-17 14:33 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The scheduler keeps its own lock on various DRM objects in order to guarantee
safe access long after the original execbuff IOCTL has completed. This is
especially important when pre-emption is enabled as the batch buffer might need
to be submitted to the hardware multiple times. This patch hooks the clean-up of
these locks into the request retire function. The request can only be retired
after it has completed on the hardware and thus is no longer eligible for
re-submission, so there is no point holding on to the locks beyond that time.

For: VIZ-1587
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c       |  3 +++
 drivers/gpu/drm/i915/i915_scheduler.c | 51 ++++++++++++++++++++++++-----------
 drivers/gpu/drm/i915/i915_scheduler.h |  1 +
 3 files changed, 39 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 77a3b27..cb5af5d 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1405,6 +1405,9 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
 	if (!list_empty(&request->signal_list))
 		request->cancelled = true;
 
+	if (request->scheduler_qe)
+		i915_gem_scheduler_clean_node(request->scheduler_qe);
+
 	i915_gem_request_unreference(request);
 }
 
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index f5fa968..df2e27f 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -432,6 +432,38 @@ int i915_scheduler_handle_irq(struct intel_engine_cs *ring)
 	return 0;
 }
 
+void i915_gem_scheduler_clean_node(struct i915_scheduler_queue_entry *node)
+{
+	uint32_t i;
+
+	if (WARN_ON(!I915_SQS_IS_COMPLETE(node)))
+		return;
+
+	if (node->params.batch_obj) {
+		/* The batch buffer must be unpinned before it is unreferenced
+		 * otherwise the unpin fails with a missing vma!? */
+		if (node->params.dispatch_flags & I915_DISPATCH_SECURE)
+			i915_gem_execbuff_release_batch_obj(node->params.batch_obj);
+
+		node->params.batch_obj = NULL;
+	}
+
+	/* Release the locked buffers: */
+	for (i = 0; i < node->num_objs; i++) {
+		drm_gem_object_unreference(
+				    &node->saved_objects[i].obj->base);
+	}
+	kfree(node->saved_objects);
+	node->saved_objects = NULL;
+	node->num_objs = 0;
+
+	/* Context too: */
+	if (node->params.ctx) {
+		i915_gem_context_unreference(node->params.ctx);
+		node->params.ctx = NULL;
+	}
+}
+
 static int i915_scheduler_remove(struct intel_engine_cs *ring)
 {
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
@@ -441,7 +473,7 @@ static int i915_scheduler_remove(struct intel_engine_cs *ring)
 	int                 flying = 0, queued = 0;
 	int                 ret = 0;
 	bool                do_submit;
-	uint32_t            i, min_seqno;
+	uint32_t            min_seqno;
 	struct list_head    remove;
 
 	if (list_empty(&scheduler->node_queue[ring->id]))
@@ -535,21 +567,8 @@ static int i915_scheduler_remove(struct intel_engine_cs *ring)
 		node = list_first_entry(&remove, typeof(*node), link);
 		list_del(&node->link);
 
-		/* The batch buffer must be unpinned before it is unreferenced
-		 * otherwise the unpin fails with a missing vma!? */
-		if (node->params.dispatch_flags & I915_DISPATCH_SECURE)
-			i915_gem_execbuff_release_batch_obj(node->params.batch_obj);
-
-		/* Release the locked buffers: */
-		for (i = 0; i < node->num_objs; i++) {
-			drm_gem_object_unreference(
-					    &node->saved_objects[i].obj->base);
-		}
-		kfree(node->saved_objects);
-
-		/* Context too: */
-		if (node->params.ctx)
-			i915_gem_context_unreference(node->params.ctx);
+		/* Free up all the DRM object references */
+		i915_gem_scheduler_clean_node(node);
 
 		/* And anything else owned by the node: */
 		node->params.request->scheduler_qe = NULL;
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 15878a4..73c5e7d 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -88,6 +88,7 @@ bool        i915_scheduler_is_enabled(struct drm_device *dev);
 int         i915_scheduler_init(struct drm_device *dev);
 int         i915_scheduler_closefile(struct drm_device *dev,
 				     struct drm_file *file);
+void        i915_gem_scheduler_clean_node(struct i915_scheduler_queue_entry *node);
 int         i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe);
 int         i915_scheduler_handle_irq(struct intel_engine_cs *ring);
 void        i915_gem_scheduler_work_handler(struct work_struct *work);
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [RFC 18/39] drm/i915: Added scheduler interrupt handler hook
  2015-07-17 14:33 [RFC 00/39] GPU scheduler for i915 driver John.C.Harrison
                   ` (16 preceding siblings ...)
  2015-07-17 14:33 ` [RFC 17/39] drm/i915: Hook scheduler node clean up into retire requests John.C.Harrison
@ 2015-07-17 14:33 ` John.C.Harrison
  2015-07-17 14:33 ` [RFC 19/39] drm/i915: Added scheduler support to __wait_request() calls John.C.Harrison
                   ` (21 subsequent siblings)
  39 siblings, 0 replies; 48+ messages in thread
From: John.C.Harrison @ 2015-07-17 14:33 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The scheduler needs to be informed of each batch buffer completion. This is done
via the user interrupt mechanism. The epilogue of each batch buffer submission
updates a sequence number value (seqno) and triggers a user interrupt.

This change hooks the scheduler into the processing of that interrupt via the
notify_ring() function. The scheduler also has clean-up code that needs to be
done outside of the interrupt context, so notify_ring() now also pokes the
scheduler's work queue.

Change-Id: I4724b3ad7782453a244f84744d54bf14f5b65a38
For: VIZ-1587
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_irq.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index f67d09b..40a2eff 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -36,6 +36,7 @@
 #include "i915_drv.h"
 #include "i915_trace.h"
 #include "intel_drv.h"
+#include "i915_scheduler.h"
 
 /**
  * DOC: interrupt handling
@@ -851,6 +852,8 @@ static void notify_ring(struct intel_engine_cs *ring)
 	if (!intel_ring_initialized(ring))
 		return;
 
+	i915_scheduler_handle_irq(ring);
+
 	i915_gem_request_notify(ring);
 
 	wake_up_all(&ring->irq_queue);
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [RFC 19/39] drm/i915: Added scheduler support to __wait_request() calls
  2015-07-17 14:33 [RFC 00/39] GPU scheduler for i915 driver John.C.Harrison
                   ` (17 preceding siblings ...)
  2015-07-17 14:33 ` [RFC 18/39] drm/i915: Added scheduler interrupt handler hook John.C.Harrison
@ 2015-07-17 14:33 ` John.C.Harrison
  2015-07-21  9:27   ` Daniel Vetter
  2015-07-17 14:33 ` [RFC 20/39] drm/i915: Added scheduler support to page fault handler John.C.Harrison
                   ` (20 subsequent siblings)
  39 siblings, 1 reply; 48+ messages in thread
From: John.C.Harrison @ 2015-07-17 14:33 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The scheduler can cause batch buffers, and hence requests, to be submitted to
the ring out of order and asynchronously to their submission to the driver. Thus
at the point of waiting for the completion of a given request, it is not even
guaranteed that the request has actually been sent to the hardware yet. Even if
it has been sent, it is possible that it could be pre-empted and thus 'unsent'.

This means that it is necessary to be able to submit requests to the hardware
during the wait call itself. Unfortunately, while some callers of
__wait_request() release the mutex lock first, others do not (and apparently
cannot). Hence there is the potential to deadlock: the wait stalls waiting for
submission while the asynchronous submission stalls waiting for the mutex lock.

This change hooks the scheduler into the __wait_request() code to ensure
correct behaviour. That is, flush the target batch buffer through to the
hardware and do not deadlock waiting for something that cannot currently be
submitted. Instead, the wait call must return EAGAIN at least as far back as
necessary to release the mutex lock and allow the scheduler's asynchronous
processing to get in and handle the pre-emption operation and eventually
(re-)submit the work.
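
For callers that hold struct_mutex, the expected usage is roughly the
following (a sketch only; the precise unwind point depends on the caller):

	ret = __i915_wait_request(req, reset_counter, interruptible,
				  NULL, NULL, true /* is_locked */);
	if (ret == -EAGAIN) {
		/*
		 * Unwind far enough to drop dev->struct_mutex and retry the
		 * whole operation later; the scheduler's work handler needs
		 * the mutex in order to (re)submit the batch being waited on.
		 */
		return ret;
	}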

Change-Id: I31fe6bc7e38f6ffdd843fcae16e7cc8b1e52a931
For: VIZ-1587
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h         |  3 +-
 drivers/gpu/drm/i915/i915_gem.c         | 37 +++++++++++---
 drivers/gpu/drm/i915/i915_scheduler.c   | 91 +++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_scheduler.h   |  2 +
 drivers/gpu/drm/i915/intel_display.c    |  3 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c |  2 +-
 6 files changed, 128 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 2b3fab6..e9e0736 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2972,7 +2972,8 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 			unsigned reset_counter,
 			bool interruptible,
 			s64 *timeout,
-			struct intel_rps_client *rps);
+			struct intel_rps_client *rps,
+			bool is_locked);
 int __must_check i915_wait_request(struct drm_i915_gem_request *req);
 int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf);
 int __must_check
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index cb5af5d..f713cda 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1219,7 +1219,8 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 			unsigned reset_counter,
 			bool interruptible,
 			s64 *timeout,
-			struct intel_rps_client *rps)
+			struct intel_rps_client *rps,
+			bool is_locked)
 {
 	struct intel_engine_cs *ring = i915_gem_request_get_ring(req);
 	struct drm_device *dev = ring->dev;
@@ -1229,8 +1230,10 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 	DEFINE_WAIT(wait);
 	unsigned long timeout_expire;
 	s64 before, now;
-	int ret;
+	int ret = 0;
+	bool    busy;
 
+	might_sleep();
 	WARN(!intel_irqs_enabled(dev_priv), "IRQs disabled");
 
 	if (list_empty(&req->list))
@@ -1281,6 +1284,22 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 			break;
 		}
 
+		if (is_locked) {
+			/* If this request is being processed by the scheduler
+			 * then it is unsafe to sleep with the mutex lock held
+			 * as the scheduler may require the lock in order to
+			 * progress the request. */
+			if (i915_scheduler_is_request_tracked(req, NULL, &busy)) {
+				if (busy) {
+					ret = -EAGAIN;
+					break;
+				}
+			}
+
+			/* If the request is not tracked by the scheduler then the
+			 * regular test can be done. */
+		}
+
 		if (i915_gem_request_completed(req)) {
 			ret = 0;
 			break;
@@ -1452,13 +1471,17 @@ i915_wait_request(struct drm_i915_gem_request *req)
 
 	BUG_ON(!mutex_is_locked(&dev->struct_mutex));
 
+	ret = i915_scheduler_flush_request(req, true);
+	if (ret < 0)
+		return ret;
+
 	ret = i915_gem_check_wedge(&dev_priv->gpu_error, interruptible);
 	if (ret)
 		return ret;
 
 	ret = __i915_wait_request(req,
 				  atomic_read(&dev_priv->gpu_error.reset_counter),
-				  interruptible, NULL, NULL);
+				  interruptible, NULL, NULL, true);
 	if (ret)
 		return ret;
 
@@ -1571,7 +1594,7 @@ i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
 	mutex_unlock(&dev->struct_mutex);
 	for (i = 0; ret == 0 && i < n; i++)
 		ret = __i915_wait_request(requests[i], reset_counter, true,
-					  NULL, rps);
+					  NULL, rps, false);
 	mutex_lock(&dev->struct_mutex);
 
 	for (i = 0; i < n; i++) {
@@ -3443,7 +3466,7 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 		if (ret == 0)
 			ret = __i915_wait_request(req[i], reset_counter, true,
 						  args->timeout_ns > 0 ? &args->timeout_ns : NULL,
-						  file->driver_priv);
+						  file->driver_priv, false);
 		i915_gem_request_unreference(req[i]);
 	}
 	return ret;
@@ -3476,7 +3499,7 @@ __i915_gem_object_sync(struct drm_i915_gem_object *obj,
 					  atomic_read(&i915->gpu_error.reset_counter),
 					  i915->mm.interruptible,
 					  NULL,
-					  &i915->rps.semaphores);
+					  &i915->rps.semaphores, true);
 		if (ret)
 			return ret;
 
@@ -4683,7 +4706,7 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
 	if (target == NULL)
 		return 0;
 
-	ret = __i915_wait_request(target, reset_counter, true, NULL, NULL);
+	ret = __i915_wait_request(target, reset_counter, true, NULL, NULL, false);
 	if (ret == 0)
 		queue_delayed_work(dev_priv->wq, &dev_priv->mm.retire_work, 0);
 
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index df2e27f..811cbe4 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -31,6 +31,8 @@ static int         i915_scheduler_remove_dependent(struct i915_scheduler *schedu
 						   struct i915_scheduler_queue_entry *remove);
 static int         i915_scheduler_submit(struct intel_engine_cs *ring,
 					 bool is_locked);
+static int         i915_scheduler_submit_max_priority(struct intel_engine_cs *ring,
+					       bool is_locked);
 static uint32_t    i915_scheduler_count_flying(struct i915_scheduler *scheduler,
 					       struct intel_engine_cs *ring);
 static void        i915_scheduler_priority_bump_clear(struct i915_scheduler *scheduler);
@@ -600,6 +602,57 @@ void i915_gem_scheduler_work_handler(struct work_struct *work)
 	mutex_unlock(&dev->struct_mutex);
 }
 
+int i915_scheduler_flush_request(struct drm_i915_gem_request *req,
+				 bool is_locked)
+{
+	struct drm_i915_private            *dev_priv;
+	struct i915_scheduler              *scheduler;
+	unsigned long       flags;
+	int                 flush_count;
+	uint32_t            ring_id;
+
+	if (!req)
+		return -EINVAL;
+
+	dev_priv  = req->ring->dev->dev_private;
+	scheduler = dev_priv->scheduler;
+
+	if (!scheduler)
+		return 0;
+
+	if (!req->scheduler_qe)
+		return 0;
+
+	if (!I915_SQS_IS_QUEUED(req->scheduler_qe))
+		return 0;
+
+	ring_id = req->ring->id;
+	if (is_locked && (scheduler->flags[ring_id] & i915_sf_submitting)) {
+		/* Scheduler is busy already submitting another batch,
+		 * come back later rather than going recursive... */
+		return -EAGAIN;
+	}
+
+	if (list_empty(&scheduler->node_queue[ring_id]))
+		return 0;
+
+	spin_lock_irqsave(&scheduler->lock, flags);
+
+	i915_scheduler_priority_bump_clear(scheduler);
+
+	flush_count = i915_scheduler_priority_bump(scheduler,
+			    req->scheduler_qe, scheduler->priority_level_max);
+
+	spin_unlock_irqrestore(&scheduler->lock, flags);
+
+	if (flush_count) {
+		DRM_DEBUG_DRIVER("<%s> Bumped %d entries\n", req->ring->name, flush_count);
+		flush_count = i915_scheduler_submit_max_priority(req->ring, is_locked);
+	}
+
+	return flush_count;
+}
+
 static void i915_scheduler_priority_bump_clear(struct i915_scheduler *scheduler)
 {
 	struct i915_scheduler_queue_entry *node;
@@ -653,6 +706,44 @@ static int i915_scheduler_priority_bump(struct i915_scheduler *scheduler,
 	return count;
 }
 
+static int i915_scheduler_submit_max_priority(struct intel_engine_cs *ring,
+					      bool is_locked)
+{
+	struct i915_scheduler_queue_entry  *node;
+	struct drm_i915_private            *dev_priv = ring->dev->dev_private;
+	struct i915_scheduler              *scheduler = dev_priv->scheduler;
+	unsigned long	flags;
+	int             ret, count = 0;
+	bool            found;
+
+	do {
+		found = false;
+		spin_lock_irqsave(&scheduler->lock, flags);
+		list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
+			if (!I915_SQS_IS_QUEUED(node))
+				continue;
+
+			if (node->priority < scheduler->priority_level_max)
+				continue;
+
+			found = true;
+			break;
+		}
+		spin_unlock_irqrestore(&scheduler->lock, flags);
+
+		if (!found)
+			break;
+
+		ret = i915_scheduler_submit(ring, is_locked);
+		if (ret < 0)
+			return ret;
+
+		count += ret;
+	} while (found);
+
+	return count;
+}
+
 static int i915_scheduler_pop_from_queue_locked(struct intel_engine_cs *ring,
 				    struct i915_scheduler_queue_entry **pop_node,
 				    unsigned long *flags)
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 73c5e7d..fcf2640 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -92,6 +92,8 @@ void        i915_gem_scheduler_clean_node(struct i915_scheduler_queue_entry *nod
 int         i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe);
 int         i915_scheduler_handle_irq(struct intel_engine_cs *ring);
 void        i915_gem_scheduler_work_handler(struct work_struct *work);
+int         i915_scheduler_flush_request(struct drm_i915_gem_request *req,
+					 bool is_locked);
 bool        i915_scheduler_is_request_tracked(struct drm_i915_gem_request *req,
 					      bool *completed, bool *busy);
 
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 9629dab..129c829 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -11291,7 +11291,8 @@ static void intel_mmio_flip_work_func(struct work_struct *work)
 		WARN_ON(__i915_wait_request(mmio_flip->req,
 					    mmio_flip->crtc->reset_counter,
 					    false, NULL,
-					    &mmio_flip->i915->rps.mmioflips));
+					    &mmio_flip->i915->rps.mmioflips,
+					    false));
 
 	intel_do_mmio_flip(mmio_flip->crtc);
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index df0cd48..e0992b7 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2193,7 +2193,7 @@ int intel_ring_idle(struct intel_engine_cs *ring)
 	return __i915_wait_request(req,
 				   atomic_read(&to_i915(ring->dev)->gpu_error.reset_counter),
 				   to_i915(ring->dev)->mm.interruptible,
-				   NULL, NULL);
+				   NULL, NULL, true);
 }
 
 int intel_ring_alloc_request_extras(struct drm_i915_gem_request *request)
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [RFC 20/39] drm/i915: Added scheduler support to page fault handler
  2015-07-17 14:33 [RFC 00/39] GPU scheduler for i915 driver John.C.Harrison
                   ` (18 preceding siblings ...)
  2015-07-17 14:33 ` [RFC 19/39] drm/i915: Added scheduler support to __wait_request() calls John.C.Harrison
@ 2015-07-17 14:33 ` John.C.Harrison
  2015-07-17 14:33 ` [RFC 21/39] drm/i915: Added scheduler flush calls to ring throttle and idle functions John.C.Harrison
                   ` (19 subsequent siblings)
  39 siblings, 0 replies; 48+ messages in thread
From: John.C.Harrison @ 2015-07-17 14:33 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

GPU page faults can now require scheduler operation in order to complete. For
example, in order to free up sufficient memory to handle the fault, the handler
must wait for a batch buffer to complete that has not even been sent to the
hardware yet. Thus EAGAIN no longer necessarily means a GPU hang; it can also
occur under normal operation.

Change-Id: Iff6bd2744ef12bb7405fbcd6b43c286caad4141f
For: VIZ-1587
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index f713cda..dd9ebbe 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1919,10 +1919,15 @@ out:
 		}
 	case -EAGAIN:
 		/*
-		 * EAGAIN means the gpu is hung and we'll wait for the error
-		 * handler to reset everything when re-faulting in
+		 * EAGAIN can mean the gpu is hung and we'll have to wait for
+		 * the error handler to reset everything when re-faulting in
 		 * i915_mutex_lock_interruptible.
+		 *
+		 * It can also indicate various other nonfatal errors for which
+		 * the best response is to give other threads a chance to run,
+		 * and then retry the failing operation in its entirety.
 		 */
+		/*FALLTHRU*/
 	case 0:
 	case -ERESTARTSYS:
 	case -EINTR:
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [RFC 21/39] drm/i915: Added scheduler flush calls to ring throttle and idle functions
  2015-07-17 14:33 [RFC 00/39] GPU scheduler for i915 driver John.C.Harrison
                   ` (19 preceding siblings ...)
  2015-07-17 14:33 ` [RFC 20/39] drm/i915: Added scheduler support to page fault handler John.C.Harrison
@ 2015-07-17 14:33 ` John.C.Harrison
  2015-07-17 14:33 ` [RFC 22/39] drm/i915: Add scheduler hook to GPU reset John.C.Harrison
                   ` (18 subsequent siblings)
  39 siblings, 0 replies; 48+ messages in thread
From: John.C.Harrison @ 2015-07-17 14:33 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

When requesting that all GPU work is completed, it is now necessary to get the
scheduler involved in order to flush out work that is queued but not yet submitted.

Change-Id: I95dcc2a2ee5c1a844748621c333994ddd6cf6a66
For: VIZ-1587
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c       | 17 ++++++++++++-
 drivers/gpu/drm/i915/i915_scheduler.c | 45 +++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_scheduler.h |  1 +
 3 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index dd9ebbe..6142e68 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3710,6 +3710,10 @@ int i915_gpu_idle(struct drm_device *dev)
 
 	/* Flush everything onto the inactive list. */
 	for_each_ring(ring, dev_priv, i) {
+		ret = i915_scheduler_flush(ring, true);
+		if (ret < 0)
+			return ret;
+
 		if (!i915.enable_execlists) {
 			struct drm_i915_gem_request *req;
 
@@ -4679,7 +4683,8 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
 	unsigned long recent_enough = jiffies - DRM_I915_THROTTLE_JIFFIES;
 	struct drm_i915_gem_request *request, *target = NULL;
 	unsigned reset_counter;
-	int ret;
+	int i, ret;
+	struct intel_engine_cs *ring;
 
 	ret = i915_gem_wait_for_error(&dev_priv->gpu_error);
 	if (ret)
@@ -4689,6 +4694,16 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
 	if (ret)
 		return ret;
 
+	for_each_ring(ring, dev_priv, i) {
+		/* Need a mechanism to flush out scheduler entries that were
+		 * submitted more than 'recent_enough' time ago as well! In the
+		 * meantime, just flush everything out to ensure that entries
+		 * can not sit around indefinitely. */
+		ret = i915_scheduler_flush(ring, false);
+		if (ret < 0)
+			return ret;
+	}
+
 	spin_lock(&file_priv->mm.lock);
 	list_for_each_entry(request, &file_priv->mm.request_list, client_list) {
 		if (time_after_eq(request->emitted_jiffies, recent_enough))
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 811cbe4..73c9ba6 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -653,6 +653,51 @@ int i915_scheduler_flush_request(struct drm_i915_gem_request *req,
 	return flush_count;
 }
 
+int i915_scheduler_flush(struct intel_engine_cs *ring, bool is_locked)
+{
+	struct i915_scheduler_queue_entry *node;
+	struct drm_i915_private           *dev_priv;
+	struct i915_scheduler             *scheduler;
+	unsigned long       flags;
+	bool        found;
+	int         ret;
+	uint32_t    count = 0;
+
+	if (!ring)
+		return -EINVAL;
+
+	dev_priv  = ring->dev->dev_private;
+	scheduler = dev_priv->scheduler;
+
+	if (!scheduler)
+		return 0;
+
+	BUG_ON(is_locked && (scheduler->flags[ring->id] & i915_sf_submitting));
+
+	do {
+		found = false;
+		spin_lock_irqsave(&scheduler->lock, flags);
+		list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
+			if (!I915_SQS_IS_QUEUED(node))
+				continue;
+
+			found = true;
+			break;
+		}
+		spin_unlock_irqrestore(&scheduler->lock, flags);
+
+		if (found) {
+			ret = i915_scheduler_submit(ring, is_locked);
+			if (ret < 0)
+				return ret;
+
+			count += ret;
+		}
+	} while (found);
+
+	return count;
+}
+
 static void i915_scheduler_priority_bump_clear(struct i915_scheduler *scheduler)
 {
 	struct i915_scheduler_queue_entry *node;
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index fcf2640..5e094d5 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -92,6 +92,7 @@ void        i915_gem_scheduler_clean_node(struct i915_scheduler_queue_entry *nod
 int         i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe);
 int         i915_scheduler_handle_irq(struct intel_engine_cs *ring);
 void        i915_gem_scheduler_work_handler(struct work_struct *work);
+int         i915_scheduler_flush(struct intel_engine_cs *ring, bool is_locked);
 int         i915_scheduler_flush_request(struct drm_i915_gem_request *req,
 					 bool is_locked);
 bool        i915_scheduler_is_request_tracked(struct drm_i915_gem_request *req,
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [RFC 22/39] drm/i915: Add scheduler hook to GPU reset
  2015-07-17 14:33 [RFC 00/39] GPU scheduler for i915 driver John.C.Harrison
                   ` (20 preceding siblings ...)
  2015-07-17 14:33 ` [RFC 21/39] drm/i915: Added scheduler flush calls to ring throttle and idle functions John.C.Harrison
@ 2015-07-17 14:33 ` John.C.Harrison
  2015-07-17 14:33 ` [RFC 23/39] drm/i915: Added a module parameter for allowing scheduler overrides John.C.Harrison
                   ` (17 subsequent siblings)
  39 siblings, 0 replies; 48+ messages in thread
From: John.C.Harrison @ 2015-07-17 14:33 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

When the watchdog resets the GPU, the scheduler needs to know so that it can
clean up its view of the world. All in-flight nodes must be marked as dead so
that the scheduler does not wait forever for them to complete. Also, all queued
nodes must be marked as dead so that the scheduler does not deadlock the reset
code by saying that the ring cannot be idled and that it must come back again
later.

Change-Id: I184eb59c5c1a1385f9c17db66c7cc46f8904eebd
For: VIZ-1587
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c       |  2 ++
 drivers/gpu/drm/i915/i915_scheduler.c | 63 ++++++++++++++++++++++++++++++++---
 drivers/gpu/drm/i915/i915_scheduler.h |  8 ++++-
 3 files changed, 68 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 6142e68..6d72caa 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3187,6 +3187,8 @@ void i915_gem_reset(struct drm_device *dev)
 	struct intel_engine_cs *ring;
 	int i;
 
+	i915_scheduler_kill_all(dev);
+
 	/*
 	 * Before we free the objects from the requests, we need to inspect
 	 * them for finding the guilty party. As the requests only borrow
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 73c9ba6..3155f42 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -341,6 +341,56 @@ static void i915_scheduler_node_kill(struct i915_scheduler_queue_entry *node)
 	node->status = i915_sqs_dead;
 }
 
+/* Abandon a queued node completely. For example because the driver is being
+ * reset and it is not valid to preserve absolutely any state at all across the
+ * reinitialisation sequence. */
+static void i915_scheduler_node_kill_queued(struct i915_scheduler_queue_entry *node)
+{
+	BUG_ON(!node);
+	BUG_ON(!I915_SQS_IS_QUEUED(node));
+
+	node->status = i915_sqs_dead;
+}
+
+/* The system is toast. Terminate all nodes with extreme prejudice. */
+void i915_scheduler_kill_all(struct drm_device *dev)
+{
+	struct i915_scheduler_queue_entry   *node;
+	struct drm_i915_private             *dev_priv = dev->dev_private;
+	struct i915_scheduler               *scheduler = dev_priv->scheduler;
+	unsigned long   flags;
+	int             r;
+
+	spin_lock_irqsave(&scheduler->lock, flags);
+
+	for (r = 0; r < I915_NUM_RINGS; r++) {
+		list_for_each_entry(node, &scheduler->node_queue[r], link) {
+			switch (node->status) {
+			case I915_SQS_CASE_COMPLETE:
+			break;
+
+			case I915_SQS_CASE_FLYING:
+				i915_scheduler_node_kill(node);
+			break;
+
+			case I915_SQS_CASE_QUEUED:
+				i915_scheduler_node_kill_queued(node);
+			break;
+
+			default:
+				/* Wot no state?! */
+				BUG();
+			}
+		}
+	}
+
+	memset(scheduler->last_irq_seqno, 0x00, sizeof(scheduler->last_irq_seqno));
+
+	spin_unlock_irqrestore(&scheduler->lock, flags);
+
+	queue_work(dev_priv->wq, &dev_priv->mm.scheduler_work);
+}
+
 /*
  * The batch tagged with the indicated seqence number has completed.
  * Search the queue for it, update its status and those of any batches
@@ -912,7 +962,7 @@ static int i915_scheduler_submit(struct intel_engine_cs *ring, bool was_locked)
 		scheduler->flags[ring->id] &= ~i915_sf_submitting;
 
 		if (ret) {
-			bool requeue = true;
+			int requeue = 1;
 
 			/* Oh dear! Either the node is broken or the ring is
 			 * busy. So need to kill the node or requeue it and try
@@ -922,7 +972,7 @@ static int i915_scheduler_submit(struct intel_engine_cs *ring, bool was_locked)
 			case ENODEV:
 			case ENOENT:
 				/* Fatal errors. Kill the node. */
-				requeue = false;
+				requeue = -1;
 			break;
 
 			case EAGAIN:
@@ -941,13 +991,18 @@ static int i915_scheduler_submit(struct intel_engine_cs *ring, bool was_locked)
 			break;
 			}
 
-			if (requeue) {
+			/* Check that the watchdog/reset code has not nuked
+			 * the node while we weren't looking: */
+			if (node->status == i915_sqs_dead)
+				requeue = 0;
+
+			if (requeue == 1) {
 				i915_scheduler_node_requeue(node);
 				/* No point spinning if the ring is currently
 				 * unavailable so just give up and come back
 				 * later. */
 				break;
-			} else
+			} else if (requeue == -1)
 				i915_scheduler_node_kill(node);
 		}
 
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 5e094d5..b440e62 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -36,7 +36,7 @@ enum i915_scheduler_queue_status {
 	i915_sqs_flying,
 	/* Finished processing on the hardware: */
 	i915_sqs_complete,
-	/* Killed by catastrophic submission failure: */
+	/* Killed by watchdog or catastrophic submission failure: */
 	i915_sqs_dead,
 	/* Limit value for use with arrays/loops */
 	i915_sqs_MAX
@@ -47,6 +47,11 @@ enum i915_scheduler_queue_status {
 #define I915_SQS_IS_COMPLETE(node)	(((node)->status == i915_sqs_complete) || \
 					 ((node)->status == i915_sqs_dead))
 
+#define I915_SQS_CASE_QUEUED		i915_sqs_queued
+#define I915_SQS_CASE_FLYING		i915_sqs_flying
+#define I915_SQS_CASE_COMPLETE		i915_sqs_complete:		\
+					case i915_sqs_dead
+
 struct i915_scheduler_obj_entry {
 	struct drm_i915_gem_object          *obj;
 };
@@ -91,6 +96,7 @@ int         i915_scheduler_closefile(struct drm_device *dev,
 void        i915_gem_scheduler_clean_node(struct i915_scheduler_queue_entry *node);
 int         i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe);
 int         i915_scheduler_handle_irq(struct intel_engine_cs *ring);
+void        i915_scheduler_kill_all(struct drm_device *dev);
 void        i915_gem_scheduler_work_handler(struct work_struct *work);
 int         i915_scheduler_flush(struct intel_engine_cs *ring, bool is_locked);
 int         i915_scheduler_flush_request(struct drm_i915_gem_request *req,
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [RFC 23/39] drm/i915: Added a module parameter for allowing scheduler overrides
  2015-07-17 14:33 [RFC 00/39] GPU scheduler for i915 driver John.C.Harrison
                   ` (21 preceding siblings ...)
  2015-07-17 14:33 ` [RFC 22/39] drm/i915: Add scheduler hook to GPU reset John.C.Harrison
@ 2015-07-17 14:33 ` John.C.Harrison
  2015-07-17 14:33 ` [RFC 24/39] drm/i915: Support for 'unflushed' ring idle John.C.Harrison
                   ` (16 subsequent siblings)
  39 siblings, 0 replies; 48+ messages in thread
From: John.C.Harrison @ 2015-07-17 14:33 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

It can be useful to be able to disable certain features (e.g. the entire
scheduler) via a module parameter for debugging purposes. A module parameter
has the advantage over a compile-time switch of not requiring a rebuild, while
still not implying that the value can be changed dynamically at runtime.
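
For example (assuming the parameter keeps the name used below), the direct
submission bypass could be disabled at module load time with:

	modprobe i915 scheduler_override=0

or with i915.scheduler_override=0 on the kernel command line.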

Change-Id: I92f4c832be88f5b34b49b90d6a9903fac68f7004
For: VIZ-1587
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h       | 1 +
 drivers/gpu/drm/i915/i915_params.c    | 4 ++++
 drivers/gpu/drm/i915/i915_scheduler.c | 7 +++++--
 drivers/gpu/drm/i915/i915_scheduler.h | 5 +++++
 4 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index e9e0736..30552cc 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2664,6 +2664,7 @@ struct i915_params {
 	bool verbose_state_checks;
 	bool nuclear_pageflip;
 	int edp_vswing;
+	int scheduler_override;
 };
 extern struct i915_params i915 __read_mostly;
 
diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
index 7983fe4..a5320ff 100644
--- a/drivers/gpu/drm/i915/i915_params.c
+++ b/drivers/gpu/drm/i915/i915_params.c
@@ -53,6 +53,7 @@ struct i915_params i915 __read_mostly = {
 	.verbose_state_checks = 1,
 	.nuclear_pageflip = 0,
 	.edp_vswing = 0,
+	.scheduler_override = 1,
 };
 
 module_param_named(modeset, i915.modeset, int, 0400);
@@ -186,3 +187,6 @@ MODULE_PARM_DESC(edp_vswing,
 		 "Ignore/Override vswing pre-emph table selection from VBT "
 		 "(0=use value from vbt [default], 1=low power swing(200mV),"
 		 "2=default swing(400mV))");
+
+module_param_named(scheduler_override, i915.scheduler_override, int, 0600);
+MODULE_PARM_DESC(scheduler_override, "Scheduler override mask (0 = none, 1 = direct submission [default])");
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 3155f42..224c8b4 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -44,6 +44,9 @@ bool i915_scheduler_is_enabled(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
+	if (i915.scheduler_override & i915_so_direct_submit)
+		return false;
+
 	return dev_priv->scheduler != NULL;
 }
 
@@ -92,7 +95,7 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 
 	BUG_ON(!scheduler);
 
-	if (1/*i915.scheduler_override & i915_so_direct_submit*/) {
+	if (i915.scheduler_override & i915_so_direct_submit) {
 		int ret;
 
 		qe->scheduler_index = scheduler->index++;
@@ -466,7 +469,7 @@ int i915_scheduler_handle_irq(struct intel_engine_cs *ring)
 
 	seqno = ring->get_seqno(ring, false);
 
-	if (1/*i915.scheduler_override & i915_so_direct_submit*/)
+	if (i915.scheduler_override & i915_so_direct_submit)
 		return 0;
 
 	if (seqno == scheduler->last_irq_seqno[ring->id]) {
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index b440e62..7d743c9 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -89,6 +89,11 @@ enum {
 	i915_sf_submitting          = (1 << 1),
 };
 
+/* Options for 'scheduler_override' module parameter: */
+enum {
+	i915_so_direct_submit       = (1 << 0),
+};
+
 bool        i915_scheduler_is_enabled(struct drm_device *dev);
 int         i915_scheduler_init(struct drm_device *dev);
 int         i915_scheduler_closefile(struct drm_device *dev,
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [RFC 24/39] drm/i915: Support for 'unflushed' ring idle
  2015-07-17 14:33 [RFC 00/39] GPU scheduler for i915 driver John.C.Harrison
                   ` (22 preceding siblings ...)
  2015-07-17 14:33 ` [RFC 23/39] drm/i915: Added a module parameter for allowing scheduler overrides John.C.Harrison
@ 2015-07-17 14:33 ` John.C.Harrison
  2015-07-21  8:50   ` Daniel Vetter
  2015-07-17 14:33 ` [RFC 25/39] drm/i915: Defer seqno allocation until actual hardware submission time John.C.Harrison
                   ` (15 subsequent siblings)
  39 siblings, 1 reply; 48+ messages in thread
From: John.C.Harrison @ 2015-07-17 14:33 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

When the seqno wraps around zero, the entire GPU is forced to be idle
for some reason (possibly only to work around issues with hardware
semaphores but no-one seems too sure!). This causes a problem if the
force idle occurs at an inopportune moment such as in the middle of
submitting a batch buffer. Specifically, it would lead to recursive
submits - submitting work requires a new seqno, the new seqno requires
idling the ring, idling the ring requires submitting work, submitting
work requires a new seqno...

This change adds a 'flush' parameter to the idle function call which
specifies whether the scheduler queues should be flushed out. I.e. is
the call intended to just idle the ring as it is right now (no flush)
or is it intended to force all outstanding work out of the system
(with flush).

In the seqno wrap case, pending work is not an issue because the next
operation will be to submit it. However, in other cases, the intention
is to make sure everything that could be done has been done.
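
In code terms the two flavours end up as follows (a sketch; the actual call
sites are in the hunks below):

	/* Seqno wrap path: idle the ring but leave the scheduler queues
	 * alone, otherwise the flush would recurse back into submission. */
	ret = intel_ring_idle(ring, false);

	/* 'Drain everything' paths (i915_gpu_idle(), ring stop, etc.): flush
	 * the scheduler first so queued work actually reaches the hardware. */
	ret = intel_ring_idle(ring, true);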

Change-Id: I182e9a5853666c64ecc9e84d8a8b820a7f8e8836
For: VIZ-1587
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c         |  4 ++--
 drivers/gpu/drm/i915/intel_lrc.c        |  2 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c | 17 +++++++++++++++--
 drivers/gpu/drm/i915/intel_ringbuffer.h |  2 +-
 4 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 6d72caa..20c696f 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2474,7 +2474,7 @@ i915_gem_init_seqno(struct drm_device *dev, u32 seqno)
 
 	/* Carefully retire all requests without writing to the rings */
 	for_each_ring(ring, dev_priv, i) {
-		ret = intel_ring_idle(ring);
+		ret = intel_ring_idle(ring, false);
 		if (ret)
 			return ret;
 	}
@@ -3732,7 +3732,7 @@ int i915_gpu_idle(struct drm_device *dev)
 			i915_add_request_no_flush(req);
 		}
 
-		ret = intel_ring_idle(ring);
+		ret = intel_ring_idle(ring, true);
 		if (ret)
 			return ret;
 	}
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 76d5023..a811d0b 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -990,7 +990,7 @@ void intel_logical_ring_stop(struct intel_engine_cs *ring)
 	if (!intel_ring_initialized(ring))
 		return;
 
-	ret = intel_ring_idle(ring);
+	ret = intel_ring_idle(ring, true);
 	if (ret && !i915_reset_in_progress(&to_i915(ring->dev)->gpu_error))
 		DRM_ERROR("failed to quiesce %s whilst cleaning up: %d\n",
 			  ring->name, ret);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index e0992b7..afb04de 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2177,9 +2177,22 @@ static void __wrap_ring_buffer(struct intel_ringbuffer *ringbuf)
 	intel_ring_update_space(ringbuf);
 }
 
-int intel_ring_idle(struct intel_engine_cs *ring)
+int intel_ring_idle(struct intel_engine_cs *ring, bool flush)
 {
 	struct drm_i915_gem_request *req;
+	int ret;
+
+	/*
+	 * NB: Must not flush the scheduler if this idle request is from
+	 * within an execbuff submission (i.e. due to 'get_seqno' calling
+	 * 'wrap_seqno' calling 'idle'). As that would lead to recursive
+	 * flushes!
+	 */
+	if (flush) {
+		ret = i915_scheduler_flush(ring, true);
+		if (ret)
+			return ret;
+	}
 
 	/* Wait upon the last request to be completed */
 	if (list_empty(&ring->request_list))
@@ -2983,7 +2996,7 @@ intel_stop_ring_buffer(struct intel_engine_cs *ring)
 	if (!intel_ring_initialized(ring))
 		return;
 
-	ret = intel_ring_idle(ring);
+	ret = intel_ring_idle(ring, true);
 	if (ret && !i915_reset_in_progress(&to_i915(ring->dev)->gpu_error))
 		DRM_ERROR("failed to quiesce %s whilst cleaning up: %d\n",
 			  ring->name, ret);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 9457774..2f30900 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -487,7 +487,7 @@ void intel_ring_update_space(struct intel_ringbuffer *ringbuf);
 int intel_ring_space(struct intel_ringbuffer *ringbuf);
 bool intel_ring_stopped(struct intel_engine_cs *ring);
 
-int __must_check intel_ring_idle(struct intel_engine_cs *ring);
+int __must_check intel_ring_idle(struct intel_engine_cs *ring, bool flush);
 void intel_ring_init_seqno(struct intel_engine_cs *ring, u32 seqno);
 int intel_ring_flush_all_caches(struct drm_i915_gem_request *req);
 int intel_ring_invalidate_all_caches(struct drm_i915_gem_request *req);
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [RFC 25/39] drm/i915: Defer seqno allocation until actual hardware submission time
  2015-07-17 14:33 [RFC 00/39] GPU scheduler for i915 driver John.C.Harrison
                   ` (23 preceding siblings ...)
  2015-07-17 14:33 ` [RFC 24/39] drm/i915: Support for 'unflushed' ring idle John.C.Harrison
@ 2015-07-17 14:33 ` John.C.Harrison
  2015-07-17 14:33 ` [RFC 26/39] drm/i915: Added immediate submission override to scheduler John.C.Harrison
                   ` (14 subsequent siblings)
  39 siblings, 0 replies; 48+ messages in thread
From: John.C.Harrison @ 2015-07-17 14:33 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The seqno value is now only used for the final test for completion of a request.
It is no longer used to track the request through the software stack. Thus it is
no longer necessary to allocate the seqno immediately with the request. Instead,
it can be done lazily and left until the request is actually sent to the
hardware. This is particularly advantageous with a GPU scheduler as the
requests can then be re-ordered between their creation and their hardware
submission without ending up with out-of-order seqnos.

v2: i915_add_request() can't fail!

v3: combine with 'drm/i915: Assign seqno at start of exec_final()'
Various bits of code along the execbuf code path need a seqno value to be
assigned to the request. This change makes that assignment explicit at the
start of submission_final() rather than relying on an auto-generated seqno
having been assigned already. This is in preparation for a future patch which
changes seqno values to be assigned lazily (during add_request).
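
The two-phase assignment can be sketched as follows (a standalone userspace
model with made-up names, not the driver code): an identifier is reserved at
request creation so the point of submission cannot fail, but it only becomes
the live seqno once the request actually reaches the hardware, so live
seqnos follow submission order even when requests are re-ordered in between.

/* Standalone model of reserved vs. live seqnos. Illustration only:
 * simplified, made-up structure, not the driver code.
 */
#include <stdint.h>
#include <stdio.h>

struct request {
	uint32_t reserved_seqno;	/* allocated up front, may go stale */
	uint32_t seqno;			/* zero until made live at submission */
};

static uint32_t last_seqno;

static uint32_t get_seqno(void)
{
	return ++last_seqno;
}

/* Request creation: reserve an identifier now so that the point of
 * submission cannot fail due to seqno allocation.
 */
static void request_alloc(struct request *req)
{
	req->reserved_seqno = get_seqno();
	req->seqno = 0;
}

/* Hardware submission: refresh the reservation if it has gone stale and
 * only now make it live. Live seqnos therefore follow submission order
 * rather than creation order.
 */
static void submission_final(struct request *req)
{
	if (req->reserved_seqno != last_seqno)
		req->reserved_seqno = get_seqno();
	req->seqno = req->reserved_seqno;
}

int main(void)
{
	struct request a, b;

	request_alloc(&a);		/* reserves 1 */
	request_alloc(&b);		/* reserves 2 */

	submission_final(&b);		/* scheduler picked b first: live seqno 2 */
	submission_final(&a);		/* a's reservation is stale: live seqno 3 */

	printf("b=%u a=%u\n", (unsigned)b.seqno, (unsigned)a.seqno);
	return 0;
}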

Change-Id: I0d922b84c517611a79fa6c2b9e730d4fe3671d6a
For: VIZ-1587
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h            |  1 +
 drivers/gpu/drm/i915/i915_gem.c            | 21 ++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 13 +++++++++++++
 drivers/gpu/drm/i915/intel_lrc.c           | 13 +++++++++++++
 4 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 30552cc..12b4986 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2201,6 +2201,7 @@ struct drm_i915_gem_request {
 
 	/** GEM sequence number associated with this request. */
 	uint32_t seqno;
+	uint32_t reserved_seqno;
 
 	/* Unique identifier which can be used for trace points & debug */
 	uint32_t uniq;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 20c696f..7308838 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2524,6 +2524,9 @@ i915_gem_get_seqno(struct drm_device *dev, u32 *seqno)
 
 	/* reserve 0 for non-seqno */
 	if (dev_priv->next_seqno == 0) {
+		/* Why is the full re-initialisation required? Is it only for
+		 * hardware semaphores? If so, could skip it in the case where
+		 * semaphores are disabled? */
 		int ret = i915_gem_init_seqno(dev, 0);
 		if (ret)
 			return ret;
@@ -2581,6 +2584,12 @@ void __i915_add_request(struct drm_i915_gem_request *request,
 		WARN(ret, "*_ring_flush_all_caches failed: %d!\n", ret);
 	}
 
+	/* Make the request's seqno 'live': */
+	if(!request->seqno) {
+		request->seqno = request->reserved_seqno;
+		WARN_ON(request->seqno != dev_priv->last_seqno);
+	}
+
 	/* Record the position of the start of the request so that
 	 * should we detect the updated seqno part-way through the
 	 * GPU processing the request, we never over-estimate the
@@ -2821,6 +2830,9 @@ void i915_gem_request_notify(struct intel_engine_cs *ring)
 				if (!complete)
 					continue;
 			} else {
+				/* How can this happen? */
+				WARN_ON(req->seqno == 0);
+
 				if (!i915_seqno_passed(seqno, req->seqno))
 					continue;
 			}
@@ -3009,7 +3021,14 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 	if (req == NULL)
 		return -ENOMEM;
 
-	ret = i915_gem_get_seqno(ring->dev, &req->seqno);
+	/*
+	 * Assign an identifier to track this request through the hardware
+	 * but don't make it live yet. It could change in the future if this
+	 * request gets overtaken. However, it still needs to be allocated
+	 * in advance because the point of submission must not fail and seqno
+	 * allocation can fail.
+	 */
+	ret = i915_gem_get_seqno(ring->dev, &req->reserved_seqno);
 	if (ret)
 		goto err;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 61a5498..1642701 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1317,6 +1317,19 @@ int i915_gem_ringbuffer_submission_final(struct i915_execbuffer_params *params)
 	/* The mutex must be acquired before calling this function */
 	BUG_ON(!mutex_is_locked(&params->dev->struct_mutex));
 
+	/* Make sure the request's seqno is the latest and greatest: */
+	if(params->request->reserved_seqno != dev_priv->last_seqno) {
+		ret = i915_gem_get_seqno(ring->dev, &params->request->reserved_seqno);
+		if (ret)
+			return ret;
+	}
+	/*
+	 * And make it live because some of the execbuff submission code
+	 * requires the seqno to be available up front. */
+	WARN_ON(params->request->seqno);
+	params->request->seqno = params->request->reserved_seqno;
+	WARN_ON(params->request->seqno != dev_priv->last_seqno);
+
 	ret = intel_ring_reserve_space(params->request);
 	if (ret)
 		return ret;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index a811d0b..d3e7399 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -908,6 +908,19 @@ int intel_execlists_submission_final(struct i915_execbuffer_params *params)
 	/* The mutex must be acquired before calling this function */
 	BUG_ON(!mutex_is_locked(&params->dev->struct_mutex));
 
+	/* Make sure the request's seqno is the latest and greatest: */
+	if(params->request->reserved_seqno != dev_priv->last_seqno) {
+		ret = i915_gem_get_seqno(ring->dev, &params->request->reserved_seqno);
+		if (ret)
+			return ret;
+	}
+	/*
+	 * And make it live because some of the execbuff submission code
+	 * requires the seqno to be available up front. */
+	WARN_ON(params->request->seqno);
+	params->request->seqno = params->request->reserved_seqno;
+	WARN_ON(params->request->seqno != dev_priv->last_seqno);
+
 	ret = intel_logical_ring_reserve_space(params->request);
 	if (ret)
 		return ret;
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [RFC 26/39] drm/i915: Added immediate submission override to scheduler
  2015-07-17 14:33 [RFC 00/39] GPU scheduler for i915 driver John.C.Harrison
                   ` (24 preceding siblings ...)
  2015-07-17 14:33 ` [RFC 25/39] drm/i915: Defer seqno allocation until actual hardware submission time John.C.Harrison
@ 2015-07-17 14:33 ` John.C.Harrison
  2015-07-17 14:33 ` [RFC 27/39] drm/i915: Add sync wait support " John.C.Harrison
                   ` (13 subsequent siblings)
  39 siblings, 0 replies; 48+ messages in thread
From: John.C.Harrison @ 2015-07-17 14:33 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

To aid with debugging scheduler issues, it can be useful to ensure that all
batch buffers are submitted immediately rather than queued for later
submission. This change adds an override flag, set via the scheduler_override
module parameter, to force instant submission.
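
Roughly, the queueing decision becomes the following (a standalone sketch
with made-up names, not the driver code):

/* Standalone model of the submit-on-queue override. Illustration only:
 * simplified, made-up names, not the driver code.
 */
#include <stdbool.h>
#include <stdio.h>

enum {
	so_direct_submit	= (1 << 0),
	so_submit_on_queue	= (1 << 1),
};

static unsigned int scheduler_override = so_submit_on_queue;	/* module parameter */

static bool submit_now(unsigned int flying, unsigned int min_flying)
{
	if (scheduler_override & so_submit_on_queue)
		return true;			/* debug: never leave work queued */

	return flying < min_flying;		/* normal behaviour */
}

int main(void)
{
	/* Prints 1 with the override set, even though the hardware is busy */
	printf("%d\n", submit_now(5, 2));
	return 0;
}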

Change-Id: I7652df53e2d3c3d77d78bebcf99856e2c53f2801
For: VIZ-1587
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_scheduler.c | 7 +++++--
 drivers/gpu/drm/i915/i915_scheduler.h | 1 +
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 224c8b4..c7139f8 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -238,8 +238,11 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 
 	list_add_tail(&node->link, &scheduler->node_queue[ring->id]);
 
-	not_flying = i915_scheduler_count_flying(scheduler, ring) <
-						 scheduler->min_flying;
+	if (i915.scheduler_override & i915_so_submit_on_queue)
+		not_flying = true;
+	else
+		not_flying = i915_scheduler_count_flying(scheduler, ring) <
+							 scheduler->min_flying;
 
 	spin_unlock_irqrestore(&scheduler->lock, flags);
 
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 7d743c9..ce94b0b 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -92,6 +92,7 @@ enum {
 /* Options for 'scheduler_override' module parameter: */
 enum {
 	i915_so_direct_submit       = (1 << 0),
+	i915_so_submit_on_queue     = (1 << 1),
 };
 
 bool        i915_scheduler_is_enabled(struct drm_device *dev);
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [RFC 27/39] drm/i915: Add sync wait support to scheduler
  2015-07-17 14:33 [RFC 00/39] GPU scheduler for i915 driver John.C.Harrison
                   ` (25 preceding siblings ...)
  2015-07-17 14:33 ` [RFC 26/39] drm/i915: Added immediate submission override to scheduler John.C.Harrison
@ 2015-07-17 14:33 ` John.C.Harrison
  2015-07-21  9:59   ` Daniel Vetter
  2015-07-17 14:33 ` [RFC 28/39] drm/i915: Connecting execbuff fences " John.C.Harrison
                   ` (12 subsequent siblings)
  39 siblings, 1 reply; 48+ messages in thread
From: John.C.Harrison @ 2015-07-17 14:33 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

There is a sync framework (the Android native sync framework) that allows
work from multiple independent systems to be synchronised with each other
without stalling the CPU, whether in the application or the driver. This
patch adds support for this framework to the GPU scheduler.

Batch buffers can now have sync framework fence objects associated with
them. The scheduler looks at this fence when deciding what to submit next
to the hardware. If the fence is still outstanding then that batch buffer
is passed over in favour of one that is ready to run. If no other batches
are ready then the scheduler registers an asynchronous callback to be
invoked when the fence is signalled. The callback wakes the scheduler via
the deferred work handler, which can then submit the now-ready batch
buffer.
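
The selection logic amounts to something like this (a standalone sketch
with made-up names and fields, not the driver code): among queued nodes
whose dependencies are met, pick the highest priority one whose fence has
signalled; if only fence-blocked nodes remain, report the best of those so
an asynchronous waiter can be registered on it.

/* Standalone model of fence-aware batch selection. Illustration only:
 * simplified, made-up fields, not the driver code.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct node {
	int priority;
	bool deps_met;		/* all inter-batch dependencies satisfied */
	bool fence_signalled;	/* sync fence (if any) has signalled */
};

/* Returns the index of the node to submit, or -1 if nothing is ready; in
 * the latter case *best_wait names the node to register an asynchronous
 * fence waiter on (or -1 if there is none).
 */
static int pop_best(const struct node *q, size_t count, int *best_wait)
{
	int best = -1;
	size_t i;

	*best_wait = -1;
	for (i = 0; i < count; i++) {
		if (!q[i].deps_met)
			continue;

		if (q[i].fence_signalled) {
			if (best < 0 || q[i].priority > q[best].priority)
				best = (int)i;
		} else {
			if (*best_wait < 0 ||
			    q[i].priority > q[*best_wait].priority)
				*best_wait = (int)i;
		}
	}
	return best;
}

int main(void)
{
	struct node q[] = {
		{ .priority = 10, .deps_met = true,  .fence_signalled = false },
		{ .priority = 5,  .deps_met = true,  .fence_signalled = true  },
		{ .priority = 99, .deps_met = false, .fence_signalled = true  },
	};
	int wait, pick = pop_best(q, 3, &wait);

	printf("pick=%d wait_on=%d\n", pick, wait);	/* pick=1 wait_on=0 */
	return 0;
}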

For: VIZ-1587
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h       |   1 +
 drivers/gpu/drm/i915/i915_scheduler.c | 163 ++++++++++++++++++++++++++++++++--
 drivers/gpu/drm/i915/i915_scheduler.h |   6 ++
 3 files changed, 165 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 12b4986..b568432 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1703,6 +1703,7 @@ struct i915_execbuffer_params {
 	uint32_t                        batch_obj_vm_offset;
 	struct intel_engine_cs          *ring;
 	struct drm_i915_gem_object      *batch_obj;
+	struct sync_fence               *fence_wait;
 	struct drm_clip_rect            *cliprects;
 	uint32_t                        instp_mask;
 	int                             instp_mode;
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index c7139f8..19577c9 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -25,6 +25,7 @@
 #include "i915_drv.h"
 #include "intel_drv.h"
 #include "i915_scheduler.h"
+#include <../drivers/staging/android/sync.h>
 
 static int         i915_scheduler_fly_node(struct i915_scheduler_queue_entry *node);
 static int         i915_scheduler_remove_dependent(struct i915_scheduler *scheduler,
@@ -100,6 +101,9 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 
 		qe->scheduler_index = scheduler->index++;
 
+		WARN_ON(qe->params.fence_wait &&
+			(atomic_read(&qe->params.fence_wait->status) == 0));
+
 		intel_ring_reserved_space_cancel(qe->params.request->ringbuf);
 
 		scheduler->flags[qe->params.ring->id] |= i915_sf_submitting;
@@ -134,6 +138,11 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 		if (qe->params.dispatch_flags & I915_DISPATCH_SECURE)
 			i915_gem_execbuff_release_batch_obj(qe->params.batch_obj);
 
+#ifdef CONFIG_SYNC
+		if (qe->params.fence_wait)
+			sync_fence_put(qe->params.fence_wait);
+#endif
+
 		return 0;
 	}
 
@@ -625,6 +634,11 @@ static int i915_scheduler_remove(struct intel_engine_cs *ring)
 		node = list_first_entry(&remove, typeof(*node), link);
 		list_del(&node->link);
 
+#ifdef CONFIG_SYNC
+		if (node->params.fence_wait)
+			sync_fence_put(node->params.fence_wait);
+#endif
+
 		/* Free up all the DRM object references */
 		i915_gem_scheduler_clean_node(node);
 
@@ -845,17 +859,100 @@ static int i915_scheduler_submit_max_priority(struct intel_engine_cs *ring,
 	return count;
 }
 
+#ifdef CONFIG_SYNC
+/* Use a private structure in order to pass the 'dev' pointer through */
+struct i915_sync_fence_waiter {
+	struct sync_fence_waiter sfw;
+	struct drm_device	 *dev;
+	struct i915_scheduler_queue_entry *node;
+};
+
+static void i915_scheduler_wait_fence_signaled(struct sync_fence *fence,
+				       struct sync_fence_waiter *waiter)
+{
+	struct i915_sync_fence_waiter *i915_waiter;
+	struct drm_i915_private *dev_priv = NULL;
+
+	i915_waiter = container_of(waiter, struct i915_sync_fence_waiter, sfw);
+	dev_priv    = (i915_waiter && i915_waiter->dev) ?
+					    i915_waiter->dev->dev_private : NULL;
+
+	/*
+	 * NB: The callback is executed at interrupt time, thus it can not
+	 * call _submit() directly. It must go via the delayed work handler.
+	 */
+	if (dev_priv) {
+		struct i915_scheduler   *scheduler;
+		unsigned long           flags;
+
+		scheduler = dev_priv->scheduler;
+
+		spin_lock_irqsave(&scheduler->lock, flags);
+		i915_waiter->node->flags &= ~i915_qef_fence_waiting;
+		spin_unlock_irqrestore(&scheduler->lock, flags);
+
+		queue_work(dev_priv->wq, &dev_priv->mm.scheduler_work);
+	}
+
+	kfree(waiter);
+}
+
+static bool i915_scheduler_async_fence_wait(struct drm_device *dev,
+					    struct i915_scheduler_queue_entry *node)
+{
+	struct i915_sync_fence_waiter	*fence_waiter;
+	struct sync_fence		*fence = node->params.fence_wait;
+	int				signaled;
+	bool				success = true;
+
+	if ((node->flags & i915_qef_fence_waiting) == 0)
+		node->flags |= i915_qef_fence_waiting;
+	else
+		return true;
+
+	if (fence == NULL)
+		return false;
+
+	signaled = atomic_read(&fence->status);
+	if (!signaled) {
+		fence_waiter = kmalloc(sizeof(*fence_waiter), GFP_KERNEL);
+		if (!fence_waiter) {
+			success = false;
+			goto end;
+		}
+
+		sync_fence_waiter_init(&fence_waiter->sfw,
+				i915_scheduler_wait_fence_signaled);
+		fence_waiter->node = node;
+		fence_waiter->dev = dev;
+
+		if (sync_fence_wait_async(fence, &fence_waiter->sfw)) {
+			/* an error occurred, usually this is because the
+			 * fence was signaled already */
+			signaled = atomic_read(&fence->status);
+			if (!signaled) {
+				success = false;
+				goto end;
+			}
+		}
+	}
+end:
+	return success;
+}
+#endif
+
 static int i915_scheduler_pop_from_queue_locked(struct intel_engine_cs *ring,
 				    struct i915_scheduler_queue_entry **pop_node,
 				    unsigned long *flags)
 {
 	struct drm_i915_private            *dev_priv = ring->dev->dev_private;
 	struct i915_scheduler              *scheduler = dev_priv->scheduler;
+	struct i915_scheduler_queue_entry  *best_wait, *fence_wait = NULL;
 	struct i915_scheduler_queue_entry  *best;
 	struct i915_scheduler_queue_entry  *node;
 	int     ret;
 	int     i;
-	bool	any_queued;
+	bool	signalled, any_queued;
 	bool	has_local, has_remote, only_remote;
 
 	*pop_node = NULL;
@@ -863,13 +960,25 @@ static int i915_scheduler_pop_from_queue_locked(struct intel_engine_cs *ring,
 
 	any_queued = false;
 	only_remote = false;
+	best_wait = NULL;
 	best = NULL;
 
+#ifndef CONFIG_SYNC
+	signalled = true;
+#endif
+
 	list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
 		if (!I915_SQS_IS_QUEUED(node))
 			continue;
 		any_queued = true;
 
+#ifdef CONFIG_SYNC
+		if (node->params.fence_wait)
+			signalled = atomic_read(&node->params.fence_wait->status) != 0;
+		else
+			signalled = true;
+#endif	// CONFIG_SYNC
+
 		has_local  = false;
 		has_remote = false;
 		for (i = 0; i < node->num_deps; i++) {
@@ -886,9 +995,15 @@ static int i915_scheduler_pop_from_queue_locked(struct intel_engine_cs *ring,
 			only_remote = true;
 
 		if (!has_local && !has_remote) {
-			if (!best ||
-			    (node->priority > best->priority))
-				best = node;
+			if (signalled) {
+				if (!best ||
+				    (node->priority > best->priority))
+					best = node;
+			} else {
+				if (!best_wait ||
+				    (node->priority > best_wait->priority))
+					best_wait = node;
+			}
 		}
 	}
 
@@ -904,8 +1019,34 @@ static int i915_scheduler_pop_from_queue_locked(struct intel_engine_cs *ring,
 		 * (a) there are no buffers in the queue
 		 * (b) all queued buffers are dependent on other buffers
 		 *     e.g. on a buffer that is in flight on a different ring
+		 * (c) all independent buffers are waiting on fences
 		 */
-		if (only_remote) {
+		if (best_wait) {
+			/* Need to wait for something to be signalled.
+			 *
+			 * NB: do not really want to wait on one specific fd
+			 * because there is no guarantee in the order that
+			 * blocked buffers will be signalled. Need to wait on
+			 * 'anything' and then rescan for best available, if
+			 * still nothing then wait again...
+			 *
+			 * NB 2: The wait must also wake up if someone attempts
+			 * to submit a new buffer. The new buffer might be
+			 * independent of all others and thus could jump the
+			 * queue and start running immediately.
+			 *
+			 * NB 3: Lastly, must not wait with the spinlock held!
+			 *
+			 * So rather than wait here, need to queue a deferred
+			 * wait thread and just return 'nothing to do'.
+			 *
+			 * NB 4: Can't actually do the wait here because the
+			 * spinlock is still held and the wait requires doing
+			 * a memory allocation.
+			 */
+			fence_wait = best_wait;
+			ret = -EAGAIN;
+		} else if (only_remote) {
 			/* The only dependent buffers are on another ring. */
 			ret = -EAGAIN;
 		} else if (any_queued) {
@@ -917,6 +1058,18 @@ static int i915_scheduler_pop_from_queue_locked(struct intel_engine_cs *ring,
 
 	/* i915_scheduler_dump_queue_pop(ring, best); */
 
+	if (fence_wait) {
+#ifdef CONFIG_SYNC
+		/* It should be safe to sleep now... */
+		/* NB: Need to release and reacquire the spinlock though */
+		spin_unlock_irqrestore(&scheduler->lock, *flags);
+		i915_scheduler_async_fence_wait(ring->dev, fence_wait);
+		spin_lock_irqsave(&scheduler->lock, *flags);
+#else
+		BUG_ON(true);
+#endif
+	}
+
 	*pop_node = best;
 	return ret;
 }
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index ce94b0b..8ca4b4b 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -56,6 +56,11 @@ struct i915_scheduler_obj_entry {
 	struct drm_i915_gem_object          *obj;
 };
 
+enum i915_scheduler_queue_entry_flags {
+	/* Fence is being waited on */
+	i915_qef_fence_waiting              = (1 << 0),
+};
+
 struct i915_scheduler_queue_entry {
 	struct i915_execbuffer_params       params;
 	uint32_t                            priority;
@@ -68,6 +73,7 @@ struct i915_scheduler_queue_entry {
 	struct timespec                     stamp;
 	struct list_head                    link;
 	uint32_t                            scheduler_index;
+	uint32_t                            flags;
 };
 
 struct i915_scheduler {
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [RFC 28/39] drm/i915: Connecting execbuff fences to scheduler
  2015-07-17 14:33 [RFC 00/39] GPU scheduler for i915 driver John.C.Harrison
                   ` (26 preceding siblings ...)
  2015-07-17 14:33 ` [RFC 27/39] drm/i915: Add sync wait support " John.C.Harrison
@ 2015-07-17 14:33 ` John.C.Harrison
  2015-07-17 14:33 ` [RFC 29/39] drm/i915: Added trace points " John.C.Harrison
                   ` (11 subsequent siblings)
  39 siblings, 0 replies; 48+ messages in thread
From: John.C.Harrison @ 2015-07-17 14:33 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The scheduler now supports sync framework fences being associated with
batch buffers. The execbuff IOCTL allows such fences to be passed in
from userland. This patch wires the two together so that the IOCTL no
longer needs to stall on the fence up front. Instead, the stall is
absorbed by the scheduler's scheduling algorithm.

For: VIZ-1587
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 21 ++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_scheduler.c      |  3 +++
 drivers/gpu/drm/i915/i915_scheduler.h      |  5 +++++
 3 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 1642701..1325b19 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1612,7 +1612,9 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	/*
 	 * Without a GPU scheduler, any fence waits must be done up front.
 	 */
-	if (args->flags & I915_EXEC_WAIT_FENCE) {
+	if ((args->flags & I915_EXEC_WAIT_FENCE) &&
+	    (i915.scheduler_override & i915_so_direct_submit))
+	{
 		ret = i915_early_fence_wait(ring, fd_fence_wait);
 		if (ret < 0)
 			return ret;
@@ -1799,6 +1801,18 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	params->ctx = ctx;
 
 #ifdef CONFIG_SYNC
+	if (args->flags & I915_EXEC_WAIT_FENCE) {
+		if (fd_fence_wait < 0) {
+			DRM_ERROR("Wait fence for ring %d has invalid id %d\n",
+				  (int) ring->id, fd_fence_wait);
+		} else {
+			params->fence_wait = sync_fence_fdget(fd_fence_wait);
+			if (params->fence_wait == NULL)
+				DRM_ERROR("Invalid wait fence %d\n",
+					  fd_fence_wait);
+		}
+	}
+
 	if (args->flags & I915_EXEC_CREATE_FENCE) {
 		/*
 		 * Caller has requested a sync fence.
@@ -1865,6 +1879,11 @@ err:
 			i915_gem_context_unreference(params->ctx);
 	}
 
+#ifdef CONFIG_SYNC
+	if (params->fence_wait)
+		sync_fence_put(params->fence_wait);
+#endif
+
 	/*
 	 * If the request was created but not successfully submitted then it
 	 * must be freed again. If it was submitted then it is being tracked
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 19577c9..66dbc20 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -977,6 +977,9 @@ static int i915_scheduler_pop_from_queue_locked(struct intel_engine_cs *ring,
 			signalled = atomic_read(&node->params.fence_wait->status) != 0;
 		else
 			signalled = true;
+
+		if (!signalled)
+			signalled = i915_safe_to_ignore_fence(ring, node->params.fence_wait);
 #endif	// CONFIG_SYNC
 
 		has_local  = false;
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 8ca4b4b..3f94512 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -110,6 +110,11 @@ int         i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *q
 int         i915_scheduler_handle_irq(struct intel_engine_cs *ring);
 void        i915_scheduler_kill_all(struct drm_device *dev);
 void        i915_gem_scheduler_work_handler(struct work_struct *work);
+#ifdef CONFIG_SYNC
+struct drm_i915_gem_request *i915_scheduler_find_by_sync_value(struct intel_engine_cs *ring,
+							       struct intel_context *ctx,
+							       uint32_t sync_value);
+#endif
 int         i915_scheduler_flush(struct intel_engine_cs *ring, bool is_locked);
 int         i915_scheduler_flush_request(struct drm_i915_gem_request *req,
 					 bool is_locked);
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [RFC 29/39] drm/i915: Added trace points to scheduler
  2015-07-17 14:33 [RFC 00/39] GPU scheduler for i915 driver John.C.Harrison
                   ` (27 preceding siblings ...)
  2015-07-17 14:33 ` [RFC 28/39] drm/i915: Connecting execbuff fences " John.C.Harrison
@ 2015-07-17 14:33 ` John.C.Harrison
  2015-07-17 14:33 ` [RFC 30/39] drm/i915: Added scheduler queue throttling by DRM file handle John.C.Harrison
                   ` (10 subsequent siblings)
  39 siblings, 0 replies; 48+ messages in thread
From: John.C.Harrison @ 2015-07-17 14:33 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

Added trace points to the scheduler to track all the various events, node state
transitions and other interesting things that occur.

Change-Id: I9886390cfc7897bc1faf50a104bc651d8baed8a5
For: VIZ-1587
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   2 +
 drivers/gpu/drm/i915/i915_scheduler.c      |  34 ++++-
 drivers/gpu/drm/i915/i915_trace.h          | 208 +++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.c           |   2 +
 4 files changed, 244 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 1325b19..f90a2c8 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1291,6 +1291,8 @@ i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
 
 	i915_gem_execbuffer_move_to_active(vmas, params->request);
 
+	trace_i915_gem_ring_queue(ring, params);
+
 	qe = container_of(params, typeof(*qe), params);
 	ret = i915_scheduler_queue_execbuffer(qe);
 	if (ret)
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 66dbc20..408bedc 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -101,6 +101,8 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 
 		qe->scheduler_index = scheduler->index++;
 
+		trace_i915_scheduler_queue(qe->params.ring, qe);
+
 		WARN_ON(qe->params.fence_wait &&
 			(atomic_read(&qe->params.fence_wait->status) == 0));
 
@@ -253,6 +255,9 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 		not_flying = i915_scheduler_count_flying(scheduler, ring) <
 							 scheduler->min_flying;
 
+	trace_i915_scheduler_queue(ring, node);
+	trace_i915_scheduler_node_state_change(ring, node);
+
 	spin_unlock_irqrestore(&scheduler->lock, flags);
 
 	if (not_flying)
@@ -280,6 +285,9 @@ static int i915_scheduler_fly_node(struct i915_scheduler_queue_entry *node)
 
 	node->status = i915_sqs_flying;
 
+	trace_i915_scheduler_fly(ring, node);
+	trace_i915_scheduler_node_state_change(ring, node);
+
 	if (!(scheduler->flags[ring->id] & i915_sf_interrupts_enabled)) {
 		bool    success = true;
 
@@ -344,6 +352,8 @@ static void i915_scheduler_node_requeue(struct i915_scheduler_queue_entry *node)
 	BUG_ON(!I915_SQS_IS_FLYING(node));
 
 	node->status = i915_sqs_queued;
+	trace_i915_scheduler_unfly(node->params.ring, node);
+	trace_i915_scheduler_node_state_change(node->params.ring, node);
 }
 
 /* Give up on a popped node completely. For example, because it is causing the
@@ -354,6 +364,8 @@ static void i915_scheduler_node_kill(struct i915_scheduler_queue_entry *node)
 	BUG_ON(!I915_SQS_IS_FLYING(node));
 
 	node->status = i915_sqs_dead;
+	trace_i915_scheduler_unfly(node->params.ring, node);
+	trace_i915_scheduler_node_state_change(node->params.ring, node);
 }
 
 /* Abandon a queued node completely. For example because the driver is being
@@ -365,6 +377,7 @@ static void i915_scheduler_node_kill_queued(struct i915_scheduler_queue_entry *n
 	BUG_ON(!I915_SQS_IS_QUEUED(node));
 
 	node->status = i915_sqs_dead;
+	trace_i915_scheduler_node_state_change(node->params.ring, node);
 }
 
 /* The system is toast. Terminate all nodes with extreme prejudice. */
@@ -429,8 +442,10 @@ static void i915_scheduler_seqno_complete(struct intel_engine_cs *ring, uint32_t
 	 * if a completed entry is found then there is no need to scan further.
 	 */
 	list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
-		if (I915_SQS_IS_COMPLETE(node))
+		if (I915_SQS_IS_COMPLETE(node)) {
+			trace_i915_scheduler_landing(ring, seqno, node);
 			return;
+		}
 
 		if (seqno == node->params.request->seqno)
 			break;
@@ -441,8 +456,12 @@ static void i915_scheduler_seqno_complete(struct intel_engine_cs *ring, uint32_t
 	 * like cache flushes and page flips. So don't complain about if
 	 * no node was found.
 	 */
-	if (&node->link == &scheduler->node_queue[ring->id])
+	if (&node->link == &scheduler->node_queue[ring->id]) {
+		trace_i915_scheduler_landing(ring, seqno, NULL);
 		return;
+	}
+
+	trace_i915_scheduler_landing(ring, seqno, node);
 
 	WARN_ON(!I915_SQS_IS_FLYING(node));
 
@@ -457,6 +476,7 @@ static void i915_scheduler_seqno_complete(struct intel_engine_cs *ring, uint32_t
 
 		/* Node was in flight so mark it as complete. */
 		node->status = i915_sqs_complete;
+		trace_i915_scheduler_node_state_change(ring, node);
 		got_changes = true;
 	}
 
@@ -481,6 +501,8 @@ int i915_scheduler_handle_irq(struct intel_engine_cs *ring)
 
 	seqno = ring->get_seqno(ring, false);
 
+	trace_i915_scheduler_irq(ring, seqno);
+
 	if (i915.scheduler_override & i915_so_direct_submit)
 		return 0;
 
@@ -625,6 +647,8 @@ static int i915_scheduler_remove(struct intel_engine_cs *ring)
 	/* Launch more packets now? */
 	do_submit = (queued > 0) && (flying < scheduler->min_flying);
 
+	trace_i915_scheduler_remove(ring, min_seqno, do_submit);
+
 	spin_unlock_irqrestore(&scheduler->lock, flags);
 
 	if (do_submit)
@@ -634,6 +658,8 @@ static int i915_scheduler_remove(struct intel_engine_cs *ring)
 		node = list_first_entry(&remove, typeof(*node), link);
 		list_del(&node->link);
 
+		trace_i915_scheduler_destroy(ring, node);
+
 #ifdef CONFIG_SYNC
 		if (node->params.fence_wait)
 			sync_fence_put(node->params.fence_wait);
@@ -1016,6 +1042,8 @@ static int i915_scheduler_pop_from_queue_locked(struct intel_engine_cs *ring,
 		INIT_LIST_HEAD(&best->link);
 		best->status  = i915_sqs_popped;
 
+		trace_i915_scheduler_node_state_change(ring, best);
+
 		ret = 0;
 	} else {
 		/* Can only get here if:
@@ -1073,6 +1101,8 @@ static int i915_scheduler_pop_from_queue_locked(struct intel_engine_cs *ring,
 #endif
 	}
 
+	trace_i915_scheduler_pop_from_queue(ring, best);
+
 	*pop_node = best;
 	return ret;
 }
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 796c630..8774192 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -9,6 +9,7 @@
 #include "i915_drv.h"
 #include "intel_drv.h"
 #include "intel_ringbuffer.h"
+#include "i915_scheduler.h"
 
 #undef TRACE_SYSTEM
 #define TRACE_SYSTEM i915
@@ -786,6 +787,213 @@ TRACE_EVENT(switch_mm,
 		  __entry->dev, __entry->ring, __entry->to, __entry->vm)
 );
 
+TRACE_EVENT(i915_scheduler_queue,
+	    TP_PROTO(struct intel_engine_cs *ring,
+		     struct i915_scheduler_queue_entry *node),
+	    TP_ARGS(ring, node),
+
+	    TP_STRUCT__entry(
+			     __field(u32, ring)
+			     __field(u32, uniq)
+			     __field(u32, seqno)
+			     ),
+
+	    TP_fast_assign(
+			   __entry->ring  = ring->id;
+			   __entry->uniq  = node ? node->params.request->uniq  : 0;
+			   __entry->seqno = node ? node->params.request->seqno : 0;
+			   ),
+
+	    TP_printk("ring=%d, uniq=%d, seqno=%d",
+		      __entry->ring, __entry->uniq, __entry->seqno)
+);
+
+TRACE_EVENT(i915_scheduler_fly,
+	    TP_PROTO(struct intel_engine_cs *ring,
+		     struct i915_scheduler_queue_entry *node),
+	    TP_ARGS(ring, node),
+
+	    TP_STRUCT__entry(
+			     __field(u32, ring)
+			     __field(u32, uniq)
+			     __field(u32, seqno)
+			     ),
+
+	    TP_fast_assign(
+			   __entry->ring  = ring->id;
+			   __entry->uniq  = node ? node->params.request->uniq  : 0;
+			   __entry->seqno = node ? node->params.request->seqno : 0;
+			   ),
+
+	    TP_printk("ring=%d, uniq=%d, seqno=%d",
+		      __entry->ring, __entry->uniq, __entry->seqno)
+);
+
+TRACE_EVENT(i915_scheduler_unfly,
+	    TP_PROTO(struct intel_engine_cs *ring,
+		     struct i915_scheduler_queue_entry *node),
+	    TP_ARGS(ring, node),
+
+	    TP_STRUCT__entry(
+			     __field(u32, ring)
+			     __field(u32, uniq)
+			     __field(u32, seqno)
+			     ),
+
+	    TP_fast_assign(
+			   __entry->ring  = ring->id;
+			   __entry->uniq  = node ? node->params.request->uniq  : 0;
+			   __entry->seqno = node ? node->params.request->seqno : 0;
+			   ),
+
+	    TP_printk("ring=%d, uniq=%d, seqno=%d",
+		      __entry->ring, __entry->uniq, __entry->seqno)
+);
+
+TRACE_EVENT(i915_scheduler_landing,
+	    TP_PROTO(struct intel_engine_cs *ring, u32 seqno,
+		     struct i915_scheduler_queue_entry *node),
+	    TP_ARGS(ring, seqno, node),
+
+	    TP_STRUCT__entry(
+			     __field(u32, ring)
+			     __field(u32, uniq)
+			     __field(u32, seqno)
+			     __field(u32, status)
+			     ),
+
+	    TP_fast_assign(
+			   __entry->ring   = ring->id;
+			   __entry->uniq   = node ? node->params.request->uniq  : 0;
+			   __entry->seqno  = seqno;
+			   __entry->status = node ? node->status : ~0U;
+			   ),
+
+	    TP_printk("ring=%d, uniq=%d, seqno=%d, status=%d",
+		      __entry->ring, __entry->uniq, __entry->seqno, __entry->status)
+);
+
+TRACE_EVENT(i915_scheduler_remove,
+	    TP_PROTO(struct intel_engine_cs *ring,
+		     u32 min_seqno, bool do_submit),
+	    TP_ARGS(ring, min_seqno, do_submit),
+
+	    TP_STRUCT__entry(
+			     __field(u32, ring)
+			     __field(u32, min_seqno)
+			     __field(bool, do_submit)
+			     ),
+
+	    TP_fast_assign(
+			   __entry->ring      = ring->id;
+			   __entry->min_seqno = min_seqno;
+			   __entry->do_submit = do_submit;
+			   ),
+
+	    TP_printk("ring=%d, min_seqno = %d, do_submit=%d",
+		      __entry->ring, __entry->min_seqno, __entry->do_submit)
+);
+
+TRACE_EVENT(i915_scheduler_destroy,
+	    TP_PROTO(struct intel_engine_cs *ring,
+		     struct i915_scheduler_queue_entry *node),
+	    TP_ARGS(ring, node),
+
+	    TP_STRUCT__entry(
+			     __field(u32, ring)
+			     __field(u32, uniq)
+			     __field(u32, seqno)
+			     ),
+
+	    TP_fast_assign(
+			   __entry->ring  = ring->id;
+			   __entry->uniq  = node ? node->params.request->uniq  : 0;
+			   __entry->seqno = node ? node->params.request->seqno : 0;
+			   ),
+
+	    TP_printk("ring=%d, uniq=%d, seqno=%d",
+		      __entry->ring, __entry->uniq, __entry->seqno)
+);
+
+TRACE_EVENT(i915_scheduler_pop_from_queue,
+	    TP_PROTO(struct intel_engine_cs *ring,
+		     struct i915_scheduler_queue_entry *node),
+	    TP_ARGS(ring, node),
+
+	    TP_STRUCT__entry(
+			     __field(u32, ring)
+			     __field(u32, uniq)
+			     __field(u32, seqno)
+			     ),
+
+	    TP_fast_assign(
+			   __entry->ring  = ring->id;
+			   __entry->uniq  = node ? node->params.request->uniq  : 0;
+			   __entry->seqno = node ? node->params.request->seqno : 0;
+			   ),
+
+	    TP_printk("ring=%d, uniq=%d, seqno=%d",
+		      __entry->ring, __entry->uniq, __entry->seqno)
+);
+
+TRACE_EVENT(i915_scheduler_node_state_change,
+	    TP_PROTO(struct intel_engine_cs *ring,
+		     struct i915_scheduler_queue_entry *node),
+	    TP_ARGS(ring, node),
+
+	    TP_STRUCT__entry(
+			     __field(u32, ring)
+			     __field(u32, uniq)
+			     __field(u32, seqno)
+			     __field(u32, status)
+			     ),
+
+	    TP_fast_assign(
+			   __entry->ring   = ring->id;
+			   __entry->uniq   = node ? node->params.request->uniq  : 0;
+			   __entry->seqno  = node->params.request->seqno;
+			   __entry->status = node->status;
+			   ),
+
+	    TP_printk("ring=%d, uniq=%d, seqno=%d, status=%d",
+		      __entry->ring, __entry->uniq, __entry->seqno, __entry->status)
+);
+
+TRACE_EVENT(i915_scheduler_irq,
+	    TP_PROTO(struct intel_engine_cs *ring, uint32_t seqno),
+	    TP_ARGS(ring, seqno),
+
+	    TP_STRUCT__entry(
+			     __field(u32, ring)
+			     __field(u32, seqno)
+			     ),
+
+	    TP_fast_assign(
+			   __entry->ring   = ring->id;
+			   __entry->seqno  = seqno;
+			   ),
+
+	    TP_printk("ring=%d, seqno=%d", __entry->ring, __entry->seqno)
+);
+
+TRACE_EVENT(i915_gem_ring_queue,
+	    TP_PROTO(struct intel_engine_cs *ring,
+		     struct i915_execbuffer_params *params),
+	    TP_ARGS(ring, params),
+
+	    TP_STRUCT__entry(
+			     __field(u32, ring)
+			     __field(u32, seqno)
+			     ),
+
+	    TP_fast_assign(
+			   __entry->ring   = ring->id;
+			   __entry->seqno  = params->request->seqno;
+			   ),
+
+	    TP_printk("ring=%d, seqno=%d", __entry->ring, __entry->seqno)
+);
+
 #endif /* _I915_TRACE_H_ */
 
 /* This part must be outside protection */
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index d3e7399..41dca2a 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -886,6 +886,8 @@ int intel_execlists_submission(struct i915_execbuffer_params *params,
 
 	i915_gem_execbuffer_move_to_active(vmas, params->request);
 
+	trace_i915_gem_ring_queue(ring, params);
+
 	qe = container_of(params, typeof(*qe), params);
 	ret = i915_scheduler_queue_execbuffer(qe);
 	if (ret)
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [RFC 30/39] drm/i915: Added scheduler queue throttling by DRM file handle
  2015-07-17 14:33 [RFC 00/39] GPU scheduler for i915 driver John.C.Harrison
                   ` (28 preceding siblings ...)
  2015-07-17 14:33 ` [RFC 29/39] drm/i915: Added trace points " John.C.Harrison
@ 2015-07-17 14:33 ` John.C.Harrison
  2015-07-17 14:33 ` [RFC 31/39] drm/i915: Added debugfs interface to scheduler tuning parameters John.C.Harrison
                   ` (9 subsequent siblings)
  39 siblings, 0 replies; 48+ messages in thread
From: John.C.Harrison @ 2015-07-17 14:33 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The scheduler decouples the submission of batch buffers to the driver from
their subsequent submission to the hardware. This means that an application
which continuously submits buffers as fast as it can could potentially flood
the driver. To prevent this, the driver now tracks how many buffers are in
progress per file handle (queued in software or executing in hardware) and
limits this to a given (tunable) number. If this limit is exceeded then
further attempts to queue work return EAGAIN, preventing the scheduler's
queue from growing arbitrarily large.
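
The throttle itself is simple and can be sketched as follows (a standalone
model with made-up names, not the driver code):

/* Standalone model of per-file queue throttling. Illustration only:
 * simplified, made-up names, not the driver code.
 */
#include <errno.h>
#include <stdio.h>

struct file_ctx {
	unsigned int queue_length;	/* batches queued or executing */
};

static unsigned int file_queue_max = 64;	/* tunable limit */

static int queue_batch(struct file_ctx *f)
{
	if (f->queue_length >= file_queue_max)
		return -EAGAIN;		/* caller should back off and retry */

	f->queue_length++;
	return 0;
}

static void retire_batch(struct file_ctx *f)
{
	f->queue_length--;
}

int main(void)
{
	struct file_ctx f = { 0 };
	int i, ret = 0;

	/* A client spamming batches eventually gets throttled... */
	for (i = 0; i < 100 && ret == 0; i++)
		ret = queue_batch(&f);

	printf("queued %u before -EAGAIN\n", f.queue_length);	/* 64 */

	/* ...and can continue once earlier work has retired. */
	retire_batch(&f);
	printf("retry: %d\n", queue_batch(&f));			/* 0 */
	return 0;
}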

Change-Id: I83258240aec7c810db08c006a3062d46aa91363f
For: VIZ-1587
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h            |  2 ++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  8 +++++++
 drivers/gpu/drm/i915/i915_scheduler.c      | 34 ++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_scheduler.h      |  2 ++
 4 files changed, 46 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index b568432..e230632 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -334,6 +334,8 @@ struct drm_i915_file_private {
 	} rps;
 
 	struct intel_engine_cs *bsd_ring;
+
+	u32 scheduler_queue_length;
 };
 
 enum intel_dpll_id {
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index f90a2c8..c2a69d8 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1935,6 +1935,10 @@ i915_gem_execbuffer(struct drm_device *dev, void *data,
 		return -EINVAL;
 	}
 
+	/* Throttle batch requests per device file */
+	if (i915_scheduler_file_queue_is_full(file))
+		return -EAGAIN;
+
 	/* Copy in the exec list from userland */
 	exec_list = drm_malloc_ab(sizeof(*exec_list), args->buffer_count);
 	exec2_list = drm_malloc_ab(sizeof(*exec2_list), args->buffer_count);
@@ -2018,6 +2022,10 @@ i915_gem_execbuffer2(struct drm_device *dev, void *data,
 		return -EINVAL;
 	}
 
+	/* Throttle batch requests per device file */
+	if (i915_scheduler_file_queue_is_full(file))
+		return -EAGAIN;
+
 	exec2_list = kmalloc(sizeof(*exec2_list)*args->buffer_count,
 			     GFP_TEMPORARY | __GFP_NOWARN | __GFP_NORETRY);
 	if (exec2_list == NULL)
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 408bedc..f0c99ad 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -40,6 +40,8 @@ static void        i915_scheduler_priority_bump_clear(struct i915_scheduler *sch
 static int         i915_scheduler_priority_bump(struct i915_scheduler *scheduler,
 						struct i915_scheduler_queue_entry *target,
 						uint32_t bump);
+static void        i915_scheduler_file_queue_inc(struct drm_file *file);
+static void        i915_scheduler_file_queue_dec(struct drm_file *file);
 
 bool i915_scheduler_is_enabled(struct drm_device *dev)
 {
@@ -75,6 +77,7 @@ int i915_scheduler_init(struct drm_device *dev)
 	scheduler->priority_level_max     = ~0U;
 	scheduler->priority_level_preempt = 900;
 	scheduler->min_flying             = 2;
+	scheduler->file_queue_max         = 64;
 
 	dev_priv->scheduler = scheduler;
 
@@ -249,6 +252,8 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 
 	list_add_tail(&node->link, &scheduler->node_queue[ring->id]);
 
+	i915_scheduler_file_queue_inc(node->params.file);
+
 	if (i915.scheduler_override & i915_so_submit_on_queue)
 		not_flying = true;
 	else
@@ -630,6 +635,12 @@ static int i915_scheduler_remove(struct intel_engine_cs *ring)
 		/* Strip the dependency info while the mutex is still locked */
 		i915_scheduler_remove_dependent(scheduler, node);
 
+		/* Likewise clean up the file descriptor before it might disappear. */
+		if (node->params.file) {
+			i915_scheduler_file_queue_dec(node->params.file);
+			node->params.file = NULL;
+		}
+
 		continue;
 	}
 
@@ -1330,3 +1341,26 @@ int i915_scheduler_closefile(struct drm_device *dev, struct drm_file *file)
 
 	return 0;
 }
+
+bool i915_scheduler_file_queue_is_full(struct drm_file *file)
+{
+	struct drm_i915_file_private *file_priv = file->driver_priv;
+	struct drm_i915_private      *dev_priv  = file_priv->dev_priv;
+	struct i915_scheduler        *scheduler = dev_priv->scheduler;
+
+	return file_priv->scheduler_queue_length >= scheduler->file_queue_max;
+}
+
+static void i915_scheduler_file_queue_inc(struct drm_file *file)
+{
+	struct drm_i915_file_private *file_priv = file->driver_priv;
+
+	file_priv->scheduler_queue_length++;
+}
+
+static void i915_scheduler_file_queue_dec(struct drm_file *file)
+{
+	struct drm_i915_file_private *file_priv = file->driver_priv;
+
+	file_priv->scheduler_queue_length--;
+}
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 3f94512..301c567 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -87,6 +87,7 @@ struct i915_scheduler {
 	uint32_t            priority_level_max;
 	uint32_t            priority_level_preempt;
 	uint32_t            min_flying;
+	uint32_t            file_queue_max;
 };
 
 /* Flag bits for i915_scheduler::flags */
@@ -120,5 +121,6 @@ int         i915_scheduler_flush_request(struct drm_i915_gem_request *req,
 					 bool is_locked);
 bool        i915_scheduler_is_request_tracked(struct drm_i915_gem_request *req,
 					      bool *completed, bool *busy);
+bool        i915_scheduler_file_queue_is_full(struct drm_file *file);
 
 #endif  /* _I915_SCHEDULER_H_ */
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [RFC 31/39] drm/i915: Added debugfs interface to scheduler tuning parameters
  2015-07-17 14:33 [RFC 00/39] GPU scheduler for i915 driver John.C.Harrison
                   ` (29 preceding siblings ...)
  2015-07-17 14:33 ` [RFC 30/39] drm/i915: Added scheduler queue throttling by DRM file handle John.C.Harrison
@ 2015-07-17 14:33 ` John.C.Harrison
  2015-07-17 14:33 ` [RFC 32/39] drm/i915: Added debug state dump facilities to scheduler John.C.Harrison
                   ` (8 subsequent siblings)
  39 siblings, 0 replies; 48+ messages in thread
From: John.C.Harrison @ 2015-07-17 14:33 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

There are various parameters within the scheduler which can be tuned to improve
performance, reduce memory footprint, etc. This change adds support for altering
these via debugfs.

Change-Id: I6c26765269ae7173ff4d3a5c20921eaaca7c36ed
For: VIZ-1587
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 113 ++++++++++++++++++++++++++++++++++++
 1 file changed, 113 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 05646fe..028fa8f 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -39,6 +39,7 @@
 #include "intel_ringbuffer.h"
 #include <drm/i915_drm.h>
 #include "i915_drv.h"
+#include "i915_scheduler.h"
 
 enum {
 	ACTIVE_LIST,
@@ -1123,6 +1124,114 @@ DEFINE_SIMPLE_ATTRIBUTE(i915_next_seqno_fops,
 			i915_next_seqno_get, i915_next_seqno_set,
 			"0x%llx\n");
 
+static int
+i915_scheduler_priority_max_get(void *data, u64 *val)
+{
+	struct drm_device       *dev       = data;
+	struct drm_i915_private *dev_priv  = dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+
+	*val = (u64) scheduler->priority_level_max;
+	return 0;
+}
+
+static int
+i915_scheduler_priority_max_set(void *data, u64 val)
+{
+	struct drm_device       *dev       = data;
+	struct drm_i915_private *dev_priv  = dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+
+	scheduler->priority_level_max = (u32) val;
+	return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(i915_scheduler_priority_max_fops,
+			i915_scheduler_priority_max_get,
+			i915_scheduler_priority_max_set,
+			"0x%llx\n");
+
+static int
+i915_scheduler_priority_preempt_get(void *data, u64 *val)
+{
+	struct drm_device       *dev       = data;
+	struct drm_i915_private *dev_priv  = dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+
+	*val = (u64) scheduler->priority_level_preempt;
+	return 0;
+}
+
+static int
+i915_scheduler_priority_preempt_set(void *data, u64 val)
+{
+	struct drm_device       *dev       = data;
+	struct drm_i915_private *dev_priv  = dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+
+	scheduler->priority_level_preempt = (u32) val;
+	return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(i915_scheduler_priority_preempt_fops,
+			i915_scheduler_priority_preempt_get,
+			i915_scheduler_priority_preempt_set,
+			"0x%llx\n");
+
+static int
+i915_scheduler_min_flying_get(void *data, u64 *val)
+{
+	struct drm_device       *dev       = data;
+	struct drm_i915_private *dev_priv  = dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+
+	*val = (u64) scheduler->min_flying;
+	return 0;
+}
+
+static int
+i915_scheduler_min_flying_set(void *data, u64 val)
+{
+	struct drm_device       *dev       = data;
+	struct drm_i915_private *dev_priv  = dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+
+	scheduler->min_flying = (u32) val;
+	return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(i915_scheduler_min_flying_fops,
+			i915_scheduler_min_flying_get,
+			i915_scheduler_min_flying_set,
+			"0x%llx\n");
+
+static int
+i915_scheduler_file_queue_max_get(void *data, u64 *val)
+{
+	struct drm_device       *dev       = data;
+	struct drm_i915_private *dev_priv  = dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+
+	*val = (u64) scheduler->file_queue_max;
+	return 0;
+}
+
+static int
+i915_scheduler_file_queue_max_set(void *data, u64 val)
+{
+	struct drm_device       *dev       = data;
+	struct drm_i915_private *dev_priv  = dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+
+	scheduler->file_queue_max = (u32) val;
+	return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(i915_scheduler_file_queue_max_fops,
+			i915_scheduler_file_queue_max_get,
+			i915_scheduler_file_queue_max_set,
+			"0x%llx\n");
+
 static int i915_frequency_info(struct seq_file *m, void *unused)
 {
 	struct drm_info_node *node = m->private;
@@ -5163,6 +5272,10 @@ static const struct i915_debugfs_files {
 	{"i915_gem_drop_caches", &i915_drop_caches_fops},
 	{"i915_error_state", &i915_error_state_fops},
 	{"i915_next_seqno", &i915_next_seqno_fops},
+	{"i915_scheduler_priority_max", &i915_scheduler_priority_max_fops},
+	{"i915_scheduler_priority_preempt", &i915_scheduler_priority_preempt_fops},
+	{"i915_scheduler_min_flying", &i915_scheduler_min_flying_fops},
+	{"i915_scheduler_file_queue_max", &i915_scheduler_file_queue_max_fops},
 	{"i915_display_crc_ctl", &i915_display_crc_ctl_fops},
 	{"i915_pri_wm_latency", &i915_pri_wm_latency_fops},
 	{"i915_spr_wm_latency", &i915_spr_wm_latency_fops},
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [RFC 32/39] drm/i915: Added debug state dump facilities to scheduler
  2015-07-17 14:33 [RFC 00/39] GPU scheduler for i915 driver John.C.Harrison
                   ` (30 preceding siblings ...)
  2015-07-17 14:33 ` [RFC 31/39] drm/i915: Added debugfs interface to scheduler tuning parameters John.C.Harrison
@ 2015-07-17 14:33 ` John.C.Harrison
  2015-07-17 14:33 ` [RFC 33/39] drm/i915: Add early exit to execbuff_final() if insufficient ring space John.C.Harrison
                   ` (7 subsequent siblings)
  39 siblings, 0 replies; 48+ messages in thread
From: John.C.Harrison @ 2015-07-17 14:33 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

When debugging batch buffer submission issues, it is useful to be able to see
what the current state of the scheduler is. This change adds functions for
decoding the internal scheduler state and reporting it.

Change-Id: I0634168e3f3465ff023f5a673165c90b07e535b6
For: VIZ-1587
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_scheduler.c | 276 ++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_scheduler.h |  14 ++
 2 files changed, 290 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index f0c99ad..e22f6b8 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -36,6 +36,9 @@ static int         i915_scheduler_submit_max_priority(struct intel_engine_cs *ri
 					       bool is_locked);
 static uint32_t    i915_scheduler_count_flying(struct i915_scheduler *scheduler,
 					       struct intel_engine_cs *ring);
+static int         i915_scheduler_dump_locked(struct intel_engine_cs *ring,
+					      const char *msg);
+static int         i915_scheduler_dump_all_locked(struct drm_device *dev, const char *msg);
 static void        i915_scheduler_priority_bump_clear(struct i915_scheduler *scheduler);
 static int         i915_scheduler_priority_bump(struct i915_scheduler *scheduler,
 						struct i915_scheduler_queue_entry *target,
@@ -53,6 +56,115 @@ bool i915_scheduler_is_enabled(struct drm_device *dev)
 	return dev_priv->scheduler != NULL;
 }
 
+const char *i915_qe_state_str(struct i915_scheduler_queue_entry *node)
+{
+	static char	str[50];
+	char		*ptr = str;
+
+	*(ptr++) = node->bumped ? 'B' : '-',
+
+	*ptr = 0;
+
+	return str;
+}
+
+char i915_scheduler_queue_status_chr(enum i915_scheduler_queue_status status)
+{
+	switch (status) {
+	case i915_sqs_none:
+	return 'N';
+
+	case i915_sqs_queued:
+	return 'Q';
+
+	case i915_sqs_popped:
+	return 'X';
+
+	case i915_sqs_flying:
+	return 'F';
+
+	case i915_sqs_complete:
+	return 'C';
+
+	case i915_sqs_dead:
+	return 'D';
+
+	default:
+	break;
+	}
+
+	return '?';
+}
+
+const char *i915_scheduler_queue_status_str(
+				enum i915_scheduler_queue_status status)
+{
+	static char	str[50];
+
+	switch (status) {
+	case i915_sqs_none:
+	return "None";
+
+	case i915_sqs_queued:
+	return "Queued";
+
+	case i915_sqs_popped:
+	return "Popped";
+
+	case i915_sqs_flying:
+	return "Flying";
+
+	case i915_sqs_complete:
+	return "Complete";
+
+	case i915_sqs_dead:
+	return "Dead";
+
+	default:
+	break;
+	}
+
+	sprintf(str, "[Unknown_%d!]", status);
+	return str;
+}
+
+const char *i915_scheduler_flag_str(uint32_t flags)
+{
+	static char     str[100];
+	char           *ptr = str;
+
+	*ptr = 0;
+
+#define TEST_FLAG(flag, msg)						\
+	do {								\
+		if (flags & (flag)) {					\
+			strcpy(ptr, msg);				\
+			ptr += strlen(ptr);				\
+			flags &= ~(flag);				\
+		}							\
+	} while (0)
+
+	TEST_FLAG(i915_sf_interrupts_enabled, "IntOn|");
+	TEST_FLAG(i915_sf_submitting,         "Submitting|");
+	TEST_FLAG(i915_sf_dump_force,         "DumpForce|");
+	TEST_FLAG(i915_sf_dump_details,       "DumpDetails|");
+	TEST_FLAG(i915_sf_dump_dependencies,  "DumpDeps|");
+
+#undef TEST_FLAG
+
+	if (flags) {
+		sprintf(ptr, "Unknown_0x%X!", flags);
+		ptr += strlen(ptr);
+	}
+
+	if (ptr == str)
+		strcpy(str, "-");
+	else
+		ptr[-1] = 0;
+
+	return str;
+};
+
 int i915_scheduler_init(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -709,6 +821,170 @@ void i915_gem_scheduler_work_handler(struct work_struct *work)
 	mutex_unlock(&dev->struct_mutex);
 }
 
+int i915_scheduler_dump_all(struct drm_device *dev, const char *msg)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+	unsigned long   flags;
+	int             ret;
+
+	spin_lock_irqsave(&scheduler->lock, flags);
+	ret = i915_scheduler_dump_all_locked(dev, msg);
+	spin_unlock_irqrestore(&scheduler->lock, flags);
+
+	return ret;
+}
+
+static int i915_scheduler_dump_all_locked(struct drm_device *dev, const char *msg)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+	struct intel_engine_cs  *ring;
+	int                     i, r, ret = 0;
+
+	for_each_ring(ring, dev_priv, i) {
+		scheduler->flags[ring->id] |= i915_sf_dump_force   |
+					      i915_sf_dump_details |
+					      i915_sf_dump_dependencies;
+		r = i915_scheduler_dump_locked(ring, msg);
+		if (ret == 0)
+			ret = r;
+	}
+
+	return ret;
+}
+
+int i915_scheduler_dump(struct intel_engine_cs *ring, const char *msg)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+	unsigned long   flags;
+	int             ret;
+
+	spin_lock_irqsave(&scheduler->lock, flags);
+	ret = i915_scheduler_dump_locked(ring, msg);
+	spin_unlock_irqrestore(&scheduler->lock, flags);
+
+	return ret;
+}
+
+static int i915_scheduler_dump_locked(struct intel_engine_cs *ring, const char *msg)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+	struct i915_scheduler_queue_entry  *node;
+	int                 flying = 0, queued = 0, complete = 0, other = 0;
+	static int          old_flying = -1, old_queued = -1, old_complete = -1;
+	bool                b_dump;
+	char                brkt[2] = { '<', '>' };
+
+	if (!ring)
+		return -EINVAL;
+
+	list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
+		if (I915_SQS_IS_QUEUED(node))
+			queued++;
+		else if (I915_SQS_IS_FLYING(node))
+			flying++;
+		else if (I915_SQS_IS_COMPLETE(node))
+			complete++;
+		else
+			other++;
+	}
+
+	b_dump = (flying != old_flying) ||
+		 (queued != old_queued) ||
+		 (complete != old_complete);
+	if (scheduler->flags[ring->id] & i915_sf_dump_force) {
+		if (!b_dump) {
+			b_dump = true;
+			brkt[0] = '{';
+			brkt[1] = '}';
+		}
+
+		scheduler->flags[ring->id] &= ~i915_sf_dump_force;
+	}
+
+	if (b_dump) {
+		old_flying   = flying;
+		old_queued   = queued;
+		old_complete = complete;
+		DRM_DEBUG_DRIVER("<%s> Q:%02d, F:%02d, C:%02d, O:%02d, "
+				 "Flags = %s, Next = %d:%d %c%s%c\n",
+				 ring->name, queued, flying, complete, other,
+				 i915_scheduler_flag_str(scheduler->flags[ring->id]),
+				 dev_priv->request_uniq, dev_priv->next_seqno,
+				 brkt[0], msg, brkt[1]);
+	} else {
+		/*DRM_DEBUG_DRIVER("<%s> Q:%02d, F:%02d, C:%02d, O:%02d"
+				 ", Flags = %s, Next = %d:%d [%s]\n",
+				 ring->name,
+				 queued, flying, complete, other,
+				 i915_scheduler_flag_str(scheduler->flags[ring->id]),
+				 dev_priv->request_uniq, dev_priv->next_seqno, msg); */
+
+		return 0;
+	}
+
+	if (scheduler->flags[ring->id] & i915_sf_dump_details) {
+		int         i, deps;
+		uint32_t    count, counts[i915_sqs_MAX];
+
+		memset(counts, 0x00, sizeof(counts));
+
+		list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
+			if (node->status < i915_sqs_MAX) {
+				count = counts[node->status]++;
+			} else {
+				DRM_DEBUG_DRIVER("<%s>   Unknown status: %d!\n",
+						 ring->name, node->status);
+				count = -1;
+			}
+
+			deps = 0;
+			for (i = 0; i < node->num_deps; i++)
+				if (i915_scheduler_is_dependency_valid(node, i))
+					deps++;
+
+			DRM_DEBUG_DRIVER("<%s>   %c:%02d> index = %d, uniq = %d, seqno"
+					 " = %d/%s, deps = %d / %d, fence = %p/%d, %s [pri = "
+					 "%4d]\n", ring->name,
+					 i915_scheduler_queue_status_chr(node->status),
+					 count,
+					 node->scheduler_index,
+					 node->params.request->uniq,
+					 node->params.request->seqno,
+					 node->params.ring->name,
+					 deps, node->num_deps,
+					 node->params.fence_wait,
+					 node->params.fence_wait ? atomic_read(&node->params.fence_wait->status) : 0,
+					 i915_qe_state_str(node),
+					 node->priority);
+
+			if ((scheduler->flags[ring->id] & i915_sf_dump_dependencies)
+				== 0)
+				continue;
+
+			for (i = 0; i < node->num_deps; i++)
+				if (node->dep_list[i])
+					DRM_DEBUG_DRIVER("<%s>       |-%c:"
+						"%02d%c uniq = %d, seqno = %d/%s, %s [pri = %4d]\n",
+						ring->name,
+						i915_scheduler_queue_status_chr(node->dep_list[i]->status),
+						i,
+						i915_scheduler_is_dependency_valid(node, i)
+							? '>' : '#',
+						node->dep_list[i]->params.request->uniq,
+						node->dep_list[i]->params.request->seqno,
+						node->dep_list[i]->params.ring->name,
+						i915_qe_state_str(node->dep_list[i]),
+						node->dep_list[i]->priority);
+		}
+	}
+
+	return 0;
+}
+
 int i915_scheduler_flush_request(struct drm_i915_gem_request *req,
 				 bool is_locked)
 {
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 301c567..3e6819d 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -41,6 +41,9 @@ enum i915_scheduler_queue_status {
 	/* Limit value for use with arrays/loops */
 	i915_sqs_MAX
 };
+char i915_scheduler_queue_status_chr(enum i915_scheduler_queue_status status);
+const char *i915_scheduler_queue_status_str(
+				enum i915_scheduler_queue_status status);
 
 #define I915_SQS_IS_QUEUED(node)	(((node)->status == i915_sqs_queued))
 #define I915_SQS_IS_FLYING(node)	(((node)->status == i915_sqs_flying))
@@ -75,6 +78,7 @@ struct i915_scheduler_queue_entry {
 	uint32_t                            scheduler_index;
 	uint32_t                            flags;
 };
+const char *i915_qe_state_str(struct i915_scheduler_queue_entry *node);
 
 struct i915_scheduler {
 	struct list_head    node_queue[I915_NUM_RINGS];
@@ -92,9 +96,16 @@ struct i915_scheduler {
 
 /* Flag bits for i915_scheduler::flags */
 enum {
+	/* Internal state */
 	i915_sf_interrupts_enabled  = (1 << 0),
 	i915_sf_submitting          = (1 << 1),
+
+	/* Dump/debug flags */
+	i915_sf_dump_force          = (1 << 8),
+	i915_sf_dump_details        = (1 << 9),
+	i915_sf_dump_dependencies   = (1 << 10),
 };
+const char *i915_scheduler_flag_str(uint32_t flags);
 
 /* Options for 'scheduler_override' module parameter: */
 enum {
@@ -119,6 +130,9 @@ struct drm_i915_gem_request *i915_scheduler_find_by_sync_value(struct intel_engi
 int         i915_scheduler_flush(struct intel_engine_cs *ring, bool is_locked);
 int         i915_scheduler_flush_request(struct drm_i915_gem_request *req,
 					 bool is_locked);
+int         i915_scheduler_dump(struct intel_engine_cs *ring,
+				const char *msg);
+int         i915_scheduler_dump_all(struct drm_device *dev, const char *msg);
 bool        i915_scheduler_is_request_tracked(struct drm_i915_gem_request *req,
 					      bool *completed, bool *busy);
 bool        i915_scheduler_file_queue_is_full(struct drm_file *file);
-- 
1.9.1


* [RFC 33/39] drm/i915: Add early exit to execbuff_final() if insufficient ring space
  2015-07-17 14:33 [RFC 00/39] GPU scheduler for i915 driver John.C.Harrison
                   ` (31 preceding siblings ...)
  2015-07-17 14:33 ` [RFC 32/39] drm/i915: Added debug state dump facilities to scheduler John.C.Harrison
@ 2015-07-17 14:33 ` John.C.Harrison
  2015-07-17 14:33 ` [RFC 34/39] drm/i915: Added scheduler statistic reporting to debugfs John.C.Harrison
                   ` (6 subsequent siblings)
  39 siblings, 0 replies; 48+ messages in thread
From: John.C.Harrison @ 2015-07-17 14:33 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

One of the major purposes of the GPU scheduler is to avoid stalling the CPU when
the GPU is busy and unable to accept more work. This change adds support to the
ring submission code to allow a ring space check to be performed before
attempting to submit a batch buffer to the hardware. If insufficient space is
available then the scheduler can go away and come back later, letting the CPU
get on with other work, rather than stalling and waiting for the hardware to
catch up.
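
For illustration only (simplified, and not part of this patch): the shortfall
is reported back from execbuf_final() as -EAGAIN, which the scheduler's submit
loop already treats as a recoverable error, so the caller-side handling looks
roughly like:

	/* Illustrative sketch: on a ring space shortfall the node is left
	 * queued and retried on a later scheduler pass, rather than the
	 * CPU busy-waiting for the hardware to drain the ring.
	 */
	ret = dev_priv->gt.execbuf_final(&node->params);
	if (ret == -EAGAIN)
		requeue = true;	/* try again when ring space frees up */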

Change-Id: I267159ce1150cb6714d34a49b841bcbe4bf66326
For: VIZ-1587
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 42 ++++++++++++++++------
 drivers/gpu/drm/i915/intel_lrc.c           | 57 +++++++++++++++++++++++++++---
 drivers/gpu/drm/i915/intel_ringbuffer.c    | 24 +++++++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.h    |  1 +
 4 files changed, 109 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index c2a69d8..b701838 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1078,25 +1078,19 @@ i915_reset_gen7_sol_offsets(struct drm_device *dev,
 {
 	struct intel_engine_cs *ring = req->ring;
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	int ret, i;
+	int i;
 
 	if (!IS_GEN7(dev) || ring != &dev_priv->ring[RCS]) {
 		DRM_DEBUG("sol reset is gen7/rcs only\n");
 		return -EINVAL;
 	}
 
-	ret = intel_ring_begin(req, 4 * 3);
-	if (ret)
-		return ret;
-
 	for (i = 0; i < 4; i++) {
 		intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
 		intel_ring_emit(ring, GEN7_SO_WRITE_OFFSET(i));
 		intel_ring_emit(ring, 0);
 	}
 
-	intel_ring_advance(ring);
-
 	return 0;
 }
 
@@ -1315,6 +1309,7 @@ int i915_gem_ringbuffer_submission_final(struct i915_execbuffer_params *params)
 	struct intel_engine_cs  *ring = params->ring;
 	u64 exec_start, exec_len;
 	int ret, i;
+	uint32_t min_space;
 
 	/* The mutex must be acquired before calling this function */
 	BUG_ON(!mutex_is_locked(&params->dev->struct_mutex));
@@ -1336,8 +1331,36 @@ int i915_gem_ringbuffer_submission_final(struct i915_execbuffer_params *params)
 	if (ret)
 		return ret;
 
+	/*
+	 * It would be a bad idea to run out of space while writing commands
+	 * to the ring. One of the major aims of the scheduler is to not stall
+	 * at any point for any reason. However, doing an early exit half way
+	 * through submission could result in a partial sequence being written
+	 * which would leave the engine in an unknown state. Therefore, check in
+	 * advance that there will be enough space for the entire submission
+	 * whether emitted by the code below OR by any other functions that may
+	 * be executed before the end of final().
+	 *
+	 * NB: This test deliberately overestimates, because that's easier than
+	 * tracing every potential path that could be taken!
+	 *
+	 * Current measurements suggest that we may need to emit up to 744 bytes
+	 * (186 dwords), so this is rounded up to 256 dwords here. Then we double
+	 * that to get the free space requirement, because the block isn't allowed
+	 * to span the transition from the end to the beginning of the ring.
+	 */
+#define I915_BATCH_EXEC_MAX_LEN         256	/* max dwords emitted here	*/
+	min_space = I915_BATCH_EXEC_MAX_LEN * 2 * sizeof(uint32_t);
+	ret = intel_ring_test_space(params->request->ringbuf, min_space);
+	if (ret)
+		goto early_error;
+
 	intel_runtime_pm_get(dev_priv);
 
+	ret = intel_ring_begin(params->request, I915_BATCH_EXEC_MAX_LEN);
+	if (ret)
+		goto error;
+
 	/*
 	 * Unconditionally invalidate gpu caches and ensure that we do flush
 	 * any residual writes from the previous batch.
@@ -1356,10 +1379,6 @@ int i915_gem_ringbuffer_submission_final(struct i915_execbuffer_params *params)
 
 	if (ring == &dev_priv->ring[RCS] &&
 			params->instp_mode != dev_priv->relative_constants_mode) {
-		ret = intel_ring_begin(params->request, 4);
-		if (ret)
-			goto error;
-
 		intel_ring_emit(ring, MI_NOOP);
 		intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
 		intel_ring_emit(ring, INSTPM);
@@ -1411,6 +1430,7 @@ error:
 	 */
 	intel_runtime_pm_put(dev_priv);
 
+early_error:
 	if (ret)
 		intel_ring_reserved_space_cancel(params->request->ringbuf);
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 41dca2a..3cedddb 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -215,6 +215,30 @@ enum {
 
 static int intel_lr_context_pin(struct drm_i915_gem_request *rq);
 
+/* Test to see if the ring has sufficient space to submit a given piece of work
+ * without causing a stall */
+static int logical_ring_test_space(struct intel_ringbuffer *ringbuf, int min_space)
+{
+	//struct intel_engine_cs *ring = ringbuf->ring;
+	//struct drm_device *dev = ring->dev;
+	//struct drm_i915_private *dev_priv = dev->dev_private;
+
+	if (ringbuf->space < min_space) {
+		/* Need to update the actual ring space. Otherwise, the system
+		 * hangs forever testing a software copy of the space value that
+		 * never changes!
+		 */
+		//ringbuf->head  = I915_READ_HEAD(ring);
+		//ringbuf->space = intel_ring_space(ringbuf);
+		intel_ring_update_space(ringbuf);
+
+		if (ringbuf->space < min_space)
+			return -EAGAIN;
+	}
+
+	return 0;
+}
+
 /**
  * intel_sanitize_enable_execlists() - sanitize i915.enable_execlists
  * @dev: DRM device.
@@ -906,6 +930,7 @@ int intel_execlists_submission_final(struct i915_execbuffer_params *params)
 	struct intel_engine_cs  *ring = params->ring;
 	u64 exec_start;
 	int ret;
+	uint32_t min_space;
 
 	/* The mutex must be acquired before calling this function */
 	BUG_ON(!mutex_is_locked(&params->dev->struct_mutex));
@@ -928,6 +953,34 @@ int intel_execlists_submission_final(struct i915_execbuffer_params *params)
 		return ret;
 
 	/*
+	 * It would be a bad idea to run out of space while writing commands
+	 * to the ring. One of the major aims of the scheduler is to not stall
+	 * at any point for any reason. However, doing an early exit half way
+	 * through submission could result in a partial sequence being written
+	 * which would leave the engine in an unknown state. Therefore, check in
+	 * advance that there will be enough space for the entire submission
+	 * whether emitted by the code below OR by any other functions that may
+	 * be executed before the end of final().
+	 *
+	 * NB: This test deliberately overestimates, because that's easier than
+	 * tracing every potential path that could be taken!
+	 *
+	 * Current measurements suggest that we may need to emit up to ??? bytes
+	 * (186 dwords), so this is rounded up to 256 dwords here. Then we double
+	 * that to get the free space requirement, because the block isn't allowed
+	 * to span the transition from the end to the beginning of the ring.
+	 */
+#define I915_BATCH_EXEC_MAX_LEN         256	/* max dwords emitted here	*/
+	min_space = I915_BATCH_EXEC_MAX_LEN * 2 * sizeof(uint32_t);
+	ret = logical_ring_test_space(params->request->ringbuf, min_space);
+	if (ret)
+		goto err;
+
+	ret = intel_logical_ring_begin(params->request, I915_BATCH_EXEC_MAX_LEN);
+	if (ret)
+		goto err;
+
+	/*
 	 * Unconditionally invalidate gpu caches and ensure that we do flush
 	 * any residual writes from the previous batch.
 	 */
@@ -939,10 +992,6 @@ int intel_execlists_submission_final(struct i915_execbuffer_params *params)
 	    params->instp_mode != dev_priv->relative_constants_mode) {
 		struct intel_ringbuffer *ringbuf = params->request->ringbuf;
 
-		ret = intel_logical_ring_begin(params->request, 4);
-		if (ret)
-			return ret;
-
 		intel_logical_ring_emit(ringbuf, MI_NOOP);
 		intel_logical_ring_emit(ringbuf, MI_LOAD_REGISTER_IMM(1));
 		intel_logical_ring_emit(ringbuf, INSTPM);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index afb04de..3d11a09 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2371,6 +2371,30 @@ int intel_ring_cacheline_align(struct drm_i915_gem_request *req)
 	return 0;
 }
 
+/* Test to see if the ring has sufficient space to submit a given piece of work
+ * without causing a stall */
+int intel_ring_test_space(struct intel_ringbuffer *ringbuf, int min_space)
+{
+	struct drm_i915_private *dev_priv = ringbuf->ring->dev->dev_private;
+
+	/* There is a separate LRC version of this code. */
+	BUG_ON(i915.enable_execlists);
+
+	if (ringbuf->space < min_space) {
+		/* Need to update the actual ring space. Otherwise, the system
+		 * hangs forever testing a software copy of the space value that
+		 * never changes!
+		 */
+		ringbuf->head  = I915_READ_HEAD(ringbuf->ring);
+		ringbuf->space = intel_ring_space(ringbuf);
+
+		if (ringbuf->space < min_space)
+			return -EAGAIN;
+	}
+
+	return 0;
+}
+
 void intel_ring_init_seqno(struct intel_engine_cs *ring, u32 seqno)
 {
 	struct drm_device *dev = ring->dev;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 2f30900..7bc8c90 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -468,6 +468,7 @@ void intel_cleanup_ring_buffer(struct intel_engine_cs *ring);
 
 int intel_ring_alloc_request_extras(struct drm_i915_gem_request *request);
 
+int intel_ring_test_space(struct intel_ringbuffer *ringbuf, int min_space);
 int __must_check intel_ring_begin(struct drm_i915_gem_request *req, int n);
 int __must_check intel_ring_cacheline_align(struct drm_i915_gem_request *req);
 static inline void intel_ring_emit(struct intel_engine_cs *ring,
-- 
1.9.1


* [RFC 34/39] drm/i915: Added scheduler statistic reporting to debugfs
  2015-07-17 14:33 [RFC 00/39] GPU scheduler for i915 driver John.C.Harrison
                   ` (32 preceding siblings ...)
  2015-07-17 14:33 ` [RFC 33/39] drm/i915: Add early exit to execbuff_final() if insufficient ring space John.C.Harrison
@ 2015-07-17 14:33 ` John.C.Harrison
  2015-07-17 14:33 ` [RFC 35/39] drm/i915: Added seqno values to scheduler status dump John.C.Harrison
                   ` (5 subsequent siblings)
  39 siblings, 0 replies; 48+ messages in thread
From: John.C.Harrison @ 2015-07-17 14:33 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

It is useful to know what the scheduler is doing, both for debugging and for
performance analysis purposes. This change adds a set of counters that keep
track of the various scheduler operations (batches submitted, completed, flush
requests, etc.). The data can then be read in userland via the debugfs
mechanism.
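
For example (the exact mount point may differ per system), the counters appear
as a new "i915_scheduler_info" entry in the DRM debugfs directory and can be
read directly from a shell, e.g. /sys/kernel/debug/dri/0/i915_scheduler_info,
with one column per ring.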

Change-Id: I3266c631cd70c9eeb2c235f88f493e60462f85d7
For: VIZ-1587
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c        | 76 ++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 11 ++++-
 drivers/gpu/drm/i915/i915_scheduler.c      | 71 ++++++++++++++++++++++++++--
 drivers/gpu/drm/i915/i915_scheduler.h      | 35 ++++++++++++++
 4 files changed, 189 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 028fa8f..3c5c750 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -3258,6 +3258,81 @@ static int i915_drrs_status(struct seq_file *m, void *unused)
 	return 0;
 }
 
+static int i915_scheduler_info(struct seq_file *m, void *unused)
+{
+	struct drm_info_node *node = (struct drm_info_node *) m->private;
+	struct drm_device *dev = node->minor->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+	struct i915_scheduler_stats *stats = scheduler->stats;
+	struct i915_scheduler_stats_nodes node_stats[I915_NUM_RINGS];
+	struct intel_engine_cs *ring;
+	char   str[50 * (I915_NUM_RINGS + 1)], name[50], *ptr;
+	int ret, i, r;
+
+	ret = mutex_lock_interruptible(&dev->mode_config.mutex);
+	if (ret)
+		return ret;
+
+#define PRINT_VAR(name, fmt, var)					\
+	do {								\
+		sprintf(str, "%-22s", name);				\
+		ptr = str + strlen(str);				\
+		for_each_ring(ring, dev_priv, r) {			\
+			sprintf(ptr, " %10" fmt, var);			\
+			ptr += strlen(ptr);				\
+		}							\
+		seq_printf(m, "%s\n", str);				\
+	} while (0)
+
+	PRINT_VAR("Ring name:",             "s", dev_priv->ring[r].name);
+	PRINT_VAR("  Ring seqno",           "d", ring->get_seqno(ring, false));
+	seq_putc(m, '\n');
+
+	seq_puts(m, "Batch submissions:\n");
+	PRINT_VAR("  Queued",               "u", stats[r].queued);
+	PRINT_VAR("  Submitted",            "u", stats[r].submitted);
+	PRINT_VAR("  Completed",            "u", stats[r].completed);
+	PRINT_VAR("  Expired",              "u", stats[r].expired);
+	seq_putc(m, '\n');
+
+	seq_puts(m, "Flush counts:\n");
+	PRINT_VAR("  By object",            "u", stats[r].flush_obj);
+	PRINT_VAR("  By request",           "u", stats[r].flush_req);
+	PRINT_VAR("  Blanket",              "u", stats[r].flush_all);
+	PRINT_VAR("  Entries bumped",       "u", stats[r].flush_bump);
+	PRINT_VAR("  Entries submitted",    "u", stats[r].flush_submit);
+	seq_putc(m, '\n');
+
+	seq_puts(m, "Miscellaneous:\n");
+	PRINT_VAR("  ExecEarly retry",      "u", stats[r].exec_early);
+	PRINT_VAR("  ExecFinal requeue",    "u", stats[r].exec_again);
+	PRINT_VAR("  ExecFinal killed",     "u", stats[r].exec_dead);
+	PRINT_VAR("  Fence wait",           "u", stats[r].fence_wait);
+	PRINT_VAR("  Fence wait again",     "u", stats[r].fence_again);
+	PRINT_VAR("  Fence wait ignore",    "u", stats[r].fence_ignore);
+	PRINT_VAR("  Fence supplied",       "u", stats[r].fence_got);
+	PRINT_VAR("  Hung flying",          "u", stats[r].kill_flying);
+	PRINT_VAR("  Hung queued",          "u", stats[r].kill_queued);
+	seq_putc(m, '\n');
+
+	seq_puts(m, "Queue contents:\n");
+	for_each_ring(ring, dev_priv, i)
+		i915_scheduler_query_stats(ring, node_stats + ring->id);
+
+	for (i = 0; i < (i915_sqs_MAX + 1); i++) {
+		sprintf(name, "  %s", i915_scheduler_queue_status_str(i));
+		PRINT_VAR(name, "d", node_stats[r].counts[i]);
+	}
+	seq_putc(m, '\n');
+
+#undef PRINT_VAR
+
+	mutex_unlock(&dev->mode_config.mutex);
+
+	return 0;
+}
+
 struct pipe_crc_info {
 	const char *name;
 	struct drm_device *dev;
@@ -5250,6 +5325,7 @@ static const struct drm_info_list i915_debugfs_list[] = {
 	{"i915_semaphore_status", i915_semaphore_status, 0},
 	{"i915_shared_dplls_info", i915_shared_dplls_info, 0},
 	{"i915_dp_mst_info", i915_dp_mst_info, 0},
+	{"i915_scheduler_info", i915_scheduler_info, 0},
 	{"i915_wa_registers", i915_wa_registers, 0},
 	{"i915_ddb_info", i915_ddb_info, 0},
 	{"i915_sseu_status", i915_sseu_status, 0},
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index b701838..5fd07ae 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1507,8 +1507,15 @@ static int i915_early_fence_wait(struct intel_engine_cs *ring, int fence_fd)
 	}
 
 	if (atomic_read(&fence->status) == 0) {
-		if (!i915_safe_to_ignore_fence(ring, fence))
+		struct drm_i915_private *dev_priv = ring->dev->dev_private;
+		struct i915_scheduler *scheduler = dev_priv->scheduler;
+
+		if (i915_safe_to_ignore_fence(ring, fence))
+			scheduler->stats[ring->id].fence_ignore++;
+		else {
+			scheduler->stats[ring->id].fence_wait++;
 			ret = sync_fence_wait(fence, 1000);
+		}
 	}
 
 	sync_fence_put(fence);
@@ -1922,6 +1929,8 @@ pre_mutex_err:
 		args->rsvd2 = (__u64) -1;
 	}
 
+	dev_priv->scheduler->stats[ring->id].exec_early++;
+
 	return ret;
 }
 
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index e22f6b8..1547b64 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -120,6 +120,9 @@ const char *i915_scheduler_queue_status_str(
 	case i915_sqs_dead:
 	return "Dead";
 
+	case i915_sqs_MAX:
+	return "Invalid";
+
 	default:
 	break;
 	}
@@ -211,10 +214,14 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 
 	BUG_ON(!scheduler);
 
+	if (qe->params.fence_wait)
+		scheduler->stats[ring->id].fence_got++;
+
 	if (i915.scheduler_override & i915_so_direct_submit) {
 		int ret;
 
 		qe->scheduler_index = scheduler->index++;
+		scheduler->stats[qe->params.ring->id].queued++;
 
 		trace_i915_scheduler_queue(qe->params.ring, qe);
 
@@ -225,6 +232,7 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 
 		scheduler->flags[qe->params.ring->id] |= i915_sf_submitting;
 		ret = dev_priv->gt.execbuf_final(&qe->params);
+		scheduler->stats[qe->params.ring->id].submitted++;
 		scheduler->flags[qe->params.ring->id] &= ~i915_sf_submitting;
 
 		/*
@@ -260,6 +268,8 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 			sync_fence_put(qe->params.fence_wait);
 #endif
 
+		scheduler->stats[qe->params.ring->id].expired++;
+
 		return 0;
 	}
 
@@ -372,6 +382,8 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 		not_flying = i915_scheduler_count_flying(scheduler, ring) <
 							 scheduler->min_flying;
 
+	scheduler->stats[ring->id].queued++;
+
 	trace_i915_scheduler_queue(ring, node);
 	trace_i915_scheduler_node_state_change(ring, node);
 
@@ -516,10 +528,12 @@ void i915_scheduler_kill_all(struct drm_device *dev)
 
 			case I915_SQS_CASE_FLYING:
 				i915_scheduler_node_kill(node);
+				scheduler->stats[r].kill_flying++;
 			break;
 
 			case I915_SQS_CASE_QUEUED:
 				i915_scheduler_node_kill_queued(node);
+				scheduler->stats[r].kill_queued++;
 			break;
 
 			default:
@@ -594,6 +608,7 @@ static void i915_scheduler_seqno_complete(struct intel_engine_cs *ring, uint32_t
 		/* Node was in flight so mark it as complete. */
 		node->status = i915_sqs_complete;
 		trace_i915_scheduler_node_state_change(ring, node);
+		scheduler->stats[ring->id].completed++;
 		got_changes = true;
 	}
 
@@ -743,6 +758,7 @@ static int i915_scheduler_remove(struct intel_engine_cs *ring)
 
 		list_del(&node->link);
 		list_add(&node->link, &remove);
+		scheduler->stats[ring->id].expired++;
 
 		/* Strip the dependency info while the mutex is still locked */
 		i915_scheduler_remove_dependent(scheduler, node);
@@ -985,6 +1001,35 @@ static int i915_scheduler_dump_locked(struct intel_engine_cs *ring, const char *
 	return 0;
 }
 
+int i915_scheduler_query_stats(struct intel_engine_cs *ring,
+			       struct i915_scheduler_stats_nodes *stats)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+	struct i915_scheduler_queue_entry  *node;
+	unsigned long   flags;
+
+	memset(stats, 0x00, sizeof(*stats));
+
+	spin_lock_irqsave(&scheduler->lock, flags);
+
+	list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
+		if (node->status >= i915_sqs_MAX) {
+			DRM_DEBUG_DRIVER("Invalid node state: %d! [uniq = %d, seqno = %d]\n",
+					 node->status, node->params.request->uniq, node->params.request->seqno);
+
+			stats->counts[i915_sqs_MAX]++;
+			continue;
+		}
+
+		stats->counts[node->status]++;
+	}
+
+	spin_unlock_irqrestore(&scheduler->lock, flags);
+
+	return 0;
+}
+
 int i915_scheduler_flush_request(struct drm_i915_gem_request *req,
 				 bool is_locked)
 {
@@ -1021,16 +1066,21 @@ int i915_scheduler_flush_request(struct drm_i915_gem_request *req,
 
 	spin_lock_irqsave(&scheduler->lock, flags);
 
+	scheduler->stats[ring_id].flush_req++;
+
 	i915_scheduler_priority_bump_clear(scheduler);
 
 	flush_count = i915_scheduler_priority_bump(scheduler,
 			    req->scheduler_qe, scheduler->priority_level_max);
+	scheduler->stats[ring_id].flush_bump += flush_count;
 
 	spin_unlock_irqrestore(&scheduler->lock, flags);
 
 	if (flush_count) {
 		DRM_DEBUG_DRIVER("<%s> Bumped %d entries\n", req->ring->name, flush_count);
 		flush_count = i915_scheduler_submit_max_priority(req->ring, is_locked);
+		if (flush_count > 0)
+			scheduler->stats[ring_id].flush_submit += flush_count;
 	}
 
 	return flush_count;
@@ -1057,6 +1107,8 @@ int i915_scheduler_flush(struct intel_engine_cs *ring, bool is_locked)
 
 	BUG_ON(is_locked && (scheduler->flags[ring->id] & i915_sf_submitting));
 
+	scheduler->stats[ring->id].flush_all++;
+
 	do {
 		found = false;
 		spin_lock_irqsave(&scheduler->lock, flags);
@@ -1071,6 +1123,7 @@ int i915_scheduler_flush(struct intel_engine_cs *ring, bool is_locked)
 
 		if (found) {
 			ret = i915_scheduler_submit(ring, is_locked);
+			scheduler->stats[ring->id].flush_submit++;
 			if (ret < 0)
 				return ret;
 
@@ -1213,15 +1266,20 @@ static void i915_scheduler_wait_fence_signaled(struct sync_fence *fence,
 static bool i915_scheduler_async_fence_wait(struct drm_device *dev,
 					    struct i915_scheduler_queue_entry *node)
 {
+	struct drm_i915_private         *dev_priv = node->params.ring->dev->dev_private;
+	struct i915_scheduler           *scheduler = dev_priv->scheduler;
 	struct i915_sync_fence_waiter	*fence_waiter;
 	struct sync_fence		*fence = node->params.fence_wait;
 	int				signaled;
 	bool				success = true;
 
-	if ((node->flags & i915_qef_fence_waiting) == 0)
+	if ((node->flags & i915_qef_fence_waiting) == 0) {
 		node->flags |= i915_qef_fence_waiting;
-	else
+		scheduler->stats[node->params.ring->id].fence_wait++;
+	} else {
+		scheduler->stats[node->params.ring->id].fence_again++;
 		return true;
+	}
 
 	if (fence == NULL)
 		return false;
@@ -1291,8 +1349,10 @@ static int i915_scheduler_pop_from_queue_locked(struct intel_engine_cs *ring,
 		else
 			signalled = true;
 
-		if (!signalled)
+		if (!signalled) {
 			signalled = i915_safe_to_ignore_fence(ring, node->params.fence_wait);
+			scheduler->stats[node->params.ring->id].fence_ignore++;
+		}
 #endif	// CONFIG_SYNC
 
 		has_local  = false;
@@ -1434,6 +1494,8 @@ static int i915_scheduler_submit(struct intel_engine_cs *ring, bool was_locked)
 		 * list. So add it back in and mark it as in flight. */
 		i915_scheduler_fly_node(node);
 
+		scheduler->stats[ring->id].submitted++;
+
 		scheduler->flags[ring->id] |= i915_sf_submitting;
 		spin_unlock_irqrestore(&scheduler->lock, flags);
 		ret = dev_priv->gt.execbuf_final(&node->params);
@@ -1452,6 +1514,7 @@ static int i915_scheduler_submit(struct intel_engine_cs *ring, bool was_locked)
 			case ENOENT:
 				/* Fatal errors. Kill the node. */
 				requeue = -1;
+				scheduler->stats[ring->id].exec_dead++;
 			break;
 
 			case EAGAIN:
@@ -1461,12 +1524,14 @@ static int i915_scheduler_submit(struct intel_engine_cs *ring, bool was_locked)
 			case ERESTARTSYS:
 			case EINTR:
 				/* Supposedly recoverable errors. */
+				scheduler->stats[ring->id].exec_again++;
 			break;
 
 			default:
 				DRM_DEBUG_DRIVER("<%s> Got unexpected error from execfinal(): %d!\n",
 						 ring->name, ret);
 				/* Assume it is recoverable and hope for the best. */
+				scheduler->stats[ring->id].exec_again++;
 			break;
 			}
 
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 3e6819d..dd0510c 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -80,6 +80,36 @@ struct i915_scheduler_queue_entry {
 };
 const char *i915_qe_state_str(struct i915_scheduler_queue_entry *node);
 
+struct i915_scheduler_stats_nodes {
+	uint32_t	counts[i915_sqs_MAX + 1];
+};
+
+struct i915_scheduler_stats {
+	/* Batch buffer counts: */
+	uint32_t            queued;
+	uint32_t            submitted;
+	uint32_t            completed;
+	uint32_t            expired;
+
+	/* Other stuff: */
+	uint32_t            flush_obj;
+	uint32_t            flush_req;
+	uint32_t            flush_all;
+	uint32_t            flush_bump;
+	uint32_t            flush_submit;
+
+	uint32_t            exec_early;
+	uint32_t            exec_again;
+	uint32_t            exec_dead;
+	uint32_t            kill_flying;
+	uint32_t            kill_queued;
+
+	uint32_t            fence_wait;
+	uint32_t            fence_again;
+	uint32_t            fence_ignore;
+	uint32_t            fence_got;
+};
+
 struct i915_scheduler {
 	struct list_head    node_queue[I915_NUM_RINGS];
 	uint32_t            flags[I915_NUM_RINGS];
@@ -92,6 +122,9 @@ struct i915_scheduler {
 	uint32_t            priority_level_preempt;
 	uint32_t            min_flying;
 	uint32_t            file_queue_max;
+
+	/* Statistics: */
+	struct i915_scheduler_stats     stats[I915_NUM_RINGS];
 };
 
 /* Flag bits for i915_scheduler::flags */
@@ -135,6 +168,8 @@ int         i915_scheduler_dump(struct intel_engine_cs *ring,
 int         i915_scheduler_dump_all(struct drm_device *dev, const char *msg);
 bool        i915_scheduler_is_request_tracked(struct drm_i915_gem_request *req,
 					      bool *completed, bool *busy);
+int         i915_scheduler_query_stats(struct intel_engine_cs *ring,
+				       struct i915_scheduler_stats_nodes *stats);
 bool        i915_scheduler_file_queue_is_full(struct drm_file *file);
 
 #endif  /* _I915_SCHEDULER_H_ */
-- 
1.9.1


* [RFC 35/39] drm/i915: Added seqno values to scheduler status dump
  2015-07-17 14:33 [RFC 00/39] GPU scheduler for i915 driver John.C.Harrison
                   ` (33 preceding siblings ...)
  2015-07-17 14:33 ` [RFC 34/39] drm/i915: Added scheduler statistic reporting to debugfs John.C.Harrison
@ 2015-07-17 14:33 ` John.C.Harrison
  2015-07-17 14:33 ` [RFC 36/39] drm/i915: Add scheduler support functions for TDR John.C.Harrison
                   ` (4 subsequent siblings)
  39 siblings, 0 replies; 48+ messages in thread
From: John.C.Harrison @ 2015-07-17 14:33 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

It is useful to be able to see what seqnos have actually popped out of the
hardware when viewing the scheduler status.

Change-Id: Ie93e51c64328be2606b8b43440f6344d5f225426
For: VIZ-1587
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_scheduler.c | 10 ++++++++++
 drivers/gpu/drm/i915/i915_scheduler.h |  1 +
 2 files changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 1547b64..7be1c89 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -152,6 +152,7 @@ const char *i915_scheduler_flag_str(uint32_t flags)
 	TEST_FLAG(i915_sf_dump_force,         "DumpForce|");
 	TEST_FLAG(i915_sf_dump_details,       "DumpDetails|");
 	TEST_FLAG(i915_sf_dump_dependencies,  "DumpDeps|");
+	TEST_FLAG(i915_sf_dump_seqno,         "DumpSeqno|");
 
 #undef TEST_FLAG
 
@@ -861,6 +862,7 @@ static int i915_scheduler_dump_all_locked(struct drm_device *dev, const char *ms
 	for_each_ring(ring, dev_priv, i) {
 		scheduler->flags[ring->id] |= i915_sf_dump_force   |
 					      i915_sf_dump_details |
+					      i915_sf_dump_seqno   |
 					      i915_sf_dump_dependencies;
 		r = i915_scheduler_dump_locked(ring, msg);
 		if (ret == 0)
@@ -942,6 +944,14 @@ static int i915_scheduler_dump_locked(struct intel_engine_cs *ring, const char *
 		return 0;
 	}
 
+	if (scheduler->flags[ring->id] & i915_sf_dump_seqno) {
+		uint32_t    seqno;
+
+		seqno    = ring->get_seqno(ring, true);
+
+		DRM_DEBUG_DRIVER("<%s> Seqno = %d\n", ring->name, seqno);
+	}
+
 	if (scheduler->flags[ring->id] & i915_sf_dump_details) {
 		int         i, deps;
 		uint32_t    count, counts[i915_sqs_MAX];
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index dd0510c..6e6e3a0 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -137,6 +137,7 @@ enum {
 	i915_sf_dump_force          = (1 << 8),
 	i915_sf_dump_details        = (1 << 9),
 	i915_sf_dump_dependencies   = (1 << 10),
+	i915_sf_dump_seqno          = (1 << 11),
 };
 const char *i915_scheduler_flag_str(uint32_t flags);
 
-- 
1.9.1


* [RFC 36/39] drm/i915: Add scheduler support functions for TDR
  2015-07-17 14:33 [RFC 00/39] GPU scheduler for i915 driver John.C.Harrison
                   ` (34 preceding siblings ...)
  2015-07-17 14:33 ` [RFC 35/39] drm/i915: Added seqno values to scheduler status dump John.C.Harrison
@ 2015-07-17 14:33 ` John.C.Harrison
  2015-07-17 14:33 ` [RFC 37/39] drm/i915: GPU priority bumping to prevent starvation John.C.Harrison
                   ` (3 subsequent siblings)
  39 siblings, 0 replies; 48+ messages in thread
From: John.C.Harrison @ 2015-07-17 14:33 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

Change-Id: I720463f01c4edd3579ce52e315a85e4d7874d7e5
For: VIZ-1587
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_scheduler.c | 31 +++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_scheduler.h |  1 +
 2 files changed, 32 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 7be1c89..631f4e6 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -1693,6 +1693,37 @@ int i915_scheduler_closefile(struct drm_device *dev, struct drm_file *file)
 	return 0;
 }
 
+/*
+ * Used by TDR to distinguish hung rings (not moving but with work to do)
+ * from idle rings (not moving because there is nothing to do).
+ */
+bool i915_scheduler_is_ring_flying(struct intel_engine_cs *ring)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+	struct i915_scheduler_queue_entry *node;
+	unsigned long   flags;
+	bool            found = false;
+
+	/* With the scheduler in bypass mode, no information can be returned. */
+	if (i915.scheduler_override & i915_so_direct_submit) {
+		return true;
+	}
+
+	spin_lock_irqsave(&scheduler->lock, flags);
+
+	list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
+		if (I915_SQS_IS_FLYING(node)) {
+			found = true;
+			break;
+		}
+	}
+
+	spin_unlock_irqrestore(&scheduler->lock, flags);
+
+	return found;
+}
+
 bool i915_scheduler_file_queue_is_full(struct drm_file *file)
 {
 	struct drm_i915_file_private *file_priv = file->driver_priv;
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 6e6e3a0..2113e7d 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -154,6 +154,7 @@ int         i915_scheduler_closefile(struct drm_device *dev,
 void        i915_gem_scheduler_clean_node(struct i915_scheduler_queue_entry *node);
 int         i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe);
 int         i915_scheduler_handle_irq(struct intel_engine_cs *ring);
+bool        i915_scheduler_is_ring_flying(struct intel_engine_cs *ring);
 void        i915_scheduler_kill_all(struct drm_device *dev);
 void        i915_gem_scheduler_work_handler(struct work_struct *work);
 #ifdef CONFIG_SYNC
-- 
1.9.1


* [RFC 37/39] drm/i915: GPU priority bumping to prevent starvation
  2015-07-17 14:33 [RFC 00/39] GPU scheduler for i915 driver John.C.Harrison
                   ` (35 preceding siblings ...)
  2015-07-17 14:33 ` [RFC 36/39] drm/i915: Add scheduler support functions for TDR John.C.Harrison
@ 2015-07-17 14:33 ` John.C.Harrison
  2015-07-17 14:33 ` [RFC 38/39] drm/i915: Enable GPU scheduler by default John.C.Harrison
                   ` (2 subsequent siblings)
  39 siblings, 0 replies; 48+ messages in thread
From: John.C.Harrison @ 2015-07-17 14:33 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

If a high priority task were to continuously submit batch buffers to the driver,
it could starve lower priority tasks of GPU time entirely. To prevent this, the
priority of a queued batch buffer is bumped each time it is passed over for
submission to the hardware.
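
As a worked example (the numbers are purely illustrative): with the default
bump value of 50 added on each submission pass that leaves a batch queued, a
batch submitted at priority 0 while another client continuously queues work at
priority 1000 will have matched that priority after 20 passes, so it cannot be
deferred indefinitely.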

Change-Id: I0319c7d2f306c61a283f03edda9b5d09a6d3b621
For: VIZ-1587
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c   | 28 ++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_scheduler.c | 14 ++++++++++++++
 drivers/gpu/drm/i915/i915_scheduler.h |  1 +
 3 files changed, 43 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 3c5c750..509668f 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1152,6 +1152,33 @@ DEFINE_SIMPLE_ATTRIBUTE(i915_scheduler_priority_max_fops,
 			"0x%llx\n");
 
 static int
+i915_scheduler_priority_bump_get(void *data, u64 *val)
+{
+	struct drm_device       *dev       = data;
+	struct drm_i915_private *dev_priv  = dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+
+	*val = (u64) scheduler->priority_level_bump;
+	return 0;
+}
+
+static int
+i915_scheduler_priority_bump_set(void *data, u64 val)
+{
+	struct drm_device       *dev       = data;
+	struct drm_i915_private *dev_priv  = dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+
+	scheduler->priority_level_bump = (u32) val;
+	return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(i915_scheduler_priority_bump_fops,
+			i915_scheduler_priority_bump_get,
+			i915_scheduler_priority_bump_set,
+			"0x%llx\n");
+
+static int
 i915_scheduler_priority_preempt_get(void *data, u64 *val)
 {
 	struct drm_device       *dev       = data;
@@ -5349,6 +5376,7 @@ static const struct i915_debugfs_files {
 	{"i915_error_state", &i915_error_state_fops},
 	{"i915_next_seqno", &i915_next_seqno_fops},
 	{"i915_scheduler_priority_max", &i915_scheduler_priority_max_fops},
+	{"i915_scheduler_priority_bump", &i915_scheduler_priority_bump_fops},
 	{"i915_scheduler_priority_preempt", &i915_scheduler_priority_preempt_fops},
 	{"i915_scheduler_min_flying", &i915_scheduler_min_flying_fops},
 	{"i915_scheduler_file_queue_max", &i915_scheduler_file_queue_max_fops},
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 631f4e6..8de3f0b 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -191,6 +191,7 @@ int i915_scheduler_init(struct drm_device *dev)
 
 	/* Default tuning values: */
 	scheduler->priority_level_max     = ~0U;
+	scheduler->priority_level_bump    =  50;
 	scheduler->priority_level_preempt = 900;
 	scheduler->min_flying             = 2;
 	scheduler->file_queue_max         = 64;
@@ -1568,6 +1569,19 @@ static int i915_scheduler_submit(struct intel_engine_cs *ring, bool was_locked)
 		ret = i915_scheduler_pop_from_queue_locked(ring, &node, &flags);
 	} while (ret == 0);
 
+	/*
+	 * Bump the priority of everything that was not submitted to prevent
+	 * starvation of low priority tasks by a spamming high priority task.
+	 */
+	i915_scheduler_priority_bump_clear(scheduler);
+	list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
+		if (!I915_SQS_IS_QUEUED(node))
+			continue;
+
+		i915_scheduler_priority_bump(scheduler, node,
+					     scheduler->priority_level_bump);
+	}
+
 	spin_unlock_irqrestore(&scheduler->lock, flags);
 
 	if (!was_locked)
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 2113e7d..8f3e42f 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -119,6 +119,7 @@ struct i915_scheduler {
 
 	/* Tuning parameters: */
 	uint32_t            priority_level_max;
+	uint32_t            priority_level_bump;
 	uint32_t            priority_level_preempt;
 	uint32_t            min_flying;
 	uint32_t            file_queue_max;
-- 
1.9.1


* [RFC 38/39] drm/i915: Enable GPU scheduler by default
  2015-07-17 14:33 [RFC 00/39] GPU scheduler for i915 driver John.C.Harrison
                   ` (36 preceding siblings ...)
  2015-07-17 14:33 ` [RFC 37/39] drm/i915: GPU priority bumping to prevent starvation John.C.Harrison
@ 2015-07-17 14:33 ` John.C.Harrison
  2015-07-17 14:33 ` [RFC 39/39] drm/i915: Allow scheduler to manage inter-ring object synchronisation John.C.Harrison
  2015-07-21 13:33 ` [RFC 00/39] GPU scheduler for i915 driver Daniel Vetter
  39 siblings, 0 replies; 48+ messages in thread
From: John.C.Harrison @ 2015-07-17 14:33 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

Now that all the scheduler patches have been applied, it is safe to enable.

Change-Id: I128042e85a30fca765ce1eb46c837c62dee66089
For: VIZ-1587
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_params.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
index a5320ff..4656518 100644
--- a/drivers/gpu/drm/i915/i915_params.c
+++ b/drivers/gpu/drm/i915/i915_params.c
@@ -53,7 +53,7 @@ struct i915_params i915 __read_mostly = {
 	.verbose_state_checks = 1,
 	.nuclear_pageflip = 0,
 	.edp_vswing = 0,
-	.scheduler_override = 1,
+	.scheduler_override = 0,
 };
 
 module_param_named(modeset, i915.modeset, int, 0400);
@@ -189,4 +189,4 @@ MODULE_PARM_DESC(edp_vswing,
 		 "2=default swing(400mV))");
 
 module_param_named(scheduler_override, i915.scheduler_override, int, 0600);
-MODULE_PARM_DESC(scheduler_override, "Scheduler override mask (0 = none, 1 = direct submission [default])");
+MODULE_PARM_DESC(scheduler_override, "Scheduler override mask (default: 0)");
-- 
1.9.1


* [RFC 39/39] drm/i915: Allow scheduler to manage inter-ring object synchronisation
  2015-07-17 14:33 [RFC 00/39] GPU scheduler for i915 driver John.C.Harrison
                   ` (37 preceding siblings ...)
  2015-07-17 14:33 ` [RFC 38/39] drm/i915: Enable GPU scheduler by default John.C.Harrison
@ 2015-07-17 14:33 ` John.C.Harrison
  2015-07-21 13:33 ` [RFC 00/39] GPU scheduler for i915 driver Daniel Vetter
  39 siblings, 0 replies; 48+ messages in thread
From: John.C.Harrison @ 2015-07-17 14:33 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The scheduler has always tracked batch buffer dependencies based on
DRM object usage. This means that it will not submit a batch on one
ring that has outstanding dependencies still executing on other rings.
This is exactly the same synchronisation performed by
i915_gem_object_sync() using hardware semaphores where available and
CPU stalls where not (e.g. in execlist mode and/or on Gen8 hardware).

Unfortunately, when a batch buffer is submitted to the driver the
_object_sync() call happens first. Thus, in the case where hardware
semaphores are disabled, the driver has already stalled until the
dependency has been resolved.

This patch adds an optimisation to _object_sync() to ignore the
synchronisation in the case where it will subsequently be handled by
the scheduler. This removes the driver stall and (in the single
application case) provides near hardware semaphore performance even
when hardware semaphores are disabled. In a busy system where there is
other work that can be executed on the stalling ring, it provides
better than hardware semaphore performance as it removes the stall
from both the driver and from the hardware. There is also a theory
that this method should improve power usage as hardware semaphores are
apparently not very power efficient - the stalled ring does not go
into as low a power state as when it is genuinely idle.

The optimisation is to check whether both ends of the synchronisation
are batch buffer requests. If they are, then the scheduler will have
the inter-dependency tracked and managed. If one or other end is not a
batch buffer request (e.g. a page flip) then the code falls back to
the CPU stall or hardware semaphore as appropriate.

To check whether the existing usage is a batch buffer, the code simply
calls the 'are you tracking this request' function of the scheduler on
the object's last_read_req member. To check whether the new usage is a
batch buffer, a flag is passed in from the caller.

Change-Id: Idc16e19b5a4dc8b3782ce9db44dd3df445f396c1
Issue: VIZ-5566
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h            |  2 +-
 drivers/gpu/drm/i915/i915_gem.c            | 19 +++++++++++++++----
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  2 +-
 drivers/gpu/drm/i915/intel_lrc.c           |  2 +-
 4 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index e230632..e4bef2c 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2895,7 +2895,7 @@ int __must_check i915_mutex_lock_interruptible(struct drm_device *dev);
 #endif
 int i915_gem_object_sync(struct drm_i915_gem_object *obj,
 			 struct intel_engine_cs *to,
-			 struct drm_i915_gem_request **to_req);
+			 struct drm_i915_gem_request **to_req, bool to_batch);
 void i915_vma_move_to_active(struct i915_vma *vma,
 			     struct drm_i915_gem_request *req);
 int i915_gem_dumb_create(struct drm_file *file_priv,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 7308838..e0dca8c 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3507,7 +3507,7 @@ static int
 __i915_gem_object_sync(struct drm_i915_gem_object *obj,
 		       struct intel_engine_cs *to,
 		       struct drm_i915_gem_request *from_req,
-		       struct drm_i915_gem_request **to_req)
+		       struct drm_i915_gem_request **to_req, bool to_batch)
 {
 	struct intel_engine_cs *from;
 	int ret;
@@ -3519,6 +3519,15 @@ __i915_gem_object_sync(struct drm_i915_gem_object *obj,
 	if (i915_gem_request_completed(from_req))
 		return 0;
 
+	/*
+	 * The scheduler will manage inter-ring object dependencies
+	 * as long as both to and from requests are scheduler managed
+	 * (i.e. batch buffers).
+	 */
+	if (to_batch &&
+	    i915_scheduler_is_request_tracked(from_req, NULL, NULL))
+		return 0;
+
 	if (!i915_semaphore_is_enabled(obj->base.dev)) {
 		struct drm_i915_private *i915 = to_i915(obj->base.dev);
 		ret = __i915_wait_request(from_req,
@@ -3569,6 +3578,8 @@ __i915_gem_object_sync(struct drm_i915_gem_object *obj,
  * @to_req: request we wish to use the object for. See below.
  *          This will be allocated and returned if a request is
  *          required but not passed in.
+ * @to_batch: is the sync request on behalf of batch buffer submission?
+ * If so then the scheduler can (potentially) manage the synchronisation.
  *
  * This code is meant to abstract object synchronization with the GPU.
  * Calling with NULL implies synchronizing the object with the CPU
@@ -3599,7 +3610,7 @@ __i915_gem_object_sync(struct drm_i915_gem_object *obj,
 int
 i915_gem_object_sync(struct drm_i915_gem_object *obj,
 		     struct intel_engine_cs *to,
-		     struct drm_i915_gem_request **to_req)
+		     struct drm_i915_gem_request **to_req, bool to_batch)
 {
 	const bool readonly = obj->base.pending_write_domain == 0;
 	struct drm_i915_gem_request *req[I915_NUM_RINGS];
@@ -3621,7 +3632,7 @@ i915_gem_object_sync(struct drm_i915_gem_object *obj,
 				req[n++] = obj->last_read_req[i];
 	}
 	for (i = 0; i < n; i++) {
-		ret = __i915_gem_object_sync(obj, to, req[i], to_req);
+		ret = __i915_gem_object_sync(obj, to, req[i], to_req, to_batch);
 		if (ret)
 			return ret;
 	}
@@ -4568,7 +4579,7 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
 	u32 old_read_domains, old_write_domain;
 	int ret;
 
-	ret = i915_gem_object_sync(obj, pipelined, pipelined_request);
+	ret = i915_gem_object_sync(obj, pipelined, pipelined_request, false);
 	if (ret)
 		return ret;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 5fd07ae..82d3975 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -910,7 +910,7 @@ i915_gem_execbuffer_move_to_gpu(struct drm_i915_gem_request *req,
 		struct drm_i915_gem_object *obj = vma->obj;
 
 		if (obj->active & other_rings) {
-			ret = i915_gem_object_sync(obj, req->ring, &req);
+			ret = i915_gem_object_sync(obj, req->ring, &req, true);
 			if (ret)
 				return ret;
 		}
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 3cedddb..659c9d9 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -624,7 +624,7 @@ static int execlists_move_to_gpu(struct drm_i915_gem_request *req,
 		struct drm_i915_gem_object *obj = vma->obj;
 
 		if (obj->active & other_rings) {
-			ret = i915_gem_object_sync(obj, req->ring, &req);
+			ret = i915_gem_object_sync(obj, req->ring, &req, true);
 			if (ret)
 				return ret;
 		}
-- 
1.9.1


* Re: [RFC 03/39] drm/i915: Explicit power enable during deferred context initialisation
  2015-07-17 14:33 ` [RFC 03/39] drm/i915: Explicit power enable during deferred context initialisation John.C.Harrison
@ 2015-07-21  7:54   ` Daniel Vetter
  0 siblings, 0 replies; 48+ messages in thread
From: Daniel Vetter @ 2015-07-21  7:54 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Fri, Jul 17, 2015 at 03:33:12PM +0100, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> A later patch in this series re-organises the batch buffer submission
> code. Part of that is to reduce the scope of a pm_get/put pair.
> Specifically, they previously wrapped the entire submission path from
> the very start to the very end, now they only wrap the actual hardware
> submission part in the back half.
> 
> While that is a good thing in general, it causes a problem with the
> deferred context initialisation. That is done quite early on in the
> execbuf code path - it happens at context validation time rather than
> context switch time. Some of the deferred work requires the power to
> be enabled. Hence this patch adds an explicit power reference count to
> the deferred initialisation code itself.

The main reason for grabbing rpm for the entire execbuf is writing ptes
for global gtt. This happens on gen6 due to some hilarious tlb aliasing hw
bug where we have to bind objects into both ggtt and ppgtt.
-Daniel

> 
> Change-Id: Id7b1535dfd8809a2bd5546272de2bbec39da2868
> Issue: GMINL-5159
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> ---
>  drivers/gpu/drm/i915/intel_lrc.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 18dbd5c..8aa9a18 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -2317,12 +2317,15 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
>  	WARN_ON(ctx->legacy_hw_ctx.rcs_state != NULL);
>  	WARN_ON(ctx->engine[ring->id].state);
>  
> +	intel_runtime_pm_get(dev->dev_private);
> +
>  	context_size = round_up(get_lr_context_size(ring), 4096);
>  
>  	ctx_obj = i915_gem_alloc_object(dev, context_size);
>  	if (!ctx_obj) {
>  		DRM_DEBUG_DRIVER("Alloc LRC backing obj failed.\n");
> -		return -ENOMEM;
> +		ret = -ENOMEM;
> +		goto error_pm;
>  	}
>  
>  	if (is_global_default_ctx) {
> @@ -2331,7 +2334,7 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
>  			DRM_DEBUG_DRIVER("Pin LRC backing obj failed: %d\n",
>  					ret);
>  			drm_gem_object_unreference(&ctx_obj->base);
> -			return ret;
> +			goto error_pm;
>  		}
>  	}
>  
> @@ -2415,6 +2418,7 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
>  		ctx->rcs_initialized = true;
>  	}
>  
> +	intel_runtime_pm_put(dev->dev_private);
>  	return 0;
>  
>  error:
> @@ -2428,6 +2432,8 @@ error_unpin_ctx:
>  	if (is_global_default_ctx)
>  		i915_gem_object_ggtt_unpin(ctx_obj);
>  	drm_gem_object_unreference(&ctx_obj->base);
> +error_pm:
> +	intel_runtime_pm_put(dev->dev_private);
>  	return ret;
>  }
>  
> -- 
> 1.9.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC 05/39] drm/i915: Split i915_dem_do_execbuffer() in half
  2015-07-17 14:33 ` [RFC 05/39] drm/i915: Split i915_dem_do_execbuffer() in half John.C.Harrison
@ 2015-07-21  8:00   ` Daniel Vetter
  0 siblings, 0 replies; 48+ messages in thread
From: Daniel Vetter @ 2015-07-21  8:00 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Fri, Jul 17, 2015 at 03:33:14PM +0100, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> Split the execbuffer() function in half. The first half collects and validates
> all the information required to process the batch buffer. It also does all the
> object pinning, relocations, active list management, etc - basically anything
> that must be done upfront before the IOCTL returns and allows the user land side
> to start changing/freeing things. The second half does the actual ring
> submission.
> 
> This change implements the split but leaves the back half being called directly
> from the end of the front half.
> 
> Change-Id: I5e1c77639ce526ab2401b0323186c518bf13da0a
> For: VIZ-1587
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h            |  11 +++
>  drivers/gpu/drm/i915/i915_gem.c            |   2 +
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 130 ++++++++++++++++++++---------
>  drivers/gpu/drm/i915/intel_lrc.c           |  58 +++++++++----
>  drivers/gpu/drm/i915/intel_lrc.h           |   1 +
>  5 files changed, 147 insertions(+), 55 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 289ddd6..28d51ac 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1684,10 +1684,18 @@ struct i915_execbuffer_params {
>  	struct drm_device               *dev;
>  	struct drm_file                 *file;
>  	uint32_t                        dispatch_flags;
> +	uint32_t                        args_flags;
>  	uint32_t                        args_batch_start_offset;
> +	uint32_t                        args_batch_len;
> +	uint32_t                        args_num_cliprects;
> +	uint32_t                        args_DR1;
> +	uint32_t                        args_DR4;
>  	uint32_t                        batch_obj_vm_offset;
>  	struct intel_engine_cs          *ring;
>  	struct drm_i915_gem_object      *batch_obj;
> +	struct drm_clip_rect            *cliprects;
> +	uint32_t                        instp_mask;
> +	int                             instp_mode;
>  	struct intel_context            *ctx;
>  	struct drm_i915_gem_request     *request;
>  };
> @@ -1929,6 +1937,7 @@ struct drm_i915_private {
>  		int (*execbuf_submit)(struct i915_execbuffer_params *params,
>  				      struct drm_i915_gem_execbuffer2 *args,
>  				      struct list_head *vmas);
> +		int (*execbuf_final)(struct i915_execbuffer_params *params);

No need for this vfunc since you only call this from specialized and not
generic code.
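
Roughly, the direct-call shape would look like this (a minimal sketch only; the
*_final names here are hypothetical, not functions from the series):

	/* params is the struct i915_execbuffer_params built by the front half */

	/* legacy ringbuffer path calls its own back half directly ... */
	ret = i915_gem_ringbuffer_submission_final(params);

	/* ... and the execlists path calls its own, */
	ret = intel_execlists_submission_final(params);

	/* instead of both indirecting through dev_priv->gt.execbuf_final(params) */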
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC 04/39] drm/i915: Prelude to splitting i915_gem_do_execbuffer in two
  2015-07-17 14:33 ` [RFC 04/39] drm/i915: Prelude to splitting i915_gem_do_execbuffer in two John.C.Harrison
@ 2015-07-21  8:06   ` Daniel Vetter
  0 siblings, 0 replies; 48+ messages in thread
From: Daniel Vetter @ 2015-07-21  8:06 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Fri, Jul 17, 2015 at 03:33:13PM +0100, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> The scheduler decouples the submission of batch buffers to the driver with their
> submission to the hardware. This basically means splitting the execbuffer()
> function in half. This change rearranges some code ready for the split to occur.

Would be nice to explain what and why moves so reviewers don't have to
reverse-engineer the idea behind this patch.

> 
> Change-Id: Icc9c8afaac18821f3eb8a151a49f918f90c068a3
> For: VIZ-1587
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>

Moving move_to_active around is really scary and should be a separate
patch. The reason it is where it currently is, is that move_to_active is
somewhat destructive and can't be undone. Hence it is past the point
of no return.

Also why can't we just pull this out into generic code?
-Daniel


> ---
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 57 ++++++++++++++++++------------
>  drivers/gpu/drm/i915/intel_lrc.c           | 18 +++++++---
>  2 files changed, 47 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index d95d472..988ecd4 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -926,10 +926,7 @@ i915_gem_execbuffer_move_to_gpu(struct drm_i915_gem_request *req,
>  	if (flush_domains & I915_GEM_DOMAIN_GTT)
>  		wmb();
>  
> -	/* Unconditionally invalidate gpu caches and ensure that we do flush
> -	 * any residual writes from the previous batch.
> -	 */
> -	return intel_ring_invalidate_all_caches(req);
> +	return 0;
>  }
>  
>  static bool
> @@ -1253,17 +1250,6 @@ i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
>  		}
>  	}
>  
> -	ret = i915_gem_execbuffer_move_to_gpu(params->request, vmas);
> -	if (ret)
> -		goto error;
> -
> -	ret = i915_switch_context(params->request);
> -	if (ret)
> -		goto error;
> -
> -	WARN(params->ctx->ppgtt && params->ctx->ppgtt->pd_dirty_rings & (1<<ring->id),
> -	     "%s didn't clear reload\n", ring->name);
> -
>  	instp_mode = args->flags & I915_EXEC_CONSTANTS_MASK;
>  	instp_mask = I915_EXEC_CONSTANTS_MASK;
>  	switch (instp_mode) {
> @@ -1301,6 +1287,32 @@ i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
>  		goto error;
>  	}
>  
> +	ret = i915_gem_execbuffer_move_to_gpu(params->request, vmas);
> +	if (ret)
> +		goto error;
> +
> +	i915_gem_execbuffer_move_to_active(vmas, params->request);
> +
> +	/* To be split into two functions here... */
> +
> +	intel_runtime_pm_get(dev_priv);
> +
> +	/*
> +	 * Unconditionally invalidate gpu caches and ensure that we do flush
> +	 * any residual writes from the previous batch.
> +	 */
> +	ret = intel_ring_invalidate_all_caches(params->request);
> +	if (ret)
> +		goto error;
> +
> +	/* Switch to the correct context for the batch */
> +	ret = i915_switch_context(params->request);
> +	if (ret)
> +		goto error;
> +
> +	WARN(params->ctx->ppgtt && params->ctx->ppgtt->pd_dirty_rings & (1<<ring->id),
> +	     "%s didn't clear reload\n", ring->name);
> +
>  	if (ring == &dev_priv->ring[RCS] &&
>  			instp_mode != dev_priv->relative_constants_mode) {
>  		ret = intel_ring_begin(params->request, 4);
> @@ -1344,15 +1356,20 @@ i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
>  						exec_start, exec_len,
>  						params->dispatch_flags);
>  		if (ret)
> -			return ret;
> +			goto error;
>  	}
>  
>  	trace_i915_gem_ring_dispatch(params->request, params->dispatch_flags);
>  
> -	i915_gem_execbuffer_move_to_active(vmas, params->request);
>  	i915_gem_execbuffer_retire_commands(params);
>  
>  error:
> +	/*
> +	 * intel_gpu_busy should also get a ref, so it will free when the device
> +	 * is really idle.
> +	 */
> +	intel_runtime_pm_put(dev_priv);
> +
>  	kfree(cliprects);
>  	return ret;
>  }
> @@ -1563,8 +1580,6 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>  	}
>  #endif
>  
> -	intel_runtime_pm_get(dev_priv);
> -
>  	ret = i915_mutex_lock_interruptible(dev);
>  	if (ret)
>  		goto pre_mutex_err;
> @@ -1759,10 +1774,6 @@ err:
>  	mutex_unlock(&dev->struct_mutex);
>  
>  pre_mutex_err:
> -	/* intel_gpu_busy should also get a ref, so it will free when the device
> -	 * is really idle. */
> -	intel_runtime_pm_put(dev_priv);
> -
>  	if (fd_fence_complete != -1) {
>  		sys_close(fd_fence_complete);
>  		args->rsvd2 = (__u64) -1;
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 8aa9a18..89f3bcd 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -613,10 +613,7 @@ static int execlists_move_to_gpu(struct drm_i915_gem_request *req,
>  	if (flush_domains & I915_GEM_DOMAIN_GTT)
>  		wmb();
>  
> -	/* Unconditionally invalidate gpu caches and ensure that we do flush
> -	 * any residual writes from the previous batch.
> -	 */
> -	return logical_ring_invalidate_all_caches(req);
> +	return 0;
>  }
>  
>  int intel_logical_ring_alloc_request_extras(struct drm_i915_gem_request *request)
> @@ -889,6 +886,18 @@ int intel_execlists_submission(struct i915_execbuffer_params *params,
>  	if (ret)
>  		return ret;
>  
> +	i915_gem_execbuffer_move_to_active(vmas, params->request);
> +
> +	/* To be split into two functions here... */
> +
> +	/*
> +	 * Unconditionally invalidate gpu caches and ensure that we do flush
> +	 * any residual writes from the previous batch.
> +	 */
> +	ret = logical_ring_invalidate_all_caches(params->request);
> +	if (ret)
> +		return ret;
> +
>  	if (ring == &dev_priv->ring[RCS] &&
>  	    instp_mode != dev_priv->relative_constants_mode) {
>  		ret = intel_logical_ring_begin(params->request, 4);
> @@ -913,7 +922,6 @@ int intel_execlists_submission(struct i915_execbuffer_params *params,
>  
>  	trace_i915_gem_ring_dispatch(params->request, params->dispatch_flags);
>  
> -	i915_gem_execbuffer_move_to_active(vmas, params->request);
>  	i915_gem_execbuffer_retire_commands(params);
>  
>  	return 0;
> -- 
> 1.9.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC 24/39] drm/i915: Support for 'unflushed' ring idle
  2015-07-17 14:33 ` [RFC 24/39] drm/i915: Support for 'unflushed' ring idle John.C.Harrison
@ 2015-07-21  8:50   ` Daniel Vetter
  0 siblings, 0 replies; 48+ messages in thread
From: Daniel Vetter @ 2015-07-21  8:50 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Fri, Jul 17, 2015 at 03:33:33PM +0100, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> When the seqno wraps around zero, the entire GPU is forced to be idle
> for some reason (possibly only to work around issues with hardware
> semaphores but no-one seems too sure!). This causes a problem if the

It is for semaphores, and since 9d7730914f4cd we have extensive testcases
for this case thanks to Mika. Originally this was added in 1ec14ad313270
but somehow the details in the comments were lost.

So maybe no human knows it any more, but git log surely still does ;-)
-Daniel

> force idle occurs at an inopportune moment such as in the middle of
> submitting a batch buffer. Specifically, it would lead to recursive
> submits - submitting work requires a new seqno, the new seqno requires
> idling the ring, idling the ring requires submitting work, submitting
> work requires a new seqno...
> 
> This change adds a 'flush' parameter to the idle function call which
> specifies whether the scheduler queues should be flushed out. I.e. is
> the call intended to just idle the ring as it is right now (no flush)
> or is it intended to force all outstanding work out of the system
> (with flush).
> 
> In the seqno wrap case, pending work is not an issue because the next
> operation will be to submit it. However, in other cases, the intention
> is to make sure everything that could be done has been done.
> 
> Change-Id: I182e9a5853666c64ecc9e84d8a8b820a7f8e8836
> For: VIZ-1587
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem.c         |  4 ++--
>  drivers/gpu/drm/i915/intel_lrc.c        |  2 +-
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 17 +++++++++++++++--
>  drivers/gpu/drm/i915/intel_ringbuffer.h |  2 +-
>  4 files changed, 19 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 6d72caa..20c696f 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2474,7 +2474,7 @@ i915_gem_init_seqno(struct drm_device *dev, u32 seqno)
>  
>  	/* Carefully retire all requests without writing to the rings */
>  	for_each_ring(ring, dev_priv, i) {
> -		ret = intel_ring_idle(ring);
> +		ret = intel_ring_idle(ring, false);
>  		if (ret)
>  			return ret;
>  	}
> @@ -3732,7 +3732,7 @@ int i915_gpu_idle(struct drm_device *dev)
>  			i915_add_request_no_flush(req);
>  		}
>  
> -		ret = intel_ring_idle(ring);
> +		ret = intel_ring_idle(ring, true);
>  		if (ret)
>  			return ret;
>  	}
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 76d5023..a811d0b 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -990,7 +990,7 @@ void intel_logical_ring_stop(struct intel_engine_cs *ring)
>  	if (!intel_ring_initialized(ring))
>  		return;
>  
> -	ret = intel_ring_idle(ring);
> +	ret = intel_ring_idle(ring, true);
>  	if (ret && !i915_reset_in_progress(&to_i915(ring->dev)->gpu_error))
>  		DRM_ERROR("failed to quiesce %s whilst cleaning up: %d\n",
>  			  ring->name, ret);
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index e0992b7..afb04de 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -2177,9 +2177,22 @@ static void __wrap_ring_buffer(struct intel_ringbuffer *ringbuf)
>  	intel_ring_update_space(ringbuf);
>  }
>  
> -int intel_ring_idle(struct intel_engine_cs *ring)
> +int intel_ring_idle(struct intel_engine_cs *ring, bool flush)
>  {
>  	struct drm_i915_gem_request *req;
> +	int ret;
> +
> +	/*
> +	 * NB: Must not flush the scheduler if this idle request is from
> +	 * within an execbuff submission (i.e. due to 'get_seqno' calling
> +	 * 'wrap_seqno' calling 'idle'). As that would lead to recursive
> +	 * flushes!
> +	 */
> +	if (flush) {
> +		ret = i915_scheduler_flush(ring, true);
> +		if (ret)
> +			return ret;
> +	}
>  
>  	/* Wait upon the last request to be completed */
>  	if (list_empty(&ring->request_list))
> @@ -2983,7 +2996,7 @@ intel_stop_ring_buffer(struct intel_engine_cs *ring)
>  	if (!intel_ring_initialized(ring))
>  		return;
>  
> -	ret = intel_ring_idle(ring);
> +	ret = intel_ring_idle(ring, true);
>  	if (ret && !i915_reset_in_progress(&to_i915(ring->dev)->gpu_error))
>  		DRM_ERROR("failed to quiesce %s whilst cleaning up: %d\n",
>  			  ring->name, ret);
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 9457774..2f30900 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -487,7 +487,7 @@ void intel_ring_update_space(struct intel_ringbuffer *ringbuf);
>  int intel_ring_space(struct intel_ringbuffer *ringbuf);
>  bool intel_ring_stopped(struct intel_engine_cs *ring);
>  
> -int __must_check intel_ring_idle(struct intel_engine_cs *ring);
> +int __must_check intel_ring_idle(struct intel_engine_cs *ring, bool flush);
>  void intel_ring_init_seqno(struct intel_engine_cs *ring, u32 seqno);
>  int intel_ring_flush_all_caches(struct drm_i915_gem_request *req);
>  int intel_ring_invalidate_all_caches(struct drm_i915_gem_request *req);
> -- 
> 1.9.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC 19/39] drm/i915: Added scheduler support to __wait_request() calls
  2015-07-17 14:33 ` [RFC 19/39] drm/i915: Added scheduler support to __wait_request() calls John.C.Harrison
@ 2015-07-21  9:27   ` Daniel Vetter
  0 siblings, 0 replies; 48+ messages in thread
From: Daniel Vetter @ 2015-07-21  9:27 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Fri, Jul 17, 2015 at 03:33:28PM +0100, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> The scheduler can cause batch buffers, and hence requests, to be submitted to
> the ring out of order and asynchronously to their submission to the driver. Thus
> at the point of waiting for the completion of a given request, it is not even
> guaranteed that the request has actually been sent to the hardware yet. Even if
> it has been sent, it is possible that it could be pre-empted and thus 'unsent'.
> 
> This means that it is necessary to be able to submit requests to the hardware
> during the wait call itself. Unfortunately, while some callers of
> __wait_request() release the mutex lock first, others do not (and apparently can
> not). Hence there is the ability to deadlock as the wait stalls for submission
> but the asynchronous submission is stalled for the mutex lock.
> 
> This change hooks the scheduler in to the __wait_request() code to ensure
> correct behaviour. That is, flush the target batch buffer through to the
> hardware and do not deadlock waiting for something that cannot currently be
> submitted. Instead, the wait call must return EAGAIN at least as far back as
> necessary to release the mutex lock and allow the scheduler's asynchronous
> processing to get in and handle the pre-emption operation and eventually
> (re-)submit the work.

I expect this to break the shrinker logic and that means fixing up all the
callers that bind objects into ggtt, like you already do (well, comment
about) for the page fault handler. And just doing -EAGAIN spinning isn't
pretty either.

Instead we need a proper locking scheme here so that the scheduler can run
while the struct mutex is held. execlist submission had the same
requirement and hence has grown its own locks.
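
Very roughly, such a scheme could look like this (minimal sketch only; the
per-engine 'hw_lock' field and the function name are hypothetical, not code
from the series):

	static int i915_scheduler_submit_unlocked(struct intel_engine_cs *ring)
	{
		unsigned long flags;

		/* No dev->struct_mutex needed; a waiter can keep holding it
		 * while the scheduler makes forward progress. */
		spin_lock_irqsave(&ring->hw_lock, flags);

		/* pop the highest-priority ready node and hand it to the
		 * back-half submission here */

		spin_unlock_irqrestore(&ring->hw_lock, flags);

		return 0;
	}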

Note that there's currently one type of memory that we can evict which the
shrinker doesn't take care of, and that's to-be-unpinned buffers after
pageflips. We're working to fix that.
-Daniel

> 
> Change-Id: I31fe6bc7e38f6ffdd843fcae16e7cc8b1e52a931
> For: VIZ-1587
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h         |  3 +-
>  drivers/gpu/drm/i915/i915_gem.c         | 37 +++++++++++---
>  drivers/gpu/drm/i915/i915_scheduler.c   | 91 +++++++++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/i915_scheduler.h   |  2 +
>  drivers/gpu/drm/i915/intel_display.c    |  3 +-
>  drivers/gpu/drm/i915/intel_ringbuffer.c |  2 +-
>  6 files changed, 128 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 2b3fab6..e9e0736 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2972,7 +2972,8 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>  			unsigned reset_counter,
>  			bool interruptible,
>  			s64 *timeout,
> -			struct intel_rps_client *rps);
> +			struct intel_rps_client *rps,
> +			bool is_locked);
>  int __must_check i915_wait_request(struct drm_i915_gem_request *req);
>  int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf);
>  int __must_check
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index cb5af5d..f713cda 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1219,7 +1219,8 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>  			unsigned reset_counter,
>  			bool interruptible,
>  			s64 *timeout,
> -			struct intel_rps_client *rps)
> +			struct intel_rps_client *rps,
> +			bool is_locked)
>  {
>  	struct intel_engine_cs *ring = i915_gem_request_get_ring(req);
>  	struct drm_device *dev = ring->dev;
> @@ -1229,8 +1230,10 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>  	DEFINE_WAIT(wait);
>  	unsigned long timeout_expire;
>  	s64 before, now;
> -	int ret;
> +	int ret = 0;
> +	bool    busy;
>  
> +	might_sleep();
>  	WARN(!intel_irqs_enabled(dev_priv), "IRQs disabled");
>  
>  	if (list_empty(&req->list))
> @@ -1281,6 +1284,22 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>  			break;
>  		}
>  
> +		if (is_locked) {
> +			/* If this request is being processed by the scheduler
> +			 * then it is unsafe to sleep with the mutex lock held
> +			 * as the scheduler may require the lock in order to
> +			 * progress the request. */
> +			if (i915_scheduler_is_request_tracked(req, NULL, &busy)) {
> +				if (busy) {
> +					ret = -EAGAIN;
> +					break;
> +				}
> +			}
> +
> +			/* If the request is not tracked by the scheduler then the
> +			 * regular test can be done. */
> +		}
> +
>  		if (i915_gem_request_completed(req)) {
>  			ret = 0;
>  			break;
> @@ -1452,13 +1471,17 @@ i915_wait_request(struct drm_i915_gem_request *req)
>  
>  	BUG_ON(!mutex_is_locked(&dev->struct_mutex));
>  
> +	ret = i915_scheduler_flush_request(req, true);
> +	if (ret < 0)
> +		return ret;
> +
>  	ret = i915_gem_check_wedge(&dev_priv->gpu_error, interruptible);
>  	if (ret)
>  		return ret;
>  
>  	ret = __i915_wait_request(req,
>  				  atomic_read(&dev_priv->gpu_error.reset_counter),
> -				  interruptible, NULL, NULL);
> +				  interruptible, NULL, NULL, true);
>  	if (ret)
>  		return ret;
>  
> @@ -1571,7 +1594,7 @@ i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
>  	mutex_unlock(&dev->struct_mutex);
>  	for (i = 0; ret == 0 && i < n; i++)
>  		ret = __i915_wait_request(requests[i], reset_counter, true,
> -					  NULL, rps);
> +					  NULL, rps, false);
>  	mutex_lock(&dev->struct_mutex);
>  
>  	for (i = 0; i < n; i++) {
> @@ -3443,7 +3466,7 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>  		if (ret == 0)
>  			ret = __i915_wait_request(req[i], reset_counter, true,
>  						  args->timeout_ns > 0 ? &args->timeout_ns : NULL,
> -						  file->driver_priv);
> +						  file->driver_priv, false);
>  		i915_gem_request_unreference(req[i]);
>  	}
>  	return ret;
> @@ -3476,7 +3499,7 @@ __i915_gem_object_sync(struct drm_i915_gem_object *obj,
>  					  atomic_read(&i915->gpu_error.reset_counter),
>  					  i915->mm.interruptible,
>  					  NULL,
> -					  &i915->rps.semaphores);
> +					  &i915->rps.semaphores, true);
>  		if (ret)
>  			return ret;
>  
> @@ -4683,7 +4706,7 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
>  	if (target == NULL)
>  		return 0;
>  
> -	ret = __i915_wait_request(target, reset_counter, true, NULL, NULL);
> +	ret = __i915_wait_request(target, reset_counter, true, NULL, NULL, false);
>  	if (ret == 0)
>  		queue_delayed_work(dev_priv->wq, &dev_priv->mm.retire_work, 0);
>  
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
> index df2e27f..811cbe4 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler.c
> +++ b/drivers/gpu/drm/i915/i915_scheduler.c
> @@ -31,6 +31,8 @@ static int         i915_scheduler_remove_dependent(struct i915_scheduler *schedu
>  						   struct i915_scheduler_queue_entry *remove);
>  static int         i915_scheduler_submit(struct intel_engine_cs *ring,
>  					 bool is_locked);
> +static int         i915_scheduler_submit_max_priority(struct intel_engine_cs *ring,
> +					       bool is_locked);
>  static uint32_t    i915_scheduler_count_flying(struct i915_scheduler *scheduler,
>  					       struct intel_engine_cs *ring);
>  static void        i915_scheduler_priority_bump_clear(struct i915_scheduler *scheduler);
> @@ -600,6 +602,57 @@ void i915_gem_scheduler_work_handler(struct work_struct *work)
>  	mutex_unlock(&dev->struct_mutex);
>  }
>  
> +int i915_scheduler_flush_request(struct drm_i915_gem_request *req,
> +				 bool is_locked)
> +{
> +	struct drm_i915_private            *dev_priv;
> +	struct i915_scheduler              *scheduler;
> +	unsigned long       flags;
> +	int                 flush_count;
> +	uint32_t            ring_id;
> +
> +	if (!req)
> +		return -EINVAL;
> +
> +	dev_priv  = req->ring->dev->dev_private;
> +	scheduler = dev_priv->scheduler;
> +
> +	if (!scheduler)
> +		return 0;
> +
> +	if (!req->scheduler_qe)
> +		return 0;
> +
> +	if (!I915_SQS_IS_QUEUED(req->scheduler_qe))
> +		return 0;
> +
> +	ring_id = req->ring->id;
> +	if (is_locked && (scheduler->flags[ring_id] & i915_sf_submitting)) {
> +		/* Scheduler is busy already submitting another batch,
> +		 * come back later rather than going recursive... */
> +		return -EAGAIN;
> +	}
> +
> +	if (list_empty(&scheduler->node_queue[ring_id]))
> +		return 0;
> +
> +	spin_lock_irqsave(&scheduler->lock, flags);
> +
> +	i915_scheduler_priority_bump_clear(scheduler);
> +
> +	flush_count = i915_scheduler_priority_bump(scheduler,
> +			    req->scheduler_qe, scheduler->priority_level_max);
> +
> +	spin_unlock_irqrestore(&scheduler->lock, flags);
> +
> +	if (flush_count) {
> +		DRM_DEBUG_DRIVER("<%s> Bumped %d entries\n", req->ring->name, flush_count);
> +		flush_count = i915_scheduler_submit_max_priority(req->ring, is_locked);
> +	}
> +
> +	return flush_count;
> +}
> +
>  static void i915_scheduler_priority_bump_clear(struct i915_scheduler *scheduler)
>  {
>  	struct i915_scheduler_queue_entry *node;
> @@ -653,6 +706,44 @@ static int i915_scheduler_priority_bump(struct i915_scheduler *scheduler,
>  	return count;
>  }
>  
> +static int i915_scheduler_submit_max_priority(struct intel_engine_cs *ring,
> +					      bool is_locked)
> +{
> +	struct i915_scheduler_queue_entry  *node;
> +	struct drm_i915_private            *dev_priv = ring->dev->dev_private;
> +	struct i915_scheduler              *scheduler = dev_priv->scheduler;
> +	unsigned long	flags;
> +	int             ret, count = 0;
> +	bool            found;
> +
> +	do {
> +		found = false;
> +		spin_lock_irqsave(&scheduler->lock, flags);
> +		list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
> +			if (!I915_SQS_IS_QUEUED(node))
> +				continue;
> +
> +			if (node->priority < scheduler->priority_level_max)
> +				continue;
> +
> +			found = true;
> +			break;
> +		}
> +		spin_unlock_irqrestore(&scheduler->lock, flags);
> +
> +		if (!found)
> +			break;
> +
> +		ret = i915_scheduler_submit(ring, is_locked);
> +		if (ret < 0)
> +			return ret;
> +
> +		count += ret;
> +	} while (found);
> +
> +	return count;
> +}
> +
>  static int i915_scheduler_pop_from_queue_locked(struct intel_engine_cs *ring,
>  				    struct i915_scheduler_queue_entry **pop_node,
>  				    unsigned long *flags)
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
> index 73c5e7d..fcf2640 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler.h
> +++ b/drivers/gpu/drm/i915/i915_scheduler.h
> @@ -92,6 +92,8 @@ void        i915_gem_scheduler_clean_node(struct i915_scheduler_queue_entry *nod
>  int         i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe);
>  int         i915_scheduler_handle_irq(struct intel_engine_cs *ring);
>  void        i915_gem_scheduler_work_handler(struct work_struct *work);
> +int         i915_scheduler_flush_request(struct drm_i915_gem_request *req,
> +					 bool is_locked);
>  bool        i915_scheduler_is_request_tracked(struct drm_i915_gem_request *req,
>  					      bool *completed, bool *busy);
>  
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 9629dab..129c829 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -11291,7 +11291,8 @@ static void intel_mmio_flip_work_func(struct work_struct *work)
>  		WARN_ON(__i915_wait_request(mmio_flip->req,
>  					    mmio_flip->crtc->reset_counter,
>  					    false, NULL,
> -					    &mmio_flip->i915->rps.mmioflips));
> +					    &mmio_flip->i915->rps.mmioflips,
> +					    false));
>  
>  	intel_do_mmio_flip(mmio_flip->crtc);
>  
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index df0cd48..e0992b7 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -2193,7 +2193,7 @@ int intel_ring_idle(struct intel_engine_cs *ring)
>  	return __i915_wait_request(req,
>  				   atomic_read(&to_i915(ring->dev)->gpu_error.reset_counter),
>  				   to_i915(ring->dev)->mm.interruptible,
> -				   NULL, NULL);
> +				   NULL, NULL, true);
>  }
>  
>  int intel_ring_alloc_request_extras(struct drm_i915_gem_request *request)
> -- 
> 1.9.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC 07/39] drm/i915: Start of GPU scheduler
  2015-07-17 14:33 ` [RFC 07/39] drm/i915: Start of GPU scheduler John.C.Harrison
@ 2015-07-21  9:40   ` Daniel Vetter
  0 siblings, 0 replies; 48+ messages in thread
From: Daniel Vetter @ 2015-07-21  9:40 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Fri, Jul 17, 2015 at 03:33:16PM +0100, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> Initial creation of scheduler source files. Note that this patch implements most
> of the scheduler functionality but does not hook it in to the driver yet. It
> also leaves the scheduler code in 'pass through' mode so that even when it is
> hooked in, it will not actually do very much. This allows the hooks to be added
> one at a time in byte size chunks and only when the scheduler is finally enabled
> at the end does anything start happening.
> 
> The general theory of operation is that when batch buffers are submitted to the
> driver, the execbuffer() code assigns a unique request and then packages up all
> the information required to execute the batch buffer at a later time. This
> package is given over to the scheduler which adds it to an internal node list.
> The scheduler also scans the list of objects associated with the batch buffer
> and compares them against the objects already in use by other buffers in the
> node list. If matches are found then the new batch buffer node is marked as
> being dependent upon the matching node. The same is done for the context object.
> The scheduler also bumps up the priority of such matching nodes on the grounds
> that the more dependencies a given batch buffer has the more important it is
> likely to be.
> 
> The scheduler aims to have a given (tuneable) number of batch buffers in flight
> on the hardware at any given time. If fewer than this are currently executing
> when a new node is queued, then the node is passed straight through to the
> submit function. Otherwise it is simply added to the queue and the driver
> returns back to user land.
> 
> As each batch buffer completes, it raises an interrupt which wakes up the
> scheduler. Note that it is possible for multiple buffers to complete before the
> IRQ handler gets to run. Further, it is possible for the seqno values to be
> un-ordered (particularly once pre-emption is enabled). However, the scheduler
> keeps the list of executing buffers in order of hardware submission. Thus it can
> scan through the list until a matching seqno is found and then mark all in
> flight nodes from that point on as completed.
> 
> A deferred work queue is also poked by the interrupt handler. When this wakes up
> it can do more involved processing such as actually removing completed nodes
> from the queue and freeing up the resources associated with them (internal
> memory allocations, DRM object references, context reference, etc.). The work
> handler also checks the in flight count and calls the submission code if a new
> slot has appeared.
> 
> When the scheduler's submit code is called, it scans the queued node list for
> the highest priority node that has no unmet dependencies. Note that the
> dependency calculation is complex as it must take inter-ring dependencies and
> potential preemptions into account. Note also that in the future this will be
> extended to include external dependencies such as the Android Native Sync file
> descriptors and/or the linux dma-buff synchronisation scheme.
> 
> If a suitable node is found then it is sent to execbuff_final() for submission
> to the hardware. The in flight count is then re-checked and a new node popped
> from the list if appropriate.
> 
> Note that this patch does not implement pre-emptive scheduling. Only basic
> scheduling by re-ordering batch buffer submission is currently implemented.
> 
> Change-Id: I1e08f59e650a3c2bbaaa9de7627da33849b06106
> For: VIZ-1587
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> ---
>  drivers/gpu/drm/i915/Makefile         |   1 +
>  drivers/gpu/drm/i915/i915_drv.h       |   4 +
>  drivers/gpu/drm/i915/i915_gem.c       |   5 +
>  drivers/gpu/drm/i915/i915_scheduler.c | 776 ++++++++++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/i915_scheduler.h |  91 ++++
>  5 files changed, 877 insertions(+)
>  create mode 100644 drivers/gpu/drm/i915/i915_scheduler.c
>  create mode 100644 drivers/gpu/drm/i915/i915_scheduler.h
> 
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index 47a74114..c367b39 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -9,6 +9,7 @@ ccflags-y := -Werror
>  # core driver code
>  i915-y := i915_drv.o \
>  	  i915_params.o \
> +	  i915_scheduler.o \
>            i915_suspend.o \
>  	  i915_sysfs.o \
>  	  intel_pm.o \
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index a680778..7d2a494 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1700,6 +1700,8 @@ struct i915_execbuffer_params {
>  	struct drm_i915_gem_request     *request;
>  };
>  
> +struct i915_scheduler;
> +
>  struct drm_i915_private {
>  	struct drm_device *dev;
>  	struct kmem_cache *objects;
> @@ -1932,6 +1934,8 @@ struct drm_i915_private {
>  
>  	struct i915_runtime_pm pm;
>  
> +	struct i915_scheduler *scheduler;
> +
>  	/* Abstract the submission mechanism (legacy ringbuffer or execlists) away */
>  	struct {
>  		int (*execbuf_submit)(struct i915_execbuffer_params *params,
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 0c407ae..3fbc6ec 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -40,6 +40,7 @@
>  #ifdef CONFIG_SYNC
>  #include <../drivers/staging/android/sync.h>
>  #endif
> +#include "i915_scheduler.h"
>  
>  #define RQ_BUG_ON(expr)
>  
> @@ -5398,6 +5399,10 @@ i915_gem_init_hw(struct drm_device *dev)
>  
>  	i915_gem_init_swizzling(dev);
>  
> +	ret = i915_scheduler_init(dev);
> +	if (ret)
> +		return ret;
> +
>  	/*
>  	 * At least 830 can leave some of the unused rings
>  	 * "active" (ie. head != tail) after resume which
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
> new file mode 100644
> index 0000000..71d8df7
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/i915_scheduler.c
> @@ -0,0 +1,776 @@
> +/*
> + * Copyright (c) 2014 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + *
> + */
> +
> +#include "i915_drv.h"
> +#include "intel_drv.h"
> +#include "i915_scheduler.h"
> +
> +static int         i915_scheduler_fly_node(struct i915_scheduler_queue_entry *node);
> +static int         i915_scheduler_remove_dependent(struct i915_scheduler *scheduler,
> +						   struct i915_scheduler_queue_entry *remove);
> +static int         i915_scheduler_submit(struct intel_engine_cs *ring,
> +					 bool is_locked);
> +static uint32_t    i915_scheduler_count_flying(struct i915_scheduler *scheduler,
> +					       struct intel_engine_cs *ring);
> +static void        i915_scheduler_priority_bump_clear(struct i915_scheduler *scheduler);
> +static int         i915_scheduler_priority_bump(struct i915_scheduler *scheduler,
> +						struct i915_scheduler_queue_entry *target,
> +						uint32_t bump);
> +
> +int i915_scheduler_init(struct drm_device *dev)
> +{
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct i915_scheduler   *scheduler = dev_priv->scheduler;
> +	int                     r;
> +
> +	if (scheduler)
> +		return 0;
> +
> +	scheduler = kzalloc(sizeof(*scheduler), GFP_KERNEL);
> +	if (!scheduler)
> +		return -ENOMEM;
> +
> +	spin_lock_init(&scheduler->lock);
> +
> +	for (r = 0; r < I915_NUM_RINGS; r++)
> +		INIT_LIST_HEAD(&scheduler->node_queue[r]);
> +
> +	scheduler->index = 1;
> +
> +	/* Default tuning values: */
> +	scheduler->priority_level_max     = ~0U;
> +	scheduler->priority_level_preempt = 900;
> +	scheduler->min_flying             = 2;
> +
> +	dev_priv->scheduler = scheduler;
> +
> +	return 0;
> +}
> +
> +int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
> +{
> +	struct drm_i915_private *dev_priv = qe->params.dev->dev_private;
> +	struct i915_scheduler   *scheduler = dev_priv->scheduler;
> +	struct intel_engine_cs  *ring = qe->params.ring;
> +	struct i915_scheduler_queue_entry  *node;
> +	struct i915_scheduler_queue_entry  *test;
> +	struct timespec     stamp;
> +	unsigned long       flags;
> +	bool                not_flying, found;
> +	int                 i, j, r;
> +	int                 incomplete = 0;
> +
> +	BUG_ON(!scheduler);
> +
> +	if (1/*i915.scheduler_override & i915_so_direct_submit*/) {
> +		int ret;
> +
> +		qe->scheduler_index = scheduler->index++;
> +
> +		scheduler->flags[qe->params.ring->id] |= i915_sf_submitting;
> +		ret = dev_priv->gt.execbuf_final(&qe->params);
> +		scheduler->flags[qe->params.ring->id] &= ~i915_sf_submitting;
> +
> +		/*
> +		 * Don't do any clean up on failure because the caller will
> +		 * do it all anyway.
> +		 */
> +		if (ret)
> +			return ret;
> +
> +		/* Free everything that is owned by the QE structure: */
> +		kfree(qe->params.cliprects);
> +		if (qe->params.dispatch_flags & I915_DISPATCH_SECURE)
> +			i915_gem_execbuff_release_batch_obj(qe->params.batch_obj);
> +
> +		return 0;
> +	}
> +
> +	getrawmonotonic(&stamp);
> +
> +	node = kmalloc(sizeof(*node), GFP_KERNEL);
> +	if (!node)
> +		return -ENOMEM;
> +
> +	*node = *qe;
> +	INIT_LIST_HEAD(&node->link);
> +	node->status = i915_sqs_queued;
> +	node->stamp  = stamp;
> +	i915_gem_request_reference(node->params.request);
> +
> +	/* Need to determine the number of incomplete entries in the list as
> +	 * that will be the maximum size of the dependency list.
> +	 *
> +	 * Note that the allocation must not be made with the spinlock acquired
> +	 * as kmalloc can sleep. However, the unlock/relock is safe because no
> +	 * new entries can be queued up during the unlock as the i915 driver
> +	 * mutex is still held. Entries could be removed from the list but that
> +	 * just means the dep_list will be over-allocated which is fine.
> +	 */
> +	spin_lock_irqsave(&scheduler->lock, flags);
> +	for (r = 0; r < I915_NUM_RINGS; r++) {
> +		list_for_each_entry(test, &scheduler->node_queue[r], link) {
> +			if (I915_SQS_IS_COMPLETE(test))
> +				continue;
> +
> +			incomplete++;
> +		}
> +	}
> +
> +	/* Temporarily unlock to allocate memory: */
> +	spin_unlock_irqrestore(&scheduler->lock, flags);
> +	if (incomplete) {
> +		node->dep_list = kmalloc(sizeof(node->dep_list[0]) * incomplete,
> +					 GFP_KERNEL);
> +		if (!node->dep_list) {
> +			kfree(node);
> +			return -ENOMEM;
> +		}
> +	} else
> +		node->dep_list = NULL;
> +
> +	spin_lock_irqsave(&scheduler->lock, flags);
> +	node->num_deps = 0;
> +
> +	if (node->dep_list) {
> +		for (r = 0; r < I915_NUM_RINGS; r++) {
> +			list_for_each_entry(test, &scheduler->node_queue[r], link) {
> +				if (I915_SQS_IS_COMPLETE(test))
> +					continue;
> +
> +				/*
> +				 * Batches on the same ring for the same
> +				 * context must be kept in order.
> +				 */
> +				found = (node->params.ctx == test->params.ctx) &&
> +					(node->params.ring == test->params.ring);
> +
> +				/*
> +				 * Batches working on the same objects must
> +				 * be kept in order.
> +				 */
> +				for (i = 0; (i < node->num_objs) && !found; i++) {
> +					for (j = 0; j < test->num_objs; j++) {
> +						if (node->saved_objects[i].obj !=
> +							    test->saved_objects[j].obj)
> +							continue;
> +
> +						found = true;
> +						break;

I guess this should be a goto to break out of both loops?
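
A minimal sketch of that goto form (illustration only, the label name is made
up; the earlier context/ring match and the dep_list bookkeeping stay as they
are):

	for (i = 0; (i < node->num_objs) && !found; i++) {
		for (j = 0; j < test->num_objs; j++) {
			if (node->saved_objects[i].obj ==
			    test->saved_objects[j].obj) {
				found = true;
				goto deps_scanned;	/* leave both loops at once */
			}
		}
	}
deps_scanned:
	if (found) {
		node->dep_list[node->num_deps] = test;
		node->num_deps++;
	}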

> +					}
> +				}
> +
> +				if (found) {
> +					node->dep_list[node->num_deps] = test;
> +					node->num_deps++;
> +				}
> +			}
> +		}
> +
> +		BUG_ON(node->num_deps > incomplete);
> +	}

This seems to be O(pending_requests * obj_per_request ^ 2), which is a bit
excessive. Also since you only check deps at the object level this breaks
read-read.

Finally move_to_active does track all this already since it has to
exchange all the requests for new ones. We might want to move object_sync
in there too just because, but I think this would definitely fit better
in there.
-Daniel

> +
> +	if (node->priority && node->num_deps) {
> +		i915_scheduler_priority_bump_clear(scheduler);
> +
> +		for (i = 0; i < node->num_deps; i++)
> +			i915_scheduler_priority_bump(scheduler,
> +					node->dep_list[i], node->priority);
> +	}
> +
> +	node->scheduler_index = scheduler->index++;
> +
> +	list_add_tail(&node->link, &scheduler->node_queue[ring->id]);
> +
> +	not_flying = i915_scheduler_count_flying(scheduler, ring) <
> +						 scheduler->min_flying;
> +
> +	spin_unlock_irqrestore(&scheduler->lock, flags);
> +
> +	if (not_flying)
> +		i915_scheduler_submit(ring, true);
> +
> +	return 0;
> +}
> +
> +static int i915_scheduler_fly_node(struct i915_scheduler_queue_entry *node)
> +{
> +	struct drm_i915_private *dev_priv = node->params.dev->dev_private;
> +	struct i915_scheduler   *scheduler = dev_priv->scheduler;
> +	struct intel_engine_cs  *ring;
> +
> +	BUG_ON(!scheduler);
> +	BUG_ON(!node);
> +	BUG_ON(node->status != i915_sqs_popped);
> +
> +	ring = node->params.ring;
> +
> +	/* Add the node (which should currently be in state none) to the front
> +	 * of the queue. This ensures that flying nodes are always held in
> +	 * hardware submission order. */
> +	list_add(&node->link, &scheduler->node_queue[ring->id]);
> +
> +	node->status = i915_sqs_flying;
> +
> +	if (!(scheduler->flags[ring->id] & i915_sf_interrupts_enabled)) {
> +		bool    success = true;
> +
> +		success = ring->irq_get(ring);
> +		if (success)
> +			scheduler->flags[ring->id] |= i915_sf_interrupts_enabled;
> +		else
> +			return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
> +/*
> + * Nodes are considered valid dependencies if they are queued on any ring or
> + * if they are in flight on a different ring. In flight on the same ring is no
> + * longer interesting for non-preemptive nodes as the ring serialises execution.
> + * For pre-empting nodes, all in flight dependencies are valid as they must not
> + * be jumped by the act of pre-empting.
> + *
> + * Anything that is neither queued nor flying is uninteresting.
> + */
> +static inline bool i915_scheduler_is_dependency_valid(
> +			struct i915_scheduler_queue_entry *node, uint32_t idx)
> +{
> +	struct i915_scheduler_queue_entry *dep;
> +
> +	dep = node->dep_list[idx];
> +	if (!dep)
> +		return false;
> +
> +	if (I915_SQS_IS_QUEUED(dep))
> +		return true;
> +
> +	if (I915_SQS_IS_FLYING(dep)) {
> +		if (node->params.ring != dep->params.ring)
> +			return true;
> +	}
> +
> +	return false;
> +}
> +
> +static uint32_t i915_scheduler_count_flying(struct i915_scheduler *scheduler,
> +					    struct intel_engine_cs *ring)
> +{
> +	struct i915_scheduler_queue_entry *node;
> +	uint32_t                          flying = 0;
> +
> +	list_for_each_entry(node, &scheduler->node_queue[ring->id], link)
> +		if (I915_SQS_IS_FLYING(node))
> +			flying++;
> +
> +	return flying;
> +}
> +
> +/* Add a popped node back in to the queue. For example, because the ring was
> + * hung when execfinal() was called and thus the ring submission needs to be
> + * retried later. */
> +static void i915_scheduler_node_requeue(struct i915_scheduler_queue_entry *node)
> +{
> +	BUG_ON(!node);
> +	BUG_ON(!I915_SQS_IS_FLYING(node));
> +
> +	node->status = i915_sqs_queued;
> +}
> +
> +/* Give up on a popped node completely. For example, because it is causing the
> + * ring to hang or is using some resource that no longer exists. */
> +static void i915_scheduler_node_kill(struct i915_scheduler_queue_entry *node)
> +{
> +	BUG_ON(!node);
> +	BUG_ON(!I915_SQS_IS_FLYING(node));
> +
> +	node->status = i915_sqs_dead;
> +}
> +
> +/*
> + * The batch tagged with the indicated sequence number has completed.
> + * Search the queue for it, update its status and those of any batches
> + * submitted earlier, which must also have completed or been preempted
> + * as appropriate.
> + *
> + * Called with spinlock already held.
> + */
> +static void i915_scheduler_seqno_complete(struct intel_engine_cs *ring, uint32_t seqno)
> +{
> +	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> +	struct i915_scheduler   *scheduler = dev_priv->scheduler;
> +	struct i915_scheduler_queue_entry *node;
> +	bool got_changes = false;
> +
> +	/*
> +	 * Batch buffers are added to the head of the list in execution order,
> +	 * thus seqno values, although not necessarily incrementing, will be
> +	 * met in completion order when scanning the list. So when a match is
> +	 * found, all subsequent entries must have also popped out. Conversely,
> +	 * if a completed entry is found then there is no need to scan further.
> +	 */
> +	list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
> +		if (I915_SQS_IS_COMPLETE(node))
> +			return;
> +
> +		if (seqno == node->params.request->seqno)
> +			break;
> +	}
> +
> +	/*
> +	 * NB: Lots of extra seqnos get added to the ring to track things
> +	 * like cache flushes and page flips. So don't complain about if
> +	 * no node was found.
> +	 */
> +	if (&node->link == &scheduler->node_queue[ring->id])
> +		return;
> +
> +	WARN_ON(!I915_SQS_IS_FLYING(node));
> +
> +	/* Everything from here can be marked as done: */
> +	list_for_each_entry_from(node, &scheduler->node_queue[ring->id], link) {
> +		/* Check if the marking has already been done: */
> +		if (I915_SQS_IS_COMPLETE(node))
> +			break;
> +
> +		if (!I915_SQS_IS_FLYING(node))
> +			continue;
> +
> +		/* Node was in flight so mark it as complete. */
> +		node->status = i915_sqs_complete;
> +		got_changes = true;
> +	}
> +
> +	/* Should submit new work here if flight list is empty but the DRM
> +	 * mutex lock might not be available if a '__wait_request()' call is
> +	 * blocking the system. */
> +}
> +
> +int i915_scheduler_handle_irq(struct intel_engine_cs *ring)
> +{
> +	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> +	struct i915_scheduler   *scheduler = dev_priv->scheduler;
> +	unsigned long       flags;
> +	uint32_t            seqno;
> +
> +	seqno = ring->get_seqno(ring, false);
> +
> +	if (1/*i915.scheduler_override & i915_so_direct_submit*/)
> +		return 0;
> +
> +	if (seqno == scheduler->last_irq_seqno[ring->id]) {
> +		/* Why are there sometimes multiple interrupts per seqno? */
> +		return 0;
> +	}
> +	scheduler->last_irq_seqno[ring->id] = seqno;
> +
> +	spin_lock_irqsave(&scheduler->lock, flags);
> +	i915_scheduler_seqno_complete(ring, seqno);
> +	spin_unlock_irqrestore(&scheduler->lock, flags);
> +
> +	/* XXX: Need to also call i915_scheduler_remove() via work handler. */
> +
> +	return 0;
> +}
> +
> +int i915_scheduler_remove(struct intel_engine_cs *ring)
> +{
> +	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> +	struct i915_scheduler   *scheduler = dev_priv->scheduler;
> +	struct i915_scheduler_queue_entry  *node, *node_next;
> +	unsigned long       flags;
> +	int                 flying = 0, queued = 0;
> +	int                 ret = 0;
> +	bool                do_submit;
> +	uint32_t            min_seqno;
> +	struct list_head    remove;
> +
> +	if (list_empty(&scheduler->node_queue[ring->id]))
> +		return 0;
> +
> +	spin_lock_irqsave(&scheduler->lock, flags);
> +
> +	/* /i915_scheduler_dump_locked(ring, "remove/pre");/ */
> +
> +	/*
> +	 * In the case where the system is idle, starting 'min_seqno' from a big
> +	 * number will cause all nodes to be removed as they are now back to
> +	 * being in-order. However, this will be a problem if the last one to
> +	 * complete was actually out-of-order as the ring seqno value will be
> +	 * lower than one or more completed buffers. Thus code looking for the
> +	 * completion of said buffers will wait forever.
> +	 * Instead, use the hardware seqno as the starting point. This means
> +	 * that some buffers might be kept around even in a completely idle
> +	 * system but it should guarantee that no-one ever gets confused when
> +	 * waiting for buffer completion.
> +	 */
> +	min_seqno = ring->get_seqno(ring, true);
> +
> +	list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
> +		if (I915_SQS_IS_QUEUED(node))
> +			queued++;
> +		else if (I915_SQS_IS_FLYING(node))
> +			flying++;
> +		else if (I915_SQS_IS_COMPLETE(node))
> +			continue;
> +
> +		if (node->params.request->seqno == 0)
> +			continue;
> +
> +		if (!i915_seqno_passed(node->params.request->seqno, min_seqno))
> +			min_seqno = node->params.request->seqno;
> +	}
> +
> +	INIT_LIST_HEAD(&remove);
> +	list_for_each_entry_safe(node, node_next, &scheduler->node_queue[ring->id], link) {
> +		/*
> +		 * Only remove completed nodes which have a lower seqno than
> +		 * all pending nodes. While there is the possibility of the
> +		 * ring's seqno counting backwards, all higher buffers must
> +		 * be remembered so that the 'i915_seqno_passed()' test can
> +		 * report that they have in fact passed.
> +		 *
> +		 * NB: This is not true for 'dead' nodes. The GPU reset causes
> +		 * the software seqno to restart from its initial value. Thus
> +		 * the dead nodes must be removed even though their seqno values
> +		 * are potentially vastly greater than the current ring seqno.
> +		 */
> +		if (!I915_SQS_IS_COMPLETE(node))
> +			continue;
> +
> +		if (node->status != i915_sqs_dead) {
> +			if (i915_seqno_passed(node->params.request->seqno, min_seqno) &&
> +			    (node->params.request->seqno != min_seqno))
> +				continue;
> +		}
> +
> +		list_del(&node->link);
> +		list_add(&node->link, &remove);
> +
> +		/* Strip the dependency info while the mutex is still locked */
> +		i915_scheduler_remove_dependent(scheduler, node);
> +
> +		continue;
> +	}
> +
> +	/*
> +	 * No idea why but this seems to cause problems occasionally.
> +	 * Note that the 'irq_put' code is internally reference counted
> +	 * and spin_locked so it should be safe to call.
> +	 */
> +	/*if ((scheduler->flags[ring->id] & i915_sf_interrupts_enabled) &&
> +	    (first_flight[ring->id] == NULL)) {
> +		ring->irq_put(ring);
> +		scheduler->flags[ring->id] &= ~i915_sf_interrupts_enabled;
> +	}*/
> +
> +	/* Launch more packets now? */
> +	do_submit = (queued > 0) && (flying < scheduler->min_flying);
> +
> +	spin_unlock_irqrestore(&scheduler->lock, flags);
> +
> +	if (do_submit)
> +		ret = i915_scheduler_submit(ring, true);
> +
> +	while (!list_empty(&remove)) {
> +		node = list_first_entry(&remove, typeof(*node), link);
> +		list_del(&node->link);
> +
> +		/* The batch buffer must be unpinned before it is unreferenced
> +		 * otherwise the unpin fails with a missing vma!? */
> +		if (node->params.dispatch_flags & I915_DISPATCH_SECURE)
> +			i915_gem_execbuff_release_batch_obj(node->params.batch_obj);
> +
> +		/* Free everything that is owned by the node: */
> +		i915_gem_request_unreference(node->params.request);
> +		kfree(node->params.cliprects);
> +		kfree(node->dep_list);
> +		kfree(node);
> +	}
> +
> +	return ret;
> +}
> +
> +static void i915_scheduler_priority_bump_clear(struct i915_scheduler *scheduler)
> +{
> +	struct i915_scheduler_queue_entry *node;
> +	int i;
> +
> +	/*
> +	 * Ensure circular dependencies don't cause problems and that a bump
> +	 * by object usage only bumps each using buffer once:
> +	 */
> +	for (i = 0; i < I915_NUM_RINGS; i++) {
> +		list_for_each_entry(node, &scheduler->node_queue[i], link)
> +			node->bumped = false;
> +	}
> +}
> +
> +static int i915_scheduler_priority_bump(struct i915_scheduler *scheduler,
> +					struct i915_scheduler_queue_entry *target,
> +					uint32_t bump)
> +{
> +	uint32_t new_priority;
> +	int      i, count;
> +
> +	if (target->priority >= scheduler->priority_level_max)
> +		return 1;
> +
> +	if (target->bumped)
> +		return 0;
> +
> +	new_priority = target->priority + bump;
> +	if ((new_priority <= target->priority) ||
> +	    (new_priority > scheduler->priority_level_max))
> +		target->priority = scheduler->priority_level_max;
> +	else
> +		target->priority = new_priority;
> +
> +	count = 1;
> +	target->bumped = true;
> +
> +	for (i = 0; i < target->num_deps; i++) {
> +		if (!target->dep_list[i])
> +			continue;
> +
> +		if (target->dep_list[i]->bumped)
> +			continue;
> +
> +		count += i915_scheduler_priority_bump(scheduler,
> +						      target->dep_list[i],
> +						      bump);
> +	}
> +
> +	return count;
> +}
> +
> +static int i915_scheduler_pop_from_queue_locked(struct intel_engine_cs *ring,
> +				    struct i915_scheduler_queue_entry **pop_node,
> +				    unsigned long *flags)
> +{
> +	struct drm_i915_private            *dev_priv = ring->dev->dev_private;
> +	struct i915_scheduler              *scheduler = dev_priv->scheduler;
> +	struct i915_scheduler_queue_entry  *best;
> +	struct i915_scheduler_queue_entry  *node;
> +	int     ret;
> +	int     i;
> +	bool	any_queued;
> +	bool	has_local, has_remote, only_remote;
> +
> +	*pop_node = NULL;
> +	ret = -ENODATA;
> +
> +	any_queued = false;
> +	only_remote = false;
> +	best = NULL;
> +
> +	list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
> +		if (!I915_SQS_IS_QUEUED(node))
> +			continue;
> +		any_queued = true;
> +
> +		has_local  = false;
> +		has_remote = false;
> +		for (i = 0; i < node->num_deps; i++) {
> +			if (!i915_scheduler_is_dependency_valid(node, i))
> +				continue;
> +
> +			if (node->dep_list[i]->params.ring == node->params.ring)
> +				has_local = true;
> +			else
> +				has_remote = true;
> +		}
> +
> +		if (has_remote && !has_local)
> +			only_remote = true;
> +
> +		if (!has_local && !has_remote) {
> +			if (!best ||
> +			    (node->priority > best->priority))
> +				best = node;
> +		}
> +	}
> +
> +	if (best) {
> +		list_del(&best->link);
> +
> +		INIT_LIST_HEAD(&best->link);
> +		best->status  = i915_sqs_popped;
> +
> +		ret = 0;
> +	} else {
> +		/* Can only get here if:
> +		 * (a) there are no buffers in the queue
> +		 * (b) all queued buffers are dependent on other buffers
> +		 *     e.g. on a buffer that is in flight on a different ring
> +		 */
> +		if (only_remote) {
> +			/* The only dependent buffers are on another ring. */
> +			ret = -EAGAIN;
> +		} else if (any_queued) {
> +			/* It seems that something has gone horribly wrong! */
> +			DRM_ERROR("Broken dependency tracking on ring %d!\n",
> +				  (int) ring->id);
> +		}
> +	}
> +
> +	/* i915_scheduler_dump_queue_pop(ring, best); */
> +
> +	*pop_node = best;
> +	return ret;
> +}
> +
> +static int i915_scheduler_submit(struct intel_engine_cs *ring, bool was_locked)
> +{
> +	struct drm_device   *dev = ring->dev;
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct i915_scheduler   *scheduler = dev_priv->scheduler;
> +	struct i915_scheduler_queue_entry  *node;
> +	unsigned long       flags;
> +	int                 ret = 0, count = 0;
> +
> +	if (!was_locked) {
> +		ret = i915_mutex_lock_interruptible(dev);

Nah, the scheduler needs its own hw lock, otherwise hell will break loose with
locking inversions. We might need to add engine->hw_lock.
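
(For illustration only, not part of the posted patches: one shape such an
engine-local lock could take. Every name below, including hw_lock itself and
the helper, is invented.)

struct intel_engine_cs {
	/* ... existing engine state ... */
	spinlock_t hw_lock;	/* serialises hardware submission for this engine */
};

/* The scheduler's submit path would then take the engine-local lock instead
 * of dev->struct_mutex, avoiding inversions against GEM callers that already
 * hold struct_mutex. */
static void sched_submit_under_hw_lock(struct intel_engine_cs *ring)
{
	unsigned long flags;

	spin_lock_irqsave(&ring->hw_lock, flags);
	/* pop ready nodes and emit them to the hardware here */
	spin_unlock_irqrestore(&ring->hw_lock, flags);
}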


> +		if (ret)
> +			return ret;
> +	}
> +
> +	BUG_ON(!mutex_is_locked(&dev->struct_mutex));
> +
> +	spin_lock_irqsave(&scheduler->lock, flags);
> +
> +	/* First time around, complain if anything unexpected occurs: */
> +	ret = i915_scheduler_pop_from_queue_locked(ring, &node, &flags);
> +	if (ret) {
> +		spin_unlock_irqrestore(&scheduler->lock, flags);
> +
> +		if (!was_locked)
> +			mutex_unlock(&dev->struct_mutex);
> +
> +		return ret;
> +	}
> +
> +	do {
> +		BUG_ON(!node);
> +		BUG_ON(node->params.ring != ring);
> +		BUG_ON(node->status != i915_sqs_popped);
> +		count++;
> +
> +		/* The call to pop above will have removed the node from the
> +		 * list. So add it back in and mark it as in flight. */
> +		i915_scheduler_fly_node(node);
> +
> +		scheduler->flags[ring->id] |= i915_sf_submitting;
> +		spin_unlock_irqrestore(&scheduler->lock, flags);
> +		ret = dev_priv->gt.execbuf_final(&node->params);
> +		spin_lock_irqsave(&scheduler->lock, flags);
> +		scheduler->flags[ring->id] &= ~i915_sf_submitting;
> +
> +		if (ret) {
> +			bool requeue = true;
> +
> +			/* Oh dear! Either the node is broken or the ring is
> +			 * busy. So need to kill the node or requeue it and try
> +			 * again later as appropriate. */
> +
> +			switch (-ret) {
> +			case ENODEV:
> +			case ENOENT:
> +				/* Fatal errors. Kill the node. */
> +				requeue = false;
> +			break;
> +
> +			case EAGAIN:
> +			case EBUSY:
> +			case EIO:
> +			case ENOMEM:
> +			case ERESTARTSYS:
> +			case EINTR:
> +				/* Supposedly recoverable errors. */
> +			break;
> +
> +			default:
> +				DRM_DEBUG_DRIVER("<%s> Got unexpected error from execfinal(): %d!\n",
> +						 ring->name, ret);
> +				/* Assume it is recoverable and hope for the best. */
> +			break;
> +			}
> +
> +			if (requeue) {
> +				i915_scheduler_node_requeue(node);
> +				/* No point spinning if the ring is currently
> +				 * unavailable so just give up and come back
> +				 * later. */
> +				break;
> +			} else
> +				i915_scheduler_node_kill(node);
> +		}
> +
> +		/* Keep launching until the sky is sufficiently full. */
> +		if (i915_scheduler_count_flying(scheduler, ring) >=
> +						scheduler->min_flying)
> +			break;
> +
> +		ret = i915_scheduler_pop_from_queue_locked(ring, &node, &flags);
> +	} while (ret == 0);
> +
> +	spin_unlock_irqrestore(&scheduler->lock, flags);
> +
> +	if (!was_locked)
> +		mutex_unlock(&dev->struct_mutex);
> +
> +	/* Don't complain about not being able to submit extra entries */
> +	if (ret == -ENODATA)
> +		ret = 0;
> +
> +	return (ret < 0) ? ret : count;
> +}
> +
> +static int i915_scheduler_remove_dependent(struct i915_scheduler *scheduler,
> +					   struct i915_scheduler_queue_entry *remove)
> +{
> +	struct i915_scheduler_queue_entry  *node;
> +	int     i, r;
> +	int     count = 0;
> +
> +	for (i = 0; i < remove->num_deps; i++)
> +		if ((remove->dep_list[i]) &&
> +		    (!I915_SQS_IS_COMPLETE(remove->dep_list[i])))
> +			count++;
> +	BUG_ON(count);
> +
> +	for (r = 0; r < I915_NUM_RINGS; r++) {
> +		list_for_each_entry(node, &scheduler->node_queue[r], link) {
> +			for (i = 0; i < node->num_deps; i++) {
> +				if (node->dep_list[i] != remove)
> +					continue;
> +
> +				node->dep_list[i] = NULL;
> +			}
> +		}
> +	}
> +
> +	return 0;
> +}
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
> new file mode 100644
> index 0000000..0c5fc7f
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/i915_scheduler.h
> @@ -0,0 +1,91 @@
> +/*
> + * Copyright (c) 2014 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + *
> + */
> +
> +#ifndef _I915_SCHEDULER_H_
> +#define _I915_SCHEDULER_H_
> +
> +enum i915_scheduler_queue_status {
> +	/* Limbo: */
> +	i915_sqs_none = 0,
> +	/* Not yet submitted to hardware: */
> +	i915_sqs_queued,
> +	/* Popped from queue, ready to fly: */
> +	i915_sqs_popped,
> +	/* Sent to hardware for processing: */
> +	i915_sqs_flying,
> +	/* Finished processing on the hardware: */
> +	i915_sqs_complete,
> +	/* Killed by catastrophic submission failure: */
> +	i915_sqs_dead,
> +	/* Limit value for use with arrays/loops */
> +	i915_sqs_MAX
> +};
> +
> +#define I915_SQS_IS_QUEUED(node)	(((node)->status == i915_sqs_queued))
> +#define I915_SQS_IS_FLYING(node)	(((node)->status == i915_sqs_flying))
> +#define I915_SQS_IS_COMPLETE(node)	(((node)->status == i915_sqs_complete) || \
> +					 ((node)->status == i915_sqs_dead))
> +
> +struct i915_scheduler_obj_entry {
> +	struct drm_i915_gem_object          *obj;
> +};
> +
> +struct i915_scheduler_queue_entry {
> +	struct i915_execbuffer_params       params;
> +	uint32_t                            priority;
> +	struct i915_scheduler_obj_entry     *saved_objects;
> +	int                                 num_objs;
> +	bool                                bumped;
> +	struct i915_scheduler_queue_entry   **dep_list;
> +	int                                 num_deps;
> +	enum i915_scheduler_queue_status    status;
> +	struct timespec                     stamp;
> +	struct list_head                    link;
> +	uint32_t                            scheduler_index;
> +};
> +
> +struct i915_scheduler {
> +	struct list_head    node_queue[I915_NUM_RINGS];
> +	uint32_t            flags[I915_NUM_RINGS];
> +	spinlock_t          lock;
> +	uint32_t            index;
> +	uint32_t            last_irq_seqno[I915_NUM_RINGS];
> +
> +	/* Tuning parameters: */
> +	uint32_t            priority_level_max;
> +	uint32_t            priority_level_preempt;
> +	uint32_t            min_flying;
> +};
> +
> +/* Flag bits for i915_scheduler::flags */
> +enum {
> +	i915_sf_interrupts_enabled  = (1 << 0),
> +	i915_sf_submitting          = (1 << 1),
> +};
> +
> +int         i915_scheduler_init(struct drm_device *dev);
> +int         i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe);
> +int         i915_scheduler_handle_irq(struct intel_engine_cs *ring);
> +
> +#endif  /* _I915_SCHEDULER_H_ */
> -- 
> 1.9.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC 27/39] drm/i915: Add sync wait support to scheduler
  2015-07-17 14:33 ` [RFC 27/39] drm/i915: Add sync wait support " John.C.Harrison
@ 2015-07-21  9:59   ` Daniel Vetter
  0 siblings, 0 replies; 48+ messages in thread
From: Daniel Vetter @ 2015-07-21  9:59 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Fri, Jul 17, 2015 at 03:33:36PM +0100, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> There is a sync framework to allow work for multiple independent
> systems to be synchronised with each other but without stalling
> the CPU whether in the application or the driver. This patch adds
> support for this framework to the GPU scheduler.
> 
> Batch buffers can now have sync framework fence objects associated with
> them. The scheduler will look at this fence when deciding what to
> submit next to the hardware. If the fence is outstanding then that
> batch buffer will be passed over in preference of one that is ready to
> run. If no other batches are ready then the scheduler will queue an
> asynchronous callback to be woken up when the fence has been
> signalled. The callback will wake the scheduler and submit the now
> ready batch buffer.
> 
> For: VIZ-1587
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>

Not quite what I expected. My motivation behind moving i915 to struct
fence is that the scheduler would internally use only struct fence for
synchronization. Then adding syncpt support wouldn't really need any changes
in the internals, since that's just another source of struct fences to take
into account.

Also note that struct fence allows you to register just a callback (won't
work with the i915 fences you've done though), which means you could just
register callbacks for everything instead of waiting in process context.
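
(Purely as an illustration of that callback route, using the struct fence API
as it existed at the time, i.e. fence_add_callback() from <linux/fence.h>; the
wrapper struct, the back-pointer and the work-queue poke are made-up
placeholders.)

#include <linux/fence.h>

struct sched_fence_cb {
	struct fence_cb base;
	struct i915_scheduler_queue_entry *node;	/* hypothetical back-pointer */
};

static void sched_fence_signalled(struct fence *fence, struct fence_cb *cb)
{
	struct sched_fence_cb *scb = container_of(cb, struct sched_fence_cb, base);

	/* Runs in whatever context signals the fence, so only poke the
	 * scheduler's deferred work handler here rather than submitting. */
	kick_scheduler_work(scb->node);		/* placeholder */
}

/* Returns 0 if the callback was installed, or -ENOENT if the fence had
 * already signalled, in which case the node can be treated as ready now. */
static int sched_track_fence(struct fence *fence, struct sched_fence_cb *scb)
{
	return fence_add_callback(fence, &scb->base, sched_fence_signalled);
}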

Also I guess some generic fence helper to aggregate fences would be
useful; that should also simplify the code in the dependency calculation.
-Daniel

> ---
>  drivers/gpu/drm/i915/i915_drv.h       |   1 +
>  drivers/gpu/drm/i915/i915_scheduler.c | 163 ++++++++++++++++++++++++++++++++--
>  drivers/gpu/drm/i915/i915_scheduler.h |   6 ++
>  3 files changed, 165 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 12b4986..b568432 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1703,6 +1703,7 @@ struct i915_execbuffer_params {
>  	uint32_t                        batch_obj_vm_offset;
>  	struct intel_engine_cs          *ring;
>  	struct drm_i915_gem_object      *batch_obj;
> +	struct sync_fence               *fence_wait;
>  	struct drm_clip_rect            *cliprects;
>  	uint32_t                        instp_mask;
>  	int                             instp_mode;
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
> index c7139f8..19577c9 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler.c
> +++ b/drivers/gpu/drm/i915/i915_scheduler.c
> @@ -25,6 +25,7 @@
>  #include "i915_drv.h"
>  #include "intel_drv.h"
>  #include "i915_scheduler.h"
> +#include <../drivers/staging/android/sync.h>
>  
>  static int         i915_scheduler_fly_node(struct i915_scheduler_queue_entry *node);
>  static int         i915_scheduler_remove_dependent(struct i915_scheduler *scheduler,
> @@ -100,6 +101,9 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
>  
>  		qe->scheduler_index = scheduler->index++;
>  
> +		WARN_ON(qe->params.fence_wait &&
> +			(atomic_read(&qe->params.fence_wait->status) == 0));
> +
>  		intel_ring_reserved_space_cancel(qe->params.request->ringbuf);
>  
>  		scheduler->flags[qe->params.ring->id] |= i915_sf_submitting;
> @@ -134,6 +138,11 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
>  		if (qe->params.dispatch_flags & I915_DISPATCH_SECURE)
>  			i915_gem_execbuff_release_batch_obj(qe->params.batch_obj);
>  
> +#ifdef CONFIG_SYNC
> +		if (qe->params.fence_wait)
> +			sync_fence_put(qe->params.fence_wait);
> +#endif
> +
>  		return 0;
>  	}
>  
> @@ -625,6 +634,11 @@ static int i915_scheduler_remove(struct intel_engine_cs *ring)
>  		node = list_first_entry(&remove, typeof(*node), link);
>  		list_del(&node->link);
>  
> +#ifdef CONFIG_SYNC
> +		if (node->params.fence_wait)
> +			sync_fence_put(node->params.fence_wait);
> +#endif
> +
>  		/* Free up all the DRM object references */
>  		i915_gem_scheduler_clean_node(node);
>  
> @@ -845,17 +859,100 @@ static int i915_scheduler_submit_max_priority(struct intel_engine_cs *ring,
>  	return count;
>  }
>  
> +#ifdef CONFIG_SYNC
> +/* Use a private structure in order to pass the 'dev' pointer through */
> +struct i915_sync_fence_waiter {
> +	struct sync_fence_waiter sfw;
> +	struct drm_device	 *dev;
> +	struct i915_scheduler_queue_entry *node;
> +};
> +
> +static void i915_scheduler_wait_fence_signaled(struct sync_fence *fence,
> +				       struct sync_fence_waiter *waiter)
> +{
> +	struct i915_sync_fence_waiter *i915_waiter;
> +	struct drm_i915_private *dev_priv = NULL;
> +
> +	i915_waiter = container_of(waiter, struct i915_sync_fence_waiter, sfw);
> +	dev_priv    = (i915_waiter && i915_waiter->dev) ?
> +					    i915_waiter->dev->dev_private : NULL;
> +
> +	/*
> +	 * NB: The callback is executed at interrupt time, thus it can not
> +	 * call _submit() directly. It must go via the delayed work handler.
> +	 */
> +	if (dev_priv) {
> +		struct i915_scheduler   *scheduler;
> +		unsigned long           flags;
> +
> +		scheduler = dev_priv->scheduler;
> +
> +		spin_lock_irqsave(&scheduler->lock, flags);
> +		i915_waiter->node->flags &= ~i915_qef_fence_waiting;
> +		spin_unlock_irqrestore(&scheduler->lock, flags);
> +
> +		queue_work(dev_priv->wq, &dev_priv->mm.scheduler_work);
> +	}
> +
> +	kfree(waiter);
> +}
> +
> +static bool i915_scheduler_async_fence_wait(struct drm_device *dev,
> +					    struct i915_scheduler_queue_entry *node)
> +{
> +	struct i915_sync_fence_waiter	*fence_waiter;
> +	struct sync_fence		*fence = node->params.fence_wait;
> +	int				signaled;
> +	bool				success = true;
> +
> +	if ((node->flags & i915_qef_fence_waiting) == 0)
> +		node->flags |= i915_qef_fence_waiting;
> +	else
> +		return true;
> +
> +	if (fence == NULL)
> +		return false;
> +
> +	signaled = atomic_read(&fence->status);
> +	if (!signaled) {
> +		fence_waiter = kmalloc(sizeof(*fence_waiter), GFP_KERNEL);
> +		if (!fence_waiter) {
> +			success = false;
> +			goto end;
> +		}
> +
> +		sync_fence_waiter_init(&fence_waiter->sfw,
> +				i915_scheduler_wait_fence_signaled);
> +		fence_waiter->node = node;
> +		fence_waiter->dev = dev;
> +
> +		if (sync_fence_wait_async(fence, &fence_waiter->sfw)) {
> +			/* an error occurred, usually this is because the
> +			 * fence was signaled already */
> +			signaled = atomic_read(&fence->status);
> +			if (!signaled) {
> +				success = false;
> +				goto end;
> +			}
> +		}
> +	}
> +end:
> +	return success;
> +}
> +#endif
> +
>  static int i915_scheduler_pop_from_queue_locked(struct intel_engine_cs *ring,
>  				    struct i915_scheduler_queue_entry **pop_node,
>  				    unsigned long *flags)
>  {
>  	struct drm_i915_private            *dev_priv = ring->dev->dev_private;
>  	struct i915_scheduler              *scheduler = dev_priv->scheduler;
> +	struct i915_scheduler_queue_entry  *best_wait, *fence_wait = NULL;
>  	struct i915_scheduler_queue_entry  *best;
>  	struct i915_scheduler_queue_entry  *node;
>  	int     ret;
>  	int     i;
> -	bool	any_queued;
> +	bool	signalled, any_queued;
>  	bool	has_local, has_remote, only_remote;
>  
>  	*pop_node = NULL;
> @@ -863,13 +960,25 @@ static int i915_scheduler_pop_from_queue_locked(struct intel_engine_cs *ring,
>  
>  	any_queued = false;
>  	only_remote = false;
> +	best_wait = NULL;
>  	best = NULL;
>  
> +#ifndef CONFIG_SYNC
> +	signalled = true;
> +#endif
> +
>  	list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
>  		if (!I915_SQS_IS_QUEUED(node))
>  			continue;
>  		any_queued = true;
>  
> +#ifdef CONFIG_SYNC
> +		if (node->params.fence_wait)
> +			signalled = atomic_read(&node->params.fence_wait->status) != 0;
> +		else
> +			signalled = true;
> +#endif	// CONFIG_SYNC
> +
>  		has_local  = false;
>  		has_remote = false;
>  		for (i = 0; i < node->num_deps; i++) {
> @@ -886,9 +995,15 @@ static int i915_scheduler_pop_from_queue_locked(struct intel_engine_cs *ring,
>  			only_remote = true;
>  
>  		if (!has_local && !has_remote) {
> -			if (!best ||
> -			    (node->priority > best->priority))
> -				best = node;
> +			if (signalled) {
> +				if (!best ||
> +				    (node->priority > best->priority))
> +					best = node;
> +			} else {
> +				if (!best_wait ||
> +				    (node->priority > best_wait->priority))
> +					best_wait = node;
> +			}
>  		}
>  	}
>  
> @@ -904,8 +1019,34 @@ static int i915_scheduler_pop_from_queue_locked(struct intel_engine_cs *ring,
>  		 * (a) there are no buffers in the queue
>  		 * (b) all queued buffers are dependent on other buffers
>  		 *     e.g. on a buffer that is in flight on a different ring
> +		 * (c) all independent buffers are waiting on fences
>  		 */
> -		if (only_remote) {
> +		if (best_wait) {
> +			/* Need to wait for something to be signalled.
> +			 *
> +			 * NB: do not really want to wait on one specific fd
> +			 * because there is no guarantee in the order that
> +			 * blocked buffers will be signalled. Need to wait on
> +			 * 'anything' and then rescan for best available, if
> +			 * still nothing then wait again...
> +			 *
> +			 * NB 2: The wait must also wake up if someone attempts
> +			 * to submit a new buffer. The new buffer might be
> +			 * independent of all others and thus could jump the
> +			 * queue and start running immediately.
> +			 *
> +			 * NB 3: Lastly, must not wait with the spinlock held!
> +			 *
> +			 * So rather than wait here, need to queue a deferred
> +			 * wait thread and just return 'nothing to do'.
> +			 *
> +			 * NB 4: Can't actually do the wait here because the
> +			 * spinlock is still held and the wait requires doing
> +			 * a memory allocation.
> +			 */
> +			fence_wait = best_wait;
> +			ret = -EAGAIN;
> +		} else if (only_remote) {
>  			/* The only dependent buffers are on another ring. */
>  			ret = -EAGAIN;
>  		} else if (any_queued) {
> @@ -917,6 +1058,18 @@ static int i915_scheduler_pop_from_queue_locked(struct intel_engine_cs *ring,
>  
>  	/* i915_scheduler_dump_queue_pop(ring, best); */
>  
> +	if (fence_wait) {
> +#ifdef CONFIG_SYNC
> +		/* It should be safe to sleep now... */
> +		/* NB: Need to release and reacquire the spinlock though */
> +		spin_unlock_irqrestore(&scheduler->lock, *flags);
> +		i915_scheduler_async_fence_wait(ring->dev, fence_wait);
> +		spin_lock_irqsave(&scheduler->lock, *flags);
> +#else
> +		BUG_ON(true);
> +#endif
> +	}
> +
>  	*pop_node = best;
>  	return ret;
>  }
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
> index ce94b0b..8ca4b4b 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler.h
> +++ b/drivers/gpu/drm/i915/i915_scheduler.h
> @@ -56,6 +56,11 @@ struct i915_scheduler_obj_entry {
>  	struct drm_i915_gem_object          *obj;
>  };
>  
> +enum i915_scheduler_queue_entry_flags {
> +	/* Fence is being waited on */
> +	i915_qef_fence_waiting              = (1 << 0),
> +};
> +
>  struct i915_scheduler_queue_entry {
>  	struct i915_execbuffer_params       params;
>  	uint32_t                            priority;
> @@ -68,6 +73,7 @@ struct i915_scheduler_queue_entry {
>  	struct timespec                     stamp;
>  	struct list_head                    link;
>  	uint32_t                            scheduler_index;
> +	uint32_t                            flags;
>  };
>  
>  struct i915_scheduler {
> -- 
> 1.9.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC 00/39] GPU scheduler for i915 driver
  2015-07-17 14:33 [RFC 00/39] GPU scheduler for i915 driver John.C.Harrison
                   ` (38 preceding siblings ...)
  2015-07-17 14:33 ` [RFC 39/39] drm/i915: Allow scheduler to manage inter-ring object synchronisation John.C.Harrison
@ 2015-07-21 13:33 ` Daniel Vetter
  39 siblings, 0 replies; 48+ messages in thread
From: Daniel Vetter @ 2015-07-21 13:33 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Fri, Jul 17, 2015 at 03:33:09PM +0100, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> Implemented a batch buffer submission scheduler for the i915 DRM driver.
> 
> The general theory of operation is that when batch buffers are submitted to the
> driver, the execbuffer() code assigns a unique seqno value and then packages up
> all the information required to execute the batch buffer at a later time. This
> package is given over to the scheduler which adds it to an internal node list.
> The scheduler also scans the list of objects associated with the batch buffer
> and compares them against the objects already in use by other buffers in the
> node list. If matches are found then the new batch buffer node is marked as
> being dependent upon the matching node. The same is done for the context object.
> The scheduler also bumps up the priority of such matching nodes on the grounds
> that the more dependencies a given batch buffer has the more important it is
> likely to be.
> 
> The scheduler aims to have a given (tuneable) number of batch buffers in flight
> on the hardware at any given time. If fewer than this are currently executing
> when a new node is queued, then the node is passed straight through to the
> submit function. Otherwise it is simply added to the queue and the driver
> returns back to user land.
> 
> As each batch buffer completes, it raises an interrupt which wakes up the
> scheduler. Note that it is possible for multiple buffers to complete before the
> IRQ handler gets to run. Further, the seqno values of the individual buffers are
> not necessary incrementing as the scheduler may have re-ordered their
> submission. However, the scheduler keeps the list of executing buffers in order
> of hardware submission. Thus it can scan through the list until a matching seqno
> is found and then mark all in flight nodes from that point on as completed.
> 
> A deferred work queue is also poked by the interrupt handler. When this wakes up
> it can do more involved processing such as actually removing completed nodes
> from the queue and freeing up the resources associated with them (internal
> memory allocations, DRM object references, context reference, etc.). The work
> handler also checks the in flight count and calls the submission code if a new
> slot has appeared.
> 
> When the scheduler's submit code is called, it scans the queued node list for
> the highest priority node that has no unmet dependencies. Note that the
> dependency calculation is complex as it must take inter-ring dependencies and
> potential preemptions into account. Note also that in the future this will be
> extended to include external dependencies such as the Android Native Sync file
> descriptors and/or the linux dma-buff synchronisation scheme.
> 
> If a suitable node is found then it is sent to execbuff_final() for submission
> to the hardware. The in flight count is then re-checked and a new node popped
> from the list if appropriate.
> 
> The scheduler also allows high priority batch buffers (e.g. from a desktop
> compositor) to jump ahead of whatever is already running if the underlying
> hardware supports pre-emption. In this situation, any work that was pre-empted
> is returned to the queued list ready to be resubmitted when no more high
> priority work is outstanding.
> 
> [Patches against drm-intel-nightly tree fetched 15/07/2015 with struct fence
> conversion patches applied]

A few high-level comments:

seqno handling
--------------

So you do rework seqno handling to materialize seqnos fairly late, but
there's still (at least seemingly, not sure whether that all falls out
again) a lot of code to handle seqno values jumping around.

My goal with all the request/fence abstraction was that core code can stop
caring about where/how the seqno is stored, which would allow us to switch
to a per-timeline seqno scheme, i.e. per ctx/engine, stored in a private
per-ctx location.

Originally it made sense to keep seqno values as-is since they were used
all over the driver, but we fixed that with the seqno2request conversion
and with the request preallocation patches. What's still missing that
prevents us from fully abstracting seqnos away, so that how they're
allocated and where they're stored becomes a fence-internal implementation
detail?
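
(An editorial sketch of the kind of per-timeline scheme meant here; nothing
below is from the patches and all names are invented.)

/* Each ctx/engine pair owns its own monotonic seqno space plus a private
 * status-page slot, so nothing outside the fence implementation ever has to
 * compare raw seqno values across timelines. */
struct sched_timeline_sketch {
	u32 next_seqno;		/* allocated at actual hardware submission time */
	u32 *hwsp_slot;		/* per-timeline status-page dword the GPU writes */
};

static u32 sched_timeline_advance(struct sched_timeline_sketch *tl)
{
	return ++tl->next_seqno;	/* only ordered within this timeline */
}

static bool sched_timeline_seqno_passed(const struct sched_timeline_sketch *tl,
					u32 seqno)
{
	return (s32)(READ_ONCE(*tl->hwsp_slot) - seqno) >= 0;
}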

parallel data structures
------------------------

Currently your scheduler builds up a parallel set of data structures
starting from dev_priv->scheduler. That probably makes sense for an
external patch series, but for upstreaming I don't like parallel data
structures for a few reasons:

- It makes it harder to refactor code since there are more hoops to jump
  through when going from one hierarchy to the other. E.g. when you
  walk a list on one side there might not be a perfectly corresponding
  list order on the other side.

- It's harder to spot duplicated logic (like the dependency calculation
  code).

- It's harder on reviewers since behavior for the new stuff needs to be
  inferred from the code, while extending existing stuff can build on
  working assumptions. Yeah I'm just lazy ;-)

- Especially in GEM code it's not great having to kmalloc shadowing
  structures deep down in the call chain.

Anyway I think the global stuff should just be an embedded struct in
dev_priv (we don't bother with kmalloc in general). The per-engine arrays
should be split and moved into intel_engine_cs. Anything ctx specific
(haven't spotted much) belongs into ctx and the queue_entry should be
moved into requests. Yes there's not a 1:1 request:QE mapping, but we have
that already with execlists where we can coalesce requests together if
they're just tail updates to the same ctx on the same engine.
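
(Roughly the ownership being suggested, as an editorial sketch; all type and
field names below are invented and nothing is taken from the posted patches.)

/* Global part: embedded in dev_priv rather than kmalloc'ed separately. */
struct sched_globals_sketch {
	u32 priority_level_max;		/* tunables only */
	u32 min_flying;
};

/* Per-engine part: lives in intel_engine_cs instead of parallel arrays
 * indexed by ring->id. */
struct sched_engine_sketch {
	struct list_head queue;		/* was scheduler->node_queue[ring->id] */
	spinlock_t lock;
	u32 flags;			/* was scheduler->flags[ring->id] */
};

/* Per-submission part: folded into the request itself rather than a shadow
 * queue_entry, even though several requests may coalesce into one execlist
 * submission. */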

legacy ringbuffer scheduler
---------------------------

The really nice thing with execlist is that each ctx has its own ring and
we don't need to delay emitting ringbuffer commands at all for the
scheduler, just the tail update and execlist submission. That means
scheduler with execlist is really just an internal detail like the guc
submission and won't need any big changes to the core gem code.

Legacy scheduler otoh means we need to make sure no ringbuffer commands
get emitted, which means some kind of transactional ring access or
something similar to what Chris suggested. That's a lot more than just the
guarantees in the add_request rework, and I don't really think that effort
is worth it.

I've made some more comments on a bunch of the details; this is just
the overall design feedback.

Cheers, Daniel
> 
> Dave Gordon (1):
>   drm/i915: Updating assorted register and status page definitions
> 
> John Harrison (38):
>   drm/i915: Add total count to context status debugfs output
>   drm/i915: Explicit power enable during deferred context initialisation
>   drm/i915: Prelude to splitting i915_gem_do_execbuffer in two
>   drm/i915: Split i915_dem_do_execbuffer() in half
>   drm/i915: Re-instate request->uniq because it is extremely useful
>   drm/i915: Start of GPU scheduler
>   drm/i915: Prepare retire_requests to handle out-of-order seqnos
>   drm/i915: Added scheduler hook into i915_gem_complete_requests_ring()
>   drm/i915: Disable hardware semaphores when GPU scheduler is enabled
>   drm/i915: Force MMIO flips when scheduler enabled
>   drm/i915: Added scheduler hook when closing DRM file handles
>   drm/i915: Added deferred work handler for scheduler
>   drm/i915: Redirect execbuffer_final() via scheduler
>   drm/i915: Keep the reserved space mechanism happy
>   drm/i915: Added tracking/locking of batch buffer objects
>   drm/i915: Hook scheduler node clean up into retire requests
>   drm/i915: Added scheduler interrupt handler hook
>   drm/i915: Added scheduler support to __wait_request() calls
>   drm/i915: Added scheduler support to page fault handler
>   drm/i915: Added scheduler flush calls to ring throttle and idle functions
>   drm/i915: Add scheduler hook to GPU reset
>   drm/i915: Added a module parameter for allowing scheduler overrides
>   drm/i915: Support for 'unflushed' ring idle
>   drm/i915: Defer seqno allocation until actual hardware submission time
>   drm/i915: Added immediate submission override to scheduler
>   drm/i915: Add sync wait support to scheduler
>   drm/i915: Connecting execbuff fences to scheduler
>   drm/i915: Added trace points to scheduler
>   drm/i915: Added scheduler queue throttling by DRM file handle
>   drm/i915: Added debugfs interface to scheduler tuning parameters
>   drm/i915: Added debug state dump facilities to scheduler
>   drm/i915: Add early exit to execbuff_final() if insufficient ring space
>   drm/i915: Added scheduler statistic reporting to debugfs
>   drm/i915: Added seqno values to scheduler status dump
>   drm/i915: Add scheduler support functions for TDR
>   drm/i915: GPU priority bumping to prevent starvation
>   drm/i915: Enable GPU scheduler by default
>   drm/i915: Allow scheduler to manage inter-ring object synchronisation
> 
>  drivers/gpu/drm/i915/Makefile              |    1 +
>  drivers/gpu/drm/i915/i915_debugfs.c        |  221 ++++
>  drivers/gpu/drm/i915/i915_dma.c            |    6 +
>  drivers/gpu/drm/i915/i915_drv.c            |    9 +
>  drivers/gpu/drm/i915/i915_drv.h            |   42 +-
>  drivers/gpu/drm/i915/i915_gem.c            |  191 ++-
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c |  338 ++++--
>  drivers/gpu/drm/i915/i915_irq.c            |    3 +
>  drivers/gpu/drm/i915/i915_params.c         |    4 +
>  drivers/gpu/drm/i915/i915_reg.h            |   30 +-
>  drivers/gpu/drm/i915/i915_scheduler.c      | 1762 ++++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/i915_scheduler.h      |  178 +++
>  drivers/gpu/drm/i915/i915_trace.h          |  233 +++-
>  drivers/gpu/drm/i915/intel_display.c       |    6 +-
>  drivers/gpu/drm/i915/intel_lrc.c           |  163 ++-
>  drivers/gpu/drm/i915/intel_lrc.h           |    1 +
>  drivers/gpu/drm/i915/intel_ringbuffer.c    |   47 +-
>  drivers/gpu/drm/i915/intel_ringbuffer.h    |   43 +-
>  18 files changed, 3110 insertions(+), 168 deletions(-)
>  create mode 100644 drivers/gpu/drm/i915/i915_scheduler.c
>  create mode 100644 drivers/gpu/drm/i915/i915_scheduler.h
> 
> -- 
> 1.9.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2015-07-21 13:31 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-17 14:33 [RFC 00/39] GPU scheduler for i915 driver John.C.Harrison
2015-07-17 14:33 ` [RFC 01/39] drm/i915: Add total count to context status debugfs output John.C.Harrison
2015-07-17 14:33 ` [RFC 02/39] drm/i915: Updating assorted register and status page definitions John.C.Harrison
2015-07-17 14:33 ` [RFC 03/39] drm/i915: Explicit power enable during deferred context initialisation John.C.Harrison
2015-07-21  7:54   ` Daniel Vetter
2015-07-17 14:33 ` [RFC 04/39] drm/i915: Prelude to splitting i915_gem_do_execbuffer in two John.C.Harrison
2015-07-21  8:06   ` Daniel Vetter
2015-07-17 14:33 ` [RFC 05/39] drm/i915: Split i915_dem_do_execbuffer() in half John.C.Harrison
2015-07-21  8:00   ` Daniel Vetter
2015-07-17 14:33 ` [RFC 06/39] drm/i915: Re-instate request->uniq because it is extremely useful John.C.Harrison
2015-07-17 14:33 ` [RFC 07/39] drm/i915: Start of GPU scheduler John.C.Harrison
2015-07-21  9:40   ` Daniel Vetter
2015-07-17 14:33 ` [RFC 08/39] drm/i915: Prepare retire_requests to handle out-of-order seqnos John.C.Harrison
2015-07-17 14:33 ` [RFC 09/39] drm/i915: Added scheduler hook into i915_gem_complete_requests_ring() John.C.Harrison
2015-07-17 14:33 ` [RFC 10/39] drm/i915: Disable hardware semaphores when GPU scheduler is enabled John.C.Harrison
2015-07-17 14:33 ` [RFC 11/39] drm/i915: Force MMIO flips when scheduler enabled John.C.Harrison
2015-07-17 14:33 ` [RFC 12/39] drm/i915: Added scheduler hook when closing DRM file handles John.C.Harrison
2015-07-17 14:33 ` [RFC 13/39] drm/i915: Added deferred work handler for scheduler John.C.Harrison
2015-07-17 14:33 ` [RFC 14/39] drm/i915: Redirect execbuffer_final() via scheduler John.C.Harrison
2015-07-17 14:33 ` [RFC 15/39] drm/i915: Keep the reserved space mechanism happy John.C.Harrison
2015-07-17 14:33 ` [RFC 16/39] drm/i915: Added tracking/locking of batch buffer objects John.C.Harrison
2015-07-17 14:33 ` [RFC 17/39] drm/i915: Hook scheduler node clean up into retire requests John.C.Harrison
2015-07-17 14:33 ` [RFC 18/39] drm/i915: Added scheduler interrupt handler hook John.C.Harrison
2015-07-17 14:33 ` [RFC 19/39] drm/i915: Added scheduler support to __wait_request() calls John.C.Harrison
2015-07-21  9:27   ` Daniel Vetter
2015-07-17 14:33 ` [RFC 20/39] drm/i915: Added scheduler support to page fault handler John.C.Harrison
2015-07-17 14:33 ` [RFC 21/39] drm/i915: Added scheduler flush calls to ring throttle and idle functions John.C.Harrison
2015-07-17 14:33 ` [RFC 22/39] drm/i915: Add scheduler hook to GPU reset John.C.Harrison
2015-07-17 14:33 ` [RFC 23/39] drm/i915: Added a module parameter for allowing scheduler overrides John.C.Harrison
2015-07-17 14:33 ` [RFC 24/39] drm/i915: Support for 'unflushed' ring idle John.C.Harrison
2015-07-21  8:50   ` Daniel Vetter
2015-07-17 14:33 ` [RFC 25/39] drm/i915: Defer seqno allocation until actual hardware submission time John.C.Harrison
2015-07-17 14:33 ` [RFC 26/39] drm/i915: Added immediate submission override to scheduler John.C.Harrison
2015-07-17 14:33 ` [RFC 27/39] drm/i915: Add sync wait support " John.C.Harrison
2015-07-21  9:59   ` Daniel Vetter
2015-07-17 14:33 ` [RFC 28/39] drm/i915: Connecting execbuff fences " John.C.Harrison
2015-07-17 14:33 ` [RFC 29/39] drm/i915: Added trace points " John.C.Harrison
2015-07-17 14:33 ` [RFC 30/39] drm/i915: Added scheduler queue throttling by DRM file handle John.C.Harrison
2015-07-17 14:33 ` [RFC 31/39] drm/i915: Added debugfs interface to scheduler tuning parameters John.C.Harrison
2015-07-17 14:33 ` [RFC 32/39] drm/i915: Added debug state dump facilities to scheduler John.C.Harrison
2015-07-17 14:33 ` [RFC 33/39] drm/i915: Add early exit to execbuff_final() if insufficient ring space John.C.Harrison
2015-07-17 14:33 ` [RFC 34/39] drm/i915: Added scheduler statistic reporting to debugfs John.C.Harrison
2015-07-17 14:33 ` [RFC 35/39] drm/i915: Added seqno values to scheduler status dump John.C.Harrison
2015-07-17 14:33 ` [RFC 36/39] drm/i915: Add scheduler support functions for TDR John.C.Harrison
2015-07-17 14:33 ` [RFC 37/39] drm/i915: GPU priority bumping to prevent starvation John.C.Harrison
2015-07-17 14:33 ` [RFC 38/39] drm/i915: Enable GPU scheduler by default John.C.Harrison
2015-07-17 14:33 ` [RFC 39/39] drm/i915: Allow scheduler to manage inter-ring object synchronisation John.C.Harrison
2015-07-21 13:33 ` [RFC 00/39] GPU scheduler for i915 driver Daniel Vetter
