* [RFC 00/44] GPU scheduler for i915 driver
@ 2014-06-26 17:23 John.C.Harrison
  2014-06-26 17:23 ` [RFC 01/44] drm/i915: Corrected 'file_priv' to 'file' in 'i915_driver_preclose()' John.C.Harrison
                   ` (45 more replies)
  0 siblings, 46 replies; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:23 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

Implemented a batch buffer submission scheduler for the i915 DRM driver.

The general theory of operation is that when batch buffers are submitted to the
driver, the execbuffer() code assigns a unique seqno value and then packages up
all the information required to execute the batch buffer at a later time. This
package is given over to the scheduler which adds it to an internal node list.
The scheduler also scans the list of objects associated with the batch buffer
and compares them against the objects already in use by other buffers in the
node list. If matches are found then the new batch buffer node is marked as
being dependent upon the matching nodes. The same is done for the context object.
The scheduler also bumps up the priority of such matching nodes, on the grounds
that the more other batches depend on a given batch buffer, the more important
that buffer is likely to be.
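
Purely for illustration (this is not code from the series), the dependency scan
boils down to something like the fragment below. The node layout and the
'shares_object_or_context()' / 'node_add_dependency()' helpers are hypothetical
stand-ins for the real scheduler structures introduced later in the series; the
fragments in this cover letter assume the usual kernel helpers from
<linux/list.h> and <linux/workqueue.h>.

	/* Hypothetical node: one per batch buffer handed to the scheduler */
	struct sched_node {
		struct list_head	link;		/* position in queued/in-flight lists */
		u32			seqno;		/* seqno assigned at execbuffer() time */
		int			priority;
		bool			completed;
		/* ... saved execbuffer state, object list, context, deps ... */
	};

	static void scan_dependencies(struct sched_node *new_node,
				      struct list_head *node_list)
	{
		struct sched_node *node;

		list_for_each_entry(node, node_list, link) {
			/* Does the new batch touch an object (or the context)
			 * already referenced by an existing node? */
			if (!shares_object_or_context(new_node, node))
				continue;

			/* Record the dependency ... */
			node_add_dependency(new_node, node);

			/* ... and bump the priority of the node being waited on */
			node->priority++;
		}
	}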

The scheduler aims to have a given (tuneable) number of batch buffers in flight
on the hardware at any given time. If fewer than this are currently executing
when a new node is queued, then the node is passed straight through to the
submit function. Otherwise it is simply added to the queue and the driver
returns to user space.
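
Again as a hypothetical sketch, the queue-or-submit decision is essentially:

	/* Hypothetical per-device scheduler state */
	struct sched_state {
		struct list_head	queue;		/* not yet sent to the hardware */
		struct list_head	in_flight;	/* sent, in hardware submission order */
		int			in_flight_count;
		int			in_flight_limit;	/* the tuneable */
		struct work_struct	work;		/* deferred completion handler */
	};

	static int sched_queue_node(struct sched_state *sched,
				    struct sched_node *node)
	{
		if (sched->in_flight_count < sched->in_flight_limit)
			return sched_submit_node(sched, node);	/* hypothetical: straight to hardware */

		/* Hardware is busy enough: just queue it and return to user space */
		list_add_tail(&node->link, &sched->queue);
		return 0;
	}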

As each batch buffer completes, it raises an interrupt which wakes up the
scheduler. Note that it is possible for multiple buffers to complete before the
IRQ handler gets to run. Further, the seqno values of the individual buffers are
not necessarily incrementing, as the scheduler may have re-ordered their
submission. However, the scheduler keeps the list of executing buffers in order
of hardware submission. Thus it can scan through the list until a matching seqno
is found and then mark all in flight nodes from that point on as completed.
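
Roughly (hypothetical fields again, and assuming the in-flight list is kept
oldest-first, so that everything submitted at or before the completed batch has
also finished):

	static void sched_mark_completed(struct sched_state *sched, u32 hw_seqno)
	{
		struct sched_node *node;

		/* Walk in hardware submission order; seqno values may not be
		 * monotonic but completion still follows submission order. */
		list_for_each_entry(node, &sched->in_flight, link) {
			node->completed = true;
			if (node->seqno == hw_seqno)
				break;
		}
	}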

A deferred work queue is also poked by the interrupt handler. When this wakes up
it can do more involved processing such as actually removing completed nodes
from the queue and freeing up the resources associated with them (internal
memory allocations, DRM object references, context reference, etc.). The work
handler also checks the in flight count and calls the submission code if a new
slot has appeared.
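
The work handler itself then reduces to roughly the following sketch:

	static void sched_work_handler(struct work_struct *work)
	{
		struct sched_state *sched =
			container_of(work, struct sched_state, work);
		struct sched_node *node, *next;

		list_for_each_entry_safe(node, next, &sched->in_flight, link) {
			if (!node->completed)
				continue;

			/* Drop internal allocations, DRM object references,
			 * the context reference, etc. */
			list_del(&node->link);
			sched->in_flight_count--;
			sched_free_node(node);			/* hypothetical */
		}

		/* A slot may have opened up: try to submit more queued work */
		if (sched->in_flight_count < sched->in_flight_limit)
			sched_submit_best(sched);		/* see the sketch below */
	}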

When the scheduler's submit code is called, it scans the queued node list for
the highest priority node that has no unmet dependencies. Note that the
dependency calculation is complex as it must take inter-ring dependencies and
potential preemptions into account. Note also that in the future this will be
extended to include external dependencies such as the Android Native Sync file
descriptors and/or the Linux dma-buf synchronisation scheme.

If a suitable node is found then it is sent to execbuff_final() for submission
to the hardware. The in flight count is then re-checked and a new node popped
from the list if appropriate.
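
Ignoring the inter-ring and pre-emption complications, the selection loop is
conceptually just (hypothetical helpers again):

	static void sched_submit_best(struct sched_state *sched)
	{
		struct sched_node *node, *best;

		while (sched->in_flight_count < sched->in_flight_limit) {
			/* Highest priority queued node with no unmet dependencies */
			best = NULL;
			list_for_each_entry(node, &sched->queue, link) {
				if (!sched_deps_completed(node))	/* hypothetical */
					continue;
				if (!best || node->priority > best->priority)
					best = node;
			}
			if (!best)
				break;

			list_move_tail(&best->link, &sched->in_flight);
			sched->in_flight_count++;
			sched_send_to_hardware(best);	/* the execbuff_final() path */
		}
	}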

The scheduler also allows high priority batch buffers (e.g. from a desktop
compositor) to jump ahead of whatever is already running if the underlying
hardware supports pre-emption. In this situation, any work that was pre-empted
is returned to the queued list ready to be resubmitted when no more high
priority work is outstanding.
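
Again as a rough sketch, returning pre-empted work to the queue is just a matter
of moving the not-yet-completed in-flight nodes back:

	static void sched_requeue_preempted(struct sched_state *sched)
	{
		struct sched_node *node, *next;

		list_for_each_entry_safe(node, next, &sched->in_flight, link) {
			if (node->completed)
				continue;

			list_move_tail(&node->link, &sched->queue);
			sched->in_flight_count--;
		}
	}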

[Patches against drm-intel-nightly tree fetched 30/05/2014]

John Harrison (44):
  drm/i915: Corrected 'file_priv' to 'file' in 'i915_driver_preclose()'
  drm/i915: Added getparam for native sync
  drm/i915: Add extra add_request calls
  drm/i915: Fix null pointer dereference in error capture
  drm/i915: Updating assorted register and status page definitions
  drm/i915: Fixes for FIFO space queries
  drm/i915: Disable 'get seqno' workaround for VLV
  drm/i915: Added GPU scheduler config option
  drm/i915: Start of GPU scheduler
  drm/i915: Prepare retire_requests to handle out-of-order seqnos
  drm/i915: Added scheduler hook into i915_seqno_passed()
  drm/i915: Disable hardware semaphores when GPU scheduler is enabled
  drm/i915: Added scheduler hook when closing DRM file handles
  drm/i915: Added getparam for GPU scheduler
  drm/i915: Added deferred work handler for scheduler
  drm/i915: Alloc early seqno
  drm/i915: Prelude to splitting i915_gem_do_execbuffer in two
  drm/i915: Added scheduler debug macro
  drm/i915: Split i915_gem_do_execbuffer() in half
  drm/i915: Redirect execbuffer_final() via scheduler
  drm/i915: Added tracking/locking of batch buffer objects
  drm/i915: Ensure OLS & PLR are always in sync
  drm/i915: Added manipulation of OLS/PLR
  drm/i915: Added scheduler interrupt handler hook
  drm/i915: Added hook to catch 'unexpected' ring submissions
  drm/i915: Added scheduler support to __wait_seqno() calls
  drm/i915: Added scheduler support to page fault handler
  drm/i915: Added scheduler flush calls to ring throttle and idle functions
  drm/i915: Hook scheduler into intel_ring_idle()
  drm/i915: Added a module parameter for allowing scheduler overrides
  drm/i915: Implemented the GPU scheduler
  drm/i915: Added immediate submission override to scheduler
  drm/i915: Added trace points to scheduler
  drm/i915: Added scheduler queue throttling by DRM file handle
  drm/i915: Added debugfs interface to scheduler tuning parameters
  drm/i915: Added debug state dump facilities to scheduler
  drm/i915: Added facility for cancelling an outstanding request
  drm/i915: Add early exit to execbuff_final() if insufficient ring space
  drm/i915: Added support for pre-emptive scheduling
  drm/i915: REVERTME Hack to allow IGT to test pre-emption
  drm/i915: Added validation callback to trace points
  drm/i915: Added scheduler statistic reporting to debugfs
  drm/i915: Added support for submitting out-of-batch ring commands
  drm/i915: Fake batch support for page flips

 drivers/gpu/drm/i915/Kconfig                 |   16 +
 drivers/gpu/drm/i915/Makefile                |    1 +
 drivers/gpu/drm/i915/i915_debugfs.c          |  202 +++
 drivers/gpu/drm/i915/i915_dma.c              |   27 +-
 drivers/gpu/drm/i915/i915_drv.c              |    9 +
 drivers/gpu/drm/i915/i915_drv.h              |   61 +-
 drivers/gpu/drm/i915/i915_gem.c              |  256 +++-
 drivers/gpu/drm/i915/i915_gem_context.c      |    9 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c   |  658 ++++++++-
 drivers/gpu/drm/i915/i915_gem_render_state.c |    2 +-
 drivers/gpu/drm/i915/i915_gpu_error.c        |   13 +-
 drivers/gpu/drm/i915/i915_irq.c              |    7 +-
 drivers/gpu/drm/i915/i915_params.c           |    4 +
 drivers/gpu/drm/i915/i915_reg.h              |   30 +-
 drivers/gpu/drm/i915/i915_scheduler.c        | 1979 ++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_scheduler.h        |  277 ++++
 drivers/gpu/drm/i915/i915_trace.h            |  223 +++
 drivers/gpu/drm/i915/intel_display.c         |   92 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c      |   80 +-
 drivers/gpu/drm/i915/intel_ringbuffer.h      |   68 +-
 drivers/gpu/drm/i915/intel_uncore.c          |   49 +-
 include/drm/drmP.h                           |    7 +
 include/uapi/drm/i915_drm.h                  |    7 +
 23 files changed, 3880 insertions(+), 197 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_scheduler.c
 create mode 100644 drivers/gpu/drm/i915/i915_scheduler.h

-- 
1.7.9.5

* [RFC 01/44] drm/i915: Corrected 'file_priv' to 'file' in 'i915_driver_preclose()'
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
@ 2014-06-26 17:23 ` John.C.Harrison
  2014-06-30 21:03   ` Jesse Barnes
  2014-06-26 17:23 ` [RFC 02/44] drm/i915: Added getparam for native sync John.C.Harrison
                   ` (44 subsequent siblings)
  45 siblings, 1 reply; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:23 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The 'i915_driver_preclose()' function has a parameter called 'file_priv'.
However, this is misleading as the structure it points to is a 'drm_file' not a
'drm_i915_file_private'. It should be named just 'file' to avoid confusion.
---
 drivers/gpu/drm/i915/i915_dma.c |    6 +++---
 drivers/gpu/drm/i915/i915_drv.h |    6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index b9159ad..6cce55b 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1916,11 +1916,11 @@ void i915_driver_lastclose(struct drm_device * dev)
 	i915_dma_cleanup(dev);
 }
 
-void i915_driver_preclose(struct drm_device * dev, struct drm_file *file_priv)
+void i915_driver_preclose(struct drm_device *dev, struct drm_file *file)
 {
 	mutex_lock(&dev->struct_mutex);
-	i915_gem_context_close(dev, file_priv);
-	i915_gem_release(dev, file_priv);
+	i915_gem_context_close(dev, file);
+	i915_gem_release(dev, file);
 	mutex_unlock(&dev->struct_mutex);
 }
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index bea9ab40..7a96ca0 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2044,12 +2044,12 @@ void i915_update_dri1_breadcrumb(struct drm_device *dev);
 extern void i915_kernel_lost_context(struct drm_device * dev);
 extern int i915_driver_load(struct drm_device *, unsigned long flags);
 extern int i915_driver_unload(struct drm_device *);
-extern int i915_driver_open(struct drm_device *dev, struct drm_file *file_priv);
+extern int i915_driver_open(struct drm_device *dev, struct drm_file *file);
 extern void i915_driver_lastclose(struct drm_device * dev);
 extern void i915_driver_preclose(struct drm_device *dev,
-				 struct drm_file *file_priv);
+				 struct drm_file *file);
 extern void i915_driver_postclose(struct drm_device *dev,
-				  struct drm_file *file_priv);
+				  struct drm_file *file);
 extern int i915_driver_device_is_agp(struct drm_device * dev);
 #ifdef CONFIG_COMPAT
 extern long i915_compat_ioctl(struct file *filp, unsigned int cmd,
-- 
1.7.9.5

* [RFC 02/44] drm/i915: Added getparam for native sync
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
  2014-06-26 17:23 ` [RFC 01/44] drm/i915: Corrected 'file_priv' to 'file' in 'i915_driver_preclose()' John.C.Harrison
@ 2014-06-26 17:23 ` John.C.Harrison
  2014-07-07 18:52   ` Daniel Vetter
  2014-06-26 17:23 ` [RFC 03/44] drm/i915: Add extra add_request calls John.C.Harrison
                   ` (43 subsequent siblings)
  45 siblings, 1 reply; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:23 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

Validation tests need a run time mechanism for querying whether or not the
driver supports the Android native sync facility.
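
Not part of this patch, but for reference, a test could query the new parameter
through the usual libdrm getparam path, something along these lines:

	#include <xf86drm.h>
	#include <i915_drm.h>

	static int has_native_sync(int fd)
	{
		drm_i915_getparam_t gp = { .param = I915_PARAM_HAS_NATIVE_SYNC };
		int value = 0;

		gp.value = &value;
		if (drmIoctl(fd, DRM_IOCTL_I915_GETPARAM, &gp))
			return 0;	/* older kernel: parameter not recognised */

		return value;
	}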
---
 drivers/gpu/drm/i915/i915_dma.c |    7 +++++++
 include/uapi/drm/i915_drm.h     |    1 +
 2 files changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 6cce55b..67f2918 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1022,6 +1022,13 @@ static int i915_getparam(struct drm_device *dev, void *data,
 	case I915_PARAM_CMD_PARSER_VERSION:
 		value = i915_cmd_parser_get_version();
 		break;
+	case I915_PARAM_HAS_NATIVE_SYNC:
+#ifdef CONFIG_DRM_I915_SYNC
+		value = 1;
+#else
+		value = 0;
+#endif
+		break;
 	default:
 		DRM_DEBUG("Unknown parameter %d\n", param->param);
 		return -EINVAL;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index ff57f07..bf54c78 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -340,6 +340,7 @@ typedef struct drm_i915_irq_wait {
 #define I915_PARAM_HAS_EXEC_HANDLE_LUT   26
 #define I915_PARAM_HAS_WT     	 	 27
 #define I915_PARAM_CMD_PARSER_VERSION	 28
+#define I915_PARAM_HAS_NATIVE_SYNC	 30
 
 typedef struct drm_i915_getparam {
 	int param;
-- 
1.7.9.5

* [RFC 03/44] drm/i915: Add extra add_request calls
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
  2014-06-26 17:23 ` [RFC 01/44] drm/i915: Corrected 'file_priv' to 'file' in 'i915_driver_preclose()' John.C.Harrison
  2014-06-26 17:23 ` [RFC 02/44] drm/i915: Added getparam for native sync John.C.Harrison
@ 2014-06-26 17:23 ` John.C.Harrison
  2014-06-30 21:10   ` Jesse Barnes
  2014-06-26 17:23 ` [RFC 04/44] drm/i915: Fix null pointer dereference in error capture John.C.Harrison
                   ` (42 subsequent siblings)
  45 siblings, 1 reply; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:23 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The scheduler needs to track batch buffers by seqno without extra, non-batch
buffer work being attached to the same seqno. This means that any code which
adds work to the ring should explicitly call i915_add_request() when it has
finished writing to the ring.

The add_request() function does extra work, such as flushing caches, that is
not always wanted. Instead, a new
i915_add_request_wo_flush() function has been added which skips the cache flush
and just tidies up request structures and seqno values.

Note, much of this patch was implemented by Naresh Kumar Kachhi for pending
power management improvements. However, it is also directly applicable to the
scheduler work as noted above.
---
 drivers/gpu/drm/i915/i915_dma.c              |    5 +++++
 drivers/gpu/drm/i915/i915_drv.h              |    9 +++++---
 drivers/gpu/drm/i915/i915_gem.c              |   31 ++++++++++++++++++++------
 drivers/gpu/drm/i915/i915_gem_context.c      |    9 ++++++++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c   |    4 ++--
 drivers/gpu/drm/i915/i915_gem_render_state.c |    2 +-
 drivers/gpu/drm/i915/intel_display.c         |   10 ++++-----
 7 files changed, 52 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 67f2918..494b156 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -456,6 +456,7 @@ static int i915_dispatch_cmdbuffer(struct drm_device * dev,
 				   struct drm_clip_rect *cliprects,
 				   void *cmdbuf)
 {
+	struct drm_i915_private *dev_priv = dev->dev_private;
 	int nbox = cmd->num_cliprects;
 	int i = 0, count, ret;
 
@@ -482,6 +483,7 @@ static int i915_dispatch_cmdbuffer(struct drm_device * dev,
 	}
 
 	i915_emit_breadcrumb(dev);
+	i915_add_request_wo_flush(LP_RING(dev_priv));
 	return 0;
 }
 
@@ -544,6 +546,7 @@ static int i915_dispatch_batchbuffer(struct drm_device * dev,
 	}
 
 	i915_emit_breadcrumb(dev);
+	i915_add_request_wo_flush(LP_RING(dev_priv));
 	return 0;
 }
 
@@ -597,6 +600,7 @@ static int i915_dispatch_flip(struct drm_device * dev)
 		ADVANCE_LP_RING();
 	}
 
+	i915_add_request_wo_flush(LP_RING(dev_priv));
 	master_priv->sarea_priv->pf_current_page = dev_priv->dri1.current_page;
 	return 0;
 }
@@ -774,6 +778,7 @@ static int i915_emit_irq(struct drm_device * dev)
 		OUT_RING(dev_priv->dri1.counter);
 		OUT_RING(MI_USER_INTERRUPT);
 		ADVANCE_LP_RING();
+		i915_add_request_wo_flush(LP_RING(dev_priv));
 	}
 
 	return dev_priv->dri1.counter;
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 7a96ca0..e3295cb 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2199,7 +2199,7 @@ static inline void i915_gem_object_unpin_pages(struct drm_i915_gem_object *obj)
 
 int __must_check i915_mutex_lock_interruptible(struct drm_device *dev);
 int i915_gem_object_sync(struct drm_i915_gem_object *obj,
-			 struct intel_engine_cs *to);
+			 struct intel_engine_cs *to, bool add_request);
 void i915_vma_move_to_active(struct i915_vma *vma,
 			     struct intel_engine_cs *ring);
 int i915_gem_dumb_create(struct drm_file *file_priv,
@@ -2272,9 +2272,12 @@ int __must_check i915_gem_suspend(struct drm_device *dev);
 int __i915_add_request(struct intel_engine_cs *ring,
 		       struct drm_file *file,
 		       struct drm_i915_gem_object *batch_obj,
-		       u32 *seqno);
+		       u32 *seqno,
+		       bool flush_caches);
 #define i915_add_request(ring, seqno) \
-	__i915_add_request(ring, NULL, NULL, seqno)
+	__i915_add_request(ring, NULL, NULL, seqno, true)
+#define i915_add_request_wo_flush(ring) \
+	__i915_add_request(ring, NULL, NULL, NULL, false)
 int __must_check i915_wait_seqno(struct intel_engine_cs *ring,
 				 uint32_t seqno);
 int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 5a13d9e..898660c 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2320,7 +2320,8 @@ i915_gem_get_seqno(struct drm_device *dev, u32 *seqno)
 int __i915_add_request(struct intel_engine_cs *ring,
 		       struct drm_file *file,
 		       struct drm_i915_gem_object *obj,
-		       u32 *out_seqno)
+		       u32 *out_seqno,
+		       bool flush_caches)
 {
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
 	struct drm_i915_gem_request *request;
@@ -2335,9 +2336,11 @@ int __i915_add_request(struct intel_engine_cs *ring,
 	 * is that the flush _must_ happen before the next request, no matter
 	 * what.
 	 */
-	ret = intel_ring_flush_all_caches(ring);
-	if (ret)
-		return ret;
+	if (flush_caches) {
+		ret = intel_ring_flush_all_caches(ring);
+		if (ret)
+			return ret;
+	}
 
 	request = ring->preallocated_lazy_request;
 	if (WARN_ON(request == NULL))
@@ -2815,6 +2818,8 @@ out:
  *
  * @obj: object which may be in use on another ring.
  * @to: ring we wish to use the object on. May be NULL.
+ * @add_request: do we need to add a request to track operations
+ *    submitted on ring with sync_to function
  *
  * This code is meant to abstract object synchronization with the GPU.
  * Calling with NULL implies synchronizing the object with the CPU
@@ -2824,7 +2829,7 @@ out:
  */
 int
 i915_gem_object_sync(struct drm_i915_gem_object *obj,
-		     struct intel_engine_cs *to)
+		     struct intel_engine_cs *to, bool add_request)
 {
 	struct intel_engine_cs *from = obj->ring;
 	u32 seqno;
@@ -2848,12 +2853,15 @@ i915_gem_object_sync(struct drm_i915_gem_object *obj,
 
 	trace_i915_gem_ring_sync_to(from, to, seqno);
 	ret = to->semaphore.sync_to(to, from, seqno);
-	if (!ret)
+	if (!ret) {
 		/* We use last_read_seqno because sync_to()
 		 * might have just caused seqno wrap under
 		 * the radar.
 		 */
 		from->semaphore.sync_seqno[idx] = obj->last_read_seqno;
+		if (add_request)
+			i915_add_request_wo_flush(to);
+	}
 
 	return ret;
 }
@@ -2958,6 +2966,15 @@ int i915_gpu_idle(struct drm_device *dev)
 		if (ret)
 			return ret;
 
+		/* Make sure the context switch (if one actually happened)
+		 * gets wrapped up and finished rather than hanging around
+		 * and confusing things later. */
+		if (ring->outstanding_lazy_seqno) {
+			ret = i915_add_request(ring, NULL);
+			if (ret)
+				return ret;
+		}
+
 		ret = intel_ring_idle(ring);
 		if (ret)
 			return ret;
@@ -3832,7 +3849,7 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
 	int ret;
 
 	if (pipelined != obj->ring) {
-		ret = i915_gem_object_sync(obj, pipelined);
+		ret = i915_gem_object_sync(obj, pipelined, true);
 		if (ret)
 			return ret;
 	}
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 3ffe308..d1d2ee0 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -488,6 +488,15 @@ int i915_gem_context_enable(struct drm_i915_private *dev_priv)
 		ret = i915_switch_context(ring, ring->default_context);
 		if (ret)
 			return ret;
+
+		/* Make sure the context switch (if one actually happened)
+		 * gets wrapped up and finished rather than hanging around
+		 * and confusing things later. */
+		if(ring->outstanding_lazy_seqno) {
+			ret = i915_add_request_wo_flush(ring);
+			if (ret)
+				return ret;
+		}
 	}
 
 	return 0;
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 3a30133..ee836a6 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -858,7 +858,7 @@ i915_gem_execbuffer_move_to_gpu(struct intel_engine_cs *ring,
 
 	list_for_each_entry(vma, vmas, exec_list) {
 		struct drm_i915_gem_object *obj = vma->obj;
-		ret = i915_gem_object_sync(obj, ring);
+		ret = i915_gem_object_sync(obj, ring, false);
 		if (ret)
 			return ret;
 
@@ -998,7 +998,7 @@ i915_gem_execbuffer_retire_commands(struct drm_device *dev,
 	ring->gpu_caches_dirty = true;
 
 	/* Add a breadcrumb for the completion of the batch buffer */
-	(void)__i915_add_request(ring, file, obj, NULL);
+	(void)__i915_add_request(ring, file, obj, NULL, true);
 }
 
 static int
diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.c b/drivers/gpu/drm/i915/i915_gem_render_state.c
index 3521f99..50118cb 100644
--- a/drivers/gpu/drm/i915/i915_gem_render_state.c
+++ b/drivers/gpu/drm/i915/i915_gem_render_state.c
@@ -190,7 +190,7 @@ int i915_gem_render_state_init(struct intel_engine_cs *ring)
 
 	i915_vma_move_to_active(i915_gem_obj_to_ggtt(so->obj), ring);
 
-	ret = __i915_add_request(ring, NULL, so->obj, NULL);
+	ret = __i915_add_request(ring, NULL, so->obj, NULL, true);
 	/* __i915_add_request moves object to inactive if it fails */
 out:
 	render_state_free(so);
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 54095d4..fa1ffbb 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -8980,7 +8980,7 @@ static int intel_gen2_queue_flip(struct drm_device *dev,
 	intel_ring_emit(ring, 0); /* aux display base address, unused */
 
 	intel_mark_page_flip_active(intel_crtc);
-	__intel_ring_advance(ring);
+	i915_add_request_wo_flush(ring);
 	return 0;
 }
 
@@ -9012,7 +9012,7 @@ static int intel_gen3_queue_flip(struct drm_device *dev,
 	intel_ring_emit(ring, MI_NOOP);
 
 	intel_mark_page_flip_active(intel_crtc);
-	__intel_ring_advance(ring);
+	i915_add_request_wo_flush(ring);
 	return 0;
 }
 
@@ -9051,7 +9051,7 @@ static int intel_gen4_queue_flip(struct drm_device *dev,
 	intel_ring_emit(ring, pf | pipesrc);
 
 	intel_mark_page_flip_active(intel_crtc);
-	__intel_ring_advance(ring);
+	i915_add_request_wo_flush(ring);
 	return 0;
 }
 
@@ -9087,7 +9087,7 @@ static int intel_gen6_queue_flip(struct drm_device *dev,
 	intel_ring_emit(ring, pf | pipesrc);
 
 	intel_mark_page_flip_active(intel_crtc);
-	__intel_ring_advance(ring);
+	i915_add_request_wo_flush(ring);
 	return 0;
 }
 
@@ -9182,7 +9182,7 @@ static int intel_gen7_queue_flip(struct drm_device *dev,
 	intel_ring_emit(ring, (MI_NOOP));
 
 	intel_mark_page_flip_active(intel_crtc);
-	__intel_ring_advance(ring);
+	i915_add_request_wo_flush(ring);
 	return 0;
 }
 
-- 
1.7.9.5

* [RFC 04/44] drm/i915: Fix null pointer dereference in error capture
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (2 preceding siblings ...)
  2014-06-26 17:23 ` [RFC 03/44] drm/i915: Add extra add_request calls John.C.Harrison
@ 2014-06-26 17:23 ` John.C.Harrison
  2014-06-30 21:40   ` Jesse Barnes
  2014-07-01  7:20   ` [PATCH] drm/i915: Remove num_pages parameter to i915_error_object_create() Chris Wilson
  2014-06-26 17:23 ` [RFC 05/44] drm/i915: Updating assorted register and status page definitions John.C.Harrison
                   ` (41 subsequent siblings)
  45 siblings, 2 replies; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:23 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The i915_gem_record_rings() code was unconditionally querying and saving state
for the batch_obj of a request structure. This is not necessarily set. Thus a
null pointer dereference can occur.
---
 drivers/gpu/drm/i915/i915_gpu_error.c |   13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 87ec60e..0738f21 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -902,12 +902,13 @@ static void i915_gem_record_rings(struct drm_device *dev,
 			 * as the simplest method to avoid being overwritten
 			 * by userspace.
 			 */
-			error->ring[i].batchbuffer =
-				i915_error_object_create(dev_priv,
-							 request->batch_obj,
-							 request->ctx ?
-							 request->ctx->vm :
-							 &dev_priv->gtt.base);
+			if(request->batch_obj)
+				error->ring[i].batchbuffer =
+					i915_error_object_create(dev_priv,
+								 request->batch_obj,
+								 request->ctx ?
+								 request->ctx->vm :
+								 &dev_priv->gtt.base);
 
 			if (HAS_BROKEN_CS_TLB(dev_priv->dev) &&
 			    ring->scratch.obj)
-- 
1.7.9.5

* [RFC 05/44] drm/i915: Updating assorted register and status page definitions
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (3 preceding siblings ...)
  2014-06-26 17:23 ` [RFC 04/44] drm/i915: Fix null pointer dereference in error capture John.C.Harrison
@ 2014-06-26 17:23 ` John.C.Harrison
  2014-07-02 17:49   ` Jesse Barnes
  2014-06-26 17:23 ` [RFC 06/44] drm/i915: Fixes for FIFO space queries John.C.Harrison
                   ` (40 subsequent siblings)
  45 siblings, 1 reply; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:23 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

Added various definitions that will be useful for the scheduler in general and
pre-emptive context switching in particular.
---
 drivers/gpu/drm/i915/i915_drv.h         |    5 ++-
 drivers/gpu/drm/i915/i915_reg.h         |   30 ++++++++++++++-
 drivers/gpu/drm/i915/intel_ringbuffer.h |   61 ++++++++++++++++++++++++++++++-
 3 files changed, 92 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index e3295cb..53f6fe5 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -584,7 +584,10 @@ struct i915_ctx_hang_stats {
 };
 
 /* This must match up with the value previously used for execbuf2.rsvd1. */
-#define DEFAULT_CONTEXT_ID 0
+#define DEFAULT_CONTEXT_ID		0
+/* This must not match any user context */
+#define PREEMPTION_CONTEXT_ID		(-1)
+
 struct intel_context {
 	struct kref ref;
 	int id;
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 242df99..cfc918d 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -205,6 +205,10 @@
 #define  MI_GLOBAL_GTT    (1<<22)
 
 #define MI_NOOP			MI_INSTR(0, 0)
+#define   MI_NOOP_WRITE_ID		(1<<22)
+#define   MI_NOOP_ID_MASK		((1<<22) - 1)
+#define   MI_NOOP_MID(id)		((id) & MI_NOOP_ID_MASK)
+#define MI_NOOP_WITH_ID(id)	MI_INSTR(0, MI_NOOP_WRITE_ID|MI_NOOP_MID(id))
 #define MI_USER_INTERRUPT	MI_INSTR(0x02, 0)
 #define MI_WAIT_FOR_EVENT       MI_INSTR(0x03, 0)
 #define   MI_WAIT_FOR_OVERLAY_FLIP	(1<<16)
@@ -222,6 +226,7 @@
 #define MI_ARB_ON_OFF		MI_INSTR(0x08, 0)
 #define   MI_ARB_ENABLE			(1<<0)
 #define   MI_ARB_DISABLE		(0<<0)
+#define MI_ARB_CHECK		MI_INSTR(0x05, 0)
 #define MI_BATCH_BUFFER_END	MI_INSTR(0x0a, 0)
 #define MI_SUSPEND_FLUSH	MI_INSTR(0x0b, 0)
 #define   MI_SUSPEND_FLUSH_EN	(1<<0)
@@ -260,6 +265,8 @@
 #define   MI_SEMAPHORE_SYNC_INVALID (3<<16)
 #define   MI_SEMAPHORE_SYNC_MASK    (3<<16)
 #define MI_SET_CONTEXT		MI_INSTR(0x18, 0)
+#define   MI_CONTEXT_ADDR_MASK		((~0)<<12)
+#define   MI_SET_CONTEXT_FLAG_MASK	((1<<12)-1)
 #define   MI_MM_SPACE_GTT		(1<<8)
 #define   MI_MM_SPACE_PHYSICAL		(0<<8)
 #define   MI_SAVE_EXT_STATE_EN		(1<<3)
@@ -270,6 +277,10 @@
 #define   MI_MEM_VIRTUAL	(1 << 22) /* 965+ only */
 #define MI_STORE_DWORD_INDEX	MI_INSTR(0x21, 1)
 #define   MI_STORE_DWORD_INDEX_SHIFT 2
+#define MI_STORE_REG_MEM	MI_INSTR(0x24, 1)
+#define   MI_STORE_REG_MEM_GTT		(1 << 22)
+#define   MI_STORE_REG_MEM_PREDICATE	(1 << 21)
+
 /* Official intel docs are somewhat sloppy concerning MI_LOAD_REGISTER_IMM:
  * - Always issue a MI_NOOP _before_ the MI_LOAD_REGISTER_IMM - otherwise hw
  *   simply ignores the register load under certain conditions.
@@ -283,7 +294,10 @@
 #define MI_FLUSH_DW		MI_INSTR(0x26, 1) /* for GEN6 */
 #define   MI_FLUSH_DW_STORE_INDEX	(1<<21)
 #define   MI_INVALIDATE_TLB		(1<<18)
+#define   MI_FLUSH_DW_OP_NONE		(0<<14)
 #define   MI_FLUSH_DW_OP_STOREDW	(1<<14)
+#define   MI_FLUSH_DW_OP_RSVD		(2<<14)
+#define   MI_FLUSH_DW_OP_STAMP		(3<<14)
 #define   MI_FLUSH_DW_OP_MASK		(3<<14)
 #define   MI_FLUSH_DW_NOTIFY		(1<<8)
 #define   MI_INVALIDATE_BSD		(1<<7)
@@ -1005,6 +1019,19 @@ enum punit_power_well {
 #define GEN6_VERSYNC	(RING_SYNC_1(VEBOX_RING_BASE))
 #define GEN6_VEVSYNC	(RING_SYNC_2(VEBOX_RING_BASE))
 #define GEN6_NOSYNC 0
+
+/*
+ * Premption-related registers
+ */
+#define RING_UHPTR(base)	((base)+0x134)
+#define   UHPTR_GFX_ADDR_ALIGN		(0x7)
+#define   UHPTR_VALID			(0x1)
+#define RING_PREEMPT_ADDR	0x0214c
+#define   PREEMPT_BATCH_LEVEL_MASK	(0x3)
+#define BB_PREEMPT_ADDR		0x02148
+#define SBB_PREEMPT_ADDR	0x0213c
+#define RS_PREEMPT_STATUS	0x0215c
+
 #define RING_MAX_IDLE(base)	((base)+0x54)
 #define RING_HWS_PGA(base)	((base)+0x80)
 #define RING_HWS_PGA_GEN6(base)	((base)+0x2080)
@@ -5383,7 +5410,8 @@ enum punit_power_well {
 #define  VLV_SPAREG2H				0xA194
 
 #define  GTFIFODBG				0x120000
-#define    GT_FIFO_SBDROPERR			(1<<6)
+#define    GT_FIFO_CPU_ERROR_MASK		0xf
+#define    GT_FIFO_SDDROPERR			(1<<6)
 #define    GT_FIFO_BLOBDROPERR			(1<<5)
 #define    GT_FIFO_SB_READ_ABORTERR		(1<<4)
 #define    GT_FIFO_DROPERR			(1<<3)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 910c83c..30841ea 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -40,6 +40,12 @@ struct  intel_hw_status_page {
 #define I915_READ_MODE(ring) I915_READ(RING_MI_MODE((ring)->mmio_base))
 #define I915_WRITE_MODE(ring, val) I915_WRITE(RING_MI_MODE((ring)->mmio_base), val)
 
+#define I915_READ_UHPTR(ring) \
+		I915_READ(RING_UHPTR((ring)->mmio_base))
+#define I915_WRITE_UHPTR(ring, val) \
+		I915_WRITE(RING_UHPTR((ring)->mmio_base), val)
+#define I915_READ_NOPID(ring) I915_READ(RING_NOPID((ring)->mmio_base))
+
 enum intel_ring_hangcheck_action {
 	HANGCHECK_IDLE = 0,
 	HANGCHECK_WAIT,
@@ -280,10 +286,61 @@ intel_write_status_page(struct intel_engine_cs *ring,
  * 0x1f: Last written status offset. (GM45)
  *
  * The area from dword 0x20 to 0x3ff is available for driver usage.
+ *
+ * Note: in general the allocation of these indices is arbitrary, as long
+ * as they're all unique. But a few of them are used with instructions that
+ * have specific alignment requirements, those particular indices must be
+ * chosen carefully to meet those requirements. The list below shows the
+ * currently-known alignment requirements:
+ *
+ *	I915_GEM_SCRATCH_INDEX	    must be EVEN
  */
 #define I915_GEM_HWS_INDEX		0x20
-#define I915_GEM_HWS_SCRATCH_INDEX	0x30
-#define I915_GEM_HWS_SCRATCH_ADDR (I915_GEM_HWS_SCRATCH_INDEX << MI_STORE_DWORD_INDEX_SHIFT)
+#define I915_GEM_ACTIVE_SEQNO_INDEX	0x21  /* Executing seqno for TDR only */
+#define I915_GEM_PGFLIP_INDEX		0x22
+#define I915_GEM_BREADCRUMB_INDEX	0x23
+
+#define I915_GEM_HWS_SCRATCH_INDEX	0x24  /* QWord */
+#define I915_GEM_HWS_SCRATCH_ADDR	(I915_GEM_HWS_SCRATCH_INDEX << MI_STORE_DWORD_INDEX_SHIFT)
+
+/*
+ * Software (CPU) tracking of batch start/end addresses in the ring
+ */
+#define I915_GEM_BATCH_START_ADDR	0x2e  /* Start of batch in ring     */
+#define I915_GEM_BATCH_END_ADDR		0x2f  /* End of batch in ring       */
+
+/*
+ * Tracking; these are updated by the GPU at the beginning and/or end of every batch
+ */
+#define I915_BATCH_DONE_SEQNO		0x30  /* Last completed batch seqno  */
+#define I915_BATCH_ACTIVE_SEQNO		0x31  /* Seqno of batch in progress  */
+#define I915_BATCH_ACTIVE_ADDR		0x32  /* Addr of batch cmds in ring  */
+#define I915_BATCH_ACTIVE_END		0x33  /* End of batch cmds in ring   */
+
+/*
+ * Tracking; these are updated by the GPU at the beginning and/or end of a preemptive batch
+ */
+#define I915_PREEMPTIVE_DONE_SEQNO	0x34  /* Last completed preemptive batch seqno  */
+#define I915_PREEMPTIVE_ACTIVE_SEQNO	0x35  /* Seqno of preemptive batch in progress  */
+#define I915_PREEMPTIVE_ACTIVE_ADDR	0x36  /* Addr of preemptive batch cmds in ring  */
+#define I915_PREEMPTIVE_ACTIVE_END	0x37  /* End of preemptive batch cmds in ring   */
+
+/*
+ * Preemption; these are used by the GPU to save important registers
+ */
+#define I915_SAVE_PREEMPTED_RING_PTR	0x38  /* HEAD before preemption     */
+#define I915_SAVE_PREEMPTED_BB_PTR	0x39  /* BB ptr before preemption   */
+#define I915_SAVE_PREEMPTED_SBB_PTR	0x3a  /* SBB before preemption      */
+#define I915_SAVE_PREEMPTED_UHPTR	0x3b  /* UHPTR after preemption     */
+#define I915_SAVE_PREEMPTED_HEAD	0x3c  /* HEAD after preemption      */
+#define I915_SAVE_PREEMPTED_TAIL	0x3d  /* TAIL after preemption      */
+#define I915_SAVE_PREEMPTED_STATUS	0x3e  /* RS preemption status       */
+#define I915_SAVE_PREEMPTED_NOPID	0x3f  /* Dummy                      */
+
+/* Range of DWORDs to snapshot in the interrupt handler */
+#define	I915_IRQ_SNAP_START		I915_GEM_HWS_INDEX
+#define	I915_IRQ_SNAP_SPLIT		(I915_SAVE_PREEMPTED_NOPID/4*4+4)
+#define	I915_IRQ_SNAP_END		((I915_SAVE_PREEMPTED_NOPID+128)/4*4+4)
 
 void intel_stop_ring_buffer(struct intel_engine_cs *ring);
 void intel_cleanup_ring_buffer(struct intel_engine_cs *ring);
-- 
1.7.9.5

* [RFC 06/44] drm/i915: Fixes for FIFO space queries
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (4 preceding siblings ...)
  2014-06-26 17:23 ` [RFC 05/44] drm/i915: Updating assorted register and status page definitions John.C.Harrison
@ 2014-06-26 17:23 ` John.C.Harrison
  2014-07-02 17:50   ` Jesse Barnes
  2014-06-26 17:23 ` [RFC 07/44] drm/i915: Disable 'get seqno' workaround for VLV John.C.Harrison
                   ` (39 subsequent siblings)
  45 siblings, 1 reply; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:23 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The previous code was not correctly masking the value of the GTFIFOCTL register,
leading to overruns and the message "MMIO read or write has been dropped". In
addition, the checks were repeated in several different places. This commit
replaces these various checks with a simple (inline) function to encapsulate the
read-and-mask operation. In addition, it adds a custom wait-for-fifo function
for VLV, as the timing parameters are somewhat different from those on earlier
chips.
---
 drivers/gpu/drm/i915/intel_uncore.c |   49 ++++++++++++++++++++++++++++++-----
 1 file changed, 42 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index 871c284..6a3dddf 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -47,6 +47,12 @@ assert_device_not_suspended(struct drm_i915_private *dev_priv)
 	     "Device suspended\n");
 }
 
+static inline u32 fifo_free_entries(struct drm_i915_private *dev_priv)
+{
+	u32 count = __raw_i915_read32(dev_priv, GTFIFOCTL);
+	return count & GT_FIFO_FREE_ENTRIES_MASK;
+}
+
 static void __gen6_gt_wait_for_thread_c0(struct drm_i915_private *dev_priv)
 {
 	u32 gt_thread_status_mask;
@@ -154,6 +160,28 @@ static void __gen7_gt_force_wake_mt_put(struct drm_i915_private *dev_priv,
 		gen6_gt_check_fifodbg(dev_priv);
 }
 
+static int __vlv_gt_wait_for_fifo(struct drm_i915_private *dev_priv)
+{
+	u32 free = fifo_free_entries(dev_priv);
+	int loop1, loop2;
+
+	for (loop1 = 0; loop1 < 5000 && free < GT_FIFO_NUM_RESERVED_ENTRIES; ) {
+		for (loop2 = 0; loop2 < 1000 && free < GT_FIFO_NUM_RESERVED_ENTRIES; loop2 += 10) {
+			udelay(10);
+			free = fifo_free_entries(dev_priv);
+		}
+		loop1 += loop2;
+		if (loop1 > 1000 || free < 48)
+			DRM_DEBUG("after %d us, the FIFO has %d slots", loop1, free);
+	}
+
+	dev_priv->uncore.fifo_count = free;
+	if (WARN(free < GT_FIFO_NUM_RESERVED_ENTRIES,
+		"FIFO has insufficient space (%d slots)", free))
+		return -1;
+	return 0;
+}
+
 static int __gen6_gt_wait_for_fifo(struct drm_i915_private *dev_priv)
 {
 	int ret = 0;
@@ -161,16 +189,15 @@ static int __gen6_gt_wait_for_fifo(struct drm_i915_private *dev_priv)
 	/* On VLV, FIFO will be shared by both SW and HW.
 	 * So, we need to read the FREE_ENTRIES everytime */
 	if (IS_VALLEYVIEW(dev_priv->dev))
-		dev_priv->uncore.fifo_count =
-			__raw_i915_read32(dev_priv, GTFIFOCTL) &
-						GT_FIFO_FREE_ENTRIES_MASK;
+		return __vlv_gt_wait_for_fifo(dev_priv);
 
 	if (dev_priv->uncore.fifo_count < GT_FIFO_NUM_RESERVED_ENTRIES) {
 		int loop = 500;
-		u32 fifo = __raw_i915_read32(dev_priv, GTFIFOCTL) & GT_FIFO_FREE_ENTRIES_MASK;
+		u32 fifo = fifo_free_entries(dev_priv);
+
 		while (fifo <= GT_FIFO_NUM_RESERVED_ENTRIES && loop--) {
 			udelay(10);
-			fifo = __raw_i915_read32(dev_priv, GTFIFOCTL) & GT_FIFO_FREE_ENTRIES_MASK;
+			fifo = fifo_free_entries(dev_priv);
 		}
 		if (WARN_ON(loop < 0 && fifo <= GT_FIFO_NUM_RESERVED_ENTRIES))
 			++ret;
@@ -194,6 +221,11 @@ static void vlv_force_wake_reset(struct drm_i915_private *dev_priv)
 static void __vlv_force_wake_get(struct drm_i915_private *dev_priv,
 						int fw_engine)
 {
+#if	1
+	if (__gen6_gt_wait_for_fifo(dev_priv))
+		gen6_gt_check_fifodbg(dev_priv);
+#endif
+
 	/* Check for Render Engine */
 	if (FORCEWAKE_RENDER & fw_engine) {
 		if (wait_for_atomic((__raw_i915_read32(dev_priv,
@@ -238,6 +270,10 @@ static void __vlv_force_wake_get(struct drm_i915_private *dev_priv,
 static void __vlv_force_wake_put(struct drm_i915_private *dev_priv,
 					int fw_engine)
 {
+#if	1
+	if (__gen6_gt_wait_for_fifo(dev_priv))
+		gen6_gt_check_fifodbg(dev_priv);
+#endif
 
 	/* Check for Render Engine */
 	if (FORCEWAKE_RENDER & fw_engine)
@@ -355,8 +391,7 @@ static void intel_uncore_forcewake_reset(struct drm_device *dev, bool restore)
 
 		if (IS_GEN6(dev) || IS_GEN7(dev))
 			dev_priv->uncore.fifo_count =
-				__raw_i915_read32(dev_priv, GTFIFOCTL) &
-				GT_FIFO_FREE_ENTRIES_MASK;
+				fifo_free_entries(dev_priv);
 	} else {
 		dev_priv->uncore.forcewake_count = 0;
 		dev_priv->uncore.fw_rendercount = 0;
-- 
1.7.9.5

* [RFC 07/44] drm/i915: Disable 'get seqno' workaround for VLV
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (5 preceding siblings ...)
  2014-06-26 17:23 ` [RFC 06/44] drm/i915: Fixes for FIFO space queries John.C.Harrison
@ 2014-06-26 17:23 ` John.C.Harrison
  2014-07-02 17:51   ` Jesse Barnes
  2014-06-26 17:23 ` [RFC 08/44] drm/i915: Added GPU scheduler config option John.C.Harrison
                   ` (38 subsequent siblings)
  45 siblings, 1 reply; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:23 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

There is a workaround for a hardware bug when reading the seqno from the status
page. The bug does not exist on VLV; however, the workaround was still being
applied.
---
 drivers/gpu/drm/i915/intel_ringbuffer.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 279488a..bad5db0 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1960,7 +1960,10 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 			ring->irq_put = gen6_ring_put_irq;
 		}
 		ring->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
-		ring->get_seqno = gen6_ring_get_seqno;
+		if (IS_VALLEYVIEW(dev))
+			ring->get_seqno = ring_get_seqno;
+		else
+			ring->get_seqno = gen6_ring_get_seqno;
 		ring->set_seqno = ring_set_seqno;
 		ring->semaphore.sync_to = gen6_ring_sync;
 		ring->semaphore.signal = gen6_signal;
-- 
1.7.9.5

* [RFC 08/44] drm/i915: Added GPU scheduler config option
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (6 preceding siblings ...)
  2014-06-26 17:23 ` [RFC 07/44] drm/i915: Disable 'get seqno' workaround for VLV John.C.Harrison
@ 2014-06-26 17:23 ` John.C.Harrison
  2014-07-07 18:58   ` Daniel Vetter
  2014-06-26 17:24 ` [RFC 09/44] drm/i915: Start of GPU scheduler John.C.Harrison
                   ` (37 subsequent siblings)
  45 siblings, 1 reply; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:23 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

Added a Kconfig option for enabling/disabling the GPU scheduler.
---
 drivers/gpu/drm/i915/Kconfig |    8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
index 437e182..22a036b 100644
--- a/drivers/gpu/drm/i915/Kconfig
+++ b/drivers/gpu/drm/i915/Kconfig
@@ -81,3 +81,11 @@ config DRM_I915_UMS
 	  enable this only if you have ancient versions of the DDX drivers.
 
 	  If in doubt, say "N".
+
+config DRM_I915_SCHEDULER
+	bool "Enable GPU scheduler on Intel hardware"
+	depends on DRM_I915
+	default y
+	help
+	  Choose this option to enable GPU task scheduling for improved
+	  performance and efficiency.
-- 
1.7.9.5

* [RFC 09/44] drm/i915: Start of GPU scheduler
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (7 preceding siblings ...)
  2014-06-26 17:23 ` [RFC 08/44] drm/i915: Added GPU scheduler config option John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-07-02 17:55   ` Jesse Barnes
  2014-07-07 19:02   ` Daniel Vetter
  2014-06-26 17:24 ` [RFC 10/44] drm/i915: Prepare retire_requests to handle out-of-order seqnos John.C.Harrison
                   ` (36 subsequent siblings)
  45 siblings, 2 replies; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

Created GPU scheduler source files with only a basic init function.
---
 drivers/gpu/drm/i915/Makefile         |    1 +
 drivers/gpu/drm/i915/i915_drv.h       |    4 +++
 drivers/gpu/drm/i915/i915_gem.c       |    3 ++
 drivers/gpu/drm/i915/i915_scheduler.c |   59 +++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_scheduler.h |   40 ++++++++++++++++++++++
 5 files changed, 107 insertions(+)
 create mode 100644 drivers/gpu/drm/i915/i915_scheduler.c
 create mode 100644 drivers/gpu/drm/i915/i915_scheduler.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index cad1683..12817a8 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -11,6 +11,7 @@ i915-y := i915_drv.o \
 	  i915_params.o \
           i915_suspend.o \
 	  i915_sysfs.o \
+	  i915_scheduler.o \
 	  intel_pm.o
 i915-$(CONFIG_COMPAT)   += i915_ioc32.o
 i915-$(CONFIG_DEBUG_FS) += i915_debugfs.o
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 53f6fe5..6e592d3 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1331,6 +1331,8 @@ struct intel_pipe_crc {
 	wait_queue_head_t wq;
 };
 
+struct i915_scheduler;
+
 struct drm_i915_private {
 	struct drm_device *dev;
 	struct kmem_cache *slab;
@@ -1540,6 +1542,8 @@ struct drm_i915_private {
 
 	struct i915_runtime_pm pm;
 
+	struct i915_scheduler *scheduler;
+
 	/* Old dri1 support infrastructure, beware the dragons ya fools entering
 	 * here! */
 	struct i915_dri1_state dri1;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 898660c..b784eb2 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -37,6 +37,7 @@
 #include <linux/swap.h>
 #include <linux/pci.h>
 #include <linux/dma-buf.h>
+#include "i915_scheduler.h"
 
 static void i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj);
 static void i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj,
@@ -4669,6 +4670,8 @@ static int i915_gem_init_rings(struct drm_device *dev)
 			goto cleanup_vebox_ring;
 	}
 
+	i915_scheduler_init(dev);
+
 	ret = i915_gem_set_seqno(dev, ((u32)~0 - 0x1000));
 	if (ret)
 		goto cleanup_bsd2_ring;
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
new file mode 100644
index 0000000..9ec0225
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -0,0 +1,59 @@
+/*
+ * Copyright (c) 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#include "i915_drv.h"
+#include "intel_drv.h"
+#include "i915_scheduler.h"
+
+#ifdef CONFIG_DRM_I915_SCHEDULER
+
+int i915_scheduler_init(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+
+	if (scheduler)
+		return 0;
+
+	scheduler = kzalloc(sizeof(*scheduler), GFP_KERNEL);
+	if (!scheduler)
+		return -ENOMEM;
+
+	spin_lock_init(&scheduler->lock);
+
+	scheduler->index = 1;
+
+	dev_priv->scheduler = scheduler;
+
+	return 0;
+}
+
+#else   /* CONFIG_DRM_I915_SCHEDULER */
+
+int i915_scheduler_init(struct drm_device *dev)
+{
+	return 0;
+}
+
+#endif  /* CONFIG_DRM_I915_SCHEDULER */
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
new file mode 100644
index 0000000..bbe1934
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -0,0 +1,40 @@
+/*
+ * Copyright (c) 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#ifndef _I915_SCHEDULER_H_
+#define _I915_SCHEDULER_H_
+
+int         i915_scheduler_init(struct drm_device *dev);
+
+#ifdef CONFIG_DRM_I915_SCHEDULER
+
+struct i915_scheduler {
+	uint32_t    flags[I915_NUM_RINGS];
+	spinlock_t  lock;
+	uint32_t    index;
+};
+
+#endif  /* CONFIG_DRM_I915_SCHEDULER */
+
+#endif  /* _I915_SCHEDULER_H_ */
-- 
1.7.9.5

* [RFC 10/44] drm/i915: Prepare retire_requests to handle out-of-order seqnos
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (8 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 09/44] drm/i915: Start of GPU scheduler John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-07-02 18:11   ` Jesse Barnes
  2014-07-07 19:05   ` Daniel Vetter
  2014-06-26 17:24 ` [RFC 11/44] drm/i915: Added scheduler hook into i915_seqno_passed() John.C.Harrison
                   ` (35 subsequent siblings)
  45 siblings, 2 replies; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

A major point of the GPU scheduler is that it re-orders batch buffers after they
have been submitted to the driver. Rather than attempting to re-assign seqno
values, it is much simpler to have each batch buffer keep its initially assigned
number and modify the rest of the driver to cope with seqnos being returned out
of order. In practice, very little code actually needs updating to cope.

One such place is the retire request handler. Rather than stopping as soon as an
uncompleted seqno is found, it must now keep iterating through the requests in
case later seqnos have completed. There is also a problem with freeing the
request before the move to inactive. Thus the requests are now moved to a
temporary list first, then the objects are de-activated, and finally the requests
on the temporary list are freed.
---
 drivers/gpu/drm/i915/i915_gem.c |   60 +++++++++++++++++++++------------------
 1 file changed, 32 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index b784eb2..7e53446 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2602,7 +2602,10 @@ void i915_gem_reset(struct drm_device *dev)
 void
 i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 {
+	struct drm_i915_gem_object *obj, *obj_next;
+	struct drm_i915_gem_request *req, *req_next;
 	uint32_t seqno;
+	LIST_HEAD(deferred_request_free);
 
 	if (list_empty(&ring->request_list))
 		return;
@@ -2611,43 +2614,35 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 
 	seqno = ring->get_seqno(ring, true);
 
-	/* Move any buffers on the active list that are no longer referenced
-	 * by the ringbuffer to the flushing/inactive lists as appropriate,
-	 * before we free the context associated with the requests.
+	/* Note that seqno values might be out of order due to rescheduling and
+	 * pre-emption. Thus both lists must be processed in their entirety
+	 * rather than stopping at the first 'non-passed' entry.
 	 */
-	while (!list_empty(&ring->active_list)) {
-		struct drm_i915_gem_object *obj;
-
-		obj = list_first_entry(&ring->active_list,
-				      struct drm_i915_gem_object,
-				      ring_list);
-
-		if (!i915_seqno_passed(seqno, obj->last_read_seqno))
-			break;
 
-		i915_gem_object_move_to_inactive(obj);
-	}
-
-
-	while (!list_empty(&ring->request_list)) {
-		struct drm_i915_gem_request *request;
-
-		request = list_first_entry(&ring->request_list,
-					   struct drm_i915_gem_request,
-					   list);
-
-		if (!i915_seqno_passed(seqno, request->seqno))
-			break;
+	list_for_each_entry_safe(req, req_next, &ring->request_list, list) {
+		if (!i915_seqno_passed(seqno, req->seqno))
+			continue;
 
-		trace_i915_gem_request_retire(ring, request->seqno);
+		trace_i915_gem_request_retire(ring, req->seqno);
 		/* We know the GPU must have read the request to have
 		 * sent us the seqno + interrupt, so use the position
 		 * of tail of the request to update the last known position
 		 * of the GPU head.
 		 */
-		ring->buffer->last_retired_head = request->tail;
+		ring->buffer->last_retired_head = req->tail;
 
-		i915_gem_free_request(request);
+		list_move_tail(&req->list, &deferred_request_free);
+	}
+
+	/* Move any buffers on the active list that are no longer referenced
+	 * by the ringbuffer to the flushing/inactive lists as appropriate,
+	 * before we free the context associated with the requests.
+	 */
+	list_for_each_entry_safe(obj, obj_next, &ring->active_list, ring_list) {
+		if (!i915_seqno_passed(seqno, obj->last_read_seqno))
+			continue;
+
+		i915_gem_object_move_to_inactive(obj);
 	}
 
 	if (unlikely(ring->trace_irq_seqno &&
@@ -2656,6 +2651,15 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 		ring->trace_irq_seqno = 0;
 	}
 
+	/* Finish processing active list before freeing request */
+	while (!list_empty(&deferred_request_free)) {
+		req = list_first_entry(&deferred_request_free,
+	                               struct drm_i915_gem_request,
+	                               list);
+
+		i915_gem_free_request(req);
+	}
+
 	WARN_ON(i915_verify_lists(ring->dev));
 }
 
-- 
1.7.9.5

* [RFC 11/44] drm/i915: Added scheduler hook into i915_seqno_passed()
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (9 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 10/44] drm/i915: Prepare retire_requests to handle out-of-order seqnos John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-07-02 18:14   ` Jesse Barnes
  2014-06-26 17:24 ` [RFC 12/44] drm/i915: Disable hardware semaphores when GPU scheduler is enabled John.C.Harrison
                   ` (34 subsequent siblings)
  45 siblings, 1 reply; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The GPU scheduler can cause seqno values to become out of order. This means that
a straightforward 'is seqno X > seqno Y' test is no longer valid. Instead, a
call into the scheduler must be made to see if the value being queried is known
to be out of order.
---
 drivers/gpu/drm/i915/i915_drv.h       |   23 ++++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_gem.c       |   14 +++++++-------
 drivers/gpu/drm/i915/i915_irq.c       |    4 ++--
 drivers/gpu/drm/i915/i915_scheduler.c |   20 ++++++++++++++++++++
 drivers/gpu/drm/i915/i915_scheduler.h |    3 +++
 5 files changed, 54 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 6e592d3..0977653 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2214,14 +2214,35 @@ int i915_gem_dumb_create(struct drm_file *file_priv,
 			 struct drm_mode_create_dumb *args);
 int i915_gem_mmap_gtt(struct drm_file *file_priv, struct drm_device *dev,
 		      uint32_t handle, uint64_t *offset);
+
+bool i915_scheduler_is_seqno_in_flight(struct intel_engine_cs *ring,
+				uint32_t seqno, bool *completed);
+
 /**
  * Returns true if seq1 is later than seq2.
  */
 static inline bool
-i915_seqno_passed(uint32_t seq1, uint32_t seq2)
+i915_seqno_passed(struct intel_engine_cs *ring, uint32_t seq1, uint32_t seq2)
 {
+#ifdef CONFIG_DRM_I915_SCHEDULER
+	bool    completed;
+
+	if (i915_scheduler_is_seqno_in_flight(ring, seq2, &completed))
+		return completed;
+#endif
+
 	return (int32_t)(seq1 - seq2) >= 0;
 }
+static inline int32_t
+i915_compare_seqno_values(uint32_t seq1, uint32_t seq2)
+{
+	int32_t	diff = seq1 - seq2;
+
+	if (!diff)
+		return 0;
+
+	return (diff > 0) ? 1 : -1;
+}
 
 int __must_check i915_gem_get_seqno(struct drm_device *dev, u32 *seqno);
 int __must_check i915_gem_set_seqno(struct drm_device *dev, u32 seqno);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 7e53446..fece5e7 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1165,7 +1165,7 @@ static int __wait_seqno(struct intel_engine_cs *ring, u32 seqno,
 
 	WARN(dev_priv->pm.irqs_disabled, "IRQs disabled\n");
 
-	if (i915_seqno_passed(ring->get_seqno(ring, true), seqno))
+	if (i915_seqno_passed(ring, ring->get_seqno(ring, true), seqno))
 		return 0;
 
 	timeout_expire = timeout ? jiffies + timespec_to_jiffies_timeout(timeout) : 0;
@@ -1201,7 +1201,7 @@ static int __wait_seqno(struct intel_engine_cs *ring, u32 seqno,
 			break;
 		}
 
-		if (i915_seqno_passed(ring->get_seqno(ring, false), seqno)) {
+		if (i915_seqno_passed(ring, ring->get_seqno(ring, false), seqno)) {
 			ret = 0;
 			break;
 		}
@@ -2243,7 +2243,7 @@ i915_gem_object_retire(struct drm_i915_gem_object *obj)
 	if (ring == NULL)
 		return;
 
-	if (i915_seqno_passed(ring->get_seqno(ring, true),
+	if (i915_seqno_passed(ring, ring->get_seqno(ring, true),
 			      obj->last_read_seqno))
 		i915_gem_object_move_to_inactive(obj);
 }
@@ -2489,7 +2489,7 @@ i915_gem_find_active_request(struct intel_engine_cs *ring)
 	completed_seqno = ring->get_seqno(ring, false);
 
 	list_for_each_entry(request, &ring->request_list, list) {
-		if (i915_seqno_passed(completed_seqno, request->seqno))
+		if (i915_seqno_passed(ring, completed_seqno, request->seqno))
 			continue;
 
 		return request;
@@ -2620,7 +2620,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 	 */
 
 	list_for_each_entry_safe(req, req_next, &ring->request_list, list) {
-		if (!i915_seqno_passed(seqno, req->seqno))
+		if (!i915_seqno_passed(ring, seqno, req->seqno))
 			continue;
 
 		trace_i915_gem_request_retire(ring, req->seqno);
@@ -2639,14 +2639,14 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 	 * before we free the context associated with the requests.
 	 */
 	list_for_each_entry_safe(obj, obj_next, &ring->active_list, ring_list) {
-		if (!i915_seqno_passed(seqno, obj->last_read_seqno))
+		if (!i915_seqno_passed(ring, seqno, obj->last_read_seqno))
 			continue;
 
 		i915_gem_object_move_to_inactive(obj);
 	}
 
 	if (unlikely(ring->trace_irq_seqno &&
-		     i915_seqno_passed(seqno, ring->trace_irq_seqno))) {
+		     i915_seqno_passed(ring, seqno, ring->trace_irq_seqno))) {
 		ring->irq_put(ring);
 		ring->trace_irq_seqno = 0;
 	}
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 00358f9..eff08a3e 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2750,7 +2750,7 @@ static bool
 ring_idle(struct intel_engine_cs *ring, u32 seqno)
 {
 	return (list_empty(&ring->request_list) ||
-		i915_seqno_passed(seqno, ring_last_seqno(ring)));
+		i915_seqno_passed(ring, seqno, ring_last_seqno(ring)));
 }
 
 static bool
@@ -2862,7 +2862,7 @@ static int semaphore_passed(struct intel_engine_cs *ring)
 	if (ctl & RING_WAIT_SEMAPHORE && semaphore_passed(signaller) < 0)
 		return -1;
 
-	return i915_seqno_passed(signaller->get_seqno(signaller, false), seqno);
+	return i915_seqno_passed(ring, signaller->get_seqno(signaller, false), seqno);
 }
 
 static void semaphore_clear_deadlocks(struct drm_i915_private *dev_priv)
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 9ec0225..e9aa566 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -49,6 +49,26 @@ int i915_scheduler_init(struct drm_device *dev)
 	return 0;
 }
 
+bool i915_scheduler_is_seqno_in_flight(struct intel_engine_cs *ring,
+			       uint32_t seqno, bool *completed)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+	bool                    found = false;
+	unsigned long           flags;
+
+	if (!scheduler)
+		return false;
+
+	spin_lock_irqsave(&scheduler->lock, flags);
+
+	/* Do stuff... */
+
+	spin_unlock_irqrestore(&scheduler->lock, flags);
+
+	return found;
+}
+
 #else   /* CONFIG_DRM_I915_SCHEDULER */
 
 int i915_scheduler_init(struct drm_device *dev)
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index bbe1934..67260b7 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -35,6 +35,9 @@ struct i915_scheduler {
 	uint32_t    index;
 };
 
+bool        i915_scheduler_is_seqno_in_flight(struct intel_engine_cs *ring,
+					      uint32_t seqno, bool *completed);
+
 #endif  /* CONFIG_DRM_I915_SCHEDULER */
 
 #endif  /* _I915_SCHEDULER_H_ */
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC 12/44] drm/i915: Disable hardware semaphores when GPU scheduler is enabled
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (10 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 11/44] drm/i915: Added scheduler hook into i915_seqno_passed() John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-07-02 18:16   ` Jesse Barnes
  2014-06-26 17:24 ` [RFC 13/44] drm/i915: Added scheduler hook when closing DRM file handles John.C.Harrison
                   ` (33 subsequent siblings)
  45 siblings, 1 reply; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

Hardware semaphores require seqno values to be monotonically incrementing.
However, the scheduler's reordering of batch buffers means that the seqno values
going through the hardware could be out of order. Thus semaphores cannot be
used.

On the other hand, the scheduler supersedes the need for hardware semaphores
anyway. Having one ring stall waiting for something to complete on another ring
is inefficient if that ring could be working on some other, independent task.
This is what the scheduler is meant to do - keep the hardware as busy as
possible by reordering batch buffers to avoid dependency stalls.
---
 drivers/gpu/drm/i915/i915_drv.c         |    9 +++++++++
 drivers/gpu/drm/i915/i915_scheduler.c   |    9 +++++++++
 drivers/gpu/drm/i915/i915_scheduler.h   |    1 +
 drivers/gpu/drm/i915/intel_ringbuffer.c |    4 ++++
 4 files changed, 23 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index e2bfdda..748b13a 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -33,6 +33,7 @@
 #include "i915_drv.h"
 #include "i915_trace.h"
 #include "intel_drv.h"
+#include "i915_scheduler.h"
 
 #include <linux/console.h>
 #include <linux/module.h>
@@ -468,6 +469,14 @@ void intel_detect_pch(struct drm_device *dev)
 
 bool i915_semaphore_is_enabled(struct drm_device *dev)
 {
+	/* Hardware semaphores are not compatible with the scheduler due to the
+	 * seqno values being potentially out of order. However, semaphores are
+	 * also not required as the scheduler will handle inter-ring dependencies
+	 * and try to do so in a way that does not cause dead time on the hardware.
+	 */
+	if (i915_scheduler_is_enabled(dev))
+		return false;
+
 	if (INTEL_INFO(dev)->gen < 6)
 		return false;
 
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index e9aa566..d9c1879 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -26,6 +26,15 @@
 #include "intel_drv.h"
 #include "i915_scheduler.h"
 
+bool i915_scheduler_is_enabled(struct drm_device *dev)
+{
+#ifdef CONFIG_DRM_I915_SCHEDULER
+	return true;
+#else
+	return false;
+#endif
+}
+
 #ifdef CONFIG_DRM_I915_SCHEDULER
 
 int i915_scheduler_init(struct drm_device *dev)
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 67260b7..4044b6e 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -25,6 +25,7 @@
 #ifndef _I915_SCHEDULER_H_
 #define _I915_SCHEDULER_H_
 
+bool        i915_scheduler_is_enabled(struct drm_device *dev);
 int         i915_scheduler_init(struct drm_device *dev);
 
 #ifdef CONFIG_DRM_I915_SCHEDULER
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index bad5db0..34d6d6e 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -32,6 +32,7 @@
 #include <drm/i915_drm.h>
 #include "i915_trace.h"
 #include "intel_drv.h"
+#include "i915_scheduler.h"
 
 /* Early gen2 devices have a cacheline of just 32 bytes, using 64 is overkill,
  * but keeps the logic simple. Indeed, the whole purpose of this macro is just
@@ -765,6 +766,9 @@ gen6_ring_sync(struct intel_engine_cs *waiter,
 	u32 wait_mbox = signaller->semaphore.mbox.wait[waiter->id];
 	int ret;
 
+	/* Arithmetic on sequence numbers is unreliable with a scheduler. */
+	BUG_ON(i915_scheduler_is_enabled(signaller->dev));
+
 	/* Throughout all of the GEM code, seqno passed implies our current
 	 * seqno is >= the last seqno executed. However for hardware the
 	 * comparison is strictly greater than.
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC 13/44] drm/i915: Added scheduler hook when closing DRM file handles
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (11 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 12/44] drm/i915: Disable hardware semaphores when GPU scheduler is enabled John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-07-02 18:20   ` Jesse Barnes
  2014-06-26 17:24 ` [RFC 14/44] drm/i915: Added getparam for GPU scheduler John.C.Harrison
                   ` (32 subsequent siblings)
  45 siblings, 1 reply; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The scheduler decouples the submission of batch buffers to the driver from their
submission to the hardware. Thus it is possible for an
application to submit work, then close the DRM handle and free up all the
resources that piece of work wishes to use before the work has even been
submitted to the hardware. To prevent this, the scheduler needs to be informed
of the DRM close event so that it can force through any outstanding work
attributed to that file handle.
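
As an illustration of the intent only (the real implementation arrives later in
the series), the hook could walk the scheduler's queue and make sure nothing
still references the closing file. The 'node_queue'/'link' list and the
per-node 'file' pointer are assumed names, not defined by this patch:

int i915_scheduler_closefile(struct drm_device *dev, struct drm_file *file)
{
	struct drm_i915_private           *dev_priv  = dev->dev_private;
	struct i915_scheduler             *scheduler = dev_priv->scheduler;
	struct i915_scheduler_queue_entry *node;
	struct intel_engine_cs            *ring;
	int                               i;

	if (!scheduler)
		return 0;

	for_each_ring(ring, dev_priv, i) {
		list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
			if (node->params.file != file)
				continue;

			/* Nothing may touch the file once the close has
			 * completed, so either flush this node through to
			 * the hardware now or drop the pointer. */
			node->params.file = NULL;
		}
	}

	return 0;
}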
---
 drivers/gpu/drm/i915/i915_dma.c       |    3 +++
 drivers/gpu/drm/i915/i915_scheduler.c |   18 ++++++++++++++++++
 drivers/gpu/drm/i915/i915_scheduler.h |    2 ++
 3 files changed, 23 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 494b156..6c9ce82 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -42,6 +42,7 @@
 #include <linux/vga_switcheroo.h>
 #include <linux/slab.h>
 #include <acpi/video.h>
+#include "i915_scheduler.h"
 #include <linux/pm.h>
 #include <linux/pm_runtime.h>
 #include <linux/oom.h>
@@ -1930,6 +1931,8 @@ void i915_driver_lastclose(struct drm_device * dev)
 
 void i915_driver_preclose(struct drm_device *dev, struct drm_file *file)
 {
+	i915_scheduler_closefile(dev, file);
+
 	mutex_lock(&dev->struct_mutex);
 	i915_gem_context_close(dev, file);
 	i915_gem_release(dev, file);
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index d9c1879..66a6568 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -78,6 +78,19 @@ bool i915_scheduler_is_seqno_in_flight(struct intel_engine_cs *ring,
 	return found;
 }
 
+int i915_scheduler_closefile(struct drm_device *dev, struct drm_file *file)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+
+	if (!scheduler)
+		return 0;
+
+	/* Do stuff... */
+
+	return 0;
+}
+
 #else   /* CONFIG_DRM_I915_SCHEDULER */
 
 int i915_scheduler_init(struct drm_device *dev)
@@ -85,4 +98,9 @@ int i915_scheduler_init(struct drm_device *dev)
 	return 0;
 }
 
+int i915_scheduler_closefile(struct drm_device *dev, struct drm_file *file)
+{
+	return 0;
+}
+
 #endif  /* CONFIG_DRM_I915_SCHEDULER */
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 4044b6e..95641f6 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -27,6 +27,8 @@
 
 bool        i915_scheduler_is_enabled(struct drm_device *dev);
 int         i915_scheduler_init(struct drm_device *dev);
+int         i915_scheduler_closefile(struct drm_device *dev,
+				     struct drm_file *file);
 
 #ifdef CONFIG_DRM_I915_SCHEDULER
 
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC 14/44] drm/i915: Added getparam for GPU scheduler
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (12 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 13/44] drm/i915: Added scheduler hook when closing DRM file handles John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-07-02 18:21   ` Jesse Barnes
  2014-06-26 17:24 ` [RFC 15/44] drm/i915: Added deferred work handler for scheduler John.C.Harrison
                   ` (31 subsequent siblings)
  45 siblings, 1 reply; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

This is required by user land validation programs that need to know whether the
scheduler is available for testing or not.
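
For example, a test could probe the new parameter from user land along these
lines (a libdrm-based sketch with error handling trimmed; 'fd' is assumed to be
an open i915 DRM device node):

#include <xf86drm.h>
#include <i915_drm.h>

/* Returns non-zero if the kernel reports the GPU scheduler as present. */
static int has_gpu_scheduler(int fd)
{
	drm_i915_getparam_t gp;
	int value = 0;

	gp.param = I915_PARAM_HAS_GPU_SCHEDULER;
	gp.value = &value;

	if (drmIoctl(fd, DRM_IOCTL_I915_GETPARAM, &gp))
		return 0;	/* older kernel: parameter not recognised */

	return value;
}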
---
 drivers/gpu/drm/i915/i915_dma.c |    3 +++
 include/uapi/drm/i915_drm.h     |    1 +
 2 files changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 6c9ce82..1668316 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1035,6 +1035,9 @@ static int i915_getparam(struct drm_device *dev, void *data,
 		value = 0;
 #endif
 		break;
+	case I915_PARAM_HAS_GPU_SCHEDULER:
+		value = i915_scheduler_is_enabled(dev);
+		break;
 	default:
 		DRM_DEBUG("Unknown parameter %d\n", param->param);
 		return -EINVAL;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index bf54c78..de6f603 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -341,6 +341,7 @@ typedef struct drm_i915_irq_wait {
 #define I915_PARAM_HAS_WT     	 	 27
 #define I915_PARAM_CMD_PARSER_VERSION	 28
 #define I915_PARAM_HAS_NATIVE_SYNC	 30
+#define I915_PARAM_HAS_GPU_SCHEDULER	 31
 
 typedef struct drm_i915_getparam {
 	int param;
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC 15/44] drm/i915: Added deferred work handler for scheduler
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (13 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 14/44] drm/i915: Added getparam for GPU scheduler John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-07-07 19:14   ` Daniel Vetter
  2014-06-26 17:24 ` [RFC 16/44] drm/i915: Alloc early seqno John.C.Harrison
                   ` (30 subsequent siblings)
  45 siblings, 1 reply; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The scheduler needs to do interrupt-triggered work that is too complex to do in
the interrupt handler. Thus it requires a deferred work handler to process this
work asynchronously.
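
To make the split of responsibilities concrete, below is a sketch of the kind
of clean-up the deferred handler is expected to grow. The node list, its
'status' field and the exact references released are placeholders for later
patches in the series; this patch only adds the work queue plumbing:

int i915_scheduler_remove(struct intel_engine_cs *ring)
{
	struct drm_i915_private           *dev_priv  = ring->dev->dev_private;
	struct i915_scheduler             *scheduler = dev_priv->scheduler;
	struct i915_scheduler_queue_entry *node, *node_next;

	if (!scheduler)
		return 0;

	/* Called from the work handler with struct_mutex held, so the
	 * heavyweight clean-up is safe here but not in the IRQ handler. */
	list_for_each_entry_safe(node, node_next,
				 &scheduler->node_queue[ring->id], link) {
		if (node->status != i915_sqs_complete)
			continue;

		/* Release whatever references were taken at queue time. */
		i915_gem_context_unreference(node->params.ctx);
		kfree(node->params.cliprects);

		list_del(&node->link);
		kfree(node);
	}

	return 0;
}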
---
 drivers/gpu/drm/i915/i915_dma.c       |    3 +++
 drivers/gpu/drm/i915/i915_drv.h       |   10 ++++++++++
 drivers/gpu/drm/i915/i915_gem.c       |   27 +++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_scheduler.c |    7 +++++++
 drivers/gpu/drm/i915/i915_scheduler.h |    1 +
 5 files changed, 48 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 1668316..d1356f3 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1813,6 +1813,9 @@ int i915_driver_unload(struct drm_device *dev)
 	WARN_ON(unregister_oom_notifier(&dev_priv->mm.oom_notifier));
 	unregister_shrinker(&dev_priv->mm.shrinker);
 
+	/* Cancel the scheduler work handler, which should be idle now. */
+	cancel_work_sync(&dev_priv->mm.scheduler_work);
+
 	io_mapping_free(dev_priv->gtt.mappable);
 	arch_phys_wc_del(dev_priv->gtt.mtrr);
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 0977653..fbafa68 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1075,6 +1075,16 @@ struct i915_gem_mm {
 	struct delayed_work idle_work;
 
 	/**
+	 * New scheme is to get an interrupt after every work packet
+	 * in order to allow the low latency scheduling of pending
+	 * packets. The idea behind adding new packets to a pending
+	 * queue rather than directly into the hardware ring buffer
+	 * is to allow high priority packets to overtake low priority
+	 * ones.
+	 */
+	struct work_struct scheduler_work;
+
+	/**
 	 * Are we in a non-interruptible section of code like
 	 * modesetting?
 	 */
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index fece5e7..57b24f0 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2712,6 +2712,29 @@ i915_gem_idle_work_handler(struct work_struct *work)
 	intel_mark_idle(dev_priv->dev);
 }
 
+#ifdef CONFIG_DRM_I915_SCHEDULER
+static void
+i915_gem_scheduler_work_handler(struct work_struct *work)
+{
+	struct intel_engine_cs  *ring;
+	struct drm_i915_private *dev_priv;
+	struct drm_device       *dev;
+	int                     i;
+
+	dev_priv = container_of(work, struct drm_i915_private, mm.scheduler_work);
+	dev = dev_priv->dev;
+
+	mutex_lock(&dev->struct_mutex);
+
+	/* Do stuff: */
+	for_each_ring(ring, dev_priv, i) {
+		i915_scheduler_remove(ring);
+	}
+
+	mutex_unlock(&dev->struct_mutex);
+}
+#endif
+
 /**
  * Ensures that an object will eventually get non-busy by flushing any required
  * write domains, emitting any outstanding lazy request and retiring and
@@ -4916,6 +4939,10 @@ i915_gem_load(struct drm_device *dev)
 			  i915_gem_retire_work_handler);
 	INIT_DELAYED_WORK(&dev_priv->mm.idle_work,
 			  i915_gem_idle_work_handler);
+#ifdef CONFIG_DRM_I915_SCHEDULER
+	INIT_WORK(&dev_priv->mm.scheduler_work,
+				i915_gem_scheduler_work_handler);
+#endif
 	init_waitqueue_head(&dev_priv->gpu_error.reset_queue);
 
 	/* On GEN3 we really need to make sure the ARB C3 LP bit is set */
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 66a6568..37f8a98 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -58,6 +58,13 @@ int i915_scheduler_init(struct drm_device *dev)
 	return 0;
 }
 
+int i915_scheduler_remove(struct intel_engine_cs *ring)
+{
+	/* Do stuff... */
+
+	return 0;
+}
+
 bool i915_scheduler_is_seqno_in_flight(struct intel_engine_cs *ring,
 			       uint32_t seqno, bool *completed)
 {
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 95641f6..6b2cc51 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -38,6 +38,7 @@ struct i915_scheduler {
 	uint32_t    index;
 };
 
+int         i915_scheduler_remove(struct intel_engine_cs *ring);
 bool        i915_scheduler_is_seqno_in_flight(struct intel_engine_cs *ring,
 					      uint32_t seqno, bool *completed);
 
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC 16/44] drm/i915: Alloc early seqno
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (14 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 15/44] drm/i915: Added deferred work handler for scheduler John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-07-02 18:29   ` Jesse Barnes
  2014-06-26 17:24 ` [RFC 17/44] drm/i915: Prelude to splitting i915_gem_do_execbuffer in two John.C.Harrison
                   ` (29 subsequent siblings)
  45 siblings, 1 reply; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The scheduler needs to explicitly allocate a seqno to track each submitted batch
buffer. This must happen a long time before any commands are actually written to
the ring.
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |    5 +++++
 drivers/gpu/drm/i915/intel_ringbuffer.c    |    2 +-
 drivers/gpu/drm/i915/intel_ringbuffer.h    |    1 +
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index ee836a6..ec274ef 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1317,6 +1317,11 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		vma->bind_vma(vma, batch_obj->cache_level, GLOBAL_BIND);
 	}
 
+	/* Allocate a seqno for this batch buffer nice and early. */
+	ret = intel_ring_alloc_seqno(ring);
+	if (ret)
+		goto err;
+
 	if (flags & I915_DISPATCH_SECURE)
 		exec_start += i915_gem_obj_ggtt_offset(batch_obj);
 	else
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 34d6d6e..737c41b 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1662,7 +1662,7 @@ int intel_ring_idle(struct intel_engine_cs *ring)
 	return i915_wait_seqno(ring, seqno);
 }
 
-static int
+int
 intel_ring_alloc_seqno(struct intel_engine_cs *ring)
 {
 	if (ring->outstanding_lazy_seqno)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 30841ea..cc92de2 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -347,6 +347,7 @@ void intel_cleanup_ring_buffer(struct intel_engine_cs *ring);
 
 int __must_check intel_ring_begin(struct intel_engine_cs *ring, int n);
 int __must_check intel_ring_cacheline_align(struct intel_engine_cs *ring);
+int __must_check intel_ring_alloc_seqno(struct intel_engine_cs *ring);
 static inline void intel_ring_emit(struct intel_engine_cs *ring,
 				   u32 data)
 {
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC 17/44] drm/i915: Prelude to splitting i915_gem_do_execbuffer in two
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (15 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 16/44] drm/i915: Alloc early seqno John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-07-02 18:34   ` Jesse Barnes
  2014-06-26 17:24 ` [RFC 18/44] drm/i915: Added scheduler debug macro John.C.Harrison
                   ` (28 subsequent siblings)
  45 siblings, 1 reply; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The scheduler decouples the submission of batch buffers to the driver from their
submission to the hardware. This basically means splitting the execbuffer()
function in half. This change rearranges some code ready for the split to occur.
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index ec274ef..fda9187 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -32,6 +32,7 @@
 #include "i915_trace.h"
 #include "intel_drv.h"
 #include <linux/dma_remapping.h>
+#include "i915_scheduler.h"
 
 #define  __EXEC_OBJECT_HAS_PIN (1<<31)
 #define  __EXEC_OBJECT_HAS_FENCE (1<<30)
@@ -874,10 +875,7 @@ i915_gem_execbuffer_move_to_gpu(struct intel_engine_cs *ring,
 	if (flush_domains & I915_GEM_DOMAIN_GTT)
 		wmb();
 
-	/* Unconditionally invalidate gpu caches and ensure that we do flush
-	 * any residual writes from the previous batch.
-	 */
-	return intel_ring_invalidate_all_caches(ring);
+	return 0;
 }
 
 static bool
@@ -1219,8 +1217,6 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		}
 	}
 
-	intel_runtime_pm_get(dev_priv);
-
 	ret = i915_mutex_lock_interruptible(dev);
 	if (ret)
 		goto pre_mutex_err;
@@ -1331,6 +1327,20 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	if (ret)
 		goto err;
 
+	i915_gem_execbuffer_move_to_active(&eb->vmas, ring);
+
+	/* To be split into two functions here... */
+
+	intel_runtime_pm_get(dev_priv);
+
+	/* Unconditionally invalidate gpu caches and ensure that we do flush
+	 * any residual writes from the previous batch.
+	 */
+	ret = intel_ring_invalidate_all_caches(ring);
+	if (ret)
+		goto err;
+
+	/* Switch to the correct context for the batch */
 	ret = i915_switch_context(ring, ctx);
 	if (ret)
 		goto err;
@@ -1381,7 +1391,6 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 
 	trace_i915_gem_ring_dispatch(ring, intel_ring_get_seqno(ring), flags);
 
-	i915_gem_execbuffer_move_to_active(&eb->vmas, ring);
 	i915_gem_execbuffer_retire_commands(dev, file, ring, batch_obj);
 
 err:
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC 18/44] drm/i915: Added scheduler debug macro
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (16 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 17/44] drm/i915: Prelude to splitting i915_gem_do_execbuffer in two John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-07-02 18:37   ` Jesse Barnes
  2014-06-26 17:24 ` [RFC 19/44] drm/i915: Split i915_dem_do_execbuffer() in half John.C.Harrison
                   ` (27 subsequent siblings)
  45 siblings, 1 reply; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

Added a DRM debug facility for use by the scheduler.
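
Usage mirrors the existing DRM_DEBUG_* macros; the arguments below are purely
illustrative:

	DRM_DEBUG_SCHED("queued batch buffer, seqno = %d, ring = %s\n",
			seqno, ring->name);

The output only appears when the new DRM_UT_SCHED bit (0x40) is set in the
drm.debug module parameter, e.g. by booting with drm.debug=0x40 or writing
0x40 to /sys/module/drm/parameters/debug.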
---
 include/drm/drmP.h |    7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/include/drm/drmP.h b/include/drm/drmP.h
index 76ccaab..2f477c9 100644
--- a/include/drm/drmP.h
+++ b/include/drm/drmP.h
@@ -120,6 +120,7 @@ struct videomode;
 #define DRM_UT_DRIVER		0x02
 #define DRM_UT_KMS		0x04
 #define DRM_UT_PRIME		0x08
+#define DRM_UT_SCHED		0x40
 
 extern __printf(2, 3)
 void drm_ut_debug_printk(const char *function_name,
@@ -221,10 +222,16 @@ int drm_err(const char *func, const char *format, ...);
 		if (unlikely(drm_debug & DRM_UT_PRIME))			\
 			drm_ut_debug_printk(__func__, fmt, ##args);	\
 	} while (0)
+#define DRM_DEBUG_SCHED(fmt, args...)					\
+	do {								\
+		if (unlikely(drm_debug & DRM_UT_SCHED))			\
+			drm_ut_debug_printk(__func__, fmt, ##args);	\
+	} while (0)
 #else
 #define DRM_DEBUG_DRIVER(fmt, args...) do { } while (0)
 #define DRM_DEBUG_KMS(fmt, args...)	do { } while (0)
 #define DRM_DEBUG_PRIME(fmt, args...)	do { } while (0)
+#define DRM_DEBUG_SCHED(fmt, args...)	do { } while (0)
 #define DRM_DEBUG(fmt, arg...)		 do { } while (0)
 #endif
 
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC 19/44] drm/i915: Split i915_dem_do_execbuffer() in half
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (17 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 18/44] drm/i915: Added scheduler debug macro John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-06-26 17:24 ` [RFC 20/44] drm/i915: Redirect execbuffer_final() via scheduler John.C.Harrison
                   ` (26 subsequent siblings)
  45 siblings, 0 replies; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

Split the execbuffer() function in half. The first half collects and validates
all the information required to process the batch buffer. It also does all the
object pinning, relocations, active list management, etc - basically anything
that must be done upfront before the IOCTL returns and allows the user land side
to start changing/freeing things. The second half does the actual ring
submission.

This change implements the split but leaves the back half being called directly
from the end of the front half.
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  125 +++++++++++++++++++++-------
 drivers/gpu/drm/i915/i915_scheduler.h      |   25 ++++++
 2 files changed, 121 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index fda9187..334e8c6 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1090,10 +1090,10 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	struct intel_context *ctx;
 	struct i915_address_space *vm;
 	const u32 ctx_id = i915_execbuffer2_get_context_id(*args);
-	u64 exec_start = args->batch_start_offset, exec_len;
 	u32 mask, flags;
-	int ret, mode, i;
+	int ret, mode;
 	bool need_relocs;
+	struct i915_scheduler_queue_entry qe;
 
 	if (!i915_gem_check_execbuffer(args))
 		return -EINVAL;
@@ -1240,6 +1240,8 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	if (!USES_FULL_PPGTT(dev))
 		vm = &dev_priv->gtt.base;
 
+	memset(&qe, 0x00, sizeof(qe));
+
 	eb = eb_create(args);
 	if (eb == NULL) {
 		i915_gem_context_unreference(ctx);
@@ -1318,10 +1320,27 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	if (ret)
 		goto err;
 
+	/* Save assorted stuff away to pass through to execbuffer_final() */
+	qe.params.dev                     = dev;
+	qe.params.file                    = file;
+	qe.params.ring                    = ring;
+	qe.params.eb_flags                = flags;
+	qe.params.args_flags              = args->flags;
+	qe.params.args_batch_start_offset = args->batch_start_offset;
+	qe.params.args_batch_len          = args->batch_len;
+	qe.params.args_num_cliprects      = args->num_cliprects;
+	qe.params.args_DR1                = args->DR1;
+	qe.params.args_DR4                = args->DR4;
+	qe.params.batch_obj               = batch_obj;
+	qe.params.cliprects               = cliprects;
+	qe.params.ctx                     = ctx;
+	qe.params.mask                    = mask;
+	qe.params.mode                    = mode;
+
 	if (flags & I915_DISPATCH_SECURE)
-		exec_start += i915_gem_obj_ggtt_offset(batch_obj);
+		qe.params.batch_obj_vm_offset = i915_gem_obj_ggtt_offset(batch_obj);
 	else
-		exec_start += i915_gem_obj_offset(batch_obj, vm);
+		qe.params.batch_obj_vm_offset = i915_gem_obj_offset(batch_obj, vm);
 
 	ret = i915_gem_execbuffer_move_to_gpu(ring, &eb->vmas);
 	if (ret)
@@ -1329,7 +1348,58 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 
 	i915_gem_execbuffer_move_to_active(&eb->vmas, ring);
 
-	/* To be split into two functions here... */
+	ret = i915_gem_do_execbuffer_final(&qe.params);
+	if (ret)
+		goto err;
+
+	/* Free everything that was stored in the QE structure (until the
+	 * scheduler arrives and does it instead): */
+	kfree(qe.params.cliprects);
+
+	/* The eb list is no longer required. The scheduler has extracted all
+	 * the information that needs to persist. */
+	eb_destroy(eb);
+
+	/*
+	 * Don't clean up everything that is now saved away in the queue.
+	 * Just unlock and return immediately.
+	 */
+	mutex_unlock(&dev->struct_mutex);
+
+	return ret;
+
+err:
+	/* the request owns the ref now */
+	i915_gem_context_unreference(ctx);
+
+	eb_destroy(eb);
+
+	mutex_unlock(&dev->struct_mutex);
+
+pre_mutex_err:
+	kfree(cliprects);
+
+	return ret;
+}
+
+/*
+ * This is the main function for adding a batch to the ring.
+ * It is called from the scheduler, with the struct_mutex already held.
+ */
+int i915_gem_do_execbuffer_final(struct i915_execbuffer_params *params)
+{
+	struct drm_i915_private *dev_priv = params->dev->dev_private;
+	struct intel_engine_cs  *ring = params->ring;
+	u64 exec_start, exec_len;
+	int ret, i;
+
+	/* The mutex must be acquired before calling this function */
+	BUG_ON(!mutex_is_locked(&params->dev->struct_mutex));
+
+	if (dev_priv->ums.mm_suspended) {
+		ret = -EBUSY;
+		goto early_err;
+	}
 
 	intel_runtime_pm_get(dev_priv);
 
@@ -1341,12 +1411,12 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		goto err;
 
 	/* Switch to the correct context for the batch */
-	ret = i915_switch_context(ring, ctx);
+	ret = i915_switch_context(ring, params->ctx);
 	if (ret)
 		goto err;
 
 	if (ring == &dev_priv->ring[RCS] &&
-	    mode != dev_priv->relative_constants_mode) {
+	    params->mode != dev_priv->relative_constants_mode) {
 		ret = intel_ring_begin(ring, 4);
 		if (ret)
 				goto err;
@@ -1354,58 +1424,55 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		intel_ring_emit(ring, MI_NOOP);
 		intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
 		intel_ring_emit(ring, INSTPM);
-		intel_ring_emit(ring, mask << 16 | mode);
+		intel_ring_emit(ring, params->mask << 16 | params->mode);
 		intel_ring_advance(ring);
 
-		dev_priv->relative_constants_mode = mode;
+		dev_priv->relative_constants_mode = params->mode;
 	}
 
-	if (args->flags & I915_EXEC_GEN7_SOL_RESET) {
-		ret = i915_reset_gen7_sol_offsets(dev, ring);
+	if (params->args_flags & I915_EXEC_GEN7_SOL_RESET) {
+		ret = i915_reset_gen7_sol_offsets(params->dev, ring);
 		if (ret)
 			goto err;
 	}
 
 
-	exec_len = args->batch_len;
-	if (cliprects) {
-		for (i = 0; i < args->num_cliprects; i++) {
-			ret = i915_emit_box(dev, &cliprects[i],
-					    args->DR1, args->DR4);
+	exec_len   = params->args_batch_len;
+	exec_start = params->batch_obj_vm_offset +
+		     params->args_batch_start_offset;
+
+	if (params->cliprects) {
+		for (i = 0; i < params->args_num_cliprects; i++) {
+			ret = i915_emit_box(params->dev, &params->cliprects[i],
+					    params->args_DR1, params->args_DR4);
 			if (ret)
 				goto err;
 
 			ret = ring->dispatch_execbuffer(ring,
 							exec_start, exec_len,
-							flags);
+							params->eb_flags);
 			if (ret)
 				goto err;
 		}
 	} else {
 		ret = ring->dispatch_execbuffer(ring,
 						exec_start, exec_len,
-						flags);
+						params->eb_flags);
 		if (ret)
 			goto err;
 	}
 
-	trace_i915_gem_ring_dispatch(ring, intel_ring_get_seqno(ring), flags);
+	trace_i915_gem_ring_dispatch(ring, intel_ring_get_seqno(ring), params->eb_flags);
 
-	i915_gem_execbuffer_retire_commands(dev, file, ring, batch_obj);
+	i915_gem_execbuffer_retire_commands(params->dev, params->file, ring,
+					    params->batch_obj);
 
 err:
-	/* the request owns the ref now */
-	i915_gem_context_unreference(ctx);
-	eb_destroy(eb);
-
-	mutex_unlock(&dev->struct_mutex);
-
-pre_mutex_err:
-	kfree(cliprects);
-
 	/* intel_gpu_busy should also get a ref, so it will free when the device
 	 * is really idle. */
 	intel_runtime_pm_put(dev_priv);
+
+early_err:
 	return ret;
 }
 
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 6b2cc51..68a9543 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -25,6 +25,29 @@
 #ifndef _I915_SCHEDULER_H_
 #define _I915_SCHEDULER_H_
 
+struct i915_execbuffer_params {
+	struct drm_device               *dev;
+	struct drm_file                 *file;
+	uint32_t                        eb_flags;
+	uint32_t                        args_flags;
+	uint32_t                        args_batch_start_offset;
+	uint32_t                        args_batch_len;
+	uint32_t                        args_num_cliprects;
+	uint32_t                        args_DR1;
+	uint32_t                        args_DR4;
+	uint32_t                        batch_obj_vm_offset;
+	struct intel_engine_cs          *ring;
+	struct drm_i915_gem_object      *batch_obj;
+	struct drm_clip_rect            *cliprects;
+	uint32_t                        mask;
+	int                             mode;
+	struct intel_context            *ctx;
+};
+
+struct i915_scheduler_queue_entry {
+	struct i915_execbuffer_params       params;
+};
+
 bool        i915_scheduler_is_enabled(struct drm_device *dev);
 int         i915_scheduler_init(struct drm_device *dev);
 int         i915_scheduler_closefile(struct drm_device *dev,
@@ -44,4 +67,6 @@ bool        i915_scheduler_is_seqno_in_flight(struct intel_engine_cs *ring,
 
 #endif  /* CONFIG_DRM_I915_SCHEDULER */
 
+int i915_gem_do_execbuffer_final(struct i915_execbuffer_params *params);
+
 #endif  /* _I915_SCHEDULER_H_ */
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC 20/44] drm/i915: Redirect execbuffer_final() via scheduler
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (18 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 19/44] drm/i915: Split i915_dem_do_execbuffer() in half John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-06-26 17:24 ` [RFC 21/44] drm/i915: Added tracking/locking of batch buffer objects John.C.Harrison
                   ` (25 subsequent siblings)
  45 siblings, 0 replies; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

Updated the execbuffer() code to pass the packaged up batch buffer information
to the scheduler rather than calling execbuffer_final() directly. The scheduler
queue() code is currently a stub which simply chains on to _final() immediately.
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |    6 +-----
 drivers/gpu/drm/i915/i915_scheduler.c      |   23 +++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_scheduler.h      |    2 ++
 3 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 334e8c6..f73c936 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1348,14 +1348,10 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 
 	i915_gem_execbuffer_move_to_active(&eb->vmas, ring);
 
-	ret = i915_gem_do_execbuffer_final(&qe.params);
+	ret = i915_scheduler_queue_execbuffer(&qe);
 	if (ret)
 		goto err;
 
-	/* Free everything that was stored in the QE structure (until the
-	 * scheduler arrives and does it instead): */
-	kfree(qe.params.cliprects);
-
 	/* The eb list is no longer required. The scheduler has extracted all
 	 * the information that needs to persist. */
 	eb_destroy(eb);
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 37f8a98..d95c789 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -58,6 +58,24 @@ int i915_scheduler_init(struct drm_device *dev)
 	return 0;
 }
 
+int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
+{
+	struct drm_i915_private     *dev_priv = qe->params.dev->dev_private;
+	struct i915_scheduler       *scheduler = dev_priv->scheduler;
+	int ret;
+
+	BUG_ON(!scheduler);
+
+	qe->params.scheduler_index = scheduler->index++;
+
+	ret = i915_gem_do_execbuffer_final(&qe->params);
+
+	/* Free everything that is owned by the QE structure: */
+	kfree(qe->params.cliprects);
+
+	return ret;
+}
+
 int i915_scheduler_remove(struct intel_engine_cs *ring)
 {
 	/* Do stuff... */
@@ -110,4 +128,9 @@ int i915_scheduler_closefile(struct drm_device *dev, struct drm_file *file)
 	return 0;
 }
 
+int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
+{
+	return i915_gem_do_execbuffer_final(&qe->params);
+}
+
 #endif  /* CONFIG_DRM_I915_SCHEDULER */
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 68a9543..4c3e081 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -42,6 +42,7 @@ struct i915_execbuffer_params {
 	uint32_t                        mask;
 	int                             mode;
 	struct intel_context            *ctx;
+	uint32_t                        scheduler_index;
 };
 
 struct i915_scheduler_queue_entry {
@@ -52,6 +53,7 @@ bool        i915_scheduler_is_enabled(struct drm_device *dev);
 int         i915_scheduler_init(struct drm_device *dev);
 int         i915_scheduler_closefile(struct drm_device *dev,
 				     struct drm_file *file);
+int         i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe);
 
 #ifdef CONFIG_DRM_I915_SCHEDULER
 
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC 21/44] drm/i915: Added tracking/locking of batch buffer objects
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (19 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 20/44] drm/i915: Redirect execbuffer_final() via scheduler John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-06-26 17:24 ` [RFC 22/44] drm/i915: Ensure OLS & PLR are always in sync John.C.Harrison
                   ` (24 subsequent siblings)
  45 siblings, 0 replies; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The scheduler needs to track interdependencies between batch buffers. These are
calculated by analysing the object lists of the buffers and looking for
commonality. The scheduler also needs to keep those buffers locked long after
the initial IOCTL call has returned to user land.
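
As a rough illustration of how the saved object lists can be used for the
dependency calculation (the helper below is not part of this patch and the real
logic arrives later in the series), two queue entries would be considered
dependent if any GEM object appears in both:

static bool i915_scheduler_entries_overlap(struct i915_scheduler_queue_entry *a,
					   struct i915_scheduler_queue_entry *b)
{
	int i, j;

	/* O(n*m) scan of the two saved object lists; fine for the small
	 * buffer counts seen in practice and purely illustrative here. */
	for (i = 0; i < a->num_objs; i++)
		for (j = 0; j < b->num_objs; j++)
			if (a->saved_objects[i].obj == b->saved_objects[j].obj)
				return true;

	return false;
}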
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   57 +++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_scheduler.c      |   20 +++++++++-
 drivers/gpu/drm/i915/i915_scheduler.h      |    6 +++
 3 files changed, 80 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index f73c936..6bb1fd6 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1094,6 +1094,9 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	int ret, mode;
 	bool need_relocs;
 	struct i915_scheduler_queue_entry qe;
+#ifdef CONFIG_DRM_I915_SCHEDULER
+	int i;
+#endif
 
 	if (!i915_gem_check_execbuffer(args))
 		return -EINVAL;
@@ -1250,6 +1253,16 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		goto pre_mutex_err;
 	}
 
+#ifdef CONFIG_DRM_I915_SCHEDULER
+	qe.saved_objects = kzalloc(
+			sizeof(*qe.saved_objects) * args->buffer_count,
+			GFP_KERNEL);
+	if (!qe.saved_objects) {
+		ret = -ENOMEM;
+		goto err;
+	}
+#endif
+
 	/* Look up object handles */
 	ret = eb_lookup_vmas(eb, exec, args, vm, file);
 	if (ret)
@@ -1333,10 +1346,33 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	qe.params.args_DR4                = args->DR4;
 	qe.params.batch_obj               = batch_obj;
 	qe.params.cliprects               = cliprects;
-	qe.params.ctx                     = ctx;
 	qe.params.mask                    = mask;
 	qe.params.mode                    = mode;
 
+#ifdef CONFIG_DRM_I915_SCHEDULER
+	/*
+	 * Save away the list of objects used by this batch buffer for the
+	 * purpose of tracking inter-buffer dependencies.
+	 */
+	for (i = 0; i < args->buffer_count; i++) {
+		/*
+		 * NB: 'drm_gem_object_lookup()' increments the object's
+		 * reference count and so must be matched by a
+		 * 'drm_gem_object_unreference' call.
+		 */
+		qe.saved_objects[i].obj =
+			to_intel_bo(drm_gem_object_lookup(dev, file,
+							  exec[i].handle));
+	}
+	qe.num_objs = i;
+
+	/* Lock and save the context object as well. */
+	i915_gem_context_reference(ctx);
+	qe.params.ctx = ctx;
+#else  // CONFIG_DRM_I915_SCHEDULER
+	qe.params.ctx = ctx;
+#endif // CONFIG_DRM_I915_SCHEDULER
+
 	if (flags & I915_DISPATCH_SECURE)
 		qe.params.batch_obj_vm_offset = i915_gem_obj_ggtt_offset(batch_obj);
 	else
@@ -1370,6 +1406,25 @@ err:
 
 	eb_destroy(eb);
 
+#ifdef CONFIG_DRM_I915_SCHEDULER
+	if (qe.saved_objects) {
+		/* Need to release the objects: */
+		for (i = 0; i < qe.num_objs; i++) {
+			if (!qe.saved_objects[i].obj)
+				continue;
+
+			drm_gem_object_unreference(
+					&qe.saved_objects[i].obj->base);
+		}
+
+		kfree(qe.saved_objects);
+
+		/* Context too */
+		if (qe.params.ctx)
+			i915_gem_context_unreference(qe.params.ctx);
+	}
+#endif // CONFIG_DRM_I915_SCHEDULER
+
 	mutex_unlock(&dev->struct_mutex);
 
 pre_mutex_err:
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index d95c789..fc165c2 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -62,7 +62,7 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 {
 	struct drm_i915_private     *dev_priv = qe->params.dev->dev_private;
 	struct i915_scheduler       *scheduler = dev_priv->scheduler;
-	int ret;
+	int ret, i;
 
 	BUG_ON(!scheduler);
 
@@ -70,7 +70,23 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 
 	ret = i915_gem_do_execbuffer_final(&qe->params);
 
-	/* Free everything that is owned by the QE structure: */
+	/* Need to release the objects: */
+	for (i = 0; i < qe->num_objs; i++) {
+		if (!qe->saved_objects[i].obj)
+			continue;
+
+		drm_gem_object_unreference(&qe->saved_objects[i].obj->base);
+	}
+
+	kfree(qe->saved_objects);
+	qe->saved_objects = NULL;
+	qe->num_objs = 0;
+
+	/* Free the context object too: */
+	if (qe->params.ctx)
+		i915_gem_context_unreference(qe->params.ctx);
+
+	/* And anything else owned by the QE structure: */
 	kfree(qe->params.cliprects);
 
 	return ret;
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 4c3e081..7c88a26 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -45,8 +45,14 @@ struct i915_execbuffer_params {
 	uint32_t                        scheduler_index;
 };
 
+struct i915_scheduler_obj_entry {
+	struct drm_i915_gem_object          *obj;
+};
+
 struct i915_scheduler_queue_entry {
 	struct i915_execbuffer_params       params;
+	struct i915_scheduler_obj_entry     *saved_objects;
+	int                                 num_objs;
 };
 
 bool        i915_scheduler_is_enabled(struct drm_device *dev);
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC 22/44] drm/i915: Ensure OLS & PLR are always in sync
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (20 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 21/44] drm/i915: Added tracking/locking of batch buffer objects John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-06-26 17:24 ` [RFC 23/44] drm/i915: Added manipulation of OLS/PLR John.C.Harrison
                   ` (23 subsequent siblings)
  45 siblings, 0 replies; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The new seqno allocation code pre-allocates a 'lazy' request structure and then
tries to allocate the 'lazy' seqno. The seqno allocation can potentially wrap
around zero and, when doing so, tries to idle the ring by waiting for all
outstanding work to complete. With a scheduler in place, this can mean first
submitting extra work to the ring. However, at this point in time, the lazy
request is valid but the lazy seqno is not. Some existing code was getting
confused by this state and Bad Things would happen.

The safest solution is to still allocate the lazy request in advance (to avoid
having to roll back in an out-of-memory situation) but to save the pointer in a
local variable rather than immediately updating the lazy pointer. Only after a
valid seqno has been acquired is the lazy request pointer actually updated.

This guarantees that either both lazy values are invalid or both are valid. There
can no longer be an inconsistent state.
---
 drivers/gpu/drm/i915/intel_ringbuffer.c |   27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 737c41b..1ef0cbd 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1665,20 +1665,31 @@ int intel_ring_idle(struct intel_engine_cs *ring)
 int
 intel_ring_alloc_seqno(struct intel_engine_cs *ring)
 {
-	if (ring->outstanding_lazy_seqno)
+	int ret;
+	struct drm_i915_gem_request *request;
+
+	/* NB: Some code seems to test the OLS and other code tests the PLR.
+	 * Therefore it is only safe if the two are kept in step. */
+
+	if (ring->outstanding_lazy_seqno) {
+		BUG_ON(ring->preallocated_lazy_request == NULL);
 		return 0;
+	}
 
-	if (ring->preallocated_lazy_request == NULL) {
-		struct drm_i915_gem_request *request;
+	BUG_ON(ring->preallocated_lazy_request != NULL);
 
-		request = kmalloc(sizeof(*request), GFP_KERNEL);
-		if (request == NULL)
-			return -ENOMEM;
+	request = kmalloc(sizeof(*request), GFP_KERNEL);
+	if (request == NULL)
+		return -ENOMEM;
 
-		ring->preallocated_lazy_request = request;
+	ret = i915_gem_get_seqno(ring->dev, &ring->outstanding_lazy_seqno);
+	if (ret) {
+		kfree(request);
+		return ret;
 	}
 
-	return i915_gem_get_seqno(ring->dev, &ring->outstanding_lazy_seqno);
+	ring->preallocated_lazy_request = request;
+	return 0;
 }
 
 static int __intel_ring_prepare(struct intel_engine_cs *ring,
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC 23/44] drm/i915: Added manipulation of OLS/PLR
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (21 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 22/44] drm/i915: Ensure OLS & PLR are always in sync John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-06-26 17:24 ` [RFC 24/44] drm/i915: Added scheduler interrupt handler hook John.C.Harrison
                   ` (22 subsequent siblings)
  45 siblings, 0 replies; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The scheduler requires each batch buffer to be tagged with the seqno it has been
assigned and for that seqno to only be attached to the given batch buffer. Note
that the seqno assigned to a batch buffer that is being submitted to the
hardware might be very different to the next seqno that would be assigned
automatically on ring submission.

This means manipulating the lazy seqno and request values around batch buffer
submission. At the start of execbuffer() the lazy seqno should be zero; if not,
it means that something has been written to the ring without a request being
added. The lazy seqno also needs to be reset back to zero at the end ready for
the next request to start.

Then, execbuffer_final() needs to manually set the lazy seqno to the batch
buffer's pre-assigned value rather than grabbing the next available value. There
is no need to explicitly clear the lazy seqno at the end of _final() as the
add_request() call within _retire_commands() will do that automatically.
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   68 +++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_scheduler.h      |    2 +
 2 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 6bb1fd6..98cc95e 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1328,10 +1328,22 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		vma->bind_vma(vma, batch_obj->cache_level, GLOBAL_BIND);
 	}
 
+	/* OLS should be zero at this point. If not then this buffer is going
+	 * to be tagged as someone else's work! */
+	BUG_ON(ring->outstanding_lazy_seqno    != 0);
+	BUG_ON(ring->preallocated_lazy_request != NULL);
+
 	/* Allocate a seqno for this batch buffer nice and early. */
 	ret = intel_ring_alloc_seqno(ring);
 	if (ret)
 		goto err;
+	qe.params.seqno   = ring->outstanding_lazy_seqno;
+	qe.params.request = ring->preallocated_lazy_request;
+
+	BUG_ON(ring->outstanding_lazy_seqno    == 0);
+	BUG_ON(ring->outstanding_lazy_seqno    != qe.params.seqno);
+	BUG_ON(ring->preallocated_lazy_request != qe.params.request);
+	BUG_ON(ring->preallocated_lazy_request == NULL);
 
 	/* Save assorted stuff away to pass through to execbuffer_final() */
 	qe.params.dev                     = dev;
@@ -1373,6 +1385,10 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	qe.params.ctx = ctx;
 #endif // CONFIG_DRM_I915_SCHEDULER
 
+	/* OLS should have been set to something useful above */
+	BUG_ON(ring->outstanding_lazy_seqno    != qe.params.seqno);
+	BUG_ON(ring->preallocated_lazy_request != qe.params.request);
+
 	if (flags & I915_DISPATCH_SECURE)
 		qe.params.batch_obj_vm_offset = i915_gem_obj_ggtt_offset(batch_obj);
 	else
@@ -1384,6 +1400,19 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 
 	i915_gem_execbuffer_move_to_active(&eb->vmas, ring);
 
+	/* Make sure the OLS hasn't advanced (which would indicate a flush
+	 * of the work in progress which in turn would be a Bad Thing). */
+	BUG_ON(ring->outstanding_lazy_seqno    != qe.params.seqno);
+	BUG_ON(ring->preallocated_lazy_request != qe.params.request);
+
+	/*
+	 * A new seqno has been assigned to the buffer and saved away for
+	 * future reference. So clear the OLS to ensure that any further
+	 * work is assigned a brand new seqno:
+	 */
+	ring->outstanding_lazy_seqno    = 0;
+	ring->preallocated_lazy_request = NULL;
+
 	ret = i915_scheduler_queue_execbuffer(&qe);
 	if (ret)
 		goto err;
@@ -1425,6 +1454,12 @@ err:
 	}
 #endif // CONFIG_DRM_I915_SCHEDULER
 
+	/* Clear the OLS again in case the failure occurred after it had been
+	 * assigned. */
+	kfree(ring->preallocated_lazy_request);
+	ring->preallocated_lazy_request = NULL;
+	ring->outstanding_lazy_seqno    = 0;
+
 	mutex_unlock(&dev->struct_mutex);
 
 pre_mutex_err:
@@ -1443,6 +1478,7 @@ int i915_gem_do_execbuffer_final(struct i915_execbuffer_params *params)
 	struct intel_engine_cs  *ring = params->ring;
 	u64 exec_start, exec_len;
 	int ret, i;
+	u32 seqno;
 
 	/* The mutex must be acquired before calling this function */
 	BUG_ON(!mutex_is_locked(&params->dev->struct_mutex));
@@ -1454,6 +1490,14 @@ int i915_gem_do_execbuffer_final(struct i915_execbuffer_params *params)
 
 	intel_runtime_pm_get(dev_priv);
 
+	/* Ensure the correct seqno gets assigned to the correct buffer: */
+	BUG_ON(ring->outstanding_lazy_seqno    != 0);
+	BUG_ON(ring->preallocated_lazy_request != NULL);
+	ring->outstanding_lazy_seqno    = params->seqno;
+	ring->preallocated_lazy_request = params->request;
+
+	seqno = params->seqno;
+
 	/* Unconditionally invalidate gpu caches and ensure that we do flush
 	 * any residual writes from the previous batch.
 	 */
@@ -1466,6 +1510,10 @@ int i915_gem_do_execbuffer_final(struct i915_execbuffer_params *params)
 	if (ret)
 		goto err;
 
+	/* Seqno matches? */
+	BUG_ON(seqno != params->seqno);
+	BUG_ON(ring->outstanding_lazy_seqno != params->seqno);
+
 	if (ring == &dev_priv->ring[RCS] &&
 	    params->mode != dev_priv->relative_constants_mode) {
 		ret = intel_ring_begin(ring, 4);
@@ -1487,6 +1535,9 @@ int i915_gem_do_execbuffer_final(struct i915_execbuffer_params *params)
 			goto err;
 	}
 
+	/* Seqno matches? */
+	BUG_ON(ring->outstanding_lazy_seqno    != params->seqno);
+	BUG_ON(ring->preallocated_lazy_request != params->request);
 
 	exec_len   = params->args_batch_len;
 	exec_start = params->batch_obj_vm_offset +
@@ -1513,12 +1564,27 @@ int i915_gem_do_execbuffer_final(struct i915_execbuffer_params *params)
 			goto err;
 	}
 
-	trace_i915_gem_ring_dispatch(ring, intel_ring_get_seqno(ring), params->eb_flags);
+	trace_i915_gem_ring_dispatch(ring, seqno, params->eb_flags);
+
+	/* Seqno matches? */
+	BUG_ON(params->seqno   != ring->outstanding_lazy_seqno);
+	BUG_ON(params->request != ring->preallocated_lazy_request);
 
 	i915_gem_execbuffer_retire_commands(params->dev, params->file, ring,
 					    params->batch_obj);
 
+	/* OLS should be zero by now! */
+	BUG_ON(ring->outstanding_lazy_seqno);
+	BUG_ON(ring->preallocated_lazy_request);
+
 err:
+	if (ret) {
+		/* Reset the OLS ready to try again later. */
+		kfree(ring->preallocated_lazy_request);
+		ring->preallocated_lazy_request = NULL;
+		ring->outstanding_lazy_seqno    = 0;
+	}
+
 	/* intel_gpu_busy should also get a ref, so it will free when the device
 	 * is really idle. */
 	intel_runtime_pm_put(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 7c88a26..e62254a 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -42,6 +42,8 @@ struct i915_execbuffer_params {
 	uint32_t                        mask;
 	int                             mode;
 	struct intel_context            *ctx;
+	int                             seqno;
+	struct drm_i915_gem_request     *request;
 	uint32_t                        scheduler_index;
 };
 
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC 24/44] drm/i915: Added scheduler interrupt handler hook
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (22 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 23/44] drm/i915: Added manipulation of OLS/PLR John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-06-26 17:24 ` [RFC 25/44] drm/i915: Added hook to catch 'unexpected' ring submissions John.C.Harrison
                   ` (21 subsequent siblings)
  45 siblings, 0 replies; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The scheduler needs to be informed of each batch buffer completion. This is done
via the user interrupt mechanism. The epilogue of each batch buffer submission
updates a sequence number value (seqno) and triggers a user interrupt.

This change hooks the scheduler into the processing of that interrupt via the
notify_ring() function. The scheduler also has clean-up work that needs to be
done outside of the interrupt context, so notify_ring() now also pokes the
scheduler's work queue.
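
As a rough standalone sketch (plain C, not driver code) of the split being
described: the interrupt-time hook only records what happened and requests
deferred work, while the worker does the heavier clean-up later. The
fake_notify_ring()/fake_worker() names are purely illustrative.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_uint last_completed_seqno;
static atomic_bool work_pending;

/* "Top half": cheap enough to run from interrupt context. */
static void fake_notify_ring(unsigned int hw_seqno)
{
	atomic_store(&last_completed_seqno, hw_seqno);
	atomic_store(&work_pending, true);	/* stands in for queue_work() */
}

/* Deferred "bottom half": where node removal and freeing would happen. */
static void fake_worker(void)
{
	if (!atomic_exchange(&work_pending, false))
		return;
	printf("retire everything up to seqno %u\n",
	       atomic_load(&last_completed_seqno));
}

int main(void)
{
	fake_notify_ring(42);
	fake_notify_ring(43);	/* two completions, one deferred pass */
	fake_worker();
	return 0;
}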
---
 drivers/gpu/drm/i915/i915_irq.c       |    3 +++
 drivers/gpu/drm/i915/i915_scheduler.c |   16 ++++++++++++++++
 drivers/gpu/drm/i915/i915_scheduler.h |    1 +
 3 files changed, 20 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index eff08a3e..7089242 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -36,6 +36,7 @@
 #include "i915_drv.h"
 #include "i915_trace.h"
 #include "intel_drv.h"
+#include "i915_scheduler.h"
 
 static const u32 hpd_ibx[] = {
 	[HPD_CRT] = SDE_CRT_HOTPLUG,
@@ -1218,6 +1219,8 @@ static void notify_ring(struct drm_device *dev,
 
 	trace_i915_gem_request_complete(ring);
 
+	i915_scheduler_handle_IRQ(ring);
+
 	wake_up_all(&ring->irq_queue);
 	i915_queue_hangcheck(dev);
 }
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index fc165c2..1e4d7c313 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -92,6 +92,17 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 	return ret;
 }
 
+int i915_scheduler_handle_IRQ(struct intel_engine_cs *ring)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+
+	/* Do stuff... */
+
+	queue_work(dev_priv->wq, &dev_priv->mm.scheduler_work);
+
+	return 0;
+}
+
 int i915_scheduler_remove(struct intel_engine_cs *ring)
 {
 	/* Do stuff... */
@@ -149,4 +160,9 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 	return i915_gem_do_execbuffer_final(&qe->params);
 }
 
+int i915_scheduler_handle_IRQ(struct intel_engine_cs *ring)
+{
+	return 0;
+}
+
 #endif  /* CONFIG_DRM_I915_SCHEDULER */
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index e62254a..dd7d699 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -62,6 +62,7 @@ int         i915_scheduler_init(struct drm_device *dev);
 int         i915_scheduler_closefile(struct drm_device *dev,
 				     struct drm_file *file);
 int         i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe);
+int         i915_scheduler_handle_IRQ(struct intel_engine_cs *ring);
 
 #ifdef CONFIG_DRM_I915_SCHEDULER
 
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC 25/44] drm/i915: Added hook to catch 'unexpected' ring submissions
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (23 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 24/44] drm/i915: Added scheduler interrupt handler hook John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-06-26 17:24 ` [RFC 26/44] drm/i915: Added scheduler support to __wait_seqno() calls John.C.Harrison
                   ` (20 subsequent siblings)
  45 siblings, 0 replies; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The scheduler needs to know what each seqno that pops out of the ring is
referring to. This change adds a hook into the 'submit some random work that
got forgotten about' clean-up code to inform the scheduler that a new seqno has
been sent to the ring for some non-batch-buffer operation.
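
To illustrate why (a standalone sketch in plain C, not the driver code): the
scheduler detects completion by scanning its in-flight list, held newest-first
in hardware submission order, for the seqno the ring reports, then marks that
node and every older one as done. If the ring signals a seqno the scheduler
never saw, the scan finds no match and batches that actually completed just
beforehand are left unretired.

#include <stdbool.h>
#include <stdio.h>

struct node { unsigned int seqno; bool complete; };

/* flight[] is ordered newest-first, mirroring how flying nodes are kept. */
static void mark_completed(struct node *flight, int count, unsigned int hw_seqno)
{
	int i, match = -1;

	for (i = 0; i < count; i++) {
		if (flight[i].seqno == hw_seqno) {
			match = i;
			break;
		}
	}

	if (match < 0)
		return;		/* unknown seqno: nothing can be retired */

	for (i = match; i < count; i++)
		flight[i].complete = true;	/* this node and all older ones */
}

int main(void)
{
	struct node flight[] = { {7, false}, {5, false}, {3, false} };

	mark_completed(flight, 3, 5);	/* ring reports seqno 5 */
	printf("3:%d 5:%d 7:%d\n",
	       flight[2].complete, flight[1].complete, flight[0].complete);
	/* prints "3:1 5:1 7:0"; reporting an untracked seqno would mark none */
	return 0;
}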
---
 drivers/gpu/drm/i915/i915_gem.c       |   20 +++++++++++++++++++-
 drivers/gpu/drm/i915/i915_scheduler.c |    7 +++++++
 drivers/gpu/drm/i915/i915_scheduler.h |    1 +
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 57b24f0..7727f0f 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2347,6 +2347,25 @@ int __i915_add_request(struct intel_engine_cs *ring,
 	if (WARN_ON(request == NULL))
 		return -ENOMEM;
 
+	request->seqno = intel_ring_get_seqno(ring);
+
+#ifdef CONFIG_DRM_I915_SCHEDULER
+	/* The scheduler needs to know about all seqno values that can pop out
+	 * of the ring. Otherwise, things can get confused when batch buffers
+	 * are re-ordered. Specifically, the scheduler has to work out which
+	 * buffers have completed by matching the last completed seqno with its
+	 * internal list of all seqnos ordered by when they were sent to the
+	 * ring. If an unknown seqno appears, the scheduler is unable to process
+	 * any batch buffers that might have completed just before the unknown
+	 * one.
+	 * NB:  The scheduler must be told before the request is actually sent
+	 * to the ring as it needs to know about it before the interrupt occurs.
+	 */
+	ret = i915_scheduler_fly_seqno(ring, request->seqno);
+	if (ret)
+		return ret;
+#endif
+
 	/* Record the position of the start of the request so that
 	 * should we detect the updated seqno part-way through the
 	 * GPU processing the request, we never over-estimate the
@@ -2358,7 +2377,6 @@ int __i915_add_request(struct intel_engine_cs *ring,
 	if (ret)
 		return ret;
 
-	request->seqno = intel_ring_get_seqno(ring);
 	request->ring = ring;
 	request->head = request_start;
 	request->tail = request_ring_position;
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 1e4d7c313..b5d391c 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -92,6 +92,13 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 	return ret;
 }
 
+int i915_scheduler_fly_seqno(struct intel_engine_cs *ring, uint32_t seqno)
+{
+	/* Do stuff... */
+
+	return 0;
+}
+
 int i915_scheduler_handle_IRQ(struct intel_engine_cs *ring)
 {
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index dd7d699..57e001a 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -72,6 +72,7 @@ struct i915_scheduler {
 	uint32_t    index;
 };
 
+int         i915_scheduler_fly_seqno(struct intel_engine_cs *ring, uint32_t seqno);
 int         i915_scheduler_remove(struct intel_engine_cs *ring);
 bool        i915_scheduler_is_seqno_in_flight(struct intel_engine_cs *ring,
 					      uint32_t seqno, bool *completed);
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC 26/44] drm/i915: Added scheduler support to __wait_seqno() calls
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (24 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 25/44] drm/i915: Added hook to catch 'unexpected' ring submissions John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-06-26 17:24 ` [RFC 27/44] drm/i915: Added scheduler support to page fault handler John.C.Harrison
                   ` (19 subsequent siblings)
  45 siblings, 0 replies; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The scheduler can cause batch buffers, and hence seqno values, to be submitted
to the ring out of order and asynchronously to their submission to the driver.
Thus waiting for the completion of a given seqno value is not as simple as
saying 'is my value <= current ring value'. Not only might the arithmetic
comparison be invalid but the seqno in question might not even have been sent to
the hardware yet.

This change hooks the scheduler into the wait_seqno() code to ensure correct
behaviour. That is, flush the target batch buffer through to the hardware and do
an out-of-order-safe comparison. Note that pre-emptive scheduling adds the
further complication that even though the batch buffer might have been sent at
the start of the wait call, it could be thrown off the hardware and back into
the software queue during the wait. This means that waiting indefinitely with
the driver-wide mutex lock held is a very Bad Idea. Instead, the wait call must
return -EAGAIN at least as far back as necessary to release the mutex lock and
allow the scheduler's asynchronous processing to get in and handle the
pre-emption operation and eventually re-submit the work.
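
As a standalone sketch of what an out-of-order-safe comparison means here
(this is not the actual i915_compare_seqno_values() helper, whose
implementation lives elsewhere in the series; it just shows the usual
wrap-safe signed-difference trick, valid while the two values are less than
2^31 apart):

#include <stdint.h>
#include <stdio.h>

static int compare_seqno(uint32_t a, uint32_t b)
{
	int32_t diff = (int32_t)(a - b);	/* wrap-safe signed distance */

	return (diff > 0) - (diff < 0);		/* -1, 0 or +1 */
}

int main(void)
{
	printf("%d\n", compare_seqno(10, 5));		/* prints  1 */
	printf("%d\n", compare_seqno(5, 10));		/* prints -1 */
	printf("%d\n", compare_seqno(3, 0xFFFFFFF0u));	/* prints  1: 3 is 'after' a wrapped value */
	return 0;
}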
---
 drivers/gpu/drm/i915/i915_gem.c       |   53 +++++++++++++++++++++++++++------
 drivers/gpu/drm/i915/i915_scheduler.c |    8 +++++
 drivers/gpu/drm/i915/i915_scheduler.h |    9 ++++++
 3 files changed, 61 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 7727f0f..5ed5f66 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1152,7 +1152,8 @@ static int __wait_seqno(struct intel_engine_cs *ring, u32 seqno,
 			unsigned reset_counter,
 			bool interruptible,
 			struct timespec *timeout,
-			struct drm_i915_file_private *file_priv)
+			struct drm_i915_file_private *file_priv,
+			bool is_locked)
 {
 	struct drm_device *dev = ring->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -1161,9 +1162,14 @@ static int __wait_seqno(struct intel_engine_cs *ring, u32 seqno,
 	struct timespec before, now;
 	DEFINE_WAIT(wait);
 	unsigned long timeout_expire;
-	int ret;
+	int ret = 0;
+#ifdef CONFIG_DRM_I915_SCHEDULER
+	bool    completed;
+#endif
 
+	might_sleep();
 	WARN(dev_priv->pm.irqs_disabled, "IRQs disabled\n");
+	BUG_ON(seqno == 0);
 
 	if (i915_seqno_passed(ring, ring->get_seqno(ring, true), seqno))
 		return 0;
@@ -1201,9 +1207,34 @@ static int __wait_seqno(struct intel_engine_cs *ring, u32 seqno,
 			break;
 		}
 
-		if (i915_seqno_passed(ring, ring->get_seqno(ring, false), seqno)) {
-			ret = 0;
-			break;
+#ifdef CONFIG_DRM_I915_SCHEDULER
+		if (is_locked) {
+			/* If this seqno is being tracked by the scheduler then
+			 * it is unsafe to sleep with the mutex lock held as the
+			 * scheduler may require the lock in order to progress
+			 * the seqno. */
+			if (i915_scheduler_is_seqno_in_flight(ring, seqno, &completed)) {
+				ret = completed ? 0 : -EAGAIN;
+				break;
+			}
+
+			/* If the seqno is not tracked by the scheduler then a
+			 * straight arithmetic comparison test can be done. */
+			if (i915_compare_seqno_values(ring->get_seqno(ring, false), seqno) >= 0) {
+				ret = 0;
+				break;
+			}
+		} else
+#endif
+		{
+			/* The regular 'is seqno passed' test is fine if the
+			 * mutex lock is not held. Even if the seqno is stuck
+			 * in the scheduler, it will be able to progress while
+			 * this thread waits. */
+			if (i915_seqno_passed(ring, ring->get_seqno(ring, false), seqno)) {
+				ret = 0;
+				break;
+			}
 		}
 
 		if (interruptible && signal_pending(current)) {
@@ -1265,6 +1296,10 @@ i915_wait_seqno(struct intel_engine_cs *ring, uint32_t seqno)
 	BUG_ON(!mutex_is_locked(&dev->struct_mutex));
 	BUG_ON(seqno == 0);
 
+	ret = I915_SCHEDULER_FLUSH_SEQNO(ring, true, seqno);
+	if (ret < 0)
+		return ret;
+
 	ret = i915_gem_check_wedge(&dev_priv->gpu_error, interruptible);
 	if (ret)
 		return ret;
@@ -1275,7 +1310,7 @@ i915_wait_seqno(struct intel_engine_cs *ring, uint32_t seqno)
 
 	return __wait_seqno(ring, seqno,
 			    atomic_read(&dev_priv->gpu_error.reset_counter),
-			    interruptible, NULL, NULL);
+			    interruptible, NULL, NULL, true);
 }
 
 static int
@@ -1352,7 +1387,7 @@ i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
 
 	reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
 	mutex_unlock(&dev->struct_mutex);
-	ret = __wait_seqno(ring, seqno, reset_counter, true, NULL, file_priv);
+	ret = __wait_seqno(ring, seqno, reset_counter, true, NULL, file_priv, false);
 	mutex_lock(&dev->struct_mutex);
 	if (ret)
 		return ret;
@@ -2848,7 +2883,7 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
 	mutex_unlock(&dev->struct_mutex);
 
-	ret = __wait_seqno(ring, seqno, reset_counter, true, timeout, file->driver_priv);
+	ret = __wait_seqno(ring, seqno, reset_counter, true, timeout, file->driver_priv, false);
 	if (timeout)
 		args->timeout_ns = timespec_to_ns(timeout);
 	return ret;
@@ -4071,7 +4106,7 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
 	if (seqno == 0)
 		return 0;
 
-	ret = __wait_seqno(ring, seqno, reset_counter, true, NULL, NULL);
+	ret = __wait_seqno(ring, seqno, reset_counter, true, NULL, NULL, false);
 	if (ret == 0)
 		queue_delayed_work(dev_priv->wq, &dev_priv->mm.retire_work, 0);
 
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index b5d391c..d579bab 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -117,6 +117,14 @@ int i915_scheduler_remove(struct intel_engine_cs *ring)
 	return 0;
 }
 
+int i915_scheduler_flush_seqno(struct intel_engine_cs *ring, bool is_locked,
+			       uint32_t seqno)
+{
+	/* Do stuff... */
+
+	return 0;
+}
+
 bool i915_scheduler_is_seqno_in_flight(struct intel_engine_cs *ring,
 			       uint32_t seqno, bool *completed)
 {
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 57e001a..3811359 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -57,6 +57,13 @@ struct i915_scheduler_queue_entry {
 	int                                 num_objs;
 };
 
+#ifdef CONFIG_DRM_I915_SCHEDULER
+#   define I915_SCHEDULER_FLUSH_SEQNO(ring, locked, seqno)                   \
+		i915_scheduler_flush_seqno(ring, locked, seqno)
+#else
+#   define I915_SCHEDULER_FLUSH_SEQNO(ring, locked, seqno)      0
+#endif
+
 bool        i915_scheduler_is_enabled(struct drm_device *dev);
 int         i915_scheduler_init(struct drm_device *dev);
 int         i915_scheduler_closefile(struct drm_device *dev,
@@ -74,6 +81,8 @@ struct i915_scheduler {
 
 int         i915_scheduler_fly_seqno(struct intel_engine_cs *ring, uint32_t seqno);
 int         i915_scheduler_remove(struct intel_engine_cs *ring);
+int         i915_scheduler_flush_seqno(struct intel_engine_cs *ring,
+				       bool is_locked, uint32_t seqno);
 bool        i915_scheduler_is_seqno_in_flight(struct intel_engine_cs *ring,
 					      uint32_t seqno, bool *completed);
 
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC 27/44] drm/i915: Added scheduler support to page fault handler
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (25 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 26/44] drm/i915: Added scheduler support to __wait_seqno() calls John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-06-26 17:24 ` [RFC 28/44] drm/i915: Added scheduler flush calls to ring throttle and idle functions John.C.Harrison
                   ` (18 subsequent siblings)
  45 siblings, 0 replies; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

GPU page faults can now require scheduler operation in order to complete. For
example, in order to free up sufficient memory to handle the fault, the handler
must wait for a batch buffer to complete that has not even been sent to the
hardware yet. Thus -EAGAIN no longer necessarily means a GPU hang; it can also
occur under normal operation.
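
As a rough illustration (plain C, not driver code) of what callers are now
expected to do with -EAGAIN, i.e. yield and retry the whole operation rather
than treat it as fatal; try_fault() is a made-up stand-in for the re-faulting
path:

#include <errno.h>
#include <sched.h>
#include <stdio.h>

/* Stand-in for the re-faulting operation: fails twice, then succeeds. */
static int try_fault(void)
{
	static int attempts;

	return (attempts++ < 2) ? -EAGAIN : 0;
}

static int fault_with_retry(void)
{
	int ret;

	do {
		ret = try_fault();
		if (ret == -EAGAIN)
			sched_yield();	/* give other threads a chance to run */
	} while (ret == -EAGAIN);

	return ret;
}

int main(void)
{
	printf("result: %d\n", fault_with_retry());	/* prints "result: 0" */
	return 0;
}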
---
 drivers/gpu/drm/i915/i915_gem.c |   10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 5ed5f66..aa1e0b2 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1622,10 +1622,16 @@ out:
 		}
 	case -EAGAIN:
 		/*
-		 * EAGAIN means the gpu is hung and we'll wait for the error
-		 * handler to reset everything when re-faulting in
+		 * EAGAIN can mean the gpu is hung and we'll have to wait for
+		 * the error handler to reset everything when re-faulting in
 		 * i915_mutex_lock_interruptible.
+		 *
+		 * It can also indicate various other nonfatal errors for which
+		 * the best response is to give other threads a chance to run,
+		 * and then retry the failing operation in its entirety.
 		 */
+		set_need_resched();
+		/*FALLTHRU*/
 	case 0:
 	case -ERESTARTSYS:
 	case -EINTR:
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC 28/44] drm/i915: Added scheduler flush calls to ring throttle and idle functions
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (26 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 27/44] drm/i915: Added scheduler support to page fault handler John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-06-26 17:24 ` [RFC 29/44] drm/i915: Hook scheduler into intel_ring_idle() John.C.Harrison
                   ` (17 subsequent siblings)
  45 siblings, 0 replies; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

When requesting that all GPU work is completed, it is now necessary to get the
scheduler involved in order to flush out work that is queued but not yet
submitted.
---
 drivers/gpu/drm/i915/i915_gem.c       |   16 +++++++++++++++-
 drivers/gpu/drm/i915/i915_scheduler.c |    7 +++++++
 drivers/gpu/drm/i915/i915_scheduler.h |    5 +++++
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index aa1e0b2..1c508b7 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3049,6 +3049,10 @@ int i915_gpu_idle(struct drm_device *dev)
 
 	/* Flush everything onto the inactive list. */
 	for_each_ring(ring, dev_priv, i) {
+		ret = I915_SCHEDULER_FLUSH_ALL(ring, true);
+		if (ret < 0)
+			return ret;
+
 		ret = i915_switch_context(ring, ring->default_context);
 		if (ret)
 			return ret;
@@ -4088,7 +4092,7 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
 	struct intel_engine_cs *ring = NULL;
 	unsigned reset_counter;
 	u32 seqno = 0;
-	int ret;
+	int i, ret;
 
 	ret = i915_gem_wait_for_error(&dev_priv->gpu_error);
 	if (ret)
@@ -4098,6 +4102,16 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
 	if (ret)
 		return ret;
 
+	for_each_ring(ring, dev_priv, i) {
+		/* Need a mechanism to flush out scheduler entries that were
+		 * submitted more than 'recent_enough' time ago as well! In the
+		 * meantime, just flush everything out to ensure that entries
+		 * cannot sit around indefinitely. */
+		ret = I915_SCHEDULER_FLUSH_ALL(ring, false);
+		if (ret < 0)
+			return ret;
+	}
+
 	spin_lock(&file_priv->mm.lock);
 	list_for_each_entry(request, &file_priv->mm.request_list, client_list) {
 		if (time_after_eq(request->emitted_jiffies, recent_enough))
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index d579bab..6b6827f 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -125,6 +125,13 @@ int i915_scheduler_flush_seqno(struct intel_engine_cs *ring, bool is_locked,
 	return 0;
 }
 
+int i915_scheduler_flush(struct intel_engine_cs *ring, bool is_locked)
+{
+	/* Do stuff... */
+
+	return 0;
+}
+
 bool i915_scheduler_is_seqno_in_flight(struct intel_engine_cs *ring,
 			       uint32_t seqno, bool *completed)
 {
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 3811359..898d2bb 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -58,9 +58,13 @@ struct i915_scheduler_queue_entry {
 };
 
 #ifdef CONFIG_DRM_I915_SCHEDULER
+#   define I915_SCHEDULER_FLUSH_ALL(ring, locked)                            \
+		i915_scheduler_flush(ring, locked)
+
 #   define I915_SCHEDULER_FLUSH_SEQNO(ring, locked, seqno)                   \
 		i915_scheduler_flush_seqno(ring, locked, seqno)
 #else
+#   define I915_SCHEDULER_FLUSH_ALL(ring, locked)               0
 #   define I915_SCHEDULER_FLUSH_SEQNO(ring, locked, seqno)      0
 #endif
 
@@ -81,6 +85,7 @@ struct i915_scheduler {
 
 int         i915_scheduler_fly_seqno(struct intel_engine_cs *ring, uint32_t seqno);
 int         i915_scheduler_remove(struct intel_engine_cs *ring);
+int         i915_scheduler_flush(struct intel_engine_cs *ring, bool is_locked);
 int         i915_scheduler_flush_seqno(struct intel_engine_cs *ring,
 				       bool is_locked, uint32_t seqno);
 bool        i915_scheduler_is_seqno_in_flight(struct intel_engine_cs *ring,
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC 29/44] drm/i915: Hook scheduler into intel_ring_idle()
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (27 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 28/44] drm/i915: Added scheduler flush calls to ring throttle and idle functions John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-06-26 17:24 ` [RFC 30/44] drm/i915: Added a module parameter for allowing scheduler overrides John.C.Harrison
                   ` (16 subsequent siblings)
  45 siblings, 0 replies; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The code to wait for a ring to be idle ends by calling __wait_seqno() on the
value in the last request structure. However, with a scheduler, there may be
work queued up but not yet submitted. There is also the possibility of
pre-emption re-ordering work after it has been submitted. Thus the last request
structure at the current moment is not necessarily the last piece of work by the
time that particular seqno has completed.

It is not possible to force the scheduler to submit all work from inside the
ring idle function as it might not be a safe place to do so. Instead, it must
simply return early if the scheduler has outstanding work and roll back as far
as releasing the driver mutex lock and returning the system to a consistent
state.
---
 drivers/gpu/drm/i915/i915_scheduler.c   |   12 ++++++++++++
 drivers/gpu/drm/i915/i915_scheduler.h   |    1 +
 drivers/gpu/drm/i915/intel_ringbuffer.c |    8 ++++++++
 3 files changed, 21 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 6b6827f..6a10a76 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -165,6 +165,13 @@ int i915_scheduler_closefile(struct drm_device *dev, struct drm_file *file)
 	return 0;
 }
 
+bool i915_scheduler_is_idle(struct intel_engine_cs *ring)
+{
+	/* Do stuff... */
+
+	return true;
+}
+
 #else   /* CONFIG_DRM_I915_SCHEDULER */
 
 int i915_scheduler_init(struct drm_device *dev)
@@ -177,6 +184,11 @@ int i915_scheduler_closefile(struct drm_device *dev, struct drm_file *file)
 	return 0;
 }
 
+bool i915_scheduler_is_idle(struct intel_engine_cs *ring)
+{
+	return true;
+}
+
 int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 {
 	return i915_gem_do_execbuffer_final(&qe->params);
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 898d2bb..1b3d51a 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -74,6 +74,7 @@ int         i915_scheduler_closefile(struct drm_device *dev,
 				     struct drm_file *file);
 int         i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe);
 int         i915_scheduler_handle_IRQ(struct intel_engine_cs *ring);
+bool        i915_scheduler_is_idle(struct intel_engine_cs *ring);
 
 #ifdef CONFIG_DRM_I915_SCHEDULER
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 1ef0cbd..1ad162b 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1651,6 +1651,14 @@ int intel_ring_idle(struct intel_engine_cs *ring)
 			return ret;
 	}
 
+	/* If there is anything outstanding within the scheduler then give up
+	 * now as the submission of such work requires the mutex lock. While
+	 * the lock is definitely held at this point (i915_wait_seqno will BUG
+	 * if called without it), the driver is not necessarily at a safe point
+	 * to start submitting ring work. */
+	if (!i915_scheduler_is_idle(ring))
+		return -EAGAIN;
+
 	/* Wait upon the last request to be completed */
 	if (list_empty(&ring->request_list))
 		return 0;
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC 30/44] drm/i915: Added a module parameter for allowing scheduler overrides
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (28 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 29/44] drm/i915: Hook scheduler into intel_ring_idle() John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-06-26 17:24 ` [RFC 31/44] drm/i915: Implemented the GPU scheduler John.C.Harrison
                   ` (15 subsequent siblings)
  45 siblings, 0 replies; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

It can be useful to be able to disable certain features (e.g. the entire
scheduler) via a module parameter for debugging purposes. A module parameter
has the advantage of not being a compile-time switch, while not implying that
it can be changed dynamically at runtime.
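
As a usage note (not part of the patch): a parameter declared this way is
typically set at module load time, e.g. i915.scheduler_override=<n> on the
kernel command line or scheduler_override=<n> to modprobe; with the 0600
permission below it also appears under
/sys/module/i915/parameters/scheduler_override, though, as stated above, a
value written there after load is only honoured wherever the driver happens to
re-read it.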
---
 drivers/gpu/drm/i915/i915_drv.h       |    1 +
 drivers/gpu/drm/i915/i915_params.c    |    4 ++++
 drivers/gpu/drm/i915/i915_scheduler.h |    5 +++++
 3 files changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index fbafa68..4d52c67 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2053,6 +2053,7 @@ struct i915_params {
 	bool reset;
 	bool disable_display;
 	bool disable_vtd_wa;
+	int scheduler_override;
 };
 extern struct i915_params i915 __read_mostly;
 
diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
index d05a2af..ce99733 100644
--- a/drivers/gpu/drm/i915/i915_params.c
+++ b/drivers/gpu/drm/i915/i915_params.c
@@ -48,6 +48,7 @@ struct i915_params i915 __read_mostly = {
 	.disable_display = 0,
 	.enable_cmd_parser = 1,
 	.disable_vtd_wa = 0,
+	.scheduler_override = 0,
 };
 
 module_param_named(modeset, i915.modeset, int, 0400);
@@ -156,3 +157,6 @@ MODULE_PARM_DESC(disable_vtd_wa, "Disable all VT-d workarounds (default: false)"
 module_param_named(enable_cmd_parser, i915.enable_cmd_parser, int, 0600);
 MODULE_PARM_DESC(enable_cmd_parser,
 		 "Enable command parsing (1=enabled [default], 0=disabled)");
+
+module_param_named(scheduler_override, i915.scheduler_override, int, 0600);
+MODULE_PARM_DESC(scheduler_override, "Scheduler override option (default: 0)");
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 1b3d51a..6dd4fea 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -84,6 +84,11 @@ struct i915_scheduler {
 	uint32_t    index;
 };
 
+/* Options for 'scheduler_override' module parameter: */
+enum {
+	i915_so_normal              = 0,
+};
+
 int         i915_scheduler_fly_seqno(struct intel_engine_cs *ring, uint32_t seqno);
 int         i915_scheduler_remove(struct intel_engine_cs *ring);
 int         i915_scheduler_flush(struct intel_engine_cs *ring, bool is_locked);
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC 31/44] drm/i915: Implemented the GPU scheduler
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (29 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 30/44] drm/i915: Added a module parameter for allowing scheduler overrides John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-06-26 17:24 ` [RFC 32/44] drm/i915: Added immediate submission override to scheduler John.C.Harrison
                   ` (14 subsequent siblings)
  45 siblings, 0 replies; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

Filled in all the 'do stuff here' blanks...

The general theory of operation is that when batch buffers are submitted to the
driver, the execbuffer() code assigns a unique seqno value and then packages up
all the information required to execute the batch buffer at a later time. This
package is given over to the scheduler which adds it to an internal node list.
The scheduler also scans the list of objects associated with the batch buffer
and compares them against the objects already in use by other buffers in the
node list. If matches are found then the new batch buffer node is marked as
being dependent upon the matching node. The same is done for the context object.
The scheduler also bumps up the priority of such matching nodes on the grounds
that the more dependencies a given batch buffer has the more important it is
likely to be.

The scheduler aims to have a given (tuneable) number of batch buffers in flight
on the hardware at any given time. If fewer than this are currently executing
when a new node is queued, then the node is passed straight through to the
submit function. Otherwise it is simply added to the queue and the driver
returns back to user land.

As each batch buffer completes, it raises an interrupt which wakes up the
scheduler. Note that it is possible for multiple buffers to complete before the
IRQ handler gets to run. Further, the seqno values of the individual buffers are
not necessary incrementing as the scheduler may have re-ordered their
submission. However, the scheduler keeps the list of executing buffers in order
of hardware submission. Thus it can scan through the list until a matching seqno
is found and then mark all in flight nodes from that point on as completed.

A deferred work queue is also poked by the interrupt handler. When this wakes up
it can do more involved processing such as actually removing completed nodes
from the queue and freeing up the resources associated with them (internal
memory allocations, DRM object references, context reference, etc.). The work
handler also checks the in flight count and calls the submission code if a new
slot has appeared.

When the scheduler's submit code is called, it scans the queued node list for
the highest priority node that has no unmet dependencies. Note that the
dependency calculation is complex as it must take inter-ring dependencies and
potential preemptions into account. Note also that in the future this will be
extended to include external dependencies such as the Android Native Sync file
descriptors and/or the linux dma-buff synchronisation scheme.

If a suitable node is found then it is sent to execbuff_final() for submission
to the hardware. The in flight count is then re-checked and a new node popped
from the list if appropriate.

Note that this change does not implement pre-emptive scheduling. Only basic
scheduling by re-ordering batch buffer submission is currently implemented.
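
As a compact standalone model (plain C, not the patch code below) of the
selection rule the submit path implements: among queued nodes on a ring, pick
the highest-priority one none of whose dependencies still stand in its way,
where a dependency only counts if it is still queued anywhere or flying on a
different ring. All names here are illustrative only.

#include <stdio.h>

enum status { QUEUED, FLYING, COMPLETE };

struct node {
	enum status status;
	int         priority;
	int         ring;
	int         ndeps;
	int         deps[4];	/* indices of the nodes this one depends on */
};

static int dep_blocks(const struct node *all, const struct node *n, int dep)
{
	const struct node *d = &all[dep];

	if (d->status == QUEUED)
		return 1;			/* not yet sent to any ring */
	if (d->status == FLYING && d->ring != n->ring)
		return 1;			/* other ring: a real cross-ring wait */
	return 0;				/* complete, or ahead of us on our own ring */
}

static int pick_next(const struct node *all, int count, int ring)
{
	int best = -1, i, j, blocked;

	for (i = 0; i < count; i++) {
		if (all[i].status != QUEUED || all[i].ring != ring)
			continue;

		blocked = 0;
		for (j = 0; j < all[i].ndeps; j++)
			blocked |= dep_blocks(all, &all[i], all[i].deps[j]);

		if (!blocked && (best < 0 || all[i].priority > all[best].priority))
			best = i;
	}

	return best;	/* -1: nothing on this ring is ready to fly */
}

int main(void)
{
	struct node nodes[] = {
		{ COMPLETE, 0, 0, 0, {0} },	/* 0: already finished         */
		{ QUEUED,   5, 0, 1, {0} },	/* 1: ready, priority 5        */
		{ QUEUED,   9, 0, 1, {3} },	/* 2: blocked by queued node 3 */
		{ QUEUED,   1, 0, 0, {0} },	/* 3: ready, priority 1        */
	};

	printf("next node: %d\n", pick_next(nodes, 4, 0));	/* prints "next node: 1" */
	return 0;
}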
---
 drivers/gpu/drm/i915/i915_scheduler.c |  945 +++++++++++++++++++++++++++++++--
 drivers/gpu/drm/i915/i915_scheduler.h |   59 +-
 2 files changed, 965 insertions(+), 39 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 6a10a76..1816f1d 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -41,6 +41,7 @@ int i915_scheduler_init(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+	int                     r;
 
 	if (scheduler)
 		return 0;
@@ -51,8 +52,16 @@ int i915_scheduler_init(struct drm_device *dev)
 
 	spin_lock_init(&scheduler->lock);
 
+	for (r = 0; r < I915_NUM_RINGS; r++)
+		INIT_LIST_HEAD(&scheduler->node_queue[r]);
+
 	scheduler->index = 1;
 
+	/* Default tuning values: */
+	scheduler->priority_level_max     = ~0U;
+	scheduler->priority_level_preempt = 900;
+	scheduler->min_flying             = 2;
+
 	dev_priv->scheduler = scheduler;
 
 	return 0;
@@ -60,50 +69,371 @@ int i915_scheduler_init(struct drm_device *dev)
 
 int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 {
-	struct drm_i915_private     *dev_priv = qe->params.dev->dev_private;
-	struct i915_scheduler       *scheduler = dev_priv->scheduler;
-	int ret, i;
+	struct drm_i915_private *dev_priv = qe->params.dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+	struct intel_engine_cs  *ring = qe->params.ring;
+	struct i915_scheduler_queue_entry  *node;
+	struct i915_scheduler_queue_entry  *test;
+	struct timespec     stamp;
+	unsigned long       flags;
+	bool                not_flying, found;
+	int                 i, j, r, got_batch = 0;
+	int                 incomplete = 0;
 
 	BUG_ON(!scheduler);
 
-	qe->params.scheduler_index = scheduler->index++;
+	if (i915.scheduler_override & i915_so_direct_submit) {
+		int ret;
 
-	ret = i915_gem_do_execbuffer_final(&qe->params);
+		qe->params.scheduler_index = scheduler->index++;
 
-	/* Need to release the objects: */
-	for (i = 0; i < qe->num_objs; i++) {
-		if (!qe->saved_objects[i].obj)
-			continue;
+		scheduler->flags[qe->params.ring->id] |= i915_sf_submitting;
+		ret = i915_gem_do_execbuffer_final(&qe->params);
+		scheduler->flags[qe->params.ring->id] &= ~i915_sf_submitting;
+
+		/* Need to release the objects: */
+		for (i = 0; i < qe->num_objs; i++) {
+			if (!qe->saved_objects[i].obj)
+				continue;
 
-		drm_gem_object_unreference(&qe->saved_objects[i].obj->base);
+			drm_gem_object_unreference(&qe->saved_objects[i].obj->base);
+		}
+
+		kfree(qe->saved_objects);
+		qe->saved_objects = NULL;
+		qe->num_objs = 0;
+
+		/* Free the context object too: */
+		if (qe->params.ctx)
+			i915_gem_context_unreference(qe->params.ctx);
+
+		/* And anything else owned by the QE structure: */
+		kfree(qe->params.cliprects);
+
+		return ret;
 	}
 
-	kfree(qe->saved_objects);
-	qe->saved_objects = NULL;
-	qe->num_objs = 0;
+	getrawmonotonic(&stamp);
 
-	/* Free the context object too: */
-	if (qe->params.ctx)
-		i915_gem_context_unreference(qe->params.ctx);
+	node = kmalloc(sizeof(*node), GFP_KERNEL);
+	if (!node)
+		return -ENOMEM;
 
-	/* And anything else owned by the QE structure: */
-	kfree(qe->params.cliprects);
+	*node = *qe;
+	INIT_LIST_HEAD(&node->link);
+	node->status = i915_sqs_queued;
+	node->stamp  = stamp;
 
-	return ret;
+	/*
+	 * Verify that the batch buffer itself is included in the object list.
+	 */
+	for (i = 0; i < node->num_objs; i++) {
+		if (node->saved_objects[i].obj == node->params.batch_obj)
+			got_batch++;
+	}
+
+	BUG_ON(got_batch != 1);
+
+	/* Need to determine the number of incomplete entries in the list as
+	 * that will be the maximum size of the dependency list.
+	 *
+	 * Note that the allocation must not be made with the spinlock acquired
+	 * as kmalloc can sleep. However, the unlock/relock is safe because no
+	 * new entries can be queued up during the unlock as the i915 driver
+	 * mutex is still held. Entries could be removed from the list but that
+	 * just means the dep_list will be over-allocated which is fine.
+	 */
+	spin_lock_irqsave(&scheduler->lock, flags);
+	for (r = 0; r < I915_NUM_RINGS; r++) {
+		list_for_each_entry(test, &scheduler->node_queue[r], link) {
+			if (I915_SQS_IS_COMPLETE(test))
+				continue;
+
+			incomplete++;
+		}
+	}
+
+	/* Temporarily unlock to allocate memory: */
+	spin_unlock_irqrestore(&scheduler->lock, flags);
+	if (incomplete) {
+		node->dep_list = kmalloc(sizeof(node->dep_list[0]) * incomplete,
+					 GFP_KERNEL);
+		if (!node->dep_list) {
+			kfree(node);
+			return -ENOMEM;
+		}
+	} else
+		node->dep_list = NULL;
+
+	spin_lock_irqsave(&scheduler->lock, flags);
+	node->num_deps = 0;
+
+	if (node->dep_list) {
+		for (r = 0; r < I915_NUM_RINGS; r++) {
+			list_for_each_entry(test, &scheduler->node_queue[r], link) {
+				if (I915_SQS_IS_COMPLETE(test))
+					continue;
+
+				found = (node->params.ctx == test->params.ctx);
+
+				for (i = 0; (i < node->num_objs) && !found; i++) {
+					for (j = 0; j < test->num_objs; j++) {
+						if (node->saved_objects[i].obj !=
+							    test->saved_objects[j].obj)
+							continue;
+
+						found = true;
+						break;
+					}
+				}
+
+				if (found) {
+					node->dep_list[node->num_deps] = test;
+					node->num_deps++;
+				}
+			}
+		}
+
+		BUG_ON(node->num_deps > incomplete);
+	}
+
+	if (node->priority && node->num_deps) {
+		i915_scheduler_priority_bump_clear(scheduler, ring);
+
+		for (i = 0; i < node->num_deps; i++)
+			i915_scheduler_priority_bump(scheduler,
+					node->dep_list[i], node->priority);
+	}
+
+	node->params.scheduler_index = scheduler->index++;
+
+	list_add_tail(&node->link, &scheduler->node_queue[ring->id]);
+
+	not_flying = i915_scheduler_count_flying(scheduler, ring) <
+						 scheduler->min_flying;
+
+	spin_unlock_irqrestore(&scheduler->lock, flags);
+
+	if (not_flying)
+		i915_scheduler_submit(ring, true);
+
+	return 0;
 }
 
 int i915_scheduler_fly_seqno(struct intel_engine_cs *ring, uint32_t seqno)
 {
-	/* Do stuff... */
+	struct i915_scheduler_queue_entry *node;
+	struct drm_i915_private           *dev_priv = ring->dev->dev_private;
+	struct i915_scheduler             *scheduler = dev_priv->scheduler;
+	struct timespec stamp;
+	unsigned long   flags;
+	int             ret;
+
+	BUG_ON(!scheduler);
+
+	/* No need to add if this request is due to a scheduler submission */
+	if (scheduler->flags[ring->id] & i915_sf_submitting)
+		return 0;
+
+	getrawmonotonic(&stamp);
+
+	/* Need to allocate a new node. Note that kzalloc can sleep, so
+	 * the spinlock must not be held yet. */
+	node = kzalloc(sizeof(*node), GFP_KERNEL);
+	if (!node)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&node->link);
+	node->params.ring  = ring;
+	node->params.seqno = seqno;
+	node->params.dev   = ring->dev;
+	node->stamp        = stamp;
+	node->status       = i915_sqs_none;
+
+	spin_lock_irqsave(&scheduler->lock, flags);
+	ret = i915_scheduler_fly_node(node);
+	spin_unlock_irqrestore(&scheduler->lock, flags);
+
+	return ret;
+}
+
+int i915_scheduler_fly_node(struct i915_scheduler_queue_entry *node)
+{
+	struct drm_i915_private *dev_priv = node->params.dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+	struct intel_engine_cs  *ring;
+
+	BUG_ON(!scheduler);
+	BUG_ON(!node);
+	BUG_ON(node->status != i915_sqs_none);
+
+	ring = node->params.ring;
+
+	/* Add the node (which should currently be in state none) to the front
+	 * of the queue. This ensure that flying nodes are always held in
+	 * of the queue. This ensures that flying nodes are always held in
+	list_add(&node->link, &scheduler->node_queue[ring->id]);
+
+	node->status = i915_sqs_flying;
+
+	if (!(scheduler->flags[ring->id] & i915_sf_interrupts_enabled)) {
+		bool    success = true;
+
+		success = ring->irq_get(ring);
+		if (success)
+			scheduler->flags[ring->id] |= i915_sf_interrupts_enabled;
+		else
+			return -EINVAL;
+	}
+
+	return 0;
+}
+
+/*
+ * Nodes are considered valid dependencies if they are queued on any ring or
+ * if they are in flight on a different ring. In flight on the same ring is no
+ * longer interesting for non-pre-emptive nodes as the ring serialises execution.
+ * For pre-empting nodes, all in flight dependencies are valid as they must not
+ * be jumped by the act of pre-empting.
+ *
+ * Anything that is neither queued nor flying is uninteresting.
+ */
+static inline bool i915_scheduler_is_dependency_valid(
+			struct i915_scheduler_queue_entry *node, uint32_t idx)
+{
+	struct i915_scheduler_queue_entry *dep;
+
+	dep = node->dep_list[idx];
+	if (!dep)
+		return false;
+
+	if (I915_SQS_IS_QUEUED(dep))
+		return true;
+
+	if (I915_SQS_IS_FLYING(dep)) {
+		if (node->params.ring != dep->params.ring)
+			return true;
+	}
+
+	return false;
+}
+
+uint32_t i915_scheduler_count_flying(struct i915_scheduler *scheduler,
+				     struct intel_engine_cs *ring)
+{
+	struct i915_scheduler_queue_entry *node;
+	uint32_t                          flying = 0;
+
+	list_for_each_entry(node, &scheduler->node_queue[ring->id], link)
+		if (I915_SQS_IS_FLYING(node))
+			flying++;
+
+	return flying;
+}
+
+/* Add a popped node back into the queue. For example, because the ring
+ * was hung when execbuff_final() was called and thus the ring submission
+ * needs to be retried later. */
+static void i915_scheduler_node_requeue(struct i915_scheduler_queue_entry *node)
+{
+	BUG_ON(!node);
+	BUG_ON(!I915_SQS_IS_FLYING(node));
+
+	node->status = i915_sqs_queued;
+}
+
+/* Give up on a popped node completely. For example, because it is causing the
+ * ring to hang or is using some resource that no longer exists. */
+static void i915_scheduler_node_kill(struct i915_scheduler_queue_entry *node)
+{
+	BUG_ON(!node);
+	BUG_ON(!I915_SQS_IS_FLYING(node));
+
+	node->status = i915_sqs_complete;
+}
+
+/*
+ * The batch tagged with the indicated sequence number has completed.
+ * Search the queue for it, update its status and those of any batches
+ * submitted earlier, which must also have completed or been pre-empted
+ * as appropriate.
+ *
+ * Called with spinlock already held.
+ */
+static int i915_scheduler_seqno_complete(struct intel_engine_cs *ring, uint32_t seqno)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+	struct i915_scheduler_queue_entry *node;
+
+	/*
+	 * Batch buffers are added to the head of the list in execution order,
+	 * thus seqno values, although not necessarily incrementing, will be
+	 * met in completion order when scanning the list. So when a match is
+	 * found, all subsequent entries must have also popped out. Conversely,
+	 * if a completed entry is found then there is no need to scan further.
+	 */
+	list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
+		if (I915_SQS_IS_COMPLETE(node))
+			goto done;
+
+		if (seqno == node->params.seqno)
+			break;
+	}
+
+	/*
+	 * NB: Lots of extra seqnos get added to the ring to track things
+	 * like cache flushes and page flips. So don't complain if no node
+	 * was found.
+	 */
+	if (&node->link == &scheduler->node_queue[ring->id])
+		goto done;
+
+	BUG_ON(!I915_SQS_IS_FLYING(node));
+
+	/* Everything from here can be marked as done: */
+	list_for_each_entry_from(node, &scheduler->node_queue[ring->id], link) {
+		/* Check if the marking has already been done: */
+		if (I915_SQS_IS_COMPLETE(node))
+			break;
+
+		if (!I915_SQS_IS_FLYING(node))
+			continue;
+
+		/* Node was in flight so mark it as complete. */
+		node->status = i915_sqs_complete;
+	}
+
+	/* Should submit new work here if flight list is empty but the DRM
+	 * mutex lock might not be available if a '__wait_seqno()' call is
+	 * blocking the system. */
 
+done:
 	return 0;
 }
 
 int i915_scheduler_handle_IRQ(struct intel_engine_cs *ring)
 {
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+	unsigned long       flags;
+	static uint32_t     last_seqno;
+	uint32_t            seqno;
+
+	seqno = ring->get_seqno(ring, false);
+
+	if (i915.scheduler_override & i915_so_direct_submit)
+		return 0;
+
+	if (seqno == last_seqno) {
+		/* Why are there sometimes multiple interrupts per seqno? */
+		return 0;
+	}
+	last_seqno = seqno;
 
-	/* Do stuff... */
+	spin_lock_irqsave(&scheduler->lock, flags);
+	i915_scheduler_seqno_complete(ring, seqno);
+	spin_unlock_irqrestore(&scheduler->lock, flags);
 
 	queue_work(dev_priv->wq, &dev_priv->mm.scheduler_work);
 
@@ -112,22 +442,506 @@ int i915_scheduler_handle_IRQ(struct intel_engine_cs *ring)
 
 int i915_scheduler_remove(struct intel_engine_cs *ring)
 {
-	/* Do stuff... */
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+	struct i915_scheduler_queue_entry  *node, *node_next;
+	unsigned long       flags;
+	int                 flying = 0, queued = 0;
+	int                 ret = 0;
+	bool                do_submit;
+	uint32_t            i, min_seqno;
+	struct list_head    remove;
 
-	return 0;
+	if (list_empty(&scheduler->node_queue[ring->id]))
+		return 0;
+
+	spin_lock_irqsave(&scheduler->lock, flags);
+
+	/* /i915_scheduler_dump_locked(ring, "remove/pre");/ */
+
+	/*
+	 * In the case where the system is idle, starting 'min_seqno' from a big
+	 * number will cause all nodes to be removed as they are now back to
+	 * being in-order. However, this will be a problem if the last one to
+	 * complete was actually out-of-order as the ring seqno value will be
+	 * lower than one or more completed buffers. Thus code looking for the
+	 * completion of said buffers will wait forever.
+	 * Instead, use the hardware seqno as the starting point. This means
+	 * that some buffers might be kept around even in a completely idle
+	 * system but it should guarantee that no-one ever gets confused when
+	 * waiting for buffer completion.
+	 */
+	min_seqno = ring->get_seqno(ring, true);
+
+	list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
+		if (I915_SQS_IS_QUEUED(node))
+			queued++;
+		else if (I915_SQS_IS_FLYING(node))
+			flying++;
+		else if (I915_SQS_IS_COMPLETE(node))
+			continue;
+
+		if (i915_compare_seqno_values(node->params.seqno, min_seqno) < 0)
+			min_seqno = node->params.seqno;
+	}
+
+	INIT_LIST_HEAD(&remove);
+	list_for_each_entry_safe(node, node_next, &scheduler->node_queue[ring->id], link) {
+		/*
+		 * Only remove completed nodes which have a lower seqno than
+		 * all pending nodes. While there is the possibility of the
+		 * ring's seqno counting backwards, all higher buffers must
+		 * be remembered so that the 'i915_seqno_passed()' test can
+		 * report that they have in fact passed.
+		 */
+		if (!I915_SQS_IS_COMPLETE(node))
+			continue;
+
+		if (i915_compare_seqno_values(node->params.seqno, min_seqno) > 0)
+			continue;
+
+		list_del(&node->link);
+		list_add(&node->link, &remove);
+
+		/* Strip the dependency info while the mutex is still locked */
+		i915_scheduler_remove_dependent(scheduler, node);
+
+		continue;
+	}
+
+	/*
+	 * No idea why but this seems to cause problems occasionally.
+	 * Note that the 'irq_put' code is internally reference counted
+	 * and spin_locked so it should be safe to call.
+	 */
+	/*if ((scheduler->flags[ring->id] & i915_sf_interrupts_enabled) &&
+	    (first_flight[ring->id] == NULL)) {
+		ring->irq_put(ring);
+		scheduler->flags[ring->id] &= ~i915_sf_interrupts_enabled;
+	}*/
+
+	/* Launch more packets now? */
+	do_submit = (queued > 0) && (flying < scheduler->min_flying);
+
+	spin_unlock_irqrestore(&scheduler->lock, flags);
+
+	if (do_submit)
+		ret = i915_scheduler_submit(ring, true);
+
+	while (!list_empty(&remove)) {
+		node = list_first_entry(&remove, typeof(*node), link);
+		list_del(&node->link);
+
+		/* Release the locked buffers: */
+		for (i = 0; i < node->num_objs; i++) {
+			drm_gem_object_unreference(
+					    &node->saved_objects[i].obj->base);
+		}
+		kfree(node->saved_objects);
+
+		/* Context too: */
+		if (node->params.ctx)
+			i915_gem_context_unreference(node->params.ctx);
+
+		/* And anything else owned by the node: */
+		kfree(node->params.cliprects);
+		kfree(node->dep_list);
+		kfree(node);
+	}
+
+	return ret;
 }
 
 int i915_scheduler_flush_seqno(struct intel_engine_cs *ring, bool is_locked,
 			       uint32_t seqno)
 {
-	/* Do stuff... */
+	struct i915_scheduler_queue_entry  *node;
+	struct drm_i915_private            *dev_priv;
+	struct i915_scheduler              *scheduler;
+	unsigned long       flags;
+	int                 flush_count = 0;
 
-	return 0;
+	if (!ring)
+		return -EINVAL;
+
+	dev_priv  = ring->dev->dev_private;
+	scheduler = dev_priv->scheduler;
+
+	if (!scheduler)
+		return 0;
+
+	BUG_ON(is_locked && (scheduler->flags[ring->id] & i915_sf_submitting));
+
+	if (list_empty(&scheduler->node_queue[ring->id]))
+		return 0;
+
+	spin_lock_irqsave(&scheduler->lock, flags);
+
+	i915_scheduler_priority_bump_clear(scheduler, ring);
+
+	list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
+		if (!I915_SQS_IS_QUEUED(node))
+			continue;
+
+		if (node->params.seqno != seqno)
+			continue;
+
+		flush_count += i915_scheduler_priority_bump(scheduler,
+					node, scheduler->priority_level_max);
+	}
+
+	spin_unlock_irqrestore(&scheduler->lock, flags);
+
+	if (flush_count) {
+		DRM_DEBUG_SCHED("<%s> Bumped %d entries\n", ring->name, flush_count);
+		flush_count = i915_scheduler_submit_max_priority(ring, is_locked);
+	}
+
+	return flush_count;
 }
 
 int i915_scheduler_flush(struct intel_engine_cs *ring, bool is_locked)
 {
-	/* Do stuff... */
+	struct i915_scheduler_queue_entry *node;
+	struct drm_i915_private           *dev_priv;
+	struct i915_scheduler             *scheduler;
+	unsigned long       flags;
+	bool        found;
+	int         ret;
+	uint32_t    count = 0;
+
+	if (!ring)
+		return -EINVAL;
+
+	dev_priv  = ring->dev->dev_private;
+	scheduler = dev_priv->scheduler;
+
+	if (!scheduler)
+		return 0;
+
+	BUG_ON(is_locked && (scheduler->flags[ring->id] & i915_sf_submitting));
+
+	do {
+		found = false;
+		spin_lock_irqsave(&scheduler->lock, flags);
+		list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
+			if (!I915_SQS_IS_QUEUED(node))
+				continue;
+
+			found = true;
+			break;
+		}
+		spin_unlock_irqrestore(&scheduler->lock, flags);
+
+		if (found) {
+			ret = i915_scheduler_submit(ring, is_locked);
+			if (ret < 0)
+				return ret;
+
+			count += ret;
+		}
+	} while (found);
+
+	return count;
+}
+
+void i915_scheduler_priority_bump_clear(struct i915_scheduler *scheduler,
+					struct intel_engine_cs *ring)
+{
+	struct i915_scheduler_queue_entry *node;
+	int i;
+
+	/*
+	 * Ensure circular dependencies don't cause problems and that a bump
+	 * by object usage only bumps each using buffer once:
+	 */
+	for (i = 0; i < I915_NUM_RINGS; i++) {
+		list_for_each_entry(node, &scheduler->node_queue[i], link)
+			node->bumped = false;
+	}
+}
+
+int i915_scheduler_priority_bump(struct i915_scheduler *scheduler,
+				 struct i915_scheduler_queue_entry *target,
+				 uint32_t bump)
+{
+	uint32_t new_priority;
+	int      i, count;
+
+	if (target->priority >= scheduler->priority_level_max)
+		return 1;
+
+	if (target->bumped)
+		return 0;
+
+	new_priority = target->priority + bump;
+	if ((new_priority <= target->priority) ||
+	    (new_priority > scheduler->priority_level_max))
+		target->priority = scheduler->priority_level_max;
+	else
+		target->priority = new_priority;
+
+	count = 1;
+	target->bumped = true;
+
+	for (i = 0; i < target->num_deps; i++) {
+		if (!target->dep_list[i])
+			continue;
+
+		if (target->dep_list[i]->bumped)
+			continue;
+
+		count += i915_scheduler_priority_bump(scheduler,
+						      target->dep_list[i],
+						      bump);
+	}
+
+	return count;
+}
+
+int i915_scheduler_submit_max_priority(struct intel_engine_cs *ring,
+				       bool is_locked)
+{
+	struct i915_scheduler_queue_entry  *node;
+	struct drm_i915_private            *dev_priv = ring->dev->dev_private;
+	struct i915_scheduler              *scheduler = dev_priv->scheduler;
+	unsigned long	flags;
+	int             ret, count = 0;
+	bool            found;
+
+	do {
+		found = false;
+		spin_lock_irqsave(&scheduler->lock, flags);
+		list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
+			if (!I915_SQS_IS_QUEUED(node))
+				continue;
+
+			if (node->priority < scheduler->priority_level_max)
+				continue;
+
+			found = true;
+			break;
+		}
+		spin_unlock_irqrestore(&scheduler->lock, flags);
+
+		if (!found)
+			break;
+
+		ret = i915_scheduler_submit(ring, is_locked);
+		if (ret < 0)
+			return ret;
+
+		count += ret;
+	} while (found);
+
+	return count;
+}
+
+static int i915_scheduler_pop_from_queue_locked(struct intel_engine_cs *ring,
+				    struct i915_scheduler_queue_entry **pop_node,
+				    unsigned long *flags)
+{
+	struct drm_i915_private            *dev_priv = ring->dev->dev_private;
+	struct i915_scheduler              *scheduler = dev_priv->scheduler;
+	struct i915_scheduler_queue_entry  *best;
+	struct i915_scheduler_queue_entry  *node;
+	int     ret;
+	int     i;
+	bool	any_queued;
+	bool	has_local, has_remote, only_remote;
+
+	*pop_node = NULL;
+	ret = -ENODATA;
+
+	any_queued = false;
+	only_remote = false;
+	best = NULL;
+
+	list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
+		if (!I915_SQS_IS_QUEUED(node))
+			continue;
+		any_queued = true;
+
+		has_local  = false;
+		has_remote = false;
+		for (i = 0; i < node->num_deps; i++) {
+			if (!i915_scheduler_is_dependency_valid(node, i))
+				continue;
+
+			if (node->dep_list[i]->params.ring == node->params.ring)
+				has_local = true;
+			else
+				has_remote = true;
+		}
+
+		if (has_remote && !has_local)
+			only_remote = true;
+
+		if (!has_local && !has_remote) {
+			if (!best ||
+			    (node->priority > best->priority))
+				best = node;
+		}
+	}
+
+	if (best) {
+		list_del(&best->link);
+
+		INIT_LIST_HEAD(&best->link);
+		best->status  = i915_sqs_none;
+
+		ret = 0;
+	} else {
+		/* Can only get here if:
+		 * (a) there are no buffers in the queue
+		 * (b) all queued buffers are dependent on other buffers
+		 *     e.g. on a buffer that is in flight on a different ring
+		 */
+		if (only_remote) {
+			/* The only dependent buffers are on another ring. */
+			ret = -EAGAIN;
+		} else if (any_queued) {
+			/* It seems that something has gone horribly wrong! */
+			DRM_ERROR("Broken dependency tracking on ring %d!\n",
+				  (int) ring->id);
+		}
+	}
+
+	/* i915_scheduler_dump_queue_pop(ring, best); */
+
+	*pop_node = best;
+	return ret;
+}
+
+int i915_scheduler_submit(struct intel_engine_cs *ring, bool was_locked)
+{
+	struct drm_device   *dev = ring->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+	struct i915_scheduler_queue_entry  *node;
+	unsigned long       flags;
+	int                 ret = 0, count = 0;
+
+	if (!was_locked) {
+		ret = i915_mutex_lock_interruptible(dev);
+		if (ret)
+			return ret;
+	}
+
+	BUG_ON(!mutex_is_locked(&dev->struct_mutex));
+
+	spin_lock_irqsave(&scheduler->lock, flags);
+
+	/* First time around, complain if anything unexpected occurs: */
+	ret = i915_scheduler_pop_from_queue_locked(ring, &node, &flags);
+	if (ret) {
+		spin_unlock_irqrestore(&scheduler->lock, flags);
+
+		if (!was_locked)
+			mutex_unlock(&dev->struct_mutex);
+
+		return ret;
+	}
+
+	do {
+		BUG_ON(!node);
+		BUG_ON(node->params.ring != ring);
+		BUG_ON(node->status != i915_sqs_none);
+		count++;
+
+		/* The call to pop above will have removed the node from the
+		 * list. So add it back in and mark it as in flight. */
+		i915_scheduler_fly_node(node);
+
+		scheduler->flags[ring->id] |= i915_sf_submitting;
+		spin_unlock_irqrestore(&scheduler->lock, flags);
+		ret = i915_gem_do_execbuffer_final(&node->params);
+		spin_lock_irqsave(&scheduler->lock, flags);
+		scheduler->flags[ring->id] &= ~i915_sf_submitting;
+
+		if (ret) {
+			bool requeue = false;
+
+			/* Oh dear! Either the node is broken or the ring is
+			 * busy. So need to kill the node or requeue it and try
+			 * again later as appropriate. */
+
+			switch (-ret) {
+			case EAGAIN:
+			case EBUSY:
+			case EIO:
+			case ENOMEM:
+			case ERESTARTSYS:
+				/* Supposedly recoverable errors. */
+				requeue = true;
+			break;
+
+			case ENODEV:
+			case ENOENT:
+				/* Fatal errors. Kill the node. */
+			break;
+
+			default:
+				DRM_DEBUG_SCHED("<%s> Got unexpected error from execbuff_final(): %d!\n",
+						ring->name, ret);
+				/* Assume it is recoverable and hope for the best. */
+				requeue = true;
+			break;
+			}
+
+			if (requeue) {
+				i915_scheduler_node_requeue(node);
+				/* No point spinning if the ring is currently
+				 * unavailable so just give up and come back
+				 * later. */
+				break;
+			} else
+				i915_scheduler_node_kill(node);
+		}
+
+		/* Keep launching until the sky is sufficiently full. */
+		if (i915_scheduler_count_flying(scheduler, ring) >=
+						scheduler->min_flying)
+			break;
+
+		ret = i915_scheduler_pop_from_queue_locked(ring, &node, &flags);
+	} while (ret == 0);
+
+	spin_unlock_irqrestore(&scheduler->lock, flags);
+
+	if (!was_locked)
+		mutex_unlock(&dev->struct_mutex);
+
+	/* Don't complain about not being able to submit extra entries */
+	if (ret == -ENODATA)
+		ret = 0;
+
+	return (ret < 0) ? ret : count;
+}
+
+int i915_scheduler_remove_dependent(struct i915_scheduler *scheduler,
+				    struct i915_scheduler_queue_entry *remove)
+{
+	struct i915_scheduler_queue_entry  *node;
+	int     i, r;
+	int     count = 0;
+
+	for (i = 0; i < remove->num_deps; i++)
+		if ((remove->dep_list[i]) &&
+		    (!I915_SQS_IS_COMPLETE(remove->dep_list[i])))
+			count++;
+	BUG_ON(count);
+
+	for (r = 0; r < I915_NUM_RINGS; r++) {
+		list_for_each_entry(node, &scheduler->node_queue[r], link) {
+			for (i = 0; i < node->num_deps; i++) {
+				if (node->dep_list[i] != remove)
+					continue;
+
+				node->dep_list[i] = NULL;
+			}
+		}
+	}
 
 	return 0;
 }
@@ -135,17 +949,25 @@ int i915_scheduler_flush(struct intel_engine_cs *ring, bool is_locked)
 bool i915_scheduler_is_seqno_in_flight(struct intel_engine_cs *ring,
 			       uint32_t seqno, bool *completed)
 {
-	struct drm_i915_private *dev_priv = ring->dev->dev_private;
-	struct i915_scheduler   *scheduler = dev_priv->scheduler;
-	bool                    found = false;
-	unsigned long           flags;
+	struct i915_scheduler_queue_entry  *node;
+	struct drm_i915_private            *dev_priv = ring->dev->dev_private;
+	struct i915_scheduler              *scheduler = dev_priv->scheduler;
+	bool            found = false;
+	unsigned long   flags;
 
 	if (!scheduler)
 		return false;
 
 	spin_lock_irqsave(&scheduler->lock, flags);
 
-	/* Do stuff... */
+	list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
+		if (node->params.seqno != seqno)
+			continue;
+
+		found = true;
+		*completed = I915_SQS_IS_COMPLETE(node);
+		break;
+	}
 
 	spin_unlock_irqrestore(&scheduler->lock, flags);
 
@@ -154,20 +976,73 @@ bool i915_scheduler_is_seqno_in_flight(struct intel_engine_cs *ring,
 
 int i915_scheduler_closefile(struct drm_device *dev, struct drm_file *file)
 {
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+	struct i915_scheduler_queue_entry  *node;
+	struct drm_i915_private            *dev_priv = dev->dev_private;
+	struct i915_scheduler              *scheduler = dev_priv->scheduler;
+	struct intel_engine_cs  *ring;
+	int                     i, ret;
+	uint32_t                seqno;
+	unsigned long           flags;
+	bool                    found;
 
 	if (!scheduler)
 		return 0;
 
-	/* Do stuff... */
+	for_each_ring(ring, dev_priv, i) {
+		do {
+			spin_lock_irqsave(&scheduler->lock, flags);
+
+			found = false;
+			list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
+				if (I915_SQS_IS_COMPLETE(node))
+					continue;
+
+				if (node->params.file != file)
+					continue;
+
+				found = true;
+				seqno = node->params.seqno;
+				break;
+			}
+
+			spin_unlock_irqrestore(&scheduler->lock, flags);
+
+			if (found) {
+				do {
+					mutex_lock(&dev->struct_mutex);
+					ret = i915_wait_seqno(ring, seqno);
+					mutex_unlock(&dev->struct_mutex);
+				} while (ret == -EAGAIN);
+			}
+		} while (found);
+	}
+
+	spin_lock_irqsave(&scheduler->lock, flags);
+	for_each_ring(ring, dev_priv, i) {
+		list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
+			if (node->params.file != file)
+				continue;
+
+			WARN_ON(!I915_SQS_IS_COMPLETE(node));
+
+			node->params.file = NULL;
+		}
+	}
+	spin_unlock_irqrestore(&scheduler->lock, flags);
 
 	return 0;
 }
 
 bool i915_scheduler_is_idle(struct intel_engine_cs *ring)
 {
-	/* Do stuff... */
+	struct i915_scheduler_queue_entry *node;
+	struct drm_device       *dev = ring->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+
+	list_for_each_entry(node, &scheduler->node_queue[ring->id], link)
+		if (!I915_SQS_IS_COMPLETE(node))
+			return false;
 
 	return true;
 }
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 6dd4fea..f93d57d 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -47,14 +47,38 @@ struct i915_execbuffer_params {
 	uint32_t                        scheduler_index;
 };
 
+enum i915_scheduler_queue_status {
+	/* Limbo, between other states: */
+	i915_sqs_none = 0,
+	/* Not yet submitted to hardware: */
+	i915_sqs_queued,
+	/* Sent to hardware for processing: */
+	i915_sqs_flying,
+	/* Finished processing on the hardware: */
+	i915_sqs_complete,
+	/* Limit value for use with arrays/loops */
+	i915_sqs_MAX
+};
+
+#define I915_SQS_IS_QUEUED(node)	(((node)->status == i915_sqs_queued))
+#define I915_SQS_IS_FLYING(node)	(((node)->status == i915_sqs_flying))
+#define I915_SQS_IS_COMPLETE(node)	((node)->status == i915_sqs_complete)
+
 struct i915_scheduler_obj_entry {
 	struct drm_i915_gem_object          *obj;
 };
 
 struct i915_scheduler_queue_entry {
 	struct i915_execbuffer_params       params;
+	uint32_t                            priority;
 	struct i915_scheduler_obj_entry     *saved_objects;
 	int                                 num_objs;
+	bool                                bumped;
+	struct i915_scheduler_queue_entry   **dep_list;
+	int                                 num_deps;
+	enum i915_scheduler_queue_status    status;
+	struct timespec                     stamp;
+	struct list_head                    link;
 };
 
 #ifdef CONFIG_DRM_I915_SCHEDULER
@@ -79,21 +103,48 @@ bool        i915_scheduler_is_idle(struct intel_engine_cs *ring);
 #ifdef CONFIG_DRM_I915_SCHEDULER
 
 struct i915_scheduler {
-	uint32_t    flags[I915_NUM_RINGS];
-	spinlock_t  lock;
-	uint32_t    index;
+	struct list_head    node_queue[I915_NUM_RINGS];
+	uint32_t            flags[I915_NUM_RINGS];
+	spinlock_t          lock;
+	uint32_t            index;
+
+	/* Tuning parameters: */
+	uint32_t            priority_level_max;
+	uint32_t            priority_level_preempt;
+	uint32_t            min_flying;
+};
+
+/* Flag bits for i915_scheduler::flags */
+enum {
+	i915_sf_interrupts_enabled  = (1 << 0),
+	i915_sf_submitting          = (1 << 1),
 };
 
 /* Options for 'scheduler_override' module parameter: */
 enum {
-	i915_so_normal              = 0,
+	i915_so_direct_submit       = (1 << 0),
 };
 
+bool        i915_scheduler_is_busy(struct intel_engine_cs *ring);
+int         i915_scheduler_fly_node(struct i915_scheduler_queue_entry *node);
 int         i915_scheduler_fly_seqno(struct intel_engine_cs *ring, uint32_t seqno);
 int         i915_scheduler_remove(struct intel_engine_cs *ring);
+int         i915_scheduler_remove_dependent(struct i915_scheduler *scheduler,
+				struct i915_scheduler_queue_entry *remove);
 int         i915_scheduler_flush(struct intel_engine_cs *ring, bool is_locked);
 int         i915_scheduler_flush_seqno(struct intel_engine_cs *ring,
 				       bool is_locked, uint32_t seqno);
+int         i915_scheduler_submit(struct intel_engine_cs *ring,
+				  bool is_locked);
+int         i915_scheduler_submit_max_priority(struct intel_engine_cs *ring,
+					       bool is_locked);
+uint32_t    i915_scheduler_count_flying(struct i915_scheduler *scheduler,
+					struct intel_engine_cs *ring);
+void        i915_scheduler_priority_bump_clear(struct i915_scheduler *scheduler,
+					       struct intel_engine_cs *ring);
+int         i915_scheduler_priority_bump(struct i915_scheduler *scheduler,
+				struct i915_scheduler_queue_entry *target,
+				uint32_t bump);
 bool        i915_scheduler_is_seqno_in_flight(struct intel_engine_cs *ring,
 					      uint32_t seqno, bool *completed);
 
-- 
1.7.9.5


* [RFC 32/44] drm/i915: Added immediate submission override to scheduler
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

To aid with debugging scheduler issues, it can be useful to ensure that all
batch buffers are submitted immediately rather than being queued until later.
This change adds an override flag to the 'scheduler_override' module parameter
to force instant submission.
---
 drivers/gpu/drm/i915/i915_scheduler.c |    7 +++++--
 drivers/gpu/drm/i915/i915_scheduler.h |    1 +
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 1816f1d..71d8db4 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -209,8 +209,11 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 
 	list_add_tail(&node->link, &scheduler->node_queue[ring->id]);
 
-	not_flying = i915_scheduler_count_flying(scheduler, ring) <
-						 scheduler->min_flying;
+	if (i915.scheduler_override & i915_so_submit_on_queue)
+		not_flying = true;
+	else
+		not_flying = i915_scheduler_count_flying(scheduler, ring) <
+							 scheduler->min_flying;
 
 	spin_unlock_irqrestore(&scheduler->lock, flags);
 
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index f93d57d..e824e700 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -123,6 +123,7 @@ enum {
 /* Options for 'scheduler_override' module parameter: */
 enum {
 	i915_so_direct_submit       = (1 << 0),
+	i915_so_submit_on_queue     = (1 << 1),
 };
 
 bool        i915_scheduler_is_busy(struct intel_engine_cs *ring);
-- 
1.7.9.5


* [RFC 33/44] drm/i915: Added trace points to scheduler
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

Added trace points to the scheduler to track batch buffers being queued, sent
to the hardware, requeued, completing and being removed, together with every
node state transition and the scheduler's interrupt handling.
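For reference, the new events can be consumed through the normal tracing
interface. A small illustrative helper follows; the tracefs paths are
assumptions about the target system and the helper itself is not part of this
patch:

    #include <fcntl.h>
    #include <unistd.h>

    /* Enable one of the new scheduler events and copy the resulting trace
     * to stdout, assuming tracefs is mounted under
     * /sys/kernel/debug/tracing. */
    static int watch_scheduler_queue_events(void)
    {
        const char *enable =
            "/sys/kernel/debug/tracing/events/i915/i915_scheduler_queue/enable";
        char buf[4096];
        ssize_t n;
        int fd;

        fd = open(enable, O_WRONLY);
        if (fd < 0)
            return -1;
        if (write(fd, "1", 1) != 1) {
            close(fd);
            return -1;
        }
        close(fd);

        fd = open("/sys/kernel/debug/tracing/trace_pipe", O_RDONLY);
        if (fd < 0)
            return -1;
        while ((n = read(fd, buf, sizeof(buf))) > 0)
            if (write(STDOUT_FILENO, buf, n) != n)
                break;
        close(fd);

        return 0;
    }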
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |    2 +
 drivers/gpu/drm/i915/i915_scheduler.c      |   31 ++++-
 drivers/gpu/drm/i915/i915_trace.h          |  194 ++++++++++++++++++++++++++++
 3 files changed, 226 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 98cc95e..bf19e02 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1413,6 +1413,8 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	ring->outstanding_lazy_seqno    = 0;
 	ring->preallocated_lazy_request = NULL;
 
+	trace_i915_gem_ring_queue(ring, &qe);
+
 	ret = i915_scheduler_queue_execbuffer(&qe);
 	if (ret)
 		goto err;
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 71d8db4..6d0f4cb 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -87,6 +87,8 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 
 		qe->params.scheduler_index = scheduler->index++;
 
+		trace_i915_scheduler_queue(qe->params.ring, qe);
+
 		scheduler->flags[qe->params.ring->id] |= i915_sf_submitting;
 		ret = i915_gem_do_execbuffer_final(&qe->params);
 		scheduler->flags[qe->params.ring->id] &= ~i915_sf_submitting;
@@ -215,6 +217,9 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 		not_flying = i915_scheduler_count_flying(scheduler, ring) <
 							 scheduler->min_flying;
 
+	trace_i915_scheduler_queue(ring, node);
+	trace_i915_scheduler_node_state_change(ring, node);
+
 	spin_unlock_irqrestore(&scheduler->lock, flags);
 
 	if (not_flying)
@@ -253,6 +258,8 @@ int i915_scheduler_fly_seqno(struct intel_engine_cs *ring, uint32_t seqno)
 	node->stamp        = stamp;
 	node->status       = i915_sqs_none;
 
+	trace_i915_scheduler_node_state_change(ring, node);
+
 	spin_lock_irqsave(&scheduler->lock, flags);
 	ret = i915_scheduler_fly_node(node);
 	spin_unlock_irqrestore(&scheduler->lock, flags);
@@ -279,6 +286,9 @@ int i915_scheduler_fly_node(struct i915_scheduler_queue_entry *node)
 
 	node->status = i915_sqs_flying;
 
+	trace_i915_scheduler_fly(ring, node);
+	trace_i915_scheduler_node_state_change(ring, node);
+
 	if (!(scheduler->flags[ring->id] & i915_sf_interrupts_enabled)) {
 		bool    success = true;
 
@@ -343,6 +353,8 @@ static void i915_scheduler_node_requeue(struct i915_scheduler_queue_entry *node)
 	BUG_ON(!I915_SQS_IS_FLYING(node));
 
 	node->status = i915_sqs_queued;
+	trace_i915_scheduler_unfly(node->params.ring, node);
+	trace_i915_scheduler_node_state_change(node->params.ring, node);
 }
 
 /* Give up on a popped node completely. For example, because it is causing the
@@ -353,6 +365,8 @@ static void i915_scheduler_node_kill(struct i915_scheduler_queue_entry *node)
 	BUG_ON(!I915_SQS_IS_FLYING(node));
 
 	node->status = i915_sqs_complete;
+	trace_i915_scheduler_unfly(node->params.ring, node);
+	trace_i915_scheduler_node_state_change(node->params.ring, node);
 }
 
 /*
@@ -377,13 +391,17 @@ static int i915_scheduler_seqno_complete(struct intel_engine_cs *ring, uint32_t
 	 * if a completed entry is found then there is no need to scan further.
 	 */
 	list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
-		if (I915_SQS_IS_COMPLETE(node))
+		if (I915_SQS_IS_COMPLETE(node)) {
+			trace_i915_scheduler_landing(ring, seqno, node);
 			goto done;
+		}
 
 		if (seqno == node->params.seqno)
 			break;
 	}
 
+	trace_i915_scheduler_landing(ring, seqno, node);
+
 	/*
 	 * NB: Lots of extra seqnos get added to the ring to track things
 	 * like cache flushes and page flips. So don't complain about if
@@ -405,6 +423,7 @@ static int i915_scheduler_seqno_complete(struct intel_engine_cs *ring, uint32_t
 
 		/* Node was in flight so mark it as complete. */
 		node->status = i915_sqs_complete;
+		trace_i915_scheduler_node_state_change(ring, node);
 	}
 
 	/* Should submit new work here if flight list is empty but the DRM
@@ -425,6 +444,8 @@ int i915_scheduler_handle_IRQ(struct intel_engine_cs *ring)
 
 	seqno = ring->get_seqno(ring, false);
 
+	trace_i915_scheduler_irq(ring, seqno);
+
 	if (i915.scheduler_override & i915_so_direct_submit)
 		return 0;
 
@@ -526,6 +547,8 @@ int i915_scheduler_remove(struct intel_engine_cs *ring)
 	/* Launch more packets now? */
 	do_submit = (queued > 0) && (flying < scheduler->min_flying);
 
+	trace_i915_scheduler_remove(ring, min_seqno, do_submit);
+
 	spin_unlock_irqrestore(&scheduler->lock, flags);
 
 	if (do_submit)
@@ -535,6 +558,8 @@ int i915_scheduler_remove(struct intel_engine_cs *ring)
 		node = list_first_entry(&remove, typeof(*node), link);
 		list_del(&node->link);
 
+		trace_i915_scheduler_destroy(ring, node);
+
 		/* Release the locked buffers: */
 		for (i = 0; i < node->num_objs; i++) {
 			drm_gem_object_unreference(
@@ -793,6 +818,8 @@ static int i915_scheduler_pop_from_queue_locked(struct intel_engine_cs *ring,
 		INIT_LIST_HEAD(&best->link);
 		best->status  = i915_sqs_none;
 
+		trace_i915_scheduler_node_state_change(ring, best);
+
 		ret = 0;
 	} else {
 		/* Can only get here if:
@@ -812,6 +839,8 @@ static int i915_scheduler_pop_from_queue_locked(struct intel_engine_cs *ring,
 
 	/* i915_scheduler_dump_queue_pop(ring, best); */
 
+	trace_i915_scheduler_pop_from_queue(ring, best);
+
 	*pop_node = best;
 	return ret;
 }
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index f5aa006..bea2a49 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -9,6 +9,7 @@
 #include "i915_drv.h"
 #include "intel_drv.h"
 #include "intel_ringbuffer.h"
+#include "i915_scheduler.h"
 
 #undef TRACE_SYSTEM
 #define TRACE_SYSTEM i915
@@ -587,6 +588,199 @@ TRACE_EVENT(intel_gpu_freq_change,
 	    TP_printk("new_freq=%u", __entry->freq)
 );
 
+TRACE_EVENT(i915_scheduler_queue,
+	    TP_PROTO(struct intel_engine_cs *ring,
+		     struct i915_scheduler_queue_entry *node),
+	    TP_ARGS(ring, node),
+
+	    TP_STRUCT__entry(
+			     __field(u32, ring)
+			     __field(u32, seqno)
+			     ),
+
+	    TP_fast_assign(
+			   __entry->ring      = ring->id;
+			   __entry->seqno     = node ? node->params.seqno : 0;
+			   ),
+
+	    TP_printk("ring=%d, seqno=%d",
+		      __entry->ring, __entry->seqno)
+);
+
+TRACE_EVENT(i915_scheduler_fly,
+	    TP_PROTO(struct intel_engine_cs *ring,
+		     struct i915_scheduler_queue_entry *node),
+	    TP_ARGS(ring, node),
+
+	    TP_STRUCT__entry(
+			     __field(u32, ring)
+			     __field(u32, seqno)
+			     ),
+
+	    TP_fast_assign(
+			   __entry->ring      = ring->id;
+			   __entry->seqno     = node ? node->params.seqno : 0;
+			   ),
+
+	    TP_printk("ring=%d, seqno=%d",
+		      __entry->ring, __entry->seqno)
+);
+
+TRACE_EVENT(i915_scheduler_unfly,
+	    TP_PROTO(struct intel_engine_cs *ring,
+		     struct i915_scheduler_queue_entry *node),
+	    TP_ARGS(ring, node),
+
+	    TP_STRUCT__entry(
+			     __field(u32, ring)
+			     __field(u32, seqno)
+			     ),
+
+	    TP_fast_assign(
+			   __entry->ring      = ring->id;
+			   __entry->seqno     = node ? node->params.seqno : 0;
+			   ),
+
+	    TP_printk("ring=%d, seqno=%d",
+		      __entry->ring, __entry->seqno)
+);
+
+TRACE_EVENT(i915_scheduler_landing,
+	    TP_PROTO(struct intel_engine_cs *ring, u32 seqno,
+		     struct i915_scheduler_queue_entry *node),
+	    TP_ARGS(ring, seqno, node),
+
+	    TP_STRUCT__entry(
+			     __field(u32, ring)
+			     __field(u32, seqno)
+			     __field(u32, status)
+			     ),
+
+	    TP_fast_assign(
+			   __entry->ring   = ring->id;
+			   __entry->seqno  = seqno;
+			   __entry->status = node ? node->status : ~0U;
+			   ),
+
+	    TP_printk("ring=%d, seqno=%d, status=%d",
+		      __entry->ring, __entry->seqno, __entry->status)
+);
+
+TRACE_EVENT(i915_scheduler_remove,
+	    TP_PROTO(struct intel_engine_cs *ring,
+		     u32 min_seqno, bool do_submit),
+	    TP_ARGS(ring, min_seqno, do_submit),
+
+	    TP_STRUCT__entry(
+			     __field(u32, ring)
+			     __field(u32, min_seqno)
+			     __field(bool, do_submit)
+			     ),
+
+	    TP_fast_assign(
+			   __entry->ring      = ring->id;
+			   __entry->min_seqno = min_seqno;
+			   __entry->do_submit = do_submit;
+			   ),
+
+	    TP_printk("ring=%d, min_seqno = %d, do_submit=%d",
+		      __entry->ring, __entry->min_seqno, __entry->do_submit)
+);
+
+TRACE_EVENT(i915_scheduler_destroy,
+	    TP_PROTO(struct intel_engine_cs *ring,
+		     struct i915_scheduler_queue_entry *node),
+	    TP_ARGS(ring, node),
+
+	    TP_STRUCT__entry(
+			     __field(u32, ring)
+			     __field(u32, seqno)
+			     ),
+
+	    TP_fast_assign(
+			   __entry->ring      = ring->id;
+			   __entry->seqno     = node ? node->params.seqno : 0;
+			   ),
+
+	    TP_printk("ring=%d, seqno=%d",
+		      __entry->ring, __entry->seqno)
+);
+
+TRACE_EVENT(i915_scheduler_pop_from_queue,
+	    TP_PROTO(struct intel_engine_cs *ring,
+		     struct i915_scheduler_queue_entry *node),
+	    TP_ARGS(ring, node),
+
+	    TP_STRUCT__entry(
+			     __field(u32, ring)
+			     __field(u32, seqno)
+			     ),
+
+	    TP_fast_assign(
+			   __entry->ring   = ring->id;
+			   __entry->seqno  = node ? node->params.seqno : 0;
+			   ),
+
+	    TP_printk("ring=%d, seqno=%d",
+		      __entry->ring, __entry->seqno)
+);
+
+TRACE_EVENT(i915_scheduler_node_state_change,
+	    TP_PROTO(struct intel_engine_cs *ring,
+		     struct i915_scheduler_queue_entry *node),
+	    TP_ARGS(ring, node),
+
+	    TP_STRUCT__entry(
+			     __field(u32, ring)
+			     __field(u32, seqno)
+			     __field(u32, status)
+			     ),
+
+	    TP_fast_assign(
+			   __entry->ring   = ring->id;
+			   __entry->seqno  = node->params.seqno;
+			   __entry->status = node->status;
+			   ),
+
+	    TP_printk("ring=%d, seqno=%d, status=%d",
+		      __entry->ring, __entry->seqno, __entry->status)
+);
+
+TRACE_EVENT(i915_scheduler_irq,
+	    TP_PROTO(struct intel_engine_cs *ring, uint32_t seqno),
+	    TP_ARGS(ring, seqno),
+
+	    TP_STRUCT__entry(
+			     __field(u32, ring)
+			     __field(u32, seqno)
+			     ),
+
+	    TP_fast_assign(
+			   __entry->ring   = ring->id;
+			   __entry->seqno  = seqno;
+			   ),
+
+	    TP_printk("ring=%d, seqno=%d", __entry->ring, __entry->seqno)
+);
+
+TRACE_EVENT(i915_gem_ring_queue,
+	    TP_PROTO(struct intel_engine_cs *ring,
+		     struct i915_scheduler_queue_entry *node),
+	    TP_ARGS(ring, node),
+
+	    TP_STRUCT__entry(
+			     __field(u32, ring)
+			     __field(u32, seqno)
+			     ),
+
+	    TP_fast_assign(
+			   __entry->ring   = ring->id;
+			   __entry->seqno  = node->params.seqno;
+			   ),
+
+	    TP_printk("ring=%d, seqno=%d", __entry->ring, __entry->seqno)
+);
+
 #endif /* _I915_TRACE_H_ */
 
 /* This part must be outside protection */
-- 
1.7.9.5


* [RFC 34/44] drm/i915: Added scheduler queue throttling by DRM file handle
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The scheduler decouples the submission of batch buffers to the driver from their
subsequent submission to the hardware. This means that an application which
continuously submits buffers as fast as it can could potentially flood the
driver. To prevent this, the driver now tracks how many buffers are in progress
per DRM file handle (queued in software or executing in hardware) and limits
this to a given (tunable) number. If that limit is exceeded then the
execbuffer() call returns -EAGAIN, which prevents the scheduler's queue from
growing arbitrarily large.
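From user space the throttle simply shows up as an -EAGAIN error from the
execbuffer ioctl. A minimal sketch of how a client might cope, assuming the
standard DRM_IOCTL_I915_GEM_EXECBUFFER2 path and an already-populated
execbuffer2 structure (the helper and its back-off period are illustrative,
not part of this patch):

    #include <errno.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <drm/i915_drm.h>

    /* Submit a batch, backing off briefly whenever the per-file queue is
     * full and the kernel returns -EAGAIN. */
    static int submit_with_throttle(int drm_fd,
                                    struct drm_i915_gem_execbuffer2 *execbuf)
    {
        for (;;) {
            if (ioctl(drm_fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, execbuf) == 0)
                return 0;

            if (errno == EINTR)
                continue;          /* interrupted, retry immediately */
            if (errno != EAGAIN)
                return -errno;     /* genuine failure */

            usleep(1000);          /* queue full, let the GPU drain a little */
        }
    }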
---
 drivers/gpu/drm/i915/i915_drv.h            |    2 ++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   12 +++++++++++
 drivers/gpu/drm/i915/i915_scheduler.c      |   32 ++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_scheduler.h      |    5 +++++
 4 files changed, 51 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 4d52c67..872e869 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1785,6 +1785,8 @@ struct drm_i915_file_private {
 
 	atomic_t rps_wait_boost;
 	struct  intel_engine_cs *bsd_ring;
+
+	u32 scheduler_queue_length;
 };
 
 /*
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index bf19e02..3227a39 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1614,6 +1614,12 @@ i915_gem_execbuffer(struct drm_device *dev, void *data,
 		return -EINVAL;
 	}
 
+#ifdef CONFIG_DRM_I915_SCHEDULER
+	/* Throttle batch requests per device file */
+	if (i915_scheduler_file_queue_is_full(file))
+		return -EAGAIN;
+#endif
+
 	/* Copy in the exec list from userland */
 	exec_list = drm_malloc_ab(sizeof(*exec_list), args->buffer_count);
 	exec2_list = drm_malloc_ab(sizeof(*exec2_list), args->buffer_count);
@@ -1702,6 +1708,12 @@ i915_gem_execbuffer2(struct drm_device *dev, void *data,
 		return -EINVAL;
 	}
 
+#ifdef CONFIG_DRM_I915_SCHEDULER
+	/* Throttle batch requests per device file */
+	if (i915_scheduler_file_queue_is_full(file))
+		return -EAGAIN;
+#endif
+
 	exec2_list = kmalloc(sizeof(*exec2_list)*args->buffer_count,
 			     GFP_TEMPORARY | __GFP_NOWARN | __GFP_NORETRY);
 	if (exec2_list == NULL)
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 6d0f4cb..6782249 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -61,6 +61,7 @@ int i915_scheduler_init(struct drm_device *dev)
 	scheduler->priority_level_max     = ~0U;
 	scheduler->priority_level_preempt = 900;
 	scheduler->min_flying             = 2;
+	scheduler->file_queue_max         = 64;
 
 	dev_priv->scheduler = scheduler;
 
@@ -211,6 +212,8 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 
 	list_add_tail(&node->link, &scheduler->node_queue[ring->id]);
 
+	i915_scheduler_file_queue_inc(node->params.file);
+
 	if (i915.scheduler_override & i915_so_submit_on_queue)
 		not_flying = true;
 	else
@@ -530,6 +533,12 @@ int i915_scheduler_remove(struct intel_engine_cs *ring)
 		/* Strip the dependency info while the mutex is still locked */
 		i915_scheduler_remove_dependent(scheduler, node);
 
+		/* Likewise clean up the file descriptor before it might disappear. */
+		if (node->params.file) {
+			i915_scheduler_file_queue_dec(node->params.file);
+			node->params.file = NULL;
+		}
+
 		continue;
 	}
 
@@ -1079,6 +1088,29 @@ bool i915_scheduler_is_idle(struct intel_engine_cs *ring)
 	return true;
 }
 
+bool i915_scheduler_file_queue_is_full(struct drm_file *file)
+{
+	struct drm_i915_file_private *file_priv = file->driver_priv;
+	struct drm_i915_private      *dev_priv  = file_priv->dev_priv;
+	struct i915_scheduler        *scheduler = dev_priv->scheduler;
+
+	return (file_priv->scheduler_queue_length >= scheduler->file_queue_max);
+}
+
+void i915_scheduler_file_queue_inc(struct drm_file *file)
+{
+	struct drm_i915_file_private *file_priv = file->driver_priv;
+
+	file_priv->scheduler_queue_length++;
+}
+
+void i915_scheduler_file_queue_dec(struct drm_file *file)
+{
+	struct drm_i915_file_private *file_priv = file->driver_priv;
+
+	file_priv->scheduler_queue_length--;
+}
+
 #else   /* CONFIG_DRM_I915_SCHEDULER */
 
 int i915_scheduler_init(struct drm_device *dev)
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index e824e700..78a92c9 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -112,6 +112,7 @@ struct i915_scheduler {
 	uint32_t            priority_level_max;
 	uint32_t            priority_level_preempt;
 	uint32_t            min_flying;
+	uint32_t            file_queue_max;
 };
 
 /* Flag bits for i915_scheduler::flags */
@@ -149,6 +150,10 @@ int         i915_scheduler_priority_bump(struct i915_scheduler *scheduler,
 bool        i915_scheduler_is_seqno_in_flight(struct intel_engine_cs *ring,
 					      uint32_t seqno, bool *completed);
 
+bool i915_scheduler_file_queue_is_full(struct drm_file *file);
+void i915_scheduler_file_queue_inc(struct drm_file *file);
+void i915_scheduler_file_queue_dec(struct drm_file *file);
+
 #endif  /* CONFIG_DRM_I915_SCHEDULER */
 
 int i915_gem_do_execbuffer_final(struct i915_execbuffer_params *params);
-- 
1.7.9.5


* [RFC 35/44] drm/i915: Added debugfs interface to scheduler tuning parameters
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

There are various parameters within the scheduler which can be tuned to improve
performance, reduce memory footprint, etc. This change adds support for altering
these via debugfs.
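As a usage illustration only (the debugfs mount point and DRM card number are
assumptions about the target system, and the helper itself is not part of this
patch), a tuning value could be changed from user space along these lines:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Write a new value to one of the scheduler tuning files, e.g.
     * "i915_scheduler_min_flying", assuming debugfs is mounted at
     * /sys/kernel/debug and the i915 device is DRM card 0. */
    static int set_scheduler_param(const char *name, unsigned int value)
    {
        char path[128], buf[32];
        int fd, len, ret = 0;

        snprintf(path, sizeof(path), "/sys/kernel/debug/dri/0/%s", name);
        len = snprintf(buf, sizeof(buf), "%u\n", value);

        fd = open(path, O_WRONLY);
        if (fd < 0)
            return -1;
        if (write(fd, buf, len) != len)
            ret = -1;
        close(fd);

        return ret;
    }

Note that reading the same files back reports the current values in hex, per
the "0x%llx" format used below.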
---
 drivers/gpu/drm/i915/i915_debugfs.c |  117 +++++++++++++++++++++++++++++++++++
 1 file changed, 117 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 5858cbb..1c20c8c 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -39,6 +39,7 @@
 #include "intel_ringbuffer.h"
 #include <drm/i915_drm.h>
 #include "i915_drv.h"
+#include "i915_scheduler.h"
 
 enum {
 	ACTIVE_LIST,
@@ -983,6 +984,116 @@ DEFINE_SIMPLE_ATTRIBUTE(i915_next_seqno_fops,
 			i915_next_seqno_get, i915_next_seqno_set,
 			"0x%llx\n");
 
+#ifdef CONFIG_DRM_I915_SCHEDULER
+static int
+i915_scheduler_priority_max_get(void *data, u64 *val)
+{
+	struct drm_device       *dev       = data;
+	struct drm_i915_private *dev_priv  = dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+
+	*val = (u64) scheduler->priority_level_max;
+	return 0;
+}
+
+static int
+i915_scheduler_priority_max_set(void *data, u64 val)
+{
+	struct drm_device       *dev       = data;
+	struct drm_i915_private *dev_priv  = dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+
+	scheduler->priority_level_max = (u32) val;
+	return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(i915_scheduler_priority_max_fops,
+			i915_scheduler_priority_max_get,
+			i915_scheduler_priority_max_set,
+			"0x%llx\n");
+
+static int
+i915_scheduler_priority_preempt_get(void *data, u64 *val)
+{
+	struct drm_device       *dev       = data;
+	struct drm_i915_private *dev_priv  = dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+
+	*val = (u64) scheduler->priority_level_preempt;
+	return 0;
+}
+
+static int
+i915_scheduler_priority_preempt_set(void *data, u64 val)
+{
+	struct drm_device       *dev       = data;
+	struct drm_i915_private *dev_priv  = dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+
+	scheduler->priority_level_preempt = (u32) val;
+	return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(i915_scheduler_priority_preempt_fops,
+			i915_scheduler_priority_preempt_get,
+			i915_scheduler_priority_preempt_set,
+			"0x%llx\n");
+
+static int
+i915_scheduler_min_flying_get(void *data, u64 *val)
+{
+	struct drm_device       *dev       = data;
+	struct drm_i915_private *dev_priv  = dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+
+	*val = (u64) scheduler->min_flying;
+	return 0;
+}
+
+static int
+i915_scheduler_min_flying_set(void *data, u64 val)
+{
+	struct drm_device       *dev       = data;
+	struct drm_i915_private *dev_priv  = dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+
+	scheduler->min_flying = (u32) val;
+	return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(i915_scheduler_min_flying_fops,
+			i915_scheduler_min_flying_get,
+			i915_scheduler_min_flying_set,
+			"0x%llx\n");
+
+static int
+i915_scheduler_file_queue_max_get(void *data, u64 *val)
+{
+	struct drm_device       *dev       = data;
+	struct drm_i915_private *dev_priv  = dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+
+	*val = (u64) scheduler->file_queue_max;
+	return 0;
+}
+
+static int
+i915_scheduler_file_queue_max_set(void *data, u64 val)
+{
+	struct drm_device       *dev       = data;
+	struct drm_i915_private *dev_priv  = dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+
+	scheduler->file_queue_max = (u32) val;
+	return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(i915_scheduler_file_queue_max_fops,
+			i915_scheduler_file_queue_max_get,
+			i915_scheduler_file_queue_max_set,
+			"0x%llx\n");
+#endif  /* CONFIG_DRM_I915_SCHEDULER */
+
 static int i915_rstdby_delays(struct seq_file *m, void *unused)
 {
 	struct drm_info_node *node = m->private;
@@ -3834,6 +3945,12 @@ static const struct i915_debugfs_files {
 	{"i915_gem_drop_caches", &i915_drop_caches_fops},
 	{"i915_error_state", &i915_error_state_fops},
 	{"i915_next_seqno", &i915_next_seqno_fops},
+#ifdef CONFIG_DRM_I915_SCHEDULER
+	{"i915_scheduler_priority_max", &i915_scheduler_priority_max_fops},
+	{"i915_scheduler_priority_preempt", &i915_scheduler_priority_preempt_fops},
+	{"i915_scheduler_min_flying", &i915_scheduler_min_flying_fops},
+	{"i915_scheduler_file_queue_max", &i915_scheduler_file_queue_max_fops},
+#endif
 	{"i915_display_crc_ctl", &i915_display_crc_ctl_fops},
 	{"i915_pri_wm_latency", &i915_pri_wm_latency_fops},
 	{"i915_spr_wm_latency", &i915_spr_wm_latency_fops},
-- 
1.7.9.5


* [RFC 36/44] drm/i915: Added debug state dump facilities to scheduler
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

When debugging batch buffer submission issues, it is useful to be able to see
what the current state of the scheduler is. This change adds functions for
decoding the internal scheduler state and reporting it.
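For illustration, a hypothetical call site (only the dump functions themselves
are added by this patch) might look like:

    /* Report the full scheduler state when a hang is suspected. This dumps
     * every ring with per-node detail and dependency lists by setting the
     * i915_sf_dump_* flags internally. */
    static void i915_report_scheduler_on_hang(struct drm_device *dev)
    {
        i915_scheduler_dump_all(dev, "hang check");
    }

A single ring can be dumped with i915_scheduler_dump(), or with
i915_scheduler_dump_locked() if the scheduler lock is already held.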
---
 drivers/gpu/drm/i915/i915_scheduler.c |  255 +++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_scheduler.h |   17 +++
 2 files changed, 272 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 6782249..7c03fb7 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -37,6 +37,101 @@ bool i915_scheduler_is_enabled(struct drm_device *dev)
 
 #ifdef CONFIG_DRM_I915_SCHEDULER
 
+const char *i915_qe_state_str(struct i915_scheduler_queue_entry *node)
+{
+	static char	str[50];
+	char		*ptr = str;
+
+	*(ptr++) = node->bumped ? 'B' : '-';
+
+	*ptr = 0;
+
+	return str;
+}
+
+char i915_scheduler_queue_status_chr(enum i915_scheduler_queue_status status)
+{
+	switch (status) {
+	case i915_sqs_none:
+	return 'N';
+
+	case i915_sqs_queued:
+	return 'Q';
+
+	case i915_sqs_flying:
+	return 'F';
+
+	case i915_sqs_complete:
+	return 'C';
+
+	default:
+	break;
+	}
+
+	return '?';
+}
+
+const char *i915_scheduler_queue_status_str(
+				enum i915_scheduler_queue_status status)
+{
+	static char	str[50];
+
+	switch (status) {
+	case i915_sqs_none:
+	return "None";
+
+	case i915_sqs_queued:
+	return "Queued";
+
+	case i915_sqs_flying:
+	return "Flying";
+
+	case i915_sqs_complete:
+	return "Complete";
+
+	default:
+	break;
+	}
+
+	sprintf(str, "[Unknown_%d!]", status);
+	return str;
+}
+
+const char *i915_scheduler_flag_str(uint32_t flags)
+{
+	static char     str[100];
+	char           *ptr = str;
+
+	*ptr = 0;
+
+#define TEST_FLAG(flag, msg)						\
+	if (flags & (flag)) {						\
+		strcpy(ptr, msg);					\
+		ptr += strlen(ptr);					\
+		flags &= ~(flag);					\
+	}
+
+	TEST_FLAG(i915_sf_interrupts_enabled, "IntOn|");
+	TEST_FLAG(i915_sf_submitting,         "Submitting|");
+	TEST_FLAG(i915_sf_dump_force,         "DumpForce|");
+	TEST_FLAG(i915_sf_dump_details,       "DumpDetails|");
+	TEST_FLAG(i915_sf_dump_dependencies,  "DumpDeps|");
+
+#undef TEST_FLAG
+
+	if (flags) {
+		sprintf(ptr, "Unknown_0x%X!", flags);
+		ptr += strlen(ptr);
+	}
+
+	if (ptr == str)
+		strcpy(str, "-");
+	else
+		ptr[-1] = 0;
+
+	return str;
+};
+
 int i915_scheduler_init(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -589,6 +684,166 @@ int i915_scheduler_remove(struct intel_engine_cs *ring)
 	return ret;
 }
 
+int i915_scheduler_dump_all(struct drm_device *dev, const char *msg)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+	unsigned long   flags;
+	int             ret;
+
+	spin_lock_irqsave(&scheduler->lock, flags);
+	ret = i915_scheduler_dump_all_locked(dev, msg);
+	spin_unlock_irqrestore(&scheduler->lock, flags);
+
+	return ret;
+}
+
+int i915_scheduler_dump_all_locked(struct drm_device *dev, const char *msg)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+	struct intel_engine_cs  *ring;
+	int                     i, r, ret = 0;
+
+	for_each_ring(ring, dev_priv, i) {
+		scheduler->flags[ring->id] |= i915_sf_dump_force   |
+					      i915_sf_dump_details |
+					      i915_sf_dump_dependencies;
+		r = i915_scheduler_dump_locked(ring, msg);
+		if (ret == 0)
+			ret = r;
+	}
+
+	return ret;
+}
+
+int i915_scheduler_dump(struct intel_engine_cs *ring, const char *msg)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+	unsigned long   flags;
+	int             ret;
+
+	spin_lock_irqsave(&scheduler->lock, flags);
+	ret = i915_scheduler_dump_locked(ring, msg);
+	spin_unlock_irqrestore(&scheduler->lock, flags);
+
+	return ret;
+}
+
+int i915_scheduler_dump_locked(struct intel_engine_cs *ring, const char *msg)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+	struct i915_scheduler_queue_entry  *node;
+	int                 flying = 0, queued = 0, complete = 0, other = 0;
+	static int          old_flying = -1, old_queued = -1, old_complete = -1;
+	bool                b_dumped = false, b_dump;
+	char                brkt[2] = { '<', '>' };
+
+	if (!ring)
+		return -EINVAL;
+
+	list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
+		if (I915_SQS_IS_QUEUED(node))
+			queued++;
+		else if (I915_SQS_IS_FLYING(node))
+			flying++;
+		else if (I915_SQS_IS_COMPLETE(node))
+			complete++;
+		else
+			other++;
+	}
+
+	b_dump = (flying != old_flying) ||
+		 (queued != old_queued) ||
+		 (complete != old_complete);
+	if (scheduler->flags[ring->id] & i915_sf_dump_force) {
+		if (!b_dump) {
+			b_dump = true;
+			brkt[0] = '{';
+			brkt[1] = '}';
+		}
+
+		scheduler->flags[ring->id] &= ~i915_sf_dump_force;
+	}
+
+	if (b_dump) {
+		old_flying   = flying;
+		old_queued   = queued;
+		old_complete = complete;
+		DRM_DEBUG_SCHED("<%s> Q:%02d, F:%02d, C:%02d, O:%02d, " \
+				"Flags = %s, OLR = %d %c%s%c\n",
+				ring->name, queued, flying, complete, other,
+				i915_scheduler_flag_str(scheduler->flags[ring->id]),
+				ring->outstanding_lazy_seqno,
+				brkt[0], msg, brkt[1]);
+		b_dumped = true;
+	} /* else
+		DRM_DEBUG_SCHED("<%s> Q:%02d, F:%02d, C:%02d, O:%02d" \
+				", Flags = %s, OLR = %d [%s]\n",
+				ring->name,
+				queued, flying, complete, other,
+				i915_scheduler_flag_str(scheduler->flags[ring->id]),
+				ring->outstanding_lazy_seqno, msg); */
+
+	if (b_dumped && (scheduler->flags[ring->id] & i915_sf_dump_details)) {
+		uint32_t    seqno;
+		int         i, deps;
+		uint32_t    count, counts[i915_sqs_MAX];
+
+		memset(counts, 0x00, sizeof(counts));
+
+		seqno = ring->get_seqno(ring, true);
+		list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
+			if (node->status < i915_sqs_MAX) {
+				count = counts[node->status]++;
+			} else {
+				DRM_DEBUG_SCHED("<%s>   Unknown status: %d!\n",
+						ring->name, node->status);
+				count = -1;
+			}
+
+			deps = 0;
+			for (i = 0; i < node->num_deps; i++)
+				if (i915_scheduler_is_dependency_valid(node, i))
+					deps++;
+
+			DRM_DEBUG_SCHED("<%s>   %c:%02d> index = %d, seqno" \
+					" = %d/%s, deps = %d / %d, %s [pri = " \
+					"%4d]\n", ring->name,
+					i915_scheduler_queue_status_chr(node->status),
+					count,
+					node->params.scheduler_index,
+					node->params.seqno,
+					node->params.ring->name,
+					deps, node->num_deps,
+					i915_qe_state_str(node),
+					node->priority);
+
+			if ((scheduler->flags[ring->id] & i915_sf_dump_dependencies)
+				== 0)
+				continue;
+
+			for (i = 0; i < node->num_deps; i++)
+				if (node->dep_list[i])
+					DRM_DEBUG_SCHED("<%s>       |-%c:" \
+						"%02d%c seqno = %d/%s, %s [pri = %4d]\n",
+						ring->name,
+						i915_scheduler_queue_status_chr(node->dep_list[i]->status),
+						i,
+						i915_scheduler_is_dependency_valid(node, i)
+							? '>' : '#',
+						node->dep_list[i]->params.seqno,
+						node->dep_list[i]->params.ring->name,
+						i915_qe_state_str(node->dep_list[i]),
+						node->dep_list[i]->priority);
+		}
+	}
+
+	return 0;
+}
+
 int i915_scheduler_flush_seqno(struct intel_engine_cs *ring, bool is_locked,
 			       uint32_t seqno)
 {
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 78a92c9..bbfd13c 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -59,6 +59,9 @@ enum i915_scheduler_queue_status {
 	/* Limit value for use with arrays/loops */
 	i915_sqs_MAX
 };
+char i915_scheduler_queue_status_chr(enum i915_scheduler_queue_status status);
+const char *i915_scheduler_queue_status_str(
+				enum i915_scheduler_queue_status status);
 
 #define I915_SQS_IS_QUEUED(node)	(((node)->status == i915_sqs_queued))
 #define I915_SQS_IS_FLYING(node)	(((node)->status == i915_sqs_flying))
@@ -80,6 +83,7 @@ struct i915_scheduler_queue_entry {
 	struct timespec                     stamp;
 	struct list_head                    link;
 };
+const char *i915_qe_state_str(struct i915_scheduler_queue_entry *node);
 
 #ifdef CONFIG_DRM_I915_SCHEDULER
 #   define I915_SCHEDULER_FLUSH_ALL(ring, locked)                            \
@@ -117,9 +121,16 @@ struct i915_scheduler {
 
 /* Flag bits for i915_scheduler::flags */
 enum {
+	/* Internal state */
 	i915_sf_interrupts_enabled  = (1 << 0),
 	i915_sf_submitting          = (1 << 1),
+
+	/* Dump/debug flags */
+	i915_sf_dump_force          = (1 << 8),
+	i915_sf_dump_details        = (1 << 9),
+	i915_sf_dump_dependencies   = (1 << 10),
 };
+const char *i915_scheduler_flag_str(uint32_t flags);
 
 /* Options for 'scheduler_override' module parameter: */
 enum {
@@ -142,6 +153,12 @@ int         i915_scheduler_submit_max_priority(struct intel_engine_cs *ring,
 					       bool is_locked);
 uint32_t    i915_scheduler_count_flying(struct i915_scheduler *scheduler,
 					struct intel_engine_cs *ring);
+int         i915_scheduler_dump(struct intel_engine_cs *ring,
+				const char *msg);
+int         i915_scheduler_dump_locked(struct intel_engine_cs *ring,
+				       const char *msg);
+int         i915_scheduler_dump_all(struct drm_device *dev, const char *msg);
+int         i915_scheduler_dump_all_locked(struct drm_device *dev, const char *msg);
 void        i915_scheduler_priority_bump_clear(struct i915_scheduler *scheduler,
 					       struct intel_engine_cs *ring);
 int         i915_scheduler_priority_bump(struct i915_scheduler *scheduler,
-- 
1.7.9.5


* [RFC 37/44] drm/i915: Added facility for cancelling an outstanding request
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

If the scheduler pre-empts a batch buffer that is queued in the ring, or even
already executing in the ring, then that buffer must be returned to the 'queued
in software' state. Part of this re-queueing is cleaning up the request
structure associated with the original submission.
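For illustration, the roll-back path might use the new call roughly as follows
(the helper below is a sketch, not code from this patch):

    /* Sketch: when a flying batch is unwound after a pre-emption, drop the
     * request that was created for it so that a fresh one can be generated
     * when the batch is resubmitted. */
    static void i915_scheduler_unwind_request(struct i915_scheduler_queue_entry *node)
    {
        struct intel_engine_cs *ring = node->params.ring;

        if (i915_gem_cancel_request(ring, node->params.seqno) == 0)
            DRM_DEBUG_SCHED("<%s> no request found for seqno %u\n",
                            ring->name, node->params.seqno);
    }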
---
 drivers/gpu/drm/i915/i915_drv.h |    1 +
 drivers/gpu/drm/i915/i915_gem.c |   16 ++++++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 872e869..f8980c0 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2319,6 +2319,7 @@ int __i915_add_request(struct intel_engine_cs *ring,
 	__i915_add_request(ring, NULL, NULL, seqno, true)
 #define i915_add_request_wo_flush(ring) \
 	__i915_add_request(ring, NULL, NULL, NULL, false)
+int i915_gem_cancel_request(struct intel_engine_cs *ring, u32 seqno);
 int __must_check i915_wait_seqno(struct intel_engine_cs *ring,
 				 uint32_t seqno);
 int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 1c508b7..dd0fac8 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2655,6 +2655,22 @@ void i915_gem_reset(struct drm_device *dev)
 	i915_gem_restore_fences(dev);
 }
 
+int
+i915_gem_cancel_request(struct intel_engine_cs *ring, u32 seqno)
+{
+	struct drm_i915_gem_request *req, *next;
+	int found = 0;
+
+	list_for_each_entry_safe(req, next, &ring->request_list, list) {
+		if (req->seqno == seqno) {
+			found += 1;
+			i915_gem_free_request(req);
+		}
+	}
+
+	return found;
+}
+
 /**
  * This function clears the request list as sequence numbers are passed.
  */
-- 
1.7.9.5


* [RFC 38/44] drm/i915: Add early exit to execbuff_final() if insufficient ring space
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

One of the major purposes of the GPU scheduler is to avoid stalling the CPU when
the GPU is busy and unable to accept more work. This change adds support to the
ring submission code to allow a ring space check to be performed before
attempting to submit a batch buffer to the hardware. If insufficient space is
available then the scheduler can go away and come back later, letting the CPU
get on with other work, rather than stalling and waiting for the hardware to
catch up.
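The contract is 'ask first, then reserve': check for the worst-case amount of
space and bail out with -EAGAIN, for the scheduler to retry later, before
emitting anything at all. A condensed sketch of the pattern (the helper is
illustrative; 256 dwords is the same worst-case figure used in the patch
below):

    /* Hypothetical submission helper showing the non-blocking pattern. */
    static int submit_batch_nonblocking(struct intel_engine_cs *ring)
    {
        const int max_dwords = 256;
        int ret;

        /* Double the requirement because the reserved block must not wrap
         * around the end of the ring. */
        ret = intel_ring_test_space(ring, max_dwords * 2 * sizeof(uint32_t));
        if (ret)
            return ret;    /* -EAGAIN: come back when space has freed up */

        ret = intel_ring_begin(ring, max_dwords);
        if (ret)
            return ret;

        /* ... emit the pre-batch commands, the batch start and the
         * post-batch commands here ... */

        intel_ring_advance(ring);
        return 0;
    }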
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   44 +++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/intel_ringbuffer.c    |   34 +++++++++++++++++----
 drivers/gpu/drm/i915/intel_ringbuffer.h    |    2 ++
 3 files changed, 73 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 3227a39..a9570ff 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1490,6 +1490,36 @@ int i915_gem_do_execbuffer_final(struct i915_execbuffer_params *params)
 		goto early_err;
 	}
 
+#ifdef CONFIG_DRM_I915_SCHEDULER
+{
+	uint32_t min_space;
+
+	/*
+	 * It would be a bad idea to run out of space while writing commands
+	 * to the ring. One of the major aims of the scheduler is to not stall
+	 * at any point for any reason. However, doing an early exit half way
+	 * through submission could result in a partial sequence being written
+	 * which would leave the engine in an unknown state. Therefore, check in
+	 * advance that there will be enough space for the entire submission
+	 * whether emitted by the code below OR by any other functions that may
+	 * be executed before the end of final().
+	 *
+	 * NB: This test deliberately overestimates, because that's easier than
+	 * tracing every potential path that could be taken!
+	 *
+	 * Current measurements suggest that we may need to emit up to 744 bytes
+	 * (186 dwords), so this is rounded up to 256 dwords here. Then we double
+	 * that to get the free space requirement, because the block isn't allowed
+	 * to span the transition from the end to the beginning of the ring.
+	 */
+#define I915_BATCH_EXEC_MAX_LEN         256	/* max dwords emitted here	*/
+	min_space = I915_BATCH_EXEC_MAX_LEN * 2 * sizeof(uint32_t);
+	ret = intel_ring_test_space(ring, min_space);
+	if (ret)
+		goto early_err;
+}
+#endif
+
 	intel_runtime_pm_get(dev_priv);
 
 	/* Ensure the correct seqno gets assigned to the correct buffer: */
@@ -1500,6 +1530,16 @@ int i915_gem_do_execbuffer_final(struct i915_execbuffer_params *params)
 
 	seqno = params->seqno;
 
+#ifdef CONFIG_DRM_I915_SCHEDULER
+	ret = intel_ring_begin(ring, I915_BATCH_EXEC_MAX_LEN);
+	if (ret)
+		goto err;
+#endif
+
+	/* Seqno matches? */
+	BUG_ON(ring->outstanding_lazy_seqno    != params->seqno);
+	BUG_ON(ring->preallocated_lazy_request != params->request);
+
 	/* Unconditionally invalidate gpu caches and ensure that we do flush
 	 * any residual writes from the previous batch.
 	 */
@@ -1518,9 +1558,11 @@ int i915_gem_do_execbuffer_final(struct i915_execbuffer_params *params)
 
 	if (ring == &dev_priv->ring[RCS] &&
 	    params->mode != dev_priv->relative_constants_mode) {
+#ifndef CONFIG_DRM_I915_SCHEDULER
 		ret = intel_ring_begin(ring, 4);
 		if (ret)
-				goto err;
+			goto err;
+#endif
 
 		intel_ring_emit(ring, MI_NOOP);
 		intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 1ad162b..640f26f 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -49,7 +49,7 @@ static inline int __ring_space(int head, int tail, int size)
 	return space;
 }
 
-static inline int ring_space(struct intel_engine_cs *ring)
+inline int intel_ring_space(struct intel_engine_cs *ring)
 {
 	struct intel_ringbuffer *ringbuf = ring->buffer;
 	return __ring_space(ringbuf->head & HEAD_ADDR, ringbuf->tail, ringbuf->size);
@@ -546,7 +546,7 @@ static int init_ring_common(struct intel_engine_cs *ring)
 	else {
 		ringbuf->head = I915_READ_HEAD(ring);
 		ringbuf->tail = I915_READ_TAIL(ring) & TAIL_ADDR;
-		ringbuf->space = ring_space(ring);
+		ringbuf->space = intel_ring_space(ring);
 		ringbuf->last_retired_head = -1;
 	}
 
@@ -1530,7 +1530,7 @@ static int intel_ring_wait_request(struct intel_engine_cs *ring, int n)
 		ringbuf->head = ringbuf->last_retired_head;
 		ringbuf->last_retired_head = -1;
 
-		ringbuf->space = ring_space(ring);
+		ringbuf->space = intel_ring_space(ring);
 		if (ringbuf->space >= n)
 			return 0;
 	}
@@ -1553,7 +1553,7 @@ static int intel_ring_wait_request(struct intel_engine_cs *ring, int n)
 	ringbuf->head = ringbuf->last_retired_head;
 	ringbuf->last_retired_head = -1;
 
-	ringbuf->space = ring_space(ring);
+	ringbuf->space = intel_ring_space(ring);
 	return 0;
 }
 
@@ -1582,7 +1582,7 @@ static int ring_wait_for_space(struct intel_engine_cs *ring, int n)
 	trace_i915_ring_wait_begin(ring);
 	do {
 		ringbuf->head = I915_READ_HEAD(ring);
-		ringbuf->space = ring_space(ring);
+		ringbuf->space = intel_ring_space(ring);
 		if (ringbuf->space >= n) {
 			ret = 0;
 			break;
@@ -1634,7 +1634,7 @@ static int intel_wrap_ring_buffer(struct intel_engine_cs *ring)
 		iowrite32(MI_NOOP, virt++);
 
 	ringbuf->tail = 0;
-	ringbuf->space = ring_space(ring);
+	ringbuf->space = intel_ring_space(ring);
 
 	return 0;
 }
@@ -1767,6 +1767,28 @@ int intel_ring_cacheline_align(struct intel_engine_cs *ring)
 	return 0;
 }
 
+/* Test to see if the ring has sufficient space to submit a given piece of work
+ * without causing a stall */
+int intel_ring_test_space(struct intel_engine_cs *ring, int min_space)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct intel_ringbuffer *ringbuf  = ring->buffer;
+
+	if (ringbuf->space < min_space) {
+		/* Need to update the actual ring space. Otherwise, the system
+		 * hangs forever testing a software copy of the space value that
+		 * never changes!
+		 */
+		ringbuf->head  = I915_READ_HEAD(ring);
+		ringbuf->space = intel_ring_space(ring);
+
+		if (ringbuf->space < min_space)
+			return -EAGAIN;
+	}
+
+	return 0;
+}
+
 void intel_ring_init_seqno(struct intel_engine_cs *ring, u32 seqno)
 {
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index cc92de2..cf9a535 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -345,6 +345,8 @@ intel_write_status_page(struct intel_engine_cs *ring,
 void intel_stop_ring_buffer(struct intel_engine_cs *ring);
 void intel_cleanup_ring_buffer(struct intel_engine_cs *ring);
 
+int intel_ring_space(struct intel_engine_cs *ring);
+int intel_ring_test_space(struct intel_engine_cs *ring, int min_space);
 int __must_check intel_ring_begin(struct intel_engine_cs *ring, int n);
 int __must_check intel_ring_cacheline_align(struct intel_engine_cs *ring);
 int __must_check intel_ring_alloc_seqno(struct intel_engine_cs *ring);
-- 
1.7.9.5


* [RFC 39/44] drm/i915: Added support for pre-emptive scheduling
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

Added support for pre-empting batch buffers that have already been submitted to
the ring. Currently this implements Gen7-level pre-emption, which means
pre-empting only at voluntary points within the batch buffer. The ring
submission code itself adds such points between batch buffers, and the OpenCL
driver should be adding them within GPGPU-specific batch buffers. Other types
of workload cannot be pre-empted by the hardware and so will not have
pre-emption points added to their buffers.

When a pre-emption occurs, the scheduler must work out which buffers were
pre-empted and which actually managed to complete first and, for the last
buffer that was pre-empted, whether it was stopped mid-batch or had not yet
begun to execute. This is done by extending the seqno mechanism to four slots:
batch buffer start, batch buffer end, pre-emption start and pre-emption end. By
querying these four numbers (and only allowing a single pre-emption event at a
time) the scheduler can guarantee to work out exactly what happened to every
batch buffer that had been submitted to the ring.

A Kconfig option has also been added to allow pre-emption support to be enabled
or disabled.
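A rough sketch of the kind of test the four status page slots make possible
(the helper and its name are illustrative; the real classification logic is
added to i915_scheduler.c by this patch):

    /* Was the most recent regular batch stopped part way through? The
     * preamble writes the seqno of the batch that is starting into the
     * 'active' slot and the postamble moves it to the 'done' slot, so an
     * 'active' value that never reached 'done' means the batch was
     * pre-empted mid-execution and must be resubmitted. The
     * I915_PREEMPTIVE_*_SEQNO slots answer the same question for the
     * (single) preemptive batch itself. */
    static bool preempted_mid_batch(struct intel_engine_cs *ring)
    {
        u32 done   = intel_read_status_page(ring, I915_BATCH_DONE_SEQNO);
        u32 active = intel_read_status_page(ring, I915_BATCH_ACTIVE_SEQNO);

        return active != 0 && active != done;
    }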
---
 drivers/gpu/drm/i915/Kconfig               |    8 +
 drivers/gpu/drm/i915/i915_gem.c            |   12 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  273 ++++++++++++++++
 drivers/gpu/drm/i915/i915_scheduler.c      |  467 +++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_scheduler.h      |   25 +-
 drivers/gpu/drm/i915/i915_trace.h          |   23 +-
 drivers/gpu/drm/i915/intel_ringbuffer.h    |    4 +
 7 files changed, 797 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
index 22a036b..b94d4c7 100644
--- a/drivers/gpu/drm/i915/Kconfig
+++ b/drivers/gpu/drm/i915/Kconfig
@@ -89,3 +89,11 @@ config DRM_I915_SCHEDULER
 	help
 	  Choose this option to enable GPU task scheduling for improved
 	  performance and efficiency.
+
+config DRM_I915_SCHEDULER_PREEMPTION
+	bool "Enable pre-emption within the GPU scheduler"
+	depends on DRM_I915_SCHEDULER
+	default y
+	help
+	  Choose this option to enable pre-emptive context switching within the
+	  GPU scheduler for even more performance and efficiency improvements.
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index dd0fac8..2cb4484 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2312,6 +2312,18 @@ i915_gem_init_seqno(struct drm_device *dev, u32 seqno)
 			ring->semaphore.sync_seqno[j] = 0;
 	}
 
+#ifdef CONFIG_DRM_I915_SCHEDULER_PREEMPTION
+	/* Also reset sw batch tracking state */
+	for_each_ring(ring, dev_priv, i) {
+		ring->last_regular_batch = 0;
+		ring->last_preemptive_batch = 0;
+		intel_write_status_page(ring, I915_BATCH_DONE_SEQNO, 0);
+		intel_write_status_page(ring, I915_BATCH_ACTIVE_SEQNO, 0);
+		intel_write_status_page(ring, I915_PREEMPTIVE_DONE_SEQNO, 0);
+		intel_write_status_page(ring, I915_PREEMPTIVE_ACTIVE_SEQNO, 0);
+	}
+#endif
+
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index a9570ff..81acdf2 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1470,6 +1470,238 @@ pre_mutex_err:
 	return ret;
 }
 
+#ifdef CONFIG_DRM_I915_SCHEDULER_PREEMPTION
+/*
+ * The functions below emit opcodes into the ring buffer.
+ * The simpler ones insert a single instruction, whereas the
+ * prequel/preamble/postamble functions generate a sequence
+ * of operations according to the nature of the current batch.
+ * Top among them is i915_gem_do_execbuffer_final() which is
+ * called by the scheduler to pass a batch to the hardware.
+ *
+ * There are three different types of batch handled here:
+ * 1.	non-preemptible batches (using the default context)
+ * 2.	preemptible batches (using a non-default context)
+ * 3.	preemptive batches (using a non-default context)
+ * and three points at which the code paths vary (prequel, at the very
+ * start of per-batch processing; preamble, just before the call to the
+ * batch buffer; and postamble, which runs after the batch buffer completes).
+ *
+ * The preamble is simple; it logs the sequence number of the batch that's
+ * about to start, and enables or disables preemption for the duration of
+ * the batch. The postamble is similar: it logs the sequence number of the
+ * batch that's just finished, and clears the in-progress sequence number
+ * (except for preemptive batches, where this is deferred to the interrupt
+ * handler).
+ *
+ * The prequel is the part that differs most. In the case of a regular batch,
+ * it contains an ARB ON/ARB CHECK sequence that allows preemption before
+ * the batch starts. The preemptive prequel, on the other hand, is more
+ * complex; see the description below ...
+ */
+
+/*
+ * Emit an MI_STORE_DWORD_INDEX instruction.
+ * This stores the specified value in the (index)th DWORD of the hardware status page.
+ */
+static uint32_t
+emit_store_dw_index(struct intel_engine_cs *ring, uint32_t value, uint32_t index)
+{
+	uint32_t vptr;
+	intel_ring_emit(ring, MI_STORE_DWORD_INDEX);
+	intel_ring_emit(ring, index << MI_STORE_DWORD_INDEX_SHIFT);
+	vptr = intel_ring_get_tail(ring);
+	intel_ring_emit(ring, value);
+	return vptr;
+}
+
+/*
+ * Emit an MI_STORE_REGISTER_MEM instruction.
+ * This stores the specified register in the (index)th DWORD of the memory
+ * area pointed to by base (which is actually the hardware status page).
+ */
+static void
+emit_store_reg_index(struct intel_engine_cs *ring, uint32_t reg, uint32_t base, uint32_t index)
+{
+	intel_ring_emit(ring, MI_STORE_REG_MEM | MI_STORE_REG_MEM_GTT);
+	intel_ring_emit(ring, reg);
+	intel_ring_emit(ring, base+(index << MI_STORE_DWORD_INDEX_SHIFT));
+}
+
+/*
+ * Emit the commands to check for preemption before starting a regular batch
+ */
+static void
+emit_regular_prequel(struct intel_engine_cs *ring, uint32_t seqno, uint32_t start)
+{
+	/* Log the ring address of the batch we're starting BEFORE the ARB CHECK */
+	emit_store_dw_index(ring, start, I915_BATCH_ACTIVE_ADDR);
+	intel_ring_emit(ring, MI_REPORT_HEAD);
+
+	/* Ensure Arbitration is enabled, then check for pending preemption */
+	intel_ring_emit(ring, MI_ARB_ON_OFF | MI_ARB_ENABLE);
+	intel_ring_emit(ring, MI_ARB_CHECK);
+	/* 6 dwords so far */
+}
+
+/*
+ * Emit the commands that prefix a preemptive batch.
+ *
+ * The difficulty here is that the engine is asynchronous. It may have already
+ * stopped with HEAD == TAIL, or it may still be running. If still running, it
+ * could execute an ARB CHECK instruction at ANY time.
+ *
+ * Therefore, it is unsafe to write UHPTR first and then update TAIL because
+ * an ARB_CHECK might trigger a jump between the two. This would set HEAD to
+ * be *after* TAIL which the engine would interpret as being a VERY looooong
+ * way *BEHIND* TAIL.
+ *
+ * OTOH, if TAIL is written first and then UHPTR, the engine might run the new
+ * code before the update of UHPTR has occurred. It would then stop when
+ * HEAD == (new) TAIL and the updated UHPTR would be ignored leaving the
+ * preemption pending until later!
+ *
+ * In addition, it is necessary to distinguish in the interrupt handler whether
+ * the ring was in fact idle by the time preemption took place. I.e. there were
+ * no ARB CHECK commands between HEAD at the time when UHPTR was set and the
+ * start of the preemptive batch that is being constructed.
+ *
+ * The solution is to first construct a 'landing zone' containing at least one
+ * instruction whose execution can be detected (in this case, a STORE and an
+ * ARB_ENABLE) and advance TAIL over it. Then set UHPTR to the same value as
+ * the new TAIL.
+ *
+ * If an (enabled) ARB_CHECK instruction is executed before the next update to
+ * TAIL, the engine will update HEAD to the value of UHPTR and then stop as the
+ * new value of HEAD will match TAIL. OTOH if no further ARB_CHECK instructions
+ * are reached, the engine will eventually run into the landing zone and again
+ * stop at the same point (but with preemption still pending).
+ *
+ * Thus, a second zone is added that *starts* with an ARB_CHECK. If (and only
+ * if) preemption has not yet occurred, this will cause a jump to the location
+ * given by UHPTR (which is its own address!). As a side effect, the VALID bit
+ * of UHPTR is cleared, so when the same ARB_CHECK is executed again, it now
+ * has no effect.
+ *
+ * Either way, the engine reaches the end of the second landing zone with
+ * preemption having occurred exactly once, so there's no surprise left lurking
+ * for later. If the new batch work has already been added by the time this
+ * happens, it can continue immediately. Otherwise, the engine will stop until
+ * the next update to TAIL after the batch call is added.
+ */
+static void
+emit_preemptive_prequel(struct intel_engine_cs *ring, uint32_t seqno, uint32_t start)
+{
+	/* 'dev_priv' is required by the WRITE_UHPTR() macro! :-( */
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	uint32_t i, hwpa, jump;
+
+	/* Part 1, reached only if the ring is idle */
+	emit_store_dw_index(ring, seqno, I915_BATCH_ACTIVE_SEQNO);
+	intel_ring_emit(ring, MI_ARB_ON_OFF | MI_ARB_ENABLE);
+	/* 4 dwords so far */
+	intel_ring_advance(ring);
+	jump = intel_ring_get_tail(ring);
+	BUG_ON(jump & UHPTR_GFX_ADDR_ALIGN);
+
+	I915_WRITE_UHPTR(ring, jump | UHPTR_VALID);
+
+	/* May jump to itself! */
+	intel_ring_emit(ring, MI_ARB_CHECK);
+
+	/* Log the ring address of the batch we're starting AFTER the ARB CHECK */
+	emit_store_dw_index(ring, start, I915_PREEMPTIVE_ACTIVE_ADDR);
+	/* 8 dwords so far */
+
+	{
+		/*
+		 * Unfortunately not everything we need is automatically saved by a
+		 * context switch, so we have to explicitly save some registers here.
+		 */
+		static const u32 regs[][2] = {
+			{	RING_PREEMPT_ADDR,		I915_SAVE_PREEMPTED_RING_PTR	},
+			{	BB_PREEMPT_ADDR,		I915_SAVE_PREEMPTED_BB_PTR	},
+			{	SBB_PREEMPT_ADDR,		I915_SAVE_PREEMPTED_SBB_PTR	},
+			{	RS_PREEMPT_STATUS,		I915_SAVE_PREEMPTED_STATUS	},
+
+			{	RING_HEAD(RENDER_RING_BASE),	I915_SAVE_PREEMPTED_HEAD	},
+			{	RING_TAIL(RENDER_RING_BASE),	I915_SAVE_PREEMPTED_TAIL	},
+			{	RING_UHPTR(RENDER_RING_BASE),	I915_SAVE_PREEMPTED_UHPTR	},
+			{	NOPID,				I915_SAVE_PREEMPTED_NOPID	}
+		};
+
+		/* This loop generates another 24 dwords, for a total of 36 so far */
+		hwpa = i915_gem_obj_ggtt_offset(ring->status_page.obj);
+		for (i = 0; i < ARRAY_SIZE(regs); ++i)
+			emit_store_reg_index(ring, regs[i][0], hwpa, regs[i][1]);
+	}
+}
+
+/*
+ * Emit the commands that immediately prefix execution of a batch.
+ *
+ * The GPU will log the seqno of the batch as it starts running it,
+ * then enable or disable preemption checks during this batch.
+ */
+static void
+emit_preamble(struct intel_engine_cs *ring, uint32_t seqno, struct intel_context *ctx, bool preemptive)
+{
+	emit_store_dw_index(ring, seqno, preemptive ? I915_PREEMPTIVE_ACTIVE_SEQNO : I915_BATCH_ACTIVE_SEQNO);
+	if (preemptive || i915_gem_context_is_default(ctx))
+		intel_ring_emit(ring, MI_ARB_ON_OFF | MI_ARB_DISABLE);
+	else
+		intel_ring_emit(ring, MI_ARB_ON_OFF | MI_ARB_ENABLE);
+	/* 4 dwords so far */
+}
+
+/*
+ * Emit the commands that immediately follow execution of a batch.
+ *
+ * The GPU will:
+ * 1) log the end address of the batch we've completed
+ * 2) log the seqno of the batch we've just completed.
+ * 3) in the case of a non-preemptive batch, clear the in-progress sequence
+ *    number; otherwise, issue a dummy register store to flush the above
+ *    writes before the interrupt happens.
+ */
+static void
+emit_postamble(struct intel_engine_cs *ring, uint32_t seqno, uint32_t start, bool preemptive)
+{
+	uint32_t eptr, end;
+
+	if (intel_ring_begin(ring, 10))
+		return;
+
+	/*
+	 * Note that the '~0u' in this call is a placeholder - the actual address
+	 * will be calculated later in this function and retroactively patched
+	 * into this dword!
+	 */
+	eptr = emit_store_dw_index(ring, ~0u, preemptive ? I915_PREEMPTIVE_ACTIVE_END : I915_BATCH_ACTIVE_END);
+	emit_store_dw_index(ring, seqno, preemptive ? I915_PREEMPTIVE_DONE_SEQNO : I915_BATCH_DONE_SEQNO);
+	if (preemptive) {
+		uint32_t hwpa = i915_gem_obj_ggtt_offset(ring->status_page.obj);
+		emit_store_reg_index(ring, NOPID, hwpa, I915_SAVE_PREEMPTED_NOPID);
+	} else {
+		emit_store_dw_index(ring, 0, I915_BATCH_ACTIVE_SEQNO);
+	}
+	intel_ring_emit(ring, MI_NOOP);
+	/* 10 dwords so far */
+
+	end = intel_ring_get_tail(ring);
+
+	/* Stash the batch bounds for use by the interrupt handler */
+	intel_write_status_page(ring, I915_GEM_BATCH_START_ADDR, start);
+	intel_write_status_page(ring, I915_GEM_BATCH_END_ADDR, end);
+
+	BUG_ON(eptr & UHPTR_GFX_ADDR_ALIGN);
+	BUG_ON(end & UHPTR_GFX_ADDR_ALIGN);
+
+	/* Go back and patch the end-batch address inserted above */
+	iowrite32(end, ring->buffer->virtual_start + eptr);
+}
+#endif  /* CONFIG_DRM_I915_SCHEDULER_PREEMPTION */
+
 /*
  * This is the main function for adding a batch to the ring.
  * It is called from the scheduler, with the struct_mutex already held.
@@ -1480,6 +1712,10 @@ int i915_gem_do_execbuffer_final(struct i915_execbuffer_params *params)
 	struct intel_engine_cs  *ring = params->ring;
 	u64 exec_start, exec_len;
 	int ret, i;
+	bool preemptive;
+#ifdef CONFIG_DRM_I915_SCHEDULER_PREEMPTION
+	u32 start;
+#endif
 	u32 seqno;
 
 	/* The mutex must be acquired before calling this function */
@@ -1547,6 +1783,22 @@ int i915_gem_do_execbuffer_final(struct i915_execbuffer_params *params)
 	if (ret)
 		goto err;
 
+	preemptive = (params->scheduler_flags & i915_ebp_sf_preempt) != 0;
+#ifndef CONFIG_DRM_I915_SCHEDULER_PREEMPTION
+	/* The scheduler must not request preemption if support wasn't compiled in */
+	BUG_ON(preemptive);
+#endif
+
+#ifdef CONFIG_DRM_I915_SCHEDULER_PREEMPTION
+	start = intel_ring_get_tail(ring);
+	BUG_ON(start & UHPTR_GFX_ADDR_ALIGN);
+
+	if (preemptive)
+		emit_preemptive_prequel(ring, seqno, start);
+	else
+		emit_regular_prequel(ring, seqno, start);
+#endif
+
 	/* Switch to the correct context for the batch */
 	ret = i915_switch_context(ring, params->ctx);
 	if (ret)
@@ -1583,10 +1835,26 @@ int i915_gem_do_execbuffer_final(struct i915_execbuffer_params *params)
 	BUG_ON(ring->outstanding_lazy_seqno    != params->seqno);
 	BUG_ON(ring->preallocated_lazy_request != params->request);
 
+#ifdef CONFIG_DRM_I915_SCHEDULER_PREEMPTION
+	/*
+	 * Log the seqno of the batch we're starting
+	 * Enable/disable preemption checks during this batch
+	 */
+	emit_preamble(ring, seqno, params->ctx, preemptive);
+#endif
+
 	exec_len   = params->args_batch_len;
 	exec_start = params->batch_obj_vm_offset +
 		     params->args_batch_start_offset;
 
+#ifdef CONFIG_DRM_I915_SCHEDULER_PREEMPTION
+	if (params->preemption_point) {
+		uint32_t preemption_offset = params->preemption_point - exec_start;
+		exec_start += preemption_offset;
+		exec_len   -= preemption_offset;
+	}
+#endif
+
 	if (params->cliprects) {
 		for (i = 0; i < params->args_num_cliprects; i++) {
 			ret = i915_emit_box(params->dev, &params->cliprects[i],
@@ -1608,6 +1876,11 @@ int i915_gem_do_execbuffer_final(struct i915_execbuffer_params *params)
 			goto err;
 	}
 
+#ifdef CONFIG_DRM_I915_SCHEDULER_PREEMPTION
+	emit_postamble(ring, seqno, start, preemptive);
+	intel_ring_advance(ring);
+#endif
+
 	trace_i915_gem_ring_dispatch(ring, seqno, params->eb_flags);
 
 	/* Seqno matches? */
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 7c03fb7..0eb6a31 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -43,6 +43,8 @@ const char *i915_qe_state_str(struct i915_scheduler_queue_entry *node)
 	char		*ptr = str;
 
 	*(ptr++) = node->bumped ? 'B' : '-',
+	*(ptr++) = (node->params.scheduler_flags & i915_ebp_sf_preempt) ? 'P' : '-';
+	*(ptr++) = (node->params.scheduler_flags & i915_ebp_sf_was_preempt) ? 'p' : '-';
 
 	*ptr = 0;
 
@@ -61,9 +63,15 @@ char i915_scheduler_queue_status_chr(enum i915_scheduler_queue_status status)
 	case i915_sqs_flying:
 	return 'F';
 
+	case i915_sqs_overtaking:
+	return 'O';
+
 	case i915_sqs_complete:
 	return 'C';
 
+	case i915_sqs_preempted:
+	return 'P';
+
 	default:
 	break;
 	}
@@ -86,9 +94,15 @@ const char *i915_scheduler_queue_status_str(
 	case i915_sqs_flying:
 	return "Flying";
 
+	case i915_sqs_overtaking:
+	return "Overtaking";
+
 	case i915_sqs_complete:
 	return "Complete";
 
+	case i915_sqs_preempted:
+	return "Preempted";
+
 	default:
 	break;
 	}
@@ -155,7 +169,11 @@ int i915_scheduler_init(struct drm_device *dev)
 	/* Default tuning values: */
 	scheduler->priority_level_max     = ~0U;
 	scheduler->priority_level_preempt = 900;
+#ifdef CONFIG_DRM_I915_SCHEDULER_PREEMPTION
+	scheduler->min_flying             = 8;
+#else
 	scheduler->min_flying             = 2;
+#endif
 	scheduler->file_queue_max         = 64;
 
 	dev_priv->scheduler = scheduler;
@@ -172,7 +190,7 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 	struct i915_scheduler_queue_entry  *test;
 	struct timespec     stamp;
 	unsigned long       flags;
-	bool                not_flying, found;
+	bool                not_flying, want_preempt, found;
 	int                 i, j, r, got_batch = 0;
 	int                 incomplete = 0;
 
@@ -315,12 +333,22 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 		not_flying = i915_scheduler_count_flying(scheduler, ring) <
 							 scheduler->min_flying;
 
+#ifdef CONFIG_DRM_I915_SCHEDULER_PREEMPTION
+	want_preempt = node->priority >= scheduler->priority_level_preempt;
+#else
+	want_preempt = false;
+#endif
+
+	if (want_preempt)
+		node->params.scheduler_flags |= i915_ebp_sf_preempt |
+						i915_ebp_sf_was_preempt;
+
 	trace_i915_scheduler_queue(ring, node);
 	trace_i915_scheduler_node_state_change(ring, node);
 
 	spin_unlock_irqrestore(&scheduler->lock, flags);
 
-	if (not_flying)
+	if (not_flying || want_preempt)
 		i915_scheduler_submit(ring, true);
 
 	return 0;
@@ -341,6 +369,14 @@ int i915_scheduler_fly_seqno(struct intel_engine_cs *ring, uint32_t seqno)
 	if (scheduler->flags[ring->id] & i915_sf_submitting)
 		return 0;
 
+#ifdef CONFIG_DRM_I915_SCHEDULER_PREEMPTION
+	/* Does not work with preemption as that requires the extra seqno status
+	 * words to be updated rather than just the one original word! */
+	DRM_DEBUG_SCHED("<%s> Got non-batch ring submission! [seqno = %d]\n",
+			ring->name, seqno);
+	return 0;
+#endif
+
 	getrawmonotonic(&stamp);
 
 	/* Need to allocate a new node. Note that kzalloc can sleep
@@ -382,7 +418,10 @@ int i915_scheduler_fly_node(struct i915_scheduler_queue_entry *node)
 	 * hardware submission order. */
 	list_add(&node->link, &scheduler->node_queue[ring->id]);
 
-	node->status = i915_sqs_flying;
+	if (node->params.scheduler_flags & i915_ebp_sf_preempt)
+		node->status = i915_sqs_overtaking;
+	else
+		node->status = i915_sqs_flying;
 
 	trace_i915_scheduler_fly(ring, node);
 	trace_i915_scheduler_node_state_change(ring, node);
@@ -424,6 +463,9 @@ static inline bool i915_scheduler_is_dependency_valid(
 	if (I915_SQS_IS_FLYING(dep)) {
 		if (node->params.ring != dep->params.ring)
 			return true;
+
+		if (node->params.scheduler_flags & i915_ebp_sf_preempt)
+			return true;
 	}
 
 	return false;
@@ -467,6 +509,309 @@ static void i915_scheduler_node_kill(struct i915_scheduler_queue_entry *node)
 	trace_i915_scheduler_node_state_change(node->params.ring, node);
 }
 
+#ifdef CONFIG_DRM_I915_SCHEDULER_PREEMPTION
+
+/*
+ * The batch tagged with the indicated sequence number has been started
+ * (but not yet completed). Must be called with spinlock already held.
+ *
+ * This handles two distinct cases: preemptED and preemptIVE. In both
+ * cases, the associated batch MUST exist and be FLYING. Because batch
+ * buffers are moved to the head of the queue as they are submitted to
+ * the hardware, no FLYING batch can come later than the first COMPLETED
+ * batch, even with preemption, so we can quit the search early if we
+ * find a COMPLETED batch -- which would be a BUG.
+ *
+ * In the case of mid_batch == true, the batch buffer itself was
+ * non-preemptive and has been preempted part way through (at the given
+ * address). The address must be saved away so that the starting point can be
+ * adjusted when the batch is resubmitted.
+ *
+ * In the case of mid_batch == false, the batch buffer is the preempting one
+ * and has started executing (potentially pre-empting other batch buffers part
+ * way through) but not yet completed (at the time of analysis). At this point
+ * it should, in theory, be safe to reallow ring submission rather than waiting
+ * for the preemptive batch to fully complete.
+ */
+static void i915_scheduler_seqno_started(struct intel_engine_cs *ring,
+					 uint32_t seqno, bool mid_batch,
+					 uint32_t bb_addr)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+	struct i915_scheduler_queue_entry *node;
+	bool   found = false;
+
+	list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
+		if (seqno == node->params.seqno) {
+			found = true;
+			break;
+		}
+
+		BUG_ON(I915_SQS_IS_COMPLETE(node));
+	}
+
+	BUG_ON(!found);
+
+	if (mid_batch) {
+		BUG_ON(node->status != i915_sqs_flying);
+		node->params.preemption_point = bb_addr;
+	} else {
+		BUG_ON(node->status != i915_sqs_overtaking);
+	}
+}
+
+/*
+ * The batch tagged with the indicated sequence number has completed.
+ * Search the queue for it, update its status and those of any batches
+ * submitted earlier, which must also have completed or been preempted
+ * as appropriate.
+ *
+ * Called with spinlock already held.
+ */
+static void i915_scheduler_seqno_complete(struct intel_engine_cs *ring,
+					  uint32_t seqno, bool preemptive)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+	struct i915_scheduler_queue_entry *node;
+	bool   found = false;
+
+	/*
+	 * Batch buffers are added to the head of the list in execution order,
+	 * thus seqno values, although not necessarily incrementing, will be
+	 * met in completion order when scanning the list. So when a match is
+	 * found, all subsequent entries must have either also popped or been
+	 * preempted.
+	 */
+	list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
+		if (seqno == node->params.seqno) {
+			found = true;
+			break;
+		}
+	}
+
+	trace_i915_scheduler_landing(ring, seqno, found ? node : NULL);
+	BUG_ON(!found);
+
+	if (preemptive) {
+		BUG_ON(node->status != i915_sqs_overtaking);
+
+		/*
+		 * This batch has overtaken and preempted those still on the
+		 * list. All batches in flight will need to be resubmitted.
+		 */
+		node->status = i915_sqs_complete;
+		trace_i915_scheduler_node_state_change(ring, node);
+
+		list_for_each_entry_continue(node, &scheduler->node_queue[ring->id], link) {
+			BUG_ON(node->status == i915_sqs_overtaking);
+
+			if (I915_SQS_IS_COMPLETE(node))
+				break;
+
+			if (node->status != i915_sqs_flying)
+				continue;
+
+			node->status = i915_sqs_preempted;
+			trace_i915_scheduler_unfly(ring, node);
+			trace_i915_scheduler_node_state_change(ring, node);
+		}
+
+		/*
+		 * Preemption finished:
+		 *
+		 * The 'preempting' flag prevented submissions to the ring
+		 * while a preemptive batch was in flight. Now that it is
+		 * complete, the flag can be cleared and submissions may be
+		 * resumed.
+		 *
+		 * The 'preempted' flag, OTOH, tells waiters who may be holding
+		 * the 'struct_mutex' that preemption has occurred, and they
+		 * should wake up (or not go to sleep) and release the mutex so
+		 * that the scheduler's delayed-work task can postprocess the
+		 * request queue and initiate submission of more batches.
+		 * Without this, a thread that is waiting for a batch that has
+		 * been preempted (or has not yet been submitted to the hardware)
+		 * could sleep while holding the mutex but would never receive
+		 * a wakeup, resulting in a device hang.
+		 */
+		scheduler->flags[ring->id] &= ~i915_sf_preempting;
+		scheduler->flags[ring->id] |=  i915_sf_preempted;
+	} else {
+		BUG_ON(node->status != i915_sqs_flying);
+
+		/* Everything from here can be marked as done: */
+		list_for_each_entry_from(node, &scheduler->node_queue[ring->id], link) {
+			BUG_ON(node->status == i915_sqs_overtaking);
+
+			/* Check if the marking has already been done: */
+			if (I915_SQS_IS_COMPLETE(node))
+				break;
+
+			if (node->status != i915_sqs_flying)
+				continue;
+
+			/* Node was in flight so mark it as complete. */
+			node->status = i915_sqs_complete;
+			trace_i915_scheduler_node_state_change(ring, node);
+		}
+	}
+
+	/* Should submit new work here if flight list is empty but the DRM
+	 * mutex lock might not be available if a '__wait_seqno()' call is
+	 * blocking the system. */
+}
+
+/*
+ * In the non-preemption case, the last seqno processed by the ring is
+ * sufficient information to keep track of what has or has not completed.
+ *
+ * However, it is insufficient in the preemption case as much historical
+ * information can be lost. Instead, four separate seqno values are required
+ * to distinguish between batches that have completed versus ones that have
+ * been preempted:
+ *   p_active  sequence number of currently executing preemptive batch or
+ *             zero if no such batch is executing
+ *   b_active  sequence number of currently executing non-preemptive batch
+ *             or zero if no such batch is executing
+ *   p_done    sequence number of last completed preemptive batch
+ *   b_done    sequence number of last completed non-preemptive batch
+ *
+ * NB: Zero is not a valid sequence number and is therefore safe to use as an
+ *     'N/A' type value.
+ *
+ * Only one preemptive batch can be in flight at a time. No more
+ * batches can be submitted until it completes, at which time there should
+ * be no further activity. Completion of a preemptive batch is indicated
+ * by (p_done == p_active != 0).
+ *
+ * At any other time, the GPU may still be running additional tasks after the
+ * one that initiated the interrupt, so any values read from the hardware
+ * status page may not reflect a single coherent state!
+ *
+ * In particular, the following cases can occur while handling the completion
+ * of a preemptive batch:
+ *
+ * 1.  The regular case is that 'seqno' == 'p_done', and 'b_done' differs
+ *     from them, being from an earlier non-preemptive batch.
+ *
+ * 2.  The interrupt was generated by an earlier non-preemptive batch. In this
+ *     case, 'seqno' should match 'b_done' and 'p_done' should differ.
+ *     There should also be another interrupt still on its way!
+ *       GPU: seq 1, intr 1 ...
+ *       CPU:              intr 1, reads seqno
+ *       GPU:                                seq 2
+ *       CPU:                                    reads p_done, b_done
+ *       GPU:                                                intr 2
+ *     This can happen when 1 is regular and 2 is preemptive. Most other
+ *     strange cases should not happen simply because of the requirement
+ *     that no more batches are submitted after a preemptive one until the
+ *     preemption completes.
+ *
+ * In the case of handling completion of a NON-preemptive batch, the following
+ * may be observed:
+ *
+ * 1.  The regular case is that 'seqno' == 'b_done' and the interrupt was
+ *     generated by the completion of the most recent (non-preemptive) batch.
+ *
+ * 2.  The interrupt was generated by an earlier non-preemptive batch. In this
+ *     case, 'seqno' should be earlier than 'b_done'. There should be another
+ *     interrupt still on its way!
+ *		GPU: seq 1, intr 1 ...
+ *		CPU:              intr 1, reads seqno
+ *		GPU:                                seq 2
+ *		CPU:                                    reads b_done
+ *		GPU:                                                 intr 2
+ *     This can easily happen when 1 and 2 are both regular batches.
+ *
+ * 3.  Updates to the sequence number can overtake interrupts:
+ *		GPU: seq 1, intr 1 (delayed), seq 2 ...
+ *		CPU:                              intr 1, reads/processes seq 2
+ *		GPU:                                    intr 2
+ *		CPU:                                         intr 2, reads seq 2 again
+ *     This can only happen when 1 and 2 are both regular batches i.e. not
+ *     the preemptive case where nothing can be queued until preemption is
+ *     seen to have completed.
+ *
+ * 4.  If there are non-batch commands (with sequence numbers) in the ring,
+ *     then 'seqno' could be updated by such a command while 'b_done' remains
+ *     at the number of the last non-preemptive batch.
+ *
+ * 5.  'seqno' could also be left over from an already-serviced preemptive batch.
+ *
+ * All of which basically means that 'seqno' as read via 'ring->get_seqno()' is
+ * not especially useful. Thus the four batch buffer bookend values are all that
+ * is used to determine exactly what has or has not occurred between this ISR
+ * execution and the last.
+ */
+int i915_scheduler_handle_IRQ(struct intel_engine_cs *ring)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+	unsigned long   flags;
+	uint32_t        b_active, b_done, p_active, p_done;
+
+	spin_lock_irqsave(&scheduler->lock, flags);
+
+	p_done   = intel_read_status_page(ring, I915_PREEMPTIVE_DONE_SEQNO);
+	p_active = intel_read_status_page(ring, I915_PREEMPTIVE_ACTIVE_SEQNO);
+	b_done   = intel_read_status_page(ring, I915_BATCH_DONE_SEQNO);
+	b_active = intel_read_status_page(ring, I915_BATCH_ACTIVE_SEQNO);
+
+	trace_i915_scheduler_irq(ring, ring->get_seqno(ring, false),
+				 b_active, b_done, p_active, p_done);
+
+	if (i915.scheduler_override & i915_so_direct_submit) {
+		spin_unlock_irqrestore(&scheduler->lock, flags);
+		return 0;
+	}
+
+	/* All regular batches up to 'b_done' have completed */
+	if (b_done != ring->last_regular_batch) {
+		i915_scheduler_seqno_complete(ring, b_done, false);
+		ring->last_regular_batch = b_done;
+	}
+
+	if (p_done) {
+		/*
+		 * The preemptive batch identified by 'p_done' has completed.
+		 * If 'b_active' is different from 'p_active' and nonzero, that
+		 * batch has been preempted mid-batch. All other batches still
+		 * in flight have been preempted before starting.
+		 */
+		BUG_ON(p_active != p_done);
+		if (b_active == p_active) {
+			/* null preemption (ring was idle) */
+		} else if (b_active == 0) {
+			/* interbatch preemption (ring was busy) */
+		} else /* any other value of b_active */ {
+			/* midbatch preemption (batch was running) */
+			uint32_t b_addr = intel_read_status_page(ring, I915_SAVE_PREEMPTED_BB_PTR);
+			i915_scheduler_seqno_started(ring, b_active, true, b_addr);
+		}
+
+		i915_scheduler_seqno_complete(ring, p_done, true);
+		ring->last_preemptive_batch = p_done;
+
+		/* Clear the active-batch and preemptive-batch-done sequence
+		 * numbers in the status page */
+		intel_write_status_page(ring, I915_BATCH_ACTIVE_SEQNO, 0);
+		intel_write_status_page(ring, I915_PREEMPTIVE_DONE_SEQNO, 0);
+	} else if (p_active && p_active != ring->last_preemptive_batch) {
+		/* new preemptive batch started but not yet finished */
+		i915_scheduler_seqno_started(ring, p_active, false, 0);
+	}
+
+	spin_unlock_irqrestore(&scheduler->lock, flags);
+
+	queue_work(dev_priv->wq, &dev_priv->mm.scheduler_work);
+
+	return 0;
+}
+
+#else  /* CONFIG_DRM_I915_SCHEDULER_PREEMPTION */
+
 /*
  * The batch tagged with the indicated seqence number has completed.
  * Search the queue for it, update its status and those of any batches
@@ -542,7 +887,7 @@ int i915_scheduler_handle_IRQ(struct intel_engine_cs *ring)
 
 	seqno = ring->get_seqno(ring, false);
 
-	trace_i915_scheduler_irq(ring, seqno);
+	trace_i915_scheduler_irq(ring, seqno, 0, 0, 0, 0);
 
 	if (i915.scheduler_override & i915_so_direct_submit)
 		return 0;
@@ -562,6 +907,8 @@ int i915_scheduler_handle_IRQ(struct intel_engine_cs *ring)
 	return 0;
 }
 
+#endif  /* CONFIG_DRM_I915_SCHEDULER_PREEMPTION */
+
 int i915_scheduler_remove(struct intel_engine_cs *ring)
 {
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
@@ -1040,7 +1387,8 @@ static int i915_scheduler_pop_from_queue_locked(struct intel_engine_cs *ring,
 	int     ret;
 	int     i;
 	bool	any_queued;
-	bool	has_local, has_remote, only_remote;
+	bool	has_local, has_remote, only_remote, local_preempt_only;
+	bool	was_preempted = false;
 
 	*pop_node = NULL;
 	ret = -ENODATA;
@@ -1054,18 +1402,44 @@ static int i915_scheduler_pop_from_queue_locked(struct intel_engine_cs *ring,
 			continue;
 		any_queued = true;
 
+		/* Attempt to re-enable pre-emption if a node wants to pre-empt
+		 * but previously got downgraded. */
+		if ((node->params.scheduler_flags &
+		     (i915_ebp_sf_preempt |
+		      i915_ebp_sf_was_preempt)) ==
+		    i915_ebp_sf_was_preempt)
+			node->params.scheduler_flags |=
+				i915_ebp_sf_preempt;
+
 		has_local  = false;
 		has_remote = false;
+		local_preempt_only = true;
 		for (i = 0; i < node->num_deps; i++) {
 			if (!i915_scheduler_is_dependency_valid(node, i))
 				continue;
 
-			if (node->dep_list[i]->params.ring == node->params.ring)
+			if (node->dep_list[i]->params.ring == node->params.ring) {
 				has_local = true;
-			else
+
+				if (local_preempt_only &&
+				    (node->params.scheduler_flags & i915_ebp_sf_preempt)) {
+					node->params.scheduler_flags &= ~i915_ebp_sf_preempt;
+					if (i915_scheduler_is_dependency_valid(node, i))
+						local_preempt_only = false;
+					node->params.scheduler_flags |= i915_ebp_sf_preempt;
+				}
+			} else
 				has_remote = true;
 		}
 
+		if (has_local && local_preempt_only) {
+			/* If a preemptive node's local dependencies are all
+			 * flying, then they can be ignored by un-preempting the
+			 * node. */
+			node->params.scheduler_flags &= ~i915_ebp_sf_preempt;
+			has_local = false;
+		}
+
 		if (has_remote && !has_local)
 			only_remote = true;
 
@@ -1080,6 +1454,7 @@ static int i915_scheduler_pop_from_queue_locked(struct intel_engine_cs *ring,
 		list_del(&best->link);
 
 		INIT_LIST_HEAD(&best->link);
+		was_preempted = best->status == i915_sqs_preempted;
 		best->status  = i915_sqs_none;
 
 		trace_i915_scheduler_node_state_change(ring, best);
@@ -1105,6 +1480,13 @@ static int i915_scheduler_pop_from_queue_locked(struct intel_engine_cs *ring,
 
 	trace_i915_scheduler_pop_from_queue(ring, best);
 
+	if (was_preempted) {
+		/* Previously submitted - cancel outstanding request */
+		spin_unlock_irqrestore(&scheduler->lock, *flags);
+		i915_gem_cancel_request(ring, best->params.seqno);
+		spin_lock_irqsave(&scheduler->lock, *flags);
+	}
+
 	*pop_node = best;
 	return ret;
 }
@@ -1118,6 +1500,12 @@ int i915_scheduler_submit(struct intel_engine_cs *ring, bool was_locked)
 	unsigned long       flags;
 	int                 ret = 0, count = 0;
 
+	if (scheduler->flags[ring->id] & i915_sf_preempting) {
+		/* If a pre-emption event is in progress then no other work may
+		 * be submitted to that ring. Come back later... */
+		return -EAGAIN;
+	}
+
 	if (!was_locked) {
 		ret = i915_mutex_lock_interruptible(dev);
 		if (ret)
@@ -1145,10 +1533,45 @@ int i915_scheduler_submit(struct intel_engine_cs *ring, bool was_locked)
 		BUG_ON(node->status != i915_sqs_none);
 		count++;
 
+		if (node->params.scheduler_flags & i915_ebp_sf_preempt) {
+			struct i915_scheduler_queue_entry  *fly;
+			bool    got_flying = false;
+
+			list_for_each_entry(fly, &scheduler->node_queue[ring->id], link) {
+				if (!I915_SQS_IS_FLYING(fly))
+					continue;
+
+				got_flying = true;
+				if (fly->priority >= node->priority) {
+					/* Already working on something at least
+					 * as important, so don't interrupt it. */
+					node->params.scheduler_flags &=
+						~i915_ebp_sf_preempt;
+					break;
+				}
+			}
+
+			if (!got_flying) {
+				/* Nothing to preempt so don't bother. */
+				node->params.scheduler_flags &=
+					~i915_ebp_sf_preempt;
+			}
+		}
+
 		/* The call to pop above will have removed the node from the
 		 * list. So add it back in and mark it as in flight. */
 		i915_scheduler_fly_node(node);
 
+		/* If the submission code path is being called then the
+		 * scheduler must be out of the 'post-preemption' state. */
+		scheduler->flags[ring->id] &= ~i915_sf_preempted;
+		/* If this batch is pre-emptive then it will tie the hardware
+		 * up until it has at least begun to be executed. That is,
+		 * if a pre-emption request is in flight then no other work
+		 * may be submitted until it resolves. */
+		if (node->params.scheduler_flags & i915_ebp_sf_preempt)
+			scheduler->flags[ring->id] |= i915_sf_preempting;
+
 		scheduler->flags[ring->id] |= i915_sf_submitting;
 		spin_unlock_irqrestore(&scheduler->lock, flags);
 		ret = i915_gem_do_execbuffer_final(&node->params);
@@ -1160,7 +1583,9 @@ int i915_scheduler_submit(struct intel_engine_cs *ring, bool was_locked)
 
 			/* Oh dear! Either the node is broken or the ring is
 			 * busy. So need to kill the node or requeue it and try
-			 * again later as appropriate. */
+			 * again later as appropriate. Either way, clear the
+			 * pre-emption flag as it ain't happening. */
+			scheduler->flags[ring->id] &= ~i915_sf_preempting;
 
 			switch (-ret) {
 			case EAGAIN:
@@ -1195,6 +1620,10 @@ int i915_scheduler_submit(struct intel_engine_cs *ring, bool was_locked)
 				i915_scheduler_node_kill(node);
 		}
 
+		/* If pre-emption is in progress then give up and go home. */
+		if (scheduler->flags[ring->id] & i915_sf_preempting)
+			break;
+
 		/* Keep launching until the sky is sufficiently full. */
 		if (i915_scheduler_count_flying(scheduler, ring) >=
 						scheduler->min_flying)
@@ -1329,6 +1758,28 @@ int i915_scheduler_closefile(struct drm_device *dev, struct drm_file *file)
 	return 0;
 }
 
+bool i915_scheduler_is_busy(struct intel_engine_cs *ring)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+
+	/*
+	 * The scheduler is prevented from sending batches to the hardware
+	 * while preemption is in progress (i915_sf_preempting).
+	 *
+	 * Post-preemption (i915_sf_preempted), the hardware ring will be
+	 * empty, and the scheduler therefore needs a chance to run the
+	 * delayed work task to retire completed work and restart submission.
+	 *
+	 * Therefore, if either flag is set, the scheduler is busy.
+	 */
+	if (scheduler->flags[ring->id] & (i915_sf_preempting |
+					  i915_sf_preempted))
+		return true;
+
+	return false;
+}
+
 bool i915_scheduler_is_idle(struct intel_engine_cs *ring)
 {
 	struct i915_scheduler_queue_entry *node;
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index bbfd13c..f86b687 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -42,9 +42,19 @@ struct i915_execbuffer_params {
 	uint32_t                        mask;
 	int                             mode;
 	struct intel_context            *ctx;
+	uint32_t                        preemption_point;
 	int                             seqno;
 	struct drm_i915_gem_request     *request;
 	uint32_t                        scheduler_index;
+	uint32_t                        scheduler_flags;
+};
+
+/* Flag bits for i915_execbuffer_params::scheduler_flags */
+enum {
+	/* Preemption is currently enabled */
+	i915_ebp_sf_preempt          = (1 << 0),
+	/* Preemption was originally requested */
+	i915_ebp_sf_was_preempt      = (1 << 1),
 };
 
 enum i915_scheduler_queue_status {
@@ -54,8 +64,13 @@ enum i915_scheduler_queue_status {
 	i915_sqs_queued,
 	/* Sent to hardware for processing: */
 	i915_sqs_flying,
+	/* Sent to hardware for high-priority processing: */
+	i915_sqs_overtaking,
 	/* Finished processing on the hardware: */
 	i915_sqs_complete,
+	/* Was submitted, may or may not have started processing, now being
+	 * evicted: */
+	i915_sqs_preempted,
 	/* Limit value for use with arrays/loops */
 	i915_sqs_MAX
 };
@@ -63,8 +78,10 @@ char i915_scheduler_queue_status_chr(enum i915_scheduler_queue_status status);
 const char *i915_scheduler_queue_status_str(
 				enum i915_scheduler_queue_status status);
 
-#define I915_SQS_IS_QUEUED(node)	(((node)->status == i915_sqs_queued))
-#define I915_SQS_IS_FLYING(node)	(((node)->status == i915_sqs_flying))
+#define I915_SQS_IS_QUEUED(node)	(((node)->status == i915_sqs_queued) || \
+					 ((node)->status == i915_sqs_preempted))
+#define I915_SQS_IS_FLYING(node)	(((node)->status == i915_sqs_flying) || \
+					 ((node)->status == i915_sqs_overtaking))
 #define I915_SQS_IS_COMPLETE(node)	((node)->status == i915_sqs_complete)
 
 struct i915_scheduler_obj_entry {
@@ -125,6 +142,10 @@ enum {
 	i915_sf_interrupts_enabled  = (1 << 0),
 	i915_sf_submitting          = (1 << 1),
 
+	/* Preemption-related state */
+	i915_sf_preempting          = (1 << 4),
+	i915_sf_preempted           = (1 << 5),
+
 	/* Dump/debug flags */
 	i915_sf_dump_force          = (1 << 8),
 	i915_sf_dump_details        = (1 << 9),
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index bea2a49..40b1c6f 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -747,20 +747,33 @@ TRACE_EVENT(i915_scheduler_node_state_change,
 );
 
 TRACE_EVENT(i915_scheduler_irq,
-	    TP_PROTO(struct intel_engine_cs *ring, uint32_t seqno),
-	    TP_ARGS(ring, seqno),
+	    TP_PROTO(struct intel_engine_cs *ring, uint32_t seqno,
+		     uint32_t b_active, uint32_t b_done,
+		     uint32_t p_active, uint32_t p_done),
+	    TP_ARGS(ring, seqno, b_active, b_done, p_active, p_done),
 
 	    TP_STRUCT__entry(
 			     __field(u32, ring)
 			     __field(u32, seqno)
+			     __field(u32, b_active)
+			     __field(u32, b_done)
+			     __field(u32, p_active)
+			     __field(u32, p_done)
 			     ),
 
 	    TP_fast_assign(
-			   __entry->ring   = ring->id;
-			   __entry->seqno  = seqno;
+			   __entry->ring     = ring->id;
+			   __entry->seqno    = seqno;
+			   __entry->b_active = b_active;
+			   __entry->b_done   = b_done;
+			   __entry->p_active = p_active;
+			   __entry->p_done   = p_done;
 			   ),
 
-	    TP_printk("ring=%d, seqno=%d", __entry->ring, __entry->seqno)
+	    TP_printk("ring=%d, seqno=%d, b_active = %d, b_done = %d, p_active = %d, p_done = %d",
+		      __entry->ring, __entry->seqno,
+		      __entry->b_active, __entry->b_done,
+		      __entry->p_active, __entry->p_done)
 );
 
 TRACE_EVENT(i915_gem_ring_queue,
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index cf9a535..17d91e9 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -182,6 +182,10 @@ struct  intel_engine_cs {
 
 	struct intel_context *default_context;
 	struct intel_context *last_context;
+#ifdef CONFIG_DRM_I915_SCHEDULER_PREEMPTION
+	uint32_t last_regular_batch;
+	uint32_t last_preemptive_batch;
+#endif
 
 	struct intel_ring_hangcheck hangcheck;
 
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC 40/44] drm/i915: REVERTME Hack to allow IGT to test pre-emption
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (38 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 39/44] drm/i915: Added support for pre-emptive scheduling John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-06-26 17:24 ` [RFC 41/44] drm/i915: Added validation callback to trace points John.C.Harrison
                   ` (5 subsequent siblings)
  45 siblings, 0 replies; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

In order to test pre-emption, a flag has been added to the execbuffer() API to
explicitly request that a given batch buffer is made pre-emptive. This is purely
a temporary measure to allow an IGT test to queue pre-emptive and non-preemptive
workloads.

Note that the final solution will be to add an IOCTL to set the priority of a
batch buffer (work in progress elsewhere). The scheduler will then decide to
pre-empt or not based on the assigned priority level.
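
For illustration, an IGT test might request a pre-emptive submission roughly as
follows (sketch only; 'fd', 'exec_objects', 'num_objects' and 'batch_len' are
assumed to be set up by the usual execbuffer plumbing):

	struct drm_i915_gem_execbuffer2 execbuf;

	memset(&execbuf, 0, sizeof(execbuf));
	execbuf.buffers_ptr  = (uintptr_t)exec_objects;
	execbuf.buffer_count = num_objects;
	execbuf.batch_len    = batch_len;
	execbuf.flags        = I915_EXEC_RENDER | I915_EXEC_PREEMPT;

	drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, &execbuf);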
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   12 ++++++++++++
 include/uapi/drm/i915_drm.h                |    5 +++++
 2 files changed, 17 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 81acdf2..b7d0737 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1361,6 +1361,18 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	qe.params.mask                    = mask;
 	qe.params.mode                    = mode;
 
+	/* Hack for testing pre-empting prior to having an official priority API */
+	if (qe.params.args_flags & I915_EXEC_PREEMPT) {
+#ifdef CONFIG_DRM_I915_SCHEDULER_PREEMPTION
+		struct i915_scheduler   *scheduler = dev_priv->scheduler;
+		qe.priority += scheduler->priority_level_preempt;
+#else
+		DRM_DEBUG("Buffer flags 0x%X includes PREEMPT!\n",
+			  qe.params.args_flags);
+#endif
+	}
+	/* Hack for testing pre-empting prior to having an official priority API */
+
 #ifdef CONFIG_DRM_I915_SCHEDULER
 	/*
 	 * Save away the list of objects used by this batch buffer for the
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index de6f603..d391222 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -694,6 +694,11 @@ struct drm_i915_gem_execbuffer2 {
 #define I915_EXEC_BLT                    (3<<0)
 #define I915_EXEC_VEBOX                  (4<<0)
 
+/* Pre-emption flag
+ * If this flag is set, this batchbuffer preempts those already submitted
+ */
+#define I915_EXEC_PREEMPT                (1<<5)
+
 /* Used for switching the constants addressing mode on gen4+ RENDER ring.
  * Gen6+ only supports relative addressing to dynamic state (default) and
  * absolute addressing.
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC 41/44] drm/i915: Added validation callback to trace points
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (39 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 40/44] drm/i915: REVERTME Hack to allow IGT to test pre-emption John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-06-26 17:24 ` [RFC 42/44] drm/i915: Added scheduler statistic reporting to debugfs John.C.Harrison
                   ` (4 subsequent siblings)
  45 siblings, 0 replies; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The validation tests require hooks into the GPU scheduler to allow them to
analyse what the scheduler is doing internally.
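
For example (sketch only, not part of this patch), a validation module could
log dispatch events by assigning the exported pointer from its init function:

	static int validation_hook(enum i915_scheduler_validation_op op,
				   struct intel_engine_cs *ring,
				   uint32_t seqno,
				   struct i915_scheduler_queue_entry *node)
	{
		if (op == i915_scheduler_validation_op_dispatch)
			pr_info("scheduler dispatch: ring=%d, seqno=%u\n",
				ring->id, seqno);
		return 0;
	}

	/* in the test module's init ... */
	i915_scheduler_validation_callback = validation_hook;
	/* ... and cleared again on unload */
	i915_scheduler_validation_callback = NULL;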
---
 drivers/gpu/drm/i915/i915_scheduler.c |    4 ++++
 drivers/gpu/drm/i915/i915_scheduler.h |   16 ++++++++++++++++
 drivers/gpu/drm/i915/i915_trace.h     |   16 ++++++++++++++++
 3 files changed, 36 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 0eb6a31..8d45b73 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -26,6 +26,10 @@
 #include "intel_drv.h"
 #include "i915_scheduler.h"
 
+i915_scheduler_validation_callback_type
+				i915_scheduler_validation_callback = NULL;
+EXPORT_SYMBOL(i915_scheduler_validation_callback);
+
 bool i915_scheduler_is_enabled(struct drm_device *dev)
 {
 #ifdef CONFIG_DRM_I915_SCHEDULER
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index f86b687..2f8c566 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -196,4 +196,20 @@ void i915_scheduler_file_queue_dec(struct drm_file *file);
 
 int i915_gem_do_execbuffer_final(struct i915_execbuffer_params *params);
 
+/* A callback mechanism to allow validation tests to hook into the internal
+ * state of the scheduler. */
+enum i915_scheduler_validation_op {
+	i915_scheduler_validation_op_state_change	= 1,
+	i915_scheduler_validation_op_queue,
+	i915_scheduler_validation_op_dispatch,
+	i915_scheduler_validation_op_complete,
+};
+typedef int (*i915_scheduler_validation_callback_type)
+	(enum i915_scheduler_validation_op op,
+	 struct intel_engine_cs *ring,
+	 uint32_t seqno,
+	 struct i915_scheduler_queue_entry *node);
+extern i915_scheduler_validation_callback_type
+				i915_scheduler_validation_callback;
+
 #endif  /* _I915_SCHEDULER_H_ */
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 40b1c6f..2029d8b 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -369,6 +369,10 @@ TRACE_EVENT(i915_gem_ring_dispatch,
 			   __entry->seqno = seqno;
 			   __entry->flags = flags;
 			   i915_trace_irq_get(ring, seqno);
+			   if (i915_scheduler_validation_callback)
+				i915_scheduler_validation_callback(
+				      i915_scheduler_validation_op_dispatch,
+				      ring, seqno, NULL);
 			   ),
 
 	    TP_printk("dev=%u, ring=%u, seqno=%u, flags=%x",
@@ -660,6 +664,10 @@ TRACE_EVENT(i915_scheduler_landing,
 			   __entry->ring   = ring->id;
 			   __entry->seqno  = seqno;
 			   __entry->status = node ? node->status : ~0U;
+			   if (i915_scheduler_validation_callback)
+				i915_scheduler_validation_callback(
+				      i915_scheduler_validation_op_complete,
+				      ring, seqno, node);
 			   ),
 
 	    TP_printk("ring=%d, seqno=%d, status=%d",
@@ -740,6 +748,10 @@ TRACE_EVENT(i915_scheduler_node_state_change,
 			   __entry->ring   = ring->id;
 			   __entry->seqno  = node->params.seqno;
 			   __entry->status = node->status;
+			   if (i915_scheduler_validation_callback)
+				i915_scheduler_validation_callback(
+				      i915_scheduler_validation_op_state_change,
+				      ring, node->params.seqno, node);
 			   ),
 
 	    TP_printk("ring=%d, seqno=%d, status=%d",
@@ -789,6 +801,10 @@ TRACE_EVENT(i915_gem_ring_queue,
 	    TP_fast_assign(
 			   __entry->ring   = ring->id;
 			   __entry->seqno  = node->params.seqno;
+			   if (i915_scheduler_validation_callback)
+				i915_scheduler_validation_callback(
+				      i915_scheduler_validation_op_queue,
+				      ring, node->params.seqno, node);
 			   ),
 
 	    TP_printk("ring=%d, seqno=%d", __entry->ring, __entry->seqno)
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC 42/44] drm/i915: Added scheduler statistic reporting to debugfs
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (40 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 41/44] drm/i915: Added validation callback to trace points John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-06-26 17:24 ` [RFC 43/44] drm/i915: Added support for submitting out-of-batch ring commands John.C.Harrison
                   ` (3 subsequent siblings)
  45 siblings, 0 replies; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

It is useful to know what the scheduler is doing for both debugging and
performance analysis purposes. This change adds a bunch of counters and such
that keep track of various scheduler operations (batches submitted, preempted,
interrupts processed, flush requests, etc.). The data can then be read in
userland via the debugfs mechanism.
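
Assuming debugfs is mounted in the usual place and the GPU is DRM minor 0, the
counters can then be read with something like:

	cat /sys/kernel/debug/dri/0/i915_scheduler_info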
---
 drivers/gpu/drm/i915/i915_debugfs.c   |   85 +++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_scheduler.c |   66 +++++++++++++++++++++++--
 drivers/gpu/drm/i915/i915_scheduler.h |   50 +++++++++++++++++++
 3 files changed, 198 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 1c20c8c..cb9839b 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2482,6 +2482,88 @@ static int i915_display_info(struct seq_file *m, void *unused)
 	return 0;
 }
 
+#ifdef CONFIG_DRM_I915_SCHEDULER
+static int i915_scheduler_info(struct seq_file *m, void *unused)
+{
+	struct drm_info_node *node = (struct drm_info_node *) m->private;
+	struct drm_device *dev = node->minor->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+	struct i915_scheduler_stats *stats = scheduler->stats;
+	struct i915_scheduler_stats_nodes node_stats[I915_NUM_RINGS];
+	struct intel_engine_cs *ring;
+	char   str[50 * (I915_NUM_RINGS + 1)], name[50], *ptr;
+	int ret, i, r;
+
+	ret = mutex_lock_interruptible(&dev->mode_config.mutex);
+	if (ret)
+		return ret;
+
+#define PRINT_VAR(name, fmt, var)					\
+	do {								\
+		sprintf(str, "%-22s", name );				\
+		ptr = str + strlen(str);				\
+		for_each_ring(ring, dev_priv, r) {			\
+			sprintf(ptr, " %10" fmt, var);			\
+			ptr += strlen(ptr);				\
+		}							\
+		seq_printf(m, "%s\n", str);				\
+	} while(0)
+
+	PRINT_VAR("Ring name:",             "s", dev_priv->ring[r].name);
+	seq_printf(m, "Batch submissions:\n");
+	PRINT_VAR("  Queued",               "u", stats[r].queued);
+	PRINT_VAR("  Queued preemptive",    "u", stats[r].queued_preemptive);
+	PRINT_VAR("  Submitted",            "u", stats[r].submitted);
+	PRINT_VAR("  Submitted preemptive", "u", stats[r].submitted_preemptive);
+	PRINT_VAR("  Preempted",            "u", stats[r].preempted);
+	PRINT_VAR("  Completed",            "u", stats[r].completed);
+	PRINT_VAR("  Completed preemptive", "u", stats[r].completed_preemptive);
+	PRINT_VAR("  Expired",              "u", stats[r].expired);
+	seq_putc(m, '\n');
+
+	seq_printf(m, "Flush counts:\n");
+	PRINT_VAR("  By object",            "u", stats[r].flush_obj);
+	PRINT_VAR("  By seqno",             "u", stats[r].flush_seqno);
+	PRINT_VAR("  Blanket",              "u", stats[r].flush_all);
+	PRINT_VAR("  Entries bumped",       "u", stats[r].flush_bump);
+	PRINT_VAR("  Entries submitted",    "u", stats[r].flush_submit);
+	seq_putc(m, '\n');
+
+	seq_printf(m, "Interrupt counts:\n");
+	PRINT_VAR("  Regular",              "llu", stats[r].irq.regular);
+	PRINT_VAR("  Preemptive",           "llu", stats[r].irq.preemptive);
+	PRINT_VAR("  Idle",                 "llu", stats[r].irq.idle);
+	PRINT_VAR("  Inter-batch",          "llu", stats[r].irq.interbatch);
+	PRINT_VAR("  Mid-batch",            "llu", stats[r].irq.midbatch);
+	seq_putc(m, '\n');
+
+	seq_printf(m, "Seqno values at last IRQ:\n");
+	PRINT_VAR("  Seqno",                "d", stats[r].irq.last_seqno);
+	PRINT_VAR("  Batch done",           "d", stats[r].irq.last_b_done);
+	PRINT_VAR("  Preemptive done",      "d", stats[r].irq.last_p_done);
+	PRINT_VAR("  Batch active",         "d", stats[r].irq.last_b_active);
+	PRINT_VAR("  Preemptive active",    "d", stats[r].irq.last_p_active);
+	seq_putc(m, '\n');
+
+	seq_printf(m, "Queue contents:\n");
+	for_each_ring(ring, dev_priv, i)
+		i915_scheduler_query_stats(ring, node_stats + ring->id);
+
+	for (i = 0; i < i915_sqs_MAX; i++) {
+		sprintf(name, "  %s", i915_scheduler_queue_status_str(i));
+		PRINT_VAR(name, "d", node_stats[r].counts[i]);
+	}
+	seq_putc(m, '\n');
+
+#undef PRINT_VAR
+
+	mutex_unlock(&dev->mode_config.mutex);
+
+	return 0;
+}
+#endif
+
 struct pipe_crc_info {
 	const char *name;
 	struct drm_device *dev;
@@ -3928,6 +4010,9 @@ static const struct drm_info_list i915_debugfs_list[] = {
 	{"i915_pc8_status", i915_pc8_status, 0},
 	{"i915_power_domain_info", i915_power_domain_info, 0},
 	{"i915_display_info", i915_display_info, 0},
+#ifdef CONFIG_DRM_I915_SCHEDULER
+	{"i915_scheduler_info", i915_scheduler_info, 0},
+#endif
 };
 #define I915_DEBUGFS_ENTRIES ARRAY_SIZE(i915_debugfs_list)
 
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 8d45b73..c679513 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -204,11 +204,13 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 		int ret;
 
 		qe->params.scheduler_index = scheduler->index++;
+		scheduler->stats[qe->params.ring->id].queued++;
 
 		trace_i915_scheduler_queue(qe->params.ring, qe);
 
 		scheduler->flags[qe->params.ring->id] |= i915_sf_submitting;
 		ret = i915_gem_do_execbuffer_final(&qe->params);
+		scheduler->stats[qe->params.ring->id].submitted++;
 		scheduler->flags[qe->params.ring->id] &= ~i915_sf_submitting;
 
 		/* Need to release the objects: */
@@ -229,6 +231,7 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 
 		/* And anything else owned by the QE structure: */
 		kfree(qe->params.cliprects);
+		scheduler->stats[qe->params.ring->id].expired++;
 
 		return ret;
 	}
@@ -343,10 +346,14 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 	want_preempt = false;
 #endif
 
-	if (want_preempt)
+	if (want_preempt) {
 		node->params.scheduler_flags |= i915_ebp_sf_preempt |
 						i915_ebp_sf_was_preempt;
 
+		scheduler->stats[ring->id].queued_preemptive++;
+	} else
+		scheduler->stats[ring->id].queued++;
+
 	trace_i915_scheduler_queue(ring, node);
 	trace_i915_scheduler_node_state_change(ring, node);
 
@@ -607,6 +614,7 @@ static void i915_scheduler_seqno_complete(struct intel_engine_cs *ring,
 		 */
 		node->status = i915_sqs_complete;
 		trace_i915_scheduler_node_state_change(ring, node);
+		scheduler->stats[ring->id].completed_preemptive++;
 
 		list_for_each_entry_continue(node, &scheduler->node_queue[ring->id], link) {
 			BUG_ON(node->status == i915_sqs_overtaking);
@@ -620,6 +628,7 @@ static void i915_scheduler_seqno_complete(struct intel_engine_cs *ring,
 			node->status = i915_sqs_preempted;
 			trace_i915_scheduler_unfly(ring, node);
 			trace_i915_scheduler_node_state_change(ring, node);
+			scheduler->stats[ring->id].preempted++;
 		}
 
 		/*
@@ -659,6 +668,7 @@ static void i915_scheduler_seqno_complete(struct intel_engine_cs *ring,
 			/* Node was in flight so mark it as complete. */
 			node->status = i915_sqs_complete;
 			trace_i915_scheduler_node_state_change(ring, node);
+			scheduler->stats[ring->id].completed++;
 		}
 	}
 
@@ -753,6 +763,7 @@ int i915_scheduler_handle_IRQ(struct intel_engine_cs *ring)
 {
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
 	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+	struct i915_scheduler_stats *stats = scheduler->stats + ring->id;
 	unsigned long   flags;
 	uint32_t        b_active, b_done, p_active, p_done;
 
@@ -763,7 +774,13 @@ int i915_scheduler_handle_IRQ(struct intel_engine_cs *ring)
 	b_done   = intel_read_status_page(ring, I915_BATCH_DONE_SEQNO);
 	b_active = intel_read_status_page(ring, I915_BATCH_ACTIVE_SEQNO);
 
-	trace_i915_scheduler_irq(ring, ring->get_seqno(ring, false),
+	stats->irq.last_b_done   = b_done;
+	stats->irq.last_p_done   = p_done;
+	stats->irq.last_b_active = b_active;
+	stats->irq.last_p_active = p_active;
+	stats->irq.last_seqno    = ring->get_seqno(ring, false);
+
+	trace_i915_scheduler_irq(ring, stats->irq.last_seqno,
 				 b_active, b_done, p_active, p_done);
 
 	if (i915.scheduler_override & i915_so_direct_submit) {
@@ -775,6 +792,7 @@ int i915_scheduler_handle_IRQ(struct intel_engine_cs *ring)
 	if (b_done != ring->last_regular_batch) {
 		i915_scheduler_seqno_complete(ring, b_done, false);
 		ring->last_regular_batch = b_done;
+		stats->irq.regular += 1;
 	}
 
 	if (p_done) {
@@ -784,15 +802,19 @@ int i915_scheduler_handle_IRQ(struct intel_engine_cs *ring)
 		 * batch has been preempted mid-batch. All other batches still
 		 * in flight have been preempted before starting.
 		 */
+		stats->irq.preemptive += 1;
 		BUG_ON(p_active != p_done);
 		if (b_active == p_active) {
 			/* null preemption (ring was idle) */
+			stats->irq.idle += 1;
 		} else if (b_active == 0) {
 			/* interbatch preemption (ring was busy) */
+			stats->irq.interbatch += 1;
 		} else /* any other value of b_active */ {
 			/* midbatch preemption (batch was running) */
 			uint32_t b_addr = intel_read_status_page(ring, I915_SAVE_PREEMPTED_BB_PTR);
 			i915_scheduler_seqno_started(ring, b_active, true, b_addr);
+			stats->irq.midbatch += 1;
 		}
 
 		i915_scheduler_seqno_complete(ring, p_done, true);
@@ -871,6 +893,7 @@ static int i915_scheduler_seqno_complete(struct intel_engine_cs *ring, uint32_t
 		/* Node was in flight so mark it as complete. */
 		node->status = i915_sqs_complete;
 		trace_i915_scheduler_node_state_change(ring, node);
+		scheduler->stats[ring->id].completed++;
 	}
 
 	/* Should submit new work here if flight list is empty but the DRM
@@ -975,6 +998,7 @@ int i915_scheduler_remove(struct intel_engine_cs *ring)
 
 		list_del(&node->link);
 		list_add(&node->link, &remove);
+		scheduler->stats[ring->id].expired++;
 
 		/* Strip the dependency info while the mutex is still locked */
 		i915_scheduler_remove_dependent(scheduler, node);
@@ -1195,6 +1219,32 @@ int i915_scheduler_dump_locked(struct intel_engine_cs *ring, const char *msg)
 	return 0;
 }
 
+int i915_scheduler_query_stats(struct intel_engine_cs *ring,
+			       struct i915_scheduler_stats_nodes *stats)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct i915_scheduler   *scheduler = dev_priv->scheduler;
+	struct i915_scheduler_queue_entry  *node;
+	unsigned long   flags;
+
+	memset(stats, 0x00, sizeof(*stats));
+
+	spin_lock_irqsave(&scheduler->lock, flags);
+
+	list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
+		if (node->status >= i915_sqs_MAX) {
+			DRM_DEBUG_SCHED("Invalid node state: %d! [seqno = %d]\n",
+					node->status, node->params.seqno);
+		}
+
+		stats->counts[node->status]++;
+	}
+
+	spin_unlock_irqrestore(&scheduler->lock, flags);
+
+	return 0;
+}
+
 int i915_scheduler_flush_seqno(struct intel_engine_cs *ring, bool is_locked,
 			       uint32_t seqno)
 {
@@ -1220,6 +1270,8 @@ int i915_scheduler_flush_seqno(struct intel_engine_cs *ring, bool is_locked,
 
 	spin_lock_irqsave(&scheduler->lock, flags);
 
+	scheduler->stats[ring->id].flush_seqno++;
+
 	i915_scheduler_priority_bump_clear(scheduler, ring);
 
 	list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
@@ -1231,6 +1283,7 @@ int i915_scheduler_flush_seqno(struct intel_engine_cs *ring, bool is_locked,
 
 		flush_count += i915_scheduler_priority_bump(scheduler,
 					node, scheduler->priority_level_max);
+		scheduler->stats[ring->id].flush_bump += flush_count;
 	}
 
 	spin_unlock_irqrestore(&scheduler->lock, flags);
@@ -1238,6 +1291,7 @@ int i915_scheduler_flush_seqno(struct intel_engine_cs *ring, bool is_locked,
 	if (flush_count) {
 		DRM_DEBUG_SCHED("<%s> Bumped %d entries\n", ring->name, flush_count);
 		flush_count = i915_scheduler_submit_max_priority(ring, is_locked);
+		scheduler->stats[ring->id].flush_submit += flush_count;
 	}
 
 	return flush_count;
@@ -1264,6 +1318,8 @@ int i915_scheduler_flush(struct intel_engine_cs *ring, bool is_locked)
 
 	BUG_ON(is_locked && (scheduler->flags[ring->id] & i915_sf_submitting));
 
+	scheduler->stats[ring->id].flush_all++;
+
 	do {
 		found = false;
 		spin_lock_irqsave(&scheduler->lock, flags);
@@ -1278,6 +1334,7 @@ int i915_scheduler_flush(struct intel_engine_cs *ring, bool is_locked)
 
 		if (found) {
 			ret = i915_scheduler_submit(ring, is_locked);
+			scheduler->stats[ring->id].flush_submit++;
 			if (ret < 0)
 				return ret;
 
@@ -1573,8 +1630,11 @@ int i915_scheduler_submit(struct intel_engine_cs *ring, bool was_locked)
 		 * up until it has at least begun to be executed. That is,
 		 * if a pre-emption request is in flight then no other work
 		 * may be submitted until it resolves. */
-		if (node->params.scheduler_flags & i915_ebp_sf_preempt)
+		if (node->params.scheduler_flags & i915_ebp_sf_preempt) {
 			scheduler->flags[ring->id] |= i915_sf_preempting;
+			scheduler->stats[ring->id].submitted_preemptive++;
+		} else
+			scheduler->stats[ring->id].submitted++;
 
 		scheduler->flags[ring->id] |= i915_sf_submitting;
 		spin_unlock_irqrestore(&scheduler->lock, flags);
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 2f8c566..8d2289f 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -123,6 +123,51 @@ bool        i915_scheduler_is_idle(struct intel_engine_cs *ring);
 
 #ifdef CONFIG_DRM_I915_SCHEDULER
 
+struct i915_scheduler_stats_nodes
+{
+	uint32_t	counts[i915_sqs_MAX];
+};
+
+struct i915_scheduler_stats_irq
+{
+	/* Counts of various interrupt types */
+	uint64_t            regular;
+	uint64_t            preemptive;
+	uint64_t            idle;
+	uint64_t            interbatch;
+	uint64_t            midbatch;
+
+	/* Sequence numbers seen at last IRQ */
+	uint32_t            last_seqno;
+	uint32_t            last_b_done;
+	uint32_t            last_p_done;
+	uint32_t            last_b_active;
+	uint32_t            last_p_active;
+};
+
+struct i915_scheduler_stats
+{
+	/* Batch buffer counts: */
+	uint32_t            queued;
+	uint32_t            queued_preemptive;
+	uint32_t            submitted;
+	uint32_t            submitted_preemptive;
+	uint32_t            preempted;
+	uint32_t            completed;
+	uint32_t            completed_preemptive;
+	uint32_t            expired;
+
+	/* Other stuff: */
+	uint32_t            flush_obj;
+	uint32_t            flush_seqno;
+	uint32_t            flush_all;
+	uint32_t            flush_bump;
+	uint32_t            flush_submit;
+
+	/* Interrupts: */
+	struct i915_scheduler_stats_irq irq;
+};
+
 struct i915_scheduler {
 	struct list_head    node_queue[I915_NUM_RINGS];
 	uint32_t            flags[I915_NUM_RINGS];
@@ -134,6 +179,9 @@ struct i915_scheduler {
 	uint32_t            priority_level_preempt;
 	uint32_t            min_flying;
 	uint32_t            file_queue_max;
+
+	/* Statistics: */
+	struct i915_scheduler_stats     stats[I915_NUM_RINGS];
 };
 
 /* Flag bits for i915_scheduler::flags */
@@ -187,6 +235,8 @@ int         i915_scheduler_priority_bump(struct i915_scheduler *scheduler,
 				uint32_t bump);
 bool        i915_scheduler_is_seqno_in_flight(struct intel_engine_cs *ring,
 					      uint32_t seqno, bool *completed);
+int         i915_scheduler_query_stats(struct intel_engine_cs *ring,
+				       struct i915_scheduler_stats_nodes *stats);
 
 bool i915_scheduler_file_queue_is_full(struct drm_file *file);
 void i915_scheduler_file_queue_inc(struct drm_file *file);
-- 
1.7.9.5


* [RFC 43/44] drm/i915: Added support for submitting out-of-batch ring commands
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (41 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 42/44] drm/i915: Added scheduler statistic reporting to debugfs John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-06-26 17:24 ` [RFC 44/44] drm/i915: Fake batch support for page flips John.C.Harrison
                   ` (2 subsequent siblings)
  45 siblings, 0 replies; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

There is a problem with any commands written to the ring without the scheduler's
knowledge: they can be lost if the scheduler issues a pre-emption, because the
pre-emption mechanism discards the current ring contents. Thus any non-batch
buffer submission has the potential to be skipped.

The solution is to make sure that nothing is written to the ring that did not
come from the scheduler. Not many pieces of code write to the ring directly;
the only such path still exercised on modern systems is the page flip code.

This checkin adds scheduler support for command submission without a batch
buffer - just an arbitrarily sized block of data to be written to the ring.
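
For illustration, a driver-internal caller could package a few raw ring dwords
and hand them to the scheduler roughly as follows. This is only a minimal
sketch against the interface added below; the dword values and the single
tracked object are made up for the example:

	static int example_queue_ring_words(struct intel_engine_cs *ring,
					    struct drm_i915_gem_object *obj)
	{
		/* Arbitrary commands (here just NOOPs) that must reach the
		 * ring via the scheduler so they survive a pre-emption. */
		uint32_t cmds[] = {
			MI_NOOP,
			MI_NOOP,
		};

		/*
		 * The scheduler copies the dwords and emits them from its own
		 * submission path, so a pre-emption cannot discard them.
		 */
		return i915_scheduler_queue_nonbatch(ring, cmds, ARRAY_SIZE(cmds),
						     &obj, 1, 0);
	}
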
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  109 ++++++++++++++++------------
 drivers/gpu/drm/i915/i915_scheduler.c      |   88 +++++++++++++++++++---
 drivers/gpu/drm/i915/i915_scheduler.h      |   12 +++
 3 files changed, 153 insertions(+), 56 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index b7d0737..48379fb 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -987,16 +987,18 @@ i915_gem_execbuffer_move_to_active(struct list_head *vmas,
 }
 
 static void
-i915_gem_execbuffer_retire_commands(struct drm_device *dev,
-				    struct drm_file *file,
-				    struct intel_engine_cs *ring,
-				    struct drm_i915_gem_object *obj)
+i915_gem_execbuffer_retire_commands(struct i915_execbuffer_params *params)
 {
+	if (params->scheduler_flags & i915_ebp_sf_not_a_batch) {
+		i915_add_request_wo_flush(params->ring);
+		return;
+	}
+
 	/* Unconditionally force add_request to emit a full flush. */
-	ring->gpu_caches_dirty = true;
+	params->ring->gpu_caches_dirty = true;
 
 	/* Add a breadcrumb for the completion of the batch buffer */
-	(void)__i915_add_request(ring, file, obj, NULL, true);
+	(void)__i915_add_request(params->ring, params->file, params->batch_obj, NULL, true);
 }
 
 static int
@@ -1659,7 +1661,7 @@ static void
 emit_preamble(struct intel_engine_cs *ring, uint32_t seqno, struct intel_context *ctx, bool preemptive)
 {
 	emit_store_dw_index(ring, seqno, preemptive ? I915_PREEMPTIVE_ACTIVE_SEQNO : I915_BATCH_ACTIVE_SEQNO);
-	if (preemptive || i915_gem_context_is_default(ctx))
+	if (preemptive || !ctx || i915_gem_context_is_default(ctx))
 		intel_ring_emit(ring, MI_ARB_ON_OFF | MI_ARB_DISABLE);
 	else
 		intel_ring_emit(ring, MI_ARB_ON_OFF | MI_ARB_ENABLE);
@@ -1761,7 +1763,8 @@ int i915_gem_do_execbuffer_final(struct i915_execbuffer_params *params)
 	 * to span the transition from the end to the beginning of the ring.
 	 */
 #define I915_BATCH_EXEC_MAX_LEN         256	/* max dwords emitted here	*/
-	min_space = I915_BATCH_EXEC_MAX_LEN * 2 * sizeof(uint32_t);
+	min_space = I915_BATCH_EXEC_MAX_LEN + params->emit_len;
+	min_space = min_space * 2 * sizeof(uint32_t);
 	ret = intel_ring_test_space(ring, min_space);
 	if (ret)
 		goto early_err;
@@ -1811,30 +1814,34 @@ int i915_gem_do_execbuffer_final(struct i915_execbuffer_params *params)
 		emit_regular_prequel(ring, seqno, start);
 #endif
 
-	/* Switch to the correct context for the batch */
-	ret = i915_switch_context(ring, params->ctx);
-	if (ret)
-		goto err;
+	if (params->ctx) {
+		/* Switch to the correct context for the batch */
+		ret = i915_switch_context(ring, params->ctx);
+		if (ret)
+			goto err;
+	}
 
 	/* Seqno matches? */
 	BUG_ON(seqno != params->seqno);
 	BUG_ON(ring->outstanding_lazy_seqno != params->seqno);
 
-	if (ring == &dev_priv->ring[RCS] &&
-	    params->mode != dev_priv->relative_constants_mode) {
+	if ((params->scheduler_flags & i915_ebp_sf_not_a_batch) == 0) {
+		if (ring == &dev_priv->ring[RCS] &&
+		    params->mode != dev_priv->relative_constants_mode) {
 #ifndef CONFIG_DRM_I915_SCHEDULER
-		ret = intel_ring_begin(ring, 4);
-		if (ret)
-			goto err;
+			ret = intel_ring_begin(ring, 4);
+			if (ret)
+				goto err;
 #endif
 
-		intel_ring_emit(ring, MI_NOOP);
-		intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
-		intel_ring_emit(ring, INSTPM);
-		intel_ring_emit(ring, params->mask << 16 | params->mode);
-		intel_ring_advance(ring);
+			intel_ring_emit(ring, MI_NOOP);
+			intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
+			intel_ring_emit(ring, INSTPM);
+			intel_ring_emit(ring, params->mask << 16 | params->mode);
+			intel_ring_advance(ring);
 
-		dev_priv->relative_constants_mode = params->mode;
+			dev_priv->relative_constants_mode = params->mode;
+		}
 	}
 
 	if (params->args_flags & I915_EXEC_GEN7_SOL_RESET) {
@@ -1855,37 +1862,48 @@ int i915_gem_do_execbuffer_final(struct i915_execbuffer_params *params)
 	emit_preamble(ring, seqno, params->ctx, preemptive);
 #endif
 
-	exec_len   = params->args_batch_len;
-	exec_start = params->batch_obj_vm_offset +
-		     params->args_batch_start_offset;
+	if (params->scheduler_flags & i915_ebp_sf_not_a_batch) {
+		if (params->scheduler_flags & i915_ebp_sf_cacheline_align) {
+			ret = intel_ring_cacheline_align(ring);
+			if (ret)
+				goto err;
+		}
+
+		for (i = 0; i < params->emit_len; i++)
+			intel_ring_emit(ring, params->emit_data[i]);
+	} else {
+		exec_len   = params->args_batch_len;
+		exec_start = params->batch_obj_vm_offset +
+			     params->args_batch_start_offset;
 
 #ifdef CONFIG_DRM_I915_SCHEDULER_PREEMPTION
-	if (params->preemption_point) {
-		uint32_t preemption_offset = params->preemption_point - exec_start;
-		exec_start += preemption_offset;
-		exec_len   -= preemption_offset;
-	}
+		if (params->preemption_point) {
+			uint32_t preemption_offset = params->preemption_point - exec_start;
+			exec_start += preemption_offset;
+			exec_len   -= preemption_offset;
+		}
 #endif
 
-	if (params->cliprects) {
-		for (i = 0; i < params->args_num_cliprects; i++) {
-			ret = i915_emit_box(params->dev, &params->cliprects[i],
-					    params->args_DR1, params->args_DR4);
-			if (ret)
-				goto err;
-
+		if (params->cliprects) {
+			for (i = 0; i < params->args_num_cliprects; i++) {
+				ret = i915_emit_box(params->dev, &params->cliprects[i],
+						    params->args_DR1, params->args_DR4);
+				if (ret)
+					goto err;
+
+				ret = ring->dispatch_execbuffer(ring,
+								exec_start, exec_len,
+								params->eb_flags);
+				if (ret)
+					goto err;
+			}
+		} else {
 			ret = ring->dispatch_execbuffer(ring,
 							exec_start, exec_len,
 							params->eb_flags);
 			if (ret)
 				goto err;
 		}
-	} else {
-		ret = ring->dispatch_execbuffer(ring,
-						exec_start, exec_len,
-						params->eb_flags);
-		if (ret)
-			goto err;
 	}
 
 #ifdef CONFIG_DRM_I915_SCHEDULER_PREEMPTION
@@ -1899,8 +1917,7 @@ int i915_gem_do_execbuffer_final(struct i915_execbuffer_params *params)
 	BUG_ON(params->seqno   != ring->outstanding_lazy_seqno);
 	BUG_ON(params->request != ring->preallocated_lazy_request);
 
-	i915_gem_execbuffer_retire_commands(params->dev, params->file, ring,
-					    params->batch_obj);
+	i915_gem_execbuffer_retire_commands(params);
 
 	/* OLS should be zero by now! */
 	BUG_ON(ring->outstanding_lazy_seqno);
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index c679513..127ded9 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -49,6 +49,7 @@ const char *i915_qe_state_str(struct i915_scheduler_queue_entry *node)
 	*(ptr++) = node->bumped ? 'B' : '-',
 	*(ptr++) = (node->params.scheduler_flags & i915_ebp_sf_preempt) ? 'P' : '-';
 	*(ptr++) = (node->params.scheduler_flags & i915_ebp_sf_was_preempt) ? 'p' : '-';
+	*(ptr++) = (node->params.scheduler_flags & i915_ebp_sf_not_a_batch) ? '!' : '-';
 
 	*ptr = 0;
 
@@ -247,15 +248,30 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 	node->status = i915_sqs_queued;
 	node->stamp  = stamp;
 
-	/*
-	 * Verify that the batch buffer itself is included in the object list.
-	 */
-	for (i = 0; i < node->num_objs; i++) {
-		if (node->saved_objects[i].obj == node->params.batch_obj)
-			got_batch++;
-	}
+	if (node->params.scheduler_flags & i915_ebp_sf_not_a_batch) {
+		uint32_t size;
+
+		size = sizeof(*node->params.emit_data) * node->params.emit_len;
+		node->params.emit_data = kmalloc(size, GFP_KERNEL);
+		if (!node->params.emit_data) {
+			kfree(node);
+			return -ENOMEM;
+		}
+
+		memcpy(node->params.emit_data, qe->params.emit_data, size);
+	} else {
+		BUG_ON(node->params.emit_len || node->params.emit_data);
 
-	BUG_ON(got_batch != 1);
+		/*
+		 * Verify that the batch buffer itself is included in the object list.
+		 */
+		for (i = 0; i < node->num_objs; i++) {
+			if (node->saved_objects[i].obj == node->params.batch_obj)
+				got_batch++;
+		}
+
+		BUG_ON(got_batch != 1);
+	}
 
 	/* Need to determine the number of incomplete entries in the list as
 	 * that will be the maximum size of the dependency list.
@@ -282,6 +298,7 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 		node->dep_list = kmalloc(sizeof(node->dep_list[0]) * incomplete,
 					 GFP_KERNEL);
 		if (!node->dep_list) {
+			kfree(node->params.emit_data);
 			kfree(node);
 			return -ENOMEM;
 		}
@@ -297,7 +314,10 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 				if (I915_SQS_IS_COMPLETE(test))
 					continue;
 
-				found = (node->params.ctx == test->params.ctx);
+				if (node->params.ctx && test->params.ctx)
+					found = (node->params.ctx == test->params.ctx);
+				else
+					found = false;
 
 				for (i = 0; (i < node->num_objs) && !found; i++) {
 					for (j = 0; j < test->num_objs; j++) {
@@ -332,7 +352,8 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
 
 	list_add_tail(&node->link, &scheduler->node_queue[ring->id]);
 
-	i915_scheduler_file_queue_inc(node->params.file);
+	if (node->params.file)
+		i915_scheduler_file_queue_inc(node->params.file);
 
 	if (i915.scheduler_override & i915_so_submit_on_queue)
 		not_flying = true;
@@ -1051,6 +1072,7 @@ int i915_scheduler_remove(struct intel_engine_cs *ring)
 			i915_gem_context_unreference(node->params.ctx);
 
 		/* And anything else owned by the node: */
+		kfree(node->params.emit_data);
 		kfree(node->params.cliprects);
 		kfree(node->dep_list);
 		kfree(node);
@@ -1909,3 +1931,49 @@ int i915_scheduler_handle_IRQ(struct intel_engine_cs *ring)
 }
 
 #endif  /* CONFIG_DRM_I915_SCHEDULER */
+
+int i915_scheduler_queue_nonbatch(struct intel_engine_cs *ring,
+				  uint32_t *data, uint32_t len,
+				  struct drm_i915_gem_object *objs[],
+				  uint32_t num_objs, uint32_t flags)
+{
+	struct i915_scheduler_queue_entry qe;
+	int ret;
+
+	memset(&qe, 0x00, sizeof(qe));
+
+	ret = intel_ring_alloc_seqno(ring);
+	if (ret)
+		return ret;
+
+	qe.params.ring            = ring;
+	qe.params.dev             = ring->dev;
+	qe.params.seqno           = ring->outstanding_lazy_seqno;
+	qe.params.request         = ring->preallocated_lazy_request;
+	qe.params.emit_len        = len;
+	qe.params.emit_data       = data;
+	qe.params.scheduler_flags = flags | i915_ebp_sf_not_a_batch;
+
+#ifdef CONFIG_DRM_I915_SCHEDULER
+{
+	int i;
+
+	qe.num_objs      = num_objs;
+	qe.saved_objects = kmalloc(sizeof(qe.saved_objects[0]) * num_objs, GFP_KERNEL);
+	if (!qe.saved_objects)
+		return -ENOMEM;
+
+	for (i = 0; i < num_objs; i++) {
+		qe.saved_objects[i].obj = objs[i];
+		drm_gem_object_reference(&objs[i]->base);
+	}
+}
+#endif
+
+	ring->outstanding_lazy_seqno    = 0;
+	ring->preallocated_lazy_request = NULL;
+
+	trace_i915_gem_ring_queue(ring, &qe);
+
+	return i915_scheduler_queue_execbuffer(&qe);
+}
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 8d2289f..f2a9243 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -47,6 +47,8 @@ struct i915_execbuffer_params {
 	struct drm_i915_gem_request     *request;
 	uint32_t                        scheduler_index;
 	uint32_t                        scheduler_flags;
+	uint32_t                        *emit_data;
+	uint32_t                        emit_len;
 };
 
 /* Flag bits for i915_execbuffer_params::scheduler_flags */
@@ -55,6 +57,12 @@ enum {
 	i915_ebp_sf_preempt          = (1 << 0),
 	/* Preemption was originally requested */
 	i915_ebp_sf_was_preempt      = (1 << 1),
+
+	/* Non-batch internal driver submissions */
+	i915_ebp_sf_not_a_batch      = (1 << 2),
+
+	/* Payload should be cacheline aligned in ring */
+	i915_ebp_sf_cacheline_align  = (1 << 3),
 };
 
 enum i915_scheduler_queue_status {
@@ -118,6 +126,10 @@ int         i915_scheduler_init(struct drm_device *dev);
 int         i915_scheduler_closefile(struct drm_device *dev,
 				     struct drm_file *file);
 int         i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe);
+int         i915_scheduler_queue_nonbatch(struct intel_engine_cs *ring,
+					  uint32_t *data, uint32_t len,
+					  struct drm_i915_gem_object *objs[],
+					  uint32_t num_objs, uint32_t flags);
 int         i915_scheduler_handle_IRQ(struct intel_engine_cs *ring);
 bool        i915_scheduler_is_idle(struct intel_engine_cs *ring);
 
-- 
1.7.9.5


* [RFC 44/44] drm/i915: Fake batch support for page flips
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (42 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 43/44] drm/i915: Added support for submitting out-of-batch ring commands John.C.Harrison
@ 2014-06-26 17:24 ` John.C.Harrison
  2014-07-07 19:25   ` Daniel Vetter
  2014-06-26 20:44 ` [RFC 00/44] GPU scheduler for i915 driver Dave Airlie
  2014-10-10 10:35 ` Steven Newbury
  45 siblings, 1 reply; 90+ messages in thread
From: John.C.Harrison @ 2014-06-26 17:24 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

Any commands written to the ring without the scheduler's knowledge can get lost
during a pre-emption event. This checkin updates the page flip code to send the
ring commands via the scheduler's 'fake batch' interface. Thus the page flip is
kept safe from being clobbered.
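
As a condensed illustration of the pattern adopted in the patch below (shown
here for the simple non-render-ring case, with the surrounding function and
error handling omitted), the flip commands are gathered into an array and
queued through the fake-batch interface instead of being emitted directly:

	uint32_t cmds[] = {
		MI_DISPLAY_FLIP_I915 | plane_bit,
		fb->pitches[0] | obj->tiling_mode,
		intel_crtc->unpin_work->gtt_offset,
		MI_NOOP
	};

	/*
	 * The scheduler now owns emission of these dwords, so a pre-emption
	 * cannot throw them away.
	 */
	ret = i915_scheduler_queue_nonbatch(ring, cmds, ARRAY_SIZE(cmds),
					    &obj, 1, sched_flags);
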
---
 drivers/gpu/drm/i915/intel_display.c |   84 ++++++++++++++++------------------
 1 file changed, 40 insertions(+), 44 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index fa1ffbb..8bbc5d3 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -9099,8 +9099,8 @@ static int intel_gen7_queue_flip(struct drm_device *dev,
 				 uint32_t flags)
 {
 	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
-	uint32_t plane_bit = 0;
-	int len, ret;
+	uint32_t plane_bit = 0, sched_flags;
+	int ret;
 
 	switch (intel_crtc->plane) {
 	case PLANE_A:
@@ -9117,18 +9117,6 @@ static int intel_gen7_queue_flip(struct drm_device *dev,
 		return -ENODEV;
 	}
 
-	len = 4;
-	if (ring->id == RCS) {
-		len += 6;
-		/*
-		 * On Gen 8, SRM is now taking an extra dword to accommodate
-		 * 48bits addresses, and we need a NOOP for the batch size to
-		 * stay even.
-		 */
-		if (IS_GEN8(dev))
-			len += 2;
-	}
-
 	/*
 	 * BSpec MI_DISPLAY_FLIP for IVB:
 	 * "The full packet must be contained within the same cache line."
@@ -9139,13 +9127,7 @@ static int intel_gen7_queue_flip(struct drm_device *dev,
 	 * then do the cacheline alignment, and finally emit the
 	 * MI_DISPLAY_FLIP.
 	 */
-	ret = intel_ring_cacheline_align(ring);
-	if (ret)
-		return ret;
-
-	ret = intel_ring_begin(ring, len);
-	if (ret)
-		return ret;
+	sched_flags = i915_ebp_sf_cacheline_align;
 
 	/* Unmask the flip-done completion message. Note that the bspec says that
 	 * we should do this for both the BCS and RCS, and that we must not unmask
@@ -9157,32 +9139,46 @@ static int intel_gen7_queue_flip(struct drm_device *dev,
 	 * to zero does lead to lockups within MI_DISPLAY_FLIP.
 	 */
 	if (ring->id == RCS) {
-		intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
-		intel_ring_emit(ring, DERRMR);
-		intel_ring_emit(ring, ~(DERRMR_PIPEA_PRI_FLIP_DONE |
-					DERRMR_PIPEB_PRI_FLIP_DONE |
-					DERRMR_PIPEC_PRI_FLIP_DONE));
-		if (IS_GEN8(dev))
-			intel_ring_emit(ring, MI_STORE_REGISTER_MEM_GEN8(1) |
-					      MI_SRM_LRM_GLOBAL_GTT);
-		else
-			intel_ring_emit(ring, MI_STORE_REGISTER_MEM(1) |
-					      MI_SRM_LRM_GLOBAL_GTT);
-		intel_ring_emit(ring, DERRMR);
-		intel_ring_emit(ring, ring->scratch.gtt_offset + 256);
-		if (IS_GEN8(dev)) {
-			intel_ring_emit(ring, 0);
-			intel_ring_emit(ring, MI_NOOP);
-		}
-	}
+		uint32_t cmds[] = {
+			MI_LOAD_REGISTER_IMM(1),
+			DERRMR,
+			~(DERRMR_PIPEA_PRI_FLIP_DONE |
+				DERRMR_PIPEB_PRI_FLIP_DONE |
+				DERRMR_PIPEC_PRI_FLIP_DONE),
+			IS_GEN8(dev) ? (MI_STORE_REGISTER_MEM_GEN8(1) |
+					MI_SRM_LRM_GLOBAL_GTT) :
+				       (MI_STORE_REGISTER_MEM(1) |
+					MI_SRM_LRM_GLOBAL_GTT),
+			DERRMR,
+			ring->scratch.gtt_offset + 256,
+//		if (IS_GEN8(dev)) {
+			0,
+			MI_NOOP,
+//		}
+			MI_DISPLAY_FLIP_I915 | plane_bit,
+			fb->pitches[0] | obj->tiling_mode,
+			intel_crtc->unpin_work->gtt_offset,
+			MI_NOOP
+		};
+		uint32_t len = sizeof(cmds) / sizeof(*cmds);
+
+		ret = i915_scheduler_queue_nonbatch(ring, cmds, len, &obj, 1, sched_flags);
+	} else {
+		uint32_t cmds[] = {
+			MI_DISPLAY_FLIP_I915 | plane_bit,
+			fb->pitches[0] | obj->tiling_mode,
+			intel_crtc->unpin_work->gtt_offset,
+			MI_NOOP
+		};
+		uint32_t len = sizeof(cmds) / sizeof(*cmds);
 
-	intel_ring_emit(ring, MI_DISPLAY_FLIP_I915 | plane_bit);
-	intel_ring_emit(ring, (fb->pitches[0] | obj->tiling_mode));
-	intel_ring_emit(ring, intel_crtc->unpin_work->gtt_offset);
-	intel_ring_emit(ring, (MI_NOOP));
+		ret = i915_scheduler_queue_nonbatch(ring, cmds, len, &obj, 1, sched_flags);
+	}
+	if (ret)
+		return ret;
 
 	intel_mark_page_flip_active(intel_crtc);
-	i915_add_request_wo_flush(ring);
+
 	return 0;
 }
 
-- 
1.7.9.5


* Re: [RFC 00/44] GPU scheduler for i915 driver
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (43 preceding siblings ...)
  2014-06-26 17:24 ` [RFC 44/44] drm/i915: Fake batch support for page flips John.C.Harrison
@ 2014-06-26 20:44 ` Dave Airlie
  2014-07-07 15:57   ` Daniel Vetter
  2014-10-10 10:35 ` Steven Newbury
  45 siblings, 1 reply; 90+ messages in thread
From: Dave Airlie @ 2014-06-26 20:44 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: intel-gfx@lists.freedesktop.org

>
> Implemented a batch buffer submission scheduler for the i915 DRM driver.
>

While this seems very interesting, you might want to address in the commit msg
or the cover email

a) why this is needed,
b) any improvements in speed, power consumption or throughput it generates,
i.e. benchmarks.

also some notes on what hw supports preemption.

Dave.


* Re: [RFC 01/44] drm/i915: Corrected 'file_priv' to 'file' in 'i915_driver_preclose()'
  2014-06-26 17:23 ` [RFC 01/44] drm/i915: Corrected 'file_priv' to 'file' in 'i915_driver_preclose()' John.C.Harrison
@ 2014-06-30 21:03   ` Jesse Barnes
  2014-07-07 18:02     ` Daniel Vetter
  0 siblings, 1 reply; 90+ messages in thread
From: Jesse Barnes @ 2014-06-30 21:03 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Thu, 26 Jun 2014 18:23:52 +0100
John.C.Harrison@Intel.com wrote:

> From: John Harrison <John.C.Harrison@Intel.com>
> 
> The 'i915_driver_preclose()' function has a parameter called 'file_priv'.
> However, this is misleading as the structure it points to is a 'drm_file' not a
> 'drm_i915_file_private'. It should be named just 'file' to avoid confusion.
> ---
>  drivers/gpu/drm/i915/i915_dma.c |    6 +++---
>  drivers/gpu/drm/i915/i915_drv.h |    6 +++---
>  2 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index b9159ad..6cce55b 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -1916,11 +1916,11 @@ void i915_driver_lastclose(struct drm_device * dev)
>  	i915_dma_cleanup(dev);
>  }
>  
> -void i915_driver_preclose(struct drm_device * dev, struct drm_file *file_priv)
> +void i915_driver_preclose(struct drm_device *dev, struct drm_file *file)
>  {
>  	mutex_lock(&dev->struct_mutex);
> -	i915_gem_context_close(dev, file_priv);
> -	i915_gem_release(dev, file_priv);
> +	i915_gem_context_close(dev, file);
> +	i915_gem_release(dev, file);
>  	mutex_unlock(&dev->struct_mutex);
>  }
>  
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index bea9ab40..7a96ca0 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2044,12 +2044,12 @@ void i915_update_dri1_breadcrumb(struct drm_device *dev);
>  extern void i915_kernel_lost_context(struct drm_device * dev);
>  extern int i915_driver_load(struct drm_device *, unsigned long flags);
>  extern int i915_driver_unload(struct drm_device *);
> -extern int i915_driver_open(struct drm_device *dev, struct drm_file *file_priv);
> +extern int i915_driver_open(struct drm_device *dev, struct drm_file *file);
>  extern void i915_driver_lastclose(struct drm_device * dev);
>  extern void i915_driver_preclose(struct drm_device *dev,
> -				 struct drm_file *file_priv);
> +				 struct drm_file *file);
>  extern void i915_driver_postclose(struct drm_device *dev,
> -				  struct drm_file *file_priv);
> +				  struct drm_file *file);
>  extern int i915_driver_device_is_agp(struct drm_device * dev);
>  #ifdef CONFIG_COMPAT
>  extern long i915_compat_ioctl(struct file *filp, unsigned int cmd,

Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>

-- 
Jesse Barnes, Intel Open Source Technology Center


* Re: [RFC 03/44] drm/i915: Add extra add_request calls
  2014-06-26 17:23 ` [RFC 03/44] drm/i915: Add extra add_request calls John.C.Harrison
@ 2014-06-30 21:10   ` Jesse Barnes
  2014-07-07 18:41     ` Daniel Vetter
  0 siblings, 1 reply; 90+ messages in thread
From: Jesse Barnes @ 2014-06-30 21:10 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Thu, 26 Jun 2014 18:23:54 +0100
John.C.Harrison@Intel.com wrote:

> From: John Harrison <John.C.Harrison@Intel.com>
> 
> The scheduler needs to track batch buffers by seqno without extra, non-batch
> buffer work being attached to the same seqno. This means that anywhere which
> adds work to the ring should explicitly call i915_add_request() when it has
> finished writing to the ring.
> 
> The add_request() function does extra work, such as flushing caches, that does
> not necessarily want to be done everywhere. Instead, a new
> i915_add_request_wo_flush() function has been added which skips the cache flush
> and just tidies up request structures and seqno values.
> 
> Note, much of this patch was implemented by Naresh Kumar Kachhi for pending
> power management improvements. However, it is also directly applicable to the
> scheduler work as noted above.
> ---
>  drivers/gpu/drm/i915/i915_dma.c              |    5 +++++
>  drivers/gpu/drm/i915/i915_drv.h              |    9 +++++---
>  drivers/gpu/drm/i915/i915_gem.c              |   31 ++++++++++++++++++++------
>  drivers/gpu/drm/i915/i915_gem_context.c      |    9 ++++++++
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c   |    4 ++--
>  drivers/gpu/drm/i915/i915_gem_render_state.c |    2 +-
>  drivers/gpu/drm/i915/intel_display.c         |   10 ++++-----
>  7 files changed, 52 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index 67f2918..494b156 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -456,6 +456,7 @@ static int i915_dispatch_cmdbuffer(struct drm_device * dev,
>  				   struct drm_clip_rect *cliprects,
>  				   void *cmdbuf)
>  {
> +	struct drm_i915_private *dev_priv = dev->dev_private;
>  	int nbox = cmd->num_cliprects;
>  	int i = 0, count, ret;
>  
> @@ -482,6 +483,7 @@ static int i915_dispatch_cmdbuffer(struct drm_device * dev,
>  	}
>  
>  	i915_emit_breadcrumb(dev);
> +	i915_add_request_wo_flush(LP_RING(dev_priv));
>  	return 0;
>  }
>  
> @@ -544,6 +546,7 @@ static int i915_dispatch_batchbuffer(struct drm_device * dev,
>  	}
>  
>  	i915_emit_breadcrumb(dev);
> +	i915_add_request_wo_flush(LP_RING(dev_priv));
>  	return 0;
>  }
>  
> @@ -597,6 +600,7 @@ static int i915_dispatch_flip(struct drm_device * dev)
>  		ADVANCE_LP_RING();
>  	}
>  
> +	i915_add_request_wo_flush(LP_RING(dev_priv));
>  	master_priv->sarea_priv->pf_current_page = dev_priv->dri1.current_page;
>  	return 0;
>  }
> @@ -774,6 +778,7 @@ static int i915_emit_irq(struct drm_device * dev)
>  		OUT_RING(dev_priv->dri1.counter);
>  		OUT_RING(MI_USER_INTERRUPT);
>  		ADVANCE_LP_RING();
> +		i915_add_request_wo_flush(LP_RING(dev_priv));
>  	}
>  
>  	return dev_priv->dri1.counter;
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 7a96ca0..e3295cb 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2199,7 +2199,7 @@ static inline void i915_gem_object_unpin_pages(struct drm_i915_gem_object *obj)
>  
>  int __must_check i915_mutex_lock_interruptible(struct drm_device *dev);
>  int i915_gem_object_sync(struct drm_i915_gem_object *obj,
> -			 struct intel_engine_cs *to);
> +			 struct intel_engine_cs *to, bool add_request);
>  void i915_vma_move_to_active(struct i915_vma *vma,
>  			     struct intel_engine_cs *ring);
>  int i915_gem_dumb_create(struct drm_file *file_priv,
> @@ -2272,9 +2272,12 @@ int __must_check i915_gem_suspend(struct drm_device *dev);
>  int __i915_add_request(struct intel_engine_cs *ring,
>  		       struct drm_file *file,
>  		       struct drm_i915_gem_object *batch_obj,
> -		       u32 *seqno);
> +		       u32 *seqno,
> +		       bool flush_caches);
>  #define i915_add_request(ring, seqno) \
> -	__i915_add_request(ring, NULL, NULL, seqno)
> +	__i915_add_request(ring, NULL, NULL, seqno, true)
> +#define i915_add_request_wo_flush(ring) \
> +	__i915_add_request(ring, NULL, NULL, NULL, false)
>  int __must_check i915_wait_seqno(struct intel_engine_cs *ring,
>  				 uint32_t seqno);
>  int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf);
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 5a13d9e..898660c 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2320,7 +2320,8 @@ i915_gem_get_seqno(struct drm_device *dev, u32 *seqno)
>  int __i915_add_request(struct intel_engine_cs *ring,
>  		       struct drm_file *file,
>  		       struct drm_i915_gem_object *obj,
> -		       u32 *out_seqno)
> +		       u32 *out_seqno,
> +		       bool flush_caches)
>  {
>  	struct drm_i915_private *dev_priv = ring->dev->dev_private;
>  	struct drm_i915_gem_request *request;
> @@ -2335,9 +2336,11 @@ int __i915_add_request(struct intel_engine_cs *ring,
>  	 * is that the flush _must_ happen before the next request, no matter
>  	 * what.
>  	 */
> -	ret = intel_ring_flush_all_caches(ring);
> -	if (ret)
> -		return ret;
> +	if (flush_caches) {
> +		ret = intel_ring_flush_all_caches(ring);
> +		if (ret)
> +			return ret;
> +	}
>  
>  	request = ring->preallocated_lazy_request;
>  	if (WARN_ON(request == NULL))
> @@ -2815,6 +2818,8 @@ out:
>   *
>   * @obj: object which may be in use on another ring.
>   * @to: ring we wish to use the object on. May be NULL.
> + * @add_request: do we need to add a request to track operations
> + *    submitted on ring with sync_to function
>   *
>   * This code is meant to abstract object synchronization with the GPU.
>   * Calling with NULL implies synchronizing the object with the CPU
> @@ -2824,7 +2829,7 @@ out:
>   */
>  int
>  i915_gem_object_sync(struct drm_i915_gem_object *obj,
> -		     struct intel_engine_cs *to)
> +		     struct intel_engine_cs *to, bool add_request)
>  {
>  	struct intel_engine_cs *from = obj->ring;
>  	u32 seqno;
> @@ -2848,12 +2853,15 @@ i915_gem_object_sync(struct drm_i915_gem_object *obj,
>  
>  	trace_i915_gem_ring_sync_to(from, to, seqno);
>  	ret = to->semaphore.sync_to(to, from, seqno);
> -	if (!ret)
> +	if (!ret) {
>  		/* We use last_read_seqno because sync_to()
>  		 * might have just caused seqno wrap under
>  		 * the radar.
>  		 */
>  		from->semaphore.sync_seqno[idx] = obj->last_read_seqno;
> +		if (add_request)
> +			i915_add_request_wo_flush(to);
> +	}
>  
>  	return ret;
>  }
> @@ -2958,6 +2966,15 @@ int i915_gpu_idle(struct drm_device *dev)
>  		if (ret)
>  			return ret;
>  
> +		/* Make sure the context switch (if one actually happened)
> +		 * gets wrapped up and finished rather than hanging around
> +		 * and confusing things later. */
> +		if (ring->outstanding_lazy_seqno) {
> +			ret = i915_add_request(ring, NULL);
> +			if (ret)
> +				return ret;
> +		}
> +
>  		ret = intel_ring_idle(ring);
>  		if (ret)
>  			return ret;
> @@ -3832,7 +3849,7 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
>  	int ret;
>  
>  	if (pipelined != obj->ring) {
> -		ret = i915_gem_object_sync(obj, pipelined);
> +		ret = i915_gem_object_sync(obj, pipelined, true);
>  		if (ret)
>  			return ret;
>  	}
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 3ffe308..d1d2ee0 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -488,6 +488,15 @@ int i915_gem_context_enable(struct drm_i915_private *dev_priv)
>  		ret = i915_switch_context(ring, ring->default_context);
>  		if (ret)
>  			return ret;
> +
> +		/* Make sure the context switch (if one actually happened)
> +		 * gets wrapped up and finished rather than hanging around
> +		 * and confusing things later. */
> +		if(ring->outstanding_lazy_seqno) {
> +			ret = i915_add_request_wo_flush(ring);
> +			if (ret)
> +				return ret;
> +		}
>  	}
>  
>  	return 0;
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 3a30133..ee836a6 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -858,7 +858,7 @@ i915_gem_execbuffer_move_to_gpu(struct intel_engine_cs *ring,
>  
>  	list_for_each_entry(vma, vmas, exec_list) {
>  		struct drm_i915_gem_object *obj = vma->obj;
> -		ret = i915_gem_object_sync(obj, ring);
> +		ret = i915_gem_object_sync(obj, ring, false);
>  		if (ret)
>  			return ret;
>  
> @@ -998,7 +998,7 @@ i915_gem_execbuffer_retire_commands(struct drm_device *dev,
>  	ring->gpu_caches_dirty = true;
>  
>  	/* Add a breadcrumb for the completion of the batch buffer */
> -	(void)__i915_add_request(ring, file, obj, NULL);
> +	(void)__i915_add_request(ring, file, obj, NULL, true);
>  }
>  
>  static int
> diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.c b/drivers/gpu/drm/i915/i915_gem_render_state.c
> index 3521f99..50118cb 100644
> --- a/drivers/gpu/drm/i915/i915_gem_render_state.c
> +++ b/drivers/gpu/drm/i915/i915_gem_render_state.c
> @@ -190,7 +190,7 @@ int i915_gem_render_state_init(struct intel_engine_cs *ring)
>  
>  	i915_vma_move_to_active(i915_gem_obj_to_ggtt(so->obj), ring);
>  
> -	ret = __i915_add_request(ring, NULL, so->obj, NULL);
> +	ret = __i915_add_request(ring, NULL, so->obj, NULL, true);
>  	/* __i915_add_request moves object to inactive if it fails */
>  out:
>  	render_state_free(so);
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 54095d4..fa1ffbb 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -8980,7 +8980,7 @@ static int intel_gen2_queue_flip(struct drm_device *dev,
>  	intel_ring_emit(ring, 0); /* aux display base address, unused */
>  
>  	intel_mark_page_flip_active(intel_crtc);
> -	__intel_ring_advance(ring);
> +	i915_add_request_wo_flush(ring);
>  	return 0;
>  }
>  
> @@ -9012,7 +9012,7 @@ static int intel_gen3_queue_flip(struct drm_device *dev,
>  	intel_ring_emit(ring, MI_NOOP);
>  
>  	intel_mark_page_flip_active(intel_crtc);
> -	__intel_ring_advance(ring);
> +	i915_add_request_wo_flush(ring);
>  	return 0;
>  }
>  
> @@ -9051,7 +9051,7 @@ static int intel_gen4_queue_flip(struct drm_device *dev,
>  	intel_ring_emit(ring, pf | pipesrc);
>  
>  	intel_mark_page_flip_active(intel_crtc);
> -	__intel_ring_advance(ring);
> +	i915_add_request_wo_flush(ring);
>  	return 0;
>  }
>  
> @@ -9087,7 +9087,7 @@ static int intel_gen6_queue_flip(struct drm_device *dev,
>  	intel_ring_emit(ring, pf | pipesrc);
>  
>  	intel_mark_page_flip_active(intel_crtc);
> -	__intel_ring_advance(ring);
> +	i915_add_request_wo_flush(ring);
>  	return 0;
>  }
>  
> @@ -9182,7 +9182,7 @@ static int intel_gen7_queue_flip(struct drm_device *dev,
>  	intel_ring_emit(ring, (MI_NOOP));
>  
>  	intel_mark_page_flip_active(intel_crtc);
> -	__intel_ring_advance(ring);
> +	i915_add_request_wo_flush(ring);
>  	return 0;
>  }
>  

I think "no_flush" would be more in line with some of the other
functions in the kernel.  "wo" makes me think of "write only".  But
it's not a big deal.

I do wonder about the rules for when add_request is needed though, and
I need to look later in the series for the usage.  When I looked at it
in relation to fences, it didn't seem to be a good fit since it looked
like requests got freed when the active list was cleared, vs when they
were actually consumed by some user.

But this patch seems straightforward enough, so:

Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>

-- 
Jesse Barnes, Intel Open Source Technology Center


* Re: [RFC 04/44] drm/i915: Fix null pointer dereference in error capture
  2014-06-26 17:23 ` [RFC 04/44] drm/i915: Fix null pointer dereference in error capture John.C.Harrison
@ 2014-06-30 21:40   ` Jesse Barnes
  2014-07-01  7:12     ` Chris Wilson
  2014-07-01  7:20   ` [PATCH] drm/i915: Remove num_pages parameter to i915_error_object_create() Chris Wilson
  1 sibling, 1 reply; 90+ messages in thread
From: Jesse Barnes @ 2014-06-30 21:40 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Thu, 26 Jun 2014 18:23:55 +0100
John.C.Harrison@Intel.com wrote:

> From: John Harrison <John.C.Harrison@Intel.com>
> 
> The i915_gem_record_rings() code was unconditionally querying and saving state
> for the batch_obj of a request structure. This is not necessarily set. Thus a
> null pointer dereference can occur.
> ---
>  drivers/gpu/drm/i915/i915_gpu_error.c |   13 +++++++------
>  1 file changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 87ec60e..0738f21 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -902,12 +902,13 @@ static void i915_gem_record_rings(struct drm_device *dev,
>  			 * as the simplest method to avoid being overwritten
>  			 * by userspace.
>  			 */
> -			error->ring[i].batchbuffer =
> -				i915_error_object_create(dev_priv,
> -							 request->batch_obj,
> -							 request->ctx ?
> -							 request->ctx->vm :
> -							 &dev_priv->gtt.base);
> +			if(request->batch_obj)
> +				error->ring[i].batchbuffer =
> +					i915_error_object_create(dev_priv,
> +								 request->batch_obj,
> +								 request->ctx ?
> +								 request->ctx->vm :
> +								 &dev_priv->gtt.base);
>  
>  			if (HAS_BROKEN_CS_TLB(dev_priv->dev) &&
>  			    ring->scratch.obj)

Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>

-- 
Jesse Barnes, Intel Open Source Technology Center


* Re: [RFC 04/44] drm/i915: Fix null pointer dereference in error capture
  2014-06-30 21:40   ` Jesse Barnes
@ 2014-07-01  7:12     ` Chris Wilson
  2014-07-07 18:49       ` Daniel Vetter
  0 siblings, 1 reply; 90+ messages in thread
From: Chris Wilson @ 2014-07-01  7:12 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: Intel-GFX

On Mon, Jun 30, 2014 at 02:40:05PM -0700, Jesse Barnes wrote:
> On Thu, 26 Jun 2014 18:23:55 +0100
> John.C.Harrison@Intel.com wrote:
> 
> > From: John Harrison <John.C.Harrison@Intel.com>
> > 
> > The i915_gem_record_rings() code was unconditionally querying and saving state
> > for the batch_obj of a request structure. This is not necessarily set. Thus a
> > null pointer dereference can occur.
> > ---
> >  drivers/gpu/drm/i915/i915_gpu_error.c |   13 +++++++------
> >  1 file changed, 7 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> > index 87ec60e..0738f21 100644
> > --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> > +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> > @@ -902,12 +902,13 @@ static void i915_gem_record_rings(struct drm_device *dev,
> >  			 * as the simplest method to avoid being overwritten
> >  			 * by userspace.
> >  			 */
> > -			error->ring[i].batchbuffer =
> > -				i915_error_object_create(dev_priv,
> > -							 request->batch_obj,
> > -							 request->ctx ?
> > -							 request->ctx->vm :
> > -							 &dev_priv->gtt.base);
> > +			if(request->batch_obj)
> > +				error->ring[i].batchbuffer =
> > +					i915_error_object_create(dev_priv,
> > +								 request->batch_obj,
> > +								 request->ctx ?
> > +								 request->ctx->vm :
> > +								 &dev_priv->gtt.base);
> >  
> >  			if (HAS_BROKEN_CS_TLB(dev_priv->dev) &&
> >  			    ring->scratch.obj)
> 
> Reviewed-by: Jesse Barnes <jbarnes@virtuosugeek.org>

Nah, put the NULL check into the macro. i915_error_object_create() was
originally written as a no-op on NULL pointers for cleanliness, we may
as well do the check centrally and remove the extras we have grown.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


* [PATCH] drm/i915: Remove num_pages parameter to i915_error_object_create()
  2014-06-26 17:23 ` [RFC 04/44] drm/i915: Fix null pointer dereference in error capture John.C.Harrison
  2014-06-30 21:40   ` Jesse Barnes
@ 2014-07-01  7:20   ` Chris Wilson
  1 sibling, 0 replies; 90+ messages in thread
From: Chris Wilson @ 2014-07-01  7:20 UTC (permalink / raw)
  To: intel-gfx

For cleanliness, i915_error_object_create() was written to handle the
NULL pointer in a central location. The macro that wrapped it and passed
it a num_pages to use was not safe. As we now never limit the num_pages
to use (we did so at one point to only capture the first page of the
context), we can remove the redundant macro and be NULL safe again.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_gpu_error.c | 25 ++++++++++---------------
 1 file changed, 10 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 394e283970a8..f1581a4af7a7 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -538,12 +538,12 @@ static void i915_error_state_free(struct kref *error_ref)
 }
 
 static struct drm_i915_error_object *
-i915_error_object_create_sized(struct drm_i915_private *dev_priv,
-			       struct drm_i915_gem_object *src,
-			       struct i915_address_space *vm,
-			       int num_pages)
+i915_error_object_create(struct drm_i915_private *dev_priv,
+			 struct drm_i915_gem_object *src,
+			 struct i915_address_space *vm)
 {
 	struct drm_i915_error_object *dst;
+	int num_pages;
 	bool use_ggtt;
 	int i = 0;
 	u32 reloc_offset;
@@ -551,6 +551,8 @@ i915_error_object_create_sized(struct drm_i915_private *dev_priv,
 	if (src == NULL || src->pages == NULL)
 		return NULL;
 
+	num_pages = src->base.size >> PAGE_SHIFT;
+
 	dst = kmalloc(sizeof(*dst) + num_pages * sizeof(u32 *), GFP_ATOMIC);
 	if (dst == NULL)
 		return NULL;
@@ -629,13 +631,8 @@ unwind:
 	kfree(dst);
 	return NULL;
 }
-#define i915_error_object_create(dev_priv, src, vm) \
-	i915_error_object_create_sized((dev_priv), (src), (vm), \
-				       (src)->base.size>>PAGE_SHIFT)
-
 #define i915_error_ggtt_object_create(dev_priv, src) \
-	i915_error_object_create_sized((dev_priv), (src), &(dev_priv)->gtt.base, \
-				       (src)->base.size>>PAGE_SHIFT)
+	i915_error_object_create((dev_priv), (src), &(dev_priv)->gtt.base)
 
 static void capture_bo(struct drm_i915_error_buffer *err,
 		       struct i915_vma *vma)
@@ -932,8 +929,7 @@ static void i915_gem_record_rings(struct drm_device *dev,
 							 request->ctx->vm :
 							 &dev_priv->gtt.base);
 
-			if (HAS_BROKEN_CS_TLB(dev_priv->dev) &&
-			    ring->scratch.obj)
+			if (HAS_BROKEN_CS_TLB(dev_priv->dev))
 				error->ring[i].wa_batchbuffer =
 					i915_error_ggtt_object_create(dev_priv,
 							     ring->scratch.obj);
@@ -955,9 +951,8 @@ static void i915_gem_record_rings(struct drm_device *dev,
 		error->ring[i].ringbuffer =
 			i915_error_ggtt_object_create(dev_priv, ring->buffer->obj);
 
-		if (ring->status_page.obj)
-			error->ring[i].hws_page =
-				i915_error_ggtt_object_create(dev_priv, ring->status_page.obj);
+		error->ring[i].hws_page =
+			i915_error_ggtt_object_create(dev_priv, ring->status_page.obj);
 
 		i915_gem_record_active_context(ring, error, &error->ring[i]);
 
-- 
2.0.0


* Re: [RFC 05/44] drm/i915: Updating assorted register and status page definitions
  2014-06-26 17:23 ` [RFC 05/44] drm/i915: Updating assorted register and status page definitions John.C.Harrison
@ 2014-07-02 17:49   ` Jesse Barnes
  0 siblings, 0 replies; 90+ messages in thread
From: Jesse Barnes @ 2014-07-02 17:49 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Thu, 26 Jun 2014 18:23:56 +0100
John.C.Harrison@Intel.com wrote:

> + * Preemption-related registers
> + */
> +#define RING_UHPTR(base)	((base)+0x134)
> +#define   UHPTR_GFX_ADDR_ALIGN		(0x7)
> +#define   UHPTR_VALID			(0x1)
> +#define RING_PREEMPT_ADDR	0x0214c
> +#define   PREEMPT_BATCH_LEVEL_MASK	(0x3)
> +#define BB_PREEMPT_ADDR		0x02148
> +#define SBB_PREEMPT_ADDR	0x0213c
> +#define RS_PREEMPT_STATUS	0x0215c

I couldn't find these easily, and the GFX_ADDR_ALIGN is just page
alignment right?  So you might not need that one.  But overall looks
fine.

Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>

-- 
Jesse Barnes, Intel Open Source Technology Center


* Re: [RFC 06/44] drm/i915: Fixes for FIFO space queries
  2014-06-26 17:23 ` [RFC 06/44] drm/i915: Fixes for FIFO space queries John.C.Harrison
@ 2014-07-02 17:50   ` Jesse Barnes
  0 siblings, 0 replies; 90+ messages in thread
From: Jesse Barnes @ 2014-07-02 17:50 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Thu, 26 Jun 2014 18:23:57 +0100
John.C.Harrison@Intel.com wrote:

> From: John Harrison <John.C.Harrison@Intel.com>
> 
> The previous code was not correctly masking the value of the GTFIFOCTL register,
> leading to overruns and the message "MMIO read or write has been dropped". In
> addition, the checks were repeated in several different places. This commit
> replaces these various checks with a simple (inline) function to encapsulate the
> read-and-mask operation. In addition, it adds a custom wait-for-fifo function
> for VLV, as the timing parameters are somewhat different from those on earlier
> chips.
> ---
>  drivers/gpu/drm/i915/intel_uncore.c |   49 ++++++++++++++++++++++++++++++-----
>  1 file changed, 42 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> index 871c284..6a3dddf 100644
> --- a/drivers/gpu/drm/i915/intel_uncore.c
> +++ b/drivers/gpu/drm/i915/intel_uncore.c
> @@ -47,6 +47,12 @@ assert_device_not_suspended(struct drm_i915_private *dev_priv)
>  	     "Device suspended\n");
>  }
>  
> +static inline u32 fifo_free_entries(struct drm_i915_private *dev_priv)
> +{
> +	u32 count = __raw_i915_read32(dev_priv, GTFIFOCTL);
> +	return count & GT_FIFO_FREE_ENTRIES_MASK;
> +}
> +
>  static void __gen6_gt_wait_for_thread_c0(struct drm_i915_private *dev_priv)
>  {
>  	u32 gt_thread_status_mask;
> @@ -154,6 +160,28 @@ static void __gen7_gt_force_wake_mt_put(struct drm_i915_private *dev_priv,
>  		gen6_gt_check_fifodbg(dev_priv);
>  }
>  
> +static int __vlv_gt_wait_for_fifo(struct drm_i915_private *dev_priv)
> +{
> +	u32 free = fifo_free_entries(dev_priv);
> +	int loop1, loop2;
> +
> +	for (loop1 = 0; loop1 < 5000 && free < GT_FIFO_NUM_RESERVED_ENTRIES; ) {
> +		for (loop2 = 0; loop2 < 1000 && free < GT_FIFO_NUM_RESERVED_ENTRIES; loop2 += 10) {
> +			udelay(10);
> +			free = fifo_free_entries(dev_priv);
> +		}
> +		loop1 += loop2;
> +		if (loop1 > 1000 || free < 48)
> +			DRM_DEBUG("after %d us, the FIFO has %d slots", loop1, free);
> +	}
> +
> +	dev_priv->uncore.fifo_count = free;
> +	if (WARN(free < GT_FIFO_NUM_RESERVED_ENTRIES,
> +		"FIFO has insufficient space (%d slots)", free))
> +		return -1;
> +	return 0;
> +}
> +
>  static int __gen6_gt_wait_for_fifo(struct drm_i915_private *dev_priv)
>  {
>  	int ret = 0;
> @@ -161,16 +189,15 @@ static int __gen6_gt_wait_for_fifo(struct drm_i915_private *dev_priv)
>  	/* On VLV, FIFO will be shared by both SW and HW.
>  	 * So, we need to read the FREE_ENTRIES everytime */
>  	if (IS_VALLEYVIEW(dev_priv->dev))
> -		dev_priv->uncore.fifo_count =
> -			__raw_i915_read32(dev_priv, GTFIFOCTL) &
> -						GT_FIFO_FREE_ENTRIES_MASK;
> +		return __vlv_gt_wait_for_fifo(dev_priv);
>  
>  	if (dev_priv->uncore.fifo_count < GT_FIFO_NUM_RESERVED_ENTRIES) {
>  		int loop = 500;
> -		u32 fifo = __raw_i915_read32(dev_priv, GTFIFOCTL) & GT_FIFO_FREE_ENTRIES_MASK;
> +		u32 fifo = fifo_free_entries(dev_priv);
> +
>  		while (fifo <= GT_FIFO_NUM_RESERVED_ENTRIES && loop--) {
>  			udelay(10);
> -			fifo = __raw_i915_read32(dev_priv, GTFIFOCTL) & GT_FIFO_FREE_ENTRIES_MASK;
> +			fifo = fifo_free_entries(dev_priv);
>  		}
>  		if (WARN_ON(loop < 0 && fifo <= GT_FIFO_NUM_RESERVED_ENTRIES))
>  			++ret;
> @@ -194,6 +221,11 @@ static void vlv_force_wake_reset(struct drm_i915_private *dev_priv)
>  static void __vlv_force_wake_get(struct drm_i915_private *dev_priv,
>  						int fw_engine)
>  {
> +#if	1
> +	if (__gen6_gt_wait_for_fifo(dev_priv))
> +		gen6_gt_check_fifodbg(dev_priv);
> +#endif
> +
>  	/* Check for Render Engine */
>  	if (FORCEWAKE_RENDER & fw_engine) {
>  		if (wait_for_atomic((__raw_i915_read32(dev_priv,
> @@ -238,6 +270,10 @@ static void __vlv_force_wake_get(struct drm_i915_private *dev_priv,
>  static void __vlv_force_wake_put(struct drm_i915_private *dev_priv,
>  					int fw_engine)
>  {
> +#if	1
> +	if (__gen6_gt_wait_for_fifo(dev_priv))
> +		gen6_gt_check_fifodbg(dev_priv);
> +#endif
>  
>  	/* Check for Render Engine */
>  	if (FORCEWAKE_RENDER & fw_engine)
> @@ -355,8 +391,7 @@ static void intel_uncore_forcewake_reset(struct drm_device *dev, bool restore)
>  
>  		if (IS_GEN6(dev) || IS_GEN7(dev))
>  			dev_priv->uncore.fifo_count =
> -				__raw_i915_read32(dev_priv, GTFIFOCTL) &
> -				GT_FIFO_FREE_ENTRIES_MASK;
> +				fifo_free_entries(dev_priv);
>  	} else {
>  		dev_priv->uncore.forcewake_count = 0;
>  		dev_priv->uncore.fw_rendercount = 0;

It would be best to split out the free_entries cleanup (a good one)
from the vlv bug fix, and also drop the #if 1s.

With that done:

Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>

-- 
Jesse Barnes, Intel Open Source Technology Center


* Re: [RFC 07/44] drm/i915: Disable 'get seqno' workaround for VLV
  2014-06-26 17:23 ` [RFC 07/44] drm/i915: Disable 'get seqno' workaround for VLV John.C.Harrison
@ 2014-07-02 17:51   ` Jesse Barnes
  2014-07-07 18:56     ` Daniel Vetter
  0 siblings, 1 reply; 90+ messages in thread
From: Jesse Barnes @ 2014-07-02 17:51 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Thu, 26 Jun 2014 18:23:58 +0100
John.C.Harrison@Intel.com wrote:

> From: John Harrison <John.C.Harrison@Intel.com>
> 
> There is a workaround for a hardware bug when reading the seqno from the status
> page. The bug does not exist on VLV; however, the workaround was still being
> applied.
> ---
>  drivers/gpu/drm/i915/intel_ringbuffer.c |    5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 279488a..bad5db0 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1960,7 +1960,10 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
>  			ring->irq_put = gen6_ring_put_irq;
>  		}
>  		ring->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
> -		ring->get_seqno = gen6_ring_get_seqno;
> +		if (IS_VALLEYVIEW(dev))
> +			ring->get_seqno = ring_get_seqno;
> +		else
> +			ring->get_seqno = gen6_ring_get_seqno;
>  		ring->set_seqno = ring_set_seqno;
>  		ring->semaphore.sync_to = gen6_ring_sync;
>  		ring->semaphore.signal = gen6_signal;

Assuming this has been well tested:
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>

-- 
Jesse Barnes, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 09/44] drm/i915: Start of GPU scheduler
  2014-06-26 17:24 ` [RFC 09/44] drm/i915: Start of GPU scheduler John.C.Harrison
@ 2014-07-02 17:55   ` Jesse Barnes
  2014-07-07 19:02   ` Daniel Vetter
  1 sibling, 0 replies; 90+ messages in thread
From: Jesse Barnes @ 2014-07-02 17:55 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Thu, 26 Jun 2014 18:24:00 +0100
John.C.Harrison@Intel.com wrote:

> From: John Harrison <John.C.Harrison@Intel.com>
> 
> Created GPU scheduler source files with only a basic init function.
> ---
>  drivers/gpu/drm/i915/Makefile         |    1 +
>  drivers/gpu/drm/i915/i915_drv.h       |    4 +++
>  drivers/gpu/drm/i915/i915_gem.c       |    3 ++
>  drivers/gpu/drm/i915/i915_scheduler.c |   59 +++++++++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/i915_scheduler.h |   40 ++++++++++++++++++++++
>  5 files changed, 107 insertions(+)
>  create mode 100644 drivers/gpu/drm/i915/i915_scheduler.c
>  create mode 100644 drivers/gpu/drm/i915/i915_scheduler.h
> 
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index cad1683..12817a8 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -11,6 +11,7 @@ i915-y := i915_drv.o \
>  	  i915_params.o \
>            i915_suspend.o \
>  	  i915_sysfs.o \
> +	  i915_scheduler.o \
>  	  intel_pm.o
>  i915-$(CONFIG_COMPAT)   += i915_ioc32.o
>  i915-$(CONFIG_DEBUG_FS) += i915_debugfs.o
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 53f6fe5..6e592d3 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1331,6 +1331,8 @@ struct intel_pipe_crc {
>  	wait_queue_head_t wq;
>  };
>  
> +struct i915_scheduler;
> +
>  struct drm_i915_private {
>  	struct drm_device *dev;
>  	struct kmem_cache *slab;
> @@ -1540,6 +1542,8 @@ struct drm_i915_private {
>  
>  	struct i915_runtime_pm pm;
>  
> +	struct i915_scheduler *scheduler;
> +
>  	/* Old dri1 support infrastructure, beware the dragons ya fools entering
>  	 * here! */
>  	struct i915_dri1_state dri1;
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 898660c..b784eb2 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -37,6 +37,7 @@
>  #include <linux/swap.h>
>  #include <linux/pci.h>
>  #include <linux/dma-buf.h>
> +#include "i915_scheduler.h"
>  
>  static void i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj);
>  static void i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj,
> @@ -4669,6 +4670,8 @@ static int i915_gem_init_rings(struct drm_device *dev)
>  			goto cleanup_vebox_ring;
>  	}
>  
> +	i915_scheduler_init(dev);
> +
>  	ret = i915_gem_set_seqno(dev, ((u32)~0 - 0x1000));
>  	if (ret)
>  		goto cleanup_bsd2_ring;
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
> new file mode 100644
> index 0000000..9ec0225
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/i915_scheduler.c
> @@ -0,0 +1,59 @@
> +/*
> + * Copyright (c) 2014 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + *
> + */
> +
> +#include "i915_drv.h"
> +#include "intel_drv.h"
> +#include "i915_scheduler.h"
> +
> +#ifdef CONFIG_DRM_I915_SCHEDULER
> +
> +int i915_scheduler_init(struct drm_device *dev)
> +{
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct i915_scheduler   *scheduler = dev_priv->scheduler;
> +
> +	if (scheduler)
> +		return 0;
> +
> +	scheduler = kzalloc(sizeof(*scheduler), GFP_KERNEL);
> +	if (!scheduler)
> +		return -ENOMEM;
> +
> +	spin_lock_init(&scheduler->lock);
> +
> +	scheduler->index = 1;
> +
> +	dev_priv->scheduler = scheduler;
> +
> +	return 0;
> +}
> +
> +#else   /* CONFIG_DRM_I915_SCHEDULER */
> +
> +int i915_scheduler_init(struct drm_device *dev)
> +{
> +	return 0;
> +}
> +
> +#endif  /* CONFIG_DRM_I915_SCHEDULER */

Usually these bits are hidden in a header, and the source file isn't
compiled in if the config isn't set.  But I think once we get it in,
we might just want a runtime option rather than a config option anyway,
so I'd say you could just drop the config option.
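
(For illustration, a sketch of that conventional pattern, reusing the names from
the patch: the Makefile would add i915_scheduler.o only when
CONFIG_DRM_I915_SCHEDULER is set, and the header would carry static inline stubs
for the disabled case, so the .c file needs no #ifdef at all.)

/* i915_scheduler.h -- sketch of the usual config-stub pattern */
#ifdef CONFIG_DRM_I915_SCHEDULER
int i915_scheduler_init(struct drm_device *dev);
#else
static inline int i915_scheduler_init(struct drm_device *dev)
{
	/* Scheduler compiled out: nothing to set up */
	return 0;
}
#endif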

-- 
Jesse Barnes, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 10/44] drm/i915: Prepare retire_requests to handle out-of-order seqnos
  2014-06-26 17:24 ` [RFC 10/44] drm/i915: Prepare retire_requests to handle out-of-order seqnos John.C.Harrison
@ 2014-07-02 18:11   ` Jesse Barnes
  2014-07-07 19:05   ` Daniel Vetter
  1 sibling, 0 replies; 90+ messages in thread
From: Jesse Barnes @ 2014-07-02 18:11 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Thu, 26 Jun 2014 18:24:01 +0100
John.C.Harrison@Intel.com wrote:

> From: John Harrison <John.C.Harrison@Intel.com>
> 
> A major point of the GPU scheduler is that it re-orders batch buffers after they
> have been submitted to the driver. Rather than attempting to re-assign seqno
> values, it is much simpler to have each batch buffer keep its initially assigned
> number and modify the rest of the driver to cope with seqnos being returned out
> of order. In practice, very little code actually needs updating to cope.
> 
> One such place is the retire request handler. Rather than stopping as soon as an
> uncompleted seqno is found, it must now keep iterating through the requests in
> case later seqnos have completed. There is also a problem with doing the free of
> the request before the move to inactive. Thus the requests are now moved to a
> temporary list first, then the objects de-activated and finally the requests on
> the temporary list are freed.
> ---
>  drivers/gpu/drm/i915/i915_gem.c |   60 +++++++++++++++++++++------------------
>  1 file changed, 32 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index b784eb2..7e53446 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2602,7 +2602,10 @@ void i915_gem_reset(struct drm_device *dev)
>  void
>  i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
>  {
> +	struct drm_i915_gem_object *obj, *obj_next;
> +	struct drm_i915_gem_request *req, *req_next;
>  	uint32_t seqno;
> +	LIST_HEAD(deferred_request_free);
>  
>  	if (list_empty(&ring->request_list))
>  		return;
> @@ -2611,43 +2614,35 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
>  
>  	seqno = ring->get_seqno(ring, true);
>  
> -	/* Move any buffers on the active list that are no longer referenced
> -	 * by the ringbuffer to the flushing/inactive lists as appropriate,
> -	 * before we free the context associated with the requests.
> +	/* Note that seqno values might be out of order due to rescheduling and
> +	 * pre-emption. Thus both lists must be processed in their entirety
> +	 * rather than stopping at the first 'non-passed' entry.
>  	 */
> -	while (!list_empty(&ring->active_list)) {
> -		struct drm_i915_gem_object *obj;
> -
> -		obj = list_first_entry(&ring->active_list,
> -				      struct drm_i915_gem_object,
> -				      ring_list);
> -
> -		if (!i915_seqno_passed(seqno, obj->last_read_seqno))
> -			break;
>  
> -		i915_gem_object_move_to_inactive(obj);
> -	}
> -
> -
> -	while (!list_empty(&ring->request_list)) {
> -		struct drm_i915_gem_request *request;
> -
> -		request = list_first_entry(&ring->request_list,
> -					   struct drm_i915_gem_request,
> -					   list);
> -
> -		if (!i915_seqno_passed(seqno, request->seqno))
> -			break;
> +	list_for_each_entry_safe(req, req_next, &ring->request_list, list) {
> +		if (!i915_seqno_passed(seqno, req->seqno))
> +			continue;
>  
> -		trace_i915_gem_request_retire(ring, request->seqno);
> +		trace_i915_gem_request_retire(ring, req->seqno);
>  		/* We know the GPU must have read the request to have
>  		 * sent us the seqno + interrupt, so use the position
>  		 * of tail of the request to update the last known position
>  		 * of the GPU head.
>  		 */
> -		ring->buffer->last_retired_head = request->tail;
> +		ring->buffer->last_retired_head = req->tail;
>  
> -		i915_gem_free_request(request);
> +		list_move_tail(&req->list, &deferred_request_free);
> +	}
> +
> +	/* Move any buffers on the active list that are no longer referenced
> +	 * by the ringbuffer to the flushing/inactive lists as appropriate,
> +	 * before we free the context associated with the requests.
> +	 */
> +	list_for_each_entry_safe(obj, obj_next, &ring->active_list, ring_list) {
> +		if (!i915_seqno_passed(seqno, obj->last_read_seqno))
> +			continue;
> +
> +		i915_gem_object_move_to_inactive(obj);
>  	}
>  
>  	if (unlikely(ring->trace_irq_seqno &&
> @@ -2656,6 +2651,15 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
>  		ring->trace_irq_seqno = 0;
>  	}
>  
> +	/* Finish processing active list before freeing request */
> +	while (!list_empty(&deferred_request_free)) {
> +		req = list_first_entry(&deferred_request_free,
> +	                               struct drm_i915_gem_request,
> +	                               list);
> +
> +		i915_gem_free_request(req);
> +	}
> +
>  	WARN_ON(i915_verify_lists(ring->dev));
>  }
>  

I think this looks ok, but I don't look at this code much...  Seems
like it should be fine to go in as-is, though I do worry a little about
the additional time we'll spend walking the list if we have lots of
outstanding requests.  But since this is just called in a work queue,
maybe that's fine.

Going forward, I guess we might want per-context seqno tracking
instead, with more limited preemption within a context (or maybe
none?), which might make things easier.  But that would require a bit
more restructuring...

Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>

-- 
Jesse Barnes, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 11/44] drm/i915: Added scheduler hook into i915_seqno_passed()
  2014-06-26 17:24 ` [RFC 11/44] drm/i915: Added scheduler hook into i915_seqno_passed() John.C.Harrison
@ 2014-07-02 18:14   ` Jesse Barnes
  0 siblings, 0 replies; 90+ messages in thread
From: Jesse Barnes @ 2014-07-02 18:14 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Thu, 26 Jun 2014 18:24:02 +0100
John.C.Harrison@Intel.com wrote:

> +bool i915_scheduler_is_seqno_in_flight(struct intel_engine_cs *ring,
> +				uint32_t seqno, bool *completed);
> +

In what cases might the return value not match the completed value?  I
guess I'll see in a later patch...

Same comment about the ifdef applies here; looks like you have some
runtime checking in place too, which seems sufficient to me.

-- 
Jesse Barnes, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 12/44] drm/i915: Disable hardware semaphores when GPU scheduler is enabled
  2014-06-26 17:24 ` [RFC 12/44] drm/i915: Disable hardware semaphores when GPU scheduler is enabled John.C.Harrison
@ 2014-07-02 18:16   ` Jesse Barnes
  0 siblings, 0 replies; 90+ messages in thread
From: Jesse Barnes @ 2014-07-02 18:16 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Thu, 26 Jun 2014 18:24:03 +0100
John.C.Harrison@Intel.com wrote:

> From: John Harrison <John.C.Harrison@Intel.com>
> 
> Hardware semaphores require seqno values to be continuously incrementing.
> However, the scheduler's reordering of batch buffers means that the seqno values
> going through the hardware could be out of order. Thus semaphores cannot be
> used.
> 
> On the other hand, the scheduler supersedes the need for hardware semaphores
> anyway. Having one ring stall waiting for something to complete on another ring
> is inefficient if that ring could be working on some other, independent task.
> This is what the scheduler is meant to do - keep the hardware as busy as
> possible by reordering batch buffers to avoid dependency stalls.
> ---
>  drivers/gpu/drm/i915/i915_drv.c         |    9 +++++++++
>  drivers/gpu/drm/i915/i915_scheduler.c   |    9 +++++++++
>  drivers/gpu/drm/i915/i915_scheduler.h   |    1 +
>  drivers/gpu/drm/i915/intel_ringbuffer.c |    4 ++++
>  4 files changed, 23 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index e2bfdda..748b13a 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -33,6 +33,7 @@
>  #include "i915_drv.h"
>  #include "i915_trace.h"
>  #include "intel_drv.h"
> +#include "i915_scheduler.h"
>  
>  #include <linux/console.h>
>  #include <linux/module.h>
> @@ -468,6 +469,14 @@ void intel_detect_pch(struct drm_device *dev)
>  
>  bool i915_semaphore_is_enabled(struct drm_device *dev)
>  {
> +	/* Hardware semaphores are not compatible with the scheduler due to the
> +	 * seqno values being potentially out of order. However, semaphores are
> +	 * also not required as the scheduler will handle interring dependencies
> +	 * and try do so in a way that does not cause dead time on the hardware.
> +	 */
> +	if (i915_scheduler_is_enabled(dev))
> +		return 0;
> +
>  	if (INTEL_INFO(dev)->gen < 6)
>  		return false;
>  
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
> index e9aa566..d9c1879 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler.c
> +++ b/drivers/gpu/drm/i915/i915_scheduler.c
> @@ -26,6 +26,15 @@
>  #include "intel_drv.h"
>  #include "i915_scheduler.h"
>  
> +bool i915_scheduler_is_enabled(struct drm_device *dev)
> +{
> +#ifdef CONFIG_DRM_I915_SCHEDULER
> +	return true;
> +#else
> +	return false;
> +#endif
> +}

I think this should be:
	if (dev_priv->scheduler)
		return true;
	return false;

instead?
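
(Spelled out as a complete sketch, assuming dev_priv is reached from dev as
elsewhere in the series:)

bool i915_scheduler_is_enabled(struct drm_device *dev)
{
	struct drm_i915_private *dev_priv = dev->dev_private;

	/* Enabled iff init actually allocated a scheduler instance */
	return dev_priv->scheduler != NULL;
}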

Otherwise looks fine.

-- 
Jesse Barnes, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 13/44] drm/i915: Added scheduler hook when closing DRM file handles
  2014-06-26 17:24 ` [RFC 13/44] drm/i915: Added scheduler hook when closing DRM file handles John.C.Harrison
@ 2014-07-02 18:20   ` Jesse Barnes
  2014-07-23 15:10     ` John Harrison
  0 siblings, 1 reply; 90+ messages in thread
From: Jesse Barnes @ 2014-07-02 18:20 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Thu, 26 Jun 2014 18:24:04 +0100
John.C.Harrison@Intel.com wrote:

> From: John Harrison <John.C.Harrison@Intel.com>
> 
> The scheduler decouples the submission of batch buffers to the driver from
> submission of batch buffers to the hardware. Thus it is possible for an
> application to submit work, then close the DRM handle and free up all the
> resources that piece of work wishes to use before the work has even been
> submitted to the hardware. To prevent this, the scheduler needs to be informed
> of the DRM close event so that it can force through any outstanding work
> attributed to that file handle.
> ---
>  drivers/gpu/drm/i915/i915_dma.c       |    3 +++
>  drivers/gpu/drm/i915/i915_scheduler.c |   18 ++++++++++++++++++
>  drivers/gpu/drm/i915/i915_scheduler.h |    2 ++
>  3 files changed, 23 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index 494b156..6c9ce82 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -42,6 +42,7 @@
>  #include <linux/vga_switcheroo.h>
>  #include <linux/slab.h>
>  #include <acpi/video.h>
> +#include "i915_scheduler.h"
>  #include <linux/pm.h>
>  #include <linux/pm_runtime.h>
>  #include <linux/oom.h>
> @@ -1930,6 +1931,8 @@ void i915_driver_lastclose(struct drm_device * dev)
>  
>  void i915_driver_preclose(struct drm_device *dev, struct drm_file *file)
>  {
> +	i915_scheduler_closefile(dev, file);
> +
>  	mutex_lock(&dev->struct_mutex);
>  	i915_gem_context_close(dev, file);
>  	i915_gem_release(dev, file);
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
> index d9c1879..66a6568 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler.c
> +++ b/drivers/gpu/drm/i915/i915_scheduler.c
> @@ -78,6 +78,19 @@ bool i915_scheduler_is_seqno_in_flight(struct intel_engine_cs *ring,
>  	return found;
>  }
>  
> +int i915_scheduler_closefile(struct drm_device *dev, struct drm_file *file)
> +{
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct i915_scheduler   *scheduler = dev_priv->scheduler;
> +
> +	if (!scheduler)
> +		return 0;
> +
> +	/* Do stuff... */
> +
> +	return 0;
> +}
> +
>  #else   /* CONFIG_DRM_I915_SCHEDULER */
>  
>  int i915_scheduler_init(struct drm_device *dev)
> @@ -85,4 +98,9 @@ int i915_scheduler_init(struct drm_device *dev)
>  	return 0;
>  }
>  
> +int i915_scheduler_closefile(struct drm_device *dev, struct drm_file *file)
> +{
> +	return 0;
> +}
> +
>  #endif  /* CONFIG_DRM_I915_SCHEDULER */
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
> index 4044b6e..95641f6 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler.h
> +++ b/drivers/gpu/drm/i915/i915_scheduler.h
> @@ -27,6 +27,8 @@
>  
>  bool        i915_scheduler_is_enabled(struct drm_device *dev);
>  int         i915_scheduler_init(struct drm_device *dev);
> +int         i915_scheduler_closefile(struct drm_device *dev,
> +				     struct drm_file *file);
>  
>  #ifdef CONFIG_DRM_I915_SCHEDULER
>  

Yeah I guess the client could have passed a ref to some other process
for tracking the outstanding work, so we need to complete it.

But shouldn't that happen as part of the clearing of the outstanding
requests in i915_gem_suspend() which is called from lastclose()?  We do
a gpu_idle() and retire_requests() in there already...

-- 
Jesse Barnes, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 14/44] drm/i915: Added getparam for GPU scheduler
  2014-06-26 17:24 ` [RFC 14/44] drm/i915: Added getparam for GPU scheduler John.C.Harrison
@ 2014-07-02 18:21   ` Jesse Barnes
  2014-07-07 19:11     ` Daniel Vetter
  0 siblings, 1 reply; 90+ messages in thread
From: Jesse Barnes @ 2014-07-02 18:21 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Thu, 26 Jun 2014 18:24:05 +0100
John.C.Harrison@Intel.com wrote:

> From: John Harrison <John.C.Harrison@Intel.com>
> 
> This is required by user land validation programs that need to know whether the
> scheduler is available for testing or not.
> ---
>  drivers/gpu/drm/i915/i915_dma.c |    3 +++
>  include/uapi/drm/i915_drm.h     |    1 +
>  2 files changed, 4 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index 6c9ce82..1668316 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -1035,6 +1035,9 @@ static int i915_getparam(struct drm_device *dev, void *data,
>  		value = 0;
>  #endif
>  		break;
> +	case I915_PARAM_HAS_GPU_SCHEDULER:
> +		value = i915_scheduler_is_enabled(dev);
> +		break;
>  	default:
>  		DRM_DEBUG("Unknown parameter %d\n", param->param);
>  		return -EINVAL;
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index bf54c78..de6f603 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -341,6 +341,7 @@ typedef struct drm_i915_irq_wait {
>  #define I915_PARAM_HAS_WT     	 	 27
>  #define I915_PARAM_CMD_PARSER_VERSION	 28
>  #define I915_PARAM_HAS_NATIVE_SYNC	 30
> +#define I915_PARAM_HAS_GPU_SCHEDULER	 31
>  
>  typedef struct drm_i915_getparam {
>  	int param;

I guess we have plenty of getparam space available.  But another option
would be for tests to check for a debugfs file that dumps scheduler
info instead, and save the get params for non-debug applications.
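
(A sketch of such a debugfs entry, following the usual i915_debugfs.c pattern;
the file name and contents here are hypothetical:)

static int i915_scheduler_info(struct seq_file *m, void *unused)
{
	struct drm_info_node *node = m->private;
	struct drm_device *dev = node->minor->dev;
	struct drm_i915_private *dev_priv = dev->dev_private;

	/* Dump whatever the tests need; here just the enable state */
	seq_printf(m, "Scheduler enabled: %s\n",
		   dev_priv->scheduler ? "yes" : "no");
	return 0;
}

/* plus one extra row in i915_debugfs_list:
 *	{"i915_scheduler_info", i915_scheduler_info, 0},
 */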

Either way though:
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>

-- 
Jesse Barnes, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 16/44] drm/i915: Alloc early seqno
  2014-06-26 17:24 ` [RFC 16/44] drm/i915: Alloc early seqno John.C.Harrison
@ 2014-07-02 18:29   ` Jesse Barnes
  2014-07-23 15:11     ` John Harrison
  0 siblings, 1 reply; 90+ messages in thread
From: Jesse Barnes @ 2014-07-02 18:29 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Thu, 26 Jun 2014 18:24:07 +0100
John.C.Harrison@Intel.com wrote:

> From: John Harrison <John.C.Harrison@Intel.com>
> 
> The scheduler needs to explicitly allocate a seqno to track each submitted batch
> buffer. This must happen a long time before any commands are actually written to
> the ring.
> ---
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c |    5 +++++
>  drivers/gpu/drm/i915/intel_ringbuffer.c    |    2 +-
>  drivers/gpu/drm/i915/intel_ringbuffer.h    |    1 +
>  3 files changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index ee836a6..ec274ef 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -1317,6 +1317,11 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>  		vma->bind_vma(vma, batch_obj->cache_level, GLOBAL_BIND);
>  	}
>  
> +	/* Allocate a seqno for this batch buffer nice and early. */
> +	ret = intel_ring_alloc_seqno(ring);
> +	if (ret)
> +		goto err;
> +
>  	if (flags & I915_DISPATCH_SECURE)
>  		exec_start += i915_gem_obj_ggtt_offset(batch_obj);
>  	else
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 34d6d6e..737c41b 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1662,7 +1662,7 @@ int intel_ring_idle(struct intel_engine_cs *ring)
>  	return i915_wait_seqno(ring, seqno);
>  }
>  
> -static int
> +int
>  intel_ring_alloc_seqno(struct intel_engine_cs *ring)
>  {
>  	if (ring->outstanding_lazy_seqno)
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 30841ea..cc92de2 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -347,6 +347,7 @@ void intel_cleanup_ring_buffer(struct intel_engine_cs *ring);
>  
>  int __must_check intel_ring_begin(struct intel_engine_cs *ring, int n);
>  int __must_check intel_ring_cacheline_align(struct intel_engine_cs *ring);
> +int __must_check intel_ring_alloc_seqno(struct intel_engine_cs *ring);
>  static inline void intel_ring_emit(struct intel_engine_cs *ring,
>  				   u32 data)
>  {

This ought to be ok even w/o the scheduler; we'll just pick up the
lazy_seqno later on rather than allocating a new one at ring_begin
right?

Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>

-- 
Jesse Barnes, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 17/44] drm/i915: Prelude to splitting i915_gem_do_execbuffer in two
  2014-06-26 17:24 ` [RFC 17/44] drm/i915: Prelude to splitting i915_gem_do_execbuffer in two John.C.Harrison
@ 2014-07-02 18:34   ` Jesse Barnes
  2014-07-07 19:21     ` Daniel Vetter
  0 siblings, 1 reply; 90+ messages in thread
From: Jesse Barnes @ 2014-07-02 18:34 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Thu, 26 Jun 2014 18:24:08 +0100
John.C.Harrison@Intel.com wrote:

> From: John Harrison <John.C.Harrison@Intel.com>
> 
> The scheduler decouples the submission of batch buffers to the driver from their
> submission to the hardware. This basically means splitting the execbuffer()
> function in half. This change rearranges some code ready for the split to occur.
> ---
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c |   23 ++++++++++++++++-------
>  1 file changed, 16 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index ec274ef..fda9187 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -32,6 +32,7 @@
>  #include "i915_trace.h"
>  #include "intel_drv.h"
>  #include <linux/dma_remapping.h>
> +#include "i915_scheduler.h"
>  
>  #define  __EXEC_OBJECT_HAS_PIN (1<<31)
>  #define  __EXEC_OBJECT_HAS_FENCE (1<<30)
> @@ -874,10 +875,7 @@ i915_gem_execbuffer_move_to_gpu(struct intel_engine_cs *ring,
>  	if (flush_domains & I915_GEM_DOMAIN_GTT)
>  		wmb();
>  
> -	/* Unconditionally invalidate gpu caches and ensure that we do flush
> -	 * any residual writes from the previous batch.
> -	 */
> -	return intel_ring_invalidate_all_caches(ring);
> +	return 0;
>  }
>  
>  static bool
> @@ -1219,8 +1217,6 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>  		}
>  	}
>  
> -	intel_runtime_pm_get(dev_priv);
> -
>  	ret = i915_mutex_lock_interruptible(dev);
>  	if (ret)
>  		goto pre_mutex_err;
> @@ -1331,6 +1327,20 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>  	if (ret)
>  		goto err;
>  
> +	i915_gem_execbuffer_move_to_active(&eb->vmas, ring);
> +
> +	/* To be split into two functions here... */
> +
> +	intel_runtime_pm_get(dev_priv);
> +
> +	/* Unconditionally invalidate gpu caches and ensure that we do flush
> +	 * any residual writes from the previous batch.
> +	 */
> +	ret = intel_ring_invalidate_all_caches(ring);
> +	if (ret)
> +		goto err;
> +
> +	/* Switch to the correct context for the batch */
>  	ret = i915_switch_context(ring, ctx);
>  	if (ret)
>  		goto err;
> @@ -1381,7 +1391,6 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>  
>  	trace_i915_gem_ring_dispatch(ring, intel_ring_get_seqno(ring), flags);
>  
> -	i915_gem_execbuffer_move_to_active(&eb->vmas, ring);
>  	i915_gem_execbuffer_retire_commands(dev, file, ring, batch_obj);
>  
>  err:

I'd like Chris to take a look too, but it looks safe afaict.

Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>

-- 
Jesse Barnes, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 18/44] drm/i915: Added scheduler debug macro
  2014-06-26 17:24 ` [RFC 18/44] drm/i915: Added scheduler debug macro John.C.Harrison
@ 2014-07-02 18:37   ` Jesse Barnes
  2014-07-07 19:23     ` Daniel Vetter
  0 siblings, 1 reply; 90+ messages in thread
From: Jesse Barnes @ 2014-07-02 18:37 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Thu, 26 Jun 2014 18:24:09 +0100
John.C.Harrison@Intel.com wrote:

> From: John Harrison <John.C.Harrison@Intel.com>
> 
> Added a DRM debug facility for use by the scheduler.
> ---
>  include/drm/drmP.h |    7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/include/drm/drmP.h b/include/drm/drmP.h
> index 76ccaab..2f477c9 100644
> --- a/include/drm/drmP.h
> +++ b/include/drm/drmP.h
> @@ -120,6 +120,7 @@ struct videomode;
>  #define DRM_UT_DRIVER		0x02
>  #define DRM_UT_KMS		0x04
>  #define DRM_UT_PRIME		0x08
> +#define DRM_UT_SCHED		0x40

What's wrong with 0x10?  We should probably define these in terms of
shifts anyway, since this is just a bitmask really.
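
(A sketch of the shift-based layout; the values match the existing definitions,
with DRM_UT_SCHED taking the next free bit, i.e. 0x10:)

#define DRM_UT_CORE	(1 << 0)
#define DRM_UT_DRIVER	(1 << 1)
#define DRM_UT_KMS	(1 << 2)
#define DRM_UT_PRIME	(1 << 3)
#define DRM_UT_SCHED	(1 << 4)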

>  extern __printf(2, 3)
>  void drm_ut_debug_printk(const char *function_name,
> @@ -221,10 +222,16 @@ int drm_err(const char *func, const char *format, ...);
>  		if (unlikely(drm_debug & DRM_UT_PRIME))			\
>  			drm_ut_debug_printk(__func__, fmt, ##args);	\
>  	} while (0)
> +#define DRM_DEBUG_SCHED(fmt, args...)					\
> +	do {								\
> +		if (unlikely(drm_debug & DRM_UT_SCHED))			\
> +			drm_ut_debug_printk(__func__, fmt, ##args);	\
> +	} while (0)
>  #else
>  #define DRM_DEBUG_DRIVER(fmt, args...) do { } while (0)
>  #define DRM_DEBUG_KMS(fmt, args...)	do { } while (0)
>  #define DRM_DEBUG_PRIME(fmt, args...)	do { } while (0)
> +#define DRM_DEBUG_SCHED(fmt, args...)	do { } while (0)
>  #define DRM_DEBUG(fmt, arg...)		 do { } while (0)
>  #endif
>  

Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>

-- 
Jesse Barnes, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 00/44] GPU scheduler for i915 driver
  2014-06-26 20:44 ` [RFC 00/44] GPU scheduler for i915 driver Dave Airlie
@ 2014-07-07 15:57   ` Daniel Vetter
  0 siblings, 0 replies; 90+ messages in thread
From: Daniel Vetter @ 2014-07-07 15:57 UTC (permalink / raw)
  To: Dave Airlie; +Cc: intel-gfx@lists.freedesktop.org

On Fri, Jun 27, 2014 at 06:44:04AM +1000, Dave Airlie wrote:
> >
> > Implemented a batch buffer submission scheduler for the i915 DRM driver.
> >
> 
> While this seems very interesting, you might want to address in the commit msg
> or the cover email
> 
> a) why this is needed,
> b) any improvements in speed, power consumption or throughput it generates,
> i.e. benchmarks.
> 
> also some notes on what hw supports preemption.

Also tests to both exercise the rescheduling (i.e. let a high-prio batch
compete against a large pile of low-prio batches) and augment our gpu
sync tests (from Damien) with that, too.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 01/44] drm/i915: Corrected 'file_priv' to 'file' in 'i915_driver_preclose()'
  2014-06-30 21:03   ` Jesse Barnes
@ 2014-07-07 18:02     ` Daniel Vetter
  0 siblings, 0 replies; 90+ messages in thread
From: Daniel Vetter @ 2014-07-07 18:02 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: Intel-GFX

On Mon, Jun 30, 2014 at 02:03:18PM -0700, Jesse Barnes wrote:
> On Thu, 26 Jun 2014 18:23:52 +0100
> John.C.Harrison@Intel.com wrote:
> 
> > From: John Harrison <John.C.Harrison@Intel.com>
> > 
> > The 'i915_driver_preclose()' function has a parameter called 'file_priv'.
> > However, this is misleading as the structure it points to is a 'drm_file' not a
> > 'drm_i915_file_private'. It should be named just 'file' to avoid confusion.

sob line is missing, but I've added that since we work for the same
company ;-) Please make sure you'll get these details right, checkpatch.pl
will help. Queued for -next, thanks for the patch.
-Daniel

> > ---
> >  drivers/gpu/drm/i915/i915_dma.c |    6 +++---
> >  drivers/gpu/drm/i915/i915_drv.h |    6 +++---
> >  2 files changed, 6 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> > index b9159ad..6cce55b 100644
> > --- a/drivers/gpu/drm/i915/i915_dma.c
> > +++ b/drivers/gpu/drm/i915/i915_dma.c
> > @@ -1916,11 +1916,11 @@ void i915_driver_lastclose(struct drm_device * dev)
> >  	i915_dma_cleanup(dev);
> >  }
> >  
> > -void i915_driver_preclose(struct drm_device * dev, struct drm_file *file_priv)
> > +void i915_driver_preclose(struct drm_device *dev, struct drm_file *file)
> >  {
> >  	mutex_lock(&dev->struct_mutex);
> > -	i915_gem_context_close(dev, file_priv);
> > -	i915_gem_release(dev, file_priv);
> > +	i915_gem_context_close(dev, file);
> > +	i915_gem_release(dev, file);
> >  	mutex_unlock(&dev->struct_mutex);
> >  }
> >  
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index bea9ab40..7a96ca0 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -2044,12 +2044,12 @@ void i915_update_dri1_breadcrumb(struct drm_device *dev);
> >  extern void i915_kernel_lost_context(struct drm_device * dev);
> >  extern int i915_driver_load(struct drm_device *, unsigned long flags);
> >  extern int i915_driver_unload(struct drm_device *);
> > -extern int i915_driver_open(struct drm_device *dev, struct drm_file *file_priv);
> > +extern int i915_driver_open(struct drm_device *dev, struct drm_file *file);
> >  extern void i915_driver_lastclose(struct drm_device * dev);
> >  extern void i915_driver_preclose(struct drm_device *dev,
> > -				 struct drm_file *file_priv);
> > +				 struct drm_file *file);
> >  extern void i915_driver_postclose(struct drm_device *dev,
> > -				  struct drm_file *file_priv);
> > +				  struct drm_file *file);
> >  extern int i915_driver_device_is_agp(struct drm_device * dev);
> >  #ifdef CONFIG_COMPAT
> >  extern long i915_compat_ioctl(struct file *filp, unsigned int cmd,
> 
> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
> 
> -- 
> Jesse Barnes, Intel Open Source Technology Center
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 03/44] drm/i915: Add extra add_request calls
  2014-06-30 21:10   ` Jesse Barnes
@ 2014-07-07 18:41     ` Daniel Vetter
  2014-07-08  7:44       ` Chris Wilson
  0 siblings, 1 reply; 90+ messages in thread
From: Daniel Vetter @ 2014-07-07 18:41 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: Intel-GFX

On Mon, Jun 30, 2014 at 02:10:16PM -0700, Jesse Barnes wrote:
> On Thu, 26 Jun 2014 18:23:54 +0100
> John.C.Harrison@Intel.com wrote:
> I think "no_flush" would be more in line with some of the other
> functions in the kernel.  "wo" makes me think of "write only".  But
> it's not a big deal.
> 
> I do wonder about the rules for when add_request is needed though, and
> I need to look later in the series for the usage.  When I looked at it
> in relation to fences, it didn't seem to be a good fit since it looked
> like requests got freed when the active list was cleared, vs when they
> were actually consumed by some user.

Yeah, wo_flush is highly confusing while no_flush is rather clear. There's
also the question of how this all will interfere with execlists since
those patches also have the need to keep track of stuff, but slightly
different.

I'll go through your rfc for some light reading but I think we should
settle execlists first before proceeding with the scheduler in earnest.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 04/44] drm/i915: Fix null pointer dereference in error capture
  2014-07-01  7:12     ` Chris Wilson
@ 2014-07-07 18:49       ` Daniel Vetter
  0 siblings, 0 replies; 90+ messages in thread
From: Daniel Vetter @ 2014-07-07 18:49 UTC (permalink / raw)
  To: Chris Wilson, Jesse Barnes, John.C.Harrison, Intel-GFX

On Tue, Jul 01, 2014 at 08:12:11AM +0100, Chris Wilson wrote:
> On Mon, Jun 30, 2014 at 02:40:05PM -0700, Jesse Barnes wrote:
> > On Thu, 26 Jun 2014 18:23:55 +0100
> > John.C.Harrison@Intel.com wrote:
> > 
> > > From: John Harrison <John.C.Harrison@Intel.com>
> > > 
> > > The i915_gem_record_rings() code was unconditionally querying and saving state
> > > for the batch_obj of a request structure. This is not necessarily set. Thus a
> > > null pointer dereference can occur.
> > > ---
> > >  drivers/gpu/drm/i915/i915_gpu_error.c |   13 +++++++------
> > >  1 file changed, 7 insertions(+), 6 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> > > index 87ec60e..0738f21 100644
> > > --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> > > +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> > > @@ -902,12 +902,13 @@ static void i915_gem_record_rings(struct drm_device *dev,
> > >  			 * as the simplest method to avoid being overwritten
> > >  			 * by userspace.
> > >  			 */
> > > -			error->ring[i].batchbuffer =
> > > -				i915_error_object_create(dev_priv,
> > > -							 request->batch_obj,
> > > -							 request->ctx ?
> > > -							 request->ctx->vm :
> > > -							 &dev_priv->gtt.base);
> > > +			if(request->batch_obj)
> > > +				error->ring[i].batchbuffer =
> > > +					i915_error_object_create(dev_priv,
> > > +								 request->batch_obj,
> > > +								 request->ctx ?
> > > +								 request->ctx->vm :
> > > +								 &dev_priv->gtt.base);
> > >  
> > >  			if (HAS_BROKEN_CS_TLB(dev_priv->dev) &&
> > >  			    ring->scratch.obj)
> > 
> > Reviewed-by: Jesse Barnes <jbarnes@virtuosugeek.org>
> 
> Nah, put the NULL check into the macro. i915_error_object_create() was
> originally written as a no-op on NULL pointers for cleanliness, we may
> as well do the check centrally and remove the extras we have grown.
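
A minimal sketch of that centralised check, using the three-argument form seen
in the hunk above; __i915_error_object_create() below is just a hypothetical
stand-in for the existing capture code:

static struct drm_i915_error_object *
i915_error_object_create(struct drm_i915_private *dev_priv,
			 struct drm_i915_gem_object *src,
			 struct i915_address_space *vm)
{
	/* Be a no-op on NULL so callers need no checks of their own */
	if (src == NULL)
		return NULL;

	return __i915_error_object_create(dev_priv, src, vm);
}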

Also the usual broken record from your maintainer: How does this blow up
and can we please have a testcase for it? Oscar provided a basic error
state check test, so the infrastructure for a new subtest is now there. I
hope ;-)
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 02/44] drm/i915: Added getparam for native sync
  2014-06-26 17:23 ` [RFC 02/44] drm/i915: Added getparam for native sync John.C.Harrison
@ 2014-07-07 18:52   ` Daniel Vetter
  0 siblings, 0 replies; 90+ messages in thread
From: Daniel Vetter @ 2014-07-07 18:52 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Thu, Jun 26, 2014 at 06:23:53PM +0100, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> Validation tests need a run time mechanism for querying whether or not the
> driver supports the Android native sync facility.
> ---
>  drivers/gpu/drm/i915/i915_dma.c |    7 +++++++
>  include/uapi/drm/i915_drm.h     |    1 +
>  2 files changed, 8 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index 6cce55b..67f2918 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -1022,6 +1022,13 @@ static int i915_getparam(struct drm_device *dev, void *data,
>  	case I915_PARAM_CMD_PARSER_VERSION:
>  		value = i915_cmd_parser_get_version();
>  		break;
> +	case I915_PARAM_HAS_NATIVE_SYNC:
> +#ifdef CONFIG_DRM_I915_SYNC
> +		value = 1;
> +#else
> +		value = 0;
> +#endif

New userspace ABI (which this is) needs to come with open-source users.
Also we do the "announce new features to userspace" patch generally last
in a series to avoid unnecessary test failures.

Finally infrastructure only used by tests should be done in debugfs, which
has more lax abi guarantees.

And one more: syncpt support and the scheduler are orthogonal imo, and
as part of proper syncpt support we also need to destage the android
syncpt stuff first (since i915 can't depend upon stuff from
drivers/staging). Thus far I have seen negligible efforts from Android
people to make this happen :(
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 07/44] drm/i915: Disable 'get seqno' workaround for VLV
  2014-07-02 17:51   ` Jesse Barnes
@ 2014-07-07 18:56     ` Daniel Vetter
  0 siblings, 0 replies; 90+ messages in thread
From: Daniel Vetter @ 2014-07-07 18:56 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: Intel-GFX

On Wed, Jul 02, 2014 at 10:51:23AM -0700, Jesse Barnes wrote:
> On Thu, 26 Jun 2014 18:23:58 +0100
> John.C.Harrison@Intel.com wrote:
> 
> > From: John Harrison <John.C.Harrison@Intel.com>
> > 
> > There is a workaround for a hardware bug when reading the seqno from the status
> > page. The bug does not exist on VLV; however, the workaround was still being
> > applied.
> > ---
> >  drivers/gpu/drm/i915/intel_ringbuffer.c |    5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > index 279488a..bad5db0 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > @@ -1960,7 +1960,10 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
> >  			ring->irq_put = gen6_ring_put_irq;
> >  		}
> >  		ring->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
> > -		ring->get_seqno = gen6_ring_get_seqno;
> > +		if (IS_VALLEYVIEW(dev))
> > +			ring->get_seqno = ring_get_seqno;
> > +		else
> > +			ring->get_seqno = gen6_ring_get_seqno;
> >  		ring->set_seqno = ring_set_seqno;
> >  		ring->semaphore.sync_to = gen6_ring_sync;
> >  		ring->semaphore.signal = gen6_signal;
> 
> Assuming this has been well tested:
> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>

I have my doubts ... the seqno race is fairly hard to reproduce really and
needs some serious beating. Also highly timing dependent.

My best guess is that Oscar's irq handling race fixes fixed the underlying
bug on gen6+, so I think we should instead dare to rip out this w/a
completely and see what happens. Doing this on gen6+ will at least give us
serious amounts of test coverage.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 08/44] drm/i915: Added GPU scheduler config option
  2014-06-26 17:23 ` [RFC 08/44] drm/i915: Added GPU scheduler config option John.C.Harrison
@ 2014-07-07 18:58   ` Daniel Vetter
  0 siblings, 0 replies; 90+ messages in thread
From: Daniel Vetter @ 2014-07-07 18:58 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Thu, Jun 26, 2014 at 06:23:59PM +0100, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> Added a Kconfig option for enabling/disabling the GPU scheduler.
> ---
>  drivers/gpu/drm/i915/Kconfig |    8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
> index 437e182..22a036b 100644
> --- a/drivers/gpu/drm/i915/Kconfig
> +++ b/drivers/gpu/drm/i915/Kconfig
> @@ -81,3 +81,11 @@ config DRM_I915_UMS
>  	  enable this only if you have ancient versions of the DDX drivers.
>  
>  	  If in doubt, say "N".
> +
> +config DRM_I915_SCHEDULER
> +	bool "Enable GPU scheduler on Intel hardware"
> +	depends on DRM_I915
> +	default y
> +	help
> +	  Choose this option to enable GPU task scheduling for improved
> +	  performance and efficiency.

NACK. We ship one driver in one well tested config, everything else is a
nightmare. There's very few exceptions (currently MMU_NOTIFIER and
optional FBDEV support which have some really good reasons attached to
them). And I'm still grumpy about the MMU_NOTIFIER one ;-)
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 09/44] drm/i915: Start of GPU scheduler
  2014-06-26 17:24 ` [RFC 09/44] drm/i915: Start of GPU scheduler John.C.Harrison
  2014-07-02 17:55   ` Jesse Barnes
@ 2014-07-07 19:02   ` Daniel Vetter
  1 sibling, 0 replies; 90+ messages in thread
From: Daniel Vetter @ 2014-07-07 19:02 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Thu, Jun 26, 2014 at 06:24:00PM +0100, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> Created GPU scheduler source files with only a basic init function.

Same critique as for Oscar's execlist: Please don't order patches by
adding unused leaf code and structures first, but start by wiring up
the (maybe still partially stubbed-out) code.

The aim is to make review of individual patches possible with as little
context as required - for otherwise (i.e. if you have to keep all the code
in mind till the end since only then it really gets plugged in) splitting
up the patches is a superficial exercise and doesn't really help the
reviewer.

/rant
-Daniel

> ---
>  drivers/gpu/drm/i915/Makefile         |    1 +
>  drivers/gpu/drm/i915/i915_drv.h       |    4 +++
>  drivers/gpu/drm/i915/i915_gem.c       |    3 ++
>  drivers/gpu/drm/i915/i915_scheduler.c |   59 +++++++++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/i915_scheduler.h |   40 ++++++++++++++++++++++
>  5 files changed, 107 insertions(+)
>  create mode 100644 drivers/gpu/drm/i915/i915_scheduler.c
>  create mode 100644 drivers/gpu/drm/i915/i915_scheduler.h
> 
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index cad1683..12817a8 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -11,6 +11,7 @@ i915-y := i915_drv.o \
>  	  i915_params.o \
>            i915_suspend.o \
>  	  i915_sysfs.o \
> +	  i915_scheduler.o \
>  	  intel_pm.o
>  i915-$(CONFIG_COMPAT)   += i915_ioc32.o
>  i915-$(CONFIG_DEBUG_FS) += i915_debugfs.o
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 53f6fe5..6e592d3 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1331,6 +1331,8 @@ struct intel_pipe_crc {
>  	wait_queue_head_t wq;
>  };
>  
> +struct i915_scheduler;
> +
>  struct drm_i915_private {
>  	struct drm_device *dev;
>  	struct kmem_cache *slab;
> @@ -1540,6 +1542,8 @@ struct drm_i915_private {
>  
>  	struct i915_runtime_pm pm;
>  
> +	struct i915_scheduler *scheduler;
> +
>  	/* Old dri1 support infrastructure, beware the dragons ya fools entering
>  	 * here! */
>  	struct i915_dri1_state dri1;
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 898660c..b784eb2 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -37,6 +37,7 @@
>  #include <linux/swap.h>
>  #include <linux/pci.h>
>  #include <linux/dma-buf.h>
> +#include "i915_scheduler.h"
>  
>  static void i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj);
>  static void i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj,
> @@ -4669,6 +4670,8 @@ static int i915_gem_init_rings(struct drm_device *dev)
>  			goto cleanup_vebox_ring;
>  	}
>  
> +	i915_scheduler_init(dev);
> +
>  	ret = i915_gem_set_seqno(dev, ((u32)~0 - 0x1000));
>  	if (ret)
>  		goto cleanup_bsd2_ring;
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
> new file mode 100644
> index 0000000..9ec0225
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/i915_scheduler.c
> @@ -0,0 +1,59 @@
> +/*
> + * Copyright (c) 2014 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + *
> + */
> +
> +#include "i915_drv.h"
> +#include "intel_drv.h"
> +#include "i915_scheduler.h"
> +
> +#ifdef CONFIG_DRM_I915_SCHEDULER
> +
> +int i915_scheduler_init(struct drm_device *dev)
> +{
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct i915_scheduler   *scheduler = dev_priv->scheduler;
> +
> +	if (scheduler)
> +		return 0;
> +
> +	scheduler = kzalloc(sizeof(*scheduler), GFP_KERNEL);
> +	if (!scheduler)
> +		return -ENOMEM;
> +
> +	spin_lock_init(&scheduler->lock);
> +
> +	scheduler->index = 1;
> +
> +	dev_priv->scheduler = scheduler;
> +
> +	return 0;
> +}
> +
> +#else   /* CONFIG_DRM_I915_SCHEDULER */
> +
> +int i915_scheduler_init(struct drm_device *dev)
> +{
> +	return 0;
> +}
> +
> +#endif  /* CONFIG_DRM_I915_SCHEDULER */
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
> new file mode 100644
> index 0000000..bbe1934
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/i915_scheduler.h
> @@ -0,0 +1,40 @@
> +/*
> + * Copyright (c) 2014 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + *
> + */
> +
> +#ifndef _I915_SCHEDULER_H_
> +#define _I915_SCHEDULER_H_
> +
> +int         i915_scheduler_init(struct drm_device *dev);
> +
> +#ifdef CONFIG_DRM_I915_SCHEDULER
> +
> +struct i915_scheduler {
> +	uint32_t    flags[I915_NUM_RINGS];
> +	spinlock_t  lock;
> +	uint32_t    index;
> +};
> +
> +#endif  /* CONFIG_DRM_I915_SCHEDULER */
> +
> +#endif  /* _I915_SCHEDULER_H_ */
> -- 
> 1.7.9.5
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 10/44] drm/i915: Prepare retire_requests to handle out-of-order seqnos
  2014-06-26 17:24 ` [RFC 10/44] drm/i915: Prepare retire_requests to handle out-of-order seqnos John.C.Harrison
  2014-07-02 18:11   ` Jesse Barnes
@ 2014-07-07 19:05   ` Daniel Vetter
  2014-07-09 14:08     ` Daniel Vetter
  1 sibling, 1 reply; 90+ messages in thread
From: Daniel Vetter @ 2014-07-07 19:05 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Thu, Jun 26, 2014 at 06:24:01PM +0100, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> A major point of the GPU scheduler is that it re-orders batch buffers after they
> have been submitted to the driver. Rather than attempting to re-assign seqno
> values, it is much simpler to have each batch buffer keep its initially assigned
> number and modify the rest of the driver to cope with seqnos being returned out
> of order. In practice, very little code actually needs updating to cope.
> 
> One such place is the retire request handler. Rather than stopping as soon as an
> uncompleted seqno is found, it must now keep iterating through the requests in
> case later seqnos have completed. There is also a problem with doing the free of
> the request before the move to inactive. Thus the requests are now moved to a
> temporary list first, then the objects de-activated and finally the requests on
> the temporary list are freed.

I still hold that we should track requests, not seqno+ring pairs. At least
the plan with Maarten's fencing patches is to embed the generic struct
fence into our i915_gem_request structure. And struct fence will also be
the kernel-internal representation of an Android native sync fence.

So splattering ring+seqno->request/fence lookups all over the place isn't a
good way forward. It's ok for bring up, but for merging we should do that
kind of large-scale refactoring upfront to reduce rebase churn. Oscar
knows how this works.
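
(Roughly the direction being suggested here — purely illustrative, with field
names taken from the hunks above plus the embedded generic fence:)

struct drm_i915_gem_request {
	struct fence fence;		/* generic fence embedded in the request */
	struct intel_engine_cs *ring;	/* engine the request was submitted on */
	u32 seqno;			/* breadcrumb, now private to the request */
	u32 tail;			/* ring tail after the request was emitted */
	struct list_head list;		/* linkage on the per-ring request list */
};
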
-Daniel

> ---
>  drivers/gpu/drm/i915/i915_gem.c |   60 +++++++++++++++++++++------------------
>  1 file changed, 32 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index b784eb2..7e53446 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2602,7 +2602,10 @@ void i915_gem_reset(struct drm_device *dev)
>  void
>  i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
>  {
> +	struct drm_i915_gem_object *obj, *obj_next;
> +	struct drm_i915_gem_request *req, *req_next;
>  	uint32_t seqno;
> +	LIST_HEAD(deferred_request_free);
>  
>  	if (list_empty(&ring->request_list))
>  		return;
> @@ -2611,43 +2614,35 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
>  
>  	seqno = ring->get_seqno(ring, true);
>  
> -	/* Move any buffers on the active list that are no longer referenced
> -	 * by the ringbuffer to the flushing/inactive lists as appropriate,
> -	 * before we free the context associated with the requests.
> +	/* Note that seqno values might be out of order due to rescheduling and
> +	 * pre-emption. Thus both lists must be processed in their entirety
> +	 * rather than stopping at the first 'non-passed' entry.
>  	 */
> -	while (!list_empty(&ring->active_list)) {
> -		struct drm_i915_gem_object *obj;
> -
> -		obj = list_first_entry(&ring->active_list,
> -				      struct drm_i915_gem_object,
> -				      ring_list);
> -
> -		if (!i915_seqno_passed(seqno, obj->last_read_seqno))
> -			break;
>  
> -		i915_gem_object_move_to_inactive(obj);
> -	}
> -
> -
> -	while (!list_empty(&ring->request_list)) {
> -		struct drm_i915_gem_request *request;
> -
> -		request = list_first_entry(&ring->request_list,
> -					   struct drm_i915_gem_request,
> -					   list);
> -
> -		if (!i915_seqno_passed(seqno, request->seqno))
> -			break;
> +	list_for_each_entry_safe(req, req_next, &ring->request_list, list) {
> +		if (!i915_seqno_passed(seqno, req->seqno))
> +			continue;
>  
> -		trace_i915_gem_request_retire(ring, request->seqno);
> +		trace_i915_gem_request_retire(ring, req->seqno);
>  		/* We know the GPU must have read the request to have
>  		 * sent us the seqno + interrupt, so use the position
>  		 * of tail of the request to update the last known position
>  		 * of the GPU head.
>  		 */
> -		ring->buffer->last_retired_head = request->tail;
> +		ring->buffer->last_retired_head = req->tail;
>  
> -		i915_gem_free_request(request);
> +		list_move_tail(&req->list, &deferred_request_free);
> +	}
> +
> +	/* Move any buffers on the active list that are no longer referenced
> +	 * by the ringbuffer to the flushing/inactive lists as appropriate,
> +	 * before we free the context associated with the requests.
> +	 */
> +	list_for_each_entry_safe(obj, obj_next, &ring->active_list, ring_list) {
> +		if (!i915_seqno_passed(seqno, obj->last_read_seqno))
> +			continue;
> +
> +		i915_gem_object_move_to_inactive(obj);
>  	}
>  
>  	if (unlikely(ring->trace_irq_seqno &&
> @@ -2656,6 +2651,15 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
>  		ring->trace_irq_seqno = 0;
>  	}
>  
> +	/* Finish processing active list before freeing request */
> +	while (!list_empty(&deferred_request_free)) {
> +		req = list_first_entry(&deferred_request_free,
> +	                               struct drm_i915_gem_request,
> +	                               list);
> +
> +		i915_gem_free_request(req);
> +	}
> +
>  	WARN_ON(i915_verify_lists(ring->dev));
>  }
>  
> -- 
> 1.7.9.5
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 14/44] drm/i915: Added getparam for GPU scheduler
  2014-07-02 18:21   ` Jesse Barnes
@ 2014-07-07 19:11     ` Daniel Vetter
  0 siblings, 0 replies; 90+ messages in thread
From: Daniel Vetter @ 2014-07-07 19:11 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: Intel-GFX

On Wed, Jul 02, 2014 at 11:21:42AM -0700, Jesse Barnes wrote:
> On Thu, 26 Jun 2014 18:24:05 +0100
> John.C.Harrison@Intel.com wrote:
> 
> > From: John Harrison <John.C.Harrison@Intel.com>
> > 
> > This is required by user land validation programs that need to know whether the
> > scheduler is available for testing or not.
> > ---
> >  drivers/gpu/drm/i915/i915_dma.c |    3 +++
> >  include/uapi/drm/i915_drm.h     |    1 +
> >  2 files changed, 4 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> > index 6c9ce82..1668316 100644
> > --- a/drivers/gpu/drm/i915/i915_dma.c
> > +++ b/drivers/gpu/drm/i915/i915_dma.c
> > @@ -1035,6 +1035,9 @@ static int i915_getparam(struct drm_device *dev, void *data,
> >  		value = 0;
> >  #endif
> >  		break;
> > +	case I915_PARAM_HAS_GPU_SCHEDULER:
> > +		value = i915_scheduler_is_enabled(dev);
> > +		break;
> >  	default:
> >  		DRM_DEBUG("Unknown parameter %d\n", param->param);
> >  		return -EINVAL;
> > diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> > index bf54c78..de6f603 100644
> > --- a/include/uapi/drm/i915_drm.h
> > +++ b/include/uapi/drm/i915_drm.h
> > @@ -341,6 +341,7 @@ typedef struct drm_i915_irq_wait {
> >  #define I915_PARAM_HAS_WT     	 	 27
> >  #define I915_PARAM_CMD_PARSER_VERSION	 28
> >  #define I915_PARAM_HAS_NATIVE_SYNC	 30
> > +#define I915_PARAM_HAS_GPU_SCHEDULER	 31
> >  
> >  typedef struct drm_i915_getparam {
> >  	int param;
> 
> I guess we have plenty of getparam space available.  But another option
> would be for tests to check for a debugfs file that dumps scheduler
> info instead, and save the get params for non-debug applications.

Yeah, pure testing interfaces should reside in debugfs - much less
stringent abi compatibility requirements for that stuff.
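
As a rough sketch (not from this series) of that debugfs alternative: a
read-only entry in i915's debugfs directory that dumps scheduler state. The
entry name and what it prints are invented for illustration.

static int i915_scheduler_info(struct seq_file *m, void *unused)
{
	struct drm_info_node *node = m->private;
	struct drm_device *dev = node->minor->dev;

	seq_printf(m, "scheduler enabled: %d\n",
		   i915_scheduler_is_enabled(dev));
	return 0;
}

/* plus one extra row in the i915_debugfs_list[] table: */
	{"i915_scheduler_info", i915_scheduler_info, 0},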

Also, I want to see these validation tests as igt patches.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 15/44] drm/i915: Added deferred work handler for scheduler
  2014-06-26 17:24 ` [RFC 15/44] drm/i915: Added deferred work handler for scheduler John.C.Harrison
@ 2014-07-07 19:14   ` Daniel Vetter
  2014-07-23 15:37     ` John Harrison
  0 siblings, 1 reply; 90+ messages in thread
From: Daniel Vetter @ 2014-07-07 19:14 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Thu, Jun 26, 2014 at 06:24:06PM +0100, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> The scheduler needs to do interrupt triggered work that is too complex to do in
> the interrupt handler. Thus it requires a deferred work handler to process this
> work asynchronously.
> ---
>  drivers/gpu/drm/i915/i915_dma.c       |    3 +++
>  drivers/gpu/drm/i915/i915_drv.h       |   10 ++++++++++
>  drivers/gpu/drm/i915/i915_gem.c       |   27 +++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/i915_scheduler.c |    7 +++++++
>  drivers/gpu/drm/i915/i915_scheduler.h |    1 +
>  5 files changed, 48 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index 1668316..d1356f3 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -1813,6 +1813,9 @@ int i915_driver_unload(struct drm_device *dev)
>  	WARN_ON(unregister_oom_notifier(&dev_priv->mm.oom_notifier));
>  	unregister_shrinker(&dev_priv->mm.shrinker);
>  
> +	/* Cancel the scheduler work handler, which should be idle now. */
> +	cancel_work_sync(&dev_priv->mm.scheduler_work);
> +
>  	io_mapping_free(dev_priv->gtt.mappable);
>  	arch_phys_wc_del(dev_priv->gtt.mtrr);
>  
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 0977653..fbafa68 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1075,6 +1075,16 @@ struct i915_gem_mm {
>  	struct delayed_work idle_work;
>  
>  	/**
> +	 * New scheme is to get an interrupt after every work packet
> +	 * in order to allow the low latency scheduling of pending
> +	 * packets. The idea behind adding new packets to a pending
> +	 * queue rather than directly into the hardware ring buffer
> +	 * is to allow high priority packets to over take low priority
> +	 * ones.
> +	 */
> +	struct work_struct scheduler_work;

Latency for work items isn't too awesome, and e.g. Oscar's execlist code
latches the next context right away from the irq handler. Why can't we do
something similar for the scheduler? Fishing the next item out of a
priority queue shouldn't be expensive ...
-Daniel

> +
> +	/**
>  	 * Are we in a non-interruptible section of code like
>  	 * modesetting?
>  	 */
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index fece5e7..57b24f0 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2712,6 +2712,29 @@ i915_gem_idle_work_handler(struct work_struct *work)
>  	intel_mark_idle(dev_priv->dev);
>  }
>  
> +#ifdef CONFIG_DRM_I915_SCHEDULER
> +static void
> +i915_gem_scheduler_work_handler(struct work_struct *work)
> +{
> +	struct intel_engine_cs  *ring;
> +	struct drm_i915_private *dev_priv;
> +	struct drm_device       *dev;
> +	int                     i;
> +
> +	dev_priv = container_of(work, struct drm_i915_private, mm.scheduler_work);
> +	dev = dev_priv->dev;
> +
> +	mutex_lock(&dev->struct_mutex);
> +
> +	/* Do stuff: */
> +	for_each_ring(ring, dev_priv, i) {
> +		i915_scheduler_remove(ring);
> +	}
> +
> +	mutex_unlock(&dev->struct_mutex);
> +}
> +#endif
> +
>  /**
>   * Ensures that an object will eventually get non-busy by flushing any required
>   * write domains, emitting any outstanding lazy request and retiring and
> @@ -4916,6 +4939,10 @@ i915_gem_load(struct drm_device *dev)
>  			  i915_gem_retire_work_handler);
>  	INIT_DELAYED_WORK(&dev_priv->mm.idle_work,
>  			  i915_gem_idle_work_handler);
> +#ifdef CONFIG_DRM_I915_SCHEDULER
> +	INIT_WORK(&dev_priv->mm.scheduler_work,
> +				i915_gem_scheduler_work_handler);
> +#endif
>  	init_waitqueue_head(&dev_priv->gpu_error.reset_queue);
>  
>  	/* On GEN3 we really need to make sure the ARB C3 LP bit is set */
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
> index 66a6568..37f8a98 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler.c
> +++ b/drivers/gpu/drm/i915/i915_scheduler.c
> @@ -58,6 +58,13 @@ int i915_scheduler_init(struct drm_device *dev)
>  	return 0;
>  }
>  
> +int i915_scheduler_remove(struct intel_engine_cs *ring)
> +{
> +	/* Do stuff... */
> +
> +	return 0;
> +}
> +
>  bool i915_scheduler_is_seqno_in_flight(struct intel_engine_cs *ring,
>  			       uint32_t seqno, bool *completed)
>  {
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
> index 95641f6..6b2cc51 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler.h
> +++ b/drivers/gpu/drm/i915/i915_scheduler.h
> @@ -38,6 +38,7 @@ struct i915_scheduler {
>  	uint32_t    index;
>  };
>  
> +int         i915_scheduler_remove(struct intel_engine_cs *ring);
>  bool        i915_scheduler_is_seqno_in_flight(struct intel_engine_cs *ring,
>  					      uint32_t seqno, bool *completed);
>  
> -- 
> 1.7.9.5
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 17/44] drm/i915: Prelude to splitting i915_gem_do_execbuffer in two
  2014-07-02 18:34   ` Jesse Barnes
@ 2014-07-07 19:21     ` Daniel Vetter
  2014-07-23 16:33       ` John Harrison
  0 siblings, 1 reply; 90+ messages in thread
From: Daniel Vetter @ 2014-07-07 19:21 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: Intel-GFX

On Wed, Jul 02, 2014 at 11:34:23AM -0700, Jesse Barnes wrote:
> On Thu, 26 Jun 2014 18:24:08 +0100
> John.C.Harrison@Intel.com wrote:
> 
> > From: John Harrison <John.C.Harrison@Intel.com>
> > 
> > The scheduler decouples the submission of batch buffers to the driver with their
> > submission to the hardware. This basically means splitting the execbuffer()
> > function in half. This change rearranges some code ready for the split to occur.
> > ---
> >  drivers/gpu/drm/i915/i915_gem_execbuffer.c |   23 ++++++++++++++++-------
> >  1 file changed, 16 insertions(+), 7 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > index ec274ef..fda9187 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > @@ -32,6 +32,7 @@
> >  #include "i915_trace.h"
> >  #include "intel_drv.h"
> >  #include <linux/dma_remapping.h>
> > +#include "i915_scheduler.h"
> >  
> >  #define  __EXEC_OBJECT_HAS_PIN (1<<31)
> >  #define  __EXEC_OBJECT_HAS_FENCE (1<<30)
> > @@ -874,10 +875,7 @@ i915_gem_execbuffer_move_to_gpu(struct intel_engine_cs *ring,
> >  	if (flush_domains & I915_GEM_DOMAIN_GTT)
> >  		wmb();
> >  
> > -	/* Unconditionally invalidate gpu caches and ensure that we do flush
> > -	 * any residual writes from the previous batch.
> > -	 */
> > -	return intel_ring_invalidate_all_caches(ring);
> > +	return 0;
> >  }
> >  
> >  static bool
> > @@ -1219,8 +1217,6 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
> >  		}
> >  	}
> >  
> > -	intel_runtime_pm_get(dev_priv);
> > -
> >  	ret = i915_mutex_lock_interruptible(dev);
> >  	if (ret)
> >  		goto pre_mutex_err;
> > @@ -1331,6 +1327,20 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
> >  	if (ret)
> >  		goto err;
> >  
> > +	i915_gem_execbuffer_move_to_active(&eb->vmas, ring);
> > +
> > +	/* To be split into two functions here... */
> > +
> > +	intel_runtime_pm_get(dev_priv);
> > +
> > +	/* Unconditionally invalidate gpu caches and ensure that we do flush
> > +	 * any residual writes from the previous batch.
> > +	 */
> > +	ret = intel_ring_invalidate_all_caches(ring);
> > +	if (ret)
> > +		goto err;
> > +
> > +	/* Switch to the correct context for the batch */
> >  	ret = i915_switch_context(ring, ctx);
> >  	if (ret)
> >  		goto err;
> > @@ -1381,7 +1391,6 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
> >  
> >  	trace_i915_gem_ring_dispatch(ring, intel_ring_get_seqno(ring), flags);
> >  
> > -	i915_gem_execbuffer_move_to_active(&eb->vmas, ring);
> >  	i915_gem_execbuffer_retire_commands(dev, file, ring, batch_obj);
> >  
> >  err:
> 
> I'd like Chris to take a look too, but it looks safe afaict.
> 
> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>

switch_context can fail with EINTR so we really can't move stuff to the
active list before that point. Or we need to make sure that all the stuff
between the old and new move_to_active callsite can't fail.

Or we need to track this and tell userspace with an EIO and adjusted reset
stats that something between our point of no return where the kernel
committed to executing the batch failed.

Or we need to unroll move_to_active (which is currently not really
possible).
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 18/44] drm/i915: Added scheduler debug macro
  2014-07-02 18:37   ` Jesse Barnes
@ 2014-07-07 19:23     ` Daniel Vetter
  0 siblings, 0 replies; 90+ messages in thread
From: Daniel Vetter @ 2014-07-07 19:23 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: Intel-GFX

On Wed, Jul 02, 2014 at 11:37:29AM -0700, Jesse Barnes wrote:
> On Thu, 26 Jun 2014 18:24:09 +0100
> John.C.Harrison@Intel.com wrote:
> 
> > From: John Harrison <John.C.Harrison@Intel.com>
> > 
> > Added a DRM debug facility for use by the scheduler.
> > ---
> >  include/drm/drmP.h |    7 +++++++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/include/drm/drmP.h b/include/drm/drmP.h
> > index 76ccaab..2f477c9 100644
> > --- a/include/drm/drmP.h
> > +++ b/include/drm/drmP.h
> > @@ -120,6 +120,7 @@ struct videomode;
> >  #define DRM_UT_DRIVER		0x02
> >  #define DRM_UT_KMS		0x04
> >  #define DRM_UT_PRIME		0x08
> > +#define DRM_UT_SCHED		0x40
> 
> What's wrong with 0x10?  We should probably define these in terms of
> shifts anyway, since this is just a bitmask really.

If we want more fine-grained logging we need to use real infrastructure
like dynamic printk or similar things. The current drm_debug stuff
flat-out doesn't scale for debugging random issues and I always use
drm.debug=0xe anyway. Also the i915 scheduler isn't core drm code so it really
should be DRM_DEBUG_DRIVER or so.
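
For illustration, the existing driver-level macro already covers this
without touching drm core (the message text here is made up):

	DRM_DEBUG_DRIVER("scheduler: queued seqno %u on ring %s\n",
			 seqno, ring->name);

which is then enabled at runtime with drm.debug=0x2 (or the usual 0xe).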
-Daniel

> 
> >  extern __printf(2, 3)
> >  void drm_ut_debug_printk(const char *function_name,
> > @@ -221,10 +222,16 @@ int drm_err(const char *func, const char *format, ...);
> >  		if (unlikely(drm_debug & DRM_UT_PRIME))			\
> >  			drm_ut_debug_printk(__func__, fmt, ##args);	\
> >  	} while (0)
> > +#define DRM_DEBUG_SCHED(fmt, args...)					\
> > +	do {								\
> > +		if (unlikely(drm_debug & DRM_UT_SCHED))			\
> > +			drm_ut_debug_printk(__func__, fmt, ##args);	\
> > +	} while (0)
> >  #else
> >  #define DRM_DEBUG_DRIVER(fmt, args...) do { } while (0)
> >  #define DRM_DEBUG_KMS(fmt, args...)	do { } while (0)
> >  #define DRM_DEBUG_PRIME(fmt, args...)	do { } while (0)
> > +#define DRM_DEBUG_SCHED(fmt, args...)	do { } while (0)
> >  #define DRM_DEBUG(fmt, arg...)		 do { } while (0)
> >  #endif
> >  
> 
> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
> 
> -- 
> Jesse Barnes, Intel Open Source Technology Center
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 44/44] drm/i915: Fake batch support for page flips
  2014-06-26 17:24 ` [RFC 44/44] drm/i915: Fake batch support for page flips John.C.Harrison
@ 2014-07-07 19:25   ` Daniel Vetter
  0 siblings, 0 replies; 90+ messages in thread
From: Daniel Vetter @ 2014-07-07 19:25 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Thu, Jun 26, 2014 at 06:24:35PM +0100, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> Any commands written to the ring without the scheduler's knowledge can get lost
> during a pre-emption event. This checkin updates the page flip code to send the
> ring commands via the scheduler's 'fake batch' interface. Thus the page flip is
> kept safe from being clobbered.

Same comment as with the execlist series: Can't we just use mmio flips
instead? We could just restrict the scheduler to more recent platforms if
mmio flips aren't available on all platforms ...
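
A minimal sketch of the mmio flip idea (illustrative, not code from either
series; the helper name is invented): once the new frame is known to be
ready, the display surface base register is rewritten from the CPU instead
of emitting MI_DISPLAY_FLIP on a ring.

static void intel_do_mmio_flip(struct intel_crtc *intel_crtc)
{
	struct drm_device *dev = intel_crtc->base.dev;
	struct drm_i915_private *dev_priv = dev->dev_private;
	u32 reg = DSPSURF(intel_crtc->plane);

	I915_WRITE(reg, intel_crtc->unpin_work->gtt_offset);
	POSTING_READ(reg);
}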
-Daniel
> ---
>  drivers/gpu/drm/i915/intel_display.c |   84 ++++++++++++++++------------------
>  1 file changed, 40 insertions(+), 44 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index fa1ffbb..8bbc5d3 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -9099,8 +9099,8 @@ static int intel_gen7_queue_flip(struct drm_device *dev,
>  				 uint32_t flags)
>  {
>  	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
> -	uint32_t plane_bit = 0;
> -	int len, ret;
> +	uint32_t plane_bit = 0, sched_flags;
> +	int ret;
>  
>  	switch (intel_crtc->plane) {
>  	case PLANE_A:
> @@ -9117,18 +9117,6 @@ static int intel_gen7_queue_flip(struct drm_device *dev,
>  		return -ENODEV;
>  	}
>  
> -	len = 4;
> -	if (ring->id == RCS) {
> -		len += 6;
> -		/*
> -		 * On Gen 8, SRM is now taking an extra dword to accommodate
> -		 * 48bits addresses, and we need a NOOP for the batch size to
> -		 * stay even.
> -		 */
> -		if (IS_GEN8(dev))
> -			len += 2;
> -	}
> -
>  	/*
>  	 * BSpec MI_DISPLAY_FLIP for IVB:
>  	 * "The full packet must be contained within the same cache line."
> @@ -9139,13 +9127,7 @@ static int intel_gen7_queue_flip(struct drm_device *dev,
>  	 * then do the cacheline alignment, and finally emit the
>  	 * MI_DISPLAY_FLIP.
>  	 */
> -	ret = intel_ring_cacheline_align(ring);
> -	if (ret)
> -		return ret;
> -
> -	ret = intel_ring_begin(ring, len);
> -	if (ret)
> -		return ret;
> +	sched_flags = i915_ebp_sf_cacheline_align;
>  
>  	/* Unmask the flip-done completion message. Note that the bspec says that
>  	 * we should do this for both the BCS and RCS, and that we must not unmask
> @@ -9157,32 +9139,46 @@ static int intel_gen7_queue_flip(struct drm_device *dev,
>  	 * to zero does lead to lockups within MI_DISPLAY_FLIP.
>  	 */
>  	if (ring->id == RCS) {
> -		intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
> -		intel_ring_emit(ring, DERRMR);
> -		intel_ring_emit(ring, ~(DERRMR_PIPEA_PRI_FLIP_DONE |
> -					DERRMR_PIPEB_PRI_FLIP_DONE |
> -					DERRMR_PIPEC_PRI_FLIP_DONE));
> -		if (IS_GEN8(dev))
> -			intel_ring_emit(ring, MI_STORE_REGISTER_MEM_GEN8(1) |
> -					      MI_SRM_LRM_GLOBAL_GTT);
> -		else
> -			intel_ring_emit(ring, MI_STORE_REGISTER_MEM(1) |
> -					      MI_SRM_LRM_GLOBAL_GTT);
> -		intel_ring_emit(ring, DERRMR);
> -		intel_ring_emit(ring, ring->scratch.gtt_offset + 256);
> -		if (IS_GEN8(dev)) {
> -			intel_ring_emit(ring, 0);
> -			intel_ring_emit(ring, MI_NOOP);
> -		}
> -	}
> +		uint32_t cmds[] = {
> +			MI_LOAD_REGISTER_IMM(1),
> +			DERRMR,
> +			~(DERRMR_PIPEA_PRI_FLIP_DONE |
> +				DERRMR_PIPEB_PRI_FLIP_DONE |
> +				DERRMR_PIPEC_PRI_FLIP_DONE),
> +			IS_GEN8(dev) ? (MI_STORE_REGISTER_MEM_GEN8(1) |
> +					MI_SRM_LRM_GLOBAL_GTT) :
> +				       (MI_STORE_REGISTER_MEM(1) |
> +					MI_SRM_LRM_GLOBAL_GTT),
> +			DERRMR,
> +			ring->scratch.gtt_offset + 256,
> +//		if (IS_GEN8(dev)) {
> +			0,
> +			MI_NOOP,
> +//		}
> +			MI_DISPLAY_FLIP_I915 | plane_bit,
> +			fb->pitches[0] | obj->tiling_mode,
> +			intel_crtc->unpin_work->gtt_offset,
> +			MI_NOOP
> +		};
> +		uint32_t len = sizeof(cmds) / sizeof(*cmds);
> +
> +		ret = i915_scheduler_queue_nonbatch(ring, cmds, len, &obj, 1, sched_flags);
> +	} else {
> +		uint32_t cmds[] = {
> +			MI_DISPLAY_FLIP_I915 | plane_bit,
> +			fb->pitches[0] | obj->tiling_mode,
> +			intel_crtc->unpin_work->gtt_offset,
> +			MI_NOOP
> +		};
> +		uint32_t len = sizeof(cmds) / sizeof(*cmds);
>  
> -	intel_ring_emit(ring, MI_DISPLAY_FLIP_I915 | plane_bit);
> -	intel_ring_emit(ring, (fb->pitches[0] | obj->tiling_mode));
> -	intel_ring_emit(ring, intel_crtc->unpin_work->gtt_offset);
> -	intel_ring_emit(ring, (MI_NOOP));
> +		ret = i915_scheduler_queue_nonbatch(ring, cmds, len, &obj, 1, sched_flags);
> +	}
> +	if (ret)
> +		return ret;
>  
>  	intel_mark_page_flip_active(intel_crtc);
> -	i915_add_request_wo_flush(ring);
> +
>  	return 0;
>  }
>  
> -- 
> 1.7.9.5
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 03/44] drm/i915: Add extra add_request calls
  2014-07-07 18:41     ` Daniel Vetter
@ 2014-07-08  7:44       ` Chris Wilson
  0 siblings, 0 replies; 90+ messages in thread
From: Chris Wilson @ 2014-07-08  7:44 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Intel-GFX

On Mon, Jul 07, 2014 at 08:41:47PM +0200, Daniel Vetter wrote:
> On Mon, Jun 30, 2014 at 02:10:16PM -0700, Jesse Barnes wrote:
> > On Thu, 26 Jun 2014 18:23:54 +0100
> > John.C.Harrison@Intel.com wrote:
> > I think "no_flush" would be more in line with some of the other
> > functions in the kernel.  "wo" makes me think of "write only".  But
> > it's not a big deal.
> > 
> > I do wonder about the rules for when add_request is needed though, and
> > I need to look later in the series for the usage.  When I looked at it
> > in relation to fences, it didn't seem to be a good fit since it looked
> > like requests got freed when the active list was cleared, vs when they
> > were actually consumed by some user.
> 
> Yeah, wo_flush is highly confusing while no_flush is rather clear. There's
> also the question of how this all will interfere with execlists since
> those patches also have the need to keep track of stuff, but slightly
> different.
> 
> I'll go through your rfc for some light reading but I think we should
settle execlists first before proceeding with the scheduler in earnest.

On top of these extra requests, it is time to worry about read-read
optimisations. I would like for busy_ioctl to tell me that a flip is
pending on a particular pipe (though that probably requires extending
the ioctl to pass back separate busy/write/read rings) and at that point
I start to worry about undue synchronisation. That seems fitting for a
request overhaul.
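
Sketching that extension (the bit layout below is invented, purely for
illustration; only the struct itself is the existing uapi):

struct drm_i915_gem_busy {
	__u32 handle;
	__u32 busy;	/* hypothetically: bit 0 = busy,
			 * bits 16..23 = rings with pending writes,
			 * bits 24..31 = rings with pending reads */
};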
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 10/44] drm/i915: Prepare retire_requests to handle out-of-order seqnos
  2014-07-07 19:05   ` Daniel Vetter
@ 2014-07-09 14:08     ` Daniel Vetter
  0 siblings, 0 replies; 90+ messages in thread
From: Daniel Vetter @ 2014-07-09 14:08 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Mon, Jul 07, 2014 at 09:05:50PM +0200, Daniel Vetter wrote:
> On Thu, Jun 26, 2014 at 06:24:01PM +0100, John.C.Harrison@Intel.com wrote:
> > From: John Harrison <John.C.Harrison@Intel.com>
> > 
> > A major point of the GPU scheduler is that it re-orders batch buffers after they
> > have been submitted to the driver. Rather than attempting to re-assign seqno
> > values, it is much simpler to have each batch buffer keep its initially assigned
> > number and modify the rest of the driver to cope with seqnos being returned out
> > of order. In practice, very little code actually needs updating to cope.
> > 
> > One such place is the retire request handler. Rather than stopping as soon as an
> > uncompleted seqno is found, it must now keep iterating through the requests in
> > case later seqnos have completed. There is also a problem with doing the free of
> > the request before the move to inactive. Thus the requests are now moved to a
> > temporary list first, then the objects de-activated and finally the requests on
> > the temporary list are freed.
> 
> I still hold that we should track requests, not seqno+ring pairs. At least
> the plan with Maarten's fencing patches is to embed the generic struct
> fence into our i915_gem_request structure. And struct fence will also be
> the kernel-internal representation of an android native sync fence.
> 
> So splattering ring+seqno->request/fence lookups all over the place isn't a
> good way forward. It's ok for bring-up, but for merging we should do that
> kind of large-scale refactoring upfront to reduce rebase churn. Oscar
> knows how this works.

Aside: Maarten's driver core patches to add a struct fence have been
merged for 3.17, so it is now possible to go directly to the right
solution: embed struct fence into i915_gem_request and start to refcount
the pointers properly, without any intermediate hack that adds a struct
kref directly to i915_gem_request as a stepping stone.
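
A sketch of that shape (field and helper names are illustrative only): the
request embeds the new struct fence and the rest of the driver holds
refcounted request pointers rather than (ring, seqno) pairs.

#include <linux/fence.h>

struct drm_i915_gem_request {
	struct fence fence;		/* refcount + completion live here */
	struct intel_engine_cs *ring;
	u32 seqno;
	struct list_head list;
};

static inline struct drm_i915_gem_request *
i915_gem_request_get(struct drm_i915_gem_request *req)
{
	fence_get(&req->fence);
	return req;
}

static inline void
i915_gem_request_put(struct drm_i915_gem_request *req)
{
	fence_put(&req->fence);
}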
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 13/44] drm/i915: Added scheduler hook when closing DRM file handles
  2014-07-02 18:20   ` Jesse Barnes
@ 2014-07-23 15:10     ` John Harrison
  2014-07-23 15:39       ` Jesse Barnes
  0 siblings, 1 reply; 90+ messages in thread
From: John Harrison @ 2014-07-23 15:10 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: Intel-GFX

On 02/07/2014 19:20, Jesse Barnes wrote:
> On Thu, 26 Jun 2014 18:24:04 +0100
> John.C.Harrison@Intel.com wrote:
>
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> The scheduler decouples the submission of batch buffers to the driver with
>> submission of batch buffers to the hardware. Thus it is possible for an
>> application to submit work, then close the DRM handle and free up all the
>> resources that piece of work wishes to use before the work has even been
>> submitted to the hardware. To prevent this, the scheduler needs to be informed
>> of the DRM close event so that it can force through any outstanding work
>> attributed to that file handle.
>> ---
>>   drivers/gpu/drm/i915/i915_dma.c       |    3 +++
>>   drivers/gpu/drm/i915/i915_scheduler.c |   18 ++++++++++++++++++
>>   drivers/gpu/drm/i915/i915_scheduler.h |    2 ++
>>   3 files changed, 23 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
>> index 494b156..6c9ce82 100644
>> --- a/drivers/gpu/drm/i915/i915_dma.c
>> +++ b/drivers/gpu/drm/i915/i915_dma.c
>> @@ -42,6 +42,7 @@
>>   #include <linux/vga_switcheroo.h>
>>   #include <linux/slab.h>
>>   #include <acpi/video.h>
>> +#include "i915_scheduler.h"
>>   #include <linux/pm.h>
>>   #include <linux/pm_runtime.h>
>>   #include <linux/oom.h>
>> @@ -1930,6 +1931,8 @@ void i915_driver_lastclose(struct drm_device * dev)
>>   
>>   void i915_driver_preclose(struct drm_device *dev, struct drm_file *file)
>>   {
>> +	i915_scheduler_closefile(dev, file);
>> +
>>   	mutex_lock(&dev->struct_mutex);
>>   	i915_gem_context_close(dev, file);
>>   	i915_gem_release(dev, file);
>> diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
>> index d9c1879..66a6568 100644
>> --- a/drivers/gpu/drm/i915/i915_scheduler.c
>> +++ b/drivers/gpu/drm/i915/i915_scheduler.c
>> @@ -78,6 +78,19 @@ bool i915_scheduler_is_seqno_in_flight(struct intel_engine_cs *ring,
>>   	return found;
>>   }
>>   
>> +int i915_scheduler_closefile(struct drm_device *dev, struct drm_file *file)
>> +{
>> +	struct drm_i915_private *dev_priv = dev->dev_private;
>> +	struct i915_scheduler   *scheduler = dev_priv->scheduler;
>> +
>> +	if (!scheduler)
>> +		return 0;
>> +
>> +	/* Do stuff... */
>> +
>> +	return 0;
>> +}
>> +
>>   #else   /* CONFIG_DRM_I915_SCHEDULER */
>>   
>>   int i915_scheduler_init(struct drm_device *dev)
>> @@ -85,4 +98,9 @@ int i915_scheduler_init(struct drm_device *dev)
>>   	return 0;
>>   }
>>   
>> +int i915_scheduler_closefile(struct drm_device *dev, struct drm_file *file)
>> +{
>> +	return 0;
>> +}
>> +
>>   #endif  /* CONFIG_DRM_I915_SCHEDULER */
>> diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
>> index 4044b6e..95641f6 100644
>> --- a/drivers/gpu/drm/i915/i915_scheduler.h
>> +++ b/drivers/gpu/drm/i915/i915_scheduler.h
>> @@ -27,6 +27,8 @@
>>   
>>   bool        i915_scheduler_is_enabled(struct drm_device *dev);
>>   int         i915_scheduler_init(struct drm_device *dev);
>> +int         i915_scheduler_closefile(struct drm_device *dev,
>> +				     struct drm_file *file);
>>   
>>   #ifdef CONFIG_DRM_I915_SCHEDULER
>>   
> Yeah I guess the client could have passed a ref to some other process
> for tracking the outstanding work, so we need to complete it.
>
> But shouldn't that happen as part of the clearing of the outstanding
> requests in i915_gem_suspend() which is called from lastclose()?  We do
> a gpu_idle() and retire_requests() in there already...
>

Note that this is the per-file close, not the global close.  Individual DRM 
file handles are closed whenever a user land app stops using DRM. When 
that happens, the scheduler needs to clean up all references to that 
handle. It is not just to ensure all work belonging to that handle has 
completed but also to ensure the scheduler does not attempt to dereference 
dodgy file pointers later on.
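
Purely illustrative (the queue/node field names are invented, not those
used in the series), the close hook amounts to something like:

int i915_scheduler_closefile(struct drm_device *dev, struct drm_file *file)
{
	struct drm_i915_private *dev_priv = dev->dev_private;
	struct i915_scheduler *scheduler = dev_priv->scheduler;
	struct i915_scheduler_queue_entry *node;

	if (!scheduler)
		return 0;

	list_for_each_entry(node, &scheduler->node_queue, link) {
		if (node->params.file != file)
			continue;

		/* force the work through (or wait for it), then drop the
		 * soon-to-be-stale handle so it is never dereferenced */
		node->params.file = NULL;
	}

	return 0;
}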

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 16/44] drm/i915: Alloc early seqno
  2014-07-02 18:29   ` Jesse Barnes
@ 2014-07-23 15:11     ` John Harrison
  0 siblings, 0 replies; 90+ messages in thread
From: John Harrison @ 2014-07-23 15:11 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: Intel-GFX

On 02/07/2014 19:29, Jesse Barnes wrote:
> On Thu, 26 Jun 2014 18:24:07 +0100
> John.C.Harrison@Intel.com wrote:
>
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> The scheduler needs to explicitly allocate a seqno to track each submitted batch
>> buffer. This must happen a long time before any commands are actually written to
>> the ring.
>> ---
>>   drivers/gpu/drm/i915/i915_gem_execbuffer.c |    5 +++++
>>   drivers/gpu/drm/i915/intel_ringbuffer.c    |    2 +-
>>   drivers/gpu/drm/i915/intel_ringbuffer.h    |    1 +
>>   3 files changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>> index ee836a6..ec274ef 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>> @@ -1317,6 +1317,11 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>>   		vma->bind_vma(vma, batch_obj->cache_level, GLOBAL_BIND);
>>   	}
>>   
>> +	/* Allocate a seqno for this batch buffer nice and early. */
>> +	ret = intel_ring_alloc_seqno(ring);
>> +	if (ret)
>> +		goto err;
>> +
>>   	if (flags & I915_DISPATCH_SECURE)
>>   		exec_start += i915_gem_obj_ggtt_offset(batch_obj);
>>   	else
>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> index 34d6d6e..737c41b 100644
>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> @@ -1662,7 +1662,7 @@ int intel_ring_idle(struct intel_engine_cs *ring)
>>   	return i915_wait_seqno(ring, seqno);
>>   }
>>   
>> -static int
>> +int
>>   intel_ring_alloc_seqno(struct intel_engine_cs *ring)
>>   {
>>   	if (ring->outstanding_lazy_seqno)
>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
>> index 30841ea..cc92de2 100644
>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
>> @@ -347,6 +347,7 @@ void intel_cleanup_ring_buffer(struct intel_engine_cs *ring);
>>   
>>   int __must_check intel_ring_begin(struct intel_engine_cs *ring, int n);
>>   int __must_check intel_ring_cacheline_align(struct intel_engine_cs *ring);
>> +int __must_check intel_ring_alloc_seqno(struct intel_engine_cs *ring);
>>   static inline void intel_ring_emit(struct intel_engine_cs *ring,
>>   				   u32 data)
>>   {
> This ought to be ok even w/o the scheduler, we'll just pick up the
> lazy_seqno later on rather than allocating a new one at ring_begin
> right?
>
> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
>

Yes. The early allocation is completely benign.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 15/44] drm/i915: Added deferred work handler for scheduler
  2014-07-07 19:14   ` Daniel Vetter
@ 2014-07-23 15:37     ` John Harrison
  2014-07-23 18:50       ` Daniel Vetter
  0 siblings, 1 reply; 90+ messages in thread
From: John Harrison @ 2014-07-23 15:37 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Intel-GFX


On 07/07/2014 20:14, Daniel Vetter wrote:
> On Thu, Jun 26, 2014 at 06:24:06PM +0100, John.C.Harrison@Intel.com wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> The scheduler needs to do interrupt triggered work that is too complex to do in
>> the interrupt handler. Thus it requires a deferred work handler to process this
>> work asynchronously.
>> ---
>>   drivers/gpu/drm/i915/i915_dma.c       |    3 +++
>>   drivers/gpu/drm/i915/i915_drv.h       |   10 ++++++++++
>>   drivers/gpu/drm/i915/i915_gem.c       |   27 +++++++++++++++++++++++++++
>>   drivers/gpu/drm/i915/i915_scheduler.c |    7 +++++++
>>   drivers/gpu/drm/i915/i915_scheduler.h |    1 +
>>   5 files changed, 48 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
>> index 1668316..d1356f3 100644
>> --- a/drivers/gpu/drm/i915/i915_dma.c
>> +++ b/drivers/gpu/drm/i915/i915_dma.c
>> @@ -1813,6 +1813,9 @@ int i915_driver_unload(struct drm_device *dev)
>>   	WARN_ON(unregister_oom_notifier(&dev_priv->mm.oom_notifier));
>>   	unregister_shrinker(&dev_priv->mm.shrinker);
>>   
>> +	/* Cancel the scheduler work handler, which should be idle now. */
>> +	cancel_work_sync(&dev_priv->mm.scheduler_work);
>> +
>>   	io_mapping_free(dev_priv->gtt.mappable);
>>   	arch_phys_wc_del(dev_priv->gtt.mtrr);
>>   
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>> index 0977653..fbafa68 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -1075,6 +1075,16 @@ struct i915_gem_mm {
>>   	struct delayed_work idle_work;
>>   
>>   	/**
>> +	 * New scheme is to get an interrupt after every work packet
>> +	 * in order to allow the low latency scheduling of pending
>> +	 * packets. The idea behind adding new packets to a pending
>> +	 * queue rather than directly into the hardware ring buffer
>> +	 * is to allow high priority packets to over take low priority
>> +	 * ones.
>> +	 */
>> +	struct work_struct scheduler_work;
> Latency for work items isn't too awesome, and e.g. Oscar's execlist code
> latches the next context right away from the irq handler. Why can't we do
> something similar for the scheduler? Fishing the next item out of a
> priority queue shouldn't be expensive ...
> -Daniel

The problem is that taking batch buffers from the scheduler's queue and 
submitting them to the hardware requires lots of processing that is not 
IRQ compatible. It isn't just a simple register write. Half of the code 
in 'i915_gem_do_execbuffer()' must be executed. Probably/possibly it 
could be made IRQ friendly but that would place a lot of restrictions on 
a lot of code that currently doesn't expect to be restricted. Instead, 
the submission is done via a work handler that acquires the driver mutex 
lock.

In order to cover the extra latency, the scheduler operates in a 
multi-buffered mode and aims to keep eight batch buffers in flight at 
all times. That number was obtained empirically by running lots of 
benchmarks on Android with lots of different settings and seeing where 
the buffer size stopped making a difference.
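
The submission side of that is roughly (names invented for illustration):

	/* work handler, under struct_mutex */
	while (scheduler->in_flight < scheduler->min_flying) {	/* ~8 */
		node = i915_scheduler_pop_best_node(scheduler, ring);
		if (node == NULL)
			break;

		/* the back half of the old execbuffer path */
		i915_scheduler_submit_node(ring, node);
	}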

John.


>
>> +
>> +	/**
>>   	 * Are we in a non-interruptible section of code like
>>   	 * modesetting?
>>   	 */
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>> index fece5e7..57b24f0 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -2712,6 +2712,29 @@ i915_gem_idle_work_handler(struct work_struct *work)
>>   	intel_mark_idle(dev_priv->dev);
>>   }
>>   
>> +#ifdef CONFIG_DRM_I915_SCHEDULER
>> +static void
>> +i915_gem_scheduler_work_handler(struct work_struct *work)
>> +{
>> +	struct intel_engine_cs  *ring;
>> +	struct drm_i915_private *dev_priv;
>> +	struct drm_device       *dev;
>> +	int                     i;
>> +
>> +	dev_priv = container_of(work, struct drm_i915_private, mm.scheduler_work);
>> +	dev = dev_priv->dev;
>> +
>> +	mutex_lock(&dev->struct_mutex);
>> +
>> +	/* Do stuff: */
>> +	for_each_ring(ring, dev_priv, i) {
>> +		i915_scheduler_remove(ring);
>> +	}
>> +
>> +	mutex_unlock(&dev->struct_mutex);
>> +}
>> +#endif
>> +
>>   /**
>>    * Ensures that an object will eventually get non-busy by flushing any required
>>    * write domains, emitting any outstanding lazy request and retiring and
>> @@ -4916,6 +4939,10 @@ i915_gem_load(struct drm_device *dev)
>>   			  i915_gem_retire_work_handler);
>>   	INIT_DELAYED_WORK(&dev_priv->mm.idle_work,
>>   			  i915_gem_idle_work_handler);
>> +#ifdef CONFIG_DRM_I915_SCHEDULER
>> +	INIT_WORK(&dev_priv->mm.scheduler_work,
>> +				i915_gem_scheduler_work_handler);
>> +#endif
>>   	init_waitqueue_head(&dev_priv->gpu_error.reset_queue);
>>   
>>   	/* On GEN3 we really need to make sure the ARB C3 LP bit is set */
>> diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
>> index 66a6568..37f8a98 100644
>> --- a/drivers/gpu/drm/i915/i915_scheduler.c
>> +++ b/drivers/gpu/drm/i915/i915_scheduler.c
>> @@ -58,6 +58,13 @@ int i915_scheduler_init(struct drm_device *dev)
>>   	return 0;
>>   }
>>   
>> +int i915_scheduler_remove(struct intel_engine_cs *ring)
>> +{
>> +	/* Do stuff... */
>> +
>> +	return 0;
>> +}
>> +
>>   bool i915_scheduler_is_seqno_in_flight(struct intel_engine_cs *ring,
>>   			       uint32_t seqno, bool *completed)
>>   {
>> diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
>> index 95641f6..6b2cc51 100644
>> --- a/drivers/gpu/drm/i915/i915_scheduler.h
>> +++ b/drivers/gpu/drm/i915/i915_scheduler.h
>> @@ -38,6 +38,7 @@ struct i915_scheduler {
>>   	uint32_t    index;
>>   };
>>   
>> +int         i915_scheduler_remove(struct intel_engine_cs *ring);
>>   bool        i915_scheduler_is_seqno_in_flight(struct intel_engine_cs *ring,
>>   					      uint32_t seqno, bool *completed);
>>   
>> -- 
>> 1.7.9.5
>>
>> _______________________________________________
>> Intel-gfx mailing list
>> Intel-gfx@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 13/44] drm/i915: Added scheduler hook when closing DRM file handles
  2014-07-23 15:10     ` John Harrison
@ 2014-07-23 15:39       ` Jesse Barnes
  0 siblings, 0 replies; 90+ messages in thread
From: Jesse Barnes @ 2014-07-23 15:39 UTC (permalink / raw)
  To: John Harrison; +Cc: Intel-GFX

On Wed, 23 Jul 2014 16:10:32 +0100
John Harrison <John.C.Harrison@Intel.com> wrote:

> On 02/07/2014 19:20, Jesse Barnes wrote:
> > On Thu, 26 Jun 2014 18:24:04 +0100
> > John.C.Harrison@Intel.com wrote:
> >
> >> From: John Harrison <John.C.Harrison@Intel.com>
> >>
> >> The scheduler decouples the submission of batch buffers to the driver with
> >> submission of batch buffers to the hardware. Thus it is possible for an
> >> application to submit work, then close the DRM handle and free up all the
> >> resources that piece of work wishes to use before the work has even been
> >> submitted to the hardware. To prevent this, the scheduler needs to be informed
> >> of the DRM close event so that it can force through any outstanding work
> >> attributed to that file handle.
> >> ---
> >>   drivers/gpu/drm/i915/i915_dma.c       |    3 +++
> >>   drivers/gpu/drm/i915/i915_scheduler.c |   18 ++++++++++++++++++
> >>   drivers/gpu/drm/i915/i915_scheduler.h |    2 ++
> >>   3 files changed, 23 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> >> index 494b156..6c9ce82 100644
> >> --- a/drivers/gpu/drm/i915/i915_dma.c
> >> +++ b/drivers/gpu/drm/i915/i915_dma.c
> >> @@ -42,6 +42,7 @@
> >>   #include <linux/vga_switcheroo.h>
> >>   #include <linux/slab.h>
> >>   #include <acpi/video.h>
> >> +#include "i915_scheduler.h"
> >>   #include <linux/pm.h>
> >>   #include <linux/pm_runtime.h>
> >>   #include <linux/oom.h>
> >> @@ -1930,6 +1931,8 @@ void i915_driver_lastclose(struct drm_device * dev)
> >>   
> >>   void i915_driver_preclose(struct drm_device *dev, struct drm_file *file)
> >>   {
> >> +	i915_scheduler_closefile(dev, file);
> >> +
> >>   	mutex_lock(&dev->struct_mutex);
> >>   	i915_gem_context_close(dev, file);
> >>   	i915_gem_release(dev, file);
> >> diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
> >> index d9c1879..66a6568 100644
> >> --- a/drivers/gpu/drm/i915/i915_scheduler.c
> >> +++ b/drivers/gpu/drm/i915/i915_scheduler.c
> >> @@ -78,6 +78,19 @@ bool i915_scheduler_is_seqno_in_flight(struct intel_engine_cs *ring,
> >>   	return found;
> >>   }
> >>   
> >> +int i915_scheduler_closefile(struct drm_device *dev, struct drm_file *file)
> >> +{
> >> +	struct drm_i915_private *dev_priv = dev->dev_private;
> >> +	struct i915_scheduler   *scheduler = dev_priv->scheduler;
> >> +
> >> +	if (!scheduler)
> >> +		return 0;
> >> +
> >> +	/* Do stuff... */
> >> +
> >> +	return 0;
> >> +}
> >> +
> >>   #else   /* CONFIG_DRM_I915_SCHEDULER */
> >>   
> >>   int i915_scheduler_init(struct drm_device *dev)
> >> @@ -85,4 +98,9 @@ int i915_scheduler_init(struct drm_device *dev)
> >>   	return 0;
> >>   }
> >>   
> >> +int i915_scheduler_closefile(struct drm_device *dev, struct drm_file *file)
> >> +{
> >> +	return 0;
> >> +}
> >> +
> >>   #endif  /* CONFIG_DRM_I915_SCHEDULER */
> >> diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
> >> index 4044b6e..95641f6 100644
> >> --- a/drivers/gpu/drm/i915/i915_scheduler.h
> >> +++ b/drivers/gpu/drm/i915/i915_scheduler.h
> >> @@ -27,6 +27,8 @@
> >>   
> >>   bool        i915_scheduler_is_enabled(struct drm_device *dev);
> >>   int         i915_scheduler_init(struct drm_device *dev);
> >> +int         i915_scheduler_closefile(struct drm_device *dev,
> >> +				     struct drm_file *file);
> >>   
> >>   #ifdef CONFIG_DRM_I915_SCHEDULER
> >>   
> > Yeah I guess the client could have passed a ref to some other process
> > for tracking the outstanding work, so we need to complete it.
> >
> > But shouldn't that happen as part of the clearing of the outstanding
> > requests in i915_gem_suspend() which is called from lastclose()?  We do
> > a gpu_idle() and retire_requests() in there already...
> >
> 
> Note that this is per file close not the global close.  Individual DRM 
> file handles are closed whenever a user land app stops using DRM. When 
> that happens, the scheduler needs to clean up all references to that 
> handle. It is not just to ensure all work belonging to that handle has 
> completed but also to ensure the scheduler does not attempt to dereference 
> dodgy file pointers later on.

Ah yeah sorry, mixed it up with lastclose.  Looks fine for per-client
close.

-- 
Jesse Barnes, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 17/44] drm/i915: Prelude to splitting i915_gem_do_execbuffer in two
  2014-07-07 19:21     ` Daniel Vetter
@ 2014-07-23 16:33       ` John Harrison
  2014-07-23 18:14         ` Daniel Vetter
  0 siblings, 1 reply; 90+ messages in thread
From: John Harrison @ 2014-07-23 16:33 UTC (permalink / raw)
  To: Daniel Vetter, Jesse Barnes; +Cc: Intel-GFX


On 07/07/2014 20:21, Daniel Vetter wrote:
> On Wed, Jul 02, 2014 at 11:34:23AM -0700, Jesse Barnes wrote:
>> On Thu, 26 Jun 2014 18:24:08 +0100
>> John.C.Harrison@Intel.com wrote:
>>
>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>
>>> The scheduler decouples the submission of batch buffers to the driver with their
>>> submission to the hardware. This basically means splitting the execbuffer()
>>> function in half. This change rearranges some code ready for the split to occur.
>>> ---
>>>   drivers/gpu/drm/i915/i915_gem_execbuffer.c |   23 ++++++++++++++++-------
>>>   1 file changed, 16 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>> index ec274ef..fda9187 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>> @@ -32,6 +32,7 @@
>>>   #include "i915_trace.h"
>>>   #include "intel_drv.h"
>>>   #include <linux/dma_remapping.h>
>>> +#include "i915_scheduler.h"
>>>   
>>>   #define  __EXEC_OBJECT_HAS_PIN (1<<31)
>>>   #define  __EXEC_OBJECT_HAS_FENCE (1<<30)
>>> @@ -874,10 +875,7 @@ i915_gem_execbuffer_move_to_gpu(struct intel_engine_cs *ring,
>>>   	if (flush_domains & I915_GEM_DOMAIN_GTT)
>>>   		wmb();
>>>   
>>> -	/* Unconditionally invalidate gpu caches and ensure that we do flush
>>> -	 * any residual writes from the previous batch.
>>> -	 */
>>> -	return intel_ring_invalidate_all_caches(ring);
>>> +	return 0;
>>>   }
>>>   
>>>   static bool
>>> @@ -1219,8 +1217,6 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>>>   		}
>>>   	}
>>>   
>>> -	intel_runtime_pm_get(dev_priv);
>>> -
>>>   	ret = i915_mutex_lock_interruptible(dev);
>>>   	if (ret)
>>>   		goto pre_mutex_err;
>>> @@ -1331,6 +1327,20 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>>>   	if (ret)
>>>   		goto err;
>>>   
>>> +	i915_gem_execbuffer_move_to_active(&eb->vmas, ring);
>>> +
>>> +	/* To be split into two functions here... */
>>> +
>>> +	intel_runtime_pm_get(dev_priv);
>>> +
>>> +	/* Unconditionally invalidate gpu caches and ensure that we do flush
>>> +	 * any residual writes from the previous batch.
>>> +	 */
>>> +	ret = intel_ring_invalidate_all_caches(ring);
>>> +	if (ret)
>>> +		goto err;
>>> +
>>> +	/* Switch to the correct context for the batch */
>>>   	ret = i915_switch_context(ring, ctx);
>>>   	if (ret)
>>>   		goto err;
>>> @@ -1381,7 +1391,6 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>>>   
>>>   	trace_i915_gem_ring_dispatch(ring, intel_ring_get_seqno(ring), flags);
>>>   
>>> -	i915_gem_execbuffer_move_to_active(&eb->vmas, ring);
>>>   	i915_gem_execbuffer_retire_commands(dev, file, ring, batch_obj);
>>>   
>>>   err:
>> I'd like Chris to take a look too, but it looks safe afaict.
>>
>> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
> switch_context can fail with EINTR so we really can't move stuff to the
> active list before that point. Or we need to make sure that all the stuff
> between the old and new move_to_active callsite can't fail.
>
> Or we need to track this and tell userspace with an EIO and adjusted reset
> stats that something between our point of no return where the kernel
> committed to executing the batch failed.
>
> >Or we need to unroll move_to_active (which is currently not really
> possible).
> -Daniel

switch_context can fail with quite a lot of different error codes. Is 
there anything particularly special about EINTR? I can't spot that 
particular code path at the moment.

The context switch is done at the point of submission to the hardware. 
As batch buffers can be re-ordered between submission to driver and 
submission to hardware, there is no point choosing a context any 
earlier. Whereas the move to active needs to be done at the point of 
submission to the driver. The object needs to be marked as in use even 
though the batch buffer that actually uses it might not be executed for 
some time. From the software viewpoint, the object is in use and all the 
synchronisation code needs to know that.

The scheduler makes the batch buffer execution asynchronous to its 
submission to the driver. There is no way to communicate back a return 
code to user land. Instead, it is up to the scheduler to check the 
return codes from all the execution paths and to retry later if 
something fails for a temporary reason. Or to discard the buffer if it 
is truly toast.

John.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 17/44] drm/i915: Prelude to splitting i915_gem_do_execbuffer in two
  2014-07-23 16:33       ` John Harrison
@ 2014-07-23 18:14         ` Daniel Vetter
  0 siblings, 0 replies; 90+ messages in thread
From: Daniel Vetter @ 2014-07-23 18:14 UTC (permalink / raw)
  To: John Harrison; +Cc: Intel-GFX

On Wed, Jul 23, 2014 at 05:33:42PM +0100, John Harrison wrote:
> 
> On 07/07/2014 20:21, Daniel Vetter wrote:
> >On Wed, Jul 02, 2014 at 11:34:23AM -0700, Jesse Barnes wrote:
> >>On Thu, 26 Jun 2014 18:24:08 +0100
> >>John.C.Harrison@Intel.com wrote:
> >>
> >>>From: John Harrison <John.C.Harrison@Intel.com>
> >>>
> >>>The scheduler decouples the submission of batch buffers to the driver with their
> >>>submission to the hardware. This basically means splitting the execbuffer()
> >>>function in half. This change rearranges some code ready for the split to occur.
> >>>---
> >>>  drivers/gpu/drm/i915/i915_gem_execbuffer.c |   23 ++++++++++++++++-------
> >>>  1 file changed, 16 insertions(+), 7 deletions(-)
> >>>
> >>>diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> >>>index ec274ef..fda9187 100644
> >>>--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> >>>+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> >>>@@ -32,6 +32,7 @@
> >>>  #include "i915_trace.h"
> >>>  #include "intel_drv.h"
> >>>  #include <linux/dma_remapping.h>
> >>>+#include "i915_scheduler.h"
> >>>  #define  __EXEC_OBJECT_HAS_PIN (1<<31)
> >>>  #define  __EXEC_OBJECT_HAS_FENCE (1<<30)
> >>>@@ -874,10 +875,7 @@ i915_gem_execbuffer_move_to_gpu(struct intel_engine_cs *ring,
> >>>  	if (flush_domains & I915_GEM_DOMAIN_GTT)
> >>>  		wmb();
> >>>-	/* Unconditionally invalidate gpu caches and ensure that we do flush
> >>>-	 * any residual writes from the previous batch.
> >>>-	 */
> >>>-	return intel_ring_invalidate_all_caches(ring);
> >>>+	return 0;
> >>>  }
> >>>  static bool
> >>>@@ -1219,8 +1217,6 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
> >>>  		}
> >>>  	}
> >>>-	intel_runtime_pm_get(dev_priv);
> >>>-
> >>>  	ret = i915_mutex_lock_interruptible(dev);
> >>>  	if (ret)
> >>>  		goto pre_mutex_err;
> >>>@@ -1331,6 +1327,20 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
> >>>  	if (ret)
> >>>  		goto err;
> >>>+	i915_gem_execbuffer_move_to_active(&eb->vmas, ring);
> >>>+
> >>>+	/* To be split into two functions here... */
> >>>+
> >>>+	intel_runtime_pm_get(dev_priv);
> >>>+
> >>>+	/* Unconditionally invalidate gpu caches and ensure that we do flush
> >>>+	 * any residual writes from the previous batch.
> >>>+	 */
> >>>+	ret = intel_ring_invalidate_all_caches(ring);
> >>>+	if (ret)
> >>>+		goto err;
> >>>+
> >>>+	/* Switch to the correct context for the batch */
> >>>  	ret = i915_switch_context(ring, ctx);
> >>>  	if (ret)
> >>>  		goto err;
> >>>@@ -1381,7 +1391,6 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
> >>>  	trace_i915_gem_ring_dispatch(ring, intel_ring_get_seqno(ring), flags);
> >>>-	i915_gem_execbuffer_move_to_active(&eb->vmas, ring);
> >>>  	i915_gem_execbuffer_retire_commands(dev, file, ring, batch_obj);
> >>>  err:
> >>I'd like Chris to take a look too, but it looks safe afaict.
> >>
> >>Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
> >switch_context can fail with EINTR so we really can't move stuff to the
> >active list before that point. Or we need to make sure that all the stuff
> >between the old and new move_to_active callsite can't fail.
> >
> >Or we need to track this and tell userspace with an EIO and adjusted reset
> >stats that something between our point of no return where the kernel
> >committed to executing the batch failed.
> >
> >Or we need to unroll move_to_active (which is currently not really
> >possible).
> >-Daniel
> 
> switch_context can fail with quite a lot of different error codes. Is there
> anything particularly special about EINTR? I can't spot that particular code
> path at the moment.
> 
> The context switch is done at the point of submission to the hardware. As
> batch buffers can be re-ordered between submission to driver and submission
> to hardware, there is no point choosing a context any earlier. Whereas the
> move to active needs to be done at the point of submission to the driver.
> The object needs to be marked as in use even though the batch buffer that
> actually uses it might not be executed for some time. From the software
> viewpoint, the object is in use and all the synchronisation code needs to
> know that.
> 
> The scheduler makes the batch buffer execution asynchronous to its
> submission to the driver. There is no way to communicate back a return code
> to user land. Instead, it is up to the scheduler to check the return codes
> from all the execution paths and to retry later if something fails for a
> temporary reason. Or to discard the buffer if it is truly toast.

EINTR is simply really easy to test&hit since you can provoke it with
signals. And X uses signals excessively. One point where EINTR might
happen is in intel_ring_begin, the other when we try to pin the context
into ggtt. The other error codes are true exceptions and will be much harder
to hit.
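
The general pattern for those interruptible paths, for example (taking the
mutex here just as an illustration of how a pending signal bounces the
whole ioctl back to userspace):

	ret = i915_mutex_lock_interruptible(dev);
	if (ret)	/* -EINTR if a signal is pending */
		return ret;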
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 15/44] drm/i915: Added deferred work handler for scheduler
  2014-07-23 15:37     ` John Harrison
@ 2014-07-23 18:50       ` Daniel Vetter
  2014-07-24 15:42         ` John Harrison
  0 siblings, 1 reply; 90+ messages in thread
From: Daniel Vetter @ 2014-07-23 18:50 UTC (permalink / raw)
  To: John Harrison; +Cc: intel-gfx

On Wed, Jul 23, 2014 at 5:37 PM, John Harrison
<John.C.Harrison@intel.com> wrote:
>>>   diff --git a/drivers/gpu/drm/i915/i915_drv.h
>>> b/drivers/gpu/drm/i915/i915_drv.h
>>> index 0977653..fbafa68 100644
>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>> @@ -1075,6 +1075,16 @@ struct i915_gem_mm {
>>>         struct delayed_work idle_work;
>>>         /**
>>> +        * New scheme is to get an interrupt after every work packet
>>> +        * in order to allow the low latency scheduling of pending
>>> +        * packets. The idea behind adding new packets to a pending
>>> +        * queue rather than directly into the hardware ring buffer
>>> +        * is to allow high priority packets to over take low priority
>>> +        * ones.
>>> +        */
>>> +       struct work_struct scheduler_work;
>>
>> Latency for work items isn't too awesome, and e.g. Oscar's execlist code
>> latches the next context right away from the irq handler. Why can't we do
>> something similar for the scheduler? Fishing the next item out of a
>> priority queue shouldn't be expensive ...
>> -Daniel
>
>
> The problem is that taking batch buffers from the scheduler's queue and
> submitting them to the hardware requires lots of processing that is not IRQ
> compatible. It isn't just a simple register write. Half of the code in
> 'i915_gem_do_execbuffer()' must be executed. Probably/possibly it could be
> made IRQ friendly but that would place a lot of restrictions on a lot of
> code that currently doesn't expect to be restricted. Instead, the submission
> is done via a work handler that acquires the driver mutex lock.
>
> In order to cover the extra latency, the scheduler operates in a
> multi-buffered mode and aims to keep eight batch buffers in flight at all
> times. That number being obtained empirically by running lots of benchmarks
> on Android with lots of different settings and seeing where the buffer size
> stopped making a difference.

So I've tried to stitch together that part of the scheduler from the
patch series. Afaics you do the actual scheduling under the protection
of irqsave spinlocks (well you also hold the dev->struct_mutex). That
means you disable local interrupts. Up to the actual submit point I
spotted two such critical sections encompassing pretty much all the
code.

If we'd run the same code from the interrupt handler then only our own
interrupt handler is blocked, all other interrupt processing can
continue. So that's actually a lot nicer than what you have. In any
case you can't do expensive operations under an irqsave spinlock
anyway.

So either I've missed something big here, or this justification doesn't hold up.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 15/44] drm/i915: Added deferred work handler for scheduler
  2014-07-23 18:50       ` Daniel Vetter
@ 2014-07-24 15:42         ` John Harrison
  2014-07-25  7:18           ` Daniel Vetter
  0 siblings, 1 reply; 90+ messages in thread
From: John Harrison @ 2014-07-24 15:42 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx


On 23/07/2014 19:50, Daniel Vetter wrote:
> On Wed, Jul 23, 2014 at 5:37 PM, John Harrison
> <John.C.Harrison@intel.com> wrote:
>>>>    diff --git a/drivers/gpu/drm/i915/i915_drv.h
>>>> b/drivers/gpu/drm/i915/i915_drv.h
>>>> index 0977653..fbafa68 100644
>>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>>> @@ -1075,6 +1075,16 @@ struct i915_gem_mm {
>>>>          struct delayed_work idle_work;
>>>>          /**
>>>> +        * New scheme is to get an interrupt after every work packet
>>>> +        * in order to allow the low latency scheduling of pending
>>>> +        * packets. The idea behind adding new packets to a pending
>>>> +        * queue rather than directly into the hardware ring buffer
>>>> +        * is to allow high priority packets to over take low priority
>>>> +        * ones.
>>>> +        */
>>>> +       struct work_struct scheduler_work;
>>> Latency for work items isn't too awesome, and e.g. Oscar's execlist code
>>> latches the next context right away from the irq handler. Why can't we do
>>> something similar for the scheduler? Fishing the next item out of a
>>> priority queue shouldn't be expensive ...
>>> -Daniel
>>
>> The problem is that taking batch buffers from the scheduler's queue and
>> submitting them to the hardware requires lots of processing that is not IRQ
>> compatible. It isn't just a simple register write. Half of the code in
>> 'i915_gem_do_execbuffer()' must be executed. Probably/possibly it could be
>> made IRQ friendly but that would place a lot of restrictions on a lot of
>> code that currently doesn't expect to be restricted. Instead, the submission
>> is done via a work handler that acquires the driver mutex lock.
>>
>> In order to cover the extra latency, the scheduler operates in a
>> multi-buffered mode and aims to keep eight batch buffers in flight at all
>> times. That number being obtained empirically by running lots of benchmarks
>> on Android with lots of different settings and seeing where the buffer size
>> stopped making a difference.
> So I've tried to stitch together that part of the scheduler from the
> patch series. Afaics you do the actual scheduling under the protection
> of irqsave spinlocks (well you also hold the dev->struct_mutex). That
> means you disable local interrupts. Up to the actual submit point I
> spotted two such critical sections encompassing pretty much all the
> code.
>
> If we'd run the same code from the interrupt handler then only our own
> interrupt handler is blocked, all other interrupt processing can
> continue. So that's actually a lot nicer than what you have. In any
> case you can't do expensive operations under an irqsave spinlock
> anyway.
>
> So either I've missed something big here, or this justification doesn't hold up.
> -Daniel

The irqsave spinlock is only held while manipulating the internal 
scheduler data structures. It is released immediately prior to calling 
i915_gem_do_execbuffer_final(). So the actual submission code path is 
done with the driver mutex but no spinlocks. I'm sure I got 'scheduling 
while atomic' bug checks the one time I accidentally left the spinlock held.
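
To make that lock scope concrete, a minimal sketch of the pattern described
above; the queue layout and helper names are invented stand-ins for the real
list scan and submission code:

#include <linux/list.h>
#include <linux/spinlock.h>

/* Illustrative only; names do not match the patches. */
struct sched_node {
        struct list_head link;
        /* batch buffer, context, priority, ... */
};

struct sched_queue {
        spinlock_t lock;
        struct list_head queued;
};

/* Hypothetical helpers standing in for the real queue scan and submit. */
struct sched_node *pick_best_ready_node(struct list_head *queued);
int submit_node_under_mutex(struct sched_node *node);

static void sched_submit_one(struct sched_queue *sq)
{
        struct sched_node *node;
        unsigned long flags;

        /* The irqsave spinlock covers only the internal list manipulation ... */
        spin_lock_irqsave(&sq->lock, flags);
        node = pick_best_ready_node(&sq->queued);
        if (node)
                list_del(&node->link);
        spin_unlock_irqrestore(&sq->lock, flags);

        if (!node)
                return;

        /*
         * ... and is dropped before the expensive, sleeping submission
         * path.  Only dev->struct_mutex (held by the caller) protects the
         * real i915_gem_do_execbuffer_final() call.
         */
        submit_node_under_mutex(node);
}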

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 15/44] drm/i915: Added deferred work handler for scheduler
  2014-07-24 15:42         ` John Harrison
@ 2014-07-25  7:18           ` Daniel Vetter
  0 siblings, 0 replies; 90+ messages in thread
From: Daniel Vetter @ 2014-07-25  7:18 UTC (permalink / raw)
  To: John Harrison; +Cc: intel-gfx

On Thu, Jul 24, 2014 at 04:42:55PM +0100, John Harrison wrote:
> 
> On 23/07/2014 19:50, Daniel Vetter wrote:
> >On Wed, Jul 23, 2014 at 5:37 PM, John Harrison
> ><John.C.Harrison@intel.com> wrote:
> >>>>   diff --git a/drivers/gpu/drm/i915/i915_drv.h
> >>>>b/drivers/gpu/drm/i915/i915_drv.h
> >>>>index 0977653..fbafa68 100644
> >>>>--- a/drivers/gpu/drm/i915/i915_drv.h
> >>>>+++ b/drivers/gpu/drm/i915/i915_drv.h
> >>>>@@ -1075,6 +1075,16 @@ struct i915_gem_mm {
> >>>>         struct delayed_work idle_work;
> >>>>         /**
> >>>>+        * New scheme is to get an interrupt after every work packet
> >>>>+        * in order to allow the low latency scheduling of pending
> >>>>+        * packets. The idea behind adding new packets to a pending
> >>>>+        * queue rather than directly into the hardware ring buffer
> >>>>+        * is to allow high priority packets to over take low priority
> >>>>+        * ones.
> >>>>+        */
> >>>>+       struct work_struct scheduler_work;
> >>>Latency for work items isn't too awesome, and e.g. Oscar's execlist code
> >>>latches the next context right away from the irq handler. Why can't we do
> >>>something similar for the scheduler? Fishing the next item out of a
> >>>priority queue shouldn't be expensive ...
> >>>-Daniel
> >>
> >>The problem is that taking batch buffers from the scheduler's queue and
> >>submitting them to the hardware requires lots of processing that is not IRQ
> >>compatible. It isn't just a simple register write. Half of the code in
> >>'i915_gem_do_execbuffer()' must be executed. Probably/possibly it could be
> >>made IRQ friendly but that would place a lot of restrictions on a lot of
> >>code that currently doesn't expect to be restricted. Instead, the submission
> >>is done via a work handler that acquires the driver mutex lock.
> >>
> >>In order to cover the extra latency, the scheduler operates in a
> >>multi-buffered mode and aims to keep eight batch buffers in flight at all
> >>times. That number being obtained empirically by running lots of benchmarks
> >>on Android with lots of different settings and seeing where the buffer size
> >>stopped making a difference.
> >So I've tried to stitch together that part of the scheduler from the
> >patch series. Afaics you do the actual scheduling under the protection
> >of irqsave spinlocks (well you also hold the dev->struct_mutex). That
> >means you disable local interrupts. Up to the actual submit point I
> >spotted two such critical sections encompassing pretty much all the
> >code.
> >
> >If we'd run the same code from the interrupt handler then only our own
> >interrupt handler is blocked, all other interrupt processing can
> >continue. So that's actually a lot nicer than what you have. In any
> >case you can't do expensive operations under an irqsave spinlock
> >anyway.
> >
> >So either I've missed something big here, or this justification doesn't hold up.
> 
> The irqsave spinlock is only held while manipulating the internal scheduler
> data structures. It is released immediately prior to calling
> i915_gem_do_execbuffer_final(). So the actual submission code path is done
> with the driver mutex but no spinlocks. I'm sure I got 'scheduling while
> atomic' bug checks the one time I accidentally left the spinlock held.

Ok, missed something ;-)

btw for big patch series please upload them somewhere on a public git
(github or so). Generally I review patches only by reading them because
trying to apply them usually results in some conflicts. But with big patch
series like this here that doesn't work, especially when everything is
tightly coupled (iirc I had to jump around in about 10 different patches
to figure out what the work handler looks like). Or maybe I didn't
understand the patch split flow and read it backwards ;-)
-Daniel
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 00/44] GPU scheduler for i915 driver
  2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
                   ` (44 preceding siblings ...)
  2014-06-26 20:44 ` [RFC 00/44] GPU scheduler for i915 driver Dave Airlie
@ 2014-10-10 10:35 ` Steven Newbury
  2014-10-20 10:31   ` John Harrison
  45 siblings, 1 reply; 90+ messages in thread
From: Steven Newbury @ 2014-10-10 10:35 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX


[-- Attachment #1.1: Type: text/plain, Size: 332 bytes --]

On Thu, 2014-06-26 at 18:23 +0100, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> Implemented a batch buffer submission scheduler for the i915 DRM 
> driver.

Hi John,

I was just wondering what's happening with this patch series?  Are you 
still working on it?  Does it need any testing?

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 00/44] GPU scheduler for i915 driver
  2014-10-10 10:35 ` Steven Newbury
@ 2014-10-20 10:31   ` John Harrison
  0 siblings, 0 replies; 90+ messages in thread
From: John Harrison @ 2014-10-20 10:31 UTC (permalink / raw)
  To: Steven Newbury; +Cc: Intel-GFX

On 10/10/2014 11:35, Steven Newbury wrote:
> On Thu, 2014-06-26 at 18:23 +0100, John.C.Harrison@Intel.com wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> Implemented a batch buffer submission scheduler for the i915 DRM
>> driver.
> Hi John,
>
> I was just wondering what's happening with this patch series?  Are you
> still working on it?  Does it need any testing?

It is still in progress, although currently stalled because it was 
decided to first replace the driver's seqno usage with request 
structures wherever possible. The theory is that this is safer and 
more sensible than having the scheduler cause out-of-order seqnos and 
the various work-arounds needed to cope with that.
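
A much simplified illustration of the difference (not the actual i915 request
API): a raw seqno comparison only works while completion order matches
assignment order, whereas per-request completion state survives reordering:

#include <linux/types.h>

/* Simplified illustration only; neither function is real i915 code. */

/* Global-seqno model: "done" is inferred by comparison. */
static bool seqno_passed(u32 hw_seqno, u32 my_seqno)
{
        /* Breaks down once a scheduler reorders submission. */
        return (s32)(hw_seqno - my_seqno) >= 0;
}

/* Request model: "done" is a property of this specific submission. */
struct gpu_request {
        bool completed;         /* set when this particular batch retires */
        /* ring, context, fence, ... */
};

static bool request_completed(const struct gpu_request *req)
{
        return req->completed;
}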

^ permalink raw reply	[flat|nested] 90+ messages in thread

end of thread, other threads:[~2014-10-20 10:31 UTC | newest]

Thread overview: 90+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-26 17:23 [RFC 00/44] GPU scheduler for i915 driver John.C.Harrison
2014-06-26 17:23 ` [RFC 01/44] drm/i915: Corrected 'file_priv' to 'file' in 'i915_driver_preclose()' John.C.Harrison
2014-06-30 21:03   ` Jesse Barnes
2014-07-07 18:02     ` Daniel Vetter
2014-06-26 17:23 ` [RFC 02/44] drm/i915: Added getparam for native sync John.C.Harrison
2014-07-07 18:52   ` Daniel Vetter
2014-06-26 17:23 ` [RFC 03/44] drm/i915: Add extra add_request calls John.C.Harrison
2014-06-30 21:10   ` Jesse Barnes
2014-07-07 18:41     ` Daniel Vetter
2014-07-08  7:44       ` Chris Wilson
2014-06-26 17:23 ` [RFC 04/44] drm/i915: Fix null pointer dereference in error capture John.C.Harrison
2014-06-30 21:40   ` Jesse Barnes
2014-07-01  7:12     ` Chris Wilson
2014-07-07 18:49       ` Daniel Vetter
2014-07-01  7:20   ` [PATCH] drm/i915: Remove num_pages parameter to i915_error_object_create() Chris Wilson
2014-06-26 17:23 ` [RFC 05/44] drm/i915: Updating assorted register and status page definitions John.C.Harrison
2014-07-02 17:49   ` Jesse Barnes
2014-06-26 17:23 ` [RFC 06/44] drm/i915: Fixes for FIFO space queries John.C.Harrison
2014-07-02 17:50   ` Jesse Barnes
2014-06-26 17:23 ` [RFC 07/44] drm/i915: Disable 'get seqno' workaround for VLV John.C.Harrison
2014-07-02 17:51   ` Jesse Barnes
2014-07-07 18:56     ` Daniel Vetter
2014-06-26 17:23 ` [RFC 08/44] drm/i915: Added GPU scheduler config option John.C.Harrison
2014-07-07 18:58   ` Daniel Vetter
2014-06-26 17:24 ` [RFC 09/44] drm/i915: Start of GPU scheduler John.C.Harrison
2014-07-02 17:55   ` Jesse Barnes
2014-07-07 19:02   ` Daniel Vetter
2014-06-26 17:24 ` [RFC 10/44] drm/i915: Prepare retire_requests to handle out-of-order seqnos John.C.Harrison
2014-07-02 18:11   ` Jesse Barnes
2014-07-07 19:05   ` Daniel Vetter
2014-07-09 14:08     ` Daniel Vetter
2014-06-26 17:24 ` [RFC 11/44] drm/i915: Added scheduler hook into i915_seqno_passed() John.C.Harrison
2014-07-02 18:14   ` Jesse Barnes
2014-06-26 17:24 ` [RFC 12/44] drm/i915: Disable hardware semaphores when GPU scheduler is enabled John.C.Harrison
2014-07-02 18:16   ` Jesse Barnes
2014-06-26 17:24 ` [RFC 13/44] drm/i915: Added scheduler hook when closing DRM file handles John.C.Harrison
2014-07-02 18:20   ` Jesse Barnes
2014-07-23 15:10     ` John Harrison
2014-07-23 15:39       ` Jesse Barnes
2014-06-26 17:24 ` [RFC 14/44] drm/i915: Added getparam for GPU scheduler John.C.Harrison
2014-07-02 18:21   ` Jesse Barnes
2014-07-07 19:11     ` Daniel Vetter
2014-06-26 17:24 ` [RFC 15/44] drm/i915: Added deferred work handler for scheduler John.C.Harrison
2014-07-07 19:14   ` Daniel Vetter
2014-07-23 15:37     ` John Harrison
2014-07-23 18:50       ` Daniel Vetter
2014-07-24 15:42         ` John Harrison
2014-07-25  7:18           ` Daniel Vetter
2014-06-26 17:24 ` [RFC 16/44] drm/i915: Alloc early seqno John.C.Harrison
2014-07-02 18:29   ` Jesse Barnes
2014-07-23 15:11     ` John Harrison
2014-06-26 17:24 ` [RFC 17/44] drm/i915: Prelude to splitting i915_gem_do_execbuffer in two John.C.Harrison
2014-07-02 18:34   ` Jesse Barnes
2014-07-07 19:21     ` Daniel Vetter
2014-07-23 16:33       ` John Harrison
2014-07-23 18:14         ` Daniel Vetter
2014-06-26 17:24 ` [RFC 18/44] drm/i915: Added scheduler debug macro John.C.Harrison
2014-07-02 18:37   ` Jesse Barnes
2014-07-07 19:23     ` Daniel Vetter
2014-06-26 17:24 ` [RFC 19/44] drm/i915: Split i915_dem_do_execbuffer() in half John.C.Harrison
2014-06-26 17:24 ` [RFC 20/44] drm/i915: Redirect execbuffer_final() via scheduler John.C.Harrison
2014-06-26 17:24 ` [RFC 21/44] drm/i915: Added tracking/locking of batch buffer objects John.C.Harrison
2014-06-26 17:24 ` [RFC 22/44] drm/i915: Ensure OLS & PLR are always in sync John.C.Harrison
2014-06-26 17:24 ` [RFC 23/44] drm/i915: Added manipulation of OLS/PLR John.C.Harrison
2014-06-26 17:24 ` [RFC 24/44] drm/i915: Added scheduler interrupt handler hook John.C.Harrison
2014-06-26 17:24 ` [RFC 25/44] drm/i915: Added hook to catch 'unexpected' ring submissions John.C.Harrison
2014-06-26 17:24 ` [RFC 26/44] drm/i915: Added scheduler support to __wait_seqno() calls John.C.Harrison
2014-06-26 17:24 ` [RFC 27/44] drm/i915: Added scheduler support to page fault handler John.C.Harrison
2014-06-26 17:24 ` [RFC 28/44] drm/i915: Added scheduler flush calls to ring throttle and idle functions John.C.Harrison
2014-06-26 17:24 ` [RFC 29/44] drm/i915: Hook scheduler into intel_ring_idle() John.C.Harrison
2014-06-26 17:24 ` [RFC 30/44] drm/i915: Added a module parameter for allowing scheduler overrides John.C.Harrison
2014-06-26 17:24 ` [RFC 31/44] drm/i915: Implemented the GPU scheduler John.C.Harrison
2014-06-26 17:24 ` [RFC 32/44] drm/i915: Added immediate submission override to scheduler John.C.Harrison
2014-06-26 17:24 ` [RFC 33/44] drm/i915: Added trace points " John.C.Harrison
2014-06-26 17:24 ` [RFC 34/44] drm/i915: Added scheduler queue throttling by DRM file handle John.C.Harrison
2014-06-26 17:24 ` [RFC 35/44] drm/i915: Added debugfs interface to scheduler tuning parameters John.C.Harrison
2014-06-26 17:24 ` [RFC 36/44] drm/i915: Added debug state dump facilities to scheduler John.C.Harrison
2014-06-26 17:24 ` [RFC 37/44] drm/i915: Added facility for cancelling an outstanding request John.C.Harrison
2014-06-26 17:24 ` [RFC 38/44] drm/i915: Add early exit to execbuff_final() if insufficient ring space John.C.Harrison
2014-06-26 17:24 ` [RFC 39/44] drm/i915: Added support for pre-emptive scheduling John.C.Harrison
2014-06-26 17:24 ` [RFC 40/44] drm/i915: REVERTME Hack to allow IGT to test pre-emption John.C.Harrison
2014-06-26 17:24 ` [RFC 41/44] drm/i915: Added validation callback to trace points John.C.Harrison
2014-06-26 17:24 ` [RFC 42/44] drm/i915: Added scheduler statistic reporting to debugfs John.C.Harrison
2014-06-26 17:24 ` [RFC 43/44] drm/i915: Added support for submitting out-of-batch ring commands John.C.Harrison
2014-06-26 17:24 ` [RFC 44/44] drm/i915: Fake batch support for page flips John.C.Harrison
2014-07-07 19:25   ` Daniel Vetter
2014-06-26 20:44 ` [RFC 00/44] GPU scheduler for i915 driver Dave Airlie
2014-07-07 15:57   ` Daniel Vetter
2014-10-10 10:35 ` Steven Newbury
2014-10-20 10:31   ` John Harrison
