* [PATCH 00/49] Execlists
@ 2014-03-27 17:59 oscar.mateo
From: oscar.mateo @ 2014-03-27 17:59 UTC
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Hi all,

This patch series implements execlists for GEN8+. Before continuing, it is important to mention that, although I have taken it upon myself to assemble the series and rewrite it for upstreaming, many people have worked on this series before me. Namely:

Ben Widawsky (benjamin.widawsky@intel.com).
Jesse Barnes (jbarnes@virtuousgeek.org).
Michel Thierry (michel.thierry@intel.com).
Thomas Daniel (thomas.daniel@intel.com).
Rafael Barbalho (rafael.barbalho@intel.com).

All good ideas in the series belong to these authors, and so I have tried to maintain authorship in the patches accordingly (to the extent possible, since the patches have suffered a lot of squashing & splitting). These authors do not, however, bear any of the blame for errors: I am solely responsible for them. 

Now, let's get back to the subject at hand:

With GEN8 comes an expansion of the HW contexts: "Logical Ring Contexts". One of the main differences with the legacy HW contexts is that logical ring contexts incorporate many more things into the context's state, like PDPs or ringbuffer control registers. These logical ring contexts enable a number of new abilities, especially "Execlists". Execlists are the new method by which, on GEN8+ hardware, workloads are submitted for execution (as opposed to the legacy, ringbuffer-based method). With this new method, commands in the context's ringbuffer are executed when the GPU moves to this context from a previous one (a.k.a. context switch).

On a context switch, the GPU has to remember the current state of the context being switched out, including the head and tail pointers of the ringbuffer, so it:

- Flushes the pipe.
- Saves ringbuffer head pointer.
- Saves engine state.

Similarly, on a context restore (when a previously switched out context is resubmitted), the GPU restores the saved context and resumes execution where it stopped:

- Restores PDPs and sets up PPGTT.
- Restores ringbuffer.
- Restores engine state.

Contexts are submitted for execution through the GPU's ExecLists Submit Port (ELSP, for short). This port supports the submission of two contexts at a time, which are executed serially (Context-0 first, Context-1 next) upon every context completion. The GPU keeps the software informed about the status of this list via context switch interrupts and context status buffers, to help software keep track of the progress. The existence of a second context ensures that some useful work is done in HW while the Context-0 switch status is being processed by SW. After Context-1 completion, HW goes IDLE if there are no further contexts scheduled in the ELSP.

Submitting a new Execution List to the ELSP in which one of the contexts is already running results in a Lite Restore (sampling of the new tail pointer).

Regarding the creation of logical ring contexts, we had before (since PPGTT was introduced):

- One global default context.
- One private default context for each opened fd.
- One extra private context for each context create ioctl call.

The global default context existed for future shrinker usage as well as reset handling. At the same time, every file got its own context, plus any number of extra contexts if the context create ioctl call was used by the userspace driver. These private contexts were the ones used by the driver for execbuffer calls.

Now that ringbuffers belong to a context (and not to an engine, like before) and that contexts are uniquely tied to a given engine (and not reusable, like before), we need:

- As many global default contexts as there are engines.
- Up to as many private default contexts as there are engines, for each opened fd.
- Up to as many extra private contexts as there are engines, for each context create ioctl call.

Given that at creation time of a non-global context we don't know which engine is going to use it, we have implemented a deferred creation of logical ring contexts: the private default context starts its life as a hollow or blank holder that gets populated once we receive an execbuffer ioctl (for a particular engine) on that fd. If later on we receive another execbuffer ioctl for a different engine, we create a second private default context, and so on. The same rules apply to the create context ioctl call.
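
To illustrate the deferred creation described above, here is a minimal stand-alone sketch. It is not code from the series: the engine names, the structures and get_context_for_engine() are all hypothetical, and the "context" is reduced to a flag so that only the populate-on-first-use logic is visible.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical engine list, for illustration only. */
enum engine_id { ENGINE_RENDER, ENGINE_BLT, ENGINE_VIDEO, NUM_ENGINES };

/* One logical ring context, tied to exactly one engine. */
struct lr_context {
	bool populated;         /* false while still a blank holder */
	enum engine_id engine;  /* meaningful only once populated */
};

/* Private default contexts of one fd: at most one per engine. */
struct file_contexts {
	struct lr_context ctx[NUM_ENGINES];
	size_t created;         /* how many holders have been populated */
};

/*
 * Called when an execbuffer-style request arrives for a given engine:
 * populate the blank holder for that engine the first time it is
 * needed, and simply reuse it on later calls.
 */
static struct lr_context *
get_context_for_engine(struct file_contexts *fc, enum engine_id engine)
{
	struct lr_context *ctx = &fc->ctx[engine];

	if (!ctx->populated) {
		ctx->populated = true;
		ctx->engine = engine;
		fc->created++;
	}
	return ctx;
}
```

With a zero-initialised struct file_contexts, the first call for the render engine populates one context, a second call for the same engine returns the same context, and a call for a different engine populates a second one.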

Execlists have been implemented as follows:

When a request is committed, its commands (the BB start and any leading or trailing commands, like the seqno breadcrumbs) are placed in the ringbuffer for the appropriate context. The tail pointer in the hardware context is not updated at this time, but instead, kept by the driver in the ringbuffer structure. A structure representing this execution request is added to a request queue for the appropriate engine: this structure contains a copy of the context's tail after the request was written to the ringbuffer and a pointer to the context itself.

If the engine's request queue was empty before the request was added, the queue is processed immediately. Otherwise the queue will be processed during a context switch interrupt. In any case, elements on the queue will get sent (in pairs) to the ELSP with a globally unique 20-bit submission ID (constructed with the fd's ID, plus our own context ID, plus the engine's ID).
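
As a rough sketch of how three IDs could be packed into a 20-bit submission ID: the field widths and their order below are assumptions made purely for illustration (the text above only says that the fd's ID, the context ID and the engine's ID are combined into 20 bits), and make_submission_id() is a hypothetical name.

```c
#include <stdint.h>

/*
 * Assumed field widths, NOT taken from the series. The only constraint
 * used here is that they add up to 20 bits.
 */
#define ENGINE_BITS 3u
#define CTX_BITS    8u
#define FILE_BITS   9u

/* Pack engine ID (low bits), context ID and fd ID (high bits). */
static uint32_t make_submission_id(uint32_t file_id, uint32_t ctx_id,
				   uint32_t engine_id)
{
	uint32_t id = 0;

	id |= engine_id & ((1u << ENGINE_BITS) - 1);
	id |= (ctx_id & ((1u << CTX_BITS) - 1)) << ENGINE_BITS;
	id |= (file_id & ((1u << FILE_BITS) - 1)) << (ENGINE_BITS + CTX_BITS);

	return id;
}
```

Under these assumed widths, two submissions that differ in any of the three inputs get different IDs as long as each input fits in its field; inputs that overflow their field are silently truncated by the masks.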

When execution of a request completes, the GPU updates the context status buffer with a context complete event and generates a context switch interrupt. During context switch interrupt handling, the driver examines the context status events in the context status buffer: for each context complete event, if the announced ID matches that on the head of the request queue, then that request is retired and removed from the queue.

After processing, if any requests were retired and the queue is not empty, then a new execution list can be submitted. The two requests at the front of the queue are next to be submitted, but since a context may not occur twice in an execution list, if subsequent requests have the same ID as the first then the two requests must be combined. This is done simply by discarding requests at the head of the queue until either only one request is left (in which case we use a NULL second context) or the first two requests have unique IDs.
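
The head-of-queue combining rule can be sketched as follows. This is a simplified model, not the code in the series: the queue is a plain array, and struct request, struct exec_list and pick_execlist() are hypothetical names. Discarding an older request is safe in this model because the newer request for the same context carries a later tail.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified request: which context, and the tail after it was written. */
struct request {
	uint32_t ctx_id;
	uint32_t tail;
};

/* The pair to be sent to the ELSP; second is NULL for a single context. */
struct exec_list {
	const struct request *first;
	const struct request *second;
};

/*
 * Skip requests at the head of the queue while the next request belongs
 * to the same context, then pick the first two remaining requests.
 * Returns how many head requests were discarded.
 */
static size_t pick_execlist(const struct request *queue, size_t count,
			    struct exec_list *out)
{
	size_t head = 0;

	out->first = NULL;
	out->second = NULL;
	if (count == 0)
		return 0;

	while (count - head > 1 &&
	       queue[head].ctx_id == queue[head + 1].ctx_id)
		head++;

	out->first = &queue[head];
	if (count - head > 1)
		out->second = &queue[head + 1];

	return head;
}
```

For a queue holding three requests for context 1 followed by one for context 2, the first two requests are discarded and the pair becomes (last request of context 1, request of context 2). If every queued request belongs to the same context, only the last one is kept and the second slot stays NULL.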

By always executing the first two requests in the queue the driver ensures that the GPU is kept as busy as possible. In the case where a single context completes but a second context is still executing, the request for the second context will be at the head of the queue when we remove the first one. This request will then be resubmitted along with a new request for a different context, which will cause the hardware to continue executing the second request and queue the new request (the GPU detects the condition of a context getting preempted with the same context and optimizes the context switch flow by not doing preemption, but just sampling the new tail pointer).

Because the GPU continues to execute while the context switch interrupt is being handled, there is a race condition where a second context completes while the completion of the previous one is being handled. This results in the second context being resubmitted (potentially along with a third), and an extra context complete event for that context will occur. The request will be removed from the queue at the first context complete event, and the second context complete event will not result in removal of a request from the queue because the IDs of the request and the event will not match.
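
The retire-on-match rule, including the harmless extra context complete event, can be modelled like this. Again this is only a sketch under simplifying assumptions: the queue is an array of submission IDs with a head index, and struct request_queue and process_complete_events() are hypothetical names, not functions from the series.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified per-engine queue: submission IDs in submission order. */
struct request_queue {
	const uint32_t *ids;
	size_t head;   /* index of the oldest request not yet retired */
	size_t count;  /* total number of requests in ids[] */
};

/*
 * Walk the context complete events read from the status buffer. An
 * event retires the request at the head of the queue only if the IDs
 * match; any other event (such as a duplicate completion caused by a
 * resubmission) is ignored. Returns the number of retired requests.
 */
static size_t process_complete_events(struct request_queue *q,
				      const uint32_t *events,
				      size_t n_events)
{
	size_t retired = 0;

	for (size_t i = 0; i < n_events; i++) {
		if (q->head < q->count && q->ids[q->head] == events[i]) {
			q->head++;
			retired++;
		}
	}
	return retired;
}
```

With queued IDs 7, 9, 11 and events 7, 9, 9, the first two events retire two requests, while the second event for ID 9 finds ID 11 at the head and removes nothing.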

Cheers,
Oscar

Ben Widawsky (15):
  drm/i915/bdw: Macro to distinguish LRCs (Logical Ring Contexts)
  drm/i915: s/for_each_ring/for_each_active_ring
  drm/i915: for_each_ring
  drm/i915: Extract trivial parts of ring init (early init)
  drm/i915/bdw: Rework init code for gen8 contexts
  drm/i915: Extract ringbuffer obj alloc & destroy
  drm/i915/bdw: LR context ring init
  drm/i915/bdw: GEN8 semaphoreless ring add request
  drm/i915/bdw: GEN8 new ring flush
  drm/i915/bdw: A bit more advanced context init/fini
  drm/i915/bdw: Allocate ringbuffer for LR contexts
  drm/i915/bdw: Populate LR contexts (somewhat)
  drm/i915/bdw: Status page for LR contexts
  drm/i915/bdw: Enable execlists in the hardware
  drm/i915/bdw: Implement context switching (somewhat)

Michel Thierry (1):
  drm/i915/bdw: Get prepared for a two-stage execlist submit process

Oscar Mateo (30):
  drm/i915: Simplify a couple of functions thanks to for_each_ring
  drm/i915/bdw: New file for logical ring contexts and execlists
  drm/i915: Make i915_gem_create_context outside accessible
  drm/i915: s/intel_ring_buffer/intel_engine
  drm/i915: Split the ringbuffers and the rings
  drm/i915: Rename functions that mention ringbuffers (meaning rings)
  drm/i915/bdw: Execlists ring tail writing
  drm/i915/bdw: Plumbing for user LR context switching
  drm/i915: s/__intel_ring_advance/intel_ringbuffer_advance_and_submit
  drm/i915/bdw: Write a new set of context-aware ringbuffer management
    functions
  drm/i915: Final touches to LR contexts plumbing and refactoring
  drm/i915/bdw: Set the request context information correctly in the LRC
    case
  drm/i915/bdw: Prepare for user-created LR contexts
  drm/i915/bdw: Start creating & destroying user LR contexts
  drm/i915/bdw: Pin context pages at context create time
  drm/i915/bdw: Extract LR context object populating
  drm/i915/bdw: Introduce dependent contexts
  drm/i915/bdw: Create stand-alone and dependent contexts
  drm/i915/bdw: Allow non-default, non-render user LR contexts
  drm/i915/bdw: Fix reset stats ioctl with LR contexts
  drm/i915: Allocate an integer ID for each new file descriptor
  drm/i915/bdw: Prepare for a 20-bits globally unique submission ID
  drm/i915/bdw: Swap the PPGTT PDPs, LRC style
  drm/i915/bdw: Write the tail pointer, LRC style
  drm/i915/bdw: Display execlists info in debugfs
  drm/i915/bdw: Display context ringbuffer info in debugfs
  drm/i915/bdw: Start queueing contexts to be submitted
  drm/i915/bdw: Always write seqno to default context
  drm/i915/bdw: Enable logical ring contexts
  drm/i915/bdw: Document execlists and logical ring contexts

Thomas Daniel (3):
  drm/i915/bdw: Add forcewake lock around ELSP writes
  drm/i915/bdw: LR context switch interrupts
  drm/i915/bdw: Handle context switch events

 drivers/gpu/drm/i915/Makefile              |   1 +
 drivers/gpu/drm/i915/i915_cmd_parser.c     |  14 +-
 drivers/gpu/drm/i915/i915_debugfs.c        | 103 +++-
 drivers/gpu/drm/i915/i915_dma.c            |  57 +-
 drivers/gpu/drm/i915/i915_drv.h            |  90 +++-
 drivers/gpu/drm/i915/i915_gem.c            | 153 +++---
 drivers/gpu/drm/i915/i915_gem_context.c    | 109 ++--
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  85 +--
 drivers/gpu/drm/i915/i915_gem_gtt.c        |  39 +-
 drivers/gpu/drm/i915/i915_gem_gtt.h        |   2 +-
 drivers/gpu/drm/i915/i915_gpu_error.c      |  12 +-
 drivers/gpu/drm/i915/i915_irq.c            |  93 ++--
 drivers/gpu/drm/i915/i915_lrc.c            | 826 +++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_reg.h            |  10 +
 drivers/gpu/drm/i915/i915_trace.h          |  26 +-
 drivers/gpu/drm/i915/intel_display.c       |  26 +-
 drivers/gpu/drm/i915/intel_drv.h           |   4 +-
 drivers/gpu/drm/i915/intel_overlay.c       |  12 +-
 drivers/gpu/drm/i915/intel_pm.c            |  18 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c    | 796 +++++++++++++++++----------
 drivers/gpu/drm/i915/intel_ringbuffer.h    | 187 ++++---
 drivers/gpu/drm/i915/intel_uncore.c        |  15 +
 22 files changed, 2043 insertions(+), 635 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_lrc.c

-- 
1.9.0


end of thread, other threads:[~2014-04-28 14:44 UTC | newest]

Thread overview: 85+ messages
2014-03-27 17:59 [PATCH 00/49] Execlists oscar.mateo
2014-03-27 17:59 ` [PATCH 01/49] drm/i915/bdw: Macro to distinguish LRCs (Logical Ring Contexts) oscar.mateo
2014-03-27 17:59 ` [PATCH 02/49] drm/i915: s/for_each_ring/for_each_active_ring oscar.mateo
2014-03-27 17:59 ` [PATCH 03/49] drm/i915: for_each_ring oscar.mateo
2014-03-27 17:59 ` [PATCH 04/49] drm/i915: Simplify a couple of functions thanks to for_each_ring oscar.mateo
2014-03-27 17:59 ` [PATCH 05/49] drm/i915: Extract trivial parts of ring init (early init) oscar.mateo
2014-03-27 17:59 ` [PATCH 06/49] drm/i915/bdw: New file for logical ring contexts and execlists oscar.mateo
2014-03-27 17:59 ` [PATCH 07/49] drm/i915/bdw: Rework init code for gen8 contexts oscar.mateo
2014-03-27 17:59 ` [PATCH 08/49] drm/i915: Make i915_gem_create_context outside accessible oscar.mateo
2014-03-27 17:59 ` [PATCH 09/49] drm/i915: Extract ringbuffer obj alloc & destroy oscar.mateo
2014-03-27 17:59 ` [PATCH 10/49] drm/i915: s/intel_ring_buffer/intel_engine oscar.mateo
2014-03-27 17:59 ` [PATCH 11/49] drm/i915: Split the ringbuffers and the rings oscar.mateo
2014-03-27 17:59 ` [PATCH 12/49] drm/i915: Rename functions that mention ringbuffers (meaning rings) oscar.mateo
2014-03-27 17:59 ` [PATCH 13/49] drm/i915/bdw: Execlists ring tail writing oscar.mateo
2014-03-27 17:13   ` Mateo Lozano, Oscar
2014-03-27 17:59 ` [PATCH 14/49] drm/i915/bdw: LR context ring init oscar.mateo
2014-03-27 17:59 ` [PATCH 15/49] drm/i915/bdw: GEN8 semaphoreless ring add request oscar.mateo
2014-03-27 17:59 ` [PATCH 16/49] drm/i915/bdw: GEN8 new ring flush oscar.mateo
2014-03-27 17:59 ` [PATCH 17/49] drm/i915/bdw: A bit more advanced context init/fini oscar.mateo
2014-04-01  0:38   ` Damien Lespiau
2014-04-01 13:47     ` Mateo Lozano, Oscar
2014-04-01 13:51       ` Damien Lespiau
2014-04-01 19:18         ` Ben Widawsky
2014-04-01 21:05           ` Damien Lespiau
2014-04-02  4:07             ` Ben Widawsky
2014-03-27 17:59 ` [PATCH 18/49] drm/i915/bdw: Allocate ringbuffer for LR contexts oscar.mateo
2014-03-27 17:59 ` [PATCH 19/49] drm/i915/bdw: Populate LR contexts (somewhat) oscar.mateo
2014-04-01  0:00   ` Damien Lespiau
2014-04-01 13:33     ` Mateo Lozano, Oscar
2014-04-15 16:00   ` Jeff McGee
2014-04-15 16:10     ` Jeff McGee
2014-04-15 19:51       ` Daniel Vetter
2014-04-15 20:43       ` Jeff McGee
2014-04-15 21:08         ` Daniel Vetter
2014-04-15 22:32           ` Jeff McGee
2014-04-16  6:04             ` Daniel Vetter
2014-03-27 17:59 ` [PATCH 20/49] drm/i915/bdw: Status page for LR contexts oscar.mateo
2014-03-27 17:59 ` [PATCH 21/49] drm/i915/bdw: Enable execlists in the hardware oscar.mateo
2014-03-27 17:59 ` [PATCH 22/49] drm/i915/bdw: Plumbing for user LR context switching oscar.mateo
2014-03-27 17:59 ` [PATCH 23/49] drm/i915: s/__intel_ring_advance/intel_ringbuffer_advance_and_submit oscar.mateo
2014-03-27 17:59 ` [PATCH 24/49] drm/i915/bdw: Write a new set of context-aware ringbuffer management functions oscar.mateo
2014-03-27 17:59 ` [PATCH 25/49] drm/i915: Final touches to LR contexts plumbing and refactoring oscar.mateo
2014-03-27 17:59 ` [PATCH 26/49] drm/i915/bdw: Set the request context information correctly in the LRC case oscar.mateo
2014-03-27 17:59 ` [PATCH 27/49] drm/i915/bdw: Prepare for user-created LR contexts oscar.mateo
2014-03-27 17:59 ` [PATCH 28/49] drm/i915/bdw: Start creating & destroying user " oscar.mateo
2014-03-27 17:59 ` [PATCH 29/49] drm/i915/bdw: Pin context pages at context create time oscar.mateo
2014-03-27 17:59 ` [PATCH 30/49] drm/i915/bdw: Extract LR context object populating oscar.mateo
2014-03-27 18:00 ` [PATCH 31/49] drm/i915/bdw: Introduce dependent contexts oscar.mateo
2014-03-27 17:21   ` Mateo Lozano, Oscar
2014-04-09 16:54     ` Mateo Lozano, Oscar
2014-03-27 18:00 ` [PATCH 32/49] drm/i915/bdw: Create stand-alone and " oscar.mateo
2014-03-27 18:00 ` [PATCH 33/49] drm/i915/bdw: Allow non-default, non-render user LR contexts oscar.mateo
2014-03-27 18:00 ` [PATCH 34/49] drm/i915/bdw: Fix reset stats ioctl with " oscar.mateo
2014-03-27 18:00 ` [PATCH 35/49] drm/i915: Allocate an integer ID for each new file descriptor oscar.mateo
2014-03-27 18:00 ` [PATCH 36/49] drm/i915/bdw: Prepare for a 20-bits globally unique submission ID oscar.mateo
2014-03-27 18:00 ` [PATCH 37/49] drm/i915/bdw: Implement context switching (somewhat) oscar.mateo
2014-03-27 18:00 ` [PATCH 38/49] drm/i915/bdw: Add forcewake lock around ELSP writes oscar.mateo
2014-03-27 18:00 ` [PATCH 39/49] drm/i915/bdw: Swap the PPGTT PDPs, LRC style oscar.mateo
2014-03-31 16:42   ` Damien Lespiau
2014-04-01 13:42     ` Mateo Lozano, Oscar
2014-04-02 13:47   ` Damien Lespiau
2014-04-09  7:56     ` Mateo Lozano, Oscar
2014-03-27 18:00 ` [PATCH 40/49] drm/i915/bdw: Write the tail pointer, " oscar.mateo
2014-03-27 18:00 ` [PATCH 41/49] drm/i915/bdw: LR context switch interrupts oscar.mateo
2014-04-02 11:42   ` Damien Lespiau
2014-04-02 11:49     ` Daniel Vetter
2014-04-02 12:56       ` Damien Lespiau
2014-03-27 18:00 ` [PATCH 42/49] drm/i915/bdw: Get prepared for a two-stage execlist submit process oscar.mateo
2014-04-04 11:12   ` Damien Lespiau
2014-04-04 13:24     ` Damien Lespiau
2014-04-09  7:57       ` Mateo Lozano, Oscar
2014-03-27 18:00 ` [PATCH 43/49] drm/i915/bdw: Handle context switch events oscar.mateo
2014-04-03 14:24   ` Damien Lespiau
2014-04-09  8:15     ` Mateo Lozano, Oscar
2014-04-26  0:53   ` Robert Beckett
2014-04-28 14:43     ` Mateo Lozano, Oscar
2014-03-27 18:00 ` [PATCH 44/49] drm/i915/bdw: Display execlists info in debugfs oscar.mateo
2014-04-07 19:19   ` Damien Lespiau
2014-03-27 18:00 ` [PATCH 45/49] drm/i915/bdw: Display context ringbuffer " oscar.mateo
2014-03-27 18:00 ` [PATCH 46/49] drm/i915/bdw: Start queueing contexts to be submitted oscar.mateo
2014-03-27 18:00 ` [PATCH 47/49] drm/i915/bdw: Always write seqno to default context oscar.mateo
2014-03-27 18:00 ` [PATCH 48/49] drm/i915/bdw: Enable logical ring contexts oscar.mateo
2014-03-27 18:00 ` [PATCH 49/49] drm/i915/bdw: Document execlists and " oscar.mateo
2014-04-07 18:12 ` [PATCH 00/49] Execlists Damien Lespiau
2014-04-07 21:32   ` Daniel Vetter
