* [PATCH 00/42] Execlists v4
From: oscar.mateo @ 2014-07-11 15:48 UTC
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

For a description of this patchset, please check the previous cover letters: [1], [2] and [3].

The main changes introduced in this v4 are:

- Do not abstract __i915_add_request away.
- Squash together the two emit request functions.
- Always pass the ringbuffer along, as the struct that holds together all required information.
- Integrate the documentation into DocBook.
- Early sanitize for i915.enable_execlists (and remove lrc_enabled).
- Always defer the context creation: do not special case the context create ioctl.
- Do not special case the reset_ring_cleanup, do INIT_LIST_HEAD.
- Add TODO to error capture code.
- "use_mmio_flip at driver discretion" means "always" with Execlists.
- Lock before displaying execlists info in debugfs.
- When populating the context, use _MASKED_BIT_ENABLE, upper/lower_32_bits, etc.
- Add comment about CHV when populating the context.
- Replace submit_ctx vfunc with a plain function call.
- Do not change the behaviour of pipe control fini.
- The BSD invalidate bit still exists in GEN8.
- Squash intel_runtime_pm_get fix.
- Pin ringbuffer and context backing objects only on-demand.
- Reorder and rewrite to explain patches instead of showcasing them.
- Reorder and rewrite to split "chapters" as cleanly as possible.

The patches in this series can be split into "chapters". They are:

- Logical Ring Contexts: patches 1 to 11
- Logical Rings: patches 12 to 26
- Execlists: patches 27 to 35
- Debug, documentation and enabling: patches 36 to 40
- Last minute patches that still require more attention: patches 41 and 42

Unfortunately, in a last-minute run of the IGT testsuite, I have noticed problems in gem_evict_everything. It's still not clear whether I can continue working on this project come next Monday, so I have decided to send the series to the mailing list "as-is", even though this potential eviction problem still needs to be chased down. The last two patches (regarding on-demand pinning of backing objects) also require a deep review and probably some more work, but I was running out of time.

One other caveat I have noticed is that many WAs in gen8_init_clock_gating (those that affect registers that now exist per-context) can get lost in the render default context. The reason is, in Execlists, a context is saved as soon as head = tail (with MI_SET_CONTEXT, however, the context wouldn't be saved until you tried to restore a different context). As we are sending the golden state batchbuffer to the render ring as soon as the rings are initialized, we are effectively saving the default context before gen8_init_clock_gating has an opportunity to set the WAs. I haven't noticed any ill-effect from this (yet) but it would be a good idea to move the WAs somewhere else (ring init looks like a good place). I believe there is already work in progress to create a new WA architecture, so this can be tackled there.
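
To make the ordering problem concrete, a sketch of the two flows (not code from this series; where exactly the WAs should move is still an open question):

        /* Current flow (WAs lost from the default render context):
         *   ring init -> golden state BB submitted -> head == tail
         *             -> HW saves the default context image
         *   gen8_init_clock_gating() then writes the per-context
         *   registers, but the saved image never sees them.
         *
         * Proposed flow: emit these WAs during ring init, before the
         * golden state batchbuffer runs, so the first context save
         * already captures them.
         */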

The previous IGT test [4] still applies.

[1] http://lists.freedesktop.org/archives/intel-gfx/2014-March/042563.html
[2] http://lists.freedesktop.org/archives/intel-gfx/2014-May/044847.html
[3] http://lists.freedesktop.org/archives/intel-gfx/2014-June/047138.html
[4] http://lists.freedesktop.org/archives/intel-gfx/2014-May/044846.html

Ben Widawsky (2):
  drm/i915/bdw: Implement context switching (somewhat)
  drm/i915/bdw: Print context state in debugfs

Michel Thierry (1):
  drm/i915/bdw: Two-stage execlist submit process

Oscar Mateo (38):
  drm/i915/bdw: New source and header file for LRs, LRCs and Execlists
  drm/i915/bdw: Macro for LRCs and module option for Execlists
  drm/i915/bdw: Initialization for Logical Ring Contexts
  drm/i915/bdw: Introduce one context backing object per engine
  drm/i915/bdw: A bit more advanced LR context alloc/free
  drm/i915/bdw: Allocate ringbuffers for Logical Ring Contexts
  drm/i915/bdw: Add a context and an engine pointers to the ringbuffer
  drm/i915/bdw: Populate LR contexts (somewhat)
  drm/i915/bdw: Deferred creation of user-created LRCs
  drm/i915/bdw: Render moot context reset and switch with Execlists
  drm/i915/bdw: Don't write PDP in the legacy way when using LRCs
  drm/i915: Abstract the legacy workload submission mechanism away
  drm/i915/bdw: Skeleton for the new logical rings submission path
  drm/i915/bdw: Generic logical ring init and cleanup
  drm/i915/bdw: GEN-specific logical ring init
  drm/i915/bdw: GEN-specific logical ring set/get seqno
  drm/i915/bdw: New logical ring submission mechanism
  drm/i915/bdw: GEN-specific logical ring emit request
  drm/i915/bdw: GEN-specific logical ring emit flush
  drm/i915/bdw: Emission of requests with logical rings
  drm/i915/bdw: Ring idle and stop with logical rings
  drm/i915/bdw: Interrupts with logical rings
  drm/i915/bdw: GEN-specific logical ring emit batchbuffer start
  drm/i915/bdw: Workload submission mechanism for Execlists
  drm/i915/bdw: Always use MMIO flips with Execlists
  drm/i915/bdw: Render state init for Execlists
  drm/i915/bdw: Write the tail pointer, LRC style
  drm/i915/bdw: Avoid non-lite-restore preemptions
  drm/i915/bdw: Help out the ctx switch interrupt handler
  drm/i915/bdw: Make sure gpu reset still works with Execlists
  drm/i915/bdw: Make sure error capture keeps working with Execlists
  drm/i915/bdw: Disable semaphores for Execlists
  drm/i915/bdw: Display execlists info in debugfs
  drm/i915/bdw: Display context backing obj & ringbuffer info in debugfs
  drm/i915/bdw: Document Logical Rings, LR contexts and Execlists
  drm/i915/bdw: Enable Logical Ring Contexts (hence, Execlists)
  drm/i915/bdw: Pin the context backing objects to GGTT on-demand
  drm/i915/bdw: Pin the ringbuffer backing object to GGTT on-demand

Thomas Daniel (1):
  drm/i915/bdw: Handle context switch events

 Documentation/DocBook/drm.tmpl               |    5 +
 drivers/gpu/drm/i915/Makefile                |    1 +
 drivers/gpu/drm/i915/i915_debugfs.c          |  150 ++-
 drivers/gpu/drm/i915/i915_drv.c              |    4 +
 drivers/gpu/drm/i915/i915_drv.h              |   46 +-
 drivers/gpu/drm/i915/i915_gem.c              |  106 +-
 drivers/gpu/drm/i915/i915_gem_context.c      |   56 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c   |   32 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c          |    5 +
 drivers/gpu/drm/i915/i915_gem_render_state.c |   40 +-
 drivers/gpu/drm/i915/i915_gem_render_state.h |   47 +
 drivers/gpu/drm/i915/i915_gpu_error.c        |   22 +-
 drivers/gpu/drm/i915/i915_irq.c              |   44 +-
 drivers/gpu/drm/i915/i915_params.c           |    6 +
 drivers/gpu/drm/i915/i915_reg.h              |    5 +
 drivers/gpu/drm/i915/intel_display.c         |    2 +
 drivers/gpu/drm/i915/intel_lrc.c             | 1829 ++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.h             |  113 ++
 drivers/gpu/drm/i915/intel_renderstate.h     |    8 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c      |  166 ++-
 drivers/gpu/drm/i915/intel_ringbuffer.h      |   42 +-
 21 files changed, 2574 insertions(+), 155 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_gem_render_state.h
 create mode 100644 drivers/gpu/drm/i915/intel_lrc.c
 create mode 100644 drivers/gpu/drm/i915/intel_lrc.h

-- 
1.9.0


* [PATCH 01/42] drm/i915/bdw: New source and header file for LRs, LRCs and Execlists
From: oscar.mateo @ 2014-07-11 15:48 UTC
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Some legacy HW context code assumptions don't make sense for this new
submission method, so we will place this stuff in a separate file.

Note for reviewers: I've carefully considered the best name for this file
and this was my best option (other possibilities were intel_lr_context.c
or intel_execlist.c). I am open to a certain bikeshedding on this matter,
anyway.

At some point in time, it would be a good idea to split intel_lrc.c/.h
even further, but for the moment just shove everything together.

v2: Change to intel_lrc.c

v3: Squash together with the header file addition

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/Makefile    |  1 +
 drivers/gpu/drm/i915/i915_drv.h  |  1 +
 drivers/gpu/drm/i915/intel_lrc.c | 42 ++++++++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.h | 27 ++++++++++++++++++++++++++
 4 files changed, 71 insertions(+)
 create mode 100644 drivers/gpu/drm/i915/intel_lrc.c
 create mode 100644 drivers/gpu/drm/i915/intel_lrc.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index cad1683..9fee2a0 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -31,6 +31,7 @@ i915-y += i915_cmd_parser.o \
 	  i915_gpu_error.o \
 	  i915_irq.o \
 	  i915_trace_points.o \
+	  intel_lrc.o \
 	  intel_ringbuffer.o \
 	  intel_uncore.o
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 47c8ec1..0a7752f 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -35,6 +35,7 @@
 #include "i915_reg.h"
 #include "intel_bios.h"
 #include "intel_ringbuffer.h"
+#include "intel_lrc.h"
 #include "i915_gem_gtt.h"
 #include <linux/io-mapping.h>
 #include <linux/i2c.h>
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
new file mode 100644
index 0000000..49bb6fc
--- /dev/null
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -0,0 +1,42 @@
+/*
+ * Copyright © 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Ben Widawsky <ben@bwidawsk.net>
+ *    Michel Thierry <michel.thierry@intel.com>
+ *    Thomas Daniel <thomas.daniel@intel.com>
+ *    Oscar Mateo <oscar.mateo@intel.com>
+ *
+ */
+
+/*
+ * GEN8 brings an expansion of the HW contexts: "Logical Ring Contexts".
+ * These expanded contexts enable a number of new abilities, especially
+ * "Execlists" (also implemented in this file).
+ *
+ * Execlists are the new method by which, on gen8+ hardware, workloads are
+ * submitted for execution (as opposed to the legacy, ringbuffer-based, method).
+ */
+
+#include <drm/drmP.h>
+#include <drm/i915_drm.h>
+#include "i915_drv.h"
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
new file mode 100644
index 0000000..f6830a4
--- /dev/null
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -0,0 +1,27 @@
+/*
+ * Copyright © 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef _INTEL_LRC_H_
+#define _INTEL_LRC_H_
+
+#endif /* _INTEL_LRC_H_ */
-- 
1.9.0



* [PATCH 02/42] drm/i915/bdw: Macro for LRCs and module option for Execlists
From: oscar.mateo @ 2014-07-11 15:48 UTC
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

GEN8 brings an expansion of the HW contexts: "Logical Ring Contexts".
These expanded contexts enable a number of new abilities, especially
"Execlists".

The macro is defined to off until we have everything in place for this
to work.

v2: Rename "advanced contexts" to the more correct "logical ring
contexts".

v3: Add a module parameter to enable execlists. Execlists are relatively
new, and so it'd be wise to be able to switch back to ring submission
to debug subtle problems that will inevitably arise.

v4: Add an intel_enable_execlists function.

v5: Sanitize early, as suggested by Daniel. Remove lrc_enabled.
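
For testing, the override can be exercised at module load time (e.g.
modprobe i915 enable_execlists=1, or =0 to force legacy ring submission).
Note the parameter is registered with 0400 permissions, so it cannot be
flipped at runtime.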

Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com> (v3)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2, v4 & v5)
---
 drivers/gpu/drm/i915/i915_drv.h    |  2 ++
 drivers/gpu/drm/i915/i915_gem.c    |  3 +++
 drivers/gpu/drm/i915/i915_params.c |  6 ++++++
 drivers/gpu/drm/i915/intel_lrc.c   | 11 +++++++++++
 drivers/gpu/drm/i915/intel_lrc.h   |  3 +++
 5 files changed, 25 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 0a7752f..bb7ddd1 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2030,6 +2030,7 @@ struct drm_i915_cmd_table {
 #define I915_NEED_GFX_HWS(dev)	(INTEL_INFO(dev)->need_gfx_hws)
 
 #define HAS_HW_CONTEXTS(dev)	(INTEL_INFO(dev)->gen >= 6)
+#define HAS_LOGICAL_RING_CONTEXTS(dev)	0
 #define HAS_ALIASING_PPGTT(dev)	(INTEL_INFO(dev)->gen >= 6)
 #define HAS_PPGTT(dev)		(INTEL_INFO(dev)->gen >= 7 && !IS_GEN8(dev))
 #define USES_PPGTT(dev)		intel_enable_ppgtt(dev, false)
@@ -2115,6 +2116,7 @@ struct i915_params {
 	int enable_rc6;
 	int enable_fbc;
 	int enable_ppgtt;
+	int enable_execlists;
 	int enable_psr;
 	unsigned int preliminary_hw_support;
 	int disable_power_well;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index e5d4d73..d8bf4fa 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4746,6 +4746,9 @@ int i915_gem_init(struct drm_device *dev)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	int ret;
 
+	i915.enable_execlists = intel_sanitize_enable_execlists(dev,
+			i915.enable_execlists);
+
 	mutex_lock(&dev->struct_mutex);
 
 	if (IS_VALLEYVIEW(dev)) {
diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
index 8145729..46d7a06 100644
--- a/drivers/gpu/drm/i915/i915_params.c
+++ b/drivers/gpu/drm/i915/i915_params.c
@@ -37,6 +37,7 @@ struct i915_params i915 __read_mostly = {
 	.enable_fbc = -1,
 	.enable_hangcheck = true,
 	.enable_ppgtt = -1,
+	.enable_execlists = -1,
 	.enable_psr = 0,
 	.preliminary_hw_support = IS_ENABLED(CONFIG_DRM_I915_PRELIMINARY_HW_SUPPORT),
 	.disable_power_well = 1,
@@ -117,6 +118,11 @@ MODULE_PARM_DESC(enable_ppgtt,
 	"Override PPGTT usage. "
 	"(-1=auto [default], 0=disabled, 1=aliasing, 2=full)");
 
+module_param_named(enable_execlists, i915.enable_execlists, int, 0400);
+MODULE_PARM_DESC(enable_execlists,
+	"Override execlists usage. "
+	"(-1=auto [default], 0=disabled, 1=enabled)");
+
 module_param_named(enable_psr, i915.enable_psr, int, 0600);
 MODULE_PARM_DESC(enable_psr, "Enable PSR (default: false)");
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 49bb6fc..21f7f1c 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -40,3 +40,14 @@
 #include <drm/drmP.h>
 #include <drm/i915_drm.h>
 #include "i915_drv.h"
+
+int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists)
+{
+	if (enable_execlists == 0)
+		return 0;
+
+	if (HAS_LOGICAL_RING_CONTEXTS(dev) && USES_PPGTT(dev))
+		return 1;
+
+	return 0;
+}
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index f6830a4..75ee9c3 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -24,4 +24,7 @@
 #ifndef _INTEL_LRC_H_
 #define _INTEL_LRC_H_
 
+/* Execlists */
+int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists);
+
 #endif /* _INTEL_LRC_H_ */
-- 
1.9.0


* [PATCH 03/42] drm/i915/bdw: Initialization for Logical Ring Contexts
From: oscar.mateo @ 2014-07-11 15:48 UTC
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

For the moment this is just a placeholder, but it shows one of the
main differences between the good ol' HW contexts and the shiny
new Logical Ring Contexts: LR contexts allocate and free their
own backing objects. Another difference is that the allocation is
deferred (as the create function name suggests), but that does not
happen in this patch yet, because for the moment we are only dealing
with the default context.

Early in the series we had our own gen8_gem_context_init/fini
functions, but the truth is they now look almost the same as the
legacy hw context init/fini functions. We can always split them
later if this ceases to be the case.

Also, we do not fall back to legacy ringbuffers when logical ring
context initialization fails (not very likely to happen and, even
if it does, hw contexts would probably fail as well).

v2: Daniel says "explain, do not showcase".

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 29 +++++++++++++++++++++++------
 drivers/gpu/drm/i915/intel_lrc.c        | 15 +++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.h        |  5 +++++
 3 files changed, 43 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index de72a28..718150e 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -182,7 +182,10 @@ void i915_gem_context_free(struct kref *ctx_ref)
 						   typeof(*ctx), ref);
 	struct i915_hw_ppgtt *ppgtt = NULL;
 
-	if (ctx->legacy_hw_ctx.rcs_state) {
+	if (i915.enable_execlists) {
+		ppgtt = ctx_to_ppgtt(ctx);
+		intel_lr_context_free(ctx);
+	} else if (ctx->legacy_hw_ctx.rcs_state) {
 		/* We refcount even the aliasing PPGTT to keep the code symmetric */
 		if (USES_PPGTT(ctx->legacy_hw_ctx.rcs_state->base.dev))
 			ppgtt = ctx_to_ppgtt(ctx);
@@ -419,7 +422,11 @@ int i915_gem_context_init(struct drm_device *dev)
 	if (WARN_ON(dev_priv->ring[RCS].default_context))
 		return 0;
 
-	if (HAS_HW_CONTEXTS(dev)) {
+	if (i915.enable_execlists) {
+		/* NB: intentionally left blank. We will allocate our own
+		 * backing objects as we need them, thank you very much */
+		dev_priv->hw_context_size = 0;
+	} else if (HAS_HW_CONTEXTS(dev)) {
 		dev_priv->hw_context_size = round_up(get_context_size(dev), 4096);
 		if (dev_priv->hw_context_size > (1<<20)) {
 			DRM_DEBUG_DRIVER("Disabling HW Contexts; invalid size %d\n",
@@ -435,11 +442,20 @@ int i915_gem_context_init(struct drm_device *dev)
 		return PTR_ERR(ctx);
 	}
 
-	/* NB: RCS will hold a ref for all rings */
-	for (i = 0; i < I915_NUM_RINGS; i++)
-		dev_priv->ring[i].default_context = ctx;
+	for (i = 0; i < I915_NUM_RINGS; i++) {
+		struct intel_engine_cs *ring = &dev_priv->ring[i];
+
+		/* NB: RCS will hold a ref for all rings */
+		ring->default_context = ctx;
+
+		/* FIXME: we really only want to do this for initialized rings */
+		if (i915.enable_execlists)
+			intel_lr_context_deferred_create(ctx, ring);
+	}
 
-	DRM_DEBUG_DRIVER("%s context support initialized\n", dev_priv->hw_context_size ? "HW" : "fake");
+	DRM_DEBUG_DRIVER("%s context support initialized\n",
+			i915.enable_execlists ? "LR" :
+			dev_priv->hw_context_size ? "HW" : "fake");
 	return 0;
 }
 
@@ -781,6 +797,7 @@ int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
 	struct intel_context *ctx;
 	int ret;
 
+	/* FIXME: allow user-created LR contexts as well */
 	if (!hw_context_enabled(dev))
 		return -ENODEV;
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 21f7f1c..8cc6b55 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -51,3 +51,18 @@ int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists
 
 	return 0;
 }
+
+void intel_lr_context_free(struct intel_context *ctx)
+{
+	/* TODO */
+}
+
+int intel_lr_context_deferred_create(struct intel_context *ctx,
+				     struct intel_engine_cs *ring)
+{
+	BUG_ON(ctx->legacy_hw_ctx.rcs_state != NULL);
+
+	/* TODO */
+
+	return 0;
+}
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 75ee9c3..3b93572 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -24,6 +24,11 @@
 #ifndef _INTEL_LRC_H_
 #define _INTEL_LRC_H_
 
+/* Logical Ring Contexts */
+void intel_lr_context_free(struct intel_context *ctx);
+int intel_lr_context_deferred_create(struct intel_context *ctx,
+				     struct intel_engine_cs *ring);
+
 /* Execlists */
 int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists);
 
-- 
1.9.0


* [PATCH 04/42] drm/i915/bdw: Introduce one context backing object per engine
From: oscar.mateo @ 2014-07-11 15:48 UTC
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

A context backing object only makes sense for a given engine (because
it holds state data specific to that engine).

In legacy ringbuffer submission mode, the only MI_SET_CONTEXT we really
perform is for the render engine, so one backing object is all we need.

With Execlists, however, we need backing objects for every engine, as
contexts become the only way to submit workloads to the GPU. To tackle
this problem, we multiplex the context struct to contain <no-of-engines>
objects.

Originally, I colored this code by instantiating one new context for
every engine I wanted to use, but this change suggested by Brad Volkin
makes it more elegant.

v2: Leave the old backing object pointer behind. Daniel Vetter suggested
using a union, but it makes more sense to keep rcs_state as a NULL
pointer behind, to make sure no one uses it incorrectly when Execlists
are enabled, similar to what he suggested for ring->buffer (Rusty's API
level 5).

v3: Use the name "state" instead of the too-generic "obj", so that it
mirrors the name choice for the legacy rcs_state.
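
With this layout, per-engine state is reached through the id of the
engine doing the submission; a minimal sketch of the accessor pattern
(the same one later patches in this series use):

        struct drm_i915_gem_object *ctx_obj = ctx->engine[ring->id].state;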

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index bb7ddd1..0ebf41b 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -612,11 +612,17 @@ struct intel_context {
 	struct i915_ctx_hang_stats hang_stats;
 	struct i915_address_space *vm;
 
+	/* Legacy ring buffer submission */
 	struct {
 		struct drm_i915_gem_object *rcs_state;
 		bool initialized;
 	} legacy_hw_ctx;
 
+	/* Execlists */
+	struct {
+		struct drm_i915_gem_object *state;
+	} engine[I915_NUM_RINGS];
+
 	struct list_head link;
 };
 
-- 
1.9.0


* [PATCH 05/42] drm/i915/bdw: A bit more advanced LR context alloc/free
From: oscar.mateo @ 2014-07-11 15:48 UTC
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Now that we have the ability to allocate our own context backing objects
and we have multiplexed one of them per engine inside the context structs,
we can finally allocate and free them correctly.

Regarding the context size, reading the register to calculate the sizes
can work, I think, however the docs are very clear about the actual
context sizes on GEN8, so just hardcode that and use it.
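
For the record, the hardcoded sizes below work out, with 4 KiB pages, to:

        GEN8_LR_CONTEXT_RENDER_SIZE  (20 * PAGE_SIZE)   /* 80 KiB */
        GEN8_LR_CONTEXT_OTHER_SIZE   (2 * PAGE_SIZE)    /*  8 KiB */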

v2: Rebased on top of the Full PPGTT series. It is important to notice
that at this point we have one global default context per engine, all
of them using the aliasing PPGTT (as opposed to the single global
default context we have with legacy HW contexts).

v3:
- Go back to one single global default context, this time with multiple
  backing objects inside.
- Use different context sizes for non-render engines, as suggested by
  Damien (still hardcoded, since the information about the context size
  registers in the BSpec is, well, *lacking*).
- Render ctx size is 20 (or 19) pages, but not 21 (caught by Damien).
- Move default context backing object creation to intel_init_ring (so
  that we don't waste memory in rings that might not get initialized).

v4:
- Reuse the HW legacy context init/fini.
- Create a separate free function.
- Rename the functions with an intel_ prefix.

v5: Several rebases to account for the changes in the previous patches.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h         |  2 ++
 drivers/gpu/drm/i915/i915_gem_context.c |  2 +-
 drivers/gpu/drm/i915/intel_lrc.c        | 59 +++++++++++++++++++++++++++++++--
 3 files changed, 60 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 0ebf41b..0638356 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2493,6 +2493,8 @@ int i915_switch_context(struct intel_engine_cs *ring,
 struct intel_context *
 i915_gem_context_get(struct drm_i915_file_private *file_priv, u32 id);
 void i915_gem_context_free(struct kref *ctx_ref);
+struct drm_i915_gem_object *
+i915_gem_alloc_context_obj(struct drm_device *dev, size_t size);
 static inline void i915_gem_context_reference(struct intel_context *ctx)
 {
 	kref_get(&ctx->ref);
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 718150e..48d7476 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -201,7 +201,7 @@ void i915_gem_context_free(struct kref *ctx_ref)
 	kfree(ctx);
 }
 
-static struct drm_i915_gem_object *
+struct drm_i915_gem_object *
 i915_gem_alloc_context_obj(struct drm_device *dev, size_t size)
 {
 	struct drm_i915_gem_object *obj;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 8cc6b55..a3fc6fc 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -41,6 +41,11 @@
 #include <drm/i915_drm.h>
 #include "i915_drv.h"
 
+#define GEN8_LR_CONTEXT_RENDER_SIZE (20 * PAGE_SIZE)
+#define GEN8_LR_CONTEXT_OTHER_SIZE (2 * PAGE_SIZE)
+
+#define GEN8_LR_CONTEXT_ALIGN 4096
+
 int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists)
 {
 	if (enable_execlists == 0)
@@ -54,15 +59,65 @@ int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists
 
 void intel_lr_context_free(struct intel_context *ctx)
 {
-	/* TODO */
+	int i;
+
+	for (i = 0; i < I915_NUM_RINGS; i++) {
+		struct drm_i915_gem_object *ctx_obj = ctx->engine[i].state;
+		if (ctx_obj) {
+			i915_gem_object_ggtt_unpin(ctx_obj);
+			drm_gem_object_unreference(&ctx_obj->base);
+		}
+	}
+}
+
+static uint32_t get_lr_context_size(struct intel_engine_cs *ring)
+{
+	int ret = 0;
+
+	WARN_ON(INTEL_INFO(ring->dev)->gen != 8);
+
+	switch (ring->id) {
+	case RCS:
+		ret = GEN8_LR_CONTEXT_RENDER_SIZE;
+		break;
+	case VCS:
+	case BCS:
+	case VECS:
+	case VCS2:
+		ret = GEN8_LR_CONTEXT_OTHER_SIZE;
+		break;
+	}
+
+	return ret;
 }
 
 int intel_lr_context_deferred_create(struct intel_context *ctx,
 				     struct intel_engine_cs *ring)
 {
+	struct drm_device *dev = ring->dev;
+	struct drm_i915_gem_object *ctx_obj;
+	uint32_t context_size;
+	int ret;
+
 	BUG_ON(ctx->legacy_hw_ctx.rcs_state != NULL);
 
-	/* TODO */
+	context_size = round_up(get_lr_context_size(ring), 4096);
+
+	ctx_obj = i915_gem_alloc_context_obj(dev, context_size);
+	if (IS_ERR(ctx_obj)) {
+		ret = PTR_ERR(ctx_obj);
+		DRM_DEBUG_DRIVER("Alloc LRC backing obj failed: %d\n", ret);
+		return ret;
+	}
+
+	ret = i915_gem_obj_ggtt_pin(ctx_obj, GEN8_LR_CONTEXT_ALIGN, 0);
+	if (ret) {
+		DRM_DEBUG_DRIVER("Pin LRC backing obj failed: %d\n", ret);
+		drm_gem_object_unreference(&ctx_obj->base);
+		return ret;
+	}
+
+	ctx->engine[ring->id].state = ctx_obj;
 
 	return 0;
 }
-- 
1.9.0


* [PATCH 06/42] drm/i915/bdw: Allocate ringbuffers for Logical Ring Contexts
From: oscar.mateo @ 2014-07-11 15:48 UTC
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

As we have said a couple of times by now, logical ring contexts have
their own ringbuffers: not only the backing pages, but the whole
management struct.

In a previous version of the series, this was achieved with two separate
patches:
drm/i915/bdw: Allocate ringbuffer backing objects for default global LRC
drm/i915/bdw: Allocate ringbuffer for user-created LRCs

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h         |  1 +
 drivers/gpu/drm/i915/intel_lrc.c        | 38 +++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.c |  6 +++---
 drivers/gpu/drm/i915/intel_ringbuffer.h |  4 ++++
 4 files changed, 46 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 0638356..ca1a0cc 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -621,6 +621,7 @@ struct intel_context {
 	/* Execlists */
 	struct {
 		struct drm_i915_gem_object *state;
+		struct intel_ringbuffer *ringbuf;
 	} engine[I915_NUM_RINGS];
 
 	struct list_head link;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index a3fc6fc..0a12b8c 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -63,7 +63,11 @@ void intel_lr_context_free(struct intel_context *ctx)
 
 	for (i = 0; i < I915_NUM_RINGS; i++) {
 		struct drm_i915_gem_object *ctx_obj = ctx->engine[i].state;
+		struct intel_ringbuffer *ringbuf = ctx->engine[i].ringbuf;
+
 		if (ctx_obj) {
+			intel_destroy_ringbuffer_obj(ringbuf);
+			kfree(ringbuf);
 			i915_gem_object_ggtt_unpin(ctx_obj);
 			drm_gem_object_unreference(&ctx_obj->base);
 		}
@@ -97,6 +101,7 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 	struct drm_device *dev = ring->dev;
 	struct drm_i915_gem_object *ctx_obj;
 	uint32_t context_size;
+	struct intel_ringbuffer *ringbuf;
 	int ret;
 
 	BUG_ON(ctx->legacy_hw_ctx.rcs_state != NULL);
@@ -117,6 +122,39 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 		return ret;
 	}
 
+	ringbuf = kzalloc(sizeof(*ringbuf), GFP_KERNEL);
+	if (!ringbuf) {
+		DRM_DEBUG_DRIVER("Failed to allocate ringbuffer %s\n",
+				ring->name);
+		i915_gem_object_ggtt_unpin(ctx_obj);
+		drm_gem_object_unreference(&ctx_obj->base);
+		ret = -ENOMEM;
+		return ret;
+	}
+
+	ringbuf->size = 32 * PAGE_SIZE;
+	ringbuf->effective_size = ringbuf->size;
+	ringbuf->head = 0;
+	ringbuf->tail = 0;
+	ringbuf->space = ringbuf->size;
+	ringbuf->last_retired_head = -1;
+
+	/* TODO: For now we put this in the mappable region so that we can reuse
+	 * the existing ringbuffer code which ioremaps it. When we start
+	 * creating many contexts, this will no longer work and we must switch
+	 * to a kmapish interface.
+	 */
+	ret = intel_alloc_ringbuffer_obj(dev, ringbuf);
+	if (ret) {
+		DRM_DEBUG_DRIVER("Failed to allocate ringbuffer obj %s: %d\n",
+				ring->name, ret);
+		kfree(ringbuf);
+		i915_gem_object_ggtt_unpin(ctx_obj);
+		drm_gem_object_unreference(&ctx_obj->base);
+		return ret;
+	}
+
+	ctx->engine[ring->id].ringbuf = ringbuf;
 	ctx->engine[ring->id].state = ctx_obj;
 
 	return 0;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 599709e..01e9840 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1495,7 +1495,7 @@ static int init_phys_status_page(struct intel_engine_cs *ring)
 	return 0;
 }
 
-static void intel_destroy_ringbuffer_obj(struct intel_ringbuffer *ringbuf)
+void intel_destroy_ringbuffer_obj(struct intel_ringbuffer *ringbuf)
 {
 	if (!ringbuf->obj)
 		return;
@@ -1506,8 +1506,8 @@ static void intel_destroy_ringbuffer_obj(struct intel_ringbuffer *ringbuf)
 	ringbuf->obj = NULL;
 }
 
-static int intel_alloc_ringbuffer_obj(struct drm_device *dev,
-				      struct intel_ringbuffer *ringbuf)
+int intel_alloc_ringbuffer_obj(struct drm_device *dev,
+			       struct intel_ringbuffer *ringbuf)
 {
 	struct drm_i915_private *dev_priv = to_i915(dev);
 	struct drm_i915_gem_object *obj;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index ed59410..053d004 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -353,6 +353,10 @@ intel_write_status_page(struct intel_engine_cs *ring,
 #define I915_GEM_HWS_SCRATCH_INDEX	0x30
 #define I915_GEM_HWS_SCRATCH_ADDR (I915_GEM_HWS_SCRATCH_INDEX << MI_STORE_DWORD_INDEX_SHIFT)
 
+void intel_destroy_ringbuffer_obj(struct intel_ringbuffer *ringbuf);
+int intel_alloc_ringbuffer_obj(struct drm_device *dev,
+			       struct intel_ringbuffer *ringbuf);
+
 void intel_stop_ring_buffer(struct intel_engine_cs *ring);
 void intel_cleanup_ring_buffer(struct intel_engine_cs *ring);
 
-- 
1.9.0


* [PATCH 07/42] drm/i915/bdw: Add a context and an engine pointers to the ringbuffer
From: oscar.mateo @ 2014-07-11 15:48 UTC
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Any given ringbuffer is unequivocally tied to one context and one engine.
By setting the appropriate pointers to them, the ringbuffer struct holds
all the information you might need to submit a workload for processing,
Execlists style.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c        | 2 ++
 drivers/gpu/drm/i915/intel_ringbuffer.c | 2 ++
 drivers/gpu/drm/i915/intel_ringbuffer.h | 3 +++
 3 files changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 0a12b8c..2eb7db6 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -132,6 +132,8 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 		return ret;
 	}
 
+	ringbuf->ring = ring;
+	ringbuf->ctx = ctx;
 	ringbuf->size = 32 * PAGE_SIZE;
 	ringbuf->effective_size = ringbuf->size;
 	ringbuf->head = 0;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 01e9840..279dda4 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1570,6 +1570,8 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 	INIT_LIST_HEAD(&ring->active_list);
 	INIT_LIST_HEAD(&ring->request_list);
 	ringbuf->size = 32 * PAGE_SIZE;
+	ringbuf->ring = ring;
+	ringbuf->ctx = ring->default_context;
 	memset(ring->semaphore.sync_seqno, 0, sizeof(ring->semaphore.sync_seqno));
 
 	init_waitqueue_head(&ring->irq_queue);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 053d004..be40788 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -88,6 +88,9 @@ struct intel_ringbuffer {
 	struct drm_i915_gem_object *obj;
 	void __iomem *virtual_start;
 
+	struct intel_engine_cs *ring;
+	struct intel_context *ctx;
+
 	u32 head;
 	u32 tail;
 	int space;
-- 
1.9.0


* [PATCH 08/42] drm/i915/bdw: Populate LR contexts (somewhat)
From: oscar.mateo @ 2014-07-11 15:49 UTC
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

For the most part, logical ring context objects are similar to hardware
contexts in that the backing object is meant to be opaque. There are
some exceptions where we need to poke certain offsets of the object for
initialization, updating the tail pointer or updating the PDPs.

For our basic execlist implementation we'll only need our PPGTT PDs and
ringbuffer addresses in order to set up the context. With the previous
patches, we have both, so start prepping the context to be loaded.

Before running a context for the first time you must populate some
fields in the context object. These fields begin at 1 PAGE + LRCA, i.e. the
first page (in 0-based counting) of the context image. These same
fields will be read and written to as contexts are saved and restored
once the system is up and running.

Many of these fields are completely reused from previous global
registers: ringbuffer head/tail/control, context control matches some
previous MI_SET_CONTEXT flags, and page directories. There are other
fields which we don't touch which we may want in the future.
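
Conceptually, the register state page is a stream of MI_LOAD_REGISTER_IMM
packets: each LRI header is followed by (register offset, value) pairs,
which is why the CTX_* defines below index both a register slot and, at
+1, its value. One such pair, as written by this patch:

        reg_state[CTX_RING_TAIL] = RING_TAIL(ring->mmio_base); /* which register */
        reg_state[CTX_RING_TAIL+1] = 0;                        /* value to load */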

v2: CTX_LRI_HEADER_0 is MI_LOAD_REGISTER_IMM(14) for render and (11)
for other engines.

v3: Several rebases and general changes to the code.

v4: Squash with "Extract LR context object populating"
Also, Damien's review comments:
- Set the Force Posted bit on the LRI header, as the BSpec suggest we do.
- Prevent warning when compiling a 32-bits kernel without HIGHMEM64.
- Add a clarifying comment to the context population code.

v5: Damien's review comments:
- The third MI_LOAD_REGISTER_IMM in the context does not set Force Posted.
- Remove dead code.

v6: Add a note about the (presumed) differences between BDW and CHV state
contexts. Also, Brad's review comments:
- Use the _MASKED_BIT_ENABLE, upper_32_bits and lower_32_bits macros.
- Be less magical about how we set the ring size in the context.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com> (v2)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_reg.h  |   1 +
 drivers/gpu/drm/i915/intel_lrc.c | 159 ++++++++++++++++++++++++++++++++++++++-
 2 files changed, 156 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 45f447a..6c0811b 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -282,6 +282,7 @@
  *   address/value pairs. Don't overdue it, though, x <= 2^4 must hold!
  */
 #define MI_LOAD_REGISTER_IMM(x)	MI_INSTR(0x22, 2*(x)-1)
+#define   MI_LRI_FORCE_POSTED		(1<<12)
 #define MI_STORE_REGISTER_MEM(x) MI_INSTR(0x24, 2*(x)-1)
 #define MI_STORE_REGISTER_MEM_GEN8(x) MI_INSTR(0x24, 3*(x)-1)
 #define   MI_SRM_LRM_GLOBAL_GTT		(1<<22)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 2eb7db6..cf322ec 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -46,6 +46,38 @@
 
 #define GEN8_LR_CONTEXT_ALIGN 4096
 
+#define RING_ELSP(ring)			((ring)->mmio_base+0x230)
+#define RING_CONTEXT_CONTROL(ring)	((ring)->mmio_base+0x244)
+
+#define CTX_LRI_HEADER_0		0x01
+#define CTX_CONTEXT_CONTROL		0x02
+#define CTX_RING_HEAD			0x04
+#define CTX_RING_TAIL			0x06
+#define CTX_RING_BUFFER_START		0x08
+#define CTX_RING_BUFFER_CONTROL		0x0a
+#define CTX_BB_HEAD_U			0x0c
+#define CTX_BB_HEAD_L			0x0e
+#define CTX_BB_STATE			0x10
+#define CTX_SECOND_BB_HEAD_U		0x12
+#define CTX_SECOND_BB_HEAD_L		0x14
+#define CTX_SECOND_BB_STATE		0x16
+#define CTX_BB_PER_CTX_PTR		0x18
+#define CTX_RCS_INDIRECT_CTX		0x1a
+#define CTX_RCS_INDIRECT_CTX_OFFSET	0x1c
+#define CTX_LRI_HEADER_1		0x21
+#define CTX_CTX_TIMESTAMP		0x22
+#define CTX_PDP3_UDW			0x24
+#define CTX_PDP3_LDW			0x26
+#define CTX_PDP2_UDW			0x28
+#define CTX_PDP2_LDW			0x2a
+#define CTX_PDP1_UDW			0x2c
+#define CTX_PDP1_LDW			0x2e
+#define CTX_PDP0_UDW			0x30
+#define CTX_PDP0_LDW			0x32
+#define CTX_LRI_HEADER_2		0x41
+#define CTX_R_PWR_CLK_STATE		0x42
+#define CTX_GPGPU_CSR_BASE_ADDRESS	0x44
+
 int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists)
 {
 	if (enable_execlists == 0)
@@ -57,6 +89,115 @@ int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists
 	return 0;
 }
 
+static int
+populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_obj,
+		    struct intel_engine_cs *ring, struct intel_ringbuffer *ringbuf)
+{
+	struct drm_i915_gem_object *ring_obj = ringbuf->obj;
+	struct i915_hw_ppgtt *ppgtt = ctx_to_ppgtt(ctx);
+	struct page *page;
+	uint32_t *reg_state;
+	int ret;
+
+	ret = i915_gem_object_set_to_cpu_domain(ctx_obj, true);
+	if (ret) {
+		DRM_DEBUG_DRIVER("Could not set to CPU domain\n");
+		return ret;
+	}
+
+	ret = i915_gem_object_get_pages(ctx_obj);
+	if (ret) {
+		DRM_DEBUG_DRIVER("Could not get object pages\n");
+		return ret;
+	}
+
+	i915_gem_object_pin_pages(ctx_obj);
+
+	/* The second page of the context object contains some fields which must
+	 * be set up prior to the first execution. */
+	page = i915_gem_object_get_page(ctx_obj, 1);
+	reg_state = kmap_atomic(page);
+
+	/* A context is actually a big batch buffer with several MI_LOAD_REGISTER_IMM
+	 * commands followed by (reg, value) pairs. The values we are setting here are
+	 * only for the first context restore: on a subsequent save, the GPU will
+	 * recreate this batchbuffer with new values (including all the missing
+	 * MI_LOAD_REGISTER_IMM commands that we are not initializing here). */
+	if (ring->id == RCS)
+		reg_state[CTX_LRI_HEADER_0] = MI_LOAD_REGISTER_IMM(14);
+	else
+		reg_state[CTX_LRI_HEADER_0] = MI_LOAD_REGISTER_IMM(11);
+	reg_state[CTX_LRI_HEADER_0] |= MI_LRI_FORCE_POSTED;
+	reg_state[CTX_CONTEXT_CONTROL] = RING_CONTEXT_CONTROL(ring);
+	reg_state[CTX_CONTEXT_CONTROL+1] =
+			_MASKED_BIT_ENABLE((1<<3) | MI_RESTORE_INHIBIT);
+	reg_state[CTX_RING_HEAD] = RING_HEAD(ring->mmio_base);
+	reg_state[CTX_RING_HEAD+1] = 0;
+	reg_state[CTX_RING_TAIL] = RING_TAIL(ring->mmio_base);
+	reg_state[CTX_RING_TAIL+1] = 0;
+	reg_state[CTX_RING_BUFFER_START] = RING_START(ring->mmio_base);
+	reg_state[CTX_RING_BUFFER_START+1] = i915_gem_obj_ggtt_offset(ring_obj);
+	reg_state[CTX_RING_BUFFER_CONTROL] = RING_CTL(ring->mmio_base);
+	reg_state[CTX_RING_BUFFER_CONTROL+1] =
+			((ringbuf->size - PAGE_SIZE) & RING_NR_PAGES) | RING_VALID;
+	reg_state[CTX_BB_HEAD_U] = ring->mmio_base + 0x168;
+	reg_state[CTX_BB_HEAD_U+1] = 0;
+	reg_state[CTX_BB_HEAD_L] = ring->mmio_base + 0x140;
+	reg_state[CTX_BB_HEAD_L+1] = 0;
+	reg_state[CTX_BB_STATE] = ring->mmio_base + 0x110;
+	reg_state[CTX_BB_STATE+1] = (1<<5);
+	reg_state[CTX_SECOND_BB_HEAD_U] = ring->mmio_base + 0x11c;
+	reg_state[CTX_SECOND_BB_HEAD_U+1] = 0;
+	reg_state[CTX_SECOND_BB_HEAD_L] = ring->mmio_base + 0x114;
+	reg_state[CTX_SECOND_BB_HEAD_L+1] = 0;
+	reg_state[CTX_SECOND_BB_STATE] = ring->mmio_base + 0x118;
+	reg_state[CTX_SECOND_BB_STATE+1] = 0;
+	if (ring->id == RCS) {
+		/* TODO: according to BSpec, the register state context
+		 * for CHV does not have these. OTOH, these registers do
+		 * exist in CHV. I'm waiting for a clarification */
+		reg_state[CTX_BB_PER_CTX_PTR] = ring->mmio_base + 0x1c0;
+		reg_state[CTX_BB_PER_CTX_PTR+1] = 0;
+		reg_state[CTX_RCS_INDIRECT_CTX] = ring->mmio_base + 0x1c4;
+		reg_state[CTX_RCS_INDIRECT_CTX+1] = 0;
+		reg_state[CTX_RCS_INDIRECT_CTX_OFFSET] = ring->mmio_base + 0x1c8;
+		reg_state[CTX_RCS_INDIRECT_CTX_OFFSET+1] = 0;
+	}
+	reg_state[CTX_LRI_HEADER_1] = MI_LOAD_REGISTER_IMM(9);
+	reg_state[CTX_LRI_HEADER_1] |= MI_LRI_FORCE_POSTED;
+	reg_state[CTX_CTX_TIMESTAMP] = ring->mmio_base + 0x3a8;
+	reg_state[CTX_CTX_TIMESTAMP+1] = 0;
+	reg_state[CTX_PDP3_UDW] = GEN8_RING_PDP_UDW(ring, 3);
+	reg_state[CTX_PDP3_LDW] = GEN8_RING_PDP_LDW(ring, 3);
+	reg_state[CTX_PDP2_UDW] = GEN8_RING_PDP_UDW(ring, 2);
+	reg_state[CTX_PDP2_LDW] = GEN8_RING_PDP_LDW(ring, 2);
+	reg_state[CTX_PDP1_UDW] = GEN8_RING_PDP_UDW(ring, 1);
+	reg_state[CTX_PDP1_LDW] = GEN8_RING_PDP_LDW(ring, 1);
+	reg_state[CTX_PDP0_UDW] = GEN8_RING_PDP_UDW(ring, 0);
+	reg_state[CTX_PDP0_LDW] = GEN8_RING_PDP_LDW(ring, 0);
+	reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pd_dma_addr[3]);
+	reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pd_dma_addr[3]);
+	reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pd_dma_addr[2]);
+	reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pd_dma_addr[2]);
+	reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pd_dma_addr[1]);
+	reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pd_dma_addr[1]);
+	reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pd_dma_addr[0]);
+	reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pd_dma_addr[0]);
+	if (ring->id == RCS) {
+		reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
+		reg_state[CTX_R_PWR_CLK_STATE] = 0x20c8;
+		reg_state[CTX_R_PWR_CLK_STATE+1] = 0;
+	}
+
+	kunmap_atomic(reg_state);
+
+	ctx_obj->dirty = 1;
+	set_page_dirty(page);
+	i915_gem_object_unpin_pages(ctx_obj);
+
+	return 0;
+}
+
 void intel_lr_context_free(struct intel_context *ctx)
 {
 	int i;
@@ -150,14 +291,24 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 	if (ret) {
 		DRM_DEBUG_DRIVER("Failed to allocate ringbuffer obj %s: %d\n",
 				ring->name, ret);
-		kfree(ringbuf);
-		i915_gem_object_ggtt_unpin(ctx_obj);
-		drm_gem_object_unreference(&ctx_obj->base);
-		return ret;
+		goto error;
+	}
+
+	ret = populate_lr_context(ctx, ctx_obj, ring, ringbuf);
+	if (ret) {
+		DRM_DEBUG_DRIVER("Failed to populate LRC: %d\n", ret);
+		intel_destroy_ringbuffer_obj(ringbuf);
+		goto error;
 	}
 
 	ctx->engine[ring->id].ringbuf = ringbuf;
 	ctx->engine[ring->id].state = ctx_obj;
 
 	return 0;
+
+error:
+	kfree(ringbuf);
+	i915_gem_object_ggtt_unpin(ctx_obj);
+	drm_gem_object_unreference(&ctx_obj->base);
+	return ret;
 }
-- 
1.9.0


* [PATCH 09/42] drm/i915/bdw: Deferred creation of user-created LRCs
From: oscar.mateo @ 2014-07-11 15:49 UTC
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

The backing objects and ringbuffers for contexts created via open
fd are actually empty until the user starts sending execbuffers to
them. At that point, we allocate & populate them. We do this because,
at create time, we really don't know which engine is going to be used
with the context later on (and we don't want to waste memory on
objects that we might never use).

v2: As contexts created via ioctl can only be used with the render
ring, we have enough information to allocate & populate them right
away.

v3: Defer the creation always, even with ioctl-created contexts, as
requested by Daniel Vetter.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c    | 7 +++----
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 8 ++++++++
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 48d7476..fbe7278 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -784,9 +784,9 @@ int i915_switch_context(struct intel_engine_cs *ring,
 	return do_switch(ring, to);
 }
 
-static bool hw_context_enabled(struct drm_device *dev)
+static bool contexts_enabled(struct drm_device *dev)
 {
-	return to_i915(dev)->hw_context_size;
+	return i915.enable_execlists || to_i915(dev)->hw_context_size;
 }
 
 int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
@@ -797,8 +797,7 @@ int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
 	struct intel_context *ctx;
 	int ret;
 
-	/* FIXME: allow user-created LR contexts as well */
-	if (!hw_context_enabled(dev))
+	if (!contexts_enabled(dev))
 		return -ENODEV;
 
 	ret = i915_mutex_lock_interruptible(dev);
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 60998fc..a006404 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -951,6 +951,14 @@ i915_gem_validate_context(struct drm_device *dev, struct drm_file *file,
 		return ERR_PTR(-EIO);
 	}
 
+	if (i915.enable_execlists && !ctx->engine[ring->id].state) {
+		int ret = intel_lr_context_deferred_create(ctx, ring);
+		if (ret) {
+			DRM_DEBUG("Could not create LRC %u: %d\n", ctx_id, ret);
+			return ERR_PTR(ret);
+		}
+	}
+
 	return ctx;
 }
 
-- 
1.9.0


* [PATCH 10/42] drm/i915/bdw: Render moot context reset and switch with Execlists
From: oscar.mateo @ 2014-07-11 15:49 UTC
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

These two functions make no sense in a Logical Ring Context & Execlists
world.

v2: We got rid of lrc_enabled and centralized everything in the sanitized
i915.enable_execlists instead.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index fbe7278..288f5de 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -380,6 +380,9 @@ void i915_gem_context_reset(struct drm_device *dev)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	int i;
 
+	if (i915.enable_execlists)
+		return;
+
 	/* Prevent the hardware from restoring the last context (which hung) on
 	 * the next switch */
 	for (i = 0; i < I915_NUM_RINGS; i++) {
@@ -514,6 +517,9 @@ int i915_gem_context_enable(struct drm_i915_private *dev_priv)
 		ppgtt->enable(ppgtt);
 	}
 
+	if (i915.enable_execlists)
+		return 0;
+
 	/* FIXME: We should make this work, even in reset */
 	if (i915_reset_in_progress(&dev_priv->gpu_error))
 		return 0;
@@ -769,6 +775,9 @@ int i915_switch_context(struct intel_engine_cs *ring,
 {
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
 
+	if (i915.enable_execlists)
+		return 0;
+
 	WARN_ON(!mutex_is_locked(&dev_priv->dev->struct_mutex));
 
 	if (to->legacy_hw_ctx.rcs_state == NULL) { /* We have the fake context */
-- 
1.9.0


* [PATCH 11/42] drm/i915/bdw: Don't write PDP in the legacy way when using LRCs
From: oscar.mateo @ 2014-07-11 15:49 UTC
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

This is mostly for correctness so that we know we are running the LR
context correctly (that is, the PDPs are contained inside the context
object).

v2: Move the check to inside the enable PPGTT function. The switch
happens in two places: the legacy context switch (that we won't hit
when Execlists are enabled) and the PPGTT enable, which unfortunately
we need. This would look much nicer if the ppgtt->enable was part of
the ring init, where it logically belongs.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index a4153ee..3b956e7 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -851,6 +851,11 @@ static int gen8_ppgtt_enable(struct i915_hw_ppgtt *ppgtt)
 		if (USES_FULL_PPGTT(dev))
 			continue;
 
+		/* In the case of Execlists, we don't want to write the PDPs
+		 * in the legacy way (they live inside the context now) */
+		if (i915.enable_execlists)
+			return 0;
+
 		ret = ppgtt->switch_mm(ppgtt, ring, true);
 		if (ret)
 			goto err_out;
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 12/42] drm/i915: Abstract the legacy workload submission mechanism away
  2014-07-11 15:48 [PATCH 00/42] Execlists v4 oscar.mateo
                   ` (10 preceding siblings ...)
  2014-07-11 15:49 ` [PATCH 11/42] drm/i915/bdw: Don't write PDP in the legacy way when using LRCs oscar.mateo
@ 2014-07-11 15:49 ` oscar.mateo
  2014-07-11 15:49 ` [PATCH 13/42] drm/i915/bdw: Skeleton for the new logical rings submission path oscar.mateo
                   ` (30 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: oscar.mateo @ 2014-07-11 15:49 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

As suggested by Daniel Vetter. The idea, in subsequent patches, is to
provide an alternative to these vfuncs for the Execlists submission
mechanism.

v2: Split into two and reordered to illustrate our intentions, instead
of showing them off. Also, removed the add_request vfunc and added the
stop_ring one.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h            | 24 ++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_gem.c            | 15 +++++++++++----
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 20 ++++++++++----------
 3 files changed, 45 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index ca1a0cc..24faec3 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1610,6 +1610,21 @@ struct drm_i915_private {
 	/* Old ums support infrastructure, same warning applies. */
 	struct i915_ums_state ums;
 
+	/* Abstract the submission mechanism (legacy ringbuffer or execlists) away */
+	struct {
+		int (*do_execbuf) (struct drm_device *dev, struct drm_file *file,
+				   struct intel_engine_cs *ring,
+				   struct intel_context *ctx,
+				   struct drm_i915_gem_execbuffer2 *args,
+				   struct list_head *vmas,
+				   struct drm_i915_gem_object *batch_obj,
+				   u64 exec_start, u32 flags);
+		int (*init_rings) (struct drm_device *dev);
+		void (*cleanup_ring) (struct intel_engine_cs *ring);
+		void (*stop_ring) (struct intel_engine_cs *ring);
+		bool (*is_ring_initialized) (struct intel_engine_cs *ring);
+	} gt;
+
 	/*
 	 * NOTE: This is the dri1/ums dungeon, don't add stuff here. Your patch
 	 * will be rejected. Instead look for a better place.
@@ -2217,6 +2232,14 @@ int i915_gem_set_domain_ioctl(struct drm_device *dev, void *data,
 			      struct drm_file *file_priv);
 int i915_gem_sw_finish_ioctl(struct drm_device *dev, void *data,
 			     struct drm_file *file_priv);
+int i915_gem_ringbuffer_submission(struct drm_device *dev,
+				   struct drm_file *file,
+				   struct intel_engine_cs *ring,
+				   struct intel_context *ctx,
+				   struct drm_i915_gem_execbuffer2 *args,
+				   struct list_head *vmas,
+				   struct drm_i915_gem_object *batch_obj,
+				   u64 exec_start, u32 flags);
 int i915_gem_execbuffer(struct drm_device *dev, void *data,
 			struct drm_file *file_priv);
 int i915_gem_execbuffer2(struct drm_device *dev, void *data,
@@ -2369,6 +2392,7 @@ void i915_gem_reset(struct drm_device *dev);
 bool i915_gem_clflush_object(struct drm_i915_gem_object *obj, bool force);
 int __must_check i915_gem_object_finish_gpu(struct drm_i915_gem_object *obj);
 int __must_check i915_gem_init(struct drm_device *dev);
+int i915_gem_init_rings(struct drm_device *dev);
 int __must_check i915_gem_init_hw(struct drm_device *dev);
 int i915_gem_l3_remap(struct intel_engine_cs *ring, int slice);
 void i915_gem_init_swizzling(struct drm_device *dev);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d8bf4fa..6544286 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4518,7 +4518,7 @@ i915_gem_stop_ringbuffers(struct drm_device *dev)
 	int i;
 
 	for_each_ring(ring, dev_priv, i)
-		intel_stop_ring_buffer(ring);
+		dev_priv->gt.stop_ring(ring);
 }
 
 int
@@ -4635,7 +4635,7 @@ intel_enable_blt(struct drm_device *dev)
 	return true;
 }
 
-static int i915_gem_init_rings(struct drm_device *dev)
+int i915_gem_init_rings(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	int ret;
@@ -4718,7 +4718,7 @@ i915_gem_init_hw(struct drm_device *dev)
 
 	i915_gem_init_swizzling(dev);
 
-	ret = i915_gem_init_rings(dev);
+	ret = dev_priv->gt.init_rings(dev);
 	if (ret)
 		return ret;
 
@@ -4759,6 +4759,13 @@ int i915_gem_init(struct drm_device *dev)
 			DRM_DEBUG_DRIVER("allow wake ack timed out\n");
 	}
 
+	if (!i915.enable_execlists) {
+		dev_priv->gt.do_execbuf = i915_gem_ringbuffer_submission;
+		dev_priv->gt.init_rings = i915_gem_init_rings;
+		dev_priv->gt.cleanup_ring = intel_cleanup_ring_buffer;
+		dev_priv->gt.stop_ring = intel_stop_ring_buffer;
+	}
+
 	i915_gem_init_userptr(dev);
 	i915_gem_init_global_gtt(dev);
 
@@ -4794,7 +4801,7 @@ i915_gem_cleanup_ringbuffer(struct drm_device *dev)
 	int i;
 
 	for_each_ring(ring, dev_priv, i)
-		intel_cleanup_ring_buffer(ring);
+		dev_priv->gt.cleanup_ring(ring);
 }
 
 int
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index a006404..491ef7e 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1034,14 +1034,14 @@ i915_reset_gen7_sol_offsets(struct drm_device *dev,
 	return 0;
 }
 
-static int
-legacy_ringbuffer_submission(struct drm_device *dev, struct drm_file *file,
-			     struct intel_engine_cs *ring,
-			     struct intel_context *ctx,
-			     struct drm_i915_gem_execbuffer2 *args,
-			     struct list_head *vmas,
-			     struct drm_i915_gem_object *batch_obj,
-			     u64 exec_start, u32 flags)
+int
+i915_gem_ringbuffer_submission(struct drm_device *dev, struct drm_file *file,
+			       struct intel_engine_cs *ring,
+			       struct intel_context *ctx,
+			       struct drm_i915_gem_execbuffer2 *args,
+			       struct list_head *vmas,
+			       struct drm_i915_gem_object *batch_obj,
+			       u64 exec_start, u32 flags)
 {
 	struct drm_clip_rect *cliprects = NULL;
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -1408,8 +1408,8 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	else
 		exec_start += i915_gem_obj_offset(batch_obj, vm);
 
-	ret = legacy_ringbuffer_submission(dev, file, ring, ctx,
-			args, &eb->vmas, batch_obj, exec_start, flags);
+	ret = dev_priv->gt.do_execbuf(dev, file, ring, ctx, args,
+			&eb->vmas, batch_obj, exec_start, flags);
 	if (ret)
 		goto err;
 
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 13/42] drm/i915/bdw: Skeleton for the new logical rings submission path
  2014-07-11 15:48 [PATCH 00/42] Execlists v4 oscar.mateo
                   ` (11 preceding siblings ...)
  2014-07-11 15:49 ` [PATCH 12/42] drm/i915: Abstract the legacy workload submission mechanism away oscar.mateo
@ 2014-07-11 15:49 ` oscar.mateo
  2014-07-11 15:49 ` [PATCH 14/42] drm/i915/bdw: Generic logical ring init and cleanup oscar.mateo
                   ` (29 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: oscar.mateo @ 2014-07-11 15:49 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Execlists are indeed a brave new world with respect to workload
submission to the GPU.

In previous versions of this series, I tried to impact the
legacy ringbuffer submission path as little as possible (mostly,
passing the context around and using the correct ringbuffer when I
needed one) but Daniel is afraid (probably with reason) that
these changes and, especially, future ones, will end up breaking
older gens.

This commit and some others coming next will try to limit the
damage by creating an alternative path for workload submission.
The first step is here: laying out a new ring init/fini.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c  |   5 ++
 drivers/gpu/drm/i915/intel_lrc.c | 151 +++++++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.h |  12 ++++
 3 files changed, 168 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 6544286..9560b40 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4764,6 +4764,11 @@ int i915_gem_init(struct drm_device *dev)
 		dev_priv->gt.init_rings = i915_gem_init_rings;
 		dev_priv->gt.cleanup_ring = intel_cleanup_ring_buffer;
 		dev_priv->gt.stop_ring = intel_stop_ring_buffer;
+	} else {
+		dev_priv->gt.do_execbuf = intel_execlists_submission;
+		dev_priv->gt.init_rings = intel_logical_rings_init;
+		dev_priv->gt.cleanup_ring = intel_logical_ring_cleanup;
+		dev_priv->gt.stop_ring = intel_logical_ring_stop;
 	}
 
 	i915_gem_init_userptr(dev);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index cf322ec..cb56bb8 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -89,6 +89,157 @@ int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists
 	return 0;
 }
 
+int intel_execlists_submission(struct drm_device *dev, struct drm_file *file,
+			       struct intel_engine_cs *ring,
+			       struct intel_context *ctx,
+			       struct drm_i915_gem_execbuffer2 *args,
+			       struct list_head *vmas,
+			       struct drm_i915_gem_object *batch_obj,
+			       u64 exec_start, u32 flags)
+{
+	/* TODO */
+	return 0;
+}
+
+void intel_logical_ring_stop(struct intel_engine_cs *ring)
+{
+	/* TODO */
+}
+
+void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
+{
+	/* TODO */
+}
+
+static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *ring)
+{
+	/* TODO */
+	return 0;
+}
+
+static int logical_render_ring_init(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_engine_cs *ring = &dev_priv->ring[RCS];
+
+	ring->name = "render ring";
+	ring->id = RCS;
+	ring->mmio_base = RENDER_RING_BASE;
+	ring->irq_enable_mask =
+		GT_RENDER_USER_INTERRUPT << GEN8_RCS_IRQ_SHIFT;
+
+	return logical_ring_init(dev, ring);
+}
+
+static int logical_bsd_ring_init(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_engine_cs *ring = &dev_priv->ring[VCS];
+
+	ring->name = "bsd ring";
+	ring->id = VCS;
+	ring->mmio_base = GEN6_BSD_RING_BASE;
+	ring->irq_enable_mask =
+		GT_RENDER_USER_INTERRUPT << GEN8_VCS1_IRQ_SHIFT;
+
+	return logical_ring_init(dev, ring);
+}
+
+static int logical_bsd2_ring_init(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_engine_cs *ring = &dev_priv->ring[VCS2];
+
+	ring->name = "bds2 ring";
+	ring->id = VCS2;
+	ring->mmio_base = GEN8_BSD2_RING_BASE;
+	ring->irq_enable_mask =
+		GT_RENDER_USER_INTERRUPT << GEN8_VCS2_IRQ_SHIFT;
+
+	return logical_ring_init(dev, ring);
+}
+
+static int logical_blt_ring_init(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_engine_cs *ring = &dev_priv->ring[BCS];
+
+	ring->name = "blitter ring";
+	ring->id = BCS;
+	ring->mmio_base = BLT_RING_BASE;
+	ring->irq_enable_mask =
+		GT_RENDER_USER_INTERRUPT << GEN8_BCS_IRQ_SHIFT;
+
+	return logical_ring_init(dev, ring);
+}
+
+static int logical_vebox_ring_init(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_engine_cs *ring = &dev_priv->ring[VECS];
+
+	ring->name = "video enhancement ring";
+	ring->id = VECS;
+	ring->mmio_base = VEBOX_RING_BASE;
+	ring->irq_enable_mask =
+		GT_RENDER_USER_INTERRUPT << GEN8_VECS_IRQ_SHIFT;
+
+	return logical_ring_init(dev, ring);
+}
+
+int intel_logical_rings_init(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	int ret;
+
+	ret = logical_render_ring_init(dev);
+	if (ret)
+		return ret;
+
+	if (HAS_BSD(dev)) {
+		ret = logical_bsd_ring_init(dev);
+		if (ret)
+			goto cleanup_render_ring;
+	}
+
+	if (HAS_BLT(dev)) {
+		ret = logical_blt_ring_init(dev);
+		if (ret)
+			goto cleanup_bsd_ring;
+	}
+
+	if (HAS_VEBOX(dev)) {
+		ret = logical_vebox_ring_init(dev);
+		if (ret)
+			goto cleanup_blt_ring;
+	}
+
+	if (HAS_BSD2(dev)) {
+		ret = logical_bsd2_ring_init(dev);
+		if (ret)
+			goto cleanup_vebox_ring;
+	}
+
+	ret = i915_gem_set_seqno(dev, ((u32)~0 - 0x1000));
+	if (ret)
+		goto cleanup_bsd2_ring;
+
+	return 0;
+
+cleanup_bsd2_ring:
+	intel_logical_ring_cleanup(&dev_priv->ring[VCS2]);
+cleanup_vebox_ring:
+	intel_logical_ring_cleanup(&dev_priv->ring[VECS]);
+cleanup_blt_ring:
+	intel_logical_ring_cleanup(&dev_priv->ring[BCS]);
+cleanup_bsd_ring:
+	intel_logical_ring_cleanup(&dev_priv->ring[VCS]);
+cleanup_render_ring:
+	intel_logical_ring_cleanup(&dev_priv->ring[RCS]);
+
+	return ret;
+}
+
 static int
 populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_obj,
 		    struct intel_engine_cs *ring, struct intel_ringbuffer *ringbuf)
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 3b93572..bf0eff4 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -24,6 +24,11 @@
 #ifndef _INTEL_LRC_H_
 #define _INTEL_LRC_H_
 
+/* Logical Rings */
+void intel_logical_ring_stop(struct intel_engine_cs *ring);
+void intel_logical_ring_cleanup(struct intel_engine_cs *ring);
+int intel_logical_rings_init(struct drm_device *dev);
+
 /* Logical Ring Contexts */
 void intel_lr_context_free(struct intel_context *ctx);
 int intel_lr_context_deferred_create(struct intel_context *ctx,
@@ -31,5 +36,12 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 
 /* Execlists */
 int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists);
+int intel_execlists_submission(struct drm_device *dev, struct drm_file *file,
+			       struct intel_engine_cs *ring,
+			       struct intel_context *ctx,
+			       struct drm_i915_gem_execbuffer2 *args,
+			       struct list_head *vmas,
+			       struct drm_i915_gem_object *batch_obj,
+			       u64 exec_start, u32 flags);
 
 #endif /* _INTEL_LRC_H_ */
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 14/42] drm/i915/bdw: Generic logical ring init and cleanup
  2014-07-11 15:48 [PATCH 00/42] Execlists v4 oscar.mateo
                   ` (12 preceding siblings ...)
  2014-07-11 15:49 ` [PATCH 13/42] drm/i915/bdw: Skeleton for the new logical rings submission path oscar.mateo
@ 2014-07-11 15:49 ` oscar.mateo
  2014-07-11 15:49 ` [PATCH 15/42] drm/i915/bdw: GEN-specific logical ring init oscar.mateo
                   ` (28 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: oscar.mateo @ 2014-07-11 15:49 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Allocate and populate the default LRC for every ring, call
gen-specific init/cleanup, init/fini the command parser and
set the status page (now inside the LRC object). These are
things all engines/rings have in common.

Stopping the ring before cleanup and initializing the seqnos
is left as a TODO task (we need more infrastructure in place
before we can achieve this).

v2: Check the ringbuffer backing obj for ring_is_initialized,
instead of the context backing obj (similar, but not exactly
the same).
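
As a side note, the reason the status page can simply be kmapped at
offset 0 of the context backing object is the GEN8 LRC layout (a
sketch based on my reading of the spec; the exact split lives in
BSpec):

	/*
	 * GEN8 LRC backing object layout (sketch):
	 *   page 0:  per-process HW status page (PPHWSP)
	 *   page 1+: logical ring context register state
	 *
	 * Hence the HWS CPU mapping is just the object's first page:
	 */
	ring->status_page.page_addr = kmap(sg_page(dctx_obj->pages->sgl));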

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c |  4 ---
 drivers/gpu/drm/i915/intel_lrc.c        | 54 +++++++++++++++++++++++++++++++--
 drivers/gpu/drm/i915/intel_ringbuffer.c | 17 +++++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.h |  6 +---
 4 files changed, 70 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 288f5de..9085ff1 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -450,10 +450,6 @@ int i915_gem_context_init(struct drm_device *dev)
 
 		/* NB: RCS will hold a ref for all rings */
 		ring->default_context = ctx;
-
-		/* FIXME: we really only want to do this for initialized rings */
-		if (i915.enable_execlists)
-			intel_lr_context_deferred_create(ctx, ring);
 	}
 
 	DRM_DEBUG_DRIVER("%s context support initialized\n",
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index cb56bb8..05b7069 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -108,12 +108,60 @@ void intel_logical_ring_stop(struct intel_engine_cs *ring)
 
 void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
 {
-	/* TODO */
+	if (!intel_ring_initialized(ring))
+		return;
+
+	/* TODO: make sure the ring is stopped */
+	ring->preallocated_lazy_request = NULL;
+	ring->outstanding_lazy_seqno = 0;
+
+	if (ring->cleanup)
+		ring->cleanup(ring);
+
+	i915_cmd_parser_fini_ring(ring);
+
+	if (ring->status_page.obj) {
+		kunmap(sg_page(ring->status_page.obj->pages->sgl));
+		ring->status_page.obj = NULL;
+	}
 }
 
 static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *ring)
 {
-	/* TODO */
+	int ret;
+	struct intel_context *dctx = ring->default_context;
+	struct drm_i915_gem_object *dctx_obj;
+
+	/* Intentionally left blank. */
+	ring->buffer = NULL;
+
+	ring->dev = dev;
+	INIT_LIST_HEAD(&ring->active_list);
+	INIT_LIST_HEAD(&ring->request_list);
+	init_waitqueue_head(&ring->irq_queue);
+
+	ret = intel_lr_context_deferred_create(dctx, ring);
+	if (ret)
+		return ret;
+
+	/* The status page is offset 0 from the context object in LRCs. */
+	dctx_obj = dctx->engine[ring->id].state;
+	ring->status_page.gfx_addr = i915_gem_obj_ggtt_offset(dctx_obj);
+	ring->status_page.page_addr = kmap(sg_page(dctx_obj->pages->sgl));
+	if (ring->status_page.page_addr == NULL)
+		return -ENOMEM;
+	ring->status_page.obj = dctx_obj;
+
+	ret = i915_cmd_parser_init_ring(ring);
+	if (ret)
+		return ret;
+
+	if (ring->init) {
+		ret = ring->init(ring);
+		if (ret)
+			return ret;
+	}
+
 	return 0;
 }
 
@@ -397,6 +445,8 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 	int ret;
 
 	BUG_ON(ctx->legacy_hw_ctx.rcs_state != NULL);
+	if (ctx->engine[ring->id].state)
+		return 0;
 
 	context_size = round_up(get_lr_context_size(ring), 4096);
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 279dda4..20eb1a4 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -40,6 +40,23 @@
  */
 #define CACHELINE_BYTES 64
 
+bool
+intel_ring_initialized(struct intel_engine_cs *ring)
+{
+	struct drm_device *dev = ring->dev;
+
+	if (!dev)
+		return false;
+
+	if (i915.enable_execlists) {
+		struct intel_context *dctx = ring->default_context;
+		struct intel_ringbuffer *ringbuf = dctx->engine[ring->id].ringbuf;
+
+		return ringbuf->obj;
+	} else
+		return ring->buffer && ring->buffer->obj;
+}
+
 static inline int __ring_space(int head, int tail, int size)
 {
 	int space = head - (tail + I915_RING_FREE_SPACE);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index be40788..7203ee2 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -288,11 +288,7 @@ struct  intel_engine_cs {
 	u32 (*get_cmd_length_mask)(u32 cmd_header);
 };
 
-static inline bool
-intel_ring_initialized(struct intel_engine_cs *ring)
-{
-	return ring->buffer && ring->buffer->obj;
-}
+bool intel_ring_initialized(struct intel_engine_cs *ring);
 
 static inline unsigned
 intel_ring_flag(struct intel_engine_cs *ring)
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 15/42] drm/i915/bdw: GEN-specific logical ring init
  2014-07-11 15:48 [PATCH 00/42] Execlists v4 oscar.mateo
                   ` (13 preceding siblings ...)
  2014-07-11 15:49 ` [PATCH 14/42] drm/i915/bdw: Generic logical ring init and cleanup oscar.mateo
@ 2014-07-11 15:49 ` oscar.mateo
  2014-07-11 15:49 ` [PATCH 16/42] drm/i915/bdw: GEN-specific logical ring set/get seqno oscar.mateo
                   ` (27 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: oscar.mateo @ 2014-07-11 15:49 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Logical rings do not need most of the initialization their
legacy ringbuffer counterparts do: we just need the pipe
control object for the render ring, to enable Execlists on the
hardware and to apply a few workarounds.

v2: Squash with: "drm/i915: Extract pipe control fini & make
init outside accessible".
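
For reference, the _MASKED_BIT_* idiom used on the RING_MODE_GEN7
write below relies on masked registers, where the upper 16 bits
select which of the lower 16 bits actually get written (roughly,
from i915_reg.h):

	#define _MASKED_BIT_ENABLE(a)	(((a) << 16) | (a))
	#define _MASKED_BIT_DISABLE(a)	((a) << 16)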

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c        | 54 +++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.c | 34 +++++++++++++--------
 drivers/gpu/drm/i915/intel_ringbuffer.h |  3 ++
 3 files changed, 78 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 05b7069..7c8b75e 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -106,6 +106,49 @@ void intel_logical_ring_stop(struct intel_engine_cs *ring)
 	/* TODO */
 }
 
+static int gen8_init_common_ring(struct intel_engine_cs *ring)
+{
+	struct drm_device *dev = ring->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+
+	I915_WRITE(RING_MODE_GEN7(ring),
+		_MASKED_BIT_DISABLE(GFX_REPLAY_MODE) |
+		_MASKED_BIT_ENABLE(GFX_RUN_LIST_ENABLE));
+	POSTING_READ(RING_MODE_GEN7(ring));
+	DRM_DEBUG_DRIVER("Execlists enabled for %s\n", ring->name);
+
+	memset(&ring->hangcheck, 0, sizeof(ring->hangcheck));
+
+	return 0;
+}
+
+static int gen8_init_render_ring(struct intel_engine_cs *ring)
+{
+	struct drm_device *dev = ring->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	int ret;
+
+	ret = gen8_init_common_ring(ring);
+	if (ret)
+		return ret;
+
+	/* We need to disable the AsyncFlip performance optimisations in order
+	 * to use MI_WAIT_FOR_EVENT within the CS. It should already be
+	 * programmed to '1' on all products.
+	 *
+	 * WaDisableAsyncFlipPerfMode:snb,ivb,hsw,vlv,bdw,chv
+	 */
+	I915_WRITE(MI_MODE, _MASKED_BIT_ENABLE(ASYNC_FLIP_PERF_DISABLE));
+
+	ret = intel_init_pipe_control(ring);
+	if (ret)
+		return ret;
+
+	I915_WRITE(INSTPM, _MASKED_BIT_ENABLE(INSTPM_FORCE_ORDERING));
+
+	return ret;
+}
+
 void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
 {
 	if (!intel_ring_initialized(ring))
@@ -176,6 +219,9 @@ static int logical_render_ring_init(struct drm_device *dev)
 	ring->irq_enable_mask =
 		GT_RENDER_USER_INTERRUPT << GEN8_RCS_IRQ_SHIFT;
 
+	ring->init = gen8_init_render_ring;
+	ring->cleanup = intel_fini_pipe_control;
+
 	return logical_ring_init(dev, ring);
 }
 
@@ -190,6 +236,8 @@ static int logical_bsd_ring_init(struct drm_device *dev)
 	ring->irq_enable_mask =
 		GT_RENDER_USER_INTERRUPT << GEN8_VCS1_IRQ_SHIFT;
 
+	ring->init = gen8_init_common_ring;
+
 	return logical_ring_init(dev, ring);
 }
 
@@ -204,6 +252,8 @@ static int logical_bsd2_ring_init(struct drm_device *dev)
 	ring->irq_enable_mask =
 		GT_RENDER_USER_INTERRUPT << GEN8_VCS2_IRQ_SHIFT;
 
+	ring->init = gen8_init_common_ring;
+
 	return logical_ring_init(dev, ring);
 }
 
@@ -218,6 +268,8 @@ static int logical_blt_ring_init(struct drm_device *dev)
 	ring->irq_enable_mask =
 		GT_RENDER_USER_INTERRUPT << GEN8_BCS_IRQ_SHIFT;
 
+	ring->init = gen8_init_common_ring;
+
 	return logical_ring_init(dev, ring);
 }
 
@@ -232,6 +284,8 @@ static int logical_vebox_ring_init(struct drm_device *dev)
 	ring->irq_enable_mask =
 		GT_RENDER_USER_INTERRUPT << GEN8_VECS_IRQ_SHIFT;
 
+	ring->init = gen8_init_common_ring;
+
 	return logical_ring_init(dev, ring);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 20eb1a4..ca45c58 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -573,8 +573,25 @@ out:
 	return ret;
 }
 
-static int
-init_pipe_control(struct intel_engine_cs *ring)
+void
+intel_fini_pipe_control(struct intel_engine_cs *ring)
+{
+	struct drm_device *dev = ring->dev;
+
+	if (ring->scratch.obj == NULL)
+		return;
+
+	if (INTEL_INFO(dev)->gen >= 5) {
+		kunmap(sg_page(ring->scratch.obj->pages->sgl));
+		i915_gem_object_ggtt_unpin(ring->scratch.obj);
+	}
+
+	drm_gem_object_unreference(&ring->scratch.obj->base);
+	ring->scratch.obj = NULL;
+}
+
+int
+intel_init_pipe_control(struct intel_engine_cs *ring)
 {
 	int ret;
 
@@ -649,7 +666,7 @@ static int init_render_ring(struct intel_engine_cs *ring)
 			   _MASKED_BIT_ENABLE(GFX_REPLAY_MODE));
 
 	if (INTEL_INFO(dev)->gen >= 5) {
-		ret = init_pipe_control(ring);
+		ret = intel_init_pipe_control(ring);
 		if (ret)
 			return ret;
 	}
@@ -684,16 +701,7 @@ static void render_ring_cleanup(struct intel_engine_cs *ring)
 		dev_priv->semaphore_obj = NULL;
 	}
 
-	if (ring->scratch.obj == NULL)
-		return;
-
-	if (INTEL_INFO(dev)->gen >= 5) {
-		kunmap(sg_page(ring->scratch.obj->pages->sgl));
-		i915_gem_object_ggtt_unpin(ring->scratch.obj);
-	}
-
-	drm_gem_object_unreference(&ring->scratch.obj->base);
-	ring->scratch.obj = NULL;
+	intel_fini_pipe_control(ring);
 }
 
 static int gen8_rcs_signal(struct intel_engine_cs *signaller,
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 7203ee2..c135334 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -380,6 +380,9 @@ void intel_ring_init_seqno(struct intel_engine_cs *ring, u32 seqno);
 int intel_ring_flush_all_caches(struct intel_engine_cs *ring);
 int intel_ring_invalidate_all_caches(struct intel_engine_cs *ring);
 
+void intel_fini_pipe_control(struct intel_engine_cs *ring);
+int intel_init_pipe_control(struct intel_engine_cs *ring);
+
 int intel_init_render_ring_buffer(struct drm_device *dev);
 int intel_init_bsd_ring_buffer(struct drm_device *dev);
 int intel_init_bsd2_ring_buffer(struct drm_device *dev);
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 16/42] drm/i915/bdw: GEN-specific logical ring set/get seqno
  2014-07-11 15:48 [PATCH 00/42] Execlists v4 oscar.mateo
                   ` (14 preceding siblings ...)
  2014-07-11 15:49 ` [PATCH 15/42] drm/i915/bdw: GEN-specific logical ring init oscar.mateo
@ 2014-07-11 15:49 ` oscar.mateo
  2014-07-11 15:49 ` [PATCH 17/42] drm/i915/bdw: New logical ring submission mechanism oscar.mateo
                   ` (26 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: oscar.mateo @ 2014-07-11 15:49 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

No mystery here: the seqno is still retrieved from the engine's
HW status page (the one in the default context; for the moment,
I see no reason to worry about other contexts' HWS pages).
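
For reference, the status page accessors are plain coherent accesses
through the CPU mapping of the page (roughly, from intel_ringbuffer.h):

	static inline u32
	intel_read_status_page(struct intel_engine_cs *ring, int reg)
	{
		/* Ensure that the compiler doesn't optimize away the load. */
		barrier();
		return ring->status_page.page_addr[reg];
	}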

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 7c8b75e..f171fd5 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -149,6 +149,16 @@ static int gen8_init_render_ring(struct intel_engine_cs *ring)
 	return ret;
 }
 
+static u32 gen8_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency)
+{
+	return intel_read_status_page(ring, I915_GEM_HWS_INDEX);
+}
+
+static void gen8_set_seqno(struct intel_engine_cs *ring, u32 seqno)
+{
+	intel_write_status_page(ring, I915_GEM_HWS_INDEX, seqno);
+}
+
 void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
 {
 	if (!intel_ring_initialized(ring))
@@ -221,6 +231,8 @@ static int logical_render_ring_init(struct drm_device *dev)
 
 	ring->init = gen8_init_render_ring;
 	ring->cleanup = intel_fini_pipe_control;
+	ring->get_seqno = gen8_get_seqno;
+	ring->set_seqno = gen8_set_seqno;
 
 	return logical_ring_init(dev, ring);
 }
@@ -237,6 +249,8 @@ static int logical_bsd_ring_init(struct drm_device *dev)
 		GT_RENDER_USER_INTERRUPT << GEN8_VCS1_IRQ_SHIFT;
 
 	ring->init = gen8_init_common_ring;
+	ring->get_seqno = gen8_get_seqno;
+	ring->set_seqno = gen8_set_seqno;
 
 	return logical_ring_init(dev, ring);
 }
@@ -253,6 +267,8 @@ static int logical_bsd2_ring_init(struct drm_device *dev)
 		GT_RENDER_USER_INTERRUPT << GEN8_VCS2_IRQ_SHIFT;
 
 	ring->init = gen8_init_common_ring;
+	ring->get_seqno = gen8_get_seqno;
+	ring->set_seqno = gen8_set_seqno;
 
 	return logical_ring_init(dev, ring);
 }
@@ -269,6 +285,8 @@ static int logical_blt_ring_init(struct drm_device *dev)
 		GT_RENDER_USER_INTERRUPT << GEN8_BCS_IRQ_SHIFT;
 
 	ring->init = gen8_init_common_ring;
+	ring->get_seqno = gen8_get_seqno;
+	ring->set_seqno = gen8_set_seqno;
 
 	return logical_ring_init(dev, ring);
 }
@@ -285,6 +303,8 @@ static int logical_vebox_ring_init(struct drm_device *dev)
 		GT_RENDER_USER_INTERRUPT << GEN8_VECS_IRQ_SHIFT;
 
 	ring->init = gen8_init_common_ring;
+	ring->get_seqno = gen8_get_seqno;
+	ring->set_seqno = gen8_set_seqno;
 
 	return logical_ring_init(dev, ring);
 }
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 17/42] drm/i915/bdw: New logical ring submission mechanism
  2014-07-11 15:48 [PATCH 00/42] Execlists v4 oscar.mateo
                   ` (15 preceding siblings ...)
  2014-07-11 15:49 ` [PATCH 16/42] drm/i915/bdw: GEN-specific logical ring set/get seqno oscar.mateo
@ 2014-07-11 15:49 ` oscar.mateo
  2014-07-11 15:49 ` [PATCH 18/42] drm/i915/bdw: GEN-specific logical ring emit request oscar.mateo
                   ` (25 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: oscar.mateo @ 2014-07-11 15:49 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Well, new-ish: if all this code looks familiar, that's because it's
a clone of the existing submission mechanism (with some modifications
here and there to adapt it to LRCs and Execlists).

And why did we do this instead of reusing code, one might wonder?
Well, there are some fears that the differences are big enough that
they will end up breaking all platforms.

Also, Execlists offer several advantages, like control over when the
GPU is done with a given workload, that can help simplify the
submission mechanism, no doubt. I am interested in getting Execlists
to work first and foremost, but in the future this parallel submission
mechanism will help us to fine-tune the submission path without
affecting old gens.

v2: Pass the ringbuffer only (whenever possible).
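
The usage pattern mirrors the legacy intel_ring_begin()/emit()/advance()
trio. A minimal sketch (reserve space, emit two NOOPs, then force the
submission):

	ret = intel_logical_ring_begin(ringbuf, 2);
	if (ret)
		return ret;

	intel_logical_ring_emit(ringbuf, MI_NOOP);
	intel_logical_ring_emit(ringbuf, MI_NOOP);
	intel_logical_ring_advance_and_submit(ringbuf);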

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c        | 193 ++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.h        |  12 ++
 drivers/gpu/drm/i915/intel_ringbuffer.c |  20 ++--
 drivers/gpu/drm/i915/intel_ringbuffer.h |   3 +
 4 files changed, 218 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index f171fd5..bd37d51 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -106,6 +106,199 @@ void intel_logical_ring_stop(struct intel_engine_cs *ring)
 	/* TODO */
 }
 
+void intel_logical_ring_advance_and_submit(struct intel_ringbuffer *ringbuf)
+{
+	intel_logical_ring_advance(ringbuf);
+
+	if (intel_ring_stopped(ringbuf->ring))
+		return;
+
+	/* TODO: how to submit a context to the ELSP is not here yet */
+}
+
+static int logical_ring_alloc_seqno(struct intel_engine_cs *ring)
+{
+	if (ring->outstanding_lazy_seqno)
+		return 0;
+
+	if (ring->preallocated_lazy_request == NULL) {
+		struct drm_i915_gem_request *request;
+
+		request = kmalloc(sizeof(*request), GFP_KERNEL);
+		if (request == NULL)
+			return -ENOMEM;
+
+		ring->preallocated_lazy_request = request;
+	}
+
+	return i915_gem_get_seqno(ring->dev, &ring->outstanding_lazy_seqno);
+}
+
+static int logical_ring_wait_request(struct intel_ringbuffer *ringbuf, int bytes)
+{
+	struct intel_engine_cs *ring = ringbuf->ring;
+	struct drm_i915_gem_request *request;
+	u32 seqno = 0;
+	int ret;
+
+	if (ringbuf->last_retired_head != -1) {
+		ringbuf->head = ringbuf->last_retired_head;
+		ringbuf->last_retired_head = -1;
+
+		ringbuf->space = intel_ring_space(ringbuf);
+		if (ringbuf->space >= bytes)
+			return 0;
+	}
+
+	list_for_each_entry(request, &ring->request_list, list) {
+		if (__intel_ring_space(request->tail, ringbuf->tail,
+				ringbuf->size) >= bytes) {
+			seqno = request->seqno;
+			break;
+		}
+	}
+
+	if (seqno == 0)
+		return -ENOSPC;
+
+	ret = i915_wait_seqno(ring, seqno);
+	if (ret)
+		return ret;
+
+	/* TODO: make sure we update the right ringbuffer's last_retired_head
+	 * when retiring requests */
+	i915_gem_retire_requests_ring(ring);
+	ringbuf->head = ringbuf->last_retired_head;
+	ringbuf->last_retired_head = -1;
+
+	ringbuf->space = intel_ring_space(ringbuf);
+	return 0;
+}
+
+static int logical_ring_wait_for_space(struct intel_ringbuffer *ringbuf, int bytes)
+{
+	struct intel_engine_cs *ring = ringbuf->ring;
+	struct drm_device *dev = ring->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	unsigned long end;
+	int ret;
+
+	ret = logical_ring_wait_request(ringbuf, bytes);
+	if (ret != -ENOSPC)
+		return ret;
+
+	/* Force the context submission in case we have been skipping it */
+	intel_logical_ring_advance_and_submit(ringbuf);
+
+	/* With GEM the hangcheck timer should kick us out of the loop,
+	 * leaving it early runs the risk of corrupting GEM state (due
+	 * to running on almost untested codepaths). But on resume
+	 * timers don't work yet, so prevent a complete hang in that
+	 * case by choosing an insanely large timeout. */
+	end = jiffies + 60 * HZ;
+
+	do {
+		ringbuf->head = I915_READ_HEAD(ring);
+		ringbuf->space = intel_ring_space(ringbuf);
+		if (ringbuf->space >= bytes) {
+			ret = 0;
+			break;
+		}
+
+		if (!drm_core_check_feature(dev, DRIVER_MODESET) &&
+		    dev->primary->master) {
+			struct drm_i915_master_private *master_priv = dev->primary->master->driver_priv;
+			if (master_priv->sarea_priv)
+				master_priv->sarea_priv->perf_boxes |= I915_BOX_WAIT;
+		}
+
+		msleep(1);
+
+		if (dev_priv->mm.interruptible && signal_pending(current)) {
+			ret = -ERESTARTSYS;
+			break;
+		}
+
+		ret = i915_gem_check_wedge(&dev_priv->gpu_error,
+					   dev_priv->mm.interruptible);
+		if (ret)
+			break;
+
+		if (time_after(jiffies, end)) {
+			ret = -EBUSY;
+			break;
+		}
+	} while (1);
+
+	return ret;
+}
+
+static int logical_ring_wrap_buffer(struct intel_ringbuffer *ringbuf)
+{
+	uint32_t __iomem *virt;
+	int rem = ringbuf->size - ringbuf->tail;
+
+	if (ringbuf->space < rem) {
+		int ret = logical_ring_wait_for_space(ringbuf, rem);
+		if (ret)
+			return ret;
+	}
+
+	virt = ringbuf->virtual_start + ringbuf->tail;
+	rem /= 4;
+	while (rem--)
+		iowrite32(MI_NOOP, virt++);
+
+	ringbuf->tail = 0;
+	ringbuf->space = intel_ring_space(ringbuf);
+
+	return 0;
+}
+
+static int logical_ring_prepare(struct intel_ringbuffer *ringbuf, int bytes)
+{
+	int ret;
+
+	if (unlikely(ringbuf->tail + bytes > ringbuf->effective_size)) {
+		ret = logical_ring_wrap_buffer(ringbuf);
+		if (unlikely(ret))
+			return ret;
+	}
+
+	if (unlikely(ringbuf->space < bytes)) {
+		ret = logical_ring_wait_for_space(ringbuf, bytes);
+		if (unlikely(ret))
+			return ret;
+	}
+
+	return 0;
+}
+
+int intel_logical_ring_begin(struct intel_ringbuffer *ringbuf, int num_dwords)
+{
+	struct intel_engine_cs *ring = ringbuf->ring;
+	struct drm_device *dev = ring->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	int ret;
+
+	ret = i915_gem_check_wedge(&dev_priv->gpu_error,
+				   dev_priv->mm.interruptible);
+	if (ret)
+		return ret;
+
+	ret = logical_ring_prepare(ringbuf, num_dwords * sizeof(uint32_t));
+	if (ret)
+		return ret;
+
+	/* Preallocate the olr before touching the ring */
+	ret = logical_ring_alloc_seqno(ring);
+	if (ret)
+		return ret;
+
+	ringbuf->space -= num_dwords * sizeof(uint32_t);
+	return 0;
+}
+
 static int gen8_init_common_ring(struct intel_engine_cs *ring)
 {
 	struct drm_device *dev = ring->dev;
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index bf0eff4..16798b6 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -29,6 +29,18 @@ void intel_logical_ring_stop(struct intel_engine_cs *ring);
 void intel_logical_ring_cleanup(struct intel_engine_cs *ring);
 int intel_logical_rings_init(struct drm_device *dev);
 
+void intel_logical_ring_advance_and_submit(struct intel_ringbuffer *ringbuf);
+static inline void intel_logical_ring_advance(struct intel_ringbuffer *ringbuf)
+{
+	ringbuf->tail &= ringbuf->size - 1;
+}
+static inline void intel_logical_ring_emit(struct intel_ringbuffer *ringbuf, u32 data)
+{
+	iowrite32(data, ringbuf->virtual_start + ringbuf->tail);
+	ringbuf->tail += 4;
+}
+int intel_logical_ring_begin(struct intel_ringbuffer *ringbuf, int num_dwords);
+
 /* Logical Ring Contexts */
 void intel_lr_context_free(struct intel_context *ctx);
 int intel_lr_context_deferred_create(struct intel_context *ctx,
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index ca45c58..dc2a991 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -57,7 +57,7 @@ intel_ring_initialized(struct intel_engine_cs *ring)
 		return ring->buffer && ring->buffer->obj;
 }
 
-static inline int __ring_space(int head, int tail, int size)
+int __intel_ring_space(int head, int tail, int size)
 {
 	int space = head - (tail + I915_RING_FREE_SPACE);
 	if (space < 0)
@@ -65,12 +65,12 @@ static inline int __ring_space(int head, int tail, int size)
 	return space;
 }
 
-static inline int ring_space(struct intel_ringbuffer *ringbuf)
+int intel_ring_space(struct intel_ringbuffer *ringbuf)
 {
-	return __ring_space(ringbuf->head & HEAD_ADDR, ringbuf->tail, ringbuf->size);
+	return __intel_ring_space(ringbuf->head & HEAD_ADDR, ringbuf->tail, ringbuf->size);
 }
 
-static bool intel_ring_stopped(struct intel_engine_cs *ring)
+bool intel_ring_stopped(struct intel_engine_cs *ring)
 {
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
 	return dev_priv->gpu_error.stop_rings & intel_ring_flag(ring);
@@ -561,7 +561,7 @@ static int init_ring_common(struct intel_engine_cs *ring)
 	else {
 		ringbuf->head = I915_READ_HEAD(ring);
 		ringbuf->tail = I915_READ_TAIL(ring) & TAIL_ADDR;
-		ringbuf->space = ring_space(ringbuf);
+		ringbuf->space = intel_ring_space(ringbuf);
 		ringbuf->last_retired_head = -1;
 	}
 
@@ -1679,13 +1679,13 @@ static int intel_ring_wait_request(struct intel_engine_cs *ring, int n)
 		ringbuf->head = ringbuf->last_retired_head;
 		ringbuf->last_retired_head = -1;
 
-		ringbuf->space = ring_space(ringbuf);
+		ringbuf->space = intel_ring_space(ringbuf);
 		if (ringbuf->space >= n)
 			return 0;
 	}
 
 	list_for_each_entry(request, &ring->request_list, list) {
-		if (__ring_space(request->tail, ringbuf->tail, ringbuf->size) >= n) {
+		if (__intel_ring_space(request->tail, ringbuf->tail, ringbuf->size) >= n) {
 			seqno = request->seqno;
 			break;
 		}
@@ -1702,7 +1702,7 @@ static int intel_ring_wait_request(struct intel_engine_cs *ring, int n)
 	ringbuf->head = ringbuf->last_retired_head;
 	ringbuf->last_retired_head = -1;
 
-	ringbuf->space = ring_space(ringbuf);
+	ringbuf->space = intel_ring_space(ringbuf);
 	return 0;
 }
 
@@ -1731,7 +1731,7 @@ static int ring_wait_for_space(struct intel_engine_cs *ring, int n)
 	trace_i915_ring_wait_begin(ring);
 	do {
 		ringbuf->head = I915_READ_HEAD(ring);
-		ringbuf->space = ring_space(ringbuf);
+		ringbuf->space = intel_ring_space(ringbuf);
 		if (ringbuf->space >= n) {
 			ret = 0;
 			break;
@@ -1783,7 +1783,7 @@ static int intel_wrap_ring_buffer(struct intel_engine_cs *ring)
 		iowrite32(MI_NOOP, virt++);
 
 	ringbuf->tail = 0;
-	ringbuf->space = ring_space(ringbuf);
+	ringbuf->space = intel_ring_space(ringbuf);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index c135334..c305df0 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -373,6 +373,9 @@ static inline void intel_ring_advance(struct intel_engine_cs *ring)
 	struct intel_ringbuffer *ringbuf = ring->buffer;
 	ringbuf->tail &= ringbuf->size - 1;
 }
+int __intel_ring_space(int head, int tail, int size);
+int intel_ring_space(struct intel_ringbuffer *ringbuf);
+bool intel_ring_stopped(struct intel_engine_cs *ring);
 void __intel_ring_advance(struct intel_engine_cs *ring);
 
 int __must_check intel_ring_idle(struct intel_engine_cs *ring);
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 18/42] drm/i915/bdw: GEN-specific logical ring emit request
  2014-07-11 15:48 [PATCH 00/42] Execlists v4 oscar.mateo
                   ` (16 preceding siblings ...)
  2014-07-11 15:49 ` [PATCH 17/42] drm/i915/bdw: New logical ring submission mechanism oscar.mateo
@ 2014-07-11 15:49 ` oscar.mateo
  2014-07-11 15:49 ` [PATCH 19/42] drm/i915/bdw: GEN-specific logical ring emit flush oscar.mateo
                   ` (24 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: oscar.mateo @ 2014-07-11 15:49 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Very similar to the legacy add_request, only modified to account for
the logical ringbuffer.

v2: Use MI_GLOBAL_GTT, as suggested by Brad Volkin.

v3: Unify render and non-render in the same function, as noticed by
Brad Volkin.
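
For clarity, this is the 6-dword sequence gen8_emit_request() ends up
writing into the logical ringbuffer:

	/*
	 *   MI_STORE_DWORD_IMM_GEN8 | MI_GLOBAL_GTT
	 *   HWS address of the seqno slot (lower 32 bits)
	 *   0				(upper address dword)
	 *   outstanding_lazy_seqno	(value to store)
	 *   MI_USER_INTERRUPT
	 *   MI_NOOP
	 */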

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_reg.h         |  1 +
 drivers/gpu/drm/i915/intel_lrc.c        | 31 +++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.h |  3 +++
 3 files changed, 35 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 6c0811b..4df121b 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -272,6 +272,7 @@
 #define   MI_SEMAPHORE_POLL		(1<<15)
 #define   MI_SEMAPHORE_SAD_GTE_SDD	(1<<12)
 #define MI_STORE_DWORD_IMM	MI_INSTR(0x20, 1)
+#define MI_STORE_DWORD_IMM_GEN8	MI_INSTR(0x20, 2)
 #define   MI_MEM_VIRTUAL	(1 << 22) /* 965+ only */
 #define MI_STORE_DWORD_INDEX	MI_INSTR(0x21, 1)
 #define   MI_STORE_DWORD_INDEX_SHIFT 2
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index bd37d51..64bda7a 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -352,6 +352,32 @@ static void gen8_set_seqno(struct intel_engine_cs *ring, u32 seqno)
 	intel_write_status_page(ring, I915_GEM_HWS_INDEX, seqno);
 }
 
+static int gen8_emit_request(struct intel_ringbuffer *ringbuf)
+{
+	struct intel_engine_cs *ring = ringbuf->ring;
+	u32 cmd;
+	int ret;
+
+	ret = intel_logical_ring_begin(ringbuf, 6);
+	if (ret)
+		return ret;
+
+	cmd = MI_STORE_DWORD_IMM_GEN8;
+	cmd |= MI_GLOBAL_GTT;
+
+	intel_logical_ring_emit(ringbuf, cmd);
+	intel_logical_ring_emit(ringbuf,
+				(ring->status_page.gfx_addr +
+				(I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT)));
+	intel_logical_ring_emit(ringbuf, 0);
+	intel_logical_ring_emit(ringbuf, ring->outstanding_lazy_seqno);
+	intel_logical_ring_emit(ringbuf, MI_USER_INTERRUPT);
+	intel_logical_ring_emit(ringbuf, MI_NOOP);
+	intel_logical_ring_advance_and_submit(ringbuf);
+
+	return 0;
+}
+
 void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
 {
 	if (!intel_ring_initialized(ring))
@@ -426,6 +452,7 @@ static int logical_render_ring_init(struct drm_device *dev)
 	ring->cleanup = intel_fini_pipe_control;
 	ring->get_seqno = gen8_get_seqno;
 	ring->set_seqno = gen8_set_seqno;
+	ring->emit_request = gen8_emit_request;
 
 	return logical_ring_init(dev, ring);
 }
@@ -444,6 +471,7 @@ static int logical_bsd_ring_init(struct drm_device *dev)
 	ring->init = gen8_init_common_ring;
 	ring->get_seqno = gen8_get_seqno;
 	ring->set_seqno = gen8_set_seqno;
+	ring->emit_request = gen8_emit_request;
 
 	return logical_ring_init(dev, ring);
 }
@@ -462,6 +490,7 @@ static int logical_bsd2_ring_init(struct drm_device *dev)
 	ring->init = gen8_init_common_ring;
 	ring->get_seqno = gen8_get_seqno;
 	ring->set_seqno = gen8_set_seqno;
+	ring->emit_request = gen8_emit_request;
 
 	return logical_ring_init(dev, ring);
 }
@@ -480,6 +509,7 @@ static int logical_blt_ring_init(struct drm_device *dev)
 	ring->init = gen8_init_common_ring;
 	ring->get_seqno = gen8_get_seqno;
 	ring->set_seqno = gen8_set_seqno;
+	ring->emit_request = gen8_emit_request;
 
 	return logical_ring_init(dev, ring);
 }
@@ -498,6 +528,7 @@ static int logical_vebox_ring_init(struct drm_device *dev)
 	ring->init = gen8_init_common_ring;
 	ring->get_seqno = gen8_get_seqno;
 	ring->set_seqno = gen8_set_seqno;
+	ring->emit_request = gen8_emit_request;
 
 	return logical_ring_init(dev, ring);
 }
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index c305df0..176ee6a 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -215,6 +215,9 @@ struct  intel_engine_cs {
 				  unsigned int num_dwords);
 	} semaphore;
 
+	/* Execlists */
+	int		(*emit_request)(struct intel_ringbuffer *ringbuf);
+
 	/**
 	 * List of objects currently involved in rendering from the
 	 * ringbuffer.
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 19/42] drm/i915/bdw: GEN-specific logical ring emit flush
  2014-07-11 15:48 [PATCH 00/42] Execlists v4 oscar.mateo
                   ` (17 preceding siblings ...)
  2014-07-11 15:49 ` [PATCH 18/42] drm/i915/bdw: GEN-specific logical ring emit request oscar.mateo
@ 2014-07-11 15:49 ` oscar.mateo
  2014-07-11 15:49 ` [PATCH 20/42] drm/i915/bdw: Emission of requests with logical rings oscar.mateo
                   ` (23 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: oscar.mateo @ 2014-07-11 15:49 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Same as the legacy-style ring->flush.

v2: The BSD invalidate bit still exists in GEN8! Add it for the VCS
rings (but still consolidate the blt and bsd ring flushes into one).
This was noticed by Brad Volkin.
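
For clarity, the "MI_FLUSH_DW + 1" in the code accounts for the extra
upper-address dword the GEN8 form of the command takes, so the emitted
sequence is 4 dwords:

	/*
	 *   MI_FLUSH_DW (len + 1) | flags (| MI_INVALIDATE_BSD on VCS)
	 *   I915_GEM_HWS_SCRATCH_ADDR | MI_FLUSH_DW_USE_GTT
	 *   0	(upper address dword, new on GEN8)
	 *   0	(value)
	 */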

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c        | 78 +++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.c |  7 ---
 drivers/gpu/drm/i915/intel_ringbuffer.h | 10 +++++
 3 files changed, 88 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 64bda7a..e25cbff 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -342,6 +342,79 @@ static int gen8_init_render_ring(struct intel_engine_cs *ring)
 	return ret;
 }
 
+static int gen8_emit_flush(struct intel_ringbuffer *ringbuf,
+			   u32 invalidate_domains,
+			   u32 unused)
+{
+	struct intel_engine_cs *ring = ringbuf->ring;
+	struct drm_device *dev = ring->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	uint32_t cmd;
+	int ret;
+
+	ret = intel_logical_ring_begin(ringbuf, 4);
+	if (ret)
+		return ret;
+
+	cmd = MI_FLUSH_DW + 1;
+
+	if (invalidate_domains & I915_GEM_GPU_DOMAINS) {
+		cmd |= MI_INVALIDATE_TLB | MI_FLUSH_DW_STORE_INDEX |
+			MI_FLUSH_DW_OP_STOREDW;
+		if (ring == &dev_priv->ring[VCS])
+			cmd |= MI_INVALIDATE_BSD;
+	}
+	intel_logical_ring_emit(ringbuf, cmd);
+	intel_logical_ring_emit(ringbuf, I915_GEM_HWS_SCRATCH_ADDR | MI_FLUSH_DW_USE_GTT);
+	intel_logical_ring_emit(ringbuf, 0); /* upper addr */
+	intel_logical_ring_emit(ringbuf, 0); /* value */
+	intel_logical_ring_advance(ringbuf);
+
+	return 0;
+}
+
+static int gen8_emit_flush_render(struct intel_ringbuffer *ringbuf,
+				  u32 invalidate_domains,
+				  u32 flush_domains)
+{
+	struct intel_engine_cs *ring = ringbuf->ring;
+	u32 scratch_addr = ring->scratch.gtt_offset + 2 * CACHELINE_BYTES;
+	u32 flags = 0;
+	int ret;
+
+	flags |= PIPE_CONTROL_CS_STALL;
+
+	if (flush_domains) {
+		flags |= PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH;
+		flags |= PIPE_CONTROL_DEPTH_CACHE_FLUSH;
+	}
+
+	if (invalidate_domains) {
+		flags |= PIPE_CONTROL_TLB_INVALIDATE;
+		flags |= PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE;
+		flags |= PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE;
+		flags |= PIPE_CONTROL_VF_CACHE_INVALIDATE;
+		flags |= PIPE_CONTROL_CONST_CACHE_INVALIDATE;
+		flags |= PIPE_CONTROL_STATE_CACHE_INVALIDATE;
+		flags |= PIPE_CONTROL_QW_WRITE;
+		flags |= PIPE_CONTROL_GLOBAL_GTT_IVB;
+	}
+
+	ret = intel_logical_ring_begin(ringbuf, 6);
+	if (ret)
+		return ret;
+
+	intel_logical_ring_emit(ringbuf, GFX_OP_PIPE_CONTROL(6));
+	intel_logical_ring_emit(ringbuf, flags);
+	intel_logical_ring_emit(ringbuf, scratch_addr);
+	intel_logical_ring_emit(ringbuf, 0);
+	intel_logical_ring_emit(ringbuf, 0);
+	intel_logical_ring_emit(ringbuf, 0);
+	intel_logical_ring_advance(ringbuf);
+
+	return 0;
+}
+
 static u32 gen8_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency)
 {
 	return intel_read_status_page(ring, I915_GEM_HWS_INDEX);
@@ -453,6 +526,7 @@ static int logical_render_ring_init(struct drm_device *dev)
 	ring->get_seqno = gen8_get_seqno;
 	ring->set_seqno = gen8_set_seqno;
 	ring->emit_request = gen8_emit_request;
+	ring->emit_flush = gen8_emit_flush_render;
 
 	return logical_ring_init(dev, ring);
 }
@@ -472,6 +546,7 @@ static int logical_bsd_ring_init(struct drm_device *dev)
 	ring->get_seqno = gen8_get_seqno;
 	ring->set_seqno = gen8_set_seqno;
 	ring->emit_request = gen8_emit_request;
+	ring->emit_flush = gen8_emit_flush;
 
 	return logical_ring_init(dev, ring);
 }
@@ -491,6 +566,7 @@ static int logical_bsd2_ring_init(struct drm_device *dev)
 	ring->get_seqno = gen8_get_seqno;
 	ring->set_seqno = gen8_set_seqno;
 	ring->emit_request = gen8_emit_request;
+	ring->emit_flush = gen8_emit_flush;
 
 	return logical_ring_init(dev, ring);
 }
@@ -510,6 +586,7 @@ static int logical_blt_ring_init(struct drm_device *dev)
 	ring->get_seqno = gen8_get_seqno;
 	ring->set_seqno = gen8_set_seqno;
 	ring->emit_request = gen8_emit_request;
+	ring->emit_flush = gen8_emit_flush;
 
 	return logical_ring_init(dev, ring);
 }
@@ -529,6 +606,7 @@ static int logical_vebox_ring_init(struct drm_device *dev)
 	ring->get_seqno = gen8_get_seqno;
 	ring->set_seqno = gen8_set_seqno;
 	ring->emit_request = gen8_emit_request;
+	ring->emit_flush = gen8_emit_flush;
 
 	return logical_ring_init(dev, ring);
 }
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index dc2a991..3188403 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -33,13 +33,6 @@
 #include "i915_trace.h"
 #include "intel_drv.h"
 
-/* Early gen2 devices have a cacheline of just 32 bytes, using 64 is overkill,
- * but keeps the logic simple. Indeed, the whole purpose of this macro is just
- * to give some inclination as to some of the magic values used in the various
- * workarounds!
- */
-#define CACHELINE_BYTES 64
-
 bool
 intel_ring_initialized(struct intel_engine_cs *ring)
 {
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 176ee6a..6e22866 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -5,6 +5,13 @@
 
 #define I915_CMD_HASH_ORDER 9
 
+/* Early gen2 devices have a cacheline of just 32 bytes, using 64 is overkill,
+ * but keeps the logic simple. Indeed, the whole purpose of this macro is just
+ * to give some inclination as to some of the magic values used in the various
+ * workarounds!
+ */
+#define CACHELINE_BYTES 64
+
 /*
  * Gen2 BSpec "1. Programming Environment" / 1.4.4.6 "Ring Buffer Use"
  * Gen3 BSpec "vol1c Memory Interface Functions" / 2.3.4.5 "Ring Buffer Use"
@@ -217,6 +224,9 @@ struct  intel_engine_cs {
 
 	/* Execlists */
 	int		(*emit_request)(struct intel_ringbuffer *ringbuf);
+	int		(*emit_flush)(struct intel_ringbuffer *ringbuf,
+				      u32 invalidate_domains,
+				      u32 flush_domains);
 
 	/**
 	 * List of objects currently involved in rendering from the
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 20/42] drm/i915/bdw: Emission of requests with logical rings
  2014-07-11 15:48 [PATCH 00/42] Execlists v4 oscar.mateo
                   ` (18 preceding siblings ...)
  2014-07-11 15:49 ` [PATCH 19/42] drm/i915/bdw: GEN-specific logical ring emit flush oscar.mateo
@ 2014-07-11 15:49 ` oscar.mateo
  2014-07-11 15:49 ` [PATCH 21/42] drm/i915/bdw: Ring idle and stop " oscar.mateo
                   ` (22 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: oscar.mateo @ 2014-07-11 15:49 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

On a previous iteration of this patch, I created an Execlists
version of __i915_add_request and abstracted it away as a
vfunc. Daniel Vetter wondered then why that was needed:

"with the clean split in command submission I expect every
function to know wether it'll submit to an lrc (everything in
intel_lrc.c) or wether it'll submit to a legacy ring (existing
code), so I don't see a need for an add_request vfunc."

The honest, hairy truth is that this patch is the glue keeping
the whole logical ring puzzle together:

- i915_add_request is used by intel_ring_idle, which in turn is
  used by i915_gpu_idle, which in turn is used in several places
  inside the eviction and gtt codes.
- Also, it is used by i915_gem_check_olr, which is littered all
  over i915_gem.c
- ...

If I were to duplicate all the code that directly or indirectly
uses __i915_add_request, I'd end up creating a separate driver.

To show the differences between the existing legacy version and
the new Execlists one, this time I have special-cased
__i915_add_request instead of adding an add_request vfunc. I
hope this helps to untangle this Gordian knot.
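
In a nutshell, the special-casing boils down to picking the right
ringbuffer (and the matching flush/emit helpers) at the top of
__i915_add_request(); condensed, the pattern used below is:

	struct intel_ringbuffer *ringbuf;

	if (i915.enable_execlists)
		ringbuf = request->ctx->engine[ring->id].ringbuf;
	else
		ringbuf = ring->buffer;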

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c  | 72 +++++++++++++++++++++++++++++-----------
 drivers/gpu/drm/i915/intel_lrc.c | 30 ++++++++++++++---
 drivers/gpu/drm/i915/intel_lrc.h |  1 +
 3 files changed, 80 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 9560b40..1c83b9c 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2327,10 +2327,21 @@ int __i915_add_request(struct intel_engine_cs *ring,
 {
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
 	struct drm_i915_gem_request *request;
+	struct intel_ringbuffer *ringbuf;
 	u32 request_ring_position, request_start;
 	int ret;
 
-	request_start = intel_ring_get_tail(ring->buffer);
+	request = ring->preallocated_lazy_request;
+	if (WARN_ON(request == NULL))
+		return -ENOMEM;
+
+	if (i915.enable_execlists) {
+		struct intel_context *ctx = request->ctx;
+		ringbuf = ctx->engine[ring->id].ringbuf;
+	} else
+		ringbuf = ring->buffer;
+
+	request_start = intel_ring_get_tail(ringbuf);
 	/*
 	 * Emit any outstanding flushes - execbuf can fail to emit the flush
 	 * after having emitted the batchbuffer command. Hence we need to fix
@@ -2338,24 +2349,32 @@ int __i915_add_request(struct intel_engine_cs *ring,
 	 * is that the flush _must_ happen before the next request, no matter
 	 * what.
 	 */
-	ret = intel_ring_flush_all_caches(ring);
-	if (ret)
-		return ret;
-
-	request = ring->preallocated_lazy_request;
-	if (WARN_ON(request == NULL))
-		return -ENOMEM;
+	if (i915.enable_execlists) {
+		ret = logical_ring_flush_all_caches(ringbuf);
+		if (ret)
+			return ret;
+	} else {
+		ret = intel_ring_flush_all_caches(ring);
+		if (ret)
+			return ret;
+	}
 
 	/* Record the position of the start of the request so that
 	 * should we detect the updated seqno part-way through the
 	 * GPU processing the request, we never over-estimate the
 	 * position of the head.
 	 */
-	request_ring_position = intel_ring_get_tail(ring->buffer);
+	request_ring_position = intel_ring_get_tail(ringbuf);
 
-	ret = ring->add_request(ring);
-	if (ret)
-		return ret;
+	if (i915.enable_execlists) {
+		ret = ring->emit_request(ringbuf);
+		if (ret)
+			return ret;
+	} else {
+		ret = ring->add_request(ring);
+		if (ret)
+			return ret;
+	}
 
 	request->seqno = intel_ring_get_seqno(ring);
 	request->ring = ring;
@@ -2370,12 +2389,14 @@ int __i915_add_request(struct intel_engine_cs *ring,
 	 */
 	request->batch_obj = obj;
 
-	/* Hold a reference to the current context so that we can inspect
-	 * it later in case a hangcheck error event fires.
-	 */
-	request->ctx = ring->last_context;
-	if (request->ctx)
-		i915_gem_context_reference(request->ctx);
+	if (!i915.enable_execlists) {
+		/* Hold a reference to the current context so that we can inspect
+		 * it later in case a hangcheck error event fires.
+		 */
+		request->ctx = ring->last_context;
+		if (request->ctx)
+			i915_gem_context_reference(request->ctx);
+	}
 
 	request->emitted_jiffies = jiffies;
 	list_add_tail(&request->list, &ring->request_list);
@@ -2630,6 +2651,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 
 	while (!list_empty(&ring->request_list)) {
 		struct drm_i915_gem_request *request;
+		struct intel_ringbuffer *ringbuf;
 
 		request = list_first_entry(&ring->request_list,
 					   struct drm_i915_gem_request,
@@ -2639,12 +2661,24 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 			break;
 
 		trace_i915_gem_request_retire(ring, request->seqno);
+
+		/* This is one of the few common intersection points
+		 * between legacy ringbuffer submission and execlists:
+		 * we need to tell them apart in order to find the correct
+		 * ringbuffer to which the request belongs to.
+		 */
+		if (i915.enable_execlists) {
+			struct intel_context *ctx = request->ctx;
+			ringbuf = ctx->engine[ring->id].ringbuf;
+		} else
+			ringbuf = ring->buffer;
+
 		/* We know the GPU must have read the request to have
 		 * sent us the seqno + interrupt, so use the position
 		 * of tail of the request to update the last known position
 		 * of the GPU head.
 		 */
-		ring->buffer->last_retired_head = request->tail;
+		ringbuf->last_retired_head = request->tail;
 
 		i915_gem_free_request(request);
 	}
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index e25cbff..c89e0be 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -106,6 +106,22 @@ void intel_logical_ring_stop(struct intel_engine_cs *ring)
 	/* TODO */
 }
 
+int logical_ring_flush_all_caches(struct intel_ringbuffer *ringbuf)
+{
+	struct intel_engine_cs *ring = ringbuf->ring;
+	int ret;
+
+	if (!ring->gpu_caches_dirty)
+		return 0;
+
+	ret = ring->emit_flush(ringbuf, 0, I915_GEM_GPU_DOMAINS);
+	if (ret)
+		return ret;
+
+	ring->gpu_caches_dirty = false;
+	return 0;
+}
+
 void intel_logical_ring_advance_and_submit(struct intel_ringbuffer *ringbuf)
 {
 	intel_logical_ring_advance(ringbuf);
@@ -116,7 +132,8 @@ void intel_logical_ring_advance_and_submit(struct intel_ringbuffer *ringbuf)
 	/* TODO: how to submit a context to the ELSP is not here yet */
 }
 
-static int logical_ring_alloc_seqno(struct intel_engine_cs *ring)
+static int logical_ring_alloc_seqno(struct intel_engine_cs *ring,
+				    struct intel_context *ctx)
 {
 	if (ring->outstanding_lazy_seqno)
 		return 0;
@@ -128,6 +145,13 @@ static int logical_ring_alloc_seqno(struct intel_engine_cs *ring)
 		if (request == NULL)
 			return -ENOMEM;
 
+		/* Hold a reference to the context this request belongs to
+		 * (we will need it when the time comes to emit/retire the
+		 * request).
+		 */
+		request->ctx = ctx;
+		i915_gem_context_reference(request->ctx);
+
 		ring->preallocated_lazy_request = request;
 	}
 
@@ -165,8 +189,6 @@ static int logical_ring_wait_request(struct intel_ringbuffer *ringbuf, int bytes
 	if (ret)
 		return ret;
 
-	/* TODO: make sure we update the right ringbuffer's last_retired_head
-	 * when retiring requests */
 	i915_gem_retire_requests_ring(ring);
 	ringbuf->head = ringbuf->last_retired_head;
 	ringbuf->last_retired_head = -1;
@@ -291,7 +313,7 @@ int intel_logical_ring_begin(struct intel_ringbuffer *ringbuf, int num_dwords)
 		return ret;
 
 	/* Preallocate the olr before touching the ring */
-	ret = logical_ring_alloc_seqno(ring);
+	ret = logical_ring_alloc_seqno(ring, ringbuf->ctx);
 	if (ret)
 		return ret;
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 16798b6..696e09e 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -29,6 +29,7 @@ void intel_logical_ring_stop(struct intel_engine_cs *ring);
 void intel_logical_ring_cleanup(struct intel_engine_cs *ring);
 int intel_logical_rings_init(struct drm_device *dev);
 
+int logical_ring_flush_all_caches(struct intel_ringbuffer *ringbuf);
 void intel_logical_ring_advance_and_submit(struct intel_ringbuffer *ringbuf);
 static inline void intel_logical_ring_advance(struct intel_ringbuffer *ringbuf)
 {
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 21/42] drm/i915/bdw: Ring idle and stop with logical rings
  2014-07-11 15:48 [PATCH 00/42] Execlists v4 oscar.mateo
                   ` (19 preceding siblings ...)
  2014-07-11 15:49 ` [PATCH 20/42] drm/i915/bdw: Emission of requests with logical rings oscar.mateo
@ 2014-07-11 15:49 ` oscar.mateo
  2014-07-11 15:49 ` [PATCH 22/42] drm/i915/bdw: Interrupts " oscar.mateo
                   ` (21 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: oscar.mateo @ 2014-07-11 15:49 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

This is a hard one, since there is no direct hardware ring to
control when in Execlists.

We reuse intel_ring_idle here, but it should be fine as long
as i915_add_request does the right thing.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index c89e0be..39ac8b1 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -103,7 +103,24 @@ int intel_execlists_submission(struct drm_device *dev, struct drm_file *file,
 
 void intel_logical_ring_stop(struct intel_engine_cs *ring)
 {
-	/* TODO */
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	int ret;
+
+	if (!intel_ring_initialized(ring))
+		return;
+
+	ret = intel_ring_idle(ring);
+	if (ret && !i915_reset_in_progress(&to_i915(ring->dev)->gpu_error))
+		DRM_ERROR("failed to quiesce %s whilst cleaning up: %d\n",
+			  ring->name, ret);
+
+	/* TODO: Is this correct with Execlists enabled? */
+	I915_WRITE_MODE(ring, _MASKED_BIT_ENABLE(STOP_RING));
+	if (wait_for_atomic((I915_READ_MODE(ring) & MODE_IDLE) != 0, 1000)) {
+		DRM_ERROR("%s :timed out trying to stop ring\n", ring->name);
+		return;
+	}
+	I915_WRITE_MODE(ring, _MASKED_BIT_DISABLE(STOP_RING));
 }
 
 int logical_ring_flush_all_caches(struct intel_ringbuffer *ringbuf)
@@ -475,10 +492,13 @@ static int gen8_emit_request(struct intel_ringbuffer *ringbuf)
 
 void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
 {
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+
 	if (!intel_ring_initialized(ring))
 		return;
 
-	/* TODO: make sure the ring is stopped */
+	intel_logical_ring_stop(ring);
+	WARN_ON((I915_READ_MODE(ring) & MODE_IDLE) == 0);
 	ring->preallocated_lazy_request = NULL;
 	ring->outstanding_lazy_seqno = 0;
 
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 22/42] drm/i915/bdw: Interrupts with logical rings
  2014-07-11 15:48 [PATCH 00/42] Execlists v4 oscar.mateo
                   ` (20 preceding siblings ...)
  2014-07-11 15:49 ` [PATCH 21/42] drm/i915/bdw: Ring idle and stop " oscar.mateo
@ 2014-07-11 15:49 ` oscar.mateo
  2014-07-11 15:49 ` [PATCH 23/42] drm/i915/bdw: GEN-specific logical ring emit batchbuffer start oscar.mateo
                   ` (20 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: oscar.mateo @ 2014-07-11 15:49 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

We need to attend to context switch interrupts from all rings. Also, fix
the IMR/IER writes and program HWSTAM at ring init time.

Notice that, if simply added to irq_enable_mask, the context switch
interrupts would be incorrectly masked out whenever user interrupts are
disabled because no one is waiting on a sequence number. Therefore,
this commit adds a separate bitmask of interrupts that must be kept
unmasked at all times.
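
The get/put functions below therefore maintain the following invariant
(sketch only, reusing the names from the diff):

    if (ring->irq_refcount > 0)     /* somebody is waiting on a seqno */
            imr = ~(ring->irq_enable_mask | ring->irq_keep_mask);
    else                            /* user interrupts get masked... */
            imr = ~ring->irq_keep_mask;  /* ...keep_mask bits never do */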

v2: Disable HWSTAM, as suggested by Damien (nobody listens to these interrupts,
anyway).

v3: Add new get/put_irq functions.

Signed-off-by: Thomas Daniel <thomas.daniel@intel.com> (v1)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2 & v3)
---
 drivers/gpu/drm/i915/i915_irq.c         | 19 +++++++++--
 drivers/gpu/drm/i915/i915_reg.h         |  3 ++
 drivers/gpu/drm/i915/intel_lrc.c        | 58 +++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.h |  1 +
 4 files changed, 78 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 026f0a3..7bd44e2 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1643,6 +1643,8 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
 				notify_ring(dev, &dev_priv->ring[RCS]);
 			if (bcs & GT_RENDER_USER_INTERRUPT)
 				notify_ring(dev, &dev_priv->ring[BCS]);
+			if ((rcs | bcs) & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
+				DRM_DEBUG_DRIVER("TODO: Context switch\n");
 		} else
 			DRM_ERROR("The master control interrupt lied (GT0)!\n");
 	}
@@ -1655,9 +1657,13 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
 			vcs = tmp >> GEN8_VCS1_IRQ_SHIFT;
 			if (vcs & GT_RENDER_USER_INTERRUPT)
 				notify_ring(dev, &dev_priv->ring[VCS]);
+			if (vcs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
+				DRM_DEBUG_DRIVER("TODO: Context switch\n");
 			vcs = tmp >> GEN8_VCS2_IRQ_SHIFT;
 			if (vcs & GT_RENDER_USER_INTERRUPT)
 				notify_ring(dev, &dev_priv->ring[VCS2]);
+			if (vcs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
+				DRM_DEBUG_DRIVER("TODO: Context switch\n");
 		} else
 			DRM_ERROR("The master control interrupt lied (GT1)!\n");
 	}
@@ -1681,6 +1687,8 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
 			vcs = tmp >> GEN8_VECS_IRQ_SHIFT;
 			if (vcs & GT_RENDER_USER_INTERRUPT)
 				notify_ring(dev, &dev_priv->ring[VECS]);
+			if (vcs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
+				DRM_DEBUG_DRIVER("TODO: Context switch\n");
 		} else
 			DRM_ERROR("The master control interrupt lied (GT3)!\n");
 	}
@@ -3768,12 +3776,17 @@ static void gen8_gt_irq_postinstall(struct drm_i915_private *dev_priv)
 	/* These are interrupts we'll toggle with the ring mask register */
 	uint32_t gt_interrupts[] = {
 		GT_RENDER_USER_INTERRUPT << GEN8_RCS_IRQ_SHIFT |
+			GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_RCS_IRQ_SHIFT |
 			GT_RENDER_L3_PARITY_ERROR_INTERRUPT |
-			GT_RENDER_USER_INTERRUPT << GEN8_BCS_IRQ_SHIFT,
+			GT_RENDER_USER_INTERRUPT << GEN8_BCS_IRQ_SHIFT |
+			GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_BCS_IRQ_SHIFT,
 		GT_RENDER_USER_INTERRUPT << GEN8_VCS1_IRQ_SHIFT |
-			GT_RENDER_USER_INTERRUPT << GEN8_VCS2_IRQ_SHIFT,
+			GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS1_IRQ_SHIFT |
+			GT_RENDER_USER_INTERRUPT << GEN8_VCS2_IRQ_SHIFT |
+			GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS2_IRQ_SHIFT,
 		0,
-		GT_RENDER_USER_INTERRUPT << GEN8_VECS_IRQ_SHIFT
+		GT_RENDER_USER_INTERRUPT << GEN8_VECS_IRQ_SHIFT |
+			GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VECS_IRQ_SHIFT
 		};
 
 	for (i = 0; i < ARRAY_SIZE(gt_interrupts); i++)
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 4df121b..1138b46 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -1062,6 +1062,7 @@ enum punit_power_well {
 #define RING_ACTHD_UDW(base)	((base)+0x5c)
 #define RING_NOPID(base)	((base)+0x94)
 #define RING_IMR(base)		((base)+0xa8)
+#define RING_HWSTAM(base)	((base)+0x98)
 #define RING_TIMESTAMP(base)	((base)+0x358)
 #define   TAIL_ADDR		0x001FFFF8
 #define   HEAD_WRAP_COUNT	0xFFE00000
@@ -4590,6 +4591,8 @@ enum punit_power_well {
 #define GEN8_GT_IIR(which) (0x44308 + (0x10 * (which)))
 #define GEN8_GT_IER(which) (0x4430c + (0x10 * (which)))
 
+#define GEN8_GT_CONTEXT_SWITCH_INTERRUPT	(1 <<  8)
+
 #define GEN8_BCS_IRQ_SHIFT 16
 #define GEN8_RCS_IRQ_SHIFT 0
 #define GEN8_VCS2_IRQ_SHIFT 16
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 39ac8b1..9f45043 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -343,6 +343,9 @@ static int gen8_init_common_ring(struct intel_engine_cs *ring)
 	struct drm_device *dev = ring->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
+	I915_WRITE_IMR(ring, ~(ring->irq_enable_mask | ring->irq_keep_mask));
+	I915_WRITE(RING_HWSTAM(ring->mmio_base), 0xffffffff);
+
 	I915_WRITE(RING_MODE_GEN7(ring),
 		_MASKED_BIT_DISABLE(GFX_REPLAY_MODE) |
 		_MASKED_BIT_ENABLE(GFX_RUN_LIST_ENABLE));
@@ -381,6 +384,39 @@ static int gen8_init_render_ring(struct intel_engine_cs *ring)
 	return ret;
 }
 
+static bool gen8_logical_ring_get_irq(struct intel_engine_cs *ring)
+{
+	struct drm_device *dev = ring->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	unsigned long flags;
+
+	if (!dev->irq_enabled)
+		return false;
+
+	spin_lock_irqsave(&dev_priv->irq_lock, flags);
+	if (ring->irq_refcount++ == 0) {
+		I915_WRITE_IMR(ring, ~(ring->irq_enable_mask | ring->irq_keep_mask));
+		POSTING_READ(RING_IMR(ring->mmio_base));
+	}
+	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
+
+	return true;
+}
+
+static void gen8_logical_ring_put_irq(struct intel_engine_cs *ring)
+{
+	struct drm_device *dev = ring->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev_priv->irq_lock, flags);
+	if (--ring->irq_refcount == 0) {
+		I915_WRITE_IMR(ring, ~ring->irq_keep_mask);
+		POSTING_READ(RING_IMR(ring->mmio_base));
+	}
+	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
+}
+
 static int gen8_emit_flush(struct intel_ringbuffer *ringbuf,
 			   u32 invalidate_domains,
 			   u32 unused)
@@ -562,6 +598,10 @@ static int logical_render_ring_init(struct drm_device *dev)
 	ring->mmio_base = RENDER_RING_BASE;
 	ring->irq_enable_mask =
 		GT_RENDER_USER_INTERRUPT << GEN8_RCS_IRQ_SHIFT;
+	ring->irq_keep_mask =
+		GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_RCS_IRQ_SHIFT;
+	if (HAS_L3_DPF(dev))
+		ring->irq_keep_mask |= GT_RENDER_L3_PARITY_ERROR_INTERRUPT;
 
 	ring->init = gen8_init_render_ring;
 	ring->cleanup = intel_fini_pipe_control;
@@ -569,6 +609,8 @@ static int logical_render_ring_init(struct drm_device *dev)
 	ring->set_seqno = gen8_set_seqno;
 	ring->emit_request = gen8_emit_request;
 	ring->emit_flush = gen8_emit_flush_render;
+	ring->irq_get = gen8_logical_ring_get_irq;
+	ring->irq_put = gen8_logical_ring_put_irq;
 
 	return logical_ring_init(dev, ring);
 }
@@ -583,12 +625,16 @@ static int logical_bsd_ring_init(struct drm_device *dev)
 	ring->mmio_base = GEN6_BSD_RING_BASE;
 	ring->irq_enable_mask =
 		GT_RENDER_USER_INTERRUPT << GEN8_VCS1_IRQ_SHIFT;
+	ring->irq_keep_mask =
+		GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS1_IRQ_SHIFT;
 
 	ring->init = gen8_init_common_ring;
 	ring->get_seqno = gen8_get_seqno;
 	ring->set_seqno = gen8_set_seqno;
 	ring->emit_request = gen8_emit_request;
 	ring->emit_flush = gen8_emit_flush;
+	ring->irq_get = gen8_logical_ring_get_irq;
+	ring->irq_put = gen8_logical_ring_put_irq;
 
 	return logical_ring_init(dev, ring);
 }
@@ -603,12 +649,16 @@ static int logical_bsd2_ring_init(struct drm_device *dev)
 	ring->mmio_base = GEN8_BSD2_RING_BASE;
 	ring->irq_enable_mask =
 		GT_RENDER_USER_INTERRUPT << GEN8_VCS2_IRQ_SHIFT;
+	ring->irq_keep_mask =
+		GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS2_IRQ_SHIFT;
 
 	ring->init = gen8_init_common_ring;
 	ring->get_seqno = gen8_get_seqno;
 	ring->set_seqno = gen8_set_seqno;
 	ring->emit_request = gen8_emit_request;
 	ring->emit_flush = gen8_emit_flush;
+	ring->irq_get = gen8_logical_ring_get_irq;
+	ring->irq_put = gen8_logical_ring_put_irq;
 
 	return logical_ring_init(dev, ring);
 }
@@ -623,12 +673,16 @@ static int logical_blt_ring_init(struct drm_device *dev)
 	ring->mmio_base = BLT_RING_BASE;
 	ring->irq_enable_mask =
 		GT_RENDER_USER_INTERRUPT << GEN8_BCS_IRQ_SHIFT;
+	ring->irq_keep_mask =
+		GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_BCS_IRQ_SHIFT;
 
 	ring->init = gen8_init_common_ring;
 	ring->get_seqno = gen8_get_seqno;
 	ring->set_seqno = gen8_set_seqno;
 	ring->emit_request = gen8_emit_request;
 	ring->emit_flush = gen8_emit_flush;
+	ring->irq_get = gen8_logical_ring_get_irq;
+	ring->irq_put = gen8_logical_ring_put_irq;
 
 	return logical_ring_init(dev, ring);
 }
@@ -643,12 +697,16 @@ static int logical_vebox_ring_init(struct drm_device *dev)
 	ring->mmio_base = VEBOX_RING_BASE;
 	ring->irq_enable_mask =
 		GT_RENDER_USER_INTERRUPT << GEN8_VECS_IRQ_SHIFT;
+	ring->irq_keep_mask =
+		GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VECS_IRQ_SHIFT;
 
 	ring->init = gen8_init_common_ring;
 	ring->get_seqno = gen8_get_seqno;
 	ring->set_seqno = gen8_set_seqno;
 	ring->emit_request = gen8_emit_request;
 	ring->emit_flush = gen8_emit_flush;
+	ring->irq_get = gen8_logical_ring_get_irq;
+	ring->irq_put = gen8_logical_ring_put_irq;
 
 	return logical_ring_init(dev, ring);
 }
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 6e22866..09102b2 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -223,6 +223,7 @@ struct  intel_engine_cs {
 	} semaphore;
 
 	/* Execlists */
+	u32             irq_keep_mask; /* bitmask for interrupts that should not be masked */
 	int		(*emit_request)(struct intel_ringbuffer *ringbuf);
 	int		(*emit_flush)(struct intel_ringbuffer *ringbuf,
 				      u32 invalidate_domains,
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 23/42] drm/i915/bdw: GEN-specific logical ring emit batchbuffer start
  2014-07-11 15:48 [PATCH 00/42] Execlists v4 oscar.mateo
                   ` (21 preceding siblings ...)
  2014-07-11 15:49 ` [PATCH 22/42] drm/i915/bdw: Interrupts " oscar.mateo
@ 2014-07-11 15:49 ` oscar.mateo
  2014-07-11 15:49 ` [PATCH 24/42] drm/i915/bdw: Workload submission mechanism for Execlists oscar.mateo
                   ` (19 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: oscar.mateo @ 2014-07-11 15:49 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Dispatch_execbuffer's evil twin.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c        | 28 ++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.h |  2 ++
 2 files changed, 30 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 9f45043..f4b3834 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -384,6 +384,29 @@ static int gen8_init_render_ring(struct intel_engine_cs *ring)
 	return ret;
 }
 
+static int gen8_emit_bb_start(struct intel_ringbuffer *ringbuf,
+			      u64 offset, unsigned flags)
+{
+	struct intel_engine_cs *ring = ringbuf->ring;
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	bool ppgtt = dev_priv->mm.aliasing_ppgtt != NULL &&
+		!(flags & I915_DISPATCH_SECURE);
+	int ret;
+
+	ret = intel_logical_ring_begin(ringbuf, 4);
+	if (ret)
+		return ret;
+
+	/* FIXME(BDW): Address space and security selectors. */
+	intel_logical_ring_emit(ringbuf, MI_BATCH_BUFFER_START_GEN8 | (ppgtt<<8));
+	intel_logical_ring_emit(ringbuf, lower_32_bits(offset));
+	intel_logical_ring_emit(ringbuf, upper_32_bits(offset));
+	intel_logical_ring_emit(ringbuf, MI_NOOP);
+	intel_logical_ring_advance(ringbuf);
+
+	return 0;
+}
+
 static bool gen8_logical_ring_get_irq(struct intel_engine_cs *ring)
 {
 	struct drm_device *dev = ring->dev;
@@ -611,6 +634,7 @@ static int logical_render_ring_init(struct drm_device *dev)
 	ring->emit_flush = gen8_emit_flush_render;
 	ring->irq_get = gen8_logical_ring_get_irq;
 	ring->irq_put = gen8_logical_ring_put_irq;
+	ring->emit_bb_start = gen8_emit_bb_start;
 
 	return logical_ring_init(dev, ring);
 }
@@ -635,6 +659,7 @@ static int logical_bsd_ring_init(struct drm_device *dev)
 	ring->emit_flush = gen8_emit_flush;
 	ring->irq_get = gen8_logical_ring_get_irq;
 	ring->irq_put = gen8_logical_ring_put_irq;
+	ring->emit_bb_start = gen8_emit_bb_start;
 
 	return logical_ring_init(dev, ring);
 }
@@ -659,6 +684,7 @@ static int logical_bsd2_ring_init(struct drm_device *dev)
 	ring->emit_flush = gen8_emit_flush;
 	ring->irq_get = gen8_logical_ring_get_irq;
 	ring->irq_put = gen8_logical_ring_put_irq;
+	ring->emit_bb_start = gen8_emit_bb_start;
 
 	return logical_ring_init(dev, ring);
 }
@@ -683,6 +709,7 @@ static int logical_blt_ring_init(struct drm_device *dev)
 	ring->emit_flush = gen8_emit_flush;
 	ring->irq_get = gen8_logical_ring_get_irq;
 	ring->irq_put = gen8_logical_ring_put_irq;
+	ring->emit_bb_start = gen8_emit_bb_start;
 
 	return logical_ring_init(dev, ring);
 }
@@ -707,6 +734,7 @@ static int logical_vebox_ring_init(struct drm_device *dev)
 	ring->emit_flush = gen8_emit_flush;
 	ring->irq_get = gen8_logical_ring_get_irq;
 	ring->irq_put = gen8_logical_ring_put_irq;
+	ring->emit_bb_start = gen8_emit_bb_start;
 
 	return logical_ring_init(dev, ring);
 }
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 09102b2..c885d5c 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -228,6 +228,8 @@ struct  intel_engine_cs {
 	int		(*emit_flush)(struct intel_ringbuffer *ringbuf,
 				      u32 invalidate_domains,
 				      u32 flush_domains);
+	int		(*emit_bb_start)(struct intel_ringbuffer *ringbuf,
+					 u64 offset, unsigned flags);
 
 	/**
 	 * List of objects currently involved in rendering from the
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 24/42] drm/i915/bdw: Workload submission mechanism for Execlists
  2014-07-11 15:48 [PATCH 00/42] Execlists v4 oscar.mateo
                   ` (22 preceding siblings ...)
  2014-07-11 15:49 ` [PATCH 23/42] drm/i915/bdw: GEN-specific logical ring emit batchbuffer start oscar.mateo
@ 2014-07-11 15:49 ` oscar.mateo
  2014-07-11 15:49 ` [PATCH 25/42] drm/i915/bdw: Always use MMIO flips with Execlists oscar.mateo
                   ` (18 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: oscar.mateo @ 2014-07-11 15:49 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

This is what i915_gem_do_execbuffer calls when it wants to execute some
workload in an Execlists world.
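
The expected call site is a dispatch along these lines (an illustrative
sketch, assuming the execbuffer code keeps its VMA list in eb->vmas;
the actual wiring happens elsewhere in the series):

    if (i915.enable_execlists)
            ret = intel_execlists_submission(dev, file, ring, ctx, args,
                                             &eb->vmas, batch_obj,
                                             exec_start, flags);
    else
            ret = i915_gem_ringbuffer_submission(dev, file, ring, ctx, args,
                                                 &eb->vmas, batch_obj,
                                                 exec_start, flags);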

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h            |   6 ++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   4 +-
 drivers/gpu/drm/i915/intel_lrc.c           | 128 ++++++++++++++++++++++++++++-
 3 files changed, 135 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 24faec3..9a3adff 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2232,6 +2232,12 @@ int i915_gem_set_domain_ioctl(struct drm_device *dev, void *data,
 			      struct drm_file *file_priv);
 int i915_gem_sw_finish_ioctl(struct drm_device *dev, void *data,
 			     struct drm_file *file_priv);
+void i915_gem_execbuffer_move_to_active(struct list_head *vmas,
+					struct intel_engine_cs *ring);
+void i915_gem_execbuffer_retire_commands(struct drm_device *dev,
+					 struct drm_file *file,
+					 struct intel_engine_cs *ring,
+					 struct drm_i915_gem_object *obj);
 int i915_gem_ringbuffer_submission(struct drm_device *dev,
 				   struct drm_file *file,
 				   struct intel_engine_cs *ring,
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 491ef7e..41e6467 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -962,7 +962,7 @@ i915_gem_validate_context(struct drm_device *dev, struct drm_file *file,
 	return ctx;
 }
 
-static void
+void
 i915_gem_execbuffer_move_to_active(struct list_head *vmas,
 				   struct intel_engine_cs *ring)
 {
@@ -994,7 +994,7 @@ i915_gem_execbuffer_move_to_active(struct list_head *vmas,
 	}
 }
 
-static void
+void
 i915_gem_execbuffer_retire_commands(struct drm_device *dev,
 				    struct drm_file *file,
 				    struct intel_engine_cs *ring,
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index f4b3834..40fc671 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -89,6 +89,57 @@ int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists
 	return 0;
 }
 
+static int logical_ring_invalidate_all_caches(struct intel_ringbuffer *ringbuf)
+{
+	struct intel_engine_cs *ring = ringbuf->ring;
+	uint32_t flush_domains;
+	int ret;
+
+	flush_domains = 0;
+	if (ring->gpu_caches_dirty)
+		flush_domains = I915_GEM_GPU_DOMAINS;
+
+	ret = ring->emit_flush(ringbuf, I915_GEM_GPU_DOMAINS, flush_domains);
+	if (ret)
+		return ret;
+
+	ring->gpu_caches_dirty = false;
+	return 0;
+}
+
+static int execlists_move_to_gpu(struct intel_ringbuffer *ringbuf,
+				 struct list_head *vmas)
+{
+	struct intel_engine_cs *ring = ringbuf->ring;
+	struct i915_vma *vma;
+	uint32_t flush_domains = 0;
+	bool flush_chipset = false;
+	int ret;
+
+	list_for_each_entry(vma, vmas, exec_list) {
+		struct drm_i915_gem_object *obj = vma->obj;
+		ret = i915_gem_object_sync(obj, ring);
+		if (ret)
+			return ret;
+
+		if (obj->base.write_domain & I915_GEM_DOMAIN_CPU)
+			flush_chipset |= i915_gem_clflush_object(obj, false);
+
+		flush_domains |= obj->base.write_domain;
+	}
+
+	if (flush_chipset)
+		i915_gem_chipset_flush(ring->dev);
+
+	if (flush_domains & I915_GEM_DOMAIN_GTT)
+		wmb();
+
+	/* Unconditionally invalidate gpu caches and ensure that we do flush
+	 * any residual writes from the previous batch.
+	 */
+	return logical_ring_invalidate_all_caches(ringbuf);
+}
+
 int intel_execlists_submission(struct drm_device *dev, struct drm_file *file,
 			       struct intel_engine_cs *ring,
 			       struct intel_context *ctx,
@@ -97,7 +148,82 @@ int intel_execlists_submission(struct drm_device *dev, struct drm_file *file,
 			       struct drm_i915_gem_object *batch_obj,
 			       u64 exec_start, u32 flags)
 {
-	/* TODO */
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_ringbuffer *ringbuf = ctx->engine[ring->id].ringbuf;
+	int instp_mode;
+	u32 instp_mask;
+	int ret;
+
+	if (args->num_cliprects != 0) {
+		DRM_DEBUG("clip rectangles are only valid on pre-gen5\n");
+		return -EINVAL;
+	} else {
+		if (args->DR4 == 0xffffffff) {
+			DRM_DEBUG("UXA submitting garbage DR4, fixing up\n");
+			args->DR4 = 0;
+		}
+
+		if (args->DR1 || args->DR4 || args->cliprects_ptr) {
+			DRM_DEBUG("0 cliprects but dirt in cliprects fields\n");
+			return -EINVAL;
+		}
+	}
+
+	ret = execlists_move_to_gpu(ringbuf, vmas);
+	if (ret)
+		return ret;
+
+	instp_mode = args->flags & I915_EXEC_CONSTANTS_MASK;
+	instp_mask = I915_EXEC_CONSTANTS_MASK;
+	/* The HW changed the meaning on this bit on gen6 */
+	instp_mask &= ~I915_EXEC_CONSTANTS_REL_SURFACE;
+	switch (instp_mode) {
+	case I915_EXEC_CONSTANTS_REL_GENERAL:
+		break;
+
+	case I915_EXEC_CONSTANTS_ABSOLUTE:
+		if (ring != &dev_priv->ring[RCS]) {
+			DRM_DEBUG("non-0 rel constants mode on non-RCS\n");
+			return -EINVAL;
+		}
+		break;
+
+	case I915_EXEC_CONSTANTS_REL_SURFACE:
+		DRM_DEBUG("rel surface constants mode invalid on gen5+\n");
+		return -EINVAL;
+
+	default:
+		DRM_DEBUG("execbuf with unknown constants: %d\n", instp_mode);
+		return -EINVAL;
+	}
+
+	if (ring == &dev_priv->ring[RCS] &&
+			instp_mode != dev_priv->relative_constants_mode) {
+		ret = intel_logical_ring_begin(ringbuf, 4);
+		if (ret)
+			return ret;
+
+		intel_logical_ring_emit(ringbuf, MI_NOOP);
+		intel_logical_ring_emit(ringbuf, MI_LOAD_REGISTER_IMM(1));
+		intel_logical_ring_emit(ringbuf, INSTPM);
+		intel_logical_ring_emit(ringbuf, instp_mask << 16 | instp_mode);
+		intel_logical_ring_advance(ringbuf);
+
+		dev_priv->relative_constants_mode = instp_mode;
+	}
+
+	if (args->flags & I915_EXEC_GEN7_SOL_RESET) {
+		DRM_DEBUG("sol reset is gen7 only\n");
+		return -EINVAL;
+	}
+
+	ret = ring->emit_bb_start(ringbuf, exec_start, flags);
+	if (ret)
+		return ret;
+
+	i915_gem_execbuffer_move_to_active(vmas, ring);
+	i915_gem_execbuffer_retire_commands(dev, file, ring, batch_obj);
+
 	return 0;
 }
 
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 25/42] drm/i915/bdw: Always use MMIO flips with Execlists
  2014-07-11 15:48 [PATCH 00/42] Execlists v4 oscar.mateo
                   ` (23 preceding siblings ...)
  2014-07-11 15:49 ` [PATCH 24/42] drm/i915/bdw: Workload submission mechanism for Execlists oscar.mateo
@ 2014-07-11 15:49 ` oscar.mateo
  2014-07-11 15:49 ` [PATCH 26/42] drm/i915/bdw: Render state init for Execlists oscar.mateo
                   ` (17 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: oscar.mateo @ 2014-07-11 15:49 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

The normal flip function places things in the ring in the legacy
way, so we must either fix that or always force MMIO flips, as we do in
this patch.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/intel_display.c | 2 ++
 drivers/gpu/drm/i915/intel_lrc.c     | 3 ++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 72abc0b..f3ca991 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -9487,6 +9487,8 @@ static bool use_mmio_flip(struct intel_engine_cs *ring,
 		return false;
 	else if (i915.use_mmio_flip > 0)
 		return true;
+	else if (i915.enable_execlists)
+		return true;
 	else
 		return ring != obj->ring;
 }
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 40fc671..332dac2 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -83,7 +83,8 @@ int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists
 	if (enable_execlists == 0)
 		return 0;
 
-	if (HAS_LOGICAL_RING_CONTEXTS(dev) && USES_PPGTT(dev))
+	if (HAS_LOGICAL_RING_CONTEXTS(dev) && USES_PPGTT(dev) &&
+			i915.use_mmio_flip >= 0)
 		return 1;
 
 	return 0;
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 26/42] drm/i915/bdw: Render state init for Execlists
  2014-07-11 15:48 [PATCH 00/42] Execlists v4 oscar.mateo
                   ` (24 preceding siblings ...)
  2014-07-11 15:49 ` [PATCH 25/42] drm/i915/bdw: Always use MMIO flips with Execlists oscar.mateo
@ 2014-07-11 15:49 ` oscar.mateo
  2014-07-11 15:49 ` [PATCH 27/42] drm/i915/bdw: Implement context switching (somewhat) oscar.mateo
                   ` (16 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: oscar.mateo @ 2014-07-11 15:49 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

The batchbuffer that sets the render context state is submitted
in a different way, and from different places.

We needed to make both the render state preparation and free functions
externally accessible, and namespace them accordingly. This mess is so
that all LR, LRC and Execlists functionality can go together in
intel_lrc.c: we can fix all of this later on, once the interfaces are
clear.

v2: Create a separate ctx->rcs_initialized for the Execlists case, as
suggested by Chris Wilson.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h              |  4 +--
 drivers/gpu/drm/i915/i915_gem_context.c      | 17 +++++++++-
 drivers/gpu/drm/i915/i915_gem_render_state.c | 40 ++++++++++++++---------
 drivers/gpu/drm/i915/i915_gem_render_state.h | 47 ++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.c             | 46 +++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.h             |  2 ++
 drivers/gpu/drm/i915/intel_renderstate.h     |  8 +----
 7 files changed, 139 insertions(+), 25 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_gem_render_state.h

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 9a3adff..9e3e38e 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -37,6 +37,7 @@
 #include "intel_ringbuffer.h"
 #include "intel_lrc.h"
 #include "i915_gem_gtt.h"
+#include "i915_gem_render_state.h"
 #include <linux/io-mapping.h>
 #include <linux/i2c.h>
 #include <linux/i2c-algo-bit.h>
@@ -619,6 +620,7 @@ struct intel_context {
 	} legacy_hw_ctx;
 
 	/* Execlists */
+	bool rcs_initialized;
 	struct {
 		struct drm_i915_gem_object *state;
 		struct intel_ringbuffer *ringbuf;
@@ -2546,8 +2548,6 @@ int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
 int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
 				   struct drm_file *file);
 
-/* i915_gem_render_state.c */
-int i915_gem_render_state_init(struct intel_engine_cs *ring);
 /* i915_gem_evict.c */
 int __must_check i915_gem_evict_something(struct drm_device *dev,
 					  struct i915_address_space *vm,
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 9085ff1..0dc6992 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -513,8 +513,23 @@ int i915_gem_context_enable(struct drm_i915_private *dev_priv)
 		ppgtt->enable(ppgtt);
 	}
 
-	if (i915.enable_execlists)
+	if (i915.enable_execlists) {
+		struct intel_context *dctx;
+
+		ring = &dev_priv->ring[RCS];
+		dctx = ring->default_context;
+
+		if (!dctx->rcs_initialized) {
+			ret = intel_lr_context_render_state_init(ring, dctx);
+			if (ret) {
+				DRM_ERROR("Init render state failed: %d\n", ret);
+				return ret;
+			}
+			dctx->rcs_initialized = true;
+		}
+
 		return 0;
+	}
 
 	/* FIXME: We should make this work, even in reset */
 	if (i915_reset_in_progress(&dev_priv->gpu_error))
diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.c b/drivers/gpu/drm/i915/i915_gem_render_state.c
index e60be3f..a9a62d7 100644
--- a/drivers/gpu/drm/i915/i915_gem_render_state.c
+++ b/drivers/gpu/drm/i915/i915_gem_render_state.c
@@ -28,13 +28,6 @@
 #include "i915_drv.h"
 #include "intel_renderstate.h"
 
-struct render_state {
-	const struct intel_renderstate_rodata *rodata;
-	struct drm_i915_gem_object *obj;
-	u64 ggtt_offset;
-	int gen;
-};
-
 static const struct intel_renderstate_rodata *
 render_state_get_rodata(struct drm_device *dev, const int gen)
 {
@@ -127,30 +120,47 @@ static int render_state_setup(struct render_state *so)
 	return 0;
 }
 
-static void render_state_fini(struct render_state *so)
+void i915_gem_render_state_fini(struct render_state *so)
 {
 	i915_gem_object_ggtt_unpin(so->obj);
 	drm_gem_object_unreference(&so->obj->base);
 }
 
-int i915_gem_render_state_init(struct intel_engine_cs *ring)
+int i915_gem_render_state_prepare(struct intel_engine_cs *ring,
+				  struct render_state *so)
 {
-	struct render_state so;
 	int ret;
 
 	if (WARN_ON(ring->id != RCS))
 		return -ENOENT;
 
-	ret = render_state_init(&so, ring->dev);
+	ret = render_state_init(so, ring->dev);
 	if (ret)
 		return ret;
 
-	if (so.rodata == NULL)
+	if (so->rodata == NULL)
 		return 0;
 
-	ret = render_state_setup(&so);
+	ret = render_state_setup(so);
+	if (ret) {
+		i915_gem_render_state_fini(so);
+		return ret;
+	}
+
+	return 0;
+}
+
+int i915_gem_render_state_init(struct intel_engine_cs *ring)
+{
+	struct render_state so;
+	int ret;
+
+	ret = i915_gem_render_state_prepare(ring, &so);
 	if (ret)
-		goto out;
+		return ret;
+
+	if (so.rodata == NULL)
+		return 0;
 
 	ret = ring->dispatch_execbuffer(ring,
 					so.ggtt_offset,
@@ -164,6 +174,6 @@ int i915_gem_render_state_init(struct intel_engine_cs *ring)
 	ret = __i915_add_request(ring, NULL, so.obj, NULL);
 	/* __i915_add_request moves object to inactive if it fails */
 out:
-	render_state_fini(&so);
+	i915_gem_render_state_fini(&so);
 	return ret;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.h b/drivers/gpu/drm/i915/i915_gem_render_state.h
new file mode 100644
index 0000000..c44961e
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_gem_render_state.h
@@ -0,0 +1,47 @@
+/*
+ * Copyright © 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef _I915_GEM_RENDER_STATE_H_
+#define _I915_GEM_RENDER_STATE_H_
+
+#include <linux/types.h>
+
+struct intel_renderstate_rodata {
+	const u32 *reloc;
+	const u32 *batch;
+	const u32 batch_items;
+};
+
+struct render_state {
+	const struct intel_renderstate_rodata *rodata;
+	struct drm_i915_gem_object *obj;
+	u64 ggtt_offset;
+	int gen;
+};
+
+int i915_gem_render_state_init(struct intel_engine_cs *ring);
+void i915_gem_render_state_fini(struct render_state *so);
+int i915_gem_render_state_prepare(struct intel_engine_cs *ring,
+				  struct render_state *so);
+
+#endif /* _I915_GEM_RENDER_STATE_H_ */
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 332dac2..27466df 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -919,6 +919,37 @@ cleanup_render_ring:
 	return ret;
 }
 
+int intel_lr_context_render_state_init(struct intel_engine_cs *ring,
+				       struct intel_context *ctx)
+{
+	struct intel_ringbuffer *ringbuf = ctx->engine[ring->id].ringbuf;
+	struct render_state so;
+	struct drm_i915_file_private *file_priv = ctx->file_priv;
+	struct drm_file *file = file_priv? file_priv->file : NULL;
+	int ret;
+
+	ret = i915_gem_render_state_prepare(ring, &so);
+	if (ret)
+		return ret;
+
+	if (so.rodata == NULL)
+		return 0;
+
+	ret = ring->emit_bb_start(ringbuf,
+			so.ggtt_offset,
+			I915_DISPATCH_SECURE);
+	if (ret)
+		goto out;
+
+	i915_vma_move_to_active(i915_gem_obj_to_ggtt(so.obj), ring);
+
+	ret = __i915_add_request(ring, file, so.obj, NULL);
+	/* __i915_add_request moves object to inactive if it fails */
+out:
+	i915_gem_render_state_fini(&so);
+	return ret;
+}
+
 static int
 populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_obj,
 		    struct intel_engine_cs *ring, struct intel_ringbuffer *ringbuf)
@@ -1136,6 +1167,21 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 	ctx->engine[ring->id].ringbuf = ringbuf;
 	ctx->engine[ring->id].state = ctx_obj;
 
+	/* The default context will have to wait, because we are not yet
+	 * ready to send a batchbuffer at this point */
+	if (ring->id == RCS && !ctx->rcs_initialized &&
+			ctx != ring->default_context) {
+		ret = intel_lr_context_render_state_init(ring, ctx);
+		if (ret) {
+			DRM_ERROR("Init render state failed: %d\n", ret);
+			ctx->engine[ring->id].ringbuf = NULL;
+			ctx->engine[ring->id].state = NULL;
+			intel_destroy_ringbuffer_obj(ringbuf);
+			goto error;
+		}
+		ctx->rcs_initialized = true;
+	}
+
 	return 0;
 
 error:
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 696e09e..f20c3d2 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -43,6 +43,8 @@ static inline void intel_logical_ring_emit(struct intel_ringbuffer *ringbuf, u32
 int intel_logical_ring_begin(struct intel_ringbuffer *ringbuf, int num_dwords);
 
 /* Logical Ring Contexts */
+int intel_lr_context_render_state_init(struct intel_engine_cs *ring,
+				       struct intel_context *ctx);
 void intel_lr_context_free(struct intel_context *ctx);
 int intel_lr_context_deferred_create(struct intel_context *ctx,
 				     struct intel_engine_cs *ring);
diff --git a/drivers/gpu/drm/i915/intel_renderstate.h b/drivers/gpu/drm/i915/intel_renderstate.h
index fd4f662..6c792d3 100644
--- a/drivers/gpu/drm/i915/intel_renderstate.h
+++ b/drivers/gpu/drm/i915/intel_renderstate.h
@@ -24,13 +24,7 @@
 #ifndef _INTEL_RENDERSTATE_H
 #define _INTEL_RENDERSTATE_H
 
-#include <linux/types.h>
-
-struct intel_renderstate_rodata {
-	const u32 *reloc;
-	const u32 *batch;
-	const u32 batch_items;
-};
+#include "i915_drv.h"
 
 extern const struct intel_renderstate_rodata gen6_null_state;
 extern const struct intel_renderstate_rodata gen7_null_state;
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 27/42] drm/i915/bdw: Implement context switching (somewhat)
  2014-07-11 15:48 [PATCH 00/42] Execlists v4 oscar.mateo
                   ` (25 preceding siblings ...)
  2014-07-11 15:49 ` [PATCH 26/42] drm/i915/bdw: Render state init for Execlists oscar.mateo
@ 2014-07-11 15:49 ` oscar.mateo
  2014-07-11 15:49 ` [PATCH 28/42] drm/i915/bdw: Write the tail pointer, LRC style oscar.mateo
                   ` (15 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: oscar.mateo @ 2014-07-11 15:49 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

A context switch occurs by submitting a context descriptor to the
ExecList Submission Port. Given that we can now initialize a context,
it's possible to begin implementing the context switch by creating the
descriptor and submitting it to ELSP (actually two, since the ELSP
has two ports).

The context object must be mapped in the GGTT, which means it must exist
in the 0-4GB graphics VA range.
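
As a worked example of the descriptor layout (assuming a context image
pinned at GGTT offset 0x54000, i.e. 4K aligned and below 4GB; macro
names from the diff below):

    lrca   = 0x00054000;
    ctx_id = lrca >> 12;                             /* 0x54 */
    desc   = GEN8_CTX_VALID |                        /* bit 0 */
             LEGACY_CONTEXT << GEN8_CTX_MODE_SHIFT | /* bits 4:3 */
             GEN8_CTX_L3LLC_COHERENT |               /* bit 5 */
             GEN8_CTX_PRIVILEGE |                    /* bit 8 */
             lrca |                                  /* bits 31:12 */
             (u64)ctx_id << GEN8_CTX_ID_SHIFT;       /* bits 51:32 */
    /* desc == 0x0000005400054129 */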

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

v2: This code has changed quite a lot in various rebases. Of particular
importance is that now we use the globally unique Submission ID to send
to the hardware. Also, context pages are now pinned unconditionally to
GGTT, so there is no need to bind them.

v3: Use LRCA[31:12] as hwCtxId[19:0]. This guarantees that the HW context
ID we submit to the ELSP is globally unique and != 0 (Bspec requirements
of the software use-only bits of the Context ID in the Context Descriptor
Format) without the hassle of the previous submission ID construction.
Also, re-add the ELSP posting read (it was dropped somewhere during the
rebases).

v4:
- Squash with "drm/i915/bdw: Add forcewake lock around ELSP writes" (BSPEC
  says: "SW must set Force Wakeup bit to prevent GT from entering C6 while
  ELSP writes are in progress") as noted by Thomas Daniel
  (thomas.daniel@intel.com).
- Rename functions and use an execlists/intel_execlists_ namespace.
- The BUG_ON only checked that the LRCA was <32 bits, but it didn't make
  sure that it was properly aligned. Spotted by Alistair Mcaulay
  <alistair.mcaulay@intel.com>.

v5:
- Improved source code comments as suggested by Chris Wilson.
- No need to abstract submit_ctx away, as pointed by Brad Volkin.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 116 ++++++++++++++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/intel_lrc.h |   1 +
 2 files changed, 115 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 27466df..db5f27b 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -47,6 +47,7 @@
 #define GEN8_LR_CONTEXT_ALIGN 4096
 
 #define RING_ELSP(ring)			((ring)->mmio_base+0x230)
+#define RING_EXECLIST_STATUS(ring)	((ring)->mmio_base+0x234)
 #define RING_CONTEXT_CONTROL(ring)	((ring)->mmio_base+0x244)
 
 #define CTX_LRI_HEADER_0		0x01
@@ -78,6 +79,26 @@
 #define CTX_R_PWR_CLK_STATE		0x42
 #define CTX_GPGPU_CSR_BASE_ADDRESS	0x44
 
+#define GEN8_CTX_VALID (1<<0)
+#define GEN8_CTX_FORCE_PD_RESTORE (1<<1)
+#define GEN8_CTX_FORCE_RESTORE (1<<2)
+#define GEN8_CTX_L3LLC_COHERENT (1<<5)
+#define GEN8_CTX_PRIVILEGE (1<<8)
+enum {
+	ADVANCED_CONTEXT=0,
+	LEGACY_CONTEXT,
+	ADVANCED_AD_CONTEXT,
+	LEGACY_64B_CONTEXT
+};
+#define GEN8_CTX_MODE_SHIFT 3
+enum {
+	FAULT_AND_HANG=0,
+	FAULT_AND_HALT, /* Debug only */
+	FAULT_AND_STREAM,
+	FAULT_AND_CONTINUE /* Unsupported */
+};
+#define GEN8_CTX_ID_SHIFT 32
+
 int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists)
 {
 	if (enable_execlists == 0)
@@ -90,6 +111,93 @@ int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists
 	return 0;
 }
 
+u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj)
+{
+	u32 lrca = i915_gem_obj_ggtt_offset(ctx_obj);
+
+	/* LRCA is required to be 4K aligned so the more significant 20 bits
+	 * are globally unique */
+	return lrca >> 12;
+}
+
+static uint64_t execlists_ctx_descriptor(struct drm_i915_gem_object *ctx_obj)
+{
+	uint64_t desc;
+	uint64_t lrca = i915_gem_obj_ggtt_offset(ctx_obj);
+	BUG_ON(lrca & 0xFFFFFFFF00000FFFULL);
+
+	desc = GEN8_CTX_VALID;
+	desc |= LEGACY_CONTEXT << GEN8_CTX_MODE_SHIFT;
+	desc |= GEN8_CTX_L3LLC_COHERENT;
+	desc |= GEN8_CTX_PRIVILEGE;
+	desc |= lrca;
+	desc |= (u64)intel_execlists_ctx_id(ctx_obj) << GEN8_CTX_ID_SHIFT;
+
+	/* TODO: WaDisableLiteRestore when we start using semaphore
+	 * signalling between Command Streamers */
+	/* desc |= GEN8_CTX_FORCE_RESTORE; */
+
+	return desc;
+}
+
+static void execlists_elsp_write(struct intel_engine_cs *ring,
+				 struct drm_i915_gem_object *ctx_obj0,
+				 struct drm_i915_gem_object *ctx_obj1)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	uint64_t temp = 0;
+	uint32_t desc[4];
+
+	/* XXX: You must always write both descriptors in the order below. */
+	if (ctx_obj1)
+		temp = execlists_ctx_descriptor(ctx_obj1);
+	else
+		temp = 0;
+	desc[1] = (u32)(temp >> 32);
+	desc[0] = (u32)temp;
+
+	temp = execlists_ctx_descriptor(ctx_obj0);
+	desc[3] = (u32)(temp >> 32);
+	desc[2] = (u32)temp;
+
+	/* Set Force Wakeup bit to prevent GT from entering C6 while
+	 * ELSP writes are in progress */
+	gen6_gt_force_wake_get(dev_priv, FORCEWAKE_ALL);
+
+	I915_WRITE(RING_ELSP(ring), desc[1]);
+	I915_WRITE(RING_ELSP(ring), desc[0]);
+	I915_WRITE(RING_ELSP(ring), desc[3]);
+	/* The context is automatically loaded after the following */
+	I915_WRITE(RING_ELSP(ring), desc[2]);
+
+	/* ELSP is a wo register, so use another nearby reg for posting instead */
+	POSTING_READ(RING_EXECLIST_STATUS(ring));
+
+	gen6_gt_force_wake_put(dev_priv, FORCEWAKE_ALL);
+}
+
+static int execlists_submit_context(struct intel_engine_cs *ring,
+				    struct intel_context *to0, u32 tail0,
+				    struct intel_context *to1, u32 tail1)
+{
+	struct drm_i915_gem_object *ctx_obj0;
+	struct drm_i915_gem_object *ctx_obj1 = NULL;
+
+	ctx_obj0 = to0->engine[ring->id].state;
+	BUG_ON(!ctx_obj0);
+	BUG_ON(!i915_gem_obj_is_pinned(ctx_obj0));
+
+	if (to1) {
+		ctx_obj1 = to1->engine[ring->id].state;
+		BUG_ON(!ctx_obj1);
+		BUG_ON(!i915_gem_obj_is_pinned(ctx_obj1));
+	}
+
+	execlists_elsp_write(ring, ctx_obj0, ctx_obj1);
+
+	return 0;
+}
+
 static int logical_ring_invalidate_all_caches(struct intel_ringbuffer *ringbuf)
 {
 	struct intel_engine_cs *ring = ringbuf->ring;
@@ -268,12 +376,16 @@ int logical_ring_flush_all_caches(struct intel_ringbuffer *ringbuf)
 
 void intel_logical_ring_advance_and_submit(struct intel_ringbuffer *ringbuf)
 {
+	struct intel_engine_cs *ring = ringbuf->ring;
+	struct intel_context *ctx = ringbuf->ctx;
+
 	intel_logical_ring_advance(ringbuf);
 
-	if (intel_ring_stopped(ringbuf->ring))
+	if (intel_ring_stopped(ring))
 		return;
 
-	/* TODO: how to submit a context to the ELSP is not here yet */
+	/* FIXME: too cheeky, we don't even check if the ELSP is ready */
+	execlists_submit_context(ring, ctx, ringbuf->tail, NULL, 0);
 }
 
 static int logical_ring_alloc_seqno(struct intel_engine_cs *ring,
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index f20c3d2..b59965b 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -58,5 +58,6 @@ int intel_execlists_submission(struct drm_device *dev, struct drm_file *file,
 			       struct list_head *vmas,
 			       struct drm_i915_gem_object *batch_obj,
 			       u64 exec_start, u32 flags);
+u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj);
 
 #endif /* _INTEL_LRC_H_ */
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 28/42] drm/i915/bdw: Write the tail pointer, LRC style
  2014-07-11 15:48 [PATCH 00/42] Execlists v4 oscar.mateo
                   ` (26 preceding siblings ...)
  2014-07-11 15:49 ` [PATCH 27/42] drm/i915/bdw: Implement context switching (somewhat) oscar.mateo
@ 2014-07-11 15:49 ` oscar.mateo
  2014-07-11 15:49 ` [PATCH 29/42] drm/i915/bdw: Two-stage execlist submit process oscar.mateo
                   ` (14 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: oscar.mateo @ 2014-07-11 15:49 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Each logical ring context has the tail pointer in the context object,
so update it before submission.
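
The "+1" in the diff below falls out of the context image layout: the
register state page is a list of MI_LOAD_REGISTER_IMM (register, value)
pairs, so in sketch form (the RING_TAIL register name is assumed here
for illustration):

    reg_state[CTX_RING_TAIL + 0] = RING_TAIL(ring->mmio_base); /* register */
    reg_state[CTX_RING_TAIL + 1] = tail;                       /* value */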

v2: New namespace.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index db5f27b..ceee121 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -176,6 +176,21 @@ static void execlists_elsp_write(struct intel_engine_cs *ring,
 	gen6_gt_force_wake_put(dev_priv, FORCEWAKE_ALL);
 }
 
+static int execlists_ctx_write_tail(struct drm_i915_gem_object *ctx_obj, u32 tail)
+{
+	struct page *page;
+	uint32_t *reg_state;
+
+	page = i915_gem_object_get_page(ctx_obj, 1);
+	reg_state = kmap_atomic(page);
+
+	reg_state[CTX_RING_TAIL+1] = tail;
+
+	kunmap_atomic(reg_state);
+
+	return 0;
+}
+
 static int execlists_submit_context(struct intel_engine_cs *ring,
 				    struct intel_context *to0, u32 tail0,
 				    struct intel_context *to1, u32 tail1)
@@ -187,10 +202,14 @@ static int execlists_submit_context(struct intel_engine_cs *ring,
 	BUG_ON(!ctx_obj0);
 	BUG_ON(!i915_gem_obj_is_pinned(ctx_obj0));
 
+	execlists_ctx_write_tail(ctx_obj0, tail0);
+
 	if (to1) {
 		ctx_obj1 = to1->engine[ring->id].state;
 		BUG_ON(!ctx_obj1);
 		BUG_ON(!i915_gem_obj_is_pinned(ctx_obj1));
+
+		execlists_ctx_write_tail(ctx_obj1, tail1);
 	}
 
 	execlists_elsp_write(ring, ctx_obj0, ctx_obj1);
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 29/42] drm/i915/bdw: Two-stage execlist submit process
  2014-07-11 15:48 [PATCH 00/42] Execlists v4 oscar.mateo
                   ` (27 preceding siblings ...)
  2014-07-11 15:49 ` [PATCH 28/42] drm/i915/bdw: Write the tail pointer, LRC style oscar.mateo
@ 2014-07-11 15:49 ` oscar.mateo
  2014-07-11 15:49 ` [PATCH 30/42] drm/i915/bdw: Handle context switch events oscar.mateo
                   ` (13 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: oscar.mateo @ 2014-07-11 15:49 UTC (permalink / raw)
  To: intel-gfx

From: Michel Thierry <michel.thierry@intel.com>

Context switch (and execlist submission) should happen only when
other contexts are not active, otherwise pre-emption occurs.

To ensure this, we place context switch requests in a queue and those
requests are later consumed when the right context switch interrupt is
received (still TODO).
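
To illustrate the unqueue logic in the diff below, assume three pending
requests on one engine:

    execlist_queue: ctxA(tail 0x100) -> ctxA(tail 0x180) -> ctxB(tail 0x40)

execlists_context_unqueue() drops the first ctxA entry (the second
entry's tail already covers its workload) and then submits
ctxA(tail 0x180) and ctxB(tail 0x40) as a pair, one per ELSP port.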

v2: Use a spinlock, do not remove the requests on unqueue (wait for
context switch completion).

Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>

v3: Several rebases and code changes. Use unique ID.

v4:
- Move the queue/lock init to the late ring initialization.
- Damien's kmalloc review comments: check return, use sizeof(*req),
do not cast.

v5:
- Do not reuse drm_i915_gem_request. Instead, create our own.
- New namespace.

Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v1)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2-v5)
---
 drivers/gpu/drm/i915/intel_lrc.c        | 63 +++++++++++++++++++++++++++++++--
 drivers/gpu/drm/i915/intel_lrc.h        |  8 +++++
 drivers/gpu/drm/i915/intel_ringbuffer.h |  2 ++
 3 files changed, 71 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index ceee121..68993f8 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -217,6 +217,63 @@ static int execlists_submit_context(struct intel_engine_cs *ring,
 	return 0;
 }
 
+static void execlists_context_unqueue(struct intel_engine_cs *ring)
+{
+	struct intel_ctx_submit_request *req0 = NULL, *req1 = NULL;
+	struct intel_ctx_submit_request *cursor = NULL, *tmp = NULL;
+
+	if (list_empty(&ring->execlist_queue))
+		return;
+
+	/* Try to read in pairs */
+	list_for_each_entry_safe(cursor, tmp, &ring->execlist_queue, execlist_link) {
+		if (!req0)
+			req0 = cursor;
+		else if (req0->ctx == cursor->ctx) {
+			/* Same ctx: ignore first request, as second request
+			 * will update tail past first request's workload */
+			list_del(&req0->execlist_link);
+			i915_gem_context_unreference(req0->ctx);
+			kfree(req0);
+			req0 = cursor;
+		} else {
+			req1 = cursor;
+			break;
+		}
+	}
+
+	BUG_ON(execlists_submit_context(ring, req0->ctx, req0->tail,
+			req1? req1->ctx : NULL, req1? req1->tail : 0));
+}
+
+static int execlists_context_queue(struct intel_engine_cs *ring,
+				   struct intel_context *to,
+				   u32 tail)
+{
+	struct intel_ctx_submit_request *req = NULL;
+	unsigned long flags;
+	bool was_empty;
+
+	req = kzalloc(sizeof(*req), GFP_KERNEL);
+	if (req == NULL)
+		return -ENOMEM;
+	req->ctx = to;
+	i915_gem_context_reference(req->ctx);
+	req->ring = ring;
+	req->tail = tail;
+
+	spin_lock_irqsave(&ring->execlist_lock, flags);
+
+	was_empty = list_empty(&ring->execlist_queue);
+	list_add_tail(&req->execlist_link, &ring->execlist_queue);
+	if (was_empty)
+		execlists_context_unqueue(ring);
+
+	spin_unlock_irqrestore(&ring->execlist_lock, flags);
+
+	return 0;
+}
+
 static int logical_ring_invalidate_all_caches(struct intel_ringbuffer *ringbuf)
 {
 	struct intel_engine_cs *ring = ringbuf->ring;
@@ -403,8 +460,7 @@ void intel_logical_ring_advance_and_submit(struct intel_ringbuffer *ringbuf)
 	if (intel_ring_stopped(ring))
 		return;
 
-	/* FIXME: too cheeky, we don't even check if the ELSP is ready */
-	execlists_submit_context(ring, ctx, ringbuf->tail, NULL, 0);
+	execlists_context_queue(ring, ctx, ringbuf->tail);
 }
 
 static int logical_ring_alloc_seqno(struct intel_engine_cs *ring,
@@ -844,6 +900,9 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
 	INIT_LIST_HEAD(&ring->request_list);
 	init_waitqueue_head(&ring->irq_queue);
 
+	INIT_LIST_HEAD(&ring->execlist_queue);
+	spin_lock_init(&ring->execlist_lock);
+
 	ret = intel_lr_context_deferred_create(dctx, ring);
 	if (ret)
 		return ret;
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index b59965b..14492a9 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -60,4 +60,12 @@ int intel_execlists_submission(struct drm_device *dev, struct drm_file *file,
 			       u64 exec_start, u32 flags);
 u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj);
 
+struct intel_ctx_submit_request {
+	struct intel_context *ctx;
+	struct intel_engine_cs *ring;
+	u32 tail;
+
+	struct list_head execlist_link;
+};
+
 #endif /* _INTEL_LRC_H_ */
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index c885d5c..6358823 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -223,6 +223,8 @@ struct  intel_engine_cs {
 	} semaphore;
 
 	/* Execlists */
+	spinlock_t execlist_lock;
+	struct list_head execlist_queue;
 	u32             irq_keep_mask; /* bitmask for interrupts that should not be masked */
 	int		(*emit_request)(struct intel_ringbuffer *ringbuf);
 	int		(*emit_flush)(struct intel_ringbuffer *ringbuf,
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 30/42] drm/i915/bdw: Handle context switch events
  2014-07-11 15:48 [PATCH 00/42] Execlists v4 oscar.mateo
                   ` (28 preceding siblings ...)
  2014-07-11 15:49 ` [PATCH 29/42] drm/i915/bdw: Two-stage execlist submit process oscar.mateo
@ 2014-07-11 15:49 ` oscar.mateo
  2014-07-11 15:49 ` [PATCH 31/42] drm/i915/bdw: Avoid non-lite-restore preemptions oscar.mateo
                   ` (12 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: oscar.mateo @ 2014-07-11 15:49 UTC (permalink / raw)
  To: intel-gfx

From: Thomas Daniel <thomas.daniel@intel.com>

Handle all context status events in the context status buffer on every
context switch interrupt. We only remove work from the execlist queue
after the context status buffer reports that a context has completed, and we only
attempt to schedule new contexts on interrupt when a previously submitted
context completes (unless no contexts are queued, which means the GPU is
free).
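
For reference, the context status buffer is a six-entry ring, so the write
pointer read from the hardware can appear to be behind our software read
pointer after a wrap. A worked example of the unwrapping done in the handler
below (the values are illustrative only):

	read_pointer = ring->next_context_status_buffer;	/* e.g. 5 */
	write_pointer = status_pointer & 0x07;			/* e.g. 1 (wrapped) */
	if (read_pointer > write_pointer)
		write_pointer += 6;				/* 1 becomes 7 */
	/* The loop then consumes slots 6 % 6 = 0 and 7 % 6 = 1. */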

We cannot call intel_runtime_pm_get() in an interrupt (or with a spinlock
grabbed, FWIW), because it might sleep, which is not a nice thing to do.
Instead, do the runtime_pm get/put together with request creation/destruction,
and handle the forcewake get/put directly.
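
In sketch form (distilled from the diff below, not new code), the sleeping
and non-sleeping halves end up split like this:

	/* Request creation (process context): may sleep. */
	intel_runtime_pm_get(dev_priv);

	/* ELSP write (can happen in interrupt context): must not sleep. */
	spin_lock_irqsave(&dev_priv->uncore.lock, flags);
	if (dev_priv->uncore.forcewake_count++ == 0)
		dev_priv->uncore.funcs.force_wake_get(dev_priv, FORCEWAKE_ALL);
	spin_unlock_irqrestore(&dev_priv->uncore.lock, flags);

	/* Request destruction (deferred to a work item): may sleep again. */
	intel_runtime_pm_put(dev_priv);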

Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>

v2: Unreferencing the context when we are freeing the request might free
the backing bo, which requires the struct_mutex to be grabbed, so defer
unreferencing and freeing to a bottom half.

v3:
- Ack the interrupt immediately, before trying to handle it (fix for
missing interrupts by Bob Beckett <robert.beckett@intel.com>).
- Update the Context Status Buffer Read Pointer, just in case (spotted
by Damien Lespiau).

v4: New namespace and multiple rebase changes.

v5: Squash with "drm/i915/bdw: Do not call intel_runtime_pm_get() in an
interrupt", as suggested by Daniel.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_irq.c         |  35 ++++++---
 drivers/gpu/drm/i915/intel_lrc.c        | 129 ++++++++++++++++++++++++++++++--
 drivers/gpu/drm/i915/intel_lrc.h        |   3 +
 drivers/gpu/drm/i915/intel_ringbuffer.h |   1 +
 4 files changed, 151 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 7bd44e2..ae748db 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1628,6 +1628,7 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
 				       struct drm_i915_private *dev_priv,
 				       u32 master_ctl)
 {
+	struct intel_engine_cs *ring;
 	u32 rcs, bcs, vcs;
 	uint32_t tmp = 0;
 	irqreturn_t ret = IRQ_NONE;
@@ -1637,14 +1638,20 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
 		if (tmp) {
 			I915_WRITE(GEN8_GT_IIR(0), tmp);
 			ret = IRQ_HANDLED;
+
 			rcs = tmp >> GEN8_RCS_IRQ_SHIFT;
-			bcs = tmp >> GEN8_BCS_IRQ_SHIFT;
+			ring = &dev_priv->ring[RCS];
 			if (rcs & GT_RENDER_USER_INTERRUPT)
-				notify_ring(dev, &dev_priv->ring[RCS]);
+				notify_ring(dev, ring);
+			if (rcs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
+				intel_execlists_handle_ctx_events(ring);
+
+			bcs = tmp >> GEN8_BCS_IRQ_SHIFT;
+			ring = &dev_priv->ring[BCS];
 			if (bcs & GT_RENDER_USER_INTERRUPT)
-				notify_ring(dev, &dev_priv->ring[BCS]);
-			if ((rcs | bcs) & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
-				DRM_DEBUG_DRIVER("TODO: Context switch\n");
+				notify_ring(dev, ring);
+			if (bcs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
+				intel_execlists_handle_ctx_events(ring);
 		} else
 			DRM_ERROR("The master control interrupt lied (GT0)!\n");
 	}
@@ -1654,16 +1661,20 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
 		if (tmp) {
 			I915_WRITE(GEN8_GT_IIR(1), tmp);
 			ret = IRQ_HANDLED;
+
 			vcs = tmp >> GEN8_VCS1_IRQ_SHIFT;
+			ring = &dev_priv->ring[VCS];
 			if (vcs & GT_RENDER_USER_INTERRUPT)
-				notify_ring(dev, &dev_priv->ring[VCS]);
+				notify_ring(dev, ring);
 			if (vcs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
-				DRM_DEBUG_DRIVER("TODO: Context switch\n");
+				intel_execlists_handle_ctx_events(ring);
+
 			vcs = tmp >> GEN8_VCS2_IRQ_SHIFT;
+			ring = &dev_priv->ring[VCS2];
 			if (vcs & GT_RENDER_USER_INTERRUPT)
-				notify_ring(dev, &dev_priv->ring[VCS2]);
+				notify_ring(dev, ring);
 			if (vcs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
-				DRM_DEBUG_DRIVER("TODO: Context switch\n");
+				intel_execlists_handle_ctx_events(ring);
 		} else
 			DRM_ERROR("The master control interrupt lied (GT1)!\n");
 	}
@@ -1684,11 +1695,13 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
 		if (tmp) {
 			I915_WRITE(GEN8_GT_IIR(3), tmp);
 			ret = IRQ_HANDLED;
+
 			vcs = tmp >> GEN8_VECS_IRQ_SHIFT;
+			ring = &dev_priv->ring[VECS];
 			if (vcs & GT_RENDER_USER_INTERRUPT)
-				notify_ring(dev, &dev_priv->ring[VECS]);
+				notify_ring(dev, ring);
 			if (vcs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
-				DRM_DEBUG_DRIVER("TODO: Context switch\n");
+				intel_execlists_handle_ctx_events(ring);
 		} else
 			DRM_ERROR("The master control interrupt lied (GT3)!\n");
 	}
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 68993f8..9d320fa 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -49,6 +49,22 @@
 #define RING_ELSP(ring)			((ring)->mmio_base+0x230)
 #define RING_EXECLIST_STATUS(ring)	((ring)->mmio_base+0x234)
 #define RING_CONTEXT_CONTROL(ring)	((ring)->mmio_base+0x244)
+#define RING_CONTEXT_STATUS_BUF(ring)	((ring)->mmio_base+0x370)
+#define RING_CONTEXT_STATUS_PTR(ring)	((ring)->mmio_base+0x3a0)
+
+#define RING_EXECLIST_QFULL		(1 << 0x2)
+#define RING_EXECLIST1_VALID		(1 << 0x3)
+#define RING_EXECLIST0_VALID		(1 << 0x4)
+#define RING_EXECLIST_ACTIVE_STATUS	(3 << 0xE)
+#define RING_EXECLIST1_ACTIVE		(1 << 0x11)
+#define RING_EXECLIST0_ACTIVE		(1 << 0x12)
+
+#define GEN8_CTX_STATUS_IDLE_ACTIVE	(1 << 0)
+#define GEN8_CTX_STATUS_PREEMPTED	(1 << 1)
+#define GEN8_CTX_STATUS_ELEMENT_SWITCH	(1 << 2)
+#define GEN8_CTX_STATUS_ACTIVE_IDLE	(1 << 3)
+#define GEN8_CTX_STATUS_COMPLETE	(1 << 4)
+#define GEN8_CTX_STATUS_LITE_RESTORE	(1 << 15)
 
 #define CTX_LRI_HEADER_0		0x01
 #define CTX_CONTEXT_CONTROL		0x02
@@ -147,6 +163,7 @@ static void execlists_elsp_write(struct intel_engine_cs *ring,
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
 	uint64_t temp = 0;
 	uint32_t desc[4];
+	unsigned long flags;
 
 	/* XXX: You must always write both descriptors in the order below. */
 	if (ctx_obj1)
@@ -160,9 +177,17 @@ static void execlists_elsp_write(struct intel_engine_cs *ring,
 	desc[3] = (u32)(temp >> 32);
 	desc[2] = (u32)temp;
 
-	/* Set Force Wakeup bit to prevent GT from entering C6 while
-	 * ELSP writes are in progress */
-	gen6_gt_force_wake_get(dev_priv, FORCEWAKE_ALL);
+	/* Set Force Wakeup bit to prevent GT from entering C6 while ELSP writes
+	 * are in progress.
+	 *
+	 * The other problem is that we can't just call gen6_gt_force_wake_get()
+	 * because that function calls intel_runtime_pm_get(), which might sleep.
+	 * Instead, we do the runtime_pm_get/put when creating/destroying requests.
+	 */
+	spin_lock_irqsave(&dev_priv->uncore.lock, flags);
+	if (dev_priv->uncore.forcewake_count++ == 0)
+		dev_priv->uncore.funcs.force_wake_get(dev_priv, FORCEWAKE_ALL);
+	spin_unlock_irqrestore(&dev_priv->uncore.lock, flags);
 
 	I915_WRITE(RING_ELSP(ring), desc[1]);
 	I915_WRITE(RING_ELSP(ring), desc[0]);
@@ -173,7 +198,11 @@ static void execlists_elsp_write(struct intel_engine_cs *ring,
 	/* ELSP is a wo register, so use another nearby reg for posting instead */
 	POSTING_READ(RING_EXECLIST_STATUS(ring));
 
-	gen6_gt_force_wake_put(dev_priv, FORCEWAKE_ALL);
+	/* Release Force Wakeup (see the big comment above). */
+	spin_lock_irqsave(&dev_priv->uncore.lock, flags);
+	if (--dev_priv->uncore.forcewake_count == 0)
+		dev_priv->uncore.funcs.force_wake_put(dev_priv, FORCEWAKE_ALL);
+	spin_unlock_irqrestore(&dev_priv->uncore.lock, flags);
 }
 
 static int execlists_ctx_write_tail(struct drm_i915_gem_object *ctx_obj, u32 tail)
@@ -221,6 +250,9 @@ static void execlists_context_unqueue(struct intel_engine_cs *ring)
 {
 	struct intel_ctx_submit_request *req0 = NULL, *req1 = NULL;
 	struct intel_ctx_submit_request *cursor = NULL, *tmp = NULL;
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+
+	assert_spin_locked(&ring->execlist_lock);
 
 	if (list_empty(&ring->execlist_queue))
 		return;
@@ -233,8 +265,7 @@ static void execlists_context_unqueue(struct intel_engine_cs *ring)
 			/* Same ctx: ignore first request, as second request
 			 * will update tail past first request's workload */
 			list_del(&req0->execlist_link);
-			i915_gem_context_unreference(req0->ctx);
-			kfree(req0);
+			queue_work(dev_priv->wq, &req0->work);
 			req0 = cursor;
 		} else {
 			req1 = cursor;
@@ -246,6 +277,89 @@ static void execlists_context_unqueue(struct intel_engine_cs *ring)
 			req1? req1->ctx : NULL, req1? req1->tail : 0));
 }
 
+static bool execlists_check_remove_request(struct intel_engine_cs *ring,
+					   u32 request_id)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct intel_ctx_submit_request *head_req;
+
+	assert_spin_locked(&ring->execlist_lock);
+
+	head_req = list_first_entry_or_null(&ring->execlist_queue,
+			struct intel_ctx_submit_request, execlist_link);
+	if (head_req != NULL) {
+		struct drm_i915_gem_object *ctx_obj =
+				head_req->ctx->engine[ring->id].state;
+		if (intel_execlists_ctx_id(ctx_obj) == request_id) {
+			list_del(&head_req->execlist_link);
+			queue_work(dev_priv->wq, &head_req->work);
+			return true;
+		}
+	}
+
+	return false;
+}
+
+void intel_execlists_handle_ctx_events(struct intel_engine_cs *ring)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	u32 status_pointer;
+	u8 read_pointer;
+	u8 write_pointer;
+	u32 status;
+	u32 status_id;
+	u32 submit_contexts = 0;
+
+	status_pointer = I915_READ(RING_CONTEXT_STATUS_PTR(ring));
+
+	read_pointer = ring->next_context_status_buffer;
+	write_pointer = status_pointer & 0x07;
+	if (read_pointer > write_pointer)
+		write_pointer += 6;
+
+	spin_lock(&ring->execlist_lock);
+
+	while (read_pointer < write_pointer) {
+		read_pointer++;
+		status = I915_READ(RING_CONTEXT_STATUS_BUF(ring) +
+				(read_pointer % 6) * 8);
+		status_id = I915_READ(RING_CONTEXT_STATUS_BUF(ring) +
+				(read_pointer % 6) * 8 + 4);
+
+		if (status & GEN8_CTX_STATUS_COMPLETE) {
+			if (execlists_check_remove_request(ring, status_id))
+				submit_contexts++;
+		}
+	}
+
+	if (submit_contexts != 0)
+		execlists_context_unqueue(ring);
+
+	spin_unlock(&ring->execlist_lock);
+
+	WARN(submit_contexts > 2, "More than two context complete events?\n");
+	ring->next_context_status_buffer = write_pointer % 6;
+
+	I915_WRITE(RING_CONTEXT_STATUS_PTR(ring),
+			((u32)ring->next_context_status_buffer & 0x07) << 8);
+}
+
+static void execlists_free_request_task(struct work_struct *work)
+{
+	struct intel_ctx_submit_request *req =
+			container_of(work, struct intel_ctx_submit_request, work);
+	struct drm_device *dev = req->ring->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+
+	intel_runtime_pm_put(dev_priv);
+
+	mutex_lock(&dev->struct_mutex);
+	i915_gem_context_unreference(req->ctx);
+	mutex_unlock(&dev->struct_mutex);
+
+	kfree(req);
+}
+
 static int execlists_context_queue(struct intel_engine_cs *ring,
 				   struct intel_context *to,
 				   u32 tail)
@@ -261,6 +375,8 @@ static int execlists_context_queue(struct intel_engine_cs *ring,
 	i915_gem_context_reference(req->ctx);
 	req->ring = ring;
 	req->tail = tail;
+	INIT_WORK(&req->work, execlists_free_request_task);
+	intel_runtime_pm_get(dev_priv);
 
 	spin_lock_irqsave(&ring->execlist_lock, flags);
 
@@ -902,6 +1018,7 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
 
 	INIT_LIST_HEAD(&ring->execlist_queue);
 	spin_lock_init(&ring->execlist_lock);
+	ring->next_context_status_buffer = 0;
 
 	ret = intel_lr_context_deferred_create(dctx, ring);
 	if (ret)
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 14492a9..2e8929f 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -66,6 +66,9 @@ struct intel_ctx_submit_request {
 	u32 tail;
 
 	struct list_head execlist_link;
+	struct work_struct work;
 };
 
+void intel_execlists_handle_ctx_events(struct intel_engine_cs *ring);
+
 #endif /* _INTEL_LRC_H_ */
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 6358823..905d1ba 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -225,6 +225,7 @@ struct  intel_engine_cs {
 	/* Execlists */
 	spinlock_t execlist_lock;
 	struct list_head execlist_queue;
+	u8 next_context_status_buffer;
 	u32             irq_keep_mask; /* bitmask for interrupts that should not be masked */
 	int		(*emit_request)(struct intel_ringbuffer *ringbuf);
 	int		(*emit_flush)(struct intel_ringbuffer *ringbuf,
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 31/42] drm/i915/bdw: Avoid non-lite-restore preemptions
  2014-07-11 15:48 [PATCH 00/42] Execlists v4 oscar.mateo
                   ` (29 preceding siblings ...)
  2014-07-11 15:49 ` [PATCH 30/42] drm/i915/bdw: Handle context switch events oscar.mateo
@ 2014-07-11 15:49 ` oscar.mateo
  2014-07-11 15:49 ` [PATCH 32/42] drm/i915/bdw: Help out the ctx switch interrupt handler oscar.mateo
                   ` (11 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: oscar.mateo @ 2014-07-11 15:49 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

In the current Execlists feeding mechanism, full preemption is not
supported yet: only lite-restores are allowed (that is, the GPU
simply samples a new tail pointer for the context currently in
execution).

But we have identified a scenario in which a full preemption occurs:
1) We submit two contexts for execution (A & B).
2) The GPU finishes with the first one (A), switches to the second one
(B) and informs us.
3) We submit B again (hoping to cause a lite restore) together with C,
but in the time we spend writing to the ELSP, the GPU finishes B.
4) The GPU starts executing B again (since we told it so).
5) We receive a B finished interrupt and, mistakenly, we submit C (again)
and D, causing a full preemption of B.

The race is avoided by keeping track of how many times a context has been
submitted to the hardware and by better discriminating the received context
switch interrupts: in the example, when we have submitted B twice, we won't
submit C and D as soon as we receive the notification that B is completed,
because we were expecting a LITE_RESTORE and didn't get one, so we know a
second completion will be received shortly.

Without this explicit checking, somehow, the batch buffer execution order
gets messed with. This can be verified with the IGT test I sent together with
the series. I don't know the exact mechanism by which the pre-emption messes
with the execution order but, since other people are working on the Scheduler
+ Preemption on Execlists, I didn't try to fix it. In this series, only Lite
Restores are supported (other kinds of preemption WARN).
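
To make the bookkeeping concrete, here is a minimal user-space model of the
elsp_submitted counting (hypothetical standalone code that only illustrates
the logic added below, not driver code):

	#include <stdio.h>

	struct request { int elsp_submitted; };

	/* Called every time the context/tail pair is written to the ELSP. */
	static void elsp_submit(struct request *req)
	{
		req->elsp_submitted++;
	}

	/* Called on a context complete event: only retire once every
	 * outstanding submission has been accounted for. */
	static int can_retire(struct request *req)
	{
		return --req->elsp_submitted <= 0;
	}

	int main(void)
	{
		struct request b = { 0 };

		elsp_submit(&b);	/* step 1: normal submission */
		elsp_submit(&b);	/* step 3: lite-restore resubmission */
		printf("retire after 1st event? %d\n", can_retire(&b)); /* 0 */
		printf("retire after 2nd event? %d\n", can_retire(&b)); /* 1 */
		return 0;
	}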

v2: elsp_submitted belongs in the new intel_ctx_submit_request. Several
rebase changes.

v3: Clarify how the race is avoided, as requested by Daniel.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 28 ++++++++++++++++++++++++----
 drivers/gpu/drm/i915/intel_lrc.h |  2 ++
 2 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 9d320fa..d3e961e 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -264,6 +264,7 @@ static void execlists_context_unqueue(struct intel_engine_cs *ring)
 		else if (req0->ctx == cursor->ctx) {
 			/* Same ctx: ignore first request, as second request
 			 * will update tail past first request's workload */
+			cursor->elsp_submitted = req0->elsp_submitted;
 			list_del(&req0->execlist_link);
 			queue_work(dev_priv->wq, &req0->work);
 			req0 = cursor;
@@ -273,8 +274,14 @@ static void execlists_context_unqueue(struct intel_engine_cs *ring)
 		}
 	}
 
+	WARN_ON(req1 && req1->elsp_submitted);
+
 	BUG_ON(execlists_submit_context(ring, req0->ctx, req0->tail,
 			req1? req1->ctx : NULL, req1? req1->tail : 0));
+
+	req0->elsp_submitted++;
+	if (req1)
+		req1->elsp_submitted++;
 }
 
 static bool execlists_check_remove_request(struct intel_engine_cs *ring,
@@ -291,9 +298,13 @@ static bool execlists_check_remove_request(struct intel_engine_cs *ring,
 		struct drm_i915_gem_object *ctx_obj =
 				head_req->ctx->engine[ring->id].state;
 		if (intel_execlists_ctx_id(ctx_obj) == request_id) {
-			list_del(&head_req->execlist_link);
-			queue_work(dev_priv->wq, &head_req->work);
-			return true;
+			WARN(head_req->elsp_submitted == 0,
+					"Never submitted head request\n");
+			if (--head_req->elsp_submitted <= 0) {
+				list_del(&head_req->execlist_link);
+				queue_work(dev_priv->wq, &head_req->work);
+				return true;
+			}
 		}
 	}
 
@@ -326,7 +337,16 @@ void intel_execlists_handle_ctx_events(struct intel_engine_cs *ring)
 		status_id = I915_READ(RING_CONTEXT_STATUS_BUF(ring) +
 				(read_pointer % 6) * 8 + 4);
 
-		if (status & GEN8_CTX_STATUS_COMPLETE) {
+		if (status & GEN8_CTX_STATUS_PREEMPTED) {
+			if (status & GEN8_CTX_STATUS_LITE_RESTORE) {
+				if (execlists_check_remove_request(ring, status_id))
+					WARN(1, "Lite Restored request removed from queue\n");
+			} else
+				WARN(1, "Preemption without Lite Restore\n");
+		}
+
+		 if ((status & GEN8_CTX_STATUS_ACTIVE_IDLE) ||
+		     (status & GEN8_CTX_STATUS_ELEMENT_SWITCH)) {
 			if (execlists_check_remove_request(ring, status_id))
 				submit_contexts++;
 		}
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 2e8929f..074b44f 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -67,6 +67,8 @@ struct intel_ctx_submit_request {
 
 	struct list_head execlist_link;
 	struct work_struct work;
+
+	int elsp_submitted;
 };
 
 void intel_execlists_handle_ctx_events(struct intel_engine_cs *ring);
-- 
1.9.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 32/42] drm/i915/bdw: Help out the ctx switch interrupt handler
  2014-07-11 15:48 [PATCH 00/42] Execlists v4 oscar.mateo
                   ` (30 preceding siblings ...)
  2014-07-11 15:49 ` [PATCH 31/42] drm/i915/bdw: Avoid non-lite-restore preemptions oscar.mateo
@ 2014-07-11 15:49 ` oscar.mateo
  2014-07-11 15:49 ` [PATCH 33/42] drm/i915/bdw: Make sure gpu reset still works with Execlists oscar.mateo
                   ` (10 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: oscar.mateo @ 2014-07-11 15:49 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

If we receive a storm of requests for the same context (see gem_storedw_loop_*)
we might end up iterating over too many elements at interrupt time, looking for
contexts to squash together. Instead, share the burden by giving more
intelligence to the queue function. At most, the interrupt will iterate over
three elements.
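
Distilled from the diff below, the queue-side squash works like this: if the
queue already holds more than two elements and its tail is a not-yet-submitted
request for the same context, drop that tail before queueing the new request:

	if (num_elements > 2) {
		tail_req = list_last_entry(&ring->execlist_queue,
					   struct intel_ctx_submit_request,
					   execlist_link);
		if (to == tail_req->ctx) {
			/* Squash in process context, so the interrupt
			 * handler never walks more than three elements. */
			list_del(&tail_req->execlist_link);
			queue_work(dev_priv->wq, &tail_req->work);
		}
	}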

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index d3e961e..6beab26 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -384,9 +384,10 @@ static int execlists_context_queue(struct intel_engine_cs *ring,
 				   struct intel_context *to,
 				   u32 tail)
 {
-	struct intel_ctx_submit_request *req = NULL;
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct intel_ctx_submit_request *req = NULL, *cursor;
 	unsigned long flags;
-	bool was_empty;
+	int num_elements = 0;
 
 	req = kzalloc(sizeof(*req), GFP_KERNEL);
 	if (req == NULL)
@@ -400,9 +401,26 @@ static int execlists_context_queue(struct intel_engine_cs *ring,
 
 	spin_lock_irqsave(&ring->execlist_lock, flags);
 
-	was_empty = list_empty(&ring->execlist_queue);
+	list_for_each_entry(cursor, &ring->execlist_queue, execlist_link)
+		if (++num_elements > 2)
+			break;
+
+	if (num_elements > 2) {
+		struct intel_ctx_submit_request *tail_req;
+
+		tail_req = list_last_entry(&ring->execlist_queue,
+					struct intel_ctx_submit_request,
+					execlist_link);
+		if (to == tail_req->ctx) {
+			WARN(tail_req->elsp_submitted != 0,
+					"More than 2 already-submitted reqs queued\n");
+			list_del(&tail_req->execlist_link);
+			queue_work(dev_priv->wq, &tail_req->work);
+		}
+	}
+
 	list_add_tail(&req->execlist_link, &ring->execlist_queue);
-	if (was_empty)
+	if (num_elements == 0)
 		execlists_context_unqueue(ring);
 
 	spin_unlock_irqrestore(&ring->execlist_lock, flags);
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 33/42] drm/i915/bdw: Make sure gpu reset still works with Execlists
  2014-07-11 15:48 [PATCH 00/42] Execlists v4 oscar.mateo
                   ` (31 preceding siblings ...)
  2014-07-11 15:49 ` [PATCH 32/42] drm/i915/bdw: Help out the ctx switch interrupt handler oscar.mateo
@ 2014-07-11 15:49 ` oscar.mateo
  2014-07-11 15:49 ` [PATCH 34/42] drm/i915/bdw: Make sure error capture keeps working " oscar.mateo
                   ` (9 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: oscar.mateo @ 2014-07-11 15:49 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

If we reset a ring after a hang, we have to make sure that we clear
out all queued Execlists requests.

v2: The ring is, at this point, already being correctly re-programmed
for Execlists, and the hangcheck counters cleared.

v3: Daniel suggested dropping the "if (execlists)" because the Execlists
queue should be empty in legacy mode (which is true, if we do the
INIT_LIST_HEAD).

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c         | 11 +++++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.c |  1 +
 2 files changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 1c83b9c..b5ea3b0 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2567,6 +2567,17 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
 		i915_gem_free_request(request);
 	}
 
+	while (!list_empty(&ring->execlist_queue)) {
+		struct intel_ctx_submit_request *submit_req;
+
+		submit_req = list_first_entry(&ring->execlist_queue,
+				struct intel_ctx_submit_request,
+				execlist_link);
+		list_del(&submit_req->execlist_link);
+		i915_gem_context_unreference(submit_req->ctx);
+		kfree(submit_req);
+	}
+
 	/* These may not have been flush before the reset, do so now */
 	kfree(ring->preallocated_lazy_request);
 	ring->preallocated_lazy_request = NULL;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 3188403..6e604c9 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1587,6 +1587,7 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 	ring->dev = dev;
 	INIT_LIST_HEAD(&ring->active_list);
 	INIT_LIST_HEAD(&ring->request_list);
+	INIT_LIST_HEAD(&ring->execlist_queue);
 	ringbuf->size = 32 * PAGE_SIZE;
 	ringbuf->ring = ring;
 	ringbuf->ctx = ring->default_context;
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 34/42] drm/i915/bdw: Make sure error capture keeps working with Execlists
  2014-07-11 15:48 [PATCH 00/42] Execlists v4 oscar.mateo
                   ` (32 preceding siblings ...)
  2014-07-11 15:49 ` [PATCH 33/42] drm/i915/bdw: Make sure gpu reset still works with Execlists oscar.mateo
@ 2014-07-11 15:49 ` oscar.mateo
  2014-07-11 15:49 ` [PATCH 35/42] drm/i915/bdw: Disable semaphores for Execlists oscar.mateo
                   ` (8 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: oscar.mateo @ 2014-07-11 15:49 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Since ringbuffers no longer belong to an engine (they are per-context now),
we have to make sure that we are always recording the correct one.

TODO: This is only a small fix to keep basic error capture working, but
we need to add more information for it to be useful (e.g. dump the
context being executed).
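
The selection of the ringbuffer to record, condensed (same precedence as in
the diff below):

	if (i915.enable_execlists)
		rbuf = request ? request->ctx->engine[ring->id].ringbuf :
				 ring->default_context->engine[ring->id].ringbuf;
	else
		rbuf = ring->buffer;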

v2: Reorder how the ringbuffer is chosen to clarify the change and
rename the variable, both changes suggested by Chris Wilson. Also,
add the TODO comment to the code, as suggested by Daniel.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_gpu_error.c | 22 ++++++++++++++++++----
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 45b6191..1e38576 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -874,9 +874,6 @@ static void i915_record_ring_state(struct drm_device *dev,
 		ering->hws = I915_READ(mmio);
 	}
 
-	ering->cpu_ring_head = ring->buffer->head;
-	ering->cpu_ring_tail = ring->buffer->tail;
-
 	ering->hangcheck_score = ring->hangcheck.score;
 	ering->hangcheck_action = ring->hangcheck.action;
 
@@ -936,6 +933,7 @@ static void i915_gem_record_rings(struct drm_device *dev,
 
 	for (i = 0; i < I915_NUM_RINGS; i++) {
 		struct intel_engine_cs *ring = &dev_priv->ring[i];
+		struct intel_ringbuffer *rbuf;
 
 		error->ring[i].pid = -1;
 
@@ -979,8 +977,24 @@ static void i915_gem_record_rings(struct drm_device *dev,
 			}
 		}
 
+		if (i915.enable_execlists) {
+			/* TODO: This is only a small fix to keep basic error
+			 * capture working, but we need to add more information
+			 * for it to be useful (e.g. dump the context being
+			 * executed).
+			 */
+			if (request)
+				rbuf = request->ctx->engine[ring->id].ringbuf;
+			else
+				rbuf = ring->default_context->engine[ring->id].ringbuf;
+		} else
+			rbuf = ring->buffer;
+
+		error->ring[i].cpu_ring_head = rbuf->head;
+		error->ring[i].cpu_ring_tail = rbuf->tail;
+
 		error->ring[i].ringbuffer =
-			i915_error_ggtt_object_create(dev_priv, ring->buffer->obj);
+			i915_error_ggtt_object_create(dev_priv, rbuf->obj);
 
 		if (ring->status_page.obj)
 			error->ring[i].hws_page =
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 35/42] drm/i915/bdw: Disable semaphores for Execlists
  2014-07-11 15:48 [PATCH 00/42] Execlists v4 oscar.mateo
                   ` (33 preceding siblings ...)
  2014-07-11 15:49 ` [PATCH 34/42] drm/i915/bdw: Make sure error capture keeps working " oscar.mateo
@ 2014-07-11 15:49 ` oscar.mateo
  2014-07-11 15:49 ` [PATCH 36/42] drm/i915/bdw: Display execlists info in debugfs oscar.mateo
                   ` (7 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: oscar.mateo @ 2014-07-11 15:49 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Up until recently, semaphores weren't enabled in BDW so we didn't care
about them. But then Rodrigo came and enabled them:

   commit 521e62e49a42661a4ee0102644517dbe2f100a23
   Author: Rodrigo Vivi <rodrigo.vivi@intel.com>

      drm/i915: Enable semaphores on BDW

So now we have to explicitly disable them for Execlists until both
features play nicely.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 83cb43a..ce345df 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -477,6 +477,10 @@ bool i915_semaphore_is_enabled(struct drm_device *dev)
 	if (i915.semaphores >= 0)
 		return i915.semaphores;
 
+	/* TODO: make semaphores and Execlists play nicely together */
+	if (i915.enable_execlists)
+		return false;
+
 #ifdef CONFIG_INTEL_IOMMU
 	/* Enable semaphores on SNB when IO remapping is off */
 	if (INTEL_INFO(dev)->gen == 6 && intel_iommu_gfx_mapped)
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 36/42] drm/i915/bdw: Display execlists info in debugfs
  2014-07-11 15:48 [PATCH 00/42] Execlists v4 oscar.mateo
                   ` (34 preceding siblings ...)
  2014-07-11 15:49 ` [PATCH 35/42] drm/i915/bdw: Disable semaphores for Execlists oscar.mateo
@ 2014-07-11 15:49 ` oscar.mateo
  2014-07-11 15:49 ` [PATCH 37/42] drm/i915/bdw: Display context backing obj & ringbuffer " oscar.mateo
                   ` (6 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: oscar.mateo @ 2014-07-11 15:49 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

v2: Warn and return if LRCs are not enabled.

v3: Grab the Execlists spinlock (noticed by Daniel Vetter).

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 73 +++++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.c    |  6 ---
 drivers/gpu/drm/i915/intel_lrc.h    |  7 ++++
 3 files changed, 80 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 4a5b0f8..66e9244 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1673,6 +1673,78 @@ static int i915_context_status(struct seq_file *m, void *unused)
 	return 0;
 }
 
+static int i915_execlists(struct seq_file *m, void *data)
+{
+	struct drm_info_node *node = (struct drm_info_node *) m->private;
+	struct drm_device *dev = node->minor->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_engine_cs *ring;
+	u32 status_pointer;
+	u8 read_pointer;
+	u8 write_pointer;
+	u32 status;
+	u32 ctx_id;
+	struct list_head *cursor;
+	int ring_id, i;
+
+	if (!i915.enable_execlists) {
+		seq_printf(m, "Logical Ring Contexts are disabled\n");
+		return 0;
+	}
+
+	for_each_ring(ring, dev_priv, ring_id) {
+		struct intel_ctx_submit_request *head_req = NULL;
+		int count = 0;
+		unsigned long flags;
+
+		seq_printf(m, "%s\n", ring->name);
+
+		status = I915_READ(RING_EXECLIST_STATUS(ring));
+		ctx_id = I915_READ(RING_EXECLIST_STATUS(ring) + 4);
+		seq_printf(m, "\tExeclist status: 0x%08X, context: %u\n",
+				status, ctx_id);
+
+		status_pointer = I915_READ(RING_CONTEXT_STATUS_PTR(ring));
+		seq_printf(m, "\tStatus pointer: 0x%08X\n", status_pointer);
+
+		read_pointer = ring->next_context_status_buffer;
+		write_pointer = status_pointer & 0x07;
+		if (read_pointer > write_pointer)
+			write_pointer += 6;
+		seq_printf(m, "\tRead pointer: 0x%08X, write pointer 0x%08X\n",
+				read_pointer, write_pointer);
+
+		for (i = 0; i < 6; i++) {
+			status = I915_READ(RING_CONTEXT_STATUS_BUF(ring) + 8*i);
+			ctx_id = I915_READ(RING_CONTEXT_STATUS_BUF(ring) + 8*i + 4);
+
+			seq_printf(m, "\tStatus buffer %d: 0x%08X, context: %u\n",
+					i, status, ctx_id);
+		}
+
+		spin_lock_irqsave(&ring->execlist_lock, flags);
+		list_for_each(cursor, &ring->execlist_queue)
+			count++;
+		head_req = list_first_entry_or_null(&ring->execlist_queue,
+				struct intel_ctx_submit_request, execlist_link);
+		spin_unlock_irqrestore(&ring->execlist_lock, flags);
+
+		seq_printf(m, "\t%d requests in queue\n", count);
+		if (head_req) {
+			struct drm_i915_gem_object *ctx_obj;
+
+			ctx_obj = head_req->ctx->engine[ring_id].state;
+			seq_printf(m, "\tHead request id: %u\n",
+					intel_execlists_ctx_id(ctx_obj));
+			seq_printf(m, "\tHead request tail: %u\n", head_req->tail);
+		}
+
+		seq_putc(m, '\n');
+	}
+
+	return 0;
+}
+
 static int i915_gen6_forcewake_count_info(struct seq_file *m, void *data)
 {
 	struct drm_info_node *node = m->private;
@@ -3892,6 +3964,7 @@ static const struct drm_info_list i915_debugfs_list[] = {
 	{"i915_opregion", i915_opregion, 0},
 	{"i915_gem_framebuffer", i915_gem_framebuffer_info, 0},
 	{"i915_context_status", i915_context_status, 0},
+	{"i915_execlists", i915_execlists, 0},
 	{"i915_gen6_forcewake_count", i915_gen6_forcewake_count_info, 0},
 	{"i915_swizzle_info", i915_swizzle_info, 0},
 	{"i915_ppgtt_info", i915_ppgtt_info, 0},
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 6beab26..8ce9698 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -46,12 +46,6 @@
 
 #define GEN8_LR_CONTEXT_ALIGN 4096
 
-#define RING_ELSP(ring)			((ring)->mmio_base+0x230)
-#define RING_EXECLIST_STATUS(ring)	((ring)->mmio_base+0x234)
-#define RING_CONTEXT_CONTROL(ring)	((ring)->mmio_base+0x244)
-#define RING_CONTEXT_STATUS_BUF(ring)	((ring)->mmio_base+0x370)
-#define RING_CONTEXT_STATUS_PTR(ring)	((ring)->mmio_base+0x3a0)
-
 #define RING_EXECLIST_QFULL		(1 << 0x2)
 #define RING_EXECLIST1_VALID		(1 << 0x3)
 #define RING_EXECLIST0_VALID		(1 << 0x4)
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 074b44f..f3b921b 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -24,6 +24,13 @@
 #ifndef _INTEL_LRC_H_
 #define _INTEL_LRC_H_
 
+/* Execlists regs */
+#define RING_ELSP(ring)			((ring)->mmio_base+0x230)
+#define RING_EXECLIST_STATUS(ring)	((ring)->mmio_base+0x234)
+#define RING_CONTEXT_CONTROL(ring)	((ring)->mmio_base+0x244)
+#define RING_CONTEXT_STATUS_BUF(ring)	((ring)->mmio_base+0x370)
+#define RING_CONTEXT_STATUS_PTR(ring)	((ring)->mmio_base+0x3a0)
+
 /* Logical Rings */
 void intel_logical_ring_stop(struct intel_engine_cs *ring);
 void intel_logical_ring_cleanup(struct intel_engine_cs *ring);
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 37/42] drm/i915/bdw: Display context backing obj & ringbuffer info in debugfs
  2014-07-11 15:48 [PATCH 00/42] Execlists v4 oscar.mateo
                   ` (35 preceding siblings ...)
  2014-07-11 15:49 ` [PATCH 36/42] drm/i915/bdw: Display execlists info in debugfs oscar.mateo
@ 2014-07-11 15:49 ` oscar.mateo
  2014-07-11 15:49 ` [PATCH 38/42] drm/i915/bdw: Print context state " oscar.mateo
                   ` (5 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: oscar.mateo @ 2014-07-11 15:49 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 25 +++++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 66e9244..4c62900 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1628,6 +1628,12 @@ static int i915_gem_framebuffer_info(struct seq_file *m, void *data)
 
 	return 0;
 }
+static void describe_ctx_ringbuf(struct seq_file *m, struct intel_ringbuffer *ringbuf)
+{
+	seq_printf(m, " (ringbuffer, space: %d, head: %u, tail: %u, last head: %d)",
+			ringbuf->space, ringbuf->head, ringbuf->tail,
+			ringbuf->last_retired_head);
+}
 
 static int i915_context_status(struct seq_file *m, void *unused)
 {
@@ -1655,7 +1661,7 @@ static int i915_context_status(struct seq_file *m, void *unused)
 	}
 
 	list_for_each_entry(ctx, &dev_priv->context_list, link) {
-		if (ctx->legacy_hw_ctx.rcs_state == NULL)
+		if (!i915.enable_execlists && ctx->legacy_hw_ctx.rcs_state == NULL)
 			continue;
 
 		seq_puts(m, "HW context ");
@@ -1664,7 +1670,22 @@ static int i915_context_status(struct seq_file *m, void *unused)
 			if (ring->default_context == ctx)
 				seq_printf(m, "(default context %s) ", ring->name);
 
-		describe_obj(m, ctx->legacy_hw_ctx.rcs_state);
+		if (i915.enable_execlists) {
+			seq_putc(m, '\n');
+			for_each_ring(ring, dev_priv, i) {
+				struct drm_i915_gem_object *ctx_obj = ctx->engine[i].state;
+				struct intel_ringbuffer *ringbuf = ctx->engine[i].ringbuf;
+
+				seq_printf(m, "%s: ", ring->name);
+				if (ctx_obj)
+					describe_obj(m, ctx_obj);
+				if (ringbuf)
+					describe_ctx_ringbuf(m, ringbuf);
+				seq_putc(m, '\n');
+			}
+		} else
+			describe_obj(m, ctx->legacy_hw_ctx.rcs_state);
+
 		seq_putc(m, '\n');
 	}
 
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 38/42] drm/i915/bdw: Print context state in debugfs
  2014-07-11 15:48 [PATCH 00/42] Execlists v4 oscar.mateo
                   ` (36 preceding siblings ...)
  2014-07-11 15:49 ` [PATCH 37/42] drm/i915/bdw: Display context backing obj & ringbuffer " oscar.mateo
@ 2014-07-11 15:49 ` oscar.mateo
  2014-07-11 15:49 ` [PATCH 39/42] drm/i915/bdw: Document Logical Rings, LR contexts and Execlists oscar.mateo
                   ` (4 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: oscar.mateo @ 2014-07-11 15:49 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <ben@bwidawsk.net>

This has turned out to be really handy in debug so far.

Update:
Since writing this patch, I've gotten similar code upstream for error
state. I've used it quite a bit in debugfs however, and I'd like to keep
it here at least until preemption is working.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

This patch was accidentally dropped in the first Execlists version, and
it has been very useful indeed. Put it back again, but as a standalone
debugfs file.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 52 +++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 4c62900..34ddabc 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1694,6 +1694,57 @@ static int i915_context_status(struct seq_file *m, void *unused)
 	return 0;
 }
 
+static int i915_dump_lrc(struct seq_file *m, void *unused)
+{
+	struct drm_info_node *node = (struct drm_info_node *) m->private;
+	struct drm_device *dev = node->minor->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_engine_cs *ring;
+	struct intel_context *ctx;
+	int ret, i;
+
+	if (!i915.enable_execlists) {
+		seq_printf(m, "Logical Ring Contexts are disabled\n");
+		return 0;
+	}
+
+	ret = mutex_lock_interruptible(&dev->mode_config.mutex);
+	if (ret)
+		return ret;
+
+	list_for_each_entry(ctx, &dev_priv->context_list, link) {
+		for_each_ring(ring, dev_priv, i) {
+			struct drm_i915_gem_object *ctx_obj = ctx->engine[i].state;
+
+			if (ring->default_context == ctx)
+				continue;
+
+			if (ctx_obj) {
+				struct page *page = i915_gem_object_get_page(ctx_obj, 1);
+				uint32_t *reg_state = kmap_atomic(page);
+				int j;
+
+				seq_printf(m, "CONTEXT: %s %u\n", ring->name,
+						intel_execlists_ctx_id(ctx_obj));
+
+				for (j = 0; j < 0x600 / sizeof(u32) / 4; j += 4) {
+					seq_printf(m, "\t[0x%08lx] 0x%08x 0x%08x 0x%08x 0x%08x\n",
+					i915_gem_obj_ggtt_offset(ctx_obj) + 4096 + (j * 4),
+					reg_state[j], reg_state[j + 1],
+					reg_state[j + 2], reg_state[j + 3]);
+				}
+				kunmap_atomic(reg_state);
+
+				seq_putc(m, '\n');
+			}
+		}
+	}
+
+	mutex_unlock(&dev->mode_config.mutex);
+
+	return 0;
+}
+
 static int i915_execlists(struct seq_file *m, void *data)
 {
 	struct drm_info_node *node = (struct drm_info_node *) m->private;
@@ -3985,6 +4036,7 @@ static const struct drm_info_list i915_debugfs_list[] = {
 	{"i915_opregion", i915_opregion, 0},
 	{"i915_gem_framebuffer", i915_gem_framebuffer_info, 0},
 	{"i915_context_status", i915_context_status, 0},
+	{"i915_dump_lrc", i915_dump_lrc, 0},
 	{"i915_execlists", i915_execlists, 0},
 	{"i915_gen6_forcewake_count", i915_gen6_forcewake_count_info, 0},
 	{"i915_swizzle_info", i915_swizzle_info, 0},
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 39/42] drm/i915/bdw: Document Logical Rings, LR contexts and Execlists
  2014-07-11 15:48 [PATCH 00/42] Execlists v4 oscar.mateo
                   ` (37 preceding siblings ...)
  2014-07-11 15:49 ` [PATCH 38/42] drm/i915/bdw: Print context state " oscar.mateo
@ 2014-07-11 15:49 ` oscar.mateo
  2014-07-11 15:49 ` [PATCH 40/42] drm/i915/bdw: Enable Logical Ring Contexts (hence, Execlists) oscar.mateo
                   ` (3 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: oscar.mateo @ 2014-07-11 15:49 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Add theory of operation notes to intel_lrc.c and comments to externally
visible functions.

v2: Add notes on logical ring context creation.

v3: Use kerneldoc.

v4: Integrate it in the DocBook template.

Signed-off-by: Thomas Daniel <thomas.daniel@intel.com> (v1)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2, v3)
---
 Documentation/DocBook/drm.tmpl   |   5 +
 drivers/gpu/drm/i915/intel_lrc.c | 215 ++++++++++++++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/intel_lrc.h |  30 ++++++
 3 files changed, 249 insertions(+), 1 deletion(-)

diff --git a/Documentation/DocBook/drm.tmpl b/Documentation/DocBook/drm.tmpl
index 4890d94..cd981f2 100644
--- a/Documentation/DocBook/drm.tmpl
+++ b/Documentation/DocBook/drm.tmpl
@@ -3945,6 +3945,11 @@ int num_ioctls;</synopsis>
 !Pdrivers/gpu/drm/i915/i915_cmd_parser.c batch buffer command parser
 !Idrivers/gpu/drm/i915/i915_cmd_parser.c
       </sect2>
+      <sect2>
+        <title>Logical Rings, Logical Ring Contexts and Execlists</title>
+!Pdrivers/gpu/drm/i915/intel_lrc.c Logical Rings, Logical Ring Contexts and Execlists
+!Idrivers/gpu/drm/i915/intel_lrc.c
+      </sect2>
     </sect1>
   </chapter>
 </part>
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 8ce9698..3c3fa69 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -28,13 +28,108 @@
  *
  */
 
-/*
+/**
+ * DOC: Logical Rings, Logical Ring Contexts and Execlists
+ *
+ * Motivation:
  * GEN8 brings an expansion of the HW contexts: "Logical Ring Contexts".
  * These expanded contexts enable a number of new abilities, especially
  * "Execlists" (also implemented in this file).
  *
+ * One of the main differences with the legacy HW contexts is that logical
+ * ring contexts incorporate many more things to the context's state, like
+ * PDPs or ringbuffer control registers:
+ *
+ * The reason why PDPs are included in the context is straightforward: as
+ * PPGTTs (per-process GTTs) are actually per-context, having the PDPs
+ * contained there mean you don't need to do a ppgtt->switch_mm yourself,
+ * instead, the GPU will do it for you on the context switch.
+ *
+ * But, what about the ringbuffer control registers (head, tail, etc..)?
+ * shouldn't we just need a set of those per engine command streamer? This is
+ * where the name "Logical Rings" starts to make sense: by virtualizing the
+ * rings, the engine cs shifts to a new "ring buffer" with every context
+ * switch. When you want to submit a workload to the GPU you: A) choose your
+ * context, B) find its appropriate virtualized ring, C) write commands to it
+ * and then, finally, D) tell the GPU to switch to that context.
+ *
+ * Instead of the legacy MI_SET_CONTEXT, the way you tell the GPU to switch
+ * to a context is via a context execution list, ergo "Execlists".
+ *
+ * LRC implementation:
+ * Regarding the creation of contexts, we have:
+ *
+ * - One global default context.
+ * - One local default context for each opened fd.
+ * - One local extra context for each context create ioctl call.
+ *
+ * Now that ringbuffers belong per-context (and not per-engine, like before)
+ * and that contexts are uniquely tied to a given engine (and not reusable,
+ * like before) we need:
+ *
+ * - One ringbuffer per-engine inside each context.
+ * - One backing object per-engine inside each context.
+ *
+ * The global default context starts its life with these new objects fully
+ * allocated and populated. The local default context for each opened fd is
+ * more complex, because we don't know at creation time which engine is going
+ * to use them. To handle this, we have implemented a deferred creation of LR
+ * contexts:
+ *
+ * The local context starts its life as a hollow or blank holder, that only
+ * gets populated for a given engine once we receive an execbuffer. If later
+ * on we receive another execbuffer ioctl for the same context but a different
+ * engine, we allocate/populate a new ringbuffer and context backing object and
+ * so on.
+ *
+ * Finally, regarding local contexts created using the ioctl call: as they are
+ * only allowed with the render ring, we can allocate & populate them right
+ * away (no need to defer anything, at least for now).
+ *
+ * Execlists implementation:
  * Execlists are the new method by which, on gen8+ hardware, workloads are
  * submitted for execution (as opposed to the legacy, ringbuffer-based, method).
+ * This method works as follows:
+ *
+ * When a request is committed, its commands (the BB start and any leading or
+ * trailing commands, like the seqno breadcrumbs) are placed in the ringbuffer
+ * for the appropriate context. The tail pointer in the hardware context is not
+ * updated at this time, but instead, kept by the driver in the ringbuffer
+ * structure. A structure representing this request is added to a request queue
+ * for the appropriate engine: this structure contains a copy of the context's
+ * tail after the request was written to the ring buffer and a pointer to the
+ * context itself.
+ *
+ * If the engine's request queue was empty before the request was added, the
+ * queue is processed immediately. Otherwise the queue will be processed during
+ * a context switch interrupt. In any case, elements on the queue will get sent
+ * (in pairs) to the GPU's ExecLists Submit Port (ELSP, for short) with a
+ * globally unique 20-bits submission ID.
+ *
+ * When execution of a request completes, the GPU updates the context status
+ * buffer with a context complete event and generates a context switch interrupt.
+ * During the interrupt handling, the driver examines the events in the buffer:
+ * for each context complete event, if the announced ID matches that on the head
+ * of the request queue, then that request is retired and removed from the queue.
+ *
+ * After processing, if any requests were retired and the queue is not empty
+ * then a new execution list can be submitted. The two requests at the front of
+ * the queue are next to be submitted but since a context may not occur twice in
+ * an execution list, if subsequent requests have the same ID as the first then
+ * the two requests must be combined. This is done simply by discarding requests
+ * at the head of the queue until either only one request is left (in which case
+ * we use a NULL second context) or the first two requests have unique IDs.
+ *
+ * By always executing the first two requests in the queue the driver ensures
+ * that the GPU is kept as busy as possible. In the case where a single context
+ * completes but a second context is still executing, the request for this second
+ * context will be at the head of the queue when we remove the first one. This
+ * request will then be resubmitted along with a new request for a different context,
+ * which will cause the hardware to continue executing the second request and queue
+ * the new request (the GPU detects the condition of a context getting preempted
+ * with the same context and optimizes the context switch flow by not doing
+ * preemption, but just sampling the new tail pointer).
+ *
  */
 
 #include <drm/drmP.h>
@@ -109,6 +204,17 @@ enum {
 };
 #define GEN8_CTX_ID_SHIFT 32
 
+/**
+ * intel_sanitize_enable_execlists() - sanitize i915.enable_execlists
+ * @dev: DRM device.
+ * @enable_execlists: value of i915.enable_execlists module parameter.
+ *
+ * Only certain platforms support Execlists (the prerequisites being
+ * support for Logical Ring Contexts and Aliasing PPGTT or better),
+ * and only when enabled via module parameter.
+ *
+ * Return: 1 if Execlists is supported and has to be enabled.
+ */
 int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists)
 {
 	if (enable_execlists == 0)
@@ -121,6 +227,18 @@ int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists
 	return 0;
 }
 
+/**
+ * intel_execlists_ctx_id() - get the Execlists Context ID
+ * @ctx_obj: Logical Ring Context backing object.
+ *
+ * Do not confuse with ctx->id! Unfortunately we have a name overload
+ * here: the old context ID we pass to userspace as a handle so that
+ * they can refer to a context, and the new context ID we pass to the
+ * ELSP so that the GPU can inform us of the context status via
+ * interrupts.
+ *
+ * Return: 20-bits globally unique context ID.
+ */
 u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj)
 {
 	u32 lrca = i915_gem_obj_ggtt_offset(ctx_obj);
@@ -305,6 +423,13 @@ static bool execlists_check_remove_request(struct intel_engine_cs *ring,
 	return false;
 }
 
+/**
+ * intel_execlists_handle_ctx_events() - handle Context Switch interrupts
+ * @ring: Engine Command Streamer to handle.
+ *
+ * Check the unread Context Status Buffers and manage the submission of new
+ * contexts to the ELSP accordingly.
+ */
 void intel_execlists_handle_ctx_events(struct intel_engine_cs *ring)
 {
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
@@ -473,6 +598,23 @@ static int execlists_move_to_gpu(struct intel_ringbuffer *ringbuf,
 	return logical_ring_invalidate_all_caches(ringbuf);
 }
 
+/**
+ * execlists_submission() - submit a batchbuffer for execution, Execlists style
+ * @dev: DRM device.
+ * @file: DRM file.
+ * @ring: Engine Command Streamer to submit to.
+ * @ctx: Context to employ for this submission.
+ * @args: execbuffer call arguments.
+ * @vmas: list of vmas.
+ * @batch_obj: the batchbuffer to submit.
+ * @exec_start: batchbuffer start virtual address pointer.
+ * @flags: translated execbuffer call flags.
+ *
+ * This is the evil twin version of i915_gem_ringbuffer_submission. It abstracts
+ * away the submission details of the execbuffer ioctl call.
+ *
+ * Return: non-zero if the submission fails.
+ */
 int intel_execlists_submission(struct drm_device *dev, struct drm_file *file,
 			       struct intel_engine_cs *ring,
 			       struct intel_context *ctx,
@@ -598,6 +740,15 @@ int logical_ring_flush_all_caches(struct intel_ringbuffer *ringbuf)
 	return 0;
 }
 
+/**
+ * intel_logical_ring_advance_and_submit() - advance the tail and submit the workload
+ * @ringbuf: Logical Ringbuffer to advance.
+ *
+ * The tail is updated in our logical ringbuffer struct, not in the actual context. What
+ * really happens during submission is that the context and current tail will be placed
+ * on a queue waiting for the ELSP to be ready to accept a new context submission. At that
+ * point, the tail *inside* the context is updated and the ELSP written to.
+ */
 void intel_logical_ring_advance_and_submit(struct intel_ringbuffer *ringbuf)
 {
 	struct intel_engine_cs *ring = ringbuf->ring;
@@ -775,6 +926,19 @@ static int logical_ring_prepare(struct intel_ringbuffer *ringbuf, int bytes)
 	return 0;
 }
 
+/**
+ * intel_logical_ring_begin() - prepare the logical ringbuffer to accept some commands
+ *
+ * @ringbuf: Logical ringbuffer.
+ * @num_dwords: number of DWORDs that we plan to write to the ringbuffer.
+ *
+ * The ringbuffer might not be ready to accept the commands right away (maybe it needs to
+ * be wrapped, or wait a bit for the tail to be updated). This function takes care of that
+ * and also preallocates a request (every workload submission is still mediated through
+ * requests, same as it did with legacy ringbuffer submission).
+ *
+ * Return: non-zero if the ringbuffer is not ready to be written to.
+ */
 int intel_logical_ring_begin(struct intel_ringbuffer *ringbuf, int num_dwords)
 {
 	struct intel_engine_cs *ring = ringbuf->ring;
@@ -1011,6 +1175,12 @@ static int gen8_emit_request(struct intel_ringbuffer *ringbuf)
 	return 0;
 }
 
+/**
+ * intel_logical_ring_cleanup() - deallocate the Engine Command Streamer
+ *
+ * @ring: Engine Command Streamer.
+ *
+ */
 void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
 {
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
@@ -1205,6 +1375,16 @@ static int logical_vebox_ring_init(struct drm_device *dev)
 	return logical_ring_init(dev, ring);
 }
 
+/**
+ * intel_logical_rings_init() - allocate, populate and init the Engine Command Streamers
+ * @dev: DRM device.
+ *
+ * This function inits the engines for an Execlists submission style (the equivalent in the
+ * legacy ringbuffer submission world would be i915_gem_init_rings). It does it only for
+ * those engines that are present in the hardware.
+ *
+ * Return: non-zero if the initialization failed.
+ */
 int intel_logical_rings_init(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -1258,6 +1438,18 @@ cleanup_render_ring:
 	return ret;
 }
 
+/**
+ * intel_lr_context_render_state_init() - render state init for Execlists
+ * @ring: Engine Command Streamer.
+ * @ctx: Context to initialize.
+ *
+ * A.K.A. null-context, A.K.A. golden-context. In a word, the render engine
+ * contexts always require a valid 3d pipeline state. As this is
+ * achieved with the submission of a batchbuffer, we require an alternative
+ * entry point to the legacy ringbuffer submission one (i915_gem_render_state_init).
+ *
+ * Return: non-zero if the initialization failed.
+ */
 int intel_lr_context_render_state_init(struct intel_engine_cs *ring,
 				       struct intel_context *ctx)
 {
@@ -1398,6 +1590,14 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
 	return 0;
 }
 
+/**
+ * intel_lr_context_free() - free the LRC specific bits of a context
+ * @ctx: the LR context to free.
+ *
+ * The real context freeing is done in i915_gem_context_free: this only
+ * takes care of the bits that are LRC related: the per-engine backing
+ * objects and the logical ringbuffer.
+ */
 void intel_lr_context_free(struct intel_context *ctx)
 {
 	int i;
@@ -1436,6 +1636,19 @@ static uint32_t get_lr_context_size(struct intel_engine_cs *ring)
 	return ret;
 }
 
+/**
+ * intel_lr_context_deferred_create() - create the LRC specific bits of a context
+ * @ctx: LR context to create.
+ * @ring: engine to be used with the context.
+ *
+ * This function can be called more than once, with different engines, if we plan
+ * to use the context with them. The context backing objects and the ringbuffers
+ * (especially the ringbuffer backing objects) suck a lot of memory up, and that's why
+ * the creation is a deferred call: it's better to make sure first that we need to use
+ * a given ring with the context.
+ *
+ * Return: non-zero on error.
+ */
 int intel_lr_context_deferred_create(struct intel_context *ctx,
 				     struct intel_engine_cs *ring)
 {
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index f3b921b..ac32890 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -38,10 +38,21 @@ int intel_logical_rings_init(struct drm_device *dev);
 
 int logical_ring_flush_all_caches(struct intel_ringbuffer *ringbuf);
 void intel_logical_ring_advance_and_submit(struct intel_ringbuffer *ringbuf);
+/**
+ * intel_logical_ring_advance() - advance the ringbuffer tail
+ * @ringbuf: Ringbuffer to advance.
+ *
+ * The tail is only updated in our logical ringbuffer struct.
+ */
 static inline void intel_logical_ring_advance(struct intel_ringbuffer *ringbuf)
 {
 	ringbuf->tail &= ringbuf->size - 1;
 }
+/**
+ * intel_logical_ring_emit() - write a DWORD to the ringbuffer.
+ * @ringbuf: Ringbuffer to write to.
+ * @data: DWORD to write.
+ */
 static inline void intel_logical_ring_emit(struct intel_ringbuffer *ringbuf, u32 data)
 {
 	iowrite32(data, ringbuf->virtual_start + ringbuf->tail);
@@ -67,6 +78,25 @@ int intel_execlists_submission(struct drm_device *dev, struct drm_file *file,
 			       u64 exec_start, u32 flags);
 u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj);
 
+/**
+ * struct intel_ctx_submit_request - queued context submission request
+ * @ctx: Context to submit to the ELSP.
+ * @ring: Engine to submit it to.
+ * @tail: how far into the context's ringbuffer this request goes.
+ * @execlist_link: link in the submission queue.
+ * @work: work struct used to process this request in a bottom half.
+ * @elsp_submitted: number of times this request has been sent to the ELSP.
+ *
+ * The ELSP only accepts two elements at a time, so we queue context/tail
+ * pairs on a given queue (ring->execlist_queue) until the hardware is
+ * available. The queue serves a double purpose: we also use it to keep track
+ * of the up to 2 contexts currently in the hardware (usually one in execution
+ * and the other queued up by the GPU): we only remove elements from the head
+ * of the queue when the hardware informs us that an element has been
+ * completed.
+ *
+ * All accesses to the queue are mediated by a spinlock (ring->execlist_lock).
+ */
 struct intel_ctx_submit_request {
 	struct intel_context *ctx;
 	struct intel_engine_cs *ring;
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 40/42] drm/i915/bdw: Enable Logical Ring Contexts (hence, Execlists)
  2014-07-11 15:48 [PATCH 00/42] Execlists v4 oscar.mateo
                   ` (38 preceding siblings ...)
  2014-07-11 15:49 ` [PATCH 39/42] drm/i915/bdw: Document Logical Rings, LR contexts and Execlists oscar.mateo
@ 2014-07-11 15:49 ` oscar.mateo
  2014-07-11 15:49 ` [PATCH 41/42] drm/i915/bdw: Pin the context backing objects to GGTT on-demand oscar.mateo
                   ` (2 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: oscar.mateo @ 2014-07-11 15:49 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

The time has come, the Walrus said, to talk of many things.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 9e3e38e..426ef0c 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2054,7 +2054,7 @@ struct drm_i915_cmd_table {
 #define I915_NEED_GFX_HWS(dev)	(INTEL_INFO(dev)->need_gfx_hws)
 
 #define HAS_HW_CONTEXTS(dev)	(INTEL_INFO(dev)->gen >= 6)
-#define HAS_LOGICAL_RING_CONTEXTS(dev)	0
+#define HAS_LOGICAL_RING_CONTEXTS(dev)	(INTEL_INFO(dev)->gen >= 8)
 #define HAS_ALIASING_PPGTT(dev)	(INTEL_INFO(dev)->gen >= 6)
 #define HAS_PPGTT(dev)		(INTEL_INFO(dev)->gen >= 7 && !IS_GEN8(dev))
 #define USES_PPGTT(dev)		intel_enable_ppgtt(dev, false)
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 41/42] drm/i915/bdw: Pin the context backing objects to GGTT on-demand
  2014-07-11 15:48 [PATCH 00/42] Execlists v4 oscar.mateo
                   ` (39 preceding siblings ...)
  2014-07-11 15:49 ` [PATCH 40/42] drm/i915/bdw: Enable Logical Ring Contexts (hence, Execlists) oscar.mateo
@ 2014-07-11 15:49 ` oscar.mateo
  2014-07-11 15:49 ` [PATCH 42/42] drm/i915/bdw: Pin the ringbuffer backing object " oscar.mateo
  2014-07-11 17:26 ` [PATCH 00/42] Execlists v4 Volkin, Bradley D
  42 siblings, 0 replies; 44+ messages in thread
From: oscar.mateo @ 2014-07-11 15:49 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Up until now, we have pinned every logical ring context backing object
during creation, and left it pinned until destruction. This made my life
easier, but it's harmful because it causes fragmentation of the GGTT
(and, eventually, we would run out of space).

This patch makes the pinning on-demand: the backing objects of the two
contexts that are written to the ELSP are pinned right before submission
and unpinned once the hardware is done with them. The only context that
is still pinned regardless is the global default one, so that the HWS can
still be accessed in the same way (ring->status_page).
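
In pseudo-code, the intended pin lifecycle (a rough sketch of what the
diff below implements; error handling omitted) is:

	/* On a request's first ELSP submission (elsp_submitted == 0),
	 * pin everything except the global default context: */
	if (to != ring->default_context)
		i915_gem_obj_ggtt_pin(ctx_obj, GEN8_LR_CONTEXT_ALIGN, 0);

	/* Once the context switch interrupt reports the element done
	 * and the submission count drops back to zero, undo the pin: */
	if (--head_req->elsp_submitted <= 0 &&
	    head_req->ctx != ring->default_context)
		i915_gem_object_ggtt_unpin(ctx_obj);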

There are several TODOs left in this patch, so it's probably not ready for
merging yet:
- execlists_submit_context has increased chances of failing now, so we
  cannot simply BUG_ON if it does. I have outlined a possible recovery
  mechanism here, but maybe it's a bit too naive.
- The debugfs entries that access the context backing object need to be
  checked and modified to pin the object accordingly.
- A recovery from reset probably needs to unpin some objects as well.
- Something is not right during module re-loading (occasional failure
  in tests/drv_module_reload).
- Lots of testing!!

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 74 +++++++++++++++++++++++++++++-----------
 1 file changed, 54 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 3c3fa69..244269b 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -333,24 +333,41 @@ static int execlists_ctx_write_tail(struct drm_i915_gem_object *ctx_obj, u32 tail)
 }
 
 static int execlists_submit_context(struct intel_engine_cs *ring,
-				    struct intel_context *to0, u32 tail0,
-				    struct intel_context *to1, u32 tail1)
+				    struct intel_ctx_submit_request *req0,
+				    struct intel_ctx_submit_request *req1)
 {
-	struct drm_i915_gem_object *ctx_obj0;
+	struct intel_context *to0 = req0->ctx;
+	struct drm_i915_gem_object *ctx_obj0 = to0->engine[ring->id].state;
 	struct drm_i915_gem_object *ctx_obj1 = NULL;
+	bool ctx0_pinned = false;
+	int ret;
 
-	ctx_obj0 = to0->engine[ring->id].state;
 	BUG_ON(!ctx_obj0);
-	BUG_ON(!i915_gem_obj_is_pinned(ctx_obj0));
+	if (to0 != ring->default_context && !req0->elsp_submitted) {
+		ret = i915_gem_obj_ggtt_pin(ctx_obj0, GEN8_LR_CONTEXT_ALIGN, 0);
+		if (ret)
+			return ret;
+		ctx0_pinned = true;
+	}
+
+	execlists_ctx_write_tail(ctx_obj0, req0->tail);
 
-	execlists_ctx_write_tail(ctx_obj0, tail0);
+	if (req1) {
+		struct intel_context *to1 = req1->ctx;
 
-	if (to1) {
 		ctx_obj1 = to1->engine[ring->id].state;
+
 		BUG_ON(!ctx_obj1);
-		BUG_ON(!i915_gem_obj_is_pinned(ctx_obj1));
+		if (to1 != ring->default_context) {
+			ret = i915_gem_obj_ggtt_pin(ctx_obj1, GEN8_LR_CONTEXT_ALIGN, 0);
+			if (ret) {
+				if (ctx0_pinned)
+					i915_gem_object_ggtt_unpin(ctx_obj0);
+				return ret;
+			}
+		}
 
-		execlists_ctx_write_tail(ctx_obj1, tail1);
+		execlists_ctx_write_tail(ctx_obj1, req1->tail);
 	}
 
 	execlists_elsp_write(ring, ctx_obj0, ctx_obj1);
@@ -388,8 +405,15 @@ static void execlists_context_unqueue(struct intel_engine_cs *ring)
 
 	WARN_ON(req1 && req1->elsp_submitted);
 
-	BUG_ON(execlists_submit_context(ring, req0->ctx, req0->tail,
-			req1? req1->ctx : NULL, req1? req1->tail : 0));
+	if (WARN_ON(execlists_submit_context(ring, req0, req1))) {
+		list_del(&req0->execlist_link);
+		queue_work(dev_priv->wq, &req0->work);
+		if (req1) {
+			list_del(&req1->execlist_link);
+			queue_work(dev_priv->wq, &req1->work);
+		}
+		return;
+	}
 
 	req0->elsp_submitted++;
 	if (req1)
@@ -413,6 +437,8 @@ static bool execlists_check_remove_request(struct intel_engine_cs *ring,
 			WARN(head_req->elsp_submitted == 0,
 					"Never submitted head request\n");
 			if (--head_req->elsp_submitted <= 0) {
+				if (head_req->ctx != ring->default_context)
+					i915_gem_object_ggtt_unpin(ctx_obj);
 				list_del(&head_req->execlist_link);
 				queue_work(dev_priv->wq, &head_req->work);
 				return true;
@@ -1604,12 +1630,15 @@ void intel_lr_context_free(struct intel_context *ctx)
 
 	for (i = 0; i < I915_NUM_RINGS; i++) {
 		struct drm_i915_gem_object *ctx_obj = ctx->engine[i].state;
-		struct intel_ringbuffer *ringbuf = ctx->engine[i].ringbuf;
 
 		if (ctx_obj) {
+			struct intel_ringbuffer *ringbuf = ctx->engine[i].ringbuf;
+			struct intel_engine_cs *ring = ringbuf->ring;
+
 			intel_destroy_ringbuffer_obj(ringbuf);
 			kfree(ringbuf);
-			i915_gem_object_ggtt_unpin(ctx_obj);
+			if (ctx == ring->default_context)
+				i915_gem_object_ggtt_unpin(ctx_obj);
 			drm_gem_object_unreference(&ctx_obj->base);
 		}
 	}
@@ -1652,6 +1681,7 @@ static uint32_t get_lr_context_size(struct intel_engine_cs *ring)
 int intel_lr_context_deferred_create(struct intel_context *ctx,
 				     struct intel_engine_cs *ring)
 {
+	const bool is_global_default_ctx = (ctx == ring->default_context);
 	struct drm_device *dev = ring->dev;
 	struct drm_i915_gem_object *ctx_obj;
 	uint32_t context_size;
@@ -1671,18 +1701,21 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 		return ret;
 	}
 
-	ret = i915_gem_obj_ggtt_pin(ctx_obj, GEN8_LR_CONTEXT_ALIGN, 0);
-	if (ret) {
-		DRM_DEBUG_DRIVER("Pin LRC backing obj failed: %d\n", ret);
-		drm_gem_object_unreference(&ctx_obj->base);
-		return ret;
+	if (is_global_default_ctx) {
+		ret = i915_gem_obj_ggtt_pin(ctx_obj, GEN8_LR_CONTEXT_ALIGN, 0);
+		if (ret) {
+			DRM_DEBUG_DRIVER("Pin LRC backing obj failed: %d\n", ret);
+			drm_gem_object_unreference(&ctx_obj->base);
+			return ret;
+		}
 	}
 
 	ringbuf = kzalloc(sizeof(*ringbuf), GFP_KERNEL);
 	if (!ringbuf) {
 		DRM_DEBUG_DRIVER("Failed to allocate ringbuffer %s\n",
 				ring->name);
-		i915_gem_object_ggtt_unpin(ctx_obj);
+		if (is_global_default_ctx)
+			i915_gem_object_ggtt_unpin(ctx_obj);
 		drm_gem_object_unreference(&ctx_obj->base);
 		ret = -ENOMEM;
 		return ret;
@@ -1738,7 +1771,8 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 
 error:
 	kfree(ringbuf);
-	i915_gem_object_ggtt_unpin(ctx_obj);
+	if (is_global_default_ctx)
+		i915_gem_object_ggtt_unpin(ctx_obj);
 	drm_gem_object_unreference(&ctx_obj->base);
 	return ret;
 }
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 42/42] drm/i915/bdw: Pin the ringbuffer backing object to GGTT on-demand
  2014-07-11 15:48 [PATCH 00/42] Execlists v4 oscar.mateo
                   ` (40 preceding siblings ...)
  2014-07-11 15:49 ` [PATCH 41/42] drm/i915/bdw: Pin the context backing objects to GGTT on-demand oscar.mateo
@ 2014-07-11 15:49 ` oscar.mateo
  2014-07-11 17:26 ` [PATCH 00/42] Execlists v4 Volkin, Bradley D
  42 siblings, 0 replies; 44+ messages in thread
From: oscar.mateo @ 2014-07-11 15:49 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Same as with the context, unconditionally pinning to the GGTT is harmful
(it badly fragments the GGTT and can even exhaust it).

Unfortunately, this case is also more complex than the previous one
because we need to map and access the ringbuffer in several places
along the execbuffer path (and we cannot make do by leaving the
default ringbuffer pinned, as before). Also, the context object
itself contains a pointer to the ringbuffer address that we have to
keep updated if we are going to allow the ringbuffer to move around.
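
Concretely (a sketch of what the diff below does), every submission now
refreshes both the tail and the ring buffer start address in the
context image:

	reg_state[CTX_RING_TAIL+1] = tail;
	reg_state[CTX_RING_BUFFER_START+1] = i915_gem_obj_ggtt_offset(ring_obj);

and the execbuffer path brackets its ringbuffer accesses with a
refcounted pin & map:

	if (atomic_inc_return(&ringbuf->unmap_count) == 1)
		ret = intel_pin_and_map_ringbuffer_obj(dev, ringbuf);
	/* ... write commands into the ringbuffer ... */
	if (atomic_dec_return(&ringbuf->unmap_count) == 0)
		intel_unpin_ringbuffer_obj(ringbuf);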

Also as before, this patch is probably not ready for merging yet:
- The debugfs entries that access the context backing object need to be
  checked and modified to pin the object accordingly.
- A recovery from reset probably needs to unpin some objects as well.
- Something is not right during module re-loading (occasional failure
  in tests/drv_module_reload).
- Lots of testing needed (and maybe some targeted IGTs?)

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c        | 115 +++++++++++++++++++++++---------
 drivers/gpu/drm/i915/intel_ringbuffer.c |  83 +++++++++++++----------
 drivers/gpu/drm/i915/intel_ringbuffer.h |   4 ++
 3 files changed, 135 insertions(+), 67 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 244269b..9e00eb5 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -317,7 +317,9 @@ static void execlists_elsp_write(struct intel_engine_cs *ring,
 	spin_unlock_irqrestore(&dev_priv->uncore.lock, flags);
 }
 
-static int execlists_ctx_write_tail(struct drm_i915_gem_object *ctx_obj, u32 tail)
+static int execlists_update_context(struct drm_i915_gem_object *ctx_obj,
+				    struct drm_i915_gem_object *ring_obj,
+				    u32 tail)
 {
 	struct page *page;
 	uint32_t *reg_state;
@@ -326,6 +328,7 @@ static int execlists_ctx_write_tail(struct drm_i915_gem_object *ctx_obj, u32 tail)
 	reg_state = kmap_atomic(page);
 
 	reg_state[CTX_RING_TAIL+1] = tail;
+	reg_state[CTX_RING_BUFFER_START+1] = i915_gem_obj_ggtt_offset(ring_obj);
 
 	kunmap_atomic(reg_state);
 
@@ -338,41 +341,61 @@ static int execlists_submit_context(struct intel_engine_cs *ring,
 {
 	struct intel_context *to0 = req0->ctx;
 	struct drm_i915_gem_object *ctx_obj0 = to0->engine[ring->id].state;
+	struct intel_ringbuffer *ringbuf0 = to0->engine[ring->id].ringbuf;
 	struct drm_i915_gem_object *ctx_obj1 = NULL;
-	bool ctx0_pinned = false;
+	bool ctx0_pinned = false, rb0_pinned = false, ctx1_pinned = false;
 	int ret;
 
 	BUG_ON(!ctx_obj0);
-	if (to0 != ring->default_context && !req0->elsp_submitted) {
-		ret = i915_gem_obj_ggtt_pin(ctx_obj0, GEN8_LR_CONTEXT_ALIGN, 0);
+	if (!req0->elsp_submitted) {
+		if (to0 != ring->default_context) {
+			ret = i915_gem_obj_ggtt_pin(ctx_obj0, GEN8_LR_CONTEXT_ALIGN, 0);
+			if (ret)
+				return ret;
+			ctx0_pinned = true;
+		}
+
+		ret = i915_gem_obj_ggtt_pin(ringbuf0->obj, PAGE_SIZE, 0);
 		if (ret)
-			return ret;
-		ctx0_pinned = true;
+			goto error;
+		rb0_pinned = true;
 	}
 
-	execlists_ctx_write_tail(ctx_obj0, req0->tail);
+	execlists_update_context(ctx_obj0, ringbuf0->obj, req0->tail);
 
 	if (req1) {
 		struct intel_context *to1 = req1->ctx;
+		struct intel_ringbuffer *ringbuf1 = to1->engine[ring->id].ringbuf;
 
 		ctx_obj1 = to1->engine[ring->id].state;
 
 		BUG_ON(!ctx_obj1);
 		if (to1 != ring->default_context) {
 			ret = i915_gem_obj_ggtt_pin(ctx_obj1, GEN8_LR_CONTEXT_ALIGN, 0);
-			if (ret) {
-				if (ctx0_pinned)
-					i915_gem_object_ggtt_unpin(ctx_obj0);
-				return ret;
-			}
+			if (ret)
+				goto error;
+			ctx1_pinned = true;
 		}
 
-		execlists_ctx_write_tail(ctx_obj1, req1->tail);
+		ret = i915_gem_obj_ggtt_pin(ringbuf1->obj, PAGE_SIZE, 0);
+		if (ret)
+			goto error;
+
+		execlists_update_context(ctx_obj1, ringbuf1->obj, req1->tail);
 	}
 
 	execlists_elsp_write(ring, ctx_obj0, ctx_obj1);
 
 	return 0;
+
+error:
+	if (ctx0_pinned)
+		i915_gem_object_ggtt_unpin(ctx_obj0);
+	if (rb0_pinned)
+		i915_gem_object_ggtt_unpin(ringbuf0->obj);
+	if (ctx1_pinned)
+		i915_gem_object_ggtt_unpin(ctx_obj1);
+	return ret;
 }
 
 static void execlists_context_unqueue(struct intel_engine_cs *ring)
@@ -437,8 +460,11 @@ static bool execlists_check_remove_request(struct intel_engine_cs *ring,
 			WARN(head_req->elsp_submitted == 0,
 					"Never submitted head request\n");
 			if (--head_req->elsp_submitted <= 0) {
+				struct intel_ringbuffer *ringbuf =
+						head_req->ctx->engine[ring->id].ringbuf;
 				if (head_req->ctx != ring->default_context)
 					i915_gem_object_ggtt_unpin(ctx_obj);
+				i915_gem_object_ggtt_unpin(ringbuf->obj);
 				list_del(&head_req->execlist_link);
 				queue_work(dev_priv->wq, &head_req->work);
 				return true;
@@ -670,9 +696,15 @@ int intel_execlists_submission(struct drm_device *dev, struct drm_file *file,
 		}
 	}
 
+	if (atomic_inc_return(&ringbuf->unmap_count) == 1) {
+		ret = intel_pin_and_map_ringbuffer_obj(dev, ringbuf);
+		if (ret)
+			return ret;
+	}
+
 	ret = execlists_move_to_gpu(ringbuf, vmas);
 	if (ret)
-		return ret;
+		goto out;
 
 	instp_mode = args->flags & I915_EXEC_CONSTANTS_MASK;
 	instp_mask = I915_EXEC_CONSTANTS_MASK;
@@ -685,24 +717,27 @@ int intel_execlists_submission(struct drm_device *dev, struct drm_file *file,
 	case I915_EXEC_CONSTANTS_ABSOLUTE:
 		if (ring != &dev_priv->ring[RCS]) {
 			DRM_DEBUG("non-0 rel constants mode on non-RCS\n");
-			return -EINVAL;
+			ret = -EINVAL;
+			goto out;
 		}
 		break;
 
 	case I915_EXEC_CONSTANTS_REL_SURFACE:
 		DRM_DEBUG("rel surface constants mode invalid on gen5+\n");
-		return -EINVAL;
+		ret = -EINVAL;
+		goto out;
 
 	default:
 		DRM_DEBUG("execbuf with unknown constants: %d\n", instp_mode);
-		return -EINVAL;
+		ret = -EINVAL;
+		goto out;
 	}
 
 	if (ring == &dev_priv->ring[RCS] &&
 			instp_mode != dev_priv->relative_constants_mode) {
 		ret = intel_logical_ring_begin(ringbuf, 4);
 		if (ret)
-			return ret;
+			goto out;
 
 		intel_logical_ring_emit(ringbuf, MI_NOOP);
 		intel_logical_ring_emit(ringbuf, MI_LOAD_REGISTER_IMM(1));
@@ -715,17 +750,21 @@ int intel_execlists_submission(struct drm_device *dev, struct drm_file *file,
 
 	if (args->flags & I915_EXEC_GEN7_SOL_RESET) {
 		DRM_DEBUG("sol reset is gen7 only\n");
-		return -EINVAL;
+		ret = -EINVAL;
+		goto out;
 	}
 
 	ret = ring->emit_bb_start(ringbuf, exec_start, flags);
 	if (ret)
-		return ret;
+		goto out;
 
 	i915_gem_execbuffer_move_to_active(vmas, ring);
 	i915_gem_execbuffer_retire_commands(dev, file, ring, batch_obj);
 
-	return 0;
+out:
+	if (atomic_dec_return(&ringbuf->unmap_count) == 0)
+		intel_unpin_ringbuffer_obj(ringbuf);
+	return ret;
 }
 
 void intel_logical_ring_stop(struct intel_engine_cs *ring)
@@ -1492,6 +1531,12 @@ int intel_lr_context_render_state_init(struct intel_engine_cs *ring,
 	if (so.rodata == NULL)
 		return 0;
 
+	if (atomic_inc_return(&ringbuf->unmap_count) == 1) {
+		ret = intel_pin_and_map_ringbuffer_obj(ring->dev, ringbuf);
+		if (ret)
+			return ret;
+	}
+
 	ret = ring->emit_bb_start(ringbuf,
 			so.ggtt_offset,
 			I915_DISPATCH_SECURE);
@@ -1503,6 +1548,8 @@ int intel_lr_context_render_state_init(struct intel_engine_cs *ring,
 	ret = __i915_add_request(ring, file, so.obj, NULL);
 	/* intel_logical_ring_add_request moves object to inactive if it fails */
 out:
+	if (atomic_dec_return(&ringbuf->unmap_count) == 0)
+		intel_unpin_ringbuffer_obj(ringbuf);
 	i915_gem_render_state_fini(&so);
 	return ret;
 }
@@ -1554,7 +1601,13 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_obj,
 	reg_state[CTX_RING_TAIL] = RING_TAIL(ring->mmio_base);
 	reg_state[CTX_RING_TAIL+1] = 0;
 	reg_state[CTX_RING_BUFFER_START] = RING_START(ring->mmio_base);
+
+	ret = i915_gem_obj_ggtt_pin(ring_obj, PAGE_SIZE, 0);
+	if (ret)
+		goto error;
 	reg_state[CTX_RING_BUFFER_START+1] = i915_gem_obj_ggtt_offset(ring_obj);
+	i915_gem_object_ggtt_unpin(ring_obj);
+
 	reg_state[CTX_RING_BUFFER_CONTROL] = RING_CTL(ring->mmio_base);
 	reg_state[CTX_RING_BUFFER_CONTROL+1] =
 			((ringbuf->size - PAGE_SIZE) & RING_NR_PAGES) | RING_VALID;
@@ -1607,13 +1660,14 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_obj,
 		reg_state[CTX_R_PWR_CLK_STATE+1] = 0;
 	}
 
+error:
 	kunmap_atomic(reg_state);
 
 	ctx_obj->dirty = 1;
 	set_page_dirty(page);
 	i915_gem_object_unpin_pages(ctx_obj);
 
-	return 0;
+	return ret;
 }
 
 /**
@@ -1730,16 +1784,13 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 	ringbuf->space = ringbuf->size;
 	ringbuf->last_retired_head = -1;
 
-	/* TODO: For now we put this in the mappable region so that we can reuse
-	 * the existing ringbuffer code which ioremaps it. When we start
-	 * creating many contexts, this will no longer work and we must switch
-	 * to a kmapish interface.
-	 */
-	ret = intel_alloc_ringbuffer_obj(dev, ringbuf);
-	if (ret) {
-		DRM_DEBUG_DRIVER("Failed to allocate ringbuffer obj %s: %d\n",
-				ring->name, ret);
-		goto error;
+	if (ringbuf->obj == NULL) {
+		ret = intel_alloc_ringbuffer_obj(dev, ringbuf);
+		if (ret) {
+			DRM_DEBUG_DRIVER("Failed to allocate ringbuffer obj %s: %d\n",
+					ring->name, ret);
+			goto error;
+		}
 	}
 
 	ret = populate_lr_context(ctx, ctx_obj, ring, ringbuf);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 6e604c9..020588c 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1513,13 +1513,42 @@ static int init_phys_status_page(struct intel_engine_cs *ring)
 	return 0;
 }
 
-void intel_destroy_ringbuffer_obj(struct intel_ringbuffer *ringbuf)
+void intel_unpin_ringbuffer_obj(struct intel_ringbuffer *ringbuf)
 {
-	if (!ringbuf->obj)
-		return;
-
 	iounmap(ringbuf->virtual_start);
+	ringbuf->virtual_start = NULL;
 	i915_gem_object_ggtt_unpin(ringbuf->obj);
+}
+
+int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
+				     struct intel_ringbuffer *ringbuf)
+{
+	struct drm_i915_private *dev_priv = to_i915(dev);
+	struct drm_i915_gem_object *obj = ringbuf->obj;
+	int ret;
+
+	ret = i915_gem_obj_ggtt_pin(obj, PAGE_SIZE, PIN_MAPPABLE);
+	if (ret)
+		return ret;
+
+	ret = i915_gem_object_set_to_gtt_domain(obj, true);
+	if (ret) {
+		i915_gem_object_ggtt_unpin(obj);
+		return ret;
+	}
+
+	ringbuf->virtual_start = ioremap_wc(dev_priv->gtt.mappable_base +
+			i915_gem_obj_ggtt_offset(obj), ringbuf->size);
+	if (ringbuf->virtual_start == NULL) {
+		i915_gem_object_ggtt_unpin(obj);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+void intel_destroy_ringbuffer_obj(struct intel_ringbuffer *ringbuf)
+{
 	drm_gem_object_unreference(&ringbuf->obj->base);
 	ringbuf->obj = NULL;
 }
@@ -1527,12 +1556,7 @@ void intel_destroy_ringbuffer_obj(struct intel_ringbuffer *ringbuf)
 int intel_alloc_ringbuffer_obj(struct drm_device *dev,
 			       struct intel_ringbuffer *ringbuf)
 {
-	struct drm_i915_private *dev_priv = to_i915(dev);
 	struct drm_i915_gem_object *obj;
-	int ret;
-
-	if (ringbuf->obj)
-		return 0;
 
 	obj = NULL;
 	if (!HAS_LLC(dev))
@@ -1545,30 +1569,9 @@ int intel_alloc_ringbuffer_obj(struct drm_device *dev,
 	/* mark ring buffers as read-only from GPU side by default */
 	obj->gt_ro = 1;
 
-	ret = i915_gem_obj_ggtt_pin(obj, PAGE_SIZE, PIN_MAPPABLE);
-	if (ret)
-		goto err_unref;
-
-	ret = i915_gem_object_set_to_gtt_domain(obj, true);
-	if (ret)
-		goto err_unpin;
-
-	ringbuf->virtual_start =
-		ioremap_wc(dev_priv->gtt.mappable_base + i915_gem_obj_ggtt_offset(obj),
-				ringbuf->size);
-	if (ringbuf->virtual_start == NULL) {
-		ret = -EINVAL;
-		goto err_unpin;
-	}
-
 	ringbuf->obj = obj;
-	return 0;
 
-err_unpin:
-	i915_gem_object_ggtt_unpin(obj);
-err_unref:
-	drm_gem_object_unreference(&obj->base);
-	return ret;
+	return 0;
 }
 
 static int intel_init_ring_buffer(struct drm_device *dev,
@@ -1606,10 +1609,19 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 			goto error;
 	}
 
-	ret = intel_alloc_ringbuffer_obj(dev, ringbuf);
-	if (ret) {
-		DRM_ERROR("Failed to allocate ringbuffer %s: %d\n", ring->name, ret);
-		goto error;
+	if (ringbuf->obj == NULL) {
+		ret = intel_alloc_ringbuffer_obj(dev, ringbuf);
+		if (ret) {
+			DRM_ERROR("Failed to allocate ringbuffer %s: %d\n", ring->name, ret);
+			goto error;
+		}
+
+		ret = intel_pin_and_map_ringbuffer_obj(dev, ringbuf);
+		if (ret) {
+			DRM_ERROR("Failed to pin and map ringbuffer %s: %d\n", ring->name, ret);
+			intel_destroy_ringbuffer_obj(ringbuf);
+			goto error;
+		}
 	}
 
 	/* Workaround an erratum on the i830 which causes a hang if
@@ -1647,6 +1659,7 @@ void intel_cleanup_ring_buffer(struct intel_engine_cs *ring)
 	intel_stop_ring_buffer(ring);
 	WARN_ON(!IS_GEN2(ring->dev) && (I915_READ_MODE(ring) & MODE_IDLE) == 0);
 
+	intel_unpin_ringbuffer_obj(ringbuf);
 	intel_destroy_ringbuffer_obj(ringbuf);
 	ring->preallocated_lazy_request = NULL;
 	ring->outstanding_lazy_seqno = 0;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 905d1ba..feee5e1 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -94,6 +94,7 @@ struct intel_ring_hangcheck {
 struct intel_ringbuffer {
 	struct drm_i915_gem_object *obj;
 	void __iomem *virtual_start;
+	atomic_t unmap_count;
 
 	struct intel_engine_cs *ring;
 	struct intel_context *ctx;
@@ -371,6 +372,9 @@ intel_write_status_page(struct intel_engine_cs *ring,
 #define I915_GEM_HWS_SCRATCH_INDEX	0x30
 #define I915_GEM_HWS_SCRATCH_ADDR (I915_GEM_HWS_SCRATCH_INDEX << MI_STORE_DWORD_INDEX_SHIFT)
 
+void intel_unpin_ringbuffer_obj(struct intel_ringbuffer *ringbuf);
+int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
+				     struct intel_ringbuffer *ringbuf);
 void intel_destroy_ringbuffer_obj(struct intel_ringbuffer *ringbuf);
 int intel_alloc_ringbuffer_obj(struct drm_device *dev,
 			       struct intel_ringbuffer *ringbuf);
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH 00/42] Execlists v4
  2014-07-11 15:48 [PATCH 00/42] Execlists v4 oscar.mateo
                   ` (41 preceding siblings ...)
  2014-07-11 15:49 ` [PATCH 42/42] drm/i915/bdw: Pin the ringbuffer backing object " oscar.mateo
@ 2014-07-11 17:26 ` Volkin, Bradley D
  42 siblings, 0 replies; 44+ messages in thread
From: Volkin, Bradley D @ 2014-07-11 17:26 UTC (permalink / raw)
  To: oscar.mateo; +Cc: intel-gfx

On Fri, Jul 11, 2014 at 08:48:52AM -0700, oscar.mateo@intel.com wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> For a description of this patchset, please check the previous cover letters: [1], [2] and [3].
> 
> The main changes introduced in this v4 are:
> 
> - Do not abstract __i915_add_request away.
> - Squash together the two emit request functions.
> - Always pass the ringbuffer along, as the struct that holds together all required information.
> - Integrate help into DocBook.
> - Early sanitize for i915.execlists (and remove lrc_enabled).
> - Always defer the context creation: do not special case the context create ioctl.
> - Do not special case the reset_ring_cleanup, do INIT_LIST_HEAD.
> - Add TODO to error capture code.
> - "use_mmio_flip at driver discretion" means "always" with Execlists.
> - Lock before displaying execlists info in debugfs.
> - When populating the context, use _MASKED_BIT_ENABLE, upper/lower_32_bits, etc...
> - Add comment about CHV when populating the context.
> - Replace submit_ctx vfunc with a plain function call.
> - Do not change the behaviour of pipe control fini.
> - The BSD invalidate bit still exists in GEN8.
> - Squash intel_runtime_pm_get fix.
> - Pin ringbuffer and context backing objects only on-demand.
> - Reorder and rewrite to explain patches instead of showcasing them.
> - Reorder and rewrite to split "chapters" as cleanly as possible.
> 
> The patches in this series can be split into "chapters". They are:
> 
> - Logical Ring Contexts: patches 1 to 11
> - Logical Rings: patches 12 to 26
> - Execlists: patches 27 to 35
> - Debug, documentation and enabling: patches 36 to 40
> - Last minute patches that still require more attention: patches 41 and 42
> 
> Unfortunately, in a last-minute run of the IGT testsuite, I have noticed problems in gem_evict_everything. It's still not clear whether I can continue working on this project come next Monday, so I have decided to send the series to the mailing list "as-is", even though this potential eviction problem still needs to be chased. The last two patches (regarding on-demand pinning of backing objects) also require a deep review and probably some more work, but I was running out of time.
> 
> One other caveat I have noticed is that many WAs in gen8_init_clock_gating (those that affect registers that now exist per-context) can get lost in the render default context. The reason is, in Execlists, a context is saved as soon as head = tail (with MI_SET_CONTEXT, however, the context wouldn't be saved until you tried to restore a different context). As we are sending the golden state batchbuffer to the render ring as soon as the rings are initialized, we are effectively saving the default context before gen8_init_clock_gating has an opportunity to set the WAs. I haven't noticed any ill-effect from this (yet) but it would be a good idea to move the WAs somewhere else (ring init looks like a good place). I believe there is already work in progress to create a new WA architecture, so this can be tackled there.

Acked-by: Brad Volkin <bradley.d.volkin@intel.com>

I was on the hook to do detailed review for this but I won't have time for that
before my vacation and don't want to delay it. From a high-level pass, it looks
like all of my previous feedback was addressed and the new ordering looks good
as well. So from that perspective, I think we're ready to merge the bulk of
this series. Thanks for all the work on this, Oscar!

Brad

> 
> The previous IGT test [4] still applies.
> 
> [1]
> http://lists.freedesktop.org/archives/intel-gfx/2014-March/042563.html
> [2]
> http://lists.freedesktop.org/archives/intel-gfx/2014-May/044847.html
> [3]
> http://lists.freedesktop.org/archives/intel-gfx/2014-June/047138.html
> [4]
> http://lists.freedesktop.org/archives/intel-gfx/2014-May/044846.html
> 
> Ben Widawsky (2):
>   drm/i915/bdw: Implement context switching (somewhat)
>   drm/i915/bdw: Print context state in debugfs
> 
> Michel Thierry (1):
>   drm/i915/bdw: Two-stage execlist submit process
> 
> Oscar Mateo (38):
>   drm/i915/bdw: New source and header file for LRs, LRCs and Execlists
>   drm/i915/bdw: Macro for LRCs and module option for Execlists
>   drm/i915/bdw: Initialization for Logical Ring Contexts
>   drm/i915/bdw: Introduce one context backing object per engine
>   drm/i915/bdw: A bit more advanced LR context alloc/free
>   drm/i915/bdw: Allocate ringbuffers for Logical Ring Contexts
>   drm/i915/bdw: Add a context and an engine pointers to the ringbuffer
>   drm/i915/bdw: Populate LR contexts (somewhat)
>   drm/i915/bdw: Deferred creation of user-created LRCs
>   drm/i915/bdw: Render moot context reset and switch with Execlists
>   drm/i915/bdw: Don't write PDP in the legacy way when using LRCs
>   drm/i915: Abstract the legacy workload submission mechanism away
>   drm/i915/bdw: Skeleton for the new logical rings submission path
>   drm/i915/bdw: Generic logical ring init and cleanup
>   drm/i915/bdw: GEN-specific logical ring init
>   drm/i915/bdw: GEN-specific logical ring set/get seqno
>   drm/i915/bdw: New logical ring submission mechanism
>   drm/i915/bdw: GEN-specific logical ring emit request
>   drm/i915/bdw: GEN-specific logical ring emit flush
>   drm/i915/bdw: Emission of requests with logical rings
>   drm/i915/bdw: Ring idle and stop with logical rings
>   drm/i915/bdw: Interrupts with logical rings
>   drm/i915/bdw: GEN-specific logical ring emit batchbuffer start
>   drm/i915/bdw: Workload submission mechanism for Execlists
>   drm/i915/bdw: Always use MMIO flips with Execlists
>   drm/i915/bdw: Render state init for Execlists
>   drm/i915/bdw: Write the tail pointer, LRC style
>   drm/i915/bdw: Avoid non-lite-restore preemptions
>   drm/i915/bdw: Help out the ctx switch interrupt handler
>   drm/i915/bdw: Make sure gpu reset still works with Execlists
>   drm/i915/bdw: Make sure error capture keeps working with Execlists
>   drm/i915/bdw: Disable semaphores for Execlists
>   drm/i915/bdw: Display execlists info in debugfs
>   drm/i915/bdw: Display context backing obj & ringbuffer info in debugfs
>   drm/i915/bdw: Document Logical Rings, LR contexts and Execlists
>   drm/i915/bdw: Enable Logical Ring Contexts (hence, Execlists)
>   drm/i915/bdw: Pin the context backing objects to GGTT on-demand
>   drm/i915/bdw: Pin the ringbuffer backing object to GGTT on-demand
> 
> Thomas Daniel (1):
>   drm/i915/bdw: Handle context switch events
> 
>  Documentation/DocBook/drm.tmpl               |    5 +
>  drivers/gpu/drm/i915/Makefile                |    1 +
>  drivers/gpu/drm/i915/i915_debugfs.c          |  150 ++-
>  drivers/gpu/drm/i915/i915_drv.c              |    4 +
>  drivers/gpu/drm/i915/i915_drv.h              |   46 +-
>  drivers/gpu/drm/i915/i915_gem.c              |  106 +-
>  drivers/gpu/drm/i915/i915_gem_context.c      |   56 +-
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c   |   32 +-
>  drivers/gpu/drm/i915/i915_gem_gtt.c          |    5 +
>  drivers/gpu/drm/i915/i915_gem_render_state.c |   40 +-
>  drivers/gpu/drm/i915/i915_gem_render_state.h |   47 +
>  drivers/gpu/drm/i915/i915_gpu_error.c        |   22 +-
>  drivers/gpu/drm/i915/i915_irq.c              |   44 +-
>  drivers/gpu/drm/i915/i915_params.c           |    6 +
>  drivers/gpu/drm/i915/i915_reg.h              |    5 +
>  drivers/gpu/drm/i915/intel_display.c         |    2 +
>  drivers/gpu/drm/i915/intel_lrc.c             | 1829 ++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/intel_lrc.h             |  113 ++
>  drivers/gpu/drm/i915/intel_renderstate.h     |    8 +-
>  drivers/gpu/drm/i915/intel_ringbuffer.c      |  166 ++-
>  drivers/gpu/drm/i915/intel_ringbuffer.h      |   42 +-
>  21 files changed, 2574 insertions(+), 155 deletions(-)
>  create mode 100644 drivers/gpu/drm/i915/i915_gem_render_state.h
>  create mode 100644 drivers/gpu/drm/i915/intel_lrc.c
>  create mode 100644 drivers/gpu/drm/i915/intel_lrc.h
> 
> -- 
> 1.9.0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread
