* [PATCH 00/43] Execlists v5
@ 2014-07-24 16:04 Thomas Daniel
  2014-07-24 16:04 ` [PATCH 01/43] drm/i915: Reorder the actual workload submission so that args checking is done earlier Thomas Daniel
                   ` (44 more replies)
  0 siblings, 45 replies; 137+ messages in thread
From: Thomas Daniel @ 2014-07-24 16:04 UTC (permalink / raw)
  To: intel-gfx

From: Thomas Daniel <thomas.daniel@intel.com>

For a description of this patchset, please check the previous cover letters: [1], [2], [3] and [4].

I have taken ownership of this patchset from Oscar, and this version represents his last work on the execlists patchset.  The narrative below is from him.

I have been given some grace period to fix the remaining issues in Execlists before I move to a different project, and this is the result. There are very few differences between this v5 and the v4 I sent out last week, so I was unsure whether to drop a new patchbomb or simply reply to the patches that have changed, but I decided on the former to make the review easier.

The changes are:

- New prep-work patch to prevent a potential problem with the legacy ringbuffer submission extraction that was done earlier.
- Do the remaining intel_runtime_put while purging the execlists queue during reset.
- Check arguments before doing stuff in intel_execlists_submission. Also, get rel_constants parsing right.
- Do gen8_emit_flush = gen6_ring_flush + gen6_bsd_ring_flush.
- New patches for pinning context and ringbuffer backing objects on-demand (before, I was pinning at interrupt time, which was a no-no). These fix the remaining eviction issues I was seeing.

The previous comment about the WAs still applies. I reproduce it here for completeness:

"One other caveat I have noticed is that many WAs in gen8_init_clock_gating (those that affect registers that now exist per-context) can get lost in the render default context. The reason is, in Execlists, a context is saved as soon as head = tail (with MI_SET_CONTEXT, however, the context wouldn't be saved until you tried to restore a different context). As we are sending the golden state batchbuffer to the render ring as soon as the rings are initialized, we are effectively saving the default context before gen8_init_clock_gating has an opportunity to set the WAs. I haven't noticed any ill-effect from this (yet) but it would be a good idea to move the WAs somewhere else (ring init looks like a good place). I believe there is already work in progress to create a new WA architecture, so this can be tackled there."

The previous IGT test [5] still applies.

There are three pending issues:

- The test gem_close_race warns about "scheduling while atomic" when the shrinker gets called. Without Execlists, the shrinker does not get called at all (which kind of makes sense) but the test times out before finishing.
- The test gem_concurrent_blit fails in the gtt-* subtests: some pixels (14, to be exact) do not get copied correctly from one bo to another. Funnily enough, the tests pass if I do an i915 module reload first (./tests/drv_module_reload). Yesterday I dumped all the registers in the chip before and after a module reload (attached), but I haven't found any meaningful difference yet.
- When I try to run a whole IGT suite using Piglit, sometimes I hit the BUG_ON(!i915_gem_obj_is_pinned(ctx_obj0)) in execlists_submit_context(). I haven't managed to reproduce the problem at will, but there is obviously something wrong with the last two Execlists patches.

Keep the r-b tags coming, please!!

-- Oscar

[1]
http://lists.freedesktop.org/archives/intel-gfx/2014-March/042563.html
[2]
http://lists.freedesktop.org/archives/intel-gfx/2014-May/044847.html
[3]
http://lists.freedesktop.org/archives/intel-gfx/2014-June/047138.html
[4]
http://lists.freedesktop.org/archives/intel-gfx/2014-July/048944.html
[5]
http://lists.freedesktop.org/archives/intel-gfx/2014-May/044846.html

Ben Widawsky (2):
  drm/i915/bdw: Implement context switching (somewhat)
  drm/i915/bdw: Print context state in debugfs

Michel Thierry (1):
  drm/i915/bdw: Two-stage execlist submit process

Oscar Mateo (39):
  drm/i915: Reorder the actual workload submission so that args checking
    is done earlier
  drm/i915/bdw: New source and header file for LRs, LRCs and Execlists
  drm/i915/bdw: Macro for LRCs and module option for Execlists
  drm/i915/bdw: Initialization for Logical Ring Contexts
  drm/i915/bdw: Introduce one context backing object per engine
  drm/i915/bdw: A bit more advanced LR context alloc/free
  drm/i915/bdw: Allocate ringbuffers for Logical Ring Contexts
  drm/i915/bdw: Add a context and an engine pointers to the ringbuffer
  drm/i915/bdw: Populate LR contexts (somewhat)
  drm/i915/bdw: Deferred creation of user-created LRCs
  drm/i915/bdw: Render moot context reset and switch with Execlists
  drm/i915/bdw: Don't write PDP in the legacy way when using LRCs
  drm/i915: Abstract the legacy workload submission mechanism away
  drm/i915/bdw: Skeleton for the new logical rings submission path
  drm/i915/bdw: Generic logical ring init and cleanup
  drm/i915/bdw: GEN-specific logical ring init
  drm/i915/bdw: GEN-specific logical ring set/get seqno
  drm/i915/bdw: New logical ring submission mechanism
  drm/i915/bdw: GEN-specific logical ring emit request
  drm/i915/bdw: GEN-specific logical ring emit flush
  drm/i915/bdw: Emission of requests with logical rings
  drm/i915/bdw: Ring idle and stop with logical rings
  drm/i915/bdw: Interrupts with logical rings
  drm/i915/bdw: GEN-specific logical ring emit batchbuffer start
  drm/i915/bdw: Workload submission mechanism for Execlists
  drm/i915/bdw: Always use MMIO flips with Execlists
  drm/i915/bdw: Render state init for Execlists
  drm/i915/bdw: Write the tail pointer, LRC style
  drm/i915/bdw: Avoid non-lite-restore preemptions
  drm/i915/bdw: Help out the ctx switch interrupt handler
  drm/i915/bdw: Make sure gpu reset still works with Execlists
  drm/i915/bdw: Make sure error capture keeps working with Execlists
  drm/i915/bdw: Disable semaphores for Execlists
  drm/i915/bdw: Display execlists info in debugfs
  drm/i915/bdw: Display context backing obj & ringbuffer info in debugfs
  drm/i915/bdw: Document Logical Rings, LR contexts and Execlists
  drm/i915/bdw: Enable Logical Ring Contexts (hence, Execlists)
  drm/i915/bdw: Pin the context backing objects to GGTT on-demand
  drm/i915/bdw: Pin the ringbuffer backing object to GGTT on-demand

Thomas Daniel (1):
  drm/i915/bdw: Handle context switch events

 Documentation/DocBook/drm.tmpl               |    5 +
 drivers/gpu/drm/i915/Makefile                |    1 +
 drivers/gpu/drm/i915/i915_debugfs.c          |  157 ++-
 drivers/gpu/drm/i915/i915_drv.c              |    4 +
 drivers/gpu/drm/i915/i915_drv.h              |   47 +-
 drivers/gpu/drm/i915/i915_gem.c              |  132 +-
 drivers/gpu/drm/i915/i915_gem_context.c      |   56 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c   |  118 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c          |    5 +
 drivers/gpu/drm/i915/i915_gem_render_state.c |   40 +-
 drivers/gpu/drm/i915/i915_gem_render_state.h |   47 +
 drivers/gpu/drm/i915/i915_gpu_error.c        |   22 +-
 drivers/gpu/drm/i915/i915_irq.c              |   44 +-
 drivers/gpu/drm/i915/i915_params.c           |    6 +
 drivers/gpu/drm/i915/i915_reg.h              |    5 +
 drivers/gpu/drm/i915/intel_display.c         |    2 +
 drivers/gpu/drm/i915/intel_lrc.c             | 1802 ++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.h             |  115 ++
 drivers/gpu/drm/i915/intel_renderstate.h     |    8 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c      |  166 ++-
 drivers/gpu/drm/i915/intel_ringbuffer.h      |   41 +-
 21 files changed, 2622 insertions(+), 201 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_gem_render_state.h
 create mode 100644 drivers/gpu/drm/i915/intel_lrc.c
 create mode 100644 drivers/gpu/drm/i915/intel_lrc.h

-- 
1.9.0
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH 01/43] drm/i915: Reorder the actual workload submission so that args checking is done earlier
  2014-07-24 16:04 [PATCH 00/43] Execlists v5 Thomas Daniel
@ 2014-07-24 16:04 ` Thomas Daniel
  2014-07-25  8:30   ` Daniel Vetter
  2014-07-24 16:04 ` [PATCH 02/43] drm/i915/bdw: New source and header file for LRs, LRCs and Execlists Thomas Daniel
                   ` (43 subsequent siblings)
  44 siblings, 1 reply; 137+ messages in thread
From: Thomas Daniel @ 2014-07-24 16:04 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

In this patch:

commit 78382593e921c88371abd019aca8978db3248a8f
Author: Oscar Mateo <oscar.mateo@intel.com>
Date:   Thu Jul 3 16:28:05 2014 +0100

    drm/i915: Extract the actual workload submission mechanism from execbuffer

    So that we isolate the legacy ringbuffer submission mechanism, which becomes
    a good candidate to be abstracted away. This is prep-work for Execlists (which
    will have its own workload submission mechanism).

    No functional changes.

I changed the order in which the args checking is done. I don't know why I did it (brain
fade?) but it's not right. I haven't seen any ill effect from this, but the Execlists
version of this function will have problems if the order is not correct.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   86 ++++++++++++++--------------
 1 file changed, 43 insertions(+), 43 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 60998fc..c5115957 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1042,6 +1042,43 @@ legacy_ringbuffer_submission(struct drm_device *dev, struct drm_file *file,
 	u32 instp_mask;
 	int i, ret = 0;
 
+	instp_mode = args->flags & I915_EXEC_CONSTANTS_MASK;
+	instp_mask = I915_EXEC_CONSTANTS_MASK;
+	switch (instp_mode) {
+	case I915_EXEC_CONSTANTS_REL_GENERAL:
+	case I915_EXEC_CONSTANTS_ABSOLUTE:
+	case I915_EXEC_CONSTANTS_REL_SURFACE:
+		if (instp_mode != 0 && ring != &dev_priv->ring[RCS]) {
+			DRM_DEBUG("non-0 rel constants mode on non-RCS\n");
+			ret = -EINVAL;
+			goto error;
+		}
+
+		if (instp_mode != dev_priv->relative_constants_mode) {
+			if (INTEL_INFO(dev)->gen < 4) {
+				DRM_DEBUG("no rel constants on pre-gen4\n");
+				ret = -EINVAL;
+				goto error;
+			}
+
+			if (INTEL_INFO(dev)->gen > 5 &&
+			    instp_mode == I915_EXEC_CONSTANTS_REL_SURFACE) {
+				DRM_DEBUG("rel surface constants mode invalid on gen5+\n");
+				ret = -EINVAL;
+				goto error;
+			}
+
+			/* The HW changed the meaning on this bit on gen6 */
+			if (INTEL_INFO(dev)->gen >= 6)
+				instp_mask &= ~I915_EXEC_CONSTANTS_REL_SURFACE;
+		}
+		break;
+	default:
+		DRM_DEBUG("execbuf with unknown constants: %d\n", instp_mode);
+		ret = -EINVAL;
+		goto error;
+	}
+
 	if (args->num_cliprects != 0) {
 		if (ring != &dev_priv->ring[RCS]) {
 			DRM_DEBUG("clip rectangles are only valid with the render ring\n");
@@ -1085,6 +1122,12 @@ legacy_ringbuffer_submission(struct drm_device *dev, struct drm_file *file,
 		}
 	}
 
+	if (args->flags & I915_EXEC_GEN7_SOL_RESET) {
+		ret = i915_reset_gen7_sol_offsets(dev, ring);
+		if (ret)
+			goto error;
+	}
+
 	ret = i915_gem_execbuffer_move_to_gpu(ring, vmas);
 	if (ret)
 		goto error;
@@ -1093,43 +1136,6 @@ legacy_ringbuffer_submission(struct drm_device *dev, struct drm_file *file,
 	if (ret)
 		goto error;
 
-	instp_mode = args->flags & I915_EXEC_CONSTANTS_MASK;
-	instp_mask = I915_EXEC_CONSTANTS_MASK;
-	switch (instp_mode) {
-	case I915_EXEC_CONSTANTS_REL_GENERAL:
-	case I915_EXEC_CONSTANTS_ABSOLUTE:
-	case I915_EXEC_CONSTANTS_REL_SURFACE:
-		if (instp_mode != 0 && ring != &dev_priv->ring[RCS]) {
-			DRM_DEBUG("non-0 rel constants mode on non-RCS\n");
-			ret = -EINVAL;
-			goto error;
-		}
-
-		if (instp_mode != dev_priv->relative_constants_mode) {
-			if (INTEL_INFO(dev)->gen < 4) {
-				DRM_DEBUG("no rel constants on pre-gen4\n");
-				ret = -EINVAL;
-				goto error;
-			}
-
-			if (INTEL_INFO(dev)->gen > 5 &&
-			    instp_mode == I915_EXEC_CONSTANTS_REL_SURFACE) {
-				DRM_DEBUG("rel surface constants mode invalid on gen5+\n");
-				ret = -EINVAL;
-				goto error;
-			}
-
-			/* The HW changed the meaning on this bit on gen6 */
-			if (INTEL_INFO(dev)->gen >= 6)
-				instp_mask &= ~I915_EXEC_CONSTANTS_REL_SURFACE;
-		}
-		break;
-	default:
-		DRM_DEBUG("execbuf with unknown constants: %d\n", instp_mode);
-		ret = -EINVAL;
-		goto error;
-	}
-
 	if (ring == &dev_priv->ring[RCS] &&
 			instp_mode != dev_priv->relative_constants_mode) {
 		ret = intel_ring_begin(ring, 4);
@@ -1145,12 +1151,6 @@ legacy_ringbuffer_submission(struct drm_device *dev, struct drm_file *file,
 		dev_priv->relative_constants_mode = instp_mode;
 	}
 
-	if (args->flags & I915_EXEC_GEN7_SOL_RESET) {
-		ret = i915_reset_gen7_sol_offsets(dev, ring);
-		if (ret)
-			goto error;
-	}
-
 	exec_len = args->batch_len;
 	if (cliprects) {
 		for (i = 0; i < args->num_cliprects; i++) {
-- 
1.7.9.5



* [PATCH 02/43] drm/i915/bdw: New source and header file for LRs, LRCs and Execlists
  2014-07-24 16:04 [PATCH 00/43] Execlists v5 Thomas Daniel
  2014-07-24 16:04 ` [PATCH 01/43] drm/i915: Reorder the actual workload submission so that args checking is done earlier Thomas Daniel
@ 2014-07-24 16:04 ` Thomas Daniel
  2014-07-24 16:04 ` [PATCH 03/43] drm/i915/bdw: Macro for LRCs and module option for Execlists Thomas Daniel
                   ` (42 subsequent siblings)
  44 siblings, 0 replies; 137+ messages in thread
From: Thomas Daniel @ 2014-07-24 16:04 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Some legacy HW context code assumptions don't make sense for this new
submission method, so we will place this stuff in a separate file.

Note for reviewers: I've carefully considered the best name for this file
and this was my best option (other possibilities were intel_lr_context.c
or intel_execlist.c). I am open to a certain bikeshedding on this matter,
anyway.

At some point in time, it would be a good idea to split intel_lrc.c/.h
even further, but for the moment just shove everything together.

v2: Change to intel_lrc.c

v3: Squash together with the header file addition

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/Makefile    |    1 +
 drivers/gpu/drm/i915/i915_drv.h  |    1 +
 drivers/gpu/drm/i915/intel_lrc.c |   42 ++++++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.h |   27 ++++++++++++++++++++++++
 4 files changed, 71 insertions(+)
 create mode 100644 drivers/gpu/drm/i915/intel_lrc.c
 create mode 100644 drivers/gpu/drm/i915/intel_lrc.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index cad1683..9fee2a0 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -31,6 +31,7 @@ i915-y += i915_cmd_parser.o \
 	  i915_gpu_error.o \
 	  i915_irq.o \
 	  i915_trace_points.o \
+	  intel_lrc.o \
 	  intel_ringbuffer.o \
 	  intel_uncore.o
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 44a63f3..54c2bd9 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -35,6 +35,7 @@
 #include "i915_reg.h"
 #include "intel_bios.h"
 #include "intel_ringbuffer.h"
+#include "intel_lrc.h"
 #include "i915_gem_gtt.h"
 #include <linux/io-mapping.h>
 #include <linux/i2c.h>
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
new file mode 100644
index 0000000..49bb6fc
--- /dev/null
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -0,0 +1,42 @@
+/*
+ * Copyright © 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Ben Widawsky <ben@bwidawsk.net>
+ *    Michel Thierry <michel.thierry@intel.com>
+ *    Thomas Daniel <thomas.daniel@intel.com>
+ *    Oscar Mateo <oscar.mateo@intel.com>
+ *
+ */
+
+/*
+ * GEN8 brings an expansion of the HW contexts: "Logical Ring Contexts".
+ * These expanded contexts enable a number of new abilities, especially
+ * "Execlists" (also implemented in this file).
+ *
+ * Execlists are the new method by which, on gen8+ hardware, workloads are
+ * submitted for execution (as opposed to the legacy, ringbuffer-based, method).
+ */
+
+#include <drm/drmP.h>
+#include <drm/i915_drm.h>
+#include "i915_drv.h"
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
new file mode 100644
index 0000000..f6830a4
--- /dev/null
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -0,0 +1,27 @@
+/*
+ * Copyright © 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef _INTEL_LRC_H_
+#define _INTEL_LRC_H_
+
+#endif /* _INTEL_LRC_H_ */
-- 
1.7.9.5



* [PATCH 03/43] drm/i915/bdw: Macro for LRCs and module option for Execlists
  2014-07-24 16:04 [PATCH 00/43] Execlists v5 Thomas Daniel
  2014-07-24 16:04 ` [PATCH 01/43] drm/i915: Reorder the actual workload submission so that args checking is done earlier Thomas Daniel
  2014-07-24 16:04 ` [PATCH 02/43] drm/i915/bdw: New source and header file for LRs, LRCs and Execlists Thomas Daniel
@ 2014-07-24 16:04 ` Thomas Daniel
  2014-08-11 13:57   ` Daniel Vetter
  2014-07-24 16:04 ` [PATCH 04/43] drm/i915/bdw: Initialization for Logical Ring Contexts Thomas Daniel
                   ` (41 subsequent siblings)
  44 siblings, 1 reply; 137+ messages in thread
From: Thomas Daniel @ 2014-07-24 16:04 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

GEN8 brings an expansion of the HW contexts: "Logical Ring Contexts".
These expanded contexts enable a number of new abilities, especially
"Execlists".

The macro is defined to off until we have enough things in place for it
to work.

v2: Rename "advanced contexts" to the more correct "logical ring
contexts".

v3: Add a module parameter to enable execlists. Execlists are relatively
new, and so it'd be wise to be able to switch back to ring submission
to debug subtle problems that will inevitably arise.

v4: Add an intel_enable_execlists function.

v5: Sanitize early, as suggested by Daniel. Remove lrc_enabled.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com> (v3)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2, v4 & v5)
---
 drivers/gpu/drm/i915/i915_drv.h    |    2 ++
 drivers/gpu/drm/i915/i915_gem.c    |    3 +++
 drivers/gpu/drm/i915/i915_params.c |    6 ++++++
 drivers/gpu/drm/i915/intel_lrc.c   |   11 +++++++++++
 drivers/gpu/drm/i915/intel_lrc.h   |    3 +++
 5 files changed, 25 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 54c2bd9..a793d6d 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2037,6 +2037,7 @@ struct drm_i915_cmd_table {
 #define I915_NEED_GFX_HWS(dev)	(INTEL_INFO(dev)->need_gfx_hws)
 
 #define HAS_HW_CONTEXTS(dev)	(INTEL_INFO(dev)->gen >= 6)
+#define HAS_LOGICAL_RING_CONTEXTS(dev)	0
 #define HAS_ALIASING_PPGTT(dev)	(INTEL_INFO(dev)->gen >= 6)
 #define HAS_PPGTT(dev)		(INTEL_INFO(dev)->gen >= 7 && !IS_GEN8(dev))
 #define USES_PPGTT(dev)		intel_enable_ppgtt(dev, false)
@@ -2122,6 +2123,7 @@ struct i915_params {
 	int enable_rc6;
 	int enable_fbc;
 	int enable_ppgtt;
+	int enable_execlists;
 	int enable_psr;
 	unsigned int preliminary_hw_support;
 	int disable_power_well;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index e5d4d73..d8bf4fa 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4746,6 +4746,9 @@ int i915_gem_init(struct drm_device *dev)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	int ret;
 
+	i915.enable_execlists = intel_sanitize_enable_execlists(dev,
+			i915.enable_execlists);
+
 	mutex_lock(&dev->struct_mutex);
 
 	if (IS_VALLEYVIEW(dev)) {
diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
index bbdee21..7f0fb72 100644
--- a/drivers/gpu/drm/i915/i915_params.c
+++ b/drivers/gpu/drm/i915/i915_params.c
@@ -35,6 +35,7 @@ struct i915_params i915 __read_mostly = {
 	.vbt_sdvo_panel_type = -1,
 	.enable_rc6 = -1,
 	.enable_fbc = -1,
+	.enable_execlists = -1,
 	.enable_hangcheck = true,
 	.enable_ppgtt = -1,
 	.enable_psr = 1,
@@ -117,6 +118,11 @@ MODULE_PARM_DESC(enable_ppgtt,
 	"Override PPGTT usage. "
 	"(-1=auto [default], 0=disabled, 1=aliasing, 2=full)");
 
+module_param_named(enable_execlists, i915.enable_execlists, int, 0400);
+MODULE_PARM_DESC(enable_execlists,
+	"Override execlists usage. "
+	"(-1=auto [default], 0=disabled, 1=enabled)");
+
 module_param_named(enable_psr, i915.enable_psr, int, 0600);
 MODULE_PARM_DESC(enable_psr, "Enable PSR (default: true)");
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 49bb6fc..21f7f1c 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -40,3 +40,14 @@
 #include <drm/drmP.h>
 #include <drm/i915_drm.h>
 #include "i915_drv.h"
+
+int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists)
+{
+	if (enable_execlists == 0)
+		return 0;
+
+	if (HAS_LOGICAL_RING_CONTEXTS(dev) && USES_PPGTT(dev))
+		return 1;
+
+	return 0;
+}
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index f6830a4..75ee9c3 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -24,4 +24,7 @@
 #ifndef _INTEL_LRC_H_
 #define _INTEL_LRC_H_
 
+/* Execlists */
+int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists);
+
 #endif /* _INTEL_LRC_H_ */
-- 
1.7.9.5


* [PATCH 04/43] drm/i915/bdw: Initialization for Logical Ring Contexts
  2014-07-24 16:04 [PATCH 00/43] Execlists v5 Thomas Daniel
                   ` (2 preceding siblings ...)
  2014-07-24 16:04 ` [PATCH 03/43] drm/i915/bdw: Macro for LRCs and module option for Execlists Thomas Daniel
@ 2014-07-24 16:04 ` Thomas Daniel
  2014-08-11 14:03   ` Daniel Vetter
  2014-07-24 16:04 ` [PATCH 05/43] drm/i915/bdw: Introduce one context backing object per engine Thomas Daniel
                   ` (40 subsequent siblings)
  44 siblings, 1 reply; 137+ messages in thread
From: Thomas Daniel @ 2014-07-24 16:04 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

For the moment this is just a placeholder, but it shows one of the
main differences between the good ol' HW contexts and the shiny
new Logical Ring Contexts: LR contexts allocate and free their
own backing objects. Another difference is that the allocation is
deferred (as the create function name suggests), but that does not
happen in this patch yet, because for the moment we are only dealing
with the default context.

Early in the series we had our own gen8_gem_context_init/fini
functions, but the truth is they now look almost the same as the
legacy hw context init/fini functions. We can always split them
later if this ceases to be the case.

Also, we do not fall back to legacy ringbuffers when logical ring
context initialization fails (not very likely to happen and, even
if it does, hw contexts would probably fail as well).

v2: Daniel says "explain, do not showcase".

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c |   29 +++++++++++++++++++++++------
 drivers/gpu/drm/i915/intel_lrc.c        |   15 +++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.h        |    5 +++++
 3 files changed, 43 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index de72a28..718150e 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -182,7 +182,10 @@ void i915_gem_context_free(struct kref *ctx_ref)
 						   typeof(*ctx), ref);
 	struct i915_hw_ppgtt *ppgtt = NULL;
 
-	if (ctx->legacy_hw_ctx.rcs_state) {
+	if (i915.enable_execlists) {
+		ppgtt = ctx_to_ppgtt(ctx);
+		intel_lr_context_free(ctx);
+	} else if (ctx->legacy_hw_ctx.rcs_state) {
 		/* We refcount even the aliasing PPGTT to keep the code symmetric */
 		if (USES_PPGTT(ctx->legacy_hw_ctx.rcs_state->base.dev))
 			ppgtt = ctx_to_ppgtt(ctx);
@@ -419,7 +422,11 @@ int i915_gem_context_init(struct drm_device *dev)
 	if (WARN_ON(dev_priv->ring[RCS].default_context))
 		return 0;
 
-	if (HAS_HW_CONTEXTS(dev)) {
+	if (i915.enable_execlists) {
+		/* NB: intentionally left blank. We will allocate our own
+		 * backing objects as we need them, thank you very much */
+		dev_priv->hw_context_size = 0;
+	} else if (HAS_HW_CONTEXTS(dev)) {
 		dev_priv->hw_context_size = round_up(get_context_size(dev), 4096);
 		if (dev_priv->hw_context_size > (1<<20)) {
 			DRM_DEBUG_DRIVER("Disabling HW Contexts; invalid size %d\n",
@@ -435,11 +442,20 @@ int i915_gem_context_init(struct drm_device *dev)
 		return PTR_ERR(ctx);
 	}
 
-	/* NB: RCS will hold a ref for all rings */
-	for (i = 0; i < I915_NUM_RINGS; i++)
-		dev_priv->ring[i].default_context = ctx;
+	for (i = 0; i < I915_NUM_RINGS; i++) {
+		struct intel_engine_cs *ring = &dev_priv->ring[i];
+
+		/* NB: RCS will hold a ref for all rings */
+		ring->default_context = ctx;
+
+		/* FIXME: we really only want to do this for initialized rings */
+		if (i915.enable_execlists)
+			intel_lr_context_deferred_create(ctx, ring);
+	}
 
-	DRM_DEBUG_DRIVER("%s context support initialized\n", dev_priv->hw_context_size ? "HW" : "fake");
+	DRM_DEBUG_DRIVER("%s context support initialized\n",
+			i915.enable_execlists ? "LR" :
+			dev_priv->hw_context_size ? "HW" : "fake");
 	return 0;
 }
 
@@ -781,6 +797,7 @@ int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
 	struct intel_context *ctx;
 	int ret;
 
+	/* FIXME: allow user-created LR contexts as well */
 	if (!hw_context_enabled(dev))
 		return -ENODEV;
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 21f7f1c..8cc6b55 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -51,3 +51,18 @@ int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists
 
 	return 0;
 }
+
+void intel_lr_context_free(struct intel_context *ctx)
+{
+	/* TODO */
+}
+
+int intel_lr_context_deferred_create(struct intel_context *ctx,
+				     struct intel_engine_cs *ring)
+{
+	BUG_ON(ctx->legacy_hw_ctx.rcs_state != NULL);
+
+	/* TODO */
+
+	return 0;
+}
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 75ee9c3..3b93572 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -24,6 +24,11 @@
 #ifndef _INTEL_LRC_H_
 #define _INTEL_LRC_H_
 
+/* Logical Ring Contexts */
+void intel_lr_context_free(struct intel_context *ctx);
+int intel_lr_context_deferred_create(struct intel_context *ctx,
+				     struct intel_engine_cs *ring);
+
 /* Execlists */
 int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists);
 
-- 
1.7.9.5


* [PATCH 05/43] drm/i915/bdw: Introduce one context backing object per engine
  2014-07-24 16:04 [PATCH 00/43] Execlists v5 Thomas Daniel
                   ` (3 preceding siblings ...)
  2014-07-24 16:04 ` [PATCH 04/43] drm/i915/bdw: Initialization for Logical Ring Contexts Thomas Daniel
@ 2014-07-24 16:04 ` Thomas Daniel
  2014-08-11 13:59   ` [PATCH] drm/i915: WARN if module opt sanitization goes out of order Daniel Vetter
  2014-07-24 16:04 ` [PATCH 06/43] drm/i915/bdw: A bit more advanced LR context alloc/free Thomas Daniel
                   ` (39 subsequent siblings)
  44 siblings, 1 reply; 137+ messages in thread
From: Thomas Daniel @ 2014-07-24 16:04 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

A context backing object only makes sense for a given engine (because
it holds state data specific to that engine).

In legacy ringbuffer submission mode, the only MI_SET_CONTEXT we really
perform is for the render engine, so one backing object is all we need.

With Execlists, however, we need backing objects for every engine, as
contexts become the only way to submit workloads to the GPU. To tackle
this problem, we multiplex the context struct to contain <no-of-engines>
objects.

Originally, I colored this code by instantiating one new context for
every engine I wanted to use, but this change suggested by Brad Volkin
makes it more elegant.

v2: Leave the old backing object pointer behind. Daniel Vetter suggested
using a union, but it makes more sense to keep rcs_state as a NULL
pointer behind, to make sure no one uses it incorrectly when Execlists
are enabled, similar to what he suggested for ring->buffer (Rusty's API
level 5).

v3: Use the name "state" instead of the too-generic "obj", so that it
mirrors the name choice for the legacy rcs_state.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h |    6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index a793d6d..b2b0c80 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -616,11 +616,17 @@ struct intel_context {
 	struct i915_ctx_hang_stats hang_stats;
 	struct i915_address_space *vm;
 
+	/* Legacy ring buffer submission */
 	struct {
 		struct drm_i915_gem_object *rcs_state;
 		bool initialized;
 	} legacy_hw_ctx;
 
+	/* Execlists */
+	struct {
+		struct drm_i915_gem_object *state;
+	} engine[I915_NUM_RINGS];
+
 	struct list_head link;
 };
 
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 137+ messages in thread

* [PATCH 06/43] drm/i915/bdw: A bit more advanced LR context alloc/free
  2014-07-24 16:04 [PATCH 00/43] Execlists v5 Thomas Daniel
                   ` (4 preceding siblings ...)
  2014-07-24 16:04 ` [PATCH 05/43] drm/i915/bdw: Introduce one context backing object per engine Thomas Daniel
@ 2014-07-24 16:04 ` Thomas Daniel
  2014-07-24 16:04 ` [PATCH 07/43] drm/i915/bdw: Allocate ringbuffers for Logical Ring Contexts Thomas Daniel
                   ` (38 subsequent siblings)
  44 siblings, 0 replies; 137+ messages in thread
From: Thomas Daniel @ 2014-07-24 16:04 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Now that we have the ability to allocate our own context backing objects
and we have multiplexed one of them per engine inside the context structs,
we can finally allocate and free them correctly.

Regarding the context size, reading the register to calculate the sizes
might work; however, the docs are very clear about the actual context
sizes on GEN8, so just hardcode them and use that.

v2: Rebased on top of the Full PPGTT series. It is important to notice
that at this point we have one global default context per engine, all
of them using the aliasing PPGTT (as opposed to the single global
default context we have with legacy HW contexts).

v3:
- Go back to one single global default context, this time with multiple
  backing objects inside.
- Use different context sizes for non-render engines, as suggested by
  Damien (still hardcoded, since the information about the context size
  registers in the BSpec is, well, *lacking*).
- Render ctx size is 20 (or 19) pages, but not 21 (caught by Damien).
- Move default context backing object creation to intel_init_ring (so
  that we don't waste memory in rings that might not get initialized).

v4:
- Reuse the HW legacy context init/fini.
- Create a separate free function.
- Rename the functions with an intel_ prefix.

v5: Several rebases to account for the changes in the previous patches.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h         |    2 ++
 drivers/gpu/drm/i915/i915_gem_context.c |    2 +-
 drivers/gpu/drm/i915/intel_lrc.c        |   59 +++++++++++++++++++++++++++++--
 3 files changed, 60 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index b2b0c80..f2a6598 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2500,6 +2500,8 @@ int i915_switch_context(struct intel_engine_cs *ring,
 struct intel_context *
 i915_gem_context_get(struct drm_i915_file_private *file_priv, u32 id);
 void i915_gem_context_free(struct kref *ctx_ref);
+struct drm_i915_gem_object *
+i915_gem_alloc_context_obj(struct drm_device *dev, size_t size);
 static inline void i915_gem_context_reference(struct intel_context *ctx)
 {
 	kref_get(&ctx->ref);
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 718150e..48d7476 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -201,7 +201,7 @@ void i915_gem_context_free(struct kref *ctx_ref)
 	kfree(ctx);
 }
 
-static struct drm_i915_gem_object *
+struct drm_i915_gem_object *
 i915_gem_alloc_context_obj(struct drm_device *dev, size_t size)
 {
 	struct drm_i915_gem_object *obj;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 8cc6b55..a3fc6fc 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -41,6 +41,11 @@
 #include <drm/i915_drm.h>
 #include "i915_drv.h"
 
+#define GEN8_LR_CONTEXT_RENDER_SIZE (20 * PAGE_SIZE)
+#define GEN8_LR_CONTEXT_OTHER_SIZE (2 * PAGE_SIZE)
+
+#define GEN8_LR_CONTEXT_ALIGN 4096
+
 int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists)
 {
 	if (enable_execlists == 0)
@@ -54,15 +59,65 @@ int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists
 
 void intel_lr_context_free(struct intel_context *ctx)
 {
-	/* TODO */
+	int i;
+
+	for (i = 0; i < I915_NUM_RINGS; i++) {
+		struct drm_i915_gem_object *ctx_obj = ctx->engine[i].state;
+		if (ctx_obj) {
+			i915_gem_object_ggtt_unpin(ctx_obj);
+			drm_gem_object_unreference(&ctx_obj->base);
+		}
+	}
+}
+
+static uint32_t get_lr_context_size(struct intel_engine_cs *ring)
+{
+	int ret = 0;
+
+	WARN_ON(INTEL_INFO(ring->dev)->gen != 8);
+
+	switch (ring->id) {
+	case RCS:
+		ret = GEN8_LR_CONTEXT_RENDER_SIZE;
+		break;
+	case VCS:
+	case BCS:
+	case VECS:
+	case VCS2:
+		ret = GEN8_LR_CONTEXT_OTHER_SIZE;
+		break;
+	}
+
+	return ret;
 }
 
 int intel_lr_context_deferred_create(struct intel_context *ctx,
 				     struct intel_engine_cs *ring)
 {
+	struct drm_device *dev = ring->dev;
+	struct drm_i915_gem_object *ctx_obj;
+	uint32_t context_size;
+	int ret;
+
 	BUG_ON(ctx->legacy_hw_ctx.rcs_state != NULL);
 
-	/* TODO */
+	context_size = round_up(get_lr_context_size(ring), 4096);
+
+	ctx_obj = i915_gem_alloc_context_obj(dev, context_size);
+	if (IS_ERR(ctx_obj)) {
+		ret = PTR_ERR(ctx_obj);
+		DRM_DEBUG_DRIVER("Alloc LRC backing obj failed: %d\n", ret);
+		return ret;
+	}
+
+	ret = i915_gem_obj_ggtt_pin(ctx_obj, GEN8_LR_CONTEXT_ALIGN, 0);
+	if (ret) {
+		DRM_DEBUG_DRIVER("Pin LRC backing obj failed: %d\n", ret);
+		drm_gem_object_unreference(&ctx_obj->base);
+		return ret;
+	}
+
+	ctx->engine[ring->id].state = ctx_obj;
 
 	return 0;
 }
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 137+ messages in thread

* [PATCH 07/43] drm/i915/bdw: Allocate ringbuffers for Logical Ring Contexts
  2014-07-24 16:04 [PATCH 00/43] Execlists v5 Thomas Daniel
                   ` (5 preceding siblings ...)
  2014-07-24 16:04 ` [PATCH 06/43] drm/i915/bdw: A bit more advanced LR context alloc/free Thomas Daniel
@ 2014-07-24 16:04 ` Thomas Daniel
  2014-07-24 16:04 ` [PATCH 08/43] drm/i915/bdw: Add a context and an engine pointers to the ringbuffer Thomas Daniel
                   ` (37 subsequent siblings)
  44 siblings, 0 replies; 137+ messages in thread
From: Thomas Daniel @ 2014-07-24 16:04 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

As we have said a couple of times by now, logical ring contexts have
their own ringbuffers: not only the backing pages, but the whole
management struct.

In a previous version of the series, this was achieved with two separate
patches:
drm/i915/bdw: Allocate ringbuffer backing objects for default global LRC
drm/i915/bdw: Allocate ringbuffer for user-created LRCs

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h         |    1 +
 drivers/gpu/drm/i915/intel_lrc.c        |   38 +++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.c |    6 ++---
 drivers/gpu/drm/i915/intel_ringbuffer.h |    4 ++++
 4 files changed, 46 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index f2a6598..ff2c373 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -625,6 +625,7 @@ struct intel_context {
 	/* Execlists */
 	struct {
 		struct drm_i915_gem_object *state;
+		struct intel_ringbuffer *ringbuf;
 	} engine[I915_NUM_RINGS];
 
 	struct list_head link;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index a3fc6fc..0a12b8c 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -63,7 +63,11 @@ void intel_lr_context_free(struct intel_context *ctx)
 
 	for (i = 0; i < I915_NUM_RINGS; i++) {
 		struct drm_i915_gem_object *ctx_obj = ctx->engine[i].state;
+		struct intel_ringbuffer *ringbuf = ctx->engine[i].ringbuf;
+
 		if (ctx_obj) {
+			intel_destroy_ringbuffer_obj(ringbuf);
+			kfree(ringbuf);
 			i915_gem_object_ggtt_unpin(ctx_obj);
 			drm_gem_object_unreference(&ctx_obj->base);
 		}
@@ -97,6 +101,7 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 	struct drm_device *dev = ring->dev;
 	struct drm_i915_gem_object *ctx_obj;
 	uint32_t context_size;
+	struct intel_ringbuffer *ringbuf;
 	int ret;
 
 	BUG_ON(ctx->legacy_hw_ctx.rcs_state != NULL);
@@ -117,6 +122,39 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 		return ret;
 	}
 
+	ringbuf = kzalloc(sizeof(*ringbuf), GFP_KERNEL);
+	if (!ringbuf) {
+		DRM_DEBUG_DRIVER("Failed to allocate ringbuffer %s\n",
+				ring->name);
+		i915_gem_object_ggtt_unpin(ctx_obj);
+		drm_gem_object_unreference(&ctx_obj->base);
+		ret = -ENOMEM;
+		return ret;
+	}
+
+	ringbuf->size = 32 * PAGE_SIZE;
+	ringbuf->effective_size = ringbuf->size;
+	ringbuf->head = 0;
+	ringbuf->tail = 0;
+	ringbuf->space = ringbuf->size;
+	ringbuf->last_retired_head = -1;
+
+	/* TODO: For now we put this in the mappable region so that we can reuse
+	 * the existing ringbuffer code which ioremaps it. When we start
+	 * creating many contexts, this will no longer work and we must switch
+	 * to a kmapish interface.
+	 */
+	ret = intel_alloc_ringbuffer_obj(dev, ringbuf);
+	if (ret) {
+		DRM_DEBUG_DRIVER("Failed to allocate ringbuffer obj %s: %d\n",
+				ring->name, ret);
+		kfree(ringbuf);
+		i915_gem_object_ggtt_unpin(ctx_obj);
+		drm_gem_object_unreference(&ctx_obj->base);
+		return ret;
+	}
+
+	ctx->engine[ring->id].ringbuf = ringbuf;
 	ctx->engine[ring->id].state = ctx_obj;
 
 	return 0;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 599709e..01e9840 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1495,7 +1495,7 @@ static int init_phys_status_page(struct intel_engine_cs *ring)
 	return 0;
 }
 
-static void intel_destroy_ringbuffer_obj(struct intel_ringbuffer *ringbuf)
+void intel_destroy_ringbuffer_obj(struct intel_ringbuffer *ringbuf)
 {
 	if (!ringbuf->obj)
 		return;
@@ -1506,8 +1506,8 @@ static void intel_destroy_ringbuffer_obj(struct intel_ringbuffer *ringbuf)
 	ringbuf->obj = NULL;
 }
 
-static int intel_alloc_ringbuffer_obj(struct drm_device *dev,
-				      struct intel_ringbuffer *ringbuf)
+int intel_alloc_ringbuffer_obj(struct drm_device *dev,
+			       struct intel_ringbuffer *ringbuf)
 {
 	struct drm_i915_private *dev_priv = to_i915(dev);
 	struct drm_i915_gem_object *obj;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index ed59410..053d004 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -353,6 +353,10 @@ intel_write_status_page(struct intel_engine_cs *ring,
 #define I915_GEM_HWS_SCRATCH_INDEX	0x30
 #define I915_GEM_HWS_SCRATCH_ADDR (I915_GEM_HWS_SCRATCH_INDEX << MI_STORE_DWORD_INDEX_SHIFT)
 
+void intel_destroy_ringbuffer_obj(struct intel_ringbuffer *ringbuf);
+int intel_alloc_ringbuffer_obj(struct drm_device *dev,
+			       struct intel_ringbuffer *ringbuf);
+
 void intel_stop_ring_buffer(struct intel_engine_cs *ring);
 void intel_cleanup_ring_buffer(struct intel_engine_cs *ring);
 
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 137+ messages in thread

* [PATCH 08/43] drm/i915/bdw: Add a context and an engine pointers to the ringbuffer
  2014-07-24 16:04 [PATCH 00/43] Execlists v5 Thomas Daniel
                   ` (6 preceding siblings ...)
  2014-07-24 16:04 ` [PATCH 07/43] drm/i915/bdw: Allocate ringbuffers for Logical Ring Contexts Thomas Daniel
@ 2014-07-24 16:04 ` Thomas Daniel
  2014-08-11 14:14   ` Daniel Vetter
  2014-07-24 16:04 ` [PATCH 09/43] drm/i915/bdw: Populate LR contexts (somewhat) Thomas Daniel
                   ` (36 subsequent siblings)
  44 siblings, 1 reply; 137+ messages in thread
From: Thomas Daniel @ 2014-07-24 16:04 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Any given ringbuffer is unequivocally tied to one context and one engine.
By setting the appropriate pointers to them, the ringbuffer struct holds
all the information you might need to submit a workload for processing,
Execlists style.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c        |    2 ++
 drivers/gpu/drm/i915/intel_ringbuffer.c |    2 ++
 drivers/gpu/drm/i915/intel_ringbuffer.h |    3 +++
 3 files changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 0a12b8c..2eb7db6 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -132,6 +132,8 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 		return ret;
 	}
 
+	ringbuf->ring = ring;
+	ringbuf->ctx = ctx;
 	ringbuf->size = 32 * PAGE_SIZE;
 	ringbuf->effective_size = ringbuf->size;
 	ringbuf->head = 0;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 01e9840..279dda4 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1570,6 +1570,8 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 	INIT_LIST_HEAD(&ring->active_list);
 	INIT_LIST_HEAD(&ring->request_list);
 	ringbuf->size = 32 * PAGE_SIZE;
+	ringbuf->ring = ring;
+	ringbuf->ctx = ring->default_context;
 	memset(ring->semaphore.sync_seqno, 0, sizeof(ring->semaphore.sync_seqno));
 
 	init_waitqueue_head(&ring->irq_queue);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 053d004..be40788 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -88,6 +88,9 @@ struct intel_ringbuffer {
 	struct drm_i915_gem_object *obj;
 	void __iomem *virtual_start;
 
+	struct intel_engine_cs *ring;
+	struct intel_context *ctx;
+
 	u32 head;
 	u32 tail;
 	int space;
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 137+ messages in thread

* [PATCH 09/43] drm/i915/bdw: Populate LR contexts (somewhat)
  2014-07-24 16:04 [PATCH 00/43] Execlists v5 Thomas Daniel
                   ` (7 preceding siblings ...)
  2014-07-24 16:04 ` [PATCH 08/43] drm/i915/bdw: Add a context and an engine pointers to the ringbuffer Thomas Daniel
@ 2014-07-24 16:04 ` Thomas Daniel
  2014-07-24 16:04 ` [PATCH 10/43] drm/i915/bdw: Deferred creation of user-created LRCs Thomas Daniel
                   ` (35 subsequent siblings)
  44 siblings, 0 replies; 137+ messages in thread
From: Thomas Daniel @ 2014-07-24 16:04 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

For the most part, logical ring context objects are similar to hardware
contexts in that the backing object is meant to be opaque. There are
some exceptions where we need to poke certain offsets of the object for
initialization, updating the tail pointer or updating the PDPs.

For our basic execlist implementation we'll only need our PPGTT PDs and
ringbuffer addresses in order to set up the context. With previous
patches, we have both, so we can start prepping the context to be loaded.

Before running a context for the first time you must populate some
fields in the context object. These fields begin at 1 PAGE + LRCA, i.e.
the first page (in 0-based counting) of the context image. These same
fields will be read and written to as contexts are saved and restored
once the system is up and running.

Many of these fields are completely reused from previous global
registers: ringbuffer head/tail/control, context control matches some
previous MI_SET_CONTEXT flags, and page directories. There are other
fields which we don't touch which we may want in the future.

v2: CTX_LRI_HEADER_0 is MI_LOAD_REGISTER_IMM(14) for render and (11)
for other engines.

v3: Several rebases and general changes to the code.

v4: Squash with "Extract LR context object populating"
Also, Damien's review comments:
- Set the Force Posted bit on the LRI header, as the BSpec suggest we do.
- Prevent warning when compiling a 32-bits kernel without HIGHMEM64.
- Add a clarifying comment to the context population code.

v5: Damien's review comments:
- The third MI_LOAD_REGISTER_IMM in the context does not set Force Posted.
- Remove dead code.

v6: Add a note about the (presumed) differences between BDW and CHV state
contexts. Also, Brad's review comments:
- Use the _MASKED_BIT_ENABLE, upper_32_bits and lower_32_bits macros.
- Be less magical about how we set the ring size in the context.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com> (v2)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_reg.h  |    1 +
 drivers/gpu/drm/i915/intel_lrc.c |  159 +++++++++++++++++++++++++++++++++++++-
 2 files changed, 156 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index ce70aa4..043a6ea 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -282,6 +282,7 @@
  *   address/value pairs. Don't overdue it, though, x <= 2^4 must hold!
  */
 #define MI_LOAD_REGISTER_IMM(x)	MI_INSTR(0x22, 2*(x)-1)
+#define   MI_LRI_FORCE_POSTED		(1<<12)
 #define MI_STORE_REGISTER_MEM(x) MI_INSTR(0x24, 2*(x)-1)
 #define MI_STORE_REGISTER_MEM_GEN8(x) MI_INSTR(0x24, 3*(x)-1)
 #define   MI_SRM_LRM_GLOBAL_GTT		(1<<22)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 2eb7db6..cf322ec 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -46,6 +46,38 @@
 
 #define GEN8_LR_CONTEXT_ALIGN 4096
 
+#define RING_ELSP(ring)			((ring)->mmio_base+0x230)
+#define RING_CONTEXT_CONTROL(ring)	((ring)->mmio_base+0x244)
+
+#define CTX_LRI_HEADER_0		0x01
+#define CTX_CONTEXT_CONTROL		0x02
+#define CTX_RING_HEAD			0x04
+#define CTX_RING_TAIL			0x06
+#define CTX_RING_BUFFER_START		0x08
+#define CTX_RING_BUFFER_CONTROL		0x0a
+#define CTX_BB_HEAD_U			0x0c
+#define CTX_BB_HEAD_L			0x0e
+#define CTX_BB_STATE			0x10
+#define CTX_SECOND_BB_HEAD_U		0x12
+#define CTX_SECOND_BB_HEAD_L		0x14
+#define CTX_SECOND_BB_STATE		0x16
+#define CTX_BB_PER_CTX_PTR		0x18
+#define CTX_RCS_INDIRECT_CTX		0x1a
+#define CTX_RCS_INDIRECT_CTX_OFFSET	0x1c
+#define CTX_LRI_HEADER_1		0x21
+#define CTX_CTX_TIMESTAMP		0x22
+#define CTX_PDP3_UDW			0x24
+#define CTX_PDP3_LDW			0x26
+#define CTX_PDP2_UDW			0x28
+#define CTX_PDP2_LDW			0x2a
+#define CTX_PDP1_UDW			0x2c
+#define CTX_PDP1_LDW			0x2e
+#define CTX_PDP0_UDW			0x30
+#define CTX_PDP0_LDW			0x32
+#define CTX_LRI_HEADER_2		0x41
+#define CTX_R_PWR_CLK_STATE		0x42
+#define CTX_GPGPU_CSR_BASE_ADDRESS	0x44
+
 int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists)
 {
 	if (enable_execlists == 0)
@@ -57,6 +89,115 @@ int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists
 	return 0;
 }
 
+static int
+populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_obj,
+		    struct intel_engine_cs *ring, struct intel_ringbuffer *ringbuf)
+{
+	struct drm_i915_gem_object *ring_obj = ringbuf->obj;
+	struct i915_hw_ppgtt *ppgtt = ctx_to_ppgtt(ctx);
+	struct page *page;
+	uint32_t *reg_state;
+	int ret;
+
+	ret = i915_gem_object_set_to_cpu_domain(ctx_obj, true);
+	if (ret) {
+		DRM_DEBUG_DRIVER("Could not set to CPU domain\n");
+		return ret;
+	}
+
+	ret = i915_gem_object_get_pages(ctx_obj);
+	if (ret) {
+		DRM_DEBUG_DRIVER("Could not get object pages\n");
+		return ret;
+	}
+
+	i915_gem_object_pin_pages(ctx_obj);
+
+	/* The second page of the context object contains some fields which must
+	 * be set up prior to the first execution. */
+	page = i915_gem_object_get_page(ctx_obj, 1);
+	reg_state = kmap_atomic(page);
+
+	/* A context is actually a big batch buffer with several MI_LOAD_REGISTER_IMM
+	 * commands followed by (reg, value) pairs. The values we are setting here are
+	 * only for the first context restore: on a subsequent save, the GPU will
+	 * recreate this batchbuffer with new values (including all the missing
+	 * MI_LOAD_REGISTER_IMM commands that we are not initializing here). */
+	if (ring->id == RCS)
+		reg_state[CTX_LRI_HEADER_0] = MI_LOAD_REGISTER_IMM(14);
+	else
+		reg_state[CTX_LRI_HEADER_0] = MI_LOAD_REGISTER_IMM(11);
+	reg_state[CTX_LRI_HEADER_0] |= MI_LRI_FORCE_POSTED;
+	reg_state[CTX_CONTEXT_CONTROL] = RING_CONTEXT_CONTROL(ring);
+	reg_state[CTX_CONTEXT_CONTROL+1] =
+			_MASKED_BIT_ENABLE((1<<3) | MI_RESTORE_INHIBIT);
+	reg_state[CTX_RING_HEAD] = RING_HEAD(ring->mmio_base);
+	reg_state[CTX_RING_HEAD+1] = 0;
+	reg_state[CTX_RING_TAIL] = RING_TAIL(ring->mmio_base);
+	reg_state[CTX_RING_TAIL+1] = 0;
+	reg_state[CTX_RING_BUFFER_START] = RING_START(ring->mmio_base);
+	reg_state[CTX_RING_BUFFER_START+1] = i915_gem_obj_ggtt_offset(ring_obj);
+	reg_state[CTX_RING_BUFFER_CONTROL] = RING_CTL(ring->mmio_base);
+	reg_state[CTX_RING_BUFFER_CONTROL+1] =
+			((ringbuf->size - PAGE_SIZE) & RING_NR_PAGES) | RING_VALID;
+	reg_state[CTX_BB_HEAD_U] = ring->mmio_base + 0x168;
+	reg_state[CTX_BB_HEAD_U+1] = 0;
+	reg_state[CTX_BB_HEAD_L] = ring->mmio_base + 0x140;
+	reg_state[CTX_BB_HEAD_L+1] = 0;
+	reg_state[CTX_BB_STATE] = ring->mmio_base + 0x110;
+	reg_state[CTX_BB_STATE+1] = (1<<5);
+	reg_state[CTX_SECOND_BB_HEAD_U] = ring->mmio_base + 0x11c;
+	reg_state[CTX_SECOND_BB_HEAD_U+1] = 0;
+	reg_state[CTX_SECOND_BB_HEAD_L] = ring->mmio_base + 0x114;
+	reg_state[CTX_SECOND_BB_HEAD_L+1] = 0;
+	reg_state[CTX_SECOND_BB_STATE] = ring->mmio_base + 0x118;
+	reg_state[CTX_SECOND_BB_STATE+1] = 0;
+	if (ring->id == RCS) {
+		/* TODO: according to BSpec, the register state context
+		 * for CHV does not have these. OTOH, these registers do
+		 * exist in CHV. I'm waiting for a clarification */
+		reg_state[CTX_BB_PER_CTX_PTR] = ring->mmio_base + 0x1c0;
+		reg_state[CTX_BB_PER_CTX_PTR+1] = 0;
+		reg_state[CTX_RCS_INDIRECT_CTX] = ring->mmio_base + 0x1c4;
+		reg_state[CTX_RCS_INDIRECT_CTX+1] = 0;
+		reg_state[CTX_RCS_INDIRECT_CTX_OFFSET] = ring->mmio_base + 0x1c8;
+		reg_state[CTX_RCS_INDIRECT_CTX_OFFSET+1] = 0;
+	}
+	reg_state[CTX_LRI_HEADER_1] = MI_LOAD_REGISTER_IMM(9);
+	reg_state[CTX_LRI_HEADER_1] |= MI_LRI_FORCE_POSTED;
+	reg_state[CTX_CTX_TIMESTAMP] = ring->mmio_base + 0x3a8;
+	reg_state[CTX_CTX_TIMESTAMP+1] = 0;
+	reg_state[CTX_PDP3_UDW] = GEN8_RING_PDP_UDW(ring, 3);
+	reg_state[CTX_PDP3_LDW] = GEN8_RING_PDP_LDW(ring, 3);
+	reg_state[CTX_PDP2_UDW] = GEN8_RING_PDP_UDW(ring, 2);
+	reg_state[CTX_PDP2_LDW] = GEN8_RING_PDP_LDW(ring, 2);
+	reg_state[CTX_PDP1_UDW] = GEN8_RING_PDP_UDW(ring, 1);
+	reg_state[CTX_PDP1_LDW] = GEN8_RING_PDP_LDW(ring, 1);
+	reg_state[CTX_PDP0_UDW] = GEN8_RING_PDP_UDW(ring, 0);
+	reg_state[CTX_PDP0_LDW] = GEN8_RING_PDP_LDW(ring, 0);
+	reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pd_dma_addr[3]);
+	reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pd_dma_addr[3]);
+	reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pd_dma_addr[2]);
+	reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pd_dma_addr[2]);
+	reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pd_dma_addr[1]);
+	reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pd_dma_addr[1]);
+	reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pd_dma_addr[0]);
+	reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pd_dma_addr[0]);
+	if (ring->id == RCS) {
+		reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
+		reg_state[CTX_R_PWR_CLK_STATE] = 0x20c8;
+		reg_state[CTX_R_PWR_CLK_STATE+1] = 0;
+	}
+
+	kunmap_atomic(reg_state);
+
+	ctx_obj->dirty = 1;
+	set_page_dirty(page);
+	i915_gem_object_unpin_pages(ctx_obj);
+
+	return 0;
+}
+
 void intel_lr_context_free(struct intel_context *ctx)
 {
 	int i;
@@ -150,14 +291,24 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 	if (ret) {
 		DRM_DEBUG_DRIVER("Failed to allocate ringbuffer obj %s: %d\n",
 				ring->name, ret);
-		kfree(ringbuf);
-		i915_gem_object_ggtt_unpin(ctx_obj);
-		drm_gem_object_unreference(&ctx_obj->base);
-		return ret;
+		goto error;
+	}
+
+	ret = populate_lr_context(ctx, ctx_obj, ring, ringbuf);
+	if (ret) {
+		DRM_DEBUG_DRIVER("Failed to populate LRC: %d\n", ret);
+		intel_destroy_ringbuffer_obj(ringbuf);
+		goto error;
 	}
 
 	ctx->engine[ring->id].ringbuf = ringbuf;
 	ctx->engine[ring->id].state = ctx_obj;
 
 	return 0;
+
+error:
+	kfree(ringbuf);
+	i915_gem_object_ggtt_unpin(ctx_obj);
+	drm_gem_object_unreference(&ctx_obj->base);
+	return ret;
 }
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 137+ messages in thread

* [PATCH 10/43] drm/i915/bdw: Deferred creation of user-created LRCs
  2014-07-24 16:04 [PATCH 00/43] Execlists v5 Thomas Daniel
                   ` (8 preceding siblings ...)
  2014-07-24 16:04 ` [PATCH 09/43] drm/i915/bdw: Populate LR contexts (somewhat) Thomas Daniel
@ 2014-07-24 16:04 ` Thomas Daniel
  2014-08-11 14:25   ` Daniel Vetter
  2014-07-24 16:04 ` [PATCH 11/43] drm/i915/bdw: Render moot context reset and switch with Execlists Thomas Daniel
                   ` (34 subsequent siblings)
  44 siblings, 1 reply; 137+ messages in thread
From: Thomas Daniel @ 2014-07-24 16:04 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

The backing objects and ringbuffers for contexts created via open
fd are actually empty until the user starts sending execbuffers to
them. At that point, we allocate & populate them. We do this because,
at create time, we really don't know which engine is going to be used
with the context later on (and we don't want to waste memory on
objects that we might never use).

v2: As contexts created via ioctl can only be used with the render
ring, we have enough information to allocate & populate them right
away.

v3: Defer the creation always, even with ioctl-created contexts, as
requested by Daniel Vetter.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c    |    7 +++----
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |    8 ++++++++
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 48d7476..fbe7278 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -784,9 +784,9 @@ int i915_switch_context(struct intel_engine_cs *ring,
 	return do_switch(ring, to);
 }
 
-static bool hw_context_enabled(struct drm_device *dev)
+static bool contexts_enabled(struct drm_device *dev)
 {
-	return to_i915(dev)->hw_context_size;
+	return i915.enable_execlists || to_i915(dev)->hw_context_size;
 }
 
 int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
@@ -797,8 +797,7 @@ int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
 	struct intel_context *ctx;
 	int ret;
 
-	/* FIXME: allow user-created LR contexts as well */
-	if (!hw_context_enabled(dev))
+	if (!contexts_enabled(dev))
 		return -ENODEV;
 
 	ret = i915_mutex_lock_interruptible(dev);
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index c5115957..4e9b387 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -951,6 +951,14 @@ i915_gem_validate_context(struct drm_device *dev, struct drm_file *file,
 		return ERR_PTR(-EIO);
 	}
 
+	if (i915.enable_execlists && !ctx->engine[ring->id].state) {
+		int ret = intel_lr_context_deferred_create(ctx, ring);
+		if (ret) {
+			DRM_DEBUG("Could not create LRC %u: %d\n", ctx_id, ret);
+			return ERR_PTR(ret);
+		}
+	}
+
 	return ctx;
 }
 
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 137+ messages in thread

* [PATCH 11/43] drm/i915/bdw: Render moot context reset and switch with Execlists
  2014-07-24 16:04 [PATCH 00/43] Execlists v5 Thomas Daniel
                   ` (9 preceding siblings ...)
  2014-07-24 16:04 ` [PATCH 10/43] drm/i915/bdw: Deferred creation of user-created LRCs Thomas Daniel
@ 2014-07-24 16:04 ` Thomas Daniel
  2014-08-11 14:30   ` Daniel Vetter
  2014-08-20 15:29   ` [PATCH] " Thomas Daniel
  2014-07-24 16:04 ` [PATCH 12/43] drm/i915/bdw: Don't write PDP in the legacy way when using LRCs Thomas Daniel
                   ` (33 subsequent siblings)
  44 siblings, 2 replies; 137+ messages in thread
From: Thomas Daniel @ 2014-07-24 16:04 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

These two functions make no sense in a Logical Ring Context & Execlists
world.

v2: We got rid of lrc_enabled and centralized everything in the sanitized
i915.enable_execlists instead.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c |    9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index fbe7278..288f5de 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -380,6 +380,9 @@ void i915_gem_context_reset(struct drm_device *dev)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	int i;
 
+	if (i915.enable_execlists)
+		return;
+
 	/* Prevent the hardware from restoring the last context (which hung) on
 	 * the next switch */
 	for (i = 0; i < I915_NUM_RINGS; i++) {
@@ -514,6 +517,9 @@ int i915_gem_context_enable(struct drm_i915_private *dev_priv)
 		ppgtt->enable(ppgtt);
 	}
 
+	if (i915.enable_execlists)
+		return 0;
+
 	/* FIXME: We should make this work, even in reset */
 	if (i915_reset_in_progress(&dev_priv->gpu_error))
 		return 0;
@@ -769,6 +775,9 @@ int i915_switch_context(struct intel_engine_cs *ring,
 {
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
 
+	if (i915.enable_execlists)
+		return 0;
+
 	WARN_ON(!mutex_is_locked(&dev_priv->dev->struct_mutex));
 
 	if (to->legacy_hw_ctx.rcs_state == NULL) { /* We have the fake context */
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 137+ messages in thread

* [PATCH 12/43] drm/i915/bdw: Don't write PDP in the legacy way when using LRCs
  2014-07-24 16:04 [PATCH 00/43] Execlists v5 Thomas Daniel
                   ` (10 preceding siblings ...)
  2014-07-24 16:04 ` [PATCH 11/43] drm/i915/bdw: Render moot context reset and switch with Execlists Thomas Daniel
@ 2014-07-24 16:04 ` Thomas Daniel
  2014-08-01 13:46   ` Damien Lespiau
  2014-08-07 12:17   ` Thomas Daniel
  2014-07-24 16:04 ` [PATCH 13/43] drm/i915: Abstract the legacy workload submission mechanism away Thomas Daniel
                   ` (32 subsequent siblings)
  44 siblings, 2 replies; 137+ messages in thread
From: Thomas Daniel @ 2014-07-24 16:04 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

This is mostly for correctness so that we know we are running the LR
context correctly (that is, the PDPs are contained inside the context
object).

v2: Move the check to inside the enable PPGTT function. The switch
happens in two places: the legacy context switch (that we won't hit
when Execlists are enabled) and the PPGTT enable, which unfortunately
we need. This would look much nicer if the ppgtt->enable was part of
the ring init, where it logically belongs.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c |    5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 5188936..ccd70f5 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -852,6 +852,11 @@ static int gen8_ppgtt_enable(struct i915_hw_ppgtt *ppgtt)
 		if (USES_FULL_PPGTT(dev))
 			continue;
 
+		/* In the case of Execlists, we don't want to write the PDPs
+		 * in the legacy way (they live inside the context now) */
+		if (i915.enable_execlists)
+			return 0;
+
 		ret = ppgtt->switch_mm(ppgtt, ring, true);
 		if (ret)
 			goto err_out;
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 137+ messages in thread

* [PATCH 13/43] drm/i915: Abstract the legacy workload submission mechanism away
  2014-07-24 16:04 [PATCH 00/43] Execlists v5 Thomas Daniel
                   ` (11 preceding siblings ...)
  2014-07-24 16:04 ` [PATCH 12/43] drm/i915/bdw: Don't write PDP in the legacy way when using LRCs Thomas Daniel
@ 2014-07-24 16:04 ` Thomas Daniel
  2014-08-11 14:36   ` Daniel Vetter
                     ` (2 more replies)
  2014-07-24 16:04 ` [PATCH 14/43] drm/i915/bdw: Skeleton for the new logical rings submission path Thomas Daniel
                   ` (31 subsequent siblings)
  44 siblings, 3 replies; 137+ messages in thread
From: Thomas Daniel @ 2014-07-24 16:04 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

As suggested by Daniel Vetter. The idea, in subsequent patches, is to
provide an alternative to these vfuncs for the Execlists submission
mechanism.

v2: Split into two and reordered to illustrate our intentions instead
of showing them off. Also, removed the add_request vfunc and added the
stop_ring one.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h            |   24 ++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_gem.c            |   15 +++++++++++----
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   20 ++++++++++----------
 3 files changed, 45 insertions(+), 14 deletions(-)
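
The submission-abstraction pattern this patch introduces (a struct of
function pointers selected once at init time) can be sketched in
isolation. The code below is a hypothetical, simplified userspace
illustration, not the driver's actual types or functions:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the driver's structures. */
struct dev_priv;

struct gt_funcs {
	int (*do_execbuf)(struct dev_priv *dev); /* submit a workload */
	int (*init_rings)(struct dev_priv *dev); /* bring up the engines */
};

struct dev_priv {
	bool enable_execlists;
	struct gt_funcs gt;
	int last_path; /* 0 = legacy, 1 = execlists; illustration only */
};

static int legacy_submit(struct dev_priv *d)    { d->last_path = 0; return 0; }
static int legacy_init(struct dev_priv *d)      { (void)d; return 0; }
static int execlists_submit(struct dev_priv *d) { d->last_path = 1; return 0; }
static int execlists_init(struct dev_priv *d)   { (void)d; return 0; }

/* Mirrors the i915_gem_init() hunk: pick one vtable at init time,
 * so callers never branch on the submission mode again. */
static void gem_init(struct dev_priv *d)
{
	if (!d->enable_execlists) {
		d->gt.do_execbuf = legacy_submit;
		d->gt.init_rings = legacy_init;
	} else {
		d->gt.do_execbuf = execlists_submit;
		d->gt.init_rings = execlists_init;
	}
}
```

Callers then invoke `d->gt.do_execbuf(d)` without caring which backend
is active, which is exactly what the i915_gem_do_execbuffer() hunk does.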

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index ff2c373..1caed52 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1617,6 +1617,21 @@ struct drm_i915_private {
 	/* Old ums support infrastructure, same warning applies. */
 	struct i915_ums_state ums;
 
+	/* Abstract the submission mechanism (legacy ringbuffer or execlists) away */
+	struct {
+		int (*do_execbuf) (struct drm_device *dev, struct drm_file *file,
+				   struct intel_engine_cs *ring,
+				   struct intel_context *ctx,
+				   struct drm_i915_gem_execbuffer2 *args,
+				   struct list_head *vmas,
+				   struct drm_i915_gem_object *batch_obj,
+				   u64 exec_start, u32 flags);
+		int (*init_rings) (struct drm_device *dev);
+		void (*cleanup_ring) (struct intel_engine_cs *ring);
+		void (*stop_ring) (struct intel_engine_cs *ring);
+		bool (*is_ring_initialized) (struct intel_engine_cs *ring);
+	} gt;
+
 	/*
 	 * NOTE: This is the dri1/ums dungeon, don't add stuff here. Your patch
 	 * will be rejected. Instead look for a better place.
@@ -2224,6 +2239,14 @@ int i915_gem_set_domain_ioctl(struct drm_device *dev, void *data,
 			      struct drm_file *file_priv);
 int i915_gem_sw_finish_ioctl(struct drm_device *dev, void *data,
 			     struct drm_file *file_priv);
+int i915_gem_ringbuffer_submission(struct drm_device *dev,
+				   struct drm_file *file,
+				   struct intel_engine_cs *ring,
+				   struct intel_context *ctx,
+				   struct drm_i915_gem_execbuffer2 *args,
+				   struct list_head *vmas,
+				   struct drm_i915_gem_object *batch_obj,
+				   u64 exec_start, u32 flags);
 int i915_gem_execbuffer(struct drm_device *dev, void *data,
 			struct drm_file *file_priv);
 int i915_gem_execbuffer2(struct drm_device *dev, void *data,
@@ -2376,6 +2399,7 @@ void i915_gem_reset(struct drm_device *dev);
 bool i915_gem_clflush_object(struct drm_i915_gem_object *obj, bool force);
 int __must_check i915_gem_object_finish_gpu(struct drm_i915_gem_object *obj);
 int __must_check i915_gem_init(struct drm_device *dev);
+int i915_gem_init_rings(struct drm_device *dev);
 int __must_check i915_gem_init_hw(struct drm_device *dev);
 int i915_gem_l3_remap(struct intel_engine_cs *ring, int slice);
 void i915_gem_init_swizzling(struct drm_device *dev);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d8bf4fa..6544286 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4518,7 +4518,7 @@ i915_gem_stop_ringbuffers(struct drm_device *dev)
 	int i;
 
 	for_each_ring(ring, dev_priv, i)
-		intel_stop_ring_buffer(ring);
+		dev_priv->gt.stop_ring(ring);
 }
 
 int
@@ -4635,7 +4635,7 @@ intel_enable_blt(struct drm_device *dev)
 	return true;
 }
 
-static int i915_gem_init_rings(struct drm_device *dev)
+int i915_gem_init_rings(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	int ret;
@@ -4718,7 +4718,7 @@ i915_gem_init_hw(struct drm_device *dev)
 
 	i915_gem_init_swizzling(dev);
 
-	ret = i915_gem_init_rings(dev);
+	ret = dev_priv->gt.init_rings(dev);
 	if (ret)
 		return ret;
 
@@ -4759,6 +4759,13 @@ int i915_gem_init(struct drm_device *dev)
 			DRM_DEBUG_DRIVER("allow wake ack timed out\n");
 	}
 
+	if (!i915.enable_execlists) {
+		dev_priv->gt.do_execbuf = i915_gem_ringbuffer_submission;
+		dev_priv->gt.init_rings = i915_gem_init_rings;
+		dev_priv->gt.cleanup_ring = intel_cleanup_ring_buffer;
+		dev_priv->gt.stop_ring = intel_stop_ring_buffer;
+	}
+
 	i915_gem_init_userptr(dev);
 	i915_gem_init_global_gtt(dev);
 
@@ -4794,7 +4801,7 @@ i915_gem_cleanup_ringbuffer(struct drm_device *dev)
 	int i;
 
 	for_each_ring(ring, dev_priv, i)
-		intel_cleanup_ring_buffer(ring);
+		dev_priv->gt.cleanup_ring(ring);
 }
 
 int
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 4e9b387..8c63d79 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1034,14 +1034,14 @@ i915_reset_gen7_sol_offsets(struct drm_device *dev,
 	return 0;
 }
 
-static int
-legacy_ringbuffer_submission(struct drm_device *dev, struct drm_file *file,
-			     struct intel_engine_cs *ring,
-			     struct intel_context *ctx,
-			     struct drm_i915_gem_execbuffer2 *args,
-			     struct list_head *vmas,
-			     struct drm_i915_gem_object *batch_obj,
-			     u64 exec_start, u32 flags)
+int
+i915_gem_ringbuffer_submission(struct drm_device *dev, struct drm_file *file,
+			       struct intel_engine_cs *ring,
+			       struct intel_context *ctx,
+			       struct drm_i915_gem_execbuffer2 *args,
+			       struct list_head *vmas,
+			       struct drm_i915_gem_object *batch_obj,
+			       u64 exec_start, u32 flags)
 {
 	struct drm_clip_rect *cliprects = NULL;
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -1408,8 +1408,8 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	else
 		exec_start += i915_gem_obj_offset(batch_obj, vm);
 
-	ret = legacy_ringbuffer_submission(dev, file, ring, ctx,
-			args, &eb->vmas, batch_obj, exec_start, flags);
+	ret = dev_priv->gt.do_execbuf(dev, file, ring, ctx, args,
+			&eb->vmas, batch_obj, exec_start, flags);
 	if (ret)
 		goto err;
 
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 137+ messages in thread

* [PATCH 14/43] drm/i915/bdw: Skeleton for the new logical rings submission path
  2014-07-24 16:04 [PATCH 00/43] Execlists v5 Thomas Daniel
                   ` (12 preceding siblings ...)
  2014-07-24 16:04 ` [PATCH 13/43] drm/i915: Abstract the legacy workload submission mechanism away Thomas Daniel
@ 2014-07-24 16:04 ` Thomas Daniel
  2014-07-24 16:04 ` [PATCH 15/43] drm/i915/bdw: Generic logical ring init and cleanup Thomas Daniel
                   ` (30 subsequent siblings)
  44 siblings, 0 replies; 137+ messages in thread
From: Thomas Daniel @ 2014-07-24 16:04 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Execlists are indeed a brave new world with respect to workload
submission to the GPU.

In previous versions of this series, I tried to impact the legacy
ringbuffer submission path as little as possible (mostly, passing
the context around and using the correct ringbuffer when I needed
one), but Daniel is afraid (probably with reason) that these
changes, and especially future ones, will end up breaking older
gens.

This commit and some others coming next will try to limit the
damage by creating an alternative path for workload submission.
The first step is here: laying out a new ring init/fini.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c  |    5 ++
 drivers/gpu/drm/i915/intel_lrc.c |  151 ++++++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.h |   12 +++
 3 files changed, 168 insertions(+)
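
The unwind chain in intel_logical_rings_init() is the standard kernel
goto-unwind idiom: each successfully initialized engine gets a cleanup
label, and a failure jumps to the label that tears down everything
initialized so far, in reverse order. A minimal standalone sketch
(hypothetical names and error codes, not driver code):

```c
#include <assert.h>

/* Records teardown order so the unwind can be observed. */
static int torn_down[4];
static int n_torn;

static void ring_cleanup(int id)
{
	torn_down[n_torn++] = id;
}

/* fail_at selects which init step fails (0-based); -1 means all
 * steps succeed. Mirrors the shape of intel_logical_rings_init(). */
static int rings_init(int fail_at)
{
	int ret = -1; /* hypothetical error code */

	if (fail_at == 0)
		goto out;           /* ring A failed: nothing to undo */
	if (fail_at == 1)
		goto cleanup_a;     /* ring B failed: undo ring A */
	if (fail_at == 2)
		goto cleanup_b;     /* ring C failed: undo B, then A */
	return 0;

cleanup_b:
	ring_cleanup(1);
cleanup_a:
	ring_cleanup(0);
out:
	return ret;
}
```

Falling through the labels guarantees teardown happens in the exact
reverse of initialization order, with a single error path to audit.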

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 6544286..9560b40 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4764,6 +4764,11 @@ int i915_gem_init(struct drm_device *dev)
 		dev_priv->gt.init_rings = i915_gem_init_rings;
 		dev_priv->gt.cleanup_ring = intel_cleanup_ring_buffer;
 		dev_priv->gt.stop_ring = intel_stop_ring_buffer;
+	} else {
+		dev_priv->gt.do_execbuf = intel_execlists_submission;
+		dev_priv->gt.init_rings = intel_logical_rings_init;
+		dev_priv->gt.cleanup_ring = intel_logical_ring_cleanup;
+		dev_priv->gt.stop_ring = intel_logical_ring_stop;
 	}
 
 	i915_gem_init_userptr(dev);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index cf322ec..cb56bb8 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -89,6 +89,157 @@ int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists
 	return 0;
 }
 
+int intel_execlists_submission(struct drm_device *dev, struct drm_file *file,
+			       struct intel_engine_cs *ring,
+			       struct intel_context *ctx,
+			       struct drm_i915_gem_execbuffer2 *args,
+			       struct list_head *vmas,
+			       struct drm_i915_gem_object *batch_obj,
+			       u64 exec_start, u32 flags)
+{
+	/* TODO */
+	return 0;
+}
+
+void intel_logical_ring_stop(struct intel_engine_cs *ring)
+{
+	/* TODO */
+}
+
+void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
+{
+	/* TODO */
+}
+
+static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *ring)
+{
+	/* TODO */
+	return 0;
+}
+
+static int logical_render_ring_init(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_engine_cs *ring = &dev_priv->ring[RCS];
+
+	ring->name = "render ring";
+	ring->id = RCS;
+	ring->mmio_base = RENDER_RING_BASE;
+	ring->irq_enable_mask =
+		GT_RENDER_USER_INTERRUPT << GEN8_RCS_IRQ_SHIFT;
+
+	return logical_ring_init(dev, ring);
+}
+
+static int logical_bsd_ring_init(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_engine_cs *ring = &dev_priv->ring[VCS];
+
+	ring->name = "bsd ring";
+	ring->id = VCS;
+	ring->mmio_base = GEN6_BSD_RING_BASE;
+	ring->irq_enable_mask =
+		GT_RENDER_USER_INTERRUPT << GEN8_VCS1_IRQ_SHIFT;
+
+	return logical_ring_init(dev, ring);
+}
+
+static int logical_bsd2_ring_init(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_engine_cs *ring = &dev_priv->ring[VCS2];
+
+	ring->name = "bsd2 ring";
+	ring->id = VCS2;
+	ring->mmio_base = GEN8_BSD2_RING_BASE;
+	ring->irq_enable_mask =
+		GT_RENDER_USER_INTERRUPT << GEN8_VCS2_IRQ_SHIFT;
+
+	return logical_ring_init(dev, ring);
+}
+
+static int logical_blt_ring_init(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_engine_cs *ring = &dev_priv->ring[BCS];
+
+	ring->name = "blitter ring";
+	ring->id = BCS;
+	ring->mmio_base = BLT_RING_BASE;
+	ring->irq_enable_mask =
+		GT_RENDER_USER_INTERRUPT << GEN8_BCS_IRQ_SHIFT;
+
+	return logical_ring_init(dev, ring);
+}
+
+static int logical_vebox_ring_init(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_engine_cs *ring = &dev_priv->ring[VECS];
+
+	ring->name = "video enhancement ring";
+	ring->id = VECS;
+	ring->mmio_base = VEBOX_RING_BASE;
+	ring->irq_enable_mask =
+		GT_RENDER_USER_INTERRUPT << GEN8_VECS_IRQ_SHIFT;
+
+	return logical_ring_init(dev, ring);
+}
+
+int intel_logical_rings_init(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	int ret;
+
+	ret = logical_render_ring_init(dev);
+	if (ret)
+		return ret;
+
+	if (HAS_BSD(dev)) {
+		ret = logical_bsd_ring_init(dev);
+		if (ret)
+			goto cleanup_render_ring;
+	}
+
+	if (HAS_BLT(dev)) {
+		ret = logical_blt_ring_init(dev);
+		if (ret)
+			goto cleanup_bsd_ring;
+	}
+
+	if (HAS_VEBOX(dev)) {
+		ret = logical_vebox_ring_init(dev);
+		if (ret)
+			goto cleanup_blt_ring;
+	}
+
+	if (HAS_BSD2(dev)) {
+		ret = logical_bsd2_ring_init(dev);
+		if (ret)
+			goto cleanup_vebox_ring;
+	}
+
+	ret = i915_gem_set_seqno(dev, ((u32)~0 - 0x1000));
+	if (ret)
+		goto cleanup_bsd2_ring;
+
+	return 0;
+
+cleanup_bsd2_ring:
+	intel_logical_ring_cleanup(&dev_priv->ring[VCS2]);
+cleanup_vebox_ring:
+	intel_logical_ring_cleanup(&dev_priv->ring[VECS]);
+cleanup_blt_ring:
+	intel_logical_ring_cleanup(&dev_priv->ring[BCS]);
+cleanup_bsd_ring:
+	intel_logical_ring_cleanup(&dev_priv->ring[VCS]);
+cleanup_render_ring:
+	intel_logical_ring_cleanup(&dev_priv->ring[RCS]);
+
+	return ret;
+}
+
 static int
 populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_obj,
 		    struct intel_engine_cs *ring, struct intel_ringbuffer *ringbuf)
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 3b93572..bf0eff4 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -24,6 +24,11 @@
 #ifndef _INTEL_LRC_H_
 #define _INTEL_LRC_H_
 
+/* Logical Rings */
+void intel_logical_ring_stop(struct intel_engine_cs *ring);
+void intel_logical_ring_cleanup(struct intel_engine_cs *ring);
+int intel_logical_rings_init(struct drm_device *dev);
+
 /* Logical Ring Contexts */
 void intel_lr_context_free(struct intel_context *ctx);
 int intel_lr_context_deferred_create(struct intel_context *ctx,
@@ -31,5 +36,12 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 
 /* Execlists */
 int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists);
+int intel_execlists_submission(struct drm_device *dev, struct drm_file *file,
+			       struct intel_engine_cs *ring,
+			       struct intel_context *ctx,
+			       struct drm_i915_gem_execbuffer2 *args,
+			       struct list_head *vmas,
+			       struct drm_i915_gem_object *batch_obj,
+			       u64 exec_start, u32 flags);
 
 #endif /* _INTEL_LRC_H_ */
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 137+ messages in thread

* [PATCH 15/43] drm/i915/bdw: Generic logical ring init and cleanup
  2014-07-24 16:04 [PATCH 00/43] Execlists v5 Thomas Daniel
                   ` (13 preceding siblings ...)
  2014-07-24 16:04 ` [PATCH 14/43] drm/i915/bdw: Skeleton for the new logical rings submission path Thomas Daniel
@ 2014-07-24 16:04 ` Thomas Daniel
  2014-08-11 15:01   ` Daniel Vetter
  2014-07-24 16:04 ` [PATCH 16/43] drm/i915/bdw: GEN-specific logical ring init Thomas Daniel
                   ` (29 subsequent siblings)
  44 siblings, 1 reply; 137+ messages in thread
From: Thomas Daniel @ 2014-07-24 16:04 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Allocate and populate the default LRC for every ring, call
gen-specific init/cleanup, init/fini the command parser and
set the status page (now inside the LRC object). These are
things all engines/rings have in common.

Stopping the ring before cleanup and initializing the seqnos
is left as a TODO task (we need more infrastructure in place
before we can achieve this).

v2: Check the ringbuffer backing obj for ring_is_initialized,
instead of the context backing obj (similar, but not exactly
the same).

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c |    4 ---
 drivers/gpu/drm/i915/intel_lrc.c        |   54 +++++++++++++++++++++++++++++--
 drivers/gpu/drm/i915/intel_ringbuffer.c |   17 ++++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.h |    6 +---
 4 files changed, 70 insertions(+), 11 deletions(-)
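
One detail worth noting: with the deferred-create call moved into
logical_ring_init(), intel_lr_context_deferred_create() now returns
early if the context backing state already exists, making it safe to
call more than once. That idempotent-init pattern in isolation
(hypothetical, simplified userspace code):

```c
#include <assert.h>
#include <stdlib.h>

struct lr_context {
	void *state; /* backing object; NULL until first created */
};

static int create_calls; /* counts real allocations, for illustration */

/* Create the context state on first use; later calls are no-ops,
 * mirroring the early return added to deferred_create. */
static int context_deferred_create(struct lr_context *ctx)
{
	if (ctx->state)
		return 0; /* already created: nothing to do */

	ctx->state = malloc(4096); /* stand-in for the GEM object */
	if (!ctx->state)
		return -1;
	create_calls++;
	return 0;
}
```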

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 288f5de..9085ff1 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -450,10 +450,6 @@ int i915_gem_context_init(struct drm_device *dev)
 
 		/* NB: RCS will hold a ref for all rings */
 		ring->default_context = ctx;
-
-		/* FIXME: we really only want to do this for initialized rings */
-		if (i915.enable_execlists)
-			intel_lr_context_deferred_create(ctx, ring);
 	}
 
 	DRM_DEBUG_DRIVER("%s context support initialized\n",
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index cb56bb8..05b7069 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -108,12 +108,60 @@ void intel_logical_ring_stop(struct intel_engine_cs *ring)
 
 void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
 {
-	/* TODO */
+	if (!intel_ring_initialized(ring))
+		return;
+
+	/* TODO: make sure the ring is stopped */
+	ring->preallocated_lazy_request = NULL;
+	ring->outstanding_lazy_seqno = 0;
+
+	if (ring->cleanup)
+		ring->cleanup(ring);
+
+	i915_cmd_parser_fini_ring(ring);
+
+	if (ring->status_page.obj) {
+		kunmap(sg_page(ring->status_page.obj->pages->sgl));
+		ring->status_page.obj = NULL;
+	}
 }
 
 static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *ring)
 {
-	/* TODO */
+	int ret;
+	struct intel_context *dctx = ring->default_context;
+	struct drm_i915_gem_object *dctx_obj;
+
+	/* Intentionally left blank. */
+	ring->buffer = NULL;
+
+	ring->dev = dev;
+	INIT_LIST_HEAD(&ring->active_list);
+	INIT_LIST_HEAD(&ring->request_list);
+	init_waitqueue_head(&ring->irq_queue);
+
+	ret = intel_lr_context_deferred_create(dctx, ring);
+	if (ret)
+		return ret;
+
+	/* The status page is offset 0 from the context object in LRCs. */
+	dctx_obj = dctx->engine[ring->id].state;
+	ring->status_page.gfx_addr = i915_gem_obj_ggtt_offset(dctx_obj);
+	ring->status_page.page_addr = kmap(sg_page(dctx_obj->pages->sgl));
+	if (ring->status_page.page_addr == NULL)
+		return -ENOMEM;
+	ring->status_page.obj = dctx_obj;
+
+	ret = i915_cmd_parser_init_ring(ring);
+	if (ret)
+		return ret;
+
+	if (ring->init) {
+		ret = ring->init(ring);
+		if (ret)
+			return ret;
+	}
+
 	return 0;
 }
 
@@ -397,6 +445,8 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 	int ret;
 
 	BUG_ON(ctx->legacy_hw_ctx.rcs_state != NULL);
+	if (ctx->engine[ring->id].state)
+		return 0;
 
 	context_size = round_up(get_lr_context_size(ring), 4096);
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 279dda4..20eb1a4 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -40,6 +40,23 @@
  */
 #define CACHELINE_BYTES 64
 
+bool
+intel_ring_initialized(struct intel_engine_cs *ring)
+{
+	struct drm_device *dev = ring->dev;
+
+	if (!dev)
+		return false;
+
+	if (i915.enable_execlists) {
+		struct intel_context *dctx = ring->default_context;
+		struct intel_ringbuffer *ringbuf = dctx->engine[ring->id].ringbuf;
+
+		return ringbuf->obj;
+	} else
+		return ring->buffer && ring->buffer->obj;
+}
+
 static inline int __ring_space(int head, int tail, int size)
 {
 	int space = head - (tail + I915_RING_FREE_SPACE);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index be40788..7203ee2 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -288,11 +288,7 @@ struct  intel_engine_cs {
 	u32 (*get_cmd_length_mask)(u32 cmd_header);
 };
 
-static inline bool
-intel_ring_initialized(struct intel_engine_cs *ring)
-{
-	return ring->buffer && ring->buffer->obj;
-}
+bool intel_ring_initialized(struct intel_engine_cs *ring);
 
 static inline unsigned
 intel_ring_flag(struct intel_engine_cs *ring)
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 137+ messages in thread

* [PATCH 16/43] drm/i915/bdw: GEN-specific logical ring init
  2014-07-24 16:04 [PATCH 00/43] Execlists v5 Thomas Daniel
                   ` (14 preceding siblings ...)
  2014-07-24 16:04 ` [PATCH 15/43] drm/i915/bdw: Generic logical ring init and cleanup Thomas Daniel
@ 2014-07-24 16:04 ` Thomas Daniel
  2014-08-11 15:04   ` Daniel Vetter
  2014-07-24 16:04 ` [PATCH 17/43] drm/i915/bdw: GEN-specific logical ring set/get seqno Thomas Daniel
                   ` (28 subsequent siblings)
  44 siblings, 1 reply; 137+ messages in thread
From: Thomas Daniel @ 2014-07-24 16:04 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Logical rings do not need most of the initialization their
legacy ringbuffer counterparts do: we just need the pipe
control object for the render ring, to enable Execlists on
the hardware, and to apply a few workarounds.

v2: Squash with: "drm/i915: Extract pipe control fini & make
init outside accessible".

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c        |   54 +++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.c |   34 +++++++++++--------
 drivers/gpu/drm/i915/intel_ringbuffer.h |    3 ++
 3 files changed, 78 insertions(+), 13 deletions(-)
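
The _MASKED_BIT_ENABLE/_MASKED_BIT_DISABLE writes used here rely on the
hardware's masked-register format: the high 16 bits of the written
value select which of the low 16 bits take effect, so individual bits
can be flipped without a read-modify-write cycle. A sketch of the
semantics (the macros mirror the i915 definitions; the register model
is a hypothetical software stand-in for the hardware):

```c
#include <assert.h>
#include <stdint.h>

#define MASKED_BIT_ENABLE(a)  (((a) << 16) | (a))
#define MASKED_BIT_DISABLE(a) ((a) << 16)

/* Model of how hardware applies a masked write to a 16-bit register:
 * only bits named in the high half of 'val' are updated; all other
 * register bits keep their current value. */
static uint32_t masked_write(uint32_t reg, uint32_t val)
{
	uint32_t mask = val >> 16;
	uint32_t bits = val & 0xffff;

	return (reg & ~mask) | (bits & mask);
}
```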

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 05b7069..7c8b75e 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -106,6 +106,49 @@ void intel_logical_ring_stop(struct intel_engine_cs *ring)
 	/* TODO */
 }
 
+static int gen8_init_common_ring(struct intel_engine_cs *ring)
+{
+	struct drm_device *dev = ring->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+
+	I915_WRITE(RING_MODE_GEN7(ring),
+		_MASKED_BIT_DISABLE(GFX_REPLAY_MODE) |
+		_MASKED_BIT_ENABLE(GFX_RUN_LIST_ENABLE));
+	POSTING_READ(RING_MODE_GEN7(ring));
+	DRM_DEBUG_DRIVER("Execlists enabled for %s\n", ring->name);
+
+	memset(&ring->hangcheck, 0, sizeof(ring->hangcheck));
+
+	return 0;
+}
+
+static int gen8_init_render_ring(struct intel_engine_cs *ring)
+{
+	struct drm_device *dev = ring->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	int ret;
+
+	ret = gen8_init_common_ring(ring);
+	if (ret)
+		return ret;
+
+	/* We need to disable the AsyncFlip performance optimisations in order
+	 * to use MI_WAIT_FOR_EVENT within the CS. It should already be
+	 * programmed to '1' on all products.
+	 *
+	 * WaDisableAsyncFlipPerfMode:snb,ivb,hsw,vlv,bdw,chv
+	 */
+	I915_WRITE(MI_MODE, _MASKED_BIT_ENABLE(ASYNC_FLIP_PERF_DISABLE));
+
+	ret = intel_init_pipe_control(ring);
+	if (ret)
+		return ret;
+
+	I915_WRITE(INSTPM, _MASKED_BIT_ENABLE(INSTPM_FORCE_ORDERING));
+
+	return ret;
+}
+
 void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
 {
 	if (!intel_ring_initialized(ring))
@@ -176,6 +219,9 @@ static int logical_render_ring_init(struct drm_device *dev)
 	ring->irq_enable_mask =
 		GT_RENDER_USER_INTERRUPT << GEN8_RCS_IRQ_SHIFT;
 
+	ring->init = gen8_init_render_ring;
+	ring->cleanup = intel_fini_pipe_control;
+
 	return logical_ring_init(dev, ring);
 }
 
@@ -190,6 +236,8 @@ static int logical_bsd_ring_init(struct drm_device *dev)
 	ring->irq_enable_mask =
 		GT_RENDER_USER_INTERRUPT << GEN8_VCS1_IRQ_SHIFT;
 
+	ring->init = gen8_init_common_ring;
+
 	return logical_ring_init(dev, ring);
 }
 
@@ -204,6 +252,8 @@ static int logical_bsd2_ring_init(struct drm_device *dev)
 	ring->irq_enable_mask =
 		GT_RENDER_USER_INTERRUPT << GEN8_VCS2_IRQ_SHIFT;
 
+	ring->init = gen8_init_common_ring;
+
 	return logical_ring_init(dev, ring);
 }
 
@@ -218,6 +268,8 @@ static int logical_blt_ring_init(struct drm_device *dev)
 	ring->irq_enable_mask =
 		GT_RENDER_USER_INTERRUPT << GEN8_BCS_IRQ_SHIFT;
 
+	ring->init = gen8_init_common_ring;
+
 	return logical_ring_init(dev, ring);
 }
 
@@ -232,6 +284,8 @@ static int logical_vebox_ring_init(struct drm_device *dev)
 	ring->irq_enable_mask =
 		GT_RENDER_USER_INTERRUPT << GEN8_VECS_IRQ_SHIFT;
 
+	ring->init = gen8_init_common_ring;
+
 	return logical_ring_init(dev, ring);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 20eb1a4..ca45c58 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -573,8 +573,25 @@ out:
 	return ret;
 }
 
-static int
-init_pipe_control(struct intel_engine_cs *ring)
+void
+intel_fini_pipe_control(struct intel_engine_cs *ring)
+{
+	struct drm_device *dev = ring->dev;
+
+	if (ring->scratch.obj == NULL)
+		return;
+
+	if (INTEL_INFO(dev)->gen >= 5) {
+		kunmap(sg_page(ring->scratch.obj->pages->sgl));
+		i915_gem_object_ggtt_unpin(ring->scratch.obj);
+	}
+
+	drm_gem_object_unreference(&ring->scratch.obj->base);
+	ring->scratch.obj = NULL;
+}
+
+int
+intel_init_pipe_control(struct intel_engine_cs *ring)
 {
 	int ret;
 
@@ -649,7 +666,7 @@ static int init_render_ring(struct intel_engine_cs *ring)
 			   _MASKED_BIT_ENABLE(GFX_REPLAY_MODE));
 
 	if (INTEL_INFO(dev)->gen >= 5) {
-		ret = init_pipe_control(ring);
+		ret = intel_init_pipe_control(ring);
 		if (ret)
 			return ret;
 	}
@@ -684,16 +701,7 @@ static void render_ring_cleanup(struct intel_engine_cs *ring)
 		dev_priv->semaphore_obj = NULL;
 	}
 
-	if (ring->scratch.obj == NULL)
-		return;
-
-	if (INTEL_INFO(dev)->gen >= 5) {
-		kunmap(sg_page(ring->scratch.obj->pages->sgl));
-		i915_gem_object_ggtt_unpin(ring->scratch.obj);
-	}
-
-	drm_gem_object_unreference(&ring->scratch.obj->base);
-	ring->scratch.obj = NULL;
+	intel_fini_pipe_control(ring);
 }
 
 static int gen8_rcs_signal(struct intel_engine_cs *signaller,
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 7203ee2..c135334 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -380,6 +380,9 @@ void intel_ring_init_seqno(struct intel_engine_cs *ring, u32 seqno);
 int intel_ring_flush_all_caches(struct intel_engine_cs *ring);
 int intel_ring_invalidate_all_caches(struct intel_engine_cs *ring);
 
+void intel_fini_pipe_control(struct intel_engine_cs *ring);
+int intel_init_pipe_control(struct intel_engine_cs *ring);
+
 int intel_init_render_ring_buffer(struct drm_device *dev);
 int intel_init_bsd_ring_buffer(struct drm_device *dev);
 int intel_init_bsd2_ring_buffer(struct drm_device *dev);
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 137+ messages in thread

* [PATCH 17/43] drm/i915/bdw: GEN-specific logical ring set/get seqno
  2014-07-24 16:04 [PATCH 00/43] Execlists v5 Thomas Daniel
                   ` (15 preceding siblings ...)
  2014-07-24 16:04 ` [PATCH 16/43] drm/i915/bdw: GEN-specific logical ring init Thomas Daniel
@ 2014-07-24 16:04 ` Thomas Daniel
  2014-08-11 15:05   ` Daniel Vetter
  2014-07-24 16:04 ` [PATCH 18/43] drm/i915/bdw: New logical ring submission mechanism Thomas Daniel
                   ` (27 subsequent siblings)
  44 siblings, 1 reply; 137+ messages in thread
From: Thomas Daniel @ 2014-07-24 16:04 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

No mystery here: the seqno is still retrieved from the engine's
HW status page (the one in the default context; for the moment,
I see no reason to worry about other contexts' HWS pages).

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c |   20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)
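
The access pattern here reduces to indexing into a CPU-visible page of
32-bit words that the GPU also writes. A simplified stand-in (the
index value is hypothetical; the real code goes through the kmap'ed
status page and the intel_read/write_status_page() helpers):

```c
#include <assert.h>
#include <stdint.h>

#define HWS_SEQNO_INDEX 0x30 /* hypothetical dword slot for the seqno */

/* Stand-in for the kmap'ed hardware status page: 4KiB of dwords. */
static uint32_t status_page[4096 / sizeof(uint32_t)];

static uint32_t read_status_page(int reg)
{
	return status_page[reg];
}

static void write_status_page(int reg, uint32_t value)
{
	status_page[reg] = value;
}

/* Shape of the new gen8 hooks: get/set are just slot accesses. */
static uint32_t gen8_get_seqno(void)
{
	return read_status_page(HWS_SEQNO_INDEX);
}

static void gen8_set_seqno(uint32_t seqno)
{
	write_status_page(HWS_SEQNO_INDEX, seqno);
}
```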

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 7c8b75e..f171fd5 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -149,6 +149,16 @@ static int gen8_init_render_ring(struct intel_engine_cs *ring)
 	return ret;
 }
 
+static u32 gen8_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency)
+{
+	return intel_read_status_page(ring, I915_GEM_HWS_INDEX);
+}
+
+static void gen8_set_seqno(struct intel_engine_cs *ring, u32 seqno)
+{
+	intel_write_status_page(ring, I915_GEM_HWS_INDEX, seqno);
+}
+
 void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
 {
 	if (!intel_ring_initialized(ring))
@@ -221,6 +231,8 @@ static int logical_render_ring_init(struct drm_device *dev)
 
 	ring->init = gen8_init_render_ring;
 	ring->cleanup = intel_fini_pipe_control;
+	ring->get_seqno = gen8_get_seqno;
+	ring->set_seqno = gen8_set_seqno;
 
 	return logical_ring_init(dev, ring);
 }
@@ -237,6 +249,8 @@ static int logical_bsd_ring_init(struct drm_device *dev)
 		GT_RENDER_USER_INTERRUPT << GEN8_VCS1_IRQ_SHIFT;
 
 	ring->init = gen8_init_common_ring;
+	ring->get_seqno = gen8_get_seqno;
+	ring->set_seqno = gen8_set_seqno;
 
 	return logical_ring_init(dev, ring);
 }
@@ -253,6 +267,8 @@ static int logical_bsd2_ring_init(struct drm_device *dev)
 		GT_RENDER_USER_INTERRUPT << GEN8_VCS2_IRQ_SHIFT;
 
 	ring->init = gen8_init_common_ring;
+	ring->get_seqno = gen8_get_seqno;
+	ring->set_seqno = gen8_set_seqno;
 
 	return logical_ring_init(dev, ring);
 }
@@ -269,6 +285,8 @@ static int logical_blt_ring_init(struct drm_device *dev)
 		GT_RENDER_USER_INTERRUPT << GEN8_BCS_IRQ_SHIFT;
 
 	ring->init = gen8_init_common_ring;
+	ring->get_seqno = gen8_get_seqno;
+	ring->set_seqno = gen8_set_seqno;
 
 	return logical_ring_init(dev, ring);
 }
@@ -285,6 +303,8 @@ static int logical_vebox_ring_init(struct drm_device *dev)
 		GT_RENDER_USER_INTERRUPT << GEN8_VECS_IRQ_SHIFT;
 
 	ring->init = gen8_init_common_ring;
+	ring->get_seqno = gen8_get_seqno;
+	ring->set_seqno = gen8_set_seqno;
 
 	return logical_ring_init(dev, ring);
 }
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 137+ messages in thread

* [PATCH 18/43] drm/i915/bdw: New logical ring submission mechanism
  2014-07-24 16:04 [PATCH 00/43] Execlists v5 Thomas Daniel
                   ` (16 preceding siblings ...)
  2014-07-24 16:04 ` [PATCH 17/43] drm/i915/bdw: GEN-specific logical ring set/get seqno Thomas Daniel
@ 2014-07-24 16:04 ` Thomas Daniel
  2014-08-11 20:40   ` Daniel Vetter
  2014-07-24 16:04 ` [PATCH 19/43] drm/i915/bdw: GEN-specific logical ring emit request Thomas Daniel
                   ` (26 subsequent siblings)
  44 siblings, 1 reply; 137+ messages in thread
From: Thomas Daniel @ 2014-07-24 16:04 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Well, new-ish: if all this code looks familiar, that's because it's
a clone of the existing submission mechanism (with some modifications
here and there to adapt it to LRCs and Execlists).

And why did we do this instead of reusing code, one might wonder?
Well, there are some fears that the differences are big enough that
they will end up breaking all platforms.

Also, Execlists offer several advantages, like control over when the
GPU is done with a given workload, that can no doubt help simplify
the submission path. I am interested in getting Execlists to work
first and foremost, but in the future this parallel path will let us
fine-tune the submission mechanism without affecting old gens.

v2: Pass the ringbuffer only (whenever possible).

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c        |  193 +++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.h        |   12 ++
 drivers/gpu/drm/i915/intel_ringbuffer.c |   20 ++--
 drivers/gpu/drm/i915/intel_ringbuffer.h |    3 +
 4 files changed, 218 insertions(+), 10 deletions(-)
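
Most of the waiting logic below revolves around the ring-buffer
free-space computation: the writable gap between the consumer's head
and the producer's tail, minus a reserved slack so that head == tail
unambiguously means "empty" rather than "full". The arithmetic in
isolation (mirrors the shape of __intel_ring_space(); the reserve
value is illustrative):

```c
#include <assert.h>

#define RING_FREE_SPACE 64 /* slack kept so tail never catches head */

/* Free bytes between tail (producer) and head (consumer) in a ring
 * of 'size' bytes, wrapping around when tail is ahead of head. */
static int ring_space(int head, int tail, int size)
{
	int space = head - (tail + RING_FREE_SPACE);

	if (space < 0)
		space += size;
	return space;
}
```

logical_ring_wait_request() walks the request list looking for the
oldest request whose retirement would make this value >= the bytes it
needs, then waits on that request's seqno.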

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index f171fd5..bd37d51 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -106,6 +106,199 @@ void intel_logical_ring_stop(struct intel_engine_cs *ring)
 	/* TODO */
 }
 
+void intel_logical_ring_advance_and_submit(struct intel_ringbuffer *ringbuf)
+{
+	intel_logical_ring_advance(ringbuf);
+
+	if (intel_ring_stopped(ringbuf->ring))
+		return;
+
+	/* TODO: how to submit a context to the ELSP is not here yet */
+}
+
+static int logical_ring_alloc_seqno(struct intel_engine_cs *ring)
+{
+	if (ring->outstanding_lazy_seqno)
+		return 0;
+
+	if (ring->preallocated_lazy_request == NULL) {
+		struct drm_i915_gem_request *request;
+
+		request = kmalloc(sizeof(*request), GFP_KERNEL);
+		if (request == NULL)
+			return -ENOMEM;
+
+		ring->preallocated_lazy_request = request;
+	}
+
+	return i915_gem_get_seqno(ring->dev, &ring->outstanding_lazy_seqno);
+}
+
+static int logical_ring_wait_request(struct intel_ringbuffer *ringbuf, int bytes)
+{
+	struct intel_engine_cs *ring = ringbuf->ring;
+	struct drm_i915_gem_request *request;
+	u32 seqno = 0;
+	int ret;
+
+	if (ringbuf->last_retired_head != -1) {
+		ringbuf->head = ringbuf->last_retired_head;
+		ringbuf->last_retired_head = -1;
+
+		ringbuf->space = intel_ring_space(ringbuf);
+		if (ringbuf->space >= bytes)
+			return 0;
+	}
+
+	list_for_each_entry(request, &ring->request_list, list) {
+		if (__intel_ring_space(request->tail, ringbuf->tail,
+				ringbuf->size) >= bytes) {
+			seqno = request->seqno;
+			break;
+		}
+	}
+
+	if (seqno == 0)
+		return -ENOSPC;
+
+	ret = i915_wait_seqno(ring, seqno);
+	if (ret)
+		return ret;
+
+	/* TODO: make sure we update the right ringbuffer's last_retired_head
+	 * when retiring requests */
+	i915_gem_retire_requests_ring(ring);
+	ringbuf->head = ringbuf->last_retired_head;
+	ringbuf->last_retired_head = -1;
+
+	ringbuf->space = intel_ring_space(ringbuf);
+	return 0;
+}
+
+static int logical_ring_wait_for_space(struct intel_ringbuffer *ringbuf, int bytes)
+{
+	struct intel_engine_cs *ring = ringbuf->ring;
+	struct drm_device *dev = ring->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	unsigned long end;
+	int ret;
+
+	ret = logical_ring_wait_request(ringbuf, bytes);
+	if (ret != -ENOSPC)
+		return ret;
+
+	/* Force the context submission in case we have been skipping it */
+	intel_logical_ring_advance_and_submit(ringbuf);
+
+	/* With GEM the hangcheck timer should kick us out of the loop,
+	 * leaving it early runs the risk of corrupting GEM state (due
+	 * to running on almost untested codepaths). But on resume
+	 * timers don't work yet, so prevent a complete hang in that
+	 * case by choosing an insanely large timeout. */
+	end = jiffies + 60 * HZ;
+
+	do {
+		ringbuf->head = I915_READ_HEAD(ring);
+		ringbuf->space = intel_ring_space(ringbuf);
+		if (ringbuf->space >= bytes) {
+			ret = 0;
+			break;
+		}
+
+		if (!drm_core_check_feature(dev, DRIVER_MODESET) &&
+		    dev->primary->master) {
+			struct drm_i915_master_private *master_priv = dev->primary->master->driver_priv;
+			if (master_priv->sarea_priv)
+				master_priv->sarea_priv->perf_boxes |= I915_BOX_WAIT;
+		}
+
+		msleep(1);
+
+		if (dev_priv->mm.interruptible && signal_pending(current)) {
+			ret = -ERESTARTSYS;
+			break;
+		}
+
+		ret = i915_gem_check_wedge(&dev_priv->gpu_error,
+					   dev_priv->mm.interruptible);
+		if (ret)
+			break;
+
+		if (time_after(jiffies, end)) {
+			ret = -EBUSY;
+			break;
+		}
+	} while (1);
+
+	return ret;
+}
+
+static int logical_ring_wrap_buffer(struct intel_ringbuffer *ringbuf)
+{
+	uint32_t __iomem *virt;
+	int rem = ringbuf->size - ringbuf->tail;
+
+	if (ringbuf->space < rem) {
+		int ret = logical_ring_wait_for_space(ringbuf, rem);
+		if (ret)
+			return ret;
+	}
+
+	virt = ringbuf->virtual_start + ringbuf->tail;
+	rem /= 4;
+	while (rem--)
+		iowrite32(MI_NOOP, virt++);
+
+	ringbuf->tail = 0;
+	ringbuf->space = intel_ring_space(ringbuf);
+
+	return 0;
+}
+
+static int logical_ring_prepare(struct intel_ringbuffer *ringbuf, int bytes)
+{
+	int ret;
+
+	if (unlikely(ringbuf->tail + bytes > ringbuf->effective_size)) {
+		ret = logical_ring_wrap_buffer(ringbuf);
+		if (unlikely(ret))
+			return ret;
+	}
+
+	if (unlikely(ringbuf->space < bytes)) {
+		ret = logical_ring_wait_for_space(ringbuf, bytes);
+		if (unlikely(ret))
+			return ret;
+	}
+
+	return 0;
+}
+
+int intel_logical_ring_begin(struct intel_ringbuffer *ringbuf, int num_dwords)
+{
+	struct intel_engine_cs *ring = ringbuf->ring;
+	struct drm_device *dev = ring->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	int ret;
+
+	ret = i915_gem_check_wedge(&dev_priv->gpu_error,
+				   dev_priv->mm.interruptible);
+	if (ret)
+		return ret;
+
+	ret = logical_ring_prepare(ringbuf, num_dwords * sizeof(uint32_t));
+	if (ret)
+		return ret;
+
+	/* Preallocate the olr before touching the ring */
+	ret = logical_ring_alloc_seqno(ring);
+	if (ret)
+		return ret;
+
+	ringbuf->space -= num_dwords * sizeof(uint32_t);
+	return 0;
+}
+
 static int gen8_init_common_ring(struct intel_engine_cs *ring)
 {
 	struct drm_device *dev = ring->dev;
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index bf0eff4..16798b6 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -29,6 +29,18 @@ void intel_logical_ring_stop(struct intel_engine_cs *ring);
 void intel_logical_ring_cleanup(struct intel_engine_cs *ring);
 int intel_logical_rings_init(struct drm_device *dev);
 
+void intel_logical_ring_advance_and_submit(struct intel_ringbuffer *ringbuf);
+static inline void intel_logical_ring_advance(struct intel_ringbuffer *ringbuf)
+{
+	ringbuf->tail &= ringbuf->size - 1;
+}
+static inline void intel_logical_ring_emit(struct intel_ringbuffer *ringbuf, u32 data)
+{
+	iowrite32(data, ringbuf->virtual_start + ringbuf->tail);
+	ringbuf->tail += 4;
+}
+int intel_logical_ring_begin(struct intel_ringbuffer *ringbuf, int num_dwords);
+
 /* Logical Ring Contexts */
 void intel_lr_context_free(struct intel_context *ctx);
 int intel_lr_context_deferred_create(struct intel_context *ctx,
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index ca45c58..dc2a991 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -57,7 +57,7 @@ intel_ring_initialized(struct intel_engine_cs *ring)
 		return ring->buffer && ring->buffer->obj;
 }
 
-static inline int __ring_space(int head, int tail, int size)
+int __intel_ring_space(int head, int tail, int size)
 {
 	int space = head - (tail + I915_RING_FREE_SPACE);
 	if (space < 0)
@@ -65,12 +65,12 @@ static inline int __ring_space(int head, int tail, int size)
 	return space;
 }
 
-static inline int ring_space(struct intel_ringbuffer *ringbuf)
+int intel_ring_space(struct intel_ringbuffer *ringbuf)
 {
-	return __ring_space(ringbuf->head & HEAD_ADDR, ringbuf->tail, ringbuf->size);
+	return __intel_ring_space(ringbuf->head & HEAD_ADDR, ringbuf->tail, ringbuf->size);
 }
 
-static bool intel_ring_stopped(struct intel_engine_cs *ring)
+bool intel_ring_stopped(struct intel_engine_cs *ring)
 {
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
 	return dev_priv->gpu_error.stop_rings & intel_ring_flag(ring);
@@ -561,7 +561,7 @@ static int init_ring_common(struct intel_engine_cs *ring)
 	else {
 		ringbuf->head = I915_READ_HEAD(ring);
 		ringbuf->tail = I915_READ_TAIL(ring) & TAIL_ADDR;
-		ringbuf->space = ring_space(ringbuf);
+		ringbuf->space = intel_ring_space(ringbuf);
 		ringbuf->last_retired_head = -1;
 	}
 
@@ -1679,13 +1679,13 @@ static int intel_ring_wait_request(struct intel_engine_cs *ring, int n)
 		ringbuf->head = ringbuf->last_retired_head;
 		ringbuf->last_retired_head = -1;
 
-		ringbuf->space = ring_space(ringbuf);
+		ringbuf->space = intel_ring_space(ringbuf);
 		if (ringbuf->space >= n)
 			return 0;
 	}
 
 	list_for_each_entry(request, &ring->request_list, list) {
-		if (__ring_space(request->tail, ringbuf->tail, ringbuf->size) >= n) {
+		if (__intel_ring_space(request->tail, ringbuf->tail, ringbuf->size) >= n) {
 			seqno = request->seqno;
 			break;
 		}
@@ -1702,7 +1702,7 @@ static int intel_ring_wait_request(struct intel_engine_cs *ring, int n)
 	ringbuf->head = ringbuf->last_retired_head;
 	ringbuf->last_retired_head = -1;
 
-	ringbuf->space = ring_space(ringbuf);
+	ringbuf->space = intel_ring_space(ringbuf);
 	return 0;
 }
 
@@ -1731,7 +1731,7 @@ static int ring_wait_for_space(struct intel_engine_cs *ring, int n)
 	trace_i915_ring_wait_begin(ring);
 	do {
 		ringbuf->head = I915_READ_HEAD(ring);
-		ringbuf->space = ring_space(ringbuf);
+		ringbuf->space = intel_ring_space(ringbuf);
 		if (ringbuf->space >= n) {
 			ret = 0;
 			break;
@@ -1783,7 +1783,7 @@ static int intel_wrap_ring_buffer(struct intel_engine_cs *ring)
 		iowrite32(MI_NOOP, virt++);
 
 	ringbuf->tail = 0;
-	ringbuf->space = ring_space(ringbuf);
+	ringbuf->space = intel_ring_space(ringbuf);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index c135334..c305df0 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -373,6 +373,9 @@ static inline void intel_ring_advance(struct intel_engine_cs *ring)
 	struct intel_ringbuffer *ringbuf = ring->buffer;
 	ringbuf->tail &= ringbuf->size - 1;
 }
+int __intel_ring_space(int head, int tail, int size);
+int intel_ring_space(struct intel_ringbuffer *ringbuf);
+bool intel_ring_stopped(struct intel_engine_cs *ring);
 void __intel_ring_advance(struct intel_engine_cs *ring);
 
 int __must_check intel_ring_idle(struct intel_engine_cs *ring);
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 137+ messages in thread

* [PATCH 19/43] drm/i915/bdw: GEN-specific logical ring emit request
  2014-07-24 16:04 [PATCH 00/43] Execlists v5 Thomas Daniel
                   ` (17 preceding siblings ...)
  2014-07-24 16:04 ` [PATCH 18/43] drm/i915/bdw: New logical ring submission mechanism Thomas Daniel
@ 2014-07-24 16:04 ` Thomas Daniel
  2014-07-24 16:04 ` [PATCH 20/43] drm/i915/bdw: GEN-specific logical ring emit flush Thomas Daniel
                   ` (25 subsequent siblings)
  44 siblings, 0 replies; 137+ messages in thread
From: Thomas Daniel @ 2014-07-24 16:04 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Very similar to the legacy add_request, only modified to account for
the logical ringbuffer.

v2: Use MI_GLOBAL_GTT, as suggested by Brad Volkin.

v3: Unify render and non-render in the same function, as noticed by
Brad Volkin.
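
As a companion to the diff, here is a standalone sketch of the 6-dword
sequence gen8_emit_request programs: store the seqno to the hardware
status page, then raise a user interrupt. Command encodings are copied
from i915_reg.h; the plain array and the `hws_addr` parameter stand in
for the real ringbuffer and status-page address computation:

```c
#include <assert.h>
#include <stdint.h>

/* Encodings as in i915_reg.h */
#define MI_INSTR(opcode, flags)	(((opcode) << 23) | (flags))
#define MI_NOOP			MI_INSTR(0, 0)
#define MI_USER_INTERRUPT	MI_INSTR(0x02, 0)
#define MI_STORE_DWORD_IMM_GEN8	MI_INSTR(0x20, 2)
#define MI_GLOBAL_GTT		(1u << 22)

/* Emit the request sequence into a plain dword array; returns the
 * number of dwords written (the patch reserves 6 via ring_begin). */
static int emit_request(uint32_t *ring, uint32_t hws_addr, uint32_t seqno)
{
	int n = 0;

	ring[n++] = MI_STORE_DWORD_IMM_GEN8 | MI_GLOBAL_GTT;
	ring[n++] = hws_addr;	/* low 32 bits of the HWS seqno slot */
	ring[n++] = 0;		/* high 32 bits: gen8 takes a 64-bit address */
	ring[n++] = seqno;	/* immediate value to store */
	ring[n++] = MI_USER_INTERRUPT;
	ring[n++] = MI_NOOP;	/* pad to an even number of dwords */
	return n;
}
```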

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_reg.h         |    1 +
 drivers/gpu/drm/i915/intel_lrc.c        |   31 +++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.h |    3 +++
 3 files changed, 35 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 043a6ea..70dddac 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -272,6 +272,7 @@
 #define   MI_SEMAPHORE_POLL		(1<<15)
 #define   MI_SEMAPHORE_SAD_GTE_SDD	(1<<12)
 #define MI_STORE_DWORD_IMM	MI_INSTR(0x20, 1)
+#define MI_STORE_DWORD_IMM_GEN8	MI_INSTR(0x20, 2)
 #define   MI_MEM_VIRTUAL	(1 << 22) /* 965+ only */
 #define MI_STORE_DWORD_INDEX	MI_INSTR(0x21, 1)
 #define   MI_STORE_DWORD_INDEX_SHIFT 2
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index bd37d51..64bda7a 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -352,6 +352,32 @@ static void gen8_set_seqno(struct intel_engine_cs *ring, u32 seqno)
 	intel_write_status_page(ring, I915_GEM_HWS_INDEX, seqno);
 }
 
+static int gen8_emit_request(struct intel_ringbuffer *ringbuf)
+{
+	struct intel_engine_cs *ring = ringbuf->ring;
+	u32 cmd;
+	int ret;
+
+	ret = intel_logical_ring_begin(ringbuf, 6);
+	if (ret)
+		return ret;
+
+	cmd = MI_STORE_DWORD_IMM_GEN8;
+	cmd |= MI_GLOBAL_GTT;
+
+	intel_logical_ring_emit(ringbuf, cmd);
+	intel_logical_ring_emit(ringbuf,
+				(ring->status_page.gfx_addr +
+				(I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT)));
+	intel_logical_ring_emit(ringbuf, 0);
+	intel_logical_ring_emit(ringbuf, ring->outstanding_lazy_seqno);
+	intel_logical_ring_emit(ringbuf, MI_USER_INTERRUPT);
+	intel_logical_ring_emit(ringbuf, MI_NOOP);
+	intel_logical_ring_advance_and_submit(ringbuf);
+
+	return 0;
+}
+
 void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
 {
 	if (!intel_ring_initialized(ring))
@@ -426,6 +452,7 @@ static int logical_render_ring_init(struct drm_device *dev)
 	ring->cleanup = intel_fini_pipe_control;
 	ring->get_seqno = gen8_get_seqno;
 	ring->set_seqno = gen8_set_seqno;
+	ring->emit_request = gen8_emit_request;
 
 	return logical_ring_init(dev, ring);
 }
@@ -444,6 +471,7 @@ static int logical_bsd_ring_init(struct drm_device *dev)
 	ring->init = gen8_init_common_ring;
 	ring->get_seqno = gen8_get_seqno;
 	ring->set_seqno = gen8_set_seqno;
+	ring->emit_request = gen8_emit_request;
 
 	return logical_ring_init(dev, ring);
 }
@@ -462,6 +490,7 @@ static int logical_bsd2_ring_init(struct drm_device *dev)
 	ring->init = gen8_init_common_ring;
 	ring->get_seqno = gen8_get_seqno;
 	ring->set_seqno = gen8_set_seqno;
+	ring->emit_request = gen8_emit_request;
 
 	return logical_ring_init(dev, ring);
 }
@@ -480,6 +509,7 @@ static int logical_blt_ring_init(struct drm_device *dev)
 	ring->init = gen8_init_common_ring;
 	ring->get_seqno = gen8_get_seqno;
 	ring->set_seqno = gen8_set_seqno;
+	ring->emit_request = gen8_emit_request;
 
 	return logical_ring_init(dev, ring);
 }
@@ -498,6 +528,7 @@ static int logical_vebox_ring_init(struct drm_device *dev)
 	ring->init = gen8_init_common_ring;
 	ring->get_seqno = gen8_get_seqno;
 	ring->set_seqno = gen8_set_seqno;
+	ring->emit_request = gen8_emit_request;
 
 	return logical_ring_init(dev, ring);
 }
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index c305df0..176ee6a 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -215,6 +215,9 @@ struct  intel_engine_cs {
 				  unsigned int num_dwords);
 	} semaphore;
 
+	/* Execlists */
+	int		(*emit_request)(struct intel_ringbuffer *ringbuf);
+
 	/**
 	 * List of objects currently involved in rendering from the
 	 * ringbuffer.
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 137+ messages in thread

* [PATCH 20/43] drm/i915/bdw: GEN-specific logical ring emit flush
  2014-07-24 16:04 [PATCH 00/43] Execlists v5 Thomas Daniel
                   ` (18 preceding siblings ...)
  2014-07-24 16:04 ` [PATCH 19/43] drm/i915/bdw: GEN-specific logical ring emit request Thomas Daniel
@ 2014-07-24 16:04 ` Thomas Daniel
  2014-07-24 16:04 ` [PATCH 21/43] drm/i915/bdw: Emission of requests with logical rings Thomas Daniel
                   ` (24 subsequent siblings)
  44 siblings, 0 replies; 137+ messages in thread
From: Thomas Daniel @ 2014-07-24 16:04 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Same as the legacy-style ring->flush.

v2: The BSD invalidate bit still exists in GEN8! Add it for the VCS
rings (but still consolidate the blt and bsd ring flushes into one).
This was noticed by Brad Volkin.

v3: The command for BSD and for other rings is slightly different:
get it exactly the same as in gen6_ring_flush + gen6_bsd_ring_flush
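
The per-ring command selection can be sketched outside the driver like
this (bit values taken from i915_reg.h as of this series; `is_vcs`
stands in for the `ring == &dev_priv->ring[VCS]` check, and only the
GPU-domain invalidate case is modeled):

```c
#include <assert.h>
#include <stdint.h>

/* Encodings as in i915_reg.h */
#define MI_INSTR(opcode, flags)	(((opcode) << 23) | (flags))
#define MI_FLUSH_DW		MI_INSTR(0x26, 1)
#define MI_FLUSH_DW_STORE_INDEX	(1u << 21)
#define MI_INVALIDATE_TLB	(1u << 18)
#define MI_FLUSH_DW_OP_STOREDW	(1u << 14)
#define MI_INVALIDATE_BSD	(1u << 7)

/* Build the MI_FLUSH_DW command dword: the BSD (VCS) rings keep their
 * extra invalidate bit, everything else gets the plain TLB invalidate. */
static uint32_t flush_cmd(int is_vcs, int invalidate)
{
	uint32_t cmd = MI_FLUSH_DW + 1;	/* +1 dword: 64-bit address */

	if (!invalidate)
		return cmd;
	if (is_vcs)
		return cmd | MI_INVALIDATE_TLB | MI_INVALIDATE_BSD |
		       MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW;
	return cmd | MI_INVALIDATE_TLB | MI_FLUSH_DW_STORE_INDEX |
	       MI_FLUSH_DW_OP_STOREDW;
}
```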

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c        |   82 +++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.c |    7 ---
 drivers/gpu/drm/i915/intel_ringbuffer.h |   10 ++++
 3 files changed, 92 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 64bda7a..5dd63d6 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -342,6 +342,83 @@ static int gen8_init_render_ring(struct intel_engine_cs *ring)
 	return ret;
 }
 
+static int gen8_emit_flush(struct intel_ringbuffer *ringbuf,
+			   u32 invalidate_domains,
+			   u32 unused)
+{
+	struct intel_engine_cs *ring = ringbuf->ring;
+	struct drm_device *dev = ring->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	uint32_t cmd;
+	int ret;
+
+	ret = intel_logical_ring_begin(ringbuf, 4);
+	if (ret)
+		return ret;
+
+	cmd = MI_FLUSH_DW + 1;
+
+	if (ring == &dev_priv->ring[VCS]) {
+		if (invalidate_domains & I915_GEM_GPU_DOMAINS)
+			cmd |= MI_INVALIDATE_TLB | MI_INVALIDATE_BSD |
+				MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW;
+	} else {
+		if (invalidate_domains & I915_GEM_DOMAIN_RENDER)
+			cmd |= MI_INVALIDATE_TLB | MI_FLUSH_DW_STORE_INDEX |
+				MI_FLUSH_DW_OP_STOREDW;
+	}
+
+	intel_logical_ring_emit(ringbuf, cmd);
+	intel_logical_ring_emit(ringbuf, I915_GEM_HWS_SCRATCH_ADDR | MI_FLUSH_DW_USE_GTT);
+	intel_logical_ring_emit(ringbuf, 0); /* upper addr */
+	intel_logical_ring_emit(ringbuf, 0); /* value */
+	intel_logical_ring_advance(ringbuf);
+
+	return 0;
+}
+
+static int gen8_emit_flush_render(struct intel_ringbuffer *ringbuf,
+				  u32 invalidate_domains,
+				  u32 flush_domains)
+{
+	struct intel_engine_cs *ring = ringbuf->ring;
+	u32 scratch_addr = ring->scratch.gtt_offset + 2 * CACHELINE_BYTES;
+	u32 flags = 0;
+	int ret;
+
+	flags |= PIPE_CONTROL_CS_STALL;
+
+	if (flush_domains) {
+		flags |= PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH;
+		flags |= PIPE_CONTROL_DEPTH_CACHE_FLUSH;
+	}
+
+	if (invalidate_domains) {
+		flags |= PIPE_CONTROL_TLB_INVALIDATE;
+		flags |= PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE;
+		flags |= PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE;
+		flags |= PIPE_CONTROL_VF_CACHE_INVALIDATE;
+		flags |= PIPE_CONTROL_CONST_CACHE_INVALIDATE;
+		flags |= PIPE_CONTROL_STATE_CACHE_INVALIDATE;
+		flags |= PIPE_CONTROL_QW_WRITE;
+		flags |= PIPE_CONTROL_GLOBAL_GTT_IVB;
+	}
+
+	ret = intel_logical_ring_begin(ringbuf, 6);
+	if (ret)
+		return ret;
+
+	intel_logical_ring_emit(ringbuf, GFX_OP_PIPE_CONTROL(6));
+	intel_logical_ring_emit(ringbuf, flags);
+	intel_logical_ring_emit(ringbuf, scratch_addr);
+	intel_logical_ring_emit(ringbuf, 0);
+	intel_logical_ring_emit(ringbuf, 0);
+	intel_logical_ring_emit(ringbuf, 0);
+	intel_logical_ring_advance(ringbuf);
+
+	return 0;
+}
+
 static u32 gen8_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency)
 {
 	return intel_read_status_page(ring, I915_GEM_HWS_INDEX);
@@ -453,6 +530,7 @@ static int logical_render_ring_init(struct drm_device *dev)
 	ring->get_seqno = gen8_get_seqno;
 	ring->set_seqno = gen8_set_seqno;
 	ring->emit_request = gen8_emit_request;
+	ring->emit_flush = gen8_emit_flush_render;
 
 	return logical_ring_init(dev, ring);
 }
@@ -472,6 +550,7 @@ static int logical_bsd_ring_init(struct drm_device *dev)
 	ring->get_seqno = gen8_get_seqno;
 	ring->set_seqno = gen8_set_seqno;
 	ring->emit_request = gen8_emit_request;
+	ring->emit_flush = gen8_emit_flush;
 
 	return logical_ring_init(dev, ring);
 }
@@ -491,6 +570,7 @@ static int logical_bsd2_ring_init(struct drm_device *dev)
 	ring->get_seqno = gen8_get_seqno;
 	ring->set_seqno = gen8_set_seqno;
 	ring->emit_request = gen8_emit_request;
+	ring->emit_flush = gen8_emit_flush;
 
 	return logical_ring_init(dev, ring);
 }
@@ -510,6 +590,7 @@ static int logical_blt_ring_init(struct drm_device *dev)
 	ring->get_seqno = gen8_get_seqno;
 	ring->set_seqno = gen8_set_seqno;
 	ring->emit_request = gen8_emit_request;
+	ring->emit_flush = gen8_emit_flush;
 
 	return logical_ring_init(dev, ring);
 }
@@ -529,6 +610,7 @@ static int logical_vebox_ring_init(struct drm_device *dev)
 	ring->get_seqno = gen8_get_seqno;
 	ring->set_seqno = gen8_set_seqno;
 	ring->emit_request = gen8_emit_request;
+	ring->emit_flush = gen8_emit_flush;
 
 	return logical_ring_init(dev, ring);
 }
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index dc2a991..3188403 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -33,13 +33,6 @@
 #include "i915_trace.h"
 #include "intel_drv.h"
 
-/* Early gen2 devices have a cacheline of just 32 bytes, using 64 is overkill,
- * but keeps the logic simple. Indeed, the whole purpose of this macro is just
- * to give some inclination as to some of the magic values used in the various
- * workarounds!
- */
-#define CACHELINE_BYTES 64
-
 bool
 intel_ring_initialized(struct intel_engine_cs *ring)
 {
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 176ee6a..6e22866 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -5,6 +5,13 @@
 
 #define I915_CMD_HASH_ORDER 9
 
+/* Early gen2 devices have a cacheline of just 32 bytes, using 64 is overkill,
+ * but keeps the logic simple. Indeed, the whole purpose of this macro is just
+ * to give some inclination as to some of the magic values used in the various
+ * workarounds!
+ */
+#define CACHELINE_BYTES 64
+
 /*
  * Gen2 BSpec "1. Programming Environment" / 1.4.4.6 "Ring Buffer Use"
  * Gen3 BSpec "vol1c Memory Interface Functions" / 2.3.4.5 "Ring Buffer Use"
@@ -217,6 +224,9 @@ struct  intel_engine_cs {
 
 	/* Execlists */
 	int		(*emit_request)(struct intel_ringbuffer *ringbuf);
+	int		(*emit_flush)(struct intel_ringbuffer *ringbuf,
+				      u32 invalidate_domains,
+				      u32 flush_domains);
 
 	/**
 	 * List of objects currently involved in rendering from the
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 137+ messages in thread

* [PATCH 21/43] drm/i915/bdw: Emission of requests with logical rings
  2014-07-24 16:04 [PATCH 00/43] Execlists v5 Thomas Daniel
                   ` (19 preceding siblings ...)
  2014-07-24 16:04 ` [PATCH 20/43] drm/i915/bdw: GEN-specific logical ring emit flush Thomas Daniel
@ 2014-07-24 16:04 ` Thomas Daniel
  2014-08-11 20:56   ` Daniel Vetter
  2014-07-24 16:04 ` [PATCH 22/43] drm/i915/bdw: Ring idle and stop " Thomas Daniel
                   ` (23 subsequent siblings)
  44 siblings, 1 reply; 137+ messages in thread
From: Thomas Daniel @ 2014-07-24 16:04 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

On a previous iteration of this patch, I created an Execlists
version of __i915_add_request and abstracted it away as a
vfunc. Daniel Vetter wondered then why that was needed:

"with the clean split in command submission I expect every
function to know wether it'll submit to an lrc (everything in
intel_lrc.c) or wether it'll submit to a legacy ring (existing
code), so I don't see a need for an add_request vfunc."

The honest, hairy truth is that this patch is the glue keeping
the whole logical ring puzzle together:

- i915_add_request is used by intel_ring_idle, which in turn is
  used by i915_gpu_idle, which in turn is used in several places
  inside the eviction and gtt codes.
- Also, it is used by i915_gem_check_olr, which is littered all
  over i915_gem.c
- ...

If I were to duplicate all the code that directly or indirectly
uses __i915_add_request, I would end up creating a separate driver.

To show the differences between the existing legacy version and
the new Execlists one, this time I have special-cased
__i915_add_request instead of adding an add_request vfunc. I
hope this helps to untangle this Gordian knot.
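
The "common intersection point" the message describes boils down to
picking the right ringbuffer by submission mode instead of through an
add_request vfunc. A sketch with stub types (not the real driver
structs):

```c
#include <assert.h>

/* Simplified stand-ins for intel_ringbuffer, intel_context and
 * intel_engine_cs. */
struct ringbuf { int id; };
struct ctx     { struct ringbuf *engine_ringbuf; };
struct engine  { struct ringbuf *buffer; };

/* Execlists: the ringbuffer lives in the request's context (one per
 * context per engine); legacy: it is the engine's single global buffer. */
static struct ringbuf *request_ringbuf(int enable_execlists,
				       struct engine *ring, struct ctx *ctx)
{
	return enable_execlists ? ctx->engine_ringbuf : ring->buffer;
}
```

This is the same branch __i915_add_request and
i915_gem_retire_requests_ring grow in the diff below the `---` line.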

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c  |   72 ++++++++++++++++++++++++++++----------
 drivers/gpu/drm/i915/intel_lrc.c |   30 +++++++++++++---
 drivers/gpu/drm/i915/intel_lrc.h |    1 +
 3 files changed, 80 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 9560b40..1c83b9c 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2327,10 +2327,21 @@ int __i915_add_request(struct intel_engine_cs *ring,
 {
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
 	struct drm_i915_gem_request *request;
+	struct intel_ringbuffer *ringbuf;
 	u32 request_ring_position, request_start;
 	int ret;
 
-	request_start = intel_ring_get_tail(ring->buffer);
+	request = ring->preallocated_lazy_request;
+	if (WARN_ON(request == NULL))
+		return -ENOMEM;
+
+	if (i915.enable_execlists) {
+		struct intel_context *ctx = request->ctx;
+		ringbuf = ctx->engine[ring->id].ringbuf;
+	} else
+		ringbuf = ring->buffer;
+
+	request_start = intel_ring_get_tail(ringbuf);
 	/*
 	 * Emit any outstanding flushes - execbuf can fail to emit the flush
 	 * after having emitted the batchbuffer command. Hence we need to fix
@@ -2338,24 +2349,32 @@ int __i915_add_request(struct intel_engine_cs *ring,
 	 * is that the flush _must_ happen before the next request, no matter
 	 * what.
 	 */
-	ret = intel_ring_flush_all_caches(ring);
-	if (ret)
-		return ret;
-
-	request = ring->preallocated_lazy_request;
-	if (WARN_ON(request == NULL))
-		return -ENOMEM;
+	if (i915.enable_execlists) {
+		ret = logical_ring_flush_all_caches(ringbuf);
+		if (ret)
+			return ret;
+	} else {
+		ret = intel_ring_flush_all_caches(ring);
+		if (ret)
+			return ret;
+	}
 
 	/* Record the position of the start of the request so that
 	 * should we detect the updated seqno part-way through the
 	 * GPU processing the request, we never over-estimate the
 	 * position of the head.
 	 */
-	request_ring_position = intel_ring_get_tail(ring->buffer);
+	request_ring_position = intel_ring_get_tail(ringbuf);
 
-	ret = ring->add_request(ring);
-	if (ret)
-		return ret;
+	if (i915.enable_execlists) {
+		ret = ring->emit_request(ringbuf);
+		if (ret)
+			return ret;
+	} else {
+		ret = ring->add_request(ring);
+		if (ret)
+			return ret;
+	}
 
 	request->seqno = intel_ring_get_seqno(ring);
 	request->ring = ring;
@@ -2370,12 +2389,14 @@ int __i915_add_request(struct intel_engine_cs *ring,
 	 */
 	request->batch_obj = obj;
 
-	/* Hold a reference to the current context so that we can inspect
-	 * it later in case a hangcheck error event fires.
-	 */
-	request->ctx = ring->last_context;
-	if (request->ctx)
-		i915_gem_context_reference(request->ctx);
+	if (!i915.enable_execlists) {
+		/* Hold a reference to the current context so that we can inspect
+		 * it later in case a hangcheck error event fires.
+		 */
+		request->ctx = ring->last_context;
+		if (request->ctx)
+			i915_gem_context_reference(request->ctx);
+	}
 
 	request->emitted_jiffies = jiffies;
 	list_add_tail(&request->list, &ring->request_list);
@@ -2630,6 +2651,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 
 	while (!list_empty(&ring->request_list)) {
 		struct drm_i915_gem_request *request;
+		struct intel_ringbuffer *ringbuf;
 
 		request = list_first_entry(&ring->request_list,
 					   struct drm_i915_gem_request,
@@ -2639,12 +2661,24 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 			break;
 
 		trace_i915_gem_request_retire(ring, request->seqno);
+
+		/* This is one of the few common intersection points
+		 * between legacy ringbuffer submission and execlists:
+		 * we need to tell them apart in order to find the correct
+		 * ringbuffer to which the request belongs to.
+		 */
+		if (i915.enable_execlists) {
+			struct intel_context *ctx = request->ctx;
+			ringbuf = ctx->engine[ring->id].ringbuf;
+		} else
+			ringbuf = ring->buffer;
+
 		/* We know the GPU must have read the request to have
 		 * sent us the seqno + interrupt, so use the position
 		 * of tail of the request to update the last known position
 		 * of the GPU head.
 		 */
-		ring->buffer->last_retired_head = request->tail;
+		ringbuf->last_retired_head = request->tail;
 
 		i915_gem_free_request(request);
 	}
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 5dd63d6..dcf59c6 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -106,6 +106,22 @@ void intel_logical_ring_stop(struct intel_engine_cs *ring)
 	/* TODO */
 }
 
+int logical_ring_flush_all_caches(struct intel_ringbuffer *ringbuf)
+{
+	struct intel_engine_cs *ring = ringbuf->ring;
+	int ret;
+
+	if (!ring->gpu_caches_dirty)
+		return 0;
+
+	ret = ring->emit_flush(ringbuf, 0, I915_GEM_GPU_DOMAINS);
+	if (ret)
+		return ret;
+
+	ring->gpu_caches_dirty = false;
+	return 0;
+}
+
 void intel_logical_ring_advance_and_submit(struct intel_ringbuffer *ringbuf)
 {
 	intel_logical_ring_advance(ringbuf);
@@ -116,7 +132,8 @@ void intel_logical_ring_advance_and_submit(struct intel_ringbuffer *ringbuf)
 	/* TODO: how to submit a context to the ELSP is not here yet */
 }
 
-static int logical_ring_alloc_seqno(struct intel_engine_cs *ring)
+static int logical_ring_alloc_seqno(struct intel_engine_cs *ring,
+				    struct intel_context *ctx)
 {
 	if (ring->outstanding_lazy_seqno)
 		return 0;
@@ -128,6 +145,13 @@ static int logical_ring_alloc_seqno(struct intel_engine_cs *ring)
 		if (request == NULL)
 			return -ENOMEM;
 
+		/* Hold a reference to the context this request belongs to
+		 * (we will need it when the time comes to emit/retire the
+		 * request).
+		 */
+		request->ctx = ctx;
+		i915_gem_context_reference(request->ctx);
+
 		ring->preallocated_lazy_request = request;
 	}
 
@@ -165,8 +189,6 @@ static int logical_ring_wait_request(struct intel_ringbuffer *ringbuf, int bytes
 	if (ret)
 		return ret;
 
-	/* TODO: make sure we update the right ringbuffer's last_retired_head
-	 * when retiring requests */
 	i915_gem_retire_requests_ring(ring);
 	ringbuf->head = ringbuf->last_retired_head;
 	ringbuf->last_retired_head = -1;
@@ -291,7 +313,7 @@ int intel_logical_ring_begin(struct intel_ringbuffer *ringbuf, int num_dwords)
 		return ret;
 
 	/* Preallocate the olr before touching the ring */
-	ret = logical_ring_alloc_seqno(ring);
+	ret = logical_ring_alloc_seqno(ring, ringbuf->ctx);
 	if (ret)
 		return ret;
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 16798b6..696e09e 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -29,6 +29,7 @@ void intel_logical_ring_stop(struct intel_engine_cs *ring);
 void intel_logical_ring_cleanup(struct intel_engine_cs *ring);
 int intel_logical_rings_init(struct drm_device *dev);
 
+int logical_ring_flush_all_caches(struct intel_ringbuffer *ringbuf);
 void intel_logical_ring_advance_and_submit(struct intel_ringbuffer *ringbuf);
 static inline void intel_logical_ring_advance(struct intel_ringbuffer *ringbuf)
 {
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 137+ messages in thread

* [PATCH 22/43] drm/i915/bdw: Ring idle and stop with logical rings
  2014-07-24 16:04 [PATCH 00/43] Execlists v5 Thomas Daniel
                   ` (20 preceding siblings ...)
  2014-07-24 16:04 ` [PATCH 21/43] drm/i915/bdw: Emission of requests with logical rings Thomas Daniel
@ 2014-07-24 16:04 ` Thomas Daniel
  2014-07-24 16:04 ` [PATCH 23/43] drm/i915/bdw: Interrupts " Thomas Daniel
                   ` (22 subsequent siblings)
  44 siblings, 0 replies; 137+ messages in thread
From: Thomas Daniel @ 2014-07-24 16:04 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

This is a hard one, since with Execlists there is no direct
hardware ring to control.

We reuse intel_ring_idle here, but it should be fine as long
as i915_add_request does the right thing.
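
The bounded wait in the new intel_logical_ring_stop (set STOP_RING,
then poll MODE_IDLE with a timeout) can be sketched with the register
reads replaced by pre-recorded samples; this illustrates the
wait_for_atomic pattern only, not the driver code:

```c
#include <assert.h>

/* Poll a sequence of MODE_IDLE readings; return 0 as soon as the ring
 * reports idle, -1 if it never does within max_polls attempts (the
 * real code sleeps ~1ms between reads and errors out after 1000ms). */
static int wait_for_idle(const int *mode_idle_samples, int max_polls)
{
	for (int i = 0; i < max_polls; i++)
		if (mode_idle_samples[i])
			return 0;
	return -1;
}
```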

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c |   24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index dcf59c6..c30518c 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -103,7 +103,24 @@ int intel_execlists_submission(struct drm_device *dev, struct drm_file *file,
 
 void intel_logical_ring_stop(struct intel_engine_cs *ring)
 {
-	/* TODO */
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	int ret;
+
+	if (!intel_ring_initialized(ring))
+		return;
+
+	ret = intel_ring_idle(ring);
+	if (ret && !i915_reset_in_progress(&to_i915(ring->dev)->gpu_error))
+		DRM_ERROR("failed to quiesce %s whilst cleaning up: %d\n",
+			  ring->name, ret);
+
+	/* TODO: Is this correct with Execlists enabled? */
+	I915_WRITE_MODE(ring, _MASKED_BIT_ENABLE(STOP_RING));
+	if (wait_for_atomic((I915_READ_MODE(ring) & MODE_IDLE) != 0, 1000)) {
+		DRM_ERROR("%s :timed out trying to stop ring\n", ring->name);
+		return;
+	}
+	I915_WRITE_MODE(ring, _MASKED_BIT_DISABLE(STOP_RING));
 }
 
 int logical_ring_flush_all_caches(struct intel_ringbuffer *ringbuf)
@@ -479,10 +496,13 @@ static int gen8_emit_request(struct intel_ringbuffer *ringbuf)
 
 void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
 {
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+
 	if (!intel_ring_initialized(ring))
 		return;
 
-	/* TODO: make sure the ring is stopped */
+	intel_logical_ring_stop(ring);
+	WARN_ON((I915_READ_MODE(ring) & MODE_IDLE) == 0);
 	ring->preallocated_lazy_request = NULL;
 	ring->outstanding_lazy_seqno = 0;
 
-- 
1.7.9.5


* [PATCH 23/43] drm/i915/bdw: Interrupts with logical rings
  2014-07-24 16:04 [PATCH 00/43] Execlists v5 Thomas Daniel
                   ` (21 preceding siblings ...)
  2014-07-24 16:04 ` [PATCH 22/43] drm/i915/bdw: Ring idle and stop " Thomas Daniel
@ 2014-07-24 16:04 ` Thomas Daniel
  2014-08-11 21:02   ` Daniel Vetter
  2014-08-11 21:08   ` Daniel Vetter
  2014-07-24 16:04 ` [PATCH 24/43] drm/i915/bdw: GEN-specific logical ring emit batchbuffer start Thomas Daniel
                   ` (21 subsequent siblings)
  44 siblings, 2 replies; 137+ messages in thread
From: Thomas Daniel @ 2014-07-24 16:04 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

We need to attend to context switch interrupts from all rings. Also, fixed
the writing of IMR/IER and added HWSTAM programming at ring init time.

Notice that, if added to irq_enable_mask, the context switch interrupts
would be incorrectly masked out whenever the user interrupts are disabled
because no users are waiting on a sequence number. Therefore, this commit
adds a bitmask of interrupts to be kept unmasked at all times.

v2: Disable HWSTAM, as suggested by Damien (nobody listens to these interrupts,
anyway).

v3: Add new get/put_irq functions.

Signed-off-by: Thomas Daniel <thomas.daniel@intel.com> (v1)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2 & v3)
---
 drivers/gpu/drm/i915/i915_irq.c         |   19 ++++++++--
 drivers/gpu/drm/i915/i915_reg.h         |    3 ++
 drivers/gpu/drm/i915/intel_lrc.c        |   58 +++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.h |    1 +
 4 files changed, 78 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index a38b5c3..f77a4ca 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1643,6 +1643,8 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
 				notify_ring(dev, &dev_priv->ring[RCS]);
 			if (bcs & GT_RENDER_USER_INTERRUPT)
 				notify_ring(dev, &dev_priv->ring[BCS]);
+			if ((rcs | bcs) & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
+				DRM_DEBUG_DRIVER("TODO: Context switch\n");
 		} else
 			DRM_ERROR("The master control interrupt lied (GT0)!\n");
 	}
@@ -1655,9 +1657,13 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
 			vcs = tmp >> GEN8_VCS1_IRQ_SHIFT;
 			if (vcs & GT_RENDER_USER_INTERRUPT)
 				notify_ring(dev, &dev_priv->ring[VCS]);
+			if (vcs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
+				DRM_DEBUG_DRIVER("TODO: Context switch\n");
 			vcs = tmp >> GEN8_VCS2_IRQ_SHIFT;
 			if (vcs & GT_RENDER_USER_INTERRUPT)
 				notify_ring(dev, &dev_priv->ring[VCS2]);
+			if (vcs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
+				DRM_DEBUG_DRIVER("TODO: Context switch\n");
 		} else
 			DRM_ERROR("The master control interrupt lied (GT1)!\n");
 	}
@@ -1681,6 +1687,8 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
 			vcs = tmp >> GEN8_VECS_IRQ_SHIFT;
 			if (vcs & GT_RENDER_USER_INTERRUPT)
 				notify_ring(dev, &dev_priv->ring[VECS]);
+			if (vcs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
+				DRM_DEBUG_DRIVER("TODO: Context switch\n");
 		} else
 			DRM_ERROR("The master control interrupt lied (GT3)!\n");
 	}
@@ -3768,12 +3776,17 @@ static void gen8_gt_irq_postinstall(struct drm_i915_private *dev_priv)
 	/* These are interrupts we'll toggle with the ring mask register */
 	uint32_t gt_interrupts[] = {
 		GT_RENDER_USER_INTERRUPT << GEN8_RCS_IRQ_SHIFT |
+			GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_RCS_IRQ_SHIFT |
 			GT_RENDER_L3_PARITY_ERROR_INTERRUPT |
-			GT_RENDER_USER_INTERRUPT << GEN8_BCS_IRQ_SHIFT,
+			GT_RENDER_USER_INTERRUPT << GEN8_BCS_IRQ_SHIFT |
+			GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_BCS_IRQ_SHIFT,
 		GT_RENDER_USER_INTERRUPT << GEN8_VCS1_IRQ_SHIFT |
-			GT_RENDER_USER_INTERRUPT << GEN8_VCS2_IRQ_SHIFT,
+			GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS1_IRQ_SHIFT |
+			GT_RENDER_USER_INTERRUPT << GEN8_VCS2_IRQ_SHIFT |
+			GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS2_IRQ_SHIFT,
 		0,
-		GT_RENDER_USER_INTERRUPT << GEN8_VECS_IRQ_SHIFT
+		GT_RENDER_USER_INTERRUPT << GEN8_VECS_IRQ_SHIFT |
+			GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VECS_IRQ_SHIFT
 		};
 
 	for (i = 0; i < ARRAY_SIZE(gt_interrupts); i++)
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 70dddac..bfc0c01 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -1062,6 +1062,7 @@ enum punit_power_well {
 #define RING_ACTHD_UDW(base)	((base)+0x5c)
 #define RING_NOPID(base)	((base)+0x94)
 #define RING_IMR(base)		((base)+0xa8)
+#define RING_HWSTAM(base)	((base)+0x98)
 #define RING_TIMESTAMP(base)	((base)+0x358)
 #define   TAIL_ADDR		0x001FFFF8
 #define   HEAD_WRAP_COUNT	0xFFE00000
@@ -4590,6 +4591,8 @@ enum punit_power_well {
 #define GEN8_GT_IIR(which) (0x44308 + (0x10 * (which)))
 #define GEN8_GT_IER(which) (0x4430c + (0x10 * (which)))
 
+#define GEN8_GT_CONTEXT_SWITCH_INTERRUPT	(1 <<  8)
+
 #define GEN8_BCS_IRQ_SHIFT 16
 #define GEN8_RCS_IRQ_SHIFT 0
 #define GEN8_VCS2_IRQ_SHIFT 16
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index c30518c..a6dcb3a 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -343,6 +343,9 @@ static int gen8_init_common_ring(struct intel_engine_cs *ring)
 	struct drm_device *dev = ring->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
+	I915_WRITE_IMR(ring, ~(ring->irq_enable_mask | ring->irq_keep_mask));
+	I915_WRITE(RING_HWSTAM(ring->mmio_base), 0xffffffff);
+
 	I915_WRITE(RING_MODE_GEN7(ring),
 		_MASKED_BIT_DISABLE(GFX_REPLAY_MODE) |
 		_MASKED_BIT_ENABLE(GFX_RUN_LIST_ENABLE));
@@ -381,6 +384,39 @@ static int gen8_init_render_ring(struct intel_engine_cs *ring)
 	return ret;
 }
 
+static bool gen8_logical_ring_get_irq(struct intel_engine_cs *ring)
+{
+	struct drm_device *dev = ring->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	unsigned long flags;
+
+	if (!dev->irq_enabled)
+		return false;
+
+	spin_lock_irqsave(&dev_priv->irq_lock, flags);
+	if (ring->irq_refcount++ == 0) {
+		I915_WRITE_IMR(ring, ~(ring->irq_enable_mask | ring->irq_keep_mask));
+		POSTING_READ(RING_IMR(ring->mmio_base));
+	}
+	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
+
+	return true;
+}
+
+static void gen8_logical_ring_put_irq(struct intel_engine_cs *ring)
+{
+	struct drm_device *dev = ring->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev_priv->irq_lock, flags);
+	if (--ring->irq_refcount == 0) {
+		I915_WRITE_IMR(ring, ~ring->irq_keep_mask);
+		POSTING_READ(RING_IMR(ring->mmio_base));
+	}
+	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
+}
+
 static int gen8_emit_flush(struct intel_ringbuffer *ringbuf,
 			   u32 invalidate_domains,
 			   u32 unused)
@@ -566,6 +602,10 @@ static int logical_render_ring_init(struct drm_device *dev)
 	ring->mmio_base = RENDER_RING_BASE;
 	ring->irq_enable_mask =
 		GT_RENDER_USER_INTERRUPT << GEN8_RCS_IRQ_SHIFT;
+	ring->irq_keep_mask =
+		GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_RCS_IRQ_SHIFT;
+	if (HAS_L3_DPF(dev))
+		ring->irq_keep_mask |= GT_RENDER_L3_PARITY_ERROR_INTERRUPT;
 
 	ring->init = gen8_init_render_ring;
 	ring->cleanup = intel_fini_pipe_control;
@@ -573,6 +613,8 @@ static int logical_render_ring_init(struct drm_device *dev)
 	ring->set_seqno = gen8_set_seqno;
 	ring->emit_request = gen8_emit_request;
 	ring->emit_flush = gen8_emit_flush_render;
+	ring->irq_get = gen8_logical_ring_get_irq;
+	ring->irq_put = gen8_logical_ring_put_irq;
 
 	return logical_ring_init(dev, ring);
 }
@@ -587,12 +629,16 @@ static int logical_bsd_ring_init(struct drm_device *dev)
 	ring->mmio_base = GEN6_BSD_RING_BASE;
 	ring->irq_enable_mask =
 		GT_RENDER_USER_INTERRUPT << GEN8_VCS1_IRQ_SHIFT;
+	ring->irq_keep_mask =
+		GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS1_IRQ_SHIFT;
 
 	ring->init = gen8_init_common_ring;
 	ring->get_seqno = gen8_get_seqno;
 	ring->set_seqno = gen8_set_seqno;
 	ring->emit_request = gen8_emit_request;
 	ring->emit_flush = gen8_emit_flush;
+	ring->irq_get = gen8_logical_ring_get_irq;
+	ring->irq_put = gen8_logical_ring_put_irq;
 
 	return logical_ring_init(dev, ring);
 }
@@ -607,12 +653,16 @@ static int logical_bsd2_ring_init(struct drm_device *dev)
 	ring->mmio_base = GEN8_BSD2_RING_BASE;
 	ring->irq_enable_mask =
 		GT_RENDER_USER_INTERRUPT << GEN8_VCS2_IRQ_SHIFT;
+	ring->irq_keep_mask =
+		GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS2_IRQ_SHIFT;
 
 	ring->init = gen8_init_common_ring;
 	ring->get_seqno = gen8_get_seqno;
 	ring->set_seqno = gen8_set_seqno;
 	ring->emit_request = gen8_emit_request;
 	ring->emit_flush = gen8_emit_flush;
+	ring->irq_get = gen8_logical_ring_get_irq;
+	ring->irq_put = gen8_logical_ring_put_irq;
 
 	return logical_ring_init(dev, ring);
 }
@@ -627,12 +677,16 @@ static int logical_blt_ring_init(struct drm_device *dev)
 	ring->mmio_base = BLT_RING_BASE;
 	ring->irq_enable_mask =
 		GT_RENDER_USER_INTERRUPT << GEN8_BCS_IRQ_SHIFT;
+	ring->irq_keep_mask =
+		GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_BCS_IRQ_SHIFT;
 
 	ring->init = gen8_init_common_ring;
 	ring->get_seqno = gen8_get_seqno;
 	ring->set_seqno = gen8_set_seqno;
 	ring->emit_request = gen8_emit_request;
 	ring->emit_flush = gen8_emit_flush;
+	ring->irq_get = gen8_logical_ring_get_irq;
+	ring->irq_put = gen8_logical_ring_put_irq;
 
 	return logical_ring_init(dev, ring);
 }
@@ -647,12 +701,16 @@ static int logical_vebox_ring_init(struct drm_device *dev)
 	ring->mmio_base = VEBOX_RING_BASE;
 	ring->irq_enable_mask =
 		GT_RENDER_USER_INTERRUPT << GEN8_VECS_IRQ_SHIFT;
+	ring->irq_keep_mask =
+		GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VECS_IRQ_SHIFT;
 
 	ring->init = gen8_init_common_ring;
 	ring->get_seqno = gen8_get_seqno;
 	ring->set_seqno = gen8_set_seqno;
 	ring->emit_request = gen8_emit_request;
 	ring->emit_flush = gen8_emit_flush;
+	ring->irq_get = gen8_logical_ring_get_irq;
+	ring->irq_put = gen8_logical_ring_put_irq;
 
 	return logical_ring_init(dev, ring);
 }
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 6e22866..09102b2 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -223,6 +223,7 @@ struct  intel_engine_cs {
 	} semaphore;
 
 	/* Execlists */
+	u32             irq_keep_mask; /* bitmask for interrupts that should not be masked */
 	int		(*emit_request)(struct intel_ringbuffer *ringbuf);
 	int		(*emit_flush)(struct intel_ringbuffer *ringbuf,
 				      u32 invalidate_domains,
-- 
1.7.9.5


* [PATCH 24/43] drm/i915/bdw: GEN-specific logical ring emit batchbuffer start
  2014-07-24 16:04 [PATCH 00/43] Execlists v5 Thomas Daniel
                   ` (22 preceding siblings ...)
  2014-07-24 16:04 ` [PATCH 23/43] drm/i915/bdw: Interrupts " Thomas Daniel
@ 2014-07-24 16:04 ` Thomas Daniel
  2014-08-11 21:09   ` Daniel Vetter
  2014-07-24 16:04 ` [PATCH 25/43] drm/i915/bdw: Workload submission mechanism for Execlists Thomas Daniel
                   ` (20 subsequent siblings)
  44 siblings, 1 reply; 137+ messages in thread
From: Thomas Daniel @ 2014-07-24 16:04 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Dispatch_execbuffer's evil twin.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c        |   28 ++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.h |    2 ++
 2 files changed, 30 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index a6dcb3a..55ee8dd 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -384,6 +384,29 @@ static int gen8_init_render_ring(struct intel_engine_cs *ring)
 	return ret;
 }
 
+static int gen8_emit_bb_start(struct intel_ringbuffer *ringbuf,
+			      u64 offset, unsigned flags)
+{
+	struct intel_engine_cs *ring = ringbuf->ring;
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	bool ppgtt = dev_priv->mm.aliasing_ppgtt != NULL &&
+		!(flags & I915_DISPATCH_SECURE);
+	int ret;
+
+	ret = intel_logical_ring_begin(ringbuf, 4);
+	if (ret)
+		return ret;
+
+	/* FIXME(BDW): Address space and security selectors. */
+	intel_logical_ring_emit(ringbuf, MI_BATCH_BUFFER_START_GEN8 | (ppgtt<<8));
+	intel_logical_ring_emit(ringbuf, lower_32_bits(offset));
+	intel_logical_ring_emit(ringbuf, upper_32_bits(offset));
+	intel_logical_ring_emit(ringbuf, MI_NOOP);
+	intel_logical_ring_advance(ringbuf);
+
+	return 0;
+}
+
 static bool gen8_logical_ring_get_irq(struct intel_engine_cs *ring)
 {
 	struct drm_device *dev = ring->dev;
@@ -615,6 +638,7 @@ static int logical_render_ring_init(struct drm_device *dev)
 	ring->emit_flush = gen8_emit_flush_render;
 	ring->irq_get = gen8_logical_ring_get_irq;
 	ring->irq_put = gen8_logical_ring_put_irq;
+	ring->emit_bb_start = gen8_emit_bb_start;
 
 	return logical_ring_init(dev, ring);
 }
@@ -639,6 +663,7 @@ static int logical_bsd_ring_init(struct drm_device *dev)
 	ring->emit_flush = gen8_emit_flush;
 	ring->irq_get = gen8_logical_ring_get_irq;
 	ring->irq_put = gen8_logical_ring_put_irq;
+	ring->emit_bb_start = gen8_emit_bb_start;
 
 	return logical_ring_init(dev, ring);
 }
@@ -663,6 +688,7 @@ static int logical_bsd2_ring_init(struct drm_device *dev)
 	ring->emit_flush = gen8_emit_flush;
 	ring->irq_get = gen8_logical_ring_get_irq;
 	ring->irq_put = gen8_logical_ring_put_irq;
+	ring->emit_bb_start = gen8_emit_bb_start;
 
 	return logical_ring_init(dev, ring);
 }
@@ -687,6 +713,7 @@ static int logical_blt_ring_init(struct drm_device *dev)
 	ring->emit_flush = gen8_emit_flush;
 	ring->irq_get = gen8_logical_ring_get_irq;
 	ring->irq_put = gen8_logical_ring_put_irq;
+	ring->emit_bb_start = gen8_emit_bb_start;
 
 	return logical_ring_init(dev, ring);
 }
@@ -711,6 +738,7 @@ static int logical_vebox_ring_init(struct drm_device *dev)
 	ring->emit_flush = gen8_emit_flush;
 	ring->irq_get = gen8_logical_ring_get_irq;
 	ring->irq_put = gen8_logical_ring_put_irq;
+	ring->emit_bb_start = gen8_emit_bb_start;
 
 	return logical_ring_init(dev, ring);
 }
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 09102b2..c885d5c 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -228,6 +228,8 @@ struct  intel_engine_cs {
 	int		(*emit_flush)(struct intel_ringbuffer *ringbuf,
 				      u32 invalidate_domains,
 				      u32 flush_domains);
+	int		(*emit_bb_start)(struct intel_ringbuffer *ringbuf,
+					 u64 offset, unsigned flags);
 
 	/**
 	 * List of objects currently involved in rendering from the
-- 
1.7.9.5


* [PATCH 25/43] drm/i915/bdw: Workload submission mechanism for Execlists
  2014-07-24 16:04 [PATCH 00/43] Execlists v5 Thomas Daniel
                   ` (23 preceding siblings ...)
  2014-07-24 16:04 ` [PATCH 24/43] drm/i915/bdw: GEN-specific logical ring emit batchbuffer start Thomas Daniel
@ 2014-07-24 16:04 ` Thomas Daniel
  2014-08-11 20:30   ` Daniel Vetter
  2014-07-24 16:04 ` [PATCH 26/43] drm/i915/bdw: Always use MMIO flips with Execlists Thomas Daniel
                   ` (19 subsequent siblings)
  44 siblings, 1 reply; 137+ messages in thread
From: Thomas Daniel @ 2014-07-24 16:04 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

This is what i915_gem_do_execbuffer calls when it wants to execute some
workload in an Execlists world.

v2: Check arguments before doing stuff in intel_execlists_submission. Also,
get rel_constants parsing right.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h            |    6 ++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |    4 +-
 drivers/gpu/drm/i915/intel_lrc.c           |  130 +++++++++++++++++++++++++++-
 3 files changed, 137 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 1caed52..4303e2c 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2239,6 +2239,12 @@ int i915_gem_set_domain_ioctl(struct drm_device *dev, void *data,
 			      struct drm_file *file_priv);
 int i915_gem_sw_finish_ioctl(struct drm_device *dev, void *data,
 			     struct drm_file *file_priv);
+void i915_gem_execbuffer_move_to_active(struct list_head *vmas,
+					struct intel_engine_cs *ring);
+void i915_gem_execbuffer_retire_commands(struct drm_device *dev,
+					 struct drm_file *file,
+					 struct intel_engine_cs *ring,
+					 struct drm_i915_gem_object *obj);
 int i915_gem_ringbuffer_submission(struct drm_device *dev,
 				   struct drm_file *file,
 				   struct intel_engine_cs *ring,
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 8c63d79..cae7df8 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -962,7 +962,7 @@ i915_gem_validate_context(struct drm_device *dev, struct drm_file *file,
 	return ctx;
 }
 
-static void
+void
 i915_gem_execbuffer_move_to_active(struct list_head *vmas,
 				   struct intel_engine_cs *ring)
 {
@@ -994,7 +994,7 @@ i915_gem_execbuffer_move_to_active(struct list_head *vmas,
 	}
 }
 
-static void
+void
 i915_gem_execbuffer_retire_commands(struct drm_device *dev,
 				    struct drm_file *file,
 				    struct intel_engine_cs *ring,
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 55ee8dd..cd834b3 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -89,6 +89,57 @@ int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists
 	return 0;
 }
 
+static int logical_ring_invalidate_all_caches(struct intel_ringbuffer *ringbuf)
+{
+	struct intel_engine_cs *ring = ringbuf->ring;
+	uint32_t flush_domains;
+	int ret;
+
+	flush_domains = 0;
+	if (ring->gpu_caches_dirty)
+		flush_domains = I915_GEM_GPU_DOMAINS;
+
+	ret = ring->emit_flush(ringbuf, I915_GEM_GPU_DOMAINS, flush_domains);
+	if (ret)
+		return ret;
+
+	ring->gpu_caches_dirty = false;
+	return 0;
+}
+
+static int execlists_move_to_gpu(struct intel_ringbuffer *ringbuf,
+				 struct list_head *vmas)
+{
+	struct intel_engine_cs *ring = ringbuf->ring;
+	struct i915_vma *vma;
+	uint32_t flush_domains = 0;
+	bool flush_chipset = false;
+	int ret;
+
+	list_for_each_entry(vma, vmas, exec_list) {
+		struct drm_i915_gem_object *obj = vma->obj;
+		ret = i915_gem_object_sync(obj, ring);
+		if (ret)
+			return ret;
+
+		if (obj->base.write_domain & I915_GEM_DOMAIN_CPU)
+			flush_chipset |= i915_gem_clflush_object(obj, false);
+
+		flush_domains |= obj->base.write_domain;
+	}
+
+	if (flush_chipset)
+		i915_gem_chipset_flush(ring->dev);
+
+	if (flush_domains & I915_GEM_DOMAIN_GTT)
+		wmb();
+
+	/* Unconditionally invalidate gpu caches and ensure that we do flush
+	 * any residual writes from the previous batch.
+	 */
+	return logical_ring_invalidate_all_caches(ringbuf);
+}
+
 int intel_execlists_submission(struct drm_device *dev, struct drm_file *file,
 			       struct intel_engine_cs *ring,
 			       struct intel_context *ctx,
@@ -97,7 +148,84 @@ int intel_execlists_submission(struct drm_device *dev, struct drm_file *file,
 			       struct drm_i915_gem_object *batch_obj,
 			       u64 exec_start, u32 flags)
 {
-	/* TODO */
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_ringbuffer *ringbuf = ctx->engine[ring->id].ringbuf;
+	int instp_mode;
+	u32 instp_mask;
+	int ret;
+
+	instp_mode = args->flags & I915_EXEC_CONSTANTS_MASK;
+	instp_mask = I915_EXEC_CONSTANTS_MASK;
+	switch (instp_mode) {
+	case I915_EXEC_CONSTANTS_REL_GENERAL:
+	case I915_EXEC_CONSTANTS_ABSOLUTE:
+	case I915_EXEC_CONSTANTS_REL_SURFACE:
+		if (instp_mode != 0 && ring != &dev_priv->ring[RCS]) {
+			DRM_DEBUG("non-0 rel constants mode on non-RCS\n");
+			return -EINVAL;
+		}
+
+		if (instp_mode != dev_priv->relative_constants_mode) {
+			if (instp_mode == I915_EXEC_CONSTANTS_REL_SURFACE) {
+				DRM_DEBUG("rel surface constants mode invalid on gen5+\n");
+				return -EINVAL;
+			}
+
+			/* The HW changed the meaning on this bit on gen6 */
+			instp_mask &= ~I915_EXEC_CONSTANTS_REL_SURFACE;
+		}
+		break;
+	default:
+		DRM_DEBUG("execbuf with unknown constants: %d\n", instp_mode);
+		return -EINVAL;
+	}
+
+	if (args->num_cliprects != 0) {
+		DRM_DEBUG("clip rectangles are only valid on pre-gen5\n");
+		return -EINVAL;
+	} else {
+		if (args->DR4 == 0xffffffff) {
+			DRM_DEBUG("UXA submitting garbage DR4, fixing up\n");
+			args->DR4 = 0;
+		}
+
+		if (args->DR1 || args->DR4 || args->cliprects_ptr) {
+			DRM_DEBUG("0 cliprects but dirt in cliprects fields\n");
+			return -EINVAL;
+		}
+	}
+
+	if (args->flags & I915_EXEC_GEN7_SOL_RESET) {
+		DRM_DEBUG("sol reset is gen7 only\n");
+		return -EINVAL;
+	}
+
+	ret = execlists_move_to_gpu(ringbuf, vmas);
+	if (ret)
+		return ret;
+
+	if (ring == &dev_priv->ring[RCS] &&
+			instp_mode != dev_priv->relative_constants_mode) {
+		ret = intel_logical_ring_begin(ringbuf, 4);
+		if (ret)
+			return ret;
+
+		intel_logical_ring_emit(ringbuf, MI_NOOP);
+		intel_logical_ring_emit(ringbuf, MI_LOAD_REGISTER_IMM(1));
+		intel_logical_ring_emit(ringbuf, INSTPM);
+		intel_logical_ring_emit(ringbuf, instp_mask << 16 | instp_mode);
+		intel_logical_ring_advance(ringbuf);
+
+		dev_priv->relative_constants_mode = instp_mode;
+	}
+
+	ret = ring->emit_bb_start(ringbuf, exec_start, flags);
+	if (ret)
+		return ret;
+
+	i915_gem_execbuffer_move_to_active(vmas, ring);
+	i915_gem_execbuffer_retire_commands(dev, file, ring, batch_obj);
+
 	return 0;
 }
 
-- 
1.7.9.5


* [PATCH 26/43] drm/i915/bdw: Always use MMIO flips with Execlists
  2014-07-24 16:04 [PATCH 00/43] Execlists v5 Thomas Daniel
                   ` (24 preceding siblings ...)
  2014-07-24 16:04 ` [PATCH 25/43] drm/i915/bdw: Workload submission mechanism for Execlists Thomas Daniel
@ 2014-07-24 16:04 ` Thomas Daniel
  2014-08-11 20:34   ` Daniel Vetter
  2014-07-24 16:04 ` [PATCH 27/43] drm/i915/bdw: Render state init for Execlists Thomas Daniel
                   ` (18 subsequent siblings)
  44 siblings, 1 reply; 137+ messages in thread
From: Thomas Daniel @ 2014-07-24 16:04 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

The normal flip function places commands in the ring in the legacy
way, so we either fix that or always force MMIO flips, as we do in
this patch.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/intel_display.c |    2 ++
 drivers/gpu/drm/i915/intel_lrc.c     |    3 ++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 5ed6a1a..8129af4 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -9482,6 +9482,8 @@ static bool use_mmio_flip(struct intel_engine_cs *ring,
 		return false;
 	else if (i915.use_mmio_flip > 0)
 		return true;
+	else if (i915.enable_execlists)
+		return true;
 	else
 		return ring != obj->ring;
 }
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index cd834b3..0a04c03 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -83,7 +83,8 @@ int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists
 	if (enable_execlists == 0)
 		return 0;
 
-	if (HAS_LOGICAL_RING_CONTEXTS(dev) && USES_PPGTT(dev))
+	if (HAS_LOGICAL_RING_CONTEXTS(dev) && USES_PPGTT(dev) &&
+			i915.use_mmio_flip >= 0)
 		return 1;
 
 	return 0;
-- 
1.7.9.5


* [PATCH 27/43] drm/i915/bdw: Render state init for Execlists
  2014-07-24 16:04 [PATCH 00/43] Execlists v5 Thomas Daniel
                   ` (25 preceding siblings ...)
  2014-07-24 16:04 ` [PATCH 26/43] drm/i915/bdw: Always use MMIO flips with Execlists Thomas Daniel
@ 2014-07-24 16:04 ` Thomas Daniel
  2014-08-11 21:25   ` Daniel Vetter
  2014-08-21 10:40   ` [PATCH] " Thomas Daniel
  2014-07-24 16:04 ` [PATCH 28/43] drm/i915/bdw: Implement context switching (somewhat) Thomas Daniel
                   ` (17 subsequent siblings)
  44 siblings, 2 replies; 137+ messages in thread
From: Thomas Daniel @ 2014-07-24 16:04 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

The batchbuffer that sets the render context state is submitted
in a different way, and from different places.

We needed to make both the render state preparation and free functions
externally accessible, and namespace them accordingly. This mess exists so
that all LR, LRC and Execlists functionality can go together in intel_lrc.c:
we can fix all of this later on, once the interfaces are clear.

v2: Create a separate ctx->rcs_initialized for the Execlists case, as
suggested by Chris Wilson.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h              |    4 +--
 drivers/gpu/drm/i915/i915_gem_context.c      |   17 +++++++++-
 drivers/gpu/drm/i915/i915_gem_render_state.c |   40 ++++++++++++++--------
 drivers/gpu/drm/i915/i915_gem_render_state.h |   47 ++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.c             |   46 +++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.h             |    2 ++
 drivers/gpu/drm/i915/intel_renderstate.h     |    8 +----
 7 files changed, 139 insertions(+), 25 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_gem_render_state.h

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 4303e2c..b7cf0ec 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -37,6 +37,7 @@
 #include "intel_ringbuffer.h"
 #include "intel_lrc.h"
 #include "i915_gem_gtt.h"
+#include "i915_gem_render_state.h"
 #include <linux/io-mapping.h>
 #include <linux/i2c.h>
 #include <linux/i2c-algo-bit.h>
@@ -623,6 +624,7 @@ struct intel_context {
 	} legacy_hw_ctx;
 
 	/* Execlists */
+	bool rcs_initialized;
 	struct {
 		struct drm_i915_gem_object *state;
 		struct intel_ringbuffer *ringbuf;
@@ -2553,8 +2555,6 @@ int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
 int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
 				   struct drm_file *file);
 
-/* i915_gem_render_state.c */
-int i915_gem_render_state_init(struct intel_engine_cs *ring);
 /* i915_gem_evict.c */
 int __must_check i915_gem_evict_something(struct drm_device *dev,
 					  struct i915_address_space *vm,
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 9085ff1..0dc6992 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -513,8 +513,23 @@ int i915_gem_context_enable(struct drm_i915_private *dev_priv)
 		ppgtt->enable(ppgtt);
 	}
 
-	if (i915.enable_execlists)
+	if (i915.enable_execlists) {
+		struct intel_context *dctx;
+
+		ring = &dev_priv->ring[RCS];
+		dctx = ring->default_context;
+
+		if (!dctx->rcs_initialized) {
+			ret = intel_lr_context_render_state_init(ring, dctx);
+			if (ret) {
+				DRM_ERROR("Init render state failed: %d\n", ret);
+				return ret;
+			}
+			dctx->rcs_initialized = true;
+		}
+
 		return 0;
+	}
 
 	/* FIXME: We should make this work, even in reset */
 	if (i915_reset_in_progress(&dev_priv->gpu_error))
diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.c b/drivers/gpu/drm/i915/i915_gem_render_state.c
index e60be3f..a9a62d7 100644
--- a/drivers/gpu/drm/i915/i915_gem_render_state.c
+++ b/drivers/gpu/drm/i915/i915_gem_render_state.c
@@ -28,13 +28,6 @@
 #include "i915_drv.h"
 #include "intel_renderstate.h"
 
-struct render_state {
-	const struct intel_renderstate_rodata *rodata;
-	struct drm_i915_gem_object *obj;
-	u64 ggtt_offset;
-	int gen;
-};
-
 static const struct intel_renderstate_rodata *
 render_state_get_rodata(struct drm_device *dev, const int gen)
 {
@@ -127,30 +120,47 @@ static int render_state_setup(struct render_state *so)
 	return 0;
 }
 
-static void render_state_fini(struct render_state *so)
+void i915_gem_render_state_fini(struct render_state *so)
 {
 	i915_gem_object_ggtt_unpin(so->obj);
 	drm_gem_object_unreference(&so->obj->base);
 }
 
-int i915_gem_render_state_init(struct intel_engine_cs *ring)
+int i915_gem_render_state_prepare(struct intel_engine_cs *ring,
+				  struct render_state *so)
 {
-	struct render_state so;
 	int ret;
 
 	if (WARN_ON(ring->id != RCS))
 		return -ENOENT;
 
-	ret = render_state_init(&so, ring->dev);
+	ret = render_state_init(so, ring->dev);
 	if (ret)
 		return ret;
 
-	if (so.rodata == NULL)
+	if (so->rodata == NULL)
 		return 0;
 
-	ret = render_state_setup(&so);
+	ret = render_state_setup(so);
+	if (ret) {
+		i915_gem_render_state_fini(so);
+		return ret;
+	}
+
+	return 0;
+}
+
+int i915_gem_render_state_init(struct intel_engine_cs *ring)
+{
+	struct render_state so;
+	int ret;
+
+	ret = i915_gem_render_state_prepare(ring, &so);
 	if (ret)
-		goto out;
+		return ret;
+
+	if (so.rodata == NULL)
+		return 0;
 
 	ret = ring->dispatch_execbuffer(ring,
 					so.ggtt_offset,
@@ -164,6 +174,6 @@ int i915_gem_render_state_init(struct intel_engine_cs *ring)
 	ret = __i915_add_request(ring, NULL, so.obj, NULL);
 	/* __i915_add_request moves object to inactive if it fails */
 out:
-	render_state_fini(&so);
+	i915_gem_render_state_fini(&so);
 	return ret;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.h b/drivers/gpu/drm/i915/i915_gem_render_state.h
new file mode 100644
index 0000000..c44961e
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_gem_render_state.h
@@ -0,0 +1,47 @@
+/*
+ * Copyright © 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef _I915_GEM_RENDER_STATE_H_
+#define _I915_GEM_RENDER_STATE_H_
+
+#include <linux/types.h>
+
+struct intel_renderstate_rodata {
+	const u32 *reloc;
+	const u32 *batch;
+	const u32 batch_items;
+};
+
+struct render_state {
+	const struct intel_renderstate_rodata *rodata;
+	struct drm_i915_gem_object *obj;
+	u64 ggtt_offset;
+	int gen;
+};
+
+int i915_gem_render_state_init(struct intel_engine_cs *ring);
+void i915_gem_render_state_fini(struct render_state *so);
+int i915_gem_render_state_prepare(struct intel_engine_cs *ring,
+				  struct render_state *so);
+
+#endif /* _I915_GEM_RENDER_STATE_H_ */
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 0a04c03..4549eec 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -925,6 +925,37 @@ cleanup_render_ring:
 	return ret;
 }
 
+int intel_lr_context_render_state_init(struct intel_engine_cs *ring,
+				       struct intel_context *ctx)
+{
+	struct intel_ringbuffer *ringbuf = ctx->engine[ring->id].ringbuf;
+	struct render_state so;
+	struct drm_i915_file_private *file_priv = ctx->file_priv;
+	struct drm_file *file = file_priv ? file_priv->file : NULL;
+	int ret;
+
+	ret = i915_gem_render_state_prepare(ring, &so);
+	if (ret)
+		return ret;
+
+	if (so.rodata == NULL)
+		return 0;
+
+	ret = ring->emit_bb_start(ringbuf,
+			so.ggtt_offset,
+			I915_DISPATCH_SECURE);
+	if (ret)
+		goto out;
+
+	i915_vma_move_to_active(i915_gem_obj_to_ggtt(so.obj), ring);
+
+	ret = __i915_add_request(ring, file, so.obj, NULL);
+	/* __i915_add_request moves object to inactive if it fails */
+out:
+	i915_gem_render_state_fini(&so);
+	return ret;
+}
+
 static int
 populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_obj,
 		    struct intel_engine_cs *ring, struct intel_ringbuffer *ringbuf)
@@ -1142,6 +1173,21 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 	ctx->engine[ring->id].ringbuf = ringbuf;
 	ctx->engine[ring->id].state = ctx_obj;
 
+	/* The default context will have to wait, because we are not yet
+	 * ready to send a batchbuffer at this point */
+	if (ring->id == RCS && !ctx->rcs_initialized &&
+			ctx != ring->default_context) {
+		ret = intel_lr_context_render_state_init(ring, ctx);
+		if (ret) {
+			DRM_ERROR("Init render state failed: %d\n", ret);
+			ctx->engine[ring->id].ringbuf = NULL;
+			ctx->engine[ring->id].state = NULL;
+			intel_destroy_ringbuffer_obj(ringbuf);
+			goto error;
+		}
+		ctx->rcs_initialized = true;
+	}
+
 	return 0;
 
 error:
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 696e09e..f20c3d2 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -43,6 +43,8 @@ static inline void intel_logical_ring_emit(struct intel_ringbuffer *ringbuf, u32
 int intel_logical_ring_begin(struct intel_ringbuffer *ringbuf, int num_dwords);
 
 /* Logical Ring Contexts */
+int intel_lr_context_render_state_init(struct intel_engine_cs *ring,
+				       struct intel_context *ctx);
 void intel_lr_context_free(struct intel_context *ctx);
 int intel_lr_context_deferred_create(struct intel_context *ctx,
 				     struct intel_engine_cs *ring);
diff --git a/drivers/gpu/drm/i915/intel_renderstate.h b/drivers/gpu/drm/i915/intel_renderstate.h
index fd4f662..6c792d3 100644
--- a/drivers/gpu/drm/i915/intel_renderstate.h
+++ b/drivers/gpu/drm/i915/intel_renderstate.h
@@ -24,13 +24,7 @@
 #ifndef _INTEL_RENDERSTATE_H
 #define _INTEL_RENDERSTATE_H
 
-#include <linux/types.h>
-
-struct intel_renderstate_rodata {
-	const u32 *reloc;
-	const u32 *batch;
-	const u32 batch_items;
-};
+#include "i915_drv.h"
 
 extern const struct intel_renderstate_rodata gen6_null_state;
 extern const struct intel_renderstate_rodata gen7_null_state;
-- 
1.7.9.5

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 137+ messages in thread

* [PATCH 28/43] drm/i915/bdw: Implement context switching (somewhat)
  2014-07-24 16:04 [PATCH 00/43] Execlists v5 Thomas Daniel
                   ` (26 preceding siblings ...)
  2014-07-24 16:04 ` [PATCH 27/43] drm/i915/bdw: Render state init for Execlists Thomas Daniel
@ 2014-07-24 16:04 ` Thomas Daniel
  2014-08-11 21:29   ` Daniel Vetter
  2014-07-24 16:04 ` [PATCH 29/43] drm/i915/bdw: Write the tail pointer, LRC style Thomas Daniel
                   ` (16 subsequent siblings)
  44 siblings, 1 reply; 137+ messages in thread
From: Thomas Daniel @ 2014-07-24 16:04 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

A context switch occurs by submitting a context descriptor to the
ExecList Submission Port. Given that we can now initialize a context,
it's possible to begin implementing the context switch by creating the
descriptor and submitting it to the ELSP (actually two descriptors,
since the ELSP has two ports).

The context object must be mapped in the GGTT, which means it must exist
in the 0-4GB graphics VA range.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

v2: This code has changed quite a lot in various rebases. Of particular
importance is that now we use the globally unique Submission ID to send
to the hardware. Also, context pages are now pinned unconditionally to
GGTT, so there is no need to bind them.

v3: Use LRCA[31:12] as hwCtxId[19:0]. This guarantees that the HW context
ID we submit to the ELSP is globally unique and != 0 (Bspec requirements
of the software use-only bits of the Context ID in the Context Descriptor
Format) without the hassle of the previous submission Id construction.
Also, re-add the ELSP posting read (it was dropped somewhere during the
rebases).

v4:
- Squash with "drm/i915/bdw: Add forcewake lock around ELSP writes" (BSPEC
  says: "SW must set Force Wakeup bit to prevent GT from entering C6 while
  ELSP writes are in progress") as noted by Thomas Daniel
  (thomas.daniel@intel.com).
- Rename functions and use an execlists/intel_execlists_ namespace.
- The BUG_ON only checked that the LRCA was <32 bits, but it didn't make
  sure that it was properly aligned. Spotted by Alistair Mcaulay
  <alistair.mcaulay@intel.com>.

v5:
- Improved source code comments as suggested by Chris Wilson.
- No need to abstract submit_ctx away, as pointed by Brad Volkin.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c |  116 +++++++++++++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/intel_lrc.h |    1 +
 2 files changed, 115 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 4549eec..535ef98 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -47,6 +47,7 @@
 #define GEN8_LR_CONTEXT_ALIGN 4096
 
 #define RING_ELSP(ring)			((ring)->mmio_base+0x230)
+#define RING_EXECLIST_STATUS(ring)	((ring)->mmio_base+0x234)
 #define RING_CONTEXT_CONTROL(ring)	((ring)->mmio_base+0x244)
 
 #define CTX_LRI_HEADER_0		0x01
@@ -78,6 +79,26 @@
 #define CTX_R_PWR_CLK_STATE		0x42
 #define CTX_GPGPU_CSR_BASE_ADDRESS	0x44
 
+#define GEN8_CTX_VALID (1<<0)
+#define GEN8_CTX_FORCE_PD_RESTORE (1<<1)
+#define GEN8_CTX_FORCE_RESTORE (1<<2)
+#define GEN8_CTX_L3LLC_COHERENT (1<<5)
+#define GEN8_CTX_PRIVILEGE (1<<8)
+enum {
+	ADVANCED_CONTEXT=0,
+	LEGACY_CONTEXT,
+	ADVANCED_AD_CONTEXT,
+	LEGACY_64B_CONTEXT
+};
+#define GEN8_CTX_MODE_SHIFT 3
+enum {
+	FAULT_AND_HANG=0,
+	FAULT_AND_HALT, /* Debug only */
+	FAULT_AND_STREAM,
+	FAULT_AND_CONTINUE /* Unsupported */
+};
+#define GEN8_CTX_ID_SHIFT 32
+
 int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists)
 {
 	if (enable_execlists == 0)
@@ -90,6 +111,93 @@ int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists
 	return 0;
 }
 
+u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj)
+{
+	u32 lrca = i915_gem_obj_ggtt_offset(ctx_obj);
+
+	/* LRCA is required to be 4K aligned so the more significant 20 bits
+	 * are globally unique */
+	return lrca >> 12;
+}
+
+static uint64_t execlists_ctx_descriptor(struct drm_i915_gem_object *ctx_obj)
+{
+	uint64_t desc;
+	uint64_t lrca = i915_gem_obj_ggtt_offset(ctx_obj);
+	BUG_ON(lrca & 0xFFFFFFFF00000FFFULL);
+
+	desc = GEN8_CTX_VALID;
+	desc |= LEGACY_CONTEXT << GEN8_CTX_MODE_SHIFT;
+	desc |= GEN8_CTX_L3LLC_COHERENT;
+	desc |= GEN8_CTX_PRIVILEGE;
+	desc |= lrca;
+	desc |= (u64)intel_execlists_ctx_id(ctx_obj) << GEN8_CTX_ID_SHIFT;
+
+	/* TODO: WaDisableLiteRestore when we start using semaphore
+	 * signalling between Command Streamers */
+	/* desc |= GEN8_CTX_FORCE_RESTORE; */
+
+	return desc;
+}
+
+static void execlists_elsp_write(struct intel_engine_cs *ring,
+				 struct drm_i915_gem_object *ctx_obj0,
+				 struct drm_i915_gem_object *ctx_obj1)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	uint64_t temp = 0;
+	uint32_t desc[4];
+
+	/* XXX: You must always write both descriptors in the order below. */
+	if (ctx_obj1)
+		temp = execlists_ctx_descriptor(ctx_obj1);
+	else
+		temp = 0;
+	desc[1] = (u32)(temp >> 32);
+	desc[0] = (u32)temp;
+
+	temp = execlists_ctx_descriptor(ctx_obj0);
+	desc[3] = (u32)(temp >> 32);
+	desc[2] = (u32)temp;
+
+	/* Set Force Wakeup bit to prevent GT from entering C6 while
+	 * ELSP writes are in progress */
+	gen6_gt_force_wake_get(dev_priv, FORCEWAKE_ALL);
+
+	I915_WRITE(RING_ELSP(ring), desc[1]);
+	I915_WRITE(RING_ELSP(ring), desc[0]);
+	I915_WRITE(RING_ELSP(ring), desc[3]);
+	/* The context is automatically loaded after the following */
+	I915_WRITE(RING_ELSP(ring), desc[2]);
+
+	/* ELSP is a wo register, so use another nearby reg for posting instead */
+	POSTING_READ(RING_EXECLIST_STATUS(ring));
+
+	gen6_gt_force_wake_put(dev_priv, FORCEWAKE_ALL);
+}
+
+static int execlists_submit_context(struct intel_engine_cs *ring,
+				    struct intel_context *to0, u32 tail0,
+				    struct intel_context *to1, u32 tail1)
+{
+	struct drm_i915_gem_object *ctx_obj0;
+	struct drm_i915_gem_object *ctx_obj1 = NULL;
+
+	ctx_obj0 = to0->engine[ring->id].state;
+	BUG_ON(!ctx_obj0);
+	BUG_ON(!i915_gem_obj_is_pinned(ctx_obj0));
+
+	if (to1) {
+		ctx_obj1 = to1->engine[ring->id].state;
+		BUG_ON(!ctx_obj1);
+		BUG_ON(!i915_gem_obj_is_pinned(ctx_obj1));
+	}
+
+	execlists_elsp_write(ring, ctx_obj0, ctx_obj1);
+
+	return 0;
+}
+
 static int logical_ring_invalidate_all_caches(struct intel_ringbuffer *ringbuf)
 {
 	struct intel_engine_cs *ring = ringbuf->ring;
@@ -270,12 +378,16 @@ int logical_ring_flush_all_caches(struct intel_ringbuffer *ringbuf)
 
 void intel_logical_ring_advance_and_submit(struct intel_ringbuffer *ringbuf)
 {
+	struct intel_engine_cs *ring = ringbuf->ring;
+	struct intel_context *ctx = ringbuf->ctx;
+
 	intel_logical_ring_advance(ringbuf);
 
-	if (intel_ring_stopped(ringbuf->ring))
+	if (intel_ring_stopped(ring))
 		return;
 
-	/* TODO: how to submit a context to the ELSP is not here yet */
+	/* FIXME: too cheeky, we don't even check if the ELSP is ready */
+	execlists_submit_context(ring, ctx, ringbuf->tail, NULL, 0);
 }
 
 static int logical_ring_alloc_seqno(struct intel_engine_cs *ring,
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index f20c3d2..b59965b 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -58,5 +58,6 @@ int intel_execlists_submission(struct drm_device *dev, struct drm_file *file,
 			       struct list_head *vmas,
 			       struct drm_i915_gem_object *batch_obj,
 			       u64 exec_start, u32 flags);
+u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj);
 
 #endif /* _INTEL_LRC_H_ */
-- 
1.7.9.5


* [PATCH 29/43] drm/i915/bdw: Write the tail pointer, LRC style
  2014-07-24 16:04 [PATCH 00/43] Execlists v5 Thomas Daniel
                   ` (27 preceding siblings ...)
  2014-07-24 16:04 ` [PATCH 28/43] drm/i915/bdw: Implement context switching (somewhat) Thomas Daniel
@ 2014-07-24 16:04 ` Thomas Daniel
  2014-08-01 14:33   ` Damien Lespiau
  2014-08-11 21:30   ` Daniel Vetter
  2014-07-24 16:04 ` [PATCH 30/43] drm/i915/bdw: Two-stage execlist submit process Thomas Daniel
                   ` (15 subsequent siblings)
  44 siblings, 2 replies; 137+ messages in thread
From: Thomas Daniel @ 2014-07-24 16:04 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Each logical ring context has the tail pointer in the context object,
so update it before submission.

v2: New namespace.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c |   19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 535ef98..5b6f416 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -176,6 +176,21 @@ static void execlists_elsp_write(struct intel_engine_cs *ring,
 	gen6_gt_force_wake_put(dev_priv, FORCEWAKE_ALL);
 }
 
+static int execlists_ctx_write_tail(struct drm_i915_gem_object *ctx_obj, u32 tail)
+{
+	struct page *page;
+	uint32_t *reg_state;
+
+	page = i915_gem_object_get_page(ctx_obj, 1);
+	reg_state = kmap_atomic(page);
+
+	reg_state[CTX_RING_TAIL+1] = tail;
+
+	kunmap_atomic(reg_state);
+
+	return 0;
+}
+
 static int execlists_submit_context(struct intel_engine_cs *ring,
 				    struct intel_context *to0, u32 tail0,
 				    struct intel_context *to1, u32 tail1)
@@ -187,10 +202,14 @@ static int execlists_submit_context(struct intel_engine_cs *ring,
 	BUG_ON(!ctx_obj0);
 	BUG_ON(!i915_gem_obj_is_pinned(ctx_obj0));
 
+	execlists_ctx_write_tail(ctx_obj0, tail0);
+
 	if (to1) {
 		ctx_obj1 = to1->engine[ring->id].state;
 		BUG_ON(!ctx_obj1);
 		BUG_ON(!i915_gem_obj_is_pinned(ctx_obj1));
+
+		execlists_ctx_write_tail(ctx_obj1, tail1);
 	}
 
 	execlists_elsp_write(ring, ctx_obj0, ctx_obj1);
-- 
1.7.9.5


* [PATCH 30/43] drm/i915/bdw: Two-stage execlist submit process
  2014-07-24 16:04 [PATCH 00/43] Execlists v5 Thomas Daniel
                   ` (28 preceding siblings ...)
  2014-07-24 16:04 ` [PATCH 29/43] drm/i915/bdw: Write the tail pointer, LRC style Thomas Daniel
@ 2014-07-24 16:04 ` Thomas Daniel
  2014-08-14 20:05   ` Daniel Vetter
  2014-08-14 20:10   ` Daniel Vetter
  2014-07-24 16:04 ` [PATCH 31/43] drm/i915/bdw: Handle context switch events Thomas Daniel
                   ` (14 subsequent siblings)
  44 siblings, 2 replies; 137+ messages in thread
From: Thomas Daniel @ 2014-07-24 16:04 UTC (permalink / raw)
  To: intel-gfx

From: Michel Thierry <michel.thierry@intel.com>

Context switch (and execlist submission) should happen only when
other contexts are not active, otherwise pre-emption occurs.

To ensure this, we place context switch requests in a queue, and those
requests are later consumed when the right context switch interrupt is
received (still TODO).

v2: Use a spinlock, do not remove the requests on unqueue (wait for
context switch completion).

Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>

v3: Several rebases and code changes. Use unique ID.

v4:
- Move the queue/lock init to the late ring initialization.
- Damien's kmalloc review comments: check return, use sizeof(*req),
do not cast.

v5:
- Do not reuse drm_i915_gem_request. Instead, create our own.
- New namespace.

Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v1)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2-v5)
---
 drivers/gpu/drm/i915/intel_lrc.c        |   63 ++++++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/intel_lrc.h        |    8 ++++
 drivers/gpu/drm/i915/intel_ringbuffer.h |    2 +
 3 files changed, 71 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 5b6f416..9e91169 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -217,6 +217,63 @@ static int execlists_submit_context(struct intel_engine_cs *ring,
 	return 0;
 }
 
+static void execlists_context_unqueue(struct intel_engine_cs *ring)
+{
+	struct intel_ctx_submit_request *req0 = NULL, *req1 = NULL;
+	struct intel_ctx_submit_request *cursor = NULL, *tmp = NULL;
+
+	if (list_empty(&ring->execlist_queue))
+		return;
+
+	/* Try to read in pairs */
+	list_for_each_entry_safe(cursor, tmp, &ring->execlist_queue, execlist_link) {
+		if (!req0)
+			req0 = cursor;
+		else if (req0->ctx == cursor->ctx) {
+			/* Same ctx: ignore first request, as second request
+			 * will update tail past first request's workload */
+			list_del(&req0->execlist_link);
+			i915_gem_context_unreference(req0->ctx);
+			kfree(req0);
+			req0 = cursor;
+		} else {
+			req1 = cursor;
+			break;
+		}
+	}
+
+	BUG_ON(execlists_submit_context(ring, req0->ctx, req0->tail,
+			req1? req1->ctx : NULL, req1? req1->tail : 0));
+}
+
+static int execlists_context_queue(struct intel_engine_cs *ring,
+				   struct intel_context *to,
+				   u32 tail)
+{
+	struct intel_ctx_submit_request *req = NULL;
+	unsigned long flags;
+	bool was_empty;
+
+	req = kzalloc(sizeof(*req), GFP_KERNEL);
+	if (req == NULL)
+		return -ENOMEM;
+	req->ctx = to;
+	i915_gem_context_reference(req->ctx);
+	req->ring = ring;
+	req->tail = tail;
+
+	spin_lock_irqsave(&ring->execlist_lock, flags);
+
+	was_empty = list_empty(&ring->execlist_queue);
+	list_add_tail(&req->execlist_link, &ring->execlist_queue);
+	if (was_empty)
+		execlists_context_unqueue(ring);
+
+	spin_unlock_irqrestore(&ring->execlist_lock, flags);
+
+	return 0;
+}
+
 static int logical_ring_invalidate_all_caches(struct intel_ringbuffer *ringbuf)
 {
 	struct intel_engine_cs *ring = ringbuf->ring;
@@ -405,8 +462,7 @@ void intel_logical_ring_advance_and_submit(struct intel_ringbuffer *ringbuf)
 	if (intel_ring_stopped(ring))
 		return;
 
-	/* FIXME: too cheeky, we don't even check if the ELSP is ready */
-	execlists_submit_context(ring, ctx, ringbuf->tail, NULL, 0);
+	execlists_context_queue(ring, ctx, ringbuf->tail);
 }
 
 static int logical_ring_alloc_seqno(struct intel_engine_cs *ring,
@@ -850,6 +906,9 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
 	INIT_LIST_HEAD(&ring->request_list);
 	init_waitqueue_head(&ring->irq_queue);
 
+	INIT_LIST_HEAD(&ring->execlist_queue);
+	spin_lock_init(&ring->execlist_lock);
+
 	ret = intel_lr_context_deferred_create(dctx, ring);
 	if (ret)
 		return ret;
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index b59965b..14492a9 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -60,4 +60,12 @@ int intel_execlists_submission(struct drm_device *dev, struct drm_file *file,
 			       u64 exec_start, u32 flags);
 u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj);
 
+struct intel_ctx_submit_request {
+	struct intel_context *ctx;
+	struct intel_engine_cs *ring;
+	u32 tail;
+
+	struct list_head execlist_link;
+};
+
 #endif /* _INTEL_LRC_H_ */
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index c885d5c..6358823 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -223,6 +223,8 @@ struct  intel_engine_cs {
 	} semaphore;
 
 	/* Execlists */
+	spinlock_t execlist_lock;
+	struct list_head execlist_queue;
 	u32             irq_keep_mask; /* bitmask for interrupts that should not be masked */
 	int		(*emit_request)(struct intel_ringbuffer *ringbuf);
 	int		(*emit_flush)(struct intel_ringbuffer *ringbuf,
-- 
1.7.9.5


* [PATCH 31/43] drm/i915/bdw: Handle context switch events
  2014-07-24 16:04 [PATCH 00/43] Execlists v5 Thomas Daniel
                   ` (29 preceding siblings ...)
  2014-07-24 16:04 ` [PATCH 30/43] drm/i915/bdw: Two-stage execlist submit process Thomas Daniel
@ 2014-07-24 16:04 ` Thomas Daniel
  2014-08-14 20:13   ` Daniel Vetter
                     ` (3 more replies)
  2014-07-24 16:04 ` [PATCH 32/43] drm/i915/bdw: Avoid non-lite-restore preemptions Thomas Daniel
                   ` (13 subsequent siblings)
  44 siblings, 4 replies; 137+ messages in thread
From: Thomas Daniel @ 2014-07-24 16:04 UTC (permalink / raw)
  To: intel-gfx

Handle all context status events in the context status buffer on every
context switch interrupt. We only remove work from the execlist queue
after a context status buffer reports that it has completed and we only
attempt to schedule new contexts on interrupt when a previously submitted
context completes (unless no contexts are queued, which means the GPU is
free).

We cannot call intel_runtime_pm_get() in an interrupt (or with a spinlock
grabbed, FWIW), because it might sleep, which is not a nice thing to do.
Instead, do the runtime_pm get/put together with the create/destroy request,
and handle the forcewake get/put directly.

Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>

v2: Unreferencing the context when we are freeing the request might free
the backing bo, which requires the struct_mutex to be grabbed, so defer
unreferencing and freeing to a bottom half.

v3:
- Ack the interrupt immediately, before trying to handle it (fix for
missing interrupts by Bob Beckett <robert.beckett@intel.com>).
- Update the Context Status Buffer Read Pointer, just in case (spotted
by Damien Lespiau).

v4: New namespace and multiple rebase changes.

v5: Squash with "drm/i915/bdw: Do not call intel_runtime_pm_get() in an
interrupt", as suggested by Daniel.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_irq.c         |   35 ++++++---
 drivers/gpu/drm/i915/intel_lrc.c        |  129 +++++++++++++++++++++++++++++--
 drivers/gpu/drm/i915/intel_lrc.h        |    3 +
 drivers/gpu/drm/i915/intel_ringbuffer.h |    1 +
 4 files changed, 151 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index f77a4ca..e4077d1 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1628,6 +1628,7 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
 				       struct drm_i915_private *dev_priv,
 				       u32 master_ctl)
 {
+	struct intel_engine_cs *ring;
 	u32 rcs, bcs, vcs;
 	uint32_t tmp = 0;
 	irqreturn_t ret = IRQ_NONE;
@@ -1637,14 +1638,20 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
 		if (tmp) {
 			I915_WRITE(GEN8_GT_IIR(0), tmp);
 			ret = IRQ_HANDLED;
+
 			rcs = tmp >> GEN8_RCS_IRQ_SHIFT;
-			bcs = tmp >> GEN8_BCS_IRQ_SHIFT;
+			ring = &dev_priv->ring[RCS];
 			if (rcs & GT_RENDER_USER_INTERRUPT)
-				notify_ring(dev, &dev_priv->ring[RCS]);
+				notify_ring(dev, ring);
+			if (rcs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
+				intel_execlists_handle_ctx_events(ring);
+
+			bcs = tmp >> GEN8_BCS_IRQ_SHIFT;
+			ring = &dev_priv->ring[BCS];
 			if (bcs & GT_RENDER_USER_INTERRUPT)
-				notify_ring(dev, &dev_priv->ring[BCS]);
-			if ((rcs | bcs) & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
-				DRM_DEBUG_DRIVER("TODO: Context switch\n");
+				notify_ring(dev, ring);
+			if (bcs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
+				intel_execlists_handle_ctx_events(ring);
 		} else
 			DRM_ERROR("The master control interrupt lied (GT0)!\n");
 	}
@@ -1654,16 +1661,20 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
 		if (tmp) {
 			I915_WRITE(GEN8_GT_IIR(1), tmp);
 			ret = IRQ_HANDLED;
+
 			vcs = tmp >> GEN8_VCS1_IRQ_SHIFT;
+			ring = &dev_priv->ring[VCS];
 			if (vcs & GT_RENDER_USER_INTERRUPT)
-				notify_ring(dev, &dev_priv->ring[VCS]);
+				notify_ring(dev, ring);
 			if (vcs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
-				DRM_DEBUG_DRIVER("TODO: Context switch\n");
+				intel_execlists_handle_ctx_events(ring);
+
 			vcs = tmp >> GEN8_VCS2_IRQ_SHIFT;
+			ring = &dev_priv->ring[VCS2];
 			if (vcs & GT_RENDER_USER_INTERRUPT)
-				notify_ring(dev, &dev_priv->ring[VCS2]);
+				notify_ring(dev, ring);
 			if (vcs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
-				DRM_DEBUG_DRIVER("TODO: Context switch\n");
+				intel_execlists_handle_ctx_events(ring);
 		} else
 			DRM_ERROR("The master control interrupt lied (GT1)!\n");
 	}
@@ -1684,11 +1695,13 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
 		if (tmp) {
 			I915_WRITE(GEN8_GT_IIR(3), tmp);
 			ret = IRQ_HANDLED;
+
 			vcs = tmp >> GEN8_VECS_IRQ_SHIFT;
+			ring = &dev_priv->ring[VECS];
 			if (vcs & GT_RENDER_USER_INTERRUPT)
-				notify_ring(dev, &dev_priv->ring[VECS]);
+				notify_ring(dev, ring);
 			if (vcs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
-				DRM_DEBUG_DRIVER("TODO: Context switch\n");
+				intel_execlists_handle_ctx_events(ring);
 		} else
 			DRM_ERROR("The master control interrupt lied (GT3)!\n");
 	}
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 9e91169..65f4f26 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -49,6 +49,22 @@
 #define RING_ELSP(ring)			((ring)->mmio_base+0x230)
 #define RING_EXECLIST_STATUS(ring)	((ring)->mmio_base+0x234)
 #define RING_CONTEXT_CONTROL(ring)	((ring)->mmio_base+0x244)
+#define RING_CONTEXT_STATUS_BUF(ring)	((ring)->mmio_base+0x370)
+#define RING_CONTEXT_STATUS_PTR(ring)	((ring)->mmio_base+0x3a0)
+
+#define RING_EXECLIST_QFULL		(1 << 0x2)
+#define RING_EXECLIST1_VALID		(1 << 0x3)
+#define RING_EXECLIST0_VALID		(1 << 0x4)
+#define RING_EXECLIST_ACTIVE_STATUS	(3 << 0xE)
+#define RING_EXECLIST1_ACTIVE		(1 << 0x11)
+#define RING_EXECLIST0_ACTIVE		(1 << 0x12)
+
+#define GEN8_CTX_STATUS_IDLE_ACTIVE	(1 << 0)
+#define GEN8_CTX_STATUS_PREEMPTED	(1 << 1)
+#define GEN8_CTX_STATUS_ELEMENT_SWITCH	(1 << 2)
+#define GEN8_CTX_STATUS_ACTIVE_IDLE	(1 << 3)
+#define GEN8_CTX_STATUS_COMPLETE	(1 << 4)
+#define GEN8_CTX_STATUS_LITE_RESTORE	(1 << 15)
 
 #define CTX_LRI_HEADER_0		0x01
 #define CTX_CONTEXT_CONTROL		0x02
@@ -147,6 +163,7 @@ static void execlists_elsp_write(struct intel_engine_cs *ring,
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
 	uint64_t temp = 0;
 	uint32_t desc[4];
+	unsigned long flags;
 
 	/* XXX: You must always write both descriptors in the order below. */
 	if (ctx_obj1)
@@ -160,9 +177,17 @@ static void execlists_elsp_write(struct intel_engine_cs *ring,
 	desc[3] = (u32)(temp >> 32);
 	desc[2] = (u32)temp;
 
-	/* Set Force Wakeup bit to prevent GT from entering C6 while
-	 * ELSP writes are in progress */
-	gen6_gt_force_wake_get(dev_priv, FORCEWAKE_ALL);
+	/* Set Force Wakeup bit to prevent GT from entering C6 while ELSP writes
+	 * are in progress.
+	 *
+	 * The other problem is that we can't just call gen6_gt_force_wake_get()
+	 * because that function calls intel_runtime_pm_get(), which might sleep.
+	 * Instead, we do the runtime_pm_get/put when creating/destroying requests.
+	 */
+	spin_lock_irqsave(&dev_priv->uncore.lock, flags);
+	if (dev_priv->uncore.forcewake_count++ == 0)
+		dev_priv->uncore.funcs.force_wake_get(dev_priv, FORCEWAKE_ALL);
+	spin_unlock_irqrestore(&dev_priv->uncore.lock, flags);
 
 	I915_WRITE(RING_ELSP(ring), desc[1]);
 	I915_WRITE(RING_ELSP(ring), desc[0]);
@@ -173,7 +198,11 @@ static void execlists_elsp_write(struct intel_engine_cs *ring,
 	/* ELSP is a wo register, so use another nearby reg for posting instead */
 	POSTING_READ(RING_EXECLIST_STATUS(ring));
 
-	gen6_gt_force_wake_put(dev_priv, FORCEWAKE_ALL);
+	/* Release Force Wakeup (see the big comment above). */
+	spin_lock_irqsave(&dev_priv->uncore.lock, flags);
+	if (--dev_priv->uncore.forcewake_count == 0)
+		dev_priv->uncore.funcs.force_wake_put(dev_priv, FORCEWAKE_ALL);
+	spin_unlock_irqrestore(&dev_priv->uncore.lock, flags);
 }
 
 static int execlists_ctx_write_tail(struct drm_i915_gem_object *ctx_obj, u32 tail)
@@ -221,6 +250,9 @@ static void execlists_context_unqueue(struct intel_engine_cs *ring)
 {
 	struct intel_ctx_submit_request *req0 = NULL, *req1 = NULL;
 	struct intel_ctx_submit_request *cursor = NULL, *tmp = NULL;
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+
+	assert_spin_locked(&ring->execlist_lock);
 
 	if (list_empty(&ring->execlist_queue))
 		return;
@@ -233,8 +265,7 @@ static void execlists_context_unqueue(struct intel_engine_cs *ring)
 			/* Same ctx: ignore first request, as second request
 			 * will update tail past first request's workload */
 			list_del(&req0->execlist_link);
-			i915_gem_context_unreference(req0->ctx);
-			kfree(req0);
+			queue_work(dev_priv->wq, &req0->work);
 			req0 = cursor;
 		} else {
 			req1 = cursor;
@@ -246,6 +277,89 @@ static void execlists_context_unqueue(struct intel_engine_cs *ring)
 			req1? req1->ctx : NULL, req1? req1->tail : 0));
 }
 
+static bool execlists_check_remove_request(struct intel_engine_cs *ring,
+					   u32 request_id)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct intel_ctx_submit_request *head_req;
+
+	assert_spin_locked(&ring->execlist_lock);
+
+	head_req = list_first_entry_or_null(&ring->execlist_queue,
+			struct intel_ctx_submit_request, execlist_link);
+	if (head_req != NULL) {
+		struct drm_i915_gem_object *ctx_obj =
+				head_req->ctx->engine[ring->id].state;
+		if (intel_execlists_ctx_id(ctx_obj) == request_id) {
+			list_del(&head_req->execlist_link);
+			queue_work(dev_priv->wq, &head_req->work);
+			return true;
+		}
+	}
+
+	return false;
+}
+
+void intel_execlists_handle_ctx_events(struct intel_engine_cs *ring)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	u32 status_pointer;
+	u8 read_pointer;
+	u8 write_pointer;
+	u32 status;
+	u32 status_id;
+	u32 submit_contexts = 0;
+
+	status_pointer = I915_READ(RING_CONTEXT_STATUS_PTR(ring));
+
+	read_pointer = ring->next_context_status_buffer;
+	write_pointer = status_pointer & 0x07;
+	if (read_pointer > write_pointer)
+		write_pointer += 6;
+
+	spin_lock(&ring->execlist_lock);
+
+	while (read_pointer < write_pointer) {
+		read_pointer++;
+		status = I915_READ(RING_CONTEXT_STATUS_BUF(ring) +
+				(read_pointer % 6) * 8);
+		status_id = I915_READ(RING_CONTEXT_STATUS_BUF(ring) +
+				(read_pointer % 6) * 8 + 4);
+
+		if (status & GEN8_CTX_STATUS_COMPLETE) {
+			if (execlists_check_remove_request(ring, status_id))
+				submit_contexts++;
+		}
+	}
+
+	if (submit_contexts != 0)
+		execlists_context_unqueue(ring);
+
+	spin_unlock(&ring->execlist_lock);
+
+	WARN(submit_contexts > 2, "More than two context complete events?\n");
+	ring->next_context_status_buffer = write_pointer % 6;
+
+	I915_WRITE(RING_CONTEXT_STATUS_PTR(ring),
+			((u32)ring->next_context_status_buffer & 0x07) << 8);
+}
+
+static void execlists_free_request_task(struct work_struct *work)
+{
+	struct intel_ctx_submit_request *req =
+			container_of(work, struct intel_ctx_submit_request, work);
+	struct drm_device *dev = req->ring->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+
+	intel_runtime_pm_put(dev_priv);
+
+	mutex_lock(&dev->struct_mutex);
+	i915_gem_context_unreference(req->ctx);
+	mutex_unlock(&dev->struct_mutex);
+
+	kfree(req);
+}
+
 static int execlists_context_queue(struct intel_engine_cs *ring,
 				   struct intel_context *to,
 				   u32 tail)
@@ -261,6 +375,8 @@ static int execlists_context_queue(struct intel_engine_cs *ring,
 	i915_gem_context_reference(req->ctx);
 	req->ring = ring;
 	req->tail = tail;
+	INIT_WORK(&req->work, execlists_free_request_task);
+	intel_runtime_pm_get(dev_priv);
 
 	spin_lock_irqsave(&ring->execlist_lock, flags);
 
@@ -908,6 +1024,7 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
 
 	INIT_LIST_HEAD(&ring->execlist_queue);
 	spin_lock_init(&ring->execlist_lock);
+	ring->next_context_status_buffer = 0;
 
 	ret = intel_lr_context_deferred_create(dctx, ring);
 	if (ret)
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 14492a9..2e8929f 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -66,6 +66,9 @@ struct intel_ctx_submit_request {
 	u32 tail;
 
 	struct list_head execlist_link;
+	struct work_struct work;
 };
 
+void intel_execlists_handle_ctx_events(struct intel_engine_cs *ring);
+
 #endif /* _INTEL_LRC_H_ */
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 6358823..905d1ba 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -225,6 +225,7 @@ struct  intel_engine_cs {
 	/* Execlists */
 	spinlock_t execlist_lock;
 	struct list_head execlist_queue;
+	u8 next_context_status_buffer;
 	u32             irq_keep_mask; /* bitmask for interrupts that should not be masked */
 	int		(*emit_request)(struct intel_ringbuffer *ringbuf);
 	int		(*emit_flush)(struct intel_ringbuffer *ringbuf,
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 137+ messages in thread
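
As an aside for reviewers: the status-buffer walk in the handler above (the
"% 6" arithmetic on the read and write pointers) can be modeled in plain C.
This is an illustrative sketch with made-up names, not driver code:

```c
#include <assert.h>

#define CSB_ENTRIES 6	/* six context status entries per ring on GEN8 */

/* Model of the context-status-buffer walk: entries between the driver's
 * cached read pointer and the hardware write pointer are pending, and
 * the buffer wraps modulo CSB_ENTRIES (hence the "% 6" in the patch). */
static unsigned int csb_pending(unsigned int read_pointer,
				unsigned int write_pointer)
{
	/* Unwrap the hardware pointer if it has wrapped past us. */
	if (read_pointer > write_pointer)
		write_pointer += CSB_ENTRIES;
	return write_pointer - read_pointer;
}
```

With read == 5 and write == 1, for example, two entries (5 and 0) are pending.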

* [PATCH 32/43] drm/i915/bdw: Avoid non-lite-restore preemptions
  2014-07-24 16:04 [PATCH 00/43] Execlists v5 Thomas Daniel
                   ` (30 preceding siblings ...)
  2014-07-24 16:04 ` [PATCH 31/43] drm/i915/bdw: Handle context switch events Thomas Daniel
@ 2014-07-24 16:04 ` Thomas Daniel
  2014-08-14 20:31   ` Daniel Vetter
  2014-07-24 16:04 ` [PATCH 33/43] drm/i915/bdw: Help out the ctx switch interrupt handler Thomas Daniel
                   ` (12 subsequent siblings)
  44 siblings, 1 reply; 137+ messages in thread
From: Thomas Daniel @ 2014-07-24 16:04 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

In the current Execlists feeding mechanism, full preemption is not
supported yet: only lite-restores are allowed (this is: the GPU
simply samples a new tail pointer for the context currently in
execution).

But we have identified a scenario in which a full preemption occurs:
1) We submit two contexts for execution (A & B).
2) The GPU finishes with the first one (A), switches to the second one
(B) and informs us.
3) We submit B again (hoping to cause a lite restore) together with C,
but in the time we spend writing to the ELSP, the GPU finishes B.
4) The GPU starts executing B again (since we told it so).
5) We receive a B finished interrupt and, mistakenly, we submit C (again)
and D, causing a full preemption of B.

The race is avoided by keeping track of how many times a context has been
submitted to the hardware and by better discriminating the received context
switch interrupts: in the example, when we have submitted B twice, we won't
submit C and D as soon as we receive the notification that B is completed
because we were expecting to get a LITE_RESTORE and we didn't, so we know a
second completion will be received shortly.

Without this explicit checking, somehow, the batch buffer execution order
gets messed with. This can be verified with the IGT test I sent together with
the series. I don't know the exact mechanism by which the pre-emption messes
with the execution order but, since other people are working on the Scheduler
+ Preemption on Execlists, I didn't try to fix it. In this series, only Lite
Restores are supported (other kinds of preemption WARN).
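
The submission-counting idea can be sketched in a few lines of plain C
(toy struct, not the driver's): a request submitted twice to the ELSP must
see two completion events before it may be retired.

```c
#include <assert.h>

/* Toy model of the elsp_submitted counter: incremented on each ELSP
 * submission (normal or lite restore), decremented on each completion
 * event; the request leaves the queue only when it reaches zero. */
struct toy_submit_request {
	int elsp_submitted;
};

/* Returns 1 when the request may be removed from the execlist queue. */
static int toy_complete_event(struct toy_submit_request *req)
{
	return --req->elsp_submitted <= 0;
}
```

In the B-submitted-twice scenario, the first completion returns 0 and we keep
waiting instead of mistakenly submitting C and D.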

v2: elsp_submitted belongs in the new intel_ctx_submit_request. Several
rebase changes.

v3: Clarify how the race is avoided, as requested by Daniel.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c |   28 ++++++++++++++++++++++++----
 drivers/gpu/drm/i915/intel_lrc.h |    2 ++
 2 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 65f4f26..895dbfc 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -264,6 +264,7 @@ static void execlists_context_unqueue(struct intel_engine_cs *ring)
 		else if (req0->ctx == cursor->ctx) {
 			/* Same ctx: ignore first request, as second request
 			 * will update tail past first request's workload */
+			cursor->elsp_submitted = req0->elsp_submitted;
 			list_del(&req0->execlist_link);
 			queue_work(dev_priv->wq, &req0->work);
 			req0 = cursor;
@@ -273,8 +274,14 @@ static void execlists_context_unqueue(struct intel_engine_cs *ring)
 		}
 	}
 
+	WARN_ON(req1 && req1->elsp_submitted);
+
 	BUG_ON(execlists_submit_context(ring, req0->ctx, req0->tail,
 			req1? req1->ctx : NULL, req1? req1->tail : 0));
+
+	req0->elsp_submitted++;
+	if (req1)
+		req1->elsp_submitted++;
 }
 
 static bool execlists_check_remove_request(struct intel_engine_cs *ring,
@@ -291,9 +298,13 @@ static bool execlists_check_remove_request(struct intel_engine_cs *ring,
 		struct drm_i915_gem_object *ctx_obj =
 				head_req->ctx->engine[ring->id].state;
 		if (intel_execlists_ctx_id(ctx_obj) == request_id) {
-			list_del(&head_req->execlist_link);
-			queue_work(dev_priv->wq, &head_req->work);
-			return true;
+			WARN(head_req->elsp_submitted == 0,
+					"Never submitted head request\n");
+			if (--head_req->elsp_submitted <= 0) {
+				list_del(&head_req->execlist_link);
+				queue_work(dev_priv->wq, &head_req->work);
+				return true;
+			}
 		}
 	}
 
@@ -326,7 +337,16 @@ void intel_execlists_handle_ctx_events(struct intel_engine_cs *ring)
 		status_id = I915_READ(RING_CONTEXT_STATUS_BUF(ring) +
 				(read_pointer % 6) * 8 + 4);
 
-		if (status & GEN8_CTX_STATUS_COMPLETE) {
+		if (status & GEN8_CTX_STATUS_PREEMPTED) {
+			if (status & GEN8_CTX_STATUS_LITE_RESTORE) {
+				if (execlists_check_remove_request(ring, status_id))
+					WARN(1, "Lite Restored request removed from queue\n");
+			} else
+				WARN(1, "Preemption without Lite Restore\n");
+		}
+
+		if ((status & GEN8_CTX_STATUS_ACTIVE_IDLE) ||
+		    (status & GEN8_CTX_STATUS_ELEMENT_SWITCH)) {
 			if (execlists_check_remove_request(ring, status_id))
 				submit_contexts++;
 		}
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 2e8929f..074b44f 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -67,6 +67,8 @@ struct intel_ctx_submit_request {
 
 	struct list_head execlist_link;
 	struct work_struct work;
+
+	int elsp_submitted;
 };
 
 void intel_execlists_handle_ctx_events(struct intel_engine_cs *ring);
-- 
1.7.9.5

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 137+ messages in thread

* [PATCH 33/43] drm/i915/bdw: Help out the ctx switch interrupt handler
  2014-07-24 16:04 [PATCH 00/43] Execlists v5 Thomas Daniel
                   ` (31 preceding siblings ...)
  2014-07-24 16:04 ` [PATCH 32/43] drm/i915/bdw: Avoid non-lite-restore preemptions Thomas Daniel
@ 2014-07-24 16:04 ` Thomas Daniel
  2014-08-14 20:43   ` Daniel Vetter
  2014-07-24 16:04 ` [PATCH 34/43] drm/i915/bdw: Make sure gpu reset still works with Execlists Thomas Daniel
                   ` (11 subsequent siblings)
  44 siblings, 1 reply; 137+ messages in thread
From: Thomas Daniel @ 2014-07-24 16:04 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

If we receive a storm of requests for the same context (see gem_storedw_loop_*)
we might end up iterating over too many elements at interrupt time, looking for
contexts to squash together. Instead, share the burden by giving more
intelligence to the queue function. At most, the interrupt will iterate over
three elements.
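
The squashing done in the queue function can be modeled with a toy array
queue (illustrative only, types are stand-ins for the kernel list):

```c
#include <assert.h>

#define TOY_QUEUE_MAX 8

/* Toy model of the queue helper: when more than two requests are
 * already pending and the newly queued context matches the tail of the
 * queue, the old tail request is dropped (the new request's tail
 * pointer supersedes it), so the interrupt handler never has to walk
 * more than three entries looking for requests to squash. */
struct toy_queue {
	int ctx_id[TOY_QUEUE_MAX];
	int count;
};

static void toy_queue_context(struct toy_queue *q, int ctx_id)
{
	if (q->count > 2 && q->ctx_id[q->count - 1] == ctx_id)
		q->count--;	/* drop the squashable tail request */
	q->ctx_id[q->count++] = ctx_id;
}
```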

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c |   26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 895dbfc..829b15d 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -384,9 +384,10 @@ static int execlists_context_queue(struct intel_engine_cs *ring,
 				   struct intel_context *to,
 				   u32 tail)
 {
-	struct intel_ctx_submit_request *req = NULL;
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct intel_ctx_submit_request *req = NULL, *cursor;
 	unsigned long flags;
-	bool was_empty;
+	int num_elements = 0;
 
 	req = kzalloc(sizeof(*req), GFP_KERNEL);
 	if (req == NULL)
@@ -400,9 +401,26 @@ static int execlists_context_queue(struct intel_engine_cs *ring,
 
 	spin_lock_irqsave(&ring->execlist_lock, flags);
 
-	was_empty = list_empty(&ring->execlist_queue);
+	list_for_each_entry(cursor, &ring->execlist_queue, execlist_link)
+		if (++num_elements > 2)
+			break;
+
+	if (num_elements > 2) {
+		struct intel_ctx_submit_request *tail_req;
+
+		tail_req = list_last_entry(&ring->execlist_queue,
+					struct intel_ctx_submit_request,
+					execlist_link);
+		if (to == tail_req->ctx) {
+			WARN(tail_req->elsp_submitted != 0,
+					"More than 2 already-submitted reqs queued\n");
+			list_del(&tail_req->execlist_link);
+			queue_work(dev_priv->wq, &tail_req->work);
+		}
+	}
+
 	list_add_tail(&req->execlist_link, &ring->execlist_queue);
-	if (was_empty)
+	if (num_elements == 0)
 		execlists_context_unqueue(ring);
 
 	spin_unlock_irqrestore(&ring->execlist_lock, flags);
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 137+ messages in thread

* [PATCH 34/43] drm/i915/bdw: Make sure gpu reset still works with Execlists
  2014-07-24 16:04 [PATCH 00/43] Execlists v5 Thomas Daniel
                   ` (32 preceding siblings ...)
  2014-07-24 16:04 ` [PATCH 33/43] drm/i915/bdw: Help out the ctx switch interrupt handler Thomas Daniel
@ 2014-07-24 16:04 ` Thomas Daniel
  2014-08-01 14:42   ` Damien Lespiau
  2014-08-01 14:46   ` Damien Lespiau
  2014-07-24 16:04 ` [PATCH 35/43] drm/i915/bdw: Make sure error capture keeps working " Thomas Daniel
                   ` (10 subsequent siblings)
  44 siblings, 2 replies; 137+ messages in thread
From: Thomas Daniel @ 2014-07-24 16:04 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

If we reset a ring after a hang, we have to make sure that we clear
out all queued Execlists requests.

v2: The ring is, at this point, already being correctly re-programmed
for Execlists, and the hangcheck counters cleared.

v3: Daniel suggests to drop the "if (execlists)" because the Execlists
queue should be empty in legacy mode (which is true, if we do the
INIT_LIST_HEAD).

v4: Do the pending intel_runtime_pm_put
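
The cleanup loop added below can be sketched in user space (a plain singly
linked list and a counter stand in for the kernel list_head and the
runtime-pm reference count):

```c
#include <assert.h>
#include <stdlib.h>

/* Sketch of the reset-time purge: every queued request is unlinked and
 * freed, dropping the references (runtime-pm, context) it was holding. */
struct fake_request {
	struct fake_request *next;
};

static int fake_rpm_refs;	/* stands in for the runtime-pm refcount */

static void purge_queue(struct fake_request **queue)
{
	while (*queue) {
		struct fake_request *req = *queue;

		*queue = req->next;	/* list_del() equivalent */
		fake_rpm_refs--;	/* intel_runtime_pm_put() equivalent */
		free(req);
	}
}
```

After a purge the queue is empty and every reference taken at queue time has
been returned, which is the invariant the reset path needs.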

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c         |   12 ++++++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.c |    1 +
 2 files changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 1c83b9c..143cff7 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2567,6 +2567,18 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
 		i915_gem_free_request(request);
 	}
 
+	while (!list_empty(&ring->execlist_queue)) {
+		struct intel_ctx_submit_request *submit_req;
+
+		submit_req = list_first_entry(&ring->execlist_queue,
+				struct intel_ctx_submit_request,
+				execlist_link);
+		list_del(&submit_req->execlist_link);
+		intel_runtime_pm_put(dev_priv);
+		i915_gem_context_unreference(submit_req->ctx);
+		kfree(submit_req);
+	}
+
 	/* These may not have been flush before the reset, do so now */
 	kfree(ring->preallocated_lazy_request);
 	ring->preallocated_lazy_request = NULL;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 3188403..6e604c9 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1587,6 +1587,7 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 	ring->dev = dev;
 	INIT_LIST_HEAD(&ring->active_list);
 	INIT_LIST_HEAD(&ring->request_list);
+	INIT_LIST_HEAD(&ring->execlist_queue);
 	ringbuf->size = 32 * PAGE_SIZE;
 	ringbuf->ring = ring;
 	ringbuf->ctx = ring->default_context;
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 137+ messages in thread

* [PATCH 35/43] drm/i915/bdw: Make sure error capture keeps working with Execlists
  2014-07-24 16:04 [PATCH 00/43] Execlists v5 Thomas Daniel
                   ` (33 preceding siblings ...)
  2014-07-24 16:04 ` [PATCH 34/43] drm/i915/bdw: Make sure gpu reset still works with Execlists Thomas Daniel
@ 2014-07-24 16:04 ` Thomas Daniel
  2014-08-15 12:14   ` Daniel Vetter
  2014-07-24 16:04 ` [PATCH 36/43] drm/i915/bdw: Disable semaphores for Execlists Thomas Daniel
                   ` (9 subsequent siblings)
  44 siblings, 1 reply; 137+ messages in thread
From: Thomas Daniel @ 2014-07-24 16:04 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Since ringbuffers are no longer per-engine (they now belong to the
context), we have to make sure that we always record the correct one.

TODO: This is only a small fix to keep basic error capture working, but
we need to add more information for it to be useful (e.g. dump the
context being executed).
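
The selection logic the patch introduces boils down to a three-way choice;
a toy sketch (stand-in types, not the driver's):

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the ringbuffer selection for error capture: with Execlists
 * the engine's buffer pointer is no longer authoritative, so we take the
 * ringbuffer from the hung request's context and fall back to the
 * default context's when no request was found. */
struct toy_ringbuf { int head, tail; };

static struct toy_ringbuf *
pick_ringbuf(int enable_execlists,
	     struct toy_ringbuf *legacy_ring_buffer,
	     struct toy_ringbuf *request_ctx_ringbuf,
	     struct toy_ringbuf *default_ctx_ringbuf)
{
	if (!enable_execlists)
		return legacy_ring_buffer;
	return request_ctx_ringbuf ? request_ctx_ringbuf
				   : default_ctx_ringbuf;
}
```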

v2: Reorder how the ringbuffer is chosen to clarify the change and
rename the variable, both changes suggested by Chris Wilson. Also,
add the TODO comment to the code, as suggested by Daniel.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_gpu_error.c |   22 ++++++++++++++++++----
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 45b6191..1e38576 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -874,9 +874,6 @@ static void i915_record_ring_state(struct drm_device *dev,
 		ering->hws = I915_READ(mmio);
 	}
 
-	ering->cpu_ring_head = ring->buffer->head;
-	ering->cpu_ring_tail = ring->buffer->tail;
-
 	ering->hangcheck_score = ring->hangcheck.score;
 	ering->hangcheck_action = ring->hangcheck.action;
 
@@ -936,6 +933,7 @@ static void i915_gem_record_rings(struct drm_device *dev,
 
 	for (i = 0; i < I915_NUM_RINGS; i++) {
 		struct intel_engine_cs *ring = &dev_priv->ring[i];
+		struct intel_ringbuffer *rbuf;
 
 		error->ring[i].pid = -1;
 
@@ -979,8 +977,24 @@ static void i915_gem_record_rings(struct drm_device *dev,
 			}
 		}
 
+		if (i915.enable_execlists) {
+			/* TODO: This is only a small fix to keep basic error
+			 * capture working, but we need to add more information
+			 * for it to be useful (e.g. dump the context being
+			 * executed).
+			 */
+			if (request)
+				rbuf = request->ctx->engine[ring->id].ringbuf;
+			else
+				rbuf = ring->default_context->engine[ring->id].ringbuf;
+		} else
+			rbuf = ring->buffer;
+
+		error->ring[i].cpu_ring_head = rbuf->head;
+		error->ring[i].cpu_ring_tail = rbuf->tail;
+
 		error->ring[i].ringbuffer =
-			i915_error_ggtt_object_create(dev_priv, ring->buffer->obj);
+			i915_error_ggtt_object_create(dev_priv, rbuf->obj);
 
 		if (ring->status_page.obj)
 			error->ring[i].hws_page =
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 137+ messages in thread

* [PATCH 36/43] drm/i915/bdw: Disable semaphores for Execlists
  2014-07-24 16:04 [PATCH 00/43] Execlists v5 Thomas Daniel
                   ` (34 preceding siblings ...)
  2014-07-24 16:04 ` [PATCH 35/43] drm/i915/bdw: Make sure error capture keeps working " Thomas Daniel
@ 2014-07-24 16:04 ` Thomas Daniel
  2014-07-24 16:04 ` [PATCH 37/43] drm/i915/bdw: Display execlists info in debugfs Thomas Daniel
                   ` (8 subsequent siblings)
  44 siblings, 0 replies; 137+ messages in thread
From: Thomas Daniel @ 2014-07-24 16:04 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Up until recently, semaphores weren't enabled in BDW so we didn't care
about them. But then Rodrigo came and enabled them:

   commit 521e62e49a42661a4ee0102644517dbe2f100a23
   Author: Rodrigo Vivi <rodrigo.vivi@intel.com>

      drm/i915: Enable semaphores on BDW

So now we have to explicitly disable them for Execlists until both
features play nicely.
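
The resulting decision order can be modeled as a small pure function (the
parameter names are illustrative, not the module's):

```c
#include <assert.h>

/* Model of the decision order after this patch: an explicit module
 * parameter (>= 0) still wins, but with Execlists enabled semaphores
 * are forced off before any further platform checks run. */
static int semaphores_enabled(int param_semaphores, int enable_execlists,
			      int platform_default)
{
	if (param_semaphores >= 0)
		return param_semaphores;
	if (enable_execlists)
		return 0;
	return platform_default;
}
```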

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 5e4fefd..3489102 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -477,6 +477,10 @@ bool i915_semaphore_is_enabled(struct drm_device *dev)
 	if (i915.semaphores >= 0)
 		return i915.semaphores;
 
+	/* TODO: make semaphores and Execlists play nicely together */
+	if (i915.enable_execlists)
+		return false;
+
 #ifdef CONFIG_INTEL_IOMMU
 	/* Enable semaphores on SNB when IO remapping is off */
 	if (INTEL_INFO(dev)->gen == 6 && intel_iommu_gfx_mapped)
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 137+ messages in thread

* [PATCH 37/43] drm/i915/bdw: Display execlists info in debugfs
  2014-07-24 16:04 [PATCH 00/43] Execlists v5 Thomas Daniel
                   ` (35 preceding siblings ...)
  2014-07-24 16:04 ` [PATCH 36/43] drm/i915/bdw: Disable semaphores for Execlists Thomas Daniel
@ 2014-07-24 16:04 ` Thomas Daniel
  2014-08-01 14:54   ` Damien Lespiau
  2014-08-07 12:23   ` Thomas Daniel
  2014-07-24 16:04 ` [PATCH 38/43] drm/i915/bdw: Display context backing obj & ringbuffer " Thomas Daniel
                   ` (7 subsequent siblings)
  44 siblings, 2 replies; 137+ messages in thread
From: Thomas Daniel @ 2014-07-24 16:04 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

v2: Warn and return if LRCs are not enabled.

v3: Grab the Execlists spinlock (noticed by Daniel Vetter).
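
For reference, the register layout the debugfs dump walks (based on the
RING_CONTEXT_STATUS_BUF define this patch moves into intel_lrc.h) can be
expressed as two small helpers; a sketch, not driver code:

```c
#include <assert.h>

/* Model of the CSB register layout read by the debugfs dump: six 8-byte
 * entries starting at mmio_base + 0x370, each one a status dword
 * followed by a context-ID dword (the "+ 4" in the patch). */
static unsigned int csb_status_reg(unsigned int mmio_base, unsigned int entry)
{
	return mmio_base + 0x370 + entry * 8;
}

static unsigned int csb_ctx_id_reg(unsigned int mmio_base, unsigned int entry)
{
	return mmio_base + 0x370 + entry * 8 + 4;
}
```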

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c |   73 +++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.c    |    6 ---
 drivers/gpu/drm/i915/intel_lrc.h    |    7 ++++
 3 files changed, 80 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index fc39610..903ed67 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1674,6 +1674,78 @@ static int i915_context_status(struct seq_file *m, void *unused)
 	return 0;
 }
 
+static int i915_execlists(struct seq_file *m, void *data)
+{
+	struct drm_info_node *node = (struct drm_info_node *) m->private;
+	struct drm_device *dev = node->minor->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_engine_cs *ring;
+	u32 status_pointer;
+	u8 read_pointer;
+	u8 write_pointer;
+	u32 status;
+	u32 ctx_id;
+	struct list_head *cursor;
+	int ring_id, i;
+
+	if (!i915.enable_execlists) {
+		seq_printf(m, "Logical Ring Contexts are disabled\n");
+		return 0;
+	}
+
+	for_each_ring(ring, dev_priv, ring_id) {
+		struct intel_ctx_submit_request *head_req = NULL;
+		int count = 0;
+		unsigned long flags;
+
+		seq_printf(m, "%s\n", ring->name);
+
+		status = I915_READ(RING_EXECLIST_STATUS(ring));
+		ctx_id = I915_READ(RING_EXECLIST_STATUS(ring) + 4);
+		seq_printf(m, "\tExeclist status: 0x%08X, context: %u\n",
+				status, ctx_id);
+
+		status_pointer = I915_READ(RING_CONTEXT_STATUS_PTR(ring));
+		seq_printf(m, "\tStatus pointer: 0x%08X\n", status_pointer);
+
+		read_pointer = ring->next_context_status_buffer;
+		write_pointer = status_pointer & 0x07;
+		if (read_pointer > write_pointer)
+			write_pointer += 6;
+		seq_printf(m, "\tRead pointer: 0x%08X, write pointer 0x%08X\n",
+				read_pointer, write_pointer);
+
+		for (i = 0; i < 6; i++) {
+			status = I915_READ(RING_CONTEXT_STATUS_BUF(ring) + 8*i);
+			ctx_id = I915_READ(RING_CONTEXT_STATUS_BUF(ring) + 8*i + 4);
+
+			seq_printf(m, "\tStatus buffer %d: 0x%08X, context: %u\n",
+					i, status, ctx_id);
+		}
+
+		spin_lock_irqsave(&ring->execlist_lock, flags);
+		list_for_each(cursor, &ring->execlist_queue)
+			count++;
+		head_req = list_first_entry_or_null(&ring->execlist_queue,
+				struct intel_ctx_submit_request, execlist_link);
+		spin_unlock_irqrestore(&ring->execlist_lock, flags);
+
+		seq_printf(m, "\t%d requests in queue\n", count);
+		if (head_req) {
+			struct drm_i915_gem_object *ctx_obj;
+
+			ctx_obj = head_req->ctx->engine[ring_id].state;
+			seq_printf(m, "\tHead request id: %u\n",
+					intel_execlists_ctx_id(ctx_obj));
+			seq_printf(m, "\tHead request tail: %u\n", head_req->tail);
+		}
+
+		seq_putc(m, '\n');
+	}
+
+	return 0;
+}
+
 static int i915_gen6_forcewake_count_info(struct seq_file *m, void *data)
 {
 	struct drm_info_node *node = m->private;
@@ -3899,6 +3971,7 @@ static const struct drm_info_list i915_debugfs_list[] = {
 	{"i915_opregion", i915_opregion, 0},
 	{"i915_gem_framebuffer", i915_gem_framebuffer_info, 0},
 	{"i915_context_status", i915_context_status, 0},
+	{"i915_execlists", i915_execlists, 0},
 	{"i915_gen6_forcewake_count", i915_gen6_forcewake_count_info, 0},
 	{"i915_swizzle_info", i915_swizzle_info, 0},
 	{"i915_ppgtt_info", i915_ppgtt_info, 0},
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 829b15d..8056fa4 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -46,12 +46,6 @@
 
 #define GEN8_LR_CONTEXT_ALIGN 4096
 
-#define RING_ELSP(ring)			((ring)->mmio_base+0x230)
-#define RING_EXECLIST_STATUS(ring)	((ring)->mmio_base+0x234)
-#define RING_CONTEXT_CONTROL(ring)	((ring)->mmio_base+0x244)
-#define RING_CONTEXT_STATUS_BUF(ring)	((ring)->mmio_base+0x370)
-#define RING_CONTEXT_STATUS_PTR(ring)	((ring)->mmio_base+0x3a0)
-
 #define RING_EXECLIST_QFULL		(1 << 0x2)
 #define RING_EXECLIST1_VALID		(1 << 0x3)
 #define RING_EXECLIST0_VALID		(1 << 0x4)
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 074b44f..f3b921b 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -24,6 +24,13 @@
 #ifndef _INTEL_LRC_H_
 #define _INTEL_LRC_H_
 
+/* Execlists regs */
+#define RING_ELSP(ring)			((ring)->mmio_base+0x230)
+#define RING_EXECLIST_STATUS(ring)	((ring)->mmio_base+0x234)
+#define RING_CONTEXT_CONTROL(ring)	((ring)->mmio_base+0x244)
+#define RING_CONTEXT_STATUS_BUF(ring)	((ring)->mmio_base+0x370)
+#define RING_CONTEXT_STATUS_PTR(ring)	((ring)->mmio_base+0x3a0)
+
 /* Logical Rings */
 void intel_logical_ring_stop(struct intel_engine_cs *ring);
 void intel_logical_ring_cleanup(struct intel_engine_cs *ring);
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 137+ messages in thread

* [PATCH 38/43] drm/i915/bdw: Display context backing obj & ringbuffer info in debugfs
  2014-07-24 16:04 [PATCH 00/43] Execlists v5 Thomas Daniel
                   ` (36 preceding siblings ...)
  2014-07-24 16:04 ` [PATCH 37/43] drm/i915/bdw: Display execlists info in debugfs Thomas Daniel
@ 2014-07-24 16:04 ` Thomas Daniel
  2014-07-24 16:04 ` [PATCH 39/43] drm/i915/bdw: Print context state " Thomas Daniel
                   ` (6 subsequent siblings)
  44 siblings, 0 replies; 137+ messages in thread
From: Thomas Daniel @ 2014-07-24 16:04 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c |   25 +++++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 903ed67..0980cdd 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1629,6 +1629,12 @@ static int i915_gem_framebuffer_info(struct seq_file *m, void *data)
 
 	return 0;
 }
+static void describe_ctx_ringbuf(struct seq_file *m, struct intel_ringbuffer *ringbuf)
+{
+	seq_printf(m, " (ringbuffer, space: %d, head: %u, tail: %u, last head: %d)",
+			ringbuf->space, ringbuf->head, ringbuf->tail,
+			ringbuf->last_retired_head);
+}
 
 static int i915_context_status(struct seq_file *m, void *unused)
 {
@@ -1656,7 +1662,7 @@ static int i915_context_status(struct seq_file *m, void *unused)
 	}
 
 	list_for_each_entry(ctx, &dev_priv->context_list, link) {
-		if (ctx->legacy_hw_ctx.rcs_state == NULL)
+		if (!i915.enable_execlists && ctx->legacy_hw_ctx.rcs_state == NULL)
 			continue;
 
 		seq_puts(m, "HW context ");
@@ -1665,7 +1671,22 @@ static int i915_context_status(struct seq_file *m, void *unused)
 			if (ring->default_context == ctx)
 				seq_printf(m, "(default context %s) ", ring->name);
 
-		describe_obj(m, ctx->legacy_hw_ctx.rcs_state);
+		if (i915.enable_execlists) {
+			seq_putc(m, '\n');
+			for_each_ring(ring, dev_priv, i) {
+				struct drm_i915_gem_object *ctx_obj = ctx->engine[i].state;
+				struct intel_ringbuffer *ringbuf = ctx->engine[i].ringbuf;
+
+				seq_printf(m, "%s: ", ring->name);
+				if (ctx_obj)
+					describe_obj(m, ctx_obj);
+				if (ringbuf)
+					describe_ctx_ringbuf(m, ringbuf);
+				seq_putc(m, '\n');
+			}
+		} else
+			describe_obj(m, ctx->legacy_hw_ctx.rcs_state);
+
 		seq_putc(m, '\n');
 	}
 
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 137+ messages in thread

* [PATCH 39/43] drm/i915/bdw: Print context state in debugfs
  2014-07-24 16:04 [PATCH 00/43] Execlists v5 Thomas Daniel
                   ` (37 preceding siblings ...)
  2014-07-24 16:04 ` [PATCH 38/43] drm/i915/bdw: Display context backing obj & ringbuffer " Thomas Daniel
@ 2014-07-24 16:04 ` Thomas Daniel
  2014-08-01 15:54   ` Damien Lespiau
  2014-08-07 12:24   ` Thomas Daniel
  2014-07-24 16:04 ` [PATCH 40/43] drm/i915/bdw: Document Logical Rings, LR contexts and Execlists Thomas Daniel
                   ` (5 subsequent siblings)
  44 siblings, 2 replies; 137+ messages in thread
From: Thomas Daniel @ 2014-07-24 16:04 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <ben@bwidawsk.net>

This has turned out to be really handy for debugging so far.

Update:
Since writing this patch, I've gotten similar code upstream for error
state. I've used it quite a bit in debugfs however, and I'd like to keep
it here at least until preemption is working.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

This patch was accidentally dropped in the first Execlists version, and
it has been very useful indeed. Put it back again, but as a standalone
debugfs file.
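
The dump format (four dwords per row, prefixed with the GGTT address of the
register state, which lives one page into the backing object) can be
sketched in user space like this; toy code, not the debugfs handler:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Model of the i915_dump_lrc() print loop: the context register state
 * is printed four dwords per row, the left column holding the row's
 * GGTT address (object offset + 4096, since the state is in page 1). */
static int dump_reg_state(const uint32_t *reg_state, unsigned int dwords,
			  unsigned long ggtt_offset)
{
	int rows = 0;
	unsigned int j;

	for (j = 0; j + 4 <= dwords; j += 4) {
		printf("\t[0x%08lx] 0x%08x 0x%08x 0x%08x 0x%08x\n",
		       ggtt_offset + 4096 + j * 4,
		       reg_state[j], reg_state[j + 1],
		       reg_state[j + 2], reg_state[j + 3]);
		rows++;
	}
	return rows;
}
```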

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c |   52 +++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 0980cdd..968c3c0 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1695,6 +1695,57 @@ static int i915_context_status(struct seq_file *m, void *unused)
 	return 0;
 }
 
+static int i915_dump_lrc(struct seq_file *m, void *unused)
+{
+	struct drm_info_node *node = (struct drm_info_node *) m->private;
+	struct drm_device *dev = node->minor->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_engine_cs *ring;
+	struct intel_context *ctx;
+	int ret, i;
+
+	if (!i915.enable_execlists) {
+		seq_printf(m, "Logical Ring Contexts are disabled\n");
+		return 0;
+	}
+
+	ret = mutex_lock_interruptible(&dev->mode_config.mutex);
+	if (ret)
+		return ret;
+
+	list_for_each_entry(ctx, &dev_priv->context_list, link) {
+		for_each_ring(ring, dev_priv, i) {
+			struct drm_i915_gem_object *ctx_obj = ctx->engine[i].state;
+
+			if (ring->default_context == ctx)
+				continue;
+
+			if (ctx_obj) {
+				struct page *page = i915_gem_object_get_page(ctx_obj, 1);
+				uint32_t *reg_state = kmap_atomic(page);
+				int j;
+
+				seq_printf(m, "CONTEXT: %s %u\n", ring->name,
+						intel_execlists_ctx_id(ctx_obj));
+
+				for (j = 0; j < 0x600 / sizeof(u32) / 4; j += 4) {
+					seq_printf(m, "\t[0x%08lx] 0x%08x 0x%08x 0x%08x 0x%08x\n",
+					i915_gem_obj_ggtt_offset(ctx_obj) + 4096 + (j * 4),
+					reg_state[j], reg_state[j + 1],
+					reg_state[j + 2], reg_state[j + 3]);
+				}
+				kunmap_atomic(reg_state);
+
+				seq_putc(m, '\n');
+			}
+		}
+	}
+
+	mutex_unlock(&dev->mode_config.mutex);
+
+	return 0;
+}
+
 static int i915_execlists(struct seq_file *m, void *data)
 {
 	struct drm_info_node *node = (struct drm_info_node *) m->private;
@@ -3992,6 +4043,7 @@ static const struct drm_info_list i915_debugfs_list[] = {
 	{"i915_opregion", i915_opregion, 0},
 	{"i915_gem_framebuffer", i915_gem_framebuffer_info, 0},
 	{"i915_context_status", i915_context_status, 0},
+	{"i915_dump_lrc", i915_dump_lrc, 0},
 	{"i915_execlists", i915_execlists, 0},
 	{"i915_gen6_forcewake_count", i915_gen6_forcewake_count_info, 0},
 	{"i915_swizzle_info", i915_swizzle_info, 0},
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 137+ messages in thread

* [PATCH 40/43] drm/i915/bdw: Document Logical Rings, LR contexts and Execlists
  2014-07-24 16:04 [PATCH 00/43] Execlists v5 Thomas Daniel
                   ` (38 preceding siblings ...)
  2014-07-24 16:04 ` [PATCH 39/43] drm/i915/bdw: Print context state " Thomas Daniel
@ 2014-07-24 16:04 ` Thomas Daniel
  2014-08-15 12:42   ` Daniel Vetter
  2014-07-24 16:04 ` [PATCH 41/43] drm/i915/bdw: Enable Logical Ring Contexts (hence, Execlists) Thomas Daniel
                   ` (4 subsequent siblings)
  44 siblings, 1 reply; 137+ messages in thread
From: Thomas Daniel @ 2014-07-24 16:04 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Add theory of operation notes to intel_lrc.c and comments to externally
visible functions.

v2: Add notes on logical ring context creation.

v3: Use kerneldoc.

v4: Integrate it in the DocBook template.

Signed-off-by: Thomas Daniel <thomas.daniel@intel.com> (v1)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2, v3)
---
 Documentation/DocBook/drm.tmpl   |    5 +
 drivers/gpu/drm/i915/intel_lrc.c |  215 +++++++++++++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/intel_lrc.h |   30 ++++++
 3 files changed, 249 insertions(+), 1 deletion(-)

diff --git a/Documentation/DocBook/drm.tmpl b/Documentation/DocBook/drm.tmpl
index 97838551..91a5620 100644
--- a/Documentation/DocBook/drm.tmpl
+++ b/Documentation/DocBook/drm.tmpl
@@ -3909,6 +3909,11 @@ int num_ioctls;</synopsis>
 !Pdrivers/gpu/drm/i915/i915_cmd_parser.c batch buffer command parser
 !Idrivers/gpu/drm/i915/i915_cmd_parser.c
       </sect2>
+      <sect2>
+        <title>Logical Rings, Logical Ring Contexts and Execlists</title>
+!Pdrivers/gpu/drm/i915/intel_lrc.c Logical Rings, Logical Ring Contexts and Execlists
+!Idrivers/gpu/drm/i915/intel_lrc.c
+      </sect2>
     </sect1>
   </chapter>
 </part>
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 8056fa4..5faa084 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -28,13 +28,108 @@
  *
  */
 
-/*
+/**
+ * DOC: Logical Rings, Logical Ring Contexts and Execlists
+ *
+ * Motivation:
  * GEN8 brings an expansion of the HW contexts: "Logical Ring Contexts".
  * These expanded contexts enable a number of new abilities, especially
  * "Execlists" (also implemented in this file).
  *
+ * One of the main differences with the legacy HW contexts is that logical
+ * ring contexts incorporate many more things to the context's state, like
+ * PDPs or ringbuffer control registers:
+ *
+ * The reason why PDPs are included in the context is straightforward: as
+ * PPGTTs (per-process GTTs) are actually per-context, having the PDPs
+ * contained there means you don't need to do a ppgtt->switch_mm yourself,
+ * instead, the GPU will do it for you on the context switch.
+ *
+ * But, what about the ringbuffer control registers (head, tail, etc..)?
+ * shouldn't we just need a set of those per engine command streamer? This is
+ * where the name "Logical Rings" starts to make sense: by virtualizing the
+ * rings, the engine cs shifts to a new "ring buffer" with every context
+ * switch. When you want to submit a workload to the GPU you: A) choose your
+ * context, B) find its appropriate virtualized ring, C) write commands to it
+ * and then, finally, D) tell the GPU to switch to that context.
+ *
+ * Instead of the legacy MI_SET_CONTEXT, the way you tell the GPU to switch
+ * to a context is via a context execution list, ergo "Execlists".
+ *
+ * LRC implementation:
+ * Regarding the creation of contexts, we have:
+ *
+ * - One global default context.
+ * - One local default context for each opened fd.
+ * - One local extra context for each context create ioctl call.
+ *
+ * Now that ringbuffers are per-context (and not per-engine, like before)
+ * and that contexts are uniquely tied to a given engine (and not reusable,
+ * like before) we need:
+ *
+ * - One ringbuffer per-engine inside each context.
+ * - One backing object per-engine inside each context.
+ *
+ * The global default context starts its life with these new objects fully
+ * allocated and populated. The local default context for each opened fd is
+ * more complex, because we don't know at creation time which engines are
+ * going to use it. To handle this, we have implemented a deferred creation of LR
+ * contexts:
+ *
+ * The local context starts its life as a hollow or blank holder that only
+ * gets populated for a given engine once we receive an execbuffer. If later
+ * on we receive another execbuffer ioctl for the same context but a different
+ * engine, we allocate/populate a new ringbuffer and context backing object and
+ * so on.
+ *
+ * Finally, regarding local contexts created using the ioctl call: as they are
+ * only allowed with the render ring, we can allocate & populate them right
+ * away (no need to defer anything, at least for now).
+ *
+ * Execlists implementation:
  * Execlists are the new method by which, on gen8+ hardware, workloads are
  * submitted for execution (as opposed to the legacy, ringbuffer-based, method).
+ * This method works as follows:
+ *
+ * When a request is committed, its commands (the BB start and any leading or
+ * trailing commands, like the seqno breadcrumbs) are placed in the ringbuffer
+ * for the appropriate context. The tail pointer in the hardware context is not
+ * updated at this time, but instead, kept by the driver in the ringbuffer
+ * structure. A structure representing this request is added to a request queue
+ * for the appropriate engine: this structure contains a copy of the context's
+ * tail after the request was written to the ring buffer and a pointer to the
+ * context itself.
+ *
+ * If the engine's request queue was empty before the request was added, the
+ * queue is processed immediately. Otherwise the queue will be processed during
+ * a context switch interrupt. In any case, elements on the queue will get sent
+ * (in pairs) to the GPU's ExecLists Submit Port (ELSP, for short) with a
+ * globally unique 20-bit submission ID.
+ *
+ * When execution of a request completes, the GPU updates the context status
+ * buffer with a context complete event and generates a context switch interrupt.
+ * During the interrupt handling, the driver examines the events in the buffer:
+ * for each context complete event, if the announced ID matches that on the head
+ * of the request queue, then that request is retired and removed from the queue.
+ *
+ * After processing, if any requests were retired and the queue is not empty
+ * then a new execution list can be submitted. The two requests at the front of
+ * the queue are next to be submitted but since a context may not occur twice in
+ * an execution list, if subsequent requests have the same ID as the first then
+ * the two requests must be combined. This is done simply by discarding requests
+ * at the head of the queue until either only one request is left (in which case
+ * we use a NULL second context) or the first two requests have unique IDs.
+ *
+ * By always executing the first two requests in the queue the driver ensures
+ * that the GPU is kept as busy as possible. In the case where a single context
+ * completes but a second context is still executing, the request for this second
+ * context will be at the head of the queue when we remove the first one. This
+ * request will then be resubmitted along with a new request for a different context,
+ * which will cause the hardware to continue executing the second request and queue
+ * the new request (the GPU detects the condition of a context getting preempted
+ * with the same context and optimizes the context switch flow by not doing
+ * preemption, but just sampling the new tail pointer).
+ *
  */
 
 #include <drm/drmP.h>
@@ -109,6 +204,17 @@ enum {
 };
 #define GEN8_CTX_ID_SHIFT 32
 
+/**
+ * intel_sanitize_enable_execlists() - sanitize i915.enable_execlists
+ * @dev: DRM device.
+ * @enable_execlists: value of i915.enable_execlists module parameter.
+ *
+ * Only certain platforms support Execlists (the prerequisites being
+ * support for Logical Ring Contexts and Aliasing PPGTT or better),
+ * and only when enabled via module parameter.
+ *
+ * Return: 1 if Execlists is supported and should be enabled.
+ */
 int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists)
 {
 	if (enable_execlists == 0)
@@ -121,6 +227,18 @@ int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists
 	return 0;
 }
 
+/**
+ * intel_execlists_ctx_id() - get the Execlists Context ID
+ * @ctx_obj: Logical Ring Context backing object.
+ *
+ * Do not confuse with ctx->id! Unfortunately we have a name overload
+ * here: the old context ID we pass to userspace as a handle so that
+ * it can refer to a context, and the new context ID we pass to the
+ * ELSP so that the GPU can inform us of the context status via
+ * interrupts.
+ *
+ * Return: 20-bit globally unique context ID.
+ */
 u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj)
 {
 	u32 lrca = i915_gem_obj_ggtt_offset(ctx_obj);
@@ -305,6 +423,13 @@ static bool execlists_check_remove_request(struct intel_engine_cs *ring,
 	return false;
 }
 
+/**
+ * intel_execlists_handle_ctx_events() - handle Context Switch interrupts
+ * @ring: Engine Command Streamer to handle.
+ *
+ * Check the unread Context Status Buffers and manage the submission of new
+ * contexts to the ELSP accordingly.
+ */
 void intel_execlists_handle_ctx_events(struct intel_engine_cs *ring)
 {
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
@@ -473,6 +598,23 @@ static int execlists_move_to_gpu(struct intel_ringbuffer *ringbuf,
 	return logical_ring_invalidate_all_caches(ringbuf);
 }
 
+/**
+ * execlists_submission() - submit a batchbuffer for execution, Execlists style
+ * @dev: DRM device.
+ * @file: DRM file.
+ * @ring: Engine Command Streamer to submit to.
+ * @ctx: Context to employ for this submission.
+ * @args: execbuffer call arguments.
+ * @vmas: list of vmas.
+ * @batch_obj: the batchbuffer to submit.
+ * @exec_start: batchbuffer start virtual address pointer.
+ * @flags: translated execbuffer call flags.
+ *
+ * This is the evil twin version of i915_gem_ringbuffer_submission. It abstracts
+ * away the submission details of the execbuffer ioctl call.
+ *
+ * Return: non-zero if the submission fails.
+ */
 int intel_execlists_submission(struct drm_device *dev, struct drm_file *file,
 			       struct intel_engine_cs *ring,
 			       struct intel_context *ctx,
@@ -600,6 +742,15 @@ int logical_ring_flush_all_caches(struct intel_ringbuffer *ringbuf)
 	return 0;
 }
 
+/**
+ * intel_logical_ring_advance_and_submit() - advance the tail and submit the workload
+ * @ringbuf: Logical Ringbuffer to advance.
+ *
+ * The tail is updated in our logical ringbuffer struct, not in the actual context. What
+ * really happens during submission is that the context and current tail will be placed
+ * on a queue waiting for the ELSP to be ready to accept a new context submission. At that
+ * point, the tail *inside* the context is updated and the ELSP written to.
+ */
 void intel_logical_ring_advance_and_submit(struct intel_ringbuffer *ringbuf)
 {
 	struct intel_engine_cs *ring = ringbuf->ring;
@@ -777,6 +928,19 @@ static int logical_ring_prepare(struct intel_ringbuffer *ringbuf, int bytes)
 	return 0;
 }
 
+/**
+ * intel_logical_ring_begin() - prepare the logical ringbuffer to accept some commands
+ *
+ * @ringbuf: Logical ringbuffer.
+ * @num_dwords: number of DWORDs that we plan to write to the ringbuffer.
+ *
+ * The ringbuffer might not be ready to accept the commands right away (maybe it needs to
+ * be wrapped, or wait a bit for the tail to be updated). This function takes care of that
+ * and also preallocates a request (every workload submission is still mediated through
+ * requests, just as with legacy ringbuffer submission).
+ *
+ * Return: non-zero if the ringbuffer is not ready to be written to.
+ */
 int intel_logical_ring_begin(struct intel_ringbuffer *ringbuf, int num_dwords)
 {
 	struct intel_engine_cs *ring = ringbuf->ring;
@@ -1017,6 +1181,12 @@ static int gen8_emit_request(struct intel_ringbuffer *ringbuf)
 	return 0;
 }
 
+/**
+ * intel_logical_ring_cleanup() - deallocate the Engine Command Streamer
+ *
+ * @ring: Engine Command Streamer.
+ *
+ */
 void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
 {
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
@@ -1211,6 +1381,16 @@ static int logical_vebox_ring_init(struct drm_device *dev)
 	return logical_ring_init(dev, ring);
 }
 
+/**
+ * intel_logical_rings_init() - allocate, populate and init the Engine Command Streamers
+ * @dev: DRM device.
+ *
+ * This function inits the engines for an Execlists submission style (the equivalent in the
+ * legacy ringbuffer submission world would be i915_gem_init_rings). It does it only for
+ * those engines that are present in the hardware.
+ *
+ * Return: non-zero if the initialization failed.
+ */
 int intel_logical_rings_init(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -1264,6 +1444,18 @@ cleanup_render_ring:
 	return ret;
 }
 
+/**
+ * intel_lr_context_render_state_init() - render state init for Execlists
+ * @ring: Engine Command Streamer.
+ * @ctx: Context to initialize.
+ *
+ * A.K.A. null-context, A.K.A. golden-context. In a word, the render engine
+ * contexts are required to always contain a valid 3D pipeline state. As this
+ * is achieved with the submission of a batchbuffer, we need an alternative
+ * entry point to the legacy ringbuffer submission one (i915_gem_render_state_init).
+ *
+ * Return: non-zero if the initialization failed.
+ */
 int intel_lr_context_render_state_init(struct intel_engine_cs *ring,
 				       struct intel_context *ctx)
 {
@@ -1404,6 +1596,14 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
 	return 0;
 }
 
+/**
+ * intel_lr_context_free() - free the LRC specific bits of a context
+ * @ctx: the LR context to free.
+ *
+ * The real context freeing is done in i915_gem_context_free: this only
+ * takes care of the bits that are LRC related: the per-engine backing
+ * objects and the logical ringbuffer.
+ */
 void intel_lr_context_free(struct intel_context *ctx)
 {
 	int i;
@@ -1442,6 +1642,19 @@ static uint32_t get_lr_context_size(struct intel_engine_cs *ring)
 	return ret;
 }
 
+/**
+ * intel_lr_context_deferred_create() - create the LRC specific bits of a context
+ * @ctx: LR context to create.
+ * @ring: engine to be used with the context.
+ *
+ * This function can be called more than once, with different engines, if we plan
+ * to use the context with them. The context backing objects and the ringbuffers
+ * (especially the ringbuffer backing objects) suck a lot of memory up, and that's why
+ * the creation is a deferred call: it's better to make sure first that we need to use
+ * a given ring with the context.
+ *
+ * Return: non-zero on error.
+ */
 int intel_lr_context_deferred_create(struct intel_context *ctx,
 				     struct intel_engine_cs *ring)
 {
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index f3b921b..ac32890 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -38,10 +38,21 @@ int intel_logical_rings_init(struct drm_device *dev);
 
 int logical_ring_flush_all_caches(struct intel_ringbuffer *ringbuf);
 void intel_logical_ring_advance_and_submit(struct intel_ringbuffer *ringbuf);
+/**
+ * intel_logical_ring_advance() - advance the ringbuffer tail
+ * @ringbuf: Ringbuffer to advance.
+ *
+ * The tail is only updated in our logical ringbuffer struct.
+ */
 static inline void intel_logical_ring_advance(struct intel_ringbuffer *ringbuf)
 {
 	ringbuf->tail &= ringbuf->size - 1;
 }
+/**
+ * intel_logical_ring_emit() - write a DWORD to the ringbuffer.
+ * @ringbuf: Ringbuffer to write to.
+ * @data: DWORD to write.
+ */
 static inline void intel_logical_ring_emit(struct intel_ringbuffer *ringbuf, u32 data)
 {
 	iowrite32(data, ringbuf->virtual_start + ringbuf->tail);
@@ -67,6 +78,25 @@ int intel_execlists_submission(struct drm_device *dev, struct drm_file *file,
 			       u64 exec_start, u32 flags);
 u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj);
 
+/**
+ * struct intel_ctx_submit_request - queued context submission request
+ * @ctx: Context to submit to the ELSP.
+ * @ring: Engine to submit it to.
+ * @tail: how far in the context's ringbuffer this request goes to.
+ * @execlist_link: link in the submission queue.
+ * @work: workqueue for processing this request in a bottom half.
+ * @elsp_submitted: no. of times this request has been sent to the ELSP.
+ *
+ * The ELSP only accepts two elements at a time, so we queue context/tail
+ * pairs on a given queue (ring->execlist_queue) until the hardware is
+ * available. The queue serves a double purpose: we also use it to keep track
+ * of the up to 2 contexts currently in the hardware (usually one in execution
+ * and the other queued up by the GPU): We only remove elements from the head
+ * of the queue when the hardware informs us that an element has been
+ * completed.
+ *
+ * All accesses to the queue are mediated by a spinlock (ring->execlist_lock).
+ */
 struct intel_ctx_submit_request {
 	struct intel_context *ctx;
 	struct intel_engine_cs *ring;
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 137+ messages in thread
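[Editor's illustration] The queue-coalescing rule described in the DOC comment above (a context may not occur twice in an execution list, so duplicate head entries are discarded and only the newest tail survives) can be sketched in isolation. This is not the driver's code: `struct req`, `pick_execlist` and the flat-array queue are hypothetical stand-ins for `intel_ctx_submit_request` and `ring->execlist_queue`, used only to show the selection of the (up to) two ELSP elements.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative model of a queued submission: a context ID plus the
 * ringbuffer tail recorded when the request was committed. */
struct req {
	unsigned int ctx_id;
	unsigned int tail;
};

/*
 * Build the next execution list from the head of the queue. The leading run
 * of requests for the same context is coalesced into one element carrying
 * the newest tail (older head entries are simply discarded). Returns the
 * number of queue entries consumed; *have_second is 0 when the list must be
 * submitted with a NULL second context.
 */
static size_t pick_execlist(const struct req *q, size_t n,
			    struct req out[2], int *have_second)
{
	size_t i;

	*have_second = 0;
	if (n == 0)
		return 0;

	out[0] = q[0];
	for (i = 1; i < n && q[i].ctx_id == out[0].ctx_id; i++)
		out[0].tail = q[i].tail;	/* combine: keep newest tail */

	if (i < n) {
		out[1] = q[i];
		*have_second = 1;
		i++;
	}
	return i;
}
```

With a queue of three requests where the first two share a context, the sketch consumes all three entries and submits two elements, the first carrying the second request's tail.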

* [PATCH 41/43] drm/i915/bdw: Enable Logical Ring Contexts (hence, Execlists)
  2014-07-24 16:04 [PATCH 00/43] Execlists v5 Thomas Daniel
                   ` (39 preceding siblings ...)
  2014-07-24 16:04 ` [PATCH 40/43] drm/i915/bdw: Document Logical Rings, LR contexts and Execlists Thomas Daniel
@ 2014-07-24 16:04 ` Thomas Daniel
  2014-08-18  8:33   ` Jani Nikula
  2014-07-24 16:04 ` [PATCH 42/43] drm/i915/bdw: Pin the context backing objects to GGTT on-demand Thomas Daniel
                   ` (3 subsequent siblings)
  44 siblings, 1 reply; 137+ messages in thread
From: Thomas Daniel @ 2014-07-24 16:04 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

The time has come, the Walrus said, to talk of many things.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index b7cf0ec..1ce51d6 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2061,7 +2061,7 @@ struct drm_i915_cmd_table {
 #define I915_NEED_GFX_HWS(dev)	(INTEL_INFO(dev)->need_gfx_hws)
 
 #define HAS_HW_CONTEXTS(dev)	(INTEL_INFO(dev)->gen >= 6)
-#define HAS_LOGICAL_RING_CONTEXTS(dev)	0
+#define HAS_LOGICAL_RING_CONTEXTS(dev)	(INTEL_INFO(dev)->gen >= 8)
 #define HAS_ALIASING_PPGTT(dev)	(INTEL_INFO(dev)->gen >= 6)
 #define HAS_PPGTT(dev)		(INTEL_INFO(dev)->gen >= 7 && !IS_GEN8(dev))
 #define USES_PPGTT(dev)		intel_enable_ppgtt(dev, false)
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 137+ messages in thread
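[Editor's illustration] The gating this patch enables is the one documented for intel_sanitize_enable_execlists(): Execlists requires Logical Ring Contexts (gen8+, as the HAS_LOGICAL_RING_CONTEXTS change above now reports) plus Aliasing PPGTT or better, and must also be requested via the module parameter. A minimal standalone sketch of that decision, with `struct dev_caps` as a hypothetical stand-in for what the driver derives from INTEL_INFO(dev):

```c
#include <stdbool.h>

/* Hypothetical per-device capability flags (stand-ins, not i915 structures). */
struct dev_caps {
	bool has_logical_ring_contexts;	/* gen >= 8 after this patch */
	bool has_ppgtt;			/* Aliasing PPGTT or better */
};

/*
 * Sketch of the sanitize logic: 0 forces Execlists off; any other value
 * enables it only when the hardware prerequisites are met.
 */
static int sanitize_enable_execlists(const struct dev_caps *caps,
				     int enable_execlists)
{
	if (enable_execlists == 0)
		return 0;
	if (caps->has_logical_ring_contexts && caps->has_ppgtt)
		return 1;
	return 0;
}
```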

* [PATCH 42/43] drm/i915/bdw: Pin the context backing objects to GGTT on-demand
  2014-07-24 16:04 [PATCH 00/43] Execlists v5 Thomas Daniel
                   ` (40 preceding siblings ...)
  2014-07-24 16:04 ` [PATCH 41/43] drm/i915/bdw: Enable Logical Ring Contexts (hence, Execlists) Thomas Daniel
@ 2014-07-24 16:04 ` Thomas Daniel
  2014-08-15 13:03   ` Daniel Vetter
  2014-07-24 16:04 ` [PATCH 43/43] drm/i915/bdw: Pin the ringbuffer backing object " Thomas Daniel
                   ` (2 subsequent siblings)
  44 siblings, 1 reply; 137+ messages in thread
From: Thomas Daniel @ 2014-07-24 16:04 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Up until now, we have pinned every logical ring context backing object
during creation, and left it pinned until destruction. This made my life
easier, but it's a harmful thing to do, because we cause fragmentation
of the GGTT (and, eventually, we would run out of space).

This patch makes the pinning on-demand: the backing objects of the two
contexts that are written to the ELSP are pinned right before submission
and unpinned once the hardware is done with them. The only context that
is still pinned regardless is the global default one, so that the HWS can
still be accessed in the same way (ring->status_page).

v2: In the early version of this patch, we were pinning the context as
we put it into the ELSP: on the one hand, this is very efficient because
only a maximum two contexts are pinned at any given time, but on the other
hand, we cannot really pin in interrupt time :(

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c |   11 +++++++--
 drivers/gpu/drm/i915/i915_drv.h     |    1 +
 drivers/gpu/drm/i915/i915_gem.c     |   44 ++++++++++++++++++++++++-----------
 drivers/gpu/drm/i915/intel_lrc.c    |   42 ++++++++++++++++++++++++---------
 drivers/gpu/drm/i915/intel_lrc.h    |    2 ++
 5 files changed, 73 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 968c3c0..84531cc 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1721,10 +1721,15 @@ static int i915_dump_lrc(struct seq_file *m, void *unused)
 				continue;
 
 			if (ctx_obj) {
-				struct page *page = i915_gem_object_get_page(ctx_obj, 1);
-				uint32_t *reg_state = kmap_atomic(page);
+				struct page *page;
+				uint32_t *reg_state;
 				int j;
 
+				i915_gem_obj_ggtt_pin(ctx_obj, GEN8_LR_CONTEXT_ALIGN, 0);
+
+				page = i915_gem_object_get_page(ctx_obj, 1);
+				reg_state = kmap_atomic(page);
+
 				seq_printf(m, "CONTEXT: %s %u\n", ring->name,
 						intel_execlists_ctx_id(ctx_obj));
 
@@ -1736,6 +1741,8 @@ static int i915_dump_lrc(struct seq_file *m, void *unused)
 				}
 				kunmap_atomic(reg_state);
 
+				i915_gem_object_ggtt_unpin(ctx_obj);
+
 				seq_putc(m, '\n');
 			}
 		}
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 1ce51d6..70466af 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -628,6 +628,7 @@ struct intel_context {
 	struct {
 		struct drm_i915_gem_object *state;
 		struct intel_ringbuffer *ringbuf;
+		atomic_t unpin_count;
 	} engine[I915_NUM_RINGS];
 
 	struct list_head link;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 143cff7..42faaa3 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2491,12 +2491,23 @@ static void i915_set_reset_status(struct drm_i915_private *dev_priv,
 
 static void i915_gem_free_request(struct drm_i915_gem_request *request)
 {
+	struct intel_context *ctx = request->ctx;
+
 	list_del(&request->list);
 	i915_gem_request_remove_from_client(request);
 
-	if (request->ctx)
-		i915_gem_context_unreference(request->ctx);
+	if (ctx) {
+		struct intel_engine_cs *ring = request->ring;
+		struct drm_i915_gem_object *ctx_obj = ctx->engine[ring->id].state;
+		atomic_t *unpin_count = &ctx->engine[ring->id].unpin_count;
 
+		if (ctx_obj) {
+			if (atomic_dec_return(unpin_count) == 0 &&
+					ctx != ring->default_context)
+				i915_gem_object_ggtt_unpin(ctx_obj);
+		}
+		i915_gem_context_unreference(ctx);
+	}
 	kfree(request);
 }
 
@@ -2551,6 +2562,23 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
 	}
 
 	/*
+	 * Clear the execlists queue up before freeing the requests, as those
+	 * are the ones that keep the context and ringbuffer backing objects
+	 * pinned in place.
+	 */
+	while (!list_empty(&ring->execlist_queue)) {
+		struct intel_ctx_submit_request *submit_req;
+
+		submit_req = list_first_entry(&ring->execlist_queue,
+				struct intel_ctx_submit_request,
+				execlist_link);
+		list_del(&submit_req->execlist_link);
+		intel_runtime_pm_put(dev_priv);
+		i915_gem_context_unreference(submit_req->ctx);
+		kfree(submit_req);
+	}
+
+	/*
 	 * We must free the requests after all the corresponding objects have
 	 * been moved off active lists. Which is the same order as the normal
 	 * retire_requests function does. This is important if object hold
@@ -2567,18 +2595,6 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
 		i915_gem_free_request(request);
 	}
 
-	while (!list_empty(&ring->execlist_queue)) {
-		struct intel_ctx_submit_request *submit_req;
-
-		submit_req = list_first_entry(&ring->execlist_queue,
-				struct intel_ctx_submit_request,
-				execlist_link);
-		list_del(&submit_req->execlist_link);
-		intel_runtime_pm_put(dev_priv);
-		i915_gem_context_unreference(submit_req->ctx);
-		kfree(submit_req);
-	}
-
 	/* These may not have been flush before the reset, do so now */
 	kfree(ring->preallocated_lazy_request);
 	ring->preallocated_lazy_request = NULL;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 5faa084..9fa8e35 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -139,8 +139,6 @@
 #define GEN8_LR_CONTEXT_RENDER_SIZE (20 * PAGE_SIZE)
 #define GEN8_LR_CONTEXT_OTHER_SIZE (2 * PAGE_SIZE)
 
-#define GEN8_LR_CONTEXT_ALIGN 4096
-
 #define RING_EXECLIST_QFULL		(1 << 0x2)
 #define RING_EXECLIST1_VALID		(1 << 0x3)
 #define RING_EXECLIST0_VALID		(1 << 0x4)
@@ -767,16 +765,30 @@ void intel_logical_ring_advance_and_submit(struct intel_ringbuffer *ringbuf)
 static int logical_ring_alloc_seqno(struct intel_engine_cs *ring,
 				    struct intel_context *ctx)
 {
+	int ret;
+
 	if (ring->outstanding_lazy_seqno)
 		return 0;
 
 	if (ring->preallocated_lazy_request == NULL) {
 		struct drm_i915_gem_request *request;
+		struct drm_i915_gem_object *ctx_obj = ctx->engine[ring->id].state;
+		atomic_t *unpin_count = &ctx->engine[ring->id].unpin_count;
 
 		request = kmalloc(sizeof(*request), GFP_KERNEL);
 		if (request == NULL)
 			return -ENOMEM;
 
+		if (atomic_inc_return(unpin_count) == 1 &&
+				ctx != ring->default_context) {
+			ret = i915_gem_obj_ggtt_pin(ctx_obj,
+					GEN8_LR_CONTEXT_ALIGN, 0);
+			if (ret) {
+				kfree(request);
+				return ret;
+			}
+		}
+
 		/* Hold a reference to the context this request belongs to
 		 * (we will need it when the time comes to emit/retire the
 		 * request).
@@ -1610,12 +1622,15 @@ void intel_lr_context_free(struct intel_context *ctx)
 
 	for (i = 0; i < I915_NUM_RINGS; i++) {
 		struct drm_i915_gem_object *ctx_obj = ctx->engine[i].state;
-		struct intel_ringbuffer *ringbuf = ctx->engine[i].ringbuf;
 
 		if (ctx_obj) {
+			struct intel_ringbuffer *ringbuf = ctx->engine[i].ringbuf;
+			struct intel_engine_cs *ring = ringbuf->ring;
+
 			intel_destroy_ringbuffer_obj(ringbuf);
 			kfree(ringbuf);
-			i915_gem_object_ggtt_unpin(ctx_obj);
+			if (ctx == ring->default_context)
+				i915_gem_object_ggtt_unpin(ctx_obj);
 			drm_gem_object_unreference(&ctx_obj->base);
 		}
 	}
@@ -1658,6 +1673,7 @@ static uint32_t get_lr_context_size(struct intel_engine_cs *ring)
 int intel_lr_context_deferred_create(struct intel_context *ctx,
 				     struct intel_engine_cs *ring)
 {
+	const bool is_global_default_ctx = (ctx == ring->default_context);
 	struct drm_device *dev = ring->dev;
 	struct drm_i915_gem_object *ctx_obj;
 	uint32_t context_size;
@@ -1677,18 +1693,21 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 		return ret;
 	}
 
-	ret = i915_gem_obj_ggtt_pin(ctx_obj, GEN8_LR_CONTEXT_ALIGN, 0);
-	if (ret) {
-		DRM_DEBUG_DRIVER("Pin LRC backing obj failed: %d\n", ret);
-		drm_gem_object_unreference(&ctx_obj->base);
-		return ret;
+	if (is_global_default_ctx) {
+		ret = i915_gem_obj_ggtt_pin(ctx_obj, GEN8_LR_CONTEXT_ALIGN, 0);
+		if (ret) {
+			DRM_DEBUG_DRIVER("Pin LRC backing obj failed: %d\n", ret);
+			drm_gem_object_unreference(&ctx_obj->base);
+			return ret;
+		}
 	}
 
 	ringbuf = kzalloc(sizeof(*ringbuf), GFP_KERNEL);
 	if (!ringbuf) {
 		DRM_DEBUG_DRIVER("Failed to allocate ringbuffer %s\n",
 				ring->name);
-		i915_gem_object_ggtt_unpin(ctx_obj);
+		if (is_global_default_ctx)
+			i915_gem_object_ggtt_unpin(ctx_obj);
 		drm_gem_object_unreference(&ctx_obj->base);
 		ret = -ENOMEM;
 		return ret;
@@ -1744,7 +1763,8 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 
 error:
 	kfree(ringbuf);
-	i915_gem_object_ggtt_unpin(ctx_obj);
+	if (is_global_default_ctx)
+		i915_gem_object_ggtt_unpin(ctx_obj);
 	drm_gem_object_unreference(&ctx_obj->base);
 	return ret;
 }
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index ac32890..5999e05 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -24,6 +24,8 @@
 #ifndef _INTEL_LRC_H_
 #define _INTEL_LRC_H_
 
+#define GEN8_LR_CONTEXT_ALIGN 4096
+
 /* Execlists regs */
 #define RING_ELSP(ring)			((ring)->mmio_base+0x230)
 #define RING_EXECLIST_STATUS(ring)	((ring)->mmio_base+0x234)
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 137+ messages in thread
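[Editor's illustration] The bookkeeping this patch introduces (pin the context backing object on the first outstanding request, via atomic_inc_return() == 1, and unpin on the last, via atomic_dec_return() == 0, with the global default context always left pinned) can be modelled in a few lines. This is a simplified single-threaded sketch, not the driver code: `struct lrc_state` and the plain int counter stand in for the per-engine state and its atomic_t.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of the per-engine pin bookkeeping (illustrative only). */
struct lrc_state {
	int unpin_count;	/* stands in for the atomic_t unpin_count */
	bool is_default_ctx;	/* the global default context stays pinned */
	int pinned;		/* outstanding GGTT pins on the backing object */
};

/* Called when a request is allocated against the context
 * (cf. logical_ring_alloc_seqno in the patch). */
static void lrc_request_get(struct lrc_state *s)
{
	if (++s->unpin_count == 1 && !s->is_default_ctx)
		s->pinned++;	/* i915_gem_obj_ggtt_pin() */
}

/* Called when the request is freed/retired
 * (cf. i915_gem_free_request in the patch). */
static void lrc_request_put(struct lrc_state *s)
{
	if (--s->unpin_count == 0 && !s->is_default_ctx)
		s->pinned--;	/* i915_gem_object_ggtt_unpin() */
}
```

Two overlapping requests on an ordinary context thus cost exactly one pin, released only when the second request is freed; a default context never pins or unpins through this path.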

* [PATCH 43/43] drm/i915/bdw: Pin the ringbuffer backing object to GGTT on-demand
  2014-07-24 16:04 [PATCH 00/43] Execlists v5 Thomas Daniel
                   ` (41 preceding siblings ...)
  2014-07-24 16:04 ` [PATCH 42/43] drm/i915/bdw: Pin the context backing objects to GGTT on-demand Thomas Daniel
@ 2014-07-24 16:04 ` Thomas Daniel
  2014-07-25  8:35 ` [PATCH 00/43] Execlists v5 Daniel Vetter
  2014-08-01 16:09 ` Damien Lespiau
  44 siblings, 0 replies; 137+ messages in thread
From: Thomas Daniel @ 2014-07-24 16:04 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Same as with the context, pinning to GGTT regardless is harmful (it
badly fragments the GGTT and can even exhaust it).

Unfortunately, this case is also more complex than the previous one
because we need to map and access the ringbuffer in several places
along the execbuffer path (and we cannot make do by leaving the
default ringbuffer pinned, as before). Also, the context object
itself contains a pointer to the ringbuffer address that we have to
keep updated if we are going to allow the ringbuffer to move around.

v2: Same as with the context pinning, we cannot really do it during
an interrupt. Also, pin the default ringbuffers objects regardless
(makes error capture a lot easier).

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c         |    5 +-
 drivers/gpu/drm/i915/intel_lrc.c        |   80 ++++++++++++++++++++---------
 drivers/gpu/drm/i915/intel_ringbuffer.c |   83 ++++++++++++++++++-------------
 drivers/gpu/drm/i915/intel_ringbuffer.h |    3 ++
 4 files changed, 111 insertions(+), 60 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 42faaa3..1a852b9 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2498,13 +2498,16 @@ static void i915_gem_free_request(struct drm_i915_gem_request *request)
 
 	if (ctx) {
 		struct intel_engine_cs *ring = request->ring;
+		struct intel_ringbuffer *ringbuf = ctx->engine[ring->id].ringbuf;
 		struct drm_i915_gem_object *ctx_obj = ctx->engine[ring->id].state;
 		atomic_t *unpin_count = &ctx->engine[ring->id].unpin_count;
 
 		if (ctx_obj) {
 			if (atomic_dec_return(unpin_count) == 0 &&
-					ctx != ring->default_context)
+					ctx != ring->default_context) {
+				intel_unpin_ringbuffer_obj(ringbuf);
 				i915_gem_object_ggtt_unpin(ctx_obj);
+			}
 		}
 		i915_gem_context_unreference(ctx);
 	}
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 9fa8e35..4ca8278 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -315,7 +315,9 @@ static void execlists_elsp_write(struct intel_engine_cs *ring,
 	spin_unlock_irqrestore(&dev_priv->uncore.lock, flags);
 }
 
-static int execlists_ctx_write_tail(struct drm_i915_gem_object *ctx_obj, u32 tail)
+static int execlists_update_context(struct drm_i915_gem_object *ctx_obj,
+				    struct drm_i915_gem_object *ring_obj,
+				    u32 tail)
 {
 	struct page *page;
 	uint32_t *reg_state;
@@ -324,6 +326,7 @@ static int execlists_ctx_write_tail(struct drm_i915_gem_object *ctx_obj, u32 tai
 	reg_state = kmap_atomic(page);
 
 	reg_state[CTX_RING_TAIL+1] = tail;
+	reg_state[CTX_RING_BUFFER_START+1] = i915_gem_obj_ggtt_offset(ring_obj);
 
 	kunmap_atomic(reg_state);
 
@@ -334,21 +337,25 @@ static int execlists_submit_context(struct intel_engine_cs *ring,
 				    struct intel_context *to0, u32 tail0,
 				    struct intel_context *to1, u32 tail1)
 {
-	struct drm_i915_gem_object *ctx_obj0;
+	struct drm_i915_gem_object *ctx_obj0 = to0->engine[ring->id].state;
+	struct intel_ringbuffer *ringbuf0 = to0->engine[ring->id].ringbuf;
 	struct drm_i915_gem_object *ctx_obj1 = NULL;
+	struct intel_ringbuffer *ringbuf1 = NULL;
 
-	ctx_obj0 = to0->engine[ring->id].state;
 	BUG_ON(!ctx_obj0);
 	BUG_ON(!i915_gem_obj_is_pinned(ctx_obj0));
+	BUG_ON(!i915_gem_obj_is_pinned(ringbuf0->obj));
 
-	execlists_ctx_write_tail(ctx_obj0, tail0);
+	execlists_update_context(ctx_obj0, ringbuf0->obj, tail0);
 
 	if (to1) {
+		ringbuf1 = to1->engine[ring->id].ringbuf;
 		ctx_obj1 = to1->engine[ring->id].state;
 		BUG_ON(!ctx_obj1);
 		BUG_ON(!i915_gem_obj_is_pinned(ctx_obj1));
+		BUG_ON(!i915_gem_obj_is_pinned(ringbuf1->obj));
 
-		execlists_ctx_write_tail(ctx_obj1, tail1);
+		execlists_update_context(ctx_obj1, ringbuf1->obj, tail1);
 	}
 
 	execlists_elsp_write(ring, ctx_obj0, ctx_obj1);
@@ -772,6 +779,7 @@ static int logical_ring_alloc_seqno(struct intel_engine_cs *ring,
 
 	if (ring->preallocated_lazy_request == NULL) {
 		struct drm_i915_gem_request *request;
+		struct intel_ringbuffer *ringbuf = ctx->engine[ring->id].ringbuf;
 		struct drm_i915_gem_object *ctx_obj = ctx->engine[ring->id].state;
 		atomic_t *unpin_count = &ctx->engine[ring->id].unpin_count;
 
@@ -787,6 +795,13 @@ static int logical_ring_alloc_seqno(struct intel_engine_cs *ring,
 				kfree(request);
 				return ret;
 			}
+
+			ret = intel_pin_and_map_ringbuffer_obj(ring->dev, ringbuf);
+			if (ret) {
+				i915_gem_object_ggtt_unpin(ctx_obj);
+				kfree(request);
+				return ret;
+			}
 		}
 
 		/* Hold a reference to the context this request belongs to
@@ -1546,7 +1561,13 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
 	reg_state[CTX_RING_TAIL] = RING_TAIL(ring->mmio_base);
 	reg_state[CTX_RING_TAIL+1] = 0;
 	reg_state[CTX_RING_BUFFER_START] = RING_START(ring->mmio_base);
+
+	ret = i915_gem_obj_ggtt_pin(ring_obj, PAGE_SIZE, 0);
+	if (ret)
+		goto error;
 	reg_state[CTX_RING_BUFFER_START+1] = i915_gem_obj_ggtt_offset(ring_obj);
+	i915_gem_object_ggtt_unpin(ring_obj);
+
 	reg_state[CTX_RING_BUFFER_CONTROL] = RING_CTL(ring->mmio_base);
 	reg_state[CTX_RING_BUFFER_CONTROL+1] =
 			((ringbuf->size - PAGE_SIZE) & RING_NR_PAGES) | RING_VALID;
@@ -1599,13 +1620,14 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
 		reg_state[CTX_R_PWR_CLK_STATE+1] = 0;
 	}
 
+error:
 	kunmap_atomic(reg_state);
 
 	ctx_obj->dirty = 1;
 	set_page_dirty(page);
 	i915_gem_object_unpin_pages(ctx_obj);
 
-	return 0;
+	return ret;
 }
 
 /**
@@ -1627,10 +1649,12 @@ void intel_lr_context_free(struct intel_context *ctx)
 			struct intel_ringbuffer *ringbuf = ctx->engine[i].ringbuf;
 			struct intel_engine_cs *ring = ringbuf->ring;
 
+			if (ctx == ring->default_context) {
+				intel_unpin_ringbuffer_obj(ringbuf);
+				i915_gem_object_ggtt_unpin(ctx_obj);
+			}
 			intel_destroy_ringbuffer_obj(ringbuf);
 			kfree(ringbuf);
-			if (ctx == ring->default_context)
-				i915_gem_object_ggtt_unpin(ctx_obj);
 			drm_gem_object_unreference(&ctx_obj->base);
 		}
 	}
@@ -1706,11 +1730,8 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 	if (!ringbuf) {
 		DRM_DEBUG_DRIVER("Failed to allocate ringbuffer %s\n",
 				ring->name);
-		if (is_global_default_ctx)
-			i915_gem_object_ggtt_unpin(ctx_obj);
-		drm_gem_object_unreference(&ctx_obj->base);
 		ret = -ENOMEM;
-		return ret;
+		goto error_unpin_ctx;
 	}
 
 	ringbuf->ring = ring;
@@ -1722,22 +1743,28 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 	ringbuf->space = ringbuf->size;
 	ringbuf->last_retired_head = -1;
 
-	/* TODO: For now we put this in the mappable region so that we can reuse
-	 * the existing ringbuffer code which ioremaps it. When we start
-	 * creating many contexts, this will no longer work and we must switch
-	 * to a kmapish interface.
-	 */
-	ret = intel_alloc_ringbuffer_obj(dev, ringbuf);
-	if (ret) {
-		DRM_DEBUG_DRIVER("Failed to allocate ringbuffer obj %s: %d\n",
-				ring->name, ret);
-		goto error;
+	if (ringbuf->obj == NULL) {
+		ret = intel_alloc_ringbuffer_obj(dev, ringbuf);
+		if (ret) {
+			DRM_DEBUG_DRIVER("Failed to allocate ringbuffer obj %s: %d\n",
+					ring->name, ret);
+			goto error_free_rbuf;
+		}
+
+		if (is_global_default_ctx) {
+			ret = intel_pin_and_map_ringbuffer_obj(dev, ringbuf);
+			if (ret) {
+				DRM_ERROR("Failed to pin and map ringbuffer %s: %d\n",
+						ring->name, ret);
+				goto error_destroy_rbuf;
+			}
+		}
+
 	}
 
 	ret = populate_lr_context(ctx, ctx_obj, ring, ringbuf);
 	if (ret) {
 		DRM_DEBUG_DRIVER("Failed to populate LRC: %d\n", ret);
-		intel_destroy_ringbuffer_obj(ringbuf);
 		goto error;
 	}
 
@@ -1753,7 +1780,6 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 			DRM_ERROR("Init render state failed: %d\n", ret);
 			ctx->engine[ring->id].ringbuf = NULL;
 			ctx->engine[ring->id].state = NULL;
-			intel_destroy_ringbuffer_obj(ringbuf);
 			goto error;
 		}
 		ctx->rcs_initialized = true;
@@ -1762,7 +1788,13 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 	return 0;
 
 error:
+	if (is_global_default_ctx)
+		intel_unpin_ringbuffer_obj(ringbuf);
+error_destroy_rbuf:
+	intel_destroy_ringbuffer_obj(ringbuf);
+error_free_rbuf:
 	kfree(ringbuf);
+error_unpin_ctx:
 	if (is_global_default_ctx)
 		i915_gem_object_ggtt_unpin(ctx_obj);
 	drm_gem_object_unreference(&ctx_obj->base);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 6e604c9..020588c 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1513,13 +1513,42 @@ static int init_phys_status_page(struct intel_engine_cs *ring)
 	return 0;
 }
 
-void intel_destroy_ringbuffer_obj(struct intel_ringbuffer *ringbuf)
+void intel_unpin_ringbuffer_obj(struct intel_ringbuffer *ringbuf)
 {
-	if (!ringbuf->obj)
-		return;
-
 	iounmap(ringbuf->virtual_start);
+	ringbuf->virtual_start = NULL;
 	i915_gem_object_ggtt_unpin(ringbuf->obj);
+}
+
+int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
+				     struct intel_ringbuffer *ringbuf)
+{
+	struct drm_i915_private *dev_priv = to_i915(dev);
+	struct drm_i915_gem_object *obj = ringbuf->obj;
+	int ret;
+
+	ret = i915_gem_obj_ggtt_pin(obj, PAGE_SIZE, PIN_MAPPABLE);
+	if (ret)
+		return ret;
+
+	ret = i915_gem_object_set_to_gtt_domain(obj, true);
+	if (ret) {
+		i915_gem_object_ggtt_unpin(obj);
+		return ret;
+	}
+
+	ringbuf->virtual_start = ioremap_wc(dev_priv->gtt.mappable_base +
+			i915_gem_obj_ggtt_offset(obj), ringbuf->size);
+	if (ringbuf->virtual_start == NULL) {
+		i915_gem_object_ggtt_unpin(obj);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+void intel_destroy_ringbuffer_obj(struct intel_ringbuffer *ringbuf)
+{
 	drm_gem_object_unreference(&ringbuf->obj->base);
 	ringbuf->obj = NULL;
 }
@@ -1527,12 +1556,7 @@ void intel_destroy_ringbuffer_obj(struct intel_ringbuffer *ringbuf)
 int intel_alloc_ringbuffer_obj(struct drm_device *dev,
 			       struct intel_ringbuffer *ringbuf)
 {
-	struct drm_i915_private *dev_priv = to_i915(dev);
 	struct drm_i915_gem_object *obj;
-	int ret;
-
-	if (ringbuf->obj)
-		return 0;
 
 	obj = NULL;
 	if (!HAS_LLC(dev))
@@ -1545,30 +1569,9 @@ int intel_alloc_ringbuffer_obj(struct drm_device *dev,
 	/* mark ring buffers as read-only from GPU side by default */
 	obj->gt_ro = 1;
 
-	ret = i915_gem_obj_ggtt_pin(obj, PAGE_SIZE, PIN_MAPPABLE);
-	if (ret)
-		goto err_unref;
-
-	ret = i915_gem_object_set_to_gtt_domain(obj, true);
-	if (ret)
-		goto err_unpin;
-
-	ringbuf->virtual_start =
-		ioremap_wc(dev_priv->gtt.mappable_base + i915_gem_obj_ggtt_offset(obj),
-				ringbuf->size);
-	if (ringbuf->virtual_start == NULL) {
-		ret = -EINVAL;
-		goto err_unpin;
-	}
-
 	ringbuf->obj = obj;
-	return 0;
 
-err_unpin:
-	i915_gem_object_ggtt_unpin(obj);
-err_unref:
-	drm_gem_object_unreference(&obj->base);
-	return ret;
+	return 0;
 }
 
 static int intel_init_ring_buffer(struct drm_device *dev,
@@ -1606,10 +1609,19 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 			goto error;
 	}
 
-	ret = intel_alloc_ringbuffer_obj(dev, ringbuf);
-	if (ret) {
-		DRM_ERROR("Failed to allocate ringbuffer %s: %d\n", ring->name, ret);
-		goto error;
+	if (ringbuf->obj == NULL) {
+		ret = intel_alloc_ringbuffer_obj(dev, ringbuf);
+		if (ret) {
+			DRM_ERROR("Failed to allocate ringbuffer %s: %d\n", ring->name, ret);
+			goto error;
+		}
+
+		ret = intel_pin_and_map_ringbuffer_obj(dev, ringbuf);
+		if (ret) {
+			DRM_ERROR("Failed to pin and map ringbuffer %s: %d\n", ring->name, ret);
+			intel_destroy_ringbuffer_obj(ringbuf);
+			goto error;
+		}
 	}
 
 	/* Workaround an erratum on the i830 which causes a hang if
@@ -1647,6 +1659,7 @@ void intel_cleanup_ring_buffer(struct intel_engine_cs *ring)
 	intel_stop_ring_buffer(ring);
 	WARN_ON(!IS_GEN2(ring->dev) && (I915_READ_MODE(ring) & MODE_IDLE) == 0);
 
+	intel_unpin_ringbuffer_obj(ringbuf);
 	intel_destroy_ringbuffer_obj(ringbuf);
 	ring->preallocated_lazy_request = NULL;
 	ring->outstanding_lazy_seqno = 0;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 905d1ba..81be2db 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -371,6 +371,9 @@ intel_write_status_page(struct intel_engine_cs *ring,
 #define I915_GEM_HWS_SCRATCH_INDEX	0x30
 #define I915_GEM_HWS_SCRATCH_ADDR (I915_GEM_HWS_SCRATCH_INDEX << MI_STORE_DWORD_INDEX_SHIFT)
 
+void intel_unpin_ringbuffer_obj(struct intel_ringbuffer *ringbuf);
+int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
+				     struct intel_ringbuffer *ringbuf);
 void intel_destroy_ringbuffer_obj(struct intel_ringbuffer *ringbuf);
 int intel_alloc_ringbuffer_obj(struct drm_device *dev,
 			       struct intel_ringbuffer *ringbuf);
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 137+ messages in thread

* Re: [PATCH 01/43] drm/i915: Reorder the actual workload submission so that args checking is done earlier
  2014-07-24 16:04 ` [PATCH 01/43] drm/i915: Reorder the actual workload submission so that args checking is done earlier Thomas Daniel
@ 2014-07-25  8:30   ` Daniel Vetter
  2014-07-25  9:16     ` Chris Wilson
  0 siblings, 1 reply; 137+ messages in thread
From: Daniel Vetter @ 2014-07-25  8:30 UTC (permalink / raw)
  To: Thomas Daniel; +Cc: intel-gfx

On Thu, Jul 24, 2014 at 05:04:09PM +0100, Thomas Daniel wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> In this patch:
> 
> commit 78382593e921c88371abd019aca8978db3248a8f
> Author: Oscar Mateo <oscar.mateo@intel.com>
> Date:   Thu Jul 3 16:28:05 2014 +0100
> 
>     drm/i915: Extract the actual workload submission mechanism from execbuffer
> 
>     So that we isolate the legacy ringbuffer submission mechanism, which becomes
>     a good candidate to be abstracted away. This is prep-work for Execlists (which
>     will have its own workload submission mechanism).
> 
>     No functional changes.
> 
> I changed the order in which the args checking is done. I don't know why I did (brain
> fade?) but it's not right. I haven't seen any ill effect from this, but the Execlists
> version of this function will have problems if the order is not correct.
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>

I don't think this matters - the point of no return for legacy execbuf is
the call to ring->dispatch. After that nothing may fail any more. But as
long as we track state correctly (e.g. if we've switched the context
already) we'll be fine.

So presuming I'm not blind I don't think this is needed. But maybe Chris
spots something.
-Daniel
> ---
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c |   86 ++++++++++++++--------------
>  1 file changed, 43 insertions(+), 43 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 60998fc..c5115957 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -1042,6 +1042,43 @@ legacy_ringbuffer_submission(struct drm_device *dev, struct drm_file *file,
>  	u32 instp_mask;
>  	int i, ret = 0;
>  
> +	instp_mode = args->flags & I915_EXEC_CONSTANTS_MASK;
> +	instp_mask = I915_EXEC_CONSTANTS_MASK;
> +	switch (instp_mode) {
> +	case I915_EXEC_CONSTANTS_REL_GENERAL:
> +	case I915_EXEC_CONSTANTS_ABSOLUTE:
> +	case I915_EXEC_CONSTANTS_REL_SURFACE:
> +		if (instp_mode != 0 && ring != &dev_priv->ring[RCS]) {
> +			DRM_DEBUG("non-0 rel constants mode on non-RCS\n");
> +			ret = -EINVAL;
> +			goto error;
> +		}
> +
> +		if (instp_mode != dev_priv->relative_constants_mode) {
> +			if (INTEL_INFO(dev)->gen < 4) {
> +				DRM_DEBUG("no rel constants on pre-gen4\n");
> +				ret = -EINVAL;
> +				goto error;
> +			}
> +
> +			if (INTEL_INFO(dev)->gen > 5 &&
> +			    instp_mode == I915_EXEC_CONSTANTS_REL_SURFACE) {
> +				DRM_DEBUG("rel surface constants mode invalid on gen5+\n");
> +				ret = -EINVAL;
> +				goto error;
> +			}
> +
> +			/* The HW changed the meaning on this bit on gen6 */
> +			if (INTEL_INFO(dev)->gen >= 6)
> +				instp_mask &= ~I915_EXEC_CONSTANTS_REL_SURFACE;
> +		}
> +		break;
> +	default:
> +		DRM_DEBUG("execbuf with unknown constants: %d\n", instp_mode);
> +		ret = -EINVAL;
> +		goto error;
> +	}
> +
>  	if (args->num_cliprects != 0) {
>  		if (ring != &dev_priv->ring[RCS]) {
>  			DRM_DEBUG("clip rectangles are only valid with the render ring\n");
> @@ -1085,6 +1122,12 @@ legacy_ringbuffer_submission(struct drm_device *dev, struct drm_file *file,
>  		}
>  	}
>  
> +	if (args->flags & I915_EXEC_GEN7_SOL_RESET) {
> +		ret = i915_reset_gen7_sol_offsets(dev, ring);
> +		if (ret)
> +			goto error;
> +	}
> +
>  	ret = i915_gem_execbuffer_move_to_gpu(ring, vmas);
>  	if (ret)
>  		goto error;
> @@ -1093,43 +1136,6 @@ legacy_ringbuffer_submission(struct drm_device *dev, struct drm_file *file,
>  	if (ret)
>  		goto error;
>  
> -	instp_mode = args->flags & I915_EXEC_CONSTANTS_MASK;
> -	instp_mask = I915_EXEC_CONSTANTS_MASK;
> -	switch (instp_mode) {
> -	case I915_EXEC_CONSTANTS_REL_GENERAL:
> -	case I915_EXEC_CONSTANTS_ABSOLUTE:
> -	case I915_EXEC_CONSTANTS_REL_SURFACE:
> -		if (instp_mode != 0 && ring != &dev_priv->ring[RCS]) {
> -			DRM_DEBUG("non-0 rel constants mode on non-RCS\n");
> -			ret = -EINVAL;
> -			goto error;
> -		}
> -
> -		if (instp_mode != dev_priv->relative_constants_mode) {
> -			if (INTEL_INFO(dev)->gen < 4) {
> -				DRM_DEBUG("no rel constants on pre-gen4\n");
> -				ret = -EINVAL;
> -				goto error;
> -			}
> -
> -			if (INTEL_INFO(dev)->gen > 5 &&
> -			    instp_mode == I915_EXEC_CONSTANTS_REL_SURFACE) {
> -				DRM_DEBUG("rel surface constants mode invalid on gen5+\n");
> -				ret = -EINVAL;
> -				goto error;
> -			}
> -
> -			/* The HW changed the meaning on this bit on gen6 */
> -			if (INTEL_INFO(dev)->gen >= 6)
> -				instp_mask &= ~I915_EXEC_CONSTANTS_REL_SURFACE;
> -		}
> -		break;
> -	default:
> -		DRM_DEBUG("execbuf with unknown constants: %d\n", instp_mode);
> -		ret = -EINVAL;
> -		goto error;
> -	}
> -
>  	if (ring == &dev_priv->ring[RCS] &&
>  			instp_mode != dev_priv->relative_constants_mode) {
>  		ret = intel_ring_begin(ring, 4);
> @@ -1145,12 +1151,6 @@ legacy_ringbuffer_submission(struct drm_device *dev, struct drm_file *file,
>  		dev_priv->relative_constants_mode = instp_mode;
>  	}
>  
> -	if (args->flags & I915_EXEC_GEN7_SOL_RESET) {
> -		ret = i915_reset_gen7_sol_offsets(dev, ring);
> -		if (ret)
> -			goto error;
> -	}
> -
>  	exec_len = args->batch_len;
>  	if (cliprects) {
>  		for (i = 0; i < args->num_cliprects; i++) {
> -- 
> 1.7.9.5
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: [PATCH 00/43] Execlists v5
  2014-07-24 16:04 [PATCH 00/43] Execlists v5 Thomas Daniel
                   ` (42 preceding siblings ...)
  2014-07-24 16:04 ` [PATCH 43/43] drm/i915/bdw: Pin the ringbuffer backing object " Thomas Daniel
@ 2014-07-25  8:35 ` Daniel Vetter
  2014-08-01 16:09 ` Damien Lespiau
  44 siblings, 0 replies; 137+ messages in thread
From: Daniel Vetter @ 2014-07-25  8:35 UTC (permalink / raw)
  To: Thomas Daniel; +Cc: intel-gfx

Please format mails to a width of 75 chars or so. Decent mailers should do
that for you when hitting send.

On Thu, Jul 24, 2014 at 05:04:08PM +0100, Thomas Daniel wrote:
> From: Thomas Daniel <thomas.daniel@intel.com>
> The previous comment about the WAs still applies. I reproduce it here
> for completeness:
> 
> "One other caveat I have noticed is that many WAs in
> gen8_init_clock_gating (those that affect registers that now exist
> per-context) can get lost in the render default context. The reason is,
> in Execlists, a context is saved as soon as head = tail (with
> MI_SET_CONTEXT, however, the context wouldn't be saved until you tried
> to restore a different context). As we are sending the golden state
> batchbuffer to the render ring as soon as the rings are initialized, we
> are effectively saving the default context before gen8_init_clock_gating
> has an opportunity to set the WAs. I haven't noticed any ill-effect from
> this (yet) but it would be a good idea to move the WAs somewhere else
> (ring init looks like a good place). I believe there is already work in
> progress to create a new WA architecture, so this can be tackled there."

This sounds like the w/a test patch to compare wa reg state after system
s/r, runtime pm, gpu reset and driver reload should also have a test for
multiple contexts. I'll add it to the wishlist.
-Daniel


* Re: [PATCH 01/43] drm/i915: Reorder the actual workload submission so that args checking is done earlier
  2014-07-25  8:30   ` Daniel Vetter
@ 2014-07-25  9:16     ` Chris Wilson
  0 siblings, 0 replies; 137+ messages in thread
From: Chris Wilson @ 2014-07-25  9:16 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Fri, Jul 25, 2014 at 10:30:09AM +0200, Daniel Vetter wrote:
> On Thu, Jul 24, 2014 at 05:04:09PM +0100, Thomas Daniel wrote:
> > From: Oscar Mateo <oscar.mateo@intel.com>
> > 
> > In this patch:
> > 
> > commit 78382593e921c88371abd019aca8978db3248a8f
> > Author: Oscar Mateo <oscar.mateo@intel.com>
> > Date:   Thu Jul 3 16:28:05 2014 +0100
> > 
> >     drm/i915: Extract the actual workload submission mechanism from execbuffer
> > 
> >     So that we isolate the legacy ringbuffer submission mechanism, which becomes
> >     a good candidate to be abstracted away. This is prep-work for Execlists (which
> >     will have its own workload submission mechanism).
> > 
> >     No functional changes.
> > 
> > I changed the order in which the args checking is done. I don't know why I did (brain
> > fade?) but it's not right. I haven't seen any ill effect from this, but the Execlists
> > version of this function will have problems if the order is not correct.
> > 
> > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> 
> I don't think this matters - the point of no return for legacy execbuf is
> the call to ring->dispatch. After that nothing may fail any more. But as
> long as we track state correctly (e.g. if we've switched the context
> already) we'll be fine.

Right. Except that I think our tracking is buggy - or at least
insufficient to address the needs of future dispatch mechanisms. I think
that we confuse some bookkeeping that should be at the request level and
place it on the ring. At the moment, we have one request per-ring and so
it doesn't matter, but transitioning to one request per-logical-ring we
start to have issues as that state is being tracked on the wrong struct.

Anyway, that's part of the motivation to fixing up requests and making
them central to accessing the rings/dispatch (whereas at the moment they
are behind the scenes).
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


* Re: [PATCH 12/43] drm/i915/bdw: Don't write PDP in the legacy way when using LRCs
  2014-07-24 16:04 ` [PATCH 12/43] drm/i915/bdw: Don't write PDP in the legacy way when using LRCs Thomas Daniel
@ 2014-08-01 13:46   ` Damien Lespiau
  2014-08-07 12:17   ` Thomas Daniel
  1 sibling, 0 replies; 137+ messages in thread
From: Damien Lespiau @ 2014-08-01 13:46 UTC (permalink / raw)
  To: Thomas Daniel; +Cc: intel-gfx

On Thu, Jul 24, 2014 at 05:04:20PM +0100, Thomas Daniel wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> This is mostly for correctness so that we know we are running the LR
> context correctly (this is, the PDPs are contained inside the context
> object).
> 
> v2: Move the check to inside the enable PPGTT function. The switch
> happens in two places: the legacy context switch (that we won't hit
> when Execlists are enabled) and the PPGTT enable, which unfortunately
> we need. This would look much nicer if the ppgtt->enable was part of
> the ring init, where it logically belongs.
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem_gtt.c |    5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 5188936..ccd70f5 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -852,6 +852,11 @@ static int gen8_ppgtt_enable(struct i915_hw_ppgtt *ppgtt)
>  		if (USES_FULL_PPGTT(dev))
>  			continue;
>  
> +		/* In the case of Execlists, we don't want to write the PDPs
> +		 * in the legacy way (they live inside the context now) */
> +		if (i915.enable_execlists)
> +			return 0;

This looks like it should be a continue to enable PPGTT on more rings
than the render ring?

-- 
Damien


* Re: [PATCH 29/43] drm/i915/bdw: Write the tail pointer, LRC style
  2014-07-24 16:04 ` [PATCH 29/43] drm/i915/bdw: Write the tail pointer, LRC style Thomas Daniel
@ 2014-08-01 14:33   ` Damien Lespiau
  2014-08-11 21:30   ` Daniel Vetter
  1 sibling, 0 replies; 137+ messages in thread
From: Damien Lespiau @ 2014-08-01 14:33 UTC (permalink / raw)
  To: Thomas Daniel; +Cc: intel-gfx

On Thu, Jul 24, 2014 at 05:04:37PM +0100, Thomas Daniel wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> Each logical ring context has the tail pointer in the context object,
> so update it before submission.
> 
> v2: New namespace.

I believe we could just leave the context object mapped for its whole
lifetime. Something to think about at a later point.

-- 
Damien

> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_lrc.c |   19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 535ef98..5b6f416 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -176,6 +176,21 @@ static void execlists_elsp_write(struct intel_engine_cs *ring,
>  	gen6_gt_force_wake_put(dev_priv, FORCEWAKE_ALL);
>  }
>  
> +static int execlists_ctx_write_tail(struct drm_i915_gem_object *ctx_obj, u32 tail)
> +{
> +	struct page *page;
> +	uint32_t *reg_state;
> +
> +	page = i915_gem_object_get_page(ctx_obj, 1);
> +	reg_state = kmap_atomic(page);
> +
> +	reg_state[CTX_RING_TAIL+1] = tail;
> +
> +	kunmap_atomic(reg_state);
> +
> +	return 0;
> +}
> +
>  static int execlists_submit_context(struct intel_engine_cs *ring,
>  				    struct intel_context *to0, u32 tail0,
>  				    struct intel_context *to1, u32 tail1)
> @@ -187,10 +202,14 @@ static int execlists_submit_context(struct intel_engine_cs *ring,
>  	BUG_ON(!ctx_obj0);
>  	BUG_ON(!i915_gem_obj_is_pinned(ctx_obj0));
>  
> +	execlists_ctx_write_tail(ctx_obj0, tail0);
> +
>  	if (to1) {
>  		ctx_obj1 = to1->engine[ring->id].state;
>  		BUG_ON(!ctx_obj1);
>  		BUG_ON(!i915_gem_obj_is_pinned(ctx_obj1));
> +
> +		execlists_ctx_write_tail(ctx_obj1, tail1);
>  	}
>  
>  	execlists_elsp_write(ring, ctx_obj0, ctx_obj1);
> -- 
> 1.7.9.5
> 


* Re: [PATCH 34/43] drm/i915/bdw: Make sure gpu reset still works with Execlists
  2014-07-24 16:04 ` [PATCH 34/43] drm/i915/bdw: Make sure gpu reset still works with Execlists Thomas Daniel
@ 2014-08-01 14:42   ` Damien Lespiau
  2014-08-06  9:26     ` Daniel, Thomas
  2014-08-01 14:46   ` Damien Lespiau
  1 sibling, 1 reply; 137+ messages in thread
From: Damien Lespiau @ 2014-08-01 14:42 UTC (permalink / raw)
  To: Thomas Daniel; +Cc: intel-gfx

On Thu, Jul 24, 2014 at 05:04:42PM +0100, Thomas Daniel wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> If we reset a ring after a hang, we have to make sure that we clear
> out all queued Execlists requests.
> 
> v2: The ring is, at this point, already being correctly re-programmed
> for Execlists, and the hangcheck counters cleared.
> 
> v3: Daniel suggests to drop the "if (execlists)" because the Execlists
> queue should be empty in legacy mode (which is true, if we do the
> INIT_LIST_HEAD).
> 
> v4: Do the pending intel_runtime_pm_put

I don't see a intel_runtime_pm_get() that put() would correspond to.

> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem.c         |   12 ++++++++++++
>  drivers/gpu/drm/i915/intel_ringbuffer.c |    1 +
>  2 files changed, 13 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 1c83b9c..143cff7 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2567,6 +2567,18 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
>  		i915_gem_free_request(request);
>  	}
>  
> +	while (!list_empty(&ring->execlist_queue)) {
> +		struct intel_ctx_submit_request *submit_req;
> +
> +		submit_req = list_first_entry(&ring->execlist_queue,
> +				struct intel_ctx_submit_request,
> +				execlist_link);
> +		list_del(&submit_req->execlist_link);
> +		intel_runtime_pm_put(dev_priv);
> +		i915_gem_context_unreference(submit_req->ctx);
> +		kfree(submit_req);
> +	}
> +
>  	/* These may not have been flush before the reset, do so now */
>  	kfree(ring->preallocated_lazy_request);
>  	ring->preallocated_lazy_request = NULL;
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 3188403..6e604c9 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1587,6 +1587,7 @@ static int intel_init_ring_buffer(struct drm_device *dev,
>  	ring->dev = dev;
>  	INIT_LIST_HEAD(&ring->active_list);
>  	INIT_LIST_HEAD(&ring->request_list);
> +	INIT_LIST_HEAD(&ring->execlist_queue);
>  	ringbuf->size = 32 * PAGE_SIZE;
>  	ringbuf->ring = ring;
>  	ringbuf->ctx = ring->default_context;
> -- 
> 1.7.9.5
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx


* Re: [PATCH 34/43] drm/i915/bdw: Make sure gpu reset still works with Execlists
  2014-07-24 16:04 ` [PATCH 34/43] drm/i915/bdw: Make sure gpu reset still works with Execlists Thomas Daniel
  2014-08-01 14:42   ` Damien Lespiau
@ 2014-08-01 14:46   ` Damien Lespiau
  2014-08-06  9:28     ` Daniel, Thomas
  1 sibling, 1 reply; 137+ messages in thread
From: Damien Lespiau @ 2014-08-01 14:46 UTC (permalink / raw)
  To: Thomas Daniel; +Cc: intel-gfx

On Thu, Jul 24, 2014 at 05:04:42PM +0100, Thomas Daniel wrote:
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 3188403..6e604c9 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1587,6 +1587,7 @@ static int intel_init_ring_buffer(struct drm_device *dev,
>  	ring->dev = dev;
>  	INIT_LIST_HEAD(&ring->active_list);
>  	INIT_LIST_HEAD(&ring->request_list);
> +	INIT_LIST_HEAD(&ring->execlist_queue);

It's also a bit weird to now have two sites where we initialize this
list. Or am I missing something?

-- 
Damien


* Re: [PATCH 37/43] drm/i915/bdw: Display execlists info in debugfs
  2014-07-24 16:04 ` [PATCH 37/43] drm/i915/bdw: Display execlists info in debugfs Thomas Daniel
@ 2014-08-01 14:54   ` Damien Lespiau
  2014-08-07 12:23   ` Thomas Daniel
  1 sibling, 0 replies; 137+ messages in thread
From: Damien Lespiau @ 2014-08-01 14:54 UTC (permalink / raw)
  To: Thomas Daniel; +Cc: intel-gfx

On Thu, Jul 24, 2014 at 05:04:45PM +0100, Thomas Daniel wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> v2: Warn and return if LRCs are not enabled.
> 
> v3: Grab the Execlists spinlock (noticed by Daniel Vetter).
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>

This looks like it may be missing a struct_mutex lock to grab a state
atomically.

-- 
Damien

> ---
>  drivers/gpu/drm/i915/i915_debugfs.c |   73 +++++++++++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/intel_lrc.c    |    6 ---
>  drivers/gpu/drm/i915/intel_lrc.h    |    7 ++++
>  3 files changed, 80 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index fc39610..903ed67 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -1674,6 +1674,78 @@ static int i915_context_status(struct seq_file *m, void *unused)
>  	return 0;
>  }
>  
> +static int i915_execlists(struct seq_file *m, void *data)
> +{
> +	struct drm_info_node *node = (struct drm_info_node *) m->private;
> +	struct drm_device *dev = node->minor->dev;
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct intel_engine_cs *ring;
> +	u32 status_pointer;
> +	u8 read_pointer;
> +	u8 write_pointer;
> +	u32 status;
> +	u32 ctx_id;
> +	struct list_head *cursor;
> +	int ring_id, i;
> +
> +	if (!i915.enable_execlists) {
> +		seq_printf(m, "Logical Ring Contexts are disabled\n");
> +		return 0;
> +	}
> +
> +	for_each_ring(ring, dev_priv, ring_id) {
> +		struct intel_ctx_submit_request *head_req = NULL;
> +		int count = 0;
> +		unsigned long flags;
> +
> +		seq_printf(m, "%s\n", ring->name);
> +
> +		status = I915_READ(RING_EXECLIST_STATUS(ring));
> +		ctx_id = I915_READ(RING_EXECLIST_STATUS(ring) + 4);
> +		seq_printf(m, "\tExeclist status: 0x%08X, context: %u\n",
> +				status, ctx_id);
> +
> +		status_pointer = I915_READ(RING_CONTEXT_STATUS_PTR(ring));
> +		seq_printf(m, "\tStatus pointer: 0x%08X\n", status_pointer);
> +
> +		read_pointer = ring->next_context_status_buffer;
> +		write_pointer = status_pointer & 0x07;
> +		if (read_pointer > write_pointer)
> +			write_pointer += 6;
> +		seq_printf(m, "\tRead pointer: 0x%08X, write pointer 0x%08X\n",
> +				read_pointer, write_pointer);
> +
> +		for (i = 0; i < 6; i++) {
> +			status = I915_READ(RING_CONTEXT_STATUS_BUF(ring) + 8*i);
> +			ctx_id = I915_READ(RING_CONTEXT_STATUS_BUF(ring) + 8*i + 4);
> +
> +			seq_printf(m, "\tStatus buffer %d: 0x%08X, context: %u\n",
> +					i, status, ctx_id);
> +		}
> +
> +		spin_lock_irqsave(&ring->execlist_lock, flags);
> +		list_for_each(cursor, &ring->execlist_queue)
> +			count++;
> +		head_req = list_first_entry_or_null(&ring->execlist_queue,
> +				struct intel_ctx_submit_request, execlist_link);
> +		spin_unlock_irqrestore(&ring->execlist_lock, flags);
> +
> +		seq_printf(m, "\t%d requests in queue\n", count);
> +		if (head_req) {
> +			struct drm_i915_gem_object *ctx_obj;
> +
> +			ctx_obj = head_req->ctx->engine[ring_id].state;
> +			seq_printf(m, "\tHead request id: %u\n",
> +					intel_execlists_ctx_id(ctx_obj));
> +			seq_printf(m, "\tHead request tail: %u\n", head_req->tail);
> +		}
> +
> +		seq_putc(m, '\n');
> +	}
> +
> +	return 0;
> +}
> +
>  static int i915_gen6_forcewake_count_info(struct seq_file *m, void *data)
>  {
>  	struct drm_info_node *node = m->private;
> @@ -3899,6 +3971,7 @@ static const struct drm_info_list i915_debugfs_list[] = {
>  	{"i915_opregion", i915_opregion, 0},
>  	{"i915_gem_framebuffer", i915_gem_framebuffer_info, 0},
>  	{"i915_context_status", i915_context_status, 0},
> +	{"i915_execlists", i915_execlists, 0},
>  	{"i915_gen6_forcewake_count", i915_gen6_forcewake_count_info, 0},
>  	{"i915_swizzle_info", i915_swizzle_info, 0},
>  	{"i915_ppgtt_info", i915_ppgtt_info, 0},
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 829b15d..8056fa4 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -46,12 +46,6 @@
>  
>  #define GEN8_LR_CONTEXT_ALIGN 4096
>  
> -#define RING_ELSP(ring)			((ring)->mmio_base+0x230)
> -#define RING_EXECLIST_STATUS(ring)	((ring)->mmio_base+0x234)
> -#define RING_CONTEXT_CONTROL(ring)	((ring)->mmio_base+0x244)
> -#define RING_CONTEXT_STATUS_BUF(ring)	((ring)->mmio_base+0x370)
> -#define RING_CONTEXT_STATUS_PTR(ring)	((ring)->mmio_base+0x3a0)
> -
>  #define RING_EXECLIST_QFULL		(1 << 0x2)
>  #define RING_EXECLIST1_VALID		(1 << 0x3)
>  #define RING_EXECLIST0_VALID		(1 << 0x4)
> diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
> index 074b44f..f3b921b 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.h
> +++ b/drivers/gpu/drm/i915/intel_lrc.h
> @@ -24,6 +24,13 @@
>  #ifndef _INTEL_LRC_H_
>  #define _INTEL_LRC_H_
>  
> +/* Execlists regs */
> +#define RING_ELSP(ring)			((ring)->mmio_base+0x230)
> +#define RING_EXECLIST_STATUS(ring)	((ring)->mmio_base+0x234)
> +#define RING_CONTEXT_CONTROL(ring)	((ring)->mmio_base+0x244)
> +#define RING_CONTEXT_STATUS_BUF(ring)	((ring)->mmio_base+0x370)
> +#define RING_CONTEXT_STATUS_PTR(ring)	((ring)->mmio_base+0x3a0)
> +
>  /* Logical Rings */
>  void intel_logical_ring_stop(struct intel_engine_cs *ring);
>  void intel_logical_ring_cleanup(struct intel_engine_cs *ring);
> -- 
> 1.7.9.5
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 39/43] drm/i915/bdw: Print context state in debugfs
  2014-07-24 16:04 ` [PATCH 39/43] drm/i915/bdw: Print context state " Thomas Daniel
@ 2014-08-01 15:54   ` Damien Lespiau
  2014-08-07 12:24   ` Thomas Daniel
  1 sibling, 0 replies; 137+ messages in thread
From: Damien Lespiau @ 2014-08-01 15:54 UTC (permalink / raw)
  To: Thomas Daniel; +Cc: intel-gfx

On Thu, Jul 24, 2014 at 05:04:47PM +0100, Thomas Daniel wrote:
> From: Ben Widawsky <ben@bwidawsk.net>
> 
> This has turned out to be really handy in debug so far.
> 
> Update:
> Since writing this patch, I've gotten similar code upstream for error
> state. I've used it quite a bit in debugfs however, and I'd like to keep
> it here at least until preemption is working.
> 
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> 
> This patch was accidentally dropped in the first Execlists version, and
> it has been very useful indeed. Put it back again, but as a standalone
> debugfs file.
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>

It looks like the locking should be on struct_mutex and not
mode_config.mutex?

-- 
Damien

> ---
>  drivers/gpu/drm/i915/i915_debugfs.c |   52 +++++++++++++++++++++++++++++++++++
>  1 file changed, 52 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 0980cdd..968c3c0 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -1695,6 +1695,57 @@ static int i915_context_status(struct seq_file *m, void *unused)
>  	return 0;
>  }
>  
> +static int i915_dump_lrc(struct seq_file *m, void *unused)
> +{
> +	struct drm_info_node *node = (struct drm_info_node *) m->private;
> +	struct drm_device *dev = node->minor->dev;
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct intel_engine_cs *ring;
> +	struct intel_context *ctx;
> +	int ret, i;
> +
> +	if (!i915.enable_execlists) {
> +		seq_printf(m, "Logical Ring Contexts are disabled\n");
> +		return 0;
> +	}
> +
> +	ret = mutex_lock_interruptible(&dev->mode_config.mutex);
> +	if (ret)
> +		return ret;
> +
> +	list_for_each_entry(ctx, &dev_priv->context_list, link) {
> +		for_each_ring(ring, dev_priv, i) {
> +			struct drm_i915_gem_object *ctx_obj = ctx->engine[i].state;
> +
> +			if (ring->default_context == ctx)
> +				continue;
> +
> +			if (ctx_obj) {
> +				struct page *page = i915_gem_object_get_page(ctx_obj, 1);
> +				uint32_t *reg_state = kmap_atomic(page);
> +				int j;
> +
> +				seq_printf(m, "CONTEXT: %s %u\n", ring->name,
> +						intel_execlists_ctx_id(ctx_obj));
> +
> +				for (j = 0; j < 0x600 / sizeof(u32) / 4; j += 4) {
> +					seq_printf(m, "\t[0x%08lx] 0x%08x 0x%08x 0x%08x 0x%08x\n",
> +					i915_gem_obj_ggtt_offset(ctx_obj) + 4096 + (j * 4),
> +					reg_state[j], reg_state[j + 1],
> +					reg_state[j + 2], reg_state[j + 3]);
> +				}
> +				kunmap_atomic(reg_state);
> +
> +				seq_putc(m, '\n');
> +			}
> +		}
> +	}
> +
> +	mutex_unlock(&dev->mode_config.mutex);
> +
> +	return 0;
> +}
> +
>  static int i915_execlists(struct seq_file *m, void *data)
>  {
>  	struct drm_info_node *node = (struct drm_info_node *) m->private;
> @@ -3992,6 +4043,7 @@ static const struct drm_info_list i915_debugfs_list[] = {
>  	{"i915_opregion", i915_opregion, 0},
>  	{"i915_gem_framebuffer", i915_gem_framebuffer_info, 0},
>  	{"i915_context_status", i915_context_status, 0},
> +	{"i915_dump_lrc", i915_dump_lrc, 0},
>  	{"i915_execlists", i915_execlists, 0},
>  	{"i915_gen6_forcewake_count", i915_gen6_forcewake_count_info, 0},
>  	{"i915_swizzle_info", i915_swizzle_info, 0},
> -- 
> 1.7.9.5
> 


* Re: [PATCH 00/43] Execlists v5
  2014-07-24 16:04 [PATCH 00/43] Execlists v5 Thomas Daniel
                   ` (43 preceding siblings ...)
  2014-07-25  8:35 ` [PATCH 00/43] Execlists v5 Daniel Vetter
@ 2014-08-01 16:09 ` Damien Lespiau
  2014-08-01 16:29   ` Jesse Barnes
  44 siblings, 1 reply; 137+ messages in thread
From: Damien Lespiau @ 2014-08-01 16:09 UTC (permalink / raw)
  To: Thomas Daniel; +Cc: intel-gfx

On Thu, Jul 24, 2014 at 05:04:08PM +0100, Thomas Daniel wrote:

All patches can have my r-b tag except patches 12, 34, 37 and 39, which have
minor comments (in terms of code changes) to address. I did look more at
the low-level stuff (vs. the higher-level abstractions).

At this point, I believe the way forward is to merge that series to
allow more people to beat on it. Step 1 is to make sure we don't regress
the legacy ring buffers and now is a good time to land it as we start a
new kernel cycle.

Whether to enable it by default for BDW is an interesting question that
may depend on the first round of QA.

-- 
Damien


* Re: [PATCH 00/43] Execlists v5
  2014-08-01 16:09 ` Damien Lespiau
@ 2014-08-01 16:29   ` Jesse Barnes
  0 siblings, 0 replies; 137+ messages in thread
From: Jesse Barnes @ 2014-08-01 16:29 UTC (permalink / raw)
  To: Damien Lespiau; +Cc: intel-gfx

On Fri, 1 Aug 2014 17:09:50 +0100
Damien Lespiau <damien.lespiau@intel.com> wrote:

> On Thu, Jul 24, 2014 at 05:04:08PM +0100, Thomas Daniel wrote:
> 
> All patches can have my r-b tag except patches 12, 34, 37 and 39, which have
> minor comments (in terms of code changes) to address. I did look more at
> the low-level stuff (vs. the higher-level abstractions).
> 
> At this point, I believe the way forward is to merge that series to
> allow more people to beat on it. Step 1 is to make sure we don't regress
> the legacy ring buffers and now is a good time to land it as we start a
> new kernel cycle.
> 
> Whether to enable it by default for BDW is an interesting question that
> may depend on the first round of QA.

Yeah, I think we want to enable it on BDW too after getting some testing
and sanity checking.  The legacy ring buffers aren't getting much
testing elsewhere, and I'm afraid we'll run into issues that don't exist
with the execlists path if we stick with the legacy submission path (in
fact, we may have already hit one).

-- 
Jesse Barnes, Intel Open Source Technology Center


* Re: [PATCH 34/43] drm/i915/bdw: Make sure gpu reset still works with Execlists
  2014-08-01 14:42   ` Damien Lespiau
@ 2014-08-06  9:26     ` Daniel, Thomas
  0 siblings, 0 replies; 137+ messages in thread
From: Daniel, Thomas @ 2014-08-06  9:26 UTC (permalink / raw)
  To: Lespiau, Damien; +Cc: intel-gfx

> -----Original Message-----
> From: Lespiau, Damien
> Sent: Friday, August 01, 2014 3:42 PM
> To: Daniel, Thomas
> Cc: intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH 34/43] drm/i915/bdw: Make sure gpu reset
> still works with Execlists
> 
> On Thu, Jul 24, 2014 at 05:04:42PM +0100, Thomas Daniel wrote:
> > From: Oscar Mateo <oscar.mateo@intel.com>
> >
> > If we reset a ring after a hang, we have to make sure that we clear
> > out all queued Execlists requests.
> >
> > v2: The ring is, at this point, already being correctly re-programmed
> > for Execlists, and the hangcheck counters cleared.
> >
> > v3: Daniel suggests to drop the "if (execlists)" because the Execlists
> > queue should be empty in legacy mode (which is true, if we do the
> > INIT_LIST_HEAD).
> >
> > v4: Do the pending intel_runtime_pm_put
> 
> I don't see an intel_runtime_pm_get() that the put() would correspond to.
The corresponding intel_runtime_pm_get() is in execlists_context_queue(), where the request is added to the queue.

> 
> >
> > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_gem.c         |   12 ++++++++++++
> >  drivers/gpu/drm/i915/intel_ringbuffer.c |    1 +
> >  2 files changed, 13 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c
> > b/drivers/gpu/drm/i915/i915_gem.c index 1c83b9c..143cff7 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -2567,6 +2567,18 @@ static void i915_gem_reset_ring_cleanup(struct
> drm_i915_private *dev_priv,
> >  		i915_gem_free_request(request);
> >  	}
> >
> > +	while (!list_empty(&ring->execlist_queue)) {
> > +		struct intel_ctx_submit_request *submit_req;
> > +
> > +		submit_req = list_first_entry(&ring->execlist_queue,
> > +				struct intel_ctx_submit_request,
> > +				execlist_link);
> > +		list_del(&submit_req->execlist_link);
> > +		intel_runtime_pm_put(dev_priv);
> > +		i915_gem_context_unreference(submit_req->ctx);
> > +		kfree(submit_req);
> > +	}
> > +
> >  	/* These may not have been flush before the reset, do so now */
> >  	kfree(ring->preallocated_lazy_request);
> >  	ring->preallocated_lazy_request = NULL; diff --git
> > a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > index 3188403..6e604c9 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > @@ -1587,6 +1587,7 @@ static int intel_init_ring_buffer(struct drm_device
> *dev,
> >  	ring->dev = dev;
> >  	INIT_LIST_HEAD(&ring->active_list);
> >  	INIT_LIST_HEAD(&ring->request_list);
> > +	INIT_LIST_HEAD(&ring->execlist_queue);
> >  	ringbuf->size = 32 * PAGE_SIZE;
> >  	ringbuf->ring = ring;
> >  	ringbuf->ctx = ring->default_context;
> > --
> > 1.7.9.5
> >


* Re: [PATCH 34/43] drm/i915/bdw: Make sure gpu reset still works with Execlists
  2014-08-01 14:46   ` Damien Lespiau
@ 2014-08-06  9:28     ` Daniel, Thomas
  0 siblings, 0 replies; 137+ messages in thread
From: Daniel, Thomas @ 2014-08-06  9:28 UTC (permalink / raw)
  To: Lespiau, Damien; +Cc: intel-gfx



> -----Original Message-----
> From: Lespiau, Damien
> Sent: Friday, August 01, 2014 3:46 PM
> To: Daniel, Thomas
> Cc: intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH 34/43] drm/i915/bdw: Make sure gpu reset
> still works with Execlists
> 
> On Thu, Jul 24, 2014 at 05:04:42PM +0100, Thomas Daniel wrote:
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c
> b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > index 3188403..6e604c9 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > @@ -1587,6 +1587,7 @@ static int intel_init_ring_buffer(struct drm_device
> *dev,
> >  	ring->dev = dev;
> >  	INIT_LIST_HEAD(&ring->active_list);
> >  	INIT_LIST_HEAD(&ring->request_list);
> > +	INIT_LIST_HEAD(&ring->execlist_queue);
> 
> It's also a bit weird to now have two sites where we initialize this list,
> or am I missing something?
The legacy ringbuffer init path now also has to initialize the queue because
i915_gem_reset_ring_cleanup() now assumes that the list is always valid,
even though it will always be empty in legacy mode.
>
> --
> Damien


* [PATCH 12/43] drm/i915/bdw: Don't write PDP in the legacy way when using LRCs
  2014-07-24 16:04 ` [PATCH 12/43] drm/i915/bdw: Don't write PDP in the legacy way when using LRCs Thomas Daniel
  2014-08-01 13:46   ` Damien Lespiau
@ 2014-08-07 12:17   ` Thomas Daniel
  2014-08-08 15:59     ` Damien Lespiau
                       ` (2 more replies)
  1 sibling, 3 replies; 137+ messages in thread
From: Thomas Daniel @ 2014-08-07 12:17 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

This is mostly for correctness so that we know we are running the LR
context correctly (that is, the PDPs are contained inside the context
object).

v2: Move the check to inside the enable PPGTT function. The switch
happens in two places: the legacy context switch (that we won't hit
when Execlists are enabled) and the PPGTT enable, which unfortunately
we need. This would look much nicer if the ppgtt->enable was part of
the ring init, where it logically belongs.

v3: Move the check to the start of the enable PPGTT function.  None
of the legacy PPGTT enabling is required when using LRCs as the
PPGTT is enabled in the context descriptor and the PDPs are written
in the LRC.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c |    5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 5188936..cfbf272 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -843,6 +843,11 @@ static int gen8_ppgtt_enable(struct i915_hw_ppgtt *ppgtt)
 	struct intel_engine_cs *ring;
 	int j, ret;
 
+	/* In the case of Execlists, we don't want to write the PDPs
+	 * in the legacy way (they live inside the context now) */
+	if (i915.enable_execlists)
+		return 0;
+
 	for_each_ring(ring, dev_priv, j) {
 		I915_WRITE(RING_MODE_GEN7(ring),
 			   _MASKED_BIT_ENABLE(GFX_PPGTT_ENABLE));
-- 
1.7.9.5


* [PATCH 37/43] drm/i915/bdw: Display execlists info in debugfs
  2014-07-24 16:04 ` [PATCH 37/43] drm/i915/bdw: Display execlists info in debugfs Thomas Daniel
  2014-08-01 14:54   ` Damien Lespiau
@ 2014-08-07 12:23   ` Thomas Daniel
  2014-08-08 16:02     ` Damien Lespiau
  1 sibling, 1 reply; 137+ messages in thread
From: Thomas Daniel @ 2014-08-07 12:23 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

v2: Warn and return if LRCs are not enabled.

v3: Grab the Execlists spinlock (noticed by Daniel Vetter).

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>

v4: Lock the struct mutex for atomic state capture

Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c |   80 +++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.c    |    6 ---
 drivers/gpu/drm/i915/intel_lrc.h    |    7 +++
 3 files changed, 87 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index fc39610..f8f0e11 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1674,6 +1674,85 @@ static int i915_context_status(struct seq_file *m, void *unused)
 	return 0;
 }
 
+static int i915_execlists(struct seq_file *m, void *data)
+{
+	struct drm_info_node *node = (struct drm_info_node *) m->private;
+	struct drm_device *dev = node->minor->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_engine_cs *ring;
+	u32 status_pointer;
+	u8 read_pointer;
+	u8 write_pointer;
+	u32 status;
+	u32 ctx_id;
+	struct list_head *cursor;
+	int ring_id, i;
+	int ret;
+
+	if (!i915.enable_execlists) {
+		seq_printf(m, "Logical Ring Contexts are disabled\n");
+		return 0;
+	}
+
+	ret = mutex_lock_interruptible(&dev->struct_mutex);
+	if (ret)
+		return ret;
+
+	for_each_ring(ring, dev_priv, ring_id) {
+		struct intel_ctx_submit_request *head_req = NULL;
+		int count = 0;
+		unsigned long flags;
+
+		seq_printf(m, "%s\n", ring->name);
+
+		status = I915_READ(RING_EXECLIST_STATUS(ring));
+		ctx_id = I915_READ(RING_EXECLIST_STATUS(ring) + 4);
+		seq_printf(m, "\tExeclist status: 0x%08X, context: %u\n",
+				status, ctx_id);
+
+		status_pointer = I915_READ(RING_CONTEXT_STATUS_PTR(ring));
+		seq_printf(m, "\tStatus pointer: 0x%08X\n", status_pointer);
+
+		read_pointer = ring->next_context_status_buffer;
+		write_pointer = status_pointer & 0x07;
+		if (read_pointer > write_pointer)
+			write_pointer += 6;
+		seq_printf(m, "\tRead pointer: 0x%08X, write pointer 0x%08X\n",
+				read_pointer, write_pointer);
+
+		for (i = 0; i < 6; i++) {
+			status = I915_READ(RING_CONTEXT_STATUS_BUF(ring) + 8*i);
+			ctx_id = I915_READ(RING_CONTEXT_STATUS_BUF(ring) + 8*i + 4);
+
+			seq_printf(m, "\tStatus buffer %d: 0x%08X, context: %u\n",
+					i, status, ctx_id);
+		}
+
+		spin_lock_irqsave(&ring->execlist_lock, flags);
+		list_for_each(cursor, &ring->execlist_queue)
+			count++;
+		head_req = list_first_entry_or_null(&ring->execlist_queue,
+				struct intel_ctx_submit_request, execlist_link);
+		spin_unlock_irqrestore(&ring->execlist_lock, flags);
+
+		seq_printf(m, "\t%d requests in queue\n", count);
+		if (head_req) {
+			struct drm_i915_gem_object *ctx_obj;
+
+			ctx_obj = head_req->ctx->engine[ring_id].state;
+			seq_printf(m, "\tHead request id: %u\n",
+					intel_execlists_ctx_id(ctx_obj));
+			seq_printf(m, "\tHead request tail: %u\n", head_req->tail);
+		}
+
+		seq_putc(m, '\n');
+	}
+
+	mutex_unlock(&dev->struct_mutex);
+
+	return 0;
+}
+
 static int i915_gen6_forcewake_count_info(struct seq_file *m, void *data)
 {
 	struct drm_info_node *node = m->private;
@@ -3899,6 +3978,7 @@ static const struct drm_info_list i915_debugfs_list[] = {
 	{"i915_opregion", i915_opregion, 0},
 	{"i915_gem_framebuffer", i915_gem_framebuffer_info, 0},
 	{"i915_context_status", i915_context_status, 0},
+	{"i915_execlists", i915_execlists, 0},
 	{"i915_gen6_forcewake_count", i915_gen6_forcewake_count_info, 0},
 	{"i915_swizzle_info", i915_swizzle_info, 0},
 	{"i915_ppgtt_info", i915_ppgtt_info, 0},
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 829b15d..8056fa4 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -46,12 +46,6 @@
 
 #define GEN8_LR_CONTEXT_ALIGN 4096
 
-#define RING_ELSP(ring)			((ring)->mmio_base+0x230)
-#define RING_EXECLIST_STATUS(ring)	((ring)->mmio_base+0x234)
-#define RING_CONTEXT_CONTROL(ring)	((ring)->mmio_base+0x244)
-#define RING_CONTEXT_STATUS_BUF(ring)	((ring)->mmio_base+0x370)
-#define RING_CONTEXT_STATUS_PTR(ring)	((ring)->mmio_base+0x3a0)
-
 #define RING_EXECLIST_QFULL		(1 << 0x2)
 #define RING_EXECLIST1_VALID		(1 << 0x3)
 #define RING_EXECLIST0_VALID		(1 << 0x4)
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 074b44f..f3b921b 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -24,6 +24,13 @@
 #ifndef _INTEL_LRC_H_
 #define _INTEL_LRC_H_
 
+/* Execlists regs */
+#define RING_ELSP(ring)			((ring)->mmio_base+0x230)
+#define RING_EXECLIST_STATUS(ring)	((ring)->mmio_base+0x234)
+#define RING_CONTEXT_CONTROL(ring)	((ring)->mmio_base+0x244)
+#define RING_CONTEXT_STATUS_BUF(ring)	((ring)->mmio_base+0x370)
+#define RING_CONTEXT_STATUS_PTR(ring)	((ring)->mmio_base+0x3a0)
+
 /* Logical Rings */
 void intel_logical_ring_stop(struct intel_engine_cs *ring);
 void intel_logical_ring_cleanup(struct intel_engine_cs *ring);
-- 
1.7.9.5


* [PATCH 39/43] drm/i915/bdw: Print context state in debugfs
  2014-07-24 16:04 ` [PATCH 39/43] drm/i915/bdw: Print context state " Thomas Daniel
  2014-08-01 15:54   ` Damien Lespiau
@ 2014-08-07 12:24   ` Thomas Daniel
  2014-08-08 15:57     ` Damien Lespiau
  1 sibling, 1 reply; 137+ messages in thread
From: Thomas Daniel @ 2014-08-07 12:24 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <ben@bwidawsk.net>

This has turned out to be really handy in debug so far.

Update:
Since writing this patch, I've gotten similar code upstream for error
state. I've used it quite a bit in debugfs however, and I'd like to keep
it here at least until preemption is working.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

This patch was accidentally dropped in the first Execlists version, and
it has been very useful indeed. Put it back again, but as a standalone
debugfs file.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>

v2: Take the device struct_mutex rather than mode_config mutex for
atomic state capture.

Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c |   52 +++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index aca5ff1..a3c958c 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1695,6 +1695,57 @@ static int i915_context_status(struct seq_file *m, void *unused)
 	return 0;
 }
 
+static int i915_dump_lrc(struct seq_file *m, void *unused)
+{
+	struct drm_info_node *node = (struct drm_info_node *) m->private;
+	struct drm_device *dev = node->minor->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_engine_cs *ring;
+	struct intel_context *ctx;
+	int ret, i;
+
+	if (!i915.enable_execlists) {
+		seq_printf(m, "Logical Ring Contexts are disabled\n");
+		return 0;
+	}
+
+	ret = mutex_lock_interruptible(&dev->struct_mutex);
+	if (ret)
+		return ret;
+
+	list_for_each_entry(ctx, &dev_priv->context_list, link) {
+		for_each_ring(ring, dev_priv, i) {
+			struct drm_i915_gem_object *ctx_obj = ctx->engine[i].state;
+
+			if (ring->default_context == ctx)
+				continue;
+
+			if (ctx_obj) {
+				struct page *page = i915_gem_object_get_page(ctx_obj, 1);
+				uint32_t *reg_state = kmap_atomic(page);
+				int j;
+
+				seq_printf(m, "CONTEXT: %s %u\n", ring->name,
+						intel_execlists_ctx_id(ctx_obj));
+
+				for (j = 0; j < 0x600 / sizeof(u32) / 4; j += 4) {
+					seq_printf(m, "\t[0x%08lx] 0x%08x 0x%08x 0x%08x 0x%08x\n",
+					i915_gem_obj_ggtt_offset(ctx_obj) + 4096 + (j * 4),
+					reg_state[j], reg_state[j + 1],
+					reg_state[j + 2], reg_state[j + 3]);
+				}
+				kunmap_atomic(reg_state);
+
+				seq_putc(m, '\n');
+			}
+		}
+	}
+
+	mutex_unlock(&dev->struct_mutex);
+
+	return 0;
+}
+
 static int i915_execlists(struct seq_file *m, void *data)
 {
 	struct drm_info_node *node = (struct drm_info_node *) m->private;
@@ -3999,6 +4050,7 @@ static const struct drm_info_list i915_debugfs_list[] = {
 	{"i915_opregion", i915_opregion, 0},
 	{"i915_gem_framebuffer", i915_gem_framebuffer_info, 0},
 	{"i915_context_status", i915_context_status, 0},
+	{"i915_dump_lrc", i915_dump_lrc, 0},
 	{"i915_execlists", i915_execlists, 0},
 	{"i915_gen6_forcewake_count", i915_gen6_forcewake_count_info, 0},
 	{"i915_swizzle_info", i915_swizzle_info, 0},
-- 
1.7.9.5


* Re: [PATCH 39/43] drm/i915/bdw: Print context state in debugfs
  2014-08-07 12:24   ` Thomas Daniel
@ 2014-08-08 15:57     ` Damien Lespiau
  0 siblings, 0 replies; 137+ messages in thread
From: Damien Lespiau @ 2014-08-08 15:57 UTC (permalink / raw)
  To: Thomas Daniel; +Cc: intel-gfx

On Thu, Aug 07, 2014 at 01:24:26PM +0100, Thomas Daniel wrote:
> From: Ben Widawsky <ben@bwidawsk.net>
> 
> This has turned out to be really handy in debug so far.
> 
> Update:
> Since writing this patch, I've gotten similar code upstream for error
> state. I've used it quite a bit in debugfs however, and I'd like to keep
> it here at least until preemption is working.
> 
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> 
> This patch was accidentally dropped in the first Execlists version, and
> it has been very useful indeed. Put it back again, but as a standalone
> debugfs file.
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> 
> v2: Take the device struct_mutex rather than mode_config mutex for
> atomic state capture.
> 
> Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>

Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>

-- 
Damien

> ---
>  drivers/gpu/drm/i915/i915_debugfs.c |   52 +++++++++++++++++++++++++++++++++++
>  1 file changed, 52 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index aca5ff1..a3c958c 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -1695,6 +1695,57 @@ static int i915_context_status(struct seq_file *m, void *unused)
>  	return 0;
>  }
>  
> +static int i915_dump_lrc(struct seq_file *m, void *unused)
> +{
> +	struct drm_info_node *node = (struct drm_info_node *) m->private;
> +	struct drm_device *dev = node->minor->dev;
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct intel_engine_cs *ring;
> +	struct intel_context *ctx;
> +	int ret, i;
> +
> +	if (!i915.enable_execlists) {
> +		seq_printf(m, "Logical Ring Contexts are disabled\n");
> +		return 0;
> +	}
> +
> +	ret = mutex_lock_interruptible(&dev->struct_mutex);
> +	if (ret)
> +		return ret;
> +
> +	list_for_each_entry(ctx, &dev_priv->context_list, link) {
> +		for_each_ring(ring, dev_priv, i) {
> +			struct drm_i915_gem_object *ctx_obj = ctx->engine[i].state;
> +
> +			if (ring->default_context == ctx)
> +				continue;
> +
> +			if (ctx_obj) {
> +				struct page *page = i915_gem_object_get_page(ctx_obj, 1);
> +				uint32_t *reg_state = kmap_atomic(page);
> +				int j;
> +
> +				seq_printf(m, "CONTEXT: %s %u\n", ring->name,
> +						intel_execlists_ctx_id(ctx_obj));
> +
> +				for (j = 0; j < 0x600 / sizeof(u32) / 4; j += 4) {
> +					seq_printf(m, "\t[0x%08lx] 0x%08x 0x%08x 0x%08x 0x%08x\n",
> +					i915_gem_obj_ggtt_offset(ctx_obj) + 4096 + (j * 4),
> +					reg_state[j], reg_state[j + 1],
> +					reg_state[j + 2], reg_state[j + 3]);
> +				}
> +				kunmap_atomic(reg_state);
> +
> +				seq_putc(m, '\n');
> +			}
> +		}
> +	}
> +
> +	mutex_unlock(&dev->struct_mutex);
> +
> +	return 0;
> +}
> +
>  static int i915_execlists(struct seq_file *m, void *data)
>  {
>  	struct drm_info_node *node = (struct drm_info_node *) m->private;
> @@ -3999,6 +4050,7 @@ static const struct drm_info_list i915_debugfs_list[] = {
>  	{"i915_opregion", i915_opregion, 0},
>  	{"i915_gem_framebuffer", i915_gem_framebuffer_info, 0},
>  	{"i915_context_status", i915_context_status, 0},
> +	{"i915_dump_lrc", i915_dump_lrc, 0},
>  	{"i915_execlists", i915_execlists, 0},
>  	{"i915_gen6_forcewake_count", i915_gen6_forcewake_count_info, 0},
>  	{"i915_swizzle_info", i915_swizzle_info, 0},
> -- 
> 1.7.9.5
> 


* Re: [PATCH 12/43] drm/i915/bdw: Don't write PDP in the legacy way when using LRCs
  2014-08-07 12:17   ` Thomas Daniel
@ 2014-08-08 15:59     ` Damien Lespiau
  2014-08-11 14:32     ` Daniel Vetter
  2014-08-15 11:01     ` [PATCH] " Thomas Daniel
  2 siblings, 0 replies; 137+ messages in thread
From: Damien Lespiau @ 2014-08-08 15:59 UTC (permalink / raw)
  To: Thomas Daniel; +Cc: intel-gfx

On Thu, Aug 07, 2014 at 01:17:40PM +0100, Thomas Daniel wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> This is mostly for correctness so that we know we are running the LR
> context correctly (that is, the PDPs are contained inside the context
> object).
> 
> v2: Move the check to inside the enable PPGTT function. The switch
> happens in two places: the legacy context switch (that we won't hit
> when Execlists are enabled) and the PPGTT enable, which unfortunately
> we need. This would look much nicer if the ppgtt->enable was part of
> the ring init, where it logically belongs.
> 
> v3: Move the check to the start of the enable PPGTT function.  None
> of the legacy PPGTT enabling is required when using LRCs as the
> PPGTT is enabled in the context descriptor and the PDPs are written
> in the LRC.
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>

Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>

-- 
Damien

> ---
>  drivers/gpu/drm/i915/i915_gem_gtt.c |    5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 5188936..cfbf272 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -843,6 +843,11 @@ static int gen8_ppgtt_enable(struct i915_hw_ppgtt *ppgtt)
>  	struct intel_engine_cs *ring;
>  	int j, ret;
>  
> +	/* In the case of Execlists, we don't want to write the PDPs
> +	 * in the legacy way (they live inside the context now) */
> +	if (i915.enable_execlists)
> +		return 0;
> +
>  	for_each_ring(ring, dev_priv, j) {
>  		I915_WRITE(RING_MODE_GEN7(ring),
>  			   _MASKED_BIT_ENABLE(GFX_PPGTT_ENABLE));
> -- 
> 1.7.9.5
> 


* Re: [PATCH 37/43] drm/i915/bdw: Display execlists info in debugfs
  2014-08-07 12:23   ` Thomas Daniel
@ 2014-08-08 16:02     ` Damien Lespiau
  0 siblings, 0 replies; 137+ messages in thread
From: Damien Lespiau @ 2014-08-08 16:02 UTC (permalink / raw)
  To: Thomas Daniel; +Cc: intel-gfx

On Thu, Aug 07, 2014 at 01:23:20PM +0100, Thomas Daniel wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> v2: Warn and return if LRCs are not enabled.
> 
> v3: Grab the Execlists spinlock (noticed by Daniel Vetter).
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> 
> v4: Lock the struct mutex for atomic state capture
> 
> Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c |   80 +++++++++++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/intel_lrc.c    |    6 ---
>  drivers/gpu/drm/i915/intel_lrc.h    |    7 +++
>  3 files changed, 87 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index fc39610..f8f0e11 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -1674,6 +1674,85 @@ static int i915_context_status(struct seq_file *m, void *unused)
>  	return 0;
>  }
>  
> +static int i915_execlists(struct seq_file *m, void *data)
> +{
> +	struct drm_info_node *node = (struct drm_info_node *) m->private;
> +	struct drm_device *dev = node->minor->dev;
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct intel_engine_cs *ring;
> +	u32 status_pointer;
> +	u8 read_pointer;
> +	u8 write_pointer;
> +	u32 status;
> +	u32 ctx_id;
> +	struct list_head *cursor;
> +	int ring_id, i;
> +	int ret;
> +
> +	if (!i915.enable_execlists) {
> +		seq_printf(m, "Logical Ring Contexts are disabled\n");
> +		return 0;
> +	}

checkpatch.pl will tell you that seq_puts() should be used here, I guess.

Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>

-- 
Damien


* Re: [PATCH 03/43] drm/i915/bdw: Macro for LRCs and module option for Execlists
  2014-07-24 16:04 ` [PATCH 03/43] drm/i915/bdw: Macro for LRCs and module option for Execlists Thomas Daniel
@ 2014-08-11 13:57   ` Daniel Vetter
  0 siblings, 0 replies; 137+ messages in thread
From: Daniel Vetter @ 2014-08-11 13:57 UTC (permalink / raw)
  To: Thomas Daniel; +Cc: intel-gfx

On Thu, Jul 24, 2014 at 05:04:11PM +0100, Thomas Daniel wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> GEN8 brings an expansion of the HW contexts: "Logical Ring Contexts".
> These expanded contexts enable a number of new abilities, especially
> "Execlists".
> 
> The macro is defined to off until we have things in place to hope to
> work.
> 
> v2: Rename "advanced contexts" to the more correct "logical ring
> contexts".
> 
> v3: Add a module parameter to enable execlists. Execlist are relatively
> new, and so it'd be wise to be able to switch back to ring submission
> to debug subtle problems that will inevitably arise.
> 
> v4: Add an intel_enable_execlists function.
> 
> v5: Sanitize early, as suggested by Daniel. Remove lrc_enabled.
> 
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
> Signed-off-by: Damien Lespiau <damien.lespiau@intel.com> (v3)
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2, v4 & v5)
> ---
>  drivers/gpu/drm/i915/i915_drv.h    |    2 ++
>  drivers/gpu/drm/i915/i915_gem.c    |    3 +++
>  drivers/gpu/drm/i915/i915_params.c |    6 ++++++
>  drivers/gpu/drm/i915/intel_lrc.c   |   11 +++++++++++
>  drivers/gpu/drm/i915/intel_lrc.h   |    3 +++
>  5 files changed, 25 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 54c2bd9..a793d6d 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2037,6 +2037,7 @@ struct drm_i915_cmd_table {
>  #define I915_NEED_GFX_HWS(dev)	(INTEL_INFO(dev)->need_gfx_hws)
>  
>  #define HAS_HW_CONTEXTS(dev)	(INTEL_INFO(dev)->gen >= 6)
> +#define HAS_LOGICAL_RING_CONTEXTS(dev)	0
>  #define HAS_ALIASING_PPGTT(dev)	(INTEL_INFO(dev)->gen >= 6)
>  #define HAS_PPGTT(dev)		(INTEL_INFO(dev)->gen >= 7 && !IS_GEN8(dev))
>  #define USES_PPGTT(dev)		intel_enable_ppgtt(dev, false)
> @@ -2122,6 +2123,7 @@ struct i915_params {
>  	int enable_rc6;
>  	int enable_fbc;
>  	int enable_ppgtt;
> +	int enable_execlists;
>  	int enable_psr;
>  	unsigned int preliminary_hw_support;
>  	int disable_power_well;
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index e5d4d73..d8bf4fa 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -4746,6 +4746,9 @@ int i915_gem_init(struct drm_device *dev)
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  	int ret;
>  
> +	i915.enable_execlists = intel_sanitize_enable_execlists(dev,
> +			i915.enable_execlists);

The ordering constraint on i915.enable_ppgtt sanitization is imo
super-fragile. You'll all pay a big price (and get a grumpy maintainer) if
this blows up. I'll add a patch on top to WARN about this.
-Daniel

> +
>  	mutex_lock(&dev->struct_mutex);
>  
>  	if (IS_VALLEYVIEW(dev)) {
> diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
> index bbdee21..7f0fb72 100644
> --- a/drivers/gpu/drm/i915/i915_params.c
> +++ b/drivers/gpu/drm/i915/i915_params.c
> @@ -35,6 +35,7 @@ struct i915_params i915 __read_mostly = {
>  	.vbt_sdvo_panel_type = -1,
>  	.enable_rc6 = -1,
>  	.enable_fbc = -1,
> +	.enable_execlists = -1,
>  	.enable_hangcheck = true,
>  	.enable_ppgtt = -1,
>  	.enable_psr = 1,
> @@ -117,6 +118,11 @@ MODULE_PARM_DESC(enable_ppgtt,
>  	"Override PPGTT usage. "
>  	"(-1=auto [default], 0=disabled, 1=aliasing, 2=full)");
>  
> +module_param_named(enable_execlists, i915.enable_execlists, int, 0400);
> +MODULE_PARM_DESC(enable_execlists,
> +	"Override execlists usage. "
> +	"(-1=auto [default], 0=disabled, 1=enabled)");
> +
>  module_param_named(enable_psr, i915.enable_psr, int, 0600);
>  MODULE_PARM_DESC(enable_psr, "Enable PSR (default: true)");
>  
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 49bb6fc..21f7f1c 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -40,3 +40,14 @@
>  #include <drm/drmP.h>
>  #include <drm/i915_drm.h>
>  #include "i915_drv.h"
> +
> +int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists)
> +{
> +	if (enable_execlists == 0)
> +		return 0;
> +
> +	if (HAS_LOGICAL_RING_CONTEXTS(dev) && USES_PPGTT(dev))
> +		return 1;
> +
> +	return 0;
> +}
> diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
> index f6830a4..75ee9c3 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.h
> +++ b/drivers/gpu/drm/i915/intel_lrc.h
> @@ -24,4 +24,7 @@
>  #ifndef _INTEL_LRC_H_
>  #define _INTEL_LRC_H_
>  
> +/* Execlists */
> +int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists);
> +
>  #endif /* _INTEL_LRC_H_ */
> -- 
> 1.7.9.5
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


* [PATCH] drm/i915: WARN if module opt sanitization goes out of order
  2014-07-24 16:04 ` [PATCH 05/43] drm/i915/bdw: Introduce one context backing object per engine Thomas Daniel
@ 2014-08-11 13:59   ` Daniel Vetter
  2014-08-11 14:28     ` Damien Lespiau
  0 siblings, 1 reply; 137+ messages in thread
From: Daniel Vetter @ 2014-08-11 13:59 UTC (permalink / raw)
  To: Intel Graphics Development; +Cc: Daniel Vetter, Ben Widawsky

Depending upon one module option to be sanitized (through USES_PPGTT)
for the other is a bit too fragile for my taste. At least WARN about
this.

Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Damien Lespiau <damien.lespiau@intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/intel_lrc.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 21f7f1cce86e..44721292eb77 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -43,6 +43,8 @@
 
 int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists)
 {
+	WARN_ON(i915.enable_ppgtt == -1);
+
 	if (enable_execlists == 0)
 		return 0;
 
-- 
2.0.1


* Re: [PATCH 04/43] drm/i915/bdw: Initialization for Logical Ring Contexts
  2014-07-24 16:04 ` [PATCH 04/43] drm/i915/bdw: Initialization for Logical Ring Contexts Thomas Daniel
@ 2014-08-11 14:03   ` Daniel Vetter
  0 siblings, 0 replies; 137+ messages in thread
From: Daniel Vetter @ 2014-08-11 14:03 UTC (permalink / raw)
  To: Thomas Daniel; +Cc: intel-gfx

On Thu, Jul 24, 2014 at 05:04:12PM +0100, Thomas Daniel wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> For the moment this is just a placeholder, but it shows one of the
> main differences between the good ol' HW contexts and the shiny
> new Logical Ring Contexts: LR contexts allocate and free their
> own backing objects. Another difference is that the allocation is
> deferred (as the create function name suggests), but that does not
> happen in this patch yet, because for the moment we are only dealing
> with the default context.
> 
> Early in the series we had our own gen8_gem_context_init/fini
> functions, but the truth is they now look almost the same as the
> legacy hw context init/fini functions. We can always split them
> later if this ceases to be the case.
> 
> Also, we do not fall back to legacy ringbuffers when logical ring
> context initialization fails (not very likely to happen and, even
> if it does, hw contexts would probably fail as well).
> 
> v2: Daniel says "explain, do not showcase".
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem_context.c |   29 +++++++++++++++++++++++------
>  drivers/gpu/drm/i915/intel_lrc.c        |   15 +++++++++++++++
>  drivers/gpu/drm/i915/intel_lrc.h        |    5 +++++
>  3 files changed, 43 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index de72a28..718150e 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -182,7 +182,10 @@ void i915_gem_context_free(struct kref *ctx_ref)
>  						   typeof(*ctx), ref);
>  	struct i915_hw_ppgtt *ppgtt = NULL;
>  
> -	if (ctx->legacy_hw_ctx.rcs_state) {
> +	if (i915.enable_execlists) {
> +		ppgtt = ctx_to_ppgtt(ctx);
> +		intel_lr_context_free(ctx);
> +	} else if (ctx->legacy_hw_ctx.rcs_state) {
>  		/* We refcount even the aliasing PPGTT to keep the code symmetric */
>  		if (USES_PPGTT(ctx->legacy_hw_ctx.rcs_state->base.dev))
>  			ppgtt = ctx_to_ppgtt(ctx);
> @@ -419,7 +422,11 @@ int i915_gem_context_init(struct drm_device *dev)
>  	if (WARN_ON(dev_priv->ring[RCS].default_context))
>  		return 0;
>  
> -	if (HAS_HW_CONTEXTS(dev)) {
> +	if (i915.enable_execlists) {
> +		/* NB: intentionally left blank. We will allocate our own
> +		 * backing objects as we need them, thank you very much */
> +		dev_priv->hw_context_size = 0;
> +	} else if (HAS_HW_CONTEXTS(dev)) {
>  		dev_priv->hw_context_size = round_up(get_context_size(dev), 4096);
>  		if (dev_priv->hw_context_size > (1<<20)) {
>  			DRM_DEBUG_DRIVER("Disabling HW Contexts; invalid size %d\n",
> @@ -435,11 +442,20 @@ int i915_gem_context_init(struct drm_device *dev)
>  		return PTR_ERR(ctx);
>  	}
>  
> -	/* NB: RCS will hold a ref for all rings */
> -	for (i = 0; i < I915_NUM_RINGS; i++)
> -		dev_priv->ring[i].default_context = ctx;
> +	for (i = 0; i < I915_NUM_RINGS; i++) {
> +		struct intel_engine_cs *ring = &dev_priv->ring[i];
> +
> +		/* NB: RCS will hold a ref for all rings */
> +		ring->default_context = ctx;
> +
> +		/* FIXME: we really only want to do this for initialized rings */
> +		if (i915.enable_execlists)
> +			intel_lr_context_deferred_create(ctx, ring);
> +	}
>  
> -	DRM_DEBUG_DRIVER("%s context support initialized\n", dev_priv->hw_context_size ? "HW" : "fake");
> +	DRM_DEBUG_DRIVER("%s context support initialized\n",
> +			i915.enable_execlists ? "LR" :
> +			dev_priv->hw_context_size ? "HW" : "fake");
>  	return 0;
>  }
>  
> @@ -781,6 +797,7 @@ int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
>  	struct intel_context *ctx;
>  	int ret;
>  
> +	/* FIXME: allow user-created LR contexts as well */
>  	if (!hw_context_enabled(dev))
>  		return -ENODEV;
>  
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 21f7f1c..8cc6b55 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -51,3 +51,18 @@ int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists
>  
>  	return 0;
>  }
> +
> +void intel_lr_context_free(struct intel_context *ctx)
> +{
> +	/* TODO */
> +}
> +
> +int intel_lr_context_deferred_create(struct intel_context *ctx,
> +				     struct intel_engine_cs *ring)
> +{
> +	BUG_ON(ctx->legacy_hw_ctx.rcs_state != NULL);

I command thee:

Thou shalt not use BUG_ON except in especially dire circumstances!

Fixed while applying.
-Daniel

> +
> +	/* TODO */
> +
> +	return 0;
> +}
> diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
> index 75ee9c3..3b93572 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.h
> +++ b/drivers/gpu/drm/i915/intel_lrc.h
> @@ -24,6 +24,11 @@
>  #ifndef _INTEL_LRC_H_
>  #define _INTEL_LRC_H_
>  
> +/* Logical Ring Contexts */
> +void intel_lr_context_free(struct intel_context *ctx);
> +int intel_lr_context_deferred_create(struct intel_context *ctx,
> +				     struct intel_engine_cs *ring);
> +
>  /* Execlists */
>  int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists);
>  
> -- 
> 1.7.9.5
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: [PATCH 08/43] drm/i915/bdw: Add a context and an engine pointers to the ringbuffer
  2014-07-24 16:04 ` [PATCH 08/43] drm/i915/bdw: Add a context and an engine pointers to the ringbuffer Thomas Daniel
@ 2014-08-11 14:14   ` Daniel Vetter
  2014-08-11 14:20     ` Daniel Vetter
  0 siblings, 1 reply; 137+ messages in thread
From: Daniel Vetter @ 2014-08-11 14:14 UTC (permalink / raw)
  To: Thomas Daniel; +Cc: intel-gfx

On Thu, Jul 24, 2014 at 05:04:16PM +0100, Thomas Daniel wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> Any given ringbuffer is unequivocally tied to one context and one engine.
> By setting the appropriate pointers to them, the ringbuffer struct holds
> all the information you might need to submit a workload for processing,
> Execlists style.
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_lrc.c        |    2 ++
>  drivers/gpu/drm/i915/intel_ringbuffer.c |    2 ++
>  drivers/gpu/drm/i915/intel_ringbuffer.h |    3 +++
>  3 files changed, 7 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 0a12b8c..2eb7db6 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -132,6 +132,8 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
>  		return ret;
>  	}
>  
> +	ringbuf->ring = ring;
> +	ringbuf->ctx = ctx;
>  	ringbuf->size = 32 * PAGE_SIZE;
>  	ringbuf->effective_size = ringbuf->size;
>  	ringbuf->head = 0;
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 01e9840..279dda4 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1570,6 +1570,8 @@ static int intel_init_ring_buffer(struct drm_device *dev,
>  	INIT_LIST_HEAD(&ring->active_list);
>  	INIT_LIST_HEAD(&ring->request_list);
>  	ringbuf->size = 32 * PAGE_SIZE;
> +	ringbuf->ring = ring;
> +	ringbuf->ctx = ring->default_context;

That doesn't make a whole lot of sense tbh. I fear it's one of these
slight confusions which will take tons of patches to clean up. Why exactly
do we need the ring->ctx pointer?

If we only need this for lrc I want to name it accordingly, to make sure
legacy code doesn't grow stupid ideas. And also we should only initialize
this in the lrc ctx init then.

All patches up to this one merged.
-Daniel

>  	memset(ring->semaphore.sync_seqno, 0, sizeof(ring->semaphore.sync_seqno));
>  
>  	init_waitqueue_head(&ring->irq_queue);
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 053d004..be40788 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -88,6 +88,9 @@ struct intel_ringbuffer {
>  	struct drm_i915_gem_object *obj;
>  	void __iomem *virtual_start;
>  
> +	struct intel_engine_cs *ring;
> +	struct intel_context *ctx;
> +
>  	u32 head;
>  	u32 tail;
>  	int space;
> -- 
> 1.7.9.5
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: [PATCH 08/43] drm/i915/bdw: Add a context and an engine pointers to the ringbuffer
  2014-08-11 14:14   ` Daniel Vetter
@ 2014-08-11 14:20     ` Daniel Vetter
  2014-08-13 13:34       ` Daniel, Thomas
  0 siblings, 1 reply; 137+ messages in thread
From: Daniel Vetter @ 2014-08-11 14:20 UTC (permalink / raw)
  To: Thomas Daniel; +Cc: intel-gfx

On Mon, Aug 11, 2014 at 04:14:13PM +0200, Daniel Vetter wrote:
> On Thu, Jul 24, 2014 at 05:04:16PM +0100, Thomas Daniel wrote:
> > From: Oscar Mateo <oscar.mateo@intel.com>
> > 
> > Any given ringbuffer is unequivocally tied to one context and one engine.
> > By setting the appropriate pointers to them, the ringbuffer struct holds
> > all the information you might need to submit a workload for processing,
> > Execlists style.
> > 
> > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> > ---
> >  drivers/gpu/drm/i915/intel_lrc.c        |    2 ++
> >  drivers/gpu/drm/i915/intel_ringbuffer.c |    2 ++
> >  drivers/gpu/drm/i915/intel_ringbuffer.h |    3 +++
> >  3 files changed, 7 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> > index 0a12b8c..2eb7db6 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -132,6 +132,8 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
> >  		return ret;
> >  	}
> >  
> > +	ringbuf->ring = ring;
> > +	ringbuf->ctx = ctx;
> >  	ringbuf->size = 32 * PAGE_SIZE;
> >  	ringbuf->effective_size = ringbuf->size;
> >  	ringbuf->head = 0;
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > index 01e9840..279dda4 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > @@ -1570,6 +1570,8 @@ static int intel_init_ring_buffer(struct drm_device *dev,
> >  	INIT_LIST_HEAD(&ring->active_list);
> >  	INIT_LIST_HEAD(&ring->request_list);
> >  	ringbuf->size = 32 * PAGE_SIZE;
> > +	ringbuf->ring = ring;
> > +	ringbuf->ctx = ring->default_context;
> 
> That doesn't make a whole lot of sense tbh. I fear it's one of these
> slight confusions which will take tons of patches to clean up. Why exactly
> do we need the ring->ctx pointer?
> 
> If we only need this for lrc I want to name it accordingly, to make sure
> legacy code doesn't grow stupid ideas. And also we should only initialize
> this in the lrc ctx init then.
> 
> All patches up to this one merged.

Ok, I've discussed this quickly with Damien on irc. We decided to cut away
the ring->ctx part of this patch for now to be able to move on.
-Daniel

> -Daniel
> 
> >  	memset(ring->semaphore.sync_seqno, 0, sizeof(ring->semaphore.sync_seqno));
> >  
> >  	init_waitqueue_head(&ring->irq_queue);
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> > index 053d004..be40788 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> > @@ -88,6 +88,9 @@ struct intel_ringbuffer {
> >  	struct drm_i915_gem_object *obj;
> >  	void __iomem *virtual_start;
> >  
> > +	struct intel_engine_cs *ring;
> > +	struct intel_context *ctx;
> > +
> >  	u32 head;
> >  	u32 tail;
> >  	int space;
> > -- 
> > 1.7.9.5
> > 
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: [PATCH 10/43] drm/i915/bdw: Deferred creation of user-created LRCs
  2014-07-24 16:04 ` [PATCH 10/43] drm/i915/bdw: Deferred creation of user-created LRCs Thomas Daniel
@ 2014-08-11 14:25   ` Daniel Vetter
  0 siblings, 0 replies; 137+ messages in thread
From: Daniel Vetter @ 2014-08-11 14:25 UTC (permalink / raw)
  To: Thomas Daniel; +Cc: intel-gfx

On Thu, Jul 24, 2014 at 05:04:18PM +0100, Thomas Daniel wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> The backing objects and ringbuffers for contexts created via open
> fd are actually empty until the user starts sending execbuffers to
> them. At that point, we allocate & populate them. We do this because,
> at create time, we really don't know which engine is going to be used
> with the context later on (and we don't want to waste memory on
> objects that we might never use).
> 
> v2: As contexts created via ioctl can only be used with the render
> ring, we have enough information to allocate & populate them right
> away.
> 
> v3: Defer the creation always, even with ioctl-created contexts, as
> requested by Daniel Vetter.
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>

This patch seems to miss the hunk to remove

		/* FIXME: we really only want to do this for initialized rings */
		if (i915.enable_execlists)
			intel_lr_context_deferred_create(ctx, ring);

Also I just realized that that code completely lacks error handling.

If it's not already there please quickly submit a fixup patch.
-Daniel

> ---
>  drivers/gpu/drm/i915/i915_gem_context.c    |    7 +++----
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c |    8 ++++++++
>  2 files changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 48d7476..fbe7278 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -784,9 +784,9 @@ int i915_switch_context(struct intel_engine_cs *ring,
>  	return do_switch(ring, to);
>  }
>  
> -static bool hw_context_enabled(struct drm_device *dev)
> +static bool contexts_enabled(struct drm_device *dev)
>  {
> -	return to_i915(dev)->hw_context_size;
> +	return i915.enable_execlists || to_i915(dev)->hw_context_size;
>  }
>  
>  int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
> @@ -797,8 +797,7 @@ int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
>  	struct intel_context *ctx;
>  	int ret;
>  
> -	/* FIXME: allow user-created LR contexts as well */
> -	if (!hw_context_enabled(dev))
> +	if (!contexts_enabled(dev))
>  		return -ENODEV;
>  
>  	ret = i915_mutex_lock_interruptible(dev);
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index c5115957..4e9b387 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -951,6 +951,14 @@ i915_gem_validate_context(struct drm_device *dev, struct drm_file *file,
>  		return ERR_PTR(-EIO);
>  	}
>  
> +	if (i915.enable_execlists && !ctx->engine[ring->id].state) {
> +		int ret = intel_lr_context_deferred_create(ctx, ring);
> +		if (ret) {
> +			DRM_DEBUG("Could not create LRC %u: %d\n", ctx_id, ret);
> +			return ERR_PTR(ret);
> +		}
> +	}
> +
>  	return ctx;
>  }
>  
> -- 
> 1.7.9.5
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: [PATCH] drm/i915: WARN if module opt sanitization goes out of order
  2014-08-11 13:59   ` [PATCH] drm/i915: WARN if module opt sanitization goes out of order Daniel Vetter
@ 2014-08-11 14:28     ` Damien Lespiau
  0 siblings, 0 replies; 137+ messages in thread
From: Damien Lespiau @ 2014-08-11 14:28 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Intel Graphics Development, Ben Widawsky

On Mon, Aug 11, 2014 at 03:59:52PM +0200, Daniel Vetter wrote:
> Depending upon one module option to be sanitized (through USES_PPGTT)
> for the other is a bit too fragile for my taste. At least WARN about
> this.
> 
> Cc: Ben Widawsky <ben@bwidawsk.net>
> Cc: Damien Lespiau <damien.lespiau@intel.com>
> Cc: Oscar Mateo <oscar.mateo@intel.com>
> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Sounds like a safety measure we'd want, at least until we try to fix these
kinds of dependencies in a more adequate manner.

Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>

-- 
Damien

> ---
>  drivers/gpu/drm/i915/intel_lrc.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 21f7f1cce86e..44721292eb77 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -43,6 +43,8 @@
>  
>  int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists)
>  {
> +	WARN_ON(i915.enable_ppgtt == -1);
> +
>  	if (enable_execlists == 0)
>  		return 0;
>  
> -- 
> 2.0.1
> 


* Re: [PATCH 11/43] drm/i915/bdw: Render moot context reset and switch with Execlists
  2014-07-24 16:04 ` [PATCH 11/43] drm/i915/bdw: Render moot context reset and switch with Execlists Thomas Daniel
@ 2014-08-11 14:30   ` Daniel Vetter
  2014-08-15 10:22     ` Daniel, Thomas
  2014-08-20 15:29   ` [PATCH] " Thomas Daniel
  1 sibling, 1 reply; 137+ messages in thread
From: Daniel Vetter @ 2014-08-11 14:30 UTC (permalink / raw)
  To: Thomas Daniel; +Cc: intel-gfx

On Thu, Jul 24, 2014 at 05:04:19PM +0100, Thomas Daniel wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> These two functions make no sense in a Logical Ring Context & Execlists
> world.
> 
> v2: We got rid of lrc_enabled and centralized everything in the sanitized
> i915.enable_execlists instead.
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem_context.c |    9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index fbe7278..288f5de 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -380,6 +380,9 @@ void i915_gem_context_reset(struct drm_device *dev)
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  	int i;
>  
> +	if (i915.enable_execlists)
> +		return;

This will conflict badly with Alistair's patch at a functional level. I'm
pretty sure that we want _some_ form of reset for the context state, since
the hw didn't just magically load the previously running context. So NACK
on this hunk.

> +
>  	/* Prevent the hardware from restoring the last context (which hung) on
>  	 * the next switch */
>  	for (i = 0; i < I915_NUM_RINGS; i++) {
> @@ -514,6 +517,9 @@ int i915_gem_context_enable(struct drm_i915_private *dev_priv)
>  		ppgtt->enable(ppgtt);
>  	}
>  
> +	if (i915.enable_execlists)
> +		return 0;

Again this conflicts with Alistair's patch. Furthermore it looks redundant
since you no-op out i915_switch_context separately.

> +
>  	/* FIXME: We should make this work, even in reset */
>  	if (i915_reset_in_progress(&dev_priv->gpu_error))
>  		return 0;
> @@ -769,6 +775,9 @@ int i915_switch_context(struct intel_engine_cs *ring,
>  {
>  	struct drm_i915_private *dev_priv = ring->dev->dev_private;
>  
> +	if (i915.enable_execlists)
> +		return 0;

I'd hoped we wouldn't need this with the legacy ringbuffer cmd submission
fully split out. If there are other paths (resume, gpu reset) where this
comes into play then I guess we need to look at where the best place is to
make this call. So until this comes with a bit a better justification I'll
punt on this for now.
-Daniel

> +
>  	WARN_ON(!mutex_is_locked(&dev_priv->dev->struct_mutex));
>  
>  	if (to->legacy_hw_ctx.rcs_state == NULL) { /* We have the fake context */
> -- 
> 1.7.9.5
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: [PATCH 12/43] drm/i915/bdw: Don't write PDP in the legacy way when using LRCs
  2014-08-07 12:17   ` Thomas Daniel
  2014-08-08 15:59     ` Damien Lespiau
@ 2014-08-11 14:32     ` Daniel Vetter
  2014-08-15 11:01     ` [PATCH] " Thomas Daniel
  2 siblings, 0 replies; 137+ messages in thread
From: Daniel Vetter @ 2014-08-11 14:32 UTC (permalink / raw)
  To: Thomas Daniel; +Cc: intel-gfx

On Thu, Aug 07, 2014 at 01:17:40PM +0100, Thomas Daniel wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> This is mostly for correctness so that we know we are running the LR
> context correctly (this is, the PDPs are contained inside the context
> object).
> 
> v2: Move the check to inside the enable PPGTT function. The switch
> happens in two places: the legacy context switch (that we won't hit
> when Execlists are enabled) and the PPGTT enable, which unfortunately
> we need. This would look much nicer if the ppgtt->enable was part of
> the ring init, where it logically belongs.
> 
> v3: Move the check to the start of the enable PPGTT function.  None
> of the legacy PPGTT enabling is required when using LRCs as the
> PPGTT is enabled in the context descriptor and the PDPs are written
> in the LRC.
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem_gtt.c |    5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 5188936..cfbf272 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -843,6 +843,11 @@ static int gen8_ppgtt_enable(struct i915_hw_ppgtt *ppgtt)
>  	struct intel_engine_cs *ring;
>  	int j, ret;
>  
> +	/* In the case of Execlists, we don't want to write the PDPs

This comment is misleading since you now also disable the ppgtt enable bit
setting. Note that the code disabled in v2 of this patch is for aliasing
ppgtt mode only anyway, so it's irrelevant for execlist mode. Aside: A few
patches from myself will clear this up a bit better.

As-is this just looks confusing, so I'll punt for now.
-Daniel

> +	 * in the legacy way (they live inside the context now) */
> +	if (i915.enable_execlists)
> +		return 0;
> +
>  	for_each_ring(ring, dev_priv, j) {
>  		I915_WRITE(RING_MODE_GEN7(ring),
>  			   _MASKED_BIT_ENABLE(GFX_PPGTT_ENABLE));
> -- 
> 1.7.9.5
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: [PATCH 13/43] drm/i915: Abstract the legacy workload submission mechanism away
  2014-07-24 16:04 ` [PATCH 13/43] drm/i915: Abstract the legacy workload submission mechanism away Thomas Daniel
@ 2014-08-11 14:36   ` Daniel Vetter
  2014-08-11 14:39     ` Daniel Vetter
  2014-08-11 14:39   ` Daniel Vetter
  2014-08-11 15:02   ` Daniel Vetter
  2 siblings, 1 reply; 137+ messages in thread
From: Daniel Vetter @ 2014-08-11 14:36 UTC (permalink / raw)
  To: Thomas Daniel; +Cc: intel-gfx

On Thu, Jul 24, 2014 at 05:04:21PM +0100, Thomas Daniel wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> As suggested by Daniel Vetter. The idea, in subsequent patches, is to
> provide an alternative to these vfuncs for the Execlists submission
> mechanism.
> 
> v2: Split into two and reordered to illustrate our intentions, instead
> of showing it off. Also, removed the add_request vfunc and added the
> stop_ring one.
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h            |   24 ++++++++++++++++++++++++
>  drivers/gpu/drm/i915/i915_gem.c            |   15 +++++++++++----
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c |   20 ++++++++++----------
>  3 files changed, 45 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index ff2c373..1caed52 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1617,6 +1617,21 @@ struct drm_i915_private {
>  	/* Old ums support infrastructure, same warning applies. */
>  	struct i915_ums_state ums;
>  
> +	/* Abstract the submission mechanism (legacy ringbuffer or execlists) away */
> +	struct {
> +		int (*do_execbuf) (struct drm_device *dev, struct drm_file *file,
> +				   struct intel_engine_cs *ring,
> +				   struct intel_context *ctx,
> +				   struct drm_i915_gem_execbuffer2 *args,
> +				   struct list_head *vmas,
> +				   struct drm_i915_gem_object *batch_obj,
> +				   u64 exec_start, u32 flags);
> +		int (*init_rings) (struct drm_device *dev);
> +		void (*cleanup_ring) (struct intel_engine_cs *ring);
> +		void (*stop_ring) (struct intel_engine_cs *ring);

My rule of thumb is that with just one caller it's probably better not to
have a vtable - it just makes the code flow harder to follow.

Usually (with non-borked code design at least) init/teardown functions
have only one caller, so there's a good chance I'll ask you to remove them
again.
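
The trade-off can be sketched in a few lines of C. This is an illustrative toy only (all names invented, not the i915 API): with a vtable slot, the call site alone no longer tells the reader which function runs, while a direct call is greppable.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative toy only -- invented names, not the i915 API. */

struct demo_engine {
	int id;
};

int demo_init_rings(struct demo_engine *e)
{
	return e->id;
}

/* Vtable style: the reader has to find where .init_rings was
 * assigned before knowing what this call does. */
struct demo_submission_ops {
	int (*init_rings)(struct demo_engine *e);
};

int demo_init_hw_vtable(const struct demo_submission_ops *ops,
			struct demo_engine *e)
{
	if (ops == NULL || ops->init_rings == NULL)
		return -1;
	return ops->init_rings(e);
}

/* Direct style: with a single caller the target is obvious. */
int demo_init_hw_direct(struct demo_engine *e)
{
	return demo_init_rings(e);
}
```

The indirection only pays off once a second implementation (here, an execlists path) is assigned to the same slot, which is exactly what the later patches in the series do.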

> +		bool (*is_ring_initialized) (struct intel_engine_cs *ring);

This one is unused and I didn't really like the taste of it. So I killed
it.
-Daniel

> +	} gt;
> +
>  	/*
>  	 * NOTE: This is the dri1/ums dungeon, don't add stuff here. Your patch
>  	 * will be rejected. Instead look for a better place.
> @@ -2224,6 +2239,14 @@ int i915_gem_set_domain_ioctl(struct drm_device *dev, void *data,
>  			      struct drm_file *file_priv);
>  int i915_gem_sw_finish_ioctl(struct drm_device *dev, void *data,
>  			     struct drm_file *file_priv);
> +int i915_gem_ringbuffer_submission(struct drm_device *dev,
> +				   struct drm_file *file,
> +				   struct intel_engine_cs *ring,
> +				   struct intel_context *ctx,
> +				   struct drm_i915_gem_execbuffer2 *args,
> +				   struct list_head *vmas,
> +				   struct drm_i915_gem_object *batch_obj,
> +				   u64 exec_start, u32 flags);
>  int i915_gem_execbuffer(struct drm_device *dev, void *data,
>  			struct drm_file *file_priv);
>  int i915_gem_execbuffer2(struct drm_device *dev, void *data,
> @@ -2376,6 +2399,7 @@ void i915_gem_reset(struct drm_device *dev);
>  bool i915_gem_clflush_object(struct drm_i915_gem_object *obj, bool force);
>  int __must_check i915_gem_object_finish_gpu(struct drm_i915_gem_object *obj);
>  int __must_check i915_gem_init(struct drm_device *dev);
> +int i915_gem_init_rings(struct drm_device *dev);
>  int __must_check i915_gem_init_hw(struct drm_device *dev);
>  int i915_gem_l3_remap(struct intel_engine_cs *ring, int slice);
>  void i915_gem_init_swizzling(struct drm_device *dev);
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index d8bf4fa..6544286 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -4518,7 +4518,7 @@ i915_gem_stop_ringbuffers(struct drm_device *dev)
>  	int i;
>  
>  	for_each_ring(ring, dev_priv, i)
> -		intel_stop_ring_buffer(ring);
> +		dev_priv->gt.stop_ring(ring);
>  }
>  
>  int
> @@ -4635,7 +4635,7 @@ intel_enable_blt(struct drm_device *dev)
>  	return true;
>  }
>  
> -static int i915_gem_init_rings(struct drm_device *dev)
> +int i915_gem_init_rings(struct drm_device *dev)
>  {
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  	int ret;
> @@ -4718,7 +4718,7 @@ i915_gem_init_hw(struct drm_device *dev)
>  
>  	i915_gem_init_swizzling(dev);
>  
> -	ret = i915_gem_init_rings(dev);
> +	ret = dev_priv->gt.init_rings(dev);
>  	if (ret)
>  		return ret;
>  
> @@ -4759,6 +4759,13 @@ int i915_gem_init(struct drm_device *dev)
>  			DRM_DEBUG_DRIVER("allow wake ack timed out\n");
>  	}
>  
> +	if (!i915.enable_execlists) {
> +		dev_priv->gt.do_execbuf = i915_gem_ringbuffer_submission;
> +		dev_priv->gt.init_rings = i915_gem_init_rings;
> +		dev_priv->gt.cleanup_ring = intel_cleanup_ring_buffer;
> +		dev_priv->gt.stop_ring = intel_stop_ring_buffer;
> +	}
> +
>  	i915_gem_init_userptr(dev);
>  	i915_gem_init_global_gtt(dev);
>  
> @@ -4794,7 +4801,7 @@ i915_gem_cleanup_ringbuffer(struct drm_device *dev)
>  	int i;
>  
>  	for_each_ring(ring, dev_priv, i)
> -		intel_cleanup_ring_buffer(ring);
> +		dev_priv->gt.cleanup_ring(ring);
>  }
>  
>  int
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 4e9b387..8c63d79 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -1034,14 +1034,14 @@ i915_reset_gen7_sol_offsets(struct drm_device *dev,
>  	return 0;
>  }
>  
> -static int
> -legacy_ringbuffer_submission(struct drm_device *dev, struct drm_file *file,
> -			     struct intel_engine_cs *ring,
> -			     struct intel_context *ctx,
> -			     struct drm_i915_gem_execbuffer2 *args,
> -			     struct list_head *vmas,
> -			     struct drm_i915_gem_object *batch_obj,
> -			     u64 exec_start, u32 flags)
> +int
> +i915_gem_ringbuffer_submission(struct drm_device *dev, struct drm_file *file,
> +			       struct intel_engine_cs *ring,
> +			       struct intel_context *ctx,
> +			       struct drm_i915_gem_execbuffer2 *args,
> +			       struct list_head *vmas,
> +			       struct drm_i915_gem_object *batch_obj,
> +			       u64 exec_start, u32 flags)
>  {
>  	struct drm_clip_rect *cliprects = NULL;
>  	struct drm_i915_private *dev_priv = dev->dev_private;
> @@ -1408,8 +1408,8 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>  	else
>  		exec_start += i915_gem_obj_offset(batch_obj, vm);
>  
> -	ret = legacy_ringbuffer_submission(dev, file, ring, ctx,
> -			args, &eb->vmas, batch_obj, exec_start, flags);
> +	ret = dev_priv->gt.do_execbuf(dev, file, ring, ctx, args,
> +			&eb->vmas, batch_obj, exec_start, flags);
>  	if (ret)
>  		goto err;
>  
> -- 
> 1.7.9.5
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 13/43] drm/i915: Abstract the legacy workload submission mechanism away
  2014-08-11 14:36   ` Daniel Vetter
@ 2014-08-11 14:39     ` Daniel Vetter
  0 siblings, 0 replies; 137+ messages in thread
From: Daniel Vetter @ 2014-08-11 14:39 UTC (permalink / raw)
  To: Thomas Daniel; +Cc: intel-gfx

On Mon, Aug 11, 2014 at 04:36:53PM +0200, Daniel Vetter wrote:
> On Thu, Jul 24, 2014 at 05:04:21PM +0100, Thomas Daniel wrote:
> > From: Oscar Mateo <oscar.mateo@intel.com>
> > 
> > As suggested by Daniel Vetter. The idea, in subsequent patches, is to
> > provide an alternative to these vfuncs for the Execlists submission
> > mechanism.
> > 
> > v2: Split into two and reordered to illustrate our intentions, instead
> > of showing it off. Also, removed the add_request vfunc and added the
> > stop_ring one.
> > 
> > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_drv.h            |   24 ++++++++++++++++++++++++
> >  drivers/gpu/drm/i915/i915_gem.c            |   15 +++++++++++----
> >  drivers/gpu/drm/i915/i915_gem_execbuffer.c |   20 ++++++++++----------
> >  3 files changed, 45 insertions(+), 14 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index ff2c373..1caed52 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -1617,6 +1617,21 @@ struct drm_i915_private {
> >  	/* Old ums support infrastructure, same warning applies. */
> >  	struct i915_ums_state ums;
> >  
> > +	/* Abstract the submission mechanism (legacy ringbuffer or execlists) away */
> > +	struct {
> > +		int (*do_execbuf) (struct drm_device *dev, struct drm_file *file,
> > +				   struct intel_engine_cs *ring,
> > +				   struct intel_context *ctx,
> > +				   struct drm_i915_gem_execbuffer2 *args,
> > +				   struct list_head *vmas,
> > +				   struct drm_i915_gem_object *batch_obj,
> > +				   u64 exec_start, u32 flags);
> > +		int (*init_rings) (struct drm_device *dev);
> > +		void (*cleanup_ring) (struct intel_engine_cs *ring);
> > +		void (*stop_ring) (struct intel_engine_cs *ring);
> 
> My rule of thumb is that with just one caller it's probably better to not
> have a vtable - just makes it harder to follow the code flow.
> 
> Usually (with non-borked code design at least) init/teardown functions
> have only one caller, so there's a good chance I'll ask you to remove them
> again.

Also checkpatch (and my eyes!) were unhappy about the space you've put
between ) and (. Fixed that too.
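
For reference, the complaint is about a space between the ')' that closes a function-pointer name and the '(' that opens its parameter list. A minimal sketch (invented names, not the actual gt struct):

```c
#include <assert.h>

/* checkpatch flags:   int (*stop_ring) (struct engine *e);
 * preferred style:    int (*stop_ring)(struct engine *e);
 */

typedef int (*demo_hook)(int value);	/* checkpatch-clean form */

int demo_double(int value)
{
	return value * 2;
}

int demo_run(demo_hook hook, int value)
{
	return hook(value);
}
```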
-Daniel

> 
> > +		bool (*is_ring_initialized) (struct intel_engine_cs *ring);
> 
> This one is unused and I didn't really like the taste of it. So I killed
> it.
> -Daniel
> 
> > +	} gt;
> > +
> >  	/*
> >  	 * NOTE: This is the dri1/ums dungeon, don't add stuff here. Your patch
> >  	 * will be rejected. Instead look for a better place.
> > @@ -2224,6 +2239,14 @@ int i915_gem_set_domain_ioctl(struct drm_device *dev, void *data,
> >  			      struct drm_file *file_priv);
> >  int i915_gem_sw_finish_ioctl(struct drm_device *dev, void *data,
> >  			     struct drm_file *file_priv);
> > +int i915_gem_ringbuffer_submission(struct drm_device *dev,
> > +				   struct drm_file *file,
> > +				   struct intel_engine_cs *ring,
> > +				   struct intel_context *ctx,
> > +				   struct drm_i915_gem_execbuffer2 *args,
> > +				   struct list_head *vmas,
> > +				   struct drm_i915_gem_object *batch_obj,
> > +				   u64 exec_start, u32 flags);
> >  int i915_gem_execbuffer(struct drm_device *dev, void *data,
> >  			struct drm_file *file_priv);
> >  int i915_gem_execbuffer2(struct drm_device *dev, void *data,
> > @@ -2376,6 +2399,7 @@ void i915_gem_reset(struct drm_device *dev);
> >  bool i915_gem_clflush_object(struct drm_i915_gem_object *obj, bool force);
> >  int __must_check i915_gem_object_finish_gpu(struct drm_i915_gem_object *obj);
> >  int __must_check i915_gem_init(struct drm_device *dev);
> > +int i915_gem_init_rings(struct drm_device *dev);
> >  int __must_check i915_gem_init_hw(struct drm_device *dev);
> >  int i915_gem_l3_remap(struct intel_engine_cs *ring, int slice);
> >  void i915_gem_init_swizzling(struct drm_device *dev);
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index d8bf4fa..6544286 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -4518,7 +4518,7 @@ i915_gem_stop_ringbuffers(struct drm_device *dev)
> >  	int i;
> >  
> >  	for_each_ring(ring, dev_priv, i)
> > -		intel_stop_ring_buffer(ring);
> > +		dev_priv->gt.stop_ring(ring);
> >  }
> >  
> >  int
> > @@ -4635,7 +4635,7 @@ intel_enable_blt(struct drm_device *dev)
> >  	return true;
> >  }
> >  
> > -static int i915_gem_init_rings(struct drm_device *dev)
> > +int i915_gem_init_rings(struct drm_device *dev)
> >  {
> >  	struct drm_i915_private *dev_priv = dev->dev_private;
> >  	int ret;
> > @@ -4718,7 +4718,7 @@ i915_gem_init_hw(struct drm_device *dev)
> >  
> >  	i915_gem_init_swizzling(dev);
> >  
> > -	ret = i915_gem_init_rings(dev);
> > +	ret = dev_priv->gt.init_rings(dev);
> >  	if (ret)
> >  		return ret;
> >  
> > @@ -4759,6 +4759,13 @@ int i915_gem_init(struct drm_device *dev)
> >  			DRM_DEBUG_DRIVER("allow wake ack timed out\n");
> >  	}
> >  
> > +	if (!i915.enable_execlists) {
> > +		dev_priv->gt.do_execbuf = i915_gem_ringbuffer_submission;
> > +		dev_priv->gt.init_rings = i915_gem_init_rings;
> > +		dev_priv->gt.cleanup_ring = intel_cleanup_ring_buffer;
> > +		dev_priv->gt.stop_ring = intel_stop_ring_buffer;
> > +	}
> > +
> >  	i915_gem_init_userptr(dev);
> >  	i915_gem_init_global_gtt(dev);
> >  
> > @@ -4794,7 +4801,7 @@ i915_gem_cleanup_ringbuffer(struct drm_device *dev)
> >  	int i;
> >  
> >  	for_each_ring(ring, dev_priv, i)
> > -		intel_cleanup_ring_buffer(ring);
> > +		dev_priv->gt.cleanup_ring(ring);
> >  }
> >  
> >  int
> > diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > index 4e9b387..8c63d79 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > @@ -1034,14 +1034,14 @@ i915_reset_gen7_sol_offsets(struct drm_device *dev,
> >  	return 0;
> >  }
> >  
> > -static int
> > -legacy_ringbuffer_submission(struct drm_device *dev, struct drm_file *file,
> > -			     struct intel_engine_cs *ring,
> > -			     struct intel_context *ctx,
> > -			     struct drm_i915_gem_execbuffer2 *args,
> > -			     struct list_head *vmas,
> > -			     struct drm_i915_gem_object *batch_obj,
> > -			     u64 exec_start, u32 flags)
> > +int
> > +i915_gem_ringbuffer_submission(struct drm_device *dev, struct drm_file *file,
> > +			       struct intel_engine_cs *ring,
> > +			       struct intel_context *ctx,
> > +			       struct drm_i915_gem_execbuffer2 *args,
> > +			       struct list_head *vmas,
> > +			       struct drm_i915_gem_object *batch_obj,
> > +			       u64 exec_start, u32 flags)
> >  {
> >  	struct drm_clip_rect *cliprects = NULL;
> >  	struct drm_i915_private *dev_priv = dev->dev_private;
> > @@ -1408,8 +1408,8 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
> >  	else
> >  		exec_start += i915_gem_obj_offset(batch_obj, vm);
> >  
> > -	ret = legacy_ringbuffer_submission(dev, file, ring, ctx,
> > -			args, &eb->vmas, batch_obj, exec_start, flags);
> > +	ret = dev_priv->gt.do_execbuf(dev, file, ring, ctx, args,
> > +			&eb->vmas, batch_obj, exec_start, flags);
> >  	if (ret)
> >  		goto err;
> >  
> > -- 
> > 1.7.9.5
> > 
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: [PATCH 13/43] drm/i915: Abstract the legacy workload submission mechanism away
  2014-07-24 16:04 ` [PATCH 13/43] drm/i915: Abstract the legacy workload submission mechanism away Thomas Daniel
  2014-08-11 14:36   ` Daniel Vetter
@ 2014-08-11 14:39   ` Daniel Vetter
  2014-08-11 15:02   ` Daniel Vetter
  2 siblings, 0 replies; 137+ messages in thread
From: Daniel Vetter @ 2014-08-11 14:39 UTC (permalink / raw)
  To: Thomas Daniel; +Cc: intel-gfx

On Thu, Jul 24, 2014 at 05:04:21PM +0100, Thomas Daniel wrote:
> @@ -1408,8 +1408,8 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>  	else
>  		exec_start += i915_gem_obj_offset(batch_obj, vm);
>  
> -	ret = legacy_ringbuffer_submission(dev, file, ring, ctx,
> -			args, &eb->vmas, batch_obj, exec_start, flags);
> +	ret = dev_priv->gt.do_execbuf(dev, file, ring, ctx, args,
> +			&eb->vmas, batch_obj, exec_start, flags);

Also, the line here is misaligned and over-long. Fixed, too.
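
The convention being enforced is that a wrapped argument list lines up under the first argument of the call, with lines kept within 80 columns. A small sketch, with invented names standing in for the do_execbuf call:

```c
#include <assert.h>

/* Invented stand-in for a many-argument call like gt.do_execbuf. */
int demo_submit(int dev, int file, int ring, int ctx,
		int args, int vmas, int batch, int start)
{
	return dev + file + ring + ctx + args + vmas + batch + start;
}

int demo_caller(void)
{
	/* Continuation line aligned under the first argument: */
	return demo_submit(1, 2, 3, 4,
			   5, 6, 7, 8);
}
```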
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: [PATCH 15/43] drm/i915/bdw: Generic logical ring init and cleanup
  2014-07-24 16:04 ` [PATCH 15/43] drm/i915/bdw: Generic logical ring init and cleanup Thomas Daniel
@ 2014-08-11 15:01   ` Daniel Vetter
  0 siblings, 0 replies; 137+ messages in thread
From: Daniel Vetter @ 2014-08-11 15:01 UTC (permalink / raw)
  To: Thomas Daniel; +Cc: intel-gfx

On Thu, Jul 24, 2014 at 05:04:23PM +0100, Thomas Daniel wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> Allocate and populate the default LRC for every ring, call
> gen-specific init/cleanup, init/fini the command parser and
> set the status page (now inside the LRC object). These are
> things all engines/rings have in common.
> 
> Stopping the ring before cleanup and initializing the seqnos
> is left as a TODO task (we need more infrastructure in place
> before we can achieve this).
> 
> v2: Check the ringbuffer backing obj for ring_is_initialized,
> instead of the context backing obj (similar, but not exactly
> the same).
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem_context.c |    4 ---
>  drivers/gpu/drm/i915/intel_lrc.c        |   54 +++++++++++++++++++++++++++++--
>  drivers/gpu/drm/i915/intel_ringbuffer.c |   17 ++++++++++
>  drivers/gpu/drm/i915/intel_ringbuffer.h |    6 +---
>  4 files changed, 70 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 288f5de..9085ff1 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -450,10 +450,6 @@ int i915_gem_context_init(struct drm_device *dev)
>  
>  		/* NB: RCS will hold a ref for all rings */
>  		ring->default_context = ctx;
> -
> -		/* FIXME: we really only want to do this for initialized rings */
> -		if (i915.enable_execlists)
> -			intel_lr_context_deferred_create(ctx, ring);

Ah, here we go ;-)

>  	}
>  
>  	DRM_DEBUG_DRIVER("%s context support initialized\n",
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index cb56bb8..05b7069 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -108,12 +108,60 @@ void intel_logical_ring_stop(struct intel_engine_cs *ring)
>  
>  void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
>  {
> -	/* TODO */
> +	if (!intel_ring_initialized(ring))
> +		return;
> +
> +	/* TODO: make sure the ring is stopped */
> +	ring->preallocated_lazy_request = NULL;
> +	ring->outstanding_lazy_seqno = 0;
> +
> +	if (ring->cleanup)
> +		ring->cleanup(ring);
> +
> +	i915_cmd_parser_fini_ring(ring);
> +
> +	if (ring->status_page.obj) {
> +		kunmap(sg_page(ring->status_page.obj->pages->sgl));
> +		ring->status_page.obj = NULL;
> +	}
>  }
>  
>  static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *ring)
>  {
> -	/* TODO */
> +	int ret;
> +	struct intel_context *dctx = ring->default_context;
> +	struct drm_i915_gem_object *dctx_obj;
> +
> +	/* Intentionally left blank. */
> +	ring->buffer = NULL;
> +
> +	ring->dev = dev;
> +	INIT_LIST_HEAD(&ring->active_list);
> +	INIT_LIST_HEAD(&ring->request_list);
> +	init_waitqueue_head(&ring->irq_queue);
> +
> +	ret = intel_lr_context_deferred_create(dctx, ring);
> +	if (ret)
> +		return ret;
> +
> +	/* The status page is offset 0 from the context object in LRCs. */
> +	dctx_obj = dctx->engine[ring->id].state;
> +	ring->status_page.gfx_addr = i915_gem_obj_ggtt_offset(dctx_obj);
> +	ring->status_page.page_addr = kmap(sg_page(dctx_obj->pages->sgl));
> +	if (ring->status_page.page_addr == NULL)
> +		return -ENOMEM;
> +	ring->status_page.obj = dctx_obj;
> +
> +	ret = i915_cmd_parser_init_ring(ring);
> +	if (ret)
> +		return ret;
> +
> +	if (ring->init) {
> +		ret = ring->init(ring);
> +		if (ret)
> +			return ret;
> +	}
> +
>  	return 0;
>  }
>  
> @@ -397,6 +445,8 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
>  	int ret;
>  
>  	BUG_ON(ctx->legacy_hw_ctx.rcs_state != NULL);
> +	if (ctx->engine[ring->id].state)
> +		return 0;
>  
>  	context_size = round_up(get_lr_context_size(ring), 4096);
>  
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 279dda4..20eb1a4 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -40,6 +40,23 @@
>   */
>  #define CACHELINE_BYTES 64
>  
> +bool
> +intel_ring_initialized(struct intel_engine_cs *ring)
> +{
> +	struct drm_device *dev = ring->dev;
> +
> +	if (!dev)
> +		return false;
> +
> +	if (i915.enable_execlists) {
> +		struct intel_context *dctx = ring->default_context;
> +		struct intel_ringbuffer *ringbuf = dctx->engine[ring->id].ringbuf;
> +
> +		return ringbuf->obj;
> +	} else
> +		return ring->buffer && ring->buffer->obj;
> +}

Looks like I'll not regret having ditched gt.ring_is_initialized.
-Daniel

> +
>  static inline int __ring_space(int head, int tail, int size)
>  {
>  	int space = head - (tail + I915_RING_FREE_SPACE);
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index be40788..7203ee2 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -288,11 +288,7 @@ struct  intel_engine_cs {
>  	u32 (*get_cmd_length_mask)(u32 cmd_header);
>  };
>  
> -static inline bool
> -intel_ring_initialized(struct intel_engine_cs *ring)
> -{
> -	return ring->buffer && ring->buffer->obj;
> -}
> +bool intel_ring_initialized(struct intel_engine_cs *ring);
>  
>  static inline unsigned
>  intel_ring_flag(struct intel_engine_cs *ring)
> -- 
> 1.7.9.5
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: [PATCH 13/43] drm/i915: Abstract the legacy workload submission mechanism away
  2014-07-24 16:04 ` [PATCH 13/43] drm/i915: Abstract the legacy workload submission mechanism away Thomas Daniel
  2014-08-11 14:36   ` Daniel Vetter
  2014-08-11 14:39   ` Daniel Vetter
@ 2014-08-11 15:02   ` Daniel Vetter
  2 siblings, 0 replies; 137+ messages in thread
From: Daniel Vetter @ 2014-08-11 15:02 UTC (permalink / raw)
  To: Thomas Daniel; +Cc: intel-gfx

On Thu, Jul 24, 2014 at 05:04:21PM +0100, Thomas Daniel wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> As suggested by Daniel Vetter. The idea, in subsequent patches, is to
> provide an alternative to these vfuncs for the Execlists submission
> mechanism.
> 
> v2: Split into two and reordered to illustrate our intentions, instead
> of showing it off. Also, removed the add_request vfunc and added the
> stop_ring one.
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h            |   24 ++++++++++++++++++++++++
>  drivers/gpu/drm/i915/i915_gem.c            |   15 +++++++++++----
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c |   20 ++++++++++----------
>  3 files changed, 45 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index ff2c373..1caed52 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1617,6 +1617,21 @@ struct drm_i915_private {
>  	/* Old ums support infrastructure, same warning applies. */
>  	struct i915_ums_state ums;
>  
> +	/* Abstract the submission mechanism (legacy ringbuffer or execlists) away */
> +	struct {
> +		int (*do_execbuf) (struct drm_device *dev, struct drm_file *file,
> +				   struct intel_engine_cs *ring,
> +				   struct intel_context *ctx,
> +				   struct drm_i915_gem_execbuffer2 *args,
> +				   struct list_head *vmas,
> +				   struct drm_i915_gem_object *batch_obj,
> +				   u64 exec_start, u32 flags);
> +		int (*init_rings) (struct drm_device *dev);
> +		void (*cleanup_ring) (struct intel_engine_cs *ring);
> +		void (*stop_ring) (struct intel_engine_cs *ring);
> +		bool (*is_ring_initialized) (struct intel_engine_cs *ring);

Aside: ring here is a bit of a misnomer; we're really talking about engines. I
guess at the end of all this we should throw a few patches on top to name
things correctly with ring/engine where appropriate.
-Daniel

> +	} gt;
> +
>  	/*
>  	 * NOTE: This is the dri1/ums dungeon, don't add stuff here. Your patch
>  	 * will be rejected. Instead look for a better place.
> @@ -2224,6 +2239,14 @@ int i915_gem_set_domain_ioctl(struct drm_device *dev, void *data,
>  			      struct drm_file *file_priv);
>  int i915_gem_sw_finish_ioctl(struct drm_device *dev, void *data,
>  			     struct drm_file *file_priv);
> +int i915_gem_ringbuffer_submission(struct drm_device *dev,
> +				   struct drm_file *file,
> +				   struct intel_engine_cs *ring,
> +				   struct intel_context *ctx,
> +				   struct drm_i915_gem_execbuffer2 *args,
> +				   struct list_head *vmas,
> +				   struct drm_i915_gem_object *batch_obj,
> +				   u64 exec_start, u32 flags);
>  int i915_gem_execbuffer(struct drm_device *dev, void *data,
>  			struct drm_file *file_priv);
>  int i915_gem_execbuffer2(struct drm_device *dev, void *data,
> @@ -2376,6 +2399,7 @@ void i915_gem_reset(struct drm_device *dev);
>  bool i915_gem_clflush_object(struct drm_i915_gem_object *obj, bool force);
>  int __must_check i915_gem_object_finish_gpu(struct drm_i915_gem_object *obj);
>  int __must_check i915_gem_init(struct drm_device *dev);
> +int i915_gem_init_rings(struct drm_device *dev);
>  int __must_check i915_gem_init_hw(struct drm_device *dev);
>  int i915_gem_l3_remap(struct intel_engine_cs *ring, int slice);
>  void i915_gem_init_swizzling(struct drm_device *dev);
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index d8bf4fa..6544286 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -4518,7 +4518,7 @@ i915_gem_stop_ringbuffers(struct drm_device *dev)
>  	int i;
>  
>  	for_each_ring(ring, dev_priv, i)
> -		intel_stop_ring_buffer(ring);
> +		dev_priv->gt.stop_ring(ring);
>  }
>  
>  int
> @@ -4635,7 +4635,7 @@ intel_enable_blt(struct drm_device *dev)
>  	return true;
>  }
>  
> -static int i915_gem_init_rings(struct drm_device *dev)
> +int i915_gem_init_rings(struct drm_device *dev)
>  {
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  	int ret;
> @@ -4718,7 +4718,7 @@ i915_gem_init_hw(struct drm_device *dev)
>  
>  	i915_gem_init_swizzling(dev);
>  
> -	ret = i915_gem_init_rings(dev);
> +	ret = dev_priv->gt.init_rings(dev);
>  	if (ret)
>  		return ret;
>  
> @@ -4759,6 +4759,13 @@ int i915_gem_init(struct drm_device *dev)
>  			DRM_DEBUG_DRIVER("allow wake ack timed out\n");
>  	}
>  
> +	if (!i915.enable_execlists) {
> +		dev_priv->gt.do_execbuf = i915_gem_ringbuffer_submission;
> +		dev_priv->gt.init_rings = i915_gem_init_rings;
> +		dev_priv->gt.cleanup_ring = intel_cleanup_ring_buffer;
> +		dev_priv->gt.stop_ring = intel_stop_ring_buffer;
> +	}
> +
>  	i915_gem_init_userptr(dev);
>  	i915_gem_init_global_gtt(dev);
>  
> @@ -4794,7 +4801,7 @@ i915_gem_cleanup_ringbuffer(struct drm_device *dev)
>  	int i;
>  
>  	for_each_ring(ring, dev_priv, i)
> -		intel_cleanup_ring_buffer(ring);
> +		dev_priv->gt.cleanup_ring(ring);
>  }
>  
>  int
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 4e9b387..8c63d79 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -1034,14 +1034,14 @@ i915_reset_gen7_sol_offsets(struct drm_device *dev,
>  	return 0;
>  }
>  
> -static int
> -legacy_ringbuffer_submission(struct drm_device *dev, struct drm_file *file,
> -			     struct intel_engine_cs *ring,
> -			     struct intel_context *ctx,
> -			     struct drm_i915_gem_execbuffer2 *args,
> -			     struct list_head *vmas,
> -			     struct drm_i915_gem_object *batch_obj,
> -			     u64 exec_start, u32 flags)
> +int
> +i915_gem_ringbuffer_submission(struct drm_device *dev, struct drm_file *file,
> +			       struct intel_engine_cs *ring,
> +			       struct intel_context *ctx,
> +			       struct drm_i915_gem_execbuffer2 *args,
> +			       struct list_head *vmas,
> +			       struct drm_i915_gem_object *batch_obj,
> +			       u64 exec_start, u32 flags)
>  {
>  	struct drm_clip_rect *cliprects = NULL;
>  	struct drm_i915_private *dev_priv = dev->dev_private;
> @@ -1408,8 +1408,8 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>  	else
>  		exec_start += i915_gem_obj_offset(batch_obj, vm);
>  
> -	ret = legacy_ringbuffer_submission(dev, file, ring, ctx,
> -			args, &eb->vmas, batch_obj, exec_start, flags);
> +	ret = dev_priv->gt.do_execbuf(dev, file, ring, ctx, args,
> +			&eb->vmas, batch_obj, exec_start, flags);
>  	if (ret)
>  		goto err;
>  
> -- 
> 1.7.9.5
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: [PATCH 16/43] drm/i915/bdw: GEN-specific logical ring init
  2014-07-24 16:04 ` [PATCH 16/43] drm/i915/bdw: GEN-specific logical ring init Thomas Daniel
@ 2014-08-11 15:04   ` Daniel Vetter
  0 siblings, 0 replies; 137+ messages in thread
From: Daniel Vetter @ 2014-08-11 15:04 UTC (permalink / raw)
  To: Thomas Daniel; +Cc: intel-gfx

On Thu, Jul 24, 2014 at 05:04:24PM +0100, Thomas Daniel wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> Logical rings do not need most of the initialization their
> legacy ringbuffer counterparts do: we just need the pipe
> control object for the render ring, enable Execlists on the
> hardware and a few workarounds.
> 
> v2: Squash with: "drm/i915: Extract pipe control fini & make
> init outside accesible".
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_lrc.c        |   54 +++++++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/intel_ringbuffer.c |   34 +++++++++++--------
>  drivers/gpu/drm/i915/intel_ringbuffer.h |    3 ++
>  3 files changed, 78 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 05b7069..7c8b75e 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -106,6 +106,49 @@ void intel_logical_ring_stop(struct intel_engine_cs *ring)
>  	/* TODO */
>  }
>  
> +static int gen8_init_common_ring(struct intel_engine_cs *ring)
> +{
> +	struct drm_device *dev = ring->dev;
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +
> +	I915_WRITE(RING_MODE_GEN7(ring),
> +		_MASKED_BIT_DISABLE(GFX_REPLAY_MODE) |
> +		_MASKED_BIT_ENABLE(GFX_RUN_LIST_ENABLE));

Please build up a closer relationship with checkpatch.pl, thanks.

Fixed while merging.
-Daniel

> +	POSTING_READ(RING_MODE_GEN7(ring));
> +	DRM_DEBUG_DRIVER("Execlists enabled for %s\n", ring->name);
> +
> +	memset(&ring->hangcheck, 0, sizeof(ring->hangcheck));
> +
> +	return 0;
> +}
> +
> +static int gen8_init_render_ring(struct intel_engine_cs *ring)
> +{
> +	struct drm_device *dev = ring->dev;
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	int ret;
> +
> +	ret = gen8_init_common_ring(ring);
> +	if (ret)
> +		return ret;
> +
> +	/* We need to disable the AsyncFlip performance optimisations in order
> +	 * to use MI_WAIT_FOR_EVENT within the CS. It should already be
> +	 * programmed to '1' on all products.
> +	 *
> +	 * WaDisableAsyncFlipPerfMode:snb,ivb,hsw,vlv,bdw,chv
> +	 */
> +	I915_WRITE(MI_MODE, _MASKED_BIT_ENABLE(ASYNC_FLIP_PERF_DISABLE));
> +
> +	ret = intel_init_pipe_control(ring);
> +	if (ret)
> +		return ret;
> +
> +	I915_WRITE(INSTPM, _MASKED_BIT_ENABLE(INSTPM_FORCE_ORDERING));
> +
> +	return ret;
> +}
> +
>  void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
>  {
>  	if (!intel_ring_initialized(ring))
> @@ -176,6 +219,9 @@ static int logical_render_ring_init(struct drm_device *dev)
>  	ring->irq_enable_mask =
>  		GT_RENDER_USER_INTERRUPT << GEN8_RCS_IRQ_SHIFT;
>  
> +	ring->init = gen8_init_render_ring;
> +	ring->cleanup = intel_fini_pipe_control;
> +
>  	return logical_ring_init(dev, ring);
>  }
>  
> @@ -190,6 +236,8 @@ static int logical_bsd_ring_init(struct drm_device *dev)
>  	ring->irq_enable_mask =
>  		GT_RENDER_USER_INTERRUPT << GEN8_VCS1_IRQ_SHIFT;
>  
> +	ring->init = gen8_init_common_ring;
> +
>  	return logical_ring_init(dev, ring);
>  }
>  
> @@ -204,6 +252,8 @@ static int logical_bsd2_ring_init(struct drm_device *dev)
>  	ring->irq_enable_mask =
>  		GT_RENDER_USER_INTERRUPT << GEN8_VCS2_IRQ_SHIFT;
>  
> +	ring->init = gen8_init_common_ring;
> +
>  	return logical_ring_init(dev, ring);
>  }
>  
> @@ -218,6 +268,8 @@ static int logical_blt_ring_init(struct drm_device *dev)
>  	ring->irq_enable_mask =
>  		GT_RENDER_USER_INTERRUPT << GEN8_BCS_IRQ_SHIFT;
>  
> +	ring->init = gen8_init_common_ring;
> +
>  	return logical_ring_init(dev, ring);
>  }
>  
> @@ -232,6 +284,8 @@ static int logical_vebox_ring_init(struct drm_device *dev)
>  	ring->irq_enable_mask =
>  		GT_RENDER_USER_INTERRUPT << GEN8_VECS_IRQ_SHIFT;
>  
> +	ring->init = gen8_init_common_ring;
> +
>  	return logical_ring_init(dev, ring);
>  }
>  
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 20eb1a4..ca45c58 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -573,8 +573,25 @@ out:
>  	return ret;
>  }
>  
> -static int
> -init_pipe_control(struct intel_engine_cs *ring)
> +void
> +intel_fini_pipe_control(struct intel_engine_cs *ring)
> +{
> +	struct drm_device *dev = ring->dev;
> +
> +	if (ring->scratch.obj == NULL)
> +		return;
> +
> +	if (INTEL_INFO(dev)->gen >= 5) {
> +		kunmap(sg_page(ring->scratch.obj->pages->sgl));
> +		i915_gem_object_ggtt_unpin(ring->scratch.obj);
> +	}
> +
> +	drm_gem_object_unreference(&ring->scratch.obj->base);
> +	ring->scratch.obj = NULL;
> +}
> +
> +int
> +intel_init_pipe_control(struct intel_engine_cs *ring)
>  {
>  	int ret;
>  
> @@ -649,7 +666,7 @@ static int init_render_ring(struct intel_engine_cs *ring)
>  			   _MASKED_BIT_ENABLE(GFX_REPLAY_MODE));
>  
>  	if (INTEL_INFO(dev)->gen >= 5) {
> -		ret = init_pipe_control(ring);
> +		ret = intel_init_pipe_control(ring);
>  		if (ret)
>  			return ret;
>  	}
> @@ -684,16 +701,7 @@ static void render_ring_cleanup(struct intel_engine_cs *ring)
>  		dev_priv->semaphore_obj = NULL;
>  	}
>  
> -	if (ring->scratch.obj == NULL)
> -		return;
> -
> -	if (INTEL_INFO(dev)->gen >= 5) {
> -		kunmap(sg_page(ring->scratch.obj->pages->sgl));
> -		i915_gem_object_ggtt_unpin(ring->scratch.obj);
> -	}
> -
> -	drm_gem_object_unreference(&ring->scratch.obj->base);
> -	ring->scratch.obj = NULL;
> +	intel_fini_pipe_control(ring);
>  }
>  
>  static int gen8_rcs_signal(struct intel_engine_cs *signaller,
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 7203ee2..c135334 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -380,6 +380,9 @@ void intel_ring_init_seqno(struct intel_engine_cs *ring, u32 seqno);
>  int intel_ring_flush_all_caches(struct intel_engine_cs *ring);
>  int intel_ring_invalidate_all_caches(struct intel_engine_cs *ring);
>  
> +void intel_fini_pipe_control(struct intel_engine_cs *ring);
> +int intel_init_pipe_control(struct intel_engine_cs *ring);
> +
>  int intel_init_render_ring_buffer(struct drm_device *dev);
>  int intel_init_bsd_ring_buffer(struct drm_device *dev);
>  int intel_init_bsd2_ring_buffer(struct drm_device *dev);
> -- 
> 1.7.9.5
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 17/43] drm/i915/bdw: GEN-specific logical ring set/get seqno
  2014-07-24 16:04 ` [PATCH 17/43] drm/i915/bdw: GEN-specific logical ring set/get seqno Thomas Daniel
@ 2014-08-11 15:05   ` Daniel Vetter
  0 siblings, 0 replies; 137+ messages in thread
From: Daniel Vetter @ 2014-08-11 15:05 UTC (permalink / raw)
  To: Thomas Daniel; +Cc: intel-gfx

On Thu, Jul 24, 2014 at 05:04:25PM +0100, Thomas Daniel wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> No mystery here: the seqno is still retrieved from the engine's
> HW status page (the one in the default context; for the moment,
> I see no reason to worry about other contexts' HWS pages).
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>

Ok, merged or bikeshedded up to this one. I need a bit a break to
recharge, will continue later on.

Cheers, Daniel
> ---
>  drivers/gpu/drm/i915/intel_lrc.c |   20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 7c8b75e..f171fd5 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -149,6 +149,16 @@ static int gen8_init_render_ring(struct intel_engine_cs *ring)
>  	return ret;
>  }
>  
> +static u32 gen8_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency)
> +{
> +	return intel_read_status_page(ring, I915_GEM_HWS_INDEX);
> +}
> +
> +static void gen8_set_seqno(struct intel_engine_cs *ring, u32 seqno)
> +{
> +	intel_write_status_page(ring, I915_GEM_HWS_INDEX, seqno);
> +}
> +
>  void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
>  {
>  	if (!intel_ring_initialized(ring))
> @@ -221,6 +231,8 @@ static int logical_render_ring_init(struct drm_device *dev)
>  
>  	ring->init = gen8_init_render_ring;
>  	ring->cleanup = intel_fini_pipe_control;
> +	ring->get_seqno = gen8_get_seqno;
> +	ring->set_seqno = gen8_set_seqno;
>  
>  	return logical_ring_init(dev, ring);
>  }
> @@ -237,6 +249,8 @@ static int logical_bsd_ring_init(struct drm_device *dev)
>  		GT_RENDER_USER_INTERRUPT << GEN8_VCS1_IRQ_SHIFT;
>  
>  	ring->init = gen8_init_common_ring;
> +	ring->get_seqno = gen8_get_seqno;
> +	ring->set_seqno = gen8_set_seqno;
>  
>  	return logical_ring_init(dev, ring);
>  }
> @@ -253,6 +267,8 @@ static int logical_bsd2_ring_init(struct drm_device *dev)
>  		GT_RENDER_USER_INTERRUPT << GEN8_VCS2_IRQ_SHIFT;
>  
>  	ring->init = gen8_init_common_ring;
> +	ring->get_seqno = gen8_get_seqno;
> +	ring->set_seqno = gen8_set_seqno;
>  
>  	return logical_ring_init(dev, ring);
>  }
> @@ -269,6 +285,8 @@ static int logical_blt_ring_init(struct drm_device *dev)
>  		GT_RENDER_USER_INTERRUPT << GEN8_BCS_IRQ_SHIFT;
>  
>  	ring->init = gen8_init_common_ring;
> +	ring->get_seqno = gen8_get_seqno;
> +	ring->set_seqno = gen8_set_seqno;
>  
>  	return logical_ring_init(dev, ring);
>  }
> @@ -285,6 +303,8 @@ static int logical_vebox_ring_init(struct drm_device *dev)
>  		GT_RENDER_USER_INTERRUPT << GEN8_VECS_IRQ_SHIFT;
>  
>  	ring->init = gen8_init_common_ring;
> +	ring->get_seqno = gen8_get_seqno;
> +	ring->set_seqno = gen8_set_seqno;
>  
>  	return logical_ring_init(dev, ring);
>  }
> -- 
> 1.7.9.5
> 



* Re: [PATCH 25/43] drm/i915/bdw: Workload submission mechanism for Execlists
  2014-07-24 16:04 ` [PATCH 25/43] drm/i915/bdw: Workload submission mechanism for Execlists Thomas Daniel
@ 2014-08-11 20:30   ` Daniel Vetter
  0 siblings, 0 replies; 137+ messages in thread
From: Daniel Vetter @ 2014-08-11 20:30 UTC (permalink / raw)
  To: Thomas Daniel; +Cc: intel-gfx

On Thu, Jul 24, 2014 at 05:04:33PM +0100, Thomas Daniel wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> This is what i915_gem_do_execbuffer calls when it wants to execute some
> workload in an Execlists world.
> 
> v2: Check arguments before doing stuff in intel_execlists_submission. Also,
> get rel_constants parsing right.
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h            |    6 ++
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c |    4 +-
>  drivers/gpu/drm/i915/intel_lrc.c           |  130 +++++++++++++++++++++++++++-
>  3 files changed, 137 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 1caed52..4303e2c 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2239,6 +2239,12 @@ int i915_gem_set_domain_ioctl(struct drm_device *dev, void *data,
>  			      struct drm_file *file_priv);
>  int i915_gem_sw_finish_ioctl(struct drm_device *dev, void *data,
>  			     struct drm_file *file_priv);
> +void i915_gem_execbuffer_move_to_active(struct list_head *vmas,
> +					struct intel_engine_cs *ring);
> +void i915_gem_execbuffer_retire_commands(struct drm_device *dev,
> +					 struct drm_file *file,
> +					 struct intel_engine_cs *ring,
> +					 struct drm_i915_gem_object *obj);
>  int i915_gem_ringbuffer_submission(struct drm_device *dev,
>  				   struct drm_file *file,
>  				   struct intel_engine_cs *ring,
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 8c63d79..cae7df8 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -962,7 +962,7 @@ i915_gem_validate_context(struct drm_device *dev, struct drm_file *file,
>  	return ctx;
>  }
>  
> -static void
> +void
>  i915_gem_execbuffer_move_to_active(struct list_head *vmas,
>  				   struct intel_engine_cs *ring)
>  {
> @@ -994,7 +994,7 @@ i915_gem_execbuffer_move_to_active(struct list_head *vmas,
>  	}
>  }
>  
> -static void
> +void
>  i915_gem_execbuffer_retire_commands(struct drm_device *dev,
>  				    struct drm_file *file,
>  				    struct intel_engine_cs *ring,
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 55ee8dd..cd834b3 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -89,6 +89,57 @@ int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists
>  	return 0;
>  }
>  
> +static int logical_ring_invalidate_all_caches(struct intel_ringbuffer *ringbuf)
> +{
> +	struct intel_engine_cs *ring = ringbuf->ring;
> +	uint32_t flush_domains;
> +	int ret;
> +
> +	flush_domains = 0;
> +	if (ring->gpu_caches_dirty)
> +		flush_domains = I915_GEM_GPU_DOMAINS;
> +
> +	ret = ring->emit_flush(ringbuf, I915_GEM_GPU_DOMAINS, flush_domains);
> +	if (ret)
> +		return ret;
> +
> +	ring->gpu_caches_dirty = false;
> +	return 0;
> +}
> +
> +static int execlists_move_to_gpu(struct intel_ringbuffer *ringbuf,
> +				 struct list_head *vmas)
> +{
> +	struct intel_engine_cs *ring = ringbuf->ring;
> +	struct i915_vma *vma;
> +	uint32_t flush_domains = 0;
> +	bool flush_chipset = false;
> +	int ret;
> +
> +	list_for_each_entry(vma, vmas, exec_list) {
> +		struct drm_i915_gem_object *obj = vma->obj;
> +		ret = i915_gem_object_sync(obj, ring);
> +		if (ret)
> +			return ret;
> +
> +		if (obj->base.write_domain & I915_GEM_DOMAIN_CPU)
> +			flush_chipset |= i915_gem_clflush_object(obj, false);
> +
> +		flush_domains |= obj->base.write_domain;
> +	}
> +
> +	if (flush_chipset)
> +		i915_gem_chipset_flush(ring->dev);

chipset flush is gen5 and earlier only. I'll ditch it.
> +
> +	if (flush_domains & I915_GEM_DOMAIN_GTT)
> +		wmb();
> +
> +	/* Unconditionally invalidate gpu caches and ensure that we do flush
> +	 * any residual writes from the previous batch.
> +	 */
> +	return logical_ring_invalidate_all_caches(ringbuf);
> +}
> +
>  int intel_execlists_submission(struct drm_device *dev, struct drm_file *file,
>  			       struct intel_engine_cs *ring,
>  			       struct intel_context *ctx,
> @@ -97,7 +148,84 @@ int intel_execlists_submission(struct drm_device *dev, struct drm_file *file,
>  			       struct drm_i915_gem_object *batch_obj,
>  			       u64 exec_start, u32 flags)
>  {
> -	/* TODO */
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct intel_ringbuffer *ringbuf = ctx->engine[ring->id].ringbuf;
> +	int instp_mode;
> +	u32 instp_mask;
> +	int ret;
> +
> +	instp_mode = args->flags & I915_EXEC_CONSTANTS_MASK;
> +	instp_mask = I915_EXEC_CONSTANTS_MASK;
> +	switch (instp_mode) {
> +	case I915_EXEC_CONSTANTS_REL_GENERAL:
> +	case I915_EXEC_CONSTANTS_ABSOLUTE:
> +	case I915_EXEC_CONSTANTS_REL_SURFACE:
> +		if (instp_mode != 0 && ring != &dev_priv->ring[RCS]) {
> +			DRM_DEBUG("non-0 rel constants mode on non-RCS\n");
> +			return -EINVAL;
> +		}
> +
> +		if (instp_mode != dev_priv->relative_constants_mode) {
> +			if (instp_mode == I915_EXEC_CONSTANTS_REL_SURFACE) {
> +				DRM_DEBUG("rel surface constants mode invalid on gen5+\n");
> +				return -EINVAL;
> +			}
> +
> +			/* The HW changed the meaning on this bit on gen6 */
> +			instp_mask &= ~I915_EXEC_CONSTANTS_REL_SURFACE;
> +		}
> +		break;
> +	default:
> +		DRM_DEBUG("execbuf with unknown constants: %d\n", instp_mode);
> +		return -EINVAL;
> +	}
> +
> +	if (args->num_cliprects != 0) {
> +		DRM_DEBUG("clip rectangles are only valid on pre-gen5\n");
> +		return -EINVAL;
> +	} else {
> +		if (args->DR4 == 0xffffffff) {
> +			DRM_DEBUG("UXA submitting garbage DR4, fixing up\n");
> +			args->DR4 = 0;
> +		}
> +
> +		if (args->DR1 || args->DR4 || args->cliprects_ptr) {
> +			DRM_DEBUG("0 cliprects but dirt in cliprects fields\n");
> +			return -EINVAL;
> +		}
> +	}

Yay for all the legacy nonsense that we can ditch here ;-)
-Daniel

> +
> +	if (args->flags & I915_EXEC_GEN7_SOL_RESET) {
> +		DRM_DEBUG("sol reset is gen7 only\n");
> +		return -EINVAL;
> +	}
> +
> +	ret = execlists_move_to_gpu(ringbuf, vmas);
> +	if (ret)
> +		return ret;
> +
> +	if (ring == &dev_priv->ring[RCS] &&
> +			instp_mode != dev_priv->relative_constants_mode) {
> +		ret = intel_logical_ring_begin(ringbuf, 4);
> +		if (ret)
> +			return ret;
> +
> +		intel_logical_ring_emit(ringbuf, MI_NOOP);
> +		intel_logical_ring_emit(ringbuf, MI_LOAD_REGISTER_IMM(1));
> +		intel_logical_ring_emit(ringbuf, INSTPM);
> +		intel_logical_ring_emit(ringbuf, instp_mask << 16 | instp_mode);
> +		intel_logical_ring_advance(ringbuf);
> +
> +		dev_priv->relative_constants_mode = instp_mode;
> +	}
> +
> +	ret = ring->emit_bb_start(ringbuf, exec_start, flags);
> +	if (ret)
> +		return ret;
> +
> +	i915_gem_execbuffer_move_to_active(vmas, ring);
> +	i915_gem_execbuffer_retire_commands(dev, file, ring, batch_obj);
> +
>  	return 0;
>  }
>  
> -- 
> 1.7.9.5
> 



* Re: [PATCH 26/43] drm/i915/bdw: Always use MMIO flips with Execlists
  2014-07-24 16:04 ` [PATCH 26/43] drm/i915/bdw: Always use MMIO flips with Execlists Thomas Daniel
@ 2014-08-11 20:34   ` Daniel Vetter
  0 siblings, 0 replies; 137+ messages in thread
From: Daniel Vetter @ 2014-08-11 20:34 UTC (permalink / raw)
  To: Thomas Daniel; +Cc: intel-gfx

On Thu, Jul 24, 2014 at 05:04:34PM +0100, Thomas Daniel wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> The normal flip function emits commands into the ring in the legacy
> way, so we either fix that or always force MMIO flips, as we do in
> this patch.
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_display.c |    2 ++
>  drivers/gpu/drm/i915/intel_lrc.c     |    3 ++-
>  2 files changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 5ed6a1a..8129af4 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -9482,6 +9482,8 @@ static bool use_mmio_flip(struct intel_engine_cs *ring,
>  		return false;
>  	else if (i915.use_mmio_flip > 0)
>  		return true;
> +	else if (i915.enable_execlists)
> +		return true;
>  	else
>  		return ring != obj->ring;
>  }
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index cd834b3..0a04c03 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -83,7 +83,8 @@ int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists
>  	if (enable_execlists == 0)
>  		return 0;
>  
> -	if (HAS_LOGICAL_RING_CONTEXTS(dev) && USES_PPGTT(dev))
> +	if (HAS_LOGICAL_RING_CONTEXTS(dev) && USES_PPGTT(dev) &&
> +			i915.use_mmio_flip >= 0)

Almost every patch gets the alignment wrong somewhere. Not terribly
amused.
-Daniel

>  		return 1;
>  
>  	return 0;
> -- 
> 1.7.9.5
> 



* Re: [PATCH 18/43] drm/i915/bdw: New logical ring submission mechanism
  2014-07-24 16:04 ` [PATCH 18/43] drm/i915/bdw: New logical ring submission mechanism Thomas Daniel
@ 2014-08-11 20:40   ` Daniel Vetter
  0 siblings, 0 replies; 137+ messages in thread
From: Daniel Vetter @ 2014-08-11 20:40 UTC (permalink / raw)
  To: Thomas Daniel; +Cc: intel-gfx

On Thu, Jul 24, 2014 at 05:04:26PM +0100, Thomas Daniel wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> Well, new-ish: if all this code looks familiar, that's because it's
> a clone of the existing submission mechanism (with some modifications
> here and there to adapt it to LRCs and Execlists).
> 
> And why did we do this instead of reusing code, one might wonder?
> Well, there are some fears that the differences are big enough that
> they will end up breaking all platforms.
> 
> Also, Execlists offer several advantages, like control over when the
> GPU is done with a given workload, that can help simplify the
> submission mechanism, no doubt. I am interested in getting Execlists
> to work first and foremost, but in the future this parallel submission
> mechanism will help us to fine tune the mechanism without affecting
> old gens.
> 
> v2: Pass the ringbuffer only (whenever possible).
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_lrc.c        |  193 +++++++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/intel_lrc.h        |   12 ++
>  drivers/gpu/drm/i915/intel_ringbuffer.c |   20 ++--
>  drivers/gpu/drm/i915/intel_ringbuffer.h |    3 +
>  4 files changed, 218 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index f171fd5..bd37d51 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -106,6 +106,199 @@ void intel_logical_ring_stop(struct intel_engine_cs *ring)
>  	/* TODO */
>  }
>  
> +void intel_logical_ring_advance_and_submit(struct intel_ringbuffer *ringbuf)
> +{
> +	intel_logical_ring_advance(ringbuf);
> +
> +	if (intel_ring_stopped(ringbuf->ring))
> +		return;
> +
> +	/* TODO: how to submit a context to the ELSP is not here yet */
> +}
> +
> +static int logical_ring_alloc_seqno(struct intel_engine_cs *ring)
> +{
> +	if (ring->outstanding_lazy_seqno)
> +		return 0;
> +
> +	if (ring->preallocated_lazy_request == NULL) {
> +		struct drm_i915_gem_request *request;
> +
> +		request = kmalloc(sizeof(*request), GFP_KERNEL);
> +		if (request == NULL)
> +			return -ENOMEM;
> +
> +		ring->preallocated_lazy_request = request;
> +	}
> +
> +	return i915_gem_get_seqno(ring->dev, &ring->outstanding_lazy_seqno);
> +}
> +
> +static int logical_ring_wait_request(struct intel_ringbuffer *ringbuf, int bytes)
> +{
> +	struct intel_engine_cs *ring = ringbuf->ring;
> +	struct drm_i915_gem_request *request;
> +	u32 seqno = 0;
> +	int ret;
> +
> +	if (ringbuf->last_retired_head != -1) {
> +		ringbuf->head = ringbuf->last_retired_head;
> +		ringbuf->last_retired_head = -1;
> +
> +		ringbuf->space = intel_ring_space(ringbuf);
> +		if (ringbuf->space >= bytes)
> +			return 0;
> +	}
> +
> +	list_for_each_entry(request, &ring->request_list, list) {
> +		if (__intel_ring_space(request->tail, ringbuf->tail,
> +				ringbuf->size) >= bytes) {
> +			seqno = request->seqno;
> +			break;
> +		}
> +	}
> +
> +	if (seqno == 0)
> +		return -ENOSPC;
> +
> +	ret = i915_wait_seqno(ring, seqno);
> +	if (ret)
> +		return ret;
> +
> +	/* TODO: make sure we update the right ringbuffer's last_retired_head
> +	 * when retiring requests */
> +	i915_gem_retire_requests_ring(ring);
> +	ringbuf->head = ringbuf->last_retired_head;
> +	ringbuf->last_retired_head = -1;
> +
> +	ringbuf->space = intel_ring_space(ringbuf);
> +	return 0;
> +}
> +
> +static int logical_ring_wait_for_space(struct intel_ringbuffer *ringbuf, int bytes)
> +{
> +	struct intel_engine_cs *ring = ringbuf->ring;
> +	struct drm_device *dev = ring->dev;
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	unsigned long end;
> +	int ret;
> +
> +	ret = logical_ring_wait_request(ringbuf, bytes);
> +	if (ret != -ENOSPC)
> +		return ret;
> +
> +	/* Force the context submission in case we have been skipping it */
> +	intel_logical_ring_advance_and_submit(ringbuf);
> +
> +	/* With GEM the hangcheck timer should kick us out of the loop,
> +	 * leaving it early runs the risk of corrupting GEM state (due
> +	 * to running on almost untested codepaths). But on resume
> +	 * timers don't work yet, so prevent a complete hang in that
> +	 * case by choosing an insanely large timeout. */
> +	end = jiffies + 60 * HZ;
> +
> +	do {
> +		ringbuf->head = I915_READ_HEAD(ring);
> +		ringbuf->space = intel_ring_space(ringbuf);
> +		if (ringbuf->space >= bytes) {
> +			ret = 0;
> +			break;
> +		}
> +
> +		if (!drm_core_check_feature(dev, DRIVER_MODESET) &&
> +		    dev->primary->master) {
> +			struct drm_i915_master_private *master_priv = dev->primary->master->driver_priv;
> +			if (master_priv->sarea_priv)
> +				master_priv->sarea_priv->perf_boxes |= I915_BOX_WAIT;
> +		}

sarea is legacy gunk. Really bad legacy gunk. The DRIVER_MODESET check
should have been a giveaway. Also checkpatch.

Fixed while applying.
-Daniel

> +
> +		msleep(1);
> +
> +		if (dev_priv->mm.interruptible && signal_pending(current)) {
> +			ret = -ERESTARTSYS;
> +			break;
> +		}
> +
> +		ret = i915_gem_check_wedge(&dev_priv->gpu_error,
> +					   dev_priv->mm.interruptible);
> +		if (ret)
> +			break;
> +
> +		if (time_after(jiffies, end)) {
> +			ret = -EBUSY;
> +			break;
> +		}
> +	} while (1);
> +
> +	return ret;
> +}
> +
> +static int logical_ring_wrap_buffer(struct intel_ringbuffer *ringbuf)
> +{
> +	uint32_t __iomem *virt;
> +	int rem = ringbuf->size - ringbuf->tail;
> +
> +	if (ringbuf->space < rem) {
> +		int ret = logical_ring_wait_for_space(ringbuf, rem);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	virt = ringbuf->virtual_start + ringbuf->tail;
> +	rem /= 4;
> +	while (rem--)
> +		iowrite32(MI_NOOP, virt++);
> +
> +	ringbuf->tail = 0;
> +	ringbuf->space = intel_ring_space(ringbuf);
> +
> +	return 0;
> +}
> +
> +static int logical_ring_prepare(struct intel_ringbuffer *ringbuf, int bytes)
> +{
> +	int ret;
> +
> +	if (unlikely(ringbuf->tail + bytes > ringbuf->effective_size)) {
> +		ret = logical_ring_wrap_buffer(ringbuf);
> +		if (unlikely(ret))
> +			return ret;
> +	}
> +
> +	if (unlikely(ringbuf->space < bytes)) {
> +		ret = logical_ring_wait_for_space(ringbuf, bytes);
> +		if (unlikely(ret))
> +			return ret;
> +	}
> +
> +	return 0;
> +}
> +
> +int intel_logical_ring_begin(struct intel_ringbuffer *ringbuf, int num_dwords)
> +{
> +	struct intel_engine_cs *ring = ringbuf->ring;
> +	struct drm_device *dev = ring->dev;
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	int ret;
> +
> +	ret = i915_gem_check_wedge(&dev_priv->gpu_error,
> +				   dev_priv->mm.interruptible);
> +	if (ret)
> +		return ret;
> +
> +	ret = logical_ring_prepare(ringbuf, num_dwords * sizeof(uint32_t));
> +	if (ret)
> +		return ret;
> +
> +	/* Preallocate the olr before touching the ring */
> +	ret = logical_ring_alloc_seqno(ring);
> +	if (ret)
> +		return ret;
> +
> +	ringbuf->space -= num_dwords * sizeof(uint32_t);
> +	return 0;
> +}
> +
>  static int gen8_init_common_ring(struct intel_engine_cs *ring)
>  {
>  	struct drm_device *dev = ring->dev;
> diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
> index bf0eff4..16798b6 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.h
> +++ b/drivers/gpu/drm/i915/intel_lrc.h
> @@ -29,6 +29,18 @@ void intel_logical_ring_stop(struct intel_engine_cs *ring);
>  void intel_logical_ring_cleanup(struct intel_engine_cs *ring);
>  int intel_logical_rings_init(struct drm_device *dev);
>  
> +void intel_logical_ring_advance_and_submit(struct intel_ringbuffer *ringbuf);
> +static inline void intel_logical_ring_advance(struct intel_ringbuffer *ringbuf)
> +{
> +	ringbuf->tail &= ringbuf->size - 1;
> +}
> +static inline void intel_logical_ring_emit(struct intel_ringbuffer *ringbuf, u32 data)
> +{
> +	iowrite32(data, ringbuf->virtual_start + ringbuf->tail);
> +	ringbuf->tail += 4;
> +}
> +int intel_logical_ring_begin(struct intel_ringbuffer *ringbuf, int num_dwords);
> +
>  /* Logical Ring Contexts */
>  void intel_lr_context_free(struct intel_context *ctx);
>  int intel_lr_context_deferred_create(struct intel_context *ctx,
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index ca45c58..dc2a991 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -57,7 +57,7 @@ intel_ring_initialized(struct intel_engine_cs *ring)
>  		return ring->buffer && ring->buffer->obj;
>  }
>  
> -static inline int __ring_space(int head, int tail, int size)
> +int __intel_ring_space(int head, int tail, int size)
>  {
>  	int space = head - (tail + I915_RING_FREE_SPACE);
>  	if (space < 0)
> @@ -65,12 +65,12 @@ static inline int __ring_space(int head, int tail, int size)
>  	return space;
>  }
>  
> -static inline int ring_space(struct intel_ringbuffer *ringbuf)
> +int intel_ring_space(struct intel_ringbuffer *ringbuf)
>  {
> -	return __ring_space(ringbuf->head & HEAD_ADDR, ringbuf->tail, ringbuf->size);
> +	return __intel_ring_space(ringbuf->head & HEAD_ADDR, ringbuf->tail, ringbuf->size);
>  }
>  
> -static bool intel_ring_stopped(struct intel_engine_cs *ring)
> +bool intel_ring_stopped(struct intel_engine_cs *ring)
>  {
>  	struct drm_i915_private *dev_priv = ring->dev->dev_private;
>  	return dev_priv->gpu_error.stop_rings & intel_ring_flag(ring);
> @@ -561,7 +561,7 @@ static int init_ring_common(struct intel_engine_cs *ring)
>  	else {
>  		ringbuf->head = I915_READ_HEAD(ring);
>  		ringbuf->tail = I915_READ_TAIL(ring) & TAIL_ADDR;
> -		ringbuf->space = ring_space(ringbuf);
> +		ringbuf->space = intel_ring_space(ringbuf);
>  		ringbuf->last_retired_head = -1;
>  	}
>  
> @@ -1679,13 +1679,13 @@ static int intel_ring_wait_request(struct intel_engine_cs *ring, int n)
>  		ringbuf->head = ringbuf->last_retired_head;
>  		ringbuf->last_retired_head = -1;
>  
> -		ringbuf->space = ring_space(ringbuf);
> +		ringbuf->space = intel_ring_space(ringbuf);
>  		if (ringbuf->space >= n)
>  			return 0;
>  	}
>  
>  	list_for_each_entry(request, &ring->request_list, list) {
> -		if (__ring_space(request->tail, ringbuf->tail, ringbuf->size) >= n) {
> +		if (__intel_ring_space(request->tail, ringbuf->tail, ringbuf->size) >= n) {
>  			seqno = request->seqno;
>  			break;
>  		}
> @@ -1702,7 +1702,7 @@ static int intel_ring_wait_request(struct intel_engine_cs *ring, int n)
>  	ringbuf->head = ringbuf->last_retired_head;
>  	ringbuf->last_retired_head = -1;
>  
> -	ringbuf->space = ring_space(ringbuf);
> +	ringbuf->space = intel_ring_space(ringbuf);
>  	return 0;
>  }
>  
> @@ -1731,7 +1731,7 @@ static int ring_wait_for_space(struct intel_engine_cs *ring, int n)
>  	trace_i915_ring_wait_begin(ring);
>  	do {
>  		ringbuf->head = I915_READ_HEAD(ring);
> -		ringbuf->space = ring_space(ringbuf);
> +		ringbuf->space = intel_ring_space(ringbuf);
>  		if (ringbuf->space >= n) {
>  			ret = 0;
>  			break;
> @@ -1783,7 +1783,7 @@ static int intel_wrap_ring_buffer(struct intel_engine_cs *ring)
>  		iowrite32(MI_NOOP, virt++);
>  
>  	ringbuf->tail = 0;
> -	ringbuf->space = ring_space(ringbuf);
> +	ringbuf->space = intel_ring_space(ringbuf);
>  
>  	return 0;
>  }
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index c135334..c305df0 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -373,6 +373,9 @@ static inline void intel_ring_advance(struct intel_engine_cs *ring)
>  	struct intel_ringbuffer *ringbuf = ring->buffer;
>  	ringbuf->tail &= ringbuf->size - 1;
>  }
> +int __intel_ring_space(int head, int tail, int size);
> +int intel_ring_space(struct intel_ringbuffer *ringbuf);
> +bool intel_ring_stopped(struct intel_engine_cs *ring);
>  void __intel_ring_advance(struct intel_engine_cs *ring);
>  
>  int __must_check intel_ring_idle(struct intel_engine_cs *ring);
> -- 
> 1.7.9.5
> 



* Re: [PATCH 21/43] drm/i915/bdw: Emission of requests with logical rings
  2014-07-24 16:04 ` [PATCH 21/43] drm/i915/bdw: Emission of requests with logical rings Thomas Daniel
@ 2014-08-11 20:56   ` Daniel Vetter
  2014-08-13 13:34     ` Daniel, Thomas
  0 siblings, 1 reply; 137+ messages in thread
From: Daniel Vetter @ 2014-08-11 20:56 UTC (permalink / raw)
  To: Thomas Daniel; +Cc: intel-gfx

On Thu, Jul 24, 2014 at 05:04:29PM +0100, Thomas Daniel wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> On a previous iteration of this patch, I created an Execlists
> version of __i915_add_request and abstracted it away as a
> vfunc. Daniel Vetter wondered then why that was needed:
> 
> "with the clean split in command submission I expect every
> function to know whether it'll submit to an lrc (everything in
> intel_lrc.c) or whether it'll submit to a legacy ring (existing
> code), so I don't see a need for an add_request vfunc."
> 
> The honest, hairy truth is that this patch is the glue keeping
> the whole logical ring puzzle together:

Oops, I didn't spot this and it's indeed not terribly pretty.
> 
> - i915_add_request is used by intel_ring_idle, which in turn is
>   used by i915_gpu_idle, which in turn is used in several places
>   inside the eviction and gtt codes.

This should probably be folded in with the lrc specific version of
stop_rings and so should work out.

> - Also, it is used by i915_gem_check_olr, which is littered all
>   over i915_gem.c

We now always preallocate the request struct, so olr is officially dead.
Well almost, except for non-execbuf stuff that we emit through the rings.
Which is nothing for lrc/execlist mode.

Also there's the icky-bitty problem with ringbuf->ctx which makes this
patch not apply any more. I think we need to revise or at least discuss a
bit.

> - ...
> 
> If I were to duplicate all the code that directly or indirectly
> uses __i915_add_request, I'd end up creating a separate driver.
> 
> To show the differences between the existing legacy version and
> the new Execlists one, this time I have special-cased
> __i915_add_request instead of adding an add_request vfunc. I
> hope this helps to untangle this Gordian knot.
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem.c  |   72 ++++++++++++++++++++++++++++----------
>  drivers/gpu/drm/i915/intel_lrc.c |   30 +++++++++++++---
>  drivers/gpu/drm/i915/intel_lrc.h |    1 +
>  3 files changed, 80 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 9560b40..1c83b9c 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2327,10 +2327,21 @@ int __i915_add_request(struct intel_engine_cs *ring,
>  {
>  	struct drm_i915_private *dev_priv = ring->dev->dev_private;
>  	struct drm_i915_gem_request *request;
> +	struct intel_ringbuffer *ringbuf;
>  	u32 request_ring_position, request_start;
>  	int ret;
>  
> -	request_start = intel_ring_get_tail(ring->buffer);
> +	request = ring->preallocated_lazy_request;
> +	if (WARN_ON(request == NULL))
> +		return -ENOMEM;
> +
> +	if (i915.enable_execlists) {
> +		struct intel_context *ctx = request->ctx;
> +		ringbuf = ctx->engine[ring->id].ringbuf;
> +	} else
> +		ringbuf = ring->buffer;
> +
> +	request_start = intel_ring_get_tail(ringbuf);
>  	/*
>  	 * Emit any outstanding flushes - execbuf can fail to emit the flush
>  	 * after having emitted the batchbuffer command. Hence we need to fix
> @@ -2338,24 +2349,32 @@ int __i915_add_request(struct intel_engine_cs *ring,
>  	 * is that the flush _must_ happen before the next request, no matter
>  	 * what.
>  	 */
> -	ret = intel_ring_flush_all_caches(ring);
> -	if (ret)
> -		return ret;
> -
> -	request = ring->preallocated_lazy_request;
> -	if (WARN_ON(request == NULL))
> -		return -ENOMEM;
> +	if (i915.enable_execlists) {
> +		ret = logical_ring_flush_all_caches(ringbuf);
> +		if (ret)
> +			return ret;
> +	} else {
> +		ret = intel_ring_flush_all_caches(ring);
> +		if (ret)
> +			return ret;
> +	}
>  
>  	/* Record the position of the start of the request so that
>  	 * should we detect the updated seqno part-way through the
>  	 * GPU processing the request, we never over-estimate the
>  	 * position of the head.
>  	 */
> -	request_ring_position = intel_ring_get_tail(ring->buffer);
> +	request_ring_position = intel_ring_get_tail(ringbuf);
>  
> -	ret = ring->add_request(ring);
> -	if (ret)
> -		return ret;
> +	if (i915.enable_execlists) {
> +		ret = ring->emit_request(ringbuf);
> +		if (ret)
> +			return ret;
> +	} else {
> +		ret = ring->add_request(ring);
> +		if (ret)
> +			return ret;
> +	}
>  
>  	request->seqno = intel_ring_get_seqno(ring);
>  	request->ring = ring;
> @@ -2370,12 +2389,14 @@ int __i915_add_request(struct intel_engine_cs *ring,
>  	 */
>  	request->batch_obj = obj;
>  
> -	/* Hold a reference to the current context so that we can inspect
> -	 * it later in case a hangcheck error event fires.
> -	 */
> -	request->ctx = ring->last_context;
> -	if (request->ctx)
> -		i915_gem_context_reference(request->ctx);
> +	if (!i915.enable_execlists) {
> +		/* Hold a reference to the current context so that we can inspect
> +		 * it later in case a hangcheck error event fires.
> +		 */
> +		request->ctx = ring->last_context;
> +		if (request->ctx)
> +			i915_gem_context_reference(request->ctx);
> +	}
>  
>  	request->emitted_jiffies = jiffies;
>  	list_add_tail(&request->list, &ring->request_list);
> @@ -2630,6 +2651,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
>  
>  	while (!list_empty(&ring->request_list)) {
>  		struct drm_i915_gem_request *request;
> +		struct intel_ringbuffer *ringbuf;
>  
>  		request = list_first_entry(&ring->request_list,
>  					   struct drm_i915_gem_request,
> @@ -2639,12 +2661,24 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
>  			break;
>  
>  		trace_i915_gem_request_retire(ring, request->seqno);
> +
> +		/* This is one of the few common intersection points
> +		 * between legacy ringbuffer submission and execlists:
> +		 * we need to tell them apart in order to find the correct
> +		 * ringbuffer to which the request belongs to.
> +		 */
> +		if (i915.enable_execlists) {
> +			struct intel_context *ctx = request->ctx;
> +			ringbuf = ctx->engine[ring->id].ringbuf;
> +		} else
> +			ringbuf = ring->buffer;
> +
>  		/* We know the GPU must have read the request to have
>  		 * sent us the seqno + interrupt, so use the position
>  		 * of tail of the request to update the last known position
>  		 * of the GPU head.
>  		 */
> -		ring->buffer->last_retired_head = request->tail;
> +		ringbuf->last_retired_head = request->tail;
>  
>  		i915_gem_free_request(request);
>  	}
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 5dd63d6..dcf59c6 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -106,6 +106,22 @@ void intel_logical_ring_stop(struct intel_engine_cs *ring)
>  	/* TODO */
>  }
>  
> +int logical_ring_flush_all_caches(struct intel_ringbuffer *ringbuf)
> +{
> +	struct intel_engine_cs *ring = ringbuf->ring;
> +	int ret;
> +
> +	if (!ring->gpu_caches_dirty)
> +		return 0;
> +
> +	ret = ring->emit_flush(ringbuf, 0, I915_GEM_GPU_DOMAINS);
> +	if (ret)
> +		return ret;
> +
> +	ring->gpu_caches_dirty = false;
> +	return 0;
> +}
> +
>  void intel_logical_ring_advance_and_submit(struct intel_ringbuffer *ringbuf)
>  {
>  	intel_logical_ring_advance(ringbuf);
> @@ -116,7 +132,8 @@ void intel_logical_ring_advance_and_submit(struct intel_ringbuffer *ringbuf)
>  	/* TODO: how to submit a context to the ELSP is not here yet */
>  }
>  
> -static int logical_ring_alloc_seqno(struct intel_engine_cs *ring)
> +static int logical_ring_alloc_seqno(struct intel_engine_cs *ring,
> +				    struct intel_context *ctx)
>  {
>  	if (ring->outstanding_lazy_seqno)
>  		return 0;
> @@ -128,6 +145,13 @@ static int logical_ring_alloc_seqno(struct intel_engine_cs *ring)
>  		if (request == NULL)
>  			return -ENOMEM;
>  
> +		/* Hold a reference to the context this request belongs to
> +		 * (we will need it when the time comes to emit/retire the
> +		 * request).
> +		 */
> +		request->ctx = ctx;
> +		i915_gem_context_reference(request->ctx);
> +
>  		ring->preallocated_lazy_request = request;
>  	}
>  
> @@ -165,8 +189,6 @@ static int logical_ring_wait_request(struct intel_ringbuffer *ringbuf, int bytes
>  	if (ret)
>  		return ret;
>  
> -	/* TODO: make sure we update the right ringbuffer's last_retired_head
> -	 * when retiring requests */
>  	i915_gem_retire_requests_ring(ring);
>  	ringbuf->head = ringbuf->last_retired_head;
>  	ringbuf->last_retired_head = -1;
> @@ -291,7 +313,7 @@ int intel_logical_ring_begin(struct intel_ringbuffer *ringbuf, int num_dwords)
>  		return ret;
>  
>  	/* Preallocate the olr before touching the ring */
> -	ret = logical_ring_alloc_seqno(ring);
> +	ret = logical_ring_alloc_seqno(ring, ringbuf->ctx);

Ok, this is hairy. Really hairy, since this uses ringbuf->ctx. Not sure we
really want this or need this.

>  	if (ret)
>  		return ret;
>  
> diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
> index 16798b6..696e09e 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.h
> +++ b/drivers/gpu/drm/i915/intel_lrc.h
> @@ -29,6 +29,7 @@ void intel_logical_ring_stop(struct intel_engine_cs *ring);
>  void intel_logical_ring_cleanup(struct intel_engine_cs *ring);
>  int intel_logical_rings_init(struct drm_device *dev);
>  
> +int logical_ring_flush_all_caches(struct intel_ringbuffer *ringbuf);
>  void intel_logical_ring_advance_and_submit(struct intel_ringbuffer *ringbuf);
>  static inline void intel_logical_ring_advance(struct intel_ringbuffer *ringbuf)
>  {
> -- 
> 1.7.9.5
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 23/43] drm/i915/bdw: Interrupts with logical rings
  2014-07-24 16:04 ` [PATCH 23/43] drm/i915/bdw: Interrupts " Thomas Daniel
@ 2014-08-11 21:02   ` Daniel Vetter
  2014-08-11 21:08   ` Daniel Vetter
  1 sibling, 0 replies; 137+ messages in thread
From: Daniel Vetter @ 2014-08-11 21:02 UTC (permalink / raw)
  To: Thomas Daniel; +Cc: intel-gfx

On Thu, Jul 24, 2014 at 05:04:31PM +0100, Thomas Daniel wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> We need to attend to context switch interrupts from all rings. Also, fixed writing
> IMR/IER and added HWSTAM at ring init time.
> 
> Notice that, if added to irq_enable_mask, the context switch interrupts would
> be incorrectly masked out when the user interrupts are due to no users waiting
> on a sequence number. Therefore, this commit adds a bitmask of interrupts to
> be kept unmasked at all times.
> 
> v2: Disable HWSTAM, as suggested by Damien (nobody listens to these interrupts,
> anyway).
> 
> v3: Add new get/put_irq functions.
> 
> Signed-off-by: Thomas Daniel <thomas.daniel@intel.com> (v1)
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2 & v3)

irq_keep_mask is a nifty idea; it would be pretty neat to roll it out for legacy
rings too. But totally optional.
-Daniel

> ---
>  drivers/gpu/drm/i915/i915_irq.c         |   19 ++++++++--
>  drivers/gpu/drm/i915/i915_reg.h         |    3 ++
>  drivers/gpu/drm/i915/intel_lrc.c        |   58 +++++++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/intel_ringbuffer.h |    1 +
>  4 files changed, 78 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index a38b5c3..f77a4ca 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -1643,6 +1643,8 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
>  				notify_ring(dev, &dev_priv->ring[RCS]);
>  			if (bcs & GT_RENDER_USER_INTERRUPT)
>  				notify_ring(dev, &dev_priv->ring[BCS]);
> +			if ((rcs | bcs) & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
> +				DRM_DEBUG_DRIVER("TODO: Context switch\n");
>  		} else
>  			DRM_ERROR("The master control interrupt lied (GT0)!\n");
>  	}
> @@ -1655,9 +1657,13 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
>  			vcs = tmp >> GEN8_VCS1_IRQ_SHIFT;
>  			if (vcs & GT_RENDER_USER_INTERRUPT)
>  				notify_ring(dev, &dev_priv->ring[VCS]);
> +			if (vcs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
> +				DRM_DEBUG_DRIVER("TODO: Context switch\n");
>  			vcs = tmp >> GEN8_VCS2_IRQ_SHIFT;
>  			if (vcs & GT_RENDER_USER_INTERRUPT)
>  				notify_ring(dev, &dev_priv->ring[VCS2]);
> +			if (vcs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
> +				DRM_DEBUG_DRIVER("TODO: Context switch\n");
>  		} else
>  			DRM_ERROR("The master control interrupt lied (GT1)!\n");
>  	}
> @@ -1681,6 +1687,8 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
>  			vcs = tmp >> GEN8_VECS_IRQ_SHIFT;
>  			if (vcs & GT_RENDER_USER_INTERRUPT)
>  				notify_ring(dev, &dev_priv->ring[VECS]);
> +			if (vcs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
> +				DRM_DEBUG_DRIVER("TODO: Context switch\n");
>  		} else
>  			DRM_ERROR("The master control interrupt lied (GT3)!\n");
>  	}
> @@ -3768,12 +3776,17 @@ static void gen8_gt_irq_postinstall(struct drm_i915_private *dev_priv)
>  	/* These are interrupts we'll toggle with the ring mask register */
>  	uint32_t gt_interrupts[] = {
>  		GT_RENDER_USER_INTERRUPT << GEN8_RCS_IRQ_SHIFT |
> +			GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_RCS_IRQ_SHIFT |
>  			GT_RENDER_L3_PARITY_ERROR_INTERRUPT |
> -			GT_RENDER_USER_INTERRUPT << GEN8_BCS_IRQ_SHIFT,
> +			GT_RENDER_USER_INTERRUPT << GEN8_BCS_IRQ_SHIFT |
> +			GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_BCS_IRQ_SHIFT,
>  		GT_RENDER_USER_INTERRUPT << GEN8_VCS1_IRQ_SHIFT |
> -			GT_RENDER_USER_INTERRUPT << GEN8_VCS2_IRQ_SHIFT,
> +			GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS1_IRQ_SHIFT |
> +			GT_RENDER_USER_INTERRUPT << GEN8_VCS2_IRQ_SHIFT |
> +			GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS2_IRQ_SHIFT,
>  		0,
> -		GT_RENDER_USER_INTERRUPT << GEN8_VECS_IRQ_SHIFT
> +		GT_RENDER_USER_INTERRUPT << GEN8_VECS_IRQ_SHIFT |
> +			GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VECS_IRQ_SHIFT
>  		};
>  
>  	for (i = 0; i < ARRAY_SIZE(gt_interrupts); i++)
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index 70dddac..bfc0c01 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -1062,6 +1062,7 @@ enum punit_power_well {
>  #define RING_ACTHD_UDW(base)	((base)+0x5c)
>  #define RING_NOPID(base)	((base)+0x94)
>  #define RING_IMR(base)		((base)+0xa8)
> +#define RING_HWSTAM(base)	((base)+0x98)
>  #define RING_TIMESTAMP(base)	((base)+0x358)
>  #define   TAIL_ADDR		0x001FFFF8
>  #define   HEAD_WRAP_COUNT	0xFFE00000
> @@ -4590,6 +4591,8 @@ enum punit_power_well {
>  #define GEN8_GT_IIR(which) (0x44308 + (0x10 * (which)))
>  #define GEN8_GT_IER(which) (0x4430c + (0x10 * (which)))
>  
> +#define GEN8_GT_CONTEXT_SWITCH_INTERRUPT	(1 <<  8)
> +
>  #define GEN8_BCS_IRQ_SHIFT 16
>  #define GEN8_RCS_IRQ_SHIFT 0
>  #define GEN8_VCS2_IRQ_SHIFT 16
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index c30518c..a6dcb3a 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -343,6 +343,9 @@ static int gen8_init_common_ring(struct intel_engine_cs *ring)
>  	struct drm_device *dev = ring->dev;
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  
> +	I915_WRITE_IMR(ring, ~(ring->irq_enable_mask | ring->irq_keep_mask));
> +	I915_WRITE(RING_HWSTAM(ring->mmio_base), 0xffffffff);
> +
>  	I915_WRITE(RING_MODE_GEN7(ring),
>  		_MASKED_BIT_DISABLE(GFX_REPLAY_MODE) |
>  		_MASKED_BIT_ENABLE(GFX_RUN_LIST_ENABLE));
> @@ -381,6 +384,39 @@ static int gen8_init_render_ring(struct intel_engine_cs *ring)
>  	return ret;
>  }
>  
> +static bool gen8_logical_ring_get_irq(struct intel_engine_cs *ring)
> +{
> +	struct drm_device *dev = ring->dev;
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	unsigned long flags;
> +
> +	if (!dev->irq_enabled)
> +		return false;
> +
> +	spin_lock_irqsave(&dev_priv->irq_lock, flags);
> +	if (ring->irq_refcount++ == 0) {
> +		I915_WRITE_IMR(ring, ~(ring->irq_enable_mask | ring->irq_keep_mask));
> +		POSTING_READ(RING_IMR(ring->mmio_base));
> +	}
> +	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
> +
> +	return true;
> +}
> +
> +static void gen8_logical_ring_put_irq(struct intel_engine_cs *ring)
> +{
> +	struct drm_device *dev = ring->dev;
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&dev_priv->irq_lock, flags);
> +	if (--ring->irq_refcount == 0) {
> +		I915_WRITE_IMR(ring, ~ring->irq_keep_mask);
> +		POSTING_READ(RING_IMR(ring->mmio_base));
> +	}
> +	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
> +}
> +
>  static int gen8_emit_flush(struct intel_ringbuffer *ringbuf,
>  			   u32 invalidate_domains,
>  			   u32 unused)
> @@ -566,6 +602,10 @@ static int logical_render_ring_init(struct drm_device *dev)
>  	ring->mmio_base = RENDER_RING_BASE;
>  	ring->irq_enable_mask =
>  		GT_RENDER_USER_INTERRUPT << GEN8_RCS_IRQ_SHIFT;
> +	ring->irq_keep_mask =
> +		GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_RCS_IRQ_SHIFT;
> +	if (HAS_L3_DPF(dev))
> +		ring->irq_keep_mask |= GT_RENDER_L3_PARITY_ERROR_INTERRUPT;
>  
>  	ring->init = gen8_init_render_ring;
>  	ring->cleanup = intel_fini_pipe_control;
> @@ -573,6 +613,8 @@ static int logical_render_ring_init(struct drm_device *dev)
>  	ring->set_seqno = gen8_set_seqno;
>  	ring->emit_request = gen8_emit_request;
>  	ring->emit_flush = gen8_emit_flush_render;
> +	ring->irq_get = gen8_logical_ring_get_irq;
> +	ring->irq_put = gen8_logical_ring_put_irq;
>  
>  	return logical_ring_init(dev, ring);
>  }
> @@ -587,12 +629,16 @@ static int logical_bsd_ring_init(struct drm_device *dev)
>  	ring->mmio_base = GEN6_BSD_RING_BASE;
>  	ring->irq_enable_mask =
>  		GT_RENDER_USER_INTERRUPT << GEN8_VCS1_IRQ_SHIFT;
> +	ring->irq_keep_mask =
> +		GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS1_IRQ_SHIFT;
>  
>  	ring->init = gen8_init_common_ring;
>  	ring->get_seqno = gen8_get_seqno;
>  	ring->set_seqno = gen8_set_seqno;
>  	ring->emit_request = gen8_emit_request;
>  	ring->emit_flush = gen8_emit_flush;
> +	ring->irq_get = gen8_logical_ring_get_irq;
> +	ring->irq_put = gen8_logical_ring_put_irq;
>  
>  	return logical_ring_init(dev, ring);
>  }
> @@ -607,12 +653,16 @@ static int logical_bsd2_ring_init(struct drm_device *dev)
>  	ring->mmio_base = GEN8_BSD2_RING_BASE;
>  	ring->irq_enable_mask =
>  		GT_RENDER_USER_INTERRUPT << GEN8_VCS2_IRQ_SHIFT;
> +	ring->irq_keep_mask =
> +		GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS2_IRQ_SHIFT;
>  
>  	ring->init = gen8_init_common_ring;
>  	ring->get_seqno = gen8_get_seqno;
>  	ring->set_seqno = gen8_set_seqno;
>  	ring->emit_request = gen8_emit_request;
>  	ring->emit_flush = gen8_emit_flush;
> +	ring->irq_get = gen8_logical_ring_get_irq;
> +	ring->irq_put = gen8_logical_ring_put_irq;
>  
>  	return logical_ring_init(dev, ring);
>  }
> @@ -627,12 +677,16 @@ static int logical_blt_ring_init(struct drm_device *dev)
>  	ring->mmio_base = BLT_RING_BASE;
>  	ring->irq_enable_mask =
>  		GT_RENDER_USER_INTERRUPT << GEN8_BCS_IRQ_SHIFT;
> +	ring->irq_keep_mask =
> +		GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_BCS_IRQ_SHIFT;
>  
>  	ring->init = gen8_init_common_ring;
>  	ring->get_seqno = gen8_get_seqno;
>  	ring->set_seqno = gen8_set_seqno;
>  	ring->emit_request = gen8_emit_request;
>  	ring->emit_flush = gen8_emit_flush;
> +	ring->irq_get = gen8_logical_ring_get_irq;
> +	ring->irq_put = gen8_logical_ring_put_irq;
>  
>  	return logical_ring_init(dev, ring);
>  }
> @@ -647,12 +701,16 @@ static int logical_vebox_ring_init(struct drm_device *dev)
>  	ring->mmio_base = VEBOX_RING_BASE;
>  	ring->irq_enable_mask =
>  		GT_RENDER_USER_INTERRUPT << GEN8_VECS_IRQ_SHIFT;
> +	ring->irq_keep_mask =
> +		GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VECS_IRQ_SHIFT;
>  
>  	ring->init = gen8_init_common_ring;
>  	ring->get_seqno = gen8_get_seqno;
>  	ring->set_seqno = gen8_set_seqno;
>  	ring->emit_request = gen8_emit_request;
>  	ring->emit_flush = gen8_emit_flush;
> +	ring->irq_get = gen8_logical_ring_get_irq;
> +	ring->irq_put = gen8_logical_ring_put_irq;
>  
>  	return logical_ring_init(dev, ring);
>  }
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 6e22866..09102b2 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -223,6 +223,7 @@ struct  intel_engine_cs {
>  	} semaphore;
>  
>  	/* Execlists */
> +	u32             irq_keep_mask; /* bitmask for interrupts that should not be masked */
>  	int		(*emit_request)(struct intel_ringbuffer *ringbuf);
>  	int		(*emit_flush)(struct intel_ringbuffer *ringbuf,
>  				      u32 invalidate_domains,
> -- 
> 1.7.9.5
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 23/43] drm/i915/bdw: Interrupts with logical rings
  2014-07-24 16:04 ` [PATCH 23/43] drm/i915/bdw: Interrupts " Thomas Daniel
  2014-08-11 21:02   ` Daniel Vetter
@ 2014-08-11 21:08   ` Daniel Vetter
  1 sibling, 0 replies; 137+ messages in thread
From: Daniel Vetter @ 2014-08-11 21:08 UTC (permalink / raw)
  To: Thomas Daniel; +Cc: intel-gfx

On Thu, Jul 24, 2014 at 05:04:31PM +0100, Thomas Daniel wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> We need to attend to context switch interrupts from all rings. Also, fixed writing
> IMR/IER and added HWSTAM at ring init time.
> 
> Notice that, if added to irq_enable_mask, the context switch interrupts would
> be incorrectly masked out when the user interrupts are due to no users waiting
> on a sequence number. Therefore, this commit adds a bitmask of interrupts to
> be kept unmasked at all times.
> 
> v2: Disable HWSTAM, as suggested by Damien (nobody listens to these interrupts,
> anyway).
> 
> v3: Add new get/put_irq functions.
> 
> Signed-off-by: Thomas Daniel <thomas.daniel@intel.com> (v1)
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2 & v3)
> ---
>  drivers/gpu/drm/i915/i915_irq.c         |   19 ++++++++--
>  drivers/gpu/drm/i915/i915_reg.h         |    3 ++
>  drivers/gpu/drm/i915/intel_lrc.c        |   58 +++++++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/intel_ringbuffer.h |    1 +
>  4 files changed, 78 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index a38b5c3..f77a4ca 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -1643,6 +1643,8 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
>  				notify_ring(dev, &dev_priv->ring[RCS]);
>  			if (bcs & GT_RENDER_USER_INTERRUPT)
>  				notify_ring(dev, &dev_priv->ring[BCS]);
> +			if ((rcs | bcs) & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
> +				DRM_DEBUG_DRIVER("TODO: Context switch\n");
>  		} else
>  			DRM_ERROR("The master control interrupt lied (GT0)!\n");
>  	}
> @@ -1655,9 +1657,13 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
>  			vcs = tmp >> GEN8_VCS1_IRQ_SHIFT;
>  			if (vcs & GT_RENDER_USER_INTERRUPT)
>  				notify_ring(dev, &dev_priv->ring[VCS]);
> +			if (vcs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
> +				DRM_DEBUG_DRIVER("TODO: Context switch\n");
>  			vcs = tmp >> GEN8_VCS2_IRQ_SHIFT;
>  			if (vcs & GT_RENDER_USER_INTERRUPT)
>  				notify_ring(dev, &dev_priv->ring[VCS2]);
> +			if (vcs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
> +				DRM_DEBUG_DRIVER("TODO: Context switch\n");
>  		} else
>  			DRM_ERROR("The master control interrupt lied (GT1)!\n");
>  	}
> @@ -1681,6 +1687,8 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
>  			vcs = tmp >> GEN8_VECS_IRQ_SHIFT;
>  			if (vcs & GT_RENDER_USER_INTERRUPT)
>  				notify_ring(dev, &dev_priv->ring[VECS]);
> +			if (vcs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
> +				DRM_DEBUG_DRIVER("TODO: Context switch\n");
>  		} else
>  			DRM_ERROR("The master control interrupt lied (GT3)!\n");
>  	}
> @@ -3768,12 +3776,17 @@ static void gen8_gt_irq_postinstall(struct drm_i915_private *dev_priv)
>  	/* These are interrupts we'll toggle with the ring mask register */
>  	uint32_t gt_interrupts[] = {
>  		GT_RENDER_USER_INTERRUPT << GEN8_RCS_IRQ_SHIFT |
> +			GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_RCS_IRQ_SHIFT |
>  			GT_RENDER_L3_PARITY_ERROR_INTERRUPT |
> -			GT_RENDER_USER_INTERRUPT << GEN8_BCS_IRQ_SHIFT,
> +			GT_RENDER_USER_INTERRUPT << GEN8_BCS_IRQ_SHIFT |
> +			GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_BCS_IRQ_SHIFT,
>  		GT_RENDER_USER_INTERRUPT << GEN8_VCS1_IRQ_SHIFT |
> -			GT_RENDER_USER_INTERRUPT << GEN8_VCS2_IRQ_SHIFT,
> +			GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS1_IRQ_SHIFT |
> +			GT_RENDER_USER_INTERRUPT << GEN8_VCS2_IRQ_SHIFT |
> +			GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS2_IRQ_SHIFT,
>  		0,
> -		GT_RENDER_USER_INTERRUPT << GEN8_VECS_IRQ_SHIFT
> +		GT_RENDER_USER_INTERRUPT << GEN8_VECS_IRQ_SHIFT |
> +			GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VECS_IRQ_SHIFT
>  		};
>  
>  	for (i = 0; i < ARRAY_SIZE(gt_interrupts); i++)
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index 70dddac..bfc0c01 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -1062,6 +1062,7 @@ enum punit_power_well {
>  #define RING_ACTHD_UDW(base)	((base)+0x5c)
>  #define RING_NOPID(base)	((base)+0x94)
>  #define RING_IMR(base)		((base)+0xa8)
> +#define RING_HWSTAM(base)	((base)+0x98)
>  #define RING_TIMESTAMP(base)	((base)+0x358)
>  #define   TAIL_ADDR		0x001FFFF8
>  #define   HEAD_WRAP_COUNT	0xFFE00000
> @@ -4590,6 +4591,8 @@ enum punit_power_well {
>  #define GEN8_GT_IIR(which) (0x44308 + (0x10 * (which)))
>  #define GEN8_GT_IER(which) (0x4430c + (0x10 * (which)))
>  
> +#define GEN8_GT_CONTEXT_SWITCH_INTERRUPT	(1 <<  8)

This looked misplaced - I've moved it to its brethren and dropped the
GEN8_ prefix for consistency.
-Daniel

> +
>  #define GEN8_BCS_IRQ_SHIFT 16
>  #define GEN8_RCS_IRQ_SHIFT 0
>  #define GEN8_VCS2_IRQ_SHIFT 16
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index c30518c..a6dcb3a 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -343,6 +343,9 @@ static int gen8_init_common_ring(struct intel_engine_cs *ring)
>  	struct drm_device *dev = ring->dev;
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  
> +	I915_WRITE_IMR(ring, ~(ring->irq_enable_mask | ring->irq_keep_mask));
> +	I915_WRITE(RING_HWSTAM(ring->mmio_base), 0xffffffff);
> +
>  	I915_WRITE(RING_MODE_GEN7(ring),
>  		_MASKED_BIT_DISABLE(GFX_REPLAY_MODE) |
>  		_MASKED_BIT_ENABLE(GFX_RUN_LIST_ENABLE));
> @@ -381,6 +384,39 @@ static int gen8_init_render_ring(struct intel_engine_cs *ring)
>  	return ret;
>  }
>  
> +static bool gen8_logical_ring_get_irq(struct intel_engine_cs *ring)
> +{
> +	struct drm_device *dev = ring->dev;
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	unsigned long flags;
> +
> +	if (!dev->irq_enabled)
> +		return false;
> +
> +	spin_lock_irqsave(&dev_priv->irq_lock, flags);
> +	if (ring->irq_refcount++ == 0) {
> +		I915_WRITE_IMR(ring, ~(ring->irq_enable_mask | ring->irq_keep_mask));
> +		POSTING_READ(RING_IMR(ring->mmio_base));
> +	}
> +	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
> +
> +	return true;
> +}
> +
> +static void gen8_logical_ring_put_irq(struct intel_engine_cs *ring)
> +{
> +	struct drm_device *dev = ring->dev;
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&dev_priv->irq_lock, flags);
> +	if (--ring->irq_refcount == 0) {
> +		I915_WRITE_IMR(ring, ~ring->irq_keep_mask);
> +		POSTING_READ(RING_IMR(ring->mmio_base));
> +	}
> +	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
> +}
> +
>  static int gen8_emit_flush(struct intel_ringbuffer *ringbuf,
>  			   u32 invalidate_domains,
>  			   u32 unused)
> @@ -566,6 +602,10 @@ static int logical_render_ring_init(struct drm_device *dev)
>  	ring->mmio_base = RENDER_RING_BASE;
>  	ring->irq_enable_mask =
>  		GT_RENDER_USER_INTERRUPT << GEN8_RCS_IRQ_SHIFT;
> +	ring->irq_keep_mask =
> +		GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_RCS_IRQ_SHIFT;
> +	if (HAS_L3_DPF(dev))
> +		ring->irq_keep_mask |= GT_RENDER_L3_PARITY_ERROR_INTERRUPT;
>  
>  	ring->init = gen8_init_render_ring;
>  	ring->cleanup = intel_fini_pipe_control;
> @@ -573,6 +613,8 @@ static int logical_render_ring_init(struct drm_device *dev)
>  	ring->set_seqno = gen8_set_seqno;
>  	ring->emit_request = gen8_emit_request;
>  	ring->emit_flush = gen8_emit_flush_render;
> +	ring->irq_get = gen8_logical_ring_get_irq;
> +	ring->irq_put = gen8_logical_ring_put_irq;
>  
>  	return logical_ring_init(dev, ring);
>  }
> @@ -587,12 +629,16 @@ static int logical_bsd_ring_init(struct drm_device *dev)
>  	ring->mmio_base = GEN6_BSD_RING_BASE;
>  	ring->irq_enable_mask =
>  		GT_RENDER_USER_INTERRUPT << GEN8_VCS1_IRQ_SHIFT;
> +	ring->irq_keep_mask =
> +		GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS1_IRQ_SHIFT;
>  
>  	ring->init = gen8_init_common_ring;
>  	ring->get_seqno = gen8_get_seqno;
>  	ring->set_seqno = gen8_set_seqno;
>  	ring->emit_request = gen8_emit_request;
>  	ring->emit_flush = gen8_emit_flush;
> +	ring->irq_get = gen8_logical_ring_get_irq;
> +	ring->irq_put = gen8_logical_ring_put_irq;
>  
>  	return logical_ring_init(dev, ring);
>  }
> @@ -607,12 +653,16 @@ static int logical_bsd2_ring_init(struct drm_device *dev)
>  	ring->mmio_base = GEN8_BSD2_RING_BASE;
>  	ring->irq_enable_mask =
>  		GT_RENDER_USER_INTERRUPT << GEN8_VCS2_IRQ_SHIFT;
> +	ring->irq_keep_mask =
> +		GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS2_IRQ_SHIFT;
>  
>  	ring->init = gen8_init_common_ring;
>  	ring->get_seqno = gen8_get_seqno;
>  	ring->set_seqno = gen8_set_seqno;
>  	ring->emit_request = gen8_emit_request;
>  	ring->emit_flush = gen8_emit_flush;
> +	ring->irq_get = gen8_logical_ring_get_irq;
> +	ring->irq_put = gen8_logical_ring_put_irq;
>  
>  	return logical_ring_init(dev, ring);
>  }
> @@ -627,12 +677,16 @@ static int logical_blt_ring_init(struct drm_device *dev)
>  	ring->mmio_base = BLT_RING_BASE;
>  	ring->irq_enable_mask =
>  		GT_RENDER_USER_INTERRUPT << GEN8_BCS_IRQ_SHIFT;
> +	ring->irq_keep_mask =
> +		GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_BCS_IRQ_SHIFT;
>  
>  	ring->init = gen8_init_common_ring;
>  	ring->get_seqno = gen8_get_seqno;
>  	ring->set_seqno = gen8_set_seqno;
>  	ring->emit_request = gen8_emit_request;
>  	ring->emit_flush = gen8_emit_flush;
> +	ring->irq_get = gen8_logical_ring_get_irq;
> +	ring->irq_put = gen8_logical_ring_put_irq;
>  
>  	return logical_ring_init(dev, ring);
>  }
> @@ -647,12 +701,16 @@ static int logical_vebox_ring_init(struct drm_device *dev)
>  	ring->mmio_base = VEBOX_RING_BASE;
>  	ring->irq_enable_mask =
>  		GT_RENDER_USER_INTERRUPT << GEN8_VECS_IRQ_SHIFT;
> +	ring->irq_keep_mask =
> +		GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VECS_IRQ_SHIFT;
>  
>  	ring->init = gen8_init_common_ring;
>  	ring->get_seqno = gen8_get_seqno;
>  	ring->set_seqno = gen8_set_seqno;
>  	ring->emit_request = gen8_emit_request;
>  	ring->emit_flush = gen8_emit_flush;
> +	ring->irq_get = gen8_logical_ring_get_irq;
> +	ring->irq_put = gen8_logical_ring_put_irq;
>  
>  	return logical_ring_init(dev, ring);
>  }
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 6e22866..09102b2 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -223,6 +223,7 @@ struct  intel_engine_cs {
>  	} semaphore;
>  
>  	/* Execlists */
> +	u32             irq_keep_mask; /* bitmask for interrupts that should not be masked */
>  	int		(*emit_request)(struct intel_ringbuffer *ringbuf);
>  	int		(*emit_flush)(struct intel_ringbuffer *ringbuf,
>  				      u32 invalidate_domains,
> -- 
> 1.7.9.5
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 24/43] drm/i915/bdw: GEN-specific logical ring emit batchbuffer start
  2014-07-24 16:04 ` [PATCH 24/43] drm/i915/bdw: GEN-specific logical ring emit batchbuffer start Thomas Daniel
@ 2014-08-11 21:09   ` Daniel Vetter
  2014-08-11 21:12     ` Daniel Vetter
  0 siblings, 1 reply; 137+ messages in thread
From: Daniel Vetter @ 2014-08-11 21:09 UTC (permalink / raw)
  To: Thomas Daniel; +Cc: intel-gfx

On Thu, Jul 24, 2014 at 05:04:32PM +0100, Thomas Daniel wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> Dispatch_execbuffer's evil twin.
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_lrc.c        |   28 ++++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/intel_ringbuffer.h |    2 ++
>  2 files changed, 30 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index a6dcb3a..55ee8dd 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -384,6 +384,29 @@ static int gen8_init_render_ring(struct intel_engine_cs *ring)
>  	return ret;
>  }
>  
> +static int gen8_emit_bb_start(struct intel_ringbuffer *ringbuf,
> +			      u64 offset, unsigned flags)
> +{
> +	struct intel_engine_cs *ring = ringbuf->ring;
> +	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> +	bool ppgtt = dev_priv->mm.aliasing_ppgtt != NULL &&

The aliasing ppgtt check here is fairly decent bollocks, especially since
full ppgtt is a requirement for execlists. I've ditched it since a series
from myself will otherwise break this patch.
-Daniel

> +		!(flags & I915_DISPATCH_SECURE);
> +	int ret;
> +
> +	ret = intel_logical_ring_begin(ringbuf, 4);
> +	if (ret)
> +		return ret;
> +
> +	/* FIXME(BDW): Address space and security selectors. */
> +	intel_logical_ring_emit(ringbuf, MI_BATCH_BUFFER_START_GEN8 | (ppgtt<<8));
> +	intel_logical_ring_emit(ringbuf, lower_32_bits(offset));
> +	intel_logical_ring_emit(ringbuf, upper_32_bits(offset));
> +	intel_logical_ring_emit(ringbuf, MI_NOOP);
> +	intel_logical_ring_advance(ringbuf);
> +
> +	return 0;
> +}
> +
>  static bool gen8_logical_ring_get_irq(struct intel_engine_cs *ring)
>  {
>  	struct drm_device *dev = ring->dev;
> @@ -615,6 +638,7 @@ static int logical_render_ring_init(struct drm_device *dev)
>  	ring->emit_flush = gen8_emit_flush_render;
>  	ring->irq_get = gen8_logical_ring_get_irq;
>  	ring->irq_put = gen8_logical_ring_put_irq;
> +	ring->emit_bb_start = gen8_emit_bb_start;
>  
>  	return logical_ring_init(dev, ring);
>  }
> @@ -639,6 +663,7 @@ static int logical_bsd_ring_init(struct drm_device *dev)
>  	ring->emit_flush = gen8_emit_flush;
>  	ring->irq_get = gen8_logical_ring_get_irq;
>  	ring->irq_put = gen8_logical_ring_put_irq;
> +	ring->emit_bb_start = gen8_emit_bb_start;
>  
>  	return logical_ring_init(dev, ring);
>  }
> @@ -663,6 +688,7 @@ static int logical_bsd2_ring_init(struct drm_device *dev)
>  	ring->emit_flush = gen8_emit_flush;
>  	ring->irq_get = gen8_logical_ring_get_irq;
>  	ring->irq_put = gen8_logical_ring_put_irq;
> +	ring->emit_bb_start = gen8_emit_bb_start;
>  
>  	return logical_ring_init(dev, ring);
>  }
> @@ -687,6 +713,7 @@ static int logical_blt_ring_init(struct drm_device *dev)
>  	ring->emit_flush = gen8_emit_flush;
>  	ring->irq_get = gen8_logical_ring_get_irq;
>  	ring->irq_put = gen8_logical_ring_put_irq;
> +	ring->emit_bb_start = gen8_emit_bb_start;
>  
>  	return logical_ring_init(dev, ring);
>  }
> @@ -711,6 +738,7 @@ static int logical_vebox_ring_init(struct drm_device *dev)
>  	ring->emit_flush = gen8_emit_flush;
>  	ring->irq_get = gen8_logical_ring_get_irq;
>  	ring->irq_put = gen8_logical_ring_put_irq;
> +	ring->emit_bb_start = gen8_emit_bb_start;
>  
>  	return logical_ring_init(dev, ring);
>  }
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 09102b2..c885d5c 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -228,6 +228,8 @@ struct  intel_engine_cs {
>  	int		(*emit_flush)(struct intel_ringbuffer *ringbuf,
>  				      u32 invalidate_domains,
>  				      u32 flush_domains);
> +	int		(*emit_bb_start)(struct intel_ringbuffer *ringbuf,
> +					 u64 offset, unsigned flags);
>  
>  	/**
>  	 * List of objects currently involved in rendering from the
> -- 
> 1.7.9.5
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 24/43] drm/i915/bdw: GEN-specific logical ring emit batchbuffer start
  2014-08-11 21:09   ` Daniel Vetter
@ 2014-08-11 21:12     ` Daniel Vetter
  0 siblings, 0 replies; 137+ messages in thread
From: Daniel Vetter @ 2014-08-11 21:12 UTC (permalink / raw)
  To: Thomas Daniel; +Cc: intel-gfx

On Mon, Aug 11, 2014 at 11:09:39PM +0200, Daniel Vetter wrote:
> On Thu, Jul 24, 2014 at 05:04:32PM +0100, Thomas Daniel wrote:
> > From: Oscar Mateo <oscar.mateo@intel.com>
> > 
> > Dispatch_execbuffer's evil twin.
> > 
> > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> > ---
> >  drivers/gpu/drm/i915/intel_lrc.c        |   28 ++++++++++++++++++++++++++++
> >  drivers/gpu/drm/i915/intel_ringbuffer.h |    2 ++
> >  2 files changed, 30 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> > index a6dcb3a..55ee8dd 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -384,6 +384,29 @@ static int gen8_init_render_ring(struct intel_engine_cs *ring)
> >  	return ret;
> >  }
> >  
> > +static int gen8_emit_bb_start(struct intel_ringbuffer *ringbuf,
> > +			      u64 offset, unsigned flags)
> > +{
> > +	struct intel_engine_cs *ring = ringbuf->ring;
> > +	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> > +	bool ppgtt = dev_priv->mm.aliasing_ppgtt != NULL &&
> 
> The aliasing ppgtt check here is fairly decent bollocks, especially since
> full ppgtt is a requirement for execlists. I've ditched it, since a series
> of mine would otherwise break this patch.
> -Daniel
> 
> > +		!(flags & I915_DISPATCH_SECURE);
> > +	int ret;
> > +
> > +	ret = intel_logical_ring_begin(ringbuf, 4);
> > +	if (ret)
> > +		return ret;
> > +
> > +	/* FIXME(BDW): Address space and security selectors. */
> > +	intel_logical_ring_emit(ringbuf, MI_BATCH_BUFFER_START_GEN8 | (ppgtt<<8));

Also please follow up with a patch to replace the magic 8 here with a
proper define. The usual approach is to build this up with a u32 cmd_flags or
so. The patch should obviously also rectify the legacy ring stuff.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 27/43] drm/i915/bdw: Render state init for Execlists
  2014-07-24 16:04 ` [PATCH 27/43] drm/i915/bdw: Render state init for Execlists Thomas Daniel
@ 2014-08-11 21:25   ` Daniel Vetter
  2014-08-13 15:07     ` Daniel, Thomas
  2014-08-21 10:40   ` [PATCH] " Thomas Daniel
  1 sibling, 1 reply; 137+ messages in thread
From: Daniel Vetter @ 2014-08-11 21:25 UTC (permalink / raw)
  To: Thomas Daniel; +Cc: intel-gfx

On Thu, Jul 24, 2014 at 05:04:35PM +0100, Thomas Daniel wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> The batchbuffer that sets the render context state is submitted
> in a different way, and from different places.
> 
> We needed to make both the render state preparation and free functions
> externally accessible, and namespace them accordingly. This mess is so that all
> LR, LRC and Execlists functionality can go together in intel_lrc.c: we
> can fix all of this later on, once the interfaces are clear.
> 
> v2: Create a separate ctx->rcs_initialized for the Execlists case, as
> suggested by Chris Wilson.
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h              |    4 +--
>  drivers/gpu/drm/i915/i915_gem_context.c      |   17 +++++++++-
>  drivers/gpu/drm/i915/i915_gem_render_state.c |   40 ++++++++++++++--------
>  drivers/gpu/drm/i915/i915_gem_render_state.h |   47 ++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/intel_lrc.c             |   46 +++++++++++++++++++++++++
>  drivers/gpu/drm/i915/intel_lrc.h             |    2 ++
>  drivers/gpu/drm/i915/intel_renderstate.h     |    8 +----
>  7 files changed, 139 insertions(+), 25 deletions(-)
>  create mode 100644 drivers/gpu/drm/i915/i915_gem_render_state.h
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 4303e2c..b7cf0ec 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -37,6 +37,7 @@
>  #include "intel_ringbuffer.h"
>  #include "intel_lrc.h"
>  #include "i915_gem_gtt.h"
> +#include "i915_gem_render_state.h"
>  #include <linux/io-mapping.h>
>  #include <linux/i2c.h>
>  #include <linux/i2c-algo-bit.h>
> @@ -623,6 +624,7 @@ struct intel_context {
>  	} legacy_hw_ctx;
>  
>  	/* Execlists */
> +	bool rcs_initialized;
>  	struct {
>  		struct drm_i915_gem_object *state;
>  		struct intel_ringbuffer *ringbuf;
> @@ -2553,8 +2555,6 @@ int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
>  int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
>  				   struct drm_file *file);
>  
> -/* i915_gem_render_state.c */
> -int i915_gem_render_state_init(struct intel_engine_cs *ring);
>  /* i915_gem_evict.c */
>  int __must_check i915_gem_evict_something(struct drm_device *dev,
>  					  struct i915_address_space *vm,
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 9085ff1..0dc6992 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -513,8 +513,23 @@ int i915_gem_context_enable(struct drm_i915_private *dev_priv)
>  		ppgtt->enable(ppgtt);
>  	}
>  
> -	if (i915.enable_execlists)
> +	if (i915.enable_execlists) {
> +		struct intel_context *dctx;
> +
> +		ring = &dev_priv->ring[RCS];
> +		dctx = ring->default_context;
> +
> +		if (!dctx->rcs_initialized) {
> +			ret = intel_lr_context_render_state_init(ring, dctx);
> +			if (ret) {
> +				DRM_ERROR("Init render state failed: %d\n", ret);
> +				return ret;
> +			}
> +			dctx->rcs_initialized = true;
> +		}
> +
>  		return 0;
> +	}

This looks very much like the wrong place. We should init the render state
when we create the context, or when we switch to it for the first time.
The latter is what the legacy contexts currently do in do_switch.

But ctx_enable should do the switch to the default context and that's
about it. If there's some dependency then I guess we should stall the
creation of the default context a bit, maybe.

In any case someone needs to explain this better, and if there's no other
way then this at least needs a big comment. So I'll punt for now.
-Daniel
 
>  
>  	/* FIXME: We should make this work, even in reset */
>  	if (i915_reset_in_progress(&dev_priv->gpu_error))
> diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.c b/drivers/gpu/drm/i915/i915_gem_render_state.c
> index e60be3f..a9a62d7 100644
> --- a/drivers/gpu/drm/i915/i915_gem_render_state.c
> +++ b/drivers/gpu/drm/i915/i915_gem_render_state.c
> @@ -28,13 +28,6 @@
>  #include "i915_drv.h"
>  #include "intel_renderstate.h"
>  
> -struct render_state {
> -	const struct intel_renderstate_rodata *rodata;
> -	struct drm_i915_gem_object *obj;
> -	u64 ggtt_offset;
> -	int gen;
> -};
> -
>  static const struct intel_renderstate_rodata *
>  render_state_get_rodata(struct drm_device *dev, const int gen)
>  {
> @@ -127,30 +120,47 @@ static int render_state_setup(struct render_state *so)
>  	return 0;
>  }
>  
> -static void render_state_fini(struct render_state *so)
> +void i915_gem_render_state_fini(struct render_state *so)
>  {
>  	i915_gem_object_ggtt_unpin(so->obj);
>  	drm_gem_object_unreference(&so->obj->base);
>  }
>  
> -int i915_gem_render_state_init(struct intel_engine_cs *ring)
> +int i915_gem_render_state_prepare(struct intel_engine_cs *ring,
> +				  struct render_state *so)
>  {
> -	struct render_state so;
>  	int ret;
>  
>  	if (WARN_ON(ring->id != RCS))
>  		return -ENOENT;
>  
> -	ret = render_state_init(&so, ring->dev);
> +	ret = render_state_init(so, ring->dev);
>  	if (ret)
>  		return ret;
>  
> -	if (so.rodata == NULL)
> +	if (so->rodata == NULL)
>  		return 0;
>  
> -	ret = render_state_setup(&so);
> +	ret = render_state_setup(so);
> +	if (ret) {
> +		i915_gem_render_state_fini(so);
> +		return ret;
> +	}
> +
> +	return 0;
> +}
> +
> +int i915_gem_render_state_init(struct intel_engine_cs *ring)
> +{
> +	struct render_state so;
> +	int ret;
> +
> +	ret = i915_gem_render_state_prepare(ring, &so);
>  	if (ret)
> -		goto out;
> +		return ret;
> +
> +	if (so.rodata == NULL)
> +		return 0;
>  
>  	ret = ring->dispatch_execbuffer(ring,
>  					so.ggtt_offset,
> @@ -164,6 +174,6 @@ int i915_gem_render_state_init(struct intel_engine_cs *ring)
>  	ret = __i915_add_request(ring, NULL, so.obj, NULL);
>  	/* __i915_add_request moves object to inactive if it fails */
>  out:
> -	render_state_fini(&so);
> +	i915_gem_render_state_fini(&so);
>  	return ret;
>  }
> diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.h b/drivers/gpu/drm/i915/i915_gem_render_state.h
> new file mode 100644
> index 0000000..c44961e
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/i915_gem_render_state.h
> @@ -0,0 +1,47 @@
> +/*
> + * Copyright © 2014 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> + * DEALINGS IN THE SOFTWARE.
> + */
> +
> +#ifndef _I915_GEM_RENDER_STATE_H_
> +#define _I915_GEM_RENDER_STATE_H_
> +
> +#include <linux/types.h>
> +
> +struct intel_renderstate_rodata {
> +	const u32 *reloc;
> +	const u32 *batch;
> +	const u32 batch_items;
> +};
> +
> +struct render_state {
> +	const struct intel_renderstate_rodata *rodata;
> +	struct drm_i915_gem_object *obj;
> +	u64 ggtt_offset;
> +	int gen;
> +};
> +
> +int i915_gem_render_state_init(struct intel_engine_cs *ring);
> +void i915_gem_render_state_fini(struct render_state *so);
> +int i915_gem_render_state_prepare(struct intel_engine_cs *ring,
> +				  struct render_state *so);
> +
> +#endif /* _I915_GEM_RENDER_STATE_H_ */
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 0a04c03..4549eec 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -925,6 +925,37 @@ cleanup_render_ring:
>  	return ret;
>  }
>  
> +int intel_lr_context_render_state_init(struct intel_engine_cs *ring,
> +				       struct intel_context *ctx)
> +{
> +	struct intel_ringbuffer *ringbuf = ctx->engine[ring->id].ringbuf;
> +	struct render_state so;
> +	struct drm_i915_file_private *file_priv = ctx->file_priv;
> +	struct drm_file *file = file_priv? file_priv->file : NULL;
> +	int ret;
> +
> +	ret = i915_gem_render_state_prepare(ring, &so);
> +	if (ret)
> +		return ret;
> +
> +	if (so.rodata == NULL)
> +		return 0;
> +
> +	ret = ring->emit_bb_start(ringbuf,
> +			so.ggtt_offset,
> +			I915_DISPATCH_SECURE);
> +	if (ret)
> +		goto out;
> +
> +	i915_vma_move_to_active(i915_gem_obj_to_ggtt(so.obj), ring);
> +
> +	ret = __i915_add_request(ring, file, so.obj, NULL);
> +	/* intel_logical_ring_add_request moves object to inactive if it fails */
> +out:
> +	i915_gem_render_state_fini(&so);
> +	return ret;
> +}
> +
>  static int
>  populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_obj,
>  		    struct intel_engine_cs *ring, struct intel_ringbuffer *ringbuf)
> @@ -1142,6 +1173,21 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
>  	ctx->engine[ring->id].ringbuf = ringbuf;
>  	ctx->engine[ring->id].state = ctx_obj;
>  
> +	/* The default context will have to wait, because we are not yet
> +	 * ready to send a batchbuffer at this point */
> +	if (ring->id == RCS && !ctx->rcs_initialized &&
> +			ctx != ring->default_context) {
> +		ret = intel_lr_context_render_state_init(ring, ctx);
> +		if (ret) {
> +			DRM_ERROR("Init render state failed: %d\n", ret);
> +			ctx->engine[ring->id].ringbuf = NULL;
> +			ctx->engine[ring->id].state = NULL;
> +			intel_destroy_ringbuffer_obj(ringbuf);
> +			goto error;
> +		}
> +		ctx->rcs_initialized = true;
> +	}
> +
>  	return 0;
>  
>  error:
> diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
> index 696e09e..f20c3d2 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.h
> +++ b/drivers/gpu/drm/i915/intel_lrc.h
> @@ -43,6 +43,8 @@ static inline void intel_logical_ring_emit(struct intel_ringbuffer *ringbuf, u32
>  int intel_logical_ring_begin(struct intel_ringbuffer *ringbuf, int num_dwords);
>  
>  /* Logical Ring Contexts */
> +int intel_lr_context_render_state_init(struct intel_engine_cs *ring,
> +				       struct intel_context *ctx);
>  void intel_lr_context_free(struct intel_context *ctx);
>  int intel_lr_context_deferred_create(struct intel_context *ctx,
>  				     struct intel_engine_cs *ring);
> diff --git a/drivers/gpu/drm/i915/intel_renderstate.h b/drivers/gpu/drm/i915/intel_renderstate.h
> index fd4f662..6c792d3 100644
> --- a/drivers/gpu/drm/i915/intel_renderstate.h
> +++ b/drivers/gpu/drm/i915/intel_renderstate.h
> @@ -24,13 +24,7 @@
>  #ifndef _INTEL_RENDERSTATE_H
>  #define _INTEL_RENDERSTATE_H
>  
> -#include <linux/types.h>
> -
> -struct intel_renderstate_rodata {
> -	const u32 *reloc;
> -	const u32 *batch;
> -	const u32 batch_items;
> -};
> +#include "i915_drv.h"
>  
>  extern const struct intel_renderstate_rodata gen6_null_state;
>  extern const struct intel_renderstate_rodata gen7_null_state;
> -- 
> 1.7.9.5
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 28/43] drm/i915/bdw: Implement context switching (somewhat)
  2014-07-24 16:04 ` [PATCH 28/43] drm/i915/bdw: Implement context switching (somewhat) Thomas Daniel
@ 2014-08-11 21:29   ` Daniel Vetter
  0 siblings, 0 replies; 137+ messages in thread
From: Daniel Vetter @ 2014-08-11 21:29 UTC (permalink / raw)
  To: Thomas Daniel; +Cc: intel-gfx

On Thu, Jul 24, 2014 at 05:04:36PM +0100, Thomas Daniel wrote:
> From: Ben Widawsky <benjamin.widawsky@intel.com>
> 
> A context switch occurs by submitting a context descriptor to the
> ExecList Submission Port. Given that we can now initialize a context,
> it's possible to begin implementing the context switch by creating the
> descriptor and submitting it to ELSP (actually two, since the ELSP
> has two ports).
> 
> The context object must be mapped in the GGTT, which means it must exist
> in the 0-4GB graphics VA range.
> 
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> 
> v2: This code has changed quite a lot in various rebases. Of particular
> importance is that now we use the globally unique Submission ID to send
> to the hardware. Also, context pages are now pinned unconditionally to
> GGTT, so there is no need to bind them.
> 
> v3: Use LRCA[31:12] as hwCtxId[19:0]. This guarantees that the HW context
> ID we submit to the ELSP is globally unique and != 0 (Bspec requirements
> of the software use-only bits of the Context ID in the Context Descriptor
> Format) without the hassle of the previous submission Id construction.
> Also, re-add the ELSP posting read (it was dropped somewhere during the
> rebases).
> 
> v4:
> - Squash with "drm/i915/bdw: Add forcewake lock around ELSP writes" (BSPEC
>   says: "SW must set Force Wakeup bit to prevent GT from entering C6 while
>   ELSP writes are in progress") as noted by Thomas Daniel
>   (thomas.daniel@intel.com).
> - Rename functions and use an execlists/intel_execlists_ namespace.
> - The BUG_ON only checked that the LRCA was <32 bits, but it didn't make
>   sure that it was properly aligned. Spotted by Alistair Mcaulay
>   <alistair.mcaulay@intel.com>.
> 
> v5:
> - Improved source code comments as suggested by Chris Wilson.
> - No need to abstract submit_ctx away, as pointed out by Brad Volkin.
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_lrc.c |  116 +++++++++++++++++++++++++++++++++++++-
>  drivers/gpu/drm/i915/intel_lrc.h |    1 +
>  2 files changed, 115 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 4549eec..535ef98 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -47,6 +47,7 @@
>  #define GEN8_LR_CONTEXT_ALIGN 4096
>  
>  #define RING_ELSP(ring)			((ring)->mmio_base+0x230)
> +#define RING_EXECLIST_STATUS(ring)	((ring)->mmio_base+0x234)
>  #define RING_CONTEXT_CONTROL(ring)	((ring)->mmio_base+0x244)
>  
>  #define CTX_LRI_HEADER_0		0x01
> @@ -78,6 +79,26 @@
>  #define CTX_R_PWR_CLK_STATE		0x42
>  #define CTX_GPGPU_CSR_BASE_ADDRESS	0x44
>  
> +#define GEN8_CTX_VALID (1<<0)
> +#define GEN8_CTX_FORCE_PD_RESTORE (1<<1)
> +#define GEN8_CTX_FORCE_RESTORE (1<<2)
> +#define GEN8_CTX_L3LLC_COHERENT (1<<5)
> +#define GEN8_CTX_PRIVILEGE (1<<8)
> +enum {
> +	ADVANCED_CONTEXT=0,
> +	LEGACY_CONTEXT,
> +	ADVANCED_AD_CONTEXT,
> +	LEGACY_64B_CONTEXT
> +};
> +#define GEN8_CTX_MODE_SHIFT 3
> +enum {
> +	FAULT_AND_HANG=0,
> +	FAULT_AND_HALT, /* Debug only */
> +	FAULT_AND_STREAM,
> +	FAULT_AND_CONTINUE /* Unsupported */
> +};
> +#define GEN8_CTX_ID_SHIFT 32
> +
>  int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists)
>  {
>  	if (enable_execlists == 0)
> @@ -90,6 +111,93 @@ int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists
>  	return 0;
>  }
>  
> +u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj)
> +{
> +	u32 lrca = i915_gem_obj_ggtt_offset(ctx_obj);
> +
> +	/* LRCA is required to be 4K aligned so the more significant 20 bits
> +	 * are globally unique */
> +	return lrca >> 12;
> +}
> +
> +static uint64_t execlists_ctx_descriptor(struct drm_i915_gem_object *ctx_obj)
> +{
> +	uint64_t desc;
> +	uint64_t lrca = i915_gem_obj_ggtt_offset(ctx_obj);
> +	BUG_ON(lrca & 0xFFFFFFFF00000FFFULL);
> +
> +	desc = GEN8_CTX_VALID;
> +	desc |= LEGACY_CONTEXT << GEN8_CTX_MODE_SHIFT;
> +	desc |= GEN8_CTX_L3LLC_COHERENT;
> +	desc |= GEN8_CTX_PRIVILEGE;
> +	desc |= lrca;
> +	desc |= (u64)intel_execlists_ctx_id(ctx_obj) << GEN8_CTX_ID_SHIFT;
> +
> +	/* TODO: WaDisableLiteRestore when we start using semaphore
> +	 * signalling between Command Streamers */
> +	/* desc |= GEN8_CTX_FORCE_RESTORE; */
> +
> +	return desc;
> +}
> +
> +static void execlists_elsp_write(struct intel_engine_cs *ring,
> +				 struct drm_i915_gem_object *ctx_obj0,
> +				 struct drm_i915_gem_object *ctx_obj1)
> +{
> +	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> +	uint64_t temp = 0;
> +	uint32_t desc[4];
> +
> +	/* XXX: You must always write both descriptors in the order below. */
> +	if (ctx_obj1)
> +		temp = execlists_ctx_descriptor(ctx_obj1);
> +	else
> +		temp = 0;
> +	desc[1] = (u32)(temp >> 32);
> +	desc[0] = (u32)temp;
> +
> +	temp = execlists_ctx_descriptor(ctx_obj0);
> +	desc[3] = (u32)(temp >> 32);
> +	desc[2] = (u32)temp;
> +
> +	/* Set Force Wakeup bit to prevent GT from entering C6 while
> +	 * ELSP writes are in progress */
> +	gen6_gt_force_wake_get(dev_priv, FORCEWAKE_ALL);
> +
> +	I915_WRITE(RING_ELSP(ring), desc[1]);
> +	I915_WRITE(RING_ELSP(ring), desc[0]);
> +	I915_WRITE(RING_ELSP(ring), desc[3]);
> +	/* The context is automatically loaded after the following */
> +	I915_WRITE(RING_ELSP(ring), desc[2]);
> +
> +	/* ELSP is a wo register, so use another nearby reg for posting instead */
> +	POSTING_READ(RING_EXECLIST_STATUS(ring));
> +
> +	gen6_gt_force_wake_put(dev_priv, FORCEWAKE_ALL);
> +}
> +
> +static int execlists_submit_context(struct intel_engine_cs *ring,
> +				    struct intel_context *to0, u32 tail0,
> +				    struct intel_context *to1, u32 tail1)
> +{
> +	struct drm_i915_gem_object *ctx_obj0;
> +	struct drm_i915_gem_object *ctx_obj1 = NULL;
> +
> +	ctx_obj0 = to0->engine[ring->id].state;
> +	BUG_ON(!ctx_obj0);
> +	BUG_ON(!i915_gem_obj_is_pinned(ctx_obj0));
> +
> +	if (to1) {
> +		ctx_obj1 = to1->engine[ring->id].state;
> +		BUG_ON(!ctx_obj1);
> +		BUG_ON(!i915_gem_obj_is_pinned(ctx_obj1));
> +	}
> +
> +	execlists_elsp_write(ring, ctx_obj0, ctx_obj1);
> +
> +	return 0;
> +}
> +
>  static int logical_ring_invalidate_all_caches(struct intel_ringbuffer *ringbuf)
>  {
>  	struct intel_engine_cs *ring = ringbuf->ring;
> @@ -270,12 +378,16 @@ int logical_ring_flush_all_caches(struct intel_ringbuffer *ringbuf)
>  
>  void intel_logical_ring_advance_and_submit(struct intel_ringbuffer *ringbuf)
>  {
> +	struct intel_engine_cs *ring = ringbuf->ring;
> +	struct intel_context *ctx = ringbuf->ctx;
> +
>  	intel_logical_ring_advance(ringbuf);
>  
> -	if (intel_ring_stopped(ringbuf->ring))
> +	if (intel_ring_stopped(ring))
>  		return;
>  
> -	/* TODO: how to submit a context to the ELSP is not here yet */
> +	/* FIXME: too cheeky, we don't even check if the ELSP is ready */
> +	execlists_submit_context(ring, ctx, ringbuf->tail, NULL, 0);

So this is the 2nd user of ringbuf->ctx I've spotted (well gcc did) and
imo it shouldn't be here. We should have one ELSP submit for each batch,
not one for each ring_advance. Heck even the ring_advance should probably
be done just once.

This is one of the reasons why I wanted to have a parallel execlist
function hierarchy, so that we could implement such stuff correctly
without jumping through hoops (or doing the lazy update trick we do for
legacy rings).

Punt on this for now.
-Daniel

>  }
>  
>  static int logical_ring_alloc_seqno(struct intel_engine_cs *ring,
> diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
> index f20c3d2..b59965b 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.h
> +++ b/drivers/gpu/drm/i915/intel_lrc.h
> @@ -58,5 +58,6 @@ int intel_execlists_submission(struct drm_device *dev, struct drm_file *file,
>  			       struct list_head *vmas,
>  			       struct drm_i915_gem_object *batch_obj,
>  			       u64 exec_start, u32 flags);
> +u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj);
>  
>  #endif /* _INTEL_LRC_H_ */
> -- 
> 1.7.9.5
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 29/43] drm/i915/bdw: Write the tail pointer, LRC style
  2014-07-24 16:04 ` [PATCH 29/43] drm/i915/bdw: Write the tail pointer, LRC style Thomas Daniel
  2014-08-01 14:33   ` Damien Lespiau
@ 2014-08-11 21:30   ` Daniel Vetter
  1 sibling, 0 replies; 137+ messages in thread
From: Daniel Vetter @ 2014-08-11 21:30 UTC (permalink / raw)
  To: Thomas Daniel; +Cc: intel-gfx

On Thu, Jul 24, 2014 at 05:04:37PM +0100, Thomas Daniel wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> Each logical ring context has the tail pointer in the context object,
> so update it before submission.
> 
> v2: New namespace.
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_lrc.c |   19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 535ef98..5b6f416 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -176,6 +176,21 @@ static void execlists_elsp_write(struct intel_engine_cs *ring,
>  	gen6_gt_force_wake_put(dev_priv, FORCEWAKE_ALL);
>  }
>  
> +static int execlists_ctx_write_tail(struct drm_i915_gem_object *ctx_obj, u32 tail)
> +{
> +	struct page *page;
> +	uint32_t *reg_state;
> +
> +	page = i915_gem_object_get_page(ctx_obj, 1);
> +	reg_state = kmap_atomic(page);
> +
> +	reg_state[CTX_RING_TAIL+1] = tail;
> +
> +	kunmap_atomic(reg_state);
> +
> +	return 0;
> +}
> +
>  static int execlists_submit_context(struct intel_engine_cs *ring,
>  				    struct intel_context *to0, u32 tail0,
>  				    struct intel_context *to1, u32 tail1)
> @@ -187,10 +202,14 @@ static int execlists_submit_context(struct intel_engine_cs *ring,
>  	BUG_ON(!ctx_obj0);
>  	BUG_ON(!i915_gem_obj_is_pinned(ctx_obj0));
>  
> +	execlists_ctx_write_tail(ctx_obj0, tail0);
> +
>  	if (to1) {
>  		ctx_obj1 = to1->engine[ring->id].state;
>  		BUG_ON(!ctx_obj1);
>  		BUG_ON(!i915_gem_obj_is_pinned(ctx_obj1));
> +
> +		execlists_ctx_write_tail(ctx_obj1, tail1);

Ok, now I'm totally surprised - here's the tail write and ELSP submit as
expected. So why do we need the dance in the previous patch?

I'll look at this again tomorrow and will stop merging for today.
-Daniel

>  	}
>  
>  	execlists_elsp_write(ring, ctx_obj0, ctx_obj1);
> -- 
> 1.7.9.5
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 08/43] drm/i915/bdw: Add a context and an engine pointers to the ringbuffer
  2014-08-11 14:20     ` Daniel Vetter
@ 2014-08-13 13:34       ` Daniel, Thomas
  2014-08-13 15:16         ` Daniel Vetter
  0 siblings, 1 reply; 137+ messages in thread
From: Daniel, Thomas @ 2014-08-13 13:34 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx



> -----Original Message-----
> From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel
> Vetter
> Sent: Monday, August 11, 2014 3:21 PM
> To: Daniel, Thomas
> Cc: intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH 08/43] drm/i915/bdw: Add a context and an
> engine pointers to the ringbuffer
> 
> On Mon, Aug 11, 2014 at 04:14:13PM +0200, Daniel Vetter wrote:
> > On Thu, Jul 24, 2014 at 05:04:16PM +0100, Thomas Daniel wrote:
> > > From: Oscar Mateo <oscar.mateo@intel.com>
> > >
> > > Any given ringbuffer is unequivocally tied to one context and one engine.
> > > By setting the appropriate pointers to them, the ringbuffer struct
> > > holds all the information you might need to submit a workload for
> > > processing, Execlists style.
> > >
> > > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> > > ---
> > >  drivers/gpu/drm/i915/intel_lrc.c        |    2 ++
> > >  drivers/gpu/drm/i915/intel_ringbuffer.c |    2 ++
> > >  drivers/gpu/drm/i915/intel_ringbuffer.h |    3 +++
> > >  3 files changed, 7 insertions(+)
> > >
> > > diff --git a/drivers/gpu/drm/i915/intel_lrc.c
> > > b/drivers/gpu/drm/i915/intel_lrc.c
> > > index 0a12b8c..2eb7db6 100644
> > > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > > @@ -132,6 +132,8 @@ int intel_lr_context_deferred_create(struct
> intel_context *ctx,
> > >  		return ret;
> > >  	}
> > >
> > > +	ringbuf->ring = ring;
> > > +	ringbuf->ctx = ctx;
> > >  	ringbuf->size = 32 * PAGE_SIZE;
> > >  	ringbuf->effective_size = ringbuf->size;
> > >  	ringbuf->head = 0;
> > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > index 01e9840..279dda4 100644
> > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > @@ -1570,6 +1570,8 @@ static int intel_init_ring_buffer(struct
> drm_device *dev,
> > >  	INIT_LIST_HEAD(&ring->active_list);
> > >  	INIT_LIST_HEAD(&ring->request_list);
> > >  	ringbuf->size = 32 * PAGE_SIZE;
> > > +	ringbuf->ring = ring;
> > > +	ringbuf->ctx = ring->default_context;
> >
> > That doesn't make a terribly lot of sense tbh. I fear it's one of
> > these slight confusions which will take tons of patches to clean up.
> > Why exactly do we need the ring->ctx pointer?
> >
> > If we only need this for lrc I want to name it accordingly, to make
> > sure legacy code doesn't grow stupid ideas. And also we should only
> > initialize this in the lrc ctx init then.
> >
> > All patches up to this one merged.
> 
> Ok, I've discussed this quickly with Damien on irc. We decided to cut away
> the ring->ctx part of this patch for now to be able to move on.
> -Daniel
As you've seen, removing ringbuffer->ctx causes serious problems with the
plumbing later on.  This can be renamed (perhaps to lrc) and removed from
legacy init.

Each ring buffer belongs to a specific context - it makes sense to me to
keep this information within the ringbuffer structure so that we don't have
to pass the context pointer around everywhere.

Thomas.
> 
> > -Daniel
> >
> > >  	memset(ring->semaphore.sync_seqno, 0,
> > > sizeof(ring->semaphore.sync_seqno));
> > >
> > >  	init_waitqueue_head(&ring->irq_queue);
> > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h
> > > b/drivers/gpu/drm/i915/intel_ringbuffer.h
> > > index 053d004..be40788 100644
> > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> > > @@ -88,6 +88,9 @@ struct intel_ringbuffer {
> > >  	struct drm_i915_gem_object *obj;
> > >  	void __iomem *virtual_start;
> > >
> > > +	struct intel_engine_cs *ring;
> > > +	struct intel_context *ctx;
> > > +
> > >  	u32 head;
> > >  	u32 tail;
> > >  	int space;
> > > --
> > > 1.7.9.5
> > >
> > > _______________________________________________
> > > Intel-gfx mailing list
> > > Intel-gfx@lists.freedesktop.org
> > > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > +41 (0) 79 365 57 48 - http://blog.ffwll.ch
> 

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 21/43] drm/i915/bdw: Emission of requests with logical rings
  2014-08-11 20:56   ` Daniel Vetter
@ 2014-08-13 13:34     ` Daniel, Thomas
  2014-08-13 15:25       ` Daniel Vetter
  0 siblings, 1 reply; 137+ messages in thread
From: Daniel, Thomas @ 2014-08-13 13:34 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx



> -----Original Message-----
> From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel
> Vetter
> Sent: Monday, August 11, 2014 9:57 PM
> To: Daniel, Thomas
> Cc: intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH 21/43] drm/i915/bdw: Emission of requests
> with logical rings
> 
> On Thu, Jul 24, 2014 at 05:04:29PM +0100, Thomas Daniel wrote:
> > From: Oscar Mateo <oscar.mateo@intel.com>
> >
> > On a previous iteration of this patch, I created an Execlists version
> > of __i915_add_request and abstracted it away as a vfunc. Daniel Vetter
> > wondered then why that was needed:
> >
> > "with the clean split in command submission I expect every function to
> > know whether it'll submit to an lrc (everything in
> > intel_lrc.c) or whether it'll submit to a legacy ring (existing code),
> > so I don't see a need for an add_request vfunc."
> >
> > The honest, hairy truth is that this patch is the glue keeping the
> > whole logical ring puzzle together:
> 
> Oops, I didn't spot this and it's indeed not terribly pretty.
Are you saying you want to go back to a vfunc for add_request?

> >
> > - i915_add_request is used by intel_ring_idle, which in turn is
> >   used by i915_gpu_idle, which in turn is used in several places
> >   inside the eviction and gtt codes.
> 
> This should probably be folded in with the lrc specific version of stop_rings
> and so should work out.
> 
> > - Also, it is used by i915_gem_check_olr, which is littered all
> >   over i915_gem.c
> 
> We now always preallocate the request struct, so olr is officially dead.
> Well almost, except for non-execbuf stuff that we emit through the rings.
> Which is nothing for lrc/execlist mode.
> 
> Also there's the icky-bitty problem with ringbuf->ctx which makes this patch
> not apply any more. I think we need to revise or at least discuss a bit.
> 
Context is required when doing an advance_and_submit so that we know
which context to queue in the execlist queue.
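Roughly this shape, in a toy model (all names here are invented for illustration; the real queue lives in the execlists code):

```c
#include <assert.h>

/* The execlist queue holds contexts (with their ring tails), so the
 * advance-and-submit step must know which context the ringbuffer
 * belongs to before it can queue anything. */
struct exec_ctx { int id; unsigned int tail; };

struct exec_queue {
	struct exec_ctx *items[8];
	int count;
};

static void advance_and_submit(struct exec_queue *q, struct exec_ctx *c,
			       unsigned int new_tail)
{
	c->tail = new_tail;		/* advance the software tail */
	q->items[q->count++] = c;	/* queue this context for the ELSP */
}
```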

> > - ...
> >
> > If I were to duplicate all the code that directly or indirectly uses
> > __i915_add_request, I'd end up creating a separate driver.
> >
> > To show the differences between the existing legacy version and the
> > new Execlists one, this time I have special-cased __i915_add_request
> > instead of adding an add_request vfunc. I hope this helps to untangle
> > this Gordian knot.
> >
> > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_gem.c  |   72
> ++++++++++++++++++++++++++++----------
> >  drivers/gpu/drm/i915/intel_lrc.c |   30 +++++++++++++---
> >  drivers/gpu/drm/i915/intel_lrc.h |    1 +
> >  3 files changed, 80 insertions(+), 23 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c
> > b/drivers/gpu/drm/i915/i915_gem.c index 9560b40..1c83b9c 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -2327,10 +2327,21 @@ int __i915_add_request(struct intel_engine_cs
> > *ring,  {
> >  	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> >  	struct drm_i915_gem_request *request;
> > +	struct intel_ringbuffer *ringbuf;
> >  	u32 request_ring_position, request_start;
> >  	int ret;
> >
> > -	request_start = intel_ring_get_tail(ring->buffer);
> > +	request = ring->preallocated_lazy_request;
> > +	if (WARN_ON(request == NULL))
> > +		return -ENOMEM;
> > +
> > +	if (i915.enable_execlists) {
> > +		struct intel_context *ctx = request->ctx;
> > +		ringbuf = ctx->engine[ring->id].ringbuf;
> > +	} else
> > +		ringbuf = ring->buffer;
> > +
> > +	request_start = intel_ring_get_tail(ringbuf);
> >  	/*
> >  	 * Emit any outstanding flushes - execbuf can fail to emit the flush
> >  	 * after having emitted the batchbuffer command. Hence we need to
> > fix @@ -2338,24 +2349,32 @@ int __i915_add_request(struct
> intel_engine_cs *ring,
> >  	 * is that the flush _must_ happen before the next request, no
> matter
> >  	 * what.
> >  	 */
> > -	ret = intel_ring_flush_all_caches(ring);
> > -	if (ret)
> > -		return ret;
> > -
> > -	request = ring->preallocated_lazy_request;
> > -	if (WARN_ON(request == NULL))
> > -		return -ENOMEM;
> > +	if (i915.enable_execlists) {
> > +		ret = logical_ring_flush_all_caches(ringbuf);
> > +		if (ret)
> > +			return ret;
> > +	} else {
> > +		ret = intel_ring_flush_all_caches(ring);
> > +		if (ret)
> > +			return ret;
> > +	}
> >
> >  	/* Record the position of the start of the request so that
> >  	 * should we detect the updated seqno part-way through the
> >  	 * GPU processing the request, we never over-estimate the
> >  	 * position of the head.
> >  	 */
> > -	request_ring_position = intel_ring_get_tail(ring->buffer);
> > +	request_ring_position = intel_ring_get_tail(ringbuf);
> >
> > -	ret = ring->add_request(ring);
> > -	if (ret)
> > -		return ret;
> > +	if (i915.enable_execlists) {
> > +		ret = ring->emit_request(ringbuf);
> > +		if (ret)
> > +			return ret;
> > +	} else {
> > +		ret = ring->add_request(ring);
> > +		if (ret)
> > +			return ret;
> > +	}
> >
> >  	request->seqno = intel_ring_get_seqno(ring);
> >  	request->ring = ring;
> > @@ -2370,12 +2389,14 @@ int __i915_add_request(struct intel_engine_cs
> *ring,
> >  	 */
> >  	request->batch_obj = obj;
> >
> > -	/* Hold a reference to the current context so that we can inspect
> > -	 * it later in case a hangcheck error event fires.
> > -	 */
> > -	request->ctx = ring->last_context;
> > -	if (request->ctx)
> > -		i915_gem_context_reference(request->ctx);
> > +	if (!i915.enable_execlists) {
> > +		/* Hold a reference to the current context so that we can
> inspect
> > +		 * it later in case a hangcheck error event fires.
> > +		 */
> > +		request->ctx = ring->last_context;
> > +		if (request->ctx)
> > +			i915_gem_context_reference(request->ctx);
> > +	}
> >
> >  	request->emitted_jiffies = jiffies;
> >  	list_add_tail(&request->list, &ring->request_list); @@ -2630,6
> > +2651,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
> >
> >  	while (!list_empty(&ring->request_list)) {
> >  		struct drm_i915_gem_request *request;
> > +		struct intel_ringbuffer *ringbuf;
> >
> >  		request = list_first_entry(&ring->request_list,
> >  					   struct drm_i915_gem_request,
> > @@ -2639,12 +2661,24 @@ i915_gem_retire_requests_ring(struct
> intel_engine_cs *ring)
> >  			break;
> >
> >  		trace_i915_gem_request_retire(ring, request->seqno);
> > +
> > +		/* This is one of the few common intersection points
> > +		 * between legacy ringbuffer submission and execlists:
> > +		 * we need to tell them apart in order to find the correct
> > +		 * ringbuffer to which the request belongs to.
> > +		 */
> > +		if (i915.enable_execlists) {
> > +			struct intel_context *ctx = request->ctx;
> > +			ringbuf = ctx->engine[ring->id].ringbuf;
> > +		} else
> > +			ringbuf = ring->buffer;
> > +
> >  		/* We know the GPU must have read the request to have
> >  		 * sent us the seqno + interrupt, so use the position
> >  		 * of tail of the request to update the last known position
> >  		 * of the GPU head.
> >  		 */
> > -		ring->buffer->last_retired_head = request->tail;
> > +		ringbuf->last_retired_head = request->tail;
> >
> >  		i915_gem_free_request(request);
> >  	}
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c
> > b/drivers/gpu/drm/i915/intel_lrc.c
> > index 5dd63d6..dcf59c6 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -106,6 +106,22 @@ void intel_logical_ring_stop(struct intel_engine_cs
> *ring)
> >  	/* TODO */
> >  }
> >
> > +int logical_ring_flush_all_caches(struct intel_ringbuffer *ringbuf) {
> > +	struct intel_engine_cs *ring = ringbuf->ring;
> > +	int ret;
> > +
> > +	if (!ring->gpu_caches_dirty)
> > +		return 0;
> > +
> > +	ret = ring->emit_flush(ringbuf, 0, I915_GEM_GPU_DOMAINS);
> > +	if (ret)
> > +		return ret;
> > +
> > +	ring->gpu_caches_dirty = false;
> > +	return 0;
> > +}
> > +
> >  void intel_logical_ring_advance_and_submit(struct intel_ringbuffer
> > *ringbuf)  {
> >  	intel_logical_ring_advance(ringbuf);
> > @@ -116,7 +132,8 @@ void intel_logical_ring_advance_and_submit(struct
> intel_ringbuffer *ringbuf)
> >  	/* TODO: how to submit a context to the ELSP is not here yet */  }
> >
> > -static int logical_ring_alloc_seqno(struct intel_engine_cs *ring)
> > +static int logical_ring_alloc_seqno(struct intel_engine_cs *ring,
> > +				    struct intel_context *ctx)
> >  {
> >  	if (ring->outstanding_lazy_seqno)
> >  		return 0;
> > @@ -128,6 +145,13 @@ static int logical_ring_alloc_seqno(struct
> intel_engine_cs *ring)
> >  		if (request == NULL)
> >  			return -ENOMEM;
> >
> > +		/* Hold a reference to the context this request belongs to
> > +		 * (we will need it when the time comes to emit/retire the
> > +		 * request).
> > +		 */
> > +		request->ctx = ctx;
> > +		i915_gem_context_reference(request->ctx);
> > +
> >  		ring->preallocated_lazy_request = request;
> >  	}
> >
> > @@ -165,8 +189,6 @@ static int logical_ring_wait_request(struct
> intel_ringbuffer *ringbuf, int bytes
> >  	if (ret)
> >  		return ret;
> >
> > -	/* TODO: make sure we update the right ringbuffer's
> last_retired_head
> > -	 * when retiring requests */
> >  	i915_gem_retire_requests_ring(ring);
> >  	ringbuf->head = ringbuf->last_retired_head;
> >  	ringbuf->last_retired_head = -1;
> > @@ -291,7 +313,7 @@ int intel_logical_ring_begin(struct intel_ringbuffer
> *ringbuf, int num_dwords)
> >  		return ret;
> >
> >  	/* Preallocate the olr before touching the ring */
> > -	ret = logical_ring_alloc_seqno(ring);
> > +	ret = logical_ring_alloc_seqno(ring, ringbuf->ctx);
> 
> Ok, this is hairy. Really hairy, since this uses ringbuf->ctx. Not sure we really
> want this or need this.
> 
alloc_seqno needs to know the context since it also preallocates and
populates the request, which belongs to a specific context.  Having the
logical context pointer inside the ring buffer makes the code a lot more
elegant imo: since only one context can use each ring buffer, we don't
have to keep passing buffer and context around together, just as keeping
the engine pointer in the ring buffer means we don't have to pass the
engine alongside it.
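The lifetime side of this is plain refcounting (a toy model, not the driver code; the real calls are i915_gem_context_reference/unreference):

```c
#include <assert.h>
#include <stdlib.h>

/* The preallocated request pins its context until the request is
 * retired, so the context can't vanish while work is in flight. */
struct toy_ctx { int refcount; };
struct toy_request { struct toy_ctx *ctx; };

static void ctx_reference(struct toy_ctx *c)   { c->refcount++; }
static void ctx_unreference(struct toy_ctx *c) { c->refcount--; }

static struct toy_request *alloc_request(struct toy_ctx *c)
{
	struct toy_request *req = malloc(sizeof(*req));

	if (!req)
		return NULL;
	req->ctx = c;
	ctx_reference(req->ctx);	/* held until retire */
	return req;
}

static void retire_request(struct toy_request *req)
{
	ctx_unreference(req->ctx);
	free(req);
}
```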

As I said in my reply to your patch 8 review, I'm happy for ctx to be
renamed to avoid confusion in the legacy path.

Thomas.

> >  	if (ret)
> >  		return ret;
> >
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.h
> > b/drivers/gpu/drm/i915/intel_lrc.h
> > index 16798b6..696e09e 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.h
> > +++ b/drivers/gpu/drm/i915/intel_lrc.h
> > @@ -29,6 +29,7 @@ void intel_logical_ring_stop(struct intel_engine_cs
> > *ring);  void intel_logical_ring_cleanup(struct intel_engine_cs
> > *ring);  int intel_logical_rings_init(struct drm_device *dev);
> >
> > +int logical_ring_flush_all_caches(struct intel_ringbuffer *ringbuf);
> >  void intel_logical_ring_advance_and_submit(struct intel_ringbuffer
> > *ringbuf);  static inline void intel_logical_ring_advance(struct
> > intel_ringbuffer *ringbuf)  {
> > --
> > 1.7.9.5
> >
> 
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 27/43] drm/i915/bdw: Render state init for Execlists
  2014-08-11 21:25   ` Daniel Vetter
@ 2014-08-13 15:07     ` Daniel, Thomas
  2014-08-13 15:30       ` Daniel Vetter
  0 siblings, 1 reply; 137+ messages in thread
From: Daniel, Thomas @ 2014-08-13 15:07 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx



> -----Original Message-----
> From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel
> Vetter
> Sent: Monday, August 11, 2014 10:25 PM
> To: Daniel, Thomas
> Cc: intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH 27/43] drm/i915/bdw: Render state init for
> Execlists
> 
> On Thu, Jul 24, 2014 at 05:04:35PM +0100, Thomas Daniel wrote:
> > From: Oscar Mateo <oscar.mateo@intel.com>
> >
> > The batchbuffer that sets the render context state is submitted in a
> > different way, and from different places.
> >
> > We needed to make both the render state preparation and free functions
> > outside accesible, and namespace accordingly. This mess is so that all
> > LR, LRC and Execlists functionality can go together in intel_lrc.c: we
> > can fix all of this later on, once the interfaces are clear.
> >
> > v2: Create a separate ctx->rcs_initialized for the Execlists case, as
> > suggested by Chris Wilson.
> >
> > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_drv.h              |    4 +--
> >  drivers/gpu/drm/i915/i915_gem_context.c      |   17 +++++++++-
> >  drivers/gpu/drm/i915/i915_gem_render_state.c |   40 ++++++++++++++--
> ------
> >  drivers/gpu/drm/i915/i915_gem_render_state.h |   47
> ++++++++++++++++++++++++++
> >  drivers/gpu/drm/i915/intel_lrc.c             |   46
> +++++++++++++++++++++++++
> >  drivers/gpu/drm/i915/intel_lrc.h             |    2 ++
> >  drivers/gpu/drm/i915/intel_renderstate.h     |    8 +----
> >  7 files changed, 139 insertions(+), 25 deletions(-)  create mode
> > 100644 drivers/gpu/drm/i915/i915_gem_render_state.h
> >
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h
> > b/drivers/gpu/drm/i915/i915_drv.h index 4303e2c..b7cf0ec 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -37,6 +37,7 @@
> >  #include "intel_ringbuffer.h"
> >  #include "intel_lrc.h"
> >  #include "i915_gem_gtt.h"
> > +#include "i915_gem_render_state.h"
> >  #include <linux/io-mapping.h>
> >  #include <linux/i2c.h>
> >  #include <linux/i2c-algo-bit.h>
> > @@ -623,6 +624,7 @@ struct intel_context {
> >  	} legacy_hw_ctx;
> >
> >  	/* Execlists */
> > +	bool rcs_initialized;
> >  	struct {
> >  		struct drm_i915_gem_object *state;
> >  		struct intel_ringbuffer *ringbuf;
> > @@ -2553,8 +2555,6 @@ int i915_gem_context_create_ioctl(struct
> > drm_device *dev, void *data,  int i915_gem_context_destroy_ioctl(struct
> drm_device *dev, void *data,
> >  				   struct drm_file *file);
> >
> > -/* i915_gem_render_state.c */
> > -int i915_gem_render_state_init(struct intel_engine_cs *ring);
> >  /* i915_gem_evict.c */
> >  int __must_check i915_gem_evict_something(struct drm_device *dev,
> >  					  struct i915_address_space *vm, diff
> --git
> > a/drivers/gpu/drm/i915/i915_gem_context.c
> > b/drivers/gpu/drm/i915/i915_gem_context.c
> > index 9085ff1..0dc6992 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_context.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> > @@ -513,8 +513,23 @@ int i915_gem_context_enable(struct
> drm_i915_private *dev_priv)
> >  		ppgtt->enable(ppgtt);
> >  	}
> >
> > -	if (i915.enable_execlists)
> > +	if (i915.enable_execlists) {
> > +		struct intel_context *dctx;
> > +
> > +		ring = &dev_priv->ring[RCS];
> > +		dctx = ring->default_context;
> > +
> > +		if (!dctx->rcs_initialized) {
> > +			ret = intel_lr_context_render_state_init(ring, dctx);
> > +			if (ret) {
> > +				DRM_ERROR("Init render state failed: %d\n",
> ret);
> > +				return ret;
> > +			}
> > +			dctx->rcs_initialized = true;
> > +		}
> > +
> >  		return 0;
> > +	}
> 
> This looks very much like the wrong place. We should init the render state
> when we create the context, or when we switch to it for the first time.
> The later is what the legacy contexts currently do in do_switch.
> 
> But ctx_enable should do the switch to the default context and that's about
Well, a side-effect of switching to the default context in legacy mode is that
the render state gets initialized.  I could move the lr render state init call
into an enable_execlists branch in i915_switch_context() but that doesn't
seem like the right place.

How about in i915_gem_init() after calling i915_gem_init_hw()?

> it. If there's some dependency then I guess we should stall the creation of the
> default context a bit, maybe.
> 
> In any case someone needs to explain this better, and if there's no other
> way this at least needs a big comment. So I'll punt for now.
When the default context is created the driver is not ready to execute a
batch.  That is why the render state init can't be done then.

Thomas.

> -Daniel
> 
> >
> >  	/* FIXME: We should make this work, even in reset */
> >  	if (i915_reset_in_progress(&dev_priv->gpu_error))
> > diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.c
> > b/drivers/gpu/drm/i915/i915_gem_render_state.c
> > index e60be3f..a9a62d7 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_render_state.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_render_state.c
> > @@ -28,13 +28,6 @@
> >  #include "i915_drv.h"
> >  #include "intel_renderstate.h"
> >
> > -struct render_state {
> > -	const struct intel_renderstate_rodata *rodata;
> > -	struct drm_i915_gem_object *obj;
> > -	u64 ggtt_offset;
> > -	int gen;
> > -};
> > -
> >  static const struct intel_renderstate_rodata *
> > render_state_get_rodata(struct drm_device *dev, const int gen)  { @@
> > -127,30 +120,47 @@ static int render_state_setup(struct render_state *so)
> >  	return 0;
> >  }
> >
> > -static void render_state_fini(struct render_state *so)
> > +void i915_gem_render_state_fini(struct render_state *so)
> >  {
> >  	i915_gem_object_ggtt_unpin(so->obj);
> >  	drm_gem_object_unreference(&so->obj->base);
> >  }
> >
> > -int i915_gem_render_state_init(struct intel_engine_cs *ring)
> > +int i915_gem_render_state_prepare(struct intel_engine_cs *ring,
> > +				  struct render_state *so)
> >  {
> > -	struct render_state so;
> >  	int ret;
> >
> >  	if (WARN_ON(ring->id != RCS))
> >  		return -ENOENT;
> >
> > -	ret = render_state_init(&so, ring->dev);
> > +	ret = render_state_init(so, ring->dev);
> >  	if (ret)
> >  		return ret;
> >
> > -	if (so.rodata == NULL)
> > +	if (so->rodata == NULL)
> >  		return 0;
> >
> > -	ret = render_state_setup(&so);
> > +	ret = render_state_setup(so);
> > +	if (ret) {
> > +		i915_gem_render_state_fini(so);
> > +		return ret;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +int i915_gem_render_state_init(struct intel_engine_cs *ring) {
> > +	struct render_state so;
> > +	int ret;
> > +
> > +	ret = i915_gem_render_state_prepare(ring, &so);
> >  	if (ret)
> > -		goto out;
> > +		return ret;
> > +
> > +	if (so.rodata == NULL)
> > +		return 0;
> >
> >  	ret = ring->dispatch_execbuffer(ring,
> >  					so.ggtt_offset,
> > @@ -164,6 +174,6 @@ int i915_gem_render_state_init(struct
> intel_engine_cs *ring)
> >  	ret = __i915_add_request(ring, NULL, so.obj, NULL);
> >  	/* __i915_add_request moves object to inactive if it fails */
> >  out:
> > -	render_state_fini(&so);
> > +	i915_gem_render_state_fini(&so);
> >  	return ret;
> >  }
> > diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.h
> > b/drivers/gpu/drm/i915/i915_gem_render_state.h
> > new file mode 100644
> > index 0000000..c44961e
> > --- /dev/null
> > +++ b/drivers/gpu/drm/i915/i915_gem_render_state.h
> > @@ -0,0 +1,47 @@
> > +/*
> > + * Copyright (c) 2014 Intel Corporation
> > + *
> > + * Permission is hereby granted, free of charge, to any person
> > +obtaining a
> > + * copy of this software and associated documentation files (the
> > +"Software"),
> > + * to deal in the Software without restriction, including without
> > +limitation
> > + * the rights to use, copy, modify, merge, publish, distribute,
> > +sublicense,
> > + * and/or sell copies of the Software, and to permit persons to whom
> > +the
> > + * Software is furnished to do so, subject to the following conditions:
> > + *
> > + * The above copyright notice and this permission notice (including
> > +the next
> > + * paragraph) shall be included in all copies or substantial portions
> > +of the
> > + * Software.
> > + *
> > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY
> KIND,
> > +EXPRESS OR
> > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> > +MERCHANTABILITY,
> > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN
> NO EVENT
> > +SHALL
> > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
> DAMAGES
> > +OR OTHER
> > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
> OTHERWISE,
> > +ARISING
> > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE
> OR
> > +OTHER
> > + * DEALINGS IN THE SOFTWARE.
> > + */
> > +
> > +#ifndef _I915_GEM_RENDER_STATE_H_
> > +#define _I915_GEM_RENDER_STATE_H_
> > +
> > +#include <linux/types.h>
> > +
> > +struct intel_renderstate_rodata {
> > +	const u32 *reloc;
> > +	const u32 *batch;
> > +	const u32 batch_items;
> > +};
> > +
> > +struct render_state {
> > +	const struct intel_renderstate_rodata *rodata;
> > +	struct drm_i915_gem_object *obj;
> > +	u64 ggtt_offset;
> > +	int gen;
> > +};
> > +
> > +int i915_gem_render_state_init(struct intel_engine_cs *ring); void
> > +i915_gem_render_state_fini(struct render_state *so); int
> > +i915_gem_render_state_prepare(struct intel_engine_cs *ring,
> > +				  struct render_state *so);
> > +
> > +#endif /* _I915_GEM_RENDER_STATE_H_ */
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c
> > b/drivers/gpu/drm/i915/intel_lrc.c
> > index 0a04c03..4549eec 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -925,6 +925,37 @@ cleanup_render_ring:
> >  	return ret;
> >  }
> >
> > +int intel_lr_context_render_state_init(struct intel_engine_cs *ring,
> > +				       struct intel_context *ctx) {
> > +	struct intel_ringbuffer *ringbuf = ctx->engine[ring->id].ringbuf;
> > +	struct render_state so;
> > +	struct drm_i915_file_private *file_priv = ctx->file_priv;
> > +	struct drm_file *file = file_priv? file_priv->file : NULL;
> > +	int ret;
> > +
> > +	ret = i915_gem_render_state_prepare(ring, &so);
> > +	if (ret)
> > +		return ret;
> > +
> > +	if (so.rodata == NULL)
> > +		return 0;
> > +
> > +	ret = ring->emit_bb_start(ringbuf,
> > +			so.ggtt_offset,
> > +			I915_DISPATCH_SECURE);
> > +	if (ret)
> > +		goto out;
> > +
> > +	i915_vma_move_to_active(i915_gem_obj_to_ggtt(so.obj), ring);
> > +
> > +	ret = __i915_add_request(ring, file, so.obj, NULL);
> > +	/* intel_logical_ring_add_request moves object to inactive if it
> > +fails */
> > +out:
> > +	i915_gem_render_state_fini(&so);
> > +	return ret;
> > +}
> > +
> >  static int
> >  populate_lr_context(struct intel_context *ctx, struct
> drm_i915_gem_object *ctx_obj,
> >  		    struct intel_engine_cs *ring, struct intel_ringbuffer
> *ringbuf)
> > @@ -1142,6 +1173,21 @@ int intel_lr_context_deferred_create(struct
> intel_context *ctx,
> >  	ctx->engine[ring->id].ringbuf = ringbuf;
> >  	ctx->engine[ring->id].state = ctx_obj;
> >
> > +	/* The default context will have to wait, because we are not yet
> > +	 * ready to send a batchbuffer at this point */
> > +	if (ring->id == RCS && !ctx->rcs_initialized &&
> > +			ctx != ring->default_context) {
> > +		ret = intel_lr_context_render_state_init(ring, ctx);
> > +		if (ret) {
> > +			DRM_ERROR("Init render state failed: %d\n", ret);
> > +			ctx->engine[ring->id].ringbuf = NULL;
> > +			ctx->engine[ring->id].state = NULL;
> > +			intel_destroy_ringbuffer_obj(ringbuf);
> > +			goto error;
> > +		}
> > +		ctx->rcs_initialized = true;
> > +	}
> > +
> >  	return 0;
> >
> >  error:
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.h
> > b/drivers/gpu/drm/i915/intel_lrc.h
> > index 696e09e..f20c3d2 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.h
> > +++ b/drivers/gpu/drm/i915/intel_lrc.h
> > @@ -43,6 +43,8 @@ static inline void intel_logical_ring_emit(struct
> > intel_ringbuffer *ringbuf, u32  int intel_logical_ring_begin(struct
> > intel_ringbuffer *ringbuf, int num_dwords);
> >
> >  /* Logical Ring Contexts */
> > +int intel_lr_context_render_state_init(struct intel_engine_cs *ring,
> > +				       struct intel_context *ctx);
> >  void intel_lr_context_free(struct intel_context *ctx);  int
> > intel_lr_context_deferred_create(struct intel_context *ctx,
> >  				     struct intel_engine_cs *ring); diff --git
> > a/drivers/gpu/drm/i915/intel_renderstate.h
> > b/drivers/gpu/drm/i915/intel_renderstate.h
> > index fd4f662..6c792d3 100644
> > --- a/drivers/gpu/drm/i915/intel_renderstate.h
> > +++ b/drivers/gpu/drm/i915/intel_renderstate.h
> > @@ -24,13 +24,7 @@
> >  #ifndef _INTEL_RENDERSTATE_H
> >  #define _INTEL_RENDERSTATE_H
> >
> > -#include <linux/types.h>
> > -
> > -struct intel_renderstate_rodata {
> > -	const u32 *reloc;
> > -	const u32 *batch;
> > -	const u32 batch_items;
> > -};
> > +#include "i915_drv.h"
> >
> >  extern const struct intel_renderstate_rodata gen6_null_state;  extern
> > const struct intel_renderstate_rodata gen7_null_state;
> > --
> > 1.7.9.5
> >
> 
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 08/43] drm/i915/bdw: Add a context and an engine pointers to the ringbuffer
  2014-08-13 13:34       ` Daniel, Thomas
@ 2014-08-13 15:16         ` Daniel Vetter
  2014-08-14 15:09           ` Daniel, Thomas
  0 siblings, 1 reply; 137+ messages in thread
From: Daniel Vetter @ 2014-08-13 15:16 UTC (permalink / raw)
  To: Daniel, Thomas; +Cc: intel-gfx

On Wed, Aug 13, 2014 at 01:34:15PM +0000, Daniel, Thomas wrote:
> 
> 
> > -----Original Message-----
> > From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel
> > Vetter
> > Sent: Monday, August 11, 2014 3:21 PM
> > To: Daniel, Thomas
> > Cc: intel-gfx@lists.freedesktop.org
> > Subject: Re: [Intel-gfx] [PATCH 08/43] drm/i915/bdw: Add a context and an
> > engine pointers to the ringbuffer
> > 
> > On Mon, Aug 11, 2014 at 04:14:13PM +0200, Daniel Vetter wrote:
> > > On Thu, Jul 24, 2014 at 05:04:16PM +0100, Thomas Daniel wrote:
> > > > From: Oscar Mateo <oscar.mateo@intel.com>
> > > >
> > > > Any given ringbuffer is unequivocally tied to one context and one engine.
> > > > By setting the appropriate pointers to them, the ringbuffer struct
> > > > holds all the information you might need to submit a workload for
> > > > processing, Execlists style.
> > > >
> > > > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> > > > ---
> > > >  drivers/gpu/drm/i915/intel_lrc.c        |    2 ++
> > > >  drivers/gpu/drm/i915/intel_ringbuffer.c |    2 ++
> > > >  drivers/gpu/drm/i915/intel_ringbuffer.h |    3 +++
> > > >  3 files changed, 7 insertions(+)
> > > >
> > > > diff --git a/drivers/gpu/drm/i915/intel_lrc.c
> > > > b/drivers/gpu/drm/i915/intel_lrc.c
> > > > index 0a12b8c..2eb7db6 100644
> > > > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > > > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > > > @@ -132,6 +132,8 @@ int intel_lr_context_deferred_create(struct
> > intel_context *ctx,
> > > >  		return ret;
> > > >  	}
> > > >
> > > > +	ringbuf->ring = ring;
> > > > +	ringbuf->ctx = ctx;
> > > >  	ringbuf->size = 32 * PAGE_SIZE;
> > > >  	ringbuf->effective_size = ringbuf->size;
> > > >  	ringbuf->head = 0;
> > > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > index 01e9840..279dda4 100644
> > > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > @@ -1570,6 +1570,8 @@ static int intel_init_ring_buffer(struct
> > drm_device *dev,
> > > >  	INIT_LIST_HEAD(&ring->active_list);
> > > >  	INIT_LIST_HEAD(&ring->request_list);
> > > >  	ringbuf->size = 32 * PAGE_SIZE;
> > > > +	ringbuf->ring = ring;
> > > > +	ringbuf->ctx = ring->default_context;
> > >
> > > That doesn't make a terribly lot of sense tbh. I fear it's one of
> > > these slight confusions which will take tons of patches to clean up.
> > > Why exactly do we need the ring->ctx pointer?
> > >
> > > If we only need this for lrc I want to name it accordingly, to make
> > > sure legacy code doesn't grow stupid ideas. And also we should only
> > > initialize this in the lrc ctx init then.
> > >
> > > All patches up to this one merged.
> > 
> > Ok, I've discussed this quickly with Damien on irc. We decided to cut away
> > the ring->ctx part of this patch for now to be able to move on.
> > -Daniel
> As you've seen, removing ringbuffer->ctx causes serious problems with the
> plumbing later on.  This can be renamed (perhaps to lrc) and removed from
> legacy init.
> 
> Each ring buffer belongs to a specific context - it makes sense to me to
> keep this information within the ringbuffer structure so that we don't have
> to pass the context pointer around everywhere.

I agree that it causes trouble with the follow-up patches, but I'm not
> sold on this being a terribly good idea. After all, for ELSP we don't want
to submit a ring, we want to submit the full context. So if the code
that's supposed to do the execlist ctx submission only has the pointer to
the ring object, the layer looks a bit wrong.

The same was true, iirc, of the add_request part.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 21/43] drm/i915/bdw: Emission of requests with logical rings
  2014-08-13 13:34     ` Daniel, Thomas
@ 2014-08-13 15:25       ` Daniel Vetter
  0 siblings, 0 replies; 137+ messages in thread
From: Daniel Vetter @ 2014-08-13 15:25 UTC (permalink / raw)
  To: Daniel, Thomas; +Cc: intel-gfx

On Wed, Aug 13, 2014 at 01:34:28PM +0000, Daniel, Thomas wrote:
> 
> 
> > -----Original Message-----
> > From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel
> > Vetter
> > Sent: Monday, August 11, 2014 9:57 PM
> > To: Daniel, Thomas
> > Cc: intel-gfx@lists.freedesktop.org
> > Subject: Re: [Intel-gfx] [PATCH 21/43] drm/i915/bdw: Emission of requests
> > with logical rings
> > 
> > On Thu, Jul 24, 2014 at 05:04:29PM +0100, Thomas Daniel wrote:
> > > From: Oscar Mateo <oscar.mateo@intel.com>
> > >
> > > On a previous iteration of this patch, I created an Execlists version
> > > of __i915_add_request and abstracted it away as a vfunc. Daniel Vetter
> > > wondered then why that was needed:
> > >
> > > "with the clean split in command submission I expect every function to
> > > know whether it'll submit to an lrc (everything in
> > > intel_lrc.c) or whether it'll submit to a legacy ring (existing code),
> > > so I don't see a need for an add_request vfunc."
> > >
> > > The honest, hairy truth is that this patch is the glue keeping the
> > > whole logical ring puzzle together:
> > 
> > Oops, I didn't spot this and it's indeed not terribly pretty.
> Are you saying you want to go back to a vfunc for add_request?

Nope, I'd have expected that there's no need for such a switch at all, and
that all these differences disappear behind the execbuf cmd submission
abstraction.

> > > - i915_add_request is used by intel_ring_idle, which in turn is
> > >   used by i915_gpu_idle, which in turn is used in several places
> > >   inside the eviction and gtt codes.
> > 
> > This should probably be folded in with the lrc specific version of stop_rings
> > and so should work out.
> > 
> > > - Also, it is used by i915_gem_check_olr, which is littered all
> > >   over i915_gem.c
> > 
> > We now always preallocate the request struct, so olr is officially dead.
> > Well almost, except for non-execbuf stuff that we emit through the rings.
> > Which is nothing for lrc/execlist mode.
> > 
> > Also there's the icky-bitty problem with ringbuf->ctx which makes this patch
> > not apply any more. I think we need to revise or at least discuss a bit.
> > 
> Context is required when doing an advance_and_submit so that we know
> which context to queue in the execlist queue.

Might need to re-read, but imo it's not a good idea to do a submit in
there. If I understand this correctly there should only be a need for an
ELSP write (or queueing it up) at the end of the execlist-specific cmd
submission implementation. The ringbuffer writes we do before that for the
different pieces (flushes, seqno write, bb_start, whatever) should just
update the tail pointer in the ring.

Or do I miss something really big here?

> 
> > > - ...
> > >
> > > If I were to duplicate all the code that directly or indirectly uses
> > > __i915_add_request, I'd end up creating a separate driver.
> > >
> > > To show the differences between the existing legacy version and the
> > > new Execlists one, this time I have special-cased __i915_add_request
> > > instead of adding an add_request vfunc. I hope this helps to untangle
> > > this Gordian knot.
> > >
> > > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> > > ---
> > >  drivers/gpu/drm/i915/i915_gem.c  |   72 ++++++++++++++++++++++++++++----------
> > >  drivers/gpu/drm/i915/intel_lrc.c |   30 +++++++++++++---
> > >  drivers/gpu/drm/i915/intel_lrc.h |    1 +
> > >  3 files changed, 80 insertions(+), 23 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/i915_gem.c
> > > b/drivers/gpu/drm/i915/i915_gem.c index 9560b40..1c83b9c 100644
> > > --- a/drivers/gpu/drm/i915/i915_gem.c
> > > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > > @@ -2327,10 +2327,21 @@ int __i915_add_request(struct intel_engine_cs
> > > *ring,  {
> > >  	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> > >  	struct drm_i915_gem_request *request;
> > > +	struct intel_ringbuffer *ringbuf;
> > >  	u32 request_ring_position, request_start;
> > >  	int ret;
> > >
> > > -	request_start = intel_ring_get_tail(ring->buffer);
> > > +	request = ring->preallocated_lazy_request;
> > > +	if (WARN_ON(request == NULL))
> > > +		return -ENOMEM;
> > > +
> > > +	if (i915.enable_execlists) {
> > > +		struct intel_context *ctx = request->ctx;
> > > +		ringbuf = ctx->engine[ring->id].ringbuf;
> > > +	} else
> > > +		ringbuf = ring->buffer;
> > > +
> > > +	request_start = intel_ring_get_tail(ringbuf);
> > >  	/*
> > >  	 * Emit any outstanding flushes - execbuf can fail to emit the flush
> > >  	 * after having emitted the batchbuffer command. Hence we need to fix
> > > @@ -2338,24 +2349,32 @@ int __i915_add_request(struct intel_engine_cs *ring,
> > >  	 * is that the flush _must_ happen before the next request, no matter
> > >  	 * what.
> > >  	 */
> > > -	ret = intel_ring_flush_all_caches(ring);
> > > -	if (ret)
> > > -		return ret;
> > > -
> > > -	request = ring->preallocated_lazy_request;
> > > -	if (WARN_ON(request == NULL))
> > > -		return -ENOMEM;
> > > +	if (i915.enable_execlists) {
> > > +		ret = logical_ring_flush_all_caches(ringbuf);
> > > +		if (ret)
> > > +			return ret;
> > > +	} else {
> > > +		ret = intel_ring_flush_all_caches(ring);
> > > +		if (ret)
> > > +			return ret;
> > > +	}
> > >
> > >  	/* Record the position of the start of the request so that
> > >  	 * should we detect the updated seqno part-way through the
> > >  	 * GPU processing the request, we never over-estimate the
> > >  	 * position of the head.
> > >  	 */
> > > -	request_ring_position = intel_ring_get_tail(ring->buffer);
> > > +	request_ring_position = intel_ring_get_tail(ringbuf);
> > >
> > > -	ret = ring->add_request(ring);
> > > -	if (ret)
> > > -		return ret;
> > > +	if (i915.enable_execlists) {
> > > +		ret = ring->emit_request(ringbuf);
> > > +		if (ret)
> > > +			return ret;
> > > +	} else {
> > > +		ret = ring->add_request(ring);
> > > +		if (ret)
> > > +			return ret;
> > > +	}
> > >
> > >  	request->seqno = intel_ring_get_seqno(ring);
> > >  	request->ring = ring;
> > > @@ -2370,12 +2389,14 @@ int __i915_add_request(struct intel_engine_cs *ring,
> > >  	 */
> > >  	request->batch_obj = obj;
> > >
> > > -	/* Hold a reference to the current context so that we can inspect
> > > -	 * it later in case a hangcheck error event fires.
> > > -	 */
> > > -	request->ctx = ring->last_context;
> > > -	if (request->ctx)
> > > -		i915_gem_context_reference(request->ctx);
> > > +	if (!i915.enable_execlists) {
> > > +		/* Hold a reference to the current context so that we can inspect
> > > +		 * it later in case a hangcheck error event fires.
> > > +		 */
> > > +		request->ctx = ring->last_context;
> > > +		if (request->ctx)
> > > +			i915_gem_context_reference(request->ctx);
> > > +	}
> > >
> > >  	request->emitted_jiffies = jiffies;
> > >  	list_add_tail(&request->list, &ring->request_list);
> > > @@ -2630,6 +2651,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
> > >
> > >  	while (!list_empty(&ring->request_list)) {
> > >  		struct drm_i915_gem_request *request;
> > > +		struct intel_ringbuffer *ringbuf;
> > >
> > >  		request = list_first_entry(&ring->request_list,
> > >  					   struct drm_i915_gem_request,
> > > @@ -2639,12 +2661,24 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
> > >  			break;
> > >
> > >  		trace_i915_gem_request_retire(ring, request->seqno);
> > > +
> > > +		/* This is one of the few common intersection points
> > > +		 * between legacy ringbuffer submission and execlists:
> > > +		 * we need to tell them apart in order to find the correct
> > > +		 * ringbuffer to which the request belongs to.
> > > +		 */
> > > +		if (i915.enable_execlists) {
> > > +			struct intel_context *ctx = request->ctx;
> > > +			ringbuf = ctx->engine[ring->id].ringbuf;
> > > +		} else
> > > +			ringbuf = ring->buffer;
> > > +
> > >  		/* We know the GPU must have read the request to have
> > >  		 * sent us the seqno + interrupt, so use the position
> > >  		 * of tail of the request to update the last known position
> > >  		 * of the GPU head.
> > >  		 */
> > > -		ring->buffer->last_retired_head = request->tail;
> > > +		ringbuf->last_retired_head = request->tail;
> > >
> > >  		i915_gem_free_request(request);
> > >  	}
> > > diff --git a/drivers/gpu/drm/i915/intel_lrc.c
> > > b/drivers/gpu/drm/i915/intel_lrc.c
> > > index 5dd63d6..dcf59c6 100644
> > > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > > @@ -106,6 +106,22 @@ void intel_logical_ring_stop(struct intel_engine_cs *ring)
> > >  	/* TODO */
> > >  }
> > >
> > > +int logical_ring_flush_all_caches(struct intel_ringbuffer *ringbuf)
> > > +{
> > > +	struct intel_engine_cs *ring = ringbuf->ring;
> > > +	int ret;
> > > +
> > > +	if (!ring->gpu_caches_dirty)
> > > +		return 0;
> > > +
> > > +	ret = ring->emit_flush(ringbuf, 0, I915_GEM_GPU_DOMAINS);
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	ring->gpu_caches_dirty = false;
> > > +	return 0;
> > > +}
> > > +
> > >  void intel_logical_ring_advance_and_submit(struct intel_ringbuffer *ringbuf)
> > >  {
> > >  	intel_logical_ring_advance(ringbuf);
> > > @@ -116,7 +132,8 @@ void intel_logical_ring_advance_and_submit(struct intel_ringbuffer *ringbuf)
> > >  	/* TODO: how to submit a context to the ELSP is not here yet */
> > >  }
> > >
> > > -static int logical_ring_alloc_seqno(struct intel_engine_cs *ring)
> > > +static int logical_ring_alloc_seqno(struct intel_engine_cs *ring,
> > > +				    struct intel_context *ctx)
> > >  {
> > >  	if (ring->outstanding_lazy_seqno)
> > >  		return 0;
> > > @@ -128,6 +145,13 @@ static int logical_ring_alloc_seqno(struct intel_engine_cs *ring)
> > >  		if (request == NULL)
> > >  			return -ENOMEM;
> > >
> > > +		/* Hold a reference to the context this request belongs to
> > > +		 * (we will need it when the time comes to emit/retire the
> > > +		 * request).
> > > +		 */
> > > +		request->ctx = ctx;
> > > +		i915_gem_context_reference(request->ctx);
> > > +
> > >  		ring->preallocated_lazy_request = request;
> > >  	}
> > >
> > > @@ -165,8 +189,6 @@ static int logical_ring_wait_request(struct intel_ringbuffer *ringbuf, int bytes
> > >  	if (ret)
> > >  		return ret;
> > >
> > > -	/* TODO: make sure we update the right ringbuffer's last_retired_head
> > > -	 * when retiring requests */
> > >  	i915_gem_retire_requests_ring(ring);
> > >  	ringbuf->head = ringbuf->last_retired_head;
> > >  	ringbuf->last_retired_head = -1;
> > > @@ -291,7 +313,7 @@ int intel_logical_ring_begin(struct intel_ringbuffer *ringbuf, int num_dwords)
> > >  		return ret;
> > >
> > >  	/* Preallocate the olr before touching the ring */
> > > -	ret = logical_ring_alloc_seqno(ring);
> > > +	ret = logical_ring_alloc_seqno(ring, ringbuf->ctx);
> > 
> > Ok, this is hairy. Really hairy, since this uses ringbuf->ctx. Not sure we really
> > want this or need this.
> > 
> alloc_seqno needs to know the context since it also preallocates and
> populates the request, which belongs to a specific context.  Having the
> logical context pointer inside the ring buffer makes the code a lot more
> elegant imo, since we don't have to keep passing buffer and context
> around when there is only one context which can use each ring buffer,
> just as keeping the engine pointer in the ring buffer means we don't
> have to pass the engine with it.
> 
> As I said in my reply to your patch 8 review I'm happy for ctx to be
> renamed to avoid confusion in the legacy path

Well I'm not yet sold on whether we need it at all. For this case here
doing the request allocation deeply buried in the ring_begin function
isn't pretty. It's an artifact of 5 years of engineering history for the
legacy gem codepaths essentially, and with execlists (and the new
explicitly fenced world everyone wants to have) I don't think it's a sound
decision any more.

Imo for execlist it would make a lot more sense to do this request
preallocation much higher up in the execlist cmd submission function.
That's kinda one of the reasons I wanted that abstraction. It would be
placed after all the prep work and argument checking is done right before
we start to fill in new instructions into the lrc ring.

logical_ring_begin would then only have a
WARN_ON(!request_preallocated).

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

* Re: [PATCH 27/43] drm/i915/bdw: Render state init for Execlists
  2014-08-13 15:07     ` Daniel, Thomas
@ 2014-08-13 15:30       ` Daniel Vetter
  2014-08-14 20:00         ` Daniel Vetter
  0 siblings, 1 reply; 137+ messages in thread
From: Daniel Vetter @ 2014-08-13 15:30 UTC (permalink / raw)
  To: Daniel, Thomas; +Cc: intel-gfx

On Wed, Aug 13, 2014 at 03:07:29PM +0000, Daniel, Thomas wrote:
> > -----Original Message-----
> > From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel
> > Vetter
> > Sent: Monday, August 11, 2014 10:25 PM
> > To: Daniel, Thomas
> > Cc: intel-gfx@lists.freedesktop.org
> > Subject: Re: [Intel-gfx] [PATCH 27/43] drm/i915/bdw: Render state init for
> > Execlists
> > 
> > On Thu, Jul 24, 2014 at 05:04:35PM +0100, Thomas Daniel wrote:
> > > From: Oscar Mateo <oscar.mateo@intel.com>
 > > index 9085ff1..0dc6992 100644
> > > --- a/drivers/gpu/drm/i915/i915_gem_context.c
> > > +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> > > @@ -513,8 +513,23 @@ int i915_gem_context_enable(struct drm_i915_private *dev_priv)
> > >  		ppgtt->enable(ppgtt);
> > >  	}
> > >
> > > -	if (i915.enable_execlists)
> > > +	if (i915.enable_execlists) {
> > > +		struct intel_context *dctx;
> > > +
> > > +		ring = &dev_priv->ring[RCS];
> > > +		dctx = ring->default_context;
> > > +
> > > +		if (!dctx->rcs_initialized) {
> > > +			ret = intel_lr_context_render_state_init(ring, dctx);
> > > +			if (ret) {
> > > +				DRM_ERROR("Init render state failed: %d\n", ret);
> > > +				return ret;
> > > +			}
> > > +			dctx->rcs_initialized = true;
> > > +		}
> > > +
> > >  		return 0;
> > > +	}
> > 
> > This looks very much like the wrong place. We should init the render state
> > when we create the context, or when we switch to it for the first time.
> > The latter is what the legacy contexts currently do in do_switch.
> > 
> > But ctx_enable should do the switch to the default context and that's about
> Well, a side-effect of switching to the default context in legacy mode is that
> the render state gets initialized.  I could move the lr render state init call
> into an enable_execlists branch in i915_switch_context() but that doesn't
> seem like the right place.
> 
> How about in i915_gem_init() after calling i915_gem_init_hw()?
> 
> > it. If there's some dependency then I guess we should stall the creation of the
> > default context a bit, maybe.
> > 
> > In any case someone needs to explain this better and if there's no other
> > way this at least needs a big comment. So I'll punt for now.
> When the default context is created the driver is not ready to execute a
> batch.  That is why the render state init can't be done then.

That sounds like the default context is created too early. Essentially I
want to avoid needless divergence between the default context and normal
contexts, because sooner or later that will mean someone will creep in
with a _really_ subtle bug.

What about:
- We create the default lrc contexts in context_init, but like with a
  normal context we don't do any of the deferred setup.
- In context_enable (which since yesterday properly propagates errors to
  callers) we force the deferred lrc ctx setup for the default contexts on
  all engines.
- The render state init is done as part of the deferred ctx setup for the
  render engine in all cases.

Totally off the track or do you see a workable solution somewhere in that
direction?

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

* Re: [PATCH 08/43] drm/i915/bdw: Add a context and an engine pointers to the ringbuffer
  2014-08-13 15:16         ` Daniel Vetter
@ 2014-08-14 15:09           ` Daniel, Thomas
  2014-08-14 15:32             ` Daniel Vetter
  0 siblings, 1 reply; 137+ messages in thread
From: Daniel, Thomas @ 2014-08-14 15:09 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx



> -----Original Message-----
> From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel
> Vetter
> Sent: Wednesday, August 13, 2014 4:16 PM
> To: Daniel, Thomas
> Cc: Daniel Vetter; intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH 08/43] drm/i915/bdw: Add a context and an
> engine pointers to the ringbuffer
> 
> On Wed, Aug 13, 2014 at 01:34:15PM +0000, Daniel, Thomas wrote:
> >
> >
> > > -----Original Message-----
> > > From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of
> > > Daniel Vetter
> > > Sent: Monday, August 11, 2014 3:21 PM
> > > To: Daniel, Thomas
> > > Cc: intel-gfx@lists.freedesktop.org
> > > Subject: Re: [Intel-gfx] [PATCH 08/43] drm/i915/bdw: Add a context
> > > and an engine pointers to the ringbuffer
> > >
> > > On Mon, Aug 11, 2014 at 04:14:13PM +0200, Daniel Vetter wrote:
> > > > On Thu, Jul 24, 2014 at 05:04:16PM +0100, Thomas Daniel wrote:
> > > > > From: Oscar Mateo <oscar.mateo@intel.com>
> > > > >
> > > > > Any given ringbuffer is unequivocally tied to one context and one
> engine.
> > > > > By setting the appropriate pointers to them, the ringbuffer
> > > > > struct holds all the information you might need to submit a
> > > > > workload for processing, Execlists style.
> > > > >
> > > > > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> > > > > ---
> > > > >  drivers/gpu/drm/i915/intel_lrc.c        |    2 ++
> > > > >  drivers/gpu/drm/i915/intel_ringbuffer.c |    2 ++
> > > > >  drivers/gpu/drm/i915/intel_ringbuffer.h |    3 +++
> > > > >  3 files changed, 7 insertions(+)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/i915/intel_lrc.c
> > > > > b/drivers/gpu/drm/i915/intel_lrc.c
> > > > > index 0a12b8c..2eb7db6 100644
> > > > > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > > > > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > > > > @@ -132,6 +132,8 @@ int intel_lr_context_deferred_create(struct
> > > intel_context *ctx,
> > > > >  		return ret;
> > > > >  	}
> > > > >
> > > > > +	ringbuf->ring = ring;
> > > > > +	ringbuf->ctx = ctx;
> > > > >  	ringbuf->size = 32 * PAGE_SIZE;
> > > > >  	ringbuf->effective_size = ringbuf->size;
> > > > >  	ringbuf->head = 0;
> > > > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > index 01e9840..279dda4 100644
> > > > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > @@ -1570,6 +1570,8 @@ static int intel_init_ring_buffer(struct
> > > drm_device *dev,
> > > > >  	INIT_LIST_HEAD(&ring->active_list);
> > > > >  	INIT_LIST_HEAD(&ring->request_list);
> > > > >  	ringbuf->size = 32 * PAGE_SIZE;
> > > > > +	ringbuf->ring = ring;
> > > > > +	ringbuf->ctx = ring->default_context;
> > > >
> > > > That doesn't make a terrible lot of sense tbh. I fear it's one of
> > > > these slight confusions which will take tons of patches to clean up.
> > > > Why exactly do we need the ring->ctx pointer?
> > > >
> > > > If we only need this for lrc I want to name it accordingly, to
> > > > make sure legacy code doesn't grow stupid ideas. And also we
> > > > should only initialize this in the lrc ctx init then.
> > > >
> > > > All patches up to this one merged.
> > >
> > > Ok, I've discussed this quickly with Damien on irc. We decided to
> > > cut away the ring->ctx part of this patch for now to be able to move on.
> > > -Daniel
> > As you've seen, removing ringbuffer->ctx causes serious problems with
> > the plumbing later on.  This can be renamed (perhaps to lrc) and
> > removed from legacy init.
> >
> > Each ring buffer belongs to a specific context - it makes sense to me
> > to keep this information within the ringbuffer structure so that we
> > don't have to pass the context pointer around everywhere.
> 
> I agree that it causes trouble with the follow-up patches, but I'm not sold on
> this being a terribly good idea. After all for ELSP we don't want to submit a
> ring, we want to submit the full context. So if the code that's supposed to do
> the execlist ctx submission only has the pointer to the ring object, the layer
> looks a bit wrong.
When it comes to the execlist submission (actually as early as the execlist
request queueing), the engine and context are indeed used and required.
intel_logical_ring_advance_and_submit() is the lrc function analogous to
__intel_ring_advance() and I believe the initial creation of intel_lrc.c
was actually done by copying intel_ringbuffer.c.  This explains why some of
the lrc code is perhaps not as it would have been if this had been designed
from scratch, and there is room for future improvement.
advance_and_submit therefore only gets the ringbuffer struct and uses
the context pointer in that struct to get the logical ring context itself.  At
that point the engine, context and new tail pointer are handed over to the
execlist queue backend.

Thomas.

> 
> Same was iirc about the add_request part.
> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch

* Re: [PATCH 08/43] drm/i915/bdw: Add a context and an engine pointers to the ringbuffer
  2014-08-14 15:09           ` Daniel, Thomas
@ 2014-08-14 15:32             ` Daniel Vetter
  2014-08-14 15:37               ` Daniel Vetter
  0 siblings, 1 reply; 137+ messages in thread
From: Daniel Vetter @ 2014-08-14 15:32 UTC (permalink / raw)
  To: Daniel, Thomas; +Cc: intel-gfx

On Thu, Aug 14, 2014 at 03:09:45PM +0000, Daniel, Thomas wrote:
> When it comes to the execlist submission (actually as early as the execlist
> request queueing), the engine and context are indeed used and required.
> intel_logical_ring_advance_and_submit() is the lrc function analogous to
> __intel_ring_advance() and I believe the initial creation of intel_lrc.c
> was actually done by copying intel_ringbuffer.c.  This explains why some of
> the lrc code is perhaps not as it would have been if this had been designed
> from scratch, and there is room for future improvement.
> advance_and_submit therefore only gets the ringbuffer struct and uses
> the context pointer in that struct to get the logical ring context itself.  At
> that point the engine, context and new tail pointer are handed over to the
> execlist queue backend.

I guess I need to clarify: Does it make sense to move the ELSP write,
respectively the submission to the execlist scheduler queue, out of there
and up a few levels into the execlist cmd submission function? Is it possible
or is there some technical reason that I'm overlooking?

I want to know what exactly I'm dealing with here before I sign up for it
by merging the patches as-is and asking for a cleanup. It doesn't look bad
really, but there's always a good chance that I've overlooked a bigger
dragon.

Since you have the patches and worked with them I'm asking you such
explorative questions. Ofc I can do this checking myself, but that takes
time ... This doesn't mean that you have to implement the changes, just be
reasonably confident that it will work out as a cleanup on top.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

* Re: [PATCH 08/43] drm/i915/bdw: Add a context and an engine pointers to the ringbuffer
  2014-08-14 15:32             ` Daniel Vetter
@ 2014-08-14 15:37               ` Daniel Vetter
  2014-08-14 15:56                 ` Daniel, Thomas
  0 siblings, 1 reply; 137+ messages in thread
From: Daniel Vetter @ 2014-08-14 15:37 UTC (permalink / raw)
  To: Daniel, Thomas; +Cc: intel-gfx

On Thu, Aug 14, 2014 at 05:32:28PM +0200, Daniel Vetter wrote:
> On Thu, Aug 14, 2014 at 03:09:45PM +0000, Daniel, Thomas wrote:
> > When it comes to the execlist submission (actually as early as the execlist
> > request queueing), the engine and context are indeed used and required.
> > intel_logical_ring_advance_and_submit() is the lrc function analogous to
> > __intel_ring_advance() and I believe the initial creation of intel_lrc.c
> > was actually done by copying intel_ringbuffer.c.  This explains why some of
> > the lrc code is perhaps not as it would have been if this had been designed
> > from scratch, and there is room for future improvement.
> > advance_and_submit therefore only gets the ringbuffer struct and uses
> > the context pointer in that struct to get the logical ring context itself.  At
> > that point the engine, context and new tail pointer are handed over to the
> > execlist queue backend.
> 
> I guess I need to clarify: Does it make sense to move the ELSP
> respectively the submission to the execlist scheduler queue out of there
> up a few levels into the execlist cmd submission function? Is it possible
> or is there some technical reason that I'm overlooking?
> 
> I want to know what exactly I'm dealing with here before I sign up for it
> by merging the patches as-is and asking for a cleanup. It doesn't look bad
> really, but there's always a good chance that I've overlooked a bigger
> dragon.
> 
> Since you have the patches and worked with them I'm asking you such
> explorative questions. Ofc I can do this checking myself, but that takes
> time ... This doesn't mean that you have to implement the changes, just be
> reasonably confident that it will work out as a cleanup on top.

To clarify more the context: Currently your replies sound like "This is
what it looks like and I don't really know why nor whether we can change
that". That's not confidence-instilling and that makes maintainers
reluctant to merge patches for fear of needing to fix things themselves
;-)
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

* Re: [PATCH 08/43] drm/i915/bdw: Add a context and an engine pointers to the ringbuffer
  2014-08-14 15:37               ` Daniel Vetter
@ 2014-08-14 15:56                 ` Daniel, Thomas
  2014-08-14 16:19                   ` Daniel Vetter
  0 siblings, 1 reply; 137+ messages in thread
From: Daniel, Thomas @ 2014-08-14 15:56 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx



> -----Original Message-----
> From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel
> Vetter
> Sent: Thursday, August 14, 2014 4:37 PM
> To: Daniel, Thomas
> Cc: Daniel Vetter; intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH 08/43] drm/i915/bdw: Add a context and an
> engine pointers to the ringbuffer
> 
> On Thu, Aug 14, 2014 at 05:32:28PM +0200, Daniel Vetter wrote:
> > On Thu, Aug 14, 2014 at 03:09:45PM +0000, Daniel, Thomas wrote:
> > > When it comes to the execlist submission (actually as early as the
> > > execlist request queueing), the engine and context are indeed used and
> required.
> > > intel_logical_ring_advance_and_submit() is the lrc function
> > > analogous to
> > > __intel_ring_advance() and I believe the initial creation of
> > > intel_lrc.c was actually done by copying intel_ringbuffer.c.  This
> > > explains why some of the lrc code is perhaps not as it would have
> > > been if this had been designed from scratch, and there is room for future
> improvement.
> > > advance_and_submit therefore only gets the ringbuffer struct and
> > > uses the context pointer in that struct to get the logical ring
> > > context itself.  At that point the engine, context and new tail
> > > pointer are handed over to the execlist queue backend.
> >
> > I guess I need to clarify: Does it make sense to move the ELSP
> > respectively the submission to the execlist scheduler queue out of
> > there up a few levels into the execlist cmd submission function? Is it
> > possible or is there some technical reason that I'm overlooking?
Yes this would make sense, and we already have a separate emit_request
vfunc which is only used in lrc mode so we can for example change the
signature to accept a drm_i915_gem_request* directly and take the ctx
and engine from there.

> >
> > I want to know what exactly I'm dealing with here before I sign up for
> > it by merging the patches as-is and asking for a cleanup. It doesn't
> > look bad really, but there's always a good chance that I've overlooked
> > a bigger dragon.
> >
> > Since you have the patches and worked with them I'm asking you such
> > explorative questions. Ofc I can do this checking myself, but that
> > takes time ... This doesn't mean that you have to implement the
> > changes, just be reasonably confident that it will work out as a cleanup on
> top.
Understood.

> 
> To clarify more the context: Currently your replies sound like "This is what it
> looks like and I don't really know why nor whether we can change that".
You're right, I don't know why it was done this way.  But I can see that there
is no problem with changing it later - it's only a piece of code after all...

> That's not confidence instilling and that makes maintainers reluctant to
> merge patches for fear of needing to fix things themselves
> ;-)
If it helps, I can tell you that several guys in our team are working with this
code and we have a vested interest in making sure the quality is as high as
possible.

Cheers,
Thomas.
> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch

* Re: [PATCH 08/43] drm/i915/bdw: Add a context and an engine pointers to the ringbuffer
  2014-08-14 15:56                 ` Daniel, Thomas
@ 2014-08-14 16:19                   ` Daniel Vetter
  2014-08-14 16:27                     ` [PATCH] drm/i915: Add temporary ring->ctx backpointer Daniel Vetter
  0 siblings, 1 reply; 137+ messages in thread
From: Daniel Vetter @ 2014-08-14 16:19 UTC (permalink / raw)
  To: Daniel, Thomas; +Cc: intel-gfx

On Thu, Aug 14, 2014 at 03:56:20PM +0000, Daniel, Thomas wrote:
> 
> 
> > -----Original Message-----
> > From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel
> > Vetter
> > Sent: Thursday, August 14, 2014 4:37 PM
> > To: Daniel, Thomas
> > Cc: Daniel Vetter; intel-gfx@lists.freedesktop.org
> > Subject: Re: [Intel-gfx] [PATCH 08/43] drm/i915/bdw: Add a context and an
> > engine pointers to the ringbuffer
> > 
> > On Thu, Aug 14, 2014 at 05:32:28PM +0200, Daniel Vetter wrote:
> > > On Thu, Aug 14, 2014 at 03:09:45PM +0000, Daniel, Thomas wrote:
> > > > When it comes to the execlist submission (actually as early as the
> > > > execlist request queueing), the engine and context are indeed used and
> > required.
> > > > intel_logical_ring_advance_and_submit() is the lrc function
> > > > analogous to
> > > > __intel_ring_advance() and I believe the initial creation of
> > > > intel_lrc.c was actually done by copying intel_ringbuffer.c.  This
> > > > explains why some of the lrc code is perhaps not as it would have
> > > > been if this had been designed from scratch, and there is room for future
> > improvement.
> > > > advance_and_submit therefore only gets the ringbuffer struct and
> > > > uses the context pointer in that struct to get the logical ring
> > > > context itself.  At that point the engine, context and new tail
> > > > pointer are handed over to the execlist queue backend.
> > >
> > > I guess I need to clarify: Does it make sense to move the ELSP
> > > respectively the submission to the execlist scheduler queue out of
> > > there up a few levels into the execlist cmd submission function? Is it
> > > possible or is there some technical reason that I'm overlooking?
> Yes this would make sense, and we already have a separate emit_request
> vfunc which is only used in lrc mode so we can for example change the
> signature to accept a drm_i915_gem_request* directly and take the ctx
> and engine from there.

Hm, I didn't spot the emit_request vfunc yet. Probably another one that
I'll ask you to fold in ;-)

> > > I want to know what exactly I'm dealing with here before I sign up for
> > > it by merging the patches as-is and asking for a cleanup. It doesn't
> > > look bad really, but there's always a good chance that I've overlooked
> > > a bigger dragon.
> > >
> > > Since you have the patches and worked with them I'm asking you such
> > > explorative questions. Ofc I can do this checking myself, but that
> > > takes time ... This doesn't mean that you have to implement the
> > > changes, just be reasonably confident that it will work out as a cleanup on
> > top.
> Understood.
> 
> > 
> > To clarify more the context: Currently you're replies sound like "This is what it
> > looks like and I don't really know why nor whether we can change that".
> You're right, I don't know why it was done this way.  But I can see that there
> is no problem to change it later - it's only a piece of code after all...
> 
> > That's not confidence instilling and that makes maintainers reluctant to
> > merge patches for fear of needing to fix things themselves
> > ;-)
> If it helps, I can tell you that several guys in our team are working with this
> code and we have a vested interest in making sure the quality is as high as
> possible.

Ok, I think you get the hang of how this "guide your maintainer into
accepting stuff" game works ;-) I'll take your word for it that there's no
dragon hiding and that you'll bravely fight it anyway and will merge in a
few more patches. And I'll do a JIRA to jot down the restructuring we need
to do.

Merging might be a bit slower since I'm heading to Chicago on Saturday
already, so I might ask you to resend the remaining patches.

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH] drm/i915: Add temporary ring->ctx backpointer
  2014-08-14 16:19                   ` Daniel Vetter
@ 2014-08-14 16:27                     ` Daniel Vetter
  2014-08-14 16:33                       ` Daniel, Thomas
  0 siblings, 1 reply; 137+ messages in thread
From: Daniel Vetter @ 2014-08-14 16:27 UTC (permalink / raw)
  To: Intel Graphics Development; +Cc: Daniel Vetter

From: Oscar Mateo <oscar.mateo@intel.com>

The execlist patches have a bit of a convoluted and long history, and due
to that the actual submission is still misplaced, deeply buried in
the low-level ringbuffer handling code. This design goes back to the
legacy ringbuffer code with its tricky lazy request and simple work
submission using ring tail writes. For that reason they need a
ring->ctx backpointer.

The goal is to unbury that code and move it up to a level where the
full execlist context is available, so that we can ditch this
backpointer. Until that's done, make it really obvious that there's
work still to be done.

Cc: Oscar Mateo <oscar.mateo@intel.com>
Cc: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

--

Thomas, please ack this patch and the general plan we've discussed.
Then I'll start pulling in more patches and I'll do the
s/ctx/FIXME_lrc_ctx/ on the fly.
-Daniel
---
 drivers/gpu/drm/i915/intel_lrc.c        | 2 ++
 drivers/gpu/drm/i915/intel_ringbuffer.h | 7 +++++++
 2 files changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 6b5f416b5c0d..c2352d1b23fa 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1086,6 +1086,8 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 	}
 
 	ringbuf->ring = ring;
+	ringbuf->FIXME_lrc_ctx = ctx;
+
 	ringbuf->size = 32 * PAGE_SIZE;
 	ringbuf->effective_size = ringbuf->size;
 	ringbuf->head = 0;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 24437da91f77..26785ca72530 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -99,6 +99,13 @@ struct intel_ringbuffer {
 
 	struct intel_engine_cs *ring;
 
+	/*
+	 * FIXME: This backpointer is an artifact of the history of how the
+	 * execlist patches came into being. It will get removed once the basic
+	 * code has landed.
+	 */
+	struct intel_context *FIXME_lrc_ctx;
+
 	u32 head;
 	u32 tail;
 	int space;
-- 
2.0.1

^ permalink raw reply related	[flat|nested] 137+ messages in thread

* Re: [PATCH] drm/i915: Add temporary ring->ctx backpointer
  2014-08-14 16:27                     ` [PATCH] drm/i915: Add temporary ring->ctx backpointer Daniel Vetter
@ 2014-08-14 16:33                       ` Daniel, Thomas
  0 siblings, 0 replies; 137+ messages in thread
From: Daniel, Thomas @ 2014-08-14 16:33 UTC (permalink / raw)
  To: Daniel Vetter, Intel Graphics Development



> -----Original Message-----
> From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch]
> Sent: Thursday, August 14, 2014 5:28 PM
> To: Intel Graphics Development
> Cc: Mateo Lozano, Oscar; Daniel, Thomas; Daniel Vetter
> Subject: [PATCH] drm/i915: Add temporary ring->ctx backpointer
> 
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> The execlist patches have a bit of a convoluted and long history, and due to
> that the actual submission is still misplaced, deeply buried in the low-level
> ringbuffer handling code. This design goes back to the legacy ringbuffer code
> with its tricky lazy request and simple work submission using ring tail writes.
> For that reason they need a ring->ctx backpointer.
> 
> The goal is to unbury that code and move it up to a level where the full
> execlist context is available, so that we can ditch this backpointer. Until
> that's done, make it really obvious that there's work still to be done.
> 
> Cc: Oscar Mateo <oscar.mateo@intel.com>
> Cc: Thomas Daniel <thomas.daniel@intel.com>
> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> 
> --
> 
> Thomas, please ack this patch and the general plan we've discussed.
Acked-by: Thomas Daniel <thomas.daniel@intel.com>

> Then I'll start pulling in more patches and I'll do the s/ctx/FIXME_lrc_ctx/ on
> the fly.
> -Daniel
> ---
>  drivers/gpu/drm/i915/intel_lrc.c        | 2 ++
>  drivers/gpu/drm/i915/intel_ringbuffer.h | 7 +++++++
>  2 files changed, 9 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c
> b/drivers/gpu/drm/i915/intel_lrc.c
> index 6b5f416b5c0d..c2352d1b23fa 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1086,6 +1086,8 @@ int intel_lr_context_deferred_create(struct
> intel_context *ctx,
>  	}
> 
>  	ringbuf->ring = ring;
> +	ringbuf->FIXME_lrc_ctx = ctx;
> +
>  	ringbuf->size = 32 * PAGE_SIZE;
>  	ringbuf->effective_size = ringbuf->size;
>  	ringbuf->head = 0;
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h
> b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 24437da91f77..26785ca72530 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -99,6 +99,13 @@ struct intel_ringbuffer {
> 
>  	struct intel_engine_cs *ring;
> 
> +	/*
> +	 * FIXME: This backpointer is an artifact of the history of how the
> +	 * execlist patches came into being. It will get removed once the basic
> +	 * code has landed.
> +	 */
> +	struct intel_context *FIXME_lrc_ctx;
> +
>  	u32 head;
>  	u32 tail;
>  	int space;
> --
> 2.0.1

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 27/43] drm/i915/bdw: Render state init for Execlists
  2014-08-13 15:30       ` Daniel Vetter
@ 2014-08-14 20:00         ` Daniel Vetter
  2014-08-15  8:43           ` Daniel, Thomas
  2014-08-20 15:55           ` Daniel, Thomas
  0 siblings, 2 replies; 137+ messages in thread
From: Daniel Vetter @ 2014-08-14 20:00 UTC (permalink / raw)
  To: Daniel, Thomas; +Cc: intel-gfx

On Wed, Aug 13, 2014 at 05:30:07PM +0200, Daniel Vetter wrote:
> On Wed, Aug 13, 2014 at 03:07:29PM +0000, Daniel, Thomas wrote:
> > > -----Original Message-----
> > > From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel
> > > Vetter
> > > Sent: Monday, August 11, 2014 10:25 PM
> > > To: Daniel, Thomas
> > > Cc: intel-gfx@lists.freedesktop.org
> > > Subject: Re: [Intel-gfx] [PATCH 27/43] drm/i915/bdw: Render state init for
> > > Execlists
> > > 
> > > On Thu, Jul 24, 2014 at 05:04:35PM +0100, Thomas Daniel wrote:
> > > > From: Oscar Mateo <oscar.mateo@intel.com>
>  > > index 9085ff1..0dc6992 100644
> > > > --- a/drivers/gpu/drm/i915/i915_gem_context.c
> > > > +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> > > > @@ -513,8 +513,23 @@ int i915_gem_context_enable(struct
> > > drm_i915_private *dev_priv)
> > > >  		ppgtt->enable(ppgtt);
> > > >  	}
> > > >
> > > > -	if (i915.enable_execlists)
> > > > +	if (i915.enable_execlists) {
> > > > +		struct intel_context *dctx;
> > > > +
> > > > +		ring = &dev_priv->ring[RCS];
> > > > +		dctx = ring->default_context;
> > > > +
> > > > +		if (!dctx->rcs_initialized) {
> > > > +			ret = intel_lr_context_render_state_init(ring, dctx);
> > > > +			if (ret) {
> > > > +				DRM_ERROR("Init render state failed: %d\n",
> > > ret);
> > > > +				return ret;
> > > > +			}
> > > > +			dctx->rcs_initialized = true;
> > > > +		}
> > > > +
> > > >  		return 0;
> > > > +	}
> > > 
> > > This looks very much like the wrong place. We should init the render state
> > > when we create the context, or when we switch to it for the first time.
> > > The later is what the legacy contexts currently do in do_switch.
> > > 
> > > But ctx_enable should do the switch to the default context and that's about
> > Well, a side-effect of switching to the default context in legacy mode is that
> > the render state gets initialized.  I could move the lr render state init call
> > into an enable_execlists branch in i915_switch_context() but that doesn't
> > seem like the right place.
> > 
> > How about in i915_gem_init() after calling i915_gem_init_hw()?
> > 
> > > it. If there's some dependency then I guess we should stall the creation
> > > of the default context a bit, maybe.
> > > 
> > > In any case someone needs to explain this better, and if there's no other
> > > way this at least needs a big comment. So I'll punt for now.
> > When the default context is created the driver is not ready to execute a
> > batch.  That is why the render state init can't be done then.
> 
> That sounds like the default context is created too early. Essentially I
> want to avoid needless divergence between the default context and normal
> contexts, because sooner or later that will mean a _really_ subtle bug
> creeps in.
> 
> What about:
> - We create the default lrc contexs in context_init, but like with a
>   normal context we don't do any of the deferred setup.
> - In context_enable (which since yesterday properly propagates errors to
>   callers) we force the deferred lrc ctx setup for the default contexts on
>   all engines.
> - The render state init is done as part of the deferred ctx setup for the
>   render engine in all cases.
> 
> Totally off the track or do you see a workable solution somewhere in that
> direction?

I'd like to discuss this first a bit more, so will punt on this patch for
now.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 30/43] drm/i915/bdw: Two-stage execlist submit process
  2014-07-24 16:04 ` [PATCH 30/43] drm/i915/bdw: Two-stage execlist submit process Thomas Daniel
@ 2014-08-14 20:05   ` Daniel Vetter
  2014-08-14 20:10   ` Daniel Vetter
  1 sibling, 0 replies; 137+ messages in thread
From: Daniel Vetter @ 2014-08-14 20:05 UTC (permalink / raw)
  To: Thomas Daniel; +Cc: intel-gfx

On Thu, Jul 24, 2014 at 05:04:38PM +0100, Thomas Daniel wrote:
> From: Michel Thierry <michel.thierry@intel.com>
> 
> Context switch (and execlist submission) should happen only when
> other contexts are not active, otherwise pre-emption occurs.
> 
> To ensure this, we place context switch requests in a queue, and those
> requests are later consumed when the right context switch interrupt is
> received (still TODO).
> 
> v2: Use a spinlock, do not remove the requests on unqueue (wait for
> context switch completion).
> 
> Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
> 
> v3: Several rebases and code changes. Use unique ID.
> 
> v4:
> - Move the queue/lock init to the late ring initialization.
> - Damien's kmalloc review comments: check return, use sizeof(*req),
> do not cast.
> 
> v5:
> - Do not reuse drm_i915_gem_request. Instead, create our own.
> - New namespace.
> 
> Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v1)
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2-v5)
> ---
>  drivers/gpu/drm/i915/intel_lrc.c        |   63 ++++++++++++++++++++++++++++++-
>  drivers/gpu/drm/i915/intel_lrc.h        |    8 ++++
>  drivers/gpu/drm/i915/intel_ringbuffer.h |    2 +
>  3 files changed, 71 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 5b6f416..9e91169 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -217,6 +217,63 @@ static int execlists_submit_context(struct intel_engine_cs *ring,
>  	return 0;
>  }
>  
> +static void execlists_context_unqueue(struct intel_engine_cs *ring)
> +{
> +	struct intel_ctx_submit_request *req0 = NULL, *req1 = NULL;
> +	struct intel_ctx_submit_request *cursor = NULL, *tmp = NULL;
> +
> +	if (list_empty(&ring->execlist_queue))
> +		return;
> +
> +	/* Try to read in pairs */
> +	list_for_each_entry_safe(cursor, tmp, &ring->execlist_queue, execlist_link) {
> +		if (!req0)
> +			req0 = cursor;
> +		else if (req0->ctx == cursor->ctx) {
> +			/* Same ctx: ignore first request, as second request
> +			 * will update tail past first request's workload */
> +			list_del(&req0->execlist_link);
> +			i915_gem_context_unreference(req0->ctx);
> +			kfree(req0);
> +			req0 = cursor;
> +		} else {
> +			req1 = cursor;
> +			break;
> +		}
> +	}
> +
> +	BUG_ON(execlists_submit_context(ring, req0->ctx, req0->tail,
> +			req1? req1->ctx : NULL, req1? req1->tail : 0));

You don't get to hard-hang my driver just because you've made a
programming mistake. Please remember my commandments ;-)

Also checkpatch ...
-Daniel

> +}
> +
> +static int execlists_context_queue(struct intel_engine_cs *ring,
> +				   struct intel_context *to,
> +				   u32 tail)
> +{
> +	struct intel_ctx_submit_request *req = NULL;
> +	unsigned long flags;
> +	bool was_empty;
> +
> +	req = kzalloc(sizeof(*req), GFP_KERNEL);
> +	if (req == NULL)
> +		return -ENOMEM;
> +	req->ctx = to;
> +	i915_gem_context_reference(req->ctx);
> +	req->ring = ring;
> +	req->tail = tail;
> +
> +	spin_lock_irqsave(&ring->execlist_lock, flags);
> +
> +	was_empty = list_empty(&ring->execlist_queue);
> +	list_add_tail(&req->execlist_link, &ring->execlist_queue);
> +	if (was_empty)
> +		execlists_context_unqueue(ring);
> +
> +	spin_unlock_irqrestore(&ring->execlist_lock, flags);
> +
> +	return 0;
> +}
> +
>  static int logical_ring_invalidate_all_caches(struct intel_ringbuffer *ringbuf)
>  {
>  	struct intel_engine_cs *ring = ringbuf->ring;
> @@ -405,8 +462,7 @@ void intel_logical_ring_advance_and_submit(struct intel_ringbuffer *ringbuf)
>  	if (intel_ring_stopped(ring))
>  		return;
>  
> -	/* FIXME: too cheeky, we don't even check if the ELSP is ready */
> -	execlists_submit_context(ring, ctx, ringbuf->tail, NULL, 0);
> +	execlists_context_queue(ring, ctx, ringbuf->tail);
>  }
>  
>  static int logical_ring_alloc_seqno(struct intel_engine_cs *ring,
> @@ -850,6 +906,9 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
>  	INIT_LIST_HEAD(&ring->request_list);
>  	init_waitqueue_head(&ring->irq_queue);
>  
> +	INIT_LIST_HEAD(&ring->execlist_queue);
> +	spin_lock_init(&ring->execlist_lock);
> +
>  	ret = intel_lr_context_deferred_create(dctx, ring);
>  	if (ret)
>  		return ret;
> diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
> index b59965b..14492a9 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.h
> +++ b/drivers/gpu/drm/i915/intel_lrc.h
> @@ -60,4 +60,12 @@ int intel_execlists_submission(struct drm_device *dev, struct drm_file *file,
>  			       u64 exec_start, u32 flags);
>  u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj);
>  
> +struct intel_ctx_submit_request {
> +	struct intel_context *ctx;
> +	struct intel_engine_cs *ring;
> +	u32 tail;
> +
> +	struct list_head execlist_link;
> +};
> +
>  #endif /* _INTEL_LRC_H_ */
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index c885d5c..6358823 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -223,6 +223,8 @@ struct  intel_engine_cs {
>  	} semaphore;
>  
>  	/* Execlists */
> +	spinlock_t execlist_lock;
> +	struct list_head execlist_queue;
>  	u32             irq_keep_mask; /* bitmask for interrupts that should not be masked */
>  	int		(*emit_request)(struct intel_ringbuffer *ringbuf);
>  	int		(*emit_flush)(struct intel_ringbuffer *ringbuf,
> -- 
> 1.7.9.5
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 30/43] drm/i915/bdw: Two-stage execlist submit process
  2014-07-24 16:04 ` [PATCH 30/43] drm/i915/bdw: Two-stage execlist submit process Thomas Daniel
  2014-08-14 20:05   ` Daniel Vetter
@ 2014-08-14 20:10   ` Daniel Vetter
  2014-08-15  8:51     ` Daniel, Thomas
  1 sibling, 1 reply; 137+ messages in thread
From: Daniel Vetter @ 2014-08-14 20:10 UTC (permalink / raw)
  To: Thomas Daniel; +Cc: intel-gfx

On Thu, Jul 24, 2014 at 05:04:38PM +0100, Thomas Daniel wrote:
> From: Michel Thierry <michel.thierry@intel.com>
> 
> Context switch (and execlist submission) should happen only when
> other contexts are not active, otherwise pre-emption occurs.
> 
> To ensure this, we place context switch requests in a queue, and those
> requests are later consumed when the right context switch interrupt is
> received (still TODO).
> 
> v2: Use a spinlock, do not remove the requests on unqueue (wait for
> context switch completion).
> 
> Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
> 
> v3: Several rebases and code changes. Use unique ID.
> 
> v4:
> - Move the queue/lock init to the late ring initialization.
> - Damien's kmalloc review comments: check return, use sizeof(*req),
> do not cast.
> 
> v5:
> - Do not reuse drm_i915_gem_request. Instead, create our own.
> - New namespace.
> 
> Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v1)
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2-v5)
> ---
>  drivers/gpu/drm/i915/intel_lrc.c        |   63 ++++++++++++++++++++++++++++++-
>  drivers/gpu/drm/i915/intel_lrc.h        |    8 ++++
>  drivers/gpu/drm/i915/intel_ringbuffer.h |    2 +
>  3 files changed, 71 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 5b6f416..9e91169 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -217,6 +217,63 @@ static int execlists_submit_context(struct intel_engine_cs *ring,
>  	return 0;
>  }
>  
> +static void execlists_context_unqueue(struct intel_engine_cs *ring)
> +{
> +	struct intel_ctx_submit_request *req0 = NULL, *req1 = NULL;
> +	struct intel_ctx_submit_request *cursor = NULL, *tmp = NULL;
> +
> +	if (list_empty(&ring->execlist_queue))
> +		return;
> +
> +	/* Try to read in pairs */
> +	list_for_each_entry_safe(cursor, tmp, &ring->execlist_queue, execlist_link) {

Ok, because of checkpatch I've looked at this. Imo open-coding this would
be much easier to read, i.e.

	if (!list_empty)
		grab&remove first item;
	if (!list_empty)
		grab&remove 2nd item;

Care to follow up with a patch for that?

Thanks, Daniel

> +		if (!req0)
> +			req0 = cursor;
> +		else if (req0->ctx == cursor->ctx) {
> +			/* Same ctx: ignore first request, as second request
> +			 * will update tail past first request's workload */
> +			list_del(&req0->execlist_link);
> +			i915_gem_context_unreference(req0->ctx);
> +			kfree(req0);
> +			req0 = cursor;
> +		} else {
> +			req1 = cursor;
> +			break;
> +		}
> +	}
> +
> +	BUG_ON(execlists_submit_context(ring, req0->ctx, req0->tail,
> +			req1? req1->ctx : NULL, req1? req1->tail : 0));
> +}
> +
> +static int execlists_context_queue(struct intel_engine_cs *ring,
> +				   struct intel_context *to,
> +				   u32 tail)
> +{
> +	struct intel_ctx_submit_request *req = NULL;
> +	unsigned long flags;
> +	bool was_empty;
> +
> +	req = kzalloc(sizeof(*req), GFP_KERNEL);
> +	if (req == NULL)
> +		return -ENOMEM;
> +	req->ctx = to;
> +	i915_gem_context_reference(req->ctx);
> +	req->ring = ring;
> +	req->tail = tail;
> +
> +	spin_lock_irqsave(&ring->execlist_lock, flags);
> +
> +	was_empty = list_empty(&ring->execlist_queue);
> +	list_add_tail(&req->execlist_link, &ring->execlist_queue);
> +	if (was_empty)
> +		execlists_context_unqueue(ring);
> +
> +	spin_unlock_irqrestore(&ring->execlist_lock, flags);
> +
> +	return 0;
> +}
> +
>  static int logical_ring_invalidate_all_caches(struct intel_ringbuffer *ringbuf)
>  {
>  	struct intel_engine_cs *ring = ringbuf->ring;
> @@ -405,8 +462,7 @@ void intel_logical_ring_advance_and_submit(struct intel_ringbuffer *ringbuf)
>  	if (intel_ring_stopped(ring))
>  		return;
>  
> -	/* FIXME: too cheeky, we don't even check if the ELSP is ready */
> -	execlists_submit_context(ring, ctx, ringbuf->tail, NULL, 0);
> +	execlists_context_queue(ring, ctx, ringbuf->tail);
>  }
>  
>  static int logical_ring_alloc_seqno(struct intel_engine_cs *ring,
> @@ -850,6 +906,9 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
>  	INIT_LIST_HEAD(&ring->request_list);
>  	init_waitqueue_head(&ring->irq_queue);
>  
> +	INIT_LIST_HEAD(&ring->execlist_queue);
> +	spin_lock_init(&ring->execlist_lock);
> +
>  	ret = intel_lr_context_deferred_create(dctx, ring);
>  	if (ret)
>  		return ret;
> diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
> index b59965b..14492a9 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.h
> +++ b/drivers/gpu/drm/i915/intel_lrc.h
> @@ -60,4 +60,12 @@ int intel_execlists_submission(struct drm_device *dev, struct drm_file *file,
>  			       u64 exec_start, u32 flags);
>  u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj);
>  
> +struct intel_ctx_submit_request {
> +	struct intel_context *ctx;
> +	struct intel_engine_cs *ring;
> +	u32 tail;
> +
> +	struct list_head execlist_link;
> +};
> +
>  #endif /* _INTEL_LRC_H_ */
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index c885d5c..6358823 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -223,6 +223,8 @@ struct  intel_engine_cs {
>  	} semaphore;
>  
>  	/* Execlists */
> +	spinlock_t execlist_lock;
> +	struct list_head execlist_queue;
>  	u32             irq_keep_mask; /* bitmask for interrupts that should not be masked */
>  	int		(*emit_request)(struct intel_ringbuffer *ringbuf);
>  	int		(*emit_flush)(struct intel_ringbuffer *ringbuf,
> -- 
> 1.7.9.5
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 31/43] drm/i915/bdw: Handle context switch events
  2014-07-24 16:04 ` [PATCH 31/43] drm/i915/bdw: Handle context switch events Thomas Daniel
@ 2014-08-14 20:13   ` Daniel Vetter
  2014-08-14 20:17   ` Daniel Vetter
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 137+ messages in thread
From: Daniel Vetter @ 2014-08-14 20:13 UTC (permalink / raw)
  To: Thomas Daniel; +Cc: intel-gfx

On Thu, Jul 24, 2014 at 05:04:39PM +0100, Thomas Daniel wrote:
> Handle all context status events in the context status buffer on every
> context switch interrupt. We only remove work from the execlist queue
> after a context status buffer reports that it has completed and we only
> attempt to schedule new contexts on interrupt when a previously submitted
> context completes (unless no contexts are queued, which means the GPU is
> free).
> 
> We cannot call intel_runtime_pm_get() in an interrupt (or with a spinlock
> grabbed, FWIW), because it might sleep, which is not a nice thing to do.
> Instead, do the runtime_pm get/put together with the create/destroy request,
> and handle the forcewake get/put directly.
> 
> Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
> 
> v2: Unreferencing the context when we are freeing the request might free
> the backing bo, which requires the struct_mutex to be grabbed, so defer
> unreferencing and freeing to a bottom half.
> 
> v3:
> - Ack the interrupt immediately, before trying to handle it (fix for
> missing interrupts by Bob Beckett <robert.beckett@intel.com>).
> - Update the Context Status Buffer Read Pointer, just in case (spotted
> by Damien Lespiau).
> 
> v4: New namespace and multiple rebase changes.
> 
> v5: Squash with "drm/i915/bdw: Do not call intel_runtime_pm_get() in an
> interrupt", as suggested by Daniel.
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_irq.c         |   35 ++++++---
>  drivers/gpu/drm/i915/intel_lrc.c        |  129 +++++++++++++++++++++++++++++--
>  drivers/gpu/drm/i915/intel_lrc.h        |    3 +
>  drivers/gpu/drm/i915/intel_ringbuffer.h |    1 +
>  4 files changed, 151 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index f77a4ca..e4077d1 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -1628,6 +1628,7 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
>  				       struct drm_i915_private *dev_priv,
>  				       u32 master_ctl)
>  {
> +	struct intel_engine_cs *ring;
>  	u32 rcs, bcs, vcs;
>  	uint32_t tmp = 0;
>  	irqreturn_t ret = IRQ_NONE;
> @@ -1637,14 +1638,20 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
>  		if (tmp) {
>  			I915_WRITE(GEN8_GT_IIR(0), tmp);
>  			ret = IRQ_HANDLED;
> +
>  			rcs = tmp >> GEN8_RCS_IRQ_SHIFT;
> -			bcs = tmp >> GEN8_BCS_IRQ_SHIFT;
> +			ring = &dev_priv->ring[RCS];
>  			if (rcs & GT_RENDER_USER_INTERRUPT)
> -				notify_ring(dev, &dev_priv->ring[RCS]);
> +				notify_ring(dev, ring);
> +			if (rcs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
> +				intel_execlists_handle_ctx_events(ring);
> +
> +			bcs = tmp >> GEN8_BCS_IRQ_SHIFT;
> +			ring = &dev_priv->ring[BCS];
>  			if (bcs & GT_RENDER_USER_INTERRUPT)
> -				notify_ring(dev, &dev_priv->ring[BCS]);
> -			if ((rcs | bcs) & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
> -				DRM_DEBUG_DRIVER("TODO: Context switch\n");
> +				notify_ring(dev, ring);
> +			if (bcs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
> +				intel_execlists_handle_ctx_events(ring);
>  		} else
>  			DRM_ERROR("The master control interrupt lied (GT0)!\n");
>  	}
> @@ -1654,16 +1661,20 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
>  		if (tmp) {
>  			I915_WRITE(GEN8_GT_IIR(1), tmp);
>  			ret = IRQ_HANDLED;
> +
>  			vcs = tmp >> GEN8_VCS1_IRQ_SHIFT;
> +			ring = &dev_priv->ring[VCS];
>  			if (vcs & GT_RENDER_USER_INTERRUPT)
> -				notify_ring(dev, &dev_priv->ring[VCS]);
> +				notify_ring(dev, ring);
>  			if (vcs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
> -				DRM_DEBUG_DRIVER("TODO: Context switch\n");
> +				intel_execlists_handle_ctx_events(ring);
> +
>  			vcs = tmp >> GEN8_VCS2_IRQ_SHIFT;
> +			ring = &dev_priv->ring[VCS2];
>  			if (vcs & GT_RENDER_USER_INTERRUPT)
> -				notify_ring(dev, &dev_priv->ring[VCS2]);
> +				notify_ring(dev, ring);
>  			if (vcs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
> -				DRM_DEBUG_DRIVER("TODO: Context switch\n");
> +				intel_execlists_handle_ctx_events(ring);
>  		} else
>  			DRM_ERROR("The master control interrupt lied (GT1)!\n");
>  	}
> @@ -1684,11 +1695,13 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
>  		if (tmp) {
>  			I915_WRITE(GEN8_GT_IIR(3), tmp);
>  			ret = IRQ_HANDLED;
> +
>  			vcs = tmp >> GEN8_VECS_IRQ_SHIFT;
> +			ring = &dev_priv->ring[VECS];
>  			if (vcs & GT_RENDER_USER_INTERRUPT)
> -				notify_ring(dev, &dev_priv->ring[VECS]);
> +				notify_ring(dev, ring);
>  			if (vcs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
> -				DRM_DEBUG_DRIVER("TODO: Context switch\n");
> +				intel_execlists_handle_ctx_events(ring);
>  		} else
>  			DRM_ERROR("The master control interrupt lied (GT3)!\n");
>  	}

The above stuff is dropping off the left edge a bit. And it's all so
super-nicely laid out thanks to the hw engineers, so with a bit of code
refactoring we should be able to have one generic ring irq handler with 5
callers, and rip all this out.

Can you please do that?
-Daniel

> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 9e91169..65f4f26 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -49,6 +49,22 @@
>  #define RING_ELSP(ring)			((ring)->mmio_base+0x230)
>  #define RING_EXECLIST_STATUS(ring)	((ring)->mmio_base+0x234)
>  #define RING_CONTEXT_CONTROL(ring)	((ring)->mmio_base+0x244)
> +#define RING_CONTEXT_STATUS_BUF(ring)	((ring)->mmio_base+0x370)
> +#define RING_CONTEXT_STATUS_PTR(ring)	((ring)->mmio_base+0x3a0)
> +
> +#define RING_EXECLIST_QFULL		(1 << 0x2)
> +#define RING_EXECLIST1_VALID		(1 << 0x3)
> +#define RING_EXECLIST0_VALID		(1 << 0x4)
> +#define RING_EXECLIST_ACTIVE_STATUS	(3 << 0xE)
> +#define RING_EXECLIST1_ACTIVE		(1 << 0x11)
> +#define RING_EXECLIST0_ACTIVE		(1 << 0x12)
> +
> +#define GEN8_CTX_STATUS_IDLE_ACTIVE	(1 << 0)
> +#define GEN8_CTX_STATUS_PREEMPTED	(1 << 1)
> +#define GEN8_CTX_STATUS_ELEMENT_SWITCH	(1 << 2)
> +#define GEN8_CTX_STATUS_ACTIVE_IDLE	(1 << 3)
> +#define GEN8_CTX_STATUS_COMPLETE	(1 << 4)
> +#define GEN8_CTX_STATUS_LITE_RESTORE	(1 << 15)
>  
>  #define CTX_LRI_HEADER_0		0x01
>  #define CTX_CONTEXT_CONTROL		0x02
> @@ -147,6 +163,7 @@ static void execlists_elsp_write(struct intel_engine_cs *ring,
>  	struct drm_i915_private *dev_priv = ring->dev->dev_private;
>  	uint64_t temp = 0;
>  	uint32_t desc[4];
> +	unsigned long flags;
>  
>  	/* XXX: You must always write both descriptors in the order below. */
>  	if (ctx_obj1)
> @@ -160,9 +177,17 @@ static void execlists_elsp_write(struct intel_engine_cs *ring,
>  	desc[3] = (u32)(temp >> 32);
>  	desc[2] = (u32)temp;
>  
> -	/* Set Force Wakeup bit to prevent GT from entering C6 while
> -	 * ELSP writes are in progress */
> -	gen6_gt_force_wake_get(dev_priv, FORCEWAKE_ALL);
> +	/* Set Force Wakeup bit to prevent GT from entering C6 while ELSP writes
> +	 * are in progress.
> +	 *
> +	 * The other problem is that we can't just call gen6_gt_force_wake_get()
> +	 * because that function calls intel_runtime_pm_get(), which might sleep.
> +	 * Instead, we do the runtime_pm_get/put when creating/destroying requests.
> +	 */
> +	spin_lock_irqsave(&dev_priv->uncore.lock, flags);
> +	if (dev_priv->uncore.forcewake_count++ == 0)
> +		dev_priv->uncore.funcs.force_wake_get(dev_priv, FORCEWAKE_ALL);
> +	spin_unlock_irqrestore(&dev_priv->uncore.lock, flags);
>  
>  	I915_WRITE(RING_ELSP(ring), desc[1]);
>  	I915_WRITE(RING_ELSP(ring), desc[0]);
> @@ -173,7 +198,11 @@ static void execlists_elsp_write(struct intel_engine_cs *ring,
>  	/* ELSP is a wo register, so use another nearby reg for posting instead */
>  	POSTING_READ(RING_EXECLIST_STATUS(ring));
>  
> -	gen6_gt_force_wake_put(dev_priv, FORCEWAKE_ALL);
> +	/* Release Force Wakeup (see the big comment above). */
> +	spin_lock_irqsave(&dev_priv->uncore.lock, flags);
> +	if (--dev_priv->uncore.forcewake_count == 0)
> +		dev_priv->uncore.funcs.force_wake_put(dev_priv, FORCEWAKE_ALL);
> +	spin_unlock_irqrestore(&dev_priv->uncore.lock, flags);
>  }
>  
>  static int execlists_ctx_write_tail(struct drm_i915_gem_object *ctx_obj, u32 tail)
> @@ -221,6 +250,9 @@ static void execlists_context_unqueue(struct intel_engine_cs *ring)
>  {
>  	struct intel_ctx_submit_request *req0 = NULL, *req1 = NULL;
>  	struct intel_ctx_submit_request *cursor = NULL, *tmp = NULL;
> +	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> +
> +	assert_spin_locked(&ring->execlist_lock);
>  
>  	if (list_empty(&ring->execlist_queue))
>  		return;
> @@ -233,8 +265,7 @@ static void execlists_context_unqueue(struct intel_engine_cs *ring)
>  			/* Same ctx: ignore first request, as second request
>  			 * will update tail past first request's workload */
>  			list_del(&req0->execlist_link);
> -			i915_gem_context_unreference(req0->ctx);
> -			kfree(req0);
> +			queue_work(dev_priv->wq, &req0->work);
>  			req0 = cursor;
>  		} else {
>  			req1 = cursor;
> @@ -246,6 +277,89 @@ static void execlists_context_unqueue(struct intel_engine_cs *ring)
>  			req1? req1->ctx : NULL, req1? req1->tail : 0));
>  }
>  
> +static bool execlists_check_remove_request(struct intel_engine_cs *ring,
> +					   u32 request_id)
> +{
> +	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> +	struct intel_ctx_submit_request *head_req;
> +
> +	assert_spin_locked(&ring->execlist_lock);
> +
> +	head_req = list_first_entry_or_null(&ring->execlist_queue,
> +			struct intel_ctx_submit_request, execlist_link);
> +	if (head_req != NULL) {
> +		struct drm_i915_gem_object *ctx_obj =
> +				head_req->ctx->engine[ring->id].state;
> +		if (intel_execlists_ctx_id(ctx_obj) == request_id) {
> +			list_del(&head_req->execlist_link);
> +			queue_work(dev_priv->wq, &head_req->work);
> +			return true;
> +		}
> +	}
> +
> +	return false;
> +}
> +
> +void intel_execlists_handle_ctx_events(struct intel_engine_cs *ring)
> +{
> +	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> +	u32 status_pointer;
> +	u8 read_pointer;
> +	u8 write_pointer;
> +	u32 status;
> +	u32 status_id;
> +	u32 submit_contexts = 0;
> +
> +	status_pointer = I915_READ(RING_CONTEXT_STATUS_PTR(ring));
> +
> +	read_pointer = ring->next_context_status_buffer;
> +	write_pointer = status_pointer & 0x07;
> +	if (read_pointer > write_pointer)
> +		write_pointer += 6;
> +
> +	spin_lock(&ring->execlist_lock);
> +
> +	while (read_pointer < write_pointer) {
> +		read_pointer++;
> +		status = I915_READ(RING_CONTEXT_STATUS_BUF(ring) +
> +				(read_pointer % 6) * 8);
> +		status_id = I915_READ(RING_CONTEXT_STATUS_BUF(ring) +
> +				(read_pointer % 6) * 8 + 4);
> +
> +		if (status & GEN8_CTX_STATUS_COMPLETE) {
> +			if (execlists_check_remove_request(ring, status_id))
> +				submit_contexts++;
> +		}
> +	}
> +
> +	if (submit_contexts != 0)
> +		execlists_context_unqueue(ring);
> +
> +	spin_unlock(&ring->execlist_lock);
> +
> +	WARN(submit_contexts > 2, "More than two context complete events?\n");
> +	ring->next_context_status_buffer = write_pointer % 6;
> +
> +	I915_WRITE(RING_CONTEXT_STATUS_PTR(ring),
> +			((u32)ring->next_context_status_buffer & 0x07) << 8);
> +}
> +
> +static void execlists_free_request_task(struct work_struct *work)
> +{
> +	struct intel_ctx_submit_request *req =
> +			container_of(work, struct intel_ctx_submit_request, work);
> +	struct drm_device *dev = req->ring->dev;
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +
> +	intel_runtime_pm_put(dev_priv);
> +
> +	mutex_lock(&dev->struct_mutex);
> +	i915_gem_context_unreference(req->ctx);
> +	mutex_unlock(&dev->struct_mutex);
> +
> +	kfree(req);
> +}
> +
>  static int execlists_context_queue(struct intel_engine_cs *ring,
>  				   struct intel_context *to,
>  				   u32 tail)
> @@ -261,6 +375,8 @@ static int execlists_context_queue(struct intel_engine_cs *ring,
>  	i915_gem_context_reference(req->ctx);
>  	req->ring = ring;
>  	req->tail = tail;
> +	INIT_WORK(&req->work, execlists_free_request_task);
> +	intel_runtime_pm_get(dev_priv);
>  
>  	spin_lock_irqsave(&ring->execlist_lock, flags);
>  
> @@ -908,6 +1024,7 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
>  
>  	INIT_LIST_HEAD(&ring->execlist_queue);
>  	spin_lock_init(&ring->execlist_lock);
> +	ring->next_context_status_buffer = 0;
>  
>  	ret = intel_lr_context_deferred_create(dctx, ring);
>  	if (ret)
> diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
> index 14492a9..2e8929f 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.h
> +++ b/drivers/gpu/drm/i915/intel_lrc.h
> @@ -66,6 +66,9 @@ struct intel_ctx_submit_request {
>  	u32 tail;
>  
>  	struct list_head execlist_link;
> +	struct work_struct work;
>  };
>  
> +void intel_execlists_handle_ctx_events(struct intel_engine_cs *ring);
> +
>  #endif /* _INTEL_LRC_H_ */
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 6358823..905d1ba 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -225,6 +225,7 @@ struct  intel_engine_cs {
>  	/* Execlists */
>  	spinlock_t execlist_lock;
>  	struct list_head execlist_queue;
> +	u8 next_context_status_buffer;
>  	u32             irq_keep_mask; /* bitmask for interrupts that should not be masked */
>  	int		(*emit_request)(struct intel_ringbuffer *ringbuf);
>  	int		(*emit_flush)(struct intel_ringbuffer *ringbuf,
> -- 
> 1.7.9.5
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 31/43] drm/i915/bdw: Handle context switch events
  2014-07-24 16:04 ` [PATCH 31/43] drm/i915/bdw: Handle context switch events Thomas Daniel
  2014-08-14 20:13   ` Daniel Vetter
@ 2014-08-14 20:17   ` Daniel Vetter
  2014-08-14 20:28   ` Daniel Vetter
  2014-08-14 20:37   ` Daniel Vetter
  3 siblings, 0 replies; 137+ messages in thread
From: Daniel Vetter @ 2014-08-14 20:17 UTC (permalink / raw)
  To: Thomas Daniel; +Cc: intel-gfx

On Thu, Jul 24, 2014 at 05:04:39PM +0100, Thomas Daniel wrote:
> Handle all context status events in the context status buffer on every
> context switch interrupt. We only remove work from the execlist queue
> after a context status buffer reports that it has completed and we only
> attempt to schedule new contexts on interrupt when a previously submitted
> context completes (unless no contexts are queued, which means the GPU is
> free).
> 
> We cannot call intel_runtime_pm_get() in an interrupt (or with a spinlock
> grabbed, FWIW), because it might sleep, which is not a nice thing to do.
> Instead, do the runtime_pm get/put together with the create/destroy request,
> and handle the forcewake get/put directly.
> 
> Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
> 
> v2: Unreferencing the context when we are freeing the request might free
> the backing bo, which requires the struct_mutex to be grabbed, so defer
> unreferencing and freeing to a bottom half.
> 
> v3:
> - Ack the interrupt immediately, before trying to handle it (fix for
> missing interrupts by Bob Beckett <robert.beckett@intel.com>).
> - Update the Context Status Buffer Read Pointer, just in case (spotted
> by Damien Lespiau).
> 
> v4: New namespace and multiple rebase changes.
> 
> v5: Squash with "drm/i915/bdw: Do not call intel_runtime_pm_get() in an
> interrupt", as suggested by Daniel.
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_irq.c         |   35 ++++++---
>  drivers/gpu/drm/i915/intel_lrc.c        |  129 +++++++++++++++++++++++++++++--
>  drivers/gpu/drm/i915/intel_lrc.h        |    3 +
>  drivers/gpu/drm/i915/intel_ringbuffer.h |    1 +
>  4 files changed, 151 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index f77a4ca..e4077d1 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -1628,6 +1628,7 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
>  				       struct drm_i915_private *dev_priv,
>  				       u32 master_ctl)
>  {
> +	struct intel_engine_cs *ring;
>  	u32 rcs, bcs, vcs;
>  	uint32_t tmp = 0;
>  	irqreturn_t ret = IRQ_NONE;
> @@ -1637,14 +1638,20 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
>  		if (tmp) {
>  			I915_WRITE(GEN8_GT_IIR(0), tmp);
>  			ret = IRQ_HANDLED;
> +
>  			rcs = tmp >> GEN8_RCS_IRQ_SHIFT;
> -			bcs = tmp >> GEN8_BCS_IRQ_SHIFT;
> +			ring = &dev_priv->ring[RCS];
>  			if (rcs & GT_RENDER_USER_INTERRUPT)
> -				notify_ring(dev, &dev_priv->ring[RCS]);
> +				notify_ring(dev, ring);
> +			if (rcs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
> +				intel_execlists_handle_ctx_events(ring);
> +
> +			bcs = tmp >> GEN8_BCS_IRQ_SHIFT;
> +			ring = &dev_priv->ring[BCS];
>  			if (bcs & GT_RENDER_USER_INTERRUPT)
> -				notify_ring(dev, &dev_priv->ring[BCS]);
> -			if ((rcs | bcs) & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
> -				DRM_DEBUG_DRIVER("TODO: Context switch\n");
> +				notify_ring(dev, ring);
> +			if (bcs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
> +				intel_execlists_handle_ctx_events(ring);

Also patch split fail here, the above two cases should have been in an
earlier patch that added the basic irq handling.
-Daniel

>  		} else
>  			DRM_ERROR("The master control interrupt lied (GT0)!\n");
>  	}
> @@ -1654,16 +1661,20 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
>  		if (tmp) {
>  			I915_WRITE(GEN8_GT_IIR(1), tmp);
>  			ret = IRQ_HANDLED;
> +
>  			vcs = tmp >> GEN8_VCS1_IRQ_SHIFT;
> +			ring = &dev_priv->ring[VCS];
>  			if (vcs & GT_RENDER_USER_INTERRUPT)
> -				notify_ring(dev, &dev_priv->ring[VCS]);
> +				notify_ring(dev, ring);
>  			if (vcs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
> -				DRM_DEBUG_DRIVER("TODO: Context switch\n");
> +				intel_execlists_handle_ctx_events(ring);
> +
>  			vcs = tmp >> GEN8_VCS2_IRQ_SHIFT;
> +			ring = &dev_priv->ring[VCS2];
>  			if (vcs & GT_RENDER_USER_INTERRUPT)
> -				notify_ring(dev, &dev_priv->ring[VCS2]);
> +				notify_ring(dev, ring);
>  			if (vcs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
> -				DRM_DEBUG_DRIVER("TODO: Context switch\n");
> +				intel_execlists_handle_ctx_events(ring);
>  		} else
>  			DRM_ERROR("The master control interrupt lied (GT1)!\n");
>  	}
> @@ -1684,11 +1695,13 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
>  		if (tmp) {
>  			I915_WRITE(GEN8_GT_IIR(3), tmp);
>  			ret = IRQ_HANDLED;
> +
>  			vcs = tmp >> GEN8_VECS_IRQ_SHIFT;
> +			ring = &dev_priv->ring[VECS];
>  			if (vcs & GT_RENDER_USER_INTERRUPT)
> -				notify_ring(dev, &dev_priv->ring[VECS]);
> +				notify_ring(dev, ring);
>  			if (vcs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
> -				DRM_DEBUG_DRIVER("TODO: Context switch\n");
> +				intel_execlists_handle_ctx_events(ring);
>  		} else
>  			DRM_ERROR("The master control interrupt lied (GT3)!\n");
>  	}
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 9e91169..65f4f26 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -49,6 +49,22 @@
>  #define RING_ELSP(ring)			((ring)->mmio_base+0x230)
>  #define RING_EXECLIST_STATUS(ring)	((ring)->mmio_base+0x234)
>  #define RING_CONTEXT_CONTROL(ring)	((ring)->mmio_base+0x244)
> +#define RING_CONTEXT_STATUS_BUF(ring)	((ring)->mmio_base+0x370)
> +#define RING_CONTEXT_STATUS_PTR(ring)	((ring)->mmio_base+0x3a0)
> +
> +#define RING_EXECLIST_QFULL		(1 << 0x2)
> +#define RING_EXECLIST1_VALID		(1 << 0x3)
> +#define RING_EXECLIST0_VALID		(1 << 0x4)
> +#define RING_EXECLIST_ACTIVE_STATUS	(3 << 0xE)
> +#define RING_EXECLIST1_ACTIVE		(1 << 0x11)
> +#define RING_EXECLIST0_ACTIVE		(1 << 0x12)
> +
> +#define GEN8_CTX_STATUS_IDLE_ACTIVE	(1 << 0)
> +#define GEN8_CTX_STATUS_PREEMPTED	(1 << 1)
> +#define GEN8_CTX_STATUS_ELEMENT_SWITCH	(1 << 2)
> +#define GEN8_CTX_STATUS_ACTIVE_IDLE	(1 << 3)
> +#define GEN8_CTX_STATUS_COMPLETE	(1 << 4)
> +#define GEN8_CTX_STATUS_LITE_RESTORE	(1 << 15)
>  
>  #define CTX_LRI_HEADER_0		0x01
>  #define CTX_CONTEXT_CONTROL		0x02
> @@ -147,6 +163,7 @@ static void execlists_elsp_write(struct intel_engine_cs *ring,
>  	struct drm_i915_private *dev_priv = ring->dev->dev_private;
>  	uint64_t temp = 0;
>  	uint32_t desc[4];
> +	unsigned long flags;
>  
>  	/* XXX: You must always write both descriptors in the order below. */
>  	if (ctx_obj1)
> @@ -160,9 +177,17 @@ static void execlists_elsp_write(struct intel_engine_cs *ring,
>  	desc[3] = (u32)(temp >> 32);
>  	desc[2] = (u32)temp;
>  
> -	/* Set Force Wakeup bit to prevent GT from entering C6 while
> -	 * ELSP writes are in progress */
> -	gen6_gt_force_wake_get(dev_priv, FORCEWAKE_ALL);
> +	/* Set Force Wakeup bit to prevent GT from entering C6 while ELSP writes
> +	 * are in progress.
> +	 *
> +	 * The other problem is that we can't just call gen6_gt_force_wake_get()
> +	 * because that function calls intel_runtime_pm_get(), which might sleep.
> +	 * Instead, we do the runtime_pm_get/put when creating/destroying requests.
> +	 */
> +	spin_lock_irqsave(&dev_priv->uncore.lock, flags);
> +	if (dev_priv->uncore.forcewake_count++ == 0)
> +		dev_priv->uncore.funcs.force_wake_get(dev_priv, FORCEWAKE_ALL);
> +	spin_unlock_irqrestore(&dev_priv->uncore.lock, flags);
>  
>  	I915_WRITE(RING_ELSP(ring), desc[1]);
>  	I915_WRITE(RING_ELSP(ring), desc[0]);
> @@ -173,7 +198,11 @@ static void execlists_elsp_write(struct intel_engine_cs *ring,
>  	/* ELSP is a wo register, so use another nearby reg for posting instead */
>  	POSTING_READ(RING_EXECLIST_STATUS(ring));
>  
> -	gen6_gt_force_wake_put(dev_priv, FORCEWAKE_ALL);
> +	/* Release Force Wakeup (see the big comment above). */
> +	spin_lock_irqsave(&dev_priv->uncore.lock, flags);
> +	if (--dev_priv->uncore.forcewake_count == 0)
> +		dev_priv->uncore.funcs.force_wake_put(dev_priv, FORCEWAKE_ALL);
> +	spin_unlock_irqrestore(&dev_priv->uncore.lock, flags);
>  }
>  
>  static int execlists_ctx_write_tail(struct drm_i915_gem_object *ctx_obj, u32 tail)
> @@ -221,6 +250,9 @@ static void execlists_context_unqueue(struct intel_engine_cs *ring)
>  {
>  	struct intel_ctx_submit_request *req0 = NULL, *req1 = NULL;
>  	struct intel_ctx_submit_request *cursor = NULL, *tmp = NULL;
> +	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> +
> +	assert_spin_locked(&ring->execlist_lock);
>  
>  	if (list_empty(&ring->execlist_queue))
>  		return;
> @@ -233,8 +265,7 @@ static void execlists_context_unqueue(struct intel_engine_cs *ring)
>  			/* Same ctx: ignore first request, as second request
>  			 * will update tail past first request's workload */
>  			list_del(&req0->execlist_link);
> -			i915_gem_context_unreference(req0->ctx);
> -			kfree(req0);
> +			queue_work(dev_priv->wq, &req0->work);
>  			req0 = cursor;
>  		} else {
>  			req1 = cursor;
> @@ -246,6 +277,89 @@ static void execlists_context_unqueue(struct intel_engine_cs *ring)
>  			req1? req1->ctx : NULL, req1? req1->tail : 0));
>  }
>  
> +static bool execlists_check_remove_request(struct intel_engine_cs *ring,
> +					   u32 request_id)
> +{
> +	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> +	struct intel_ctx_submit_request *head_req;
> +
> +	assert_spin_locked(&ring->execlist_lock);
> +
> +	head_req = list_first_entry_or_null(&ring->execlist_queue,
> +			struct intel_ctx_submit_request, execlist_link);
> +	if (head_req != NULL) {
> +		struct drm_i915_gem_object *ctx_obj =
> +				head_req->ctx->engine[ring->id].state;
> +		if (intel_execlists_ctx_id(ctx_obj) == request_id) {
> +			list_del(&head_req->execlist_link);
> +			queue_work(dev_priv->wq, &head_req->work);
> +			return true;
> +		}
> +	}
> +
> +	return false;
> +}
> +
> +void intel_execlists_handle_ctx_events(struct intel_engine_cs *ring)
> +{
> +	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> +	u32 status_pointer;
> +	u8 read_pointer;
> +	u8 write_pointer;
> +	u32 status;
> +	u32 status_id;
> +	u32 submit_contexts = 0;
> +
> +	status_pointer = I915_READ(RING_CONTEXT_STATUS_PTR(ring));
> +
> +	read_pointer = ring->next_context_status_buffer;
> +	write_pointer = status_pointer & 0x07;
> +	if (read_pointer > write_pointer)
> +		write_pointer += 6;
> +
> +	spin_lock(&ring->execlist_lock);
> +
> +	while (read_pointer < write_pointer) {
> +		read_pointer++;
> +		status = I915_READ(RING_CONTEXT_STATUS_BUF(ring) +
> +				(read_pointer % 6) * 8);
> +		status_id = I915_READ(RING_CONTEXT_STATUS_BUF(ring) +
> +				(read_pointer % 6) * 8 + 4);
> +
> +		if (status & GEN8_CTX_STATUS_COMPLETE) {
> +			if (execlists_check_remove_request(ring, status_id))
> +				submit_contexts++;
> +		}
> +	}
> +
> +	if (submit_contexts != 0)
> +		execlists_context_unqueue(ring);
> +
> +	spin_unlock(&ring->execlist_lock);
> +
> +	WARN(submit_contexts > 2, "More than two context complete events?\n");
> +	ring->next_context_status_buffer = write_pointer % 6;
> +
> +	I915_WRITE(RING_CONTEXT_STATUS_PTR(ring),
> +			((u32)ring->next_context_status_buffer & 0x07) << 8);
> +}
> +
> +static void execlists_free_request_task(struct work_struct *work)
> +{
> +	struct intel_ctx_submit_request *req =
> +			container_of(work, struct intel_ctx_submit_request, work);
> +	struct drm_device *dev = req->ring->dev;
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +
> +	intel_runtime_pm_put(dev_priv);
> +
> +	mutex_lock(&dev->struct_mutex);
> +	i915_gem_context_unreference(req->ctx);
> +	mutex_unlock(&dev->struct_mutex);
> +
> +	kfree(req);
> +}
> +
>  static int execlists_context_queue(struct intel_engine_cs *ring,
>  				   struct intel_context *to,
>  				   u32 tail)
> @@ -261,6 +375,8 @@ static int execlists_context_queue(struct intel_engine_cs *ring,
>  	i915_gem_context_reference(req->ctx);
>  	req->ring = ring;
>  	req->tail = tail;
> +	INIT_WORK(&req->work, execlists_free_request_task);
> +	intel_runtime_pm_get(dev_priv);
>  
>  	spin_lock_irqsave(&ring->execlist_lock, flags);
>  
> @@ -908,6 +1024,7 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
>  
>  	INIT_LIST_HEAD(&ring->execlist_queue);
>  	spin_lock_init(&ring->execlist_lock);
> +	ring->next_context_status_buffer = 0;
>  
>  	ret = intel_lr_context_deferred_create(dctx, ring);
>  	if (ret)
> diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
> index 14492a9..2e8929f 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.h
> +++ b/drivers/gpu/drm/i915/intel_lrc.h
> @@ -66,6 +66,9 @@ struct intel_ctx_submit_request {
>  	u32 tail;
>  
>  	struct list_head execlist_link;
> +	struct work_struct work;
>  };
>  
> +void intel_execlists_handle_ctx_events(struct intel_engine_cs *ring);
> +
>  #endif /* _INTEL_LRC_H_ */
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 6358823..905d1ba 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -225,6 +225,7 @@ struct  intel_engine_cs {
>  	/* Execlists */
>  	spinlock_t execlist_lock;
>  	struct list_head execlist_queue;
> +	u8 next_context_status_buffer;
>  	u32             irq_keep_mask; /* bitmask for interrupts that should not be masked */
>  	int		(*emit_request)(struct intel_ringbuffer *ringbuf);
>  	int		(*emit_flush)(struct intel_ringbuffer *ringbuf,
> -- 
> 1.7.9.5
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 31/43] drm/i915/bdw: Handle context switch events
  2014-07-24 16:04 ` [PATCH 31/43] drm/i915/bdw: Handle context switch events Thomas Daniel
  2014-08-14 20:13   ` Daniel Vetter
  2014-08-14 20:17   ` Daniel Vetter
@ 2014-08-14 20:28   ` Daniel Vetter
  2014-08-14 20:37   ` Daniel Vetter
  3 siblings, 0 replies; 137+ messages in thread
From: Daniel Vetter @ 2014-08-14 20:28 UTC (permalink / raw)
  To: Thomas Daniel; +Cc: intel-gfx

On Thu, Jul 24, 2014 at 05:04:39PM +0100, Thomas Daniel wrote:
> Handle all context status events in the context status buffer on every
> context switch interrupt. We only remove work from the execlist queue
> after a context status buffer reports that it has completed and we only
> attempt to schedule new contexts on interrupt when a previously submitted
> context completes (unless no contexts are queued, which means the GPU is
> free).
> 
> We cannot call intel_runtime_pm_get() in an interrupt (or with a spinlock
> grabbed, FWIW), because it might sleep, which is not a nice thing to do.
> Instead, do the runtime_pm get/put together with the create/destroy request,
> and handle the forcewake get/put directly.
> 
> Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
> 
> v2: Unreferencing the context when we are freeing the request might free
> the backing bo, which requires the struct_mutex to be grabbed, so defer
> unreferencing and freeing to a bottom half.
> 
> v3:
> - Ack the interrupt immediately, before trying to handle it (fix for
> missing interrupts by Bob Beckett <robert.beckett@intel.com>).
> - Update the Context Status Buffer Read Pointer, just in case (spotted
> by Damien Lespiau).
> 
> v4: New namespace and multiple rebase changes.
> 
> v5: Squash with "drm/i915/bdw: Do not call intel_runtime_pm_get() in an
> interrupt", as suggested by Daniel.
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>

> +static void execlists_free_request_task(struct work_struct *work)
> +{
> +	struct intel_ctx_submit_request *req =
> +			container_of(work, struct intel_ctx_submit_request, work);
> +	struct drm_device *dev = req->ring->dev;
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +
> +	intel_runtime_pm_put(dev_priv);
> +
> +	mutex_lock(&dev->struct_mutex);
> +	i915_gem_context_unreference(req->ctx);
> +	mutex_unlock(&dev->struct_mutex);
> +
> +	kfree(req);
> +}

Latching a work item simply for the unreference of the context looks very
fishy. The context really can't possibly disappear before the last request
on it has completed, since the request already holds a reference.

That you have this additional reference here makes it look a bit like the
relationship and lifetime rules for the execlist queue item are misshapen.

I'd have expected:
- A given execlist queue item is responsible for a list of requests (one
  or more)
- Each request already holds onto the context.

The other thing that's strange here is the runtime pm handling. We already
keep the device awake as long as it's busy, so I wonder why exactly we
need this here in addition.

In any case, these kinds of cleanup tasks are imo better done in the retire
request handler that we already have.

Imo needs some cleanup.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 32/43] drm/i915/bdw: Avoid non-lite-restore preemptions
  2014-07-24 16:04 ` [PATCH 32/43] drm/i915/bdw: Avoid non-lite-restore preemptions Thomas Daniel
@ 2014-08-14 20:31   ` Daniel Vetter
  0 siblings, 0 replies; 137+ messages in thread
From: Daniel Vetter @ 2014-08-14 20:31 UTC (permalink / raw)
  To: Thomas Daniel; +Cc: intel-gfx

On Thu, Jul 24, 2014 at 05:04:40PM +0100, Thomas Daniel wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> In the current Execlists feeding mechanism, full preemption is not
> supported yet: only lite-restores are allowed (this is: the GPU
> simply samples a new tail pointer for the context currently in
> execution).
> 
> But we have identified a scenario in which a full preemption occurs:
> 1) We submit two contexts for execution (A & B).
> 2) The GPU finishes with the first one (A), switches to the second one
> (B) and informs us.
> 3) We submit B again (hoping to cause a lite restore) together with C,
> but in the time we spend writing to the ELSP, the GPU finishes B.
> 4) The GPU starts executing B again (since we told it so).
> 5) We receive a B-finished interrupt and, mistakenly, we submit C (again)
> and D, causing a full preemption of B.
> 
> The race is avoided by keeping track of how many times a context has been
> submitted to the hardware and by better discriminating the received context
> switch interrupts: in the example, when we have submitted B twice, we won't
> submit C and D as soon as we receive the notification that B is completed
> because we were expecting to get a LITE_RESTORE and we didn't, so we know a
> second completion will be received shortly.
> 
> Without this explicit checking, somehow, the batch buffer execution order
> gets messed with. This can be verified with the IGT test I sent together with
> the series. I don't know the exact mechanism by which the pre-emption messes
> with the execution order but, since other people are working on the Scheduler
> + Preemption on Execlists, I didn't try to fix it. In this series, only Lite
> Restores are supported (other kinds of preemption WARN).

Where's this igt patch? The kernel patch here is at least missing the

Testcase: igt/foo

tag. Please supply.
-Daniel

> 
> v2: elsp_submitted belongs in the new intel_ctx_submit_request. Several
> rebase changes.
> 
> v3: Clarify how the race is avoided, as requested by Daniel.
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_lrc.c |   28 ++++++++++++++++++++++++----
>  drivers/gpu/drm/i915/intel_lrc.h |    2 ++
>  2 files changed, 26 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 65f4f26..895dbfc 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -264,6 +264,7 @@ static void execlists_context_unqueue(struct intel_engine_cs *ring)
>  		else if (req0->ctx == cursor->ctx) {
>  			/* Same ctx: ignore first request, as second request
>  			 * will update tail past first request's workload */
> +			cursor->elsp_submitted = req0->elsp_submitted;
>  			list_del(&req0->execlist_link);
>  			queue_work(dev_priv->wq, &req0->work);
>  			req0 = cursor;
> @@ -273,8 +274,14 @@ static void execlists_context_unqueue(struct intel_engine_cs *ring)
>  		}
>  	}
>  
> +	WARN_ON(req1 && req1->elsp_submitted);
> +
>  	BUG_ON(execlists_submit_context(ring, req0->ctx, req0->tail,
>  			req1? req1->ctx : NULL, req1? req1->tail : 0));
> +
> +	req0->elsp_submitted++;
> +	if (req1)
> +		req1->elsp_submitted++;
>  }
>  
>  static bool execlists_check_remove_request(struct intel_engine_cs *ring,
> @@ -291,9 +298,13 @@ static bool execlists_check_remove_request(struct intel_engine_cs *ring,
>  		struct drm_i915_gem_object *ctx_obj =
>  				head_req->ctx->engine[ring->id].state;
>  		if (intel_execlists_ctx_id(ctx_obj) == request_id) {
> -			list_del(&head_req->execlist_link);
> -			queue_work(dev_priv->wq, &head_req->work);
> -			return true;
> +			WARN(head_req->elsp_submitted == 0,
> +					"Never submitted head request\n");
> +			if (--head_req->elsp_submitted <= 0) {
> +				list_del(&head_req->execlist_link);
> +				queue_work(dev_priv->wq, &head_req->work);
> +				return true;
> +			}
>  		}
>  	}
>  
> @@ -326,7 +337,16 @@ void intel_execlists_handle_ctx_events(struct intel_engine_cs *ring)
>  		status_id = I915_READ(RING_CONTEXT_STATUS_BUF(ring) +
>  				(read_pointer % 6) * 8 + 4);
>  
> -		if (status & GEN8_CTX_STATUS_COMPLETE) {
> +		if (status & GEN8_CTX_STATUS_PREEMPTED) {
> +			if (status & GEN8_CTX_STATUS_LITE_RESTORE) {
> +				if (execlists_check_remove_request(ring, status_id))
> +					WARN(1, "Lite Restored request removed from queue\n");
> +			} else
> +				WARN(1, "Preemption without Lite Restore\n");
> +		}
> +
> +		 if ((status & GEN8_CTX_STATUS_ACTIVE_IDLE) ||
> +		     (status & GEN8_CTX_STATUS_ELEMENT_SWITCH)) {
>  			if (execlists_check_remove_request(ring, status_id))
>  				submit_contexts++;
>  		}
> diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
> index 2e8929f..074b44f 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.h
> +++ b/drivers/gpu/drm/i915/intel_lrc.h
> @@ -67,6 +67,8 @@ struct intel_ctx_submit_request {
>  
>  	struct list_head execlist_link;
>  	struct work_struct work;
> +
> +	int elsp_submitted;
>  };
>  
>  void intel_execlists_handle_ctx_events(struct intel_engine_cs *ring);
> -- 
> 1.7.9.5
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 31/43] drm/i915/bdw: Handle context switch events
  2014-07-24 16:04 ` [PATCH 31/43] drm/i915/bdw: Handle context switch events Thomas Daniel
                     ` (2 preceding siblings ...)
  2014-08-14 20:28   ` Daniel Vetter
@ 2014-08-14 20:37   ` Daniel Vetter
  3 siblings, 0 replies; 137+ messages in thread
From: Daniel Vetter @ 2014-08-14 20:37 UTC (permalink / raw)
  To: Thomas Daniel; +Cc: intel-gfx

On Thu, Jul 24, 2014 at 05:04:39PM +0100, Thomas Daniel wrote:
> Handle all context status events in the context status buffer on every
> context switch interrupt. We only remove work from the execlist queue
> after a context status buffer reports that it has completed and we only
> attempt to schedule new contexts on interrupt when a previously submitted
> context completes (unless no contexts are queued, which means the GPU is
> free).
> 
> We cannot call intel_runtime_pm_get() in an interrupt (or with a spinlock
> grabbed, FWIW), because it might sleep, which is not a nice thing to do.
> Instead, do the runtime_pm get/put together with the create/destroy request,
> and handle the forcewake get/put directly.
> 
> Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
> 
> v2: Unreferencing the context when we are freeing the request might free
> the backing bo, which requires the struct_mutex to be grabbed, so defer
> unreferencing and freeing to a bottom half.
> 
> v3:
> - Ack the interrupt immediately, before trying to handle it (fix for
> missing interrupts by Bob Beckett <robert.beckett@intel.com>).
> - Update the Context Status Buffer Read Pointer, just in case (spotted
> by Damien Lespiau).
> 
> v4: New namespace and multiple rebase changes.
> 
> v5: Squash with "drm/i915/bdw: Do not call intel_runtime_pm_get() in an
> interrupt", as suggested by Daniel.
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>

One more ...

> +void intel_execlists_handle_ctx_events(struct intel_engine_cs *ring)

Please rename this to intel_execlist_ctx_events_irq_handler or similar for
consistency with all the other irq handler functions in a follow-up patch.
That kind of consistency helps a lot when reviewing the locking of
irq-save spinlocks.
-Daniel

> +{
> +	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> +	u32 status_pointer;
> +	u8 read_pointer;
> +	u8 write_pointer;
> +	u32 status;
> +	u32 status_id;
> +	u32 submit_contexts = 0;
> +
> +	status_pointer = I915_READ(RING_CONTEXT_STATUS_PTR(ring));
> +
> +	read_pointer = ring->next_context_status_buffer;
> +	write_pointer = status_pointer & 0x07;
> +	if (read_pointer > write_pointer)
> +		write_pointer += 6;
> +
> +	spin_lock(&ring->execlist_lock);
> +
> +	while (read_pointer < write_pointer) {
> +		read_pointer++;
> +		status = I915_READ(RING_CONTEXT_STATUS_BUF(ring) +
> +				(read_pointer % 6) * 8);
> +		status_id = I915_READ(RING_CONTEXT_STATUS_BUF(ring) +
> +				(read_pointer % 6) * 8 + 4);
> +
> +		if (status & GEN8_CTX_STATUS_COMPLETE) {
> +			if (execlists_check_remove_request(ring, status_id))
> +				submit_contexts++;
> +		}
> +	}
> +
> +	if (submit_contexts != 0)
> +		execlists_context_unqueue(ring);
> +
> +	spin_unlock(&ring->execlist_lock);
> +
> +	WARN(submit_contexts > 2, "More than two context complete events?\n");
> +	ring->next_context_status_buffer = write_pointer % 6;
> +
> +	I915_WRITE(RING_CONTEXT_STATUS_PTR(ring),
> +			((u32)ring->next_context_status_buffer & 0x07) << 8);
> +}
> +
> +static void execlists_free_request_task(struct work_struct *work)
> +{
> +	struct intel_ctx_submit_request *req =
> +			container_of(work, struct intel_ctx_submit_request, work);
> +	struct drm_device *dev = req->ring->dev;
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +
> +	intel_runtime_pm_put(dev_priv);
> +
> +	mutex_lock(&dev->struct_mutex);
> +	i915_gem_context_unreference(req->ctx);
> +	mutex_unlock(&dev->struct_mutex);
> +
> +	kfree(req);
> +}
> +
>  static int execlists_context_queue(struct intel_engine_cs *ring,
>  				   struct intel_context *to,
>  				   u32 tail)
> @@ -261,6 +375,8 @@ static int execlists_context_queue(struct intel_engine_cs *ring,
>  	i915_gem_context_reference(req->ctx);
>  	req->ring = ring;
>  	req->tail = tail;
> +	INIT_WORK(&req->work, execlists_free_request_task);
> +	intel_runtime_pm_get(dev_priv);
>  
>  	spin_lock_irqsave(&ring->execlist_lock, flags);
>  
> @@ -908,6 +1024,7 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
>  
>  	INIT_LIST_HEAD(&ring->execlist_queue);
>  	spin_lock_init(&ring->execlist_lock);
> +	ring->next_context_status_buffer = 0;
>  
>  	ret = intel_lr_context_deferred_create(dctx, ring);
>  	if (ret)
> diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
> index 14492a9..2e8929f 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.h
> +++ b/drivers/gpu/drm/i915/intel_lrc.h
> @@ -66,6 +66,9 @@ struct intel_ctx_submit_request {
>  	u32 tail;
>  
>  	struct list_head execlist_link;
> +	struct work_struct work;
>  };
>  
> +void intel_execlists_handle_ctx_events(struct intel_engine_cs *ring);
> +
>  #endif /* _INTEL_LRC_H_ */
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 6358823..905d1ba 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -225,6 +225,7 @@ struct  intel_engine_cs {
>  	/* Execlists */
>  	spinlock_t execlist_lock;
>  	struct list_head execlist_queue;
> +	u8 next_context_status_buffer;
>  	u32             irq_keep_mask; /* bitmask for interrupts that should not be masked */
>  	int		(*emit_request)(struct intel_ringbuffer *ringbuf);
>  	int		(*emit_flush)(struct intel_ringbuffer *ringbuf,
> -- 
> 1.7.9.5
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 33/43] drm/i915/bdw: Help out the ctx switch interrupt handler
  2014-07-24 16:04 ` [PATCH 33/43] drm/i915/bdw: Help out the ctx switch interrupt handler Thomas Daniel
@ 2014-08-14 20:43   ` Daniel Vetter
  0 siblings, 0 replies; 137+ messages in thread
From: Daniel Vetter @ 2014-08-14 20:43 UTC (permalink / raw)
  To: Thomas Daniel; +Cc: intel-gfx

On Thu, Jul 24, 2014 at 05:04:41PM +0100, Thomas Daniel wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> If we receive a storm of requests for the same context (see gem_storedw_loop_*)
> we might end up iterating over too many elements in interrupt time, looking for
> contexts to squash together. Instead, share the burden by giving more
> intelligence to the queue function. At most, the interrupt will iterate over
> three elements.
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>

I'll continue merging after this patch tomorrow.
-Daniel

> ---
>  drivers/gpu/drm/i915/intel_lrc.c |   26 ++++++++++++++++++++++----
>  1 file changed, 22 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 895dbfc..829b15d 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -384,9 +384,10 @@ static int execlists_context_queue(struct intel_engine_cs *ring,
>  				   struct intel_context *to,
>  				   u32 tail)
>  {
> -	struct intel_ctx_submit_request *req = NULL;
> +	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> +	struct intel_ctx_submit_request *req = NULL, *cursor;
>  	unsigned long flags;
> -	bool was_empty;
> +	int num_elements = 0;
>  
>  	req = kzalloc(sizeof(*req), GFP_KERNEL);
>  	if (req == NULL)
> @@ -400,9 +401,26 @@ static int execlists_context_queue(struct intel_engine_cs *ring,
>  
>  	spin_lock_irqsave(&ring->execlist_lock, flags);
>  
> -	was_empty = list_empty(&ring->execlist_queue);
> +	list_for_each_entry(cursor, &ring->execlist_queue, execlist_link)
> +		if (++num_elements > 2)
> +			break;
> +
> +	if (num_elements > 2) {
> +		struct intel_ctx_submit_request *tail_req;
> +
> +		tail_req = list_last_entry(&ring->execlist_queue,
> +					struct intel_ctx_submit_request,
> +					execlist_link);
> +		if (to == tail_req->ctx) {
> +			WARN(tail_req->elsp_submitted != 0,
> +					"More than 2 already-submitted reqs queued\n");
> +			list_del(&tail_req->execlist_link);
> +			queue_work(dev_priv->wq, &tail_req->work);
> +		}
> +	}
> +
>  	list_add_tail(&req->execlist_link, &ring->execlist_queue);
> -	if (was_empty)
> +	if (num_elements == 0)
>  		execlists_context_unqueue(ring);
>  
>  	spin_unlock_irqrestore(&ring->execlist_lock, flags);
> -- 
> 1.7.9.5
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: [PATCH 27/43] drm/i915/bdw: Render state init for Execlists
  2014-08-14 20:00         ` Daniel Vetter
@ 2014-08-15  8:43           ` Daniel, Thomas
  2014-08-20 15:55           ` Daniel, Thomas
  1 sibling, 0 replies; 137+ messages in thread
From: Daniel, Thomas @ 2014-08-15  8:43 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx



> -----Original Message-----
> From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel
> Vetter
> Sent: Thursday, August 14, 2014 9:00 PM
> To: Daniel, Thomas
> Cc: Daniel Vetter; intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH 27/43] drm/i915/bdw: Render state init for
> Execlists
> 
> On Wed, Aug 13, 2014 at 05:30:07PM +0200, Daniel Vetter wrote:
> > On Wed, Aug 13, 2014 at 03:07:29PM +0000, Daniel, Thomas wrote:
> > > > -----Original Message-----
> > > > From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of
> > > > Daniel Vetter
> > > > Sent: Monday, August 11, 2014 10:25 PM
> > > > To: Daniel, Thomas
> > > > Cc: intel-gfx@lists.freedesktop.org
> > > > Subject: Re: [Intel-gfx] [PATCH 27/43] drm/i915/bdw: Render state
> > > > init for Execlists
> > > >
> > > > On Thu, Jul 24, 2014 at 05:04:35PM +0100, Thomas Daniel wrote:
> > > > > From: Oscar Mateo <oscar.mateo@intel.com>
> >  > > index 9085ff1..0dc6992 100644
> > > > > --- a/drivers/gpu/drm/i915/i915_gem_context.c
> > > > > +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> > > > > @@ -513,8 +513,23 @@ int i915_gem_context_enable(struct
> > > > drm_i915_private *dev_priv)
> > > > >  		ppgtt->enable(ppgtt);
> > > > >  	}
> > > > >
> > > > > -	if (i915.enable_execlists)
> > > > > +	if (i915.enable_execlists) {
> > > > > +		struct intel_context *dctx;
> > > > > +
> > > > > +		ring = &dev_priv->ring[RCS];
> > > > > +		dctx = ring->default_context;
> > > > > +
> > > > > +		if (!dctx->rcs_initialized) {
> > > > > +			ret = intel_lr_context_render_state_init(ring,
> dctx);
> > > > > +			if (ret) {
> > > > > +				DRM_ERROR("Init render state failed:
> %d\n",
> > > > ret);
> > > > > +				return ret;
> > > > > +			}
> > > > > +			dctx->rcs_initialized = true;
> > > > > +		}
> > > > > +
> > > > >  		return 0;
> > > > > +	}
> > > >
> > > > This looks very much like the wrong place. We should init the
> > > > render state when we create the context, or when we switch to it for
> the first time.
> > > > The later is what the legacy contexts currently do in do_switch.
> > > >
> > > > But ctx_enable should do the switch to the default context and
> > > > that's about
> > > Well, a side-effect of switching to the default context in legacy
> > > mode is that the render state gets initialized.  I could move the lr
> > > render state init call into an enable_execlists branch in
> > > i915_switch_context() but that doesn't seem like the right place.
> > >
> > > How about in i915_gem_init() after calling i915_gem_init_hw()?
> > >
> > > > it. If there's some depency then I guess we should stall the
> > > > creation of the default context a bit, maybe.
> > > >
> > > > In any case someone needs to explain this better and if there's
> > > > no other way this at least needs a big comment. So I'll punt for now.
> > > When the default context is created the driver is not ready to
> > > execute a batch.  That is why the render state init can't be done then.
> >
> > That sounds like the default context is created too early. Essentially
> > I want to avoid needless divergence between the default context and
> > normal contexts, because sooner or later that will mean someone will
> > creep in with a _really_ subtle bug.
> >
> > What about:
> > - We create the default lrc contexs in context_init, but like with a
> >   normal context we don't do any of the deferred setup.
> > - In context_enable (which since yesterday properly propagates errors to
> >   callers) we force the deferred lrc ctx setup for the default contexts on
> >   all engines.
> > - The render state init is done as part of the deferred ctx setup for the
> >   render engine in all cases.
> >
> > Totally off the track or do you see a workable solution somewhere in
> > that direction?
> 
> I'd like to discuss this first a bit more, so will punt on this patch for now.
> -Daniel
I think that your proposal will work.  I've been having some trouble
with my RVP board so haven't had a chance to test it out yet.

Thomas.
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: [PATCH 30/43] drm/i915/bdw: Two-stage execlist submit process
  2014-08-14 20:10   ` Daniel Vetter
@ 2014-08-15  8:51     ` Daniel, Thomas
  2014-08-15  9:38       ` Daniel Vetter
  0 siblings, 1 reply; 137+ messages in thread
From: Daniel, Thomas @ 2014-08-15  8:51 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx



> -----Original Message-----
> From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel
> Vetter
> Sent: Thursday, August 14, 2014 9:10 PM
> To: Daniel, Thomas
> Cc: intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH 30/43] drm/i915/bdw: Two-stage execlist
> submit process
> 
> On Thu, Jul 24, 2014 at 05:04:38PM +0100, Thomas Daniel wrote:
> > From: Michel Thierry <michel.thierry@intel.com>
> >
> > Context switch (and execlist submission) should happen only when other
> > contexts are not active, otherwise pre-emption occurs.
> >
> > To assure this, we place context switch requests in a queue and those
> > request are later consumed when the right context switch interrupt is
> > received (still TODO).
> >
> > v2: Use a spinlock, do not remove the requests on unqueue (wait for
> > context switch completion).
> >
> > Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
> >
> > v3: Several rebases and code changes. Use unique ID.
> >
> > v4:
> > - Move the queue/lock init to the late ring initialization.
> > - Damien's kmalloc review comments: check return, use sizeof(*req), do
> > not cast.
> >
> > v5:
> > - Do not reuse drm_i915_gem_request. Instead, create our own.
> > - New namespace.
> >
> > Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v1)
> > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2-v5)
> > ---
> >  drivers/gpu/drm/i915/intel_lrc.c        |   63
> ++++++++++++++++++++++++++++++-
> >  drivers/gpu/drm/i915/intel_lrc.h        |    8 ++++
> >  drivers/gpu/drm/i915/intel_ringbuffer.h |    2 +
> >  3 files changed, 71 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c
> > b/drivers/gpu/drm/i915/intel_lrc.c
> > index 5b6f416..9e91169 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -217,6 +217,63 @@ static int execlists_submit_context(struct
> intel_engine_cs *ring,
> >  	return 0;
> >  }
> >
> > +static void execlists_context_unqueue(struct intel_engine_cs *ring)
> > +{
> > +	struct intel_ctx_submit_request *req0 = NULL, *req1 = NULL;
> > +	struct intel_ctx_submit_request *cursor = NULL, *tmp = NULL;
> > +
> > +	if (list_empty(&ring->execlist_queue))
> > +		return;
> > +
> > +	/* Try to read in pairs */
> > +	list_for_each_entry_safe(cursor, tmp, &ring->execlist_queue,
> > +execlist_link) {
> 
> Ok, because checkpatch I've looked at this. Imo open-coding this would be
> much easier to read i.e.
> 
> 	if (!list_empty)
> 		grab&remove first item;
> 	if (!list_empty)
> 		grab&remove 2nd item;
> 
> Care to follow up with a patch for that?
> 
> Thanks, Daniel
This needs to be kept as a loop because if there are two consecutive
requests for the same context they are squashed.  Also the non-squashed
requests are not removed here (unfortunately the remove is in the next
patch).

> 
> > +		if (!req0)
> > +			req0 = cursor;
> > +		else if (req0->ctx == cursor->ctx) {
This:

> > +			/* Same ctx: ignore first request, as second request
> > +			 * will update tail past first request's workload */
> > +			list_del(&req0->execlist_link);
> > +			i915_gem_context_unreference(req0->ctx);
> > +			kfree(req0);
> > +			req0 = cursor;
> > +		} else {
> > +			req1 = cursor;
> > +			break;
> > +		}
> > +	}
> > +
> > +	BUG_ON(execlists_submit_context(ring, req0->ctx, req0->tail,
> > +			req1? req1->ctx : NULL, req1? req1->tail : 0));
> > +}
> > +
> > +static int execlists_context_queue(struct intel_engine_cs *ring,
> > +				   struct intel_context *to,
> > +				   u32 tail)
> > +{
> > +	struct intel_ctx_submit_request *req = NULL;
> > +	unsigned long flags;
> > +	bool was_empty;
> > +
> > +	req = kzalloc(sizeof(*req), GFP_KERNEL);
> > +	if (req == NULL)
> > +		return -ENOMEM;
> > +	req->ctx = to;
> > +	i915_gem_context_reference(req->ctx);
> > +	req->ring = ring;
> > +	req->tail = tail;
> > +
> > +	spin_lock_irqsave(&ring->execlist_lock, flags);
> > +
> > +	was_empty = list_empty(&ring->execlist_queue);
> > +	list_add_tail(&req->execlist_link, &ring->execlist_queue);
> > +	if (was_empty)
> > +		execlists_context_unqueue(ring);
> > +
> > +	spin_unlock_irqrestore(&ring->execlist_lock, flags);
> > +
> > +	return 0;
> > +}
> > +
> >  static int logical_ring_invalidate_all_caches(struct intel_ringbuffer
> > *ringbuf)  {
> >  	struct intel_engine_cs *ring = ringbuf->ring; @@ -405,8 +462,7 @@
> > void intel_logical_ring_advance_and_submit(struct intel_ringbuffer
> *ringbuf)
> >  	if (intel_ring_stopped(ring))
> >  		return;
> >
> > -	/* FIXME: too cheeky, we don't even check if the ELSP is ready */
> > -	execlists_submit_context(ring, ctx, ringbuf->tail, NULL, 0);
> > +	execlists_context_queue(ring, ctx, ringbuf->tail);
> >  }
> >
> >  static int logical_ring_alloc_seqno(struct intel_engine_cs *ring, @@
> > -850,6 +906,9 @@ static int logical_ring_init(struct drm_device *dev, struct
> intel_engine_cs *rin
> >  	INIT_LIST_HEAD(&ring->request_list);
> >  	init_waitqueue_head(&ring->irq_queue);
> >
> > +	INIT_LIST_HEAD(&ring->execlist_queue);
> > +	spin_lock_init(&ring->execlist_lock);
> > +
> >  	ret = intel_lr_context_deferred_create(dctx, ring);
> >  	if (ret)
> >  		return ret;
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.h
> > b/drivers/gpu/drm/i915/intel_lrc.h
> > index b59965b..14492a9 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.h
> > +++ b/drivers/gpu/drm/i915/intel_lrc.h
> > @@ -60,4 +60,12 @@ int intel_execlists_submission(struct drm_device
> *dev, struct drm_file *file,
> >  			       u64 exec_start, u32 flags);
> >  u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj);
> >
> > +struct intel_ctx_submit_request {
> > +	struct intel_context *ctx;
> > +	struct intel_engine_cs *ring;
> > +	u32 tail;
> > +
> > +	struct list_head execlist_link;
> > +};
> > +
> >  #endif /* _INTEL_LRC_H_ */
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h
> > b/drivers/gpu/drm/i915/intel_ringbuffer.h
> > index c885d5c..6358823 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> > @@ -223,6 +223,8 @@ struct  intel_engine_cs {
> >  	} semaphore;
> >
> >  	/* Execlists */
> > +	spinlock_t execlist_lock;
> > +	struct list_head execlist_queue;
> >  	u32             irq_keep_mask; /* bitmask for interrupts that should not
> be masked */
> >  	int		(*emit_request)(struct intel_ringbuffer *ringbuf);
> >  	int		(*emit_flush)(struct intel_ringbuffer *ringbuf,
> > --
> > 1.7.9.5
> >
> 
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: [PATCH 30/43] drm/i915/bdw: Two-stage execlist submit process
  2014-08-15  8:51     ` Daniel, Thomas
@ 2014-08-15  9:38       ` Daniel Vetter
  0 siblings, 0 replies; 137+ messages in thread
From: Daniel Vetter @ 2014-08-15  9:38 UTC (permalink / raw)
  To: Daniel, Thomas; +Cc: intel-gfx

On Fri, Aug 15, 2014 at 08:51:22AM +0000, Daniel, Thomas wrote:
> > -----Original Message-----
> > From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel
> > Vetter
> > Sent: Thursday, August 14, 2014 9:10 PM
> > To: Daniel, Thomas
> > Cc: intel-gfx@lists.freedesktop.org
> > Subject: Re: [Intel-gfx] [PATCH 30/43] drm/i915/bdw: Two-stage execlist
> > submit process
> > On Thu, Jul 24, 2014 at 05:04:38PM +0100, Thomas Daniel wrote:
> > > From: Michel Thierry <michel.thierry@intel.com>
> > > +static void execlists_context_unqueue(struct intel_engine_cs *ring)
> > > +{
> > > +	struct intel_ctx_submit_request *req0 = NULL, *req1 = NULL;
> > > +	struct intel_ctx_submit_request *cursor = NULL, *tmp = NULL;
> > > +
> > > +	if (list_empty(&ring->execlist_queue))
> > > +		return;
> > > +
> > > +	/* Try to read in pairs */
> > > +	list_for_each_entry_safe(cursor, tmp, &ring->execlist_queue,
> > > +execlist_link) {
> > 
> > Ok, because checkpatch I've looked at this. Imo open-coding this would be
> > much easier to read i.e.
> > 
> > 	if (!list_empty)
> > 		grab&remove first item;
> > 	if (!list_empty)
> > 		grab&remove 2nd item;
> > 
> > Care to follow up with a patch for that?
> > 
> > Thanks, Daniel
> This needs to be kept as a loop because if there are two consecutive
> requests for the same context they are squashed.  Also the non-squashed
> requests are not removed here (unfortunately the remove is in the next
> patch).

Ok, this sounds like we need to overhaul it anyway for the request
tracking then.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: [PATCH 11/43] drm/i915/bdw: Render moot context reset and switch with Execlists
  2014-08-11 14:30   ` Daniel Vetter
@ 2014-08-15 10:22     ` Daniel, Thomas
  2014-08-15 15:39       ` Daniel Vetter
  0 siblings, 1 reply; 137+ messages in thread
From: Daniel, Thomas @ 2014-08-15 10:22 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx



> -----Original Message-----
> From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel
> Vetter
> Sent: Monday, August 11, 2014 3:30 PM
> To: Daniel, Thomas
> Cc: intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH 11/43] drm/i915/bdw: Render moot context
> reset and switch with Execlists
> 
> On Thu, Jul 24, 2014 at 05:04:19PM +0100, Thomas Daniel wrote:
> > From: Oscar Mateo <oscar.mateo@intel.com>
> >
> > These two functions make no sense in an Logical Ring Context &
> > Execlists world.
> >
> > v2: We got rid of lrc_enabled and centralized everything in the
> > sanitized i915.enable_execlists instead.
> >
> > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_gem_context.c |    9 +++++++++
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c
> > b/drivers/gpu/drm/i915/i915_gem_context.c
> > index fbe7278..288f5de 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_context.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> > @@ -380,6 +380,9 @@ void i915_gem_context_reset(struct drm_device
> *dev)
> >  	struct drm_i915_private *dev_priv = dev->dev_private;
> >  	int i;
> >
> > +	if (i915.enable_execlists)
> > +		return;
> 
> This will conflict badly with Alistair's patch at a functional level. I'm pretty sure
> that we want _some_ form of reset for the context state, since the hw didn't
> just magically load the previously running context. So NACK on this hunk.
OK I'll wait to see the final version of Alistair's patch and decide what to do
about this hunk.

> 
> > +
> >  	/* Prevent the hardware from restoring the last context (which
> hung) on
> >  	 * the next switch */
> >  	for (i = 0; i < I915_NUM_RINGS; i++) { @@ -514,6 +517,9 @@ int
> > i915_gem_context_enable(struct drm_i915_private *dev_priv)
> >  		ppgtt->enable(ppgtt);
> >  	}
> >
> > +	if (i915.enable_execlists)
> > +		return 0;
> 
> Again this conflicts with Alistair's patch. Furthermore it looks redundant since
> you no-op out i915_switch_context separately.
I don't think this is a conflict.  Doesn't Alistair's change here just involve
writing PDPs for full PPGTT?  We don't want to do that in lrc mode.

> 
> > +
> >  	/* FIXME: We should make this work, even in reset */
> >  	if (i915_reset_in_progress(&dev_priv->gpu_error))
> >  		return 0;
> > @@ -769,6 +775,9 @@ int i915_switch_context(struct intel_engine_cs
> > *ring,  {
> >  	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> >
> > +	if (i915.enable_execlists)
> > +		return 0;
> 
> I've hoped we don't need this with the legacy ringbuffer cmdsubmission fully
> split out. If there are other paths (resume, gpu reset) where this comes into
> play then I guess we need to look at where the best place is to make this call.
> So until this comes with a bit a better justification I'll punt on this for now.
> -Daniel
Yes, the command submission lrc path doesn't call this but other codepaths
do.  If we keep the check in context_enable() the only remaining call I see is
in i915_gpu_idle().  I don't mind if the check is done there but perhaps a
WARN_ON should then be added into switch_context() because we don't
want to be putting illegal MI_SET_CONTEXT commands into the ring.

Thomas.
>
> > +
> >  	WARN_ON(!mutex_is_locked(&dev_priv->dev->struct_mutex));
> >
> >  	if (to->legacy_hw_ctx.rcs_state == NULL) { /* We have the fake
> > context */
> > --
> > 1.7.9.5
> >
> 
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH] drm/i915/bdw: Don't write PDP in the legacy way when using LRCs
  2014-08-07 12:17   ` Thomas Daniel
  2014-08-08 15:59     ` Damien Lespiau
  2014-08-11 14:32     ` Daniel Vetter
@ 2014-08-15 11:01     ` Thomas Daniel
  2 siblings, 0 replies; 137+ messages in thread
From: Thomas Daniel @ 2014-08-15 11:01 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

This is mostly for correctness so that we know we are running the LR
context correctly (this is, the PDPs are contained inside the context
object).

v2: Move the check to inside the enable PPGTT function. The switch
happens in two places: the legacy context switch (that we won't hit
when Execlists are enabled) and the PPGTT enable, which unfortunately
we need. This would look much nicer if the ppgtt->enable was part of
the ring init, where it logically belongs.

v3: Move the check to the start of the enable PPGTT function.  None
of the legacy PPGTT enabling is required when using LRCs as the
PPGTT is enabled in the context descriptor and the PDPs are written
in the LRC.

v4: Clarify comment based on review feedback.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c |    6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 5188936..2966b53 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -843,6 +843,12 @@ static int gen8_ppgtt_enable(struct i915_hw_ppgtt *ppgtt)
 	struct intel_engine_cs *ring;
 	int j, ret;
 
+	/* In the case of execlists, PPGTT is enabled by the context descriptor
+	 * and the PDPs are contained within the context itself.  We don't
+	 * need to do anything here. */
+	if (i915.enable_execlists)
+		return 0;
+
 	for_each_ring(ring, dev_priv, j) {
 		I915_WRITE(RING_MODE_GEN7(ring),
 			   _MASKED_BIT_ENABLE(GFX_PPGTT_ENABLE));
-- 
1.7.9.5


* Re: [PATCH 35/43] drm/i915/bdw: Make sure error capture keeps working with Execlists
  2014-07-24 16:04 ` [PATCH 35/43] drm/i915/bdw: Make sure error capture keeps working " Thomas Daniel
@ 2014-08-15 12:14   ` Daniel Vetter
  2014-08-21 10:57     ` Daniel, Thomas
  0 siblings, 1 reply; 137+ messages in thread
From: Daniel Vetter @ 2014-08-15 12:14 UTC (permalink / raw)
  To: Thomas Daniel; +Cc: intel-gfx

On Thu, Jul 24, 2014 at 05:04:43PM +0100, Thomas Daniel wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> Since the ringbuffer does not belong per engine anymore, we have to
> make sure that we are always recording the correct ringbuffer.
> 
> TODO: This is only a small fix to keep basic error capture working, but
> we need to add more information for it to be useful (e.g. dump the
> context being executed).
> 
> v2: Reorder how the ringbuffer is chosen to clarify the change and
> rename the variable, both changes suggested by Chris Wilson. Also,
> add the TODO comment to the code, as suggested by Daniel.
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>

There's a bit too much stuff in-flight to fix up error capture for ppgtt.
I think it's better to stall this patch here until that work is completed.
Please coordinate with Mika here.
-Daniel

> ---
>  drivers/gpu/drm/i915/i915_gpu_error.c |   22 ++++++++++++++++++----
>  1 file changed, 18 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 45b6191..1e38576 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -874,9 +874,6 @@ static void i915_record_ring_state(struct drm_device *dev,
>  		ering->hws = I915_READ(mmio);
>  	}
>  
> -	ering->cpu_ring_head = ring->buffer->head;
> -	ering->cpu_ring_tail = ring->buffer->tail;
> -
>  	ering->hangcheck_score = ring->hangcheck.score;
>  	ering->hangcheck_action = ring->hangcheck.action;
>  
> @@ -936,6 +933,7 @@ static void i915_gem_record_rings(struct drm_device *dev,
>  
>  	for (i = 0; i < I915_NUM_RINGS; i++) {
>  		struct intel_engine_cs *ring = &dev_priv->ring[i];
> +		struct intel_ringbuffer *rbuf;
>  
>  		error->ring[i].pid = -1;
>  
> @@ -979,8 +977,24 @@ static void i915_gem_record_rings(struct drm_device *dev,
>  			}
>  		}
>  
> +		if (i915.enable_execlists) {
> +			/* TODO: This is only a small fix to keep basic error
> +			 * capture working, but we need to add more information
> +			 * for it to be useful (e.g. dump the context being
> +			 * executed).
> +			 */
> +			if (request)
> +				rbuf = request->ctx->engine[ring->id].ringbuf;
> +			else
> +				rbuf = ring->default_context->engine[ring->id].ringbuf;
> +		} else
> +			rbuf = ring->buffer;
> +
> +		error->ring[i].cpu_ring_head = rbuf->head;
> +		error->ring[i].cpu_ring_tail = rbuf->tail;
> +
>  		error->ring[i].ringbuffer =
> -			i915_error_ggtt_object_create(dev_priv, ring->buffer->obj);
> +			i915_error_ggtt_object_create(dev_priv, rbuf->obj);
>  
>  		if (ring->status_page.obj)
>  			error->ring[i].hws_page =
> -- 
> 1.7.9.5
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: [PATCH 40/43] drm/i915/bdw: Document Logical Rings, LR contexts and Execlists
  2014-07-24 16:04 ` [PATCH 40/43] drm/i915/bdw: Document Logical Rings, LR contexts and Execlists Thomas Daniel
@ 2014-08-15 12:42   ` Daniel Vetter
  0 siblings, 0 replies; 137+ messages in thread
From: Daniel Vetter @ 2014-08-15 12:42 UTC (permalink / raw)
  To: Thomas Daniel; +Cc: intel-gfx

On Thu, Jul 24, 2014 at 05:04:48PM +0100, Thomas Daniel wrote:
> +/**
> + * intel_lr_context_render_state_init() - render state init for Execlists
> + * @ring: Engine Command Streamer.
> + * @ctx: Context to initialize.
> + *
> + * A.K.A. null-context, A.K.A. golden-context. In a word, the render engine
> + * contexts require to always have a valid 3d pipeline state. As this is
> + * achieved with the submission of a batchbuffer, we require an alternative
> + * entry point to the legacy ringbuffer submission one (i915_gem_render_state_init).
> + *
> + * Return: non-zero if the initialization failed.
> + */

I've dropped this hunk here since that part isn't merged yet.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: [PATCH 42/43] drm/i915/bdw: Pin the context backing objects to GGTT on-demand
  2014-07-24 16:04 ` [PATCH 42/43] drm/i915/bdw: Pin the context backing objects to GGTT on-demand Thomas Daniel
@ 2014-08-15 13:03   ` Daniel Vetter
  0 siblings, 0 replies; 137+ messages in thread
From: Daniel Vetter @ 2014-08-15 13:03 UTC (permalink / raw)
  To: Thomas Daniel; +Cc: intel-gfx

On Thu, Jul 24, 2014 at 05:04:50PM +0100, Thomas Daniel wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> Up until now, we have pinned every logical ring context backing object
> during creation, and left it pinned until destruction. This made my life
> easier, but it's a harmful thing to do, because we cause fragmentation
> of the GGTT (and, eventually, we would run out of space).
> 
> This patch makes the pinning on-demand: the backing objects of the two
> contexts that are written to the ELSP are pinned right before submission
> and unpinned once the hardware is done with them. The only context that
> is still pinned regardless is the global default one, so that the HWS can
> still be accessed in the same way (ring->status_page).
> 
> v2: In the early version of this patch, we were pinning the context as
> we put it into the ELSP: on the one hand, this is very efficient because
> only a maximum two contexts are pinned at any given time, but on the other
> hand, we cannot really pin in interrupt time :(
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c |   11 +++++++--
>  drivers/gpu/drm/i915/i915_drv.h     |    1 +
>  drivers/gpu/drm/i915/i915_gem.c     |   44 ++++++++++++++++++++++++-----------
>  drivers/gpu/drm/i915/intel_lrc.c    |   42 ++++++++++++++++++++++++---------
>  drivers/gpu/drm/i915/intel_lrc.h    |    2 ++
>  5 files changed, 73 insertions(+), 27 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 968c3c0..84531cc 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -1721,10 +1721,15 @@ static int i915_dump_lrc(struct seq_file *m, void *unused)
>  				continue;
>  
>  			if (ctx_obj) {
> -				struct page *page = i915_gem_object_get_page(ctx_obj, 1);
> -				uint32_t *reg_state = kmap_atomic(page);
> +				struct page *page;
> +				uint32_t *reg_state;
>  				int j;
>  
> +				i915_gem_obj_ggtt_pin(ctx_obj, GEN8_LR_CONTEXT_ALIGN, 0);

This just needs a get/put_pages, no pinning required.

> +
> +				page = i915_gem_object_get_page(ctx_obj, 1);
> +				reg_state = kmap_atomic(page);
> +
>  				seq_printf(m, "CONTEXT: %s %u\n", ring->name,
>  						intel_execlists_ctx_id(ctx_obj));
>  
> @@ -1736,6 +1741,8 @@ static int i915_dump_lrc(struct seq_file *m, void *unused)
>  				}
>  				kunmap_atomic(reg_state);
>  
> +				i915_gem_object_ggtt_unpin(ctx_obj);
> +
>  				seq_putc(m, '\n');
>  			}
>  		}
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 1ce51d6..70466af 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -628,6 +628,7 @@ struct intel_context {
>  	struct {
>  		struct drm_i915_gem_object *state;
>  		struct intel_ringbuffer *ringbuf;
> +		atomic_t unpin_count;

No need to reinvent wheels for this. We should be able to do exactly what
we've done for the legacy ctx objects, namely:
- Always pin the default system context so that we can always switch to
  that.
- Shovel all context related objects through the active queue and obj
  management. This might depend upon the reworked exec list item.
- In the shrinker have a last-ditch effort to switch to the default
  context in case we run out of space.
- igt testcase. For legacy rings this code here resulted in some _really_
  expensive bugs and regressions.

I'll do another jira for this.

Otherwise I think I've pulled pretty much everything in except those
places where I think directly reworking the patches makes more sense. And
we should have JIRA tasks for all the outstanding work (plus ofc getting
ppgtt ready since execlists requires that).

If there's a patch I've missed or where people think it makes more sense
to merge the wip version now, please pipe up.

Thanks, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 11/43] drm/i915/bdw: Render moot context reset and switch with Execlists
  2014-08-15 10:22     ` Daniel, Thomas
@ 2014-08-15 15:39       ` Daniel Vetter
  0 siblings, 0 replies; 137+ messages in thread
From: Daniel Vetter @ 2014-08-15 15:39 UTC (permalink / raw)
  To: Daniel, Thomas; +Cc: intel-gfx

On Fri, Aug 15, 2014 at 10:22:01AM +0000, Daniel, Thomas wrote:
> 
> 
> > -----Original Message-----
> > From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel
> > Vetter
> > Sent: Monday, August 11, 2014 3:30 PM
> > To: Daniel, Thomas
> > Cc: intel-gfx@lists.freedesktop.org
> > Subject: Re: [Intel-gfx] [PATCH 11/43] drm/i915/bdw: Render moot context
> > reset and switch with Execlists
> > 
> > On Thu, Jul 24, 2014 at 05:04:19PM +0100, Thomas Daniel wrote:
> > > From: Oscar Mateo <oscar.mateo@intel.com>
> > >
> > > These two functions make no sense in a Logical Ring Context &
> > > Execlists world.
> > >
> > > v2: We got rid of lrc_enabled and centralized everything in the
> > > sanitized i915.enable_execlists instead.
> > >
> > > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> > > ---
> > >  drivers/gpu/drm/i915/i915_gem_context.c |    9 +++++++++
> > >  1 file changed, 9 insertions(+)
> > >
> > > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c
> > > b/drivers/gpu/drm/i915/i915_gem_context.c
> > > index fbe7278..288f5de 100644
> > > --- a/drivers/gpu/drm/i915/i915_gem_context.c
> > > +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> > > @@ -380,6 +380,9 @@ void i915_gem_context_reset(struct drm_device
> > *dev)
> > >  	struct drm_i915_private *dev_priv = dev->dev_private;
> > >  	int i;
> > >
> > > +	if (i915.enable_execlists)
> > > +		return;
> > 
> > This will conflict badly with Alistair's patch at a functional level. I'm pretty sure
> > that we want _some_ form of reset for the context state, since the hw didn't
> > just magically load the previously running context. So NACK on this hunk.
> OK I'll wait to see the final version of Alistair's patch and decide what to do
> about this hunk.
> 
> > 
> > > +
> > >  	/* Prevent the hardware from restoring the last context (which
> > hung) on
> > >  	 * the next switch */
> > >  	for (i = 0; i < I915_NUM_RINGS; i++) { @@ -514,6 +517,9 @@ int
> > > i915_gem_context_enable(struct drm_i915_private *dev_priv)
> > >  		ppgtt->enable(ppgtt);
> > >  	}
> > >
> > > +	if (i915.enable_execlists)
> > > +		return 0;
> > 
> > Again this conflicts with Alistair's patch. Furthermore it looks redundant since
> > you no-op out i915_switch_context separately.
> I don't think this is a conflict.  Doesn't Alistair's change here just involve
> writing PDPs for full PPGTT?  We don't want to do that in lrc mode.

Oh, just context conflict since Alistair's patch removes the next line ;-)
Also I shuffled some of the ppgtt code right above this around.

> > > +
> > >  	/* FIXME: We should make this work, even in reset */
> > >  	if (i915_reset_in_progress(&dev_priv->gpu_error))
> > >  		return 0;
> > > @@ -769,6 +775,9 @@ int i915_switch_context(struct intel_engine_cs
> > > *ring,  {
> > >  	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> > >
> > > +	if (i915.enable_execlists)
> > > +		return 0;
> > 
> > I've hoped we don't need this with the legacy ringbuffer cmd submission fully
> > split out. If there are other paths (resume, gpu reset) where this comes into
> > play then I guess we need to look at where the best place is to make this call.
> > So until this comes with a bit a better justification I'll punt on this for now.
> > -Daniel
> Yes, the command submission lrc path doesn't call this but other codepaths
> do.  If we keep the check in context_enable() the only remaining call I see is
> in i915_gpu_idle().  I don't mind if the check is done there but perhaps a
> WARN_ON should then be added into switch_context() because we don't
> want to be putting illegal MI_SET_CONTEXT commands into the ring.

Yeah, that sounds like a plan. There's still a bit of confusion around the
ctx reset in Alistairs patch but unrelated to the code bits we've
discussed here. So when you resend this revised patch can you please merge
Alistair's patch into your baseline to avoid conflicts.

Thanks, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 41/43] drm/i915/bdw: Enable Logical Ring Contexts (hence, Execlists)
  2014-07-24 16:04 ` [PATCH 41/43] drm/i915/bdw: Enable Logical Ring Contexts (hence, Execlists) Thomas Daniel
@ 2014-08-18  8:33   ` Jani Nikula
  2014-08-18 14:52     ` Daniel, Thomas
  0 siblings, 1 reply; 137+ messages in thread
From: Jani Nikula @ 2014-08-18  8:33 UTC (permalink / raw)
  To: Thomas Daniel, intel-gfx

On Thu, 24 Jul 2014, Thomas Daniel <thomas.daniel@intel.com> wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
>
> The time has come, the Walrus said, to talk of many things.

FYI this causes https://bugs.freedesktop.org/show_bug.cgi?id=82740

> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index b7cf0ec..1ce51d6 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2061,7 +2061,7 @@ struct drm_i915_cmd_table {
>  #define I915_NEED_GFX_HWS(dev)	(INTEL_INFO(dev)->need_gfx_hws)
>  
>  #define HAS_HW_CONTEXTS(dev)	(INTEL_INFO(dev)->gen >= 6)
> -#define HAS_LOGICAL_RING_CONTEXTS(dev)	0
> +#define HAS_LOGICAL_RING_CONTEXTS(dev)	(INTEL_INFO(dev)->gen >= 8)
>  #define HAS_ALIASING_PPGTT(dev)	(INTEL_INFO(dev)->gen >= 6)
>  #define HAS_PPGTT(dev)		(INTEL_INFO(dev)->gen >= 7 && !IS_GEN8(dev))
>  #define USES_PPGTT(dev)		intel_enable_ppgtt(dev, false)
> -- 
> 1.7.9.5
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Jani Nikula, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 41/43] drm/i915/bdw: Enable Logical Ring Contexts (hence, Execlists)
  2014-08-18  8:33   ` Jani Nikula
@ 2014-08-18 14:52     ` Daniel, Thomas
  0 siblings, 0 replies; 137+ messages in thread
From: Daniel, Thomas @ 2014-08-18 14:52 UTC (permalink / raw)
  To: Jani Nikula, intel-gfx



> -----Original Message-----
> From: Jani Nikula [mailto:jani.nikula@linux.intel.com]
> Sent: Monday, August 18, 2014 9:33 AM
> To: Daniel, Thomas; intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH 41/43] drm/i915/bdw: Enable Logical Ring
> Contexts (hence, Execlists)
> 
> On Thu, 24 Jul 2014, Thomas Daniel <thomas.daniel@intel.com> wrote:
> > From: Oscar Mateo <oscar.mateo@intel.com>
> >
> > The time has come, the Walrus said, to talk of many things.
> 
> FYI this causes https://bugs.freedesktop.org/show_bug.cgi?id=82740
Hmm, this seems to be an interaction between execlists and PPGTT
patches, specifically:
http://patchwork.freedesktop.org/patch/31150/
which breaks aliasing PPGTT when execlists are enabled, resulting
in this oops on boot.

I will post a patch which avoids this but I don't know if the intention
is to drop aliasing PPGTT support when execlists are enabled.

Thomas.

> > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_drv.h |    2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h
> > b/drivers/gpu/drm/i915/i915_drv.h index b7cf0ec..1ce51d6 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -2061,7 +2061,7 @@ struct drm_i915_cmd_table {
> >  #define I915_NEED_GFX_HWS(dev)	(INTEL_INFO(dev)->need_gfx_hws)
> >
> >  #define HAS_HW_CONTEXTS(dev)	(INTEL_INFO(dev)->gen >= 6)
> > -#define HAS_LOGICAL_RING_CONTEXTS(dev)	0
> > +#define HAS_LOGICAL_RING_CONTEXTS(dev)	(INTEL_INFO(dev)->gen >= 8)
> >  #define HAS_ALIASING_PPGTT(dev)	(INTEL_INFO(dev)->gen >= 6)
> >  #define HAS_PPGTT(dev)		(INTEL_INFO(dev)->gen >= 7 && !IS_GEN8(dev))
> >  #define USES_PPGTT(dev)		intel_enable_ppgtt(dev, false)
> > --
> > 1.7.9.5
> >
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
> 
> --
> Jani Nikula, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH] drm/i915/bdw: Render moot context reset and switch with Execlists
  2014-07-24 16:04 ` [PATCH 11/43] drm/i915/bdw: Render moot context reset and switch with Execlists Thomas Daniel
  2014-08-11 14:30   ` Daniel Vetter
@ 2014-08-20 15:29   ` Thomas Daniel
  2014-08-20 15:36     ` Chris Wilson
  1 sibling, 1 reply; 137+ messages in thread
From: Thomas Daniel @ 2014-08-20 15:29 UTC (permalink / raw)
  To: intel-gfx

These two functions make no sense in a Logical Ring Context & Execlists
world.

v2: We got rid of lrc_enabled and centralized everything in the sanitized
i915.enable_execlists instead.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>

v3: Rebased.  Corrected a typo in comment for i915_switch_context and
added a comment that it should not be called in execlist mode. Added
WARN_ON if i915_switch_context is called in execlist mode. Moved check
for execlist mode out of i915_switch_context and into callers. Added
comment in context_reset explaining why nothing is done in execlist
mode.

Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c         |    8 +++++---
 drivers/gpu/drm/i915/i915_gem_context.c |   16 +++++++++++++++-
 2 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index cb9310b..954a5f9 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2981,9 +2981,11 @@ int i915_gpu_idle(struct drm_device *dev)
 
 	/* Flush everything onto the inactive list. */
 	for_each_ring(ring, dev_priv, i) {
-		ret = i915_switch_context(ring, ring->default_context);
-		if (ret)
-			return ret;
+		if (!i915.enable_execlists) {
+			ret = i915_switch_context(ring, ring->default_context);
+			if (ret)
+				return ret;
+		}
 
 		ret = intel_ring_idle(ring);
 		if (ret)
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 0fdb357..3face51 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -289,6 +289,12 @@ void i915_gem_context_reset(struct drm_device *dev)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	int i;
 
+	/* In execlists mode we will unreference the context when the execlist
+	 * queue is cleared and the requests destroyed.
+	 */
+	if (i915.enable_execlists)
+		return;
+
 	for (i = 0; i < I915_NUM_RINGS; i++) {
 		struct intel_engine_cs *ring = &dev_priv->ring[i];
 		struct intel_context *lctx = ring->last_context;
@@ -397,6 +403,9 @@ int i915_gem_context_enable(struct drm_i915_private *dev_priv)
 
 	BUG_ON(!dev_priv->ring[RCS].default_context);
 
+	if (i915.enable_execlists)
+		return 0;
+
 	for_each_ring(ring, dev_priv, i) {
 		ret = i915_switch_context(ring, ring->default_context);
 		if (ret)
@@ -637,14 +646,19 @@ unpin_out:
  *
  * The context life cycle is simple. The context refcount is incremented and
  * decremented by 1 and create and destroy. If the context is in use by the GPU,
- * it will have a refoucnt > 1. This allows us to destroy the context abstract
+ * it will have a refcount > 1. This allows us to destroy the context abstract
  * object while letting the normal object tracking destroy the backing BO.
+ *
+ * This function should not be used in execlists mode.  Instead the context is
+ * switched by writing to the ELSP and requests keep a reference to their
+ * context.
  */
 int i915_switch_context(struct intel_engine_cs *ring,
 			struct intel_context *to)
 {
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
 
+	WARN_ON(i915.enable_execlists);
 	WARN_ON(!mutex_is_locked(&dev_priv->dev->struct_mutex));
 
 	if (to->legacy_hw_ctx.rcs_state == NULL) { /* We have the fake context */
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 137+ messages in thread

* Re: [PATCH] drm/i915/bdw: Render moot context reset and switch with Execlists
  2014-08-20 15:29   ` [PATCH] " Thomas Daniel
@ 2014-08-20 15:36     ` Chris Wilson
  2014-08-25 20:39       ` Daniel Vetter
  0 siblings, 1 reply; 137+ messages in thread
From: Chris Wilson @ 2014-08-20 15:36 UTC (permalink / raw)
  To: Thomas Daniel; +Cc: intel-gfx

On Wed, Aug 20, 2014 at 04:29:24PM +0100, Thomas Daniel wrote:
> These two functions make no sense in a Logical Ring Context & Execlists
> world.
> 
> v2: We got rid of lrc_enabled and centralized everything in the sanitized
> i915.enable_execlists instead.
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> 
> v3: Rebased.  Corrected a typo in comment for i915_switch_context and
> added a comment that it should not be called in execlist mode. Added
> WARN_ON if i915_switch_context is called in execlist mode. Moved check
> for execlist mode out of i915_switch_context and into callers. Added
> comment in context_reset explaining why nothing is done in execlist
> mode.

No, this is not the way. The requirement is to reduce the number of
special cases, not increase them. These should be evaluated to be no-ops
when execlists is used.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 27/43] drm/i915/bdw: Render state init for Execlists
  2014-08-14 20:00         ` Daniel Vetter
  2014-08-15  8:43           ` Daniel, Thomas
@ 2014-08-20 15:55           ` Daniel, Thomas
  2014-08-25 20:55             ` Daniel Vetter
  1 sibling, 1 reply; 137+ messages in thread
From: Daniel, Thomas @ 2014-08-20 15:55 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx



> -----Original Message-----
> From: Daniel, Thomas
> Sent: Friday, August 15, 2014 9:44 AM
> To: 'Daniel Vetter'
> Cc: intel-gfx@lists.freedesktop.org
> Subject: RE: [Intel-gfx] [PATCH 27/43] drm/i915/bdw: Render state init for
> Execlists
> 
> 
> 
> > -----Original Message-----
> > From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of
> > Daniel Vetter
> > Sent: Thursday, August 14, 2014 9:00 PM
> > To: Daniel, Thomas
> > Cc: Daniel Vetter; intel-gfx@lists.freedesktop.org
> > Subject: Re: [Intel-gfx] [PATCH 27/43] drm/i915/bdw: Render state init
> > for Execlists
> >
> > On Wed, Aug 13, 2014 at 05:30:07PM +0200, Daniel Vetter wrote:
> > > On Wed, Aug 13, 2014 at 03:07:29PM +0000, Daniel, Thomas wrote:
> > > > > -----Original Message-----
> > > > > From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of
> > > > > Daniel Vetter
> > > > > Sent: Monday, August 11, 2014 10:25 PM
> > > > > To: Daniel, Thomas
> > > > > Cc: intel-gfx@lists.freedesktop.org
> > > > > Subject: Re: [Intel-gfx] [PATCH 27/43] drm/i915/bdw: Render
> > > > > state init for Execlists
> > > > >
> > > > > On Thu, Jul 24, 2014 at 05:04:35PM +0100, Thomas Daniel wrote:
> > > > > > From: Oscar Mateo <oscar.mateo@intel.com>
> > > > > > index 9085ff1..0dc6992 100644
> > > > > > --- a/drivers/gpu/drm/i915/i915_gem_context.c
> > > > > > +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> > > > > > @@ -513,8 +513,23 @@ int i915_gem_context_enable(struct
> > > > > drm_i915_private *dev_priv)
> > > > > >  		ppgtt->enable(ppgtt);
> > > > > >  	}
> > > > > >
> > > > > > -	if (i915.enable_execlists)
> > > > > > +	if (i915.enable_execlists) {
> > > > > > +		struct intel_context *dctx;
> > > > > > +
> > > > > > +		ring = &dev_priv->ring[RCS];
> > > > > > +		dctx = ring->default_context;
> > > > > > +
> > > > > > +		if (!dctx->rcs_initialized) {
> > > > > > +			ret = intel_lr_context_render_state_init(ring,
> > dctx);
> > > > > > +			if (ret) {
> > > > > > +				DRM_ERROR("Init render state failed:
> > %d\n",
> > > > > ret);
> > > > > > +				return ret;
> > > > > > +			}
> > > > > > +			dctx->rcs_initialized = true;
> > > > > > +		}
> > > > > > +
> > > > > >  		return 0;
> > > > > > +	}
> > > > >
> > > > > This looks very much like the wrong place. We should init the
> > > > > render state when we create the context, or when we switch to it
> > > > > for
> > the first time.
> > > > > The later is what the legacy contexts currently do in do_switch.
> > > > >
> > > > > But ctx_enable should do the switch to the default context and
> > > > > that's about
> > > > Well, a side-effect of switching to the default context in legacy
> > > > mode is that the render state gets initialized.  I could move the
> > > > lr render state init call into an enable_execlists branch in
> > > > i915_switch_context() but that doesn't seem like the right place.
> > > >
> > > > How about in i915_gem_init() after calling i915_gem_init_hw()?
> > > >
> > > > > it. If there's some depency then I guess we should stall the
> > > > > creation of the default context a bit, maybe.
> > > > >
> > > > > In any case someone needs to explain this better and if there's
> > > > > no other way this at least needs a proper comment. So I'll punt for now.
> > > > When the default context is created the driver is not ready to
> > > > execute a batch.  That is why the render state init can't be done then.
> > >
> > > That sounds like the default context is created too early.
> > > Essentially I want to avoid needless divergence between the default
> > > context and normal contexts, because sooner or later that will mean
> > > someone will creep in with a _really_ subtle bug.
> > >
> > > What about:
> > > - We create the default lrc contexs in context_init, but like with a
> > >   normal context we don't do any of the deferred setup.
> > > - In context_enable (which since yesterday properly propagates errors to
> > >   callers) we force the deferred lrc ctx setup for the default contexts on
> > >   all engines.
> > > - The render state init is done as part of the deferred ctx setup for the
> > >   render engine in all cases.
> > >
> > > Totally off the track or do you see a workable solution somewhere in
> > > that direction?
> >
> > I'd like to discuss this first a bit more, so will punt on this patch for now.
> > -Daniel
> I think that your proposal will work.  I've been having some trouble with my
> RVP board so haven't had a chance to test it out yet.
> 
> Thomas.
I've now tried this out and I don't think it can work without introducing
more problems than the original patch.  Trouble is that in lrc mode the
Hardware Status Page is offset 0 from the context.  All contexts use the
default context's HWSP for writing seqnos, this is stored in
ring->status_page.  We can't populate this until the deferred creation of
the default context is done, so we can't execute any instructions in the
deferred creation (unless we check for default context in the deferred
creation which is what we wanted to avoid in the first place).

Thomas.

> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > +41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH] drm/i915/bdw: Render state init for Execlists
  2014-07-24 16:04 ` [PATCH 27/43] drm/i915/bdw: Render state init for Execlists Thomas Daniel
  2014-08-11 21:25   ` Daniel Vetter
@ 2014-08-21 10:40   ` Thomas Daniel
  2014-08-28  9:40     ` Daniel Vetter
  1 sibling, 1 reply; 137+ messages in thread
From: Thomas Daniel @ 2014-08-21 10:40 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

The batchbuffer that sets the render context state is submitted
in a different way, and from different places.

We needed to make both the render state preparation and free functions
outside accesible, and namespace accordingly. This mess is so that all
LR, LRC and Execlists functionality can go together in intel_lrc.c: we
can fix all of this later on, once the interfaces are clear.

v2: Create a separate ctx->rcs_initialized for the Execlists case, as
suggested by Chris Wilson.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>

v3: Setup ring status page in lr_context_deferred_create when the
default context is being created. This means that the render state
init for the default context is no longer a special case.  Execute
deferred creation of the default context at the end of
logical_ring_init to allow the render state commands to be submitted.
Fix style errors reported by checkpatch. Rebased.

Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h              |    4 +-
 drivers/gpu/drm/i915/i915_gem_render_state.c |   40 ++++++++------
 drivers/gpu/drm/i915/i915_gem_render_state.h |   47 +++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.c             |   73 ++++++++++++++++++++------
 drivers/gpu/drm/i915/intel_lrc.h             |    2 +
 drivers/gpu/drm/i915/intel_renderstate.h     |    8 +--
 6 files changed, 135 insertions(+), 39 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_gem_render_state.h

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index e449f81..f416e341 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -37,6 +37,7 @@
 #include "intel_ringbuffer.h"
 #include "intel_lrc.h"
 #include "i915_gem_gtt.h"
+#include "i915_gem_render_state.h"
 #include <linux/io-mapping.h>
 #include <linux/i2c.h>
 #include <linux/i2c-algo-bit.h>
@@ -635,6 +636,7 @@ struct intel_context {
 	} legacy_hw_ctx;
 
 	/* Execlists */
+	bool rcs_initialized;
 	struct {
 		struct drm_i915_gem_object *state;
 		struct intel_ringbuffer *ringbuf;
@@ -2596,8 +2598,6 @@ int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
 int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
 				   struct drm_file *file);
 
-/* i915_gem_render_state.c */
-int i915_gem_render_state_init(struct intel_engine_cs *ring);
 /* i915_gem_evict.c */
 int __must_check i915_gem_evict_something(struct drm_device *dev,
 					  struct i915_address_space *vm,
diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.c b/drivers/gpu/drm/i915/i915_gem_render_state.c
index e60be3f..a9a62d7 100644
--- a/drivers/gpu/drm/i915/i915_gem_render_state.c
+++ b/drivers/gpu/drm/i915/i915_gem_render_state.c
@@ -28,13 +28,6 @@
 #include "i915_drv.h"
 #include "intel_renderstate.h"
 
-struct render_state {
-	const struct intel_renderstate_rodata *rodata;
-	struct drm_i915_gem_object *obj;
-	u64 ggtt_offset;
-	int gen;
-};
-
 static const struct intel_renderstate_rodata *
 render_state_get_rodata(struct drm_device *dev, const int gen)
 {
@@ -127,30 +120,47 @@ static int render_state_setup(struct render_state *so)
 	return 0;
 }
 
-static void render_state_fini(struct render_state *so)
+void i915_gem_render_state_fini(struct render_state *so)
 {
 	i915_gem_object_ggtt_unpin(so->obj);
 	drm_gem_object_unreference(&so->obj->base);
 }
 
-int i915_gem_render_state_init(struct intel_engine_cs *ring)
+int i915_gem_render_state_prepare(struct intel_engine_cs *ring,
+				  struct render_state *so)
 {
-	struct render_state so;
 	int ret;
 
 	if (WARN_ON(ring->id != RCS))
 		return -ENOENT;
 
-	ret = render_state_init(&so, ring->dev);
+	ret = render_state_init(so, ring->dev);
 	if (ret)
 		return ret;
 
-	if (so.rodata == NULL)
+	if (so->rodata == NULL)
 		return 0;
 
-	ret = render_state_setup(&so);
+	ret = render_state_setup(so);
+	if (ret) {
+		i915_gem_render_state_fini(so);
+		return ret;
+	}
+
+	return 0;
+}
+
+int i915_gem_render_state_init(struct intel_engine_cs *ring)
+{
+	struct render_state so;
+	int ret;
+
+	ret = i915_gem_render_state_prepare(ring, &so);
 	if (ret)
-		goto out;
+		return ret;
+
+	if (so.rodata == NULL)
+		return 0;
 
 	ret = ring->dispatch_execbuffer(ring,
 					so.ggtt_offset,
@@ -164,6 +174,6 @@ int i915_gem_render_state_init(struct intel_engine_cs *ring)
 	ret = __i915_add_request(ring, NULL, so.obj, NULL);
 	/* __i915_add_request moves object to inactive if it fails */
 out:
-	render_state_fini(&so);
+	i915_gem_render_state_fini(&so);
 	return ret;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.h b/drivers/gpu/drm/i915/i915_gem_render_state.h
new file mode 100644
index 0000000..c44961e
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_gem_render_state.h
@@ -0,0 +1,47 @@
+/*
+ * Copyright © 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef _I915_GEM_RENDER_STATE_H_
+#define _I915_GEM_RENDER_STATE_H_
+
+#include <linux/types.h>
+
+struct intel_renderstate_rodata {
+	const u32 *reloc;
+	const u32 *batch;
+	const u32 batch_items;
+};
+
+struct render_state {
+	const struct intel_renderstate_rodata *rodata;
+	struct drm_i915_gem_object *obj;
+	u64 ggtt_offset;
+	int gen;
+};
+
+int i915_gem_render_state_init(struct intel_engine_cs *ring);
+void i915_gem_render_state_fini(struct render_state *so);
+int i915_gem_render_state_prepare(struct intel_engine_cs *ring,
+				  struct render_state *so);
+
+#endif /* _I915_GEM_RENDER_STATE_H_ */
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index c096b9b..8e51fd0 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1217,8 +1217,6 @@ void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
 static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *ring)
 {
 	int ret;
-	struct intel_context *dctx = ring->default_context;
-	struct drm_i915_gem_object *dctx_obj;
 
 	/* Intentionally left blank. */
 	ring->buffer = NULL;
@@ -1232,18 +1230,6 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
 	spin_lock_init(&ring->execlist_lock);
 	ring->next_context_status_buffer = 0;
 
-	ret = intel_lr_context_deferred_create(dctx, ring);
-	if (ret)
-		return ret;
-
-	/* The status page is offset 0 from the context object in LRCs. */
-	dctx_obj = dctx->engine[ring->id].state;
-	ring->status_page.gfx_addr = i915_gem_obj_ggtt_offset(dctx_obj);
-	ring->status_page.page_addr = kmap(sg_page(dctx_obj->pages->sgl));
-	if (ring->status_page.page_addr == NULL)
-		return -ENOMEM;
-	ring->status_page.obj = dctx_obj;
-
 	ret = i915_cmd_parser_init_ring(ring);
 	if (ret)
 		return ret;
@@ -1254,7 +1240,9 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
 			return ret;
 	}
 
-	return 0;
+	ret = intel_lr_context_deferred_create(ring->default_context, ring);
+
+	return ret;
 }
 
 static int logical_render_ring_init(struct drm_device *dev)
@@ -1448,6 +1436,38 @@ cleanup_render_ring:
 	return ret;
 }
 
+int intel_lr_context_render_state_init(struct intel_engine_cs *ring,
+				       struct intel_context *ctx)
+{
+	struct intel_ringbuffer *ringbuf = ctx->engine[ring->id].ringbuf;
+	struct render_state so;
+	struct drm_i915_file_private *file_priv = ctx->file_priv;
+	struct drm_file *file = file_priv ? file_priv->file : NULL;
+	int ret;
+
+	ret = i915_gem_render_state_prepare(ring, &so);
+	if (ret)
+		return ret;
+
+	if (so.rodata == NULL)
+		return 0;
+
+	ret = ring->emit_bb_start(ringbuf,
+			so.ggtt_offset,
+			I915_DISPATCH_SECURE);
+	if (ret)
+		goto out;
+
+	i915_vma_move_to_active(i915_gem_obj_to_ggtt(so.obj), ring);
+
+	ret = __i915_add_request(ring, file, so.obj, NULL);
+	/* intel_logical_ring_add_request moves object to inactive if it
+	 * fails */
+out:
+	i915_gem_render_state_fini(&so);
+	return ret;
+}
+
 static int
 populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_obj,
 		    struct intel_engine_cs *ring, struct intel_ringbuffer *ringbuf)
@@ -1687,6 +1707,29 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 	ctx->engine[ring->id].ringbuf = ringbuf;
 	ctx->engine[ring->id].state = ctx_obj;
 
+	if (ctx == ring->default_context) {
+		/* The status page is offset 0 from the default context object
+		 * in LRC mode. */
+		ring->status_page.gfx_addr = i915_gem_obj_ggtt_offset(ctx_obj);
+		ring->status_page.page_addr =
+				kmap(sg_page(ctx_obj->pages->sgl));
+		if (ring->status_page.page_addr == NULL)
+			return -ENOMEM;
+		ring->status_page.obj = ctx_obj;
+	}
+
+	if (ring->id == RCS && !ctx->rcs_initialized) {
+		ret = intel_lr_context_render_state_init(ring, ctx);
+		if (ret) {
+			DRM_ERROR("Init render state failed: %d\n", ret);
+			ctx->engine[ring->id].ringbuf = NULL;
+			ctx->engine[ring->id].state = NULL;
+			intel_destroy_ringbuffer_obj(ringbuf);
+			goto error;
+		}
+		ctx->rcs_initialized = true;
+	}
+
 	return 0;
 
 error:
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 991d449..33c3b4b 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -62,6 +62,8 @@ static inline void intel_logical_ring_emit(struct intel_ringbuffer *ringbuf,
 int intel_logical_ring_begin(struct intel_ringbuffer *ringbuf, int num_dwords);
 
 /* Logical Ring Contexts */
+int intel_lr_context_render_state_init(struct intel_engine_cs *ring,
+				       struct intel_context *ctx);
 void intel_lr_context_free(struct intel_context *ctx);
 int intel_lr_context_deferred_create(struct intel_context *ctx,
 				     struct intel_engine_cs *ring);
diff --git a/drivers/gpu/drm/i915/intel_renderstate.h b/drivers/gpu/drm/i915/intel_renderstate.h
index fd4f662..6c792d3 100644
--- a/drivers/gpu/drm/i915/intel_renderstate.h
+++ b/drivers/gpu/drm/i915/intel_renderstate.h
@@ -24,13 +24,7 @@
 #ifndef _INTEL_RENDERSTATE_H
 #define _INTEL_RENDERSTATE_H
 
-#include <linux/types.h>
-
-struct intel_renderstate_rodata {
-	const u32 *reloc;
-	const u32 *batch;
-	const u32 batch_items;
-};
+#include "i915_drv.h"
 
 extern const struct intel_renderstate_rodata gen6_null_state;
 extern const struct intel_renderstate_rodata gen7_null_state;
-- 
1.7.9.5

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


* Re: [PATCH 35/43] drm/i915/bdw: Make sure error capture keeps working with Execlists
  2014-08-15 12:14   ` Daniel Vetter
@ 2014-08-21 10:57     ` Daniel, Thomas
  2014-08-25 21:00       ` Daniel Vetter
  2014-08-25 21:29       ` Daniel Vetter
  0 siblings, 2 replies; 137+ messages in thread
From: Daniel, Thomas @ 2014-08-21 10:57 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx



> -----Original Message-----
> From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel
> Vetter
> Sent: Friday, August 15, 2014 1:14 PM
> To: Daniel, Thomas
> Cc: intel-gfx@lists.freedesktop.org; Mika Kuoppala
> Subject: Re: [Intel-gfx] [PATCH 35/43] drm/i915/bdw: Make sure error
> capture keeps working with Execlists
> 
> On Thu, Jul 24, 2014 at 05:04:43PM +0100, Thomas Daniel wrote:
> > From: Oscar Mateo <oscar.mateo@intel.com>
> >
> > Since the ringbuffer does not belong per engine anymore, we have to
> > make sure that we are always recording the correct ringbuffer.
> >
> > TODO: This is only a small fix to keep basic error capture working,
> > but we need to add more information for it to be useful (e.g. dump the
> > context being executed).
> >
> > v2: Reorder how the ringbuffer is chosen to clarify the change and
> > rename the variable, both changes suggested by Chris Wilson. Also, add
> > the TODO comment to the code, as suggested by Daniel.
> >
> > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> 
> There's a bit too much stuff in-flight to fix up error capture for ppgtt.
> I think it's better to stall this patch here until that work is completed.
> Please coordinate with Mika here.
> -Daniel
Mika has now closed the Jira issue.  This patch still applies and looks
correct.  Is it OK to be merged as-is?

Thomas.

> 
> > ---
> >  drivers/gpu/drm/i915/i915_gpu_error.c |   22 ++++++++++++++++++----
> >  1 file changed, 18 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> > index 45b6191..1e38576 100644
> > --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> > +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> > @@ -874,9 +874,6 @@ static void i915_record_ring_state(struct drm_device *dev,
> >  		ering->hws = I915_READ(mmio);
> >  	}
> >
> > -	ering->cpu_ring_head = ring->buffer->head;
> > -	ering->cpu_ring_tail = ring->buffer->tail;
> > -
> >  	ering->hangcheck_score = ring->hangcheck.score;
> >  	ering->hangcheck_action = ring->hangcheck.action;
> >
> > @@ -936,6 +933,7 @@ static void i915_gem_record_rings(struct drm_device *dev,
> >
> >  	for (i = 0; i < I915_NUM_RINGS; i++) {
> >  		struct intel_engine_cs *ring = &dev_priv->ring[i];
> > +		struct intel_ringbuffer *rbuf;
> >
> >  		error->ring[i].pid = -1;
> >
> > @@ -979,8 +977,24 @@ static void i915_gem_record_rings(struct drm_device *dev,
> >  			}
> >  		}
> >
> > +		if (i915.enable_execlists) {
> > +			/* TODO: This is only a small fix to keep basic error
> > +			 * capture working, but we need to add more information
> > +			 * for it to be useful (e.g. dump the context being
> > +			 * executed).
> > +			 */
> > +			if (request)
> > +				rbuf = request->ctx->engine[ring->id].ringbuf;
> > +			else
> > +				rbuf = ring->default_context->engine[ring->id].ringbuf;
> > +		} else
> > +			rbuf = ring->buffer;
> > +
> > +		error->ring[i].cpu_ring_head = rbuf->head;
> > +		error->ring[i].cpu_ring_tail = rbuf->tail;
> > +
> >  		error->ring[i].ringbuffer =
> > -			i915_error_ggtt_object_create(dev_priv, ring->buffer->obj);
> > +			i915_error_ggtt_object_create(dev_priv, rbuf->obj);
> >
> >  		if (ring->status_page.obj)
> >  			error->ring[i].hws_page =
> > --
> > 1.7.9.5
> >
> 
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch

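The ringbuffer-selection logic of the patch discussed above can be condensed into the following standalone sketch. The types and the `capture_ringbuf` helper are made up for illustration and are not the driver's real structures; the point is only the decision tree error capture has to follow once each context, rather than each engine, owns a ringbuffer.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the driver structures touched by the patch. */
struct ringbuf { int head, tail; };
struct context { struct ringbuf *engine_ringbuf; };
struct engine {
	struct ringbuf *buffer;            /* legacy mode: one ring per engine */
	struct context *default_context;
};
struct request { struct context *ctx; };

/*
 * With execlists each context owns its own ringbuffer, so error capture
 * must record the ringbuffer of the hung request's context, falling back
 * to the default context's ringbuffer when no request is in flight,
 * instead of unconditionally using the per-engine ring.
 */
static struct ringbuf *capture_ringbuf(int enable_execlists,
				       struct engine *ring,
				       struct request *rq)
{
	if (!enable_execlists)
		return ring->buffer;
	if (rq)
		return rq->ctx->engine_ringbuf;
	return ring->default_context->engine_ringbuf;
}
```

In this model, the legacy path is unchanged, and the execlists path mirrors the `request ? request->ctx : ring->default_context` fallback from the diff.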

* Re: [PATCH] drm/i915/bdw: Render moot context reset and switch with Execlists
  2014-08-20 15:36     ` Chris Wilson
@ 2014-08-25 20:39       ` Daniel Vetter
  2014-08-25 22:01         ` Scot Doyle
  2014-08-26  5:59         ` Chris Wilson
  0 siblings, 2 replies; 137+ messages in thread
From: Daniel Vetter @ 2014-08-25 20:39 UTC (permalink / raw)
  To: Chris Wilson, Thomas Daniel, intel-gfx

On Wed, Aug 20, 2014 at 04:36:05PM +0100, Chris Wilson wrote:
> On Wed, Aug 20, 2014 at 04:29:24PM +0100, Thomas Daniel wrote:
> > > These two functions make no sense in a Logical Ring Context & Execlists
> > world.
> > 
> > v2: We got rid of lrc_enabled and centralized everything in the sanitized
> > i915.enable_execlists instead.
> > 
> > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> > 
> > v3: Rebased.  Corrected a typo in comment for i915_switch_context and
> > added a comment that it should not be called in execlist mode. Added
> > WARN_ON if i915_switch_context is called in execlist mode. Moved check
> > for execlist mode out of i915_switch_context and into callers. Added
> > comment in context_reset explaining why nothing is done in execlist
> > mode.
> 
> No, this is not the way. The requirement is to reduce the number of
> special cases not increase them. These should be evaluated to be no-ops
> when execlists is used.

I think it's ok-ish for now. Maybe we need to reconsider when we wire up
lrc reclaim - which is the real user of the switch_context in gpu_idle.
The problem I have though is that I can't parse the subject of the patch,
someone please translate that to simplified English for me. I can do the
replacement while applying.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: [PATCH 27/43] drm/i915/bdw: Render state init for Execlists
  2014-08-20 15:55           ` Daniel, Thomas
@ 2014-08-25 20:55             ` Daniel Vetter
  0 siblings, 0 replies; 137+ messages in thread
From: Daniel Vetter @ 2014-08-25 20:55 UTC (permalink / raw)
  To: Daniel, Thomas; +Cc: intel-gfx

On Wed, Aug 20, 2014 at 03:55:42PM +0000, Daniel, Thomas wrote:
> 
> 
> > -----Original Message-----
> > From: Daniel, Thomas
> > Sent: Friday, August 15, 2014 9:44 AM
> > To: 'Daniel Vetter'
> > Cc: intel-gfx@lists.freedesktop.org
> > Subject: RE: [Intel-gfx] [PATCH 27/43] drm/i915/bdw: Render state init for
> > Execlists
> > 
> > 
> > 
> > > -----Original Message-----
> > > From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of
> > > Daniel Vetter
> > > Sent: Thursday, August 14, 2014 9:00 PM
> > > To: Daniel, Thomas
> > > Cc: Daniel Vetter; intel-gfx@lists.freedesktop.org
> > > Subject: Re: [Intel-gfx] [PATCH 27/43] drm/i915/bdw: Render state init
> > > for Execlists
> > >
> > > On Wed, Aug 13, 2014 at 05:30:07PM +0200, Daniel Vetter wrote:
> > > > On Wed, Aug 13, 2014 at 03:07:29PM +0000, Daniel, Thomas wrote:
> > > > > > -----Original Message-----
> > > > > > From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of
> > > > > > Daniel Vetter
> > > > > > Sent: Monday, August 11, 2014 10:25 PM
> > > > > > To: Daniel, Thomas
> > > > > > Cc: intel-gfx@lists.freedesktop.org
> > > > > > Subject: Re: [Intel-gfx] [PATCH 27/43] drm/i915/bdw: Render
> > > > > > state init for Execlists
> > > > > >
> > > > > > On Thu, Jul 24, 2014 at 05:04:35PM +0100, Thomas Daniel wrote:
> > > > > > > From: Oscar Mateo <oscar.mateo@intel.com>
> > > > > > > index 9085ff1..0dc6992 100644
> > > > > > > --- a/drivers/gpu/drm/i915/i915_gem_context.c
> > > > > > > +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> > > > > > > @@ -513,8 +513,23 @@ int i915_gem_context_enable(struct drm_i915_private *dev_priv)
> > > > > > >  		ppgtt->enable(ppgtt);
> > > > > > >  	}
> > > > > > >
> > > > > > > -	if (i915.enable_execlists)
> > > > > > > +	if (i915.enable_execlists) {
> > > > > > > +		struct intel_context *dctx;
> > > > > > > +
> > > > > > > +		ring = &dev_priv->ring[RCS];
> > > > > > > +		dctx = ring->default_context;
> > > > > > > +
> > > > > > > +		if (!dctx->rcs_initialized) {
> > > > > > > +			ret = intel_lr_context_render_state_init(ring, dctx);
> > > > > > > +			if (ret) {
> > > > > > > +				DRM_ERROR("Init render state failed: %d\n", ret);
> > > > > > > +				return ret;
> > > > > > > +			}
> > > > > > > +			dctx->rcs_initialized = true;
> > > > > > > +		}
> > > > > > > +
> > > > > > >  		return 0;
> > > > > > > +	}
> > > > > >
> > > > > > This looks very much like the wrong place. We should init the
> > > > > > render state when we create the context, or when we switch to it
> > > > > > for
> > > the first time.
> > > > > > The latter is what the legacy contexts currently do in do_switch.
> > > > > >
> > > > > > But ctx_enable should do the switch to the default context and
> > > > > > that's about
> > > > > Well, a side-effect of switching to the default context in legacy
> > > > > mode is that the render state gets initialized.  I could move the
> > > > > lr render state init call into an enable_execlists branch in
> > > > > i915_switch_context() but that doesn't seem like the right place.
> > > > >
> > > > > How about in i915_gem_init() after calling i915_gem_init_hw()?
> > > > >
> > > > > > it. If there's some dependency then I guess we should stall the
> > > > > > creation of the default context a bit, maybe.
> > > > > >
> > > > > > In any case someone needs to explain this better and if there's
> > > > > > no other way this at least needs a big comment. So I'll punt for now.
> > > > > When the default context is created the driver is not ready to
> > > > > execute a batch.  That is why the render state init can't be done then.
> > > >
> > > > That sounds like the default context is created too early.
> > > > Essentially I want to avoid needless divergence between the default
> > > > context and normal contexts, because sooner or later that will mean
> > > > someone will creep in with a _really_ subtle bug.
> > > >
> > > > What about:
> > > > - We create the default lrc contexs in context_init, but like with a
> > > >   normal context we don't do any of the deferred setup.
> > > > - In context_enable (which since yesterday properly propagates errors to
> > > >   callers) we force the deferred lrc ctx setup for the default contexts on
> > > >   all engines.
> > > > - The render state init is done as part of the deferred ctx setup for the
> > > >   render engine in all cases.
> > > >
> > > > Totally off the track or do you see a workable solution somewhere in
> > > > that direction?
> > >
> > > I'd like to discuss this first a bit more, so will punt on this patch for now.
> > > -Daniel
> > I think that your proposal will work.  I've been having some trouble with my
> > RVP board so haven't had a chance to test it out yet.
> > 
> > Thomas.
> I've now tried this out and I don't think it can work without introducing
> more problems than the original patch.  Trouble is that in lrc mode the
> Hardware Status Page is offset 0 from the context.  All contexts use the
> default context's HWSP for writing seqnos, this is stored in
> ring->status_page.  We can't populate this until the deferred creation of
> the default context is done, so we can't execute any instructions in the
> deferred creation (unless we check for default context in the deferred
> creation which is what we wanted to avoid in the first place).

That just sounds like we should either allocate the HWS at engine init
time (since it's shared across all rings), or switch to a per-ring/ctx
HWS, but that's probably only something for when the scheduler shows up.

Or is there some other blocker?

Generally I really don't like to bend over backwards in the init code if
we get such initialization ordering constraints wrong - in the past that
really has been a fairly awesome source of bugs. What's needed though is a
pile of WARN_ON to check that the initialization ordering is done in the
right way, in case someone touches the code. That still means runtime
testing, but that's still a lot better than silent failures.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

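The ordering constraint Thomas describes above, and the WARN_ON-style runtime checks Daniel asks for, can be modelled with a minimal sketch. Everything below is hypothetical and deliberately simplified (the struct fields and function names are invented, not the driver's): in LRC mode the hardware status page (HWSP) lives at offset 0 of the default context's backing object and all contexts write seqnos through it, so no batch, including the golden render state, can be submitted before the default context's deferred setup has populated the engine's status page pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the init-ordering problem discussed in the thread. */
struct lrc_context {
	int state;               /* stands in for the context backing object */
	int rcs_initialized;
};

struct engine_cs {
	struct lrc_context *default_context;
	int *status_page;        /* NULL until deferred setup maps the HWSP */
};

/* Submitting the render state batch needs a working HWSP for the seqno
 * write; fail loudly if called too early (a WARN_ON in the driver). */
static int render_state_init(struct engine_cs *ring)
{
	if (ring->status_page == NULL)
		return -1;
	return 0;
}

static int lr_context_deferred_create(struct engine_cs *ring,
				      struct lrc_context *ctx)
{
	ctx->state = 1;          /* "allocate" the context object */
	if (ctx == ring->default_context)
		ring->status_page = &ctx->state;  /* HWSP = offset 0 of dctx */
	if (!ctx->rcs_initialized) {
		if (render_state_init(ring))
			return -1;
		ctx->rcs_initialized = 1;
	}
	return 0;
}
```

Calling `render_state_init` before the default context's deferred creation fails the check; after deferred creation of the default context, it succeeds for any context, which is the ordering the v3 patch enforces by setting up the status page inside deferred creation itself.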

* Re: [PATCH 35/43] drm/i915/bdw: Make sure error capture keeps working with Execlists
  2014-08-21 10:57     ` Daniel, Thomas
@ 2014-08-25 21:00       ` Daniel Vetter
  2014-08-25 21:29       ` Daniel Vetter
  1 sibling, 0 replies; 137+ messages in thread
From: Daniel Vetter @ 2014-08-25 21:00 UTC (permalink / raw)
  To: Daniel, Thomas; +Cc: intel-gfx

On Thu, Aug 21, 2014 at 10:57:09AM +0000, Daniel, Thomas wrote:
> 
> 
> > -----Original Message-----
> > From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel
> > Vetter
> > Sent: Friday, August 15, 2014 1:14 PM
> > To: Daniel, Thomas
> > Cc: intel-gfx@lists.freedesktop.org; Mika Kuoppala
> > Subject: Re: [Intel-gfx] [PATCH 35/43] drm/i915/bdw: Make sure error
> > capture keeps working with Execlists
> > 
> > On Thu, Jul 24, 2014 at 05:04:43PM +0100, Thomas Daniel wrote:
> > > From: Oscar Mateo <oscar.mateo@intel.com>
> > >
> > > Since the ringbuffer does not belong per engine anymore, we have to
> > > make sure that we are always recording the correct ringbuffer.
> > >
> > > TODO: This is only a small fix to keep basic error capture working,
> > > but we need to add more information for it to be useful (e.g. dump the
> > > context being executed).
> > >
> > > v2: Reorder how the ringbuffer is chosen to clarify the change and
> > > rename the variable, both changes suggested by Chris Wilson. Also, add
> > > the TODO comment to the code, as suggested by Daniel.
> > >
> > > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> > 
> > There's a bit too much stuff in-flight to fix up error capture for ppgtt.
> > I think it's better to stall this patch here until that work is completed.
> > Please coordinate with Mika here.
> > -Daniel
> Mika has now closed the Jira issue.  This patch still applies and looks
> correct.  Is it OK to be merged as-is?

Apparently Mika closed the Jira before I've merged all the patches. I'd
like to do that first, then come back to this patch.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: [PATCH 35/43] drm/i915/bdw: Make sure error capture keeps working with Execlists
  2014-08-21 10:57     ` Daniel, Thomas
  2014-08-25 21:00       ` Daniel Vetter
@ 2014-08-25 21:29       ` Daniel Vetter
  1 sibling, 0 replies; 137+ messages in thread
From: Daniel Vetter @ 2014-08-25 21:29 UTC (permalink / raw)
  To: Daniel, Thomas; +Cc: intel-gfx

On Thu, Aug 21, 2014 at 10:57:09AM +0000, Daniel, Thomas wrote:
> 
> 
> > -----Original Message-----
> > From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel
> > Vetter
> > Sent: Friday, August 15, 2014 1:14 PM
> > To: Daniel, Thomas
> > Cc: intel-gfx@lists.freedesktop.org; Mika Kuoppala
> > Subject: Re: [Intel-gfx] [PATCH 35/43] drm/i915/bdw: Make sure error
> > capture keeps working with Execlists
> > 
> > On Thu, Jul 24, 2014 at 05:04:43PM +0100, Thomas Daniel wrote:
> > > From: Oscar Mateo <oscar.mateo@intel.com>
> > >
> > > Since the ringbuffer does not belong per engine anymore, we have to
> > > make sure that we are always recording the correct ringbuffer.
> > >
> > > TODO: This is only a small fix to keep basic error capture working,
> > > but we need to add more information for it to be useful (e.g. dump the
> > > context being executed).
> > >
> > > v2: Reorder how the ringbuffer is chosen to clarify the change and
> > > rename the variable, both changes suggested by Chris Wilson. Also, add
> > > the TODO comment to the code, as suggested by Daniel.
> > >
> > > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> > 
> > There's a bit too much stuff in-flight to fix up error capture for ppgtt.
> > I think it's better to stall this patch here until that work is completed.
> > Please coordinate with Mika here.
> > -Daniel
> Mika has now closed the Jira issue.  This patch still applies and looks
> correct.  Is it OK to be merged as-is?

Queued for -next, thanks for the patch.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: [PATCH] drm/i915/bdw: Render moot context reset and switch with Execlists
  2014-08-25 20:39       ` Daniel Vetter
@ 2014-08-25 22:01         ` Scot Doyle
  2014-08-26  5:59         ` Chris Wilson
  1 sibling, 0 replies; 137+ messages in thread
From: Scot Doyle @ 2014-08-25 22:01 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Mon, 25 Aug 2014, Daniel Vetter wrote:
> On Wed, Aug 20, 2014 at 04:36:05PM +0100, Chris Wilson wrote:
>> On Wed, Aug 20, 2014 at 04:29:24PM +0100, Thomas Daniel wrote:
>>> These two functions make no sense in a Logical Ring Context & Execlists
>>> world.
>>>
>>> v2: We got rid of lrc_enabled and centralized everything in the sanitized
>>> i915.enable_execlists instead.
>>>
>>> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
>>>
>>> v3: Rebased.  Corrected a typo in comment for i915_switch_context and
>>> added a comment that it should not be called in execlist mode. Added
>>> WARN_ON if i915_switch_context is called in execlist mode. Moved check
>>> for execlist mode out of i915_switch_context and into callers. Added
>>> comment in context_reset explaining why nothing is done in execlist
>>> mode.
>>
>> No, this is not the way. The requirement is to reduce the number of
>> special cases not increase them. These should be evaluated to be no-ops
>> when execlists is used.
>
> I think it's ok-ish for now. Maybe we need to reconsider when we wire up
> lrc reclaim - which is the real user of the switch_context in gpu_idle.
> The problem I have though is that I can't parse the subject of the patch,
> someone please translate that to simplified English for me. I can do the
> replacement while applying.
> -Daniel

"Render moot" usually means something like "make obsolete", not sure about 
the rest.


* Re: [PATCH] drm/i915/bdw: Render moot context reset and switch with Execlists
  2014-08-25 20:39       ` Daniel Vetter
  2014-08-25 22:01         ` Scot Doyle
@ 2014-08-26  5:59         ` Chris Wilson
  2014-08-26 13:54           ` Siluvery, Arun
  1 sibling, 1 reply; 137+ messages in thread
From: Chris Wilson @ 2014-08-26  5:59 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Mon, Aug 25, 2014 at 10:39:39PM +0200, Daniel Vetter wrote:
> On Wed, Aug 20, 2014 at 04:36:05PM +0100, Chris Wilson wrote:
> > On Wed, Aug 20, 2014 at 04:29:24PM +0100, Thomas Daniel wrote:
> > > These two functions make no sense in a Logical Ring Context & Execlists
> > > world.
> > > 
> > > v2: We got rid of lrc_enabled and centralized everything in the sanitized
> > > i915.enable_execlists instead.
> > > 
> > > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> > > 
> > > v3: Rebased.  Corrected a typo in comment for i915_switch_context and
> > > added a comment that it should not be called in execlist mode. Added
> > > WARN_ON if i915_switch_context is called in execlist mode. Moved check
> > > for execlist mode out of i915_switch_context and into callers. Added
> > > comment in context_reset explaining why nothing is done in execlist
> > > mode.
> > 
> > No, this is not the way. The requirement is to reduce the number of
> > special cases not increase them. These should be evaluated to be no-ops
> > when execlists is used.
> 
> I think it's ok-ish for now. Maybe we need to reconsider when we wire up
> lrc reclaim - which is the real user of the switch_context in gpu_idle.
> The problem I have though is that I can't parse the subject of the patch,
> someone please translate that to simplified English for me. I can do the
> replacement while applying.

No, it is not. execlists is badly designed and this is a further symptom
of that.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


* Re: [PATCH] drm/i915/bdw: Render moot context reset and switch with Execlists
  2014-08-26  5:59         ` Chris Wilson
@ 2014-08-26 13:54           ` Siluvery, Arun
  2014-08-26 14:11             ` Daniel Vetter
  0 siblings, 1 reply; 137+ messages in thread
From: Siluvery, Arun @ 2014-08-26 13:54 UTC (permalink / raw)
  To: intel-gfx

On 26/08/2014 06:59, Chris Wilson wrote:
> On Mon, Aug 25, 2014 at 10:39:39PM +0200, Daniel Vetter wrote:
>> On Wed, Aug 20, 2014 at 04:36:05PM +0100, Chris Wilson wrote:
>>> On Wed, Aug 20, 2014 at 04:29:24PM +0100, Thomas Daniel wrote:
>>>> These two functions make no sense in a Logical Ring Context & Execlists
>>>> world.
>>>>
>>>> v2: We got rid of lrc_enabled and centralized everything in the sanitized
>>>> i915.enable_execlists instead.
>>>>
>>>> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
>>>>
>>>> v3: Rebased.  Corrected a typo in comment for i915_switch_context and
>>>> added a comment that it should not be called in execlist mode. Added
>>>> WARN_ON if i915_switch_context is called in execlist mode. Moved check
>>>> for execlist mode out of i915_switch_context and into callers. Added
>>>> comment in context_reset explaining why nothing is done in execlist
>>>> mode.
>>>
>>> No, this is not the way. The requirement is to reduce the number of
>>> special cases not increase them. These should be evaluated to be no-ops
>>> when execlists is used.
>>
>> I think it's ok-ish for now. Maybe we need to reconsider when we wire up
>> lrc reclaim - which is the real user of the switch_context in gpu_idle.
>> The problem I have though is that I can't parse the subject of the patch,
>> someone please translate that to simplified English for me. I can do the
>> replacement while applying.
>
> No, it is not. execlists is badly designed and this is a further symptom
> of that.
> -Chris
>
Thomas is not available and I am replying on his behalf.
Is the following subject good for this patch?

"Don't execute context reset and switch when using Execlists"

regards
Arun


* Re: [PATCH] drm/i915/bdw: Render moot context reset and switch with Execlists
  2014-08-26 13:54           ` Siluvery, Arun
@ 2014-08-26 14:11             ` Daniel Vetter
  0 siblings, 0 replies; 137+ messages in thread
From: Daniel Vetter @ 2014-08-26 14:11 UTC (permalink / raw)
  To: Siluvery, Arun; +Cc: intel-gfx

On Tue, Aug 26, 2014 at 02:54:39PM +0100, Siluvery, Arun wrote:
> On 26/08/2014 06:59, Chris Wilson wrote:
> >On Mon, Aug 25, 2014 at 10:39:39PM +0200, Daniel Vetter wrote:
> >>On Wed, Aug 20, 2014 at 04:36:05PM +0100, Chris Wilson wrote:
> >>>On Wed, Aug 20, 2014 at 04:29:24PM +0100, Thomas Daniel wrote:
> >>>>These two functions make no sense in a Logical Ring Context & Execlists
> >>>>world.
> >>>>
> >>>>v2: We got rid of lrc_enabled and centralized everything in the sanitized
> >>>>i915.enable_execlists instead.
> >>>>
> >>>>Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> >>>>
> >>>>v3: Rebased.  Corrected a typo in comment for i915_switch_context and
> >>>>added a comment that it should not be called in execlist mode. Added
> >>>>WARN_ON if i915_switch_context is called in execlist mode. Moved check
> >>>>for execlist mode out of i915_switch_context and into callers. Added
> >>>>comment in context_reset explaining why nothing is done in execlist
> >>>>mode.
> >>>
> >>>No, this is not the way. The requirement is to reduce the number of
> >>>special cases not increase them. These should be evaluated to be no-ops
> >>>when execlists is used.
> >>
> >>I think it's ok-ish for now. Maybe we need to reconsider when we wire up
> >>lrc reclaim - which is the real user of the switch_context in gpu_idle.
> >>The problem I have though is that I can't parse the subject of the patch,
> >>someone please translate that to simplified English for me. I can do the
> >>replacement while applying.
> >
> >No, it is not. execlists is badly designed and this is a further symptom
> >of that.
> >-Chris
> >
> Thomas is not available and I am replying on his behalf.
> Is the following subject is good for this patch?
> 
> "Don't execute context reset and switch when using Execlists"

Yeah, I think that's more parseable by mere mortals^W^W non-native
speakers. Pulled in.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: [PATCH] drm/i915/bdw: Render state init for Execlists
  2014-08-21 10:40   ` [PATCH] " Thomas Daniel
@ 2014-08-28  9:40     ` Daniel Vetter
  0 siblings, 0 replies; 137+ messages in thread
From: Daniel Vetter @ 2014-08-28  9:40 UTC (permalink / raw)
  To: Thomas Daniel; +Cc: intel-gfx

On Thu, Aug 21, 2014 at 11:40:54AM +0100, Thomas Daniel wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> The batchbuffer that sets the render context state is submitted
> in a different way, and from different places.
> 
> We needed to make both the render state preparation and free functions
> outside accessible, and namespace accordingly. This mess is so that all
> LR, LRC and Execlists functionality can go together in intel_lrc.c: we
> can fix all of this later on, once the interfaces are clear.
> 
> v2: Create a separate ctx->rcs_initialized for the Execlists case, as
> suggested by Chris Wilson.
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> 
> v3: Setup ring status page in lr_context_deferred_create when the
> default context is being created. This means that the render state
> init for the default context is no longer a special case.  Execute
> deferred creation of the default context at the end of
> logical_ring_init to allow the render state commands to be submitted.
> Fix style errors reported by checkpatch. Rebased.
> 
> Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>

Queued for -next, thanks for the patch.
-Daniel
> ---
>  drivers/gpu/drm/i915/i915_drv.h              |    4 +-
>  drivers/gpu/drm/i915/i915_gem_render_state.c |   40 ++++++++------
>  drivers/gpu/drm/i915/i915_gem_render_state.h |   47 +++++++++++++++++
>  drivers/gpu/drm/i915/intel_lrc.c             |   73 ++++++++++++++++++++------
>  drivers/gpu/drm/i915/intel_lrc.h             |    2 +
>  drivers/gpu/drm/i915/intel_renderstate.h     |    8 +--
>  6 files changed, 135 insertions(+), 39 deletions(-)
>  create mode 100644 drivers/gpu/drm/i915/i915_gem_render_state.h
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index e449f81..f416e341 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -37,6 +37,7 @@
>  #include "intel_ringbuffer.h"
>  #include "intel_lrc.h"
>  #include "i915_gem_gtt.h"
> +#include "i915_gem_render_state.h"
>  #include <linux/io-mapping.h>
>  #include <linux/i2c.h>
>  #include <linux/i2c-algo-bit.h>
> @@ -635,6 +636,7 @@ struct intel_context {
>  	} legacy_hw_ctx;
>  
>  	/* Execlists */
> +	bool rcs_initialized;
>  	struct {
>  		struct drm_i915_gem_object *state;
>  		struct intel_ringbuffer *ringbuf;
> @@ -2596,8 +2598,6 @@ int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
>  int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
>  				   struct drm_file *file);
>  
> -/* i915_gem_render_state.c */
> -int i915_gem_render_state_init(struct intel_engine_cs *ring);
>  /* i915_gem_evict.c */
>  int __must_check i915_gem_evict_something(struct drm_device *dev,
>  					  struct i915_address_space *vm,
> diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.c b/drivers/gpu/drm/i915/i915_gem_render_state.c
> index e60be3f..a9a62d7 100644
> --- a/drivers/gpu/drm/i915/i915_gem_render_state.c
> +++ b/drivers/gpu/drm/i915/i915_gem_render_state.c
> @@ -28,13 +28,6 @@
>  #include "i915_drv.h"
>  #include "intel_renderstate.h"
>  
> -struct render_state {
> -	const struct intel_renderstate_rodata *rodata;
> -	struct drm_i915_gem_object *obj;
> -	u64 ggtt_offset;
> -	int gen;
> -};
> -
>  static const struct intel_renderstate_rodata *
>  render_state_get_rodata(struct drm_device *dev, const int gen)
>  {
> @@ -127,30 +120,47 @@ static int render_state_setup(struct render_state *so)
>  	return 0;
>  }
>  
> -static void render_state_fini(struct render_state *so)
> +void i915_gem_render_state_fini(struct render_state *so)
>  {
>  	i915_gem_object_ggtt_unpin(so->obj);
>  	drm_gem_object_unreference(&so->obj->base);
>  }
>  
> -int i915_gem_render_state_init(struct intel_engine_cs *ring)
> +int i915_gem_render_state_prepare(struct intel_engine_cs *ring,
> +				  struct render_state *so)
>  {
> -	struct render_state so;
>  	int ret;
>  
>  	if (WARN_ON(ring->id != RCS))
>  		return -ENOENT;
>  
> -	ret = render_state_init(&so, ring->dev);
> +	ret = render_state_init(so, ring->dev);
>  	if (ret)
>  		return ret;
>  
> -	if (so.rodata == NULL)
> +	if (so->rodata == NULL)
>  		return 0;
>  
> -	ret = render_state_setup(&so);
> +	ret = render_state_setup(so);
> +	if (ret) {
> +		i915_gem_render_state_fini(so);
> +		return ret;
> +	}
> +
> +	return 0;
> +}
> +
> +int i915_gem_render_state_init(struct intel_engine_cs *ring)
> +{
> +	struct render_state so;
> +	int ret;
> +
> +	ret = i915_gem_render_state_prepare(ring, &so);
>  	if (ret)
> -		goto out;
> +		return ret;
> +
> +	if (so.rodata == NULL)
> +		return 0;
>  
>  	ret = ring->dispatch_execbuffer(ring,
>  					so.ggtt_offset,
> @@ -164,6 +174,6 @@ int i915_gem_render_state_init(struct intel_engine_cs *ring)
>  	ret = __i915_add_request(ring, NULL, so.obj, NULL);
>  	/* __i915_add_request moves object to inactive if it fails */
>  out:
> -	render_state_fini(&so);
> +	i915_gem_render_state_fini(&so);
>  	return ret;
>  }
> diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.h b/drivers/gpu/drm/i915/i915_gem_render_state.h
> new file mode 100644
> index 0000000..c44961e
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/i915_gem_render_state.h
> @@ -0,0 +1,47 @@
> +/*
> + * Copyright © 2014 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> + * DEALINGS IN THE SOFTWARE.
> + */
> +
> +#ifndef _I915_GEM_RENDER_STATE_H_
> +#define _I915_GEM_RENDER_STATE_H_
> +
> +#include <linux/types.h>
> +
> +struct intel_renderstate_rodata {
> +	const u32 *reloc;
> +	const u32 *batch;
> +	const u32 batch_items;
> +};
> +
> +struct render_state {
> +	const struct intel_renderstate_rodata *rodata;
> +	struct drm_i915_gem_object *obj;
> +	u64 ggtt_offset;
> +	int gen;
> +};
> +
> +int i915_gem_render_state_init(struct intel_engine_cs *ring);
> +void i915_gem_render_state_fini(struct render_state *so);
> +int i915_gem_render_state_prepare(struct intel_engine_cs *ring,
> +				  struct render_state *so);
> +
> +#endif /* _I915_GEM_RENDER_STATE_H_ */
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index c096b9b..8e51fd0 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1217,8 +1217,6 @@ void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
>  static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *ring)
>  {
>  	int ret;
> -	struct intel_context *dctx = ring->default_context;
> -	struct drm_i915_gem_object *dctx_obj;
>  
>  	/* Intentionally left blank. */
>  	ring->buffer = NULL;
> @@ -1232,18 +1230,6 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
>  	spin_lock_init(&ring->execlist_lock);
>  	ring->next_context_status_buffer = 0;
>  
> -	ret = intel_lr_context_deferred_create(dctx, ring);
> -	if (ret)
> -		return ret;
> -
> -	/* The status page is offset 0 from the context object in LRCs. */
> -	dctx_obj = dctx->engine[ring->id].state;
> -	ring->status_page.gfx_addr = i915_gem_obj_ggtt_offset(dctx_obj);
> -	ring->status_page.page_addr = kmap(sg_page(dctx_obj->pages->sgl));
> -	if (ring->status_page.page_addr == NULL)
> -		return -ENOMEM;
> -	ring->status_page.obj = dctx_obj;
> -
>  	ret = i915_cmd_parser_init_ring(ring);
>  	if (ret)
>  		return ret;
> @@ -1254,7 +1240,9 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
>  			return ret;
>  	}
>  
> -	return 0;
> +	ret = intel_lr_context_deferred_create(ring->default_context, ring);
> +
> +	return ret;
>  }
>  
>  static int logical_render_ring_init(struct drm_device *dev)
> @@ -1448,6 +1436,38 @@ cleanup_render_ring:
>  	return ret;
>  }
>  
> +int intel_lr_context_render_state_init(struct intel_engine_cs *ring,
> +				       struct intel_context *ctx)
> +{
> +	struct intel_ringbuffer *ringbuf = ctx->engine[ring->id].ringbuf;
> +	struct render_state so;
> +	struct drm_i915_file_private *file_priv = ctx->file_priv;
> +	struct drm_file *file = file_priv ? file_priv->file : NULL;
> +	int ret;
> +
> +	ret = i915_gem_render_state_prepare(ring, &so);
> +	if (ret)
> +		return ret;
> +
> +	if (so.rodata == NULL)
> +		return 0;
> +
> +	ret = ring->emit_bb_start(ringbuf,
> +			so.ggtt_offset,
> +			I915_DISPATCH_SECURE);
> +	if (ret)
> +		goto out;
> +
> +	i915_vma_move_to_active(i915_gem_obj_to_ggtt(so.obj), ring);
> +
> +	ret = __i915_add_request(ring, file, so.obj, NULL);
> +	/* __i915_add_request moves object to inactive if it fails */
> +out:
> +	i915_gem_render_state_fini(&so);
> +	return ret;
> +}
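
The control flow above is worth spelling out: i915_gem_render_state_fini() runs on both the success path and the dispatch-failure path, but not when the prepare step fails (prepare cleans up after itself before returning). A minimal stand-alone sketch of that pattern, with stubbed types and illustrative names rather than the driver's real API:

```c
#include <assert.h>

/* Hypothetical stand-ins for the driver objects and calls above. */
struct state { int prepared; int submitted; int finished; };

static int prepare(struct state *so)  { so->prepared = 1; return 0; }
static int dispatch(struct state *so) { so->submitted = 1; return 0; }
static void fini(struct state *so)    { so->finished = 1; }

/* Mirrors the shape of intel_lr_context_render_state_init():
 * fini() is reached via the out: label whether dispatch succeeded
 * or failed, but never when prepare() itself returned an error. */
static int render_state_flow(struct state *so)
{
	int ret = prepare(so);
	if (ret)
		return ret;	/* prepare cleaned up internally */

	ret = dispatch(so);
	/* fall through to cleanup in both the success and error case */
	fini(so);
	return ret;
}
```
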
> +
>  static int
>  populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_obj,
>  		    struct intel_engine_cs *ring, struct intel_ringbuffer *ringbuf)
> @@ -1687,6 +1707,29 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
>  	ctx->engine[ring->id].ringbuf = ringbuf;
>  	ctx->engine[ring->id].state = ctx_obj;
>  
> +	if (ctx == ring->default_context) {
> +		/* The status page is offset 0 from the default context object
> +		 * in LRC mode. */
> +		ring->status_page.gfx_addr = i915_gem_obj_ggtt_offset(ctx_obj);
> +		ring->status_page.page_addr =
> +				kmap(sg_page(ctx_obj->pages->sgl));
> +		if (ring->status_page.page_addr == NULL)
> +			return -ENOMEM;
> +		ring->status_page.obj = ctx_obj;
> +	}
> +
> +	if (ring->id == RCS && !ctx->rcs_initialized) {
> +		ret = intel_lr_context_render_state_init(ring, ctx);
> +		if (ret) {
> +			DRM_ERROR("Init render state failed: %d\n", ret);
> +			ctx->engine[ring->id].ringbuf = NULL;
> +			ctx->engine[ring->id].state = NULL;
> +			intel_destroy_ringbuffer_obj(ringbuf);
> +			goto error;
> +		}
> +		ctx->rcs_initialized = true;
> +	}
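
This hunk is a once-only lazy init guarded by a flag, where the flag is set only after the init succeeded, so a failed attempt can be retried on a later deferred-create call. A self-contained sketch of that guard (illustrative names, not the real driver structures):

```c
#include <assert.h>

/* Minimal model of the ctx->rcs_initialized guard above. */
struct ctx { int rcs_initialized; int init_calls; };

static int render_state_init(struct ctx *c)
{
	c->init_calls++;
	return 0;	/* pretend the golden-state submission succeeded */
}

static int deferred_create(struct ctx *c, int is_render_ring)
{
	/* Only the render ring carries golden state, and only once. */
	if (is_render_ring && !c->rcs_initialized) {
		int ret = render_state_init(c);
		if (ret)
			return ret;	/* flag stays unset: retry later */
		c->rcs_initialized = 1;
	}
	return 0;
}
```
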
> +
>  	return 0;
>  
>  error:
> diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
> index 991d449..33c3b4b 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.h
> +++ b/drivers/gpu/drm/i915/intel_lrc.h
> @@ -62,6 +62,8 @@ static inline void intel_logical_ring_emit(struct intel_ringbuffer *ringbuf,
>  int intel_logical_ring_begin(struct intel_ringbuffer *ringbuf, int num_dwords);
>  
>  /* Logical Ring Contexts */
> +int intel_lr_context_render_state_init(struct intel_engine_cs *ring,
> +				       struct intel_context *ctx);
>  void intel_lr_context_free(struct intel_context *ctx);
>  int intel_lr_context_deferred_create(struct intel_context *ctx,
>  				     struct intel_engine_cs *ring);
> diff --git a/drivers/gpu/drm/i915/intel_renderstate.h b/drivers/gpu/drm/i915/intel_renderstate.h
> index fd4f662..6c792d3 100644
> --- a/drivers/gpu/drm/i915/intel_renderstate.h
> +++ b/drivers/gpu/drm/i915/intel_renderstate.h
> @@ -24,13 +24,7 @@
>  #ifndef _INTEL_RENDERSTATE_H
>  #define _INTEL_RENDERSTATE_H
>  
> -#include <linux/types.h>
> -
> -struct intel_renderstate_rodata {
> -	const u32 *reloc;
> -	const u32 *batch;
> -	const u32 batch_items;
> -};
> +#include "i915_drv.h"
>  
>  extern const struct intel_renderstate_rodata gen6_null_state;
>  extern const struct intel_renderstate_rodata gen7_null_state;
> -- 
> 1.7.9.5
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


Thread overview: 137+ messages
2014-07-24 16:04 [PATCH 00/43] Execlists v5 Thomas Daniel
2014-07-24 16:04 ` [PATCH 01/43] drm/i915: Reorder the actual workload submission so that args checking is done earlier Thomas Daniel
2014-07-25  8:30   ` Daniel Vetter
2014-07-25  9:16     ` Chris Wilson
2014-07-24 16:04 ` [PATCH 02/43] drm/i915/bdw: New source and header file for LRs, LRCs and Execlists Thomas Daniel
2014-07-24 16:04 ` [PATCH 03/43] drm/i915/bdw: Macro for LRCs and module option for Execlists Thomas Daniel
2014-08-11 13:57   ` Daniel Vetter
2014-07-24 16:04 ` [PATCH 04/43] drm/i915/bdw: Initialization for Logical Ring Contexts Thomas Daniel
2014-08-11 14:03   ` Daniel Vetter
2014-07-24 16:04 ` [PATCH 05/43] drm/i915/bdw: Introduce one context backing object per engine Thomas Daniel
2014-08-11 13:59   ` [PATCH] drm/i915: WARN if module opt sanitization goes out of order Daniel Vetter
2014-08-11 14:28     ` Damien Lespiau
2014-07-24 16:04 ` [PATCH 06/43] drm/i915/bdw: A bit more advanced LR context alloc/free Thomas Daniel
2014-07-24 16:04 ` [PATCH 07/43] drm/i915/bdw: Allocate ringbuffers for Logical Ring Contexts Thomas Daniel
2014-07-24 16:04 ` [PATCH 08/43] drm/i915/bdw: Add a context and an engine pointers to the ringbuffer Thomas Daniel
2014-08-11 14:14   ` Daniel Vetter
2014-08-11 14:20     ` Daniel Vetter
2014-08-13 13:34       ` Daniel, Thomas
2014-08-13 15:16         ` Daniel Vetter
2014-08-14 15:09           ` Daniel, Thomas
2014-08-14 15:32             ` Daniel Vetter
2014-08-14 15:37               ` Daniel Vetter
2014-08-14 15:56                 ` Daniel, Thomas
2014-08-14 16:19                   ` Daniel Vetter
2014-08-14 16:27                     ` [PATCH] drm/i915: Add temporary ring->ctx backpointer Daniel Vetter
2014-08-14 16:33                       ` Daniel, Thomas
2014-07-24 16:04 ` [PATCH 09/43] drm/i915/bdw: Populate LR contexts (somewhat) Thomas Daniel
2014-07-24 16:04 ` [PATCH 10/43] drm/i915/bdw: Deferred creation of user-created LRCs Thomas Daniel
2014-08-11 14:25   ` Daniel Vetter
2014-07-24 16:04 ` [PATCH 11/43] drm/i915/bdw: Render moot context reset and switch with Execlists Thomas Daniel
2014-08-11 14:30   ` Daniel Vetter
2014-08-15 10:22     ` Daniel, Thomas
2014-08-15 15:39       ` Daniel Vetter
2014-08-20 15:29   ` [PATCH] " Thomas Daniel
2014-08-20 15:36     ` Chris Wilson
2014-08-25 20:39       ` Daniel Vetter
2014-08-25 22:01         ` Scot Doyle
2014-08-26  5:59         ` Chris Wilson
2014-08-26 13:54           ` Siluvery, Arun
2014-08-26 14:11             ` Daniel Vetter
2014-07-24 16:04 ` [PATCH 12/43] drm/i915/bdw: Don't write PDP in the legacy way when using LRCs Thomas Daniel
2014-08-01 13:46   ` Damien Lespiau
2014-08-07 12:17   ` Thomas Daniel
2014-08-08 15:59     ` Damien Lespiau
2014-08-11 14:32     ` Daniel Vetter
2014-08-15 11:01     ` [PATCH] " Thomas Daniel
2014-07-24 16:04 ` [PATCH 13/43] drm/i915: Abstract the legacy workload submission mechanism away Thomas Daniel
2014-08-11 14:36   ` Daniel Vetter
2014-08-11 14:39     ` Daniel Vetter
2014-08-11 14:39   ` Daniel Vetter
2014-08-11 15:02   ` Daniel Vetter
2014-07-24 16:04 ` [PATCH 14/43] drm/i915/bdw: Skeleton for the new logical rings submission path Thomas Daniel
2014-07-24 16:04 ` [PATCH 15/43] drm/i915/bdw: Generic logical ring init and cleanup Thomas Daniel
2014-08-11 15:01   ` Daniel Vetter
2014-07-24 16:04 ` [PATCH 16/43] drm/i915/bdw: GEN-specific logical ring init Thomas Daniel
2014-08-11 15:04   ` Daniel Vetter
2014-07-24 16:04 ` [PATCH 17/43] drm/i915/bdw: GEN-specific logical ring set/get seqno Thomas Daniel
2014-08-11 15:05   ` Daniel Vetter
2014-07-24 16:04 ` [PATCH 18/43] drm/i915/bdw: New logical ring submission mechanism Thomas Daniel
2014-08-11 20:40   ` Daniel Vetter
2014-07-24 16:04 ` [PATCH 19/43] drm/i915/bdw: GEN-specific logical ring emit request Thomas Daniel
2014-07-24 16:04 ` [PATCH 20/43] drm/i915/bdw: GEN-specific logical ring emit flush Thomas Daniel
2014-07-24 16:04 ` [PATCH 21/43] drm/i915/bdw: Emission of requests with logical rings Thomas Daniel
2014-08-11 20:56   ` Daniel Vetter
2014-08-13 13:34     ` Daniel, Thomas
2014-08-13 15:25       ` Daniel Vetter
2014-07-24 16:04 ` [PATCH 22/43] drm/i915/bdw: Ring idle and stop " Thomas Daniel
2014-07-24 16:04 ` [PATCH 23/43] drm/i915/bdw: Interrupts " Thomas Daniel
2014-08-11 21:02   ` Daniel Vetter
2014-08-11 21:08   ` Daniel Vetter
2014-07-24 16:04 ` [PATCH 24/43] drm/i915/bdw: GEN-specific logical ring emit batchbuffer start Thomas Daniel
2014-08-11 21:09   ` Daniel Vetter
2014-08-11 21:12     ` Daniel Vetter
2014-07-24 16:04 ` [PATCH 25/43] drm/i915/bdw: Workload submission mechanism for Execlists Thomas Daniel
2014-08-11 20:30   ` Daniel Vetter
2014-07-24 16:04 ` [PATCH 26/43] drm/i915/bdw: Always use MMIO flips with Execlists Thomas Daniel
2014-08-11 20:34   ` Daniel Vetter
2014-07-24 16:04 ` [PATCH 27/43] drm/i915/bdw: Render state init for Execlists Thomas Daniel
2014-08-11 21:25   ` Daniel Vetter
2014-08-13 15:07     ` Daniel, Thomas
2014-08-13 15:30       ` Daniel Vetter
2014-08-14 20:00         ` Daniel Vetter
2014-08-15  8:43           ` Daniel, Thomas
2014-08-20 15:55           ` Daniel, Thomas
2014-08-25 20:55             ` Daniel Vetter
2014-08-21 10:40   ` [PATCH] " Thomas Daniel
2014-08-28  9:40     ` Daniel Vetter
2014-07-24 16:04 ` [PATCH 28/43] drm/i915/bdw: Implement context switching (somewhat) Thomas Daniel
2014-08-11 21:29   ` Daniel Vetter
2014-07-24 16:04 ` [PATCH 29/43] drm/i915/bdw: Write the tail pointer, LRC style Thomas Daniel
2014-08-01 14:33   ` Damien Lespiau
2014-08-11 21:30   ` Daniel Vetter
2014-07-24 16:04 ` [PATCH 30/43] drm/i915/bdw: Two-stage execlist submit process Thomas Daniel
2014-08-14 20:05   ` Daniel Vetter
2014-08-14 20:10   ` Daniel Vetter
2014-08-15  8:51     ` Daniel, Thomas
2014-08-15  9:38       ` Daniel Vetter
2014-07-24 16:04 ` [PATCH 31/43] drm/i915/bdw: Handle context switch events Thomas Daniel
2014-08-14 20:13   ` Daniel Vetter
2014-08-14 20:17   ` Daniel Vetter
2014-08-14 20:28   ` Daniel Vetter
2014-08-14 20:37   ` Daniel Vetter
2014-07-24 16:04 ` [PATCH 32/43] drm/i915/bdw: Avoid non-lite-restore preemptions Thomas Daniel
2014-08-14 20:31   ` Daniel Vetter
2014-07-24 16:04 ` [PATCH 33/43] drm/i915/bdw: Help out the ctx switch interrupt handler Thomas Daniel
2014-08-14 20:43   ` Daniel Vetter
2014-07-24 16:04 ` [PATCH 34/43] drm/i915/bdw: Make sure gpu reset still works with Execlists Thomas Daniel
2014-08-01 14:42   ` Damien Lespiau
2014-08-06  9:26     ` Daniel, Thomas
2014-08-01 14:46   ` Damien Lespiau
2014-08-06  9:28     ` Daniel, Thomas
2014-07-24 16:04 ` [PATCH 35/43] drm/i915/bdw: Make sure error capture keeps working " Thomas Daniel
2014-08-15 12:14   ` Daniel Vetter
2014-08-21 10:57     ` Daniel, Thomas
2014-08-25 21:00       ` Daniel Vetter
2014-08-25 21:29       ` Daniel Vetter
2014-07-24 16:04 ` [PATCH 36/43] drm/i915/bdw: Disable semaphores for Execlists Thomas Daniel
2014-07-24 16:04 ` [PATCH 37/43] drm/i915/bdw: Display execlists info in debugfs Thomas Daniel
2014-08-01 14:54   ` Damien Lespiau
2014-08-07 12:23   ` Thomas Daniel
2014-08-08 16:02     ` Damien Lespiau
2014-07-24 16:04 ` [PATCH 38/43] drm/i915/bdw: Display context backing obj & ringbuffer " Thomas Daniel
2014-07-24 16:04 ` [PATCH 39/43] drm/i915/bdw: Print context state " Thomas Daniel
2014-08-01 15:54   ` Damien Lespiau
2014-08-07 12:24   ` Thomas Daniel
2014-08-08 15:57     ` Damien Lespiau
2014-07-24 16:04 ` [PATCH 40/43] drm/i915/bdw: Document Logical Rings, LR contexts and Execlists Thomas Daniel
2014-08-15 12:42   ` Daniel Vetter
2014-07-24 16:04 ` [PATCH 41/43] drm/i915/bdw: Enable Logical Ring Contexts (hence, Execlists) Thomas Daniel
2014-08-18  8:33   ` Jani Nikula
2014-08-18 14:52     ` Daniel, Thomas
2014-07-24 16:04 ` [PATCH 42/43] drm/i915/bdw: Pin the context backing objects to GGTT on-demand Thomas Daniel
2014-08-15 13:03   ` Daniel Vetter
2014-07-24 16:04 ` [PATCH 43/43] drm/i915/bdw: Pin the ringbuffer backing object " Thomas Daniel
2014-07-25  8:35 ` [PATCH 00/43] Execlists v5 Daniel Vetter
2014-08-01 16:09 ` Damien Lespiau
2014-08-01 16:29   ` Jesse Barnes
