* [PATCH 00/53] Execlists v3
@ 2014-06-13 15:37 oscar.mateo
  2014-06-13 15:37 ` [PATCH 01/53] drm/i915: Extract context backing object allocation oscar.mateo
                   ` (53 more replies)
  0 siblings, 54 replies; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:37 UTC
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

For a description of this patchset, please check the previous cover letters: [1] and [2].

The main difference with v2 is in how we get to the point of context submission: this time
around, instead of massaging the legacy ringbuffer submission functions (mostly located in
intel_ringbuffer.c), I have effectively created a separate path for Execlists submission
in intel_lrc.c (even though everybody knows you shouldn't mess with split timelines). The
alternative path is mostly a clone of the previous one, but the idea is that it will differ
significantly in the future (so, in exchange for duplicated code, we gain the ability to
perform big changes without breaking legacy hardware support). This change was a suggestion
by Daniel Vetter [3].

I know many patches here will be very controversial, so I would appreciate early feedback
on the direction this effort is taking.

The previous IGT test [4] still applies.

[1]
http://lists.freedesktop.org/archives/intel-gfx/2014-March/042563.html
[2]
http://lists.freedesktop.org/archives/intel-gfx/2014-May/044847.html
[3]
http://lists.freedesktop.org/archives/intel-gfx/2014-May/045139.html
[4]
http://lists.freedesktop.org/archives/intel-gfx/2014-May/044846.html

Ben Widawsky (2):
  drm/i915/bdw: Implement context switching (somewhat)
  drm/i915/bdw: Print context state in debugfs

Michel Thierry (1):
  drm/i915/bdw: Two-stage execlist submit process

Oscar Mateo (48):
  drm/i915: Extract context backing object allocation
  drm/i915: Rename ctx->obj to ctx->render_obj
  drm/i915: Add a dev pointer to the context
  drm/i915: Extract ringbuffer destroy & make alloc outside accessible
  drm/i915: Move i915_gem_validate_context() to i915_gem_context.c
  drm/i915/bdw: Introduce one context backing object per engine
  drm/i915/bdw: New file for Logical Ring Contexts and Execlists
  drm/i915/bdw: Macro for LRCs and module option for Execlists
  drm/i915/bdw: Initialization for Logical Ring Contexts
  drm/i915/bdw: A bit more advanced context init/fini
  drm/i915/bdw: Allocate ringbuffers for Logical Ring Contexts
  drm/i915/bdw: Populate LR contexts (somewhat)
  drm/i915/bdw: Deferred creation of user-created LRCs
  drm/i915/bdw: Render moot context reset and switch when LRCs are
    enabled
  drm/i915/bdw: Don't write PDP in the legacy way when using LRCs
  drm/i915/bdw: Skeleton for the new logical rings submission path
  drm/i915/bdw: Generic logical ring init and cleanup
  drm/i915/bdw: New header file for LRs, LRCs and Execlists
  drm/i915: Extract pipe control fini & make init outside accessible
  drm/i915/bdw: GEN-specific logical ring init
  drm/i915/bdw: GEN-specific logical ring set/get seqno
  drm/i915: Make ring_space more generic and outside accessible
  drm/i915: Generalize intel_ring_get_tail
  drm/i915: Make intel_ring_stopped outside accessible
  drm/i915/bdw: GEN-specific logical ring submit context (somewhat)
  drm/i915/bdw: New logical ring submission mechanism
  drm/i915/bdw: GEN-specific logical ring emit request
  drm/i915/bdw: GEN-specific logical ring emit flush
  drm/i915/bdw: Emission of requests with logical rings
  drm/i915/bdw: Ring idle and stop with logical rings
  drm/i915/bdw: Interrupts with logical rings
  drm/i915/bdw: GEN-specific logical ring emit batchbuffer start
  drm/i915: Extract the actual workload submission mechanism from
    execbuffer
  drm/i915: Make move_to_active and retire_commands outside accesible
  drm/i915/bdw: Workload submission mechanism for Execlists
  drm/i915: Abstract the workload submission mechanism away
  drm/i915/bdw: Write the tail pointer, LRC style
  drm/i915/bdw: Avoid non-lite-restore preemptions
  drm/i915/bdw: Make sure gpu reset still works with Execlists
  drm/i915/bdw: Make sure error capture keeps working with Execlists
  drm/i915/bdw: Help out the ctx switch interrupt handler
  drm/i915/bdw: Do not call intel_runtime_pm_get() in an interrupt
  drm/i915/bdw: Display execlists info in debugfs
  drm/i915/bdw: Display context backing obj & ringbuffer info in debugfs
  drm/i915: Extract render state preparation
  drm/i915/bdw: Render state init for Execlists
  drm/i915/bdw: Document Logical Rings, LR contexts and Execlists
  drm/i915/bdw: Enable logical ring contexts

Sourab Gupta (1):
  !UPSTREAM: drm/i915: Use MMIO flips

Thomas Daniel (1):
  drm/i915/bdw: Handle context switch events

 drivers/gpu/drm/i915/Makefile                |    1 +
 drivers/gpu/drm/i915/i915_debugfs.c          |  150 +-
 drivers/gpu/drm/i915/i915_dma.c              |    1 +
 drivers/gpu/drm/i915/i915_drv.h              |   60 +-
 drivers/gpu/drm/i915/i915_gem.c              |   70 +-
 drivers/gpu/drm/i915/i915_gem_context.c      |  242 +++-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c   |  328 ++---
 drivers/gpu/drm/i915/i915_gem_gtt.c          |    5 +
 drivers/gpu/drm/i915/i915_gem_render_state.c |   39 +-
 drivers/gpu/drm/i915/i915_gpu_error.c        |   16 +-
 drivers/gpu/drm/i915/i915_irq.c              |   53 +-
 drivers/gpu/drm/i915/i915_params.c           |   11 +
 drivers/gpu/drm/i915/i915_reg.h              |    5 +
 drivers/gpu/drm/i915/intel_display.c         |  148 +-
 drivers/gpu/drm/i915/intel_drv.h             |    6 +
 drivers/gpu/drm/i915/intel_lrc.c             | 1902 ++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.h             |   99 ++
 drivers/gpu/drm/i915/intel_renderstate.h     |   13 +
 drivers/gpu/drm/i915/intel_ringbuffer.c      |  101 +-
 drivers/gpu/drm/i915/intel_ringbuffer.h      |   53 +-
 20 files changed, 2974 insertions(+), 329 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/intel_lrc.c
 create mode 100644 drivers/gpu/drm/i915/intel_lrc.h

-- 
1.9.0


* [PATCH 01/53] drm/i915: Extract context backing object allocation
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
@ 2014-06-13 15:37 ` oscar.mateo
  2014-06-13 15:37 ` [PATCH 02/53] drm/i915: Rename ctx->obj to ctx->render_obj oscar.mateo
                   ` (52 subsequent siblings)
  53 siblings, 0 replies; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:37 UTC
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

We are going to use it later to allocate our own context objects.

No functional changes.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h         |  2 ++
 drivers/gpu/drm/i915/i915_gem_context.c | 54 +++++++++++++++++++++------------
 2 files changed, 37 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 506386e..24f084d 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2386,6 +2386,8 @@ int i915_switch_context(struct intel_engine_cs *ring,
 			struct intel_context *to);
 struct intel_context *
 i915_gem_context_get(struct drm_i915_file_private *file_priv, u32 id);
+struct drm_i915_gem_object *
+i915_gem_alloc_context_obj(struct drm_device *dev, size_t size);
 void i915_gem_context_free(struct kref *ctx_ref);
 static inline void i915_gem_context_reference(struct intel_context *ctx)
 {
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 3ffe308..4efa5ca 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -198,6 +198,36 @@ void i915_gem_context_free(struct kref *ctx_ref)
 	kfree(ctx);
 }
 
+struct drm_i915_gem_object *
+i915_gem_alloc_context_obj(struct drm_device *dev, size_t size)
+{
+	struct drm_i915_gem_object *obj;
+	int ret;
+
+	obj = i915_gem_alloc_object(dev, size);
+	if (obj == NULL)
+		return ERR_PTR(-ENOMEM);
+
+	/*
+	 * Try to make the context utilize L3 as well as LLC.
+	 *
+	 * On VLV we don't have L3 controls in the PTEs so we
+	 * shouldn't touch the cache level, especially as that
+	 * would make the object snooped which might have a
+	 * negative performance impact.
+	 */
+	if (INTEL_INFO(dev)->gen >= 7 && !IS_VALLEYVIEW(dev)) {
+		ret = i915_gem_object_set_cache_level(obj, I915_CACHE_L3_LLC);
+		/* Failure shouldn't ever happen this early */
+		if (WARN_ON(ret)) {
+			drm_gem_object_unreference(&obj->base);
+			return ERR_PTR(ret);
+		}
+	}
+
+	return obj;
+}
+
 static struct i915_hw_ppgtt *
 create_vm_for_ctx(struct drm_device *dev, struct intel_context *ctx)
 {
@@ -234,27 +264,13 @@ __create_hw_context(struct drm_device *dev,
 	list_add_tail(&ctx->link, &dev_priv->context_list);
 
 	if (dev_priv->hw_context_size) {
-		ctx->obj = i915_gem_alloc_object(dev, dev_priv->hw_context_size);
-		if (ctx->obj == NULL) {
-			ret = -ENOMEM;
+		struct drm_i915_gem_object *obj =
+				i915_gem_alloc_context_obj(dev, dev_priv->hw_context_size);
+		if (IS_ERR(obj)) {
+			ret = PTR_ERR(obj);
 			goto err_out;
 		}
-
-		/*
-		 * Try to make the context utilize L3 as well as LLC.
-		 *
-		 * On VLV we don't have L3 controls in the PTEs so we
-		 * shouldn't touch the cache level, especially as that
-		 * would make the object snooped which might have a
-		 * negative performance impact.
-		 */
-		if (INTEL_INFO(dev)->gen >= 7 && !IS_VALLEYVIEW(dev)) {
-			ret = i915_gem_object_set_cache_level(ctx->obj,
-							      I915_CACHE_L3_LLC);
-			/* Failure shouldn't ever happen this early */
-			if (WARN_ON(ret))
-				goto err_out;
-		}
+		ctx->obj = obj;
 	}
 
 	/* Default context will never have a file_priv */
-- 
1.9.0


* [PATCH 02/53] drm/i915: Rename ctx->obj to ctx->render_obj
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
  2014-06-13 15:37 ` [PATCH 01/53] drm/i915: Extract context backing object allocation oscar.mateo
@ 2014-06-13 15:37 ` oscar.mateo
  2014-06-13 17:00   ` Daniel Vetter
  2014-06-13 17:15   ` Chris Wilson
  2014-06-13 15:37 ` [PATCH 03/53] drm/i915: Add a dev pointer to the context oscar.mateo
                   ` (51 subsequent siblings)
  53 siblings, 2 replies; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:37 UTC
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

The reason for doing this will be better explained in the following
patch. For now, suffice it to say that this backing object is only
used with the render ring, so we're making this fact more explicit.

Done with the following Coccinelle patch (plus manual renaming of the
struct field):

	@@
	struct intel_context c;
	@@
	- (c).obj
	+ c.render_obj

	@@
	struct intel_context *c;
	@@
	- (c)->obj
	+ c->render_obj
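
For the record, a semantic patch like this is typically applied with spatch.
The invocation below is only illustrative (the .cocci file name is made up
for the example):

	spatch --sp-file rename_ctx_obj.cocci --in-place --dir drivers/gpu/drm/i915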

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c     |  4 +--
 drivers/gpu/drm/i915/i915_drv.h         |  2 +-
 drivers/gpu/drm/i915/i915_gem_context.c | 63 +++++++++++++++++----------------
 3 files changed, 35 insertions(+), 34 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 7b83297..b09cab4 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1735,7 +1735,7 @@ static int i915_context_status(struct seq_file *m, void *unused)
 	}
 
 	list_for_each_entry(ctx, &dev_priv->context_list, link) {
-		if (ctx->obj == NULL)
+		if (ctx->render_obj == NULL)
 			continue;
 
 		seq_puts(m, "HW context ");
@@ -1744,7 +1744,7 @@ static int i915_context_status(struct seq_file *m, void *unused)
 			if (ring->default_context == ctx)
 				seq_printf(m, "(default context %s) ", ring->name);
 
-		describe_obj(m, ctx->obj);
+		describe_obj(m, ctx->render_obj);
 		seq_putc(m, '\n');
 	}
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 24f084d..1cebbd4 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -592,7 +592,7 @@ struct intel_context {
 	uint8_t remap_slice;
 	struct drm_i915_file_private *file_priv;
 	struct intel_engine_cs *last_ring;
-	struct drm_i915_gem_object *obj;
+	struct drm_i915_gem_object *render_obj;
 	struct i915_ctx_hang_stats hang_stats;
 	struct i915_address_space *vm;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 4efa5ca..f27886a 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -182,14 +182,14 @@ void i915_gem_context_free(struct kref *ctx_ref)
 						   typeof(*ctx), ref);
 	struct i915_hw_ppgtt *ppgtt = NULL;
 
-	if (ctx->obj) {
+	if (ctx->render_obj) {
 		/* We refcount even the aliasing PPGTT to keep the code symmetric */
-		if (USES_PPGTT(ctx->obj->base.dev))
+		if (USES_PPGTT(ctx->render_obj->base.dev))
 			ppgtt = ctx_to_ppgtt(ctx);
 
 		/* XXX: Free up the object before tearing down the address space, in
 		 * case we're bound in the PPGTT */
-		drm_gem_object_unreference(&ctx->obj->base);
+		drm_gem_object_unreference(&ctx->render_obj->base);
 	}
 
 	if (ppgtt)
@@ -270,7 +270,7 @@ __create_hw_context(struct drm_device *dev,
 			ret = PTR_ERR(obj);
 			goto err_out;
 		}
-		ctx->obj = obj;
+		ctx->render_obj = obj;
 	}
 
 	/* Default context will never have a file_priv */
@@ -317,7 +317,7 @@ i915_gem_create_context(struct drm_device *dev,
 	if (IS_ERR(ctx))
 		return ctx;
 
-	if (is_global_default_ctx && ctx->obj) {
+	if (is_global_default_ctx && ctx->render_obj) {
 		/* We may need to do things with the shrinker which
 		 * require us to immediately switch back to the default
 		 * context. This can cause a problem as pinning the
@@ -325,7 +325,7 @@ i915_gem_create_context(struct drm_device *dev,
 		 * be available. To avoid this we always pin the default
 		 * context.
 		 */
-		ret = i915_gem_obj_ggtt_pin(ctx->obj,
+		ret = i915_gem_obj_ggtt_pin(ctx->render_obj,
 					    get_context_alignment(dev), 0);
 		if (ret) {
 			DRM_DEBUG_DRIVER("Couldn't pin %d\n", ret);
@@ -365,8 +365,8 @@ i915_gem_create_context(struct drm_device *dev,
 	return ctx;
 
 err_unpin:
-	if (is_global_default_ctx && ctx->obj)
-		i915_gem_object_ggtt_unpin(ctx->obj);
+	if (is_global_default_ctx && ctx->render_obj)
+		i915_gem_object_ggtt_unpin(ctx->render_obj);
 err_destroy:
 	i915_gem_context_unreference(ctx);
 	return ERR_PTR(ret);
@@ -390,12 +390,12 @@ void i915_gem_context_reset(struct drm_device *dev)
 		if (!ring->last_context)
 			continue;
 
-		if (dctx->obj && i == RCS) {
-			WARN_ON(i915_gem_obj_ggtt_pin(dctx->obj,
+		if (dctx->render_obj && i == RCS) {
+			WARN_ON(i915_gem_obj_ggtt_pin(dctx->render_obj,
 						      get_context_alignment(dev), 0));
 			/* Fake a finish/inactive */
-			dctx->obj->base.write_domain = 0;
-			dctx->obj->active = 0;
+			dctx->render_obj->base.write_domain = 0;
+			dctx->render_obj->active = 0;
 		}
 
 		i915_gem_context_unreference(ring->last_context);
@@ -445,7 +445,7 @@ void i915_gem_context_fini(struct drm_device *dev)
 	struct intel_context *dctx = dev_priv->ring[RCS].default_context;
 	int i;
 
-	if (dctx->obj) {
+	if (dctx->render_obj) {
 		/* The only known way to stop the gpu from accessing the hw context is
 		 * to reset it. Do this as the very last operation to avoid confusing
 		 * other code, leading to spurious errors. */
@@ -460,13 +460,13 @@ void i915_gem_context_fini(struct drm_device *dev)
 		WARN_ON(!dev_priv->ring[RCS].last_context);
 		if (dev_priv->ring[RCS].last_context == dctx) {
 			/* Fake switch to NULL context */
-			WARN_ON(dctx->obj->active);
-			i915_gem_object_ggtt_unpin(dctx->obj);
+			WARN_ON(dctx->render_obj->active);
+			i915_gem_object_ggtt_unpin(dctx->render_obj);
 			i915_gem_context_unreference(dctx);
 			dev_priv->ring[RCS].last_context = NULL;
 		}
 
-		i915_gem_object_ggtt_unpin(dctx->obj);
+		i915_gem_object_ggtt_unpin(dctx->render_obj);
 	}
 
 	for (i = 0; i < I915_NUM_RINGS; i++) {
@@ -586,7 +586,7 @@ mi_set_context(struct intel_engine_cs *ring,
 
 	intel_ring_emit(ring, MI_NOOP);
 	intel_ring_emit(ring, MI_SET_CONTEXT);
-	intel_ring_emit(ring, i915_gem_obj_ggtt_offset(new_context->obj) |
+	intel_ring_emit(ring, i915_gem_obj_ggtt_offset(new_context->render_obj) |
 			MI_MM_SPACE_GTT |
 			MI_SAVE_EXT_STATE_EN |
 			MI_RESTORE_EXT_STATE_EN |
@@ -617,8 +617,8 @@ static int do_switch(struct intel_engine_cs *ring,
 	int ret, i;
 
 	if (from != NULL && ring == &dev_priv->ring[RCS]) {
-		BUG_ON(from->obj == NULL);
-		BUG_ON(!i915_gem_obj_is_pinned(from->obj));
+		BUG_ON(from->render_obj == NULL);
+		BUG_ON(!i915_gem_obj_is_pinned(from->render_obj));
 	}
 
 	if (from == to && from->last_ring == ring && !to->remap_slice)
@@ -626,7 +626,7 @@ static int do_switch(struct intel_engine_cs *ring,
 
 	/* Trying to pin first makes error handling easier. */
 	if (ring == &dev_priv->ring[RCS]) {
-		ret = i915_gem_obj_ggtt_pin(to->obj,
+		ret = i915_gem_obj_ggtt_pin(to->render_obj,
 					    get_context_alignment(ring->dev), 0);
 		if (ret)
 			return ret;
@@ -659,14 +659,14 @@ static int do_switch(struct intel_engine_cs *ring,
 	 *
 	 * XXX: We need a real interface to do this instead of trickery.
 	 */
-	ret = i915_gem_object_set_to_gtt_domain(to->obj, false);
+	ret = i915_gem_object_set_to_gtt_domain(to->render_obj, false);
 	if (ret)
 		goto unpin_out;
 
-	if (!to->obj->has_global_gtt_mapping) {
-		struct i915_vma *vma = i915_gem_obj_to_vma(to->obj,
+	if (!to->render_obj->has_global_gtt_mapping) {
+		struct i915_vma *vma = i915_gem_obj_to_vma(to->render_obj,
 							   &dev_priv->gtt.base);
-		vma->bind_vma(vma, to->obj->cache_level, GLOBAL_BIND);
+		vma->bind_vma(vma, to->render_obj->cache_level, GLOBAL_BIND);
 	}
 
 	if (!to->is_initialized || i915_gem_context_is_default(to))
@@ -695,8 +695,9 @@ static int do_switch(struct intel_engine_cs *ring,
 	 * MI_SET_CONTEXT instead of when the next seqno has completed.
 	 */
 	if (from != NULL) {
-		from->obj->base.read_domains = I915_GEM_DOMAIN_INSTRUCTION;
-		i915_vma_move_to_active(i915_gem_obj_to_ggtt(from->obj), ring);
+		from->render_obj->base.read_domains = I915_GEM_DOMAIN_INSTRUCTION;
+		i915_vma_move_to_active(i915_gem_obj_to_ggtt(from->render_obj),
+					ring);
 		/* As long as MI_SET_CONTEXT is serializing, ie. it flushes the
 		 * whole damn pipeline, we don't need to explicitly mark the
 		 * object dirty. The only exception is that the context must be
@@ -704,11 +705,11 @@ static int do_switch(struct intel_engine_cs *ring,
 		 * able to defer doing this until we know the object would be
 		 * swapped, but there is no way to do that yet.
 		 */
-		from->obj->dirty = 1;
-		BUG_ON(from->obj->ring != ring);
+		from->render_obj->dirty = 1;
+		BUG_ON(from->render_obj->ring != ring);
 
 		/* obj is kept alive until the next request by its active ref */
-		i915_gem_object_ggtt_unpin(from->obj);
+		i915_gem_object_ggtt_unpin(from->render_obj);
 		i915_gem_context_unreference(from);
 	}
 
@@ -729,7 +730,7 @@ done:
 
 unpin_out:
 	if (ring->id == RCS)
-		i915_gem_object_ggtt_unpin(to->obj);
+		i915_gem_object_ggtt_unpin(to->render_obj);
 	return ret;
 }
 
@@ -750,7 +751,7 @@ int i915_switch_context(struct intel_engine_cs *ring,
 
 	WARN_ON(!mutex_is_locked(&dev_priv->dev->struct_mutex));
 
-	if (to->obj == NULL) { /* We have the fake context */
+	if (to->render_obj == NULL) { /* We have the fake context */
 		if (to != ring->last_context) {
 			i915_gem_context_reference(to);
 			if (ring->last_context)
-- 
1.9.0


* [PATCH 03/53] drm/i915: Add a dev pointer to the context
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
  2014-06-13 15:37 ` [PATCH 01/53] drm/i915: Extract context backing object allocation oscar.mateo
  2014-06-13 15:37 ` [PATCH 02/53] drm/i915: Rename ctx->obj to ctx->render_obj oscar.mateo
@ 2014-06-13 15:37 ` oscar.mateo
  2014-06-13 15:37 ` [PATCH 04/53] drm/i915: Extract ringbuffer destroy & make alloc outside accessible oscar.mateo
                   ` (50 subsequent siblings)
  53 siblings, 0 replies; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:37 UTC
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Without this, i915_gem_context_free looks obfuscated. But, also, it
lets me know which kind of context I am dealing with at freeing time
(at this point we only have fake and legacy hw contexts, but soon we
will have logical ring contexts as well).

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h         |  1 +
 drivers/gpu/drm/i915/i915_gem_context.c | 21 +++++++++++----------
 2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 1cebbd4..ec7e352 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -586,6 +586,7 @@ struct i915_ctx_hang_stats {
 /* This must match up with the value previously used for execbuf2.rsvd1. */
 #define DEFAULT_CONTEXT_ID 0
 struct intel_context {
+	struct drm_device *dev;
 	struct kref ref;
 	int id;
 	bool is_initialized;
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index f27886a..f6c2538 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -178,22 +178,21 @@ static int get_context_size(struct drm_device *dev)
 
 void i915_gem_context_free(struct kref *ctx_ref)
 {
-	struct intel_context *ctx = container_of(ctx_ref,
-						   typeof(*ctx), ref);
-	struct i915_hw_ppgtt *ppgtt = NULL;
+	struct intel_context *ctx = container_of(ctx_ref, typeof(*ctx), ref);
+	struct drm_device *dev = ctx->dev;
 
 	if (ctx->render_obj) {
-		/* We refcount even the aliasing PPGTT to keep the code symmetric */
-		if (USES_PPGTT(ctx->render_obj->base.dev))
-			ppgtt = ctx_to_ppgtt(ctx);
-
 		/* XXX: Free up the object before tearing down the address space, in
 		 * case we're bound in the PPGTT */
 		drm_gem_object_unreference(&ctx->render_obj->base);
 	}
 
-	if (ppgtt)
+	/* We refcount even the aliasing PPGTT to keep the code symmetric */
+	if (USES_PPGTT(dev)) {
+		struct i915_hw_ppgtt *ppgtt = ctx_to_ppgtt(ctx);
 		kref_put(&ppgtt->ref, ppgtt_release);
+	}
+
 	list_del(&ctx->link);
 	kfree(ctx);
 }
@@ -229,8 +228,9 @@ i915_gem_alloc_context_obj(struct drm_device *dev, size_t size)
 }
 
 static struct i915_hw_ppgtt *
-create_vm_for_ctx(struct drm_device *dev, struct intel_context *ctx)
+create_vm_for_ctx(struct intel_context *ctx)
 {
+	struct drm_device *dev = ctx->dev;
 	struct i915_hw_ppgtt *ppgtt;
 	int ret;
 
@@ -282,6 +282,7 @@ __create_hw_context(struct drm_device *dev,
 	} else
 		ret = DEFAULT_CONTEXT_ID;
 
+	ctx->dev = dev;
 	ctx->file_priv = file_priv;
 	ctx->id = ret;
 	/* NB: Mark all slices as needing a remap so that when the context first
@@ -334,7 +335,7 @@ i915_gem_create_context(struct drm_device *dev,
 	}
 
 	if (create_vm) {
-		struct i915_hw_ppgtt *ppgtt = create_vm_for_ctx(dev, ctx);
+		struct i915_hw_ppgtt *ppgtt = create_vm_for_ctx(ctx);
 
 		if (IS_ERR_OR_NULL(ppgtt)) {
 			DRM_DEBUG_DRIVER("PPGTT setup failed (%ld)\n",
-- 
1.9.0


* [PATCH 04/53] drm/i915: Extract ringbuffer destroy & make alloc outside accessible
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
                   ` (2 preceding siblings ...)
  2014-06-13 15:37 ` [PATCH 03/53] drm/i915: Add a dev pointer to the context oscar.mateo
@ 2014-06-13 15:37 ` oscar.mateo
  2014-06-18 21:39   ` Volkin, Bradley D
  2014-06-13 15:37 ` [PATCH 05/53] drm/i915: Move i915_gem_validate_context() to i915_gem_context.c oscar.mateo
                   ` (49 subsequent siblings)
  53 siblings, 1 reply; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:37 UTC
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

We are going to start creating a lot of extra ringbuffers soon, so
these functions are handy.

No functional changes.
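
As an illustration of the intended usage (a sketch of where this series is
heading, not code from this patch; a later patch in the series does
essentially this for the Execlists ringbuffers):

	struct intel_ringbuffer *ringbuf;
	int ret;

	ringbuf = kzalloc(sizeof(*ringbuf), GFP_KERNEL);
	if (!ringbuf)
		return -ENOMEM;

	/* Minimal setup before handing the struct to the allocator */
	ringbuf->size = 32 * PAGE_SIZE;
	ringbuf->effective_size = ringbuf->size;
	ringbuf->space = ringbuf->size;
	ringbuf->last_retired_head = -1;

	ret = intel_allocate_ring_buffer(dev, ringbuf);
	if (ret) {
		kfree(ringbuf);
		return ret;
	}

	/* ... submit commands through the new ringbuffer ... */

	intel_destroy_ring_buffer(ringbuf);
	kfree(ringbuf);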

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 26 ++++++++++++++++----------
 drivers/gpu/drm/i915/intel_ringbuffer.h |  4 ++++
 2 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 279488a..915f3d5 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1378,15 +1378,25 @@ static int init_phys_status_page(struct intel_engine_cs *ring)
 	return 0;
 }
 
-static int allocate_ring_buffer(struct intel_engine_cs *ring)
+void intel_destroy_ring_buffer(struct intel_ringbuffer *ringbuf)
+{
+	if (!ringbuf->obj)
+		return;
+
+	iounmap(ringbuf->virtual_start);
+	i915_gem_object_ggtt_unpin(ringbuf->obj);
+	drm_gem_object_unreference(&ringbuf->obj->base);
+	ringbuf->obj = NULL;
+}
+
+int intel_allocate_ring_buffer(struct drm_device *dev,
+			       struct intel_ringbuffer *ringbuf)
 {
-	struct drm_device *dev = ring->dev;
 	struct drm_i915_private *dev_priv = to_i915(dev);
-	struct intel_ringbuffer *ringbuf = ring->buffer;
 	struct drm_i915_gem_object *obj;
 	int ret;
 
-	if (intel_ring_initialized(ring))
+	if (ringbuf->obj)
 		return 0;
 
 	obj = NULL;
@@ -1455,7 +1465,7 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 			goto error;
 	}
 
-	ret = allocate_ring_buffer(ring);
+	ret = intel_allocate_ring_buffer(dev, ringbuf);
 	if (ret) {
 		DRM_ERROR("Failed to allocate ringbuffer %s: %d\n", ring->name, ret);
 		goto error;
@@ -1496,11 +1506,7 @@ void intel_cleanup_ring_buffer(struct intel_engine_cs *ring)
 	intel_stop_ring_buffer(ring);
 	WARN_ON(!IS_GEN2(ring->dev) && (I915_READ_MODE(ring) & MODE_IDLE) == 0);
 
-	iounmap(ringbuf->virtual_start);
-
-	i915_gem_object_ggtt_unpin(ringbuf->obj);
-	drm_gem_object_unreference(&ringbuf->obj->base);
-	ringbuf->obj = NULL;
+	intel_destroy_ring_buffer(ringbuf);
 	ring->preallocated_lazy_request = NULL;
 	ring->outstanding_lazy_seqno = 0;
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 910c83c..dee5b37 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -318,6 +318,10 @@ int intel_init_vebox_ring_buffer(struct drm_device *dev);
 u64 intel_ring_get_active_head(struct intel_engine_cs *ring);
 void intel_ring_setup_status_page(struct intel_engine_cs *ring);
 
+void intel_destroy_ring_buffer(struct intel_ringbuffer *ringbuf);
+int intel_allocate_ring_buffer(struct drm_device *dev,
+			       struct intel_ringbuffer *ringbuf);
+
 static inline u32 intel_ring_get_tail(struct intel_engine_cs *ring)
 {
 	return ring->buffer->tail;
-- 
1.9.0


* [PATCH 05/53] drm/i915: Move i915_gem_validate_context() to i915_gem_context.c
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
                   ` (3 preceding siblings ...)
  2014-06-13 15:37 ` [PATCH 04/53] drm/i915: Extract ringbuffer destroy & make alloc outside accessible oscar.mateo
@ 2014-06-13 15:37 ` oscar.mateo
  2014-06-13 17:11   ` Chris Wilson
  2014-06-13 15:37 ` [PATCH 06/53] drm/i915/bdw: Introduce one context backing object per engine oscar.mateo
                   ` (48 subsequent siblings)
  53 siblings, 1 reply; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:37 UTC
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

... and namespace appropriately.

It looks to me like it belongs logically there.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h            |  3 +++
 drivers/gpu/drm/i915/i915_gem_context.c    | 23 +++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 25 +------------------------
 3 files changed, 27 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index ec7e352..a15370c 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2409,6 +2409,9 @@ int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
 				  struct drm_file *file);
 int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
 				   struct drm_file *file);
+struct intel_context *
+i915_gem_context_validate(struct drm_device *dev, struct drm_file *file,
+			  struct intel_engine_cs *ring, const u32 ctx_id);
 
 /* i915_gem_render_state.c */
 int i915_gem_render_state_init(struct intel_engine_cs *ring);
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index f6c2538..801b891 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -824,3 +824,26 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
 	DRM_DEBUG_DRIVER("HW context %d destroyed\n", args->ctx_id);
 	return 0;
 }
+
+struct intel_context *
+i915_gem_context_validate(struct drm_device *dev, struct drm_file *file,
+			  struct intel_engine_cs *ring, const u32 ctx_id)
+{
+	struct intel_context *ctx = NULL;
+	struct i915_ctx_hang_stats *hs;
+
+	if (ring->id != RCS && ctx_id != DEFAULT_CONTEXT_ID)
+		return ERR_PTR(-EINVAL);
+
+	ctx = i915_gem_context_get(file->driver_priv, ctx_id);
+	if (IS_ERR(ctx))
+		return ctx;
+
+	hs = &ctx->hang_stats;
+	if (hs->banned) {
+		DRM_DEBUG("Context %u tried to submit while banned\n", ctx_id);
+		return ERR_PTR(-EIO);
+	}
+
+	return ctx;
+}
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 3a30133..58b3970 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -931,29 +931,6 @@ validate_exec_list(struct drm_i915_gem_exec_object2 *exec,
 	return 0;
 }
 
-static struct intel_context *
-i915_gem_validate_context(struct drm_device *dev, struct drm_file *file,
-			  struct intel_engine_cs *ring, const u32 ctx_id)
-{
-	struct intel_context *ctx = NULL;
-	struct i915_ctx_hang_stats *hs;
-
-	if (ring->id != RCS && ctx_id != DEFAULT_CONTEXT_ID)
-		return ERR_PTR(-EINVAL);
-
-	ctx = i915_gem_context_get(file->driver_priv, ctx_id);
-	if (IS_ERR(ctx))
-		return ctx;
-
-	hs = &ctx->hang_stats;
-	if (hs->banned) {
-		DRM_DEBUG("Context %u tried to submit while banned\n", ctx_id);
-		return ERR_PTR(-EIO);
-	}
-
-	return ctx;
-}
-
 static void
 i915_gem_execbuffer_move_to_active(struct list_head *vmas,
 				   struct intel_engine_cs *ring)
@@ -1231,7 +1208,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		goto pre_mutex_err;
 	}
 
-	ctx = i915_gem_validate_context(dev, file, ring, ctx_id);
+	ctx = i915_gem_context_validate(dev, file, ring, ctx_id);
 	if (IS_ERR(ctx)) {
 		mutex_unlock(&dev->struct_mutex);
 		ret = PTR_ERR(ctx);
-- 
1.9.0


* [PATCH 06/53] drm/i915/bdw: Introduce one context backing object per engine
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
                   ` (4 preceding siblings ...)
  2014-06-13 15:37 ` [PATCH 05/53] drm/i915: Move i915_gem_validate_context() to i915_gem_context.c oscar.mateo
@ 2014-06-13 15:37 ` oscar.mateo
  2014-06-18 20:16   ` Daniel Vetter
  2014-06-13 15:37 ` [PATCH 07/53] drm/i915/bdw: New file for Logical Ring Contexts and Execlists oscar.mateo
                   ` (47 subsequent siblings)
  53 siblings, 1 reply; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:37 UTC
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

A context backing object only makes sense for a given engine (because
it holds state data specific to that engine).

In legacy ringbuffer submission mode, the only MI_SET_CONTEXT we really
perform is for the render engine, so one backing object is all we needed.

With Execlists, however, we need backing objects for every engine, as
contexts become the only way to submit workloads to the GPU. To tackle
this problem, we multiplex the context struct to contain <no-of-engines>
objects.

Originally, I colored this code by instantiating one new context for
every engine I wanted to use, but this change suggested by Brad Volkin
makes it more elegant.

v2: Leave the old backing object pointer behind. Daniel Vetter suggested
using a union, but it makes more sense to keep render_obj behind as a
NULL pointer, to make sure no one uses it incorrectly when Execlists
are enabled, similar to what we are doing with ring->buffer (Rusty's API
level 5).
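
To illustrate the resulting layout (a sketch, not part of the diff below;
the lrc_enabled flag only appears later in the series):

	struct drm_i915_gem_object *ctx_obj;

	if (dev_priv->lrc_enabled)
		/* Execlists: one backing object per engine */
		ctx_obj = ctx->engine[ring->id].obj;
	else
		/* Legacy ringbuffer submission: render engine only */
		ctx_obj = ctx->render_obj;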

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index a15370c..ccc1ba6 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -593,7 +593,14 @@ struct intel_context {
 	uint8_t remap_slice;
 	struct drm_i915_file_private *file_priv;
 	struct intel_engine_cs *last_ring;
+
+	/* Legacy ring buffer submission */
 	struct drm_i915_gem_object *render_obj;
+	/* Execlists */
+	struct {
+		struct drm_i915_gem_object *obj;
+	} engine[I915_NUM_RINGS];
+
 	struct i915_ctx_hang_stats hang_stats;
 	struct i915_address_space *vm;
 
-- 
1.9.0


* [PATCH 07/53] drm/i915/bdw: New file for Logical Ring Contexts and Execlists
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
                   ` (5 preceding siblings ...)
  2014-06-13 15:37 ` [PATCH 06/53] drm/i915/bdw: Introduce one context backing object per engine oscar.mateo
@ 2014-06-13 15:37 ` oscar.mateo
  2014-06-18 20:17   ` Daniel Vetter
  2014-06-13 15:37 ` [PATCH 08/53] drm/i915/bdw: Macro for LRCs and module option for Execlists oscar.mateo
                   ` (46 subsequent siblings)
  53 siblings, 1 reply; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:37 UTC
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Some legacy HW context code assumptions don't make sense for this new
submission method, so we will place this stuff in a separate file.

Note for reviewers: I've carefully considered the best name for this file
and this was my best option (other possibilities were intel_lr_context.c
or intel_execlist.c). I am open to a certain amount of bikeshedding on
this matter, anyway. Regarding splitting execlists and logical ring
contexts, it is probably not worth it for the moment.

v2: Change to intel_lrc.c

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/Makefile    |  1 +
 drivers/gpu/drm/i915/intel_lrc.c | 42 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 43 insertions(+)
 create mode 100644 drivers/gpu/drm/i915/intel_lrc.c

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index cad1683..9fee2a0 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -31,6 +31,7 @@ i915-y += i915_cmd_parser.o \
 	  i915_gpu_error.o \
 	  i915_irq.o \
 	  i915_trace_points.o \
+	  intel_lrc.o \
 	  intel_ringbuffer.o \
 	  intel_uncore.o
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
new file mode 100644
index 0000000..49bb6fc
--- /dev/null
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -0,0 +1,42 @@
+/*
+ * Copyright © 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Ben Widawsky <ben@bwidawsk.net>
+ *    Michel Thierry <michel.thierry@intel.com>
+ *    Thomas Daniel <thomas.daniel@intel.com>
+ *    Oscar Mateo <oscar.mateo@intel.com>
+ *
+ */
+
+/*
+ * GEN8 brings an expansion of the HW contexts: "Logical Ring Contexts".
+ * These expanded contexts enable a number of new abilities, especially
+ * "Execlists" (also implemented in this file).
+ *
+ * Execlists are the new method by which, on gen8+ hardware, workloads are
+ * submitted for execution (as opposed to the legacy, ringbuffer-based, method).
+ */
+
+#include <drm/drmP.h>
+#include <drm/i915_drm.h>
+#include "i915_drv.h"
-- 
1.9.0


* [PATCH 08/53] drm/i915/bdw: Macro for LRCs and module option for Execlists
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
                   ` (6 preceding siblings ...)
  2014-06-13 15:37 ` [PATCH 07/53] drm/i915/bdw: New file for Logical Ring Contexts and Execlists oscar.mateo
@ 2014-06-13 15:37 ` oscar.mateo
  2014-06-18 20:19   ` Daniel Vetter
  2014-06-13 15:37 ` [PATCH 09/53] drm/i915/bdw: Initialization for Logical Ring Contexts oscar.mateo
                   ` (45 subsequent siblings)
  53 siblings, 1 reply; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:37 UTC
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

GEN8 brings an expansion of the HW contexts: "Logical Ring Contexts".
These expanded contexts enable a number of new abilities, especially
"Execlists".

The macro is defined to off until we have everything in place for this
to work. In dev_priv, lrc_enabled will reflect whether or not we've
actually properly initialized these new contexts. This helps the
transition in the code but is a candidate for removal at some point.

v2: Rename "advanced contexts" to the more correct "logical ring
contexts".

v3: Add a module parameter to enable execlists. Execlists are relatively
new, and so it'd be wise to be able to switch back to ring submission
to debug subtle problems that will inevitably arise.

v4: Add an intel_enable_execlists function.
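
As a usage note (illustrative, not part of the patch): the parameter is
registered with 0400 permissions, so it can only be set at module load
time, e.g.:

	modprobe i915 enable_execlists=1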

Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com> (v3)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2 & v4)
---
 drivers/gpu/drm/i915/i915_drv.h    | 6 ++++++
 drivers/gpu/drm/i915/i915_params.c | 6 ++++++
 drivers/gpu/drm/i915/intel_lrc.c   | 8 ++++++++
 3 files changed, 20 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index ccc1ba6..dac0db1 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1519,6 +1519,7 @@ struct drm_i915_private {
 
 	uint32_t hw_context_size;
 	struct list_head context_list;
+	bool lrc_enabled;
 
 	u32 fdi_rx_config;
 
@@ -1944,6 +1945,7 @@ struct drm_i915_cmd_table {
 #define I915_NEED_GFX_HWS(dev)	(INTEL_INFO(dev)->need_gfx_hws)
 
 #define HAS_HW_CONTEXTS(dev)	(INTEL_INFO(dev)->gen >= 6)
+#define HAS_LOGICAL_RING_CONTEXTS(dev)	0
 #define HAS_ALIASING_PPGTT(dev)	(INTEL_INFO(dev)->gen >= 6)
 #define HAS_PPGTT(dev)		(INTEL_INFO(dev)->gen >= 7 && !IS_GEN8(dev))
 #define USES_PPGTT(dev)		intel_enable_ppgtt(dev, false)
@@ -2029,6 +2031,7 @@ struct i915_params {
 	int enable_rc6;
 	int enable_fbc;
 	int enable_ppgtt;
+	int enable_execlists;
 	int enable_psr;
 	unsigned int preliminary_hw_support;
 	int disable_power_well;
@@ -2420,6 +2423,9 @@ struct intel_context *
 i915_gem_context_validate(struct drm_device *dev, struct drm_file *file,
 			  struct intel_engine_cs *ring, const u32 ctx_id);
 
+/* intel_lrc.c */
+bool intel_enable_execlists(struct drm_device *dev);
+
 /* i915_gem_render_state.c */
 int i915_gem_render_state_init(struct intel_engine_cs *ring);
 /* i915_gem_evict.c */
diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
index d05a2af..b7455f8 100644
--- a/drivers/gpu/drm/i915/i915_params.c
+++ b/drivers/gpu/drm/i915/i915_params.c
@@ -37,6 +37,7 @@ struct i915_params i915 __read_mostly = {
 	.enable_fbc = -1,
 	.enable_hangcheck = true,
 	.enable_ppgtt = -1,
+	.enable_execlists = -1,
 	.enable_psr = 0,
 	.preliminary_hw_support = IS_ENABLED(CONFIG_DRM_I915_PRELIMINARY_HW_SUPPORT),
 	.disable_power_well = 1,
@@ -116,6 +117,11 @@ MODULE_PARM_DESC(enable_ppgtt,
 	"Override PPGTT usage. "
 	"(-1=auto [default], 0=disabled, 1=aliasing, 2=full)");
 
+module_param_named(enable_execlists, i915.enable_execlists, int, 0400);
+MODULE_PARM_DESC(enable_execlists,
+	"Override execlists usage. "
+	"(-1=auto [default], 0=disabled, 1=enabled)");
+
 module_param_named(enable_psr, i915.enable_psr, int, 0600);
 MODULE_PARM_DESC(enable_psr, "Enable PSR (default: false)");
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 49bb6fc..58cead1 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -40,3 +40,11 @@
 #include <drm/drmP.h>
 #include <drm/i915_drm.h>
 #include "i915_drv.h"
+
+bool intel_enable_execlists(struct drm_device *dev)
+{
+	if (!i915.enable_execlists)
+		return false;
+
+	return HAS_LOGICAL_RING_CONTEXTS(dev) && USES_PPGTT(dev);
+}
-- 
1.9.0


* [PATCH 09/53] drm/i915/bdw: Initialization for Logical Ring Contexts
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
                   ` (7 preceding siblings ...)
  2014-06-13 15:37 ` [PATCH 08/53] drm/i915/bdw: Macro for LRCs and module option for Execlists oscar.mateo
@ 2014-06-13 15:37 ` oscar.mateo
  2014-06-18 20:24   ` Daniel Vetter
  2014-06-13 15:37 ` [PATCH 10/53] drm/i915/bdw: A bit more advanced context init/fini oscar.mateo
                   ` (44 subsequent siblings)
  53 siblings, 1 reply; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:37 UTC
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Early in the series we had our own gen8_gem_context_init/fini
functions, but the truth is they now look almost the same as the
legacy hw context init/fini functions. We can always split them
later if this ceases to be the case.

Also, we do not fall back to legacy ringbuffers when logical ring
context initialization fails (not very likely to happen and, even
if it does, hw contexts would probably fail as well).

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 801b891..3f3fb36 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -416,7 +416,13 @@ int i915_gem_context_init(struct drm_device *dev)
 	if (WARN_ON(dev_priv->ring[RCS].default_context))
 		return 0;
 
-	if (HAS_HW_CONTEXTS(dev)) {
+	dev_priv->lrc_enabled = intel_enable_execlists(dev);
+
+	if (dev_priv->lrc_enabled) {
+		/* NB: intentionally left blank. We will allocate our own
+		 * backing objects as we need them, thank you very much */
+		dev_priv->hw_context_size = 0;
+	} else if (HAS_HW_CONTEXTS(dev)) {
 		dev_priv->hw_context_size = round_up(get_context_size(dev), 4096);
 		if (dev_priv->hw_context_size > (1<<20)) {
 			DRM_DEBUG_DRIVER("Disabling HW Contexts; invalid size %d\n",
@@ -436,7 +442,9 @@ int i915_gem_context_init(struct drm_device *dev)
 	for (i = 0; i < I915_NUM_RINGS; i++)
 		dev_priv->ring[i].default_context = ctx;
 
-	DRM_DEBUG_DRIVER("%s context support initialized\n", dev_priv->hw_context_size ? "HW" : "fake");
+	DRM_DEBUG_DRIVER("%s context support initialized\n",
+			dev_priv->lrc_enabled ? "LR" :
+			dev_priv->hw_context_size ? "HW" : "fake");
 	return 0;
 }
 
@@ -765,9 +773,12 @@ int i915_switch_context(struct intel_engine_cs *ring,
 	return do_switch(ring, to);
 }
 
-static bool hw_context_enabled(struct drm_device *dev)
+static bool contexts_enabled(struct drm_device *dev)
 {
-	return to_i915(dev)->hw_context_size;
+	struct drm_i915_private *dev_priv = to_i915(dev);
+
+	/* FIXME: this would be cleaner with a "context type" enum */
+	return dev_priv->lrc_enabled || dev_priv->hw_context_size;
 }
 
 int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
@@ -778,7 +789,7 @@ int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
 	struct intel_context *ctx;
 	int ret;
 
-	if (!hw_context_enabled(dev))
+	if (!contexts_enabled(dev))
 		return -ENODEV;
 
 	ret = i915_mutex_lock_interruptible(dev);
-- 
1.9.0


* [PATCH 10/53] drm/i915/bdw: A bit more advanced context init/fini
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
                   ` (8 preceding siblings ...)
  2014-06-13 15:37 ` [PATCH 09/53] drm/i915/bdw: Initialization for Logical Ring Contexts oscar.mateo
@ 2014-06-13 15:37 ` oscar.mateo
  2014-06-18 22:13   ` Volkin, Bradley D
  2014-06-13 15:37 ` [PATCH 11/53] drm/i915/bdw: Allocate ringbuffers for Logical Ring Contexts oscar.mateo
                   ` (43 subsequent siblings)
  53 siblings, 1 reply; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:37 UTC
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

There are a few big differences in context init and fini compared to
the previous implementation of hardware contexts. One of them is
demonstrated in this patch: we must allocate a ctx backing object for
each engine.

Regarding the context size, reading the register to calculate the sizes
could work, I think; however, the docs are very clear about the actual
context sizes on GEN8, so just hardcode those and use them.

v2: Rebased on top of the Full PPGTT series. It is important to notice
that at this point we have one global default context per engine, all
of them using the aliasing PPGTT (as opposed to the single global
default context we have with legacy HW contexts).

v3:
- Go back to one single global default context, this time with multiple
  backing objects inside.
- Use different context sizes for non-render engines, as suggested by
  Damien (still hardcoded, since the information about the context size
  registers in the BSpec is, well, *lacking*).
- Render ctx size is 20 (or 19) pages, but not 21 (caught by Damien).
- Move default context backing object creation to intel_init_ring (so
  that we don't waste memory in rings that might not get initialized).

v4:
- Reuse the HW legacy context init/fini.
- Create a separate free function.
- Rename the functions with an intel_ prefix.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h         |  3 ++
 drivers/gpu/drm/i915/i915_gem_context.c | 19 +++++++--
 drivers/gpu/drm/i915/intel_lrc.c        | 70 +++++++++++++++++++++++++++++++++
 3 files changed, 88 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index dac0db1..347308e 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2425,6 +2425,9 @@ i915_gem_context_validate(struct drm_device *dev, struct drm_file *file,
 
 /* intel_lrc.c */
 bool intel_enable_execlists(struct drm_device *dev);
+void intel_lr_context_free(struct intel_context *ctx);
+int intel_lr_context_deferred_create(struct intel_context *ctx,
+				     struct intel_engine_cs *ring);
 
 /* i915_gem_render_state.c */
 int i915_gem_render_state_init(struct intel_engine_cs *ring);
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 3f3fb36..1fb4592 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -180,8 +180,11 @@ void i915_gem_context_free(struct kref *ctx_ref)
 {
 	struct intel_context *ctx = container_of(ctx_ref, typeof(*ctx), ref);
 	struct drm_device *dev = ctx->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
 
-	if (ctx->render_obj) {
+	if (dev_priv->lrc_enabled)
+		intel_lr_context_free(ctx);
+	else if (ctx->render_obj) {
 		/* XXX: Free up the object before tearing down the address space, in
 		 * case we're bound in the PPGTT */
 		drm_gem_object_unreference(&ctx->render_obj->base);
@@ -438,9 +441,17 @@ int i915_gem_context_init(struct drm_device *dev)
 		return PTR_ERR(ctx);
 	}
 
-	/* NB: RCS will hold a ref for all rings */
-	for (i = 0; i < I915_NUM_RINGS; i++)
-		dev_priv->ring[i].default_context = ctx;
+	for (i = 0; i < I915_NUM_RINGS; i++) {
+		struct intel_engine_cs *ring = &dev_priv->ring[i];
+
+		/* NB: RCS will hold a ref for all rings */
+		ring->default_context = ctx;
+
+		/* FIXME: we only want to do this for initialized rings, but for that
+		 * we first need the new logical ring stuff */
+		if (dev_priv->lrc_enabled)
+			intel_lr_context_deferred_create(ctx, ring);
+	}
 
 	DRM_DEBUG_DRIVER("%s context support initialized\n",
 			dev_priv->lrc_enabled ? "LR" :
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 58cead1..952212f 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -41,6 +41,11 @@
 #include <drm/i915_drm.h>
 #include "i915_drv.h"
 
+#define GEN8_LR_CONTEXT_RENDER_SIZE (20 * PAGE_SIZE)
+#define GEN8_LR_CONTEXT_OTHER_SIZE (2 * PAGE_SIZE)
+
+#define GEN8_LR_CONTEXT_ALIGN 4096
+
 bool intel_enable_execlists(struct drm_device *dev)
 {
 	if (!i915.enable_execlists)
@@ -48,3 +53,68 @@ bool intel_enable_execlists(struct drm_device *dev)
 
 	return HAS_LOGICAL_RING_CONTEXTS(dev) && USES_PPGTT(dev);
 }
+
+void intel_lr_context_free(struct intel_context *ctx)
+{
+	int i;
+
+	for (i = 0; i < I915_NUM_RINGS; i++) {
+		struct drm_i915_gem_object *ctx_obj = ctx->engine[i].obj;
+		if (ctx_obj) {
+			i915_gem_object_ggtt_unpin(ctx_obj);
+			drm_gem_object_unreference(&ctx_obj->base);
+		}
+	}
+}
+
+static uint32_t get_lr_context_size(struct intel_engine_cs *ring)
+{
+	int ret = 0;
+
+	WARN_ON(INTEL_INFO(ring->dev)->gen != 8);
+
+	switch (ring->id) {
+	case RCS:
+		ret = GEN8_LR_CONTEXT_RENDER_SIZE;
+		break;
+	case VCS:
+	case BCS:
+	case VECS:
+	case VCS2:
+		ret = GEN8_LR_CONTEXT_OTHER_SIZE;
+		break;
+	}
+
+	return ret;
+}
+
+int intel_lr_context_deferred_create(struct intel_context *ctx,
+				     struct intel_engine_cs *ring)
+{
+	struct drm_device *dev = ring->dev;
+	struct drm_i915_gem_object *ctx_obj;
+	uint32_t context_size;
+	int ret;
+
+	WARN_ON(ctx->render_obj != NULL);
+
+	context_size = round_up(get_lr_context_size(ring), 4096);
+
+	ctx_obj = i915_gem_alloc_context_obj(dev, context_size);
+	if (IS_ERR(ctx_obj)) {
+		ret = PTR_ERR(ctx_obj);
+		DRM_DEBUG_DRIVER("Alloc LRC backing obj failed: %d\n", ret);
+		return ret;
+	}
+
+	ret = i915_gem_obj_ggtt_pin(ctx_obj, GEN8_LR_CONTEXT_ALIGN, 0);
+	if (ret) {
+		DRM_DEBUG_DRIVER("Pin LRC backing obj failed: %d\n", ret);
+		drm_gem_object_unreference(&ctx_obj->base);
+		return ret;
+	}
+
+	ctx->engine[ring->id].obj = ctx_obj;
+
+	return 0;
+}
-- 
1.9.0


* [PATCH 11/53] drm/i915/bdw: Allocate ringbuffers for Logical Ring Contexts
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
                   ` (9 preceding siblings ...)
  2014-06-13 15:37 ` [PATCH 10/53] drm/i915/bdw: A bit more advanced context init/fini oscar.mateo
@ 2014-06-13 15:37 ` oscar.mateo
  2014-06-18 22:19   ` Volkin, Bradley D
  2014-06-13 15:37 ` [PATCH 12/53] drm/i915/bdw: Populate LR contexts (somewhat) oscar.mateo
                   ` (42 subsequent siblings)
  53 siblings, 1 reply; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:37 UTC
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

As we have said a couple of times by now, logical ring contexts have
their own ringbuffers: not only the backing pages, but the whole
management struct.

In a previous version of the series, this was achieved with two separate
patches:
drm/i915/bdw: Allocate ringbuffer backing objects for default global LRC
drm/i915/bdw: Allocate ringbuffer for user-created LRCs

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h  |  1 +
 drivers/gpu/drm/i915/intel_lrc.c | 38 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 39 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 347308e..79799d8 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -599,6 +599,7 @@ struct intel_context {
 	/* Execlists */
 	struct {
 		struct drm_i915_gem_object *obj;
+		struct intel_ringbuffer *ringbuf;
 	} engine[I915_NUM_RINGS];
 
 	struct i915_ctx_hang_stats hang_stats;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 952212f..b3a23e0 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -60,7 +60,11 @@ void intel_lr_context_free(struct intel_context *ctx)
 
 	for (i = 0; i < I915_NUM_RINGS; i++) {
 		struct drm_i915_gem_object *ctx_obj = ctx->engine[i].obj;
+		struct intel_ringbuffer *ringbuf = ctx->engine[i].ringbuf;
+
 		if (ctx_obj) {
+			intel_destroy_ring_buffer(ringbuf);
+			kfree(ringbuf);
 			i915_gem_object_ggtt_unpin(ctx_obj);
 			drm_gem_object_unreference(&ctx_obj->base);
 		}
@@ -94,6 +98,7 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 	struct drm_device *dev = ring->dev;
 	struct drm_i915_gem_object *ctx_obj;
 	uint32_t context_size;
+	struct intel_ringbuffer *ringbuf;
 	int ret;
 
 	WARN_ON(ctx->render_obj != NULL);
@@ -114,6 +119,39 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 		return ret;
 	}
 
+	ringbuf = kzalloc(sizeof(*ringbuf), GFP_KERNEL);
+	if (!ringbuf) {
+		DRM_DEBUG_DRIVER("Failed to allocate ringbuffer %s\n",
+				ring->name);
+		i915_gem_object_ggtt_unpin(ctx_obj);
+		drm_gem_object_unreference(&ctx_obj->base);
+		ret = -ENOMEM;
+		return ret;
+	}
+
+	ringbuf->size = 32 * PAGE_SIZE;
+	ringbuf->effective_size = ringbuf->size;
+	ringbuf->head = 0;
+	ringbuf->tail = 0;
+	ringbuf->space = ringbuf->size;
+	ringbuf->last_retired_head = -1;
+
+	/* TODO: For now we put this in the mappable region so that we can reuse
+	 * the existing ringbuffer code which ioremaps it. When we start
+	 * creating many contexts, this will no longer work and we must switch
+	 * to a kmapish interface.
+	 */
+	ret = intel_allocate_ring_buffer(dev, ringbuf);
+	if (ret) {
+		DRM_DEBUG_DRIVER("Failed to allocate ringbuffer obj %s: %d\n",
+				ring->name, ret);
+		kfree(ringbuf);
+		i915_gem_object_ggtt_unpin(ctx_obj);
+		drm_gem_object_unreference(&ctx_obj->base);
+		return ret;
+	}
+
+	ctx->engine[ring->id].ringbuf = ringbuf;
 	ctx->engine[ring->id].obj = ctx_obj;
 
 	return 0;
-- 
1.9.0


* [PATCH 12/53] drm/i915/bdw: Populate LR contexts (somewhat)
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
                   ` (10 preceding siblings ...)
  2014-06-13 15:37 ` [PATCH 11/53] drm/i915/bdw: Allocate ringbuffers for Logical Ring Contexts oscar.mateo
@ 2014-06-13 15:37 ` oscar.mateo
  2014-06-18 23:24   ` Volkin, Bradley D
  2014-06-13 15:37 ` [PATCH 13/53] drm/i915/bdw: Deferred creation of user-created LRCs oscar.mateo
                   ` (41 subsequent siblings)
  53 siblings, 1 reply; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:37 UTC
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

For the most part, logical ring context objects are similar to hardware
contexts in that the backing object is meant to be opaque. There are
some exceptions where we need to poke certain offsets of the object for
initialization, updating the tail pointer or updating the PDPs.

For our basic execlist implementation we'll only need our PPGTT PDs
and ringbuffer addresses in order to set up the context. With previous
patches, we have both, so start prepping the context to be loaded.

Before running a context for the first time, you must populate some
fields in the context object. These fields begin at LRCA + 1 PAGE,
i.e. the first page (in 0-based counting) of the context image. These
same fields will be read and written to as contexts are saved and
restored once the system is up and running.

Many of these fields are completely reused from previous global
registers: ringbuffer head/tail/control, context control matches some
previous MI_SET_CONTEXT flags, and page directories. There are other
fields which we don't touch which we may want in the future.
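
For reference, the register state is just a stream of MI_LOAD_REGISTER_IMM
packets: a header DWord followed by (register offset, value) pairs. A
minimal sketch of one such block, using the macros and offsets added below
(illustrative only, not the actual patch code):

	/* MI_LOAD_REGISTER_IMM(x) encodes a DWord length of 2*x-1, so the
	 * render engine's 14 pairs give a header of (0x22 << 23) | 27. */
	reg_state[CTX_LRI_HEADER_0] = MI_LOAD_REGISTER_IMM(14) | MI_LRI_FORCE_POSTED;
	reg_state[CTX_RING_TAIL] = RING_TAIL(ring->mmio_base);	/* which register */
	reg_state[CTX_RING_TAIL+1] = 0;				/* value to load */
	/* ...and so on, one (offset, value) pair per register in the block. */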

v2: CTX_LRI_HEADER_0 is MI_LOAD_REGISTER_IMM(14) for render and (11)
for other engines.

v3: Several rebases and general changes to the code.

v4: Squash with "Extract LR context object populating"
Also, Damien's review comments:
- Set the Force Posted bit on the LRI header, as the BSpec suggests we do.
- Prevent a warning when compiling a 32-bit kernel without HIGHMEM64.
- Add a clarifying comment to the context population code.

v5: Damien's review comments:
- The third MI_LOAD_REGISTER_IMM in the context does not set Force Posted.
- Remove dead code.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com> (v2)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v3-5)
---
 drivers/gpu/drm/i915/i915_reg.h  |   1 +
 drivers/gpu/drm/i915/intel_lrc.c | 154 ++++++++++++++++++++++++++++++++++++++-
 2 files changed, 151 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 286f05c..9c8692a 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -277,6 +277,7 @@
  *   address/value pairs. Don't overdue it, though, x <= 2^4 must hold!
  */
 #define MI_LOAD_REGISTER_IMM(x)	MI_INSTR(0x22, 2*(x)-1)
+#define   MI_LRI_FORCE_POSTED		(1<<12)
 #define MI_STORE_REGISTER_MEM(x) MI_INSTR(0x24, 2*(x)-1)
 #define MI_STORE_REGISTER_MEM_GEN8(x) MI_INSTR(0x24, 3*(x)-1)
 #define   MI_SRM_LRM_GLOBAL_GTT		(1<<22)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index b3a23e0..b96bb45 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -46,6 +46,38 @@
 
 #define GEN8_LR_CONTEXT_ALIGN 4096
 
+#define RING_ELSP(ring)			((ring)->mmio_base+0x230)
+#define RING_CONTEXT_CONTROL(ring)	((ring)->mmio_base+0x244)
+
+#define CTX_LRI_HEADER_0		0x01
+#define CTX_CONTEXT_CONTROL		0x02
+#define CTX_RING_HEAD			0x04
+#define CTX_RING_TAIL			0x06
+#define CTX_RING_BUFFER_START		0x08
+#define CTX_RING_BUFFER_CONTROL		0x0a
+#define CTX_BB_HEAD_U			0x0c
+#define CTX_BB_HEAD_L			0x0e
+#define CTX_BB_STATE			0x10
+#define CTX_SECOND_BB_HEAD_U		0x12
+#define CTX_SECOND_BB_HEAD_L		0x14
+#define CTX_SECOND_BB_STATE		0x16
+#define CTX_BB_PER_CTX_PTR		0x18
+#define CTX_RCS_INDIRECT_CTX		0x1a
+#define CTX_RCS_INDIRECT_CTX_OFFSET	0x1c
+#define CTX_LRI_HEADER_1		0x21
+#define CTX_CTX_TIMESTAMP		0x22
+#define CTX_PDP3_UDW			0x24
+#define CTX_PDP3_LDW			0x26
+#define CTX_PDP2_UDW			0x28
+#define CTX_PDP2_LDW			0x2a
+#define CTX_PDP1_UDW			0x2c
+#define CTX_PDP1_LDW			0x2e
+#define CTX_PDP0_UDW			0x30
+#define CTX_PDP0_LDW			0x32
+#define CTX_LRI_HEADER_2		0x41
+#define CTX_R_PWR_CLK_STATE		0x42
+#define CTX_GPGPU_CSR_BASE_ADDRESS	0x44
+
 bool intel_enable_execlists(struct drm_device *dev)
 {
 	if (!i915.enable_execlists)
@@ -54,6 +86,110 @@ bool intel_enable_execlists(struct drm_device *dev)
 	return HAS_LOGICAL_RING_CONTEXTS(dev) && USES_PPGTT(dev);
 }
 
+static int
+populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_obj,
+		    struct intel_engine_cs *ring, struct drm_i915_gem_object *ring_obj)
+{
+	struct i915_hw_ppgtt *ppgtt = ctx_to_ppgtt(ctx);
+	struct page *page;
+	uint32_t *reg_state;
+	int ret;
+
+	ret = i915_gem_object_set_to_cpu_domain(ctx_obj, true);
+	if (ret) {
+		DRM_DEBUG_DRIVER("Could not set to CPU domain\n");
+		return ret;
+	}
+
+	ret = i915_gem_object_get_pages(ctx_obj);
+	if (ret) {
+		DRM_DEBUG_DRIVER("Could not get object pages\n");
+		return ret;
+	}
+
+	i915_gem_object_pin_pages(ctx_obj);
+
+	/* The second page of the context object contains some fields which must
+	 * be set up prior to the first execution. */
+	page = i915_gem_object_get_page(ctx_obj, 1);
+	reg_state = kmap_atomic(page);
+
+	/* A context is actually a big batch buffer with several MI_LOAD_REGISTER_IMM
+	 * commands followed by (reg, value) pairs. The values we are setting here are
+	 * only for the first context restore: on a subsequent save, the GPU will
+	 * recreate this batchbuffer with new values (including all the missing
+	 * MI_LOAD_REGISTER_IMM commands that we are not initializing here). */
+	if (ring->id == RCS)
+		reg_state[CTX_LRI_HEADER_0] = MI_LOAD_REGISTER_IMM(14);
+	else
+		reg_state[CTX_LRI_HEADER_0] = MI_LOAD_REGISTER_IMM(11);
+	reg_state[CTX_LRI_HEADER_0] |= MI_LRI_FORCE_POSTED;
+	reg_state[CTX_CONTEXT_CONTROL] = RING_CONTEXT_CONTROL(ring);
+	reg_state[CTX_CONTEXT_CONTROL+1] = (1<<3) | MI_RESTORE_INHIBIT;
+	reg_state[CTX_CONTEXT_CONTROL+1] |= reg_state[CTX_CONTEXT_CONTROL+1] << 16;
+	reg_state[CTX_RING_HEAD] = RING_HEAD(ring->mmio_base);
+	reg_state[CTX_RING_HEAD+1] = 0;
+	reg_state[CTX_RING_TAIL] = RING_TAIL(ring->mmio_base);
+	reg_state[CTX_RING_TAIL+1] = 0;
+	reg_state[CTX_RING_BUFFER_START] = RING_START(ring->mmio_base);
+	reg_state[CTX_RING_BUFFER_START+1] = i915_gem_obj_ggtt_offset(ring_obj);
+	reg_state[CTX_RING_BUFFER_CONTROL] = RING_CTL(ring->mmio_base);
+	reg_state[CTX_RING_BUFFER_CONTROL+1] = (31 * PAGE_SIZE) | RING_VALID;
+	reg_state[CTX_BB_HEAD_U] = ring->mmio_base + 0x168;
+	reg_state[CTX_BB_HEAD_U+1] = 0;
+	reg_state[CTX_BB_HEAD_L] = ring->mmio_base + 0x140;
+	reg_state[CTX_BB_HEAD_L+1] = 0;
+	reg_state[CTX_BB_STATE] = ring->mmio_base + 0x110;
+	reg_state[CTX_BB_STATE+1] = (1<<5);
+	reg_state[CTX_SECOND_BB_HEAD_U] = ring->mmio_base + 0x11c;
+	reg_state[CTX_SECOND_BB_HEAD_U+1] = 0;
+	reg_state[CTX_SECOND_BB_HEAD_L] = ring->mmio_base + 0x114;
+	reg_state[CTX_SECOND_BB_HEAD_L+1] = 0;
+	reg_state[CTX_SECOND_BB_STATE] = ring->mmio_base + 0x118;
+	reg_state[CTX_SECOND_BB_STATE+1] = 0;
+	if (ring->id == RCS) {
+		reg_state[CTX_BB_PER_CTX_PTR] = ring->mmio_base + 0x1c0;
+		reg_state[CTX_BB_PER_CTX_PTR+1] = 0;
+		reg_state[CTX_RCS_INDIRECT_CTX] = ring->mmio_base + 0x1c4;
+		reg_state[CTX_RCS_INDIRECT_CTX+1] = 0;
+		reg_state[CTX_RCS_INDIRECT_CTX_OFFSET] = ring->mmio_base + 0x1c8;
+		reg_state[CTX_RCS_INDIRECT_CTX_OFFSET+1] = 0;
+	}
+	reg_state[CTX_LRI_HEADER_1] = MI_LOAD_REGISTER_IMM(9);
+	reg_state[CTX_LRI_HEADER_1] |= MI_LRI_FORCE_POSTED;
+	reg_state[CTX_CTX_TIMESTAMP] = ring->mmio_base + 0x3a8;
+	reg_state[CTX_CTX_TIMESTAMP+1] = 0;
+	reg_state[CTX_PDP3_UDW] = GEN8_RING_PDP_UDW(ring, 3);
+	reg_state[CTX_PDP3_LDW] = GEN8_RING_PDP_LDW(ring, 3);
+	reg_state[CTX_PDP2_UDW] = GEN8_RING_PDP_UDW(ring, 2);
+	reg_state[CTX_PDP2_LDW] = GEN8_RING_PDP_LDW(ring, 2);
+	reg_state[CTX_PDP1_UDW] = GEN8_RING_PDP_UDW(ring, 1);
+	reg_state[CTX_PDP1_LDW] = GEN8_RING_PDP_LDW(ring, 1);
+	reg_state[CTX_PDP0_UDW] = GEN8_RING_PDP_UDW(ring, 0);
+	reg_state[CTX_PDP0_LDW] = GEN8_RING_PDP_LDW(ring, 0);
+	reg_state[CTX_PDP3_UDW+1] = (u64)ppgtt->pd_dma_addr[3] >> 32;
+	reg_state[CTX_PDP3_LDW+1] = ppgtt->pd_dma_addr[3];
+	reg_state[CTX_PDP2_UDW+1] = (u64)ppgtt->pd_dma_addr[2] >> 32;
+	reg_state[CTX_PDP2_LDW+1] = ppgtt->pd_dma_addr[2];
+	reg_state[CTX_PDP1_UDW+1] = (u64)ppgtt->pd_dma_addr[1] >> 32;
+	reg_state[CTX_PDP1_LDW+1] = ppgtt->pd_dma_addr[1];
+	reg_state[CTX_PDP0_UDW+1] = (u64)ppgtt->pd_dma_addr[0] >> 32;
+	reg_state[CTX_PDP0_LDW+1] = ppgtt->pd_dma_addr[0];
+	if (ring->id == RCS) {
+		reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
+		reg_state[CTX_R_PWR_CLK_STATE] = 0x20c8;
+		reg_state[CTX_R_PWR_CLK_STATE+1] = 0;
+	}
+
+	kunmap_atomic(reg_state);
+
+	ctx_obj->dirty = 1;
+	set_page_dirty(page);
+	i915_gem_object_unpin_pages(ctx_obj);
+
+	return 0;
+}
+
 void intel_lr_context_free(struct intel_context *ctx)
 {
 	int i;
@@ -145,14 +281,24 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 	if (ret) {
 		DRM_DEBUG_DRIVER("Failed to allocate ringbuffer obj %s: %d\n",
 				ring->name, ret);
-		kfree(ringbuf);
-		i915_gem_object_ggtt_unpin(ctx_obj);
-		drm_gem_object_unreference(&ctx_obj->base);
-		return ret;
+		goto error;
+	}
+
+	ret = populate_lr_context(ctx, ctx_obj, ring, ringbuf->obj);
+	if (ret) {
+		DRM_DEBUG_DRIVER("Failed to populate LRC: %d\n", ret);
+		intel_destroy_ring_buffer(ringbuf);
+		goto error;
 	}
 
 	ctx->engine[ring->id].ringbuf = ringbuf;
 	ctx->engine[ring->id].obj = ctx_obj;
 
 	return 0;
+
+error:
+	kfree(ringbuf);
+	i915_gem_object_ggtt_unpin(ctx_obj);
+	drm_gem_object_unreference(&ctx_obj->base);
+	return ret;
 }
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH 13/53] drm/i915/bdw: Deferred creation of user-created LRCs
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
                   ` (11 preceding siblings ...)
  2014-06-13 15:37 ` [PATCH 12/53] drm/i915/bdw: Populate LR contexts (somewhat) oscar.mateo
@ 2014-06-13 15:37 ` oscar.mateo
  2014-06-18 20:27   ` Daniel Vetter
  2014-06-13 15:37 ` [PATCH 14/53] drm/i915/bdw: Render moot context reset and switch when LRCs are enabled oscar.mateo
                   ` (40 subsequent siblings)
  53 siblings, 1 reply; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:37 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

The backing objects for contexts created via open fd are actually
empty until the user starts sending execbuffers to them. We do this
because, at create time, we really don't know which engine is going
to be used with the context later on.

v2: As contexts created via ioctl can only be used with the render ring,
we have enough information to allocate & populate them right away.
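
Condensed, the deferred-create idiom looks like this (a sketch; the real
checks live in i915_gem_context_validate() in the diff below):

	/* Create the LRC for this engine lazily, at execbuffer time,
	 * because only now do we know which engine the context uses. */
	if (dev_priv->lrc_enabled && !ctx->engine[ring->id].obj) {
		ret = intel_lr_context_deferred_create(ctx, ring);
		if (ret)
			return ERR_PTR(ret);
	}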

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 28 ++++++++++++++++++++++++++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 1fb4592..70bf6d0 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -795,6 +795,7 @@ static bool contexts_enabled(struct drm_device *dev)
 int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
 				  struct drm_file *file)
 {
+	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_i915_gem_context_create *args = data;
 	struct drm_i915_file_private *file_priv = file->driver_priv;
 	struct intel_context *ctx;
@@ -808,9 +809,23 @@ int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
 		return ret;
 
 	ctx = i915_gem_create_context(dev, file_priv, USES_FULL_PPGTT(dev));
-	mutex_unlock(&dev->struct_mutex);
-	if (IS_ERR(ctx))
+	if (IS_ERR(ctx)) {
+		mutex_unlock(&dev->struct_mutex);
 		return PTR_ERR(ctx);
+	}
+
+	if (dev_priv->lrc_enabled) {
+		/* NB: We know this context will only be used with the render ring
+		 * (as we enforce it) so we can allocate & populate it already */
+		int ret = intel_lr_context_deferred_create(ctx, &dev_priv->ring[RCS]);
+		if (ret) {
+			mutex_unlock(&dev->struct_mutex);
+			DRM_DEBUG_DRIVER("Could not create LRC: %d\n", ret);
+			return ret;
+		}
+	}
+
+	mutex_unlock(&dev->struct_mutex);
 
 	args->ctx_id = ctx->id;
 	DRM_DEBUG_DRIVER("HW context %d created\n", args->ctx_id);
@@ -851,6 +866,7 @@ struct intel_context *
 i915_gem_context_validate(struct drm_device *dev, struct drm_file *file,
 			  struct intel_engine_cs *ring, const u32 ctx_id)
 {
+	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_context *ctx = NULL;
 	struct i915_ctx_hang_stats *hs;
 
@@ -867,5 +883,13 @@ i915_gem_context_validate(struct drm_device *dev, struct drm_file *file,
 		return ERR_PTR(-EIO);
 	}
 
+	if (dev_priv->lrc_enabled && !ctx->engine[ring->id].obj) {
+		int ret = intel_lr_context_deferred_create(ctx, ring);
+		if (ret) {
+			DRM_DEBUG("Could not create LRC %u: %d\n", ctx_id, ret);
+			return ERR_PTR(ret);
+		}
+	}
+
 	return ctx;
 }
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH 14/53] drm/i915/bdw: Render moot context reset and switch when LRCs are enabled
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
                   ` (12 preceding siblings ...)
  2014-06-13 15:37 ` [PATCH 13/53] drm/i915/bdw: Deferred creation of user-created LRCs oscar.mateo
@ 2014-06-13 15:37 ` oscar.mateo
  2014-06-13 15:37 ` [PATCH 15/53] drm/i915/bdw: Don't write PDP in the legacy way when using LRCs oscar.mateo
                   ` (39 subsequent siblings)
  53 siblings, 0 replies; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:37 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

These functions make no sense in a Logical Ring Context & Execlists
world.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 70bf6d0..685c346 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -381,6 +381,9 @@ void i915_gem_context_reset(struct drm_device *dev)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	int i;
 
+	if (dev_priv->lrc_enabled)
+		return;
+
 	/* Prevent the hardware from restoring the last context (which hung) on
 	 * the next switch */
 	for (i = 0; i < I915_NUM_RINGS; i++) {
@@ -514,6 +517,9 @@ int i915_gem_context_enable(struct drm_i915_private *dev_priv)
 		ppgtt->enable(ppgtt);
 	}
 
+	if (dev_priv->lrc_enabled)
+		return 0;
+
 	/* FIXME: We should make this work, even in reset */
 	if (i915_reset_in_progress(&dev_priv->gpu_error))
 		return 0;
@@ -769,6 +775,9 @@ int i915_switch_context(struct intel_engine_cs *ring,
 {
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
 
+	if (dev_priv->lrc_enabled)
+		return 0;
+
 	WARN_ON(!mutex_is_locked(&dev_priv->dev->struct_mutex));
 
 	if (to->render_obj == NULL) { /* We have the fake context */
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH 15/53] drm/i915/bdw: Don't write PDP in the legacy way when using LRCs
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
                   ` (13 preceding siblings ...)
  2014-06-13 15:37 ` [PATCH 14/53] drm/i915/bdw: Render moot context reset and switch when LRCs are enabled oscar.mateo
@ 2014-06-13 15:37 ` oscar.mateo
  2014-06-18 23:42   ` Volkin, Bradley D
  2014-06-13 15:37 ` [PATCH 16/53] drm/i915/bdw: Skeleton for the new logical rings submission path oscar.mateo
                   ` (38 subsequent siblings)
  53 siblings, 1 reply; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:37 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

This is mostly for correctness so that we know we are running the LR
context correctly (that is, the PDPs are contained inside the context
object).

v2: Move the check to inside the enable PPGTT function. The switch
happens in two places: the legacy context switch (that we won't hit
when Execlists are enabled) and the PPGTT enable, which unfortunately
we need. This would look much nicer if the ppgtt->enable was part of
the ring init, where it logically belongs.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 8b3cde7..9f0c69e 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -844,6 +844,11 @@ static int gen8_ppgtt_enable(struct i915_hw_ppgtt *ppgtt)
 		if (USES_FULL_PPGTT(dev))
 			continue;
 
+		/* In the case of Execlists, we don't want to write the PDPs
+		 * in the legacy way (they live inside the context now) */
+		if (intel_enable_execlists(dev))
+			return 0;
+
 		ret = ppgtt->switch_mm(ppgtt, ring, true);
 		if (ret)
 			goto err_out;
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH 16/53] drm/i915/bdw: Skeleton for the new logical rings submission path
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
                   ` (14 preceding siblings ...)
  2014-06-13 15:37 ` [PATCH 15/53] drm/i915/bdw: Don't write PDP in the legacy way when using LRCs oscar.mateo
@ 2014-06-13 15:37 ` oscar.mateo
  2014-06-13 15:37 ` [PATCH 17/53] drm/i915/bdw: Generic logical ring init and cleanup oscar.mateo
                   ` (37 subsequent siblings)
  53 siblings, 0 replies; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:37 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Execlists are indeed a brave new world with respect to workload
submission to the GPU.

In previous versions of this series, I tried to impact the
legacy ringbuffer submission path as little as possible (mostly,
passing the context around and using the correct ringbuffer when I
needed one), but Daniel is afraid (probably with reason) that
these changes and, especially, future ones, will end up breaking
older gens.

This commit and some others coming next will try to limit the
damage by creating an alternative path for workload submission.
The first step is here: laying out a new ring init/fini.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h  |   2 +
 drivers/gpu/drm/i915/i915_gem.c  |  13 +++-
 drivers/gpu/drm/i915/intel_lrc.c | 132 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 144 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 79799d8..66d233f 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2426,6 +2426,8 @@ i915_gem_context_validate(struct drm_device *dev, struct drm_file *file,
 
 /* intel_lrc.c */
 bool intel_enable_execlists(struct drm_device *dev);
+void intel_logical_ring_cleanup(struct intel_engine_cs *ring);
+int intel_logical_rings_init(struct drm_device *dev);
 void intel_lr_context_free(struct intel_context *ctx);
 int intel_lr_context_deferred_create(struct intel_context *ctx,
 				     struct intel_engine_cs *ring);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 3768199..c5c06c9 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4694,7 +4694,10 @@ i915_gem_init_hw(struct drm_device *dev)
 
 	i915_gem_init_swizzling(dev);
 
-	ret = i915_gem_init_rings(dev);
+	if (intel_enable_execlists(dev))
+		ret = intel_logical_rings_init(dev);
+	else
+		ret = i915_gem_init_rings(dev);
 	if (ret)
 		return ret;
 
@@ -4766,8 +4769,12 @@ i915_gem_cleanup_ringbuffer(struct drm_device *dev)
 	struct intel_engine_cs *ring;
 	int i;
 
-	for_each_ring(ring, dev_priv, i)
-		intel_cleanup_ring_buffer(ring);
+	for_each_ring(ring, dev_priv, i) {
+		if (intel_enable_execlists(dev))
+			intel_logical_ring_cleanup(ring);
+		else
+			intel_cleanup_ring_buffer(ring);
+	}
 }
 
 int
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index b96bb45..e2958c1 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -86,6 +86,138 @@ bool intel_enable_execlists(struct drm_device *dev)
 	return HAS_LOGICAL_RING_CONTEXTS(dev) && USES_PPGTT(dev);
 }
 
+void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
+{
+}
+
+static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *ring)
+{
+	return 0;
+}
+
+static int logical_render_ring_init(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_engine_cs *ring = &dev_priv->ring[RCS];
+
+	ring->name = "render ring";
+	ring->id = RCS;
+	ring->mmio_base = RENDER_RING_BASE;
+	ring->irq_enable_mask =
+		GT_RENDER_USER_INTERRUPT << GEN8_RCS_IRQ_SHIFT;
+
+	return logical_ring_init(dev, ring);
+}
+
+static int logical_bsd_ring_init(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_engine_cs *ring = &dev_priv->ring[VCS];
+
+	ring->name = "bsd ring";
+	ring->id = VCS;
+	ring->mmio_base = GEN6_BSD_RING_BASE;
+	ring->irq_enable_mask =
+		GT_RENDER_USER_INTERRUPT << GEN8_VCS1_IRQ_SHIFT;
+
+	return logical_ring_init(dev, ring);
+}
+
+static int logical_bsd2_ring_init(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_engine_cs *ring = &dev_priv->ring[VCS2];
+
+	ring->name = "bsd2 ring";
+	ring->id = VCS2;
+	ring->mmio_base = GEN8_BSD2_RING_BASE;
+	ring->irq_enable_mask =
+		GT_RENDER_USER_INTERRUPT << GEN8_VCS2_IRQ_SHIFT;
+
+	return logical_ring_init(dev, ring);
+}
+
+static int logical_blt_ring_init(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_engine_cs *ring = &dev_priv->ring[BCS];
+
+	ring->name = "blitter ring";
+	ring->id = BCS;
+	ring->mmio_base = BLT_RING_BASE;
+	ring->irq_enable_mask =
+		GT_RENDER_USER_INTERRUPT << GEN8_BCS_IRQ_SHIFT;
+
+	return logical_ring_init(dev, ring);
+}
+
+static int logical_vebox_ring_init(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_engine_cs *ring = &dev_priv->ring[VECS];
+
+	ring->name = "video enhancement ring";
+	ring->id = VECS;
+	ring->mmio_base = VEBOX_RING_BASE;
+	ring->irq_enable_mask =
+		GT_RENDER_USER_INTERRUPT << GEN8_VECS_IRQ_SHIFT;
+
+	return logical_ring_init(dev, ring);
+}
+
+int intel_logical_rings_init(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	int ret;
+
+	ret = logical_render_ring_init(dev);
+	if (ret)
+		return ret;
+
+	if (HAS_BSD(dev)) {
+		ret = logical_bsd_ring_init(dev);
+		if (ret)
+			goto cleanup_render_ring;
+	}
+
+	if (HAS_BLT(dev)) {
+		ret = logical_blt_ring_init(dev);
+		if (ret)
+			goto cleanup_bsd_ring;
+	}
+
+	if (HAS_VEBOX(dev)) {
+		ret = logical_vebox_ring_init(dev);
+		if (ret)
+			goto cleanup_blt_ring;
+	}
+
+	if (HAS_BSD2(dev)) {
+		ret = logical_bsd2_ring_init(dev);
+		if (ret)
+			goto cleanup_vebox_ring;
+	}
+
+	ret = i915_gem_set_seqno(dev, ((u32)~0 - 0x1000));
+	if (ret)
+		goto cleanup_bsd2_ring;
+
+	return 0;
+
+cleanup_bsd2_ring:
+	intel_logical_ring_cleanup(&dev_priv->ring[VCS2]);
+cleanup_vebox_ring:
+	intel_logical_ring_cleanup(&dev_priv->ring[VECS]);
+cleanup_blt_ring:
+	intel_logical_ring_cleanup(&dev_priv->ring[BCS]);
+cleanup_bsd_ring:
+	intel_logical_ring_cleanup(&dev_priv->ring[VCS]);
+cleanup_render_ring:
+	intel_logical_ring_cleanup(&dev_priv->ring[RCS]);
+
+	return ret;
+}
+
 static int
 populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_obj,
 		    struct intel_engine_cs *ring, struct drm_i915_gem_object *ring_obj)
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH 17/53] drm/i915/bdw: Generic logical ring init and cleanup
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
                   ` (15 preceding siblings ...)
  2014-06-13 15:37 ` [PATCH 16/53] drm/i915/bdw: Skeleton for the new logical rings submission path oscar.mateo
@ 2014-06-13 15:37 ` oscar.mateo
  2014-06-13 15:37 ` [PATCH 18/53] drm/i915/bdw: New header file for LRs, LRCs and Execlists oscar.mateo
                   ` (36 subsequent siblings)
  53 siblings, 0 replies; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:37 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Allocate and populate the default LRC for every ring, call
gen-specific init/cleanup, init/fini the command parser and
set the status page (now inside the LRC object).

Stopping the ring before cleanup and initializing the seqnos
is left as a TODO task (we need more infrastructure in place
before we can achieve this).
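
Together with the previous patches, the LRC backing object now looks
roughly like this (a sketch, not a normative layout):

	/*
	 * LRC backing object:
	 *   page 0:  HW status page (ring->status_page maps here)
	 *   page 1:  register state (LRI headers plus reg/value pairs)
	 *   page 2+: the rest of the gen-specific context image
	 */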

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c |  5 ----
 drivers/gpu/drm/i915/intel_lrc.c        | 52 +++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.c | 17 +++++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.h |  6 +---
 4 files changed, 70 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 685c346..99bdd5e 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -449,11 +449,6 @@ int i915_gem_context_init(struct drm_device *dev)
 
 		/* NB: RCS will hold a ref for all rings */
 		ring->default_context = ctx;
-
-		/* FIXME: we only want to do this for initialized rings, but for that
-		 * we first need the new logical ring stuff */
-		if (dev_priv->lrc_enabled)
-			intel_lr_context_deferred_create(ctx, ring);
 	}
 
 	DRM_DEBUG_DRIVER("%s context support initialized\n",
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index e2958c1..55c61e8 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -88,10 +88,60 @@ bool intel_enable_execlists(struct drm_device *dev)
 
 void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
 {
+	if (!intel_ring_initialized(ring))
+		return;
+
+	/* TODO: make sure the ring is stopped */
+	ring->preallocated_lazy_request = NULL;
+	ring->outstanding_lazy_seqno = 0;
+
+	if (ring->cleanup)
+		ring->cleanup(ring);
+
+	i915_cmd_parser_fini_ring(ring);
+
+	if (ring->status_page.obj) {
+		kunmap(sg_page(ring->status_page.obj->pages->sgl));
+		ring->status_page.obj = NULL;
+	}
 }
 
 static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *ring)
 {
+	int ret;
+	struct intel_context *dctx = ring->default_context;
+	struct drm_i915_gem_object *dctx_obj;
+
+	/* Intentionally left blank. */
+	ring->buffer = NULL;
+
+	ring->dev = dev;
+	INIT_LIST_HEAD(&ring->active_list);
+	INIT_LIST_HEAD(&ring->request_list);
+	init_waitqueue_head(&ring->irq_queue);
+
+	ret = intel_lr_context_deferred_create(dctx, ring);
+	if (ret)
+		return ret;
+
+	/* The status page is offset 0 from the context object in LRCs. */
+	dctx_obj = dctx->engine[ring->id].obj;
+	ring->status_page.gfx_addr = i915_gem_obj_ggtt_offset(dctx_obj);
+	ring->status_page.page_addr = kmap(sg_page(dctx_obj->pages->sgl));
+	if (ring->status_page.page_addr == NULL)
+		return -ENOMEM;
+	ring->status_page.obj = dctx_obj;
+
+	ret = i915_cmd_parser_init_ring(ring);
+	if (ret)
+		return ret;
+
+	if (ring->init) {
+		ret = ring->init(ring);
+		if (ret)
+			return ret;
+	}
+
 	return 0;
 }
 
@@ -370,6 +420,8 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 	int ret;
 
 	WARN_ON(ctx->render_obj != NULL);
+	if (ctx->engine[ring->id].obj)
+		return 0;
 
 	context_size = round_up(get_lr_context_size(ring), 4096);
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 915f3d5..4a71dd4 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -40,6 +40,23 @@
  */
 #define CACHELINE_BYTES 64
 
+bool
+intel_ring_initialized(struct intel_engine_cs *ring)
+{
+	struct drm_device *dev = ring->dev;
+
+	if (!dev)
+		return false;
+
+	if (intel_enable_execlists(dev)) {
+		struct intel_context *dctx = ring->default_context;
+		struct drm_i915_gem_object *dctx_obj = dctx->engine[ring->id].obj;
+
+		return dctx_obj;
+	} else
+		return ring->buffer && ring->buffer->obj;
+}
+
 static inline int __ring_space(int head, int tail, int size)
 {
 	int space = head - (tail + I915_RING_FREE_SPACE);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index dee5b37..599b4ed 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -219,11 +219,7 @@ struct  intel_engine_cs {
 	u32 (*get_cmd_length_mask)(u32 cmd_header);
 };
 
-static inline bool
-intel_ring_initialized(struct intel_engine_cs *ring)
-{
-	return ring->buffer && ring->buffer->obj;
-}
+bool intel_ring_initialized(struct intel_engine_cs *ring);
 
 static inline unsigned
 intel_ring_flag(struct intel_engine_cs *ring)
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH 18/53] drm/i915/bdw: New header file for LRs, LRCs and Execlists
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
                   ` (16 preceding siblings ...)
  2014-06-13 15:37 ` [PATCH 17/53] drm/i915/bdw: Generic logical ring init and cleanup oscar.mateo
@ 2014-06-13 15:37 ` oscar.mateo
  2014-06-13 15:37 ` [PATCH 19/53] drm/i915: Extract pipe control fini & make init outside accesible oscar.mateo
                   ` (35 subsequent siblings)
  53 siblings, 0 replies; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:37 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Things are starting to get messy, and this helps a little.

At some point in time, it would be a good idea to split
intel_lrc.c/.h even further, but for the moment we just shove
everything together.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h  |  9 +--------
 drivers/gpu/drm/i915/intel_lrc.h | 16 ++++++++++++++++
 2 files changed, 17 insertions(+), 8 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/intel_lrc.h

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 66d233f..65a85ee 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -35,6 +35,7 @@
 #include "i915_reg.h"
 #include "intel_bios.h"
 #include "intel_ringbuffer.h"
+#include "intel_lrc.h"
 #include "i915_gem_gtt.h"
 #include <linux/io-mapping.h>
 #include <linux/i2c.h>
@@ -2424,14 +2425,6 @@ struct intel_context *
 i915_gem_context_validate(struct drm_device *dev, struct drm_file *file,
 			  struct intel_engine_cs *ring, const u32 ctx_id);
 
-/* intel_lrc.c */
-bool intel_enable_execlists(struct drm_device *dev);
-void intel_logical_ring_cleanup(struct intel_engine_cs *ring);
-int intel_logical_rings_init(struct drm_device *dev);
-void intel_lr_context_free(struct intel_context *ctx);
-int intel_lr_context_deferred_create(struct intel_context *ctx,
-				     struct intel_engine_cs *ring);
-
 /* i915_gem_render_state.c */
 int i915_gem_render_state_init(struct intel_engine_cs *ring);
 /* i915_gem_evict.c */
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
new file mode 100644
index 0000000..26b0949
--- /dev/null
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -0,0 +1,16 @@
+#ifndef _INTEL_LRC_H_
+#define _INTEL_LRC_H_
+
+/* Logical Rings */
+void intel_logical_ring_cleanup(struct intel_engine_cs *ring);
+int intel_logical_rings_init(struct drm_device *dev);
+
+/* Logical Ring Contexts */
+void intel_lr_context_free(struct intel_context *ctx);
+int intel_lr_context_deferred_create(struct intel_context *ctx,
+				     struct intel_engine_cs *ring);
+
+/* Execlists */
+bool intel_enable_execlists(struct drm_device *dev);
+
+#endif /* _INTEL_LRC_H_ */
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH 19/53] drm/i915: Extract pipe control fini & make init outside accesible
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
                   ` (17 preceding siblings ...)
  2014-06-13 15:37 ` [PATCH 18/53] drm/i915/bdw: New header file for LRs, LRCs and Execlists oscar.mateo
@ 2014-06-13 15:37 ` oscar.mateo
  2014-06-18 20:31   ` Daniel Vetter
  2014-06-19  0:04   ` Volkin, Bradley D
  2014-06-13 15:37 ` [PATCH 20/53] drm/i915/bdw: GEN-specific logical ring init oscar.mateo
                   ` (34 subsequent siblings)
  53 siblings, 2 replies; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:37 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

I plan to reuse these for the new logical ring path.

No functional changes.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 31 ++++++++++++++++++-------------
 drivers/gpu/drm/i915/intel_ringbuffer.h |  3 +++
 2 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 4a71dd4..254e4c5 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -574,8 +574,21 @@ out:
 	return ret;
 }
 
-static int
-init_pipe_control(struct intel_engine_cs *ring)
+void
+intel_fini_pipe_control(struct intel_engine_cs *ring)
+{
+	if (ring->scratch.obj == NULL)
+		return;
+
+	kunmap(sg_page(ring->scratch.obj->pages->sgl));
+	i915_gem_object_ggtt_unpin(ring->scratch.obj);
+
+	drm_gem_object_unreference(&ring->scratch.obj->base);
+	ring->scratch.obj = NULL;
+}
+
+int
+intel_init_pipe_control(struct intel_engine_cs *ring)
 {
 	int ret;
 
@@ -648,7 +661,7 @@ static int init_render_ring(struct intel_engine_cs *ring)
 			   _MASKED_BIT_ENABLE(GFX_REPLAY_MODE));
 
 	if (INTEL_INFO(dev)->gen >= 5) {
-		ret = init_pipe_control(ring);
+		ret = intel_init_pipe_control(ring);
 		if (ret)
 			return ret;
 	}
@@ -676,16 +689,8 @@ static void render_ring_cleanup(struct intel_engine_cs *ring)
 {
 	struct drm_device *dev = ring->dev;
 
-	if (ring->scratch.obj == NULL)
-		return;
-
-	if (INTEL_INFO(dev)->gen >= 5) {
-		kunmap(sg_page(ring->scratch.obj->pages->sgl));
-		i915_gem_object_ggtt_unpin(ring->scratch.obj);
-	}
-
-	drm_gem_object_unreference(&ring->scratch.obj->base);
-	ring->scratch.obj = NULL;
+	if (INTEL_INFO(dev)->gen >= 5)
+		intel_fini_pipe_control(ring);
 }
 
 static int gen6_signal(struct intel_engine_cs *signaller,
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 599b4ed..42026a1 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -221,6 +221,9 @@ struct  intel_engine_cs {
 
 bool intel_ring_initialized(struct intel_engine_cs *ring);
 
+void intel_fini_pipe_control(struct intel_engine_cs *ring);
+int intel_init_pipe_control(struct intel_engine_cs *ring);
+
 static inline unsigned
 intel_ring_flag(struct intel_engine_cs *ring)
 {
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH 20/53] drm/i915/bdw: GEN-specific logical ring init
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
                   ` (18 preceding siblings ...)
  2014-06-13 15:37 ` [PATCH 19/53] drm/i915: Extract pipe control fini & make init outside accesible oscar.mateo
@ 2014-06-13 15:37 ` oscar.mateo
  2014-06-13 15:37 ` [PATCH 21/53] drm/i915/bdw: GEN-specific logical ring set/get seqno oscar.mateo
                   ` (33 subsequent siblings)
  53 siblings, 0 replies; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:37 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Logical rings do not need most of the initialization their
legacy ringbuffer counterparts do: we just need the pipe
control object for the render ring, to enable Execlists on the
hardware, and a few workarounds.
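
Note that RING_MODE_GEN7 is a "masked" register: the upper 16 bits select
which of the lower 16 bits actually get written. A sketch of the macro
semantics (as defined in i915_reg.h):

	/* _MASKED_BIT_ENABLE(a)  == ((a) << 16) | (a)	-> set bit(s) a
	 * _MASKED_BIT_DISABLE(a) == ((a) << 16)	-> clear bit(s) a
	 * A single write can thus enable GFX_RUN_LIST_ENABLE and disable
	 * GFX_REPLAY_MODE without a read-modify-write cycle. */
	I915_WRITE(RING_MODE_GEN7(ring),
		   _MASKED_BIT_DISABLE(GFX_REPLAY_MODE) |
		   _MASKED_BIT_ENABLE(GFX_RUN_LIST_ENABLE));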

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 54 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 55c61e8..b0da0bc 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -86,6 +86,49 @@ bool intel_enable_execlists(struct drm_device *dev)
 	return HAS_LOGICAL_RING_CONTEXTS(dev) && USES_PPGTT(dev);
 }
 
+static int gen8_init_common_ring(struct intel_engine_cs *ring)
+{
+	struct drm_device *dev = ring->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+
+	I915_WRITE(RING_MODE_GEN7(ring),
+		_MASKED_BIT_DISABLE(GFX_REPLAY_MODE) |
+		_MASKED_BIT_ENABLE(GFX_RUN_LIST_ENABLE));
+	POSTING_READ(RING_MODE_GEN7(ring));
+	DRM_DEBUG_DRIVER("Execlists enabled for %s\n", ring->name);
+
+	memset(&ring->hangcheck, 0, sizeof(ring->hangcheck));
+
+	return 0;
+}
+
+static int gen8_init_render_ring(struct intel_engine_cs *ring)
+{
+	struct drm_device *dev = ring->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	int ret;
+
+	ret = gen8_init_common_ring(ring);
+	if (ret)
+		return ret;
+
+	/* We need to disable the AsyncFlip performance optimisations in order
+	 * to use MI_WAIT_FOR_EVENT within the CS. It should already be
+	 * programmed to '1' on all products.
+	 *
+	 * WaDisableAsyncFlipPerfMode:snb,ivb,hsw,vlv,bdw,chv
+	 */
+	I915_WRITE(MI_MODE, _MASKED_BIT_ENABLE(ASYNC_FLIP_PERF_DISABLE));
+
+	ret = intel_init_pipe_control(ring);
+	if (ret)
+		return ret;
+
+	I915_WRITE(INSTPM, _MASKED_BIT_ENABLE(INSTPM_FORCE_ORDERING));
+
+	return ret;
+}
+
 void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
 {
 	if (!intel_ring_initialized(ring))
@@ -156,6 +199,9 @@ static int logical_render_ring_init(struct drm_device *dev)
 	ring->irq_enable_mask =
 		GT_RENDER_USER_INTERRUPT << GEN8_RCS_IRQ_SHIFT;
 
+	ring->init = gen8_init_render_ring;
+	ring->cleanup = intel_fini_pipe_control;
+
 	return logical_ring_init(dev, ring);
 }
 
@@ -170,6 +216,8 @@ static int logical_bsd_ring_init(struct drm_device *dev)
 	ring->irq_enable_mask =
 		GT_RENDER_USER_INTERRUPT << GEN8_VCS1_IRQ_SHIFT;
 
+	ring->init = gen8_init_common_ring;
+
 	return logical_ring_init(dev, ring);
 }
 
@@ -184,6 +232,8 @@ static int logical_bsd2_ring_init(struct drm_device *dev)
 	ring->irq_enable_mask =
 		GT_RENDER_USER_INTERRUPT << GEN8_VCS2_IRQ_SHIFT;
 
+	ring->init = gen8_init_common_ring;
+
 	return logical_ring_init(dev, ring);
 }
 
@@ -198,6 +248,8 @@ static int logical_blt_ring_init(struct drm_device *dev)
 	ring->irq_enable_mask =
 		GT_RENDER_USER_INTERRUPT << GEN8_BCS_IRQ_SHIFT;
 
+	ring->init = gen8_init_common_ring;
+
 	return logical_ring_init(dev, ring);
 }
 
@@ -212,6 +264,8 @@ static int logical_vebox_ring_init(struct drm_device *dev)
 	ring->irq_enable_mask =
 		GT_RENDER_USER_INTERRUPT << GEN8_VECS_IRQ_SHIFT;
 
+	ring->init = gen8_init_common_ring;
+
 	return logical_ring_init(dev, ring);
 }
 
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH 21/53] drm/i915/bdw: GEN-specific logical ring set/get seqno
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
                   ` (19 preceding siblings ...)
  2014-06-13 15:37 ` [PATCH 20/53] drm/i915/bdw: GEN-specific logical ring init oscar.mateo
@ 2014-06-13 15:37 ` oscar.mateo
  2014-06-13 15:37 ` [PATCH 22/53] drm/i915: Make ring_space more generic and outside accesible oscar.mateo
                   ` (32 subsequent siblings)
  53 siblings, 0 replies; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:37 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

No mystery here: the seqno is still retrieved from the engine's
HW status page (the one in the default context; for the moment,
I see no reason to worry about other contexts' HWS pages).
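
For context, these helpers are simple DWord accesses into the CPU mapping
of the status page; roughly (a sketch of the existing helpers' behaviour):

	/* The status page is a 4K page of u32 slots; the seqno lives at
	 * I915_GEM_HWS_INDEX. */
	seqno = ring->status_page.page_addr[I915_GEM_HWS_INDEX];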

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index b0da0bc..6c62ae5 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -129,6 +129,16 @@ static int gen8_init_render_ring(struct intel_engine_cs *ring)
 	return ret;
 }
 
+static u32 gen8_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency)
+{
+	return intel_read_status_page(ring, I915_GEM_HWS_INDEX);
+}
+
+static void gen8_set_seqno(struct intel_engine_cs *ring, u32 seqno)
+{
+	intel_write_status_page(ring, I915_GEM_HWS_INDEX, seqno);
+}
+
 void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
 {
 	if (!intel_ring_initialized(ring))
@@ -201,6 +211,8 @@ static int logical_render_ring_init(struct drm_device *dev)
 
 	ring->init = gen8_init_render_ring;
 	ring->cleanup = intel_fini_pipe_control;
+	ring->get_seqno = gen8_get_seqno;
+	ring->set_seqno = gen8_set_seqno;
 
 	return logical_ring_init(dev, ring);
 }
@@ -217,6 +229,8 @@ static int logical_bsd_ring_init(struct drm_device *dev)
 		GT_RENDER_USER_INTERRUPT << GEN8_VCS1_IRQ_SHIFT;
 
 	ring->init = gen8_init_common_ring;
+	ring->get_seqno = gen8_get_seqno;
+	ring->set_seqno = gen8_set_seqno;
 
 	return logical_ring_init(dev, ring);
 }
@@ -233,6 +247,8 @@ static int logical_bsd2_ring_init(struct drm_device *dev)
 		GT_RENDER_USER_INTERRUPT << GEN8_VCS2_IRQ_SHIFT;
 
 	ring->init = gen8_init_common_ring;
+	ring->get_seqno = gen8_get_seqno;
+	ring->set_seqno = gen8_set_seqno;
 
 	return logical_ring_init(dev, ring);
 }
@@ -249,6 +265,8 @@ static int logical_blt_ring_init(struct drm_device *dev)
 		GT_RENDER_USER_INTERRUPT << GEN8_BCS_IRQ_SHIFT;
 
 	ring->init = gen8_init_common_ring;
+	ring->get_seqno = gen8_get_seqno;
+	ring->set_seqno = gen8_set_seqno;
 
 	return logical_ring_init(dev, ring);
 }
@@ -265,6 +283,8 @@ static int logical_vebox_ring_init(struct drm_device *dev)
 		GT_RENDER_USER_INTERRUPT << GEN8_VECS_IRQ_SHIFT;
 
 	ring->init = gen8_init_common_ring;
+	ring->get_seqno = gen8_get_seqno;
+	ring->set_seqno = gen8_set_seqno;
 
 	return logical_ring_init(dev, ring);
 }
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH 22/53] drm/i915: Make ring_space more generic and outside accesible
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
                   ` (20 preceding siblings ...)
  2014-06-13 15:37 ` [PATCH 21/53] drm/i915/bdw: GEN-specific logical ring set/get seqno oscar.mateo
@ 2014-06-13 15:37 ` oscar.mateo
  2014-06-13 15:37 ` [PATCH 23/53] drm/i915: Generalize intel_ring_get_tail oscar.mateo
                   ` (31 subsequent siblings)
  53 siblings, 0 replies; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:37 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

I want to reuse it from the new logical ring code (as it seems
innocent enough).
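
For clarity, the math being exported, with a worked example (assuming the
usual I915_RING_FREE_SPACE pad of 64 bytes):

	/* space = head - (tail + I915_RING_FREE_SPACE); wrap if negative.
	 * E.g. head = 0x40, tail = 0xe00, size = 0x1000:
	 *   0x40 - (0xe00 + 0x40) = -0xe00, +0x1000 -> 0x200 bytes free. */
	int space = head - (tail + I915_RING_FREE_SPACE);
	if (space < 0)
		space += size;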

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 26 ++++++--------------------
 drivers/gpu/drm/i915/intel_ringbuffer.h | 13 +++++++++++++
 2 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 254e4c5..249804c 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -57,20 +57,6 @@ intel_ring_initialized(struct intel_engine_cs *ring)
 		return ring->buffer && ring->buffer->obj;
 }
 
-static inline int __ring_space(int head, int tail, int size)
-{
-	int space = head - (tail + I915_RING_FREE_SPACE);
-	if (space < 0)
-		space += size;
-	return space;
-}
-
-static inline int ring_space(struct intel_engine_cs *ring)
-{
-	struct intel_ringbuffer *ringbuf = ring->buffer;
-	return __ring_space(ringbuf->head & HEAD_ADDR, ringbuf->tail, ringbuf->size);
-}
-
 static bool intel_ring_stopped(struct intel_engine_cs *ring)
 {
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
@@ -562,7 +548,7 @@ static int init_ring_common(struct intel_engine_cs *ring)
 	else {
 		ringbuf->head = I915_READ_HEAD(ring);
 		ringbuf->tail = I915_READ_TAIL(ring) & TAIL_ADDR;
-		ringbuf->space = ring_space(ring);
+		ringbuf->space = intel_ring_space(ringbuf);
 		ringbuf->last_retired_head = -1;
 	}
 
@@ -1554,13 +1540,13 @@ static int intel_ring_wait_request(struct intel_engine_cs *ring, int n)
 		ringbuf->head = ringbuf->last_retired_head;
 		ringbuf->last_retired_head = -1;
 
-		ringbuf->space = ring_space(ring);
+		ringbuf->space = intel_ring_space(ringbuf);
 		if (ringbuf->space >= n)
 			return 0;
 	}
 
 	list_for_each_entry(request, &ring->request_list, list) {
-		if (__ring_space(request->tail, ringbuf->tail, ringbuf->size) >= n) {
+		if (__intel_ring_space(request->tail, ringbuf->tail, ringbuf->size) >= n) {
 			seqno = request->seqno;
 			break;
 		}
@@ -1577,7 +1563,7 @@ static int intel_ring_wait_request(struct intel_engine_cs *ring, int n)
 	ringbuf->head = ringbuf->last_retired_head;
 	ringbuf->last_retired_head = -1;
 
-	ringbuf->space = ring_space(ring);
+	ringbuf->space = intel_ring_space(ringbuf);
 	return 0;
 }
 
@@ -1606,7 +1592,7 @@ static int ring_wait_for_space(struct intel_engine_cs *ring, int n)
 	trace_i915_ring_wait_begin(ring);
 	do {
 		ringbuf->head = I915_READ_HEAD(ring);
-		ringbuf->space = ring_space(ring);
+		ringbuf->space = intel_ring_space(ringbuf);
 		if (ringbuf->space >= n) {
 			ret = 0;
 			break;
@@ -1658,7 +1644,7 @@ static int intel_wrap_ring_buffer(struct intel_engine_cs *ring)
 		iowrite32(MI_NOOP, virt++);
 
 	ringbuf->tail = 0;
-	ringbuf->space = ring_space(ring);
+	ringbuf->space = intel_ring_space(ringbuf);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 42026a1..dc944fe 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -224,6 +224,19 @@ bool intel_ring_initialized(struct intel_engine_cs *ring);
 void intel_fini_pipe_control(struct intel_engine_cs *ring);
 int intel_init_pipe_control(struct intel_engine_cs *ring);
 
+static inline int __intel_ring_space(int head, int tail, int size)
+{
+	int space = head - (tail + I915_RING_FREE_SPACE);
+	if (space < 0)
+		space += size;
+	return space;
+}
+
+static inline int intel_ring_space(struct intel_ringbuffer *ringbuf)
+{
+	return __intel_ring_space(ringbuf->head & HEAD_ADDR, ringbuf->tail, ringbuf->size);
+}
+
 static inline unsigned
 intel_ring_flag(struct intel_engine_cs *ring)
 {
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH 23/53] drm/i915: Generalize intel_ring_get_tail
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
                   ` (21 preceding siblings ...)
  2014-06-13 15:37 ` [PATCH 22/53] drm/i915: Make ring_space more generic and outside accesible oscar.mateo
@ 2014-06-13 15:37 ` oscar.mateo
  2014-06-20 20:17   ` Volkin, Bradley D
  2014-06-13 15:37 ` [PATCH 24/53] drm/i915: Make intel_ring_stopped outside accesible oscar.mateo
                   ` (30 subsequent siblings)
  53 siblings, 1 reply; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:37 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Reusing stuff, a penny at a time.
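
The payoff is that callers can now hand in any ringbuffer, not just the
engine's global one; for example (the second form is what the logical
ring code in later patches will use):

	intel_ring_get_tail(ring->buffer);			/* legacy path */
	intel_ring_get_tail(ctx->engine[ring->id].ringbuf);	/* LRC path */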

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c         | 4 ++--
 drivers/gpu/drm/i915/intel_ringbuffer.h | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index c5c06c9..dcdffab 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2320,7 +2320,7 @@ int __i915_add_request(struct intel_engine_cs *ring,
 	u32 request_ring_position, request_start;
 	int ret;
 
-	request_start = intel_ring_get_tail(ring);
+	request_start = intel_ring_get_tail(ring->buffer);
 	/*
 	 * Emit any outstanding flushes - execbuf can fail to emit the flush
 	 * after having emitted the batchbuffer command. Hence we need to fix
@@ -2341,7 +2341,7 @@ int __i915_add_request(struct intel_engine_cs *ring,
 	 * GPU processing the request, we never over-estimate the
 	 * position of the head.
 	 */
-	request_ring_position = intel_ring_get_tail(ring);
+	request_ring_position = intel_ring_get_tail(ring->buffer);
 
 	ret = ring->add_request(ring);
 	if (ret)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index dc944fe..1558afa 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -334,9 +334,9 @@ void intel_destroy_ring_buffer(struct intel_ringbuffer *ringbuf);
 int intel_allocate_ring_buffer(struct drm_device *dev,
 			       struct intel_ringbuffer *ringbuf);
 
-static inline u32 intel_ring_get_tail(struct intel_engine_cs *ring)
+static inline u32 intel_ring_get_tail(struct intel_ringbuffer *ringbuf)
 {
-	return ring->buffer->tail;
+	return ringbuf->tail;
 }
 
 static inline u32 intel_ring_get_seqno(struct intel_engine_cs *ring)
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH 24/53] drm/i915: Make intel_ring_stopped outside accesible
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
                   ` (22 preceding siblings ...)
  2014-06-13 15:37 ` [PATCH 23/53] drm/i915: Generalize intel_ring_get_tail oscar.mateo
@ 2014-06-13 15:37 ` oscar.mateo
  2014-06-13 15:37 ` [PATCH 25/53] drm/i915/bdw: GEN-specific logical ring submit context (somewhat) oscar.mateo
                   ` (29 subsequent siblings)
  53 siblings, 0 replies; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:37 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

It is generic enough to be reused.
Trivial change.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 2 +-
 drivers/gpu/drm/i915/intel_ringbuffer.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 249804c..137ee9a 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -57,7 +57,7 @@ intel_ring_initialized(struct intel_engine_cs *ring)
 		return ring->buffer && ring->buffer->obj;
 }
 
-static bool intel_ring_stopped(struct intel_engine_cs *ring)
+bool intel_ring_stopped(struct intel_engine_cs *ring)
 {
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
 	return dev_priv->gpu_error.stop_rings & intel_ring_flag(ring);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 1558afa..ff8753c 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -220,6 +220,7 @@ struct  intel_engine_cs {
 };
 
 bool intel_ring_initialized(struct intel_engine_cs *ring);
+bool intel_ring_stopped(struct intel_engine_cs *ring);
 
 void intel_fini_pipe_control(struct intel_engine_cs *ring);
 int intel_init_pipe_control(struct intel_engine_cs *ring);
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH 25/53] drm/i915/bdw: GEN-specific logical ring submit context (somewhat)
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
                   ` (23 preceding siblings ...)
  2014-06-13 15:37 ` [PATCH 24/53] drm/i915: Make intel_ring_stopped outside accesible oscar.mateo
@ 2014-06-13 15:37 ` oscar.mateo
  2014-06-20 20:28   ` Volkin, Bradley D
  2014-06-13 15:37 ` [PATCH 26/53] drm/i915/bdw: New logical ring submission mechanism oscar.mateo
                   ` (28 subsequent siblings)
  53 siblings, 1 reply; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:37 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

For the moment, just mark the place (we still need to do a lot of
preparation before execlists are ready to start submitting things).
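
For reference, the call site introduced in a later patch looks like this,
so the "value" parameter is the tail of the per-context ringbuffer being
submitted:

	/* From intel_logical_ring_advance_and_submit() (patch 26): */
	ring->submit_ctx(ring, ctx, ringbuf->tail);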

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c        | 11 +++++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.h |  6 ++++++
 2 files changed, 17 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 6c62ae5..02fc3d0 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -139,6 +139,12 @@ static void gen8_set_seqno(struct intel_engine_cs *ring, u32 seqno)
 	intel_write_status_page(ring, I915_GEM_HWS_INDEX, seqno);
 }
 
+static void gen8_submit_ctx(struct intel_engine_cs *ring,
+			    struct intel_context *ctx, u32 value)
+{
+	DRM_ERROR("Execlists still not ready!\n");
+}
+
 void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
 {
 	if (!intel_ring_initialized(ring))
@@ -213,6 +219,7 @@ static int logical_render_ring_init(struct drm_device *dev)
 	ring->cleanup = intel_fini_pipe_control;
 	ring->get_seqno = gen8_get_seqno;
 	ring->set_seqno = gen8_set_seqno;
+	ring->submit_ctx = gen8_submit_ctx;
 
 	return logical_ring_init(dev, ring);
 }
@@ -231,6 +238,7 @@ static int logical_bsd_ring_init(struct drm_device *dev)
 	ring->init = gen8_init_common_ring;
 	ring->get_seqno = gen8_get_seqno;
 	ring->set_seqno = gen8_set_seqno;
+	ring->submit_ctx = gen8_submit_ctx;
 
 	return logical_ring_init(dev, ring);
 }
@@ -249,6 +257,7 @@ static int logical_bsd2_ring_init(struct drm_device *dev)
 	ring->init = gen8_init_common_ring;
 	ring->get_seqno = gen8_get_seqno;
 	ring->set_seqno = gen8_set_seqno;
+	ring->submit_ctx = gen8_submit_ctx;
 
 	return logical_ring_init(dev, ring);
 }
@@ -267,6 +276,7 @@ static int logical_blt_ring_init(struct drm_device *dev)
 	ring->init = gen8_init_common_ring;
 	ring->get_seqno = gen8_get_seqno;
 	ring->set_seqno = gen8_set_seqno;
+	ring->submit_ctx = gen8_submit_ctx;
 
 	return logical_ring_init(dev, ring);
 }
@@ -285,6 +295,7 @@ static int logical_vebox_ring_init(struct drm_device *dev)
 	ring->init = gen8_init_common_ring;
 	ring->get_seqno = gen8_get_seqno;
 	ring->set_seqno = gen8_set_seqno;
+	ring->submit_ctx = gen8_submit_ctx;
 
 	return logical_ring_init(dev, ring);
 }
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index ff8753c..1a6df42 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -79,6 +79,8 @@ struct intel_ringbuffer {
 	u32 last_retired_head;
 };
 
+struct intel_context;
+
 struct  intel_engine_cs {
 	const char	*name;
 	enum intel_ring_id {
@@ -146,6 +148,10 @@ struct  intel_engine_cs {
 				  unsigned int num_dwords);
 	} semaphore;
 
+	/* Execlists */
+	void		(*submit_ctx)(struct intel_engine_cs *ring,
+				      struct intel_context *ctx, u32 value);
+
 	/**
 	 * List of objects currently involved in rendering from the
 	 * ringbuffer.
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH 26/53] drm/i915/bdw: New logical ring submission mechanism
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
                   ` (24 preceding siblings ...)
  2014-06-13 15:37 ` [PATCH 25/53] drm/i915/bdw: GEN-specific logical ring submit context (somewhat) oscar.mateo
@ 2014-06-13 15:37 ` oscar.mateo
  2014-06-20 21:00   ` Volkin, Bradley D
  2014-06-13 15:37 ` [PATCH 27/53] drm/i915/bdw: GEN-specific logical ring emit request oscar.mateo
                   ` (27 subsequent siblings)
  53 siblings, 1 reply; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:37 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Well, new-ish: if all this code looks familiar, that's because it's
a clone of the existing submission mechanism (with some modifications
here and there to adapt it to LRCs and Execlists).

And why did we do this? Execlists offer several advantages, like
control over when the GPU is done with a given workload, that can
help simplify the submission mechanism, no doubt, but I am interested
in getting Execlists to work first and foremost. As we are creating
a parallel submission mechanism (even if it's just a clone), we can
now start improving it without the fear of breaking old gens.
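
Condensed, the wait-for-space strategy on a per-context ringbuffer is
(an outline, in comment form, of the code below):

	/* 1) Reuse space already freed by retired requests
	 *    (last_retired_head).
	 * 2) Otherwise, find the oldest request whose completion frees
	 *    enough space, wait on its seqno, then retire.
	 * 3) Last resort: poll the ring HEAD until space appears, or bail
	 *    out on a signal, a wedged GPU or a 60 second timeout. */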

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 214 +++++++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.h |  18 ++++
 2 files changed, 232 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 02fc3d0..89aed7a 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -86,6 +86,220 @@ bool intel_enable_execlists(struct drm_device *dev)
 	return HAS_LOGICAL_RING_CONTEXTS(dev) && USES_PPGTT(dev);
 }
 
+static inline struct intel_ringbuffer *
+logical_ringbuf_get(struct intel_engine_cs *ring, struct intel_context *ctx)
+{
+	return ctx->engine[ring->id].ringbuf;
+}
+
+void intel_logical_ring_advance_and_submit(struct intel_engine_cs *ring,
+					   struct intel_context *ctx)
+{
+	struct intel_ringbuffer *ringbuf = logical_ringbuf_get(ring, ctx);
+
+	intel_logical_ring_advance(ringbuf);
+
+	if (intel_ring_stopped(ring))
+		return;
+
+	ring->submit_ctx(ring, ctx, ringbuf->tail);
+}
+
+static int logical_ring_alloc_seqno(struct intel_engine_cs *ring,
+				    struct intel_context *ctx)
+{
+	if (ring->outstanding_lazy_seqno)
+		return 0;
+
+	if (ring->preallocated_lazy_request == NULL) {
+		struct drm_i915_gem_request *request;
+
+		request = kmalloc(sizeof(*request), GFP_KERNEL);
+		if (request == NULL)
+			return -ENOMEM;
+
+		ring->preallocated_lazy_request = request;
+	}
+
+	return i915_gem_get_seqno(ring->dev, &ring->outstanding_lazy_seqno);
+}
+
+static int logical_ring_wait_request(struct intel_engine_cs *ring,
+				     struct intel_ringbuffer *ringbuf,
+				     struct intel_context *ctx,
+				     int bytes)
+{
+	struct drm_i915_gem_request *request;
+	u32 seqno = 0;
+	int ret;
+
+	if (ringbuf->last_retired_head != -1) {
+		ringbuf->head = ringbuf->last_retired_head;
+		ringbuf->last_retired_head = -1;
+
+		ringbuf->space = intel_ring_space(ringbuf);
+		if (ringbuf->space >= bytes)
+			return 0;
+	}
+
+	list_for_each_entry(request, &ring->request_list, list) {
+		if (__intel_ring_space(request->tail, ringbuf->tail,
+				ringbuf->size) >= bytes) {
+			seqno = request->seqno;
+			break;
+		}
+	}
+
+	if (seqno == 0)
+		return -ENOSPC;
+
+	ret = i915_wait_seqno(ring, seqno);
+	if (ret)
+		return ret;
+
+	/* TODO: make sure we update the right ringbuffer's last_retired_head
+	 * when retiring requests */
+	i915_gem_retire_requests_ring(ring);
+	ringbuf->head = ringbuf->last_retired_head;
+	ringbuf->last_retired_head = -1;
+
+	ringbuf->space = intel_ring_space(ringbuf);
+	return 0;
+}
+
+static int logical_ring_wait_for_space(struct intel_engine_cs *ring,
+						   struct intel_ringbuffer *ringbuf,
+						   struct intel_context *ctx,
+						   int bytes)
+{
+	struct drm_device *dev = ring->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	unsigned long end;
+	int ret;
+
+	ret = logical_ring_wait_request(ring, ringbuf, ctx, bytes);
+	if (ret != -ENOSPC)
+		return ret;
+
+	/* Force the context submission in case we have been skipping it */
+	intel_logical_ring_advance_and_submit(ring, ctx);
+
+	/* With GEM the hangcheck timer should kick us out of the loop,
+	 * leaving it early runs the risk of corrupting GEM state (due
+	 * to running on almost untested codepaths). But on resume
+	 * timers don't work yet, so prevent a complete hang in that
+	 * case by choosing an insanely large timeout. */
+	end = jiffies + 60 * HZ;
+
+	do {
+		ringbuf->head = I915_READ_HEAD(ring);
+		ringbuf->space = intel_ring_space(ringbuf);
+		if (ringbuf->space >= bytes) {
+			ret = 0;
+			break;
+		}
+
+		if (!drm_core_check_feature(dev, DRIVER_MODESET) &&
+		    dev->primary->master) {
+			struct drm_i915_master_private *master_priv = dev->primary->master->driver_priv;
+			if (master_priv->sarea_priv)
+				master_priv->sarea_priv->perf_boxes |= I915_BOX_WAIT;
+		}
+
+		msleep(1);
+
+		if (dev_priv->mm.interruptible && signal_pending(current)) {
+			ret = -ERESTARTSYS;
+			break;
+		}
+
+		ret = i915_gem_check_wedge(&dev_priv->gpu_error,
+					   dev_priv->mm.interruptible);
+		if (ret)
+			break;
+
+		if (time_after(jiffies, end)) {
+			ret = -EBUSY;
+			break;
+		}
+	} while (1);
+
+	return ret;
+}
+
+static int logical_ring_wrap_buffer(struct intel_engine_cs *ring,
+						struct intel_ringbuffer *ringbuf,
+						struct intel_context *ctx)
+{
+	uint32_t __iomem *virt;
+	int rem = ringbuf->size - ringbuf->tail;
+
+	if (ringbuf->space < rem) {
+		int ret = logical_ring_wait_for_space(ring, ringbuf, ctx, rem);
+		if (ret)
+			return ret;
+	}
+
+	virt = ringbuf->virtual_start + ringbuf->tail;
+	rem /= 4;
+	while (rem--)
+		iowrite32(MI_NOOP, virt++);
+
+	ringbuf->tail = 0;
+	ringbuf->space = intel_ring_space(ringbuf);
+
+	return 0;
+}
+
+static int logical_ring_prepare(struct intel_engine_cs *ring,
+				struct intel_ringbuffer *ringbuf,
+				struct intel_context *ctx,
+				int bytes)
+{
+	int ret;
+
+	if (unlikely(ringbuf->tail + bytes > ringbuf->effective_size)) {
+		ret = logical_ring_wrap_buffer(ring, ringbuf, ctx);
+		if (unlikely(ret))
+			return ret;
+	}
+
+	if (unlikely(ringbuf->space < bytes)) {
+		ret = logical_ring_wait_for_space(ring, ringbuf, ctx, bytes);
+		if (unlikely(ret))
+			return ret;
+	}
+
+	return 0;
+}
+
+int intel_logical_ring_begin(struct intel_engine_cs *ring,
+			     struct intel_context *ctx,
+			     int num_dwords)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct intel_ringbuffer *ringbuf = logical_ringbuf_get(ring, ctx);
+	int ret;
+
+	ret = i915_gem_check_wedge(&dev_priv->gpu_error,
+				   dev_priv->mm.interruptible);
+	if (ret)
+		return ret;
+
+	ret = logical_ring_prepare(ring, ringbuf, ctx,
+			num_dwords * sizeof(uint32_t));
+	if (ret)
+		return ret;
+
+	/* Preallocate the olr before touching the ring */
+	ret = logical_ring_alloc_seqno(ring, ctx);
+	if (ret)
+		return ret;
+
+	ringbuf->space -= num_dwords * sizeof(uint32_t);
+	return 0;
+}
+
 static int gen8_init_common_ring(struct intel_engine_cs *ring)
 {
 	struct drm_device *dev = ring->dev;
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 26b0949..686ebf5 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -5,6 +5,24 @@
 void intel_logical_ring_cleanup(struct intel_engine_cs *ring);
 int intel_logical_rings_init(struct drm_device *dev);
 
+void intel_logical_ring_advance_and_submit(struct intel_engine_cs *ring,
+					   struct intel_context *ctx);
+
+static inline void intel_logical_ring_advance(struct intel_ringbuffer *ringbuf)
+{
+	ringbuf->tail &= ringbuf->size - 1;
+}
+
+static inline void intel_logical_ring_emit(struct intel_ringbuffer *ringbuf, u32 data)
+{
+	iowrite32(data, ringbuf->virtual_start + ringbuf->tail);
+	ringbuf->tail += 4;
+}
+
+int intel_logical_ring_begin(struct intel_engine_cs *ring,
+			     struct intel_context *ctx,
+			     int num_dwords);
+
 /* Logical Ring Contexts */
 void intel_lr_context_free(struct intel_context *ctx);
 int intel_lr_context_deferred_create(struct intel_context *ctx,
-- 
1.9.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH 27/53] drm/i915/bdw: GEN-specific logical ring emit request
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
                   ` (25 preceding siblings ...)
  2014-06-13 15:37 ` [PATCH 26/53] drm/i915/bdw: New logical ring submission mechanism oscar.mateo
@ 2014-06-13 15:37 ` oscar.mateo
  2014-06-20 21:18   ` Volkin, Bradley D
  2014-06-13 15:37 ` [PATCH 28/53] drm/i915/bdw: GEN-specific logical ring emit flush oscar.mateo
                   ` (26 subsequent siblings)
  53 siblings, 1 reply; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:37 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Very similar to the legacy add_request, only modified to account for
the logical ringbuffer.
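
For reference, a map of the 6 dwords reserved by intel_logical_ring_begin()
in gen8_emit_request below (my reading: the "+ 1" in "MI_FLUSH_DW + 1"
presumably bumps the command's dword length field to cover the extra
address dword):

	/* DW0: MI_FLUSH_DW | MI_INVALIDATE_TLB | MI_FLUSH_DW_OP_STOREDW
	 * DW1: HWS seqno slot (I915_GEM_HWS_INDEX) | MI_FLUSH_DW_USE_GTT
	 * DW2: 0 (upper address dword)
	 * DW3: ring->outstanding_lazy_seqno (the post-sync write value)
	 * DW4: MI_USER_INTERRUPT
	 * DW5: MI_NOOP (padding)
	 */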

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_reg.h         |  1 +
 drivers/gpu/drm/i915/intel_lrc.c        | 61 +++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.h |  2 ++
 3 files changed, 64 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 9c8692a..63ec3ea 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -267,6 +267,7 @@
 #define   MI_FORCE_RESTORE		(1<<1)
 #define   MI_RESTORE_INHIBIT		(1<<0)
 #define MI_STORE_DWORD_IMM	MI_INSTR(0x20, 1)
+#define MI_STORE_DWORD_IMM_GEN8	MI_INSTR(0x20, 2)
 #define   MI_MEM_VIRTUAL	(1 << 22) /* 965+ only */
 #define MI_STORE_DWORD_INDEX	MI_INSTR(0x21, 1)
 #define   MI_STORE_DWORD_INDEX_SHIFT 2
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 89aed7a..3debe8b 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -359,6 +359,62 @@ static void gen8_submit_ctx(struct intel_engine_cs *ring,
 	DRM_ERROR("Execlists still not ready!\n");
 }
 
+static int gen8_emit_request(struct intel_engine_cs *ring,
+			     struct intel_context *ctx)
+{
+	struct intel_ringbuffer *ringbuf = logical_ringbuf_get(ring, ctx);
+	u32 cmd;
+	int ret;
+
+	ret = intel_logical_ring_begin(ring, ctx, 6);
+	if (ret)
+		return ret;
+
+	cmd = MI_FLUSH_DW + 1;
+	cmd |= MI_INVALIDATE_TLB;
+	cmd |= MI_FLUSH_DW_OP_STOREDW;
+
+	intel_logical_ring_emit(ringbuf, cmd);
+	intel_logical_ring_emit(ringbuf,
+				(ring->status_page.gfx_addr +
+				(I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT)) |
+				MI_FLUSH_DW_USE_GTT);
+	intel_logical_ring_emit(ringbuf, 0);
+	intel_logical_ring_emit(ringbuf, ring->outstanding_lazy_seqno);
+	intel_logical_ring_emit(ringbuf, MI_USER_INTERRUPT);
+	intel_logical_ring_emit(ringbuf, MI_NOOP);
+	intel_logical_ring_advance_and_submit(ring, ctx);
+
+	return 0;
+}
+
+static int gen8_emit_request_render(struct intel_engine_cs *ring,
+				    struct intel_context *ctx)
+{
+	struct intel_ringbuffer *ringbuf = logical_ringbuf_get(ring, ctx);
+	u32 cmd;
+	int ret;
+
+	ret = intel_logical_ring_begin(ring, ctx, 6);
+	if (ret)
+		return ret;
+
+	cmd = MI_STORE_DWORD_IMM_GEN8;
+	cmd |= (1 << 22); /* use global GTT */
+
+	intel_logical_ring_emit(ringbuf, cmd);
+	intel_logical_ring_emit(ringbuf,
+				(ring->status_page.gfx_addr +
+				(I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT)));
+	intel_logical_ring_emit(ringbuf, 0);
+	intel_logical_ring_emit(ringbuf, ring->outstanding_lazy_seqno);
+	intel_logical_ring_emit(ringbuf, MI_USER_INTERRUPT);
+	intel_logical_ring_emit(ringbuf, MI_NOOP);
+	intel_logical_ring_advance_and_submit(ring, ctx);
+
+	return 0;
+}
+
 void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
 {
 	if (!intel_ring_initialized(ring))
@@ -434,6 +490,7 @@ static int logical_render_ring_init(struct drm_device *dev)
 	ring->get_seqno = gen8_get_seqno;
 	ring->set_seqno = gen8_set_seqno;
 	ring->submit_ctx = gen8_submit_ctx;
+	ring->emit_request = gen8_emit_request_render;
 
 	return logical_ring_init(dev, ring);
 }
@@ -453,6 +510,7 @@ static int logical_bsd_ring_init(struct drm_device *dev)
 	ring->get_seqno = gen8_get_seqno;
 	ring->set_seqno = gen8_set_seqno;
 	ring->submit_ctx = gen8_submit_ctx;
+	ring->emit_request = gen8_emit_request;
 
 	return logical_ring_init(dev, ring);
 }
@@ -472,6 +530,7 @@ static int logical_bsd2_ring_init(struct drm_device *dev)
 	ring->get_seqno = gen8_get_seqno;
 	ring->set_seqno = gen8_set_seqno;
 	ring->submit_ctx = gen8_submit_ctx;
+	ring->emit_request = gen8_emit_request;
 
 	return logical_ring_init(dev, ring);
 }
@@ -491,6 +550,7 @@ static int logical_blt_ring_init(struct drm_device *dev)
 	ring->get_seqno = gen8_get_seqno;
 	ring->set_seqno = gen8_set_seqno;
 	ring->submit_ctx = gen8_submit_ctx;
+	ring->emit_request = gen8_emit_request;
 
 	return logical_ring_init(dev, ring);
 }
@@ -510,6 +570,7 @@ static int logical_vebox_ring_init(struct drm_device *dev)
 	ring->get_seqno = gen8_get_seqno;
 	ring->set_seqno = gen8_set_seqno;
 	ring->submit_ctx = gen8_submit_ctx;
+	ring->emit_request = gen8_emit_request;
 
 	return logical_ring_init(dev, ring);
 }
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 1a6df42..d8ded14 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -151,6 +151,8 @@ struct  intel_engine_cs {
 	/* Execlists */
 	void		(*submit_ctx)(struct intel_engine_cs *ring,
 				      struct intel_context *ctx, u32 value);
+	int		(*emit_request)(struct intel_engine_cs *ring,
+					struct intel_context *ctx);
 
 	/**
 	 * List of objects currently involved in rendering from the
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH 28/53] drm/i915/bdw: GEN-specific logical ring emit flush
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
                   ` (26 preceding siblings ...)
  2014-06-13 15:37 ` [PATCH 27/53] drm/i915/bdw: GEN-specific logical ring emit request oscar.mateo
@ 2014-06-13 15:37 ` oscar.mateo
  2014-06-20 21:39   ` Volkin, Bradley D
  2014-06-13 15:37 ` [PATCH 29/53] drm/i915/bdw: Emission of requests with logical rings oscar.mateo
                   ` (25 subsequent siblings)
  53 siblings, 1 reply; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:37 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Notice that the BSD invalidate bit is no longer present in GEN8, so
we can consolidate the blt and bsd ring flushes into one.
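
A sketch of how the GEM side is expected to drive the new vfunc (this
mirrors the flush helpers added later in the series, so take it as
illustrative rather than definitive):

	/* write-back: flush dirty GPU caches after a batch */
	if (ring->gpu_caches_dirty) {
		ret = ring->emit_flush(ring, ctx, 0, I915_GEM_GPU_DOMAINS);
		if (ret)
			return ret;
		ring->gpu_caches_dirty = false;
	}

	/* invalidate: make the GPU re-read state before a new batch */
	ret = ring->emit_flush(ring, ctx, I915_GEM_GPU_DOMAINS, 0);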

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c        | 80 +++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.c |  7 ---
 drivers/gpu/drm/i915/intel_ringbuffer.h | 11 +++++
 3 files changed, 91 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 3debe8b..3d7fcd6 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -343,6 +343,81 @@ static int gen8_init_render_ring(struct intel_engine_cs *ring)
 	return ret;
 }
 
+static int gen8_emit_flush(struct intel_engine_cs *ring,
+			   struct intel_context *ctx,
+			   u32 invalidate_domains,
+			   u32 unused)
+{
+	struct intel_ringbuffer *ringbuf = logical_ringbuf_get(ring, ctx);
+	uint32_t cmd;
+	int ret;
+
+	ret = intel_logical_ring_begin(ring, ctx, 4);
+	if (ret)
+		return ret;
+
+	cmd = MI_FLUSH_DW + 1;
+
+	/*
+	 * Bspec vol 1c.3 - blitter engine command streamer:
+	 * "If ENABLED, all TLBs will be invalidated once the flush
+	 * operation is complete. This bit is only valid when the
+	 * Post-Sync Operation field is a value of 1h or 3h."
+	 */
+	if (invalidate_domains & I915_GEM_DOMAIN_RENDER)
+		cmd |= MI_INVALIDATE_TLB | MI_FLUSH_DW_STORE_INDEX |
+			MI_FLUSH_DW_OP_STOREDW;
+	intel_logical_ring_emit(ringbuf, cmd);
+	intel_logical_ring_emit(ringbuf, I915_GEM_HWS_SCRATCH_ADDR | MI_FLUSH_DW_USE_GTT);
+	intel_logical_ring_emit(ringbuf, 0); /* upper addr */
+	intel_logical_ring_emit(ringbuf, 0); /* value */
+	intel_logical_ring_advance(ringbuf);
+
+	return 0;
+}
+
+static int gen8_emit_flush_render(struct intel_engine_cs *ring,
+				  struct intel_context *ctx,
+				  u32 invalidate_domains,
+				  u32 flush_domains)
+{
+	struct intel_ringbuffer *ringbuf = logical_ringbuf_get(ring, ctx);
+	u32 flags = 0;
+	u32 scratch_addr = ring->scratch.gtt_offset + 2 * CACHELINE_BYTES;
+	int ret;
+
+	flags |= PIPE_CONTROL_CS_STALL;
+
+	if (flush_domains) {
+		flags |= PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH;
+		flags |= PIPE_CONTROL_DEPTH_CACHE_FLUSH;
+	}
+	if (invalidate_domains) {
+		flags |= PIPE_CONTROL_TLB_INVALIDATE;
+		flags |= PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE;
+		flags |= PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE;
+		flags |= PIPE_CONTROL_VF_CACHE_INVALIDATE;
+		flags |= PIPE_CONTROL_CONST_CACHE_INVALIDATE;
+		flags |= PIPE_CONTROL_STATE_CACHE_INVALIDATE;
+		flags |= PIPE_CONTROL_QW_WRITE;
+		flags |= PIPE_CONTROL_GLOBAL_GTT_IVB;
+	}
+
+	ret = intel_logical_ring_begin(ring, ctx, 6);
+	if (ret)
+		return ret;
+
+	intel_logical_ring_emit(ringbuf, GFX_OP_PIPE_CONTROL(6));
+	intel_logical_ring_emit(ringbuf, flags);
+	intel_logical_ring_emit(ringbuf, scratch_addr);
+	intel_logical_ring_emit(ringbuf, 0);
+	intel_logical_ring_emit(ringbuf, 0);
+	intel_logical_ring_emit(ringbuf, 0);
+	intel_logical_ring_advance(ringbuf);
+
+	return 0;
+}
+
 static u32 gen8_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency)
 {
 	return intel_read_status_page(ring, I915_GEM_HWS_INDEX);
@@ -491,6 +566,7 @@ static int logical_render_ring_init(struct drm_device *dev)
 	ring->set_seqno = gen8_set_seqno;
 	ring->submit_ctx = gen8_submit_ctx;
 	ring->emit_request = gen8_emit_request_render;
+	ring->emit_flush = gen8_emit_flush_render;
 
 	return logical_ring_init(dev, ring);
 }
@@ -511,6 +587,7 @@ static int logical_bsd_ring_init(struct drm_device *dev)
 	ring->set_seqno = gen8_set_seqno;
 	ring->submit_ctx = gen8_submit_ctx;
 	ring->emit_request = gen8_emit_request;
+	ring->emit_flush = gen8_emit_flush;
 
 	return logical_ring_init(dev, ring);
 }
@@ -531,6 +608,7 @@ static int logical_bsd2_ring_init(struct drm_device *dev)
 	ring->set_seqno = gen8_set_seqno;
 	ring->submit_ctx = gen8_submit_ctx;
 	ring->emit_request = gen8_emit_request;
+	ring->emit_flush = gen8_emit_flush;
 
 	return logical_ring_init(dev, ring);
 }
@@ -551,6 +629,7 @@ static int logical_blt_ring_init(struct drm_device *dev)
 	ring->set_seqno = gen8_set_seqno;
 	ring->submit_ctx = gen8_submit_ctx;
 	ring->emit_request = gen8_emit_request;
+	ring->emit_flush = gen8_emit_flush;
 
 	return logical_ring_init(dev, ring);
 }
@@ -571,6 +650,7 @@ static int logical_vebox_ring_init(struct drm_device *dev)
 	ring->set_seqno = gen8_set_seqno;
 	ring->submit_ctx = gen8_submit_ctx;
 	ring->emit_request = gen8_emit_request;
+	ring->emit_flush = gen8_emit_flush;
 
 	return logical_ring_init(dev, ring);
 }
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 137ee9a..a128f6f 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -33,13 +33,6 @@
 #include "i915_trace.h"
 #include "intel_drv.h"
 
-/* Early gen2 devices have a cacheline of just 32 bytes, using 64 is overkill,
- * but keeps the logic simple. Indeed, the whole purpose of this macro is just
- * to give some inclination as to some of the magic values used in the various
- * workarounds!
- */
-#define CACHELINE_BYTES 64
-
 bool
 intel_ring_initialized(struct intel_engine_cs *ring)
 {
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index d8ded14..527db2a 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -5,6 +5,13 @@
 
 #define I915_CMD_HASH_ORDER 9
 
+/* Early gen2 devices have a cacheline of just 32 bytes, using 64 is overkill,
+ * but keeps the logic simple. Indeed, the whole purpose of this macro is just
+ * to give some inclination as to some of the magic values used in the various
+ * workarounds!
+ */
+#define CACHELINE_BYTES 64
+
 /*
  * Gen2 BSpec "1. Programming Environment" / 1.4.4.6 "Ring Buffer Use"
  * Gen3 BSpec "vol1c Memory Interface Functions" / 2.3.4.5 "Ring Buffer Use"
@@ -153,6 +160,10 @@ struct  intel_engine_cs {
 				      struct intel_context *ctx, u32 value);
 	int		(*emit_request)(struct intel_engine_cs *ring,
 					struct intel_context *ctx);
+	int __must_check (*emit_flush)(struct intel_engine_cs *ring,
+				       struct intel_context *ctx,
+				       u32 invalidate_domains,
+				       u32 flush_domains);
 
 	/**
 	 * List of objects currently involved in rendering from the
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH 29/53] drm/i915/bdw: Emission of requests with logical rings
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
                   ` (27 preceding siblings ...)
  2014-06-13 15:37 ` [PATCH 28/53] drm/i915/bdw: GEN-specific logical ring emit flush oscar.mateo
@ 2014-06-13 15:37 ` oscar.mateo
  2014-06-13 15:37 ` [PATCH 30/53] drm/i915/bdw: Ring idle and stop " oscar.mateo
                   ` (24 subsequent siblings)
  53 siblings, 0 replies; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:37 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Also known as __i915_add_request's evil twin.

On seqno preallocation, we set the request context information
correctly so that we can retrieve it both when we emit the
request and when we retire it.

This is a candidate to be abstracted away (so that it replaces
__i915_add_request seamlessly).
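
A compressed view of the request/context lifetime this patch sets up (the
final unreference is an assumption on my part, as it happens at
request-free time outside this diff):

	/* at seqno preallocation (logical_ring_alloc_seqno): */
	request->ctx = ctx;
	i915_gem_context_reference(request->ctx);	/* pin until retire */

	/* at emission (intel_logical_ring_add_request) and at retire
	 * (i915_gem_retire_requests_ring), the per-engine ringbuffer
	 * is recovered from the context: */
	ringbuf = request->ctx->engine[ring->id].ringbuf;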

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c  |  18 +++++-
 drivers/gpu/drm/i915/intel_lrc.c | 116 ++++++++++++++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/intel_lrc.h |   4 ++
 3 files changed, 135 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index dcdffab..69db71a 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2320,6 +2320,9 @@ int __i915_add_request(struct intel_engine_cs *ring,
 	u32 request_ring_position, request_start;
 	int ret;
 
+	if (intel_enable_execlists(ring->dev))
+		return intel_logical_ring_add_request(ring, file, obj, out_seqno);
+
 	request_start = intel_ring_get_tail(ring->buffer);
 	/*
 	 * Emit any outstanding flushes - execbuf can fail to emit the flush
@@ -2620,6 +2623,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 
 	while (!list_empty(&ring->request_list)) {
 		struct drm_i915_gem_request *request;
+		struct intel_ringbuffer *ringbuf;
 
 		request = list_first_entry(&ring->request_list,
 					   struct drm_i915_gem_request,
@@ -2629,12 +2633,24 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 			break;
 
 		trace_i915_gem_request_retire(ring, request->seqno);
+
+		/* This is one of the few common intersection points
+		 * between legacy ringbuffer submission and execlists:
+		 * we need to tell them apart in order to find the correct
+		 * ringbuffer to which the request belongs to.
+		 */
+		if (intel_enable_execlists(ring->dev)) {
+			struct intel_context *ctx = request->ctx;
+			ringbuf = ctx->engine[ring->id].ringbuf;
+		} else
+			ringbuf = ring->buffer;
+
 		/* We know the GPU must have read the request to have
 		 * sent us the seqno + interrupt, so use the position
 		 * of tail of the request to update the last known position
 		 * of the GPU head.
 		 */
-		ring->buffer->last_retired_head = request->tail;
+		ringbuf->last_retired_head = request->tail;
 
 		i915_gem_free_request(request);
 	}
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 3d7fcd6..051e150 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -92,6 +92,113 @@ logical_ringbuf_get(struct intel_engine_cs *ring, struct intel_context *ctx)
 	return ctx->engine[ring->id].ringbuf;
 }
 
+static int logical_ring_flush_all_caches(struct intel_engine_cs *ring,
+					 struct intel_context *ctx)
+{
+	int ret;
+
+	if (!ring->gpu_caches_dirty)
+		return 0;
+
+	ret = ring->emit_flush(ring, ctx, 0, I915_GEM_GPU_DOMAINS);
+	if (ret)
+		return ret;
+
+	ring->gpu_caches_dirty = false;
+	return 0;
+}
+
+int intel_logical_ring_add_request(struct intel_engine_cs *ring,
+				   struct drm_file *file,
+				   struct drm_i915_gem_object *obj,
+				   u32 *out_seqno)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct intel_context *ctx;
+	struct intel_ringbuffer *ringbuf;
+	struct drm_i915_gem_request *request;
+	u32 request_ring_position, request_start;
+	int ret;
+
+	request = ring->preallocated_lazy_request;
+	if (WARN_ON(request == NULL))
+		return -ENOMEM;
+
+	/* We pre-recorded which context the request belongs to */
+	ctx = request->ctx;
+	if (WARN_ON(ctx == NULL))
+		return -EINVAL;
+	ringbuf = logical_ringbuf_get(ring, ctx);
+
+	request_start = intel_ring_get_tail(ringbuf);
+	/*
+	 * Emit any outstanding flushes - execbuf can fail to emit the flush
+	 * after having emitted the batchbuffer command. Hence we need to fix
+	 * things up similar to emitting the lazy request. The difference here
+	 * is that the flush _must_ happen before the next request, no matter
+	 * what.
+	 */
+	ret = logical_ring_flush_all_caches(ring, ctx);
+	if (ret)
+		return ret;
+
+	/* Record the position of the start of the request so that
+	 * should we detect the updated seqno part-way through the
+	 * GPU processing the request, we never over-estimate the
+	 * position of the head.
+	 */
+	request_ring_position = intel_ring_get_tail(ringbuf);
+
+	ret = ring->emit_request(ring, ctx);
+	if (ret)
+		return ret;
+
+	request->seqno = intel_ring_get_seqno(ring);
+	request->ring = ring;
+	request->head = request_start;
+	request->tail = request_ring_position;
+
+	/* Whilst this request exists, batch_obj will be on the
+	 * active_list, and so will hold the active reference. Only when this
+	 * request is retired will the batch_obj be moved onto the
+	 * inactive_list and lose its active reference. Hence we do not need
+	 * to explicitly hold another reference here.
+	 */
+	request->batch_obj = obj;
+
+	request->emitted_jiffies = jiffies;
+	list_add_tail(&request->list, &ring->request_list);
+	request->file_priv = NULL;
+
+	if (file) {
+		struct drm_i915_file_private *file_priv = file->driver_priv;
+		WARN_ON(file_priv != ctx->file_priv);
+
+		spin_lock(&file_priv->mm.lock);
+		request->file_priv = file_priv;
+		list_add_tail(&request->client_list,
+			      &file_priv->mm.request_list);
+		spin_unlock(&file_priv->mm.lock);
+	}
+
+	ring->outstanding_lazy_seqno = 0;
+	ring->preallocated_lazy_request = NULL;
+
+	if (!dev_priv->ums.mm_suspended) {
+		i915_queue_hangcheck(ring->dev);
+
+		cancel_delayed_work_sync(&dev_priv->mm.idle_work);
+		queue_delayed_work(dev_priv->wq,
+				   &dev_priv->mm.retire_work,
+				   round_jiffies_up_relative(HZ));
+		intel_mark_busy(dev_priv->dev);
+	}
+
+	if (out_seqno)
+		*out_seqno = request->seqno;
+	return 0;
+}
+
 void intel_logical_ring_advance_and_submit(struct intel_engine_cs *ring,
 					   struct intel_context *ctx)
 {
@@ -118,6 +225,13 @@ static int logical_ring_alloc_seqno(struct intel_engine_cs *ring,
 		if (request == NULL)
 			return -ENOMEM;
 
+		/* Hold a reference to the context this request belongs to
+		 * (we will need it when the time comes to emit/retire the
+		 * request).
+		 */
+		request->ctx = ctx;
+		i915_gem_context_reference(request->ctx);
+
 		ring->preallocated_lazy_request = request;
 	}
 
@@ -157,8 +271,6 @@ static int logical_ring_wait_request(struct intel_engine_cs *ring,
 	if (ret)
 		return ret;
 
-	/* TODO: make sure we update the right ringbuffer's last_retired_head
-	 * when retiring requests */
 	i915_gem_retire_requests_ring(ring);
 	ringbuf->head = ringbuf->last_retired_head;
 	ringbuf->last_retired_head = -1;
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 686ebf5..4495359 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -5,6 +5,10 @@
 void intel_logical_ring_cleanup(struct intel_engine_cs *ring);
 int intel_logical_rings_init(struct drm_device *dev);
 
+int intel_logical_ring_add_request(struct intel_engine_cs *ring,
+				   struct drm_file *file,
+				   struct drm_i915_gem_object *obj,
+				   u32 *out_seqno);
 void intel_logical_ring_advance_and_submit(struct intel_engine_cs *ring,
 					   struct intel_context *ctx);
 
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH 30/53] drm/i915/bdw: Ring idle and stop with logical rings
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
                   ` (28 preceding siblings ...)
  2014-06-13 15:37 ` [PATCH 29/53] drm/i915/bdw: Emission of requests with logical rings oscar.mateo
@ 2014-06-13 15:37 ` oscar.mateo
  2014-06-13 15:37 ` [PATCH 31/53] drm/i915/bdw: Interrupts " oscar.mateo
                   ` (23 subsequent siblings)
  53 siblings, 0 replies; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:37 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

This is a hard one, since there is no direct hardware ring to
control when Execlists are in use.

We reuse intel_ring_idle here, but it should be fine as long
as i915_add_request does the right thing.
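
The reuse works because intel_ring_idle first flushes the outstanding lazy
request through i915_add_request (which, as of the previous patches, routes
to the logical ring version under Execlists) and then waits for the newest
request. A minimal sketch, assuming the legacy shape of that helper:

	if (ring->outstanding_lazy_seqno) {
		ret = i915_add_request(ring, NULL);
		if (ret)
			return ret;
	}

	if (list_empty(&ring->request_list))
		return 0;

	return i915_wait_seqno(ring,
			       list_entry(ring->request_list.prev,
					  struct drm_i915_gem_request,
					  list)->seqno);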

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 27 ++++++++++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 051e150..58a517c 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -92,6 +92,28 @@ logical_ringbuf_get(struct intel_engine_cs *ring, struct intel_context *ctx)
 	return ctx->engine[ring->id].ringbuf;
 }
 
+static void logical_ring_stop(struct intel_engine_cs *ring)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	int ret;
+
+	if (!intel_ring_initialized(ring))
+		return;
+
+	ret = intel_ring_idle(ring);
+	if (ret && !i915_reset_in_progress(&to_i915(ring->dev)->gpu_error))
+		DRM_ERROR("failed to quiesce %s whilst cleaning up: %d\n",
+			  ring->name, ret);
+
+	/* TODO: Is this correct with Execlists enabled? */
+	I915_WRITE_MODE(ring, _MASKED_BIT_ENABLE(STOP_RING));
+	if (wait_for_atomic((I915_READ_MODE(ring) & MODE_IDLE) != 0, 1000)) {
+		DRM_ERROR("%s :timed out trying to stop ring\n", ring->name);
+		return;
+	}
+	I915_WRITE_MODE(ring, _MASKED_BIT_DISABLE(STOP_RING));
+}
+
 static int logical_ring_flush_all_caches(struct intel_engine_cs *ring,
 					 struct intel_context *ctx)
 {
@@ -604,10 +626,13 @@ static int gen8_emit_request_render(struct intel_engine_cs *ring,
 
 void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
 {
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+
 	if (!intel_ring_initialized(ring))
 		return;
 
-	/* TODO: make sure the ring is stopped */
+	logical_ring_stop(ring);
+	WARN_ON((I915_READ_MODE(ring) & MODE_IDLE) == 0);
 	ring->preallocated_lazy_request = NULL;
 	ring->outstanding_lazy_seqno = 0;
 
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH 31/53] drm/i915/bdw: Interrupts with logical rings
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
                   ` (29 preceding siblings ...)
  2014-06-13 15:37 ` [PATCH 30/53] drm/i915/bdw: Ring idle and stop " oscar.mateo
@ 2014-06-13 15:37 ` oscar.mateo
  2014-06-13 15:37 ` [PATCH 32/53] drm/i915/bdw: GEN-specific logical ring emit batchbuffer start oscar.mateo
                   ` (22 subsequent siblings)
  53 siblings, 0 replies; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:37 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

We need to attend to context switch interrupts from all rings. Also, fix the
IMR/IER writes and add HWSTAM programming at ring init time.

Notice that, if added to irq_enable_mask, the context switch interrupts would
be incorrectly masked out whenever user interrupts are masked because nobody
is waiting on a sequence number. Therefore, this commit adds a bitmask of
interrupts to be kept unmasked at all times.

v2: Disable HWSTAM, as suggested by Damien (nobody listens to these interrupts,
anyway).

v3: Add new get/put_irq functions.
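
A worked example of the masking discipline, using the render ring values
from the hunks below:

	/* irq_enable_mask = GT_RENDER_USER_INTERRUPT << GEN8_RCS_IRQ_SHIFT
	 * irq_keep_mask   = GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_RCS_IRQ_SHIFT
	 *
	 * irq_get: IMR = ~(irq_enable_mask | irq_keep_mask)
	 *          -> both user and context switch interrupts delivered
	 * irq_put: IMR = ~irq_keep_mask
	 *          -> user interrupts masked, context switch still delivered
	 */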

Signed-off-by: Thomas Daniel <thomas.daniel@intel.com> (v1)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2 & v3)
---
 drivers/gpu/drm/i915/i915_irq.c         | 19 +++++++++--
 drivers/gpu/drm/i915/i915_reg.h         |  3 ++
 drivers/gpu/drm/i915/intel_lrc.c        | 58 +++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.h |  1 +
 4 files changed, 78 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 218ef08..c566c38 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1469,6 +1469,8 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
 				notify_ring(dev, &dev_priv->ring[RCS]);
 			if (bcs & GT_RENDER_USER_INTERRUPT)
 				notify_ring(dev, &dev_priv->ring[BCS]);
+			if ((rcs | bcs) & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
+				DRM_DEBUG_DRIVER("TODO: Context switch\n");
 			I915_WRITE(GEN8_GT_IIR(0), tmp);
 		} else
 			DRM_ERROR("The master control interrupt lied (GT0)!\n");
@@ -1481,9 +1483,13 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
 			vcs = tmp >> GEN8_VCS1_IRQ_SHIFT;
 			if (vcs & GT_RENDER_USER_INTERRUPT)
 				notify_ring(dev, &dev_priv->ring[VCS]);
+			if (vcs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
+				DRM_DEBUG_DRIVER("TODO: Context switch\n");
 			vcs = tmp >> GEN8_VCS2_IRQ_SHIFT;
 			if (vcs & GT_RENDER_USER_INTERRUPT)
 				notify_ring(dev, &dev_priv->ring[VCS2]);
+			if (vcs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
+				DRM_DEBUG_DRIVER("TODO: Context switch\n");
 			I915_WRITE(GEN8_GT_IIR(1), tmp);
 		} else
 			DRM_ERROR("The master control interrupt lied (GT1)!\n");
@@ -1507,6 +1513,8 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
 			vcs = tmp >> GEN8_VECS_IRQ_SHIFT;
 			if (vcs & GT_RENDER_USER_INTERRUPT)
 				notify_ring(dev, &dev_priv->ring[VECS]);
+			if (vcs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
+				DRM_DEBUG_DRIVER("TODO: Context switch\n");
 			I915_WRITE(GEN8_GT_IIR(3), tmp);
 		} else
 			DRM_ERROR("The master control interrupt lied (GT3)!\n");
@@ -3461,12 +3469,17 @@ static void gen8_gt_irq_postinstall(struct drm_i915_private *dev_priv)
 	/* These are interrupts we'll toggle with the ring mask register */
 	uint32_t gt_interrupts[] = {
 		GT_RENDER_USER_INTERRUPT << GEN8_RCS_IRQ_SHIFT |
+			GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_RCS_IRQ_SHIFT |
 			GT_RENDER_L3_PARITY_ERROR_INTERRUPT |
-			GT_RENDER_USER_INTERRUPT << GEN8_BCS_IRQ_SHIFT,
+			GT_RENDER_USER_INTERRUPT << GEN8_BCS_IRQ_SHIFT |
+			GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_BCS_IRQ_SHIFT,
 		GT_RENDER_USER_INTERRUPT << GEN8_VCS1_IRQ_SHIFT |
-			GT_RENDER_USER_INTERRUPT << GEN8_VCS2_IRQ_SHIFT,
+			GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS1_IRQ_SHIFT |
+			GT_RENDER_USER_INTERRUPT << GEN8_VCS2_IRQ_SHIFT |
+			GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS2_IRQ_SHIFT,
 		0,
-		GT_RENDER_USER_INTERRUPT << GEN8_VECS_IRQ_SHIFT
+		GT_RENDER_USER_INTERRUPT << GEN8_VECS_IRQ_SHIFT |
+			GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VECS_IRQ_SHIFT
 		};
 
 	for (i = 0; i < ARRAY_SIZE(gt_interrupts); i++)
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 63ec3ea..95fef20 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -1042,6 +1042,7 @@ enum punit_power_well {
 #define RING_ACTHD_UDW(base)	((base)+0x5c)
 #define RING_NOPID(base)	((base)+0x94)
 #define RING_IMR(base)		((base)+0xa8)
+#define RING_HWSTAM(base)	((base)+0x98)
 #define RING_TIMESTAMP(base)	((base)+0x358)
 #define   TAIL_ADDR		0x001FFFF8
 #define   HEAD_WRAP_COUNT	0xFFE00000
@@ -4569,6 +4570,8 @@ enum punit_power_well {
 #define GEN8_GT_IIR(which) (0x44308 + (0x10 * (which)))
 #define GEN8_GT_IER(which) (0x4430c + (0x10 * (which)))
 
+#define GEN8_GT_CONTEXT_SWITCH_INTERRUPT	(1 <<  8)
+
 #define GEN8_BCS_IRQ_SHIFT 16
 #define GEN8_RCS_IRQ_SHIFT 0
 #define GEN8_VCS2_IRQ_SHIFT 16
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 58a517c..0fab3b9 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -439,6 +439,9 @@ static int gen8_init_common_ring(struct intel_engine_cs *ring)
 	struct drm_device *dev = ring->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
+	I915_WRITE_IMR(ring, ~(ring->irq_enable_mask | ring->irq_keep_mask));
+	I915_WRITE(RING_HWSTAM(ring->mmio_base), 0xffffffff);
+
 	I915_WRITE(RING_MODE_GEN7(ring),
 		_MASKED_BIT_DISABLE(GFX_REPLAY_MODE) |
 		_MASKED_BIT_ENABLE(GFX_RUN_LIST_ENABLE));
@@ -477,6 +480,39 @@ static int gen8_init_render_ring(struct intel_engine_cs *ring)
 	return ret;
 }
 
+static bool gen8_logical_ring_get_irq(struct intel_engine_cs *ring)
+{
+	struct drm_device *dev = ring->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	unsigned long flags;
+
+	if (!dev->irq_enabled)
+		return false;
+
+	spin_lock_irqsave(&dev_priv->irq_lock, flags);
+	if (ring->irq_refcount++ == 0) {
+		I915_WRITE_IMR(ring, ~(ring->irq_enable_mask | ring->irq_keep_mask));
+		POSTING_READ(RING_IMR(ring->mmio_base));
+	}
+	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
+
+	return true;
+}
+
+static void gen8_logical_ring_put_irq(struct intel_engine_cs *ring)
+{
+	struct drm_device *dev = ring->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev_priv->irq_lock, flags);
+	if (--ring->irq_refcount == 0) {
+		I915_WRITE_IMR(ring, ~ring->irq_keep_mask);
+		POSTING_READ(RING_IMR(ring->mmio_base));
+	}
+	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
+}
+
 static int gen8_emit_flush(struct intel_engine_cs *ring,
 			   struct intel_context *ctx,
 			   u32 invalidate_domains,
@@ -696,6 +732,10 @@ static int logical_render_ring_init(struct drm_device *dev)
 	ring->mmio_base = RENDER_RING_BASE;
 	ring->irq_enable_mask =
 		GT_RENDER_USER_INTERRUPT << GEN8_RCS_IRQ_SHIFT;
+	ring->irq_keep_mask =
+		GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_RCS_IRQ_SHIFT;
+	if (HAS_L3_DPF(dev))
+		ring->irq_keep_mask |= GT_RENDER_L3_PARITY_ERROR_INTERRUPT;
 
 	ring->init = gen8_init_render_ring;
 	ring->cleanup = intel_fini_pipe_control;
@@ -704,6 +744,8 @@ static int logical_render_ring_init(struct drm_device *dev)
 	ring->submit_ctx = gen8_submit_ctx;
 	ring->emit_request = gen8_emit_request_render;
 	ring->emit_flush = gen8_emit_flush_render;
+	ring->irq_get = gen8_logical_ring_get_irq;
+	ring->irq_put = gen8_logical_ring_put_irq;
 
 	return logical_ring_init(dev, ring);
 }
@@ -718,6 +760,8 @@ static int logical_bsd_ring_init(struct drm_device *dev)
 	ring->mmio_base = GEN6_BSD_RING_BASE;
 	ring->irq_enable_mask =
 		GT_RENDER_USER_INTERRUPT << GEN8_VCS1_IRQ_SHIFT;
+	ring->irq_keep_mask =
+		GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS1_IRQ_SHIFT;
 
 	ring->init = gen8_init_common_ring;
 	ring->get_seqno = gen8_get_seqno;
@@ -725,6 +769,8 @@ static int logical_bsd_ring_init(struct drm_device *dev)
 	ring->submit_ctx = gen8_submit_ctx;
 	ring->emit_request = gen8_emit_request;
 	ring->emit_flush = gen8_emit_flush;
+	ring->irq_get = gen8_logical_ring_get_irq;
+	ring->irq_put = gen8_logical_ring_put_irq;
 
 	return logical_ring_init(dev, ring);
 }
@@ -739,6 +785,8 @@ static int logical_bsd2_ring_init(struct drm_device *dev)
 	ring->mmio_base = GEN8_BSD2_RING_BASE;
 	ring->irq_enable_mask =
 		GT_RENDER_USER_INTERRUPT << GEN8_VCS2_IRQ_SHIFT;
+	ring->irq_keep_mask =
+		GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS2_IRQ_SHIFT;
 
 	ring->init = gen8_init_common_ring;
 	ring->get_seqno = gen8_get_seqno;
@@ -746,6 +794,8 @@ static int logical_bsd2_ring_init(struct drm_device *dev)
 	ring->submit_ctx = gen8_submit_ctx;
 	ring->emit_request = gen8_emit_request;
 	ring->emit_flush = gen8_emit_flush;
+	ring->irq_get = gen8_logical_ring_get_irq;
+	ring->irq_put = gen8_logical_ring_put_irq;
 
 	return logical_ring_init(dev, ring);
 }
@@ -760,6 +810,8 @@ static int logical_blt_ring_init(struct drm_device *dev)
 	ring->mmio_base = BLT_RING_BASE;
 	ring->irq_enable_mask =
 		GT_RENDER_USER_INTERRUPT << GEN8_BCS_IRQ_SHIFT;
+	ring->irq_keep_mask =
+		GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_BCS_IRQ_SHIFT;
 
 	ring->init = gen8_init_common_ring;
 	ring->get_seqno = gen8_get_seqno;
@@ -767,6 +819,8 @@ static int logical_blt_ring_init(struct drm_device *dev)
 	ring->submit_ctx = gen8_submit_ctx;
 	ring->emit_request = gen8_emit_request;
 	ring->emit_flush = gen8_emit_flush;
+	ring->irq_get = gen8_logical_ring_get_irq;
+	ring->irq_put = gen8_logical_ring_put_irq;
 
 	return logical_ring_init(dev, ring);
 }
@@ -781,6 +835,8 @@ static int logical_vebox_ring_init(struct drm_device *dev)
 	ring->mmio_base = VEBOX_RING_BASE;
 	ring->irq_enable_mask =
 		GT_RENDER_USER_INTERRUPT << GEN8_VECS_IRQ_SHIFT;
+	ring->irq_keep_mask =
+		GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VECS_IRQ_SHIFT;
 
 	ring->init = gen8_init_common_ring;
 	ring->get_seqno = gen8_get_seqno;
@@ -788,6 +844,8 @@ static int logical_vebox_ring_init(struct drm_device *dev)
 	ring->submit_ctx = gen8_submit_ctx;
 	ring->emit_request = gen8_emit_request;
 	ring->emit_flush = gen8_emit_flush;
+	ring->irq_get = gen8_logical_ring_get_irq;
+	ring->irq_put = gen8_logical_ring_put_irq;
 
 	return logical_ring_init(dev, ring);
 }
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 527db2a..abaf3ca 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -156,6 +156,7 @@ struct  intel_engine_cs {
 	} semaphore;
 
 	/* Execlists */
+	u32             irq_keep_mask;          /* bitmask for interrupts that should not be masked */
 	void		(*submit_ctx)(struct intel_engine_cs *ring,
 				      struct intel_context *ctx, u32 value);
 	int		(*emit_request)(struct intel_engine_cs *ring,
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH 32/53] drm/i915/bdw: GEN-specific logical ring emit batchbuffer start
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
                   ` (30 preceding siblings ...)
  2014-06-13 15:37 ` [PATCH 31/53] drm/i915/bdw: Interrupts " oscar.mateo
@ 2014-06-13 15:37 ` oscar.mateo
  2014-06-13 15:37 ` [PATCH 33/53] drm/i915: Extract the actual workload submission mechanism from execbuffer oscar.mateo
                   ` (21 subsequent siblings)
  53 siblings, 0 replies; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:37 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

dispatch_execbuffer's evil twin.
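
A sketch of the call site this enables (the flags handling is an assumption
on my part, modeled on the legacy dispatch_execbuffer path):

	u32 flags = 0;

	if (args->flags & I915_EXEC_SECURE)
		flags |= I915_DISPATCH_SECURE;

	ret = ring->emit_bb_start(ring, ctx, exec_start, flags);
	if (ret)
		return ret;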

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c        | 29 +++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.h |  3 +++
 2 files changed, 32 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 0fab3b9..27fde8d 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -480,6 +480,30 @@ static int gen8_init_render_ring(struct intel_engine_cs *ring)
 	return ret;
 }
 
+static int gen8_emit_bb_start(struct intel_engine_cs *ring,
+			      struct intel_context *ctx,
+			      u64 offset, unsigned flags)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	bool ppgtt = dev_priv->mm.aliasing_ppgtt != NULL &&
+		!(flags & I915_DISPATCH_SECURE);
+	struct intel_ringbuffer *ringbuf = logical_ringbuf_get(ring, ctx);
+	int ret;
+
+	ret = intel_logical_ring_begin(ring, ctx, 4);
+	if (ret)
+		return ret;
+
+	/* FIXME(BDW): Address space and security selectors. */
+	intel_logical_ring_emit(ringbuf, MI_BATCH_BUFFER_START_GEN8 | (ppgtt<<8));
+	intel_logical_ring_emit(ringbuf, lower_32_bits(offset));
+	intel_logical_ring_emit(ringbuf, upper_32_bits(offset));
+	intel_logical_ring_emit(ringbuf, MI_NOOP);
+	intel_logical_ring_advance(ringbuf);
+
+	return 0;
+}
+
 static bool gen8_logical_ring_get_irq(struct intel_engine_cs *ring)
 {
 	struct drm_device *dev = ring->dev;
@@ -746,6 +770,7 @@ static int logical_render_ring_init(struct drm_device *dev)
 	ring->emit_flush = gen8_emit_flush_render;
 	ring->irq_get = gen8_logical_ring_get_irq;
 	ring->irq_put = gen8_logical_ring_put_irq;
+	ring->emit_bb_start = gen8_emit_bb_start;
 
 	return logical_ring_init(dev, ring);
 }
@@ -771,6 +796,7 @@ static int logical_bsd_ring_init(struct drm_device *dev)
 	ring->emit_flush = gen8_emit_flush;
 	ring->irq_get = gen8_logical_ring_get_irq;
 	ring->irq_put = gen8_logical_ring_put_irq;
+	ring->emit_bb_start = gen8_emit_bb_start;
 
 	return logical_ring_init(dev, ring);
 }
@@ -796,6 +822,7 @@ static int logical_bsd2_ring_init(struct drm_device *dev)
 	ring->emit_flush = gen8_emit_flush;
 	ring->irq_get = gen8_logical_ring_get_irq;
 	ring->irq_put = gen8_logical_ring_put_irq;
+	ring->emit_bb_start = gen8_emit_bb_start;
 
 	return logical_ring_init(dev, ring);
 }
@@ -821,6 +848,7 @@ static int logical_blt_ring_init(struct drm_device *dev)
 	ring->emit_flush = gen8_emit_flush;
 	ring->irq_get = gen8_logical_ring_get_irq;
 	ring->irq_put = gen8_logical_ring_put_irq;
+	ring->emit_bb_start = gen8_emit_bb_start;
 
 	return logical_ring_init(dev, ring);
 }
@@ -846,6 +874,7 @@ static int logical_vebox_ring_init(struct drm_device *dev)
 	ring->emit_flush = gen8_emit_flush;
 	ring->irq_get = gen8_logical_ring_get_irq;
 	ring->irq_put = gen8_logical_ring_put_irq;
+	ring->emit_bb_start = gen8_emit_bb_start;
 
 	return logical_ring_init(dev, ring);
 }
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index abaf3ca..ca02b5d 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -165,6 +165,9 @@ struct  intel_engine_cs {
 				       struct intel_context *ctx,
 				       u32 invalidate_domains,
 				       u32 flush_domains);
+	int		(*emit_bb_start)(struct intel_engine_cs *ring,
+					 struct intel_context *ctx,
+					 u64 offset, unsigned flags);
 
 	/**
 	 * List of objects currently involved in rendering from the
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH 33/53] drm/i915: Extract the actual workload submission mechanism from execbuffer
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
                   ` (31 preceding siblings ...)
  2014-06-13 15:37 ` [PATCH 32/53] drm/i915/bdw: GEN-specific logical ring emit batchbuffer start oscar.mateo
@ 2014-06-13 15:37 ` oscar.mateo
  2014-06-13 15:37 ` [PATCH 34/53] drm/i915: Make move_to_active and retire_commands outside accesible oscar.mateo
                   ` (20 subsequent siblings)
  53 siblings, 0 replies; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:37 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

So that we isolate the legacy ringbuffer submission mechanism, which becomes
a good candidate to be abstracted away.

No functional changes.
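
Both backends now share the same signature, which is what makes the later
abstraction possible: they can eventually hang off a single function
pointer, e.g. along these lines (a sketch; the actual vfunc lands in a
later patch of this series):

	int (*do_execbuf)(struct drm_device *dev, struct drm_file *file,
			  struct intel_engine_cs *ring,
			  struct intel_context *ctx,
			  struct drm_i915_gem_execbuffer2 *args,
			  struct list_head *vmas,
			  struct drm_i915_gem_object *batch_obj,
			  u64 exec_start, u32 flags);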

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 298 ++++++++++++++++-------------
 1 file changed, 162 insertions(+), 136 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 58b3970..0c038cf 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1005,6 +1005,163 @@ i915_reset_gen7_sol_offsets(struct drm_device *dev,
 	return 0;
 }
 
+static int
+legacy_ringbuffer_submission(struct drm_device *dev, struct drm_file *file,
+			     struct intel_engine_cs *ring,
+			     struct intel_context *ctx,
+			     struct drm_i915_gem_execbuffer2 *args,
+			     struct list_head *vmas,
+			     struct drm_i915_gem_object *batch_obj,
+			     u64 exec_start, u32 flags)
+{
+	struct drm_clip_rect *cliprects = NULL;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	u64 exec_len;
+	int instp_mode;
+	u32 instp_mask;
+	int i, ret = 0;
+
+	if (args->num_cliprects != 0) {
+		if (ring != &dev_priv->ring[RCS]) {
+			DRM_DEBUG("clip rectangles are only valid with the render ring\n");
+			return -EINVAL;
+		}
+
+		if (INTEL_INFO(dev)->gen >= 5) {
+			DRM_DEBUG("clip rectangles are only valid on pre-gen5\n");
+			return -EINVAL;
+		}
+
+		if (args->num_cliprects > UINT_MAX / sizeof(*cliprects)) {
+			DRM_DEBUG("execbuf with %u cliprects\n",
+				  args->num_cliprects);
+			return -EINVAL;
+		}
+
+		cliprects = kcalloc(args->num_cliprects,
+				    sizeof(*cliprects),
+				    GFP_KERNEL);
+		if (cliprects == NULL) {
+			ret = -ENOMEM;
+			goto error;
+		}
+
+		if (copy_from_user(cliprects,
+				   to_user_ptr(args->cliprects_ptr),
+				   sizeof(*cliprects)*args->num_cliprects)) {
+			ret = -EFAULT;
+			goto error;
+		}
+	} else {
+		if (args->DR4 == 0xffffffff) {
+			DRM_DEBUG("UXA submitting garbage DR4, fixing up\n");
+			args->DR4 = 0;
+		}
+
+		if (args->DR1 || args->DR4 || args->cliprects_ptr) {
+			DRM_DEBUG("0 cliprects but dirt in cliprects fields\n");
+			return -EINVAL;
+		}
+	}
+
+	ret = i915_gem_execbuffer_move_to_gpu(ring, vmas);
+	if (ret)
+		goto error;
+
+	ret = i915_switch_context(ring, ctx);
+	if (ret)
+		goto error;
+
+	instp_mode = args->flags & I915_EXEC_CONSTANTS_MASK;
+	instp_mask = I915_EXEC_CONSTANTS_MASK;
+	switch (instp_mode) {
+	case I915_EXEC_CONSTANTS_REL_GENERAL:
+	case I915_EXEC_CONSTANTS_ABSOLUTE:
+	case I915_EXEC_CONSTANTS_REL_SURFACE:
+		if (instp_mode != 0 && ring != &dev_priv->ring[RCS]) {
+			DRM_DEBUG("non-0 rel constants mode on non-RCS\n");
+			ret = -EINVAL;
+			goto error;
+		}
+
+		if (instp_mode != dev_priv->relative_constants_mode) {
+			if (INTEL_INFO(dev)->gen < 4) {
+				DRM_DEBUG("no rel constants on pre-gen4\n");
+				ret = -EINVAL;
+				goto error;
+			}
+
+			if (INTEL_INFO(dev)->gen > 5 &&
+			    instp_mode == I915_EXEC_CONSTANTS_REL_SURFACE) {
+				DRM_DEBUG("rel surface constants mode invalid on gen5+\n");
+				ret = -EINVAL;
+				goto error;
+			}
+
+			/* The HW changed the meaning on this bit on gen6 */
+			if (INTEL_INFO(dev)->gen >= 6)
+				instp_mask &= ~I915_EXEC_CONSTANTS_REL_SURFACE;
+		}
+		break;
+	default:
+		DRM_DEBUG("execbuf with unknown constants: %d\n", instp_mode);
+		ret = -EINVAL;
+		goto error;
+	}
+
+	if (ring == &dev_priv->ring[RCS] &&
+			instp_mode != dev_priv->relative_constants_mode) {
+		ret = intel_ring_begin(ring, 4);
+		if (ret)
+			goto error;
+
+		intel_ring_emit(ring, MI_NOOP);
+		intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
+		intel_ring_emit(ring, INSTPM);
+		intel_ring_emit(ring, instp_mask << 16 | instp_mode);
+		intel_ring_advance(ring);
+
+		dev_priv->relative_constants_mode = instp_mode;
+	}
+
+	if (args->flags & I915_EXEC_GEN7_SOL_RESET) {
+		ret = i915_reset_gen7_sol_offsets(dev, ring);
+		if (ret)
+			goto error;
+	}
+
+	exec_len = args->batch_len;
+	if (cliprects) {
+		for (i = 0; i < args->num_cliprects; i++) {
+			ret = i915_emit_box(dev, &cliprects[i],
+					    args->DR1, args->DR4);
+			if (ret)
+				goto error;
+
+			ret = ring->dispatch_execbuffer(ring,
+							exec_start, exec_len,
+							flags);
+			if (ret)
+				goto error;
+		}
+	} else {
+		ret = ring->dispatch_execbuffer(ring,
+						exec_start, exec_len,
+						flags);
+		if (ret)
+			return ret;
+	}
+
+	trace_i915_gem_ring_dispatch(ring, intel_ring_get_seqno(ring), flags);
+
+	i915_gem_execbuffer_move_to_active(vmas, ring);
+	i915_gem_execbuffer_retire_commands(dev, file, ring, batch_obj);
+
+error:
+	kfree(cliprects);
+	return ret;
+}
+
 /**
  * Find one BSD ring to dispatch the corresponding BSD command.
  * The Ring ID is returned.
@@ -1064,14 +1221,13 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct eb_vmas *eb;
 	struct drm_i915_gem_object *batch_obj;
-	struct drm_clip_rect *cliprects = NULL;
 	struct intel_engine_cs *ring;
 	struct intel_context *ctx;
 	struct i915_address_space *vm;
 	const u32 ctx_id = i915_execbuffer2_get_context_id(*args);
-	u64 exec_start = args->batch_start_offset, exec_len;
-	u32 mask, flags;
-	int ret, mode, i;
+	u64 exec_start = args->batch_start_offset;
+	u32 flags;
+	int ret;
 	bool need_relocs;
 
 	if (!i915_gem_check_execbuffer(args))
@@ -1115,87 +1271,11 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		return -EINVAL;
 	}
 
-	mode = args->flags & I915_EXEC_CONSTANTS_MASK;
-	mask = I915_EXEC_CONSTANTS_MASK;
-	switch (mode) {
-	case I915_EXEC_CONSTANTS_REL_GENERAL:
-	case I915_EXEC_CONSTANTS_ABSOLUTE:
-	case I915_EXEC_CONSTANTS_REL_SURFACE:
-		if (mode != 0 && ring != &dev_priv->ring[RCS]) {
-			DRM_DEBUG("non-0 rel constants mode on non-RCS\n");
-			return -EINVAL;
-		}
-
-		if (mode != dev_priv->relative_constants_mode) {
-			if (INTEL_INFO(dev)->gen < 4) {
-				DRM_DEBUG("no rel constants on pre-gen4\n");
-				return -EINVAL;
-			}
-
-			if (INTEL_INFO(dev)->gen > 5 &&
-			    mode == I915_EXEC_CONSTANTS_REL_SURFACE) {
-				DRM_DEBUG("rel surface constants mode invalid on gen5+\n");
-				return -EINVAL;
-			}
-
-			/* The HW changed the meaning on this bit on gen6 */
-			if (INTEL_INFO(dev)->gen >= 6)
-				mask &= ~I915_EXEC_CONSTANTS_REL_SURFACE;
-		}
-		break;
-	default:
-		DRM_DEBUG("execbuf with unknown constants: %d\n", mode);
-		return -EINVAL;
-	}
-
 	if (args->buffer_count < 1) {
 		DRM_DEBUG("execbuf with %d buffers\n", args->buffer_count);
 		return -EINVAL;
 	}
 
-	if (args->num_cliprects != 0) {
-		if (ring != &dev_priv->ring[RCS]) {
-			DRM_DEBUG("clip rectangles are only valid with the render ring\n");
-			return -EINVAL;
-		}
-
-		if (INTEL_INFO(dev)->gen >= 5) {
-			DRM_DEBUG("clip rectangles are only valid on pre-gen5\n");
-			return -EINVAL;
-		}
-
-		if (args->num_cliprects > UINT_MAX / sizeof(*cliprects)) {
-			DRM_DEBUG("execbuf with %u cliprects\n",
-				  args->num_cliprects);
-			return -EINVAL;
-		}
-
-		cliprects = kcalloc(args->num_cliprects,
-				    sizeof(*cliprects),
-				    GFP_KERNEL);
-		if (cliprects == NULL) {
-			ret = -ENOMEM;
-			goto pre_mutex_err;
-		}
-
-		if (copy_from_user(cliprects,
-				   to_user_ptr(args->cliprects_ptr),
-				   sizeof(*cliprects)*args->num_cliprects)) {
-			ret = -EFAULT;
-			goto pre_mutex_err;
-		}
-	} else {
-		if (args->DR4 == 0xffffffff) {
-			DRM_DEBUG("UXA submitting garbage DR4, fixing up\n");
-			args->DR4 = 0;
-		}
-
-		if (args->DR1 || args->DR4 || args->cliprects_ptr) {
-			DRM_DEBUG("0 cliprects but dirt in cliprects fields\n");
-			return -EINVAL;
-		}
-	}
-
 	intel_runtime_pm_get(dev_priv);
 
 	ret = i915_mutex_lock_interruptible(dev);
@@ -1299,63 +1379,11 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	else
 		exec_start += i915_gem_obj_offset(batch_obj, vm);
 
-	ret = i915_gem_execbuffer_move_to_gpu(ring, &eb->vmas);
+	ret = legacy_ringbuffer_submission(dev, file, ring, ctx,
+			args, &eb->vmas, batch_obj, exec_start, flags);
 	if (ret)
 		goto err;
 
-	ret = i915_switch_context(ring, ctx);
-	if (ret)
-		goto err;
-
-	if (ring == &dev_priv->ring[RCS] &&
-	    mode != dev_priv->relative_constants_mode) {
-		ret = intel_ring_begin(ring, 4);
-		if (ret)
-				goto err;
-
-		intel_ring_emit(ring, MI_NOOP);
-		intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
-		intel_ring_emit(ring, INSTPM);
-		intel_ring_emit(ring, mask << 16 | mode);
-		intel_ring_advance(ring);
-
-		dev_priv->relative_constants_mode = mode;
-	}
-
-	if (args->flags & I915_EXEC_GEN7_SOL_RESET) {
-		ret = i915_reset_gen7_sol_offsets(dev, ring);
-		if (ret)
-			goto err;
-	}
-
-
-	exec_len = args->batch_len;
-	if (cliprects) {
-		for (i = 0; i < args->num_cliprects; i++) {
-			ret = i915_emit_box(dev, &cliprects[i],
-					    args->DR1, args->DR4);
-			if (ret)
-				goto err;
-
-			ret = ring->dispatch_execbuffer(ring,
-							exec_start, exec_len,
-							flags);
-			if (ret)
-				goto err;
-		}
-	} else {
-		ret = ring->dispatch_execbuffer(ring,
-						exec_start, exec_len,
-						flags);
-		if (ret)
-			goto err;
-	}
-
-	trace_i915_gem_ring_dispatch(ring, intel_ring_get_seqno(ring), flags);
-
-	i915_gem_execbuffer_move_to_active(&eb->vmas, ring);
-	i915_gem_execbuffer_retire_commands(dev, file, ring, batch_obj);
-
 err:
 	/* the request owns the ref now */
 	i915_gem_context_unreference(ctx);
@@ -1364,8 +1392,6 @@ err:
 	mutex_unlock(&dev->struct_mutex);
 
 pre_mutex_err:
-	kfree(cliprects);
-
 	/* intel_gpu_busy should also get a ref, so it will free when the device
 	 * is really idle. */
 	intel_runtime_pm_put(dev_priv);
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH 34/53] drm/i915: Make move_to_active and retire_commands outside accesible
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
                   ` (32 preceding siblings ...)
  2014-06-13 15:37 ` [PATCH 33/53] drm/i915: Extract the actual workload submission mechanism from execbuffer oscar.mateo
@ 2014-06-13 15:37 ` oscar.mateo
  2014-06-13 15:37 ` [PATCH 35/53] drm/i915/bdw: Workload submission mechanism for Execlists oscar.mateo
                   ` (19 subsequent siblings)
  53 siblings, 0 replies; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:37 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Trivial change.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h            | 6 ++++++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 4 ++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 65a85ee..3e9983c 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2125,6 +2125,12 @@ int i915_gem_set_domain_ioctl(struct drm_device *dev, void *data,
 			      struct drm_file *file_priv);
 int i915_gem_sw_finish_ioctl(struct drm_device *dev, void *data,
 			     struct drm_file *file_priv);
+void i915_gem_execbuffer_move_to_active(struct list_head *vmas,
+					struct intel_engine_cs *ring);
+void i915_gem_execbuffer_retire_commands(struct drm_device *dev,
+					 struct drm_file *file,
+					 struct intel_engine_cs *ring,
+					 struct drm_i915_gem_object *obj);
 int i915_gem_execbuffer(struct drm_device *dev, void *data,
 			struct drm_file *file_priv);
 int i915_gem_execbuffer2(struct drm_device *dev, void *data,
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 0c038cf..366469d 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -931,7 +931,7 @@ validate_exec_list(struct drm_i915_gem_exec_object2 *exec,
 	return 0;
 }
 
-static void
+void
 i915_gem_execbuffer_move_to_active(struct list_head *vmas,
 				   struct intel_engine_cs *ring)
 {
@@ -965,7 +965,7 @@ i915_gem_execbuffer_move_to_active(struct list_head *vmas,
 	}
 }
 
-static void
+void
 i915_gem_execbuffer_retire_commands(struct drm_device *dev,
 				    struct drm_file *file,
 				    struct intel_engine_cs *ring,
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH 35/53] drm/i915/bdw: Workload submission mechanism for Execlists
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
                   ` (33 preceding siblings ...)
  2014-06-13 15:37 ` [PATCH 34/53] drm/i915: Make move_to_active and retire_commands outside accesible oscar.mateo
@ 2014-06-13 15:37 ` oscar.mateo
  2014-06-13 15:37 ` [PATCH 36/53] drm/i915: Abstract the workload submission mechanism away oscar.mateo
                   ` (18 subsequent siblings)
  53 siblings, 0 replies; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:37 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

This is what i915_gem_do_execbuffer calls when it wants to execute some
workload in an Execlists world. It's a candidate for abstracting the
submission mechanism away.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   8 +-
 drivers/gpu/drm/i915/intel_lrc.c           | 138 +++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.h           |   8 ++
 3 files changed, 152 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 366469d..36c7f0c 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1379,8 +1379,12 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	else
 		exec_start += i915_gem_obj_offset(batch_obj, vm);
 
-	ret = legacy_ringbuffer_submission(dev, file, ring, ctx,
-			args, &eb->vmas, batch_obj, exec_start, flags);
+	if (intel_enable_execlists(dev))
+		ret = intel_execlists_submission(dev, file, ring, ctx,
+				args, &eb->vmas, batch_obj, exec_start, flags);
+	else
+		ret = legacy_ringbuffer_submission(dev, file, ring, ctx,
+				args, &eb->vmas, batch_obj, exec_start, flags);
 	if (ret)
 		goto err;
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 27fde8d..c9a5e00 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -130,6 +130,144 @@ static int logical_ring_flush_all_caches(struct intel_engine_cs *ring,
 	return 0;
 }
 
+static int logical_ring_invalidate_all_caches(struct intel_engine_cs *ring,
+					      struct intel_context *ctx)
+{
+	uint32_t flush_domains;
+	int ret;
+
+	flush_domains = 0;
+	if (ring->gpu_caches_dirty)
+		flush_domains = I915_GEM_GPU_DOMAINS;
+
+	ret = ring->emit_flush(ring, ctx, I915_GEM_GPU_DOMAINS, flush_domains);
+	if (ret)
+		return ret;
+
+	ring->gpu_caches_dirty = false;
+	return 0;
+}
+
+static int execlists_move_to_gpu(struct intel_engine_cs *ring,
+				 struct intel_context *ctx,
+				 struct list_head *vmas)
+{
+	struct i915_vma *vma;
+	uint32_t flush_domains = 0;
+	bool flush_chipset = false;
+	int ret;
+
+	list_for_each_entry(vma, vmas, exec_list) {
+		struct drm_i915_gem_object *obj = vma->obj;
+		ret = i915_gem_object_sync(obj, ring);
+		if (ret)
+			return ret;
+
+		if (obj->base.write_domain & I915_GEM_DOMAIN_CPU)
+			flush_chipset |= i915_gem_clflush_object(obj, false);
+
+		flush_domains |= obj->base.write_domain;
+	}
+
+	if (flush_chipset)
+		i915_gem_chipset_flush(ring->dev);
+
+	if (flush_domains & I915_GEM_DOMAIN_GTT)
+		wmb();
+
+	/* Unconditionally invalidate gpu caches and ensure that we do flush
+	 * any residual writes from the previous batch.
+	 */
+	return logical_ring_invalidate_all_caches(ring, ctx);
+}
+
+int intel_execlists_submission(struct drm_device *dev, struct drm_file *file,
+			       struct intel_engine_cs *ring,
+			       struct intel_context *ctx,
+			       struct drm_i915_gem_execbuffer2 *args,
+			       struct list_head *vmas,
+			       struct drm_i915_gem_object *batch_obj,
+			       u64 exec_start, u32 flags)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_ringbuffer *ringbuf = logical_ringbuf_get(ring, ctx);
+	int instp_mode;
+	u32 instp_mask;
+	int ret;
+
+	if (args->num_cliprects != 0) {
+		DRM_DEBUG("clip rectangles are only valid on pre-gen5\n");
+		return -EINVAL;
+	} else {
+		if (args->DR4 == 0xffffffff) {
+			DRM_DEBUG("UXA submitting garbage DR4, fixing up\n");
+			args->DR4 = 0;
+		}
+
+		if (args->DR1 || args->DR4 || args->cliprects_ptr) {
+			DRM_DEBUG("0 cliprects but dirt in cliprects fields\n");
+			return -EINVAL;
+		}
+	}
+
+	ret = execlists_move_to_gpu(ring, ctx, vmas);
+	if (ret)
+		return ret;
+
+	instp_mode = args->flags & I915_EXEC_CONSTANTS_MASK;
+	instp_mask = I915_EXEC_CONSTANTS_MASK;
+	/* The HW changed the meaning on this bit on gen6 */
+	instp_mask &= ~I915_EXEC_CONSTANTS_REL_SURFACE;
+	switch (instp_mode) {
+	case I915_EXEC_CONSTANTS_REL_GENERAL:
+		break;
+
+	case I915_EXEC_CONSTANTS_ABSOLUTE:
+		if (ring != &dev_priv->ring[RCS]) {
+			DRM_DEBUG("non-0 rel constants mode on non-RCS\n");
+			return -EINVAL;
+		}
+		break;
+
+	case I915_EXEC_CONSTANTS_REL_SURFACE:
+		DRM_DEBUG("rel surface constants mode invalid on gen5+\n");
+		return -EINVAL;
+
+	default:
+		DRM_DEBUG("execbuf with unknown constants: %d\n", instp_mode);
+		return -EINVAL;
+	}
+
+	if (ring == &dev_priv->ring[RCS] &&
+			instp_mode != dev_priv->relative_constants_mode) {
+		ret = intel_logical_ring_begin(ring, ctx, 4);
+		if (ret)
+			return ret;
+
+		intel_logical_ring_emit(ringbuf, MI_NOOP);
+		intel_logical_ring_emit(ringbuf, MI_LOAD_REGISTER_IMM(1));
+		intel_logical_ring_emit(ringbuf, INSTPM);
+		intel_logical_ring_emit(ringbuf, instp_mask << 16 | instp_mode);
+		intel_logical_ring_advance(ringbuf);
+
+		dev_priv->relative_constants_mode = instp_mode;
+	}
+
+	if (args->flags & I915_EXEC_GEN7_SOL_RESET) {
+		DRM_DEBUG("sol reset is gen7 only\n");
+		return -EINVAL;
+	}
+
+	ret = ring->emit_bb_start(ring, ctx, exec_start, flags);
+	if (ret)
+		return ret;
+
+	i915_gem_execbuffer_move_to_active(vmas, ring);
+	i915_gem_execbuffer_retire_commands(dev, file, ring, batch_obj);
+
+	return 0;
+}
+
 int intel_logical_ring_add_request(struct intel_engine_cs *ring,
 				   struct drm_file *file,
 				   struct drm_i915_gem_object *obj,
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 4495359..0cb7cb5 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -5,6 +5,14 @@
 void intel_logical_ring_cleanup(struct intel_engine_cs *ring);
 int intel_logical_rings_init(struct drm_device *dev);
 
+int intel_execlists_submission(struct drm_device *dev, struct drm_file *file,
+			       struct intel_engine_cs *ring,
+			       struct intel_context *ctx,
+			       struct drm_i915_gem_execbuffer2 *args,
+			       struct list_head *vmas,
+			       struct drm_i915_gem_object *batch_obj,
+			       u64 exec_start, u32 flags);
+
 int intel_logical_ring_add_request(struct intel_engine_cs *ring,
 				   struct drm_file *file,
 				   struct drm_i915_gem_object *obj,
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH 36/53] drm/i915: Abstract the workload submission mechanism away
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
                   ` (34 preceding siblings ...)
  2014-06-13 15:37 ` [PATCH 35/53] drm/i915/bdw: Workload submission mechanism for Execlists oscar.mateo
@ 2014-06-13 15:37 ` oscar.mateo
  2014-06-18 20:40   ` Daniel Vetter
  2014-06-13 15:37 ` [PATCH 37/53] drm/i915/bdw: Implement context switching (somewhat) oscar.mateo
                   ` (17 subsequent siblings)
  53 siblings, 1 reply; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:37 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

As suggested by Daniel: hide the choice between the legacy ringbuffer
and the Execlists submission paths behind a set of function pointers in
dev_priv->gt, selected once at init time, instead of branching on
intel_enable_execlists() at every call site.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h            | 26 ++++++++++++++++
 drivers/gpu/drm/i915/i915_gem.c            | 48 +++++++++++++++++++-----------
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 25 +++++++---------
 3 files changed, 67 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 3e9983c..89b6d5c 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1554,6 +1554,23 @@ struct drm_i915_private {
 	/* Old ums support infrastructure, same warning applies. */
 	struct i915_ums_state ums;
 
+	/* Abstract the submission mechanism (legacy ringbuffer or execlists) away */
+	struct {
+		int (*do_execbuf) (struct drm_device *dev, struct drm_file *file,
+				   struct intel_engine_cs *ring,
+				   struct intel_context *ctx,
+				   struct drm_i915_gem_execbuffer2 *args,
+				   struct list_head *vmas,
+				   struct drm_i915_gem_object *batch_obj,
+				   u64 exec_start, u32 flags);
+		int (*add_request) (struct intel_engine_cs *ring,
+				    struct drm_file *file,
+				    struct drm_i915_gem_object *obj,
+				    u32 *out_seqno);
+		int (*init_rings) (struct drm_device *dev);
+		void (*cleanup_ring) (struct intel_engine_cs *ring);
+	} gt;
+
 	/*
 	 * NOTE: This is the dri1/ums dungeon, don't add stuff here. Your patch
 	 * will be rejected. Instead look for a better place.
@@ -2131,6 +2148,14 @@ void i915_gem_execbuffer_retire_commands(struct drm_device *dev,
 					 struct drm_file *file,
 					 struct intel_engine_cs *ring,
 					 struct drm_i915_gem_object *obj);
+int i915_gem_ringbuffer_submission(struct drm_device *dev,
+				   struct drm_file *file,
+				   struct intel_engine_cs *ring,
+				   struct intel_context *ctx,
+				   struct drm_i915_gem_execbuffer2 *args,
+				   struct list_head *vmas,
+				   struct drm_i915_gem_object *batch_obj,
+				   u64 exec_start, u32 flags);
 int i915_gem_execbuffer(struct drm_device *dev, void *data,
 			struct drm_file *file_priv);
 int i915_gem_execbuffer2(struct drm_device *dev, void *data,
@@ -2281,6 +2306,7 @@ void i915_gem_reset(struct drm_device *dev);
 bool i915_gem_clflush_object(struct drm_i915_gem_object *obj, bool force);
 int __must_check i915_gem_object_finish_gpu(struct drm_i915_gem_object *obj);
 int __must_check i915_gem_init(struct drm_device *dev);
+int i915_gem_init_rings(struct drm_device *dev);
 int __must_check i915_gem_init_hw(struct drm_device *dev);
 int i915_gem_l3_remap(struct intel_engine_cs *ring, int slice);
 void i915_gem_init_swizzling(struct drm_device *dev);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 69db71a..7c10540 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2310,19 +2310,16 @@ i915_gem_get_seqno(struct drm_device *dev, u32 *seqno)
 	return 0;
 }
 
-int __i915_add_request(struct intel_engine_cs *ring,
-		       struct drm_file *file,
-		       struct drm_i915_gem_object *obj,
-		       u32 *out_seqno)
+static int i915_gem_add_request(struct intel_engine_cs *ring,
+				struct drm_file *file,
+				struct drm_i915_gem_object *obj,
+				u32 *out_seqno)
 {
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
 	struct drm_i915_gem_request *request;
 	u32 request_ring_position, request_start;
 	int ret;
 
-	if (intel_enable_execlists(ring->dev))
-		return intel_logical_ring_add_request(ring, file, obj, out_seqno);
-
 	request_start = intel_ring_get_tail(ring->buffer);
 	/*
 	 * Emit any outstanding flushes - execbuf can fail to emit the flush
@@ -2403,6 +2400,16 @@ int __i915_add_request(struct intel_engine_cs *ring,
 	return 0;
 }
 
+int __i915_add_request(struct intel_engine_cs *ring,
+		       struct drm_file *file,
+		       struct drm_i915_gem_object *obj,
+		       u32 *out_seqno)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+
+	return dev_priv->gt.add_request(ring, file, obj, out_seqno);
+}
+
 static inline void
 i915_gem_request_remove_from_client(struct drm_i915_gem_request *request)
 {
@@ -4627,7 +4634,7 @@ intel_enable_blt(struct drm_device *dev)
 	return true;
 }
 
-static int i915_gem_init_rings(struct drm_device *dev)
+int i915_gem_init_rings(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	int ret;
@@ -4710,10 +4717,7 @@ i915_gem_init_hw(struct drm_device *dev)
 
 	i915_gem_init_swizzling(dev);
 
-	if (intel_enable_execlists(dev))
-		ret = intel_logical_rings_init(dev);
-	else
-		ret = i915_gem_init_rings(dev);
+	ret = dev_priv->gt.init_rings(dev);
 	if (ret)
 		return ret;
 
@@ -4751,6 +4755,18 @@ int i915_gem_init(struct drm_device *dev)
 			DRM_DEBUG_DRIVER("allow wake ack timed out\n");
 	}
 
+	if (intel_enable_execlists(dev)) {
+		dev_priv->gt.do_execbuf = intel_execlists_submission;
+		dev_priv->gt.add_request = intel_logical_ring_add_request;
+		dev_priv->gt.init_rings = intel_logical_rings_init;
+		dev_priv->gt.cleanup_ring = intel_logical_ring_cleanup;
+	} else {
+		dev_priv->gt.do_execbuf = i915_gem_ringbuffer_submission;
+		dev_priv->gt.add_request = i915_gem_add_request;
+		dev_priv->gt.init_rings = i915_gem_init_rings;
+		dev_priv->gt.cleanup_ring = intel_cleanup_ring_buffer;
+	}
+
 	i915_gem_init_userptr(dev);
 	i915_gem_init_global_gtt(dev);
 
@@ -4785,12 +4801,8 @@ i915_gem_cleanup_ringbuffer(struct drm_device *dev)
 	struct intel_engine_cs *ring;
 	int i;
 
-	for_each_ring(ring, dev_priv, i) {
-		if (intel_enable_execlists(dev))
-			intel_logical_ring_cleanup(ring);
-		else
-			intel_cleanup_ring_buffer(ring);
-	}
+	for_each_ring(ring, dev_priv, i)
+		dev_priv->gt.cleanup_ring(ring);
 }
 
 int
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 36c7f0c..f0dd31f 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1005,14 +1005,15 @@ i915_reset_gen7_sol_offsets(struct drm_device *dev,
 	return 0;
 }
 
-static int
-legacy_ringbuffer_submission(struct drm_device *dev, struct drm_file *file,
-			     struct intel_engine_cs *ring,
-			     struct intel_context *ctx,
-			     struct drm_i915_gem_execbuffer2 *args,
-			     struct list_head *vmas,
-			     struct drm_i915_gem_object *batch_obj,
-			     u64 exec_start, u32 flags)
+int
+i915_gem_ringbuffer_submission(struct drm_device *dev,
+			       struct drm_file *file,
+			       struct intel_engine_cs *ring,
+			       struct intel_context *ctx,
+			       struct drm_i915_gem_execbuffer2 *args,
+			       struct list_head *vmas,
+			       struct drm_i915_gem_object *batch_obj,
+			       u64 exec_start, u32 flags)
 {
 	struct drm_clip_rect *cliprects = NULL;
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -1379,12 +1380,8 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	else
 		exec_start += i915_gem_obj_offset(batch_obj, vm);
 
-	if (intel_enable_execlists(dev))
-		ret = intel_execlists_submission(dev, file, ring, ctx,
-				args, &eb->vmas, batch_obj, exec_start, flags);
-	else
-		ret = legacy_ringbuffer_submission(dev, file, ring, ctx,
-				args, &eb->vmas, batch_obj, exec_start, flags);
+	ret = dev_priv->gt.do_execbuf(dev, file, ring, ctx, args,
+			&eb->vmas, batch_obj, exec_start, flags);
 	if (ret)
 		goto err;
 
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH 37/53] drm/i915/bdw: Implement context switching (somewhat)
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
                   ` (35 preceding siblings ...)
  2014-06-13 15:37 ` [PATCH 36/53] drm/i915: Abstract the workload submission mechanism away oscar.mateo
@ 2014-06-13 15:37 ` oscar.mateo
  2014-06-13 17:00   ` Chris Wilson
  2014-06-13 15:37 ` [PATCH 38/53] drm/i915/bdw: Write the tail pointer, LRC style oscar.mateo
                   ` (16 subsequent siblings)
  53 siblings, 1 reply; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:37 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

A context switch occurs by submitting a context descriptor to the
ExecList Submission Port. Given that we can now initialize a context,
it's possible to begin implementing the context switch by creating the
descriptor and submitting it to ELSP (actually two, since the ELSP
has two ports).

The context object must be mapped in the GGTT, which means it must exist
in the 0-4GB graphics VA range.
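
As an illustration, the resulting 64-bit descriptor can be modeled in
plain C (a sketch based on the GEN8_CTX_* defines in the diff below;
the coherency and privilege bits are omitted and the function name is
made up):

#include <stdint.h>

static uint64_t model_ctx_descriptor(uint32_t lrca)
{
	uint64_t desc = 1;                    /* GEN8_CTX_VALID */

	desc |= 1ull << 3;                    /* LEGACY_CONTEXT << GEN8_CTX_MODE_SHIFT */
	desc |= lrca;                         /* 4K-aligned GGTT offset, low 12 bits zero */
	desc |= (uint64_t)(lrca >> 12) << 32; /* hwCtxId[19:0] in the high dword */
	return desc;
}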

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

v2: This code has changed quite a lot in various rebases. Of particular
importance is that now we use the globally unique Submission ID to send
to the hardware. Also, context pages are now pinned unconditionally to
GGTT, so there is no need to bind them.

v3: Use LRCA[31:12] as hwCtxId[19:0]. This guarantees that the HW context
ID we submit to the ELSP is globally unique and != 0 (Bspec requirements
of the software use-only bits of the Context ID in the Context Descriptor
Format) without the hassle of the previous submission Id construction.
Also, re-add the ELSP posting read (it was dropped somewhere during the
rebases).

v4:
- Squash with "drm/i915/bdw: Add forcewake lock around ELSP writes" (BSPEC
  says: "SW must set Force Wakeup bit to prevent GT from entering C6 while
  ELSP writes are in progress") as noted by Thomas Daniel
  (thomas.daniel@intel.com).
- Rename functions and use an execlists/intel_execlists_ namespace.
- The BUG_ON only checked that the LRCA was <32 bits, but it didn't make
  sure that it was properly aligned. Spotted by Alistair Mcaulay
  <alistair.mcaulay@intel.com>.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 112 ++++++++++++++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/intel_lrc.h |   1 +
 2 files changed, 112 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index c9a5e00..4e8268c 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -47,6 +47,7 @@
 #define GEN8_LR_CONTEXT_ALIGN 4096
 
 #define RING_ELSP(ring)			((ring)->mmio_base+0x230)
+#define RING_EXECLIST_STATUS(ring)	((ring)->mmio_base+0x234)
 #define RING_CONTEXT_CONTROL(ring)	((ring)->mmio_base+0x244)
 
 #define CTX_LRI_HEADER_0		0x01
@@ -78,6 +79,26 @@
 #define CTX_R_PWR_CLK_STATE		0x42
 #define CTX_GPGPU_CSR_BASE_ADDRESS	0x44
 
+#define GEN8_CTX_VALID (1<<0)
+#define GEN8_CTX_FORCE_PD_RESTORE (1<<1)
+#define GEN8_CTX_FORCE_RESTORE (1<<2)
+#define GEN8_CTX_L3LLC_COHERENT (1<<5)
+#define GEN8_CTX_PRIVILEGE (1<<8)
+enum {
+	ADVANCED_CONTEXT=0,
+	LEGACY_CONTEXT,
+	ADVANCED_AD_CONTEXT,
+	LEGACY_64B_CONTEXT
+};
+#define GEN8_CTX_MODE_SHIFT 3
+enum {
+	FAULT_AND_HANG=0,
+	FAULT_AND_HALT, /* Debug only */
+	FAULT_AND_STREAM,
+	FAULT_AND_CONTINUE /* Unsupported */
+};
+#define GEN8_CTX_ID_SHIFT 32
+
 bool intel_enable_execlists(struct drm_device *dev)
 {
 	if (!i915.enable_execlists)
@@ -86,6 +107,94 @@ bool intel_enable_execlists(struct drm_device *dev)
 	return HAS_LOGICAL_RING_CONTEXTS(dev) && USES_PPGTT(dev);
 }
 
+u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj)
+{
+	u32 lrca = i915_gem_obj_ggtt_offset(ctx_obj);
+
+	/* LRCA is required to be 4K aligned so the more significant 20 bits
+	 * are globally unique */
+	return lrca >> 12;
+}
+
+static uint64_t execlists_ctx_descriptor(struct drm_i915_gem_object *ctx_obj)
+{
+	uint64_t desc;
+	uint64_t lrca = i915_gem_obj_ggtt_offset(ctx_obj);
+	BUG_ON(lrca & 0xFFFFFFFF00000FFFULL);
+
+	desc = GEN8_CTX_VALID;
+	desc |= LEGACY_CONTEXT << GEN8_CTX_MODE_SHIFT;
+	desc |= GEN8_CTX_L3LLC_COHERENT;
+	desc |= GEN8_CTX_PRIVILEGE;
+	desc |= lrca;
+	desc |= (u64)intel_execlists_ctx_id(ctx_obj) << GEN8_CTX_ID_SHIFT;
+
+	/* TODO: WaDisableLiteRestore when we start using semaphore
+	 * signalling between Command Streamers */
+	/* desc |= GEN8_CTX_FORCE_RESTORE; */
+
+	return desc;
+}
+
+static void execlists_elsp_write(struct intel_engine_cs *ring,
+				 struct drm_i915_gem_object *ctx_obj0,
+				 struct drm_i915_gem_object *ctx_obj1)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	uint64_t temp = 0;
+	uint32_t desc[4];
+
+	/* XXX: You must always write both descriptors in the order below. */
+	if (ctx_obj1)
+		temp = execlists_ctx_descriptor(ctx_obj1);
+	else
+		temp = 0;
+	desc[1] = (u32)(temp >> 32);
+	desc[0] = (u32)temp;
+
+	temp = execlists_ctx_descriptor(ctx_obj0);
+	desc[3] = (u32)(temp >> 32);
+	desc[2] = (u32)temp;
+
+	/* Set Force Wakeup bit to prevent GT from entering C6 while
+	 * ELSP writes are in progress */
+	gen6_gt_force_wake_get(dev_priv, FORCEWAKE_ALL);
+
+	I915_WRITE(RING_ELSP(ring), desc[1]);
+	I915_WRITE(RING_ELSP(ring), desc[0]);
+	I915_WRITE(RING_ELSP(ring), desc[3]);
+	/* The context is automatically loaded after the following */
+	I915_WRITE(RING_ELSP(ring), desc[2]);
+
+	/* ELSP is a write only register, so this serves as a posting read */
+	POSTING_READ(RING_EXECLIST_STATUS(ring));
+
+	/* Release Force Wakeup */
+	gen6_gt_force_wake_put(dev_priv, FORCEWAKE_ALL);
+}
+
+static int execlists_submit_context(struct intel_engine_cs *ring,
+				    struct intel_context *to0, u32 tail0,
+				    struct intel_context *to1, u32 tail1)
+{
+	struct drm_i915_gem_object *ctx_obj0;
+	struct drm_i915_gem_object *ctx_obj1 = NULL;
+
+	ctx_obj0 = to0->engine[ring->id].obj;
+	BUG_ON(!ctx_obj0);
+	BUG_ON(!i915_gem_obj_is_pinned(ctx_obj0));
+
+	if (to1) {
+		ctx_obj1 = to1->engine[ring->id].obj;
+		BUG_ON(!ctx_obj1);
+		BUG_ON(!i915_gem_obj_is_pinned(ctx_obj1));
+	}
+
+	execlists_elsp_write(ring, ctx_obj0, ctx_obj1);
+
+	return 0;
+}
+
 static inline struct intel_ringbuffer *
 logical_ringbuf_get(struct intel_engine_cs *ring, struct intel_context *ctx)
 {
@@ -763,7 +872,8 @@ static void gen8_set_seqno(struct intel_engine_cs *ring, u32 seqno)
 static void gen8_submit_ctx(struct intel_engine_cs *ring,
 			    struct intel_context *ctx, u32 value)
 {
-	DRM_ERROR("Execlists still not ready!\n");
+	/* FIXME: too cheeky, we don't even check if the ELSP is ready */
+	execlists_submit_context(ring, ctx, value, NULL, 0);
 }
 
 static int gen8_emit_request(struct intel_engine_cs *ring,
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 0cb7cb5..eeb90ec 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -41,6 +41,7 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 				     struct intel_engine_cs *ring);
 
 /* Execlists */
+u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj);
 bool intel_enable_execlists(struct drm_device *dev);
 
 #endif /* _INTEL_LRC_H_ */
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH 38/53] drm/i915/bdw: Write the tail pointer, LRC style
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
                   ` (36 preceding siblings ...)
  2014-06-13 15:37 ` [PATCH 37/53] drm/i915/bdw: Implement context switching (somewhat) oscar.mateo
@ 2014-06-13 15:37 ` oscar.mateo
  2014-06-13 15:37 ` [PATCH 39/53] drm/i915/bdw: Two-stage execlist submit process oscar.mateo
                   ` (15 subsequent siblings)
  53 siblings, 0 replies; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:37 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Each logical ring context stores the ring tail pointer in the context
object, so update it there before submission.
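
The register state in the context object is laid out as
MI_LOAD_REGISTER_IMM-style (offset, value) pairs, which is why the diff
below writes to CTX_RING_TAIL + 1: the dword at CTX_RING_TAIL is the
register offset, and the one after it is the value the hardware loads.
A minimal model (the index constant is illustrative, not the real one):

#include <stdint.h>

#define MODEL_CTX_RING_TAIL 7	/* illustrative index */

static void model_write_tail(uint32_t *reg_state, uint32_t tail)
{
	/* reg_state[MODEL_CTX_RING_TAIL] holds the register offset;
	 * the following dword holds the new tail value. */
	reg_state[MODEL_CTX_RING_TAIL + 1] = tail;
}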

v2: New namespace.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 4e8268c..cd4512f 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -173,6 +173,21 @@ static void execlists_elsp_write(struct intel_engine_cs *ring,
 	gen6_gt_force_wake_put(dev_priv, FORCEWAKE_ALL);
 }
 
+static int execlists_ctx_write_tail(struct drm_i915_gem_object *ctx_obj, u32 tail)
+{
+	struct page *page;
+	uint32_t *reg_state;
+
+	page = i915_gem_object_get_page(ctx_obj, 1);
+	reg_state = kmap_atomic(page);
+
+	reg_state[CTX_RING_TAIL+1] = tail;
+
+	kunmap_atomic(reg_state);
+
+	return 0;
+}
+
 static int execlists_submit_context(struct intel_engine_cs *ring,
 				    struct intel_context *to0, u32 tail0,
 				    struct intel_context *to1, u32 tail1)
@@ -184,10 +199,14 @@ static int execlists_submit_context(struct intel_engine_cs *ring,
 	BUG_ON(!ctx_obj0);
 	BUG_ON(!i915_gem_obj_is_pinned(ctx_obj0));
 
+	execlists_ctx_write_tail(ctx_obj0, tail0);
+
 	if (to1) {
 		ctx_obj1 = to1->engine[ring->id].obj;
 		BUG_ON(!ctx_obj1);
 		BUG_ON(!i915_gem_obj_is_pinned(ctx_obj1));
+
+		execlists_ctx_write_tail(ctx_obj1, tail1);
 	}
 
 	execlists_elsp_write(ring, ctx_obj0, ctx_obj1);
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH 39/53] drm/i915/bdw: Two-stage execlist submit process
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
                   ` (37 preceding siblings ...)
  2014-06-13 15:37 ` [PATCH 38/53] drm/i915/bdw: Write the tail pointer, LRC style oscar.mateo
@ 2014-06-13 15:37 ` oscar.mateo
  2014-06-13 15:37 ` [PATCH 40/53] drm/i915/bdw: Handle context switch events oscar.mateo
                   ` (14 subsequent siblings)
  53 siblings, 0 replies; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:37 UTC (permalink / raw)
  To: intel-gfx

From: Michel Thierry <michel.thierry@intel.com>

Context switch (and execlist submission) should happen only when
other contexts are not active, otherwise pre-emption occurs.

To ensure this, we place context switch requests in a queue, and those
requests are later consumed when the right context switch interrupt is
received (still TODO).
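
The pairing rule used at unqueue time can be sketched in plain C
(illustrative types and names; the real code walks a kernel list under
the execlist lock): coalesce consecutive requests for the same context,
since the later tail supersedes the earlier one, and stop after two
distinct contexts, which is all the two ELSP ports can take.

#include <stddef.h>

struct model_req {
	int ctx_id;
	unsigned int tail;
	struct model_req *next;
};

static void model_unqueue(struct model_req *head,
			  struct model_req **port0, struct model_req **port1)
{
	struct model_req *req0 = NULL, *req1 = NULL, *cur;

	for (cur = head; cur; cur = cur->next) {
		if (!req0)
			req0 = cur;
		else if (cur->ctx_id == req0->ctx_id)
			req0 = cur;	/* same ctx: later tail supersedes */
		else {
			req1 = cur;	/* second distinct ctx fills port 1 */
			break;
		}
	}
	*port0 = req0;
	*port1 = req1;	/* may stay NULL: one context alone is fine */
}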

v2: Use a spinlock, do not remove the requests on unqueue (wait for
context switch completion).

Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>

v3: Several rebases and code changes. Use unique ID.

v4:
- Move the queue/lock init to the late ring initialization.
- Damien's kmalloc review comments: check return, use sizeof(*req),
do not cast.

v5:
- Do not reuse drm_i915_gem_request. Instead, create our own.
- New namespace.

Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v1)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2-v5)
---
 drivers/gpu/drm/i915/intel_lrc.c        | 63 +++++++++++++++++++++++++++++++--
 drivers/gpu/drm/i915/intel_lrc.h        |  8 +++++
 drivers/gpu/drm/i915/intel_ringbuffer.h |  2 ++
 3 files changed, 71 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index cd4512f..49d3c00 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -214,6 +214,63 @@ static int execlists_submit_context(struct intel_engine_cs *ring,
 	return 0;
 }
 
+static void execlists_context_unqueue(struct intel_engine_cs *ring)
+{
+	struct intel_ctx_submit_request *req0 = NULL, *req1 = NULL;
+	struct intel_ctx_submit_request *cursor = NULL, *tmp = NULL;
+
+	if (list_empty(&ring->execlist_queue))
+		return;
+
+	/* Try to read in pairs */
+	list_for_each_entry_safe(cursor, tmp, &ring->execlist_queue, execlist_link) {
+		if (!req0)
+			req0 = cursor;
+		else if (req0->ctx == cursor->ctx) {
+			/* Same ctx: ignore first request, as second request
+			 * will update tail past first request's workload */
+			list_del(&req0->execlist_link);
+			i915_gem_context_unreference(req0->ctx);
+			kfree(req0);
+			req0 = cursor;
+		} else {
+			req1 = cursor;
+			break;
+		}
+	}
+
+	BUG_ON(execlists_submit_context(ring, req0->ctx, req0->tail,
+			req1? req1->ctx : NULL, req1? req1->tail : 0));
+}
+
+static int execlists_context_queue(struct intel_engine_cs *ring,
+				   struct intel_context *to,
+				   u32 tail)
+{
+	struct intel_ctx_submit_request *req = NULL;
+	unsigned long flags;
+	bool was_empty;
+
+	req = kzalloc(sizeof(*req), GFP_KERNEL);
+	if (req == NULL)
+		return -ENOMEM;
+	req->ctx = to;
+	i915_gem_context_reference(req->ctx);
+	req->ring = ring;
+	req->tail = tail;
+
+	spin_lock_irqsave(&ring->execlist_lock, flags);
+
+	was_empty = list_empty(&ring->execlist_queue);
+	list_add_tail(&req->execlist_link, &ring->execlist_queue);
+	if (was_empty)
+		execlists_context_unqueue(ring);
+
+	spin_unlock_irqrestore(&ring->execlist_lock, flags);
+
+	return 0;
+}
+
 static inline struct intel_ringbuffer *
 logical_ringbuf_get(struct intel_engine_cs *ring, struct intel_context *ctx)
 {
@@ -891,8 +948,7 @@ static void gen8_set_seqno(struct intel_engine_cs *ring, u32 seqno)
 static void gen8_submit_ctx(struct intel_engine_cs *ring,
 			    struct intel_context *ctx, u32 value)
 {
-	/* FIXME: too cheeky, we don't even check if the ELSP is ready */
-	execlists_submit_context(ring, ctx, value, NULL, 0);
+	execlists_context_queue(ring, ctx, value);
 }
 
 static int gen8_emit_request(struct intel_engine_cs *ring,
@@ -988,6 +1044,9 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
 	INIT_LIST_HEAD(&ring->request_list);
 	init_waitqueue_head(&ring->irq_queue);
 
+	INIT_LIST_HEAD(&ring->execlist_queue);
+	spin_lock_init(&ring->execlist_lock);
+
 	ret = intel_lr_context_deferred_create(dctx, ring);
 	if (ret)
 		return ret;
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index eeb90ec..e1938a3 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -44,4 +44,12 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj);
 bool intel_enable_execlists(struct drm_device *dev);
 
+struct intel_ctx_submit_request {
+	struct intel_context *ctx;
+	struct intel_engine_cs *ring;
+	u32 tail;
+
+	struct list_head execlist_link;
+};
+
 #endif /* _INTEL_LRC_H_ */
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index ca02b5d..c3342e1 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -156,6 +156,8 @@ struct  intel_engine_cs {
 	} semaphore;
 
 	/* Execlists */
+	spinlock_t execlist_lock;
+	struct list_head execlist_queue;
 	u32             irq_keep_mask;          /* bitmask for interrupts that should not be masked */
 	void		(*submit_ctx)(struct intel_engine_cs *ring,
 				      struct intel_context *ctx, u32 value);
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH 40/53] drm/i915/bdw: Handle context switch events
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
                   ` (38 preceding siblings ...)
  2014-06-13 15:37 ` [PATCH 39/53] drm/i915/bdw: Two-stage execlist submit process oscar.mateo
@ 2014-06-13 15:37 ` oscar.mateo
  2014-06-13 15:37 ` [PATCH 41/53] drm/i915/bdw: Avoid non-lite-restore preemptions oscar.mateo
                   ` (13 subsequent siblings)
  53 siblings, 0 replies; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:37 UTC (permalink / raw)
  To: intel-gfx

From: Thomas Daniel <thomas.daniel@intel.com>

Handle all context status events in the context status buffer on every
context switch interrupt. We only remove work from the execlist queue
after the context status buffer reports that a context has completed, and we only
attempt to schedule new contexts on interrupt when a previously submitted
context completes (unless no contexts are queued, which means the GPU is
free).
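
The status buffer is a six-entry ring, so the read/write pointer
arithmetic wraps modulo six; a plain-C model of the unwrap step
(illustrative, leaving out the actual MMIO reads):

#define CSB_ENTRIES 6

/* How many status entries are pending, given that the hardware write
 * pointer wraps at CSB_ENTRIES. Entry i is then read from index
 * (read_ptr + 1 + i) % CSB_ENTRIES. */
static unsigned int model_csb_pending(unsigned int read_ptr,
				      unsigned int write_ptr)
{
	if (read_ptr > write_ptr)
		write_ptr += CSB_ENTRIES;
	return write_ptr - read_ptr;
}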

Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>

v2: Unreferencing the context when we are freeing the request might free
the backing bo, which requires the struct_mutex to be grabbed, so defer
unreferencing and freeing to a bottom half.

v3:
- Ack the interrupt immediately, before trying to handle it (fix for
missing interrupts by Bob Beckett <robert.beckett@intel.com>).
- Update the Context Status Buffer Read Pointer, just in case (spotted
by Damien Lespiau).

v4: New namespace and multiple rebase changes.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_irq.c         |  41 ++++++++-----
 drivers/gpu/drm/i915/intel_lrc.c        | 104 +++++++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/intel_lrc.h        |   3 +
 drivers/gpu/drm/i915/intel_ringbuffer.h |   1 +
 4 files changed, 133 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index c566c38..b0fa1ed 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1455,6 +1455,7 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
 				       struct drm_i915_private *dev_priv,
 				       u32 master_ctl)
 {
+	struct intel_engine_cs *ring;
 	u32 rcs, bcs, vcs;
 	uint32_t tmp = 0;
 	irqreturn_t ret = IRQ_NONE;
@@ -1462,16 +1463,22 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
 	if (master_ctl & (GEN8_GT_RCS_IRQ | GEN8_GT_BCS_IRQ)) {
 		tmp = I915_READ(GEN8_GT_IIR(0));
 		if (tmp) {
+			I915_WRITE(GEN8_GT_IIR(0), tmp);
 			ret = IRQ_HANDLED;
+
 			rcs = tmp >> GEN8_RCS_IRQ_SHIFT;
-			bcs = tmp >> GEN8_BCS_IRQ_SHIFT;
+			ring = &dev_priv->ring[RCS];
 			if (rcs & GT_RENDER_USER_INTERRUPT)
-				notify_ring(dev, &dev_priv->ring[RCS]);
+				notify_ring(dev, ring);
+			if (rcs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
+				intel_execlists_handle_ctx_events(ring);
+
+			bcs = tmp >> GEN8_BCS_IRQ_SHIFT;
+			ring = &dev_priv->ring[BCS];
 			if (bcs & GT_RENDER_USER_INTERRUPT)
-				notify_ring(dev, &dev_priv->ring[BCS]);
-			if ((rcs | bcs) & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
-				DRM_DEBUG_DRIVER("TODO: Context switch\n");
-			I915_WRITE(GEN8_GT_IIR(0), tmp);
+				notify_ring(dev, ring);
+			if (bcs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
+				intel_execlists_handle_ctx_events(ring);
 		} else
 			DRM_ERROR("The master control interrupt lied (GT0)!\n");
 	}
@@ -1479,18 +1486,22 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
 	if (master_ctl & (GEN8_GT_VCS1_IRQ | GEN8_GT_VCS2_IRQ)) {
 		tmp = I915_READ(GEN8_GT_IIR(1));
 		if (tmp) {
+			I915_WRITE(GEN8_GT_IIR(1), tmp);
 			ret = IRQ_HANDLED;
+
 			vcs = tmp >> GEN8_VCS1_IRQ_SHIFT;
+			ring = &dev_priv->ring[VCS];
 			if (vcs & GT_RENDER_USER_INTERRUPT)
-				notify_ring(dev, &dev_priv->ring[VCS]);
+				notify_ring(dev, ring);
 			if (vcs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
-				DRM_DEBUG_DRIVER("TODO: Context switch\n");
+				intel_execlists_handle_ctx_events(ring);
+
 			vcs = tmp >> GEN8_VCS2_IRQ_SHIFT;
+			ring = &dev_priv->ring[VCS2];
 			if (vcs & GT_RENDER_USER_INTERRUPT)
-				notify_ring(dev, &dev_priv->ring[VCS2]);
+				notify_ring(dev, ring);
 			if (vcs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
-				DRM_DEBUG_DRIVER("TODO: Context switch\n");
-			I915_WRITE(GEN8_GT_IIR(1), tmp);
+				intel_execlists_handle_ctx_events(ring);
 		} else
 			DRM_ERROR("The master control interrupt lied (GT1)!\n");
 	}
@@ -1509,13 +1520,15 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
 	if (master_ctl & GEN8_GT_VECS_IRQ) {
 		tmp = I915_READ(GEN8_GT_IIR(3));
 		if (tmp) {
+			I915_WRITE(GEN8_GT_IIR(3), tmp);
 			ret = IRQ_HANDLED;
+
 			vcs = tmp >> GEN8_VECS_IRQ_SHIFT;
+			ring = &dev_priv->ring[VECS];
 			if (vcs & GT_RENDER_USER_INTERRUPT)
-				notify_ring(dev, &dev_priv->ring[VECS]);
+				notify_ring(dev, ring);
 			if (vcs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
-				DRM_DEBUG_DRIVER("TODO: Context switch\n");
-			I915_WRITE(GEN8_GT_IIR(3), tmp);
+				intel_execlists_handle_ctx_events(ring);
 		} else
 			DRM_ERROR("The master control interrupt lied (GT3)!\n");
 	}
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 49d3c00..290391c 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -49,6 +49,22 @@
 #define RING_ELSP(ring)			((ring)->mmio_base+0x230)
 #define RING_EXECLIST_STATUS(ring)	((ring)->mmio_base+0x234)
 #define RING_CONTEXT_CONTROL(ring)	((ring)->mmio_base+0x244)
+#define RING_CONTEXT_STATUS_BUF(ring)	((ring)->mmio_base+0x370)
+#define RING_CONTEXT_STATUS_PTR(ring)	((ring)->mmio_base+0x3a0)
+
+#define RING_EXECLIST_QFULL		(1 << 0x2)
+#define RING_EXECLIST1_VALID		(1 << 0x3)
+#define RING_EXECLIST0_VALID		(1 << 0x4)
+#define RING_EXECLIST_ACTIVE_STATUS	(3 << 0xE)
+#define RING_EXECLIST1_ACTIVE		(1 << 0x11)
+#define RING_EXECLIST0_ACTIVE		(1 << 0x12)
+
+#define GEN8_CTX_STATUS_IDLE_ACTIVE	(1 << 0)
+#define GEN8_CTX_STATUS_PREEMPTED	(1 << 1)
+#define GEN8_CTX_STATUS_ELEMENT_SWITCH	(1 << 2)
+#define GEN8_CTX_STATUS_ACTIVE_IDLE	(1 << 3)
+#define GEN8_CTX_STATUS_COMPLETE	(1 << 4)
+#define GEN8_CTX_STATUS_LITE_RESTORE	(1 << 15)
 
 #define CTX_LRI_HEADER_0		0x01
 #define CTX_CONTEXT_CONTROL		0x02
@@ -218,6 +234,9 @@ static void execlists_context_unqueue(struct intel_engine_cs *ring)
 {
 	struct intel_ctx_submit_request *req0 = NULL, *req1 = NULL;
 	struct intel_ctx_submit_request *cursor = NULL, *tmp = NULL;
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+
+	assert_spin_locked(&ring->execlist_lock);
 
 	if (list_empty(&ring->execlist_queue))
 		return;
@@ -230,8 +249,7 @@ static void execlists_context_unqueue(struct intel_engine_cs *ring)
 			/* Same ctx: ignore first request, as second request
 			 * will update tail past first request's workload */
 			list_del(&req0->execlist_link);
-			i915_gem_context_unreference(req0->ctx);
-			kfree(req0);
+			queue_work(dev_priv->wq, &req0->work);
 			req0 = cursor;
 		} else {
 			req1 = cursor;
@@ -243,6 +261,86 @@ static void execlists_context_unqueue(struct intel_engine_cs *ring)
 			req1? req1->ctx : NULL, req1? req1->tail : 0));
 }
 
+static bool execlists_check_remove_request(struct intel_engine_cs *ring,
+					   u32 request_id)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct intel_ctx_submit_request *head_req;
+
+	assert_spin_locked(&ring->execlist_lock);
+
+	head_req = list_first_entry_or_null(&ring->execlist_queue,
+			struct intel_ctx_submit_request, execlist_link);
+	if (head_req != NULL) {
+		struct drm_i915_gem_object *ctx_obj =
+				head_req->ctx->engine[ring->id].obj;
+		if (intel_execlists_ctx_id(ctx_obj) == request_id) {
+			list_del(&head_req->execlist_link);
+			queue_work(dev_priv->wq, &head_req->work);
+			return true;
+		}
+	}
+
+	return false;
+}
+
+void intel_execlists_handle_ctx_events(struct intel_engine_cs *ring)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	u32 status_pointer;
+	u8 read_pointer;
+	u8 write_pointer;
+	u32 status;
+	u32 status_id;
+	u32 submit_contexts = 0;
+
+	status_pointer = I915_READ(RING_CONTEXT_STATUS_PTR(ring));
+
+	read_pointer = ring->next_context_status_buffer;
+	write_pointer = status_pointer & 0x07;
+	if (read_pointer > write_pointer)
+		write_pointer += 6;
+
+	spin_lock(&ring->execlist_lock);
+
+	while (read_pointer < write_pointer) {
+		read_pointer++;
+		status = I915_READ(RING_CONTEXT_STATUS_BUF(ring) +
+				(read_pointer % 6) * 8);
+		status_id = I915_READ(RING_CONTEXT_STATUS_BUF(ring) +
+				(read_pointer % 6) * 8 + 4);
+
+		if (status & GEN8_CTX_STATUS_COMPLETE) {
+			if (execlists_check_remove_request(ring, status_id))
+				submit_contexts++;
+		}
+	}
+
+	if (submit_contexts != 0)
+		execlists_context_unqueue(ring);
+
+	spin_unlock(&ring->execlist_lock);
+
+	WARN(submit_contexts > 2, "More than two context complete events?\n");
+	ring->next_context_status_buffer = write_pointer % 6;
+
+	I915_WRITE(RING_CONTEXT_STATUS_PTR(ring),
+			((u32)ring->next_context_status_buffer & 0x07) << 8);
+}
+
+static void execlists_free_request_task(struct work_struct *work)
+{
+	struct intel_ctx_submit_request *req =
+			container_of(work, struct intel_ctx_submit_request, work);
+	struct drm_device *dev = req->ring->dev;
+
+	mutex_lock(&dev->struct_mutex);
+	i915_gem_context_unreference(req->ctx);
+	mutex_unlock(&dev->struct_mutex);
+
+	kfree(req);
+}
+
 static int execlists_context_queue(struct intel_engine_cs *ring,
 				   struct intel_context *to,
 				   u32 tail)
@@ -258,6 +356,7 @@ static int execlists_context_queue(struct intel_engine_cs *ring,
 	i915_gem_context_reference(req->ctx);
 	req->ring = ring;
 	req->tail = tail;
+	INIT_WORK(&req->work, execlists_free_request_task);
 
 	spin_lock_irqsave(&ring->execlist_lock, flags);
 
@@ -1046,6 +1145,7 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
 
 	INIT_LIST_HEAD(&ring->execlist_queue);
 	spin_lock_init(&ring->execlist_lock);
+	ring->next_context_status_buffer = 0;
 
 	ret = intel_lr_context_deferred_create(dctx, ring);
 	if (ret)
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index e1938a3..7949dff 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -50,6 +50,9 @@ struct intel_ctx_submit_request {
 	u32 tail;
 
 	struct list_head execlist_link;
+	struct work_struct work;
 };
 
+void intel_execlists_handle_ctx_events(struct intel_engine_cs *ring);
+
 #endif /* _INTEL_LRC_H_ */
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index c3342e1..6f6f561 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -158,6 +158,7 @@ struct  intel_engine_cs {
 	/* Execlists */
 	spinlock_t execlist_lock;
 	struct list_head execlist_queue;
+	u8 next_context_status_buffer;
 	u32             irq_keep_mask;          /* bitmask for interrupts that should not be masked */
 	void		(*submit_ctx)(struct intel_engine_cs *ring,
 				      struct intel_context *ctx, u32 value);
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH 41/53] drm/i915/bdw: Avoid non-lite-restore preemptions
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
                   ` (39 preceding siblings ...)
  2014-06-13 15:37 ` [PATCH 40/53] drm/i915/bdw: Handle context switch events oscar.mateo
@ 2014-06-13 15:37 ` oscar.mateo
  2014-06-18 20:49   ` Daniel Vetter
  2014-06-13 15:38 ` [PATCH 42/53] drm/i915/bdw: Make sure gpu reset still works with Execlists oscar.mateo
                   ` (12 subsequent siblings)
  53 siblings, 1 reply; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:37 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

In the current Execlists feeding mechanism, full preemption is not
supported yet: only lite-restores are allowed (that is, the GPU
simply samples a new tail pointer for the context currently in
execution).

But we have identified a scenario in which a full preemption occurs:
1) We submit two contexts for execution (A & B).
2) The GPU finishes with the first one (A), switches to the second one
(B) and informs us.
3) We submit B again (hoping to cause a lite restore) together with C,
but in the time we spend writing to the ELSP, the GPU finishes B.
4) The GPU starts executing B again (since we told it so).
5) We receive a B finished interrupt and, mistakenly, we submit C (again)
and D, causing a full preemption of B.

By keeping a better track of our submissions, we can avoid the scenario
described above.
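
The bookkeeping boils down to counting, per request, how many times it
has been submitted to the ELSP; a completion event may then only retire
the head request once that counter drops back to zero. A small model of
the check (illustrative):

/* In the scenario above, B reaches elsp_submitted == 2, so the first
 * completion event leaves it queued instead of retiring it early. */
struct model_req {
	int elsp_submitted;
};

static int model_may_retire(struct model_req *head)
{
	return --head->elsp_submitted <= 0;
}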

v2: elsp_submitted belongs in the new intel_ctx_submit_request. Several
rebase changes.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 28 ++++++++++++++++++++++++----
 drivers/gpu/drm/i915/intel_lrc.h |  2 ++
 2 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 290391c..f388b28 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -248,6 +248,7 @@ static void execlists_context_unqueue(struct intel_engine_cs *ring)
 		else if (req0->ctx == cursor->ctx) {
 			/* Same ctx: ignore first request, as second request
 			 * will update tail past first request's workload */
+			cursor->elsp_submitted = req0->elsp_submitted;
 			list_del(&req0->execlist_link);
 			queue_work(dev_priv->wq, &req0->work);
 			req0 = cursor;
@@ -257,8 +258,14 @@ static void execlists_context_unqueue(struct intel_engine_cs *ring)
 		}
 	}
 
+	WARN_ON(req1 && req1->elsp_submitted);
+
 	BUG_ON(execlists_submit_context(ring, req0->ctx, req0->tail,
 			req1? req1->ctx : NULL, req1? req1->tail : 0));
+
+	req0->elsp_submitted++;
+	if (req1)
+		req1->elsp_submitted++;
 }
 
 static bool execlists_check_remove_request(struct intel_engine_cs *ring,
@@ -275,9 +282,13 @@ static bool execlists_check_remove_request(struct intel_engine_cs *ring,
 		struct drm_i915_gem_object *ctx_obj =
 				head_req->ctx->engine[ring->id].obj;
 		if (intel_execlists_ctx_id(ctx_obj) == request_id) {
-			list_del(&head_req->execlist_link);
-			queue_work(dev_priv->wq, &head_req->work);
-			return true;
+			WARN(head_req->elsp_submitted == 0,
+					"Never submitted head request\n");
+			if (--head_req->elsp_submitted <= 0) {
+				list_del(&head_req->execlist_link);
+				queue_work(dev_priv->wq, &head_req->work);
+				return true;
+			}
 		}
 	}
 
@@ -310,7 +321,16 @@ void intel_execlists_handle_ctx_events(struct intel_engine_cs *ring)
 		status_id = I915_READ(RING_CONTEXT_STATUS_BUF(ring) +
 				(read_pointer % 6) * 8 + 4);
 
-		if (status & GEN8_CTX_STATUS_COMPLETE) {
+		if (status & GEN8_CTX_STATUS_PREEMPTED) {
+			if (status & GEN8_CTX_STATUS_LITE_RESTORE) {
+				if (execlists_check_remove_request(ring, status_id))
+					WARN(1, "Lite Restored request removed from queue\n");
+			} else
+				WARN(1, "Preemption without Lite Restore\n");
+		}
+
+		 if ((status & GEN8_CTX_STATUS_ACTIVE_IDLE) ||
+		     (status & GEN8_CTX_STATUS_ELEMENT_SWITCH)) {
 			if (execlists_check_remove_request(ring, status_id))
 				submit_contexts++;
 		}
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 7949dff..ee877aa 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -51,6 +51,8 @@ struct intel_ctx_submit_request {
 
 	struct list_head execlist_link;
 	struct work_struct work;
+
+	int elsp_submitted;
 };
 
 void intel_execlists_handle_ctx_events(struct intel_engine_cs *ring);
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH 42/53] drm/i915/bdw: Make sure gpu reset still works with Execlists
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
                   ` (40 preceding siblings ...)
  2014-06-13 15:37 ` [PATCH 41/53] drm/i915/bdw: Avoid non-lite-restore preemptions oscar.mateo
@ 2014-06-13 15:38 ` oscar.mateo
  2014-06-18 20:50   ` Daniel Vetter
  2014-06-13 15:38 ` [PATCH 43/53] drm/i915/bdw: Make sure error capture keeps working " oscar.mateo
                   ` (11 subsequent siblings)
  53 siblings, 1 reply; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:38 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

If we reset a ring after a hang, we have to make sure that we clear
out all queued Execlists requests.

v2: The ring is, at this point, already being correctly re-programmed
for Execlists, and the hangcheck counters cleared.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 7c10540..86bfb8a 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2546,6 +2546,19 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
 		i915_gem_free_request(request);
 	}
 
+	if (intel_enable_execlists(dev_priv->dev)) {
+		while (!list_empty(&ring->execlist_queue)) {
+			struct intel_ctx_submit_request *submit_req;
+
+			submit_req = list_first_entry(&ring->execlist_queue,
+					struct intel_ctx_submit_request,
+					execlist_link);
+			list_del(&submit_req->execlist_link);
+			i915_gem_context_unreference(submit_req->ctx);
+			kfree(submit_req);
+		}
+	}
+
 	/* These may not have been flush before the reset, do so now */
 	kfree(ring->preallocated_lazy_request);
 	ring->preallocated_lazy_request = NULL;
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH 43/53] drm/i915/bdw: Make sure error capture keeps working with Execlists
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
                   ` (41 preceding siblings ...)
  2014-06-13 15:38 ` [PATCH 42/53] drm/i915/bdw: Make sure gpu reset still works with Execlists oscar.mateo
@ 2014-06-13 15:38 ` oscar.mateo
  2014-06-13 16:54   ` Chris Wilson
  2014-06-18 20:52   ` Daniel Vetter
  2014-06-13 15:38 ` [PATCH 44/53] drm/i915/bdw: Help out the ctx switch interrupt handler oscar.mateo
                   ` (10 subsequent siblings)
  53 siblings, 2 replies; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:38 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Since the ringbuffer no longer belongs to the engine (each logical ring
context now owns its own), we have to make sure that we always record
the correct one.

TODO: This is only a small fix to keep basic error capture working, but
we need to add more information for it to be useful (e.g. dump the
context being executed).

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_gpu_error.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 87ec60e..f5897be 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -825,9 +825,6 @@ static void i915_record_ring_state(struct drm_device *dev,
 		ering->hws = I915_READ(mmio);
 	}
 
-	ering->cpu_ring_head = ring->buffer->head;
-	ering->cpu_ring_tail = ring->buffer->tail;
-
 	ering->hangcheck_score = ring->hangcheck.score;
 	ering->hangcheck_action = ring->hangcheck.action;
 
@@ -887,6 +884,7 @@ static void i915_gem_record_rings(struct drm_device *dev,
 
 	for (i = 0; i < I915_NUM_RINGS; i++) {
 		struct intel_engine_cs *ring = &dev_priv->ring[i];
+		struct intel_ringbuffer *ringbuf = ring->buffer;
 
 		if (ring->dev == NULL)
 			continue;
@@ -929,8 +927,18 @@ static void i915_gem_record_rings(struct drm_device *dev,
 			}
 		}
 
+		if (intel_enable_execlists(dev)) {
+			if (request)
+				ringbuf = request->ctx->engine[ring->id].ringbuf;
+			else
+				ringbuf = ring->default_context->engine[ring->id].ringbuf;
+		}
+
+		error->ring[i].cpu_ring_head = ringbuf->head;
+		error->ring[i].cpu_ring_tail = ringbuf->tail;
+
 		error->ring[i].ringbuffer =
-			i915_error_ggtt_object_create(dev_priv, ring->buffer->obj);
+			i915_error_ggtt_object_create(dev_priv, ringbuf->obj);
 
 		if (ring->status_page.obj)
 			error->ring[i].hws_page =
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH 44/53] drm/i915/bdw: Help out the ctx switch interrupt handler
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
                   ` (42 preceding siblings ...)
  2014-06-13 15:38 ` [PATCH 43/53] drm/i915/bdw: Make sure error capture keeps working " oscar.mateo
@ 2014-06-13 15:38 ` oscar.mateo
  2014-06-13 15:38 ` [PATCH 45/53] drm/i915/bdw: Do not call intel_runtime_pm_get() in an interrupt oscar.mateo
                   ` (9 subsequent siblings)
  53 siblings, 0 replies; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:38 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

If we receive a storm of requests for the same context (see gem_storedw_loop_*)
we might end up iterating over too many elements at interrupt time, looking for
contexts to squash together. Instead, share the burden by giving more
intelligence to the queue function. At most, the interrupt will iterate over
three elements.
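
In other words, the coalescing moves from the interrupt handler to
submission time; a sketch of the queue-side check (illustrative names,
plain C):

struct model_req {
	int ctx_id;
	int elsp_submitted;
	struct model_req *next;
};

/* With more than two requests already queued, a new request for the
 * same context as the current tail simply replaces that tail, keeping
 * the interrupt-time walk bounded. (The driver additionally WARNs if
 * the tail had already been submitted to the ELSP.) */
static int model_should_replace_tail(const struct model_req *tail,
				     int queued, int new_ctx_id)
{
	return queued > 2 && tail->ctx_id == new_ctx_id;
}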

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index f388b28..d33e622 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -365,9 +365,10 @@ static int execlists_context_queue(struct intel_engine_cs *ring,
 				   struct intel_context *to,
 				   u32 tail)
 {
-	struct intel_ctx_submit_request *req = NULL;
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct intel_ctx_submit_request *req = NULL, *cursor;
 	unsigned long flags;
-	bool was_empty;
+	int num_elements = 0;
 
 	req = kzalloc(sizeof(*req), GFP_KERNEL);
 	if (req == NULL)
@@ -380,9 +381,26 @@ static int execlists_context_queue(struct intel_engine_cs *ring,
 
 	spin_lock_irqsave(&ring->execlist_lock, flags);
 
-	was_empty = list_empty(&ring->execlist_queue);
+	list_for_each_entry(cursor, &ring->execlist_queue, execlist_link)
+		if (++num_elements > 2)
+			break;
+
+	if (num_elements > 2) {
+		struct intel_ctx_submit_request *tail_req;
+
+		tail_req = list_last_entry(&ring->execlist_queue,
+					struct intel_ctx_submit_request,
+					execlist_link);
+		if (to == tail_req->ctx) {
+			WARN(tail_req->elsp_submitted != 0,
+					"More than 2 already-submitted reqs queued\n");
+			list_del(&tail_req->execlist_link);
+			queue_work(dev_priv->wq, &tail_req->work);
+		}
+	}
+
 	list_add_tail(&req->execlist_link, &ring->execlist_queue);
-	if (was_empty)
+	if (num_elements == 0)
 		execlists_context_unqueue(ring);
 
 	spin_unlock_irqrestore(&ring->execlist_lock, flags);
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH 45/53] drm/i915/bdw: Do not call intel_runtime_pm_get() in an interrupt
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
                   ` (43 preceding siblings ...)
  2014-06-13 15:38 ` [PATCH 44/53] drm/i915/bdw: Help out the ctx switch interrupt handler oscar.mateo
@ 2014-06-13 15:38 ` oscar.mateo
  2014-06-18 20:54   ` Daniel Vetter
  2014-06-13 15:38 ` [PATCH 46/53] drm/i915/bdw: Display execlists info in debugfs oscar.mateo
                   ` (8 subsequent siblings)
  53 siblings, 1 reply; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:38 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Or with a spinlock grabbed, because it might sleep, which is not
a nice thing to do. Instead, do the runtime_pm get/put together
with the create/destroy request, and handle the forcewake get/put
directly.
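
The forcewake half is just a refcount around a pair of non-sleeping
register callbacks; a plain-C model (C11 atomics stand in for the
driver's uncore spinlock, and the names are made up):

#include <stdatomic.h>

static atomic_int forcewake_count;

/* Only the first holder asserts the wake and only the last releases
 * it; hw_get/hw_put are plain register writes, safe in atomic context,
 * unlike gen6_gt_force_wake_get(), which may sleep. */
static void model_forcewake_get(void (*hw_get)(void))
{
	if (atomic_fetch_add(&forcewake_count, 1) == 0)
		hw_get();
}

static void model_forcewake_put(void (*hw_put)(void))
{
	if (atomic_fetch_sub(&forcewake_count, 1) == 1)
		hw_put();
}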

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 26 +++++++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index d33e622..ea4b358 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -159,6 +159,7 @@ static void execlists_elsp_write(struct intel_engine_cs *ring,
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
 	uint64_t temp = 0;
 	uint32_t desc[4];
+	unsigned long flags;
 
 	/* XXX: You must always write both descriptors in the order below. */
 	if (ctx_obj1)
@@ -172,9 +173,17 @@ static void execlists_elsp_write(struct intel_engine_cs *ring,
 	desc[3] = (u32)(temp >> 32);
 	desc[2] = (u32)temp;
 
-	/* Set Force Wakeup bit to prevent GT from entering C6 while
-	 * ELSP writes are in progress */
-	gen6_gt_force_wake_get(dev_priv, FORCEWAKE_ALL);
+	/* Set Force Wakeup bit to prevent GT from entering C6 while ELSP writes
+	 * are in progress.
+	 *
+	 * The other problem is that we can't just call gen6_gt_force_wake_get()
+	 * because that function calls intel_runtime_pm_get(), which might sleep.
+	 * Instead, we do the runtime_pm_get/put when creating/destroying requests.
+	 */
+	spin_lock_irqsave(&dev_priv->uncore.lock, flags);
+	if (dev_priv->uncore.forcewake_count++ == 0)
+		dev_priv->uncore.funcs.force_wake_get(dev_priv, FORCEWAKE_ALL);
+	spin_unlock_irqrestore(&dev_priv->uncore.lock, flags);
 
 	I915_WRITE(RING_ELSP(ring), desc[1]);
 	I915_WRITE(RING_ELSP(ring), desc[0]);
@@ -185,8 +194,11 @@ static void execlists_elsp_write(struct intel_engine_cs *ring,
 	/* ELSP is a write only register, so this serves as a posting read */
 	POSTING_READ(RING_EXECLIST_STATUS(ring));
 
-	/* Release Force Wakeup */
-	gen6_gt_force_wake_put(dev_priv, FORCEWAKE_ALL);
+	/* Release Force Wakeup (see the big comment above). */
+	spin_lock_irqsave(&dev_priv->uncore.lock, flags);
+	if (--dev_priv->uncore.forcewake_count == 0)
+		dev_priv->uncore.funcs.force_wake_put(dev_priv, FORCEWAKE_ALL);
+	spin_unlock_irqrestore(&dev_priv->uncore.lock, flags);
 }
 
 static int execlists_ctx_write_tail(struct drm_i915_gem_object *ctx_obj, u32 tail)
@@ -353,6 +365,9 @@ static void execlists_free_request_task(struct work_struct *work)
 	struct intel_ctx_submit_request *req =
 			container_of(work, struct intel_ctx_submit_request, work);
 	struct drm_device *dev = req->ring->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+
+	intel_runtime_pm_put(dev_priv);
 
 	mutex_lock(&dev->struct_mutex);
 	i915_gem_context_unreference(req->ctx);
@@ -378,6 +393,7 @@ static int execlists_context_queue(struct intel_engine_cs *ring,
 	req->ring = ring;
 	req->tail = tail;
 	INIT_WORK(&req->work, execlists_free_request_task);
+	intel_runtime_pm_get(dev_priv);
 
 	spin_lock_irqsave(&ring->execlist_lock, flags);
 
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH 46/53] drm/i915/bdw: Display execlists info in debugfs
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
                   ` (44 preceding siblings ...)
  2014-06-13 15:38 ` [PATCH 45/53] drm/i915/bdw: Do not call intel_runtime_pm_get() in an interrupt oscar.mateo
@ 2014-06-13 15:38 ` oscar.mateo
  2014-06-18 20:59   ` Daniel Vetter
  2014-06-13 15:38 ` [PATCH 47/53] drm/i915/bdw: Display context backing obj & ringbuffer " oscar.mateo
                   ` (7 subsequent siblings)
  53 siblings, 1 reply; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:38 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

v2: Warn and return if LRCs are not enabled.
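
For reference, once applied the new information can be read from the
debugfs entry added below, typically at <debugfs>/dri/<minor>/i915_execlists
(path assuming the standard debugfs layout).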

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 72 +++++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.c    |  6 ----
 drivers/gpu/drm/i915/intel_lrc.h    |  7 ++++
 3 files changed, 79 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index b09cab4..3ccdf0d 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1753,6 +1753,77 @@ static int i915_context_status(struct seq_file *m, void *unused)
 	return 0;
 }
 
+static int i915_execlists(struct seq_file *m, void *data)
+{
+	struct drm_info_node *node = (struct drm_info_node *) m->private;
+	struct drm_device *dev = node->minor->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_engine_cs *ring;
+	u32 status_pointer;
+	u8 read_pointer;
+	u8 write_pointer;
+	u32 status;
+	u32 ctx_id;
+	struct list_head *cursor;
+	struct intel_ctx_submit_request *head_req;
+	int ring_id, i;
+
+	if (!intel_enable_execlists(dev)) {
+		seq_printf(m, "Logical Ring Contexts are disabled\n");
+		return 0;
+	}
+
+	for_each_ring(ring, dev_priv, ring_id) {
+		int count = 0;
+
+		seq_printf(m, "%s\n", ring->name);
+
+		status = I915_READ(RING_EXECLIST_STATUS(ring));
+		ctx_id = I915_READ(RING_EXECLIST_STATUS(ring) + 4);
+		seq_printf(m, "\tExeclist status: 0x%08X, context: %u\n",
+				status, ctx_id);
+
+		status_pointer = I915_READ(RING_CONTEXT_STATUS_PTR(ring));
+		seq_printf(m, "\tStatus pointer: 0x%08X\n", status_pointer);
+
+		read_pointer = ring->next_context_status_buffer;
+		write_pointer = status_pointer & 0x07;
+		if (read_pointer > write_pointer)
+			write_pointer += 6;
+		seq_printf(m, "\tRead pointer: 0x%08X, write pointer 0x%08X\n",
+				read_pointer, write_pointer);
+
+		for (i = 0; i < 6; i++) {
+			status = I915_READ(RING_CONTEXT_STATUS_BUF(ring) + 8*i);
+			ctx_id = I915_READ(RING_CONTEXT_STATUS_BUF(ring) + 8*i + 4);
+
+			seq_printf(m, "\tStatus buffer %d: 0x%08X, context: %u\n",
+					i, status, ctx_id);
+		}
+
+		list_for_each(cursor, &ring->execlist_queue) {
+			count++;
+		}
+		seq_printf(m, "\t%d requests in queue\n", count);
+
+		if (count > 0) {
+			struct drm_i915_gem_object *ctx_obj;
+
+			head_req = list_first_entry(&ring->execlist_queue,
+					struct intel_ctx_submit_request, execlist_link);
+
+			ctx_obj = head_req->ctx->engine[ring_id].obj;
+			seq_printf(m, "\tHead request id: %u\n",
+					intel_execlists_ctx_id(ctx_obj));
+			seq_printf(m, "\tHead request tail: %u\n", head_req->tail);
+		}
+
+		seq_putc(m, '\n');
+	}
+
+	return 0;
+}
+
 static int i915_gen6_forcewake_count_info(struct seq_file *m, void *data)
 {
 	struct drm_info_node *node = m->private;
@@ -3813,6 +3884,7 @@ static const struct drm_info_list i915_debugfs_list[] = {
 	{"i915_opregion", i915_opregion, 0},
 	{"i915_gem_framebuffer", i915_gem_framebuffer_info, 0},
 	{"i915_context_status", i915_context_status, 0},
+	{"i915_execlists", i915_execlists, 0},
 	{"i915_gen6_forcewake_count", i915_gen6_forcewake_count_info, 0},
 	{"i915_swizzle_info", i915_swizzle_info, 0},
 	{"i915_ppgtt_info", i915_ppgtt_info, 0},
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index ea4b358..c23c0f6 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -46,12 +46,6 @@
 
 #define GEN8_LR_CONTEXT_ALIGN 4096
 
-#define RING_ELSP(ring)			((ring)->mmio_base+0x230)
-#define RING_EXECLIST_STATUS(ring)	((ring)->mmio_base+0x234)
-#define RING_CONTEXT_CONTROL(ring)	((ring)->mmio_base+0x244)
-#define RING_CONTEXT_STATUS_BUF(ring)	((ring)->mmio_base+0x370)
-#define RING_CONTEXT_STATUS_PTR(ring)	((ring)->mmio_base+0x3a0)
-
 #define RING_EXECLIST_QFULL		(1 << 0x2)
 #define RING_EXECLIST1_VALID		(1 << 0x3)
 #define RING_EXECLIST0_VALID		(1 << 0x4)
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index ee877aa..c318dcb 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -1,6 +1,13 @@
 #ifndef _INTEL_LRC_H_
 #define _INTEL_LRC_H_
 
+/* Execlists regs */
+#define RING_ELSP(ring)			((ring)->mmio_base+0x230)
+#define RING_EXECLIST_STATUS(ring)	((ring)->mmio_base+0x234)
+#define RING_CONTEXT_CONTROL(ring)	((ring)->mmio_base+0x244)
+#define RING_CONTEXT_STATUS_BUF(ring)	((ring)->mmio_base+0x370)
+#define RING_CONTEXT_STATUS_PTR(ring)	((ring)->mmio_base+0x3a0)
+
 /* Logical Rings */
 void intel_logical_ring_cleanup(struct intel_engine_cs *ring);
 int intel_logical_rings_init(struct drm_device *dev);
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH 47/53] drm/i915/bdw: Display context backing obj & ringbuffer info in debugfs
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
                   ` (45 preceding siblings ...)
  2014-06-13 15:38 ` [PATCH 46/53] drm/i915/bdw: Display execlists info in debugfs oscar.mateo
@ 2014-06-13 15:38 ` oscar.mateo
  2014-06-13 15:38 ` [PATCH 48/53] drm/i915/bdw: Print context state " oscar.mateo
                   ` (6 subsequent siblings)
  53 siblings, 0 replies; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:38 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 3ccdf0d..e5db287 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1708,6 +1708,12 @@ static int i915_gem_framebuffer_info(struct seq_file *m, void *data)
 
 	return 0;
 }
+static void describe_ctx_ringbuf(struct seq_file *m, struct intel_ringbuffer *ringbuf)
+{
+	seq_printf(m, " (ringbuffer, space: %d, head: %u, tail: %u, last head: %d)",
+			ringbuf->space, ringbuf->head, ringbuf->tail,
+			ringbuf->last_retired_head);
+}
 
 static int i915_context_status(struct seq_file *m, void *unused)
 {
@@ -1716,6 +1722,7 @@ static int i915_context_status(struct seq_file *m, void *unused)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_engine_cs *ring;
 	struct intel_context *ctx;
+	bool exl_enabled = intel_enable_execlists(dev);
 	int ret, i;
 
 	ret = mutex_lock_interruptible(&dev->struct_mutex);
@@ -1735,7 +1742,7 @@ static int i915_context_status(struct seq_file *m, void *unused)
 	}
 
 	list_for_each_entry(ctx, &dev_priv->context_list, link) {
-		if (ctx->render_obj == NULL)
+		if (!exl_enabled && ctx->render_obj == NULL)
 			continue;
 
 		seq_puts(m, "HW context ");
@@ -1744,7 +1751,22 @@ static int i915_context_status(struct seq_file *m, void *unused)
 			if (ring->default_context == ctx)
 				seq_printf(m, "(default context %s) ", ring->name);
 
-		describe_obj(m, ctx->render_obj);
+		if (exl_enabled) {
+			seq_putc(m, '\n');
+			for_each_ring(ring, dev_priv, i) {
+				struct drm_i915_gem_object *ctx_obj = ctx->engine[i].obj;
+				struct intel_ringbuffer *ringbuf = ctx->engine[i].ringbuf;
+
+				seq_printf(m, "%s: ", ring->name);
+				if (ctx_obj)
+					describe_obj(m, ctx_obj);
+				if (ringbuf)
+					describe_ctx_ringbuf(m, ringbuf);
+				seq_putc(m, '\n');
+			}
+		} else
+			describe_obj(m, ctx->render_obj);
+
 		seq_putc(m, '\n');
 	}
 
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH 48/53] drm/i915/bdw: Print context state in debugfs
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
                   ` (46 preceding siblings ...)
  2014-06-13 15:38 ` [PATCH 47/53] drm/i915/bdw: Display context backing obj & ringbuffer " oscar.mateo
@ 2014-06-13 15:38 ` oscar.mateo
  2014-06-13 15:38 ` [PATCH 49/53] drm/i915: Extract render state preparation oscar.mateo
                   ` (5 subsequent siblings)
  53 siblings, 0 replies; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:38 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <ben@bwidawsk.net>

This has turned out to be really handy in debug so far.

Update:
Since writing this patch, I've gotten similar code upstream for error
state. I've used it quite a bit in debugfs however, and I'd like to keep
it here at least until preemption is working.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

This patch was accidentally dropped in the first Execlists version, and
it has been very useful indeed. Put it back again, but as a standalone
debugfs file.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 52 +++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index e5db287..7ac6118 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1775,6 +1775,57 @@ static int i915_context_status(struct seq_file *m, void *unused)
 	return 0;
 }
 
+static int i915_dump_lrc(struct seq_file *m, void *unused)
+{
+	struct drm_info_node *node = (struct drm_info_node *) m->private;
+	struct drm_device *dev = node->minor->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_engine_cs *ring;
+	struct intel_context *ctx;
+	int ret, i;
+
+	if (!intel_enable_execlists(dev)) {
+		seq_printf(m, "Logical Ring Contexts are disabled\n");
+		return 0;
+	}
+
+	ret = mutex_lock_interruptible(&dev->mode_config.mutex);
+	if (ret)
+		return ret;
+
+	list_for_each_entry(ctx, &dev_priv->context_list, link) {
+		for_each_ring(ring, dev_priv, i) {
+			struct drm_i915_gem_object *ctx_obj = ctx->engine[i].obj;
+
+			if (ring->default_context == ctx)
+				continue;
+
+			if (ctx_obj) {
+				struct page *page = i915_gem_object_get_page(ctx_obj, 1);
+				uint32_t *reg_state = kmap_atomic(page);
+				int j;
+
+				seq_printf(m, "CONTEXT: %s %u\n", ring->name,
+						intel_execlists_ctx_id(ctx_obj));
+
+				for (j = 0; j < 0x600 / sizeof(u32) / 4; j += 4) {
+					seq_printf(m, "\t[0x%08lx] 0x%08x 0x%08x 0x%08x 0x%08x\n",
+					i915_gem_obj_ggtt_offset(ctx_obj) + 4096 + (j * 4),
+					reg_state[j], reg_state[j + 1],
+					reg_state[j + 2], reg_state[j + 3]);
+				}
+				kunmap_atomic(reg_state);
+
+				seq_putc(m, '\n');
+			}
+		}
+	}
+
+	mutex_unlock(&dev->mode_config.mutex);
+
+	return 0;
+}
+
 static int i915_execlists(struct seq_file *m, void *data)
 {
 	struct drm_info_node *node = (struct drm_info_node *) m->private;
@@ -3906,6 +3957,7 @@ static const struct drm_info_list i915_debugfs_list[] = {
 	{"i915_opregion", i915_opregion, 0},
 	{"i915_gem_framebuffer", i915_gem_framebuffer_info, 0},
 	{"i915_context_status", i915_context_status, 0},
+	{"i915_dump_lrc", i915_dump_lrc, 0},
 	{"i915_execlists", i915_execlists, 0},
 	{"i915_gen6_forcewake_count", i915_gen6_forcewake_count_info, 0},
 	{"i915_swizzle_info", i915_swizzle_info, 0},
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH 49/53] drm/i915: Extract render state preparation
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
                   ` (47 preceding siblings ...)
  2014-06-13 15:38 ` [PATCH 48/53] drm/i915/bdw: Print context state " oscar.mateo
@ 2014-06-13 15:38 ` oscar.mateo
  2014-06-13 15:38 ` [PATCH 50/53] drm/i915/bdw: Render state init for Execlists oscar.mateo
                   ` (4 subsequent siblings)
  53 siblings, 0 replies; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:38 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Execlists need a new submission mechanism, so split the preparation
from the submission.
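
The split leaves callers with a small tri-state convention (a sketch of
the intent, matching the diff below): ERR_PTR() on failure, NULL when
this GEN simply has no render state to emit, and a valid pointer
otherwise. Since PTR_ERR(NULL) == 0, the NULL case falls out naturally:

    so = render_state_prepare(ring);
    if (IS_ERR_OR_NULL(so))
            return PTR_ERR(so);   /* PTR_ERR(NULL) == 0: nothing to emit */

    /* ... dispatch so->obj and add a request ... */
    render_state_free(so);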

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_render_state.c | 27 +++++++++++++++++++++------
 1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.c b/drivers/gpu/drm/i915/i915_gem_render_state.c
index 3521f99..5944c0a 100644
--- a/drivers/gpu/drm/i915/i915_gem_render_state.c
+++ b/drivers/gpu/drm/i915/i915_gem_render_state.c
@@ -159,7 +159,8 @@ static int render_state_setup(const int gen,
 	return 0;
 }
 
-int i915_gem_render_state_init(struct intel_engine_cs *ring)
+static struct i915_render_state *
+render_state_prepare(struct intel_engine_cs *ring)
 {
 	const int gen = INTEL_INFO(ring->dev)->gen;
 	struct i915_render_state *so;
@@ -167,19 +168,33 @@ int i915_gem_render_state_init(struct intel_engine_cs *ring)
 	int ret;
 
 	if (WARN_ON(ring->id != RCS))
-		return -ENOENT;
+		return ERR_PTR(-ENOENT);
 
 	rodata = render_state_get_rodata(ring->dev, gen);
 	if (rodata == NULL)
-		return 0;
+		return NULL;
 
 	so = render_state_alloc(ring->dev);
 	if (IS_ERR(so))
-		return PTR_ERR(so);
+		return so;
 
 	ret = render_state_setup(gen, rodata, so);
-	if (ret)
-		goto out;
+	if (ret) {
+		render_state_free(so);
+		return ERR_PTR(ret);
+	}
+
+	return so;
+}
+
+int i915_gem_render_state_init(struct intel_engine_cs *ring)
+{
+	struct i915_render_state *so;
+	int ret;
+
+	so = render_state_prepare(ring);
+	if (IS_ERR_OR_NULL(so))
+		return PTR_ERR(so);
 
 	ret = ring->dispatch_execbuffer(ring,
 					i915_gem_obj_ggtt_offset(so->obj),
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH 50/53] drm/i915/bdw: Render state init for Execlists
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
                   ` (48 preceding siblings ...)
  2014-06-13 15:38 ` [PATCH 49/53] drm/i915: Extract render state preparation oscar.mateo
@ 2014-06-13 15:38 ` oscar.mateo
  2014-06-13 15:38 ` [PATCH 51/53] drm/i915/bdw: Document Logical Rings, LR contexts and Execlists oscar.mateo
                   ` (3 subsequent siblings)
  53 siblings, 0 replies; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:38 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

The batchbuffer that sets the render context state is submitted
in a different way, and from different places.

We needed to make both the render state preparation and free functions
outside accessible, and namespace them accordingly. This mess is so that all
LR, LRC and Execlists functionality can go together in intel_lrc.c: we
can fix all of this later on, once the interfaces are clear.
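
The net rule the diff below implements: the golden state batch is now
emitted from exactly two places, i915_gem_context_enable() for the
default context and intel_lr_context_deferred_create() for user-created
contexts, in both cases only on the render engine and only once per
context (guarded by ctx->is_initialized).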

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c      | 17 ++++++++++-
 drivers/gpu/drm/i915/i915_gem_render_state.c | 20 ++++---------
 drivers/gpu/drm/i915/intel_lrc.c             | 43 ++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.h             |  2 ++
 drivers/gpu/drm/i915/intel_renderstate.h     | 13 +++++++++
 5 files changed, 80 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 99bdd5e..2350e8c 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -512,8 +512,23 @@ int i915_gem_context_enable(struct drm_i915_private *dev_priv)
 		ppgtt->enable(ppgtt);
 	}
 
-	if (dev_priv->lrc_enabled)
+	if (dev_priv->lrc_enabled) {
+		struct intel_context *dctx;
+
+		ring = &dev_priv->ring[RCS];
+		dctx = ring->default_context;
+
+		if (!dctx->is_initialized) {
+			ret = intel_lr_context_render_state_init(ring, dctx);
+			if (ret) {
+				DRM_ERROR("Init render state failed: %d\n", ret);
+				return ret;
+			}
+			dctx->is_initialized = true;
+		}
+
 		return 0;
+	}
 
 	/* FIXME: We should make this work, even in reset */
 	if (i915_reset_in_progress(&dev_priv->gpu_error))
diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.c b/drivers/gpu/drm/i915/i915_gem_render_state.c
index 5944c0a..08d8275 100644
--- a/drivers/gpu/drm/i915/i915_gem_render_state.c
+++ b/drivers/gpu/drm/i915/i915_gem_render_state.c
@@ -28,14 +28,6 @@
 #include "i915_drv.h"
 #include "intel_renderstate.h"
 
-struct i915_render_state {
-	struct drm_i915_gem_object *obj;
-	unsigned long ggtt_offset;
-	void *batch;
-	u32 size;
-	u32 len;
-};
-
 static struct i915_render_state *render_state_alloc(struct drm_device *dev)
 {
 	struct i915_render_state *so;
@@ -78,7 +70,7 @@ free:
 	return ERR_PTR(ret);
 }
 
-static void render_state_free(struct i915_render_state *so)
+void i915_gem_render_state_free(struct i915_render_state *so)
 {
 	kunmap(so->batch);
 	i915_gem_object_ggtt_unpin(so->obj);
@@ -159,8 +151,8 @@ static int render_state_setup(const int gen,
 	return 0;
 }
 
-static struct i915_render_state *
-render_state_prepare(struct intel_engine_cs *ring)
+struct i915_render_state *
+i915_gem_render_state_prepare(struct intel_engine_cs *ring)
 {
 	const int gen = INTEL_INFO(ring->dev)->gen;
 	struct i915_render_state *so;
@@ -180,7 +172,7 @@ render_state_prepare(struct intel_engine_cs *ring)
 
 	ret = render_state_setup(gen, rodata, so);
 	if (ret) {
-		render_state_free(so);
+		i915_gem_render_state_free(so);
 		return ERR_PTR(ret);
 	}
 
@@ -192,7 +184,7 @@ int i915_gem_render_state_init(struct intel_engine_cs *ring)
 	struct i915_render_state *so;
 	int ret;
 
-	so = render_state_prepare(ring);
+	so = i915_gem_render_state_prepare(ring);
 	if (IS_ERR_OR_NULL(so))
 		return PTR_ERR(so);
 
@@ -208,6 +200,6 @@ int i915_gem_render_state_init(struct intel_engine_cs *ring)
 	ret = __i915_add_request(ring, NULL, so->obj, NULL);
 	/* __i915_add_request moves object to inactive if it fails */
 out:
-	render_state_free(so);
+	i915_gem_render_state_free(so);
 	return ret;
 }
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index c23c0f6..45f5485 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -40,6 +40,7 @@
 #include <drm/drmP.h>
 #include <drm/i915_drm.h>
 #include "i915_drv.h"
+#include "intel_renderstate.h"
 
 #define GEN8_LR_CONTEXT_RENDER_SIZE (20 * PAGE_SIZE)
 #define GEN8_LR_CONTEXT_OTHER_SIZE (2 * PAGE_SIZE)
@@ -1406,6 +1407,33 @@ cleanup_render_ring:
 	return ret;
 }
 
+int intel_lr_context_render_state_init(struct intel_engine_cs *ring,
+				       struct intel_context *ctx)
+{
+	struct i915_render_state *so;
+	struct drm_i915_file_private *file_priv = ctx->file_priv;
+	struct drm_file *file = file_priv ? file_priv->file : NULL;
+	int ret;
+
+	so = i915_gem_render_state_prepare(ring);
+	if (IS_ERR_OR_NULL(so))
+		return PTR_ERR(so);
+
+	ret = ring->emit_bb_start(ring, ctx,
+			i915_gem_obj_ggtt_offset(so->obj),
+			I915_DISPATCH_SECURE);
+	if (ret)
+		goto out;
+
+	i915_vma_move_to_active(i915_gem_obj_to_ggtt(so->obj), ring);
+
+	ret = intel_logical_ring_add_request(ring, file, so->obj, NULL);
+	/* intel_logical_ring_add_request moves object to inactive if it fails */
+out:
+	i915_gem_render_state_free(so);
+	return ret;
+}
+
 static int
 populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_obj,
 		    struct intel_engine_cs *ring, struct drm_i915_gem_object *ring_obj)
@@ -1616,6 +1644,21 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 	ctx->engine[ring->id].ringbuf = ringbuf;
 	ctx->engine[ring->id].obj = ctx_obj;
 
+	/* The default context will have to wait, because we are not yet
+	 * ready to send a batchbuffer at this point */
+	if (ring->id == RCS && !ctx->is_initialized &&
+			ctx != ring->default_context) {
+		ret = intel_lr_context_render_state_init(ring, ctx);
+		if (ret) {
+			DRM_ERROR("Init render state failed: %d\n", ret);
+			ctx->engine[ring->id].ringbuf = NULL;
+			ctx->engine[ring->id].obj = NULL;
+			intel_destroy_ring_buffer(ringbuf);
+			goto error;
+		}
+		ctx->is_initialized = true;
+	}
+
 	return 0;
 
 error:
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index c318dcb..34b1189 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -43,6 +43,8 @@ int intel_logical_ring_begin(struct intel_engine_cs *ring,
 			     int num_dwords);
 
 /* Logical Ring Contexts */
+int intel_lr_context_render_state_init(struct intel_engine_cs *ring,
+				       struct intel_context *ctx);
 void intel_lr_context_free(struct intel_context *ctx);
 int intel_lr_context_deferred_create(struct intel_context *ctx,
 				     struct intel_engine_cs *ring);
diff --git a/drivers/gpu/drm/i915/intel_renderstate.h b/drivers/gpu/drm/i915/intel_renderstate.h
index a5e783a..65a4740 100644
--- a/drivers/gpu/drm/i915/intel_renderstate.h
+++ b/drivers/gpu/drm/i915/intel_renderstate.h
@@ -25,6 +25,15 @@
 #define _INTEL_RENDERSTATE_H
 
 #include <linux/types.h>
+#include "i915_drv.h"
+
+struct i915_render_state {
+	struct drm_i915_gem_object *obj;
+	unsigned long ggtt_offset;
+	void *batch;
+	u32 size;
+	u32 len;
+};
 
 struct intel_renderstate_rodata {
 	const u32 *reloc;
@@ -45,4 +54,8 @@ extern const struct intel_renderstate_rodata gen8_null_state;
 		.batch_items = sizeof(gen ## _g ## _null_state_batch)/4, \
 	}
 
+void i915_gem_render_state_free(struct i915_render_state *so);
+struct i915_render_state *
+i915_gem_render_state_prepare(struct intel_engine_cs *ring);
+
 #endif /* INTEL_RENDERSTATE_H */
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH 51/53] drm/i915/bdw: Document Logical Rings, LR contexts and Execlists
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
                   ` (49 preceding siblings ...)
  2014-06-13 15:38 ` [PATCH 50/53] drm/i915/bdw: Render state init for Execlists oscar.mateo
@ 2014-06-13 15:38 ` oscar.mateo
  2014-06-13 16:51   ` Chris Wilson
  2014-06-13 15:38 ` [PATCH 52/53] drm/i915/bdw: Enable logical ring contexts oscar.mateo
                   ` (2 subsequent siblings)
  53 siblings, 1 reply; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:38 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

Add theory of operation notes to intel_lrc.c and comments to externally
visible functions.

v2: Add notes on logical ring context creation.
v3: Use kerneldoc.
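
As a quick reference, the two-port submission rule the new DOC section
spells out (coalesce consecutive requests for the same context, since a
context may not appear twice in one execlist) could be sketched like
this; the helper names are illustrative only, not the driver's own:

    /* Illustrative sketch; see intel_lrc.c for the real unqueue logic. */
    req0 = queue_head(queue);
    req1 = queue_second_or_null(queue);

    while (req1 && req0->ctx == req1->ctx) {
            /* same context twice in one execlist is illegal: merge */
            req0->tail = req1->tail;
            queue_drop(req1);
            req1 = queue_second_or_null(queue);
    }

    submit_to_elsp(ring, req0->ctx, req1 ? req1->ctx : NULL);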

Signed-off-by: Thomas Daniel <thomas.daniel@intel.com> (v1)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2, v3)
---
 drivers/gpu/drm/i915/intel_lrc.c | 235 ++++++++++++++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/intel_lrc.h |  30 +++++
 2 files changed, 264 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 45f5485..e3349c8 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -28,13 +28,108 @@
  *
  */
 
-/*
+/**
+ * DOC: Logical Rings, Logical Ring Contexts and Execlists
+ *
+ * Motivation:
  * GEN8 brings an expansion of the HW contexts: "Logical Ring Contexts".
  * These expanded contexts enable a number of new abilities, especially
  * "Execlists" (also implemented in this file).
  *
+ * One of the main differences with the legacy HW contexts is that logical
+ * ring contexts incorporate many more things to the context's state, like
+ * PDPs or ringbuffer control registers:
+ *
+ * The reason why PDPs are included in the context is straightforward: as
+ * PPGTTs (per-process GTTs) are actually per-context, having the PDPs
+ * contained there means you don't need to do a ppgtt->switch_mm yourself,
+ * instead, the GPU will do it for you on the context switch.
+ *
+ * But what about the ringbuffer control registers (head, tail, etc.)?
+ * Shouldn't we just need a set of those per engine command streamer? This is
+ * where the name "Logical Rings" starts to make sense: by virtualizing the
+ * rings, the engine cs shifts to a new "ring buffer" with every context
+ * switch. When you want to submit a workload to the GPU you: A) choose your
+ * context, B) find its appropriate virtualized ring, C) write commands to it
+ * and then, finally, D) tell the GPU to switch to that context.
+ *
+ * Instead of the legacy MI_SET_CONTEXT, the way you tell the GPU to switch
+ * to a context is via a context execution list, ergo "Execlists".
+ *
+ * LRC implementation:
+ * Regarding the creation of contexts, we have:
+ *
+ * - One global default context.
+ * - One local default context for each opened fd.
+ * - One local extra context for each context create ioctl call.
+ *
+ * Now that ringbuffers are per-context (and not per-engine, like before)
+ * and that contexts are uniquely tied to a given engine (and not reusable,
+ * like before) we need:
+ *
+ * - One ringbuffer per-engine inside each context.
+ * - One backing object per-engine inside each context.
+ *
+ * The global default context starts its life with these new objects fully
+ * allocated and populated. The local default context for each opened fd is
+ * more complex, because we don't know at creation time which engine is going
+ * to use it. To handle this, we have implemented a deferred creation of LR
+ * contexts:
+ *
+ * The local context starts its life as a hollow or blank holder, that only
+ * gets populated for a given engine once we receive an execbuffer. If later
+ * on we receive another execbuffer ioctl for the same context but a different
+ * engine, we allocate/populate a new ringbuffer and context backing object and
+ * so on.
+ *
+ * Finally, regarding local contexts created using the ioctl call: as they are
+ * only allowed with the render ring, we can allocate & populate them right
+ * away (no need to defer anything, at least for now).
+ *
+ * Execlists implementation:
  * Execlists are the new method by which, on gen8+ hardware, workloads are
  * submitted for execution (as opposed to the legacy, ringbuffer-based, method).
+ * This method works as follows:
+ *
+ * When a request is committed, its commands (the BB start and any leading or
+ * trailing commands, like the seqno breadcrumbs) are placed in the ringbuffer
+ * for the appropriate context. The tail pointer in the hardware context is not
+ * updated at this time, but instead, kept by the driver in the ringbuffer
+ * structure. A structure representing this request is added to a request queue
+ * for the appropriate engine: this structure contains a copy of the context's
+ * tail after the request was written to the ring buffer and a pointer to the
+ * context itself.
+ *
+ * If the engine's request queue was empty before the request was added, the
+ * queue is processed immediately. Otherwise the queue will be processed during
+ * a context switch interrupt. In any case, elements on the queue will get sent
+ * (in pairs) to the GPU's ExecLists Submit Port (ELSP, for short) with a
+ * globally unique 20-bit submission ID.
+ *
+ * When execution of a request completes, the GPU updates the context status
+ * buffer with a context complete event and generates a context switch interrupt.
+ * During the interrupt handling, the driver examines the events in the buffer:
+ * for each context complete event, if the announced ID matches that on the head
+ * of the request queue, then that request is retired and removed from the queue.
+ *
+ * After processing, if any requests were retired and the queue is not empty
+ * then a new execution list can be submitted. The two requests at the front of
+ * the queue are next to be submitted but since a context may not occur twice in
+ * an execution list, if subsequent requests have the same ID as the first then
+ * the two requests must be combined. This is done simply by discarding requests
+ * at the head of the queue until either only one request is left (in which case
+ * we use a NULL second context) or the first two requests have unique IDs.
+ *
+ * By always executing the first two requests in the queue the driver ensures
+ * that the GPU is kept as busy as possible. In the case where a single context
+ * completes but a second context is still executing, the request for this second
+ * context will be at the head of the queue when we remove the first one. This
+ * request will then be resubmitted along with a new request for a different context,
+ * which will cause the hardware to continue executing the second request and queue
+ * the new request (the GPU detects the condition of a context getting preempted
+ * with the same context and optimizes the context switch flow by not doing
+ * preemption, but just sampling the new tail pointer).
+ *
  */
 
 #include <drm/drmP.h>
@@ -110,6 +205,16 @@ enum {
 };
 #define GEN8_CTX_ID_SHIFT 32
 
+/**
+ * intel_enable_execlists() - is Execlists enabled in the system?
+ * @dev: DRM device.
+ *
+ * Only certain platforms support Execlists (the prerequisites being
+ * support for Logical Ring Contexts and Aliasing PPGTT or better),
+ * and only when enabled via module parameter.
+ *
+ * Return: true if Execlists is supported and enabled.
+ */
 bool intel_enable_execlists(struct drm_device *dev)
 {
 	if (!i915.enable_execlists)
@@ -118,6 +223,18 @@ bool intel_enable_execlists(struct drm_device *dev)
 	return HAS_LOGICAL_RING_CONTEXTS(dev) && USES_PPGTT(dev);
 }
 
+/**
+ * intel_execlists_ctx_id() - get the Execlists Context ID
+ * @ctx_obj: Logical Ring Context backing object.
+ *
+ * Do not confuse with ctx->id! Unfortunately we have a name overload
+ * here: the old context ID we pass to userspace as a handle so that
+ * they can refer to a context, and the new context ID we pass to the
+ * ELSP so that the GPU can inform us of the context status via
+ * interrupts.
+ *
+ * Return: 20-bit globally unique context ID.
+ */
 u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj)
 {
 	u32 lrca = i915_gem_obj_ggtt_offset(ctx_obj);
@@ -302,6 +419,13 @@ static bool execlists_check_remove_request(struct intel_engine_cs *ring,
 	return false;
 }
 
+/**
+ * intel_execlists_handle_ctx_events() - handle Context Switch interrupts
+ * @ring: Engine Command Streamer to handle.
+ *
+ * Check the unread Context Status Buffers and manage the submission of new
+ * contexts to the ELSP accordingly.
+ */
 void intel_execlists_handle_ctx_events(struct intel_engine_cs *ring)
 {
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
@@ -514,6 +638,22 @@ static int execlists_move_to_gpu(struct intel_engine_cs *ring,
 	return logical_ring_invalidate_all_caches(ring, ctx);
 }
 
+/**
+ * intel_execlists_submission() - submit a batchbuffer for execution, Execlists style
+ * @dev: DRM device.
+ * @ring: Engine Command Streamer to submit to.
+ * @ctx: Context to employ for this submission.
+ * @args: execbuffer call arguments.
+ * @vmas: list of vmas.
+ * @batch_obj: the batchbuffer to submit.
+ * @exec_start: batchbuffer start virtual address pointer.
+ * @flags: translated execbuffer call flags.
+ *
+ * This is the evil twin version of i915_gem_ringbuffer_submission. It abstracts
+ * away the submission details of the execbuffer ioctl call.
+ *
+ * Return: non-zero if the submission fails.
+ */
 int intel_execlists_submission(struct drm_device *dev, struct drm_file *file,
 			       struct intel_engine_cs *ring,
 			       struct intel_context *ctx,
@@ -601,6 +741,25 @@ int intel_execlists_submission(struct drm_device *dev, struct drm_file *file,
 	return 0;
 }
 
+/**
+ * intel_logical_ring_add_request() - queues a GEM request
+ * @ring: Engine Command Streamer.
+ * @file: DRM file (if the request comes from userspace, who sent it?).
+ * @obj: batchbuffer object (if there is one).
+ * @out_seqno: used to return the seqno assigned to this request.
+ *
+ *
+ * A lot of stuff goes on in this function, the main thing being that a
+ * drm_i915_gem_request is filled up with data and queued. This request
+ * (not to be confused with a context submit request) allows us to track
+ * sequence numbers that have been emitted and associate them with
+ * active buffers to be retired.
+ *
+ * The equivalent of this function in the legacy ringbuffer submission
+ * world would be __i915_add_request().
+ *
+ * Return: non-zero if the request cannot be added.
+ */
 int intel_logical_ring_add_request(struct intel_engine_cs *ring,
 				   struct drm_file *file,
 				   struct drm_i915_gem_object *obj,
@@ -692,6 +851,16 @@ int intel_logical_ring_add_request(struct intel_engine_cs *ring,
 	return 0;
 }
 
+/**
+ * intel_logical_ring_advance_and_submit() - advance the tail and submit the workload
+ * @ring: Engine Command Streamer.
+ * @ctx: Logical Ring Context.
+ *
+ * The tail is updated in our logical ringbuffer struct, not in the actual context. What
+ * really happens during submission is that the context and current tail will be placed
+ * on a queue waiting for the ELSP to be ready to accept a new context submission. At that
+ * point, the tail *inside* the context is updated and the ELSP written to.
+ */
 void intel_logical_ring_advance_and_submit(struct intel_engine_cs *ring,
 					   struct intel_context *ctx)
 {
@@ -878,6 +1047,20 @@ static int logical_ring_prepare(struct intel_engine_cs *ring,
 	return 0;
 }
 
+/**
+ * intel_logical_ring_begin() - prepare the logical ringbuffer to accept some commands
+ *
+ * @ring: Engine Command Streamer.
+ * @ctx: Logical Ring Context.
+ * @num_dwords: number of DWORDs that we plan to write to the ringbuffer.
+ *
+ * The ringbuffer might not be ready to accept the commands right away (maybe it needs to
+ * be wrapped, or we need to wait a bit for the tail to be updated). This function takes care of that
+ * and also preallocates a request (every workload submission is still mediated through
+ * requests, same as it did with legacy ringbuffer submission).
+ *
+ * Return: non-zero if the ringbuffer is not ready to be written to.
+ */
 int intel_logical_ring_begin(struct intel_engine_cs *ring,
 			     struct intel_context *ctx,
 			     int num_dwords)
@@ -1155,6 +1338,13 @@ static int gen8_emit_request_render(struct intel_engine_cs *ring,
 	return 0;
 }
 
+/**
+ * intel_logical_ring_cleanup() - TODO
+ *
+ * @ring: Engine Command Streamer.
+ *
+ * TODO
+ */
 void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
 {
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
@@ -1354,6 +1544,16 @@ static int logical_vebox_ring_init(struct drm_device *dev)
 	return logical_ring_init(dev, ring);
 }
 
+/**
+ * intel_logical_rings_init() - allocate, populate and init the Engine Command Streamers
+ * @dev: DRM device.
+ *
+ * This function inits the engines for an Execlists submission style (the equivalent in the
+ * legacy ringbuffer submission world would be i915_gem_init_rings). It does this
+ * only for those engines that are present in the hardware.
+ *
+ * Return: non-zero if the initialization failed.
+ */
 int intel_logical_rings_init(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -1407,6 +1607,18 @@ cleanup_render_ring:
 	return ret;
 }
 
+/**
+ * intel_lr_context_render_state_init() - render state init for Execlists
+ * @ring: Engine Command Streamer.
+ * @ctx: Logical Ring Context to initialize.
+ *
+ * A.K.A. null-context, A.K.A. golden-context. In short, the render engine
+ * contexts always require a valid 3d pipeline state. As this is
+ * achieved with the submission of a batchbuffer, we require an alternative
+ * entry point to the legacy ringbuffer submission one (i915_gem_render_state_init).
+ *
+ * Return: non-zero if the initialization failed.
+ */
 int intel_lr_context_render_state_init(struct intel_engine_cs *ring,
 				       struct intel_context *ctx)
 {
@@ -1538,6 +1750,14 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
 	return 0;
 }
 
+/**
+ * intel_lr_context_free() - free the LRC specific bits of a context
+ * @ctx: the LR context to free.
+ *
+ * The real context freeing is done in i915_gem_context_free: this only
+ * takes care of the bits that are LRC related: the per-engine backing
+ * objects and the logical ringbuffer.
+ */
 void intel_lr_context_free(struct intel_context *ctx)
 {
 	int i;
@@ -1576,6 +1796,19 @@ static uint32_t get_lr_context_size(struct intel_engine_cs *ring)
 	return ret;
 }
 
+/**
+ * intel_lr_context_deferred_create() - create the LRC specific bits of a context
+ * @ctx: LR context to create.
+ * @ring: engine to be used with the context.
+ *
+ * This function can be called more than once, with different engines, if we plan
+ * to use the context with them. The context backing objects and the ringbuffers
+ * (especially the ringbuffer backing objects) suck a lot of memory up, and that's why
+ * the creation is a deferred call: it's better to make sure first that we need to use
+ * a given ring with the context.
+ *
+ * Return: non-zero on error.
+ */
 int intel_lr_context_deferred_create(struct intel_context *ctx,
 				     struct intel_engine_cs *ring)
 {
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 34b1189..e294e3e 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -27,11 +27,22 @@ int intel_logical_ring_add_request(struct intel_engine_cs *ring,
 void intel_logical_ring_advance_and_submit(struct intel_engine_cs *ring,
 					   struct intel_context *ctx);
 
+/**
+ * intel_logical_ring_advance() - advance the ringbuffer tail
+ * @ringbuf: Ringbuffer to advance.
+ *
+ * The tail is only updated in our logical ringbuffer struct.
+ */
 static inline void intel_logical_ring_advance(struct intel_ringbuffer *ringbuf)
 {
 	ringbuf->tail &= ringbuf->size - 1;
 }
 
+/**
+ * intel_logical_ring_emit() - write a DWORD to the ringbuffer.
+ * @ringbuf: Ringbuffer to write to.
+ * @data: DWORD to write.
+ */
 static inline void intel_logical_ring_emit(struct intel_ringbuffer *ringbuf, u32 data)
 {
 	iowrite32(data, ringbuf->virtual_start + ringbuf->tail);
@@ -53,6 +64,25 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj);
 bool intel_enable_execlists(struct drm_device *dev);
 
+/**
+ * struct intel_ctx_submit_request - queued context submission request
+ * @ctx: Context to submit to the ELSP.
+ * @ring: Engine to submit it to.
+ * @tail: how far into the context's ringbuffer this request goes.
+ * @execlist_link: link in the submission queue.
+ * @work: workqueue for processing this request in a bottom half.
+ * @elsp_submitted: no. of times this request has been sent to the ELSP.
+ *
+ * The ELSP only accepts two elements at a time, so we queue context/tail
+ * pairs on a given queue (ring->execlist_queue) until the hardware is
+ * available. The queue serves a double purpose: we also use it to keep track
+ * of the up to 2 contexts currently in the hardware (usually one in execution
+ * and the other queued up by the GPU): We only remove elements from the head
+ * of the queue when the hardware informs us that an element has been
+ * completed.
+ *
+ * All accesses to the queue are mediated by a spinlock (ring->execlist_lock).
+ */
 struct intel_ctx_submit_request {
 	struct intel_context *ctx;
 	struct intel_engine_cs *ring;
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH 52/53] drm/i915/bdw: Enable logical ring contexts
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
                   ` (50 preceding siblings ...)
  2014-06-13 15:38 ` [PATCH 51/53] drm/i915/bdw: Document Logical Rings, LR contexts and Execlists oscar.mateo
@ 2014-06-13 15:38 ` oscar.mateo
  2014-06-13 15:38 ` [PATCH 53/53] !UPSTREAM: drm/i915: Use MMIO flips oscar.mateo
  2014-06-18 21:26 ` [PATCH 00/53] Execlists v3 Daniel Vetter
  53 siblings, 0 replies; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:38 UTC (permalink / raw)
  To: intel-gfx

From: Oscar Mateo <oscar.mateo@intel.com>

The time has come, the Walrus said, to talk of many things.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 89b6d5c..b62b342 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1964,7 +1964,7 @@ struct drm_i915_cmd_table {
 #define I915_NEED_GFX_HWS(dev)	(INTEL_INFO(dev)->need_gfx_hws)
 
 #define HAS_HW_CONTEXTS(dev)	(INTEL_INFO(dev)->gen >= 6)
-#define HAS_LOGICAL_RING_CONTEXTS(dev)	0
+#define HAS_LOGICAL_RING_CONTEXTS(dev)	(INTEL_INFO(dev)->gen >= 8)
 #define HAS_ALIASING_PPGTT(dev)	(INTEL_INFO(dev)->gen >= 6)
 #define HAS_PPGTT(dev)		(INTEL_INFO(dev)->gen >= 7 && !IS_GEN8(dev))
 #define USES_PPGTT(dev)		intel_enable_ppgtt(dev, false)
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH 53/53] !UPSTREAM: drm/i915: Use MMIO flips
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
                   ` (51 preceding siblings ...)
  2014-06-13 15:38 ` [PATCH 52/53] drm/i915/bdw: Enable logical ring contexts oscar.mateo
@ 2014-06-13 15:38 ` oscar.mateo
  2014-06-18 21:01   ` Daniel Vetter
  2014-06-18 21:26 ` [PATCH 00/53] Execlists v3 Daniel Vetter
  53 siblings, 1 reply; 156+ messages in thread
From: oscar.mateo @ 2014-06-13 15:38 UTC (permalink / raw)
  To: intel-gfx

From: Sourab Gupta <sourab.gupta@intel.com>

If we want flips to work, either we create an Execlists-aware version
of intel_gen7_queue_flip, or we don't place commands directly in the
ringbuffer.

When upstreamed, this patch should implement the second option:

    drm/i915: Replaced Blitter ring based flips with MMIO flips

    This patch enables the framework for using MMIO based flip calls,
    in contrast with the CS based flip calls which are being used currently.

    MMIO based flip calls can be enabled on architectures where
    Render and Blitter engines reside in different power wells. The
    decision to use MMIO flips can be made based on workloads to give
    100% residency for Media power well.

    v2: The MMIO flips now use the interrupt driven mechanism for issuing the
    flips when target seqno is reached. (Incorporating Ville's idea)

    v3: Rebasing on latest code. Code restructuring after incorporating
    Damien's comments

    v4: Addressing Ville's review comments
        -general cleanup
        -updating only base addr instead of calling update_primary_plane
        -extending patch for gen5+ platforms

    v5: Addressed Ville's review comments
        -Making mmio flip vs cs flip selection based on module parameter
        -Adding check for DRIVER_MODESET feature in notify_ring before calling
         notify mmio flip.
        -Other changes mostly in function arguments

    v6: -Having a separate function to check condition for using mmio flips (Ville)
        -propagating error code from i915_gem_check_olr (Ville)

    v7: -Adding __must_check with i915_gem_check_olr (Chris)
        -Renaming mmio_flip_data to mmio_flip (Chris)
        -Rebasing on latest nightly

    v8: -Rebasing on latest code
        -squash 3rd patch in series(mmio setbase vs page flip race) with this patch
        -Added new tiling mode update in intel_do_mmio_flip (Chris)

    v9: -check for obj->last_write_seqno being 0 instead of obj->ring being NULL in
    intel_postpone_flip, as this is a more restrictive condition (Chris)

    v10: -Applied Chris's suggestions for squashing patches 2,3 into this patch.
    These patches make the selection of CS vs MMIO flip at the page flip time, and
    make the module parameter for using mmio flips as tristate, the states being
    'force CS flips', 'force mmio flips', 'driver discretion'.
    Changed the logic for driver discretion (Chris)

    v11: Minor code cleanup (better readability, fixing whitespace errors, using
    lockdep to check mutex locked status in postpone_flip, removal of __must_check
    in function definition) (Chris)
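
Note that, since the parameter is registered with mode 0600, the
tristate can also be flipped at runtime via
/sys/module/i915/parameters/use_mmio_flip (assuming the usual sysfs
layout): -1 forces CS flips, 0 leaves the choice to the driver, and 1
forces MMIO flips.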

    Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
    Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_dma.c      |   1 +
 drivers/gpu/drm/i915/i915_drv.h      |   8 ++
 drivers/gpu/drm/i915/i915_gem.c      |   2 +-
 drivers/gpu/drm/i915/i915_irq.c      |   3 +
 drivers/gpu/drm/i915/i915_params.c   |   5 ++
 drivers/gpu/drm/i915/intel_display.c | 148 ++++++++++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/intel_drv.h     |   6 ++
 7 files changed, 171 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 93c0e1a..681d736 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1607,6 +1607,7 @@ int i915_driver_load(struct drm_device *dev, unsigned long flags)
 	spin_lock_init(&dev_priv->backlight_lock);
 	spin_lock_init(&dev_priv->uncore.lock);
 	spin_lock_init(&dev_priv->mm.object_stat_lock);
+	spin_lock_init(&dev_priv->mmio_flip_lock);
 	mutex_init(&dev_priv->dpio_lock);
 	mutex_init(&dev_priv->modeset_restore_lock);
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index b62b342..f519b6c 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1377,6 +1377,9 @@ struct drm_i915_private {
 	/* protects the irq masks */
 	spinlock_t irq_lock;
 
+	/* protects the mmio flip data */
+	spinlock_t mmio_flip_lock;
+
 	bool display_irqs_enabled;
 
 	/* To control wakeup latency, e.g. for irq-driven dp aux transfers. */
@@ -2064,6 +2067,7 @@ struct i915_params {
 	bool reset;
 	bool disable_display;
 	bool disable_vtd_wa;
+	int use_mmio_flip;
 };
 extern struct i915_params i915 __read_mostly;
 
@@ -2274,6 +2278,8 @@ bool i915_gem_retire_requests(struct drm_device *dev);
 void i915_gem_retire_requests_ring(struct intel_engine_cs *ring);
 int __must_check i915_gem_check_wedge(struct i915_gpu_error *error,
 				      bool interruptible);
+int __must_check i915_gem_check_olr(struct intel_engine_cs *ring, u32 seqno);
+
 static inline bool i915_reset_in_progress(struct i915_gpu_error *error)
 {
 	return unlikely(atomic_read(&error->reset_counter)
@@ -2649,6 +2655,8 @@ int i915_reg_read_ioctl(struct drm_device *dev, void *data,
 int i915_get_reset_stats_ioctl(struct drm_device *dev, void *data,
 			       struct drm_file *file);
 
+void intel_notify_mmio_flip(struct intel_engine_cs *ring);
+
 /* overlay */
 extern struct intel_overlay_error_state *intel_overlay_capture_error_state(struct drm_device *dev);
 extern void intel_overlay_print_error_state(struct drm_i915_error_state_buf *e,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 86bfb8a..093af37 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1095,7 +1095,7 @@ i915_gem_check_wedge(struct i915_gpu_error *error,
  * Compare seqno against outstanding lazy request. Emit a request if they are
  * equal.
  */
-static int
+int
 i915_gem_check_olr(struct intel_engine_cs *ring, u32 seqno)
 {
 	int ret;
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index b0fa1ed..824d956 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1218,6 +1218,9 @@ static void notify_ring(struct drm_device *dev,
 
 	trace_i915_gem_request_complete(ring);
 
+	if (drm_core_check_feature(dev, DRIVER_MODESET))
+		intel_notify_mmio_flip(ring);
+
 	wake_up_all(&ring->irq_queue);
 	i915_queue_hangcheck(dev);
 }
diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
index b7455f8..6bca4b2 100644
--- a/drivers/gpu/drm/i915/i915_params.c
+++ b/drivers/gpu/drm/i915/i915_params.c
@@ -49,6 +49,7 @@ struct i915_params i915 __read_mostly = {
 	.disable_display = 0,
 	.enable_cmd_parser = 1,
 	.disable_vtd_wa = 0,
+	.use_mmio_flip = 1,
 };
 
 module_param_named(modeset, i915.modeset, int, 0400);
@@ -162,3 +163,7 @@ MODULE_PARM_DESC(disable_vtd_wa, "Disable all VT-d workarounds (default: false)"
 module_param_named(enable_cmd_parser, i915.enable_cmd_parser, int, 0600);
 MODULE_PARM_DESC(enable_cmd_parser,
 		 "Enable command parsing (1=enabled [default], 0=disabled)");
+
+module_param_named(use_mmio_flip, i915.use_mmio_flip, int, 0600);
+MODULE_PARM_DESC(use_mmio_flip, "use MMIO flips (-1=never, 0=driver "
+	"discretion, 1=always [default])");
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index b5cbb28..43fd4e7 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -9255,6 +9255,147 @@ static int intel_gen7_queue_flip(struct drm_device *dev,
 	return 0;
 }
 
+static bool use_mmio_flip(struct intel_engine_cs *ring,
+			  struct drm_i915_gem_object *obj)
+{
+	/*
+	 * This is not being used for older platforms, because
+	 * non-availability of flip done interrupt forces us to use
+	 * CS flips. Older platforms derive flip done using some clever
+	 * tricks involving the flip_pending status bits and vblank irqs.
+	 * So using MMIO flips there would disrupt this mechanism.
+	 */
+
+	if (INTEL_INFO(ring->dev)->gen < 5)
+		return false;
+
+	if (i915.use_mmio_flip < 0)
+		return false;
+	else if (i915.use_mmio_flip > 0)
+		return true;
+	else
+		return ring != obj->ring;
+}
+
+static void intel_do_mmio_flip(struct intel_crtc *intel_crtc)
+{
+	struct drm_device *dev = intel_crtc->base.dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_framebuffer *intel_fb =
+		to_intel_framebuffer(intel_crtc->base.primary->fb);
+	struct drm_i915_gem_object *obj = intel_fb->obj;
+	u32 dspcntr;
+	u32 reg;
+
+	intel_mark_page_flip_active(intel_crtc);
+
+	reg = DSPCNTR(intel_crtc->plane);
+	dspcntr = I915_READ(reg);
+
+	if (INTEL_INFO(dev)->gen >= 4) {
+		if (obj->tiling_mode != I915_TILING_NONE)
+			dspcntr |= DISPPLANE_TILED;
+		else
+			dspcntr &= ~DISPPLANE_TILED;
+	}
+	I915_WRITE(reg, dspcntr);
+
+	I915_WRITE(DSPSURF(intel_crtc->plane),
+			intel_crtc->unpin_work->gtt_offset);
+	POSTING_READ(DSPSURF(intel_crtc->plane));
+}
+
+static int intel_postpone_flip(struct drm_i915_gem_object *obj)
+{
+	struct intel_engine_cs *ring;
+	int ret;
+
+	lockdep_assert_held(&obj->base.dev->struct_mutex);
+
+	if (!obj->last_write_seqno)
+		return 0;
+
+	ring = obj->ring;
+
+	if (i915_seqno_passed(ring->get_seqno(ring, true),
+				obj->last_write_seqno))
+		return 0;
+
+	ret = i915_gem_check_olr(ring, obj->last_write_seqno);
+	if (ret)
+		return ret;
+
+	if (WARN_ON(!ring->irq_get(ring)))
+		return 0;
+
+	return 1;
+}
+
+void intel_notify_mmio_flip(struct intel_engine_cs *ring)
+{
+	struct drm_i915_private *dev_priv = to_i915(ring->dev);
+	struct intel_crtc *intel_crtc;
+	unsigned long irq_flags;
+	u32 seqno;
+
+	seqno = ring->get_seqno(ring, false);
+
+	spin_lock_irqsave(&dev_priv->mmio_flip_lock, irq_flags);
+	for_each_intel_crtc(ring->dev, intel_crtc) {
+		struct intel_mmio_flip *mmio_flip;
+
+		mmio_flip = &intel_crtc->mmio_flip;
+		if (mmio_flip->seqno == 0)
+			continue;
+
+		if (ring->id != mmio_flip->ring_id)
+			continue;
+
+		if (i915_seqno_passed(seqno, mmio_flip->seqno)) {
+			intel_do_mmio_flip(intel_crtc);
+			mmio_flip->seqno = 0;
+			ring->irq_put(ring);
+		}
+	}
+	spin_unlock_irqrestore(&dev_priv->mmio_flip_lock, irq_flags);
+}
+
+static int intel_queue_mmio_flip(struct drm_device *dev,
+		struct drm_crtc *crtc,
+		struct drm_framebuffer *fb,
+		struct drm_i915_gem_object *obj,
+		struct intel_engine_cs *ring,
+		uint32_t flags)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
+	unsigned long irq_flags;
+	int ret;
+
+	if (WARN_ON(intel_crtc->mmio_flip.seqno))
+		return -EBUSY;
+
+	ret = intel_postpone_flip(obj);
+	if (ret < 0)
+		return ret;
+	if (ret == 0) {
+		intel_do_mmio_flip(intel_crtc);
+		return 0;
+	}
+
+	spin_lock_irqsave(&dev_priv->mmio_flip_lock, irq_flags);
+	intel_crtc->mmio_flip.seqno = obj->last_write_seqno;
+	intel_crtc->mmio_flip.ring_id = obj->ring->id;
+	spin_unlock_irqrestore(&dev_priv->mmio_flip_lock, irq_flags);
+
+	/*
+	 * Double check to catch cases where irq fired before
+	 * mmio flip data was ready
+	 */
+	intel_notify_mmio_flip(obj->ring);
+	return 0;
+}
+
 static int intel_default_queue_flip(struct drm_device *dev,
 				    struct drm_crtc *crtc,
 				    struct drm_framebuffer *fb,
@@ -9362,7 +9503,12 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
 	work->gtt_offset =
 		i915_gem_obj_ggtt_offset(obj) + intel_crtc->dspaddr_offset;
 
-	ret = dev_priv->display.queue_flip(dev, crtc, fb, obj, ring, page_flip_flags);
+	if (use_mmio_flip(ring, obj))
+		ret = intel_queue_mmio_flip(dev, crtc, fb, obj, ring,
+				page_flip_flags);
+	else
+		ret = dev_priv->display.queue_flip(dev, crtc, fb, obj, ring,
+				page_flip_flags);
 	if (ret)
 		goto cleanup_unpin;
 
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index 78d4124..b38e88d 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -358,6 +358,11 @@ struct intel_pipe_wm {
 	bool sprites_scaled;
 };
 
+struct intel_mmio_flip {
+	u32 seqno;
+	u32 ring_id;
+};
+
 struct intel_crtc {
 	struct drm_crtc base;
 	enum pipe pipe;
@@ -412,6 +417,7 @@ struct intel_crtc {
 	wait_queue_head_t vbl_wait;
 
 	int scanline_offset;
+	struct intel_mmio_flip mmio_flip;
 };
 
 struct intel_plane_wm_parameters {
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* Re: [PATCH 51/53] drm/i915/bdw: Document Logical Rings, LR contexts and Execlists
  2014-06-13 15:38 ` [PATCH 51/53] drm/i915/bdw: Document Logical Rings, LR contexts and Execlists oscar.mateo
@ 2014-06-13 16:51   ` Chris Wilson
  2014-06-16 15:24     ` Mateo Lozano, Oscar
  0 siblings, 1 reply; 156+ messages in thread
From: Chris Wilson @ 2014-06-13 16:51 UTC (permalink / raw)
  To: oscar.mateo; +Cc: intel-gfx

On Fri, Jun 13, 2014 at 04:38:09PM +0100, oscar.mateo@intel.com wrote:
> +/**
> + * intel_execlists_ctx_id() - get the Execlists Context ID
> + * @ctx_obj: Logical Ring Context backing object.
> + *
> + * Do not confuse with ctx->id! Unfortunately we have a name overload
> + * here: the old context ID we pass to userspace as a handler so that
> + * they can refer to a context, and the new context ID we pass to the
> + * ELSP so that the GPU can inform us of the context status via
> + * interrupts.
> + *
> + * Return: 20-bits globally unique context ID.
> + */

Use tag for the ctx id we pass around in hw?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 43/53] drm/i915/bdw: Make sure error capture keeps working with Execlists
  2014-06-13 15:38 ` [PATCH 43/53] drm/i915/bdw: Make sure error capture keeps working " oscar.mateo
@ 2014-06-13 16:54   ` Chris Wilson
  2014-06-18 20:52   ` Daniel Vetter
  1 sibling, 0 replies; 156+ messages in thread
From: Chris Wilson @ 2014-06-13 16:54 UTC (permalink / raw)
  To: oscar.mateo; +Cc: intel-gfx

On Fri, Jun 13, 2014 at 04:38:01PM +0100, oscar.mateo@intel.com wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> Since the ringbuffer does not belong per engine anymore, we have to
> make sure that we are always recording the correct ringbuffer.
> 
> TODO: This is only a small fix to keep basic error capture working, but
> we need to add more information for it to be useful (e.g. dump the
> context being executed).
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gpu_error.c | 16 ++++++++++++----
>  1 file changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 87ec60e..f5897be 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -825,9 +825,6 @@ static void i915_record_ring_state(struct drm_device *dev,
>  		ering->hws = I915_READ(mmio);
>  	}
>  
> -	ering->cpu_ring_head = ring->buffer->head;
> -	ering->cpu_ring_tail = ring->buffer->tail;
> -
>  	ering->hangcheck_score = ring->hangcheck.score;
>  	ering->hangcheck_action = ring->hangcheck.action;
>  
> @@ -887,6 +884,7 @@ static void i915_gem_record_rings(struct drm_device *dev,
>  
>  	for (i = 0; i < I915_NUM_RINGS; i++) {
>  		struct intel_engine_cs *ring = &dev_priv->ring[i];
> +		struct intel_ringbuffer *ringbuf = ring->buffer;
>  
>  		if (ring->dev == NULL)
>  			continue;
> @@ -929,8 +927,18 @@ static void i915_gem_record_rings(struct drm_device *dev,
>  			}
>  		}
>  
> +		if (intel_enable_execlists(dev)) {
> +			if (request)
> +				ringbuf = request->ctx->engine[ring->id].ringbuf;
> +			else
> +				ringbuf = ring->default_context->engine[ring->id].ringbuf;
> +		}
else
  ringbuf = ring->buffer;

That saves me some confusion in only seeing it correctly set along one
branch. Bonus points for a better name than ringbuf, I'd even prefer rb
over ringbuf.
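
i.e. the whole assignment would then read roughly like this (just a
sketch):

	if (intel_enable_execlists(dev)) {
		if (request)
			rb = request->ctx->engine[ring->id].ringbuf;
		else
			rb = ring->default_context->engine[ring->id].ringbuf;
	} else {
		rb = ring->buffer;
	}
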
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 37/53] drm/i915/bdw: Implement context switching (somewhat)
  2014-06-13 15:37 ` [PATCH 37/53] drm/i915/bdw: Implement context switching (somewhat) oscar.mateo
@ 2014-06-13 17:00   ` Chris Wilson
  0 siblings, 0 replies; 156+ messages in thread
From: Chris Wilson @ 2014-06-13 17:00 UTC (permalink / raw)
  To: oscar.mateo; +Cc: intel-gfx

On Fri, Jun 13, 2014 at 04:37:55PM +0100, oscar.mateo@intel.com wrote:
> +static void execlists_elsp_write(struct intel_engine_cs *ring,
> +				 struct drm_i915_gem_object *ctx_obj0,
> +				 struct drm_i915_gem_object *ctx_obj1)
> +{
> +	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> +	uint64_t temp = 0;
> +	uint32_t desc[4];
> +
> +	/* XXX: You must always write both descriptors in the order below. */
> +	if (ctx_obj1)
> +		temp = execlists_ctx_descriptor(ctx_obj1);
> +	else
> +		temp = 0;
> +	desc[1] = (u32)(temp >> 32);
> +	desc[0] = (u32)temp;
> +
> +	temp = execlists_ctx_descriptor(ctx_obj0);
> +	desc[3] = (u32)(temp >> 32);
> +	desc[2] = (u32)temp;
> +
> +	/* Set Force Wakeup bit to prevent GT from entering C6 while
> +	 * ELSP writes are in progress */
> +	gen6_gt_force_wake_get(dev_priv, FORCEWAKE_ALL);
> +
> +	I915_WRITE(RING_ELSP(ring), desc[1]);
> +	I915_WRITE(RING_ELSP(ring), desc[0]);
> +	I915_WRITE(RING_ELSP(ring), desc[3]);
> +	/* The context is automatically loaded after the following */
> +	I915_WRITE(RING_ELSP(ring), desc[2]);
> +
> +	/* ELSP is a write only register, so this serves as a posting read */

I can see that is a POSTING_READ, so say something like
/* ELSP is a wo reg, so use another nearby reg for posting instead */

> +	POSTING_READ(RING_EXECLIST_STATUS(ring));
> +
> +	/* Release Force Wakeup */

Redundant, the clue is in the function name.

> +	gen6_gt_force_wake_put(dev_priv, FORCEWAKE_ALL);
> +}
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 02/53] drm/i915: Rename ctx->obj to ctx->render_obj
  2014-06-13 15:37 ` [PATCH 02/53] drm/i915: Rename ctx->obj to ctx->render_obj oscar.mateo
@ 2014-06-13 17:00   ` Daniel Vetter
  2014-06-16 15:20     ` Mateo Lozano, Oscar
  2014-06-13 17:15   ` Chris Wilson
  1 sibling, 1 reply; 156+ messages in thread
From: Daniel Vetter @ 2014-06-13 17:00 UTC (permalink / raw)
  To: oscar.mateo; +Cc: intel-gfx

On Fri, Jun 13, 2014 at 04:37:20PM +0100, oscar.mateo@intel.com wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> The reason for doing this will be better explained in the following
> patch. For now, suffice it to say that this backing object is only
> used with the render ring, so we're making this fact more explicit.
> 
> Done with the following Coccinelle patch (plus manual renaming of the
> struct field):
> 
> 	@@
> 	struct intel_context c;
> 	@@
> 	- (c).obj
> 	+ c.render_obj
> 
> 	@@
> 	struct intel_context *c;
> 	@@
> 	- (c)->obj
> 	+ c->render_obj
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>

Just screamed at this code reviewing a bugfix from Chris and I really like
this. Can we have a s/is_initialized/render_is_initialized/ on top pls?

Or does that interfere too much with the series? I didn't look ahead ...
-Daniel

> ---
>  drivers/gpu/drm/i915/i915_debugfs.c     |  4 +--
>  drivers/gpu/drm/i915/i915_drv.h         |  2 +-
>  drivers/gpu/drm/i915/i915_gem_context.c | 63 +++++++++++++++++----------------
>  3 files changed, 35 insertions(+), 34 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 7b83297..b09cab4 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -1735,7 +1735,7 @@ static int i915_context_status(struct seq_file *m, void *unused)
>  	}
>  
>  	list_for_each_entry(ctx, &dev_priv->context_list, link) {
> -		if (ctx->obj == NULL)
> +		if (ctx->render_obj == NULL)
>  			continue;
>  
>  		seq_puts(m, "HW context ");
> @@ -1744,7 +1744,7 @@ static int i915_context_status(struct seq_file *m, void *unused)
>  			if (ring->default_context == ctx)
>  				seq_printf(m, "(default context %s) ", ring->name);
>  
> -		describe_obj(m, ctx->obj);
> +		describe_obj(m, ctx->render_obj);
>  		seq_putc(m, '\n');
>  	}
>  
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 24f084d..1cebbd4 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -592,7 +592,7 @@ struct intel_context {
>  	uint8_t remap_slice;
>  	struct drm_i915_file_private *file_priv;
>  	struct intel_engine_cs *last_ring;
> -	struct drm_i915_gem_object *obj;
> +	struct drm_i915_gem_object *render_obj;
>  	struct i915_ctx_hang_stats hang_stats;
>  	struct i915_address_space *vm;
>  
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 4efa5ca..f27886a 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -182,14 +182,14 @@ void i915_gem_context_free(struct kref *ctx_ref)
>  						   typeof(*ctx), ref);
>  	struct i915_hw_ppgtt *ppgtt = NULL;
>  
> -	if (ctx->obj) {
> +	if (ctx->render_obj) {
>  		/* We refcount even the aliasing PPGTT to keep the code symmetric */
> -		if (USES_PPGTT(ctx->obj->base.dev))
> +		if (USES_PPGTT(ctx->render_obj->base.dev))
>  			ppgtt = ctx_to_ppgtt(ctx);
>  
>  		/* XXX: Free up the object before tearing down the address space, in
>  		 * case we're bound in the PPGTT */
> -		drm_gem_object_unreference(&ctx->obj->base);
> +		drm_gem_object_unreference(&ctx->render_obj->base);
>  	}
>  
>  	if (ppgtt)
> @@ -270,7 +270,7 @@ __create_hw_context(struct drm_device *dev,
>  			ret = PTR_ERR(obj);
>  			goto err_out;
>  		}
> -		ctx->obj = obj;
> +		ctx->render_obj = obj;
>  	}
>  
>  	/* Default context will never have a file_priv */
> @@ -317,7 +317,7 @@ i915_gem_create_context(struct drm_device *dev,
>  	if (IS_ERR(ctx))
>  		return ctx;
>  
> -	if (is_global_default_ctx && ctx->obj) {
> +	if (is_global_default_ctx && ctx->render_obj) {
>  		/* We may need to do things with the shrinker which
>  		 * require us to immediately switch back to the default
>  		 * context. This can cause a problem as pinning the
> @@ -325,7 +325,7 @@ i915_gem_create_context(struct drm_device *dev,
>  		 * be available. To avoid this we always pin the default
>  		 * context.
>  		 */
> -		ret = i915_gem_obj_ggtt_pin(ctx->obj,
> +		ret = i915_gem_obj_ggtt_pin(ctx->render_obj,
>  					    get_context_alignment(dev), 0);
>  		if (ret) {
>  			DRM_DEBUG_DRIVER("Couldn't pin %d\n", ret);
> @@ -365,8 +365,8 @@ i915_gem_create_context(struct drm_device *dev,
>  	return ctx;
>  
>  err_unpin:
> -	if (is_global_default_ctx && ctx->obj)
> -		i915_gem_object_ggtt_unpin(ctx->obj);
> +	if (is_global_default_ctx && ctx->render_obj)
> +		i915_gem_object_ggtt_unpin(ctx->render_obj);
>  err_destroy:
>  	i915_gem_context_unreference(ctx);
>  	return ERR_PTR(ret);
> @@ -390,12 +390,12 @@ void i915_gem_context_reset(struct drm_device *dev)
>  		if (!ring->last_context)
>  			continue;
>  
> -		if (dctx->obj && i == RCS) {
> -			WARN_ON(i915_gem_obj_ggtt_pin(dctx->obj,
> +		if (dctx->render_obj && i == RCS) {
> +			WARN_ON(i915_gem_obj_ggtt_pin(dctx->render_obj,
>  						      get_context_alignment(dev), 0));
>  			/* Fake a finish/inactive */
> -			dctx->obj->base.write_domain = 0;
> -			dctx->obj->active = 0;
> +			dctx->render_obj->base.write_domain = 0;
> +			dctx->render_obj->active = 0;
>  		}
>  
>  		i915_gem_context_unreference(ring->last_context);
> @@ -445,7 +445,7 @@ void i915_gem_context_fini(struct drm_device *dev)
>  	struct intel_context *dctx = dev_priv->ring[RCS].default_context;
>  	int i;
>  
> -	if (dctx->obj) {
> +	if (dctx->render_obj) {
>  		/* The only known way to stop the gpu from accessing the hw context is
>  		 * to reset it. Do this as the very last operation to avoid confusing
>  		 * other code, leading to spurious errors. */
> @@ -460,13 +460,13 @@ void i915_gem_context_fini(struct drm_device *dev)
>  		WARN_ON(!dev_priv->ring[RCS].last_context);
>  		if (dev_priv->ring[RCS].last_context == dctx) {
>  			/* Fake switch to NULL context */
> -			WARN_ON(dctx->obj->active);
> -			i915_gem_object_ggtt_unpin(dctx->obj);
> +			WARN_ON(dctx->render_obj->active);
> +			i915_gem_object_ggtt_unpin(dctx->render_obj);
>  			i915_gem_context_unreference(dctx);
>  			dev_priv->ring[RCS].last_context = NULL;
>  		}
>  
> -		i915_gem_object_ggtt_unpin(dctx->obj);
> +		i915_gem_object_ggtt_unpin(dctx->render_obj);
>  	}
>  
>  	for (i = 0; i < I915_NUM_RINGS; i++) {
> @@ -586,7 +586,7 @@ mi_set_context(struct intel_engine_cs *ring,
>  
>  	intel_ring_emit(ring, MI_NOOP);
>  	intel_ring_emit(ring, MI_SET_CONTEXT);
> -	intel_ring_emit(ring, i915_gem_obj_ggtt_offset(new_context->obj) |
> +	intel_ring_emit(ring, i915_gem_obj_ggtt_offset(new_context->render_obj) |
>  			MI_MM_SPACE_GTT |
>  			MI_SAVE_EXT_STATE_EN |
>  			MI_RESTORE_EXT_STATE_EN |
> @@ -617,8 +617,8 @@ static int do_switch(struct intel_engine_cs *ring,
>  	int ret, i;
>  
>  	if (from != NULL && ring == &dev_priv->ring[RCS]) {
> -		BUG_ON(from->obj == NULL);
> -		BUG_ON(!i915_gem_obj_is_pinned(from->obj));
> +		BUG_ON(from->render_obj == NULL);
> +		BUG_ON(!i915_gem_obj_is_pinned(from->render_obj));
>  	}
>  
>  	if (from == to && from->last_ring == ring && !to->remap_slice)
> @@ -626,7 +626,7 @@ static int do_switch(struct intel_engine_cs *ring,
>  
>  	/* Trying to pin first makes error handling easier. */
>  	if (ring == &dev_priv->ring[RCS]) {
> -		ret = i915_gem_obj_ggtt_pin(to->obj,
> +		ret = i915_gem_obj_ggtt_pin(to->render_obj,
>  					    get_context_alignment(ring->dev), 0);
>  		if (ret)
>  			return ret;
> @@ -659,14 +659,14 @@ static int do_switch(struct intel_engine_cs *ring,
>  	 *
>  	 * XXX: We need a real interface to do this instead of trickery.
>  	 */
> -	ret = i915_gem_object_set_to_gtt_domain(to->obj, false);
> +	ret = i915_gem_object_set_to_gtt_domain(to->render_obj, false);
>  	if (ret)
>  		goto unpin_out;
>  
> -	if (!to->obj->has_global_gtt_mapping) {
> -		struct i915_vma *vma = i915_gem_obj_to_vma(to->obj,
> +	if (!to->render_obj->has_global_gtt_mapping) {
> +		struct i915_vma *vma = i915_gem_obj_to_vma(to->render_obj,
>  							   &dev_priv->gtt.base);
> -		vma->bind_vma(vma, to->obj->cache_level, GLOBAL_BIND);
> +		vma->bind_vma(vma, to->render_obj->cache_level, GLOBAL_BIND);
>  	}
>  
>  	if (!to->is_initialized || i915_gem_context_is_default(to))
> @@ -695,8 +695,9 @@ static int do_switch(struct intel_engine_cs *ring,
>  	 * MI_SET_CONTEXT instead of when the next seqno has completed.
>  	 */
>  	if (from != NULL) {
> -		from->obj->base.read_domains = I915_GEM_DOMAIN_INSTRUCTION;
> -		i915_vma_move_to_active(i915_gem_obj_to_ggtt(from->obj), ring);
> +		from->render_obj->base.read_domains = I915_GEM_DOMAIN_INSTRUCTION;
> +		i915_vma_move_to_active(i915_gem_obj_to_ggtt(from->render_obj),
> +					ring);
>  		/* As long as MI_SET_CONTEXT is serializing, ie. it flushes the
>  		 * whole damn pipeline, we don't need to explicitly mark the
>  		 * object dirty. The only exception is that the context must be
> @@ -704,11 +705,11 @@ static int do_switch(struct intel_engine_cs *ring,
>  		 * able to defer doing this until we know the object would be
>  		 * swapped, but there is no way to do that yet.
>  		 */
> -		from->obj->dirty = 1;
> -		BUG_ON(from->obj->ring != ring);
> +		from->render_obj->dirty = 1;
> +		BUG_ON(from->render_obj->ring != ring);
>  
>  		/* obj is kept alive until the next request by its active ref */
> -		i915_gem_object_ggtt_unpin(from->obj);
> +		i915_gem_object_ggtt_unpin(from->render_obj);
>  		i915_gem_context_unreference(from);
>  	}
>  
> @@ -729,7 +730,7 @@ done:
>  
>  unpin_out:
>  	if (ring->id == RCS)
> -		i915_gem_object_ggtt_unpin(to->obj);
> +		i915_gem_object_ggtt_unpin(to->render_obj);
>  	return ret;
>  }
>  
> @@ -750,7 +751,7 @@ int i915_switch_context(struct intel_engine_cs *ring,
>  
>  	WARN_ON(!mutex_is_locked(&dev_priv->dev->struct_mutex));
>  
> -	if (to->obj == NULL) { /* We have the fake context */
> +	if (to->render_obj == NULL) { /* We have the fake context */
>  		if (to != ring->last_context) {
>  			i915_gem_context_reference(to);
>  			if (ring->last_context)
> -- 
> 1.9.0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 05/53] drm/i915: Move i915_gem_validate_context() to i915_gem_context.c
  2014-06-13 15:37 ` [PATCH 05/53] drm/i915: Move i915_gem_validate_context() to i915_gem_context.c oscar.mateo
@ 2014-06-13 17:11   ` Chris Wilson
  2014-06-16 15:18     ` Mateo Lozano, Oscar
  0 siblings, 1 reply; 156+ messages in thread
From: Chris Wilson @ 2014-06-13 17:11 UTC (permalink / raw)
  To: oscar.mateo; +Cc: intel-gfx

On Fri, Jun 13, 2014 at 04:37:23PM +0100, oscar.mateo@intel.com wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> ... and namespace appropriately.
> 
> It looks to me like it belongs logically there.
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h            |  3 +++
>  drivers/gpu/drm/i915/i915_gem_context.c    | 23 +++++++++++++++++++++++
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 25 +------------------------
>  3 files changed, 27 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index ec7e352..a15370c 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2409,6 +2409,9 @@ int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
>  				  struct drm_file *file);
>  int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
>  				   struct drm_file *file);
> +struct intel_context *
> +i915_gem_context_validate(struct drm_device *dev, struct drm_file *file,
> +			  struct intel_engine_cs *ring, const u32 ctx_id);
>  
>  /* i915_gem_render_state.c */
>  int i915_gem_render_state_init(struct intel_engine_cs *ring);
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index f6c2538..801b891 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -824,3 +824,26 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
>  	DRM_DEBUG_DRIVER("HW context %d destroyed\n", args->ctx_id);
>  	return 0;
>  }
> +
> +struct intel_context *
> +i915_gem_context_validate(struct drm_device *dev, struct drm_file *file,
> +			  struct intel_engine_cs *ring, const u32 ctx_id)
> +{
> +	struct intel_context *ctx = NULL;
> +	struct i915_ctx_hang_stats *hs;
> +
> +	if (ring->id != RCS && ctx_id != DEFAULT_CONTEXT_ID)
> +		return ERR_PTR(-EINVAL);
> +
> +	ctx = i915_gem_context_get(file->driver_priv, ctx_id);
> +	if (IS_ERR(ctx))
> +		return ctx;
> +
> +	hs = &ctx->hang_stats;
> +	if (hs->banned) {
> +		DRM_DEBUG("Context %u tried to submit while banned\n", ctx_id);
> +		return ERR_PTR(-EIO);

Ugh. No.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 02/53] drm/i915: Rename ctx->obj to ctx->render_obj
  2014-06-13 15:37 ` [PATCH 02/53] drm/i915: Rename ctx->obj to ctx->render_obj oscar.mateo
  2014-06-13 17:00   ` Daniel Vetter
@ 2014-06-13 17:15   ` Chris Wilson
  1 sibling, 0 replies; 156+ messages in thread
From: Chris Wilson @ 2014-06-13 17:15 UTC (permalink / raw)
  To: oscar.mateo; +Cc: intel-gfx

On Fri, Jun 13, 2014 at 04:37:20PM +0100, oscar.mateo@intel.com wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> The reason for doing this will be better explained in the following
> patch. For now, suffice it to say that this backing object is only
> used with the render ring, so we're making this fact more explicit.

Try rcs_state for size; I think render_obj is confusing.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 05/53] drm/i915: Move i915_gem_validate_context() to i915_gem_context.c
  2014-06-13 17:11   ` Chris Wilson
@ 2014-06-16 15:18     ` Mateo Lozano, Oscar
  2014-06-18 20:00       ` Volkin, Bradley D
  0 siblings, 1 reply; 156+ messages in thread
From: Mateo Lozano, Oscar @ 2014-06-16 15:18 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

> -----Original Message-----
> From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
> Sent: Friday, June 13, 2014 6:11 PM
> To: Mateo Lozano, Oscar
> Cc: intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH 05/53] drm/i915: Move
> i915_gem_validate_context() to i915_gem_context.c
> 
> On Fri, Jun 13, 2014 at 04:37:23PM +0100, oscar.mateo@intel.com wrote:
> > From: Oscar Mateo <oscar.mateo@intel.com>
> >
> > ... and namespace appropriately.
> >
> > It looks to me like it belongs logically there.
> >
> > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_drv.h            |  3 +++
> >  drivers/gpu/drm/i915/i915_gem_context.c    | 23
> +++++++++++++++++++++++
> >  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 25
> > +------------------------
> >  3 files changed, 27 insertions(+), 24 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h
> > b/drivers/gpu/drm/i915/i915_drv.h index ec7e352..a15370c 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -2409,6 +2409,9 @@ int i915_gem_context_create_ioctl(struct
> drm_device *dev, void *data,
> >  				  struct drm_file *file);
> >  int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
> >  				   struct drm_file *file);
> > +struct intel_context *
> > +i915_gem_context_validate(struct drm_device *dev, struct drm_file *file,
> > +			  struct intel_engine_cs *ring, const u32 ctx_id);
> >
> >  /* i915_gem_render_state.c */
> >  int i915_gem_render_state_init(struct intel_engine_cs *ring); diff
> > --git a/drivers/gpu/drm/i915/i915_gem_context.c
> > b/drivers/gpu/drm/i915/i915_gem_context.c
> > index f6c2538..801b891 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_context.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> > @@ -824,3 +824,26 @@ int i915_gem_context_destroy_ioctl(struct
> drm_device *dev, void *data,
> >  	DRM_DEBUG_DRIVER("HW context %d destroyed\n", args->ctx_id);
> >  	return 0;
> >  }
> > +
> > +struct intel_context *
> > +i915_gem_context_validate(struct drm_device *dev, struct drm_file *file,
> > +			  struct intel_engine_cs *ring, const u32 ctx_id) {
> > +	struct intel_context *ctx = NULL;
> > +	struct i915_ctx_hang_stats *hs;
> > +
> > +	if (ring->id != RCS && ctx_id != DEFAULT_CONTEXT_ID)
> > +		return ERR_PTR(-EINVAL);
> > +
> > +	ctx = i915_gem_context_get(file->driver_priv, ctx_id);
> > +	if (IS_ERR(ctx))
> > +		return ctx;
> > +
> > +	hs = &ctx->hang_stats;
> > +	if (hs->banned) {
> > +		DRM_DEBUG("Context %u tried to submit while banned\n",
> ctx_id);
> > +		return ERR_PTR(-EIO);
> 
> Ugh. No.
> -Chris

D'oh! Why?
- Oscar

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 02/53] drm/i915: Rename ctx->obj to ctx->render_obj
  2014-06-13 17:00   ` Daniel Vetter
@ 2014-06-16 15:20     ` Mateo Lozano, Oscar
  0 siblings, 0 replies; 156+ messages in thread
From: Mateo Lozano, Oscar @ 2014-06-16 15:20 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx



> -----Original Message-----
> From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel
> Vetter
> Sent: Friday, June 13, 2014 6:01 PM
> To: Mateo Lozano, Oscar
> Cc: intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH 02/53] drm/i915: Rename ctx->obj to ctx-
> >render_obj
> 
> On Fri, Jun 13, 2014 at 04:37:20PM +0100, oscar.mateo@intel.com wrote:
> > From: Oscar Mateo <oscar.mateo@intel.com>
> >
> > The reason for doing this will be better explained in the following
> > patch. For now, suffice it to say that this backing object is only
> > used with the render ring, so we're making this fact more explicit.
> >
> > Done with the following Coccinelle patch (plus manual renaming of the
> > struct field):
> >
> > 	@@
> > 	struct intel_context c;
> > 	@@
> > 	- (c).obj
> > 	+ c.render_obj
> >
> > 	@@
> > 	struct intel_context *c;
> > 	@@
> > 	- (c)->obj
> > 	+ c->render_obj
> >
> > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> 
> Just screamed at this code reviewing a bugfix from Chris and I really like this.
> Can we have a s/is_initialized/render_is_initialized/ on top pls?
> 
> Or does that interfere too much with the series? I didn't look ahead ...
> -Daniel

No problem, I can add that on top without any hassle. I'll send it together with a bunch of other prep-work patches.

-- Oscar

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 51/53] drm/i915/bdw: Document Logical Rings, LR contexts and Execlists
  2014-06-13 16:51   ` Chris Wilson
@ 2014-06-16 15:24     ` Mateo Lozano, Oscar
  2014-06-16 17:56       ` Daniel Vetter
  0 siblings, 1 reply; 156+ messages in thread
From: Mateo Lozano, Oscar @ 2014-06-16 15:24 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

> -----Original Message-----
> From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
> Sent: Friday, June 13, 2014 5:51 PM
> To: Mateo Lozano, Oscar
> Cc: intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH 51/53] drm/i915/bdw: Document Logical
> Rings, LR contexts and Execlists
> 
> On Fri, Jun 13, 2014 at 04:38:09PM +0100, oscar.mateo@intel.com wrote:
> > +/**
> > + * intel_execlists_ctx_id() - get the Execlists Context ID
> > + * @ctx_obj: Logical Ring Context backing object.
> > + *
> > + * Do not confuse with ctx->id! Unfortunately we have a name overload
> > + * here: the old context ID we pass to userspace as a handler so that
> > + * they can refer to a context, and the new context ID we pass to the
> > + * ELSP so that the GPU can inform us of the context status via
> > + * interrupts.
> > + *
> > + * Return: 20-bits globally unique context ID.
> > + */
> 
> Use tag for the ctx id we pass around in hw?
> -Chris

I also tried other names, like "submission id", but it confuses people when they search for it in the BSpec. Maybe changing ctx->id to ctx->tag, and leaving id for the hardware?

-- Oscar

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 51/53] drm/i915/bdw: Document Logical Rings, LR contexts and Execlists
  2014-06-16 15:24     ` Mateo Lozano, Oscar
@ 2014-06-16 17:56       ` Daniel Vetter
  2014-06-17  8:22         ` Mateo Lozano, Oscar
  0 siblings, 1 reply; 156+ messages in thread
From: Daniel Vetter @ 2014-06-16 17:56 UTC (permalink / raw)
  To: Mateo Lozano, Oscar; +Cc: intel-gfx

On Mon, Jun 16, 2014 at 03:24:26PM +0000, Mateo Lozano, Oscar wrote:
> > -----Original Message-----
> > From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
> > Sent: Friday, June 13, 2014 5:51 PM
> > To: Mateo Lozano, Oscar
> > Cc: intel-gfx@lists.freedesktop.org
> > Subject: Re: [Intel-gfx] [PATCH 51/53] drm/i915/bdw: Document Logical
> > Rings, LR contexts and Execlists
> > 
> > On Fri, Jun 13, 2014 at 04:38:09PM +0100, oscar.mateo@intel.com wrote:
> > > +/**
> > > + * intel_execlists_ctx_id() - get the Execlists Context ID
> > > + * @ctx_obj: Logical Ring Context backing object.
> > > + *
> > > + * Do not confuse with ctx->id! Unfortunately we have a name overload
> > > + * here: the old context ID we pass to userspace as a handler so that
> > > + * they can refer to a context, and the new context ID we pass to the
> > > + * ELSP so that the GPU can inform us of the context status via
> > > + * interrupts.
> > > + *
> > > + * Return: 20-bits globally unique context ID.
> > > + */
> > 
> > Use tag for the ctx id we pass around in hw?
> > -Chris
> 
> I also tried other names, like "submission id", but it confuses people
> when they search for in the BSpec. Maybe changing ctx->id to ctx->tag,
> and leaving id for the hardware?

I think Chris' idea was to reuse the id from the idr for the hw tag. But I
guess that fails because our idr is global.

Or I'm totally confused.

I'd vote for hw_ctx_id or something.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 51/53] drm/i915/bdw: Document Logical Rings, LR contexts and Execlists
  2014-06-16 17:56       ` Daniel Vetter
@ 2014-06-17  8:22         ` Mateo Lozano, Oscar
  2014-06-17  9:39           ` Daniel Vetter
  0 siblings, 1 reply; 156+ messages in thread
From: Mateo Lozano, Oscar @ 2014-06-17  8:22 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx


> -----Original Message-----
> From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel
> Vetter
> Sent: Monday, June 16, 2014 6:56 PM
> To: Mateo Lozano, Oscar
> Cc: Chris Wilson; intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH 51/53] drm/i915/bdw: Document Logical
> Rings, LR contexts and Execlists
> 
> On Mon, Jun 16, 2014 at 03:24:26PM +0000, Mateo Lozano, Oscar wrote:
> > > -----Original Message-----
> > > From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
> > > Sent: Friday, June 13, 2014 5:51 PM
> > > To: Mateo Lozano, Oscar
> > > Cc: intel-gfx@lists.freedesktop.org
> > > Subject: Re: [Intel-gfx] [PATCH 51/53] drm/i915/bdw: Document
> > > Logical Rings, LR contexts and Execlists
> > >
> > > On Fri, Jun 13, 2014 at 04:38:09PM +0100, oscar.mateo@intel.com
> wrote:
> > > > +/**
> > > > + * intel_execlists_ctx_id() - get the Execlists Context ID
> > > > + * @ctx_obj: Logical Ring Context backing object.
> > > > + *
> > > > + * Do not confuse with ctx->id! Unfortunately we have a name
> > > > +overload
> > > > + * here: the old context ID we pass to userspace as a handler so
> > > > +that
> > > > + * they can refer to a context, and the new context ID we pass to
> > > > +the
> > > > + * ELSP so that the GPU can inform us of the context status via
> > > > + * interrupts.
> > > > + *
> > > > + * Return: 20-bits globally unique context ID.
> > > > + */
> > >
> > > Use tag for the ctx id we pass around in hw?
> > > -Chris
> >
> > I also tried other names, like "submission id", but it confuses people
> > when they search for in the BSpec. Maybe changing ctx->id to ctx->tag,
> > and leaving id for the hardware?
> 
> I think Chris' idea was to reuse the id from the idr for the hw tag. But I guess
> that fails because our idr is global.
> 
> Or I'm totally confused.
> 
> I'd vote for hw_ctx_id or something.
> -Daniel

In the first version of the series I tried to reuse the id from the idr, but that was a bad idea because the id we pass to the hw has to be globally unique, while our idr is per file_priv. What I did was add an id field to the file_priv and then generate the hw ctx id by using some bits from ctx->id, some from file_priv->id and finally some from ring->id (since we multiplex several hw contexts inside our struct intel_context). But the ELSP context descriptor only allows 20 bits for the id, so I had to limit the maximum number of contexts, files or rings artificially (ugly).
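
For illustration, the v1 construction was shaped roughly like this (a
sketch; the exact field widths below are made up, not the real ones):

	/* Pack per-file ctx id, file id and engine id into the 20 bits
	 * the ELSP context descriptor gives us (widths illustrative): */
	#define HWCTX_CTX_BITS	12
	#define HWCTX_FILE_BITS	6
	#define HWCTX_RING_BITS	2

	static u32 gen8_hw_ctx_id(u32 ctx_id, u32 file_id, u32 ring_id)
	{
		/* every id must stay within its width -- hence the
		 * artificial limits on contexts/files/rings above */
		return (ring_id << (HWCTX_CTX_BITS + HWCTX_FILE_BITS)) |
		       (file_id << HWCTX_CTX_BITS) |
		       (ctx_id & ((1 << HWCTX_CTX_BITS) - 1));
	}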

Another proposal: s/ctx->id/ctx->handle. After all, our ctx->id software construct is just a userspace handle...

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 51/53] drm/i915/bdw: Document Logical Rings, LR contexts and Execlists
  2014-06-17  8:22         ` Mateo Lozano, Oscar
@ 2014-06-17  9:39           ` Daniel Vetter
  2014-06-17  9:46             ` Mateo Lozano, Oscar
  0 siblings, 1 reply; 156+ messages in thread
From: Daniel Vetter @ 2014-06-17  9:39 UTC (permalink / raw)
  To: Mateo Lozano, Oscar; +Cc: intel-gfx

On Tue, Jun 17, 2014 at 08:22:55AM +0000, Mateo Lozano, Oscar wrote:
> 
> > -----Original Message-----
> > From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel
> > Vetter
> > Sent: Monday, June 16, 2014 6:56 PM
> > To: Mateo Lozano, Oscar
> > Cc: Chris Wilson; intel-gfx@lists.freedesktop.org
> > Subject: Re: [Intel-gfx] [PATCH 51/53] drm/i915/bdw: Document Logical
> > Rings, LR contexts and Execlists
> > 
> > On Mon, Jun 16, 2014 at 03:24:26PM +0000, Mateo Lozano, Oscar wrote:
> > > > -----Original Message-----
> > > > From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
> > > > Sent: Friday, June 13, 2014 5:51 PM
> > > > To: Mateo Lozano, Oscar
> > > > Cc: intel-gfx@lists.freedesktop.org
> > > > Subject: Re: [Intel-gfx] [PATCH 51/53] drm/i915/bdw: Document
> > > > Logical Rings, LR contexts and Execlists
> > > >
> > > > On Fri, Jun 13, 2014 at 04:38:09PM +0100, oscar.mateo@intel.com
> > wrote:
> > > > > +/**
> > > > > + * intel_execlists_ctx_id() - get the Execlists Context ID
> > > > > + * @ctx_obj: Logical Ring Context backing object.
> > > > > + *
> > > > > + * Do not confuse with ctx->id! Unfortunately we have a name
> > > > > +overload
> > > > > + * here: the old context ID we pass to userspace as a handler so
> > > > > +that
> > > > > + * they can refer to a context, and the new context ID we pass to
> > > > > +the
> > > > > + * ELSP so that the GPU can inform us of the context status via
> > > > > + * interrupts.
> > > > > + *
> > > > > + * Return: 20-bits globally unique context ID.
> > > > > + */
> > > >
> > > > Use tag for the ctx id we pass around in hw?
> > > > -Chris
> > >
> > > I also tried other names, like "submission id", but it confuses people
> > > when they search for in the BSpec. Maybe changing ctx->id to ctx->tag,
> > > and leaving id for the hardware?
> > 
> > I think Chris' idea was to reuse the id from the idr for the hw tag. But I guess
> > that fails because our idr is global.
> > 
> > Or I'm totally confused.
> > 
> > I'd vote for hw_ctx_id or something.
> > -Daniel
> 
> In the first version of the series I tried to reuse the id from the idr,
> but that was a bad idea because the id we pass to the hw has to be
> globally unique, while our idr is per file_priv. What I did is adding an
> id field to the file_priv and then generating the hw ctx id by using
> some bits from ctx->id, some from file_priv->id and finally some from
> ring->id (since we multiplex several hw contexts inside our struct
> intel_context). But the ELSP context descriptor only allows 20 bits for
> the id, so I had to limit the maximum number of contexts, files or rings
> artificially (ugly).

Considerations like this should be somewhere in the commit message.
Especially when it's all stuff you've discovered before review started and
hence doesn't have a public record anywhere.

> Another proposal: s/ctx->id/ctx->handle. After all, our ctx->id software construct is just a userspace handle...

Not sure either is clearer really. As long as there's a clear distinction
between the hw id and the userspace handle I'm ok, maybe augmented with
some comments to explain the struct fields in the header.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 51/53] drm/i915/bdw: Document Logical Rings, LR contexts and Execlists
  2014-06-17  9:39           ` Daniel Vetter
@ 2014-06-17  9:46             ` Mateo Lozano, Oscar
  2014-06-17 10:08               ` Daniel Vetter
  0 siblings, 1 reply; 156+ messages in thread
From: Mateo Lozano, Oscar @ 2014-06-17  9:46 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx



> -----Original Message-----
> From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel
> Vetter
> Sent: Tuesday, June 17, 2014 10:40 AM
> To: Mateo Lozano, Oscar
> Cc: Daniel Vetter; Chris Wilson; intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH 51/53] drm/i915/bdw: Document Logical
> Rings, LR contexts and Execlists
> 
> On Tue, Jun 17, 2014 at 08:22:55AM +0000, Mateo Lozano, Oscar wrote:
> >
> > > -----Original Message-----
> > > From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of
> > > Daniel Vetter
> > > Sent: Monday, June 16, 2014 6:56 PM
> > > To: Mateo Lozano, Oscar
> > > Cc: Chris Wilson; intel-gfx@lists.freedesktop.org
> > > Subject: Re: [Intel-gfx] [PATCH 51/53] drm/i915/bdw: Document
> > > Logical Rings, LR contexts and Execlists
> > >
> > > On Mon, Jun 16, 2014 at 03:24:26PM +0000, Mateo Lozano, Oscar
> wrote:
> > > > > -----Original Message-----
> > > > > From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
> > > > > Sent: Friday, June 13, 2014 5:51 PM
> > > > > To: Mateo Lozano, Oscar
> > > > > Cc: intel-gfx@lists.freedesktop.org
> > > > > Subject: Re: [Intel-gfx] [PATCH 51/53] drm/i915/bdw: Document
> > > > > Logical Rings, LR contexts and Execlists
> > > > >
> > > > > On Fri, Jun 13, 2014 at 04:38:09PM +0100, oscar.mateo@intel.com
> > > wrote:
> > > > > > +/**
> > > > > > + * intel_execlists_ctx_id() - get the Execlists Context ID
> > > > > > + * @ctx_obj: Logical Ring Context backing object.
> > > > > > + *
> > > > > > + * Do not confuse with ctx->id! Unfortunately we have a name
> > > > > > +overload
> > > > > > + * here: the old context ID we pass to userspace as a handler
> > > > > > +so that
> > > > > > + * they can refer to a context, and the new context ID we
> > > > > > +pass to the
> > > > > > + * ELSP so that the GPU can inform us of the context status
> > > > > > +via
> > > > > > + * interrupts.
> > > > > > + *
> > > > > > + * Return: 20-bits globally unique context ID.
> > > > > > + */
> > > > >
> > > > > Use tag for the ctx id we pass around in hw?
> > > > > -Chris
> > > >
> > > > I also tried other names, like "submission id", but it confuses
> > > > people when they search for in the BSpec. Maybe changing ctx->id
> > > > to ctx->tag, and leaving id for the hardware?
> > >
> > > I think Chris' idea was to reuse the id from the idr for the hw tag.
> > > But I guess that fails because our idr is global.
> > >
> > > Or I'm totally confused.
> > >
> > > I'd vote for hw_ctx_id or something.
> > > -Daniel
> >
> > In the first version of the series I tried to reuse the id from the
> > idr, but that was a bad idea because the id we pass to the hw has to
> > be globally unique, while our idr is per file_priv. What I did is
> > adding an id field to the file_priv and then generating the hw ctx id
> > by using some bits from ctx->id, some from file_priv->id and finally
> > some from
> > ring->id (since we multiplex several hw contexts inside our struct
> > intel_context). But the ELSP context descriptor only allows 20 bits
> > for the id, so I had to limit the maximum number of contexts, files or
> > rings artificially (ugly).
> 
> Considerations like this should be somewhere in the commit message.
> Especially when it's all stuff you've discovered before review started and
> hence doesn't have a public record anywhere.

Nope, I discovered this *after* review started: v1 used the Frankenstein-style context id, v2 didn't. 

The comment appears in the commit message for "drm/i915/bdw: Implement context switching (somewhat)":

    v3: Use LRCA[31:12] as hwCtxId[19:0]. This guarantees that the HW context
    ID we submit to the ELSP is globally unique and != 0 (Bspec requirements
    of the software use-only bits of the Context ID in the Context Descriptor
    Format) without the hassle of the previous submission Id construction.
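
In code, that mapping is basically (sketch):

	u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj)
	{
		/* The LRCA is the page-aligned GGTT offset of the context
		 * backing object, so LRCA[31:12] is a globally unique,
		 * non-zero 20-bit value we can hand to the ELSP as-is. */
		return i915_gem_obj_ggtt_offset(ctx_obj) >> PAGE_SHIFT;
	}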

> > Another proposal: s/ctx->id/ctx->handle. After all, our ctx->id software
> construct is just a userspace handle...
> 
> Not sure either is clearer really. As long as there's a clear disdinction
> between the hw id and the userspace handle I'm ok, maybe augmented with
> some comments to explain the struct fields in the header.
> -Daniel

I'll go with s/ctx->id/ctx->handle and the comments then (IMHO, it's clear enough).

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 51/53] drm/i915/bdw: Document Logical Rings, LR contexts and Execlists
  2014-06-17  9:46             ` Mateo Lozano, Oscar
@ 2014-06-17 10:08               ` Daniel Vetter
  2014-06-17 10:12                 ` Mateo Lozano, Oscar
  0 siblings, 1 reply; 156+ messages in thread
From: Daniel Vetter @ 2014-06-17 10:08 UTC (permalink / raw)
  To: Mateo Lozano, Oscar; +Cc: intel-gfx

On Tue, Jun 17, 2014 at 11:46 AM, Mateo Lozano, Oscar
<oscar.mateo@intel.com> wrote:
> The comment appears in the commit message for " drm/i915/bdw: Implement context switching (somewhat) ":
>
>     v3: Use LRCA[31:12] as hwCtxId[19:0]. This guarantees that the HW context
>     ID we submit to the ELSP is globally unique and != 0 (Bspec requirements
>     of the software use-only bits of the Context ID in the Context Descriptor
>     Format) without the hassle of the previous submission Id construction.

I meant a comment as to why reusing ctx->id isn't a good idea since
it's per-file and so not globally unique. Occasionally repeating and
stating the seemingly obvious won't hurt ;-)
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 51/53] drm/i915/bdw: Document Logical Rings, LR contexts and Execlists
  2014-06-17 10:08               ` Daniel Vetter
@ 2014-06-17 10:12                 ` Mateo Lozano, Oscar
  0 siblings, 0 replies; 156+ messages in thread
From: Mateo Lozano, Oscar @ 2014-06-17 10:12 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

> -----Original Message-----
> From: daniel.vetter@ffwll.ch [mailto:daniel.vetter@ffwll.ch] On Behalf Of
> Daniel Vetter
> Sent: Tuesday, June 17, 2014 11:09 AM
> To: Mateo Lozano, Oscar
> Cc: Chris Wilson; intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH 51/53] drm/i915/bdw: Document Logical
> Rings, LR contexts and Execlists
> 
> On Tue, Jun 17, 2014 at 11:46 AM, Mateo Lozano, Oscar
> <oscar.mateo@intel.com> wrote:
> > The comment appears in the commit message for " drm/i915/bdw:
> Implement context switching (somewhat) ":
> >
> >     v3: Use LRCA[31:12] as hwCtxId[19:0]. This guarantees that the HW
> context
> >     ID we submit to the ELSP is globally unique and != 0 (Bspec requirements
> >     of the software use-only bits of the Context ID in the Context Descriptor
> >     Format) without the hassle of the previous submission Id construction.
> 
> I've meant a comment as to why reusing ctx->id isn't a good idea since it's
> per-file an so not globally unique. Occasionally repeating and stating the
> seemingly obvious won't hurt ;-) -Daniel

Ok, I'll expand the comment describing why the submission Id construction was a very bad idea.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 05/53] drm/i915: Move i915_gem_validate_context() to i915_gem_context.c
  2014-06-16 15:18     ` Mateo Lozano, Oscar
@ 2014-06-18 20:00       ` Volkin, Bradley D
  0 siblings, 0 replies; 156+ messages in thread
From: Volkin, Bradley D @ 2014-06-18 20:00 UTC (permalink / raw)
  To: Mateo Lozano, Oscar; +Cc: intel-gfx

[snip]

On Mon, Jun 16, 2014 at 08:18:00AM -0700, Mateo Lozano, Oscar wrote:
> > > +struct intel_context *
> > > +i915_gem_context_validate(struct drm_device *dev, struct drm_file *file,
> > > +			  struct intel_engine_cs *ring, const u32 ctx_id) {
> > > +	struct intel_context *ctx = NULL;
> > > +	struct i915_ctx_hang_stats *hs;
> > > +
> > > +	if (ring->id != RCS && ctx_id != DEFAULT_CONTEXT_ID)
> > > +		return ERR_PTR(-EINVAL);
> > > +
> > > +	ctx = i915_gem_context_get(file->driver_priv, ctx_id);
> > > +	if (IS_ERR(ctx))
> > > +		return ctx;
> > > +
> > > +	hs = &ctx->hang_stats;
> > > +	if (hs->banned) {
> > > +		DRM_DEBUG("Context %u tried to submit while banned\n",
> > ctx_id);
> > > +		return ERR_PTR(-EIO);
> > 
> > Ugh. No.
> > -Chris
> 
> D´oh! Why?
> - Oscar

Not sure if you got an answer on this. I'd guess the objection is that
the function effectively implements part of the execbuf2 API contract
rather than generic context behavior. So we'd want to just keep it as
part of i915_gem_execbuffer.c.

Brad

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 06/53] drm/i915/bdw: Introduce one context backing object per engine
  2014-06-13 15:37 ` [PATCH 06/53] drm/i915/bdw: Introduce one context backing object per engine oscar.mateo
@ 2014-06-18 20:16   ` Daniel Vetter
  2014-06-19  8:52     ` Mateo Lozano, Oscar
  0 siblings, 1 reply; 156+ messages in thread
From: Daniel Vetter @ 2014-06-18 20:16 UTC (permalink / raw)
  To: oscar.mateo; +Cc: intel-gfx

On Fri, Jun 13, 2014 at 04:37:24PM +0100, oscar.mateo@intel.com wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> A context backing object only makes sense for a given engine (because
> it holds state data specific to that engine).
> 
> In legacy ringbuffer sumission mode, the only MI_SET_CONTEXT we really
> perform is for the render engine, so one backing object is all we needed.
> 
> With Execlists, however, we need backing objects for every engine, as
> contexts become the only way to submit workloads to the GPU. To tackle
> this problem, we multiplex the context struct to contain <no-of-engines>
> objects.
> 
> Originally, I colored this code by instantiating one new context for
> every engine I wanted to use, but this change suggested by Brad Volkin
> makes it more elegant.
> 
> v2: Leave the old backing object pointer behind. Daniel Vetter suggested
> using a union, but it makes more sense to keep render_obj as a NULL
> pointer behind, to make sure no one uses it incorrectly when Execlists
> are enabled, similar to what we are doing with ring->buffer (Rusty's API
> level 5).
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index a15370c..ccc1ba6 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -593,7 +593,14 @@ struct intel_context {
>  	uint8_t remap_slice;
>  	struct drm_i915_file_private *file_priv;
>  	struct intel_engine_cs *last_ring;
> +
> +	/* Legacy ring buffer submission */
>  	struct drm_i915_gem_object *render_obj;

Per my previous request, is_initialized should also be nearby, maybe
wrapped in a struct. So
union {
	struct {
		struct gem_bo *obj;
		bool is_initialized;

	} render_ctx;
	struct {
		...
	} lrc[I915_NUM_RINGS];
}

Or some other means to make it clearer which fields are for legacy render
ctx objects and which for lrc contexts. I also wonder whether we should
shovel all the hw specific stuff at the end to have a clearer separation
between the sw-side field members associated with the software context
object and the stuff for the hw thing.

Just ideas to pick&choose from, really; we can easily cocci-polish this
once it's all settled (i.e. afterwards).
-Daniel


> +	/* Execlists */
> +	struct {
> +		struct drm_i915_gem_object *obj;
> +	} engine[I915_NUM_RINGS];
> +
>  	struct i915_ctx_hang_stats hang_stats;
>  	struct i915_address_space *vm;
>  
> -- 
> 1.9.0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 07/53] drm/i915/bdw: New file for Logical Ring Contexts and Execlists
  2014-06-13 15:37 ` [PATCH 07/53] drm/i915/bdw: New file for Logical Ring Contexts and Execlists oscar.mateo
@ 2014-06-18 20:17   ` Daniel Vetter
  2014-06-19  9:01     ` Mateo Lozano, Oscar
  0 siblings, 1 reply; 156+ messages in thread
From: Daniel Vetter @ 2014-06-18 20:17 UTC (permalink / raw)
  To: oscar.mateo; +Cc: intel-gfx

On Fri, Jun 13, 2014 at 04:37:25PM +0100, oscar.mateo@intel.com wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> Some legacy HW context code assumptions don't make sense for this new
> submission method, so we will place this stuff in a separate file.
> 
> Note for reviewers: I've carefully considered the best name for this file
> and this was my best option (other possibilities were intel_lr_context.c
> or intel_execlist.c). I am open to a certain bikeshedding on this matter,
> anyway. Regarding splitting execlists and logical ring contexts, it is
> probably not worth it for the moment.
> 
> v2: Change to intel_lrc.c
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> ---
>  drivers/gpu/drm/i915/Makefile    |  1 +
>  drivers/gpu/drm/i915/intel_lrc.c | 42 ++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 43 insertions(+)
>  create mode 100644 drivers/gpu/drm/i915/intel_lrc.c
> 
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index cad1683..9fee2a0 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -31,6 +31,7 @@ i915-y += i915_cmd_parser.o \
>  	  i915_gpu_error.o \
>  	  i915_irq.o \
>  	  i915_trace_points.o \
> +	  intel_lrc.o \
>  	  intel_ringbuffer.o \
>  	  intel_uncore.o
>  
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> new file mode 100644
> index 0000000..49bb6fc
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -0,0 +1,42 @@
> +/*
> + * Copyright © 2014 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + *
> + * Authors:
> + *    Ben Widawsky <ben@bwidawsk.net>
> + *    Michel Thierry <michel.thierry@intel.com>
> + *    Thomas Daniel <thomas.daniel@intel.com>
> + *    Oscar Mateo <oscar.mateo@intel.com>
> + *
> + */
> +
> +/*

Overview comments should be kerneldoc DOC: sections and pulled into our
driver doc. Brad knows how to do this, see i915_cmd_parser.c. Just in case
a patch later on doesn't do this ;-)
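
i.e. something shaped like (sketch):

	/**
	 * DOC: Logical Rings, Logical Ring Contexts and Execlists
	 *
	 * Overview text goes here; the DocBook template then pulls it in
	 * with a !P directive, the same way i915_cmd_parser.c is wired up.
	 */
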
-Daniel

> + * GEN8 brings an expansion of the HW contexts: "Logical Ring Contexts".
> + * These expanded contexts enable a number of new abilities, especially
> + * "Execlists" (also implemented in this file).
> + *
> + * Execlists are the new method by which, on gen8+ hardware, workloads are
> + * submitted for execution (as opposed to the legacy, ringbuffer-based, method).
> + */
> +
> +#include <drm/drmP.h>
> +#include <drm/i915_drm.h>
> +#include "i915_drv.h"
> -- 
> 1.9.0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 08/53] drm/i915/bdw: Macro for LRCs and module option for Execlists
  2014-06-13 15:37 ` [PATCH 08/53] drm/i915/bdw: Macro for LRCs and module option for Execlists oscar.mateo
@ 2014-06-18 20:19   ` Daniel Vetter
  2014-06-19  9:04     ` Mateo Lozano, Oscar
  0 siblings, 1 reply; 156+ messages in thread
From: Daniel Vetter @ 2014-06-18 20:19 UTC (permalink / raw)
  To: oscar.mateo; +Cc: intel-gfx

On Fri, Jun 13, 2014 at 04:37:26PM +0100, oscar.mateo@intel.com wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> GEN8 brings an expansion of the HW contexts: "Logical Ring Contexts".
> These expanded contexts enable a number of new abilities, especially
> "Execlists".
> 
> The macro is defined to off until we have things in place to hope to
> work. In dev_priv, lrc_enabled will reflect the state of whether or
> not we've actually properly initialized these new contexts. This helps
> the transition in the code but is a candidate for removal at some point.
> 
> v2: Rename "advanced contexts" to the more correct "logical ring
> contexts".
> 
> v3: Add a module parameter to enable execlists. Execlist are relatively
> new, and so it'd be wise to be able to switch back to ring submission
> to debug subtle problems that will inevitably arise.
> 
> v4: Add an intel_enable_execlists function.
> 
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
> Signed-off-by: Damien Lespiau <damien.lespiau@intel.com> (v3)
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2 & v4)
> ---
>  drivers/gpu/drm/i915/i915_drv.h    | 6 ++++++
>  drivers/gpu/drm/i915/i915_params.c | 6 ++++++
>  drivers/gpu/drm/i915/intel_lrc.c   | 8 ++++++++
>  3 files changed, 20 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index ccc1ba6..dac0db1 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1519,6 +1519,7 @@ struct drm_i915_private {
>  
>  	uint32_t hw_context_size;
>  	struct list_head context_list;
> +	bool lrc_enabled;
>  
>  	u32 fdi_rx_config;
>  
> @@ -1944,6 +1945,7 @@ struct drm_i915_cmd_table {
>  #define I915_NEED_GFX_HWS(dev)	(INTEL_INFO(dev)->need_gfx_hws)
>  
>  #define HAS_HW_CONTEXTS(dev)	(INTEL_INFO(dev)->gen >= 6)
> +#define HAS_LOGICAL_RING_CONTEXTS(dev)	0
>  #define HAS_ALIASING_PPGTT(dev)	(INTEL_INFO(dev)->gen >= 6)
>  #define HAS_PPGTT(dev)		(INTEL_INFO(dev)->gen >= 7 && !IS_GEN8(dev))
>  #define USES_PPGTT(dev)		intel_enable_ppgtt(dev, false)
> @@ -2029,6 +2031,7 @@ struct i915_params {
>  	int enable_rc6;
>  	int enable_fbc;
>  	int enable_ppgtt;
> +	int enable_execlists;
>  	int enable_psr;
>  	unsigned int preliminary_hw_support;
>  	int disable_power_well;
> @@ -2420,6 +2423,9 @@ struct intel_context *
>  i915_gem_context_validate(struct drm_device *dev, struct drm_file *file,
>  			  struct intel_engine_cs *ring, const u32 ctx_id);
>  
> +/* intel_lrc.c */
> +bool intel_enable_execlists(struct drm_device *dev);
> +
>  /* i915_gem_render_state.c */
>  int i915_gem_render_state_init(struct intel_engine_cs *ring);
>  /* i915_gem_evict.c */
> diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
> index d05a2af..b7455f8 100644
> --- a/drivers/gpu/drm/i915/i915_params.c
> +++ b/drivers/gpu/drm/i915/i915_params.c
> @@ -37,6 +37,7 @@ struct i915_params i915 __read_mostly = {
>  	.enable_fbc = -1,
>  	.enable_hangcheck = true,
>  	.enable_ppgtt = -1,
> +	.enable_execlists = -1,
>  	.enable_psr = 0,
>  	.preliminary_hw_support = IS_ENABLED(CONFIG_DRM_I915_PRELIMINARY_HW_SUPPORT),
>  	.disable_power_well = 1,
> @@ -116,6 +117,11 @@ MODULE_PARM_DESC(enable_ppgtt,
>  	"Override PPGTT usage. "
>  	"(-1=auto [default], 0=disabled, 1=aliasing, 2=full)");
>  
> +module_param_named(enable_execlists, i915.enable_execlists, int, 0400);
> +MODULE_PARM_DESC(enable_execlists,
> +	"Override execlists usage. "
> +	"(-1=auto [default], 0=disabled, 1=enabled)");
> +
>  module_param_named(enable_psr, i915.enable_psr, int, 0600);
>  MODULE_PARM_DESC(enable_psr, "Enable PSR (default: false)");
>  
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 49bb6fc..58cead1 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -40,3 +40,11 @@
>  #include <drm/drmP.h>
>  #include <drm/i915_drm.h>
>  #include "i915_drv.h"
> +
> +bool intel_enable_execlists(struct drm_device *dev)
> +{
> +	if (!i915.enable_execlists)
> +		return false;
> +
> +	return HAS_LOGICAL_RING_CONTEXTS(dev) && USES_PPGTT(dev);
> +}

Nitpick: Best practice nowadays for options with complicated details is to
have a sanitize function called early in init. Code then just uses
i915.foo without calling anything. And the parameter needs to be
read-only, but that's already the case. See e.g. the ppgtt handling.
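A minimal sketch of what that could look like (function name and placement
are invented for illustration, not taken from this series):

	/* Hypothetical: resolve i915.enable_execlists once at driver load,
	 * so later code can just test the (read-only) parameter. */
	static void sanitize_enable_execlists(struct drm_device *dev)
	{
		bool supported = HAS_LOGICAL_RING_CONTEXTS(dev) &&
				 USES_PPGTT(dev);

		if (i915.enable_execlists < 0)		/* auto */
			i915.enable_execlists = supported;
		else if (i915.enable_execlists && !supported)
			i915.enable_execlists = 0;	/* asked for, but no hw */
	}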

Of course if your code only uses this once then this is moot - I didn't
read ahead.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 09/53] drm/i915/bdw: Initialization for Logical Ring Contexts
  2014-06-13 15:37 ` [PATCH 09/53] drm/i915/bdw: Initialization for Logical Ring Contexts oscar.mateo
@ 2014-06-18 20:24   ` Daniel Vetter
  2014-06-19  9:23     ` Mateo Lozano, Oscar
  0 siblings, 1 reply; 156+ messages in thread
From: Daniel Vetter @ 2014-06-18 20:24 UTC (permalink / raw)
  To: oscar.mateo; +Cc: intel-gfx

On Fri, Jun 13, 2014 at 04:37:27PM +0100, oscar.mateo@intel.com wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> Early in the series we had our own gen8_gem_context_init/fini
> functions, but the truth is they now look almost the same as the
> legacy hw context init/fini functions. We can always split them
> later if this ceases to be the case.
> 
> Also, we do not fall back to legacy ringbuffers when logical ring
> context initialization fails (not very likely to happen and, even
> if it does, hw contexts would probably fail as well).
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem_context.c | 21 ++++++++++++++++-----
>  1 file changed, 16 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 801b891..3f3fb36 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -416,7 +416,13 @@ int i915_gem_context_init(struct drm_device *dev)
>  	if (WARN_ON(dev_priv->ring[RCS].default_context))
>  		return 0;
>  
> -	if (HAS_HW_CONTEXTS(dev)) {
> +	dev_priv->lrc_enabled = intel_enable_execlists(dev);
> +
> +	if (dev_priv->lrc_enabled) {
> +		/* NB: intentionally left blank. We will allocate our own
> +		 * backing objects as we need them, thank you very much */
> +		dev_priv->hw_context_size = 0;
> +	} else if (HAS_HW_CONTEXTS(dev)) {
>  		dev_priv->hw_context_size = round_up(get_context_size(dev), 4096);
>  		if (dev_priv->hw_context_size > (1<<20)) {
>  			DRM_DEBUG_DRIVER("Disabling HW Contexts; invalid size %d\n",
> @@ -436,7 +442,9 @@ int i915_gem_context_init(struct drm_device *dev)
>  	for (i = 0; i < I915_NUM_RINGS; i++)
>  		dev_priv->ring[i].default_context = ctx;
>  
> -	DRM_DEBUG_DRIVER("%s context support initialized\n", dev_priv->hw_context_size ? "HW" : "fake");
> +	DRM_DEBUG_DRIVER("%s context support initialized\n",
> +			dev_priv->lrc_enabled ? "LR" :
> +			dev_priv->hw_context_size ? "HW" : "fake");
>  	return 0;
>  }
>  
> @@ -765,9 +773,12 @@ int i915_switch_context(struct intel_engine_cs *ring,
>  	return do_switch(ring, to);
>  }
>  
> -static bool hw_context_enabled(struct drm_device *dev)
> +static bool contexts_enabled(struct drm_device *dev)
>  {
> -	return to_i915(dev)->hw_context_size;
> +	struct drm_i915_private *dev_priv = to_i915(dev);
> +
> +	/* FIXME: this would be cleaner with a "context type" enum */
> +	return dev_priv->lrc_enabled || dev_priv->hw_context_size;

Since you have a bunch of if-ladders, the usual approach isn't an enum but
a vfunc table to abstract behaviour. Think object types instead of switch
statements. Style bikeshed though (presume code later on doesn't have
excesses here).
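
For illustration only (struct and member names invented, not from the
series), the shape that suggestion points at:

	/* Hypothetical ops table: pick the behaviour once at init, then
	 * call through it instead of testing lrc_enabled/hw_context_size
	 * in every caller. */
	struct intel_context_ops {
		int (*create)(struct drm_device *dev,
			      struct intel_context *ctx);
		void (*destroy)(struct intel_context *ctx);
	};

	/* e.g. dev_priv->ctx_ops->create(dev, ctx); with ctx_ops pointing
	 * at either a legacy-hw-context or an LRC implementation. */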
-Daniel

>  }
>  
>  int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
> @@ -778,7 +789,7 @@ int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
>  	struct intel_context *ctx;
>  	int ret;
>  
> -	if (!hw_context_enabled(dev))
> +	if (!contexts_enabled(dev))
>  		return -ENODEV;
>  
>  	ret = i915_mutex_lock_interruptible(dev);
> -- 
> 1.9.0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 13/53] drm/i915/bdw: Deferred creation of user-created LRCs
  2014-06-13 15:37 ` [PATCH 13/53] drm/i915/bdw: Deferred creation of user-created LRCs oscar.mateo
@ 2014-06-18 20:27   ` Daniel Vetter
  0 siblings, 0 replies; 156+ messages in thread
From: Daniel Vetter @ 2014-06-18 20:27 UTC (permalink / raw)
  To: oscar.mateo; +Cc: intel-gfx

On Fri, Jun 13, 2014 at 04:37:31PM +0100, oscar.mateo@intel.com wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> The backing objects for contexts created via open fd are actually
> empty until the user starts sending execbuffers to them. We do this
> because, at create time, we really don't know which engine is going
> to be used with the context later on.
> 
> v2: As contexts created via ioctl can only be used with the render ring,
> we have enough information to allocate & populate them right away.

Not sure this is a good choice to special-case the ioctl. We might want
to allow contexts also on non-render rings, and it introduces a special
case for not much gain. Since we must have the deferred alloc anyway, let's
use that for all cases. And if there's a (performance) issue with it we
need to address that. One true (code) path to rule them all.
-Daniel

> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem_context.c | 28 ++++++++++++++++++++++++++--
>  1 file changed, 26 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 1fb4592..70bf6d0 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -795,6 +795,7 @@ static bool contexts_enabled(struct drm_device *dev)
>  int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
>  				  struct drm_file *file)
>  {
> +	struct drm_i915_private *dev_priv = dev->dev_private;
>  	struct drm_i915_gem_context_create *args = data;
>  	struct drm_i915_file_private *file_priv = file->driver_priv;
>  	struct intel_context *ctx;
> @@ -808,9 +809,23 @@ int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
>  		return ret;
>  
>  	ctx = i915_gem_create_context(dev, file_priv, USES_FULL_PPGTT(dev));
> -	mutex_unlock(&dev->struct_mutex);
> -	if (IS_ERR(ctx))
> +	if (IS_ERR(ctx)) {
> +		mutex_unlock(&dev->struct_mutex);
>  		return PTR_ERR(ctx);
> +	}
> +
> +	if (dev_priv->lrc_enabled) {
> +		/* NB: We know this context will only be used with the render ring
> +		 * (as we enforce it) so we can allocate & populate it already */
> +		int ret = intel_lr_context_deferred_create(ctx, &dev_priv->ring[RCS]);
> +		if (ret) {
> +			mutex_unlock(&dev->struct_mutex);
> +			DRM_DEBUG_DRIVER("Could not create LRC: %d\n", ret);
> +			return ret;
> +		}
> +	}
> +
> +	mutex_unlock(&dev->struct_mutex);
>  
>  	args->ctx_id = ctx->id;
>  	DRM_DEBUG_DRIVER("HW context %d created\n", args->ctx_id);
> @@ -851,6 +866,7 @@ struct intel_context *
>  i915_gem_context_validate(struct drm_device *dev, struct drm_file *file,
>  			  struct intel_engine_cs *ring, const u32 ctx_id)
>  {
> +	struct drm_i915_private *dev_priv = dev->dev_private;
>  	struct intel_context *ctx = NULL;
>  	struct i915_ctx_hang_stats *hs;
>  
> @@ -867,5 +883,13 @@ i915_gem_context_validate(struct drm_device *dev, struct drm_file *file,
>  		return ERR_PTR(-EIO);
>  	}
>  
> +	if (dev_priv->lrc_enabled && !ctx->engine[ring->id].obj) {
> +		int ret = intel_lr_context_deferred_create(ctx, ring);
> +		if (ret) {
> +			DRM_DEBUG("Could not create LRC %u: %d\n", ctx_id, ret);
> +			return ERR_PTR(ret);
> +		}
> +	}
> +
>  	return ctx;
>  }
> -- 
> 1.9.0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 19/53] drm/i915: Extract pipe control fini & make init outside accesible
  2014-06-13 15:37 ` [PATCH 19/53] drm/i915: Extract pipe control fini & make init outside accesible oscar.mateo
@ 2014-06-18 20:31   ` Daniel Vetter
  2014-06-19  0:04   ` Volkin, Bradley D
  1 sibling, 0 replies; 156+ messages in thread
From: Daniel Vetter @ 2014-06-18 20:31 UTC (permalink / raw)
  To: oscar.mateo; +Cc: intel-gfx

On Fri, Jun 13, 2014 at 04:37:37PM +0100, oscar.mateo@intel.com wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> I plan to reuse these for the new logical ring path.
> 
> No functional changes.
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>

I don't see the new user of this yet, but imo copy-pasting this around
wouldn't be too harmful.
-Daniel

> ---
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 31 ++++++++++++++++++-------------
>  drivers/gpu/drm/i915/intel_ringbuffer.h |  3 +++
>  2 files changed, 21 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 4a71dd4..254e4c5 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -574,8 +574,21 @@ out:
>  	return ret;
>  }
>  
> -static int
> -init_pipe_control(struct intel_engine_cs *ring)
> +void
> +intel_fini_pipe_control(struct intel_engine_cs *ring)
> +{
> +	if (ring->scratch.obj == NULL)
> +		return;
> +
> +	kunmap(sg_page(ring->scratch.obj->pages->sgl));
> +	i915_gem_object_ggtt_unpin(ring->scratch.obj);
> +
> +	drm_gem_object_unreference(&ring->scratch.obj->base);
> +	ring->scratch.obj = NULL;
> +}
> +
> +int
> +intel_init_pipe_control(struct intel_engine_cs *ring)
>  {
>  	int ret;
>  
> @@ -648,7 +661,7 @@ static int init_render_ring(struct intel_engine_cs *ring)
>  			   _MASKED_BIT_ENABLE(GFX_REPLAY_MODE));
>  
>  	if (INTEL_INFO(dev)->gen >= 5) {
> -		ret = init_pipe_control(ring);
> +		ret = intel_init_pipe_control(ring);
>  		if (ret)
>  			return ret;
>  	}
> @@ -676,16 +689,8 @@ static void render_ring_cleanup(struct intel_engine_cs *ring)
>  {
>  	struct drm_device *dev = ring->dev;
>  
> -	if (ring->scratch.obj == NULL)
> -		return;
> -
> -	if (INTEL_INFO(dev)->gen >= 5) {
> -		kunmap(sg_page(ring->scratch.obj->pages->sgl));
> -		i915_gem_object_ggtt_unpin(ring->scratch.obj);
> -	}
> -
> -	drm_gem_object_unreference(&ring->scratch.obj->base);
> -	ring->scratch.obj = NULL;
> +	if (INTEL_INFO(dev)->gen >= 5)
> +		intel_fini_pipe_control(ring);
>  }
>  
>  static int gen6_signal(struct intel_engine_cs *signaller,
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 599b4ed..42026a1 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -221,6 +221,9 @@ struct  intel_engine_cs {
>  
>  bool intel_ring_initialized(struct intel_engine_cs *ring);
>  
> +void intel_fini_pipe_control(struct intel_engine_cs *ring);
> +int intel_init_pipe_control(struct intel_engine_cs *ring);
> +
>  static inline unsigned
>  intel_ring_flag(struct intel_engine_cs *ring)
>  {
> -- 
> 1.9.0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 36/53] drm/i915: Abstract the workload submission mechanism away
  2014-06-13 15:37 ` [PATCH 36/53] drm/i915: Abstract the workload submission mechanism away oscar.mateo
@ 2014-06-18 20:40   ` Daniel Vetter
  0 siblings, 0 replies; 156+ messages in thread
From: Daniel Vetter @ 2014-06-18 20:40 UTC (permalink / raw)
  To: oscar.mateo; +Cc: intel-gfx

On Fri, Jun 13, 2014 at 04:37:54PM +0100, oscar.mateo@intel.com wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> As suggested by Daniel.
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h            | 26 ++++++++++++++++
>  drivers/gpu/drm/i915/i915_gem.c            | 48 +++++++++++++++++++-----------
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 25 +++++++---------
>  3 files changed, 67 insertions(+), 32 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 3e9983c..89b6d5c 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1554,6 +1554,23 @@ struct drm_i915_private {
>  	/* Old ums support infrastructure, same warning applies. */
>  	struct i915_ums_state ums;
>  
> +	/* Abstract the submission mechanism (legacy ringbuffer or execlists) away */
> +	struct {
> +		int (*do_execbuf) (struct drm_device *dev, struct drm_file *file,
> +				   struct intel_engine_cs *ring,
> +				   struct intel_context *ctx,
> +				   struct drm_i915_gem_execbuffer2 *args,
> +				   struct list_head *vmas,
> +				   struct drm_i915_gem_object *batch_obj,
> +				   u64 exec_start, u32 flags);
> +		int (*add_request) (struct intel_engine_cs *ring,
> +				    struct drm_file *file,
> +				    struct drm_i915_gem_object *obj,
> +				    u32 *out_seqno);

Hm, what do we need add_request for? With the clean split in command
submission I'd expect every function to know whether it'll submit to an
lrc (everything in intel_lrc.c) or whether it'll submit to a legacy ring
(existing code), so I don't see a need for an add_request vfunc.

Au contraire, that looks a bit dangerous, since code that assumes a legacy
ring might get run with execlists ...

A bit confused, but let's hope this clears up in further patches.
-Daniel

> +		int (*init_rings) (struct drm_device *dev);
> +		void (*cleanup_ring) (struct intel_engine_cs *ring);
> +	} gt;
> +
>  	/*
>  	 * NOTE: This is the dri1/ums dungeon, don't add stuff here. Your patch
>  	 * will be rejected. Instead look for a better place.
> @@ -2131,6 +2148,14 @@ void i915_gem_execbuffer_retire_commands(struct drm_device *dev,
>  					 struct drm_file *file,
>  					 struct intel_engine_cs *ring,
>  					 struct drm_i915_gem_object *obj);
> +int i915_gem_ringbuffer_submission(struct drm_device *dev,
> +				   struct drm_file *file,
> +				   struct intel_engine_cs *ring,
> +				   struct intel_context *ctx,
> +				   struct drm_i915_gem_execbuffer2 *args,
> +				   struct list_head *vmas,
> +				   struct drm_i915_gem_object *batch_obj,
> +				   u64 exec_start, u32 flags);
>  int i915_gem_execbuffer(struct drm_device *dev, void *data,
>  			struct drm_file *file_priv);
>  int i915_gem_execbuffer2(struct drm_device *dev, void *data,
> @@ -2281,6 +2306,7 @@ void i915_gem_reset(struct drm_device *dev);
>  bool i915_gem_clflush_object(struct drm_i915_gem_object *obj, bool force);
>  int __must_check i915_gem_object_finish_gpu(struct drm_i915_gem_object *obj);
>  int __must_check i915_gem_init(struct drm_device *dev);
> +int i915_gem_init_rings(struct drm_device *dev);
>  int __must_check i915_gem_init_hw(struct drm_device *dev);
>  int i915_gem_l3_remap(struct intel_engine_cs *ring, int slice);
>  void i915_gem_init_swizzling(struct drm_device *dev);
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 69db71a..7c10540 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2310,19 +2310,16 @@ i915_gem_get_seqno(struct drm_device *dev, u32 *seqno)
>  	return 0;
>  }
>  
> -int __i915_add_request(struct intel_engine_cs *ring,
> -		       struct drm_file *file,
> -		       struct drm_i915_gem_object *obj,
> -		       u32 *out_seqno)
> +static int i915_gem_add_request(struct intel_engine_cs *ring,
> +				struct drm_file *file,
> +				struct drm_i915_gem_object *obj,
> +				u32 *out_seqno)
>  {
>  	struct drm_i915_private *dev_priv = ring->dev->dev_private;
>  	struct drm_i915_gem_request *request;
>  	u32 request_ring_position, request_start;
>  	int ret;
>  
> -	if (intel_enable_execlists(ring->dev))
> -		return intel_logical_ring_add_request(ring, file, obj, out_seqno);
> -
>  	request_start = intel_ring_get_tail(ring->buffer);
>  	/*
>  	 * Emit any outstanding flushes - execbuf can fail to emit the flush
> @@ -2403,6 +2400,16 @@ int __i915_add_request(struct intel_engine_cs *ring,
>  	return 0;
>  }
>  
> +int __i915_add_request(struct intel_engine_cs *ring,
> +		       struct drm_file *file,
> +		       struct drm_i915_gem_object *obj,
> +		       u32 *out_seqno)
> +{
> +	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> +
> +	return dev_priv->gt.add_request(ring, file, obj, out_seqno);
> +}
> +
>  static inline void
>  i915_gem_request_remove_from_client(struct drm_i915_gem_request *request)
>  {
> @@ -4627,7 +4634,7 @@ intel_enable_blt(struct drm_device *dev)
>  	return true;
>  }
>  
> -static int i915_gem_init_rings(struct drm_device *dev)
> +int i915_gem_init_rings(struct drm_device *dev)
>  {
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  	int ret;
> @@ -4710,10 +4717,7 @@ i915_gem_init_hw(struct drm_device *dev)
>  
>  	i915_gem_init_swizzling(dev);
>  
> -	if (intel_enable_execlists(dev))
> -		ret = intel_logical_rings_init(dev);
> -	else
> -		ret = i915_gem_init_rings(dev);
> +	ret = dev_priv->gt.init_rings(dev);
>  	if (ret)
>  		return ret;
>  
> @@ -4751,6 +4755,18 @@ int i915_gem_init(struct drm_device *dev)
>  			DRM_DEBUG_DRIVER("allow wake ack timed out\n");
>  	}
>  
> +	if (intel_enable_execlists(dev)) {
> +		dev_priv->gt.do_execbuf = intel_execlists_submission;
> +		dev_priv->gt.add_request = intel_logical_ring_add_request;
> +		dev_priv->gt.init_rings = intel_logical_rings_init;
> +		dev_priv->gt.cleanup_ring = intel_logical_ring_cleanup;
> +	} else {
> +		dev_priv->gt.do_execbuf = i915_gem_ringbuffer_submission;
> +		dev_priv->gt.add_request = i915_gem_add_request;
> +		dev_priv->gt.init_rings = i915_gem_init_rings;
> +		dev_priv->gt.cleanup_ring = intel_cleanup_ring_buffer;
> +	}
> +
>  	i915_gem_init_userptr(dev);
>  	i915_gem_init_global_gtt(dev);
>  
> @@ -4785,12 +4801,8 @@ i915_gem_cleanup_ringbuffer(struct drm_device *dev)
>  	struct intel_engine_cs *ring;
>  	int i;
>  
> -	for_each_ring(ring, dev_priv, i) {
> -		if (intel_enable_execlists(dev))
> -			intel_logical_ring_cleanup(ring);
> -		else
> -			intel_cleanup_ring_buffer(ring);
> -	}
> +	for_each_ring(ring, dev_priv, i)
> +		dev_priv->gt.cleanup_ring(ring);
>  }
>  
>  int
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 36c7f0c..f0dd31f 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -1005,14 +1005,15 @@ i915_reset_gen7_sol_offsets(struct drm_device *dev,
>  	return 0;
>  }
>  
> -static int
> -legacy_ringbuffer_submission(struct drm_device *dev, struct drm_file *file,
> -			     struct intel_engine_cs *ring,
> -			     struct intel_context *ctx,
> -			     struct drm_i915_gem_execbuffer2 *args,
> -			     struct list_head *vmas,
> -			     struct drm_i915_gem_object *batch_obj,
> -			     u64 exec_start, u32 flags)
> +int
> +i915_gem_ringbuffer_submission(struct drm_device *dev,
> +			       struct drm_file *file,
> +			       struct intel_engine_cs *ring,
> +			       struct intel_context *ctx,
> +			       struct drm_i915_gem_execbuffer2 *args,
> +			       struct list_head *vmas,
> +			       struct drm_i915_gem_object *batch_obj,
> +			       u64 exec_start, u32 flags)
>  {
>  	struct drm_clip_rect *cliprects = NULL;
>  	struct drm_i915_private *dev_priv = dev->dev_private;
> @@ -1379,12 +1380,8 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>  	else
>  		exec_start += i915_gem_obj_offset(batch_obj, vm);
>  
> -	if (intel_enable_execlists(dev))
> -		ret = intel_execlists_submission(dev, file, ring, ctx,
> -				args, &eb->vmas, batch_obj, exec_start, flags);
> -	else
> -		ret = legacy_ringbuffer_submission(dev, file, ring, ctx,
> -				args, &eb->vmas, batch_obj, exec_start, flags);
> +	ret = dev_priv->gt.do_execbuf(dev, file, ring, ctx, args,
> +			&eb->vmas, batch_obj, exec_start, flags);
>  	if (ret)
>  		goto err;
>  
> -- 
> 1.9.0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 41/53] drm/i915/bdw: Avoid non-lite-restore preemptions
  2014-06-13 15:37 ` [PATCH 41/53] drm/i915/bdw: Avoid non-lite-restore preemptions oscar.mateo
@ 2014-06-18 20:49   ` Daniel Vetter
  2014-06-23 11:52     ` Mateo Lozano, Oscar
  0 siblings, 1 reply; 156+ messages in thread
From: Daniel Vetter @ 2014-06-18 20:49 UTC (permalink / raw)
  To: oscar.mateo; +Cc: intel-gfx

On Fri, Jun 13, 2014 at 04:37:59PM +0100, oscar.mateo@intel.com wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> In the current Execlists feeding mechanism, full preemption is not
> supported yet: only lite-restores are allowed (that is: the GPU
> simply samples a new tail pointer for the context currently in
> execution).
> 
> But we have identified a scenario in which a full preemption occurs:
> 1) We submit two contexts for execution (A & B).
> 2) The GPU finishes with the first one (A), switches to the second one
> (B) and informs us.
> 3) We submit B again (hoping to cause a lite restore) together with C,
> but in the time we spend writing to the ELSP, the GPU finishes B.
> 4) The GPU starts executing B again (since we told it so).
> 5) We receive a B finished interrupt and, mistakenly, we submit C (again)
> and D, causing a full preemption of B.
> 
> By keeping a better track of our submissions, we can avoid the scenario
> described above.

How? I don't see a way to fundamentally avoid the above race, and I don't
really see an issue with it - the gpu should notice that there's not
really any work done and then switch to C.

Or am I completely missing the point here?

With no clue at all this looks really scary.

> v2: elsp_submitted belongs in the new intel_ctx_submit_request. Several
> rebase changes.
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_lrc.c | 28 ++++++++++++++++++++++++----
>  drivers/gpu/drm/i915/intel_lrc.h |  2 ++
>  2 files changed, 26 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 290391c..f388b28 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -248,6 +248,7 @@ static void execlists_context_unqueue(struct intel_engine_cs *ring)
>  		else if (req0->ctx == cursor->ctx) {
>  			/* Same ctx: ignore first request, as second request
>  			 * will update tail past first request's workload */
> +			cursor->elsp_submitted = req0->elsp_submitted;
>  			list_del(&req0->execlist_link);
>  			queue_work(dev_priv->wq, &req0->work);
>  			req0 = cursor;
> @@ -257,8 +258,14 @@ static void execlists_context_unqueue(struct intel_engine_cs *ring)
>  		}
>  	}
>  
> +	WARN_ON(req1 && req1->elsp_submitted);
> +
>  	BUG_ON(execlists_submit_context(ring, req0->ctx, req0->tail,
>  			req1? req1->ctx : NULL, req1? req1->tail : 0));

Aside: No BUG_ON except when you can prove that the kernel will die within
the current function anyway. I've seen too many cases where people
sprinkle BUG_ON instead of WARN_ON for not-completely-lethal issues with
the argument that stopping the box helps debugging.

That's kinda true for initial development, but not true when shipping: The
usual result is a frustrated user/customer looking at a completely frozen
box (because someone managed to hit the BUG_ON within a spinlock that the
irq handler requires and then the machine is gone) and an equally
frustrated developer half a world away.

A dying kernel that spews useful crap into logs with its last breath is
_much_ better, even when you know that there's no way we can ever recover
from a given situation.

</rant>
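
As a concrete example of the pattern being asked for (an illustrative
rewrite of the hunk above, not part of the patch):

	ret = execlists_submit_context(ring, req0->ctx, req0->tail,
				       req1 ? req1->ctx : NULL,
				       req1 ? req1->tail : 0);
	if (WARN_ON(ret))
		return;	/* a noisy log beats a frozen box */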

Cheers, Daniel

> +
> +	req0->elsp_submitted++;
> +	if (req1)
> +		req1->elsp_submitted++;
>  }
>  
>  static bool execlists_check_remove_request(struct intel_engine_cs *ring,
> @@ -275,9 +282,13 @@ static bool execlists_check_remove_request(struct intel_engine_cs *ring,
>  		struct drm_i915_gem_object *ctx_obj =
>  				head_req->ctx->engine[ring->id].obj;
>  		if (intel_execlists_ctx_id(ctx_obj) == request_id) {
> -			list_del(&head_req->execlist_link);
> -			queue_work(dev_priv->wq, &head_req->work);
> -			return true;
> +			WARN(head_req->elsp_submitted == 0,
> +					"Never submitted head request\n");
> +			if (--head_req->elsp_submitted <= 0) {
> +				list_del(&head_req->execlist_link);
> +				queue_work(dev_priv->wq, &head_req->work);
> +				return true;
> +			}
>  		}
>  	}
>  
> @@ -310,7 +321,16 @@ void intel_execlists_handle_ctx_events(struct intel_engine_cs *ring)
>  		status_id = I915_READ(RING_CONTEXT_STATUS_BUF(ring) +
>  				(read_pointer % 6) * 8 + 4);
>  
> -		if (status & GEN8_CTX_STATUS_COMPLETE) {
> +		if (status & GEN8_CTX_STATUS_PREEMPTED) {
> +			if (status & GEN8_CTX_STATUS_LITE_RESTORE) {
> +				if (execlists_check_remove_request(ring, status_id))
> +					WARN(1, "Lite Restored request removed from queue\n");
> +			} else
> +				WARN(1, "Preemption without Lite Restore\n");
> +		}
> +
> +		 if ((status & GEN8_CTX_STATUS_ACTIVE_IDLE) ||
> +		     (status & GEN8_CTX_STATUS_ELEMENT_SWITCH)) {
>  			if (execlists_check_remove_request(ring, status_id))
>  				submit_contexts++;
>  		}
> diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
> index 7949dff..ee877aa 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.h
> +++ b/drivers/gpu/drm/i915/intel_lrc.h
> @@ -51,6 +51,8 @@ struct intel_ctx_submit_request {
>  
>  	struct list_head execlist_link;
>  	struct work_struct work;
> +
> +	int elsp_submitted;
>  };
>  
>  void intel_execlists_handle_ctx_events(struct intel_engine_cs *ring);
> -- 
> 1.9.0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 42/53] drm/i915/bdw: Make sure gpu reset still works with Execlists
  2014-06-13 15:38 ` [PATCH 42/53] drm/i915/bdw: Make sure gpu reset still works with Execlists oscar.mateo
@ 2014-06-18 20:50   ` Daniel Vetter
  2014-06-19  9:37     ` Mateo Lozano, Oscar
  0 siblings, 1 reply; 156+ messages in thread
From: Daniel Vetter @ 2014-06-18 20:50 UTC (permalink / raw)
  To: oscar.mateo; +Cc: intel-gfx

On Fri, Jun 13, 2014 at 04:38:00PM +0100, oscar.mateo@intel.com wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> If we reset a ring after a hang, we have to make sure that we clear
> out all queued Execlists requests.
> 
> v2: The ring is, at this point, already being correctly re-programmed
> for Execlists, and the hangcheck counters cleared.
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 7c10540..86bfb8a 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2546,6 +2546,19 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
>  		i915_gem_free_request(request);
>  	}
>  
> +	if (intel_enable_execlists(dev_priv->dev)) {
> +		while (!list_empty(&ring->execlist_queue)) {

the execlist_queue should be empty for legacy mode, i.e. you can ditch the
if here, it's redundant. If not, move the INIT_LIST_HEAD ;-)
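
Roughly the simplification being suggested (a sketch derived from the hunk
above; a legacy ring would simply see an empty, pre-initialised list):

	while (!list_empty(&ring->execlist_queue)) {
		struct intel_ctx_submit_request *submit_req;

		submit_req = list_first_entry(&ring->execlist_queue,
				struct intel_ctx_submit_request,
				execlist_link);
		list_del(&submit_req->execlist_link);
		i915_gem_context_unreference(submit_req->ctx);
		kfree(submit_req);
	}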
-Daniel

> +			struct intel_ctx_submit_request *submit_req;
> +
> +			submit_req = list_first_entry(&ring->execlist_queue,
> +					struct intel_ctx_submit_request,
> +					execlist_link);
> +			list_del(&submit_req->execlist_link);
> +			i915_gem_context_unreference(submit_req->ctx);
> +			kfree(submit_req);
> +		}
> +	}
> +
>  	/* These may not have been flush before the reset, do so now */
>  	kfree(ring->preallocated_lazy_request);
>  	ring->preallocated_lazy_request = NULL;
> -- 
> 1.9.0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 43/53] drm/i915/bdw: Make sure error capture keeps working with Execlists
  2014-06-13 15:38 ` [PATCH 43/53] drm/i915/bdw: Make sure error capture keeps working " oscar.mateo
  2014-06-13 16:54   ` Chris Wilson
@ 2014-06-18 20:52   ` Daniel Vetter
  2014-06-18 20:53     ` Daniel Vetter
  1 sibling, 1 reply; 156+ messages in thread
From: Daniel Vetter @ 2014-06-18 20:52 UTC (permalink / raw)
  To: oscar.mateo; +Cc: intel-gfx

On Fri, Jun 13, 2014 at 04:38:01PM +0100, oscar.mateo@intel.com wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> Since the ringbuffer no longer belongs to the engine, we have to
> make sure that we are always recording the correct ringbuffer.
> 
> TODO: This is only a small fix to keep basic error capture working, but
> we need to add more information for it to be useful (e.g. dump the
> context being executed).

I think we should dump the two LRCs submitted to the hw ports (our decoder
can deal with an arbitrary number of rings, just name them foo1 and foo2)
and the overall execlist submission queue; with a few hints about what's
going on, that should be a useful start. The scheduler patches can then pimp
this further I guess.
-Daniel

> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gpu_error.c | 16 ++++++++++++----
>  1 file changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 87ec60e..f5897be 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -825,9 +825,6 @@ static void i915_record_ring_state(struct drm_device *dev,
>  		ering->hws = I915_READ(mmio);
>  	}
>  
> -	ering->cpu_ring_head = ring->buffer->head;
> -	ering->cpu_ring_tail = ring->buffer->tail;
> -
>  	ering->hangcheck_score = ring->hangcheck.score;
>  	ering->hangcheck_action = ring->hangcheck.action;
>  
> @@ -887,6 +884,7 @@ static void i915_gem_record_rings(struct drm_device *dev,
>  
>  	for (i = 0; i < I915_NUM_RINGS; i++) {
>  		struct intel_engine_cs *ring = &dev_priv->ring[i];
> +		struct intel_ringbuffer *ringbuf = ring->buffer;
>  
>  		if (ring->dev == NULL)
>  			continue;
> @@ -929,8 +927,18 @@ static void i915_gem_record_rings(struct drm_device *dev,
>  			}
>  		}
>  
> +		if (intel_enable_execlists(dev)) {
> +			if (request)
> +				ringbuf = request->ctx->engine[ring->id].ringbuf;
> +			else
> +				ringbuf = ring->default_context->engine[ring->id].ringbuf;
> +		}
> +
> +		error->ring[i].cpu_ring_head = ringbuf->head;
> +		error->ring[i].cpu_ring_tail = ringbuf->tail;
> +
>  		error->ring[i].ringbuffer =
> -			i915_error_ggtt_object_create(dev_priv, ring->buffer->obj);
> +			i915_error_ggtt_object_create(dev_priv, ringbuf->obj);
>  
>  		if (ring->status_page.obj)
>  			error->ring[i].hws_page =
> -- 
> 1.9.0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 43/53] drm/i915/bdw: Make sure error capture keeps working with Execlists
  2014-06-18 20:52   ` Daniel Vetter
@ 2014-06-18 20:53     ` Daniel Vetter
  0 siblings, 0 replies; 156+ messages in thread
From: Daniel Vetter @ 2014-06-18 20:53 UTC (permalink / raw)
  To: oscar.mateo; +Cc: intel-gfx

On Wed, Jun 18, 2014 at 10:52:08PM +0200, Daniel Vetter wrote:
> On Fri, Jun 13, 2014 at 04:38:01PM +0100, oscar.mateo@intel.com wrote:
> > From: Oscar Mateo <oscar.mateo@intel.com>
> > 
> > Since the ringbuffer no longer belongs to the engine, we have to
> > make sure that we are always recording the correct ringbuffer.
> > 
> > TODO: This is only a small fix to keep basic error capture working, but
> > we need to add more information for it to be useful (e.g. dump the
> > context being executed).
> 
> I think we should dump the two LRCs submitted to the hw ports (our decoder
> can deal with an arbitrary number of rings, just name them foo1 and foo2)
> and the overall execlist submission queue; with a few hints about what's
> going on, that should be a useful start. The scheduler patches can then pimp
> this further I guess.

ofc this can be done later on, but needs to be tracked somewhere.
Historically we suck badly at delivering such follow-up work.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 45/53] drm/i915/bdw: Do not call intel_runtime_pm_get() in an interrupt
  2014-06-13 15:38 ` [PATCH 45/53] drm/i915/bdw: Do not call intel_runtime_pm_get() in an interrupt oscar.mateo
@ 2014-06-18 20:54   ` Daniel Vetter
  2014-07-26 10:27     ` Chris Wilson
  0 siblings, 1 reply; 156+ messages in thread
From: Daniel Vetter @ 2014-06-18 20:54 UTC (permalink / raw)
  To: oscar.mateo; +Cc: intel-gfx

On Fri, Jun 13, 2014 at 04:38:03PM +0100, oscar.mateo@intel.com wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> Or with a spinlock grabbed, because it might sleep, which is not
> a nice thing to do. Instead, do the runtime_pm get/put together
> with the create/destroy request, and handle the forcewake get/put
> directly.
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>

Looks like a fixup that should be squashed into relevant earlier patches.
-Daniel

> ---
>  drivers/gpu/drm/i915/intel_lrc.c | 26 +++++++++++++++++++++-----
>  1 file changed, 21 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index d33e622..ea4b358 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -159,6 +159,7 @@ static void execlists_elsp_write(struct intel_engine_cs *ring,
>  	struct drm_i915_private *dev_priv = ring->dev->dev_private;
>  	uint64_t temp = 0;
>  	uint32_t desc[4];
> +	unsigned long flags;
>  
>  	/* XXX: You must always write both descriptors in the order below. */
>  	if (ctx_obj1)
> @@ -172,9 +173,17 @@ static void execlists_elsp_write(struct intel_engine_cs *ring,
>  	desc[3] = (u32)(temp >> 32);
>  	desc[2] = (u32)temp;
>  
> -	/* Set Force Wakeup bit to prevent GT from entering C6 while
> -	 * ELSP writes are in progress */
> -	gen6_gt_force_wake_get(dev_priv, FORCEWAKE_ALL);
> +	/* Set Force Wakeup bit to prevent GT from entering C6 while ELSP writes
> +	 * are in progress.
> +	 *
> +	 * The other problem is that we can't just call gen6_gt_force_wake_get()
> +	 * because that function calls intel_runtime_pm_get(), which might sleep.
> +	 * Instead, we do the runtime_pm_get/put when creating/destroying requests.
> +	 */
> +	spin_lock_irqsave(&dev_priv->uncore.lock, flags);
> +	if (dev_priv->uncore.forcewake_count++ == 0)
> +		dev_priv->uncore.funcs.force_wake_get(dev_priv, FORCEWAKE_ALL);
> +	spin_unlock_irqrestore(&dev_priv->uncore.lock, flags);
>  
>  	I915_WRITE(RING_ELSP(ring), desc[1]);
>  	I915_WRITE(RING_ELSP(ring), desc[0]);
> @@ -185,8 +194,11 @@ static void execlists_elsp_write(struct intel_engine_cs *ring,
>  	/* ELSP is a write only register, so this serves as a posting read */
>  	POSTING_READ(RING_EXECLIST_STATUS(ring));
>  
> -	/* Release Force Wakeup */
> -	gen6_gt_force_wake_put(dev_priv, FORCEWAKE_ALL);
> +	/* Release Force Wakeup (see the big comment above). */
> +	spin_lock_irqsave(&dev_priv->uncore.lock, flags);
> +	if (--dev_priv->uncore.forcewake_count == 0)
> +		dev_priv->uncore.funcs.force_wake_put(dev_priv, FORCEWAKE_ALL);
> +	spin_unlock_irqrestore(&dev_priv->uncore.lock, flags);
>  }
>  
>  static int execlists_ctx_write_tail(struct drm_i915_gem_object *ctx_obj, u32 tail)
> @@ -353,6 +365,9 @@ static void execlists_free_request_task(struct work_struct *work)
>  	struct intel_ctx_submit_request *req =
>  			container_of(work, struct intel_ctx_submit_request, work);
>  	struct drm_device *dev = req->ring->dev;
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +
> +	intel_runtime_pm_put(dev_priv);
>  
>  	mutex_lock(&dev->struct_mutex);
>  	i915_gem_context_unreference(req->ctx);
> @@ -378,6 +393,7 @@ static int execlists_context_queue(struct intel_engine_cs *ring,
>  	req->ring = ring;
>  	req->tail = tail;
>  	INIT_WORK(&req->work, execlists_free_request_task);
> +	intel_runtime_pm_get(dev_priv);
>  
>  	spin_lock_irqsave(&ring->execlist_lock, flags);
>  
> -- 
> 1.9.0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 46/53] drm/i915/bdw: Display execlists info in debugfs
  2014-06-13 15:38 ` [PATCH 46/53] drm/i915/bdw: Display execlists info in debugfs oscar.mateo
@ 2014-06-18 20:59   ` Daniel Vetter
  0 siblings, 0 replies; 156+ messages in thread
From: Daniel Vetter @ 2014-06-18 20:59 UTC (permalink / raw)
  To: oscar.mateo; +Cc: intel-gfx

On Fri, Jun 13, 2014 at 04:38:04PM +0100, oscar.mateo@intel.com wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> v2: Warn and return if LRCs are not enabled.
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c | 72 +++++++++++++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/intel_lrc.c    |  6 ----
>  drivers/gpu/drm/i915/intel_lrc.h    |  7 ++++
>  3 files changed, 79 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index b09cab4..3ccdf0d 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -1753,6 +1753,77 @@ static int i915_context_status(struct seq_file *m, void *unused)
>  	return 0;
>  }
>  
> +static int i915_execlists(struct seq_file *m, void *data)
> +{
> +	struct drm_info_node *node = (struct drm_info_node *) m->private;
> +	struct drm_device *dev = node->minor->dev;
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct intel_engine_cs *ring;
> +	u32 status_pointer;
> +	u8 read_pointer;
> +	u8 write_pointer;
> +	u32 status;
> +	u32 ctx_id;
> +	struct list_head *cursor;
> +	struct intel_ctx_submit_request *head_req;
> +	int ring_id, i;
> +
> +	if (!intel_enable_execlists(dev)) {
> +		seq_printf(m, "Logical Ring Contexts are disabled\n");
> +		return 0;
> +	}
> +
> +	for_each_ring(ring, dev_priv, ring_id) {
> +		int count = 0;
> +
> +		seq_printf(m, "%s\n", ring->name);
> +
> +		status = I915_READ(RING_EXECLIST_STATUS(ring));
> +		ctx_id = I915_READ(RING_EXECLIST_STATUS(ring) + 4);
> +		seq_printf(m, "\tExeclist status: 0x%08X, context: %u\n",
> +				status, ctx_id);
> +
> +		status_pointer = I915_READ(RING_CONTEXT_STATUS_PTR(ring));
> +		seq_printf(m, "\tStatus pointer: 0x%08X\n", status_pointer);
> +
> +		read_pointer = ring->next_context_status_buffer;
> +		write_pointer = status_pointer & 0x07;
> +		if (read_pointer > write_pointer)
> +			write_pointer += 6;
> +		seq_printf(m, "\tRead pointer: 0x%08X, write pointer 0x%08X\n",
> +				read_pointer, write_pointer);
> +
> +		for (i = 0; i < 6; i++) {
> +			status = I915_READ(RING_CONTEXT_STATUS_BUF(ring) + 8*i);
> +			ctx_id = I915_READ(RING_CONTEXT_STATUS_BUF(ring) + 8*i + 4);
> +
> +			seq_printf(m, "\tStatus buffer %d: 0x%08X, context: %u\n",
> +					i, status, ctx_id);
> +		}
> +
> +		list_for_each(cursor, &ring->execlist_queue) {
> +			count++;
> +		}
> +		seq_printf(m, "\t%d requests in queue\n", count);
> +
> +		if (count > 0) {
> +			struct drm_i915_gem_object *ctx_obj;
> +
> +			head_req = list_first_entry(&ring->execlist_queue,
> +					struct intel_ctx_submit_request, execlist_link);
> +
> +			ctx_obj = head_req->ctx->engine[ring_id].obj;
> +			seq_printf(m, "\tHead request id: %u\n",
> +					intel_execlists_ctx_id(ctx_obj));
> +			seq_printf(m, "\tHead request tail: %u\n", head_req->tail);
> +		}
> +
> +		seq_putc(m, '\n');
> +	}
> +
> +	return 0;
> +}

debugfs files with 0 locking tend to eventually blow up. Please don't, even
though there's way too much precedent. I know that means you first have
to debug the deadlock if you hit such a beast, but usually lockdep and the
other in-kernel tooling are adequate for that.
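
A minimal sketch of the kind of guard meant here (whether struct_mutex or
the ring's execlist_lock is the right lock for each piece is an assumption;
purely illustrative):

	int ret = mutex_lock_interruptible(&dev->struct_mutex);
	if (ret)
		return ret;
	/* ... read ELSP registers and walk ring->execlist_queue ... */
	mutex_unlock(&dev->struct_mutex);
	return 0;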
-Daniel

> +
>  static int i915_gen6_forcewake_count_info(struct seq_file *m, void *data)
>  {
>  	struct drm_info_node *node = m->private;
> @@ -3813,6 +3884,7 @@ static const struct drm_info_list i915_debugfs_list[] = {
>  	{"i915_opregion", i915_opregion, 0},
>  	{"i915_gem_framebuffer", i915_gem_framebuffer_info, 0},
>  	{"i915_context_status", i915_context_status, 0},
> +	{"i915_execlists", i915_execlists, 0},
>  	{"i915_gen6_forcewake_count", i915_gen6_forcewake_count_info, 0},
>  	{"i915_swizzle_info", i915_swizzle_info, 0},
>  	{"i915_ppgtt_info", i915_ppgtt_info, 0},
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index ea4b358..c23c0f6 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -46,12 +46,6 @@
>  
>  #define GEN8_LR_CONTEXT_ALIGN 4096
>  
> -#define RING_ELSP(ring)			((ring)->mmio_base+0x230)
> -#define RING_EXECLIST_STATUS(ring)	((ring)->mmio_base+0x234)
> -#define RING_CONTEXT_CONTROL(ring)	((ring)->mmio_base+0x244)
> -#define RING_CONTEXT_STATUS_BUF(ring)	((ring)->mmio_base+0x370)
> -#define RING_CONTEXT_STATUS_PTR(ring)	((ring)->mmio_base+0x3a0)
> -
>  #define RING_EXECLIST_QFULL		(1 << 0x2)
>  #define RING_EXECLIST1_VALID		(1 << 0x3)
>  #define RING_EXECLIST0_VALID		(1 << 0x4)
> diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
> index ee877aa..c318dcb 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.h
> +++ b/drivers/gpu/drm/i915/intel_lrc.h
> @@ -1,6 +1,13 @@
>  #ifndef _INTEL_LRC_H_
>  #define _INTEL_LRC_H_
>  
> +/* Execlists regs */
> +#define RING_ELSP(ring)			((ring)->mmio_base+0x230)
> +#define RING_EXECLIST_STATUS(ring)	((ring)->mmio_base+0x234)
> +#define RING_CONTEXT_CONTROL(ring)	((ring)->mmio_base+0x244)
> +#define RING_CONTEXT_STATUS_BUF(ring)	((ring)->mmio_base+0x370)
> +#define RING_CONTEXT_STATUS_PTR(ring)	((ring)->mmio_base+0x3a0)
> +
>  /* Logical Rings */
>  void intel_logical_ring_cleanup(struct intel_engine_cs *ring);
>  int intel_logical_rings_init(struct drm_device *dev);
> -- 
> 1.9.0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 53/53] !UPSTREAM: drm/i915: Use MMIO flips
  2014-06-13 15:38 ` [PATCH 53/53] !UPSTREAM: drm/i915: Use MMIO flips oscar.mateo
@ 2014-06-18 21:01   ` Daniel Vetter
  2014-06-19  9:50     ` Mateo Lozano, Oscar
  0 siblings, 1 reply; 156+ messages in thread
From: Daniel Vetter @ 2014-06-18 21:01 UTC (permalink / raw)
  To: oscar.mateo; +Cc: intel-gfx

On Fri, Jun 13, 2014 at 04:38:11PM +0100, oscar.mateo@intel.com wrote:
> From: Sourab Gupta <sourab.gupta@intel.com>
> 
> If we want flips to work, either we create an Execlists-aware version
> of intel_gen7_queue_flip, or we don't place commands directly in the
> ringbuffer.
> 
> When upstreamed, this patch should implement the second option:

Usually we just mention such requirements in the cover letter of the
series and don't include them.
-Daniel

> 
>     drm/i915: Replaced Blitter ring based flips with MMIO flips
> 
>     This patch enables the framework for using MMIO based flip calls,
>     in contrast with the CS based flip calls which are being used currently.
> 
>     MMIO based flip calls can be enabled on architectures where
>     Render and Blitter engines reside in different power wells. The
>     decision to use MMIO flips can be made based on workloads to give
>     100% residency for Media power well.
> 
>     v2: The MMIO flips now use the interrupt driven mechanism for issuing the
>     flips when target seqno is reached. (Incorporating Ville's idea)
> 
>     v3: Rebasing on latest code. Code restructuring after incorporating
>     Damien's comments
> 
>     v4: Addressing Ville's review comments
>         -general cleanup
>         -updating only base addr instead of calling update_primary_plane
>         -extending patch for gen5+ platforms
> 
>     v5: Addressed Ville's review comments
>         -Making mmio flip vs cs flip selection based on module parameter
>         -Adding check for DRIVER_MODESET feature in notify_ring before calling
>          notify mmio flip.
>         -Other changes mostly in function arguments
> 
>     v6: -Having a separate function to check condition for using mmio flips (Ville)
>         -propagating error code from i915_gem_check_olr (Ville)
> 
>     v7: -Adding __must_check with i915_gem_check_olr (Chris)
>         -Renaming mmio_flip_data to mmio_flip (Chris)
>         -Rebasing on latest nightly
> 
>     v8: -Rebasing on latest code
>         -squash 3rd patch in series (mmio setbase vs page flip race) with this patch
>         -Added new tiling mode update in intel_do_mmio_flip (Chris)
> 
>     v9: -check for obj->last_write_seqno being 0 instead of obj->ring being NULL in
>     intel_postpone_flip, as this is a more restrictive condition (Chris)
> 
>     v10: -Applied Chris's suggestions for squashing patches 2,3 into this patch.
>     These patches make the selection of CS vs MMIO flip at the page flip time, and
>     make the module parameter for using mmio flips as tristate, the states being
>     'force CS flips', 'force mmio flips', 'driver discretion'.
>     Changed the logic for driver discretion (Chris)
> 
>     v11: Minor code cleanup (better readability, fixing whitespace errors, using
>     lockdep to check mutex locked status in postpone_flip, removal of __must_check
>     in function definition) (Chris)
> 
>     Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
>     Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_dma.c      |   1 +
>  drivers/gpu/drm/i915/i915_drv.h      |   8 ++
>  drivers/gpu/drm/i915/i915_gem.c      |   2 +-
>  drivers/gpu/drm/i915/i915_irq.c      |   3 +
>  drivers/gpu/drm/i915/i915_params.c   |   5 ++
>  drivers/gpu/drm/i915/intel_display.c | 148 ++++++++++++++++++++++++++++++++++-
>  drivers/gpu/drm/i915/intel_drv.h     |   6 ++
>  7 files changed, 171 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index 93c0e1a..681d736 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -1607,6 +1607,7 @@ int i915_driver_load(struct drm_device *dev, unsigned long flags)
>  	spin_lock_init(&dev_priv->backlight_lock);
>  	spin_lock_init(&dev_priv->uncore.lock);
>  	spin_lock_init(&dev_priv->mm.object_stat_lock);
> +	spin_lock_init(&dev_priv->mmio_flip_lock);
>  	mutex_init(&dev_priv->dpio_lock);
>  	mutex_init(&dev_priv->modeset_restore_lock);
>  
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index b62b342..f519b6c 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1377,6 +1377,9 @@ struct drm_i915_private {
>  	/* protects the irq masks */
>  	spinlock_t irq_lock;
>  
> +	/* protects the mmio flip data */
> +	spinlock_t mmio_flip_lock;
> +
>  	bool display_irqs_enabled;
>  
>  	/* To control wakeup latency, e.g. for irq-driven dp aux transfers. */
> @@ -2064,6 +2067,7 @@ struct i915_params {
>  	bool reset;
>  	bool disable_display;
>  	bool disable_vtd_wa;
> +	int use_mmio_flip;
>  };
>  extern struct i915_params i915 __read_mostly;
>  
> @@ -2274,6 +2278,8 @@ bool i915_gem_retire_requests(struct drm_device *dev);
>  void i915_gem_retire_requests_ring(struct intel_engine_cs *ring);
>  int __must_check i915_gem_check_wedge(struct i915_gpu_error *error,
>  				      bool interruptible);
> +int __must_check i915_gem_check_olr(struct intel_engine_cs *ring, u32 seqno);
> +
>  static inline bool i915_reset_in_progress(struct i915_gpu_error *error)
>  {
>  	return unlikely(atomic_read(&error->reset_counter)
> @@ -2649,6 +2655,8 @@ int i915_reg_read_ioctl(struct drm_device *dev, void *data,
>  int i915_get_reset_stats_ioctl(struct drm_device *dev, void *data,
>  			       struct drm_file *file);
>  
> +void intel_notify_mmio_flip(struct intel_engine_cs *ring);
> +
>  /* overlay */
>  extern struct intel_overlay_error_state *intel_overlay_capture_error_state(struct drm_device *dev);
>  extern void intel_overlay_print_error_state(struct drm_i915_error_state_buf *e,
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 86bfb8a..093af37 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1095,7 +1095,7 @@ i915_gem_check_wedge(struct i915_gpu_error *error,
>   * Compare seqno against outstanding lazy request. Emit a request if they are
>   * equal.
>   */
> -static int
> +int
>  i915_gem_check_olr(struct intel_engine_cs *ring, u32 seqno)
>  {
>  	int ret;
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index b0fa1ed..824d956 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -1218,6 +1218,9 @@ static void notify_ring(struct drm_device *dev,
>  
>  	trace_i915_gem_request_complete(ring);
>  
> +	if (drm_core_check_feature(dev, DRIVER_MODESET))
> +		intel_notify_mmio_flip(ring);
> +
>  	wake_up_all(&ring->irq_queue);
>  	i915_queue_hangcheck(dev);
>  }
> diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
> index b7455f8..6bca4b2 100644
> --- a/drivers/gpu/drm/i915/i915_params.c
> +++ b/drivers/gpu/drm/i915/i915_params.c
> @@ -49,6 +49,7 @@ struct i915_params i915 __read_mostly = {
>  	.disable_display = 0,
>  	.enable_cmd_parser = 1,
>  	.disable_vtd_wa = 0,
> +	.use_mmio_flip = 1,
>  };
>  
>  module_param_named(modeset, i915.modeset, int, 0400);
> @@ -162,3 +163,7 @@ MODULE_PARM_DESC(disable_vtd_wa, "Disable all VT-d workarounds (default: false)"
>  module_param_named(enable_cmd_parser, i915.enable_cmd_parser, int, 0600);
>  MODULE_PARM_DESC(enable_cmd_parser,
>  		 "Enable command parsing (1=enabled [default], 0=disabled)");
> +
> +module_param_named(use_mmio_flip, i915.use_mmio_flip, int, 0600);
> +MODULE_PARM_DESC(use_mmio_flip, "use MMIO flips (-1=never, 0=driver "
> +	"discretion, 1=always [default])");
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index b5cbb28..43fd4e7 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -9255,6 +9255,147 @@ static int intel_gen7_queue_flip(struct drm_device *dev,
>  	return 0;
>  }
>  
> +static bool use_mmio_flip(struct intel_engine_cs *ring,
> +			  struct drm_i915_gem_object *obj)
> +{
> +	/*
> +	 * This is not being used for older platforms, because
> +	 * non-availability of flip done interrupt forces us to use
> +	 * CS flips. Older platforms derive flip done using some clever
> +	 * tricks involving the flip_pending status bits and vblank irqs.
> +	 * So using MMIO flips there would disrupt this mechanism.
> +	 */
> +
> +	if (INTEL_INFO(ring->dev)->gen < 5)
> +		return false;
> +
> +	if (i915.use_mmio_flip < 0)
> +		return false;
> +	else if (i915.use_mmio_flip > 0)
> +		return true;
> +	else
> +		return ring != obj->ring;
> +}
> +
> +static void intel_do_mmio_flip(struct intel_crtc *intel_crtc)
> +{
> +	struct drm_device *dev = intel_crtc->base.dev;
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct intel_framebuffer *intel_fb =
> +		to_intel_framebuffer(intel_crtc->base.primary->fb);
> +	struct drm_i915_gem_object *obj = intel_fb->obj;
> +	u32 dspcntr;
> +	u32 reg;
> +
> +	intel_mark_page_flip_active(intel_crtc);
> +
> +	reg = DSPCNTR(intel_crtc->plane);
> +	dspcntr = I915_READ(reg);
> +
> +	if (INTEL_INFO(dev)->gen >= 4) {
> +		if (obj->tiling_mode != I915_TILING_NONE)
> +			dspcntr |= DISPPLANE_TILED;
> +		else
> +			dspcntr &= ~DISPPLANE_TILED;
> +	}
> +	I915_WRITE(reg, dspcntr);
> +
> +	I915_WRITE(DSPSURF(intel_crtc->plane),
> +			intel_crtc->unpin_work->gtt_offset);
> +	POSTING_READ(DSPSURF(intel_crtc->plane));
> +}
> +
> +static int intel_postpone_flip(struct drm_i915_gem_object *obj)
> +{
> +	struct intel_engine_cs *ring;
> +	int ret;
> +
> +	lockdep_assert_held(&obj->base.dev->struct_mutex);
> +
> +	if (!obj->last_write_seqno)
> +		return 0;
> +
> +	ring = obj->ring;
> +
> +	if (i915_seqno_passed(ring->get_seqno(ring, true),
> +				obj->last_write_seqno))
> +		return 0;
> +
> +	ret = i915_gem_check_olr(ring, obj->last_write_seqno);
> +	if (ret)
> +		return ret;
> +
> +	if (WARN_ON(!ring->irq_get(ring)))
> +		return 0;
> +
> +	return 1;
> +}
> +
> +void intel_notify_mmio_flip(struct intel_engine_cs *ring)
> +{
> +	struct drm_i915_private *dev_priv = to_i915(ring->dev);
> +	struct intel_crtc *intel_crtc;
> +	unsigned long irq_flags;
> +	u32 seqno;
> +
> +	seqno = ring->get_seqno(ring, false);
> +
> +	spin_lock_irqsave(&dev_priv->mmio_flip_lock, irq_flags);
> +	for_each_intel_crtc(ring->dev, intel_crtc) {
> +		struct intel_mmio_flip *mmio_flip;
> +
> +		mmio_flip = &intel_crtc->mmio_flip;
> +		if (mmio_flip->seqno == 0)
> +			continue;
> +
> +		if (ring->id != mmio_flip->ring_id)
> +			continue;
> +
> +		if (i915_seqno_passed(seqno, mmio_flip->seqno)) {
> +			intel_do_mmio_flip(intel_crtc);
> +			mmio_flip->seqno = 0;
> +			ring->irq_put(ring);
> +		}
> +	}
> +	spin_unlock_irqrestore(&dev_priv->mmio_flip_lock, irq_flags);
> +}
> +
> +static int intel_queue_mmio_flip(struct drm_device *dev,
> +		struct drm_crtc *crtc,
> +		struct drm_framebuffer *fb,
> +		struct drm_i915_gem_object *obj,
> +		struct intel_engine_cs *ring,
> +		uint32_t flags)
> +{
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
> +	unsigned long irq_flags;
> +	int ret;
> +
> +	if (WARN_ON(intel_crtc->mmio_flip.seqno))
> +		return -EBUSY;
> +
> +	ret = intel_postpone_flip(obj);
> +	if (ret < 0)
> +		return ret;
> +	if (ret == 0) {
> +		intel_do_mmio_flip(intel_crtc);
> +		return 0;
> +	}
> +
> +	spin_lock_irqsave(&dev_priv->mmio_flip_lock, irq_flags);
> +	intel_crtc->mmio_flip.seqno = obj->last_write_seqno;
> +	intel_crtc->mmio_flip.ring_id = obj->ring->id;
> +	spin_unlock_irqrestore(&dev_priv->mmio_flip_lock, irq_flags);
> +
> +	/*
> +	 * Double check to catch cases where irq fired before
> +	 * mmio flip data was ready
> +	 */
> +	intel_notify_mmio_flip(obj->ring);
> +	return 0;
> +}
> +
>  static int intel_default_queue_flip(struct drm_device *dev,
>  				    struct drm_crtc *crtc,
>  				    struct drm_framebuffer *fb,
> @@ -9362,7 +9503,12 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
>  	work->gtt_offset =
>  		i915_gem_obj_ggtt_offset(obj) + intel_crtc->dspaddr_offset;
>  
> -	ret = dev_priv->display.queue_flip(dev, crtc, fb, obj, ring, page_flip_flags);
> +	if (use_mmio_flip(ring, obj))
> +		ret = intel_queue_mmio_flip(dev, crtc, fb, obj, ring,
> +				page_flip_flags);
> +	else
> +		ret = dev_priv->display.queue_flip(dev, crtc, fb, obj, ring,
> +				page_flip_flags);
>  	if (ret)
>  		goto cleanup_unpin;
>  
> diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> index 78d4124..b38e88d 100644
> --- a/drivers/gpu/drm/i915/intel_drv.h
> +++ b/drivers/gpu/drm/i915/intel_drv.h
> @@ -358,6 +358,11 @@ struct intel_pipe_wm {
>  	bool sprites_scaled;
>  };
>  
> +struct intel_mmio_flip {
> +	u32 seqno;
> +	u32 ring_id;
> +};
> +
>  struct intel_crtc {
>  	struct drm_crtc base;
>  	enum pipe pipe;
> @@ -412,6 +417,7 @@ struct intel_crtc {
>  	wait_queue_head_t vbl_wait;
>  
>  	int scanline_offset;
> +	struct intel_mmio_flip mmio_flip;
>  };
>  
>  struct intel_plane_wm_parameters {
> -- 
> 1.9.0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 00/53] Execlists v3
  2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
                   ` (52 preceding siblings ...)
  2014-06-13 15:38 ` [PATCH 53/53] !UPSTREAM: drm/i915: Use MMIO flips oscar.mateo
@ 2014-06-18 21:26 ` Daniel Vetter
  53 siblings, 0 replies; 156+ messages in thread
From: Daniel Vetter @ 2014-06-18 21:26 UTC (permalink / raw)
  To: oscar.mateo; +Cc: intel-gfx

On Fri, Jun 13, 2014 at 04:37:18PM +0100, oscar.mateo@intel.com wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> For a description of this patchset, please check the previous cover letters: [1] and [2].
> 
> The main difference with v2 is in how we get to the point of context submission: this time
> around, instead of massaging the legacy ringbuffer submission functions (mostly located
> in intel_ringbuffer.c), I have effectively created a separate path for Execlists submission
> in intel_lrc.c (even though everybody knows you shouldn't mess with split timelines). The
> alternative path is mostly a clone of the previous one, but the idea is that it will differ
> significantly in the future (so, in exchange for duplicated code, we gain the ability to perform
> big changes without breaking legacy hardware support). This change was a suggestion by
> Daniel Vetter [3].
> 
> I know many patches here will be very controversial, so I would appreciate early feedback
> in the direction this effort is taking.

Ok, I've read through this and I like what I've seen. Some detail polish,
but we can do that later on on top. The only really serious issue is the
add_request vfunc, I'd prefer if we can remove that. Alternatively I'd
like to really understand why it's needed ...

Damien and Brad are signed up to shepherd the series through review. I'd
prefer if we fan that out a bit to the wider global team for better
knowledge osmosis (i.e. pls look for reviewers outside of Oscar's team).

One thing I've noticed though is how you've structured your patch series,
so I'll drop my guidelines here for the next big thing. Jesse might want
to include this into his "big patch series" best practices documentation.

1. Build new code outside-in
----------------------------

The major thing I've noticed is that very often you add something in a
patch which is only used later on, e.g. the entire lrc submission path is
created from the endpoints over almost 20 patches until it's finally
plugged into the execbuf submission path.

My puny brain is by far not good enough to hold that much context, I'll
give up after 1 patch. A better approach is to build up code from the
bottom-up. So in this case:
- Start with extracting the execbuf submit vfunc for legacy code.
- Fill in a bare-bones lrc submission function with mostly TODO comments
  (see the sketch right after this list).
- Fill them out step-by-step by recursing down until you hit the
  low-level ring flush/invalidate/foo/bar functions copy-pasted from
  intel_ringbuffer.c.
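
For illustration, the bare-bones function from the second step could look
roughly like this (a hypothetical sketch, the name and shape are made up):

	static int intel_execlists_submit(struct intel_engine_cs *ring,
					  struct drm_i915_gem_object *batch_obj)
	{
		/* TODO: flush caches / invalidate TLBs */
		/* TODO: emit MI_BATCH_BUFFER_START into the context's ringbuffer */
		/* TODO: add a request and write the new tail */
		return -ENODEV;	/* not wired up yet */
	}

Each TODO then gets filled out in its own patch, recursing down until you
hit the low-level copies from intel_ringbuffer.c.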

Another example would be a new ioctl: Start with a dummy ioctl then fill
it out. The key is to put every new piece of code you add to immediate
use, so that your reviewer can look at it, review it and then forget about
it.

3rd example (much smaller) is the frontbuffer tracking I've just done:
First I added new state with self-checks to make sure it's consistent.
This means the code is functional (so can be reviewed), but the state it
tracks isn't used yet. Then I added events for frontbuffer invalidation,
flushing and flipping, explaining in the cover letter and kerneldoc
comments the semantics I'm aiming for. Again reviewable as a functional,
stand-alone thing. Then I've rewired the psr code to be called from the
new code. At each step something working&self-consistent was added, which
means the details can be forgotten again when moving to the next chunk.

I know that it's tempting to do magic and spring the rabbit in the very
last patch onto your unsuspecting audience. But we try to explain the
tricks here, not showcase them ;-)

2. Always add users for new stuff in the same patch
---------------------------------------------------

Same idea holds at a smaller scale for individual patches: Don't extract
code without using the new functions - otherwise the reviewer has to have
2 patches open (since most don't even apply your patches, so can't check
what the code does when you've changed it).

The only exception where it's ok to split new code from its first user is
for new register #defines. Reviewing that with Bspec is a task in itself
and many people prefer to have that split out from the actual code. But
besides that try hard to avoid introducing anything that isn't used right
away in the same patch.

One image I have for a complete patch is a small math proof: You start
with the motivation and overall idea (commit message), then go through a
bunch of technicalities in lemmas (functions) and finally stitch it
together in the big theorem (i.e. the function that uses all the new
code). Especially when adding completely new code and as long as you order
functions to avoid forward declarations this is the natural order in the
resulting diff.

3. Split big patch series into sub-parts
----------------------------------------

Books have chapters and patch series should be split into parts. The big
reason for that is that if you don't you need 1 reviewer to look at
everything. Which doesn't scale and means review will take months. So
split it into different topics which should be as independent as possible.
For this series here we'd have:
- Rework of the context allocation/freeing and associated tracking.
- lrc init/teardown code in general.
- Reworking of execbuf submission.
- lrc submission implementation into the list.
- Handling interrupts and actual low-level submission.

It's hard to tell for sure with all the forward references you have but my
impression from a single read-through is that all these different topics
are smeared around a bit. Which means when the review is split there's a
lot of overlap, which slows things down.

So when designing the final series try to keep this in mind, potentially
reworking the patch split and ordering a bit. The sweet spot for such a
chapter is ime around 10 patches. That usually translates into a
reasonable review load of a few hours, which can be squeezed into ongoing
tasks without too much trouble. If you have bigger chunks it becomes much
harder to find available reviewers.

For submission you can either send chapters as individual threads or
explain the different parts in the cover letter with a paragraph or so each
(instead of the per-chapter cover letter).

Anyway all very fluffy, but we need to start collecting these best
practices somewhere.

Comments and approaches from other people on our team are highly welcome.

Cheers, Daniel

> 
> The previous IGT test [4] still applies.
> 
> [1]
> http://lists.freedesktop.org/archives/intel-gfx/2014-March/042563.html
> [2]
> http://lists.freedesktop.org/archives/intel-gfx/2014-May/044847.html
> [3]
> http://lists.freedesktop.org/archives/intel-gfx/2014-May/045139.html
> [4]
> http://lists.freedesktop.org/archives/intel-gfx/2014-May/044846.html
> 
> Ben Widawsky (2):
>   drm/i915/bdw: Implement context switching (somewhat)
>   drm/i915/bdw: Print context state in debugfs
> 
> Michel Thierry (1):
>   drm/i915/bdw: Two-stage execlist submit process
> 
> Oscar Mateo (48):
>   drm/i915: Extract context backing object allocation
>   drm/i915: Rename ctx->obj to ctx->render_obj
>   drm/i915: Add a dev pointer to the context
>   drm/i915: Extract ringbuffer destroy & make alloc outside accesible
>   drm/i915: Move i915_gem_validate_context() to i915_gem_context.c
>   drm/i915/bdw: Introduce one context backing object per engine
>   drm/i915/bdw: New file for Logical Ring Contexts and Execlists
>   drm/i915/bdw: Macro for LRCs and module option for Execlists
>   drm/i915/bdw: Initialization for Logical Ring Contexts
>   drm/i915/bdw: A bit more advanced context init/fini
>   drm/i915/bdw: Allocate ringbuffers for Logical Ring Contexts
>   drm/i915/bdw: Populate LR contexts (somewhat)
>   drm/i915/bdw: Deferred creation of user-created LRCs
>   drm/i915/bdw: Render moot context reset and switch when LRCs are
>     enabled
>   drm/i915/bdw: Don't write PDP in the legacy way when using LRCs
>   drm/i915/bdw: Skeleton for the new logical rings submission path
>   drm/i915/bdw: Generic logical ring init and cleanup
>   drm/i915/bdw: New header file for LRs, LRCs and Execlists
>   drm/i915: Extract pipe control fini & make init outside accesible
>   drm/i915/bdw: GEN-specific logical ring init
>   drm/i915/bdw: GEN-specific logical ring set/get seqno
>   drm/i915: Make ring_space more generic and outside accesible
>   drm/i915: Generalize intel_ring_get_tail
>   drm/i915: Make intel_ring_stopped outside accesible
>   drm/i915/bdw: GEN-specific logical ring submit context (somewhat)
>   drm/i915/bdw: New logical ring submission mechanism
>   drm/i915/bdw: GEN-specific logical ring emit request
>   drm/i915/bdw: GEN-specific logical ring emit flush
>   drm/i915/bdw: Emission of requests with logical rings
>   drm/i915/bdw: Ring idle and stop with logical rings
>   drm/i915/bdw: Interrupts with logical rings
>   drm/i915/bdw: GEN-specific logical ring emit batchbuffer start
>   drm/i915: Extract the actual workload submission mechanism from
>     execbuffer
>   drm/i915: Make move_to_active and retire_commands outside accesible
>   drm/i915/bdw: Workload submission mechanism for Execlists
>   drm/i915: Abstract the workload submission mechanism away
>   drm/i915/bdw: Write the tail pointer, LRC style
>   drm/i915/bdw: Avoid non-lite-restore preemptions
>   drm/i915/bdw: Make sure gpu reset still works with Execlists
>   drm/i915/bdw: Make sure error capture keeps working with Execlists
>   drm/i915/bdw: Help out the ctx switch interrupt handler
>   drm/i915/bdw: Do not call intel_runtime_pm_get() in an interrupt
>   drm/i915/bdw: Display execlists info in debugfs
>   drm/i915/bdw: Display context backing obj & ringbuffer info in debugfs
>   drm/i915: Extract render state preparation
>   drm/i915/bdw: Render state init for Execlists
>   drm/i915/bdw: Document Logical Rings, LR contexts and Execlists
>   drm/i915/bdw: Enable logical ring contexts
> 
> Sourab Gupta (1):
>   !UPSTREAM: drm/i915: Use MMIO flips
> 
> Thomas Daniel (1):
>   drm/i915/bdw: Handle context switch events
> 
>  drivers/gpu/drm/i915/Makefile                |    1 +
>  drivers/gpu/drm/i915/i915_debugfs.c          |  150 +-
>  drivers/gpu/drm/i915/i915_dma.c              |    1 +
>  drivers/gpu/drm/i915/i915_drv.h              |   60 +-
>  drivers/gpu/drm/i915/i915_gem.c              |   70 +-
>  drivers/gpu/drm/i915/i915_gem_context.c      |  242 +++-
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c   |  328 ++---
>  drivers/gpu/drm/i915/i915_gem_gtt.c          |    5 +
>  drivers/gpu/drm/i915/i915_gem_render_state.c |   39 +-
>  drivers/gpu/drm/i915/i915_gpu_error.c        |   16 +-
>  drivers/gpu/drm/i915/i915_irq.c              |   53 +-
>  drivers/gpu/drm/i915/i915_params.c           |   11 +
>  drivers/gpu/drm/i915/i915_reg.h              |    5 +
>  drivers/gpu/drm/i915/intel_display.c         |  148 +-
>  drivers/gpu/drm/i915/intel_drv.h             |    6 +
>  drivers/gpu/drm/i915/intel_lrc.c             | 1902 ++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/intel_lrc.h             |   99 ++
>  drivers/gpu/drm/i915/intel_renderstate.h     |   13 +
>  drivers/gpu/drm/i915/intel_ringbuffer.c      |  101 +-
>  drivers/gpu/drm/i915/intel_ringbuffer.h      |   53 +-
>  20 files changed, 2974 insertions(+), 329 deletions(-)
>  create mode 100644 drivers/gpu/drm/i915/intel_lrc.c
>  create mode 100644 drivers/gpu/drm/i915/intel_lrc.h
> 
> -- 
> 1.9.0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 04/53] drm/i915: Extract ringbuffer destroy & make alloc outside accesible
  2014-06-13 15:37 ` [PATCH 04/53] drm/i915: Extract ringbuffer destroy & make alloc outside accesible oscar.mateo
@ 2014-06-18 21:39   ` Volkin, Bradley D
  2014-06-19 10:42     ` Mateo Lozano, Oscar
  0 siblings, 1 reply; 156+ messages in thread
From: Volkin, Bradley D @ 2014-06-18 21:39 UTC (permalink / raw)
  To: oscar.mateo; +Cc: intel-gfx

On Fri, Jun 13, 2014 at 08:37:22AM -0700, oscar.mateo@intel.com wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> We are going to start creating a lot of extra ringbuffers soon, so
> these functions are handy.
> 
> No functional changes.
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 26 ++++++++++++++++----------
>  drivers/gpu/drm/i915/intel_ringbuffer.h |  4 ++++
>  2 files changed, 20 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 279488a..915f3d5 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1378,15 +1378,25 @@ static int init_phys_status_page(struct intel_engine_cs *ring)
>  	return 0;
>  }
>  
> -static int allocate_ring_buffer(struct intel_engine_cs *ring)
> +void intel_destroy_ring_buffer(struct intel_ringbuffer *ringbuf)
> +{
> +	if (!ringbuf->obj)
> +		return;
> +
> +	iounmap(ringbuf->virtual_start);
> +	i915_gem_object_ggtt_unpin(ringbuf->obj);
> +	drm_gem_object_unreference(&ringbuf->obj->base);
> +	ringbuf->obj = NULL;
> +}
> +
> +int intel_allocate_ring_buffer(struct drm_device *dev,
> +			       struct intel_ringbuffer *ringbuf)

A bikeshed, but maybe intel_alloc_ringbuffer_obj() since we're only
allocating the backing object, and to mirror the earlier
i915_gem_alloc_context_obj() with similar purpose. Otherwise, looks
fine to me.
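
The renamed declaration would then read (sketch):

	int intel_alloc_ringbuffer_obj(struct drm_device *dev,
				       struct intel_ringbuffer *ringbuf);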

Brad

>  {
> -	struct drm_device *dev = ring->dev;
>  	struct drm_i915_private *dev_priv = to_i915(dev);
> -	struct intel_ringbuffer *ringbuf = ring->buffer;
>  	struct drm_i915_gem_object *obj;
>  	int ret;
>  
> -	if (intel_ring_initialized(ring))
> +	if (ringbuf->obj)
>  		return 0;
>  
>  	obj = NULL;
> @@ -1455,7 +1465,7 @@ static int intel_init_ring_buffer(struct drm_device *dev,
>  			goto error;
>  	}
>  
> -	ret = allocate_ring_buffer(ring);
> +	ret = intel_allocate_ring_buffer(dev, ringbuf);
>  	if (ret) {
>  		DRM_ERROR("Failed to allocate ringbuffer %s: %d\n", ring->name, ret);
>  		goto error;
> @@ -1496,11 +1506,7 @@ void intel_cleanup_ring_buffer(struct intel_engine_cs *ring)
>  	intel_stop_ring_buffer(ring);
>  	WARN_ON(!IS_GEN2(ring->dev) && (I915_READ_MODE(ring) & MODE_IDLE) == 0);
>  
> -	iounmap(ringbuf->virtual_start);
> -
> -	i915_gem_object_ggtt_unpin(ringbuf->obj);
> -	drm_gem_object_unreference(&ringbuf->obj->base);
> -	ringbuf->obj = NULL;
> +	intel_destroy_ring_buffer(ringbuf);
>  	ring->preallocated_lazy_request = NULL;
>  	ring->outstanding_lazy_seqno = 0;
>  
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 910c83c..dee5b37 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -318,6 +318,10 @@ int intel_init_vebox_ring_buffer(struct drm_device *dev);
>  u64 intel_ring_get_active_head(struct intel_engine_cs *ring);
>  void intel_ring_setup_status_page(struct intel_engine_cs *ring);
>  
> +void intel_destroy_ring_buffer(struct intel_ringbuffer *ringbuf);
> +int intel_allocate_ring_buffer(struct drm_device *dev,
> +			       struct intel_ringbuffer *ringbuf);
> +
>  static inline u32 intel_ring_get_tail(struct intel_engine_cs *ring)
>  {
>  	return ring->buffer->tail;
> -- 
> 1.9.0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 10/53] drm/i915/bdw: A bit more advanced context init/fini
  2014-06-13 15:37 ` [PATCH 10/53] drm/i915/bdw: A bit more advanced context init/fini oscar.mateo
@ 2014-06-18 22:13   ` Volkin, Bradley D
  2014-06-19  6:13     ` Daniel Vetter
  0 siblings, 1 reply; 156+ messages in thread
From: Volkin, Bradley D @ 2014-06-18 22:13 UTC (permalink / raw)
  To: oscar.mateo; +Cc: intel-gfx

On Fri, Jun 13, 2014 at 08:37:28AM -0700, oscar.mateo@intel.com wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> There are a few big differences between context init and fini with the
> previous implementation of hardware contexts. One of them is
> demonstrated in this patch: we must allocate a ctx backing object for
> each engine.
> 
> Regarding the context size, reading the register to calculate the sizes
> can work, I think, however the docs are very clear about the actual
> context sizes on GEN8, so just hardcode that and use it.
> 
> v2: Rebased on top of the Full PPGTT series. It is important to notice
> that at this point we have one global default context per engine, all
> of them using the aliasing PPGTT (as opposed to the single global
> default context we have with legacy HW contexts).
> 
> v3:
> - Go back to one single global default context, this time with multiple
>   backing objects inside.
> - Use different context sizes for non-render engines, as suggested by
>   Damien (still hardcoded, since the information about the context size
>   registers in the BSpec is, well, *lacking*).
> - Render ctx size is 20 (or 19) pages, but not 21 (caught by Damien).
> - Move default context backing object creation to intel_init_ring (so
>   that we don't waste memory in rings that might not get initialized).
> 
> v4:
> - Reuse the HW legacy context init/fini.
> - Create a separate free function.
> - Rename the functions with an intel_ prefix.
> 
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h         |  3 ++
>  drivers/gpu/drm/i915/i915_gem_context.c | 19 +++++++--
>  drivers/gpu/drm/i915/intel_lrc.c        | 70 +++++++++++++++++++++++++++++++++
>  3 files changed, 88 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index dac0db1..347308e 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2425,6 +2425,9 @@ i915_gem_context_validate(struct drm_device *dev, struct drm_file *file,
>  
>  /* intel_lrc.c */
>  bool intel_enable_execlists(struct drm_device *dev);
> +void intel_lr_context_free(struct intel_context *ctx);
> +int intel_lr_context_deferred_create(struct intel_context *ctx,
> +				     struct intel_engine_cs *ring);
>  
>  /* i915_gem_render_state.c */
>  int i915_gem_render_state_init(struct intel_engine_cs *ring);
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 3f3fb36..1fb4592 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -180,8 +180,11 @@ void i915_gem_context_free(struct kref *ctx_ref)
>  {
>  	struct intel_context *ctx = container_of(ctx_ref, typeof(*ctx), ref);
>  	struct drm_device *dev = ctx->dev;
> +	struct drm_i915_private *dev_priv = dev->dev_private;
>  
> -	if (ctx->render_obj) {
> +	if (dev_priv->lrc_enabled)
> +		intel_lr_context_free(ctx);
> +	else if (ctx->render_obj) {
>  		/* XXX: Free up the object before tearing down the address space, in
>  		 * case we're bound in the PPGTT */

Does this comment apply to both cases?

>  		drm_gem_object_unreference(&ctx->render_obj->base);
> @@ -438,9 +441,17 @@ int i915_gem_context_init(struct drm_device *dev)
>  		return PTR_ERR(ctx);
>  	}
>  
> -	/* NB: RCS will hold a ref for all rings */
> -	for (i = 0; i < I915_NUM_RINGS; i++)
> -		dev_priv->ring[i].default_context = ctx;
> +	for (i = 0; i < I915_NUM_RINGS; i++) {
> +		struct intel_engine_cs *ring = &dev_priv->ring[i];
> +
> +		/* NB: RCS will hold a ref for all rings */
> +		ring->default_context = ctx;
> +
> +		/* FIXME: we only want to do this for initialized rings, but for that
> +		 * we first need the new logical ring stuff */
> +		if (dev_priv->lrc_enabled)
> +			intel_lr_context_deferred_create(ctx, ring);
> +	}
>  
>  	DRM_DEBUG_DRIVER("%s context support initialized\n",
>  			dev_priv->lrc_enabled ? "LR" :
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 58cead1..952212f 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -41,6 +41,11 @@
>  #include <drm/i915_drm.h>
>  #include "i915_drv.h"
>  
> +#define GEN8_LR_CONTEXT_RENDER_SIZE (20 * PAGE_SIZE)
> +#define GEN8_LR_CONTEXT_OTHER_SIZE (2 * PAGE_SIZE)
> +
> +#define GEN8_LR_CONTEXT_ALIGN 4096
> +
>  bool intel_enable_execlists(struct drm_device *dev)
>  {
>  	if (!i915.enable_execlists)
> @@ -48,3 +53,68 @@ bool intel_enable_execlists(struct drm_device *dev)
>  
>  	return HAS_LOGICAL_RING_CONTEXTS(dev) && USES_PPGTT(dev);
>  }
> +
> +void intel_lr_context_free(struct intel_context *ctx)
> +{
> +	int i;
> +
> +	for (i = 0; i < I915_NUM_RINGS; i++) {
> +		struct drm_i915_gem_object *ctx_obj = ctx->engine[i].obj;
> +		if (ctx_obj) {
> +			i915_gem_object_ggtt_unpin(ctx_obj);

I suspect that leaving the backing objects pinned in ggtt for their entire
lifetimes is going to eventually cause memory related issues. We might need
to look at managing the binding more dynamically - similar to what the
legacy context code already does.

Brad

> +			drm_gem_object_unreference(&ctx_obj->base);
> +		}
> +	}
> +}
> +
> +static uint32_t get_lr_context_size(struct intel_engine_cs *ring)
> +{
> +	int ret = 0;
> +
> +	WARN_ON(INTEL_INFO(ring->dev)->gen != 8);
> +
> +	switch (ring->id) {
> +	case RCS:
> +		ret = GEN8_LR_CONTEXT_RENDER_SIZE;
> +		break;
> +	case VCS:
> +	case BCS:
> +	case VECS:
> +	case VCS2:
> +		ret = GEN8_LR_CONTEXT_OTHER_SIZE;
> +		break;
> +	}
> +
> +	return ret;
> +}
> +
> +int intel_lr_context_deferred_create(struct intel_context *ctx,
> +				     struct intel_engine_cs *ring)
> +{
> +	struct drm_device *dev = ring->dev;
> +	struct drm_i915_gem_object *ctx_obj;
> +	uint32_t context_size;
> +	int ret;
> +
> +	WARN_ON(ctx->render_obj != NULL);
> +
> +	context_size = round_up(get_lr_context_size(ring), 4096);
> +
> +	ctx_obj = i915_gem_alloc_context_obj(dev, context_size);
> +	if (IS_ERR(ctx_obj)) {
> +		ret = PTR_ERR(ctx_obj);
> +		DRM_DEBUG_DRIVER("Alloc LRC backing obj failed: %d\n", ret);
> +		return ret;
> +	}
> +
> +	ret = i915_gem_obj_ggtt_pin(ctx_obj, GEN8_LR_CONTEXT_ALIGN, 0);
> +	if (ret) {
> +		DRM_DEBUG_DRIVER("Pin LRC backing obj failed: %d\n", ret);
> +		drm_gem_object_unreference(&ctx_obj->base);
> +		return ret;
> +	}
> +
> +	ctx->engine[ring->id].obj = ctx_obj;
> +
> +	return 0;
> +}
> -- 
> 1.9.0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 11/53] drm/i915/bdw: Allocate ringbuffers for Logical Ring Contexts
  2014-06-13 15:37 ` [PATCH 11/53] drm/i915/bdw: Allocate ringbuffers for Logical Ring Contexts oscar.mateo
@ 2014-06-18 22:19   ` Volkin, Bradley D
  2014-06-23 12:07     ` Mateo Lozano, Oscar
  0 siblings, 1 reply; 156+ messages in thread
From: Volkin, Bradley D @ 2014-06-18 22:19 UTC (permalink / raw)
  To: oscar.mateo; +Cc: intel-gfx

On Fri, Jun 13, 2014 at 08:37:29AM -0700, oscar.mateo@intel.com wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> As we have said a couple of times by now, logical ring contexts have
> their own ringbuffers: not only the backing pages, but the whole
> management struct.
> 
> In a previous version of the series, this was achieved with two separate
> patches:
> drm/i915/bdw: Allocate ringbuffer backing objects for default global LRC
> drm/i915/bdw: Allocate ringbuffer for user-created LRCs
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h  |  1 +
>  drivers/gpu/drm/i915/intel_lrc.c | 38 ++++++++++++++++++++++++++++++++++++++
>  2 files changed, 39 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 347308e..79799d8 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -599,6 +599,7 @@ struct intel_context {
>  	/* Execlists */
>  	struct {
>  		struct drm_i915_gem_object *obj;
> +		struct intel_ringbuffer *ringbuf;
>  	} engine[I915_NUM_RINGS];
>  
>  	struct i915_ctx_hang_stats hang_stats;
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 952212f..b3a23e0 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -60,7 +60,11 @@ void intel_lr_context_free(struct intel_context *ctx)
>  
>  	for (i = 0; i < I915_NUM_RINGS; i++) {
>  		struct drm_i915_gem_object *ctx_obj = ctx->engine[i].obj;
> +		struct intel_ringbuffer *ringbuf = ctx->engine[i].ringbuf;
> +
>  		if (ctx_obj) {
> +			intel_destroy_ring_buffer(ringbuf);
> +			kfree(ringbuf);
>  			i915_gem_object_ggtt_unpin(ctx_obj);
>  			drm_gem_object_unreference(&ctx_obj->base);
>  		}
> @@ -94,6 +98,7 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
>  	struct drm_device *dev = ring->dev;
>  	struct drm_i915_gem_object *ctx_obj;
>  	uint32_t context_size;
> +	struct intel_ringbuffer *ringbuf;
>  	int ret;
>  
>  	WARN_ON(ctx->render_obj != NULL);
> @@ -114,6 +119,39 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
>  		return ret;
>  	}
>  
> +	ringbuf = kzalloc(sizeof(*ringbuf), GFP_KERNEL);
> +	if (!ringbuf) {
> +		DRM_DEBUG_DRIVER("Failed to allocate ringbuffer %s\n",
> +				ring->name);
> +		i915_gem_object_ggtt_unpin(ctx_obj);
> +		drm_gem_object_unreference(&ctx_obj->base);
> +		ret = -ENOMEM;
> +		return ret;
> +	}
> +
> +	ringbuf->size = 32 * PAGE_SIZE;
> +	ringbuf->effective_size = ringbuf->size;
> +	ringbuf->head = 0;
> +	ringbuf->tail = 0;
> +	ringbuf->space = ringbuf->size;
> +	ringbuf->last_retired_head = -1;
> +
> +	/* TODO: For now we put this in the mappable region so that we can reuse
> +	 * the existing ringbuffer code which ioremaps it. When we start
> +	 * creating many contexts, this will no longer work and we must switch
> +	 * to a kmapish interface.
> +	 */

It looks like this comment still exists at the end of the series. Does it
still apply or did we find that this is not an issue?

Brad

> +	ret = intel_allocate_ring_buffer(dev, ringbuf);
> +	if (ret) {
> +		DRM_DEBUG_DRIVER("Failed to allocate ringbuffer obj %s: %d\n",
> +				ring->name, ret);
> +		kfree(ringbuf);
> +		i915_gem_object_ggtt_unpin(ctx_obj);
> +		drm_gem_object_unreference(&ctx_obj->base);
> +		return ret;
> +	}
> +
> +	ctx->engine[ring->id].ringbuf = ringbuf;
>  	ctx->engine[ring->id].obj = ctx_obj;
>  
>  	return 0;
> -- 
> 1.9.0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 12/53] drm/i915/bdw: Populate LR contexts (somewhat)
  2014-06-13 15:37 ` [PATCH 12/53] drm/i915/bdw: Populate LR contexts (somewhat) oscar.mateo
@ 2014-06-18 23:24   ` Volkin, Bradley D
  2014-06-23 12:42     ` Mateo Lozano, Oscar
  0 siblings, 1 reply; 156+ messages in thread
From: Volkin, Bradley D @ 2014-06-18 23:24 UTC (permalink / raw)
  To: oscar.mateo; +Cc: intel-gfx

On Fri, Jun 13, 2014 at 08:37:30AM -0700, oscar.mateo@intel.com wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> For the most part, logical ring context objects are similar to hardware
> contexts in that the backing object is meant to be opaque. There are
> some exceptions where we need to poke certain offsets of the object for
> initialization, updating the tail pointer or updating the PDPs.
> 
> For our basic execlist implementation we'll only need our PPGTT PDs,
> and ringbuffer addresses in order to set up the context. With previous
> patches, we have both, so start prepping the context to be loaded.
> 
> Before running a context for the first time you must populate some
> fields in the context object. These fields begin at 1 PAGE + LRCA, i.e. the
> first page (in 0-based counting) of the context image. These same
> fields will be read and written to as contexts are saved and restored
> once the system is up and running.
> 
> Many of these fields are completely reused from previous global
> registers: ringbuffer head/tail/control, context control matches some
> previous MI_SET_CONTEXT flags, and page directories. There are other
> fields which we don't touch which we may want in the future.
> 
> v2: CTX_LRI_HEADER_0 is MI_LOAD_REGISTER_IMM(14) for render and (11)
> for other engines.
> 
> v3: Several rebases and general changes to the code.
> 
> v4: Squash with "Extract LR context object populating"
> Also, Damien's review comments:
> - Set the Force Posted bit on the LRI header, as the BSpec suggest we do.
> - Prevent warning when compiling a 32-bits kernel without HIGHMEM64.
> - Add a clarifying comment to the context population code.
> 
> v5: Damien's review comments:
> - The third MI_LOAD_REGISTER_IMM in the context does not set Force Posted.
> - Remove dead code.
> 
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
> Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com> (v2)
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v3-5)
> ---
>  drivers/gpu/drm/i915/i915_reg.h  |   1 +
>  drivers/gpu/drm/i915/intel_lrc.c | 154 ++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 151 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index 286f05c..9c8692a 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -277,6 +277,7 @@
>   *   address/value pairs. Don't overdue it, though, x <= 2^4 must hold!
>   */
>  #define MI_LOAD_REGISTER_IMM(x)	MI_INSTR(0x22, 2*(x)-1)
> +#define   MI_LRI_FORCE_POSTED		(1<<12)
>  #define MI_STORE_REGISTER_MEM(x) MI_INSTR(0x24, 2*(x)-1)
>  #define MI_STORE_REGISTER_MEM_GEN8(x) MI_INSTR(0x24, 3*(x)-1)
>  #define   MI_SRM_LRM_GLOBAL_GTT		(1<<22)
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index b3a23e0..b96bb45 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -46,6 +46,38 @@
>  
>  #define GEN8_LR_CONTEXT_ALIGN 4096
>  
> +#define RING_ELSP(ring)			((ring)->mmio_base+0x230)
> +#define RING_CONTEXT_CONTROL(ring)	((ring)->mmio_base+0x244)
> +
> +#define CTX_LRI_HEADER_0		0x01
> +#define CTX_CONTEXT_CONTROL		0x02
> +#define CTX_RING_HEAD			0x04
> +#define CTX_RING_TAIL			0x06
> +#define CTX_RING_BUFFER_START		0x08
> +#define CTX_RING_BUFFER_CONTROL		0x0a
> +#define CTX_BB_HEAD_U			0x0c
> +#define CTX_BB_HEAD_L			0x0e
> +#define CTX_BB_STATE			0x10
> +#define CTX_SECOND_BB_HEAD_U		0x12
> +#define CTX_SECOND_BB_HEAD_L		0x14
> +#define CTX_SECOND_BB_STATE		0x16
> +#define CTX_BB_PER_CTX_PTR		0x18
> +#define CTX_RCS_INDIRECT_CTX		0x1a
> +#define CTX_RCS_INDIRECT_CTX_OFFSET	0x1c
> +#define CTX_LRI_HEADER_1		0x21
> +#define CTX_CTX_TIMESTAMP		0x22
> +#define CTX_PDP3_UDW			0x24
> +#define CTX_PDP3_LDW			0x26
> +#define CTX_PDP2_UDW			0x28
> +#define CTX_PDP2_LDW			0x2a
> +#define CTX_PDP1_UDW			0x2c
> +#define CTX_PDP1_LDW			0x2e
> +#define CTX_PDP0_UDW			0x30
> +#define CTX_PDP0_LDW			0x32
> +#define CTX_LRI_HEADER_2		0x41
> +#define CTX_R_PWR_CLK_STATE		0x42
> +#define CTX_GPGPU_CSR_BASE_ADDRESS	0x44
> +
>  bool intel_enable_execlists(struct drm_device *dev)
>  {
>  	if (!i915.enable_execlists)
> @@ -54,6 +86,110 @@ bool intel_enable_execlists(struct drm_device *dev)
>  	return HAS_LOGICAL_RING_CONTEXTS(dev) && USES_PPGTT(dev);
>  }
>  
> +static int
> +populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_obj,
> +		    struct intel_engine_cs *ring, struct drm_i915_gem_object *ring_obj)
> +{
> +	struct i915_hw_ppgtt *ppgtt = ctx_to_ppgtt(ctx);
> +	struct page *page;
> +	uint32_t *reg_state;
> +	int ret;
> +
> +	ret = i915_gem_object_set_to_cpu_domain(ctx_obj, true);
> +	if (ret) {
> +		DRM_DEBUG_DRIVER("Could not set to CPU domain\n");
> +		return ret;
> +	}
> +
> +	ret = i915_gem_object_get_pages(ctx_obj);
> +	if (ret) {
> +		DRM_DEBUG_DRIVER("Could not get object pages\n");
> +		return ret;
> +	}
> +
> +	i915_gem_object_pin_pages(ctx_obj);
> +
> +	/* The second page of the context object contains some fields which must
> +	 * be set up prior to the first execution. */
> +	page = i915_gem_object_get_page(ctx_obj, 1);
> +	reg_state = kmap_atomic(page);
> +
> +	/* A context is actually a big batch buffer with several MI_LOAD_REGISTER_IMM
> +	 * commands followed by (reg, value) pairs. The values we are setting here are
> +	 * only for the first context restore: on a subsequent save, the GPU will
> +	 * recreate this batchbuffer with new values (including all the missing
> +	 * MI_LOAD_REGISTER_IMM commands that we are not initializing here). */
> +	if (ring->id == RCS)
> +		reg_state[CTX_LRI_HEADER_0] = MI_LOAD_REGISTER_IMM(14);
> +	else
> +		reg_state[CTX_LRI_HEADER_0] = MI_LOAD_REGISTER_IMM(11);
> +	reg_state[CTX_LRI_HEADER_0] |= MI_LRI_FORCE_POSTED;
> +	reg_state[CTX_CONTEXT_CONTROL] = RING_CONTEXT_CONTROL(ring);
> +	reg_state[CTX_CONTEXT_CONTROL+1] = (1<<3) | MI_RESTORE_INHIBIT;
> +	reg_state[CTX_CONTEXT_CONTROL+1] |= reg_state[CTX_CONTEXT_CONTROL+1] << 16;

If we can, we should probably use _MASKED_BIT_ENABLE() here to make it more
obvious why we're doing the or+shift.
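
i.e. something like (a sketch, using the existing masked-write macro from
i915_reg.h):

	reg_state[CTX_CONTEXT_CONTROL+1] =
		_MASKED_BIT_ENABLE((1<<3) | MI_RESTORE_INHIBIT);

which expands to the same "value | (value << 16)" pattern.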

> +	reg_state[CTX_RING_HEAD] = RING_HEAD(ring->mmio_base);
> +	reg_state[CTX_RING_HEAD+1] = 0;
> +	reg_state[CTX_RING_TAIL] = RING_TAIL(ring->mmio_base);
> +	reg_state[CTX_RING_TAIL+1] = 0;
> +	reg_state[CTX_RING_BUFFER_START] = RING_START(ring->mmio_base);
> +	reg_state[CTX_RING_BUFFER_START+1] = i915_gem_obj_ggtt_offset(ring_obj);
> +	reg_state[CTX_RING_BUFFER_CONTROL] = RING_CTL(ring->mmio_base);
> +	reg_state[CTX_RING_BUFFER_CONTROL+1] = (31 * PAGE_SIZE) | RING_VALID;

The size here doesn't look right to me. Shouldn't it be (number of pages - 1)?
See init_ring_common()
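
i.e. the equivalent of what init_ring_common() programs (sketch):

	reg_state[CTX_RING_BUFFER_CONTROL+1] =
		((ringbuf->size - PAGE_SIZE) & RING_NR_PAGES) | RING_VALID;

For the 32 * PAGE_SIZE ringbuffer allocated earlier both forms evaluate to
the same value, but deriving it from ringbuf->size avoids the hardcoded
assumption.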

> +	reg_state[CTX_BB_HEAD_U] = ring->mmio_base + 0x168;
> +	reg_state[CTX_BB_HEAD_U+1] = 0;
> +	reg_state[CTX_BB_HEAD_L] = ring->mmio_base + 0x140;
> +	reg_state[CTX_BB_HEAD_L+1] = 0;
> +	reg_state[CTX_BB_STATE] = ring->mmio_base + 0x110;
> +	reg_state[CTX_BB_STATE+1] = (1<<5);
> +	reg_state[CTX_SECOND_BB_HEAD_U] = ring->mmio_base + 0x11c;
> +	reg_state[CTX_SECOND_BB_HEAD_U+1] = 0;
> +	reg_state[CTX_SECOND_BB_HEAD_L] = ring->mmio_base + 0x114;
> +	reg_state[CTX_SECOND_BB_HEAD_L+1] = 0;
> +	reg_state[CTX_SECOND_BB_STATE] = ring->mmio_base + 0x118;
> +	reg_state[CTX_SECOND_BB_STATE+1] = 0;
> +	if (ring->id == RCS) {
> +		reg_state[CTX_BB_PER_CTX_PTR] = ring->mmio_base + 0x1c0;
> +		reg_state[CTX_BB_PER_CTX_PTR+1] = 0;
> +		reg_state[CTX_RCS_INDIRECT_CTX] = ring->mmio_base + 0x1c4;
> +		reg_state[CTX_RCS_INDIRECT_CTX+1] = 0;
> +		reg_state[CTX_RCS_INDIRECT_CTX_OFFSET] = ring->mmio_base + 0x1c8;
> +		reg_state[CTX_RCS_INDIRECT_CTX_OFFSET+1] = 0;
> +	}
> +	reg_state[CTX_LRI_HEADER_1] = MI_LOAD_REGISTER_IMM(9);
> +	reg_state[CTX_LRI_HEADER_1] |= MI_LRI_FORCE_POSTED;
> +	reg_state[CTX_CTX_TIMESTAMP] = ring->mmio_base + 0x3a8;
> +	reg_state[CTX_CTX_TIMESTAMP+1] = 0;
> +	reg_state[CTX_PDP3_UDW] = GEN8_RING_PDP_UDW(ring, 3);
> +	reg_state[CTX_PDP3_LDW] = GEN8_RING_PDP_LDW(ring, 3);
> +	reg_state[CTX_PDP2_UDW] = GEN8_RING_PDP_UDW(ring, 2);
> +	reg_state[CTX_PDP2_LDW] = GEN8_RING_PDP_LDW(ring, 2);
> +	reg_state[CTX_PDP1_UDW] = GEN8_RING_PDP_UDW(ring, 1);
> +	reg_state[CTX_PDP1_LDW] = GEN8_RING_PDP_LDW(ring, 1);
> +	reg_state[CTX_PDP0_UDW] = GEN8_RING_PDP_UDW(ring, 0);
> +	reg_state[CTX_PDP0_LDW] = GEN8_RING_PDP_LDW(ring, 0);
> +	reg_state[CTX_PDP3_UDW+1] = (u64)ppgtt->pd_dma_addr[3] >> 32;
> +	reg_state[CTX_PDP3_LDW+1] = ppgtt->pd_dma_addr[3];
> +	reg_state[CTX_PDP2_UDW+1] = (u64)ppgtt->pd_dma_addr[2] >> 32;
> +	reg_state[CTX_PDP2_LDW+1] = ppgtt->pd_dma_addr[2];
> +	reg_state[CTX_PDP1_UDW+1] = (u64)ppgtt->pd_dma_addr[1] >> 32;
> +	reg_state[CTX_PDP1_LDW+1] = ppgtt->pd_dma_addr[1];
> +	reg_state[CTX_PDP0_UDW+1] = (u64)ppgtt->pd_dma_addr[0] >> 32;
> +	reg_state[CTX_PDP0_LDW+1] = ppgtt->pd_dma_addr[0];

Are we able to use upper_32_bits() and lower_32_bits() for these?
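
i.e. (sketch, with the helpers from linux/kernel.h):

	reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pd_dma_addr[3]);
	reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pd_dma_addr[3]);

and likewise for PDP0-2, instead of the open-coded casts and shifts.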

Brad

> +	if (ring->id == RCS) {
> +		reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
> +		reg_state[CTX_R_PWR_CLK_STATE] = 0x20c8;
> +		reg_state[CTX_R_PWR_CLK_STATE+1] = 0;
> +	}
> +
> +	kunmap_atomic(reg_state);
> +
> +	ctx_obj->dirty = 1;
> +	set_page_dirty(page);
> +	i915_gem_object_unpin_pages(ctx_obj);
> +
> +	return 0;
> +}
> +
>  void intel_lr_context_free(struct intel_context *ctx)
>  {
>  	int i;
> @@ -145,14 +281,24 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
>  	if (ret) {
>  		DRM_DEBUG_DRIVER("Failed to allocate ringbuffer obj %s: %d\n",
>  				ring->name, ret);
> -		kfree(ringbuf);
> -		i915_gem_object_ggtt_unpin(ctx_obj);
> -		drm_gem_object_unreference(&ctx_obj->base);
> -		return ret;
> +		goto error;
> +	}
> +
> +	ret = populate_lr_context(ctx, ctx_obj, ring, ringbuf->obj);
> +	if (ret) {
> +		DRM_DEBUG_DRIVER("Failed to populate LRC: %d\n", ret);
> +		intel_destroy_ring_buffer(ringbuf);
> +		goto error;
>  	}
>  
>  	ctx->engine[ring->id].ringbuf = ringbuf;
>  	ctx->engine[ring->id].obj = ctx_obj;
>  
>  	return 0;
> +
> +error:
> +	kfree(ringbuf);
> +	i915_gem_object_ggtt_unpin(ctx_obj);
> +	drm_gem_object_unreference(&ctx_obj->base);
> +	return ret;
>  }
> -- 
> 1.9.0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 15/53] drm/i915/bdw: Don't write PDP in the legacy way when using LRCs
  2014-06-13 15:37 ` [PATCH 15/53] drm/i915/bdw: Don't write PDP in the legacy way when using LRCs oscar.mateo
@ 2014-06-18 23:42   ` Volkin, Bradley D
  2014-06-23 12:45     ` Mateo Lozano, Oscar
  0 siblings, 1 reply; 156+ messages in thread
From: Volkin, Bradley D @ 2014-06-18 23:42 UTC (permalink / raw)
  To: oscar.mateo; +Cc: intel-gfx

On Fri, Jun 13, 2014 at 08:37:33AM -0700, oscar.mateo@intel.com wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> This is mostly for correctness so that we know we are running the LR
> context correctly (that is, the PDPs are contained inside the context
> object).
> 
> v2: Move the check to inside the enable PPGTT function. The switch
> happens in two places: the legacy context switch (that we won't hit
> when Execlists are enabled) and the PPGTT enable, which unfortunately
> we need. This would look much nicer if the ppgtt->enable was part of
> the ring init, where it logically belongs.
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 8b3cde7..9f0c69e 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -844,6 +844,11 @@ static int gen8_ppgtt_enable(struct i915_hw_ppgtt *ppgtt)
>  		if (USES_FULL_PPGTT(dev))
>  			continue;
>  
> +		/* In the case of Execlists, we don't want to write the PDPs
> +		 * in the legacy way (they live inside the context now) */
> +		if (intel_enable_execlists(dev))
> +			return 0;

Along the lines of one of Daniel's comments about the module parameter,
I think we could use some clarity on when to use intel_enable_execlists()
vs lrc_enabled vs i915.enable_execlists.
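
For reference, my reading of what the three currently mean in this series
(worth writing down somewhere):

	/* i915.enable_execlists: the raw module parameter, as set by the user */
	/* intel_enable_execlists(dev): parameter + HAS_LOGICAL_RING_CONTEXTS +
	 *                              USES_PPGTT, i.e. "can and should we" */
	/* dev_priv->lrc_enabled: what the driver actually initialized at boot */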

Brad

> +
>  		ret = ppgtt->switch_mm(ppgtt, ring, true);
>  		if (ret)
>  			goto err_out;
> -- 
> 1.9.0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 19/53] drm/i915: Extract pipe control fini & make init outside accesible
  2014-06-13 15:37 ` [PATCH 19/53] drm/i915: Extract pipe control fini & make init outside accesible oscar.mateo
  2014-06-18 20:31   ` Daniel Vetter
@ 2014-06-19  0:04   ` Volkin, Bradley D
  2014-06-19 10:58     ` Mateo Lozano, Oscar
  1 sibling, 1 reply; 156+ messages in thread
From: Volkin, Bradley D @ 2014-06-19  0:04 UTC (permalink / raw)
  To: oscar.mateo; +Cc: intel-gfx

On Fri, Jun 13, 2014 at 08:37:37AM -0700, oscar.mateo@intel.com wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> I plan to reuse these for the new logical ring path.
> 
> No functional changes.
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 31 ++++++++++++++++++-------------
>  drivers/gpu/drm/i915/intel_ringbuffer.h |  3 +++
>  2 files changed, 21 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 4a71dd4..254e4c5 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -574,8 +574,21 @@ out:
>  	return ret;
>  }
>  
> -static int
> -init_pipe_control(struct intel_engine_cs *ring)
> +void
> +intel_fini_pipe_control(struct intel_engine_cs *ring)
> +{
> +	if (ring->scratch.obj == NULL)
> +		return;
> +
> +	kunmap(sg_page(ring->scratch.obj->pages->sgl));
> +	i915_gem_object_ggtt_unpin(ring->scratch.obj);
> +
> +	drm_gem_object_unreference(&ring->scratch.obj->base);
> +	ring->scratch.obj = NULL;
> +}
> +
> +int
> +intel_init_pipe_control(struct intel_engine_cs *ring)
>  {
>  	int ret;
>  
> @@ -648,7 +661,7 @@ static int init_render_ring(struct intel_engine_cs *ring)
>  			   _MASKED_BIT_ENABLE(GFX_REPLAY_MODE));
>  
>  	if (INTEL_INFO(dev)->gen >= 5) {
> -		ret = init_pipe_control(ring);
> +		ret = intel_init_pipe_control(ring);
>  		if (ret)
>  			return ret;
>  	}
> @@ -676,16 +689,8 @@ static void render_ring_cleanup(struct intel_engine_cs *ring)
>  {
>  	struct drm_device *dev = ring->dev;
>  
> -	if (ring->scratch.obj == NULL)
> -		return;
> -
> -	if (INTEL_INFO(dev)->gen >= 5) {
> -		kunmap(sg_page(ring->scratch.obj->pages->sgl));
> -		i915_gem_object_ggtt_unpin(ring->scratch.obj);
> -	}
> -
> -	drm_gem_object_unreference(&ring->scratch.obj->base);
> -	ring->scratch.obj = NULL;
> +	if (INTEL_INFO(dev)->gen >= 5)
> +		intel_fini_pipe_control(ring);

It looks like we've changed the behavior such that for the case of
scratch.obj != NULL && gen < 5, we don't unreference scratch.obj.
I don't know if we can ever hit that case though.
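
If that case matters, something like this sketch would keep the gen < 5
unreference path intact:

	if (INTEL_INFO(dev)->gen >= 5) {
		intel_fini_pipe_control(ring);
	} else if (ring->scratch.obj) {
		drm_gem_object_unreference(&ring->scratch.obj->base);
		ring->scratch.obj = NULL;
	}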

Brad

>  }
>  
>  static int gen6_signal(struct intel_engine_cs *signaller,
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 599b4ed..42026a1 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -221,6 +221,9 @@ struct  intel_engine_cs {
>  
>  bool intel_ring_initialized(struct intel_engine_cs *ring);
>  
> +void intel_fini_pipe_control(struct intel_engine_cs *ring);
> +int intel_init_pipe_control(struct intel_engine_cs *ring);
> +
>  static inline unsigned
>  intel_ring_flag(struct intel_engine_cs *ring)
>  {
> -- 
> 1.9.0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 10/53] drm/i915/bdw: A bit more advanced context init/fini
  2014-06-18 22:13   ` Volkin, Bradley D
@ 2014-06-19  6:13     ` Daniel Vetter
  0 siblings, 0 replies; 156+ messages in thread
From: Daniel Vetter @ 2014-06-19  6:13 UTC (permalink / raw)
  To: Volkin, Bradley D; +Cc: intel-gfx

On Thu, Jun 19, 2014 at 12:13 AM, Volkin, Bradley D
<bradley.d.volkin@intel.com> wrote:
>> +void intel_lr_context_free(struct intel_context *ctx)
>> +{
>> +     int i;
>> +
>> +     for (i = 0; i < I915_NUM_RINGS; i++) {
>> +             struct drm_i915_gem_object *ctx_obj = ctx->engine[i].obj;
>> +             if (ctx_obj) {
>> +                     i915_gem_object_ggtt_unpin(ctx_obj);
>
> I suspect that leaving the backing objects pinned in ggtt for their entire
> lifetimes is going to eventually cause memory related issues. We might need
> to look at managing the binding more dynamically - similar to what the
> legacy context code already does.

Oh, I didn't spot this. We definitely need the same handling as for
legacy rings, so:
- Only pin while a ctx is used, shoveling the old context through the
active list for timeline unbinding.
- Last-ditch effort in evict_something to switch to the default context.

Without that we'll fragment the global gtt badly even before we
exhaust it, and that means we can't pin scanout buffers any more and
can't handle gtt mmap faults any more. There should even be a nasty
testcase somewhere to exercise the last-ditch context evict code.
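
In code terms the first point amounts to roughly this (sketch only, the
actual plumbing through the active list is more involved):

	/* at submission time */
	ret = i915_gem_obj_ggtt_pin(ctx_obj, GEN8_LR_CONTEXT_ALIGN, 0);
	...
	/* once the request is retired and the hw has switched away */
	i915_gem_object_ggtt_unpin(ctx_obj);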
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 06/53] drm/i915/bdw: Introduce one context backing object per engine
  2014-06-18 20:16   ` Daniel Vetter
@ 2014-06-19  8:52     ` Mateo Lozano, Oscar
  2014-06-19 10:57       ` Daniel Vetter
  0 siblings, 1 reply; 156+ messages in thread
From: Mateo Lozano, Oscar @ 2014-06-19  8:52 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

> -----Original Message-----
> From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel
> Vetter
> Sent: Wednesday, June 18, 2014 9:16 PM
> To: Mateo Lozano, Oscar
> Cc: intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH 06/53] drm/i915/bdw: Introduce one context
> backing object per engine
> 
> On Fri, Jun 13, 2014 at 04:37:24PM +0100, oscar.mateo@intel.com wrote:
> > From: Oscar Mateo <oscar.mateo@intel.com>
> >
> > A context backing object only makes sense for a given engine (because
> > it holds state data specific to that engine).
> >
> > In legacy ringbuffer sumission mode, the only MI_SET_CONTEXT we really
> > perform is for the render engine, so one backing object is all we needed.
> >
> > With Execlists, however, we need backing objects for every engine, as
> > contexts become the only way to submit workloads to the GPU. To tackle
> > this problem, we multiplex the context struct to contain
> > <no-of-engines> objects.
> >
> > Originally, I colored this code by instantiating one new context for
> > every engine I wanted to use, but this change suggested by Brad Volkin
> > makes it more elegant.
> >
> > v2: Leave the old backing object pointer behind. Daniel Vetter
> > suggested using a union, but it makes more sense to keep render_obj as
> > a NULL pointer behind, to make sure no one uses it incorrectly when
> > Execlists are enabled, similar to what we are doing with ring->buffer
> > (Rusty's API level 5).
> >
> > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_drv.h | 7 +++++++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h
> > b/drivers/gpu/drm/i915/i915_drv.h index a15370c..ccc1ba6 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -593,7 +593,14 @@ struct intel_context {
> >  	uint8_t remap_slice;
> >  	struct drm_i915_file_private *file_priv;
> >  	struct intel_engine_cs *last_ring;
> > +
> > +	/* Legacy ring buffer submission */
> >  	struct drm_i915_gem_object *render_obj;
> 
> Per my previous request, is_initialized should also be nearby, maybe wrapped
> in a struct. So something like:
> 
> 	union {
> 		struct {
> 			struct gem_bo *obj;
> 			bool is_initialized;
> 		} render_ctx;
> 		struct {
> 			...
> 		} lrc[I915_NUM_RINGS];
> 	};
> 
> Or some other means to make it clearer which fields are for legacy render
> ctx objects and which for lrc contexts. I also wonder whether we should
> shovel all the hw specific stuff at the end to have a clearer separation
> between the sw-side field members associated with the software context
> object and the stuff for the hw thing.
> 
> Just ideas to pick&choose really, we can cocci-polish this once it's all settled
> easily (i.e. afterwards).
> -Daniel

Hmmmm... in the Execlists code I reused is_initialized for the render null-context (same as in the legacy context: I don't want to do it more than once, e.g. when we come out of reset). Renaming it to render_is_initialized works for me, because null-context is only for the RCS in any case. But I wouldn't mark that field as "legacy submission only"...

Also, I didn't follow your instructions about the union for a reason:

> > v2: Leave the old backing object pointer behind. Daniel Vetter
> > suggested using a union, but it makes more sense to keep render_obj as
> > a NULL pointer behind, to make sure no one uses it incorrectly when
> > Execlists are enabled, similar to what we are doing with ring->buffer
> > (Rusty's API level 5).

Not sure if you agree with this or you still prefer the union?

-- Oscar 

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 07/53] drm/i915/bdw: New file for Logical Ring Contexts and Execlists
  2014-06-18 20:17   ` Daniel Vetter
@ 2014-06-19  9:01     ` Mateo Lozano, Oscar
  0 siblings, 0 replies; 156+ messages in thread
From: Mateo Lozano, Oscar @ 2014-06-19  9:01 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

> -----Original Message-----
> From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel
> Vetter
> Sent: Wednesday, June 18, 2014 9:17 PM
> To: Mateo Lozano, Oscar
> Cc: intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH 07/53] drm/i915/bdw: New file for Logical
> Ring Contexts and Execlists
> 
> On Fri, Jun 13, 2014 at 04:37:25PM +0100, oscar.mateo@intel.com wrote:
> > From: Oscar Mateo <oscar.mateo@intel.com>
> >
> > Some legacy HW context code assumptions don't make sense for this new
> > submission method, so we will place this stuff in a separate file.
> >
> > Note for reviewers: I've carefully considered the best name for this
> > file and this was my best option (other possibilities were
> > intel_lr_context.c or intel_execlist.c). I am open to a certain
> > bikeshedding on this matter, anyway. Regarding splitting execlists and
> > logical ring contexts, it is probably not worth it for the moment.
> >
> > v2: Change to intel_lrc.c
> >
> > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> > ---
> >  drivers/gpu/drm/i915/Makefile    |  1 +
> >  drivers/gpu/drm/i915/intel_lrc.c | 42
> > ++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 43 insertions(+)
> >  create mode 100644 drivers/gpu/drm/i915/intel_lrc.c
> >
> > diff --git a/drivers/gpu/drm/i915/Makefile
> > b/drivers/gpu/drm/i915/Makefile index cad1683..9fee2a0 100644
> > --- a/drivers/gpu/drm/i915/Makefile
> > +++ b/drivers/gpu/drm/i915/Makefile
> > @@ -31,6 +31,7 @@ i915-y += i915_cmd_parser.o \
> >  	  i915_gpu_error.o \
> >  	  i915_irq.o \
> >  	  i915_trace_points.o \
> > +	  intel_lrc.o \
> >  	  intel_ringbuffer.o \
> >  	  intel_uncore.o
> >
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c
> > b/drivers/gpu/drm/i915/intel_lrc.c
> > new file mode 100644
> > index 0000000..49bb6fc
> > --- /dev/null
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -0,0 +1,42 @@
> > +/*
> > + * Copyright © 2014 Intel Corporation
> > + *
> > + * Permission is hereby granted, free of charge, to any person
> > +obtaining a
> > + * copy of this software and associated documentation files (the
> > +"Software"),
> > + * to deal in the Software without restriction, including without
> > +limitation
> > + * the rights to use, copy, modify, merge, publish, distribute,
> > +sublicense,
> > + * and/or sell copies of the Software, and to permit persons to whom
> > +the
> > + * Software is furnished to do so, subject to the following conditions:
> > + *
> > + * The above copyright notice and this permission notice (including
> > +the next
> > + * paragraph) shall be included in all copies or substantial portions
> > +of the
> > + * Software.
> > + *
> > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY
> KIND,
> > +EXPRESS OR
> > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> > +MERCHANTABILITY,
> > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN
> NO EVENT
> > +SHALL
> > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
> DAMAGES
> > +OR OTHER
> > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
> OTHERWISE,
> > +ARISING
> > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE
> OR
> > +OTHER DEALINGS
> > + * IN THE SOFTWARE.
> > + *
> > + * Authors:
> > + *    Ben Widawsky <ben@bwidawsk.net>
> > + *    Michel Thierry <michel.thierry@intel.com>
> > + *    Thomas Daniel <thomas.daniel@intel.com>
> > + *    Oscar Mateo <oscar.mateo@intel.com>
> > + *
> > + */
> > +
> > +/*
> 
> Overview comments should be kerneldoc DOC: sections and pulled into our
> driver doc. Brad knows how to do this, see i915_cmd_parser.c. Just in case a
> patch later on doesn't do this ;-) -Daniel

The DOC: section is added in a later patch (at the end), but I forgot to integrate it into the DocBook. I'll do it in the next version.
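
For reference, roughly what I have in mind (a sketch only; the section title and wording are illustrative):

	/**
	 * DOC: Logical Rings, Logical Ring Contexts and Execlists
	 *
	 * High-level overview of the Execlists submission path, to be
	 * pulled into the DRM DocBook template the same way
	 * i915_cmd_parser.c does it.
	 */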

-- Oscar

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 08/53] drm/i915/bdw: Macro for LRCs and module option for Execlists
  2014-06-18 20:19   ` Daniel Vetter
@ 2014-06-19  9:04     ` Mateo Lozano, Oscar
  0 siblings, 0 replies; 156+ messages in thread
From: Mateo Lozano, Oscar @ 2014-06-19  9:04 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

> -----Original Message-----
> From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel
> Vetter
> Sent: Wednesday, June 18, 2014 9:19 PM
> To: Mateo Lozano, Oscar
> Cc: intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH 08/53] drm/i915/bdw: Macro for LRCs and
> module option for Execlists
> 
> On Fri, Jun 13, 2014 at 04:37:26PM +0100, oscar.mateo@intel.com wrote:
> > From: Oscar Mateo <oscar.mateo@intel.com>
> >
> > GEN8 brings an expansion of the HW contexts: "Logical Ring Contexts".
> > These expanded contexts enable a number of new abilities, especially
> > "Execlists".
> >
> > The macro is defined to off until we have things in place to hope to
> > work. In dev_priv, lrc_enabled will reflect the state of whether or
> > not we've actually properly initialized these new contexts. This helps
> > the transition in the code but is a candidate for removal at some point.
> >
> > v2: Rename "advanced contexts" to the more correct "logical ring
> > contexts".
> >
> > v3: Add a module parameter to enable execlists. Execlist are
> > relatively new, and so it'd be wise to be able to switch back to ring
> > submission to debug subtle problems that will inevitably arise.
> >
> > v4: Add an intel_enable_execlists function.
> >
> > Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
> > Signed-off-by: Damien Lespiau <damien.lespiau@intel.com> (v3)
> > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2 & v4)
> > ---
> >  drivers/gpu/drm/i915/i915_drv.h    | 6 ++++++
> >  drivers/gpu/drm/i915/i915_params.c | 6 ++++++
> >  drivers/gpu/drm/i915/intel_lrc.c   | 8 ++++++++
> >  3 files changed, 20 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h
> > b/drivers/gpu/drm/i915/i915_drv.h index ccc1ba6..dac0db1 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -1519,6 +1519,7 @@ struct drm_i915_private {
> >
> >  	uint32_t hw_context_size;
> >  	struct list_head context_list;
> > +	bool lrc_enabled;
> >
> >  	u32 fdi_rx_config;
> >
> > @@ -1944,6 +1945,7 @@ struct drm_i915_cmd_table {
> >  #define I915_NEED_GFX_HWS(dev)	(INTEL_INFO(dev)->need_gfx_hws)
> >
> >  #define HAS_HW_CONTEXTS(dev)	(INTEL_INFO(dev)->gen >= 6)
> > +#define HAS_LOGICAL_RING_CONTEXTS(dev)	0
> >  #define HAS_ALIASING_PPGTT(dev)	(INTEL_INFO(dev)->gen >= 6)
> >  #define HAS_PPGTT(dev)		(INTEL_INFO(dev)->gen >= 7 &&
> !IS_GEN8(dev))
> >  #define USES_PPGTT(dev)		intel_enable_ppgtt(dev, false)
> > @@ -2029,6 +2031,7 @@ struct i915_params {
> >  	int enable_rc6;
> >  	int enable_fbc;
> >  	int enable_ppgtt;
> > +	int enable_execlists;
> >  	int enable_psr;
> >  	unsigned int preliminary_hw_support;
> >  	int disable_power_well;
> > @@ -2420,6 +2423,9 @@ struct intel_context *
> > i915_gem_context_validate(struct drm_device *dev, struct drm_file *file,
> >  			  struct intel_engine_cs *ring, const u32 ctx_id);
> >
> > +/* intel_lrc.c */
> > +bool intel_enable_execlists(struct drm_device *dev);
> > +
> >  /* i915_gem_render_state.c */
> >  int i915_gem_render_state_init(struct intel_engine_cs *ring);
> >  /* i915_gem_evict.c */
> > diff --git a/drivers/gpu/drm/i915/i915_params.c
> > b/drivers/gpu/drm/i915/i915_params.c
> > index d05a2af..b7455f8 100644
> > --- a/drivers/gpu/drm/i915/i915_params.c
> > +++ b/drivers/gpu/drm/i915/i915_params.c
> > @@ -37,6 +37,7 @@ struct i915_params i915 __read_mostly = {
> >  	.enable_fbc = -1,
> >  	.enable_hangcheck = true,
> >  	.enable_ppgtt = -1,
> > +	.enable_execlists = -1,
> >  	.enable_psr = 0,
> >  	.preliminary_hw_support =
> IS_ENABLED(CONFIG_DRM_I915_PRELIMINARY_HW_SUPPORT),
> >  	.disable_power_well = 1,
> > @@ -116,6 +117,11 @@ MODULE_PARM_DESC(enable_ppgtt,
> >  	"Override PPGTT usage. "
> >  	"(-1=auto [default], 0=disabled, 1=aliasing, 2=full)");
> >
> > +module_param_named(enable_execlists, i915.enable_execlists, int,
> > +0400); MODULE_PARM_DESC(enable_execlists,
> > +	"Override execlists usage. "
> > +	"(-1=auto [default], 0=disabled, 1=enabled)");
> > +
> >  module_param_named(enable_psr, i915.enable_psr, int, 0600);
> > MODULE_PARM_DESC(enable_psr, "Enable PSR (default: false)");
> >
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c
> > b/drivers/gpu/drm/i915/intel_lrc.c
> > index 49bb6fc..58cead1 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -40,3 +40,11 @@
> >  #include <drm/drmP.h>
> >  #include <drm/i915_drm.h>
> >  #include "i915_drv.h"
> > +
> > +bool intel_enable_execlists(struct drm_device *dev) {
> > +	if (!i915.enable_execlists)
> > +		return false;
> > +
> > +	return HAS_LOGICAL_RING_CONTEXTS(dev) && USES_PPGTT(dev); }
> 
> Nitpick: Best practice nowadays for options with complicated details is to
> have a sanitized function called early in init. Code then just uses i915.foo
> without calling anything. And the parameter needs to be read-only, but that's
> already the case. See e.g. ppgtt handling.
> 
> Of course if your code only uses this once then this is moot - I didn't read
> ahead.
> -Daniel

Yes, I haven't missed the 20+ True PPGTT early sanitize patches in the list :)
Ok, will do!
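
Something along these lines is what I have in mind (just a sketch; the helper name is made up here):

	/* Called once during driver load; afterwards everybody can just
	 * read i915.enable_execlists directly. */
	int intel_sanitize_enable_execlists(struct drm_device *dev,
					    int enable_execlists)
	{
		if (!HAS_LOGICAL_RING_CONTEXTS(dev) || !USES_PPGTT(dev))
			return 0;

		/* Respect an explicit 0 or 1 from the user */
		if (enable_execlists >= 0)
			return enable_execlists;

		return 1;	/* -1 (auto): enable wherever it's supported */
	}

i.e. i915.enable_execlists = intel_sanitize_enable_execlists(dev, i915.enable_execlists) early in driver load, and intel_enable_execlists() goes away.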

-- Oscar

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 09/53] drm/i915/bdw: Initialization for Logical Ring Contexts
  2014-06-18 20:24   ` Daniel Vetter
@ 2014-06-19  9:23     ` Mateo Lozano, Oscar
  2014-06-19 10:08       ` Daniel Vetter
  0 siblings, 1 reply; 156+ messages in thread
From: Mateo Lozano, Oscar @ 2014-06-19  9:23 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

> -----Original Message-----
> From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel
> Vetter
> Sent: Wednesday, June 18, 2014 9:25 PM
> To: Mateo Lozano, Oscar
> Cc: intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH 09/53] drm/i915/bdw: Initialization for
> Logical Ring Contexts
> 
> On Fri, Jun 13, 2014 at 04:37:27PM +0100, oscar.mateo@intel.com wrote:
> > From: Oscar Mateo <oscar.mateo@intel.com>
> >
> > Early in the series we had our own gen8_gem_context_init/fini
> > functions, but the truth is they now look almost the same as the
> > legacy hw context init/fini functions. We can always split them later
> > if this ceases to be the case.
> >
> > Also, we do not fall back to legacy ringbuffers when logical ring
> > context initialization fails (not very likely to happen and, even if
> > it does, hw contexts would probably fail as well).
> >
> > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_gem_context.c | 21 ++++++++++++++++-----
> >  1 file changed, 16 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c
> > b/drivers/gpu/drm/i915/i915_gem_context.c
> > index 801b891..3f3fb36 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_context.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> > @@ -416,7 +416,13 @@ int i915_gem_context_init(struct drm_device
> *dev)
> >  	if (WARN_ON(dev_priv->ring[RCS].default_context))
> >  		return 0;
> >
> > -	if (HAS_HW_CONTEXTS(dev)) {
> > +	dev_priv->lrc_enabled = intel_enable_execlists(dev);
> > +
> > +	if (dev_priv->lrc_enabled) {
> > +		/* NB: intentionally left blank. We will allocate our own
> > +		 * backing objects as we need them, thank you very much */
> > +		dev_priv->hw_context_size = 0;
> > +	} else if (HAS_HW_CONTEXTS(dev)) {
> >  		dev_priv->hw_context_size =
> round_up(get_context_size(dev), 4096);
> >  		if (dev_priv->hw_context_size > (1<<20)) {
> >  			DRM_DEBUG_DRIVER("Disabling HW Contexts;
> invalid size %d\n", @@
> > -436,7 +442,9 @@ int i915_gem_context_init(struct drm_device *dev)
> >  	for (i = 0; i < I915_NUM_RINGS; i++)
> >  		dev_priv->ring[i].default_context = ctx;
> >
> > -	DRM_DEBUG_DRIVER("%s context support initialized\n", dev_priv-
> >hw_context_size ? "HW" : "fake");
> > +	DRM_DEBUG_DRIVER("%s context support initialized\n",
> > +			dev_priv->lrc_enabled ? "LR" :
> > +			dev_priv->hw_context_size ? "HW" : "fake");
> >  	return 0;
> >  }
> >
> > @@ -765,9 +773,12 @@ int i915_switch_context(struct intel_engine_cs
> *ring,
> >  	return do_switch(ring, to);
> >  }
> >
> > -static bool hw_context_enabled(struct drm_device *dev)
> > +static bool contexts_enabled(struct drm_device *dev)
> >  {
> > -	return to_i915(dev)->hw_context_size;
> > +	struct drm_i915_private *dev_priv = to_i915(dev);
> > +
> > +	/* FIXME: this would be cleaner with a "context type" enum */
> > +	return dev_priv->lrc_enabled || dev_priv->hw_context_size;
> 
> Since you have a bunch of if ladders the usual approach isn't an enum but a
> vfunc table to abstract behaviour. Think object types instead of switch
> statements. Style bikeshed though (presume code later on doesn't have
> excesses here).
> -Daniel

Hmmmm... I offered to do this with vfuncs early on, but you mentioned special-casing should be enough. And I agreed: in the end, the LR contexts are not that different from traditional HW contexts. This is what I had in mind:

CTX_TYPE_FAKE: no backing objects.
CTX_TYPE_HW: one render backing object at creation time.
CTX_TYPE_LR: n backing objects, with deferred creation. A few functions are moot (e.g. switch, reset).

The current system (looking at dev_priv->hw_context_size to distinguish fake from hw contexts) is imo a bit obfuscated.
And we can always abstract this away with vfuncs if it becomes too complex in the future...

What do you think? can special-casing make do for the time being?
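
In code, something like this (a sketch; the ctx_type field doesn't exist yet):

	enum context_type {
		CTX_TYPE_FAKE,	/* no backing objects */
		CTX_TYPE_HW,	/* one render backing object at creation */
		CTX_TYPE_LR,	/* n backing objects, deferred creation */
	};

	static bool contexts_enabled(struct drm_device *dev)
	{
		return to_i915(dev)->ctx_type != CTX_TYPE_FAKE;
	}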

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 42/53] drm/i915/bdw: Make sure gpu reset still works with Execlists
  2014-06-18 20:50   ` Daniel Vetter
@ 2014-06-19  9:37     ` Mateo Lozano, Oscar
  0 siblings, 0 replies; 156+ messages in thread
From: Mateo Lozano, Oscar @ 2014-06-19  9:37 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

> -----Original Message-----
> From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel
> Vetter
> Sent: Wednesday, June 18, 2014 9:50 PM
> To: Mateo Lozano, Oscar
> Cc: intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH 42/53] drm/i915/bdw: Make sure gpu reset
> still works with Execlists
> 
> On Fri, Jun 13, 2014 at 04:38:00PM +0100, oscar.mateo@intel.com wrote:
> > From: Oscar Mateo <oscar.mateo@intel.com>
> >
> > If we reset a ring after a hang, we have to make sure that we clear
> > out all queued Execlists requests.
> >
> > v2: The ring is, at this point, already being correctly re-programmed
> > for Execlists, and the hangcheck counters cleared.
> >
> > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_gem.c | 13 +++++++++++++
> >  1 file changed, 13 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c
> > b/drivers/gpu/drm/i915/i915_gem.c index 7c10540..86bfb8a 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -2546,6 +2546,19 @@ static void
> i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
> >  		i915_gem_free_request(request);
> >  	}
> >
> > +	if (intel_enable_execlists(dev_priv->dev)) {
> > +		while (!list_empty(&ring->execlist_queue)) {
> 
> the execlist_queue should be empty for legacy mode, i.e. you can ditch the if
> here, it's redundant. If not, move the INIT_LIST_HEAD ;-) -Daniel

I'll have to INIT_LIST_HEAD in both the legacy and the new ring init but, now that you mention it, that's probably a good idea :D
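
i.e., something like this in both init paths (sketch):

	/* In legacy intel_init_ring_buffer() and in the new
	 * logical_ring_init(), so the reset path can always walk it: */
	INIT_LIST_HEAD(&ring->execlist_queue);

That way the intel_enable_execlists() check in i915_gem_reset_ring_cleanup() can indeed be dropped.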

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 53/53] !UPSTREAM: drm/i915: Use MMIO flips
  2014-06-18 21:01   ` Daniel Vetter
@ 2014-06-19  9:50     ` Mateo Lozano, Oscar
  2014-06-19 10:04       ` Daniel Vetter
  2014-06-19 10:13       ` Chris Wilson
  0 siblings, 2 replies; 156+ messages in thread
From: Mateo Lozano, Oscar @ 2014-06-19  9:50 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

> -----Original Message-----
> From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel
> Vetter
> Sent: Wednesday, June 18, 2014 10:01 PM
> To: Mateo Lozano, Oscar
> Cc: intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH 53/53] !UPSTREAM: drm/i915: Use MMIO
> flips
> 
> On Fri, Jun 13, 2014 at 04:38:11PM +0100, oscar.mateo@intel.com wrote:
> > From: Sourab Gupta <sourab.gupta@intel.com>
> >
> > If we want flips to work, either we create an Execlists-aware version
> > of intel_gen7_queue_flip, or we don't place commands directly in the
> > ringbuffer.
> >
> > When upstreamed, this patch should implement the second option:
> 
> Usually we just mention such requirements in the cover letter of the series
> and don't include them.
> -Daniel

Well, since we have it here, there is something I wanted to ask about this:

MODULE_PARM_DESC(use_mmio_flip, "use MMIO flips (-1=never, 0=driver discretion [default], 1=always)"); 

The "driver discretion" option will blow up with Execlists :(
How do I tackle this? an execlists version of intel_gen7_queue_flip? sanitizing use_mmio_flip when execlists get enabled? I really don't have a preference...
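
If it helps, the sanitizing option could be as simple as (a sketch; assuming a per-flip helper like this one):

	static bool use_mmio_flip(struct intel_engine_cs *ring)
	{
		if (i915.use_mmio_flip == -1)		/* never */
			return false;
		if (i915.use_mmio_flip == 1)		/* always */
			return true;
		/* driver discretion: CS flips won't work with Execlists */
		return intel_enable_execlists(ring->dev);
	}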

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 53/53] !UPSTREAM: drm/i915: Use MMIO flips
  2014-06-19  9:50     ` Mateo Lozano, Oscar
@ 2014-06-19 10:04       ` Daniel Vetter
  2014-06-19 10:13       ` Chris Wilson
  1 sibling, 0 replies; 156+ messages in thread
From: Daniel Vetter @ 2014-06-19 10:04 UTC (permalink / raw)
  To: Mateo Lozano, Oscar; +Cc: intel-gfx

On Thu, Jun 19, 2014 at 11:50 AM, Mateo Lozano, Oscar
<oscar.mateo@intel.com> wrote:
>> -----Original Message-----
>> From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel
>> Vetter
>> Sent: Wednesday, June 18, 2014 10:01 PM
>> To: Mateo Lozano, Oscar
>> Cc: intel-gfx@lists.freedesktop.org
>> Subject: Re: [Intel-gfx] [PATCH 53/53] !UPSTREAM: drm/i915: Use MMIO
>> flips
>>
>> On Fri, Jun 13, 2014 at 04:38:11PM +0100, oscar.mateo@intel.com wrote:
>> > From: Sourab Gupta <sourab.gupta@intel.com>
>> >
>> > If we want flips to work, either we create an Execlists-aware version
>> > of intel_gen7_queue_flip, or we don't place commands directly in the
>> > ringbuffer.
>> >
>> > When upstreamed, this patch should implement the second option:
>>
>> Usually we just mention such requirements in the cover letter of the series
>> and don't include them.
>> -Daniel
>
> Well, since we have it here, there is something I wanted to ask about this:
>
> MODULE_PARM_DESC(use_mmio_flip, "use MMIO flips (-1=never, 0=driver discretion [default], 1=always)");
>
> The "driver discretion" option will blow up with Execlists :(
> How do I tackle this? an execlists version of intel_gen7_queue_flip? sanitizing use_mmio_flip when execlists get enabled? I really don't have a preference...

Hm, usually we have it -1=per-platform defaults, 0=force disable,
1=force enable. Hooray for inconsistency. But anyway just make the
default with execlist be mmio flips. Tbh I don't see that much value
in this option, might as well kick it out and use mmio flips on gen7+
unconditionally for everything. Presuming they don't have any other
downsides.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 09/53] drm/i915/bdw: Initialization for Logical Ring Contexts
  2014-06-19  9:23     ` Mateo Lozano, Oscar
@ 2014-06-19 10:08       ` Daniel Vetter
  2014-06-19 10:10         ` Mateo Lozano, Oscar
  0 siblings, 1 reply; 156+ messages in thread
From: Daniel Vetter @ 2014-06-19 10:08 UTC (permalink / raw)
  To: Mateo Lozano, Oscar; +Cc: intel-gfx

On Thu, Jun 19, 2014 at 11:23 AM, Mateo Lozano, Oscar
<oscar.mateo@intel.com> wrote:
>> > -static bool hw_context_enabled(struct drm_device *dev)
>> > +static bool contexts_enabled(struct drm_device *dev)
>> >  {
>> > -   return to_i915(dev)->hw_context_size;
>> > +   struct drm_i915_private *dev_priv = to_i915(dev);
>> > +
>> > +   /* FIXME: this would be cleaner with a "context type" enum */
>> > +   return dev_priv->lrc_enabled || dev_priv->hw_context_size;
>>
>> Since you have a bunch of if ladders the usual approach isn't an enum but a
>> vfunc table to abstract behaviour. Think object types instead of switch
>> statements. Style bikeshed though (presume code later on doesn't have
>> excesses here).
>> -Daniel
>
> Hmmmm... I offered to do this with vfuncs early on, but you mentioned special-casing should be enough. And I agreed: in the end, the LR contexts are not that different from traditional HW contexts. This is what I had in mind:
>
> CTX_TYPE_FAKE: no backing objects.
> CTX_TYPE_HW: one render backing object at creation time.
> CTX_TYPE_LR: n backing objects, with deferred creation. A few functions are moot (e.g. switch, reset).
>
> The current system (looking at dev_priv->hw_context_size to distinguish fake from hw contexts) is imo a bit obfuscated.
> And we can always abstract this away with vfuncs if it becomes too complex in the future...
>
> What do you think? can special-casing make do for the time being?

Yeah I generally promote the rule-of-thumb to only do vfuncs once we
have the 3rd case (and I don't think we should count fake contexts
really). Until then if-ladders and hacks are good enough. Actually
better since usually you need a few completely different platforms to
know what's required of your vfunc interfaces to cover it all.

I really only latched onto this because of your FIXME comment, no
other reason at all. So if we decide that some reorg helps the code a
lot we can do that as a follow-up, but really not required upfront
imo.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 09/53] drm/i915/bdw: Initialization for Logical Ring Contexts
  2014-06-19 10:08       ` Daniel Vetter
@ 2014-06-19 10:10         ` Mateo Lozano, Oscar
  2014-06-19 10:34           ` Daniel Vetter
  0 siblings, 1 reply; 156+ messages in thread
From: Mateo Lozano, Oscar @ 2014-06-19 10:10 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

> -----Original Message-----
> From: daniel.vetter@ffwll.ch [mailto:daniel.vetter@ffwll.ch] On Behalf Of
> Daniel Vetter
> Sent: Thursday, June 19, 2014 11:08 AM
> To: Mateo Lozano, Oscar
> Cc: intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH 09/53] drm/i915/bdw: Initialization for
> Logical Ring Contexts
> 
> On Thu, Jun 19, 2014 at 11:23 AM, Mateo Lozano, Oscar
> <oscar.mateo@intel.com> wrote:
> >> > -static bool hw_context_enabled(struct drm_device *dev)
> >> > +static bool contexts_enabled(struct drm_device *dev)
> >> >  {
> >> > -   return to_i915(dev)->hw_context_size;
> >> > +   struct drm_i915_private *dev_priv = to_i915(dev);
> >> > +
> >> > +   /* FIXME: this would be cleaner with a "context type" enum */
> >> > +   return dev_priv->lrc_enabled || dev_priv->hw_context_size;
> >>
> >> Since you have a bunch of if ladders the usual approach isn't an enum
> >> but a vfunc table to abstract behaviour. Think object types instead
> >> of switch statements. Style bikeshed though (presume code later on
> >> doesn't have excesses here).
> >> -Daniel
> >
> > Hmmmm... I offered to do this with vfuncs early on, but you mentioned
> special-casing should be enough. And I agreed: in the end, the LR contexts
> are not that different from traditional HW contexts. This is what I had in
> mind:
> >
> > CTX_TYPE_FAKE: no backing objects.
> > CTX_TYPE_HW: one render backing object at creation time.
> > CTX_TYPE_LR: n backing objects, with deferred creation. A few functions
> are moot (e.g. switch, reset).
> >
> > The current system (looking at dev_priv->hw_context_size to distinguish
> fake from hw contexts) is imo a bit obfuscated.
> > And we can always abstract this away with vfuncs if it becomes too
> complex in the future...
> >
> > What do you think? can special-casing make do for the time being?
> 
> Yeah I generally promote the rule-of-thumb to only do vfuncs once we have
> the 3rd case (and I don't think we should count fake contexts really). Until
> then if-ladders and hacks are good enough. Actually better since usually you
> need a few completely different platforms to know what's required of your
> vfunc interfaces to cover it all.
> 
> I really only latched onto this because of your FIXME comment, no other
> reason at all. So if we decide that some reorg helps the code a lot we can do
> that as a follow-up, but really not required upfront imo.

So green light to the enum idea? It'll look better than the current dev_priv->hw_context_size + dev_priv->lrc_enabled and I can send it early as prep-work...
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 53/53] !UPSTREAM: drm/i915: Use MMIO flips
  2014-06-19  9:50     ` Mateo Lozano, Oscar
  2014-06-19 10:04       ` Daniel Vetter
@ 2014-06-19 10:13       ` Chris Wilson
  2014-06-19 10:33         ` Mateo Lozano, Oscar
  1 sibling, 1 reply; 156+ messages in thread
From: Chris Wilson @ 2014-06-19 10:13 UTC (permalink / raw)
  To: Mateo Lozano, Oscar; +Cc: intel-gfx

On Thu, Jun 19, 2014 at 09:50:33AM +0000, Mateo Lozano, Oscar wrote:
> > -----Original Message-----
> > From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel
> > Vetter
> > Sent: Wednesday, June 18, 2014 10:01 PM
> > To: Mateo Lozano, Oscar
> > Cc: intel-gfx@lists.freedesktop.org
> > Subject: Re: [Intel-gfx] [PATCH 53/53] !UPSTREAM: drm/i915: Use MMIO
> > flips
> > 
> > On Fri, Jun 13, 2014 at 04:38:11PM +0100, oscar.mateo@intel.com wrote:
> > > From: Sourab Gupta <sourab.gupta@intel.com>
> > >
> > > If we want flips to work, either we create an Execlists-aware version
> > > of intel_gen7_queue_flip, or we don't place commands directly in the
> > > ringbuffer.
> > >
> > > When upstreamed, this patch should implement the second option:
> > 
> > Usually we just mention such requirements in the cover letter of the series
> > and don't include them.
> > -Daniel
> 
> Well, since we have it here, there is something I wanted to ask about this:
> 
> MODULE_PARM_DESC(use_mmio_flip, "use MMIO flips (-1=never, 0=driver discretion [default], 1=always)"); 
> 
> The "driver discretion" option will blow up with Execlists :(
> How do I tackle this? an execlists version of intel_gen7_queue_flip? sanitizing use_mmio_flip when execlists get enabled? I really don't have a preference...

Why do execlists make it hard to add a command to the last lrc that the
object was on?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 53/53] !UPSTREAM: drm/i915: Use MMIO flips
  2014-06-19 10:13       ` Chris Wilson
@ 2014-06-19 10:33         ` Mateo Lozano, Oscar
  0 siblings, 0 replies; 156+ messages in thread
From: Mateo Lozano, Oscar @ 2014-06-19 10:33 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

> -----Original Message-----
> From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
> Sent: Thursday, June 19, 2014 11:14 AM
> To: Mateo Lozano, Oscar
> Cc: Daniel Vetter; intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH 53/53] !UPSTREAM: drm/i915: Use MMIO
> flips
> 
> On Thu, Jun 19, 2014 at 09:50:33AM +0000, Mateo Lozano, Oscar wrote:
> > > -----Original Message-----
> > > From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of
> > > Daniel Vetter
> > > Sent: Wednesday, June 18, 2014 10:01 PM
> > > To: Mateo Lozano, Oscar
> > > Cc: intel-gfx@lists.freedesktop.org
> > > Subject: Re: [Intel-gfx] [PATCH 53/53] !UPSTREAM: drm/i915: Use MMIO
> > > flips
> > >
> > > On Fri, Jun 13, 2014 at 04:38:11PM +0100, oscar.mateo@intel.com
> wrote:
> > > > From: Sourab Gupta <sourab.gupta@intel.com>
> > > >
> > > > If we want flips to work, either we create an Execlists-aware
> > > > version of intel_gen7_queue_flip, or we don't place commands
> > > > directly in the ringbuffer.
> > > >
> > > > When upstreamed, this patch should implement the second option:
> > >
> > > Usually we just mention such requirements in the cover letter of the
> > > series and don't include them.
> > > -Daniel
> >
> > Well, since we have it here, there is something I wanted to ask about this:
> >
> > MODULE_PARM_DESC(use_mmio_flip, "use MMIO flips (-1=never, 0=driver
> > discretion [default], 1=always)");
> >
> > The "driver discretion" option will blow up with Execlists :( How do I
> > tackle this? an execlists version of intel_gen7_queue_flip? sanitizing
> use_mmio_flip when execlists get enabled? I really don't have a preference...
> 
> Why does execlist make it hard to add a command to the last lrc that the
> object was on?
> -Chris

Well, it's not hard: up to v2 of the series I reused intel_ring_begin, intel_ring_emit, etc. to simply accept a context (in this case, ring->default_context) but since now we have a different set of functions, it needs some more frobbing. Now, I don't mind doing that as a special case, but I was told I shouldn't worry about flips because we were changing them for MMIO flips anyway...
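
FWIW, the special case would look roughly like this, reusing the new helpers from this series against the default context (hypothetical and untested; plane_bit, fb, obj and intel_crtc come from the surrounding flip function):

	struct intel_context *dctx = ring->default_context;
	struct intel_ringbuffer *ringbuf = dctx->engine[ring->id].ringbuf;
	int ret;

	ret = intel_logical_ring_begin(ring, dctx, 4);
	if (ret)
		return ret;

	intel_logical_ring_emit(ringbuf, MI_DISPLAY_FLIP_I915 | plane_bit);
	intel_logical_ring_emit(ringbuf, fb->pitches[0] | obj->tiling_mode);
	intel_logical_ring_emit(ringbuf, intel_crtc->unpin_work->gtt_offset);
	intel_logical_ring_emit(ringbuf, MI_NOOP);
	intel_logical_ring_advance_and_submit(ring, dctx);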

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 09/53] drm/i915/bdw: Initialization for Logical Ring Contexts
  2014-06-19 10:10         ` Mateo Lozano, Oscar
@ 2014-06-19 10:34           ` Daniel Vetter
  0 siblings, 0 replies; 156+ messages in thread
From: Daniel Vetter @ 2014-06-19 10:34 UTC (permalink / raw)
  To: Mateo Lozano, Oscar; +Cc: intel-gfx

On Thu, Jun 19, 2014 at 12:10 PM, Mateo Lozano, Oscar
<oscar.mateo@intel.com> wrote:
> So green light to the enum idea? It'll look better than the current dev_priv->hw_context_size + dev_priv->lrc_enabled and I can send it early as prep-work...

Yeah if you like it I'm ok with it, so please go ahead.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 04/53] drm/i915: Extract ringbuffer destroy & make alloc outside accesible
  2014-06-18 21:39   ` Volkin, Bradley D
@ 2014-06-19 10:42     ` Mateo Lozano, Oscar
  0 siblings, 0 replies; 156+ messages in thread
From: Mateo Lozano, Oscar @ 2014-06-19 10:42 UTC (permalink / raw)
  To: Volkin, Bradley D; +Cc: intel-gfx

> -----Original Message-----
> From: Volkin, Bradley D
> Sent: Wednesday, June 18, 2014 10:39 PM
> To: Mateo Lozano, Oscar
> Cc: intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH 04/53] drm/i915: Extract ringbuffer destroy &
> make alloc outside accesible
> 
> On Fri, Jun 13, 2014 at 08:37:22AM -0700, oscar.mateo@intel.com wrote:
> > From: Oscar Mateo <oscar.mateo@intel.com>
> >
> > We are going to start creating a lot of extra ringbuffers soon, so
> > these functions are handy.
> >
> > No functional changes.
> >
> > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> > ---
> >  drivers/gpu/drm/i915/intel_ringbuffer.c | 26
> > ++++++++++++++++----------  drivers/gpu/drm/i915/intel_ringbuffer.h |
> > 4 ++++
> >  2 files changed, 20 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > index 279488a..915f3d5 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > @@ -1378,15 +1378,25 @@ static int init_phys_status_page(struct
> intel_engine_cs *ring)
> >  	return 0;
> >  }
> >
> > -static int allocate_ring_buffer(struct intel_engine_cs *ring)
> > +void intel_destroy_ring_buffer(struct intel_ringbuffer *ringbuf) {
> > +	if (!ringbuf->obj)
> > +		return;
> > +
> > +	iounmap(ringbuf->virtual_start);
> > +	i915_gem_object_ggtt_unpin(ringbuf->obj);
> > +	drm_gem_object_unreference(&ringbuf->obj->base);
> > +	ringbuf->obj = NULL;
> > +}
> > +
> > +int intel_allocate_ring_buffer(struct drm_device *dev,
> > +			       struct intel_ringbuffer *ringbuf)
> 
> A bikeshed, but maybe intel_alloc_ringbuffer_obj() since we're only
> allocating the backing object, and to mirror the earlier
> i915_gem_alloc_context_obj() with similar purpose. Otherwise, looks fine to
> me.
> 
> Brad

Bikeshed accepted! It does look better.
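
i.e. (sketch of the new names):

	int intel_alloc_ringbuffer_obj(struct drm_device *dev,
				       struct intel_ringbuffer *ringbuf);
	void intel_destroy_ringbuffer_obj(struct intel_ringbuffer *ringbuf);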

-- Oscar

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 06/53] drm/i915/bdw: Introduce one context backing object per engine
  2014-06-19  8:52     ` Mateo Lozano, Oscar
@ 2014-06-19 10:57       ` Daniel Vetter
  0 siblings, 0 replies; 156+ messages in thread
From: Daniel Vetter @ 2014-06-19 10:57 UTC (permalink / raw)
  To: Mateo Lozano, Oscar; +Cc: intel-gfx

On Thu, Jun 19, 2014 at 10:52 AM, Mateo Lozano, Oscar
<oscar.mateo@intel.com> wrote:
>> > v2: Leave the old backing object pointer behind. Daniel Vetter
>> > suggested using a union, but it makes more sense to keep render_obj as
>> > a NULL pointer behind, to make sure no one uses it incorrectly when
>> > Execlists are enabled, similar to what we are doing with ring->buffer
>> > (Rusty's API level 5).
>
> Not sure if you agree with this or you still prefer the union?

Well the union has the same idea but using less space. Not really
worth here though at all, so I'm ok with your approach. In any case
subclassing is usually the better approach than having a union.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 19/53] drm/i915: Extract pipe control fini & make init outside accesible
  2014-06-19  0:04   ` Volkin, Bradley D
@ 2014-06-19 10:58     ` Mateo Lozano, Oscar
  0 siblings, 0 replies; 156+ messages in thread
From: Mateo Lozano, Oscar @ 2014-06-19 10:58 UTC (permalink / raw)
  To: Volkin, Bradley D; +Cc: intel-gfx

> -----Original Message-----
> From: Volkin, Bradley D
> Sent: Thursday, June 19, 2014 1:04 AM
> To: Mateo Lozano, Oscar
> Cc: intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH 19/53] drm/i915: Extract pipe control fini &
> make init outside accesible
> 
> On Fri, Jun 13, 2014 at 08:37:37AM -0700, oscar.mateo@intel.com wrote:
> > From: Oscar Mateo <oscar.mateo@intel.com>
> >
> > I plan to reuse these for the new logical ring path.
> >
> > No functional changes.
> >
> > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> > ---
> >  drivers/gpu/drm/i915/intel_ringbuffer.c | 31
> > ++++++++++++++++++-------------
> > drivers/gpu/drm/i915/intel_ringbuffer.h |  3 +++
> >  2 files changed, 21 insertions(+), 13 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > index 4a71dd4..254e4c5 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > @@ -574,8 +574,21 @@ out:
> >  	return ret;
> >  }
> >
> > -static int
> > -init_pipe_control(struct intel_engine_cs *ring)
> > +void
> > +intel_fini_pipe_control(struct intel_engine_cs *ring) {
> > +	if (ring->scratch.obj == NULL)
> > +		return;
> > +
> > +	kunmap(sg_page(ring->scratch.obj->pages->sgl));
> > +	i915_gem_object_ggtt_unpin(ring->scratch.obj);
> > +
> > +	drm_gem_object_unreference(&ring->scratch.obj->base);
> > +	ring->scratch.obj = NULL;
> > +}
> > +
> > +int
> > +intel_init_pipe_control(struct intel_engine_cs *ring)
> >  {
> >  	int ret;
> >
> > @@ -648,7 +661,7 @@ static int init_render_ring(struct intel_engine_cs
> *ring)
> >  			   _MASKED_BIT_ENABLE(GFX_REPLAY_MODE));
> >
> >  	if (INTEL_INFO(dev)->gen >= 5) {
> > -		ret = init_pipe_control(ring);
> > +		ret = intel_init_pipe_control(ring);
> >  		if (ret)
> >  			return ret;
> >  	}
> > @@ -676,16 +689,8 @@ static void render_ring_cleanup(struct
> > intel_engine_cs *ring)  {
> >  	struct drm_device *dev = ring->dev;
> >
> > -	if (ring->scratch.obj == NULL)
> > -		return;
> > -
> > -	if (INTEL_INFO(dev)->gen >= 5) {
> > -		kunmap(sg_page(ring->scratch.obj->pages->sgl));
> > -		i915_gem_object_ggtt_unpin(ring->scratch.obj);
> > -	}
> > -
> > -	drm_gem_object_unreference(&ring->scratch.obj->base);
> > -	ring->scratch.obj = NULL;
> > +	if (INTEL_INFO(dev)->gen >= 5)
> > +		intel_fini_pipe_control(ring);
> 
> It looks like we've changed the behavior such that for the case of scratch.obj
> != NULL && gen < 5, we don't unreference scratch.obj.
> I don't know if we can ever hit that case though.

I think you're right, what was I thinking? :(
Thanks for the heads up!
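
For the record, something like this should preserve the old behaviour (sketch):

	static void render_ring_cleanup(struct intel_engine_cs *ring)
	{
		struct drm_device *dev = ring->dev;

		if (ring->scratch.obj == NULL)
			return;

		if (INTEL_INFO(dev)->gen >= 5) {
			intel_fini_pipe_control(ring);
		} else {
			/* gen < 5: nothing was mapped or pinned, but we
			 * still have to drop the reference */
			drm_gem_object_unreference(&ring->scratch.obj->base);
			ring->scratch.obj = NULL;
		}
	}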

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 23/53] drm/i915: Generalize intel_ring_get_tail
  2014-06-13 15:37 ` [PATCH 23/53] drm/i915: Generalize intel_ring_get_tail oscar.mateo
@ 2014-06-20 20:17   ` Volkin, Bradley D
  0 siblings, 0 replies; 156+ messages in thread
From: Volkin, Bradley D @ 2014-06-20 20:17 UTC (permalink / raw)
  To: oscar.mateo; +Cc: intel-gfx

On Fri, Jun 13, 2014 at 08:37:41AM -0700, oscar.mateo@intel.com wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> Reusing stuff, a penny at a time.
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem.c         | 4 ++--
>  drivers/gpu/drm/i915/intel_ringbuffer.h | 4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index c5c06c9..dcdffab 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2320,7 +2320,7 @@ int __i915_add_request(struct intel_engine_cs *ring,
>  	u32 request_ring_position, request_start;
>  	int ret;
>  
> -	request_start = intel_ring_get_tail(ring);
> +	request_start = intel_ring_get_tail(ring->buffer);
>  	/*
>  	 * Emit any outstanding flushes - execbuf can fail to emit the flush
>  	 * after having emitted the batchbuffer command. Hence we need to fix
> @@ -2341,7 +2341,7 @@ int __i915_add_request(struct intel_engine_cs *ring,
>  	 * GPU processing the request, we never over-estimate the
>  	 * position of the head.
>  	 */
> -	request_ring_position = intel_ring_get_tail(ring);
> +	request_ring_position = intel_ring_get_tail(ring->buffer);
>  
>  	ret = ring->add_request(ring);
>  	if (ret)
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index dc944fe..1558afa 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -334,9 +334,9 @@ void intel_destroy_ring_buffer(struct intel_ringbuffer *ringbuf);
>  int intel_allocate_ring_buffer(struct drm_device *dev,
>  			       struct intel_ringbuffer *ringbuf);
>  
> -static inline u32 intel_ring_get_tail(struct intel_engine_cs *ring)
> +static inline u32 intel_ring_get_tail(struct intel_ringbuffer *ringbuf)
>  {
> -	return ring->buffer->tail;
> +	return ringbuf->tail;
>  }

Another naming bikeshed, for this and the previous patch:
It might be unexpected to have all of the intel_ring_ functions except for
two take a struct intel_engine_cs and then have this and intel_ring_space
take a struct intel_ringbuffer. So maybe intel_ringbuffer_ or similar for
those two.
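
e.g. (sketch):

	static inline u32 intel_ringbuffer_get_tail(struct intel_ringbuffer *ringbuf)
	{
		return ringbuf->tail;
	}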

Brad

>  
>  static inline u32 intel_ring_get_seqno(struct intel_engine_cs *ring)
> -- 
> 1.9.0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 25/53] drm/i915/bdw: GEN-specific logical ring submit context (somewhat)
  2014-06-13 15:37 ` [PATCH 25/53] drm/i915/bdw: GEN-specific logical ring submit context (somewhat) oscar.mateo
@ 2014-06-20 20:28   ` Volkin, Bradley D
  2014-06-23 12:49     ` Mateo Lozano, Oscar
  0 siblings, 1 reply; 156+ messages in thread
From: Volkin, Bradley D @ 2014-06-20 20:28 UTC (permalink / raw)
  To: oscar.mateo; +Cc: intel-gfx

On Fri, Jun 13, 2014 at 08:37:43AM -0700, oscar.mateo@intel.com wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> For the moment, just mark the place (we still need to do a lot of
> preparation before execlists are ready to start submitting things).
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_lrc.c        | 11 +++++++++++
>  drivers/gpu/drm/i915/intel_ringbuffer.h |  6 ++++++
>  2 files changed, 17 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 6c62ae5..02fc3d0 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -139,6 +139,12 @@ static void gen8_set_seqno(struct intel_engine_cs *ring, u32 seqno)
>  	intel_write_status_page(ring, I915_GEM_HWS_INDEX, seqno);
>  }
>  
> +static void gen8_submit_ctx(struct intel_engine_cs *ring,
> +			    struct intel_context *ctx, u32 value)
> +{
> +	DRM_ERROR("Execlists still not ready!\n");
> +}
> +
>  void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
>  {
>  	if (!intel_ring_initialized(ring))
> @@ -213,6 +219,7 @@ static int logical_render_ring_init(struct drm_device *dev)
>  	ring->cleanup = intel_fini_pipe_control;
>  	ring->get_seqno = gen8_get_seqno;
>  	ring->set_seqno = gen8_set_seqno;
> +	ring->submit_ctx = gen8_submit_ctx;
>  
>  	return logical_ring_init(dev, ring);
>  }
> @@ -231,6 +238,7 @@ static int logical_bsd_ring_init(struct drm_device *dev)
>  	ring->init = gen8_init_common_ring;
>  	ring->get_seqno = gen8_get_seqno;
>  	ring->set_seqno = gen8_set_seqno;
> +	ring->submit_ctx = gen8_submit_ctx;
>  
>  	return logical_ring_init(dev, ring);
>  }
> @@ -249,6 +257,7 @@ static int logical_bsd2_ring_init(struct drm_device *dev)
>  	ring->init = gen8_init_common_ring;
>  	ring->get_seqno = gen8_get_seqno;
>  	ring->set_seqno = gen8_set_seqno;
> +	ring->submit_ctx = gen8_submit_ctx;
>  
>  	return logical_ring_init(dev, ring);
>  }
> @@ -267,6 +276,7 @@ static int logical_blt_ring_init(struct drm_device *dev)
>  	ring->init = gen8_init_common_ring;
>  	ring->get_seqno = gen8_get_seqno;
>  	ring->set_seqno = gen8_set_seqno;
> +	ring->submit_ctx = gen8_submit_ctx;
>  
>  	return logical_ring_init(dev, ring);
>  }
> @@ -285,6 +295,7 @@ static int logical_vebox_ring_init(struct drm_device *dev)
>  	ring->init = gen8_init_common_ring;
>  	ring->get_seqno = gen8_get_seqno;
>  	ring->set_seqno = gen8_set_seqno;
> +	ring->submit_ctx = gen8_submit_ctx;
>  
>  	return logical_ring_init(dev, ring);
>  }
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index ff8753c..1a6df42 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -79,6 +79,8 @@ struct intel_ringbuffer {
>  	u32 last_retired_head;
>  };
>  
> +struct intel_context;
> +
>  struct  intel_engine_cs {
>  	const char	*name;
>  	enum intel_ring_id {
> @@ -146,6 +148,10 @@ struct  intel_engine_cs {
>  				  unsigned int num_dwords);
>  	} semaphore;
>  
> +	/* Execlists */
> +	void		(*submit_ctx)(struct intel_engine_cs *ring,
> +				      struct intel_context *ctx, u32 value);
> +

Is it worth making this a vfunc in the refactored codebase? It ends up as
the same function for all engines...called in one place...the implementation
of which is a single call to another function that takes the same arguments.
Previously this was an implementation of the write_tail vfunc, so it made
sense. I'm not so sure now.
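
i.e., the single call site could just be (sketch):

	/* in intel_logical_ring_advance_and_submit(), instead of the vfunc: */
	gen8_submit_ctx(ring, ctx, ringbuf->tail);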

Brad

>  	/**
>  	 * List of objects currently involved in rendering from the
>  	 * ringbuffer.
> -- 
> 1.9.0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 26/53] drm/i915/bdw: New logical ring submission mechanism
  2014-06-13 15:37 ` [PATCH 26/53] drm/i915/bdw: New logical ring submission mechanism oscar.mateo
@ 2014-06-20 21:00   ` Volkin, Bradley D
  2014-06-23 13:09     ` Mateo Lozano, Oscar
  0 siblings, 1 reply; 156+ messages in thread
From: Volkin, Bradley D @ 2014-06-20 21:00 UTC (permalink / raw)
  To: oscar.mateo; +Cc: intel-gfx

On Fri, Jun 13, 2014 at 08:37:44AM -0700, oscar.mateo@intel.com wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> Well, new-ish: if all this code looks familiar, that's because it's
> a clone of the existing submission mechanism (with some modifications
> here and there to adapt it to LRCs and Execlists).
> 
> And why did we do this? Execlists offer several advantages, like
> control over when the GPU is done with a given workload, that can
> help simplify the submission mechanism, no doubt, but I am interested
> in getting Execlists to work first and foremost. As we are creating
> a parallel submission mechanism (even if it's just a clone), we can
> now start improving it without the fear of breaking old gens.
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_lrc.c | 214 +++++++++++++++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/intel_lrc.h |  18 ++++
>  2 files changed, 232 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 02fc3d0..89aed7a 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -86,6 +86,220 @@ bool intel_enable_execlists(struct drm_device *dev)
>  	return HAS_LOGICAL_RING_CONTEXTS(dev) && USES_PPGTT(dev);
>  }
>  
> +static inline struct intel_ringbuffer *
> +logical_ringbuf_get(struct intel_engine_cs *ring, struct intel_context *ctx)
> +{
> +	return ctx->engine[ring->id].ringbuf;
> +}
> +
> +void intel_logical_ring_advance_and_submit(struct intel_engine_cs *ring,
> +					   struct intel_context *ctx)
> +{
> +	struct intel_ringbuffer *ringbuf = logical_ringbuf_get(ring, ctx);
> +
> +	intel_logical_ring_advance(ringbuf);
> +
> +	if (intel_ring_stopped(ring))
> +		return;
> +
> +	ring->submit_ctx(ring, ctx, ringbuf->tail);
> +}
> +
> +static int logical_ring_alloc_seqno(struct intel_engine_cs *ring,
> +				    struct intel_context *ctx)
> +{
> +	if (ring->outstanding_lazy_seqno)
> +		return 0;
> +
> +	if (ring->preallocated_lazy_request == NULL) {
> +		struct drm_i915_gem_request *request;
> +
> +		request = kmalloc(sizeof(*request), GFP_KERNEL);
> +		if (request == NULL)
> +			return -ENOMEM;
> +
> +		ring->preallocated_lazy_request = request;
> +	}
> +
> +	return i915_gem_get_seqno(ring->dev, &ring->outstanding_lazy_seqno);
> +}
> +
> +static int logical_ring_wait_request(struct intel_engine_cs *ring,
> +				     struct intel_ringbuffer *ringbuf,
> +				     struct intel_context *ctx,
> +				     int bytes)
> +{
> +	struct drm_i915_gem_request *request;
> +	u32 seqno = 0;
> +	int ret;
> +
> +	if (ringbuf->last_retired_head != -1) {
> +		ringbuf->head = ringbuf->last_retired_head;
> +		ringbuf->last_retired_head = -1;
> +
> +		ringbuf->space = intel_ring_space(ringbuf);
> +		if (ringbuf->space >= bytes)
> +			return 0;
> +	}
> +
> +	list_for_each_entry(request, &ring->request_list, list) {
> +		if (__intel_ring_space(request->tail, ringbuf->tail,
> +				ringbuf->size) >= bytes) {
> +			seqno = request->seqno;
> +			break;
> +		}
> +	}
> +
> +	if (seqno == 0)
> +		return -ENOSPC;
> +
> +	ret = i915_wait_seqno(ring, seqno);
> +	if (ret)
> +		return ret;
> +
> +	/* TODO: make sure we update the right ringbuffer's last_retired_head
> +	 * when retiring requests */
> +	i915_gem_retire_requests_ring(ring);
> +	ringbuf->head = ringbuf->last_retired_head;
> +	ringbuf->last_retired_head = -1;
> +
> +	ringbuf->space = intel_ring_space(ringbuf);
> +	return 0;
> +}
> +
> +static int logical_ring_wait_for_space(struct intel_engine_cs *ring,
> +						   struct intel_ringbuffer *ringbuf,
> +						   struct intel_context *ctx,
> +						   int bytes)
> +{
> +	struct drm_device *dev = ring->dev;
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	unsigned long end;
> +	int ret;
> +
> +	ret = logical_ring_wait_request(ring, ringbuf, ctx, bytes);
> +	if (ret != -ENOSPC)
> +		return ret;
> +
> +	/* Force the context submission in case we have been skipping it */
> +	intel_logical_ring_advance_and_submit(ring, ctx);
> +
> +	/* With GEM the hangcheck timer should kick us out of the loop,
> +	 * leaving it early runs the risk of corrupting GEM state (due
> +	 * to running on almost untested codepaths). But on resume
> +	 * timers don't work yet, so prevent a complete hang in that
> +	 * case by choosing an insanely large timeout. */
> +	end = jiffies + 60 * HZ;
> +

In the legacy ringbuffer version, there are tracepoints around the do loop.
Should we keep those? Or add lrc-specific equivalents?
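
For reference, the legacy version brackets the loop like so; an equivalent here would be (sketch):

	trace_i915_ring_wait_begin(ring);
	do {
		/* ... poll for space, as above ... */
	} while (1);
	trace_i915_ring_wait_end(ring);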

> +	do {
> +		ringbuf->head = I915_READ_HEAD(ring);
> +		ringbuf->space = intel_ring_space(ringbuf);
> +		if (ringbuf->space >= bytes) {
> +			ret = 0;
> +			break;
> +		}
> +
> +		if (!drm_core_check_feature(dev, DRIVER_MODESET) &&
> +		    dev->primary->master) {
> +			struct drm_i915_master_private *master_priv = dev->primary->master->driver_priv;
> +			if (master_priv->sarea_priv)
> +				master_priv->sarea_priv->perf_boxes |= I915_BOX_WAIT;
> +		}
> +
> +		msleep(1);
> +
> +		if (dev_priv->mm.interruptible && signal_pending(current)) {
> +			ret = -ERESTARTSYS;
> +			break;
> +		}
> +
> +		ret = i915_gem_check_wedge(&dev_priv->gpu_error,
> +					   dev_priv->mm.interruptible);
> +		if (ret)
> +			break;
> +
> +		if (time_after(jiffies, end)) {
> +			ret = -EBUSY;
> +			break;
> +		}
> +	} while (1);
> +
> +	return ret;
> +}
> +
> +static int logical_ring_wrap_buffer(struct intel_engine_cs *ring,
> +						struct intel_ringbuffer *ringbuf,
> +						struct intel_context *ctx)
> +{
> +	uint32_t __iomem *virt;
> +	int rem = ringbuf->size - ringbuf->tail;
> +
> +	if (ringbuf->space < rem) {
> +		int ret = logical_ring_wait_for_space(ring, ringbuf, ctx, rem);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	virt = ringbuf->virtual_start + ringbuf->tail;
> +	rem /= 4;
> +	while (rem--)
> +		iowrite32(MI_NOOP, virt++);
> +
> +	ringbuf->tail = 0;
> +	ringbuf->space = intel_ring_space(ringbuf);
> +
> +	return 0;
> +}
> +
> +static int logical_ring_prepare(struct intel_engine_cs *ring,
> +				struct intel_ringbuffer *ringbuf,
> +				struct intel_context *ctx,
> +				int bytes)
> +{
> +	int ret;
> +
> +	if (unlikely(ringbuf->tail + bytes > ringbuf->effective_size)) {
> +		ret = logical_ring_wrap_buffer(ring, ringbuf, ctx);
> +		if (unlikely(ret))
> +			return ret;
> +	}
> +
> +	if (unlikely(ringbuf->space < bytes)) {
> +		ret = logical_ring_wait_for_space(ring, ringbuf, ctx, bytes);
> +		if (unlikely(ret))
> +			return ret;
> +	}
> +
> +	return 0;
> +}
> +
> +int intel_logical_ring_begin(struct intel_engine_cs *ring,
> +			     struct intel_context *ctx,
> +			     int num_dwords)
> +{
> +	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> +	struct intel_ringbuffer *ringbuf = logical_ringbuf_get(ring, ctx);
> +	int ret;
> +
> +	ret = i915_gem_check_wedge(&dev_priv->gpu_error,
> +				   dev_priv->mm.interruptible);
> +	if (ret)
> +		return ret;
> +
> +	ret = logical_ring_prepare(ring, ringbuf, ctx,
> +			num_dwords * sizeof(uint32_t));
> +	if (ret)
> +		return ret;
> +
> +	/* Preallocate the olr before touching the ring */
> +	ret = logical_ring_alloc_seqno(ring, ctx);
> +	if (ret)
> +		return ret;
> +
> +	ringbuf->space -= num_dwords * sizeof(uint32_t);
> +	return 0;
> +}
> +
>  static int gen8_init_common_ring(struct intel_engine_cs *ring)
>  {
>  	struct drm_device *dev = ring->dev;
> diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
> index 26b0949..686ebf5 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.h
> +++ b/drivers/gpu/drm/i915/intel_lrc.h
> @@ -5,6 +5,24 @@
>  void intel_logical_ring_cleanup(struct intel_engine_cs *ring);
>  int intel_logical_rings_init(struct drm_device *dev);
>  
> +void intel_logical_ring_advance_and_submit(struct intel_engine_cs *ring,
> +					   struct intel_context *ctx);
> +
> +static inline void intel_logical_ring_advance(struct intel_ringbuffer *ringbuf)
> +{
> +	ringbuf->tail &= ringbuf->size - 1;
> +}
> +
> +static inline void intel_logical_ring_emit(struct intel_ringbuffer *ringbuf, u32 data)
> +{
> +	iowrite32(data, ringbuf->virtual_start + ringbuf->tail);
> +	ringbuf->tail += 4;
> +}
> +
> +int intel_logical_ring_begin(struct intel_engine_cs *ring,
> +			     struct intel_context *ctx,
> +			     int num_dwords);
> +

I think all of these are only used in intel_lrc.c, so don't need to be in
the header and could all be static. Right?

Brad

>  /* Logical Ring Contexts */
>  void intel_lr_context_free(struct intel_context *ctx);
>  int intel_lr_context_deferred_create(struct intel_context *ctx,
> -- 
> 1.9.0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 27/53] drm/i915/bdw: GEN-specific logical ring emit request
  2014-06-13 15:37 ` [PATCH 27/53] drm/i915/bdw: GEN-specific logical ring emit request oscar.mateo
@ 2014-06-20 21:18   ` Volkin, Bradley D
  2014-06-23 15:48     ` Mateo Lozano, Oscar
  0 siblings, 1 reply; 156+ messages in thread
From: Volkin, Bradley D @ 2014-06-20 21:18 UTC (permalink / raw)
  To: oscar.mateo; +Cc: intel-gfx

On Fri, Jun 13, 2014 at 08:37:45AM -0700, oscar.mateo@intel.com wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> Very similar to the legacy add_request, only modified to account for
> logical ringbuffer.
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_reg.h         |  1 +
>  drivers/gpu/drm/i915/intel_lrc.c        | 61 +++++++++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/intel_ringbuffer.h |  2 ++
>  3 files changed, 64 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index 9c8692a..63ec3ea 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -267,6 +267,7 @@
>  #define   MI_FORCE_RESTORE		(1<<1)
>  #define   MI_RESTORE_INHIBIT		(1<<0)
>  #define MI_STORE_DWORD_IMM	MI_INSTR(0x20, 1)
> +#define MI_STORE_DWORD_IMM_GEN8	MI_INSTR(0x20, 2)
>  #define   MI_MEM_VIRTUAL	(1 << 22) /* 965+ only */
>  #define MI_STORE_DWORD_INDEX	MI_INSTR(0x21, 1)
>  #define   MI_STORE_DWORD_INDEX_SHIFT 2
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 89aed7a..3debe8b 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -359,6 +359,62 @@ static void gen8_submit_ctx(struct intel_engine_cs *ring,
>  	DRM_ERROR("Execlists still not ready!\n");
>  }
>  
> +static int gen8_emit_request(struct intel_engine_cs *ring,
> +			     struct intel_context *ctx)
> +{
> +	struct intel_ringbuffer *ringbuf = logical_ringbuf_get(ring, ctx);
> +	u32 cmd;
> +	int ret;
> +
> +	ret = intel_logical_ring_begin(ring, ctx, 6);
> +	if (ret)
> +		return ret;
> +
> +	cmd = MI_FLUSH_DW + 1;
> +	cmd |= MI_INVALIDATE_TLB;

Is the TLB invalidation truly required here? Otherwise it seems
like we could use the same function for all rings, like on gen6+.

> +	cmd |= MI_FLUSH_DW_OP_STOREDW;
> +
> +	intel_logical_ring_emit(ringbuf, cmd);
> +	intel_logical_ring_emit(ringbuf,
> +				(ring->status_page.gfx_addr +
> +				(I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT)) |
> +				MI_FLUSH_DW_USE_GTT);
> +	intel_logical_ring_emit(ringbuf, 0);
> +	intel_logical_ring_emit(ringbuf, ring->outstanding_lazy_seqno);
> +	intel_logical_ring_emit(ringbuf, MI_USER_INTERRUPT);
> +	intel_logical_ring_emit(ringbuf, MI_NOOP);
> +	intel_logical_ring_advance_and_submit(ring, ctx);
> +
> +	return 0;
> +}
> +
> +static int gen8_emit_request_render(struct intel_engine_cs *ring,
> +				    struct intel_context *ctx)
> +{
> +	struct intel_ringbuffer *ringbuf = logical_ringbuf_get(ring, ctx);
> +	u32 cmd;
> +	int ret;
> +
> +	ret = intel_logical_ring_begin(ring, ctx, 6);
> +	if (ret)
> +		return ret;
> +
> +	cmd = MI_STORE_DWORD_IMM_GEN8;
> +	cmd |= (1 << 22); /* use global GTT */

We could use MI_MEM_VIRTUAL or MI_GLOBAL_GTT instead.
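
i.e. (sketch):

	cmd = MI_STORE_DWORD_IMM_GEN8;
	cmd |= MI_GLOBAL_GTT;	/* instead of the bare (1 << 22) */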

Brad

> +
> +	intel_logical_ring_emit(ringbuf, cmd);
> +	intel_logical_ring_emit(ringbuf,
> +				(ring->status_page.gfx_addr +
> +				(I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT)));
> +	intel_logical_ring_emit(ringbuf, 0);
> +	intel_logical_ring_emit(ringbuf, ring->outstanding_lazy_seqno);
> +	intel_logical_ring_emit(ringbuf, MI_USER_INTERRUPT);
> +	intel_logical_ring_emit(ringbuf, MI_NOOP);
> +	intel_logical_ring_advance_and_submit(ring, ctx);
> +
> +	return 0;
> +}
> +
>  void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
>  {
>  	if (!intel_ring_initialized(ring))
> @@ -434,6 +490,7 @@ static int logical_render_ring_init(struct drm_device *dev)
>  	ring->get_seqno = gen8_get_seqno;
>  	ring->set_seqno = gen8_set_seqno;
>  	ring->submit_ctx = gen8_submit_ctx;
> +	ring->emit_request = gen8_emit_request_render;
>  
>  	return logical_ring_init(dev, ring);
>  }
> @@ -453,6 +510,7 @@ static int logical_bsd_ring_init(struct drm_device *dev)
>  	ring->get_seqno = gen8_get_seqno;
>  	ring->set_seqno = gen8_set_seqno;
>  	ring->submit_ctx = gen8_submit_ctx;
> +	ring->emit_request = gen8_emit_request;
>  
>  	return logical_ring_init(dev, ring);
>  }
> @@ -472,6 +530,7 @@ static int logical_bsd2_ring_init(struct drm_device *dev)
>  	ring->get_seqno = gen8_get_seqno;
>  	ring->set_seqno = gen8_set_seqno;
>  	ring->submit_ctx = gen8_submit_ctx;
> +	ring->emit_request = gen8_emit_request;
>  
>  	return logical_ring_init(dev, ring);
>  }
> @@ -491,6 +550,7 @@ static int logical_blt_ring_init(struct drm_device *dev)
>  	ring->get_seqno = gen8_get_seqno;
>  	ring->set_seqno = gen8_set_seqno;
>  	ring->submit_ctx = gen8_submit_ctx;
> +	ring->emit_request = gen8_emit_request;
>  
>  	return logical_ring_init(dev, ring);
>  }
> @@ -510,6 +570,7 @@ static int logical_vebox_ring_init(struct drm_device *dev)
>  	ring->get_seqno = gen8_get_seqno;
>  	ring->set_seqno = gen8_set_seqno;
>  	ring->submit_ctx = gen8_submit_ctx;
> +	ring->emit_request = gen8_emit_request;
>  
>  	return logical_ring_init(dev, ring);
>  }
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 1a6df42..d8ded14 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -151,6 +151,8 @@ struct  intel_engine_cs {
>  	/* Execlists */
>  	void		(*submit_ctx)(struct intel_engine_cs *ring,
>  				      struct intel_context *ctx, u32 value);
> +	int		(*emit_request)(struct intel_engine_cs *ring,
> +					struct intel_context *ctx);
>  
>  	/**
>  	 * List of objects currently involved in rendering from the
> -- 
> 1.9.0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 28/53] drm/i915/bdw: GEN-specific logical ring emit flush
  2014-06-13 15:37 ` [PATCH 28/53] drm/i915/bdw: GEN-specific logical ring emit flush oscar.mateo
@ 2014-06-20 21:39   ` Volkin, Bradley D
  0 siblings, 0 replies; 156+ messages in thread
From: Volkin, Bradley D @ 2014-06-20 21:39 UTC (permalink / raw)
  To: oscar.mateo; +Cc: intel-gfx

On Fri, Jun 13, 2014 at 08:37:46AM -0700, oscar.mateo@intel.com wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> Notice that the BSD invalidate bit is no longer present in GEN8, so

Hmm. As far as I can tell, it is still present for VCS on gen8. As to
whether we need to set it, I don't know.
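
If it does turn out to be needed, the consolidated function could still set
the bit just for the BSD rings, along the lines of (sketch only):

	if (invalidate_domains & I915_GEM_GPU_DOMAINS &&
	    (ring->id == VCS || ring->id == VCS2))
		cmd |= MI_INVALIDATE_BSD;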

> we can consolidate the blt and bsd ring flushes into one.
> 
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_lrc.c        | 80 +++++++++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/intel_ringbuffer.c |  7 ---
>  drivers/gpu/drm/i915/intel_ringbuffer.h | 11 +++++
>  3 files changed, 91 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 3debe8b..3d7fcd6 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -343,6 +343,81 @@ static int gen8_init_render_ring(struct intel_engine_cs *ring)
>  	return ret;
>  }
>  
> +static int gen8_emit_flush(struct intel_engine_cs *ring,
> +			   struct intel_context *ctx,
> +			   u32 invalidate_domains,
> +			   u32 unused)
> +{
> +	struct intel_ringbuffer *ringbuf = logical_ringbuf_get(ring, ctx);
> +	uint32_t cmd;
> +	int ret;
> +
> +	ret = intel_logical_ring_begin(ring, ctx, 4);
> +	if (ret)
> +		return ret;
> +
> +	cmd = MI_FLUSH_DW + 1;
> +
> +	/*
> +	 * Bspec vol 1c.3 - blitter engine command streamer:
> +	 * "If ENABLED, all TLBs will be invalidated once the flush
> +	 * operation is complete. This bit is only valid when the
> +	 * Post-Sync Operation field is a value of 1h or 3h."
> +	 */
> +	if (invalidate_domains & I915_GEM_DOMAIN_RENDER)
> +		cmd |= MI_INVALIDATE_TLB | MI_FLUSH_DW_STORE_INDEX |
> +			MI_FLUSH_DW_OP_STOREDW;
> +	intel_logical_ring_emit(ringbuf, cmd);
> +	intel_logical_ring_emit(ringbuf, I915_GEM_HWS_SCRATCH_ADDR | MI_FLUSH_DW_USE_GTT);
> +	intel_logical_ring_emit(ringbuf, 0); /* upper addr */
> +	intel_logical_ring_emit(ringbuf, 0); /* value */
> +	intel_logical_ring_advance(ringbuf);
> +
> +	return 0;
> +}
> +
> +static int gen8_emit_flush_render(struct intel_engine_cs *ring,
> +				  struct intel_context *ctx,
> +				  u32 invalidate_domains,
> +				  u32 flush_domains)
> +{
> +	struct intel_ringbuffer *ringbuf = logical_ringbuf_get(ring, ctx);
> +	u32 flags = 0;
> +	u32 scratch_addr = ring->scratch.gtt_offset + 2 * CACHELINE_BYTES;
> +	int ret;
> +
> +	flags |= PIPE_CONTROL_CS_STALL;
> +
> +	if (flush_domains) {
> +		flags |= PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH;
> +		flags |= PIPE_CONTROL_DEPTH_CACHE_FLUSH;
> +	}
> +	if (invalidate_domains) {
> +		flags |= PIPE_CONTROL_TLB_INVALIDATE;
> +		flags |= PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE;
> +		flags |= PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE;
> +		flags |= PIPE_CONTROL_VF_CACHE_INVALIDATE;
> +		flags |= PIPE_CONTROL_CONST_CACHE_INVALIDATE;
> +		flags |= PIPE_CONTROL_STATE_CACHE_INVALIDATE;
> +		flags |= PIPE_CONTROL_QW_WRITE;
> +		flags |= PIPE_CONTROL_GLOBAL_GTT_IVB;
> +	}
> +
> +	ret = intel_logical_ring_begin(ring, ctx, 6);
> +	if (ret)
> +		return ret;
> +
> +	intel_logical_ring_emit(ringbuf, GFX_OP_PIPE_CONTROL(6));
> +	intel_logical_ring_emit(ringbuf, flags);
> +	intel_logical_ring_emit(ringbuf, scratch_addr);
> +	intel_logical_ring_emit(ringbuf, 0);
> +	intel_logical_ring_emit(ringbuf, 0);
> +	intel_logical_ring_emit(ringbuf, 0);
> +	intel_logical_ring_advance(ringbuf);
> +
> +	return 0;
> +}
> +
>  static u32 gen8_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency)
>  {
>  	return intel_read_status_page(ring, I915_GEM_HWS_INDEX);
> @@ -491,6 +566,7 @@ static int logical_render_ring_init(struct drm_device *dev)
>  	ring->set_seqno = gen8_set_seqno;
>  	ring->submit_ctx = gen8_submit_ctx;
>  	ring->emit_request = gen8_emit_request_render;
> +	ring->emit_flush = gen8_emit_flush_render;
>  
>  	return logical_ring_init(dev, ring);
>  }
> @@ -511,6 +587,7 @@ static int logical_bsd_ring_init(struct drm_device *dev)
>  	ring->set_seqno = gen8_set_seqno;
>  	ring->submit_ctx = gen8_submit_ctx;
>  	ring->emit_request = gen8_emit_request;
> +	ring->emit_flush = gen8_emit_flush;
>  
>  	return logical_ring_init(dev, ring);
>  }
> @@ -531,6 +608,7 @@ static int logical_bsd2_ring_init(struct drm_device *dev)
>  	ring->set_seqno = gen8_set_seqno;
>  	ring->submit_ctx = gen8_submit_ctx;
>  	ring->emit_request = gen8_emit_request;
> +	ring->emit_flush = gen8_emit_flush;
>  
>  	return logical_ring_init(dev, ring);
>  }
> @@ -551,6 +629,7 @@ static int logical_blt_ring_init(struct drm_device *dev)
>  	ring->set_seqno = gen8_set_seqno;
>  	ring->submit_ctx = gen8_submit_ctx;
>  	ring->emit_request = gen8_emit_request;
> +	ring->emit_flush = gen8_emit_flush;
>  
>  	return logical_ring_init(dev, ring);
>  }
> @@ -571,6 +650,7 @@ static int logical_vebox_ring_init(struct drm_device *dev)
>  	ring->set_seqno = gen8_set_seqno;
>  	ring->submit_ctx = gen8_submit_ctx;
>  	ring->emit_request = gen8_emit_request;
> +	ring->emit_flush = gen8_emit_flush;
>  
>  	return logical_ring_init(dev, ring);
>  }
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 137ee9a..a128f6f 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -33,13 +33,6 @@
>  #include "i915_trace.h"
>  #include "intel_drv.h"
>  
> -/* Early gen2 devices have a cacheline of just 32 bytes, using 64 is overkill,
> - * but keeps the logic simple. Indeed, the whole purpose of this macro is just
> - * to give some inclination as to some of the magic values used in the various
> - * workarounds!
> - */
> -#define CACHELINE_BYTES 64
> -
>  bool
>  intel_ring_initialized(struct intel_engine_cs *ring)
>  {
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index d8ded14..527db2a 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -5,6 +5,13 @@
>  
>  #define I915_CMD_HASH_ORDER 9
>  
> +/* Early gen2 devices have a cacheline of just 32 bytes, using 64 is overkill,
> + * but keeps the logic simple. Indeed, the whole purpose of this macro is just
> + * to give some inclination as to some of the magic values used in the various
> + * workarounds!
> + */
> +#define CACHELINE_BYTES 64
> +
>  /*
>   * Gen2 BSpec "1. Programming Environment" / 1.4.4.6 "Ring Buffer Use"
>   * Gen3 BSpec "vol1c Memory Interface Functions" / 2.3.4.5 "Ring Buffer Use"
> @@ -153,6 +160,10 @@ struct  intel_engine_cs {
>  				      struct intel_context *ctx, u32 value);
>  	int		(*emit_request)(struct intel_engine_cs *ring,
>  					struct intel_context *ctx);
> +	int __must_check (*emit_flush)(struct intel_engine_cs *ring,
> +				       struct intel_context *ctx,
> +				       u32 invalidate_domains,
> +				       u32 flush_domains);

Any reason to make this one __must_check but not the others?

Brad

>  
>  	/**
>  	 * List of objects currently involved in rendering from the
> -- 
> 1.9.0
> 

* Re: [PATCH 41/53] drm/i915/bdw: Avoid non-lite-restore preemptions
  2014-06-18 20:49   ` Daniel Vetter
@ 2014-06-23 11:52     ` Mateo Lozano, Oscar
  2014-07-07 12:47       ` Daniel Vetter
  0 siblings, 1 reply; 156+ messages in thread
From: Mateo Lozano, Oscar @ 2014-06-23 11:52 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

> -----Original Message-----
> From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel
> Vetter
> Sent: Wednesday, June 18, 2014 9:49 PM
> To: Mateo Lozano, Oscar
> Cc: intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH 41/53] drm/i915/bdw: Avoid non-lite-restore
> preemptions
> 
> On Fri, Jun 13, 2014 at 04:37:59PM +0100, oscar.mateo@intel.com wrote:
> > From: Oscar Mateo <oscar.mateo@intel.com>
> >
> > In the current Execlists feeding mechanism, full preemption is not
> > supported yet: only lite-restores are allowed (this is: the GPU simply
> > samples a new tail pointer for the context currently in execution).
> >
> > But we have identified a scenario in which a full preemption occurs:
> > 1) We submit two contexts for execution (A & B).
> > 2) The GPU finishes with the first one (A), switches to the second one
> > (B) and informs us.
> > 3) We submit B again (hoping to cause a lite restore) together with C,
> > but in the time we spend writing to the ELSP, the GPU finishes B.
> > 4) The GPU starts executing B again (since we told it so).
> > 5) We receive a B finished interrupt and, mistakenly, we submit C
> > (again) and D, causing a full preemption of B.
> >
> > By keeping a better track of our submissions, we can avoid the
> > scenario described above.
> 
> How? I don't see a way to fundamentally avoid the above race, and I don't
> really see an issue with it - the gpu should notice that there's not really any
> work done and then switch to C.
> 
> Or am I completely missing the point here?
> 
> With no clue at all this looks really scary.

The race is avoided by keeping track of how many times a context has been submitted to the hardware and by better discriminating the received context switch interrupts: in the example, once we have submitted B twice, we won't submit C and D as soon as we receive the notification that B is completed, because we were expecting a lite restore and didn't get one, so we know a second completion event will arrive shortly.

Without this explicit checking, the race condition happens and, somehow, the batch buffer execution order gets messed up. This can be verified with the IGT test I sent together with the series. I don't know the exact mechanism by which the preemption messes up the execution order but, since other people are working on the Scheduler + Preemption on Execlists, I didn't try to fix it. In this series, only lite restores are supported (other kinds of preemption trigger a WARN).
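
In code, the idea is roughly this (a sketch of the interrupt handler side;
elsp_submitted is the per-submission counter this patch adds to each submit
request):

	/* A context switch event arrived for the head of the queue */
	if (head_req->elsp_submitted > 1) {
		/* We expected a lite restore but the context completed
		 * first: don't submit C & D yet, a second completion
		 * event will follow shortly. */
		head_req->elsp_submitted--;
	} else {
		/* B really finished: retire it and submit the next
		 * two contexts in the queue. */
	}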

I'll add this clarification to the commit message.

-- Oscar

* Re: [PATCH 11/53] drm/i915/bdw: Allocate ringbuffers for Logical Ring Contexts
  2014-06-18 22:19   ` Volkin, Bradley D
@ 2014-06-23 12:07     ` Mateo Lozano, Oscar
  0 siblings, 0 replies; 156+ messages in thread
From: Mateo Lozano, Oscar @ 2014-06-23 12:07 UTC (permalink / raw)
  To: Volkin, Bradley D; +Cc: intel-gfx

> -----Original Message-----
> From: Volkin, Bradley D
> Sent: Wednesday, June 18, 2014 11:19 PM
> To: Mateo Lozano, Oscar
> Cc: intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH 11/53] drm/i915/bdw: Allocate ringbuffers
> for Logical Ring Contexts
> 
> On Fri, Jun 13, 2014 at 08:37:29AM -0700, oscar.mateo@intel.com wrote:
> > From: Oscar Mateo <oscar.mateo@intel.com>
> >
> > As we have said a couple of times by now, logical ring contexts have
> > their own ringbuffers: not only the backing pages, but the whole
> > management struct.
> >
> > In a previous version of the series, this was achieved with two
> > separate
> > patches:
> > drm/i915/bdw: Allocate ringbuffer backing objects for default global
> > LRC
> > drm/i915/bdw: Allocate ringbuffer for user-created LRCs
> >
> > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_drv.h  |  1 +
> > drivers/gpu/drm/i915/intel_lrc.c | 38
> > ++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 39 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h
> > b/drivers/gpu/drm/i915/i915_drv.h index 347308e..79799d8 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -599,6 +599,7 @@ struct intel_context {
> >  	/* Execlists */
> >  	struct {
> >  		struct drm_i915_gem_object *obj;
> > +		struct intel_ringbuffer *ringbuf;
> >  	} engine[I915_NUM_RINGS];
> >
> >  	struct i915_ctx_hang_stats hang_stats; diff --git
> > a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> > index 952212f..b3a23e0 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -60,7 +60,11 @@ void intel_lr_context_free(struct intel_context
> > *ctx)
> >
> >  	for (i = 0; i < I915_NUM_RINGS; i++) {
> >  		struct drm_i915_gem_object *ctx_obj = ctx->engine[i].obj;
> > +		struct intel_ringbuffer *ringbuf = ctx->engine[i].ringbuf;
> > +
> >  		if (ctx_obj) {
> > +			intel_destroy_ring_buffer(ringbuf);
> > +			kfree(ringbuf);
> >  			i915_gem_object_ggtt_unpin(ctx_obj);
> >  			drm_gem_object_unreference(&ctx_obj->base);
> >  		}
> > @@ -94,6 +98,7 @@ int intel_lr_context_deferred_create(struct
> intel_context *ctx,
> >  	struct drm_device *dev = ring->dev;
> >  	struct drm_i915_gem_object *ctx_obj;
> >  	uint32_t context_size;
> > +	struct intel_ringbuffer *ringbuf;
> >  	int ret;
> >
> >  	WARN_ON(ctx->render_obj != NULL);
> > @@ -114,6 +119,39 @@ int intel_lr_context_deferred_create(struct
> intel_context *ctx,
> >  		return ret;
> >  	}
> >
> > +	ringbuf = kzalloc(sizeof(*ringbuf), GFP_KERNEL);
> > +	if (!ringbuf) {
> > +		DRM_DEBUG_DRIVER("Failed to allocate ringbuffer %s\n",
> > +				ring->name);
> > +		i915_gem_object_ggtt_unpin(ctx_obj);
> > +		drm_gem_object_unreference(&ctx_obj->base);
> > +		ret = -ENOMEM;
> > +		return ret;
> > +	}
> > +
> > +	ringbuf->size = 32 * PAGE_SIZE;
> > +	ringbuf->effective_size = ringbuf->size;
> > +	ringbuf->head = 0;
> > +	ringbuf->tail = 0;
> > +	ringbuf->space = ringbuf->size;
> > +	ringbuf->last_retired_head = -1;
> > +
> > +	/* TODO: For now we put this in the mappable region so that we can
> reuse
> > +	 * the existing ringbuffer code which ioremaps it. When we start
> > +	 * creating many contexts, this will no longer work and we must
> switch
> > +	 * to a kmapish interface.
> > +	 */
> 
> It looks like this comment still exists at the end of the series. Does it still
> apply or did we find that this is not an issue?

This is similar to the problem we have with context pinning in general: the ringbuffers are 32 pages in size and we pin them (intel_allocate_ring_buffer), so we will end up fragmenting the GTT pretty heavily. This problem is not that bad with Full PPGTT, but of course it needs fixing. The good news is that the logical ring split allows me to tackle this without changing how old platforms work :)

I'll get to it.

-- Oscar

* Re: [PATCH 12/53] drm/i915/bdw: Populate LR contexts (somewhat)
  2014-06-18 23:24   ` Volkin, Bradley D
@ 2014-06-23 12:42     ` Mateo Lozano, Oscar
  2014-06-23 15:05       ` Volkin, Bradley D
  0 siblings, 1 reply; 156+ messages in thread
From: Mateo Lozano, Oscar @ 2014-06-23 12:42 UTC (permalink / raw)
  To: Volkin, Bradley D; +Cc: intel-gfx

> -----Original Message-----
> From: Volkin, Bradley D
> Sent: Thursday, June 19, 2014 12:24 AM
> To: Mateo Lozano, Oscar
> Cc: intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH 12/53] drm/i915/bdw: Populate LR contexts
> (somewhat)
> 
> On Fri, Jun 13, 2014 at 08:37:30AM -0700, oscar.mateo@intel.com wrote:
> > From: Oscar Mateo <oscar.mateo@intel.com>
> >
> > For the most part, logical ring context objects are similar to
> > hardware contexts in that the backing object is meant to be opaque.
> > There are some exceptions where we need to poke certain offsets of the
> > object for initialization, updating the tail pointer or updating the PDPs.
> >
> > For our basic execlist implementation we'll only need our PPGTT PDs,
> > and ringbuffer addresses in order to set up the context. With previous
> > patches, we have both, so start prepping the context to be loaded.
> >
> > Before running a context for the first time you must populate some
> > fields in the context object. These fields begin 1 PAGE + LRCA, ie.
> > the first page (in 0-based counting) of the context image. These same
> > fields will be read and written to as contexts are saved and restored
> > once the system is up and running.
> >
> > Many of these fields are completely reused from previous global
> > registers: ringbuffer head/tail/control, context control matches some
> > previous MI_SET_CONTEXT flags, and page directories. There are other
> > fields which we don't touch which we may want in the future.
> >
> > v2: CTX_LRI_HEADER_0 is MI_LOAD_REGISTER_IMM(14) for render and
> (11)
> > for other engines.
> >
> > v3: Several rebases and general changes to the code.
> >
> > v4: Squash with "Extract LR context object populating"
> > Also, Damien's review comments:
> > - Set the Force Posted bit on the LRI header, as the BSpec suggest we do.
> > - Prevent warning when compiling a 32-bit kernel without HIGHMEM64.
> > - Add a clarifying comment to the context population code.
> >
> > v5: Damien's review comments:
> > - The third MI_LOAD_REGISTER_IMM in the context does not set Force
> Posted.
> > - Remove dead code.
> >
> > Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
> > Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com> (v2)
> > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v3-5)
> > ---
> >  drivers/gpu/drm/i915/i915_reg.h  |   1 +
> >  drivers/gpu/drm/i915/intel_lrc.c | 154
> > ++++++++++++++++++++++++++++++++++++++-
> >  2 files changed, 151 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_reg.h
> > b/drivers/gpu/drm/i915/i915_reg.h index 286f05c..9c8692a 100644
> > --- a/drivers/gpu/drm/i915/i915_reg.h
> > +++ b/drivers/gpu/drm/i915/i915_reg.h
> > @@ -277,6 +277,7 @@
> >   *   address/value pairs. Don't overdue it, though, x <= 2^4 must hold!
> >   */
> >  #define MI_LOAD_REGISTER_IMM(x)	MI_INSTR(0x22, 2*(x)-1)
> > +#define   MI_LRI_FORCE_POSTED		(1<<12)
> >  #define MI_STORE_REGISTER_MEM(x) MI_INSTR(0x24, 2*(x)-1)  #define
> > MI_STORE_REGISTER_MEM_GEN8(x) MI_INSTR(0x24, 3*(x)-1)
> >  #define   MI_SRM_LRM_GLOBAL_GTT		(1<<22)
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c
> > b/drivers/gpu/drm/i915/intel_lrc.c
> > index b3a23e0..b96bb45 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -46,6 +46,38 @@
> >
> >  #define GEN8_LR_CONTEXT_ALIGN 4096
> >
> > +#define RING_ELSP(ring)			((ring)->mmio_base+0x230)
> > +#define RING_CONTEXT_CONTROL(ring)	((ring)->mmio_base+0x244)
> > +
> > +#define CTX_LRI_HEADER_0		0x01
> > +#define CTX_CONTEXT_CONTROL		0x02
> > +#define CTX_RING_HEAD			0x04
> > +#define CTX_RING_TAIL			0x06
> > +#define CTX_RING_BUFFER_START		0x08
> > +#define CTX_RING_BUFFER_CONTROL		0x0a
> > +#define CTX_BB_HEAD_U			0x0c
> > +#define CTX_BB_HEAD_L			0x0e
> > +#define CTX_BB_STATE			0x10
> > +#define CTX_SECOND_BB_HEAD_U		0x12
> > +#define CTX_SECOND_BB_HEAD_L		0x14
> > +#define CTX_SECOND_BB_STATE		0x16
> > +#define CTX_BB_PER_CTX_PTR		0x18
> > +#define CTX_RCS_INDIRECT_CTX		0x1a
> > +#define CTX_RCS_INDIRECT_CTX_OFFSET	0x1c
> > +#define CTX_LRI_HEADER_1		0x21
> > +#define CTX_CTX_TIMESTAMP		0x22
> > +#define CTX_PDP3_UDW			0x24
> > +#define CTX_PDP3_LDW			0x26
> > +#define CTX_PDP2_UDW			0x28
> > +#define CTX_PDP2_LDW			0x2a
> > +#define CTX_PDP1_UDW			0x2c
> > +#define CTX_PDP1_LDW			0x2e
> > +#define CTX_PDP0_UDW			0x30
> > +#define CTX_PDP0_LDW			0x32
> > +#define CTX_LRI_HEADER_2		0x41
> > +#define CTX_R_PWR_CLK_STATE		0x42
> > +#define CTX_GPGPU_CSR_BASE_ADDRESS	0x44
> > +
> >  bool intel_enable_execlists(struct drm_device *dev)  {
> >  	if (!i915.enable_execlists)
> > @@ -54,6 +86,110 @@ bool intel_enable_execlists(struct drm_device
> *dev)
> >  	return HAS_LOGICAL_RING_CONTEXTS(dev) && USES_PPGTT(dev);  }
> >
> > +static int
> > +populate_lr_context(struct intel_context *ctx, struct
> drm_i915_gem_object *ctx_obj,
> > +		    struct intel_engine_cs *ring, struct drm_i915_gem_object
> > +*ring_obj) {
> > +	struct i915_hw_ppgtt *ppgtt = ctx_to_ppgtt(ctx);
> > +	struct page *page;
> > +	uint32_t *reg_state;
> > +	int ret;
> > +
> > +	ret = i915_gem_object_set_to_cpu_domain(ctx_obj, true);
> > +	if (ret) {
> > +		DRM_DEBUG_DRIVER("Could not set to CPU domain\n");
> > +		return ret;
> > +	}
> > +
> > +	ret = i915_gem_object_get_pages(ctx_obj);
> > +	if (ret) {
> > +		DRM_DEBUG_DRIVER("Could not get object pages\n");
> > +		return ret;
> > +	}
> > +
> > +	i915_gem_object_pin_pages(ctx_obj);
> > +
> > +	/* The second page of the context object contains some fields which
> must
> > +	 * be set up prior to the first execution. */
> > +	page = i915_gem_object_get_page(ctx_obj, 1);
> > +	reg_state = kmap_atomic(page);
> > +
> > +	/* A context is actually a big batch buffer with several
> MI_LOAD_REGISTER_IMM
> > +	 * commands followed by (reg, value) pairs. The values we are
> setting here are
> > +	 * only for the first context restore: on a subsequent save, the GPU
> will
> > +	 * recreate this batchbuffer with new values (including all the missing
> > +	 * MI_LOAD_REGISTER_IMM commands that we are not initializing
> here). */
> > +	if (ring->id == RCS)
> > +		reg_state[CTX_LRI_HEADER_0] =
> MI_LOAD_REGISTER_IMM(14);
> > +	else
> > +		reg_state[CTX_LRI_HEADER_0] =
> MI_LOAD_REGISTER_IMM(11);
> > +	reg_state[CTX_LRI_HEADER_0] |= MI_LRI_FORCE_POSTED;
> > +	reg_state[CTX_CONTEXT_CONTROL] =
> RING_CONTEXT_CONTROL(ring);
> > +	reg_state[CTX_CONTEXT_CONTROL+1] = (1<<3) |
> MI_RESTORE_INHIBIT;
> > +	reg_state[CTX_CONTEXT_CONTROL+1] |=
> reg_state[CTX_CONTEXT_CONTROL+1]
> > +<< 16;
> 
> If we can, we should probably use _MASKED_BIT_ENABLE() here to make it
> more obvious why we're doing the or+shift.

ACK
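
i.e. something like this, which is equivalent to the or+shift above given
that _MASKED_BIT_ENABLE(a) expands to ((a) << 16 | (a)):

	reg_state[CTX_CONTEXT_CONTROL+1] =
		_MASKED_BIT_ENABLE((1 << 3) | MI_RESTORE_INHIBIT);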

> > +	reg_state[CTX_RING_HEAD] = RING_HEAD(ring->mmio_base);
> > +	reg_state[CTX_RING_HEAD+1] = 0;
> > +	reg_state[CTX_RING_TAIL] = RING_TAIL(ring->mmio_base);
> > +	reg_state[CTX_RING_TAIL+1] = 0;
> > +	reg_state[CTX_RING_BUFFER_START] = RING_START(ring-
> >mmio_base);
> > +	reg_state[CTX_RING_BUFFER_START+1] =
> i915_gem_obj_ggtt_offset(ring_obj);
> > +	reg_state[CTX_RING_BUFFER_CONTROL] = RING_CTL(ring-
> >mmio_base);
> > +	reg_state[CTX_RING_BUFFER_CONTROL+1] = (31 * PAGE_SIZE) |
> > +RING_VALID;
> 
> The size here doesn't look right to me. Shouldn't it be (number of pages - 1)?
> See init_ring_common()

But that's exactly what it is, right?

Our ringbuf->size = 32 * PAGE_SIZE;
so we are setting 31 * PAGE_SIZE
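
which matches what init_ring_common() writes to RING_CTL (the length field
encodes number of pages - 1 in bits 20:12):

	((ringbuf->size - PAGE_SIZE) & RING_NR_PAGES) | RING_VALID
		== (31 * PAGE_SIZE) | RING_VALID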

> > +	reg_state[CTX_BB_HEAD_U] = ring->mmio_base + 0x168;
> > +	reg_state[CTX_BB_HEAD_U+1] = 0;
> > +	reg_state[CTX_BB_HEAD_L] = ring->mmio_base + 0x140;
> > +	reg_state[CTX_BB_HEAD_L+1] = 0;
> > +	reg_state[CTX_BB_STATE] = ring->mmio_base + 0x110;
> > +	reg_state[CTX_BB_STATE+1] = (1<<5);
> > +	reg_state[CTX_SECOND_BB_HEAD_U] = ring->mmio_base + 0x11c;
> > +	reg_state[CTX_SECOND_BB_HEAD_U+1] = 0;
> > +	reg_state[CTX_SECOND_BB_HEAD_L] = ring->mmio_base + 0x114;
> > +	reg_state[CTX_SECOND_BB_HEAD_L+1] = 0;
> > +	reg_state[CTX_SECOND_BB_STATE] = ring->mmio_base + 0x118;
> > +	reg_state[CTX_SECOND_BB_STATE+1] = 0;
> > +	if (ring->id == RCS) {
> > +		reg_state[CTX_BB_PER_CTX_PTR] = ring->mmio_base +
> 0x1c0;
> > +		reg_state[CTX_BB_PER_CTX_PTR+1] = 0;
> > +		reg_state[CTX_RCS_INDIRECT_CTX] = ring->mmio_base +
> 0x1c4;
> > +		reg_state[CTX_RCS_INDIRECT_CTX+1] = 0;
> > +		reg_state[CTX_RCS_INDIRECT_CTX_OFFSET] = ring-
> >mmio_base + 0x1c8;
> > +		reg_state[CTX_RCS_INDIRECT_CTX_OFFSET+1] = 0;
> > +	}
> > +	reg_state[CTX_LRI_HEADER_1] = MI_LOAD_REGISTER_IMM(9);
> > +	reg_state[CTX_LRI_HEADER_1] |= MI_LRI_FORCE_POSTED;
> > +	reg_state[CTX_CTX_TIMESTAMP] = ring->mmio_base + 0x3a8;
> > +	reg_state[CTX_CTX_TIMESTAMP+1] = 0;
> > +	reg_state[CTX_PDP3_UDW] = GEN8_RING_PDP_UDW(ring, 3);
> > +	reg_state[CTX_PDP3_LDW] = GEN8_RING_PDP_LDW(ring, 3);
> > +	reg_state[CTX_PDP2_UDW] = GEN8_RING_PDP_UDW(ring, 2);
> > +	reg_state[CTX_PDP2_LDW] = GEN8_RING_PDP_LDW(ring, 2);
> > +	reg_state[CTX_PDP1_UDW] = GEN8_RING_PDP_UDW(ring, 1);
> > +	reg_state[CTX_PDP1_LDW] = GEN8_RING_PDP_LDW(ring, 1);
> > +	reg_state[CTX_PDP0_UDW] = GEN8_RING_PDP_UDW(ring, 0);
> > +	reg_state[CTX_PDP0_LDW] = GEN8_RING_PDP_LDW(ring, 0);
> > +	reg_state[CTX_PDP3_UDW+1] = (u64)ppgtt->pd_dma_addr[3] >> 32;
> > +	reg_state[CTX_PDP3_LDW+1] = ppgtt->pd_dma_addr[3];
> > +	reg_state[CTX_PDP2_UDW+1] = (u64)ppgtt->pd_dma_addr[2] >> 32;
> > +	reg_state[CTX_PDP2_LDW+1] = ppgtt->pd_dma_addr[2];
> > +	reg_state[CTX_PDP1_UDW+1] = (u64)ppgtt->pd_dma_addr[1] >> 32;
> > +	reg_state[CTX_PDP1_LDW+1] = ppgtt->pd_dma_addr[1];
> > +	reg_state[CTX_PDP0_UDW+1] = (u64)ppgtt->pd_dma_addr[0] >> 32;
> > +	reg_state[CTX_PDP0_LDW+1] = ppgtt->pd_dma_addr[0];
> 
> Are we able to use upper_32_bits() and lower_32_bits() for these?

ACK
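
i.e. (same values, just easier to read):

	reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pd_dma_addr[3]);
	reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pd_dma_addr[3]);
	/* ...and likewise for PDP2..PDP0 */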

-- Oscar

* Re: [PATCH 15/53] drm/i915/bdw: Don't write PDP in the legacy way when using LRCs
  2014-06-18 23:42   ` Volkin, Bradley D
@ 2014-06-23 12:45     ` Mateo Lozano, Oscar
  0 siblings, 0 replies; 156+ messages in thread
From: Mateo Lozano, Oscar @ 2014-06-23 12:45 UTC (permalink / raw)
  To: Volkin, Bradley D; +Cc: intel-gfx

> -----Original Message-----
> From: Volkin, Bradley D
> Sent: Thursday, June 19, 2014 12:43 AM
> To: Mateo Lozano, Oscar
> Cc: intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH 15/53] drm/i915/bdw: Don't write PDP in the
> legacy way when using LRCs
> 
> On Fri, Jun 13, 2014 at 08:37:33AM -0700, oscar.mateo@intel.com wrote:
> > From: Oscar Mateo <oscar.mateo@intel.com>
> >
> > This is mostly for correctness so that we know we are running the LR
> > context correctly (this is, the PDPs are contained inside the context
> > object).
> >
> > v2: Move the check to inside the enable PPGTT function. The switch
> > happens in two places: the legacy context switch (that we won't hit
> > when Execlists are enabled) and the PPGTT enable, which unfortunately
> > we need. This would look much nicer if the ppgtt->enable was part of
> > the ring init, where it logically belongs.
> >
> > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_gem_gtt.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c
> > b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > index 8b3cde7..9f0c69e 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > @@ -844,6 +844,11 @@ static int gen8_ppgtt_enable(struct
> i915_hw_ppgtt *ppgtt)
> >  		if (USES_FULL_PPGTT(dev))
> >  			continue;
> >
> > +		/* In the case of Execlists, we don't want to write the PDPs
> > +		 * in the legacy way (they live inside the context now) */
> > +		if (intel_enable_execlists(dev))
> > +			return 0;
> 
> Along the lines of one of Daniel's comments about the module parameter, I
> think we could use some clarity on when to use intel_enable_execlists() vs
> lrc_enabled vs i915.enable_execlists.

Yep. I'll look at this in v4. It's probably better to do the early sanitize as Daniel suggested and then just use i915.enable_execlists everywhere.
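
Something like this at driver load, before anybody looks at the value
(sketch only, the function name is made up):

	i915.enable_execlists =
		intel_sanitize_enable_execlists(dev, i915.enable_execlists);

where the sanitize function would fold in the HAS_LOGICAL_RING_CONTEXTS()
and USES_PPGTT() checks that intel_enable_execlists() does today.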

-- Oscar

* Re: [PATCH 25/53] drm/i915/bdw: GEN-specific logical ring submit context (somewhat)
  2014-06-20 20:28   ` Volkin, Bradley D
@ 2014-06-23 12:49     ` Mateo Lozano, Oscar
  0 siblings, 0 replies; 156+ messages in thread
From: Mateo Lozano, Oscar @ 2014-06-23 12:49 UTC (permalink / raw)
  To: Volkin, Bradley D; +Cc: intel-gfx

> -----Original Message-----
> From: Volkin, Bradley D
> Sent: Friday, June 20, 2014 9:28 PM
> To: Mateo Lozano, Oscar
> Cc: intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH 25/53] drm/i915/bdw: GEN-specific logical
> ring submit context (somewhat)
> 
> On Fri, Jun 13, 2014 at 08:37:43AM -0700, oscar.mateo@intel.com wrote:
> > From: Oscar Mateo <oscar.mateo@intel.com>
> >
> > For the moment, just mark the place (we still need to do a lot of
> > preparation before execlists are ready to start submitting things).
> >
> > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> > ---
> >  drivers/gpu/drm/i915/intel_lrc.c        | 11 +++++++++++
> >  drivers/gpu/drm/i915/intel_ringbuffer.h |  6 ++++++
> >  2 files changed, 17 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c
> > b/drivers/gpu/drm/i915/intel_lrc.c
> > index 6c62ae5..02fc3d0 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -139,6 +139,12 @@ static void gen8_set_seqno(struct intel_engine_cs
> *ring, u32 seqno)
> >  	intel_write_status_page(ring, I915_GEM_HWS_INDEX, seqno);  }
> >
> > +static void gen8_submit_ctx(struct intel_engine_cs *ring,
> > +			    struct intel_context *ctx, u32 value) {
> > +	DRM_ERROR("Execlists still not ready!\n"); }
> > +
> >  void intel_logical_ring_cleanup(struct intel_engine_cs *ring)  {
> >  	if (!intel_ring_initialized(ring))
> > @@ -213,6 +219,7 @@ static int logical_render_ring_init(struct
> drm_device *dev)
> >  	ring->cleanup = intel_fini_pipe_control;
> >  	ring->get_seqno = gen8_get_seqno;
> >  	ring->set_seqno = gen8_set_seqno;
> > +	ring->submit_ctx = gen8_submit_ctx;
> >
> >  	return logical_ring_init(dev, ring);  } @@ -231,6 +238,7 @@ static
> > int logical_bsd_ring_init(struct drm_device *dev)
> >  	ring->init = gen8_init_common_ring;
> >  	ring->get_seqno = gen8_get_seqno;
> >  	ring->set_seqno = gen8_set_seqno;
> > +	ring->submit_ctx = gen8_submit_ctx;
> >
> >  	return logical_ring_init(dev, ring);  } @@ -249,6 +257,7 @@ static
> > int logical_bsd2_ring_init(struct drm_device *dev)
> >  	ring->init = gen8_init_common_ring;
> >  	ring->get_seqno = gen8_get_seqno;
> >  	ring->set_seqno = gen8_set_seqno;
> > +	ring->submit_ctx = gen8_submit_ctx;
> >
> >  	return logical_ring_init(dev, ring);  } @@ -267,6 +276,7 @@ static
> > int logical_blt_ring_init(struct drm_device *dev)
> >  	ring->init = gen8_init_common_ring;
> >  	ring->get_seqno = gen8_get_seqno;
> >  	ring->set_seqno = gen8_set_seqno;
> > +	ring->submit_ctx = gen8_submit_ctx;
> >
> >  	return logical_ring_init(dev, ring);  } @@ -285,6 +295,7 @@ static
> > int logical_vebox_ring_init(struct drm_device *dev)
> >  	ring->init = gen8_init_common_ring;
> >  	ring->get_seqno = gen8_get_seqno;
> >  	ring->set_seqno = gen8_set_seqno;
> > +	ring->submit_ctx = gen8_submit_ctx;
> >
> >  	return logical_ring_init(dev, ring);  } diff --git
> > a/drivers/gpu/drm/i915/intel_ringbuffer.h
> > b/drivers/gpu/drm/i915/intel_ringbuffer.h
> > index ff8753c..1a6df42 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> > @@ -79,6 +79,8 @@ struct intel_ringbuffer {
> >  	u32 last_retired_head;
> >  };
> >
> > +struct intel_context;
> > +
> >  struct  intel_engine_cs {
> >  	const char	*name;
> >  	enum intel_ring_id {
> > @@ -146,6 +148,10 @@ struct  intel_engine_cs {
> >  				  unsigned int num_dwords);
> >  	} semaphore;
> >
> > +	/* Execlists */
> > +	void		(*submit_ctx)(struct intel_engine_cs *ring,
> > +				      struct intel_context *ctx, u32 value);
> > +
> 
> Is it worth making this a vfunc in the refactored codebase? It ends up as the
> same function for all engines...called in one place...the implementation of
> which is a single call to another function that takes the same arguments.
> Previously this was an implementation of the write_tail vfunc, so it made
> sense. I'm not so sure now.

Now that you say it, no, it's probably not worth it. This stuff has changed so many times that sometimes it's difficult to keep track :(
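
i.e. intel_logical_ring_advance_and_submit() could drop the indirection and
call the submission helper directly (sketch; the helper name here is just a
placeholder for whatever the queueing function ends up being called):

	/* instead of ring->submit_ctx(ring, ctx, ringbuf->tail): */
	execlists_context_queue(ring, ctx, ringbuf->tail);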

-- Oscar

* Re: [PATCH 26/53] drm/i915/bdw: New logical ring submission mechanism
  2014-06-20 21:00   ` Volkin, Bradley D
@ 2014-06-23 13:09     ` Mateo Lozano, Oscar
  2014-06-23 13:13       ` Chris Wilson
  0 siblings, 1 reply; 156+ messages in thread
From: Mateo Lozano, Oscar @ 2014-06-23 13:09 UTC (permalink / raw)
  To: Volkin, Bradley D; +Cc: intel-gfx

> -----Original Message-----
> From: Volkin, Bradley D
> Sent: Friday, June 20, 2014 10:01 PM
> To: Mateo Lozano, Oscar
> Cc: intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH 26/53] drm/i915/bdw: New logical ring
> submission mechanism
> 
> On Fri, Jun 13, 2014 at 08:37:44AM -0700, oscar.mateo@intel.com wrote:
> > From: Oscar Mateo <oscar.mateo@intel.com>
> >
> > Well, new-ish: if all this code looks familiar, that's because it's a
> > clone of the existing submission mechanism (with some modifications
> > here and there to adapt it to LRCs and Execlists).
> >
> > And why did we do this? Execlists offer several advantages, like
> > control over when the GPU is done with a given workload, that can help
> > simplify the submission mechanism, no doubt, but I am interested in
> > getting Execlists to work first and foremost. As we are creating a
> > parallel submission mechanism (even if it's just a clone), we can now
> > start improving it without the fear of breaking old gens.
> >
> > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> > ---
> >  drivers/gpu/drm/i915/intel_lrc.c | 214
> > +++++++++++++++++++++++++++++++++++++++
> >  drivers/gpu/drm/i915/intel_lrc.h |  18 ++++
> >  2 files changed, 232 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c
> > b/drivers/gpu/drm/i915/intel_lrc.c
> > index 02fc3d0..89aed7a 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -86,6 +86,220 @@ bool intel_enable_execlists(struct drm_device
> *dev)
> >  	return HAS_LOGICAL_RING_CONTEXTS(dev) && USES_PPGTT(dev);  }
> >
> > +static inline struct intel_ringbuffer * logical_ringbuf_get(struct
> > +intel_engine_cs *ring, struct intel_context *ctx) {
> > +	return ctx->engine[ring->id].ringbuf; }
> > +
> > +void intel_logical_ring_advance_and_submit(struct intel_engine_cs *ring,
> > +					   struct intel_context *ctx)
> > +{
> > +	struct intel_ringbuffer *ringbuf = logical_ringbuf_get(ring, ctx);
> > +
> > +	intel_logical_ring_advance(ringbuf);
> > +
> > +	if (intel_ring_stopped(ring))
> > +		return;
> > +
> > +	ring->submit_ctx(ring, ctx, ringbuf->tail); }
> > +
> > +static int logical_ring_alloc_seqno(struct intel_engine_cs *ring,
> > +				    struct intel_context *ctx)
> > +{
> > +	if (ring->outstanding_lazy_seqno)
> > +		return 0;
> > +
> > +	if (ring->preallocated_lazy_request == NULL) {
> > +		struct drm_i915_gem_request *request;
> > +
> > +		request = kmalloc(sizeof(*request), GFP_KERNEL);
> > +		if (request == NULL)
> > +			return -ENOMEM;
> > +
> > +		ring->preallocated_lazy_request = request;
> > +	}
> > +
> > +	return i915_gem_get_seqno(ring->dev, &ring-
> >outstanding_lazy_seqno);
> > +}
> > +
> > +static int logical_ring_wait_request(struct intel_engine_cs *ring,
> > +				     struct intel_ringbuffer *ringbuf,
> > +				     struct intel_context *ctx,
> > +				     int bytes)
> > +{
> > +	struct drm_i915_gem_request *request;
> > +	u32 seqno = 0;
> > +	int ret;
> > +
> > +	if (ringbuf->last_retired_head != -1) {
> > +		ringbuf->head = ringbuf->last_retired_head;
> > +		ringbuf->last_retired_head = -1;
> > +
> > +		ringbuf->space = intel_ring_space(ringbuf);
> > +		if (ringbuf->space >= bytes)
> > +			return 0;
> > +	}
> > +
> > +	list_for_each_entry(request, &ring->request_list, list) {
> > +		if (__intel_ring_space(request->tail, ringbuf->tail,
> > +				ringbuf->size) >= bytes) {
> > +			seqno = request->seqno;
> > +			break;
> > +		}
> > +	}
> > +
> > +	if (seqno == 0)
> > +		return -ENOSPC;
> > +
> > +	ret = i915_wait_seqno(ring, seqno);
> > +	if (ret)
> > +		return ret;
> > +
> > +	/* TODO: make sure we update the right ringbuffer's
> last_retired_head
> > +	 * when retiring requests */
> > +	i915_gem_retire_requests_ring(ring);
> > +	ringbuf->head = ringbuf->last_retired_head;
> > +	ringbuf->last_retired_head = -1;
> > +
> > +	ringbuf->space = intel_ring_space(ringbuf);
> > +	return 0;
> > +}
> > +
> > +static int logical_ring_wait_for_space(struct intel_engine_cs *ring,
> > +						   struct intel_ringbuffer
> *ringbuf,
> > +						   struct intel_context *ctx,
> > +						   int bytes)
> > +{
> > +	struct drm_device *dev = ring->dev;
> > +	struct drm_i915_private *dev_priv = dev->dev_private;
> > +	unsigned long end;
> > +	int ret;
> > +
> > +	ret = logical_ring_wait_request(ring, ringbuf, ctx, bytes);
> > +	if (ret != -ENOSPC)
> > +		return ret;
> > +
> > +	/* Force the context submission in case we have been skipping it */
> > +	intel_logical_ring_advance_and_submit(ring, ctx);
> > +
> > +	/* With GEM the hangcheck timer should kick us out of the loop,
> > +	 * leaving it early runs the risk of corrupting GEM state (due
> > +	 * to running on almost untested codepaths). But on resume
> > +	 * timers don't work yet, so prevent a complete hang in that
> > +	 * case by choosing an insanely large timeout. */
> > +	end = jiffies + 60 * HZ;
> > +
> 
> In the legacy ringbuffer version, there are tracepoints around the do loop.
> Should we keep those? Or add lrc specific equivalents?
> 
> > +	do {
> > +		ringbuf->head = I915_READ_HEAD(ring);
> > +		ringbuf->space = intel_ring_space(ringbuf);
> > +		if (ringbuf->space >= bytes) {
> > +			ret = 0;
> > +			break;
> > +		}
> > +
> > +		if (!drm_core_check_feature(dev, DRIVER_MODESET) &&
> > +		    dev->primary->master) {
> > +			struct drm_i915_master_private *master_priv = dev-
> >primary->master->driver_priv;
> > +			if (master_priv->sarea_priv)
> > +				master_priv->sarea_priv->perf_boxes |=
> I915_BOX_WAIT;
> > +		}
> > +
> > +		msleep(1);
> > +
> > +		if (dev_priv->mm.interruptible && signal_pending(current)) {
> > +			ret = -ERESTARTSYS;
> > +			break;
> > +		}
> > +
> > +		ret = i915_gem_check_wedge(&dev_priv->gpu_error,
> > +					   dev_priv->mm.interruptible);
> > +		if (ret)
> > +			break;
> > +
> > +		if (time_after(jiffies, end)) {
> > +			ret = -EBUSY;
> > +			break;
> > +		}
> > +	} while (1);
> > +
> > +	return ret;
> > +}
> > +
> > +static int logical_ring_wrap_buffer(struct intel_engine_cs *ring,
> > +						struct intel_ringbuffer
> *ringbuf,
> > +						struct intel_context *ctx)
> > +{
> > +	uint32_t __iomem *virt;
> > +	int rem = ringbuf->size - ringbuf->tail;
> > +
> > +	if (ringbuf->space < rem) {
> > +		int ret = logical_ring_wait_for_space(ring, ringbuf, ctx, rem);
> > +		if (ret)
> > +			return ret;
> > +	}
> > +
> > +	virt = ringbuf->virtual_start + ringbuf->tail;
> > +	rem /= 4;
> > +	while (rem--)
> > +		iowrite32(MI_NOOP, virt++);
> > +
> > +	ringbuf->tail = 0;
> > +	ringbuf->space = intel_ring_space(ringbuf);
> > +
> > +	return 0;
> > +}
> > +
> > +static int logical_ring_prepare(struct intel_engine_cs *ring,
> > +				struct intel_ringbuffer *ringbuf,
> > +				struct intel_context *ctx,
> > +				int bytes)
> > +{
> > +	int ret;
> > +
> > +	if (unlikely(ringbuf->tail + bytes > ringbuf->effective_size)) {
> > +		ret = logical_ring_wrap_buffer(ring, ringbuf, ctx);
> > +		if (unlikely(ret))
> > +			return ret;
> > +	}
> > +
> > +	if (unlikely(ringbuf->space < bytes)) {
> > +		ret = logical_ring_wait_for_space(ring, ringbuf, ctx, bytes);
> > +		if (unlikely(ret))
> > +			return ret;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +int intel_logical_ring_begin(struct intel_engine_cs *ring,
> > +			     struct intel_context *ctx,
> > +			     int num_dwords)
> > +{
> > +	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> > +	struct intel_ringbuffer *ringbuf = logical_ringbuf_get(ring, ctx);
> > +	int ret;
> > +
> > +	ret = i915_gem_check_wedge(&dev_priv->gpu_error,
> > +				   dev_priv->mm.interruptible);
> > +	if (ret)
> > +		return ret;
> > +
> > +	ret = logical_ring_prepare(ring, ringbuf, ctx,
> > +			num_dwords * sizeof(uint32_t));
> > +	if (ret)
> > +		return ret;
> > +
> > +	/* Preallocate the olr before touching the ring */
> > +	ret = logical_ring_alloc_seqno(ring, ctx);
> > +	if (ret)
> > +		return ret;
> > +
> > +	ringbuf->space -= num_dwords * sizeof(uint32_t);
> > +	return 0;
> > +}
> > +
> >  static int gen8_init_common_ring(struct intel_engine_cs *ring)  {
> >  	struct drm_device *dev = ring->dev;
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.h
> > b/drivers/gpu/drm/i915/intel_lrc.h
> > index 26b0949..686ebf5 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.h
> > +++ b/drivers/gpu/drm/i915/intel_lrc.h
> > @@ -5,6 +5,24 @@
> >  void intel_logical_ring_cleanup(struct intel_engine_cs *ring);  int
> > intel_logical_rings_init(struct drm_device *dev);
> >
> > +void intel_logical_ring_advance_and_submit(struct intel_engine_cs *ring,
> > +					   struct intel_context *ctx);
> > +
> > +static inline void intel_logical_ring_advance(struct intel_ringbuffer
> > +*ringbuf) {
> > +	ringbuf->tail &= ringbuf->size - 1;
> > +}
> > +
> > +static inline void intel_logical_ring_emit(struct intel_ringbuffer
> > +*ringbuf, u32 data) {
> > +	iowrite32(data, ringbuf->virtual_start + ringbuf->tail);
> > +	ringbuf->tail += 4;
> > +}
> > +
> > +int intel_logical_ring_begin(struct intel_engine_cs *ring,
> > +			     struct intel_context *ctx,
> > +			     int num_dwords);
> > +
> 
> I think all of these are only used in intel_lrc.c, so don't need to be in the
> header and could all be static. Right?
> 
> Brad

So far, yes, but that's only because I artificially made intel_lrc.c self-contained, as Daniel requested. What if we need to execute commands from somewhere else, like in intel_gen7_queue_flip()?

And this takes me to another discussion: this logical ring vs legacy ring split is probably a good idea (time will tell), but we should provide a way of sending commands for execution without knowing if Execlists are enabled or not. In the early series that was easy because we reused the ring_begin, ring_emit & ring_advance functions, but this is not the case anymore. And without this, sooner or later somebody will break legacy or execlists (this already happened last week, when somebody here was implementing native sync without knowing about Execlists).

So, the question is: how do you feel about a dev_priv.gt vfunc that takes a context, a ring, an array of DWORDS and a BB length and does the intel_(logical)_ring_begin/emit/advance based on i915.enable_execlists?
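
For the sake of discussion, something like this (hypothetical; the names
are made up):

	struct intel_gt_funcs {
		int (*emit_cmds)(struct intel_engine_cs *ring,
				 struct intel_context *ctx,
				 u32 *cmds, int num_dwords);
	};

with an Execlists implementation built on intel_logical_ring_begin/emit/
advance and a legacy one built on intel_ring_begin/emit/advance, selected
once at init time based on i915.enable_execlists.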

-- Oscar

* Re: [PATCH 26/53] drm/i915/bdw: New logical ring submission mechanism
  2014-06-23 13:09     ` Mateo Lozano, Oscar
@ 2014-06-23 13:13       ` Chris Wilson
  2014-06-23 13:18         ` Mateo Lozano, Oscar
                           ` (2 more replies)
  0 siblings, 3 replies; 156+ messages in thread
From: Chris Wilson @ 2014-06-23 13:13 UTC (permalink / raw)
  To: Mateo Lozano, Oscar; +Cc: intel-gfx

On Mon, Jun 23, 2014 at 01:09:37PM +0000, Mateo Lozano, Oscar wrote:
> So far, yes, but that's only because I artificially made intel_lrc.c self-contained, as Daniel requested. What if we need to execute commands from somewhere else, like in intel_gen7_queue_flip()?
> 
> And this takes me to another discussion: this logical ring vs legacy ring split is probably a good idea (time will tell), but we should provide a way of sending commands for execution without knowing if Execlists are enabled or not. In the early series that was easy because we reused the ring_begin, ring_emit & ring_advance functions, but this is not the case anymore. And without this, sooner or later somebody will break legacy or execlists (this already happened last week, when somebody here was implementing native sync without knowing about Execlists).
> 
> So, the question is: how do you feel about a dev_priv.gt vfunc that takes a context, a ring, an array of DWORDS and a BB length and does the intel_(logical)_ring_begin/emit/advance based on i915.enable_execlists?

I'm still baffled by the design. intel_ring_begin() and friends should
be able to find their context (logical or legacy) from the ring and
dtrt.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 26/53] drm/i915/bdw: New logical ring submission mechanism
  2014-06-23 13:13       ` Chris Wilson
@ 2014-06-23 13:18         ` Mateo Lozano, Oscar
  2014-06-23 13:27           ` Chris Wilson
  2014-06-24 17:19         ` Jesse Barnes
  2014-07-07 12:41         ` Daniel Vetter
  2 siblings, 1 reply; 156+ messages in thread
From: Mateo Lozano, Oscar @ 2014-06-23 13:18 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

> -----Original Message-----
> From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
> Sent: Monday, June 23, 2014 2:14 PM
> To: Mateo Lozano, Oscar
> Cc: Volkin, Bradley D; intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH 26/53] drm/i915/bdw: New logical ring
> submission mechanism
> 
> On Mon, Jun 23, 2014 at 01:09:37PM +0000, Mateo Lozano, Oscar wrote:
> > So far, yes, but that's only because I artificially made intel_lrc.c self-
> contained, as Daniel requested. What if we need to execute commands from
> somewhere else, like in intel_gen7_queue_flip()?
> >
> > And this takes me to another discussion: this logical ring vs legacy ring split
> is probably a good idea (time will tell), but we should provide a way of
> sending commands for execution without knowing if Execlists are enabled or
> not. In the early series that was easy because we reused the ring_begin,
> ring_emit & ring_advance functions, but this is not the case anymore. And
> without this, sooner or later somebody will break legacy or execlists (this
> already happened last week, when somebody here was implementing native
> sync without knowing about Execlists).
> >
> > So, the question is: how do you feel about a dev_priv.gt vfunc that takes a
> context, a ring, an array of DWORDS and a BB length and does the
> intel_(logical)_ring_begin/emit/advance based on i915.enable_execlists?
> 
> I'm still baffled by the design. intel_ring_begin() and friends should be able to
> find their context (logical or legacy) from the ring and dtrt.
> -Chris

Sorry, Chris, I obviously don't have the same experience with 915 as you have: how do you propose to extract the right context from the ring?

* Re: [PATCH 26/53] drm/i915/bdw: New logical ring submission mechanism
  2014-06-23 13:18         ` Mateo Lozano, Oscar
@ 2014-06-23 13:27           ` Chris Wilson
  2014-06-23 13:36             ` Mateo Lozano, Oscar
  0 siblings, 1 reply; 156+ messages in thread
From: Chris Wilson @ 2014-06-23 13:27 UTC (permalink / raw)
  To: Mateo Lozano, Oscar; +Cc: intel-gfx

On Mon, Jun 23, 2014 at 01:18:35PM +0000, Mateo Lozano, Oscar wrote:
> > -----Original Message-----
> > From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
> > Sent: Monday, June 23, 2014 2:14 PM
> > To: Mateo Lozano, Oscar
> > Cc: Volkin, Bradley D; intel-gfx@lists.freedesktop.org
> > Subject: Re: [Intel-gfx] [PATCH 26/53] drm/i915/bdw: New logical ring
> > submission mechanism
> > 
> > On Mon, Jun 23, 2014 at 01:09:37PM +0000, Mateo Lozano, Oscar wrote:
> > > So far, yes, but that's only because I artificially made intel_lrc.c self-
> > contained, as Daniel requested. What if we need to execute commands from
> > somewhere else, like in intel_gen7_queue_flip()?
> > >
> > > And this takes me to another discussion: this logical ring vs legacy ring split
> > is probably a good idea (time will tell), but we should provide a way of
> > sending commands for execution without knowing if Execlists are enabled or
> > not. In the early series that was easy because we reused the ring_begin,
> > ring_emit & ring_advance functions, but this is not the case anymore. And
> > without this, sooner or later somebody will break legacy or execlists (this
> > already happened last week, when somebody here was implementing native
> > sync without knowing about Execlists).
> > >
> > > So, the question is: how do you feel about a dev_priv.gt vfunc that takes a
> > context, a ring, an array of DWORDS and a BB length and does the
> > intel_(logical)_ring_begin/emit/advance based on i915.enable_execlists?
> > 
> > I'm still baffled by the design. intel_ring_begin() and friends should be able to
> > find their context (logical or legacy) from the ring and dtrt.
> > -Chris
> 
> Sorry, Chris, I obviously don't have the same experience with 915 as you have: how do you propose to extract the right context from the ring?

The rings are a set of buffers and vfuncs that are associated with a
context. Before you can call intel_ring_begin() you must know what
context you want to operate on and therefore can pick the right
logical/legacy ring and interface for RCS/BCS/VCS/etc
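
i.e. the call sites would look something like this (sketch; the helper is
made up, and begin/emit/advance would have to take the ringbuffer rather
than the engine):

	/* resolves to ctx->engine[id].ringbuf for an LRC,
	 * ring->buffer for legacy */
	struct intel_ringbuffer *ringbuf = intel_ringbuffer_get(ring, ctx);

	intel_ring_begin(ringbuf, num_dwords);
	intel_ring_emit(ringbuf, cmd);
	intel_ring_advance(ringbuf);
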
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 26/53] drm/i915/bdw: New logical ring submission mechanism
  2014-06-23 13:27           ` Chris Wilson
@ 2014-06-23 13:36             ` Mateo Lozano, Oscar
  2014-06-23 13:41               ` Chris Wilson
  0 siblings, 1 reply; 156+ messages in thread
From: Mateo Lozano, Oscar @ 2014-06-23 13:36 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

> -----Original Message-----
> From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
> Sent: Monday, June 23, 2014 2:27 PM
> To: Mateo Lozano, Oscar
> Cc: Volkin, Bradley D; intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH 26/53] drm/i915/bdw: New logical ring
> submission mechanism
> 
> On Mon, Jun 23, 2014 at 01:18:35PM +0000, Mateo Lozano, Oscar wrote:
> > > -----Original Message-----
> > > From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
> > > Sent: Monday, June 23, 2014 2:14 PM
> > > To: Mateo Lozano, Oscar
> > > Cc: Volkin, Bradley D; intel-gfx@lists.freedesktop.org
> > > Subject: Re: [Intel-gfx] [PATCH 26/53] drm/i915/bdw: New logical
> > > ring submission mechanism
> > >
> > > On Mon, Jun 23, 2014 at 01:09:37PM +0000, Mateo Lozano, Oscar
> wrote:
> > > > So far, yes, but that's only because I artificially made
> > > > intel_lrc.c self-
> > > contained, as Daniel requested. What if we need to execute commands
> > > from somewhere else, like in intel_gen7_queue_flip()?
> > > >
> > > > And this takes me to another discussion: this logical ring vs
> > > > legacy ring split
> > > is probably a good idea (time will tell), but we should provide a
> > > way of sending commands for execution without knowing if Execlists
> > > are enabled or not. In the early series that was easy because we
> > > reused the ring_begin, ring_emit & ring_advance functions, but this
> > > is not the case anymore. And without this, sooner or later somebody
> > > will break legacy or execlists (this already happened last week,
> > > when somebody here was implementing native sync without knowing
> about Execlists).
> > > >
> > > > So, the question is: how do you feel about a dev_priv.gt vfunc
> > > > that takes a
> > > context, a ring, an array of DWORDS and a BB length and does the
> > > intel_(logical)_ring_begin/emit/advance based on i915.enable_execlists?
> > >
> > > I'm still baffled by the design. intel_ring_begin() and friends
> > > should be able to find their context (logical or legacy) from the ring and
> dtrt.
> > > -Chris
> >
> > Sorry, Chris, I obviously don't have the same experience with 915 as you have:
> how do you propose to extract the right context from the ring?
> 
> The rings are a set of buffers and vfuncs that are associated with a context.
> Before you can call intel_ring_begin() you must know what context you want
> to operate on and therefore can pick the right logical/legacy ring and
> interface for RCS/BCS/VCS/etc -Chris

Ok, but then you need to pass some extra stuff down together with the intel_engine_cs, either intel_context or intel_ringbuffer, right? Because that's exactly what I did in previous versions, plumbing intel_context everywhere it was needed (I could have plumbed intel_ringbuffer instead, it really doesn't matter). This was rejected for being too intrusive and not allowing easy maintenance in the future.

-- Oscar

* Re: [PATCH 26/53] drm/i915/bdw: New logical ring submission mechanism
  2014-06-23 13:36             ` Mateo Lozano, Oscar
@ 2014-06-23 13:41               ` Chris Wilson
  2014-06-23 14:35                 ` Mateo Lozano, Oscar
  0 siblings, 1 reply; 156+ messages in thread
From: Chris Wilson @ 2014-06-23 13:41 UTC (permalink / raw)
  To: Mateo Lozano, Oscar; +Cc: intel-gfx

On Mon, Jun 23, 2014 at 01:36:07PM +0000, Mateo Lozano, Oscar wrote:
> > -----Original Message-----
> > From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
> > Sent: Monday, June 23, 2014 2:27 PM
> > To: Mateo Lozano, Oscar
> > Cc: Volkin, Bradley D; intel-gfx@lists.freedesktop.org
> > Subject: Re: [Intel-gfx] [PATCH 26/53] drm/i915/bdw: New logical ring
> > submission mechanism
> > 
> > On Mon, Jun 23, 2014 at 01:18:35PM +0000, Mateo Lozano, Oscar wrote:
> > > > -----Original Message-----
> > > > From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
> > > > Sent: Monday, June 23, 2014 2:14 PM
> > > > To: Mateo Lozano, Oscar
> > > > Cc: Volkin, Bradley D; intel-gfx@lists.freedesktop.org
> > > > Subject: Re: [Intel-gfx] [PATCH 26/53] drm/i915/bdw: New logical
> > > > ring submission mechanism
> > > >
> > > > On Mon, Jun 23, 2014 at 01:09:37PM +0000, Mateo Lozano, Oscar
> > wrote:
> > > > > > So far, yes, but that's only because I artificially made
> > > > > intel_lrc.c self-
> > > > contained, as Daniel requested. What if we need to execute commands
> > > > from somewhere else, like in intel_gen7_queue_flip()?
> > > > >
> > > > > And this takes me to another discussion: this logical ring vs
> > > > > legacy ring split
> > > > is probably a good idea (time will tell), but we should provide a
> > > > way of sending commands for execution without knowing if Execlists
> > > > are enabled or not. In the early series that was easy because we
> > > > reused the ring_begin, ring_emit & ring_advance functions, but this
> > > > is not the case anymore. And without this, sooner or later somebody
> > > > will break legacy or execlists (this already happened last week,
> > > > when somebody here was implementing native sync without knowing
> > about Execlists).
> > > > >
> > > > > > So, the question is: how do you feel about a dev_priv.gt
> > > > > that takes a
> > > > context, a ring, an array of DWORDS and a BB length and does the
> > > > intel_(logical)_ring_begin/emit/advance based on i915.enable_execlists?
> > > >
> > > > I'm still baffled by the design. intel_ring_begin() and friends
> > > > should be able to find their context (logical or legacy) from the ring and
> > dtrt.
> > > > -Chris
> > >
> > > Sorry, Chris, I obviously don't have the same experience with 915 as you have:
> > how do you propose to extract the right context from the ring?
> > 
> > The rings are a set of buffers and vfuncs that are associated with a context.
> > Before you can call intel_ring_begin() you must know what context you want
> > to operate on and therefore can pick the right logical/legacy ring and
> > interface for RCS/BCS/VCS/etc -Chris
> 
> Ok, but then you need to pass some extra stuff down together with the intel_engine_cs, either intel_context or intel_ringbuffer, right? Because that's exactly what I did in previous versions, plumbing intel_context everywhere it was needed (I could have plumbed intel_ringbuffer instead, it really doesn't matter). This was rejected for being too intrusive and not allowing easy maintenance in the future.

Nope. You didn't redesign the ringbuffers to function as we expected but
tacked on extra information and layering violations.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 26/53] drm/i915/bdw: New logical ring submission mechanism
  2014-06-23 13:41               ` Chris Wilson
@ 2014-06-23 14:35                 ` Mateo Lozano, Oscar
  2014-06-23 19:10                   ` Volkin, Bradley D
  2014-06-24  0:23                   ` Ben Widawsky
  0 siblings, 2 replies; 156+ messages in thread
From: Mateo Lozano, Oscar @ 2014-06-23 14:35 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

> -----Original Message-----
> From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
> Sent: Monday, June 23, 2014 2:42 PM
> To: Mateo Lozano, Oscar
> Cc: Volkin, Bradley D; intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH 26/53] drm/i915/bdw: New logical ring
> submission mechanism
> 
> On Mon, Jun 23, 2014 at 01:36:07PM +0000, Mateo Lozano, Oscar wrote:
> > > -----Original Message-----
> > > From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
> > > Sent: Monday, June 23, 2014 2:27 PM
> > > To: Mateo Lozano, Oscar
> > > Cc: Volkin, Bradley D; intel-gfx@lists.freedesktop.org
> > > Subject: Re: [Intel-gfx] [PATCH 26/53] drm/i915/bdw: New logical
> > > ring submission mechanism
> > >
> > > On Mon, Jun 23, 2014 at 01:18:35PM +0000, Mateo Lozano, Oscar
> wrote:
> > > > > -----Original Message-----
> > > > > From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
> > > > > Sent: Monday, June 23, 2014 2:14 PM
> > > > > To: Mateo Lozano, Oscar
> > > > > Cc: Volkin, Bradley D; intel-gfx@lists.freedesktop.org
> > > > > Subject: Re: [Intel-gfx] [PATCH 26/53] drm/i915/bdw: New logical
> > > > > ring submission mechanism
> > > > >
> > > > > On Mon, Jun 23, 2014 at 01:09:37PM +0000, Mateo Lozano, Oscar
> > > wrote:
> > > > > > So far, yes, but that´s only because I artificially made
> > > > > > intel_lrc.c self-
> > > > > contained, as Daniel requested. What if we need to execute
> > > > > commands from somewhere else, like in intel_gen7_queue_flip()?
> > > > > >
> > > > > > And this takes me to another discussion: this logical ring vs
> > > > > > legacy ring split
> > > > > is probably a good idea (time will tell), but we should provide
> > > > > a way of sending commands for execution without knowing if
> > > > > Execlists are enabled or not. In the early series that was easy
> > > > > because we reused the ring_begin, ring_emit & ring_advance
> > > > > functions, but this is not the case anymore. And without this,
> > > > > sooner or later somebody will break legacy or execlists (this
> > > > > already happened last week, when somebody here was implementing
> > > > > native sync without knowing
> > > about Execlists).
> > > > > >
> > > > > > So, the questions is: how do you feel about a dev_priv.gt
> > > > > > vfunc that takes a
> > > > > context, a ring, an array of DWORDS and a BB length and does the
> > > > > intel_(logical)_ring_begin/emit/advance based on
> i915.enable_execlists?
> > > > >
> > > > > I'm still baffled by the design. intel_ring_begin() and friends
> > > > > should be able to find their context (logical or legacy) from
> > > > > the ring and
> > > dtrt.
> > > > > -Chris
> > > >
> > > > Sorry, Chris, I obviously don´t have the same experience with 915 you
> have:
> > > how do you propose to extract the right context from the ring?
> > >
> > > The rings are a set of buffers and vfuncs that are associated with a
> context.
> > > Before you can call intel_ring_begin() you must know what context
> > > you want to operate on and therefore can pick the right
> > > logical/legacy ring and interface for RCS/BCS/VCS/etc -Chris
> >
> > Ok, but then you need to pass some extra stuff down together with the
> intel_engine_cs, either intel_context or intel_ringbuffer, right? Because
> that´s exactly what I did in previous versions, plumbing intel_context
> everywhere where it was needed (I could have plumbed intel_ringbuffer
> instead, it really doesn´t matter). This was rejected for being too intrusive
> and not allowing easy maintenance in the future.
> 
> Nope. You didn't redesign the ringbuffers to function as we expected but
> tacked on extra information and layering violations.
> -Chris

I know it's no excuse, but as I said, I don't know the code as well as you do. Let me explain the history of this one and maybe you can help me figure out where I got it all wrong:

- The original design I inherited from Ben and Jesse created a completely new "struct intel_ring_buffer" per context, and passed that one on instead of passing one of the &dev_priv->ring[i] ones. When it was time to submit a context to the ELSP, they simply took it from ring->context. The problem with that was that creating an unbounded number of "struct intel_ring_buffer" instances meant there was also an unbounded number of things like "struct list_head active_list" or "u32 irq_enable_mask", which made little sense.
- What we really needed, I naively thought, was to get rid of the concept of ring: there are no rings, there are engines (like the render engine, the vebox, etc...) and there are ringbuffers (as in circular buffers with a head offset, tail offset, and control information). So I went on and renamed the old "intel_ring_buffer" to "intel_engine_cs", and then extracted a few things into a new "intel_ringbuffer" struct. Pre-Execlists, there was a 1:1 relationship between the ringbuffers and the engines. With Execlists, however, this 1:1 relationship is between the ringbuffers and the contexts.
- I remember explaining this problem in a face-to-face meeting in the UK with some OTC people back in February (I think they tried to get you on the phone but they didn't manage; I do remember they got Daniel though). Here, I proposed that an easy solution (easy, but maybe not right) was to plumb the context down: in Execlists mode, we would retrieve the ringbuffer from the context while, in legacy mode, we would get it from the engine. Everybody seemed to agree with this.
- I worked on this premise for several versions that I sent to the mailing list for review (first to the internal list, then to intel-gfx). Daniel only complained last month, when he pointed out that he had asked, a long long time ago, for a completely separate execution path for Execlists. And this is where we are now...

And now back to the problem at hand: what do you suggest I do now (other than ritual seppuku, I mean)? I need to know the engine (for a lot of reasons), the ringbuffer (to know where to place commands) and the context (to know what to submit to the ELSP). I can spin it a thousand different ways, but I still need to know those three things. At the same time, I cannot reuse the old intel_ringbuffer code because Daniel won't approve. What would you do?
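
For reference, the engine/ringbuffer/context triangle as this series wires it up -- a sketch only; logical_ringbuf_get() is the helper from patch 27, and the field layout is approximate:

	static struct intel_ringbuffer *
	logical_ringbuf_get(struct intel_engine_cs *ring,
			    struct intel_context *ctx)
	{
		/* Execlists: each context owns one ringbuffer per engine */
		if (i915.enable_execlists)
			return ctx->engine[ring->id].ringbuf;

		/* Legacy: the engine owns its single global ringbuffer */
		return ring->buffer;
	}

The engine provides the vfuncs and HW state, the context provides the ELSP target, and the ringbuffer -- reachable from either, depending on the mode -- is where the commands actually go.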

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 12/53] drm/i915/bdw: Populate LR contexts (somewhat)
  2014-06-23 12:42     ` Mateo Lozano, Oscar
@ 2014-06-23 15:05       ` Volkin, Bradley D
  2014-06-23 15:11         ` Mateo Lozano, Oscar
  0 siblings, 1 reply; 156+ messages in thread
From: Volkin, Bradley D @ 2014-06-23 15:05 UTC (permalink / raw)
  To: Mateo Lozano, Oscar; +Cc: intel-gfx

On Mon, Jun 23, 2014 at 05:42:50AM -0700, Mateo Lozano, Oscar wrote:
> > -----Original Message-----
> > From: Volkin, Bradley D
> > > +	reg_state[CTX_RING_HEAD+1] = 0;
> > > +	reg_state[CTX_RING_TAIL] = RING_TAIL(ring->mmio_base);
> > > +	reg_state[CTX_RING_TAIL+1] = 0;
> > > +	reg_state[CTX_RING_BUFFER_START] = RING_START(ring-
> > >mmio_base);
> > > +	reg_state[CTX_RING_BUFFER_START+1] =
> > i915_gem_obj_ggtt_offset(ring_obj);
> > > +	reg_state[CTX_RING_BUFFER_CONTROL] = RING_CTL(ring-
> > >mmio_base);
> > > +	reg_state[CTX_RING_BUFFER_CONTROL+1] = (31 * PAGE_SIZE) |
> > > +RING_VALID;
> > 
> > The size here doesn't look right to me. Shouldn't it be (number of pages - 1)?
> > See init_ring_common()
> 
> But that´s exactly what it is, right? 
> 
> Our ringbuf->size = 32 * PAGE_SIZE;
> so we are setting 31 * PAGE_SIZE

Ok, on closer inspection, the result is correct because the Buffer Length
field happens to be bits 20:12. But it looked to me like a size-in-bytes
rather than the encoding I expected. So I guess I'd prefer that we do it
as in init_ring_common(), using ring_obj->base.size and the RING_NR_PAGES
mask so that it's a bit more obvious what's going on.

Thanks,
Brad

> 
> > > +	reg_state[CTX_BB_HEAD_U] = ring->mmio_base + 0x168;
> > > +	reg_state[CTX_BB_HEAD_U+1] = 0;
> > > +	reg_state[CTX_BB_HEAD_L] = ring->mmio_base + 0x140;
> > > +	reg_state[CTX_BB_HEAD_L+1] = 0;
> > > +	reg_state[CTX_BB_STATE] = ring->mmio_base + 0x110;
> > > +	reg_state[CTX_BB_STATE+1] = (1<<5);
> > > +	reg_state[CTX_SECOND_BB_HEAD_U] = ring->mmio_base + 0x11c;
> > > +	reg_state[CTX_SECOND_BB_HEAD_U+1] = 0;
> > > +	reg_state[CTX_SECOND_BB_HEAD_L] = ring->mmio_base + 0x114;
> > > +	reg_state[CTX_SECOND_BB_HEAD_L+1] = 0;
> > > +	reg_state[CTX_SECOND_BB_STATE] = ring->mmio_base + 0x118;
> > > +	reg_state[CTX_SECOND_BB_STATE+1] = 0;
> > > +	if (ring->id == RCS) {
> > > +		reg_state[CTX_BB_PER_CTX_PTR] = ring->mmio_base +
> > 0x1c0;
> > > +		reg_state[CTX_BB_PER_CTX_PTR+1] = 0;
> > > +		reg_state[CTX_RCS_INDIRECT_CTX] = ring->mmio_base +
> > 0x1c4;
> > > +		reg_state[CTX_RCS_INDIRECT_CTX+1] = 0;
> > > +		reg_state[CTX_RCS_INDIRECT_CTX_OFFSET] = ring-
> > >mmio_base + 0x1c8;
> > > +		reg_state[CTX_RCS_INDIRECT_CTX_OFFSET+1] = 0;
> > > +	}
> > > +	reg_state[CTX_LRI_HEADER_1] = MI_LOAD_REGISTER_IMM(9);
> > > +	reg_state[CTX_LRI_HEADER_1] |= MI_LRI_FORCE_POSTED;
> > > +	reg_state[CTX_CTX_TIMESTAMP] = ring->mmio_base + 0x3a8;
> > > +	reg_state[CTX_CTX_TIMESTAMP+1] = 0;
> > > +	reg_state[CTX_PDP3_UDW] = GEN8_RING_PDP_UDW(ring, 3);
> > > +	reg_state[CTX_PDP3_LDW] = GEN8_RING_PDP_LDW(ring, 3);
> > > +	reg_state[CTX_PDP2_UDW] = GEN8_RING_PDP_UDW(ring, 2);
> > > +	reg_state[CTX_PDP2_LDW] = GEN8_RING_PDP_LDW(ring, 2);
> > > +	reg_state[CTX_PDP1_UDW] = GEN8_RING_PDP_UDW(ring, 1);
> > > +	reg_state[CTX_PDP1_LDW] = GEN8_RING_PDP_LDW(ring, 1);
> > > +	reg_state[CTX_PDP0_UDW] = GEN8_RING_PDP_UDW(ring, 0);
> > > +	reg_state[CTX_PDP0_LDW] = GEN8_RING_PDP_LDW(ring, 0);
> > > +	reg_state[CTX_PDP3_UDW+1] = (u64)ppgtt->pd_dma_addr[3] >> 32;
> > > +	reg_state[CTX_PDP3_LDW+1] = ppgtt->pd_dma_addr[3];
> > > +	reg_state[CTX_PDP2_UDW+1] = (u64)ppgtt->pd_dma_addr[2] >> 32;
> > > +	reg_state[CTX_PDP2_LDW+1] = ppgtt->pd_dma_addr[2];
> > > +	reg_state[CTX_PDP1_UDW+1] = (u64)ppgtt->pd_dma_addr[1] >> 32;
> > > +	reg_state[CTX_PDP1_LDW+1] = ppgtt->pd_dma_addr[1];
> > > +	reg_state[CTX_PDP0_UDW+1] = (u64)ppgtt->pd_dma_addr[0] >> 32;
> > > +	reg_state[CTX_PDP0_LDW+1] = ppgtt->pd_dma_addr[0];
> > 
> > Are we able to use upper_32_bits() and lower_32_bits() for these?
> 
> ACK
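
(For reference, the suggested form would presumably be, per pair:

	reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pd_dma_addr[3]);
	reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pd_dma_addr[3]);

and likewise for PDP2..PDP0; upper_32_bits()/lower_32_bits() are the standard helpers from linux/kernel.h.)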
> 
> -- Oscar

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 12/53] drm/i915/bdw: Populate LR contexts (somewhat)
  2014-06-23 15:05       ` Volkin, Bradley D
@ 2014-06-23 15:11         ` Mateo Lozano, Oscar
  0 siblings, 0 replies; 156+ messages in thread
From: Mateo Lozano, Oscar @ 2014-06-23 15:11 UTC (permalink / raw)
  To: Volkin, Bradley D; +Cc: intel-gfx

> -----Original Message-----
> From: Volkin, Bradley D
> Sent: Monday, June 23, 2014 4:06 PM
> To: Mateo Lozano, Oscar
> Cc: intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH 12/53] drm/i915/bdw: Populate LR contexts
> (somewhat)
> 
> On Mon, Jun 23, 2014 at 05:42:50AM -0700, Mateo Lozano, Oscar wrote:
> > > -----Original Message-----
> > > From: Volkin, Bradley D
> > > > +	reg_state[CTX_RING_HEAD+1] = 0;
> > > > +	reg_state[CTX_RING_TAIL] = RING_TAIL(ring->mmio_base);
> > > > +	reg_state[CTX_RING_TAIL+1] = 0;
> > > > +	reg_state[CTX_RING_BUFFER_START] = RING_START(ring-
> > > >mmio_base);
> > > > +	reg_state[CTX_RING_BUFFER_START+1] =
> > > i915_gem_obj_ggtt_offset(ring_obj);
> > > > +	reg_state[CTX_RING_BUFFER_CONTROL] = RING_CTL(ring-
> > > >mmio_base);
> > > > +	reg_state[CTX_RING_BUFFER_CONTROL+1] = (31 * PAGE_SIZE) |
> > > > +RING_VALID;
> > >
> > > The size here doesn't look right to me. Shouldn't it be (number of pages -
> 1)?
> > > See init_ring_common()
> >
> > But that´s exactly what it is, right?
> >
> > Our ringbuf->size = 32 * PAGE_SIZE;
> > so we are setting 31 * PAGE_SIZE
> 
> Ok, on closer inspection, the result is correct because the Buffer Length field
> happens to be bits 20:12. But it looked to me like a size-in-bytes rather than
> the encoding I expected. So I guess I'd prefer that we do it as in
> init_ring_common(), using ring_obj->base.size and the RING_NR_PAGES
> mask so that it's a bit more obvious what's going on.
> 
> Thanks,
> Brad

Yeah, it probably looks less magical that way. Ok, will do.
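
For the record, mirroring init_ring_common() would look something like this (a sketch, untested):

	reg_state[CTX_RING_BUFFER_CONTROL+1] =
		((ring_obj->base.size - PAGE_SIZE) & RING_NR_PAGES) |
		RING_VALID;

Same encoded value for a 32-page ring, but the pages-minus-one convention is explicit.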

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 27/53] drm/i915/bdw: GEN-specific logical ring emit request
  2014-06-20 21:18   ` Volkin, Bradley D
@ 2014-06-23 15:48     ` Mateo Lozano, Oscar
  0 siblings, 0 replies; 156+ messages in thread
From: Mateo Lozano, Oscar @ 2014-06-23 15:48 UTC (permalink / raw)
  To: Volkin, Bradley D; +Cc: intel-gfx

> -----Original Message-----
> From: Volkin, Bradley D
> Sent: Friday, June 20, 2014 10:18 PM
> To: Mateo Lozano, Oscar
> Cc: intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH 27/53] drm/i915/bdw: GEN-specific logical
> ring emit request
> 
> On Fri, Jun 13, 2014 at 08:37:45AM -0700, oscar.mateo@intel.com wrote:
> > From: Oscar Mateo <oscar.mateo@intel.com>
> >
> > Very similar to the legacy add_request, only modified to account for
> > logical ringbuffer.
> >
> > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_reg.h         |  1 +
> >  drivers/gpu/drm/i915/intel_lrc.c        | 61
> +++++++++++++++++++++++++++++++++
> >  drivers/gpu/drm/i915/intel_ringbuffer.h |  2 ++
> >  3 files changed, 64 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_reg.h
> > b/drivers/gpu/drm/i915/i915_reg.h index 9c8692a..63ec3ea 100644
> > --- a/drivers/gpu/drm/i915/i915_reg.h
> > +++ b/drivers/gpu/drm/i915/i915_reg.h
> > @@ -267,6 +267,7 @@
> >  #define   MI_FORCE_RESTORE		(1<<1)
> >  #define   MI_RESTORE_INHIBIT		(1<<0)
> >  #define MI_STORE_DWORD_IMM	MI_INSTR(0x20, 1)
> > +#define MI_STORE_DWORD_IMM_GEN8	MI_INSTR(0x20, 2)
> >  #define   MI_MEM_VIRTUAL	(1 << 22) /* 965+ only */
> >  #define MI_STORE_DWORD_INDEX	MI_INSTR(0x21, 1)
> >  #define   MI_STORE_DWORD_INDEX_SHIFT 2
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c
> > b/drivers/gpu/drm/i915/intel_lrc.c
> > index 89aed7a..3debe8b 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -359,6 +359,62 @@ static void gen8_submit_ctx(struct
> intel_engine_cs *ring,
> >  	DRM_ERROR("Execlists still not ready!\n");  }
> >
> > +static int gen8_emit_request(struct intel_engine_cs *ring,
> > +			     struct intel_context *ctx)
> > +{
> > +	struct intel_ringbuffer *ringbuf = logical_ringbuf_get(ring, ctx);
> > +	u32 cmd;
> > +	int ret;
> > +
> > +	ret = intel_logical_ring_begin(ring, ctx, 6);
> > +	if (ret)
> > +		return ret;
> > +
> > +	cmd = MI_FLUSH_DW + 1;
> > +	cmd |= MI_INVALIDATE_TLB;
> 
> Is the TLB invalidation truely required here? Otherwise it seems like we could
> use the same function for all rings, like on gen6+.

Hmmmmm... this is inherited from back when we only had the simulator, and its true meaning has been lost in the multiple rewrites:

drm/i915/bdw: Use MI_FLUSH_DW for requests
    
    The primary reason for doing this is MI_STORE_DWORD_IDX doesn't work in
    simulation. The simulator doesn't complain, it's just the seqno never
    gets pushed to memory. The theory is (and AFAICT this may be broken on
    existing platforms) we must issue an MI_FLUSH_DW after we emit the
    seqno, if we want to be able to read it back coherently.

I'll rewrite it to use the same MI_STORE_DATA_IMM for every ring and then test it on real hardware.
Thanks for the heads up!

> > +	cmd |= MI_FLUSH_DW_OP_STOREDW;
> > +
> > +	intel_logical_ring_emit(ringbuf, cmd);
> > +	intel_logical_ring_emit(ringbuf,
> > +				(ring->status_page.gfx_addr +
> > +				(I915_GEM_HWS_INDEX <<
> MI_STORE_DWORD_INDEX_SHIFT)) |
> > +				MI_FLUSH_DW_USE_GTT);
> > +	intel_logical_ring_emit(ringbuf, 0);
> > +	intel_logical_ring_emit(ringbuf, ring->outstanding_lazy_seqno);
> > +	intel_logical_ring_emit(ringbuf, MI_USER_INTERRUPT);
> > +	intel_logical_ring_emit(ringbuf, MI_NOOP);
> > +	intel_logical_ring_advance_and_submit(ring, ctx);
> > +
> > +	return 0;
> > +}
> > +
> > +static int gen8_emit_request_render(struct intel_engine_cs *ring,
> > +				    struct intel_context *ctx)
> > +{
> > +	struct intel_ringbuffer *ringbuf = logical_ringbuf_get(ring, ctx);
> > +	u32 cmd;
> > +	int ret;
> > +
> > +	ret = intel_logical_ring_begin(ring, ctx, 6);
> > +	if (ret)
> > +		return ret;
> > +
> > +	cmd = MI_STORE_DWORD_IMM_GEN8;
> > +	cmd |= (1 << 22); /* use global GTT */
> 
> We could use MI_MEM_VIRTUAL or MI_GLOBAL_GTT instead.

Will do, thanks.

-- Oscar

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 26/53] drm/i915/bdw: New logical ring submission mechanism
  2014-06-23 14:35                 ` Mateo Lozano, Oscar
@ 2014-06-23 19:10                   ` Volkin, Bradley D
  2014-06-24 12:29                     ` Mateo Lozano, Oscar
  2014-06-24  0:23                   ` Ben Widawsky
  1 sibling, 1 reply; 156+ messages in thread
From: Volkin, Bradley D @ 2014-06-23 19:10 UTC (permalink / raw)
  To: Mateo Lozano, Oscar; +Cc: intel-gfx

On Mon, Jun 23, 2014 at 07:35:38AM -0700, Mateo Lozano, Oscar wrote:
> > -----Original Message-----
> > From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
> > Sent: Monday, June 23, 2014 2:42 PM
> > To: Mateo Lozano, Oscar
> > Cc: Volkin, Bradley D; intel-gfx@lists.freedesktop.org
> > Subject: Re: [Intel-gfx] [PATCH 26/53] drm/i915/bdw: New logical ring
> > submission mechanism
> > 
> > On Mon, Jun 23, 2014 at 01:36:07PM +0000, Mateo Lozano, Oscar wrote:
> > > > -----Original Message-----
> > > > From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
> > > > Sent: Monday, June 23, 2014 2:27 PM
> > > > To: Mateo Lozano, Oscar
> > > > Cc: Volkin, Bradley D; intel-gfx@lists.freedesktop.org
> > > > Subject: Re: [Intel-gfx] [PATCH 26/53] drm/i915/bdw: New logical
> > > > ring submission mechanism
> > > >
> > > > On Mon, Jun 23, 2014 at 01:18:35PM +0000, Mateo Lozano, Oscar
> > wrote:
> > > > > > -----Original Message-----
> > > > > > From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
> > > > > > Sent: Monday, June 23, 2014 2:14 PM
> > > > > > To: Mateo Lozano, Oscar
> > > > > > Cc: Volkin, Bradley D; intel-gfx@lists.freedesktop.org
> > > > > > Subject: Re: [Intel-gfx] [PATCH 26/53] drm/i915/bdw: New logical
> > > > > > ring submission mechanism
> > > > > >
> > > > > > On Mon, Jun 23, 2014 at 01:09:37PM +0000, Mateo Lozano, Oscar
> > > > wrote:
> > > > > > > So far, yes, but that´s only because I artificially made
> > > > > > > intel_lrc.c self-
> > > > > > contained, as Daniel requested. What if we need to execute
> > > > > > commands from somewhere else, like in intel_gen7_queue_flip()?
> > > > > > >
> > > > > > > And this takes me to another discussion: this logical ring vs
> > > > > > > legacy ring split
> > > > > > is probably a good idea (time will tell), but we should provide
> > > > > > a way of sending commands for execution without knowing if
> > > > > > Execlists are enabled or not. In the early series that was easy
> > > > > > because we reused the ring_begin, ring_emit & ring_advance
> > > > > > functions, but this is not the case anymore. And without this,
> > > > > > sooner or later somebody will break legacy or execlists (this
> > > > > > already happened last week, when somebody here was implementing
> > > > > > native sync without knowing
> > > > about Execlists).
> > > > > > >
> > > > > > > So, the questions is: how do you feel about a dev_priv.gt
> > > > > > > vfunc that takes a
> > > > > > context, a ring, an array of DWORDS and a BB length and does the
> > > > > > intel_(logical)_ring_begin/emit/advance based on
> > i915.enable_execlists?

There are 3 cases of non-execbuffer submissions that I can think of: flips,
render state, and clear-buffer (proposed patches on the list). I wonder if the
right approach might be to use batchbuffers with a small wrapper around the
dispatch_execbuffer/emit_bb_start vfuncs. Basically the rule would be to only
touch a ringbuffer from within the intel_engine_cs vfuncs, which always know
which set of functions to use.

For flips, we could use MMIO flips. Render state already uses the existing
dispatch_execbuffer() and add_request(). The clear code could potentially do
the same. There would obviously be some overhead in using a batch buffer for
what could end up being just a few commands. Perhaps the batch buffer pool
code from the command parser would help though.
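
A hypothetical shape for that wrapper, just to make the idea concrete (a sketch: i915_gem_batch_pool_obtain() stands in for whatever the command parser's pool code would export, and pinning/error handling are simplified):

	static int intel_emit_commands(struct intel_engine_cs *ring,
				       const u32 *cmds, u32 num_dwords)
	{
		struct drm_i915_gem_object *batch;
		u32 *vaddr;

		/* hypothetical pool helper, cf. the command parser series */
		batch = i915_gem_batch_pool_obtain(ring->dev,
						   (num_dwords + 1) * 4);
		if (IS_ERR(batch))
			return PTR_ERR(batch);

		/* copy the commands in and terminate the batch */
		vaddr = kmap(sg_page(batch->pages->sgl));
		memcpy(vaddr, cmds, num_dwords * 4);
		vaddr[num_dwords] = MI_BATCH_BUFFER_END;
		kunmap(sg_page(batch->pages->sgl));

		/* the engine vfunc resolves legacy vs execlists internally */
		return ring->dispatch_execbuffer(ring,
						 i915_gem_obj_ggtt_offset(batch),
						 (num_dwords + 1) * 4, 0);
	}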

> > > > > >
> > > > > > I'm still baffled by the design. intel_ring_begin() and friends
> > > > > > should be able to find their context (logical or legacy) from
> > > > > > the ring and
> > > > dtrt.
> > > > > > -Chris
> > > > >
> > > > > Sorry, Chris, I obviously don´t have the same experience with 915 you
> > have:
> > > > how do you propose to extract the right context from the ring?
> > > >
> > > > The rings are a set of buffers and vfuncs that are associated with a
> > context.
> > > > Before you can call intel_ring_begin() you must know what context
> > > > you want to operate on and therefore can pick the right
> > > > logical/legacy ring and interface for RCS/BCS/VCS/etc -Chris
> > >
> > > Ok, but then you need to pass some extra stuff down together with the
> > intel_engine_cs, either intel_context or intel_ringbuffer, right? Because
> > that´s exactly what I did in previous versions, plumbing intel_context
> > everywhere where it was needed (I could have plumbed intel_ringbuffer
> > instead, it really doesn´t matter). This was rejected for being too intrusive
> > and not allowing easy maintenance in the future.
> > 
> > Nope. You didn't redesign the ringbuffers to function as we expected but
> > tacked on extra information and layering violations.

Not sure what your proposed alternative is here, Chris. I'll elaborate on what
I had proposed re: plumbing intel_ringbuffer instead of intel_context, which
might be along the lines of what you envision. At this point, we're starting
to go in circles, so I don't know if it's worth revisiting beyond that.

The earlier versions of the series modified all of the intel_ring_* functions
to accept (engine, context) as parameters. At or near the bottom of the call
chain (e.g. in the engine vfuncs), we called a new function,
intel_ringbuffer_get(engine, context), to return the appropriate ringbuffer
in both legacy and lrc modes. However, a given struct intel_ringbuffer is
only ever used with a particular engine (for both legacy and lrc) and with
a particular context (for lrc only). So I suggested that we:

- Add a back pointer from struct intel_ringbuffer to intel_context (would only
  be valid for lrc mode)
- Move the intel_ringbuffer_get(engine, context) calls up to the callers
- Pass (engine, ringbuf) instead of (engine, context) to intel_ring_* functions
- Have the vfunc implementations get the context from the ringbuffer where
  needed and ignore it where not

Looking again, we could probably add a back pointer to the intel_engine_cs as
well and then just pass around the ringbuf. In any case, we went back and forth
on this a bit and decided to just stick with passing (engine, context). Which we
then decided was too invasive, and here we are. So is that closer to what you're
thinking of, or did you have something else in mind?
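
Sketched out (members abridged; the two back pointers are the proposal, not the current code):

	struct intel_ringbuffer {
		struct drm_i915_gem_object *obj;
		u32 head;
		u32 tail;
		u32 size;

		struct intel_engine_cs *ring;	/* always valid */
		struct intel_context *ctx;	/* lrc mode only, NULL for legacy */
	};

A vfunc implementation would then take (engine, ringbuf) and recover the context via ringbuf->ctx where the execlists path needs it, ignoring it otherwise.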

Thanks,
Brad

> > -Chris
> 
> I know is no excuse, but as I said, I don´t know the code as well as you do. Let me explain to you the history of this one and maybe you can help me out discovering where I got it all wrong:
> 
> - The original design I inherited from Ben and Jesse created a completely new "struct intel_ring_buffer" per context, and passed that one on instead of passing one of the &dev_priv->ring[i] ones. When it was time to submit a context to the ELSP, they simply took it from ring->context. The problem with that was that creating an unbound number of "struct intel_ring_buffer" meant there were also an unbound number of things like "struct list_head active_list" or "u32 irq_enable_mask", which made little sense.
> - What we really needed, I naively thought, is to get rid of the concept of ring: there are no rings, there are engines (like the render engine, the vebox, etc...) and there are ringbuffers (as in circular buffer with a head offset, tail offset, and control information). So I went on and renamed the old "intel_ring_buffer" to "intel_engine_cs", then extracting a few things into a new "intel_ringbuffer" struct. Pre-Execlists, there was a 1:1 relationship between the ringbuffers and the engines. With Execlists, however, this 1:1 relationship is between the ringbuffers and the contexts.
> - I remember explaining this problem in a face-to-face meeting in the UK with some OTC people back in February (I think they tried to get you on the phone but they didn´t manage. I do remember they got Daniel though). Here, I proposed that an easy solution (easy, but maybe not right) was to plumb the context down: in Execlists mode, we would retrieve the ringbuffer from the context while, in legacy mode, we would get it from the engine. Everybody seemed to agree with this.
> - I worked on this premise for several versions that I sent to the mailing list for review (first the internal, then intel-gfx). Daniel only complained last month, when he pointed out that he asked, a long long time ago, for a completely separate execution path for Execlists. And this is where we are now...
> 
> And now back to the problem at hand: what do you suggest I do now (other than ritual seppuku, I mean)? I need to know the engine (for a lot of reasons), the ringbuffer (to know where to place commands) and the context (to know what to submit to the ELSP). I can spin it a thousand different ways, but I still need to know those three things. At the same time, I cannot reuse the old intel_ringbuffer code because Daniel won´t approve. What would you do?

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 26/53] drm/i915/bdw: New logical ring submission mechanism
  2014-06-23 14:35                 ` Mateo Lozano, Oscar
  2014-06-23 19:10                   ` Volkin, Bradley D
@ 2014-06-24  0:23                   ` Ben Widawsky
  2014-06-24 11:45                     ` Mateo Lozano, Oscar
  1 sibling, 1 reply; 156+ messages in thread
From: Ben Widawsky @ 2014-06-24  0:23 UTC (permalink / raw)
  To: Mateo Lozano, Oscar; +Cc: intel-gfx

On Mon, Jun 23, 2014 at 02:35:38PM +0000, Mateo Lozano, Oscar wrote:
> > -----Original Message-----
> > From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
> > Sent: Monday, June 23, 2014 2:42 PM
> > To: Mateo Lozano, Oscar
> > Cc: Volkin, Bradley D; intel-gfx@lists.freedesktop.org
> > Subject: Re: [Intel-gfx] [PATCH 26/53] drm/i915/bdw: New logical ring
> > submission mechanism
> > 
> > On Mon, Jun 23, 2014 at 01:36:07PM +0000, Mateo Lozano, Oscar wrote:
> > > > -----Original Message-----
> > > > From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
> > > > Sent: Monday, June 23, 2014 2:27 PM
> > > > To: Mateo Lozano, Oscar
> > > > Cc: Volkin, Bradley D; intel-gfx@lists.freedesktop.org
> > > > Subject: Re: [Intel-gfx] [PATCH 26/53] drm/i915/bdw: New logical
> > > > ring submission mechanism
> > > >
> > > > On Mon, Jun 23, 2014 at 01:18:35PM +0000, Mateo Lozano, Oscar
> > wrote:
> > > > > > -----Original Message-----
> > > > > > From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
> > > > > > Sent: Monday, June 23, 2014 2:14 PM
> > > > > > To: Mateo Lozano, Oscar
> > > > > > Cc: Volkin, Bradley D; intel-gfx@lists.freedesktop.org
> > > > > > Subject: Re: [Intel-gfx] [PATCH 26/53] drm/i915/bdw: New logical
> > > > > > ring submission mechanism
> > > > > >
> > > > > > On Mon, Jun 23, 2014 at 01:09:37PM +0000, Mateo Lozano, Oscar
> > > > wrote:
> > > > > > > So far, yes, but that´s only because I artificially made
> > > > > > > intel_lrc.c self-
> > > > > > contained, as Daniel requested. What if we need to execute
> > > > > > commands from somewhere else, like in intel_gen7_queue_flip()?
> > > > > > >
> > > > > > > And this takes me to another discussion: this logical ring vs
> > > > > > > legacy ring split
> > > > > > is probably a good idea (time will tell), but we should provide
> > > > > > a way of sending commands for execution without knowing if
> > > > > > Execlists are enabled or not. In the early series that was easy
> > > > > > because we reused the ring_begin, ring_emit & ring_advance
> > > > > > functions, but this is not the case anymore. And without this,
> > > > > > sooner or later somebody will break legacy or execlists (this
> > > > > > already happened last week, when somebody here was implementing
> > > > > > native sync without knowing
> > > > about Execlists).
> > > > > > >
> > > > > > > So, the questions is: how do you feel about a dev_priv.gt
> > > > > > > vfunc that takes a
> > > > > > context, a ring, an array of DWORDS and a BB length and does the
> > > > > > intel_(logical)_ring_begin/emit/advance based on
> > i915.enable_execlists?
> > > > > >
> > > > > > I'm still baffled by the design. intel_ring_begin() and friends
> > > > > > should be able to find their context (logical or legacy) from
> > > > > > the ring and
> > > > dtrt.
> > > > > > -Chris
> > > > >
> > > > > Sorry, Chris, I obviously don´t have the same experience with 915 you
> > have:
> > > > how do you propose to extract the right context from the ring?
> > > >
> > > > The rings are a set of buffers and vfuncs that are associated with a
> > context.
> > > > Before you can call intel_ring_begin() you must know what context
> > > > you want to operate on and therefore can pick the right
> > > > logical/legacy ring and interface for RCS/BCS/VCS/etc -Chris
> > >
> > > Ok, but then you need to pass some extra stuff down together with the
> > intel_engine_cs, either intel_context or intel_ringbuffer, right? Because
> > that´s exactly what I did in previous versions, plumbing intel_context
> > everywhere where it was needed (I could have plumbed intel_ringbuffer
> > instead, it really doesn´t matter). This was rejected for being too intrusive
> > and not allowing easy maintenance in the future.
> > 
> > Nope. You didn't redesign the ringbuffers to function as we expected but
> > tacked on extra information and layering violations.
> > -Chris
> 
> I know is no excuse, but as I said, I don´t know the code as well as you do. Let me explain to you the history of this one and maybe you can help me out discovering where I got it all wrong:
> 
> - The original design I inherited from Ben and Jesse created a completely new "struct intel_ring_buffer" per context, and passed that one on instead of passing one of the &dev_priv->ring[i] ones. When it was time to submit a context to the ELSP, they simply took it from ring->context. The problem with that was that creating an unbound number of "struct intel_ring_buffer" meant there were also an unbound number of things like "struct list_head active_list" or "u32 irq_enable_mask", which made little sense.
> - What we really needed, I naively thought, is to get rid of the concept of ring: there are no rings, there are engines (like the render engine, the vebox, etc...) and there are ringbuffers (as in circular buffer with a head offset, tail offset, and control information). So I went on and renamed the old "intel_ring_buffer" to "intel_engine_cs", then extracting a few things into a new "intel_ringbuffer" struct. Pre-Execlists, there was a 1:1 relationship between the ringbuffers and the engines. With Execlists, however, this 1:1 relationship is between the ringbuffers and the contexts.
> - I remember explaining this problem in a face-to-face meeting in the UK with some OTC people back in February (I think they tried to get you on the phone but they didn´t manage. I do remember they got Daniel though). Here, I proposed that an easy solution (easy, but maybe not right) was to plumb the context down: in Execlists mode, we would retrieve the ringbuffer from the context while, in legacy mode, we would get it from the engine. Everybody seemed to agree with this.
> - I worked on this premise for several versions that I sent to the mailing list for review (first the internal, then intel-gfx). Daniel only complained last month, when he pointed out that he asked, a long long time ago, for a completely separate execution path for Execlists. And this is where we are now...
> 
> And now back to the problem at hand: what do you suggest I do now (other than ritual seppuku, I mean)? I need to know the engine (for a lot of reasons), the ringbuffer (to know where to place commands) and the context (to know what to submit to the ELSP). I can spin it a thousand different ways, but I still need to know those three things. At the same time, I cannot reuse the old intel_ringbuffer code because Daniel won´t approve. What would you do?

I think what Chris is trying to say is that all of your operations
should be context driven. The ringbuffer is always a derivative of the
context. If I understand Chris correctly, I agree with him, but he is
being characteristically terse.

There should be two data structures:
intel_engine_cs, formerly intel_ringbuffer - for legacy
intel_context, formerly intel_hw_context - for execlist

You did that.

Then there should be a data structure to represent the ringbuffer within
the execlist context.

You did that, too.

I don't think the fact that there is a separate execbuf path makes any
difference in this conversation, but perhaps I missed some detail.

So at least from what I can tell, the data structures are right. The
problem is that we're mixing and matching intel_engine_cs with the new
[and I wish we could have used a different name] intel_ringbuffer. As
an example from near the top of the patch:
+       intel_logical_ring_advance(ringbuf);
+
+       if (intel_ring_stopped(ring))
+               return;

You're advancing the ringbuf, but checking the ring? That's confusing to
me.

I think the only solution for what Chris is asking for is to implement
this as 1 context per engine, as opposed to 1 context with a context
object per engine. As you correctly stated, I think we all agreed the
latter was fine when we met. Functionally, I see no difference, but it
does allow you to always use a context as the sole mechanism for making
any decisions and performing any operations. Now without writing all the
code, I can't promise it actually will look better, but I think it's
likely going to be a lot cleaner. Before you do any changes though...

On to what I see as the real problem: fundamentally, if Daniel put Chris
in charge of giving the thumbs up or down, then you should get Daniel to
agree that he will defer to Chris, and you should do whatever Chris
says. You need not be caught in the middle of Daniel and Chris - it is a
bad place to be (I know from much experience). If Daniel is not okay
with that, then he needs to find a different reviewer.

-- 
Ben Widawsky, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 26/53] drm/i915/bdw: New logical ring submission mechanism
  2014-06-24  0:23                   ` Ben Widawsky
@ 2014-06-24 11:45                     ` Mateo Lozano, Oscar
  2014-06-24 14:41                       ` Volkin, Bradley D
  0 siblings, 1 reply; 156+ messages in thread
From: Mateo Lozano, Oscar @ 2014-06-24 11:45 UTC (permalink / raw)
  To: Ben Widawsky, Volkin, Bradley D; +Cc: intel-gfx

Ok, let's try to extract something positive out of all this.

OPTION A (Ben's proposal):

> I think the only solution for what Chris is asking for is to implement this as 1
> context per engine, as opposed to 1 context with a context object per
> engine. As you correctly stated, I think we all agreed the latter was fine when
> we met. Functionally, I see no difference, but it does allow you to always use
> a context as the sole mechanism for making any decisions and performing
> any operations. Now without writing all the code, I can't promise it actually
> will look better, but I think it's likely going to be a lot cleaner. Before you do
> any changes though...

We agreed on this early on (v1), yes, but then the idea was frowned upon by Brad and then by Daniel. I cannot recall exactly why anymore, but one big reason was that the idr mechanism makes it difficult to track several contexts with the same id (and userspace only has one context handle), and something about ctx->hang_stats. From v2 on, we agreed to multiplex different engines inside one intel_context (and that's why we renamed i915_hw_context to intel_context).
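
Abridged sketches of the two layouts, for contrast (illustrative alternatives, not code from the series):

	/* Option A: one context instance per engine, all sharing one
	 * user-visible id -- hence the idr tracking problem. */
	struct intel_context {
		struct kref ref;
		struct intel_engine_cs *engine;	/* fixed at creation */
		struct drm_i915_gem_object *state;
		struct intel_ringbuffer *ringbuf;
	};

	/* What v2+ does instead: one context multiplexing all engines,
	 * keeping the logically per-context fields in a single place. */
	struct intel_context {
		struct kref ref;
		struct i915_ctx_hang_stats hang_stats;
		struct {
			struct drm_i915_gem_object *state;
			struct intel_ringbuffer *ringbuf;
		} engine[I915_NUM_RINGS];
	};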

OPTION B (Brad's proposal):

> So I suggested that we:
> 
> - Add a back pointer from struct intel_ringbuffer to intel_context (would only
>   be valid for lrc mode)
> - Move the intel_ringbuffer_get(engine, context) calls up to the callers
> - Pass (engine, ringbuf) instead of (engine, context) to intel_ring_* functions
> - Have the vfunc implementations get the context from the ringbuffer where
>   needed and ignore it where not
> 
> Looking again, we could probably add a back pointer to the intel_engine_cs
> as well and then just pass around the ringbuf.

Sounds fine by me: intel_ringbuffer is only related to exactly one intel_engine_cs and one intel_context, so having pointers to those two makes sense.
As before, this could be easily done within the existing code (passing intel_ringbuffer instead of intel_engine_cs), but Daniel wants a code split, so I can only do it for the logical ring functions.

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 26/53] drm/i915/bdw: New logical ring submission mechanism
  2014-06-23 19:10                   ` Volkin, Bradley D
@ 2014-06-24 12:29                     ` Mateo Lozano, Oscar
  2014-07-07 12:39                       ` Daniel Vetter
  0 siblings, 1 reply; 156+ messages in thread
From: Mateo Lozano, Oscar @ 2014-06-24 12:29 UTC (permalink / raw)
  To: Volkin, Bradley D; +Cc: intel-gfx

> -----Original Message-----
> From: Volkin, Bradley D
> Sent: Monday, June 23, 2014 8:10 PM
> To: Mateo Lozano, Oscar
> Cc: Chris Wilson; intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH 26/53] drm/i915/bdw: New logical ring
> submission mechanism
> 
> On Mon, Jun 23, 2014 at 07:35:38AM -0700, Mateo Lozano, Oscar wrote:
> > > -----Original Message-----
> > > From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
> > > Sent: Monday, June 23, 2014 2:42 PM
> > > To: Mateo Lozano, Oscar
> > > Cc: Volkin, Bradley D; intel-gfx@lists.freedesktop.org
> > > Subject: Re: [Intel-gfx] [PATCH 26/53] drm/i915/bdw: New logical
> > > ring submission mechanism
> > >
> > > On Mon, Jun 23, 2014 at 01:36:07PM +0000, Mateo Lozano, Oscar
> wrote:
> > > > > -----Original Message-----
> > > > > From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
> > > > > Sent: Monday, June 23, 2014 2:27 PM
> > > > > To: Mateo Lozano, Oscar
> > > > > Cc: Volkin, Bradley D; intel-gfx@lists.freedesktop.org
> > > > > Subject: Re: [Intel-gfx] [PATCH 26/53] drm/i915/bdw: New logical
> > > > > ring submission mechanism
> > > > >
> > > > > On Mon, Jun 23, 2014 at 01:18:35PM +0000, Mateo Lozano, Oscar
> > > wrote:
> > > > > > > -----Original Message-----
> > > > > > > From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
> > > > > > > Sent: Monday, June 23, 2014 2:14 PM
> > > > > > > To: Mateo Lozano, Oscar
> > > > > > > Cc: Volkin, Bradley D; intel-gfx@lists.freedesktop.org
> > > > > > > Subject: Re: [Intel-gfx] [PATCH 26/53] drm/i915/bdw: New
> > > > > > > logical ring submission mechanism
> > > > > > >
> > > > > > > On Mon, Jun 23, 2014 at 01:09:37PM +0000, Mateo Lozano,
> > > > > > > Oscar
> > > > > wrote:
> > > > > > > > So far, yes, but that´s only because I artificially made
> > > > > > > > intel_lrc.c self-
> > > > > > > contained, as Daniel requested. What if we need to execute
> > > > > > > commands from somewhere else, like in intel_gen7_queue_flip()?
> > > > > > > >
> > > > > > > > And this takes me to another discussion: this logical ring
> > > > > > > > vs legacy ring split
> > > > > > > is probably a good idea (time will tell), but we should
> > > > > > > provide a way of sending commands for execution without
> > > > > > > knowing if Execlists are enabled or not. In the early series
> > > > > > > that was easy because we reused the ring_begin, ring_emit &
> > > > > > > ring_advance functions, but this is not the case anymore.
> > > > > > > And without this, sooner or later somebody will break legacy
> > > > > > > or execlists (this already happened last week, when somebody
> > > > > > > here was implementing native sync without knowing
> > > > > about Execlists).
> > > > > > > >
> > > > > > > > So, the questions is: how do you feel about a dev_priv.gt
> > > > > > > > vfunc that takes a
> > > > > > > context, a ring, an array of DWORDS and a BB length and does
> > > > > > > the intel_(logical)_ring_begin/emit/advance based on
> > > i915.enable_execlists?
> 
> There are 3 cases of non-execbuffer submissions that I can think of: flips,
> render state, and clear-buffer (proposed patches on the list). I wonder if the
> right approach might be to use batchbuffers with a small wrapper around the
> dispatch_execbuffer/emit_bb_start vfuncs. Basically the rule would be to
> only touch a ringbuffer from within the intel_engine_cs vfuncs, which always
> know which set of functions to use.
> 
> For flips, we could use MMIO flips. Render state already uses the existing
> dispatch_execbuffer() and add_request(). The clear code could potentially do
> the same. There would obviously be some overhead in using a batch buffer
> for what could end up being just a few commands. Perhaps the batch buffer
> pool code from the command parser would help though.

This has another positive side-effect: the scheduler guys do not like things inside the ring without a proper batchbuffer & request, because it makes their life more complex.

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 26/53] drm/i915/bdw: New logical ring submission mechanism
  2014-06-24 11:45                     ` Mateo Lozano, Oscar
@ 2014-06-24 14:41                       ` Volkin, Bradley D
  0 siblings, 0 replies; 156+ messages in thread
From: Volkin, Bradley D @ 2014-06-24 14:41 UTC (permalink / raw)
  To: Mateo Lozano, Oscar; +Cc: Ben Widawsky, intel-gfx

On Tue, Jun 24, 2014 at 04:45:05AM -0700, Mateo Lozano, Oscar wrote:
> Ok, let´s try to extract something positive out of all this.
> 
> OPTION A (Ben´s proposal):
> 
> > I think the only solution for what Chris is asking for is to implement this as 1
> > context per engine, as opposed to 1 context with a context object per
> > engine. As you correctly stated, I think we all agreed the latter was fine when
> > we met. Functionally, I see no difference, but it does allow you to always use
> > a context as the sole mechanism for making any decisions and performing
> > any operations. Now without writing all the code, I can't promise it actually
> > will look better, but I think it's likely going to be a lot cleaner. Before you do
> > any changes though...
> 
> We agreed on this early on (v1), yes, but then the idea was frowned upon by Brad and then by Daniel. I cannot recall exactly why anymore, but one big reason was that the idr mechanism makes it difficult to track several contexts with the same id (and userspace only has one context handle) and something about ctx->hang_stats. From v2 on, we agreed to multiplex different engines inside one intel_context (and that´s why we renamed i915_hw_context to intel_context).

Yeah, at least for me, the reason was that the multiple structs per context id
code felt awkward given that most/all of the fields in a struct intel_context are
logically per-context rather than per-engine (vm, hang_stats, etc). It didn't
seem like the right approach to me at the time.

Brad

> 
> OPTION B (Brad´s proposal):
> 
> > So I suggested that we:
> > 
> > - Add a back pointer from struct intel_ringbuffer to intel_context (would only
> >   be valid for lrc mode)
> > - Move the intel_ringbuffer_get(engine, context) calls up to the callers
> > - Pass (engine, ringbuf) instead of (engine, context) to intel_ring_* functions
> > - Have the vfunc implementations get the context from the ringbuffer where
> >   needed and ignore it where not
> > 
> > Looking again, we could probably add a back pointer to the intel_engine_cs
> > as well and then just pass around the ringbuf.
> 
> Sounds fine by me: intel_ringbuffer is only related to exactly one intel_engine_cs and one intel_context, so having pointers to those two makes sense.
> As before, this could be easily done within the existing code (passing intel_ringbuffer instead of intel_engine_cs), but Daniel wants a code split, so I can only do it for the logical ring functions.

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 26/53] drm/i915/bdw: New logical ring submission mechanism
  2014-06-23 13:13       ` Chris Wilson
  2014-06-23 13:18         ` Mateo Lozano, Oscar
@ 2014-06-24 17:19         ` Jesse Barnes
  2014-06-26 13:28           ` Mateo Lozano, Oscar
  2014-07-07 12:41         ` Daniel Vetter
  2 siblings, 1 reply; 156+ messages in thread
From: Jesse Barnes @ 2014-06-24 17:19 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Mon, 23 Jun 2014 14:13:55 +0100
Chris Wilson <chris@chris-wilson.co.uk> wrote:

> On Mon, Jun 23, 2014 at 01:09:37PM +0000, Mateo Lozano, Oscar wrote:
> > So far, yes, but that´s only because I artificially made
> > intel_lrc.c self-contained, as Daniel requested. What if we need to
> > execute commands from somewhere else, like in
> > intel_gen7_queue_flip()?
> > 
> > And this takes me to another discussion: this logical ring vs
> > legacy ring split is probably a good idea (time will tell), but we
> > should provide a way of sending commands for execution without
> > knowing if Execlists are enabled or not. In the early series that
> > was easy because we reused the ring_begin, ring_emit & ring_advance
> > functions, but this is not the case anymore. And without this,
> > sooner or later somebody will break legacy or execlists (this
> > already happened last week, when somebody here was implementing
> > native sync without knowing about Execlists).
> > 
> > So, the questions is: how do you feel about a dev_priv.gt vfunc
> > that takes a context, a ring, an array of DWORDS and a BB length
> > and does the intel_(logical)_ring_begin/emit/advance based on
> > i915.enable_execlists?
> 
> I'm still baffled by the design. intel_ring_begin() and friends should
> be able to find their context (logical or legacy) from the ring and
> dtrt.

To me, given that the LRC contains the ring, the right thing to do
would be to do what Oscar did in the first place: pass the context
around everywhere.  If there were cases where that wasn't ideal (the
layering violations you mention later?) we should fix them up instead.

But given that this is a standalone file, it's easy to fix up however
we want incrementally, as long as things work well to begin with and
it's reasonably easy to review it...

Jesse

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 26/53] drm/i915/bdw: New logical ring submission mechanism
  2014-06-24 17:19         ` Jesse Barnes
@ 2014-06-26 13:28           ` Mateo Lozano, Oscar
  0 siblings, 0 replies; 156+ messages in thread
From: Mateo Lozano, Oscar @ 2014-06-26 13:28 UTC (permalink / raw)
  To: Jesse Barnes, Chris Wilson; +Cc: intel-gfx

> -----Original Message-----
> From: Jesse Barnes [mailto:jbarnes@virtuousgeek.org]
> Sent: Tuesday, June 24, 2014 6:20 PM
> To: Chris Wilson
> Cc: Mateo Lozano, Oscar; intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH 26/53] drm/i915/bdw: New logical ring
> submission mechanism
> 
> On Mon, 23 Jun 2014 14:13:55 +0100
> Chris Wilson <chris@chris-wilson.co.uk> wrote:
> 
> > On Mon, Jun 23, 2014 at 01:09:37PM +0000, Mateo Lozano, Oscar wrote:
> > > So far, yes, but that´s only because I artificially made intel_lrc.c
> > > self-contained, as Daniel requested. What if we need to execute
> > > commands from somewhere else, like in intel_gen7_queue_flip()?
> > >
> > > And this takes me to another discussion: this logical ring vs legacy
> > > ring split is probably a good idea (time will tell), but we should
> > > provide a way of sending commands for execution without knowing if
> > > Execlists are enabled or not. In the early series that was easy
> > > because we reused the ring_begin, ring_emit & ring_advance
> > > functions, but this is not the case anymore. And without this,
> > > sooner or later somebody will break legacy or execlists (this
> > > already happened last week, when somebody here was implementing
> > > native sync without knowing about Execlists).
> > >
> > > So, the questions is: how do you feel about a dev_priv.gt vfunc that
> > > takes a context, a ring, an array of DWORDS and a BB length and does
> > > the intel_(logical)_ring_begin/emit/advance based on
> > > i915.enable_execlists?
> >
> > I'm still baffled by the design. intel_ring_begin() and friends should
> > be able to find their context (logical or legacy) from the ring and
> > dtrt.
> 
> To me, given that the LRC contains the ring, the right thing to do would be to
> do what Oscar did in the first place: pass the context around everywhere.  If
> there were cases where that wasn't ideal (the layering violations you
> mention later?) we should fix them up instead.
> 
> But given that this is a standalone file, it's easy to fix up however we want
> incrementally, as long as things work well to begin with and it's reasonably
> easy to review it...

Hi Jesse,

I spoke with Chris on IRC, and he told me that he is currently rewriting this stuff to show (and prove) what his approach is.

In the meantime, I'll keep working on the review comments I have received from Brad and Daniel and I'll also send some early patches with prep-work that (hopefully) can be merged without too much fuss.

-- Oscar

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 26/53] drm/i915/bdw: New logical ring submission mechanism
  2014-06-24 12:29                     ` Mateo Lozano, Oscar
@ 2014-07-07 12:39                       ` Daniel Vetter
  0 siblings, 0 replies; 156+ messages in thread
From: Daniel Vetter @ 2014-07-07 12:39 UTC (permalink / raw)
  To: Mateo Lozano, Oscar; +Cc: intel-gfx

On Tue, Jun 24, 2014 at 12:29:45PM +0000, Mateo Lozano, Oscar wrote:
> > -----Original Message-----
> > From: Volkin, Bradley D
> > Sent: Monday, June 23, 2014 8:10 PM
> > To: Mateo Lozano, Oscar
> > Cc: Chris Wilson; intel-gfx@lists.freedesktop.org
> > Subject: Re: [Intel-gfx] [PATCH 26/53] drm/i915/bdw: New logical ring
> > submission mechanism
> > There are 3 cases of non-execbuffer submissions that I can think of: flips,
> > render state, and clear-buffer (proposed patches on the list). I wonder if the
> > right approach might be to use batchbuffers with a small wrapper around the
> > dispatch_execbuffer/emit_bb_start vfuncs. Basically the rule would be to
> > only touch a ringbuffer from within the intel_engine_cs vfuncs, which always
> > know which set of functions to use.
> > 
> > For flips, we could use MMIO flips. Render state already uses the existing
> > dispatch_execbuffer() and add_request(). The clear code could potentially do
> > the same. There would obviously be some overhead in using a batch buffer
> > for what could end up being just a few commands. Perhaps the batch buffer
> > pool code from the command parser would help though.
> 
> This has another positive side-effect: the scheduler guys do not like
> things inside the ring without a proper batchbuffer & request, because
> it makes their life more complex.

I'm probably missing all the context here, but I thought this was the
plan forward: we'll use mmio flips with execlists and otherwise we'll
submit everything with the right context/engine/whatever using
execlist-specific functions.

Since the gpu clear code isn't merged yet we can imo ignore it for now and
merge execlists first.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 26/53] drm/i915/bdw: New logical ring submission mechanism
  2014-06-23 13:13       ` Chris Wilson
  2014-06-23 13:18         ` Mateo Lozano, Oscar
  2014-06-24 17:19         ` Jesse Barnes
@ 2014-07-07 12:41         ` Daniel Vetter
  2 siblings, 0 replies; 156+ messages in thread
From: Daniel Vetter @ 2014-07-07 12:41 UTC (permalink / raw)
  To: Chris Wilson, Mateo Lozano, Oscar, Volkin, Bradley D, intel-gfx

On Mon, Jun 23, 2014 at 02:13:55PM +0100, Chris Wilson wrote:
> On Mon, Jun 23, 2014 at 01:09:37PM +0000, Mateo Lozano, Oscar wrote:
> > So far, yes, but that´s only because I artificially made intel_lrc.c self-contained, as Daniel requested. What if we need to execute commands from somewhere else, like in intel_gen7_queue_flip()?
> > 
> > And this takes me to another discussion: this logical ring vs legacy ring split is probably a good idea (time will tell), but we should provide a way of sending commands for execution without knowing if Execlists are enabled or not. In the early series that was easy because we reused the ring_begin, ring_emit & ring_advance functions, but this is not the case anymore. And without this, sooner or later somebody will break legacy or execlists (this already happened last week, when somebody here was implementing native sync without knowing about Execlists).
> > 
> > So, the questions is: how do you feel about a dev_priv.gt vfunc that takes a context, a ring, an array of DWORDS and a BB length and does the intel_(logical)_ring_begin/emit/advance based on i915.enable_execlists?
> 
> I'm still baffled by the design. intel_ring_begin() and friends should
> be able to find their context (logical or legacy) from the ring and
> dtrt.

Well, I'm opting for the different approach of presuming that the caller
knows whether we're running with execlists or legacy rings and so will
have a clean (and full) split. If we really need to submit massive amounts
of cs commands from the kernel we should launch them as a batch, which
should be fairly uniform for both legacy ring and execlists mode.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 41/53] drm/i915/bdw: Avoid non-lite-restore preemptions
  2014-06-23 11:52     ` Mateo Lozano, Oscar
@ 2014-07-07 12:47       ` Daniel Vetter
  0 siblings, 0 replies; 156+ messages in thread
From: Daniel Vetter @ 2014-07-07 12:47 UTC (permalink / raw)
  To: Mateo Lozano, Oscar; +Cc: intel-gfx

On Mon, Jun 23, 2014 at 11:52:11AM +0000, Mateo Lozano, Oscar wrote:
> > -----Original Message-----
> > From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel
> > Vetter
> > Sent: Wednesday, June 18, 2014 9:49 PM
> > To: Mateo Lozano, Oscar
> > Cc: intel-gfx@lists.freedesktop.org
> > Subject: Re: [Intel-gfx] [PATCH 41/53] drm/i915/bdw: Avoid non-lite-restore
> > preemptions
> > 
> > On Fri, Jun 13, 2014 at 04:37:59PM +0100, oscar.mateo@intel.com wrote:
> > > From: Oscar Mateo <oscar.mateo@intel.com>
> > >
> > > In the current Execlists feeding mechanism, full preemption is not
> > > supported yet: only lite-restores are allowed (this is: the GPU simply
> > > samples a new tail pointer for the context currently in execution).
> > >
> > > But we have identified an scenario in which a full preemption occurs:
> > > 1) We submit two contexts for execution (A & B).
> > > 2) The GPU finishes with the first one (A), switches to the second one
> > > (B) and informs us.
> > > 3) We submit B again (hoping to cause a lite restore) together with C,
> > > but in the time we spend writing to the ELSP, the GPU finishes B.
> > > 4) The GPU start executing B again (since we told it so).
> > > 5) We receive a B finished interrupt and, mistakenly, we submit C
> > > (again) and D, causing a full preemption of B.
> > >
> > > By keeping a better track of our submissions, we can avoid the
> > > scenario described above.
> > 
> > How? I don't see a way to fundamentally avoid the above race, and I don't
> > really see an issue with it - the gpu should notice that there's not really any
> > work done and then switch to C.
> > 
> > Or am I completely missing the point here?
> > 
> > With no clue at all this looks really scary.
> 
> The race is avoided by keeping track of how many times a context has
> been submitted to the hardware and by better discriminating the received
> context switch interrupts: in the example, when we have submitted B
> twice, we won't submit C and D as soon as we receive the notification
> that B is completed, because we were expecting to get a LITE_RESTORE and
> we didn't, so we know a second completion will be received shortly.
> 
> Without this explicit checking, the race condition happens and, somehow,
> the batch buffer execution order gets messed with. This can be verified
> with the IGT test I sent together with the series. I don't know the
> exact mechanism by which the pre-emption messes with the execution order
> but, since other people are working on the Scheduler + Preemption on
> Execlists, I didn't try to fix it. In this series, only Lite Restores
> are supported (other kinds of preemption WARN).
> 
> I'll add this clarification to the commit message.

Yeah, please elaborate more clearly in the commit message what exactly
seems to go wrong (and where you're unsure with your model of how the
hardware reacts). And please also reference the precise testcase (and
precise failure mode without this patch) so that if anyone stumbles over
this again we have some breadcrumbs to figure things out. And some stern
warnings as comments in the code to warn the unsuspecting reader.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 156+ messages in thread
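
To make the bookkeeping described above concrete, here is a rough sketch
(field and function names are assumptions for illustration, not
necessarily what the series merges):

struct intel_ctx_submit_request {
	struct intel_context *ctx;
	struct list_head execlist_link;
	int elsp_submitted;	/* how many times this went to the ELSP */
};

/* Called when a "context complete" event arrives for the queue head. */
static void execlists_context_completed(struct intel_engine_cs *ring)
{
	struct intel_ctx_submit_request *head =
		list_first_entry(&ring->execlist_queue,
				 struct intel_ctx_submit_request,
				 execlist_link);

	if (--head->elsp_submitted > 0) {
		/*
		 * We submitted this context a second time hoping for a
		 * lite restore, but the GPU completed it before sampling
		 * the new tail.  A second completion event will follow;
		 * submitting the next contexts now would cause exactly
		 * the full preemption described above.
		 */
		return;
	}

	list_del(&head->execlist_link);
	execlists_submit_next(ring);	/* hypothetical: sends the next pair to the ELSP */
}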

* Re: [PATCH 45/53] drm/i915/bdw: Do not call intel_runtime_pm_get() in an interrupt
  2014-06-18 20:54   ` Daniel Vetter
@ 2014-07-26 10:27     ` Chris Wilson
  2014-07-28  8:54       ` Daniel Vetter
  0 siblings, 1 reply; 156+ messages in thread
From: Chris Wilson @ 2014-07-26 10:27 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Wed, Jun 18, 2014 at 10:54:13PM +0200, Daniel Vetter wrote:
> On Fri, Jun 13, 2014 at 04:38:03PM +0100, oscar.mateo@intel.com wrote:
> > From: Oscar Mateo <oscar.mateo@intel.com>
> > 
> > Or with a spinlock grabbed, because it might sleep, which is not
> > a nice thing to do. Instead, do the runtime_pm get/put together
> > with the create/destroy request, and handle the forcewake get/put
> > directly.
> > 
> > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> 
> Looks like a fixup that should be squashed into relevant earlier patches.

The whole gen6_gt_force_wake_get() calling intel_runtime_pm_get() is
broken due to this - we must be able to read registers in atomic
context!

Please revert c8c8fb33b37766acf6474784b0d5245dab9a1690
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 45/53] drm/i915/bdw: Do not call intel_runtime_pm_get() in an interrupt
  2014-07-26 10:27     ` Chris Wilson
@ 2014-07-28  8:54       ` Daniel Vetter
  2014-07-29  7:37         ` Chris Wilson
  0 siblings, 1 reply; 156+ messages in thread
From: Daniel Vetter @ 2014-07-28  8:54 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, oscar.mateo, intel-gfx

On Sat, Jul 26, 2014 at 11:27:38AM +0100, Chris Wilson wrote:
> On Wed, Jun 18, 2014 at 10:54:13PM +0200, Daniel Vetter wrote:
> > On Fri, Jun 13, 2014 at 04:38:03PM +0100, oscar.mateo@intel.com wrote:
> > > From: Oscar Mateo <oscar.mateo@intel.com>
> > > 
> > > Or with a spinlock grabbed, because it might sleep, which is not
> > > a nice thing to do. Instead, do the runtime_pm get/put together
> > > with the create/destroy request, and handle the forcewake get/put
> > > directly.
> > > 
> > > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> > 
> > Looks like a fixup that should be squashed into relevant earlier patches.
> 
> The whole gen6_gt_force_wake_get() calling intel_runtime_pm_get() is
> broken due to this - we must be able to read registers in atomic
> context!
> 
> Please revert c8c8fb33b37766acf6474784b0d5245dab9a1690

force_wake_get can't call runtime_pm_get because pm_get can sleep. So if
you want to read registers from atomic context you have to have a runtime
pm reference from someone else.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 156+ messages in thread
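
The rule Daniel states can be sketched as follows (helper names are
hypothetical, not code from the series): the sleeping runtime-pm get
happens in process context, and the atomic path only reads registers
under that already-held reference.

/* Process context: may sleep and may actually power the device up. */
void i915_sampling_start(struct drm_i915_private *dev_priv)
{
	intel_runtime_pm_get(dev_priv);
}

/* Atomic context: safe only because i915_sampling_start() already holds
 * a runtime-pm reference, so the device cannot suspend under us. */
u32 i915_sample_rc6_atomic(struct drm_i915_private *dev_priv)
{
	return I915_READ(GEN6_GT_GFX_RC6);
}

/* Process context again, once sampling is done. */
void i915_sampling_stop(struct drm_i915_private *dev_priv)
{
	intel_runtime_pm_put(dev_priv);
}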

* Re: [PATCH 45/53] drm/i915/bdw: Do not call intel_runtime_pm_get() in an interrupt
  2014-07-28  8:54       ` Daniel Vetter
@ 2014-07-29  7:37         ` Chris Wilson
  2014-07-29 10:26           ` Daniel Vetter
  0 siblings, 1 reply; 156+ messages in thread
From: Chris Wilson @ 2014-07-29  7:37 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Mon, Jul 28, 2014 at 10:54:06AM +0200, Daniel Vetter wrote:
> On Sat, Jul 26, 2014 at 11:27:38AM +0100, Chris Wilson wrote:
> > On Wed, Jun 18, 2014 at 10:54:13PM +0200, Daniel Vetter wrote:
> > > On Fri, Jun 13, 2014 at 04:38:03PM +0100, oscar.mateo@intel.com wrote:
> > > > From: Oscar Mateo <oscar.mateo@intel.com>
> > > > 
> > > > Or with a spinlock grabbed, because it might sleep, which is not
> > > > a nice thing to do. Instead, do the runtime_pm get/put together
> > > > with the create/destroy request, and handle the forcewake get/put
> > > > directly.
> > > > 
> > > > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> > > 
> > > Looks like a fixup that should be squashed into relevant earlier patches.
> > 
> > The whole gen6_gt_force_wake_get() calling intel_runtime_pm_get() is
> > broken due to this - we must be able to read registers in atomic
> > context!
> > 
> > Please revert c8c8fb33b37766acf6474784b0d5245dab9a1690
> 
> force_wake_get can't call runtime_pm_get because pm_get can sleep. So if
> you want to read registers from atomic context you have to have a runtime
> pm reference from someone else.

Nope. That cannot work.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 45/53] drm/i915/bdw: Do not call intel_runtime_pm_get() in an interrupt
  2014-07-29  7:37         ` Chris Wilson
@ 2014-07-29 10:26           ` Daniel Vetter
  2014-08-08  9:20             ` Chris Wilson
  0 siblings, 1 reply; 156+ messages in thread
From: Daniel Vetter @ 2014-07-29 10:26 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, oscar.mateo, intel-gfx

On Tue, Jul 29, 2014 at 08:37:48AM +0100, Chris Wilson wrote:
> On Mon, Jul 28, 2014 at 10:54:06AM +0200, Daniel Vetter wrote:
> > On Sat, Jul 26, 2014 at 11:27:38AM +0100, Chris Wilson wrote:
> > > On Wed, Jun 18, 2014 at 10:54:13PM +0200, Daniel Vetter wrote:
> > > > On Fri, Jun 13, 2014 at 04:38:03PM +0100, oscar.mateo@intel.com wrote:
> > > > > From: Oscar Mateo <oscar.mateo@intel.com>
> > > > > 
> > > > > Or with a spinlock grabbed, because it might sleep, which is not
> > > > > a nice thing to do. Instead, do the runtime_pm get/put together
> > > > > with the create/destroy request, and handle the forcewake get/put
> > > > > directly.
> > > > > 
> > > > > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> > > > 
> > > > Looks like a fixup that should be squashed into relevant earlier patches.
> > > 
> > > The whole gen6_gt_force_wake_get() calling intel_runtime_pm_get() is
> > > broken due to this - we must be able to read registers in atomic
> > > context!
> > > 
> > > Please revert c8c8fb33b37766acf6474784b0d5245dab9a1690
> > 
> > force_wake_get can't call runtime_pm_get because pm_get can sleep. So if
> > you want to read registers from atomic context you have to have a runtime
> > pm reference from someone else.
> 
> Nope. That cannot work.

Well it works currently. So where do you see the problem?
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 45/53] drm/i915/bdw: Do not call intel_runtime_pm_get() in an interrupt
  2014-07-29 10:26           ` Daniel Vetter
@ 2014-08-08  9:20             ` Chris Wilson
  2014-08-08  9:37                 ` Daniel Vetter
  0 siblings, 1 reply; 156+ messages in thread
From: Chris Wilson @ 2014-08-08  9:20 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Tue, Jul 29, 2014 at 12:26:36PM +0200, Daniel Vetter wrote:
> On Tue, Jul 29, 2014 at 08:37:48AM +0100, Chris Wilson wrote:
> > On Mon, Jul 28, 2014 at 10:54:06AM +0200, Daniel Vetter wrote:
> > > On Sat, Jul 26, 2014 at 11:27:38AM +0100, Chris Wilson wrote:
> > > > On Wed, Jun 18, 2014 at 10:54:13PM +0200, Daniel Vetter wrote:
> > > > > On Fri, Jun 13, 2014 at 04:38:03PM +0100, oscar.mateo@intel.com wrote:
> > > > > > From: Oscar Mateo <oscar.mateo@intel.com>
> > > > > > 
> > > > > > Or with a spinlock grabbed, because it might sleep, which is not
> > > > > > a nice thing to do. Instead, do the runtime_pm get/put together
> > > > > > with the create/destroy request, and handle the forcewake get/put
> > > > > > directly.
> > > > > > 
> > > > > > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> > > > > 
> > > > > Looks like a fixup that should be squashed into relevant earlier patches.
> > > > 
> > > > The whole gen6_gt_force_wake_get() calling intel_runtime_pm_get() is
> > > > broken due to this - we must be able to read registers in atomic
> > > > context!
> > > > 
> > > > Please revert c8c8fb33b37766acf6474784b0d5245dab9a1690
> > > 
> > > force_wake_get can't call runtime_pm_get because pm_get can sleep. So if
> > > you want to read registers from atomic context you have to have a runtime
> > > pm reference from someone else.
> > 
> > Nope. That cannot work.
> 
> Well it works currently. So where do you see the problem?

Sampling registers from a timer - in particular, we really do not want
to disable runtime pm whilst trying to monitor the impact of runtime pm.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [Intel-gfx] [PATCH 45/53] drm/i915/bdw: Do not call intel_runtime_pm_get() in an interrupt
  2014-08-08  9:20             ` Chris Wilson
@ 2014-08-08  9:37                 ` Daniel Vetter
  0 siblings, 0 replies; 156+ messages in thread
From: Daniel Vetter @ 2014-08-08  9:37 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, oscar.mateo, intel-gfx, Greg KH; +Cc: LKML

On Fri, Aug 08, 2014 at 10:20:40AM +0100, Chris Wilson wrote:
> On Tue, Jul 29, 2014 at 12:26:36PM +0200, Daniel Vetter wrote:
> > On Tue, Jul 29, 2014 at 08:37:48AM +0100, Chris Wilson wrote:
> > > On Mon, Jul 28, 2014 at 10:54:06AM +0200, Daniel Vetter wrote:
> > > > On Sat, Jul 26, 2014 at 11:27:38AM +0100, Chris Wilson wrote:
> > > > > On Wed, Jun 18, 2014 at 10:54:13PM +0200, Daniel Vetter wrote:
> > > > > > On Fri, Jun 13, 2014 at 04:38:03PM +0100, oscar.mateo@intel.com wrote:
> > > > > > > From: Oscar Mateo <oscar.mateo@intel.com>
> > > > > > > 
> > > > > > > Or with a spinlock grabbed, because it might sleep, which is not
> > > > > > > a nice thing to do. Instead, do the runtime_pm get/put together
> > > > > > > with the create/destroy request, and handle the forcewake get/put
> > > > > > > directly.
> > > > > > > 
> > > > > > > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> > > > > > 
> > > > > > Looks like a fixup that should be squashed into relevant earlier patches.
> > > > > 
> > > > > The whole gen6_gt_force_wake_get() calling intel_runtime_pm_get() is
> > > > > broken due to this - we must be able to read registers in atomic
> > > > > context!
> > > > > 
> > > > > Please revert c8c8fb33b37766acf6474784b0d5245dab9a1690
> > > > 
> > > > force_wake_get can't call runtime_pm_get because pm_get can sleep. So if
> > > > you want to read registers from atomic context you have to have a runtime
> > > > pm reference from someone else.
> > > 
> > > Nope. That cannot work.
> > 
> > Well it works currently. So where do you see the problem?
> 
> Sampling registers from a timer - in particular, we really do not want
> to disable runtime pm whilst trying to monitor the impact of runtime pm.

In that case you can grab a runtime pm reference iff the device is powered
on already. Which won't call anything scary, just amounts to an
atomic_add_unless or so, and then drop it again. 

Unfortunately there doesn't seem to be such a thing around already, so
need to add it first. Greg, how much would you freak out if we add
something like

/**
 * pm_runtime_get_unless_suspended - grab an rpm ref if the device is on
 * 
 * Returns true if an rpm ref has been acquired, false otherwise. Can be
 * called from atomic context to e.g. sample performance counters (where we
 * obviously don't want to disturb system state if everything is off atm).
 */
static inline bool pm_runtime_get_unless_suspended(struct device *dev)
{
	return atomic_add_unless(&dev->power.usage_count, 1, 0);
}

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 156+ messages in thread
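
If a helper like the proposed one existed, a timer-based sampler could
use it roughly as below (illustrative only; sample_timer and last_rc6
are made-up fields, and, as the replies show, checking the usage count
alone turns out not to be sufficient):

static enum hrtimer_restart i915_sample_timer_cb(struct hrtimer *timer)
{
	struct drm_i915_private *dev_priv =
		container_of(timer, struct drm_i915_private, sample_timer);
	struct device *dev = &dev_priv->dev->pdev->dev;

	if (pm_runtime_get_unless_suspended(dev)) {
		/* the device was already on: sample without having
		 * perturbed its power state to get here */
		dev_priv->last_rc6 = I915_READ(GEN6_GT_GFX_RC6);
		pm_runtime_put(dev);	/* async variant, atomic-safe */
	}
	/* otherwise the device is off: skip this sample */

	hrtimer_forward_now(timer, ns_to_ktime(NSEC_PER_MSEC));
	return HRTIMER_RESTART;
}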

* Re: [Intel-gfx] [PATCH 45/53] drm/i915/bdw: Do not call intel_runtime_pm_get() in an interrupt
  2014-08-08  9:37                 ` Daniel Vetter
@ 2014-08-08 13:41                   ` Greg KH
  -1 siblings, 0 replies; 156+ messages in thread
From: Greg KH @ 2014-08-08 13:41 UTC (permalink / raw)
  To: Rafael J. Wysocki, Chris Wilson, oscar.mateo, intel-gfx, LKML

On Fri, Aug 08, 2014 at 11:37:01AM +0200, Daniel Vetter wrote:
> On Fri, Aug 08, 2014 at 10:20:40AM +0100, Chris Wilson wrote:
> > On Tue, Jul 29, 2014 at 12:26:36PM +0200, Daniel Vetter wrote:
> > > On Tue, Jul 29, 2014 at 08:37:48AM +0100, Chris Wilson wrote:
> > > > On Mon, Jul 28, 2014 at 10:54:06AM +0200, Daniel Vetter wrote:
> > > > > On Sat, Jul 26, 2014 at 11:27:38AM +0100, Chris Wilson wrote:
> > > > > > On Wed, Jun 18, 2014 at 10:54:13PM +0200, Daniel Vetter wrote:
> > > > > > > On Fri, Jun 13, 2014 at 04:38:03PM +0100, oscar.mateo@intel.com wrote:
> > > > > > > > From: Oscar Mateo <oscar.mateo@intel.com>
> > > > > > > > 
> > > > > > > > Or with a spinlock grabbed, because it might sleep, which is not
> > > > > > > > a nice thing to do. Instead, do the runtime_pm get/put together
> > > > > > > > with the create/destroy request, and handle the forcewake get/put
> > > > > > > > directly.
> > > > > > > > 
> > > > > > > > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> > > > > > > 
> > > > > > > Looks like a fixup that should be squashed into relevant earlier patches.
> > > > > > 
> > > > > > The whole gen6_gt_force_wake_get() calling intel_runtime_pm_get() is
> > > > > > broken due to this - we must be able to read registers in atomic
> > > > > > context!
> > > > > > 
> > > > > > Please revert c8c8fb33b37766acf6474784b0d5245dab9a1690
> > > > > 
> > > > > force_wake_get can't call runtime_pm_get because pm_get can sleep. So if
> > > > > you want to read registers from atomic context you have to have a runtime
> > > > > pm reference from someone else.
> > > > 
> > > > Nope. That cannot work.
> > > 
> > > Well it works currently. So where do you see the problem?
> > 
> > Sampling registers from a timer - in particular, we really do not want
> > to disable runtime pm whilst trying to monitor the impact of runtime pm.
> 
> In that case you can grab a runtime pm reference iff the device is powered
> on already. Which won't call anything scary, just amounts to an
> atomic_add_unless or so, and then drop it again. 
> 
> Unfortunately there doesn't seem to be such a thing around already, so
> need to add it first. Greg, how much would you freak out if we add
> something like
> 
> /**
>  * pm_runtime_get_unless_suspended - grab an rpm ref if the device is on
>  * 
>  * Returns true if an rpm ref has been acquired, false otherwise. Can be
>  * called from atomic context to e.g. sample performance counters (where we
>  * obviously don't want to disturb system state if everything is off atm).
>  */
> static inline bool pm_runtime_get_unless_suspended(struct device *dev)
> {
> 	return atomic_add_unless(&dev->power.usage_count, 1, 0);
> }

I'd freak out a lot :)

Rafael, isn't there some other better way to resolve this issue without
resorting to something as "horrid" as the above?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [Intel-gfx] [PATCH 45/53] drm/i915/bdw: Do not call intel_runtime_pm_get() in an interrupt
  2014-08-08  9:37                 ` Daniel Vetter
@ 2014-08-09  0:14                   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 156+ messages in thread
From: Rafael J. Wysocki @ 2014-08-09  0:14 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Chris Wilson, oscar.mateo, intel-gfx, Greg KH, LKML, Alan Stern

On Friday, August 08, 2014 11:37:01 AM Daniel Vetter wrote:
> On Fri, Aug 08, 2014 at 10:20:40AM +0100, Chris Wilson wrote:
> > On Tue, Jul 29, 2014 at 12:26:36PM +0200, Daniel Vetter wrote:
> > > On Tue, Jul 29, 2014 at 08:37:48AM +0100, Chris Wilson wrote:
> > > > On Mon, Jul 28, 2014 at 10:54:06AM +0200, Daniel Vetter wrote:
> > > > > On Sat, Jul 26, 2014 at 11:27:38AM +0100, Chris Wilson wrote:
> > > > > > On Wed, Jun 18, 2014 at 10:54:13PM +0200, Daniel Vetter wrote:
> > > > > > > On Fri, Jun 13, 2014 at 04:38:03PM +0100, oscar.mateo@intel.com wrote:
> > > > > > > > From: Oscar Mateo <oscar.mateo@intel.com>
> > > > > > > > 
> > > > > > > > Or with a spinlock grabbed, because it might sleep, which is not
> > > > > > > > a nice thing to do. Instead, do the runtime_pm get/put together
> > > > > > > > with the create/destroy request, and handle the forcewake get/put
> > > > > > > > directly.
> > > > > > > > 
> > > > > > > > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> > > > > > > 
> > > > > > > Looks like a fixup that should be squashed into relevant earlier patches.
> > > > > > 
> > > > > > The whole gen6_gt_force_wake_get() calling intel_runtime_pm_get() is
> > > > > > broken due to this - we must be able to read registers in atomic
> > > > > > context!
> > > > > > 
> > > > > > Please revert c8c8fb33b37766acf6474784b0d5245dab9a1690
> > > > > 
> > > > > force_wake_get can't call runtime_pm_get because pm_get can sleep. So if
> > > > > you want to read registers from atomic context you have to have a runtime
> > > > > pm reference from someone else.
> > > > 
> > > > Nope. That cannot work.
> > > 
> > > Well it works currently. So where do you see the problem?
> > 
> > Sampling registers from a timer - in particular, we really do not want
> > to disable runtime pm whilst trying to monitor the impact of runtime pm.
> 
> In that case you can grab a runtime pm reference iff the device is powered
> on already. Which won't call anything scary, just amounts to an
> atomic_add_unless or so, and then drop it again. 
> 
> Unfortunately there doesn't seem to be such a thing around already, so
> need to add it first. Greg, how much would you freak out if we add
> something like
> 
> /**
>  * pm_runtime_get_unless_suspended - grab an rpm ref if the device is on
>  * 
>  * Returns true if an rpm ref has been acquired, false otherwise. Can be
>  * called from atomic context to e.g. sample performance counters (where we
>  * obviously don't want to disturb system state if everything is off atm).
>  */
> static inline bool pm_runtime_get_unless_suspended(struct device *dev)
> {
> 	return atomic_add_unless(&dev->power.usage_count, 1, 0);
> }

I don't think it'll work universally.

That'd need to be synchronized with other stuff done under the spinlock
and in fact, what you're interested in is runtime_status (and that being
RPM_ACTIVE) and not just the usage count.

Rafael


^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [Intel-gfx] [PATCH 45/53] drm/i915/bdw: Do not call intel_runtime_pm_get() in an interrupt
  2014-08-08 13:41                   ` Greg KH
@ 2014-08-09  0:18                     ` Rafael J. Wysocki
  -1 siblings, 0 replies; 156+ messages in thread
From: Rafael J. Wysocki @ 2014-08-09  0:18 UTC (permalink / raw)
  To: Greg KH, Daniel Vetter; +Cc: Chris Wilson, oscar.mateo, intel-gfx, LKML

On Friday, August 08, 2014 06:41:10 AM Greg KH wrote:
> On Fri, Aug 08, 2014 at 11:37:01AM +0200, Daniel Vetter wrote:
> > On Fri, Aug 08, 2014 at 10:20:40AM +0100, Chris Wilson wrote:
> > > On Tue, Jul 29, 2014 at 12:26:36PM +0200, Daniel Vetter wrote:
> > > > On Tue, Jul 29, 2014 at 08:37:48AM +0100, Chris Wilson wrote:
> > > > > On Mon, Jul 28, 2014 at 10:54:06AM +0200, Daniel Vetter wrote:
> > > > > > On Sat, Jul 26, 2014 at 11:27:38AM +0100, Chris Wilson wrote:
> > > > > > > On Wed, Jun 18, 2014 at 10:54:13PM +0200, Daniel Vetter wrote:
> > > > > > > > On Fri, Jun 13, 2014 at 04:38:03PM +0100, oscar.mateo@intel.com wrote:
> > > > > > > > > From: Oscar Mateo <oscar.mateo@intel.com>
> > > > > > > > > 
> > > > > > > > > Or with a spinlock grabbed, because it might sleep, which is not
> > > > > > > > > a nice thing to do. Instead, do the runtime_pm get/put together
> > > > > > > > > with the create/destroy request, and handle the forcewake get/put
> > > > > > > > > directly.
> > > > > > > > > 
> > > > > > > > > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> > > > > > > > 
> > > > > > > > Looks like a fixup that should be squashed into relevant earlier patches.
> > > > > > > 
> > > > > > > The whole gen6_gt_force_wake_get() calling intel_runtime_pm_get() is
> > > > > > > broken due to this - we must be able to read registers in atomic
> > > > > > > context!
> > > > > > > 
> > > > > > > Please revert c8c8fb33b37766acf6474784b0d5245dab9a1690
> > > > > > 
> > > > > > force_wake_get can't call runtime_pm_get because pm_get can sleep. So if
> > > > > > you want to read registers from atomic context you have to have a runtime
> > > > > > pm reference from someone else.
> > > > > 
> > > > > Nope. That cannot work.
> > > > 
> > > > Well it works currently. So where do you see the problem?
> > > 
> > > Sampling registers from a timer - in particular, we really do not want
> > > to disable runtime pm whilst trying to monitor the impact of runtime pm.
> > 
> > In that case you can grab a runtime pm reference iff the device is powered
> > on already. Which won't call anything scary, just amounts to an
> > atomic_add_unless or so, and then drop it again. 
> > 
> > Unfortunately there doesn't seem to be such a thing around already, so
> > need to add it first. Greg, how much would you freak out if we add
> > something like
> > 
> > /**
> >  * pm_runtime_get_unless_suspended - grab an rpm ref if the device is on
> >  * 
> >  * Returns true if an rpm ref has been acquired, false otherwise. Can be
> >  * called from atomic context to e.g. sample performance counters (where we
> >  * obviously don't want to disturb system state if everything is off atm).
> >  */
> > static inline bool pm_runtime_get_unless_suspended(struct device *dev)
> > {
> > 	return atomic_add_unless(&dev->power.usage_count, 1, 0);
> > }
> 
> I'd freak out a lot :)
> 
> Rafael, isn't there some other better way to resolve this issue without
> resorting to something as "horrid" as the above?

I'm not sure how to solve this at all, because the above isn't going to work
in general in my opinion.

Can anyone please try to explain the problem to me without referring to the
i915 internals?

Rafael


^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [Intel-gfx] [PATCH 45/53] drm/i915/bdw: Do not call intel_runtime_pm_get() in an interrupt
  2014-08-09  0:14                   ` Rafael J. Wysocki
@ 2014-08-09  1:21                     ` Alan Stern
  -1 siblings, 0 replies; 156+ messages in thread
From: Alan Stern @ 2014-08-09  1:21 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Daniel Vetter, Chris Wilson, oscar.mateo, intel-gfx, Greg KH, LKML

On Sat, 9 Aug 2014, Rafael J. Wysocki wrote:

> > > > Well it works currently. So where do you see the problem?
> > > 
> > > Sampling registers from a timer - in particular, we really do not want
> > > to disable runtime pm whilst trying to monitor the impact of runtime pm.
> > 
> > In that case you can grab a runtime pm reference iff the device is powered
> > on already. Which won't call anything scary, just amounts to an
> > atomic_add_unless or so, and then drop it again. 
> > 
> > Unfortunately there doesn't seem to be such a thing around already, so
> > need to add it first. Greg, how much would you freak out if we add
> > something like
> > 
> > /**
> >  * pm_runtime_get_unless_suspended - grab an rpm ref if the device is on
> >  * 
> >  * Returns true if an rpm ref has been acquired, false otherwise. Can be
> >  * called from atomic context to e.g. sample performance counters (where we
> >  * obviously don't want to disturb system state if everything is off atm).
> >  */
> > static inline bool pm_runtime_get_unless_suspended(struct device *dev)
> > {
> > 	return atomic_add_unless(&dev->power.usage_count, 1, 0);
> > }
> 
> I don't think it'll work universally.
> 
> That'd need to be synchronized with other stuff done under the spinlock
> and in fact, what you're interested in is runtime_status (and that being
> RPM_ACTIVE) and not just the usage count.

That's right.  You'd need to acquire the spinlock, test runtime_status, 
do the register sampling if the status is RPM_ACTIVE, and then drop the 
spinlock.

I suppose wrapper routines for acquiring and releasing the spinlock
could be added to the runtime-PM API.  Something like this:

#define pm_runtime_lock(dev, flags)			\
		spin_lock_irqsave(&(dev)->power.lock, flags)
#define pm_runtime_unlock(dev, flags)			\
		spin_unlock_irqrestore(&(dev)->power.lock, flags)

It looks a little silly but it would work.

Alan Stern


^ permalink raw reply	[flat|nested] 156+ messages in thread
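
Concretely, the sampling path Alan outlines would be shaped like this
(a sketch built on the wrappers proposed above; last_rc6 is a made-up
field):

static void i915_sample_if_active(struct drm_i915_private *dev_priv)
{
	struct device *dev = &dev_priv->dev->pdev->dev;
	unsigned long flags;

	pm_runtime_lock(dev, flags);
	/* power.lock is held, so the status cannot change under us: if
	 * the device is RPM_ACTIVE it stays powered for the read below */
	if (dev->power.runtime_status == RPM_ACTIVE)
		dev_priv->last_rc6 = I915_READ(GEN6_GT_GFX_RC6);
	pm_runtime_unlock(dev, flags);
}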

* Re: [Intel-gfx] [PATCH 45/53] drm/i915/bdw: Do not call intel_runtime_pm_get() in an interrupt
  2014-08-09  1:21                     ` Alan Stern
@ 2014-08-09  8:53                       ` Daniel Vetter
  -1 siblings, 0 replies; 156+ messages in thread
From: Daniel Vetter @ 2014-08-09  8:53 UTC (permalink / raw)
  To: Alan Stern
  Cc: Rafael J. Wysocki, Chris Wilson, Mateo Lozano, Oscar, intel-gfx,
	Greg KH, LKML

On Sat, Aug 9, 2014 at 3:21 AM, Alan Stern <stern@rowland.harvard.edu> wrote:
> On Sat, 9 Aug 2014, Rafael J. Wysocki wrote:
>
>> > > > Well it works currently. So where do you see the problem?
>> > >
>> > > Sampling registers from a timer - in particular, we really do not want
>> > > to disable runtime pm whilst trying to monitor the impact of runtime pm.
>> >
>> > In that case you can grab a runtime pm reference iff the device is powered
>> > on already. Which won't call anything scary, just amounts to an
>> > atomic_add_unless or so, and then drop it again.
>> >
>> > Unfortunately there doesn't seem to be such a thing around already, so
>> > need to add it first. Greg, how much would you freak out if we add
>> > something like
>> >
>> > /**
>> >  * pm_runtime_get_unless_suspended - grab an rpm ref if the device is on
>> >  *
>> >  * Returns true if an rpm ref has been acquired, false otherwise. Can be
>> >  * called from atomic context to e.g. sample performance counters (where we
>> >  * obviously don't want to disturb system state if everything is off atm).
>> >  */
>> > static inline bool pm_runtime_get_unless_suspended(struct device *dev)
>> > {
>> >     return atomic_add_unless(&dev->power.usage_count, 1, 0);
>> > }
>>
>> I don't think it'll work universally.
>>
>> That'd need to be synchronized with other stuff done under the spinlock
>> and in fact, what you're interested in is runtime_status (and that being
>> RPM_ACTIVE) and not just the usage count.
>
> That's right.  You'd need to acquire the spinlock, test runtime_status,
> do the register sampling if the status is RPM_ACTIVE, and then drop the
> spinlock.
>
> I suppose wrapper routines for acquiring and releasing the spinlock
> could be added to the runtime-PM API.  Something like this:
>
> #define pm_runtime_lock(dev, flags)                     \
>                 spin_lock_irqsave(&(dev)->power.lock, flags)
> #define pm_runtime_unlock(dev, flags)                   \
>                 spin_unlock_irqrestore(&(dev)->power.lock, flags)
>
> It looks a little silly but it would work.

Oh right, I've totally ignored all the async resuming/suspending
stuff. Anyway what we want to do is sample a perf monitoring unit on
the gpu from an hrtimer and then expose that as a perf pmu. But we
don't want to wake up the gpu for the sampling or hold a special
reference, since that disturbs the sampling and also tends to upset
the gpu.

Note that those registers are just status indicator registers, so no
counters that will get reset when the device is suspended. For the
latter counters (we have them too) we're not sure yet how to handle
them, especially since the amount the gpu saves/restores into its own
context storage depends upon the platform.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 156+ messages in thread
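
Putting the pieces together, the hrtimer-driven sampling Daniel
describes would look roughly like this (hypothetical; pmu_timer is a
made-up field and i915_sample_if_active() refers to the sketch after
Alan's mail above):

static enum hrtimer_restart i915_pmu_timer_cb(struct hrtimer *timer)
{
	struct drm_i915_private *dev_priv =
		container_of(timer, struct drm_i915_private, pmu_timer);

	/* never wake the GPU or pin it awake just to take a sample:
	 * that would disturb the very behaviour being measured */
	i915_sample_if_active(dev_priv);

	hrtimer_forward_now(timer, ns_to_ktime(NSEC_PER_SEC / 100));
	return HRTIMER_RESTART;
}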

* Re: [Intel-gfx] [PATCH 45/53] drm/i915/bdw: Do not call intel_runtime_pm_get() in an interrupt
  2014-08-09  8:53                       ` Daniel Vetter
@ 2014-08-10  1:55                         ` Rafael J. Wysocki
  -1 siblings, 0 replies; 156+ messages in thread
From: Rafael J. Wysocki @ 2014-08-10  1:55 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Alan Stern, Chris Wilson, Mateo Lozano, Oscar, intel-gfx, Greg KH, LKML

On Saturday, August 09, 2014 10:53:03 AM Daniel Vetter wrote:
> On Sat, Aug 9, 2014 at 3:21 AM, Alan Stern <stern@rowland.harvard.edu> wrote:
> > On Sat, 9 Aug 2014, Rafael J. Wysocki wrote:
> >
> >> > > > Well it works currently. So where do you see the problem?
> >> > >
> >> > > Sampling registers from a timer - in particular, we really do not want
> >> > > to disable runtime pm whilst trying to monitor the impact of runtime pm.
> >> >
> >> > In that case you can grab a runtime pm reference iff the device is powered
> >> > on already. Which won't call anything scary, just amounts to an
> >> > atomic_add_unless or so, and then drop it again.
> >> >
> >> > Unfortunately there doesn't seem to be such a thing around already, so
> >> > need to add it first. Greg, how much would you freak out if we add
> >> > something like
> >> >
> >> > /**
> >> >  * pm_runtime_get_unless_suspended - grab an rpm ref if the device is on
> >> >  *
> >> >  * Returns true if an rpm ref has been acquired, false otherwise. Can be
> >> >  * called from atomic context to e.g. sample performance counters (where we
> >> >  * obviously don't want to disturb system state if everything is off atm).
> >> >  */
> >> > static inline bool pm_runtime_get_unless_suspended(struct device *dev)
> >> > {
> >> >     return atomic_add_unless(&dev->power.usage_count, 1, 0);
> >> > }
> >>
> >> I don't think it'll work universally.
> >>
> >> That'd need to be synchronized with other stuff done under the spinlock
> >> and in fact, what you're interested in is runtime_status (and that being
> >> RPM_ACTIVE) and not just the usage count.
> >
> > That's right.  You'd need to acquire the spinlock, test runtime_status,
> > do the register sampling if the status is RPM_ACTIVE, and then drop the
> > spinlock.
> >
> > I suppose wrapper routines for acquiring and releasing the spinlock
> > could be added to the runtime-PM API.  Something like this:
> >
> > #define pm_runtime_lock(dev, flags)                     \
> >                 spin_lock_irqsave(&(dev)->power.lock, flags)
> > #define pm_runtime_unlock(dev, flags)                   \
> >                 spin_unlock_irqrestore(&(dev)->power.lock, flags)
> >
> > It looks a little silly but it would work.
> 
> Oh right, I've totally ignored all the async resuming/suspending
> stuff. Anyway what we want to do is sample a perf monitoring unit on
> the gpu from an hrtimer and then expose that as a perf pmu. But we
> don't want to wake up the gpu for the sampling or hold a special
> reference, since that disturbs the sampling and also tends to upset
> the gpu.

The way to go is as Alan said: take the spinlock, check the runtime status,
do stuff and release the spinlock.

Rafael


^ permalink raw reply	[flat|nested] 156+ messages in thread

end of thread, other threads:[~2014-08-10  1:36 UTC | newest]

Thread overview: 156+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-13 15:37 [PATCH 00/53] Execlists v3 oscar.mateo
2014-06-13 15:37 ` [PATCH 01/53] drm/i915: Extract context backing object allocation oscar.mateo
2014-06-13 15:37 ` [PATCH 02/53] drm/i915: Rename ctx->obj to ctx->render_obj oscar.mateo
2014-06-13 17:00   ` Daniel Vetter
2014-06-16 15:20     ` Mateo Lozano, Oscar
2014-06-13 17:15   ` Chris Wilson
2014-06-13 15:37 ` [PATCH 03/53] drm/i915: Add a dev pointer to the context oscar.mateo
2014-06-13 15:37 ` [PATCH 04/53] drm/i915: Extract ringbuffer destroy & make alloc outside accesible oscar.mateo
2014-06-18 21:39   ` Volkin, Bradley D
2014-06-19 10:42     ` Mateo Lozano, Oscar
2014-06-13 15:37 ` [PATCH 05/53] drm/i915: Move i915_gem_validate_context() to i915_gem_context.c oscar.mateo
2014-06-13 17:11   ` Chris Wilson
2014-06-16 15:18     ` Mateo Lozano, Oscar
2014-06-18 20:00       ` Volkin, Bradley D
2014-06-13 15:37 ` [PATCH 06/53] drm/i915/bdw: Introduce one context backing object per engine oscar.mateo
2014-06-18 20:16   ` Daniel Vetter
2014-06-19  8:52     ` Mateo Lozano, Oscar
2014-06-19 10:57       ` Daniel Vetter
2014-06-13 15:37 ` [PATCH 07/53] drm/i915/bdw: New file for Logical Ring Contexts and Execlists oscar.mateo
2014-06-18 20:17   ` Daniel Vetter
2014-06-19  9:01     ` Mateo Lozano, Oscar
2014-06-13 15:37 ` [PATCH 08/53] drm/i915/bdw: Macro for LRCs and module option for Execlists oscar.mateo
2014-06-18 20:19   ` Daniel Vetter
2014-06-19  9:04     ` Mateo Lozano, Oscar
2014-06-13 15:37 ` [PATCH 09/53] drm/i915/bdw: Initialization for Logical Ring Contexts oscar.mateo
2014-06-18 20:24   ` Daniel Vetter
2014-06-19  9:23     ` Mateo Lozano, Oscar
2014-06-19 10:08       ` Daniel Vetter
2014-06-19 10:10         ` Mateo Lozano, Oscar
2014-06-19 10:34           ` Daniel Vetter
2014-06-13 15:37 ` [PATCH 10/53] drm/i915/bdw: A bit more advanced context init/fini oscar.mateo
2014-06-18 22:13   ` Volkin, Bradley D
2014-06-19  6:13     ` Daniel Vetter
2014-06-13 15:37 ` [PATCH 11/53] drm/i915/bdw: Allocate ringbuffers for Logical Ring Contexts oscar.mateo
2014-06-18 22:19   ` Volkin, Bradley D
2014-06-23 12:07     ` Mateo Lozano, Oscar
2014-06-13 15:37 ` [PATCH 12/53] drm/i915/bdw: Populate LR contexts (somewhat) oscar.mateo
2014-06-18 23:24   ` Volkin, Bradley D
2014-06-23 12:42     ` Mateo Lozano, Oscar
2014-06-23 15:05       ` Volkin, Bradley D
2014-06-23 15:11         ` Mateo Lozano, Oscar
2014-06-13 15:37 ` [PATCH 13/53] drm/i915/bdw: Deferred creation of user-created LRCs oscar.mateo
2014-06-18 20:27   ` Daniel Vetter
2014-06-13 15:37 ` [PATCH 14/53] drm/i915/bdw: Render moot context reset and switch when LRCs are enabled oscar.mateo
2014-06-13 15:37 ` [PATCH 15/53] drm/i915/bdw: Don't write PDP in the legacy way when using LRCs oscar.mateo
2014-06-18 23:42   ` Volkin, Bradley D
2014-06-23 12:45     ` Mateo Lozano, Oscar
2014-06-13 15:37 ` [PATCH 16/53] drm/i915/bdw: Skeleton for the new logical rings submission path oscar.mateo
2014-06-13 15:37 ` [PATCH 17/53] drm/i915/bdw: Generic logical ring init and cleanup oscar.mateo
2014-06-13 15:37 ` [PATCH 18/53] drm/i915/bdw: New header file for LRs, LRCs and Execlists oscar.mateo
2014-06-13 15:37 ` [PATCH 19/53] drm/i915: Extract pipe control fini & make init outside accesible oscar.mateo
2014-06-18 20:31   ` Daniel Vetter
2014-06-19  0:04   ` Volkin, Bradley D
2014-06-19 10:58     ` Mateo Lozano, Oscar
2014-06-13 15:37 ` [PATCH 20/53] drm/i915/bdw: GEN-specific logical ring init oscar.mateo
2014-06-13 15:37 ` [PATCH 21/53] drm/i915/bdw: GEN-specific logical ring set/get seqno oscar.mateo
2014-06-13 15:37 ` [PATCH 22/53] drm/i915: Make ring_space more generic and outside accesible oscar.mateo
2014-06-13 15:37 ` [PATCH 23/53] drm/i915: Generalize intel_ring_get_tail oscar.mateo
2014-06-20 20:17   ` Volkin, Bradley D
2014-06-13 15:37 ` [PATCH 24/53] drm/i915: Make intel_ring_stopped outside accesible oscar.mateo
2014-06-13 15:37 ` [PATCH 25/53] drm/i915/bdw: GEN-specific logical ring submit context (somewhat) oscar.mateo
2014-06-20 20:28   ` Volkin, Bradley D
2014-06-23 12:49     ` Mateo Lozano, Oscar
2014-06-13 15:37 ` [PATCH 26/53] drm/i915/bdw: New logical ring submission mechanism oscar.mateo
2014-06-20 21:00   ` Volkin, Bradley D
2014-06-23 13:09     ` Mateo Lozano, Oscar
2014-06-23 13:13       ` Chris Wilson
2014-06-23 13:18         ` Mateo Lozano, Oscar
2014-06-23 13:27           ` Chris Wilson
2014-06-23 13:36             ` Mateo Lozano, Oscar
2014-06-23 13:41               ` Chris Wilson
2014-06-23 14:35                 ` Mateo Lozano, Oscar
2014-06-23 19:10                   ` Volkin, Bradley D
2014-06-24 12:29                     ` Mateo Lozano, Oscar
2014-07-07 12:39                       ` Daniel Vetter
2014-06-24  0:23                   ` Ben Widawsky
2014-06-24 11:45                     ` Mateo Lozano, Oscar
2014-06-24 14:41                       ` Volkin, Bradley D
2014-06-24 17:19         ` Jesse Barnes
2014-06-26 13:28           ` Mateo Lozano, Oscar
2014-07-07 12:41         ` Daniel Vetter
2014-06-13 15:37 ` [PATCH 27/53] drm/i915/bdw: GEN-specific logical ring emit request oscar.mateo
2014-06-20 21:18   ` Volkin, Bradley D
2014-06-23 15:48     ` Mateo Lozano, Oscar
2014-06-13 15:37 ` [PATCH 28/53] drm/i915/bdw: GEN-specific logical ring emit flush oscar.mateo
2014-06-20 21:39   ` Volkin, Bradley D
2014-06-13 15:37 ` [PATCH 29/53] drm/i915/bdw: Emission of requests with logical rings oscar.mateo
2014-06-13 15:37 ` [PATCH 30/53] drm/i915/bdw: Ring idle and stop " oscar.mateo
2014-06-13 15:37 ` [PATCH 31/53] drm/i915/bdw: Interrupts " oscar.mateo
2014-06-13 15:37 ` [PATCH 32/53] drm/i915/bdw: GEN-specific logical ring emit batchbuffer start oscar.mateo
2014-06-13 15:37 ` [PATCH 33/53] drm/i915: Extract the actual workload submission mechanism from execbuffer oscar.mateo
2014-06-13 15:37 ` [PATCH 34/53] drm/i915: Make move_to_active and retire_commands outside accesible oscar.mateo
2014-06-13 15:37 ` [PATCH 35/53] drm/i915/bdw: Workload submission mechanism for Execlists oscar.mateo
2014-06-13 15:37 ` [PATCH 36/53] drm/i915: Abstract the workload submission mechanism away oscar.mateo
2014-06-18 20:40   ` Daniel Vetter
2014-06-13 15:37 ` [PATCH 37/53] drm/i915/bdw: Implement context switching (somewhat) oscar.mateo
2014-06-13 17:00   ` Chris Wilson
2014-06-13 15:37 ` [PATCH 38/53] drm/i915/bdw: Write the tail pointer, LRC style oscar.mateo
2014-06-13 15:37 ` [PATCH 39/53] drm/i915/bdw: Two-stage execlist submit process oscar.mateo
2014-06-13 15:37 ` [PATCH 40/53] drm/i915/bdw: Handle context switch events oscar.mateo
2014-06-13 15:37 ` [PATCH 41/53] drm/i915/bdw: Avoid non-lite-restore preemptions oscar.mateo
2014-06-18 20:49   ` Daniel Vetter
2014-06-23 11:52     ` Mateo Lozano, Oscar
2014-07-07 12:47       ` Daniel Vetter
2014-06-13 15:38 ` [PATCH 42/53] drm/i915/bdw: Make sure gpu reset still works with Execlists oscar.mateo
2014-06-18 20:50   ` Daniel Vetter
2014-06-19  9:37     ` Mateo Lozano, Oscar
2014-06-13 15:38 ` [PATCH 43/53] drm/i915/bdw: Make sure error capture keeps working " oscar.mateo
2014-06-13 16:54   ` Chris Wilson
2014-06-18 20:52   ` Daniel Vetter
2014-06-18 20:53     ` Daniel Vetter
2014-06-13 15:38 ` [PATCH 44/53] drm/i915/bdw: Help out the ctx switch interrupt handler oscar.mateo
2014-06-13 15:38 ` [PATCH 45/53] drm/i915/bdw: Do not call intel_runtime_pm_get() in an interrupt oscar.mateo
2014-06-18 20:54   ` Daniel Vetter
2014-07-26 10:27     ` Chris Wilson
2014-07-28  8:54       ` Daniel Vetter
2014-07-29  7:37         ` Chris Wilson
2014-07-29 10:26           ` Daniel Vetter
2014-08-08  9:20             ` Chris Wilson
2014-08-08  9:37               ` [Intel-gfx] " Daniel Vetter
2014-08-08  9:37                 ` Daniel Vetter
2014-08-08 13:41                 ` [Intel-gfx] " Greg KH
2014-08-08 13:41                   ` Greg KH
2014-08-09  0:18                   ` [Intel-gfx] " Rafael J. Wysocki
2014-08-09  0:18                     ` Rafael J. Wysocki
2014-08-09  0:14                 ` [Intel-gfx] " Rafael J. Wysocki
2014-08-09  0:14                   ` Rafael J. Wysocki
2014-08-09  1:21                   ` [Intel-gfx] " Alan Stern
2014-08-09  1:21                     ` Alan Stern
2014-08-09  8:53                     ` Daniel Vetter
2014-08-09  8:53                       ` Daniel Vetter
2014-08-10  1:55                       ` [Intel-gfx] " Rafael J. Wysocki
2014-08-10  1:55                         ` Rafael J. Wysocki
2014-06-13 15:38 ` [PATCH 46/53] drm/i915/bdw: Display execlists info in debugfs oscar.mateo
2014-06-18 20:59   ` Daniel Vetter
2014-06-13 15:38 ` [PATCH 47/53] drm/i915/bdw: Display context backing obj & ringbuffer " oscar.mateo
2014-06-13 15:38 ` [PATCH 48/53] drm/i915/bdw: Print context state " oscar.mateo
2014-06-13 15:38 ` [PATCH 49/53] drm/i915: Extract render state preparation oscar.mateo
2014-06-13 15:38 ` [PATCH 50/53] drm/i915/bdw: Render state init for Execlists oscar.mateo
2014-06-13 15:38 ` [PATCH 51/53] drm/i915/bdw: Document Logical Rings, LR contexts and Execlists oscar.mateo
2014-06-13 16:51   ` Chris Wilson
2014-06-16 15:24     ` Mateo Lozano, Oscar
2014-06-16 17:56       ` Daniel Vetter
2014-06-17  8:22         ` Mateo Lozano, Oscar
2014-06-17  9:39           ` Daniel Vetter
2014-06-17  9:46             ` Mateo Lozano, Oscar
2014-06-17 10:08               ` Daniel Vetter
2014-06-17 10:12                 ` Mateo Lozano, Oscar
2014-06-13 15:38 ` [PATCH 52/53] drm/i915/bdw: Enable logical ring contexts oscar.mateo
2014-06-13 15:38 ` [PATCH 53/53] !UPSTREAM: drm/i915: Use MMIO flips oscar.mateo
2014-06-18 21:01   ` Daniel Vetter
2014-06-19  9:50     ` Mateo Lozano, Oscar
2014-06-19 10:04       ` Daniel Vetter
2014-06-19 10:13       ` Chris Wilson
2014-06-19 10:33         ` Mateo Lozano, Oscar
2014-06-18 21:26 ` [PATCH 00/53] Execlists v3 Daniel Vetter
