* [Intel-gfx] [RFC 0/6] Start separating GuC and execlists submission
@ 2020-01-25  0:55 Daniele Ceraolo Spurio
  2020-01-25  0:55 ` [Intel-gfx] [RFC 1/6] drm/i915/guc: Add guc-specific breadcrumb functions Daniele Ceraolo Spurio
                   ` (7 more replies)
  0 siblings, 8 replies; 10+ messages in thread
From: Daniele Ceraolo Spurio @ 2020-01-25  0:55 UTC (permalink / raw)
  To: intel-gfx

Note: this applies on top of my series to commit early to GuC [1].

Picking up from the feedback from my RFC series about splitting
up intel_lrc.c [2], this series introduces GuC submission versions
of most of the engine and context functions. As a starting point,
the functions are still very similar to the execlists ones, but
they will progressively diverge while we add the new submission
logic. Some of the functions have been simplified by removing
support for pre-gen11 cases as we only aim to enable GuC submission
for gen11+; I've left comments and BUG_ONs to mark the reduced
support spots to easily find them in case we ever want to enable
GuC submission for older gens.

Going slightly against the feedback from the previous series, I have
kept the basic context and ring object management shared between
execlists and GuC submission functions. The rationale is that those
objects are HW-related and therefore their creation and their main
attributes (e.g. context size) are not dependent on the submission
method in any way. Handling of more SW constructs, like the timeline,
has been duplicated.

Still on the theme of sharing, the flush and bb_start functions have also
been re-used on the GuC side, but I'm not sure if keeping them in
intel_lrc.c is the best solution. My medium-term plan is still the same
as [2], i.e. split execlists_submission.c out of intel_lrc.c, with the
latter holding the common code, but it might be worth moving the
command emission to a dedicated file or to a header as inlines, like
we already do in some cases.
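
For reference, a very rough sketch of the header/inline option (the file
name and the helper below are just placeholders, not something in this
series):

/* e.g. a hypothetical intel_lrc_commands.h, shared by both back-ends */
static inline u32 *
__emit_fini_breadcrumb_tail(struct i915_request *rq, u32 *cs)
{
	*cs++ = MI_USER_INTERRUPT;
	*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;

	rq->tail = intel_ring_offset(rq, cs);
	assert_ring_tail_valid(rq->ring, rq->tail);

	return cs;
}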

The code is still a bit rough and the series has been compile-tested
only. I wanted to get some early comments about the direction before
rebasing the rest of the GuC code on top of it and starting to test.

[1] https://patchwork.freedesktop.org/series/72031/
[2] https://patchwork.freedesktop.org/series/70787/

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>

Daniele Ceraolo Spurio (6):
  drm/i915/guc: Add guc-specific breadcrumb functions
  drm/i915/guc: Add request_alloc for guc_submission
  drm/i915/guc: Add engine->resume for GuC submission
  drm/i915/guc: Re-use lrc flush functions
  drm/i915/guc: Add initial context ops for GuC submission
  drm/i915/guc: Stop inheriting from execlists_set_default_submission

 drivers/gpu/drm/i915/gt/intel_lrc.c           | 217 +++++-----
 drivers/gpu/drm/i915/gt/intel_lrc.h           |  17 +-
 drivers/gpu/drm/i915/gt/selftest_lrc.c        |  12 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 378 +++++++++++++++++-
 4 files changed, 496 insertions(+), 128 deletions(-)

-- 
2.24.1


* [Intel-gfx] [RFC 1/6] drm/i915/guc: Add guc-specific breadcrumb functions
  2020-01-25  0:55 [Intel-gfx] [RFC 0/6] Start separating GuC and execlists submission Daniele Ceraolo Spurio
@ 2020-01-25  0:55 ` Daniele Ceraolo Spurio
  2020-01-25  0:55 ` [Intel-gfx] [RFC 2/6] drm/i915/guc: Add request_alloc for guc_submission Daniele Ceraolo Spurio
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Daniele Ceraolo Spurio @ 2020-01-25  0:55 UTC (permalink / raw)
  To: intel-gfx

They're still mostly the same as the execlists ones, with the main
difference being that there is no busywait on the GuC side.

Since we now have multiple vfuncs set during setup, start consolidating
them in their own helper.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
---
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 148 +++++++++++++++++-
 1 file changed, 145 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index ad0d604d35a8..a4caa0f9dfbf 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -498,6 +498,107 @@ static void guc_reset_finish(struct intel_engine_cs *engine)
 		     atomic_read(&execlists->tasklet.count));
 }
 
+static int emit_init_breadcrumb(struct i915_request *rq)
+{
+	u32 *cs;
+
+	GEM_BUG_ON(!i915_request_timeline(rq)->has_initial_breadcrumb);
+
+	cs = intel_ring_begin(rq, 6);
+	if (IS_ERR(cs))
+		return PTR_ERR(cs);
+
+	/*
+	 * Check if we have been preempted before we even get started.
+	 *
+	 * After this point i915_request_started() reports true, even if
+	 * we get preempted and so are no longer running.
+	 */
+	*cs++ = MI_ARB_CHECK;
+	*cs++ = MI_NOOP;
+
+	*cs++ = MI_STORE_DWORD_IMM_GEN4 | MI_USE_GGTT;
+	*cs++ = i915_request_timeline(rq)->hwsp_offset;
+	*cs++ = 0;
+	*cs++ = rq->fence.seqno - 1;
+
+	intel_ring_advance(rq, cs);
+
+	/* Record the updated position of the request's payload */
+	rq->infix = intel_ring_offset(rq, cs);
+
+	return 0;
+}
+
+static __always_inline u32*
+emit_fini_breadcrumb_footer(struct i915_request *request, u32 *cs)
+{
+	*cs++ = MI_USER_INTERRUPT;
+
+	*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
+
+	request->tail = intel_ring_offset(request, cs);
+	assert_ring_tail_valid(request->ring, request->tail);
+
+	/*
+	 * Reserve space for 2 extra dwords at the end of each request to be
+	 * used as a workaround for not being allowed to do lite restore with
+	 * HEAD==TAIL (WaIdleLiteRestore). We use MI_ARB_CHECK as one of the
+	 * dwords to ensure there's always at least one preemption point
+	 * per-request.
+	 */
+	*cs++ = MI_ARB_CHECK;
+	*cs++ = MI_NOOP;
+	request->wa_tail = intel_ring_offset(request, cs);
+
+	return cs;
+}
+
+static u32 *emit_fini_breadcrumb_xcs(struct i915_request *request, u32 *cs)
+{
+	cs = gen8_emit_ggtt_write(cs,
+				  request->fence.seqno,
+				  i915_request_active_timeline(request)->hwsp_offset,
+				  0);
+
+	return emit_fini_breadcrumb_footer(request, cs);
+}
+
+static u32 *
+gen11_emit_fini_breadcrumb_rcs(struct i915_request *request, u32 *cs)
+{
+	cs = gen8_emit_ggtt_write_rcs(cs,
+				      request->fence.seqno,
+				      i915_request_active_timeline(request)->hwsp_offset,
+				      PIPE_CONTROL_CS_STALL |
+				      PIPE_CONTROL_TILE_CACHE_FLUSH |
+				      PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH |
+				      PIPE_CONTROL_DEPTH_CACHE_FLUSH |
+				      PIPE_CONTROL_DC_FLUSH_ENABLE |
+				      PIPE_CONTROL_FLUSH_ENABLE);
+
+	return emit_fini_breadcrumb_footer(request, cs);
+}
+
+static u32 *
+gen12_emit_fini_breadcrumb_rcs(struct i915_request *request, u32 *cs)
+{
+	cs = gen8_emit_ggtt_write_rcs(cs,
+				      request->fence.seqno,
+				      i915_request_active_timeline(request)->hwsp_offset,
+				      PIPE_CONTROL_CS_STALL |
+				      PIPE_CONTROL_TILE_CACHE_FLUSH |
+				      PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH |
+				      PIPE_CONTROL_DEPTH_CACHE_FLUSH |
+				      /* Wa_1409600907:tgl */
+				      PIPE_CONTROL_DEPTH_STALL |
+				      PIPE_CONTROL_DC_FLUSH_ENABLE |
+				      PIPE_CONTROL_FLUSH_ENABLE |
+				      PIPE_CONTROL_HDC_PIPELINE_FLUSH);
+
+	return emit_fini_breadcrumb_footer(request, cs);
+}
+
 /*
  * Everything below here is concerned with setup & teardown, and is
  * therefore not part of the somewhat time-critical batch-submission
@@ -607,6 +708,49 @@ static void guc_set_default_submission(struct intel_engine_cs *engine)
 	engine->flags |= I915_ENGINE_NEEDS_BREADCRUMB_TASKLET;
 }
 
+static void guc_submission_default_vfuncs(struct intel_engine_cs *engine)
+{
+	/* Default vfuncs which can be overridden by each engine. */
+
+	engine->emit_init_breadcrumb = emit_init_breadcrumb;
+	engine->emit_fini_breadcrumb = emit_fini_breadcrumb_xcs;
+
+	engine->set_default_submission = guc_set_default_submission;
+
+	/*
+	 * we don't set engine->irq_<en/dis>able because we don't support
+	 * masking irq on gen11+ and we don't support guc submission on
+	 * pre-gen11.
+	 */
+	GEM_BUG_ON(INTEL_GEN(engine->i915) < 11);
+}
+
+static void rcs_submission_override(struct intel_engine_cs *engine)
+{
+	/* no support for pre-gen11 variants */
+	GEM_BUG_ON(INTEL_GEN(engine->i915) < 11);
+
+	switch (INTEL_GEN(engine->i915)) {
+	case 12:
+		engine->emit_fini_breadcrumb = gen12_emit_fini_breadcrumb_rcs;
+		break;
+	default:
+		engine->emit_fini_breadcrumb = gen11_emit_fini_breadcrumb_rcs;
+		break;
+	}
+}
+
+void guc_submission_vfuncs(struct intel_engine_cs *engine)
+{
+	/* XXX: still using base execlists vfuncs and overriding some of them */
+	intel_execlists_submission_vfuncs(engine);
+
+	guc_submission_default_vfuncs(engine);
+
+	if (engine->class == RENDER_CLASS)
+		rcs_submission_override(engine);
+}
+
 static inline void guc_default_irqs(struct intel_engine_cs *engine)
 {
 	/*
@@ -641,9 +785,7 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
 	tasklet_init(&engine->execlists.tasklet,
 		     guc_submission_tasklet, (unsigned long)engine);
 
-	/* XXX: still mirroring execlists. Will diverge with new interface */
-	intel_execlists_submission_vfuncs(engine);
-	engine->set_default_submission = guc_set_default_submission;
+	guc_submission_vfuncs(engine);
 
 	guc_default_irqs(engine);
 	intel_logical_ring_init_workaround_bb(engine);
-- 
2.24.1


* [Intel-gfx] [RFC 2/6] drm/i915/guc: Add request_alloc for guc_submission
  2020-01-25  0:55 [Intel-gfx] [RFC 0/6] Start separating GuC and execlists submission Daniele Ceraolo Spurio
  2020-01-25  0:55 ` [Intel-gfx] [RFC 1/6] drm/i915/guc: Add guc-specific breadcrumb functions Daniele Ceraolo Spurio
@ 2020-01-25  0:55 ` Daniele Ceraolo Spurio
  2020-01-25  0:55 ` [Intel-gfx] [RFC 3/6] drm/i915/guc: Add engine->resume for GuC submission Daniele Ceraolo Spurio
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Daniele Ceraolo Spurio @ 2020-01-25  0:55 UTC (permalink / raw)
  To: intel-gfx

A straight copy of the execlists one to begin with. Tweaks will come
later if needed.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
---
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 35 +++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index a4caa0f9dfbf..bf40793e32e5 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -18,6 +18,9 @@
 #include "i915_drv.h"
 #include "i915_trace.h"
 
+/* Typical size of the average request (2 pipecontrols and a MI_BB) */
+#define GUC_REQUEST_SIZE 64 /* bytes */
+
 /**
  * DOC: GuC-based command submission
  *
@@ -498,6 +501,36 @@ static void guc_reset_finish(struct intel_engine_cs *engine)
 		     atomic_read(&execlists->tasklet.count));
 }
 
+static int guc_submission_request_alloc(struct i915_request *request)
+{
+	int ret;
+
+	GEM_BUG_ON(!intel_context_is_pinned(request->context));
+
+	/*
+	 * Flush enough space to reduce the likelihood of waiting after
+	 * we start building the request - in which case we will just
+	 * have to repeat work.
+	 */
+	request->reserved_space += GUC_REQUEST_SIZE;
+
+	/*
+	 * Note that after this point, we have committed to using
+	 * this request as it is being used to both track the
+	 * state of engine initialisation and liveness of the
+	 * golden renderstate above. Think twice before you try
+	 * to cancel/unwind this request now.
+	 */
+
+	/* Unconditionally invalidate GPU caches and TLBs. */
+	ret = request->engine->emit_flush(request, EMIT_INVALIDATE);
+	if (ret)
+		return ret;
+
+	request->reserved_space -= GUC_REQUEST_SIZE;
+	return 0;
+}
+
 static int emit_init_breadcrumb(struct i915_request *rq)
 {
 	u32 *cs;
@@ -712,6 +745,8 @@ static void guc_submission_default_vfuncs(struct intel_engine_cs *engine)
 {
 	/* Default vfuncs which can be overridden by each engine. */
 
+	engine->request_alloc = guc_submission_request_alloc;
+
 	engine->emit_init_breadcrumb = emit_init_breadcrumb;
 	engine->emit_fini_breadcrumb = emit_fini_breadcrumb_xcs;
 
-- 
2.24.1


* [Intel-gfx] [RFC 3/6] drm/i915/guc: Add engine->resume for GuC submission
  2020-01-25  0:55 [Intel-gfx] [RFC 0/6] Start separating GuC and execlists submission Daniele Ceraolo Spurio
  2020-01-25  0:55 ` [Intel-gfx] [RFC 1/6] drm/i915/guc: Add guc-specific breadcrumb functions Daniele Ceraolo Spurio
  2020-01-25  0:55 ` [Intel-gfx] [RFC 2/6] drm/i915/guc: Add request_alloc for guc_submission Daniele Ceraolo Spurio
@ 2020-01-25  0:55 ` Daniele Ceraolo Spurio
  2020-01-25  0:55 ` [Intel-gfx] [RFC 4/6] drm/i915/guc: Re-use lrc flush functions Daniele Ceraolo Spurio
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Daniele Ceraolo Spurio @ 2020-01-25  0:55 UTC (permalink / raw)
  To: intel-gfx

Similar to the execlists one, but we don't handle STOP_RING (because the
GuC owns the engine status) and we leave the CSB FIFO at 6 entries deep.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
---
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 29 +++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index bf40793e32e5..c688f21cc27e 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -11,6 +11,7 @@
 #include "gt/intel_gt.h"
 #include "gt/intel_gt_pm.h"
 #include "gt/intel_lrc_reg.h"
+#include "gt/intel_mocs.h"
 #include "gt/intel_ring.h"
 
 #include "intel_guc_submission.h"
@@ -501,6 +502,32 @@ static void guc_reset_finish(struct intel_engine_cs *engine)
 		     atomic_read(&execlists->tasklet.count));
 }
 
+static void hwsp_setup(struct intel_engine_cs *engine)
+{
+	intel_engine_set_hwsp_writemask(engine, ~0u); /* HWSTAM */
+
+	ENGINE_WRITE(engine, RING_HWS_PGA,
+		     i915_ggtt_offset(engine->status_page.vma));
+	ENGINE_POSTING_READ(engine, RING_HWS_PGA);
+}
+
+static int guc_submission_resume(struct intel_engine_cs *engine)
+{
+	intel_engine_apply_workarounds(engine);
+	intel_engine_apply_whitelist(engine);
+
+	intel_mocs_init_engine(engine);
+
+	intel_engine_reset_breadcrumbs(engine);
+
+	hwsp_setup(engine);
+
+	/* pre-gen11 requires explicit enabling of the execlists */
+	GEM_BUG_ON(INTEL_GEN(engine->i915) < 11);
+
+	return 0;
+}
+
 static int guc_submission_request_alloc(struct i915_request *request)
 {
 	int ret;
@@ -745,6 +772,8 @@ static void guc_submission_default_vfuncs(struct intel_engine_cs *engine)
 {
 	/* Default vfuncs which can be overridden by each engine. */
 
+	engine->resume = guc_submission_resume;
+
 	engine->request_alloc = guc_submission_request_alloc;
 
 	engine->emit_init_breadcrumb = emit_init_breadcrumb;
-- 
2.24.1


* [Intel-gfx] [RFC 4/6] drm/i915/guc: Re-use lrc flush functions
  2020-01-25  0:55 [Intel-gfx] [RFC 0/6] Start separating GuC and execlists submission Daniele Ceraolo Spurio
                   ` (2 preceding siblings ...)
  2020-01-25  0:55 ` [Intel-gfx] [RFC 3/6] drm/i915/guc: Add engine->resume for GuC submission Daniele Ceraolo Spurio
@ 2020-01-25  0:55 ` Daniele Ceraolo Spurio
  2020-01-25  0:55 ` [Intel-gfx] [RFC 5/6] drm/i915/guc: Add initial context ops for GuC submission Daniele Ceraolo Spurio
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Daniele Ceraolo Spurio @ 2020-01-25  0:55 UTC (permalink / raw)
  To: intel-gfx

The flush commands are independent of the submission method, so we can
re-use the same code.

RFC: I've kept the functions in intel_lrc.c for now, as my current plan
is to keep the common code there and split execlists_submission out,
but maybe intel_gpu_commands.c would be a better location?

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_lrc.c               | 8 +++-----
 drivers/gpu/drm/i915/gt/intel_lrc.h               | 4 ++++
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 3 +++
 3 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 7e912959fb3d..cd15ab7fb82f 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -3793,7 +3793,7 @@ static void gen8_logical_ring_disable_irq(struct intel_engine_cs *engine)
 	ENGINE_WRITE(engine, RING_IMR, ~engine->irq_keep_mask);
 }
 
-static int gen8_emit_flush(struct i915_request *request, u32 mode)
+int gen8_emit_flush(struct i915_request *request, u32 mode)
 {
 	u32 cmd, *cs;
 
@@ -3892,8 +3892,7 @@ static int gen8_emit_flush_render(struct i915_request *request,
 	return 0;
 }
 
-static int gen11_emit_flush_render(struct i915_request *request,
-				   u32 mode)
+int gen11_emit_flush_render(struct i915_request *request, u32 mode)
 {
 	if (mode & EMIT_FLUSH) {
 		u32 *cs;
@@ -3949,8 +3948,7 @@ static u32 preparser_disable(bool state)
 	return MI_ARB_CHECK | 1 << 8 | state;
 }
 
-static int gen12_emit_flush_render(struct i915_request *request,
-				   u32 mode)
+int gen12_emit_flush_render(struct i915_request *request, u32 mode)
 {
 	if (mode & EMIT_FLUSH) {
 		u32 flags = 0;
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.h b/drivers/gpu/drm/i915/gt/intel_lrc.h
index 07d1655ed8c0..c5afae6bc89b 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.h
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.h
@@ -87,6 +87,10 @@ void intel_logical_ring_cleanup(struct intel_engine_cs *engine);
 int intel_execlists_submission_setup(struct intel_engine_cs *engine);
 void intel_execlists_submission_vfuncs(struct intel_engine_cs *engine);
 
+int gen8_emit_flush(struct i915_request *request, u32 mode);
+int gen11_emit_flush_render(struct i915_request *request, u32 mode);
+int gen12_emit_flush_render(struct i915_request *request, u32 mode);
+
 /* Logical Ring Contexts */
 /* At the start of the context image is its per-process HWS page */
 #define LRC_PPHWSP_PN	(0)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index c688f21cc27e..edefb3cf135f 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -776,6 +776,7 @@ static void guc_submission_default_vfuncs(struct intel_engine_cs *engine)
 
 	engine->request_alloc = guc_submission_request_alloc;
 
+	engine->emit_flush = gen8_emit_flush;
 	engine->emit_init_breadcrumb = emit_init_breadcrumb;
 	engine->emit_fini_breadcrumb = emit_fini_breadcrumb_xcs;
 
@@ -796,9 +797,11 @@ static void rcs_submission_override(struct intel_engine_cs *engine)
 
 	switch (INTEL_GEN(engine->i915)) {
 	case 12:
+		engine->emit_flush = gen12_emit_flush_render;
 		engine->emit_fini_breadcrumb = gen12_emit_fini_breadcrumb_rcs;
 		break;
 	default:
+		engine->emit_flush = gen11_emit_flush_render;
 		engine->emit_fini_breadcrumb = gen11_emit_fini_breadcrumb_rcs;
 		break;
 	}
-- 
2.24.1


* [Intel-gfx] [RFC 5/6] drm/i915/guc: Add initial context ops for GuC submission
  2020-01-25  0:55 [Intel-gfx] [RFC 0/6] Start separating GuC and execlists submission Daniele Ceraolo Spurio
                   ` (3 preceding siblings ...)
  2020-01-25  0:55 ` [Intel-gfx] [RFC 4/6] drm/i915/guc: Re-use lrc flush functions Daniele Ceraolo Spurio
@ 2020-01-25  0:55 ` Daniele Ceraolo Spurio
  2020-01-25  0:55 ` [Intel-gfx] [RFC 6/6] drm/i915/guc: Stop inheriting from execlists_set_default_submission Daniele Ceraolo Spurio
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Daniele Ceraolo Spurio @ 2020-01-25  0:55 UTC (permalink / raw)
  To: intel-gfx

The code is still very similar to the execlists side, with some
simplifications, but it will diverge soon once we start adding the async
communication with the GuC.

Most of the code has been duplicated, with the exception of the HW
objects (ce->state, ring) and the lrc contents, because these map to HW
functionality and are independent of the submission method.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_lrc.c           | 179 ++++++++++--------
 drivers/gpu/drm/i915/gt/intel_lrc.h           |  10 +-
 drivers/gpu/drm/i915/gt/selftest_lrc.c        |  12 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  97 +++++++++-
 4 files changed, 211 insertions(+), 87 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index cd15ab7fb82f..3178aa38deec 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -230,14 +230,11 @@ static struct virtual_engine *to_virtual_engine(struct intel_engine_cs *engine)
 static int __execlists_context_alloc(struct intel_context *ce,
 				     struct intel_engine_cs *engine);
 
-static void execlists_init_reg_state(u32 *reg_state,
-				     const struct intel_context *ce,
-				     const struct intel_engine_cs *engine,
-				     const struct intel_ring *ring,
-				     bool close);
-static void
-__execlists_update_reg_state(const struct intel_context *ce,
-			     const struct intel_engine_cs *engine);
+static void lr_context_init_reg_state(u32 *reg_state,
+				      const struct intel_context *ce,
+				      const struct intel_engine_cs *engine,
+				      const struct intel_ring *ring,
+				      bool close);
 
 static void mark_eio(struct i915_request *rq)
 {
@@ -457,10 +454,9 @@ assert_priority_queue(const struct i915_request *prev,
  * engine info, SW context ID and SW counter need to form a unique number
  * (Context ID) per lrc.
  */
-static u64
-lrc_descriptor(struct intel_context *ce, struct intel_engine_cs *engine)
+u32 intel_lrc_descriptor_default_hw_flags(struct intel_context *ce)
 {
-	u64 desc;
+	u32 desc;
 
 	desc = INTEL_LEGACY_32B_CONTEXT;
 	if (i915_vm_is_4lvl(ce->vm))
@@ -468,10 +464,20 @@ lrc_descriptor(struct intel_context *ce, struct intel_engine_cs *engine)
 	desc <<= GEN8_CTX_ADDRESSING_MODE_SHIFT;
 
 	desc |= GEN8_CTX_VALID | GEN8_CTX_PRIVILEGE;
+
+	desc |= i915_ggtt_offset(ce->state); /* bits 12-31 */
+
+	return desc;
+}
+
+static u64
+lrc_descriptor(struct intel_context *ce, struct intel_engine_cs *engine)
+{
+	u64 desc = intel_lrc_descriptor_default_hw_flags(ce);
+
 	if (IS_GEN(engine->i915, 8))
 		desc |= GEN8_CTX_L3LLC_COHERENT;
 
-	desc |= i915_ggtt_offset(ce->state); /* bits 12-31 */
 	/*
 	 * The following 32bits are copied into the OA reports (dword 2).
 	 * Consider updating oa_get_render_ctx_id in i915_perf.c when changing
@@ -1154,7 +1160,7 @@ static void restore_default_state(struct intel_context *ce,
 		       engine->pinned_default_state + LRC_STATE_PN * PAGE_SIZE,
 		       engine->context_size - PAGE_SIZE);
 
-	execlists_init_reg_state(regs, ce, engine, ce->ring, false);
+	lr_context_init_reg_state(regs, ce, engine, ce->ring, false);
 }
 
 static void reset_active(struct i915_request *rq,
@@ -1191,7 +1197,7 @@ static void reset_active(struct i915_request *rq,
 
 	/* Scrub the context image to prevent replaying the previous batch */
 	restore_default_state(ce, engine);
-	__execlists_update_reg_state(ce, engine);
+	intel_lr_context_update_reg_state(ce, engine);
 
 	/* We've switched away, so this should be a no-op, but intent matters */
 	ce->lrc_desc |= CTX_DESC_FORCE_RESTORE;
@@ -2805,12 +2811,6 @@ static void execlists_submit_request(struct i915_request *request)
 	spin_unlock_irqrestore(&engine->active.lock, flags);
 }
 
-static void __execlists_context_fini(struct intel_context *ce)
-{
-	intel_ring_put(ce->ring);
-	i915_vma_put(ce->state);
-}
-
 static void execlists_context_destroy(struct kref *kref)
 {
 	struct intel_context *ce = container_of(kref, typeof(*ce), ref);
@@ -2819,7 +2819,7 @@ static void execlists_context_destroy(struct kref *kref)
 	GEM_BUG_ON(intel_context_is_pinned(ce));
 
 	if (ce->state)
-		__execlists_context_fini(ce);
+		intel_lr_context_objects_destroy(ce);
 
 	intel_context_fini(ce);
 	intel_context_free(ce);
@@ -2858,9 +2858,8 @@ static void execlists_context_unpin(struct intel_context *ce)
 	i915_gem_object_unpin_map(ce->state->obj);
 }
 
-static void
-__execlists_update_reg_state(const struct intel_context *ce,
-			     const struct intel_engine_cs *engine)
+void intel_lr_context_update_reg_state(const struct intel_context *ce,
+				       const struct intel_engine_cs *engine)
 {
 	struct intel_ring *ring = ce->ring;
 	u32 *regs = ce->lrc_reg_state;
@@ -2898,7 +2897,7 @@ __execlists_context_pin(struct intel_context *ce,
 
 	ce->lrc_desc = lrc_descriptor(ce, engine) | CTX_DESC_FORCE_RESTORE;
 	ce->lrc_reg_state = vaddr + LRC_STATE_PN * PAGE_SIZE;
-	__execlists_update_reg_state(ce, engine);
+	intel_lr_context_update_reg_state(ce, engine);
 
 	return 0;
 }
@@ -2915,32 +2914,7 @@ static int execlists_context_alloc(struct intel_context *ce)
 
 static void execlists_context_reset(struct intel_context *ce)
 {
-	CE_TRACE(ce, "reset\n");
-	GEM_BUG_ON(!intel_context_is_pinned(ce));
-
-	/*
-	 * Because we emit WA_TAIL_DWORDS there may be a disparity
-	 * between our bookkeeping in ce->ring->head and ce->ring->tail and
-	 * that stored in context. As we only write new commands from
-	 * ce->ring->tail onwards, everything before that is junk. If the GPU
-	 * starts reading from its RING_HEAD from the context, it may try to
-	 * execute that junk and die.
-	 *
-	 * The contexts that are stilled pinned on resume belong to the
-	 * kernel, and are local to each engine. All other contexts will
-	 * have their head/tail sanitized upon pinning before use, so they
-	 * will never see garbage,
-	 *
-	 * So to avoid that we reset the context images upon resume. For
-	 * simplicity, we just zero everything out.
-	 */
-	intel_ring_reset(ce->ring, ce->ring->emit);
-
-	/* Scrub away the garbage */
-	execlists_init_reg_state(ce->lrc_reg_state,
-				 ce, ce->engine, ce->ring, true);
-	__execlists_update_reg_state(ce, ce->engine);
-
+	intel_lr_context_objects_state_reset(ce);
 	ce->lrc_desc |= CTX_DESC_FORCE_RESTORE;
 }
 
@@ -3591,7 +3565,7 @@ static void __execlists_reset(struct intel_engine_cs *engine, bool stalled)
 		     ce->ring->head, ce->ring->tail);
 	intel_ring_update_space(ce->ring);
 	__execlists_reset_reg_state(ce, engine);
-	__execlists_update_reg_state(ce, engine);
+	intel_lr_context_update_reg_state(ce, engine);
 	ce->lrc_desc |= CTX_DESC_FORCE_RESTORE; /* paranoid: GPU was reset! */
 
 unwind:
@@ -4322,7 +4296,7 @@ static void rcs_submission_override(struct intel_engine_cs *engine)
 	}
 }
 
-void intel_execlists_submission_vfuncs(struct intel_engine_cs *engine)
+static void execlists_submission_vfuncs(struct intel_engine_cs *engine)
 {
 	logical_ring_default_vfuncs(engine);
 
@@ -4342,7 +4316,7 @@ int intel_execlists_submission_setup(struct intel_engine_cs *engine)
 	timer_setup(&engine->execlists.timer, execlists_timeslice, 0);
 	timer_setup(&engine->execlists.preempt, execlists_preempt, 0);
 
-	intel_execlists_submission_vfuncs(engine);
+	execlists_submission_vfuncs(engine);
 	logical_ring_default_irqs(engine);
 
 	intel_logical_ring_init_workaround_bb(engine);
@@ -4478,11 +4452,11 @@ static struct i915_ppgtt *vm_alias(struct i915_address_space *vm)
 		return i915_vm_to_ppgtt(vm);
 }
 
-static void execlists_init_reg_state(u32 *regs,
-				     const struct intel_context *ce,
-				     const struct intel_engine_cs *engine,
-				     const struct intel_ring *ring,
-				     bool inhibit)
+static void lr_context_init_reg_state(u32 *regs,
+				      const struct intel_context *ce,
+				      const struct intel_engine_cs *engine,
+				      const struct intel_ring *ring,
+				      bool inhibit)
 {
 	/*
 	 * A context is actually a big batch buffer with several
@@ -4544,8 +4518,8 @@ populate_lr_context(struct intel_context *ce,
 
 	/* The second page of the context object contains some fields which must
 	 * be set up prior to the first execution. */
-	execlists_init_reg_state(vaddr + LRC_STATE_PN * PAGE_SIZE,
-				 ce, engine, ring, inhibit);
+	lr_context_init_reg_state(vaddr + LRC_STATE_PN * PAGE_SIZE,
+				  ce, engine, ring, inhibit);
 
 	ret = 0;
 err_unpin_ctx:
@@ -4554,8 +4528,8 @@ populate_lr_context(struct intel_context *ce,
 	return ret;
 }
 
-static int __execlists_context_alloc(struct intel_context *ce,
-				     struct intel_engine_cs *engine)
+int intel_lr_context_objects_create(struct intel_context *ce,
+				    struct intel_engine_cs *engine)
 {
 	struct drm_i915_gem_object *ctx_obj;
 	struct intel_ring *ring;
@@ -4579,18 +4553,6 @@ static int __execlists_context_alloc(struct intel_context *ce,
 		goto error_deref_obj;
 	}
 
-	if (!ce->timeline) {
-		struct intel_timeline *tl;
-
-		tl = intel_timeline_create(engine->gt, NULL);
-		if (IS_ERR(tl)) {
-			ret = PTR_ERR(tl);
-			goto error_deref_obj;
-		}
-
-		ce->timeline = tl;
-	}
-
 	ring = intel_engine_create_ring(engine, (unsigned long)ce->ring);
 	if (IS_ERR(ring)) {
 		ret = PTR_ERR(ring);
@@ -4615,6 +4577,69 @@ static int __execlists_context_alloc(struct intel_context *ce,
 	return ret;
 }
 
+void intel_lr_context_objects_destroy(struct intel_context *ce)
+{
+	intel_ring_put(ce->ring);
+	i915_vma_put(ce->state);
+}
+
+void intel_lr_context_objects_state_reset(struct intel_context *ce)
+{
+	CE_TRACE(ce, "reset\n");
+	GEM_BUG_ON(!intel_context_is_pinned(ce));
+
+	/*
+	 * Because we emit WA_TAIL_DWORDS there may be a disparity
+	 * between our bookkeeping in ce->ring->head and ce->ring->tail and
+	 * that stored in context. As we only write new commands from
+	 * ce->ring->tail onwards, everything before that is junk. If the GPU
+	 * starts reading from its RING_HEAD from the context, it may try to
+	 * execute that junk and die.
+	 *
+	 * The contexts that are stilled pinned on resume belong to the
+	 * kernel, and are local to each engine. All other contexts will
+	 * have their head/tail sanitized upon pinning before use, so they
+	 * will never see garbage,
+	 *
+	 * So to avoid that we reset the context images upon resume. For
+	 * simplicity, we just zero everything out.
+	 */
+	intel_ring_reset(ce->ring, ce->ring->emit);
+
+	/* Scrub away the garbage */
+	lr_context_init_reg_state(ce->lrc_reg_state,
+				  ce, ce->engine, ce->ring, true);
+	intel_lr_context_update_reg_state(ce, ce->engine);
+}
+
+static int __execlists_context_alloc(struct intel_context *ce,
+				     struct intel_engine_cs *engine)
+{
+	int ret;
+
+	ret = intel_lr_context_objects_create(ce, engine);
+	if (ret)
+		return ret;
+
+	if (!ce->timeline) {
+		struct intel_timeline *tl;
+
+		tl = intel_timeline_create(engine->gt, NULL);
+		if (IS_ERR(tl)) {
+			ret = PTR_ERR(tl);
+			goto error_destroy;
+		}
+
+		ce->timeline = tl;
+	}
+
+	return 0;
+
+error_destroy:
+	intel_lr_context_objects_destroy(ce);
+	return ret;
+}
+
 static struct list_head *virtual_queue(struct virtual_engine *ve)
 {
 	return &ve->base.execlists.default_priolist.requests[0];
@@ -4649,7 +4674,7 @@ static void virtual_context_destroy(struct kref *kref)
 	GEM_BUG_ON(__tasklet_is_scheduled(&ve->base.execlists.tasklet));
 
 	if (ve->context.state)
-		__execlists_context_fini(&ve->context);
+		intel_lr_context_objects_destroy(&ve->context);
 	intel_context_fini(&ve->context);
 
 	kfree(ve->bonds);
@@ -5234,7 +5259,7 @@ void intel_lr_context_reset(struct intel_engine_cs *engine,
 	ce->ring->head = head;
 	intel_ring_update_space(ce->ring);
 
-	__execlists_update_reg_state(ce, engine);
+	intel_lr_context_update_reg_state(ce, engine);
 }
 
 bool
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.h b/drivers/gpu/drm/i915/gt/intel_lrc.h
index c5afae6bc89b..17cabe4b9898 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.h
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.h
@@ -85,7 +85,6 @@ void intel_logical_ring_fini_workaround_bb(struct intel_engine_cs *engine);
 void intel_logical_ring_cleanup(struct intel_engine_cs *engine);
 
 int intel_execlists_submission_setup(struct intel_engine_cs *engine);
-void intel_execlists_submission_vfuncs(struct intel_engine_cs *engine);
 
 int gen8_emit_flush(struct i915_request *request, u32 mode);
 int gen11_emit_flush_render(struct i915_request *request, u32 mode);
@@ -104,6 +103,15 @@ int gen12_emit_flush_render(struct i915_request *request, u32 mode);
 
 void intel_execlists_set_default_submission(struct intel_engine_cs *engine);
 
+int intel_lr_context_objects_create(struct intel_context *ce,
+				    struct intel_engine_cs *engine);
+void intel_lr_context_objects_destroy(struct intel_context *ce);
+void intel_lr_context_objects_state_reset(struct intel_context *ce);
+
+u32 intel_lrc_descriptor_default_hw_flags(struct intel_context *ce);
+
+void intel_lr_context_update_reg_state(const struct intel_context *ce,
+				       const struct intel_engine_cs *engine);
 void intel_lr_context_reset(struct intel_engine_cs *engine,
 			    struct intel_context *ce,
 			    u32 head,
diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c
index 753cb4959246..57f0ee15795c 100644
--- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
+++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
@@ -186,7 +186,7 @@ static int live_unlite_restore(struct intel_gt *gt, int prio)
 		}
 		GEM_BUG_ON(!ce[1]->ring->size);
 		intel_ring_reset(ce[1]->ring, ce[1]->ring->size / 2);
-		__execlists_update_reg_state(ce[1], engine);
+		intel_lr_context_update_reg_state(ce[1], engine);
 
 		rq[0] = igt_spinner_create_request(&spin, ce[0], MI_ARB_CHECK);
 		if (IS_ERR(rq[0])) {
@@ -3664,11 +3664,11 @@ static int live_lrc_layout(void *arg)
 		}
 		hw += LRC_STATE_PN * PAGE_SIZE / sizeof(*hw);
 
-		execlists_init_reg_state(memset(lrc, POISON_INUSE, PAGE_SIZE),
-					 engine->kernel_context,
-					 engine,
-					 engine->kernel_context->ring,
-					 true);
+		lr_context_init_reg_state(memset(lrc, POISON_INUSE, PAGE_SIZE),
+					  engine->kernel_context,
+					  engine,
+					  engine->kernel_context->ring,
+					  true);
 
 		dw = 0;
 		do {
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index edefb3cf135f..03e0a8180f77 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -348,6 +348,99 @@ static void __guc_dequeue(struct intel_engine_cs *engine)
 	execlists->active = execlists->inflight;
 }
 
+static int guc_submission_context_alloc(struct intel_context *ce)
+{
+	struct intel_engine_cs *engine = ce->engine;
+	int ret;
+
+	ret = intel_lr_context_objects_create(ce, engine);
+	if (ret)
+		return ret;
+
+	if (!ce->timeline) {
+		struct intel_timeline *tl;
+
+		tl = intel_timeline_create(engine->gt, NULL);
+		if (IS_ERR(tl)) {
+			ret = PTR_ERR(tl);
+			goto error_destroy;
+		}
+
+		ce->timeline = tl;
+	}
+
+	return 0;
+
+error_destroy:
+	intel_lr_context_objects_destroy(ce);
+	return ret;
+}
+
+static void guc_submission_context_destroy(struct kref *kref)
+{
+	struct intel_context *ce = container_of(kref, typeof(*ce), ref);
+
+	GEM_BUG_ON(!i915_active_is_idle(&ce->active));
+	GEM_BUG_ON(intel_context_is_pinned(ce));
+
+	if (ce->state)
+		intel_lr_context_objects_destroy(ce);
+
+	intel_context_fini(ce);
+	intel_context_free(ce);
+}
+
+static void guc_submission_context_reset(struct intel_context *ce)
+{
+	intel_lr_context_objects_state_reset(ce);
+}
+
+static int guc_submission_context_pin(struct intel_context *ce)
+{
+	struct intel_engine_cs *engine = ce->engine;
+	void *vaddr;
+
+	GEM_BUG_ON(!ce->state);
+	GEM_BUG_ON(!i915_vma_is_pinned(ce->state));
+
+	vaddr = i915_gem_object_pin_map(ce->state->obj,
+					i915_coherent_map_type(engine->i915) |
+					I915_MAP_OVERRIDE);
+	if (IS_ERR(vaddr))
+		return PTR_ERR(vaddr);
+
+	/* GuC owns the ctx ID */
+	ce->lrc_desc = intel_lrc_descriptor_default_hw_flags(ce);
+	ce->lrc_reg_state = vaddr + LRC_STATE_PN * PAGE_SIZE;
+	intel_lr_context_update_reg_state(ce, engine);
+
+	/* TODO: talk to the GuC! */
+
+	return 0;
+}
+
+static void guc_submission_context_unpin(struct intel_context *ce)
+{
+	/* TODO: check redzone */
+
+	i915_gem_object_unpin_map(ce->state->obj);
+
+	/* TODO: talk to the GuC! */
+}
+
+static const struct intel_context_ops guc_context_ops = {
+	.alloc = guc_submission_context_alloc,
+
+	.pin = guc_submission_context_pin,
+	.unpin = guc_submission_context_unpin,
+
+	.enter = intel_context_enter_engine,
+	.exit = intel_context_exit_engine,
+
+	.reset = guc_submission_context_reset,
+	.destroy = guc_submission_context_destroy,
+};
+
 static void guc_submission_tasklet(unsigned long data)
 {
 	struct intel_engine_cs * const engine = (struct intel_engine_cs *)data;
@@ -774,6 +867,7 @@ static void guc_submission_default_vfuncs(struct intel_engine_cs *engine)
 
 	engine->resume = guc_submission_resume;
 
+	engine->cops = &guc_context_ops;
 	engine->request_alloc = guc_submission_request_alloc;
 
 	engine->emit_flush = gen8_emit_flush;
@@ -809,9 +903,6 @@ static void rcs_submission_override(struct intel_engine_cs *engine)
 
 void guc_submission_vfuncs(struct intel_engine_cs *engine)
 {
-	/* XXX: still using base execlists vfuncs and overriding some of them */
-	intel_execlists_submission_vfuncs(engine);
-
 	guc_submission_default_vfuncs(engine);
 
 	if (engine->class == RENDER_CLASS)
-- 
2.24.1


* [Intel-gfx] [RFC 6/6] drm/i915/guc: Stop inheriting from execlists_set_default_submission
  2020-01-25  0:55 [Intel-gfx] [RFC 0/6] Start separating GuC and execlists submission Daniele Ceraolo Spurio
                   ` (4 preceding siblings ...)
  2020-01-25  0:55 ` [Intel-gfx] [RFC 5/6] drm/i915/guc: Add initial context ops for GuC submission Daniele Ceraolo Spurio
@ 2020-01-25  0:55 ` Daniele Ceraolo Spurio
  2020-01-25  1:45 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for Start separating GuC and execlists submission Patchwork
  2020-01-29  0:49 ` [Intel-gfx] [RFC 0/6] " Matthew Brost
  7 siblings, 0 replies; 10+ messages in thread
From: Daniele Ceraolo Spurio @ 2020-01-25  0:55 UTC (permalink / raw)
  To: intel-gfx

Copy the submit_request function (with a small simplification) and set
the engine flags independently. No preemption or semaphore support yet
on the GuC side.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_lrc.c           | 30 +++-----
 drivers/gpu/drm/i915/gt/intel_lrc.h           |  5 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 72 +++++++++++++++----
 3 files changed, 70 insertions(+), 37 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 3178aa38deec..b730d6593da0 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -2756,19 +2756,6 @@ static void queue_request(struct intel_engine_cs *engine,
 	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
 }
 
-static void __submit_queue_imm(struct intel_engine_cs *engine)
-{
-	struct intel_engine_execlists * const execlists = &engine->execlists;
-
-	if (reset_in_progress(execlists))
-		return; /* defer until we restart the engine following reset */
-
-	if (execlists->tasklet.func == execlists_submission_tasklet)
-		__execlists_submission_tasklet(engine);
-	else
-		tasklet_hi_schedule(&execlists->tasklet);
-}
-
 static void submit_queue(struct intel_engine_cs *engine,
 			 const struct i915_request *rq)
 {
@@ -2778,7 +2765,10 @@ static void submit_queue(struct intel_engine_cs *engine,
 		return;
 
 	execlists->queue_priority_hint = rq_prio(rq);
-	__submit_queue_imm(engine);
+
+	/* if reset in progress, defer until we restart the engine */
+	if (!reset_in_progress(execlists))
+		__execlists_submission_tasklet(engine);
 }
 
 static bool ancestor_on_hold(const struct intel_engine_cs *engine,
@@ -3694,9 +3684,9 @@ static void execlists_reset_finish(struct intel_engine_cs *engine)
 		     atomic_read(&execlists->tasklet.count));
 }
 
-static int gen8_emit_bb_start_noarb(struct i915_request *rq,
-				    u64 offset, u32 len,
-				    const unsigned int flags)
+int gen8_emit_bb_start_noarb(struct i915_request *rq,
+			     u64 offset, u32 len,
+			     const unsigned int flags)
 {
 	u32 *cs;
 
@@ -4180,7 +4170,7 @@ static void execlists_park(struct intel_engine_cs *engine)
 	cancel_timer(&engine->execlists.preempt);
 }
 
-void intel_execlists_set_default_submission(struct intel_engine_cs *engine)
+static void execlists_set_default_submission(struct intel_engine_cs *engine)
 {
 	engine->submit_request = execlists_submit_request;
 	engine->schedule = i915_schedule;
@@ -4242,7 +4232,7 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
 	if (INTEL_GEN(engine->i915) >= 12)
 		engine->emit_fini_breadcrumb = gen12_emit_fini_breadcrumb;
 
-	engine->set_default_submission = intel_execlists_set_default_submission;
+	engine->set_default_submission = execlists_set_default_submission;
 
 	if (INTEL_GEN(engine->i915) < 11) {
 		engine->irq_enable = gen8_logical_ring_enable_irq;
@@ -5266,7 +5256,7 @@ bool
 intel_engine_in_execlists_submission_mode(const struct intel_engine_cs *engine)
 {
 	return engine->set_default_submission ==
-	       intel_execlists_set_default_submission;
+	       execlists_set_default_submission;
 }
 
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.h b/drivers/gpu/drm/i915/gt/intel_lrc.h
index 17cabe4b9898..441cea5150d2 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.h
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.h
@@ -86,6 +86,9 @@ void intel_logical_ring_cleanup(struct intel_engine_cs *engine);
 
 int intel_execlists_submission_setup(struct intel_engine_cs *engine);
 
+int gen8_emit_bb_start_noarb(struct i915_request *rq,
+			     u64 offset, u32 len,
+			     const unsigned int flags);
 int gen8_emit_flush(struct i915_request *request, u32 mode);
 int gen11_emit_flush_render(struct i915_request *request, u32 mode);
 int gen12_emit_flush_render(struct i915_request *request, u32 mode);
@@ -101,8 +104,6 @@ int gen12_emit_flush_render(struct i915_request *request, u32 mode);
 #define LRC_PPHWSP_SCRATCH		0x34
 #define LRC_PPHWSP_SCRATCH_ADDR		(LRC_PPHWSP_SCRATCH * sizeof(u32))
 
-void intel_execlists_set_default_submission(struct intel_engine_cs *engine);
-
 int intel_lr_context_objects_create(struct intel_context *ce,
 				    struct intel_engine_cs *engine);
 void intel_lr_context_objects_destroy(struct intel_context *ce);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 03e0a8180f77..8e958dc9d624 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -441,6 +441,54 @@ static const struct intel_context_ops guc_context_ops = {
 	.destroy = guc_submission_context_destroy,
 };
 
+static inline bool
+reset_in_progress(const struct intel_engine_execlists *execlists)
+{
+	return unlikely(!__tasklet_is_enabled(&execlists->tasklet));
+}
+
+static void queue_request(struct intel_engine_cs *engine,
+			  struct i915_request *rq)
+{
+	GEM_BUG_ON(!list_empty(&rq->sched.link));
+	list_add_tail(&rq->sched.link,
+		      i915_sched_lookup_priolist(engine, rq_prio(rq)));
+	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
+}
+
+static void submit_queue(struct intel_engine_cs *engine,
+			 const struct i915_request *rq)
+{
+	struct intel_engine_execlists *execlists = &engine->execlists;
+
+	if (rq_prio(rq) <= execlists->queue_priority_hint)
+		return;
+
+	execlists->queue_priority_hint = rq_prio(rq);
+
+	/* if reset in progress, defer until we restart the engine */
+	if (!reset_in_progress(execlists))
+		tasklet_hi_schedule(&execlists->tasklet);
+}
+
+static void guc_submit_request(struct i915_request *request)
+{
+	struct intel_engine_cs *engine = request->engine;
+	unsigned long flags;
+
+	/* Will be called from irq-context when using foreign fences. */
+	spin_lock_irqsave(&engine->active.lock, flags);
+
+	queue_request(engine, request);
+
+	GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root));
+	GEM_BUG_ON(list_empty(&request->sched.link));
+
+	submit_queue(engine, request);
+
+	spin_unlock_irqrestore(&engine->active.lock, flags);
+}
+
 static void guc_submission_tasklet(unsigned long data)
 {
 	struct intel_engine_cs * const engine = (struct intel_engine_cs *)data;
@@ -834,22 +882,10 @@ static void guc_interrupts_release(struct intel_gt *gt)
 
 static void guc_set_default_submission(struct intel_engine_cs *engine)
 {
-	/*
-	 * We inherit a bunch of functions from execlists that we'd like
-	 * to keep using:
-	 *
-	 *    engine->submit_request = execlists_submit_request;
-	 *    engine->cancel_requests = execlists_cancel_requests;
-	 *    engine->schedule = execlists_schedule;
-	 *
-	 * But we need to override the actual submission backend in order
-	 * to talk to the GuC.
-	 */
-	intel_execlists_set_default_submission(engine);
-
+	engine->submit_request = guc_submit_request;
+	engine->schedule = i915_schedule;
 	engine->execlists.tasklet.func = guc_submission_tasklet;
 
-	/* do not use execlists park/unpark */
 	engine->park = engine->unpark = NULL;
 
 	engine->reset.prepare = guc_reset_prepare;
@@ -857,7 +893,13 @@ static void guc_set_default_submission(struct intel_engine_cs *engine)
 	engine->reset.cancel = guc_reset_cancel;
 	engine->reset.finish = guc_reset_finish;
 
-	engine->flags &= ~I915_ENGINE_SUPPORTS_STATS;
+	if (INTEL_GEN(engine->i915) >= 12)
+		engine->flags |= I915_ENGINE_HAS_RELATIVE_MMIO;
+
+	/* no preempt support yet, so always use the noarb variant of bb_start */
+	GEM_BUG_ON(intel_engine_has_preemption(engine));
+	engine->emit_bb_start = gen8_emit_bb_start_noarb;
+
 	engine->flags |= I915_ENGINE_NEEDS_BREADCRUMB_TASKLET;
 }
 
-- 
2.24.1


* [Intel-gfx] ✗ Fi.CI.BUILD: failure for Start separating GuC and execlists submission
  2020-01-25  0:55 [Intel-gfx] [RFC 0/6] Start separating GuC and execlists submission Daniele Ceraolo Spurio
                   ` (5 preceding siblings ...)
  2020-01-25  0:55 ` [Intel-gfx] [RFC 6/6] drm/i915/guc: Stop inheriting from execlists_set_default_submission Daniele Ceraolo Spurio
@ 2020-01-25  1:45 ` Patchwork
  2020-01-29  0:49 ` [Intel-gfx] [RFC 0/6] " Matthew Brost
  7 siblings, 0 replies; 10+ messages in thread
From: Patchwork @ 2020-01-25  1:45 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio; +Cc: intel-gfx

== Series Details ==

Series: Start separating GuC and execlists submission
URL   : https://patchwork.freedesktop.org/series/72562/
State : failure

== Summary ==

Applying: drm/i915/guc: Add guc-specific breadcrumb functions
error: sha1 information is lacking or useless (drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c).
error: could not build fake ancestor
hint: Use 'git am --show-current-patch' to see the failed patch
Patch failed at 0001 drm/i915/guc: Add guc-specific breadcrumb functions
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".


* Re: [Intel-gfx] [RFC 0/6] Start separating GuC and execlists submission
  2020-01-25  0:55 [Intel-gfx] [RFC 0/6] Start separating GuC and execlists submission Daniele Ceraolo Spurio
                   ` (6 preceding siblings ...)
  2020-01-25  1:45 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for Start separating GuC and execlists submission Patchwork
@ 2020-01-29  0:49 ` Matthew Brost
  2020-01-29  1:59   ` Daniele Ceraolo Spurio
  7 siblings, 1 reply; 10+ messages in thread
From: Matthew Brost @ 2020-01-29  0:49 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio; +Cc: intel-gfx

On Fri, Jan 24, 2020 at 04:55:31PM -0800, Daniele Ceraolo Spurio wrote:
>Note: this applies on top of my series to commit early to GuC [1].
>
>Picking up from the feedback from my RFC series about splitting
>up intel_lrc.c [2], this series introduces GuC submission versions
>of most of the engine and context functions. As a starting point,
>the functions are still very similar to the execlists ones, but
>they will progressively diverge while we add the new submission
>logic. Some of the functions have been simplified by removing
>support for pre-gen11 cases as we only aim to enable GuC submission
>for gen11+; I've left comments and BUG_ONs to mark the reduced
>support spots to easily find them in case we ever want to enable
>GuC submission for older gens.
>
>Going slightly against the feedback from the previous series, I have
>kept the basic context and ring object management shared between
>execlists and GuC submission functions. The rationale is that those
>objects are HW-related and therefore their creation and their main
>attributes (e.g. context size) are not dependent on the submission
>method in any way. Handling of more SW constructs, like the timeline,
>has been duplicated.
>
>Still on the theme of sharing, the flush and bb_start functions have also
>been re-used on the GuC side, but I'm not sure if keeping them in
>intel_lrc.c is the best solution. My medium-term plan is still the same
>as [2], i.e. split execlists_submission.c out of intel_lrc.c, with the
>latter holding the common code, but it might be worth moving the
>command emission to a dedicated file or to a header as inlines, like
>we already do in some cases.
>

I can't say I'm thrilled about the amount of code duplicated in this
series, e.g. 95% of the breadcrumb code is exactly the same,
request_alloc is a copy and paste, and the same goes for resume and the
context operations. Surely there is a way to share the code that is
common between the GuC and execlists paths? I like the idea of putting
all of this common code in a shared file and then having
intel_execlist_submission.c & intel_guc_submission.c call into the
common functions while retaining their own set of unique function
pointers.
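
Roughly what I'm picturing, just to illustrate the idea (all of the
function / file names below are made up):

/* shared implementation kept in intel_lrc.c or a new common file */
u32 *intel_lr_emit_fini_breadcrumb_xcs(struct i915_request *rq, u32 *cs);

/* intel_guc_submission.c */
static u32 *guc_emit_fini_breadcrumb_xcs(struct i915_request *rq, u32 *cs)
{
	/* GuC-specific tweaks (if any) would go here */
	return intel_lr_emit_fini_breadcrumb_xcs(rq, cs);
}

static void guc_default_vfuncs(struct intel_engine_cs *engine)
{
	engine->emit_fini_breadcrumb = guc_emit_fini_breadcrumb_xcs;
}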

Matt

>The code is still a bit rough and the series has been compile-tested
>only. I wanted to get some early comments about the direction before
>rebasing the rest of the GuC code on top of it and starting to test.
>
>[1] https://patchwork.freedesktop.org/series/72031/
>[2] https://patchwork.freedesktop.org/series/70787/
>
>Cc: Chris Wilson <chris@chris-wilson.co.uk>
>Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
>Cc: John Harrison <John.C.Harrison@Intel.com>
>Cc: Matthew Brost <matthew.brost@intel.com>
>
>Daniele Ceraolo Spurio (6):
>  drm/i915/guc: Add guc-specific breadcrumb functions
>  drm/i915/guc: Add request_alloc for guc_submission
>  drm/i915/guc: Add engine->resume for GuC submission
>  drm/i915/guc: Re-use lrc flush functions
>  drm/i915/guc: Add initial context ops for GuC submission
>  drm/i915/guc: Stop inheriting from execlists_set_default_submission
>
> drivers/gpu/drm/i915/gt/intel_lrc.c           | 217 +++++-----
> drivers/gpu/drm/i915/gt/intel_lrc.h           |  17 +-
> drivers/gpu/drm/i915/gt/selftest_lrc.c        |  12 +-
> .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 378 +++++++++++++++++-
> 4 files changed, 496 insertions(+), 128 deletions(-)
>
>-- 
>2.24.1
>

* Re: [Intel-gfx] [RFC 0/6] Start separating GuC and execlists submission
  2020-01-29  0:49 ` [Intel-gfx] [RFC 0/6] " Matthew Brost
@ 2020-01-29  1:59   ` Daniele Ceraolo Spurio
  0 siblings, 0 replies; 10+ messages in thread
From: Daniele Ceraolo Spurio @ 2020-01-29  1:59 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-gfx



On 1/28/20 4:49 PM, Matthew Brost wrote:
> On Fri, Jan 24, 2020 at 04:55:31PM -0800, Daniele Ceraolo Spurio wrote:
>> Note: this applies on top of my series to commit early to GuC [1].
>>
>> Picking up from the feedback from my RFC series about splitting
>> up intel_lrc.c [2], this series introduces GuC submission versions
>> of most of the engine and context functions. As a starting point,
>> the functions are still very similar to the execlists ones, but
>> they will progressively diverge while we add the new submission
>> logic. Some of the functions have been simplified by removing
>> support for pre-gen11 cases as we only aim to enable GuC submission
>> for gen11+; I've left comments and BUG_ONs to mark the reduced
>> support spots to easily find them in case we ever want to enable
>> GuC submission for older gens.
>>
>> Going slightly against the feedback from the previous series, I have
>> kept the basic context and ring object management shared between
>> execlists and GuC submission functions. The rationale is that those
>> objects are HW-related and therefore their creation and their main
>> attributes (e.g. context size) are not dependent on the submission
>> method in any way. Handling of more SW constructs, like the timeline,
>> has been duplicated.
>>
>> Still on the theme of sharing, the flush and bb_start functions have also
>> been re-used on the GuC side, but I'm not sure if keeping them in
>> intel_lrc.c is the best solution. My medium-term plan is still the same
>> as [2], i.e. split execlists_submission.c out of intel_lrc.c, with the
>> latter holding the common code, but it might be worth moving the
>> command emission to a dedicated file or to a header as inlines, like
>> we already do in some cases.
>>
> 
> I can't say I'm thrilled about the amount of code duplicated in this
> series, e.g. 95% of the breadcrumb code is exactly the same,
> request_alloc is a copy and paste, and the same goes for resume and
> the context operations. Surely there is a way to share the code that
> is common between the GuC and execlists paths? I like the idea of

I do understand the concerns, as I started with the same idea you have 
(see my RFC for splitting intel_lrc.c), but after the feedback I received 
and after looking a bit more at the flows I've partially changed my mind, 
because not everything that makes sense to have in a common place today 
will stay the same going forward. E.g. we'll have to add a semaphore 
signal to the fini breadcrumb to signal completion to the GuC if we want 
the GuC to handle the dependencies for us, which will impact both the 
breadcrumb code itself and the size used in request_alloc.

That said, I do agree that there is certainly an opportunity to move more 
code to be common than what is done in this series, but IMO we need to 
wait for all the bits to be in place to have a clearer picture of what 
the full flow is going to look like. We did go through something similar, 
on a bigger scale, when we duplicated everything from ringbuffer to 
execlists submission and then gradually merged it back together where 
possible.

Also, consider that some functions are back-end level, so IMO we need to 
compare them with their ringbuffer versions as well.
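
Just to illustrate the kind of divergence I expect on the breadcrumb 
side, something along these lines (purely a sketch: the 
guc_signal_offset() helper and the exact commands are made up, the real 
interface is TBD):

static u32 *emit_fini_breadcrumb_xcs(struct i915_request *request, u32 *cs)
{
	cs = gen8_emit_ggtt_write(cs,
				  request->fence.seqno,
				  i915_request_active_timeline(request)->hwsp_offset,
				  0);

	/* hypothetical extra dwords to signal completion to the GuC */
	cs = gen8_emit_ggtt_write(cs,
				  request->fence.seqno,
				  guc_signal_offset(request), /* made-up helper */
				  0);

	return emit_fini_breadcrumb_footer(request, cs);
}

with the extra space accounted for in the request_alloc reservation.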

Daniele

> putting all of this common code in a shared file and then having
> intel_execlist_submission.c & intel_guc_submission.c call into the
> common functions while retaining their own set of unique function
> pointers.
> 
> Matt
> 
>> The code is still a bit rough and the series has been compile-tested
>> only. I wanted to get some early comments about the direction before
>> rebasing the rest of the GuC code on top of it and starting to test.
>>
>> [1] https://patchwork.freedesktop.org/series/72031/
>> [2] https://patchwork.freedesktop.org/series/70787/
>>
>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
>> Cc: John Harrison <John.C.Harrison@Intel.com>
>> Cc: Matthew Brost <matthew.brost@intel.com>
>>
>> Daniele Ceraolo Spurio (6):
>>  drm/i915/guc: Add guc-specific breadcrumb functions
>>  drm/i915/guc: Add request_alloc for guc_submission
>>  drm/i915/guc: Add engine->resume for GuC submission
>>  drm/i915/guc: Re-use lrc flush functions
>>  drm/i915/guc: Add initial context ops for GuC submission
>>  drm/i915/guc: Stop inheriting from execlists_set_default_submission
>>
>> drivers/gpu/drm/i915/gt/intel_lrc.c           | 217 +++++-----
>> drivers/gpu/drm/i915/gt/intel_lrc.h           |  17 +-
>> drivers/gpu/drm/i915/gt/selftest_lrc.c        |  12 +-
>> .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 378 +++++++++++++++++-
>> 4 files changed, 496 insertions(+), 128 deletions(-)
>>
>> -- 
>> 2.24.1
>>

end of thread, other threads:[~2020-01-29  2:00 UTC | newest]

Thread overview: 10+ messages
2020-01-25  0:55 [Intel-gfx] [RFC 0/6] Start separating GuC and execlists submission Daniele Ceraolo Spurio
2020-01-25  0:55 ` [Intel-gfx] [RFC 1/6] drm/i915/guc: Add guc-specific breadcrumb functions Daniele Ceraolo Spurio
2020-01-25  0:55 ` [Intel-gfx] [RFC 2/6] drm/i915/guc: Add request_alloc for guc_submission Daniele Ceraolo Spurio
2020-01-25  0:55 ` [Intel-gfx] [RFC 3/6] drm/i915/guc: Add engine->resume for GuC submission Daniele Ceraolo Spurio
2020-01-25  0:55 ` [Intel-gfx] [RFC 4/6] drm/i915/guc: Re-use lrc flush functions Daniele Ceraolo Spurio
2020-01-25  0:55 ` [Intel-gfx] [RFC 5/6] drm/i915/guc: Add initial context ops for GuC submission Daniele Ceraolo Spurio
2020-01-25  0:55 ` [Intel-gfx] [RFC 6/6] drm/i915/guc: Stop inheriting from execlists_set_default_submission Daniele Ceraolo Spurio
2020-01-25  1:45 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for Start separating GuC and execlists submission Patchwork
2020-01-29  0:49 ` [Intel-gfx] [RFC 0/6] " Matthew Brost
2020-01-29  1:59   ` Daniele Ceraolo Spurio
