dri-devel.lists.freedesktop.org archive mirror
* [RFC PATCH 00/97] Basic GuC submission support in the i915
@ 2021-05-06 19:13 Matthew Brost
  2021-05-06 19:13 ` [RFC PATCH 01/97] drm/i915/gt: Move engine setup out of set_default_submission Matthew Brost
                   ` (99 more replies)
  0 siblings, 100 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:13 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

Basic GuC submission support. This is the first bullet point in the
upstreaming plan covered in the following RFC [1].

At a very high level, the GuC is a piece of firmware that sits between
the i915 driver and the GPU. It offloads some context scheduling from
the i915 and programs the GPU to submit contexts: the i915 communicates
with the GuC, and the GuC in turn communicates with the GPU.

GuC submission will be disabled by default on all current upstream
platforms, gated behind the enable_guc module parameter. A value of 3
enables both GuC submission and HuC loading via the GuC. GuC submission
should work on all gen11+ platforms, assuming the GuC firmware is
present.
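
For reference, a rough sketch of how this would typically be turned on
(assuming the GuC/HuC firmware blobs are installed; exact syntax depends
on whether i915 is built as a module or built in):

  # i915 built as a module: set the parameter at load time
  modprobe i915 enable_guc=3

  # i915 built in: pass the parameter on the kernel command line
  i915.enable_guc=3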

This is a huge series, and it is completely unrealistic to merge all of
these patches at once. Fortunately, I believe we can break the series
down into several separate merges:

1. Merge Chris Wilson's patches. These have already been reviewed
upstream, and I fully agree with them as a precursor to GuC submission.

2. Update to GuC 60.1.2. These are largely Michal's patches.

3. Turn on GuC/HuC auto mode by default.

4. Additional patches needed to support GuC submission. This is any
patch in the first 34 not covered by items 1-3, e.g. 'Engine relative
MMIO'.

5. GuC submission support. Patches 35 and onward. These do not all have
to merge at once, as GuC submission is not actually enabled until the
last patch of this series.

[1] https://patchwork.freedesktop.org/patch/432206/?series=89840&rev=1

Signed-off-by: Matthew Brost <matthew.brost@intel.com>

Chris Wilson (3):
  drm/i915/gt: Move engine setup out of set_default_submission
  drm/i915/gt: Move submission_method into intel_gt
  drm/i915/gt: Move CS interrupt handler to the backend

Daniele Ceraolo Spurio (6):
  drm/i915/guc: skip disabling CTBs before sanitizing the GuC
  drm/i915/guc: use probe_error log for CT enablement failure
  drm/i915/guc: enable only the user interrupt when using GuC submission
  drm/i915/uc: turn on GuC/HuC auto mode by default
  drm/i915/guc: Use guc_class instead of engine_class in fw interface
  drm/i915/guc: Unblock GuC submission on Gen11+

John Harrison (13):
  drm/i915/guc: Support per context scheduling policies
  drm/i915/guc: Update firmware to v60.1.2
  drm/i915: Engine relative MMIO
  drm/i915/guc: Module load failure test for CT buffer creation
  drm/i915: Track 'serial' counts for virtual engines
  drm/i915/guc: Provide mmio list to be saved/restored on engine reset
  drm/i915/guc: Don't complain about reset races
  drm/i915/guc: Enable GuC engine reset
  drm/i915/guc: Fix for error capture after full GPU reset with GuC
  drm/i915/guc: Hook GuC scheduling policies up
  drm/i915/guc: Connect reset modparam updates to GuC policy flags
  drm/i915/guc: Include scheduling policies in the debugfs state dump
  drm/i915/guc: Add golden context to GuC ADS

Matthew Brost (53):
  drm/i915: Introduce i915_sched_engine object
  drm/i915/guc: Improve error message for unsolicited CT response
  drm/i915/guc: Add non blocking CTB send function
  drm/i915/guc: Add stall timer to non blocking CTB send function
  drm/i915/guc: Optimize CTB writes and reads
  drm/i915/guc: Increase size of CTB buffers
  drm/i915/guc: Add new GuC interface defines and structures
  drm/i915/guc: Remove GuC stage descriptor, add lrc descriptor
  drm/i915/guc: Add lrc descriptor context lookup array
  drm/i915/guc: Implement GuC submission tasklet
  drm/i915/guc: Add bypass tasklet submission path to GuC
  drm/i915/guc: Implement GuC context operations for new inteface
  drm/i915/guc: Insert fence on context when deregistering
  drm/i915/guc: Defer context unpin until scheduling is disabled
  drm/i915/guc: Disable engine barriers with GuC during unpin
  drm/i915/guc: Extend deregistration fence to schedule disable
  drm/i915: Disable preempt busywait when using GuC scheduling
  drm/i915/guc: Ensure request ordering via completion fences
  drm/i915/guc: Disable semaphores when using GuC scheduling
  drm/i915/guc: Ensure G2H response has space in buffer
  drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC
  drm/i915/guc: Update GuC debugfs to support new GuC
  drm/i915/guc: Add several request trace points
  drm/i915: Add intel_context tracing
  drm/i915/guc: GuC virtual engines
  drm/i915: Hold reference to intel_context over life of i915_request
  drm/i915/guc: Disable bonding extension with GuC submission
  drm/i915/guc: Direct all breadcrumbs for a class to single breadcrumbs
  drm/i915/guc: Reset implementation for new GuC interface
  drm/i915: Reset GPU immediately if submission is disabled
  drm/i915/guc: Add disable interrupts to guc sanitize
  drm/i915/guc: Suspend/resume implementation for new interface
  drm/i915/guc: Handle context reset notification
  drm/i915/guc: Handle engine reset failure notification
  drm/i915/guc: Enable the timer expired interrupt for GuC
  drm/i915/guc: Capture error state on context reset
  drm/i915/guc: Don't call ring_is_idle in GuC submission
  drm/i915/guc: Implement banned contexts for GuC submission
  drm/i915/guc: Allow flexible number of context ids
  drm/i915/guc: Connect the number of guc_ids to debugfs
  drm/i915/guc: Don't return -EAGAIN to user when guc_ids exhausted
  drm/i915/guc: Don't allow requests not ready to consume all guc_ids
  drm/i915/guc: Introduce guc_submit_engine object
  drm/i915/guc: Implement GuC priority management
  drm/i915/guc: Support request cancellation
  drm/i915/guc: Check return of __xa_store when registering a context
  drm/i915/guc: Non-static lrc descriptor registration buffer
  drm/i915/guc: Take GT PM ref when deregistering context
  drm/i915: Add GT PM delayed worker
  drm/i915/guc: Take engine PM when a context is pinned with GuC
    submission
  drm/i915/guc: Don't call switch_to_kernel_context with GuC submission
  drm/i915/guc: Selftest for GuC flow control
  drm/i915/guc: Update GuC documentation

Michal Wajdeczko (21):
  drm/i915/guc: Keep strict GuC ABI definitions
  drm/i915/guc: Stop using fence/status from CTB descriptor
  drm/i915: Promote ptrdiff() to i915_utils.h
  drm/i915/guc: Only rely on own CTB size
  drm/i915/guc: Don't repeat CTB layout calculations
  drm/i915/guc: Replace CTB array with explicit members
  drm/i915/guc: Update sizes of CTB buffers
  drm/i915/guc: Relax CTB response timeout
  drm/i915/guc: Start protecting access to CTB descriptors
  drm/i915/guc: Stop using mutex while sending CTB messages
  drm/i915/guc: Don't receive all G2H messages in irq handler
  drm/i915/guc: Always copy CT message to new allocation
  drm/i915/guc: Introduce unified HXG messages
  drm/i915/guc: Update MMIO based communication
  drm/i915/guc: Update CTB response status
  drm/i915/guc: Add flag for mark broken CTB
  drm/i915/guc: New definition of the CTB descriptor
  drm/i915/guc: New definition of the CTB registration action
  drm/i915/guc: New CTB based communication
  drm/i915/guc: Kill guc_clients.ct_pool
  drm/i915/guc: Early initialization of GuC send registers

Rodrigo Vivi (1):
  drm/i915/guc: Remove sample_forcewake h2g action

 drivers/gpu/drm/i915/Makefile                 |    2 +
 drivers/gpu/drm/i915/gem/i915_gem_context.c   |   39 +-
 drivers/gpu/drm/i915/gem/i915_gem_context.h   |    1 +
 drivers/gpu/drm/i915/gem/i915_gem_mman.c      |    3 +-
 drivers/gpu/drm/i915/gem/i915_gem_wait.c      |    4 +-
 drivers/gpu/drm/i915/gt/gen8_engine_cs.c      |    6 +-
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c   |   44 +-
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.h   |   14 +-
 .../gpu/drm/i915/gt/intel_breadcrumbs_types.h |    7 +
 drivers/gpu/drm/i915/gt/intel_context.c       |   50 +-
 drivers/gpu/drm/i915/gt/intel_context.h       |   45 +-
 drivers/gpu/drm/i915/gt/intel_context_types.h |   76 +-
 drivers/gpu/drm/i915/gt/intel_engine.h        |   96 +-
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     |  320 +-
 .../gpu/drm/i915/gt/intel_engine_heartbeat.c  |   75 +-
 .../gpu/drm/i915/gt/intel_engine_heartbeat.h  |    4 +
 drivers/gpu/drm/i915/gt/intel_engine_pm.c     |   14 +-
 drivers/gpu/drm/i915/gt/intel_engine_pm.h     |    5 +
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |   71 +-
 drivers/gpu/drm/i915/gt/intel_engine_user.c   |    6 +-
 .../drm/i915/gt/intel_execlists_submission.c  |  693 +--
 .../drm/i915/gt/intel_execlists_submission.h  |   14 -
 drivers/gpu/drm/i915/gt/intel_gpu_commands.h  |    5 +
 drivers/gpu/drm/i915/gt/intel_gt.c            |   23 +
 drivers/gpu/drm/i915/gt/intel_gt.h            |    2 +
 drivers/gpu/drm/i915/gt/intel_gt_irq.c        |  100 +-
 drivers/gpu/drm/i915/gt/intel_gt_irq.h        |   23 +
 drivers/gpu/drm/i915/gt/intel_gt_pm.c         |   14 +-
 drivers/gpu/drm/i915/gt/intel_gt_pm.h         |   13 +
 .../drm/i915/gt/intel_gt_pm_delayed_work.c    |   35 +
 .../drm/i915/gt/intel_gt_pm_delayed_work.h    |   24 +
 drivers/gpu/drm/i915/gt/intel_gt_requests.c   |   23 +-
 drivers/gpu/drm/i915/gt/intel_gt_requests.h   |    7 +-
 drivers/gpu/drm/i915/gt/intel_gt_types.h      |   10 +
 drivers/gpu/drm/i915/gt/intel_lrc_reg.h       |    1 -
 drivers/gpu/drm/i915/gt/intel_reset.c         |   58 +-
 .../gpu/drm/i915/gt/intel_ring_submission.c   |   73 +-
 drivers/gpu/drm/i915/gt/intel_rps.c           |    6 +-
 drivers/gpu/drm/i915/gt/intel_workarounds.c   |   46 +-
 .../gpu/drm/i915/gt/intel_workarounds_types.h |    1 +
 drivers/gpu/drm/i915/gt/mock_engine.c         |   58 +-
 drivers/gpu/drm/i915/gt/selftest_context.c    |   10 +
 drivers/gpu/drm/i915/gt/selftest_execlists.c  |   58 +-
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |    6 +-
 drivers/gpu/drm/i915/gt/selftest_lrc.c        |    6 +-
 drivers/gpu/drm/i915/gt/selftest_reset.c      |    2 +-
 .../drm/i915/gt/selftest_ring_submission.c    |    2 +-
 .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h  |  177 +
 .../gt/uc/abi/guc_communication_ctb_abi.h     |  192 +
 .../gt/uc/abi/guc_communication_mmio_abi.h    |   35 +
 .../gpu/drm/i915/gt/uc/abi/guc_errors_abi.h   |   13 +
 .../gpu/drm/i915/gt/uc/abi/guc_messages_abi.h |  247 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.c        |  194 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  131 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c    |  484 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h    |    3 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     | 1088 +++--
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h     |   49 +-
 .../gpu/drm/i915/gt/uc/intel_guc_debugfs.c    |   56 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |  377 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 4037 +++++++++++++++--
 .../gpu/drm/i915/gt/uc/intel_guc_submission.h |   20 +-
 .../i915/gt/uc/intel_guc_submission_types.h   |   55 +
 drivers/gpu/drm/i915/gt/uc/intel_uc.c         |  116 +-
 drivers/gpu/drm/i915/gt/uc/intel_uc.h         |   11 +
 drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c      |   25 +-
 .../i915/gt/uc/selftest_guc_flow_control.c    |  589 +++
 drivers/gpu/drm/i915/i915_active.c            |    3 +
 drivers/gpu/drm/i915/i915_debugfs.c           |    8 +-
 drivers/gpu/drm/i915/i915_debugfs_params.c    |   31 +
 drivers/gpu/drm/i915/i915_drv.h               |    2 +-
 drivers/gpu/drm/i915/i915_gem_evict.c         |    1 +
 drivers/gpu/drm/i915/i915_gpu_error.c         |   28 +-
 drivers/gpu/drm/i915/i915_irq.c               |   10 +-
 drivers/gpu/drm/i915/i915_params.h            |    2 +-
 drivers/gpu/drm/i915/i915_perf.c              |   16 +-
 drivers/gpu/drm/i915/i915_reg.h               |    2 +
 drivers/gpu/drm/i915/i915_request.c           |  218 +-
 drivers/gpu/drm/i915/i915_request.h           |   37 +-
 drivers/gpu/drm/i915/i915_scheduler.c         |  188 +-
 drivers/gpu/drm/i915/i915_scheduler.h         |   74 +-
 drivers/gpu/drm/i915/i915_scheduler_types.h   |   74 +
 drivers/gpu/drm/i915/i915_trace.h             |  219 +-
 drivers/gpu/drm/i915/i915_utils.h             |    5 +
 drivers/gpu/drm/i915/i915_vma.h               |    5 -
 drivers/gpu/drm/i915/intel_wakeref.c          |    5 +
 drivers/gpu/drm/i915/intel_wakeref.h          |    1 +
 .../drm/i915/selftests/i915_live_selftests.h  |    1 +
 .../gpu/drm/i915/selftests/igt_live_test.c    |    2 +-
 .../i915/selftests/intel_scheduler_helpers.c  |  101 +
 .../i915/selftests/intel_scheduler_helpers.h  |   37 +
 .../gpu/drm/i915/selftests/mock_gem_device.c  |    3 +-
 include/uapi/drm/i915_drm.h                   |    9 +
 93 files changed, 8954 insertions(+), 2222 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_pm_delayed_work.c
 create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_pm_delayed_work.h
 create mode 100644 drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
 create mode 100644 drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h
 create mode 100644 drivers/gpu/drm/i915/gt/uc/abi/guc_communication_mmio_abi.h
 create mode 100644 drivers/gpu/drm/i915/gt/uc/abi/guc_errors_abi.h
 create mode 100644 drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h
 create mode 100644 drivers/gpu/drm/i915/gt/uc/intel_guc_submission_types.h
 create mode 100644 drivers/gpu/drm/i915/gt/uc/selftest_guc_flow_control.c
 create mode 100644 drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c
 create mode 100644 drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.h

-- 
2.28.0



* [RFC PATCH 01/97] drm/i915/gt: Move engine setup out of set_default_submission
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
@ 2021-05-06 19:13 ` Matthew Brost
  2021-05-19  0:25   ` Matthew Brost
  2021-05-25  8:44   ` [Intel-gfx] " Tvrtko Ursulin
  2021-05-06 19:13 ` [RFC PATCH 02/97] drm/i915/gt: Move submission_method into intel_gt Matthew Brost
                   ` (98 subsequent siblings)
  99 siblings, 2 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:13 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

From: Chris Wilson <chris@chris-wilson.co.uk>

Now that we no longer switch back and forth between guc and execlists,
we no longer need to restore the backend's vfuncs and can leave them set
after initialisation. The only catch is that we lose the submission
vfunc on wedging and still need to reset the submit_request vfunc on
unwedging.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 .../drm/i915/gt/intel_execlists_submission.c  | 46 ++++++++---------
 .../gpu/drm/i915/gt/intel_ring_submission.c   |  4 --
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 50 ++++++++-----------
 3 files changed, 44 insertions(+), 56 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index de124870af44..1108c193ab65 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -3076,29 +3076,6 @@ static void execlists_set_default_submission(struct intel_engine_cs *engine)
 	engine->submit_request = execlists_submit_request;
 	engine->schedule = i915_schedule;
 	engine->execlists.tasklet.callback = execlists_submission_tasklet;
-
-	engine->reset.prepare = execlists_reset_prepare;
-	engine->reset.rewind = execlists_reset_rewind;
-	engine->reset.cancel = execlists_reset_cancel;
-	engine->reset.finish = execlists_reset_finish;
-
-	engine->park = execlists_park;
-	engine->unpark = NULL;
-
-	engine->flags |= I915_ENGINE_SUPPORTS_STATS;
-	if (!intel_vgpu_active(engine->i915)) {
-		engine->flags |= I915_ENGINE_HAS_SEMAPHORES;
-		if (can_preempt(engine)) {
-			engine->flags |= I915_ENGINE_HAS_PREEMPTION;
-			if (IS_ACTIVE(CONFIG_DRM_I915_TIMESLICE_DURATION))
-				engine->flags |= I915_ENGINE_HAS_TIMESLICES;
-		}
-	}
-
-	if (intel_engine_has_preemption(engine))
-		engine->emit_bb_start = gen8_emit_bb_start;
-	else
-		engine->emit_bb_start = gen8_emit_bb_start_noarb;
 }
 
 static void execlists_shutdown(struct intel_engine_cs *engine)
@@ -3129,6 +3106,14 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
 	engine->cops = &execlists_context_ops;
 	engine->request_alloc = execlists_request_alloc;
 
+	engine->reset.prepare = execlists_reset_prepare;
+	engine->reset.rewind = execlists_reset_rewind;
+	engine->reset.cancel = execlists_reset_cancel;
+	engine->reset.finish = execlists_reset_finish;
+
+	engine->park = execlists_park;
+	engine->unpark = NULL;
+
 	engine->emit_flush = gen8_emit_flush_xcs;
 	engine->emit_init_breadcrumb = gen8_emit_init_breadcrumb;
 	engine->emit_fini_breadcrumb = gen8_emit_fini_breadcrumb_xcs;
@@ -3149,6 +3134,21 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
 		 * until a more refined solution exists.
 		 */
 	}
+
+	engine->flags |= I915_ENGINE_SUPPORTS_STATS;
+	if (!intel_vgpu_active(engine->i915)) {
+		engine->flags |= I915_ENGINE_HAS_SEMAPHORES;
+		if (can_preempt(engine)) {
+			engine->flags |= I915_ENGINE_HAS_PREEMPTION;
+			if (IS_ACTIVE(CONFIG_DRM_I915_TIMESLICE_DURATION))
+				engine->flags |= I915_ENGINE_HAS_TIMESLICES;
+		}
+	}
+
+	if (intel_engine_has_preemption(engine))
+		engine->emit_bb_start = gen8_emit_bb_start;
+	else
+		engine->emit_bb_start = gen8_emit_bb_start_noarb;
 }
 
 static void logical_ring_default_irqs(struct intel_engine_cs *engine)
diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
index 9585546556ee..5f4f7f1df48f 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
@@ -989,14 +989,10 @@ static void gen6_bsd_submit_request(struct i915_request *request)
 static void i9xx_set_default_submission(struct intel_engine_cs *engine)
 {
 	engine->submit_request = i9xx_submit_request;
-
-	engine->park = NULL;
-	engine->unpark = NULL;
 }
 
 static void gen6_bsd_set_default_submission(struct intel_engine_cs *engine)
 {
-	i9xx_set_default_submission(engine);
 	engine->submit_request = gen6_bsd_submit_request;
 }
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 92688a9b6717..f72faa0b8339 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -608,35 +608,6 @@ static int guc_resume(struct intel_engine_cs *engine)
 static void guc_set_default_submission(struct intel_engine_cs *engine)
 {
 	engine->submit_request = guc_submit_request;
-	engine->schedule = i915_schedule;
-	engine->execlists.tasklet.callback = guc_submission_tasklet;
-
-	engine->reset.prepare = guc_reset_prepare;
-	engine->reset.rewind = guc_reset_rewind;
-	engine->reset.cancel = guc_reset_cancel;
-	engine->reset.finish = guc_reset_finish;
-
-	engine->flags |= I915_ENGINE_NEEDS_BREADCRUMB_TASKLET;
-	engine->flags |= I915_ENGINE_HAS_PREEMPTION;
-
-	/*
-	 * TODO: GuC supports timeslicing and semaphores as well, but they're
-	 * handled by the firmware so some minor tweaks are required before
-	 * enabling.
-	 *
-	 * engine->flags |= I915_ENGINE_HAS_TIMESLICES;
-	 * engine->flags |= I915_ENGINE_HAS_SEMAPHORES;
-	 */
-
-	engine->emit_bb_start = gen8_emit_bb_start;
-
-	/*
-	 * For the breadcrumb irq to work we need the interrupts to stay
-	 * enabled. However, on all platforms on which we'll have support for
-	 * GuC submission we don't allow disabling the interrupts at runtime, so
-	 * we're always safe with the current flow.
-	 */
-	GEM_BUG_ON(engine->irq_enable || engine->irq_disable);
 }
 
 static void guc_release(struct intel_engine_cs *engine)
@@ -658,6 +629,13 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
 	engine->cops = &guc_context_ops;
 	engine->request_alloc = guc_request_alloc;
 
+	engine->schedule = i915_schedule;
+
+	engine->reset.prepare = guc_reset_prepare;
+	engine->reset.rewind = guc_reset_rewind;
+	engine->reset.cancel = guc_reset_cancel;
+	engine->reset.finish = guc_reset_finish;
+
 	engine->emit_flush = gen8_emit_flush_xcs;
 	engine->emit_init_breadcrumb = gen8_emit_init_breadcrumb;
 	engine->emit_fini_breadcrumb = gen8_emit_fini_breadcrumb_xcs;
@@ -666,6 +644,20 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
 		engine->emit_flush = gen12_emit_flush_xcs;
 	}
 	engine->set_default_submission = guc_set_default_submission;
+
+	engine->flags |= I915_ENGINE_NEEDS_BREADCRUMB_TASKLET;
+	engine->flags |= I915_ENGINE_HAS_PREEMPTION;
+
+	/*
+	 * TODO: GuC supports timeslicing and semaphores as well, but they're
+	 * handled by the firmware so some minor tweaks are required before
+	 * enabling.
+	 *
+	 * engine->flags |= I915_ENGINE_HAS_TIMESLICES;
+	 * engine->flags |= I915_ENGINE_HAS_SEMAPHORES;
+	 */
+
+	engine->emit_bb_start = gen8_emit_bb_start;
 }
 
 static void rcs_submission_override(struct intel_engine_cs *engine)
-- 
2.28.0



* [RFC PATCH 02/97] drm/i915/gt: Move submission_method into intel_gt
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
  2021-05-06 19:13 ` [RFC PATCH 01/97] drm/i915/gt: Move engine setup out of set_default_submission Matthew Brost
@ 2021-05-06 19:13 ` Matthew Brost
  2021-05-19  3:10   ` Matthew Brost
  2021-05-25  8:44   ` [Intel-gfx] " Tvrtko Ursulin
  2021-05-06 19:13 ` [RFC PATCH 03/97] drm/i915/gt: Move CS interrupt handler to the backend Matthew Brost
                   ` (97 subsequent siblings)
  99 siblings, 2 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:13 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

From: Chris Wilson <chris@chris-wilson.co.uk>

Since we set up the submission method for the engines only once, it is
easy to assign an enum and use that instead of probing into the
backends.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_engine.h               |  8 +++++++-
 drivers/gpu/drm/i915/gt/intel_engine_cs.c            | 12 ++++++++----
 drivers/gpu/drm/i915/gt/intel_execlists_submission.c |  8 --------
 drivers/gpu/drm/i915/gt/intel_execlists_submission.h |  3 ---
 drivers/gpu/drm/i915/gt/intel_gt_types.h             |  7 +++++++
 drivers/gpu/drm/i915/gt/intel_reset.c                |  7 +++----
 drivers/gpu/drm/i915/gt/selftest_execlists.c         |  2 +-
 drivers/gpu/drm/i915/gt/selftest_ring_submission.c   |  2 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c    |  5 -----
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h    |  1 -
 drivers/gpu/drm/i915/i915_perf.c                     | 10 +++++-----
 11 files changed, 32 insertions(+), 33 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
index 47ee8578e511..8d9184920c51 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -13,8 +13,9 @@
 #include "i915_reg.h"
 #include "i915_request.h"
 #include "i915_selftest.h"
-#include "gt/intel_timeline.h"
 #include "intel_engine_types.h"
+#include "intel_gt_types.h"
+#include "intel_timeline.h"
 #include "intel_workarounds.h"
 
 struct drm_printer;
@@ -262,6 +263,11 @@ void intel_engine_init_active(struct intel_engine_cs *engine,
 #define ENGINE_MOCK	1
 #define ENGINE_VIRTUAL	2
 
+static inline bool intel_engine_uses_guc(const struct intel_engine_cs *engine)
+{
+	return engine->gt->submission_method >= INTEL_SUBMISSION_GUC;
+}
+
 static inline bool
 intel_engine_has_preempt_reset(const struct intel_engine_cs *engine)
 {
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 6dbdbde00f14..0618379b68ca 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -909,12 +909,16 @@ int intel_engines_init(struct intel_gt *gt)
 	enum intel_engine_id id;
 	int err;
 
-	if (intel_uc_uses_guc_submission(&gt->uc))
+	if (intel_uc_uses_guc_submission(&gt->uc)) {
+		gt->submission_method = INTEL_SUBMISSION_GUC;
 		setup = intel_guc_submission_setup;
-	else if (HAS_EXECLISTS(gt->i915))
+	} else if (HAS_EXECLISTS(gt->i915)) {
+		gt->submission_method = INTEL_SUBMISSION_ELSP;
 		setup = intel_execlists_submission_setup;
-	else
+	} else {
+		gt->submission_method = INTEL_SUBMISSION_RING;
 		setup = intel_ring_submission_setup;
+	}
 
 	for_each_engine(engine, gt, id) {
 		err = engine_setup_common(engine);
@@ -1479,7 +1483,7 @@ static void intel_engine_print_registers(struct intel_engine_cs *engine,
 		drm_printf(m, "\tIPEHR: 0x%08x\n", ENGINE_READ(engine, IPEHR));
 	}
 
-	if (intel_engine_in_guc_submission_mode(engine)) {
+	if (intel_engine_uses_guc(engine)) {
 		/* nothing to print yet */
 	} else if (HAS_EXECLISTS(dev_priv)) {
 		struct i915_request * const *port, *rq;
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 1108c193ab65..9d2da5ccaef6 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -1768,7 +1768,6 @@ process_csb(struct intel_engine_cs *engine, struct i915_request **inactive)
 	 */
 	GEM_BUG_ON(!tasklet_is_locked(&execlists->tasklet) &&
 		   !reset_in_progress(execlists));
-	GEM_BUG_ON(!intel_engine_in_execlists_submission_mode(engine));
 
 	/*
 	 * Note that csb_write, csb_status may be either in HWSP or mmio.
@@ -3884,13 +3883,6 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine,
 	spin_unlock_irqrestore(&engine->active.lock, flags);
 }
 
-bool
-intel_engine_in_execlists_submission_mode(const struct intel_engine_cs *engine)
-{
-	return engine->set_default_submission ==
-	       execlists_set_default_submission;
-}
-
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftest_execlists.c"
 #endif
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.h b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
index fd61dae820e9..4ca9b475e252 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
@@ -43,7 +43,4 @@ int intel_virtual_engine_attach_bond(struct intel_engine_cs *engine,
 				     const struct intel_engine_cs *master,
 				     const struct intel_engine_cs *sibling);
 
-bool
-intel_engine_in_execlists_submission_mode(const struct intel_engine_cs *engine);
-
 #endif /* __INTEL_EXECLISTS_SUBMISSION_H__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h
index 0caf6ca0a784..fecfacf551d5 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
@@ -31,6 +31,12 @@ struct i915_ggtt;
 struct intel_engine_cs;
 struct intel_uncore;
 
+enum intel_submission_method {
+	INTEL_SUBMISSION_RING,
+	INTEL_SUBMISSION_ELSP,
+	INTEL_SUBMISSION_GUC,
+};
+
 struct intel_gt {
 	struct drm_i915_private *i915;
 	struct intel_uncore *uncore;
@@ -118,6 +124,7 @@ struct intel_gt {
 	struct intel_engine_cs *engine[I915_NUM_ENGINES];
 	struct intel_engine_cs *engine_class[MAX_ENGINE_CLASS + 1]
 					    [MAX_ENGINE_INSTANCE + 1];
+	enum intel_submission_method submission_method;
 
 	/*
 	 * Default address space (either GGTT or ppGTT depending on arch).
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index a377c4588aaa..d5094be6d90f 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -1118,7 +1118,6 @@ static int intel_gt_reset_engine(struct intel_engine_cs *engine)
 int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)
 {
 	struct intel_gt *gt = engine->gt;
-	bool uses_guc = intel_engine_in_guc_submission_mode(engine);
 	int ret;
 
 	ENGINE_TRACE(engine, "flags=%lx\n", gt->reset.flags);
@@ -1134,10 +1133,10 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)
 			   "Resetting %s for %s\n", engine->name, msg);
 	atomic_inc(&engine->i915->gpu_error.reset_engine_count[engine->uabi_class]);
 
-	if (!uses_guc)
-		ret = intel_gt_reset_engine(engine);
-	else
+	if (intel_engine_uses_guc(engine))
 		ret = intel_guc_reset_engine(&engine->gt->uc.guc, engine);
+	else
+		ret = intel_gt_reset_engine(engine);
 	if (ret) {
 		/* If we fail here, we expect to fallback to a global reset */
 		ENGINE_TRACE(engine, "Failed to reset, err: %d\n", ret);
diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c
index 1081cd36a2bd..1f93591a8c69 100644
--- a/drivers/gpu/drm/i915/gt/selftest_execlists.c
+++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c
@@ -4716,7 +4716,7 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_virtual_reset),
 	};
 
-	if (!HAS_EXECLISTS(i915))
+	if (i915->gt.submission_method != INTEL_SUBMISSION_ELSP)
 		return 0;
 
 	if (intel_gt_is_wedged(&i915->gt))
diff --git a/drivers/gpu/drm/i915/gt/selftest_ring_submission.c b/drivers/gpu/drm/i915/gt/selftest_ring_submission.c
index 99609271c3a7..c12e74171b63 100644
--- a/drivers/gpu/drm/i915/gt/selftest_ring_submission.c
+++ b/drivers/gpu/drm/i915/gt/selftest_ring_submission.c
@@ -291,7 +291,7 @@ int intel_ring_submission_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_ctx_switch_wa),
 	};
 
-	if (HAS_EXECLISTS(i915))
+	if (i915->gt.submission_method > INTEL_SUBMISSION_RING)
 		return 0;
 
 	return intel_gt_live_subtests(tests, &i915->gt);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index f72faa0b8339..17b551a0c89f 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -745,8 +745,3 @@ void intel_guc_submission_init_early(struct intel_guc *guc)
 {
 	guc->submission_selected = __guc_submission_selected(guc);
 }
-
-bool intel_engine_in_guc_submission_mode(const struct intel_engine_cs *engine)
-{
-	return engine->set_default_submission == guc_set_default_submission;
-}
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
index 5f7b9e6347d0..3f7005018939 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
@@ -20,7 +20,6 @@ void intel_guc_submission_fini(struct intel_guc *guc);
 int intel_guc_preempt_work_create(struct intel_guc *guc);
 void intel_guc_preempt_work_destroy(struct intel_guc *guc);
 int intel_guc_submission_setup(struct intel_engine_cs *engine);
-bool intel_engine_in_guc_submission_mode(const struct intel_engine_cs *engine);
 
 static inline bool intel_guc_submission_is_supported(struct intel_guc *guc)
 {
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 85ad62dbabfa..66f1f25119b5 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1257,11 +1257,7 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
 	case 8:
 	case 9:
 	case 10:
-		if (intel_engine_in_execlists_submission_mode(ce->engine)) {
-			stream->specific_ctx_id_mask =
-				(1U << GEN8_CTX_ID_WIDTH) - 1;
-			stream->specific_ctx_id = stream->specific_ctx_id_mask;
-		} else {
+		if (intel_engine_uses_guc(ce->engine)) {
 			/*
 			 * When using GuC, the context descriptor we write in
 			 * i915 is read by GuC and rewritten before it's
@@ -1280,6 +1276,10 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
 			 */
 			stream->specific_ctx_id_mask =
 				(1U << (GEN8_CTX_ID_WIDTH - 1)) - 1;
+		} else {
+			stream->specific_ctx_id_mask =
+				(1U << GEN8_CTX_ID_WIDTH) - 1;
+			stream->specific_ctx_id = stream->specific_ctx_id_mask;
 		}
 		break;
 
-- 
2.28.0



* [RFC PATCH 03/97] drm/i915/gt: Move CS interrupt handler to the backend
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
  2021-05-06 19:13 ` [RFC PATCH 01/97] drm/i915/gt: Move engine setup out of set_default_submission Matthew Brost
  2021-05-06 19:13 ` [RFC PATCH 02/97] drm/i915/gt: Move submission_method into intel_gt Matthew Brost
@ 2021-05-06 19:13 ` Matthew Brost
  2021-05-19  3:31   ` Matthew Brost
  2021-05-25  8:45   ` [Intel-gfx] " Tvrtko Ursulin
  2021-05-06 19:13 ` [RFC PATCH 04/97] drm/i915/guc: skip disabling CTBs before sanitizing the GuC Matthew Brost
                   ` (96 subsequent siblings)
  99 siblings, 2 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:13 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

From: Chris Wilson <chris@chris-wilson.co.uk>

The different submission backends each have their own preferred
behaviour and interrupt setup. Let each handle their own interrupts.

This becomes more useful later as we extract the backend-specific use of
auxiliary state from the interrupt handler.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     |  7 ++
 drivers/gpu/drm/i915/gt/intel_engine_types.h  | 14 +---
 .../drm/i915/gt/intel_execlists_submission.c  | 41 ++++++++++
 drivers/gpu/drm/i915/gt/intel_gt_irq.c        | 82 ++++++-------------
 drivers/gpu/drm/i915/gt/intel_gt_irq.h        | 23 ++++++
 .../gpu/drm/i915/gt/intel_ring_submission.c   |  8 ++
 drivers/gpu/drm/i915/gt/intel_rps.c           |  2 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 11 ++-
 drivers/gpu/drm/i915/i915_irq.c               | 10 ++-
 9 files changed, 124 insertions(+), 74 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 0618379b68ca..828e1669f92c 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -255,6 +255,11 @@ static void intel_engine_sanitize_mmio(struct intel_engine_cs *engine)
 	intel_engine_set_hwsp_writemask(engine, ~0u);
 }
 
+static void nop_irq_handler(struct intel_engine_cs *engine, u16 iir)
+{
+	GEM_DEBUG_WARN_ON(iir);
+}
+
 static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id)
 {
 	const struct engine_info *info = &intel_engines[id];
@@ -292,6 +297,8 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id)
 	engine->hw_id = info->hw_id;
 	engine->guc_id = MAKE_GUC_ID(info->class, info->instance);
 
+	engine->irq_handler = nop_irq_handler;
+
 	engine->class = info->class;
 	engine->instance = info->instance;
 	__sprint_engine_name(engine);
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 883bafc44902..9ef349cd5cea 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -402,6 +402,7 @@ struct intel_engine_cs {
 	u32		irq_enable_mask; /* bitmask to enable ring interrupt */
 	void		(*irq_enable)(struct intel_engine_cs *engine);
 	void		(*irq_disable)(struct intel_engine_cs *engine);
+	void		(*irq_handler)(struct intel_engine_cs *engine, u16 iir);
 
 	void		(*sanitize)(struct intel_engine_cs *engine);
 	int		(*resume)(struct intel_engine_cs *engine);
@@ -481,10 +482,9 @@ struct intel_engine_cs {
 #define I915_ENGINE_HAS_PREEMPTION   BIT(2)
 #define I915_ENGINE_HAS_SEMAPHORES   BIT(3)
 #define I915_ENGINE_HAS_TIMESLICES   BIT(4)
-#define I915_ENGINE_NEEDS_BREADCRUMB_TASKLET BIT(5)
-#define I915_ENGINE_IS_VIRTUAL       BIT(6)
-#define I915_ENGINE_HAS_RELATIVE_MMIO BIT(7)
-#define I915_ENGINE_REQUIRES_CMD_PARSER BIT(8)
+#define I915_ENGINE_IS_VIRTUAL       BIT(5)
+#define I915_ENGINE_HAS_RELATIVE_MMIO BIT(6)
+#define I915_ENGINE_REQUIRES_CMD_PARSER BIT(7)
 	unsigned int flags;
 
 	/*
@@ -593,12 +593,6 @@ intel_engine_has_timeslices(const struct intel_engine_cs *engine)
 	return engine->flags & I915_ENGINE_HAS_TIMESLICES;
 }
 
-static inline bool
-intel_engine_needs_breadcrumb_tasklet(const struct intel_engine_cs *engine)
-{
-	return engine->flags & I915_ENGINE_NEEDS_BREADCRUMB_TASKLET;
-}
-
 static inline bool
 intel_engine_is_virtual(const struct intel_engine_cs *engine)
 {
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 9d2da5ccaef6..8db200422950 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -118,6 +118,7 @@
 #include "intel_engine_stats.h"
 #include "intel_execlists_submission.h"
 #include "intel_gt.h"
+#include "intel_gt_irq.h"
 #include "intel_gt_pm.h"
 #include "intel_gt_requests.h"
 #include "intel_lrc.h"
@@ -2384,6 +2385,45 @@ static void execlists_submission_tasklet(struct tasklet_struct *t)
 	rcu_read_unlock();
 }
 
+static void execlists_irq_handler(struct intel_engine_cs *engine, u16 iir)
+{
+	bool tasklet = false;
+
+	if (unlikely(iir & GT_CS_MASTER_ERROR_INTERRUPT)) {
+		u32 eir;
+
+		/* Upper 16b are the enabling mask, rsvd for internal errors */
+		eir = ENGINE_READ(engine, RING_EIR) & GENMASK(15, 0);
+		ENGINE_TRACE(engine, "CS error: %x\n", eir);
+
+		/* Disable the error interrupt until after the reset */
+		if (likely(eir)) {
+			ENGINE_WRITE(engine, RING_EMR, ~0u);
+			ENGINE_WRITE(engine, RING_EIR, eir);
+			WRITE_ONCE(engine->execlists.error_interrupt, eir);
+			tasklet = true;
+		}
+	}
+
+	if (iir & GT_WAIT_SEMAPHORE_INTERRUPT) {
+		WRITE_ONCE(engine->execlists.yield,
+			   ENGINE_READ_FW(engine, RING_EXECLIST_STATUS_HI));
+		ENGINE_TRACE(engine, "semaphore yield: %08x\n",
+			     engine->execlists.yield);
+		if (del_timer(&engine->execlists.timer))
+			tasklet = true;
+	}
+
+	if (iir & GT_CONTEXT_SWITCH_INTERRUPT)
+		tasklet = true;
+
+	if (iir & GT_RENDER_USER_INTERRUPT)
+		intel_engine_signal_breadcrumbs(engine);
+
+	if (tasklet)
+		tasklet_hi_schedule(&engine->execlists.tasklet);
+}
+
 static void __execlists_kick(struct intel_engine_execlists *execlists)
 {
 	/* Kick the tasklet for some interrupt coalescing and reset handling */
@@ -3133,6 +3173,7 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
 		 * until a more refined solution exists.
 		 */
 	}
+	intel_engine_set_irq_handler(engine, execlists_irq_handler);
 
 	engine->flags |= I915_ENGINE_SUPPORTS_STATS;
 	if (!intel_vgpu_active(engine->i915)) {
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_irq.c b/drivers/gpu/drm/i915/gt/intel_gt_irq.c
index 9fc6c912a4e5..d29126c458ba 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_irq.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_irq.c
@@ -20,48 +20,6 @@ static void guc_irq_handler(struct intel_guc *guc, u16 iir)
 		intel_guc_to_host_event_handler(guc);
 }
 
-static void
-cs_irq_handler(struct intel_engine_cs *engine, u32 iir)
-{
-	bool tasklet = false;
-
-	if (unlikely(iir & GT_CS_MASTER_ERROR_INTERRUPT)) {
-		u32 eir;
-
-		/* Upper 16b are the enabling mask, rsvd for internal errors */
-		eir = ENGINE_READ(engine, RING_EIR) & GENMASK(15, 0);
-		ENGINE_TRACE(engine, "CS error: %x\n", eir);
-
-		/* Disable the error interrupt until after the reset */
-		if (likely(eir)) {
-			ENGINE_WRITE(engine, RING_EMR, ~0u);
-			ENGINE_WRITE(engine, RING_EIR, eir);
-			WRITE_ONCE(engine->execlists.error_interrupt, eir);
-			tasklet = true;
-		}
-	}
-
-	if (iir & GT_WAIT_SEMAPHORE_INTERRUPT) {
-		WRITE_ONCE(engine->execlists.yield,
-			   ENGINE_READ_FW(engine, RING_EXECLIST_STATUS_HI));
-		ENGINE_TRACE(engine, "semaphore yield: %08x\n",
-			     engine->execlists.yield);
-		if (del_timer(&engine->execlists.timer))
-			tasklet = true;
-	}
-
-	if (iir & GT_CONTEXT_SWITCH_INTERRUPT)
-		tasklet = true;
-
-	if (iir & GT_RENDER_USER_INTERRUPT) {
-		intel_engine_signal_breadcrumbs(engine);
-		tasklet |= intel_engine_needs_breadcrumb_tasklet(engine);
-	}
-
-	if (tasklet)
-		tasklet_hi_schedule(&engine->execlists.tasklet);
-}
-
 static u32
 gen11_gt_engine_identity(struct intel_gt *gt,
 			 const unsigned int bank, const unsigned int bit)
@@ -122,7 +80,7 @@ gen11_engine_irq_handler(struct intel_gt *gt, const u8 class,
 		engine = NULL;
 
 	if (likely(engine))
-		return cs_irq_handler(engine, iir);
+		return intel_engine_cs_irq(engine, iir);
 
 	WARN_ONCE(1, "unhandled engine interrupt class=0x%x, instance=0x%x\n",
 		  class, instance);
@@ -275,9 +233,12 @@ void gen11_gt_irq_postinstall(struct intel_gt *gt)
 void gen5_gt_irq_handler(struct intel_gt *gt, u32 gt_iir)
 {
 	if (gt_iir & GT_RENDER_USER_INTERRUPT)
-		intel_engine_signal_breadcrumbs(gt->engine_class[RENDER_CLASS][0]);
+		intel_engine_cs_irq(gt->engine_class[RENDER_CLASS][0],
+				    gt_iir);
+
 	if (gt_iir & ILK_BSD_USER_INTERRUPT)
-		intel_engine_signal_breadcrumbs(gt->engine_class[VIDEO_DECODE_CLASS][0]);
+		intel_engine_cs_irq(gt->engine_class[VIDEO_DECODE_CLASS][0],
+				    gt_iir);
 }
 
 static void gen7_parity_error_irq_handler(struct intel_gt *gt, u32 iir)
@@ -301,11 +262,16 @@ static void gen7_parity_error_irq_handler(struct intel_gt *gt, u32 iir)
 void gen6_gt_irq_handler(struct intel_gt *gt, u32 gt_iir)
 {
 	if (gt_iir & GT_RENDER_USER_INTERRUPT)
-		intel_engine_signal_breadcrumbs(gt->engine_class[RENDER_CLASS][0]);
+		intel_engine_cs_irq(gt->engine_class[RENDER_CLASS][0],
+				    gt_iir);
+
 	if (gt_iir & GT_BSD_USER_INTERRUPT)
-		intel_engine_signal_breadcrumbs(gt->engine_class[VIDEO_DECODE_CLASS][0]);
+		intel_engine_cs_irq(gt->engine_class[VIDEO_DECODE_CLASS][0],
+				    gt_iir >> 12);
+
 	if (gt_iir & GT_BLT_USER_INTERRUPT)
-		intel_engine_signal_breadcrumbs(gt->engine_class[COPY_ENGINE_CLASS][0]);
+		intel_engine_cs_irq(gt->engine_class[COPY_ENGINE_CLASS][0],
+				    gt_iir >> 22);
 
 	if (gt_iir & (GT_BLT_CS_ERROR_INTERRUPT |
 		      GT_BSD_CS_ERROR_INTERRUPT |
@@ -324,10 +290,10 @@ void gen8_gt_irq_handler(struct intel_gt *gt, u32 master_ctl)
 	if (master_ctl & (GEN8_GT_RCS_IRQ | GEN8_GT_BCS_IRQ)) {
 		iir = raw_reg_read(regs, GEN8_GT_IIR(0));
 		if (likely(iir)) {
-			cs_irq_handler(gt->engine_class[RENDER_CLASS][0],
-				       iir >> GEN8_RCS_IRQ_SHIFT);
-			cs_irq_handler(gt->engine_class[COPY_ENGINE_CLASS][0],
-				       iir >> GEN8_BCS_IRQ_SHIFT);
+			intel_engine_cs_irq(gt->engine_class[RENDER_CLASS][0],
+					    iir >> GEN8_RCS_IRQ_SHIFT);
+			intel_engine_cs_irq(gt->engine_class[COPY_ENGINE_CLASS][0],
+					    iir >> GEN8_BCS_IRQ_SHIFT);
 			raw_reg_write(regs, GEN8_GT_IIR(0), iir);
 		}
 	}
@@ -335,10 +301,10 @@ void gen8_gt_irq_handler(struct intel_gt *gt, u32 master_ctl)
 	if (master_ctl & (GEN8_GT_VCS0_IRQ | GEN8_GT_VCS1_IRQ)) {
 		iir = raw_reg_read(regs, GEN8_GT_IIR(1));
 		if (likely(iir)) {
-			cs_irq_handler(gt->engine_class[VIDEO_DECODE_CLASS][0],
-				       iir >> GEN8_VCS0_IRQ_SHIFT);
-			cs_irq_handler(gt->engine_class[VIDEO_DECODE_CLASS][1],
-				       iir >> GEN8_VCS1_IRQ_SHIFT);
+			intel_engine_cs_irq(gt->engine_class[VIDEO_DECODE_CLASS][0],
+					    iir >> GEN8_VCS0_IRQ_SHIFT);
+			intel_engine_cs_irq(gt->engine_class[VIDEO_DECODE_CLASS][1],
+					    iir >> GEN8_VCS1_IRQ_SHIFT);
 			raw_reg_write(regs, GEN8_GT_IIR(1), iir);
 		}
 	}
@@ -346,8 +312,8 @@ void gen8_gt_irq_handler(struct intel_gt *gt, u32 master_ctl)
 	if (master_ctl & GEN8_GT_VECS_IRQ) {
 		iir = raw_reg_read(regs, GEN8_GT_IIR(3));
 		if (likely(iir)) {
-			cs_irq_handler(gt->engine_class[VIDEO_ENHANCEMENT_CLASS][0],
-				       iir >> GEN8_VECS_IRQ_SHIFT);
+			intel_engine_cs_irq(gt->engine_class[VIDEO_ENHANCEMENT_CLASS][0],
+					    iir >> GEN8_VECS_IRQ_SHIFT);
 			raw_reg_write(regs, GEN8_GT_IIR(3), iir);
 		}
 	}
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_irq.h b/drivers/gpu/drm/i915/gt/intel_gt_irq.h
index f667e976fb2b..41cad38668c5 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_irq.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_irq.h
@@ -8,6 +8,8 @@
 
 #include <linux/types.h>
 
+#include "intel_engine_types.h"
+
 struct intel_gt;
 
 #define GEN8_GT_IRQS (GEN8_GT_RCS_IRQ | \
@@ -39,4 +41,25 @@ void gen8_gt_irq_handler(struct intel_gt *gt, u32 master_ctl);
 void gen8_gt_irq_reset(struct intel_gt *gt);
 void gen8_gt_irq_postinstall(struct intel_gt *gt);
 
+static inline void intel_engine_cs_irq(struct intel_engine_cs *engine, u16 iir)
+{
+	if (iir)
+		engine->irq_handler(engine, iir);
+}
+
+static inline void
+intel_engine_set_irq_handler(struct intel_engine_cs *engine,
+			     void (*fn)(struct intel_engine_cs *engine,
+					u16 iir))
+{
+	/*
+	 * As the interrupt is live as allocate and setup the engines,
+	 * err on the side of caution and apply barriers to updating
+	 * the irq handler callback. This assures that when we do use
+	 * the engine, we will receive interrupts only to ourselves,
+	 * and not lose any.
+	 */
+	smp_store_mb(engine->irq_handler, fn);
+}
+
 #endif /* INTEL_GT_IRQ_H */
diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
index 5f4f7f1df48f..2b6dffcc2262 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
@@ -12,6 +12,7 @@
 #include "intel_breadcrumbs.h"
 #include "intel_context.h"
 #include "intel_gt.h"
+#include "intel_gt_irq.h"
 #include "intel_reset.h"
 #include "intel_ring.h"
 #include "shmem_utils.h"
@@ -1017,10 +1018,17 @@ static void ring_release(struct intel_engine_cs *engine)
 	intel_timeline_put(engine->legacy.timeline);
 }
 
+static void irq_handler(struct intel_engine_cs *engine, u16 iir)
+{
+	intel_engine_signal_breadcrumbs(engine);
+}
+
 static void setup_irq(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *i915 = engine->i915;
 
+	intel_engine_set_irq_handler(engine, irq_handler);
+
 	if (INTEL_GEN(i915) >= 6) {
 		engine->irq_enable = gen6_irq_enable;
 		engine->irq_disable = gen6_irq_disable;
diff --git a/drivers/gpu/drm/i915/gt/intel_rps.c b/drivers/gpu/drm/i915/gt/intel_rps.c
index 405d814e9040..97cab1b99871 100644
--- a/drivers/gpu/drm/i915/gt/intel_rps.c
+++ b/drivers/gpu/drm/i915/gt/intel_rps.c
@@ -1774,7 +1774,7 @@ void gen6_rps_irq_handler(struct intel_rps *rps, u32 pm_iir)
 		return;
 
 	if (pm_iir & PM_VEBOX_USER_INTERRUPT)
-		intel_engine_signal_breadcrumbs(gt->engine[VECS0]);
+		intel_engine_cs_irq(gt->engine[VECS0], pm_iir >> 10);
 
 	if (pm_iir & PM_VEBOX_CS_ERROR_INTERRUPT)
 		DRM_DEBUG("Command parser error, pm_iir 0x%08x\n", pm_iir);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 17b551a0c89f..335719f17490 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -11,6 +11,7 @@
 #include "gt/intel_context.h"
 #include "gt/intel_engine_pm.h"
 #include "gt/intel_gt.h"
+#include "gt/intel_gt_irq.h"
 #include "gt/intel_gt_pm.h"
 #include "gt/intel_lrc.h"
 #include "gt/intel_mocs.h"
@@ -264,6 +265,14 @@ static void guc_submission_tasklet(struct tasklet_struct *t)
 	spin_unlock_irqrestore(&engine->active.lock, flags);
 }
 
+static void cs_irq_handler(struct intel_engine_cs *engine, u16 iir)
+{
+	if (iir & GT_RENDER_USER_INTERRUPT) {
+		intel_engine_signal_breadcrumbs(engine);
+		tasklet_hi_schedule(&engine->execlists.tasklet);
+	}
+}
+
 static void guc_reset_prepare(struct intel_engine_cs *engine)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
@@ -645,7 +654,6 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
 	}
 	engine->set_default_submission = guc_set_default_submission;
 
-	engine->flags |= I915_ENGINE_NEEDS_BREADCRUMB_TASKLET;
 	engine->flags |= I915_ENGINE_HAS_PREEMPTION;
 
 	/*
@@ -681,6 +689,7 @@ static void rcs_submission_override(struct intel_engine_cs *engine)
 static inline void guc_default_irqs(struct intel_engine_cs *engine)
 {
 	engine->irq_keep_mask = GT_RENDER_USER_INTERRUPT;
+	intel_engine_set_irq_handler(engine, cs_irq_handler);
 }
 
 int intel_guc_submission_setup(struct intel_engine_cs *engine)
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index f6967a93ec7a..d58118806299 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -4014,7 +4014,7 @@ static irqreturn_t i8xx_irq_handler(int irq, void *arg)
 		intel_uncore_write16(&dev_priv->uncore, GEN2_IIR, iir);
 
 		if (iir & I915_USER_INTERRUPT)
-			intel_engine_signal_breadcrumbs(dev_priv->gt.engine[RCS0]);
+			intel_engine_cs_irq(dev_priv->gt.engine[RCS0], iir);
 
 		if (iir & I915_MASTER_ERROR_INTERRUPT)
 			i8xx_error_irq_handler(dev_priv, eir, eir_stuck);
@@ -4122,7 +4122,7 @@ static irqreturn_t i915_irq_handler(int irq, void *arg)
 		intel_uncore_write(&dev_priv->uncore, GEN2_IIR, iir);
 
 		if (iir & I915_USER_INTERRUPT)
-			intel_engine_signal_breadcrumbs(dev_priv->gt.engine[RCS0]);
+			intel_engine_cs_irq(dev_priv->gt.engine[RCS0], iir);
 
 		if (iir & I915_MASTER_ERROR_INTERRUPT)
 			i9xx_error_irq_handler(dev_priv, eir, eir_stuck);
@@ -4267,10 +4267,12 @@ static irqreturn_t i965_irq_handler(int irq, void *arg)
 		intel_uncore_write(&dev_priv->uncore, GEN2_IIR, iir);
 
 		if (iir & I915_USER_INTERRUPT)
-			intel_engine_signal_breadcrumbs(dev_priv->gt.engine[RCS0]);
+			intel_engine_cs_irq(dev_priv->gt.engine[RCS0],
+					    iir);
 
 		if (iir & I915_BSD_USER_INTERRUPT)
-			intel_engine_signal_breadcrumbs(dev_priv->gt.engine[VCS0]);
+			intel_engine_cs_irq(dev_priv->gt.engine[VCS0],
+					    iir >> 25);
 
 		if (iir & I915_MASTER_ERROR_INTERRUPT)
 			i9xx_error_irq_handler(dev_priv, eir, eir_stuck);
-- 
2.28.0



* [RFC PATCH 04/97] drm/i915/guc: skip disabling CTBs before sanitizing the GuC
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (2 preceding siblings ...)
  2021-05-06 19:13 ` [RFC PATCH 03/97] drm/i915/gt: Move CS interrupt handler to the backend Matthew Brost
@ 2021-05-06 19:13 ` Matthew Brost
  2021-05-20 16:47   ` Matthew Brost
  2021-05-06 19:13 ` [RFC PATCH 05/97] drm/i915/guc: use probe_error log for CT enablement failure Matthew Brost
                   ` (95 subsequent siblings)
  99 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:13 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

From: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

If we're about to sanitize the GuC, something might have gone wrong
beforehand, so we should avoid trying to talk to it. Even if the GuC is
still running fine, the sanitize will reset its internal state and clear
the CTB registration, so there is still no need to do so explicitly.

References: https://gitlab.freedesktop.org/drm/intel/-/issues/2469
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_uc.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index 6abb8f2dc33d..892c1315ce49 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -504,7 +504,7 @@ static int __uc_init_hw(struct intel_uc *uc)
 
 	ret = intel_guc_sample_forcewake(guc);
 	if (ret)
-		goto err_communication;
+		goto err_log_capture;
 
 	if (intel_uc_uses_guc_submission(uc))
 		intel_guc_submission_enable(guc);
@@ -529,8 +529,6 @@ static int __uc_init_hw(struct intel_uc *uc)
 	/*
 	 * We've failed to load the firmware :(
 	 */
-err_communication:
-	guc_disable_communication(guc);
 err_log_capture:
 	__uc_capture_load_err_log(uc);
 err_out:
@@ -558,9 +556,6 @@ static void __uc_fini_hw(struct intel_uc *uc)
 	if (intel_uc_uses_guc_submission(uc))
 		intel_guc_submission_disable(guc);
 
-	if (guc_communication_enabled(guc))
-		guc_disable_communication(guc);
-
 	__uc_sanitize(uc);
 }
 
@@ -577,7 +572,6 @@ void intel_uc_reset_prepare(struct intel_uc *uc)
 	if (!intel_guc_is_ready(guc))
 		return;
 
-	guc_disable_communication(guc);
 	__uc_sanitize(uc);
 }
 
-- 
2.28.0



* [RFC PATCH 05/97] drm/i915/guc: use probe_error log for CT enablement failure
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (3 preceding siblings ...)
  2021-05-06 19:13 ` [RFC PATCH 04/97] drm/i915/guc: skip disabling CTBs before sanitizing the GuC Matthew Brost
@ 2021-05-06 19:13 ` Matthew Brost
  2021-05-24 10:30   ` Michal Wajdeczko
  2021-05-06 19:13 ` [RFC PATCH 06/97] drm/i915/guc: enable only the user interrupt when using GuC submission Matthew Brost
                   ` (94 subsequent siblings)
  99 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:13 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

From: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

We have a couple of failure injection points in the CT enablement path,
so we need to use i915_probe_error() to select the appropriate log level.
A new macro (CT_PROBE_ERROR) has been added to the set of CT logging
macros to be used in this scenario and upcoming ones.

While adding the new macro, also fix the underlying logging calls used
by the existing ones (DRM_DEV_* -> drm_*) and move the inline helpers
above the macros that use them.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 48 ++++++++++++-----------
 1 file changed, 25 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index fa9e048cc65f..25618649048f 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -7,14 +7,36 @@
 #include "intel_guc_ct.h"
 #include "gt/intel_gt.h"
 
+static inline struct intel_guc *ct_to_guc(struct intel_guc_ct *ct)
+{
+	return container_of(ct, struct intel_guc, ct);
+}
+
+static inline struct intel_gt *ct_to_gt(struct intel_guc_ct *ct)
+{
+	return guc_to_gt(ct_to_guc(ct));
+}
+
+static inline struct drm_i915_private *ct_to_i915(struct intel_guc_ct *ct)
+{
+	return ct_to_gt(ct)->i915;
+}
+
+static inline struct drm_device *ct_to_drm(struct intel_guc_ct *ct)
+{
+	return &ct_to_i915(ct)->drm;
+}
+
 #define CT_ERROR(_ct, _fmt, ...) \
-	DRM_DEV_ERROR(ct_to_dev(_ct), "CT: " _fmt, ##__VA_ARGS__)
+	drm_err(ct_to_drm(_ct), "CT: " _fmt, ##__VA_ARGS__)
 #ifdef CONFIG_DRM_I915_DEBUG_GUC
 #define CT_DEBUG(_ct, _fmt, ...) \
-	DRM_DEV_DEBUG_DRIVER(ct_to_dev(_ct), "CT: " _fmt, ##__VA_ARGS__)
+	drm_dbg(ct_to_drm(_ct), "CT: " _fmt, ##__VA_ARGS__)
 #else
 #define CT_DEBUG(...)	do { } while (0)
 #endif
+#define CT_PROBE_ERROR(_ct, _fmt, ...) \
+	i915_probe_error(ct_to_i915(ct), "CT: " _fmt, ##__VA_ARGS__);
 
 struct ct_request {
 	struct list_head link;
@@ -47,26 +69,6 @@ void intel_guc_ct_init_early(struct intel_guc_ct *ct)
 	INIT_WORK(&ct->requests.worker, ct_incoming_request_worker_func);
 }
 
-static inline struct intel_guc *ct_to_guc(struct intel_guc_ct *ct)
-{
-	return container_of(ct, struct intel_guc, ct);
-}
-
-static inline struct intel_gt *ct_to_gt(struct intel_guc_ct *ct)
-{
-	return guc_to_gt(ct_to_guc(ct));
-}
-
-static inline struct drm_i915_private *ct_to_i915(struct intel_guc_ct *ct)
-{
-	return ct_to_gt(ct)->i915;
-}
-
-static inline struct device *ct_to_dev(struct intel_guc_ct *ct)
-{
-	return ct_to_i915(ct)->drm.dev;
-}
-
 static inline const char *guc_ct_buffer_type_to_str(u32 type)
 {
 	switch (type) {
@@ -264,7 +266,7 @@ int intel_guc_ct_enable(struct intel_guc_ct *ct)
 err_deregister:
 	ct_deregister_buffer(ct, INTEL_GUC_CT_BUFFER_TYPE_RECV);
 err_out:
-	CT_ERROR(ct, "Failed to open open CT channel (err=%d)\n", err);
+	CT_PROBE_ERROR(ct, "Failed to open channel (err=%d)\n", err);
 	return err;
 }
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 06/97] drm/i915/guc: enable only the user interrupt when using GuC submission
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (4 preceding siblings ...)
  2021-05-06 19:13 ` [RFC PATCH 05/97] drm/i915/guc: use probe_error log for CT enablement failure Matthew Brost
@ 2021-05-06 19:13 ` Matthew Brost
  2021-05-25  0:31   ` Matthew Brost
  2021-05-06 19:13 ` [RFC PATCH 07/97] drm/i915/guc: Remove sample_forcewake h2g action Matthew Brost
                   ` (93 subsequent siblings)
  99 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:13 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

From: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

In GuC submission mode the CS is owned by the GuC FW, so all CS status
interrupts are handled by it. We only need the user interrupt as that
signals request completion.

Since the engines now start directly in GuC submission mode when it is
selected, we no longer need to switch the interrupt programming back and
forth between the execlists and the GuC paths; we can simply select the
correct interrupt mask up front.
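
For reference, the mask composition used below in isolation (all values come
from the existing gen11_gt_irq_postinstall() code, nothing new is introduced):
each GT interrupt-enable register packs two engine instances, one per 16-bit
half, so the chosen irqs value is duplicated or shifted accordingly.

	u32 irqs = GT_RENDER_USER_INTERRUPT;	/* GuC mode: user irq only   */
	u32 dmask = irqs << 16 | irqs;		/* register with two engines */
	u32 smask = irqs << 16;			/* register with one engine  */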

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: John Harrison <john.c.harrison@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gt_irq.c        | 18 ++++++-----
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 31 -------------------
 2 files changed, 11 insertions(+), 38 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_irq.c b/drivers/gpu/drm/i915/gt/intel_gt_irq.c
index d29126c458ba..f88c10366e58 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_irq.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_irq.c
@@ -194,14 +194,18 @@ void gen11_gt_irq_reset(struct intel_gt *gt)
 
 void gen11_gt_irq_postinstall(struct intel_gt *gt)
 {
-	const u32 irqs =
-		GT_CS_MASTER_ERROR_INTERRUPT |
-		GT_RENDER_USER_INTERRUPT |
-		GT_CONTEXT_SWITCH_INTERRUPT |
-		GT_WAIT_SEMAPHORE_INTERRUPT;
 	struct intel_uncore *uncore = gt->uncore;
-	const u32 dmask = irqs << 16 | irqs;
-	const u32 smask = irqs << 16;
+	u32 irqs = GT_RENDER_USER_INTERRUPT;
+	u32 dmask;
+	u32 smask;
+
+	if (!intel_uc_wants_guc_submission(&gt->uc))
+		irqs |= GT_CS_MASTER_ERROR_INTERRUPT |
+			GT_CONTEXT_SWITCH_INTERRUPT |
+			GT_WAIT_SEMAPHORE_INTERRUPT;
+
+	dmask = irqs << 16 | irqs;
+	smask = irqs << 16;
 
 	BUILD_BUG_ON(irqs & 0xffff0000);
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 335719f17490..38cda5d599a6 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -432,32 +432,6 @@ void intel_guc_submission_fini(struct intel_guc *guc)
 	}
 }
 
-static void guc_interrupts_capture(struct intel_gt *gt)
-{
-	struct intel_uncore *uncore = gt->uncore;
-	u32 irqs = GT_CONTEXT_SWITCH_INTERRUPT;
-	u32 dmask = irqs << 16 | irqs;
-
-	GEM_BUG_ON(INTEL_GEN(gt->i915) < 11);
-
-	/* Don't handle the ctx switch interrupt in GuC submission mode */
-	intel_uncore_rmw(uncore, GEN11_RENDER_COPY_INTR_ENABLE, dmask, 0);
-	intel_uncore_rmw(uncore, GEN11_VCS_VECS_INTR_ENABLE, dmask, 0);
-}
-
-static void guc_interrupts_release(struct intel_gt *gt)
-{
-	struct intel_uncore *uncore = gt->uncore;
-	u32 irqs = GT_CONTEXT_SWITCH_INTERRUPT;
-	u32 dmask = irqs << 16 | irqs;
-
-	GEM_BUG_ON(INTEL_GEN(gt->i915) < 11);
-
-	/* Handle ctx switch interrupts again */
-	intel_uncore_rmw(uncore, GEN11_RENDER_COPY_INTR_ENABLE, 0, dmask);
-	intel_uncore_rmw(uncore, GEN11_VCS_VECS_INTR_ENABLE, 0, dmask);
-}
-
 static int guc_context_alloc(struct intel_context *ce)
 {
 	return lrc_alloc(ce, ce->engine);
@@ -722,9 +696,6 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
 void intel_guc_submission_enable(struct intel_guc *guc)
 {
 	guc_stage_desc_init(guc);
-
-	/* Take over from manual control of ELSP (execlists) */
-	guc_interrupts_capture(guc_to_gt(guc));
 }
 
 void intel_guc_submission_disable(struct intel_guc *guc)
@@ -735,8 +706,6 @@ void intel_guc_submission_disable(struct intel_guc *guc)
 
 	/* Note: By the time we're here, GuC may have already been reset */
 
-	guc_interrupts_release(gt);
-
 	guc_stage_desc_fini(guc);
 }
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 07/97] drm/i915/guc: Remove sample_forcewake h2g action
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (5 preceding siblings ...)
  2021-05-06 19:13 ` [RFC PATCH 06/97] drm/i915/guc: enable only the user interrupt when using GuC submission Matthew Brost
@ 2021-05-06 19:13 ` Matthew Brost
  2021-05-24 10:48   ` Michal Wajdeczko
  2021-05-25  0:36   ` Matthew Brost
  2021-05-06 19:13 ` [RFC PATCH 08/97] drm/i915/guc: Keep strict GuC ABI definitions Matthew Brost
                   ` (92 subsequent siblings)
  99 siblings, 2 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:13 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

From: Rodrigo Vivi <rodrigo.vivi@intel.com>

This action has been a no-op on the GuC side for a few firmware versions
already and is going to be removed entirely in an upcoming version.

Remove it now, before its absence causes communication issues.

Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc.c      | 16 ----------------
 drivers/gpu/drm/i915/gt/uc/intel_guc.h      |  1 -
 drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h |  4 ----
 drivers/gpu/drm/i915/gt/uc/intel_uc.c       |  4 ----
 4 files changed, 25 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
index adae04c47aab..ab2c8fe8cdfa 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
@@ -469,22 +469,6 @@ int intel_guc_to_host_process_recv_msg(struct intel_guc *guc,
 	return 0;
 }
 
-int intel_guc_sample_forcewake(struct intel_guc *guc)
-{
-	struct drm_i915_private *dev_priv = guc_to_gt(guc)->i915;
-	u32 action[2];
-
-	action[0] = INTEL_GUC_ACTION_SAMPLE_FORCEWAKE;
-	/* WaRsDisableCoarsePowerGating:skl,cnl */
-	if (!HAS_RC6(dev_priv) || NEEDS_WaRsDisableCoarsePowerGating(dev_priv))
-		action[1] = 0;
-	else
-		/* bit 0 and 1 are for Render and Media domain separately */
-		action[1] = GUC_FORCEWAKE_RENDER | GUC_FORCEWAKE_MEDIA;
-
-	return intel_guc_send(guc, action, ARRAY_SIZE(action));
-}
-
 /**
  * intel_guc_auth_huc() - Send action to GuC to authenticate HuC ucode
  * @guc: intel_guc structure
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index bc2ba7d0626c..c20f3839de12 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -128,7 +128,6 @@ int intel_guc_send_mmio(struct intel_guc *guc, const u32 *action, u32 len,
 			u32 *response_buf, u32 response_buf_size);
 int intel_guc_to_host_process_recv_msg(struct intel_guc *guc,
 				       const u32 *payload, u32 len);
-int intel_guc_sample_forcewake(struct intel_guc *guc);
 int intel_guc_auth_huc(struct intel_guc *guc, u32 rsa_offset);
 int intel_guc_suspend(struct intel_guc *guc);
 int intel_guc_resume(struct intel_guc *guc);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
index 79c560d9c0b6..0f9afcde1d0b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
@@ -302,9 +302,6 @@ struct guc_ct_buffer_desc {
 #define GUC_CT_MSG_ACTION_SHIFT			16
 #define GUC_CT_MSG_ACTION_MASK			0xFFFF
 
-#define GUC_FORCEWAKE_RENDER	(1 << 0)
-#define GUC_FORCEWAKE_MEDIA	(1 << 1)
-
 #define GUC_POWER_UNSPECIFIED	0
 #define GUC_POWER_D0		1
 #define GUC_POWER_D1		2
@@ -558,7 +555,6 @@ enum intel_guc_action {
 	INTEL_GUC_ACTION_ENTER_S_STATE = 0x501,
 	INTEL_GUC_ACTION_EXIT_S_STATE = 0x502,
 	INTEL_GUC_ACTION_SLPC_REQUEST = 0x3003,
-	INTEL_GUC_ACTION_SAMPLE_FORCEWAKE = 0x3005,
 	INTEL_GUC_ACTION_AUTHENTICATE_HUC = 0x4000,
 	INTEL_GUC_ACTION_REGISTER_COMMAND_TRANSPORT_BUFFER = 0x4505,
 	INTEL_GUC_ACTION_DEREGISTER_COMMAND_TRANSPORT_BUFFER = 0x4506,
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index 892c1315ce49..ab0789d66e06 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -502,10 +502,6 @@ static int __uc_init_hw(struct intel_uc *uc)
 
 	intel_huc_auth(huc);
 
-	ret = intel_guc_sample_forcewake(guc);
-	if (ret)
-		goto err_log_capture;
-
 	if (intel_uc_uses_guc_submission(uc))
 		intel_guc_submission_enable(guc);
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 08/97] drm/i915/guc: Keep strict GuC ABI definitions
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (6 preceding siblings ...)
  2021-05-06 19:13 ` [RFC PATCH 07/97] drm/i915/guc: Remove sample_forcewake h2g action Matthew Brost
@ 2021-05-06 19:13 ` Matthew Brost
  2021-05-24 23:52   ` Michał Winiarski
  2021-05-06 19:13 ` [RFC PATCH 09/97] drm/i915/guc: Stop using fence/status from CTB descriptor Matthew Brost
                   ` (91 subsequent siblings)
  99 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:13 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

From: Michal Wajdeczko <michal.wajdeczko@intel.com>

Our fwif.h file is now a mix of strict firmware ABI definitions and
a set of our own helpers. In anticipation of upcoming changes to the
GuC interface, keep them separate in smaller, more maintainable files.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
---
 .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h  |  51 +++++
 .../gt/uc/abi/guc_communication_ctb_abi.h     | 106 +++++++++
 .../gt/uc/abi/guc_communication_mmio_abi.h    |  52 +++++
 .../gpu/drm/i915/gt/uc/abi/guc_errors_abi.h   |  14 ++
 .../gpu/drm/i915/gt/uc/abi/guc_messages_abi.h |  21 ++
 drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   | 203 +-----------------
 6 files changed, 250 insertions(+), 197 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
 create mode 100644 drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h
 create mode 100644 drivers/gpu/drm/i915/gt/uc/abi/guc_communication_mmio_abi.h
 create mode 100644 drivers/gpu/drm/i915/gt/uc/abi/guc_errors_abi.h
 create mode 100644 drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h

diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
new file mode 100644
index 000000000000..90efef8a73e4
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
@@ -0,0 +1,51 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2014-2021 Intel Corporation
+ */
+
+#ifndef _ABI_GUC_ACTIONS_ABI_H
+#define _ABI_GUC_ACTIONS_ABI_H
+
+enum intel_guc_action {
+	INTEL_GUC_ACTION_DEFAULT = 0x0,
+	INTEL_GUC_ACTION_REQUEST_PREEMPTION = 0x2,
+	INTEL_GUC_ACTION_REQUEST_ENGINE_RESET = 0x3,
+	INTEL_GUC_ACTION_ALLOCATE_DOORBELL = 0x10,
+	INTEL_GUC_ACTION_DEALLOCATE_DOORBELL = 0x20,
+	INTEL_GUC_ACTION_LOG_BUFFER_FILE_FLUSH_COMPLETE = 0x30,
+	INTEL_GUC_ACTION_UK_LOG_ENABLE_LOGGING = 0x40,
+	INTEL_GUC_ACTION_FORCE_LOG_BUFFER_FLUSH = 0x302,
+	INTEL_GUC_ACTION_ENTER_S_STATE = 0x501,
+	INTEL_GUC_ACTION_EXIT_S_STATE = 0x502,
+	INTEL_GUC_ACTION_SLPC_REQUEST = 0x3003,
+	INTEL_GUC_ACTION_AUTHENTICATE_HUC = 0x4000,
+	INTEL_GUC_ACTION_REGISTER_COMMAND_TRANSPORT_BUFFER = 0x4505,
+	INTEL_GUC_ACTION_DEREGISTER_COMMAND_TRANSPORT_BUFFER = 0x4506,
+	INTEL_GUC_ACTION_LIMIT
+};
+
+enum intel_guc_preempt_options {
+	INTEL_GUC_PREEMPT_OPTION_DROP_WORK_Q = 0x4,
+	INTEL_GUC_PREEMPT_OPTION_DROP_SUBMIT_Q = 0x8,
+};
+
+enum intel_guc_report_status {
+	INTEL_GUC_REPORT_STATUS_UNKNOWN = 0x0,
+	INTEL_GUC_REPORT_STATUS_ACKED = 0x1,
+	INTEL_GUC_REPORT_STATUS_ERROR = 0x2,
+	INTEL_GUC_REPORT_STATUS_COMPLETE = 0x4,
+};
+
+enum intel_guc_sleep_state_status {
+	INTEL_GUC_SLEEP_STATE_SUCCESS = 0x1,
+	INTEL_GUC_SLEEP_STATE_PREEMPT_TO_IDLE_FAILED = 0x2,
+	INTEL_GUC_SLEEP_STATE_ENGINE_RESET_FAILED = 0x3
+#define INTEL_GUC_SLEEP_STATE_INVALID_MASK 0x80000000
+};
+
+#define GUC_LOG_CONTROL_LOGGING_ENABLED	(1 << 0)
+#define GUC_LOG_CONTROL_VERBOSITY_SHIFT	4
+#define GUC_LOG_CONTROL_VERBOSITY_MASK	(0xF << GUC_LOG_CONTROL_VERBOSITY_SHIFT)
+#define GUC_LOG_CONTROL_DEFAULT_LOGGING	(1 << 8)
+
+#endif /* _ABI_GUC_ACTIONS_ABI_H */
diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h
new file mode 100644
index 000000000000..ebd8c3e0e4bb
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h
@@ -0,0 +1,106 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2014-2021 Intel Corporation
+ */
+
+#ifndef _ABI_GUC_COMMUNICATION_CTB_ABI_H
+#define _ABI_GUC_COMMUNICATION_CTB_ABI_H
+
+#include <linux/types.h>
+
+/**
+ * DOC: CTB based communication
+ *
+ * The CTB (command transport buffer) communication between Host and GuC
+ * is based on u32 data stream written to the shared buffer. One buffer can
+ * be used to transmit data only in one direction (one-directional channel).
+ *
+ * The current status of each buffer is stored in the buffer descriptor.
+ * The buffer descriptor holds tail and head fields that represent the active
+ * data stream. The tail field is updated by the data producer (sender), and
+ * the head field is updated by the data consumer (receiver)::
+ *
+ *      +------------+
+ *      | DESCRIPTOR |          +=================+============+========+
+ *      +============+          |                 | MESSAGE(s) |        |
+ *      | address    |--------->+=================+============+========+
+ *      +------------+
+ *      | head       |          ^-----head--------^
+ *      +------------+
+ *      | tail       |          ^---------tail-----------------^
+ *      +------------+
+ *      | size       |          ^---------------size--------------------^
+ *      +------------+
+ *
+ * Each message in the data stream starts with a single u32 treated as a header,
+ * followed by an optional set of u32 data that makes the message-specific payload::
+ *
+ *      +------------+---------+---------+---------+
+ *      |         MESSAGE                          |
+ *      +------------+---------+---------+---------+
+ *      |   msg[0]   |   [1]   |   ...   |  [n-1]  |
+ *      +------------+---------+---------+---------+
+ *      |   MESSAGE  |       MESSAGE PAYLOAD       |
+ *      +   HEADER   +---------+---------+---------+
+ *      |            |    0    |   ...   |    n    |
+ *      +======+=====+=========+=========+=========+
+ *      | 31:16| code|         |         |         |
+ *      +------+-----+         |         |         |
+ *      |  15:5|flags|         |         |         |
+ *      +------+-----+         |         |         |
+ *      |   4:0|  len|         |         |         |
+ *      +------+-----+---------+---------+---------+
+ *
+ *                   ^-------------len-------------^
+ *
+ * The message header consists of:
+ *
+ * - **len**, indicates length of the message payload (in u32)
+ * - **code**, indicates message code
+ * - **flags**, holds various bits to control message handling
+ */
+
+/*
+ * Describes single command transport buffer.
+ * Used by both guc-master and clients.
+ */
+struct guc_ct_buffer_desc {
+	u32 addr;		/* gfx address */
+	u64 host_private;	/* host private data */
+	u32 size;		/* size in bytes */
+	u32 head;		/* offset updated by GuC*/
+	u32 tail;		/* offset updated by owner */
+	u32 is_in_error;	/* error indicator */
+	u32 fence;		/* fence updated by GuC */
+	u32 status;		/* status updated by GuC */
+	u32 owner;		/* id of the channel owner */
+	u32 owner_sub_id;	/* owner-defined field for extra tracking */
+	u32 reserved[5];
+} __packed;
+
+/* Type of command transport buffer */
+#define INTEL_GUC_CT_BUFFER_TYPE_SEND	0x0u
+#define INTEL_GUC_CT_BUFFER_TYPE_RECV	0x1u
+
+/*
+ * Definition of the command transport message header (DW0)
+ *
+ * bit[4..0]	message len (in dwords)
+ * bit[7..5]	reserved
+ * bit[8]	response (G2H only)
+ * bit[8]	write fence to desc (H2G only)
+ * bit[9]	write status to H2G buff (H2G only)
+ * bit[10]	send status back via G2H (H2G only)
+ * bit[15..11]	reserved
+ * bit[31..16]	action code
+ */
+#define GUC_CT_MSG_LEN_SHIFT			0
+#define GUC_CT_MSG_LEN_MASK			0x1F
+#define GUC_CT_MSG_IS_RESPONSE			(1 << 8)
+#define GUC_CT_MSG_WRITE_FENCE_TO_DESC		(1 << 8)
+#define GUC_CT_MSG_WRITE_STATUS_TO_BUFF		(1 << 9)
+#define GUC_CT_MSG_SEND_STATUS			(1 << 10)
+#define GUC_CT_MSG_ACTION_SHIFT			16
+#define GUC_CT_MSG_ACTION_MASK			0xFFFF
+
+#endif /* _ABI_GUC_COMMUNICATION_CTB_ABI_H */
diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_mmio_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_mmio_abi.h
new file mode 100644
index 000000000000..be066a62e9e0
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_mmio_abi.h
@@ -0,0 +1,52 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2014-2021 Intel Corporation
+ */
+
+#ifndef _ABI_GUC_COMMUNICATION_MMIO_ABI_H
+#define _ABI_GUC_COMMUNICATION_MMIO_ABI_H
+
+/**
+ * DOC: MMIO based communication
+ *
+ * The MMIO based communication between Host and GuC uses software scratch
+ * registers, where first register holds data treated as message header,
+ * and other registers are used to hold message payload.
+ *
+ * For Gen9+, GuC uses software scratch registers 0xC180-0xC1B8,
+ * but no H2G command takes more than 8 parameters and the GuC FW
+ * itself uses an 8-element array to store the H2G message.
+ *
+ *      +-----------+---------+---------+---------+
+ *      |  MMIO[0]  | MMIO[1] |   ...   | MMIO[n] |
+ *      +-----------+---------+---------+---------+
+ *      | header    |      optional payload       |
+ *      +======+====+=========+=========+=========+
+ *      | 31:28|type|         |         |         |
+ *      +------+----+         |         |         |
+ *      | 27:16|data|         |         |         |
+ *      +------+----+         |         |         |
+ *      |  15:0|code|         |         |         |
+ *      +------+----+---------+---------+---------+
+ *
+ * The message header consists of:
+ *
+ * - **type**, indicates message type
+ * - **code**, indicates message code, is specific for **type**
+ * - **data**, indicates message data, optional, depends on **code**
+ *
+ * The following message **types** are supported:
+ *
+ * - **REQUEST**, indicates Host-to-GuC request, requested GuC action code
+ *   must be provided in **code** field. Optional action specific parameters
+ *   can be provided in remaining payload registers or **data** field.
+ *
+ * - **RESPONSE**, indicates GuC-to-Host response from earlier GuC request,
+ *   action response status will be provided in **code** field. Optional
+ *   response data can be returned in remaining payload registers or **data**
+ *   field.
+ */
+
+#define GUC_MAX_MMIO_MSG_LEN		8
+
+#endif /* _ABI_GUC_COMMUNICATION_MMIO_ABI_H */
diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_errors_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_errors_abi.h
new file mode 100644
index 000000000000..488b6061ee89
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_errors_abi.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2014-2021 Intel Corporation
+ */
+
+#ifndef _ABI_GUC_ERRORS_ABI_H
+#define _ABI_GUC_ERRORS_ABI_H
+
+enum intel_guc_response_status {
+	INTEL_GUC_RESPONSE_STATUS_SUCCESS = 0x0,
+	INTEL_GUC_RESPONSE_STATUS_GENERIC_FAIL = 0xF000,
+};
+
+#endif /* _ABI_GUC_ERRORS_ABI_H */
diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h
new file mode 100644
index 000000000000..775e21f3058c
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2014-2021 Intel Corporation
+ */
+
+#ifndef _ABI_GUC_MESSAGES_ABI_H
+#define _ABI_GUC_MESSAGES_ABI_H
+
+#define INTEL_GUC_MSG_TYPE_SHIFT	28
+#define INTEL_GUC_MSG_TYPE_MASK		(0xF << INTEL_GUC_MSG_TYPE_SHIFT)
+#define INTEL_GUC_MSG_DATA_SHIFT	16
+#define INTEL_GUC_MSG_DATA_MASK		(0xFFF << INTEL_GUC_MSG_DATA_SHIFT)
+#define INTEL_GUC_MSG_CODE_SHIFT	0
+#define INTEL_GUC_MSG_CODE_MASK		(0xFFFF << INTEL_GUC_MSG_CODE_SHIFT)
+
+enum intel_guc_msg_type {
+	INTEL_GUC_MSG_TYPE_REQUEST = 0x0,
+	INTEL_GUC_MSG_TYPE_RESPONSE = 0xF,
+};
+
+#endif /* _ABI_GUC_MESSAGES_ABI_H */
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
index 0f9afcde1d0b..9bf35240e723 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
@@ -10,6 +10,12 @@
 #include <linux/compiler.h>
 #include <linux/types.h>
 
+#include "abi/guc_actions_abi.h"
+#include "abi/guc_errors_abi.h"
+#include "abi/guc_communication_mmio_abi.h"
+#include "abi/guc_communication_ctb_abi.h"
+#include "abi/guc_messages_abi.h"
+
 #define GUC_CLIENT_PRIORITY_KMD_HIGH	0
 #define GUC_CLIENT_PRIORITY_HIGH	1
 #define GUC_CLIENT_PRIORITY_KMD_NORMAL	2
@@ -207,101 +213,6 @@ struct guc_stage_desc {
 	u64 desc_private;
 } __packed;
 
-/**
- * DOC: CTB based communication
- *
- * The CTB (command transport buffer) communication between Host and GuC
- * is based on u32 data stream written to the shared buffer. One buffer can
- * be used to transmit data only in one direction (one-directional channel).
- *
- * Current status of the each buffer is stored in the buffer descriptor.
- * Buffer descriptor holds tail and head fields that represents active data
- * stream. The tail field is updated by the data producer (sender), and head
- * field is updated by the data consumer (receiver)::
- *
- *      +------------+
- *      | DESCRIPTOR |          +=================+============+========+
- *      +============+          |                 | MESSAGE(s) |        |
- *      | address    |--------->+=================+============+========+
- *      +------------+
- *      | head       |          ^-----head--------^
- *      +------------+
- *      | tail       |          ^---------tail-----------------^
- *      +------------+
- *      | size       |          ^---------------size--------------------^
- *      +------------+
- *
- * Each message in data stream starts with the single u32 treated as a header,
- * followed by optional set of u32 data that makes message specific payload::
- *
- *      +------------+---------+---------+---------+
- *      |         MESSAGE                          |
- *      +------------+---------+---------+---------+
- *      |   msg[0]   |   [1]   |   ...   |  [n-1]  |
- *      +------------+---------+---------+---------+
- *      |   MESSAGE  |       MESSAGE PAYLOAD       |
- *      +   HEADER   +---------+---------+---------+
- *      |            |    0    |   ...   |    n    |
- *      +======+=====+=========+=========+=========+
- *      | 31:16| code|         |         |         |
- *      +------+-----+         |         |         |
- *      |  15:5|flags|         |         |         |
- *      +------+-----+         |         |         |
- *      |   4:0|  len|         |         |         |
- *      +------+-----+---------+---------+---------+
- *
- *                   ^-------------len-------------^
- *
- * The message header consists of:
- *
- * - **len**, indicates length of the message payload (in u32)
- * - **code**, indicates message code
- * - **flags**, holds various bits to control message handling
- */
-
-/*
- * Describes single command transport buffer.
- * Used by both guc-master and clients.
- */
-struct guc_ct_buffer_desc {
-	u32 addr;		/* gfx address */
-	u64 host_private;	/* host private data */
-	u32 size;		/* size in bytes */
-	u32 head;		/* offset updated by GuC*/
-	u32 tail;		/* offset updated by owner */
-	u32 is_in_error;	/* error indicator */
-	u32 fence;		/* fence updated by GuC */
-	u32 status;		/* status updated by GuC */
-	u32 owner;		/* id of the channel owner */
-	u32 owner_sub_id;	/* owner-defined field for extra tracking */
-	u32 reserved[5];
-} __packed;
-
-/* Type of command transport buffer */
-#define INTEL_GUC_CT_BUFFER_TYPE_SEND	0x0u
-#define INTEL_GUC_CT_BUFFER_TYPE_RECV	0x1u
-
-/*
- * Definition of the command transport message header (DW0)
- *
- * bit[4..0]	message len (in dwords)
- * bit[7..5]	reserved
- * bit[8]	response (G2H only)
- * bit[8]	write fence to desc (H2G only)
- * bit[9]	write status to H2G buff (H2G only)
- * bit[10]	send status back via G2H (H2G only)
- * bit[15..11]	reserved
- * bit[31..16]	action code
- */
-#define GUC_CT_MSG_LEN_SHIFT			0
-#define GUC_CT_MSG_LEN_MASK			0x1F
-#define GUC_CT_MSG_IS_RESPONSE			(1 << 8)
-#define GUC_CT_MSG_WRITE_FENCE_TO_DESC		(1 << 8)
-#define GUC_CT_MSG_WRITE_STATUS_TO_BUFF		(1 << 9)
-#define GUC_CT_MSG_SEND_STATUS			(1 << 10)
-#define GUC_CT_MSG_ACTION_SHIFT			16
-#define GUC_CT_MSG_ACTION_MASK			0xFFFF
-
 #define GUC_POWER_UNSPECIFIED	0
 #define GUC_POWER_D0		1
 #define GUC_POWER_D1		2
@@ -477,119 +388,17 @@ struct guc_shared_ctx_data {
 	struct guc_ctx_report preempt_ctx_report[GUC_MAX_ENGINES_NUM];
 } __packed;
 
-/**
- * DOC: MMIO based communication
- *
- * The MMIO based communication between Host and GuC uses software scratch
- * registers, where first register holds data treated as message header,
- * and other registers are used to hold message payload.
- *
- * For Gen9+, GuC uses software scratch registers 0xC180-0xC1B8,
- * but no H2G command takes more than 8 parameters and the GuC FW
- * itself uses an 8-element array to store the H2G message.
- *
- *      +-----------+---------+---------+---------+
- *      |  MMIO[0]  | MMIO[1] |   ...   | MMIO[n] |
- *      +-----------+---------+---------+---------+
- *      | header    |      optional payload       |
- *      +======+====+=========+=========+=========+
- *      | 31:28|type|         |         |         |
- *      +------+----+         |         |         |
- *      | 27:16|data|         |         |         |
- *      +------+----+         |         |         |
- *      |  15:0|code|         |         |         |
- *      +------+----+---------+---------+---------+
- *
- * The message header consists of:
- *
- * - **type**, indicates message type
- * - **code**, indicates message code, is specific for **type**
- * - **data**, indicates message data, optional, depends on **code**
- *
- * The following message **types** are supported:
- *
- * - **REQUEST**, indicates Host-to-GuC request, requested GuC action code
- *   must be priovided in **code** field. Optional action specific parameters
- *   can be provided in remaining payload registers or **data** field.
- *
- * - **RESPONSE**, indicates GuC-to-Host response from earlier GuC request,
- *   action response status will be provided in **code** field. Optional
- *   response data can be returned in remaining payload registers or **data**
- *   field.
- */
-
-#define GUC_MAX_MMIO_MSG_LEN		8
-
-#define INTEL_GUC_MSG_TYPE_SHIFT	28
-#define INTEL_GUC_MSG_TYPE_MASK		(0xF << INTEL_GUC_MSG_TYPE_SHIFT)
-#define INTEL_GUC_MSG_DATA_SHIFT	16
-#define INTEL_GUC_MSG_DATA_MASK		(0xFFF << INTEL_GUC_MSG_DATA_SHIFT)
-#define INTEL_GUC_MSG_CODE_SHIFT	0
-#define INTEL_GUC_MSG_CODE_MASK		(0xFFFF << INTEL_GUC_MSG_CODE_SHIFT)
-
 #define __INTEL_GUC_MSG_GET(T, m) \
 	(((m) & INTEL_GUC_MSG_ ## T ## _MASK) >> INTEL_GUC_MSG_ ## T ## _SHIFT)
 #define INTEL_GUC_MSG_TO_TYPE(m)	__INTEL_GUC_MSG_GET(TYPE, m)
 #define INTEL_GUC_MSG_TO_DATA(m)	__INTEL_GUC_MSG_GET(DATA, m)
 #define INTEL_GUC_MSG_TO_CODE(m)	__INTEL_GUC_MSG_GET(CODE, m)
 
-enum intel_guc_msg_type {
-	INTEL_GUC_MSG_TYPE_REQUEST = 0x0,
-	INTEL_GUC_MSG_TYPE_RESPONSE = 0xF,
-};
-
 #define __INTEL_GUC_MSG_TYPE_IS(T, m) \
 	(INTEL_GUC_MSG_TO_TYPE(m) == INTEL_GUC_MSG_TYPE_ ## T)
 #define INTEL_GUC_MSG_IS_REQUEST(m)	__INTEL_GUC_MSG_TYPE_IS(REQUEST, m)
 #define INTEL_GUC_MSG_IS_RESPONSE(m)	__INTEL_GUC_MSG_TYPE_IS(RESPONSE, m)
 
-enum intel_guc_action {
-	INTEL_GUC_ACTION_DEFAULT = 0x0,
-	INTEL_GUC_ACTION_REQUEST_PREEMPTION = 0x2,
-	INTEL_GUC_ACTION_REQUEST_ENGINE_RESET = 0x3,
-	INTEL_GUC_ACTION_ALLOCATE_DOORBELL = 0x10,
-	INTEL_GUC_ACTION_DEALLOCATE_DOORBELL = 0x20,
-	INTEL_GUC_ACTION_LOG_BUFFER_FILE_FLUSH_COMPLETE = 0x30,
-	INTEL_GUC_ACTION_UK_LOG_ENABLE_LOGGING = 0x40,
-	INTEL_GUC_ACTION_FORCE_LOG_BUFFER_FLUSH = 0x302,
-	INTEL_GUC_ACTION_ENTER_S_STATE = 0x501,
-	INTEL_GUC_ACTION_EXIT_S_STATE = 0x502,
-	INTEL_GUC_ACTION_SLPC_REQUEST = 0x3003,
-	INTEL_GUC_ACTION_AUTHENTICATE_HUC = 0x4000,
-	INTEL_GUC_ACTION_REGISTER_COMMAND_TRANSPORT_BUFFER = 0x4505,
-	INTEL_GUC_ACTION_DEREGISTER_COMMAND_TRANSPORT_BUFFER = 0x4506,
-	INTEL_GUC_ACTION_LIMIT
-};
-
-enum intel_guc_preempt_options {
-	INTEL_GUC_PREEMPT_OPTION_DROP_WORK_Q = 0x4,
-	INTEL_GUC_PREEMPT_OPTION_DROP_SUBMIT_Q = 0x8,
-};
-
-enum intel_guc_report_status {
-	INTEL_GUC_REPORT_STATUS_UNKNOWN = 0x0,
-	INTEL_GUC_REPORT_STATUS_ACKED = 0x1,
-	INTEL_GUC_REPORT_STATUS_ERROR = 0x2,
-	INTEL_GUC_REPORT_STATUS_COMPLETE = 0x4,
-};
-
-enum intel_guc_sleep_state_status {
-	INTEL_GUC_SLEEP_STATE_SUCCESS = 0x1,
-	INTEL_GUC_SLEEP_STATE_PREEMPT_TO_IDLE_FAILED = 0x2,
-	INTEL_GUC_SLEEP_STATE_ENGINE_RESET_FAILED = 0x3
-#define INTEL_GUC_SLEEP_STATE_INVALID_MASK 0x80000000
-};
-
-#define GUC_LOG_CONTROL_LOGGING_ENABLED	(1 << 0)
-#define GUC_LOG_CONTROL_VERBOSITY_SHIFT	4
-#define GUC_LOG_CONTROL_VERBOSITY_MASK	(0xF << GUC_LOG_CONTROL_VERBOSITY_SHIFT)
-#define GUC_LOG_CONTROL_DEFAULT_LOGGING	(1 << 8)
-
-enum intel_guc_response_status {
-	INTEL_GUC_RESPONSE_STATUS_SUCCESS = 0x0,
-	INTEL_GUC_RESPONSE_STATUS_GENERIC_FAIL = 0xF000,
-};
-
 #define INTEL_GUC_MSG_IS_RESPONSE_SUCCESS(m) \
 	 (typecheck(u32, (m)) && \
 	  ((m) & (INTEL_GUC_MSG_TYPE_MASK | INTEL_GUC_MSG_CODE_MASK)) == \
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 09/97] drm/i915/guc: Stop using fence/status from CTB descriptor
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (7 preceding siblings ...)
  2021-05-06 19:13 ` [RFC PATCH 08/97] drm/i915/guc: Keep strict GuC ABI definitions Matthew Brost
@ 2021-05-06 19:13 ` Matthew Brost
  2021-05-25  2:38   ` Matthew Brost
  2021-05-06 19:13 ` [RFC PATCH 10/97] drm/i915: Promote ptrdiff() to i915_utils.h Matthew Brost
                   ` (90 subsequent siblings)
  99 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:13 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

From: Michal Wajdeczko <michal.wajdeczko@intel.com>

Stop using the fence/status fields from the CTB descriptor, as the
future GuC ABI will no longer support replies via the descriptor.
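
A simplified sketch of the reply tracking that remains after this change,
with locking and error handling elided; the names follow the functions
already present in intel_guc_ct.c. The sender queues a ct_request and waits
for the G2H receive path to update its status, instead of polling fields in
the shared descriptor.

	/* simplified send path, locking elided */
	struct ct_request request = { .fence = fence, .status = 0 };

	list_add_tail(&request.link, &ct->requests.pending);

	err = ct_write(ct, action, len, fence);
	if (!err) {
		intel_guc_notify(ct_to_guc(ct));
		err = wait_for_ct_request_update(&request, status);
	}

	list_del(&request.link);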

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 .../gt/uc/abi/guc_communication_ctb_abi.h     |  4 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     | 72 ++-----------------
 2 files changed, 6 insertions(+), 70 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h
index ebd8c3e0e4bb..d38935f47ecf 100644
--- a/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h
+++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h
@@ -71,8 +71,8 @@ struct guc_ct_buffer_desc {
 	u32 head;		/* offset updated by GuC*/
 	u32 tail;		/* offset updated by owner */
 	u32 is_in_error;	/* error indicator */
-	u32 fence;		/* fence updated by GuC */
-	u32 status;		/* status updated by GuC */
+	u32 reserved1;
+	u32 reserved2;
 	u32 owner;		/* id of the channel owner */
 	u32 owner_sub_id;	/* owner-defined field for extra tracking */
 	u32 reserved[5];
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 25618649048f..4cc8c0b71699 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -90,13 +90,6 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc,
 	desc->owner = CTB_OWNER_HOST;
 }
 
-static void guc_ct_buffer_desc_reset(struct guc_ct_buffer_desc *desc)
-{
-	desc->head = 0;
-	desc->tail = 0;
-	desc->is_in_error = 0;
-}
-
 static int guc_action_register_ct_buffer(struct intel_guc *guc,
 					 u32 desc_addr,
 					 u32 type)
@@ -315,8 +308,7 @@ static u32 ct_get_next_fence(struct intel_guc_ct *ct)
 static int ct_write(struct intel_guc_ct *ct,
 		    const u32 *action,
 		    u32 len /* in dwords */,
-		    u32 fence,
-		    bool want_response)
+		    u32 fence)
 {
 	struct intel_guc_ct_buffer *ctb = &ct->ctbs[CTB_SEND];
 	struct guc_ct_buffer_desc *desc = ctb->desc;
@@ -360,8 +352,7 @@ static int ct_write(struct intel_guc_ct *ct,
 	 * DW2+: action data
 	 */
 	header = (len << GUC_CT_MSG_LEN_SHIFT) |
-		 (GUC_CT_MSG_WRITE_FENCE_TO_DESC) |
-		 (want_response ? GUC_CT_MSG_SEND_STATUS : 0) |
+		 GUC_CT_MSG_SEND_STATUS |
 		 (action[0] << GUC_CT_MSG_ACTION_SHIFT);
 
 	CT_DEBUG(ct, "writing %*ph %*ph %*ph\n",
@@ -390,56 +381,6 @@ static int ct_write(struct intel_guc_ct *ct,
 	return -EPIPE;
 }
 
-/**
- * wait_for_ctb_desc_update - Wait for the CT buffer descriptor update.
- * @desc:	buffer descriptor
- * @fence:	response fence
- * @status:	placeholder for status
- *
- * Guc will update CT buffer descriptor with new fence and status
- * after processing the command identified by the fence. Wait for
- * specified fence and then read from the descriptor status of the
- * command.
- *
- * Return:
- * *	0 response received (status is valid)
- * *	-ETIMEDOUT no response within hardcoded timeout
- * *	-EPROTO no response, CT buffer is in error
- */
-static int wait_for_ctb_desc_update(struct guc_ct_buffer_desc *desc,
-				    u32 fence,
-				    u32 *status)
-{
-	int err;
-
-	/*
-	 * Fast commands should complete in less than 10us, so sample quickly
-	 * up to that length of time, then switch to a slower sleep-wait loop.
-	 * No GuC command should ever take longer than 10ms.
-	 */
-#define done (READ_ONCE(desc->fence) == fence)
-	err = wait_for_us(done, 10);
-	if (err)
-		err = wait_for(done, 10);
-#undef done
-
-	if (unlikely(err)) {
-		DRM_ERROR("CT: fence %u failed; reported fence=%u\n",
-			  fence, desc->fence);
-
-		if (WARN_ON(desc->is_in_error)) {
-			/* Something went wrong with the messaging, try to reset
-			 * the buffer and hope for the best
-			 */
-			guc_ct_buffer_desc_reset(desc);
-			err = -EPROTO;
-		}
-	}
-
-	*status = desc->status;
-	return err;
-}
-
 /**
  * wait_for_ct_request_update - Wait for CT request state update.
  * @req:	pointer to pending request
@@ -483,8 +424,6 @@ static int ct_send(struct intel_guc_ct *ct,
 		   u32 response_buf_size,
 		   u32 *status)
 {
-	struct intel_guc_ct_buffer *ctb = &ct->ctbs[CTB_SEND];
-	struct guc_ct_buffer_desc *desc = ctb->desc;
 	struct ct_request request;
 	unsigned long flags;
 	u32 fence;
@@ -505,16 +444,13 @@ static int ct_send(struct intel_guc_ct *ct,
 	list_add_tail(&request.link, &ct->requests.pending);
 	spin_unlock_irqrestore(&ct->requests.lock, flags);
 
-	err = ct_write(ct, action, len, fence, !!response_buf);
+	err = ct_write(ct, action, len, fence);
 	if (unlikely(err))
 		goto unlink;
 
 	intel_guc_notify(ct_to_guc(ct));
 
-	if (response_buf)
-		err = wait_for_ct_request_update(&request, status);
-	else
-		err = wait_for_ctb_desc_update(desc, fence, status);
+	err = wait_for_ct_request_update(&request, status);
 	if (unlikely(err))
 		goto unlink;
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 10/97] drm/i915: Promote ptrdiff() to i915_utils.h
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (8 preceding siblings ...)
  2021-05-06 19:13 ` [RFC PATCH 09/97] drm/i915/guc: Stop using fence/status from CTB descriptor Matthew Brost
@ 2021-05-06 19:13 ` Matthew Brost
  2021-05-25  0:42   ` Matthew Brost
  2021-05-06 19:13 ` [RFC PATCH 11/97] drm/i915/guc: Only rely on own CTB size Matthew Brost
                   ` (89 subsequent siblings)
  99 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:13 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

From: Michal Wajdeczko <michal.wajdeczko@intel.com>

Generic helpers should be placed in i915_utils.h.
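
As a usage sketch, the CT code in later patches of this series uses it to
turn pointers inside the shared CTB blob into byte offsets; ctb and blob
below are just illustrative names for those pointers.

	/* byte offset of the cmds buffer within the mapped blob */
	u32 cmds_offset = ptrdiff(ctb->cmds, blob);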

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/i915_utils.h | 5 +++++
 drivers/gpu/drm/i915/i915_vma.h   | 5 -----
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_utils.h b/drivers/gpu/drm/i915/i915_utils.h
index f02f52ab5070..5259edacde38 100644
--- a/drivers/gpu/drm/i915/i915_utils.h
+++ b/drivers/gpu/drm/i915/i915_utils.h
@@ -201,6 +201,11 @@ __check_struct_size(size_t base, size_t arr, size_t count, size_t *size)
 	__T;								\
 })
 
+static __always_inline ptrdiff_t ptrdiff(const void *a, const void *b)
+{
+	return a - b;
+}
+
 /*
  * container_of_user: Extract the superclass from a pointer to a member.
  *
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index 8df784a026d2..a29a158990c6 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -146,11 +146,6 @@ static inline void i915_vma_put(struct i915_vma *vma)
 	i915_gem_object_put(vma->obj);
 }
 
-static __always_inline ptrdiff_t ptrdiff(const void *a, const void *b)
-{
-	return a - b;
-}
-
 static inline long
 i915_vma_compare(struct i915_vma *vma,
 		 struct i915_address_space *vm,
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 11/97] drm/i915/guc: Only rely on own CTB size
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (9 preceding siblings ...)
  2021-05-06 19:13 ` [RFC PATCH 10/97] drm/i915: Promote ptrdiff() to i915_utils.h Matthew Brost
@ 2021-05-06 19:13 ` Matthew Brost
  2021-05-25  2:47   ` Matthew Brost
  2021-05-06 19:13 ` [RFC PATCH 12/97] drm/i915/guc: Don't repeat CTB layout calculations Matthew Brost
                   ` (88 subsequent siblings)
  99 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:13 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

From: Michal Wajdeczko <michal.wajdeczko@intel.com>

In upcoming GuC firmware, the CTB size will be removed from the CTB
descriptor, so we must keep it locally for any calculations.

While at it, improve some debug messages and helpers.
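
For illustration, the ring-space accounting in ct_write() after this change,
simplified from the patch below; the wrap-around arithmetic is unchanged, only
the source of the size differs.

	u32 head = desc->head;
	u32 tail = desc->tail;
	u32 size = ctb->size;	/* local copy, no longer desc->size */
	u32 used;

	if (tail < head)
		used = tail + size - head;	/* tail has wrapped around */
	else
		used = tail - head;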

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 55 +++++++++++++++++------
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  2 +
 2 files changed, 43 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 4cc8c0b71699..dbece569fbe4 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -90,6 +90,24 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc,
 	desc->owner = CTB_OWNER_HOST;
 }
 
+static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb, u32 cmds_addr)
+{
+	guc_ct_buffer_desc_init(ctb->desc, cmds_addr, ctb->size);
+}
+
+static void guc_ct_buffer_init(struct intel_guc_ct_buffer *ctb,
+			       struct guc_ct_buffer_desc *desc,
+			       u32 *cmds, u32 size)
+{
+	GEM_BUG_ON(size % 4);
+
+	ctb->desc = desc;
+	ctb->cmds = cmds;
+	ctb->size = size;
+
+	guc_ct_buffer_reset(ctb, 0);
+}
+
 static int guc_action_register_ct_buffer(struct intel_guc *guc,
 					 u32 desc_addr,
 					 u32 type)
@@ -148,7 +166,10 @@ static int ct_deregister_buffer(struct intel_guc_ct *ct, u32 type)
 int intel_guc_ct_init(struct intel_guc_ct *ct)
 {
 	struct intel_guc *guc = ct_to_guc(ct);
+	struct guc_ct_buffer_desc *desc;
+	u32 blob_size;
 	void *blob;
+	u32 *cmds;
 	int err;
 	int i;
 
@@ -176,19 +197,24 @@ int intel_guc_ct_init(struct intel_guc_ct *ct)
 	 * other code will need updating as well.
 	 */
 
-	err = intel_guc_allocate_and_map_vma(guc, PAGE_SIZE, &ct->vma, &blob);
+	blob_size = PAGE_SIZE;
+	err = intel_guc_allocate_and_map_vma(guc, blob_size, &ct->vma, &blob);
 	if (unlikely(err)) {
-		CT_ERROR(ct, "Failed to allocate CT channel (err=%d)\n", err);
+		CT_PROBE_ERROR(ct, "Failed to allocate %u for CTB data (%pe)\n",
+			       blob_size, ERR_PTR(err));
 		return err;
 	}
 
-	CT_DEBUG(ct, "vma base=%#x\n", intel_guc_ggtt_offset(guc, ct->vma));
+	CT_DEBUG(ct, "base=%#x size=%u\n", intel_guc_ggtt_offset(guc, ct->vma), blob_size);
 
 	/* store pointers to desc and cmds */
 	for (i = 0; i < ARRAY_SIZE(ct->ctbs); i++) {
 		GEM_BUG_ON((i !=  CTB_SEND) && (i != CTB_RECV));
-		ct->ctbs[i].desc = blob + PAGE_SIZE/4 * i;
-		ct->ctbs[i].cmds = blob + PAGE_SIZE/4 * i + PAGE_SIZE/2;
+
+		desc = blob + PAGE_SIZE / 4 * i;
+		cmds = blob + PAGE_SIZE / 4 * i + PAGE_SIZE / 2;
+
+		guc_ct_buffer_init(&ct->ctbs[i], desc, cmds, PAGE_SIZE / 4);
 	}
 
 	return 0;
@@ -217,7 +243,7 @@ void intel_guc_ct_fini(struct intel_guc_ct *ct)
 int intel_guc_ct_enable(struct intel_guc_ct *ct)
 {
 	struct intel_guc *guc = ct_to_guc(ct);
-	u32 base, cmds, size;
+	u32 base, cmds;
 	int err;
 	int i;
 
@@ -232,10 +258,11 @@ int intel_guc_ct_enable(struct intel_guc_ct *ct)
 	 */
 	for (i = 0; i < ARRAY_SIZE(ct->ctbs); i++) {
 		GEM_BUG_ON((i != CTB_SEND) && (i != CTB_RECV));
+
 		cmds = base + PAGE_SIZE / 4 * i + PAGE_SIZE / 2;
-		size = PAGE_SIZE / 4;
-		CT_DEBUG(ct, "%d: addr=%#x size=%u\n", i, cmds, size);
-		guc_ct_buffer_desc_init(ct->ctbs[i].desc, cmds, size);
+		CT_DEBUG(ct, "%d: cmds addr=%#x\n", i, cmds);
+
+		guc_ct_buffer_reset(&ct->ctbs[i], cmds);
 	}
 
 	/*
@@ -259,7 +286,7 @@ int intel_guc_ct_enable(struct intel_guc_ct *ct)
 err_deregister:
 	ct_deregister_buffer(ct, INTEL_GUC_CT_BUFFER_TYPE_RECV);
 err_out:
-	CT_PROBE_ERROR(ct, "Failed to open channel (err=%d)\n", err);
+	CT_PROBE_ERROR(ct, "Failed to enable CTB (%pe)\n", ERR_PTR(err));
 	return err;
 }
 
@@ -314,7 +341,7 @@ static int ct_write(struct intel_guc_ct *ct,
 	struct guc_ct_buffer_desc *desc = ctb->desc;
 	u32 head = desc->head;
 	u32 tail = desc->tail;
-	u32 size = desc->size;
+	u32 size = ctb->size;
 	u32 used;
 	u32 header;
 	u32 *cmds = ctb->cmds;
@@ -323,7 +350,7 @@ static int ct_write(struct intel_guc_ct *ct,
 	if (unlikely(desc->is_in_error))
 		return -EPIPE;
 
-	if (unlikely(!IS_ALIGNED(head | tail | size, 4) ||
+	if (unlikely(!IS_ALIGNED(head | tail, 4) ||
 		     (tail | head) >= size))
 		goto corrupted;
 
@@ -530,7 +557,7 @@ static int ct_read(struct intel_guc_ct *ct, u32 *data)
 	struct guc_ct_buffer_desc *desc = ctb->desc;
 	u32 head = desc->head;
 	u32 tail = desc->tail;
-	u32 size = desc->size;
+	u32 size = ctb->size;
 	u32 *cmds = ctb->cmds;
 	s32 available;
 	unsigned int len;
@@ -539,7 +566,7 @@ static int ct_read(struct intel_guc_ct *ct, u32 *data)
 	if (unlikely(desc->is_in_error))
 		return -EPIPE;
 
-	if (unlikely(!IS_ALIGNED(head | tail | size, 4) ||
+	if (unlikely(!IS_ALIGNED(head | tail, 4) ||
 		     (tail | head) >= size))
 		goto corrupted;
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
index 494a51a5200f..4009e2dd0de4 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
@@ -29,10 +29,12 @@ struct intel_guc;
  *
  * @desc: pointer to the buffer descriptor
  * @cmds: pointer to the commands buffer
+ * @size: size of the commands buffer
  */
 struct intel_guc_ct_buffer {
 	struct guc_ct_buffer_desc *desc;
 	u32 *cmds;
+	u32 size;
 };
 
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 12/97] drm/i915/guc: Don't repeat CTB layout calculations
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (10 preceding siblings ...)
  2021-05-06 19:13 ` [RFC PATCH 11/97] drm/i915/guc: Only rely on own CTB size Matthew Brost
@ 2021-05-06 19:13 ` Matthew Brost
  2021-05-25  2:53   ` Matthew Brost
  2021-05-06 19:13 ` [RFC PATCH 13/97] drm/i915/guc: Replace CTB array with explicit members Matthew Brost
                   ` (87 subsequent siblings)
  99 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:13 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

From: Michal Wajdeczko <michal.wajdeczko@intel.com>

We can derive the offsets of the cmds buffers and descriptors from
the actual pointers that we already keep locally.
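
For illustration, the descriptor address passed to ct_register_buffer() then
follows directly from the pointer, simplified from the hunk below:

	u32 base = intel_guc_ggtt_offset(guc, ct->vma);
	/* GGTT address of the recv descriptor, derived from its pointer */
	u32 desc_addr = base + ptrdiff(ct->ctbs[CTB_RECV].desc, blob);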

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index dbece569fbe4..fbd6bd20f588 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -244,6 +244,7 @@ int intel_guc_ct_enable(struct intel_guc_ct *ct)
 {
 	struct intel_guc *guc = ct_to_guc(ct);
 	u32 base, cmds;
+	void *blob;
 	int err;
 	int i;
 
@@ -251,15 +252,18 @@ int intel_guc_ct_enable(struct intel_guc_ct *ct)
 
 	/* vma should be already allocated and map'ed */
 	GEM_BUG_ON(!ct->vma);
+	GEM_BUG_ON(!i915_gem_object_has_pinned_pages(ct->vma->obj));
 	base = intel_guc_ggtt_offset(guc, ct->vma);
 
-	/* (re)initialize descriptors
-	 * cmds buffers are in the second half of the blob page
-	 */
+	/* blob should start with send descriptor */
+	blob = __px_vaddr(ct->vma->obj);
+	GEM_BUG_ON(blob != ct->ctbs[CTB_SEND].desc);
+
+	/* (re)initialize descriptors */
 	for (i = 0; i < ARRAY_SIZE(ct->ctbs); i++) {
 		GEM_BUG_ON((i != CTB_SEND) && (i != CTB_RECV));
 
-		cmds = base + PAGE_SIZE / 4 * i + PAGE_SIZE / 2;
+		cmds = base + ptrdiff(ct->ctbs[i].cmds, blob);
 		CT_DEBUG(ct, "%d: cmds addr=%#x\n", i, cmds);
 
 		guc_ct_buffer_reset(&ct->ctbs[i], cmds);
@@ -269,12 +273,12 @@ int intel_guc_ct_enable(struct intel_guc_ct *ct)
 	 * Register both CT buffers starting with RECV buffer.
 	 * Descriptors are in first half of the blob.
 	 */
-	err = ct_register_buffer(ct, base + PAGE_SIZE / 4 * CTB_RECV,
+	err = ct_register_buffer(ct, base + ptrdiff(ct->ctbs[CTB_RECV].desc, blob),
 				 INTEL_GUC_CT_BUFFER_TYPE_RECV);
 	if (unlikely(err))
 		goto err_out;
 
-	err = ct_register_buffer(ct, base + PAGE_SIZE / 4 * CTB_SEND,
+	err = ct_register_buffer(ct, base + ptrdiff(ct->ctbs[CTB_SEND].desc, blob),
 				 INTEL_GUC_CT_BUFFER_TYPE_SEND);
 	if (unlikely(err))
 		goto err_deregister;
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 13/97] drm/i915/guc: Replace CTB array with explicit members
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (11 preceding siblings ...)
  2021-05-06 19:13 ` [RFC PATCH 12/97] drm/i915/guc: Don't repeat CTB layout calculations Matthew Brost
@ 2021-05-06 19:13 ` Matthew Brost
  2021-05-25  3:15   ` Matthew Brost
  2021-05-06 19:13 ` [RFC PATCH 14/97] drm/i915/guc: Update sizes of CTB buffers Matthew Brost
                   ` (86 subsequent siblings)
  99 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:13 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

From: Michal Wajdeczko <michal.wajdeczko@intel.com>

Upcoming GuC firmware will always require just two CTBs, and we
also plan to configure them with different sizes, so defining
them as an array is no longer suitable.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 46 ++++++++++++-----------
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  7 +++-
 2 files changed, 30 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index fbd6bd20f588..c54a29176862 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -168,10 +168,10 @@ int intel_guc_ct_init(struct intel_guc_ct *ct)
 	struct intel_guc *guc = ct_to_guc(ct);
 	struct guc_ct_buffer_desc *desc;
 	u32 blob_size;
+	u32 cmds_size;
 	void *blob;
 	u32 *cmds;
 	int err;
-	int i;
 
 	GEM_BUG_ON(ct->vma);
 
@@ -207,15 +207,23 @@ int intel_guc_ct_init(struct intel_guc_ct *ct)
 
 	CT_DEBUG(ct, "base=%#x size=%u\n", intel_guc_ggtt_offset(guc, ct->vma), blob_size);
 
-	/* store pointers to desc and cmds */
-	for (i = 0; i < ARRAY_SIZE(ct->ctbs); i++) {
-		GEM_BUG_ON((i !=  CTB_SEND) && (i != CTB_RECV));
+	/* store pointers to desc and cmds for send ctb */
+	desc = blob;
+	cmds = blob + PAGE_SIZE / 2;
+	cmds_size = PAGE_SIZE / 4;
+	CT_DEBUG(ct, "%s desc %#lx cmds %#lx size %u\n", "send",
+		 ptrdiff(desc, blob), ptrdiff(cmds, blob), cmds_size);
 
-		desc = blob + PAGE_SIZE / 4 * i;
-		cmds = blob + PAGE_SIZE / 4 * i + PAGE_SIZE / 2;
+	guc_ct_buffer_init(&ct->ctbs.send, desc, cmds, cmds_size);
 
-		guc_ct_buffer_init(&ct->ctbs[i], desc, cmds, PAGE_SIZE / 4);
-	}
+	/* store pointers to desc and cmds for recv ctb */
+	desc = blob + PAGE_SIZE / 4;
+	cmds = blob + PAGE_SIZE / 4 + PAGE_SIZE / 2;
+	cmds_size = PAGE_SIZE / 4;
+	CT_DEBUG(ct, "%s desc %#lx cmds %#lx size %u\n", "recv",
+		 ptrdiff(desc, blob), ptrdiff(cmds, blob), cmds_size);
+
+	guc_ct_buffer_init(&ct->ctbs.recv, desc, cmds, cmds_size);
 
 	return 0;
 }
@@ -246,7 +254,6 @@ int intel_guc_ct_enable(struct intel_guc_ct *ct)
 	u32 base, cmds;
 	void *blob;
 	int err;
-	int i;
 
 	GEM_BUG_ON(ct->enabled);
 
@@ -257,28 +264,25 @@ int intel_guc_ct_enable(struct intel_guc_ct *ct)
 
 	/* blob should start with send descriptor */
 	blob = __px_vaddr(ct->vma->obj);
-	GEM_BUG_ON(blob != ct->ctbs[CTB_SEND].desc);
+	GEM_BUG_ON(blob != ct->ctbs.send.desc);
 
 	/* (re)initialize descriptors */
-	for (i = 0; i < ARRAY_SIZE(ct->ctbs); i++) {
-		GEM_BUG_ON((i != CTB_SEND) && (i != CTB_RECV));
+	cmds = base + ptrdiff(ct->ctbs.send.cmds, blob);
+	guc_ct_buffer_reset(&ct->ctbs.send, cmds);
 
-		cmds = base + ptrdiff(ct->ctbs[i].cmds, blob);
-		CT_DEBUG(ct, "%d: cmds addr=%#x\n", i, cmds);
-
-		guc_ct_buffer_reset(&ct->ctbs[i], cmds);
-	}
+	cmds = base + ptrdiff(ct->ctbs.recv.cmds, blob);
+	guc_ct_buffer_reset(&ct->ctbs.recv, cmds);
 
 	/*
 	 * Register both CT buffers starting with RECV buffer.
 	 * Descriptors are in first half of the blob.
 	 */
-	err = ct_register_buffer(ct, base + ptrdiff(ct->ctbs[CTB_RECV].desc, blob),
+	err = ct_register_buffer(ct, base + ptrdiff(ct->ctbs.recv.desc, blob),
 				 INTEL_GUC_CT_BUFFER_TYPE_RECV);
 	if (unlikely(err))
 		goto err_out;
 
-	err = ct_register_buffer(ct, base + ptrdiff(ct->ctbs[CTB_SEND].desc, blob),
+	err = ct_register_buffer(ct, base + ptrdiff(ct->ctbs.send.desc, blob),
 				 INTEL_GUC_CT_BUFFER_TYPE_SEND);
 	if (unlikely(err))
 		goto err_deregister;
@@ -341,7 +345,7 @@ static int ct_write(struct intel_guc_ct *ct,
 		    u32 len /* in dwords */,
 		    u32 fence)
 {
-	struct intel_guc_ct_buffer *ctb = &ct->ctbs[CTB_SEND];
+	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
 	struct guc_ct_buffer_desc *desc = ctb->desc;
 	u32 head = desc->head;
 	u32 tail = desc->tail;
@@ -557,7 +561,7 @@ static inline bool ct_header_is_response(u32 header)
 
 static int ct_read(struct intel_guc_ct *ct, u32 *data)
 {
-	struct intel_guc_ct_buffer *ctb = &ct->ctbs[CTB_RECV];
+	struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv;
 	struct guc_ct_buffer_desc *desc = ctb->desc;
 	u32 head = desc->head;
 	u32 tail = desc->tail;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
index 4009e2dd0de4..fc9486779e87 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
@@ -47,8 +47,11 @@ struct intel_guc_ct {
 	struct i915_vma *vma;
 	bool enabled;
 
-	/* buffers for sending(0) and receiving(1) commands */
-	struct intel_guc_ct_buffer ctbs[2];
+	/* buffers for sending and receiving commands */
+	struct {
+		struct intel_guc_ct_buffer send;
+		struct intel_guc_ct_buffer recv;
+	} ctbs;
 
 	struct {
 		u32 last_fence; /* last fence used to send request */
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 14/97] drm/i915/guc: Update sizes of CTB buffers
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (12 preceding siblings ...)
  2021-05-06 19:13 ` [RFC PATCH 13/97] drm/i915/guc: Replace CTB array with explicit members Matthew Brost
@ 2021-05-06 19:13 ` Matthew Brost
  2021-05-25  2:56   ` Matthew Brost
  2021-05-06 19:13 ` [RFC PATCH 15/97] drm/i915/guc: Relax CTB response timeout Matthew Brost
                   ` (85 subsequent siblings)
  99 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:13 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

From: Michal Wajdeczko <michal.wajdeczko@intel.com>

Future GuC firmware will require CTB buffer sizes to be a multiple of 4K.
Make these changes now, as they shouldn't impact us too much.
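
As a sanity check on the numbers, with the minimum sizes chosen here the
blob grows from one page to three, assuming the 2K-aligned descriptor size
defined in this patch:

	blob_size = 2 * CTB_DESC_SIZE		/* 2 * 2K = 4K of descriptors */
		  + CTB_H2G_BUFFER_SIZE		/* 4K of send commands        */
		  + CTB_G2H_BUFFER_SIZE;	/* 4K of recv commands        */
	/* = 12K total, i.e. three 4K pages instead of the old single page */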

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: John Harrison <john.c.harrison@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 60 ++++++++++++-----------
 1 file changed, 32 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index c54a29176862..c87a0a8bef26 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -38,6 +38,32 @@ static inline struct drm_device *ct_to_drm(struct intel_guc_ct *ct)
 #define CT_PROBE_ERROR(_ct, _fmt, ...) \
 	i915_probe_error(ct_to_i915(_ct), "CT: " _fmt, ##__VA_ARGS__)
 
+/**
+ * DOC: CTB Blob
+ *
+ * We allocate a single blob to hold both CTB descriptors and buffers:
+ *
+ *      +--------+-----------------------------------------------+------+
+ *      | offset | contents                                      | size |
+ *      +========+===============================================+======+
+ *      | 0x0000 | H2G `CTB Descriptor`_ (send)                  |      |
+ *      +--------+-----------------------------------------------+  4K  |
+ *      | 0x0800 | G2H `CTB Descriptor`_ (recv)                  |      |
+ *      +--------+-----------------------------------------------+------+
+ *      | 0x1000 | H2G `CT Buffer`_ (send)                       | n*4K |
+ *      |        |                                               |      |
+ *      +--------+-----------------------------------------------+------+
+ *      | 0x1000 | G2H `CT Buffer`_ (recv)                       | m*4K |
+ *      | + n*4K |                                               |      |
+ *      +--------+-----------------------------------------------+------+
+ *
+ * Size of each `CT Buffer`_ must be a multiple of 4K.
+ * As we don't expect too many messages, for now use minimum sizes.
+ */
+#define CTB_DESC_SIZE		ALIGN(sizeof(struct guc_ct_buffer_desc), SZ_2K)
+#define CTB_H2G_BUFFER_SIZE	(SZ_4K)
+#define CTB_G2H_BUFFER_SIZE	(SZ_4K)
+
 struct ct_request {
 	struct list_head link;
 	u32 fence;
@@ -175,29 +201,7 @@ int intel_guc_ct_init(struct intel_guc_ct *ct)
 
 	GEM_BUG_ON(ct->vma);
 
-	/* We allocate 1 page to hold both descriptors and both buffers.
-	 *       ___________.....................
-	 *      |desc (SEND)|                   :
-	 *      |___________|                   PAGE/4
-	 *      :___________....................:
-	 *      |desc (RECV)|                   :
-	 *      |___________|                   PAGE/4
-	 *      :_______________________________:
-	 *      |cmds (SEND)                    |
-	 *      |                               PAGE/4
-	 *      |_______________________________|
-	 *      |cmds (RECV)                    |
-	 *      |                               PAGE/4
-	 *      |_______________________________|
-	 *
-	 * Each message can use a maximum of 32 dwords and we don't expect to
-	 * have more than 1 in flight at any time, so we have enough space.
-	 * Some logic further ahead will rely on the fact that there is only 1
-	 * page and that it is always mapped, so if the size is changed the
-	 * other code will need updating as well.
-	 */
-
-	blob_size = PAGE_SIZE;
+	blob_size = 2 * CTB_DESC_SIZE + CTB_H2G_BUFFER_SIZE + CTB_G2H_BUFFER_SIZE;
 	err = intel_guc_allocate_and_map_vma(guc, blob_size, &ct->vma, &blob);
 	if (unlikely(err)) {
 		CT_PROBE_ERROR(ct, "Failed to allocate %u for CTB data (%pe)\n",
@@ -209,17 +213,17 @@ int intel_guc_ct_init(struct intel_guc_ct *ct)
 
 	/* store pointers to desc and cmds for send ctb */
 	desc = blob;
-	cmds = blob + PAGE_SIZE / 2;
-	cmds_size = PAGE_SIZE / 4;
+	cmds = blob + 2 * CTB_DESC_SIZE;
+	cmds_size = CTB_H2G_BUFFER_SIZE;
 	CT_DEBUG(ct, "%s desc %#lx cmds %#lx size %u\n", "send",
 		 ptrdiff(desc, blob), ptrdiff(cmds, blob), cmds_size);
 
 	guc_ct_buffer_init(&ct->ctbs.send, desc, cmds, cmds_size);
 
 	/* store pointers to desc and cmds for recv ctb */
-	desc = blob + PAGE_SIZE / 4;
-	cmds = blob + PAGE_SIZE / 4 + PAGE_SIZE / 2;
-	cmds_size = PAGE_SIZE / 4;
+	desc = blob + CTB_DESC_SIZE;
+	cmds = blob + 2 * CTB_DESC_SIZE + CTB_H2G_BUFFER_SIZE;
+	cmds_size = CTB_G2H_BUFFER_SIZE;
 	CT_DEBUG(ct, "%s desc %#lx cmds %#lx size %u\n", "recv",
 		 ptrdiff(desc, blob), ptrdiff(cmds, blob), cmds_size);
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 15/97] drm/i915/guc: Relax CTB response timeout
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (13 preceding siblings ...)
  2021-05-06 19:13 ` [RFC PATCH 14/97] drm/i915/guc: Update sizes of CTB buffers Matthew Brost
@ 2021-05-06 19:13 ` Matthew Brost
  2021-05-25 18:08   ` Matthew Brost
  2021-05-06 19:13 ` [RFC PATCH 16/97] drm/i915/guc: Start protecting access to CTB descriptors Matthew Brost
                   ` (84 subsequent siblings)
  99 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:13 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

From: Michal Wajdeczko <michal.wajdeczko@intel.com>

In an upcoming patch we will allow more CTB requests to be sent in
parallel to the GuC for processing, so we should no longer assume
that the GuC will always reply within 10ms.

Use the bigger value from CONFIG_DRM_I915_HEARTBEAT_INTERVAL instead.
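
A rough userspace analogue of the relaxed wait, for illustration only. It
collapses the driver's fast/slow wait phases into a single polling loop; the
heartbeat interval value and the request_done() helper are made up here:

#include <errno.h>
#include <stdbool.h>
#include <time.h>

/* assumed CONFIG_DRM_I915_HEARTBEAT_INTERVAL value, in milliseconds */
#define HEARTBEAT_INTERVAL_MS	2500

static bool request_done(void)
{
	static int polls;

	return ++polls > 1000;	/* the driver checks the response bit in req->status */
}

static long elapsed_ms(const struct timespec *start)
{
	struct timespec now;

	clock_gettime(CLOCK_MONOTONIC, &now);
	return (now.tv_sec - start->tv_sec) * 1000 +
	       (now.tv_nsec - start->tv_nsec) / 1000000;
}

/* rough analogue of wait_for_ct_request_update() with the backup timeout */
static int wait_for_response(void)
{
	long timeout = HEARTBEAT_INTERVAL_MS > 10 ? HEARTBEAT_INTERVAL_MS : 10;
	struct timespec start;

	clock_gettime(CLOCK_MONOTONIC, &start);
	do {
		if (request_done())
			return 0;
	} while (elapsed_ms(&start) < timeout);

	return -ETIMEDOUT;
}

int main(void)
{
	return wait_for_response() ? 1 : 0;
}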

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index c87a0a8bef26..a4b2e7fe318b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -436,17 +436,23 @@ static int ct_write(struct intel_guc_ct *ct,
  */
 static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
 {
+	long timeout;
 	int err;
 
 	/*
 	 * Fast commands should complete in less than 10us, so sample quickly
 	 * up to that length of time, then switch to a slower sleep-wait loop.
 	 * No GuC command should ever take longer than 10ms.
+	 *
+	 * However, there might be other CT requests in flight before this one,
+	 * so use @CONFIG_DRM_I915_HEARTBEAT_INTERVAL as backup timeout value.
 	 */
+	timeout = max(10, CONFIG_DRM_I915_HEARTBEAT_INTERVAL);
+
 #define done INTEL_GUC_MSG_IS_RESPONSE(READ_ONCE(req->status))
 	err = wait_for_us(done, 10);
 	if (err)
-		err = wait_for(done, 10);
+		err = wait_for(done, timeout);
 #undef done
 
 	if (unlikely(err))
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 16/97] drm/i915/guc: Start protecting access to CTB descriptors
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (14 preceding siblings ...)
  2021-05-06 19:13 ` [RFC PATCH 15/97] drm/i915/guc: Relax CTB response timeout Matthew Brost
@ 2021-05-06 19:13 ` Matthew Brost
  2021-05-25  3:21   ` Matthew Brost
  2021-05-25  3:21   ` Matthew Brost
  2021-05-06 19:13 ` [RFC PATCH 17/97] drm/i915/guc: Stop using mutex while sending CTB messages Matthew Brost
                   ` (83 subsequent siblings)
  99 siblings, 2 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:13 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

From: Michal Wajdeczko <michal.wajdeczko@intel.com>

We want to stop using guc.send_mutex while sending CTB messages,
so we have to start protecting access to the CTB send descriptor.

For completeness, also protect access to the CTB receive descriptor.

Add a spinlock to struct intel_guc_ct_buffer and start using it.
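
An illustrative userspace sketch (pthread spinlocks standing in for the
kernel's spin_lock_irqsave()) of the lock nesting this ends up with on the
send path; the struct layout only loosely mirrors the intel_guc_ct types:

#include <pthread.h>

struct ct_buffer {
	pthread_spinlock_t lock;	/* protects the descriptor and command buffer */
	/* desc, cmds, size ... */
};

struct ct_state {
	struct ct_buffer send;
	struct ct_buffer recv;
	pthread_spinlock_t requests_lock;	/* protects the pending-request list */
};

/* lock nesting on the send path: per-CTB lock outermost, requests lock inside */
static int ct_send_sketch(struct ct_state *ct)
{
	pthread_spin_lock(&ct->send.lock);	/* spin_lock_irqsave(&ctbs.send.lock) */

	pthread_spin_lock(&ct->requests_lock);	/* nested spin_lock(&requests.lock) */
	/* list_add_tail(&request.link, &requests.pending) */
	pthread_spin_unlock(&ct->requests_lock);

	/* ct_write(ct, action, len, fence) happens here, still under send.lock */

	pthread_spin_unlock(&ct->send.lock);
	return 0;
}

int main(void)
{
	struct ct_state ct;

	pthread_spin_init(&ct.send.lock, PTHREAD_PROCESS_PRIVATE);
	pthread_spin_init(&ct.recv.lock, PTHREAD_PROCESS_PRIVATE);
	pthread_spin_init(&ct.requests_lock, PTHREAD_PROCESS_PRIVATE);
	return ct_send_sketch(&ct);
}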

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 14 ++++++++++++--
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  2 ++
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index a4b2e7fe318b..bee0958d8bae 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -89,6 +89,8 @@ static void ct_incoming_request_worker_func(struct work_struct *w);
  */
 void intel_guc_ct_init_early(struct intel_guc_ct *ct)
 {
+	spin_lock_init(&ct->ctbs.send.lock);
+	spin_lock_init(&ct->ctbs.recv.lock);
 	spin_lock_init(&ct->requests.lock);
 	INIT_LIST_HEAD(&ct->requests.pending);
 	INIT_LIST_HEAD(&ct->requests.incoming);
@@ -479,17 +481,22 @@ static int ct_send(struct intel_guc_ct *ct,
 	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
 	GEM_BUG_ON(!response_buf && response_buf_size);
 
+	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
+
 	fence = ct_get_next_fence(ct);
 	request.fence = fence;
 	request.status = 0;
 	request.response_len = response_buf_size;
 	request.response_buf = response_buf;
 
-	spin_lock_irqsave(&ct->requests.lock, flags);
+	spin_lock(&ct->requests.lock);
 	list_add_tail(&request.link, &ct->requests.pending);
-	spin_unlock_irqrestore(&ct->requests.lock, flags);
+	spin_unlock(&ct->requests.lock);
 
 	err = ct_write(ct, action, len, fence);
+
+	spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
+
 	if (unlikely(err))
 		goto unlink;
 
@@ -825,6 +832,7 @@ static int ct_handle_request(struct intel_guc_ct *ct, const u32 *msg)
 void intel_guc_ct_event_handler(struct intel_guc_ct *ct)
 {
 	u32 msg[GUC_CT_MSG_LEN_MASK + 1]; /* one extra dw for the header */
+	unsigned long flags;
 	int err = 0;
 
 	if (unlikely(!ct->enabled)) {
@@ -833,7 +841,9 @@ void intel_guc_ct_event_handler(struct intel_guc_ct *ct)
 	}
 
 	do {
+		spin_lock_irqsave(&ct->ctbs.recv.lock, flags);
 		err = ct_read(ct, msg);
+		spin_unlock_irqrestore(&ct->ctbs.recv.lock, flags);
 		if (err)
 			break;
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
index fc9486779e87..bc52dc479a14 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
@@ -27,11 +27,13 @@ struct intel_guc;
  * record (command transport buffer descriptor) and the actual buffer which
  * holds the commands.
  *
+ * @lock: protects access to the commands buffer and buffer descriptor
  * @desc: pointer to the buffer descriptor
  * @cmds: pointer to the commands buffer
  * @size: size of the commands buffer
  */
 struct intel_guc_ct_buffer {
+	spinlock_t lock;
 	struct guc_ct_buffer_desc *desc;
 	u32 *cmds;
 	u32 size;
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 17/97] drm/i915/guc: Stop using mutex while sending CTB messages
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (15 preceding siblings ...)
  2021-05-06 19:13 ` [RFC PATCH 16/97] drm/i915/guc: Start protecting access to CTB descriptors Matthew Brost
@ 2021-05-06 19:13 ` Matthew Brost
  2021-05-25 16:14   ` Matthew Brost
  2021-05-06 19:13 ` [RFC PATCH 18/97] drm/i915/guc: Don't receive all G2H messages in irq handler Matthew Brost
                   ` (82 subsequent siblings)
  99 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:13 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

From: Michal Wajdeczko <michal.wajdeczko@intel.com>

We are no longer using the descriptor to hold G2H replies, and access
to the descriptor and command buffer is now protected by a separate
spinlock, so we can stop using the mutex.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index bee0958d8bae..cb58fa7f970c 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -537,7 +537,6 @@ static int ct_send(struct intel_guc_ct *ct,
 int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
 		      u32 *response_buf, u32 response_buf_size)
 {
-	struct intel_guc *guc = ct_to_guc(ct);
 	u32 status = ~0; /* undefined */
 	int ret;
 
@@ -546,8 +545,6 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
 		return -ENODEV;
 	}
 
-	mutex_lock(&guc->send_mutex);
-
 	ret = ct_send(ct, action, len, response_buf, response_buf_size, &status);
 	if (unlikely(ret < 0)) {
 		CT_ERROR(ct, "Sending action %#x failed (err=%d status=%#X)\n",
@@ -557,7 +554,6 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
 			 action[0], ret, ret);
 	}
 
-	mutex_unlock(&guc->send_mutex);
 	return ret;
 }
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 18/97] drm/i915/guc: Don't receive all G2H messages in irq handler
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (16 preceding siblings ...)
  2021-05-06 19:13 ` [RFC PATCH 17/97] drm/i915/guc: Stop using mutex while sending CTB messages Matthew Brost
@ 2021-05-06 19:13 ` Matthew Brost
  2021-05-25 18:15   ` Matthew Brost
  2021-05-06 19:13 ` [RFC PATCH 19/97] drm/i915/guc: Always copy CT message to new allocation Matthew Brost
                   ` (81 subsequent siblings)
  99 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:13 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

From: Michal Wajdeczko <michal.wajdeczko@intel.com>

In the irq handler, try to receive just a single G2H message and let
any remaining messages be received from the tasklet.
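
A minimal sketch of the "handle one, defer the rest" pattern, assuming a
simple counter stands in for the unread data reported by ct_read(); the stub
names are made up for the example:

#include <stdio.h>

static int pending_msgs = 3;	/* pretend three G2H messages are queued in the CTB */

/* stand-in for ct_receive(): handles one message, returns how much data is left */
static int ct_receive(void)
{
	if (!pending_msgs)
		return 0;
	printf("handled one G2H message\n");
	return --pending_msgs;
}

static void tasklet_hi_schedule_stub(void);

static void ct_try_receive_message(void)
{
	if (ct_receive() > 0)
		tasklet_hi_schedule_stub();	/* more data left: defer to tasklet context */
}

/* in the kernel this runs later in softirq context; here we just call back in */
static void tasklet_hi_schedule_stub(void)
{
	ct_try_receive_message();
}

int main(void)
{
	/* the irq handler path handles at most one message and punts the rest */
	ct_try_receive_message();
	return 0;
}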

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 67 ++++++++++++++++-------
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  3 +
 2 files changed, 50 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index cb58fa7f970c..d630ec32decf 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -81,6 +81,7 @@ enum { CTB_SEND = 0, CTB_RECV = 1 };
 
 enum { CTB_OWNER_HOST = 0 };
 
+static void ct_receive_tasklet_func(unsigned long data);
 static void ct_incoming_request_worker_func(struct work_struct *w);
 
 /**
@@ -95,6 +96,7 @@ void intel_guc_ct_init_early(struct intel_guc_ct *ct)
 	INIT_LIST_HEAD(&ct->requests.pending);
 	INIT_LIST_HEAD(&ct->requests.incoming);
 	INIT_WORK(&ct->requests.worker, ct_incoming_request_worker_func);
+	tasklet_init(&ct->receive_tasklet, ct_receive_tasklet_func, (unsigned long)ct);
 }
 
 static inline const char *guc_ct_buffer_type_to_str(u32 type)
@@ -244,6 +246,7 @@ void intel_guc_ct_fini(struct intel_guc_ct *ct)
 {
 	GEM_BUG_ON(ct->enabled);
 
+	tasklet_kill(&ct->receive_tasklet);
 	i915_vma_unpin_and_release(&ct->vma, I915_VMA_RELEASE_MAP);
 	memset(ct, 0, sizeof(*ct));
 }
@@ -629,7 +632,7 @@ static int ct_read(struct intel_guc_ct *ct, u32 *data)
 	CT_DEBUG(ct, "received %*ph\n", 4 * len, data);
 
 	desc->head = head * 4;
-	return 0;
+	return available - len;
 
 corrupted:
 	CT_ERROR(ct, "Corrupted descriptor addr=%#x head=%u tail=%u size=%u\n",
@@ -665,10 +668,10 @@ static int ct_handle_response(struct intel_guc_ct *ct, const u32 *msg)
 	u32 status;
 	u32 datalen;
 	struct ct_request *req;
+	unsigned long flags;
 	bool found = false;
 
 	GEM_BUG_ON(!ct_header_is_response(header));
-	GEM_BUG_ON(!in_irq());
 
 	/* Response payload shall at least include fence and status */
 	if (unlikely(len < 2)) {
@@ -688,7 +691,7 @@ static int ct_handle_response(struct intel_guc_ct *ct, const u32 *msg)
 
 	CT_DEBUG(ct, "response fence %u status %#x\n", fence, status);
 
-	spin_lock(&ct->requests.lock);
+	spin_lock_irqsave(&ct->requests.lock, flags);
 	list_for_each_entry(req, &ct->requests.pending, link) {
 		if (unlikely(fence != req->fence)) {
 			CT_DEBUG(ct, "request %u awaits response\n",
@@ -707,7 +710,7 @@ static int ct_handle_response(struct intel_guc_ct *ct, const u32 *msg)
 		found = true;
 		break;
 	}
-	spin_unlock(&ct->requests.lock);
+	spin_unlock_irqrestore(&ct->requests.lock, flags);
 
 	if (!found)
 		CT_ERROR(ct, "Unsolicited response %*ph\n", msgsize, msg);
@@ -821,31 +824,55 @@ static int ct_handle_request(struct intel_guc_ct *ct, const u32 *msg)
 	return 0;
 }
 
+static int ct_receive(struct intel_guc_ct *ct)
+{
+	u32 msg[GUC_CT_MSG_LEN_MASK + 1]; /* one extra dw for the header */
+	unsigned long flags;
+	int ret;
+
+	spin_lock_irqsave(&ct->ctbs.recv.lock, flags);
+	ret = ct_read(ct, msg);
+	spin_unlock_irqrestore(&ct->ctbs.recv.lock, flags);
+	if (ret < 0)
+		return ret;
+
+	if (ct_header_is_response(msg[0]))
+		ct_handle_response(ct, msg);
+	else
+		ct_handle_request(ct, msg);
+
+	return ret;
+}
+
+static void ct_try_receive_message(struct intel_guc_ct *ct)
+{
+	int ret;
+
+	if (GEM_WARN_ON(!ct->enabled))
+		return;
+
+	ret = ct_receive(ct);
+	if (ret > 0)
+		tasklet_hi_schedule(&ct->receive_tasklet);
+}
+
+static void ct_receive_tasklet_func(unsigned long data)
+{
+	struct intel_guc_ct *ct = (struct intel_guc_ct *)data;
+
+	ct_try_receive_message(ct);
+}
+
 /*
  * When we're communicating with the GuC over CT, GuC uses events
  * to notify us about new messages being posted on the RECV buffer.
  */
 void intel_guc_ct_event_handler(struct intel_guc_ct *ct)
 {
-	u32 msg[GUC_CT_MSG_LEN_MASK + 1]; /* one extra dw for the header */
-	unsigned long flags;
-	int err = 0;
-
 	if (unlikely(!ct->enabled)) {
 		WARN(1, "Unexpected GuC event received while CT disabled!\n");
 		return;
 	}
 
-	do {
-		spin_lock_irqsave(&ct->ctbs.recv.lock, flags);
-		err = ct_read(ct, msg);
-		spin_unlock_irqrestore(&ct->ctbs.recv.lock, flags);
-		if (err)
-			break;
-
-		if (ct_header_is_response(msg[0]))
-			err = ct_handle_response(ct, msg);
-		else
-			err = ct_handle_request(ct, msg);
-	} while (!err);
+	ct_try_receive_message(ct);
 }
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
index bc52dc479a14..cb222f202301 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
@@ -6,6 +6,7 @@
 #ifndef _INTEL_GUC_CT_H_
 #define _INTEL_GUC_CT_H_
 
+#include <linux/interrupt.h>
 #include <linux/spinlock.h>
 #include <linux/workqueue.h>
 
@@ -55,6 +56,8 @@ struct intel_guc_ct {
 		struct intel_guc_ct_buffer recv;
 	} ctbs;
 
+	struct tasklet_struct receive_tasklet;
+
 	struct {
 		u32 last_fence; /* last fence used to send request */
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 19/97] drm/i915/guc: Always copy CT message to new allocation
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (17 preceding siblings ...)
  2021-05-06 19:13 ` [RFC PATCH 18/97] drm/i915/guc: Don't receive all G2H messages in irq handler Matthew Brost
@ 2021-05-06 19:13 ` Matthew Brost
  2021-05-25 18:25   ` Matthew Brost
  2021-05-06 19:13 ` [RFC PATCH 20/97] drm/i915/guc: Introduce unified HXG messages Matthew Brost
                   ` (80 subsequent siblings)
  99 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:13 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

From: Michal Wajdeczko <michal.wajdeczko@intel.com>

Since most future CT traffic will be based on G2H requests, instead of
copying an incoming CT message into a static buffer and then creating a
new allocation for such a request, always copy the incoming CT message
to a new allocation. Also, by doing this while reading the CT header,
we can safely fall back if that atomic allocation fails.
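
For illustration, a small userspace sketch of the per-message allocation with
a flexible array member, loosely mirroring the ct_alloc_msg()/ct_free_msg()
helpers from the patch (malloc standing in for kmalloc(..., GFP_ATOMIC)):

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct ct_incoming_msg {
	/* struct list_head link; -- omitted in this sketch */
	uint32_t size;		/* message length in dwords, header included */
	uint32_t msg[];		/* flexible array holding the copied message */
};

static struct ct_incoming_msg *ct_alloc_msg(uint32_t num_dwords)
{
	struct ct_incoming_msg *msg;

	msg = malloc(sizeof(*msg) + sizeof(uint32_t) * num_dwords);
	if (msg)
		msg->size = num_dwords;
	return msg;		/* NULL means the caller has to drop the message */
}

static void ct_free_msg(struct ct_incoming_msg *msg)
{
	free(msg);
}

int main(void)
{
	uint32_t cmds[] = { 0x30000001, 0xdeadbeef };	/* pretend CTB contents */
	struct ct_incoming_msg *msg = ct_alloc_msg(2);

	if (!msg)
		return 1;	/* the kernel path logs "No memory for message" and skips it */
	memcpy(msg->msg, cmds, sizeof(cmds));
	ct_free_msg(msg);
	return 0;
}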

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Piotr Piórkowski <piotr.piorkowski@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 180 ++++++++++++++--------
 1 file changed, 120 insertions(+), 60 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index d630ec32decf..a174978c6a27 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -72,8 +72,9 @@ struct ct_request {
 	u32 *response_buf;
 };
 
-struct ct_incoming_request {
+struct ct_incoming_msg {
 	struct list_head link;
+	u32 size;
 	u32 msg[];
 };
 
@@ -575,7 +576,26 @@ static inline bool ct_header_is_response(u32 header)
 	return !!(header & GUC_CT_MSG_IS_RESPONSE);
 }
 
-static int ct_read(struct intel_guc_ct *ct, u32 *data)
+static struct ct_incoming_msg *ct_alloc_msg(u32 num_dwords)
+{
+	struct ct_incoming_msg *msg;
+
+	msg = kmalloc(sizeof(*msg) + sizeof(u32) * num_dwords, GFP_ATOMIC);
+	if (msg)
+		msg->size = num_dwords;
+	return msg;
+}
+
+static void ct_free_msg(struct ct_incoming_msg *msg)
+{
+	kfree(msg);
+}
+
+/*
+ * Return: number available remaining dwords to read (0 if empty)
+ *         or a negative error code on failure
+ */
+static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 {
 	struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv;
 	struct guc_ct_buffer_desc *desc = ctb->desc;
@@ -586,6 +606,7 @@ static int ct_read(struct intel_guc_ct *ct, u32 *data)
 	s32 available;
 	unsigned int len;
 	unsigned int i;
+	u32 header;
 
 	if (unlikely(desc->is_in_error))
 		return -EPIPE;
@@ -601,8 +622,10 @@ static int ct_read(struct intel_guc_ct *ct, u32 *data)
 
 	/* tail == head condition indicates empty */
 	available = tail - head;
-	if (unlikely(available == 0))
-		return -ENODATA;
+	if (unlikely(available == 0)) {
+		*msg = NULL;
+		return 0;
+	}
 
 	/* beware of buffer wrap case */
 	if (unlikely(available < 0))
@@ -610,14 +633,14 @@ static int ct_read(struct intel_guc_ct *ct, u32 *data)
 	CT_DEBUG(ct, "available %d (%u:%u)\n", available, head, tail);
 	GEM_BUG_ON(available < 0);
 
-	data[0] = cmds[head];
+	header = cmds[head];
 	head = (head + 1) % size;
 
 	/* message len with header */
-	len = ct_header_get_len(data[0]) + 1;
+	len = ct_header_get_len(header) + 1;
 	if (unlikely(len > (u32)available)) {
 		CT_ERROR(ct, "Incomplete message %*ph %*ph %*ph\n",
-			 4, data,
+			 4, &header,
 			 4 * (head + available - 1 > size ?
 			      size - head : available - 1), &cmds[head],
 			 4 * (head + available - 1 > size ?
@@ -625,11 +648,24 @@ static int ct_read(struct intel_guc_ct *ct, u32 *data)
 		goto corrupted;
 	}
 
+	*msg = ct_alloc_msg(len);
+	if (!*msg) {
+		CT_ERROR(ct, "No memory for message %*ph %*ph %*ph\n",
+			 4, &header,
+			 4 * (head + available - 1 > size ?
+			      size - head : available - 1), &cmds[head],
+			 4 * (head + available - 1 > size ?
+			      available - 1 - size + head : 0), &cmds[0]);
+		return available;
+	}
+
+	(*msg)->msg[0] = header;
+
 	for (i = 1; i < len; i++) {
-		data[i] = cmds[head];
+		(*msg)->msg[i] = cmds[head];
 		head = (head + 1) % size;
 	}
-	CT_DEBUG(ct, "received %*ph\n", 4 * len, data);
+	CT_DEBUG(ct, "received %*ph\n", 4 * len, (*msg)->msg);
 
 	desc->head = head * 4;
 	return available - len;
@@ -659,33 +695,33 @@ static int ct_read(struct intel_guc_ct *ct, u32 *data)
  *                   ^-----------------------len-----------------------^
  */
 
-static int ct_handle_response(struct intel_guc_ct *ct, const u32 *msg)
+static int ct_handle_response(struct intel_guc_ct *ct, struct ct_incoming_msg *response)
 {
-	u32 header = msg[0];
+	u32 header = response->msg[0];
 	u32 len = ct_header_get_len(header);
-	u32 msgsize = (len + 1) * sizeof(u32); /* msg size in bytes w/header */
 	u32 fence;
 	u32 status;
 	u32 datalen;
 	struct ct_request *req;
 	unsigned long flags;
 	bool found = false;
+	int err = 0;
 
 	GEM_BUG_ON(!ct_header_is_response(header));
 
 	/* Response payload shall at least include fence and status */
 	if (unlikely(len < 2)) {
-		CT_ERROR(ct, "Corrupted response %*ph\n", msgsize, msg);
+		CT_ERROR(ct, "Corrupted response (len %u)\n", len);
 		return -EPROTO;
 	}
 
-	fence = msg[1];
-	status = msg[2];
+	fence = response->msg[1];
+	status = response->msg[2];
 	datalen = len - 2;
 
 	/* Format of the status follows RESPONSE message */
 	if (unlikely(!INTEL_GUC_MSG_IS_RESPONSE(status))) {
-		CT_ERROR(ct, "Corrupted response %*ph\n", msgsize, msg);
+		CT_ERROR(ct, "Corrupted response (status %#x)\n", status);
 		return -EPROTO;
 	}
 
@@ -699,12 +735,13 @@ static int ct_handle_response(struct intel_guc_ct *ct, const u32 *msg)
 			continue;
 		}
 		if (unlikely(datalen > req->response_len)) {
-			CT_ERROR(ct, "Response for %u is too long %*ph\n",
-				 req->fence, msgsize, msg);
-			datalen = 0;
+			CT_ERROR(ct, "Response %u too long (datalen %u > %u)\n",
+				 req->fence, datalen, req->response_len);
+			datalen = min(datalen, req->response_len);
+			err = -EMSGSIZE;
 		}
 		if (datalen)
-			memcpy(req->response_buf, msg + 3, 4 * datalen);
+			memcpy(req->response_buf, response->msg + 3, 4 * datalen);
 		req->response_len = datalen;
 		WRITE_ONCE(req->status, status);
 		found = true;
@@ -712,45 +749,61 @@ static int ct_handle_response(struct intel_guc_ct *ct, const u32 *msg)
 	}
 	spin_unlock_irqrestore(&ct->requests.lock, flags);
 
-	if (!found)
-		CT_ERROR(ct, "Unsolicited response %*ph\n", msgsize, msg);
+	if (!found) {
+		CT_ERROR(ct, "Unsolicited response (fence %u)\n", fence);
+		return -ENOKEY;
+	}
+
+	if (unlikely(err))
+		return err;
+
+	ct_free_msg(response);
 	return 0;
 }
 
-static void ct_process_request(struct intel_guc_ct *ct,
-			       u32 action, u32 len, const u32 *payload)
+static int ct_process_request(struct intel_guc_ct *ct, struct ct_incoming_msg *request)
 {
 	struct intel_guc *guc = ct_to_guc(ct);
+	u32 header, action, len;
+	const u32 *payload;
 	int ret;
 
+	header = request->msg[0];
+	payload = &request->msg[1];
+	action = ct_header_get_action(header);
+	len = ct_header_get_len(header);
+
 	CT_DEBUG(ct, "request %x %*ph\n", action, 4 * len, payload);
 
 	switch (action) {
 	case INTEL_GUC_ACTION_DEFAULT:
 		ret = intel_guc_to_host_process_recv_msg(guc, payload, len);
-		if (unlikely(ret))
-			goto fail_unexpected;
 		break;
-
 	default:
-fail_unexpected:
-		CT_ERROR(ct, "Unexpected request %x %*ph\n",
-			 action, 4 * len, payload);
+		ret = -EOPNOTSUPP;
 		break;
 	}
+
+	if (unlikely(ret)) {
+		CT_ERROR(ct, "Failed to process request %04x (%pe)\n",
+			 action, ERR_PTR(ret));
+		return ret;
+	}
+
+	ct_free_msg(request);
+	return 0;
 }
 
 static bool ct_process_incoming_requests(struct intel_guc_ct *ct)
 {
 	unsigned long flags;
-	struct ct_incoming_request *request;
-	u32 header;
-	u32 *payload;
+	struct ct_incoming_msg *request;
 	bool done;
+	int err;
 
 	spin_lock_irqsave(&ct->requests.lock, flags);
 	request = list_first_entry_or_null(&ct->requests.incoming,
-					   struct ct_incoming_request, link);
+					   struct ct_incoming_msg, link);
 	if (request)
 		list_del(&request->link);
 	done = !!list_empty(&ct->requests.incoming);
@@ -759,14 +812,13 @@ static bool ct_process_incoming_requests(struct intel_guc_ct *ct)
 	if (!request)
 		return true;
 
-	header = request->msg[0];
-	payload = &request->msg[1];
-	ct_process_request(ct,
-			   ct_header_get_action(header),
-			   ct_header_get_len(header),
-			   payload);
+	err = ct_process_request(ct, request);
+	if (unlikely(err)) {
+		CT_ERROR(ct, "Failed to process CT message (%pe) %*ph\n",
+			 ERR_PTR(err), 4 * request->size, request->msg);
+		ct_free_msg(request);
+	}
 
-	kfree(request);
 	return done;
 }
 
@@ -799,22 +851,11 @@ static void ct_incoming_request_worker_func(struct work_struct *w)
  *                   ^-----------------------len-----------------------^
  */
 
-static int ct_handle_request(struct intel_guc_ct *ct, const u32 *msg)
+static int ct_handle_request(struct intel_guc_ct *ct, struct ct_incoming_msg *request)
 {
-	u32 header = msg[0];
-	u32 len = ct_header_get_len(header);
-	u32 msgsize = (len + 1) * sizeof(u32); /* msg size in bytes w/header */
-	struct ct_incoming_request *request;
 	unsigned long flags;
 
-	GEM_BUG_ON(ct_header_is_response(header));
-
-	request = kmalloc(sizeof(*request) + msgsize, GFP_ATOMIC);
-	if (unlikely(!request)) {
-		CT_ERROR(ct, "Dropping request %*ph\n", msgsize, msg);
-		return 0; /* XXX: -ENOMEM ? */
-	}
-	memcpy(request->msg, msg, msgsize);
+	GEM_BUG_ON(ct_header_is_response(request->msg[0]));
 
 	spin_lock_irqsave(&ct->requests.lock, flags);
 	list_add_tail(&request->link, &ct->requests.incoming);
@@ -824,22 +865,41 @@ static int ct_handle_request(struct intel_guc_ct *ct, const u32 *msg)
 	return 0;
 }
 
+static void ct_handle_msg(struct intel_guc_ct *ct, struct ct_incoming_msg *msg)
+{
+	u32 header = msg->msg[0];
+	int err;
+
+	if (ct_header_is_response(header))
+		err = ct_handle_response(ct, msg);
+	else
+		err = ct_handle_request(ct, msg);
+
+	if (unlikely(err)) {
+		CT_ERROR(ct, "Failed to process CT message (%pe) %*ph\n",
+			 ERR_PTR(err), 4 * msg->size, msg->msg);
+		ct_free_msg(msg);
+	}
+}
+
+/*
+ * Return: number available remaining dwords to read (0 if empty)
+ *         or a negative error code on failure
+ */
 static int ct_receive(struct intel_guc_ct *ct)
 {
-	u32 msg[GUC_CT_MSG_LEN_MASK + 1]; /* one extra dw for the header */
+	struct ct_incoming_msg *msg = NULL;
 	unsigned long flags;
 	int ret;
 
 	spin_lock_irqsave(&ct->ctbs.recv.lock, flags);
-	ret = ct_read(ct, msg);
+	ret = ct_read(ct, &msg);
 	spin_unlock_irqrestore(&ct->ctbs.recv.lock, flags);
 	if (ret < 0)
 		return ret;
 
-	if (ct_header_is_response(msg[0]))
-		ct_handle_response(ct, msg);
-	else
-		ct_handle_request(ct, msg);
+	if (msg)
+		ct_handle_msg(ct, msg);
 
 	return ret;
 }
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 20/97] drm/i915/guc: Introduce unified HXG messages
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (18 preceding siblings ...)
  2021-05-06 19:13 ` [RFC PATCH 19/97] drm/i915/guc: Always copy CT message to new allocation Matthew Brost
@ 2021-05-06 19:13 ` Matthew Brost
  2021-05-11 15:16   ` Daniel Vetter
  2021-05-06 19:13 ` [RFC PATCH 21/97] drm/i915/guc: Update MMIO based communication Matthew Brost
                   ` (79 subsequent siblings)
  99 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:13 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

From: Michal Wajdeczko <michal.wajdeczko@intel.com>

New GuC firmware will unify the format of MMIO and CTB H2G messages.
Introduce their definitions now to allow a gradual transition of our
code to match the new changes.
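
An illustrative sketch of packing and unpacking the HXG header dword defined
below; the shift/mask helpers only approximate the kernel's
FIELD_PREP()/FIELD_GET(), and the action code used is a made-up example:

#include <stdint.h>
#include <stdio.h>

/* field layout from the HXG Message definition below */
#define HXG_ORIGIN_SHIFT		31
#define HXG_TYPE_SHIFT			28
#define HXG_TYPE_MASK			0x7u
#define HXG_ACTION_MASK			0xffffu	/* bits 15:0 of a REQUEST */

#define HXG_ORIGIN_HOST			0u
#define HXG_TYPE_REQUEST		0u
#define HXG_TYPE_RESPONSE_SUCCESS	7u

static uint32_t hxg_request(uint32_t action)
{
	return ((uint32_t)HXG_ORIGIN_HOST << HXG_ORIGIN_SHIFT) |
	       (HXG_TYPE_REQUEST << HXG_TYPE_SHIFT) |
	       (action & HXG_ACTION_MASK);
}

static int hxg_is_success(uint32_t header)
{
	return ((header >> HXG_TYPE_SHIFT) & HXG_TYPE_MASK) == HXG_TYPE_RESPONSE_SUCCESS;
}

int main(void)
{
	uint32_t msg0 = hxg_request(0x1234);	/* hypothetical action code */

	printf("header %#x, success reply? %d\n", msg0, hxg_is_success(msg0));
	return 0;
}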

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
---
 .../gpu/drm/i915/gt/uc/abi/guc_messages_abi.h | 226 ++++++++++++++++++
 1 file changed, 226 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h
index 775e21f3058c..1c264819aa03 100644
--- a/drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h
+++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h
@@ -6,6 +6,232 @@
 #ifndef _ABI_GUC_MESSAGES_ABI_H
 #define _ABI_GUC_MESSAGES_ABI_H
 
+/**
+ * DOC: HXG Message
+ *
+ * All messages exchanged with GuC are defined using 32 bit dwords.
+ * First dword is treated as a message header. Remaining dwords are optional.
+ *
+ * .. _HXG Message:
+ *
+ *  +---+-------+--------------------------------------------------------------+
+ *  |   | Bits  | Description                                                  |
+ *  +===+=======+==============================================================+
+ *  |   |       |                                                              |
+ *  | 0 |    31 | **ORIGIN** - originator of the message                       |
+ *  |   |       |   - _`GUC_HXG_ORIGIN_HOST` = 0                               |
+ *  |   |       |   - _`GUC_HXG_ORIGIN_GUC` = 1                                |
+ *  |   |       |                                                              |
+ *  |   +-------+--------------------------------------------------------------+
+ *  |   | 30:28 | **TYPE** - message type                                      |
+ *  |   |       |   - _`GUC_HXG_TYPE_REQUEST` = 0                              |
+ *  |   |       |   - _`GUC_HXG_TYPE_EVENT` = 1                                |
+ *  |   |       |   - _`GUC_HXG_TYPE_NO_RESPONSE_BUSY` = 3                     |
+ *  |   |       |   - _`GUC_HXG_TYPE_NO_RESPONSE_RETRY` = 5                    |
+ *  |   |       |   - _`GUC_HXG_TYPE_RESPONSE_FAILURE` = 6                     |
+ *  |   |       |   - _`GUC_HXG_TYPE_RESPONSE_SUCCESS` = 7                     |
+ *  |   +-------+--------------------------------------------------------------+
+ *  |   |  27:0 | **AUX** - auxiliary data (depends TYPE)                      |
+ *  +---+-------+--------------------------------------------------------------+
+ *  | 1 |  31:0 | optional payload (depends on TYPE)                           |
+ *  +---+-------+                                                              |
+ *  |...|       |                                                              |
+ *  +---+-------+                                                              |
+ *  | n |  31:0 |                                                              |
+ *  +---+-------+--------------------------------------------------------------+
+ */
+
+#define GUC_HXG_MSG_MIN_LEN			1u
+#define GUC_HXG_MSG_0_ORIGIN			(0x1 << 31)
+#define   GUC_HXG_ORIGIN_HOST			0u
+#define   GUC_HXG_ORIGIN_GUC			1u
+#define GUC_HXG_MSG_0_TYPE			(0x7 << 28)
+#define   GUC_HXG_TYPE_REQUEST			0u
+#define   GUC_HXG_TYPE_EVENT			1u
+#define   GUC_HXG_TYPE_NO_RESPONSE_BUSY		3u
+#define   GUC_HXG_TYPE_NO_RESPONSE_RETRY	5u
+#define   GUC_HXG_TYPE_RESPONSE_FAILURE		6u
+#define   GUC_HXG_TYPE_RESPONSE_SUCCESS		7u
+#define GUC_HXG_MSG_0_AUX			(0xfffffff << 0)
+
+/**
+ * DOC: HXG Request
+ *
+ * The `HXG Request`_ message should be used to initiate synchronous activity
+ * for which confirmation or return data is expected.
+ *
+ * The recipient of this message shall use `HXG Response`_, `HXG Failure`_
+ * or `HXG Retry`_ message as a definite reply, and may use `HXG Busy`_
+ * message as an intermediate reply.
+ *
+ * Format of @DATA0 and all @DATAn fields depends on the @ACTION code.
+ *
+ * .. _HXG Request:
+ *
+ *  +---+-------+--------------------------------------------------------------+
+ *  |   | Bits  | Description                                                  |
+ *  +===+=======+==============================================================+
+ *  | 0 |    31 | ORIGIN                                                       |
+ *  |   +-------+--------------------------------------------------------------+
+ *  |   | 30:28 | TYPE = GUC_HXG_TYPE_REQUEST_                                 |
+ *  |   +-------+--------------------------------------------------------------+
+ *  |   | 27:16 | **DATA0** - request data (depends on ACTION)                 |
+ *  |   +-------+--------------------------------------------------------------+
+ *  |   |  15:0 | **ACTION** - requested action code                           |
+ *  +---+-------+--------------------------------------------------------------+
+ *  | 1 |  31:0 | **DATA1** - optional data (depends on ACTION)                |
+ *  +---+-------+--------------------------------------------------------------+
+ *  |...|       |                                                              |
+ *  +---+-------+--------------------------------------------------------------+
+ *  | n |  31:0 | **DATAn** - optional data (depends on ACTION)                |
+ *  +---+-------+--------------------------------------------------------------+
+ */
+
+#define GUC_HXG_REQUEST_MSG_MIN_LEN		GUC_HXG_MSG_MIN_LEN
+#define GUC_HXG_REQUEST_MSG_0_DATA0		(0xfff << 16)
+#define GUC_HXG_REQUEST_MSG_0_ACTION		(0xffff << 0)
+#define GUC_HXG_REQUEST_MSG_n_DATAn		(0xffffffff << 0)
+
+/**
+ * DOC: HXG Event
+ *
+ * The `HXG Event`_ message should be used to initiate asynchronous activity
+ * that does not involve immediate confirmation or data.
+ *
+ * Format of @DATA0 and all @DATAn fields depends on the @ACTION code.
+ *
+ * .. _HXG Event:
+ *
+ *  +---+-------+--------------------------------------------------------------+
+ *  |   | Bits  | Description                                                  |
+ *  +===+=======+==============================================================+
+ *  | 0 |    31 | ORIGIN                                                       |
+ *  |   +-------+--------------------------------------------------------------+
+ *  |   | 30:28 | TYPE = GUC_HXG_TYPE_EVENT_                                   |
+ *  |   +-------+--------------------------------------------------------------+
+ *  |   | 27:16 | **DATA0** - event data (depends on ACTION)                   |
+ *  |   +-------+--------------------------------------------------------------+
+ *  |   |  15:0 | **ACTION** - event action code                               |
+ *  +---+-------+--------------------------------------------------------------+
+ *  | 1 |  31:0 | **DATA1** - optional event data (depends on ACTION)          |
+ *  +---+-------+--------------------------------------------------------------+
+ *  |...|       |                                                              |
+ *  +---+-------+--------------------------------------------------------------+
+ *  | n |  31:0 | **DATAn** - optional event  data (depends on ACTION)         |
+ *  +---+-------+--------------------------------------------------------------+
+ */
+
+#define GUC_HXG_EVENT_MSG_MIN_LEN		GUC_HXG_MSG_MIN_LEN
+#define GUC_HXG_EVENT_MSG_0_DATA0		(0xfff << 16)
+#define GUC_HXG_EVENT_MSG_0_ACTION		(0xffff << 0)
+#define GUC_HXG_EVENT_MSG_n_DATAn		(0xffffffff << 0)
+
+/**
+ * DOC: HXG Busy
+ *
+ * The `HXG Busy`_ message may be used to acknowledge reception of the `HXG Request`_
+ * message if the recipient expects that its processing will take longer than the
+ * default timeout.
+ *
+ * The @COUNTER field may be used as a progress indicator.
+ *
+ * .. _HXG Busy:
+ *
+ *  +---+-------+--------------------------------------------------------------+
+ *  |   | Bits  | Description                                                  |
+ *  +===+=======+==============================================================+
+ *  | 0 |    31 | ORIGIN                                                       |
+ *  |   +-------+--------------------------------------------------------------+
+ *  |   | 30:28 | TYPE = GUC_HXG_TYPE_NO_RESPONSE_BUSY_                        |
+ *  |   +-------+--------------------------------------------------------------+
+ *  |   |  27:0 | **COUNTER** - progress indicator                             |
+ *  +---+-------+--------------------------------------------------------------+
+ */
+
+#define GUC_HXG_BUSY_MSG_LEN			GUC_HXG_MSG_MIN_LEN
+#define GUC_HXG_BUSY_MSG_0_COUNTER		GUC_HXG_MSG_0_AUX
+
+/**
+ * DOC: HXG Retry
+ *
+ * The `HXG Retry`_ message should be used by recipient to indicate that the
+ * `HXG Request`_ message was dropped and it should be resent again.
+ *
+ * The @REASON field may be used to provide additional information.
+ *
+ * .. _HXG Retry:
+ *
+ *  +---+-------+--------------------------------------------------------------+
+ *  |   | Bits  | Description                                                  |
+ *  +===+=======+==============================================================+
+ *  | 0 |    31 | ORIGIN                                                       |
+ *  |   +-------+--------------------------------------------------------------+
+ *  |   | 30:28 | TYPE = GUC_HXG_TYPE_NO_RESPONSE_RETRY_                       |
+ *  |   +-------+--------------------------------------------------------------+
+ *  |   |  27:0 | **REASON** - reason for retry                                |
+ *  |   |       |  - _`GUC_HXG_RETRY_REASON_UNSPECIFIED` = 0                   |
+ *  +---+-------+--------------------------------------------------------------+
+ */
+
+#define GUC_HXG_RETRY_MSG_LEN			GUC_HXG_MSG_MIN_LEN
+#define GUC_HXG_RETRY_MSG_0_REASON		GUC_HXG_MSG_0_AUX
+#define   GUC_HXG_RETRY_REASON_UNSPECIFIED	0u
+
+/**
+ * DOC: HXG Failure
+ *
+ * The `HXG Failure`_ message shall be used as a reply to the `HXG Request`_
+ * message that could not be processed due to an error.
+ *
+ * .. _HXG Failure:
+ *
+ *  +---+-------+--------------------------------------------------------------+
+ *  |   | Bits  | Description                                                  |
+ *  +===+=======+==============================================================+
+ *  | 0 |    31 | ORIGIN                                                       |
+ *  |   +-------+--------------------------------------------------------------+
+ *  |   | 30:28 | TYPE = GUC_HXG_TYPE_RESPONSE_FAILURE_                        |
+ *  |   +-------+--------------------------------------------------------------+
+ *  |   | 27:16 | **HINT** - additional error hint                             |
+ *  |   +-------+--------------------------------------------------------------+
+ *  |   |  15:0 | **ERROR** - error/result code                                |
+ *  +---+-------+--------------------------------------------------------------+
+ */
+
+#define GUC_HXG_FAILURE_MSG_LEN			GUC_HXG_MSG_MIN_LEN
+#define GUC_HXG_FAILURE_MSG_0_HINT		(0xfff << 16)
+#define GUC_HXG_FAILURE_MSG_0_ERROR		(0xffff << 0)
+
+/**
+ * DOC: HXG Response
+ *
+ * The `HXG Response`_ message SHALL be used as a reply to the `HXG Request`_
+ * message that was successfully processed without an error.
+ *
+ * .. _HXG Response:
+ *
+ *  +---+-------+--------------------------------------------------------------+
+ *  |   | Bits  | Description                                                  |
+ *  +===+=======+==============================================================+
+ *  | 0 |    31 | ORIGIN                                                       |
+ *  |   +-------+--------------------------------------------------------------+
+ *  |   | 30:28 | TYPE = GUC_HXG_TYPE_RESPONSE_SUCCESS_                        |
+ *  |   +-------+--------------------------------------------------------------+
+ *  |   |  27:0 | **DATA0** - data (depends on ACTION from `HXG Request`_)     |
+ *  +---+-------+--------------------------------------------------------------+
+ *  | 1 |  31:0 | **DATA1** - data (depends on ACTION from `HXG Request`_)     |
+ *  +---+-------+--------------------------------------------------------------+
+ *  |...|       |                                                              |
+ *  +---+-------+--------------------------------------------------------------+
+ *  | n |  31:0 | **DATAn** - data (depends on ACTION from `HXG Request`_)     |
+ *  +---+-------+--------------------------------------------------------------+
+ */
+
+#define GUC_HXG_RESPONSE_MSG_MIN_LEN		GUC_HXG_MSG_MIN_LEN
+#define GUC_HXG_RESPONSE_MSG_0_DATA0		GUC_HXG_MSG_0_AUX
+#define GUC_HXG_RESPONSE_MSG_n_DATAn		(0xffffffff << 0)
+
+/* deprecated */
 #define INTEL_GUC_MSG_TYPE_SHIFT	28
 #define INTEL_GUC_MSG_TYPE_MASK		(0xF << INTEL_GUC_MSG_TYPE_SHIFT)
 #define INTEL_GUC_MSG_DATA_SHIFT	16
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 21/97] drm/i915/guc: Update MMIO based communication
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (19 preceding siblings ...)
  2021-05-06 19:13 ` [RFC PATCH 20/97] drm/i915/guc: Introduce unified HXG messages Matthew Brost
@ 2021-05-06 19:13 ` Matthew Brost
  2021-05-06 19:13 ` [RFC PATCH 22/97] drm/i915/guc: Update CTB response status Matthew Brost
                   ` (78 subsequent siblings)
  99 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:13 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

From: Michal Wajdeczko <michal.wajdeczko@intel.com>

The MMIO based Host-to-GuC communication protocol has been
updated to use unified HXG messages.

Update our intel_guc_send_mmio() function to correctly handle
BUSY, RETRY and FAILURE replies. Also update our documentation.
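
For illustration, a simplified sketch of the resulting reply handling: after
writing the request to the scratch registers, the reply header is classified
as BUSY (keep waiting), RETRY (resend), FAILURE (error out) or SUCCESS. The
stub register read and the single re-read on BUSY (the patch polls for up to
1s) are simplifications made up for this example:

#include <stdint.h>
#include <stdio.h>

enum hxg_type { HXG_BUSY = 3, HXG_RETRY = 5, HXG_FAILURE = 6, HXG_SUCCESS = 7 };

/* stand-in for reading scratch register 0 back from the GuC; it always
 * returns ORIGIN=GUC, TYPE=SUCCESS, DATA0=0x42 so the sketch terminates */
static uint32_t read_reply_header(void)
{
	return (1u << 31) | ((uint32_t)HXG_SUCCESS << 28) | 0x42;
}

static unsigned int hxg_type_of(uint32_t header)
{
	return (header >> 28) & 0x7;
}

static int send_mmio_request_sketch(uint32_t request0)
{
	uint32_t header;

retry:
	/* write request[] into the scratch registers here ... */
	header = read_reply_header();

	if (hxg_type_of(header) == HXG_BUSY)
		header = read_reply_header();	/* the patch polls for up to 1s here */

	if (hxg_type_of(header) == HXG_RETRY) {
		printf("request %#x: retrying, reason %u\n", request0, header & 0x0fffffff);
		goto retry;
	}

	if (hxg_type_of(header) == HXG_FAILURE) {
		printf("request %#x: failure %#x\n", request0, header & 0xffffu);
		return -1;	/* -ENXIO in the patch */
	}

	if (hxg_type_of(header) != HXG_SUCCESS)
		return -2;	/* -EPROTO: unexpected reply */

	return (int)(header & 0x0fffffff);	/* DATA0 from the success reply */
}

int main(void)
{
	printf("reply data %#x\n", (unsigned int)send_mmio_request_sketch(0x1234));
	return 0;
}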

GuC: 55.0.0
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Piotr Piórkowski <piotr.piorkowski@intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
---
 .../gt/uc/abi/guc_communication_mmio_abi.h    | 47 ++++--------
 drivers/gpu/drm/i915/gt/uc/intel_guc.c        | 75 ++++++++++++++-----
 2 files changed, 70 insertions(+), 52 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_mmio_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_mmio_abi.h
index be066a62e9e0..fef51499386b 100644
--- a/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_mmio_abi.h
+++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_mmio_abi.h
@@ -7,44 +7,27 @@
 #define _ABI_GUC_COMMUNICATION_MMIO_ABI_H
 
 /**
- * DOC: MMIO based communication
+ * DOC: GuC MMIO based communication
  *
- * The MMIO based communication between Host and GuC uses software scratch
- * registers, where first register holds data treated as message header,
- * and other registers are used to hold message payload.
+ * The MMIO based communication between Host and GuC relies on special
+ * hardware registers whose format can be defined by the software
+ * (so called scratch registers).
  *
- * For Gen9+, GuC uses software scratch registers 0xC180-0xC1B8,
- * but no H2G command takes more than 8 parameters and the GuC FW
- * itself uses an 8-element array to store the H2G message.
- *
- *      +-----------+---------+---------+---------+
- *      |  MMIO[0]  | MMIO[1] |   ...   | MMIO[n] |
- *      +-----------+---------+---------+---------+
- *      | header    |      optional payload       |
- *      +======+====+=========+=========+=========+
- *      | 31:28|type|         |         |         |
- *      +------+----+         |         |         |
- *      | 27:16|data|         |         |         |
- *      +------+----+         |         |         |
- *      |  15:0|code|         |         |         |
- *      +------+----+---------+---------+---------+
+ * Each MMIO based message, both Host to GuC (H2G) and GuC to Host (G2H),
+ * whose maximum length depends on the number of available scratch
+ * registers, is written directly into those scratch registers.
  *
- * The message header consists of:
- *
- * - **type**, indicates message type
- * - **code**, indicates message code, is specific for **type**
- * - **data**, indicates message data, optional, depends on **code**
+ * For Gen9+, there are 16 software scratch registers 0xC180-0xC1B8,
+ * but no H2G command takes more than 8 parameters and the GuC firmware
+ * itself uses an 8-element array to store the H2G message.
  *
- * The following message **types** are supported:
+ * For Gen11+, there are 4 additional registers 0x190240-0x19024C, which
+ * are preferred over the legacy ones despite the lower count.
  *
- * - **REQUEST**, indicates Host-to-GuC request, requested GuC action code
- *   must be priovided in **code** field. Optional action specific parameters
- *   can be provided in remaining payload registers or **data** field.
+ * The MMIO based communication is mainly used during driver initialization
+ * phase to setup the CTB based communication that will be used afterwards.
  *
- * - **RESPONSE**, indicates GuC-to-Host response from earlier GuC request,
- *   action response status will be provided in **code** field. Optional
- *   response data can be returned in remaining payload registers or **data**
- *   field.
+ * Format of the MMIO messages follows definitions of `HXG Message`_.
  */
 
 #define GUC_MAX_MMIO_MSG_LEN		8
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
index ab2c8fe8cdfa..454c8d886499 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
@@ -385,29 +385,27 @@ void intel_guc_fini(struct intel_guc *guc)
 /*
  * This function implements the MMIO based host to GuC interface.
  */
-int intel_guc_send_mmio(struct intel_guc *guc, const u32 *action, u32 len,
+int intel_guc_send_mmio(struct intel_guc *guc, const u32 *request, u32 len,
 			u32 *response_buf, u32 response_buf_size)
 {
+	struct drm_i915_private *i915 = guc_to_gt(guc)->i915;
 	struct intel_uncore *uncore = guc_to_gt(guc)->uncore;
-	u32 status;
+	u32 header;
 	int i;
 	int ret;
 
 	GEM_BUG_ON(!len);
 	GEM_BUG_ON(len > guc->send_regs.count);
 
-	/* We expect only action code */
-	GEM_BUG_ON(*action & ~INTEL_GUC_MSG_CODE_MASK);
-
-	/* If CT is available, we expect to use MMIO only during init/fini */
-	GEM_BUG_ON(*action != INTEL_GUC_ACTION_REGISTER_COMMAND_TRANSPORT_BUFFER &&
-		   *action != INTEL_GUC_ACTION_DEREGISTER_COMMAND_TRANSPORT_BUFFER);
+	GEM_BUG_ON(FIELD_GET(GUC_HXG_MSG_0_ORIGIN, request[0]) != GUC_HXG_ORIGIN_HOST);
+	GEM_BUG_ON(FIELD_GET(GUC_HXG_MSG_0_TYPE, request[0]) != GUC_HXG_TYPE_REQUEST);
 
 	mutex_lock(&guc->send_mutex);
 	intel_uncore_forcewake_get(uncore, guc->send_regs.fw_domains);
 
+retry:
 	for (i = 0; i < len; i++)
-		intel_uncore_write(uncore, guc_send_reg(guc, i), action[i]);
+		intel_uncore_write(uncore, guc_send_reg(guc, i), request[i]);
 
 	intel_uncore_posting_read(uncore, guc_send_reg(guc, i - 1));
 
@@ -419,17 +417,54 @@ int intel_guc_send_mmio(struct intel_guc *guc, const u32 *action, u32 len,
 	 */
 	ret = __intel_wait_for_register_fw(uncore,
 					   guc_send_reg(guc, 0),
-					   INTEL_GUC_MSG_TYPE_MASK,
-					   INTEL_GUC_MSG_TYPE_RESPONSE <<
-					   INTEL_GUC_MSG_TYPE_SHIFT,
-					   10, 10, &status);
-	/* If GuC explicitly returned an error, convert it to -EIO */
-	if (!ret && !INTEL_GUC_MSG_IS_RESPONSE_SUCCESS(status))
-		ret = -EIO;
+					   GUC_HXG_MSG_0_ORIGIN,
+					   FIELD_PREP(GUC_HXG_MSG_0_ORIGIN,
+						      GUC_HXG_ORIGIN_GUC),
+					   10, 10, &header);
+	if (unlikely(ret)) {
+timeout:
+		drm_err(&i915->drm, "mmio request %#x: no reply %x\n",
+			request[0], header);
+		goto out;
+	}
 
-	if (ret) {
-		DRM_ERROR("MMIO: GuC action %#x failed with error %d %#x\n",
-			  action[0], ret, status);
+	if (FIELD_GET(GUC_HXG_MSG_0_TYPE, header) == GUC_HXG_TYPE_NO_RESPONSE_BUSY) {
+#define done ({ header = intel_uncore_read(uncore, guc_send_reg(guc, 0)); \
+		FIELD_GET(GUC_HXG_MSG_0_ORIGIN, header) != GUC_HXG_ORIGIN_GUC || \
+		FIELD_GET(GUC_HXG_MSG_0_TYPE, header) != GUC_HXG_TYPE_NO_RESPONSE_BUSY; })
+
+		ret = wait_for(done, 1000);
+		if (unlikely(ret))
+			goto timeout;
+		if (unlikely(FIELD_GET(GUC_HXG_MSG_0_ORIGIN, header) !=
+				       GUC_HXG_ORIGIN_GUC))
+			goto proto;
+#undef done
+	}
+
+	if (FIELD_GET(GUC_HXG_MSG_0_TYPE, header) == GUC_HXG_TYPE_NO_RESPONSE_RETRY) {
+		u32 reason = FIELD_GET(GUC_HXG_RETRY_MSG_0_REASON, header);
+
+		drm_dbg(&i915->drm, "mmio request %#x: retrying, reason %u\n",
+			request[0], reason);
+		goto retry;
+	}
+
+	if (FIELD_GET(GUC_HXG_MSG_0_TYPE, header) == GUC_HXG_TYPE_RESPONSE_FAILURE) {
+		u32 hint = FIELD_GET(GUC_HXG_FAILURE_MSG_0_HINT, header);
+		u32 error = FIELD_GET(GUC_HXG_FAILURE_MSG_0_ERROR, header);
+
+		drm_err(&i915->drm, "mmio request %#x: failure %x/%u\n",
+			request[0], error, hint);
+		ret = -ENXIO;
+		goto out;
+	}
+
+	if (FIELD_GET(GUC_HXG_MSG_0_TYPE, header) != GUC_HXG_TYPE_RESPONSE_SUCCESS) {
+proto:
+		drm_err(&i915->drm, "mmio request %#x: unexpected reply %#x\n",
+			request[0], header);
+		ret = -EPROTO;
 		goto out;
 	}
 
@@ -442,7 +477,7 @@ int intel_guc_send_mmio(struct intel_guc *guc, const u32 *action, u32 len,
 	}
 
 	/* Use data from the GuC response as our return value */
-	ret = INTEL_GUC_MSG_TO_DATA(status);
+	ret = FIELD_GET(GUC_HXG_RESPONSE_MSG_0_DATA0, header);
 
 out:
 	intel_uncore_forcewake_put(uncore, guc->send_regs.fw_domains);
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 22/97] drm/i915/guc: Update CTB response status
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (20 preceding siblings ...)
  2021-05-06 19:13 ` [RFC PATCH 21/97] drm/i915/guc: Update MMIO based communication Matthew Brost
@ 2021-05-06 19:13 ` Matthew Brost
  2021-05-06 19:13 ` [RFC PATCH 23/97] drm/i915/guc: Support per context scheduling policies Matthew Brost
                   ` (77 subsequent siblings)
  99 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:13 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

From: Michal Wajdeczko <michal.wajdeczko@intel.com>

The format of the STATUS dword in the CTB response message now follows
the definition of the HXG header. Update our code and remove any
obsolete legacy definitions.

GuC: 55.0.0
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Piotr Piórkowski <piotr.piorkowski@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/abi/guc_errors_abi.h |  1 -
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c       | 12 ++++++------
 drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h     | 17 -----------------
 3 files changed, 6 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_errors_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_errors_abi.h
index 488b6061ee89..2030896857d5 100644
--- a/drivers/gpu/drm/i915/gt/uc/abi/guc_errors_abi.h
+++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_errors_abi.h
@@ -7,7 +7,6 @@
 #define _ABI_GUC_ERRORS_ABI_H
 
 enum intel_guc_response_status {
-	INTEL_GUC_RESPONSE_STATUS_SUCCESS = 0x0,
 	INTEL_GUC_RESPONSE_STATUS_GENERIC_FAIL = 0xF000,
 };
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index a174978c6a27..1afdeac683b5 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -455,7 +455,7 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
 	 */
 	timeout = max(10, CONFIG_DRM_I915_HEARTBEAT_INTERVAL);
 
-#define done INTEL_GUC_MSG_IS_RESPONSE(READ_ONCE(req->status))
+#define done (FIELD_GET(GUC_HXG_MSG_0_ORIGIN, READ_ONCE(req->status)) == GUC_HXG_ORIGIN_GUC)
 	err = wait_for_us(done, 10);
 	if (err)
 		err = wait_for(done, timeout);
@@ -510,21 +510,21 @@ static int ct_send(struct intel_guc_ct *ct,
 	if (unlikely(err))
 		goto unlink;
 
-	if (!INTEL_GUC_MSG_IS_RESPONSE_SUCCESS(*status)) {
+	if (FIELD_GET(GUC_HXG_MSG_0_TYPE, *status) != GUC_HXG_TYPE_RESPONSE_SUCCESS) {
 		err = -EIO;
 		goto unlink;
 	}
 
 	if (response_buf) {
 		/* There shall be no data in the status */
-		WARN_ON(INTEL_GUC_MSG_TO_DATA(request.status));
+		WARN_ON(FIELD_GET(GUC_HXG_RESPONSE_MSG_0_DATA0, request.status));
 		/* Return actual response len */
 		err = request.response_len;
 	} else {
 		/* There shall be no response payload */
 		WARN_ON(request.response_len);
 		/* Return data decoded from the status dword */
-		err = INTEL_GUC_MSG_TO_DATA(*status);
+		err = FIELD_GET(GUC_HXG_RESPONSE_MSG_0_DATA0, *status);
 	}
 
 unlink:
@@ -719,8 +719,8 @@ static int ct_handle_response(struct intel_guc_ct *ct, struct ct_incoming_msg *r
 	status = response->msg[2];
 	datalen = len - 2;
 
-	/* Format of the status follows RESPONSE message */
-	if (unlikely(!INTEL_GUC_MSG_IS_RESPONSE(status))) {
+	/* Format of the status dword follows HXG header */
+	if (unlikely(FIELD_GET(GUC_HXG_MSG_0_ORIGIN, status) != GUC_HXG_ORIGIN_GUC)) {
 		CT_ERROR(ct, "Corrupted response (status %#x)\n", status);
 		return -EPROTO;
 	}
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
index 9bf35240e723..d445f6b77db4 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
@@ -388,23 +388,6 @@ struct guc_shared_ctx_data {
 	struct guc_ctx_report preempt_ctx_report[GUC_MAX_ENGINES_NUM];
 } __packed;
 
-#define __INTEL_GUC_MSG_GET(T, m) \
-	(((m) & INTEL_GUC_MSG_ ## T ## _MASK) >> INTEL_GUC_MSG_ ## T ## _SHIFT)
-#define INTEL_GUC_MSG_TO_TYPE(m)	__INTEL_GUC_MSG_GET(TYPE, m)
-#define INTEL_GUC_MSG_TO_DATA(m)	__INTEL_GUC_MSG_GET(DATA, m)
-#define INTEL_GUC_MSG_TO_CODE(m)	__INTEL_GUC_MSG_GET(CODE, m)
-
-#define __INTEL_GUC_MSG_TYPE_IS(T, m) \
-	(INTEL_GUC_MSG_TO_TYPE(m) == INTEL_GUC_MSG_TYPE_ ## T)
-#define INTEL_GUC_MSG_IS_REQUEST(m)	__INTEL_GUC_MSG_TYPE_IS(REQUEST, m)
-#define INTEL_GUC_MSG_IS_RESPONSE(m)	__INTEL_GUC_MSG_TYPE_IS(RESPONSE, m)
-
-#define INTEL_GUC_MSG_IS_RESPONSE_SUCCESS(m) \
-	 (typecheck(u32, (m)) && \
-	  ((m) & (INTEL_GUC_MSG_TYPE_MASK | INTEL_GUC_MSG_CODE_MASK)) == \
-	  ((INTEL_GUC_MSG_TYPE_RESPONSE << INTEL_GUC_MSG_TYPE_SHIFT) | \
-	   (INTEL_GUC_RESPONSE_STATUS_SUCCESS << INTEL_GUC_MSG_CODE_SHIFT)))
-
 /* This action will be programmed in C1BC - SOFT_SCRATCH_15_REG */
 enum intel_guc_recv_message {
 	INTEL_GUC_RECV_MSG_CRASH_DUMP_POSTED = BIT(1),
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 23/97] drm/i915/guc: Support per context scheduling policies
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (21 preceding siblings ...)
  2021-05-06 19:13 ` [RFC PATCH 22/97] drm/i915/guc: Update CTB response status Matthew Brost
@ 2021-05-06 19:13 ` Matthew Brost
  2021-05-25  1:15   ` Matthew Brost
  2021-05-06 19:13 ` [RFC PATCH 24/97] drm/i915/guc: Add flag for mark broken CTB Matthew Brost
                   ` (76 subsequent siblings)
  99 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:13 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

From: John Harrison <John.C.Harrison@Intel.com>

GuC firmware v53.0.0 introduced per-context scheduling policies. This
includes changes to some of the ADS structures, which are required to
load the firmware even when GuC submission is not in use.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c  | 26 +++--------------
 drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h | 31 +++++----------------
 2 files changed, 11 insertions(+), 46 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
index 17526717368c..648e1767b17a 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
@@ -58,30 +58,12 @@ static u32 guc_ads_blob_size(struct intel_guc *guc)
 	       guc_ads_private_data_size(guc);
 }
 
-static void guc_policy_init(struct guc_policy *policy)
-{
-	policy->execution_quantum = POLICY_DEFAULT_EXECUTION_QUANTUM_US;
-	policy->preemption_time = POLICY_DEFAULT_PREEMPTION_TIME_US;
-	policy->fault_time = POLICY_DEFAULT_FAULT_TIME_US;
-	policy->policy_flags = 0;
-}
-
 static void guc_policies_init(struct guc_policies *policies)
 {
-	struct guc_policy *policy;
-	u32 p, i;
-
-	policies->dpc_promote_time = POLICY_DEFAULT_DPC_PROMOTE_TIME_US;
-	policies->max_num_work_items = POLICY_MAX_NUM_WI;
-
-	for (p = 0; p < GUC_CLIENT_PRIORITY_NUM; p++) {
-		for (i = 0; i < GUC_MAX_ENGINE_CLASSES; i++) {
-			policy = &policies->policy[p][i];
-
-			guc_policy_init(policy);
-		}
-	}
-
+	policies->dpc_promote_time = GLOBAL_POLICY_DEFAULT_DPC_PROMOTE_TIME_US;
+	policies->max_num_work_items = GLOBAL_POLICY_MAX_NUM_WI;
+	/* Disable automatic resets as not yet supported. */
+	policies->global_flags = GLOBAL_POLICY_DISABLE_ENGINE_RESET;
 	policies->is_valid = 1;
 }
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
index d445f6b77db4..95db4a7d3f4d 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
@@ -221,32 +221,14 @@ struct guc_stage_desc {
 
 /* Scheduling policy settings */
 
-/* Reset engine upon preempt failure */
-#define POLICY_RESET_ENGINE		(1<<0)
-/* Preempt to idle on quantum expiry */
-#define POLICY_PREEMPT_TO_IDLE		(1<<1)
-
-#define POLICY_MAX_NUM_WI 15
-#define POLICY_DEFAULT_DPC_PROMOTE_TIME_US 500000
-#define POLICY_DEFAULT_EXECUTION_QUANTUM_US 1000000
-#define POLICY_DEFAULT_PREEMPTION_TIME_US 500000
-#define POLICY_DEFAULT_FAULT_TIME_US 250000
-
-struct guc_policy {
-	/* Time for one workload to execute. (in micro seconds) */
-	u32 execution_quantum;
-	/* Time to wait for a preemption request to completed before issuing a
-	 * reset. (in micro seconds). */
-	u32 preemption_time;
-	/* How much time to allow to run after the first fault is observed.
-	 * Then preempt afterwards. (in micro seconds) */
-	u32 fault_time;
-	u32 policy_flags;
-	u32 reserved[8];
-} __packed;
+#define GLOBAL_POLICY_MAX_NUM_WI 15
+
+/* Don't reset an engine upon preemption failure */
+#define GLOBAL_POLICY_DISABLE_ENGINE_RESET				BIT(0)
+
+#define GLOBAL_POLICY_DEFAULT_DPC_PROMOTE_TIME_US 500000
 
 struct guc_policies {
-	struct guc_policy policy[GUC_CLIENT_PRIORITY_NUM][GUC_MAX_ENGINE_CLASSES];
 	u32 submission_queue_depth[GUC_MAX_ENGINE_CLASSES];
 	/* In micro seconds. How much time to allow before DPC processing is
 	 * called back via interrupt (to prevent DPC queue drain starving).
@@ -260,6 +242,7 @@ struct guc_policies {
 	 * idle. */
 	u32 max_num_work_items;
 
+	u32 global_flags;
 	u32 reserved[4];
 } __packed;
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 24/97] drm/i915/guc: Add flag for mark broken CTB
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (22 preceding siblings ...)
  2021-05-06 19:13 ` [RFC PATCH 23/97] drm/i915/guc: Support per context scheduling policies Matthew Brost
@ 2021-05-06 19:13 ` Matthew Brost
  2021-05-27 19:44   ` Matthew Brost
  2021-05-06 19:13 ` [RFC PATCH 25/97] drm/i915/guc: New definition of the CTB descriptor Matthew Brost
                   ` (75 subsequent siblings)
  99 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:13 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

From: Michal Wajdeczko <michal.wajdeczko@intel.com>

Once the CTB descriptor is found in an error state, whether set by the
GuC or by us, there is no need to keep checking the descriptor; we can
rely on our internal flag instead.
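
As a simplified sketch (illustrative only, not the exact driver code) of
the intended check order - consult the cheap host-local flag first, fall
back to the descriptor shared with the GuC, and latch the flag once an
error has been seen:

	static int ctb_usable(struct intel_guc_ct_buffer *ctb)
	{
		if (unlikely(ctb->broken))		/* host-local, cheap */
			return -EPIPE;

		if (unlikely(ctb->desc->is_in_error)) {	/* shared with GuC */
			ctb->broken = true;		/* remember locally */
			return -EPIPE;
		}

		return 0;
	}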

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Piotr Piórkowski <piotr.piorkowski@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 13 +++++++++++--
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  2 ++
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 1afdeac683b5..178f73ab2c96 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -123,6 +123,7 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc,
 
 static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb, u32 cmds_addr)
 {
+	ctb->broken = false;
 	guc_ct_buffer_desc_init(ctb->desc, cmds_addr, ctb->size);
 }
 
@@ -365,9 +366,12 @@ static int ct_write(struct intel_guc_ct *ct,
 	u32 *cmds = ctb->cmds;
 	unsigned int i;
 
-	if (unlikely(desc->is_in_error))
+	if (unlikely(ctb->broken))
 		return -EPIPE;
 
+	if (unlikely(desc->is_in_error))
+		goto corrupted;
+
 	if (unlikely(!IS_ALIGNED(head | tail, 4) ||
 		     (tail | head) >= size))
 		goto corrupted;
@@ -423,6 +427,7 @@ static int ct_write(struct intel_guc_ct *ct,
 	CT_ERROR(ct, "Corrupted descriptor addr=%#x head=%u tail=%u size=%u\n",
 		 desc->addr, desc->head, desc->tail, desc->size);
 	desc->is_in_error = 1;
+	ctb->broken = true;
 	return -EPIPE;
 }
 
@@ -608,9 +613,12 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 	unsigned int i;
 	u32 header;
 
-	if (unlikely(desc->is_in_error))
+	if (unlikely(ctb->broken))
 		return -EPIPE;
 
+	if (unlikely(desc->is_in_error))
+		goto corrupted;
+
 	if (unlikely(!IS_ALIGNED(head | tail, 4) ||
 		     (tail | head) >= size))
 		goto corrupted;
@@ -674,6 +682,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 	CT_ERROR(ct, "Corrupted descriptor addr=%#x head=%u tail=%u size=%u\n",
 		 desc->addr, desc->head, desc->tail, desc->size);
 	desc->is_in_error = 1;
+	ctb->broken = true;
 	return -EPIPE;
 }
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
index cb222f202301..7d3cd375d6a7 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
@@ -32,12 +32,14 @@ struct intel_guc;
  * @desc: pointer to the buffer descriptor
  * @cmds: pointer to the commands buffer
  * @size: size of the commands buffer
+ * @broken: flag to indicate if descriptor data is broken
  */
 struct intel_guc_ct_buffer {
 	spinlock_t lock;
 	struct guc_ct_buffer_desc *desc;
 	u32 *cmds;
 	u32 size;
+	bool broken;
 };
 
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 25/97] drm/i915/guc: New definition of the CTB descriptor
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (23 preceding siblings ...)
  2021-05-06 19:13 ` [RFC PATCH 24/97] drm/i915/guc: Add flag for mark broken CTB Matthew Brost
@ 2021-05-06 19:13 ` Matthew Brost
  2021-05-06 19:13 ` [RFC PATCH 26/97] drm/i915/guc: New definition of the CTB registration action Matthew Brost
                   ` (74 subsequent siblings)
  99 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:13 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

From: Michal Wajdeczko <michal.wajdeczko@intel.com>

The definition of the CTB descriptor has changed, leaving only a
minimal set of shared fields (HEAD/TAIL/STATUS).

Both HEAD and TAIL are now in dwords.

Add some ABI documentation and implement the required changes.

GuC: 57.0.0
GuC: 60.0.0
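
As an illustration of the new dword-based indexing, a simplified sketch
of the ring arithmetic (the real ct_write()/ct_read() below additionally
validate the offsets against the descriptor status):

	/* head, tail and size are all expressed in dwords now */
	static u32 ctb_used_dwords(u32 head, u32 tail, u32 size)
	{
		/* tail == head means empty; handle wrap-around explicitly */
		return (tail >= head) ? tail - head : size - head + tail;
	}

	/* free space, keeping one dword unused so that full != empty */
	static u32 ctb_space_dwords(u32 head, u32 tail, u32 size)
	{
		return size - ctb_used_dwords(head, tail, size) - 1;
	}
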
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 .../gt/uc/abi/guc_communication_ctb_abi.h     | 70 ++++++++++++++-----
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     | 70 +++++++++----------
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h     |  2 +-
 3 files changed, 85 insertions(+), 57 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h
index d38935f47ecf..c2a069a78e01 100644
--- a/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h
+++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h
@@ -7,6 +7,58 @@
 #define _ABI_GUC_COMMUNICATION_CTB_ABI_H
 
 #include <linux/types.h>
+#include <linux/build_bug.h>
+
+#include "guc_messages_abi.h"
+
+/**
+ * DOC: CT Buffer
+ *
+ * TBD
+ */
+
+/**
+ * DOC: CTB Descriptor
+ *
+ *  +---+-------+--------------------------------------------------------------+
+ *  |   | Bits  | Description                                                  |
+ *  +===+=======+==============================================================+
+ *  | 0 |  31:0 | **HEAD** - offset (in dwords) to the last dword that was     |
+ *  |   |       | read from the `CT Buffer`_.                                  |
+ *  |   |       | It can only be updated by the receiver.                      |
+ *  +---+-------+--------------------------------------------------------------+
+ *  | 1 |  31:0 | **TAIL** - offset (in dwords) to the last dword that was     |
+ *  |   |       | written to the `CT Buffer`_.                                 |
+ *  |   |       | It can only be updated by the sender.                        |
+ *  +---+-------+--------------------------------------------------------------+
+ *  | 2 |  31:0 | **STATUS** - status of the CTB                               |
+ *  |   |       |                                                              |
+ *  |   |       |   - _`GUC_CTB_STATUS_NO_ERROR` = 0 (normal operation)        |
+ *  |   |       |   - _`GUC_CTB_STATUS_OVERFLOW` = 1 (head/tail too large)     |
+ *  |   |       |   - _`GUC_CTB_STATUS_UNDERFLOW` = 2 (truncated message)      |
+ *  |   |       |   - _`GUC_CTB_STATUS_MISMATCH` = 4 (head/tail modified)      |
+ *  |   |       |   - _`GUC_CTB_STATUS_NO_BACKCHANNEL` = 8                     |
+ *  |   |       |   - _`GUC_CTB_STATUS_MALFORMED_MSG` = 16                     |
+ *  +---+-------+--------------------------------------------------------------+
+ *  |...|       | RESERVED = MBZ                                               |
+ *  +---+-------+--------------------------------------------------------------+
+ *  | 15|  31:0 | RESERVED = MBZ                                               |
+ *  +---+-------+--------------------------------------------------------------+
+ */
+
+struct guc_ct_buffer_desc {
+	u32 head;
+	u32 tail;
+	u32 status;
+#define GUC_CTB_STATUS_NO_ERROR				0
+#define GUC_CTB_STATUS_OVERFLOW				(1 << 0)
+#define GUC_CTB_STATUS_UNDERFLOW			(1 << 1)
+#define GUC_CTB_STATUS_MISMATCH				(1 << 2)
+#define GUC_CTB_STATUS_NO_BACKCHANNEL			(1 << 3)
+#define GUC_CTB_STATUS_MALFORMED_MSG			(1 << 4)
+	u32 reserved[13];
+} __packed;
+static_assert(sizeof(struct guc_ct_buffer_desc) == 64);
 
 /**
  * DOC: CTB based communication
@@ -60,24 +112,6 @@
  * - **flags**, holds various bits to control message handling
  */
 
-/*
- * Describes single command transport buffer.
- * Used by both guc-master and clients.
- */
-struct guc_ct_buffer_desc {
-	u32 addr;		/* gfx address */
-	u64 host_private;	/* host private data */
-	u32 size;		/* size in bytes */
-	u32 head;		/* offset updated by GuC*/
-	u32 tail;		/* offset updated by owner */
-	u32 is_in_error;	/* error indicator */
-	u32 reserved1;
-	u32 reserved2;
-	u32 owner;		/* id of the channel owner */
-	u32 owner_sub_id;	/* owner-defined field for extra tracking */
-	u32 reserved[5];
-} __packed;
-
 /* Type of command transport buffer */
 #define INTEL_GUC_CT_BUFFER_TYPE_SEND	0x0u
 #define INTEL_GUC_CT_BUFFER_TYPE_RECV	0x1u
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 178f73ab2c96..282df9706912 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -112,32 +112,28 @@ static inline const char *guc_ct_buffer_type_to_str(u32 type)
 	}
 }
 
-static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc,
-				    u32 cmds_addr, u32 size)
+static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc)
 {
 	memset(desc, 0, sizeof(*desc));
-	desc->addr = cmds_addr;
-	desc->size = size;
-	desc->owner = CTB_OWNER_HOST;
 }
 
-static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb, u32 cmds_addr)
+static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb)
 {
 	ctb->broken = false;
-	guc_ct_buffer_desc_init(ctb->desc, cmds_addr, ctb->size);
+	guc_ct_buffer_desc_init(ctb->desc);
 }
 
 static void guc_ct_buffer_init(struct intel_guc_ct_buffer *ctb,
 			       struct guc_ct_buffer_desc *desc,
-			       u32 *cmds, u32 size)
+			       u32 *cmds, u32 size_in_bytes)
 {
-	GEM_BUG_ON(size % 4);
+	GEM_BUG_ON(size_in_bytes % 4);
 
 	ctb->desc = desc;
 	ctb->cmds = cmds;
-	ctb->size = size;
+	ctb->size = size_in_bytes / 4;
 
-	guc_ct_buffer_reset(ctb, 0);
+	guc_ct_buffer_reset(ctb);
 }
 
 static int guc_action_register_ct_buffer(struct intel_guc *guc,
@@ -279,10 +275,10 @@ int intel_guc_ct_enable(struct intel_guc_ct *ct)
 
 	/* (re)initialize descriptors */
 	cmds = base + ptrdiff(ct->ctbs.send.cmds, blob);
-	guc_ct_buffer_reset(&ct->ctbs.send, cmds);
+	guc_ct_buffer_reset(&ct->ctbs.send);
 
 	cmds = base + ptrdiff(ct->ctbs.recv.cmds, blob);
-	guc_ct_buffer_reset(&ct->ctbs.recv, cmds);
+	guc_ct_buffer_reset(&ct->ctbs.recv);
 
 	/*
 	 * Register both CT buffers starting with RECV buffer.
@@ -369,17 +365,15 @@ static int ct_write(struct intel_guc_ct *ct,
 	if (unlikely(ctb->broken))
 		return -EPIPE;
 
-	if (unlikely(desc->is_in_error))
+	if (unlikely(desc->status))
 		goto corrupted;
 
-	if (unlikely(!IS_ALIGNED(head | tail, 4) ||
-		     (tail | head) >= size))
+	if (unlikely((tail | head) >= size)) {
+		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
+			 head, tail, size);
+		desc->status |= GUC_CTB_STATUS_OVERFLOW;
 		goto corrupted;
-
-	/* later calculations will be done in dwords */
-	head /= 4;
-	tail /= 4;
-	size /= 4;
+	}
 
 	/*
 	 * tail == head condition indicates empty. GuC FW does not support
@@ -419,14 +413,14 @@ static int ct_write(struct intel_guc_ct *ct,
 	}
 	GEM_BUG_ON(tail > size);
 
-	/* now update desc tail (back in bytes) */
-	desc->tail = tail * 4;
+	/* now update descriptor */
+	WRITE_ONCE(desc->tail, tail);
+
 	return 0;
 
 corrupted:
-	CT_ERROR(ct, "Corrupted descriptor addr=%#x head=%u tail=%u size=%u\n",
-		 desc->addr, desc->head, desc->tail, desc->size);
-	desc->is_in_error = 1;
+	CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u status=%#x\n",
+		 desc->head, desc->tail, desc->status);
 	ctb->broken = true;
 	return -EPIPE;
 }
@@ -616,17 +610,15 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 	if (unlikely(ctb->broken))
 		return -EPIPE;
 
-	if (unlikely(desc->is_in_error))
+	if (unlikely(desc->status))
 		goto corrupted;
 
-	if (unlikely(!IS_ALIGNED(head | tail, 4) ||
-		     (tail | head) >= size))
+	if (unlikely((tail | head) >= size)) {
+		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
+			 head, tail, size);
+		desc->status |= GUC_CTB_STATUS_OVERFLOW;
 		goto corrupted;
-
-	/* later calculations will be done in dwords */
-	head /= 4;
-	tail /= 4;
-	size /= 4;
+	}
 
 	/* tail == head condition indicates empty */
 	available = tail - head;
@@ -653,6 +645,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 			      size - head : available - 1), &cmds[head],
 			 4 * (head + available - 1 > size ?
 			      available - 1 - size + head : 0), &cmds[0]);
+		desc->status |= GUC_CTB_STATUS_UNDERFLOW;
 		goto corrupted;
 	}
 
@@ -675,13 +668,14 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 	}
 	CT_DEBUG(ct, "received %*ph\n", 4 * len, (*msg)->msg);
 
-	desc->head = head * 4;
+	/* now update descriptor */
+	WRITE_ONCE(desc->head, head);
+
 	return available - len;
 
 corrupted:
-	CT_ERROR(ct, "Corrupted descriptor addr=%#x head=%u tail=%u size=%u\n",
-		 desc->addr, desc->head, desc->tail, desc->size);
-	desc->is_in_error = 1;
+	CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u status=%#x\n",
+		 desc->head, desc->tail, desc->status);
 	ctb->broken = true;
 	return -EPIPE;
 }
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
index 7d3cd375d6a7..905202caaad3 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
@@ -31,7 +31,7 @@ struct intel_guc;
  * @lock: protects access to the commands buffer and buffer descriptor
  * @desc: pointer to the buffer descriptor
  * @cmds: pointer to the commands buffer
- * @size: size of the commands buffer
+ * @size: size of the commands buffer in dwords
  * @broken: flag to indicate if descriptor data is broken
  */
 struct intel_guc_ct_buffer {
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 26/97] drm/i915/guc: New definition of the CTB registration action
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (24 preceding siblings ...)
  2021-05-06 19:13 ` [RFC PATCH 25/97] drm/i915/guc: New definition of the CTB descriptor Matthew Brost
@ 2021-05-06 19:13 ` Matthew Brost
  2021-05-06 19:13 ` [RFC PATCH 27/97] drm/i915/guc: New CTB based communication Matthew Brost
                   ` (73 subsequent siblings)
  99 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:13 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

From: Michal Wajdeczko <michal.wajdeczko@intel.com>

The definition of the CTB registration action has changed.
Add some ABI documentation and implement the required changes.

GuC: 57.0.0
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h  | 107 ++++++++++++++++++
 .../gt/uc/abi/guc_communication_ctb_abi.h     |   4 -
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |  76 ++++++++-----
 3 files changed, 152 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
index 90efef8a73e4..6cb0d3eb9b72 100644
--- a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
+++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
@@ -6,6 +6,113 @@
 #ifndef _ABI_GUC_ACTIONS_ABI_H
 #define _ABI_GUC_ACTIONS_ABI_H
 
+/**
+ * DOC: HOST2GUC_REGISTER_CTB
+ *
+ * This message is used as part of the `CTB based communication`_ setup.
+ *
+ * This message must be sent as `MMIO H2G Message`_.
+ *
+ *  +---+-------+--------------------------------------------------------------+
+ *  |   | Bits  | Description                                                  |
+ *  +===+=======+==============================================================+
+ *  | 0 |    31 | ORIGIN = GUC_HXG_ORIGIN_HOST_                                |
+ *  |   +-------+--------------------------------------------------------------+
+ *  |   | 30:28 | TYPE = GUC_HXG_TYPE_REQUEST_                                 |
+ *  |   +-------+--------------------------------------------------------------+
+ *  |   | 27:16 | DATA0 = MBZ                                                  |
+ *  |   +-------+--------------------------------------------------------------+
+ *  |   |  15:0 | ACTION = _`GUC_ACTION_HOST2GUC_REGISTER_CTB` = 0x5200        |
+ *  +---+-------+--------------------------------------------------------------+
+ *  | 1 | 31:12 | RESERVED = MBZ                                               |
+ *  |   +-------+--------------------------------------------------------------+
+ *  |   |  11:8 | **TYPE** - type for the CT buffer                            |
+ *  |   |       |                                                              |
+ *  |   |       |   - _`GUC_CTB_TYPE_HOST2GUC` = 0                             |
+ *  |   |       |   - _`GUC_CTB_TYPE_GUC2HOST` = 1                             |
+ *  |   +-------+--------------------------------------------------------------+
+ *  |   |   7:0 | **SIZE** - size of the `CT Buffer`_ in 4K units minus 1      |
+ *  +---+-------+--------------------------------------------------------------+
+ *  | 2 |  31:0 | **DESC_ADDR** - GGTT address of the `CT Descriptor`_         |
+ *  +---+-------+--------------------------------------------------------------+
+ *  | 3 |  31:0 | **BUFF_ADDF** - GGTT address of the `CT Buffer`_             |
+ *  +---+-------+--------------------------------------------------------------+
+*
+ *  +---+-------+--------------------------------------------------------------+
+ *  |   | Bits  | Description                                                  |
+ *  +===+=======+==============================================================+
+ *  | 0 |    31 | ORIGIN = GUC_HXG_ORIGIN_GUC_                                 |
+ *  |   +-------+--------------------------------------------------------------+
+ *  |   | 30:28 | TYPE = GUC_HXG_TYPE_RESPONSE_SUCCESS_                        |
+ *  |   +-------+--------------------------------------------------------------+
+ *  |   |  27:0 | DATA0 = MBZ                                                  |
+ *  +---+-------+--------------------------------------------------------------+
+ */
+#define GUC_ACTION_HOST2GUC_REGISTER_CTB		0x4505 // FIXME 0x5200
+
+#define HOST2GUC_REGISTER_CTB_REQUEST_MSG_LEN		(GUC_HXG_REQUEST_MSG_MIN_LEN + 3u)
+#define HOST2GUC_REGISTER_CTB_REQUEST_MSG_0_MBZ		GUC_HXG_REQUEST_MSG_0_DATA0
+#define HOST2GUC_REGISTER_CTB_REQUEST_MSG_1_MBZ		(0xfffff << 12)
+#define HOST2GUC_REGISTER_CTB_REQUEST_MSG_1_TYPE	(0xf << 8)
+#define   GUC_CTB_TYPE_HOST2GUC				0u
+#define   GUC_CTB_TYPE_GUC2HOST				1u
+#define HOST2GUC_REGISTER_CTB_REQUEST_MSG_1_SIZE	(0xff << 0)
+#define HOST2GUC_REGISTER_CTB_REQUEST_MSG_2_DESC_ADDR	GUC_HXG_REQUEST_MSG_n_DATAn
+#define HOST2GUC_REGISTER_CTB_REQUEST_MSG_3_BUFF_ADDR	GUC_HXG_REQUEST_MSG_n_DATAn
+
+#define HOST2GUC_REGISTER_CTB_RESPONSE_MSG_LEN		GUC_HXG_RESPONSE_MSG_MIN_LEN
+#define HOST2GUC_REGISTER_CTB_RESPONSE_MSG_0_MBZ	GUC_HXG_RESPONSE_MSG_0_DATA0
+
+/**
+ * DOC: HOST2GUC_DEREGISTER_CTB
+ *
+ * This message is used as part of the `CTB based communication`_ teardown.
+ *
+ * This message must be sent as `MMIO H2G Message`_.
+ *
+ *  +---+-------+--------------------------------------------------------------+
+ *  |   | Bits  | Description                                                  |
+ *  +===+=======+==============================================================+
+ *  | 0 |    31 | ORIGIN = GUC_HXG_ORIGIN_HOST_                                |
+ *  |   +-------+--------------------------------------------------------------+
+ *  |   | 30:28 | TYPE = GUC_HXG_TYPE_REQUEST_                                 |
+ *  |   +-------+--------------------------------------------------------------+
+ *  |   | 27:16 | DATA0 = MBZ                                                  |
+ *  |   +-------+--------------------------------------------------------------+
+ *  |   |  15:0 | ACTION = _`GUC_ACTION_HOST2GUC_DEREGISTER_CTB` = 0x5201      |
+ *  +---+-------+--------------------------------------------------------------+
+ *  | 1 | 31:12 | RESERVED = MBZ                                               |
+ *  |   +-------+--------------------------------------------------------------+
+ *  |   |  11:8 | **TYPE** - type of the CT buffer                             |
+ *  |   |       |                                                              |
+ *  |   |       | see _`GUC_ACTION_HOST2GUC_REGISTER_CTB`                      |
+ *  |   +-------+--------------------------------------------------------------+
+ *  |   |   7:0 | RESERVED = MBZ                                               |
+ *  +---+-------+--------------------------------------------------------------+
+*
+ *  +---+-------+--------------------------------------------------------------+
+ *  |   | Bits  | Description                                                  |
+ *  +===+=======+==============================================================+
+ *  | 0 |    31 | ORIGIN = GUC_HXG_ORIGIN_GUC_                                 |
+ *  |   +-------+--------------------------------------------------------------+
+ *  |   | 30:28 | TYPE = GUC_HXG_TYPE_RESPONSE_SUCCESS_                        |
+ *  |   +-------+--------------------------------------------------------------+
+ *  |   |  27:0 | DATA0 = MBZ                                                  |
+ *  +---+-------+--------------------------------------------------------------+
+ */
+#define GUC_ACTION_HOST2GUC_DEREGISTER_CTB		0x4506 // FIXME 0x5201
+
+#define HOST2GUC_DEREGISTER_CTB_REQUEST_MSG_LEN		(GUC_HXG_REQUEST_MSG_MIN_LEN + 1u)
+#define HOST2GUC_DEREGISTER_CTB_REQUEST_MSG_0_MBZ	GUC_HXG_REQUEST_MSG_0_DATA0
+#define HOST2GUC_DEREGISTER_CTB_REQUEST_MSG_1_MBZ	(0xfffff << 12)
+#define HOST2GUC_DEREGISTER_CTB_REQUEST_MSG_1_TYPE	(0xf << 8)
+#define HOST2GUC_DEREGISTER_CTB_REQUEST_MSG_1_MBZ2	(0xff << 0)
+
+#define HOST2GUC_DEREGISTER_CTB_RESPONSE_MSG_LEN	GUC_HXG_RESPONSE_MSG_MIN_LEN
+#define HOST2GUC_DEREGISTER_CTB_RESPONSE_MSG_0_MBZ	GUC_HXG_RESPONSE_MSG_0_DATA0
+
+/* legacy definitions */
+
 enum intel_guc_action {
 	INTEL_GUC_ACTION_DEFAULT = 0x0,
 	INTEL_GUC_ACTION_REQUEST_PREEMPTION = 0x2,
diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h
index c2a069a78e01..127b256a662c 100644
--- a/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h
+++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h
@@ -112,10 +112,6 @@ static_assert(sizeof(struct guc_ct_buffer_desc) == 64);
  * - **flags**, holds various bits to control message handling
  */
 
-/* Type of command transport buffer */
-#define INTEL_GUC_CT_BUFFER_TYPE_SEND	0x0u
-#define INTEL_GUC_CT_BUFFER_TYPE_RECV	0x1u
-
 /*
  * Definition of the command transport message header (DW0)
  *
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 282df9706912..e25b49a45107 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -103,9 +103,9 @@ void intel_guc_ct_init_early(struct intel_guc_ct *ct)
 static inline const char *guc_ct_buffer_type_to_str(u32 type)
 {
 	switch (type) {
-	case INTEL_GUC_CT_BUFFER_TYPE_SEND:
+	case GUC_CTB_TYPE_HOST2GUC:
 		return "SEND";
-	case INTEL_GUC_CT_BUFFER_TYPE_RECV:
+	case GUC_CTB_TYPE_GUC2HOST:
 		return "RECV";
 	default:
 		return "<invalid>";
@@ -136,25 +136,33 @@ static void guc_ct_buffer_init(struct intel_guc_ct_buffer *ctb,
 	guc_ct_buffer_reset(ctb);
 }
 
-static int guc_action_register_ct_buffer(struct intel_guc *guc,
-					 u32 desc_addr,
-					 u32 type)
+static int guc_action_register_ct_buffer(struct intel_guc *guc, u32 type,
+					 u32 desc_addr, u32 buff_addr, u32 size)
 {
-	u32 action[] = {
-		INTEL_GUC_ACTION_REGISTER_COMMAND_TRANSPORT_BUFFER,
-		desc_addr,
-		sizeof(struct guc_ct_buffer_desc),
-		type
+	u32 request[HOST2GUC_REGISTER_CTB_REQUEST_MSG_LEN] = {
+		FIELD_PREP(GUC_HXG_MSG_0_ORIGIN, GUC_HXG_ORIGIN_HOST) |
+		FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
+		FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION, GUC_ACTION_HOST2GUC_REGISTER_CTB),
+		FIELD_PREP(HOST2GUC_REGISTER_CTB_REQUEST_MSG_1_SIZE, size / SZ_4K - 1) |
+		FIELD_PREP(HOST2GUC_REGISTER_CTB_REQUEST_MSG_1_TYPE, type),
+		FIELD_PREP(HOST2GUC_REGISTER_CTB_REQUEST_MSG_2_DESC_ADDR, desc_addr),
+		FIELD_PREP(HOST2GUC_REGISTER_CTB_REQUEST_MSG_3_BUFF_ADDR, buff_addr),
 	};
 
-	/* Can't use generic send(), CT registration must go over MMIO */
-	return intel_guc_send_mmio(guc, action, ARRAY_SIZE(action), NULL, 0);
+	GEM_BUG_ON(type != GUC_CTB_TYPE_HOST2GUC && type != GUC_CTB_TYPE_GUC2HOST);
+	GEM_BUG_ON(size % SZ_4K);
+
+	/* CT registration must go over MMIO */
+	return intel_guc_send_mmio(guc, request, ARRAY_SIZE(request), NULL, 0);
 }
 
-static int ct_register_buffer(struct intel_guc_ct *ct, u32 desc_addr, u32 type)
+static int ct_register_buffer(struct intel_guc_ct *ct, u32 type,
+			      u32 desc_addr, u32 buff_addr, u32 size)
 {
-	int err = guc_action_register_ct_buffer(ct_to_guc(ct), desc_addr, type);
+	int err;
 
+	err = guc_action_register_ct_buffer(ct_to_guc(ct), type,
+					    desc_addr, buff_addr, size);
 	if (unlikely(err))
 		CT_ERROR(ct, "Failed to register %s buffer (err=%d)\n",
 			 guc_ct_buffer_type_to_str(type), err);
@@ -163,14 +171,17 @@ static int ct_register_buffer(struct intel_guc_ct *ct, u32 desc_addr, u32 type)
 
 static int guc_action_deregister_ct_buffer(struct intel_guc *guc, u32 type)
 {
-	u32 action[] = {
-		INTEL_GUC_ACTION_DEREGISTER_COMMAND_TRANSPORT_BUFFER,
-		CTB_OWNER_HOST,
-		type
+	u32 request[HOST2GUC_DEREGISTER_CTB_REQUEST_MSG_LEN] = {
+		FIELD_PREP(GUC_HXG_MSG_0_ORIGIN, GUC_HXG_ORIGIN_HOST) |
+		FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
+		FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION, GUC_ACTION_HOST2GUC_DEREGISTER_CTB),
+		FIELD_PREP(HOST2GUC_DEREGISTER_CTB_REQUEST_MSG_1_TYPE, type),
 	};
 
-	/* Can't use generic send(), CT deregistration must go over MMIO */
-	return intel_guc_send_mmio(guc, action, ARRAY_SIZE(action), NULL, 0);
+	GEM_BUG_ON(type != GUC_CTB_TYPE_HOST2GUC && type != GUC_CTB_TYPE_GUC2HOST);
+
+	/* CT deregistration must go over MMIO */
+	return intel_guc_send_mmio(guc, request, ARRAY_SIZE(request), NULL, 0);
 }
 
 static int ct_deregister_buffer(struct intel_guc_ct *ct, u32 type)
@@ -258,7 +269,7 @@ void intel_guc_ct_fini(struct intel_guc_ct *ct)
 int intel_guc_ct_enable(struct intel_guc_ct *ct)
 {
 	struct intel_guc *guc = ct_to_guc(ct);
-	u32 base, cmds;
+	u32 base, desc, cmds;
 	void *blob;
 	int err;
 
@@ -274,23 +285,26 @@ int intel_guc_ct_enable(struct intel_guc_ct *ct)
 	GEM_BUG_ON(blob != ct->ctbs.send.desc);
 
 	/* (re)initialize descriptors */
-	cmds = base + ptrdiff(ct->ctbs.send.cmds, blob);
 	guc_ct_buffer_reset(&ct->ctbs.send);
-
-	cmds = base + ptrdiff(ct->ctbs.recv.cmds, blob);
 	guc_ct_buffer_reset(&ct->ctbs.recv);
 
 	/*
 	 * Register both CT buffers starting with RECV buffer.
 	 * Descriptors are in first half of the blob.
 	 */
-	err = ct_register_buffer(ct, base + ptrdiff(ct->ctbs.recv.desc, blob),
-				 INTEL_GUC_CT_BUFFER_TYPE_RECV);
+	desc = base + ptrdiff(ct->ctbs.recv.desc, blob);
+	cmds = base + ptrdiff(ct->ctbs.recv.cmds, blob);
+	err = ct_register_buffer(ct, GUC_CTB_TYPE_GUC2HOST,
+				 desc, cmds, ct->ctbs.recv.size * 4);
+
 	if (unlikely(err))
 		goto err_out;
 
-	err = ct_register_buffer(ct, base + ptrdiff(ct->ctbs.send.desc, blob),
-				 INTEL_GUC_CT_BUFFER_TYPE_SEND);
+	desc = base + ptrdiff(ct->ctbs.send.desc, blob);
+	cmds = base + ptrdiff(ct->ctbs.send.cmds, blob);
+	err = ct_register_buffer(ct, GUC_CTB_TYPE_HOST2GUC,
+				 desc, cmds, ct->ctbs.send.size * 4);
+
 	if (unlikely(err))
 		goto err_deregister;
 
@@ -299,7 +313,7 @@ int intel_guc_ct_enable(struct intel_guc_ct *ct)
 	return 0;
 
 err_deregister:
-	ct_deregister_buffer(ct, INTEL_GUC_CT_BUFFER_TYPE_RECV);
+	ct_deregister_buffer(ct, GUC_CTB_TYPE_GUC2HOST);
 err_out:
 	CT_PROBE_ERROR(ct, "Failed to enable CTB (%pe)\n", ERR_PTR(err));
 	return err;
@@ -318,8 +332,8 @@ void intel_guc_ct_disable(struct intel_guc_ct *ct)
 	ct->enabled = false;
 
 	if (intel_guc_is_fw_running(guc)) {
-		ct_deregister_buffer(ct, INTEL_GUC_CT_BUFFER_TYPE_SEND);
-		ct_deregister_buffer(ct, INTEL_GUC_CT_BUFFER_TYPE_RECV);
+		ct_deregister_buffer(ct, GUC_CTB_TYPE_HOST2GUC);
+		ct_deregister_buffer(ct, GUC_CTB_TYPE_GUC2HOST);
 	}
 }
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 27/97] drm/i915/guc: New CTB based communication
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (25 preceding siblings ...)
  2021-05-06 19:13 ` [RFC PATCH 26/97] drm/i915/guc: New definition of the CTB registration action Matthew Brost
@ 2021-05-06 19:13 ` Matthew Brost
  2021-05-06 19:13 ` [RFC PATCH 28/97] drm/i915/guc: Kill guc_clients.ct_pool Matthew Brost
                   ` (72 subsequent siblings)
  99 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:13 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

From: Michal Wajdeczko <michal.wajdeczko@intel.com>

The format of the CTB messages has changed (a short sketch of the new
layout follows this list):
 - support for multiple formats
 - the message fence is now part of the header
 - reuse of the unified HXG message formats
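
A rough sketch of how a request is now laid out in the ring (field names
match the ABI headers added by this series; simplified relative to the
actual ct_write() changes below):

	u32 header, hxg;

	/* dw0: CTB header - fence, format and payload length (in dwords) */
	header = FIELD_PREP(GUC_CTB_MSG_0_FENCE, fence) |
		 FIELD_PREP(GUC_CTB_MSG_0_FORMAT, GUC_CTB_FORMAT_HXG) |
		 FIELD_PREP(GUC_CTB_MSG_0_NUM_DWORDS, len);

	/* dw1: embedded HXG header - request type and action code */
	hxg = FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
	      FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION, action);

	/* dw2..dwn: action specific payload follows */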

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Piotr Piórkowski <piotr.piorkowski@intel.com>
---
 .../gt/uc/abi/guc_communication_ctb_abi.h     |  56 +++++
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     | 193 +++++++-----------
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h     |   2 +-
 3 files changed, 134 insertions(+), 117 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h
index 127b256a662c..92660726c094 100644
--- a/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h
+++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h
@@ -60,6 +60,62 @@ struct guc_ct_buffer_desc {
 } __packed;
 static_assert(sizeof(struct guc_ct_buffer_desc) == 64);
 
+/**
+ * DOC: CTB Message
+ *
+ *  +---+-------+--------------------------------------------------------------+
+ *  |   | Bits  | Description                                                  |
+ *  +===+=======+==============================================================+
+ *  | 0 | 31:16 | **FENCE** - message identifier                               |
+ *  |   +-------+--------------------------------------------------------------+
+ *  |   | 15:12 | **FORMAT** - format of the CTB message                       |
+ *  |   |       |  - _`GUC_CTB_FORMAT_HXG` = 0 - see `CTB HXG Message`_        |
+ *  |   +-------+--------------------------------------------------------------+
+ *  |   |  11:8 | **RESERVED**                                                 |
+ *  |   +-------+--------------------------------------------------------------+
+ *  |   |   7:0 | **NUM_DWORDS** - length of the CTB message (w/o header)      |
+ *  +---+-------+--------------------------------------------------------------+
+ *  | 1 |  31:0 | optional (depends on FORMAT)                                 |
+ *  +---+-------+                                                              |
+ *  |...|       |                                                              |
+ *  +---+-------+                                                              |
+ *  | n |  31:0 |                                                              |
+ *  +---+-------+--------------------------------------------------------------+
+ */
+
+#define GUC_CTB_MSG_MIN_LEN			1u
+#define GUC_CTB_MSG_MAX_LEN			256u
+#define GUC_CTB_MSG_0_FENCE			(0xffff << 16)
+#define GUC_CTB_MSG_0_FORMAT			(0xf << 12)
+#define   GUC_CTB_FORMAT_HXG			0u
+#define GUC_CTB_MSG_0_RESERVED			(0xf << 8)
+#define GUC_CTB_MSG_0_NUM_DWORDS		(0xff << 0)
+
+/**
+ * DOC: CTB HXG Message
+ *
+ *  +---+-------+--------------------------------------------------------------+
+ *  |   | Bits  | Description                                                  |
+ *  +===+=======+==============================================================+
+ *  | 0 | 31:16 | FENCE                                                        |
+ *  |   +-------+--------------------------------------------------------------+
+ *  |   | 15:12 | FORMAT = GUC_CTB_FORMAT_HXG_                                 |
+ *  |   +-------+--------------------------------------------------------------+
+ *  |   |  11:8 | RESERVED = MBZ                                               |
+ *  |   +-------+--------------------------------------------------------------+
+ *  |   |   7:0 | NUM_DWORDS = length (in dwords) of the embedded HXG message  |
+ *  +---+-------+--------------------------------------------------------------+
+ *  | 1 |  31:0 |  +--------------------------------------------------------+  |
+ *  +---+-------+  |                                                        |  |
+ *  |...|       |  |  Embedded `HXG Message`_                               |  |
+ *  +---+-------+  |                                                        |  |
+ *  | n |  31:0 |  +--------------------------------------------------------+  |
+ *  +---+-------+--------------------------------------------------------------+
+ */
+
+#define GUC_CTB_HXG_MSG_MIN_LEN		(GUC_CTB_MSG_MIN_LEN + GUC_HXG_MSG_MIN_LEN)
+#define GUC_CTB_HXG_MSG_MAX_LEN		GUC_CTB_MSG_MAX_LEN
+
 /**
  * DOC: CTB based communication
  *
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index e25b49a45107..217ab3ebd1af 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -343,24 +343,6 @@ static u32 ct_get_next_fence(struct intel_guc_ct *ct)
 	return ++ct->requests.last_fence;
 }
 
-/**
- * DOC: CTB Host to GuC request
- *
- * Format of the CTB Host to GuC request message is as follows::
- *
- *      +------------+---------+---------+---------+---------+
- *      |   msg[0]   |   [1]   |   [2]   |   ...   |  [n-1]  |
- *      +------------+---------+---------+---------+---------+
- *      |   MESSAGE  |       MESSAGE PAYLOAD                 |
- *      +   HEADER   +---------+---------+---------+---------+
- *      |            |    0    |    1    |   ...   |    n    |
- *      +============+=========+=========+=========+=========+
- *      |  len >= 1  |  FENCE  |     request specific data   |
- *      +------+-----+---------+---------+---------+---------+
- *
- *                   ^-----------------len-------------------^
- */
-
 static int ct_write(struct intel_guc_ct *ct,
 		    const u32 *action,
 		    u32 len /* in dwords */,
@@ -373,6 +355,7 @@ static int ct_write(struct intel_guc_ct *ct,
 	u32 size = ctb->size;
 	u32 used;
 	u32 header;
+	u32 hxg;
 	u32 *cmds = ctb->cmds;
 	unsigned int i;
 
@@ -403,22 +386,24 @@ static int ct_write(struct intel_guc_ct *ct,
 		return -ENOSPC;
 
 	/*
-	 * Write the message. The format is the following:
-	 * DW0: header (including action code)
-	 * DW1: fence
-	 * DW2+: action data
+	 * dw0: CT header (including fence)
+	 * dw1: HXG header
 	 */
-	header = (len << GUC_CT_MSG_LEN_SHIFT) |
-		 GUC_CT_MSG_SEND_STATUS |
-		 (action[0] << GUC_CT_MSG_ACTION_SHIFT);
+	header = FIELD_PREP(GUC_CTB_MSG_0_FORMAT, GUC_CTB_FORMAT_HXG) |
+		 FIELD_PREP(GUC_CTB_MSG_0_NUM_DWORDS, len) |
+		 FIELD_PREP(GUC_CTB_MSG_0_FENCE, fence);
+
+	hxg = FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
+	      FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
+			 GUC_HXG_REQUEST_MSG_0_DATA0, action[0]);
 
-	CT_DEBUG(ct, "writing %*ph %*ph %*ph\n",
-		 4, &header, 4, &fence, 4 * (len - 1), &action[1]);
+	CT_DEBUG(ct, "writing (tail %u) %*ph %*ph %*ph\n",
+		 tail, 4, &header, 4, &hxg, 4 * (len - 1), &action[1]);
 
 	cmds[tail] = header;
 	tail = (tail + 1) % size;
 
-	cmds[tail] = fence;
+	cmds[tail] = hxg;
 	tail = (tail + 1) % size;
 
 	for (i = 1; i < len; i++) {
@@ -574,21 +559,6 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
 	return ret;
 }
 
-static inline unsigned int ct_header_get_len(u32 header)
-{
-	return (header >> GUC_CT_MSG_LEN_SHIFT) & GUC_CT_MSG_LEN_MASK;
-}
-
-static inline unsigned int ct_header_get_action(u32 header)
-{
-	return (header >> GUC_CT_MSG_ACTION_SHIFT) & GUC_CT_MSG_ACTION_MASK;
-}
-
-static inline bool ct_header_is_response(u32 header)
-{
-	return !!(header & GUC_CT_MSG_IS_RESPONSE);
-}
-
 static struct ct_incoming_msg *ct_alloc_msg(u32 num_dwords)
 {
 	struct ct_incoming_msg *msg;
@@ -651,7 +621,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 	head = (head + 1) % size;
 
 	/* message len with header */
-	len = ct_header_get_len(header) + 1;
+	len = FIELD_GET(GUC_CTB_MSG_0_NUM_DWORDS, header) + GUC_CTB_MSG_MIN_LEN;
 	if (unlikely(len > (u32)available)) {
 		CT_ERROR(ct, "Incomplete message %*ph %*ph %*ph\n",
 			 4, &header,
@@ -694,55 +664,24 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 	return -EPIPE;
 }
 
-/**
- * DOC: CTB GuC to Host response
- *
- * Format of the CTB GuC to Host response message is as follows::
- *
- *      +------------+---------+---------+---------+---------+---------+
- *      |   msg[0]   |   [1]   |   [2]   |   [3]   |   ...   |  [n-1]  |
- *      +------------+---------+---------+---------+---------+---------+
- *      |   MESSAGE  |       MESSAGE PAYLOAD                           |
- *      +   HEADER   +---------+---------+---------+---------+---------+
- *      |            |    0    |    1    |    2    |   ...   |    n    |
- *      +============+=========+=========+=========+=========+=========+
- *      |  len >= 2  |  FENCE  |  STATUS |   response specific data    |
- *      +------+-----+---------+---------+---------+---------+---------+
- *
- *                   ^-----------------------len-----------------------^
- */
-
 static int ct_handle_response(struct intel_guc_ct *ct, struct ct_incoming_msg *response)
 {
-	u32 header = response->msg[0];
-	u32 len = ct_header_get_len(header);
-	u32 fence;
-	u32 status;
-	u32 datalen;
+	u32 len = FIELD_GET(GUC_CTB_MSG_0_NUM_DWORDS, response->msg[0]);
+	u32 fence = FIELD_GET(GUC_CTB_MSG_0_FENCE, response->msg[0]);
+	const u32 *hxg = &response->msg[GUC_CTB_MSG_MIN_LEN];
+	const u32 *data = &hxg[GUC_HXG_MSG_MIN_LEN];
+	u32 datalen = len - GUC_HXG_MSG_MIN_LEN;
 	struct ct_request *req;
 	unsigned long flags;
 	bool found = false;
 	int err = 0;
 
-	GEM_BUG_ON(!ct_header_is_response(header));
+	GEM_BUG_ON(len < GUC_HXG_MSG_MIN_LEN);
+	GEM_BUG_ON(FIELD_GET(GUC_HXG_MSG_0_ORIGIN, hxg[0]) != GUC_HXG_ORIGIN_GUC);
+	GEM_BUG_ON(FIELD_GET(GUC_HXG_MSG_0_TYPE, hxg[0]) != GUC_HXG_TYPE_RESPONSE_SUCCESS &&
+		   FIELD_GET(GUC_HXG_MSG_0_TYPE, hxg[0]) != GUC_HXG_TYPE_RESPONSE_FAILURE);
 
-	/* Response payload shall at least include fence and status */
-	if (unlikely(len < 2)) {
-		CT_ERROR(ct, "Corrupted response (len %u)\n", len);
-		return -EPROTO;
-	}
-
-	fence = response->msg[1];
-	status = response->msg[2];
-	datalen = len - 2;
-
-	/* Format of the status dword follows HXG header */
-	if (unlikely(FIELD_GET(GUC_HXG_MSG_0_ORIGIN, status) != GUC_HXG_ORIGIN_GUC)) {
-		CT_ERROR(ct, "Corrupted response (status %#x)\n", status);
-		return -EPROTO;
-	}
-
-	CT_DEBUG(ct, "response fence %u status %#x\n", fence, status);
+	CT_DEBUG(ct, "response fence %u status %#x\n", fence, hxg[0]);
 
 	spin_lock_irqsave(&ct->requests.lock, flags);
 	list_for_each_entry(req, &ct->requests.pending, link) {
@@ -758,9 +697,9 @@ static int ct_handle_response(struct intel_guc_ct *ct, struct ct_incoming_msg *r
 			err = -EMSGSIZE;
 		}
 		if (datalen)
-			memcpy(req->response_buf, response->msg + 3, 4 * datalen);
+			memcpy(req->response_buf, data, 4 * datalen);
 		req->response_len = datalen;
-		WRITE_ONCE(req->status, status);
+		WRITE_ONCE(req->status, hxg[0]);
 		found = true;
 		break;
 	}
@@ -781,14 +720,15 @@ static int ct_handle_response(struct intel_guc_ct *ct, struct ct_incoming_msg *r
 static int ct_process_request(struct intel_guc_ct *ct, struct ct_incoming_msg *request)
 {
 	struct intel_guc *guc = ct_to_guc(ct);
-	u32 header, action, len;
+	const u32 *hxg;
 	const u32 *payload;
+	u32 action, len;
 	int ret;
 
-	header = request->msg[0];
-	payload = &request->msg[1];
-	action = ct_header_get_action(header);
-	len = ct_header_get_len(header);
+	hxg = &request->msg[GUC_CTB_MSG_MIN_LEN];
+	payload = &hxg[GUC_HXG_MSG_MIN_LEN];
+	action = FIELD_GET(GUC_HXG_EVENT_MSG_0_ACTION, hxg[0]);
+	len = request->size - GUC_CTB_HXG_MSG_MIN_LEN;
 
 	CT_DEBUG(ct, "request %x %*ph\n", action, 4 * len, payload);
 
@@ -850,29 +790,12 @@ static void ct_incoming_request_worker_func(struct work_struct *w)
 		queue_work(system_unbound_wq, &ct->requests.worker);
 }
 
-/**
- * DOC: CTB GuC to Host request
- *
- * Format of the CTB GuC to Host request message is as follows::
- *
- *      +------------+---------+---------+---------+---------+---------+
- *      |   msg[0]   |   [1]   |   [2]   |   [3]   |   ...   |  [n-1]  |
- *      +------------+---------+---------+---------+---------+---------+
- *      |   MESSAGE  |       MESSAGE PAYLOAD                           |
- *      +   HEADER   +---------+---------+---------+---------+---------+
- *      |            |    0    |    1    |    2    |   ...   |    n    |
- *      +============+=========+=========+=========+=========+=========+
- *      |     len    |            request specific data                |
- *      +------+-----+---------+---------+---------+---------+---------+
- *
- *                   ^-----------------------len-----------------------^
- */
-
-static int ct_handle_request(struct intel_guc_ct *ct, struct ct_incoming_msg *request)
+static int ct_handle_event(struct intel_guc_ct *ct, struct ct_incoming_msg *request)
 {
+	const u32 *hxg = &request->msg[GUC_CTB_MSG_MIN_LEN];
 	unsigned long flags;
 
-	GEM_BUG_ON(ct_header_is_response(request->msg[0]));
+	GEM_BUG_ON(FIELD_GET(GUC_HXG_MSG_0_TYPE, hxg[0]) != GUC_HXG_TYPE_EVENT);
 
 	spin_lock_irqsave(&ct->requests.lock, flags);
 	list_add_tail(&request->link, &ct->requests.incoming);
@@ -882,15 +805,53 @@ static int ct_handle_request(struct intel_guc_ct *ct, struct ct_incoming_msg *re
 	return 0;
 }
 
-static void ct_handle_msg(struct intel_guc_ct *ct, struct ct_incoming_msg *msg)
+static int ct_handle_hxg(struct intel_guc_ct *ct, struct ct_incoming_msg *msg)
 {
-	u32 header = msg->msg[0];
+	u32 origin, type;
+	u32 *hxg;
 	int err;
 
-	if (ct_header_is_response(header))
+	if (unlikely(msg->size < GUC_CTB_HXG_MSG_MIN_LEN))
+		return -EBADMSG;
+
+	hxg = &msg->msg[GUC_CTB_MSG_MIN_LEN];
+
+	origin = FIELD_GET(GUC_HXG_MSG_0_ORIGIN, hxg[0]);
+	if (unlikely(origin != GUC_HXG_ORIGIN_GUC)) {
+		err = -EPROTO;
+		goto failed;
+	}
+
+	type = FIELD_GET(GUC_HXG_MSG_0_TYPE, hxg[0]);
+	switch (type) {
+	case GUC_HXG_TYPE_EVENT:
+		err = ct_handle_event(ct, msg);
+		break;
+	case GUC_HXG_TYPE_RESPONSE_SUCCESS:
+	case GUC_HXG_TYPE_RESPONSE_FAILURE:
 		err = ct_handle_response(ct, msg);
+		break;
+	default:
+		err = -EOPNOTSUPP;
+	}
+
+	if (unlikely(err)) {
+failed:
+		CT_ERROR(ct, "Failed to handle HXG message (%pe) %*ph\n",
+			 ERR_PTR(err), 4 * GUC_HXG_MSG_MIN_LEN, hxg);
+	}
+	return err;
+}
+
+static void ct_handle_msg(struct intel_guc_ct *ct, struct ct_incoming_msg *msg)
+{
+	u32 format = FIELD_GET(GUC_CTB_MSG_0_FORMAT, msg->msg[0]);
+	int err;
+
+	if (format == GUC_CTB_FORMAT_HXG)
+		err = ct_handle_hxg(ct, msg);
 	else
-		err = ct_handle_request(ct, msg);
+		err = -EOPNOTSUPP;
 
 	if (unlikely(err)) {
 		CT_ERROR(ct, "Failed to process CT message (%pe) %*ph\n",
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
index 905202caaad3..1ae2dde6db93 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
@@ -61,7 +61,7 @@ struct intel_guc_ct {
 	struct tasklet_struct receive_tasklet;
 
 	struct {
-		u32 last_fence; /* last fence used to send request */
+		u16 last_fence; /* last fence used to send request */
 
 		spinlock_t lock; /* protects pending requests list */
 		struct list_head pending; /* requests waiting for response */
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 28/97] drm/i915/guc: Kill guc_clients.ct_pool
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (26 preceding siblings ...)
  2021-05-06 19:13 ` [RFC PATCH 27/97] drm/i915/guc: New CTB based communication Matthew Brost
@ 2021-05-06 19:13 ` Matthew Brost
  2021-05-25  1:01   ` Matthew Brost
  2021-05-06 19:13 ` [RFC PATCH 29/97] drm/i915/guc: Update firmware to v60.1.2 Matthew Brost
                   ` (71 subsequent siblings)
  99 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:13 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

From: Michal Wajdeczko <michal.wajdeczko@intel.com>

The CTB pool is now maintained internally by the GuC as part of its
"private data". There is no need to allocate a separate buffer and pass
it to the GuC as yet another ADS entry.

GuC: 57.0.0
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c  | 12 ------------
 drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h | 12 +-----------
 2 files changed, 1 insertion(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
index 648e1767b17a..775f00d706fa 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
@@ -25,8 +25,6 @@
  *      +---------------------------------------+
  *      | guc_clients_info                      |
  *      +---------------------------------------+
- *      | guc_ct_pool_entry[size]               |
- *      +---------------------------------------+
  *      | padding                               |
  *      +---------------------------------------+ <== 4K aligned
  *      | private data                          |
@@ -39,7 +37,6 @@ struct __guc_ads_blob {
 	struct guc_policies policies;
 	struct guc_gt_system_info system_info;
 	struct guc_clients_info clients_info;
-	struct guc_ct_pool_entry ct_pool[GUC_CT_POOL_SIZE];
 } __packed;
 
 static u32 guc_ads_private_data_size(struct intel_guc *guc)
@@ -67,11 +64,6 @@ static void guc_policies_init(struct guc_policies *policies)
 	policies->is_valid = 1;
 }
 
-static void guc_ct_pool_entries_init(struct guc_ct_pool_entry *pool, u32 num)
-{
-	memset(pool, 0, num * sizeof(*pool));
-}
-
 static void guc_mapping_table_init(struct intel_gt *gt,
 				   struct guc_gt_system_info *system_info)
 {
@@ -157,11 +149,7 @@ static void __guc_ads_init(struct intel_guc *guc)
 	base = intel_guc_ggtt_offset(guc, guc->ads_vma);
 
 	/* Clients info  */
-	guc_ct_pool_entries_init(blob->ct_pool, ARRAY_SIZE(blob->ct_pool));
-
 	blob->clients_info.clients_num = 1;
-	blob->clients_info.ct_pool_addr = base + ptr_offset(blob, ct_pool);
-	blob->clients_info.ct_pool_count = ARRAY_SIZE(blob->ct_pool);
 
 	/* ADS */
 	blob->ads.scheduler_policies = base + ptr_offset(blob, policies);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
index 95db4a7d3f4d..301b173a26bc 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
@@ -269,19 +269,9 @@ struct guc_gt_system_info {
 } __packed;
 
 /* Clients info */
-struct guc_ct_pool_entry {
-	struct guc_ct_buffer_desc desc;
-	u32 reserved[7];
-} __packed;
-
-#define GUC_CT_POOL_SIZE	2
-
 struct guc_clients_info {
 	u32 clients_num;
-	u32 reserved0[13];
-	u32 ct_pool_addr;
-	u32 ct_pool_count;
-	u32 reserved[4];
+	u32 reserved[19];
 } __packed;
 
 /* GuC Additional Data Struct */
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 29/97] drm/i915/guc: Update firmware to v60.1.2
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (27 preceding siblings ...)
  2021-05-06 19:13 ` [RFC PATCH 28/97] drm/i915/guc: Kill guc_clients.ct_pool Matthew Brost
@ 2021-05-06 19:13 ` Matthew Brost
  2021-05-06 19:13 ` [RFC PATCH 30/97] drm/i915/uc: turn on GuC/HuC auto mode by default Matthew Brost
                   ` (70 subsequent siblings)
  99 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:13 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

From: John Harrison <John.C.Harrison@Intel.com>

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c | 25 ++++++++++++------------
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
index df647c9a8d56..81f5fad84906 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
@@ -48,19 +48,18 @@ void intel_uc_fw_change_status(struct intel_uc_fw *uc_fw,
  * firmware as TGL.
  */
 #define INTEL_UC_FIRMWARE_DEFS(fw_def, guc_def, huc_def) \
-	fw_def(ALDERLAKE_S, 0, guc_def(tgl, 49, 0, 1), huc_def(tgl,  7, 5, 0)) \
-	fw_def(ROCKETLAKE,  0, guc_def(tgl, 49, 0, 1), huc_def(tgl,  7, 5, 0)) \
-	fw_def(TIGERLAKE,   0, guc_def(tgl, 49, 0, 1), huc_def(tgl,  7, 5, 0)) \
-	fw_def(JASPERLAKE,  0, guc_def(ehl, 49, 0, 1), huc_def(ehl,  9, 0, 0)) \
-	fw_def(ELKHARTLAKE, 0, guc_def(ehl, 49, 0, 1), huc_def(ehl,  9, 0, 0)) \
-	fw_def(ICELAKE,     0, guc_def(icl, 49, 0, 1), huc_def(icl,  9, 0, 0)) \
-	fw_def(COMETLAKE,   5, guc_def(cml, 49, 0, 1), huc_def(cml,  4, 0, 0)) \
-	fw_def(COMETLAKE,   0, guc_def(kbl, 49, 0, 1), huc_def(kbl,  4, 0, 0)) \
-	fw_def(COFFEELAKE,  0, guc_def(kbl, 49, 0, 1), huc_def(kbl,  4, 0, 0)) \
-	fw_def(GEMINILAKE,  0, guc_def(glk, 49, 0, 1), huc_def(glk,  4, 0, 0)) \
-	fw_def(KABYLAKE,    0, guc_def(kbl, 49, 0, 1), huc_def(kbl,  4, 0, 0)) \
-	fw_def(BROXTON,     0, guc_def(bxt, 49, 0, 1), huc_def(bxt,  2, 0, 0)) \
-	fw_def(SKYLAKE,     0, guc_def(skl, 49, 0, 1), huc_def(skl,  2, 0, 0))
+	fw_def(ALDERLAKE_S, 0, guc_def(tgl, 60, 1, 2), huc_def(tgl,  7, 5, 0)) \
+	fw_def(ROCKETLAKE,  0, guc_def(tgl, 60, 1, 2), huc_def(tgl,  7, 5, 0)) \
+	fw_def(TIGERLAKE,   0, guc_def(tgl, 60, 1, 2), huc_def(tgl,  7, 5, 0)) \
+	fw_def(JASPERLAKE,  0, guc_def(ehl, 60, 1, 2), huc_def(ehl,  9, 0, 0)) \
+	fw_def(ELKHARTLAKE, 0, guc_def(ehl, 60, 1, 2), huc_def(ehl,  9, 0, 0)) \
+	fw_def(ICELAKE,     0, guc_def(icl, 60, 1, 2), huc_def(icl,  9, 0, 0)) \
+	fw_def(COMETLAKE,   5, guc_def(cml, 60, 1, 2), huc_def(cml,  4, 0, 0)) \
+	fw_def(COFFEELAKE,  0, guc_def(kbl, 60, 1, 2), huc_def(kbl,  4, 0, 0)) \
+	fw_def(GEMINILAKE,  0, guc_def(glk, 60, 1, 2), huc_def(glk,  4, 0, 0)) \
+	fw_def(KABYLAKE,    0, guc_def(kbl, 60, 1, 2), huc_def(kbl,  4, 0, 0)) \
+	fw_def(BROXTON,     0, guc_def(bxt, 60, 1, 2), huc_def(bxt,  2, 0, 0)) \
+	fw_def(SKYLAKE,     0, guc_def(skl, 60, 1, 2), huc_def(skl,  2, 0, 0))
 
 #define __MAKE_UC_FW_PATH(prefix_, name_, major_, minor_, patch_) \
 	"i915/" \
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 30/97] drm/i915/uc: turn on GuC/HuC auto mode by default
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (28 preceding siblings ...)
  2021-05-06 19:13 ` [RFC PATCH 29/97] drm/i915/guc: Update firmware to v60.1.2 Matthew Brost
@ 2021-05-06 19:13 ` Matthew Brost
  2021-05-24 11:00   ` Michal Wajdeczko
  2021-05-06 19:13 ` [RFC PATCH 31/97] drm/i915/guc: Early initialization of GuC send registers Matthew Brost
                   ` (69 subsequent siblings)
  99 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:13 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

From: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

This will enable HuC loading for Gen11+ by default if the binaries
are available on the system. GuC submission still requires explicit
enabling by the user.
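
For illustration only, a minimal sketch of how an "auto" (-1) modparam is
typically resolved into the enable_guc bits; the real logic lives in the uc
code and may differ, and every example_/EXAMPLE_ name below is made up for
this sketch (bit 0 selects GuC submission, bit 1 HuC loading via GuC, so a
value of 3 enables both):

	#include <stdbool.h>

	#define EXAMPLE_GUC_SUBMISSION	(1 << 0)
	#define EXAMPLE_GUC_LOAD_HUC	(1 << 1)

	static int example_resolve_enable_guc(int modparam, bool gen11plus,
					      bool huc_fw_available)
	{
		if (modparam != -1)	/* explicit user setting always wins */
			return modparam;

		/* auto: load HuC via GuC on Gen11+, leave submission off */
		if (gen11plus && huc_fw_available)
			return EXAMPLE_GUC_LOAD_HUC;

		return 0;		/* otherwise keep GuC/HuC disabled */
	}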

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_params.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_params.h b/drivers/gpu/drm/i915/i915_params.h
index 14cd64cc61d0..a0575948ab61 100644
--- a/drivers/gpu/drm/i915/i915_params.h
+++ b/drivers/gpu/drm/i915/i915_params.h
@@ -59,7 +59,7 @@ struct drm_printer;
 	param(int, disable_power_well, -1, 0400) \
 	param(int, enable_ips, 1, 0600) \
 	param(int, invert_brightness, 0, 0600) \
-	param(int, enable_guc, 0, 0400) \
+	param(int, enable_guc, -1, 0400) \
 	param(int, guc_log_level, -1, 0400) \
 	param(char *, guc_firmware_path, NULL, 0400) \
 	param(char *, huc_firmware_path, NULL, 0400) \
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 31/97] drm/i915/guc: Early initialization of GuC send registers
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (29 preceding siblings ...)
  2021-05-06 19:13 ` [RFC PATCH 30/97] drm/i915/uc: turn on GuC/HuC auto mode by default Matthew Brost
@ 2021-05-06 19:13 ` Matthew Brost
  2021-05-26 20:28   ` Matthew Brost
  2021-05-06 19:13 ` [RFC PATCH 32/97] drm/i915: Introduce i915_sched_engine object Matthew Brost
                   ` (68 subsequent siblings)
  99 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:13 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

From: Michal Wajdeczko <michal.wajdeczko@intel.com>

Base offset and count of the GuC scratch registers, used for
sending MMIO messages to GuC, can be initialized earlier, together
with the other GuC members that also depend on the platform.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
index 454c8d886499..235c1997f32d 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
@@ -60,15 +60,8 @@ void intel_guc_init_send_regs(struct intel_guc *guc)
 	enum forcewake_domains fw_domains = 0;
 	unsigned int i;
 
-	if (INTEL_GEN(gt->i915) >= 11) {
-		guc->send_regs.base =
-				i915_mmio_reg_offset(GEN11_SOFT_SCRATCH(0));
-		guc->send_regs.count = GEN11_SOFT_SCRATCH_COUNT;
-	} else {
-		guc->send_regs.base = i915_mmio_reg_offset(SOFT_SCRATCH(0));
-		guc->send_regs.count = GUC_MAX_MMIO_MSG_LEN;
-		BUILD_BUG_ON(GUC_MAX_MMIO_MSG_LEN > SOFT_SCRATCH_COUNT);
-	}
+	GEM_BUG_ON(!guc->send_regs.base);
+	GEM_BUG_ON(!guc->send_regs.count);
 
 	for (i = 0; i < guc->send_regs.count; i++) {
 		fw_domains |= intel_uncore_forcewake_for_reg(gt->uncore,
@@ -181,11 +174,18 @@ void intel_guc_init_early(struct intel_guc *guc)
 		guc->interrupts.reset = gen11_reset_guc_interrupts;
 		guc->interrupts.enable = gen11_enable_guc_interrupts;
 		guc->interrupts.disable = gen11_disable_guc_interrupts;
+		guc->send_regs.base =
+			i915_mmio_reg_offset(GEN11_SOFT_SCRATCH(0));
+		guc->send_regs.count = GEN11_SOFT_SCRATCH_COUNT;
+
 	} else {
 		guc->notify_reg = GUC_SEND_INTERRUPT;
 		guc->interrupts.reset = gen9_reset_guc_interrupts;
 		guc->interrupts.enable = gen9_enable_guc_interrupts;
 		guc->interrupts.disable = gen9_disable_guc_interrupts;
+		guc->send_regs.base = i915_mmio_reg_offset(SOFT_SCRATCH(0));
+		guc->send_regs.count = GUC_MAX_MMIO_MSG_LEN;
+		BUILD_BUG_ON(GUC_MAX_MMIO_MSG_LEN > SOFT_SCRATCH_COUNT);
 	}
 }
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 32/97] drm/i915: Introduce i915_sched_engine object
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (30 preceding siblings ...)
  2021-05-06 19:13 ` [RFC PATCH 31/97] drm/i915/guc: Early initialization of GuC send registers Matthew Brost
@ 2021-05-06 19:13 ` Matthew Brost
  2021-05-11 15:18   ` Daniel Vetter
  2021-05-06 19:13 ` [RFC PATCH 33/97] drm/i915: Engine relative MMIO Matthew Brost
                   ` (67 subsequent siblings)
  99 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:13 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

Introduce the i915_sched_engine object, a lower-level data structure
that i915_scheduler / generic code can operate on without touching
execlists-specific structures. This allows additional submission
backends to be added without breaking the layering.
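
For orientation, a rough sketch of the new object as implied by the hunks
below; the authoritative definition lives in i915_scheduler_types.h and may
differ in detail (the refcount member, for instance, is only inferred from
the i915_sched_engine_get/put calls, and no_priolist is assumed to move here
from intel_engine_execlists):

	struct i915_sched_engine {
		struct kref ref;		/* inferred from get()/put() */

		spinlock_t lock;		/* replaces engine->active.lock */
		struct list_head requests;	/* replaces engine->active.requests */
		struct list_head hold;		/* replaces engine->active.hold */

		struct rb_root_cached queue;	/* replaces execlists->queue */
		int queue_priority_hint;	/* replaces execlists->queue_priority_hint */
		struct i915_priolist default_priolist;
		bool no_priolist;		/* assumed moved from execlists */

		struct tasklet_struct tasklet;	/* replaces execlists->tasklet */
		struct intel_engine_cs *engine;	/* set by the backend after create() */

		void	(*schedule)(struct i915_request *request,
				    const struct i915_sched_attr *attr);
		void	(*kick_backend)(const struct i915_request *rq, int prio);
	};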

Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_wait.c      |   4 +-
 drivers/gpu/drm/i915/gt/intel_engine.h        |  16 -
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     |  77 ++--
 .../gpu/drm/i915/gt/intel_engine_heartbeat.c  |   4 +-
 drivers/gpu/drm/i915/gt/intel_engine_pm.c     |  10 +-
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |  42 +--
 drivers/gpu/drm/i915/gt/intel_engine_user.c   |   2 +-
 .../drm/i915/gt/intel_execlists_submission.c  | 350 +++++++++++-------
 .../gpu/drm/i915/gt/intel_ring_submission.c   |  13 +-
 drivers/gpu/drm/i915/gt/mock_engine.c         |  17 +-
 drivers/gpu/drm/i915/gt/selftest_execlists.c  |  36 +-
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |   6 +-
 drivers/gpu/drm/i915/gt/selftest_lrc.c        |   6 +-
 drivers/gpu/drm/i915/gt/selftest_reset.c      |   2 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  75 ++--
 drivers/gpu/drm/i915/i915_gpu_error.c         |   7 +-
 drivers/gpu/drm/i915/i915_request.c           |  50 +--
 drivers/gpu/drm/i915/i915_request.h           |   2 +-
 drivers/gpu/drm/i915/i915_scheduler.c         | 168 ++++-----
 drivers/gpu/drm/i915/i915_scheduler.h         |  65 +++-
 drivers/gpu/drm/i915/i915_scheduler_types.h   |  63 ++++
 21 files changed, 575 insertions(+), 440 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_wait.c b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
index 4b9856d5ba14..af1fbf8e2a9a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_wait.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
@@ -104,8 +104,8 @@ static void fence_set_priority(struct dma_fence *fence,
 	engine = rq->engine;
 
 	rcu_read_lock(); /* RCU serialisation for set-wedged protection */
-	if (engine->schedule)
-		engine->schedule(rq, attr);
+	if (engine->sched_engine->schedule)
+		engine->sched_engine->schedule(rq, attr);
 	rcu_read_unlock();
 }
 
diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
index 8d9184920c51..988d9688ae4d 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -123,20 +123,6 @@ execlists_active(const struct intel_engine_execlists *execlists)
 	return active;
 }
 
-static inline void
-execlists_active_lock_bh(struct intel_engine_execlists *execlists)
-{
-	local_bh_disable(); /* prevent local softirq and lock recursion */
-	tasklet_lock(&execlists->tasklet);
-}
-
-static inline void
-execlists_active_unlock_bh(struct intel_engine_execlists *execlists)
-{
-	tasklet_unlock(&execlists->tasklet);
-	local_bh_enable(); /* restore softirq, and kick ksoftirqd! */
-}
-
 struct i915_request *
 execlists_unwind_incomplete_requests(struct intel_engine_execlists *execlists);
 
@@ -257,8 +243,6 @@ intel_engine_find_active_request(struct intel_engine_cs *engine);
 
 u32 intel_engine_context_size(struct intel_gt *gt, u8 class);
 
-void intel_engine_init_active(struct intel_engine_cs *engine,
-			      unsigned int subclass);
 #define ENGINE_PHYSICAL	0
 #define ENGINE_MOCK	1
 #define ENGINE_VIRTUAL	2
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 828e1669f92c..ec82a7ec0c8d 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -8,6 +8,7 @@
 #include "gem/i915_gem_context.h"
 
 #include "i915_drv.h"
+#include "i915_scheduler.h"
 
 #include "intel_breadcrumbs.h"
 #include "intel_context.h"
@@ -326,9 +327,6 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id)
 	if (engine->context_size)
 		DRIVER_CAPS(i915)->has_logical_contexts = true;
 
-	/* Nothing to do here, execute in order of dependencies */
-	engine->schedule = NULL;
-
 	ewma__engine_latency_init(&engine->latency);
 	seqcount_init(&engine->stats.lock);
 
@@ -583,9 +581,6 @@ void intel_engine_init_execlists(struct intel_engine_cs *engine)
 	memset(execlists->pending, 0, sizeof(execlists->pending));
 	execlists->active =
 		memset(execlists->inflight, 0, sizeof(execlists->inflight));
-
-	execlists->queue_priority_hint = INT_MIN;
-	execlists->queue = RB_ROOT_CACHED;
 }
 
 static void cleanup_status_page(struct intel_engine_cs *engine)
@@ -712,11 +707,17 @@ static int engine_setup_common(struct intel_engine_cs *engine)
 		goto err_status;
 	}
 
+	engine->sched_engine = i915_sched_engine_create(ENGINE_PHYSICAL);
+	if (!engine->sched_engine) {
+		err = -ENOMEM;
+		goto err_sched_engine;
+	}
+	engine->sched_engine->engine = engine;
+
 	err = intel_engine_init_cmd_parser(engine);
 	if (err)
 		goto err_cmd_parser;
 
-	intel_engine_init_active(engine, ENGINE_PHYSICAL);
 	intel_engine_init_execlists(engine);
 	intel_engine_init__pm(engine);
 	intel_engine_init_retire(engine);
@@ -735,6 +736,8 @@ static int engine_setup_common(struct intel_engine_cs *engine)
 	return 0;
 
 err_cmd_parser:
+	i915_sched_engine_put(engine->sched_engine);
+err_sched_engine:
 	intel_breadcrumbs_free(engine->breadcrumbs);
 err_status:
 	cleanup_status_page(engine);
@@ -773,11 +776,11 @@ static int measure_breadcrumb_dw(struct intel_context *ce)
 	frame->rq.ring = &frame->ring;
 
 	mutex_lock(&ce->timeline->mutex);
-	spin_lock_irq(&engine->active.lock);
+	spin_lock_irq(&engine->sched_engine->lock);
 
 	dw = engine->emit_fini_breadcrumb(&frame->rq, frame->cs) - frame->cs;
 
-	spin_unlock_irq(&engine->active.lock);
+	spin_unlock_irq(&engine->sched_engine->lock);
 	mutex_unlock(&ce->timeline->mutex);
 
 	GEM_BUG_ON(dw & 1); /* RING_TAIL must be qword aligned */
@@ -786,28 +789,6 @@ static int measure_breadcrumb_dw(struct intel_context *ce)
 	return dw;
 }
 
-void
-intel_engine_init_active(struct intel_engine_cs *engine, unsigned int subclass)
-{
-	INIT_LIST_HEAD(&engine->active.requests);
-	INIT_LIST_HEAD(&engine->active.hold);
-
-	spin_lock_init(&engine->active.lock);
-	lockdep_set_subclass(&engine->active.lock, subclass);
-
-	/*
-	 * Due to an interesting quirk in lockdep's internal debug tracking,
-	 * after setting a subclass we must ensure the lock is used. Otherwise,
-	 * nr_unused_locks is incremented once too often.
-	 */
-#ifdef CONFIG_DEBUG_LOCK_ALLOC
-	local_irq_disable();
-	lock_map_acquire(&engine->active.lock.dep_map);
-	lock_map_release(&engine->active.lock.dep_map);
-	local_irq_enable();
-#endif
-}
-
 static struct intel_context *
 create_pinned_context(struct intel_engine_cs *engine,
 		      unsigned int hwsp,
@@ -955,10 +936,10 @@ int intel_engines_init(struct intel_gt *gt)
  */
 void intel_engine_cleanup_common(struct intel_engine_cs *engine)
 {
-	GEM_BUG_ON(!list_empty(&engine->active.requests));
-	tasklet_kill(&engine->execlists.tasklet); /* flush the callback */
+	GEM_BUG_ON(!list_empty(&engine->sched_engine->requests));
 
 	intel_breadcrumbs_free(engine->breadcrumbs);
+	i915_sched_engine_put(engine->sched_engine);
 
 	intel_engine_fini_retire(engine);
 	intel_engine_cleanup_cmd_parser(engine);
@@ -1241,7 +1222,7 @@ static bool ring_is_idle(struct intel_engine_cs *engine)
 
 void __intel_engine_flush_submission(struct intel_engine_cs *engine, bool sync)
 {
-	struct tasklet_struct *t = &engine->execlists.tasklet;
+	struct tasklet_struct *t = &engine->sched_engine->tasklet;
 
 	if (!t->callback)
 		return;
@@ -1281,7 +1262,7 @@ bool intel_engine_is_idle(struct intel_engine_cs *engine)
 	intel_engine_flush_submission(engine);
 
 	/* ELSP is empty, but there are ready requests? E.g. after reset */
-	if (!RB_EMPTY_ROOT(&engine->execlists.queue.rb_root))
+	if (!i915_sched_engine_is_empty(engine->sched_engine))
 		return false;
 
 	/* Ring stopped? */
@@ -1347,7 +1328,7 @@ static struct intel_timeline *get_timeline(struct i915_request *rq)
 	struct intel_timeline *tl;
 
 	/*
-	 * Even though we are holding the engine->active.lock here, there
+	 * Even though we are holding the engine->sched_engine->lock here, there
 	 * is no control over the submission queue per-se and we are
 	 * inspecting the active state at a random point in time, with an
 	 * unknown queue. Play safe and make sure the timeline remains valid.
@@ -1502,10 +1483,10 @@ static void intel_engine_print_registers(struct intel_engine_cs *engine,
 
 		drm_printf(m, "\tExeclist tasklet queued? %s (%s), preempt? %s, timeslice? %s\n",
 			   yesno(test_bit(TASKLET_STATE_SCHED,
-					  &engine->execlists.tasklet.state)),
-			   enableddisabled(!atomic_read(&engine->execlists.tasklet.count)),
-			   repr_timer(&engine->execlists.preempt),
-			   repr_timer(&engine->execlists.timer));
+					  &engine->sched_engine->tasklet.state)),
+			   enableddisabled(!atomic_read(&engine->sched_engine->tasklet.count)),
+			   repr_timer(&execlists->preempt),
+			   repr_timer(&execlists->timer));
 
 		read = execlists->csb_head;
 		write = READ_ONCE(*execlists->csb_write);
@@ -1527,7 +1508,7 @@ static void intel_engine_print_registers(struct intel_engine_cs *engine,
 				   idx, hws[idx * 2], hws[idx * 2 + 1]);
 		}
 
-		execlists_active_lock_bh(execlists);
+		sched_engine_active_lock_bh(engine->sched_engine);
 		rcu_read_lock();
 		for (port = execlists->active; (rq = *port); port++) {
 			char hdr[160];
@@ -1558,7 +1539,7 @@ static void intel_engine_print_registers(struct intel_engine_cs *engine,
 			i915_request_show(m, rq, hdr, 0);
 		}
 		rcu_read_unlock();
-		execlists_active_unlock_bh(execlists);
+		sched_engine_active_unlock_bh(engine->sched_engine);
 	} else if (INTEL_GEN(dev_priv) > 6) {
 		drm_printf(m, "\tPP_DIR_BASE: 0x%08x\n",
 			   ENGINE_READ(engine, RING_PP_DIR_BASE));
@@ -1694,7 +1675,7 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 
 	drm_printf(m, "\tRequests:\n");
 
-	spin_lock_irqsave(&engine->active.lock, flags);
+	spin_lock_irqsave(&engine->sched_engine->lock, flags);
 	rq = intel_engine_find_active_request(engine);
 	if (rq) {
 		struct intel_timeline *tl = get_timeline(rq);
@@ -1725,8 +1706,9 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 			hexdump(m, rq->context->lrc_reg_state, PAGE_SIZE);
 		}
 	}
-	drm_printf(m, "\tOn hold?: %lu\n", list_count(&engine->active.hold));
-	spin_unlock_irqrestore(&engine->active.lock, flags);
+	drm_printf(m, "\tOn hold?: %lu\n",
+		   list_count(&engine->sched_engine->hold));
+	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
 
 	drm_printf(m, "\tMMIO base:  0x%08x\n", engine->mmio_base);
 	wakeref = intel_runtime_pm_get_if_in_use(engine->uncore->rpm);
@@ -1806,7 +1788,7 @@ intel_engine_find_active_request(struct intel_engine_cs *engine)
 	 * At all other times, we must assume the GPU is still running, but
 	 * we only care about the snapshot of this moment.
 	 */
-	lockdep_assert_held(&engine->active.lock);
+	lockdep_assert_held(&engine->sched_engine->lock);
 
 	rcu_read_lock();
 	request = execlists_active(&engine->execlists);
@@ -1824,7 +1806,8 @@ intel_engine_find_active_request(struct intel_engine_cs *engine)
 	if (active)
 		return active;
 
-	list_for_each_entry(request, &engine->active.requests, sched.link) {
+	list_for_each_entry(request, &engine->sched_engine->requests,
+			    sched.link) {
 		if (__i915_request_is_complete(request))
 			continue;
 
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
index b99ac41695f3..b6a305e6a974 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
@@ -121,7 +121,7 @@ static void heartbeat(struct work_struct *wrk)
 			 * but all other contexts, including the kernel
 			 * context are stuck waiting for the signal.
 			 */
-		} else if (engine->schedule &&
+		} else if (engine->sched_engine->schedule &&
 			   rq->sched.attr.priority < I915_PRIORITY_BARRIER) {
 			/*
 			 * Gradually raise the priority of the heartbeat to
@@ -136,7 +136,7 @@ static void heartbeat(struct work_struct *wrk)
 				attr.priority = I915_PRIORITY_BARRIER;
 
 			local_bh_disable();
-			engine->schedule(rq, &attr);
+			engine->sched_engine->schedule(rq, &attr);
 			local_bh_enable();
 		} else {
 			if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
index 47f4397095e5..ba6a9931c4e8 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
@@ -274,14 +274,16 @@ static int __engine_park(struct intel_wakeref *wf)
 	intel_engine_park_heartbeat(engine);
 	intel_breadcrumbs_park(engine->breadcrumbs);
 
-	/* Must be reset upon idling, or we may miss the busy wakeup. */
-	GEM_BUG_ON(engine->execlists.queue_priority_hint != INT_MIN);
+	/*
+	 * XXX: Must be reset upon idling, or we may miss the busy wakeup.
+	 * queue_priority_hint is only used by execlists submission, but is
+	 * harmless in other modes as its default is INT_MIN.
+	 */
+	GEM_BUG_ON(engine->sched_engine->queue_priority_hint != INT_MIN);
 
 	if (engine->park)
 		engine->park(engine);
 
-	engine->execlists.no_priolist = false;
-
 	/* While gt calls i915_vma_parked(), we have to break the lock cycle */
 	intel_gt_pm_put_async(engine->gt);
 	return 0;
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 9ef349cd5cea..93aa22680db0 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -59,6 +59,7 @@ struct drm_i915_reg_table;
 struct i915_gem_context;
 struct i915_request;
 struct i915_sched_attr;
+struct i915_sched_engine;
 struct intel_gt;
 struct intel_ring;
 struct intel_uncore;
@@ -137,11 +138,6 @@ struct st_preempt_hang {
  * driver and the hardware state for execlist mode of submission.
  */
 struct intel_engine_execlists {
-	/**
-	 * @tasklet: softirq tasklet for bottom handler
-	 */
-	struct tasklet_struct tasklet;
-
 	/**
 	 * @timer: kick the current context if its timeslice expires
 	 */
@@ -152,11 +148,6 @@ struct intel_engine_execlists {
 	 */
 	struct timer_list preempt;
 
-	/**
-	 * @default_priolist: priority list for I915_PRIORITY_NORMAL
-	 */
-	struct i915_priolist default_priolist;
-
 	/**
 	 * @ccid: identifier for contexts submitted to this engine
 	 */
@@ -191,11 +182,6 @@ struct intel_engine_execlists {
 	 */
 	u32 reset_ccid;
 
-	/**
-	 * @no_priolist: priority lists disabled
-	 */
-	bool no_priolist;
-
 	/**
 	 * @submit_reg: gen-specific execlist submission register
 	 * set to the ExecList Submission Port (elsp) register pre-Gen11 and to
@@ -238,23 +224,8 @@ struct intel_engine_execlists {
 	unsigned int port_mask;
 
 	/**
-	 * @queue_priority_hint: Highest pending priority.
-	 *
-	 * When we add requests into the queue, or adjust the priority of
-	 * executing requests, we compute the maximum priority of those
-	 * pending requests. We can then use this value to determine if
-	 * we need to preempt the executing requests to service the queue.
-	 * However, since the we may have recorded the priority of an inflight
-	 * request we wanted to preempt but since completed, at the time of
-	 * dequeuing the priority hint may no longer may match the highest
-	 * available request priority.
+	 * @virtual: queue of virtual requests, in priority lists
 	 */
-	int queue_priority_hint;
-
-	/**
-	 * @queue: queue of requests, in priority lists
-	 */
-	struct rb_root_cached queue;
 	struct rb_root_cached virtual;
 
 	/**
@@ -326,11 +297,7 @@ struct intel_engine_cs {
 
 	struct intel_sseu sseu;
 
-	struct {
-		spinlock_t lock;
-		struct list_head requests;
-		struct list_head hold; /* ready requests, but on hold */
-	} active;
+	struct i915_sched_engine *sched_engine;
 
 	/* keep a request in reserve for a [pm] barrier under oom */
 	struct i915_request *request_pool;
@@ -459,9 +426,6 @@ struct intel_engine_cs {
 	 * dependencies may need rescheduling. Note the request itself may
 	 * not be ready to run!
 	 */
-	void		(*schedule)(struct i915_request *request,
-				    const struct i915_sched_attr *attr);
-
 	void		(*release)(struct intel_engine_cs *engine);
 
 	struct intel_engine_execlists execlists;
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c b/drivers/gpu/drm/i915/gt/intel_engine_user.c
index 1cbd84eb24e4..d6dcdeace174 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
@@ -107,7 +107,7 @@ static void set_scheduler_caps(struct drm_i915_private *i915)
 	for_each_uabi_engine(engine, i915) { /* all engines must agree! */
 		int i;
 
-		if (engine->schedule)
+		if (engine->sched_engine->schedule)
 			enabled |= (I915_SCHEDULER_CAP_ENABLED |
 				    I915_SCHEDULER_CAP_PRIORITY);
 		else
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 8db200422950..0927a2416b52 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -273,11 +273,11 @@ static int effective_prio(const struct i915_request *rq)
 	return prio;
 }
 
-static int queue_prio(const struct intel_engine_execlists *execlists)
+static int queue_prio(const struct i915_sched_engine *sched_engine)
 {
 	struct rb_node *rb;
 
-	rb = rb_first_cached(&execlists->queue);
+	rb = rb_first_cached(&sched_engine->queue);
 	if (!rb)
 		return INT_MIN;
 
@@ -318,14 +318,14 @@ static bool need_preempt(const struct intel_engine_cs *engine,
 	 * to preserve FIFO ordering of dependencies.
 	 */
 	last_prio = max(effective_prio(rq), I915_PRIORITY_NORMAL - 1);
-	if (engine->execlists.queue_priority_hint <= last_prio)
+	if (engine->sched_engine->queue_priority_hint <= last_prio)
 		return false;
 
 	/*
 	 * Check against the first request in ELSP[1], it will, thanks to the
 	 * power of PI, be the highest priority of that context.
 	 */
-	if (!list_is_last(&rq->sched.link, &engine->active.requests) &&
+	if (!list_is_last(&rq->sched.link, &engine->sched_engine->requests) &&
 	    rq_prio(list_next_entry(rq, sched.link)) > last_prio)
 		return true;
 
@@ -340,7 +340,7 @@ static bool need_preempt(const struct intel_engine_cs *engine,
 	 * context, it's priority would not exceed ELSP[0] aka last_prio.
 	 */
 	return max(virtual_prio(&engine->execlists),
-		   queue_prio(&engine->execlists)) > last_prio;
+		   queue_prio(engine->sched_engine)) > last_prio;
 }
 
 __maybe_unused static bool
@@ -367,10 +367,10 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
 	struct list_head *pl;
 	int prio = I915_PRIORITY_INVALID;
 
-	lockdep_assert_held(&engine->active.lock);
+	lockdep_assert_held(&engine->sched_engine->lock);
 
 	list_for_each_entry_safe_reverse(rq, rn,
-					 &engine->active.requests,
+					 &engine->sched_engine->requests,
 					 sched.link) {
 		if (__i915_request_is_complete(rq)) {
 			list_del_init(&rq->sched.link);
@@ -382,9 +382,10 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
 		GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID);
 		if (rq_prio(rq) != prio) {
 			prio = rq_prio(rq);
-			pl = i915_sched_lookup_priolist(engine, prio);
+			pl = i915_sched_lookup_priolist(engine->sched_engine,
+							prio);
 		}
-		GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root));
+		GEM_BUG_ON(i915_sched_engine_is_empty(engine->sched_engine));
 
 		list_move(&rq->sched.link, pl);
 		set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
@@ -534,13 +535,13 @@ resubmit_virtual_request(struct i915_request *rq, struct virtual_engine *ve)
 {
 	struct intel_engine_cs *engine = rq->engine;
 
-	spin_lock_irq(&engine->active.lock);
+	spin_lock_irq(&engine->sched_engine->lock);
 
 	clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
 	WRITE_ONCE(rq->engine, &ve->base);
 	ve->base.submit_request(rq);
 
-	spin_unlock_irq(&engine->active.lock);
+	spin_unlock_irq(&engine->sched_engine->lock);
 }
 
 static void kick_siblings(struct i915_request *rq, struct intel_context *ce)
@@ -569,7 +570,7 @@ static void kick_siblings(struct i915_request *rq, struct intel_context *ce)
 		resubmit_virtual_request(rq, ve);
 
 	if (READ_ONCE(ve->request))
-		tasklet_hi_schedule(&ve->base.execlists.tasklet);
+		i915_sched_engine_hi_kick(ve->base.sched_engine);
 }
 
 static void __execlists_schedule_out(struct i915_request * const rq,
@@ -579,7 +580,7 @@ static void __execlists_schedule_out(struct i915_request * const rq,
 	unsigned int ccid;
 
 	/*
-	 * NB process_csb() is not under the engine->active.lock and hence
+	 * NB process_csb() is not under the engine->sched_engine->lock and hence
 	 * schedule_out can race with schedule_in meaning that we should
 	 * refrain from doing non-trivial work here.
 	 */
@@ -721,12 +722,11 @@ dump_port(char *buf, int buflen, const char *prefix, struct i915_request *rq)
 }
 
 static __maybe_unused noinline void
-trace_ports(const struct intel_engine_execlists *execlists,
+trace_ports(const struct intel_engine_cs *engine,
+	    const struct intel_engine_execlists *execlists,
 	    const char *msg,
 	    struct i915_request * const *ports)
 {
-	const struct intel_engine_cs *engine =
-		container_of(execlists, typeof(*engine), execlists);
 	char __maybe_unused p0[40], p1[40];
 
 	if (!ports[0])
@@ -738,25 +738,24 @@ trace_ports(const struct intel_engine_execlists *execlists,
 }
 
 static bool
-reset_in_progress(const struct intel_engine_execlists *execlists)
+reset_in_progress(const struct intel_engine_cs *engine)
 {
-	return unlikely(!__tasklet_is_enabled(&execlists->tasklet));
+	return unlikely(!__tasklet_is_enabled(&engine->sched_engine->tasklet));
 }
 
 static __maybe_unused noinline bool
-assert_pending_valid(const struct intel_engine_execlists *execlists,
+assert_pending_valid(struct intel_engine_cs *engine,
+		     const struct intel_engine_execlists *execlists,
 		     const char *msg)
 {
-	struct intel_engine_cs *engine =
-		container_of(execlists, typeof(*engine), execlists);
 	struct i915_request * const *port, *rq, *prev = NULL;
 	struct intel_context *ce = NULL;
 	u32 ccid = -1;
 
-	trace_ports(execlists, msg, execlists->pending);
+	trace_ports(engine, execlists, msg, execlists->pending);
 
 	/* We may be messing around with the lists during reset, lalala */
-	if (reset_in_progress(execlists))
+	if (reset_in_progress(engine))
 		return true;
 
 	if (!execlists->pending[0]) {
@@ -878,7 +877,7 @@ static void execlists_submit_ports(struct intel_engine_cs *engine)
 	struct intel_engine_execlists *execlists = &engine->execlists;
 	unsigned int n;
 
-	GEM_BUG_ON(!assert_pending_valid(execlists, "submit"));
+	GEM_BUG_ON(!assert_pending_valid(engine, execlists, "submit"));
 
 	/*
 	 * We can skip acquiring intel_runtime_pm_get() here as it was taken
@@ -1096,7 +1095,8 @@ static void defer_active(struct intel_engine_cs *engine)
 	if (!rq)
 		return;
 
-	defer_request(rq, i915_sched_lookup_priolist(engine, rq_prio(rq)));
+	defer_request(rq, i915_sched_lookup_priolist(engine->sched_engine,
+						     rq_prio(rq)));
 }
 
 static bool
@@ -1133,13 +1133,14 @@ static bool needs_timeslice(const struct intel_engine_cs *engine,
 		return false;
 
 	/* If ELSP[1] is occupied, always check to see if worth slicing */
-	if (!list_is_last_rcu(&rq->sched.link, &engine->active.requests)) {
+	if (!list_is_last_rcu(&rq->sched.link,
+			      &engine->sched_engine->requests)) {
 		ENGINE_TRACE(engine, "timeslice required for second inflight context\n");
 		return true;
 	}
 
 	/* Otherwise, ELSP[0] is by itself, but may be waiting in the queue */
-	if (!RB_EMPTY_ROOT(&engine->execlists.queue.rb_root)) {
+	if (!i915_sched_engine_is_empty(engine->sched_engine)) {
 		ENGINE_TRACE(engine, "timeslice required for queue\n");
 		return true;
 	}
@@ -1187,7 +1188,7 @@ static void start_timeslice(struct intel_engine_cs *engine)
 			 * its timeslice, so recheck.
 			 */
 			if (!timer_pending(&el->timer))
-				tasklet_hi_schedule(&el->tasklet);
+				i915_sched_engine_hi_kick(engine->sched_engine);
 			return;
 		}
 
@@ -1235,6 +1236,7 @@ static bool completed(const struct i915_request *rq)
 
 static void execlists_dequeue(struct intel_engine_cs *engine)
 {
+	struct i915_sched_engine * const sched_engine = engine->sched_engine;
 	struct intel_engine_execlists * const execlists = &engine->execlists;
 	struct i915_request **port = execlists->pending;
 	struct i915_request ** const last_port = port + execlists->port_mask;
@@ -1265,7 +1267,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	 * and context switches) submission.
 	 */
 
-	spin_lock(&engine->active.lock);
+	spin_lock(&engine->sched_engine->lock);
 
 	/*
 	 * If the queue is higher priority than the last
@@ -1287,7 +1289,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 				     last->fence.context,
 				     last->fence.seqno,
 				     last->sched.attr.priority,
-				     execlists->queue_priority_hint);
+				     sched_engine->queue_priority_hint);
 			record_preemption(execlists);
 
 			/*
@@ -1313,7 +1315,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 				     yesno(timer_expired(&execlists->timer)),
 				     last->fence.context, last->fence.seqno,
 				     rq_prio(last),
-				     execlists->queue_priority_hint,
+				     sched_engine->queue_priority_hint,
 				     yesno(timeslice_yield(execlists, last)));
 
 			/*
@@ -1365,7 +1367,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 				 * Even if ELSP[1] is occupied and not worthy
 				 * of timeslices, our queue might be.
 				 */
-				spin_unlock(&engine->active.lock);
+				spin_unlock(&sched_engine->lock);
 				return;
 			}
 		}
@@ -1375,7 +1377,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	while ((ve = first_virtual_engine(engine))) {
 		struct i915_request *rq;
 
-		spin_lock(&ve->base.active.lock);
+		spin_lock(&ve->base.sched_engine->lock);
 
 		rq = ve->request;
 		if (unlikely(!virtual_matches(ve, rq, engine)))
@@ -1384,14 +1386,14 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 		GEM_BUG_ON(rq->engine != &ve->base);
 		GEM_BUG_ON(rq->context != &ve->context);
 
-		if (unlikely(rq_prio(rq) < queue_prio(execlists))) {
-			spin_unlock(&ve->base.active.lock);
+		if (unlikely(rq_prio(rq) < queue_prio(sched_engine))) {
+			spin_unlock(&ve->base.sched_engine->lock);
 			break;
 		}
 
 		if (last && !can_merge_rq(last, rq)) {
-			spin_unlock(&ve->base.active.lock);
-			spin_unlock(&engine->active.lock);
+			spin_unlock(&ve->base.sched_engine->lock);
+			spin_unlock(&sched_engine->lock);
 			return; /* leave this for another sibling */
 		}
 
@@ -1405,7 +1407,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 			     yesno(engine != ve->siblings[0]));
 
 		WRITE_ONCE(ve->request, NULL);
-		WRITE_ONCE(ve->base.execlists.queue_priority_hint, INT_MIN);
+		WRITE_ONCE(ve->base.sched_engine->queue_priority_hint, INT_MIN);
 
 		rb = &ve->nodes[engine->id].rb;
 		rb_erase_cached(rb, &execlists->virtual);
@@ -1437,7 +1439,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 
 		i915_request_put(rq);
 unlock:
-		spin_unlock(&ve->base.active.lock);
+		spin_unlock(&ve->base.sched_engine->lock);
 
 		/*
 		 * Hmm, we have a bunch of virtual engine requests,
@@ -1450,7 +1452,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 			break;
 	}
 
-	while ((rb = rb_first_cached(&execlists->queue))) {
+	while ((rb = rb_first_cached(&sched_engine->queue))) {
 		struct i915_priolist *p = to_priolist(rb);
 		struct i915_request *rq, *rn;
 
@@ -1529,7 +1531,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 			}
 		}
 
-		rb_erase_cached(&p->node, &execlists->queue);
+		rb_erase_cached(&p->node, &sched_engine->queue);
 		i915_priolist_free(p);
 	}
 done:
@@ -1551,8 +1553,9 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	 * request triggering preemption on the next dequeue (or subsequent
 	 * interrupt for secondary ports).
 	 */
-	execlists->queue_priority_hint = queue_prio(execlists);
-	spin_unlock(&engine->active.lock);
+	sched_engine->queue_priority_hint = queue_prio(sched_engine);
+	i915_sched_engine_reset_on_empty(sched_engine);
+	spin_unlock(&sched_engine->lock);
 
 	/*
 	 * We can skip poking the HW if we ended up with exactly the same set
@@ -1767,8 +1770,8 @@ process_csb(struct intel_engine_cs *engine, struct i915_request **inactive)
 	 * access. Either we are inside the tasklet, or the tasklet is disabled
 	 * and we assume that is only inside the reset paths and so serialised.
 	 */
-	GEM_BUG_ON(!tasklet_is_locked(&execlists->tasklet) &&
-		   !reset_in_progress(execlists));
+	GEM_BUG_ON(!tasklet_is_locked(&engine->sched_engine->tasklet) &&
+		   !reset_in_progress(engine));
 
 	/*
 	 * Note that csb_write, csb_status may be either in HWSP or mmio.
@@ -1866,12 +1869,12 @@ process_csb(struct intel_engine_cs *engine, struct i915_request **inactive)
 			smp_wmb(); /* notify execlists_active() */
 
 			/* cancel old inflight, prepare for switch */
-			trace_ports(execlists, "preempted", old);
+			trace_ports(engine, execlists, "preempted", old);
 			while (*old)
 				*inactive++ = *old++;
 
 			/* switch pending to inflight */
-			GEM_BUG_ON(!assert_pending_valid(execlists, "promote"));
+			GEM_BUG_ON(!assert_pending_valid(engine, execlists, "promote"));
 			copy_ports(execlists->inflight,
 				   execlists->pending,
 				   execlists_num_ports(execlists));
@@ -1889,7 +1892,7 @@ process_csb(struct intel_engine_cs *engine, struct i915_request **inactive)
 			}
 
 			/* port0 completed, advanced to port1 */
-			trace_ports(execlists, "completed", execlists->active);
+			trace_ports(engine, execlists, "completed", execlists->active);
 
 			/*
 			 * We rely on the hardware being strongly
@@ -1979,7 +1982,7 @@ static void __execlists_hold(struct i915_request *rq)
 			__i915_request_unsubmit(rq);
 
 		clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
-		list_move_tail(&rq->sched.link, &rq->engine->active.hold);
+		list_move_tail(&rq->sched.link, &rq->engine->sched_engine->hold);
 		i915_request_set_hold(rq);
 		RQ_TRACE(rq, "on hold\n");
 
@@ -2016,7 +2019,7 @@ static bool execlists_hold(struct intel_engine_cs *engine,
 	if (i915_request_on_hold(rq))
 		return false;
 
-	spin_lock_irq(&engine->active.lock);
+	spin_lock_irq(&engine->sched_engine->lock);
 
 	if (__i915_request_is_complete(rq)) { /* too late! */
 		rq = NULL;
@@ -2032,10 +2035,10 @@ static bool execlists_hold(struct intel_engine_cs *engine,
 	GEM_BUG_ON(i915_request_on_hold(rq));
 	GEM_BUG_ON(rq->engine != engine);
 	__execlists_hold(rq);
-	GEM_BUG_ON(list_empty(&engine->active.hold));
+	GEM_BUG_ON(list_empty(&engine->sched_engine->hold));
 
 unlock:
-	spin_unlock_irq(&engine->active.lock);
+	spin_unlock_irq(&engine->sched_engine->lock);
 	return rq;
 }
 
@@ -2079,7 +2082,7 @@ static void __execlists_unhold(struct i915_request *rq)
 
 		i915_request_clear_hold(rq);
 		list_move_tail(&rq->sched.link,
-			       i915_sched_lookup_priolist(rq->engine,
+			       i915_sched_lookup_priolist(rq->engine->sched_engine,
 							  rq_prio(rq)));
 		set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
 
@@ -2115,7 +2118,7 @@ static void __execlists_unhold(struct i915_request *rq)
 static void execlists_unhold(struct intel_engine_cs *engine,
 			     struct i915_request *rq)
 {
-	spin_lock_irq(&engine->active.lock);
+	spin_lock_irq(&engine->sched_engine->lock);
 
 	/*
 	 * Move this request back to the priority queue, and all of its
@@ -2123,12 +2126,12 @@ static void execlists_unhold(struct intel_engine_cs *engine,
 	 */
 	__execlists_unhold(rq);
 
-	if (rq_prio(rq) > engine->execlists.queue_priority_hint) {
-		engine->execlists.queue_priority_hint = rq_prio(rq);
-		tasklet_hi_schedule(&engine->execlists.tasklet);
+	if (rq_prio(rq) > engine->sched_engine->queue_priority_hint) {
+		engine->sched_engine->queue_priority_hint = rq_prio(rq);
+		i915_sched_engine_hi_kick(engine->sched_engine);
 	}
 
-	spin_unlock_irq(&engine->active.lock);
+	spin_unlock_irq(&engine->sched_engine->lock);
 }
 
 struct execlists_capture {
@@ -2258,13 +2261,13 @@ static void execlists_capture(struct intel_engine_cs *engine)
 	if (!cap)
 		return;
 
-	spin_lock_irq(&engine->active.lock);
+	spin_lock_irq(&engine->sched_engine->lock);
 	cap->rq = active_context(engine, active_ccid(engine));
 	if (cap->rq) {
 		cap->rq = active_request(cap->rq->context->timeline, cap->rq);
 		cap->rq = i915_request_get_rcu(cap->rq);
 	}
-	spin_unlock_irq(&engine->active.lock);
+	spin_unlock_irq(&engine->sched_engine->lock);
 	if (!cap->rq)
 		goto err_free;
 
@@ -2316,13 +2319,13 @@ static void execlists_reset(struct intel_engine_cs *engine, const char *msg)
 	ENGINE_TRACE(engine, "reset for %s\n", msg);
 
 	/* Mark this tasklet as disabled to avoid waiting for it to complete */
-	tasklet_disable_nosync(&engine->execlists.tasklet);
+	tasklet_disable_nosync(&engine->sched_engine->tasklet);
 
 	ring_set_paused(engine, 1); /* Freeze the current request in place */
 	execlists_capture(engine);
 	intel_engine_reset(engine, msg);
 
-	tasklet_enable(&engine->execlists.tasklet);
+	tasklet_enable(&engine->sched_engine->tasklet);
 	clear_and_wake_up_bit(bit, lock);
 }
 
@@ -2345,8 +2348,9 @@ static bool preempt_timeout(const struct intel_engine_cs *const engine)
  */
 static void execlists_submission_tasklet(struct tasklet_struct *t)
 {
-	struct intel_engine_cs * const engine =
-		from_tasklet(engine, t, execlists.tasklet);
+	struct i915_sched_engine *sched_engine =
+		from_tasklet(sched_engine, t, tasklet);
+	struct intel_engine_cs * const engine = sched_engine->engine;
 	struct i915_request *post[2 * EXECLIST_MAX_PORTS];
 	struct i915_request **inactive;
 
@@ -2421,13 +2425,16 @@ static void execlists_irq_handler(struct intel_engine_cs *engine, u16 iir)
 		intel_engine_signal_breadcrumbs(engine);
 
 	if (tasklet)
-		tasklet_hi_schedule(&engine->execlists.tasklet);
+		i915_sched_engine_hi_kick(engine->sched_engine);
 }
 
 static void __execlists_kick(struct intel_engine_execlists *execlists)
 {
+	struct intel_engine_cs *engine =
+		container_of(execlists, typeof(*engine), execlists);
+
 	/* Kick the tasklet for some interrupt coalescing and reset handling */
-	tasklet_hi_schedule(&execlists->tasklet);
+	i915_sched_engine_hi_kick(engine->sched_engine);
 }
 
 #define execlists_kick(t, member) \
@@ -2448,19 +2455,20 @@ static void queue_request(struct intel_engine_cs *engine,
 {
 	GEM_BUG_ON(!list_empty(&rq->sched.link));
 	list_add_tail(&rq->sched.link,
-		      i915_sched_lookup_priolist(engine, rq_prio(rq)));
+		      i915_sched_lookup_priolist(engine->sched_engine,
+						 rq_prio(rq)));
 	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
 }
 
 static bool submit_queue(struct intel_engine_cs *engine,
 			 const struct i915_request *rq)
 {
-	struct intel_engine_execlists *execlists = &engine->execlists;
+	struct i915_sched_engine *sched_engine = engine->sched_engine;
 
-	if (rq_prio(rq) <= execlists->queue_priority_hint)
+	if (rq_prio(rq) <= sched_engine->queue_priority_hint)
 		return false;
 
-	execlists->queue_priority_hint = rq_prio(rq);
+	sched_engine->queue_priority_hint = rq_prio(rq);
 	return true;
 }
 
@@ -2468,7 +2476,7 @@ static bool ancestor_on_hold(const struct intel_engine_cs *engine,
 			     const struct i915_request *rq)
 {
 	GEM_BUG_ON(i915_request_on_hold(rq));
-	return !list_empty(&engine->active.hold) && hold_request(rq);
+	return !list_empty(&engine->sched_engine->hold) && hold_request(rq);
 }
 
 static void execlists_submit_request(struct i915_request *request)
@@ -2477,23 +2485,24 @@ static void execlists_submit_request(struct i915_request *request)
 	unsigned long flags;
 
 	/* Will be called from irq-context when using foreign fences. */
-	spin_lock_irqsave(&engine->active.lock, flags);
+	spin_lock_irqsave(&engine->sched_engine->lock, flags);
 
 	if (unlikely(ancestor_on_hold(engine, request))) {
 		RQ_TRACE(request, "ancestor on hold\n");
-		list_add_tail(&request->sched.link, &engine->active.hold);
+		list_add_tail(&request->sched.link,
+			      &engine->sched_engine->hold);
 		i915_request_set_hold(request);
 	} else {
 		queue_request(engine, request);
 
-		GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root));
+		GEM_BUG_ON(i915_sched_engine_is_empty(engine->sched_engine));
 		GEM_BUG_ON(list_empty(&request->sched.link));
 
 		if (submit_queue(engine, request))
-			__execlists_kick(&engine->execlists);
+			i915_sched_engine_hi_kick(engine->sched_engine);
 	}
 
-	spin_unlock_irqrestore(&engine->active.lock, flags);
+	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
 }
 
 static int
@@ -2800,10 +2809,10 @@ static int execlists_resume(struct intel_engine_cs *engine)
 
 static void execlists_reset_prepare(struct intel_engine_cs *engine)
 {
-	struct intel_engine_execlists * const execlists = &engine->execlists;
+	struct i915_sched_engine * const sched_engine = engine->sched_engine;
 
 	ENGINE_TRACE(engine, "depth<-%d\n",
-		     atomic_read(&execlists->tasklet.count));
+		     atomic_read(&sched_engine->tasklet.count));
 
 	/*
 	 * Prevent request submission to the hardware until we have
@@ -2814,8 +2823,8 @@ static void execlists_reset_prepare(struct intel_engine_cs *engine)
 	 * Turning off the execlists->tasklet until the reset is over
 	 * prevents the race.
 	 */
-	__tasklet_disable_sync_once(&execlists->tasklet);
-	GEM_BUG_ON(!reset_in_progress(execlists));
+	__tasklet_disable_sync_once(&sched_engine->tasklet);
+	GEM_BUG_ON(!reset_in_progress(engine));
 
 	/*
 	 * We stop engines, otherwise we might get failed reset and a
@@ -2957,23 +2966,25 @@ static void execlists_reset_rewind(struct intel_engine_cs *engine, bool stalled)
 
 	/* Push back any incomplete requests for replay after the reset. */
 	rcu_read_lock();
-	spin_lock_irqsave(&engine->active.lock, flags);
+	spin_lock_irqsave(&engine->sched_engine->lock, flags);
 	__unwind_incomplete_requests(engine);
-	spin_unlock_irqrestore(&engine->active.lock, flags);
+	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
 	rcu_read_unlock();
 }
 
 static void nop_submission_tasklet(struct tasklet_struct *t)
 {
-	struct intel_engine_cs * const engine =
-		from_tasklet(engine, t, execlists.tasklet);
+	struct i915_sched_engine *sched_engine =
+		from_tasklet(sched_engine, t, tasklet);
+	struct intel_engine_cs * const engine = sched_engine->engine;
 
 	/* The driver is wedged; don't process any more events. */
-	WRITE_ONCE(engine->execlists.queue_priority_hint, INT_MIN);
+	WRITE_ONCE(engine->sched_engine->queue_priority_hint, INT_MIN);
 }
 
 static void execlists_reset_cancel(struct intel_engine_cs *engine)
 {
+	struct i915_sched_engine * const sched_engine = engine->sched_engine;
 	struct intel_engine_execlists * const execlists = &engine->execlists;
 	struct i915_request *rq, *rn;
 	struct rb_node *rb;
@@ -2998,15 +3009,15 @@ static void execlists_reset_cancel(struct intel_engine_cs *engine)
 	execlists_reset_csb(engine, true);
 
 	rcu_read_lock();
-	spin_lock_irqsave(&engine->active.lock, flags);
+	spin_lock_irqsave(&sched_engine->lock, flags);
 
 	/* Mark all executing requests as skipped. */
-	list_for_each_entry(rq, &engine->active.requests, sched.link)
+	list_for_each_entry(rq, &sched_engine->requests, sched.link)
 		i915_request_put(i915_request_mark_eio(rq));
 	intel_engine_signal_breadcrumbs(engine);
 
 	/* Flush the queued requests to the timeline list (for retiring). */
-	while ((rb = rb_first_cached(&execlists->queue))) {
+	while ((rb = rb_first_cached(&sched_engine->queue))) {
 		struct i915_priolist *p = to_priolist(rb);
 
 		priolist_for_each_request_consume(rq, rn, p) {
@@ -3016,12 +3027,12 @@ static void execlists_reset_cancel(struct intel_engine_cs *engine)
 			}
 		}
 
-		rb_erase_cached(&p->node, &execlists->queue);
+		rb_erase_cached(&p->node, &sched_engine->queue);
 		i915_priolist_free(p);
 	}
 
 	/* On-hold requests will be flushed to timeline upon their release */
-	list_for_each_entry(rq, &engine->active.hold, sched.link)
+	list_for_each_entry(rq, &sched_engine->hold, sched.link)
 		i915_request_put(i915_request_mark_eio(rq));
 
 	/* Cancel all attached virtual engines */
@@ -3032,7 +3043,7 @@ static void execlists_reset_cancel(struct intel_engine_cs *engine)
 		rb_erase_cached(rb, &execlists->virtual);
 		RB_CLEAR_NODE(rb);
 
-		spin_lock(&ve->base.active.lock);
+		spin_lock(&ve->base.sched_engine->lock);
 		rq = fetch_and_zero(&ve->request);
 		if (rq) {
 			if (i915_request_mark_eio(rq)) {
@@ -3042,26 +3053,26 @@ static void execlists_reset_cancel(struct intel_engine_cs *engine)
 			}
 			i915_request_put(rq);
 
-			ve->base.execlists.queue_priority_hint = INT_MIN;
+			ve->base.sched_engine->queue_priority_hint = INT_MIN;
 		}
-		spin_unlock(&ve->base.active.lock);
+		spin_unlock(&ve->base.sched_engine->lock);
 	}
 
 	/* Remaining _unready_ requests will be nop'ed when submitted */
 
-	execlists->queue_priority_hint = INT_MIN;
-	execlists->queue = RB_ROOT_CACHED;
+	sched_engine->queue_priority_hint = INT_MIN;
+	sched_engine->queue = RB_ROOT_CACHED;
 
-	GEM_BUG_ON(__tasklet_is_enabled(&execlists->tasklet));
-	execlists->tasklet.callback = nop_submission_tasklet;
+	GEM_BUG_ON(__tasklet_is_enabled(&sched_engine->tasklet));
+	sched_engine->tasklet.callback = nop_submission_tasklet;
 
-	spin_unlock_irqrestore(&engine->active.lock, flags);
+	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
 	rcu_read_unlock();
 }
 
 static void execlists_reset_finish(struct intel_engine_cs *engine)
 {
-	struct intel_engine_execlists * const execlists = &engine->execlists;
+	struct i915_sched_engine * const sched_engine = engine->sched_engine;
 
 	/*
 	 * After a GPU reset, we may have requests to replay. Do so now while
@@ -3073,14 +3084,14 @@ static void execlists_reset_finish(struct intel_engine_cs *engine)
 	 * reset as the next level of recovery, and as a final resort we
 	 * will declare the device wedged.
 	 */
-	GEM_BUG_ON(!reset_in_progress(execlists));
+	GEM_BUG_ON(!reset_in_progress(engine));
 
 	/* And kick in case we missed a new request submission. */
-	if (__tasklet_enable(&execlists->tasklet))
-		__execlists_kick(execlists);
+	if (__tasklet_enable(&sched_engine->tasklet))
+		i915_sched_engine_hi_kick(sched_engine);
 
 	ENGINE_TRACE(engine, "depth->%d\n",
-		     atomic_read(&execlists->tasklet.count));
+		     atomic_read(&sched_engine->tasklet.count));
 }
 
 static void gen8_logical_ring_enable_irq(struct intel_engine_cs *engine)
@@ -3110,11 +3121,59 @@ static bool can_preempt(struct intel_engine_cs *engine)
 	return engine->class != RENDER_CLASS;
 }
 
+static void kick_execlists(const struct i915_request *rq, int prio)
+{
+	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
+	const struct i915_request *inflight;
+
+	/*
+	 * We only need to kick the tasklet once for the high priority
+	 * new context we add into the queue.
+	 */
+	if (prio <= sched_engine->queue_priority_hint)
+		return;
+
+	rcu_read_lock();
+
+	/* Nothing currently active? We're overdue for a submission! */
+	inflight = execlists_active(&rq->engine->execlists);
+	if (!inflight)
+		goto unlock;
+
+	/*
+	 * If we are already the currently executing context, don't
+	 * bother evaluating if we should preempt ourselves.
+	 */
+	if (inflight->context == rq->context)
+		goto unlock;
+
+	ENGINE_TRACE(rq->engine,
+		     "bumping queue-priority-hint:%d for rq:%llx:%lld, inflight:%llx:%lld prio %d\n",
+		     prio,
+		     rq->fence.context, rq->fence.seqno,
+		     inflight->fence.context, inflight->fence.seqno,
+		     inflight->sched.attr.priority);
+
+	sched_engine->queue_priority_hint = prio;
+
+	/*
+	 * Allow preemption of low -> normal -> high, but we do
+	 * not allow low priority tasks to preempt other low priority
+	 * tasks under the impression that latency for low priority
+	 * tasks does not matter (as much as background throughput),
+	 * so kiss.
+	 */
+	if (prio >= max(I915_PRIORITY_NORMAL, rq_prio(inflight)))
+		i915_sched_engine_hi_kick(sched_engine);
+
+unlock:
+	rcu_read_unlock();
+}
+
 static void execlists_set_default_submission(struct intel_engine_cs *engine)
 {
 	engine->submit_request = execlists_submit_request;
-	engine->schedule = i915_schedule;
-	engine->execlists.tasklet.callback = execlists_submission_tasklet;
+	engine->sched_engine->tasklet.callback = execlists_submission_tasklet;
 }
 
 static void execlists_shutdown(struct intel_engine_cs *engine)
@@ -3122,7 +3181,7 @@ static void execlists_shutdown(struct intel_engine_cs *engine)
 	/* Synchronise with residual timers and any softirq they raise */
 	del_timer_sync(&engine->execlists.timer);
 	del_timer_sync(&engine->execlists.preempt);
-	tasklet_kill(&engine->execlists.tasklet);
+	i915_sched_engine_kill(engine->sched_engine);
 }
 
 static void execlists_release(struct intel_engine_cs *engine)
@@ -3238,10 +3297,14 @@ int intel_execlists_submission_setup(struct intel_engine_cs *engine)
 	struct intel_uncore *uncore = engine->uncore;
 	u32 base = engine->mmio_base;
 
-	tasklet_setup(&engine->execlists.tasklet, execlists_submission_tasklet);
+	tasklet_setup(&engine->sched_engine->tasklet,
+		      execlists_submission_tasklet);
 	timer_setup(&engine->execlists.timer, execlists_timeslice, 0);
 	timer_setup(&engine->execlists.preempt, execlists_preempt, 0);
 
+	engine->sched_engine->schedule = i915_schedule;
+	engine->sched_engine->kick_backend = kick_execlists;
+
 	logical_ring_default_vfuncs(engine);
 	logical_ring_default_irqs(engine);
 
@@ -3286,7 +3349,7 @@ int intel_execlists_submission_setup(struct intel_engine_cs *engine)
 
 static struct list_head *virtual_queue(struct virtual_engine *ve)
 {
-	return &ve->base.execlists.default_priolist.requests;
+	return &ve->base.sched_engine->default_priolist.requests;
 }
 
 static void rcu_virtual_context_destroy(struct work_struct *wrk)
@@ -3301,7 +3364,7 @@ static void rcu_virtual_context_destroy(struct work_struct *wrk)
 	if (unlikely(ve->request)) {
 		struct i915_request *old;
 
-		spin_lock_irq(&ve->base.active.lock);
+		spin_lock_irq(&ve->base.sched_engine->lock);
 
 		old = fetch_and_zero(&ve->request);
 		if (old) {
@@ -3310,7 +3373,7 @@ static void rcu_virtual_context_destroy(struct work_struct *wrk)
 			i915_request_put(old);
 		}
 
-		spin_unlock_irq(&ve->base.active.lock);
+		spin_unlock_irq(&ve->base.sched_engine->lock);
 	}
 
 	/*
@@ -3320,7 +3383,7 @@ static void rcu_virtual_context_destroy(struct work_struct *wrk)
 	 * rbtrees as in the case it is running in parallel, it may reinsert
 	 * the rb_node into a sibling.
 	 */
-	tasklet_kill(&ve->base.execlists.tasklet);
+	i915_sched_engine_kill(ve->base.sched_engine);
 
 	/* Decouple ourselves from the siblings, no more access allowed. */
 	for (n = 0; n < ve->num_siblings; n++) {
@@ -3330,21 +3393,23 @@ static void rcu_virtual_context_destroy(struct work_struct *wrk)
 		if (RB_EMPTY_NODE(node))
 			continue;
 
-		spin_lock_irq(&sibling->active.lock);
+		spin_lock_irq(&sibling->sched_engine->lock);
 
 		/* Detachment is lazily performed in the execlists tasklet */
 		if (!RB_EMPTY_NODE(node))
 			rb_erase_cached(node, &sibling->execlists.virtual);
 
-		spin_unlock_irq(&sibling->active.lock);
+		spin_unlock_irq(&sibling->sched_engine->lock);
 	}
-	GEM_BUG_ON(__tasklet_is_scheduled(&ve->base.execlists.tasklet));
+	GEM_BUG_ON(__tasklet_is_scheduled(&ve->base.sched_engine->tasklet));
 	GEM_BUG_ON(!list_empty(virtual_queue(ve)));
 
 	lrc_fini(&ve->context);
 	intel_context_fini(&ve->context);
 
 	intel_breadcrumbs_free(ve->base.breadcrumbs);
+	if (ve->base.sched_engine)
+		i915_sched_engine_put(ve->base.sched_engine);
 	intel_engine_free_request_pool(&ve->base);
 
 	kfree(ve->bonds);
@@ -3475,16 +3540,18 @@ static intel_engine_mask_t virtual_submission_mask(struct virtual_engine *ve)
 
 	ENGINE_TRACE(&ve->base, "rq=%llx:%lld, mask=%x, prio=%d\n",
 		     rq->fence.context, rq->fence.seqno,
-		     mask, ve->base.execlists.queue_priority_hint);
+		     mask, ve->base.sched_engine->queue_priority_hint);
 
 	return mask;
 }
 
 static void virtual_submission_tasklet(struct tasklet_struct *t)
 {
+	struct i915_sched_engine *sched_engine =
+		from_tasklet(sched_engine, t, tasklet);
 	struct virtual_engine * const ve =
-		from_tasklet(ve, t, base.execlists.tasklet);
-	const int prio = READ_ONCE(ve->base.execlists.queue_priority_hint);
+		(struct virtual_engine *)sched_engine->engine;
+	const int prio = READ_ONCE(ve->base.sched_engine->queue_priority_hint);
 	intel_engine_mask_t mask;
 	unsigned int n;
 
@@ -3503,7 +3570,7 @@ static void virtual_submission_tasklet(struct tasklet_struct *t)
 		if (!READ_ONCE(ve->request))
 			break; /* already handled by a sibling's tasklet */
 
-		spin_lock_irq(&sibling->active.lock);
+		spin_lock_irq(&sibling->sched_engine->lock);
 
 		if (unlikely(!(mask & sibling->mask))) {
 			if (!RB_EMPTY_NODE(&node->rb)) {
@@ -3552,11 +3619,11 @@ static void virtual_submission_tasklet(struct tasklet_struct *t)
 submit_engine:
 		GEM_BUG_ON(RB_EMPTY_NODE(&node->rb));
 		node->prio = prio;
-		if (first && prio > sibling->execlists.queue_priority_hint)
-			tasklet_hi_schedule(&sibling->execlists.tasklet);
+		if (first && prio > sibling->sched_engine->queue_priority_hint)
+			i915_sched_engine_hi_kick(sibling->sched_engine);
 
 unlock_engine:
-		spin_unlock_irq(&sibling->active.lock);
+		spin_unlock_irq(&sibling->sched_engine->lock);
 
 		if (intel_context_inflight(&ve->context))
 			break;
@@ -3574,7 +3641,7 @@ static void virtual_submit_request(struct i915_request *rq)
 
 	GEM_BUG_ON(ve->base.submit_request != virtual_submit_request);
 
-	spin_lock_irqsave(&ve->base.active.lock, flags);
+	spin_lock_irqsave(&ve->base.sched_engine->lock, flags);
 
 	/* By the time we resubmit a request, it may be completed */
 	if (__i915_request_is_complete(rq)) {
@@ -3588,16 +3655,16 @@ static void virtual_submit_request(struct i915_request *rq)
 		i915_request_put(ve->request);
 	}
 
-	ve->base.execlists.queue_priority_hint = rq_prio(rq);
+	ve->base.sched_engine->queue_priority_hint = rq_prio(rq);
 	ve->request = i915_request_get(rq);
 
 	GEM_BUG_ON(!list_empty(virtual_queue(ve)));
 	list_move_tail(&rq->sched.link, virtual_queue(ve));
 
-	tasklet_hi_schedule(&ve->base.execlists.tasklet);
+	i915_sched_engine_hi_kick(ve->base.sched_engine);
 
 unlock:
-	spin_unlock_irqrestore(&ve->base.active.lock, flags);
+	spin_unlock_irqrestore(&ve->base.sched_engine->lock, flags);
 }
 
 static struct ve_bond *
@@ -3681,19 +3748,24 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings,
 
 	snprintf(ve->base.name, sizeof(ve->base.name), "virtual");
 
-	intel_engine_init_active(&ve->base, ENGINE_VIRTUAL);
-	intel_engine_init_execlists(&ve->base);
+	ve->base.sched_engine = i915_sched_engine_create(ENGINE_VIRTUAL);
+	if (!ve->base.sched_engine) {
+		kfree(ve);
+		return ERR_PTR(-ENOMEM);
+	}
+	ve->base.sched_engine->engine = &ve->base;
 
 	ve->base.cops = &virtual_context_ops;
 	ve->base.request_alloc = execlists_request_alloc;
 
-	ve->base.schedule = i915_schedule;
+	ve->base.sched_engine->schedule = i915_schedule;
 	ve->base.submit_request = virtual_submit_request;
 	ve->base.bond_execute = virtual_bond_execute;
 
 	INIT_LIST_HEAD(virtual_queue(ve));
-	ve->base.execlists.queue_priority_hint = INT_MIN;
-	tasklet_setup(&ve->base.execlists.tasklet, virtual_submission_tasklet);
+	ve->base.sched_engine->queue_priority_hint = INT_MIN;
+	tasklet_setup(&ve->base.sched_engine->tasklet,
+		      virtual_submission_tasklet);
 
 	intel_context_init(&ve->context, &ve->base);
 
@@ -3721,7 +3793,7 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings,
 		 * layering if we handle cloning of the requests and
 		 * submitting a copy into each backend.
 		 */
-		if (sibling->execlists.tasklet.callback !=
+		if (sibling->sched_engine->tasklet.callback !=
 		    execlists_submission_tasklet) {
 			err = -ENODEV;
 			goto err_put;
@@ -3756,6 +3828,9 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings,
 			 "v%dx%d", ve->base.class, count);
 		ve->base.context_size = sibling->context_size;
 
+		ve->base.sched_engine->kick_backend =
+			sibling->sched_engine->kick_backend;
+
 		ve->base.emit_bb_start = sibling->emit_bb_start;
 		ve->base.emit_flush = sibling->emit_flush;
 		ve->base.emit_init_breadcrumb = sibling->emit_init_breadcrumb;
@@ -3848,17 +3923,18 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine,
 							int indent),
 				   unsigned int max)
 {
+	const struct i915_sched_engine *sched_engine = engine->sched_engine;
 	const struct intel_engine_execlists *execlists = &engine->execlists;
 	struct i915_request *rq, *last;
 	unsigned long flags;
 	unsigned int count;
 	struct rb_node *rb;
 
-	spin_lock_irqsave(&engine->active.lock, flags);
+	spin_lock_irqsave(&engine->sched_engine->lock, flags);
 
 	last = NULL;
 	count = 0;
-	list_for_each_entry(rq, &engine->active.requests, sched.link) {
+	list_for_each_entry(rq, &sched_engine->requests, sched.link) {
 		if (count++ < max - 1)
 			show_request(m, rq, "\t\t", 0);
 		else
@@ -3873,13 +3949,13 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine,
 		show_request(m, last, "\t\t", 0);
 	}
 
-	if (execlists->queue_priority_hint != INT_MIN)
+	if (sched_engine->queue_priority_hint != INT_MIN)
 		drm_printf(m, "\t\tQueue priority hint: %d\n",
-			   READ_ONCE(execlists->queue_priority_hint));
+			   READ_ONCE(sched_engine->queue_priority_hint));
 
 	last = NULL;
 	count = 0;
-	for (rb = rb_first_cached(&execlists->queue); rb; rb = rb_next(rb)) {
+	for (rb = rb_first_cached(&sched_engine->queue); rb; rb = rb_next(rb)) {
 		struct i915_priolist *p = rb_entry(rb, typeof(*p), node);
 
 		priolist_for_each_request(rq, p) {
@@ -3921,7 +3997,7 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine,
 		show_request(m, last, "\t\t", 0);
 	}
 
-	spin_unlock_irqrestore(&engine->active.lock, flags);
+	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
 }
 
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
index 2b6dffcc2262..14aa31879a37 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
@@ -339,9 +339,9 @@ static void reset_rewind(struct intel_engine_cs *engine, bool stalled)
 	u32 head;
 
 	rq = NULL;
-	spin_lock_irqsave(&engine->active.lock, flags);
+	spin_lock_irqsave(&engine->sched_engine->lock, flags);
 	rcu_read_lock();
-	list_for_each_entry(pos, &engine->active.requests, sched.link) {
+	list_for_each_entry(pos, &engine->sched_engine->requests, sched.link) {
 		if (!__i915_request_is_complete(pos)) {
 			rq = pos;
 			break;
@@ -396,7 +396,7 @@ static void reset_rewind(struct intel_engine_cs *engine, bool stalled)
 	}
 	engine->legacy.ring->head = intel_ring_wrap(engine->legacy.ring, head);
 
-	spin_unlock_irqrestore(&engine->active.lock, flags);
+	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
 }
 
 static void reset_finish(struct intel_engine_cs *engine)
@@ -408,16 +408,17 @@ static void reset_cancel(struct intel_engine_cs *engine)
 	struct i915_request *request;
 	unsigned long flags;
 
-	spin_lock_irqsave(&engine->active.lock, flags);
+	spin_lock_irqsave(&engine->sched_engine->lock, flags);
 
 	/* Mark all submitted requests as skipped. */
-	list_for_each_entry(request, &engine->active.requests, sched.link)
+	list_for_each_entry(request, &engine->sched_engine->requests,
+			    sched.link)
 		i915_request_put(i915_request_mark_eio(request));
 	intel_engine_signal_breadcrumbs(engine);
 
 	/* Remaining _unready_ requests will be nop'ed when submitted */
 
-	spin_unlock_irqrestore(&engine->active.lock, flags);
+	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
 }
 
 static void i9xx_submit_request(struct i915_request *request)
diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
index 32589c6625e1..bd005c1b6fd5 100644
--- a/drivers/gpu/drm/i915/gt/mock_engine.c
+++ b/drivers/gpu/drm/i915/gt/mock_engine.c
@@ -253,10 +253,10 @@ static void mock_reset_cancel(struct intel_engine_cs *engine)
 
 	del_timer_sync(&mock->hw_delay);
 
-	spin_lock_irqsave(&engine->active.lock, flags);
+	spin_lock_irqsave(&engine->sched_engine->lock, flags);
 
 	/* Mark all submitted requests as skipped. */
-	list_for_each_entry(rq, &engine->active.requests, sched.link)
+	list_for_each_entry(rq, &engine->sched_engine->requests, sched.link)
 		i915_request_put(i915_request_mark_eio(rq));
 	intel_engine_signal_breadcrumbs(engine);
 
@@ -269,7 +269,7 @@ static void mock_reset_cancel(struct intel_engine_cs *engine)
 	}
 	INIT_LIST_HEAD(&mock->hw_queue);
 
-	spin_unlock_irqrestore(&engine->active.lock, flags);
+	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
 }
 
 static void mock_reset_finish(struct intel_engine_cs *engine)
@@ -283,6 +283,7 @@ static void mock_engine_release(struct intel_engine_cs *engine)
 
 	GEM_BUG_ON(timer_pending(&mock->hw_delay));
 
+	i915_sched_engine_put(engine->sched_engine);
 	intel_breadcrumbs_free(engine->breadcrumbs);
 
 	intel_context_unpin(engine->kernel_context);
@@ -345,14 +346,18 @@ int mock_engine_init(struct intel_engine_cs *engine)
 {
 	struct intel_context *ce;
 
-	intel_engine_init_active(engine, ENGINE_MOCK);
+	engine->sched_engine = i915_sched_engine_create(ENGINE_MOCK);
+	if (!engine->sched_engine)
+		return -ENOMEM;
+	engine->sched_engine->engine = engine;
+
 	intel_engine_init_execlists(engine);
 	intel_engine_init__pm(engine);
 	intel_engine_init_retire(engine);
 
 	engine->breadcrumbs = intel_breadcrumbs_create(NULL);
 	if (!engine->breadcrumbs)
-		return -ENOMEM;
+		goto err_schedule;
 
 	ce = create_kernel_context(engine);
 	if (IS_ERR(ce))
@@ -366,6 +371,8 @@ int mock_engine_init(struct intel_engine_cs *engine)
 
 err_breadcrumbs:
 	intel_breadcrumbs_free(engine->breadcrumbs);
+err_schedule:
+	i915_sched_engine_put(engine->sched_engine);
 	return -ENOMEM;
 }
 
diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c
index 1f93591a8c69..f349048ccbf6 100644
--- a/drivers/gpu/drm/i915/gt/selftest_execlists.c
+++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c
@@ -43,7 +43,7 @@ static int wait_for_submit(struct intel_engine_cs *engine,
 			   unsigned long timeout)
 {
 	/* Ignore our own attempts to suppress excess tasklets */
-	tasklet_hi_schedule(&engine->execlists.tasklet);
+	i915_sched_engine_hi_kick(engine->sched_engine);
 
 	timeout += jiffies;
 	do {
@@ -273,7 +273,7 @@ static int live_unlite_restore(struct intel_gt *gt, int prio)
 			};
 
 			/* Alternatively preempt the spinner with ce[1] */
-			engine->schedule(rq[1], &attr);
+			engine->sched_engine->schedule(rq[1], &attr);
 		}
 
 		/* And switch back to ce[0] for good measure */
@@ -606,9 +606,9 @@ static int live_hold_reset(void *arg)
 			err = -EBUSY;
 			goto out;
 		}
-		tasklet_disable(&engine->execlists.tasklet);
+		tasklet_disable(&engine->sched_engine->tasklet);
 
-		engine->execlists.tasklet.callback(&engine->execlists.tasklet);
+		engine->sched_engine->tasklet.callback(&engine->sched_engine->tasklet);
 		GEM_BUG_ON(execlists_active(&engine->execlists) != rq);
 
 		i915_request_get(rq);
@@ -618,7 +618,7 @@ static int live_hold_reset(void *arg)
 		__intel_engine_reset_bh(engine, NULL);
 		GEM_BUG_ON(rq->fence.error != -EIO);
 
-		tasklet_enable(&engine->execlists.tasklet);
+		tasklet_enable(&engine->sched_engine->tasklet);
 		clear_and_wake_up_bit(I915_RESET_ENGINE + id,
 				      &gt->reset.flags);
 		local_bh_enable();
@@ -900,7 +900,7 @@ release_queue(struct intel_engine_cs *engine,
 	i915_request_add(rq);
 
 	local_bh_disable();
-	engine->schedule(rq, &attr);
+	engine->sched_engine->schedule(rq, &attr);
 	local_bh_enable(); /* kick tasklet */
 
 	i915_request_put(rq);
@@ -1183,7 +1183,7 @@ static int live_timeslice_rewind(void *arg)
 		while (i915_request_is_active(rq[A2])) { /* semaphore yield! */
 			/* Wait for the timeslice to kick in */
 			del_timer(&engine->execlists.timer);
-			tasklet_hi_schedule(&engine->execlists.tasklet);
+			i915_sched_engine_hi_kick(engine->sched_engine);
 			intel_engine_flush_submission(engine);
 		}
 		/* -> ELSP[] = { { A:rq1 }, { B:rq1 } } */
@@ -1325,7 +1325,7 @@ static int live_timeslice_queue(void *arg)
 			err = PTR_ERR(rq);
 			goto err_heartbeat;
 		}
-		engine->schedule(rq, &attr);
+		engine->sched_engine->schedule(rq, &attr);
 		err = wait_for_submit(engine, rq, HZ / 2);
 		if (err) {
 			pr_err("%s: Timed out trying to submit semaphores\n",
@@ -1867,7 +1867,7 @@ static int live_late_preempt(void *arg)
 		}
 
 		attr.priority = I915_PRIORITY_MAX;
-		engine->schedule(rq, &attr);
+		engine->sched_engine->schedule(rq, &attr);
 
 		if (!igt_wait_for_spinner(&spin_hi, rq)) {
 			pr_err("High priority context failed to preempt the low priority context\n");
@@ -2480,7 +2480,7 @@ static int live_suppress_self_preempt(void *arg)
 			i915_request_add(rq_b);
 
 			GEM_BUG_ON(i915_request_completed(rq_a));
-			engine->schedule(rq_a, &attr);
+			engine->sched_engine->schedule(rq_a, &attr);
 			igt_spinner_end(&a.spin);
 
 			if (!igt_wait_for_spinner(&b.spin, rq_b)) {
@@ -2612,7 +2612,7 @@ static int live_chain_preempt(void *arg)
 
 			i915_request_get(rq);
 			i915_request_add(rq);
-			engine->schedule(rq, &attr);
+			engine->sched_engine->schedule(rq, &attr);
 
 			igt_spinner_end(&hi.spin);
 			if (i915_request_wait(rq, 0, HZ / 5) < 0) {
@@ -2971,7 +2971,7 @@ static int live_preempt_gang(void *arg)
 				break;
 
 			/* Submit each spinner at increasing priority */
-			engine->schedule(rq, &attr);
+			engine->sched_engine->schedule(rq, &attr);
 		} while (prio <= I915_PRIORITY_MAX &&
 			 !__igt_timeout(end_time, NULL));
 		pr_debug("%s: Preempt chain of %d requests\n",
@@ -3219,7 +3219,7 @@ static int preempt_user(struct intel_engine_cs *engine,
 	i915_request_get(rq);
 	i915_request_add(rq);
 
-	engine->schedule(rq, &attr);
+	engine->sched_engine->schedule(rq, &attr);
 
 	if (i915_request_wait(rq, 0, HZ / 2) < 0)
 		err = -ETIME;
@@ -4593,15 +4593,15 @@ static int reset_virtual_engine(struct intel_gt *gt,
 		err = -EBUSY;
 		goto out_heartbeat;
 	}
-	tasklet_disable(&engine->execlists.tasklet);
+	tasklet_disable(&engine->sched_engine->tasklet);
 
-	engine->execlists.tasklet.callback(&engine->execlists.tasklet);
+	engine->sched_engine->tasklet.callback(&engine->sched_engine->tasklet);
 	GEM_BUG_ON(execlists_active(&engine->execlists) != rq);
 
 	/* Fake a preemption event; failed of course */
-	spin_lock_irq(&engine->active.lock);
+	spin_lock_irq(&engine->sched_engine->lock);
 	__unwind_incomplete_requests(engine);
-	spin_unlock_irq(&engine->active.lock);
+	spin_unlock_irq(&engine->sched_engine->lock);
 	GEM_BUG_ON(rq->engine != engine);
 
 	/* Reset the engine while keeping our active request on hold */
@@ -4612,7 +4612,7 @@ static int reset_virtual_engine(struct intel_gt *gt,
 	GEM_BUG_ON(rq->fence.error != -EIO);
 
 	/* Release our grasp on the engine, letting CS flow again */
-	tasklet_enable(&engine->execlists.tasklet);
+	tasklet_enable(&engine->sched_engine->tasklet);
 	clear_and_wake_up_bit(I915_RESET_ENGINE + engine->id, &gt->reset.flags);
 	local_bh_enable();
 
diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
index 5b63d4df8c93..cbcb800e2ca0 100644
--- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
+++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
@@ -858,12 +858,12 @@ static int active_engine(void *data)
 		rq[idx] = i915_request_get(new);
 		i915_request_add(new);
 
-		if (engine->schedule && arg->flags & TEST_PRIORITY) {
+		if (engine->sched_engine->schedule && arg->flags & TEST_PRIORITY) {
 			struct i915_sched_attr attr = {
 				.priority =
 					i915_prandom_u32_max_state(512, &prng),
 			};
-			engine->schedule(rq[idx], &attr);
+			engine->sched_engine->schedule(rq[idx], &attr);
 		}
 
 		err = active_request_put(old);
@@ -1702,7 +1702,7 @@ static int __igt_atomic_reset_engine(struct intel_engine_cs *engine,
 				     const struct igt_atomic_section *p,
 				     const char *mode)
 {
-	struct tasklet_struct * const t = &engine->execlists.tasklet;
+	struct tasklet_struct * const t = &engine->sched_engine->tasklet;
 	int err;
 
 	GEM_TRACE("i915_reset_engine(%s:%s) under %s\n",
diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c
index d8f6623524e8..5b40def7cd9d 100644
--- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
+++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
@@ -49,7 +49,7 @@ static int wait_for_submit(struct intel_engine_cs *engine,
 			   unsigned long timeout)
 {
 	/* Ignore our own attempts to suppress excess tasklets */
-	tasklet_hi_schedule(&engine->execlists.tasklet);
+	i915_sched_engine_hi_kick(engine->sched_engine);
 
 	timeout += jiffies;
 	do {
@@ -1613,12 +1613,12 @@ static void garbage_reset(struct intel_engine_cs *engine,
 
 	local_bh_disable();
 	if (!test_and_set_bit(bit, lock)) {
-		tasklet_disable(&engine->execlists.tasklet);
+		tasklet_disable(&engine->sched_engine->tasklet);
 
 		if (!rq->fence.error)
 			__intel_engine_reset_bh(engine, NULL);
 
-		tasklet_enable(&engine->execlists.tasklet);
+		tasklet_enable(&engine->sched_engine->tasklet);
 		clear_and_wake_up_bit(bit, lock);
 	}
 	local_bh_enable();
diff --git a/drivers/gpu/drm/i915/gt/selftest_reset.c b/drivers/gpu/drm/i915/gt/selftest_reset.c
index 8784257ec808..7a50c9f4071b 100644
--- a/drivers/gpu/drm/i915/gt/selftest_reset.c
+++ b/drivers/gpu/drm/i915/gt/selftest_reset.c
@@ -321,7 +321,7 @@ static int igt_atomic_engine_reset(void *arg)
 		goto out_unlock;
 
 	for_each_engine(engine, gt, id) {
-		struct tasklet_struct *t = &engine->execlists.tasklet;
+		struct tasklet_struct *t = &engine->sched_engine->tasklet;
 
 		if (t->func)
 			tasklet_disable(t);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 38cda5d599a6..b8f9c71af13e 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -181,6 +181,7 @@ static void schedule_out(struct i915_request *rq)
 
 static void __guc_dequeue(struct intel_engine_cs *engine)
 {
+	struct i915_sched_engine * const sched_engine = engine->sched_engine;
 	struct intel_engine_execlists * const execlists = &engine->execlists;
 	struct i915_request **first = execlists->inflight;
 	struct i915_request ** const last_port = first + execlists->port_mask;
@@ -189,7 +190,7 @@ static void __guc_dequeue(struct intel_engine_cs *engine)
 	bool submit = false;
 	struct rb_node *rb;
 
-	lockdep_assert_held(&engine->active.lock);
+	lockdep_assert_held(&engine->sched_engine->lock);
 
 	if (last) {
 		if (*++first)
@@ -204,7 +205,7 @@ static void __guc_dequeue(struct intel_engine_cs *engine)
 	 * event.
 	 */
 	port = first;
-	while ((rb = rb_first_cached(&execlists->queue))) {
+	while ((rb = rb_first_cached(&sched_engine->queue))) {
 		struct i915_priolist *p = to_priolist(rb);
 		struct i915_request *rq, *rn;
 
@@ -224,11 +225,11 @@ static void __guc_dequeue(struct intel_engine_cs *engine)
 			last = rq;
 		}
 
-		rb_erase_cached(&p->node, &execlists->queue);
+		rb_erase_cached(&p->node, &sched_engine->queue);
 		i915_priolist_free(p);
 	}
 done:
-	execlists->queue_priority_hint =
+	sched_engine->queue_priority_hint =
 		rb ? to_priolist(rb)->priority : INT_MIN;
 	if (submit) {
 		*port = schedule_in(last, port - execlists->inflight);
@@ -240,13 +241,14 @@ static void __guc_dequeue(struct intel_engine_cs *engine)
 
 static void guc_submission_tasklet(struct tasklet_struct *t)
 {
-	struct intel_engine_cs * const engine =
-		from_tasklet(engine, t, execlists.tasklet);
+	struct i915_sched_engine *sched_engine =
+		from_tasklet(sched_engine, t, tasklet);
+	struct intel_engine_cs * const engine = sched_engine->engine;
 	struct intel_engine_execlists * const execlists = &engine->execlists;
 	struct i915_request **port, *rq;
 	unsigned long flags;
 
-	spin_lock_irqsave(&engine->active.lock, flags);
+	spin_lock_irqsave(&engine->sched_engine->lock, flags);
 
 	for (port = execlists->inflight; (rq = *port); port++) {
 		if (!i915_request_completed(rq))
@@ -262,20 +264,22 @@ static void guc_submission_tasklet(struct tasklet_struct *t)
 
 	__guc_dequeue(engine);
 
-	spin_unlock_irqrestore(&engine->active.lock, flags);
+	i915_sched_engine_reset_on_empty(engine->sched_engine);
+
+	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
 }
 
 static void cs_irq_handler(struct intel_engine_cs *engine, u16 iir)
 {
 	if (iir & GT_RENDER_USER_INTERRUPT) {
 		intel_engine_signal_breadcrumbs(engine);
-		tasklet_hi_schedule(&engine->execlists.tasklet);
+		i915_sched_engine_hi_kick(engine->sched_engine);
 	}
 }
 
 static void guc_reset_prepare(struct intel_engine_cs *engine)
 {
-	struct intel_engine_execlists * const execlists = &engine->execlists;
+	struct i915_sched_engine * const sched_engine = engine->sched_engine;
 
 	ENGINE_TRACE(engine, "\n");
 
@@ -283,12 +287,12 @@ static void guc_reset_prepare(struct intel_engine_cs *engine)
 	 * Prevent request submission to the hardware until we have
 	 * completed the reset in i915_gem_reset_finish(). If a request
 	 * is completed by one engine, it may then queue a request
-	 * to a second via its execlists->tasklet *just* as we are
+	 * to a second via its sched_engine->tasklet *just* as we are
 	 * calling engine->init_hw() and also writing the ELSP.
-	 * Turning off the execlists->tasklet until the reset is over
+	 * Turning off the sched_engine->tasklet until the reset is over
 	 * prevents the race.
 	 */
-	__tasklet_disable_sync_once(&execlists->tasklet);
+	__tasklet_disable_sync_once(&sched_engine->tasklet);
 }
 
 static void guc_reset_state(struct intel_context *ce,
@@ -319,7 +323,7 @@ static void guc_reset_rewind(struct intel_engine_cs *engine, bool stalled)
 	struct i915_request *rq;
 	unsigned long flags;
 
-	spin_lock_irqsave(&engine->active.lock, flags);
+	spin_lock_irqsave(&engine->sched_engine->lock, flags);
 
 	/* Push back any incomplete requests for replay after the reset. */
 	rq = execlists_unwind_incomplete_requests(execlists);
@@ -333,12 +337,12 @@ static void guc_reset_rewind(struct intel_engine_cs *engine, bool stalled)
 	guc_reset_state(rq->context, engine, rq->head, stalled);
 
 out_unlock:
-	spin_unlock_irqrestore(&engine->active.lock, flags);
+	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
 }
 
 static void guc_reset_cancel(struct intel_engine_cs *engine)
 {
-	struct intel_engine_execlists * const execlists = &engine->execlists;
+	struct i915_sched_engine * const sched_engine = engine->sched_engine;
 	struct i915_request *rq, *rn;
 	struct rb_node *rb;
 	unsigned long flags;
@@ -359,16 +363,16 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
 	 * submission's irq state, we also wish to remind ourselves that
 	 * it is irq state.)
 	 */
-	spin_lock_irqsave(&engine->active.lock, flags);
+	spin_lock_irqsave(&engine->sched_engine->lock, flags);
 
 	/* Mark all executing requests as skipped. */
-	list_for_each_entry(rq, &engine->active.requests, sched.link) {
+	list_for_each_entry(rq, &engine->sched_engine->requests, sched.link) {
 		i915_request_set_error_once(rq, -EIO);
 		i915_request_mark_complete(rq);
 	}
 
 	/* Flush the queued requests to the timeline list (for retiring). */
-	while ((rb = rb_first_cached(&execlists->queue))) {
+	while ((rb = rb_first_cached(&sched_engine->queue))) {
 		struct i915_priolist *p = to_priolist(rb);
 
 		priolist_for_each_request_consume(rq, rn, p) {
@@ -378,28 +382,28 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
 			i915_request_mark_complete(rq);
 		}
 
-		rb_erase_cached(&p->node, &execlists->queue);
+		rb_erase_cached(&p->node, &sched_engine->queue);
 		i915_priolist_free(p);
 	}
 
 	/* Remaining _unready_ requests will be nop'ed when submitted */
 
-	execlists->queue_priority_hint = INT_MIN;
-	execlists->queue = RB_ROOT_CACHED;
+	sched_engine->queue_priority_hint = INT_MIN;
+	sched_engine->queue = RB_ROOT_CACHED;
 
-	spin_unlock_irqrestore(&engine->active.lock, flags);
+	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
 }
 
 static void guc_reset_finish(struct intel_engine_cs *engine)
 {
-	struct intel_engine_execlists * const execlists = &engine->execlists;
+	struct i915_sched_engine * const sched_engine = engine->sched_engine;
 
-	if (__tasklet_enable(&execlists->tasklet))
+	if (__tasklet_enable(&sched_engine->tasklet))
 		/* And kick in case we missed a new request submission. */
-		tasklet_hi_schedule(&execlists->tasklet);
+		i915_sched_engine_hi_kick(sched_engine);
 
 	ENGINE_TRACE(engine, "depth->%d\n",
-		     atomic_read(&execlists->tasklet.count));
+		     atomic_read(&sched_engine->tasklet.count));
 }
 
 /*
@@ -500,7 +504,7 @@ static inline void queue_request(struct intel_engine_cs *engine,
 {
 	GEM_BUG_ON(!list_empty(&rq->sched.link));
 	list_add_tail(&rq->sched.link,
-		      i915_sched_lookup_priolist(engine, prio));
+		      i915_sched_lookup_priolist(engine->sched_engine, prio));
 	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
 }
 
@@ -510,16 +514,16 @@ static void guc_submit_request(struct i915_request *rq)
 	unsigned long flags;
 
 	/* Will be called from irq-context when using foreign fences. */
-	spin_lock_irqsave(&engine->active.lock, flags);
+	spin_lock_irqsave(&engine->sched_engine->lock, flags);
 
 	queue_request(engine, rq, rq_prio(rq));
 
-	GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root));
+	GEM_BUG_ON(i915_sched_engine_is_empty(engine->sched_engine));
 	GEM_BUG_ON(list_empty(&rq->sched.link));
 
-	tasklet_hi_schedule(&engine->execlists.tasklet);
+	i915_sched_engine_hi_kick(engine->sched_engine);
 
-	spin_unlock_irqrestore(&engine->active.lock, flags);
+	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
 }
 
 static void sanitize_hwsp(struct intel_engine_cs *engine)
@@ -597,7 +601,7 @@ static void guc_release(struct intel_engine_cs *engine)
 {
 	engine->sanitize = NULL; /* no longer in control, nothing to sanitize */
 
-	tasklet_kill(&engine->execlists.tasklet);
+	tasklet_kill(&engine->sched_engine->tasklet);
 
 	intel_engine_cleanup_common(engine);
 	lrc_fini_wa_ctx(engine);
@@ -612,7 +616,7 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
 	engine->cops = &guc_context_ops;
 	engine->request_alloc = guc_request_alloc;
 
-	engine->schedule = i915_schedule;
+	engine->sched_engine->schedule = i915_schedule;
 
 	engine->reset.prepare = guc_reset_prepare;
 	engine->reset.rewind = guc_reset_rewind;
@@ -676,7 +680,8 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
 	 */
 	GEM_BUG_ON(INTEL_GEN(i915) < 11);
 
-	tasklet_setup(&engine->execlists.tasklet, guc_submission_tasklet);
+	tasklet_setup(&engine->sched_engine->tasklet, guc_submission_tasklet);
+	engine->sched_engine->schedule = i915_schedule;
 
 	guc_default_vfuncs(engine);
 	guc_default_irqs(engine);
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index bb181fe5d47e..3352f56bcf63 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1247,7 +1247,8 @@ static void record_request(const struct i915_request *request,
 
 static void engine_record_execlists(struct intel_engine_coredump *ee)
 {
-	const struct intel_engine_execlists * const el = &ee->engine->execlists;
+	const struct intel_engine_execlists * const el =
+		&ee->engine->execlists;
 	struct i915_request * const *port = el->active;
 	unsigned int n = 0;
 
@@ -1441,12 +1442,12 @@ capture_engine(struct intel_engine_cs *engine,
 	if (!ee)
 		return NULL;
 
-	spin_lock_irqsave(&engine->active.lock, flags);
+	spin_lock_irqsave(&engine->sched_engine->lock, flags);
 	rq = intel_engine_find_active_request(engine);
 	if (rq)
 		capture = intel_engine_coredump_add_request(ee, rq,
 							    ATOMIC_MAYFAIL);
-	spin_unlock_irqrestore(&engine->active.lock, flags);
+	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
 	if (!capture) {
 		kfree(ee);
 		return NULL;
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 970d8f4986bb..4c0df56e3b86 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -272,11 +272,11 @@ i915_request_active_engine(struct i915_request *rq,
 	 * check that we have acquired the lock on the final engine.
 	 */
 	locked = READ_ONCE(rq->engine);
-	spin_lock_irq(&locked->active.lock);
+	spin_lock_irq(&locked->sched_engine->lock);
 	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
-		spin_unlock(&locked->active.lock);
+		spin_unlock(&locked->sched_engine->lock);
 		locked = engine;
-		spin_lock(&locked->active.lock);
+		spin_lock(&locked->sched_engine->lock);
 	}
 
 	if (i915_request_is_active(rq)) {
@@ -285,7 +285,7 @@ i915_request_active_engine(struct i915_request *rq,
 		ret = true;
 	}
 
-	spin_unlock_irq(&locked->active.lock);
+	spin_unlock_irq(&locked->sched_engine->lock);
 
 	return ret;
 }
@@ -302,10 +302,10 @@ static void remove_from_engine(struct i915_request *rq)
 	 * check that the rq still belongs to the newly locked engine.
 	 */
 	locked = READ_ONCE(rq->engine);
-	spin_lock_irq(&locked->active.lock);
+	spin_lock_irq(&locked->sched_engine->lock);
 	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
-		spin_unlock(&locked->active.lock);
-		spin_lock(&engine->active.lock);
+		spin_unlock(&locked->sched_engine->lock);
+		spin_lock(&engine->sched_engine->lock);
 		locked = engine;
 	}
 	list_del_init(&rq->sched.link);
@@ -316,7 +316,7 @@ static void remove_from_engine(struct i915_request *rq)
 	/* Prevent further __await_execution() registering a cb, then flush */
 	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
 
-	spin_unlock_irq(&locked->active.lock);
+	spin_unlock_irq(&locked->sched_engine->lock);
 
 	__notify_execute_cb_imm(rq);
 }
@@ -481,7 +481,7 @@ static bool __request_in_flight(const struct i915_request *signal)
 	 * may either perform a context switch to the second inflight execlists,
 	 * or it may switch to the pending set of execlists. In the case of the
 	 * latter, it may send the ACK and we process the event copying the
-	 * pending[] over top of inflight[], _overwriting_ our *active. Since
+	 * pending[] over top of inflight[], _overwriting_ our *active-> Since
 	 * this implies the HW is arbitrating and not struck in *active, we do
 	 * not worry about complete accuracy, but we do require no read/write
 	 * tearing of the pointer [the read of the pointer must be valid, even
@@ -490,7 +490,7 @@ static bool __request_in_flight(const struct i915_request *signal)
 	 *
 	 * Note that the read of *execlists->active may race with the promotion
 	 * of execlists->pending[] to execlists->inflight[], overwritting
-	 * the value at *execlists->active. This is fine. The promotion implies
+	 * the value at *execlists->active-> This is fine. The promotion implies
 	 * that we received an ACK from the HW, and so the context is not
 	 * stuck -- if we do not see ourselves in *active, the inflight status
 	 * is valid. If instead we see ourselves being copied into *active,
@@ -545,7 +545,7 @@ __await_execution(struct i915_request *rq,
 
 	/*
 	 * Register the callback first, then see if the signaler is already
-	 * active. This ensures that if we race with the
+	 * active-> This ensures that if we race with the
 	 * __notify_execute_cb from i915_request_submit() and we are not
 	 * included in that list, we get a second bite of the cherry and
 	 * execute it ourselves. After this point, a future
@@ -637,7 +637,7 @@ bool __i915_request_submit(struct i915_request *request)
 	RQ_TRACE(request, "\n");
 
 	GEM_BUG_ON(!irqs_disabled());
-	lockdep_assert_held(&engine->active.lock);
+	lockdep_assert_held(&engine->sched_engine->lock);
 
 	/*
 	 * With the advent of preempt-to-busy, we frequently encounter
@@ -649,9 +649,9 @@ bool __i915_request_submit(struct i915_request *request)
 	 *
 	 * We must remove the request from the caller's priority queue,
 	 * and the caller must only call us when the request is in their
-	 * priority queue, under the active.lock. This ensures that the
+	 * priority queue, under the sched_engine->lock. This ensures that the
 	 * request has *not* yet been retired and we can safely move
-	 * the request into the engine->active.list where it will be
+	 * the request into the engine->sched_engine->requests list where it will be
 	 * dropped upon retiring. (Otherwise if resubmit a *retired*
 	 * request, this would be a horrible use-after-free.)
 	 */
@@ -694,7 +694,7 @@ bool __i915_request_submit(struct i915_request *request)
 	result = true;
 
 	GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
-	list_move_tail(&request->sched.link, &engine->active.requests);
+	list_move_tail(&request->sched.link, &engine->sched_engine->requests);
 active:
 	clear_bit(I915_FENCE_FLAG_PQUEUE, &request->fence.flags);
 	set_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags);
@@ -724,11 +724,11 @@ void i915_request_submit(struct i915_request *request)
 	unsigned long flags;
 
 	/* Will be called from irq-context when using foreign fences. */
-	spin_lock_irqsave(&engine->active.lock, flags);
+	spin_lock_irqsave(&engine->sched_engine->lock, flags);
 
 	__i915_request_submit(request);
 
-	spin_unlock_irqrestore(&engine->active.lock, flags);
+	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
 }
 
 void __i915_request_unsubmit(struct i915_request *request)
@@ -742,7 +742,7 @@ void __i915_request_unsubmit(struct i915_request *request)
 	RQ_TRACE(request, "\n");
 
 	GEM_BUG_ON(!irqs_disabled());
-	lockdep_assert_held(&engine->active.lock);
+	lockdep_assert_held(&engine->sched_engine->lock);
 
 	/*
 	 * Before we remove this breadcrumb from the signal list, we have
@@ -775,11 +775,11 @@ void i915_request_unsubmit(struct i915_request *request)
 	unsigned long flags;
 
 	/* Will be called from irq-context when using foreign fences. */
-	spin_lock_irqsave(&engine->active.lock, flags);
+	spin_lock_irqsave(&engine->sched_engine->lock, flags);
 
 	__i915_request_unsubmit(request);
 
-	spin_unlock_irqrestore(&engine->active.lock, flags);
+	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
 }
 
 static void __cancel_request(struct i915_request *rq)
@@ -1343,7 +1343,7 @@ __i915_request_await_execution(struct i915_request *to,
 	}
 
 	/* Couple the dependency tree for PI on this exposed to->fence */
-	if (to->engine->schedule) {
+	if (to->engine->sched_engine->schedule) {
 		err = i915_sched_node_add_dependency(&to->sched,
 						     &from->sched,
 						     I915_DEPENDENCY_WEAK);
@@ -1484,7 +1484,7 @@ i915_request_await_request(struct i915_request *to, struct i915_request *from)
 		return 0;
 	}
 
-	if (to->engine->schedule) {
+	if (to->engine->sched_engine->schedule) {
 		ret = i915_sched_node_add_dependency(&to->sched,
 						     &from->sched,
 						     I915_DEPENDENCY_EXTERNAL);
@@ -1671,7 +1671,7 @@ __i915_request_add_to_timeline(struct i915_request *rq)
 			__i915_sw_fence_await_dma_fence(&rq->submit,
 							&prev->fence,
 							&rq->dmaq);
-		if (rq->engine->schedule)
+		if (rq->engine->sched_engine->schedule)
 			__i915_sched_node_add_dependency(&rq->sched,
 							 &prev->sched,
 							 &rq->dep,
@@ -1743,8 +1743,8 @@ void __i915_request_queue(struct i915_request *rq,
 	 * decide whether to preempt the entire chain so that it is ready to
 	 * run at the earliest possible convenience.
 	 */
-	if (attr && rq->engine->schedule)
-		rq->engine->schedule(rq, attr);
+	if (attr && rq->engine->sched_engine->schedule)
+		rq->engine->sched_engine->schedule(rq, attr);
 
 	local_bh_disable();
 	__i915_request_queue_bh(rq);
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 270f6cd37650..239964bec1fa 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -613,7 +613,7 @@ i915_request_active_timeline(const struct i915_request *rq)
 	 * this submission.
 	 */
 	return rcu_dereference_protected(rq->timeline,
-					 lockdep_is_held(&rq->engine->active.lock));
+					 lockdep_is_held(&rq->engine->sched_engine->lock));
 }
 
 static inline u32
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index efa638c3acc7..28d403a8d7d2 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -40,7 +40,7 @@ static inline struct i915_priolist *to_priolist(struct rb_node *rb)
 	return rb_entry(rb, struct i915_priolist, node);
 }
 
-static void assert_priolists(struct intel_engine_execlists * const execlists)
+static void assert_priolists(struct i915_sched_engine * const sched_engine)
 {
 	struct rb_node *rb;
 	long last_prio;
@@ -48,11 +48,11 @@ static void assert_priolists(struct intel_engine_execlists * const execlists)
 	if (!IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
 		return;
 
-	GEM_BUG_ON(rb_first_cached(&execlists->queue) !=
-		   rb_first(&execlists->queue.rb_root));
+	GEM_BUG_ON(rb_first_cached(&sched_engine->queue) !=
+		   rb_first(&sched_engine->queue.rb_root));
 
 	last_prio = INT_MAX;
-	for (rb = rb_first_cached(&execlists->queue); rb; rb = rb_next(rb)) {
+	for (rb = rb_first_cached(&sched_engine->queue); rb; rb = rb_next(rb)) {
 		const struct i915_priolist *p = to_priolist(rb);
 
 		GEM_BUG_ON(p->priority > last_prio);
@@ -61,23 +61,22 @@ static void assert_priolists(struct intel_engine_execlists * const execlists)
 }
 
 struct list_head *
-i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio)
+i915_sched_lookup_priolist(struct i915_sched_engine *sched_engine, int prio)
 {
-	struct intel_engine_execlists * const execlists = &engine->execlists;
 	struct i915_priolist *p;
 	struct rb_node **parent, *rb;
 	bool first = true;
 
-	lockdep_assert_held(&engine->active.lock);
-	assert_priolists(execlists);
+	lockdep_assert_held(&sched_engine->lock);
+	assert_priolists(sched_engine);
 
-	if (unlikely(execlists->no_priolist))
+	if (unlikely(sched_engine->no_priolist))
 		prio = I915_PRIORITY_NORMAL;
 
 find_priolist:
 	/* most positive priority is scheduled first, equal priorities fifo */
 	rb = NULL;
-	parent = &execlists->queue.rb_root.rb_node;
+	parent = &sched_engine->queue.rb_root.rb_node;
 	while (*parent) {
 		rb = *parent;
 		p = to_priolist(rb);
@@ -92,7 +91,7 @@ i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio)
 	}
 
 	if (prio == I915_PRIORITY_NORMAL) {
-		p = &execlists->default_priolist;
+		p = &sched_engine->default_priolist;
 	} else {
 		p = kmem_cache_alloc(global.slab_priorities, GFP_ATOMIC);
 		/* Convert an allocation failure to a priority bump */
@@ -107,7 +106,7 @@ i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio)
 			 * requests, so if userspace lied about their
 			 * dependencies that reordering may be visible.
 			 */
-			execlists->no_priolist = true;
+			sched_engine->no_priolist = true;
 			goto find_priolist;
 		}
 	}
@@ -116,7 +115,7 @@ i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio)
 	INIT_LIST_HEAD(&p->requests);
 
 	rb_link_node(&p->node, rb, parent);
-	rb_insert_color_cached(&p->node, &execlists->queue, first);
+	rb_insert_color_cached(&p->node, &sched_engine->queue, first);
 
 	return &p->requests;
 }
@@ -130,13 +129,13 @@ struct sched_cache {
 	struct list_head *priolist;
 };
 
-static struct intel_engine_cs *
-sched_lock_engine(const struct i915_sched_node *node,
-		  struct intel_engine_cs *locked,
+static struct i915_sched_engine *
+lock_sched_engine(struct i915_sched_node *node,
+		  struct i915_sched_engine *locked,
 		  struct sched_cache *cache)
 {
 	const struct i915_request *rq = node_to_request(node);
-	struct intel_engine_cs *engine;
+	struct i915_sched_engine *sched_engine;
 
 	GEM_BUG_ON(!locked);
 
@@ -146,81 +145,22 @@ sched_lock_engine(const struct i915_sched_node *node,
 	 * engine lock. The simple ploy we use is to take the lock then
 	 * check that the rq still belongs to the newly locked engine.
 	 */
-	while (locked != (engine = READ_ONCE(rq->engine))) {
-		spin_unlock(&locked->active.lock);
+	while (locked != (sched_engine = rq->engine->sched_engine)) {
+		spin_unlock(&locked->lock);
 		memset(cache, 0, sizeof(*cache));
-		spin_lock(&engine->active.lock);
-		locked = engine;
+		spin_lock(&sched_engine->lock);
+		locked = sched_engine;
 	}
 
-	GEM_BUG_ON(locked != engine);
+	GEM_BUG_ON(locked != sched_engine);
 	return locked;
 }
 
-static inline int rq_prio(const struct i915_request *rq)
-{
-	return rq->sched.attr.priority;
-}
-
-static inline bool need_preempt(int prio, int active)
-{
-	/*
-	 * Allow preemption of low -> normal -> high, but we do
-	 * not allow low priority tasks to preempt other low priority
-	 * tasks under the impression that latency for low priority
-	 * tasks does not matter (as much as background throughput),
-	 * so kiss.
-	 */
-	return prio >= max(I915_PRIORITY_NORMAL, active);
-}
-
-static void kick_submission(struct intel_engine_cs *engine,
-			    const struct i915_request *rq,
-			    int prio)
-{
-	const struct i915_request *inflight;
-
-	/*
-	 * We only need to kick the tasklet once for the high priority
-	 * new context we add into the queue.
-	 */
-	if (prio <= engine->execlists.queue_priority_hint)
-		return;
-
-	rcu_read_lock();
-
-	/* Nothing currently active? We're overdue for a submission! */
-	inflight = execlists_active(&engine->execlists);
-	if (!inflight)
-		goto unlock;
-
-	/*
-	 * If we are already the currently executing context, don't
-	 * bother evaluating if we should preempt ourselves.
-	 */
-	if (inflight->context == rq->context)
-		goto unlock;
-
-	ENGINE_TRACE(engine,
-		     "bumping queue-priority-hint:%d for rq:%llx:%lld, inflight:%llx:%lld prio %d\n",
-		     prio,
-		     rq->fence.context, rq->fence.seqno,
-		     inflight->fence.context, inflight->fence.seqno,
-		     inflight->sched.attr.priority);
-
-	engine->execlists.queue_priority_hint = prio;
-	if (need_preempt(prio, rq_prio(inflight)))
-		tasklet_hi_schedule(&engine->execlists.tasklet);
-
-unlock:
-	rcu_read_unlock();
-}
-
 static void __i915_schedule(struct i915_sched_node *node,
 			    const struct i915_sched_attr *attr)
 {
 	const int prio = max(attr->priority, node->attr.priority);
-	struct intel_engine_cs *engine;
+	struct i915_sched_engine *sched_engine;
 	struct i915_dependency *dep, *p;
 	struct i915_dependency stack;
 	struct sched_cache cache;
@@ -295,23 +235,24 @@ static void __i915_schedule(struct i915_sched_node *node,
 	}
 
 	memset(&cache, 0, sizeof(cache));
-	engine = node_to_request(node)->engine;
-	spin_lock(&engine->active.lock);
+	sched_engine = node_to_request(node)->engine->sched_engine;
+	spin_lock(&sched_engine->lock);
 
 	/* Fifo and depth-first replacement ensure our deps execute before us */
-	engine = sched_lock_engine(node, engine, &cache);
+	sched_engine = lock_sched_engine(node, sched_engine, &cache);
 	list_for_each_entry_safe_reverse(dep, p, &dfs, dfs_link) {
 		INIT_LIST_HEAD(&dep->dfs_link);
 
 		node = dep->signaler;
-		engine = sched_lock_engine(node, engine, &cache);
-		lockdep_assert_held(&engine->active.lock);
+		sched_engine = lock_sched_engine(node, sched_engine, &cache);
+		lockdep_assert_held(&sched_engine->lock);
 
 		/* Recheck after acquiring the engine->timeline.lock */
 		if (prio <= node->attr.priority || node_signaled(node))
 			continue;
 
-		GEM_BUG_ON(node_to_request(node)->engine != engine);
+		GEM_BUG_ON(node_to_request(node)->engine->sched_engine !=
+			   sched_engine);
 
 		WRITE_ONCE(node->attr.priority, prio);
 
@@ -329,16 +270,17 @@ static void __i915_schedule(struct i915_sched_node *node,
 		if (i915_request_in_priority_queue(node_to_request(node))) {
 			if (!cache.priolist)
 				cache.priolist =
-					i915_sched_lookup_priolist(engine,
+					i915_sched_lookup_priolist(sched_engine,
 								   prio);
 			list_move_tail(&node->link, cache.priolist);
 		}
 
 		/* Defer (tasklet) submission until after all of our updates. */
-		kick_submission(engine, node_to_request(node), prio);
+		if (sched_engine->kick_backend)
+			sched_engine->kick_backend(node_to_request(node), prio);
 	}
 
-	spin_unlock(&engine->active.lock);
+	spin_unlock(&sched_engine->lock);
 }
 
 void i915_schedule(struct i915_request *rq, const struct i915_sched_attr *attr)
@@ -489,6 +431,50 @@ void i915_request_show_with_schedule(struct drm_printer *m,
 	rcu_read_unlock();
 }
 
+void i915_sched_engine_free(struct kref *kref)
+{
+	struct i915_sched_engine *sched_engine =
+		container_of(kref, typeof(*sched_engine), ref);
+
+	i915_sched_engine_kill(sched_engine); /* flush the callback */
+	kfree(sched_engine);
+}
+
+struct i915_sched_engine *
+i915_sched_engine_create(unsigned int subclass)
+{
+	struct i915_sched_engine *sched_engine;
+
+	sched_engine = kzalloc(sizeof(*sched_engine), GFP_KERNEL);
+	if (!sched_engine)
+		return NULL;
+
+	kref_init(&sched_engine->ref);
+
+	sched_engine->queue = RB_ROOT_CACHED;
+	sched_engine->queue_priority_hint = INT_MIN;
+
+	INIT_LIST_HEAD(&sched_engine->requests);
+	INIT_LIST_HEAD(&sched_engine->hold);
+
+	spin_lock_init(&sched_engine->lock);
+	lockdep_set_subclass(&sched_engine->lock, subclass);
+
+	/*
+	 * Due to an interesting quirk in lockdep's internal debug tracking,
+	 * after setting a subclass we must ensure the lock is used. Otherwise,
+	 * nr_unused_locks is incremented once too often.
+	 */
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	local_irq_disable();
+	lock_map_acquire(&sched_engine->lock.dep_map);
+	lock_map_release(&sched_engine->lock.dep_map);
+	local_irq_enable();
+#endif
+
+	return sched_engine;
+}
+
 static void i915_global_scheduler_shrink(void)
 {
 	kmem_cache_shrink(global.slab_dependencies);
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 858a0938f47a..a78b1f50ecb4 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -39,7 +39,7 @@ void i915_schedule(struct i915_request *request,
 		   const struct i915_sched_attr *attr);
 
 struct list_head *
-i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio);
+i915_sched_lookup_priolist(struct i915_sched_engine *sched_engine, int prio);
 
 void __i915_priolist_free(struct i915_priolist *p);
 static inline void i915_priolist_free(struct i915_priolist *p)
@@ -53,4 +53,67 @@ void i915_request_show_with_schedule(struct drm_printer *m,
 				     const char *prefix,
 				     int indent);
 
+struct i915_sched_engine *
+i915_sched_engine_create(unsigned int subclass);
+
+void i915_sched_engine_free(struct kref *kref);
+
+static inline struct i915_sched_engine *
+i915_sched_engine_get(struct i915_sched_engine *sched_engine)
+{
+	kref_get(&sched_engine->ref);
+	return sched_engine;
+}
+
+static inline void
+i915_sched_engine_put(struct i915_sched_engine *sched_engine)
+{
+	kref_put(&sched_engine->ref, i915_sched_engine_free);
+}
+
+static inline bool
+i915_sched_engine_is_empty(struct i915_sched_engine *sched_engine)
+{
+	return RB_EMPTY_ROOT(&sched_engine->queue.rb_root);
+}
+
+static inline void
+i915_sched_engine_reset_on_empty(struct i915_sched_engine *sched_engine)
+{
+	if (i915_sched_engine_is_empty(sched_engine))
+		sched_engine->no_priolist = false;
+}
+
+static inline void
+i915_sched_engine_hi_kick(struct i915_sched_engine *sched_engine)
+{
+	tasklet_hi_schedule(&sched_engine->tasklet);
+}
+
+static inline void
+i915_sched_engine_kick(struct i915_sched_engine *sched_engine)
+{
+	tasklet_schedule(&sched_engine->tasklet);
+}
+
+static inline void
+i915_sched_engine_kill(struct i915_sched_engine *sched_engine)
+{
+	tasklet_kill(&sched_engine->tasklet);
+}
+
+static inline void
+sched_engine_active_lock_bh(struct i915_sched_engine *sched_engine)
+{
+	local_bh_disable(); /* prevent local softirq and lock recursion */
+	tasklet_lock(&sched_engine->tasklet);
+}
+
+static inline void
+sched_engine_active_unlock_bh(struct i915_sched_engine *sched_engine)
+{
+	tasklet_unlock(&sched_engine->tasklet);
+	local_bh_enable(); /* restore softirq, and kick ksoftirqd! */
+}
+
 #endif /* _I915_SCHEDULER_H_ */
diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h b/drivers/gpu/drm/i915/i915_scheduler_types.h
index 343ed44d5ed4..90b389ba661b 100644
--- a/drivers/gpu/drm/i915/i915_scheduler_types.h
+++ b/drivers/gpu/drm/i915/i915_scheduler_types.h
@@ -91,4 +91,67 @@ struct i915_dependency {
 				&(rq__)->sched.signalers_list, \
 				signal_link)
 
+struct i915_sched_engine {
+	struct kref ref;
+
+	/*
+	 * @lock: Protects the priority lists, the requests and hold lists,
+	 * and the tasklet while it is running.
+	 */
+	spinlock_t lock;
+
+	/* Execlist specific lists, needed here as protected by lock */
+	struct list_head requests;
+	struct list_head hold; /* ready requests, but on hold */
+
+	/**
+	 * @tasklet: softirq tasklet for bottom handler
+	 */
+	struct tasklet_struct tasklet;
+
+	/**
+	 * @default_priolist: priority list for I915_PRIORITY_NORMAL
+	 */
+	struct i915_priolist default_priolist;
+
+	/**
+	 * @queue_priority_hint: Highest pending priority.
+	 *
+	 * When we add requests into the queue, or adjust the priority of
+	 * executing requests, we compute the maximum priority of those
+	 * pending requests. We can then use this value to determine if
+	 * we need to preempt the executing requests to service the queue.
+	 * However, since we may have recorded the priority of an inflight
+	 * request we wanted to preempt but which has since completed, at the
+	 * time of dequeuing the priority hint may no longer match the highest
+	 * available request priority.
+	 */
+	int queue_priority_hint;
+
+	/**
+	 * @queue: queue of requests, in priority lists
+	 */
+	struct rb_root_cached queue;
+
+	/**
+	 * @no_priolist: priority lists disabled
+	 */
+	bool no_priolist;
+
+	/* Back pointer to engine */
+	struct intel_engine_cs *engine;
+
+	/* Kick backend */
+	void	(*kick_backend)(const struct i915_request *rq,
+				int prio);
+
+	/*
+	 * Call when the priority on a request has changed and it and its
+	 * dependencies may need rescheduling. Note the request itself may
+	 * not be ready to run!
+	 */
+	void	(*schedule)(struct i915_request *request,
+			    const struct i915_sched_attr *attr);
+};
+
 #endif /* _I915_SCHEDULER_TYPES_H_ */
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 33/97] drm/i915: Engine relative MMIO
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (31 preceding siblings ...)
  2021-05-06 19:13 ` [RFC PATCH 32/97] drm/i915: Introduce i915_sched_engine object Matthew Brost
@ 2021-05-06 19:13 ` Matthew Brost
  2021-05-25  9:05   ` [Intel-gfx] " Tvrtko Ursulin
  2021-05-06 19:13 ` [RFC PATCH 34/97] drm/i915/guc: Use guc_class instead of engine_class in fw interface Matthew Brost
                   ` (66 subsequent siblings)
  99 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:13 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

From: John Harrison <John.C.Harrison@Intel.com>

With virtual engines, it is no longer possible to know which specific
physical engine a given request will be executed on at the time that
request is generated. This means that the request itself must be engine
agnostic: any direct register writes must be relative to the engine
rather than using absolute addresses.

The LRI command has support for engine relative addressing. However,
the mechanism is not transparent to the driver. The scheme for Gen11
(MI_LRI_ADD_CS_MMIO_START) requires the LRI address to have no
absolute engine base component. The hardware then adds on the correct
engine offset at execution time.

Due to the non-trivial and differing schemes on different hardware, it
is not possible to simply update the code that creates the LRI
commands to set a remap flag and let the hardware get on with it.
Instead, this patch adds function wrappers for generating the LRI
command itself and then for constructing the correct address to use
with the LRI.
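
To illustrate the intended usage pattern (a minimal sketch only; the
emit_ppgtt_update() hunk below is the authoritative example, and the
helper name here is hypothetical rather than part of the patch):

	/*
	 * Sketch: emit an engine-agnostic register write. On gen11+
	 * (non-blitter) engines, lri_cmd_mode carries the relative-MMIO
	 * flag and lri_mmio_base is 0, so the CS adds the engine base at
	 * execution time; on older hardware the flag is 0 and
	 * lri_mmio_base is the absolute engine->mmio_base.
	 */
	static u32 *emit_relative_lri(struct intel_engine_cs *engine,
				      u32 *cs, u32 reg_offset, u32 value)
	{
		*cs++ = MI_LOAD_REGISTER_IMM(1) | engine->lri_cmd_mode;
		*cs++ = engine->lri_mmio_base + reg_offset;
		*cs++ = value;
		return cs;
	}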

Bspec: 45606
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
CC: Rodrigo Vivi <rodrigo.vivi@intel.com>
CC: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
CC: Chris P Wilson <chris.p.wilson@intel.com>
CC: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c  |  7 +++---
 drivers/gpu/drm/i915/gt/intel_engine_cs.c    | 25 ++++++++++++++++++++
 drivers/gpu/drm/i915/gt/intel_engine_types.h |  3 +++
 drivers/gpu/drm/i915/gt/intel_gpu_commands.h |  5 ++++
 drivers/gpu/drm/i915/i915_perf.c             |  6 +++++
 5 files changed, 43 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 188dee13e017..993faa213b41 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1211,7 +1211,7 @@ static int emit_ppgtt_update(struct i915_request *rq, void *data)
 {
 	struct i915_address_space *vm = rq->context->vm;
 	struct intel_engine_cs *engine = rq->engine;
-	u32 base = engine->mmio_base;
+	u32 base = engine->lri_mmio_base;
 	u32 *cs;
 	int i;
 
@@ -1223,7 +1223,7 @@ static int emit_ppgtt_update(struct i915_request *rq, void *data)
 		if (IS_ERR(cs))
 			return PTR_ERR(cs);
 
-		*cs++ = MI_LOAD_REGISTER_IMM(2);
+		*cs++ = MI_LOAD_REGISTER_IMM(2) | engine->lri_cmd_mode;
 
 		*cs++ = i915_mmio_reg_offset(GEN8_RING_PDP_UDW(base, 0));
 		*cs++ = upper_32_bits(pd_daddr);
@@ -1245,7 +1245,8 @@ static int emit_ppgtt_update(struct i915_request *rq, void *data)
 		if (IS_ERR(cs))
 			return PTR_ERR(cs);
 
-		*cs++ = MI_LOAD_REGISTER_IMM(2 * GEN8_3LVL_PDPES) | MI_LRI_FORCE_POSTED;
+		*cs++ = MI_LOAD_REGISTER_IMM(2 * GEN8_3LVL_PDPES) |
+			MI_LRI_FORCE_POSTED | engine->lri_cmd_mode;
 		for (i = GEN8_3LVL_PDPES; i--; ) {
 			const dma_addr_t pd_daddr = i915_page_dir_dma_addr(ppgtt, i);
 
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index ec82a7ec0c8d..c88b792c1ab5 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -16,6 +16,7 @@
 #include "intel_engine_pm.h"
 #include "intel_engine_user.h"
 #include "intel_execlists_submission.h"
+#include "intel_gpu_commands.h"
 #include "intel_gt.h"
 #include "intel_gt_requests.h"
 #include "intel_gt_pm.h"
@@ -223,6 +224,28 @@ static u32 __engine_mmio_base(struct drm_i915_private *i915,
 	return bases[i].base;
 }
 
+static bool i915_engine_has_relative_lri(const struct intel_engine_cs *engine)
+{
+	if (INTEL_GEN(engine->i915) < 11)
+		return false;
+
+	if (engine->class == COPY_ENGINE_CLASS)
+		return false;
+
+	return true;
+}
+
+static void lri_init(struct intel_engine_cs *engine)
+{
+	if (i915_engine_has_relative_lri(engine)) {
+		engine->lri_cmd_mode = MI_LRI_LRM_CS_MMIO;
+		engine->lri_mmio_base = 0;
+	} else {
+		engine->lri_cmd_mode = 0;
+		engine->lri_mmio_base = engine->mmio_base;
+	}
+}
+
 static void __sprint_engine_name(struct intel_engine_cs *engine)
 {
 	/*
@@ -327,6 +350,8 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id)
 	if (engine->context_size)
 		DRIVER_CAPS(i915)->has_logical_contexts = true;
 
+	lri_init(engine);
+
 	ewma__engine_latency_init(&engine->latency);
 	seqcount_init(&engine->stats.lock);
 
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 93aa22680db0..86302e6d86b2 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -281,6 +281,9 @@ struct intel_engine_cs {
 	u32 context_size;
 	u32 mmio_base;
 
+	u32 lri_mmio_base;
+	u32 lri_cmd_mode;
+
 	/*
 	 * Some w/a require forcewake to be held (which prevents RC6) while
 	 * a particular engine is active. If so, we set fw_domain to which
diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
index 14e2ffb6c0e5..887d59897bc2 100644
--- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
+++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
@@ -134,6 +134,11 @@
  *   simply ignores the register load under certain conditions.
  * - One can actually load arbitrary many arbitrary registers: Simply issue x
  *   address/value pairs. Don't overdue it, though, x <= 2^4 must hold!
+ * - Newer hardware supports engine relative addressing but older hardware does
+ *   not. This is required for hw engine load balancing. Hence the MI_LRI
+ *   instruction itself is prefixed with '__' and should only be used on
+ *   legacy hardware code paths. Generic code must always use the MI_LRI
+ *   and i915_get_lri_reg() helper functions instead.
  */
 #define MI_LOAD_REGISTER_IMM(x)	MI_INSTR(0x22, 2*(x)-1)
 /* Gen11+. addr = base + (ctx_restore ? offset & GENMASK(12,2) : offset) */
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 66f1f25119b5..b9cc3f0a616f 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -2118,6 +2118,11 @@ gen8_update_reg_state_unlocked(const struct intel_context *ce,
 	u32 *reg_state = ce->lrc_reg_state;
 	int i;
 
+	/*
+	 * NB: The LRI instruction is generated by the hardware.
+	 * Should we read it in and assert that the offset flag is set?
+	 */
+
 	reg_state[ctx_oactxctrl + 1] =
 		(stream->period_exponent << GEN8_OA_TIMER_PERIOD_SHIFT) |
 		(stream->periodic ? GEN8_OA_TIMER_ENABLE : 0) |
@@ -2174,6 +2179,7 @@ gen8_load_flex(struct i915_request *rq,
 
 	*cs++ = MI_LOAD_REGISTER_IMM(count);
 	do {
+		/* FIXME: Is this table LRI remap/offset friendly? */
 		*cs++ = i915_mmio_reg_offset(flex->reg);
 		*cs++ = flex->value;
 	} while (flex++, --count);
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 34/97] drm/i915/guc: Use guc_class instead of engine_class in fw interface
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (32 preceding siblings ...)
  2021-05-06 19:13 ` [RFC PATCH 33/97] drm/i915: Engine relative MMIO Matthew Brost
@ 2021-05-06 19:13 ` Matthew Brost
  2021-05-26 20:41   ` Matthew Brost
  2021-05-06 19:13 ` [RFC PATCH 35/97] drm/i915/guc: Improve error message for unsolicited CT response Matthew Brost
                   ` (65 subsequent siblings)
  99 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:13 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

From: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

GuC has its own defines for the engine classes. They currently map 1:1
to the defines used by the driver, but there is no guarantee
this will continue in the future. Given that we've been caught off-guard
in the past by similar divergences, we can prepare for the changes by
introducing helper functions to convert from engine class to GuC class and
back again.
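
As a rough sketch of the shape of those helpers (the actual additions to
intel_guc_fwif.h are truncated in the quoted diff below, so treat this as
illustrative rather than the final implementation):

	static inline u8 engine_class_to_guc_class(u8 class)
	{
		/* The mapping happens to be 1:1 today; make that explicit. */
		BUILD_BUG_ON(GUC_RENDER_CLASS != RENDER_CLASS);
		BUILD_BUG_ON(GUC_BLITTER_CLASS != COPY_ENGINE_CLASS);
		BUILD_BUG_ON(GUC_VIDEO_CLASS != VIDEO_DECODE_CLASS);
		BUILD_BUG_ON(GUC_VIDEOENHANCE_CLASS != VIDEO_ENHANCEMENT_CLASS);

		return class;
	}

	static inline u8 guc_class_to_engine_class(u8 guc_class)
	{
		/* Same 1:1 mapping in the other direction, for now. */
		return guc_class;
	}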

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_engine_cs.c   |  6 +++--
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c  | 20 +++++++++-------
 drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h | 26 +++++++++++++++++++++
 3 files changed, 42 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index c88b792c1ab5..7866ff0c2673 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -289,6 +289,7 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id)
 	const struct engine_info *info = &intel_engines[id];
 	struct drm_i915_private *i915 = gt->i915;
 	struct intel_engine_cs *engine;
+	u8 guc_class;
 
 	BUILD_BUG_ON(MAX_ENGINE_CLASS >= BIT(GEN11_ENGINE_CLASS_WIDTH));
 	BUILD_BUG_ON(MAX_ENGINE_INSTANCE >= BIT(GEN11_ENGINE_INSTANCE_WIDTH));
@@ -317,9 +318,10 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id)
 	engine->i915 = i915;
 	engine->gt = gt;
 	engine->uncore = gt->uncore;
-	engine->mmio_base = __engine_mmio_base(i915, info->mmio_bases);
 	engine->hw_id = info->hw_id;
-	engine->guc_id = MAKE_GUC_ID(info->class, info->instance);
+	guc_class = engine_class_to_guc_class(info->class);
+	engine->guc_id = MAKE_GUC_ID(guc_class, info->instance);
+	engine->mmio_base = __engine_mmio_base(i915, info->mmio_bases);
 
 	engine->irq_handler = nop_irq_handler;
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
index 775f00d706fa..ecd18531b40a 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
@@ -6,6 +6,7 @@
 #include "gt/intel_gt.h"
 #include "gt/intel_lrc.h"
 #include "intel_guc_ads.h"
+#include "intel_guc_fwif.h"
 #include "intel_uc.h"
 #include "i915_drv.h"
 
@@ -78,7 +79,7 @@ static void guc_mapping_table_init(struct intel_gt *gt,
 				GUC_MAX_INSTANCES_PER_CLASS;
 
 	for_each_engine(engine, gt, id) {
-		u8 guc_class = engine->class;
+		u8 guc_class = engine_class_to_guc_class(engine->class);
 
 		system_info->mapping_table[guc_class][engine->instance] =
 			engine->instance;
@@ -98,7 +99,7 @@ static void __guc_ads_init(struct intel_guc *guc)
 	struct __guc_ads_blob *blob = guc->ads_blob;
 	const u32 skipped_size = LRC_PPHWSP_SZ * PAGE_SIZE + LR_HW_CONTEXT_SIZE;
 	u32 base;
-	u8 engine_class;
+	u8 engine_class, guc_class;
 
 	/* GuC scheduling policies */
 	guc_policies_init(&blob->policies);
@@ -114,22 +115,25 @@ static void __guc_ads_init(struct intel_guc *guc)
 	for (engine_class = 0; engine_class <= MAX_ENGINE_CLASS; ++engine_class) {
 		if (engine_class == OTHER_CLASS)
 			continue;
+
+		guc_class = engine_class_to_guc_class(engine_class);
+
 		/*
 		 * TODO: Set context pointer to default state to allow
 		 * GuC to re-init guilty contexts after internal reset.
 		 */
-		blob->ads.golden_context_lrca[engine_class] = 0;
-		blob->ads.eng_state_size[engine_class] =
+		blob->ads.golden_context_lrca[guc_class] = 0;
+		blob->ads.eng_state_size[guc_class] =
 			intel_engine_context_size(guc_to_gt(guc),
 						  engine_class) -
 			skipped_size;
 	}
 
 	/* System info */
-	blob->system_info.engine_enabled_masks[RENDER_CLASS] = 1;
-	blob->system_info.engine_enabled_masks[COPY_ENGINE_CLASS] = 1;
-	blob->system_info.engine_enabled_masks[VIDEO_DECODE_CLASS] = VDBOX_MASK(gt);
-	blob->system_info.engine_enabled_masks[VIDEO_ENHANCEMENT_CLASS] = VEBOX_MASK(gt);
+	blob->system_info.engine_enabled_masks[GUC_RENDER_CLASS] = 1;
+	blob->system_info.engine_enabled_masks[GUC_BLITTER_CLASS] = 1;
+	blob->system_info.engine_enabled_masks[GUC_VIDEO_CLASS] = VDBOX_MASK(gt);
+	blob->system_info.engine_enabled_masks[GUC_VIDEOENHANCE_CLASS] = VEBOX_MASK(gt);
 
 	blob->system_info.generic_gt_sysinfo[GUC_GENERIC_GT_SYSINFO_SLICE_ENABLED] =
 		hweight8(gt->info.sseu.slice_mask);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
index 301b173a26bc..558cfe168cb7 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
@@ -15,6 +15,7 @@
 #include "abi/guc_communication_mmio_abi.h"
 #include "abi/guc_communication_ctb_abi.h"
 #include "abi/guc_messages_abi.h"
+#include "gt/intel_engine_types.h"
 
 #define GUC_CLIENT_PRIORITY_KMD_HIGH	0
 #define GUC_CLIENT_PRIORITY_HIGH	1
@@ -32,6 +33,12 @@
 #define GUC_VIDEO_ENGINE2		4
 #define GUC_MAX_ENGINES_NUM		(GUC_VIDEO_ENGINE2 + 1)
 
+#define GUC_RENDER_CLASS		0
+#define GUC_VIDEO_CLASS			1
+#define GUC_VIDEOENHANCE_CLASS		2
+#define GUC_BLITTER_CLASS		3
+#define GUC_RESERVED_CLASS		4
+#define GUC_LAST_ENGINE_CLASS		GUC_RESERVED_CLASS
 #define GUC_MAX_ENGINE_CLASSES		16
 #define GUC_MAX_INSTANCES_PER_CLASS	32
 
@@ -129,6 +136,25 @@
 #define GUC_ID_TO_ENGINE_INSTANCE(guc_id) \
 	(((guc_id) & GUC_ENGINE_INSTANCE_MASK) >> GUC_ENGINE_INSTANCE_SHIFT)
 
+static inline u8 engine_class_to_guc_class(u8 class)
+{
+	BUILD_BUG_ON(GUC_RENDER_CLASS != RENDER_CLASS);
+	BUILD_BUG_ON(GUC_BLITTER_CLASS != COPY_ENGINE_CLASS);
+	BUILD_BUG_ON(GUC_VIDEO_CLASS != VIDEO_DECODE_CLASS);
+	BUILD_BUG_ON(GUC_VIDEOENHANCE_CLASS != VIDEO_ENHANCEMENT_CLASS);
+	GEM_BUG_ON(class > MAX_ENGINE_CLASS || class == OTHER_CLASS);
+
+	return class;
+}
+
+static inline u8 guc_class_to_engine_class(u8 guc_class)
+{
+	GEM_BUG_ON(guc_class > GUC_LAST_ENGINE_CLASS);
+	GEM_BUG_ON(guc_class == GUC_RESERVED_CLASS);
+
+	return guc_class;
+}
+
 /* Work item for submitting workloads into work queue of GuC. */
 struct guc_wq_item {
 	u32 header;
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 35/97] drm/i915/guc: Improve error message for unsolicited CT response
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (33 preceding siblings ...)
  2021-05-06 19:13 ` [RFC PATCH 34/97] drm/i915/guc: Use guc_class instead of engine_class in fw interface Matthew Brost
@ 2021-05-06 19:13 ` Matthew Brost
  2021-05-24 11:59   ` Michal Wajdeczko
  2021-05-06 19:13 ` [RFC PATCH 36/97] drm/i915/guc: Add non blocking CTB send function Matthew Brost
                   ` (64 subsequent siblings)
  99 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:13 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

Improve the error message when an unsolicited CT response is received by
printing the fence that couldn't be found, the last fence, and all
requests still awaiting a response.
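
With the format strings below, losing track of the response for fence 3
while fences 4 and 5 are still pending would log roughly the following
(values illustrative only, CT_ERROR prefix omitted):

    Unsolicited response (fence 3)
    Could not find fence=3, last_fence=5
    request 4 awaits response
    request 5 awaits response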

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 217ab3ebd1af..a76603537fa8 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -703,12 +703,16 @@ static int ct_handle_response(struct intel_guc_ct *ct, struct ct_incoming_msg *r
 		found = true;
 		break;
 	}
-	spin_unlock_irqrestore(&ct->requests.lock, flags);
-
 	if (!found) {
 		CT_ERROR(ct, "Unsolicited response (fence %u)\n", fence);
-		return -ENOKEY;
+		CT_ERROR(ct, "Could not find fence=%u, last_fence=%u\n", fence,
+			 ct->requests.last_fence);
+		list_for_each_entry(req, &ct->requests.pending, link)
+			CT_ERROR(ct, "request %u awaits response\n",
+				 req->fence);
+		err = -ENOKEY;
 	}
+	spin_unlock_irqrestore(&ct->requests.lock, flags);
 
 	if (unlikely(err))
 		return err;
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 36/97] drm/i915/guc: Add non blocking CTB send function
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (34 preceding siblings ...)
  2021-05-06 19:13 ` [RFC PATCH 35/97] drm/i915/guc: Improve error message for unsolicited CT response Matthew Brost
@ 2021-05-06 19:13 ` Matthew Brost
  2021-05-24 12:21   ` Michal Wajdeczko
  2021-05-25  9:21   ` [Intel-gfx] " Tvrtko Ursulin
  2021-05-06 19:13 ` [RFC PATCH 37/97] drm/i915/guc: Add stall timer to " Matthew Brost
                   ` (63 subsequent siblings)
  99 siblings, 2 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:13 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

Add a non-blocking CTB send function, intel_guc_send_nb. In order to
support a non-blocking CTB send function, a spin lock is needed to
protect the CTB descriptor's fields. Also, the non-blocking call must not
update the fence value as this value is owned by the blocking call
(intel_guc_send).

The blocking CTB send now must have a flow control mechanism to ensure
the buffer isn't overrun. A lazy spin wait is used as we believe the flow
control condition should be rare with a properly sized buffer.

The function, intel_guc_send_nb, is exported in this patch but unused.
Several patches later in the series make use of this function.
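
As an illustration of the intended usage (not part of this patch; real
callers appear later in the series), a submitter fires an H2G event and
moves on without waiting for a G2H reply, deciding itself what to do if
the message cannot be queued. The action contents are elided here:

    int err;
    u32 action[2];  /* H2G action opcode + payload, contents elided */

    err = intel_guc_send_nb(guc, action, ARRAY_SIZE(action));
    if (unlikely(err))
        return err; /* no CTB space or CTB broken: caller retries or fails */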

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc.h    | 12 ++-
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 96 +++++++++++++++++++++--
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  7 +-
 3 files changed, 105 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index c20f3839de12..4c0a367e41d8 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -75,7 +75,15 @@ static inline struct intel_guc *log_to_guc(struct intel_guc_log *log)
 static
 inline int intel_guc_send(struct intel_guc *guc, const u32 *action, u32 len)
 {
-	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0);
+	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0, 0);
+}
+
+#define INTEL_GUC_SEND_NB		BIT(31)
+static
+inline int intel_guc_send_nb(struct intel_guc *guc, const u32 *action, u32 len)
+{
+	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0,
+				 INTEL_GUC_SEND_NB);
 }
 
 static inline int
@@ -83,7 +91,7 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len,
 			   u32 *response_buf, u32 response_buf_size)
 {
 	return intel_guc_ct_send(&guc->ct, action, len,
-				 response_buf, response_buf_size);
+				 response_buf, response_buf_size, 0);
 }
 
 static inline void intel_guc_to_host_event_handler(struct intel_guc *guc)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index a76603537fa8..af7314d45a78 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -3,6 +3,11 @@
  * Copyright © 2016-2019 Intel Corporation
  */
 
+#include <linux/circ_buf.h>
+#include <linux/ktime.h>
+#include <linux/time64.h>
+#include <linux/timekeeping.h>
+
 #include "i915_drv.h"
 #include "intel_guc_ct.h"
 #include "gt/intel_gt.h"
@@ -308,6 +313,7 @@ int intel_guc_ct_enable(struct intel_guc_ct *ct)
 	if (unlikely(err))
 		goto err_deregister;
 
+	ct->requests.last_fence = 1;
 	ct->enabled = true;
 
 	return 0;
@@ -343,10 +349,22 @@ static u32 ct_get_next_fence(struct intel_guc_ct *ct)
 	return ++ct->requests.last_fence;
 }
 
+static void write_barrier(struct intel_guc_ct *ct) {
+	struct intel_guc *guc = ct_to_guc(ct);
+	struct intel_gt *gt = guc_to_gt(guc);
+
+	if (i915_gem_object_is_lmem(guc->ct.vma->obj)) {
+		GEM_BUG_ON(guc->send_regs.fw_domains);
+		intel_uncore_write_fw(gt->uncore, GEN11_SOFT_SCRATCH(0), 0);
+	} else {
+		wmb();
+	}
+}
+
 static int ct_write(struct intel_guc_ct *ct,
 		    const u32 *action,
 		    u32 len /* in dwords */,
-		    u32 fence)
+		    u32 fence, u32 flags)
 {
 	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
 	struct guc_ct_buffer_desc *desc = ctb->desc;
@@ -393,9 +411,13 @@ static int ct_write(struct intel_guc_ct *ct,
 		 FIELD_PREP(GUC_CTB_MSG_0_NUM_DWORDS, len) |
 		 FIELD_PREP(GUC_CTB_MSG_0_FENCE, fence);
 
-	hxg = FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
-	      FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
-			 GUC_HXG_REQUEST_MSG_0_DATA0, action[0]);
+	hxg = (flags & INTEL_GUC_SEND_NB) ?
+		(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_EVENT) |
+		 FIELD_PREP(GUC_HXG_EVENT_MSG_0_ACTION |
+			    GUC_HXG_EVENT_MSG_0_DATA0, action[0])) :
+		(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
+		 FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
+			    GUC_HXG_REQUEST_MSG_0_DATA0, action[0]));
 
 	CT_DEBUG(ct, "writing (tail %u) %*ph %*ph %*ph\n",
 		 tail, 4, &header, 4, &hxg, 4 * (len - 1), &action[1]);
@@ -412,6 +434,12 @@ static int ct_write(struct intel_guc_ct *ct,
 	}
 	GEM_BUG_ON(tail > size);
 
+	/*
+	 * Make sure the H2G buffer update and LRC tail update (if this is
+	 * triggering a submission) are visible before updating the descriptor tail
+	 */
+	write_barrier(ct);
+
 	/* now update descriptor */
 	WRITE_ONCE(desc->tail, tail);
 
@@ -466,6 +494,46 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
 	return err;
 }
 
+static inline bool ctb_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
+{
+	struct guc_ct_buffer_desc *desc = ctb->desc;
+	u32 head = READ_ONCE(desc->head);
+	u32 space;
+
+	space = CIRC_SPACE(desc->tail, head, ctb->size);
+
+	return space >= len_dw;
+}
+
+static int ct_send_nb(struct intel_guc_ct *ct,
+		      const u32 *action,
+		      u32 len,
+		      u32 flags)
+{
+	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
+	unsigned long spin_flags;
+	u32 fence;
+	int ret;
+
+	spin_lock_irqsave(&ctb->lock, spin_flags);
+
+	ret = ctb_has_room(ctb, len + 1);
+	if (unlikely(ret))
+		goto out;
+
+	fence = ct_get_next_fence(ct);
+	ret = ct_write(ct, action, len, fence, flags);
+	if (unlikely(ret))
+		goto out;
+
+	intel_guc_notify(ct_to_guc(ct));
+
+out:
+	spin_unlock_irqrestore(&ctb->lock, spin_flags);
+
+	return ret;
+}
+
 static int ct_send(struct intel_guc_ct *ct,
 		   const u32 *action,
 		   u32 len,
@@ -473,6 +541,7 @@ static int ct_send(struct intel_guc_ct *ct,
 		   u32 response_buf_size,
 		   u32 *status)
 {
+	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
 	struct ct_request request;
 	unsigned long flags;
 	u32 fence;
@@ -482,8 +551,20 @@ static int ct_send(struct intel_guc_ct *ct,
 	GEM_BUG_ON(!len);
 	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
 	GEM_BUG_ON(!response_buf && response_buf_size);
+	might_sleep();
 
+	/*
+	 * We use a lazy spin wait loop here as we believe that if the CT
+	 * buffers are sized correctly the flow control condition should be
+	 * rare.
+	 */
+retry:
 	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
+	if (unlikely(!ctb_has_room(ctb, len + 1))) {
+		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
+		cond_resched();
+		goto retry;
+	}
 
 	fence = ct_get_next_fence(ct);
 	request.fence = fence;
@@ -495,7 +576,7 @@ static int ct_send(struct intel_guc_ct *ct,
 	list_add_tail(&request.link, &ct->requests.pending);
 	spin_unlock(&ct->requests.lock);
 
-	err = ct_write(ct, action, len, fence);
+	err = ct_write(ct, action, len, fence, 0);
 
 	spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
 
@@ -537,7 +618,7 @@ static int ct_send(struct intel_guc_ct *ct,
  * Command Transport (CT) buffer based GuC send function.
  */
 int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
-		      u32 *response_buf, u32 response_buf_size)
+		      u32 *response_buf, u32 response_buf_size, u32 flags)
 {
 	u32 status = ~0; /* undefined */
 	int ret;
@@ -547,6 +628,9 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
 		return -ENODEV;
 	}
 
+	if (flags & INTEL_GUC_SEND_NB)
+		return ct_send_nb(ct, action, len, flags);
+
 	ret = ct_send(ct, action, len, response_buf, response_buf_size, &status);
 	if (unlikely(ret < 0)) {
 		CT_ERROR(ct, "Sending action %#x failed (err=%d status=%#X)\n",
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
index 1ae2dde6db93..55ef7c52472f 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
@@ -9,6 +9,7 @@
 #include <linux/interrupt.h>
 #include <linux/spinlock.h>
 #include <linux/workqueue.h>
+#include <linux/ktime.h>
 
 #include "intel_guc_fwif.h"
 
@@ -42,7 +43,6 @@ struct intel_guc_ct_buffer {
 	bool broken;
 };
 
-
 /** Top-level structure for Command Transport related data
  *
  * Includes a pair of CT buffers for bi-directional communication and tracking
@@ -69,6 +69,9 @@ struct intel_guc_ct {
 		struct list_head incoming; /* incoming requests */
 		struct work_struct worker; /* handler for incoming requests */
 	} requests;
+
+	/** @stall_time: time at which a CTB submission first stalled */
+	ktime_t stall_time;
 };
 
 void intel_guc_ct_init_early(struct intel_guc_ct *ct);
@@ -88,7 +91,7 @@ static inline bool intel_guc_ct_enabled(struct intel_guc_ct *ct)
 }
 
 int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
-		      u32 *response_buf, u32 response_buf_size);
+		      u32 *response_buf, u32 response_buf_size, u32 flags);
 void intel_guc_ct_event_handler(struct intel_guc_ct *ct);
 
 #endif /* _INTEL_GUC_CT_H_ */
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 37/97] drm/i915/guc: Add stall timer to non blocking CTB send function
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (35 preceding siblings ...)
  2021-05-06 19:13 ` [RFC PATCH 36/97] drm/i915/guc: Add non blocking CTB send function Matthew Brost
@ 2021-05-06 19:13 ` Matthew Brost
  2021-05-24 12:58   ` Michal Wajdeczko
  2021-05-06 19:13 ` [RFC PATCH 38/97] drm/i915/guc: Optimize CTB writes and reads Matthew Brost
                   ` (62 subsequent siblings)
  99 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:13 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

Implement a stall timer which fails H2G CTB sends once a period of time
with no forward progress has elapsed, to prevent deadlock.

Also update ct_write to return -EDEADLK rather than -EPIPE on a
corrupted descriptor.
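
A rough sketch of how a non-blocking caller is expected to react to the
two error codes (illustrative fragment only; the real consumers come
later in the series):

    err = intel_guc_send_nb(guc, action, len);
    if (err == -EBUSY) {
        /* no space yet but no deadlock detected: try again later */
    } else if (err == -EDEADLK) {
        /* no forward progress for MAX_US_STALL_CTB: give up, CTB is dead */
    }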

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 48 +++++++++++++++++++++--
 1 file changed, 45 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index af7314d45a78..4eab319d61be 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -69,6 +69,8 @@ static inline struct drm_device *ct_to_drm(struct intel_guc_ct *ct)
 #define CTB_H2G_BUFFER_SIZE	(SZ_4K)
 #define CTB_G2H_BUFFER_SIZE	(SZ_4K)
 
+#define MAX_US_STALL_CTB	1000000
+
 struct ct_request {
 	struct list_head link;
 	u32 fence;
@@ -315,6 +317,7 @@ int intel_guc_ct_enable(struct intel_guc_ct *ct)
 
 	ct->requests.last_fence = 1;
 	ct->enabled = true;
+	ct->stall_time = KTIME_MAX;
 
 	return 0;
 
@@ -378,7 +381,7 @@ static int ct_write(struct intel_guc_ct *ct,
 	unsigned int i;
 
 	if (unlikely(ctb->broken))
-		return -EPIPE;
+		return -EDEADLK;
 
 	if (unlikely(desc->status))
 		goto corrupted;
@@ -449,7 +452,7 @@ static int ct_write(struct intel_guc_ct *ct,
 	CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u status=%#x\n",
 		 desc->head, desc->tail, desc->status);
 	ctb->broken = true;
-	return -EPIPE;
+	return -EDEADLK;
 }
 
 /**
@@ -494,6 +497,17 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
 	return err;
 }
 
+static inline bool ct_deadlocked(struct intel_guc_ct *ct)
+{
+	bool ret = ktime_us_delta(ktime_get(), ct->stall_time) >
+		MAX_US_STALL_CTB;
+
+	if (unlikely(ret))
+		CT_ERROR(ct, "CT deadlocked\n");
+
+	return ret;
+}
+
 static inline bool ctb_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
 {
 	struct guc_ct_buffer_desc *desc = ctb->desc;
@@ -505,6 +519,26 @@ static inline bool ctb_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
 	return space >= len_dw;
 }
 
+static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
+{
+	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
+
+	lockdep_assert_held(&ct->ctbs.send.lock);
+
+	if (unlikely(!ctb_has_room(ctb, len_dw))) {
+		if (ct->stall_time == KTIME_MAX)
+			ct->stall_time = ktime_get();
+
+		if (unlikely(ct_deadlocked(ct)))
+			return -EDEADLK;
+		else
+			return -EBUSY;
+	}
+
+	ct->stall_time = KTIME_MAX;
+	return 0;
+}
+
 static int ct_send_nb(struct intel_guc_ct *ct,
 		      const u32 *action,
 		      u32 len,
@@ -517,7 +551,7 @@ static int ct_send_nb(struct intel_guc_ct *ct,
 
 	spin_lock_irqsave(&ctb->lock, spin_flags);
 
-	ret = ctb_has_room(ctb, len + 1);
+	ret = has_room_nb(ct, len + 1);
 	if (unlikely(ret))
 		goto out;
 
@@ -561,11 +595,19 @@ static int ct_send(struct intel_guc_ct *ct,
 retry:
 	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
 	if (unlikely(!ctb_has_room(ctb, len + 1))) {
+		if (ct->stall_time == KTIME_MAX)
+			ct->stall_time = ktime_get();
 		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
+
+		if (unlikely(ct_deadlocked(ct)))
+			return -EDEADLK;
+
 		cond_resched();
 		goto retry;
 	}
 
+	ct->stall_time = KTIME_MAX;
+
 	fence = ct_get_next_fence(ct);
 	request.fence = fence;
 	request.status = 0;
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 38/97] drm/i915/guc: Optimize CTB writes and reads
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (36 preceding siblings ...)
  2021-05-06 19:13 ` [RFC PATCH 37/97] drm/i915/guc: Add stall timer to " Matthew Brost
@ 2021-05-06 19:13 ` Matthew Brost
  2021-05-24 13:31   ` Michal Wajdeczko
  2021-05-06 19:13 ` [RFC PATCH 39/97] drm/i915/guc: Increase size of CTB buffers Matthew Brost
                   ` (61 subsequent siblings)
  99 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:13 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

CTB writes are now in the path of command submission and should be
optimized for performance. Rather than reading CTB descriptor values
(e.g. head, tail, size) which could result in accesses across the PCIe
bus, store local shadow copies and only read/write the descriptor
values when absolutely necessary.
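
The accounting pattern, shown in isolation (a sketch of the idea rather
than the driver code; 'needed' stands for the message length in dwords
plus the header): the producer tracks its own tail and a pessimistic
space count, decrements the count on every write, and only re-reads the
descriptor head across the bus when the cached space looks too small:

    if (ctb->space < needed) {
        u32 head = READ_ONCE(ctb->desc->head);      /* PCIe read, now rare */

        ctb->space = CIRC_SPACE(ctb->tail, head, ctb->size);
        if (ctb->space < needed)
            return false;                           /* genuinely full */
    }
    /* ... write 'needed' dwords at ctb->tail ... */
    ctb->space -= needed;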

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 78 +++++++++++++----------
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  6 ++
 2 files changed, 52 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 4eab319d61be..77dfbc94dcc3 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -127,6 +127,10 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc)
 static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb)
 {
 	ctb->broken = false;
+	ctb->tail = 0;
+	ctb->head = 0;
+	ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
+
 	guc_ct_buffer_desc_init(ctb->desc);
 }
 
@@ -371,10 +375,8 @@ static int ct_write(struct intel_guc_ct *ct,
 {
 	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
 	struct guc_ct_buffer_desc *desc = ctb->desc;
-	u32 head = desc->head;
-	u32 tail = desc->tail;
+	u32 tail = ctb->tail;
 	u32 size = ctb->size;
-	u32 used;
 	u32 header;
 	u32 hxg;
 	u32 *cmds = ctb->cmds;
@@ -386,25 +388,14 @@ static int ct_write(struct intel_guc_ct *ct,
 	if (unlikely(desc->status))
 		goto corrupted;
 
-	if (unlikely((tail | head) >= size)) {
+#ifdef CONFIG_DRM_I915_DEBUG_GUC
+	if (unlikely((desc->tail | desc->head) >= size)) {
 		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
-			 head, tail, size);
+			 desc->head, desc->tail, size);
 		desc->status |= GUC_CTB_STATUS_OVERFLOW;
 		goto corrupted;
 	}
-
-	/*
-	 * tail == head condition indicates empty. GuC FW does not support
-	 * using up the entire buffer to get tail == head meaning full.
-	 */
-	if (tail < head)
-		used = (size - head) + tail;
-	else
-		used = tail - head;
-
-	/* make sure there is a space including extra dw for the fence */
-	if (unlikely(used + len + 1 >= size))
-		return -ENOSPC;
+#endif
 
 	/*
 	 * dw0: CT header (including fence)
@@ -444,7 +435,9 @@ static int ct_write(struct intel_guc_ct *ct,
 	write_barrier(ct);
 
 	/* now update descriptor */
+	ctb->tail = tail;
 	WRITE_ONCE(desc->tail, tail);
+	ctb->space -= len + 1;
 
 	return 0;
 
@@ -460,7 +453,7 @@ static int ct_write(struct intel_guc_ct *ct,
  * @req:	pointer to pending request
  * @status:	placeholder for status
  *
- * For each sent request, Guc shall send bac CT response message.
+ * For each sent request, GuC shall send back CT response message.
  * Our message handler will update status of tracked request once
  * response message with given fence is received. Wait here and
  * check for valid response status value.
@@ -508,24 +501,35 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct)
 	return ret;
 }
 
-static inline bool ctb_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
+static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw)
 {
-	struct guc_ct_buffer_desc *desc = ctb->desc;
-	u32 head = READ_ONCE(desc->head);
+	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
+	u32 head;
 	u32 space;
 
-	space = CIRC_SPACE(desc->tail, head, ctb->size);
+	if (ctb->space >= len_dw)
+		return true;
+
+	head = READ_ONCE(ctb->desc->head);
+	if (unlikely(head > ctb->size)) {
+		CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u size=%u\n",
+			  ctb->desc->head, ctb->desc->tail, ctb->size);
+		ctb->desc->status |= GUC_CTB_STATUS_OVERFLOW;
+		ctb->broken = true;
+		return false;
+	}
+
+	space = CIRC_SPACE(ctb->tail, head, ctb->size);
+	ctb->space = space;
 
 	return space >= len_dw;
 }
 
 static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
 {
-	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
-
 	lockdep_assert_held(&ct->ctbs.send.lock);
 
-	if (unlikely(!ctb_has_room(ctb, len_dw))) {
+	if (unlikely(!h2g_has_room(ct, len_dw))) {
 		if (ct->stall_time == KTIME_MAX)
 			ct->stall_time = ktime_get();
 
@@ -593,11 +597,11 @@ static int ct_send(struct intel_guc_ct *ct,
 	 * rare.
 	 */
 retry:
-	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
-	if (unlikely(!ctb_has_room(ctb, len + 1))) {
+	spin_lock_irqsave(&ctb->lock, flags);
+	if (unlikely(!h2g_has_room(ct, len + 1))) {
 		if (ct->stall_time == KTIME_MAX)
 			ct->stall_time = ktime_get();
-		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
+		spin_unlock_irqrestore(&ctb->lock, flags);
 
 		if (unlikely(ct_deadlocked(ct)))
 			return -EDEADLK;
@@ -620,7 +624,7 @@ static int ct_send(struct intel_guc_ct *ct,
 
 	err = ct_write(ct, action, len, fence, 0);
 
-	spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
+	spin_unlock_irqrestore(&ctb->lock, flags);
 
 	if (unlikely(err))
 		goto unlink;
@@ -708,7 +712,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 {
 	struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv;
 	struct guc_ct_buffer_desc *desc = ctb->desc;
-	u32 head = desc->head;
+	u32 head = ctb->head;
 	u32 tail = desc->tail;
 	u32 size = ctb->size;
 	u32 *cmds = ctb->cmds;
@@ -723,12 +727,21 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 	if (unlikely(desc->status))
 		goto corrupted;
 
-	if (unlikely((tail | head) >= size)) {
+#ifdef CONFIG_DRM_I915_DEBUG_GUC
+	if (unlikely((desc->tail | desc->head) >= size)) {
 		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
 			 head, tail, size);
 		desc->status |= GUC_CTB_STATUS_OVERFLOW;
 		goto corrupted;
 	}
+#else
+	if (unlikely((tail | ctb->head) >= size)) {
+		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
+			 head, tail, size);
+		desc->status |= GUC_CTB_STATUS_OVERFLOW;
+		goto corrupted;
+	}
+#endif
 
 	/* tail == head condition indicates empty */
 	available = tail - head;
@@ -778,6 +791,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 	}
 	CT_DEBUG(ct, "received %*ph\n", 4 * len, (*msg)->msg);
 
+	ctb->head = head;
 	/* now update descriptor */
 	WRITE_ONCE(desc->head, head);
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
index 55ef7c52472f..9924335e2ee6 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
@@ -33,6 +33,9 @@ struct intel_guc;
  * @desc: pointer to the buffer descriptor
  * @cmds: pointer to the commands buffer
  * @size: size of the commands buffer in dwords
+ * @head: local shadow copy of head in dwords
+ * @tail: local shadow copy of tail in dwords
+ * @space: local shadow copy of space in dwords
  * @broken: flag to indicate if descriptor data is broken
  */
 struct intel_guc_ct_buffer {
@@ -40,6 +43,9 @@ struct intel_guc_ct_buffer {
 	struct guc_ct_buffer_desc *desc;
 	u32 *cmds;
 	u32 size;
+	u32 tail;
+	u32 head;
+	u32 space;
 	bool broken;
 };
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 39/97] drm/i915/guc: Increase size of CTB buffers
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (37 preceding siblings ...)
  2021-05-06 19:13 ` [RFC PATCH 38/97] drm/i915/guc: Optimize CTB writes and reads Matthew Brost
@ 2021-05-06 19:13 ` Matthew Brost
  2021-05-24 13:43   ` [Intel-gfx] " Michal Wajdeczko
  2021-05-25  9:24   ` Tvrtko Ursulin
  2021-05-06 19:13 ` [RFC PATCH 40/97] drm/i915/guc: Module load failure test for CT buffer creation Matthew Brost
                   ` (60 subsequent siblings)
  99 siblings, 2 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:13 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

With the introduction of non-blocking CTBs more than one CTB can be in
flight at a time. Increasing the size of the CTBs should reduce how
often software hits the case where no space is available in the CTB
buffer.
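
Back-of-the-envelope numbers behind the chosen sizes (per the comment
added below): a 4K H2G buffer at a minimum of 16 bytes per request
allows roughly 4096 / 16 = 256 requests in flight, and the G2H buffer is
made 4x that (16K) so that every queued request has room for a response.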

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 77dfbc94dcc3..d6895d29ed2d 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -63,11 +63,16 @@ static inline struct drm_device *ct_to_drm(struct intel_guc_ct *ct)
  *      +--------+-----------------------------------------------+------+
  *
  * Size of each `CT Buffer`_ must be multiple of 4K.
- * As we don't expect too many messages, for now use minimum sizes.
+ * We don't expect too many messages in flight at any time, unless we are
+ * using GuC submission. In that case each request requires a minimum of
+ * 16 bytes, which gives us a maximum of 256 queued requests. Hopefully this
+ * is enough space to avoid backpressure on the driver. We increase the size
+ * of the receive buffer (relative to the send) to ensure a G2H response
+ * CTB has a landing spot.
  */
 #define CTB_DESC_SIZE		ALIGN(sizeof(struct guc_ct_buffer_desc), SZ_2K)
 #define CTB_H2G_BUFFER_SIZE	(SZ_4K)
-#define CTB_G2H_BUFFER_SIZE	(SZ_4K)
+#define CTB_G2H_BUFFER_SIZE	(4 * CTB_H2G_BUFFER_SIZE)
 
 #define MAX_US_STALL_CTB	1000000
 
@@ -753,7 +758,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 	/* beware of buffer wrap case */
 	if (unlikely(available < 0))
 		available += size;
-	CT_DEBUG(ct, "available %d (%u:%u)\n", available, head, tail);
+	CT_DEBUG(ct, "available %d (%u:%u:%u)\n", available, head, tail, size);
 	GEM_BUG_ON(available < 0);
 
 	header = cmds[head];
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 40/97] drm/i915/guc: Module load failure test for CT buffer creation
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (38 preceding siblings ...)
  2021-05-06 19:13 ` [RFC PATCH 39/97] drm/i915/guc: Increase size of CTB buffers Matthew Brost
@ 2021-05-06 19:13 ` Matthew Brost
  2021-05-24 13:45   ` Michal Wajdeczko
  2021-05-06 19:13 ` [RFC PATCH 41/97] drm/i915/guc: Add new GuC interface defines and structures Matthew Brost
                   ` (59 subsequent siblings)
  99 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:13 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

From: John Harrison <John.C.Harrison@Intel.com>

Add several module load failure injection points in the CT buffer
creation code path.
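
Each injection point follows the same shape (this is just the pattern of
the hunks below, shown in isolation); when error injection support is
not built in, i915_inject_probe_error() is expected to compile down to 0
and the check disappears, otherwise the probe path fails as if the real
call had returned -ENXIO:

    err = i915_inject_probe_error(i915, -ENXIO);
    if (unlikely(err))
        return err;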

Signed-off-by: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index d6895d29ed2d..586e6efc3558 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -177,6 +177,10 @@ static int ct_register_buffer(struct intel_guc_ct *ct, u32 type,
 {
 	int err;
 
+	err = i915_inject_probe_error(guc_to_gt(ct_to_guc(ct))->i915, -ENXIO);
+	if (unlikely(err))
+		return err;
+
 	err = guc_action_register_ct_buffer(ct_to_guc(ct), type,
 					    desc_addr, buff_addr, size);
 	if (unlikely(err))
@@ -228,6 +232,10 @@ int intel_guc_ct_init(struct intel_guc_ct *ct)
 	u32 *cmds;
 	int err;
 
+	err = i915_inject_probe_error(guc_to_gt(guc)->i915, -ENXIO);
+	if (err)
+		return err;
+
 	GEM_BUG_ON(ct->vma);
 
 	blob_size = 2 * CTB_DESC_SIZE + CTB_H2G_BUFFER_SIZE + CTB_G2H_BUFFER_SIZE;
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 41/97] drm/i915/guc: Add new GuC interface defines and structures
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (39 preceding siblings ...)
  2021-05-06 19:13 ` [RFC PATCH 40/97] drm/i915/guc: Module load failure test for CT buffer creation Matthew Brost
@ 2021-05-06 19:13 ` Matthew Brost
  2021-05-06 19:13 ` [RFC PATCH 42/97] drm/i915/guc: Remove GuC stage descriptor, add lrc descriptor Matthew Brost
                   ` (58 subsequent siblings)
  99 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:13 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

Add new GuC interface defines and structures while maintaining old ones
in parallel.
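
For illustration, a rough sketch of how the new descriptor might be
populated before a context is registered (hypothetical: only the fields
and defines added below are real; __get_lrc_desc() and ce->guc_id come
from later patches in the series, and engine->logical_mask is an assumed
name):

    struct guc_lrc_desc *desc = __get_lrc_desc(guc, ce->guc_id);

    desc->engine_class = engine_class_to_guc_class(engine->class);
    desc->engine_submit_mask = engine->logical_mask;
    desc->priority = GUC_CLIENT_PRIORITY_KMD_NORMAL;
    desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
    desc->execution_quantum = CONTEXT_POLICY_DEFAULT_EXECUTION_QUANTUM_US;
    desc->preemption_timeout = CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US;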

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h  | 18 ++++++++
 drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   | 41 +++++++++++++++++++
 2 files changed, 59 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
index 6cb0d3eb9b72..c0a715ec7276 100644
--- a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
+++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
@@ -121,13 +121,31 @@ enum intel_guc_action {
 	INTEL_GUC_ACTION_DEALLOCATE_DOORBELL = 0x20,
 	INTEL_GUC_ACTION_LOG_BUFFER_FILE_FLUSH_COMPLETE = 0x30,
 	INTEL_GUC_ACTION_UK_LOG_ENABLE_LOGGING = 0x40,
+	INTEL_GUC_ACTION_LOG_CACHE_CRASH_DUMP = 0x200,
+	INTEL_GUC_ACTION_GLOBAL_DEBUG_ACTIONS = 0x301,
 	INTEL_GUC_ACTION_FORCE_LOG_BUFFER_FLUSH = 0x302,
+	INTEL_GUC_ACTION_LOG_VERBOSITY_SELECT = 0x400,
 	INTEL_GUC_ACTION_ENTER_S_STATE = 0x501,
 	INTEL_GUC_ACTION_EXIT_S_STATE = 0x502,
+	INTEL_GUC_ACTION_GLOBAL_SCHED_POLICY_CHANGE = 0x506,
+	INTEL_GUC_ACTION_SCHED_CONTEXT = 0x1000,
+	INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET = 0x1001,
+	INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE = 0x1002,
+	INTEL_GUC_ACTION_SCHED_ENGINE_MODE_SET = 0x1003,
+	INTEL_GUC_ACTION_SCHED_ENGINE_MODE_DONE = 0x1004,
+	INTEL_GUC_ACTION_SET_CONTEXT_PRIORITY = 0x1005,
+	INTEL_GUC_ACTION_SET_CONTEXT_EXECUTION_QUANTUM = 0x1006,
+	INTEL_GUC_ACTION_SET_CONTEXT_PREEMPTION_TIMEOUT = 0x1007,
+	INTEL_GUC_ACTION_CONTEXT_RESET_NOTIFICATION = 0x1008,
+	INTEL_GUC_ACTION_ENGINE_FAILURE_NOTIFICATION = 0x1009,
 	INTEL_GUC_ACTION_SLPC_REQUEST = 0x3003,
+	INTEL_GUC_ACTION_SETUP_GUCRC = 0x3004,
 	INTEL_GUC_ACTION_AUTHENTICATE_HUC = 0x4000,
+	INTEL_GUC_ACTION_REGISTER_CONTEXT = 0x4502,
+	INTEL_GUC_ACTION_DEREGISTER_CONTEXT = 0x4503,
 	INTEL_GUC_ACTION_REGISTER_COMMAND_TRANSPORT_BUFFER = 0x4505,
 	INTEL_GUC_ACTION_DEREGISTER_COMMAND_TRANSPORT_BUFFER = 0x4506,
+	INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE = 0x4600,
 	INTEL_GUC_ACTION_LIMIT
 };
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
index 558cfe168cb7..cae8649a8147 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
@@ -17,6 +17,9 @@
 #include "abi/guc_messages_abi.h"
 #include "gt/intel_engine_types.h"
 
+#define GUC_CONTEXT_DISABLE		0
+#define GUC_CONTEXT_ENABLE		1
+
 #define GUC_CLIENT_PRIORITY_KMD_HIGH	0
 #define GUC_CLIENT_PRIORITY_HIGH	1
 #define GUC_CLIENT_PRIORITY_KMD_NORMAL	2
@@ -26,6 +29,9 @@
 #define GUC_MAX_STAGE_DESCRIPTORS	1024
 #define	GUC_INVALID_STAGE_ID		GUC_MAX_STAGE_DESCRIPTORS
 
+#define GUC_MAX_LRC_DESCRIPTORS		65535
+#define	GUC_INVALID_LRC_ID		GUC_MAX_LRC_DESCRIPTORS
+
 #define GUC_RENDER_ENGINE		0
 #define GUC_VIDEO_ENGINE		1
 #define GUC_BLITTER_ENGINE		2
@@ -239,6 +245,41 @@ struct guc_stage_desc {
 	u64 desc_private;
 } __packed;
 
+#define CONTEXT_REGISTRATION_FLAG_KMD	BIT(0)
+
+#define CONTEXT_POLICY_DEFAULT_EXECUTION_QUANTUM_US 1000000
+#define CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US 500000
+
+/* Preempt to idle on quantum expiry */
+#define CONTEXT_POLICY_FLAG_PREEMPT_TO_IDLE	BIT(0)
+
+/*
+ * GuC Context registration descriptor.
+ * FIXME: This is only required to exist during context registration.
+ * The current 1:1 between guc_lrc_desc and LRCs for the lifetime of the LRC
+ * is not required.
+ */
+struct guc_lrc_desc {
+	u32 hw_context_desc;
+	u32 slpm_perf_mode_hint;	/* SPLC v1 only */
+	u32 slpm_freq_hint;
+	u32 engine_submit_mask;		/* In logical space */
+	u8 engine_class;
+	u8 reserved0[3];
+	u32 priority;
+	u32 process_desc;
+	u32 wq_addr;
+	u32 wq_size;
+	u32 context_flags;		/* CONTEXT_REGISTRATION_* */
+	/* Time for one workload to execute. (in micro seconds) */
+	u32 execution_quantum;
+	/* Time to wait for a preemption request to complete before issuing a
+	 * reset. (in micro seconds). */
+	u32 preemption_timeout;
+	u32 policy_flags;		/* CONTEXT_POLICY_* */
+	u32 reserved1[19];
+} __packed;
+
 #define GUC_POWER_UNSPECIFIED	0
 #define GUC_POWER_D0		1
 #define GUC_POWER_D1		2
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 42/97] drm/i915/guc: Remove GuC stage descriptor, add lrc descriptor
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (40 preceding siblings ...)
  2021-05-06 19:13 ` [RFC PATCH 41/97] drm/i915/guc: Add new GuC interface defines and structures Matthew Brost
@ 2021-05-06 19:13 ` Matthew Brost
  2021-05-06 19:13 ` [RFC PATCH 43/97] drm/i915/guc: Add lrc descriptor context lookup array Matthew Brost
                   ` (57 subsequent siblings)
  99 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:13 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

Remove old GuC stage descriptor, add lrc descriptor which will be used
by the new GuC interface implemented in this patch series.
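
For a sense of scale (my arithmetic, not from the patch): the new
guc_lrc_desc is 128 bytes when packed, so a pool sized for
GUC_MAX_LRC_DESCRIPTORS (65535) entries comes to roughly 8 MiB once
page-aligned. A compile-time check along these lines would catch an
accidental change to that per-entry size:

    /* sketch only: 128 is the size assumed by the estimate above */
    BUILD_BUG_ON(sizeof(struct guc_lrc_desc) != 128);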

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  4 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   | 65 -----------------
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 72 ++++++-------------
 3 files changed, 25 insertions(+), 116 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 4c0a367e41d8..d84f37afb9d8 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -44,8 +44,8 @@ struct intel_guc {
 	struct i915_vma *ads_vma;
 	struct __guc_ads_blob *ads_blob;
 
-	struct i915_vma *stage_desc_pool;
-	void *stage_desc_pool_vaddr;
+	struct i915_vma *lrc_desc_pool;
+	void *lrc_desc_pool_vaddr;
 
 	/* Control params for fw initialization */
 	u32 params[GUC_CTL_MAX_DWORDS];
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
index cae8649a8147..1dd2f04c2762 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
@@ -26,9 +26,6 @@
 #define GUC_CLIENT_PRIORITY_NORMAL	3
 #define GUC_CLIENT_PRIORITY_NUM		4
 
-#define GUC_MAX_STAGE_DESCRIPTORS	1024
-#define	GUC_INVALID_STAGE_ID		GUC_MAX_STAGE_DESCRIPTORS
-
 #define GUC_MAX_LRC_DESCRIPTORS		65535
 #define	GUC_INVALID_LRC_ID		GUC_MAX_LRC_DESCRIPTORS
 
@@ -183,68 +180,6 @@ struct guc_process_desc {
 	u32 reserved[30];
 } __packed;
 
-/* engine id and context id is packed into guc_execlist_context.context_id*/
-#define GUC_ELC_CTXID_OFFSET		0
-#define GUC_ELC_ENGINE_OFFSET		29
-
-/* The execlist context including software and HW information */
-struct guc_execlist_context {
-	u32 context_desc;
-	u32 context_id;
-	u32 ring_status;
-	u32 ring_lrca;
-	u32 ring_begin;
-	u32 ring_end;
-	u32 ring_next_free_location;
-	u32 ring_current_tail_pointer_value;
-	u8 engine_state_submit_value;
-	u8 engine_state_wait_value;
-	u16 pagefault_count;
-	u16 engine_submit_queue_count;
-} __packed;
-
-/*
- * This structure describes a stage set arranged for a particular communication
- * between uKernel (GuC) and Driver (KMD). Technically, this is known as a
- * "GuC Context descriptor" in the specs, but we use the term "stage descriptor"
- * to avoid confusion with all the other things already named "context" in the
- * driver. A static pool of these descriptors are stored inside a GEM object
- * (stage_desc_pool) which is held for the entire lifetime of our interaction
- * with the GuC, being allocated before the GuC is loaded with its firmware.
- */
-struct guc_stage_desc {
-	u32 sched_common_area;
-	u32 stage_id;
-	u32 pas_id;
-	u8 engines_used;
-	u64 db_trigger_cpu;
-	u32 db_trigger_uk;
-	u64 db_trigger_phy;
-	u16 db_id;
-
-	struct guc_execlist_context lrc[GUC_MAX_ENGINES_NUM];
-
-	u8 attribute;
-
-	u32 priority;
-
-	u32 wq_sampled_tail_offset;
-	u32 wq_total_submit_enqueues;
-
-	u32 process_desc;
-	u32 wq_addr;
-	u32 wq_size;
-
-	u32 engine_presence;
-
-	u8 engine_suspended;
-
-	u8 reserved0[3];
-	u64 reserved1[1];
-
-	u64 desc_private;
-} __packed;
-
 #define CONTEXT_REGISTRATION_FLAG_KMD	BIT(0)
 
 #define CONTEXT_POLICY_DEFAULT_EXECUTION_QUANTUM_US 1000000
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index b8f9c71af13e..6acc1ef34f92 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -65,57 +65,35 @@ static inline struct i915_priolist *to_priolist(struct rb_node *rb)
 	return rb_entry(rb, struct i915_priolist, node);
 }
 
-static struct guc_stage_desc *__get_stage_desc(struct intel_guc *guc, u32 id)
+/* Future patches will use this function */
+__attribute__ ((unused))
+static struct guc_lrc_desc *__get_lrc_desc(struct intel_guc *guc, u32 index)
 {
-	struct guc_stage_desc *base = guc->stage_desc_pool_vaddr;
+	struct guc_lrc_desc *base = guc->lrc_desc_pool_vaddr;
 
-	return &base[id];
-}
-
-static int guc_stage_desc_pool_create(struct intel_guc *guc)
-{
-	u32 size = PAGE_ALIGN(sizeof(struct guc_stage_desc) *
-			      GUC_MAX_STAGE_DESCRIPTORS);
+	GEM_BUG_ON(index >= GUC_MAX_LRC_DESCRIPTORS);
 
-	return intel_guc_allocate_and_map_vma(guc, size, &guc->stage_desc_pool,
-					      &guc->stage_desc_pool_vaddr);
+	return &base[index];
 }
 
-static void guc_stage_desc_pool_destroy(struct intel_guc *guc)
-{
-	i915_vma_unpin_and_release(&guc->stage_desc_pool, I915_VMA_RELEASE_MAP);
-}
-
-/*
- * Initialise/clear the stage descriptor shared with the GuC firmware.
- *
- * This descriptor tells the GuC where (in GGTT space) to find the important
- * data structures related to work submission (process descriptor, write queue,
- * etc).
- */
-static void guc_stage_desc_init(struct intel_guc *guc)
+static int guc_lrc_desc_pool_create(struct intel_guc *guc)
 {
-	struct guc_stage_desc *desc;
-
-	/* we only use 1 stage desc, so hardcode it to 0 */
-	desc = __get_stage_desc(guc, 0);
-	memset(desc, 0, sizeof(*desc));
-
-	desc->attribute = GUC_STAGE_DESC_ATTR_ACTIVE |
-			  GUC_STAGE_DESC_ATTR_KERNEL;
+	u32 size;
+	int ret;
 
-	desc->stage_id = 0;
-	desc->priority = GUC_CLIENT_PRIORITY_KMD_NORMAL;
+	size = PAGE_ALIGN(sizeof(struct guc_lrc_desc) *
+			  GUC_MAX_LRC_DESCRIPTORS);
+	ret = intel_guc_allocate_and_map_vma(guc, size, &guc->lrc_desc_pool,
+					     (void **)&guc->lrc_desc_pool_vaddr);
+	if (ret)
+		return ret;
 
-	desc->wq_size = GUC_WQ_SIZE;
+	return 0;
 }
 
-static void guc_stage_desc_fini(struct intel_guc *guc)
+static void guc_lrc_desc_pool_destroy(struct intel_guc *guc)
 {
-	struct guc_stage_desc *desc;
-
-	desc = __get_stage_desc(guc, 0);
-	memset(desc, 0, sizeof(*desc));
+	i915_vma_unpin_and_release(&guc->lrc_desc_pool, I915_VMA_RELEASE_MAP);
 }
 
 static void guc_add_request(struct intel_guc *guc, struct i915_request *rq)
@@ -414,26 +392,25 @@ int intel_guc_submission_init(struct intel_guc *guc)
 {
 	int ret;
 
-	if (guc->stage_desc_pool)
+	if (guc->lrc_desc_pool)
 		return 0;
 
-	ret = guc_stage_desc_pool_create(guc);
+	ret = guc_lrc_desc_pool_create(guc);
 	if (ret)
 		return ret;
 	/*
 	 * Keep static analysers happy, let them know that we allocated the
 	 * vma after testing that it didn't exist earlier.
 	 */
-	GEM_BUG_ON(!guc->stage_desc_pool);
+	GEM_BUG_ON(!guc->lrc_desc_pool);
 
 	return 0;
 }
 
 void intel_guc_submission_fini(struct intel_guc *guc)
 {
-	if (guc->stage_desc_pool) {
-		guc_stage_desc_pool_destroy(guc);
-	}
+	if (guc->lrc_desc_pool)
+		guc_lrc_desc_pool_destroy(guc);
 }
 
 static int guc_context_alloc(struct intel_context *ce)
@@ -700,7 +677,6 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
 
 void intel_guc_submission_enable(struct intel_guc *guc)
 {
-	guc_stage_desc_init(guc);
 }
 
 void intel_guc_submission_disable(struct intel_guc *guc)
@@ -710,8 +686,6 @@ void intel_guc_submission_disable(struct intel_guc *guc)
 	GEM_BUG_ON(gt->awake); /* GT should be parked first */
 
 	/* Note: By the time we're here, GuC may have already been reset */
-
-	guc_stage_desc_fini(guc);
 }
 
 static bool __guc_submission_selected(struct intel_guc *guc)
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 43/97] drm/i915/guc: Add lrc descriptor context lookup array
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (41 preceding siblings ...)
  2021-05-06 19:13 ` [RFC PATCH 42/97] drm/i915/guc: Remove GuC stage descriptor, add lrc descriptor Matthew Brost
@ 2021-05-06 19:13 ` Matthew Brost
  2021-05-11 15:26   ` Daniel Vetter
  2021-05-06 19:13 ` [RFC PATCH 44/97] drm/i915/guc: Implement GuC submission tasklet Matthew Brost
                   ` (56 subsequent siblings)
  99 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:13 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

Add an lrc descriptor context lookup array which can resolve the
intel_context from the lrc descriptor index. In addition to the lookup,
it can determine if the context backing an lrc descriptor is currently
registered with the GuC by checking whether an entry for that descriptor
index is present. Future patches in the series will make use of this
array.
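
As a hint of how later patches are expected to use it (illustrative
fragment only; desc_idx is a made-up variable name), a G2H handler can
translate a descriptor index coming from the GuC back into a context,
with a missing entry meaning the context is not, or is no longer,
registered:

    struct intel_context *ce = __get_context(guc, desc_idx);

    if (unlikely(!ce))
        return -EPROTO; /* GuC referenced an unregistered context */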

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  5 +++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 32 +++++++++++++++++--
 2 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index d84f37afb9d8..2eb6c497e43c 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -6,6 +6,8 @@
 #ifndef _INTEL_GUC_H_
 #define _INTEL_GUC_H_
 
+#include "linux/xarray.h"
+
 #include "intel_uncore.h"
 #include "intel_guc_fw.h"
 #include "intel_guc_fwif.h"
@@ -47,6 +49,9 @@ struct intel_guc {
 	struct i915_vma *lrc_desc_pool;
 	void *lrc_desc_pool_vaddr;
 
+	/* guc_id to intel_context lookup */
+	struct xarray context_lookup;
+
 	/* Control params for fw initialization */
 	u32 params[GUC_CTL_MAX_DWORDS];
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 6acc1ef34f92..c2b6d27404b7 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -65,8 +65,6 @@ static inline struct i915_priolist *to_priolist(struct rb_node *rb)
 	return rb_entry(rb, struct i915_priolist, node);
 }
 
-/* Future patches will use this function */
-__attribute__ ((unused))
 static struct guc_lrc_desc *__get_lrc_desc(struct intel_guc *guc, u32 index)
 {
 	struct guc_lrc_desc *base = guc->lrc_desc_pool_vaddr;
@@ -76,6 +74,15 @@ static struct guc_lrc_desc *__get_lrc_desc(struct intel_guc *guc, u32 index)
 	return &base[index];
 }
 
+static inline struct intel_context *__get_context(struct intel_guc *guc, u32 id)
+{
+	struct intel_context *ce = xa_load(&guc->context_lookup, id);
+
+	GEM_BUG_ON(id >= GUC_MAX_LRC_DESCRIPTORS);
+
+	return ce;
+}
+
 static int guc_lrc_desc_pool_create(struct intel_guc *guc)
 {
 	u32 size;
@@ -96,6 +103,25 @@ static void guc_lrc_desc_pool_destroy(struct intel_guc *guc)
 	i915_vma_unpin_and_release(&guc->lrc_desc_pool, I915_VMA_RELEASE_MAP);
 }
 
+static inline void reset_lrc_desc(struct intel_guc *guc, u32 id)
+{
+	struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
+
+	memset(desc, 0, sizeof(*desc));
+	xa_erase_irq(&guc->context_lookup, id);
+}
+
+static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
+{
+	return __get_context(guc, id);
+}
+
+static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
+					   struct intel_context *ce)
+{
+	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
+}
+
 static void guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 {
 	/* Leaving stub as this function will be used in future patches */
@@ -404,6 +430,8 @@ int intel_guc_submission_init(struct intel_guc *guc)
 	 */
 	GEM_BUG_ON(!guc->lrc_desc_pool);
 
+	xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ);
+
 	return 0;
 }
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 44/97] drm/i915/guc: Implement GuC submission tasklet
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (42 preceding siblings ...)
  2021-05-06 19:13 ` [RFC PATCH 43/97] drm/i915/guc: Add lrc descriptor context lookup array Matthew Brost
@ 2021-05-06 19:13 ` Matthew Brost
  2021-05-25  9:43   ` [Intel-gfx] " Tvrtko Ursulin
  2021-05-06 19:13 ` [RFC PATCH 45/97] drm/i915/guc: Add bypass tasklet submission path to GuC Matthew Brost
                   ` (55 subsequent siblings)
  99 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:13 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

Implement the GuC submission tasklet for the new interface. The new GuC
interface uses H2G to submit contexts to the GuC. Since H2G uses a single
channel, a single tasklet is used for the submission path. As such, a
global struct i915_sched_engine has been added to leverage the existing
scheduling code.

Also the per-engine interrupt handler has been updated to disable
rescheduling of the physical engine tasklet, when using GuC scheduling,
as the physical engine tasklet is no longer used.

In this patch the field guc_id has been added to intel_context but is
not assigned. Patches later in the series will assign this value.

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context_types.h |   9 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   4 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 233 +++++++++---------
 3 files changed, 127 insertions(+), 119 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index ed8c447a7346..bb6fef7eae52 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -136,6 +136,15 @@ struct intel_context {
 	struct intel_sseu sseu;
 
 	u8 wa_bb_page; /* if set, page num reserved for context workarounds */
+
+	/* GuC scheduling state that does not require a lock. */
+	atomic_t guc_sched_state_no_lock;
+
+	/*
+	 * GuC lrc descriptor ID - Not assigned in this patch but future patches
+	 * in the series will.
+	 */
+	u16 guc_id;
 };
 
 #endif /* __INTEL_CONTEXT_TYPES__ */
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 2eb6c497e43c..d32866fe90ad 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -30,6 +30,10 @@ struct intel_guc {
 	struct intel_guc_log log;
 	struct intel_guc_ct ct;
 
+	/* Global engine used to submit requests to GuC */
+	struct i915_sched_engine *sched_engine;
+	struct i915_request *stalled_request;
+
 	/* intel_guc_recv interrupt related state */
 	spinlock_t irq_lock;
 	unsigned int msg_enabled_mask;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index c2b6d27404b7..0955a8b00ee8 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -60,6 +60,30 @@
 
 #define GUC_REQUEST_SIZE 64 /* bytes */
 
+/*
+ * Below is a set of functions which control the GuC scheduling state which do
+ * not require a lock as all state transitions are mutually exclusive. i.e. It
+ * is not possible for the context pinning code and submission, for the same
+ * context, to be executing simultaneously.
+ */
+#define SCHED_STATE_NO_LOCK_ENABLED			BIT(0)
+static inline bool context_enabled(struct intel_context *ce)
+{
+	return (atomic_read(&ce->guc_sched_state_no_lock) &
+		SCHED_STATE_NO_LOCK_ENABLED);
+}
+
+static inline void set_context_enabled(struct intel_context *ce)
+{
+	atomic_or(SCHED_STATE_NO_LOCK_ENABLED, &ce->guc_sched_state_no_lock);
+}
+
+static inline void clr_context_enabled(struct intel_context *ce)
+{
+	atomic_and((u32)~SCHED_STATE_NO_LOCK_ENABLED,
+		   &ce->guc_sched_state_no_lock);
+}
+
 static inline struct i915_priolist *to_priolist(struct rb_node *rb)
 {
 	return rb_entry(rb, struct i915_priolist, node);
@@ -122,37 +146,29 @@ static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
 	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
 }
 
-static void guc_add_request(struct intel_guc *guc, struct i915_request *rq)
+static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 {
-	/* Leaving stub as this function will be used in future patches */
-}
+	int err;
+	struct intel_context *ce = rq->context;
+	u32 action[3];
+	int len = 0;
+	bool enabled = context_enabled(ce);
 
-/*
- * When we're doing submissions using regular execlists backend, writing to
- * ELSP from CPU side is enough to make sure that writes to ringbuffer pages
- * pinned in mappable aperture portion of GGTT are visible to command streamer.
- * Writes done by GuC on our behalf are not guaranteeing such ordering,
- * therefore, to ensure the flush, we're issuing a POSTING READ.
- */
-static void flush_ggtt_writes(struct i915_vma *vma)
-{
-	if (i915_vma_is_map_and_fenceable(vma))
-		intel_uncore_posting_read_fw(vma->vm->gt->uncore,
-					     GUC_STATUS);
-}
+	if (!enabled) {
+		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
+		action[len++] = ce->guc_id;
+		action[len++] = GUC_CONTEXT_ENABLE;
+	} else {
+		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT;
+		action[len++] = ce->guc_id;
+	}
 
-static void guc_submit(struct intel_engine_cs *engine,
-		       struct i915_request **out,
-		       struct i915_request **end)
-{
-	struct intel_guc *guc = &engine->gt->uc.guc;
+	err = intel_guc_send_nb(guc, action, len);
 
-	do {
-		struct i915_request *rq = *out++;
+	if (!enabled && !err)
+		set_context_enabled(ce);
 
-		flush_ggtt_writes(rq->ring->vma);
-		guc_add_request(guc, rq);
-	} while (out != end);
+	return err;
 }
 
 static inline int rq_prio(const struct i915_request *rq)
@@ -160,125 +176,88 @@ static inline int rq_prio(const struct i915_request *rq)
 	return rq->sched.attr.priority;
 }
 
-static struct i915_request *schedule_in(struct i915_request *rq, int idx)
-{
-	trace_i915_request_in(rq, idx);
-
-	/*
-	 * Currently we are not tracking the rq->context being inflight
-	 * (ce->inflight = rq->engine). It is only used by the execlists
-	 * backend at the moment, a similar counting strategy would be
-	 * required if we generalise the inflight tracking.
-	 */
-
-	__intel_gt_pm_get(rq->engine->gt);
-	return i915_request_get(rq);
-}
-
-static void schedule_out(struct i915_request *rq)
-{
-	trace_i915_request_out(rq);
-
-	intel_gt_pm_put_async(rq->engine->gt);
-	i915_request_put(rq);
-}
-
-static void __guc_dequeue(struct intel_engine_cs *engine)
+static int guc_dequeue_one_context(struct intel_guc *guc)
 {
-	struct i915_sched_engine * const sched_engine = engine->sched_engine;
-	struct intel_engine_execlists * const execlists = &engine->execlists;
-	struct i915_request **first = execlists->inflight;
-	struct i915_request ** const last_port = first + execlists->port_mask;
-	struct i915_request *last = first[0];
-	struct i915_request **port;
+	struct i915_sched_engine * const sched_engine = guc->sched_engine;
+	struct i915_request *last = NULL;
 	bool submit = false;
 	struct rb_node *rb;
+	int ret;
 
-	lockdep_assert_held(&engine->sched_engine->lock);
-
-	if (last) {
-		if (*++first)
-			return;
+	lockdep_assert_held(&sched_engine->lock);
 
-		last = NULL;
+	if (guc->stalled_request) {
+		submit = true;
+		last = guc->stalled_request;
+		goto resubmit;
 	}
 
-	/*
-	 * We write directly into the execlists->inflight queue and don't use
-	 * the execlists->pending queue, as we don't have a distinct switch
-	 * event.
-	 */
-	port = first;
 	while ((rb = rb_first_cached(&sched_engine->queue))) {
 		struct i915_priolist *p = to_priolist(rb);
 		struct i915_request *rq, *rn;
 
 		priolist_for_each_request_consume(rq, rn, p) {
-			if (last && rq->context != last->context) {
-				if (port == last_port)
-					goto done;
-
-				*port = schedule_in(last,
-						    port - execlists->inflight);
-				port++;
-			}
+			if (last && rq->context != last->context)
+				goto done;
 
 			list_del_init(&rq->sched.link);
+
 			__i915_request_submit(rq);
-			submit = true;
+
+			trace_i915_request_in(rq, 0);
 			last = rq;
+			submit = true;
 		}
 
 		rb_erase_cached(&p->node, &sched_engine->queue);
 		i915_priolist_free(p);
 	}
 done:
-	sched_engine->queue_priority_hint =
-		rb ? to_priolist(rb)->priority : INT_MIN;
 	if (submit) {
-		*port = schedule_in(last, port - execlists->inflight);
-		*++port = NULL;
-		guc_submit(engine, first, port);
+		last->context->lrc_reg_state[CTX_RING_TAIL] =
+			intel_ring_set_tail(last->ring, last->tail);
+resubmit:
+		/*
+		 * We only check for -EBUSY here even though it is possible for
+		 * -EDEADLK to be returned. If -EDEADLK is returned, the GuC has
+		 * died and a full GPU reset needs to be done. The hangcheck will
+		 * eventually detect that the GuC has died and trigger this
+		 * reset so no need to handle -EDEADLK here.
+		 */
+		ret = guc_add_request(guc, last);
+		if (ret == -EBUSY) {
+			i915_sched_engine_kick(sched_engine);
+			guc->stalled_request = last;
+			return false;
+		}
 	}
-	execlists->active = execlists->inflight;
+
+	guc->stalled_request = NULL;
+	return submit;
 }
 
 static void guc_submission_tasklet(struct tasklet_struct *t)
 {
 	struct i915_sched_engine *sched_engine =
 		from_tasklet(sched_engine, t, tasklet);
-	struct intel_engine_cs * const engine = sched_engine->engine;
-	struct intel_engine_execlists * const execlists = &engine->execlists;
-	struct i915_request **port, *rq;
 	unsigned long flags;
+	bool loop;
 
-	spin_lock_irqsave(&engine->sched_engine->lock, flags);
-
-	for (port = execlists->inflight; (rq = *port); port++) {
-		if (!i915_request_completed(rq))
-			break;
+	spin_lock_irqsave(&sched_engine->lock, flags);
 
-		schedule_out(rq);
-	}
-	if (port != execlists->inflight) {
-		int idx = port - execlists->inflight;
-		int rem = ARRAY_SIZE(execlists->inflight) - idx;
-		memmove(execlists->inflight, port, rem * sizeof(*port));
-	}
-
-	__guc_dequeue(engine);
+	do {
+		loop = guc_dequeue_one_context(&sched_engine->engine->gt->uc.guc);
+	} while (loop);
 
-	i915_sched_engine_reset_on_empty(engine->sched_engine);
+	i915_sched_engine_reset_on_empty(sched_engine);
 
-	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
+	spin_unlock_irqrestore(&sched_engine->lock, flags);
 }
 
 static void cs_irq_handler(struct intel_engine_cs *engine, u16 iir)
 {
-	if (iir & GT_RENDER_USER_INTERRUPT) {
+	if (iir & GT_RENDER_USER_INTERRUPT)
 		intel_engine_signal_breadcrumbs(engine);
-		i915_sched_engine_hi_kick(engine->sched_engine);
-	}
 }
 
 static void guc_reset_prepare(struct intel_engine_cs *engine)
@@ -351,6 +330,10 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
 	struct rb_node *rb;
 	unsigned long flags;
 
+	/* Can be called during boot if GuC fails to load */
+	if (!engine->gt)
+		return;
+
 	ENGINE_TRACE(engine, "\n");
 
 	/*
@@ -437,8 +420,11 @@ int intel_guc_submission_init(struct intel_guc *guc)
 
 void intel_guc_submission_fini(struct intel_guc *guc)
 {
-	if (guc->lrc_desc_pool)
-		guc_lrc_desc_pool_destroy(guc);
+	if (!guc->lrc_desc_pool)
+		return;
+
+	guc_lrc_desc_pool_destroy(guc);
+	i915_sched_engine_put(guc->sched_engine);
 }
 
 static int guc_context_alloc(struct intel_context *ce)
@@ -503,32 +489,32 @@ static int guc_request_alloc(struct i915_request *request)
 	return 0;
 }
 
-static inline void queue_request(struct intel_engine_cs *engine,
+static inline void queue_request(struct i915_sched_engine *sched_engine,
 				 struct i915_request *rq,
 				 int prio)
 {
 	GEM_BUG_ON(!list_empty(&rq->sched.link));
 	list_add_tail(&rq->sched.link,
-		      i915_sched_lookup_priolist(engine->sched_engine, prio));
+		      i915_sched_lookup_priolist(sched_engine, prio));
 	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
 }
 
 static void guc_submit_request(struct i915_request *rq)
 {
-	struct intel_engine_cs *engine = rq->engine;
+	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
 	unsigned long flags;
 
 	/* Will be called from irq-context when using foreign fences. */
-	spin_lock_irqsave(&engine->sched_engine->lock, flags);
+	spin_lock_irqsave(&sched_engine->lock, flags);
 
-	queue_request(engine, rq, rq_prio(rq));
+	queue_request(sched_engine, rq, rq_prio(rq));
 
-	GEM_BUG_ON(i915_sched_engine_is_empty(engine->sched_engine));
+	GEM_BUG_ON(i915_sched_engine_is_empty(sched_engine));
 	GEM_BUG_ON(list_empty(&rq->sched.link));
 
-	i915_sched_engine_hi_kick(engine->sched_engine);
+	i915_sched_engine_hi_kick(sched_engine);
 
-	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
+	spin_unlock_irqrestore(&sched_engine->lock, flags);
 }
 
 static void sanitize_hwsp(struct intel_engine_cs *engine)
@@ -606,8 +592,6 @@ static void guc_release(struct intel_engine_cs *engine)
 {
 	engine->sanitize = NULL; /* no longer in control, nothing to sanitize */
 
-	tasklet_kill(&engine->sched_engine->tasklet);
-
 	intel_engine_cleanup_common(engine);
 	lrc_fini_wa_ctx(engine);
 }
@@ -678,6 +662,7 @@ static inline void guc_default_irqs(struct intel_engine_cs *engine)
 int intel_guc_submission_setup(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *i915 = engine->i915;
+	struct intel_guc *guc = &engine->gt->uc.guc;
 
 	/*
 	 * The setup relies on several assumptions (e.g. irqs always enabled)
@@ -685,8 +670,18 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
 	 */
 	GEM_BUG_ON(INTEL_GEN(i915) < 11);
 
-	tasklet_setup(&engine->sched_engine->tasklet, guc_submission_tasklet);
-	engine->sched_engine->schedule = i915_schedule;
+	if (!guc->sched_engine) {
+		guc->sched_engine = i915_sched_engine_create(ENGINE_VIRTUAL);
+		if (!guc->sched_engine)
+			return -ENOMEM;
+
+		guc->sched_engine->schedule = i915_schedule;
+		guc->sched_engine->engine = engine;
+		tasklet_setup(&guc->sched_engine->tasklet,
+			      guc_submission_tasklet);
+	}
+	i915_sched_engine_put(engine->sched_engine);
+	engine->sched_engine = i915_sched_engine_get(guc->sched_engine);
 
 	guc_default_vfuncs(engine);
 	guc_default_irqs(engine);
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 45/97] drm/i915/guc: Add bypass tasklet submission path to GuC
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (43 preceding siblings ...)
  2021-05-06 19:13 ` [RFC PATCH 44/97] drm/i915/guc: Implement GuC submission tasklet Matthew Brost
@ 2021-05-06 19:13 ` Matthew Brost
  2021-05-06 19:14 ` [RFC PATCH 46/97] drm/i915/guc: Implement GuC context operations for new interface Matthew Brost
                   ` (54 subsequent siblings)
  99 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:13 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

Add a bypass tasklet submission path to the GuC. The tasklet is only
used if the H2G channel has backpressure.
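
The fast vs. slow path decision can be summarised with the following
sketch (a simplification of guc_submit_request() from the diff below;
locking, tracing and error unwinding are elided):

	/* Sketch only -- condensed from the patch below, not a drop-in. */
	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
		/* Ordering must be preserved: go through the tasklet. */
		queue_request(sched_engine, rq, rq_prio(rq));
	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
		/* H2G backpressure: the stalled request is retried by the tasklet. */
		i915_sched_engine_hi_kick(sched_engine);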

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 37 +++++++++++++++----
 1 file changed, 29 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 0955a8b00ee8..2fd83562c1d1 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -171,6 +171,12 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 	return err;
 }
 
+static inline void guc_set_lrc_tail(struct i915_request *rq)
+{
+	rq->context->lrc_reg_state[CTX_RING_TAIL] =
+		intel_ring_set_tail(rq->ring, rq->tail);
+}
+
 static inline int rq_prio(const struct i915_request *rq)
 {
 	return rq->sched.attr.priority;
@@ -214,8 +220,7 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
 	}
 done:
 	if (submit) {
-		last->context->lrc_reg_state[CTX_RING_TAIL] =
-			intel_ring_set_tail(last->ring, last->tail);
+		guc_set_lrc_tail(last);
 resubmit:
 		/*
 		 * We only check for -EBUSY here even though it is possible for
@@ -499,20 +504,36 @@ static inline void queue_request(struct i915_sched_engine *sched_engine,
 	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
 }
 
+static int guc_bypass_tasklet_submit(struct intel_guc *guc,
+				     struct i915_request *rq)
+{
+	int ret;
+
+	__i915_request_submit(rq);
+
+	trace_i915_request_in(rq, 0);
+
+	guc_set_lrc_tail(rq);
+	ret = guc_add_request(guc, rq);
+	if (ret == -EBUSY)
+		guc->stalled_request = rq;
+
+	return ret;
+}
+
 static void guc_submit_request(struct i915_request *rq)
 {
 	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
+	struct intel_guc *guc = &rq->engine->gt->uc.guc;
 	unsigned long flags;
 
 	/* Will be called from irq-context when using foreign fences. */
 	spin_lock_irqsave(&sched_engine->lock, flags);
 
-	queue_request(sched_engine, rq, rq_prio(rq));
-
-	GEM_BUG_ON(i915_sched_engine_is_empty(sched_engine));
-	GEM_BUG_ON(list_empty(&rq->sched.link));
-
-	i915_sched_engine_hi_kick(sched_engine);
+	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
+		queue_request(sched_engine, rq, rq_prio(rq));
+	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
+		i915_sched_engine_hi_kick(sched_engine);
 
 	spin_unlock_irqrestore(&sched_engine->lock, flags);
 }
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 46/97] drm/i915/guc: Implement GuC context operations for new interface
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (44 preceding siblings ...)
  2021-05-06 19:13 ` [RFC PATCH 45/97] drm/i915/guc: Add bypass tasklet submission path to GuC Matthew Brost
@ 2021-05-06 19:14 ` Matthew Brost
  2021-05-29 20:32   ` Michal Wajdeczko
  2021-05-06 19:14 ` [RFC PATCH 47/97] drm/i915/guc: Insert fence on context when deregistering Matthew Brost
                   ` (53 subsequent siblings)
  99 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

Implement the GuC context operations, which include the GuC-specific
pin, unpin, and destroy operations.
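
On a context's first in-flight request, the guc_id / descriptor handling
added here roughly follows the sketch below (condensed from
guc_request_alloc() and its helpers in the diff below; locking and error
unwinding are elided):

	/* Sketch only -- condensed from the patch below, not a drop-in. */
	if (!atomic_add_unless(&ce->guc_id_ref, 1, 0)) {
		/* No request in flight yet: take a fresh guc_id from the ida,
		 * or steal one parked on guc->guc_id_list by an unpinned
		 * context. */
		ret = pin_guc_id(guc, ce);
		/* Newly assigned / stolen guc_id, dirty LRC address, or not
		 * yet registered: (re)write the guc_lrc_desc and register the
		 * context with the GuC, deregistering the previous owner of a
		 * stolen guc_id first. */
		if (context_needs_register(ce, ret == 1))
			ret = guc_lrc_desc_pin(ce);
	}

Each retired request drops guc_id_ref again; once it hits zero,
unpin_guc_id() parks the guc_id on guc->guc_id_list so it can be stolen,
and the descriptor itself is only deregistered when the context is
destroyed (guc_context_destroy()).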

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context.c       |   5 +
 drivers/gpu/drm/i915/gt/intel_context_types.h |  22 +-
 drivers/gpu/drm/i915/gt/intel_lrc_reg.h       |   1 -
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  34 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |   7 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 663 ++++++++++++++++--
 drivers/gpu/drm/i915/i915_reg.h               |   1 +
 drivers/gpu/drm/i915/i915_request.c           |   1 +
 8 files changed, 680 insertions(+), 54 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index 4033184f13b9..2b68af16222c 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -383,6 +383,11 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
 
 	mutex_init(&ce->pin_mutex);
 
+	spin_lock_init(&ce->guc_state.lock);
+
+	ce->guc_id = GUC_INVALID_LRC_ID;
+	INIT_LIST_HEAD(&ce->guc_id_link);
+
 	i915_active_init(&ce->active,
 			 __intel_context_active, __intel_context_retire, 0);
 }
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index bb6fef7eae52..ce7c69b34cd1 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -95,6 +95,7 @@ struct intel_context {
 #define CONTEXT_BANNED			6
 #define CONTEXT_FORCE_SINGLE_SUBMISSION	7
 #define CONTEXT_NOPREEMPT		8
+#define CONTEXT_LRCA_DIRTY		9
 
 	struct {
 		u64 timeout_us;
@@ -137,14 +138,29 @@ struct intel_context {
 
 	u8 wa_bb_page; /* if set, page num reserved for context workarounds */
 
+	struct {
+		/** lock: protects everything in guc_state */
+		spinlock_t lock;
+		/**
+		 * sched_state: scheduling state of this context using GuC
+		 * submission
+		 */
+		u8 sched_state;
+	} guc_state;
+
 	/* GuC scheduling state that does not require a lock. */
 	atomic_t guc_sched_state_no_lock;
 
+	/* GuC lrc descriptor ID */
+	u16 guc_id;
+
+	/* GuC lrc descriptor reference count */
+	atomic_t guc_id_ref;
+
 	/*
-	 * GuC lrc descriptor ID - Not assigned in this patch but future patches
-	 * in the series will.
+	 * GuC ID link - in list when unpinned but guc_id still valid in GuC
 	 */
-	u16 guc_id;
+	struct list_head guc_id_link;
 };
 
 #endif /* __INTEL_CONTEXT_TYPES__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
index 41e5350a7a05..49d4857ad9b7 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
+++ b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
@@ -87,7 +87,6 @@
 #define GEN11_CSB_WRITE_PTR_MASK	(GEN11_CSB_PTR_MASK << 0)
 
 #define MAX_CONTEXT_HW_ID	(1 << 21) /* exclusive */
-#define MAX_GUC_CONTEXT_HW_ID	(1 << 20) /* exclusive */
 #define GEN11_MAX_CONTEXT_HW_ID	(1 << 11) /* exclusive */
 /* in Gen12 ID 0x7FF is reserved to indicate idle */
 #define GEN12_MAX_CONTEXT_HW_ID	(GEN11_MAX_CONTEXT_HW_ID - 1)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index d32866fe90ad..85ff32bfd074 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -45,6 +45,14 @@ struct intel_guc {
 		void (*disable)(struct intel_guc *guc);
 	} interrupts;
 
+	/*
+	 * contexts_lock protects the pool of free guc ids and a linked list of
+	 * guc ids available to be stolen
+	 */
+	spinlock_t contexts_lock;
+	struct ida guc_ids;
+	struct list_head guc_id_list;
+
 	bool submission_selected;
 
 	struct i915_vma *ads_vma;
@@ -103,6 +111,29 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len,
 				 response_buf, response_buf_size, 0);
 }
 
+static inline int intel_guc_send_busy_loop(struct intel_guc* guc,
+					   const u32 *action,
+					   u32 len,
+					   bool loop)
+{
+	int err;
+
+	/* No sleeping with spin locks, just busy loop */
+	might_sleep_if(loop && (!in_atomic() && !irqs_disabled()));
+
+retry:
+	err = intel_guc_send_nb(guc, action, len);
+	if (unlikely(err == -EBUSY && loop)) {
+		if (likely(!in_atomic() && !irqs_disabled()))
+			cond_resched();
+		else
+			cpu_relax();
+		goto retry;
+	}
+
+	return err;
+}
+
 static inline void intel_guc_to_host_event_handler(struct intel_guc *guc)
 {
 	intel_guc_ct_event_handler(&guc->ct);
@@ -204,6 +235,9 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask)
 int intel_guc_reset_engine(struct intel_guc *guc,
 			   struct intel_engine_cs *engine);
 
+int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
+					  const u32 *msg, u32 len);
+
 void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p);
 
 #endif
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 586e6efc3558..51c5efdf543a 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -893,6 +893,13 @@ static int ct_process_request(struct intel_guc_ct *ct, struct ct_incoming_msg *r
 	case INTEL_GUC_ACTION_DEFAULT:
 		ret = intel_guc_to_host_process_recv_msg(guc, payload, len);
 		break;
+	case INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE:
+		ret = intel_guc_deregister_done_process_msg(guc, payload,
+							    len);
+		if (unlikely(ret))
+			CT_ERROR(ct, "deregister context failed %x %*ph\n",
+				  action, 4 * len, payload);
+		break;
 	default:
 		ret = -EOPNOTSUPP;
 		break;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 2fd83562c1d1..eada9ffc1a54 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -13,7 +13,9 @@
 #include "gt/intel_gt.h"
 #include "gt/intel_gt_irq.h"
 #include "gt/intel_gt_pm.h"
+#include "gt/intel_gt_requests.h"
 #include "gt/intel_lrc.h"
+#include "gt/intel_lrc_reg.h"
 #include "gt/intel_mocs.h"
 #include "gt/intel_ring.h"
 
@@ -84,6 +86,73 @@ static inline void clr_context_enabled(struct intel_context *ce)
 		   &ce->guc_sched_state_no_lock);
 }
 
+/*
+ * Below is a set of functions which control the GuC scheduling state which
+ * require a lock, aside from the special case where the functions are called
+ * from guc_lrc_desc_pin(). In that case it isn't possible for any other code
+ * path to be executing on the context.
+ */
+#define SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER	BIT(0)
+#define SCHED_STATE_DESTROYED				BIT(1)
+static inline void init_sched_state(struct intel_context *ce)
+{
+	/* Only should be called from guc_lrc_desc_pin() */
+	atomic_set(&ce->guc_sched_state_no_lock, 0);
+	ce->guc_state.sched_state = 0;
+}
+
+static inline bool
+context_wait_for_deregister_to_register(struct intel_context *ce)
+{
+	return (ce->guc_state.sched_state &
+		SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER);
+}
+
+static inline void
+set_context_wait_for_deregister_to_register(struct intel_context *ce)
+{
+	/* Only should be called from guc_lrc_desc_pin() */
+	ce->guc_state.sched_state |=
+		SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER;
+}
+
+static inline void
+clr_context_wait_for_deregister_to_register(struct intel_context *ce)
+{
+	lockdep_assert_held(&ce->guc_state.lock);
+	ce->guc_state.sched_state =
+		(ce->guc_state.sched_state &
+		 ~SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER);
+}
+
+static inline bool
+context_destroyed(struct intel_context *ce)
+{
+	return (ce->guc_state.sched_state & SCHED_STATE_DESTROYED);
+}
+
+static inline void
+set_context_destroyed(struct intel_context *ce)
+{
+	lockdep_assert_held(&ce->guc_state.lock);
+	ce->guc_state.sched_state |= SCHED_STATE_DESTROYED;
+}
+
+static inline bool context_guc_id_invalid(struct intel_context *ce)
+{
+	return (ce->guc_id == GUC_INVALID_LRC_ID);
+}
+
+static inline void set_context_guc_id_invalid(struct intel_context *ce)
+{
+	ce->guc_id = GUC_INVALID_LRC_ID;
+}
+
+static inline struct intel_guc *ce_to_guc(struct intel_context *ce)
+{
+	return &ce->engine->gt->uc.guc;
+}
+
 static inline struct i915_priolist *to_priolist(struct rb_node *rb)
 {
 	return rb_entry(rb, struct i915_priolist, node);
@@ -154,6 +223,9 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 	int len = 0;
 	bool enabled = context_enabled(ce);
 
+	GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
+	GEM_BUG_ON(context_guc_id_invalid(ce));
+
 	if (!enabled) {
 		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
 		action[len++] = ce->guc_id;
@@ -420,6 +492,10 @@ int intel_guc_submission_init(struct intel_guc *guc)
 
 	xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ);
 
+	spin_lock_init(&guc->contexts_lock);
+	INIT_LIST_HEAD(&guc->guc_id_list);
+	ida_init(&guc->guc_ids);
+
 	return 0;
 }
 
@@ -432,9 +508,303 @@ void intel_guc_submission_fini(struct intel_guc *guc)
 	i915_sched_engine_put(guc->sched_engine);
 }
 
-static int guc_context_alloc(struct intel_context *ce)
+static inline void queue_request(struct i915_sched_engine *sched_engine,
+				 struct i915_request *rq,
+				 int prio)
 {
-	return lrc_alloc(ce, ce->engine);
+	GEM_BUG_ON(!list_empty(&rq->sched.link));
+	list_add_tail(&rq->sched.link,
+		      i915_sched_lookup_priolist(sched_engine, prio));
+	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
+}
+
+static int guc_bypass_tasklet_submit(struct intel_guc *guc,
+				     struct i915_request *rq)
+{
+	int ret;
+
+	__i915_request_submit(rq);
+
+	trace_i915_request_in(rq, 0);
+
+	guc_set_lrc_tail(rq);
+	ret = guc_add_request(guc, rq);
+	if (ret == -EBUSY)
+		guc->stalled_request = rq;
+
+	return ret;
+}
+
+static void guc_submit_request(struct i915_request *rq)
+{
+	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
+	struct intel_guc *guc = &rq->engine->gt->uc.guc;
+	unsigned long flags;
+
+	/* Will be called from irq-context when using foreign fences. */
+	spin_lock_irqsave(&sched_engine->lock, flags);
+
+	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
+		queue_request(sched_engine, rq, rq_prio(rq));
+	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
+		i915_sched_engine_hi_kick(sched_engine);
+
+	spin_unlock_irqrestore(&sched_engine->lock, flags);
+}
+
+#define GUC_ID_START	64	/* First 64 guc_ids reserved */
+static int new_guc_id(struct intel_guc *guc)
+{
+	return ida_simple_get(&guc->guc_ids, GUC_ID_START,
+			      GUC_MAX_LRC_DESCRIPTORS, GFP_KERNEL |
+			      __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
+}
+
+static void __release_guc_id(struct intel_guc *guc, struct intel_context *ce)
+{
+	if (!context_guc_id_invalid(ce)) {
+		ida_simple_remove(&guc->guc_ids, ce->guc_id);
+		reset_lrc_desc(guc, ce->guc_id);
+		set_context_guc_id_invalid(ce);
+	}
+	if (!list_empty(&ce->guc_id_link))
+		list_del_init(&ce->guc_id_link);
+}
+
+static void release_guc_id(struct intel_guc *guc, struct intel_context *ce)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&guc->contexts_lock, flags);
+	__release_guc_id(guc, ce);
+	spin_unlock_irqrestore(&guc->contexts_lock, flags);
+}
+
+static int steal_guc_id(struct intel_guc *guc)
+{
+	struct intel_context *ce;
+	int guc_id;
+
+	if (!list_empty(&guc->guc_id_list)) {
+		ce = list_first_entry(&guc->guc_id_list,
+				      struct intel_context,
+				      guc_id_link);
+
+		GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
+		GEM_BUG_ON(context_guc_id_invalid(ce));
+
+		list_del_init(&ce->guc_id_link);
+		guc_id = ce->guc_id;
+		set_context_guc_id_invalid(ce);
+		return guc_id;
+	} else {
+		return -EAGAIN;
+	}
+}
+
+static int assign_guc_id(struct intel_guc *guc, u16 *out)
+{
+	int ret;
+
+	ret = new_guc_id(guc);
+	if (unlikely(ret < 0)) {
+		ret = steal_guc_id(guc);
+		if (ret < 0)
+			return ret;
+	}
+
+	*out = ret;
+	return 0;
+}
+
+#define PIN_GUC_ID_TRIES	4
+static int pin_guc_id(struct intel_guc *guc, struct intel_context *ce)
+{
+	int ret = 0;
+	unsigned long flags, tries = PIN_GUC_ID_TRIES;
+
+	GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
+
+try_again:
+	spin_lock_irqsave(&guc->contexts_lock, flags);
+
+	if (context_guc_id_invalid(ce)) {
+		ret = assign_guc_id(guc, &ce->guc_id);
+		if (ret)
+			goto out_unlock;
+		ret = 1;	/* Indicates newly assigned HW context */
+	}
+	if (!list_empty(&ce->guc_id_link))
+		list_del_init(&ce->guc_id_link);
+	atomic_inc(&ce->guc_id_ref);
+
+out_unlock:
+	spin_unlock_irqrestore(&guc->contexts_lock, flags);
+
+	/*
+	 * -EAGAIN indicates no guc_ids are available, let's retire any
+	 * outstanding requests to see if that frees up a guc_id. If the first
+	 * retire didn't help, insert a sleep with the timeslice duration before
+	 * attempting to retire more requests. Double the sleep period each
+	 * subsequent pass before finally giving up. The sleep period has a max
+	 * of 100ms and a minimum of 1ms.
+	 */
+	if (ret == -EAGAIN && --tries) {
+		if (PIN_GUC_ID_TRIES - tries > 1) {
+			unsigned int timeslice_shifted =
+				ce->engine->props.timeslice_duration_ms <<
+				(PIN_GUC_ID_TRIES - tries - 2);
+			unsigned int max = min_t(unsigned int, 100,
+						 timeslice_shifted);
+
+			msleep(max_t(unsigned int, max, 1));
+		}
+		intel_gt_retire_requests(guc_to_gt(guc));
+		goto try_again;
+	}
+
+	return ret;
+}
+
+static void unpin_guc_id(struct intel_guc *guc, struct intel_context *ce)
+{
+	unsigned long flags;
+
+	GEM_BUG_ON(atomic_read(&ce->guc_id_ref) < 0);
+
+	spin_lock_irqsave(&guc->contexts_lock, flags);
+	if (!context_guc_id_invalid(ce) && list_empty(&ce->guc_id_link) &&
+	    !atomic_read(&ce->guc_id_ref))
+		list_add_tail(&ce->guc_id_link, &guc->guc_id_list);
+	spin_unlock_irqrestore(&guc->contexts_lock, flags);
+}
+
+static int __guc_action_register_context(struct intel_guc *guc,
+					 u32 guc_id,
+					 u32 offset)
+{
+	u32 action[] = {
+		INTEL_GUC_ACTION_REGISTER_CONTEXT,
+		guc_id,
+		offset,
+	};
+
+	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
+}
+
+static int register_context(struct intel_context *ce)
+{
+	struct intel_guc *guc = ce_to_guc(ce);
+	u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
+		ce->guc_id * sizeof(struct guc_lrc_desc);
+
+	return __guc_action_register_context(guc, ce->guc_id, offset);
+}
+
+static int __guc_action_deregister_context(struct intel_guc *guc,
+					   u32 guc_id)
+{
+	u32 action[] = {
+		INTEL_GUC_ACTION_DEREGISTER_CONTEXT,
+		guc_id,
+	};
+
+	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
+}
+
+static int deregister_context(struct intel_context *ce, u32 guc_id)
+{
+	struct intel_guc *guc = ce_to_guc(ce);
+
+	return __guc_action_deregister_context(guc, guc_id);
+}
+
+static intel_engine_mask_t adjust_engine_mask(u8 class, intel_engine_mask_t mask)
+{
+	switch (class) {
+	case RENDER_CLASS:
+		return mask >> RCS0;
+	case VIDEO_ENHANCEMENT_CLASS:
+		return mask >> VECS0;
+	case VIDEO_DECODE_CLASS:
+		return mask >> VCS0;
+	case COPY_ENGINE_CLASS:
+		return mask >> BCS0;
+	default:
+		GEM_BUG_ON("Invalid Class");
+		return 0;
+	}
+}
+
+static void guc_context_policy_init(struct intel_engine_cs *engine,
+				    struct guc_lrc_desc *desc)
+{
+	desc->policy_flags = 0;
+
+	desc->execution_quantum = CONTEXT_POLICY_DEFAULT_EXECUTION_QUANTUM_US;
+	desc->preemption_timeout = CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US;
+}
+
+static int guc_lrc_desc_pin(struct intel_context *ce)
+{
+	struct intel_runtime_pm *runtime_pm =
+		&ce->engine->gt->i915->runtime_pm;
+	struct intel_engine_cs *engine = ce->engine;
+	struct intel_guc *guc = &engine->gt->uc.guc;
+	u32 desc_idx = ce->guc_id;
+	struct guc_lrc_desc *desc;
+	bool context_registered;
+	intel_wakeref_t wakeref;
+	int ret = 0;
+
+	GEM_BUG_ON(!engine->mask);
+
+	/*
+	 * Ensure the LRC + CT vmas are in the same region, as the write barrier
+	 * is done based on the CT vma region.
+	 */
+	GEM_BUG_ON(i915_gem_object_is_lmem(guc->ct.vma->obj) !=
+		   i915_gem_object_is_lmem(ce->ring->vma->obj));
+
+	context_registered = lrc_desc_registered(guc, desc_idx);
+
+	reset_lrc_desc(guc, desc_idx);
+	set_lrc_desc_registered(guc, desc_idx, ce);
+
+	desc = __get_lrc_desc(guc, desc_idx);
+	desc->engine_class = engine_class_to_guc_class(engine->class);
+	desc->engine_submit_mask = adjust_engine_mask(engine->class,
+						      engine->mask);
+	desc->hw_context_desc = ce->lrc.lrca;
+	desc->priority = GUC_CLIENT_PRIORITY_KMD_NORMAL;
+	desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
+	guc_context_policy_init(engine, desc);
+	init_sched_state(ce);
+
+	/*
+	 * The context_lookup xarray is used to determine if the hardware
+	 * context is currently registered. There are two cases in which it
+	 * could be registered: either the guc_id has been stolen from
+	 * another context or the lrc descriptor address of this context has
+	 * changed. In either case the context needs to be deregistered with the
+	 * GuC before registering this context.
+	 */
+	if (context_registered) {
+		set_context_wait_for_deregister_to_register(ce);
+		intel_context_get(ce);
+
+		/*
+		 * If stealing the guc_id, this ce has the same guc_id as the
+		 * context whose guc_id was stolen.
+		 */
+		with_intel_runtime_pm(runtime_pm, wakeref)
+			ret = deregister_context(ce, ce->guc_id);
+	} else {
+		with_intel_runtime_pm(runtime_pm, wakeref)
+			ret = register_context(ce);
+	}
+
+	return ret;
 }
 
 static int guc_context_pre_pin(struct intel_context *ce,
@@ -446,36 +816,137 @@ static int guc_context_pre_pin(struct intel_context *ce,
 
 static int guc_context_pin(struct intel_context *ce, void *vaddr)
 {
+	if (i915_ggtt_offset(ce->state) !=
+	    (ce->lrc.lrca & CTX_GTT_ADDRESS_MASK))
+		set_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
+
 	return lrc_pin(ce, ce->engine, vaddr);
 }
 
+static void guc_context_unpin(struct intel_context *ce)
+{
+	unpin_guc_id(ce_to_guc(ce), ce);
+	lrc_unpin(ce);
+}
+
+static void guc_context_post_unpin(struct intel_context *ce)
+{
+	lrc_post_unpin(ce);
+}
+
+static inline void guc_lrc_desc_unpin(struct intel_context *ce)
+{
+	struct intel_engine_cs *engine = ce->engine;
+	struct intel_guc *guc = &engine->gt->uc.guc;
+	unsigned long flags;
+
+	GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id));
+	GEM_BUG_ON(ce != __get_context(guc, ce->guc_id));
+
+	spin_lock_irqsave(&ce->guc_state.lock, flags);
+	set_context_destroyed(ce);
+	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+
+	deregister_context(ce, ce->guc_id);
+}
+
+static void guc_context_destroy(struct kref *kref)
+{
+	struct intel_context *ce = container_of(kref, typeof(*ce), ref);
+	struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm;
+	struct intel_guc *guc = &ce->engine->gt->uc.guc;
+	intel_wakeref_t wakeref;
+	unsigned long flags;
+
+	/*
+	 * If the guc_id is invalid this context has been stolen and we can free
+	 * it immediately. Also can be freed immediately if the context is not
+	 * registered with the GuC.
+	 */
+	if (context_guc_id_invalid(ce) ||
+	    !lrc_desc_registered(guc, ce->guc_id)) {
+		release_guc_id(guc, ce);
+		lrc_destroy(kref);
+		return;
+	}
+
+	/*
+	 * We have to acquire the context spinlock and check guc_id again; if it
+	 * is valid it hasn't been stolen and needs to be deregistered. We
+	 * delete this context from the list of unpinned guc_ids available to
+	 * steal to seal a race with guc_lrc_desc_pin(). When the G2H CTB
+	 * returns indicating this context has been deregistered the guc_id is
+	 * returned to the pool of available guc_ids.
+	 */
+	spin_lock_irqsave(&guc->contexts_lock, flags);
+	if (context_guc_id_invalid(ce)) {
+		__release_guc_id(guc, ce);
+		spin_unlock_irqrestore(&guc->contexts_lock, flags);
+		lrc_destroy(kref);
+		return;
+	}
+
+	if (!list_empty(&ce->guc_id_link))
+		list_del_init(&ce->guc_id_link);
+	spin_unlock_irqrestore(&guc->contexts_lock, flags);
+
+	/*
+	 * We defer GuC context deregistration until the context is destroyed
+	 * in order to save on CTBs. With this optimization ideally we only need
+	 * 1 CTB to register the context during the first pin and 1 CTB to
+	 * deregister the context when the context is destroyed. Without this
+	 * optimization, a CTB would be needed every pin & unpin.
+	 *
+	 * XXX: Need to acquire the runtime wakeref as this can be triggered
+	 * from context_free_worker when no runtime wakeref is held.
+	 * guc_lrc_desc_unpin requires the runtime as a GuC register is written
+	 * in H2G CTB to deregister the context. A future patch may defer this
+	 * H2G CTB if the runtime wakeref is zero.
+	 */
+	with_intel_runtime_pm(runtime_pm, wakeref)
+		guc_lrc_desc_unpin(ce);
+}
+
+static int guc_context_alloc(struct intel_context *ce)
+{
+	return lrc_alloc(ce, ce->engine);
+}
+
 static const struct intel_context_ops guc_context_ops = {
 	.alloc = guc_context_alloc,
 
 	.pre_pin = guc_context_pre_pin,
 	.pin = guc_context_pin,
-	.unpin = lrc_unpin,
-	.post_unpin = lrc_post_unpin,
+	.unpin = guc_context_unpin,
+	.post_unpin = guc_context_post_unpin,
 
 	.enter = intel_context_enter_engine,
 	.exit = intel_context_exit_engine,
 
 	.reset = lrc_reset,
-	.destroy = lrc_destroy,
+	.destroy = guc_context_destroy,
 };
 
-static int guc_request_alloc(struct i915_request *request)
+static bool context_needs_register(struct intel_context *ce, bool new_guc_id)
+{
+	return new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
+		!lrc_desc_registered(ce_to_guc(ce), ce->guc_id);
+}
+
+static int guc_request_alloc(struct i915_request *rq)
 {
+	struct intel_context *ce = rq->context;
+	struct intel_guc *guc = ce_to_guc(ce);
 	int ret;
 
-	GEM_BUG_ON(!intel_context_is_pinned(request->context));
+	GEM_BUG_ON(!intel_context_is_pinned(rq->context));
 
 	/*
 	 * Flush enough space to reduce the likelihood of waiting after
 	 * we start building the request - in which case we will just
 	 * have to repeat work.
 	 */
-	request->reserved_space += GUC_REQUEST_SIZE;
+	rq->reserved_space += GUC_REQUEST_SIZE;
 
 	/*
 	 * Note that after this point, we have committed to using
@@ -486,56 +957,48 @@ static int guc_request_alloc(struct i915_request *request)
 	 */
 
 	/* Unconditionally invalidate GPU caches and TLBs. */
-	ret = request->engine->emit_flush(request, EMIT_INVALIDATE);
+	ret = rq->engine->emit_flush(rq, EMIT_INVALIDATE);
 	if (ret)
 		return ret;
 
-	request->reserved_space -= GUC_REQUEST_SIZE;
-	return 0;
-}
-
-static inline void queue_request(struct i915_sched_engine *sched_engine,
-				 struct i915_request *rq,
-				 int prio)
-{
-	GEM_BUG_ON(!list_empty(&rq->sched.link));
-	list_add_tail(&rq->sched.link,
-		      i915_sched_lookup_priolist(sched_engine, prio));
-	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
-}
-
-static int guc_bypass_tasklet_submit(struct intel_guc *guc,
-				     struct i915_request *rq)
-{
-	int ret;
+	rq->reserved_space -= GUC_REQUEST_SIZE;
 
-	__i915_request_submit(rq);
-
-	trace_i915_request_in(rq, 0);
-
-	guc_set_lrc_tail(rq);
-	ret = guc_add_request(guc, rq);
-	if (ret == -EBUSY)
-		guc->stalled_request = rq;
-
-	return ret;
-}
+	/*
+	 * Call pin_guc_id here rather than in the pinning step as with
+	 * dma_resv, contexts can be repeatedly pinned / unpinned, thrashing the
+	 * guc_ids and creating horrible race conditions. This is especially bad
+	 * when guc_ids are being stolen due to over subscription. By the time
+	 * this function is reached, it is guaranteed that the guc_id will be
+	 * persistent until the generated request is retired, thus sealing these
+	 * race conditions. It is still safe to fail here if guc_ids are
+	 * exhausted and return -EAGAIN to the user indicating that they can try
+	 * again in the future.
+	 *
+	 * There is no need for a lock here as the timeline mutex ensures at
+	 * most one context can be executing this code path at once. The
+	 * guc_id_ref is incremented once for every request in flight and
+	 * decremented on each retire. When it is zero, a lock around the
+	 * increment (in pin_guc_id) is needed to seal a race with unpin_guc_id.
+	 */
+	if (atomic_add_unless(&ce->guc_id_ref, 1, 0))
+		return 0;
 
-static void guc_submit_request(struct i915_request *rq)
-{
-	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
-	struct intel_guc *guc = &rq->engine->gt->uc.guc;
-	unsigned long flags;
+	ret = pin_guc_id(guc, ce);	/* returns 1 if new guc_id assigned */
+	if (unlikely(ret < 0))
+		return ret;
 
-	/* Will be called from irq-context when using foreign fences. */
-	spin_lock_irqsave(&sched_engine->lock, flags);
+	if (context_needs_register(ce, !!ret)) {
+		ret = guc_lrc_desc_pin(ce);
+		if (unlikely(ret)) {	/* unwind */
+			atomic_dec(&ce->guc_id_ref);
+			unpin_guc_id(guc, ce);
+			return ret;
+		}
+	}
 
-	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
-		queue_request(sched_engine, rq, rq_prio(rq));
-	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
-		i915_sched_engine_hi_kick(sched_engine);
+	clear_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
 
-	spin_unlock_irqrestore(&sched_engine->lock, flags);
+	return 0;
 }
 
 static void sanitize_hwsp(struct intel_engine_cs *engine)
@@ -609,6 +1072,46 @@ static void guc_set_default_submission(struct intel_engine_cs *engine)
 	engine->submit_request = guc_submit_request;
 }
 
+static inline void guc_kernel_context_pin(struct intel_guc *guc,
+					  struct intel_context *ce)
+{
+	if (context_guc_id_invalid(ce))
+		pin_guc_id(guc, ce);
+	guc_lrc_desc_pin(ce);
+}
+
+static inline void guc_init_lrc_mapping(struct intel_guc *guc)
+{
+	struct intel_gt *gt = guc_to_gt(guc);
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+
+	/* make sure all descriptors are clean... */
+	xa_destroy(&guc->context_lookup);
+
+	/*
+	 * Some contexts might have been pinned before we enabled GuC
+	 * submission, so we need to add them to the GuC bookkeeping.
+	 * Also, after a GuC reset we want to make sure that the information
+	 * shared with GuC is properly reset. The kernel lrcs are not attached
+	 * to the gem_context, so they need to be added separately.
+	 *
+	 * Note: we purposely do not check the error return of
+	 * guc_lrc_desc_pin, because that function can only fail in two cases.
+	 * One, if there aren't enough free IDs, but we're guaranteed to have
+	 * enough here (we're either only pinning a handful of lrc on first boot
+	 * or we're re-pinning lrcs that were already pinned before the reset).
+	 * Two, if the GuC has died and CTBs can't make forward progress.
+	 * Presumably, the GuC should be alive as this function is called on
+	 * driver load or after a reset. Even if it is dead, another full GPU
+	 * reset will be triggered and this function would be called again.
+	 */
+
+	for_each_engine(engine, gt, id)
+		if (engine->kernel_context)
+			guc_kernel_context_pin(guc, engine->kernel_context);
+}
+
 static void guc_release(struct intel_engine_cs *engine)
 {
 	engine->sanitize = NULL; /* no longer in control, nothing to sanitize */
@@ -721,6 +1224,7 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
 
 void intel_guc_submission_enable(struct intel_guc *guc)
 {
+	guc_init_lrc_mapping(guc);
 }
 
 void intel_guc_submission_disable(struct intel_guc *guc)
@@ -746,3 +1250,62 @@ void intel_guc_submission_init_early(struct intel_guc *guc)
 {
 	guc->submission_selected = __guc_submission_selected(guc);
 }
+
+static inline struct intel_context *
+g2h_context_lookup(struct intel_guc *guc, u32 desc_idx)
+{
+	struct intel_context *ce;
+
+	if (unlikely(desc_idx >= GUC_MAX_LRC_DESCRIPTORS)) {
+		drm_dbg(&guc_to_gt(guc)->i915->drm,
+			"Invalid desc_idx %u", desc_idx);
+		return NULL;
+	}
+
+	ce = __get_context(guc, desc_idx);
+	if (unlikely(!ce)) {
+		drm_dbg(&guc_to_gt(guc)->i915->drm,
+			"Context is NULL, desc_idx %u", desc_idx);
+		return NULL;
+	}
+
+	return ce;
+}
+
+int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
+					  const u32 *msg,
+					  u32 len)
+{
+	struct intel_context *ce;
+	u32 desc_idx = msg[0];
+
+	if (unlikely(len < 1)) {
+		drm_dbg(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);
+		return -EPROTO;
+	}
+
+	ce = g2h_context_lookup(guc, desc_idx);
+	if (unlikely(!ce))
+		return -EPROTO;
+
+	if (context_wait_for_deregister_to_register(ce)) {
+		struct intel_runtime_pm *runtime_pm =
+			&ce->engine->gt->i915->runtime_pm;
+		intel_wakeref_t wakeref;
+
+		/*
+		 * Previous owner of this guc_id has been deregistered, so it is
+		 * now safe to register this context.
+		 */
+		with_intel_runtime_pm(runtime_pm, wakeref)
+			register_context(ce);
+		clr_context_wait_for_deregister_to_register(ce);
+		intel_context_put(ce);
+	} else if (context_destroyed(ce)) {
+		/* Context has been destroyed */
+		release_guc_id(guc, ce);
+		lrc_destroy(&ce->ref);
+	}
+
+	return 0;
+}
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 9ffd173f8b7f..db151b522825 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -4127,6 +4127,7 @@ enum {
 	FAULT_AND_CONTINUE /* Unsupported */
 };
 
+#define CTX_GTT_ADDRESS_MASK GENMASK(31, 12)
 #define GEN8_CTX_VALID (1 << 0)
 #define GEN8_CTX_FORCE_PD_RESTORE (1 << 1)
 #define GEN8_CTX_FORCE_RESTORE (1 << 2)
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 4c0df56e3b86..56860b7d065b 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -419,6 +419,7 @@ bool i915_request_retire(struct i915_request *rq)
 	 */
 	if (!list_empty(&rq->sched.link))
 		remove_from_engine(rq);
+	atomic_dec(&rq->context->guc_id_ref);
 	GEM_BUG_ON(!llist_empty(&rq->execute_cb));
 
 	__list_del_entry(&rq->link); /* poison neither prev/next (RCU walks) */
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 47/97] drm/i915/guc: Insert fence on context when deregistering
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (45 preceding siblings ...)
  2021-05-06 19:14 ` [RFC PATCH 46/97] drm/i915/guc: Implement GuC context operations for new interface Matthew Brost
@ 2021-05-06 19:14 ` Matthew Brost
  2021-05-06 19:14 ` [RFC PATCH 48/97] drm/i915/guc: Defer context unpin until scheduling is disabled Matthew Brost
                   ` (52 subsequent siblings)
  99 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

Sometimes during context pinning, a context with the same guc_id is
already registered with the GuC. In this case a deregister must be done
before the context can be registered. A fence is inserted on all
requests while the deregister is in flight. Once the G2H is received
indicating the deregistration is complete, the context is registered
and the fence is released.
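
Taken together the two halves look roughly like the sketch below
(condensed from guc_request_alloc() and guc_signal_context_fence() in
the diff below; locking is elided):

	/* Sketch only -- condensed from the patch below, not a drop-in. */

	/* guc_request_alloc(): a deregister G2H for this guc_id is still
	 * outstanding, so hold the request's submit fence and park the
	 * request on the context. */
	i915_sw_fence_await(&rq->submit);
	list_add_tail(&rq->guc_fence_link, &ce->guc_state.fences);

	/* G2H handler: deregistration complete, the context is registered
	 * and every parked request is released. */
	list_for_each_entry(rq, &ce->guc_state.fences, guc_fence_link)
		i915_sw_fence_complete(&rq->submit);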

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context.c       |  1 +
 drivers/gpu/drm/i915/gt/intel_context_types.h |  5 ++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 51 ++++++++++++++++++-
 drivers/gpu/drm/i915/i915_request.h           |  8 +++
 4 files changed, 63 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index 2b68af16222c..f750c826e19d 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -384,6 +384,7 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
 	mutex_init(&ce->pin_mutex);
 
 	spin_lock_init(&ce->guc_state.lock);
+	INIT_LIST_HEAD(&ce->guc_state.fences);
 
 	ce->guc_id = GUC_INVALID_LRC_ID;
 	INIT_LIST_HEAD(&ce->guc_id_link);
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index ce7c69b34cd1..beafe55a9101 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -146,6 +146,11 @@ struct intel_context {
 		 * submission
 		 */
 		u8 sched_state;
+		/*
+		 * fences: maintains a list of requests that have a submit
+		 * fence related to GuC submission
+		 */
+		struct list_head fences;
 	} guc_state;
 
 	/* GuC scheduling state that does not require a lock. */
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index eada9ffc1a54..b4c439025a5f 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -927,6 +927,30 @@ static const struct intel_context_ops guc_context_ops = {
 	.destroy = guc_context_destroy,
 };
 
+static void __guc_signal_context_fence(struct intel_context *ce)
+{
+	struct i915_request *rq;
+
+	lockdep_assert_held(&ce->guc_state.lock);
+
+	list_for_each_entry(rq, &ce->guc_state.fences, guc_fence_link)
+		i915_sw_fence_complete(&rq->submit);
+
+	INIT_LIST_HEAD(&ce->guc_state.fences);
+}
+
+static void guc_signal_context_fence(struct intel_context *ce)
+{
+	unsigned long flags;
+
+	GEM_BUG_ON(!context_wait_for_deregister_to_register(ce));
+
+	spin_lock_irqsave(&ce->guc_state.lock, flags);
+	clr_context_wait_for_deregister_to_register(ce);
+	__guc_signal_context_fence(ce);
+	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+}
+
 static bool context_needs_register(struct intel_context *ce, bool new_guc_id)
 {
 	return new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
@@ -937,6 +961,7 @@ static int guc_request_alloc(struct i915_request *rq)
 {
 	struct intel_context *ce = rq->context;
 	struct intel_guc *guc = ce_to_guc(ce);
+	unsigned long flags;
 	int ret;
 
 	GEM_BUG_ON(!intel_context_is_pinned(rq->context));
@@ -981,7 +1006,7 @@ static int guc_request_alloc(struct i915_request *rq)
 	 * increment (in pin_guc_id) is needed to seal a race with unpin_guc_id.
 	 */
 	if (atomic_add_unless(&ce->guc_id_ref, 1, 0))
-		return 0;
+		goto out;
 
 	ret = pin_guc_id(guc, ce);	/* returns 1 if new guc_id assigned */
 	if (unlikely(ret < 0))
@@ -998,6 +1023,28 @@ static int guc_request_alloc(struct i915_request *rq)
 
 	clear_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
 
+out:
+	/*
+	 * We block all requests on this context if a G2H is pending for a
+	 * while this G2H is pending. Once the G2H returns, the fence that is
+	 * blocking these requests is released (see guc_signal_context_fence).
+	 * that is blocking these requests (see guc_signal_context_fence).
+	 *
+	 * possible for this field to transition from being clear to set, but the
+	 * converse is possible, hence the need for the recheck within the lock.
+	 * converse is possible, hence the need for the check within the lock.
+	 */
+	if (likely(!context_wait_for_deregister_to_register(ce)))
+		return 0;
+
+	spin_lock_irqsave(&ce->guc_state.lock, flags);
+	if (context_wait_for_deregister_to_register(ce)) {
+		i915_sw_fence_await(&rq->submit);
+
+		list_add_tail(&rq->guc_fence_link, &ce->guc_state.fences);
+	}
+	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+
 	return 0;
 }
 
@@ -1299,7 +1346,7 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 		 */
 		with_intel_runtime_pm(runtime_pm, wakeref)
 			register_context(ce);
-		clr_context_wait_for_deregister_to_register(ce);
+		guc_signal_context_fence(ce);
 		intel_context_put(ce);
 	} else if (context_destroyed(ce)) {
 		/* Context has been destroyed */
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 239964bec1fa..f870cd75a001 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -285,6 +285,14 @@ struct i915_request {
 		struct hrtimer timer;
 	} watchdog;
 
+	/*
+	 * Requests may need to be stalled when using GuC submission waiting for
+	 * certain GuC operations to complete. If that is the case, stalled
+	 * requests are added to a per context list of stalled requests. The
+	 * below list_head is the link in that list.
+	 */
+	struct list_head guc_fence_link;
+
 	I915_SELFTEST_DECLARE(struct {
 		struct list_head link;
 		unsigned long delay;
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 48/97] drm/i915/guc: Defer context unpin until scheduling is disabled
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (46 preceding siblings ...)
  2021-05-06 19:14 ` [RFC PATCH 47/97] drm/i915/guc: Insert fence on context when deregistering Matthew Brost
@ 2021-05-06 19:14 ` Matthew Brost
  2021-05-06 19:14 ` [RFC PATCH 49/97] drm/i915/guc: Disable engine barriers with GuC during unpin Matthew Brost
                   ` (51 subsequent siblings)
  99 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

With GuC scheduling, it isn't safe to unpin a context while scheduling
is enabled for that context as the GuC may touch some of the pinned
state (e.g. LRC). To ensure scheduling isn't enabled when an unpin is
done, a callback is added to intel_context_unpin when pin count == 1
to disable scheduling for that context. When the response CTB is
received it is safe to do the final unpin.

Future patches may add a heuristic / delay to schedule the disable
callback to avoid thrashing on schedule enable / disable.
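
The pin_count handling that implements this can be sketched as follows
(an annotated copy of the loop added to intel_context_unpin() below;
illustrative only):

	/* Sketch only -- annotated copy of the loop from the patch below.
	 *
	 * pin_count > 1 : plain decrement, context stays pinned.
	 * pin_count == 1: bump 1 -> 2 and ask the backend to disable
	 *                 scheduling; the extra count keeps the context
	 *                 pinned until the disable completes.
	 * response CTB  : intel_context_sched_disable_unpin() subtracts 2,
	 *                 performing the real (final) unpin.
	 */
	while (!atomic_add_unless(&ce->pin_count, -1, 1)) {
		if (atomic_cmpxchg(&ce->pin_count, 1, 2) == 1) {
			ce->ops->sched_disable(ce);
			break;
		}
	}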

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context.c       |   4 +-
 drivers/gpu/drm/i915/gt/intel_context.h       |  21 ++-
 drivers/gpu/drm/i915/gt/intel_context_types.h |   2 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   2 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |   6 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 145 +++++++++++++++++-
 6 files changed, 176 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index f750c826e19d..1499b8aace2a 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -306,9 +306,9 @@ int __intel_context_do_pin(struct intel_context *ce)
 	return err;
 }
 
-void intel_context_unpin(struct intel_context *ce)
+void __intel_context_do_unpin(struct intel_context *ce, int sub)
 {
-	if (!atomic_dec_and_test(&ce->pin_count))
+	if (!atomic_sub_and_test(sub, &ce->pin_count))
 		return;
 
 	CE_TRACE(ce, "unpin\n");
diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
index f83a73a2b39f..92ecbab8c1cd 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -113,7 +113,26 @@ static inline void __intel_context_pin(struct intel_context *ce)
 	atomic_inc(&ce->pin_count);
 }
 
-void intel_context_unpin(struct intel_context *ce);
+void __intel_context_do_unpin(struct intel_context *ce, int sub);
+
+static inline void intel_context_sched_disable_unpin(struct intel_context *ce)
+{
+	__intel_context_do_unpin(ce, 2);
+}
+
+static inline void intel_context_unpin(struct intel_context *ce)
+{
+	if (!ce->ops->sched_disable) {
+		__intel_context_do_unpin(ce, 1);
+	} else {
+		while (!atomic_add_unless(&ce->pin_count, -1, 1)) {
+			if (atomic_cmpxchg(&ce->pin_count, 1, 2) == 1) {
+				ce->ops->sched_disable(ce);
+				break;
+			}
+		}
+	}
+}
 
 void intel_context_enter_engine(struct intel_context *ce);
 void intel_context_exit_engine(struct intel_context *ce);
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index beafe55a9101..e7af6a2368f8 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -43,6 +43,8 @@ struct intel_context_ops {
 	void (*enter)(struct intel_context *ce);
 	void (*exit)(struct intel_context *ce);
 
+	void (*sched_disable)(struct intel_context *ce);
+
 	void (*reset)(struct intel_context *ce);
 	void (*destroy)(struct kref *kref);
 };
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 85ff32bfd074..55f02dd1598d 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -237,6 +237,8 @@ int intel_guc_reset_engine(struct intel_guc *guc,
 
 int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 					  const u32 *msg, u32 len);
+int intel_guc_sched_done_process_msg(struct intel_guc *guc,
+				     const u32 *msg, u32 len);
 
 void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p);
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 51c5efdf543a..8e48bf260eab 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -900,6 +900,12 @@ static int ct_process_request(struct intel_guc_ct *ct, struct ct_incoming_msg *r
 			CT_ERROR(ct, "deregister context failed %x %*ph\n",
 				  action, 4 * len, payload);
 		break;
+	case INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE:
+		ret = intel_guc_sched_done_process_msg(guc, payload, len);
+		if (unlikely(ret))
+			CT_ERROR(ct, "schedule context failed %x %*ph\n",
+				  action, 4 * len, payload);
+		break;
 	default:
 		ret = -EOPNOTSUPP;
 		break;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index b4c439025a5f..2afc49caf462 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -69,6 +69,7 @@
  * context, to be executing simultaneously.
  */
 #define SCHED_STATE_NO_LOCK_ENABLED			BIT(0)
+#define SCHED_STATE_NO_LOCK_PENDING_ENABLE		BIT(1)
 static inline bool context_enabled(struct intel_context *ce)
 {
 	return (atomic_read(&ce->guc_sched_state_no_lock) &
@@ -86,6 +87,24 @@ static inline void clr_context_enabled(struct intel_context *ce)
 		   &ce->guc_sched_state_no_lock);
 }
 
+static inline bool context_pending_enable(struct intel_context *ce)
+{
+	return (atomic_read(&ce->guc_sched_state_no_lock) &
+		SCHED_STATE_NO_LOCK_PENDING_ENABLE);
+}
+
+static inline void set_context_pending_enable(struct intel_context *ce)
+{
+	atomic_or(SCHED_STATE_NO_LOCK_PENDING_ENABLE,
+		  &ce->guc_sched_state_no_lock);
+}
+
+static inline void clr_context_pending_enable(struct intel_context *ce)
+{
+	atomic_and((u32)~SCHED_STATE_NO_LOCK_PENDING_ENABLE,
+		   &ce->guc_sched_state_no_lock);
+}
+
 /*
  * Below is a set of functions which control the GuC scheduling state which
  * require a lock, aside from the special case where the functions are called
@@ -94,6 +113,7 @@ static inline void clr_context_enabled(struct intel_context *ce)
  */
 #define SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER	BIT(0)
 #define SCHED_STATE_DESTROYED				BIT(1)
+#define SCHED_STATE_PENDING_DISABLE			BIT(2)
 static inline void init_sched_state(struct intel_context *ce)
 {
 	/* Only should be called from guc_lrc_desc_pin() */
@@ -138,6 +158,24 @@ set_context_destroyed(struct intel_context *ce)
 	ce->guc_state.sched_state |= SCHED_STATE_DESTROYED;
 }
 
+static inline bool context_pending_disable(struct intel_context *ce)
+{
+	return (ce->guc_state.sched_state & SCHED_STATE_PENDING_DISABLE);
+}
+
+static inline void set_context_pending_disable(struct intel_context *ce)
+{
+	lockdep_assert_held(&ce->guc_state.lock);
+	ce->guc_state.sched_state |= SCHED_STATE_PENDING_DISABLE;
+}
+
+static inline void clr_context_pending_disable(struct intel_context *ce)
+{
+	lockdep_assert_held(&ce->guc_state.lock);
+	ce->guc_state.sched_state =
+		(ce->guc_state.sched_state & ~SCHED_STATE_PENDING_DISABLE);
+}
+
 static inline bool context_guc_id_invalid(struct intel_context *ce)
 {
 	return (ce->guc_id == GUC_INVALID_LRC_ID);
@@ -230,6 +268,8 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
 		action[len++] = ce->guc_id;
 		action[len++] = GUC_CONTEXT_ENABLE;
+		set_context_pending_enable(ce);
+		intel_context_get(ce);
 	} else {
 		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT;
 		action[len++] = ce->guc_id;
@@ -237,8 +277,12 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 
 	err = intel_guc_send_nb(guc, action, len);
 
-	if (!enabled && !err)
+	if (!enabled && !err) {
 		set_context_enabled(ce);
+	} else if (!enabled) {
+		clr_context_pending_enable(ce);
+		intel_context_put(ce);
+	}
 
 	return err;
 }
@@ -834,6 +878,60 @@ static void guc_context_post_unpin(struct intel_context *ce)
 	lrc_post_unpin(ce);
 }
 
+static void __guc_context_sched_disable(struct intel_guc *guc,
+					struct intel_context *ce,
+					u16 guc_id)
+{
+	u32 action[] = {
+		INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET,
+		guc_id,	/* ce->guc_id not stable */
+		GUC_CONTEXT_DISABLE
+	};
+
+	GEM_BUG_ON(guc_id == GUC_INVALID_LRC_ID);
+
+	intel_context_get(ce);
+
+	intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
+}
+
+static u16 prep_context_pending_disable(struct intel_context *ce)
+{
+	set_context_pending_disable(ce);
+	clr_context_enabled(ce);
+
+	return ce->guc_id;
+}
+
+static void guc_context_sched_disable(struct intel_context *ce)
+{
+	struct intel_guc *guc = ce_to_guc(ce);
+	struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm;
+	unsigned long flags;
+	u16 guc_id;
+	intel_wakeref_t wakeref;
+
+	if (context_guc_id_invalid(ce) ||
+	    !lrc_desc_registered(guc, ce->guc_id)) {
+		clr_context_enabled(ce);
+		goto unpin;
+	}
+
+	if (!context_enabled(ce))
+		goto unpin;
+
+	spin_lock_irqsave(&ce->guc_state.lock, flags);
+	guc_id = prep_context_pending_disable(ce);
+	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+
+	with_intel_runtime_pm(runtime_pm, wakeref)
+		__guc_context_sched_disable(guc, ce, guc_id);
+
+	return;
+unpin:
+	intel_context_sched_disable_unpin(ce);
+}
+
 static inline void guc_lrc_desc_unpin(struct intel_context *ce)
 {
 	struct intel_engine_cs *engine = ce->engine;
@@ -842,6 +940,7 @@ static inline void guc_lrc_desc_unpin(struct intel_context *ce)
 
 	GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id));
 	GEM_BUG_ON(ce != __get_context(guc, ce->guc_id));
+	GEM_BUG_ON(context_enabled(ce));
 
 	spin_lock_irqsave(&ce->guc_state.lock, flags);
 	set_context_destroyed(ce);
@@ -923,6 +1022,8 @@ static const struct intel_context_ops guc_context_ops = {
 	.enter = intel_context_enter_engine,
 	.exit = intel_context_exit_engine,
 
+	.sched_disable = guc_context_sched_disable,
+
 	.reset = lrc_reset,
 	.destroy = guc_context_destroy,
 };
@@ -1356,3 +1457,45 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 
 	return 0;
 }
+
+int intel_guc_sched_done_process_msg(struct intel_guc *guc,
+				     const u32 *msg,
+				     u32 len)
+{
+	struct intel_context *ce;
+	unsigned long flags;
+	u32 desc_idx = msg[0];
+
+	if (unlikely(len < 2)) {
+		drm_dbg(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);
+		return -EPROTO;
+	}
+
+	ce = g2h_context_lookup(guc, desc_idx);
+	if (unlikely(!ce))
+		return -EPROTO;
+
+	if (unlikely(context_destroyed(ce) ||
+		     (!context_pending_enable(ce) &&
+		     !context_pending_disable(ce)))) {
+		drm_dbg(&guc_to_gt(guc)->i915->drm,
+			"Bad context sched_state 0x%x, 0x%x, desc_idx %u",
+			atomic_read(&ce->guc_sched_state_no_lock),
+			ce->guc_state.sched_state, desc_idx);
+		return -EPROTO;
+	}
+
+	if (context_pending_enable(ce)) {
+		clr_context_pending_enable(ce);
+	} else if (context_pending_disable(ce)) {
+		intel_context_sched_disable_unpin(ce);
+
+		spin_lock_irqsave(&ce->guc_state.lock, flags);
+		clr_context_pending_disable(ce);
+		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+	}
+
+	intel_context_put(ce);
+
+	return 0;
+}
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 49/97] drm/i915/guc: Disable engine barriers with GuC during unpin
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (47 preceding siblings ...)
  2021-05-06 19:14 ` [RFC PATCH 48/97] drm/i915/guc: Defer context unpin until scheduling is disabled Matthew Brost
@ 2021-05-06 19:14 ` Matthew Brost
  2021-05-11 15:37   ` Daniel Vetter
  2021-05-26 10:26   ` [Intel-gfx] " Tvrtko Ursulin
  2021-05-06 19:14 ` [RFC PATCH 50/97] drm/i915/guc: Extend deregistration fence to schedule disable Matthew Brost
                   ` (50 subsequent siblings)
  99 siblings, 2 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

Disable engine barriers for unpinning with GuC. This feature isn't
needed with the GuC as it disables context scheduling before unpinning,
which guarantees the HW will not reference the context. Hence it is
not necessary to defer unpinning until a kernel context request
completes on each engine in the context engine mask.
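
For illustration only, not part of the patch: a minimal standalone C
model of the decision intel_context_active_acquire() makes after this
change. The type and helper names below are hypothetical stand-ins, not
driver API.

#include <stdbool.h>

/* Hypothetical stand-ins for the driver's context/engine queries. */
struct ctx_model {
	bool is_barrier;	/* kernel (barrier) context */
	bool engine_uses_guc;	/* backend is GuC submission */
};

/*
 * Idle-barrier tracking nodes only need preallocating when neither
 * condition holds; with GuC submission the schedule-disable handshake
 * already guarantees the HW is done with the context before unpin.
 */
static bool needs_idle_barriers(const struct ctx_model *ce)
{
	return !ce->is_barrier && !ce->engine_uses_guc;
}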

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context.c    |  2 +-
 drivers/gpu/drm/i915/gt/intel_context.h    |  1 +
 drivers/gpu/drm/i915/gt/selftest_context.c | 10 ++++++++++
 drivers/gpu/drm/i915/i915_active.c         |  3 +++
 4 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index 1499b8aace2a..7f97753ab164 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -80,7 +80,7 @@ static int intel_context_active_acquire(struct intel_context *ce)
 
 	__i915_active_acquire(&ce->active);
 
-	if (intel_context_is_barrier(ce))
+	if (intel_context_is_barrier(ce) || intel_engine_uses_guc(ce->engine))
 		return 0;
 
 	/* Preallocate tracking nodes */
diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
index 92ecbab8c1cd..9b211ca5ecc7 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -16,6 +16,7 @@
 #include "intel_engine_types.h"
 #include "intel_ring_types.h"
 #include "intel_timeline_types.h"
+#include "uc/intel_guc_submission.h"
 
 #define CE_TRACE(ce, fmt, ...) do {					\
 	const struct intel_context *ce__ = (ce);			\
diff --git a/drivers/gpu/drm/i915/gt/selftest_context.c b/drivers/gpu/drm/i915/gt/selftest_context.c
index 26685b927169..fa7b99a671dd 100644
--- a/drivers/gpu/drm/i915/gt/selftest_context.c
+++ b/drivers/gpu/drm/i915/gt/selftest_context.c
@@ -209,7 +209,13 @@ static int __live_active_context(struct intel_engine_cs *engine)
 	 * This test makes sure that the context is kept alive until a
 	 * subsequent idle-barrier (emitted when the engine wakeref hits 0
 	 * with no more outstanding requests).
+	 *
+	 * In GuC submission mode we don't use idle barriers and we instead
+	 * get a message from the GuC to signal that it is safe to unpin the
+	 * context from memory.
 	 */
+	if (intel_engine_uses_guc(engine))
+		return 0;
 
 	if (intel_engine_pm_is_awake(engine)) {
 		pr_err("%s is awake before starting %s!\n",
@@ -357,7 +363,11 @@ static int __live_remote_context(struct intel_engine_cs *engine)
 	 * on the context image remotely (intel_context_prepare_remote_request),
 	 * which inserts foreign fences into intel_context.active, does not
 	 * clobber the idle-barrier.
+	 *
+	 * In GuC submission mode we don't use idle barriers.
 	 */
+	if (intel_engine_uses_guc(engine))
+		return 0;
 
 	if (intel_engine_pm_is_awake(engine)) {
 		pr_err("%s is awake before starting %s!\n",
diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
index b1aa1c482c32..9a264898bb91 100644
--- a/drivers/gpu/drm/i915/i915_active.c
+++ b/drivers/gpu/drm/i915/i915_active.c
@@ -968,6 +968,9 @@ void i915_active_acquire_barrier(struct i915_active *ref)
 
 	GEM_BUG_ON(i915_active_is_idle(ref));
 
+	if (llist_empty(&ref->preallocated_barriers))
+		return;
+
 	/*
 	 * Transfer the list of preallocated barriers into the
 	 * i915_active rbtree, but only as proto-nodes. They will be
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 50/97] drm/i915/guc: Extend deregistration fence to schedule disable
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (48 preceding siblings ...)
  2021-05-06 19:14 ` [RFC PATCH 49/97] drm/i915/guc: Disable engine barriers with GuC during unpin Matthew Brost
@ 2021-05-06 19:14 ` Matthew Brost
  2021-05-06 19:14 ` [RFC PATCH 51/97] drm/i915: Disable preempt busywait when using GuC scheduling Matthew Brost
                   ` (49 subsequent siblings)
  99 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

Extend the deregistration context fence to fence when a GuC context has
a schedule disable pending.
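
A rough standalone sketch of the extended fence condition in
guc_request_alloc() (the names below are hypothetical models, not the
driver's types):

#include <stdbool.h>

/* Hypothetical per-context bits mirroring the driver's GuC sched state. */
struct guc_ctx_model {
	bool wait_for_deregister_to_register;
	bool pending_disable;
};

/*
 * A new request must wait on the context fence while either G2H
 * (deregister done or schedule-disable done) is outstanding, since the
 * GuC would reject the corresponding H2G until the ack arrives.
 */
static bool request_must_wait(const struct guc_ctx_model *s)
{
	return s->wait_for_deregister_to_register || s->pending_disable;
}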

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 37 +++++++++++++++----
 1 file changed, 30 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 2afc49caf462..885f14bfe3b9 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -921,7 +921,19 @@ static void guc_context_sched_disable(struct intel_context *ce)
 		goto unpin;
 
 	spin_lock_irqsave(&ce->guc_state.lock, flags);
+
+	/*
+	 * We have to check if the context has been pinned again as another pin
+	 * operation is allowed to pass this function. Checking the pin count
+	 * here synchronizes this function with guc_request_alloc ensuring a
+	 * request doesn't slip through the 'context_pending_disable' fence.
+	 */
+	if (unlikely(atomic_add_unless(&ce->pin_count, -2, 2))) {
+		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+		return;
+	}
 	guc_id = prep_context_pending_disable(ce);
+
 	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 
 	with_intel_runtime_pm(runtime_pm, wakeref)
@@ -1127,19 +1139,22 @@ static int guc_request_alloc(struct i915_request *rq)
 out:
 	/*
 	 * We block all requests on this context if a G2H is pending for a
-	 * context deregistration as the GuC will fail a context registration
-	 * while this G2H is pending. Once a G2H returns, the fence is released
-	 * that is blocking these requests (see guc_signal_context_fence).
+	 * schedule disable or context deregistration as the GuC will fail a
+	 * schedule enable or context registration if either G2H is pending
+	 * respectfully. Once a G2H returns, the fence is released that is
+	 * blocking these requests (see guc_signal_context_fence).
 	 *
-	 * We can safely check the below field outside of the lock as it isn't
-	 * possible for this field to transition from being clear to set but
+	 * We can safely check the below fields outside of the lock as it isn't
+	 * possible for these fields to transition from being clear to set but
 	 * converse is possible, hence the need for the check within the lock.
 	 */
-	if (likely(!context_wait_for_deregister_to_register(ce)))
+	if (likely(!context_wait_for_deregister_to_register(ce) &&
+		   !context_pending_disable(ce)))
 		return 0;
 
 	spin_lock_irqsave(&ce->guc_state.lock, flags);
-	if (context_wait_for_deregister_to_register(ce)) {
+	if (context_wait_for_deregister_to_register(ce) ||
+	    context_pending_disable(ce)) {
 		i915_sw_fence_await(&rq->submit);
 
 		list_add_tail(&rq->guc_fence_link, &ce->guc_state.fences);
@@ -1488,10 +1503,18 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
 	if (context_pending_enable(ce)) {
 		clr_context_pending_enable(ce);
 	} else if (context_pending_disable(ce)) {
+		/*
+		 * Unpin must be done before __guc_signal_context_fence,
+		 * otherwise a race exists between the requests getting
+		 * submitted + retired before this unpin completes resulting in
+		 * the pin_count going to zero and the context still being
+		 * enabled.
+		 */
 		intel_context_sched_disable_unpin(ce);
 
 		spin_lock_irqsave(&ce->guc_state.lock, flags);
 		clr_context_pending_disable(ce);
+		__guc_signal_context_fence(ce);
 		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 	}
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 51/97] drm/i915: Disable preempt busywait when using GuC scheduling
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (49 preceding siblings ...)
  2021-05-06 19:14 ` [RFC PATCH 50/97] drm/i915/guc: Extend deregistration fence to schedule disable Matthew Brost
@ 2021-05-06 19:14 ` Matthew Brost
  2021-05-06 19:14 ` [RFC PATCH 52/97] drm/i915/guc: Ensure request ordering via completion fences Matthew Brost
                   ` (48 subsequent siblings)
  99 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

Disable preempt busywait when using GuC scheduling. This isn't needed
as the GuC controls preemption when scheduling.
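
A tiny illustrative model of the breadcrumb decision after this patch
(hypothetical names, assuming only the two capability flags shown):

#include <stdbool.h>

/* Hypothetical engine capabilities standing in for the real helpers. */
struct engine_model {
	bool has_semaphores;
	bool uses_guc_submission;
};

/*
 * The preempt busywait is only emitted for the execlists backend; with
 * GuC submission the firmware decides when to preempt, so the busywait
 * in the fini breadcrumb is skipped.
 */
static bool emit_preempt_busywait_model(const struct engine_model *e)
{
	return e->has_semaphores && !e->uses_guc_submission;
}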

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
index 732c2ed1d933..47500ee955d4 100644
--- a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
@@ -506,7 +506,8 @@ gen8_emit_fini_breadcrumb_tail(struct i915_request *rq, u32 *cs)
 	*cs++ = MI_USER_INTERRUPT;
 
 	*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
-	if (intel_engine_has_semaphores(rq->engine))
+	if (intel_engine_has_semaphores(rq->engine) &&
+	    !intel_uc_uses_guc_submission(&rq->engine->gt->uc))
 		cs = emit_preempt_busywait(rq, cs);
 
 	rq->tail = intel_ring_offset(rq, cs);
@@ -598,7 +599,8 @@ gen12_emit_fini_breadcrumb_tail(struct i915_request *rq, u32 *cs)
 	*cs++ = MI_USER_INTERRUPT;
 
 	*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
-	if (intel_engine_has_semaphores(rq->engine))
+	if (intel_engine_has_semaphores(rq->engine) &&
+	    !intel_uc_uses_guc_submission(&rq->engine->gt->uc))
 		cs = gen12_emit_preempt_busywait(rq, cs);
 
 	rq->tail = intel_ring_offset(rq, cs);
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 52/97] drm/i915/guc: Ensure request ordering via completion fences
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (50 preceding siblings ...)
  2021-05-06 19:14 ` [RFC PATCH 51/97] drm/i915: Disable preempt busywait when using GuC scheduling Matthew Brost
@ 2021-05-06 19:14 ` Matthew Brost
  2021-05-06 19:14 ` [RFC PATCH 53/97] drm/i915/guc: Disable semaphores when using GuC scheduling Matthew Brost
                   ` (47 subsequent siblings)
  99 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

If two requests are on the same ring, they are explicitly ordered by the
HW, so a submission fence is sufficient to ensure ordering when using the
new GuC submission interface. Conversely, if two requests share a
timeline and are on the same physical engine but in different contexts,
the HW does not ensure ordering with the new GuC submission interface, so
a completion fence needs to be used to ensure ordering.
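
A standalone sketch of the ordering rule described above, as applied in
__i915_request_add_to_timeline() when a request queues behind an
incomplete previous request on the same timeline (the types below are
hypothetical models, not the driver's structures):

enum fence_kind { SUBMIT_FENCE, COMPLETION_FENCE };

/* Minimal request model: only the owning context matters here. */
struct req_model {
	const void *context;
};

/*
 * With GuC submission only requests in the same context (and therefore
 * on the same ring) are implicitly ordered by the HW; anything else
 * must wait on completion rather than submission.
 */
static enum fence_kind guc_timeline_fence(const struct req_model *prev,
					  const struct req_model *rq)
{
	return prev->context == rq->context ? SUBMIT_FENCE : COMPLETION_FENCE;
}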

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c   |  1 -
 drivers/gpu/drm/i915/i915_request.c             | 17 +++++++++++++----
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 885f14bfe3b9..580535b02eb1 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -929,7 +929,6 @@ static void guc_context_sched_disable(struct intel_context *ce)
 	 * request doesn't slip through the 'context_pending_disable' fence.
 	 */
 	if (unlikely(atomic_add_unless(&ce->pin_count, -2, 2))) {
-		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 		return;
 	}
 	guc_id = prep_context_pending_disable(ce);
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 56860b7d065b..3a8f6ec0c32d 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -444,6 +444,7 @@ void i915_request_retire_upto(struct i915_request *rq)
 
 	do {
 		tmp = list_first_entry(&tl->requests, typeof(*tmp), link);
+		GEM_BUG_ON(!i915_request_completed(tmp));
 	} while (i915_request_retire(tmp) && tmp != rq);
 }
 
@@ -1405,6 +1406,9 @@ i915_request_await_external(struct i915_request *rq, struct dma_fence *fence)
 	return err;
 }
 
+static int
+i915_request_await_request(struct i915_request *to, struct i915_request *from);
+
 int
 i915_request_await_execution(struct i915_request *rq,
 			     struct dma_fence *fence,
@@ -1464,12 +1468,13 @@ await_request_submit(struct i915_request *to, struct i915_request *from)
 	 * the waiter to be submitted immediately to the physical engine
 	 * as it may then bypass the virtual request.
 	 */
-	if (to->engine == READ_ONCE(from->engine))
+	if (to->engine == READ_ONCE(from->engine)) {
 		return i915_sw_fence_await_sw_fence_gfp(&to->submit,
 							&from->submit,
 							I915_FENCE_GFP);
-	else
+	} else {
 		return __i915_request_await_execution(to, from, NULL);
+	}
 }
 
 static int
@@ -1493,7 +1498,8 @@ i915_request_await_request(struct i915_request *to, struct i915_request *from)
 			return ret;
 	}
 
-	if (is_power_of_2(to->execution_mask | READ_ONCE(from->execution_mask)))
+	if (!intel_engine_uses_guc(to->engine) &&
+	    is_power_of_2(to->execution_mask | READ_ONCE(from->execution_mask)))
 		ret = await_request_submit(to, from);
 	else
 		ret = emit_semaphore_wait(to, from, I915_FENCE_GFP);
@@ -1654,6 +1660,8 @@ __i915_request_add_to_timeline(struct i915_request *rq)
 	prev = to_request(__i915_active_fence_set(&timeline->last_request,
 						  &rq->fence));
 	if (prev && !__i915_request_is_complete(prev)) {
+		bool uses_guc = intel_engine_uses_guc(rq->engine);
+
 		/*
 		 * The requests are supposed to be kept in order. However,
 		 * we need to be wary in case the timeline->last_request
@@ -1664,7 +1672,8 @@ __i915_request_add_to_timeline(struct i915_request *rq)
 			   i915_seqno_passed(prev->fence.seqno,
 					     rq->fence.seqno));
 
-		if (is_power_of_2(READ_ONCE(prev->engine)->mask | rq->engine->mask))
+		if ((!uses_guc && is_power_of_2(READ_ONCE(prev->engine)->mask | rq->engine->mask)) ||
+		    (uses_guc && prev->context == rq->context))
 			i915_sw_fence_await_sw_fence(&rq->submit,
 						     &prev->submit,
 						     &rq->submitq);
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 53/97] drm/i915/guc: Disable semaphores when using GuC scheduling
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (51 preceding siblings ...)
  2021-05-06 19:14 ` [RFC PATCH 52/97] drm/i915/guc: Ensure request ordering via completion fences Matthew Brost
@ 2021-05-06 19:14 ` Matthew Brost
  2021-05-25  9:52   ` [Intel-gfx] " Tvrtko Ursulin
  2021-05-06 19:14 ` [RFC PATCH 54/97] drm/i915/guc: Ensure G2H response has space in buffer Matthew Brost
                   ` (46 subsequent siblings)
  99 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

Disable semaphores when using GuC scheduling as semaphores are broken in
the current GuC firmware.

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 993faa213b41..d30260ffe2a7 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -230,7 +230,8 @@ static void intel_context_set_gem(struct intel_context *ce,
 		ce->timeline = intel_timeline_get(ctx->timeline);
 
 	if (ctx->sched.priority >= I915_PRIORITY_NORMAL &&
-	    intel_engine_has_timeslices(ce->engine))
+	    intel_engine_has_timeslices(ce->engine) &&
+	    intel_engine_has_semaphores(ce->engine))
 		__set_bit(CONTEXT_USE_SEMAPHORES, &ce->flags);
 
 	intel_context_set_watchdog_us(ce, ctx->watchdog.timeout_us);
@@ -1939,7 +1940,8 @@ static int __apply_priority(struct intel_context *ce, void *arg)
 	if (!intel_engine_has_timeslices(ce->engine))
 		return 0;
 
-	if (ctx->sched.priority >= I915_PRIORITY_NORMAL)
+	if (ctx->sched.priority >= I915_PRIORITY_NORMAL &&
+	    intel_engine_has_semaphores(ce->engine))
 		intel_context_set_use_semaphores(ce);
 	else
 		intel_context_clear_use_semaphores(ce);
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 54/97] drm/i915/guc: Ensure G2H response has space in buffer
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (52 preceding siblings ...)
  2021-05-06 19:14 ` [RFC PATCH 53/97] drm/i915/guc: Disable semaphores when using GuC scheduling Matthew Brost
@ 2021-05-06 19:14 ` Matthew Brost
  2021-05-06 19:14 ` [RFC PATCH 55/97] drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC Matthew Brost
                   ` (45 subsequent siblings)
  99 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

Ensure the G2H response has space in the buffer before sending an H2G
CTB, as the GuC can't handle any backpressure on the G2H interface.
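
A toy model of the credit accounting this patch introduces (plain ints
here instead of the driver's atomics; the names are hypothetical):

#include <stdbool.h>

/* Free dwords set aside in the G2H buffer for expected responses. */
struct g2h_credits {
	int space_dw;
};

/* Before an H2G that will generate a G2H, reserve room for the reply. */
static bool g2h_try_reserve(struct g2h_credits *c, int g2h_len_dw)
{
	if (g2h_len_dw && c->space_dw < g2h_len_dw)
		return false;	/* caller backs off and retries */
	c->space_dw -= g2h_len_dw;
	return true;
}

/* When the matching G2H is consumed, return the space to the pool. */
static void g2h_release(struct g2h_credits *c, int g2h_len_dw)
{
	c->space_dw += g2h_len_dw;
}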

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        | 13 +++-
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     | 74 +++++++++++++++----
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h     |  4 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |  4 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 13 ++--
 5 files changed, 85 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 55f02dd1598d..485e98f3f304 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -96,11 +96,17 @@ inline int intel_guc_send(struct intel_guc *guc, const u32 *action, u32 len)
 }
 
 #define INTEL_GUC_SEND_NB		BIT(31)
+#define INTEL_GUC_SEND_G2H_DW_SHIFT	0
+#define INTEL_GUC_SEND_G2H_DW_MASK	(0xff << INTEL_GUC_SEND_G2H_DW_SHIFT)
+#define MAKE_SEND_FLAGS(len) \
+	({GEM_BUG_ON(!FIELD_FIT(INTEL_GUC_SEND_G2H_DW_MASK, len)); \
+	(FIELD_PREP(INTEL_GUC_SEND_G2H_DW_MASK, len) | INTEL_GUC_SEND_NB);})
 static
-inline int intel_guc_send_nb(struct intel_guc *guc, const u32 *action, u32 len)
+inline int intel_guc_send_nb(struct intel_guc *guc, const u32 *action, u32 len,
+			     u32 g2h_len_dw)
 {
 	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0,
-				 INTEL_GUC_SEND_NB);
+				 MAKE_SEND_FLAGS(g2h_len_dw));
 }
 
 static inline int
@@ -114,6 +120,7 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len,
 static inline int intel_guc_send_busy_loop(struct intel_guc* guc,
 					   const u32 *action,
 					   u32 len,
+					   u32 g2h_len_dw,
 					   bool loop)
 {
 	int err;
@@ -122,7 +129,7 @@ static inline int intel_guc_send_busy_loop(struct intel_guc* guc,
 	might_sleep_if(loop && (!in_atomic() && !irqs_disabled()));
 
 retry:
-	err = intel_guc_send_nb(guc, action, len);
+	err = intel_guc_send_nb(guc, action, len, g2h_len_dw);
 	if (unlikely(err == -EBUSY && loop)) {
 		if (likely(!in_atomic() && !irqs_disabled()))
 			cond_resched();
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 8e48bf260eab..f1893030ca88 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -73,6 +73,7 @@ static inline struct drm_device *ct_to_drm(struct intel_guc_ct *ct)
 #define CTB_DESC_SIZE		ALIGN(sizeof(struct guc_ct_buffer_desc), SZ_2K)
 #define CTB_H2G_BUFFER_SIZE	(SZ_4K)
 #define CTB_G2H_BUFFER_SIZE	(4 * CTB_H2G_BUFFER_SIZE)
+#define G2H_ROOM_BUFFER_SIZE	(PAGE_SIZE)
 
 #define MAX_US_STALL_CTB	1000000
 
@@ -131,23 +132,27 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc)
 
 static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb)
 {
+	u32 space;
+
 	ctb->broken = false;
 	ctb->tail = 0;
 	ctb->head = 0;
-	ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
+	space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size) - ctb->resv_space;
+	atomic_set(&ctb->space, space);
 
 	guc_ct_buffer_desc_init(ctb->desc);
 }
 
 static void guc_ct_buffer_init(struct intel_guc_ct_buffer *ctb,
 			       struct guc_ct_buffer_desc *desc,
-			       u32 *cmds, u32 size_in_bytes)
+			       u32 *cmds, u32 size_in_bytes, u32 resv_space)
 {
 	GEM_BUG_ON(size_in_bytes % 4);
 
 	ctb->desc = desc;
 	ctb->cmds = cmds;
 	ctb->size = size_in_bytes / 4;
+	ctb->resv_space = resv_space / 4;
 
 	guc_ct_buffer_reset(ctb);
 }
@@ -228,6 +233,7 @@ int intel_guc_ct_init(struct intel_guc_ct *ct)
 	struct guc_ct_buffer_desc *desc;
 	u32 blob_size;
 	u32 cmds_size;
+	u32 resv_space;
 	void *blob;
 	u32 *cmds;
 	int err;
@@ -252,19 +258,21 @@ int intel_guc_ct_init(struct intel_guc_ct *ct)
 	desc = blob;
 	cmds = blob + 2 * CTB_DESC_SIZE;
 	cmds_size = CTB_H2G_BUFFER_SIZE;
-	CT_DEBUG(ct, "%s desc %#lx cmds %#lx size %u\n", "send",
-		 ptrdiff(desc, blob), ptrdiff(cmds, blob), cmds_size);
+	resv_space = 0;
+	CT_DEBUG(ct, "%s desc %#lx cmds %#lx size %u/%u\n", "send",
+		 ptrdiff(desc, blob), ptrdiff(cmds, blob), cmds_size, resv_space);
 
-	guc_ct_buffer_init(&ct->ctbs.send, desc, cmds, cmds_size);
+	guc_ct_buffer_init(&ct->ctbs.send, desc, cmds, cmds_size, resv_space);
 
 	/* store pointers to desc and cmds for recv ctb */
 	desc = blob + CTB_DESC_SIZE;
 	cmds = blob + 2 * CTB_DESC_SIZE + CTB_H2G_BUFFER_SIZE;
 	cmds_size = CTB_G2H_BUFFER_SIZE;
-	CT_DEBUG(ct, "%s desc %#lx cmds %#lx size %u\n", "recv",
-		 ptrdiff(desc, blob), ptrdiff(cmds, blob), cmds_size);
+	resv_space = G2H_ROOM_BUFFER_SIZE;
+	CT_DEBUG(ct, "%s desc %#lx cmds %#lx size %u/%u\n", "recv",
+		 ptrdiff(desc, blob), ptrdiff(cmds, blob), cmds_size, resv_space);
 
-	guc_ct_buffer_init(&ct->ctbs.recv, desc, cmds, cmds_size);
+	guc_ct_buffer_init(&ct->ctbs.recv, desc, cmds, cmds_size, resv_space);
 
 	return 0;
 }
@@ -450,7 +458,7 @@ static int ct_write(struct intel_guc_ct *ct,
 	/* now update descriptor */
 	ctb->tail = tail;
 	WRITE_ONCE(desc->tail, tail);
-	ctb->space -= len + 1;
+	atomic_sub(len + 1, &ctb->space);
 
 	return 0;
 
@@ -514,13 +522,34 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct)
 	return ret;
 }
 
+static inline bool g2h_has_room(struct intel_guc_ct *ct, u32 g2h_len_dw)
+{
+	struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv;
+
+	/*
+	 * We leave a certain amount of space in the G2H CTB buffer for
+	 * unexpected G2H CTBs (e.g. logging, engine hang, etc...)
+	 */
+	return !g2h_len_dw || atomic_read(&ctb->space) >= g2h_len_dw;
+}
+
+static inline void g2h_reserve_space(struct intel_guc_ct *ct, u32 g2h_len_dw)
+{
+	lockdep_assert_held(&ct->ctbs.send.lock);
+
+	GEM_BUG_ON(!g2h_has_room(ct, g2h_len_dw));
+
+	if (g2h_len_dw)
+		atomic_sub(g2h_len_dw, &ct->ctbs.recv.space);
+}
+
 static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw)
 {
 	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
 	u32 head;
 	u32 space;
 
-	if (ctb->space >= len_dw)
+	if (atomic_read(&ctb->space) >= len_dw)
 		return true;
 
 	head = READ_ONCE(ctb->desc->head);
@@ -533,16 +562,16 @@ static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw)
 	}
 
 	space = CIRC_SPACE(ctb->tail, head, ctb->size);
-	ctb->space = space;
+	atomic_set(&ctb->space, space);
 
 	return space >= len_dw;
 }
 
-static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
+static int has_room_nb(struct intel_guc_ct *ct, u32 h2g_dw, u32 g2h_dw)
 {
 	lockdep_assert_held(&ct->ctbs.send.lock);
 
-	if (unlikely(!h2g_has_room(ct, len_dw))) {
+	if (unlikely(!h2g_has_room(ct, h2g_dw) || !g2h_has_room(ct, g2h_dw))) {
 		if (ct->stall_time == KTIME_MAX)
 			ct->stall_time = ktime_get();
 
@@ -556,6 +585,9 @@ static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
 	return 0;
 }
 
+#define G2H_LEN_DW(f) \
+	FIELD_GET(INTEL_GUC_SEND_G2H_DW_MASK, f) ? \
+	FIELD_GET(INTEL_GUC_SEND_G2H_DW_MASK, f) + GUC_CTB_HXG_MSG_MIN_LEN : 0
 static int ct_send_nb(struct intel_guc_ct *ct,
 		      const u32 *action,
 		      u32 len,
@@ -563,12 +595,13 @@ static int ct_send_nb(struct intel_guc_ct *ct,
 {
 	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
 	unsigned long spin_flags;
+	u32 g2h_len_dw = G2H_LEN_DW(flags);
 	u32 fence;
 	int ret;
 
 	spin_lock_irqsave(&ctb->lock, spin_flags);
 
-	ret = has_room_nb(ct, len + 1);
+	ret = has_room_nb(ct, len + 1, g2h_len_dw);
 	if (unlikely(ret))
 		goto out;
 
@@ -577,6 +610,7 @@ static int ct_send_nb(struct intel_guc_ct *ct,
 	if (unlikely(ret))
 		goto out;
 
+	g2h_reserve_space(ct, g2h_len_dw);
 	intel_guc_notify(ct_to_guc(ct));
 
 out:
@@ -963,10 +997,22 @@ static void ct_incoming_request_worker_func(struct work_struct *w)
 static int ct_handle_event(struct intel_guc_ct *ct, struct ct_incoming_msg *request)
 {
 	const u32 *hxg = &request->msg[GUC_CTB_MSG_MIN_LEN];
+	u32 action = FIELD_GET(GUC_HXG_EVENT_MSG_0_ACTION, hxg[0]);
 	unsigned long flags;
 
 	GEM_BUG_ON(FIELD_GET(GUC_HXG_MSG_0_TYPE, hxg[0]) != GUC_HXG_TYPE_EVENT);
 
+	/*
+	 * Adjusting the space must be done in IRQ or deadlock can occur as the
+	 * CTB processing in the below workqueue can send CTBs which creates a
+	 * circular dependency if the space was returned there.
+	 */
+	switch (action) {
+	case INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE:
+	case INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE:
+		atomic_add(request->size, &ct->ctbs.recv.space);
+	}
+
 	spin_lock_irqsave(&ct->requests.lock, flags);
 	list_add_tail(&request->link, &ct->requests.incoming);
 	spin_unlock_irqrestore(&ct->requests.lock, flags);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
index 9924335e2ee6..660bf37238e2 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
@@ -33,6 +33,7 @@ struct intel_guc;
  * @desc: pointer to the buffer descriptor
  * @cmds: pointer to the commands buffer
  * @size: size of the commands buffer in dwords
+ * @resv_space: reserved space in buffer in dwords
  * @head: local shadow copy of head in dwords
  * @tail: local shadow copy of tail in dwords
  * @space: local shadow copy of space in dwords
@@ -43,9 +44,10 @@ struct intel_guc_ct_buffer {
 	struct guc_ct_buffer_desc *desc;
 	u32 *cmds;
 	u32 size;
+	u32 resv_space;
 	u32 tail;
 	u32 head;
-	u32 space;
+	atomic_t space;
 	bool broken;
 };
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
index 1dd2f04c2762..9c258f9546af 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
@@ -17,6 +17,10 @@
 #include "abi/guc_messages_abi.h"
 #include "gt/intel_engine_types.h"
 
+/* Payload length only i.e. don't include G2H header length */
+#define G2H_LEN_DW_SCHED_CONTEXT_MODE_SET	2
+#define G2H_LEN_DW_DEREGISTER_CONTEXT		1
+
 #define GUC_CONTEXT_DISABLE		0
 #define GUC_CONTEXT_ENABLE		1
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 580535b02eb1..ae0b386467e3 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -259,6 +259,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 	struct intel_context *ce = rq->context;
 	u32 action[3];
 	int len = 0;
+	u32 g2h_len_dw = 0;
 	bool enabled = context_enabled(ce);
 
 	GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
@@ -270,13 +271,13 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 		action[len++] = GUC_CONTEXT_ENABLE;
 		set_context_pending_enable(ce);
 		intel_context_get(ce);
+		g2h_len_dw = G2H_LEN_DW_SCHED_CONTEXT_MODE_SET;
 	} else {
 		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT;
 		action[len++] = ce->guc_id;
 	}
 
-	err = intel_guc_send_nb(guc, action, len);
-
+	err = intel_guc_send_nb(guc, action, len, g2h_len_dw);
 	if (!enabled && !err) {
 		set_context_enabled(ce);
 	} else if (!enabled) {
@@ -733,7 +734,7 @@ static int __guc_action_register_context(struct intel_guc *guc,
 		offset,
 	};
 
-	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
+	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
 }
 
 static int register_context(struct intel_context *ce)
@@ -753,7 +754,8 @@ static int __guc_action_deregister_context(struct intel_guc *guc,
 		guc_id,
 	};
 
-	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
+	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action),
+					G2H_LEN_DW_DEREGISTER_CONTEXT, true);
 }
 
 static int deregister_context(struct intel_context *ce, u32 guc_id)
@@ -892,7 +894,8 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
 
 	intel_context_get(ce);
 
-	intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
+	intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action),
+				 G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
 }
 
 static u16 prep_context_pending_disable(struct intel_context *ce)
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 55/97] drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (53 preceding siblings ...)
  2021-05-06 19:14 ` [RFC PATCH 54/97] drm/i915/guc: Ensure G2H response has space in buffer Matthew Brost
@ 2021-05-06 19:14 ` Matthew Brost
  2021-05-25 10:06   ` [Intel-gfx] " Tvrtko Ursulin
  2021-05-06 19:14 ` [RFC PATCH 56/97] drm/i915/guc: Update GuC debugfs to support new GuC Matthew Brost
                   ` (44 subsequent siblings)
  99 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

When running the GuC, the GPU can't be considered idle if the GuC still
has contexts pinned. As such, a call has been added in
intel_gt_wait_for_idle to idle the uC, and in turn the GuC, by waiting
for the number of outstanding submission G2H messages, i.e. contexts
still to be unpinned, to go to zero.
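
As a rough standalone model of the new idle condition (hypothetical
fields, not the driver's actual bookkeeping):

#include <stdbool.h>

/* Toy GT state for the two things that must drain before idling. */
struct gt_model {
	int active_requests;	/* requests not yet retired */
	int outstanding_g2h;	/* submission G2H acks still pending */
};

/*
 * Draining the request timelines is no longer enough on its own; the
 * GuC must also have acknowledged all outstanding submission G2H
 * messages, i.e. hold no pinned contexts, before the GT counts as idle.
 */
static bool gt_idle_model(const struct gt_model *gt)
{
	return gt->active_requests == 0 && gt->outstanding_g2h == 0;
}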

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_mman.c      |  3 +-
 drivers/gpu/drm/i915/gt/intel_gt.c            | 18 ++++
 drivers/gpu/drm/i915/gt/intel_gt.h            |  2 +
 drivers/gpu/drm/i915/gt/intel_gt_requests.c   | 22 ++---
 drivers/gpu/drm/i915/gt/intel_gt_requests.h   |  7 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  4 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |  1 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h     |  4 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 91 ++++++++++++++++++-
 drivers/gpu/drm/i915/gt/uc/intel_uc.h         |  5 +
 drivers/gpu/drm/i915/i915_debugfs.c           |  1 +
 drivers/gpu/drm/i915/i915_gem_evict.c         |  1 +
 .../gpu/drm/i915/selftests/igt_live_test.c    |  2 +-
 .../gpu/drm/i915/selftests/mock_gem_device.c  |  3 +-
 14 files changed, 137 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
index 8598a1c78a4c..2f5295c9408d 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -634,7 +634,8 @@ mmap_offset_attach(struct drm_i915_gem_object *obj,
 		goto insert;
 
 	/* Attempt to reap some mmap space from dead objects */
-	err = intel_gt_retire_requests_timeout(&i915->gt, MAX_SCHEDULE_TIMEOUT);
+	err = intel_gt_retire_requests_timeout(&i915->gt, MAX_SCHEDULE_TIMEOUT,
+					       NULL);
 	if (err)
 		goto err;
 
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index 8d77dcbad059..1742a8561f69 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -574,6 +574,24 @@ static void __intel_gt_disable(struct intel_gt *gt)
 	GEM_BUG_ON(intel_gt_pm_is_awake(gt));
 }
 
+int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout)
+{
+	long rtimeout;
+
+	/* If the device is asleep, we have no requests outstanding */
+	if (!intel_gt_pm_is_awake(gt))
+		return 0;
+
+	while ((timeout = intel_gt_retire_requests_timeout(gt, timeout,
+							   &rtimeout)) > 0) {
+		cond_resched();
+		if (signal_pending(current))
+			return -EINTR;
+	}
+
+	return timeout ? timeout : intel_uc_wait_for_idle(&gt->uc, rtimeout);
+}
+
 int intel_gt_init(struct intel_gt *gt)
 {
 	int err;
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h
index 7ec395cace69..c775043334bf 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt.h
@@ -48,6 +48,8 @@ void intel_gt_driver_release(struct intel_gt *gt);
 
 void intel_gt_driver_late_release(struct intel_gt *gt);
 
+int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout);
+
 void intel_gt_check_and_clear_faults(struct intel_gt *gt);
 void intel_gt_clear_error_registers(struct intel_gt *gt,
 				    intel_engine_mask_t engine_mask);
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
index 647eca9d867a..c6c702f236fa 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
@@ -13,6 +13,7 @@
 #include "intel_gt_pm.h"
 #include "intel_gt_requests.h"
 #include "intel_timeline.h"
+#include "uc/intel_uc.h"
 
 static bool retire_requests(struct intel_timeline *tl)
 {
@@ -130,7 +131,8 @@ void intel_engine_fini_retire(struct intel_engine_cs *engine)
 	GEM_BUG_ON(engine->retire);
 }
 
-long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout)
+long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
+				      long *rtimeout)
 {
 	struct intel_gt_timelines *timelines = &gt->timelines;
 	struct intel_timeline *tl, *tn;
@@ -195,22 +197,10 @@ out_active:	spin_lock(&timelines->lock);
 	if (flush_submission(gt, timeout)) /* Wait, there's more! */
 		active_count++;
 
-	return active_count ? timeout : 0;
-}
-
-int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout)
-{
-	/* If the device is asleep, we have no requests outstanding */
-	if (!intel_gt_pm_is_awake(gt))
-		return 0;
-
-	while ((timeout = intel_gt_retire_requests_timeout(gt, timeout)) > 0) {
-		cond_resched();
-		if (signal_pending(current))
-			return -EINTR;
-	}
+	if (rtimeout)
+		*rtimeout = timeout;
 
-	return timeout;
+	return active_count ? timeout : 0;
 }
 
 static void retire_work_handler(struct work_struct *work)
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.h b/drivers/gpu/drm/i915/gt/intel_gt_requests.h
index fcc30a6e4fe9..4419787124e2 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_requests.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.h
@@ -10,10 +10,11 @@ struct intel_engine_cs;
 struct intel_gt;
 struct intel_timeline;
 
-long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout);
+long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
+				      long *rtimeout);
 static inline void intel_gt_retire_requests(struct intel_gt *gt)
 {
-	intel_gt_retire_requests_timeout(gt, 0);
+	intel_gt_retire_requests_timeout(gt, 0, NULL);
 }
 
 void intel_engine_init_retire(struct intel_engine_cs *engine);
@@ -21,8 +22,6 @@ void intel_engine_add_retire(struct intel_engine_cs *engine,
 			     struct intel_timeline *tl);
 void intel_engine_fini_retire(struct intel_engine_cs *engine);
 
-int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout);
-
 void intel_gt_init_requests(struct intel_gt *gt);
 void intel_gt_park_requests(struct intel_gt *gt);
 void intel_gt_unpark_requests(struct intel_gt *gt);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 485e98f3f304..47eaa69809e8 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -38,6 +38,8 @@ struct intel_guc {
 	spinlock_t irq_lock;
 	unsigned int msg_enabled_mask;
 
+	atomic_t outstanding_submission_g2h;
+
 	struct {
 		bool enabled;
 		void (*reset)(struct intel_guc *guc);
@@ -239,6 +241,8 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask)
 	spin_unlock_irq(&guc->irq_lock);
 }
 
+int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout);
+
 int intel_guc_reset_engine(struct intel_guc *guc,
 			   struct intel_engine_cs *engine);
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index f1893030ca88..cf701056fa14 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -111,6 +111,7 @@ void intel_guc_ct_init_early(struct intel_guc_ct *ct)
 	INIT_LIST_HEAD(&ct->requests.incoming);
 	INIT_WORK(&ct->requests.worker, ct_incoming_request_worker_func);
 	tasklet_init(&ct->receive_tasklet, ct_receive_tasklet_func, (unsigned long)ct);
+	init_waitqueue_head(&ct->wq);
 }
 
 static inline const char *guc_ct_buffer_type_to_str(u32 type)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
index 660bf37238e2..ab1b79ab960b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
@@ -10,6 +10,7 @@
 #include <linux/spinlock.h>
 #include <linux/workqueue.h>
 #include <linux/ktime.h>
+#include <linux/wait.h>
 
 #include "intel_guc_fwif.h"
 
@@ -68,6 +69,9 @@ struct intel_guc_ct {
 
 	struct tasklet_struct receive_tasklet;
 
+	/** @wq: wait queue for g2h chanenl */
+	wait_queue_head_t wq;
+
 	struct {
 		u16 last_fence; /* last fence used to send request */
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index ae0b386467e3..0ff7dd6d337d 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -253,6 +253,74 @@ static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
 	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
 }
 
+static int guc_submission_busy_loop(struct intel_guc* guc,
+				    const u32 *action,
+				    u32 len,
+				    u32 g2h_len_dw,
+				    bool loop)
+{
+	int err;
+
+	err = intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop);
+
+	if (!err && g2h_len_dw)
+		atomic_inc(&guc->outstanding_submission_g2h);
+
+	return err;
+}
+
+static int guc_wait_for_pending_msg(struct intel_guc *guc,
+				    atomic_t *wait_var,
+				    bool interruptible,
+				    long timeout)
+{
+	const int state = interruptible ?
+		TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
+	DEFINE_WAIT(wait);
+
+	might_sleep();
+	GEM_BUG_ON(timeout < 0);
+
+	if (!atomic_read(wait_var))
+		return 0;
+
+	if (!timeout)
+		return -ETIME;
+
+	for (;;) {
+		prepare_to_wait(&guc->ct.wq, &wait, state);
+
+		if (!atomic_read(wait_var))
+			break;
+
+		if (signal_pending_state(state, current)) {
+			timeout = -ERESTARTSYS;
+			break;
+		}
+
+		if (!timeout) {
+			timeout = -ETIME;
+			break;
+		}
+
+		timeout = io_schedule_timeout(timeout);
+	}
+	finish_wait(&guc->ct.wq, &wait);
+
+	return (timeout < 0) ? timeout : 0;
+}
+
+int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout)
+{
+	bool interruptible = true;
+
+	if (unlikely(timeout < 0))
+		timeout = -timeout, interruptible = false;
+
+	return guc_wait_for_pending_msg(guc, &guc->outstanding_submission_g2h,
+					interruptible, timeout);
+}
+
 static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 {
 	int err;
@@ -279,6 +347,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 
 	err = intel_guc_send_nb(guc, action, len, g2h_len_dw);
 	if (!enabled && !err) {
+		atomic_inc(&guc->outstanding_submission_g2h);
 		set_context_enabled(ce);
 	} else if (!enabled) {
 		clr_context_pending_enable(ce);
@@ -734,7 +803,7 @@ static int __guc_action_register_context(struct intel_guc *guc,
 		offset,
 	};
 
-	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
+	return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
 }
 
 static int register_context(struct intel_context *ce)
@@ -754,7 +823,7 @@ static int __guc_action_deregister_context(struct intel_guc *guc,
 		guc_id,
 	};
 
-	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action),
+	return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),
 					G2H_LEN_DW_DEREGISTER_CONTEXT, true);
 }
 
@@ -871,7 +940,9 @@ static int guc_context_pin(struct intel_context *ce, void *vaddr)
 
 static void guc_context_unpin(struct intel_context *ce)
 {
-	unpin_guc_id(ce_to_guc(ce), ce);
+	struct intel_guc *guc = ce_to_guc(ce);
+
+	unpin_guc_id(guc, ce);
 	lrc_unpin(ce);
 }
 
@@ -894,7 +965,7 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
 
 	intel_context_get(ce);
 
-	intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action),
+	guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),
 				 G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
 }
 
@@ -1437,6 +1508,15 @@ g2h_context_lookup(struct intel_guc *guc, u32 desc_idx)
 	return ce;
 }
 
+static void decr_outstanding_submission_g2h(struct intel_guc *guc)
+{
+	if (atomic_dec_and_test(&guc->outstanding_submission_g2h)) {
+		smp_mb();
+		if (waitqueue_active(&guc->ct.wq))
+			wake_up_all(&guc->ct.wq);
+	}
+}
+
 int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 					  const u32 *msg,
 					  u32 len)
@@ -1472,6 +1552,8 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 		lrc_destroy(&ce->ref);
 	}
 
+	decr_outstanding_submission_g2h(guc);
+
 	return 0;
 }
 
@@ -1520,6 +1602,7 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
 		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 	}
 
+	decr_outstanding_submission_g2h(guc);
 	intel_context_put(ce);
 
 	return 0;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
index 9c954c589edf..c4cef885e984 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
@@ -81,6 +81,11 @@ uc_state_checkers(guc, guc_submission);
 #undef uc_state_checkers
 #undef __uc_state_checker
 
+static inline int intel_uc_wait_for_idle(struct intel_uc *uc, long timeout)
+{
+	return intel_guc_wait_for_idle(&uc->guc, timeout);
+}
+
 #define intel_uc_ops_function(_NAME, _OPS, _TYPE, _RET) \
 static inline _TYPE intel_uc_##_NAME(struct intel_uc *uc) \
 { \
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 8dd374691102..bb29838d1cd7 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -36,6 +36,7 @@
 #include "gt/intel_gt_clock_utils.h"
 #include "gt/intel_gt.h"
 #include "gt/intel_gt_pm.h"
+#include "gt/intel_gt.h"
 #include "gt/intel_gt_requests.h"
 #include "gt/intel_reset.h"
 #include "gt/intel_rc6.h"
diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index 4d2d59a9942b..2b73ddb11c66 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -27,6 +27,7 @@
  */
 
 #include "gem/i915_gem_context.h"
+#include "gt/intel_gt.h"
 #include "gt/intel_gt_requests.h"
 
 #include "i915_drv.h"
diff --git a/drivers/gpu/drm/i915/selftests/igt_live_test.c b/drivers/gpu/drm/i915/selftests/igt_live_test.c
index c130010a7033..1c721542e277 100644
--- a/drivers/gpu/drm/i915/selftests/igt_live_test.c
+++ b/drivers/gpu/drm/i915/selftests/igt_live_test.c
@@ -5,7 +5,7 @@
  */
 
 #include "i915_drv.h"
-#include "gt/intel_gt_requests.h"
+#include "gt/intel_gt.h"
 
 #include "../i915_selftest.h"
 #include "igt_flush_test.h"
diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
index cf40004bc92a..6c06816e2b99 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
@@ -51,7 +51,8 @@ void mock_device_flush(struct drm_i915_private *i915)
 	do {
 		for_each_engine(engine, gt, id)
 			mock_engine_flush(engine);
-	} while (intel_gt_retire_requests_timeout(gt, MAX_SCHEDULE_TIMEOUT));
+	} while (intel_gt_retire_requests_timeout(gt, MAX_SCHEDULE_TIMEOUT,
+						  NULL));
 }
 
 static void mock_device_release(struct drm_device *dev)
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 56/97] drm/i915/guc: Update GuC debugfs to support new GuC
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (54 preceding siblings ...)
  2021-05-06 19:14 ` [RFC PATCH 55/97] drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC Matthew Brost
@ 2021-05-06 19:14 ` Matthew Brost
  2021-05-06 19:14 ` [RFC PATCH 57/97] drm/i915/guc: Add several request trace points Matthew Brost
                   ` (43 subsequent siblings)
  99 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

Update GuC debugfs to support the new GuC structures.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     | 22 ++++++++
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h     |  3 ++
 .../gpu/drm/i915/gt/uc/intel_guc_debugfs.c    | 23 +++++++-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 52 +++++++++++++++++++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.h |  4 ++
 drivers/gpu/drm/i915/i915_debugfs.c           |  1 +
 6 files changed, 104 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index cf701056fa14..b3194d753b13 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -1131,3 +1131,25 @@ void intel_guc_ct_event_handler(struct intel_guc_ct *ct)
 
 	ct_try_receive_message(ct);
 }
+
+void intel_guc_log_ct_info(struct intel_guc_ct *ct,
+			   struct drm_printer *p)
+{
+	if (!ct->enabled) {
+		drm_puts(p, "CT disabled\n");
+		return;
+	}
+
+	drm_printf(p, "H2G Space: %u\n",
+		   atomic_read(&ct->ctbs.send.space) * 4);
+	drm_printf(p, "Head: %u\n",
+		   ct->ctbs.send.desc->head);
+	drm_printf(p, "Tail: %u\n",
+		   ct->ctbs.send.desc->tail);
+	drm_printf(p, "G2H Space: %u\n",
+		   atomic_read(&ct->ctbs.recv.space) * 4);
+	drm_printf(p, "Head: %u\n",
+		   ct->ctbs.recv.desc->head);
+	drm_printf(p, "Tail: %u\n",
+		   ct->ctbs.recv.desc->tail);
+}
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
index ab1b79ab960b..f62eb06b32fc 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
@@ -16,6 +16,7 @@
 
 struct i915_vma;
 struct intel_guc;
+struct drm_printer;
 
 /**
  * DOC: Command Transport (CT).
@@ -106,4 +107,6 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
 		      u32 *response_buf, u32 response_buf_size, u32 flags);
 void intel_guc_ct_event_handler(struct intel_guc_ct *ct);
 
+void intel_guc_log_ct_info(struct intel_guc_ct *ct, struct drm_printer *p);
+
 #endif /* _INTEL_GUC_CT_H_ */
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
index fe7cb7b29a1e..62b9ce0fafaa 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
@@ -9,6 +9,8 @@
 #include "intel_guc.h"
 #include "intel_guc_debugfs.h"
 #include "intel_guc_log_debugfs.h"
+#include "gt/uc/intel_guc_ct.h"
+#include "gt/uc/intel_guc_submission.h"
 
 static int guc_info_show(struct seq_file *m, void *data)
 {
@@ -22,16 +24,35 @@ static int guc_info_show(struct seq_file *m, void *data)
 	drm_puts(&p, "\n");
 	intel_guc_log_info(&guc->log, &p);
 
-	/* Add more as required ... */
+	if (!intel_guc_submission_is_used(guc))
+		return 0;
+
+	intel_guc_log_ct_info(&guc->ct, &p);
+	intel_guc_log_submission_info(guc, &p);
 
 	return 0;
 }
 DEFINE_GT_DEBUGFS_ATTRIBUTE(guc_info);
 
+static int guc_registered_contexts_show(struct seq_file *m, void *data)
+{
+	struct intel_guc *guc = m->private;
+	struct drm_printer p = drm_seq_file_printer(m);
+
+	if (!intel_guc_submission_is_used(guc))
+		return -ENODEV;
+
+	intel_guc_log_context_info(guc, &p);
+
+	return 0;
+}
+DEFINE_GT_DEBUGFS_ATTRIBUTE(guc_registered_contexts);
+
 void intel_guc_debugfs_register(struct intel_guc *guc, struct dentry *root)
 {
 	static const struct debugfs_gt_file files[] = {
 		{ "guc_info", &guc_info_fops, NULL },
+		{ "guc_registered_contexts", &guc_registered_contexts_fops, NULL },
 	};
 
 	if (!intel_guc_is_supported(guc))
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 0ff7dd6d337d..c7a8968f22c5 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1607,3 +1607,55 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
 
 	return 0;
 }
+
+void intel_guc_log_submission_info(struct intel_guc *guc,
+				   struct drm_printer *p)
+{
+	struct i915_sched_engine *sched_engine = guc->sched_engine;
+	struct rb_node *rb;
+	unsigned long flags;
+
+	drm_printf(p, "GuC Number Outstanding Submission G2H: %u\n",
+		   atomic_read(&guc->outstanding_submission_g2h));
+	drm_printf(p, "GuC tasklet count: %u\n\n",
+		   atomic_read(&sched_engine->tasklet.count));
+
+	spin_lock_irqsave(&sched_engine->lock, flags);
+	drm_printf(p, "Requests in GuC submit tasklet:\n");
+	for (rb = rb_first_cached(&sched_engine->queue); rb; rb = rb_next(rb)) {
+		struct i915_priolist *pl = to_priolist(rb);
+		struct i915_request *rq;
+
+		priolist_for_each_request(rq, pl)
+			drm_printf(p, "guc_id=%u, seqno=%llu\n",
+				   rq->context->guc_id,
+				   rq->fence.seqno);
+	}
+	spin_unlock_irqrestore(&sched_engine->lock, flags);
+	drm_printf(p, "\n");
+}
+
+void intel_guc_log_context_info(struct intel_guc *guc,
+				struct drm_printer *p)
+{
+	struct intel_context *ce;
+	unsigned long index;
+
+	xa_for_each(&guc->context_lookup, index, ce) {
+		drm_printf(p, "GuC lrc descriptor %u:\n", ce->guc_id);
+		drm_printf(p, "\tHW Context Desc: 0x%08x\n", ce->lrc.lrca);
+		drm_printf(p, "\t\tLRC Head: Internal %u, Memory %u\n",
+			   ce->ring->head,
+			   ce->lrc_reg_state[CTX_RING_HEAD]);
+		drm_printf(p, "\t\tLRC Tail: Internal %u, Memory %u\n",
+			   ce->ring->tail,
+			   ce->lrc_reg_state[CTX_RING_TAIL]);
+		drm_printf(p, "\t\tContext Pin Count: %u\n",
+			   atomic_read(&ce->pin_count));
+		drm_printf(p, "\t\tGuC ID Ref Count: %u\n",
+			   atomic_read(&ce->guc_id_ref));
+		drm_printf(p, "\t\tSchedule State: 0x%x, 0x%x\n\n",
+			   ce->guc_state.sched_state,
+			   atomic_read(&ce->guc_sched_state_no_lock));
+	}
+}
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
index 3f7005018939..6453e2bfa151 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
@@ -10,6 +10,7 @@
 
 #include "intel_guc.h"
 
+struct drm_printer;
 struct intel_engine_cs;
 
 void intel_guc_submission_init_early(struct intel_guc *guc);
@@ -20,6 +21,9 @@ void intel_guc_submission_fini(struct intel_guc *guc);
 int intel_guc_preempt_work_create(struct intel_guc *guc);
 void intel_guc_preempt_work_destroy(struct intel_guc *guc);
 int intel_guc_submission_setup(struct intel_engine_cs *engine);
+void intel_guc_log_submission_info(struct intel_guc *guc,
+				   struct drm_printer *p);
+void intel_guc_log_context_info(struct intel_guc *guc, struct drm_printer *p);
 
 static inline bool intel_guc_submission_is_supported(struct intel_guc *guc)
 {
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index bb29838d1cd7..d540dd8029d0 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -50,6 +50,7 @@
 #include "i915_trace.h"
 #include "intel_pm.h"
 #include "intel_sideband.h"
+#include "gt/intel_lrc_reg.h"
 
 static inline struct drm_i915_private *node_to_i915(struct drm_info_node *node)
 {
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 57/97] drm/i915/guc: Add several request trace points
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (55 preceding siblings ...)
  2021-05-06 19:14 ` [RFC PATCH 56/97] drm/i915/guc: Update GuC debugfs to support new GuC Matthew Brost
@ 2021-05-06 19:14 ` Matthew Brost
  2021-05-06 19:14 ` [RFC PATCH 58/97] drm/i915: Add intel_context tracing Matthew Brost
                   ` (42 subsequent siblings)
  99 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

Add trace points for request dependencies and GuC submit. Extend the
existing request trace points to include the submit fence value, guc_id,
and ring tail value.

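As an illustration only (all values made up), a request handed to the GuC
would then appear in the trace log roughly as:

  i915_request_guc_submit: dev=0, engine=0:0, guc_id=3, ctx=27, seqno=2, tail=4096
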
Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  3 ++
 drivers/gpu/drm/i915/i915_request.c           |  3 ++
 drivers/gpu/drm/i915/i915_trace.h             | 39 ++++++++++++++++++-
 3 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index c7a8968f22c5..87ed00f272e7 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -421,6 +421,7 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
 			guc->stalled_request = last;
 			return false;
 		}
+		trace_i915_request_guc_submit(last);
 	}
 
 	guc->stalled_request = NULL;
@@ -645,6 +646,8 @@ static int guc_bypass_tasklet_submit(struct intel_guc *guc,
 	ret = guc_add_request(guc, rq);
 	if (ret == -EBUSY)
 		guc->stalled_request = rq;
+	else
+		trace_i915_request_guc_submit(rq);
 
 	return ret;
 }
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 3a8f6ec0c32d..9542a5baa45a 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1344,6 +1344,9 @@ __i915_request_await_execution(struct i915_request *to,
 			return err;
 	}
 
+	trace_i915_request_dep_to(to);
+	trace_i915_request_dep_from(from);
+
 	/* Couple the dependency tree for PI on this exposed to->fence */
 	if (to->engine->sched_engine->schedule) {
 		err = i915_sched_node_add_dependency(&to->sched,
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 6778ad2a14a4..b02d04b6c8f6 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -794,22 +794,27 @@ DECLARE_EVENT_CLASS(i915_request,
 	    TP_STRUCT__entry(
 			     __field(u32, dev)
 			     __field(u64, ctx)
+			     __field(u32, guc_id)
 			     __field(u16, class)
 			     __field(u16, instance)
 			     __field(u32, seqno)
+			     __field(u32, tail)
 			     ),
 
 	    TP_fast_assign(
 			   __entry->dev = rq->engine->i915->drm.primary->index;
 			   __entry->class = rq->engine->uabi_class;
 			   __entry->instance = rq->engine->uabi_instance;
+			   __entry->guc_id = rq->context->guc_id;
 			   __entry->ctx = rq->fence.context;
 			   __entry->seqno = rq->fence.seqno;
+			   __entry->tail = rq->tail;
 			   ),
 
-	    TP_printk("dev=%u, engine=%u:%u, ctx=%llu, seqno=%u",
+	    TP_printk("dev=%u, engine=%u:%u, guc_id=%u, ctx=%llu, seqno=%u, tail=%u",
 		      __entry->dev, __entry->class, __entry->instance,
-		      __entry->ctx, __entry->seqno)
+		      __entry->guc_id, __entry->ctx, __entry->seqno,
+		      __entry->tail)
 );
 
 DEFINE_EVENT(i915_request, i915_request_add,
@@ -818,6 +823,21 @@ DEFINE_EVENT(i915_request, i915_request_add,
 );
 
 #if defined(CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS)
+DEFINE_EVENT(i915_request, i915_request_dep_to,
+	     TP_PROTO(struct i915_request *rq),
+	     TP_ARGS(rq)
+);
+
+DEFINE_EVENT(i915_request, i915_request_dep_from,
+	     TP_PROTO(struct i915_request *rq),
+	     TP_ARGS(rq)
+);
+
+DEFINE_EVENT(i915_request, i915_request_guc_submit,
+	     TP_PROTO(struct i915_request *rq),
+	     TP_ARGS(rq)
+);
+
 DEFINE_EVENT(i915_request, i915_request_submit,
 	     TP_PROTO(struct i915_request *rq),
 	     TP_ARGS(rq)
@@ -887,6 +907,21 @@ TRACE_EVENT(i915_request_out,
 
 #else
 #if !defined(TRACE_HEADER_MULTI_READ)
+static inline void
+trace_i915_request_dep_to(struct i915_request *rq)
+{
+}
+
+static inline void
+trace_i915_request_dep_from(struct i915_request *rq)
+{
+}
+
+static inline void
+trace_i915_request_guc_submit(struct i915_request *rq)
+{
+}
+
 static inline void
 trace_i915_request_submit(struct i915_request *rq)
 {
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 58/97] drm/i915: Add intel_context tracing
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (56 preceding siblings ...)
  2021-05-06 19:14 ` [RFC PATCH 57/97] drm/i915/guc: Add several request trace points Matthew Brost
@ 2021-05-06 19:14 ` Matthew Brost
  2021-05-06 19:14 ` [RFC PATCH 59/97] drm/i915/guc: GuC virtual engines Matthew Brost
                   ` (41 subsequent siblings)
  99 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

Add intel_context tracing. These trace points are particularly helpful
when debugging the GuC firmware and can be enabled via the
CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS kernel config option.

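As an illustration only (all values made up), pinning a context would then
produce a trace line along the lines of:

  intel_context_do_pin: guc_id=3, pin_count=1 sched_state=0x0,0x0
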
Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context.c       |   6 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  14 ++
 drivers/gpu/drm/i915/i915_trace.h             | 148 +++++++++++++++++-
 3 files changed, 166 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index 7f97753ab164..b24a1b7a3f88 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -8,6 +8,7 @@
 
 #include "i915_drv.h"
 #include "i915_globals.h"
+#include "i915_trace.h"
 
 #include "intel_context.h"
 #include "intel_engine.h"
@@ -28,6 +29,7 @@ static void rcu_context_free(struct rcu_head *rcu)
 {
 	struct intel_context *ce = container_of(rcu, typeof(*ce), rcu);
 
+	trace_intel_context_free(ce);
 	kmem_cache_free(global.slab_ce, ce);
 }
 
@@ -46,6 +48,7 @@ intel_context_create(struct intel_engine_cs *engine)
 		return ERR_PTR(-ENOMEM);
 
 	intel_context_init(ce, engine);
+	trace_intel_context_create(ce);
 	return ce;
 }
 
@@ -268,6 +271,8 @@ int __intel_context_do_pin_ww(struct intel_context *ce,
 
 	GEM_BUG_ON(!intel_context_is_pinned(ce)); /* no overflow! */
 
+	trace_intel_context_do_pin(ce);
+
 err_unlock:
 	mutex_unlock(&ce->pin_mutex);
 err_post_unpin:
@@ -323,6 +328,7 @@ void __intel_context_do_unpin(struct intel_context *ce, int sub)
 	 */
 	intel_context_get(ce);
 	intel_context_active_release(ce);
+	trace_intel_context_do_unpin(ce);
 	intel_context_put(ce);
 }
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 87ed00f272e7..a789994d6de7 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -347,6 +347,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 
 	err = intel_guc_send_nb(guc, action, len, g2h_len_dw);
 	if (!enabled && !err) {
+		trace_intel_context_sched_enable(ce);
 		atomic_inc(&guc->outstanding_submission_g2h);
 		set_context_enabled(ce);
 	} else if (!enabled) {
@@ -815,6 +816,8 @@ static int register_context(struct intel_context *ce)
 	u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
 		ce->guc_id * sizeof(struct guc_lrc_desc);
 
+	trace_intel_context_register(ce);
+
 	return __guc_action_register_context(guc, ce->guc_id, offset);
 }
 
@@ -834,6 +837,8 @@ static int deregister_context(struct intel_context *ce, u32 guc_id)
 {
 	struct intel_guc *guc = ce_to_guc(ce);
 
+	trace_intel_context_deregister(ce);
+
 	return __guc_action_deregister_context(guc, guc_id);
 }
 
@@ -908,6 +913,7 @@ static int guc_lrc_desc_pin(struct intel_context *ce)
 	 * GuC before registering this context.
 	 */
 	if (context_registered) {
+		trace_intel_context_steal_guc_id(ce);
 		set_context_wait_for_deregister_to_register(ce);
 		intel_context_get(ce);
 
@@ -966,6 +972,7 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
 
 	GEM_BUG_ON(guc_id == GUC_INVALID_LRC_ID);
 
+	trace_intel_context_sched_disable(ce);
 	intel_context_get(ce);
 
 	guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),
@@ -1122,6 +1129,9 @@ static void __guc_signal_context_fence(struct intel_context *ce)
 
 	lockdep_assert_held(&ce->guc_state.lock);
 
+	if (!list_empty(&ce->guc_state.fences))
+		trace_intel_context_fence_release(ce);
+
 	list_for_each_entry(rq, &ce->guc_state.fences, guc_fence_link)
 		i915_sw_fence_complete(&rq->submit);
 
@@ -1536,6 +1546,8 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 	if (unlikely(!ce))
 		return -EPROTO;
 
+	trace_intel_context_deregister_done(ce);
+
 	if (context_wait_for_deregister_to_register(ce)) {
 		struct intel_runtime_pm *runtime_pm =
 			&ce->engine->gt->i915->runtime_pm;
@@ -1587,6 +1599,8 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
 		return -EPROTO;
 	}
 
+	trace_intel_context_sched_done(ce);
+
 	if (context_pending_enable(ce)) {
 		clr_context_pending_enable(ce);
 	} else if (context_pending_disable(ce)) {
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index b02d04b6c8f6..97c2e83984ed 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -818,8 +818,8 @@ DECLARE_EVENT_CLASS(i915_request,
 );
 
 DEFINE_EVENT(i915_request, i915_request_add,
-	    TP_PROTO(struct i915_request *rq),
-	    TP_ARGS(rq)
+	     TP_PROTO(struct i915_request *rq),
+	     TP_ARGS(rq)
 );
 
 #if defined(CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS)
@@ -905,6 +905,90 @@ TRACE_EVENT(i915_request_out,
 			      __entry->ctx, __entry->seqno, __entry->completed)
 );
 
+DECLARE_EVENT_CLASS(intel_context,
+	    TP_PROTO(struct intel_context *ce),
+	    TP_ARGS(ce),
+
+	    TP_STRUCT__entry(
+			     __field(u32, guc_id)
+			     __field(int, pin_count)
+			     __field(u32, sched_state)
+			     __field(u32, guc_sched_state_no_lock)
+			     ),
+
+	    TP_fast_assign(
+			   __entry->guc_id = ce->guc_id;
+			   __entry->pin_count = atomic_read(&ce->pin_count);
+			   __entry->sched_state = ce->guc_state.sched_state;
+			   __entry->guc_sched_state_no_lock =
+			   atomic_read(&ce->guc_sched_state_no_lock);
+			   ),
+
+	    TP_printk("guc_id=%d, pin_count=%d sched_state=0x%x,0x%x",
+		      __entry->guc_id, __entry->pin_count, __entry->sched_state,
+		      __entry->guc_sched_state_no_lock)
+);
+
+DEFINE_EVENT(intel_context, intel_context_register,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_deregister,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_deregister_done,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_sched_enable,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_sched_disable,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_sched_done,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_create,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_fence_release,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_free,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_steal_guc_id,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_do_pin,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_do_unpin,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
 #else
 #if !defined(TRACE_HEADER_MULTI_READ)
 static inline void
@@ -941,6 +1025,66 @@ static inline void
 trace_i915_request_out(struct i915_request *rq)
 {
 }
+
+static inline void
+trace_intel_context_register(struct intel_context *ce)
+{
+}
+
+static inline void
+trace_intel_context_deregister(struct intel_context *ce)
+{
+}
+
+static inline void
+trace_intel_context_deregister_done(struct intel_context *ce)
+{
+}
+
+static inline void
+trace_intel_context_sched_enable(struct intel_context *ce)
+{
+}
+
+static inline void
+trace_intel_context_sched_disable(struct intel_context *ce)
+{
+}
+
+static inline void
+trace_intel_context_sched_done(struct intel_context *ce)
+{
+}
+
+static inline void
+trace_intel_context_create(struct intel_context *ce)
+{
+}
+
+static inline void
+trace_intel_context_fence_release(struct intel_context *ce)
+{
+}
+
+static inline void
+trace_intel_context_free(struct intel_context *ce)
+{
+}
+
+static inline void
+trace_intel_context_steal_guc_id(struct intel_context *ce)
+{
+}
+
+static inline void
+trace_intel_context_do_pin(struct intel_context *ce)
+{
+}
+
+static inline void
+trace_intel_context_do_unpin(struct intel_context *ce)
+{
+}
 #endif
 #endif
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 59/97] drm/i915/guc: GuC virtual engines
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (57 preceding siblings ...)
  2021-05-06 19:14 ` [RFC PATCH 58/97] drm/i915: Add intel_context tracing Matthew Brost
@ 2021-05-06 19:14 ` Matthew Brost
  2021-05-06 19:14 ` [RFC PATCH 60/97] drm/i915: Track 'serial' counts for " Matthew Brost
                   ` (40 subsequent siblings)
  99 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

Implement GuC virtual engines. The implementation is rather simple:
allocate an engine, point the context enter / exit functions at virtual
engine specific versions, set all other variables / functions to the GuC
versions, and set the engine mask to that of all the siblings.

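A rough caller-side sketch of the new entry point (the helper and engine
names here are hypothetical, error handling trimmed):

	/* Hypothetical caller, mirroring set_engines__load_balance(). */
	static struct intel_context *
	make_balanced_context(struct intel_engine_cs *vcs0,
			      struct intel_engine_cs *vcs1)
	{
		struct intel_engine_cs *siblings[] = { vcs0, vcs1 };

		/*
		 * Dispatches through siblings[0]->cops->create_virtual():
		 * guc_create_virtual() with GuC submission,
		 * execlists_create_virtual() otherwise.
		 */
		return intel_engine_create_virtual(siblings, ARRAY_SIZE(siblings));
	}
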
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c   |  19 +-
 drivers/gpu/drm/i915/gem/i915_gem_context.h   |   1 +
 drivers/gpu/drm/i915/gt/intel_context_types.h |  10 +
 drivers/gpu/drm/i915/gt/intel_engine.h        |  45 +++-
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     |  14 +
 .../drm/i915/gt/intel_execlists_submission.c  | 186 +++++++------
 .../drm/i915/gt/intel_execlists_submission.h  |  11 -
 drivers/gpu/drm/i915/gt/selftest_execlists.c  |  20 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 253 +++++++++++++++++-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.h |   2 +
 10 files changed, 429 insertions(+), 132 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index d30260ffe2a7..e6bc5c666f93 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -72,7 +72,6 @@
 #include "gt/intel_context_param.h"
 #include "gt/intel_engine_heartbeat.h"
 #include "gt/intel_engine_user.h"
-#include "gt/intel_execlists_submission.h" /* virtual_engine */
 #include "gt/intel_gpu_commands.h"
 #include "gt/intel_ring.h"
 
@@ -1569,9 +1568,6 @@ set_engines__load_balance(struct i915_user_extension __user *base, void *data)
 	if (!HAS_EXECLISTS(i915))
 		return -ENODEV;
 
-	if (intel_uc_uses_guc_submission(&i915->gt.uc))
-		return -ENODEV; /* not implement yet */
-
 	if (get_user(idx, &ext->engine_index))
 		return -EFAULT;
 
@@ -1628,7 +1624,7 @@ set_engines__load_balance(struct i915_user_extension __user *base, void *data)
 		}
 	}
 
-	ce = intel_execlists_create_virtual(siblings, n);
+	ce = intel_engine_create_virtual(siblings, n);
 	if (IS_ERR(ce)) {
 		err = PTR_ERR(ce);
 		goto out_siblings;
@@ -1724,13 +1720,9 @@ set_engines__bond(struct i915_user_extension __user *base, void *data)
 		 * A non-virtual engine has no siblings to choose between; and
 		 * a submit fence will always be directed to the one engine.
 		 */
-		if (intel_engine_is_virtual(virtual)) {
-			err = intel_virtual_engine_attach_bond(virtual,
-							       master,
-							       bond);
-			if (err)
-				return err;
-		}
+		err = intel_engine_attach_bond(virtual, master, bond);
+		if (err)
+			return err;
 	}
 
 	return 0;
@@ -2117,8 +2109,7 @@ static int clone_engines(struct i915_gem_context *dst,
 		 * the virtual engine instead.
 		 */
 		if (intel_engine_is_virtual(engine))
-			clone->engines[n] =
-				intel_execlists_clone_virtual(engine);
+			clone->engines[n] = intel_engine_clone_virtual(engine);
 		else
 			clone->engines[n] = intel_context_create(engine);
 		if (IS_ERR_OR_NULL(clone->engines[n])) {
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.h b/drivers/gpu/drm/i915/gem/i915_gem_context.h
index b5c908f3f4f2..ba772762f7b9 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.h
@@ -10,6 +10,7 @@
 #include "i915_gem_context_types.h"
 
 #include "gt/intel_context.h"
+#include "gt/intel_engine.h"
 
 #include "i915_drv.h"
 #include "i915_gem.h"
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index e7af6a2368f8..6945963a31ba 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -47,6 +47,16 @@ struct intel_context_ops {
 
 	void (*reset)(struct intel_context *ce);
 	void (*destroy)(struct kref *kref);
+
+	/* virtual engine/context interface */
+	struct intel_context *(*create_virtual)(struct intel_engine_cs **engine,
+						unsigned int count);
+	struct intel_context *(*clone_virtual)(struct intel_engine_cs *engine);
+	struct intel_engine_cs *(*get_sibling)(struct intel_engine_cs *engine,
+					       unsigned int sibling);
+	int (*attach_bond)(struct intel_engine_cs *engine,
+			   const struct intel_engine_cs *master,
+			   const struct intel_engine_cs *sibling);
 };
 
 struct intel_context {
diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
index 988d9688ae4d..3cd09381b6f8 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -261,13 +261,56 @@ intel_engine_has_preempt_reset(const struct intel_engine_cs *engine)
 	return intel_engine_has_preemption(engine);
 }
 
+struct intel_context *
+intel_engine_create_virtual(struct intel_engine_cs **siblings,
+			    unsigned int count);
+
+static inline bool
+intel_virtual_engine_has_heartbeat(const struct intel_engine_cs *engine)
+{
+	if (intel_engine_uses_guc(engine))
+		return intel_guc_virtual_engine_has_heartbeat(engine);
+	else
+		GEM_BUG_ON("Only should be called in GuC submission");
+
+	return false;
+}
+
 static inline bool
 intel_engine_has_heartbeat(const struct intel_engine_cs *engine)
 {
 	if (!IS_ACTIVE(CONFIG_DRM_I915_HEARTBEAT_INTERVAL))
 		return false;
 
-	return READ_ONCE(engine->props.heartbeat_interval_ms);
+	if (intel_engine_is_virtual(engine))
+		return intel_virtual_engine_has_heartbeat(engine);
+	else
+		return READ_ONCE(engine->props.heartbeat_interval_ms);
+}
+
+static inline struct intel_context *
+intel_engine_clone_virtual(struct intel_engine_cs *src)
+{
+	GEM_BUG_ON(!intel_engine_is_virtual(src));
+	return src->cops->clone_virtual(src);
+}
+
+static inline int
+intel_engine_attach_bond(struct intel_engine_cs *engine,
+			 const struct intel_engine_cs *master,
+			 const struct intel_engine_cs *sibling)
+{
+	if (!engine->cops->attach_bond)
+		return 0;
+
+	return engine->cops->attach_bond(engine, master, sibling);
+}
+
+static inline struct intel_engine_cs *
+intel_engine_get_sibling(struct intel_engine_cs *engine, unsigned int sibling)
+{
+	GEM_BUG_ON(!intel_engine_is_virtual(engine));
+	return engine->cops->get_sibling(engine, sibling);
 }
 
 #endif /* _INTEL_RINGBUFFER_H_ */
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 7866ff0c2673..903f72f0953a 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -1792,6 +1792,20 @@ ktime_t intel_engine_get_busy_time(struct intel_engine_cs *engine, ktime_t *now)
 	return total;
 }
 
+struct intel_context *
+intel_engine_create_virtual(struct intel_engine_cs **siblings,
+			    unsigned int count)
+{
+	if (count == 0)
+		return ERR_PTR(-EINVAL);
+
+	if (count == 1)
+		return intel_context_create(siblings[0]);
+
+	GEM_BUG_ON(!siblings[0]->cops->create_virtual);
+	return siblings[0]->cops->create_virtual(siblings, count);
+}
+
 static bool match_ring(struct i915_request *rq)
 {
 	u32 ring = ENGINE_READ(rq->engine, RING_START);
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 0927a2416b52..ae12d7f19ecd 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -205,6 +205,9 @@ static struct virtual_engine *to_virtual_engine(struct intel_engine_cs *engine)
 	return container_of(engine, struct virtual_engine, base);
 }
 
+static struct intel_context *
+execlists_create_virtual(struct intel_engine_cs **siblings, unsigned int count);
+
 static struct i915_request *
 __active_request(const struct intel_timeline * const tl,
 		 struct i915_request *rq,
@@ -2557,6 +2560,8 @@ static const struct intel_context_ops execlists_context_ops = {
 
 	.reset = lrc_reset,
 	.destroy = lrc_destroy,
+
+	.create_virtual = execlists_create_virtual,
 };
 
 static int emit_pdps(struct i915_request *rq)
@@ -3505,6 +3510,94 @@ static void virtual_context_exit(struct intel_context *ce)
 		intel_engine_pm_put(ve->siblings[n]);
 }
 
+static struct intel_engine_cs *
+virtual_get_sibling(struct intel_engine_cs *engine, unsigned int sibling)
+{
+	struct virtual_engine *ve = to_virtual_engine(engine);
+
+	if (sibling >= ve->num_siblings)
+		return NULL;
+
+	return ve->siblings[sibling];
+}
+
+static struct intel_context *
+virtual_clone(struct intel_engine_cs *src)
+{
+	struct virtual_engine *se = to_virtual_engine(src);
+	struct intel_context *dst;
+
+	dst = execlists_create_virtual(se->siblings, se->num_siblings);
+	if (IS_ERR(dst))
+		return dst;
+
+	if (se->num_bonds) {
+		struct virtual_engine *de = to_virtual_engine(dst->engine);
+
+		de->bonds = kmemdup(se->bonds,
+				    sizeof(*se->bonds) * se->num_bonds,
+				    GFP_KERNEL);
+		if (!de->bonds) {
+			intel_context_put(dst);
+			return ERR_PTR(-ENOMEM);
+		}
+
+		de->num_bonds = se->num_bonds;
+	}
+
+	return dst;
+}
+
+static struct ve_bond *
+virtual_find_bond(struct virtual_engine *ve,
+		  const struct intel_engine_cs *master)
+{
+	int i;
+
+	for (i = 0; i < ve->num_bonds; i++) {
+		if (ve->bonds[i].master == master)
+			return &ve->bonds[i];
+	}
+
+	return NULL;
+}
+
+static int virtual_attach_bond(struct intel_engine_cs *engine,
+			       const struct intel_engine_cs *master,
+			       const struct intel_engine_cs *sibling)
+{
+	struct virtual_engine *ve = to_virtual_engine(engine);
+	struct ve_bond *bond;
+	int n;
+
+	/* Sanity check the sibling is part of the virtual engine */
+	for (n = 0; n < ve->num_siblings; n++)
+		if (sibling == ve->siblings[n])
+			break;
+	if (n == ve->num_siblings)
+		return -EINVAL;
+
+	bond = virtual_find_bond(ve, master);
+	if (bond) {
+		bond->sibling_mask |= sibling->mask;
+		return 0;
+	}
+
+	bond = krealloc(ve->bonds,
+			sizeof(*bond) * (ve->num_bonds + 1),
+			GFP_KERNEL);
+	if (!bond)
+		return -ENOMEM;
+
+	bond[ve->num_bonds].master = master;
+	bond[ve->num_bonds].sibling_mask = sibling->mask;
+
+	ve->bonds = bond;
+	ve->num_bonds++;
+
+	return 0;
+}
+
 static const struct intel_context_ops virtual_context_ops = {
 	.flags = COPS_HAS_INFLIGHT,
 
@@ -3519,6 +3612,10 @@ static const struct intel_context_ops virtual_context_ops = {
 	.exit = virtual_context_exit,
 
 	.destroy = virtual_context_destroy,
+
+	.clone_virtual = virtual_clone,
+	.get_sibling = virtual_get_sibling,
+	.attach_bond = virtual_attach_bond,
 };
 
 static intel_engine_mask_t virtual_submission_mask(struct virtual_engine *ve)
@@ -3667,20 +3764,6 @@ static void virtual_submit_request(struct i915_request *rq)
 	spin_unlock_irqrestore(&ve->base.sched_engine->lock, flags);
 }
 
-static struct ve_bond *
-virtual_find_bond(struct virtual_engine *ve,
-		  const struct intel_engine_cs *master)
-{
-	int i;
-
-	for (i = 0; i < ve->num_bonds; i++) {
-		if (ve->bonds[i].master == master)
-			return &ve->bonds[i];
-	}
-
-	return NULL;
-}
-
 static void
 virtual_bond_execute(struct i915_request *rq, struct dma_fence *signal)
 {
@@ -3703,20 +3786,13 @@ virtual_bond_execute(struct i915_request *rq, struct dma_fence *signal)
 	to_request(signal)->execution_mask &= ~allowed;
 }
 
-struct intel_context *
-intel_execlists_create_virtual(struct intel_engine_cs **siblings,
-			       unsigned int count)
+static struct intel_context *
+execlists_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
 {
 	struct virtual_engine *ve;
 	unsigned int n;
 	int err;
 
-	if (count == 0)
-		return ERR_PTR(-EINVAL);
-
-	if (count == 1)
-		return intel_context_create(siblings[0]);
-
 	ve = kzalloc(struct_size(ve, siblings, count), GFP_KERNEL);
 	if (!ve)
 		return ERR_PTR(-ENOMEM);
@@ -3851,70 +3927,6 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings,
 	return ERR_PTR(err);
 }
 
-struct intel_context *
-intel_execlists_clone_virtual(struct intel_engine_cs *src)
-{
-	struct virtual_engine *se = to_virtual_engine(src);
-	struct intel_context *dst;
-
-	dst = intel_execlists_create_virtual(se->siblings,
-					     se->num_siblings);
-	if (IS_ERR(dst))
-		return dst;
-
-	if (se->num_bonds) {
-		struct virtual_engine *de = to_virtual_engine(dst->engine);
-
-		de->bonds = kmemdup(se->bonds,
-				    sizeof(*se->bonds) * se->num_bonds,
-				    GFP_KERNEL);
-		if (!de->bonds) {
-			intel_context_put(dst);
-			return ERR_PTR(-ENOMEM);
-		}
-
-		de->num_bonds = se->num_bonds;
-	}
-
-	return dst;
-}
-
-int intel_virtual_engine_attach_bond(struct intel_engine_cs *engine,
-				     const struct intel_engine_cs *master,
-				     const struct intel_engine_cs *sibling)
-{
-	struct virtual_engine *ve = to_virtual_engine(engine);
-	struct ve_bond *bond;
-	int n;
-
-	/* Sanity check the sibling is part of the virtual engine */
-	for (n = 0; n < ve->num_siblings; n++)
-		if (sibling == ve->siblings[n])
-			break;
-	if (n == ve->num_siblings)
-		return -EINVAL;
-
-	bond = virtual_find_bond(ve, master);
-	if (bond) {
-		bond->sibling_mask |= sibling->mask;
-		return 0;
-	}
-
-	bond = krealloc(ve->bonds,
-			sizeof(*bond) * (ve->num_bonds + 1),
-			GFP_KERNEL);
-	if (!bond)
-		return -ENOMEM;
-
-	bond[ve->num_bonds].master = master;
-	bond[ve->num_bonds].sibling_mask = sibling->mask;
-
-	ve->bonds = bond;
-	ve->num_bonds++;
-
-	return 0;
-}
-
 void intel_execlists_show_requests(struct intel_engine_cs *engine,
 				   struct drm_printer *m,
 				   void (*show_request)(struct drm_printer *m,
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.h b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
index 4ca9b475e252..74041b1994af 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
@@ -32,15 +32,4 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine,
 							int indent),
 				   unsigned int max);
 
-struct intel_context *
-intel_execlists_create_virtual(struct intel_engine_cs **siblings,
-			       unsigned int count);
-
-struct intel_context *
-intel_execlists_clone_virtual(struct intel_engine_cs *src);
-
-int intel_virtual_engine_attach_bond(struct intel_engine_cs *engine,
-				     const struct intel_engine_cs *master,
-				     const struct intel_engine_cs *sibling);
-
 #endif /* __INTEL_EXECLISTS_SUBMISSION_H__ */
diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c
index f349048ccbf6..77c411d8e5a0 100644
--- a/drivers/gpu/drm/i915/gt/selftest_execlists.c
+++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c
@@ -3710,7 +3710,7 @@ static int nop_virtual_engine(struct intel_gt *gt,
 	GEM_BUG_ON(!nctx || nctx > ARRAY_SIZE(ve));
 
 	for (n = 0; n < nctx; n++) {
-		ve[n] = intel_execlists_create_virtual(siblings, nsibling);
+		ve[n] = intel_engine_create_virtual(siblings, nsibling);
 		if (IS_ERR(ve[n])) {
 			err = PTR_ERR(ve[n]);
 			nctx = n;
@@ -3906,7 +3906,7 @@ static int mask_virtual_engine(struct intel_gt *gt,
 	 * restrict it to our desired engine within the virtual engine.
 	 */
 
-	ve = intel_execlists_create_virtual(siblings, nsibling);
+	ve = intel_engine_create_virtual(siblings, nsibling);
 	if (IS_ERR(ve)) {
 		err = PTR_ERR(ve);
 		goto out_close;
@@ -4037,7 +4037,7 @@ static int slicein_virtual_engine(struct intel_gt *gt,
 		i915_request_add(rq);
 	}
 
-	ce = intel_execlists_create_virtual(siblings, nsibling);
+	ce = intel_engine_create_virtual(siblings, nsibling);
 	if (IS_ERR(ce)) {
 		err = PTR_ERR(ce);
 		goto out;
@@ -4089,7 +4089,7 @@ static int sliceout_virtual_engine(struct intel_gt *gt,
 
 	/* XXX We do not handle oversubscription and fairness with normal rq */
 	for (n = 0; n < nsibling; n++) {
-		ce = intel_execlists_create_virtual(siblings, nsibling);
+		ce = intel_engine_create_virtual(siblings, nsibling);
 		if (IS_ERR(ce)) {
 			err = PTR_ERR(ce);
 			goto out;
@@ -4191,7 +4191,7 @@ static int preserved_virtual_engine(struct intel_gt *gt,
 	if (err)
 		goto out_scratch;
 
-	ve = intel_execlists_create_virtual(siblings, nsibling);
+	ve = intel_engine_create_virtual(siblings, nsibling);
 	if (IS_ERR(ve)) {
 		err = PTR_ERR(ve);
 		goto out_scratch;
@@ -4414,16 +4414,16 @@ static int bond_virtual_engine(struct intel_gt *gt,
 		for (n = 0; n < nsibling; n++) {
 			struct intel_context *ve;
 
-			ve = intel_execlists_create_virtual(siblings, nsibling);
+			ve = intel_engine_create_virtual(siblings, nsibling);
 			if (IS_ERR(ve)) {
 				err = PTR_ERR(ve);
 				onstack_fence_fini(&fence);
 				goto out;
 			}
 
-			err = intel_virtual_engine_attach_bond(ve->engine,
-							       master,
-							       siblings[n]);
+			err = intel_engine_attach_bond(ve->engine,
+						       master,
+						       siblings[n]);
 			if (err) {
 				intel_context_put(ve);
 				onstack_fence_fini(&fence);
@@ -4559,7 +4559,7 @@ static int reset_virtual_engine(struct intel_gt *gt,
 	if (igt_spinner_init(&spin, gt))
 		return -ENOMEM;
 
-	ve = intel_execlists_create_virtual(siblings, nsibling);
+	ve = intel_engine_create_virtual(siblings, nsibling);
 	if (IS_ERR(ve)) {
 		err = PTR_ERR(ve);
 		goto out_spin;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index a789994d6de7..dc79d287c50a 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -60,6 +60,15 @@
  *
  */
 
+/* GuC Virtual Engine */
+struct guc_virtual_engine {
+	struct intel_engine_cs base;
+	struct intel_context context;
+};
+
+static struct intel_context *
+guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count);
+
 #define GUC_REQUEST_SIZE 64 /* bytes */
 
 /*
@@ -931,20 +940,35 @@ static int guc_lrc_desc_pin(struct intel_context *ce)
 	return ret;
 }
 
-static int guc_context_pre_pin(struct intel_context *ce,
-			       struct i915_gem_ww_ctx *ww,
-			       void **vaddr)
+static int __guc_context_pre_pin(struct intel_context *ce,
+				 struct intel_engine_cs *engine,
+				 struct i915_gem_ww_ctx *ww,
+				 void **vaddr)
 {
-	return lrc_pre_pin(ce, ce->engine, ww, vaddr);
+	return lrc_pre_pin(ce, engine, ww, vaddr);
 }
 
-static int guc_context_pin(struct intel_context *ce, void *vaddr)
+static int __guc_context_pin(struct intel_context *ce,
+			     struct intel_engine_cs *engine,
+			     void *vaddr)
 {
 	if (i915_ggtt_offset(ce->state) !=
 	    (ce->lrc.lrca & CTX_GTT_ADDRESS_MASK))
 		set_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
 
-	return lrc_pin(ce, ce->engine, vaddr);
+	return lrc_pin(ce, engine, vaddr);
+}
+
+static int guc_context_pre_pin(struct intel_context *ce,
+			       struct i915_gem_ww_ctx *ww,
+			       void **vaddr)
+{
+	return __guc_context_pre_pin(ce, ce->engine, ww, vaddr);
+}
+
+static int guc_context_pin(struct intel_context *ce, void *vaddr)
+{
+	return __guc_context_pin(ce, ce->engine, vaddr);
 }
 
 static void guc_context_unpin(struct intel_context *ce)
@@ -1044,6 +1068,21 @@ static inline void guc_lrc_desc_unpin(struct intel_context *ce)
 	deregister_context(ce, ce->guc_id);
 }
 
+static void __guc_context_destroy(struct intel_context *ce)
+{
+	lrc_fini(ce);
+	intel_context_fini(ce);
+
+	if (intel_engine_is_virtual(ce->engine)) {
+		struct guc_virtual_engine *ve =
+			container_of(ce, typeof(*ve), context);
+
+		kfree(ve);
+	} else {
+		intel_context_free(ce);
+	}
+}
+
 static void guc_context_destroy(struct kref *kref)
 {
 	struct intel_context *ce = container_of(kref, typeof(*ce), ref);
@@ -1060,7 +1099,7 @@ static void guc_context_destroy(struct kref *kref)
 	if (context_guc_id_invalid(ce) ||
 	    !lrc_desc_registered(guc, ce->guc_id)) {
 		release_guc_id(guc, ce);
-		lrc_destroy(kref);
+		__guc_context_destroy(ce);
 		return;
 	}
 
@@ -1076,7 +1115,7 @@ static void guc_context_destroy(struct kref *kref)
 	if (context_guc_id_invalid(ce)) {
 		__release_guc_id(guc, ce);
 		spin_unlock_irqrestore(&guc->contexts_lock, flags);
-		lrc_destroy(kref);
+		__guc_context_destroy(ce);
 		return;
 	}
 
@@ -1121,6 +1160,8 @@ static const struct intel_context_ops guc_context_ops = {
 
 	.reset = lrc_reset,
 	.destroy = guc_context_destroy,
+
+	.create_virtual = guc_create_virtual,
 };
 
 static void __guc_signal_context_fence(struct intel_context *ce)
@@ -1250,6 +1291,96 @@ static int guc_request_alloc(struct i915_request *rq)
 	return 0;
 }
 
+static struct intel_engine_cs *
+guc_virtual_get_sibling(struct intel_engine_cs *ve, unsigned int sibling)
+{
+	struct intel_engine_cs *engine;
+	intel_engine_mask_t tmp, mask = ve->mask;
+	unsigned int num_siblings = 0;
+
+	for_each_engine_masked(engine, ve->gt, mask, tmp)
+		if (num_siblings++ == sibling)
+			return engine;
+
+	return NULL;
+}
+
+static int guc_virtual_context_pre_pin(struct intel_context *ce,
+				       struct i915_gem_ww_ctx *ww,
+				       void **vaddr)
+{
+	struct intel_engine_cs *engine = guc_virtual_get_sibling(ce->engine, 0);
+
+	return __guc_context_pre_pin(ce, engine, ww, vaddr);
+}
+
+static int guc_virtual_context_pin(struct intel_context *ce, void *vaddr)
+{
+	struct intel_engine_cs *engine = guc_virtual_get_sibling(ce->engine, 0);
+
+	return __guc_context_pin(ce, engine, vaddr);
+}
+
+static void guc_virtual_context_enter(struct intel_context *ce)
+{
+	intel_engine_mask_t tmp, mask = ce->engine->mask;
+	struct intel_engine_cs *engine;
+
+	for_each_engine_masked(engine, ce->engine->gt, mask, tmp)
+		intel_engine_pm_get(engine);
+
+	intel_timeline_enter(ce->timeline);
+}
+
+static void guc_virtual_context_exit(struct intel_context *ce)
+{
+	intel_engine_mask_t tmp, mask = ce->engine->mask;
+	struct intel_engine_cs *engine;
+
+	for_each_engine_masked(engine, ce->engine->gt, mask, tmp)
+		intel_engine_pm_put(engine);
+
+	intel_timeline_exit(ce->timeline);
+}
+
+static int guc_virtual_context_alloc(struct intel_context *ce)
+{
+	struct intel_engine_cs *engine = guc_virtual_get_sibling(ce->engine, 0);
+
+	return lrc_alloc(ce, engine);
+}
+
+static struct intel_context *guc_clone_virtual(struct intel_engine_cs *src)
+{
+	struct intel_engine_cs *siblings[GUC_MAX_INSTANCES_PER_CLASS], *engine;
+	intel_engine_mask_t tmp, mask = src->mask;
+	unsigned int num_siblings = 0;
+
+	for_each_engine_masked(engine, src->gt, mask, tmp)
+		siblings[num_siblings++] = engine;
+
+	return guc_create_virtual(siblings, num_siblings);
+}
+
+static const struct intel_context_ops virtual_guc_context_ops = {
+	.alloc = guc_virtual_context_alloc,
+
+	.pre_pin = guc_virtual_context_pre_pin,
+	.pin = guc_virtual_context_pin,
+	.unpin = guc_context_unpin,
+	.post_unpin = guc_context_post_unpin,
+
+	.enter = guc_virtual_context_enter,
+	.exit = guc_virtual_context_exit,
+
+	.sched_disable = guc_context_sched_disable,
+
+	.destroy = guc_context_destroy,
+
+	.clone_virtual = guc_clone_virtual,
+	.get_sibling = guc_virtual_get_sibling,
+};
+
 static void sanitize_hwsp(struct intel_engine_cs *engine)
 {
 	struct intel_timeline *tl;
@@ -1564,7 +1695,7 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 	} else if (context_destroyed(ce)) {
 		/* Context has been destroyed */
 		release_guc_id(guc, ce);
-		lrc_destroy(&ce->ref);
+		__guc_context_destroy(ce);
 	}
 
 	decr_outstanding_submission_g2h(guc);
@@ -1676,3 +1807,107 @@ void intel_guc_log_context_info(struct intel_guc *guc,
 			   atomic_read(&ce->guc_sched_state_no_lock));
 	}
 }
+
+static struct intel_context *
+guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
+{
+	struct guc_virtual_engine *ve;
+	struct intel_guc *guc;
+	unsigned int n;
+	int err;
+
+	ve = kzalloc(sizeof(*ve), GFP_KERNEL);
+	if (!ve)
+		return ERR_PTR(-ENOMEM);
+
+	guc = &siblings[0]->gt->uc.guc;
+
+	ve->base.i915 = siblings[0]->i915;
+	ve->base.gt = siblings[0]->gt;
+	ve->base.uncore = siblings[0]->uncore;
+	ve->base.id = -1;
+
+	ve->base.uabi_class = I915_ENGINE_CLASS_INVALID;
+	ve->base.instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
+	ve->base.uabi_instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
+	ve->base.saturated = ALL_ENGINES;
+	ve->base.breadcrumbs = intel_breadcrumbs_create(&ve->base);
+	if (!ve->base.breadcrumbs) {
+		kfree(ve);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	snprintf(ve->base.name, sizeof(ve->base.name), "virtual");
+
+	ve->base.sched_engine = i915_sched_engine_get(guc->sched_engine);
+
+	ve->base.cops = &virtual_guc_context_ops;
+	ve->base.request_alloc = guc_request_alloc;
+
+	ve->base.submit_request = guc_submit_request;
+
+	ve->base.flags = I915_ENGINE_IS_VIRTUAL;
+
+	intel_context_init(&ve->context, &ve->base);
+
+	for (n = 0; n < count; n++) {
+		struct intel_engine_cs *sibling = siblings[n];
+
+		GEM_BUG_ON(!is_power_of_2(sibling->mask));
+		if (sibling->mask & ve->base.mask) {
+			DRM_DEBUG("duplicate %s entry in load balancer\n",
+				  sibling->name);
+			err = -EINVAL;
+			goto err_put;
+		}
+
+		ve->base.mask |= sibling->mask;
+
+		if (n != 0 && ve->base.class != sibling->class) {
+			DRM_DEBUG("invalid mixing of engine class, sibling %d, already %d\n",
+				  sibling->class, ve->base.class);
+			err = -EINVAL;
+			goto err_put;
+		} else if (n == 0) {
+			ve->base.class = sibling->class;
+			ve->base.uabi_class = sibling->uabi_class;
+			snprintf(ve->base.name, sizeof(ve->base.name),
+				 "v%dx%d", ve->base.class, count);
+			ve->base.context_size = sibling->context_size;
+
+			ve->base.emit_bb_start = sibling->emit_bb_start;
+			ve->base.emit_flush = sibling->emit_flush;
+			ve->base.emit_init_breadcrumb =
+				sibling->emit_init_breadcrumb;
+			ve->base.emit_fini_breadcrumb =
+				sibling->emit_fini_breadcrumb;
+			ve->base.emit_fini_breadcrumb_dw =
+				sibling->emit_fini_breadcrumb_dw;
+
+			ve->base.flags |= sibling->flags;
+
+			ve->base.props.timeslice_duration_ms =
+				sibling->props.timeslice_duration_ms;
+		}
+	}
+
+	return &ve->context;
+
+err_put:
+	intel_context_put(&ve->context);
+	return ERR_PTR(err);
+}
+
+
+
+bool intel_guc_virtual_engine_has_heartbeat(const struct intel_engine_cs *ve)
+{
+	struct intel_engine_cs *engine;
+	intel_engine_mask_t tmp, mask = ve->mask;
+
+	for_each_engine_masked(engine, ve->gt, mask, tmp)
+		if (READ_ONCE(engine->props.heartbeat_interval_ms))
+			return true;
+
+	return false;
+}
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
index 6453e2bfa151..95df5ab06031 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
@@ -25,6 +25,8 @@ void intel_guc_log_submission_info(struct intel_guc *guc,
 				   struct drm_printer *p);
 void intel_guc_log_context_info(struct intel_guc *guc, struct drm_printer *p);
 
+bool intel_guc_virtual_engine_has_heartbeat(const struct intel_engine_cs *ve);
+
 static inline bool intel_guc_submission_is_supported(struct intel_guc *guc)
 {
 	/* XXX: GuC submission is unavailable for now */
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 60/97] drm/i915: Track 'serial' counts for virtual engines
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (58 preceding siblings ...)
  2021-05-06 19:14 ` [RFC PATCH 59/97] drm/i915/guc: GuC virtual engines Matthew Brost
@ 2021-05-06 19:14 ` Matthew Brost
  2021-05-25 10:16   ` [Intel-gfx] " Tvrtko Ursulin
  2021-06-02 12:09   ` Tvrtko Ursulin
  2021-05-06 19:14 ` [RFC PATCH 61/97] drm/i915: Hold reference to intel_context over life of i915_request Matthew Brost
                   ` (39 subsequent siblings)
  99 siblings, 2 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

From: John Harrison <John.C.Harrison@Intel.com>

The serial number tracking of engines happens at the backend of
request submission and expects to be given only physical engines.
However, in GuC submission mode, the decomposition of virtual to
physical engines does not happen in i915. Instead, requests are
submitted with their virtual engine mask all the way through to the
hardware (i.e. to the GuC). This means the heartbeat code thinks the
physical engines are idle because their serial numbers never increment.

This patch updates the tracking to decompose virtual engines into
their physical constituents and track the request against each. This
is not entirely accurate, as the GuC will only issue the request to
one physical engine, but it is the best that i915 can do given that it
has no knowledge of the GuC's scheduling decisions.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_engine_types.h     |  2 ++
 .../gpu/drm/i915/gt/intel_execlists_submission.c |  6 ++++++
 drivers/gpu/drm/i915/gt/intel_ring_submission.c  |  6 ++++++
 drivers/gpu/drm/i915/gt/mock_engine.c            |  6 ++++++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c    | 16 ++++++++++++++++
 drivers/gpu/drm/i915/i915_request.c              |  4 +++-
 6 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 86302e6d86b2..e2b5cda6dbc4 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -389,6 +389,8 @@ struct intel_engine_cs {
 	void		(*park)(struct intel_engine_cs *engine);
 	void		(*unpark)(struct intel_engine_cs *engine);
 
+	void		(*bump_serial)(struct intel_engine_cs *engine);
+
 	void		(*set_default_submission)(struct intel_engine_cs *engine);
 
 	const struct intel_context_ops *cops;
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index ae12d7f19ecd..02880ea5d693 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -3199,6 +3199,11 @@ static void execlists_release(struct intel_engine_cs *engine)
 	lrc_fini_wa_ctx(engine);
 }
 
+static void execlist_bump_serial(struct intel_engine_cs *engine)
+{
+	engine->serial++;
+}
+
 static void
 logical_ring_default_vfuncs(struct intel_engine_cs *engine)
 {
@@ -3208,6 +3213,7 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
 
 	engine->cops = &execlists_context_ops;
 	engine->request_alloc = execlists_request_alloc;
+	engine->bump_serial = execlist_bump_serial;
 
 	engine->reset.prepare = execlists_reset_prepare;
 	engine->reset.rewind = execlists_reset_rewind;
diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
index 14aa31879a37..39dd7c4ed0a9 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
@@ -1045,6 +1045,11 @@ static void setup_irq(struct intel_engine_cs *engine)
 	}
 }
 
+static void ring_bump_serial(struct intel_engine_cs *engine)
+{
+	engine->serial++;
+}
+
 static void setup_common(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *i915 = engine->i915;
@@ -1064,6 +1069,7 @@ static void setup_common(struct intel_engine_cs *engine)
 
 	engine->cops = &ring_context_ops;
 	engine->request_alloc = ring_request_alloc;
+	engine->bump_serial = ring_bump_serial;
 
 	/*
 	 * Using a global execution timeline; the previous final breadcrumb is
diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
index bd005c1b6fd5..97b10fd60b55 100644
--- a/drivers/gpu/drm/i915/gt/mock_engine.c
+++ b/drivers/gpu/drm/i915/gt/mock_engine.c
@@ -292,6 +292,11 @@ static void mock_engine_release(struct intel_engine_cs *engine)
 	intel_engine_fini_retire(engine);
 }
 
+static void mock_bump_serial(struct intel_engine_cs *engine)
+{
+	engine->serial++;
+}
+
 struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
 				    const char *name,
 				    int id)
@@ -318,6 +323,7 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
 
 	engine->base.cops = &mock_context_ops;
 	engine->base.request_alloc = mock_request_alloc;
+	engine->base.bump_serial = mock_bump_serial;
 	engine->base.emit_flush = mock_emit_flush;
 	engine->base.emit_fini_breadcrumb = mock_emit_breadcrumb;
 	engine->base.submit_request = mock_submit_request;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index dc79d287c50a..f0e5731bcef6 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1500,6 +1500,20 @@ static void guc_release(struct intel_engine_cs *engine)
 	lrc_fini_wa_ctx(engine);
 }
 
+static void guc_bump_serial(struct intel_engine_cs *engine)
+{
+	engine->serial++;
+}
+
+static void virtual_guc_bump_serial(struct intel_engine_cs *engine)
+{
+	struct intel_engine_cs *e;
+	intel_engine_mask_t tmp, mask = engine->mask;
+
+	for_each_engine_masked(e, engine->gt, mask, tmp)
+		e->serial++;
+}
+
 static void guc_default_vfuncs(struct intel_engine_cs *engine)
 {
 	/* Default vfuncs which can be overridden by each engine. */
@@ -1508,6 +1522,7 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
 
 	engine->cops = &guc_context_ops;
 	engine->request_alloc = guc_request_alloc;
+	engine->bump_serial = guc_bump_serial;
 
 	engine->sched_engine->schedule = i915_schedule;
 
@@ -1843,6 +1858,7 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
 
 	ve->base.cops = &virtual_guc_context_ops;
 	ve->base.request_alloc = guc_request_alloc;
+	ve->base.bump_serial = virtual_guc_bump_serial;
 
 	ve->base.submit_request = guc_submit_request;
 
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 9542a5baa45a..127d60b36422 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -692,7 +692,9 @@ bool __i915_request_submit(struct i915_request *request)
 				     request->ring->vaddr + request->postfix);
 
 	trace_i915_request_execute(request);
-	engine->serial++;
+	if (engine->bump_serial)
+		engine->bump_serial(engine);
+
 	result = true;
 
 	GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 61/97] drm/i915: Hold reference to intel_context over life of i915_request
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (59 preceding siblings ...)
  2021-05-06 19:14 ` [RFC PATCH 60/97] drm/i915: Track 'serial' counts for " Matthew Brost
@ 2021-05-06 19:14 ` Matthew Brost
  2021-06-02 12:18   ` [Intel-gfx] " Tvrtko Ursulin
  2021-05-06 19:14 ` [RFC PATCH 62/97] drm/i915/guc: Disable bonding extension with GuC submission Matthew Brost
                   ` (38 subsequent siblings)
  99 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

Hold a reference to the intel_context over the life of an i915_request.
Without this an i915_request can exist after the context has been
destroyed (e.g. request retired, context closed, but user space holds a
reference to the request from an out fence). In the case of GuC
submission + virtual engine, the engine that the request references is
also destroyed, which can trigger a bad pointer dereference in the fence
ops (e.g. i915_fence_get_driver_name). We could likely change
i915_fence_get_driver_name to avoid touching the engine, but let's just
be safe and hold the intel_context reference.

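For illustration, the fence op in question looks roughly like this
(simplified sketch, not part of the patch):

	static const char *i915_fence_get_driver_name(struct dma_fence *fence)
	{
		struct i915_request *rq = to_request(fence);

		/*
		 * rq->engine may point at a GuC virtual engine that was freed
		 * together with its context while user space still holds the
		 * out fence, hence the dangling pointer dereference.
		 */
		return dev_name(rq->engine->i915->drm.dev);
	}
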
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/i915_request.c | 54 ++++++++++++-----------------
 1 file changed, 22 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 127d60b36422..0b96b824ea06 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -126,39 +126,17 @@ static void i915_fence_release(struct dma_fence *fence)
 	i915_sw_fence_fini(&rq->semaphore);
 
 	/*
-	 * Keep one request on each engine for reserved use under mempressure
-	 *
-	 * We do not hold a reference to the engine here and so have to be
-	 * very careful in what rq->engine we poke. The virtual engine is
-	 * referenced via the rq->context and we released that ref during
-	 * i915_request_retire(), ergo we must not dereference a virtual
-	 * engine here. Not that we would want to, as the only consumer of
-	 * the reserved engine->request_pool is the power management parking,
-	 * which must-not-fail, and that is only run on the physical engines.
-	 *
-	 * Since the request must have been executed to be have completed,
-	 * we know that it will have been processed by the HW and will
-	 * not be unsubmitted again, so rq->engine and rq->execution_mask
-	 * at this point is stable. rq->execution_mask will be a single
-	 * bit if the last and _only_ engine it could execution on was a
-	 * physical engine, if it's multiple bits then it started on and
-	 * could still be on a virtual engine. Thus if the mask is not a
-	 * power-of-two we assume that rq->engine may still be a virtual
-	 * engine and so a dangling invalid pointer that we cannot dereference
-	 *
-	 * For example, consider the flow of a bonded request through a virtual
-	 * engine. The request is created with a wide engine mask (all engines
-	 * that we might execute on). On processing the bond, the request mask
-	 * is reduced to one or more engines. If the request is subsequently
-	 * bound to a single engine, it will then be constrained to only
-	 * execute on that engine and never returned to the virtual engine
-	 * after timeslicing away, see __unwind_incomplete_requests(). Thus we
-	 * know that if the rq->execution_mask is a single bit, rq->engine
-	 * can be a physical engine with the exact corresponding mask.
+	 * Keep one request on each engine for reserved use under mempressure,
+	 * do not use with virtual engines as this really is only needed for
+	 * kernel contexts.
 	 */
-	if (is_power_of_2(rq->execution_mask) &&
-	    !cmpxchg(&rq->engine->request_pool, NULL, rq))
+	if (!intel_engine_is_virtual(rq->engine) &&
+	    !cmpxchg(&rq->engine->request_pool, NULL, rq)) {
+		intel_context_put(rq->context);
 		return;
+	}
+
+	intel_context_put(rq->context);
 
 	kmem_cache_free(global.slab_requests, rq);
 }
@@ -977,7 +955,18 @@ __i915_request_create(struct intel_context *ce, gfp_t gfp)
 		}
 	}
 
-	rq->context = ce;
+	/*
+	 * Hold a reference to the intel_context over life of an i915_request.
+	 * Without this an i915_request can exist after the context has been
+	 * destroyed (e.g. request retired, context closed, but user space holds
+	 * a reference to the request from an out fence). In the case of GuC
+	 * submission + virtual engine, the engine that the request references
+	 * is also destroyed which can trigger bad pointer dref in fence ops
+	 * (e.g. i915_fence_get_driver_name). We could likely change these
+	 * functions to avoid touching the engine but let's just be safe and
+	 * hold the intel_context reference.
+	 */
+	rq->context = intel_context_get(ce);
 	rq->engine = ce->engine;
 	rq->ring = ce->ring;
 	rq->execution_mask = ce->engine->mask;
@@ -1054,6 +1043,7 @@ __i915_request_create(struct intel_context *ce, gfp_t gfp)
 	GEM_BUG_ON(!list_empty(&rq->sched.waiters_list));
 
 err_free:
+	intel_context_put(ce);
 	kmem_cache_free(global.slab_requests, rq);
 err_unreserve:
 	intel_context_unpin(ce);
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 62/97] drm/i915/guc: Disable bonding extension with GuC submission
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (60 preceding siblings ...)
  2021-05-06 19:14 ` [RFC PATCH 61/97] drm/i915: Hold reference to intel_context over life of i915_request Matthew Brost
@ 2021-05-06 19:14 ` Matthew Brost
  2021-05-06 19:14 ` [RFC PATCH 63/97] drm/i915/guc: Direct all breadcrumbs for a class to single breadcrumbs Matthew Brost
                   ` (37 subsequent siblings)
  99 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

Update the bonding extension to return -ENODEV when using GuC submission
as this extension fundamentally will not work with the GuC submission
interface.
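
For illustration only (not part of the patch): a tiny standalone sketch of
the gating pattern used below - a set-engines extension handler that rejects
the bond extension outright when the submission backend cannot honour it.
All names in this sketch are invented for the example.

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* invented stand-in for intel_engine_uses_guc() */
static bool engine_uses_guc(void)
{
	return true;
}

/* invented stand-in for set_engines__bond() */
static int set_bond_extension(void)
{
	if (engine_uses_guc()) {
		fprintf(stderr, "bonding extension not supported with GuC submission\n");
		return -ENODEV;
	}
	/* ... validate flags and record the bond here ... */
	return 0;
}

int main(void)
{
	printf("set_bond_extension() = %d\n", set_bond_extension());
	return 0;
}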

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index e6bc5c666f93..bb827bb99250 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1675,6 +1675,11 @@ set_engines__bond(struct i915_user_extension __user *base, void *data)
 	}
 	virtual = set->engines->engines[idx]->engine;
 
+	if (intel_engine_uses_guc(virtual)) {
+		DRM_DEBUG("bonding extension not supported with GuC submission");
+		return -ENODEV;
+	}
+
 	err = check_user_mbz(&ext->flags);
 	if (err)
 		return err;
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 63/97] drm/i915/guc: Direct all breadcrumbs for a class to single breadcrumbs
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (61 preceding siblings ...)
  2021-05-06 19:14 ` [RFC PATCH 62/97] drm/i915/guc: Disable bonding extension with GuC submission Matthew Brost
@ 2021-05-06 19:14 ` Matthew Brost
  2021-06-02 13:31   ` [Intel-gfx] " Tvrtko Ursulin
  2021-05-06 19:14 ` [RFC PATCH 64/97] drm/i915/guc: Reset implementation for new GuC interface Matthew Brost
                   ` (36 subsequent siblings)
  99 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

With GuC virtual engines the physical engine on which a request executes
and completes isn't known to the i915, so we can't attach a request to a
physical engine's breadcrumbs. To work around this we create a single
breadcrumbs object per engine class when using GuC submission and direct
all physical engine interrupts to that object.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
CC: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c   | 41 +++++-------
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.h   | 14 +++-
 .../gpu/drm/i915/gt/intel_breadcrumbs_types.h |  7 ++
 drivers/gpu/drm/i915/gt/intel_engine.h        |  3 +
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     | 28 +++++++-
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |  1 -
 .../drm/i915/gt/intel_execlists_submission.c  |  4 +-
 drivers/gpu/drm/i915/gt/mock_engine.c         |  4 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 67 +++++++++++++++++--
 9 files changed, 133 insertions(+), 36 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index 38cc42783dfb..2007dc6f6b99 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -15,28 +15,14 @@
 #include "intel_gt_pm.h"
 #include "intel_gt_requests.h"
 
-static bool irq_enable(struct intel_engine_cs *engine)
+static bool irq_enable(struct intel_breadcrumbs *b)
 {
-	if (!engine->irq_enable)
-		return false;
-
-	/* Caller disables interrupts */
-	spin_lock(&engine->gt->irq_lock);
-	engine->irq_enable(engine);
-	spin_unlock(&engine->gt->irq_lock);
-
-	return true;
+	return intel_engine_irq_enable(b->irq_engine);
 }
 
-static void irq_disable(struct intel_engine_cs *engine)
+static void irq_disable(struct intel_breadcrumbs *b)
 {
-	if (!engine->irq_disable)
-		return;
-
-	/* Caller disables interrupts */
-	spin_lock(&engine->gt->irq_lock);
-	engine->irq_disable(engine);
-	spin_unlock(&engine->gt->irq_lock);
+	intel_engine_irq_disable(b->irq_engine);
 }
 
 static void __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b)
@@ -57,7 +43,7 @@ static void __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b)
 	WRITE_ONCE(b->irq_armed, true);
 
 	/* Requests may have completed before we could enable the interrupt. */
-	if (!b->irq_enabled++ && irq_enable(b->irq_engine))
+	if (!b->irq_enabled++ && b->irq_enable(b))
 		irq_work_queue(&b->irq_work);
 }
 
@@ -76,7 +62,7 @@ static void __intel_breadcrumbs_disarm_irq(struct intel_breadcrumbs *b)
 {
 	GEM_BUG_ON(!b->irq_enabled);
 	if (!--b->irq_enabled)
-		irq_disable(b->irq_engine);
+		b->irq_disable(b);
 
 	WRITE_ONCE(b->irq_armed, false);
 	intel_gt_pm_put_async(b->irq_engine->gt);
@@ -281,7 +267,7 @@ intel_breadcrumbs_create(struct intel_engine_cs *irq_engine)
 	if (!b)
 		return NULL;
 
-	b->irq_engine = irq_engine;
+	kref_init(&b->ref);
 
 	spin_lock_init(&b->signalers_lock);
 	INIT_LIST_HEAD(&b->signalers);
@@ -290,6 +276,10 @@ intel_breadcrumbs_create(struct intel_engine_cs *irq_engine)
 	spin_lock_init(&b->irq_lock);
 	init_irq_work(&b->irq_work, signal_irq_work);
 
+	b->irq_engine = irq_engine;
+	b->irq_enable = irq_enable;
+	b->irq_disable = irq_disable;
+
 	return b;
 }
 
@@ -303,9 +293,9 @@ void intel_breadcrumbs_reset(struct intel_breadcrumbs *b)
 	spin_lock_irqsave(&b->irq_lock, flags);
 
 	if (b->irq_enabled)
-		irq_enable(b->irq_engine);
+		b->irq_enable(b);
 	else
-		irq_disable(b->irq_engine);
+		b->irq_disable(b);
 
 	spin_unlock_irqrestore(&b->irq_lock, flags);
 }
@@ -325,11 +315,14 @@ void __intel_breadcrumbs_park(struct intel_breadcrumbs *b)
 	}
 }
 
-void intel_breadcrumbs_free(struct intel_breadcrumbs *b)
+void intel_breadcrumbs_free(struct kref *kref)
 {
+	struct intel_breadcrumbs *b = container_of(kref, typeof(*b), ref);
+
 	irq_work_sync(&b->irq_work);
 	GEM_BUG_ON(!list_empty(&b->signalers));
 	GEM_BUG_ON(b->irq_armed);
+
 	kfree(b);
 }
 
diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h
index 3ce5ce270b04..72105b74663d 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h
@@ -17,7 +17,7 @@ struct intel_breadcrumbs;
 
 struct intel_breadcrumbs *
 intel_breadcrumbs_create(struct intel_engine_cs *irq_engine);
-void intel_breadcrumbs_free(struct intel_breadcrumbs *b);
+void intel_breadcrumbs_free(struct kref *kref);
 
 void intel_breadcrumbs_reset(struct intel_breadcrumbs *b);
 void __intel_breadcrumbs_park(struct intel_breadcrumbs *b);
@@ -48,4 +48,16 @@ void i915_request_cancel_breadcrumb(struct i915_request *request);
 void intel_context_remove_breadcrumbs(struct intel_context *ce,
 				      struct intel_breadcrumbs *b);
 
+static inline struct intel_breadcrumbs *
+intel_breadcrumbs_get(struct intel_breadcrumbs *b)
+{
+	kref_get(&b->ref);
+	return b;
+}
+
+static inline void intel_breadcrumbs_put(struct intel_breadcrumbs *b)
+{
+	kref_put(&b->ref, intel_breadcrumbs_free);
+}
+
 #endif /* __INTEL_BREADCRUMBS__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h b/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
index 3a084ce8ff5e..a4e146684be8 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
@@ -7,10 +7,13 @@
 #define __INTEL_BREADCRUMBS_TYPES__
 
 #include <linux/irq_work.h>
+#include <linux/kref.h>
 #include <linux/list.h>
 #include <linux/spinlock.h>
 #include <linux/types.h>
 
+typedef u8 intel_engine_mask_t;
+
 /*
  * Rather than have every client wait upon all user interrupts,
  * with the herd waking after every interrupt and each doing the
@@ -29,6 +32,7 @@
  * the overhead of waking that client is much preferred.
  */
 struct intel_breadcrumbs {
+	struct kref ref;
 	atomic_t active;
 
 	spinlock_t signalers_lock; /* protects the list of signalers */
@@ -42,7 +46,10 @@ struct intel_breadcrumbs {
 	bool irq_armed;
 
 	/* Not all breadcrumbs are attached to physical HW */
+	intel_engine_mask_t	engine_mask;
 	struct intel_engine_cs *irq_engine;
+	bool	(*irq_enable)(struct intel_breadcrumbs *b);
+	void	(*irq_disable)(struct intel_breadcrumbs *b);
 };
 
 #endif /* __INTEL_BREADCRUMBS_TYPES__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
index 3cd09381b6f8..3321d0917a99 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -209,6 +209,9 @@ void intel_engine_get_instdone(const struct intel_engine_cs *engine,
 
 void intel_engine_init_execlists(struct intel_engine_cs *engine);
 
+bool intel_engine_irq_enable(struct intel_engine_cs *engine);
+void intel_engine_irq_disable(struct intel_engine_cs *engine);
+
 static inline void __intel_engine_reset(struct intel_engine_cs *engine,
 					bool stalled)
 {
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 903f72f0953a..10300db1c9a6 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -765,7 +765,7 @@ static int engine_setup_common(struct intel_engine_cs *engine)
 err_cmd_parser:
 	i915_sched_engine_put(engine->sched_engine);
 err_sched_engine:
-	intel_breadcrumbs_free(engine->breadcrumbs);
+	intel_breadcrumbs_put(engine->breadcrumbs);
 err_status:
 	cleanup_status_page(engine);
 	return err;
@@ -965,7 +965,7 @@ void intel_engine_cleanup_common(struct intel_engine_cs *engine)
 {
 	GEM_BUG_ON(!list_empty(&engine->sched_engine->requests));
 
-	intel_breadcrumbs_free(engine->breadcrumbs);
+	intel_breadcrumbs_put(engine->breadcrumbs);
 	i915_sched_engine_put(engine->sched_engine);
 
 	intel_engine_fini_retire(engine);
@@ -1320,6 +1320,30 @@ bool intel_engines_are_idle(struct intel_gt *gt)
 	return true;
 }
 
+bool intel_engine_irq_enable(struct intel_engine_cs *engine)
+{
+	if (!engine->irq_enable)
+		return false;
+
+	/* Caller disables interrupts */
+	spin_lock(&engine->gt->irq_lock);
+	engine->irq_enable(engine);
+	spin_unlock(&engine->gt->irq_lock);
+
+	return true;
+}
+
+void intel_engine_irq_disable(struct intel_engine_cs *engine)
+{
+	if (!engine->irq_disable)
+		return;
+
+	/* Caller disables interrupts */
+	spin_lock(&engine->gt->irq_lock);
+	engine->irq_disable(engine);
+	spin_unlock(&engine->gt->irq_lock);
+}
+
 void intel_engines_reset_default_submission(struct intel_gt *gt)
 {
 	struct intel_engine_cs *engine;
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index e2b5cda6dbc4..f7b6eed586ce 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -64,7 +64,6 @@ struct intel_gt;
 struct intel_ring;
 struct intel_uncore;
 
-typedef u8 intel_engine_mask_t;
 #define ALL_ENGINES ((intel_engine_mask_t)~0ul)
 
 struct intel_hw_status_page {
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 02880ea5d693..396b1356ea3e 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -3418,9 +3418,11 @@ static void rcu_virtual_context_destroy(struct work_struct *wrk)
 	lrc_fini(&ve->context);
 	intel_context_fini(&ve->context);
 
-	intel_breadcrumbs_free(ve->base.breadcrumbs);
+	if (ve->base.breadcrumbs)
+		intel_breadcrumbs_put(ve->base.breadcrumbs);
 	if (ve->base.sched_engine)
 		i915_sched_engine_put(ve->base.sched_engine);
+
 	intel_engine_free_request_pool(&ve->base);
 
 	kfree(ve->bonds);
diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
index 97b10fd60b55..4d023b5cd5da 100644
--- a/drivers/gpu/drm/i915/gt/mock_engine.c
+++ b/drivers/gpu/drm/i915/gt/mock_engine.c
@@ -284,7 +284,7 @@ static void mock_engine_release(struct intel_engine_cs *engine)
 	GEM_BUG_ON(timer_pending(&mock->hw_delay));
 
 	i915_sched_engine_put(engine->sched_engine);
-	intel_breadcrumbs_free(engine->breadcrumbs);
+	intel_breadcrumbs_put(engine->breadcrumbs);
 
 	intel_context_unpin(engine->kernel_context);
 	intel_context_put(engine->kernel_context);
@@ -376,7 +376,7 @@ int mock_engine_init(struct intel_engine_cs *engine)
 	return 0;
 
 err_breadcrumbs:
-	intel_breadcrumbs_free(engine->breadcrumbs);
+	intel_breadcrumbs_put(engine->breadcrumbs);
 err_schedule:
 	i915_sched_engine_put(engine->sched_engine);
 	return -ENOMEM;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index f0e5731bcef6..80b89171b35a 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1077,6 +1077,9 @@ static void __guc_context_destroy(struct intel_context *ce)
 		struct guc_virtual_engine *ve =
 			container_of(ce, typeof(*ve), context);
 
+		if (ve->base.breadcrumbs)
+			intel_breadcrumbs_put(ve->base.breadcrumbs);
+
 		kfree(ve);
 	} else {
 		intel_context_free(ce);
@@ -1381,6 +1384,62 @@ static const struct intel_context_ops virtual_guc_context_ops = {
 	.get_sibling = guc_virtual_get_sibling,
 };
 
+static bool
+guc_irq_enable_breadcrumbs(struct intel_breadcrumbs *b)
+{
+	struct intel_engine_cs *sibling;
+	intel_engine_mask_t tmp, mask = b->engine_mask;
+	bool result = false;
+
+	for_each_engine_masked(sibling, b->irq_engine->gt, mask, tmp)
+		result |= intel_engine_irq_enable(sibling);
+
+	return result;
+}
+
+static void
+guc_irq_disable_breadcrumbs(struct intel_breadcrumbs *b)
+{
+	struct intel_engine_cs *sibling;
+	intel_engine_mask_t tmp, mask = b->engine_mask;
+
+	for_each_engine_masked(sibling, b->irq_engine->gt, mask, tmp)
+		intel_engine_irq_disable(sibling);
+}
+
+static void guc_init_breadcrumbs(struct intel_engine_cs *engine)
+{
+	int i;
+
+	/*
+	 * In GuC submission mode we do not know which physical engine a request
+	 * will be scheduled on; this creates a problem because the breadcrumb
+	 * interrupt is per physical engine. To work around this we attach
+	 * requests and direct all breadcrumb interrupts to the first instance
+	 * of an engine per class. In addition all breadcrumb interrupts are
+	 * enabled / disabled across an engine class in unison.
+	 */
+	for (i = 0; i < MAX_ENGINE_INSTANCE; ++i) {
+		struct intel_engine_cs *sibling =
+			engine->gt->engine_class[engine->class][i];
+
+		if (sibling) {
+			if (engine->breadcrumbs != sibling->breadcrumbs) {
+				intel_breadcrumbs_put(engine->breadcrumbs);
+				engine->breadcrumbs =
+					intel_breadcrumbs_get(sibling->breadcrumbs);
+			}
+			break;
+		}
+	}
+
+	if (engine->breadcrumbs) {
+		engine->breadcrumbs->engine_mask |= engine->mask;
+		engine->breadcrumbs->irq_enable = guc_irq_enable_breadcrumbs;
+		engine->breadcrumbs->irq_disable = guc_irq_disable_breadcrumbs;
+	}
+}
+
 static void sanitize_hwsp(struct intel_engine_cs *engine)
 {
 	struct intel_timeline *tl;
@@ -1604,6 +1663,7 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
 
 	guc_default_vfuncs(engine);
 	guc_default_irqs(engine);
+	guc_init_breadcrumbs(engine);
 
 	if (engine->class == RENDER_CLASS)
 		rcs_submission_override(engine);
@@ -1846,11 +1906,6 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
 	ve->base.instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
 	ve->base.uabi_instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
 	ve->base.saturated = ALL_ENGINES;
-	ve->base.breadcrumbs = intel_breadcrumbs_create(&ve->base);
-	if (!ve->base.breadcrumbs) {
-		kfree(ve);
-		return ERR_PTR(-ENOMEM);
-	}
 
 	snprintf(ve->base.name, sizeof(ve->base.name), "virtual");
 
@@ -1899,6 +1954,8 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
 				sibling->emit_fini_breadcrumb;
 			ve->base.emit_fini_breadcrumb_dw =
 				sibling->emit_fini_breadcrumb_dw;
+			ve->base.breadcrumbs =
+				intel_breadcrumbs_get(sibling->breadcrumbs);
 
 			ve->base.flags |= sibling->flags;
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 64/97] drm/i915/guc: Reset implementation for new GuC interface
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (62 preceding siblings ...)
  2021-05-06 19:14 ` [RFC PATCH 63/97] drm/i915/guc: Direct all breadcrumbs for a class to single breadcrumbs Matthew Brost
@ 2021-05-06 19:14 ` Matthew Brost
  2021-06-02 14:33   ` [Intel-gfx] " Tvrtko Ursulin
  2021-05-06 19:14 ` [RFC PATCH 65/97] drm/i915: Reset GPU immediately if submission is disabled Matthew Brost
                   ` (35 subsequent siblings)
  99 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

Reset implementation for the new GuC interface. This is the legacy reset
implementation, which is used when the i915 owns the engine hang check.
Future patches will offload the engine hang check to the GuC, but we will
continue to maintain this legacy path as a fallback; it is also required
if the GuC dies.

With the new GuC interface it is not possible to reset individual
engines - it is only possible to reset the GPU entirely. This patch
forces an entire chip reset if any engine hangs.
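
Not part of the patch - a compact, standalone sketch of the control flow
this creates: per-engine reset is refused under GuC submission, so the
error handler escalates to a full chip reset. The names only loosely
mirror the driver functions and are otherwise invented.

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* invented stand-in for "does GuC own submission on this engine/GT?" */
static bool engine_uses_guc = true;

/* loosely mirrors __intel_engine_reset_bh(): refuse per-engine reset under GuC */
static int engine_reset(const char *name)
{
	if (engine_uses_guc)
		return -ENODEV;
	printf("reset engine %s only\n", name);
	return 0;
}

/* loosely mirrors intel_gt_handle_error(): try engine reset, else full reset */
static void handle_hang(const char *name)
{
	if (!engine_uses_guc && engine_reset(name) == 0)
		return;
	printf("engine %s hung: escalating to full GT reset\n", name);
}

int main(void)
{
	handle_hang("vcs0");
	return 0;
}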

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context.c       |   3 +
 drivers/gpu/drm/i915/gt/intel_context_types.h |   7 +
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |   6 +
 .../drm/i915/gt/intel_execlists_submission.c  |  40 ++
 drivers/gpu/drm/i915/gt/intel_gt_pm.c         |   6 +-
 drivers/gpu/drm/i915/gt/intel_reset.c         |  18 +-
 .../gpu/drm/i915/gt/intel_ring_submission.c   |  22 +
 drivers/gpu/drm/i915/gt/mock_engine.c         |  31 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.c        |  16 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   8 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 580 ++++++++++++++----
 drivers/gpu/drm/i915/gt/uc/intel_uc.c         |  34 +-
 drivers/gpu/drm/i915/gt/uc/intel_uc.h         |   3 +
 drivers/gpu/drm/i915/i915_request.c           |  41 +-
 drivers/gpu/drm/i915/i915_request.h           |   2 +
 15 files changed, 643 insertions(+), 174 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index b24a1b7a3f88..2f01437056a8 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -392,6 +392,9 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
 	spin_lock_init(&ce->guc_state.lock);
 	INIT_LIST_HEAD(&ce->guc_state.fences);
 
+	spin_lock_init(&ce->guc_active.lock);
+	INIT_LIST_HEAD(&ce->guc_active.requests);
+
 	ce->guc_id = GUC_INVALID_LRC_ID;
 	INIT_LIST_HEAD(&ce->guc_id_link);
 
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 6945963a31ba..b63c8cf7823b 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -165,6 +165,13 @@ struct intel_context {
 		struct list_head fences;
 	} guc_state;
 
+	struct {
+		/** lock: protects everything in guc_active */
+		spinlock_t lock;
+		/** requests: active requests on this context */
+		struct list_head requests;
+	} guc_active;
+
 	/* GuC scheduling state that does not require a lock. */
 	atomic_t guc_sched_state_no_lock;
 
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index f7b6eed586ce..b84562b2708b 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -432,6 +432,12 @@ struct intel_engine_cs {
 	 */
 	void		(*release)(struct intel_engine_cs *engine);
 
+	/*
+	 * Add / remove request from engine active tracking
+	 */
+	void		(*add_active_request)(struct i915_request *rq);
+	void		(*remove_active_request)(struct i915_request *rq);
+
 	struct intel_engine_execlists execlists;
 
 	/*
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 396b1356ea3e..54518b64bdbd 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -3117,6 +3117,42 @@ static void execlists_park(struct intel_engine_cs *engine)
 	cancel_timer(&engine->execlists.preempt);
 }
 
+static void add_to_engine(struct i915_request *rq)
+{
+	lockdep_assert_held(&rq->engine->sched_engine->lock);
+	list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
+}
+
+static void remove_from_engine(struct i915_request *rq)
+{
+	struct intel_engine_cs *engine, *locked;
+
+	/*
+	 * Virtual engines complicate acquiring the engine timeline lock,
+	 * as their rq->engine pointer is not stable until under that
+	 * engine lock. The simple ploy we use is to take the lock then
+	 * check that the rq still belongs to the newly locked engine.
+	 */
+	locked = READ_ONCE(rq->engine);
+	spin_lock_irq(&locked->sched_engine->lock);
+	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
+		spin_unlock(&locked->sched_engine->lock);
+		spin_lock(&engine->sched_engine->lock);
+		locked = engine;
+	}
+	list_del_init(&rq->sched.link);
+
+	clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
+	clear_bit(I915_FENCE_FLAG_HOLD, &rq->fence.flags);
+
+	/* Prevent further __await_execution() registering a cb, then flush */
+	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
+
+	spin_unlock_irq(&locked->sched_engine->lock);
+
+	i915_request_notify_execute_cb_imm(rq);
+}
+
 static bool can_preempt(struct intel_engine_cs *engine)
 {
 	if (INTEL_GEN(engine->i915) > 8)
@@ -3214,6 +3250,8 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
 	engine->cops = &execlists_context_ops;
 	engine->request_alloc = execlists_request_alloc;
 	engine->bump_serial = execlist_bump_serial;
+	engine->add_active_request = add_to_engine;
+	engine->remove_active_request = remove_from_engine;
 
 	engine->reset.prepare = execlists_reset_prepare;
 	engine->reset.rewind = execlists_reset_rewind;
@@ -3915,6 +3953,8 @@ execlists_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
 		ve->base.sched_engine->kick_backend =
 			sibling->sched_engine->kick_backend;
 
+		ve->base.add_active_request = sibling->add_active_request;
+		ve->base.remove_active_request = sibling->remove_active_request;
 		ve->base.emit_bb_start = sibling->emit_bb_start;
 		ve->base.emit_flush = sibling->emit_flush;
 		ve->base.emit_init_breadcrumb = sibling->emit_init_breadcrumb;
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
index aef3084e8b16..463a6ae605a0 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
@@ -174,8 +174,6 @@ static void gt_sanitize(struct intel_gt *gt, bool force)
 	if (intel_gt_is_wedged(gt))
 		intel_gt_unset_wedged(gt);
 
-	intel_uc_sanitize(&gt->uc);
-
 	for_each_engine(engine, gt, id)
 		if (engine->reset.prepare)
 			engine->reset.prepare(engine);
@@ -191,6 +189,8 @@ static void gt_sanitize(struct intel_gt *gt, bool force)
 			__intel_engine_reset(engine, false);
 	}
 
+	intel_uc_reset(&gt->uc, false);
+
 	for_each_engine(engine, gt, id)
 		if (engine->reset.finish)
 			engine->reset.finish(engine);
@@ -243,6 +243,8 @@ int intel_gt_resume(struct intel_gt *gt)
 		goto err_wedged;
 	}
 
+	intel_uc_reset_finish(&gt->uc);
+
 	intel_rps_enable(&gt->rps);
 	intel_llc_enable(&gt->llc);
 
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index d5094be6d90f..ce3ef26ffe2d 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -758,6 +758,8 @@ static int gt_reset(struct intel_gt *gt, intel_engine_mask_t stalled_mask)
 		__intel_engine_reset(engine, stalled_mask & engine->mask);
 	local_bh_enable();
 
+	intel_uc_reset(&gt->uc, true);
+
 	intel_ggtt_restore_fences(gt->ggtt);
 
 	return err;
@@ -782,6 +784,8 @@ static void reset_finish(struct intel_gt *gt, intel_engine_mask_t awake)
 		if (awake & engine->mask)
 			intel_engine_pm_put(engine);
 	}
+
+	intel_uc_reset_finish(&gt->uc);
 }
 
 static void nop_submit_request(struct i915_request *request)
@@ -835,6 +839,7 @@ static void __intel_gt_set_wedged(struct intel_gt *gt)
 	for_each_engine(engine, gt, id)
 		if (engine->reset.cancel)
 			engine->reset.cancel(engine);
+	intel_uc_cancel_requests(&gt->uc);
 	local_bh_enable();
 
 	reset_finish(gt, awake);
@@ -1123,6 +1128,9 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)
 	ENGINE_TRACE(engine, "flags=%lx\n", gt->reset.flags);
 	GEM_BUG_ON(!test_bit(I915_RESET_ENGINE + engine->id, &gt->reset.flags));
 
+	if (intel_engine_uses_guc(engine))
+		return -ENODEV;
+
 	if (!intel_engine_pm_get_if_awake(engine))
 		return 0;
 
@@ -1133,13 +1141,10 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)
 			   "Resetting %s for %s\n", engine->name, msg);
 	atomic_inc(&engine->i915->gpu_error.reset_engine_count[engine->uabi_class]);
 
-	if (intel_engine_uses_guc(engine))
-		ret = intel_guc_reset_engine(&engine->gt->uc.guc, engine);
-	else
-		ret = intel_gt_reset_engine(engine);
+	ret = intel_gt_reset_engine(engine);
 	if (ret) {
 		/* If we fail here, we expect to fallback to a global reset */
-		ENGINE_TRACE(engine, "Failed to reset, err: %d\n", ret);
+		ENGINE_TRACE(engine, "Failed to reset %s, err: %d\n", engine->name, ret);
 		goto out;
 	}
 
@@ -1273,7 +1278,8 @@ void intel_gt_handle_error(struct intel_gt *gt,
 	 * Try engine reset when available. We fall back to full reset if
 	 * single reset fails.
 	 */
-	if (intel_has_reset_engine(gt) && !intel_gt_is_wedged(gt)) {
+	if (!intel_uc_uses_guc_submission(&gt->uc) &&
+	    intel_has_reset_engine(gt) && !intel_gt_is_wedged(gt)) {
 		local_bh_disable();
 		for_each_engine_masked(engine, gt, engine_mask, tmp) {
 			BUILD_BUG_ON(I915_RESET_MODESET >= I915_RESET_ENGINE);
diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
index 39dd7c4ed0a9..7d05bf16094c 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
@@ -1050,6 +1050,25 @@ static void ring_bump_serial(struct intel_engine_cs *engine)
 	engine->serial++;
 }
 
+static void add_to_engine(struct i915_request *rq)
+{
+	lockdep_assert_held(&rq->engine->sched_engine->lock);
+	list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
+}
+
+static void remove_from_engine(struct i915_request *rq)
+{
+	spin_lock_irq(&rq->engine->sched_engine->lock);
+	list_del_init(&rq->sched.link);
+
+	/* Prevent further __await_execution() registering a cb, then flush */
+	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
+
+	spin_unlock_irq(&rq->engine->sched_engine->lock);
+
+	i915_request_notify_execute_cb_imm(rq);
+}
+
 static void setup_common(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *i915 = engine->i915;
@@ -1067,6 +1086,9 @@ static void setup_common(struct intel_engine_cs *engine)
 	engine->reset.cancel = reset_cancel;
 	engine->reset.finish = reset_finish;
 
+	engine->add_active_request = add_to_engine;
+	engine->remove_active_request = remove_from_engine;
+
 	engine->cops = &ring_context_ops;
 	engine->request_alloc = ring_request_alloc;
 	engine->bump_serial = ring_bump_serial;
diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
index 4d023b5cd5da..dccf5fce980a 100644
--- a/drivers/gpu/drm/i915/gt/mock_engine.c
+++ b/drivers/gpu/drm/i915/gt/mock_engine.c
@@ -235,6 +235,35 @@ static void mock_submit_request(struct i915_request *request)
 	spin_unlock_irqrestore(&engine->hw_lock, flags);
 }
 
+static void mock_add_to_engine(struct i915_request *rq)
+{
+	lockdep_assert_held(&rq->engine->sched_engine->lock);
+	list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
+}
+
+static void mock_remove_from_engine(struct i915_request *rq)
+{
+	struct intel_engine_cs *engine, *locked;
+
+	/*
+	 * Virtual engines complicate acquiring the engine timeline lock,
+	 * as their rq->engine pointer is not stable until under that
+	 * engine lock. The simple ploy we use is to take the lock then
+	 * check that the rq still belongs to the newly locked engine.
+	 */
+
+	locked = READ_ONCE(rq->engine);
+	spin_lock_irq(&locked->sched_engine->lock);
+	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
+		spin_unlock(&locked->sched_engine->lock);
+		spin_lock(&engine->sched_engine->lock);
+		locked = engine;
+	}
+	list_del_init(&rq->sched.link);
+	spin_unlock_irq(&locked->sched_engine->lock);
+}
+
+
 static void mock_reset_prepare(struct intel_engine_cs *engine)
 {
 }
@@ -327,6 +356,8 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
 	engine->base.emit_flush = mock_emit_flush;
 	engine->base.emit_fini_breadcrumb = mock_emit_breadcrumb;
 	engine->base.submit_request = mock_submit_request;
+	engine->base.add_active_request = mock_add_to_engine;
+	engine->base.remove_active_request = mock_remove_from_engine;
 
 	engine->base.reset.prepare = mock_reset_prepare;
 	engine->base.reset.rewind = mock_reset_rewind;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
index 235c1997f32d..864b14e313a3 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
@@ -146,6 +146,9 @@ static void gen11_disable_guc_interrupts(struct intel_guc *guc)
 {
 	struct intel_gt *gt = guc_to_gt(guc);
 
+	if (!guc->interrupts.enabled)
+		return;
+
 	spin_lock_irq(&gt->irq_lock);
 	guc->interrupts.enabled = false;
 
@@ -579,19 +582,6 @@ int intel_guc_suspend(struct intel_guc *guc)
 	return 0;
 }
 
-/**
- * intel_guc_reset_engine() - ask GuC to reset an engine
- * @guc:	intel_guc structure
- * @engine:	engine to be reset
- */
-int intel_guc_reset_engine(struct intel_guc *guc,
-			   struct intel_engine_cs *engine)
-{
-	/* XXX: to be implemented with submission interface rework */
-
-	return -ENODEV;
-}
-
 /**
  * intel_guc_resume() - notify GuC resuming from suspend state
  * @guc:	the guc
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 47eaa69809e8..afea04d56494 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -243,14 +243,16 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask)
 
 int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout);
 
-int intel_guc_reset_engine(struct intel_guc *guc,
-			   struct intel_engine_cs *engine);
-
 int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 					  const u32 *msg, u32 len);
 int intel_guc_sched_done_process_msg(struct intel_guc *guc,
 				     const u32 *msg, u32 len);
 
+void intel_guc_submission_reset_prepare(struct intel_guc *guc);
+void intel_guc_submission_reset(struct intel_guc *guc, bool stalled);
+void intel_guc_submission_reset_finish(struct intel_guc *guc);
+void intel_guc_submission_cancel_requests(struct intel_guc *guc);
+
 void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p);
 
 #endif
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 80b89171b35a..8c093bc2d3a4 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -140,7 +140,7 @@ context_wait_for_deregister_to_register(struct intel_context *ce)
 static inline void
 set_context_wait_for_deregister_to_register(struct intel_context *ce)
 {
-	/* Only should be called from guc_lrc_desc_pin() */
+	/* Should only be called from guc_lrc_desc_pin() without the lock held */
 	ce->guc_state.sched_state |=
 		SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER;
 }
@@ -240,15 +240,31 @@ static int guc_lrc_desc_pool_create(struct intel_guc *guc)
 
 static void guc_lrc_desc_pool_destroy(struct intel_guc *guc)
 {
+	guc->lrc_desc_pool_vaddr = NULL;
 	i915_vma_unpin_and_release(&guc->lrc_desc_pool, I915_VMA_RELEASE_MAP);
 }
 
+static inline bool guc_submission_initialized(struct intel_guc *guc)
+{
+	return guc->lrc_desc_pool_vaddr != NULL;
+}
+
 static inline void reset_lrc_desc(struct intel_guc *guc, u32 id)
 {
-	struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
+	if (likely(guc_submission_initialized(guc))) {
+		struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
+		unsigned long flags;
 
-	memset(desc, 0, sizeof(*desc));
-	xa_erase_irq(&guc->context_lookup, id);
+		memset(desc, 0, sizeof(*desc));
+
+		/*
+		 * xarray API doesn't have xa_erase_irqsave wrapper, so calling
+		 * the lower level functions directly.
+		 */
+		xa_lock_irqsave(&guc->context_lookup, flags);
+		__xa_erase(&guc->context_lookup, id);
+		xa_unlock_irqrestore(&guc->context_lookup, flags);
+	}
 }
 
 static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
@@ -259,7 +275,15 @@ static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
 static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
 					   struct intel_context *ce)
 {
-	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
+	unsigned long flags;
+
+	/*
+	 * xarray API doesn't have an xa_store_irqsave wrapper, so calling the
+	 * lower level functions directly.
+	 */
+	xa_lock_irqsave(&guc->context_lookup, flags);
+	__xa_store(&guc->context_lookup, id, ce, GFP_ATOMIC);
+	xa_unlock_irqrestore(&guc->context_lookup, flags);
 }
 
 static int guc_submission_busy_loop(struct intel_guc* guc,
@@ -330,6 +354,8 @@ int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout)
 					interruptible, timeout);
 }
 
+static int guc_lrc_desc_pin(struct intel_context *ce, bool loop);
+
 static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 {
 	int err;
@@ -337,11 +363,22 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 	u32 action[3];
 	int len = 0;
 	u32 g2h_len_dw = 0;
-	bool enabled = context_enabled(ce);
+	bool enabled;
 
 	GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
 	GEM_BUG_ON(context_guc_id_invalid(ce));
 
+	/*
+	 * Corner case where the GuC firmware was blown away and reloaded while
+	 * this context was pinned.
+	 */
+	if (unlikely(!lrc_desc_registered(guc, ce->guc_id))) {
+		err = guc_lrc_desc_pin(ce, false);
+		if (unlikely(err))
+			goto out;
+	}
+	enabled = context_enabled(ce);
+
 	if (!enabled) {
 		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
 		action[len++] = ce->guc_id;
@@ -364,6 +401,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 		intel_context_put(ce);
 	}
 
+out:
 	return err;
 }
 
@@ -418,15 +456,10 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
 	if (submit) {
 		guc_set_lrc_tail(last);
 resubmit:
-		/*
-		 * We only check for -EBUSY here even though it is possible for
-		 * -EDEADLK to be returned. If -EDEADLK is returned, the GuC has
-		 * died and a full GPU needs to be done. The hangcheck will
-		 * eventually detect that the GuC has died and trigger this
-		 * reset so no need to handle -EDEADLK here.
-		 */
 		ret = guc_add_request(guc, last);
-		if (ret == -EBUSY) {
+		if (unlikely(ret == -EDEADLK))
+			goto deadlk;
+		else if (ret == -EBUSY) {
 			i915_sched_engine_kick(sched_engine);
 			guc->stalled_request = last;
 			return false;
@@ -436,6 +469,11 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
 
 	guc->stalled_request = NULL;
 	return submit;
+
+deadlk:
+	sched_engine->tasklet.callback = NULL;
+	tasklet_disable_nosync(&sched_engine->tasklet);
+	return false;
 }
 
 static void guc_submission_tasklet(struct tasklet_struct *t)
@@ -462,29 +500,165 @@ static void cs_irq_handler(struct intel_engine_cs *engine, u16 iir)
 		intel_engine_signal_breadcrumbs(engine);
 }
 
-static void guc_reset_prepare(struct intel_engine_cs *engine)
+static void __guc_context_destroy(struct intel_context *ce);
+static void release_guc_id(struct intel_guc *guc, struct intel_context *ce);
+static void guc_signal_context_fence(struct intel_context *ce);
+
+static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
 {
-	struct i915_sched_engine * const sched_engine = engine->sched_engine;
+	struct intel_context *ce;
+	unsigned long index, flags;
+	bool pending_disable, pending_enable, deregister, destroyed;
 
-	ENGINE_TRACE(engine, "\n");
+	xa_for_each(&guc->context_lookup, index, ce) {
+		/* Flush context */
+		spin_lock_irqsave(&ce->guc_state.lock, flags);
+		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+
+		/*
+		 * Once we are at this point submission_disabled() is guaranteed
+		 * to be visible to all callers who set the below flags (see above
+		 * flush and flushes in reset_prepare). If submission_disabled()
+		 * is set, the caller shouldn't set these flags.
+		 */
+
+		destroyed = context_destroyed(ce);
+		pending_enable = context_pending_enable(ce);
+		pending_disable = context_pending_disable(ce);
+		deregister = context_wait_for_deregister_to_register(ce);
+		init_sched_state(ce);
+
+		if (pending_enable || destroyed || deregister) {
+			atomic_dec(&guc->outstanding_submission_g2h);
+			if (deregister)
+				guc_signal_context_fence(ce);
+			if (destroyed) {
+				release_guc_id(guc, ce);
+				__guc_context_destroy(ce);
+			}
+			if (pending_enable || deregister)
+				intel_context_put(ce);
+		}
+
+		/* Not mutually exclusive with the above if statement. */
+		if (pending_disable) {
+			guc_signal_context_fence(ce);
+			intel_context_sched_disable_unpin(ce);
+			atomic_dec(&guc->outstanding_submission_g2h);
+			intel_context_put(ce);
+		}
+	}
+}
+
+static inline bool
+submission_disabled(struct intel_guc *guc)
+{
+	struct i915_sched_engine * const sched_engine = guc->sched_engine;
+
+	return unlikely(!__tasklet_is_enabled(&sched_engine->tasklet));
+}
+
+static void disable_submission(struct intel_guc *guc)
+{
+	struct i915_sched_engine * const sched_engine = guc->sched_engine;
+
+	if (__tasklet_is_enabled(&sched_engine->tasklet)) {
+		GEM_BUG_ON(!guc->ct.enabled);
+		__tasklet_disable_sync_once(&sched_engine->tasklet);
+		sched_engine->tasklet.callback = NULL;
+	}
+}
+
+static void enable_submission(struct intel_guc *guc)
+{
+	struct i915_sched_engine * const sched_engine = guc->sched_engine;
+	unsigned long flags;
+
+	spin_lock_irqsave(&guc->sched_engine->lock, flags);
+	sched_engine->tasklet.callback = guc_submission_tasklet;
+	wmb();
+	if (!__tasklet_is_enabled(&sched_engine->tasklet) &&
+	    __tasklet_enable(&sched_engine->tasklet)) {
+		GEM_BUG_ON(!guc->ct.enabled);
+
+		/* And kick in case we missed a new request submission. */
+		i915_sched_engine_hi_kick(sched_engine);
+	}
+	spin_unlock_irqrestore(&guc->sched_engine->lock, flags);
+}
+
+static void guc_flush_submissions(struct intel_guc *guc)
+{
+	struct i915_sched_engine * const sched_engine = guc->sched_engine;
+	unsigned long flags;
+
+	spin_lock_irqsave(&sched_engine->lock, flags);
+	spin_unlock_irqrestore(&sched_engine->lock, flags);
+}
+
+void intel_guc_submission_reset_prepare(struct intel_guc *guc)
+{
+	int i;
+
+	if (unlikely(!guc_submission_initialized(guc)))
+		/* Reset called during driver load? GuC not yet initialised! */
+		return;
+
+	disable_submission(guc);
+	guc->interrupts.disable(guc);
+
+	/* Flush IRQ handler */
+	spin_lock_irq(&guc_to_gt(guc)->irq_lock);
+	spin_unlock_irq(&guc_to_gt(guc)->irq_lock);
+
+	guc_flush_submissions(guc);
 
 	/*
-	 * Prevent request submission to the hardware until we have
-	 * completed the reset in i915_gem_reset_finish(). If a request
-	 * is completed by one engine, it may then queue a request
-	 * to a second via its sched_engine->tasklet *just* as we are
-	 * calling engine->init_hw() and also writing the ELSP.
-	 * Turning off the sched_engine->tasklet until the reset is over
-	 * prevents the race.
+	 * Handle any outstanding G2Hs before reset. Call IRQ handler directly
+	 * each pass as interrupts have been disabled. We always scrub for
+	 * outstanding G2H as it is possible for outstanding_submission_g2h to
+	 * be incremented after the context state update.
 	 */
-	__tasklet_disable_sync_once(&sched_engine->tasklet);
+	for (i = 0; i < 4 && atomic_read(&guc->outstanding_submission_g2h); ++i) {
+		intel_guc_to_host_event_handler(guc);
+#define wait_for_reset(guc, wait_var) \
+		guc_wait_for_pending_msg(guc, wait_var, false, (HZ / 20))
+		do {
+			wait_for_reset(guc, &guc->outstanding_submission_g2h);
+		} while (!list_empty(&guc->ct.requests.incoming));
+	}
+	scrub_guc_desc_for_outstanding_g2h(guc);
+}
+
+static struct intel_engine_cs *
+guc_virtual_get_sibling(struct intel_engine_cs *ve, unsigned int sibling)
+{
+	struct intel_engine_cs *engine;
+	intel_engine_mask_t tmp, mask = ve->mask;
+	unsigned int num_siblings = 0;
+
+	for_each_engine_masked(engine, ve->gt, mask, tmp)
+		if (num_siblings++ == sibling)
+			return engine;
+
+	return NULL;
+}
+
+static inline struct intel_engine_cs *
+__context_to_physical_engine(struct intel_context *ce)
+{
+	struct intel_engine_cs *engine = ce->engine;
+
+	if (intel_engine_is_virtual(engine))
+		engine = guc_virtual_get_sibling(engine, 0);
+
+	return engine;
 }
 
-static void guc_reset_state(struct intel_context *ce,
-			    struct intel_engine_cs *engine,
-			    u32 head,
-			    bool scrub)
+static void guc_reset_state(struct intel_context *ce, u32 head, bool scrub)
 {
+	struct intel_engine_cs *engine = __context_to_physical_engine(ce);
+
 	GEM_BUG_ON(!intel_context_is_pinned(ce));
 
 	/*
@@ -502,42 +676,147 @@ static void guc_reset_state(struct intel_context *ce,
 	lrc_update_regs(ce, engine, head);
 }
 
-static void guc_reset_rewind(struct intel_engine_cs *engine, bool stalled)
+static void guc_reset_nop(struct intel_engine_cs *engine)
 {
-	struct intel_engine_execlists * const execlists = &engine->execlists;
-	struct i915_request *rq;
+}
+
+static void guc_rewind_nop(struct intel_engine_cs *engine, bool stalled)
+{
+}
+
+static void
+__unwind_incomplete_requests(struct intel_context *ce)
+{
+	struct i915_request *rq, *rn;
+	struct list_head *pl;
+	int prio = I915_PRIORITY_INVALID;
+	struct i915_sched_engine * const sched_engine =
+		ce->engine->sched_engine;
 	unsigned long flags;
 
-	spin_lock_irqsave(&engine->sched_engine->lock, flags);
+	spin_lock_irqsave(&sched_engine->lock, flags);
+	spin_lock(&ce->guc_active.lock);
+	list_for_each_entry_safe(rq, rn,
+				 &ce->guc_active.requests,
+				 sched.link) {
+		if (i915_request_completed(rq))
+			continue;
+
+		list_del_init(&rq->sched.link);
+		spin_unlock(&ce->guc_active.lock);
+
+		__i915_request_unsubmit(rq);
+
+		/* Push the request back into the queue for later resubmission. */
+		GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID);
+		if (rq_prio(rq) != prio) {
+			prio = rq_prio(rq);
+			pl = i915_sched_lookup_priolist(sched_engine, prio);
+		}
+		GEM_BUG_ON(i915_sched_engine_is_empty(sched_engine));
 
-	/* Push back any incomplete requests for replay after the reset. */
-	rq = execlists_unwind_incomplete_requests(execlists);
-	if (!rq)
-		goto out_unlock;
+		list_add_tail(&rq->sched.link, pl);
+		set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
+
+		spin_lock(&ce->guc_active.lock);
+	}
+	spin_unlock(&ce->guc_active.lock);
+	spin_unlock_irqrestore(&sched_engine->lock, flags);
+}
+
+static struct i915_request *context_find_active_request(struct intel_context *ce)
+{
+	struct i915_request *rq, *active = NULL;
+	unsigned long flags;
+
+	spin_lock_irqsave(&ce->guc_active.lock, flags);
+	list_for_each_entry_reverse(rq, &ce->guc_active.requests,
+				    sched.link) {
+		if (i915_request_completed(rq))
+			break;
+
+		active = rq;
+	}
+	spin_unlock_irqrestore(&ce->guc_active.lock, flags);
+
+	return active;
+}
+
+static void __guc_reset_context(struct intel_context *ce, bool stalled)
+{
+	struct i915_request *rq;
+	u32 head;
+
+	/*
+	 * GuC will implicitly mark the context as non-schedulable
+	 * when it sends the reset notification. Make sure our state
+	 * reflects this change. The context will be marked enabled
+	 * on resubmission.
+	 */
+	clr_context_enabled(ce);
+
+	rq = context_find_active_request(ce);
+	if (!rq) {
+		head = ce->ring->tail;
+		stalled = false;
+		goto out_replay;
+	}
 
 	if (!i915_request_started(rq))
 		stalled = false;
 
+	GEM_BUG_ON(i915_active_is_idle(&ce->active));
+	head = intel_ring_wrap(ce->ring, rq->head);
 	__i915_request_reset(rq, stalled);
-	guc_reset_state(rq->context, engine, rq->head, stalled);
 
-out_unlock:
-	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
+out_replay:
+	guc_reset_state(ce, head, stalled);
+	__unwind_incomplete_requests(ce);
+}
+
+void intel_guc_submission_reset(struct intel_guc *guc, bool stalled)
+{
+	struct intel_context *ce;
+	unsigned long index;
+
+	if (unlikely(!guc_submission_initialized(guc)))
+		/* Reset called during driver load? GuC not yet initialised! */
+		return;
+
+	xa_for_each(&guc->context_lookup, index, ce)
+		if (intel_context_is_pinned(ce))
+			__guc_reset_context(ce, stalled);
+
+	/* GuC is blown away, drop all references to contexts */
+	xa_destroy(&guc->context_lookup);
+}
+
+static void guc_cancel_context_requests(struct intel_context *ce)
+{
+	struct i915_sched_engine *sched_engine = ce_to_guc(ce)->sched_engine;
+	struct i915_request *rq;
+	unsigned long flags;
+
+	/* Mark all executing requests as skipped. */
+	spin_lock_irqsave(&sched_engine->lock, flags);
+	spin_lock(&ce->guc_active.lock);
+	list_for_each_entry(rq, &ce->guc_active.requests, sched.link)
+		i915_request_put(i915_request_mark_eio(rq));
+	spin_unlock(&ce->guc_active.lock);
+	spin_unlock_irqrestore(&sched_engine->lock, flags);
 }
 
-static void guc_reset_cancel(struct intel_engine_cs *engine)
+static void
+guc_cancel_sched_engine_requests(struct i915_sched_engine *sched_engine)
 {
-	struct i915_sched_engine * const sched_engine = engine->sched_engine;
 	struct i915_request *rq, *rn;
 	struct rb_node *rb;
 	unsigned long flags;
 
 	/* Can be called during boot if GuC fails to load */
-	if (!engine->gt)
+	if (!sched_engine)
 		return;
 
-	ENGINE_TRACE(engine, "\n");
-
 	/*
 	 * Before we call engine->cancel_requests(), we should have exclusive
 	 * access to the submission state. This is arranged for us by the
@@ -552,13 +831,7 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
 	 * submission's irq state, we also wish to remind ourselves that
 	 * it is irq state.)
 	 */
-	spin_lock_irqsave(&engine->sched_engine->lock, flags);
-
-	/* Mark all executing requests as skipped. */
-	list_for_each_entry(rq, &engine->sched_engine->requests, sched.link) {
-		i915_request_set_error_once(rq, -EIO);
-		i915_request_mark_complete(rq);
-	}
+	spin_lock_irqsave(&sched_engine->lock, flags);
 
 	/* Flush the queued requests to the timeline list (for retiring). */
 	while ((rb = rb_first_cached(&sched_engine->queue))) {
@@ -566,9 +839,10 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
 
 		priolist_for_each_request_consume(rq, rn, p) {
 			list_del_init(&rq->sched.link);
+
 			__i915_request_submit(rq);
-			dma_fence_set_error(&rq->fence, -EIO);
-			i915_request_mark_complete(rq);
+
+			i915_request_put(i915_request_mark_eio(rq));
 		}
 
 		rb_erase_cached(&p->node, &sched_engine->queue);
@@ -580,19 +854,41 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
 	sched_engine->queue_priority_hint = INT_MIN;
 	sched_engine->queue = RB_ROOT_CACHED;
 
-	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
+	spin_unlock_irqrestore(&sched_engine->lock, flags);
 }
 
-static void guc_reset_finish(struct intel_engine_cs *engine)
+void intel_guc_submission_cancel_requests(struct intel_guc *guc)
 {
-	struct i915_sched_engine * const sched_engine = engine->sched_engine;
+	struct intel_context *ce;
+	unsigned long index;
 
-	if (__tasklet_enable(&sched_engine->tasklet))
-		/* And kick in case we missed a new request submission. */
-		i915_sched_engine_hi_kick(sched_engine);
+	xa_for_each(&guc->context_lookup, index, ce)
+		if (intel_context_is_pinned(ce))
+			guc_cancel_context_requests(ce);
+
+	guc_cancel_sched_engine_requests(guc->sched_engine);
+
+	/* GuC is blown away, drop all references to contexts */
+	xa_destroy(&guc->context_lookup);
+}
+
+void intel_guc_submission_reset_finish(struct intel_guc *guc)
+{
+	/* Reset called during driver load or during wedge? */
+	if (unlikely(!guc_submission_initialized(guc) ||
+		     test_bit(I915_WEDGED, &guc_to_gt(guc)->reset.flags)))
+		return;
 
-	ENGINE_TRACE(engine, "depth->%d\n",
-		     atomic_read(&sched_engine->tasklet.count));
+	/*
+	 * Technically possible for either of these values to be non-zero here,
+	 * but very unlikely + harmless. Regardless let's add a warn so we can
+	 * see in CI if this happens frequently / is a precursor to taking down the
+	 * machine.
+	 */
+	GEM_WARN_ON(atomic_read(&guc->outstanding_submission_g2h));
+	atomic_set(&guc->outstanding_submission_g2h, 0);
+
+	enable_submission(guc);
 }
 
 /*
@@ -659,6 +955,9 @@ static int guc_bypass_tasklet_submit(struct intel_guc *guc,
 	else
 		trace_i915_request_guc_submit(rq);
 
+	if (unlikely(ret == -EDEADLK))
+		disable_submission(guc);
+
 	return ret;
 }
 
@@ -671,7 +970,8 @@ static void guc_submit_request(struct i915_request *rq)
 	/* Will be called from irq-context when using foreign fences. */
 	spin_lock_irqsave(&sched_engine->lock, flags);
 
-	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
+	if (submission_disabled(guc) || guc->stalled_request ||
+	    !i915_sched_engine_is_empty(sched_engine))
 		queue_request(sched_engine, rq, rq_prio(rq));
 	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
 		i915_sched_engine_hi_kick(sched_engine);
@@ -808,7 +1108,8 @@ static void unpin_guc_id(struct intel_guc *guc, struct intel_context *ce)
 
 static int __guc_action_register_context(struct intel_guc *guc,
 					 u32 guc_id,
-					 u32 offset)
+					 u32 offset,
+					 bool loop)
 {
 	u32 action[] = {
 		INTEL_GUC_ACTION_REGISTER_CONTEXT,
@@ -816,10 +1117,10 @@ static int __guc_action_register_context(struct intel_guc *guc,
 		offset,
 	};
 
-	return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
+	return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action), 0, loop);
 }
 
-static int register_context(struct intel_context *ce)
+static int register_context(struct intel_context *ce, bool loop)
 {
 	struct intel_guc *guc = ce_to_guc(ce);
 	u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
@@ -827,11 +1128,12 @@ static int register_context(struct intel_context *ce)
 
 	trace_intel_context_register(ce);
 
-	return __guc_action_register_context(guc, ce->guc_id, offset);
+	return __guc_action_register_context(guc, ce->guc_id, offset, loop);
 }
 
 static int __guc_action_deregister_context(struct intel_guc *guc,
-					   u32 guc_id)
+					   u32 guc_id,
+					   bool loop)
 {
 	u32 action[] = {
 		INTEL_GUC_ACTION_DEREGISTER_CONTEXT,
@@ -839,16 +1141,16 @@ static int __guc_action_deregister_context(struct intel_guc *guc,
 	};
 
 	return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),
-					G2H_LEN_DW_DEREGISTER_CONTEXT, true);
+					G2H_LEN_DW_DEREGISTER_CONTEXT, loop);
 }
 
-static int deregister_context(struct intel_context *ce, u32 guc_id)
+static int deregister_context(struct intel_context *ce, u32 guc_id, bool loop)
 {
 	struct intel_guc *guc = ce_to_guc(ce);
 
 	trace_intel_context_deregister(ce);
 
-	return __guc_action_deregister_context(guc, guc_id);
+	return __guc_action_deregister_context(guc, guc_id, loop);
 }
 
 static intel_engine_mask_t adjust_engine_mask(u8 class, intel_engine_mask_t mask)
@@ -877,7 +1179,7 @@ static void guc_context_policy_init(struct intel_engine_cs *engine,
 	desc->preemption_timeout = CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US;
 }
 
-static int guc_lrc_desc_pin(struct intel_context *ce)
+static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
 {
 	struct intel_runtime_pm *runtime_pm =
 		&ce->engine->gt->i915->runtime_pm;
@@ -923,18 +1225,44 @@ static int guc_lrc_desc_pin(struct intel_context *ce)
 	 */
 	if (context_registered) {
 		trace_intel_context_steal_guc_id(ce);
-		set_context_wait_for_deregister_to_register(ce);
-		intel_context_get(ce);
+		if (!loop) {
+			set_context_wait_for_deregister_to_register(ce);
+			intel_context_get(ce);
+		} else {
+			bool disabled;
+			unsigned long flags;
+
+			/* Seal race with Reset */
+			spin_lock_irqsave(&ce->guc_state.lock, flags);
+			disabled = submission_disabled(guc);
+			if (likely(!disabled)) {
+				set_context_wait_for_deregister_to_register(ce);
+				intel_context_get(ce);
+			}
+			spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+			if (unlikely(disabled)) {
+				reset_lrc_desc(guc, desc_idx);
+				return 0;	/* Will get registered later */
+			}
+		}
 
 		/*
 		 * If stealing the guc_id, this ce has the same guc_id as the
 		 * context whos guc_id was stole.
 		 */
 		with_intel_runtime_pm(runtime_pm, wakeref)
-			ret = deregister_context(ce, ce->guc_id);
+			ret = deregister_context(ce, ce->guc_id, loop);
+		if (unlikely(ret == -EBUSY)) {
+			clr_context_wait_for_deregister_to_register(ce);
+			intel_context_put(ce);
+		}
 	} else {
 		with_intel_runtime_pm(runtime_pm, wakeref)
-			ret = register_context(ce);
+			ret = register_context(ce, loop);
+		if (unlikely(ret == -EBUSY))
+			reset_lrc_desc(guc, desc_idx);
+		else if (unlikely(ret == -ENODEV))
+			ret = 0;	/* Will get registered later */
 	}
 
 	return ret;
@@ -997,7 +1325,6 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
 	GEM_BUG_ON(guc_id == GUC_INVALID_LRC_ID);
 
 	trace_intel_context_sched_disable(ce);
-	intel_context_get(ce);
 
 	guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),
 				 G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
@@ -1007,6 +1334,7 @@ static u16 prep_context_pending_disable(struct intel_context *ce)
 {
 	set_context_pending_disable(ce);
 	clr_context_enabled(ce);
+	intel_context_get(ce);
 
 	return ce->guc_id;
 }
@@ -1019,7 +1347,7 @@ static void guc_context_sched_disable(struct intel_context *ce)
 	u16 guc_id;
 	intel_wakeref_t wakeref;
 
-	if (context_guc_id_invalid(ce) ||
+	if (submission_disabled(guc) || context_guc_id_invalid(ce) ||
 	    !lrc_desc_registered(guc, ce->guc_id)) {
 		clr_context_enabled(ce);
 		goto unpin;
@@ -1053,19 +1381,13 @@ static void guc_context_sched_disable(struct intel_context *ce)
 
 static inline void guc_lrc_desc_unpin(struct intel_context *ce)
 {
-	struct intel_engine_cs *engine = ce->engine;
-	struct intel_guc *guc = &engine->gt->uc.guc;
-	unsigned long flags;
+	struct intel_guc *guc = ce_to_guc(ce);
 
 	GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id));
 	GEM_BUG_ON(ce != __get_context(guc, ce->guc_id));
 	GEM_BUG_ON(context_enabled(ce));
 
-	spin_lock_irqsave(&ce->guc_state.lock, flags);
-	set_context_destroyed(ce);
-	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
-
-	deregister_context(ce, ce->guc_id);
+	deregister_context(ce, ce->guc_id, true);
 }
 
 static void __guc_context_destroy(struct intel_context *ce)
@@ -1093,13 +1415,15 @@ static void guc_context_destroy(struct kref *kref)
 	struct intel_guc *guc = &ce->engine->gt->uc.guc;
 	intel_wakeref_t wakeref;
 	unsigned long flags;
+	bool disabled;
 
 	/*
 	 * If the guc_id is invalid this context has been stolen and we can free
 	 * it immediately. Also can be freed immediately if the context is not
 	 * registered with the GuC.
 	 */
-	if (context_guc_id_invalid(ce) ||
+	if (submission_disabled(guc) ||
+	    context_guc_id_invalid(ce) ||
 	    !lrc_desc_registered(guc, ce->guc_id)) {
 		release_guc_id(guc, ce);
 		__guc_context_destroy(ce);
@@ -1126,6 +1450,18 @@ static void guc_context_destroy(struct kref *kref)
 		list_del_init(&ce->guc_id_link);
 	spin_unlock_irqrestore(&guc->contexts_lock, flags);
 
+	/* Seal race with Reset */
+	spin_lock_irqsave(&ce->guc_state.lock, flags);
+	disabled = submission_disabled(guc);
+	if (likely(!disabled))
+		set_context_destroyed(ce);
+	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+	if (unlikely(disabled)) {
+		release_guc_id(guc, ce);
+		__guc_context_destroy(ce);
+		return;
+	}
+
 	/*
 	 * We defer GuC context deregistration until the context is destroyed
 	 * in order to save on CTBs. With this optimization ideally we only need
@@ -1148,6 +1484,33 @@ static int guc_context_alloc(struct intel_context *ce)
 	return lrc_alloc(ce, ce->engine);
 }
 
+static void add_to_context(struct i915_request *rq)
+{
+	struct intel_context *ce = rq->context;
+
+	spin_lock(&ce->guc_active.lock);
+	list_move_tail(&rq->sched.link, &ce->guc_active.requests);
+	spin_unlock(&ce->guc_active.lock);
+}
+
+static void remove_from_context(struct i915_request *rq)
+{
+	struct intel_context *ce = rq->context;
+
+	spin_lock_irq(&ce->guc_active.lock);
+
+	list_del_init(&rq->sched.link);
+	clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
+
+	/* Prevent further __await_execution() registering a cb, then flush */
+	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
+
+	spin_unlock_irq(&ce->guc_active.lock);
+
+	atomic_dec(&ce->guc_id_ref);
+	i915_request_notify_execute_cb_imm(rq);
+}
+
 static const struct intel_context_ops guc_context_ops = {
 	.alloc = guc_context_alloc,
 
@@ -1186,8 +1549,6 @@ static void guc_signal_context_fence(struct intel_context *ce)
 {
 	unsigned long flags;
 
-	GEM_BUG_ON(!context_wait_for_deregister_to_register(ce));
-
 	spin_lock_irqsave(&ce->guc_state.lock, flags);
 	clr_context_wait_for_deregister_to_register(ce);
 	__guc_signal_context_fence(ce);
@@ -1196,8 +1557,9 @@ static void guc_signal_context_fence(struct intel_context *ce)
 
 static bool context_needs_register(struct intel_context *ce, bool new_guc_id)
 {
-	return new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
-		!lrc_desc_registered(ce_to_guc(ce), ce->guc_id);
+	return (new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
+		!lrc_desc_registered(ce_to_guc(ce), ce->guc_id)) &&
+		!submission_disabled(ce_to_guc(ce));
 }
 
 static int guc_request_alloc(struct i915_request *rq)
@@ -1256,8 +1618,10 @@ static int guc_request_alloc(struct i915_request *rq)
 		return ret;;
 
 	if (context_needs_register(ce, !!ret)) {
-		ret = guc_lrc_desc_pin(ce);
+		ret = guc_lrc_desc_pin(ce, true);
 		if (unlikely(ret)) {	/* unwind */
+			if (ret == -EDEADLK)
+				disable_submission(guc);
 			atomic_dec(&ce->guc_id_ref);
 			unpin_guc_id(guc, ce);
 			return ret;
@@ -1294,20 +1658,6 @@ static int guc_request_alloc(struct i915_request *rq)
 	return 0;
 }
 
-static struct intel_engine_cs *
-guc_virtual_get_sibling(struct intel_engine_cs *ve, unsigned int sibling)
-{
-	struct intel_engine_cs *engine;
-	intel_engine_mask_t tmp, mask = ve->mask;
-	unsigned int num_siblings = 0;
-
-	for_each_engine_masked(engine, ve->gt, mask, tmp)
-		if (num_siblings++ == sibling)
-			return engine;
-
-	return NULL;
-}
-
 static int guc_virtual_context_pre_pin(struct intel_context *ce,
 				       struct i915_gem_ww_ctx *ww,
 				       void **vaddr)
@@ -1516,7 +1866,7 @@ static inline void guc_kernel_context_pin(struct intel_guc *guc,
 {
 	if (context_guc_id_invalid(ce))
 		pin_guc_id(guc, ce);
-	guc_lrc_desc_pin(ce);
+	guc_lrc_desc_pin(ce, true);
 }
 
 static inline void guc_init_lrc_mapping(struct intel_guc *guc)
@@ -1582,13 +1932,15 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
 	engine->cops = &guc_context_ops;
 	engine->request_alloc = guc_request_alloc;
 	engine->bump_serial = guc_bump_serial;
+	engine->add_active_request = add_to_context;
+	engine->remove_active_request = remove_from_context;
 
 	engine->sched_engine->schedule = i915_schedule;
 
-	engine->reset.prepare = guc_reset_prepare;
-	engine->reset.rewind = guc_reset_rewind;
-	engine->reset.cancel = guc_reset_cancel;
-	engine->reset.finish = guc_reset_finish;
+	engine->reset.prepare = guc_reset_nop;
+	engine->reset.rewind = guc_rewind_nop;
+	engine->reset.cancel = guc_reset_nop;
+	engine->reset.finish = guc_reset_nop;
 
 	engine->emit_flush = gen8_emit_flush_xcs;
 	engine->emit_init_breadcrumb = gen8_emit_init_breadcrumb;
@@ -1764,7 +2116,7 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 		 * register this context.
 		 */
 		with_intel_runtime_pm(runtime_pm, wakeref)
-			register_context(ce);
+			register_context(ce, true);
 		guc_signal_context_fence(ce);
 		intel_context_put(ce);
 	} else if (context_destroyed(ce)) {
@@ -1946,6 +2298,10 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
 				 "v%dx%d", ve->base.class, count);
 			ve->base.context_size = sibling->context_size;
 
+			ve->base.add_active_request =
+				sibling->add_active_request;
+			ve->base.remove_active_request =
+				sibling->remove_active_request;
 			ve->base.emit_bb_start = sibling->emit_bb_start;
 			ve->base.emit_flush = sibling->emit_flush;
 			ve->base.emit_init_breadcrumb =
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index ab0789d66e06..d5ccffbb89ae 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -565,12 +565,44 @@ void intel_uc_reset_prepare(struct intel_uc *uc)
 {
 	struct intel_guc *guc = &uc->guc;
 
+	/* Firmware expected to be running when this function is called */
 	if (!intel_guc_is_ready(guc))
-		return;
+		goto sanitize;
+
+	if (intel_uc_uses_guc_submission(uc))
+		intel_guc_submission_reset_prepare(guc);
 
+sanitize:
 	__uc_sanitize(uc);
 }
 
+void intel_uc_reset(struct intel_uc *uc, bool stalled)
+{
+	struct intel_guc *guc = &uc->guc;
+
+	/* Firmware can not be running when this function is called  */
+	if (intel_uc_uses_guc_submission(uc))
+		intel_guc_submission_reset(guc, stalled);
+}
+
+void intel_uc_reset_finish(struct intel_uc *uc)
+{
+	struct intel_guc *guc = &uc->guc;
+
+	/* Firmware expected to be running when this function is called */
+	if (intel_guc_is_fw_running(guc) && intel_uc_uses_guc_submission(uc))
+		intel_guc_submission_reset_finish(guc);
+}
+
+void intel_uc_cancel_requests(struct intel_uc *uc)
+{
+	struct intel_guc *guc = &uc->guc;
+
+	/* Firmware can not be running when this function is called  */
+	if (intel_uc_uses_guc_submission(uc))
+		intel_guc_submission_cancel_requests(guc);
+}
+
 void intel_uc_runtime_suspend(struct intel_uc *uc)
 {
 	struct intel_guc *guc = &uc->guc;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
index c4cef885e984..eaa3202192ac 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
@@ -37,6 +37,9 @@ void intel_uc_driver_late_release(struct intel_uc *uc);
 void intel_uc_driver_remove(struct intel_uc *uc);
 void intel_uc_init_mmio(struct intel_uc *uc);
 void intel_uc_reset_prepare(struct intel_uc *uc);
+void intel_uc_reset(struct intel_uc *uc, bool stalled);
+void intel_uc_reset_finish(struct intel_uc *uc);
+void intel_uc_cancel_requests(struct intel_uc *uc);
 void intel_uc_suspend(struct intel_uc *uc);
 void intel_uc_runtime_suspend(struct intel_uc *uc);
 int intel_uc_resume(struct intel_uc *uc);
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 0b96b824ea06..4855cf7ebe21 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -194,7 +194,7 @@ static bool irq_work_imm(struct irq_work *wrk)
 	return false;
 }
 
-static void __notify_execute_cb_imm(struct i915_request *rq)
+void i915_request_notify_execute_cb_imm(struct i915_request *rq)
 {
 	__notify_execute_cb(rq, irq_work_imm);
 }
@@ -268,37 +268,6 @@ i915_request_active_engine(struct i915_request *rq,
 	return ret;
 }
 
-
-static void remove_from_engine(struct i915_request *rq)
-{
-	struct intel_engine_cs *engine, *locked;
-
-	/*
-	 * Virtual engines complicate acquiring the engine timeline lock,
-	 * as their rq->engine pointer is not stable until under that
-	 * engine lock. The simple ploy we use is to take the lock then
-	 * check that the rq still belongs to the newly locked engine.
-	 */
-	locked = READ_ONCE(rq->engine);
-	spin_lock_irq(&locked->sched_engine->lock);
-	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
-		spin_unlock(&locked->sched_engine->lock);
-		spin_lock(&engine->sched_engine->lock);
-		locked = engine;
-	}
-	list_del_init(&rq->sched.link);
-
-	clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
-	clear_bit(I915_FENCE_FLAG_HOLD, &rq->fence.flags);
-
-	/* Prevent further __await_execution() registering a cb, then flush */
-	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
-
-	spin_unlock_irq(&locked->sched_engine->lock);
-
-	__notify_execute_cb_imm(rq);
-}
-
 static void __rq_init_watchdog(struct i915_request *rq)
 {
 	rq->watchdog.timer.function = NULL;
@@ -395,9 +364,7 @@ bool i915_request_retire(struct i915_request *rq)
 	 * after removing the breadcrumb and signaling it, so that we do not
 	 * inadvertently attach the breadcrumb to a completed request.
 	 */
-	if (!list_empty(&rq->sched.link))
-		remove_from_engine(rq);
-	atomic_dec(&rq->context->guc_id_ref);
+	rq->engine->remove_active_request(rq);
 	GEM_BUG_ON(!llist_empty(&rq->execute_cb));
 
 	__list_del_entry(&rq->link); /* poison neither prev/next (RCU walks) */
@@ -539,7 +506,7 @@ __await_execution(struct i915_request *rq,
 	if (llist_add(&cb->work.node.llist, &signal->execute_cb)) {
 		if (i915_request_is_active(signal) ||
 		    __request_in_flight(signal))
-			__notify_execute_cb_imm(signal);
+			i915_request_notify_execute_cb_imm(signal);
 	}
 
 	return 0;
@@ -676,7 +643,7 @@ bool __i915_request_submit(struct i915_request *request)
 	result = true;
 
 	GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
-	list_move_tail(&request->sched.link, &engine->sched_engine->requests);
+	engine->add_active_request(request);
 active:
 	clear_bit(I915_FENCE_FLAG_PQUEUE, &request->fence.flags);
 	set_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags);
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index f870cd75a001..bcc6340c505e 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -649,4 +649,6 @@ bool
 i915_request_active_engine(struct i915_request *rq,
 			   struct intel_engine_cs **active);
 
+void i915_request_notify_execute_cb_imm(struct i915_request *rq);
+
 #endif /* I915_REQUEST_H */
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 65/97] drm/i915: Reset GPU immediately if submission is disabled
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (63 preceding siblings ...)
  2021-05-06 19:14 ` [RFC PATCH 64/97] drm/i915/guc: Reset implementation for new GuC interface Matthew Brost
@ 2021-05-06 19:14 ` Matthew Brost
  2021-06-02 14:36   ` [Intel-gfx] " Tvrtko Ursulin
  2021-05-06 19:14 ` [RFC PATCH 66/97] drm/i915/guc: Add disable interrupts to guc sanitize Matthew Brost
                   ` (34 subsequent siblings)
  99 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

If submission is disabled by the backend for any reason, reset the GPU
immediately in the heartbeat code.
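
As a rough, self-contained model of the mechanism (not the i915 code: the
struct, callbacks and printf-based "reset" below are simplified stand-ins
for i915_sched_engine, guc_sched_engine_disabled()/default_disabled() and
intel_gt_handle_error()):

#include <stdbool.h>
#include <stdio.h>

struct sched_engine {
	/* Backend-provided predicate; the GuC backend clears its tasklet on reset. */
	bool (*disabled)(struct sched_engine *se);
	void (*tasklet_callback)(void);	/* NULL once submission is disabled */
};

static bool default_disabled(struct sched_engine *se)
{
	(void)se;
	return false;			/* execlists backend: never disabled */
}

static bool guc_disabled(struct sched_engine *se)
{
	return !se->tasklet_callback;	/* disabled once the tasklet is torn down */
}

static void heartbeat_tick(struct sched_engine *se)
{
	if (se->disabled(se)) {
		/* The real driver calls intel_gt_handle_error() here. */
		printf("submission disabled - resetting GPU immediately\n");
		return;
	}
	printf("heartbeat ok\n");
}

int main(void)
{
	struct sched_engine se = { .disabled = guc_disabled };

	heartbeat_tick(&se);		/* immediate-reset path */
	se.disabled = default_disabled;
	heartbeat_tick(&se);		/* normal heartbeat path */
	return 0;
}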

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 .../gpu/drm/i915/gt/intel_engine_heartbeat.c  | 63 +++++++++++++++----
 .../gpu/drm/i915/gt/intel_engine_heartbeat.h  |  4 ++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  9 +++
 drivers/gpu/drm/i915/i915_scheduler.c         |  6 ++
 drivers/gpu/drm/i915/i915_scheduler.h         |  6 ++
 drivers/gpu/drm/i915/i915_scheduler_types.h   |  3 +
 6 files changed, 78 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
index b6a305e6a974..a8495364d906 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
@@ -70,12 +70,30 @@ static void show_heartbeat(const struct i915_request *rq,
 {
 	struct drm_printer p = drm_debug_printer("heartbeat");
 
-	intel_engine_dump(engine, &p,
-			  "%s heartbeat {seqno:%llx:%lld, prio:%d} not ticking\n",
-			  engine->name,
-			  rq->fence.context,
-			  rq->fence.seqno,
-			  rq->sched.attr.priority);
+	if (!rq) {
+		intel_engine_dump(engine, &p,
+				  "%s heartbeat not ticking\n",
+				  engine->name);
+	} else {
+		intel_engine_dump(engine, &p,
+				  "%s heartbeat {seqno:%llx:%lld, prio:%d} not ticking\n",
+				  engine->name,
+				  rq->fence.context,
+				  rq->fence.seqno,
+				  rq->sched.attr.priority);
+	}
+}
+
+static void
+reset_engine(struct intel_engine_cs *engine, struct i915_request *rq)
+{
+	if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
+		show_heartbeat(rq, engine);
+
+	intel_gt_handle_error(engine->gt, engine->mask,
+			      I915_ERROR_CAPTURE,
+			      "stopped heartbeat on %s",
+			      engine->name);
 }
 
 static void heartbeat(struct work_struct *wrk)
@@ -102,6 +120,11 @@ static void heartbeat(struct work_struct *wrk)
 	if (intel_gt_is_wedged(engine->gt))
 		goto out;
 
+	if (i915_sched_engine_disabled(engine->sched_engine)) {
+		reset_engine(engine, engine->heartbeat.systole);
+		goto out;
+	}
+
 	if (engine->heartbeat.systole) {
 		long delay = READ_ONCE(engine->props.heartbeat_interval_ms);
 
@@ -139,13 +162,7 @@ static void heartbeat(struct work_struct *wrk)
 			engine->sched_engine->schedule(rq, &attr);
 			local_bh_enable();
 		} else {
-			if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
-				show_heartbeat(rq, engine);
-
-			intel_gt_handle_error(engine->gt, engine->mask,
-					      I915_ERROR_CAPTURE,
-					      "stopped heartbeat on %s",
-					      engine->name);
+			reset_engine(engine, rq);
 		}
 
 		rq->emitted_jiffies = jiffies;
@@ -194,6 +211,26 @@ void intel_engine_park_heartbeat(struct intel_engine_cs *engine)
 		i915_request_put(fetch_and_zero(&engine->heartbeat.systole));
 }
 
+void intel_gt_unpark_heartbeats(struct intel_gt *gt)
+{
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+
+	for_each_engine(engine, gt, id)
+		if (intel_engine_pm_is_awake(engine))
+			intel_engine_unpark_heartbeat(engine);
+
+}
+
+void intel_gt_park_heartbeats(struct intel_gt *gt)
+{
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+
+	for_each_engine(engine, gt, id)
+		intel_engine_park_heartbeat(engine);
+}
+
 void intel_engine_init_heartbeat(struct intel_engine_cs *engine)
 {
 	INIT_DELAYED_WORK(&engine->heartbeat.work, heartbeat);
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h
index a488ea3e84a3..5da6d809a87a 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h
@@ -7,6 +7,7 @@
 #define INTEL_ENGINE_HEARTBEAT_H
 
 struct intel_engine_cs;
+struct intel_gt;
 
 void intel_engine_init_heartbeat(struct intel_engine_cs *engine);
 
@@ -16,6 +17,9 @@ int intel_engine_set_heartbeat(struct intel_engine_cs *engine,
 void intel_engine_park_heartbeat(struct intel_engine_cs *engine);
 void intel_engine_unpark_heartbeat(struct intel_engine_cs *engine);
 
+void intel_gt_park_heartbeats(struct intel_gt *gt);
+void intel_gt_unpark_heartbeats(struct intel_gt *gt);
+
 int intel_engine_pulse(struct intel_engine_cs *engine);
 int intel_engine_flush_barriers(struct intel_engine_cs *engine);
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 8c093bc2d3a4..a5997d6b4aa4 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -10,6 +10,7 @@
 #include "gt/intel_breadcrumbs.h"
 #include "gt/intel_context.h"
 #include "gt/intel_engine_pm.h"
+#include "gt/intel_engine_heartbeat.h"
 #include "gt/intel_gt.h"
 #include "gt/intel_gt_irq.h"
 #include "gt/intel_gt_pm.h"
@@ -604,6 +605,7 @@ void intel_guc_submission_reset_prepare(struct intel_guc *guc)
 		/* Reset called during driver load? GuC not yet initialised! */
 		return;
 
+	intel_gt_park_heartbeats(guc_to_gt(guc));
 	disable_submission(guc);
 	guc->interrupts.disable(guc);
 
@@ -889,6 +891,7 @@ void intel_guc_submission_reset_finish(struct intel_guc *guc)
 	atomic_set(&guc->outstanding_submission_g2h, 0);
 
 	enable_submission(guc);
+	intel_gt_unpark_heartbeats(guc_to_gt(guc));
 }
 
 /*
@@ -1856,6 +1859,11 @@ static int guc_resume(struct intel_engine_cs *engine)
 	return 0;
 }
 
+static bool guc_sched_engine_disabled(struct i915_sched_engine *sched_engine)
+{
+	return !sched_engine->tasklet.callback;
+}
+
 static void guc_set_default_submission(struct intel_engine_cs *engine)
 {
 	engine->submit_request = guc_submit_request;
@@ -2006,6 +2014,7 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
 			return -ENOMEM;
 
 		guc->sched_engine->schedule = i915_schedule;
+		guc->sched_engine->disabled = guc_sched_engine_disabled;
 		guc->sched_engine->engine = engine;
 		tasklet_setup(&guc->sched_engine->tasklet,
 			      guc_submission_tasklet);
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 28d403a8d7d2..72a9bee3026f 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -440,6 +440,11 @@ void i915_sched_engine_free(struct kref *kref)
 	kfree(sched_engine);
 }
 
+static bool default_disabled(struct i915_sched_engine *sched_engine)
+{
+	return false;
+}
+
 struct i915_sched_engine *
 i915_sched_engine_create(unsigned int subclass)
 {
@@ -453,6 +458,7 @@ i915_sched_engine_create(unsigned int subclass)
 
 	sched_engine->queue = RB_ROOT_CACHED;
 	sched_engine->queue_priority_hint = INT_MIN;
+	sched_engine->disabled = default_disabled;
 
 	INIT_LIST_HEAD(&sched_engine->requests);
 	INIT_LIST_HEAD(&sched_engine->hold);
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index a78b1f50ecb4..ec8dfa87cbb6 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -116,4 +116,10 @@ sched_engine_active_unlock_bh(struct i915_sched_engine *sched_engine)
 	local_bh_enable(); /* restore softirq, and kick ksoftirqd! */
 }
 
+static inline bool
+i915_sched_engine_disabled(struct i915_sched_engine *sched_engine)
+{
+	return sched_engine->disabled(sched_engine);
+}
+
 #endif /* _I915_SCHEDULER_H_ */
diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h b/drivers/gpu/drm/i915/i915_scheduler_types.h
index 90b389ba661b..a7183792d110 100644
--- a/drivers/gpu/drm/i915/i915_scheduler_types.h
+++ b/drivers/gpu/drm/i915/i915_scheduler_types.h
@@ -141,6 +141,9 @@ struct i915_sched_engine {
 	/* Back pointer to engine */
 	struct intel_engine_cs *engine;
 
+	/* Schedule engine is disabled by backend */
+	bool	(*disabled)(struct i915_sched_engine *sched_engine);
+
 	/* Kick backend */
 	void	(*kick_backend)(const struct i915_request *rq,
 				int prio);
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 66/97] drm/i915/guc: Add disable interrupts to guc sanitize
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (64 preceding siblings ...)
  2021-05-06 19:14 ` [RFC PATCH 65/97] drm/i915: Reset GPU immediately if submission is disabled Matthew Brost
@ 2021-05-06 19:14 ` Matthew Brost
  2021-05-11  8:16   ` [drm/i915/guc] 07336fb545: WARNING:at_drivers/gpu/drm/i915/gt/uc/intel_uc.c:#__uc_sanitize[i915] kernel test robot
  2021-05-06 19:14 ` [RFC PATCH 67/97] drm/i915/guc: Suspend/resume implementation for new interface Matthew Brost
                   ` (33 subsequent siblings)
  99 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

Add disabling of GuC interrupts to intel_guc_sanitize(). Part of this
requires moving the guc_*_interrupts() wrapper functions into the
header file intel_guc.h.
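
The move is needed because intel_guc_sanitize() is itself a static inline
in intel_guc.h, so any helper it calls must be visible from that header.
A minimal, self-contained illustration of the pattern (names simplified,
not the actual i915 types):

#include <stdio.h>

struct guc {
	void (*irq_disable)(struct guc *guc);
};

/* Previously a file-local static in intel_uc.c; now a header inline. */
static inline void guc_disable_interrupts(struct guc *guc)
{
	guc->irq_disable(guc);
}

static inline int guc_sanitize(struct guc *guc)
{
	guc_disable_interrupts(guc);	/* the step this patch adds */
	return 0;
}

static void irq_disable_impl(struct guc *guc)
{
	(void)guc;
	printf("GuC interrupts disabled\n");
}

int main(void)
{
	struct guc guc = { .irq_disable = irq_disable_impl };

	return guc_sanitize(&guc);
}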

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc.h | 16 ++++++++++++++++
 drivers/gpu/drm/i915/gt/uc/intel_uc.c  | 21 +++------------------
 2 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index afea04d56494..277b4496a20e 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -218,9 +218,25 @@ static inline bool intel_guc_is_ready(struct intel_guc *guc)
 	return intel_guc_is_fw_running(guc) && intel_guc_ct_enabled(&guc->ct);
 }
 
+static inline void intel_guc_reset_interrupts(struct intel_guc *guc)
+{
+	guc->interrupts.reset(guc);
+}
+
+static inline void intel_guc_enable_interrupts(struct intel_guc *guc)
+{
+	guc->interrupts.enable(guc);
+}
+
+static inline void intel_guc_disable_interrupts(struct intel_guc *guc)
+{
+	guc->interrupts.disable(guc);
+}
+
 static inline int intel_guc_sanitize(struct intel_guc *guc)
 {
 	intel_uc_fw_sanitize(&guc->fw);
+	intel_guc_disable_interrupts(guc);
 	intel_guc_ct_sanitize(&guc->ct);
 	guc->mmio_msg = 0;
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index d5ccffbb89ae..67c1e15845aa 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -207,21 +207,6 @@ static void guc_handle_mmio_msg(struct intel_guc *guc)
 	spin_unlock_irq(&guc->irq_lock);
 }
 
-static void guc_reset_interrupts(struct intel_guc *guc)
-{
-	guc->interrupts.reset(guc);
-}
-
-static void guc_enable_interrupts(struct intel_guc *guc)
-{
-	guc->interrupts.enable(guc);
-}
-
-static void guc_disable_interrupts(struct intel_guc *guc)
-{
-	guc->interrupts.disable(guc);
-}
-
 static int guc_enable_communication(struct intel_guc *guc)
 {
 	struct intel_gt *gt = guc_to_gt(guc);
@@ -242,7 +227,7 @@ static int guc_enable_communication(struct intel_guc *guc)
 	guc_get_mmio_msg(guc);
 	guc_handle_mmio_msg(guc);
 
-	guc_enable_interrupts(guc);
+	intel_guc_enable_interrupts(guc);
 
 	/* check for CT messages received before we enabled interrupts */
 	spin_lock_irq(&gt->irq_lock);
@@ -265,7 +250,7 @@ static void guc_disable_communication(struct intel_guc *guc)
 	 */
 	guc_clear_mmio_msg(guc);
 
-	guc_disable_interrupts(guc);
+	intel_guc_disable_interrupts(guc);
 
 	intel_guc_ct_disable(&guc->ct);
 
@@ -463,7 +448,7 @@ static int __uc_init_hw(struct intel_uc *uc)
 	if (ret)
 		goto err_out;
 
-	guc_reset_interrupts(guc);
+	intel_guc_reset_interrupts(guc);
 
 	/* WaEnableuKernelHeaderValidFix:skl */
 	/* WaEnableGuCBootHashCheckNotSet:skl,bxt,kbl */
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 67/97] drm/i915/guc: Suspend/resume implementation for new interface
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (65 preceding siblings ...)
  2021-05-06 19:14 ` [RFC PATCH 66/97] drm/i915/guc: Add disable interrupts to guc sanitize Matthew Brost
@ 2021-05-06 19:14 ` Matthew Brost
  2021-05-06 19:14 ` [RFC PATCH 68/97] drm/i915/guc: Handle context reset notification Matthew Brost
                   ` (32 subsequent siblings)
  99 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

The new GuC interface introduces an MMIO H2G command,
INTEL_GUC_ACTION_RESET_CLIENT, which is used to implement suspend. This
MMIO action tears down any active contexts, generating a context reset
G2H CTB message for each. Once that step completes, the GuC tears down
the CTB channels. It is safe to suspend once this MMIO H2G command
completes and all G2H CTB messages have been processed. In practice the
i915 will likely never receive a G2H as suspend should only be called
after the GPU is idle.

Resume is implemented in the same manner as before - simply reload the
GuC firmware and reinitialize everything (e.g. CTB channels, contexts,
etc.).
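
Compressed into a self-contained sketch, the suspend flow reads roughly as
below; the helpers are invented stand-ins for intel_guc_send_mmio() and
intel_guc_sanitize(), and error handling is reduced to a log message, as
in the patch:

#include <stdbool.h>
#include <stdio.h>

#define ACTION_RESET_CLIENT 0x5B01	/* INTEL_GUC_ACTION_RESET_CLIENT */

static bool guc_is_ready = true;
static bool guc_submission_used = true;

/* Stand-in for intel_guc_send_mmio(); returns 0 on success. */
static int send_mmio_action(unsigned int action)
{
	printf("H2G MMIO action 0x%X sent\n", action);
	return 0;
}

/* Stand-in for intel_guc_sanitize(): mark the firmware as not running. */
static void guc_sanitize(void)
{
	guc_is_ready = false;
	printf("GuC sanitized\n");
}

static int guc_suspend(void)
{
	if (!guc_is_ready)
		return 0;

	if (guc_submission_used) {
		/*
		 * Tears down active contexts (each produces a G2H), then the
		 * CTB channels. Failure is logged but not fatal: sanitize and
		 * the firmware reload on resume clean up anyway.
		 */
		int ret = send_mmio_action(ACTION_RESET_CLIENT);

		if (ret)
			fprintf(stderr, "RESET_CLIENT failed: %d\n", ret);
	}

	guc_sanitize();
	return 0;
}

int main(void)
{
	return guc_suspend();
}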

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
---
 .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h  |  1 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.c        | 64 ++++++++-----------
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 14 ++--
 .../gpu/drm/i915/gt/uc/intel_guc_submission.h |  5 ++
 drivers/gpu/drm/i915/gt/uc/intel_uc.c         | 28 +++++---
 5 files changed, 59 insertions(+), 53 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
index c0a715ec7276..c9e87de3af49 100644
--- a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
+++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
@@ -146,6 +146,7 @@ enum intel_guc_action {
 	INTEL_GUC_ACTION_REGISTER_COMMAND_TRANSPORT_BUFFER = 0x4505,
 	INTEL_GUC_ACTION_DEREGISTER_COMMAND_TRANSPORT_BUFFER = 0x4506,
 	INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE = 0x4600,
+	INTEL_GUC_ACTION_RESET_CLIENT = 0x5B01,
 	INTEL_GUC_ACTION_LIMIT
 };
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
index 864b14e313a3..f3240037fb7c 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
@@ -534,51 +534,34 @@ int intel_guc_auth_huc(struct intel_guc *guc, u32 rsa_offset)
  */
 int intel_guc_suspend(struct intel_guc *guc)
 {
-	struct intel_uncore *uncore = guc_to_gt(guc)->uncore;
 	int ret;
-	u32 status;
 	u32 action[] = {
-		INTEL_GUC_ACTION_ENTER_S_STATE,
-		GUC_POWER_D1, /* any value greater than GUC_POWER_D0 */
+		INTEL_GUC_ACTION_RESET_CLIENT,
 	};
 
-	/*
-	 * If GuC communication is enabled but submission is not supported,
-	 * we do not need to suspend the GuC.
-	 */
-	if (!intel_guc_submission_is_used(guc) || !intel_guc_is_ready(guc))
+	if (!intel_guc_is_ready(guc))
 		return 0;
 
-	/*
-	 * The ENTER_S_STATE action queues the save/restore operation in GuC FW
-	 * and then returns, so waiting on the H2G is not enough to guarantee
-	 * GuC is done. When all the processing is done, GuC writes
-	 * INTEL_GUC_SLEEP_STATE_SUCCESS to scratch register 14, so we can poll
-	 * on that. Note that GuC does not ensure that the value in the register
-	 * is different from INTEL_GUC_SLEEP_STATE_SUCCESS while the action is
-	 * in progress so we need to take care of that ourselves as well.
-	 */
-
-	intel_uncore_write(uncore, SOFT_SCRATCH(14),
-			   INTEL_GUC_SLEEP_STATE_INVALID_MASK);
-
-	ret = intel_guc_send(guc, action, ARRAY_SIZE(action));
-	if (ret)
-		return ret;
-
-	ret = __intel_wait_for_register(uncore, SOFT_SCRATCH(14),
-					INTEL_GUC_SLEEP_STATE_INVALID_MASK,
-					0, 0, 10, &status);
-	if (ret)
-		return ret;
-
-	if (status != INTEL_GUC_SLEEP_STATE_SUCCESS) {
-		DRM_ERROR("GuC failed to change sleep state. "
-			  "action=0x%x, err=%u\n",
-			  action[0], status);
-		return -EIO;
+	if (intel_guc_submission_is_used(guc)) {
+		/*
+		 * This H2G MMIO command tears down the GuC in two steps. First it will
+		 * generate a G2H CTB for every active context indicating a reset. In
+		 * practice the i915 shouldn't ever get a G2H as suspend should only be
+		 * called when the GPU is idle. Next, it tears down the CTBs and this
+		 * H2G MMIO command completes.
+		 *
+		 * Don't abort on a failure code from the GuC. Keep going and do the
+		 * clean up in sanitize() and re-initialisation on resume and hopefully
+		 * the error here won't be problematic.
+		 */
+		ret = intel_guc_send_mmio(guc, action, ARRAY_SIZE(action), NULL, 0);
+		if (ret)
+			DRM_ERROR("GuC suspend: RESET_CLIENT action failed with error %d!\n", ret);
 	}
 
+	/* Signal that the GuC isn't running. */
+	intel_guc_sanitize(guc);
+
 	return 0;
 }
 
@@ -588,7 +571,12 @@ int intel_guc_suspend(struct intel_guc *guc)
  */
 int intel_guc_resume(struct intel_guc *guc)
 {
-	/* XXX: to be implemented with submission interface rework */
+	/*
+	 * NB: This function can still be called even if GuC submission is
+	 * disabled, e.g. if GuC is enabled for HuC authentication only. Thus,
+	 * if any code is later added here, it must support doing nothing
+	 * if submission is disabled (as per intel_guc_suspend).
+	 */
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index a5997d6b4aa4..2c3791fc24b7 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -303,10 +303,10 @@ static int guc_submission_busy_loop(struct intel_guc* guc,
 	return err;
 }
 
-static int guc_wait_for_pending_msg(struct intel_guc *guc,
-				    atomic_t *wait_var,
-				    bool interruptible,
-				    long timeout)
+int intel_guc_wait_for_pending_msg(struct intel_guc *guc,
+				   atomic_t *wait_var,
+				   bool interruptible,
+				   long timeout)
 {
 	const int state = interruptible ?
 		TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
@@ -351,8 +351,8 @@ int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout)
 	if (unlikely(timeout < 0))
 		timeout = -timeout, interruptible = false;
 
-	return guc_wait_for_pending_msg(guc, &guc->outstanding_submission_g2h,
-					interruptible, timeout);
+	return intel_guc_wait_for_pending_msg(guc, &guc->outstanding_submission_g2h,
+					      interruptible, timeout);
 }
 
 static int guc_lrc_desc_pin(struct intel_context *ce, bool loop);
@@ -624,7 +624,7 @@ void intel_guc_submission_reset_prepare(struct intel_guc *guc)
 	for (i = 0; i < 4 && atomic_read(&guc->outstanding_submission_g2h); ++i) {
 		intel_guc_to_host_event_handler(guc);
 #define wait_for_reset(guc, wait_var) \
-		guc_wait_for_pending_msg(guc, wait_var, false, (HZ / 20))
+		intel_guc_wait_for_pending_msg(guc, wait_var, false, (HZ / 20))
 		do {
 			wait_for_reset(guc, &guc->outstanding_submission_g2h);
 		} while (!list_empty(&guc->ct.requests.incoming));
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
index 95df5ab06031..b9b9f0f60f91 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
@@ -27,6 +27,11 @@ void intel_guc_log_context_info(struct intel_guc *guc, struct drm_printer *p);
 
 bool intel_guc_virtual_engine_has_heartbeat(const struct intel_engine_cs *ve);
 
+int intel_guc_wait_for_pending_msg(struct intel_guc *guc,
+				   atomic_t *wait_var,
+				   bool interruptible,
+				   long timeout);
+
 static inline bool intel_guc_submission_is_supported(struct intel_guc *guc)
 {
 	/* XXX: GuC submission is unavailable for now */
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index 67c1e15845aa..7035aa727e04 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -591,14 +591,20 @@ void intel_uc_cancel_requests(struct intel_uc *uc)
 void intel_uc_runtime_suspend(struct intel_uc *uc)
 {
 	struct intel_guc *guc = &uc->guc;
-	int err;
 
-	if (!intel_guc_is_ready(guc))
+	if (!intel_guc_is_ready(guc)) {
+		guc->interrupts.enabled = false;
 		return;
+	}
 
-	err = intel_guc_suspend(guc);
-	if (err)
-		DRM_DEBUG_DRIVER("Failed to suspend GuC, err=%d", err);
+	/*
+	 * Wait for any outstanding CTB before tearing down communication with the
+	 * GuC.
+	 */
+#define OUTSTANDING_CTB_TIMEOUT_PERIOD	(HZ / 5)
+	intel_guc_wait_for_pending_msg(guc, &guc->outstanding_submission_g2h,
+				       false, OUTSTANDING_CTB_TIMEOUT_PERIOD);
+	GEM_WARN_ON(atomic_read(&guc->outstanding_submission_g2h));
 
 	guc_disable_communication(guc);
 }
@@ -607,12 +613,18 @@ void intel_uc_suspend(struct intel_uc *uc)
 {
 	struct intel_guc *guc = &uc->guc;
 	intel_wakeref_t wakeref;
+	int err;
 
-	if (!intel_guc_is_ready(guc))
+	if (!intel_guc_is_ready(guc)) {
+		guc->interrupts.enabled = false;
 		return;
+	}
 
-	with_intel_runtime_pm(uc_to_gt(uc)->uncore->rpm, wakeref)
-		intel_uc_runtime_suspend(uc);
+	with_intel_runtime_pm(&uc_to_gt(uc)->i915->runtime_pm, wakeref) {
+		err = intel_guc_suspend(guc);
+		if (err)
+			DRM_DEBUG_DRIVER("Failed to suspend GuC, err=%d", err);
+	}
 }
 
 static int __uc_resume(struct intel_uc *uc, bool enable_communication)
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 68/97] drm/i915/guc: Handle context reset notification
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (66 preceding siblings ...)
  2021-05-06 19:14 ` [RFC PATCH 67/97] drm/i915/guc: Suspend/resume implementation for new interface Matthew Brost
@ 2021-05-06 19:14 ` Matthew Brost
  2021-05-11 16:25   ` [Intel-gfx] " Daniel Vetter
  2021-05-06 19:14 ` [RFC PATCH 69/97] drm/i915/guc: Handle engine reset failure notification Matthew Brost
                   ` (31 subsequent siblings)
  99 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

GuC will issue a reset on detecting an engine hang and will notify
the driver via a G2H message. The driver will service the notification
by resetting the guilty context to a simple state or banning it
completely.
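
The new G2H handler boils down to: validate the message length, look up
the context by descriptor index, then reset and replay it. A toy,
self-contained version of that shape (the lookup table and replay below
are stand-ins, not the i915 implementation):

#include <errno.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_CONTEXTS 4

struct context {
	int in_use;
	int guc_id;
};

/* Toy lookup table standing in for the driver's guc_id -> context map. */
static struct context contexts[MAX_CONTEXTS] = {
	[2] = { .in_use = 1, .guc_id = 2 },
};

static struct context *context_lookup(uint32_t desc_idx)
{
	if (desc_idx >= MAX_CONTEXTS || !contexts[desc_idx].in_use)
		return NULL;
	return &contexts[desc_idx];
}

static void context_replay(struct context *ce)
{
	printf("resetting and replaying context %d\n", ce->guc_id);
}

static int context_reset_process_msg(const uint32_t *msg, uint32_t len)
{
	struct context *ce;

	if (len != 1)		/* exactly one payload dword expected */
		return -EPROTO;

	ce = context_lookup(msg[0]);
	if (!ce)
		return -EPROTO;

	context_replay(ce);
	return 0;
}

int main(void)
{
	const uint32_t good[] = { 2 };
	const uint32_t bad[] = { 7 };

	printf("valid id:   %d\n", context_reset_process_msg(good, 1));
	printf("unknown id: %d\n", context_reset_process_msg(bad, 1));
	return 0;
}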

Cc: Matthew Brost <matthew.brost@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  2 ++
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |  6 ++++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 35 +++++++++++++++++++
 drivers/gpu/drm/i915/i915_trace.h             | 10 ++++++
 4 files changed, 53 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 277b4496a20e..a2abe1c422e3 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -263,6 +263,8 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 					  const u32 *msg, u32 len);
 int intel_guc_sched_done_process_msg(struct intel_guc *guc,
 				     const u32 *msg, u32 len);
+int intel_guc_context_reset_process_msg(struct intel_guc *guc,
+					const u32 *msg, u32 len);
 
 void intel_guc_submission_reset_prepare(struct intel_guc *guc);
 void intel_guc_submission_reset(struct intel_guc *guc, bool stalled);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index b3194d753b13..9c84b2ba63a8 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -941,6 +941,12 @@ static int ct_process_request(struct intel_guc_ct *ct, struct ct_incoming_msg *r
 			CT_ERROR(ct, "schedule context failed %x %*ph\n",
 				  action, 4 * len, payload);
 		break;
+	case INTEL_GUC_ACTION_CONTEXT_RESET_NOTIFICATION:
+		ret = intel_guc_context_reset_process_msg(guc, payload, len);
+		if (unlikely(ret))
+			CT_ERROR(ct, "context reset notification failed %x %*ph\n",
+				  action, 4 * len, payload);
+		break;
 	default:
 		ret = -EOPNOTSUPP;
 		break;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 2c3791fc24b7..940017495731 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -2192,6 +2192,41 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
 	return 0;
 }
 
+static void guc_context_replay(struct intel_context *ce)
+{
+	struct i915_sched_engine *sched_engine = ce->engine->sched_engine;
+
+	__guc_reset_context(ce, true);
+	i915_sched_engine_hi_kick(sched_engine);
+}
+
+static void guc_handle_context_reset(struct intel_guc *guc,
+				     struct intel_context *ce)
+{
+	trace_intel_context_reset(ce);
+	guc_context_replay(ce);
+}
+
+int intel_guc_context_reset_process_msg(struct intel_guc *guc,
+					const u32 *msg, u32 len)
+{
+	struct intel_context *ce;
+	int desc_idx = msg[0];
+
+	if (unlikely(len != 1)) {
+		drm_dbg(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);
+		return -EPROTO;
+	}
+
+	ce = g2h_context_lookup(guc, desc_idx);
+	if (unlikely(!ce))
+		return -EPROTO;
+
+	guc_handle_context_reset(guc, ce);
+
+	return 0;
+}
+
 void intel_guc_log_submission_info(struct intel_guc *guc,
 				   struct drm_printer *p)
 {
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 97c2e83984ed..c095c4d39456 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -929,6 +929,11 @@ DECLARE_EVENT_CLASS(intel_context,
 		      __entry->guc_sched_state_no_lock)
 );
 
+DEFINE_EVENT(intel_context, intel_context_reset,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
 DEFINE_EVENT(intel_context, intel_context_register,
 	     TP_PROTO(struct intel_context *ce),
 	     TP_ARGS(ce)
@@ -1026,6 +1031,11 @@ trace_i915_request_out(struct i915_request *rq)
 {
 }
 
+static inline void
+trace_intel_context_reset(struct intel_context *ce)
+{
+}
+
 static inline void
 trace_intel_context_register(struct intel_context *ce)
 {
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 69/97] drm/i915/guc: Handle engine reset failure notification
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (67 preceding siblings ...)
  2021-05-06 19:14 ` [RFC PATCH 68/97] drm/i915/guc: Handle context reset notification Matthew Brost
@ 2021-05-06 19:14 ` Matthew Brost
  2021-05-06 19:14 ` [RFC PATCH 70/97] drm/i915/guc: Enable the timer expired interrupt for GuC Matthew Brost
                   ` (30 subsequent siblings)
  99 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

GuC will notify the driver, via G2H, if it fails to reset an engine. We
recover by resorting to a full GPU reset.
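
In outline, the handler decodes (class, instance, reason) from the G2H
payload and escalates to a full GPU reset covering that engine. A small
stand-alone sketch of the escalation; the (class, instance) to mask
mapping here is invented for illustration, while the real code uses
engine->mask and intel_gt_handle_error():

#include <stdint.h>
#include <stdio.h>

/* Stand-in for intel_gt_handle_error(): a full GPU reset request. */
static void gt_handle_error(uint32_t engine_mask, const char *msg)
{
	printf("full GPU reset (mask 0x%x): %s\n", engine_mask, msg);
}

static void engine_failure_notification(uint8_t guc_class, uint8_t instance,
					uint32_t reason)
{
	/* Invented (class, instance) -> mask mapping, purely for the demo. */
	uint32_t engine_mask = 1u << (guc_class * 4 + instance);

	printf("GuC failed to reset engine %u:%u (reason=0x%08x)\n",
	       guc_class, instance, reason);
	gt_handle_error(engine_mask, "GuC engine reset failed");
}

int main(void)
{
	engine_failure_notification(0, 0, 0x1234);
	return 0;
}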

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Fernando Pacheco <fernando.pacheco@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  2 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |  6 +++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 43 +++++++++++++++++++
 3 files changed, 51 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index a2abe1c422e3..e118d8217e77 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -265,6 +265,8 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
 				     const u32 *msg, u32 len);
 int intel_guc_context_reset_process_msg(struct intel_guc *guc,
 					const u32 *msg, u32 len);
+int intel_guc_engine_failure_process_msg(struct intel_guc *guc,
+					 const u32 *msg, u32 len);
 
 void intel_guc_submission_reset_prepare(struct intel_guc *guc);
 void intel_guc_submission_reset(struct intel_guc *guc, bool stalled);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 9c84b2ba63a8..d5b326d4e250 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -947,6 +947,12 @@ static int ct_process_request(struct intel_guc_ct *ct, struct ct_incoming_msg *r
 			CT_ERROR(ct, "context reset notification failed %x %*ph\n",
 				  action, 4 * len, payload);
 		break;
+	case INTEL_GUC_ACTION_ENGINE_FAILURE_NOTIFICATION:
+		ret = intel_guc_engine_failure_process_msg(guc, payload, len);
+		if (unlikely(ret))
+			CT_ERROR(ct, "engine failure handler failed %x %*ph\n",
+				  action, 4 * len, payload);
+		break;
 	default:
 		ret = -EOPNOTSUPP;
 		break;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 940017495731..22f17a055b21 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -2227,6 +2227,49 @@ int intel_guc_context_reset_process_msg(struct intel_guc *guc,
 	return 0;
 }
 
+static struct intel_engine_cs *
+guc_lookup_engine(struct intel_guc *guc, u8 guc_class, u8 instance)
+{
+	struct intel_gt *gt = guc_to_gt(guc);
+	u8 engine_class = guc_class_to_engine_class(guc_class);
+
+	/* Class index is checked in class converter */
+	GEM_BUG_ON(instance > MAX_ENGINE_INSTANCE);
+
+	return gt->engine_class[engine_class][instance];
+}
+
+int intel_guc_engine_failure_process_msg(struct intel_guc *guc,
+					 const u32 *msg, u32 len)
+{
+	struct intel_engine_cs *engine;
+	u8 guc_class, instance;
+	u32 reason;
+
+	if (unlikely(len != 3)) {
+		drm_dbg(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);
+		return -EPROTO;
+	}
+
+	guc_class = msg[0];
+	instance = msg[1];
+	reason = msg[2];
+
+	engine = guc_lookup_engine(guc, guc_class, instance);
+	if (unlikely(!engine)) {
+		drm_dbg(&guc_to_gt(guc)->i915->drm,
+			"Invalid engine %d:%d", guc_class, instance);
+		return -EPROTO;
+	}
+
+	intel_gt_handle_error(guc_to_gt(guc), engine->mask,
+			      I915_ERROR_CAPTURE,
+			      "GuC failed to reset %s (reason=0x%08x)\n",
+			      engine->name, reason);
+
+	return 0;
+}
+
 void intel_guc_log_submission_info(struct intel_guc *guc,
 				   struct drm_printer *p)
 {
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 70/97] drm/i915/guc: Enable the timer expired interrupt for GuC
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (68 preceding siblings ...)
  2021-05-06 19:14 ` [RFC PATCH 69/97] drm/i915/guc: Handle engine reset failure notification Matthew Brost
@ 2021-05-06 19:14 ` Matthew Brost
  2021-05-06 19:14 ` [RFC PATCH 71/97] drm/i915/guc: Provide mmio list to be saved/restored on engine reset Matthew Brost
                   ` (29 subsequent siblings)
  99 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

The GuC can implement execution quantums, detect hung contexts and
other such things, but it requires the timer expired interrupt to do so.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
CC: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/gt/intel_rps.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_rps.c b/drivers/gpu/drm/i915/gt/intel_rps.c
index 97cab1b99871..0bf86d54adb6 100644
--- a/drivers/gpu/drm/i915/gt/intel_rps.c
+++ b/drivers/gpu/drm/i915/gt/intel_rps.c
@@ -1877,6 +1877,10 @@ void intel_rps_init(struct intel_rps *rps)
 
 	if (INTEL_GEN(i915) >= 8 && INTEL_GEN(i915) < 11)
 		rps->pm_intrmsk_mbz |= GEN8_PMINTR_DISABLE_REDIRECT_TO_GUC;
+
+	/* GuC needs ARAT expired interrupt unmasked */
+	if (intel_uc_uses_guc_submission(&rps_to_gt(rps)->uc))
+		rps->pm_intrmsk_mbz |= ARAT_EXPIRED_INTRMSK;
 }
 
 void intel_rps_sanitize(struct intel_rps *rps)
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 71/97] drm/i915/guc: Provide mmio list to be saved/restored on engine reset
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (69 preceding siblings ...)
  2021-05-06 19:14 ` [RFC PATCH 70/97] drm/i915/guc: Enable the timer expired interrupt for GuC Matthew Brost
@ 2021-05-06 19:14 ` Matthew Brost
  2021-05-06 19:14 ` [RFC PATCH 72/97] drm/i915/guc: Don't complain about reset races Matthew Brost
                   ` (28 subsequent siblings)
  99 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

From: John Harrison <John.C.Harrison@Intel.com>

The driver must provide GuC with a list of mmio registers
that should be saved/restored during a GuC-based engine reset.
Unfortunately, the list must be dynamically allocated as its size is
variable. That means the driver must generate the list twice - once to
work out the size and a second time to actually save it.
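
The two interesting pieces are the duplicate-tolerant sorted insert
(bsearch() plus a bubble-back step) and the two-pass sizing. A
self-contained sketch of both, with made-up register offsets and buffer
sizes:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct mmio_reg {
	uint32_t offset;
	uint32_t flags;
};

static int reg_cmp(const void *a, const void *b)
{
	const struct mmio_reg *ra = a, *rb = b;

	return (int)ra->offset - (int)rb->offset;
}

struct regset {
	struct mmio_reg *regs;
	unsigned int used;
	unsigned int size;
};

static void regset_add(struct regset *set, uint32_t offset, uint32_t flags)
{
	struct mmio_reg reg = { .offset = offset, .flags = flags };
	struct mmio_reg *slot;

	if (set->used == set->size)
		return;	/* the real code treats overflow as a bug */

	/* The list is kept sorted, so duplicates are found with bsearch(). */
	if (bsearch(&reg, set->regs, set->used, sizeof(reg), reg_cmp))
		return;

	slot = &set->regs[set->used++];
	*slot = reg;

	/* Bubble the new entry back so the list stays sorted. */
	while (slot-- > set->regs) {
		struct mmio_reg tmp;

		if (slot[1].offset > slot[0].offset)
			break;
		tmp = slot[0];
		slot[0] = slot[1];
		slot[1] = tmp;
	}
}

static void build_list(struct regset *set)
{
	set->used = 0;
	regset_add(set, 0x229c, 1);	/* e.g. a masked register */
	regset_add(set, 0x2080, 0);
	regset_add(set, 0x229c, 1);	/* duplicate: silently skipped */
	regset_add(set, 0x20a8, 0);
}

int main(void)
{
	struct mmio_reg scratch[16];
	struct regset set = { .regs = scratch, .size = 16 };
	unsigned int i;

	/* Pass 1: build into scratch space just to learn the final size. */
	build_list(&set);
	printf("regset needs %zu bytes\n", set.used * sizeof(struct mmio_reg));

	/* Pass 2 would rebuild the same list in place inside the ADS blob. */
	for (i = 0; i < set.used; i++)
		printf("reg 0x%04x flags %u\n", set.regs[i].offset,
		       set.regs[i].flags);
	return 0;
}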

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Fernando Pacheco <fernando.pacheco@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_workarounds.c   |  46 ++--
 .../gpu/drm/i915/gt/intel_workarounds_types.h |   1 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   1 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c    | 199 +++++++++++++++++-
 drivers/gpu/drm/i915/i915_reg.h               |   1 +
 5 files changed, 222 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
index 5a03a76bb9e2..05d21476d140 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
@@ -150,13 +150,14 @@ static void _wa_add(struct i915_wa_list *wal, const struct i915_wa *wa)
 }
 
 static void wa_add(struct i915_wa_list *wal, i915_reg_t reg,
-		   u32 clear, u32 set, u32 read_mask)
+		   u32 clear, u32 set, u32 read_mask, bool masked_reg)
 {
 	struct i915_wa wa = {
 		.reg  = reg,
 		.clr  = clear,
 		.set  = set,
 		.read = read_mask,
+		.masked_reg = masked_reg,
 	};
 
 	_wa_add(wal, &wa);
@@ -165,7 +166,7 @@ static void wa_add(struct i915_wa_list *wal, i915_reg_t reg,
 static void
 wa_write_clr_set(struct i915_wa_list *wal, i915_reg_t reg, u32 clear, u32 set)
 {
-	wa_add(wal, reg, clear, set, clear);
+	wa_add(wal, reg, clear, set, clear, false);
 }
 
 static void
@@ -200,20 +201,20 @@ wa_write_clr(struct i915_wa_list *wal, i915_reg_t reg, u32 clr)
 static void
 wa_masked_en(struct i915_wa_list *wal, i915_reg_t reg, u32 val)
 {
-	wa_add(wal, reg, 0, _MASKED_BIT_ENABLE(val), val);
+	wa_add(wal, reg, 0, _MASKED_BIT_ENABLE(val), val, true);
 }
 
 static void
 wa_masked_dis(struct i915_wa_list *wal, i915_reg_t reg, u32 val)
 {
-	wa_add(wal, reg, 0, _MASKED_BIT_DISABLE(val), val);
+	wa_add(wal, reg, 0, _MASKED_BIT_DISABLE(val), val, true);
 }
 
 static void
 wa_masked_field_set(struct i915_wa_list *wal, i915_reg_t reg,
 		    u32 mask, u32 val)
 {
-	wa_add(wal, reg, 0, _MASKED_FIELD(mask, val), mask);
+	wa_add(wal, reg, 0, _MASKED_FIELD(mask, val), mask, true);
 }
 
 static void gen6_ctx_workarounds_init(struct intel_engine_cs *engine,
@@ -583,10 +584,10 @@ static void icl_ctx_workarounds_init(struct intel_engine_cs *engine,
 			     GEN11_BLEND_EMB_FIX_DISABLE_IN_RCC);
 
 	/* WaEnableFloatBlendOptimization:icl */
-	wa_write_clr_set(wal,
-			 GEN10_CACHE_MODE_SS,
-			 0, /* write-only, so skip validation */
-			 _MASKED_BIT_ENABLE(FLOAT_BLEND_OPTIMIZATION_ENABLE));
+	wa_add(wal, GEN10_CACHE_MODE_SS, 0,
+	       _MASKED_BIT_ENABLE(FLOAT_BLEND_OPTIMIZATION_ENABLE),
+	       0 /* write-only, so skip validation */,
+	       true);
 
 	/* WaDisableGPGPUMidThreadPreemption:icl */
 	wa_masked_field_set(wal, GEN8_CS_CHICKEN1,
@@ -631,7 +632,7 @@ static void gen12_ctx_gt_tuning_init(struct intel_engine_cs *engine,
 	       FF_MODE2,
 	       FF_MODE2_TDS_TIMER_MASK,
 	       FF_MODE2_TDS_TIMER_128,
-	       0);
+	       0, false);
 }
 
 static void gen12_ctx_workarounds_init(struct intel_engine_cs *engine,
@@ -668,7 +669,7 @@ static void gen12_ctx_workarounds_init(struct intel_engine_cs *engine,
 	       FF_MODE2,
 	       FF_MODE2_GS_TIMER_MASK,
 	       FF_MODE2_GS_TIMER_224,
-	       0);
+	       0, false);
 }
 
 static void dg1_ctx_workarounds_init(struct intel_engine_cs *engine,
@@ -839,7 +840,7 @@ hsw_gt_workarounds_init(struct drm_i915_private *i915, struct i915_wa_list *wal)
 	wa_add(wal,
 	       HSW_ROW_CHICKEN3, 0,
 	       _MASKED_BIT_ENABLE(HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE),
-		0 /* XXX does this reg exist? */);
+	       0 /* XXX does this reg exist? */, true);
 
 	/* WaVSRefCountFullforceMissDisable:hsw */
 	wa_write_clr(wal, GEN7_FF_THREAD_MODE, GEN7_FF_VS_REF_CNT_FFME);
@@ -1950,10 +1951,10 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
 		 * disable bit, which we don't touch here, but it's good
 		 * to keep in mind (see 3DSTATE_PS and 3DSTATE_WM).
 		 */
-		wa_add(wal, GEN7_GT_MODE, 0,
-		       _MASKED_FIELD(GEN6_WIZ_HASHING_MASK,
-				     GEN6_WIZ_HASHING_16x4),
-		       GEN6_WIZ_HASHING_16x4);
+		wa_masked_field_set(wal,
+				    GEN7_GT_MODE,
+				    GEN6_WIZ_HASHING_MASK,
+				    GEN6_WIZ_HASHING_16x4);
 	}
 
 	if (IS_GEN_RANGE(i915, 6, 7))
@@ -2003,10 +2004,10 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
 		 * disable bit, which we don't touch here, but it's good
 		 * to keep in mind (see 3DSTATE_PS and 3DSTATE_WM).
 		 */
-		wa_add(wal,
-		       GEN6_GT_MODE, 0,
-		       _MASKED_FIELD(GEN6_WIZ_HASHING_MASK, GEN6_WIZ_HASHING_16x4),
-		       GEN6_WIZ_HASHING_16x4);
+		wa_masked_field_set(wal,
+				    GEN7_GT_MODE,
+				    GEN6_WIZ_HASHING_MASK,
+				    GEN6_WIZ_HASHING_16x4);
 
 		/* WaDisable_RenderCache_OperationalFlush:snb */
 		wa_masked_dis(wal, CACHE_MODE_0, RC_OP_FLUSH_ENABLE);
@@ -2027,7 +2028,7 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
 		wa_add(wal, MI_MODE,
 		       0, _MASKED_BIT_ENABLE(VS_TIMER_DISPATCH),
 		       /* XXX bit doesn't stick on Broadwater */
-		       IS_I965G(i915) ? 0 : VS_TIMER_DISPATCH);
+		       IS_I965G(i915) ? 0 : VS_TIMER_DISPATCH, true);
 
 	if (IS_GEN(i915, 4))
 		/*
@@ -2042,7 +2043,8 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
 		 */
 		wa_add(wal, ECOSKPD,
 		       0, _MASKED_BIT_ENABLE(ECO_CONSTANT_BUFFER_SR_DISABLE),
-		       0 /* XXX bit doesn't stick on Broadwater */);
+		       0 /* XXX bit doesn't stick on Broadwater */,
+		       true);
 }
 
 static void
diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds_types.h b/drivers/gpu/drm/i915/gt/intel_workarounds_types.h
index c214111ea367..1e873681795d 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds_types.h
@@ -15,6 +15,7 @@ struct i915_wa {
 	u32		clr;
 	u32		set;
 	u32		read;
+	bool		masked_reg;
 };
 
 struct i915_wa_list {
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index e118d8217e77..097687937cec 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -59,6 +59,7 @@ struct intel_guc {
 
 	struct i915_vma *ads_vma;
 	struct __guc_ads_blob *ads_blob;
+	u32 ads_regset_size;
 
 	struct i915_vma *lrc_desc_pool;
 	void *lrc_desc_pool_vaddr;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
index ecd18531b40a..cd65ff42657d 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
@@ -3,6 +3,8 @@
  * Copyright © 2014-2019 Intel Corporation
  */
 
+#include <linux/bsearch.h>
+
 #include "gt/intel_gt.h"
 #include "gt/intel_lrc.h"
 #include "intel_guc_ads.h"
@@ -25,7 +27,12 @@
  *      | guc_gt_system_info                    |
  *      +---------------------------------------+
  *      | guc_clients_info                      |
- *      +---------------------------------------+
+ *      +---------------------------------------+ <== static
+ *      | guc_mmio_reg[countA] (engine 0.0)     |
+ *      | guc_mmio_reg[countB] (engine 0.1)     |
+ *      | guc_mmio_reg[countC] (engine 1.0)     |
+ *      |   ...                                 |
+ *      +---------------------------------------+ <== dynamic
  *      | padding                               |
  *      +---------------------------------------+ <== 4K aligned
  *      | private data                          |
@@ -38,16 +45,33 @@ struct __guc_ads_blob {
 	struct guc_policies policies;
 	struct guc_gt_system_info system_info;
 	struct guc_clients_info clients_info;
+	/* From here on, location is dynamic! Refer to above diagram. */
+	struct guc_mmio_reg regset[0];
 } __packed;
 
+static u32 guc_ads_regset_size(struct intel_guc *guc)
+{
+	GEM_BUG_ON(!guc->ads_regset_size);
+	return guc->ads_regset_size;
+}
+
 static u32 guc_ads_private_data_size(struct intel_guc *guc)
 {
 	return PAGE_ALIGN(guc->fw.private_data_size);
 }
 
+static u32 guc_ads_regset_offset(struct intel_guc *guc)
+{
+	return offsetof(struct __guc_ads_blob, regset);
+}
+
 static u32 guc_ads_private_data_offset(struct intel_guc *guc)
 {
-	return PAGE_ALIGN(sizeof(struct __guc_ads_blob));
+	u32 offset;
+
+	offset = guc_ads_regset_offset(guc) +
+		 guc_ads_regset_size(guc);
+	return PAGE_ALIGN(offset);
 }
 
 static u32 guc_ads_blob_size(struct intel_guc *guc)
@@ -86,6 +110,165 @@ static void guc_mapping_table_init(struct intel_gt *gt,
 	}
 }
 
+/*
+ * The save/restore register list must be pre-calculated to a temporary
+ * buffer of driver defined size before it can be generated in place
+ * inside the ADS.
+ */
+#define MAX_MMIO_REGS	128	/* Arbitrary size, increase as needed */
+struct temp_regset {
+	struct guc_mmio_reg *registers;
+	u32 used;
+	u32 size;
+};
+
+static int guc_mmio_reg_cmp(const void *a, const void *b)
+{
+	const struct guc_mmio_reg *ra = a;
+	const struct guc_mmio_reg *rb = b;
+
+	return (int)ra->offset - (int)rb->offset;
+}
+
+static void guc_mmio_reg_add(struct temp_regset *regset,
+			     u32 offset, u32 flags)
+{
+	u32 count = regset->used;
+	struct guc_mmio_reg reg = {
+		.offset = offset,
+		.flags = flags,
+	};
+	struct guc_mmio_reg *slot;
+
+	GEM_BUG_ON(count >= regset->size);
+
+	/*
+	 * The mmio list is built using separate lists within the driver.
+	 * It's possible that at some point we may attempt to add the same
+	 * register more than once. Do not consider this an error; silently
+	 * move on if the register is already in the list.
+	 */
+	if (bsearch(&reg, regset->registers, count,
+		    sizeof(reg), guc_mmio_reg_cmp))
+		return;
+
+	slot = &regset->registers[count];
+	regset->used++;
+	*slot = reg;
+
+	while (slot-- > regset->registers) {
+		GEM_BUG_ON(slot[0].offset == slot[1].offset);
+		if (slot[1].offset > slot[0].offset)
+			break;
+
+		swap(slot[1], slot[0]);
+	}
+}
+
+#define GUC_MMIO_REG_ADD(regset, reg, masked) \
+	guc_mmio_reg_add(regset, \
+			 i915_mmio_reg_offset((reg)), \
+			 (masked) ? GUC_REGSET_MASKED : 0)
+
+static void guc_mmio_regset_init(struct temp_regset *regset,
+				 struct intel_engine_cs *engine)
+{
+	const u32 base = engine->mmio_base;
+	struct i915_wa_list *wal = &engine->wa_list;
+	struct i915_wa *wa;
+	unsigned int i;
+
+	regset->used = 0;
+
+	GUC_MMIO_REG_ADD(regset, RING_MODE_GEN7(base), true);
+	GUC_MMIO_REG_ADD(regset, RING_HWS_PGA(base), false);
+	GUC_MMIO_REG_ADD(regset, RING_IMR(base), false);
+
+	for (i = 0, wa = wal->list; i < wal->count; i++, wa++)
+		GUC_MMIO_REG_ADD(regset, wa->reg, wa->masked_reg);
+
+	/* Be extra paranoid and include all whitelist registers. */
+	for (i = 0; i < RING_MAX_NONPRIV_SLOTS; i++)
+		GUC_MMIO_REG_ADD(regset,
+				 RING_FORCE_TO_NONPRIV(base, i),
+				 false);
+
+	/* add in local MOCS registers */
+	for (i = 0; i < GEN9_LNCFCMOCS_REG_COUNT; i++)
+		GUC_MMIO_REG_ADD(regset, GEN9_LNCFCMOCS(i), false);
+}
+
+static int guc_mmio_reg_state_query(struct intel_guc *guc)
+{
+	struct intel_gt *gt = guc_to_gt(guc);
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	struct temp_regset temp_set;
+	u32 total;
+
+	/*
+	 * Need to actually build the list in order to filter out
+	 * duplicates and other such data dependent constructions.
+	 */
+	temp_set.size = MAX_MMIO_REGS;
+	temp_set.registers = kmalloc_array(temp_set.size,
+					  sizeof(*temp_set.registers),
+					  GFP_KERNEL);
+	if (!temp_set.registers)
+		return -ENOMEM;
+
+	total = 0;
+	for_each_engine(engine, gt, id) {
+		guc_mmio_regset_init(&temp_set, engine);
+		total += temp_set.used;
+	}
+
+	kfree(temp_set.registers);
+
+	return total * sizeof(struct guc_mmio_reg);
+}
+
+static void guc_mmio_reg_state_init(struct intel_guc *guc,
+				    struct __guc_ads_blob *blob)
+{
+	struct intel_gt *gt = guc_to_gt(guc);
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	struct temp_regset temp_set;
+	struct guc_mmio_reg_set *ads_reg_set;
+	u32 addr_ggtt, offset;
+	u8 guc_class;
+
+	offset = guc_ads_regset_offset(guc);
+	addr_ggtt = intel_guc_ggtt_offset(guc, guc->ads_vma) + offset;
+	temp_set.registers = (struct guc_mmio_reg *) (((u8 *) blob) + offset);
+	temp_set.size = guc->ads_regset_size / sizeof(temp_set.registers[0]);
+
+	for_each_engine(engine, gt, id) {
+		/* Class index is checked in class converter */
+		GEM_BUG_ON(engine->instance >= GUC_MAX_INSTANCES_PER_CLASS);
+
+		guc_class = engine_class_to_guc_class(engine->class);
+		ads_reg_set = &blob->ads.reg_state_list[guc_class][engine->instance];
+
+		guc_mmio_regset_init(&temp_set, engine);
+		if (!temp_set.used) {
+			ads_reg_set->address = 0;
+			ads_reg_set->count = 0;
+			continue;
+		}
+
+		ads_reg_set->address = addr_ggtt;
+		ads_reg_set->count = temp_set.used;
+
+		temp_set.size -= temp_set.used;
+		temp_set.registers += temp_set.used;
+		addr_ggtt += temp_set.used * sizeof(struct guc_mmio_reg);
+	}
+
+	GEM_BUG_ON(temp_set.size);
+}
+
 /*
  * The first 80 dwords of the register state context, containing the
  * execlists and ppgtt registers.
@@ -124,8 +307,7 @@ static void __guc_ads_init(struct intel_guc *guc)
 		 */
 		blob->ads.golden_context_lrca[guc_class] = 0;
 		blob->ads.eng_state_size[guc_class] =
-			intel_engine_context_size(guc_to_gt(guc),
-						  engine_class) -
+			intel_engine_context_size(gt, engine_class) -
 			skipped_size;
 	}
 
@@ -160,6 +342,9 @@ static void __guc_ads_init(struct intel_guc *guc)
 	blob->ads.gt_system_info = base + ptr_offset(blob, system_info);
 	blob->ads.clients_info = base + ptr_offset(blob, clients_info);
 
+	/* MMIO save/restore list */
+	guc_mmio_reg_state_init(guc, blob);
+
 	/* Private Data */
 	blob->ads.private_data = base + guc_ads_private_data_offset(guc);
 
@@ -180,6 +365,12 @@ int intel_guc_ads_create(struct intel_guc *guc)
 
 	GEM_BUG_ON(guc->ads_vma);
 
+	/* Need to calculate the reg state size dynamically: */
+	ret = guc_mmio_reg_state_query(guc);
+	if (ret < 0)
+		return ret;
+	guc->ads_regset_size = ret;
+
 	size = guc_ads_blob_size(guc);
 
 	ret = intel_guc_allocate_and_map_vma(guc, size, &guc->ads_vma,
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index db151b522825..e7988b4812f3 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -12166,6 +12166,7 @@ enum skl_power_gate {
 
 /* MOCS (Memory Object Control State) registers */
 #define GEN9_LNCFCMOCS(i)	_MMIO(0xb020 + (i) * 4)	/* L3 Cache Control */
+#define GEN9_LNCFCMOCS_REG_COUNT	32
 
 #define __GEN9_RCS0_MOCS0	0xc800
 #define GEN9_GFX_MOCS(i)	_MMIO(__GEN9_RCS0_MOCS0 + (i) * 4)
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 72/97] drm/i915/guc: Don't complain about reset races
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (70 preceding siblings ...)
  2021-05-06 19:14 ` [RFC PATCH 71/97] drm/i915/guc: Provide mmio list to be saved/restored on engine reset Matthew Brost
@ 2021-05-06 19:14 ` Matthew Brost
  2021-05-06 19:14 ` [RFC PATCH 73/97] drm/i915/guc: Enable GuC engine reset Matthew Brost
                   ` (27 subsequent siblings)
  99 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

From: John Harrison <John.C.Harrison@Intel.com>

It is impossible to seal all race conditions of resets occurring
concurrently with other operations, at least not without introducing
excessive mutex locking. Instead, don't complain when one occurs. In
particular, don't complain about attempting to send an H2G during a
reset. Whatever the H2G was about should get redone once the reset is
over.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 5 ++++-
 drivers/gpu/drm/i915/gt/uc/intel_uc.c     | 4 ++++
 drivers/gpu/drm/i915/gt/uc/intel_uc.h     | 2 ++
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index d5b326d4e250..1c240ff8dec9 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -718,7 +718,10 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
 	int ret;
 
 	if (unlikely(!ct->enabled)) {
-		WARN(1, "Unexpected send: action=%#x\n", *action);
+		struct intel_guc *guc = ct_to_guc(ct);
+		struct intel_uc *uc = container_of(guc, struct intel_uc, guc);
+
+		WARN(!uc->reset_in_progress, "Unexpected send: action=%#x\n", *action);
 		return -ENODEV;
 	}
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index 7035aa727e04..8c681fc49638 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -550,6 +550,8 @@ void intel_uc_reset_prepare(struct intel_uc *uc)
 {
 	struct intel_guc *guc = &uc->guc;
 
+	uc->reset_in_progress = true;
+
 	/* Firmware expected to be running when this function is called */
 	if (!intel_guc_is_ready(guc))
 		goto sanitize;
@@ -574,6 +576,8 @@ void intel_uc_reset_finish(struct intel_uc *uc)
 {
 	struct intel_guc *guc = &uc->guc;
 
+	uc->reset_in_progress = false;
+
 	/* Firmware expected to be running when this function is called */
 	if (intel_guc_is_fw_running(guc) && intel_uc_uses_guc_submission(uc))
 		intel_guc_submission_reset_finish(guc);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
index eaa3202192ac..91315e3f1c58 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
@@ -30,6 +30,8 @@ struct intel_uc {
 
 	/* Snapshot of GuC log from last failed load */
 	struct drm_i915_gem_object *load_err_log;
+
+	bool reset_in_progress;
 };
 
 void intel_uc_init_early(struct intel_uc *uc);
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 73/97] drm/i915/guc: Enable GuC engine reset
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (71 preceding siblings ...)
  2021-05-06 19:14 ` [RFC PATCH 72/97] drm/i915/guc: Don't complain about reset races Matthew Brost
@ 2021-05-06 19:14 ` Matthew Brost
  2021-05-06 19:14 ` [RFC PATCH 74/97] drm/i915/guc: Capture error state on context reset Matthew Brost
                   ` (26 subsequent siblings)
  99 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

From: John Harrison <John.C.Harrison@Intel.com>

Clear the 'disable resets' flag to allow GuC to reset hung contexts
(detected via pre-emption timeout).

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
index cd65ff42657d..179ab658d2b5 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
@@ -84,8 +84,7 @@ static void guc_policies_init(struct guc_policies *policies)
 {
 	policies->dpc_promote_time = GLOBAL_POLICY_DEFAULT_DPC_PROMOTE_TIME_US;
 	policies->max_num_work_items = GLOBAL_POLICY_MAX_NUM_WI;
-	/* Disable automatic resets as not yet supported. */
-	policies->global_flags = GLOBAL_POLICY_DISABLE_ENGINE_RESET;
+	policies->global_flags = 0;
 	policies->is_valid = 1;
 }
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 74/97] drm/i915/guc: Capture error state on context reset
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (72 preceding siblings ...)
  2021-05-06 19:14 ` [RFC PATCH 73/97] drm/i915/guc: Enable GuC engine reset Matthew Brost
@ 2021-05-06 19:14 ` Matthew Brost
  2021-05-11 16:28   ` [Intel-gfx] " Daniel Vetter
  2021-05-06 19:14 ` [RFC PATCH 75/97] drm/i915/guc: Fix for error capture after full GPU reset with GuC Matthew Brost
                   ` (25 subsequent siblings)
  99 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

We receive notification of an engine reset from the GuC only at its
completion, meaning the GuC has potentially already cleared any HW
state we may have been interested in capturing. The GuC also resumes
scheduling on the engine post-reset, as the resets are meant to be
transparent, further muddling our error state.

There is ongoing work to define an API for a GuC debug state dump. The
suggestion for now is to manually disable FW-initiated resets in cases
where debug state is needed.
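
For reference, an illustrative sketch (not part of this patch) of what
that manual disable amounts to, based on the policy-flag handling added
later in this series ('drm/i915/guc: Hook GuC scheduling policies up'):

	/*
	 * Illustrative only: with i915.reset < 2 the global GuC policy
	 * keeps GuC-initiated engine resets disabled, so the hung HW
	 * state survives long enough to be captured.
	 */
	if (i915->params.reset < 2)
		policies->global_flags |= GLOBAL_POLICY_DISABLE_ENGINE_RESET;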

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context.c       | 20 +++++++++++
 drivers/gpu/drm/i915/gt/intel_context.h       |  3 ++
 drivers/gpu/drm/i915/gt/intel_engine.h        | 21 ++++++++++-
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     | 11 ++++--
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |  2 ++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 35 +++++++++----------
 drivers/gpu/drm/i915/i915_gpu_error.c         | 25 ++++++++++---
 7 files changed, 91 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index 2f01437056a8..3fe7794b2bfd 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -514,6 +514,26 @@ struct i915_request *intel_context_create_request(struct intel_context *ce)
 	return rq;
 }
 
+struct i915_request *intel_context_find_active_request(struct intel_context *ce)
+{
+	struct i915_request *rq, *active = NULL;
+	unsigned long flags;
+
+	GEM_BUG_ON(!intel_engine_uses_guc(ce->engine));
+
+	spin_lock_irqsave(&ce->guc_active.lock, flags);
+	list_for_each_entry_reverse(rq, &ce->guc_active.requests,
+				    sched.link) {
+		if (i915_request_completed(rq))
+			break;
+
+		active = rq;
+	}
+	spin_unlock_irqrestore(&ce->guc_active.lock, flags);
+
+	return active;
+}
+
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftest_context.c"
 #endif
diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
index 9b211ca5ecc7..d2b499ed8a05 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -195,6 +195,9 @@ int intel_context_prepare_remote_request(struct intel_context *ce,
 
 struct i915_request *intel_context_create_request(struct intel_context *ce);
 
+struct i915_request *
+intel_context_find_active_request(struct intel_context *ce);
+
 static inline struct intel_ring *__intel_context_ring_size(u64 sz)
 {
 	return u64_to_ptr(struct intel_ring, sz);
diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
index 3321d0917a99..bb94963a9fa2 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -242,7 +242,7 @@ ktime_t intel_engine_get_busy_time(struct intel_engine_cs *engine,
 				   ktime_t *now);
 
 struct i915_request *
-intel_engine_find_active_request(struct intel_engine_cs *engine);
+intel_engine_execlist_find_hung_request(struct intel_engine_cs *engine);
 
 u32 intel_engine_context_size(struct intel_gt *gt, u8 class);
 
@@ -316,4 +316,23 @@ intel_engine_get_sibling(struct intel_engine_cs *engine, unsigned int sibling)
 	return engine->cops->get_sibling(engine, sibling);
 }
 
+static inline void
+intel_engine_set_hung_context(struct intel_engine_cs *engine,
+			      struct intel_context *ce)
+{
+	engine->hung_ce = ce;
+}
+
+static inline void
+intel_engine_clear_hung_context(struct intel_engine_cs *engine)
+{
+	intel_engine_set_hung_context(engine, NULL);
+}
+
+static inline struct intel_context *
+intel_engine_get_hung_context(struct intel_engine_cs *engine)
+{
+	return engine->hung_ce;
+}
+
 #endif /* _INTEL_RINGBUFFER_H_ */
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 10300db1c9a6..ad3987289f09 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -1727,7 +1727,7 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 	drm_printf(m, "\tRequests:\n");
 
 	spin_lock_irqsave(&engine->sched_engine->lock, flags);
-	rq = intel_engine_find_active_request(engine);
+	rq = intel_engine_execlist_find_hung_request(engine);
 	if (rq) {
 		struct intel_timeline *tl = get_timeline(rq);
 
@@ -1838,10 +1838,17 @@ static bool match_ring(struct i915_request *rq)
 }
 
 struct i915_request *
-intel_engine_find_active_request(struct intel_engine_cs *engine)
+intel_engine_execlist_find_hung_request(struct intel_engine_cs *engine)
 {
 	struct i915_request *request, *active = NULL;
 
+	/*
+	 * This search does not work in GuC submission mode. However, the GuC
+	 * will report the hanging context directly to the driver itself. So
+	 * the driver should never get here when in GuC mode.
+	 */
+	GEM_BUG_ON(intel_uc_uses_guc_submission(&engine->gt->uc));
+
 	/*
 	 * We are called by the error capture, reset and to dump engine
 	 * state at random points in time. In particular, note that neither is
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index b84562b2708b..bba53e3b39b9 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -304,6 +304,8 @@ struct intel_engine_cs {
 	/* keep a request in reserve for a [pm] barrier under oom */
 	struct i915_request *request_pool;
 
+	struct intel_context *hung_ce;
+
 	struct llist_head barrier_tasks;
 
 	struct intel_context *kernel_context; /* pinned */
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 22f17a055b21..6b3b74e50b31 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -726,24 +726,6 @@ __unwind_incomplete_requests(struct intel_context *ce)
 	spin_unlock_irqrestore(&sched_engine->lock, flags);
 }
 
-static struct i915_request *context_find_active_request(struct intel_context *ce)
-{
-	struct i915_request *rq, *active = NULL;
-	unsigned long flags;
-
-	spin_lock_irqsave(&ce->guc_active.lock, flags);
-	list_for_each_entry_reverse(rq, &ce->guc_active.requests,
-				    sched.link) {
-		if (i915_request_completed(rq))
-			break;
-
-		active = rq;
-	}
-	spin_unlock_irqrestore(&ce->guc_active.lock, flags);
-
-	return active;
-}
-
 static void __guc_reset_context(struct intel_context *ce, bool stalled)
 {
 	struct i915_request *rq;
@@ -757,7 +739,7 @@ static void __guc_reset_context(struct intel_context *ce, bool stalled)
 	 */
 	clr_context_enabled(ce);
 
-	rq = context_find_active_request(ce);
+	rq = intel_context_find_active_request(ce);
 	if (!rq) {
 		head = ce->ring->tail;
 		stalled = false;
@@ -2192,6 +2174,20 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
 	return 0;
 }
 
+static void capture_error_state(struct intel_guc *guc,
+				struct intel_context *ce)
+{
+	struct intel_gt *gt = guc_to_gt(guc);
+	struct drm_i915_private *i915 = gt->i915;
+	struct intel_engine_cs *engine = __context_to_physical_engine(ce);
+	intel_wakeref_t wakeref;
+
+	intel_engine_set_hung_context(engine, ce);
+	with_intel_runtime_pm(&i915->runtime_pm, wakeref)
+		i915_capture_error_state(gt, engine->mask);
+	atomic_inc(&i915->gpu_error.reset_engine_count[engine->uabi_class]);
+}
+
 static void guc_context_replay(struct intel_context *ce)
 {
 	struct i915_sched_engine *sched_engine = ce->engine->sched_engine;
@@ -2204,6 +2200,7 @@ static void guc_handle_context_reset(struct intel_guc *guc,
 				     struct intel_context *ce)
 {
 	trace_intel_context_reset(ce);
+	capture_error_state(guc, ce);
 	guc_context_replay(ce);
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 3352f56bcf63..825bdfe44225 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1435,20 +1435,37 @@ capture_engine(struct intel_engine_cs *engine,
 {
 	struct intel_engine_capture_vma *capture = NULL;
 	struct intel_engine_coredump *ee;
-	struct i915_request *rq;
+	struct intel_context *ce;
+	struct i915_request *rq = NULL;
 	unsigned long flags;
 
 	ee = intel_engine_coredump_alloc(engine, GFP_KERNEL);
 	if (!ee)
 		return NULL;
 
-	spin_lock_irqsave(&engine->sched_engine->lock, flags);
-	rq = intel_engine_find_active_request(engine);
+	ce = intel_engine_get_hung_context(engine);
+	if (ce) {
+		intel_engine_clear_hung_context(engine);
+		rq = intel_context_find_active_request(ce);
+		if (!rq || !i915_request_started(rq))
+			goto no_request_capture;
+	} else {
+		/*
+		 * Getting here with GuC enabled means it is a forced error capture
+		 * with no actual hang. So, no need to attempt the execlist search.
+		 */
+		if (!intel_uc_uses_guc_submission(&engine->gt->uc)) {
+			spin_lock_irqsave(&engine->sched_engine->lock, flags);
+			rq = intel_engine_execlist_find_hung_request(engine);
+			spin_unlock_irqrestore(&engine->sched_engine->lock,
+					       flags);
+		}
+	}
 	if (rq)
 		capture = intel_engine_coredump_add_request(ee, rq,
 							    ATOMIC_MAYFAIL);
-	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
 	if (!capture) {
+no_request_capture:
 		kfree(ee);
 		return NULL;
 	}
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 75/97] drm/i915/guc: Fix for error capture after full GPU reset with GuC
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (73 preceding siblings ...)
  2021-05-06 19:14 ` [RFC PATCH 74/97] drm/i915/guc: Capture error state on context reset Matthew Brost
@ 2021-05-06 19:14 ` Matthew Brost
  2021-05-06 19:14 ` [RFC PATCH 76/97] drm/i915/guc: Hook GuC scheduling policies up Matthew Brost
                   ` (24 subsequent siblings)
  99 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

From: John Harrison <John.C.Harrison@Intel.com>

In the case of a full GPU reset (e.g. because GuC has died or because
GuC's hang detection has been disabled), the driver can't rely on GuC
reporting the guilty context. Instead, the driver needs to scan all
active contexts and find one that is currently executing, as per the
execlist mode behaviour. In GuC mode this scan has to work differently,
as active requests are tracked per context rather than on a per-engine
list.

Similarly, the request state dump in debugfs needs to be handled
differently when in GuC submission mode.

Also refactored some of the request scanning code to avoid duplicating
it across the multiple code paths that now need it.
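
As an aside, an illustrative usage of the new classification helper
introduced below (i915_test_request_state() and its enum); this is a
sketch only, with a hypothetical helper name, not code from the patch:

	/* Sketch: classify a request while scanning for the hung one. */
	static bool rq_is_hang_candidate(struct i915_request *rq)
	{
		switch (i915_test_request_state(rq)) {
		case I915_REQUEST_ACTIVE:
			return true;	/* executing on the HW right now */
		case I915_REQUEST_QUEUED:
		case I915_REQUEST_PENDING:
		case I915_REQUEST_COMPLETE:
		default:
			return false;	/* not on the HW, not the culprit */
		}
	}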

Signed-off-by: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_engine.h        |   3 +
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     | 139 ++++++++++++------
 .../gpu/drm/i915/gt/intel_engine_heartbeat.c  |   8 +
 drivers/gpu/drm/i915/gt/intel_reset.c         |   2 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   2 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  67 +++++++++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.h |   3 +
 drivers/gpu/drm/i915/i915_request.c           |  41 ++++++
 drivers/gpu/drm/i915/i915_request.h           |  11 ++
 9 files changed, 229 insertions(+), 47 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
index bb94963a9fa2..2e69be3bb1cf 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -237,6 +237,9 @@ __printf(3, 4)
 void intel_engine_dump(struct intel_engine_cs *engine,
 		       struct drm_printer *m,
 		       const char *header, ...);
+void intel_engine_dump_active_requests(struct list_head *requests,
+				       struct i915_request *hung_rq,
+				       struct drm_printer *m);
 
 ktime_t intel_engine_get_busy_time(struct intel_engine_cs *engine,
 				   ktime_t *now);
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index ad3987289f09..e34a61600c8c 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -1680,6 +1680,97 @@ static void print_properties(struct intel_engine_cs *engine,
 			   read_ul(&engine->defaults, p->offset));
 }
 
+static void engine_dump_request(struct i915_request *rq, struct drm_printer *m, const char *msg)
+{
+	struct intel_timeline *tl = get_timeline(rq);
+
+	i915_request_show(m, rq, msg, 0);
+
+	drm_printf(m, "\t\tring->start:  0x%08x\n",
+		   i915_ggtt_offset(rq->ring->vma));
+	drm_printf(m, "\t\tring->head:   0x%08x\n",
+		   rq->ring->head);
+	drm_printf(m, "\t\tring->tail:   0x%08x\n",
+		   rq->ring->tail);
+	drm_printf(m, "\t\tring->emit:   0x%08x\n",
+		   rq->ring->emit);
+	drm_printf(m, "\t\tring->space:  0x%08x\n",
+		   rq->ring->space);
+
+	if (tl) {
+		drm_printf(m, "\t\tring->hwsp:   0x%08x\n",
+			   tl->hwsp_offset);
+		intel_timeline_put(tl);
+	}
+
+	print_request_ring(m, rq);
+
+	if (rq->context->lrc_reg_state) {
+		drm_printf(m, "Logical Ring Context:\n");
+		hexdump(m, rq->context->lrc_reg_state, PAGE_SIZE);
+	}
+}
+
+void intel_engine_dump_active_requests(struct list_head *requests,
+				       struct i915_request *hung_rq,
+				       struct drm_printer *m)
+{
+	struct i915_request *rq;
+	const char *msg;
+	enum i915_request_state state;
+
+	list_for_each_entry(rq, requests, sched.link) {
+		if (rq == hung_rq)
+			continue;
+
+		state = i915_test_request_state(rq);
+		if (state < I915_REQUEST_QUEUED)
+			continue;
+
+		if (state == I915_REQUEST_ACTIVE)
+			msg = "\t\tactive on engine";
+		else
+			msg = "\t\tactive in queue";
+
+		engine_dump_request(rq, m, msg);
+	}
+}
+
+static void engine_dump_active_requests(struct intel_engine_cs *engine, struct drm_printer *m)
+{
+	struct i915_request *hung_rq = NULL;
+	struct intel_context *ce;
+	bool guc;
+
+	/*
+	 * No need for an engine->irq_seqno_barrier() before the seqno reads.
+	 * The GPU is still running so requests are still executing and any
+	 * hardware reads will be out of date by the time they are reported.
+	 * But the intention here is just to report an instantaneous snapshot
+	 * so that's fine.
+	 */
+	lockdep_assert_held(&engine->sched_engine->lock);
+
+	drm_printf(m, "\tRequests:\n");
+
+	guc = intel_uc_uses_guc_submission(&engine->gt->uc);
+	if (guc) {
+		ce = intel_engine_get_hung_context(engine);
+		if (ce)
+			hung_rq = intel_context_find_active_request(ce);
+	} else
+		hung_rq = intel_engine_execlist_find_hung_request(engine);
+
+	if (hung_rq)
+		engine_dump_request(hung_rq, m, "\t\thung");
+
+	if (guc)
+		intel_guc_dump_active_requests(engine, hung_rq, m);
+	else
+		intel_engine_dump_active_requests(&engine->sched_engine->requests,
+						  hung_rq, m);
+}
+
 void intel_engine_dump(struct intel_engine_cs *engine,
 		       struct drm_printer *m,
 		       const char *header, ...)
@@ -1724,39 +1815,9 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 		   i915_reset_count(error));
 	print_properties(engine, m);
 
-	drm_printf(m, "\tRequests:\n");
-
 	spin_lock_irqsave(&engine->sched_engine->lock, flags);
-	rq = intel_engine_execlist_find_hung_request(engine);
-	if (rq) {
-		struct intel_timeline *tl = get_timeline(rq);
-
-		i915_request_show(m, rq, "\t\tactive ", 0);
-
-		drm_printf(m, "\t\tring->start:  0x%08x\n",
-			   i915_ggtt_offset(rq->ring->vma));
-		drm_printf(m, "\t\tring->head:   0x%08x\n",
-			   rq->ring->head);
-		drm_printf(m, "\t\tring->tail:   0x%08x\n",
-			   rq->ring->tail);
-		drm_printf(m, "\t\tring->emit:   0x%08x\n",
-			   rq->ring->emit);
-		drm_printf(m, "\t\tring->space:  0x%08x\n",
-			   rq->ring->space);
-
-		if (tl) {
-			drm_printf(m, "\t\tring->hwsp:   0x%08x\n",
-				   tl->hwsp_offset);
-			intel_timeline_put(tl);
-		}
-
-		print_request_ring(m, rq);
+	engine_dump_active_requests(engine, m);
 
-		if (rq->context->lrc_reg_state) {
-			drm_printf(m, "Logical Ring Context:\n");
-			hexdump(m, rq->context->lrc_reg_state, PAGE_SIZE);
-		}
-	}
 	drm_printf(m, "\tOn hold?: %lu\n",
 		   list_count(&engine->sched_engine->hold));
 	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
@@ -1830,13 +1891,6 @@ intel_engine_create_virtual(struct intel_engine_cs **siblings,
 	return siblings[0]->cops->create_virtual(siblings, count);
 }
 
-static bool match_ring(struct i915_request *rq)
-{
-	u32 ring = ENGINE_READ(rq->engine, RING_START);
-
-	return ring == i915_ggtt_offset(rq->ring->vma);
-}
-
 struct i915_request *
 intel_engine_execlist_find_hung_request(struct intel_engine_cs *engine)
 {
@@ -1880,14 +1934,7 @@ intel_engine_execlist_find_hung_request(struct intel_engine_cs *engine)
 
 	list_for_each_entry(request, &engine->sched_engine->requests,
 			    sched.link) {
-		if (__i915_request_is_complete(request))
-			continue;
-
-		if (!__i915_request_has_started(request))
-			continue;
-
-		/* More than one preemptible request may match! */
-		if (!match_ring(request))
+		if (i915_test_request_state(request) != I915_REQUEST_ACTIVE)
 			continue;
 
 		active = request;
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
index a8495364d906..f0768824de6f 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
@@ -90,6 +90,14 @@ reset_engine(struct intel_engine_cs *engine, struct i915_request *rq)
 	if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
 		show_heartbeat(rq, engine);
 
+	if (intel_engine_uses_guc(engine))
+		/*
+		 * GuC itself is toast or GuC's hang detection
+		 * is disabled. Either way, need to find the
+		 * hang culprit manually.
+		 */
+		intel_guc_find_hung_context(engine);
+
 	intel_gt_handle_error(engine->gt, engine->mask,
 			      I915_ERROR_CAPTURE,
 			      "stopped heartbeat on %s",
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index ce3ef26ffe2d..c35c4b529ce5 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -156,7 +156,7 @@ void __i915_request_reset(struct i915_request *rq, bool guilty)
 	if (guilty) {
 		i915_request_set_error_once(rq, -EIO);
 		__i915_request_skip(rq);
-		if (mark_guilty(rq))
+		if (mark_guilty(rq) && !intel_engine_uses_guc(rq->engine))
 			skip_context(rq);
 	} else {
 		i915_request_set_error_once(rq, -EAGAIN);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 097687937cec..10b48b9f7603 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -269,6 +269,8 @@ int intel_guc_context_reset_process_msg(struct intel_guc *guc,
 int intel_guc_engine_failure_process_msg(struct intel_guc *guc,
 					 const u32 *msg, u32 len);
 
+void intel_guc_find_hung_context(struct intel_engine_cs *engine);
+
 void intel_guc_submission_reset_prepare(struct intel_guc *guc);
 void intel_guc_submission_reset(struct intel_guc *guc, bool stalled);
 void intel_guc_submission_reset_finish(struct intel_guc *guc);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 6b3b74e50b31..ad3d2326a81d 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -2267,6 +2267,73 @@ int intel_guc_engine_failure_process_msg(struct intel_guc *guc,
 	return 0;
 }
 
+void intel_guc_find_hung_context(struct intel_engine_cs *engine)
+{
+	struct intel_guc *guc = &engine->gt->uc.guc;
+	struct intel_context *ce;
+	struct i915_request *rq;
+	unsigned long index;
+
+	/* Reset called during driver load? GuC not yet initialised! */
+	if (unlikely(!guc_submission_initialized(guc)))
+		return;
+
+	xa_for_each(&guc->context_lookup, index, ce) {
+		if (!intel_context_is_pinned(ce))
+			continue;
+
+		if (intel_engine_is_virtual(ce->engine)) {
+			if (!(ce->engine->mask & engine->mask))
+				continue;
+		} else {
+			if (ce->engine != engine)
+				continue;
+		}
+
+		list_for_each_entry(rq, &ce->guc_active.requests, sched.link) {
+			if (i915_test_request_state(rq) != I915_REQUEST_ACTIVE)
+				continue;
+
+			intel_engine_set_hung_context(engine, ce);
+
+			/* Can only cope with one hang at a time... */
+			return;
+		}
+	}
+}
+
+void intel_guc_dump_active_requests(struct intel_engine_cs *engine,
+				    struct i915_request *hung_rq,
+				    struct drm_printer *m)
+{
+	struct intel_guc *guc = &engine->gt->uc.guc;
+	struct intel_context *ce;
+	unsigned long index;
+	unsigned long flags;
+
+	/* Reset called during driver load? GuC not yet initialised! */
+	if (unlikely(!guc_submission_initialized(guc)))
+		return;
+
+	xa_for_each(&guc->context_lookup, index, ce) {
+		if (!intel_context_is_pinned(ce))
+			continue;
+
+		if (intel_engine_is_virtual(ce->engine)) {
+			if (!(ce->engine->mask & engine->mask))
+				continue;
+		} else {
+			if (ce->engine != engine)
+				continue;
+		}
+
+		spin_lock_irqsave(&ce->guc_active.lock, flags);
+		intel_engine_dump_active_requests(&ce->guc_active.requests,
+						  hung_rq, m);
+		spin_unlock_irqrestore(&ce->guc_active.lock, flags);
+	}
+}
+
 void intel_guc_log_submission_info(struct intel_guc *guc,
 				   struct drm_printer *p)
 {
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
index b9b9f0f60f91..a2a3fad72be1 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
@@ -24,6 +24,9 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine);
 void intel_guc_log_submission_info(struct intel_guc *guc,
 				   struct drm_printer *p);
 void intel_guc_log_context_info(struct intel_guc *guc, struct drm_printer *p);
+void intel_guc_dump_active_requests(struct intel_engine_cs *engine,
+				    struct i915_request *hung_rq,
+				    struct drm_printer *m);
 
 bool intel_guc_virtual_engine_has_heartbeat(const struct intel_engine_cs *ve);
 
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 4855cf7ebe21..ef9eb91ec84c 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -2076,6 +2076,47 @@ void i915_request_show(struct drm_printer *m,
 		   name);
 }
 
+static bool engine_match_ring(struct intel_engine_cs *engine, struct i915_request *rq)
+{
+	u32 ring = ENGINE_READ(engine, RING_START);
+
+	return ring == i915_ggtt_offset(rq->ring->vma);
+}
+
+static bool match_ring(struct i915_request *rq)
+{
+	struct intel_engine_cs *engine;
+	bool found;
+	int i;
+
+	if (!intel_engine_is_virtual(rq->engine))
+		return engine_match_ring(rq->engine, rq);
+
+	found = false;
+	i = 0;
+	while ((engine = intel_engine_get_sibling(rq->engine, i++))) {
+		found = engine_match_ring(engine, rq);
+		if (found)
+			break;
+	}
+
+	return found;
+}
+
+enum i915_request_state i915_test_request_state(struct i915_request *rq)
+{
+	if (i915_request_completed(rq))
+		return I915_REQUEST_COMPLETE;
+
+	if (!i915_request_started(rq))
+		return I915_REQUEST_PENDING;
+
+	if (match_ring(rq))
+		return I915_REQUEST_ACTIVE;
+
+	return I915_REQUEST_QUEUED;
+}
+
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftests/mock_request.c"
 #include "selftests/i915_request.c"
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index bcc6340c505e..f98385f72782 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -651,4 +651,15 @@ i915_request_active_engine(struct i915_request *rq,
 
 void i915_request_notify_execute_cb_imm(struct i915_request *rq);
 
+enum i915_request_state
+{
+	I915_REQUEST_UNKNOWN = 0,
+	I915_REQUEST_COMPLETE,
+	I915_REQUEST_PENDING,
+	I915_REQUEST_QUEUED,
+	I915_REQUEST_ACTIVE,
+};
+
+enum i915_request_state i915_test_request_state(struct i915_request *rq);
+
 #endif /* I915_REQUEST_H */
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 76/97] drm/i915/guc: Hook GuC scheduling policies up
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (74 preceding siblings ...)
  2021-05-06 19:14 ` [RFC PATCH 75/97] drm/i915/guc: Fix for error capture after full GPU reset with GuC Matthew Brost
@ 2021-05-06 19:14 ` Matthew Brost
  2021-05-06 19:14 ` [RFC PATCH 77/97] drm/i915/guc: Connect reset modparam updates to GuC policy flags Matthew Brost
                   ` (23 subsequent siblings)
  99 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

From: John Harrison <John.C.Harrison@Intel.com>

Use the official driver default scheduling policies for configuring
the GuC scheduler rather than a bunch of hardcoded values.
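
For clarity, the two context policy fields are microsecond-valued (note
the *_US defaults they replace) while the engine properties are in
milliseconds; the lines below are quoted from the diff with a unit note
added for illustration:

	/* ms-based engine props -> us-based GuC policy fields;
	 * zero continues to mean "disabled" on both sides.
	 */
	desc->execution_quantum = engine->props.timeslice_duration_ms * 1000;
	desc->preemption_timeout = engine->props.preempt_timeout_ms * 1000;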

Signed-off-by: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Jose Souza <jose.souza@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |  1 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  2 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c    | 44 ++++++++++++++++++-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 11 +++--
 4 files changed, 53 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index bba53e3b39b9..16cc8453b01c 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -461,6 +461,7 @@ struct intel_engine_cs {
 #define I915_ENGINE_IS_VIRTUAL       BIT(5)
 #define I915_ENGINE_HAS_RELATIVE_MMIO BIT(6)
 #define I915_ENGINE_REQUIRES_CMD_PARSER BIT(7)
+#define I915_ENGINE_WANT_FORCED_PREEMPTION BIT(8)
 	unsigned int flags;
 
 	/*
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 10b48b9f7603..266358d04bfc 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -271,6 +271,8 @@ int intel_guc_engine_failure_process_msg(struct intel_guc *guc,
 
 void intel_guc_find_hung_context(struct intel_engine_cs *engine);
 
+int intel_guc_global_policies_update(struct intel_guc *guc);
+
 void intel_guc_submission_reset_prepare(struct intel_guc *guc);
 void intel_guc_submission_reset(struct intel_guc *guc, bool stalled);
 void intel_guc_submission_reset_finish(struct intel_guc *guc);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
index 179ab658d2b5..b37473bc8fff 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
@@ -80,14 +80,54 @@ static u32 guc_ads_blob_size(struct intel_guc *guc)
 	       guc_ads_private_data_size(guc);
 }
 
-static void guc_policies_init(struct guc_policies *policies)
+static void guc_policies_init(struct intel_guc *guc, struct guc_policies *policies)
 {
+	struct intel_gt *gt = guc_to_gt(guc);
+	struct drm_i915_private *i915 = gt->i915;
+
 	policies->dpc_promote_time = GLOBAL_POLICY_DEFAULT_DPC_PROMOTE_TIME_US;
 	policies->max_num_work_items = GLOBAL_POLICY_MAX_NUM_WI;
+
 	policies->global_flags = 0;
+	if (i915->params.reset < 2)
+		policies->global_flags |= GLOBAL_POLICY_DISABLE_ENGINE_RESET;
+
 	policies->is_valid = 1;
 }
 
+static int guc_action_policies_update(struct intel_guc *guc, u32 policy_offset)
+{
+	u32 action[] = {
+		INTEL_GUC_ACTION_GLOBAL_SCHED_POLICY_CHANGE,
+		policy_offset
+	};
+
+	return intel_guc_send(guc, action, ARRAY_SIZE(action));
+}
+
+int intel_guc_global_policies_update(struct intel_guc *guc)
+{
+	struct __guc_ads_blob *blob = guc->ads_blob;
+	struct intel_gt *gt = guc_to_gt(guc);
+	intel_wakeref_t wakeref;
+	int ret;
+
+	if (!blob)
+		return -ENOTSUPP;
+
+	GEM_BUG_ON(!blob->ads.scheduler_policies);
+
+	guc_policies_init(guc, &blob->policies);
+
+	if (!intel_guc_is_ready(guc))
+		return 0;
+
+	with_intel_runtime_pm(&gt->i915->runtime_pm, wakeref)
+		ret = guc_action_policies_update(guc, blob->ads.scheduler_policies);
+
+	return ret;
+}
+
 static void guc_mapping_table_init(struct intel_gt *gt,
 				   struct guc_gt_system_info *system_info)
 {
@@ -284,7 +324,7 @@ static void __guc_ads_init(struct intel_guc *guc)
 	u8 engine_class, guc_class;
 
 	/* GuC scheduling policies */
-	guc_policies_init(&blob->policies);
+	guc_policies_init(guc, &blob->policies);
 
 	/*
 	 * GuC expects a per-engine-class context image and size
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index ad3d2326a81d..a9fb31370c61 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -872,6 +872,7 @@ void intel_guc_submission_reset_finish(struct intel_guc *guc)
 	GEM_WARN_ON(atomic_read(&guc->outstanding_submission_g2h));
 	atomic_set(&guc->outstanding_submission_g2h, 0);
 
+	intel_guc_global_policies_update(guc);
 	enable_submission(guc);
 	intel_gt_unpark_heartbeats(guc_to_gt(guc));
 }
@@ -1160,8 +1161,12 @@ static void guc_context_policy_init(struct intel_engine_cs *engine,
 {
 	desc->policy_flags = 0;
 
-	desc->execution_quantum = CONTEXT_POLICY_DEFAULT_EXECUTION_QUANTUM_US;
-	desc->preemption_timeout = CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US;
+	if (engine->flags & I915_ENGINE_WANT_FORCED_PREEMPTION)
+		desc->policy_flags |= CONTEXT_POLICY_FLAG_PREEMPT_TO_IDLE;
+
+	/* NB: For both of these, zero means disabled. */
+	desc->execution_quantum = engine->props.timeslice_duration_ms * 1000;
+	desc->preemption_timeout = engine->props.preempt_timeout_ms * 1000;
 }
 
 static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
@@ -1942,13 +1947,13 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
 	engine->set_default_submission = guc_set_default_submission;
 
 	engine->flags |= I915_ENGINE_HAS_PREEMPTION;
+	engine->flags |= I915_ENGINE_HAS_TIMESLICES;
 
 	/*
 	 * TODO: GuC supports timeslicing and semaphores as well, but they're
 	 * handled by the firmware so some minor tweaks are required before
 	 * enabling.
 	 *
-	 * engine->flags |= I915_ENGINE_HAS_TIMESLICES;
 	 * engine->flags |= I915_ENGINE_HAS_SEMAPHORES;
 	 */
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 77/97] drm/i915/guc: Connect reset modparam updates to GuC policy flags
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (75 preceding siblings ...)
  2021-05-06 19:14 ` [RFC PATCH 76/97] drm/i915/guc: Hook GuC scheduling policies up Matthew Brost
@ 2021-05-06 19:14 ` Matthew Brost
  2021-05-06 19:14 ` [RFC PATCH 78/97] drm/i915/guc: Include scheduling policies in the debugfs state dump Matthew Brost
                   ` (22 subsequent siblings)
  99 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

From: John Harrison <John.C.Harrison@Intel.com>

Changing the reset module parameter has no effect on a running GuC.
The corresponding entry in the ADS must be updated and then the GuC
informed via a Host2GuC message.

The new debugfs interface to module parameters allows this to happen.
However, connecting the parameter data address back to anything useful
is messy. One option would be to pass a new private data structure
address through instead of just the parameter pointer. But that would
mean a new (and different) data structure and a new (and different)
write function for every parameter. This
method keeps everything generic by instead using a string lookup on
the directory entry name.
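
For comparison, an illustrative sketch (not part of this patch) of the
rejected per-parameter alternative described above: every parameter
would need its own wrapper structure and write handler just to recover
the i915 pointer. The names below are hypothetical:

	struct i915_param_private {
		struct drm_i915_private *i915;
		unsigned int *value;
	};

	static ssize_t i915_param_reset_write(struct file *file,
					      const char __user *ubuf,
					      size_t len, loff_t *offp)
	{
		struct seq_file *m = file->private_data;
		struct i915_param_private *priv = m->private;

		/* parse ubuf into *priv->value, then notify priv->i915's GuC */
		return len;
	}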

Signed-off-by: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c |  2 +-
 drivers/gpu/drm/i915/i915_debugfs_params.c | 31 ++++++++++++++++++++++
 2 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
index b37473bc8fff..bb20513f40f6 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
@@ -102,7 +102,7 @@ static int guc_action_policies_update(struct intel_guc *guc, u32 policy_offset)
 		policy_offset
 	};
 
-	return intel_guc_send(guc, action, ARRAY_SIZE(action));
+	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
 }
 
 int intel_guc_global_policies_update(struct intel_guc *guc)
diff --git a/drivers/gpu/drm/i915/i915_debugfs_params.c b/drivers/gpu/drm/i915/i915_debugfs_params.c
index 4e2b077692cb..8ecd8b42f048 100644
--- a/drivers/gpu/drm/i915/i915_debugfs_params.c
+++ b/drivers/gpu/drm/i915/i915_debugfs_params.c
@@ -6,9 +6,20 @@
 #include <linux/kernel.h>
 
 #include "i915_debugfs_params.h"
+#include "gt/intel_gt.h"
+#include "gt/uc/intel_guc.h"
 #include "i915_drv.h"
 #include "i915_params.h"
 
+#define MATCH_DEBUGFS_NODE_NAME(_file, _name)	(strcmp((_file)->f_path.dentry->d_name.name, (_name)) == 0)
+
+#define GET_I915(i915, name, ptr)	\
+	do {	\
+		struct i915_params *params;	\
+		params = container_of(((void *) (ptr)), typeof(*params), name);	\
+		(i915) = container_of(params, typeof(*(i915)), params);	\
+	} while(0)
+
 /* int param */
 static int i915_param_int_show(struct seq_file *m, void *data)
 {
@@ -24,6 +35,16 @@ static int i915_param_int_open(struct inode *inode, struct file *file)
 	return single_open(file, i915_param_int_show, inode->i_private);
 }
 
+static int notify_guc(struct drm_i915_private *i915)
+{
+	int ret = 0;
+
+	if (intel_uc_uses_guc_submission(&i915->gt.uc))
+		ret = intel_guc_global_policies_update(&i915->gt.uc.guc);
+
+	return ret;
+}
+
 static ssize_t i915_param_int_write(struct file *file,
 				    const char __user *ubuf, size_t len,
 				    loff_t *offp)
@@ -81,8 +102,10 @@ static ssize_t i915_param_uint_write(struct file *file,
 				     const char __user *ubuf, size_t len,
 				     loff_t *offp)
 {
+	struct drm_i915_private *i915;
 	struct seq_file *m = file->private_data;
 	unsigned int *value = m->private;
+	unsigned int old = *value;
 	int ret;
 
 	ret = kstrtouint_from_user(ubuf, len, 0, value);
@@ -95,6 +118,14 @@ static ssize_t i915_param_uint_write(struct file *file,
 			*value = b;
 	}
 
+	if (!ret && MATCH_DEBUGFS_NODE_NAME(file, "reset")) {
+		GET_I915(i915, reset, value);
+
+		ret = notify_guc(i915);
+		if (ret)
+			*value = old;
+	}
+
 	return ret ?: len;
 }
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 78/97] drm/i915/guc: Include scheduling policies in the debugfs state dump
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (76 preceding siblings ...)
  2021-05-06 19:14 ` [RFC PATCH 77/97] drm/i915/guc: Connect reset modparam updates to GuC policy flags Matthew Brost
@ 2021-05-06 19:14 ` Matthew Brost
  2021-05-06 19:14 ` [RFC PATCH 79/97] drm/i915/guc: Don't call ring_is_idle in GuC submission Matthew Brost
                   ` (21 subsequent siblings)
  99 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

From: John Harrison <John.C.Harrison@Intel.com>

Added the scheduling policy parameters to the 'guc_info' debugfs state
dump.

Signed-off-by: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c     | 13 +++++++++++++
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h     |  2 ++
 drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c |  2 ++
 3 files changed, 17 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
index bb20513f40f6..bc2745f73a06 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
@@ -95,6 +95,19 @@ static void guc_policies_init(struct intel_guc *guc, struct guc_policies *polici
 	policies->is_valid = 1;
 }
 
+void intel_guc_log_policy_info(struct intel_guc *guc, struct drm_printer *dp)
+{
+	struct __guc_ads_blob *blob = guc->ads_blob;
+
+	if (unlikely(!blob))
+		return;
+
+	drm_printf(dp, "Global scheduling policies:\n");
+	drm_printf(dp, "  DPC promote time   = %u\n", blob->policies.dpc_promote_time);
+	drm_printf(dp, "  Max num work items = %u\n", blob->policies.max_num_work_items);
+	drm_printf(dp, "  Flags              = %u\n", blob->policies.global_flags);
+}
+
 static int guc_action_policies_update(struct intel_guc *guc, u32 policy_offset)
 {
 	u32 action[] = {
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h
index b00d3ae1113a..0fdcb3583601 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h
@@ -7,9 +7,11 @@
 #define _INTEL_GUC_ADS_H_
 
 struct intel_guc;
+struct drm_printer;
 
 int intel_guc_ads_create(struct intel_guc *guc);
 void intel_guc_ads_destroy(struct intel_guc *guc);
 void intel_guc_ads_reset(struct intel_guc *guc);
+void intel_guc_log_policy_info(struct intel_guc *guc, struct drm_printer *p);
 
 #endif
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
index 62b9ce0fafaa..9a03ff56e654 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
@@ -10,6 +10,7 @@
 #include "intel_guc_debugfs.h"
 #include "intel_guc_log_debugfs.h"
 #include "gt/uc/intel_guc_ct.h"
+#include "gt/uc/intel_guc_ads.h"
 #include "gt/uc/intel_guc_submission.h"
 
 static int guc_info_show(struct seq_file *m, void *data)
@@ -29,6 +30,7 @@ static int guc_info_show(struct seq_file *m, void *data)
 
 	intel_guc_log_ct_info(&guc->ct, &p);
 	intel_guc_log_submission_info(guc, &p);
+	intel_guc_log_policy_info(guc, &p);
 
 	return 0;
 }
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 79/97] drm/i915/guc: Don't call ring_is_idle in GuC submission
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (77 preceding siblings ...)
  2021-05-06 19:14 ` [RFC PATCH 78/97] drm/i915/guc: Include scheduling policies in the debugfs state dump Matthew Brost
@ 2021-05-06 19:14 ` Matthew Brost
  2021-05-06 19:14 ` [RFC PATCH 80/97] drm/i915/guc: Implement banned contexts for " Matthew Brost
                   ` (20 subsequent siblings)
  99 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

The engine registers really shouldn't be touched during GuC submission
as the GuC owns the registers. Don't call ring_is_idle and instead tie
intel_engine_is_idle strictly to the engine pm.

Because intel_engine_is_idle is now tied to the engine pm, retire
requests before checking intel_engines_are_idle in gt_drop_caches, and
lastly increase the timeout in gt_drop_caches for the
intel_engines_are_idle check.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_engine_cs.c | 13 +++++++++++++
 drivers/gpu/drm/i915/i915_debugfs.c       |  6 +++---
 drivers/gpu/drm/i915/i915_drv.h           |  2 +-
 3 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index e34a61600c8c..591226b96201 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -1226,6 +1226,9 @@ static bool ring_is_idle(struct intel_engine_cs *engine)
 {
 	bool idle = true;
 
+	/* GuC submission shouldn't access HEAD & TAIL via MMIO */
+	GEM_BUG_ON(intel_engine_uses_guc(engine));
+
 	if (I915_SELFTEST_ONLY(!engine->mmio_base))
 		return true;
 
@@ -1292,6 +1295,16 @@ bool intel_engine_is_idle(struct intel_engine_cs *engine)
 	if (!i915_sched_engine_is_empty(engine->sched_engine))
 		return false;
 
+	/*
+	 * We shouldn't touch engine registers with GuC submission as the GuC
+	 * owns the registers. Let's tie the idle to engine pm, at worst this
+	 * function sometimes will falsely report non-idle when idle during the
+	 * delay to retire requests or with virtual engines and a request
+	 * running on another instance within the same class / submit mask.
+	 */
+	if (intel_engine_uses_guc(engine))
+		return false;
+
 	/* Ring stopped? */
 	return ring_is_idle(engine);
 }
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index d540dd8029d0..2639961504b5 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -867,13 +867,13 @@ gt_drop_caches(struct intel_gt *gt, u64 val)
 {
 	int ret;
 
+	if (val & DROP_RETIRE || val & DROP_RESET_ACTIVE)
+		intel_gt_retire_requests(gt);
+
 	if (val & DROP_RESET_ACTIVE &&
 	    wait_for(intel_engines_are_idle(gt), I915_IDLE_ENGINES_TIMEOUT))
 		intel_gt_set_wedged(gt);
 
-	if (val & DROP_RETIRE)
-		intel_gt_retire_requests(gt);
-
 	if (val & (DROP_IDLE | DROP_ACTIVE)) {
 		ret = intel_gt_wait_for_idle(gt, MAX_SCHEDULE_TIMEOUT);
 		if (ret)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 3cfa6effbb5f..aa359b8480cd 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -576,7 +576,7 @@ struct i915_gem_mm {
 	u32 shrink_count;
 };
 
-#define I915_IDLE_ENGINES_TIMEOUT (200) /* in ms */
+#define I915_IDLE_ENGINES_TIMEOUT (500) /* in ms */
 
 unsigned long i915_fence_context_timeout(const struct drm_i915_private *i915,
 					 u64 context);
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 80/97] drm/i915/guc: Implement banned contexts for GuC submission
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (78 preceding siblings ...)
  2021-05-06 19:14 ` [RFC PATCH 79/97] drm/i915/guc: Don't call ring_is_idle in GuC submission Matthew Brost
@ 2021-05-06 19:14 ` Matthew Brost
  2021-05-06 19:14 ` [RFC PATCH 81/97] drm/i915/guc: Allow flexible number of context ids Matthew Brost
                   ` (19 subsequent siblings)
  99 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

When using GuC submission, if a context gets banned, disable its
scheduling and mark all of its inflight requests as complete.

Cc: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c   |   2 +-
 drivers/gpu/drm/i915/gt/intel_context.h       |  13 ++
 drivers/gpu/drm/i915/gt/intel_context_types.h |   2 +
 drivers/gpu/drm/i915/gt/intel_reset.c         |  32 ++---
 .../gpu/drm/i915/gt/intel_ring_submission.c   |  20 +++
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   2 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 129 ++++++++++++++++--
 drivers/gpu/drm/i915/i915_trace.h             |  10 ++
 8 files changed, 172 insertions(+), 38 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index bb827bb99250..5dcab5536433 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -441,7 +441,7 @@ static void kill_engines(struct i915_gem_engines *engines, bool ban)
 	for_each_gem_engine(ce, engines, it) {
 		struct intel_engine_cs *engine;
 
-		if (ban && intel_context_set_banned(ce))
+		if (ban && intel_context_ban(ce, NULL))
 			continue;
 
 		/*
diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
index d2b499ed8a05..11fa7700dc9e 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -17,6 +17,7 @@
 #include "intel_ring_types.h"
 #include "intel_timeline_types.h"
 #include "uc/intel_guc_submission.h"
+#include "i915_trace.h"
 
 #define CE_TRACE(ce, fmt, ...) do {					\
 	const struct intel_context *ce__ = (ce);			\
@@ -243,6 +244,18 @@ static inline bool intel_context_set_banned(struct intel_context *ce)
 	return test_and_set_bit(CONTEXT_BANNED, &ce->flags);
 }
 
+static inline bool intel_context_ban(struct intel_context *ce,
+				     struct i915_request *rq)
+{
+	bool ret = intel_context_set_banned(ce);
+
+	trace_intel_context_ban(ce);
+	if (ce->ops->ban)
+		ce->ops->ban(ce, rq);
+
+	return ret;
+}
+
 static inline bool
 intel_context_force_single_submission(const struct intel_context *ce)
 {
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index b63c8cf7823b..591dcba7bfde 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -35,6 +35,8 @@ struct intel_context_ops {
 
 	int (*alloc)(struct intel_context *ce);
 
+	void (*ban)(struct intel_context *ce, struct i915_request *rq);
+
 	int (*pre_pin)(struct intel_context *ce, struct i915_gem_ww_ctx *ww, void **vaddr);
 	int (*pin)(struct intel_context *ce, void *vaddr);
 	void (*unpin)(struct intel_context *ce);
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index c35c4b529ce5..4347cc2dcea0 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -22,7 +22,6 @@
 #include "intel_reset.h"
 
 #include "uc/intel_guc.h"
-#include "uc/intel_guc_submission.h"
 
 #define RESET_MAX_RETRIES 3
 
@@ -39,21 +38,6 @@ static void rmw_clear_fw(struct intel_uncore *uncore, i915_reg_t reg, u32 clr)
 	intel_uncore_rmw_fw(uncore, reg, clr, 0);
 }
 
-static void skip_context(struct i915_request *rq)
-{
-	struct intel_context *hung_ctx = rq->context;
-
-	list_for_each_entry_from_rcu(rq, &hung_ctx->timeline->requests, link) {
-		if (!i915_request_is_active(rq))
-			return;
-
-		if (rq->context == hung_ctx) {
-			i915_request_set_error_once(rq, -EIO);
-			__i915_request_skip(rq);
-		}
-	}
-}
-
 static void client_mark_guilty(struct i915_gem_context *ctx, bool banned)
 {
 	struct drm_i915_file_private *file_priv = ctx->file_priv;
@@ -88,10 +72,8 @@ static bool mark_guilty(struct i915_request *rq)
 	bool banned;
 	int i;
 
-	if (intel_context_is_closed(rq->context)) {
-		intel_context_set_banned(rq->context);
+	if (intel_context_is_closed(rq->context))
 		return true;
-	}
 
 	rcu_read_lock();
 	ctx = rcu_dereference(rq->context->gem_context);
@@ -123,11 +105,9 @@ static bool mark_guilty(struct i915_request *rq)
 	banned = !i915_gem_context_is_recoverable(ctx);
 	if (time_before(jiffies, prev_hang + CONTEXT_FAST_HANG_JIFFIES))
 		banned = true;
-	if (banned) {
+	if (banned)
 		drm_dbg(&ctx->i915->drm, "context %s: guilty %d, banned\n",
 			ctx->name, atomic_read(&ctx->guilty_count));
-		intel_context_set_banned(rq->context);
-	}
 
 	client_mark_guilty(ctx, banned);
 
@@ -149,6 +129,8 @@ static void mark_innocent(struct i915_request *rq)
 
 void __i915_request_reset(struct i915_request *rq, bool guilty)
 {
+	bool banned = false;
+
 	RQ_TRACE(rq, "guilty? %s\n", yesno(guilty));
 	GEM_BUG_ON(__i915_request_is_complete(rq));
 
@@ -156,13 +138,15 @@ void __i915_request_reset(struct i915_request *rq, bool guilty)
 	if (guilty) {
 		i915_request_set_error_once(rq, -EIO);
 		__i915_request_skip(rq);
-		if (mark_guilty(rq) && !intel_engine_uses_guc(rq->engine))
-			skip_context(rq);
+		banned = mark_guilty(rq);
 	} else {
 		i915_request_set_error_once(rq, -EAGAIN);
 		mark_innocent(rq);
 	}
 	rcu_read_unlock();
+
+	if (banned)
+		intel_context_ban(rq->context, rq);
 }
 
 static bool i915_in_reset(struct pci_dev *pdev)
diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
index 7d05bf16094c..10715ccd5052 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
@@ -584,9 +584,29 @@ static void ring_context_reset(struct intel_context *ce)
 	clear_bit(CONTEXT_VALID_BIT, &ce->flags);
 }
 
+static void ring_context_ban(struct intel_context *ce,
+			     struct i915_request *rq)
+{
+	struct intel_engine_cs *engine;
+
+	if (!rq || !i915_request_is_active(rq))
+		return;
+
+	engine = rq->engine;
+	lockdep_assert_held(&engine->sched_engine->lock);
+	list_for_each_entry_continue(rq, &engine->sched_engine->requests,
+				     sched.link)
+		if (rq->context == ce) {
+			i915_request_set_error_once(rq, -EIO);
+			__i915_request_skip(rq);
+		}
+}
+
 static const struct intel_context_ops ring_context_ops = {
 	.alloc = ring_context_alloc,
 
+	.ban = ring_context_ban,
+
 	.pre_pin = ring_context_pre_pin,
 	.pin = ring_context_pin,
 	.unpin = ring_context_unpin,
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 266358d04bfc..306d6857d683 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -273,6 +273,8 @@ void intel_guc_find_hung_context(struct intel_engine_cs *engine);
 
 int intel_guc_global_policies_update(struct intel_guc *guc);
 
+void intel_guc_context_ban(struct intel_context *ce, struct i915_request *rq);
+
 void intel_guc_submission_reset_prepare(struct intel_guc *guc);
 void intel_guc_submission_reset(struct intel_guc *guc, bool stalled);
 void intel_guc_submission_reset_finish(struct intel_guc *guc);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index a9fb31370c61..a20d7205895a 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -124,6 +124,7 @@ static inline void clr_context_pending_enable(struct intel_context *ce)
 #define SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER	BIT(0)
 #define SCHED_STATE_DESTROYED				BIT(1)
 #define SCHED_STATE_PENDING_DISABLE			BIT(2)
+#define SCHED_STATE_BANNED				BIT(3)
 static inline void init_sched_state(struct intel_context *ce)
 {
 	/* Only should be called from guc_lrc_desc_pin() */
@@ -186,6 +187,23 @@ static inline void clr_context_pending_disable(struct intel_context *ce)
 		(ce->guc_state.sched_state & ~SCHED_STATE_PENDING_DISABLE);
 }
 
+static inline bool context_banned(struct intel_context *ce)
+{
+	return (ce->guc_state.sched_state & SCHED_STATE_BANNED);
+}
+
+static inline void set_context_banned(struct intel_context *ce)
+{
+	lockdep_assert_held(&ce->guc_state.lock);
+	ce->guc_state.sched_state |= SCHED_STATE_BANNED;
+}
+
+static inline void clr_context_banned(struct intel_context *ce)
+{
+	lockdep_assert_held(&ce->guc_state.lock);
+	ce->guc_state.sched_state &= ~SCHED_STATE_BANNED;
+}
+
 static inline bool context_guc_id_invalid(struct intel_context *ce)
 {
 	return (ce->guc_id == GUC_INVALID_LRC_ID);
@@ -359,7 +377,7 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop);
 
 static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 {
-	int err;
+	int err = 0;
 	struct intel_context *ce = rq->context;
 	u32 action[3];
 	int len = 0;
@@ -369,6 +387,16 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 	GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
 	GEM_BUG_ON(context_guc_id_invalid(ce));
 
+	/*
+	 * Corner case where requests were sitting in the priority list or a
+	 * request resubmitted after the context was banned.
+	 */
+	if (unlikely(intel_context_is_banned(ce))) {
+		i915_request_put(i915_request_mark_eio(rq));
+		intel_engine_signal_breadcrumbs(ce->engine);
+		goto out;
+	}
+
 	/*
 	 * Corner case where the GuC firmware was blown away and reloaded while
 	 * this context was pinned.
@@ -401,6 +429,8 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 		clr_context_pending_enable(ce);
 		intel_context_put(ce);
 	}
+	if (likely(!err))
+		trace_i915_request_guc_submit(rq);
 
 out:
 	return err;
@@ -465,7 +495,6 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
 			guc->stalled_request = last;
 			return false;
 		}
-		trace_i915_request_guc_submit(last);
 	}
 
 	guc->stalled_request = NULL;
@@ -504,12 +533,13 @@ static void cs_irq_handler(struct intel_engine_cs *engine, u16 iir)
 static void __guc_context_destroy(struct intel_context *ce);
 static void release_guc_id(struct intel_guc *guc, struct intel_context *ce);
 static void guc_signal_context_fence(struct intel_context *ce);
+static void guc_cancel_context_requests(struct intel_context *ce);
 
 static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
 {
 	struct intel_context *ce;
 	unsigned long index, flags;
-	bool pending_disable, pending_enable, deregister, destroyed;
+	bool pending_disable, pending_enable, deregister, destroyed, banned;
 
 	xa_for_each(&guc->context_lookup, index, ce) {
 		/* Flush context */
@@ -527,6 +557,7 @@ static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
 		pending_enable = context_pending_enable(ce);
 		pending_disable = context_pending_disable(ce);
 		deregister = context_wait_for_deregister_to_register(ce);
+		banned = context_banned(ce);
 		init_sched_state(ce);
 
 		if (pending_enable || destroyed || deregister) {
@@ -544,6 +575,10 @@ static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
 		/* Not mutually exclusive with above if statement. */
 		if (pending_disable) {
 			guc_signal_context_fence(ce);
+			if (banned) {
+				guc_cancel_context_requests(ce);
+				intel_engine_signal_breadcrumbs(ce->engine);
+			}
 			intel_context_sched_disable_unpin(ce);
 			atomic_dec(&guc->outstanding_submission_g2h);
 			intel_context_put(ce);
@@ -661,6 +696,9 @@ static void guc_reset_state(struct intel_context *ce, u32 head, bool scrub)
 {
 	struct intel_engine_cs *engine = __context_to_physical_engine(ce);
 
+	if (intel_context_is_banned(ce))
+		return;
+
 	GEM_BUG_ON(!intel_context_is_pinned(ce));
 
 	/*
@@ -731,6 +769,8 @@ static void __guc_reset_context(struct intel_context *ce, bool stalled)
 	struct i915_request *rq;
 	u32 head;
 
+	intel_context_get(ce);
+
 	/*
 	 * GuC will implicitly mark the context as non-schedulable
 	 * when it sends the reset notification. Make sure our state
@@ -756,6 +796,7 @@ static void __guc_reset_context(struct intel_context *ce, bool stalled)
 out_replay:
 	guc_reset_state(ce, head, stalled);
 	__unwind_incomplete_requests(ce);
+	intel_context_put(ce);
 }
 
 void intel_guc_submission_reset(struct intel_guc *guc, bool stalled)
@@ -938,8 +979,6 @@ static int guc_bypass_tasklet_submit(struct intel_guc *guc,
 	ret = guc_add_request(guc, rq);
 	if (ret == -EBUSY)
 		guc->stalled_request = rq;
-	else
-		trace_i915_request_guc_submit(rq);
 
 	if (unlikely(ret == -EDEADLK))
 		disable_submission(guc);
@@ -1329,13 +1368,52 @@ static u16 prep_context_pending_disable(struct intel_context *ce)
 	return ce->guc_id;
 }
 
+static void guc_context_ban(struct intel_context *ce, struct i915_request *rq)
+{
+	struct intel_guc *guc = ce_to_guc(ce);
+	unsigned long flags;
+
+	guc_flush_submissions(guc);
+
+	spin_lock_irqsave(&ce->guc_state.lock, flags);
+	set_context_banned(ce);
+
+	if (submission_disabled(guc) || (!context_enabled(ce) &&
+	    !context_pending_disable(ce))) {
+		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+
+		guc_cancel_context_requests(ce);
+		intel_engine_signal_breadcrumbs(ce->engine);
+	} else if (!context_pending_disable(ce)) {
+		struct intel_runtime_pm *runtime_pm =
+			&ce->engine->gt->i915->runtime_pm;
+		intel_wakeref_t wakeref;
+		u16 guc_id;
+
+		/*
+		 * We add +2 here as the schedule disable complete CTB handler
+		 * calls intel_context_sched_disable_unpin (-2 to pin_count).
+		 */
+		atomic_add(2, &ce->pin_count);
+
+		guc_id = prep_context_pending_disable(ce);
+		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+
+		with_intel_runtime_pm(runtime_pm, wakeref)
+			__guc_context_sched_disable(guc, ce, guc_id);
+	} else {
+		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+	}
+}
+
 static void guc_context_sched_disable(struct intel_context *ce)
 {
 	struct intel_guc *guc = ce_to_guc(ce);
-	struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm;
 	unsigned long flags;
-	u16 guc_id;
+	struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm;
 	intel_wakeref_t wakeref;
+	u16 guc_id;
+	bool enabled;
 
 	if (submission_disabled(guc) || context_guc_id_invalid(ce) ||
 	    !lrc_desc_registered(guc, ce->guc_id)) {
@@ -1349,12 +1427,21 @@ static void guc_context_sched_disable(struct intel_context *ce)
 	spin_lock_irqsave(&ce->guc_state.lock, flags);
 
 	/*
-	 * We have to check if the context has been pinned again as another pin
-	 * operation is allowed to pass this function. Checking the pin count
-	 * here synchronizes this function with guc_request_alloc ensuring a
-	 * request doesn't slip through the 'context_pending_disable' fence.
+	 * We have to check if the context is disabled by another thread. We
+	 * also have to check if the context has been pinned again as another
+	 * pin operation is allowed to pass this function. Checking the pin
+	 * count here synchronizes this function with guc_request_alloc ensuring
+	 * a request doesn't slip through the 'context_pending_disable' fence.
 	 */
+	enabled = context_enabled(ce);
+	if (unlikely(!enabled || submission_disabled(guc))) {
+		if (!enabled)
+			clr_context_enabled(ce);
+		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+		goto unpin;
+	}
 	if (unlikely(atomic_add_unless(&ce->pin_count, -2, 2))) {
+		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 		return;
 	}
 	guc_id = prep_context_pending_disable(ce);
@@ -1509,6 +1596,8 @@ static const struct intel_context_ops guc_context_ops = {
 	.unpin = guc_context_unpin,
 	.post_unpin = guc_context_post_unpin,
 
+	.ban = guc_context_ban,
+
 	.enter = intel_context_enter_engine,
 	.exit = intel_context_exit_engine,
 
@@ -1713,6 +1802,8 @@ static const struct intel_context_ops virtual_guc_context_ops = {
 	.unpin = guc_context_unpin,
 	.post_unpin = guc_context_post_unpin,
 
+	.ban = guc_context_ban,
+
 	.enter = guc_virtual_context_enter,
 	.exit = guc_virtual_context_exit,
 
@@ -2158,6 +2249,8 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
 	if (context_pending_enable(ce)) {
 		clr_context_pending_enable(ce);
 	} else if (context_pending_disable(ce)) {
+		bool banned;
+
 		/*
 		 * Unpin must be done before __guc_signal_context_fence,
 		 * otherwise a race exists between the requests getting
@@ -2168,9 +2261,16 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
 		intel_context_sched_disable_unpin(ce);
 
 		spin_lock_irqsave(&ce->guc_state.lock, flags);
+		banned = context_banned(ce);
+		clr_context_banned(ce);
 		clr_context_pending_disable(ce);
 		__guc_signal_context_fence(ce);
 		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+
+		if (banned) {
+			guc_cancel_context_requests(ce);
+			intel_engine_signal_breadcrumbs(ce->engine);
+		}
 	}
 
 	decr_outstanding_submission_g2h(guc);
@@ -2205,8 +2305,11 @@ static void guc_handle_context_reset(struct intel_guc *guc,
 				     struct intel_context *ce)
 {
 	trace_intel_context_reset(ce);
-	capture_error_state(guc, ce);
-	guc_context_replay(ce);
+
+	if (likely(!intel_context_is_banned(ce))) {
+		capture_error_state(guc, ce);
+		guc_context_replay(ce);
+	}
 }
 
 int intel_guc_context_reset_process_msg(struct intel_guc *guc,
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index c095c4d39456..937d3706af9b 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -934,6 +934,11 @@ DEFINE_EVENT(intel_context, intel_context_reset,
 	     TP_ARGS(ce)
 );
 
+DEFINE_EVENT(intel_context, intel_context_ban,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
 DEFINE_EVENT(intel_context, intel_context_register,
 	     TP_PROTO(struct intel_context *ce),
 	     TP_ARGS(ce)
@@ -1036,6 +1041,11 @@ trace_intel_context_reset(struct intel_context *ce)
 {
 }
 
+static inline void
+trace_intel_context_ban(struct intel_context *ce)
+{
+}
+
 static inline void
 trace_intel_context_register(struct intel_context *ce)
 {
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 81/97] drm/i915/guc: Allow flexible number of context ids
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (79 preceding siblings ...)
  2021-05-06 19:14 ` [RFC PATCH 80/97] drm/i915/guc: Implement banned contexts for " Matthew Brost
@ 2021-05-06 19:14 ` Matthew Brost
  2021-05-06 19:14 ` [RFC PATCH 82/97] drm/i915/guc: Connect the number of guc_ids to debugfs Matthew Brost
                   ` (18 subsequent siblings)
  99 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

The number of available GuC context ids might be limited.
Stop referring to the macro in code and use a variable instead.
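
Both new fields are initialised to GUC_MAX_LRC_DESCRIPTORS in
intel_guc_submission_init_early(), so behaviour is unchanged until
something (e.g. a later debugfs patch) lowers num_guc_ids; max_guc_ids
remains the hard upper bound used for the lrc descriptor pool size and
the sanity checks.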

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc.h           |  2 ++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c    | 16 +++++++++-------
 2 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 306d6857d683..9b1a89530844 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -53,6 +53,8 @@ struct intel_guc {
 	 */
 	spinlock_t contexts_lock;
 	struct ida guc_ids;
+	u32 num_guc_ids;
+	u32 max_guc_ids;
 	struct list_head guc_id_list;
 
 	bool submission_selected;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index a20d7205895a..8f40e534bc81 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -228,7 +228,7 @@ static struct guc_lrc_desc *__get_lrc_desc(struct intel_guc *guc, u32 index)
 {
 	struct guc_lrc_desc *base = guc->lrc_desc_pool_vaddr;
 
-	GEM_BUG_ON(index >= GUC_MAX_LRC_DESCRIPTORS);
+	GEM_BUG_ON(index >= guc->max_guc_ids);
 
 	return &base[index];
 }
@@ -237,7 +237,7 @@ static inline struct intel_context *__get_context(struct intel_guc *guc, u32 id)
 {
 	struct intel_context *ce = xa_load(&guc->context_lookup, id);
 
-	GEM_BUG_ON(id >= GUC_MAX_LRC_DESCRIPTORS);
+	GEM_BUG_ON(id >= guc->max_guc_ids);
 
 	return ce;
 }
@@ -247,8 +247,7 @@ static int guc_lrc_desc_pool_create(struct intel_guc *guc)
 	u32 size;
 	int ret;
 
-	size = PAGE_ALIGN(sizeof(struct guc_lrc_desc) *
-			  GUC_MAX_LRC_DESCRIPTORS);
+	size = PAGE_ALIGN(sizeof(struct guc_lrc_desc) * guc->max_guc_ids);
 	ret = intel_guc_allocate_and_map_vma(guc, size, &guc->lrc_desc_pool,
 					     (void **)&guc->lrc_desc_pool_vaddr);
 	if (ret)
@@ -1008,7 +1007,7 @@ static void guc_submit_request(struct i915_request *rq)
 static int new_guc_id(struct intel_guc *guc)
 {
 	return ida_simple_get(&guc->guc_ids, GUC_ID_START,
-			      GUC_MAX_LRC_DESCRIPTORS, GFP_KERNEL |
+			      guc->num_guc_ids, GFP_KERNEL |
 			      __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
 }
 
@@ -2142,6 +2141,8 @@ static bool __guc_submission_selected(struct intel_guc *guc)
 
 void intel_guc_submission_init_early(struct intel_guc *guc)
 {
+	guc->max_guc_ids = GUC_MAX_LRC_DESCRIPTORS;
+	guc->num_guc_ids = GUC_MAX_LRC_DESCRIPTORS;
 	guc->submission_selected = __guc_submission_selected(guc);
 }
 
@@ -2150,7 +2151,7 @@ g2h_context_lookup(struct intel_guc *guc, u32 desc_idx)
 {
 	struct intel_context *ce;
 
-	if (unlikely(desc_idx >= GUC_MAX_LRC_DESCRIPTORS)) {
+	if (unlikely(desc_idx >= guc->max_guc_ids)) {
 		drm_dbg(&guc_to_gt(guc)->i915->drm,
 			"Invalid desc_idx %u", desc_idx);
 		return NULL;
@@ -2451,6 +2452,8 @@ void intel_guc_log_submission_info(struct intel_guc *guc,
 
 	drm_printf(p, "GuC Number Outstanding Submission G2H: %u\n",
 		   atomic_read(&guc->outstanding_submission_g2h));
+	drm_printf(p, "GuC Number GuC IDs: %u\n", guc->num_guc_ids);
+	drm_printf(p, "GuC Max GuC IDs: %u\n", guc->max_guc_ids);
 	drm_printf(p, "GuC tasklet count: %u\n\n",
 		   atomic_read(&sched_engine->tasklet.count));
 
@@ -2474,7 +2477,6 @@ void intel_guc_log_context_info(struct intel_guc *guc,
 {
 	struct intel_context *ce;
 	unsigned long index;
-
 	xa_for_each(&guc->context_lookup, index, ce) {
 		drm_printf(p, "GuC lrc descriptor %u:\n", ce->guc_id);
 		drm_printf(p, "\tHW Context Desc: 0x%08x\n", ce->lrc.lrca);
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 82/97] drm/i915/guc: Connect the number of guc_ids to debugfs
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (80 preceding siblings ...)
  2021-05-06 19:14 ` [RFC PATCH 81/97] drm/i915/guc: Allow flexible number of context ids Matthew Brost
@ 2021-05-06 19:14 ` Matthew Brost
  2021-05-06 19:14 ` [RFC PATCH 83/97] drm/i915/guc: Don't return -EAGAIN to user when guc_ids exhausted Matthew Brost
                   ` (17 subsequent siblings)
  99 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

For testing purposes it may make sense to reduce the number of guc_ids
available to be allocated. Add debugfs support for setting the number of
guc_ids.
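
As a usage note (the path here is an assumption, based on where the other
GuC debugfs files are registered under the per-gt "uc" debugfs directory):
with GuC submission in use, the limit could presumably be lowered at run
time with something like
'echo 1024 > /sys/kernel/debug/dri/0/gt/uc/guc_num_id' and read back with
a plain read of the same file. guc_num_id_set() clamps writes to the
range [256, guc->max_guc_ids] and returns -ENODEV when GuC submission is
not in use.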

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 .../gpu/drm/i915/gt/uc/intel_guc_debugfs.c    | 31 +++++++++++++++++++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  3 +-
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
index 9a03ff56e654..474c96fc16ef 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
@@ -50,11 +50,42 @@ static int guc_registered_contexts_show(struct seq_file *m, void *data)
 }
 DEFINE_GT_DEBUGFS_ATTRIBUTE(guc_registered_contexts);
 
+static int guc_num_id_get(void *data, u64 *val)
+{
+	struct intel_guc *guc = data;
+
+	if (!intel_guc_submission_is_used(guc))
+		return -ENODEV;
+
+	*val = guc->num_guc_ids;
+
+	return 0;
+}
+
+static int guc_num_id_set(void *data, u64 val)
+{
+	struct intel_guc *guc = data;
+
+	if (!intel_guc_submission_is_used(guc))
+		return -ENODEV;
+
+	if (val > guc->max_guc_ids)
+		val = guc->max_guc_ids;
+	else if (val < 256)
+		val = 256;
+
+	guc->num_guc_ids = val;
+
+	return 0;
+}
+DEFINE_SIMPLE_ATTRIBUTE(guc_num_id_fops, guc_num_id_get, guc_num_id_set, "%lld\n");
+
 void intel_guc_debugfs_register(struct intel_guc *guc, struct dentry *root)
 {
 	static const struct debugfs_gt_file files[] = {
 		{ "guc_info", &guc_info_fops, NULL },
 		{ "guc_registered_contexts", &guc_registered_contexts_fops, NULL },
+		{ "guc_num_id", &guc_num_id_fops, NULL },
 	};
 
 	if (!intel_guc_is_supported(guc))
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 8f40e534bc81..3c73c2ca668e 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -2153,7 +2153,8 @@ g2h_context_lookup(struct intel_guc *guc, u32 desc_idx)
 
 	if (unlikely(desc_idx >= guc->max_guc_ids)) {
 		drm_dbg(&guc_to_gt(guc)->i915->drm,
-			"Invalid desc_idx %u", desc_idx);
+			"Invalid desc_idx %u, max %u",
+			desc_idx, guc->max_guc_ids);
 		return NULL;
 	}
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 83/97] drm/i915/guc: Don't return -EAGAIN to user when guc_ids exhausted
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (81 preceding siblings ...)
  2021-05-06 19:14 ` [RFC PATCH 82/97] drm/i915/guc: Connect the number of guc_ids to debugfs Matthew Brost
@ 2021-05-06 19:14 ` Matthew Brost
  2021-05-06 19:14 ` [RFC PATCH 84/97] drm/i915/guc: Don't allow requests not ready to consume all guc_ids Matthew Brost
                   ` (16 subsequent siblings)
  99 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

Rather than returning -EAGAIN to the user when no guc_ids are available,
implement a fair sharing algorithm in the kernel which blocks submissions
until guc_ids become available. Submissions are released one at a time,
in priority order, until the guc_id pressure is relieved, ensuring fair
sharing of the guc_ids. Once the pressure is fully relieved, the normal
guc_id allocation (at request creation time in guc_request_alloc) can
resume, as that allocation path is significantly faster and a fair
sharing algorithm isn't needed when guc_ids are plentiful.

The fair sharing algorithm is implemented by forcing all submissions
through the tasklet, which serializes them and dequeues one at a time.

If the submission doesn't have a guc_id and a new guc_id can't be found,
two lists are searched: one with contexts that are not pinned but still
registered with the GuC (searched first) and another with contexts that
are pinned but do not have any submissions actively in flight (scheduling
enabled + registered, searched second). If no guc_id can be found we kick
a workqueue which retires requests, hopefully freeing a guc_id. The
workqueue and tasklet ping-pong back and forth until a guc_id can be
found.

Once a guc_id is found, we may have to disable context scheduling
depending on which list the context is stolen from. When we disable
scheduling, we block the tasklet from executing until the completion G2H
returns. The schedule disable must be issued from the workqueue because
of the locking structure. When we deregister a context we also wait on
the G2H in the same way, but we can safely issue the deregister H2G from
the tasklet.

Once all of the G2Hs have returned, we can trigger a submission on the
context.
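
Roughly, the handoff looks like this (a simplified sketch only; the names
are the ones introduced below, and the authoritative description is the
comment block added at the top of intel_guc_submission.c):

  /*
   * guc_request_alloc():
   *   pin_guc_id() == -EAGAIN -> flag the request GUC_ID_NOT_PINNED and
   *                              mark guc_ids exhausted, so all further
   *                              submissions funnel through the tasklet
   *
   * guc_dequeue_one_context() [tasklet]:
   *   no guc_id can be pinned          -> STALL_GUC_ID_WORKQUEUE,
   *                                       block tasklet, kick retire_worker
   *   guc_id stolen from a pinned ctx  -> STALL_SCHED_DISABLE,
   *                                       block tasklet, kick retire_worker
   *
   * retire_worker_func() [workqueue]:
   *   STALL_SCHED_DISABLE    -> issue the schedule disable H2G, tasklet is
   *                             unblocked once the G2H completes
   *   STALL_GUC_ID_WORKQUEUE -> retire requests, set STALL_GUC_ID_TASKLET,
   *                             unblock and kick the tasklet
   *
   * Once a guc_id is held, the tasklet walks STALL_REGISTER_CONTEXT ->
   * STALL_MOVE_LRC_TAIL -> STALL_ADD_REQUEST (detouring through
   * STALL_DEREGISTER_CONTEXT when it must wait on a deregister G2H) and
   * finally submits the context.
   */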

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context_types.h |   3 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  26 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 806 ++++++++++++++++--
 drivers/gpu/drm/i915/i915_request.h           |   6 +
 4 files changed, 754 insertions(+), 87 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 591dcba7bfde..a25ea8fe2029 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -180,6 +180,9 @@ struct intel_context {
 	/* GuC lrc descriptor ID */
 	u16 guc_id;
 
+	/* Number of rq submitted without a guc_id */
+	u16 guc_num_rq_submit_no_id;
+
 	/* GuC lrc descriptor reference count */
 	atomic_t guc_id_ref;
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 9b1a89530844..bd477209839b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -32,7 +32,28 @@ struct intel_guc {
 
 	/* Global engine used to submit requests to GuC */
 	struct i915_sched_engine *sched_engine;
-	struct i915_request *stalled_request;
+
+	/* Global state related to submission tasklet */
+	struct i915_request *stalled_rq;
+	struct intel_context *stalled_context;
+	struct work_struct retire_worker;
+	unsigned long flags;
+	int total_num_rq_with_no_guc_id;
+
+	/*
+	 * Submission stall reason. See intel_guc_submission.c for a detailed
+	 * description.
+	 */
+	enum {
+		STALL_NONE,
+		STALL_GUC_ID_WORKQUEUE,
+		STALL_GUC_ID_TASKLET,
+		STALL_SCHED_DISABLE,
+		STALL_REGISTER_CONTEXT,
+		STALL_DEREGISTER_CONTEXT,
+		STALL_MOVE_LRC_TAIL,
+		STALL_ADD_REQUEST,
+	} submission_stall_reason;
 
 	/* intel_guc_recv interrupt related state */
 	spinlock_t irq_lock;
@@ -55,7 +76,8 @@ struct intel_guc {
 	struct ida guc_ids;
 	u32 num_guc_ids;
 	u32 max_guc_ids;
-	struct list_head guc_id_list;
+	struct list_head guc_id_list_no_ref;
+	struct list_head guc_id_list_unpinned;
 
 	bool submission_selected;
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 3c73c2ca668e..037a7ee4971b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -59,6 +59,25 @@
  * ELSP context descriptor dword into Work Item.
  * See guc_add_request()
  *
+ * GuC flow control state machine:
+ * The tasklet, workqueue (retire_worker), and the G2H handlers together more or
+ * less form a state machine which is used to submit requests + flow control
+ * requests, while waiting on resources / actions, if necessary. The enum,
+ * submission_stall_reason, controls the handoff of stalls between these
+ * entities, with stalled_rq & stalled_context being the arguments. Each state
+ * is described below.
+ *
+ * STALL_NONE			No stall condition
+ * STALL_GUC_ID_WORKQUEUE	Workqueue will try to free guc_ids
+ * STALL_GUC_ID_TASKLET		Tasklet will try to find guc_id
+ * STALL_SCHED_DISABLE		Workqueue will issue context schedule disable
+ *				H2G
+ * STALL_REGISTER_CONTEXT	Tasklet needs to register context
+ * STALL_DEREGISTER_CONTEXT	G2H handler is waiting for context deregister,
+ *				will register context upon receipt of G2H
+ * STALL_MOVE_LRC_TAIL		Tasklet will try to move LRC tail
+ * STALL_ADD_REQUEST		Tasklet will try to add the request (submit
+ *				context)
  */
 
 /* GuC Virtual Engine */
@@ -72,6 +91,83 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count);
 
 #define GUC_REQUEST_SIZE 64 /* bytes */
 
+/*
+ * Global GuC flags helper functions
+ */
+enum {
+	GUC_STATE_TASKLET_BLOCKED,
+	GUC_STATE_GUC_IDS_EXHAUSTED,
+};
+
+static bool tasklet_blocked(struct intel_guc *guc)
+{
+	return test_bit(GUC_STATE_TASKLET_BLOCKED, &guc->flags);
+}
+
+static void set_tasklet_blocked(struct intel_guc *guc)
+{
+	lockdep_assert_held(&guc->sched_engine->lock);
+	set_bit(GUC_STATE_TASKLET_BLOCKED, &guc->flags);
+}
+
+static void __clr_tasklet_blocked(struct intel_guc *guc)
+{
+	lockdep_assert_held(&guc->sched_engine->lock);
+	clear_bit(GUC_STATE_TASKLET_BLOCKED, &guc->flags);
+}
+
+static void clr_tasklet_blocked(struct intel_guc *guc)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&guc->sched_engine->lock, flags);
+	__clr_tasklet_blocked(guc);
+	spin_unlock_irqrestore(&guc->sched_engine->lock, flags);
+}
+
+static bool guc_ids_exhausted(struct intel_guc *guc)
+{
+	return test_bit(GUC_STATE_GUC_IDS_EXHAUSTED, &guc->flags);
+}
+
+static bool test_and_update_guc_ids_exhausted(struct intel_guc *guc)
+{
+	unsigned long flags;
+	bool ret = false;
+
+	/*
+	 * Strict ordering on checking if guc_ids are exhausted isn't required,
+	 * so let's avoid grabbing the submission lock if possible.
+	 */
+	if (guc_ids_exhausted(guc)) {
+		spin_lock_irqsave(&guc->sched_engine->lock, flags);
+		ret = guc_ids_exhausted(guc);
+		if (ret)
+			++guc->total_num_rq_with_no_guc_id;
+		spin_unlock_irqrestore(&guc->sched_engine->lock, flags);
+	}
+
+	return ret;
+}
+
+static void set_and_update_guc_ids_exhausted(struct intel_guc *guc)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&guc->sched_engine->lock, flags);
+	++guc->total_num_rq_with_no_guc_id;
+	set_bit(GUC_STATE_GUC_IDS_EXHAUSTED, &guc->flags);
+	spin_unlock_irqrestore(&guc->sched_engine->lock, flags);
+}
+
+static void clr_guc_ids_exhausted(struct intel_guc *guc)
+{
+	lockdep_assert_held(&guc->sched_engine->lock);
+	GEM_BUG_ON(guc->total_num_rq_with_no_guc_id);
+
+	clear_bit(GUC_STATE_GUC_IDS_EXHAUSTED, &guc->flags);
+}
+
 /*
  * Below is a set of functions which control the GuC scheduling state which do
  * not require a lock as all state transitions are mutually exclusive. i.e. It
@@ -80,6 +176,9 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count);
  */
 #define SCHED_STATE_NO_LOCK_ENABLED			BIT(0)
 #define SCHED_STATE_NO_LOCK_PENDING_ENABLE		BIT(1)
+#define SCHED_STATE_NO_LOCK_BLOCK_TASKLET		BIT(2)
+#define SCHED_STATE_NO_LOCK_GUC_ID_STOLEN		BIT(3)
+#define SCHED_STATE_NO_LOCK_NEEDS_REGISTER		BIT(4)
 static inline bool context_enabled(struct intel_context *ce)
 {
 	return (atomic_read(&ce->guc_sched_state_no_lock) &
@@ -115,6 +214,60 @@ static inline void clr_context_pending_enable(struct intel_context *ce)
 		   &ce->guc_sched_state_no_lock);
 }
 
+static inline bool context_block_tasklet(struct intel_context *ce)
+{
+	return (atomic_read(&ce->guc_sched_state_no_lock) &
+		SCHED_STATE_NO_LOCK_BLOCK_TASKLET);
+}
+
+static inline void set_context_block_tasklet(struct intel_context *ce)
+{
+	atomic_or(SCHED_STATE_NO_LOCK_BLOCK_TASKLET,
+		  &ce->guc_sched_state_no_lock);
+}
+
+static inline void clr_context_block_tasklet(struct intel_context *ce)
+{
+	atomic_and((u32)~SCHED_STATE_NO_LOCK_BLOCK_TASKLET,
+		   &ce->guc_sched_state_no_lock);
+}
+
+static inline bool context_guc_id_stolen(struct intel_context *ce)
+{
+	return (atomic_read(&ce->guc_sched_state_no_lock) &
+		SCHED_STATE_NO_LOCK_GUC_ID_STOLEN);
+}
+
+static inline void set_context_guc_id_stolen(struct intel_context *ce)
+{
+	atomic_or(SCHED_STATE_NO_LOCK_GUC_ID_STOLEN,
+		  &ce->guc_sched_state_no_lock);
+}
+
+static inline void clr_context_guc_id_stolen(struct intel_context *ce)
+{
+	atomic_and((u32)~SCHED_STATE_NO_LOCK_GUC_ID_STOLEN,
+		   &ce->guc_sched_state_no_lock);
+}
+
+static inline bool context_needs_register(struct intel_context *ce)
+{
+	return (atomic_read(&ce->guc_sched_state_no_lock) &
+		SCHED_STATE_NO_LOCK_NEEDS_REGISTER);
+}
+
+static inline void set_context_needs_register(struct intel_context *ce)
+{
+	atomic_or(SCHED_STATE_NO_LOCK_NEEDS_REGISTER,
+		  &ce->guc_sched_state_no_lock);
+}
+
+static inline void clr_context_needs_register(struct intel_context *ce)
+{
+	atomic_and((u32)~SCHED_STATE_NO_LOCK_NEEDS_REGISTER,
+		   &ce->guc_sched_state_no_lock);
+}
+
 /*
  * Below is a set of functions which control the GuC scheduling state which
  * require a lock, aside from the special case where the functions are called
@@ -372,9 +525,12 @@ int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout)
 					      interruptible, timeout);
 }
 
-static int guc_lrc_desc_pin(struct intel_context *ce, bool loop);
+static inline bool request_has_no_guc_id(struct i915_request *rq)
+{
+	return test_bit(I915_FENCE_FLAG_GUC_ID_NOT_PINNED, &rq->fence.flags);
+}
 
-static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
+static int __guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 {
 	int err = 0;
 	struct intel_context *ce = rq->context;
@@ -383,8 +539,15 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 	u32 g2h_len_dw = 0;
 	bool enabled;
 
+	/* Ensure context is in correct state before a submission */
+	GEM_BUG_ON(ce->guc_num_rq_submit_no_id);
+	GEM_BUG_ON(request_has_no_guc_id(rq));
 	GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
+	GEM_BUG_ON(context_needs_register(ce));
 	GEM_BUG_ON(context_guc_id_invalid(ce));
+	GEM_BUG_ON(context_pending_disable(ce));
+	GEM_BUG_ON(context_wait_for_deregister_to_register(ce));
+	GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id));
 
 	/*
 	 * Corner case where requests were sitting in the priority list or a
@@ -396,18 +559,11 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 		goto out;
 	}
 
-	/*
-	 * Corner case where the GuC firmware was blown away and reloaded while
-	 * this context was pinned.
-	 */
-	if (unlikely(!lrc_desc_registered(guc, ce->guc_id))) {
-		err = guc_lrc_desc_pin(ce, false);
-		if (unlikely(err))
-			goto out;
-	}
 	enabled = context_enabled(ce);
 
 	if (!enabled) {
+		GEM_BUG_ON(context_pending_enable(ce));
+
 		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
 		action[len++] = ce->guc_id;
 		action[len++] = GUC_CONTEXT_ENABLE;
@@ -435,6 +591,67 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 	return err;
 }
 
+static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
+{
+	int ret;
+
+	lockdep_assert_held(&guc->sched_engine->lock);
+
+	ret = __guc_add_request(guc, rq);
+	if (ret == -EBUSY) {
+		guc->stalled_rq = rq;
+		guc->submission_stall_reason = STALL_ADD_REQUEST;
+	} else {
+		guc->stalled_rq = NULL;
+		guc->submission_stall_reason = STALL_NONE;
+	}
+
+	return ret;
+}
+
+static int guc_lrc_desc_pin(struct intel_context *ce, bool loop);
+
+static int tasklet_register_context(struct intel_guc *guc,
+				    struct i915_request *rq)
+{
+	struct intel_context *ce = rq->context;
+	int ret = 0;
+
+	/* Check state */
+	lockdep_assert_held(&guc->sched_engine->lock);
+	GEM_BUG_ON(ce->guc_num_rq_submit_no_id);
+	GEM_BUG_ON(request_has_no_guc_id(rq));
+	GEM_BUG_ON(context_guc_id_invalid(ce));
+	GEM_BUG_ON(context_pending_disable(ce));
+	GEM_BUG_ON(context_wait_for_deregister_to_register(ce));
+	GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
+
+	/*
+	 * The guc_id is getting pinned during the tasklet and we need to
+	 * register this context, or handle the corner case where the GuC
+	 * firmware was blown away and reloaded while this context was pinned.
+	 */
+	if (unlikely((!lrc_desc_registered(guc, ce->guc_id) ||
+		      context_needs_register(ce)) &&
+		     !intel_context_is_banned(ce))) {
+		ret = guc_lrc_desc_pin(ce, false);
+
+		if (likely(ret != -EBUSY))
+			clr_context_needs_register(ce);
+
+		if (unlikely(ret == -EBUSY)) {
+			guc->stalled_rq = rq;
+			guc->submission_stall_reason = STALL_REGISTER_CONTEXT;
+		} else if (unlikely(ret == -EINPROGRESS)) {
+			guc->stalled_rq = rq;
+			guc->submission_stall_reason = STALL_DEREGISTER_CONTEXT;
+		}
+	}
+
+	return ret;
+}
+
+
 static inline void guc_set_lrc_tail(struct i915_request *rq)
 {
 	rq->context->lrc_reg_state[CTX_RING_TAIL] =
@@ -446,77 +663,143 @@ static inline int rq_prio(const struct i915_request *rq)
 	return rq->sched.attr.priority;
 }
 
+static void kick_retire_wq(struct intel_guc *guc)
+{
+	queue_work(system_unbound_wq, &guc->retire_worker);
+}
+
+static int tasklet_pin_guc_id(struct intel_guc *guc, struct i915_request *rq);
+
 static int guc_dequeue_one_context(struct intel_guc *guc)
 {
 	struct i915_sched_engine * const sched_engine = guc->sched_engine;
-	struct i915_request *last = NULL;
-	bool submit = false;
+	struct i915_request *last = guc->stalled_rq;
+	bool submit = !!last;
 	struct rb_node *rb;
 	int ret;
 
 	lockdep_assert_held(&sched_engine->lock);
+	GEM_BUG_ON(guc->stalled_context);
+	GEM_BUG_ON(!submit && guc->submission_stall_reason);
 
-	if (guc->stalled_request) {
-		submit = true;
-		last = guc->stalled_request;
-		goto resubmit;
-	}
+	if (submit) {
+		/* Flow control conditions */
+		switch (guc->submission_stall_reason) {
+		case STALL_GUC_ID_TASKLET:
+			goto done;
+		case STALL_REGISTER_CONTEXT:
+			goto register_context;
+		case STALL_MOVE_LRC_TAIL:
+			goto move_lrc_tail;
+		case STALL_ADD_REQUEST:
+			goto add_request;
+		default:
+			GEM_BUG_ON("Invalid stall state");
+		}
+	} else {
+		GEM_BUG_ON(!guc->total_num_rq_with_no_guc_id &&
+			   guc_ids_exhausted(guc));
 
-	while ((rb = rb_first_cached(&sched_engine->queue))) {
-		struct i915_priolist *p = to_priolist(rb);
-		struct i915_request *rq, *rn;
+		while ((rb = rb_first_cached(&sched_engine->queue))) {
+			struct i915_priolist *p = to_priolist(rb);
+			struct i915_request *rq, *rn;
 
-		priolist_for_each_request_consume(rq, rn, p) {
-			if (last && rq->context != last->context)
-				goto done;
+			priolist_for_each_request_consume(rq, rn, p) {
+				if (last && rq->context != last->context)
+					goto done;
 
-			list_del_init(&rq->sched.link);
+				list_del_init(&rq->sched.link);
 
-			__i915_request_submit(rq);
+				__i915_request_submit(rq);
 
-			trace_i915_request_in(rq, 0);
-			last = rq;
-			submit = true;
-		}
+				trace_i915_request_in(rq, 0);
+				last = rq;
+				submit = true;
+			}
 
-		rb_erase_cached(&p->node, &sched_engine->queue);
-		i915_priolist_free(p);
+			rb_erase_cached(&p->node, &sched_engine->queue);
+			i915_priolist_free(p);
+		}
 	}
+
 done:
 	if (submit) {
+		struct intel_context *ce = last->context;
+
+		if (ce->guc_num_rq_submit_no_id) {
+			ret = tasklet_pin_guc_id(guc, last);
+			if (ret)
+				goto blk_tasklet_kick;
+		}
+
+register_context:
+		ret = tasklet_register_context(guc, last);
+		if (unlikely(ret == -EINPROGRESS))
+			goto blk_tasklet;
+		else if (unlikely(ret == -EDEADLK))
+			goto deadlk;
+		else if (unlikely(ret == -EBUSY))
+			goto schedule_tasklet;
+		else if (ret != 0) {
+			GEM_WARN_ON(ret);	/* Unexpected */
+			goto deadlk;
+		}
+
+move_lrc_tail:
 		guc_set_lrc_tail(last);
-resubmit:
+
+add_request:
 		ret = guc_add_request(guc, last);
 		if (unlikely(ret == -EDEADLK))
 			goto deadlk;
-		else if (ret == -EBUSY) {
-			i915_sched_engine_kick(sched_engine);
-			guc->stalled_request = last;
-			return false;
+		else if (ret == -EBUSY)
+			goto schedule_tasklet;
+		else if (ret != 0) {
+			GEM_WARN_ON(ret);	/* Unexpected */
+			goto deadlk;
 		}
 	}
 
-	guc->stalled_request = NULL;
+	/*
+	 * No requests without a guc_id, enable guc_id allocation at request
+	 * creation time (guc_request_alloc).
+	 */
+	if (!guc->total_num_rq_with_no_guc_id)
+		clr_guc_ids_exhausted(guc);
+
 	return submit;
 
+
+schedule_tasklet:
+	i915_sched_engine_kick(sched_engine);
+	return false;
+
 deadlk:
 	sched_engine->tasklet.callback = NULL;
 	tasklet_disable_nosync(&sched_engine->tasklet);
 	return false;
+
+blk_tasklet_kick:
+	kick_retire_wq(guc);
+blk_tasklet:
+	set_tasklet_blocked(guc);
+	return false;
 }
 
 static void guc_submission_tasklet(struct tasklet_struct *t)
 {
 	struct i915_sched_engine *sched_engine =
 		from_tasklet(sched_engine, t, tasklet);
+	struct intel_guc *guc = &sched_engine->engine->gt->uc.guc;
 	unsigned long flags;
 	bool loop;
 
 	spin_lock_irqsave(&sched_engine->lock, flags);
 
-	do {
-		loop = guc_dequeue_one_context(&sched_engine->engine->gt->uc.guc);
-	} while (loop);
+	if (likely(!tasklet_blocked(guc)))
+		do {
+			loop = guc_dequeue_one_context(guc);
+		} while (loop);
 
 	i915_sched_engine_reset_on_empty(sched_engine);
 
@@ -593,6 +876,14 @@ submission_disabled(struct intel_guc *guc)
 	return unlikely(!__tasklet_is_enabled(&sched_engine->tasklet));
 }
 
+static void kick_tasklet(struct intel_guc *guc)
+{
+	struct i915_sched_engine * const sched_engine = guc->sched_engine;
+
+	if (likely(!tasklet_blocked(guc)))
+		i915_sched_engine_hi_kick(sched_engine);
+}
+
 static void disable_submission(struct intel_guc *guc)
 {
 	struct i915_sched_engine * const sched_engine = guc->sched_engine;
@@ -616,8 +907,16 @@ static void enable_submission(struct intel_guc *guc)
 	    __tasklet_enable(&sched_engine->tasklet)) {
 		GEM_BUG_ON(!guc->ct.enabled);
 
+		/* Reset tasklet state */
+		guc->stalled_rq = NULL;
+		if (guc->stalled_context)
+			intel_context_put(guc->stalled_context);
+		guc->stalled_context = NULL;
+		guc->submission_stall_reason = STALL_NONE;
+		guc->flags = 0;
+
 		/* And kick in case we missed a new request submission. */
-		i915_sched_engine_hi_kick(sched_engine);
+		kick_tasklet(guc);
 	}
 	spin_unlock_irqrestore(&guc->sched_engine->lock, flags);
 }
@@ -795,6 +1094,7 @@ static void __guc_reset_context(struct intel_context *ce, bool stalled)
 out_replay:
 	guc_reset_state(ce, head, stalled);
 	__unwind_incomplete_requests(ce);
+	ce->guc_num_rq_submit_no_id = 0;
 	intel_context_put(ce);
 }
 
@@ -826,6 +1126,7 @@ static void guc_cancel_context_requests(struct intel_context *ce)
 	spin_lock(&ce->guc_active.lock);
 	list_for_each_entry(rq, &ce->guc_active.requests, sched.link)
 		i915_request_put(i915_request_mark_eio(rq));
+	ce->guc_num_rq_submit_no_id = 0;
 	spin_unlock(&ce->guc_active.lock);
 	spin_unlock_irqrestore(&sched_engine->lock, flags);
 }
@@ -862,11 +1163,15 @@ guc_cancel_sched_engine_requests(struct i915_sched_engine *sched_engine)
 		struct i915_priolist *p = to_priolist(rb);
 
 		priolist_for_each_request_consume(rq, rn, p) {
+			struct intel_context *ce = rq->context;
+
 			list_del_init(&rq->sched.link);
 
 			__i915_request_submit(rq);
 
 			i915_request_put(i915_request_mark_eio(rq));
+
+			ce->guc_num_rq_submit_no_id = 0;
 		}
 
 		rb_erase_cached(&p->node, &sched_engine->queue);
@@ -917,6 +1222,51 @@ void intel_guc_submission_reset_finish(struct intel_guc *guc)
 	intel_gt_unpark_heartbeats(guc_to_gt(guc));
 }
 
+static void retire_worker_sched_disable(struct intel_guc *guc,
+					struct intel_context *ce);
+
+static void retire_worker_func(struct work_struct *w)
+{
+	struct intel_guc *guc =
+		container_of(w, struct intel_guc, retire_worker);
+
+	/*
+	 * It is possible that another thread issues the schedule disable + that
+	 * G2H completes, moving the state machine further along to a point
+	 * where nothing needs to be done here. Let's be paranoid and kick the
+	 * tasklet in that case.
+	 */
+	if (guc->submission_stall_reason != STALL_SCHED_DISABLE &&
+	    guc->submission_stall_reason != STALL_GUC_ID_WORKQUEUE) {
+		kick_tasklet(guc);
+		return;
+	}
+
+	if (guc->submission_stall_reason == STALL_SCHED_DISABLE) {
+		GEM_BUG_ON(!guc->stalled_context);
+		GEM_BUG_ON(context_guc_id_invalid(guc->stalled_context));
+
+		retire_worker_sched_disable(guc, guc->stalled_context);
+	}
+
+	/*
+	 * guc_id pressure, always try to release it regardless of state,
+	 * albeit after possibly issuing a schedule disable as that is an async
+	 * operation.
+	 */
+	intel_gt_retire_requests(guc_to_gt(guc));
+
+	if (guc->submission_stall_reason == STALL_GUC_ID_WORKQUEUE) {
+		GEM_BUG_ON(guc->stalled_context);
+
+		/* Hopefully guc_ids are now available, kick tasklet */
+		guc->submission_stall_reason = STALL_GUC_ID_TASKLET;
+		clr_tasklet_blocked(guc);
+
+		kick_tasklet(guc);
+	}
+}
+
 /*
  * Set up the memory resources to be shared with the GuC (via the GGTT)
  * at firmware loading time.
@@ -940,9 +1290,12 @@ int intel_guc_submission_init(struct intel_guc *guc)
 	xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ);
 
 	spin_lock_init(&guc->contexts_lock);
-	INIT_LIST_HEAD(&guc->guc_id_list);
+	INIT_LIST_HEAD(&guc->guc_id_list_no_ref);
+	INIT_LIST_HEAD(&guc->guc_id_list_unpinned);
 	ida_init(&guc->guc_ids);
 
+	INIT_WORK(&guc->retire_worker, retire_worker_func);
+
 	return 0;
 }
 
@@ -959,10 +1312,28 @@ static inline void queue_request(struct i915_sched_engine *sched_engine,
 				 struct i915_request *rq,
 				 int prio)
 {
+	bool empty = i915_sched_engine_is_empty(sched_engine);
+
 	GEM_BUG_ON(!list_empty(&rq->sched.link));
 	list_add_tail(&rq->sched.link,
 		      i915_sched_lookup_priolist(sched_engine, prio));
 	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
+
+	if (empty)
+		kick_tasklet(&rq->engine->gt->uc.guc);
+}
+
+static bool need_tasklet(struct intel_guc *guc, struct intel_context *ce)
+{
+	struct i915_sched_engine * const sched_engine =
+		ce->engine->sched_engine;
+
+	lockdep_assert_held(&sched_engine->lock);
+
+	return guc_ids_exhausted(guc) || submission_disabled(guc) ||
+		guc->stalled_rq || guc->stalled_context ||
+		!lrc_desc_registered(guc, ce->guc_id) ||
+		!i915_sched_engine_is_empty(sched_engine);
 }
 
 static int guc_bypass_tasklet_submit(struct intel_guc *guc,
@@ -976,8 +1347,6 @@ static int guc_bypass_tasklet_submit(struct intel_guc *guc,
 
 	guc_set_lrc_tail(rq);
 	ret = guc_add_request(guc, rq);
-	if (ret == -EBUSY)
-		guc->stalled_request = rq;
 
 	if (unlikely(ret == -EDEADLK))
 		disable_submission(guc);
@@ -994,11 +1363,10 @@ static void guc_submit_request(struct i915_request *rq)
 	/* Will be called from irq-context when using foreign fences. */
 	spin_lock_irqsave(&sched_engine->lock, flags);
 
-	if (submission_disabled(guc) || guc->stalled_request ||
-	    !i915_sched_engine_is_empty(sched_engine))
+	if (need_tasklet(guc, rq->context))
 		queue_request(sched_engine, rq, rq_prio(rq));
 	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
-		i915_sched_engine_hi_kick(sched_engine);
+		kick_tasklet(guc);
 
 	spin_unlock_irqrestore(&sched_engine->lock, flags);
 }
@@ -1031,45 +1399,100 @@ static void release_guc_id(struct intel_guc *guc, struct intel_context *ce)
 	spin_unlock_irqrestore(&guc->contexts_lock, flags);
 }
 
-static int steal_guc_id(struct intel_guc *guc)
+/*
+ * We have two lists for guc_ids available to steal. One list is for contexts
+ * that have a zero guc_id_ref but are still pinned (scheduling enabled, only
+ * available inside tasklet) and the other is for contexts that are not pinned
+ * but still registered (available both outside and inside tasklet). Stealing
+ * from the latter only requires a deregister H2G, while the former requires a
+ * schedule disable H2G + a deregister H2G.
+ */
+static struct list_head *get_guc_id_list(struct intel_guc *guc,
+					 bool unpinned)
+{
+	if (unpinned)
+		return &guc->guc_id_list_unpinned;
+	else
+		return &guc->guc_id_list_no_ref;
+}
+
+static int steal_guc_id(struct intel_guc *guc, bool unpinned)
 {
 	struct intel_context *ce;
 	int guc_id;
+	struct list_head *guc_id_list = get_guc_id_list(guc, unpinned);
 
-	if (!list_empty(&guc->guc_id_list)) {
-		ce = list_first_entry(&guc->guc_id_list,
+	if (!list_empty(guc_id_list)) {
+		ce = list_first_entry(guc_id_list,
 				      struct intel_context,
 				      guc_id_link);
 
+		/* Ensure context getting stolen is in expected state */
 		GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
 		GEM_BUG_ON(context_guc_id_invalid(ce));
+		GEM_BUG_ON(context_guc_id_stolen(ce));
 
 		list_del_init(&ce->guc_id_link);
 		guc_id = ce->guc_id;
-		set_context_guc_id_invalid(ce);
+
+		/*
+		 * If stealing from the pinned list, defer invalidating
+		 * the guc_id until the retire workqueue processes this
+		 * context.
+		 */
+		if (!unpinned) {
+			GEM_BUG_ON(guc->stalled_context);
+			guc->stalled_context = intel_context_get(ce);
+			set_context_guc_id_stolen(ce);
+		} else {
+			set_context_guc_id_invalid(ce);
+		}
+
 		return guc_id;
 	} else {
 		return -EAGAIN;
 	}
 }
 
-static int assign_guc_id(struct intel_guc *guc, u16 *out)
+enum {	/* Return values for pin_guc_id / assign_guc_id */
+	SAME_GUC_ID		= 0,
+	NEW_GUC_ID_DISABLED	= 1,
+	NEW_GUC_ID_ENABLED	= 2,
+};
+
+static int assign_guc_id(struct intel_guc *guc, u16 *out, bool tasklet)
 {
 	int ret;
 
 	ret = new_guc_id(guc);
 	if (unlikely(ret < 0)) {
-		ret = steal_guc_id(guc);
-		if (ret < 0)
-			return ret;
+		ret = steal_guc_id(guc, true);
+		if (ret >= 0) {
+			*out = ret;
+			ret = NEW_GUC_ID_DISABLED;
+		} else if (ret < 0 && tasklet) {
+			/*
+			 * We only steal a guc_id from a context with scheduling
+			 * enabled if guc_ids are exhausted and we are submitting
+			 * from the tasklet.
+			 */
+			ret = steal_guc_id(guc, false);
+			if (ret >= 0) {
+				*out = ret;
+				ret = NEW_GUC_ID_ENABLED;
+			}
+		}
+	} else {
+		*out = ret;
+		ret = SAME_GUC_ID;
 	}
 
-	*out = ret;
-	return 0;
+	return ret;
 }
 
 #define PIN_GUC_ID_TRIES	4
-static int pin_guc_id(struct intel_guc *guc, struct intel_context *ce)
+static int pin_guc_id(struct intel_guc *guc, struct intel_context *ce,
+		      bool tasklet)
 {
 	int ret = 0;
 	unsigned long flags, tries = PIN_GUC_ID_TRIES;
@@ -1079,11 +1502,15 @@ static int pin_guc_id(struct intel_guc *guc, struct intel_context *ce)
 try_again:
 	spin_lock_irqsave(&guc->contexts_lock, flags);
 
+	if (!tasklet && guc_ids_exhausted(guc)) {
+		ret = -EAGAIN;
+		goto out_unlock;
+	}
+
 	if (context_guc_id_invalid(ce)) {
-		ret = assign_guc_id(guc, &ce->guc_id);
-		if (ret)
+		ret = assign_guc_id(guc, &ce->guc_id, tasklet);
+		if (unlikely(ret < 0))
 			goto out_unlock;
-		ret = 1;	// Indidcates newly assigned HW context
 	}
 	if (!list_empty(&ce->guc_id_link))
 		list_del_init(&ce->guc_id_link);
@@ -1099,8 +1526,11 @@ static int pin_guc_id(struct intel_guc *guc, struct intel_context *ce)
 	 * attempting to retire more requests. Double the sleep period each
 	 * subsequent pass before finally giving up. The sleep period has max of
 	 * 100ms and minimum of 1ms.
+	 *
+	 * We only try this outside the tasklet; inside the tasklet we have a
+	 * different (slower, more complex, blocking) flow control algorithm.
 	 */
-	if (ret == -EAGAIN && --tries) {
+	if (ret == -EAGAIN && --tries && !tasklet) {
 		if (PIN_GUC_ID_TRIES - tries > 1) {
 			unsigned int timeslice_shifted =
 				ce->engine->props.timeslice_duration_ms <<
@@ -1117,16 +1547,26 @@ static int pin_guc_id(struct intel_guc *guc, struct intel_context *ce)
 	return ret;
 }
 
-static void unpin_guc_id(struct intel_guc *guc, struct intel_context *ce)
+static void unpin_guc_id(struct intel_guc *guc,
+			 struct intel_context *ce,
+			 bool unpinned)
 {
 	unsigned long flags;
 
 	GEM_BUG_ON(atomic_read(&ce->guc_id_ref) < 0);
 
 	spin_lock_irqsave(&guc->contexts_lock, flags);
-	if (!context_guc_id_invalid(ce) && list_empty(&ce->guc_id_link) &&
-	    !atomic_read(&ce->guc_id_ref))
-		list_add_tail(&ce->guc_id_link, &guc->guc_id_list);
+
+	if (!list_empty(&ce->guc_id_link))
+		list_del_init(&ce->guc_id_link);
+
+	if (!context_guc_id_invalid(ce) && !context_guc_id_stolen(ce) &&
+	    !atomic_read(&ce->guc_id_ref)) {
+		struct list_head *head = get_guc_id_list(guc, unpinned);
+
+		list_add_tail(&ce->guc_id_link, head);
+	}
+
 	spin_unlock_irqrestore(&guc->contexts_lock, flags);
 }
 
@@ -1220,6 +1660,7 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
 	int ret = 0;
 
 	GEM_BUG_ON(!engine->mask);
+	GEM_BUG_ON(context_guc_id_invalid(ce));
 
 	/*
 	 * Ensure LRC + CT vmas are in same region as write barrier is done
@@ -1255,6 +1696,7 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
 		trace_intel_context_steal_guc_id(ce);
 		if (!loop) {
 			set_context_wait_for_deregister_to_register(ce);
+			set_context_block_tasklet(ce);
 			intel_context_get(ce);
 		} else {
 			bool disabled;
@@ -1282,7 +1724,14 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
 			ret = deregister_context(ce, ce->guc_id, loop);
 		if (unlikely(ret == -EBUSY)) {
 			clr_context_wait_for_deregister_to_register(ce);
+			clr_context_block_tasklet(ce);
 			intel_context_put(ce);
+		} else if (!loop && !ret) {
+			/*
+			 * A context de-registration has been issued from within
+			 * the tasklet. Need to block until it complete.
+			 */
+			return -EINPROGRESS;
 		}
 	} else {
 		with_intel_runtime_pm(runtime_pm, wakeref)
@@ -1331,7 +1780,9 @@ static void guc_context_unpin(struct intel_context *ce)
 {
 	struct intel_guc *guc = ce_to_guc(ce);
 
-	unpin_guc_id(guc, ce);
+	GEM_BUG_ON(context_enabled(ce));
+
+	unpin_guc_id(guc, ce, true);
 	lrc_unpin(ce);
 }
 
@@ -1493,13 +1944,14 @@ static void guc_context_destroy(struct kref *kref)
 	unsigned long flags;
 	bool disabled;
 
+	GEM_BUG_ON(context_guc_id_stolen(ce));
+
 	/*
 	 * If the guc_id is invalid this context has been stolen and we can free
 	 * it immediately. Also can be freed immediately if the context is not
 	 * registered with the GuC.
 	 */
-	if (submission_disabled(guc) ||
-	    context_guc_id_invalid(ce) ||
+	if (submission_disabled(guc) || context_guc_id_invalid(ce) ||
 	    !lrc_desc_registered(guc, ce->guc_id)) {
 		release_guc_id(guc, ce);
 		__guc_context_destroy(ce);
@@ -1566,6 +2018,8 @@ static void add_to_context(struct i915_request *rq)
 
 	spin_lock(&ce->guc_active.lock);
 	list_move_tail(&rq->sched.link, &ce->guc_active.requests);
+	if (unlikely(request_has_no_guc_id(rq)))
+		++ce->guc_num_rq_submit_no_id;
 	spin_unlock(&ce->guc_active.lock);
 }
 
@@ -1583,7 +2037,12 @@ static void remove_from_context(struct i915_request *rq)
 
 	spin_unlock_irq(&ce->guc_active.lock);
 
-	atomic_dec(&ce->guc_id_ref);
+	if (likely(!request_has_no_guc_id(rq)))
+		atomic_dec(&ce->guc_id_ref);
+	else
+		--ce_to_guc(rq->context)->total_num_rq_with_no_guc_id;
+	unpin_guc_id(ce_to_guc(ce), ce, false);
+
 	i915_request_notify_execute_cb_imm(rq);
 }
 
@@ -1633,13 +2092,144 @@ static void guc_signal_context_fence(struct intel_context *ce)
 	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 }
 
-static bool context_needs_register(struct intel_context *ce, bool new_guc_id)
+static void invalidate_guc_id_sched_disable(struct intel_context *ce)
+{
+	set_context_guc_id_invalid(ce);
+	wmb();
+	clr_context_guc_id_stolen(ce);
+}
+
+static void retire_worker_sched_disable(struct intel_guc *guc,
+					struct intel_context *ce)
+{
+	unsigned long flags;
+	bool disabled;
+
+	guc->stalled_context = NULL;
+	spin_lock_irqsave(&ce->guc_state.lock, flags);
+	disabled = submission_disabled(guc);
+	if (!disabled && !context_pending_disable(ce) && context_enabled(ce)) {
+		/*
+		 * Still enabled, issue schedule disable + configure state so
+		 * when G2H returns tasklet is kicked.
+		 */
+
+		struct intel_runtime_pm *runtime_pm =
+			&ce->engine->gt->i915->runtime_pm;
+		intel_wakeref_t wakeref;
+		u16 guc_id;
+
+		/*
+		 * We add +2 here as the schedule disable complete CTB handler
+		 * calls intel_context_sched_disable_unpin (-2 to pin_count).
+		 */
+		GEM_BUG_ON(!atomic_read(&ce->pin_count));
+		atomic_add(2, &ce->pin_count);
+
+		set_context_block_tasklet(ce);
+		guc_id = prep_context_pending_disable(ce);
+		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+
+		with_intel_runtime_pm(runtime_pm, wakeref)
+			__guc_context_sched_disable(guc, ce, guc_id);
+
+		invalidate_guc_id_sched_disable(ce);
+	} else if (!disabled && context_pending_disable(ce)) {
+		/*
+		 * Schedule disable in flight, set bit to kick tasklet in G2H
+		 * handler and call it a day.
+		 */
+
+		set_context_block_tasklet(ce);
+		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+
+		invalidate_guc_id_sched_disable(ce);
+	} else if (disabled || !context_enabled(ce)) {
+		/* Schedule disable is done, kick tasklet */
+
+		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+
+		invalidate_guc_id_sched_disable(ce);
+
+		guc->submission_stall_reason = STALL_REGISTER_CONTEXT;
+		clr_tasklet_blocked(guc);
+
+		kick_tasklet(ce_to_guc(ce));
+	}
+
+	intel_context_put(ce);
+}
+
+static bool context_needs_lrc_desc_pin(struct intel_context *ce, bool new_guc_id)
 {
 	return (new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
 		!lrc_desc_registered(ce_to_guc(ce), ce->guc_id)) &&
 		!submission_disabled(ce_to_guc(ce));
 }
 
+static int tasklet_pin_guc_id(struct intel_guc *guc, struct i915_request *rq)
+{
+	struct intel_context *ce = rq->context;
+	int ret = 0;
+
+	lockdep_assert_held(&guc->sched_engine->lock);
+	GEM_BUG_ON(!ce->guc_num_rq_submit_no_id);
+
+	if (atomic_add_unless(&ce->guc_id_ref, ce->guc_num_rq_submit_no_id, 0))
+		goto out;
+
+	ret = pin_guc_id(guc, ce, true);
+	if (unlikely(ret < 0)) {
+		/*
+		 * No guc_ids available, disable the tasklet and kick the retire
+		 * workqueue hopefully freeing up some guc_ids.
+		 */
+		guc->stalled_rq = rq;
+		guc->submission_stall_reason = STALL_GUC_ID_WORKQUEUE;
+		return ret;
+	}
+
+	if (ce->guc_num_rq_submit_no_id - 1 > 0)
+		atomic_add(ce->guc_num_rq_submit_no_id - 1,
+			   &ce->guc_id_ref);
+
+	if (context_needs_lrc_desc_pin(ce, !!ret))
+		set_context_needs_register(ce);
+
+	if (ret == NEW_GUC_ID_ENABLED) {
+		guc->stalled_rq = rq;
+		guc->submission_stall_reason = STALL_SCHED_DISABLE;
+	}
+
+	clear_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
+out:
+	guc->total_num_rq_with_no_guc_id -= ce->guc_num_rq_submit_no_id;
+	GEM_BUG_ON(guc->total_num_rq_with_no_guc_id < 0);
+
+	list_for_each_entry_reverse(rq, &ce->guc_active.requests, sched.link)
+		if (request_has_no_guc_id(rq)) {
+			--ce->guc_num_rq_submit_no_id;
+			clear_bit(I915_FENCE_FLAG_GUC_ID_NOT_PINNED,
+				  &rq->fence.flags);
+		} else if (!ce->guc_num_rq_submit_no_id) {
+			break;
+		}
+
+	GEM_BUG_ON(ce->guc_num_rq_submit_no_id);
+
+	/*
+	 * When NEW_GUC_ID_ENABLED is returned it means we are stealing a guc_id
+	 * from a context that has scheduling enabled. We have to disable
+	 * scheduling before deregistering the context and it isn't safe to do
+	 * in the tasklet because of lock inversion (ce->guc_state.lock must be
+	 * acquired before guc->sched_engine->lock). To work around this
+	 * we do the schedule disable in retire workqueue and block the tasklet
+	 * until the schedule done G2H returns. Returning non-zero here kicks
+	 * the workqueue.
+	 */
+	return (ret == NEW_GUC_ID_ENABLED) ? ret : 0;
+}
+
 static int guc_request_alloc(struct i915_request *rq)
 {
 	struct intel_context *ce = rq->context;
@@ -1649,6 +2239,15 @@ static int guc_request_alloc(struct i915_request *rq)
 
 	GEM_BUG_ON(!intel_context_is_pinned(rq->context));
 
+	/*
+	 * guc_ids are exhausted, don't allocate one here, defer to submission
+	 * in the tasklet.
+	 */
+	if (test_and_update_guc_ids_exhausted(guc)) {
+		set_bit(I915_FENCE_FLAG_GUC_ID_NOT_PINNED, &rq->fence.flags);
+		goto out;
+	}
+
 	/*
 	 * Flush enough space to reduce the likelihood of waiting after
 	 * we start building the request - in which case we will just
@@ -1678,9 +2277,7 @@ static int guc_request_alloc(struct i915_request *rq)
 	 * when guc_ids are being stolen due to over subscription. By the time
 	 * this function is reached, it is guaranteed that the guc_id will be
 	 * persistent until the generated request is retired. Thus, sealing these
-	 * race conditions. It is still safe to fail here if guc_ids are
-	 * exhausted and return -EAGAIN to the user indicating that they can try
-	 * again in the future.
+	 * race conditions.
 	 *
 	 * There is no need for a lock here as the timeline mutex ensures at
 	 * most one context can be executing this code path at once. The
@@ -1691,17 +2288,32 @@ static int guc_request_alloc(struct i915_request *rq)
 	if (atomic_add_unless(&ce->guc_id_ref, 1, 0))
 		goto out;
 
-	ret = pin_guc_id(guc, ce);	/* returns 1 if new guc_id assigned */
-	if (unlikely(ret < 0))
-		return ret;;
+	ret = pin_guc_id(guc, ce, false);	/* > 0 indicates new guc_id */
+	if (unlikely(ret == -EAGAIN)) {
+		/*
+		 * No guc_ids available, so we force this submission and all
+		 * future submissions to be serialized in the tasklet, sharing
+		 * the guc_ids on a per submission basis to ensure (more) fair
+		 * scheduling of submissions. Once the tasklet is flushed of
+		 * submissions we return to allocating guc_ids in this function.
+		 */
+		set_bit(I915_FENCE_FLAG_GUC_ID_NOT_PINNED, &rq->fence.flags);
+		set_and_update_guc_ids_exhausted(guc);
+
+		return 0;
+	} else if (unlikely(ret < 0)) {
+		return ret;
+	}
+
+	GEM_BUG_ON(ret == NEW_GUC_ID_ENABLED);
 
-	if (context_needs_register(ce, !!ret)) {
+	if (context_needs_lrc_desc_pin(ce, !!ret)) {
 		ret = guc_lrc_desc_pin(ce, true);
 		if (unlikely(ret)) {	/* unwind */
 			if (ret == -EDEADLK)
 				disable_submission(guc);
 			atomic_dec(&ce->guc_id_ref);
-			unpin_guc_id(guc, ce);
+			unpin_guc_id(guc, ce, true);
 			return ret;
 		}
 	}
@@ -1950,7 +2562,7 @@ static inline void guc_kernel_context_pin(struct intel_guc *guc,
 					  struct intel_context *ce)
 {
 	if (context_guc_id_invalid(ce))
-		pin_guc_id(guc, ce);
+		pin_guc_id(guc, ce, false);
 	guc_lrc_desc_pin(ce, true);
 }
 
@@ -2207,6 +2819,16 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 		with_intel_runtime_pm(runtime_pm, wakeref)
 			register_context(ce, true);
 		guc_signal_context_fence(ce);
+		if (context_block_tasklet(ce)) {
+			GEM_BUG_ON(guc->submission_stall_reason !=
+				   STALL_DEREGISTER_CONTEXT);
+
+			clr_context_block_tasklet(ce);
+			guc->submission_stall_reason = STALL_MOVE_LRC_TAIL;
+			clr_tasklet_blocked(guc);
+
+			kick_tasklet(ce_to_guc(ce));
+		}
 		intel_context_put(ce);
 	} else if (context_destroyed(ce)) {
 		/* Context has been destroyed */
@@ -2269,6 +2891,14 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
 		__guc_signal_context_fence(ce);
 		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 
+		if (context_block_tasklet(ce)) {
+			clr_context_block_tasklet(ce);
+			guc->submission_stall_reason = STALL_REGISTER_CONTEXT;
+			clr_tasklet_blocked(guc);
+
+			kick_tasklet(ce_to_guc(ce));
+		}
+
 		if (banned) {
 			guc_cancel_context_requests(ce);
 			intel_engine_signal_breadcrumbs(ce->engine);
@@ -2297,10 +2927,8 @@ static void capture_error_state(struct intel_guc *guc,
 
 static void guc_context_replay(struct intel_context *ce)
 {
-	struct i915_sched_engine *sched_engine = ce->engine->sched_engine;
-
 	__guc_reset_context(ce, true);
-	i915_sched_engine_hi_kick(sched_engine);
+	kick_tasklet(ce_to_guc(ce));
 }
 
 static void guc_handle_context_reset(struct intel_guc *guc,
@@ -2455,8 +3083,16 @@ void intel_guc_log_submission_info(struct intel_guc *guc,
 		   atomic_read(&guc->outstanding_submission_g2h));
 	drm_printf(p, "GuC Number GuC IDs: %u\n", guc->num_guc_ids);
 	drm_printf(p, "GuC Max GuC IDs: %u\n", guc->max_guc_ids);
-	drm_printf(p, "GuC tasklet count: %u\n\n",
+	drm_printf(p, "GuC tasklet count: %u\n",
 		   atomic_read(&sched_engine->tasklet.count));
+	drm_printf(p, "GuC submit flags: 0x%04lx\n", guc->flags);
+	drm_printf(p, "GuC total number request without guc_id: %d\n",
+		   guc->total_num_rq_with_no_guc_id);
+	drm_printf(p, "GuC stall reason: %d\n", guc->submission_stall_reason);
+	drm_printf(p, "GuC stalled request: %s\n",
+		   yesno(guc->stalled_rq));
+	drm_printf(p, "GuC stalled context: %s\n\n",
+		   yesno(guc->stalled_context));
 
 	spin_lock_irqsave(&sched_engine->lock, flags);
 	drm_printf(p, "Requests in GuC submit tasklet:\n");
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index f98385f72782..94a3f119ad86 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -139,6 +139,12 @@ enum {
 	 * the GPU. Here we track such boost requests on a per-request basis.
 	 */
 	I915_FENCE_FLAG_BOOST,
+
+	/*
+	 * I915_FENCE_FLAG_GUC_ID_NOT_PINNED - Set to signal the GuC submission
+	 * tasklet that the guc_id isn't pinned.
+	 */
+	I915_FENCE_FLAG_GUC_ID_NOT_PINNED,
 };
 
 /**
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 84/97] drm/i915/guc: Don't allow requests not ready to consume all guc_ids
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (82 preceding siblings ...)
  2021-05-06 19:14 ` [RFC PATCH 83/97] drm/i915/guc: Don't return -EAGAIN to user when guc_ids exhausted Matthew Brost
@ 2021-05-06 19:14 ` Matthew Brost
  2021-05-06 19:14 ` [RFC PATCH 85/97] drm/i915/guc: Introduce guc_submit_engine object Matthew Brost
                   ` (15 subsequent siblings)
  99 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

Add a heuristic which checks if over half of the available guc_ids are
currently consumed by requests that are not yet ready to be submitted.
If this heuristic holds at request creation time (the normal guc_id
allocation point), force all submissions and guc_id allocations into the
tasklet.
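
To make the shape of the check concrete, here is a minimal standalone
sketch of the over-50% test (illustrative only, not the driver code;
'available' stands in for guc->num_guc_ids and 'not_ready' for the
num_guc_ids_not_ready counter added by this patch):

  #include <stdbool.h>

  /*
   * Returns true once more than half of the guc_id pool is held by
   * requests that still have unresolved submission dependencies.
   */
  static bool too_many_ids_not_ready(unsigned int available,
                                     unsigned int not_ready)
  {
          return not_ready > available / 2;
  }

When this trips, guc_request_alloc() skips guc_id allocation and defers
it to the tasklet, following the same deferral path used when guc_ids
are exhausted in the previous patch.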

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context_types.h |  3 ++
 drivers/gpu/drm/i915/gt/intel_reset.c         |  9 ++++
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  1 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 53 +++++++++++++++++--
 .../gpu/drm/i915/gt/uc/intel_guc_submission.h |  2 +
 5 files changed, 65 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index a25ea8fe2029..998f3839411a 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -186,6 +186,9 @@ struct intel_context {
 	/* GuC lrc descriptor reference count */
 	atomic_t guc_id_ref;
 
+	/* GuC number of requests not ready */
+	atomic_t guc_num_rq_not_ready;
+
 	/*
 	 * GuC ID link - in list when unpinned but guc_id still valid in GuC
 	 */
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index 4347cc2dcea0..be25e39f0dd8 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -22,6 +22,7 @@
 #include "intel_reset.h"
 
 #include "uc/intel_guc.h"
+#include "uc/intel_guc_submission.h"
 
 #define RESET_MAX_RETRIES 3
 
@@ -776,6 +777,14 @@ static void nop_submit_request(struct i915_request *request)
 {
 	RQ_TRACE(request, "-EIO\n");
 
+	/*
+	 * XXX: Kinda ugly to check for GuC submission here but this function is
+	 * going away once we switch to the DRM scheduler so we can live with
+	 * this for now.
+	 */
+	if (intel_engine_uses_guc(request->engine))
+		intel_guc_decr_num_rq_not_ready(request->context);
+
 	request = i915_request_mark_eio(request);
 	if (request) {
 		i915_request_submit(request);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index bd477209839b..26a0225f45e9 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -76,6 +76,7 @@ struct intel_guc {
 	struct ida guc_ids;
 	u32 num_guc_ids;
 	u32 max_guc_ids;
+	atomic_t num_guc_ids_not_ready;
 	struct list_head guc_id_list_no_ref;
 	struct list_head guc_id_list_unpinned;
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 037a7ee4971b..aa5e608deed5 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1323,6 +1323,41 @@ static inline void queue_request(struct i915_sched_engine *sched_engine,
 		kick_tasklet(&rq->engine->gt->uc.guc);
 }
 
+/* Macro to tweak heuristic, using a simple over 50% not ready for now */
+#define TOO_MANY_GUC_IDS_NOT_READY(avail, consumed) \
+	(consumed > avail / 2)
+static bool too_many_guc_ids_not_ready(struct intel_guc *guc,
+				       struct intel_context *ce)
+{
+	u32 available_guc_ids, guc_ids_consumed;
+
+	available_guc_ids = guc->num_guc_ids;
+	guc_ids_consumed = atomic_read(&guc->num_guc_ids_not_ready);
+
+	if (TOO_MANY_GUC_IDS_NOT_READY(available_guc_ids, guc_ids_consumed)) {
+		set_and_update_guc_ids_exhausted(guc);
+		return true;
+	}
+
+	return false;
+}
+
+static void incr_num_rq_not_ready(struct intel_context *ce)
+{
+	struct intel_guc *guc = ce_to_guc(ce);
+
+	if (!atomic_fetch_add(1, &ce->guc_num_rq_not_ready))
+		atomic_inc(&guc->num_guc_ids_not_ready);
+}
+
+void intel_guc_decr_num_rq_not_ready(struct intel_context *ce)
+{
+	struct intel_guc *guc = ce_to_guc(ce);
+
+	if (atomic_fetch_add(-1, &ce->guc_num_rq_not_ready) == 1)
+		atomic_dec(&guc->num_guc_ids_not_ready);
+}
+
 static bool need_tasklet(struct intel_guc *guc, struct intel_context *ce)
 {
 	struct i915_sched_engine * const sched_engine =
@@ -1369,6 +1404,8 @@ static void guc_submit_request(struct i915_request *rq)
 		kick_tasklet(guc);
 
 	spin_unlock_irqrestore(&sched_engine->lock, flags);
+
+	intel_guc_decr_num_rq_not_ready(rq->context);
 }
 
 #define GUC_ID_START	64	/* First 64 guc_ids reserved */
@@ -2240,10 +2277,13 @@ static int guc_request_alloc(struct i915_request *rq)
 	GEM_BUG_ON(!intel_context_is_pinned(rq->context));
 
 	/*
-	 * guc_ids are exhausted, don't allocate one here, defer to submission
-	 * in the tasklet.
+	 * guc_ids are exhausted or a heuristic is met indicating too many
+	 * guc_ids are waiting on requests with submission dependencies (not
+	 * ready to submit). Don't allocate one here, defer to submission in the
+	 * tasklet.
 	 */
-	if (test_and_update_guc_ids_exhausted(guc)) {
+	if (test_and_update_guc_ids_exhausted(guc) ||
+	    too_many_guc_ids_not_ready(guc, ce)) {
 		set_bit(I915_FENCE_FLAG_GUC_ID_NOT_PINNED, &rq->fence.flags);
 		goto out;
 	}
@@ -2299,6 +2339,7 @@ static int guc_request_alloc(struct i915_request *rq)
 		 */
 		set_bit(I915_FENCE_FLAG_GUC_ID_NOT_PINNED, &rq->fence.flags);
 		set_and_update_guc_ids_exhausted(guc);
+		incr_num_rq_not_ready(ce);
 
 		return 0;
 	} else if (unlikely(ret < 0)) {
@@ -2321,6 +2362,8 @@ static int guc_request_alloc(struct i915_request *rq)
 	clear_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
 
 out:
+	incr_num_rq_not_ready(ce);
+
 	/*
 	 * We block all requests on this context if a G2H is pending for a
 	 * schedule disable or context deregistration as the GuC will fail a
@@ -3088,6 +3131,8 @@ void intel_guc_log_submission_info(struct intel_guc *guc,
 	drm_printf(p, "GuC submit flags: 0x%04lx\n", guc->flags);
 	drm_printf(p, "GuC total number request without guc_id: %d\n",
 		   guc->total_num_rq_with_no_guc_id);
+	drm_printf(p, "GuC Number GuC IDs not ready: %d\n",
+		   atomic_read(&guc->num_guc_ids_not_ready));
 	drm_printf(p, "GuC stall reason: %d\n", guc->submission_stall_reason);
 	drm_printf(p, "GuC stalled request: %s\n",
 		   yesno(guc->stalled_rq));
@@ -3127,6 +3172,8 @@ void intel_guc_log_context_info(struct intel_guc *guc,
 			   atomic_read(&ce->pin_count));
 		drm_printf(p, "\t\tGuC ID Ref Count: %u\n",
 			   atomic_read(&ce->guc_id_ref));
+		drm_printf(p, "\t\tNumber Requests Not Ready: %u\n",
+			   atomic_read(&ce->guc_num_rq_not_ready));
 		drm_printf(p, "\t\tSchedule State: 0x%x, 0x%x\n\n",
 			   ce->guc_state.sched_state,
 			   atomic_read(&ce->guc_sched_state_no_lock));
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
index a2a3fad72be1..60c8b9aaad6e 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
@@ -51,4 +51,6 @@ static inline bool intel_guc_submission_is_used(struct intel_guc *guc)
 	return intel_guc_is_used(guc) && intel_guc_submission_is_wanted(guc);
 }
 
+void intel_guc_decr_num_rq_not_ready(struct intel_context *ce);
+
 #endif
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 85/97] drm/i915/guc: Introduce guc_submit_engine object
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (83 preceding siblings ...)
  2021-05-06 19:14 ` [RFC PATCH 84/97] drm/i915/guc: Don't allow requests not ready to consume all guc_ids Matthew Brost
@ 2021-05-06 19:14 ` Matthew Brost
  2021-05-06 19:14 ` [RFC PATCH 86/97] drm/i915/guc: Add golden context to GuC ADS Matthew Brost
                   ` (14 subsequent siblings)
  99 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

Move the fields that control the GuC submission state machine out of the
global GuC state (intel_guc) and into a dedicated object
(guc_submit_engine). This encapsulation allows multiple submission
objects to operate in parallel: one instance can block if needed while
another makes forward progress. This is analogous to how execlist mode
assigns a scheduling object per physical engine, except that in GuC mode
the scheduling object is assigned based on blocking dependencies.

The guc_submit_engine object also encapsulates the i915_sched_engine
object.

Lots of find-replace.

Currently only one guc_submit_engine is instantiated; future patches
will instantiate more.
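
As a rough sketch of the new layering (simplified stand-ins, not the
driver structures verbatim): guc_submit_engine embeds an
i915_sched_engine by value, so code holding a sched_engine pointer can
recover its gse with container_of(), which is what the ce_to_gse() and
guc_to_sched_engine() helpers below rely on:

  #include <stddef.h>

  struct sched_engine_sketch {
          int queue_priority_hint;        /* plus queue, lock, tasklet, ... */
  };

  struct guc_submit_engine_sketch {
          struct sched_engine_sketch sched_engine; /* embedded, not a pointer */
          int id;
          /* stalled_rq, stalled_context, flags, submission_stall_reason, ... */
  };

  /* container_of() as in the kernel, spelled out for self-containment */
  #define sketch_container_of(ptr, type, member) \
          ((type *)((char *)(ptr) - offsetof(type, member)))

  static struct guc_submit_engine_sketch *
  se_to_gse(struct sched_engine_sketch *se)
  {
          return sketch_container_of(se, struct guc_submit_engine_sketch,
                                     sched_engine);
  }

Because the embedding is by value, a single kzalloc() per gse allocates
both objects, and the sched_engine's kref can drive
guc_sched_engine_destroy() to free the whole guc_submit_engine, as done
in this patch.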

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  33 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 544 +++++++++++-------
 .../i915/gt/uc/intel_guc_submission_types.h   |  53 ++
 drivers/gpu/drm/i915/i915_scheduler.c         |  25 +-
 drivers/gpu/drm/i915/i915_scheduler.h         |   5 +-
 drivers/gpu/drm/i915/i915_scheduler_types.h   |   3 +
 6 files changed, 411 insertions(+), 252 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gt/uc/intel_guc_submission_types.h

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 26a0225f45e9..904f3a941832 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -20,6 +20,11 @@
 
 struct __guc_ads_blob;
 
+enum {
+	GUC_SUBMIT_ENGINE_SINGLE_LRC,
+	GUC_SUBMIT_ENGINE_MAX
+};
+
 /*
  * Top level structure of GuC. It handles firmware loading and manages client
  * pool. intel_guc owns a intel_guc_client to replace the legacy ExecList
@@ -30,31 +35,6 @@ struct intel_guc {
 	struct intel_guc_log log;
 	struct intel_guc_ct ct;
 
-	/* Global engine used to submit requests to GuC */
-	struct i915_sched_engine *sched_engine;
-
-	/* Global state related to submission tasklet */
-	struct i915_request *stalled_rq;
-	struct intel_context *stalled_context;
-	struct work_struct retire_worker;
-	unsigned long flags;
-	int total_num_rq_with_no_guc_id;
-
-	/*
-	 * Submisson stall reason. See intel_guc_submission.c for detailed
-	 * description.
-	 */
-	enum {
-		STALL_NONE,
-		STALL_GUC_ID_WORKQUEUE,
-		STALL_GUC_ID_TASKLET,
-		STALL_SCHED_DISABLE,
-		STALL_REGISTER_CONTEXT,
-		STALL_DEREGISTER_CONTEXT,
-		STALL_MOVE_LRC_TAIL,
-		STALL_ADD_REQUEST,
-	} submission_stall_reason;
-
 	/* intel_guc_recv interrupt related state */
 	spinlock_t irq_lock;
 	unsigned int msg_enabled_mask;
@@ -68,6 +48,8 @@ struct intel_guc {
 		void (*disable)(struct intel_guc *guc);
 	} interrupts;
 
+	struct guc_submit_engine *gse[GUC_SUBMIT_ENGINE_MAX];
+
 	/*
 	 * contexts_lock protects the pool of free guc ids and a linked list of
 	 * guc ids available to be stolden
@@ -76,7 +58,6 @@ struct intel_guc {
 	struct ida guc_ids;
 	u32 num_guc_ids;
 	u32 max_guc_ids;
-	atomic_t num_guc_ids_not_ready;
 	struct list_head guc_id_list_no_ref;
 	struct list_head guc_id_list_unpinned;
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index aa5e608deed5..9dc0ffc07cd7 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -21,6 +21,7 @@
 #include "gt/intel_ring.h"
 
 #include "intel_guc_submission.h"
+#include "intel_guc_submission_types.h"
 
 #include "i915_drv.h"
 #include "i915_trace.h"
@@ -57,7 +58,7 @@
  * WQ_TYPE_INORDER is needed to support legacy submission via GuC, which
  * represents in-order queue. The kernel driver packs ring tail pointer and an
  * ELSP context descriptor dword into Work Item.
- * See guc_add_request()
+ * See gse_add_request()
  *
  * GuC flow control state machine:
  * The tasklet, workqueue (retire_worker), and the G2H handlers together more or
@@ -80,57 +81,57 @@
  *				context)
  */
 
-/* GuC Virtual Engine */
-struct guc_virtual_engine {
-	struct intel_engine_cs base;
-	struct intel_context context;
-};
-
 static struct intel_context *
 guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count);
 
 #define GUC_REQUEST_SIZE 64 /* bytes */
 
+static inline struct guc_submit_engine *ce_to_gse(struct intel_context *ce)
+{
+	return container_of(ce->engine->sched_engine, struct guc_submit_engine,
+			    sched_engine);
+}
+
 /*
  * Global GuC flags helper functions
  */
 enum {
-	GUC_STATE_TASKLET_BLOCKED,
-	GUC_STATE_GUC_IDS_EXHAUSTED,
+	GSE_STATE_TASKLET_BLOCKED,
+	GSE_STATE_GUC_IDS_EXHAUSTED,
 };
 
-static bool tasklet_blocked(struct intel_guc *guc)
+static bool tasklet_blocked(struct guc_submit_engine *gse)
 {
-	return test_bit(GUC_STATE_TASKLET_BLOCKED, &guc->flags);
+	return test_bit(GSE_STATE_TASKLET_BLOCKED, &gse->flags);
 }
 
-static void set_tasklet_blocked(struct intel_guc *guc)
+static void set_tasklet_blocked(struct guc_submit_engine *gse)
 {
-	lockdep_assert_held(&guc->sched_engine->lock);
-	set_bit(GUC_STATE_TASKLET_BLOCKED, &guc->flags);
+	lockdep_assert_held(&gse->sched_engine.lock);
+	set_bit(GSE_STATE_TASKLET_BLOCKED, &gse->flags);
 }
 
-static void __clr_tasklet_blocked(struct intel_guc *guc)
+static void __clr_tasklet_blocked(struct guc_submit_engine *gse)
 {
-	lockdep_assert_held(&guc->sched_engine->lock);
-	clear_bit(GUC_STATE_TASKLET_BLOCKED, &guc->flags);
+	lockdep_assert_held(&gse->sched_engine.lock);
+	clear_bit(GSE_STATE_TASKLET_BLOCKED, &gse->flags);
 }
 
-static void clr_tasklet_blocked(struct intel_guc *guc)
+static void clr_tasklet_blocked(struct guc_submit_engine *gse)
 {
 	unsigned long flags;
 
-	spin_lock_irqsave(&guc->sched_engine->lock, flags);
-	__clr_tasklet_blocked(guc);
-	spin_unlock_irqrestore(&guc->sched_engine->lock, flags);
+	spin_lock_irqsave(&gse->sched_engine.lock, flags);
+	__clr_tasklet_blocked(gse);
+	spin_unlock_irqrestore(&gse->sched_engine.lock, flags);
 }
 
-static bool guc_ids_exhausted(struct intel_guc *guc)
+static bool guc_ids_exhausted(struct guc_submit_engine *gse)
 {
-	return test_bit(GUC_STATE_GUC_IDS_EXHAUSTED, &guc->flags);
+	return test_bit(GSE_STATE_GUC_IDS_EXHAUSTED, &gse->flags);
 }
 
-static bool test_and_update_guc_ids_exhausted(struct intel_guc *guc)
+static bool test_and_update_guc_ids_exhausted(struct guc_submit_engine *gse)
 {
 	unsigned long flags;
 	bool ret = false;
@@ -139,33 +140,33 @@ static bool test_and_update_guc_ids_exhausted(struct intel_guc *guc)
 	 * Strict ordering on checking if guc_ids are exhausted isn't required,
 	 * so let's avoid grabbing the submission lock if possible.
 	 */
-	if (guc_ids_exhausted(guc)) {
-		spin_lock_irqsave(&guc->sched_engine->lock, flags);
-		ret = guc_ids_exhausted(guc);
+	if (guc_ids_exhausted(gse)) {
+		spin_lock_irqsave(&gse->sched_engine.lock, flags);
+		ret = guc_ids_exhausted(gse);
 		if (ret)
-			++guc->total_num_rq_with_no_guc_id;
-		spin_unlock_irqrestore(&guc->sched_engine->lock, flags);
+			++gse->total_num_rq_with_no_guc_id;
+		spin_unlock_irqrestore(&gse->sched_engine.lock, flags);
 	}
 
 	return ret;
 }
 
-static void set_and_update_guc_ids_exhausted(struct intel_guc *guc)
+static void set_and_update_guc_ids_exhausted(struct guc_submit_engine *gse)
 {
 	unsigned long flags;
 
-	spin_lock_irqsave(&guc->sched_engine->lock, flags);
-	++guc->total_num_rq_with_no_guc_id;
-	set_bit(GUC_STATE_GUC_IDS_EXHAUSTED, &guc->flags);
-	spin_unlock_irqrestore(&guc->sched_engine->lock, flags);
+	spin_lock_irqsave(&gse->sched_engine.lock, flags);
+	++gse->total_num_rq_with_no_guc_id;
+	set_bit(GSE_STATE_GUC_IDS_EXHAUSTED, &gse->flags);
+	spin_unlock_irqrestore(&gse->sched_engine.lock, flags);
 }
 
-static void clr_guc_ids_exhausted(struct intel_guc *guc)
+static void clr_guc_ids_exhausted(struct guc_submit_engine *gse)
 {
-	lockdep_assert_held(&guc->sched_engine->lock);
-	GEM_BUG_ON(guc->total_num_rq_with_no_guc_id);
+	lockdep_assert_held(&gse->sched_engine.lock);
+	GEM_BUG_ON(gse->total_num_rq_with_no_guc_id);
 
-	clear_bit(GUC_STATE_GUC_IDS_EXHAUSTED, &guc->flags);
+	clear_bit(GSE_STATE_GUC_IDS_EXHAUSTED, &gse->flags);
 }
 
 /*
@@ -372,6 +373,20 @@ static inline struct intel_guc *ce_to_guc(struct intel_context *ce)
 	return &ce->engine->gt->uc.guc;
 }
 
+static inline struct i915_sched_engine *
+ce_to_sched_engine(struct intel_context *ce)
+{
+	return ce->engine->sched_engine;
+}
+
+static inline struct i915_sched_engine *
+guc_to_sched_engine(struct intel_guc *guc, int index)
+{
+	GEM_BUG_ON(index < 0 || index >= GUC_SUBMIT_ENGINE_MAX);
+
+	return &guc->gse[index]->sched_engine;
+}
+
 static inline struct i915_priolist *to_priolist(struct rb_node *rb)
 {
 	return rb_entry(rb, struct i915_priolist, node);
@@ -591,19 +606,20 @@ static int __guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 	return err;
 }
 
-static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
+static int gse_add_request(struct guc_submit_engine *gse,
+			   struct i915_request *rq)
 {
 	int ret;
 
-	lockdep_assert_held(&guc->sched_engine->lock);
+	lockdep_assert_held(&gse->sched_engine.lock);
 
-	ret = __guc_add_request(guc, rq);
+	ret = __guc_add_request(gse->guc, rq);
 	if (ret == -EBUSY) {
-		guc->stalled_rq = rq;
-		guc->submission_stall_reason = STALL_ADD_REQUEST;
+		gse->stalled_rq = rq;
+		gse->submission_stall_reason = STALL_ADD_REQUEST;
 	} else {
-		guc->stalled_rq = NULL;
-		guc->submission_stall_reason = STALL_NONE;
+		gse->stalled_rq = NULL;
+		gse->submission_stall_reason = STALL_NONE;
 	}
 
 	return ret;
@@ -611,14 +627,14 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 
 static int guc_lrc_desc_pin(struct intel_context *ce, bool loop);
 
-static int tasklet_register_context(struct intel_guc *guc,
+static int tasklet_register_context(struct guc_submit_engine *gse,
 				    struct i915_request *rq)
 {
 	struct intel_context *ce = rq->context;
 	int ret = 0;
 
 	/* Check state */
-	lockdep_assert_held(&guc->sched_engine->lock);
+	lockdep_assert_held(&gse->sched_engine.lock);
 	GEM_BUG_ON(ce->guc_num_rq_submit_no_id);
 	GEM_BUG_ON(request_has_no_guc_id(rq));
 	GEM_BUG_ON(context_guc_id_invalid(ce));
@@ -631,7 +647,7 @@ static int tasklet_register_context(struct intel_guc *guc,
 	 * register this context or a corner case where the GuC firwmare was
 	 * blown away and reloaded while this context was pinned
 	 */
-	if (unlikely((!lrc_desc_registered(guc, ce->guc_id) ||
+	if (unlikely((!lrc_desc_registered(gse->guc, ce->guc_id) ||
 		      context_needs_register(ce)) &&
 		     !intel_context_is_banned(ce))) {
 		ret = guc_lrc_desc_pin(ce, false);
@@ -640,11 +656,11 @@ static int tasklet_register_context(struct intel_guc *guc,
 			clr_context_needs_register(ce);
 
 		if (unlikely(ret == -EBUSY)) {
-			guc->stalled_rq = rq;
-			guc->submission_stall_reason = STALL_REGISTER_CONTEXT;
+			gse->stalled_rq = rq;
+			gse->submission_stall_reason = STALL_REGISTER_CONTEXT;
 		} else if (unlikely(ret == -EINPROGRESS)) {
-			guc->stalled_rq = rq;
-			guc->submission_stall_reason = STALL_DEREGISTER_CONTEXT;
+			gse->stalled_rq = rq;
+			gse->submission_stall_reason = STALL_DEREGISTER_CONTEXT;
 		}
 	}
 
@@ -663,28 +679,29 @@ static inline int rq_prio(const struct i915_request *rq)
 	return rq->sched.attr.priority;
 }
 
-static void kick_retire_wq(struct intel_guc *guc)
+static void kick_retire_wq(struct guc_submit_engine *gse)
 {
-	queue_work(system_unbound_wq, &guc->retire_worker);
+	queue_work(system_unbound_wq, &gse->retire_worker);
 }
 
-static int tasklet_pin_guc_id(struct intel_guc *guc, struct i915_request *rq);
+static int tasklet_pin_guc_id(struct guc_submit_engine *gse,
+			      struct i915_request *rq);
 
-static int guc_dequeue_one_context(struct intel_guc *guc)
+static int gse_dequeue_one_context(struct guc_submit_engine *gse)
 {
-	struct i915_sched_engine * const sched_engine = guc->sched_engine;
-	struct i915_request *last = guc->stalled_rq;
+	struct i915_sched_engine * const sched_engine = &gse->sched_engine;
+	struct i915_request *last = gse->stalled_rq;
 	bool submit = !!last;
 	struct rb_node *rb;
 	int ret;
 
 	lockdep_assert_held(&sched_engine->lock);
-	GEM_BUG_ON(guc->stalled_context);
-	GEM_BUG_ON(!submit && guc->submission_stall_reason);
+	GEM_BUG_ON(gse->stalled_context);
+	GEM_BUG_ON(!submit && gse->submission_stall_reason);
 
 	if (submit) {
 		/* Flow control conditions */
-		switch (guc->submission_stall_reason) {
+		switch (gse->submission_stall_reason) {
 		case STALL_GUC_ID_TASKLET:
 			goto done;
 		case STALL_REGISTER_CONTEXT:
@@ -697,8 +714,8 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
 			GEM_BUG_ON("Invalid stall state");
 		}
 	} else {
-		GEM_BUG_ON(!guc->total_num_rq_with_no_guc_id &&
-			   guc_ids_exhausted(guc));
+		GEM_BUG_ON(!gse->total_num_rq_with_no_guc_id &&
+			   guc_ids_exhausted(gse));
 
 		while ((rb = rb_first_cached(&sched_engine->queue))) {
 			struct i915_priolist *p = to_priolist(rb);
@@ -727,13 +744,13 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
 		struct intel_context *ce = last->context;
 
 		if (ce->guc_num_rq_submit_no_id) {
-			ret = tasklet_pin_guc_id(guc, last);
+			ret = tasklet_pin_guc_id(gse, last);
 			if (ret)
 				goto blk_tasklet_kick;
 		}
 
 register_context:
-		ret = tasklet_register_context(guc, last);
+		ret = tasklet_register_context(gse, last);
 		if (unlikely(ret == -EINPROGRESS))
 			goto blk_tasklet;
 		else if (unlikely(ret == -EDEADLK))
@@ -749,7 +766,7 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
 		guc_set_lrc_tail(last);
 
 add_request:
-		ret = guc_add_request(guc, last);
+		ret = gse_add_request(gse, last);
 		if (unlikely(ret == -EDEADLK))
 			goto deadlk;
 		else if (ret == -EBUSY)
@@ -764,8 +781,8 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
 	 * No requests without a guc_id, enable guc_id allocation at request
 	 * creation time (guc_request_alloc).
 	 */
-	if (!guc->total_num_rq_with_no_guc_id)
-		clr_guc_ids_exhausted(guc);
+	if (!gse->total_num_rq_with_no_guc_id)
+		clr_guc_ids_exhausted(gse);
 
 	return submit;
 
@@ -780,25 +797,26 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
 	return false;
 
 blk_tasklet_kick:
-	kick_retire_wq(guc);
+	kick_retire_wq(gse);
 blk_tasklet:
-	set_tasklet_blocked(guc);
+	set_tasklet_blocked(gse);
 	return false;
 }
 
-static void guc_submission_tasklet(struct tasklet_struct *t)
+static void gse_submission_tasklet(struct tasklet_struct *t)
 {
 	struct i915_sched_engine *sched_engine =
 		from_tasklet(sched_engine, t, tasklet);
-	struct intel_guc *guc = &sched_engine->engine->gt->uc.guc;
+	struct guc_submit_engine *gse =
+		container_of(sched_engine, typeof(*gse), sched_engine);
 	unsigned long flags;
 	bool loop;
 
 	spin_lock_irqsave(&sched_engine->lock, flags);
 
-	if (likely(!tasklet_blocked(guc)))
+	if (likely(!tasklet_blocked(gse)))
 		do {
-			loop = guc_dequeue_one_context(guc);
+			loop = gse_dequeue_one_context(gse);
 		} while (loop);
 
 	i915_sched_engine_reset_on_empty(sched_engine);
@@ -871,65 +889,92 @@ static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
 static inline bool
 submission_disabled(struct intel_guc *guc)
 {
-	struct i915_sched_engine * const sched_engine = guc->sched_engine;
+	int i;
+
+	for (i = 0; i < GUC_SUBMIT_ENGINE_MAX; ++i) {
+		struct i915_sched_engine *sched_engine =
+			guc_to_sched_engine(guc, i);
+
+		if (unlikely(!__tasklet_is_enabled(&sched_engine->tasklet)))
+			return true;
+	}
 
-	return unlikely(!__tasklet_is_enabled(&sched_engine->tasklet));
+	return false;
 }
 
-static void kick_tasklet(struct intel_guc *guc)
+static void kick_tasklet(struct guc_submit_engine *gse)
 {
-	struct i915_sched_engine * const sched_engine = guc->sched_engine;
+	struct i915_sched_engine *sched_engine = &gse->sched_engine;
 
-	if (likely(!tasklet_blocked(guc)))
+	if (likely(!tasklet_blocked(gse)))
 		i915_sched_engine_hi_kick(sched_engine);
 }
 
 static void disable_submission(struct intel_guc *guc)
 {
-	struct i915_sched_engine * const sched_engine = guc->sched_engine;
+	int i;
 
-	if (__tasklet_is_enabled(&sched_engine->tasklet)) {
-		GEM_BUG_ON(!guc->ct.enabled);
-		__tasklet_disable_sync_once(&sched_engine->tasklet);
-		sched_engine->tasklet.callback = NULL;
+	for (i = 0; i < GUC_SUBMIT_ENGINE_MAX; ++i) {
+		struct i915_sched_engine *sched_engine =
+			guc_to_sched_engine(guc, i);
+
+		if (__tasklet_is_enabled(&sched_engine->tasklet)) {
+			GEM_BUG_ON(!guc->ct.enabled);
+			__tasklet_disable_sync_once(&sched_engine->tasklet);
+			sched_engine->tasklet.callback = NULL;
+		}
 	}
 }
 
 static void enable_submission(struct intel_guc *guc)
 {
-	struct i915_sched_engine * const sched_engine = guc->sched_engine;
 	unsigned long flags;
+	int i;
 
-	spin_lock_irqsave(&guc->sched_engine->lock, flags);
-	sched_engine->tasklet.callback = guc_submission_tasklet;
-	wmb();
-	if (!__tasklet_is_enabled(&sched_engine->tasklet) &&
-	    __tasklet_enable(&sched_engine->tasklet)) {
-		GEM_BUG_ON(!guc->ct.enabled);
-
-		/* Reset tasklet state */
-		guc->stalled_rq = NULL;
-		if (guc->stalled_context)
-			intel_context_put(guc->stalled_context);
-		guc->stalled_context = NULL;
-		guc->submission_stall_reason = STALL_NONE;
-		guc->flags = 0;
-
-		/* And kick in case we missed a new request submission. */
-		kick_tasklet(guc);
+	for (i = 0; i < GUC_SUBMIT_ENGINE_MAX; ++i) {
+		struct i915_sched_engine *sched_engine =
+			guc_to_sched_engine(guc, i);
+		struct guc_submit_engine *gse = guc->gse[i];
+
+		spin_lock_irqsave(&sched_engine->lock, flags);
+		sched_engine->tasklet.callback = gse_submission_tasklet;
+		wmb();
+		if (!__tasklet_is_enabled(&sched_engine->tasklet) &&
+		    __tasklet_enable(&sched_engine->tasklet)) {
+			GEM_BUG_ON(!guc->ct.enabled);
+
+			/* Reset GuC submit engine state */
+			gse->stalled_rq = NULL;
+			if (gse->stalled_context)
+				intel_context_put(gse->stalled_context);
+			gse->stalled_context = NULL;
+			gse->submission_stall_reason = STALL_NONE;
+			gse->flags = 0;
+
+			/* And kick in case we missed a new request submission. */
+			kick_tasklet(gse);
+		}
+		spin_unlock_irqrestore(&sched_engine->lock, flags);
 	}
-	spin_unlock_irqrestore(&guc->sched_engine->lock, flags);
 }
 
-static void guc_flush_submissions(struct intel_guc *guc)
+static void gse_flush_submissions(struct guc_submit_engine *gse)
 {
-	struct i915_sched_engine * const sched_engine = guc->sched_engine;
+	struct i915_sched_engine * const sched_engine = &gse->sched_engine;
 	unsigned long flags;
 
 	spin_lock_irqsave(&sched_engine->lock, flags);
 	spin_unlock_irqrestore(&sched_engine->lock, flags);
 }
 
+static void guc_flush_submissions(struct intel_guc *guc)
+{
+	int i;
+
+	for (i = 0; i < GUC_SUBMIT_ENGINE_MAX; ++i)
+		gse_flush_submissions(guc->gse[i]);
+}
+
 void intel_guc_submission_reset_prepare(struct intel_guc *guc)
 {
 	int i;
@@ -1111,13 +1156,12 @@ void intel_guc_submission_reset(struct intel_guc *guc, bool stalled)
 		if (intel_context_is_pinned(ce))
 			__guc_reset_context(ce, stalled);
 
-	/* GuC is blown away, drop all references to contexts */
 	xa_destroy(&guc->context_lookup);
 }
 
 static void guc_cancel_context_requests(struct intel_context *ce)
 {
-	struct i915_sched_engine *sched_engine = ce_to_guc(ce)->sched_engine;
+	struct i915_sched_engine *sched_engine = ce_to_sched_engine(ce);
 	struct i915_request *rq;
 	unsigned long flags;
 
@@ -1132,8 +1176,9 @@ static void guc_cancel_context_requests(struct intel_context *ce)
 }
 
 static void
-guc_cancel_sched_engine_requests(struct i915_sched_engine *sched_engine)
+gse_cancel_requests(struct guc_submit_engine *gse)
 {
+	struct i915_sched_engine *sched_engine = &gse->sched_engine;
 	struct i915_request *rq, *rn;
 	struct rb_node *rb;
 	unsigned long flags;
@@ -1190,12 +1235,14 @@ void intel_guc_submission_cancel_requests(struct intel_guc *guc)
 {
 	struct intel_context *ce;
 	unsigned long index;
+	int i;
 
 	xa_for_each(&guc->context_lookup, index, ce)
 		if (intel_context_is_pinned(ce))
 			guc_cancel_context_requests(ce);
 
-	guc_cancel_sched_engine_requests(guc->sched_engine);
+	for (i = 0; i < GUC_SUBMIT_ENGINE_MAX; ++i)
+		gse_cancel_requests(guc->gse[i]);
 
 	/* GuC is blown away, drop all references to contexts */
 	xa_destroy(&guc->context_lookup);
@@ -1222,13 +1269,13 @@ void intel_guc_submission_reset_finish(struct intel_guc *guc)
 	intel_gt_unpark_heartbeats(guc_to_gt(guc));
 }
 
-static void retire_worker_sched_disable(struct intel_guc *guc,
+static void retire_worker_sched_disable(struct guc_submit_engine *gse,
 					struct intel_context *ce);
 
 static void retire_worker_func(struct work_struct *w)
 {
-	struct intel_guc *guc =
-		container_of(w, struct intel_guc, retire_worker);
+	struct guc_submit_engine *gse =
+		container_of(w, struct guc_submit_engine, retire_worker);
 
 	/*
 	 * It is possible that another thread issues the schedule disable + that
@@ -1236,17 +1283,17 @@ static void retire_worker_func(struct work_struct *w)
 	 * where nothing needs to be done here. Let's be paranoid and kick the
 	 * tasklet in that case.
 	 */
-	if (guc->submission_stall_reason != STALL_SCHED_DISABLE &&
-	    guc->submission_stall_reason != STALL_GUC_ID_WORKQUEUE) {
-		kick_tasklet(guc);
+	if (gse->submission_stall_reason != STALL_SCHED_DISABLE &&
+	    gse->submission_stall_reason != STALL_GUC_ID_WORKQUEUE) {
+		kick_tasklet(gse);
 		return;
 	}
 
-	if (guc->submission_stall_reason == STALL_SCHED_DISABLE) {
-		GEM_BUG_ON(!guc->stalled_context);
-		GEM_BUG_ON(context_guc_id_invalid(guc->stalled_context));
+	if (gse->submission_stall_reason == STALL_SCHED_DISABLE) {
+		GEM_BUG_ON(!gse->stalled_context);
+		GEM_BUG_ON(context_guc_id_invalid(gse->stalled_context));
 
-		retire_worker_sched_disable(guc, guc->stalled_context);
+		retire_worker_sched_disable(gse, gse->stalled_context);
 	}
 
 	/*
@@ -1254,16 +1301,16 @@ static void retire_worker_func(struct work_struct *w)
 	 * albeit after possibly issuing a schedule disable as that is async
 	 * operation.
 	 */
-	intel_gt_retire_requests(guc_to_gt(guc));
+	intel_gt_retire_requests(guc_to_gt(gse->guc));
 
-	if (guc->submission_stall_reason == STALL_GUC_ID_WORKQUEUE) {
-		GEM_BUG_ON(guc->stalled_context);
+	if (gse->submission_stall_reason == STALL_GUC_ID_WORKQUEUE) {
+		GEM_BUG_ON(gse->stalled_context);
 
 		/* Hopefully guc_ids are now available, kick tasklet */
-		guc->submission_stall_reason = STALL_GUC_ID_TASKLET;
-		clr_tasklet_blocked(guc);
+		gse->submission_stall_reason = STALL_GUC_ID_TASKLET;
+		clr_tasklet_blocked(gse);
 
-		kick_tasklet(guc);
+		kick_tasklet(gse);
 	}
 }
 
@@ -1294,18 +1341,24 @@ int intel_guc_submission_init(struct intel_guc *guc)
 	INIT_LIST_HEAD(&guc->guc_id_list_unpinned);
 	ida_init(&guc->guc_ids);
 
-	INIT_WORK(&guc->retire_worker, retire_worker_func);
-
 	return 0;
 }
 
 void intel_guc_submission_fini(struct intel_guc *guc)
 {
+	int i;
+
 	if (!guc->lrc_desc_pool)
 		return;
 
 	guc_lrc_desc_pool_destroy(guc);
-	i915_sched_engine_put(guc->sched_engine);
+
+	for (i = 0; i < GUC_SUBMIT_ENGINE_MAX; ++i) {
+		struct i915_sched_engine *sched_engine =
+			guc_to_sched_engine(guc, i);
+
+		i915_sched_engine_put(sched_engine);
+	}
 }
 
 static inline void queue_request(struct i915_sched_engine *sched_engine,
@@ -1320,22 +1373,22 @@ static inline void queue_request(struct i915_sched_engine *sched_engine,
 	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
 
 	if (empty)
-		kick_tasklet(&rq->engine->gt->uc.guc);
+		kick_tasklet(ce_to_gse(rq->context));
 }
 
 /* Macro to tweak heuristic, using a simple over 50% not ready for now */
 #define TOO_MANY_GUC_IDS_NOT_READY(avail, consumed) \
 	(consumed > avail / 2)
-static bool too_many_guc_ids_not_ready(struct intel_guc *guc,
+static bool too_many_guc_ids_not_ready(struct guc_submit_engine *gse,
 				       struct intel_context *ce)
 {
 	u32 available_guc_ids, guc_ids_consumed;
 
-	available_guc_ids = guc->num_guc_ids;
-	guc_ids_consumed = atomic_read(&guc->num_guc_ids_not_ready);
+	available_guc_ids = gse->guc->num_guc_ids;
+	guc_ids_consumed = atomic_read(&gse->num_guc_ids_not_ready);
 
 	if (TOO_MANY_GUC_IDS_NOT_READY(available_guc_ids, guc_ids_consumed)) {
-		set_and_update_guc_ids_exhausted(guc);
+		set_and_update_guc_ids_exhausted(gse);
 		return true;
 	}
 
@@ -1344,34 +1397,35 @@ static bool too_many_guc_ids_not_ready(struct intel_guc *guc,
 
 static void incr_num_rq_not_ready(struct intel_context *ce)
 {
-	struct intel_guc *guc = ce_to_guc(ce);
+	struct guc_submit_engine *gse = ce_to_gse(ce);
 
 	if (!atomic_fetch_add(1, &ce->guc_num_rq_not_ready))
-		atomic_inc(&guc->num_guc_ids_not_ready);
+		atomic_inc(&gse->num_guc_ids_not_ready);
 }
 
 void intel_guc_decr_num_rq_not_ready(struct intel_context *ce)
 {
-	struct intel_guc *guc = ce_to_guc(ce);
+	struct guc_submit_engine *gse = ce_to_gse(ce);
 
-	if (atomic_fetch_add(-1, &ce->guc_num_rq_not_ready) == 1)
-		atomic_dec(&guc->num_guc_ids_not_ready);
+	if (atomic_fetch_add(-1, &ce->guc_num_rq_not_ready) == 1) {
+		GEM_BUG_ON(!atomic_read(&gse->num_guc_ids_not_ready));
+		atomic_dec(&gse->num_guc_ids_not_ready);
+	}
 }
 
-static bool need_tasklet(struct intel_guc *guc, struct intel_context *ce)
+static bool need_tasklet(struct guc_submit_engine *gse, struct intel_context *ce)
 {
-	struct i915_sched_engine * const sched_engine =
-		ce->engine->sched_engine;
+	struct i915_sched_engine * const sched_engine = &gse->sched_engine;
 
 	lockdep_assert_held(&sched_engine->lock);
 
-	return guc_ids_exhausted(guc) || submission_disabled(guc) ||
-		guc->stalled_rq || guc->stalled_context ||
-		!lrc_desc_registered(guc, ce->guc_id) ||
+	return guc_ids_exhausted(gse) || submission_disabled(gse->guc) ||
+		gse->stalled_rq || gse->stalled_context ||
+		!lrc_desc_registered(gse->guc, ce->guc_id) ||
 		!i915_sched_engine_is_empty(sched_engine);
 }
 
-static int guc_bypass_tasklet_submit(struct intel_guc *guc,
+static int gse_bypass_tasklet_submit(struct guc_submit_engine *gse,
 				     struct i915_request *rq)
 {
 	int ret;
@@ -1381,27 +1435,27 @@ static int guc_bypass_tasklet_submit(struct intel_guc *guc,
 	trace_i915_request_in(rq, 0);
 
 	guc_set_lrc_tail(rq);
-	ret = guc_add_request(guc, rq);
+	ret = gse_add_request(gse, rq);
 
 	if (unlikely(ret == -EDEADLK))
-		disable_submission(guc);
+		disable_submission(gse->guc);
 
 	return ret;
 }
 
 static void guc_submit_request(struct i915_request *rq)
 {
+	struct guc_submit_engine *gse = ce_to_gse(rq->context);
 	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
-	struct intel_guc *guc = &rq->engine->gt->uc.guc;
 	unsigned long flags;
 
 	/* Will be called from irq-context when using foreign fences. */
 	spin_lock_irqsave(&sched_engine->lock, flags);
 
-	if (need_tasklet(guc, rq->context))
+	if (need_tasklet(gse, rq->context))
 		queue_request(sched_engine, rq, rq_prio(rq));
-	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
-		kick_tasklet(guc);
+	else if (gse_bypass_tasklet_submit(gse, rq) == -EBUSY)
+		kick_tasklet(gse);
 
 	spin_unlock_irqrestore(&sched_engine->lock, flags);
 
@@ -1478,8 +1532,9 @@ static int steal_guc_id(struct intel_guc *guc, bool unpinned)
 		 * context.
 		 */
 		if (!unpinned) {
-			GEM_BUG_ON(guc->stalled_context);
-			guc->stalled_context = intel_context_get(ce);
+			GEM_BUG_ON(ce_to_gse(ce)->stalled_context);
+
+			ce_to_gse(ce)->stalled_context = intel_context_get(ce);
 			set_context_guc_id_stolen(ce);
 		} else {
 			set_context_guc_id_invalid(ce);
@@ -1539,7 +1594,7 @@ static int pin_guc_id(struct intel_guc *guc, struct intel_context *ce,
 try_again:
 	spin_lock_irqsave(&guc->contexts_lock, flags);
 
-	if (!tasklet && guc_ids_exhausted(guc)) {
+	if (!tasklet && guc_ids_exhausted(ce_to_gse(ce))) {
 		ret = -EAGAIN;
 		goto out_unlock;
 	}
@@ -1860,7 +1915,7 @@ static void guc_context_ban(struct intel_context *ce, struct i915_request *rq)
 	struct intel_guc *guc = ce_to_guc(ce);
 	unsigned long flags;
 
-	guc_flush_submissions(guc);
+	gse_flush_submissions(ce_to_gse(ce));
 
 	spin_lock_irqsave(&ce->guc_state.lock, flags);
 	set_context_banned(ce);
@@ -1936,7 +1991,7 @@ static void guc_context_sched_disable(struct intel_context *ce)
 	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 
 	with_intel_runtime_pm(runtime_pm, wakeref)
-		__guc_context_sched_disable(guc, ce, guc_id);
+		__guc_context_sched_disable(ce_to_guc(ce), ce, guc_id);
 
 	return;
 unpin:
@@ -2077,7 +2132,7 @@ static void remove_from_context(struct i915_request *rq)
 	if (likely(!request_has_no_guc_id(rq)))
 		atomic_dec(&ce->guc_id_ref);
 	else
-		--ce_to_guc(rq->context)->total_num_rq_with_no_guc_id;
+		--ce_to_gse(rq->context)->total_num_rq_with_no_guc_id;
 	unpin_guc_id(ce_to_guc(ce), ce, false);
 
 	i915_request_notify_execute_cb_imm(rq);
@@ -2136,15 +2191,15 @@ static void invalidate_guc_id_sched_disable(struct intel_context *ce)
 	clr_context_guc_id_stolen(ce);
 }
 
-static void retire_worker_sched_disable(struct intel_guc *guc,
+static void retire_worker_sched_disable(struct guc_submit_engine *gse,
 					struct intel_context *ce)
 {
 	unsigned long flags;
 	bool disabled;
 
-	guc->stalled_context = NULL;
+	gse->stalled_context = NULL;
 	spin_lock_irqsave(&ce->guc_state.lock, flags);
-	disabled = submission_disabled(guc);
+	disabled = submission_disabled(gse->guc);
 	if (!disabled && !context_pending_disable(ce) && context_enabled(ce)) {
 		/*
 		 * Still enabled, issue schedule disable + configure state so
@@ -2168,7 +2223,7 @@ static void retire_worker_sched_disable(struct intel_guc *guc,
 		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 
 		with_intel_runtime_pm(runtime_pm, wakeref)
-			__guc_context_sched_disable(guc, ce, guc_id);
+			__guc_context_sched_disable(gse->guc, ce, guc_id);
 
 		invalidate_guc_id_sched_disable(ce);
 	} else if (!disabled && context_pending_disable(ce)) {
@@ -2188,10 +2243,10 @@ static void retire_worker_sched_disable(struct intel_guc *guc,
 
 		invalidate_guc_id_sched_disable(ce);
 
-		guc->submission_stall_reason = STALL_REGISTER_CONTEXT;
-		clr_tasklet_blocked(guc);
+		gse->submission_stall_reason = STALL_REGISTER_CONTEXT;
+		clr_tasklet_blocked(gse);
 
-		kick_tasklet(ce_to_guc(ce));
+		kick_tasklet(gse);
 	}
 
 	intel_context_put(ce);
@@ -2204,25 +2259,26 @@ static bool context_needs_lrc_desc_pin(struct intel_context *ce, bool new_guc_id
 		!submission_disabled(ce_to_guc(ce));
 }
 
-static int tasklet_pin_guc_id(struct intel_guc *guc, struct i915_request *rq)
+static int tasklet_pin_guc_id(struct guc_submit_engine *gse,
+			      struct i915_request *rq)
 {
 	struct intel_context *ce = rq->context;
 	int ret = 0;
 
-	lockdep_assert_held(&guc->sched_engine->lock);
+	lockdep_assert_held(&gse->sched_engine.lock);
 	GEM_BUG_ON(!ce->guc_num_rq_submit_no_id);
 
 	if (atomic_add_unless(&ce->guc_id_ref, ce->guc_num_rq_submit_no_id, 0))
 		goto out;
 
-	ret = pin_guc_id(guc, ce, true);
+	ret = pin_guc_id(gse->guc, ce, true);
 	if (unlikely(ret < 0)) {
 		/*
 		 * No guc_ids available, disable the tasklet and kick the retire
 		 * workqueue hopefully freeing up some guc_ids.
 		 */
-		guc->stalled_rq = rq;
-		guc->submission_stall_reason = STALL_GUC_ID_WORKQUEUE;
+		gse->stalled_rq = rq;
+		gse->submission_stall_reason = STALL_GUC_ID_WORKQUEUE;
 		return ret;
 	}
 
@@ -2234,14 +2290,14 @@ static int tasklet_pin_guc_id(struct intel_guc *guc, struct i915_request *rq)
 		set_context_needs_register(ce);
 
 	if (ret == NEW_GUC_ID_ENABLED) {
-		guc->stalled_rq = rq;
-		guc->submission_stall_reason = STALL_SCHED_DISABLE;
+		gse->stalled_rq = rq;
+		gse->submission_stall_reason = STALL_SCHED_DISABLE;
 	}
 
 	clear_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
 out:
-	guc->total_num_rq_with_no_guc_id -= ce->guc_num_rq_submit_no_id;
-	GEM_BUG_ON(guc->total_num_rq_with_no_guc_id < 0);
+	gse->total_num_rq_with_no_guc_id -= ce->guc_num_rq_submit_no_id;
+	GEM_BUG_ON(gse->total_num_rq_with_no_guc_id < 0);
 
 	list_for_each_entry_reverse(rq, &ce->guc_active.requests, sched.link)
 		if (request_has_no_guc_id(rq)) {
@@ -2259,7 +2315,7 @@ static int tasklet_pin_guc_id(struct intel_guc *guc, struct i915_request *rq)
 	 * from a context that has scheduling enabled. We have to disable
 	 * scheduling before deregistering the context and it isn't safe to do
 	 * in the tasklet because of lock inversion (ce->guc_state.lock must be
-	 * acquired before guc->sched_engine->lock). To work around this
+	 * acquired before gse->sched_engine.lock). To work around this
 	 * we do the schedule disable in retire workqueue and block the tasklet
 	 * until the schedule done G2H returns. Returning non-zero here kicks
 	 * the workqueue.
@@ -2271,6 +2327,7 @@ static int guc_request_alloc(struct i915_request *rq)
 {
 	struct intel_context *ce = rq->context;
 	struct intel_guc *guc = ce_to_guc(ce);
+	struct guc_submit_engine *gse = ce_to_gse(ce);
 	unsigned long flags;
 	int ret;
 
@@ -2282,8 +2339,8 @@ static int guc_request_alloc(struct i915_request *rq)
 	 * ready to submit). Don't allocate one here, defer to submission in the
 	 * tasklet.
 	 */
-	if (test_and_update_guc_ids_exhausted(guc) ||
-	    too_many_guc_ids_not_ready(guc, ce)) {
+	if (test_and_update_guc_ids_exhausted(gse) ||
+	    too_many_guc_ids_not_ready(gse, ce)) {
 		set_bit(I915_FENCE_FLAG_GUC_ID_NOT_PINNED, &rq->fence.flags);
 		goto out;
 	}
@@ -2338,7 +2395,7 @@ static int guc_request_alloc(struct i915_request *rq)
 		 * submissions we return to allocating guc_ids in this function.
 		 */
 		set_bit(I915_FENCE_FLAG_GUC_ID_NOT_PINNED, &rq->fence.flags);
-		set_and_update_guc_ids_exhausted(guc);
+		set_and_update_guc_ids_exhausted(gse);
 		incr_num_rq_not_ready(ce);
 
 		return 0;
@@ -2729,10 +2786,37 @@ static inline void guc_default_irqs(struct intel_engine_cs *engine)
 	intel_engine_set_irq_handler(engine, cs_irq_handler);
 }
 
+static void guc_sched_engine_destroy(struct kref *kref)
+{
+	struct i915_sched_engine *sched_engine =
+		container_of(kref, typeof(*sched_engine), ref);
+	struct guc_submit_engine *gse =
+		container_of(sched_engine, typeof(*gse), sched_engine);
+
+	i915_sched_engine_kill(sched_engine); /* flush the callback */
+	kfree(gse);
+}
+
+static void guc_submit_engine_init(struct intel_guc *guc,
+				   struct guc_submit_engine *gse,
+				   int id)
+{
+	i915_sched_engine_init(&gse->sched_engine, ENGINE_VIRTUAL);
+	INIT_WORK(&gse->retire_worker, retire_worker_func);
+	tasklet_setup(&gse->sched_engine.tasklet, gse_submission_tasklet);
+	gse->sched_engine.schedule = i915_schedule;
+	gse->sched_engine.disabled = guc_sched_engine_disabled;
+	gse->sched_engine.destroy = guc_sched_engine_destroy;
+	gse->guc = guc;
+	gse->id = id;
+}
+
 int intel_guc_submission_setup(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *i915 = engine->i915;
 	struct intel_guc *guc = &engine->gt->uc.guc;
+	struct i915_sched_engine *sched_engine;
+	int ret, i;
 
 	/*
 	 * The setup relies on several assumptions (e.g. irqs always enabled)
@@ -2740,19 +2824,20 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
 	 */
 	GEM_BUG_ON(INTEL_GEN(i915) < 11);
 
-	if (!guc->sched_engine) {
-		guc->sched_engine = i915_sched_engine_create(ENGINE_VIRTUAL);
-		if (!guc->sched_engine)
-			return -ENOMEM;
-
-		guc->sched_engine->schedule = i915_schedule;
-		guc->sched_engine->disabled = guc_sched_engine_disabled;
-		guc->sched_engine->engine = engine;
-		tasklet_setup(&guc->sched_engine->tasklet,
-			      guc_submission_tasklet);
+	if (!guc->gse[0]) {
+		for (i = 0; i < GUC_SUBMIT_ENGINE_MAX; ++i) {
+			guc->gse[i] = kzalloc(sizeof(*guc->gse[i]), GFP_KERNEL);
+			if (!guc->gse[i]) {
+				ret = -ENOMEM;
+				goto put_sched_engine;
+			}
+			guc_submit_engine_init(guc, guc->gse[i], i);
+		}
 	}
+
+	sched_engine = guc_to_sched_engine(guc, GUC_SUBMIT_ENGINE_SINGLE_LRC);
 	i915_sched_engine_put(engine->sched_engine);
-	engine->sched_engine = i915_sched_engine_get(guc->sched_engine);
+	engine->sched_engine = i915_sched_engine_get(sched_engine);
 
 	guc_default_vfuncs(engine);
 	guc_default_irqs(engine);
@@ -2768,6 +2853,16 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
 	engine->release = guc_release;
 
 	return 0;
+
+put_sched_engine:
+	for (i = 0; i < GUC_SUBMIT_ENGINE_MAX; ++i) {
+		struct i915_sched_engine *sched_engine =
+			guc_to_sched_engine(guc, i);
+
+		if (sched_engine)
+			i915_sched_engine_put(sched_engine);
+	}
+	return ret;
 }
 
 void intel_guc_submission_enable(struct intel_guc *guc)
@@ -2863,14 +2958,16 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 			register_context(ce, true);
 		guc_signal_context_fence(ce);
 		if (context_block_tasklet(ce)) {
-			GEM_BUG_ON(guc->submission_stall_reason !=
+			struct guc_submit_engine *gse = ce_to_gse(ce);
+
+			GEM_BUG_ON(gse->submission_stall_reason !=
 				   STALL_DEREGISTER_CONTEXT);
 
 			clr_context_block_tasklet(ce);
-			guc->submission_stall_reason = STALL_MOVE_LRC_TAIL;
-			clr_tasklet_blocked(guc);
+			gse->submission_stall_reason = STALL_MOVE_LRC_TAIL;
+			clr_tasklet_blocked(gse);
 
-			kick_tasklet(ce_to_guc(ce));
+			kick_tasklet(gse);
 		}
 		intel_context_put(ce);
 	} else if (context_destroyed(ce)) {
@@ -2935,11 +3032,13 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
 		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 
 		if (context_block_tasklet(ce)) {
+			struct guc_submit_engine *gse = ce_to_gse(ce);
+
 			clr_context_block_tasklet(ce);
-			guc->submission_stall_reason = STALL_REGISTER_CONTEXT;
-			clr_tasklet_blocked(guc);
+			gse->submission_stall_reason = STALL_REGISTER_CONTEXT;
+			clr_tasklet_blocked(gse);
 
-			kick_tasklet(ce_to_guc(ce));
+			kick_tasklet(gse);
 		}
 
 		if (banned) {
@@ -2971,7 +3070,7 @@ static void capture_error_state(struct intel_guc *guc,
 static void guc_context_replay(struct intel_context *ce)
 {
 	__guc_reset_context(ce, true);
-	kick_tasklet(ce_to_guc(ce));
+	kick_tasklet(ce_to_gse(ce));
 }
 
 static void guc_handle_context_reset(struct intel_guc *guc,
@@ -3115,32 +3214,29 @@ void intel_guc_dump_active_requests(struct intel_engine_cs *engine,
 	}
 }
 
-void intel_guc_log_submission_info(struct intel_guc *guc,
-				   struct drm_printer *p)
+static void gse_log_submission_info(struct guc_submit_engine *gse,
+				    struct drm_printer *p, int id)
 {
-	struct i915_sched_engine *sched_engine = guc->sched_engine;
+	struct i915_sched_engine *sched_engine = &gse->sched_engine;
 	struct rb_node *rb;
 	unsigned long flags;
 
-	drm_printf(p, "GuC Number Outstanding Submission G2H: %u\n",
-		   atomic_read(&guc->outstanding_submission_g2h));
-	drm_printf(p, "GuC Number GuC IDs: %u\n", guc->num_guc_ids);
-	drm_printf(p, "GuC Max GuC IDs: %u\n", guc->max_guc_ids);
-	drm_printf(p, "GuC tasklet count: %u\n",
+	drm_printf(p, "GSE[%d] tasklet count: %u\n", id,
 		   atomic_read(&sched_engine->tasklet.count));
-	drm_printf(p, "GuC submit flags: 0x%04lx\n", guc->flags);
-	drm_printf(p, "GuC total number request without guc_id: %d\n",
-		   guc->total_num_rq_with_no_guc_id);
-	drm_printf(p, "GuC Number GuC IDs not ready: %d\n",
-		   atomic_read(&guc->num_guc_ids_not_ready));
-	drm_printf(p, "GuC stall reason: %d\n", guc->submission_stall_reason);
-	drm_printf(p, "GuC stalled request: %s\n",
-		   yesno(guc->stalled_rq));
-	drm_printf(p, "GuC stalled context: %s\n\n",
-		   yesno(guc->stalled_context));
+	drm_printf(p, "GSE[%d] submit flags: 0x%04lx\n", id, gse->flags);
+	drm_printf(p, "GSE[%d] total number request without guc_id: %d\n",
+		   id, gse->total_num_rq_with_no_guc_id);
+	drm_printf(p, "GSE[%d] Number GuC IDs not ready: %d\n",
+		   id, atomic_read(&gse->num_guc_ids_not_ready));
+	drm_printf(p, "GSE[%d] stall reason: %d\n",
+		   id, gse->submission_stall_reason);
+	drm_printf(p, "GSE[%d] stalled request: %s\n",
+		   id, yesno(gse->stalled_rq));
+	drm_printf(p, "GSE[%d] stalled context: %s\n\n",
+		   id, yesno(gse->stalled_context));
 
 	spin_lock_irqsave(&sched_engine->lock, flags);
-	drm_printf(p, "Requests in GuC submit tasklet:\n");
+	drm_printf(p, "Requests in GSE[%d] submit tasklet:\n", id);
 	for (rb = rb_first_cached(&sched_engine->queue); rb; rb = rb_next(rb)) {
 		struct i915_priolist *pl = to_priolist(rb);
 		struct i915_request *rq;
@@ -3154,6 +3250,20 @@ void intel_guc_log_submission_info(struct intel_guc *guc,
 	drm_printf(p, "\n");
 }
 
+void intel_guc_log_submission_info(struct intel_guc *guc,
+				   struct drm_printer *p)
+{
+	int i;
+
+	drm_printf(p, "GuC Number Outstanding Submission G2H: %u\n",
+		   atomic_read(&guc->outstanding_submission_g2h));
+	drm_printf(p, "GuC Number GuC IDs: %d\n", guc->num_guc_ids);
+	drm_printf(p, "GuC Max Number GuC IDs: %d\n\n", guc->max_guc_ids);
+
+	for (i = 0; i < GUC_SUBMIT_ENGINE_MAX; ++i)
+		gse_log_submission_info(guc->gse[i], p, i);
+}
+
 void intel_guc_log_context_info(struct intel_guc *guc,
 				struct drm_printer *p)
 {
@@ -3185,6 +3295,7 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
 {
 	struct guc_virtual_engine *ve;
 	struct intel_guc *guc;
+	struct i915_sched_engine *sched_engine;
 	unsigned int n;
 	int err;
 
@@ -3193,6 +3304,7 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
 		return ERR_PTR(-ENOMEM);
 
 	guc = &siblings[0]->gt->uc.guc;
+	sched_engine = guc_to_sched_engine(guc, GUC_SUBMIT_ENGINE_SINGLE_LRC);
 
 	ve->base.i915 = siblings[0]->i915;
 	ve->base.gt = siblings[0]->gt;
@@ -3206,7 +3318,7 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
 
 	snprintf(ve->base.name, sizeof(ve->base.name), "virtual");
 
-	ve->base.sched_engine = i915_sched_engine_get(guc->sched_engine);
+	ve->base.sched_engine = i915_sched_engine_get(sched_engine);
 
 	ve->base.cops = &virtual_guc_context_ops;
 	ve->base.request_alloc = guc_request_alloc;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission_types.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission_types.h
new file mode 100644
index 000000000000..e45c2f00f09c
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission_types.h
@@ -0,0 +1,53 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2014-2019 Intel Corporation
+ */
+
+#ifndef _INTEL_GUC_SUBMISSION_TYPES_H_
+#define _INTEL_GUC_SUBMISSION_TYPES_H_
+
+#include "gt/intel_engine_types.h"
+#include "gt/intel_context_types.h"
+#include "i915_scheduler_types.h"
+
+struct intel_guc;
+struct i915_request;
+
+/* GuC Virtual Engine */
+struct guc_virtual_engine {
+	struct intel_engine_cs base;
+	struct intel_context context;
+};
+
+/*
+ * Object which encapsulates the globally operated on i915_sched_engine and
+ * the GuC submission state machine described in intel_guc_submission.c.
+ */
+struct guc_submit_engine {
+	struct i915_sched_engine sched_engine;
+	struct work_struct retire_worker;
+	struct intel_guc *guc;
+	struct i915_request *stalled_rq;
+	struct intel_context *stalled_context;
+	unsigned long flags;
+	int total_num_rq_with_no_guc_id;
+	atomic_t num_guc_ids_not_ready;
+	int id;
+
+	/*
+	 * Submission stall reason. See intel_guc_submission.c for detailed
+	 * description.
+	 */
+	enum {
+		STALL_NONE,
+		STALL_GUC_ID_WORKQUEUE,
+		STALL_GUC_ID_TASKLET,
+		STALL_SCHED_DISABLE,
+		STALL_REGISTER_CONTEXT,
+		STALL_DEREGISTER_CONTEXT,
+		STALL_MOVE_LRC_TAIL,
+		STALL_ADD_REQUEST,
+	} submission_stall_reason;
+};
+
+#endif
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 72a9bee3026f..51644de0e9ca 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -431,7 +431,7 @@ void i915_request_show_with_schedule(struct drm_printer *m,
 	rcu_read_unlock();
 }
 
-void i915_sched_engine_free(struct kref *kref)
+static void default_destroy(struct kref *kref)
 {
 	struct i915_sched_engine *sched_engine =
 		container_of(kref, typeof(*sched_engine), ref);
@@ -445,20 +445,15 @@ static bool default_disabled(struct i915_sched_engine *sched_engine)
 	return false;
 }
 
-struct i915_sched_engine *
-i915_sched_engine_create(unsigned int subclass)
+void i915_sched_engine_init(struct i915_sched_engine *sched_engine,
+			    unsigned int subclass)
 {
-	struct i915_sched_engine *sched_engine;
-
-	sched_engine = kzalloc(sizeof(*sched_engine), GFP_KERNEL);
-	if (!sched_engine)
-		return NULL;
-
 	kref_init(&sched_engine->ref);
 
 	sched_engine->queue = RB_ROOT_CACHED;
 	sched_engine->queue_priority_hint = INT_MIN;
 	sched_engine->disabled = default_disabled;
+	sched_engine->destroy = default_destroy;
 
 	INIT_LIST_HEAD(&sched_engine->requests);
 	INIT_LIST_HEAD(&sched_engine->hold);
@@ -477,7 +472,19 @@ i915_sched_engine_create(unsigned int subclass)
 	lock_map_release(&sched_engine->lock.dep_map);
 	local_irq_enable();
 #endif
+}
+
+struct i915_sched_engine *
+i915_sched_engine_create(unsigned int subclass)
+{
+	struct i915_sched_engine *sched_engine;
+
+	sched_engine = kzalloc(sizeof(*sched_engine), GFP_KERNEL);
+	if (!sched_engine)
+		return NULL;
 
+	i915_sched_engine_init(sched_engine, subclass);
+
 	return sched_engine;
 }
 
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index ec8dfa87cbb6..92627f72182a 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -53,6 +53,9 @@ void i915_request_show_with_schedule(struct drm_printer *m,
 				     const char *prefix,
 				     int indent);
 
+void i915_sched_engine_init(struct i915_sched_engine *sched_engine,
+			    unsigned int subclass);
+
 struct i915_sched_engine *
 i915_sched_engine_create(unsigned int subclass);
 
@@ -68,7 +71,7 @@ i915_sched_engine_get(struct i915_sched_engine *sched_engine)
 static inline void
 i915_sched_engine_put(struct i915_sched_engine *sched_engine)
 {
-	kref_put(&sched_engine->ref, i915_sched_engine_free);
+	kref_put(&sched_engine->ref, sched_engine->destroy);
 }
 
 static inline bool
diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h b/drivers/gpu/drm/i915/i915_scheduler_types.h
index a7183792d110..a0b755a27140 100644
--- a/drivers/gpu/drm/i915/i915_scheduler_types.h
+++ b/drivers/gpu/drm/i915/i915_scheduler_types.h
@@ -141,6 +141,9 @@ struct i915_sched_engine {
 	/* Back pointer to engine */
 	struct intel_engine_cs *engine;
 
+	/* Destroy schedule engine */
+	void	(*destroy)(struct kref *kref);
+
 	/* Schedule engine is disabled by backend */
 	bool	(*disabled)(struct i915_sched_engine *sched_engine);
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 86/97] drm/i915/guc: Add golden context to GuC ADS
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (84 preceding siblings ...)
  2021-05-06 19:14 ` [RFC PATCH 85/97] drm/i915/guc: Introduce guc_submit_engine object Matthew Brost
@ 2021-05-06 19:14 ` Matthew Brost
  2021-05-06 19:14 ` [RFC PATCH 87/97] drm/i915/guc: Implement GuC priority management Matthew Brost
                   ` (13 subsequent siblings)
  99 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

From: John Harrison <John.C.Harrison@Intel.com>

The media watchdog mechanism involves the GuC doing a silent reset of
the hung context and then continuing it. This requires that the i915
driver provide a golden context to the GuC in the ADS.
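
To make the flow concrete, the handling below is a two-pass scheme: the
same routine is first called without a destination blob to compute how
much space the per-class golden context images need, and is called again
late in init to fill that space once the default engine state exists.
The standalone C sketch below illustrates only that pattern; the sizes,
names and fake image fill are invented for illustration and are not the
i915 values.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NUM_CLASSES	4
#define PAGE_SZ		4096u

static uint32_t page_align(uint32_t v)
{
	return (v + PAGE_SZ - 1) & ~(PAGE_SZ - 1);
}

/* Made-up per-class context image sizes, purely for illustration. */
static const uint32_t ctx_size[NUM_CLASSES] = {
	91 * 1024, 23 * 1024, 37 * 1024, 37 * 1024
};

/*
 * Pass 1 (blob == NULL): return how much space the golden contexts need.
 * Pass 2 (blob != NULL): write a per-class image into the reserved area.
 */
static uint32_t prep_golden(uint8_t *blob)
{
	uint32_t offset = 0, total = 0;
	int c;

	for (c = 0; c < NUM_CLASSES; c++) {
		uint32_t alloc = page_align(ctx_size[c]);

		if (blob)
			memset(blob + offset, 0xc5, ctx_size[c]); /* fake image */

		offset += alloc;
		total += alloc;
	}

	return total;
}

int main(void)
{
	uint32_t size = prep_golden(NULL);	/* sizing pass */
	uint8_t *blob = calloc(1, size);

	if (!blob)
		return 1;

	prep_golden(blob);			/* late fill pass */
	printf("golden context region: %u bytes\n", size);
	free(blob);
	return 0;
}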

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gt.c         |   2 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.c     |   5 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.h     |   2 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c | 213 ++++++++++++++++++---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h |   1 +
 drivers/gpu/drm/i915/gt/uc/intel_uc.c      |   5 +
 drivers/gpu/drm/i915/gt/uc/intel_uc.h      |   1 +
 7 files changed, 199 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index 1742a8561f69..0e4a5c4c883f 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -641,6 +641,8 @@ int intel_gt_init(struct intel_gt *gt)
 	if (err)
 		goto err_gt;
 
+	intel_uc_init_late(&gt->uc);
+
 	err = i915_inject_probe_error(gt->i915, -EIO);
 	if (err)
 		goto err_gt;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
index f3240037fb7c..918802712460 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
@@ -192,6 +192,11 @@ void intel_guc_init_early(struct intel_guc *guc)
 	}
 }
 
+void intel_guc_init_late(struct intel_guc *guc)
+{
+	intel_guc_ads_init_late(guc);
+}
+
 static u32 guc_ctl_debug_flags(struct intel_guc *guc)
 {
 	u32 level = intel_guc_log_get_level(&guc->log);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 904f3a941832..96849a256be8 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -66,6 +66,7 @@ struct intel_guc {
 	struct i915_vma *ads_vma;
 	struct __guc_ads_blob *ads_blob;
 	u32 ads_regset_size;
+	u32 ads_golden_ctxt_size;
 
 	struct i915_vma *lrc_desc_pool;
 	void *lrc_desc_pool_vaddr;
@@ -183,6 +184,7 @@ static inline u32 intel_guc_ggtt_offset(struct intel_guc *guc,
 }
 
 void intel_guc_init_early(struct intel_guc *guc);
+void intel_guc_init_late(struct intel_guc *guc);
 void intel_guc_init_send_regs(struct intel_guc *guc);
 void intel_guc_write_params(struct intel_guc *guc);
 int intel_guc_init(struct intel_guc *guc);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
index bc2745f73a06..299aa580d90a 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
@@ -7,6 +7,7 @@
 
 #include "gt/intel_gt.h"
 #include "gt/intel_lrc.h"
+#include "gt/shmem_utils.h"
 #include "intel_guc_ads.h"
 #include "intel_guc_fwif.h"
 #include "intel_uc.h"
@@ -35,6 +36,10 @@
  *      +---------------------------------------+ <== dynamic
  *      | padding                               |
  *      +---------------------------------------+ <== 4K aligned
+ *      | golden contexts                       |
+ *      +---------------------------------------+
+ *      | padding                               |
+ *      +---------------------------------------+ <== 4K aligned
  *      | private data                          |
  *      +---------------------------------------+
  *      | padding                               |
@@ -55,6 +60,11 @@ static u32 guc_ads_regset_size(struct intel_guc *guc)
 	return guc->ads_regset_size;
 }
 
+static u32 guc_ads_golden_ctxt_size(struct intel_guc *guc)
+{
+	return PAGE_ALIGN(guc->ads_golden_ctxt_size);
+}
+
 static u32 guc_ads_private_data_size(struct intel_guc *guc)
 {
 	return PAGE_ALIGN(guc->fw.private_data_size);
@@ -65,12 +75,23 @@ static u32 guc_ads_regset_offset(struct intel_guc *guc)
 	return offsetof(struct __guc_ads_blob, regset);
 }
 
-static u32 guc_ads_private_data_offset(struct intel_guc *guc)
+static u32 guc_ads_golden_ctxt_offset(struct intel_guc *guc)
 {
 	u32 offset;
 
 	offset = guc_ads_regset_offset(guc) +
 		 guc_ads_regset_size(guc);
+
+	return PAGE_ALIGN(offset);
+}
+
+static u32 guc_ads_private_data_offset(struct intel_guc *guc)
+{
+	u32 offset;
+
+	offset = guc_ads_golden_ctxt_offset(guc) +
+		 guc_ads_golden_ctxt_size(guc);
+
 	return PAGE_ALIGN(offset);
 }
 
@@ -321,53 +342,163 @@ static void guc_mmio_reg_state_init(struct intel_guc *guc,
 	GEM_BUG_ON(temp_set.size);
 }
 
-/*
- * The first 80 dwords of the register state context, containing the
- * execlists and ppgtt registers.
- */
-#define LR_HW_CONTEXT_SIZE	(80 * sizeof(u32))
+static void fill_engine_enable_masks(struct intel_gt *gt,
+				     struct guc_gt_system_info *info)
+{
+	info->engine_enabled_masks[GUC_RENDER_CLASS] = 1;
+	info->engine_enabled_masks[GUC_BLITTER_CLASS] = 1;
+	info->engine_enabled_masks[GUC_VIDEO_CLASS] = VDBOX_MASK(gt);
+	info->engine_enabled_masks[GUC_VIDEOENHANCE_CLASS] = VEBOX_MASK(gt);
+}
 
-static void __guc_ads_init(struct intel_guc *guc)
+/* Skip execlist and PPGTT registers */
+#define LR_HW_CONTEXT_SIZE      (80 * sizeof(u32))
+#define SKIP_SIZE               (LRC_PPHWSP_SZ * PAGE_SIZE + LR_HW_CONTEXT_SIZE)
+
+static int guc_prep_golden_context(struct intel_guc *guc,
+				   struct __guc_ads_blob *blob)
 {
 	struct intel_gt *gt = guc_to_gt(guc);
-	struct drm_i915_private *i915 = gt->i915;
+	u32 addr_ggtt, offset;
+	u32 total_size = 0, alloc_size, real_size;
+	u8 engine_class, guc_class;
+	struct guc_gt_system_info *info, local_info;
+
+	/*
+	 * Reserve the memory for the golden contexts and point GuC at it but
+	 * leave it empty for now. The context data will be filled in later
+	 * once there is something available to put there.
+	 *
+	 * Note that the HWSP and ring context are not included.
+	 *
+	 * Note also that the storage must be pinned in the GGTT, so that the
+	 * address won't change after GuC has been told where to find it. The
+	 * GuC will also validate that the LRC base + size fall within the
+	 * allowed GGTT range.
+	 */
+	if (blob) {
+		offset = guc_ads_golden_ctxt_offset(guc);
+		addr_ggtt = intel_guc_ggtt_offset(guc, guc->ads_vma) + offset;
+		info = &blob->system_info;
+	} else {
+		memset(&local_info, 0, sizeof(local_info));
+		info = &local_info;
+		fill_engine_enable_masks(gt, info);
+	}
+
+	for (engine_class = 0; engine_class <= MAX_ENGINE_CLASS; ++engine_class) {
+		if (engine_class == OTHER_CLASS)
+			continue;
+
+		guc_class = engine_class_to_guc_class(engine_class);
+
+		if (!info->engine_enabled_masks[guc_class])
+			continue;
+
+		real_size = intel_engine_context_size(gt, engine_class);
+		alloc_size = PAGE_ALIGN(real_size);
+		total_size += alloc_size;
+
+		if (!blob)
+			continue;
+
+		blob->ads.eng_state_size[guc_class] = real_size;
+		blob->ads.golden_context_lrca[guc_class] = addr_ggtt;
+		addr_ggtt += alloc_size;
+	}
+
+	if (!blob)
+		return total_size;
+
+	GEM_BUG_ON(guc->ads_golden_ctxt_size != total_size);
+	return total_size;
+}
+
+static struct intel_engine_cs *find_engine_state(struct intel_gt *gt, u8 engine_class)
+{
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+
+	for_each_engine(engine, gt, id) {
+		if (engine->class != engine_class)
+			continue;
+
+		if (!engine->default_state)
+			continue;
+
+		return engine;
+	}
+
+	return NULL;
+}
+
+static void guc_init_golden_context(struct intel_guc *guc)
+{
 	struct __guc_ads_blob *blob = guc->ads_blob;
-	const u32 skipped_size = LRC_PPHWSP_SZ * PAGE_SIZE + LR_HW_CONTEXT_SIZE;
-	u32 base;
+	struct intel_engine_cs *engine;
+	struct intel_gt *gt = guc_to_gt(guc);
+	u32 addr_ggtt, offset;
+	u32 total_size = 0, alloc_size, real_size;
 	u8 engine_class, guc_class;
+	u8 *ptr;
 
-	/* GuC scheduling policies */
-	guc_policies_init(guc, &blob->policies);
+	if (!intel_uc_uses_guc_submission(&gt->uc))
+		return;
+
+	GEM_BUG_ON(!blob);
 
 	/*
-	 * GuC expects a per-engine-class context image and size
-	 * (minus hwsp and ring context). The context image will be
-	 * used to reinitialize engines after a reset. It must exist
-	 * and be pinned in the GGTT, so that the address won't change after
-	 * we have told GuC where to find it. The context size will be used
-	 * to validate that the LRC base + size fall within allowed GGTT.
+	 * Go back and fill in the golden context data now that it is
+	 * available.
 	 */
+	offset = guc_ads_golden_ctxt_offset(guc);
+	addr_ggtt = intel_guc_ggtt_offset(guc, guc->ads_vma) + offset;
+	ptr = ((u8 *) blob) + offset;
+
 	for (engine_class = 0; engine_class <= MAX_ENGINE_CLASS; ++engine_class) {
 		if (engine_class == OTHER_CLASS)
 			continue;
 
 		guc_class = engine_class_to_guc_class(engine_class);
 
-		/*
-		 * TODO: Set context pointer to default state to allow
-		 * GuC to re-init guilty contexts after internal reset.
-		 */
-		blob->ads.golden_context_lrca[guc_class] = 0;
-		blob->ads.eng_state_size[guc_class] =
-			intel_engine_context_size(gt, engine_class) -
-			skipped_size;
+		if (!blob->system_info.engine_enabled_masks[guc_class])
+			continue;
+
+		real_size = intel_engine_context_size(gt, engine_class);
+		alloc_size = PAGE_ALIGN(real_size);
+		total_size += alloc_size;
+
+		engine = find_engine_state(gt, engine_class);
+		if (!engine) {
+			drm_err(&gt->i915->drm, "No engine state recorded for class %d!\n", engine_class);
+			blob->ads.eng_state_size[guc_class] = 0;
+			blob->ads.golden_context_lrca[guc_class] = 0;
+			continue;
+		}
+
+		GEM_BUG_ON(blob->ads.eng_state_size[guc_class] != real_size);
+		GEM_BUG_ON(blob->ads.golden_context_lrca[guc_class] != addr_ggtt);
+		addr_ggtt += alloc_size;
+
+		shmem_read(engine->default_state, SKIP_SIZE, ptr + SKIP_SIZE, real_size);
+		ptr += alloc_size;
 	}
 
+	GEM_BUG_ON(guc->ads_golden_ctxt_size != total_size);
+}
+
+static void __guc_ads_init(struct intel_guc *guc)
+{
+	struct intel_gt *gt = guc_to_gt(guc);
+	struct drm_i915_private *i915 = gt->i915;
+	struct __guc_ads_blob *blob = guc->ads_blob;
+	u32 base;
+
+	/* GuC scheduling policies */
+	guc_policies_init(guc, &blob->policies);
+
 	/* System info */
-	blob->system_info.engine_enabled_masks[GUC_RENDER_CLASS] = 1;
-	blob->system_info.engine_enabled_masks[GUC_BLITTER_CLASS] = 1;
-	blob->system_info.engine_enabled_masks[GUC_VIDEO_CLASS] = VDBOX_MASK(gt);
-	blob->system_info.engine_enabled_masks[GUC_VIDEOENHANCE_CLASS] = VEBOX_MASK(gt);
+	fill_engine_enable_masks(gt, &blob->system_info);
 
 	blob->system_info.generic_gt_sysinfo[GUC_GENERIC_GT_SYSINFO_SLICE_ENABLED] =
 		hweight8(gt->info.sseu.slice_mask);
@@ -382,6 +513,9 @@ static void __guc_ads_init(struct intel_guc *guc)
 			 GEN12_DOORBELLS_PER_SQIDI) + 1;
 	}
 
+	/* Golden contexts for re-initialising after a watchdog reset */
+	guc_prep_golden_context(guc, blob);
+
 	guc_mapping_table_init(guc_to_gt(guc), &blob->system_info);
 
 	base = intel_guc_ggtt_offset(guc, guc->ads_vma);
@@ -423,6 +557,13 @@ int intel_guc_ads_create(struct intel_guc *guc)
 		return ret;
 	guc->ads_regset_size = ret;
 
+	/* Likewise the golden contexts: */
+	ret = guc_prep_golden_context(guc, NULL);
+	if (ret < 0)
+		return ret;
+	guc->ads_golden_ctxt_size = ret;
+
+	/* Now the total size can be determined: */
 	size = guc_ads_blob_size(guc);
 
 	ret = intel_guc_allocate_and_map_vma(guc, size, &guc->ads_vma,
@@ -435,6 +576,18 @@ int intel_guc_ads_create(struct intel_guc *guc)
 	return 0;
 }
 
+void intel_guc_ads_init_late(struct intel_guc *guc)
+{
+	/*
+	 * The golden context setup requires the saved engine state from
+	 * __engines_record_defaults(). However, that requires engines to be
+	 * operational which means the ADS must already have been configured.
+	 * Fortunately, the golden context state is not needed until a hang
+	 * occurs, so it can be filled in during this late init phase.
+	 */
+	guc_init_golden_context(guc);
+}
+
 void intel_guc_ads_destroy(struct intel_guc *guc)
 {
 	i915_vma_unpin_and_release(&guc->ads_vma, I915_VMA_RELEASE_MAP);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h
index 0fdcb3583601..dac0dc32da34 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h
@@ -11,6 +11,7 @@ struct drm_printer;
 
 int intel_guc_ads_create(struct intel_guc *guc);
 void intel_guc_ads_destroy(struct intel_guc *guc);
+void intel_guc_ads_init_late(struct intel_guc *guc);
 void intel_guc_ads_reset(struct intel_guc *guc);
 void intel_guc_log_policy_info(struct intel_guc *guc, struct drm_printer *p);
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index 8c681fc49638..4a79db4a739f 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -120,6 +120,11 @@ void intel_uc_init_early(struct intel_uc *uc)
 		uc->ops = &uc_ops_off;
 }
 
+void intel_uc_init_late(struct intel_uc *uc)
+{
+	intel_guc_init_late(&uc->guc);
+}
+
 void intel_uc_driver_late_release(struct intel_uc *uc)
 {
 }
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
index 91315e3f1c58..e2da2b6e76e1 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
@@ -35,6 +35,7 @@ struct intel_uc {
 };
 
 void intel_uc_init_early(struct intel_uc *uc);
+void intel_uc_init_late(struct intel_uc *uc);
 void intel_uc_driver_late_release(struct intel_uc *uc);
 void intel_uc_driver_remove(struct intel_uc *uc);
 void intel_uc_init_mmio(struct intel_uc *uc);
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 87/97] drm/i915/guc: Implement GuC priority management
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (85 preceding siblings ...)
  2021-05-06 19:14 ` [RFC PATCH 86/97] drm/i915/guc: Add golden context to GuC ADS Matthew Brost
@ 2021-05-06 19:14 ` Matthew Brost
  2021-05-06 19:14 ` [RFC PATCH 88/97] drm/i915/guc: Support request cancellation Matthew Brost
                   ` (12 subsequent siblings)
  99 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

Implement a simple static mapping algorithm of the i915 priority levels
(int, -1k to 1k exposed to user) to the 4 GuC levels. Mapping is as
follows:

i915 level < 0          -> GuC low level     (3)
i915 level == 0         -> GuC normal level  (2)
i915 level < INT_MAX    -> GuC high level    (1)
i915 level == INT_MAX   -> GuC highest level (0)

We believe this mapping should cover the UMD use cases (3 distinct user
levels + 1 kernel level).

In addition to the static mapping, a simple counter system is attached
to each context, tracking the number of requests inflight on the
context at each level. This is needed because the GuC levels are per
context while the i915 levels are per request.
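
The standalone C sketch below illustrates the mapping and the
per-context counting; the enum names and values are illustrative
stand-ins rather than the real GuC ABI constants (see
map_i915_prio_to_guc_prio() and update_context_prio() in the diff below
for the actual code).

#include <limits.h>
#include <stdio.h>

/* Illustrative level numbers only; the real constants live in the GuC ABI. */
enum { GUC_HIGHEST = 0, GUC_HIGH = 1, GUC_NORMAL = 2, GUC_LOW = 3, GUC_NUM = 4 };

static int map_prio(int i915_prio)
{
	if (i915_prio == 0)
		return GUC_NORMAL;	/* default priority */
	else if (i915_prio < 0)
		return GUC_LOW;		/* -1k..-1 */
	else if (i915_prio != INT_MAX)
		return GUC_HIGH;	/* 1..1k */
	else
		return GUC_HIGHEST;	/* kernel-only level */
}

struct ctx { unsigned int inflight[GUC_NUM]; };

/* A context runs at the highest (numerically lowest) level with inflight requests. */
static int effective_level(const struct ctx *ce)
{
	int i;

	for (i = 0; i < GUC_NUM; i++)
		if (ce->inflight[i])
			return i;

	return GUC_NORMAL;
}

int main(void)
{
	struct ctx ce = { .inflight = { 0 } };

	ce.inflight[map_prio(-512)]++;	/* one low priority request */
	ce.inflight[map_prio(100)]++;	/* one high priority request */
	printf("effective GuC level: %d\n", effective_level(&ce));	/* 1 */
	return 0;
}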

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c   |   3 +
 drivers/gpu/drm/i915/gt/intel_context_types.h |   9 +-
 drivers/gpu/drm/i915/gt/intel_engine_user.c   |   4 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 205 +++++++++++++++++-
 drivers/gpu/drm/i915/i915_request.c           |   5 +
 drivers/gpu/drm/i915/i915_request.h           |   8 +
 drivers/gpu/drm/i915/i915_scheduler.c         |   7 +
 drivers/gpu/drm/i915/i915_scheduler_types.h   |   5 +
 drivers/gpu/drm/i915/i915_trace.h             |  16 +-
 include/uapi/drm/i915_drm.h                   |   9 +
 10 files changed, 266 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index 2007dc6f6b99..209cf265bf74 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -245,6 +245,9 @@ static void signal_irq_work(struct irq_work *work)
 			llist_entry(signal, typeof(*rq), signal_node);
 		struct list_head cb_list;
 
+		if (rq->engine->sched_engine->retire_inflight_request_prio)
+			rq->engine->sched_engine->retire_inflight_request_prio(rq);
+
 		spin_lock(&rq->lock);
 		list_replace(&rq->fence.cb_list, &cb_list);
 		__dma_fence_signal__timestamp(&rq->fence, timestamp);
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 998f3839411a..217761b27b6c 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -17,8 +17,9 @@
 #include "intel_engine_types.h"
 #include "intel_sseu.h"
 
-#define CONTEXT_REDZONE POISON_INUSE
+#include "uc/intel_guc_fwif.h"
 
+#define CONTEXT_REDZONE POISON_INUSE
 DECLARE_EWMA(runtime, 3, 8);
 
 struct i915_gem_context;
@@ -193,6 +194,12 @@ struct intel_context {
 	 * GuC ID link - in list when unpinned but guc_id still valid in GuC
 	 */
 	struct list_head guc_id_link;
+
+	/*
+	 * GuC priority management
+	 */
+	u8 guc_prio;
+	u32 guc_prio_count[GUC_CLIENT_PRIORITY_NUM];
 };
 
 #endif /* __INTEL_CONTEXT_TYPES__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c b/drivers/gpu/drm/i915/gt/intel_engine_user.c
index d6dcdeace174..7cb16b6cf2ef 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
@@ -11,6 +11,7 @@
 #include "intel_engine.h"
 #include "intel_engine_user.h"
 #include "intel_gt.h"
+#include "uc/intel_guc_submission.h"
 
 struct intel_engine_cs *
 intel_engine_lookup_user(struct drm_i915_private *i915, u8 class, u8 instance)
@@ -114,6 +115,9 @@ static void set_scheduler_caps(struct drm_i915_private *i915)
 			disabled |= (I915_SCHEDULER_CAP_ENABLED |
 				     I915_SCHEDULER_CAP_PRIORITY);
 
+		if (intel_uc_uses_guc_submission(&i915->gt.uc))
+			enabled |= I915_SCHEDULER_CAP_STATIC_PRIORITY_MAP;
+
 		for (i = 0; i < ARRAY_SIZE(map); i++) {
 			if (engine->flags & BIT(map[i].engine))
 				enabled |= BIT(map[i].sched);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 9dc0ffc07cd7..6d2ae6390299 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -180,6 +180,7 @@ static void clr_guc_ids_exhausted(struct guc_submit_engine *gse)
 #define SCHED_STATE_NO_LOCK_BLOCK_TASKLET		BIT(2)
 #define SCHED_STATE_NO_LOCK_GUC_ID_STOLEN		BIT(3)
 #define SCHED_STATE_NO_LOCK_NEEDS_REGISTER		BIT(4)
+#define SCHED_STATE_NO_LOCK_REGISTERED			BIT(5)
 static inline bool context_enabled(struct intel_context *ce)
 {
 	return (atomic_read(&ce->guc_sched_state_no_lock) &
@@ -269,6 +270,24 @@ static inline void clr_context_needs_register(struct intel_context *ce)
 		   &ce->guc_sched_state_no_lock);
 }
 
+static inline bool context_registered(struct intel_context *ce)
+{
+	return (atomic_read(&ce->guc_sched_state_no_lock) &
+		SCHED_STATE_NO_LOCK_REGISTERED);
+}
+
+static inline void set_context_registered(struct intel_context *ce)
+{
+	atomic_or(SCHED_STATE_NO_LOCK_REGISTERED,
+		  &ce->guc_sched_state_no_lock);
+}
+
+static inline void clr_context_registered(struct intel_context *ce)
+{
+	atomic_and((u32)~SCHED_STATE_NO_LOCK_REGISTERED,
+		   &ce->guc_sched_state_no_lock);
+}
+
 /*
  * Below is a set of functions which control the GuC scheduling state which
  * require a lock, aside from the special case where the functions are called
@@ -1531,6 +1550,7 @@ static int steal_guc_id(struct intel_guc *guc, bool unpinned)
 		 * the guc_id until the retire workqueue processes this
 		 * context.
 		 */
+		clr_context_registered(ce);
 		if (!unpinned) {
 			GEM_BUG_ON(ce_to_gse(ce)->stalled_context);
 
@@ -1681,10 +1701,13 @@ static int register_context(struct intel_context *ce, bool loop)
 	struct intel_guc *guc = ce_to_guc(ce);
 	u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
 		ce->guc_id * sizeof(struct guc_lrc_desc);
+	int ret;
 
 	trace_intel_context_register(ce);
 
-	return __guc_action_register_context(guc, ce->guc_id, offset, loop);
+	ret = __guc_action_register_context(guc, ce->guc_id, offset, loop);
+	set_context_registered(ce);
+	return ret;
 }
 
 static int __guc_action_deregister_context(struct intel_guc *guc,
@@ -1739,6 +1762,8 @@ static void guc_context_policy_init(struct intel_engine_cs *engine,
 	desc->preemption_timeout = engine->props.preempt_timeout_ms * 1000;
 }
 
+static inline u8 map_i915_prio_to_guc_prio(int prio);
+
 static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
 {
 	struct intel_runtime_pm *runtime_pm =
@@ -1747,6 +1772,8 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
 	struct intel_guc *guc = &engine->gt->uc.guc;
 	u32 desc_idx = ce->guc_id;
 	struct guc_lrc_desc *desc;
+	const struct i915_gem_context *ctx;
+	int prio = I915_CONTEXT_DEFAULT_PRIORITY;
 	bool context_registered;
 	intel_wakeref_t wakeref;
 	int ret = 0;
@@ -1763,6 +1790,12 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
 
 	context_registered = lrc_desc_registered(guc, desc_idx);
 
+	rcu_read_lock();
+	ctx = rcu_dereference(ce->gem_context);
+	if (ctx)
+		prio = ctx->sched.priority;
+	rcu_read_unlock();
+
 	reset_lrc_desc(guc, desc_idx);
 	set_lrc_desc_registered(guc, desc_idx, ce);
 
@@ -1771,7 +1804,8 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
 	desc->engine_submit_mask = adjust_engine_mask(engine->class,
 						      engine->mask);
 	desc->hw_context_desc = ce->lrc.lrca;
-	desc->priority = GUC_CLIENT_PRIORITY_KMD_NORMAL;
+	ce->guc_prio = map_i915_prio_to_guc_prio(prio);
+	desc->priority = ce->guc_prio;
 	desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
 	guc_context_policy_init(engine, desc);
 	init_sched_state(ce);
@@ -2006,11 +2040,17 @@ static inline void guc_lrc_desc_unpin(struct intel_context *ce)
 	GEM_BUG_ON(ce != __get_context(guc, ce->guc_id));
 	GEM_BUG_ON(context_enabled(ce));
 
+	clr_context_registered(ce);
 	deregister_context(ce, ce->guc_id, true);
 }
 
 static void __guc_context_destroy(struct intel_context *ce)
 {
+	GEM_BUG_ON(ce->guc_prio_count[GUC_CLIENT_PRIORITY_KMD_HIGH] ||
+		   ce->guc_prio_count[GUC_CLIENT_PRIORITY_HIGH] ||
+		   ce->guc_prio_count[GUC_CLIENT_PRIORITY_KMD_NORMAL] ||
+		   ce->guc_prio_count[GUC_CLIENT_PRIORITY_NORMAL]);
+
 	lrc_fini(ce);
 	intel_context_fini(ce);
 
@@ -2104,17 +2144,121 @@ static int guc_context_alloc(struct intel_context *ce)
 	return lrc_alloc(ce, ce->engine);
 }
 
+static void guc_context_set_prio(struct intel_guc *guc,
+				 struct intel_context *ce,
+				 u8 prio)
+{
+	u32 action[] = {
+		INTEL_GUC_ACTION_SET_CONTEXT_PRIORITY,
+		ce->guc_id,
+		prio,
+	};
+
+	GEM_BUG_ON(prio < GUC_CLIENT_PRIORITY_KMD_HIGH ||
+		   prio > GUC_CLIENT_PRIORITY_NORMAL);
+
+	if (ce->guc_prio == prio || submission_disabled(guc) ||
+	    !context_registered(ce))
+		return;
+
+	guc_submission_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
+
+	ce->guc_prio = prio;
+	trace_intel_context_set_prio(ce);
+}
+
+static inline u8 map_i915_prio_to_guc_prio(int prio)
+{
+	if (prio == I915_PRIORITY_NORMAL)
+		return GUC_CLIENT_PRIORITY_KMD_NORMAL;
+	else if (prio < I915_PRIORITY_NORMAL)
+		return GUC_CLIENT_PRIORITY_NORMAL;
+	else if (prio != I915_PRIORITY_BARRIER)
+		return GUC_CLIENT_PRIORITY_HIGH;
+	else
+		return GUC_CLIENT_PRIORITY_KMD_HIGH;
+}
+
+static inline void add_context_inflight_prio(struct intel_context *ce,
+					     u8 guc_prio)
+{
+	lockdep_assert_held(&ce->guc_active.lock);
+	GEM_BUG_ON(guc_prio >= ARRAY_SIZE(ce->guc_prio_count));
+
+	++ce->guc_prio_count[guc_prio];
+
+	/* Overflow protection */
+	GEM_WARN_ON(!ce->guc_prio_count[guc_prio]);
+}
+
+static inline void sub_context_inflight_prio(struct intel_context *ce,
+					     u8 guc_prio)
+{
+	lockdep_assert_held(&ce->guc_active.lock);
+	GEM_BUG_ON(guc_prio >= ARRAY_SIZE(ce->guc_prio_count));
+
+	/* Underflow protection */
+	GEM_WARN_ON(!ce->guc_prio_count[guc_prio]);
+
+	--ce->guc_prio_count[guc_prio];
+}
+
+static inline void update_context_prio(struct intel_context *ce)
+{
+	struct intel_guc *guc = &ce->engine->gt->uc.guc;
+	int i;
+
+	lockdep_assert_held(&ce->guc_active.lock);
+
+	for (i = 0; i < ARRAY_SIZE(ce->guc_prio_count); ++i) {
+		if (ce->guc_prio_count[i]) {
+			guc_context_set_prio(guc, ce, i);
+			break;
+		}
+	}
+}
+
+static inline bool new_guc_prio_higher(u8 old_guc_prio, u8 new_guc_prio)
+{
+	/* Lower value is higher priority */
+	return new_guc_prio < old_guc_prio;
+}
+
 static void add_to_context(struct i915_request *rq)
 {
 	struct intel_context *ce = rq->context;
+	u8 new_guc_prio = map_i915_prio_to_guc_prio(rq_prio(rq));
+
+	GEM_BUG_ON(rq->guc_prio == GUC_PRIO_FINI);
 
 	spin_lock(&ce->guc_active.lock);
 	list_move_tail(&rq->sched.link, &ce->guc_active.requests);
+
+	if (rq->guc_prio == GUC_PRIO_INIT) {
+		rq->guc_prio = new_guc_prio;
+		add_context_inflight_prio(ce, rq->guc_prio);
+	} else if (new_guc_prio_higher(rq->guc_prio, new_guc_prio)) {
+		sub_context_inflight_prio(ce, rq->guc_prio);
+		rq->guc_prio = new_guc_prio;
+		add_context_inflight_prio(ce, rq->guc_prio);
+	}
+	update_context_prio(ce);
+
 	if (unlikely(request_has_no_guc_id(rq)))
 		++ce->guc_num_rq_submit_no_id;
 	spin_unlock(&ce->guc_active.lock);
 }
 
+static void guc_prio_fini(struct i915_request *rq, struct intel_context *ce)
+{
+	if (rq->guc_prio != GUC_PRIO_INIT &&
+	    rq->guc_prio != GUC_PRIO_FINI) {
+		sub_context_inflight_prio(ce, rq->guc_prio);
+		update_context_prio(ce);
+	}
+	rq->guc_prio = GUC_PRIO_FINI;
+}
+
 static void remove_from_context(struct i915_request *rq)
 {
 	struct intel_context *ce = rq->context;
@@ -2127,6 +2271,8 @@ static void remove_from_context(struct i915_request *rq)
 	/* Prevent further __await_execution() registering a cb, then flush */
 	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
 
+	guc_prio_fini(rq, ce);
+
 	spin_unlock_irq(&ce->guc_active.lock);
 
 	if (likely(!request_has_no_guc_id(rq)))
@@ -2582,6 +2728,39 @@ static void guc_init_breadcrumbs(struct intel_engine_cs *engine)
 	}
 }
 
+static void guc_bump_inflight_request_prio(struct i915_request *rq,
+					   int prio)
+{
+	struct intel_context *ce = rq->context;
+	u8 new_guc_prio = map_i915_prio_to_guc_prio(prio);
+
+	/* Short circuit function */
+	if (prio < I915_PRIORITY_NORMAL ||
+	    (rq->guc_prio == GUC_PRIO_FINI) ||
+	    (rq->guc_prio != GUC_PRIO_INIT &&
+	     !new_guc_prio_higher(rq->guc_prio, new_guc_prio)))
+		return;
+
+	spin_lock(&ce->guc_active.lock);
+	if (rq->guc_prio != GUC_PRIO_FINI) {
+		if (rq->guc_prio != GUC_PRIO_INIT)
+			sub_context_inflight_prio(ce, rq->guc_prio);
+		rq->guc_prio = new_guc_prio;
+		add_context_inflight_prio(ce, rq->guc_prio);
+		update_context_prio(ce);
+	}
+	spin_unlock(&ce->guc_active.lock);
+}
+
+static void guc_retire_inflight_request_prio(struct i915_request *rq)
+{
+	struct intel_context *ce = rq->context;
+
+	spin_lock(&ce->guc_active.lock);
+	guc_prio_fini(rq, ce);
+	spin_unlock(&ce->guc_active.lock);
+}
+
 static void sanitize_hwsp(struct intel_engine_cs *engine)
 {
 	struct intel_timeline *tl;
@@ -2807,6 +2986,10 @@ static void guc_submit_engine_init(struct intel_guc *guc,
 	gse->sched_engine.schedule = i915_schedule;
 	gse->sched_engine.disabled = guc_sched_engine_disabled;
 	gse->sched_engine.destroy = guc_sched_engine_destroy;
+	gse->sched_engine.bump_inflight_request_prio =
+		guc_bump_inflight_request_prio;
+	gse->sched_engine.retire_inflight_request_prio =
+		guc_retire_inflight_request_prio;
 	gse->guc = guc;
 	gse->id = id;
 }
@@ -3264,6 +3447,22 @@ void intel_guc_log_submission_info(struct intel_guc *guc,
 		gse_log_submission_info(guc->gse[i], p, i);
 }
 
+static inline void guc_log_context_priority(struct drm_printer *p,
+					    struct intel_context *ce)
+{
+	int i;
+
+	drm_printf(p, "\t\tPriority: %d\n",
+		   ce->guc_prio);
+	drm_printf(p, "\t\tNumber Requests (lower index == higher priority)\n");
+	for (i = GUC_CLIENT_PRIORITY_KMD_HIGH;
+	     i < GUC_CLIENT_PRIORITY_NUM; ++i) {
+		drm_printf(p, "\t\tNumber requests in priority band[%d]: %d\n",
+			   i, ce->guc_prio_count[i]);
+	}
+	drm_printf(p, "\n");
+}
+
 void intel_guc_log_context_info(struct intel_guc *guc,
 				struct drm_printer *p)
 {
@@ -3287,6 +3486,8 @@ void intel_guc_log_context_info(struct intel_guc *guc,
 		drm_printf(p, "\t\tSchedule State: 0x%x, 0x%x\n\n",
 			   ce->guc_state.sched_state,
 			   atomic_read(&ce->guc_sched_state_no_lock));
+
+		guc_log_context_priority(p, ce);
 	}
 }
 
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index ef9eb91ec84c..4bf10f0ac34d 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -115,6 +115,9 @@ static void i915_fence_release(struct dma_fence *fence)
 {
 	struct i915_request *rq = to_request(fence);
 
+	GEM_BUG_ON(rq->guc_prio != GUC_PRIO_INIT &&
+		   rq->guc_prio != GUC_PRIO_FINI);
+
 	/*
 	 * The request is put onto a RCU freelist (i.e. the address
 	 * is immediately reused), mark the fences as being freed now.
@@ -956,6 +959,8 @@ __i915_request_create(struct intel_context *ce, gfp_t gfp)
 
 	rq->rcustate = get_state_synchronize_rcu(); /* acts as smp_mb() */
 
+	rq->guc_prio = GUC_PRIO_INIT;
+
 	/* We bump the ref for the fence chain */
 	i915_sw_fence_reinit(&i915_request_get(rq)->submit);
 	i915_sw_fence_reinit(&i915_request_get(rq)->semaphore);
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 94a3f119ad86..a03905f86e82 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -299,6 +299,14 @@ struct i915_request {
 	 */
 	struct list_head guc_fence_link;
 
+	/**
+	 * Priority level while the request is inflight. Differs slightly from
+	 * the i915 scheduler priority.
+	 */
+#define	GUC_PRIO_INIT	0xff
+#define	GUC_PRIO_FINI	0xfe
+	u8 guc_prio;
+
 	I915_SELFTEST_DECLARE(struct {
 		struct list_head link;
 		unsigned long delay;
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 51644de0e9ca..babad7da3906 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -241,6 +241,9 @@ static void __i915_schedule(struct i915_sched_node *node,
 	/* Fifo and depth-first replacement ensure our deps execute before us */
 	sched_engine = lock_sched_engine(node, sched_engine, &cache);
 	list_for_each_entry_safe_reverse(dep, p, &dfs, dfs_link) {
+		struct i915_request *from = container_of(dep->signaler,
+							 struct i915_request,
+							 sched);
 		INIT_LIST_HEAD(&dep->dfs_link);
 
 		node = dep->signaler;
@@ -254,6 +257,10 @@ static void __i915_schedule(struct i915_sched_node *node,
 		GEM_BUG_ON(node_to_request(node)->engine->sched_engine !=
 			   sched_engine);
 
+		/* Must be called before changing the nodes priority */
+		if (sched_engine->bump_inflight_request_prio)
+			sched_engine->bump_inflight_request_prio(from, prio);
+
 		WRITE_ONCE(node->attr.priority, prio);
 
 		/*
diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h b/drivers/gpu/drm/i915/i915_scheduler_types.h
index a0b755a27140..14626fcfeed3 100644
--- a/drivers/gpu/drm/i915/i915_scheduler_types.h
+++ b/drivers/gpu/drm/i915/i915_scheduler_types.h
@@ -151,6 +151,11 @@ struct i915_sched_engine {
 	void	(*kick_backend)(const struct i915_request *rq,
 				int prio);
 
+	/* Update priority of inflight requests */
+	void	(*bump_inflight_request_prio)(struct i915_request *rq,
+					      int prio);
+	void	(*retire_inflight_request_prio)(struct i915_request *rq);
+
 	/*
 	 * Call when the priority on a request has changed and it and its
 	 * dependencies may need rescheduling. Note the request itself may
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 937d3706af9b..9d2cd14ed882 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -914,6 +914,7 @@ DECLARE_EVENT_CLASS(intel_context,
 			     __field(int, pin_count)
 			     __field(u32, sched_state)
 			     __field(u32, guc_sched_state_no_lock)
+			     __field(u8, guc_prio)
 			     ),
 
 	    TP_fast_assign(
@@ -922,11 +923,17 @@ DECLARE_EVENT_CLASS(intel_context,
 			   __entry->sched_state = ce->guc_state.sched_state;
 			   __entry->guc_sched_state_no_lock =
 			   atomic_read(&ce->guc_sched_state_no_lock);
+			   __entry->guc_prio = ce->guc_prio;
 			   ),
 
-	    TP_printk("guc_id=%d, pin_count=%d sched_state=0x%x,0x%x",
+	    TP_printk("guc_id=%d, pin_count=%d sched_state=0x%x,0x%x, guc_prio=%u",
 		      __entry->guc_id, __entry->pin_count, __entry->sched_state,
-		      __entry->guc_sched_state_no_lock)
+		      __entry->guc_sched_state_no_lock, __entry->guc_prio)
+);
+
+DEFINE_EVENT(intel_context, intel_context_set_prio,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
 );
 
 DEFINE_EVENT(intel_context, intel_context_reset,
@@ -1036,6 +1043,11 @@ trace_i915_request_out(struct i915_request *rq)
 {
 }
 
+static inline void
+trace_intel_context_set_prio(struct intel_context *ce)
+{
+}
+
 static inline void
 trace_intel_context_reset(struct intel_context *ce)
 {
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index c2c7759b7d2e..0a489b11fe6b 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -572,6 +572,15 @@ typedef struct drm_i915_irq_wait {
 #define   I915_SCHEDULER_CAP_PREEMPTION	(1ul << 2)
 #define   I915_SCHEDULER_CAP_SEMAPHORES	(1ul << 3)
 #define   I915_SCHEDULER_CAP_ENGINE_BUSY_STATS	(1ul << 4)
+/*
+ * Indicates the 2k user priority levels are statically mapped into 3 buckets as
+ * follows:
+ *
+ * -1k to -1	Low priority
+ * 0		Normal priority
+ * 1 to 1k	Highest priority
+ */
+#define   I915_SCHEDULER_CAP_STATIC_PRIORITY_MAP	(1ul << 5)
 
 #define I915_PARAM_HUC_STATUS		 42
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 88/97] drm/i915/guc: Support request cancellation
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (86 preceding siblings ...)
  2021-05-06 19:14 ` [RFC PATCH 87/97] drm/i915/guc: Implement GuC priority management Matthew Brost
@ 2021-05-06 19:14 ` Matthew Brost
  2021-05-06 19:14 ` [RFC PATCH 89/97] drm/i915/guc: Check return of __xa_store when registering a context Matthew Brost
                   ` (11 subsequent siblings)
  99 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

This adds GuC backend support for i915_request_cancel(), which in turn
makes CONFIG_DRM_I915_REQUEST_TIMEOUT work.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context.c       |   9 +
 drivers/gpu/drm/i915/gt/intel_context.h       |   7 +
 drivers/gpu/drm/i915/gt/intel_context_types.h |   7 +
 .../drm/i915/gt/intel_execlists_submission.c  |  18 ++
 drivers/gpu/drm/i915/gt/intel_gt_requests.c   |   1 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 168 ++++++++++++++++++
 drivers/gpu/drm/i915/i915_request.c           |  14 +-
 7 files changed, 211 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index 3fe7794b2bfd..b633fea684d4 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -366,6 +366,12 @@ static int __intel_context_active(struct i915_active *active)
 	return 0;
 }
 
+static int sw_fence_dummy_notify(struct i915_sw_fence *sf,
+				 enum i915_sw_fence_notify state)
+{
+	return NOTIFY_DONE;
+}
+
 void
 intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
 {
@@ -398,6 +404,9 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
 	ce->guc_id = GUC_INVALID_LRC_ID;
 	INIT_LIST_HEAD(&ce->guc_id_link);
 
+	i915_sw_fence_init(&ce->guc_blocked, sw_fence_dummy_notify);
+	i915_sw_fence_commit(&ce->guc_blocked);
+
 	i915_active_init(&ce->active,
 			 __intel_context_active, __intel_context_retire, 0);
 }
diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
index 11fa7700dc9e..1b208daee72b 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -71,6 +71,13 @@ intel_context_is_pinned(struct intel_context *ce)
 	return atomic_read(&ce->pin_count);
 }
 
+static inline void intel_context_cancel_request(struct intel_context *ce,
+						struct i915_request *rq)
+{
+	GEM_BUG_ON(!ce->ops->cancel_request);
+	return ce->ops->cancel_request(ce, rq);
+}
+
 /**
  * intel_context_unlock_pinned - Releases the earlier locking of 'pinned' status
  * @ce - the context
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 217761b27b6c..cd2ea5b98fc3 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -13,6 +13,7 @@
 #include <linux/types.h>
 
 #include "i915_active_types.h"
+#include "i915_sw_fence.h"
 #include "i915_utils.h"
 #include "intel_engine_types.h"
 #include "intel_sseu.h"
@@ -43,6 +44,9 @@ struct intel_context_ops {
 	void (*unpin)(struct intel_context *ce);
 	void (*post_unpin)(struct intel_context *ce);
 
+	void (*cancel_request)(struct intel_context *ce,
+			       struct i915_request *rq);
+
 	void (*enter)(struct intel_context *ce);
 	void (*exit)(struct intel_context *ce);
 
@@ -200,6 +204,9 @@ struct intel_context {
 	 */
 	u8 guc_prio;
 	u32 guc_prio_count[GUC_CLIENT_PRIORITY_NUM];
+
+	/* GuC context blocked fence */
+	struct i915_sw_fence guc_blocked;
 };
 
 #endif /* __INTEL_CONTEXT_TYPES__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 54518b64bdbd..16606cdfc2f5 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -114,6 +114,7 @@
 #include "gen8_engine_cs.h"
 #include "intel_breadcrumbs.h"
 #include "intel_context.h"
+#include "intel_engine_heartbeat.h"
 #include "intel_engine_pm.h"
 #include "intel_engine_stats.h"
 #include "intel_execlists_submission.h"
@@ -2545,11 +2546,26 @@ static int execlists_context_alloc(struct intel_context *ce)
 	return lrc_alloc(ce, ce->engine);
 }
 
+static void execlists_context_cancel_request(struct intel_context *ce,
+					     struct i915_request *rq)
+{
+	struct intel_engine_cs *engine = NULL;
+
+	i915_request_active_engine(rq, &engine);
+
+	if (engine && intel_engine_pulse(engine))
+		intel_gt_handle_error(engine->gt, engine->mask, 0,
+				      "request cancellation by %s",
+				      current->comm);
+}
+
 static const struct intel_context_ops execlists_context_ops = {
 	.flags = COPS_HAS_INFLIGHT,
 
 	.alloc = execlists_context_alloc,
 
+	.cancel_request = execlists_context_cancel_request,
+
 	.pre_pin = execlists_context_pre_pin,
 	.pin = execlists_context_pin,
 	.unpin = lrc_unpin,
@@ -3649,6 +3665,8 @@ static const struct intel_context_ops virtual_context_ops = {
 
 	.alloc = virtual_context_alloc,
 
+	.cancel_request = execlists_context_cancel_request,
+
 	.pre_pin = virtual_context_pre_pin,
 	.pin = virtual_context_pin,
 	.unpin = lrc_unpin,
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
index c6c702f236fa..3d3043d4bf98 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
@@ -13,6 +13,7 @@
 #include "intel_gt_pm.h"
 #include "intel_gt_requests.h"
 #include "intel_timeline.h"
+#include "intel_context.h"
 #include "uc/intel_uc.h"
 
 static bool retire_requests(struct intel_timeline *tl)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 6d2ae6390299..b3157eeb2599 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -181,6 +181,11 @@ static void clr_guc_ids_exhausted(struct guc_submit_engine *gse)
 #define SCHED_STATE_NO_LOCK_GUC_ID_STOLEN		BIT(3)
 #define SCHED_STATE_NO_LOCK_NEEDS_REGISTER		BIT(4)
 #define SCHED_STATE_NO_LOCK_REGISTERED			BIT(5)
+#define SCHED_STATE_NO_LOCK_BLOCKED_SHIFT		6
+#define SCHED_STATE_NO_LOCK_BLOCKED \
+	BIT(SCHED_STATE_NO_LOCK_BLOCKED_SHIFT)
+#define SCHED_STATE_NO_LOCK_BLOCKED_MASK \
+	(0xffff << SCHED_STATE_NO_LOCK_BLOCKED_SHIFT)
 static inline bool context_enabled(struct intel_context *ce)
 {
 	return (atomic_read(&ce->guc_sched_state_no_lock) &
@@ -288,6 +293,27 @@ static inline void clr_context_registered(struct intel_context *ce)
 		   &ce->guc_sched_state_no_lock);
 }
 
+static inline u32 context_blocked(struct intel_context *ce)
+{
+	return (atomic_read(&ce->guc_sched_state_no_lock) &
+		SCHED_STATE_NO_LOCK_BLOCKED_MASK) >>
+		SCHED_STATE_NO_LOCK_BLOCKED_SHIFT;
+}
+
+static inline void incr_context_blocked(struct intel_context *ce)
+{
+	lockdep_assert_held(&ce_to_gse(ce)->sched_engine.lock);
+	atomic_add(SCHED_STATE_NO_LOCK_BLOCKED,
+		   &ce->guc_sched_state_no_lock);
+}
+
+static inline void decr_context_blocked(struct intel_context *ce)
+{
+	lockdep_assert_held(&ce_to_gse(ce)->sched_engine.lock);
+	atomic_sub(SCHED_STATE_NO_LOCK_BLOCKED,
+		   &ce->guc_sched_state_no_lock);
+}
+
 /*
  * Below is a set of functions which control the GuC scheduling state which
  * require a lock, aside from the special case where the functions are called
@@ -593,6 +619,9 @@ static int __guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 		goto out;
 	}
 
+	if (unlikely(context_blocked(ce)))
+		goto out;
+
 	enabled = context_enabled(ce);
 
 	if (!enabled) {
@@ -853,6 +882,7 @@ static void __guc_context_destroy(struct intel_context *ce);
 static void release_guc_id(struct intel_guc *guc, struct intel_context *ce);
 static void guc_signal_context_fence(struct intel_context *ce);
 static void guc_cancel_context_requests(struct intel_context *ce);
+static void guc_blocked_fence_complete(struct intel_context *ce);
 
 static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
 {
@@ -900,6 +930,10 @@ static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
 			}
 			intel_context_sched_disable_unpin(ce);
 			atomic_dec(&guc->outstanding_submission_g2h);
+			spin_lock_irqsave(&ce->guc_state.lock, flags);
+			guc_blocked_fence_complete(ce);
+			spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+
 			intel_context_put(ce);
 		}
 	}
@@ -1917,6 +1951,21 @@ static void guc_context_post_unpin(struct intel_context *ce)
 	lrc_post_unpin(ce);
 }
 
+static void __guc_context_sched_enable(struct intel_guc *guc,
+				       struct intel_context *ce)
+{
+	u32 action[] = {
+		INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET,
+		ce->guc_id,
+		GUC_CONTEXT_ENABLE
+	};
+
+	trace_intel_context_sched_enable(ce);
+
+	guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),
+				 G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
+}
+
 static void __guc_context_sched_disable(struct intel_guc *guc,
 					struct intel_context *ce,
 					u16 guc_id)
@@ -1935,15 +1984,129 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
 				 G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
 }
 
+static void guc_blocked_fence_complete(struct intel_context *ce)
+{
+	lockdep_assert_held(&ce->guc_state.lock);
+
+	if (!i915_sw_fence_done(&ce->guc_blocked))
+		i915_sw_fence_complete(&ce->guc_blocked);
+}
+
+static void guc_blocked_fence_reinit(struct intel_context *ce)
+{
+	lockdep_assert_held(&ce->guc_state.lock);
+	GEM_BUG_ON(!i915_sw_fence_done(&ce->guc_blocked));
+	i915_sw_fence_fini(&ce->guc_blocked);
+	i915_sw_fence_reinit(&ce->guc_blocked);
+	i915_sw_fence_await(&ce->guc_blocked);
+	i915_sw_fence_commit(&ce->guc_blocked);
+}
+
 static u16 prep_context_pending_disable(struct intel_context *ce)
 {
 	set_context_pending_disable(ce);
 	clr_context_enabled(ce);
+	guc_blocked_fence_reinit(ce);
 	intel_context_get(ce);
 
 	return ce->guc_id;
 }
 
+static struct i915_sw_fence *guc_context_block(struct intel_context *ce)
+{
+	struct intel_guc *guc = ce_to_guc(ce);
+	struct i915_sched_engine *sched_engine = ce_to_sched_engine(ce);
+	unsigned long flags;
+	struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm;
+	intel_wakeref_t wakeref;
+	u16 guc_id;
+	bool enabled;
+
+	spin_lock_irqsave(&sched_engine->lock, flags);
+	incr_context_blocked(ce);
+	spin_unlock_irqrestore(&sched_engine->lock, flags);
+
+	spin_lock_irqsave(&ce->guc_state.lock, flags);
+	enabled = context_enabled(ce);
+	if (unlikely(!enabled || submission_disabled(guc))) {
+		if (!enabled)
+			clr_context_enabled(ce);
+		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+		return &ce->guc_blocked;
+	}
+
+	/*
+	 * We add +2 here as the schedule disable complete CTB handler calls
+	 * intel_context_sched_disable_unpin (-2 to pin_count).
+	 */
+	atomic_add(2, &ce->pin_count);
+
+	guc_id = prep_context_pending_disable(ce);
+	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+
+	with_intel_runtime_pm(runtime_pm, wakeref)
+		__guc_context_sched_disable(guc, ce, guc_id);
+
+	return &ce->guc_blocked;
+}
+
+static void guc_context_unblock(struct intel_context *ce)
+{
+	struct intel_guc *guc = ce_to_guc(ce);
+	struct i915_sched_engine *sched_engine = ce_to_sched_engine(ce);
+	unsigned long flags;
+	struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm;
+	intel_wakeref_t wakeref;
+
+	GEM_BUG_ON(context_enabled(ce));
+
+	if (unlikely(context_blocked(ce) > 1)) {
+		spin_lock_irqsave(&sched_engine->lock, flags);
+		if (likely(context_blocked(ce) > 1))
+			goto decrement;
+		spin_unlock_irqrestore(&sched_engine->lock, flags);
+	}
+
+	spin_lock_irqsave(&ce->guc_state.lock, flags);
+	if (unlikely(submission_disabled(guc) ||
+		     !intel_context_is_pinned(ce) ||
+		     context_pending_disable(ce) ||
+		     context_blocked(ce) > 1)) {
+		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+		goto out;
+	}
+
+	set_context_pending_enable(ce);
+	set_context_enabled(ce);
+	intel_context_get(ce);
+	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+
+	with_intel_runtime_pm(runtime_pm, wakeref)
+		__guc_context_sched_enable(guc, ce);
+
+out:
+	spin_lock_irqsave(&sched_engine->lock, flags);
+decrement:
+	decr_context_blocked(ce);
+	spin_unlock_irqrestore(&sched_engine->lock, flags);
+}
+
+static void guc_context_cancel_request(struct intel_context *ce,
+				       struct i915_request *rq)
+{
+	if (i915_sw_fence_signaled(&rq->submit)) {
+		struct i915_sw_fence *fence = guc_context_block(ce);
+
+		i915_sw_fence_wait(fence);
+		if (!i915_request_completed(rq)) {
+			__i915_request_skip(rq);
+			guc_reset_state(ce, intel_ring_wrap(ce->ring, rq->head),
+					true);
+		}
+		guc_context_unblock(ce);
+	}
+}
+
 static void guc_context_ban(struct intel_context *ce, struct i915_request *rq)
 {
 	struct intel_guc *guc = ce_to_guc(ce);
@@ -2294,6 +2457,8 @@ static const struct intel_context_ops guc_context_ops = {
 
 	.ban = guc_context_ban,
 
+	.cancel_request = guc_context_cancel_request,
+
 	.enter = intel_context_enter_engine,
 	.exit = intel_context_exit_engine,
 
@@ -2661,6 +2826,8 @@ static const struct intel_context_ops virtual_guc_context_ops = {
 
 	.ban = guc_context_ban,
 
+	.cancel_request = guc_context_cancel_request,
+
 	.enter = guc_virtual_context_enter,
 	.exit = guc_virtual_context_exit,
 
@@ -3212,6 +3379,7 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
 		clr_context_banned(ce);
 		clr_context_pending_disable(ce);
 		__guc_signal_context_fence(ce);
+		guc_blocked_fence_complete(ce);
 		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 
 		if (context_block_tasklet(ce)) {
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 4bf10f0ac34d..71965fb4f3ab 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -734,18 +734,6 @@ void i915_request_unsubmit(struct i915_request *request)
 	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
 }
 
-static void __cancel_request(struct i915_request *rq)
-{
-	struct intel_engine_cs *engine = NULL;
-
-	i915_request_active_engine(rq, &engine);
-
-	if (engine && intel_engine_pulse(engine))
-		intel_gt_handle_error(engine->gt, engine->mask, 0,
-				      "request cancellation by %s",
-				      current->comm);
-}
-
 void i915_request_cancel(struct i915_request *rq, int error)
 {
 	if (!i915_request_set_error_once(rq, error))
@@ -753,7 +741,7 @@ void i915_request_cancel(struct i915_request *rq, int error)
 
 	set_bit(I915_FENCE_FLAG_SENTINEL, &rq->fence.flags);
 
-	__cancel_request(rq);
+	intel_context_cancel_request(rq->context, rq);
 }
 
 static int __i915_sw_fence_call
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 89/97] drm/i915/guc: Check return of __xa_store when registering a context
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (87 preceding siblings ...)
  2021-05-06 19:14 ` [RFC PATCH 88/97] drm/i915/guc: Support request cancellation Matthew Brost
@ 2021-05-06 19:14 ` Matthew Brost
  2021-05-06 19:14 ` [RFC PATCH 90/97] drm/i915/guc: Non-static lrc descriptor registration buffer Matthew Brost
                   ` (10 subsequent siblings)
  99 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

Check the return value of __xa_store() when registering a context as
this can fail in rare cases if memory cannot be allocated. If this
occurs, fall back on the tasklet flow control and try again in the
future.
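
For reference, a minimal sketch of the xarray convention this relies
on: __xa_store() reports failure by returning an error encoded as a
pointer, so it must be checked with xa_is_err() rather than compared
against NULL. The helper below is hypothetical kernel-context code; the
actual change returns -EBUSY instead of the decoded error so the
request is retried via the tasklet later.

/* kernel context: needs <linux/xarray.h> */
static int store_entry_sketch(struct xarray *xa, unsigned long id, void *entry)
{
	unsigned long flags;
	void *old;

	xa_lock_irqsave(xa, flags);
	old = __xa_store(xa, id, entry, GFP_ATOMIC);
	xa_unlock_irqrestore(xa, flags);

	if (xa_is_err(old))
		return xa_err(old);	/* typically -ENOMEM under GFP_ATOMIC */

	return 0;
}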

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index b3157eeb2599..608b30907f4c 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -503,18 +503,24 @@ static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
 	return __get_context(guc, id);
 }
 
-static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
+static inline int set_lrc_desc_registered(struct intel_guc *guc, u32 id,
 					   struct intel_context *ce)
 {
 	unsigned long flags;
+	void *ret;
 
 	/*
 	 * xarray API doesn't have xa_save_irqsave wrapper, so calling the
 	 * lower level functions directly.
 	 */
 	xa_lock_irqsave(&guc->context_lookup, flags);
-	__xa_store(&guc->context_lookup, id, ce, GFP_ATOMIC);
+	ret = __xa_store(&guc->context_lookup, id, ce, GFP_ATOMIC);
 	xa_unlock_irqrestore(&guc->context_lookup, flags);
+
+	if (unlikely(xa_is_err(ret)))
+		return -EBUSY;	/* Try again in future */
+
+	return 0;
 }
 
 static int guc_submission_busy_loop(struct intel_guc* guc,
@@ -1831,7 +1837,9 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
 	rcu_read_unlock();
 
 	reset_lrc_desc(guc, desc_idx);
-	set_lrc_desc_registered(guc, desc_idx, ce);
+	ret = set_lrc_desc_registered(guc, desc_idx, ce);
+	if (unlikely(ret))
+		return ret;
 
 	desc = __get_lrc_desc(guc, desc_idx);
 	desc->engine_class = engine_class_to_guc_class(engine->class);
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 90/97] drm/i915/guc: Non-static lrc descriptor registration buffer
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (88 preceding siblings ...)
  2021-05-06 19:14 ` [RFC PATCH 89/97] drm/i915/guc: Check return of __xa_store when registering a context Matthew Brost
@ 2021-05-06 19:14 ` Matthew Brost
  2021-05-06 19:14 ` [RFC PATCH 91/97] drm/i915/guc: Take GT PM ref when deregistering context Matthew Brost
                   ` (9 subsequent siblings)
  99 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

Dynamically allocate space for lrc descriptor registration with the GuC
rather than using a large static buffer indexed by the guc_id. If no
space is available to register a context, fall back to the tasklet flow
control mechanism. Only allow 1/2 of the space to be allocated outside
the tasklet to prevent unready requests/contexts from consuming all
registration space.
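
A minimal sketch of the split allocation policy described above (the helper
name is hypothetical; the real logic lives in alloc_lrcd_reg_idx() in this
patch):

#include <linux/idr.h>

/*
 * Only the first half of the id space can be handed out from process
 * context, so requests that are not yet ready can never starve the tasklet
 * (flow control) path of registration slots.
 */
static int get_reg_slot(struct ida *ida, unsigned int max_idx, bool tasklet)
{
	unsigned int limit = tasklet ? max_idx : max_idx / 2;
	gfp_t gfp = tasklet ? GFP_ATOMIC : GFP_KERNEL;

	return ida_simple_get(ida, 0, limit, gfp);
}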

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context_types.h |   3 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   9 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 198 +++++++++++++-----
 3 files changed, 150 insertions(+), 60 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index cd2ea5b98fc3..0d7173d3eabd 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -182,6 +182,9 @@ struct intel_context {
 	/* GuC scheduling state that does not require a lock. */
 	atomic_t guc_sched_state_no_lock;
 
+	/* GuC lrc descriptor registration buffer */
+	unsigned int guc_lrcd_reg_idx;
+
 	/* GuC lrc descriptor ID */
 	u16 guc_id;
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 96849a256be8..97bb262f8a13 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -68,8 +68,13 @@ struct intel_guc {
 	u32 ads_regset_size;
 	u32 ads_golden_ctxt_size;
 
-	struct i915_vma *lrc_desc_pool;
-	void *lrc_desc_pool_vaddr;
+	/* GuC LRC descriptor registration */
+	struct {
+		struct i915_vma *vma;
+		void *vaddr;
+		struct ida ida;
+		unsigned int max_idx;
+	} lrcd_reg;
 
 	/* guc_id to intel_context lookup */
 	struct xarray context_lookup;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 608b30907f4c..79caf9596084 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -437,65 +437,54 @@ static inline struct i915_priolist *to_priolist(struct rb_node *rb)
 	return rb_entry(rb, struct i915_priolist, node);
 }
 
-static struct guc_lrc_desc *__get_lrc_desc(struct intel_guc *guc, u32 index)
+static u32 __get_lrc_desc_offset(struct intel_guc *guc, int index)
 {
-	struct guc_lrc_desc *base = guc->lrc_desc_pool_vaddr;
-
+	GEM_BUG_ON(index >= guc->lrcd_reg.max_idx);
 	GEM_BUG_ON(index >= guc->max_guc_ids);
 
-	return &base[index];
+	return intel_guc_ggtt_offset(guc, guc->lrcd_reg.vma) +
+		(index * sizeof(struct guc_lrc_desc));
 }
 
-static inline struct intel_context *__get_context(struct intel_guc *guc, u32 id)
+static struct guc_lrc_desc *__get_lrc_desc(struct intel_guc *guc, int index)
 {
-	struct intel_context *ce = xa_load(&guc->context_lookup, id);
+	struct guc_lrc_desc *desc;
 
-	GEM_BUG_ON(id >= guc->max_guc_ids);
+	GEM_BUG_ON(index >= guc->lrcd_reg.max_idx);
+	GEM_BUG_ON(index >= guc->max_guc_ids);
 
-	return ce;
+	desc = guc->lrcd_reg.vaddr;
+	desc = &desc[index];
+	memset(desc, 0, sizeof(*desc));
+
+	return desc;
 }
 
-static int guc_lrc_desc_pool_create(struct intel_guc *guc)
+static inline struct intel_context *__get_context(struct intel_guc *guc, u32 id)
 {
-	u32 size;
-	int ret;
-
-	size = PAGE_ALIGN(sizeof(struct guc_lrc_desc) * guc->max_guc_ids);
-	ret = intel_guc_allocate_and_map_vma(guc, size, &guc->lrc_desc_pool,
-					     (void **)&guc->lrc_desc_pool_vaddr);
-	if (ret)
-		return ret;
+	struct intel_context *ce = xa_load(&guc->context_lookup, id);
 
-	return 0;
-}
+	GEM_BUG_ON(id >= guc->max_guc_ids);
 
-static void guc_lrc_desc_pool_destroy(struct intel_guc *guc)
-{
-	guc->lrc_desc_pool_vaddr = NULL;
-	i915_vma_unpin_and_release(&guc->lrc_desc_pool, I915_VMA_RELEASE_MAP);
+	return ce;
 }
 
 static inline bool guc_submission_initialized(struct intel_guc *guc)
 {
-	return guc->lrc_desc_pool_vaddr != NULL;
+	return guc->lrcd_reg.max_idx != 0;
 }
 
-static inline void reset_lrc_desc(struct intel_guc *guc, u32 id)
+static inline void clr_lrc_desc_registered(struct intel_guc *guc, u32 id)
 {
-	if (likely(guc_submission_initialized(guc))) {
-		struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
-		unsigned long flags;
-
-		memset(desc, 0, sizeof(*desc));
+	unsigned long flags;
 
-		/*
-		 * xarray API doesn't have xa_erase_irqsave wrapper, so calling
-		 * the lower level functions directly.
-		 */
-		xa_lock_irqsave(&guc->context_lookup, flags);
-		__xa_erase(&guc->context_lookup, id);
-		xa_unlock_irqrestore(&guc->context_lookup, flags);
-	}
+	/*
+	 * xarray API doesn't have xa_erase_irqsave wrapper, so calling
+	 * the lower level functions directly.
+	 */
+	xa_lock_irqsave(&guc->context_lookup, flags);
+	__xa_erase(&guc->context_lookup, id);
+	xa_unlock_irqrestore(&guc->context_lookup, flags);
 }
 
 static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
@@ -1373,6 +1362,9 @@ static void retire_worker_func(struct work_struct *w)
 	}
 }
 
+static int guc_lrcd_reg_init(struct intel_guc *guc);
+static void guc_lrcd_reg_fini(struct intel_guc *guc);
+
 /*
  * Set up the memory resources to be shared with the GuC (via the GGTT)
  * at firmware loading time.
@@ -1381,17 +1373,12 @@ int intel_guc_submission_init(struct intel_guc *guc)
 {
 	int ret;
 
-	if (guc->lrc_desc_pool)
+	if (guc_submission_initialized(guc))
 		return 0;
 
-	ret = guc_lrc_desc_pool_create(guc);
+	ret = guc_lrcd_reg_init(guc);
 	if (ret)
 		return ret;
-	/*
-	 * Keep static analysers happy, let them know that we allocated the
-	 * vma after testing that it didn't exist earlier.
-	 */
-	GEM_BUG_ON(!guc->lrc_desc_pool);
 
 	xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ);
 
@@ -1407,10 +1394,10 @@ void intel_guc_submission_fini(struct intel_guc *guc)
 {
 	int i;
 
-	if (!guc->lrc_desc_pool)
+	if (!guc_submission_initialized(guc))
 		return;
 
-	guc_lrc_desc_pool_destroy(guc);
+	guc_lrcd_reg_fini(guc);
 
 	for (i = 0; i < GUC_SUBMIT_ENGINE_MAX; ++i) {
 		struct i915_sched_engine *sched_engine =
@@ -1481,6 +1468,7 @@ static bool need_tasklet(struct guc_submit_engine *gse, struct intel_context *ce
 	return guc_ids_exhausted(gse) || submission_disabled(gse->guc) ||
 		gse->stalled_rq || gse->stalled_context ||
 		!lrc_desc_registered(gse->guc, ce->guc_id) ||
+		context_needs_register(ce) ||
 		!i915_sched_engine_is_empty(sched_engine);
 }
 
@@ -1533,7 +1521,7 @@ static void __release_guc_id(struct intel_guc *guc, struct intel_context *ce)
 {
 	if (!context_guc_id_invalid(ce)) {
 		ida_simple_remove(&guc->guc_ids, ce->guc_id);
-		reset_lrc_desc(guc, ce->guc_id);
+		clr_lrc_desc_registered(guc, ce->guc_id);
 		set_context_guc_id_invalid(ce);
 	}
 	if (!list_empty(&ce->guc_id_link))
@@ -1723,14 +1711,14 @@ static void unpin_guc_id(struct intel_guc *guc,
 }
 
 static int __guc_action_register_context(struct intel_guc *guc,
+					 struct intel_context *ce,
 					 u32 guc_id,
-					 u32 offset,
 					 bool loop)
 {
 	u32 action[] = {
 		INTEL_GUC_ACTION_REGISTER_CONTEXT,
 		guc_id,
-		offset,
+		__get_lrc_desc_offset(guc, ce->guc_lrcd_reg_idx),
 	};
 
 	return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action), 0, loop);
@@ -1739,13 +1727,11 @@ static int __guc_action_register_context(struct intel_guc *guc,
 static int register_context(struct intel_context *ce, bool loop)
 {
 	struct intel_guc *guc = ce_to_guc(ce);
-	u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
-		ce->guc_id * sizeof(struct guc_lrc_desc);
 	int ret;
 
 	trace_intel_context_register(ce);
 
-	ret = __guc_action_register_context(guc, ce->guc_id, offset, loop);
+	ret = __guc_action_register_context(guc, ce, ce->guc_id, loop);
 	set_context_registered(ce);
 	return ret;
 }
@@ -1804,6 +1790,86 @@ static void guc_context_policy_init(struct intel_engine_cs *engine,
 
 static inline u8 map_i915_prio_to_guc_prio(int prio);
 
+static int alloc_lrcd_reg_idx_buffer(struct intel_guc *guc, int num_per_vma)
+{
+	u32 size = num_per_vma * sizeof(struct guc_lrc_desc);
+	struct i915_vma **vma = &guc->lrcd_reg.vma;
+	void **vaddr = &guc->lrcd_reg.vaddr;
+	int ret;
+
+	GEM_BUG_ON(!is_power_of_2(size));
+
+	ret = intel_guc_allocate_and_map_vma(guc, size, vma, vaddr);
+	if (unlikely(ret))
+		return ret;
+
+	guc->lrcd_reg.max_idx += num_per_vma;
+
+	return 0;
+}
+
+static int alloc_lrcd_reg_idx(struct intel_guc *guc, bool tasklet)
+{
+	int ret;
+	gfp_t gfp = tasklet ? GFP_ATOMIC :
+		GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN;
+
+	might_sleep_if(!tasklet);
+
+	/*
+	 * We only allow 1/2 of the space to be allocated outside of tasklet
+	 * (flow control) to ensure requests that are not ready don't consume
+	 * all context registration space.
+	 */
+	ret = ida_simple_get(&guc->lrcd_reg.ida, 0,
+			     tasklet ? guc->lrcd_reg.max_idx :
+			     guc->lrcd_reg.max_idx / 2, gfp);
+	if (unlikely(ret < 0))
+		return -EBUSY;
+
+	return ret;
+}
+
+static void __free_lrcd_reg_idx(struct intel_guc *guc, struct intel_context *ce)
+{
+	if (ce->guc_lrcd_reg_idx && guc->lrcd_reg.max_idx) {
+		ida_simple_remove(&guc->lrcd_reg.ida, ce->guc_lrcd_reg_idx);
+		ce->guc_lrcd_reg_idx = 0;
+	}
+}
+
+static void free_lrcd_reg_idx(struct intel_guc *guc, struct intel_context *ce)
+{
+	__free_lrcd_reg_idx(guc, ce);
+}
+
+static int guc_lrcd_reg_init(struct intel_guc *guc)
+{
+	unsigned buffer_size = I915_GTT_PAGE_SIZE_4K * 16;
+	int ret;
+
+	ida_init(&guc->lrcd_reg.ida);
+
+	ret = alloc_lrcd_reg_idx_buffer(guc, buffer_size /
+					sizeof(struct guc_lrc_desc));
+	if (unlikely(ret))
+		return ret;
+
+	/* Zero is reserved */
+	ret = alloc_lrcd_reg_idx(guc, false);
+	GEM_BUG_ON(ret);
+
+	return ret;
+}
+
+static void guc_lrcd_reg_fini(struct intel_guc *guc)
+{
+	i915_vma_unpin_and_release(&guc->lrcd_reg.vma,
+				   I915_VMA_RELEASE_MAP);
+	ida_destroy(&guc->lrcd_reg.ida);
+	guc->lrcd_reg.max_idx = 0;
+}
+
 static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
 {
 	struct intel_runtime_pm *runtime_pm =
@@ -1828,6 +1894,14 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
 	GEM_BUG_ON(i915_gem_object_is_lmem(guc->ct.vma->obj) !=
 		   i915_gem_object_is_lmem(ce->ring->vma->obj));
 
+	/* Allocate space for registeration */
+	if (likely(!ce->guc_lrcd_reg_idx)) {
+		ret = alloc_lrcd_reg_idx(guc, !loop);
+		if (unlikely(ret < 0))
+			return ret;
+		ce->guc_lrcd_reg_idx = ret;
+	}
+
 	context_registered = lrc_desc_registered(guc, desc_idx);
 
 	rcu_read_lock();
@@ -1836,12 +1910,11 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
 		prio = ctx->sched.priority;
 	rcu_read_unlock();
 
-	reset_lrc_desc(guc, desc_idx);
 	ret = set_lrc_desc_registered(guc, desc_idx, ce);
 	if (unlikely(ret))
 		return ret;
 
-	desc = __get_lrc_desc(guc, desc_idx);
+	desc = __get_lrc_desc(guc, ce->guc_lrcd_reg_idx);
 	desc->engine_class = engine_class_to_guc_class(engine->class);
 	desc->engine_submit_mask = adjust_engine_mask(engine->class,
 						      engine->mask);
@@ -1879,7 +1952,7 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
 			}
 			spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 			if (unlikely(disabled)) {
-				reset_lrc_desc(guc, desc_idx);
+				clr_lrc_desc_registered(guc, desc_idx);
 				return 0;	/* Will get registered later */
 			}
 		}
@@ -1905,7 +1978,7 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
 		with_intel_runtime_pm(runtime_pm, wakeref)
 			ret = register_context(ce, loop);
 		if (unlikely(ret == -EBUSY))
-			reset_lrc_desc(guc, desc_idx);
+			clr_lrc_desc_registered(guc, desc_idx);
 		else if (unlikely(ret == -ENODEV))
 			ret = 0;	/* Will get registered later */
 	}
@@ -2146,6 +2219,7 @@ static void guc_context_ban(struct intel_context *ce, struct i915_request *rq)
 		guc_id = prep_context_pending_disable(ce);
 		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 
+		free_lrcd_reg_idx(guc, ce);
 		with_intel_runtime_pm(runtime_pm, wakeref)
 			__guc_context_sched_disable(guc, ce, guc_id);
 	} else {
@@ -2224,6 +2298,7 @@ static void __guc_context_destroy(struct intel_context *ce)
 
 	lrc_fini(ce);
 	intel_context_fini(ce);
+	__free_lrcd_reg_idx(ce_to_guc(ce), ce);
 
 	if (intel_engine_is_virtual(ce->engine)) {
 		struct guc_virtual_engine *ve =
@@ -2726,11 +2801,14 @@ static int guc_request_alloc(struct i915_request *rq)
 
 	if (context_needs_lrc_desc_pin(ce, !!ret)) {
 		ret = guc_lrc_desc_pin(ce, true);
-		if (unlikely(ret)) {	/* unwind */
+		if (unlikely(ret == -EBUSY)) {
+			set_context_needs_register(ce);
+		} else if (unlikely(ret)) {	/* unwind */
 			if (ret == -EDEADLK)
 				disable_submission(guc);
 			atomic_dec(&ce->guc_id_ref);
 			unpin_guc_id(guc, ce, true);
+
 			return ret;
 		}
 	}
@@ -3370,6 +3448,8 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
 
 	if (context_pending_enable(ce)) {
 		clr_context_pending_enable(ce);
+
+		free_lrcd_reg_idx(guc, ce);
 	} else if (context_pending_disable(ce)) {
 		bool banned;
 
@@ -3618,6 +3698,8 @@ void intel_guc_log_submission_info(struct intel_guc *guc,
 		   atomic_read(&guc->outstanding_submission_g2h));
 	drm_printf(p, "GuC Number GuC IDs: %d\n", guc->num_guc_ids);
 	drm_printf(p, "GuC Max Number GuC IDs: %d\n\n", guc->max_guc_ids);
+	drm_printf(p, "GuC max context registered: %u\n\n",
+		   guc->lrcd_reg.max_idx);
 
 	for (i = 0; i < GUC_SUBMIT_ENGINE_MAX; ++i)
 		gse_log_submission_info(guc->gse[i], p, i);
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 91/97] drm/i915/guc: Take GT PM ref when deregistering context
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (89 preceding siblings ...)
  2021-05-06 19:14 ` [RFC PATCH 90/97] drm/i915/guc: Non-static lrc descriptor registration buffer Matthew Brost
@ 2021-05-06 19:14 ` Matthew Brost
  2021-05-06 19:14 ` [RFC PATCH 92/97] drm/i915: Add GT PM delayed worker Matthew Brost
                   ` (8 subsequent siblings)
  99 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

Take a GT PM reference to prevent intel_gt_wait_for_idle from
short-circuiting while a context deregister H2G is in flight.
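
For illustration, the with_intel_gt_pm() helper added below holds a GT
wakeref for the duration of the following statement (a minimal sketch that
mirrors destroy_worker_func() in this patch):

static void flush_destroyed(struct intel_guc *guc)
{
	struct intel_gt *gt = guc_to_gt(guc);
	int tmp;

	/* Hold a GT wakeref across the H2G traffic generated below. */
	with_intel_gt_pm(gt, tmp)
		deregister_destroyed_contexts(guc);
}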

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_engine_pm.h     |  5 +
 drivers/gpu/drm/i915/gt/intel_gt_pm.h         | 13 +++
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  4 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 98 +++++++++++++++----
 4 files changed, 101 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.h b/drivers/gpu/drm/i915/gt/intel_engine_pm.h
index 70ea46d6cfb0..17a5028ea177 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_pm.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.h
@@ -16,6 +16,11 @@ intel_engine_pm_is_awake(const struct intel_engine_cs *engine)
 	return intel_wakeref_is_active(&engine->wakeref);
 }
 
+static inline void __intel_engine_pm_get(struct intel_engine_cs *engine)
+{
+	__intel_wakeref_get(&engine->wakeref);
+}
+
 static inline void intel_engine_pm_get(struct intel_engine_cs *engine)
 {
 	intel_wakeref_get(&engine->wakeref);
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.h b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
index d0588d8aaa44..a17bf0d4592b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
@@ -41,6 +41,19 @@ static inline void intel_gt_pm_put_async(struct intel_gt *gt)
 	intel_wakeref_put_async(&gt->wakeref);
 }
 
+#define with_intel_gt_pm(gt, tmp) \
+	for (tmp = 1, intel_gt_pm_get(gt); tmp; \
+	     intel_gt_pm_put(gt), tmp = 0)
+#define with_intel_gt_pm_async(gt, tmp) \
+	for (tmp = 1, intel_gt_pm_get(gt); tmp; \
+	     intel_gt_pm_put_async(gt), tmp = 0)
+#define with_intel_gt_pm_if_awake(gt, tmp) \
+	for (tmp = intel_gt_pm_get_if_awake(gt); tmp; \
+	     intel_gt_pm_put(gt), tmp = 0)
+#define with_intel_gt_pm_if_awake_async(gt, tmp) \
+	for (tmp = intel_gt_pm_get_if_awake(gt); tmp; \
+	     intel_gt_pm_put_async(gt), tmp = 0)
+
 static inline int intel_gt_pm_wait_for_idle(struct intel_gt *gt)
 {
 	return intel_wakeref_wait_for_idle(&gt->wakeref);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 97bb262f8a13..f6c40f6fb7ac 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -61,6 +61,10 @@ struct intel_guc {
 	struct list_head guc_id_list_no_ref;
 	struct list_head guc_id_list_unpinned;
 
+	spinlock_t destroy_lock;
+	struct list_head destroyed_contexts;
+	struct work_struct destroy_worker;
+
 	bool submission_selected;
 
 	struct i915_vma *ads_vma;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 79caf9596084..6fd5414296cd 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -909,6 +909,7 @@ static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
 			if (deregister)
 				guc_signal_context_fence(ce);
 			if (destroyed) {
+				intel_gt_pm_put_async(guc_to_gt(guc));
 				release_guc_id(guc, ce);
 				__guc_context_destroy(ce);
 			}
@@ -1023,6 +1024,8 @@ static void guc_flush_submissions(struct intel_guc *guc)
 		gse_flush_submissions(guc->gse[i]);
 }
 
+static void guc_flush_destroyed_contexts(struct intel_guc *guc);
+
 void intel_guc_submission_reset_prepare(struct intel_guc *guc)
 {
 	int i;
@@ -1040,6 +1043,7 @@ void intel_guc_submission_reset_prepare(struct intel_guc *guc)
 	spin_unlock_irq(&guc_to_gt(guc)->irq_lock);
 
 	guc_flush_submissions(guc);
+	guc_flush_destroyed_contexts(guc);
 
 	/*
 	 * Handle any outstanding G2Hs before reset. Call IRQ handler directly
@@ -1365,6 +1369,8 @@ static void retire_worker_func(struct work_struct *w)
 static int guc_lrcd_reg_init(struct intel_guc *guc);
 static void guc_lrcd_reg_fini(struct intel_guc *guc);
 
+static void destroy_worker_func(struct work_struct *w);
+
 /*
  * Set up the memory resources to be shared with the GuC (via the GGTT)
  * at firmware loading time.
@@ -1387,6 +1393,10 @@ int intel_guc_submission_init(struct intel_guc *guc)
 	INIT_LIST_HEAD(&guc->guc_id_list_unpinned);
 	ida_init(&guc->guc_ids);
 
+	spin_lock_init(&guc->destroy_lock);
+	INIT_LIST_HEAD(&guc->destroyed_contexts);
+	INIT_WORK(&guc->destroy_worker, destroy_worker_func);
+
 	return 0;
 }
 
@@ -1397,6 +1407,7 @@ void intel_guc_submission_fini(struct intel_guc *guc)
 	if (!guc_submission_initialized(guc))
 		return;
 
+	guc_flush_destroyed_contexts(guc);
 	guc_lrcd_reg_fini(guc);
 
 	for (i = 0; i < GUC_SUBMIT_ENGINE_MAX; ++i) {
@@ -2280,11 +2291,29 @@ static void guc_context_sched_disable(struct intel_context *ce)
 static inline void guc_lrc_desc_unpin(struct intel_context *ce)
 {
 	struct intel_guc *guc = ce_to_guc(ce);
+	struct intel_gt *gt = guc_to_gt(guc);
+	unsigned long flags;
+	bool disabled;
 
+	GEM_BUG_ON(!intel_gt_pm_is_awake(gt));
 	GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id));
 	GEM_BUG_ON(ce != __get_context(guc, ce->guc_id));
 	GEM_BUG_ON(context_enabled(ce));
 
+	/* Seal race with Reset */
+	spin_lock_irqsave(&ce->guc_state.lock, flags);
+	disabled = submission_disabled(guc);
+	if (likely(!disabled)) {
+		__intel_gt_pm_get(gt);
+		set_context_destroyed(ce);
+	}
+	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+	if (unlikely(disabled)) {
+		release_guc_id(guc, ce);
+		__guc_context_destroy(ce);
+		return;
+	}
+
 	clr_context_registered(ce);
 	deregister_context(ce, ce->guc_id, true);
 }
@@ -2313,12 +2342,51 @@ static void __guc_context_destroy(struct intel_context *ce)
 	}
 }
 
+static void guc_flush_destroyed_contexts(struct intel_guc *guc)
+{
+	struct intel_context *ce, *cn;
+	unsigned long flags;
+	spin_lock_irqsave(&guc->destroy_lock, flags);
+	list_for_each_entry_safe(ce, cn,
+				 &guc->destroyed_contexts, guc_id_link) {
+		list_del_init(&ce->guc_id_link);
+		release_guc_id(guc, ce);
+		__guc_context_destroy(ce);
+	}
+	spin_unlock_irqrestore(&guc->destroy_lock, flags);
+}
+
+static void deregister_destroyed_contexts(struct intel_guc *guc)
+{
+	struct intel_context *ce, *cn;
+	unsigned long flags;
+
+	spin_lock_irqsave(&guc->destroy_lock, flags);
+	list_for_each_entry_safe(ce, cn,
+				 &guc->destroyed_contexts, guc_id_link) {
+		list_del_init(&ce->guc_id_link);
+		spin_unlock_irqrestore(&guc->destroy_lock, flags);
+		guc_lrc_desc_unpin(ce);
+		spin_lock_irqsave(&guc->destroy_lock, flags);
+	}
+	spin_unlock_irqrestore(&guc->destroy_lock, flags);
+}
+
+static void destroy_worker_func(struct work_struct *w)
+{
+	struct intel_guc *guc =
+		container_of(w, struct intel_guc, destroy_worker);
+	struct intel_gt *gt = guc_to_gt(guc);
+	int tmp;
+
+	with_intel_gt_pm(gt, tmp)
+		deregister_destroyed_contexts(guc);
+}
+
 static void guc_context_destroy(struct kref *kref)
 {
 	struct intel_context *ce = container_of(kref, typeof(*ce), ref);
-	struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm;
 	struct intel_guc *guc = &ce->engine->gt->uc.guc;
-	intel_wakeref_t wakeref;
 	unsigned long flags;
 	bool disabled;
 
@@ -2356,12 +2424,12 @@ static void guc_context_destroy(struct kref *kref)
 		list_del_init(&ce->guc_id_link);
 	spin_unlock_irqrestore(&guc->contexts_lock, flags);
 
-	/* Seal race with Reset */
-	spin_lock_irqsave(&ce->guc_state.lock, flags);
+	/* Seal race with reset */
+	spin_lock_irqsave(&guc->destroy_lock, flags);
 	disabled = submission_disabled(guc);
 	if (likely(!disabled))
-		set_context_destroyed(ce);
-	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+		list_add_tail(&ce->guc_id_link, &guc->destroyed_contexts);
+	spin_unlock_irqrestore(&guc->destroy_lock, flags);
 	if (unlikely(disabled)) {
 		release_guc_id(guc, ce);
 		__guc_context_destroy(ce);
@@ -2369,20 +2437,11 @@ static void guc_context_destroy(struct kref *kref)
 	}
 
 	/*
-	 * We defer GuC context deregistration until the context is destroyed
-	 * in order to save on CTBs. With this optimization ideally we only need
-	 * 1 CTB to register the context during the first pin and 1 CTB to
-	 * deregister the context when the context is destroyed. Without this
-	 * optimization, a CTB would be needed every pin & unpin.
-	 *
-	 * XXX: Need to acqiure the runtime wakeref as this can be triggered
-	 * from context_free_worker when not runtime wakeref is held.
-	 * guc_lrc_desc_unpin requires the runtime as a GuC register is written
-	 * in H2G CTB to deregister the context. A future patch may defer this
-	 * H2G CTB if the runtime wakeref is zero.
+	 * We use a worker to issue the H2G to deregister the context as we can
+	 * take the GT PM for the first time which isn't allowed from an atomic
+	 * context.
 	 */
-	with_intel_runtime_pm(runtime_pm, wakeref)
-		guc_lrc_desc_unpin(ce);
+	queue_work(system_unbound_wq, &guc->destroy_worker);
 }
 
 static int guc_context_alloc(struct intel_context *ce)
@@ -3408,6 +3467,7 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 		intel_context_put(ce);
 	} else if (context_destroyed(ce)) {
 		/* Context has been destroyed */
+		intel_gt_pm_put_async(guc_to_gt(guc));
 		release_guc_id(guc, ce);
 		__guc_context_destroy(ce);
 	}
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 92/97] drm/i915: Add GT PM delayed worker
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (90 preceding siblings ...)
  2021-05-06 19:14 ` [RFC PATCH 91/97] drm/i915/guc: Take GT PM ref when deregistering context Matthew Brost
@ 2021-05-06 19:14 ` Matthew Brost
  2021-05-06 19:14 ` [RFC PATCH 93/97] drm/i915/guc: Take engine PM when a context is pinned with GuC submission Matthew Brost
                   ` (7 subsequent siblings)
  99 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

Sometimes it is desirable to queue work up for later if the GT PM isn't
held, and to run that work on the next GT PM unpark.

Implemented with a list in the GT of all pending work items, a helper to
add a work item to the list, and finally a wakeref post_get callback that
drains the list and queues the pending work items.

The first user of this is deregistration of GuC contexts.
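
A minimal usage sketch of the new interface (the user below is hypothetical;
the real one is the GuC destroy_worker converted in this patch):

#include <linux/workqueue.h>
/* plus the new gt/intel_gt_pm_delayed_work.h */

static struct intel_gt_pm_delayed_work my_work;

static void my_work_func(struct work_struct *w)
{
	/* Runs from system_unbound_wq once the GT has been unparked. */
}

static void my_queue(struct intel_gt *gt)
{
	INIT_LIST_HEAD(&my_work.link);
	INIT_WORK(&my_work.worker, my_work_func);

	/*
	 * Queues immediately if the GT is awake, otherwise the item sits on
	 * gt->pm_delayed_work_list and is queued by the wakeref post_get
	 * callback on the next unpark.
	 */
	intel_gt_pm_add_delayed_work(gt, &my_work);
}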

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/Makefile                 |  1 +
 drivers/gpu/drm/i915/gt/intel_gt.c            |  3 ++
 drivers/gpu/drm/i915/gt/intel_gt_pm.c         |  8 +++++
 .../drm/i915/gt/intel_gt_pm_delayed_work.c    | 35 +++++++++++++++++++
 .../drm/i915/gt/intel_gt_pm_delayed_work.h    | 24 +++++++++++++
 drivers/gpu/drm/i915/gt/intel_gt_types.h      |  3 ++
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  3 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 14 +++++---
 drivers/gpu/drm/i915/intel_wakeref.c          |  5 +++
 drivers/gpu/drm/i915/intel_wakeref.h          |  1 +
 10 files changed, 92 insertions(+), 5 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_pm_delayed_work.c
 create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_pm_delayed_work.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index d0d936d9137b..c80ec163a7d1 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -102,6 +102,7 @@ gt-y += \
 	gt/intel_gt_clock_utils.o \
 	gt/intel_gt_irq.o \
 	gt/intel_gt_pm.o \
+	gt/intel_gt_pm_delayed_work.o \
 	gt/intel_gt_pm_irq.o \
 	gt/intel_gt_requests.o \
 	gt/intel_gtt.o \
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index 0e4a5c4c883f..b3ea788de9e3 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -28,6 +28,9 @@ void intel_gt_init_early(struct intel_gt *gt, struct drm_i915_private *i915)
 
 	spin_lock_init(&gt->irq_lock);
 
+	spin_lock_init(&gt->pm_delayed_work_lock);
+	INIT_LIST_HEAD(&gt->pm_delayed_work_list);
+
 	INIT_LIST_HEAD(&gt->closed_vma);
 	spin_lock_init(&gt->closed_lock);
 
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
index 463a6ae605a0..9f5485be156e 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
@@ -93,6 +93,13 @@ static int __gt_unpark(struct intel_wakeref *wf)
 	return 0;
 }
 
+static void __gt_queue_delayed_work(struct intel_wakeref *wf)
+{
+	struct intel_gt *gt = container_of(wf, typeof(*gt), wakeref);
+
+	intel_gt_pm_queue_delayed_work(gt);
+}
+
 static int __gt_park(struct intel_wakeref *wf)
 {
 	struct intel_gt *gt = container_of(wf, typeof(*gt), wakeref);
@@ -123,6 +130,7 @@ static int __gt_park(struct intel_wakeref *wf)
 
 static const struct intel_wakeref_ops wf_ops = {
 	.get = __gt_unpark,
+	.post_get = __gt_queue_delayed_work,
 	.put = __gt_park,
 };
 
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm_delayed_work.c b/drivers/gpu/drm/i915/gt/intel_gt_pm_delayed_work.c
new file mode 100644
index 000000000000..fc97a37b9ca1
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm_delayed_work.c
@@ -0,0 +1,35 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2019 Intel Corporation
+ */
+
+#include "i915_drv.h"
+#include "intel_runtime_pm.h"
+#include "intel_gt_pm.h"
+
+void intel_gt_pm_queue_delayed_work(struct intel_gt *gt)
+{
+	struct intel_gt_pm_delayed_work *work, *next;
+	unsigned long flags;
+
+	spin_lock_irqsave(&gt->pm_delayed_work_lock, flags);
+	list_for_each_entry_safe(work, next,
+				 &gt->pm_delayed_work_list, link) {
+		list_del_init(&work->link);
+		queue_work(system_unbound_wq, &work->worker);
+	}
+	spin_unlock_irqrestore(&gt->pm_delayed_work_lock, flags);
+}
+
+void intel_gt_pm_add_delayed_work(struct intel_gt *gt,
+				  struct intel_gt_pm_delayed_work *work)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&gt->pm_delayed_work_lock, flags);
+	if (intel_gt_pm_is_awake(gt))
+		queue_work(system_unbound_wq, &work->worker);
+	else if (list_empty(&work->link))
+		list_add_tail(&work->link, &gt->pm_delayed_work_list);
+	spin_unlock_irqrestore(&gt->pm_delayed_work_lock, flags);
+}
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm_delayed_work.h b/drivers/gpu/drm/i915/gt/intel_gt_pm_delayed_work.h
new file mode 100644
index 000000000000..7e91a9432f7f
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm_delayed_work.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2019 Intel Corporation
+ */
+
+#ifndef INTEL_GT_PM_DELAYED_WORK_H
+#define INTEL_GT_PM_DELAYED_WORK_H
+
+#include <linux/list.h>
+#include <linux/workqueue.h>
+
+struct intel_gt;
+
+struct intel_gt_pm_delayed_work {
+	struct list_head link;
+	struct work_struct worker;
+};
+
+void intel_gt_pm_queue_delayed_work(struct intel_gt *gt);
+
+void intel_gt_pm_add_delayed_work(struct intel_gt *gt,
+				  struct intel_gt_pm_delayed_work *work);
+
+#endif /* INTEL_GT_PM_DELAYED_WORK_H */
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h
index fecfacf551d5..60ed7af94dba 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
@@ -68,6 +68,9 @@ struct intel_gt {
 	struct intel_wakeref wakeref;
 	atomic_t user_wakeref;
 
+	struct list_head pm_delayed_work_list;
+	spinlock_t pm_delayed_work_lock;
+
 	struct list_head closed_vma;
 	spinlock_t closed_lock; /* guards the list of closed_vma */
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index f6c40f6fb7ac..10dcfd790aa2 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -17,6 +17,7 @@
 #include "intel_uc_fw.h"
 #include "i915_utils.h"
 #include "i915_vma.h"
+#include "gt/intel_gt_pm_delayed_work.h"
 
 struct __guc_ads_blob;
 
@@ -63,7 +64,7 @@ struct intel_guc {
 
 	spinlock_t destroy_lock;
 	struct list_head destroyed_contexts;
-	struct work_struct destroy_worker;
+	struct intel_gt_pm_delayed_work destroy_worker;
 
 	bool submission_selected;
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 6fd5414296cd..25c77084c3a0 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1395,7 +1395,8 @@ int intel_guc_submission_init(struct intel_guc *guc)
 
 	spin_lock_init(&guc->destroy_lock);
 	INIT_LIST_HEAD(&guc->destroyed_contexts);
-	INIT_WORK(&guc->destroy_worker, destroy_worker_func);
+	INIT_LIST_HEAD(&guc->destroy_worker.link);
+	INIT_WORK(&guc->destroy_worker.worker, destroy_worker_func);
 
 	return 0;
 }
@@ -2374,13 +2375,18 @@ static void deregister_destroyed_contexts(struct intel_guc *guc)
 
 static void destroy_worker_func(struct work_struct *w)
 {
+	struct intel_gt_pm_delayed_work *destroy_worker =
+		container_of(w, struct intel_gt_pm_delayed_work, worker);
 	struct intel_guc *guc =
-		container_of(w, struct intel_guc, destroy_worker);
+		container_of(destroy_worker, struct intel_guc, destroy_worker);
 	struct intel_gt *gt = guc_to_gt(guc);
 	int tmp;
 
-	with_intel_gt_pm(gt, tmp)
+	with_intel_gt_pm_if_awake(gt, tmp)
 		deregister_destroyed_contexts(guc);
+
+	if (!list_empty(&guc->destroyed_contexts))
+		intel_gt_pm_add_delayed_work(gt, destroy_worker);
 }
 
 static void guc_context_destroy(struct kref *kref)
@@ -2441,7 +2447,7 @@ static void guc_context_destroy(struct kref *kref)
 	 * take the GT PM for the first time which isn't allowed from an atomic
 	 * context.
 	 */
-	queue_work(system_unbound_wq, &guc->destroy_worker);
+	intel_gt_pm_add_delayed_work(guc_to_gt(guc), &guc->destroy_worker);
 }
 
 static int guc_context_alloc(struct intel_context *ce)
diff --git a/drivers/gpu/drm/i915/intel_wakeref.c b/drivers/gpu/drm/i915/intel_wakeref.c
index dfd87d082218..282fc4f312e3 100644
--- a/drivers/gpu/drm/i915/intel_wakeref.c
+++ b/drivers/gpu/drm/i915/intel_wakeref.c
@@ -24,6 +24,8 @@ static void rpm_put(struct intel_wakeref *wf)
 
 int __intel_wakeref_get_first(struct intel_wakeref *wf)
 {
+	bool do_post = false;
+
 	/*
 	 * Treat get/put as different subclasses, as we may need to run
 	 * the put callback from under the shrinker and do not want to
@@ -44,8 +46,11 @@ int __intel_wakeref_get_first(struct intel_wakeref *wf)
 		}
 
 		smp_mb__before_atomic(); /* release wf->count */
+		do_post = true;
 	}
 	atomic_inc(&wf->count);
+	if (do_post && wf->ops->post_get)
+		wf->ops->post_get(wf);
 	mutex_unlock(&wf->mutex);
 
 	INTEL_WAKEREF_BUG_ON(atomic_read(&wf->count) <= 0);
diff --git a/drivers/gpu/drm/i915/intel_wakeref.h b/drivers/gpu/drm/i915/intel_wakeref.h
index 545c8f277c46..ef7e6a698e8a 100644
--- a/drivers/gpu/drm/i915/intel_wakeref.h
+++ b/drivers/gpu/drm/i915/intel_wakeref.h
@@ -30,6 +30,7 @@ typedef depot_stack_handle_t intel_wakeref_t;
 
 struct intel_wakeref_ops {
 	int (*get)(struct intel_wakeref *wf);
+	void (*post_get)(struct intel_wakeref *wf);
 	int (*put)(struct intel_wakeref *wf);
 };
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 93/97] drm/i915/guc: Take engine PM when a context is pinned with GuC submission
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (91 preceding siblings ...)
  2021-05-06 19:14 ` [RFC PATCH 92/97] drm/i915: Add GT PM delayed worker Matthew Brost
@ 2021-05-06 19:14 ` Matthew Brost
  2021-05-06 19:14 ` [RFC PATCH 94/97] drm/i915/guc: Don't call switch_to_kernel_context " Matthew Brost
                   ` (6 subsequent siblings)
  99 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

Take an engine PM reference to prevent intel_gt_wait_for_idle from
short-circuiting while scheduling of a user context could be enabled.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 36 +++++++++++++++++--
 1 file changed, 33 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 25c77084c3a0..dd4baaad679f 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -2026,7 +2026,12 @@ static int guc_context_pre_pin(struct intel_context *ce,
 
 static int guc_context_pin(struct intel_context *ce, void *vaddr)
 {
-	return __guc_context_pin(ce, ce->engine, vaddr);
+	int ret = __guc_context_pin(ce, ce->engine, vaddr);
+
+	if (likely(!ret && !intel_context_is_barrier(ce)))
+		intel_engine_pm_get(ce->engine);
+
+	return ret;
 }
 
 static void guc_context_unpin(struct intel_context *ce)
@@ -2037,6 +2042,9 @@ static void guc_context_unpin(struct intel_context *ce)
 
 	unpin_guc_id(guc, ce, true);
 	lrc_unpin(ce);
+
+	if (likely(!intel_context_is_barrier(ce)))
+		intel_engine_pm_put(ce->engine);
 }
 
 static void guc_context_post_unpin(struct intel_context *ce)
@@ -2922,8 +2930,30 @@ static int guc_virtual_context_pre_pin(struct intel_context *ce,
 static int guc_virtual_context_pin(struct intel_context *ce, void *vaddr)
 {
 	struct intel_engine_cs *engine = guc_virtual_get_sibling(ce->engine, 0);
+	int ret = __guc_context_pin(ce, engine, vaddr);
+	intel_engine_mask_t tmp, mask = ce->engine->mask;
+
+	if (likely(!ret))
+		for_each_engine_masked(engine, ce->engine->gt, mask, tmp)
+			intel_engine_pm_get(engine);
+
+	return ret;
+}
+
+static void guc_virtual_context_unpin(struct intel_context *ce)
+{
+	intel_engine_mask_t tmp, mask = ce->engine->mask;
+	struct intel_engine_cs *engine;
+	struct intel_guc *guc = ce_to_guc(ce);
 
-	return __guc_context_pin(ce, engine, vaddr);
+	GEM_BUG_ON(context_enabled(ce));
+	GEM_BUG_ON(intel_context_is_barrier(ce));
+
+	unpin_guc_id(guc, ce, true);
+	lrc_unpin(ce);
+
+	for_each_engine_masked(engine, ce->engine->gt, mask, tmp)
+		intel_engine_pm_put(engine);
 }
 
 static void guc_virtual_context_enter(struct intel_context *ce)
@@ -2972,7 +3002,7 @@ static const struct intel_context_ops virtual_guc_context_ops = {
 
 	.pre_pin = guc_virtual_context_pre_pin,
 	.pin = guc_virtual_context_pin,
-	.unpin = guc_context_unpin,
+	.unpin = guc_virtual_context_unpin,
 	.post_unpin = guc_context_post_unpin,
 
 	.ban = guc_context_ban,
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 94/97] drm/i915/guc: Don't call switch_to_kernel_context with GuC submission
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (92 preceding siblings ...)
  2021-05-06 19:14 ` [RFC PATCH 93/97] drm/i915/guc: Take engine PM when a context is pinned with GuC submission Matthew Brost
@ 2021-05-06 19:14 ` Matthew Brost
  2021-05-06 19:14 ` [RFC PATCH 95/97] drm/i915/guc: Selftest for GuC flow control Matthew Brost
                   ` (5 subsequent siblings)
  99 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

Calling switch_to_kernel_context isn't needed if the engine PM reference
is taken while all contexts are pinned. By not calling
switch_to_kernel_context, we avoid issuing a request to the engine.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_engine_pm.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
index ba6a9931c4e8..f8fab316e33d 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
@@ -162,6 +162,10 @@ static bool switch_to_kernel_context(struct intel_engine_cs *engine)
 	unsigned long flags;
 	bool result = true;
 
+	/* No need to switch_to_kernel_context if GuC submission */
+	if (intel_engine_uses_guc(engine))
+		return true;
+
 	/* GPU is pointing to the void, as good as in the kernel context. */
 	if (intel_gt_is_wedged(engine->gt))
 		return true;
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 95/97] drm/i915/guc: Selftest for GuC flow control
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (93 preceding siblings ...)
  2021-05-06 19:14 ` [RFC PATCH 94/97] drm/i915/guc: Don't call switch_to_kernel_context " Matthew Brost
@ 2021-05-06 19:14 ` Matthew Brost
  2021-05-06 19:14 ` [RFC PATCH 96/97] drm/i915/guc: Update GuC documentation Matthew Brost
                   ` (4 subsequent siblings)
  99 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

Add 5 selftests for flow control conditions that are hard to recreate
from user space. The tests are listed below:

1. A test to verify that the number of guc_ids can be exhausted and all
submissions still complete.

2. A test to verify that the flow control state machine can recover from
a full GPU reset.

3. A test to verify that the lrcd registration slots can be exhausted
and all submissions still complete.

4. A test to verify that the H2G channel can deadlock and a full GPU
reset recovers the system.

5. A test to stress the CTB channel by submitting to lots of contexts
and then immediately destroying the contexts.

Tests 1, 2, and 3 also ensure that when flow control is triggered by
unready requests, those unready requests do not DoS ready requests.
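
For reference, i915 live selftests are typically wired up along these lines
(a sketch only, following the pattern used by other live selftests; the
wrapper name is hypothetical and the actual entry point added by this patch
may differ):

static int guc_flow_control_live_selftests(struct drm_i915_private *i915)
{
	static const struct i915_subtest tests[] = {
		SUBTEST(intel_guc_flow_control_guc_ids),
		SUBTEST(intel_guc_flow_control_lrcd_reg),
		SUBTEST(intel_guc_flow_control_hang_state_machine),
		SUBTEST(intel_guc_flow_control_stress_ctbs),
	};

	return intel_gt_live_subtests(tests, &i915->gt);
}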

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/Makefile                 |   1 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   6 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |  40 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h     |   9 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  16 +
 .../i915/gt/uc/intel_guc_submission_types.h   |   2 +
 .../i915/gt/uc/selftest_guc_flow_control.c    | 589 ++++++++++++++++++
 .../drm/i915/selftests/i915_live_selftests.h  |   1 +
 .../i915/selftests/intel_scheduler_helpers.c  | 101 +++
 .../i915/selftests/intel_scheduler_helpers.h  |  37 ++
 10 files changed, 793 insertions(+), 9 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gt/uc/selftest_guc_flow_control.c
 create mode 100644 drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c
 create mode 100644 drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index c80ec163a7d1..eba5c1e9eceb 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -285,6 +285,7 @@ i915-$(CONFIG_DRM_I915_SELFTEST) += \
 	selftests/igt_mmap.o \
 	selftests/igt_reset.o \
 	selftests/igt_spinner.o \
+	selftests/intel_scheduler_helpers.o \
 	selftests/librapl.o
 
 # virtual gpu code
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 10dcfd790aa2..169daaf8a189 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -102,6 +102,12 @@ struct intel_guc {
 
 	/* To serialize the intel_guc_send actions */
 	struct mutex send_mutex;
+
+	I915_SELFTEST_DECLARE(bool gse_hang_expected;)
+	I915_SELFTEST_DECLARE(bool deadlock_expected;)
+	I915_SELFTEST_DECLARE(bool bad_desc_expected;)
+	I915_SELFTEST_DECLARE(bool inject_bad_sched_disable;)
+	I915_SELFTEST_DECLARE(bool inject_corrupt_h2g;)
 };
 
 static inline struct intel_guc *log_to_guc(struct intel_guc_log *log)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 1c240ff8dec9..03b8a359bfcb 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -3,7 +3,6 @@
  * Copyright © 2016-2019 Intel Corporation
  */
 
-#include <linux/circ_buf.h>
 #include <linux/ktime.h>
 #include <linux/time64.h>
 #include <linux/timekeeping.h>
@@ -404,11 +403,13 @@ static int ct_write(struct intel_guc_ct *ct,
 	u32 *cmds = ctb->cmds;
 	unsigned int i;
 
-	if (unlikely(ctb->broken))
-		return -EDEADLK;
+	if (!I915_SELFTEST_ONLY(ct_to_guc(ct)->deadlock_expected)) {
+		if (unlikely(ctb->broken))
+			return -EDEADLK;
 
-	if (unlikely(desc->status))
-		goto corrupted;
+		if (unlikely(desc->status))
+			goto corrupted;
+	}
 
 #ifdef CONFIG_DRM_I915_DEBUG_GUC
 	if (unlikely((desc->tail | desc->head) >= size)) {
@@ -427,6 +428,15 @@ static int ct_write(struct intel_guc_ct *ct,
 		 FIELD_PREP(GUC_CTB_MSG_0_NUM_DWORDS, len) |
 		 FIELD_PREP(GUC_CTB_MSG_0_FENCE, fence);
 
+#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
+	if (ct_to_guc(ct)->inject_corrupt_h2g) {
+		header = FIELD_PREP(GUC_CTB_MSG_0_FORMAT, 3) |
+			 FIELD_PREP(GUC_CTB_MSG_0_NUM_DWORDS, len + 5) |
+			 FIELD_PREP(GUC_CTB_MSG_0_FENCE, 0xdead);
+		ct_to_guc(ct)->inject_corrupt_h2g = false;
+	}
+#endif
+
 	hxg = (flags & INTEL_GUC_SEND_NB) ?
 		(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_EVENT) |
 		 FIELD_PREP(GUC_HXG_EVENT_MSG_0_ACTION |
@@ -464,8 +474,12 @@ static int ct_write(struct intel_guc_ct *ct,
 	return 0;
 
 corrupted:
-	CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u status=%#x\n",
-		 desc->head, desc->tail, desc->status);
+	if (I915_SELFTEST_ONLY(ct_to_guc(ct)->bad_desc_expected))
+		CT_DEBUG(ct, "Corrupted descriptor head=%u tail=%u status=%#x\n",
+			 desc->head, desc->tail, desc->status);
+	else
+		CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u status=%#x\n",
+			 desc->head, desc->tail, desc->status);
 	ctb->broken = true;
 	return -EDEADLK;
 }
@@ -517,8 +531,16 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct)
 	bool ret = ktime_us_delta(ktime_get(), ct->stall_time) >
 		MAX_US_STALL_CTB;
 
-	if (unlikely(ret))
-		CT_ERROR(ct, "CT deadlocked\n");
+	if (unlikely(ret)) {
+		/*
+		 * CI doesn't like error messages, demote to debug if deadlock was
+		 * intentionally hit.
+		 */
+		if (I915_SELFTEST_ONLY(ct_to_guc(ct)->deadlock_expected))
+			CT_DEBUG(ct, "CT deadlocked\n");
+		else
+			CT_ERROR(ct, "CT deadlocked\n");
+	}
 
 	return ret;
 }
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
index f62eb06b32fc..84023c175001 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
@@ -6,6 +6,7 @@
 #ifndef _INTEL_GUC_CT_H_
 #define _INTEL_GUC_CT_H_
 
+#include <linux/circ_buf.h>
 #include <linux/interrupt.h>
 #include <linux/spinlock.h>
 #include <linux/workqueue.h>
@@ -109,4 +110,12 @@ void intel_guc_ct_event_handler(struct intel_guc_ct *ct);
 
 void intel_guc_log_ct_info(struct intel_guc_ct *ct, struct drm_printer *p);
 
+static inline bool intel_guc_ct_is_recv_buffer_empty(struct intel_guc_ct *ct)
+{
+	struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv;
+
+	return atomic_read(&ctb->space) ==
+		(CIRC_SPACE(0, 0, ctb->size) - ctb->resv_space);
+}
+
 #endif /* _INTEL_GUC_CT_H_ */
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index dd4baaad679f..337ddc0dab6b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -818,6 +818,7 @@ static int gse_dequeue_one_context(struct guc_submit_engine *gse)
 			GEM_WARN_ON(ret);	/* Unexpected */
 			goto deadlk;
 		}
+		I915_SELFTEST_DECLARE(++gse->tasklets_submit_count;)
 	}
 
 	/*
@@ -2077,7 +2078,15 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
 		GUC_CONTEXT_DISABLE
 	};
 
+#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
+	if (guc->inject_bad_sched_disable &&
+	    guc_id == GUC_INVALID_LRC_ID)
+		guc->inject_bad_sched_disable = false;
+	else
+		GEM_BUG_ON(guc_id == GUC_INVALID_LRC_ID);
+#else
 	GEM_BUG_ON(guc_id == GUC_INVALID_LRC_ID);
+#endif
 
 	trace_intel_context_sched_disable(ce);
 
@@ -2689,6 +2698,9 @@ static void retire_worker_sched_disable(struct guc_submit_engine *gse,
 		guc_id = prep_context_pending_disable(ce);
 		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 
+		if (I915_SELFTEST_ONLY(gse->guc->inject_bad_sched_disable))
+			guc_id = GUC_INVALID_LRC_ID;
+
 		with_intel_runtime_pm(runtime_pm, wakeref)
 			__guc_context_sched_disable(gse->guc, ce, guc_id);
 
@@ -3952,3 +3964,7 @@ bool intel_guc_virtual_engine_has_heartbeat(const struct intel_engine_cs *ve)
 
 	return false;
 }
+
+#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
+#include "selftest_guc_flow_control.c"
+#endif
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission_types.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission_types.h
index e45c2f00f09c..43c5ea0f64e7 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission_types.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission_types.h
@@ -48,6 +48,8 @@ struct guc_submit_engine {
 		STALL_MOVE_LRC_TAIL,
 		STALL_ADD_REQUEST,
 	} submission_stall_reason;
+
+	I915_SELFTEST_DECLARE(u64 tasklets_submit_count;)
 };
 
 #endif
diff --git a/drivers/gpu/drm/i915/gt/uc/selftest_guc_flow_control.c b/drivers/gpu/drm/i915/gt/uc/selftest_guc_flow_control.c
new file mode 100644
index 000000000000..c5385064754d
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/uc/selftest_guc_flow_control.c
@@ -0,0 +1,589 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2021 Intel Corporation
+ */
+
+#include "selftests/igt_spinner.h"
+#include "selftests/igt_reset.h"
+#include "selftests/intel_scheduler_helpers.h"
+#include "gt/intel_engine_heartbeat.h"
+#include "gem/selftests/mock_context.h"
+
+static int __request_add_spin(struct i915_request *rq, struct igt_spinner *spin)
+{
+	int err = 0;
+
+	i915_request_get(rq);
+	i915_request_add(rq);
+	if (spin && !igt_wait_for_spinner(spin, rq))
+		err = -ETIMEDOUT;
+
+	return err;
+}
+
+static struct i915_request *nop_kernel_request(struct intel_engine_cs *engine)
+{
+	struct i915_request *rq;
+
+	rq = intel_engine_create_kernel_request(engine);
+	if (IS_ERR(rq))
+		return rq;
+
+	i915_request_get(rq);
+	i915_request_add(rq);
+
+	return rq;
+}
+
+static struct i915_request *nop_user_request(struct intel_context *ce,
+					     struct i915_request *from)
+{
+	struct i915_request *rq;
+	int ret;
+
+	rq = intel_context_create_request(ce);
+	if (IS_ERR(rq))
+		return rq;
+
+	if (from) {
+		ret = i915_sw_fence_await_dma_fence(&rq->submit,
+						    &from->fence, 0,
+						    I915_FENCE_GFP);
+		if (ret < 0) {
+			i915_request_put(rq);
+			return ERR_PTR(ret);
+		}
+	}
+
+	i915_request_get(rq);
+	i915_request_add(rq);
+
+	return rq;
+}
+
+static int nop_request_wait(struct intel_engine_cs *engine, bool kernel,
+			    bool flow_control)
+{
+	struct i915_gpu_error *global = &engine->gt->i915->gpu_error;
+	unsigned int reset_count = i915_reset_count(global);
+	struct intel_guc *guc = &engine->gt->uc.guc;
+	struct guc_submit_engine *gse = guc->gse[GUC_SUBMIT_ENGINE_SINGLE_LRC];
+	u64 tasklets_submit_count = gse->tasklets_submit_count;
+	struct intel_context *ce;
+	struct i915_request *nop;
+	int ret;
+
+	if (kernel) {
+		nop = nop_kernel_request(engine);
+	} else {
+		ce = intel_context_create(engine);
+		if (IS_ERR(ce))
+			return PTR_ERR(ce);
+		nop = nop_user_request(ce, NULL);
+		intel_context_put(ce);
+	}
+	if (IS_ERR(nop))
+		return PTR_ERR(nop);
+
+	ret = intel_selftest_wait_for_rq(nop);
+	i915_request_put(nop);
+	if (ret)
+		return ret;
+
+	if (!flow_control &&
+	    gse->tasklets_submit_count != tasklets_submit_count) {
+		pr_err("Flow control for single-lrc unexpectedly kicked in\n");
+		ret = -EINVAL;
+	}
+
+	if (flow_control &&
+	    gse->tasklets_submit_count == tasklets_submit_count) {
+		pr_err("Flow control for single-lrc did not kick in\n");
+		ret = -EINVAL;
+	}
+
+	if (i915_reset_count(global) != reset_count) {
+		pr_err("Unexpected GPU reset during single-lrc submit\n");
+		ret = -EINVAL;
+	}
+
+	return ret;
+}
+
+#define NUM_GUC_ID		256
+#define NUM_CONTEXT		1024
+#define NUM_RQ_PER_CONTEXT	2
+#define HEARTBEAT_INTERVAL	1500
+
+static int __intel_guc_flow_control_guc(void *arg, bool limit_guc_ids, bool hang)
+{
+	struct intel_gt *gt = arg;
+	struct intel_guc *guc = &gt->uc.guc;
+	struct guc_submit_engine *gse = guc->gse[GUC_SUBMIT_ENGINE_SINGLE_LRC];
+	struct intel_context **contexts;
+	int ret = 0;
+	int i, j, k;
+	struct intel_context *ce;
+	struct igt_spinner spin;
+	struct i915_request *spin_rq = NULL, *last = NULL;
+	intel_wakeref_t wakeref;
+	struct intel_engine_cs *engine;
+	struct i915_gpu_error *global = &gt->i915->gpu_error;
+	unsigned int reset_count;
+	u64 tasklets_submit_count = gse->tasklets_submit_count;
+	u32 old_beat;
+
+	contexts = kmalloc(sizeof(*contexts) * NUM_CONTEXT, GFP_KERNEL);
+	if (!contexts) {
+		pr_err("Context array allocation failed\n");
+		return -ENOMEM;
+	}
+
+	wakeref = intel_runtime_pm_get(gt->uncore->rpm);
+
+	if (limit_guc_ids)
+		guc->num_guc_ids = NUM_GUC_ID;
+
+	ce = intel_context_create(intel_selftest_find_any_engine(gt));
+	if (IS_ERR(ce)) {
+		ret = PTR_ERR(ce);
+		pr_err("Failed to create context: %d\n", ret);
+		goto err;
+	}
+
+	reset_count = i915_reset_count(global);
+	engine = ce->engine;
+
+	old_beat = engine->props.heartbeat_interval_ms;
+	if (hang) {
+		ret = intel_engine_set_heartbeat(engine, HEARTBEAT_INTERVAL);
+		if (ret) {
+			pr_err("Failed to boost heartbeat interval: %d\n", ret);
+			goto err;
+		}
+	}
+
+	/* Create spinner to block requests in below loop */
+	ret = igt_spinner_init(&spin, engine->gt);
+	if (ret) {
+		pr_err("Failed to create spinner: %d\n", ret);
+		goto err_heartbeat;
+	}
+	spin_rq = igt_spinner_create_request(&spin, ce, MI_ARB_CHECK);
+	intel_context_put(ce);
+	if (IS_ERR(spin_rq)) {
+		ret = PTR_ERR(spin_rq);
+		pr_err("Failed to create spinner request: %d\n", ret);
+		goto err_heartbeat;
+	}
+	ret = __request_add_spin(spin_rq, &spin);
+	if (ret) {
+		pr_err("Failed to add Spinner request: %d\n", ret);
+		goto err_spin_rq;
+	}
+
+	/*
+	 * Create of lot of requests in a loop to trigger the flow control state
+	 * machine. Using a three level loop as it is interesting to hit flow
+	 * control with more than 1 request on each context in a row and also
+	 * interleave requests with other contexts.
+	 */
+	for (i = 0; i < NUM_RQ_PER_CONTEXT; ++i) {
+		for (j = 0; j < NUM_CONTEXT; ++j) {
+			for (k = 0; k < NUM_RQ_PER_CONTEXT; ++k) {
+				bool first_pass = !i && !k;
+
+				if (last)
+					i915_request_put(last);
+				last = NULL;
+
+				if (first_pass)
+					contexts[j] = intel_context_create(engine);
+				ce = contexts[j];
+
+				if (IS_ERR(ce)) {
+					ret = PTR_ERR(ce);
+					pr_err("Failed to create context, %d,%d,%d: %d\n",
+					       i, j, k, ret);
+					goto err_spin_rq;
+				}
+
+				last = nop_user_request(ce, spin_rq);
+				if (first_pass)
+					intel_context_put(ce);
+				if (IS_ERR(last)) {
+					ret = PTR_ERR(last);
+					pr_err("Failed to create request, %d,%d,%d: %d\n",
+					       i, j, k, ret);
+					goto err_spin_rq;
+				}
+			}
+		}
+	}
+
+	/* Verify GuC submit engine state */
+	if (limit_guc_ids && !guc_ids_exhausted(gse)) {
+		pr_err("guc_ids not exhausted\n");
+		ret = -EINVAL;
+		goto err_spin_rq;
+	}
+	if (!limit_guc_ids && guc_ids_exhausted(gse)) {
+		pr_err("guc_ids exhausted\n");
+		ret = -EINVAL;
+		goto err_spin_rq;
+	}
+
+	/* Ensure no DoS from unready requests */
+	ret = nop_request_wait(engine, false, true);
+	if (ret < 0) {
+		pr_err("User NOP request DoS: %d\n", ret);
+		goto err_spin_rq;
+	}
+
+	/* Inject hang in flow control state machine */
+	if (hang) {
+		guc->gse_hang_expected = true;
+		guc->inject_bad_sched_disable = true;
+	}
+
+	/* Release blocked requests */
+	igt_spinner_end(&spin);
+	ret = intel_selftest_wait_for_rq(spin_rq);
+	if (ret) {
+		pr_err("Spin request failed to complete: %d\n", ret);
+		goto err_spin_rq;
+	}
+	i915_request_put(spin_rq);
+	igt_spinner_fini(&spin);
+	spin_rq = NULL;
+
+	/* Wait for last request / GT to idle */
+	ret = i915_request_wait(last, 0, hang ? HZ * 30 : HZ * 10);
+	if (ret < 0) {
+		pr_err("Last request failed to complete: %d\n", ret);
+		goto err_spin_rq;
+	}
+	i915_request_put(last);
+	last = NULL;
+	ret = intel_gt_wait_for_idle(gt, HZ * 5);
+	if (ret < 0) {
+		pr_err("GT failed to idle: %d\n", ret);
+		goto err_spin_rq;
+	}
+
+	/* Check state after idle */
+	if (guc_ids_exhausted(gse)) {
+		pr_err("guc_ids exhausted after last request signaled\n");
+		ret = -EINVAL;
+		goto err_spin_rq;
+	}
+	if (hang) {
+		if (i915_reset_count(global) == reset_count) {
+			pr_err("Failed to record a GPU reset\n");
+			ret = -EINVAL;
+			goto err_spin_rq;
+		}
+	} else {
+		if (i915_reset_count(global) != reset_count) {
+			pr_err("Unexpected GPU reset\n");
+			ret = -EINVAL;
+			goto err_spin_rq;
+		}
+		if (gse->tasklets_submit_count == tasklets_submit_count) {
+			pr_err("Flow control failed to kick in\n");
+			ret = -EINVAL;
+			goto err_spin_rq;
+		}
+	}
+
+	/* Verify requests can be submitted after flow control */
+	ret = nop_request_wait(engine, true, false);
+	if (ret < 0) {
+		pr_err("Kernel NOP failed to complete: %d\n", ret);
+		goto err_spin_rq;
+	}
+	ret = nop_request_wait(engine, false, false);
+	if (ret < 0) {
+		pr_err("User NOP failed to complete: %d\n", ret);
+		goto err_spin_rq;
+	}
+
+err_spin_rq:
+	if (spin_rq) {
+		igt_spinner_end(&spin);
+		intel_selftest_wait_for_rq(spin_rq);
+		i915_request_put(spin_rq);
+		igt_spinner_fini(&spin);
+		intel_gt_wait_for_idle(gt, HZ * 5);
+	}
+err_heartbeat:
+	if (last)
+		i915_request_put(last);
+	intel_engine_set_heartbeat(engine, old_beat);
+err:
+	intel_runtime_pm_put(gt->uncore->rpm, wakeref);
+	guc->num_guc_ids = guc->max_guc_ids;
+	guc->gse_hang_expected = false;
+	guc->inject_bad_sched_disable = false;
+	kfree(contexts);
+
+	return ret;
+}
+
+static int intel_guc_flow_control_guc_ids(void *arg)
+{
+	return __intel_guc_flow_control_guc(arg, true, false);
+}
+
+static int intel_guc_flow_control_lrcd_reg(void *arg)
+{
+	return __intel_guc_flow_control_guc(arg, false, false);
+}
+
+static int intel_guc_flow_control_hang_state_machine(void *arg)
+{
+	return __intel_guc_flow_control_guc(arg, true, true);
+}
+
+#define NUM_RQ_STRESS_CTBS	0x4000
+static int intel_guc_flow_control_stress_ctbs(void *arg)
+{
+	struct intel_gt *gt = arg;
+	int ret = 0;
+	int i;
+	struct intel_context *ce;
+	struct i915_request *last = NULL, *rq;
+	intel_wakeref_t wakeref;
+	struct intel_engine_cs *engine;
+	struct i915_gpu_error *global = &gt->i915->gpu_error;
+	unsigned int reset_count;
+	struct intel_guc *guc = &gt->uc.guc;
+	struct intel_guc_ct_buffer *ctb = &guc->ct.ctbs.recv;
+
+	wakeref = intel_runtime_pm_get(gt->uncore->rpm);
+
+	reset_count = i915_reset_count(global);
+	engine = intel_selftest_find_any_engine(gt);
+
+	/*
+	 * Create a bunch of requests, and then idle the GT which will create a
+	 * lot of H2G / G2H traffic.
+	 */
+	for (i = 0; i < NUM_RQ_STRESS_CTBS; ++i) {
+		ce = intel_context_create(engine);
+		if (IS_ERR(ce)) {
+			ret = PTR_ERR(ce);
+			pr_err("Failed to create context, %d: %d\n", i, ret);
+			goto err;
+		}
+
+		rq = nop_user_request(ce, NULL);
+		intel_context_put(ce);
+
+		if (IS_ERR(rq)) {
+			ret = PTR_ERR(rq);
+			pr_err("Failed to create request, %d: %d\n", i, ret);
+			goto err;
+		}
+
+		if (last)
+			i915_request_put(last);
+		last = rq;
+	}
+
+	ret = i915_request_wait(last, 0, HZ * 10);
+	if (ret < 0) {
+		pr_err("Last request failed to complete: %d\n", ret);
+		goto err;
+	}
+	i915_request_put(last);
+	last = NULL;
+
+	ret = intel_gt_wait_for_idle(gt, HZ * 10);
+	if (ret < 0) {
+		pr_err("GT failed to idle: %d\n", ret);
+		goto err;
+	}
+
+	if (i915_reset_count(global) != reset_count) {
+		pr_err("Unexpected GPU reset\n");
+		ret = -EINVAL;
+		goto err;
+	}
+
+	ret = nop_request_wait(engine, true, false);
+	if (ret < 0) {
+		pr_err("Kernel NOP failed to complete: %d\n", ret);
+		goto err;
+	}
+
+	ret = nop_request_wait(engine, false, false);
+	if (ret < 0) {
+		pr_err("User NOP failed to complete: %d\n", ret);
+		goto err;
+	}
+
+	ret = intel_gt_wait_for_idle(gt, HZ);
+	if (ret < 0) {
+		pr_err("GT failed to idle: %d\n", ret);
+		goto err;
+	}
+
+	ret = wait_for(intel_guc_ct_is_recv_buffer_empty(&guc->ct), HZ);
+	if (ret) {
+		pr_err("Recv CTB not expected value=%d,%d outstanding_ctb=%d\n",
+		       atomic_read(&ctb->space),
+		       CIRC_SPACE(0, 0, ctb->size) - ctb->resv_space,
+		       atomic_read(&guc->outstanding_submission_g2h));
+		ret = -EINVAL;
+		goto err;
+	}
+
+err:
+	if (last)
+		i915_request_put(last);
+	intel_runtime_pm_put(gt->uncore->rpm, wakeref);
+
+	return ret;
+}
+
+#define NUM_RQ_DEADLOCK		2048
+static int __intel_guc_flow_control_deadlock_h2g(void *arg, bool bad_desc)
+{
+	struct intel_gt *gt = arg;
+	struct intel_guc *guc = &gt->uc.guc;
+	int ret = 0;
+	int i;
+	struct intel_context *ce;
+	struct i915_request *last = NULL, *rq;
+	intel_wakeref_t wakeref;
+	struct intel_engine_cs *engine;
+	struct i915_gpu_error *global = &gt->i915->gpu_error;
+	unsigned int reset_count;
+	u32 old_beat;
+
+	wakeref = intel_runtime_pm_get(gt->uncore->rpm);
+
+	reset_count = i915_reset_count(global);
+	engine = intel_selftest_find_any_engine(gt);
+
+	old_beat = engine->props.heartbeat_interval_ms;
+	ret = intel_engine_set_heartbeat(engine, HEARTBEAT_INTERVAL);
+	if (ret) {
+		pr_err("Failed to boost heartbeat interval: %d\n", ret);
+		goto err;
+	}
+
+	guc->inject_corrupt_h2g = true;
+	if (bad_desc)
+		guc->bad_desc_expected = true;
+	else
+		guc->deadlock_expected = true;
+
+	for (i = 0; i < NUM_RQ_DEADLOCK; ++i) {
+		ce = intel_context_create(engine);
+		if (IS_ERR(ce)) {
+			ret = PTR_ERR(ce);
+			pr_err("Failed to create context, %d: %d\n", i, ret);
+			goto err_heartbeat;
+		}
+
+		rq = nop_user_request(ce, NULL);
+		intel_context_put(ce);
+
+		if (IS_ERR(rq) && PTR_ERR(rq) == -EDEADLK) {
+			break;
+		} else if (IS_ERR(rq)) {
+			ret = PTR_ERR(rq);
+			pr_err("Failed to create request, %d: %d\n", i, ret);
+			goto err_heartbeat;
+		}
+
+		if (last)
+			i915_request_put(last);
+		last = rq;
+	}
+
+	pr_debug("Number of requests before deadlock: %d\n", i);
+
+	if (!submission_disabled(guc)) {
+		pr_err("Submission not disabled\n");
+		ret = -EINVAL;
+		goto err_heartbeat;
+	}
+
+	ret = i915_request_wait(last, 0, HZ * 5);
+	if (ret < 0) {
+		pr_err("Last request failed to complete: %d\n", ret);
+		goto err_heartbeat;
+	}
+	i915_request_put(last);
+	last = NULL;
+
+	ret = intel_gt_wait_for_idle(gt, HZ * 10);
+	if (ret < 0) {
+		pr_err("GT failed to idle: %d\n", ret);
+		goto err_heartbeat;
+	}
+
+	if (i915_reset_count(global) == reset_count) {
+		pr_err("Failed to record a GPU reset\n");
+		ret = -EINVAL;
+		goto err_heartbeat;
+	}
+
+	ret = nop_request_wait(engine, true, false);
+	if (ret < 0) {
+		pr_err("Kernel NOP failed to complete: %d\n", ret);
+		goto err_heartbeat;
+	}
+
+	ret = nop_request_wait(engine, false, false);
+	if (ret < 0) {
+		pr_err("User NOP failed to complete: %d\n", ret);
+		goto err_heartbeat;
+	}
+
+err_heartbeat:
+	if (last)
+		i915_request_put(last);
+	intel_engine_set_heartbeat(engine, old_beat);
+err:
+	intel_runtime_pm_put(gt->uncore->rpm, wakeref);
+	guc->inject_corrupt_h2g = false;
+	guc->deadlock_expected = false;
+	guc->bad_desc_expected = false;
+
+	return ret;
+}
+
+static int intel_guc_flow_control_deadlock_h2g(void *arg)
+{
+	return __intel_guc_flow_control_deadlock_h2g(arg, false);
+}
+
+static int intel_guc_flow_control_bad_desc_h2g(void *arg)
+{
+	return __intel_guc_flow_control_deadlock_h2g(arg, true);
+}
+
+int intel_guc_flow_control(struct drm_i915_private *i915)
+{
+	static const struct i915_subtest tests[] = {
+		SUBTEST(intel_guc_flow_control_stress_ctbs),
+		SUBTEST(intel_guc_flow_control_guc_ids),
+		SUBTEST(intel_guc_flow_control_lrcd_reg),
+		SUBTEST(intel_guc_flow_control_hang_state_machine),
+		SUBTEST(intel_guc_flow_control_deadlock_h2g),
+		SUBTEST(intel_guc_flow_control_bad_desc_h2g),
+	};
+	struct intel_gt *gt = &i915->gt;
+
+	if (intel_gt_is_wedged(gt))
+		return 0;
+
+	if (!intel_uc_uses_guc_submission(&gt->uc))
+		return 0;
+
+	return intel_gt_live_subtests(tests, gt);
+}
diff --git a/drivers/gpu/drm/i915/selftests/i915_live_selftests.h b/drivers/gpu/drm/i915/selftests/i915_live_selftests.h
index a92c0e9b7e6b..7a48b3adc545 100644
--- a/drivers/gpu/drm/i915/selftests/i915_live_selftests.h
+++ b/drivers/gpu/drm/i915/selftests/i915_live_selftests.h
@@ -46,5 +46,6 @@ selftest(hangcheck, intel_hangcheck_live_selftests)
 selftest(execlists, intel_execlists_live_selftests)
 selftest(ring_submission, intel_ring_submission_live_selftests)
 selftest(perf, i915_perf_live_selftests)
+selftest(guc_flow_control, intel_guc_flow_control)
 /* Here be dragons: keep last to run last! */
 selftest(late_gt_pm, intel_gt_pm_late_selftests)
diff --git a/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c b/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c
new file mode 100644
index 000000000000..f83c8c6c0d9b
--- /dev/null
+++ b/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c
@@ -0,0 +1,101 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2018 Intel Corporation
+ */
+
+#include "gt/intel_gt.h"
+#include "i915_drv.h"
+#include "i915_selftest.h"
+
+#include "selftests/intel_scheduler_helpers.h"
+
+#define REDUCED_TIMESLICE	5
+#define REDUCED_PREEMPT		10
+#define WAIT_FOR_RESET_TIME	10000
+
+struct intel_engine_cs *intel_selftest_find_any_engine(struct intel_gt *gt)
+{
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+
+	for_each_engine(engine, gt, id)
+		return engine;
+
+	pr_err("No valid engine found!\n");
+	return NULL;
+}
+
+int intel_selftest_modify_policy(struct intel_engine_cs *engine,
+				 struct intel_selftest_saved_policy *saved,
+				 enum selftest_scheduler_modify modify_type)
+{
+	int err;
+
+	saved->reset = engine->i915->params.reset;
+	saved->flags = engine->flags;
+	saved->timeslice = engine->props.timeslice_duration_ms;
+	saved->preempt_timeout = engine->props.preempt_timeout_ms;
+
+	switch (modify_type) {
+	case SELFTEST_SCHEDULER_MODIFY_FAST_RESET:
+		/*
+		 * Enable force pre-emption on time slice expiration
+		 * together with engine reset on pre-emption timeout.
+		 * This is required to make the GuC notice and reset
+		 * the single hanging context.
+		 * Also, reduce the preemption timeout to something
+		 * small to speed the test up.
+		 */
+		engine->i915->params.reset = 2;
+		engine->flags |= I915_ENGINE_WANT_FORCED_PREEMPTION;
+		engine->props.timeslice_duration_ms = REDUCED_TIMESLICE;
+		engine->props.preempt_timeout_ms = REDUCED_PREEMPT;
+		break;
+
+	case SELFTEST_SCHEDULER_MODIFY_NO_HANGCHECK:
+		engine->props.preempt_timeout_ms = 0;
+		break;
+
+	default:
+		pr_err("Invalid scheduler policy modification type: %d!\n", modify_type);
+		return -EINVAL;
+	}
+
+	if (!intel_engine_uses_guc(engine))
+		return 0;
+
+	err = intel_guc_global_policies_update(&engine->gt->uc.guc);
+	if (err)
+		intel_selftest_restore_policy(engine, saved);
+
+	return err;
+}
+
+int intel_selftest_restore_policy(struct intel_engine_cs *engine,
+				  struct intel_selftest_saved_policy *saved)
+{
+	/* Restore the original policies */
+	engine->i915->params.reset = saved->reset;
+	engine->flags = saved->flags;
+	engine->props.timeslice_duration_ms = saved->timeslice;
+	engine->props.preempt_timeout_ms = saved->preempt_timeout;
+
+	if (!intel_engine_uses_guc(engine))
+		return 0;
+
+	return intel_guc_global_policies_update(&engine->gt->uc.guc);
+}
+
+int intel_selftest_wait_for_rq(struct i915_request *rq)
+{
+	long ret;
+
+	ret = i915_request_wait(rq, 0, WAIT_FOR_RESET_TIME);
+	if (ret < 0)
+		return ret;
+
+	return 0;
+}
diff --git a/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.h b/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.h
new file mode 100644
index 000000000000..34f26c1597e5
--- /dev/null
+++ b/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.h
@@ -0,0 +1,37 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2014-2019 Intel Corporation
+ */
+
+#ifndef _INTEL_SELFTEST_SCHEDULER_HELPERS_H_
+#define _INTEL_SELFTEST_SCHEDULER_HELPERS_H_
+
+#include <linux/types.h>
+
+struct intel_gt;
+struct i915_request;
+struct intel_engine_cs;
+
+struct intel_selftest_saved_policy {
+	u32 flags;
+	u32 reset;
+	u64 timeslice;
+	u64 preempt_timeout;
+};
+
+enum selftest_scheduler_modify {
+	SELFTEST_SCHEDULER_MODIFY_NO_HANGCHECK = 0,
+	SELFTEST_SCHEDULER_MODIFY_FAST_RESET,
+};
+
+int intel_selftest_modify_policy(struct intel_engine_cs *engine,
+				 struct intel_selftest_saved_policy *saved,
+				 enum selftest_scheduler_modify modify_type);
+int intel_selftest_restore_policy(struct intel_engine_cs *engine,
+				  struct intel_selftest_saved_policy *saved);
+int intel_selftest_wait_for_rq(struct i915_request *rq);
+struct intel_engine_cs *intel_selftest_find_any_engine(struct intel_gt *gt);
+
+#endif
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 96/97] drm/i915/guc: Update GuC documentation
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (94 preceding siblings ...)
  2021-05-06 19:14 ` [RFC PATCH 95/97] drm/i915/guc: Selftest for GuC flow control Matthew Brost
@ 2021-05-06 19:14 ` Matthew Brost
  2021-05-06 19:14 ` [RFC PATCH 97/97] drm/i915/guc: Unblock GuC submission on Gen11+ Matthew Brost
                   ` (3 subsequent siblings)
  99 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 99 ++++++++++++++-----
 1 file changed, 77 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 337ddc0dab6b..594a99ea4f5c 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -29,21 +29,6 @@
 /**
  * DOC: GuC-based command submission
  *
- * IMPORTANT NOTE: GuC submission is currently not supported in i915. The GuC
- * firmware is moving to an updated submission interface and we plan to
- * turn submission back on when that lands. The below documentation (and related
- * code) matches the old submission model and will be updated as part of the
- * upgrade to the new flow.
- *
- * GuC stage descriptor:
- * During initialization, the driver allocates a static pool of 1024 such
- * descriptors, and shares them with the GuC. Currently, we only use one
- * descriptor. This stage descriptor lets the GuC know about the workqueue and
- * process descriptor. Theoretically, it also lets the GuC know about our HW
- * contexts (context ID, etc...), but we actually employ a kind of submission
- * where the GuC uses the LRCA sent via the work item instead. This is called
- * a "proxy" submission.
- *
  * The Scratch registers:
  * There are 16 MMIO-based registers start from 0xC180. The kernel driver writes
  * a value to the action register (SOFT_SCRATCH_0) along with any data. It then
@@ -52,13 +37,45 @@
  * processes the request. The kernel driver polls waiting for this update and
  * then proceeds.
  *
- * Work Items:
- * There are several types of work items that the host may place into a
- * workqueue, each with its own requirements and limitations. Currently only
- * WQ_TYPE_INORDER is needed to support legacy submission via GuC, which
- * represents in-order queue. The kernel driver packs ring tail pointer and an
- * ELSP context descriptor dword into Work Item.
- * See gse_add_request()
+ * Command Transport buffers (CTBs):
+ * Covered in detail in other sections but CTBs (host to GuC = H2G, GuC to
+ * host = G2H) are how the i915 controls submissions.
+ *
+ * Context registration:
+ * Before a context can be submitted it must be registered with the GuC via a
+ * H2G. A unique guc_id is associated with each context. The context is either
+ * registered at request creation time (no flow control) or at submission time
+ * (flow control). It will stay registered until the context is destroyed or a
+ * flow control condition is met (e.g. pressure on guc_ids).
+ *
+ * Context submission:
+ * The i915 updates the LRC tail value in memory. Either a schedule enable H2G
+ * or context submit H2G is used to submit a context.
+ *
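+ * A simplified sketch of the register-then-submit flow (the helper names
+ * below are illustrative assumptions, not the exact driver functions):
+ *
+ *	register_context(ce)	- H2G, once per guc_id assignment
+ *	update_lrc_tail(ce, rq)	- write the new ring tail in memory
+ *	submit_context(ce)	- schedule enable or context submit H2G
+ *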
+ * Context unpin:
+ * To unpin a context, a H2G is used to disable scheduling. When the
+ * corresponding G2H returns indicating the scheduling disable operation has
+ * completed, it is safe to unpin the context. While a disable is in flight it
+ * isn't safe to resubmit the context, so a fence is used to stall all future
+ * requests until the G2H is returned.
+ *
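+ * The resulting sequence is roughly (illustrative names only, not the actual
+ * code):
+ *
+ *	send_schedule_disable(ce)	- H2G, fence installed so new requests
+ *					  stall instead of resubmitting
+ *	handle_sched_disable_done(ce)	- G2H handler, unpins the context and
+ *					  signals the fence
+ *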
+ * Context deregistration:
+ * Before a context can be destroyed or we steal its guc_id, we must deregister
+ * the context with the GuC via H2G. If stealing the guc_id, it isn't safe to
+ * submit anything to this guc_id until the deregister completes, so a fence is
+ * used to stall all requests associated with this guc_id until the
+ * corresponding G2H returns indicating the guc_id has been deregistered.
+ *
+ * guc_ids:
+ * A unique number associated with the private GuC context data, passed in
+ * during context registration / submission / deregistration. 64k guc_ids are
+ * available. A simple ida is used for allocation.
+ *
+ * Stealing guc_ids:
+ * If no guc_ids are available they can be stolen from another context at
+ * request creation time if that context is unpinned. If nothing can be found at
+ * request creation time, flow control is triggered (serializing all submission
+ * until flow control exits) and guc_ids are stolen at submission time.
  *
  * GuC flow control state machine:
  * The tasklet, workqueue (retire_worker), and the G2H handlers together more or
@@ -79,6 +96,44 @@
  * STALL_MOVE_LRC_TAIL		Tasklet will try to move LRC tail
  * STALL_ADD_REQUEST		Tasklet will try to add the request (submit
  *				context)
+ *
+ * Locking:
+ * In the GuC submission code we have 4 basic spin locks which protect
+ * everything. Details about each below.
+ *
+ * gse->sched_engine->lock
+ * This is the submission lock for all contexts that share a GuC submit engine
+ * (gse), thus only 1 context sharing a gse can be submitting at a time.
+ *
+ * guc->contexts_lock
+ * Protects guc_id allocation. This is a global lock, i.e. only 1 context that
+ * uses GuC submission can hold it at a time.
+ *
+ * ce->guc_state.lock
+ * Protects everything under ce->guc_state. Ensures that a context is in the
+ * correct state before issuing a H2G, e.g. we don't issue a schedule disable
+ * on a disabled context (bad idea), we don't issue a schedule enable when a
+ * schedule disable is in flight, etc... This lock is individual to each
+ * context.
+ *
+ * ce->guc_active.lock
+ * Protects everything under ce->guc_active, which covers the requests
+ * currently in flight on the context and its priority management. This lock
+ * is individual to each context.
+ *
+ * Lock ordering rules:
+ * ce->guc_state.lock -> gse->sched_engine->lock -> ce->guc_active.lock
+ * gse->sched_engine->lock -> guc->contexts_lock
+ *
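+ * For example, a path that needs both a context state lock and the submission
+ * lock must take them in the order above (illustrative sketch only; the real
+ * code uses the irqsave variants where required):
+ *
+ *	spin_lock(&ce->guc_state.lock);
+ *	spin_lock(&gse->sched_engine->lock);
+ *	... update context state, move LRC tail, etc ...
+ *	spin_unlock(&gse->sched_engine->lock);
+ *	spin_unlock(&ce->guc_state.lock);
+ *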
+ * Reset races:
+ * When a GPU full reset is triggered it is assumed that some G2H responses to
+ * a H2G can be lost as the GuC is likely toast. Losing these G2H can prove
+ * fatal as we do certain operations upon receiving a G2H (e.g. destroy
+ * contexts, release guc_ids, etc...). Luckily when this occurs we can scrub
+ * the context state and clean up appropriately, however this is quite racy.
+ * To avoid races the rule is to check for submission being disabled (i.e. a
+ * reset in progress) with the appropriate lock held. If submission is
+ * disabled, don't send the H2G. The reset code must disable submission and
+ * flush all locks before scrubbing for missing G2H.
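+ *
+ * In code form the rule looks roughly like this (illustrative only; the
+ * schedule disable helper name is an assumption, submission_disabled() is the
+ * real check used by the submission code):
+ *
+ *	spin_lock_irqsave(&ce->guc_state.lock, flags);
+ *	if (!submission_disabled(guc))
+ *		send_schedule_disable_h2g(ce);
+ *	spin_unlock_irqrestore(&ce->guc_state.lock, flags);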
  */
 
 static struct intel_context *
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* [RFC PATCH 97/97] drm/i915/guc: Unblock GuC submission on Gen11+
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (95 preceding siblings ...)
  2021-05-06 19:14 ` [RFC PATCH 96/97] drm/i915/guc: Update GuC documentation Matthew Brost
@ 2021-05-06 19:14 ` Matthew Brost
  2021-05-09 17:12 ` [RFC PATCH 00/97] Basic GuC submission support in the i915 Martin Peres
                   ` (2 subsequent siblings)
  99 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-06 19:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, tvrtko.ursulin, daniele.ceraolospurio,
	jason.ekstrand, jon.bloomfield, daniel.vetter, john.c.harrison

From: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

Unblock GuC submission on Gen11+ platforms.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc.h            |  1 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c |  8 ++++++++
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h |  3 +--
 drivers/gpu/drm/i915/gt/uc/intel_uc.c             | 14 +++++++++-----
 4 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 169daaf8a189..ac7ece2f4c8c 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -66,6 +66,7 @@ struct intel_guc {
 	struct list_head destroyed_contexts;
 	struct intel_gt_pm_delayed_work destroy_worker;
 
+	bool submission_supported;
 	bool submission_selected;
 
 	struct i915_vma *ads_vma;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 594a99ea4f5c..b9c86e0f02b2 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -3477,6 +3477,13 @@ void intel_guc_submission_disable(struct intel_guc *guc)
 	/* Note: By the time we're here, GuC may have already been reset */
 }
 
+static bool __guc_submission_supported(struct intel_guc *guc)
+{
+	/* GuC submission is unavailable for pre-Gen11 */
+	return intel_guc_is_supported(guc) &&
+	       INTEL_GEN(guc_to_gt(guc)->i915) >= 11;
+}
+
 static bool __guc_submission_selected(struct intel_guc *guc)
 {
 	struct drm_i915_private *i915 = guc_to_gt(guc)->i915;
@@ -3491,6 +3498,7 @@ void intel_guc_submission_init_early(struct intel_guc *guc)
 {
 	guc->max_guc_ids = GUC_MAX_LRC_DESCRIPTORS;
 	guc->num_guc_ids = GUC_MAX_LRC_DESCRIPTORS;
+	guc->submission_supported = __guc_submission_supported(guc);
 	guc->submission_selected = __guc_submission_selected(guc);
 }
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
index 60c8b9aaad6e..9431ec52a6c4 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
@@ -37,8 +37,7 @@ int intel_guc_wait_for_pending_msg(struct intel_guc *guc,
 
 static inline bool intel_guc_submission_is_supported(struct intel_guc *guc)
 {
-	/* XXX: GuC submission is unavailable for now */
-	return false;
+	return guc->submission_supported;
 }
 
 static inline bool intel_guc_submission_is_wanted(struct intel_guc *guc)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index 4a79db4a739f..8cfb226da62e 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -34,8 +34,15 @@ static void uc_expand_default_options(struct intel_uc *uc)
 		return;
 	}
 
-	/* Default: enable HuC authentication only */
-	i915->params.enable_guc = ENABLE_GUC_LOAD_HUC;
+	/* Intermediate platforms are HuC authentication only */
+	if (IS_DG1(i915) || IS_ALDERLAKE_S(i915)) {
+		drm_dbg(&i915->drm, "Disabling GuC only due to old platform\n");
+		i915->params.enable_guc = ENABLE_GUC_LOAD_HUC;
+		return;
+	}
+
+	/* Default: enable HuC authentication and GuC submission */
+	i915->params.enable_guc = ENABLE_GUC_LOAD_HUC | ENABLE_GUC_SUBMISSION;
 }
 
 /* Reset GuC providing us with fresh state for both GuC and HuC.
@@ -313,9 +320,6 @@ static int __uc_init(struct intel_uc *uc)
 	if (i915_inject_probe_failure(uc_to_gt(uc)->i915))
 		return -ENOMEM;
 
-	/* XXX: GuC submission is unavailable for now */
-	GEM_BUG_ON(intel_uc_uses_guc_submission(uc));
-
 	ret = intel_guc_init(guc);
 	if (ret)
 		return ret;
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 00/97] Basic GuC submission support in the i915
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (96 preceding siblings ...)
  2021-05-06 19:14 ` [RFC PATCH 97/97] drm/i915/guc: Unblock GuC submission on Gen11+ Matthew Brost
@ 2021-05-09 17:12 ` Martin Peres
  2021-05-09 23:11   ` Jason Ekstrand
  2021-05-14 11:11 ` [Intel-gfx] " Tvrtko Ursulin
  2021-05-25 10:32 ` Tvrtko Ursulin
  99 siblings, 1 reply; 249+ messages in thread
From: Martin Peres @ 2021-05-09 17:12 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel
  Cc: tvrtko.ursulin, jason.ekstrand, daniele.ceraolospurio,
	jon.bloomfield, daniel.vetter, john.c.harrison

Hi,

On 06/05/2021 22:13, Matthew Brost wrote:
> Basic GuC submission support. This is the first bullet point in the
> upstreaming plan covered in the following RFC [1].
> 
> At a very high level the GuC is a piece of firmware which sits between
> the i915 and the GPU. It offloads some of the scheduling of contexts
> from the i915 and programs the GPU to submit contexts. The i915
> communicates with the GuC and the GuC communicates with the GPU.

May I ask what will GuC command submission do that execlist won't/can't 
do? And what would be the impact on users? Even forgetting the troubled 
history of GuC (instability, performance regression, poor level of user 
support, 6+ years of trying to upstream it...), adding this much code 
and doubling the amount of validation needed should come with a 
rationale making it feel worth it... and I am not seeing one here. Would you 
mind providing the rationale behind this work?

> 
> GuC submission will be disabled by default on all current upstream
> platforms behind a module parameter - enable_guc. A value of 3 will
> enable submission and HuC loading via the GuC. GuC submission should
> work on all gen11+ platforms assuming the GuC firmware is present.

What is the plan here when it comes to keeping support for execlist? I 
am afraid that landing GuC support in Linux is the first step towards 
killing the execlist, which would force users to use proprietary 
firmwares that even most Intel engineers have little influence over. 
Indeed, if "drm/i915/guc: Disable semaphores when using GuC scheduling" 
which states "Disable semaphores when using GuC scheduling as semaphores 
are broken in the current GuC firmware." is anything to go by, it means 
that even Intel developers seem to prefer working around the GuC 
firmware, rather than fixing it.

In the same vein, I have another concern related to the impact of GuC on 
Linux's stable releases. Let's say that in 3 years, a new application 
triggers a bug in command submission inside the firmware. Given that the 
Linux community cannot patch the GuC firmware, how likely is it that 
Intel would release a new GuC version? That would not be necessarily 
such a big problem if newer versions of the GuC could easily be 
backported to this potentially-decade-old Linux version, but given that 
the GuC seems to have ABI-breaking changes on a monthly cadence (we are 
at major version 60 *already*? :o), I would say that it is 
highly-unlikely that it would not require potentially-extensive changes 
to i915 to make it work, making the fix almost impossible to land in the 
stable tree... Do you have a plan to mitigate this problem?

Patches like "drm/i915/guc: Disable bonding extension with GuC 
submission" also make me twitch, as this means the two command 
submission paths will not be functionally equivalent, and enabling GuC 
could thus introduce a user-visible regression (one app used to work, 
then stopped working). Could you add in the commit's message a proof 
that this would not end up being a user regression (in which case, why 
have this codepath to begin with?).

Finally, could you explain why IGT tests need to be modified to work with the 
GuC [1], and how much of the code in this series is covered by 
existing/upcoming tests? I would expect a very solid set of tests to 
minimize the maintenance burden, and enable users to reproduce potential 
issues found in this new codepath (too many users run with enable_guc=3, 
as can be seen on Google[2]).

Looking forward to reading up about your plan, and the commitments Intel 
would put in place to make this feature something users should be 
looking forward to rather than fearing.

Thanks,
Martin

[2] https://www.google.com/search?q=enable_guc%3D3

> 
> This is a huge series and it is completely unrealistic to merge all of
> these patches at once. Fortunately I believe we can break down the
> series into different merges:
> 
> 1. Merge Chris Wilson's patches. These have already been reviewed
> upstream and I fully agree with these patches as a precursor to GuC
> submission.
> 
> 2. Update to GuC 60.1.2. These are largely Michal's patches.
> 
> 3. Turn on GuC/HuC auto mode by default.
> 
> 4. Additional patches needed to support GuC submission. This is any
> patch not covered by 1-3 in the first 34 patches. e.g. 'Engine relative
> MMIO'
> 
> 5. GuC submission support. Patches number 35+. These all don't have to
> merge at once though as we don't actually allow GuC submission until the
> last patch of this series.
> 
> [1] https://patchwork.freedesktop.org/patch/432206/?series=89840&rev=1
> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> 
> Chris Wilson (3):
>    drm/i915/gt: Move engine setup out of set_default_submission
>    drm/i915/gt: Move submission_method into intel_gt
>    drm/i915/gt: Move CS interrupt handler to the backend
> 
> Daniele Ceraolo Spurio (6):
>    drm/i915/guc: skip disabling CTBs before sanitizing the GuC
>    drm/i915/guc: use probe_error log for CT enablement failure
>    drm/i915/guc: enable only the user interrupt when using GuC submission
>    drm/i915/uc: turn on GuC/HuC auto mode by default
>    drm/i915/guc: Use guc_class instead of engine_class in fw interface
>    drm/i915/guc: Unblock GuC submission on Gen11+
> 
> John Harrison (13):
>    drm/i915/guc: Support per context scheduling policies
>    drm/i915/guc: Update firmware to v60.1.2
>    drm/i915: Engine relative MMIO
>    drm/i915/guc: Module load failure test for CT buffer creation
>    drm/i915: Track 'serial' counts for virtual engines
>    drm/i915/guc: Provide mmio list to be saved/restored on engine reset
>    drm/i915/guc: Don't complain about reset races
>    drm/i915/guc: Enable GuC engine reset
>    drm/i915/guc: Fix for error capture after full GPU reset with GuC
>    drm/i915/guc: Hook GuC scheduling policies up
>    drm/i915/guc: Connect reset modparam updates to GuC policy flags
>    drm/i915/guc: Include scheduling policies in the debugfs state dump
>    drm/i915/guc: Add golden context to GuC ADS
> 
> Matthew Brost (53):
>    drm/i915: Introduce i915_sched_engine object
>    drm/i915/guc: Improve error message for unsolicited CT response
>    drm/i915/guc: Add non blocking CTB send function
>    drm/i915/guc: Add stall timer to non blocking CTB send function
>    drm/i915/guc: Optimize CTB writes and reads
>    drm/i915/guc: Increase size of CTB buffers
>    drm/i915/guc: Add new GuC interface defines and structures
>    drm/i915/guc: Remove GuC stage descriptor, add lrc descriptor
>    drm/i915/guc: Add lrc descriptor context lookup array
>    drm/i915/guc: Implement GuC submission tasklet
>    drm/i915/guc: Add bypass tasklet submission path to GuC
>    drm/i915/guc: Implement GuC context operations for new inteface
>    drm/i915/guc: Insert fence on context when deregistering
>    drm/i915/guc: Defer context unpin until scheduling is disabled
>    drm/i915/guc: Disable engine barriers with GuC during unpin
>    drm/i915/guc: Extend deregistration fence to schedule disable
>    drm/i915: Disable preempt busywait when using GuC scheduling
>    drm/i915/guc: Ensure request ordering via completion fences
>    drm/i915/guc: Disable semaphores when using GuC scheduling
>    drm/i915/guc: Ensure G2H response has space in buffer
>    drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC
>    drm/i915/guc: Update GuC debugfs to support new GuC
>    drm/i915/guc: Add several request trace points
>    drm/i915: Add intel_context tracing
>    drm/i915/guc: GuC virtual engines
>    drm/i915: Hold reference to intel_context over life of i915_request
>    drm/i915/guc: Disable bonding extension with GuC submission
>    drm/i915/guc: Direct all breadcrumbs for a class to single breadcrumbs
>    drm/i915/guc: Reset implementation for new GuC interface
>    drm/i915: Reset GPU immediately if submission is disabled
>    drm/i915/guc: Add disable interrupts to guc sanitize
>    drm/i915/guc: Suspend/resume implementation for new interface
>    drm/i915/guc: Handle context reset notification
>    drm/i915/guc: Handle engine reset failure notification
>    drm/i915/guc: Enable the timer expired interrupt for GuC
>    drm/i915/guc: Capture error state on context reset
>    drm/i915/guc: Don't call ring_is_idle in GuC submission
>    drm/i915/guc: Implement banned contexts for GuC submission
>    drm/i915/guc: Allow flexible number of context ids
>    drm/i915/guc: Connect the number of guc_ids to debugfs
>    drm/i915/guc: Don't return -EAGAIN to user when guc_ids exhausted
>    drm/i915/guc: Don't allow requests not ready to consume all guc_ids
>    drm/i915/guc: Introduce guc_submit_engine object
>    drm/i915/guc: Implement GuC priority management
>    drm/i915/guc: Support request cancellation
>    drm/i915/guc: Check return of __xa_store when registering a context
>    drm/i915/guc: Non-static lrc descriptor registration buffer
>    drm/i915/guc: Take GT PM ref when deregistering context
>    drm/i915: Add GT PM delayed worker
>    drm/i915/guc: Take engine PM when a context is pinned with GuC
>      submission
>    drm/i915/guc: Don't call switch_to_kernel_context with GuC submission
>    drm/i915/guc: Selftest for GuC flow control
>    drm/i915/guc: Update GuC documentation
> 
> Michal Wajdeczko (21):
>    drm/i915/guc: Keep strict GuC ABI definitions
>    drm/i915/guc: Stop using fence/status from CTB descriptor
>    drm/i915: Promote ptrdiff() to i915_utils.h
>    drm/i915/guc: Only rely on own CTB size
>    drm/i915/guc: Don't repeat CTB layout calculations
>    drm/i915/guc: Replace CTB array with explicit members
>    drm/i915/guc: Update sizes of CTB buffers
>    drm/i915/guc: Relax CTB response timeout
>    drm/i915/guc: Start protecting access to CTB descriptors
>    drm/i915/guc: Stop using mutex while sending CTB messages
>    drm/i915/guc: Don't receive all G2H messages in irq handler
>    drm/i915/guc: Always copy CT message to new allocation
>    drm/i915/guc: Introduce unified HXG messages
>    drm/i915/guc: Update MMIO based communication
>    drm/i915/guc: Update CTB response status
>    drm/i915/guc: Add flag for mark broken CTB
>    drm/i915/guc: New definition of the CTB descriptor
>    drm/i915/guc: New definition of the CTB registration action
>    drm/i915/guc: New CTB based communication
>    drm/i915/guc: Kill guc_clients.ct_pool
>    drm/i915/guc: Early initialization of GuC send registers
> 
> Rodrigo Vivi (1):
>    drm/i915/guc: Remove sample_forcewake h2g action
> 
>   drivers/gpu/drm/i915/Makefile                 |    2 +
>   drivers/gpu/drm/i915/gem/i915_gem_context.c   |   39 +-
>   drivers/gpu/drm/i915/gem/i915_gem_context.h   |    1 +
>   drivers/gpu/drm/i915/gem/i915_gem_mman.c      |    3 +-
>   drivers/gpu/drm/i915/gem/i915_gem_wait.c      |    4 +-
>   drivers/gpu/drm/i915/gt/gen8_engine_cs.c      |    6 +-
>   drivers/gpu/drm/i915/gt/intel_breadcrumbs.c   |   44 +-
>   drivers/gpu/drm/i915/gt/intel_breadcrumbs.h   |   14 +-
>   .../gpu/drm/i915/gt/intel_breadcrumbs_types.h |    7 +
>   drivers/gpu/drm/i915/gt/intel_context.c       |   50 +-
>   drivers/gpu/drm/i915/gt/intel_context.h       |   45 +-
>   drivers/gpu/drm/i915/gt/intel_context_types.h |   76 +-
>   drivers/gpu/drm/i915/gt/intel_engine.h        |   96 +-
>   drivers/gpu/drm/i915/gt/intel_engine_cs.c     |  320 +-
>   .../gpu/drm/i915/gt/intel_engine_heartbeat.c  |   75 +-
>   .../gpu/drm/i915/gt/intel_engine_heartbeat.h  |    4 +
>   drivers/gpu/drm/i915/gt/intel_engine_pm.c     |   14 +-
>   drivers/gpu/drm/i915/gt/intel_engine_pm.h     |    5 +
>   drivers/gpu/drm/i915/gt/intel_engine_types.h  |   71 +-
>   drivers/gpu/drm/i915/gt/intel_engine_user.c   |    6 +-
>   .../drm/i915/gt/intel_execlists_submission.c  |  693 +--
>   .../drm/i915/gt/intel_execlists_submission.h  |   14 -
>   drivers/gpu/drm/i915/gt/intel_gpu_commands.h  |    5 +
>   drivers/gpu/drm/i915/gt/intel_gt.c            |   23 +
>   drivers/gpu/drm/i915/gt/intel_gt.h            |    2 +
>   drivers/gpu/drm/i915/gt/intel_gt_irq.c        |  100 +-
>   drivers/gpu/drm/i915/gt/intel_gt_irq.h        |   23 +
>   drivers/gpu/drm/i915/gt/intel_gt_pm.c         |   14 +-
>   drivers/gpu/drm/i915/gt/intel_gt_pm.h         |   13 +
>   .../drm/i915/gt/intel_gt_pm_delayed_work.c    |   35 +
>   .../drm/i915/gt/intel_gt_pm_delayed_work.h    |   24 +
>   drivers/gpu/drm/i915/gt/intel_gt_requests.c   |   23 +-
>   drivers/gpu/drm/i915/gt/intel_gt_requests.h   |    7 +-
>   drivers/gpu/drm/i915/gt/intel_gt_types.h      |   10 +
>   drivers/gpu/drm/i915/gt/intel_lrc_reg.h       |    1 -
>   drivers/gpu/drm/i915/gt/intel_reset.c         |   58 +-
>   .../gpu/drm/i915/gt/intel_ring_submission.c   |   73 +-
>   drivers/gpu/drm/i915/gt/intel_rps.c           |    6 +-
>   drivers/gpu/drm/i915/gt/intel_workarounds.c   |   46 +-
>   .../gpu/drm/i915/gt/intel_workarounds_types.h |    1 +
>   drivers/gpu/drm/i915/gt/mock_engine.c         |   58 +-
>   drivers/gpu/drm/i915/gt/selftest_context.c    |   10 +
>   drivers/gpu/drm/i915/gt/selftest_execlists.c  |   58 +-
>   drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |    6 +-
>   drivers/gpu/drm/i915/gt/selftest_lrc.c        |    6 +-
>   drivers/gpu/drm/i915/gt/selftest_reset.c      |    2 +-
>   .../drm/i915/gt/selftest_ring_submission.c    |    2 +-
>   .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h  |  177 +
>   .../gt/uc/abi/guc_communication_ctb_abi.h     |  192 +
>   .../gt/uc/abi/guc_communication_mmio_abi.h    |   35 +
>   .../gpu/drm/i915/gt/uc/abi/guc_errors_abi.h   |   13 +
>   .../gpu/drm/i915/gt/uc/abi/guc_messages_abi.h |  247 +
>   drivers/gpu/drm/i915/gt/uc/intel_guc.c        |  194 +-
>   drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  131 +-
>   drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c    |  484 +-
>   drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h    |    3 +
>   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     | 1088 +++--
>   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h     |   49 +-
>   .../gpu/drm/i915/gt/uc/intel_guc_debugfs.c    |   56 +-
>   drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |  377 +-
>   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 4037 +++++++++++++++--
>   .../gpu/drm/i915/gt/uc/intel_guc_submission.h |   20 +-
>   .../i915/gt/uc/intel_guc_submission_types.h   |   55 +
>   drivers/gpu/drm/i915/gt/uc/intel_uc.c         |  116 +-
>   drivers/gpu/drm/i915/gt/uc/intel_uc.h         |   11 +
>   drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c      |   25 +-
>   .../i915/gt/uc/selftest_guc_flow_control.c    |  589 +++
>   drivers/gpu/drm/i915/i915_active.c            |    3 +
>   drivers/gpu/drm/i915/i915_debugfs.c           |    8 +-
>   drivers/gpu/drm/i915/i915_debugfs_params.c    |   31 +
>   drivers/gpu/drm/i915/i915_drv.h               |    2 +-
>   drivers/gpu/drm/i915/i915_gem_evict.c         |    1 +
>   drivers/gpu/drm/i915/i915_gpu_error.c         |   28 +-
>   drivers/gpu/drm/i915/i915_irq.c               |   10 +-
>   drivers/gpu/drm/i915/i915_params.h            |    2 +-
>   drivers/gpu/drm/i915/i915_perf.c              |   16 +-
>   drivers/gpu/drm/i915/i915_reg.h               |    2 +
>   drivers/gpu/drm/i915/i915_request.c           |  218 +-
>   drivers/gpu/drm/i915/i915_request.h           |   37 +-
>   drivers/gpu/drm/i915/i915_scheduler.c         |  188 +-
>   drivers/gpu/drm/i915/i915_scheduler.h         |   74 +-
>   drivers/gpu/drm/i915/i915_scheduler_types.h   |   74 +
>   drivers/gpu/drm/i915/i915_trace.h             |  219 +-
>   drivers/gpu/drm/i915/i915_utils.h             |    5 +
>   drivers/gpu/drm/i915/i915_vma.h               |    5 -
>   drivers/gpu/drm/i915/intel_wakeref.c          |    5 +
>   drivers/gpu/drm/i915/intel_wakeref.h          |    1 +
>   .../drm/i915/selftests/i915_live_selftests.h  |    1 +
>   .../gpu/drm/i915/selftests/igt_live_test.c    |    2 +-
>   .../i915/selftests/intel_scheduler_helpers.c  |  101 +
>   .../i915/selftests/intel_scheduler_helpers.h  |   37 +
>   .../gpu/drm/i915/selftests/mock_gem_device.c  |    3 +-
>   include/uapi/drm/i915_drm.h                   |    9 +
>   93 files changed, 8954 insertions(+), 2222 deletions(-)
>   create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_pm_delayed_work.c
>   create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_pm_delayed_work.h
>   create mode 100644 drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
>   create mode 100644 drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h
>   create mode 100644 drivers/gpu/drm/i915/gt/uc/abi/guc_communication_mmio_abi.h
>   create mode 100644 drivers/gpu/drm/i915/gt/uc/abi/guc_errors_abi.h
>   create mode 100644 drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h
>   create mode 100644 drivers/gpu/drm/i915/gt/uc/intel_guc_submission_types.h
>   create mode 100644 drivers/gpu/drm/i915/gt/uc/selftest_guc_flow_control.c
>   create mode 100644 drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c
>   create mode 100644 drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.h
> 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 00/97] Basic GuC submission support in the i915
  2021-05-09 17:12 ` [RFC PATCH 00/97] Basic GuC submission support in the i915 Martin Peres
@ 2021-05-09 23:11   ` Jason Ekstrand
  2021-05-10 13:55     ` Martin Peres
  2021-05-11  2:58     ` Dixit, Ashutosh
  0 siblings, 2 replies; 249+ messages in thread
From: Jason Ekstrand @ 2021-05-09 23:11 UTC (permalink / raw)
  To: Martin Peres, Matthew Brost, intel-gfx, dri-devel
  Cc: tvrtko.ursulin, jason.ekstrand, daniele.ceraolospurio,
	jon.bloomfield, daniel.vetter, john.c.harrison

[-- Attachment #1: Type: text/plain, Size: 6191 bytes --]

On May 9, 2021 12:12:36 Martin Peres <martin.peres@free.fr> wrote:

> Hi,
>
> On 06/05/2021 22:13, Matthew Brost wrote:
>> Basic GuC submission support. This is the first bullet point in the
>> upstreaming plan covered in the following RFC [1].
>>
>> At a very high level the GuC is a piece of firmware which sits between
>> the i915 and the GPU. It offloads some of the scheduling of contexts
>> from the i915 and programs the GPU to submit contexts. The i915
>> communicates with the GuC and the GuC communicates with the GPU.
>
> May I ask what will GuC command submission do that execlist won't/can't
> do? And what would be the impact on users? Even forgetting the troubled
> history of GuC (instability, performance regression, poor level of user
> support, 6+ years of trying to upstream it...), adding this much code
> and doubling the amount of validation needed should come with a
> rationale making it feel worth it... and I am not seeing here. Would you
> mind providing the rationale behind this work?
>
>>
>> GuC submission will be disabled by default on all current upstream
>> platforms behind a module parameter - enable_guc. A value of 3 will
>> enable submission and HuC loading via the GuC. GuC submission should
>> work on all gen11+ platforms assuming the GuC firmware is present.
>
> What is the plan here when it comes to keeping support for execlist? I
> am afraid that landing GuC support in Linux is the first step towards
> killing the execlist, which would force users to use proprietary
> firmwares that even most Intel engineers have little influence over.
> Indeed, if "drm/i915/guc: Disable semaphores when using GuC scheduling"
> which states "Disable semaphores when using GuC scheduling as semaphores
> are broken in the current GuC firmware." is anything to go by, it means
> that even Intel developers seem to prefer working around the GuC
> firmware, rather than fixing it.

Yes, landing GuC support may be the first step in removing execlist 
support. The inevitable reality is that GPU scheduling is coming and likely 
to be the only path in the not-too-distant future. (See also the ongoing 
thread with AMD about fences.) I'm not going to pass judgement on whether 
or not this is a good thing.  I'm just reading the winds and, in my view, 
this is where things are headed for good or ill.

In answer to the question above, the answer to "what do we gain from GuC?" 
may soon be, "you get to use your GPU."  We're not there yet and, again, 
I'm not necessarily advocating for it, but that is likely where things are 
headed.

A firmware-based submission model isn't a bad design IMO and, aside from 
the firmware freedom issues, I think there are actual advantages to the 
model. Immediately, it'll unlock a few features like parallel submission 
(more on that in a bit) and long-running compute because they're 
implemented in GuC and the work to implement them properly in the execlist 
scheduler is highly non-trivial. Longer term, it may (no guarantees) unlock 
some performance by getting the kernel out of the way.


> In the same vein, I have another concern related to the impact of GuC on
> Linux's stable releases. Let's say that in 3 years, a new application
> triggers a bug in command submission inside the firmware. Given that the
> Linux community cannot patch the GuC firmware, how likely is it that
> Intel would release a new GuC version? That would not be necessarily
> such a big problem if newer versions of the GuC could easily be
> backported to this potentially-decade-old Linux version, but given that
> the GuC seems to have ABI-breaking changes on a monthly cadence (we are
> at major version 60 *already*? :o), I would say that it is
> highly-unlikely that it would not require potentially-extensive changes
> to i915 to make it work, making the fix almost impossible to land in the
> stable tree... Do you have a plan to mitigate this problem?
>
> Patches like "drm/i915/guc: Disable bonding extension with GuC
> submission" also make me twitch, as this means the two command
> submission paths will not be functionally equivalent, and enabling GuC
> could thus introduce a user-visible regression (one app used to work,
> then stopped working). Could you add in the commit's message a proof
> that this would not end up being a user regression (in which case, why
> have this codepath to begin with?).

I'd like to address this one specifically as it's become something of a 
speciality of mine the past few weeks. The current bonded submission model 
is bad. It provides a plethora of ways for a client to back itself into a 
corner and doesn't actually provide the guarantees the media driver needs 
for its real-time high-resolution decode. It's bad enough we're seriously 
considering ripping it out, backwards compatibility or not. The good news 
is that very little that your average desktop user does depends on it: 
basically just real-time >4K video decode.

The new parallel submit API is much better and should be the path forward. 
(We should have landed parallel submit the first time around.) It isn't 
full of corners and does let us provides actual parallel execution 
guarantees. It also gives the scheduler the information it needs to 
reliably provide those guarantees.

If we need to support the parallel submit API with the execlist back-end, 
that's totally possible. The choice to only implement the parallel submit 
API with GuC is a pragmatic one. We're trying to get upstream back on its 
feet and get all the various up-and-coming bits of hardware enabled. 
Enabling the new API in the execlist back-end makes that pipeline longer.


> Finally, could you explain why IGT tests need to be modified to work the
> GuC [1], and how much of the code in this series is covered by
> existing/upcoming tests? I would expect a very solid set of tests to
> minimize the maintenance burden, and enable users to reproduce potential
> issues found in this new codepath (too many users run with enable_guc=3,
> as can be seen on Google[2]).

The IGT changes, as I understand them, are entirely around switching to the 
new parallel submit API. There shouldn't be a major effect on most users.

--Jason


^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 00/97] Basic GuC submission support in the i915
  2021-05-09 23:11   ` Jason Ekstrand
@ 2021-05-10 13:55     ` Martin Peres
  2021-05-10 16:25       ` Jason Ekstrand
  2021-05-10 16:33       ` Daniel Vetter
  2021-05-11  2:58     ` Dixit, Ashutosh
  1 sibling, 2 replies; 249+ messages in thread
From: Martin Peres @ 2021-05-10 13:55 UTC (permalink / raw)
  To: Jason Ekstrand, Matthew Brost, intel-gfx, dri-devel
  Cc: tvrtko.ursulin, jason.ekstrand, daniele.ceraolospurio,
	jon.bloomfield, daniel.vetter, john.c.harrison

On 10/05/2021 02:11, Jason Ekstrand wrote:
> On May 9, 2021 12:12:36 Martin Peres <martin.peres@free.fr> wrote:
> 
>> Hi,
>>
>> On 06/05/2021 22:13, Matthew Brost wrote:
>>> Basic GuC submission support. This is the first bullet point in the
>>> upstreaming plan covered in the following RFC [1].
>>>
>>> At a very high level the GuC is a piece of firmware which sits between
>>> the i915 and the GPU. It offloads some of the scheduling of contexts
>>> from the i915 and programs the GPU to submit contexts. The i915
>>> communicates with the GuC and the GuC communicates with the GPU.
>>
>> May I ask what will GuC command submission do that execlist won't/can't
>> do? And what would be the impact on users? Even forgetting the troubled
>> history of GuC (instability, performance regression, poor level of user
>> support, 6+ years of trying to upstream it...), adding this much code
>> and doubling the amount of validation needed should come with a
>> rationale making it feel worth it... and I am not seeing here. Would you
>> mind providing the rationale behind this work?
>>
>>>
>>> GuC submission will be disabled by default on all current upstream
>>> platforms behind a module parameter - enable_guc. A value of 3 will
>>> enable submission and HuC loading via the GuC. GuC submission should
>>> work on all gen11+ platforms assuming the GuC firmware is present.
>>
>> What is the plan here when it comes to keeping support for execlist? I
>> am afraid that landing GuC support in Linux is the first step towards
>> killing the execlist, which would force users to use proprietary
>> firmwares that even most Intel engineers have little influence over.
>> Indeed, if "drm/i915/guc: Disable semaphores when using GuC scheduling"
>> which states "Disable semaphores when using GuC scheduling as semaphores
>> are broken in the current GuC firmware." is anything to go by, it means
>> that even Intel developers seem to prefer working around the GuC
>> firmware, rather than fixing it.
> 
> Yes, landing GuC support may be the first step in removing execlist 
> support. The inevitable reality is that GPU scheduling is coming and 
> likely to be there only path in the not-too-distant future. (See also 
> the ongoing thread with AMD about fences.) I'm not going to pass 
> judgement on whether or not this is a good thing.  I'm just reading the 
> winds and, in my view, this is where things are headed for good or ill.
> 
> In answer to the question above, the answer to "what do we gain from 
> GuC?" may soon be, "you get to use your GPU."  We're not there yet and, 
> again, I'm not necessarily advocating for it, but that is likely where 
> things are headed.

This will be a sad day, especially since it seems fundamentally opposed 
to any long-term support, on top of taking away user freedom to 
fix/tweak their system when Intel won't.

> A firmware-based submission model isn't a bad design IMO and, aside from 
> the firmware freedom issues, I think there are actual advantages to the 
> model. Immediately, it'll unlock a few features like parallel submission 
> (more on that in a bit) and long-running compute because they're 
> implemented in GuC and the work to implement them properly in the 
> execlist scheduler is highly non-trivial. Longer term, it may (no 
> guarantees) unlock some performance by getting the kernel out of the way.

Oh, I definitely agree with the firmware-based submission model not being a 
bad design. I was even cheering for it in 2015. Experience with it made 
me regret that deeply since :s

But with the DRM scheduler being responsible for most things, I fail to 
see what we could offload in the GuC except context switching (like 
every other manufacturer). The problem is, the GuC does way more than 
just switching registers in bulk, and if the number of revisions of the 
GuC is anything to go by, it is way too complex for me to feel 
comfortable with it.

> 
>> In the same vein, I have another concern related to the impact of GuC on
>> Linux's stable releases. Let's say that in 3 years, a new application
>> triggers a bug in command submission inside the firmware. Given that the
>> Linux community cannot patch the GuC firmware, how likely is it that
>> Intel would release a new GuC version? That would not be necessarily
>> such a big problem if newer versions of the GuC could easily be
>> backported to this potentially-decade-old Linux version, but given that
>> the GuC seems to have ABI-breaking changes on a monthly cadence (we are
>> at major version 60 *already*? :o), I would say that it is
>> highly-unlikely that it would not require potentially-extensive changes
>> to i915 to make it work, making the fix almost impossible to land in the
>> stable tree... Do you have a plan to mitigate this problem?
>>
>> Patches like "drm/i915/guc: Disable bonding extension with GuC
>> submission" also make me twitch, as this means the two command
>> submission paths will not be functionally equivalent, and enabling GuC
>> could thus introduce a user-visible regression (one app used to work,
>> then stopped working). Could you add in the commit's message a proof
>> that this would not end up being a user regression (in which case, why
>> have this codepath to begin with?).
> 
> I'd like to address this one specifically as it's become something of a 
> speciality of mine the past few weeks. The current bonded submission 
> model is bad. It provides a plethora of ways for a client to back itself 
> into a corner and doesn't actually provide the guarantees the media 
> driver needs for its real-time high-resolution decode. It's bad enough 
> we're seriously considering ripping it out, backwards compatibility or 
> not. The good news is that very little that your average desktop user 
> does depends on it: basically just real-time >4K video decode.
> 
> The new parallel submit API is much better and should be the path 
> forward. (We should have landed parallel submit the first time around.) 
> It isn't full of corners and does let us provides actual parallel 
> execution guarantees. It also gives the scheduler the information it 
> needs to reliably provide those guarantees.
>
> If we need to support the parallel submit API with the execlist 
> back-end, that's totally possible. The choice to only implement the 
> parallel submit API with GuC is a pragmatic one. We're trying to get 
> upstream back on it's feet and get all the various up-and-coming bits of 
> hardware enabled. Enabling the new API in the execlist back-end makes 
> that pipeline longer.

I feel your pain, and wish you all the best in getting GEM less complex
and more manageable.

So, if I understood correctly, the plan is just to regress 4K+ video 
decoding for people who do not enable GuC scheduling, or did not also 
update to a recent-enough media driver that would support this new 
interface? If it is indeed only for over 4K videos, then whatever. If it 
is 4K, it starts being a little bad, assuming graceful fallback to 
CPU-based decoding. What's the test plan for this patch then? The patch 
in its current form is definitely not making me confident.

> 
>> Finally, could you explain why IGT tests need to be modified to work the
>> GuC [1], and how much of the code in this series is covered by
>> existing/upcoming tests? I would expect a very solid set of tests to
>> minimize the maintenance burden, and enable users to reproduce potential
>> issues found in this new codepath (too many users run with enable_guc=3,
>> as can be seen on Google[2]).
> 
> The IGT changes, as I understand them, are entirely around switching to 
> the new parallel submit API. There shouldn't be a major effect to most 
> users.

Right, this part I followed, but failed to connect it to the GuC... 
because I couldn't see why it would be needed (execlist requiring a lot 
more work).

I sincerely wish for the GuC to stay away from upstream because of the 
above concerns (which are yet to be addressed), but if Intel were to 
push forward with the plan to drop execlist, I can foresee a world of 
trouble for users... That is of course unless the GuC were to be open 
sourced, with people outside of Intel able to sign their own builds or 
run unsigned. Failing that, let's hope the last 6 years were just a bad 
start, and the rapid climb in major version of the GuC will magically 
stop! I hope execlists will remain at feature parity with the GuC when 
possible... but deplore the increase in validation needs which will only 
hurt users in the end.

Thanks for your honest answer,
Martin

> 
> --Jason

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 00/97] Basic GuC submission support in the i915
  2021-05-10 13:55     ` Martin Peres
@ 2021-05-10 16:25       ` Jason Ekstrand
  2021-05-11  8:01         ` Martin Peres
  2021-05-10 16:33       ` Daniel Vetter
  1 sibling, 1 reply; 249+ messages in thread
From: Jason Ekstrand @ 2021-05-10 16:25 UTC (permalink / raw)
  To: Martin Peres, Matthew Brost, intel-gfx, dri-devel
  Cc: tvrtko.ursulin, jason.ekstrand, daniele.ceraolospurio,
	jon.bloomfield, daniel.vetter, john.c.harrison

[-- Attachment #1: Type: text/plain, Size: 9531 bytes --]

On May 10, 2021 08:55:55 Martin Peres <martin.peres@free.fr> wrote:

> On 10/05/2021 02:11, Jason Ekstrand wrote:
>> On May 9, 2021 12:12:36 Martin Peres <martin.peres@free.fr> wrote:
>>
>>> Hi,
>>>
>>> On 06/05/2021 22:13, Matthew Brost wrote:
>>>> Basic GuC submission support. This is the first bullet point in the
>>>> upstreaming plan covered in the following RFC [1].
>>>>
>>>> At a very high level the GuC is a piece of firmware which sits between
>>>> the i915 and the GPU. It offloads some of the scheduling of contexts
>>>> from the i915 and programs the GPU to submit contexts. The i915
>>>> communicates with the GuC and the GuC communicates with the GPU.
>>>
>>> May I ask what will GuC command submission do that execlist won't/can't
>>> do? And what would be the impact on users? Even forgetting the troubled
>>> history of GuC (instability, performance regression, poor level of user
>>> support, 6+ years of trying to upstream it...), adding this much code
>>> and doubling the amount of validation needed should come with a
>>> rationale making it feel worth it... and I am not seeing here. Would you
>>> mind providing the rationale behind this work?
>>>
>>>>
>>>> GuC submission will be disabled by default on all current upstream
>>>> platforms behind a module parameter - enable_guc. A value of 3 will
>>>> enable submission and HuC loading via the GuC. GuC submission should
>>>> work on all gen11+ platforms assuming the GuC firmware is present.
>>>
>>> What is the plan here when it comes to keeping support for execlist? I
>>> am afraid that landing GuC support in Linux is the first step towards
>>> killing the execlist, which would force users to use proprietary
>>> firmwares that even most Intel engineers have little influence over.
>>> Indeed, if "drm/i915/guc: Disable semaphores when using GuC scheduling"
>>> which states "Disable semaphores when using GuC scheduling as semaphores
>>> are broken in the current GuC firmware." is anything to go by, it means
>>> that even Intel developers seem to prefer working around the GuC
>>> firmware, rather than fixing it.
>>
>> Yes, landing GuC support may be the first step in removing execlist
>> support. The inevitable reality is that GPU scheduling is coming and
>> likely to be there only path in the not-too-distant future. (See also
>> the ongoing thread with AMD about fences.) I'm not going to pass
>> judgement on whether or not this is a good thing.  I'm just reading the
>> winds and, in my view, this is where things are headed for good or ill.
>>
>> In answer to the question above, the answer to "what do we gain from
>> GuC?" may soon be, "you get to use your GPU."  We're not there yet and,
>> again, I'm not necessarily advocating for it, but that is likely where
>> things are headed.
>
> This will be a sad day, especially since it seems fundamentally opposed
> with any long-term support, on top of taking away user freedom to
> fix/tweak their system when Intel won't.
>
>> A firmware-based submission model isn't a bad design IMO and, aside from
>> the firmware freedom issues, I think there are actual advantages to the
>> model. Immediately, it'll unlock a few features like parallel submission
>> (more on that in a bit) and long-running compute because they're
>> implemented in GuC and the work to implement them properly in the
>> execlist scheduler is highly non-trivial. Longer term, it may (no
>> guarantees) unlock some performance by getting the kernel out of the way.
>
> Oh, I definitely agree with firmware-based submission model not being a
> bad design. I was even cheering for it in 2015. Experience with it made
> me regret that deeply since :s
>
> But with the DRM scheduler being responsible for most things, I fail to
> see what we could offload in the GuC except context switching (like
> every other manufacturer). The problem is, the GuC does way more than
> just switching registers in bulk, and if the number of revisions of the
> GuC is anything to go by, it is way too complex for me to feel
> comfortable with it.

It's more than just bulk register writes. When it comes to load-balancing 
multiple GPU users, firmware can theoretically preempt and switch faster, 
leading to more efficient time-slicing. All we really need the DRM 
scheduler for is handling implicit dma_fence dependencies between different 
applications.
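
To make that concrete, here is a minimal sketch of the dependency-collection 
side, assuming the generic DRM scheduler helpers found in later kernels; the 
function below and its parameters are illustrative assumptions, not code from 
this series:

/*
 * Illustrative sketch only -- not from this series.  Collect the implicit
 * dma_fence dependencies other clients left in each GEM object's
 * reservation object and attach them to the scheduler job.  Once they
 * have signalled, the backend's ->run_job() merely hands the request to
 * the firmware, which does the actual time-slicing/preemption between
 * runnable contexts.
 */
#include <drm/gpu_scheduler.h>
#include <drm/drm_gem.h>

static int add_implicit_deps(struct drm_sched_job *job,
			     struct drm_gem_object **objs,
			     const bool *writes, unsigned int count)
{
	unsigned int i;
	int ret;

	for (i = 0; i < count; i++) {
		/* Fences stored in objs[i]->resv by other clients. */
		ret = drm_sched_job_add_implicit_dependencies(job, objs[i],
							      writes[i]);
		if (ret)
			return ret;
	}

	return 0;
}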


>
>>> In the same vein, I have another concern related to the impact of GuC on
>>> Linux's stable releases. Let's say that in 3 years, a new application
>>> triggers a bug in command submission inside the firmware. Given that the
>>> Linux community cannot patch the GuC firmware, how likely is it that
>>> Intel would release a new GuC version? That would not be necessarily
>>> such a big problem if newer versions of the GuC could easily be
>>> backported to this potentially-decade-old Linux version, but given that
>>> the GuC seems to have ABI-breaking changes on a monthly cadence (we are
>>> at major version 60 *already*? :o), I would say that it is
>>> highly-unlikely that it would not require potentially-extensive changes
>>> to i915 to make it work, making the fix almost impossible to land in the
>>> stable tree... Do you have a plan to mitigate this problem?
>>>
>>> Patches like "drm/i915/guc: Disable bonding extension with GuC
>>> submission" also make me twitch, as this means the two command
>>> submission paths will not be functionally equivalent, and enabling GuC
>>> could thus introduce a user-visible regression (one app used to work,
>>> then stopped working). Could you add in the commit's message a proof
>>> that this would not end up being a user regression (in which case, why
>>> have this codepath to begin with?).
>>
>> I'd like to address this one specifically as it's become something of a
>> speciality of mine the past few weeks. The current bonded submission
>> model is bad. It provides a plethora of ways for a client to back itself
>> into a corner and doesn't actually provide the guarantees the media
>> driver needs for its real-time high-resolution decode. It's bad enough
>> we're seriously considering ripping it out, backwards compatibility or
>> not. The good news is that very little that your average desktop user
>> does depends on it: basically just real-time >4K video decode.
>>
>> The new parallel submit API is much better and should be the path
>> forward. (We should have landed parallel submit the first time around.)
>> It isn't full of corners and does let us provides actual parallel
>> execution guarantees. It also gives the scheduler the information it
>> needs to reliably provide those guarantees.
>> If we need to support the parallel submit API with the execlist
>> back-end, that's totally possible. The choice to only implement the
>> parallel submit API with GuC is a pragmatic one. We're trying to get
>> upstream back on it's feet and get all the various up-and-coming bits of
>> hardware enabled. Enabling the new API in the execlist back-end makes
>> that pipeline longer.
>
> I feel your pain, and wish you all the best to get GEM less complex
> and more manageable.
>
> So, if I understood correctly, the plan is just to regress 4K+ video
> decoding for people who do not enable GuC scheduling, or did not also
> update to a recent-enough media driver that would support this new
> interface? If it is indeed only for over 4K videos, then whatever. If it
> is 4K, it starts being a little bad, assuming graceful fallback to
> CPU-based decoding. What's the test plan for this patch then? The patch
> in its current form is definitely not making me confident.

My understanding is that it's only >4K that's affected; we've got enough 
bandwidth on a single VCS for 4K. I'm not sure where the exact cut-off is 
(it may be a little higher than 4K), but real-time 4K should be fine and 
real-time 8K requires parallel submit. So we're really not cutting off many 
use-cases. Also, as I said above, the new API can be implemented with the 
execlist scheduler if needed. We've just pragmatically deprioritized it.
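
For reference, here is a rough sketch of what the parallel-submit engine
setup looks like from userspace, based on the
i915_context_engines_parallel_submit extension as it later landed in the
uapi header; the field names and the two-VCS layout below are assumptions
for illustration relative to this RFC, and the wrapper struct relies on
the common GCC extension of embedding a struct whose last member is a
flexible array:

/*
 * Illustrative sketch, not authoritative for this RFC: configure engine
 * map slot 0 of a context so that two batches are submitted and executed
 * in parallel, one on each video (VCS) engine.  The wrapper struct simply
 * provides storage for the flexible engines[] array
 * (width * num_siblings entries).
 */
#include <drm/i915_drm.h>

static struct {
	struct i915_context_engines_parallel_submit p;
	struct i915_engine_class_instance engines[2];
} parallel_cfg = {
	.p = {
		.base.name     = I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT,
		.engine_index  = 0, /* slot in the context's engine map */
		.width         = 2, /* batches that run side by side */
		.num_siblings  = 1, /* physical engine choices per slot */
	},
	.engines = {
		{ .engine_class = I915_ENGINE_CLASS_VIDEO, .engine_instance = 0 },
		{ .engine_class = I915_ENGINE_CLASS_VIDEO, .engine_instance = 1 },
	},
};

This would then be chained into the context's I915_CONTEXT_PARAM_ENGINES
set_param at context-create time.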

--Jason


>
>>> Finally, could you explain why IGT tests need to be modified to work the
>>> GuC [1], and how much of the code in this series is covered by
>>> existing/upcoming tests? I would expect a very solid set of tests to
>>> minimize the maintenance burden, and enable users to reproduce potential
>>> issues found in this new codepath (too many users run with enable_guc=3,
>>> as can be seen on Google[2]).
>>
>> The IGT changes, as I understand them, are entirely around switching to
>> the new parallel submit API. There shouldn't be a major effect to most
>> users.
>
> Right, this part I followed, but failed to connect it to the GuC...
> because I couldn't see why it would be needed (execlist requiring a lot
> more work).
>
> I sincerely wish for the GuC to stay away from upstream because of the
> above concerns (which are yet to be addressed), but if Intel were to
> push forward with the plan to drop execlist, I can foresee a world of
> trouble for users... That is of course unless the GuC were to be open
> sourced, with people outside of Intel able to sign their own builds or
> run unsigned. Failing that, let's hope the last 6 years were just a bad
> start, and the rapid climb in major version of the GuC will magically
> stop! I hope execlists will remain at feature parity with the GuC when
> possible... but deplore the increase in validation needs which will only
> hurt users in the end.
>
> Thanks for your honest answer,
> Martin
>
>>
>> --Jason


[-- Attachment #2: Type: text/html, Size: 15688 bytes --]

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 00/97] Basic GuC submission support in the i915
  2021-05-10 13:55     ` Martin Peres
  2021-05-10 16:25       ` Jason Ekstrand
@ 2021-05-10 16:33       ` Daniel Vetter
  2021-05-10 18:30         ` [Intel-gfx] " Francisco Jerez
  2021-05-11  8:06         ` Martin Peres
  1 sibling, 2 replies; 249+ messages in thread
From: Daniel Vetter @ 2021-05-10 16:33 UTC (permalink / raw)
  To: Martin Peres
  Cc: Matthew Brost, Tvrtko Ursulin, intel-gfx, dri-devel,
	Jason Ekstrand, Ceraolo Spurio, Daniele, Bloomfield, Jon,
	Jason Ekstrand, Daniel Vetter, John Harrison

On Mon, May 10, 2021 at 3:55 PM Martin Peres <martin.peres@free.fr> wrote:
>
> On 10/05/2021 02:11, Jason Ekstrand wrote:
> > On May 9, 2021 12:12:36 Martin Peres <martin.peres@free.fr> wrote:
> >
> >> Hi,
> >>
> >> On 06/05/2021 22:13, Matthew Brost wrote:
> >>> Basic GuC submission support. This is the first bullet point in the
> >>> upstreaming plan covered in the following RFC [1].
> >>>
> >>> At a very high level the GuC is a piece of firmware which sits between
> >>> the i915 and the GPU. It offloads some of the scheduling of contexts
> >>> from the i915 and programs the GPU to submit contexts. The i915
> >>> communicates with the GuC and the GuC communicates with the GPU.
> >>
> >> May I ask what will GuC command submission do that execlist won't/can't
> >> do? And what would be the impact on users? Even forgetting the troubled
> >> history of GuC (instability, performance regression, poor level of user
> >> support, 6+ years of trying to upstream it...), adding this much code
> >> and doubling the amount of validation needed should come with a
> >> rationale making it feel worth it... and I am not seeing here. Would you
> >> mind providing the rationale behind this work?
> >>
> >>>
> >>> GuC submission will be disabled by default on all current upstream
> >>> platforms behind a module parameter - enable_guc. A value of 3 will
> >>> enable submission and HuC loading via the GuC. GuC submission should
> >>> work on all gen11+ platforms assuming the GuC firmware is present.
> >>
> >> What is the plan here when it comes to keeping support for execlist? I
> >> am afraid that landing GuC support in Linux is the first step towards
> >> killing the execlist, which would force users to use proprietary
> >> firmwares that even most Intel engineers have little influence over.
> >> Indeed, if "drm/i915/guc: Disable semaphores when using GuC scheduling"
> >> which states "Disable semaphores when using GuC scheduling as semaphores
> >> are broken in the current GuC firmware." is anything to go by, it means
> >> that even Intel developers seem to prefer working around the GuC
> >> firmware, rather than fixing it.
> >
> > Yes, landing GuC support may be the first step in removing execlist
> > support. The inevitable reality is that GPU scheduling is coming and
> > likely to be there only path in the not-too-distant future. (See also
> > the ongoing thread with AMD about fences.) I'm not going to pass
> > judgement on whether or not this is a good thing.  I'm just reading the
> > winds and, in my view, this is where things are headed for good or ill.
> >
> > In answer to the question above, the answer to "what do we gain from
> > GuC?" may soon be, "you get to use your GPU."  We're not there yet and,
> > again, I'm not necessarily advocating for it, but that is likely where
> > things are headed.
>
> This will be a sad day, especially since it seems fundamentally opposed
> with any long-term support, on top of taking away user freedom to
> fix/tweak their system when Intel won't.
>
> > A firmware-based submission model isn't a bad design IMO and, aside from
> > the firmware freedom issues, I think there are actual advantages to the
> > model. Immediately, it'll unlock a few features like parallel submission
> > (more on that in a bit) and long-running compute because they're
> > implemented in GuC and the work to implement them properly in the
> > execlist scheduler is highly non-trivial. Longer term, it may (no
> > guarantees) unlock some performance by getting the kernel out of the way.
>
> Oh, I definitely agree with firmware-based submission model not being a
> bad design. I was even cheering for it in 2015. Experience with it made
> me regret that deeply since :s
>
> But with the DRM scheduler being responsible for most things, I fail to
> see what we could offload in the GuC except context switching (like
> every other manufacturer). The problem is, the GuC does way more than
> just switching registers in bulk, and if the number of revisions of the
> GuC is anything to go by, it is way too complex for me to feel
> comfortable with it.

We need to flesh out that part of the plan more, but we're not going
to use drm scheduler for everything. It's only to handle the dma-fence
legacy side of things, which means:
- timeout handling for batches that take too long
- dma_fence dependency sorting/handling
- boosting of context from display flips (currently missing, needs to
be ported from drm/i915)

The actual round-robin/preempt/priority handling is still left to the
backend, in this case the fw. So there are large chunks of
code/functionality that drm/scheduler won't be involved in, and like
Jason says: the hw winds definitely blow in the direction of all of
this being handled in hw.
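
To make that boundary concrete, here is a purely illustrative sketch of a
backend's drm_sched_backend_ops, not code from this series: the callback
names are those of the generic DRM scheduler of roughly this era (they have
been renamed since), and the guc_* helpers are hypothetical placeholders.

#include <drm/gpu_scheduler.h>

/* Hypothetical backend helpers, named here only for illustration. */
struct dma_fence *guc_next_dependency(struct drm_sched_job *job);
struct dma_fence *guc_submit_job(struct drm_sched_job *job);
void guc_ban_context(struct drm_sched_job *job);
void guc_free_job(struct drm_sched_job *job);

static struct dma_fence *
backend_dependency(struct drm_sched_job *job, struct drm_sched_entity *entity)
{
	/* drm/scheduler's job: dma_fence dependency sorting/handling. */
	return guc_next_dependency(job);
}

static struct dma_fence *backend_run_job(struct drm_sched_job *job)
{
	/*
	 * All dependencies have signalled: hand the context to the fw.
	 * Round-robin/preempt/priority decisions happen there, not here.
	 */
	return guc_submit_job(job);
}

static enum drm_gpu_sched_stat backend_timedout_job(struct drm_sched_job *job)
{
	/* drm/scheduler's job: timeout handling for batches that run too long. */
	guc_ban_context(job);
	return DRM_GPU_SCHED_STAT_NOMINAL;
}

static const struct drm_sched_backend_ops backend_ops = {
	.dependency   = backend_dependency,
	.run_job      = backend_run_job,
	.timedout_job = backend_timedout_job,
	.free_job     = guc_free_job,
};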

> >> In the same vein, I have another concern related to the impact of GuC on
> >> Linux's stable releases. Let's say that in 3 years, a new application
> >> triggers a bug in command submission inside the firmware. Given that the
> >> Linux community cannot patch the GuC firmware, how likely is it that
> >> Intel would release a new GuC version? That would not be necessarily
> >> such a big problem if newer versions of the GuC could easily be
> >> backported to this potentially-decade-old Linux version, but given that
> >> the GuC seems to have ABI-breaking changes on a monthly cadence (we are
> >> at major version 60 *already*? :o), I would say that it is
> >> highly-unlikely that it would not require potentially-extensive changes
> >> to i915 to make it work, making the fix almost impossible to land in the
> >> stable tree... Do you have a plan to mitigate this problem?
> >>
> >> Patches like "drm/i915/guc: Disable bonding extension with GuC
> >> submission" also make me twitch, as this means the two command
> >> submission paths will not be functionally equivalent, and enabling GuC
> >> could thus introduce a user-visible regression (one app used to work,
> >> then stopped working). Could you add in the commit's message a proof
> >> that this would not end up being a user regression (in which case, why
> >> have this codepath to begin with?).
> >
> > I'd like to address this one specifically as it's become something of a
> > speciality of mine the past few weeks. The current bonded submission
> > model is bad. It provides a plethora of ways for a client to back itself
> > into a corner and doesn't actually provide the guarantees the media
> > driver needs for its real-time high-resolution decode. It's bad enough
> > we're seriously considering ripping it out, backwards compatibility or
> > not. The good news is that very little that your average desktop user
> > does depends on it: basically just real-time >4K video decode.
> >
> > The new parallel submit API is much better and should be the path
> > forward. (We should have landed parallel submit the first time around.)
> > It isn't full of corners and does let us provides actual parallel
> > execution guarantees. It also gives the scheduler the information it
> > needs to reliably provide those guarantees.
> > If we need to support the parallel submit API with the execlist
> > back-end, that's totally possible. The choice to only implement the
> > parallel submit API with GuC is a pragmatic one. We're trying to get
> > upstream back on it's feet and get all the various up-and-coming bits of
> > hardware enabled. Enabling the new API in the execlist back-end makes
> > that pipeline longer.
>
> I feel your pain, and wish you all the best to get GEM less complex
> and more manageable.
>
> So, if I understood correctly, the plan is just to regress 4K+ video
> decoding for people who do not enable GuC scheduling, or did not also
> update to a recent-enough media driver that would support this new
> interface? If it is indeed only for over 4K videos, then whatever. If it
> is 4K, it starts being a little bad, assuming graceful fallback to
> CPU-based decoding. What's the test plan for this patch then? The patch
> in its current form is definitely not making me confident.

Only if they don't scream loudly enough. If someone screams loudly
enough, we'll bite the bullet and enable the new interface on the
execlist backend.

Cheers, Daniel

> >> Finally, could you explain why IGT tests need to be modified to work the
> >> GuC [1], and how much of the code in this series is covered by
> >> existing/upcoming tests? I would expect a very solid set of tests to
> >> minimize the maintenance burden, and enable users to reproduce potential
> >> issues found in this new codepath (too many users run with enable_guc=3,
> >> as can be seen on Google[2]).
> >
> > The IGT changes, as I understand them, are entirely around switching to
> > the new parallel submit API. There shouldn't be a major effect to most
> > users.
>
> Right, this part I followed, but failed to connect it to the GuC...
> because I couldn't see why it would be needed (execlist requiring a lot
> more work).
>
> I sincerely wish for the GuC to stay away from upstream because of the
> above concerns (which are yet to be addressed), but if Intel were to
> push forward with the plan to drop execlist, I can foresee a world of
> trouble for users... That is of course unless the GuC were to be open
> sourced, with people outside of Intel able to sign their own builds or
> run unsigned. Failing that, let's hope the last 6 years were just a bad
> start, and the rapid climb in major version of the GuC will magically
> stop! I hope execlists will remain at feature parity with the GuC when
> possible... but deplore the increase in validation needs which will only
> hurt users in the end.
>
> Thanks for your honest answer,
> Martin
>
> >
> > --Jason



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 00/97] Basic GuC submission support in the i915
  2021-05-10 16:33       ` Daniel Vetter
@ 2021-05-10 18:30         ` Francisco Jerez
  2021-05-11  8:06         ` Martin Peres
  1 sibling, 0 replies; 249+ messages in thread
From: Francisco Jerez @ 2021-05-10 18:30 UTC (permalink / raw)
  To: Daniel Vetter, Martin Peres
  Cc: Jason Ekstrand, Daniel Vetter, intel-gfx, dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 11260 bytes --]

Daniel Vetter <daniel@ffwll.ch> writes:

> On Mon, May 10, 2021 at 3:55 PM Martin Peres <martin.peres@free.fr> wrote:
>>
>> On 10/05/2021 02:11, Jason Ekstrand wrote:
>> > On May 9, 2021 12:12:36 Martin Peres <martin.peres@free.fr> wrote:
>> >
>> >> Hi,
>> >>
>> >> On 06/05/2021 22:13, Matthew Brost wrote:
>> >>> Basic GuC submission support. This is the first bullet point in the
>> >>> upstreaming plan covered in the following RFC [1].
>> >>>
>> >>> At a very high level the GuC is a piece of firmware which sits between
>> >>> the i915 and the GPU. It offloads some of the scheduling of contexts
>> >>> from the i915 and programs the GPU to submit contexts. The i915
>> >>> communicates with the GuC and the GuC communicates with the GPU.
>> >>
>> >> May I ask what will GuC command submission do that execlist won't/can't
>> >> do? And what would be the impact on users? Even forgetting the troubled
>> >> history of GuC (instability, performance regression, poor level of user
>> >> support, 6+ years of trying to upstream it...), adding this much code
>> >> and doubling the amount of validation needed should come with a
>> >> rationale making it feel worth it... and I am not seeing here. Would you
>> >> mind providing the rationale behind this work?
>> >>
>> >>>
>> >>> GuC submission will be disabled by default on all current upstream
>> >>> platforms behind a module parameter - enable_guc. A value of 3 will
>> >>> enable submission and HuC loading via the GuC. GuC submission should
>> >>> work on all gen11+ platforms assuming the GuC firmware is present.
>> >>
>> >> What is the plan here when it comes to keeping support for execlist? I
>> >> am afraid that landing GuC support in Linux is the first step towards
>> >> killing the execlist, which would force users to use proprietary
>> >> firmwares that even most Intel engineers have little influence over.
>> >> Indeed, if "drm/i915/guc: Disable semaphores when using GuC scheduling"
>> >> which states "Disable semaphores when using GuC scheduling as semaphores
>> >> are broken in the current GuC firmware." is anything to go by, it means
>> >> that even Intel developers seem to prefer working around the GuC
>> >> firmware, rather than fixing it.
>> >
>> > Yes, landing GuC support may be the first step in removing execlist
>> > support. The inevitable reality is that GPU scheduling is coming and
>> > likely to be there only path in the not-too-distant future. (See also
>> > the ongoing thread with AMD about fences.) I'm not going to pass
>> > judgement on whether or not this is a good thing.  I'm just reading the
>> > winds and, in my view, this is where things are headed for good or ill.
>> >
>> > In answer to the question above, the answer to "what do we gain from
>> > GuC?" may soon be, "you get to use your GPU."  We're not there yet and,
>> > again, I'm not necessarily advocating for it, but that is likely where
>> > things are headed.
>>
>> This will be a sad day, especially since it seems fundamentally opposed
>> with any long-term support, on top of taking away user freedom to
>> fix/tweak their system when Intel won't.
>>
>> > A firmware-based submission model isn't a bad design IMO and, aside from
>> > the firmware freedom issues, I think there are actual advantages to the
>> > model. Immediately, it'll unlock a few features like parallel submission
>> > (more on that in a bit) and long-running compute because they're
>> > implemented in GuC and the work to implement them properly in the
>> > execlist scheduler is highly non-trivial. Longer term, it may (no
>> > guarantees) unlock some performance by getting the kernel out of the way.
>>
>> Oh, I definitely agree with firmware-based submission model not being a
>> bad design. I was even cheering for it in 2015. Experience with it made
>> me regret that deeply since :s
>>
>> But with the DRM scheduler being responsible for most things, I fail to
>> see what we could offload in the GuC except context switching (like
>> every other manufacturer). The problem is, the GuC does way more than
>> just switching registers in bulk, and if the number of revisions of the
>> GuC is anything to go by, it is way too complex for me to feel
>> comfortable with it.
>
> We need to flesh out that part of the plan more, but we're not going
> to use drm scheduler for everything. It's only to handle the dma-fence
> legacy side of things, which means:
> - timeout handling for batches that take too long
> - dma_fence dependency sorting/handling
> - boosting of context from display flips (currently missing, needs to
> be ported from drm/i915)
>
> The actual round-robin/preempt/priority handling is still left to the
> backend, in this case here the fw. So there's large chunks of
> code/functionality where drm/scheduler wont be involved in, and like
> Jason says: The hw direction winds definitely blow in the direction
> that this is all handled in hw.
>

I agree with Martin on this.  Given that using GuC currently involves
making your open-source graphics stack rely on a closed-source
cryptographically-protected blob in order to submit commands to the GPU,
and given that it is /still/ possible to use the GPU without it, I'd
expect some strong material justification for making the switch (like,
it improves performance of test-case X and Y by Z%, or, we're truly
sorry but we cannot program your GPU anymore with a purely open-source
software stack).  Any argument based on the apparent direction of the
wind doesn't sound like a material engineering reason to me, and runs
the risk of being self-fulfilling if it leads us to do the worse thing
for our users just because we have the vague feeling that it is the
general trend, even though we may have had the means to obtain a better
compromise for them.

>> >> In the same vein, I have another concern related to the impact of GuC on
>> >> Linux's stable releases. Let's say that in 3 years, a new application
>> >> triggers a bug in command submission inside the firmware. Given that the
>> >> Linux community cannot patch the GuC firmware, how likely is it that
>> >> Intel would release a new GuC version? That would not be necessarily
>> >> such a big problem if newer versions of the GuC could easily be
>> >> backported to this potentially-decade-old Linux version, but given that
>> >> the GuC seems to have ABI-breaking changes on a monthly cadence (we are
>> >> at major version 60 *already*? :o), I would say that it is
>> >> highly-unlikely that it would not require potentially-extensive changes
>> >> to i915 to make it work, making the fix almost impossible to land in the
>> >> stable tree... Do you have a plan to mitigate this problem?
>> >>
>> >> Patches like "drm/i915/guc: Disable bonding extension with GuC
>> >> submission" also make me twitch, as this means the two command
>> >> submission paths will not be functionally equivalent, and enabling GuC
>> >> could thus introduce a user-visible regression (one app used to work,
>> >> then stopped working). Could you add in the commit's message a proof
>> >> that this would not end up being a user regression (in which case, why
>> >> have this codepath to begin with?).
>> >
>> > I'd like to address this one specifically as it's become something of a
>> > speciality of mine the past few weeks. The current bonded submission
>> > model is bad. It provides a plethora of ways for a client to back itself
>> > into a corner and doesn't actually provide the guarantees the media
>> > driver needs for its real-time high-resolution decode. It's bad enough
>> > we're seriously considering ripping it out, backwards compatibility or
>> > not. The good news is that very little that your average desktop user
>> > does depends on it: basically just real-time >4K video decode.
>> >
>> > The new parallel submit API is much better and should be the path
>> > forward. (We should have landed parallel submit the first time around.)
>> > It isn't full of corners and does let us provides actual parallel
>> > execution guarantees. It also gives the scheduler the information it
>> > needs to reliably provide those guarantees.
>> > If we need to support the parallel submit API with the execlist
>> > back-end, that's totally possible. The choice to only implement the
>> > parallel submit API with GuC is a pragmatic one. We're trying to get
>> > upstream back on it's feet and get all the various up-and-coming bits of
>> > hardware enabled. Enabling the new API in the execlist back-end makes
>> > that pipeline longer.
>>
>> I feel your pain, and wish you all the best to get GEM less complex
>> and more manageable.
>>
>> So, if I understood correctly, the plan is just to regress 4K+ video
>> decoding for people who do not enable GuC scheduling, or did not also
>> update to a recent-enough media driver that would support this new
>> interface? If it is indeed only for over 4K videos, then whatever. If it
>> is 4K, it starts being a little bad, assuming graceful fallback to
>> CPU-based decoding. What's the test plan for this patch then? The patch
>> in its current form is definitely not making me confident.
>
> Only if they don't scream loudly enough. If someone screams loud
> enough we'll bite the bullet and enable the new interface on execlist
> backend.
>
> Cheers, Daniel
>
>> >> Finally, could you explain why IGT tests need to be modified to work the
>> >> GuC [1], and how much of the code in this series is covered by
>> >> existing/upcoming tests? I would expect a very solid set of tests to
>> >> minimize the maintenance burden, and enable users to reproduce potential
>> >> issues found in this new codepath (too many users run with enable_guc=3,
>> >> as can be seen on Google[2]).
>> >
>> > The IGT changes, as I understand them, are entirely around switching to
>> > the new parallel submit API. There shouldn't be a major effect to most
>> > users.
>>
>> Right, this part I followed, but failed to connect it to the GuC...
>> because I couldn't see why it would be needed (execlist requiring a lot
>> more work).
>>
>> I sincerely wish for the GuC to stay away from upstream because of the
>> above concerns (which are yet to be addressed), but if Intel were to
>> push forward with the plan to drop execlist, I can foresee a world of
>> trouble for users... That is of course unless the GuC were to be open
>> sourced, with people outside of Intel able to sign their own builds or
>> run unsigned. Failing that, let's hope the last 6 years were just a bad
>> start, and the rapid climb in major version of the GuC will magically
>> stop! I hope execlists will remain at feature parity with the GuC when
>> possible... but deplore the increase in validation needs which will only
>> hurt users in the end.
>>
>> Thanks for your honest answer,
>> Martin
>>
>> >
>> > --Jason
>
>
>
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 259 bytes --]

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 00/97] Basic GuC submission support in the i915
  2021-05-09 23:11   ` Jason Ekstrand
  2021-05-10 13:55     ` Martin Peres
@ 2021-05-11  2:58     ` Dixit, Ashutosh
  2021-05-11  7:47       ` Martin Peres
  1 sibling, 1 reply; 249+ messages in thread
From: Dixit, Ashutosh @ 2021-05-11  2:58 UTC (permalink / raw)
  To: Jason Ekstrand
  Cc: Matthew Brost, tvrtko.ursulin, intel-gfx, dri-devel,
	jason.ekstrand, daniele.ceraolospurio, jon.bloomfield,
	daniel.vetter, john.c.harrison

On Sun, 09 May 2021 16:11:43 -0700, Jason Ekstrand wrote:
>
> Yes, landing GuC support may be the first step in removing execlist
> support. The inevitable reality is that GPU scheduling is coming and
> likely to be there only path in the not-too-distant future.  (See also
> the ongoing thread with AMD about fences.) I'm not going to pass
> judgement on whether or not this is a good thing.  I'm just reading the
> winds and, in my view, this is where things are headed for good or ill.
>
> In answer to the question above, the answer to "what do we gain from
> GuC?" may soon be, "you get to use your GPU."  We're not there yet and,
> again, I'm not necessarily advocating for it, but that is likely where
> things are headed.
>
> A firmware-based submission model isn't a bad design IMO and, aside from
> the firmware freedom issues, I think there are actual advantages to the
> model. Immediately, it'll unlock a few features like parallel submission
> (more on that in a bit) and long-running compute because they're
> implemented in GuC and the work to implement them properly in the
> execlist scheduler is highly non-trivial.  Longer term, it may (no
> guarantees) unlock some performance by getting the kernel out of the way.

I believe another main reason for GuC is support for HW-based
virtualization like SRIOV. The only way to support SRIOV with execlists
would be to statically partition the GPU between VMs; any dynamic
partitioning needs something in HW.

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 00/97] Basic GuC submission support in the i915
  2021-05-11  2:58     ` Dixit, Ashutosh
@ 2021-05-11  7:47       ` Martin Peres
  0 siblings, 0 replies; 249+ messages in thread
From: Martin Peres @ 2021-05-11  7:47 UTC (permalink / raw)
  To: Dixit, Ashutosh, Jason Ekstrand
  Cc: Matthew Brost, tvrtko.ursulin, intel-gfx, dri-devel,
	jason.ekstrand, daniele.ceraolospurio, jon.bloomfield,
	daniel.vetter, john.c.harrison



On 11/05/2021 05:58, Dixit, Ashutosh wrote:
> On Sun, 09 May 2021 16:11:43 -0700, Jason Ekstrand wrote:
>>
>> Yes, landing GuC support may be the first step in removing execlist
>> support. The inevitable reality is that GPU scheduling is coming and
>> likely to be there only path in the not-too-distant future.  (See also
>> the ongoing thread with AMD about fences.) I'm not going to pass
>> judgement on whether or not this is a good thing.  I'm just reading the
>> winds and, in my view, this is where things are headed for good or ill.
>>
>> In answer to the question above, the answer to "what do we gain from
>> GuC?" may soon be, "you get to use your GPU."  We're not there yet and,
>> again, I'm not necessarily advocating for it, but that is likely where
>> things are headed.
>>
>> A firmware-based submission model isn't a bad design IMO and, aside from
>> the firmware freedom issues, I think there are actual advantages to the
>> model. Immediately, it'll unlock a few features like parallel submission
>> (more on that in a bit) and long-running compute because they're
>> implemented in GuC and the work to implement them properly in the
>> execlist scheduler is highly non-trivial.  Longer term, it may (no
>> guarantees) unlock some performance by getting the kernel out of the way.
> 
> I believe another main reason for GuC is support for HW based
> virtualization like SRIOV. The only way to support SRIOV with execlists
> would be to statically partition the GPU between VM's, any dynamic
> partitioning needs something in HW.
> 

FW-based command-submission is indeed a win for SRIOV and the current HW 
architecture.

Thanks for chiming in!
Martin

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 00/97] Basic GuC submission support in the i915
  2021-05-10 16:25       ` Jason Ekstrand
@ 2021-05-11  8:01         ` Martin Peres
  0 siblings, 0 replies; 249+ messages in thread
From: Martin Peres @ 2021-05-11  8:01 UTC (permalink / raw)
  To: Jason Ekstrand, Matthew Brost, intel-gfx, dri-devel
  Cc: tvrtko.ursulin, jason.ekstrand, daniele.ceraolospurio,
	jon.bloomfield, daniel.vetter, john.c.harrison

On 10/05/2021 19:25, Jason Ekstrand wrote:
> On May 10, 2021 08:55:55 Martin Peres <martin.peres@free.fr> wrote:
> 
>> On 10/05/2021 02:11, Jason Ekstrand wrote:
>>> On May 9, 2021 12:12:36 Martin Peres <martin.peres@free.fr> wrote:
>>>
>>>> Hi,
>>>>
>>>> On 06/05/2021 22:13, Matthew Brost wrote:
>>>>> Basic GuC submission support. This is the first bullet point in the
>>>>> upstreaming plan covered in the following RFC [1].
>>>>>
>>>>> At a very high level the GuC is a piece of firmware which sits between
>>>>> the i915 and the GPU. It offloads some of the scheduling of contexts
>>>>> from the i915 and programs the GPU to submit contexts. The i915
>>>>> communicates with the GuC and the GuC communicates with the GPU.
>>>>
>>>> May I ask what will GuC command submission do that execlist won't/can't
>>>> do? And what would be the impact on users? Even forgetting the troubled
>>>> history of GuC (instability, performance regression, poor level of user
>>>> support, 6+ years of trying to upstream it...), adding this much code
>>>> and doubling the amount of validation needed should come with a
>>>> rationale making it feel worth it... and I am not seeing here. Would you
>>>> mind providing the rationale behind this work?
>>>>
>>>>>
>>>>> GuC submission will be disabled by default on all current upstream
>>>>> platforms behind a module parameter - enable_guc. A value of 3 will
>>>>> enable submission and HuC loading via the GuC. GuC submission should
>>>>> work on all gen11+ platforms assuming the GuC firmware is present.
>>>>
>>>> What is the plan here when it comes to keeping support for execlist? I
>>>> am afraid that landing GuC support in Linux is the first step towards
>>>> killing the execlist, which would force users to use proprietary
>>>> firmwares that even most Intel engineers have little influence over.
>>>> Indeed, if "drm/i915/guc: Disable semaphores when using GuC scheduling"
>>>> which states "Disable semaphores when using GuC scheduling as semaphores
>>>> are broken in the current GuC firmware." is anything to go by, it means
>>>> that even Intel developers seem to prefer working around the GuC
>>>> firmware, rather than fixing it.
>>>
>>> Yes, landing GuC support may be the first step in removing execlist
>>> support. The inevitable reality is that GPU scheduling is coming and
>>> likely to be there only path in the not-too-distant future. (See also
>>> the ongoing thread with AMD about fences.) I'm not going to pass
>>> judgement on whether or not this is a good thing.  I'm just reading the
>>> winds and, in my view, this is where things are headed for good or ill.
>>>
>>> In answer to the question above, the answer to "what do we gain from
>>> GuC?" may soon be, "you get to use your GPU."  We're not there yet and,
>>> again, I'm not necessarily advocating for it, but that is likely where
>>> things are headed.
>>
>> This will be a sad day, especially since it seems fundamentally opposed
>> with any long-term support, on top of taking away user freedom to
>> fix/tweak their system when Intel won't.
>>
>>> A firmware-based submission model isn't a bad design IMO and, aside from
>>> the firmware freedom issues, I think there are actual advantages to the
>>> model. Immediately, it'll unlock a few features like parallel submission
>>> (more on that in a bit) and long-running compute because they're
>>> implemented in GuC and the work to implement them properly in the
>>> execlist scheduler is highly non-trivial. Longer term, it may (no
>>> guarantees) unlock some performance by getting the kernel out of the way.
>>
>> Oh, I definitely agree with firmware-based submission model not being a
>> bad design. I was even cheering for it in 2015. Experience with it made
>> me regret that deeply since :s
>>
>> But with the DRM scheduler being responsible for most things, I fail to
>> see what we could offload in the GuC except context switching (like
>> every other manufacturer). The problem is, the GuC does way more than
>> just switching registers in bulk, and if the number of revisions of the
>> GuC is anything to go by, it is way too complex for me to feel
>> comfortable with it.
> 
> It's more than just bulk register writes. When it comes to 
> load-balancing multiple GPU users, firmware can theoretically preempt 
> and switch faster leading to more efficient time-slicing. All we really 
> need the DRM scheduler for is handling implicit dma_fence dependencies 
> between different applications.

Right, this makes sense. However, if the GuC's interface were so simple, 
I doubt it would be at major version 60 already :s

I don't disagree with FW-based command submission, as it has a lot of 
benefits. I just don't like the route of going with a firmware that no 
one other than Intel can work on, *and* one that doesn't seem to concern 
itself with stable interfaces, leaving i915 to deal with every generation 
using a different interface (even assuming the firmware were bug-free).

> 
> 
>>>> In the same vein, I have another concern related to the impact of GuC on
>>>> Linux's stable releases. Let's say that in 3 years, a new application
>>>> triggers a bug in command submission inside the firmware. Given that the
>>>> Linux community cannot patch the GuC firmware, how likely is it that
>>>> Intel would release a new GuC version? That would not be necessarily
>>>> such a big problem if newer versions of the GuC could easily be
>>>> backported to this potentially-decade-old Linux version, but given that
>>>> the GuC seems to have ABI-breaking changes on a monthly cadence (we are
>>>> at major version 60 *already*? :o), I would say that it is
>>>> highly-unlikely that it would not require potentially-extensive changes
>>>> to i915 to make it work, making the fix almost impossible to land in the
>>>> stable tree... Do you have a plan to mitigate this problem?
>>>>
>>>> Patches like "drm/i915/guc: Disable bonding extension with GuC
>>>> submission" also make me twitch, as this means the two command
>>>> submission paths will not be functionally equivalent, and enabling GuC
>>>> could thus introduce a user-visible regression (one app used to work,
>>>> then stopped working). Could you add in the commit's message a proof
>>>> that this would not end up being a user regression (in which case, why
>>>> have this codepath to begin with?).
>>>
>>> I'd like to address this one specifically as it's become something of a
>>> speciality of mine the past few weeks. The current bonded submission
>>> model is bad. It provides a plethora of ways for a client to back itself
>>> into a corner and doesn't actually provide the guarantees the media
>>> driver needs for its real-time high-resolution decode. It's bad enough
>>> we're seriously considering ripping it out, backwards compatibility or
>>> not. The good news is that very little that your average desktop user
>>> does depends on it: basically just real-time >4K video decode.
>>>
>>> The new parallel submit API is much better and should be the path
>>> forward. (We should have landed parallel submit the first time around.)
>>> It isn't full of corners and does let us provides actual parallel
>>> execution guarantees. It also gives the scheduler the information it
>>> needs to reliably provide those guarantees.
>>> If we need to support the parallel submit API with the execlist
>>> back-end, that's totally possible. The choice to only implement the
>>> parallel submit API with GuC is a pragmatic one. We're trying to get
>>> upstream back on it's feet and get all the various up-and-coming bits of
>>> hardware enabled. Enabling the new API in the execlist back-end makes
>>> that pipeline longer.
>>
>> I feel your pain, and wish you all the best to get GEM less complex
>> and more manageable.
>>
>> So, if I understood correctly, the plan is just to regress 4K+ video
>> decoding for people who do not enable GuC scheduling, or did not also
>> update to a recent-enough media driver that would support this new
>> interface? If it is indeed only for over 4K videos, then whatever. If it
>> is 4K, it starts being a little bad, assuming graceful fallback to
>> CPU-based decoding. What's the test plan for this patch then? The patch
>> in its current form is definitely not making me confident.
> 
> My understanding is that it's only >4k that's affected; we've got enough 
> bandwidth on a single VCS for 4K. I'm not sure where the exact cut-off 
> is (it may be a little higher than 4k) but real-time 4k should be fine 
> and real-time 8k requires parallel submit. So we're really not cutting 
> off many use-cases. Also, as I said above, the new API can be 
> implemented with the execlist scheduler if needed. We've just 
> pragmatically deprioritized it.

Sounds like a niche enough use case to me; I feel no user would 
complain about it.

Martin

> 
> --Jason
> 
> 
>>>> Finally, could you explain why IGT tests need to be modified to work the
>>>> GuC [1], and how much of the code in this series is covered by
>>>> existing/upcoming tests? I would expect a very solid set of tests to
>>>> minimize the maintenance burden, and enable users to reproduce potential
>>>> issues found in this new codepath (too many users run with enable_guc=3,
>>>> as can be seen on Google[2]).
>>>
>>> The IGT changes, as I understand them, are entirely around switching to
>>> the new parallel submit API. There shouldn't be a major effect to most
>>> users.
>>
>> Right, this part I followed, but failed to connect it to the GuC...
>> because I couldn't see why it would be needed (execlist requiring a lot
>> more work).
>>
>> I sincerely wish for the GuC to stay away from upstream because of the
>> above concerns (which are yet to be addressed), but if Intel were to
>> push forward with the plan to drop execlist, I can foresee a world of
>> trouble for users... That is of course unless the GuC were to be open
>> sourced, with people outside of Intel able to sign their own builds or
>> run unsigned. Failing that, let's hope the last 6 years were just a bad
>> start, and the rapid climb in major version of the GuC will magically
>> stop! I hope execlists will remain at feature parity with the GuC when
>> possible... but deplore the increase in validation needs which will only
>> hurt users in the end.
>>
>> Thanks for your honest answer,
>> Martin
>>
>>>
>>> --Jason
> 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 00/97] Basic GuC submission support in the i915
  2021-05-10 16:33       ` Daniel Vetter
  2021-05-10 18:30         ` [Intel-gfx] " Francisco Jerez
@ 2021-05-11  8:06         ` Martin Peres
  2021-05-11 15:26           ` Bloomfield, Jon
  1 sibling, 1 reply; 249+ messages in thread
From: Martin Peres @ 2021-05-11  8:06 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Matthew Brost, Tvrtko Ursulin, intel-gfx, dri-devel,
	Jason Ekstrand, Ceraolo Spurio, Daniele, Bloomfield, Jon,
	Jason Ekstrand, Daniel Vetter, John Harrison

On 10/05/2021 19:33, Daniel Vetter wrote:
> On Mon, May 10, 2021 at 3:55 PM Martin Peres <martin.peres@free.fr> wrote:
>>
>> On 10/05/2021 02:11, Jason Ekstrand wrote:
>>> On May 9, 2021 12:12:36 Martin Peres <martin.peres@free.fr> wrote:
>>>
>>>> Hi,
>>>>
>>>> On 06/05/2021 22:13, Matthew Brost wrote:
>>>>> Basic GuC submission support. This is the first bullet point in the
>>>>> upstreaming plan covered in the following RFC [1].
>>>>>
>>>>> At a very high level the GuC is a piece of firmware which sits between
>>>>> the i915 and the GPU. It offloads some of the scheduling of contexts
>>>>> from the i915 and programs the GPU to submit contexts. The i915
>>>>> communicates with the GuC and the GuC communicates with the GPU.
>>>>
>>>> May I ask what will GuC command submission do that execlist won't/can't
>>>> do? And what would be the impact on users? Even forgetting the troubled
>>>> history of GuC (instability, performance regression, poor level of user
>>>> support, 6+ years of trying to upstream it...), adding this much code
>>>> and doubling the amount of validation needed should come with a
>>>> rationale making it feel worth it... and I am not seeing here. Would you
>>>> mind providing the rationale behind this work?
>>>>
>>>>>
>>>>> GuC submission will be disabled by default on all current upstream
>>>>> platforms behind a module parameter - enable_guc. A value of 3 will
>>>>> enable submission and HuC loading via the GuC. GuC submission should
>>>>> work on all gen11+ platforms assuming the GuC firmware is present.
>>>>
>>>> What is the plan here when it comes to keeping support for execlist? I
>>>> am afraid that landing GuC support in Linux is the first step towards
>>>> killing the execlist, which would force users to use proprietary
>>>> firmwares that even most Intel engineers have little influence over.
>>>> Indeed, if "drm/i915/guc: Disable semaphores when using GuC scheduling"
>>>> which states "Disable semaphores when using GuC scheduling as semaphores
>>>> are broken in the current GuC firmware." is anything to go by, it means
>>>> that even Intel developers seem to prefer working around the GuC
>>>> firmware, rather than fixing it.
>>>
>>> Yes, landing GuC support may be the first step in removing execlist
>>> support. The inevitable reality is that GPU scheduling is coming and
>>> likely to be there only path in the not-too-distant future. (See also
>>> the ongoing thread with AMD about fences.) I'm not going to pass
>>> judgement on whether or not this is a good thing.  I'm just reading the
>>> winds and, in my view, this is where things are headed for good or ill.
>>>
>>> In answer to the question above, the answer to "what do we gain from
>>> GuC?" may soon be, "you get to use your GPU."  We're not there yet and,
>>> again, I'm not necessarily advocating for it, but that is likely where
>>> things are headed.
>>
>> This will be a sad day, especially since it seems fundamentally opposed
>> with any long-term support, on top of taking away user freedom to
>> fix/tweak their system when Intel won't.
>>
>>> A firmware-based submission model isn't a bad design IMO and, aside from
>>> the firmware freedom issues, I think there are actual advantages to the
>>> model. Immediately, it'll unlock a few features like parallel submission
>>> (more on that in a bit) and long-running compute because they're
>>> implemented in GuC and the work to implement them properly in the
>>> execlist scheduler is highly non-trivial. Longer term, it may (no
>>> guarantees) unlock some performance by getting the kernel out of the way.
>>
>> Oh, I definitely agree with firmware-based submission model not being a
>> bad design. I was even cheering for it in 2015. Experience with it made
>> me regret that deeply since :s
>>
>> But with the DRM scheduler being responsible for most things, I fail to
>> see what we could offload in the GuC except context switching (like
>> every other manufacturer). The problem is, the GuC does way more than
>> just switching registers in bulk, and if the number of revisions of the
>> GuC is anything to go by, it is way too complex for me to feel
>> comfortable with it.
> 
> We need to flesh out that part of the plan more, but we're not going
> to use drm scheduler for everything. It's only to handle the dma-fence
> legacy side of things, which means:
> - timeout handling for batches that take too long
> - dma_fence dependency sorting/handling
> - boosting of context from display flips (currently missing, needs to
> be ported from drm/i915)
> 
> The actual round-robin/preempt/priority handling is still left to the
> backend, in this case here the fw. So there's large chunks of
> code/functionality where drm/scheduler wont be involved in, and like
> Jason says: The hw direction winds definitely blow in the direction
> that this is all handled in hw.

The plan makes sense for an SRIOV-enabled GPU, yes.

However, if the GuC is actually helping i915, then why not open source 
it and drop all the issues related to its stability? Wouldn't it be the 
perfect solution, as it would allow dropping execlist support for newer 
HW, and it would eliminate the concerns about maintenance of stable 
releases of Linux?

> 
>>>> In the same vein, I have another concern related to the impact of GuC on
>>>> Linux's stable releases. Let's say that in 3 years, a new application
>>>> triggers a bug in command submission inside the firmware. Given that the
>>>> Linux community cannot patch the GuC firmware, how likely is it that
>>>> Intel would release a new GuC version? That would not be necessarily
>>>> such a big problem if newer versions of the GuC could easily be
>>>> backported to this potentially-decade-old Linux version, but given that
>>>> the GuC seems to have ABI-breaking changes on a monthly cadence (we are
>>>> at major version 60 *already*? :o), I would say that it is
>>>> highly-unlikely that it would not require potentially-extensive changes
>>>> to i915 to make it work, making the fix almost impossible to land in the
>>>> stable tree... Do you have a plan to mitigate this problem?
>>>>
>>>> Patches like "drm/i915/guc: Disable bonding extension with GuC
>>>> submission" also make me twitch, as this means the two command
>>>> submission paths will not be functionally equivalent, and enabling GuC
>>>> could thus introduce a user-visible regression (one app used to work,
>>>> then stopped working). Could you add in the commit's message a proof
>>>> that this would not end up being a user regression (in which case, why
>>>> have this codepath to begin with?).
>>>
>>> I'd like to address this one specifically as it's become something of a
>>> speciality of mine the past few weeks. The current bonded submission
>>> model is bad. It provides a plethora of ways for a client to back itself
>>> into a corner and doesn't actually provide the guarantees the media
>>> driver needs for its real-time high-resolution decode. It's bad enough
>>> we're seriously considering ripping it out, backwards compatibility or
>>> not. The good news is that very little that your average desktop user
>>> does depends on it: basically just real-time >4K video decode.
>>>
>>> The new parallel submit API is much better and should be the path
>>> forward. (We should have landed parallel submit the first time around.)
>>> It isn't full of corners and does let us provides actual parallel
>>> execution guarantees. It also gives the scheduler the information it
>>> needs to reliably provide those guarantees.
>>> If we need to support the parallel submit API with the execlist
>>> back-end, that's totally possible. The choice to only implement the
>>> parallel submit API with GuC is a pragmatic one. We're trying to get
>>> upstream back on it's feet and get all the various up-and-coming bits of
>>> hardware enabled. Enabling the new API in the execlist back-end makes
>>> that pipeline longer.
>>
>> I feel your pain, and wish you all the best to get GEM less complex
>> and more manageable.
>>
>> So, if I understood correctly, the plan is just to regress 4K+ video
>> decoding for people who do not enable GuC scheduling, or did not also
>> update to a recent-enough media driver that would support this new
>> interface? If it is indeed only for over 4K videos, then whatever. If it
>> is 4K, it starts being a little bad, assuming graceful fallback to
>> CPU-based decoding. What's the test plan for this patch then? The patch
>> in its current form is definitely not making me confident.
> 
> Only if they don't scream loudly enough. If someone screams loud
> enough we'll bite the bullet and enable the new interface on execlist
> backend.

Ack.

Martin

> 
> Cheers, Daniel
> 
>>>> Finally, could you explain why IGT tests need to be modified to work the
>>>> GuC [1], and how much of the code in this series is covered by
>>>> existing/upcoming tests? I would expect a very solid set of tests to
>>>> minimize the maintenance burden, and enable users to reproduce potential
>>>> issues found in this new codepath (too many users run with enable_guc=3,
>>>> as can be seen on Google[2]).
>>>
>>> The IGT changes, as I understand them, are entirely around switching to
>>> the new parallel submit API. There shouldn't be a major effect to most
>>> users.
>>
>> Right, this part I followed, but failed to connect it to the GuC...
>> because I couldn't see why it would be needed (execlist requiring a lot
>> more work).
>>
>> I sincerely wish for the GuC to stay away from upstream because of the
>> above concerns (which are yet to be addressed), but if Intel were to
>> push forward with the plan to drop execlist, I can foresee a world of
>> trouble for users... That is of course unless the GuC were to be open
>> sourced, with people outside of Intel able to sign their own builds or
>> run unsigned. Failing that, let's hope the last 6 years were just a bad
>> start, and the rapid climb in major version of the GuC will magically
>> stop! I hope execlists will remain at feature parity with the GuC when
>> possible... but deplore the increase in validation needs which will only
>> hurt users in the end.
>>
>> Thanks for your honest answer,
>> Martin
>>
>>>
>>> --Jason
> 
> 
> 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* [drm/i915/guc]  07336fb545: WARNING:at_drivers/gpu/drm/i915/gt/uc/intel_uc.c:#__uc_sanitize[i915]
  2021-05-06 19:14 ` [RFC PATCH 66/97] drm/i915/guc: Add disable interrupts to guc sanitize Matthew Brost
@ 2021-05-11  8:16   ` kernel test robot
  0 siblings, 0 replies; 249+ messages in thread
From: kernel test robot @ 2021-05-11  8:16 UTC (permalink / raw)
  To: Matthew Brost
  Cc: matthew.brost, 0day robot, tvrtko.ursulin, intel-gfx, LKML,
	dri-devel, jason.ekstrand, lkp, daniele.ceraolospurio,
	jon.bloomfield, daniel.vetter, john.c.harrison

[-- Attachment #1: Type: text/plain, Size: 8278 bytes --]



Greetings,

FYI, we noticed the following commit (built with gcc-9):

commit: 07336fb545bfa9794d5b4146809355dffc93f0aa ("[RFC PATCH 66/97] drm/i915/guc: Add disable interrupts to guc sanitize")
url: https://github.com/0day-ci/linux/commits/Matthew-Brost/Basic-GuC-submission-support-in-the-i915/20210507-030308
base: git://anongit.freedesktop.org/drm/drm-tip drm-tip

in testcase: xfstests
version: xfstests-x86_64-73c0871-1_20210401
with following parameters:

	disk: 4HDD
	fs: xfs
	test: xfs-group-23
	ucode: 0x21

test-description: xfstests is a regression test suite for xfs and other file systems.
test-url: git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git


on test machine: 4 threads 1 sockets Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz with 8G memory

caused the changes below (please refer to the attached dmesg/kmsg for the entire log/backtrace):



If you fix the issue, kindly add the following tag
Reported-by: kernel test robot <oliver.sang@intel.com>


kern  :warn  : [   24.324557] ------------[ cut here ]------------
kern  :warn  : [   24.329257] GuC status: 0x0, MIA core expected to be in reset
kern  :warn  : [   24.335071] WARNING: CPU: 1 PID: 197 at drivers/gpu/drm/i915/gt/uc/intel_uc.c:60 __uc_sanitize+0xa1/0xc0 [i915]
kern  :warn  : [   24.346756] Modules linked in: intel_powerclamp coretemp acpi_cpufreq(-) kvm_intel i915(+) ahci kvm intel_gtt libahci drm_kms_helper irqbypass crct10dif_pclmul crc32_pclmul syscopyarea crc32c_intel mei_me sysfillrect wmi_bmof ghash_clmulni_intel rapl sysimgblt fb_sys_fops intel_cstate libata drm joydev mei intel_uncore wmi video ip_tables
kern  :warn  : [   24.378439] CPU: 1 PID: 197 Comm: systemd-udevd Not tainted 5.12.0-02605-g07336fb545bf #1
kern  :warn  : [   24.388071] Hardware name: Hewlett-Packard HP Pro 3340 MT/17A1, BIOS 8.07 01/24/2013
kern  :warn  : [   24.397285] RIP: 0010:__uc_sanitize+0xa1/0xc0 [i915]
kern  :warn  : [   24.402840] Code: 44 89 e0 5b 41 5c c3 89 c6 48 c7 c7 20 4c 9b c0 e8 b4 47 8f ff 44 89 e0 5b 41 5c c3 89 c6 48 c7 c7 40 4c 9b c0 e8 b5 62 31 c1 <0f> 0b eb d2 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 2e 0f 1f 84 00
kern  :warn  : [   24.421847] RSP: 0018:ffffc900005c7990 EFLAGS: 00010286
kern  :warn  : [   24.427101] RAX: 0000000000000000 RBX: ffff88821b3c6908 RCX: 0000000000000000
kern  :warn  : [   24.434288] RDX: ffff88821faa7960 RSI: ffff88821fa977f0 RDI: ffff88821fa977f0
kern  :warn  : [   24.441519] RBP: ffff88821b3d3450 R08: ffff88821fa977f0 R09: ffffc900005c77a8
kern  :warn  : [   24.450088] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
kern  :warn  : [   24.458674] R13: ffff88821b3c68f0 R14: 0000000000000001 R15: ffff88821b3c6908
kern  :warn  : [   24.467270] FS:  00007fc4a4b3ad40(0000) GS:ffff88821fa80000(0000) knlGS:0000000000000000
kern  :warn  : [   24.475429] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kern  :warn  : [   24.482569] CR2: 00007fc4a521202f CR3: 000000021f0fe006 CR4: 00000000001706e0
kern  :warn  : [   24.491126] Call Trace:
kern  :warn  : [   24.493780]  gt_sanitize+0x83/0x180 [i915]
kern  :warn  : [   24.498005]  intel_gt_resume+0x51/0x280 [i915]
kern  :warn  : [   24.503938]  ? intel_rps_init+0x301/0x760 [i915]
kern  :warn  : [   24.509981]  intel_gt_init+0x146/0x2e0 [i915]
kern  :warn  : [   24.515762]  i915_gem_init+0x129/0x1c0 [i915]
kern  :warn  : [   24.521535]  i915_driver_probe+0x2f3/0x6a0 [i915]
kern  :warn  : [   24.526959]  ? __cond_resched+0x19/0x40
kern  :warn  : [   24.530826]  ? mutex_lock+0x21/0x40
kern  :warn  : [   24.530831]  i915_pci_probe+0x54/0x140 [i915]
kern  :warn  : [   24.538745]  local_pci_probe+0x42/0x80
kern  :warn  : [   24.542524]  ? __cond_resched+0x19/0x40
kern  :warn  : [   24.546408]  pci_device_probe+0x107/0x1c0
kern  :warn  : [   24.551825]  really_probe+0xf0/0x400
kern  :warn  : [   24.556811]  driver_probe_device+0xe1/0x160
kern  :warn  : [   24.562399]  device_driver_attach+0x53/0x60
kern  :warn  : [   24.567991]  __driver_attach+0x8a/0x160
kern  :warn  : [   24.572815]  ? device_driver_attach+0x60/0x60
kern  :info  : [   24.573611] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
kern  :warn  : [   24.577243]  ? device_driver_attach+0x60/0x60
kern  :warn  : [   24.577246]  bus_for_each_dev+0x78/0xc0
kern  :info  : [   24.583520] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
kern  :warn  : [   24.587886]  bus_add_driver+0x14d/0x200
kern  :debug : [   24.598175] ata5.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
kern  :warn  : [   24.601823]  driver_register+0x6c/0xc0
kern  :info  : [   24.608849] ata5.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
kern  :warn  : [   24.612654]  ? 0xffffffffc0ad9000
kern  :warn  : [   24.612657]  i915_init+0x62/0x7c [i915]
kern  :info  : [   24.620637] ata5.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
kern  :warn  : [   24.623980]  do_one_initcall+0x44/0x200
kern  :debug : [   24.627872] ata1.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
kern  :warn  : [   24.636540]  ? __cond_resched+0x19/0x40
kern  :info  : [   24.640395] ata1.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
kern  :warn  : [   24.647416]  ? kmem_cache_alloc_trace+0x44/0x460
kern  :info  : [   24.651297] ata1.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
kern  :warn  : [   24.659262]  do_init_module+0x5c/0x240
kern  :info  : [   24.663933] ata5.00: ATA-10: INTEL SSDSC2KB240G8, XCV10100, max UDMA/133
kern  :warn  : [   24.672656]  load_module+0x1143/0x13a0
kern  :info  : [   24.676418] ata5.00: 468862128 sectors, multi 1: LBA48 NCQ (depth 32)
kern  :warn  : [   24.683177]  ? kernel_read_file+0x220/0x280
kern  :warn  : [   24.697667]  ? __do_sys_finit_module+0xb1/0x120
kern  :info  : [   24.697670] ata1.00: ATA-9: WDC WD20EZRX-00D8PB0, 80.00A80, max UDMA/133
kern  :warn  : [   24.702259]  __do_sys_finit_module+0xb1/0x120
kern  :info  : [   24.709017] ata1.00: 3907029168 sectors, multi 16: LBA48 NCQ (depth 32), AA
kern  :warn  : [   24.713423]  do_syscall_64+0x33/0x40
kern  :warn  : [   24.725471]  entry_SYSCALL_64_after_hwframe+0x44/0xae
kern  :debug : [   24.725490] ata5.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
kern  :warn  : [   24.731975] RIP: 0033:0x7fc4a5324f59
kern  :info  : [   24.740383] ata5.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
kern  :info  : [   24.740387] ata5.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
kern  :warn  : [   24.745426] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 07 6f 0c 00 f7 d8 64 89 01 48
kern  :debug : [   24.762803] ata1.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
kern  :warn  : [   24.781758] RSP: 002b:00007ffd7e4e4988 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
kern  :info  : [   24.788776] ata1.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
kern  :warn  : [   24.796438] RAX: ffffffffffffffda RBX: 0000559b41efcc10 RCX: 00007fc4a5324f59
kern  :warn  : [   24.796440] RDX: 0000000000000000 RSI: 00007fc4a5229cad RDI: 0000000000000015
kern  :warn  : [   24.796442] RBP: 00007fc4a5229cad R08: 0000000000000000 R09: 0000000000000000
kern  :info  : [   24.804419] ata1.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
kern  :warn  : [   24.811605] R10: 0000000000000015 R11: 0000000000000246 R12: 0000000000000000
kern  :warn  : [   24.811607] R13: 0000559b41ef4870 R14: 0000000000020000 R15: 0000559b41efcc10
kern  :info  : [   24.818795] ata5.00: configured for UDMA/133
kern  :warn  : [   24.825989] ---[ end trace d71d841a02d7284b ]---
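
For context on where this warning comes from: per the call trace above, __uc_sanitize() is reached
from gt_sanitize() during driver probe, and the WARN at intel_uc.c:60 fires when the GuC status
register does not report the MIA core as being in reset ("GuC status: 0x0" in the message above).
A minimal sketch of that kind of check follows. GUC_STATUS, GS_MIA_IN_RESET, intel_uncore_read()
and intel_reset_guc() are existing i915 names, but the surrounding helper is paraphrased from the
log rather than copied from the tree, and the usual i915 gt/uc headers are assumed.

	/* Sketch only: reset the GuC, then verify the MIA core reports reset. */
	static int reset_guc_and_verify(struct intel_gt *gt)
	{
		u32 guc_status;
		int ret;

		ret = intel_reset_guc(gt);	/* ask the hardware to reset the GuC */
		if (ret)
			return ret;

		guc_status = intel_uncore_read(gt->uncore, GUC_STATUS);
		WARN(!(guc_status & GS_MIA_IN_RESET),
		     "GuC status: 0x%x, MIA core expected to be in reset\n",
		     guc_status);

		return 0;
	}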



To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install                job.yaml  # job file is attached in this email
        bin/lkp split-job --compatible job.yaml  # generate the yaml file for lkp run
        bin/lkp run                    generated-yaml-file



---
0DAY/LKP+ Test Infrastructure                   Open Source Technology Center
https://lists.01.org/hyperkitty/list/lkp@lists.01.org       Intel Corporation

Thanks,
Oliver Sang


[-- Attachment #2: config-5.12.0-02605-g07336fb545bf --]
[-- Type: text/plain, Size: 173005 bytes --]

#
# Automatically generated file; DO NOT EDIT.
# Linux/x86_64 5.12.0 Kernel Configuration
#
CONFIG_CC_VERSION_TEXT="gcc-9 (Debian 9.3.0-22) 9.3.0"
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=90300
CONFIG_CLANG_VERSION=0
CONFIG_LD_IS_BFD=y
CONFIG_LD_VERSION=23502
CONFIG_LLD_VERSION=0
CONFIG_CC_CAN_LINK=y
CONFIG_CC_CAN_LINK_STATIC=y
CONFIG_CC_HAS_ASM_GOTO=y
CONFIG_CC_HAS_ASM_INLINE=y
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_TABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_BUILD_SALT=""
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
CONFIG_HAVE_KERNEL_ZSTD=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
# CONFIG_KERNEL_ZSTD is not set
CONFIG_DEFAULT_INIT=""
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
# CONFIG_WATCH_QUEUE is not set
CONFIG_CROSS_MEMORY_ATTACH=y
# CONFIG_USELIB is not set
CONFIG_AUDIT=y
CONFIG_HAVE_ARCH_AUDITSYSCALL=y
CONFIG_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_GENERIC_IRQ_MIGRATION=y
CONFIG_GENERIC_IRQ_INJECTION=y
CONFIG_HARDIRQS_SW_RESEND=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_GENERIC_MSI_IRQ_DOMAIN=y
CONFIG_IRQ_MSI_IOMMU=y
CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR=y
CONFIG_GENERIC_IRQ_RESERVATION_MODE=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
# CONFIG_GENERIC_IRQ_DEBUGFS is not set
# end of IRQ subsystem

CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_INIT=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_HAVE_POSIX_CPU_TIMERS_TASK_WORK=y
CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
# CONFIG_NO_HZ_IDLE is not set
CONFIG_NO_HZ_FULL=y
CONFIG_CONTEXT_TRACKING=y
# CONFIG_CONTEXT_TRACKING_FORCE is not set
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
# end of Timers subsystem

# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_COUNT=y

#
# CPU/Task time and stats accounting
#
CONFIG_VIRT_CPU_ACCOUNTING=y
CONFIG_VIRT_CPU_ACCOUNTING_GEN=y
CONFIG_IRQ_TIME_ACCOUNTING=y
CONFIG_HAVE_SCHED_AVG_IRQ=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_TASK_XACCT=y
CONFIG_TASK_IO_ACCOUNTING=y
# CONFIG_PSI is not set
# end of CPU/Task time and stats accounting

CONFIG_CPU_ISOLATION=y

#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TREE_SRCU=y
CONFIG_TASKS_RCU_GENERIC=y
CONFIG_TASKS_RCU=y
CONFIG_TASKS_RUDE_RCU=y
CONFIG_TASKS_TRACE_RCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_NEED_SEGCBLIST=y
CONFIG_RCU_NOCB_CPU=y
# end of RCU Subsystem

CONFIG_BUILD_BIN2C=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
# CONFIG_IKHEADERS is not set
CONFIG_LOG_BUF_SHIFT=20
CONFIG_LOG_CPU_MAX_BUF_SHIFT=12
CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y

#
# Scheduler features
#
# CONFIG_UCLAMP_TASK is not set
# end of Scheduler features

CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH=y
CONFIG_CC_HAS_INT128=y
CONFIG_ARCH_SUPPORTS_INT128=y
CONFIG_NUMA_BALANCING=y
CONFIG_NUMA_BALANCING_DEFAULT_ENABLED=y
CONFIG_CGROUPS=y
CONFIG_PAGE_COUNTER=y
CONFIG_MEMCG=y
CONFIG_MEMCG_SWAP=y
CONFIG_MEMCG_KMEM=y
CONFIG_BLK_CGROUP=y
CONFIG_CGROUP_WRITEBACK=y
CONFIG_CGROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_CFS_BANDWIDTH=y
CONFIG_RT_GROUP_SCHED=y
CONFIG_CGROUP_PIDS=y
CONFIG_CGROUP_RDMA=y
CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_HUGETLB=y
CONFIG_CPUSETS=y
CONFIG_PROC_PID_CPUSET=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_CGROUP_PERF=y
CONFIG_CGROUP_BPF=y
# CONFIG_CGROUP_DEBUG is not set
CONFIG_SOCK_CGROUP_DATA=y
CONFIG_NAMESPACES=y
CONFIG_UTS_NS=y
CONFIG_TIME_NS=y
CONFIG_IPC_NS=y
CONFIG_USER_NS=y
CONFIG_PID_NS=y
CONFIG_NET_NS=y
# CONFIG_CHECKPOINT_RESTORE is not set
CONFIG_SCHED_AUTOGROUP=y
# CONFIG_SYSFS_DEPRECATED is not set
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_RD_GZIP=y
CONFIG_RD_BZIP2=y
CONFIG_RD_LZMA=y
CONFIG_RD_XZ=y
CONFIG_RD_LZO=y
CONFIG_RD_LZ4=y
CONFIG_RD_ZSTD=y
# CONFIG_BOOT_CONFIG is not set
CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_LD_ORPHAN_WARN=y
CONFIG_SYSCTL=y
CONFIG_HAVE_UID16=y
CONFIG_SYSCTL_EXCEPTION_TRACE=y
CONFIG_HAVE_PCSPKR_PLATFORM=y
CONFIG_BPF=y
# CONFIG_EXPERT is not set
CONFIG_UID16=y
CONFIG_MULTIUSER=y
CONFIG_SGETMASK_SYSCALL=y
CONFIG_SYSFS_SYSCALL=y
CONFIG_FHANDLE=y
CONFIG_POSIX_TIMERS=y
CONFIG_PRINTK=y
CONFIG_PRINTK_NMI=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_PCSPKR_PLATFORM=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_FUTEX_PI=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_AIO=y
CONFIG_IO_URING=y
CONFIG_ADVISE_SYSCALLS=y
CONFIG_HAVE_ARCH_USERFAULTFD_WP=y
CONFIG_MEMBARRIER=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_KALLSYMS_ABSOLUTE_PERCPU=y
CONFIG_KALLSYMS_BASE_RELATIVE=y
# CONFIG_BPF_LSM is not set
CONFIG_BPF_SYSCALL=y
CONFIG_ARCH_WANT_DEFAULT_BPF_JIT=y
CONFIG_BPF_JIT_ALWAYS_ON=y
CONFIG_BPF_JIT_DEFAULT_ON=y
# CONFIG_BPF_PRELOAD is not set
CONFIG_USERFAULTFD=y
CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE=y
CONFIG_KCMP=y
CONFIG_RSEQ=y
# CONFIG_EMBEDDED is not set
CONFIG_HAVE_PERF_EVENTS=y

#
# Kernel Performance Events And Counters
#
CONFIG_PERF_EVENTS=y
# CONFIG_DEBUG_PERF_USE_VMALLOC is not set
# end of Kernel Performance Events And Counters

CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLUB_DEBUG=y
# CONFIG_COMPAT_BRK is not set
# CONFIG_SLAB is not set
CONFIG_SLUB=y
CONFIG_SLAB_MERGE_DEFAULT=y
CONFIG_SLAB_FREELIST_RANDOM=y
# CONFIG_SLAB_FREELIST_HARDENED is not set
CONFIG_SHUFFLE_PAGE_ALLOCATOR=y
CONFIG_SLUB_CPU_PARTIAL=y
CONFIG_SYSTEM_DATA_VERIFICATION=y
CONFIG_PROFILING=y
CONFIG_TRACEPOINTS=y
# end of General setup

CONFIG_64BIT=y
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf64-x86-64"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_MMU=y
CONFIG_ARCH_MMAP_RND_BITS_MIN=28
CONFIG_ARCH_MMAP_RND_BITS_MAX=32
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_ARCH_HAS_FILTER_PGPROT=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ZONE_DMA32=y
CONFIG_AUDIT_ARCH=y
CONFIG_HAVE_INTEL_TXT=y
CONFIG_X86_64_SMP=y
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_DYNAMIC_PHYSICAL_MASK=y
CONFIG_PGTABLE_LEVELS=5
CONFIG_CC_HAS_SANE_STACKPROTECTOR=y

#
# Processor type and features
#
CONFIG_ZONE_DMA=y
CONFIG_SMP=y
CONFIG_X86_FEATURE_NAMES=y
CONFIG_X86_X2APIC=y
CONFIG_X86_MPPARSE=y
# CONFIG_GOLDFISH is not set
CONFIG_RETPOLINE=y
CONFIG_X86_CPU_RESCTRL=y
CONFIG_X86_EXTENDED_PLATFORM=y
# CONFIG_X86_NUMACHIP is not set
# CONFIG_X86_VSMP is not set
CONFIG_X86_UV=y
# CONFIG_X86_GOLDFISH is not set
# CONFIG_X86_INTEL_MID is not set
CONFIG_X86_INTEL_LPSS=y
CONFIG_X86_AMD_PLATFORM_DEVICE=y
CONFIG_IOSF_MBI=y
# CONFIG_IOSF_MBI_DEBUG is not set
CONFIG_X86_SUPPORTS_MEMORY_FAILURE=y
# CONFIG_SCHED_OMIT_FRAME_POINTER is not set
CONFIG_HYPERVISOR_GUEST=y
CONFIG_PARAVIRT=y
# CONFIG_PARAVIRT_DEBUG is not set
CONFIG_PARAVIRT_SPINLOCKS=y
CONFIG_X86_HV_CALLBACK_VECTOR=y
CONFIG_XEN=y
# CONFIG_XEN_PV is not set
CONFIG_XEN_PVHVM=y
CONFIG_XEN_PVHVM_SMP=y
CONFIG_XEN_PVHVM_GUEST=y
CONFIG_XEN_SAVE_RESTORE=y
# CONFIG_XEN_DEBUG_FS is not set
# CONFIG_XEN_PVH is not set
CONFIG_KVM_GUEST=y
CONFIG_ARCH_CPUIDLE_HALTPOLL=y
# CONFIG_PVH is not set
CONFIG_PARAVIRT_TIME_ACCOUNTING=y
CONFIG_PARAVIRT_CLOCK=y
# CONFIG_JAILHOUSE_GUEST is not set
# CONFIG_ACRN_GUEST is not set
# CONFIG_MK8 is not set
# CONFIG_MPSC is not set
# CONFIG_MCORE2 is not set
# CONFIG_MATOM is not set
CONFIG_GENERIC_CPU=y
CONFIG_X86_INTERNODE_CACHE_SHIFT=6
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_X86_TSC=y
CONFIG_X86_CMPXCHG64=y
CONFIG_X86_CMOV=y
CONFIG_X86_MINIMUM_CPU_FAMILY=64
CONFIG_X86_DEBUGCTLMSR=y
CONFIG_IA32_FEAT_CTL=y
CONFIG_X86_VMX_FEATURE_NAMES=y
CONFIG_CPU_SUP_INTEL=y
CONFIG_CPU_SUP_AMD=y
CONFIG_CPU_SUP_HYGON=y
CONFIG_CPU_SUP_CENTAUR=y
CONFIG_CPU_SUP_ZHAOXIN=y
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
CONFIG_DMI=y
# CONFIG_GART_IOMMU is not set
CONFIG_MAXSMP=y
CONFIG_NR_CPUS_RANGE_BEGIN=8192
CONFIG_NR_CPUS_RANGE_END=8192
CONFIG_NR_CPUS_DEFAULT=8192
CONFIG_NR_CPUS=8192
CONFIG_SCHED_SMT=y
CONFIG_SCHED_MC=y
CONFIG_SCHED_MC_PRIO=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_REROUTE_FOR_BROKEN_BOOT_IRQS=y
CONFIG_X86_MCE=y
CONFIG_X86_MCELOG_LEGACY=y
CONFIG_X86_MCE_INTEL=y
CONFIG_X86_MCE_AMD=y
CONFIG_X86_MCE_THRESHOLD=y
CONFIG_X86_MCE_INJECT=m

#
# Performance monitoring
#
CONFIG_PERF_EVENTS_INTEL_UNCORE=m
CONFIG_PERF_EVENTS_INTEL_RAPL=m
CONFIG_PERF_EVENTS_INTEL_CSTATE=m
CONFIG_PERF_EVENTS_AMD_POWER=m
# end of Performance monitoring

CONFIG_X86_16BIT=y
CONFIG_X86_ESPFIX64=y
CONFIG_X86_VSYSCALL_EMULATION=y
CONFIG_X86_IOPL_IOPERM=y
CONFIG_I8K=m
CONFIG_MICROCODE=y
CONFIG_MICROCODE_INTEL=y
CONFIG_MICROCODE_AMD=y
CONFIG_MICROCODE_OLD_INTERFACE=y
CONFIG_X86_MSR=y
CONFIG_X86_CPUID=y
CONFIG_X86_5LEVEL=y
CONFIG_X86_DIRECT_GBPAGES=y
# CONFIG_X86_CPA_STATISTICS is not set
CONFIG_AMD_MEM_ENCRYPT=y
# CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT is not set
CONFIG_NUMA=y
CONFIG_AMD_NUMA=y
CONFIG_X86_64_ACPI_NUMA=y
CONFIG_NUMA_EMU=y
CONFIG_NODES_SHIFT=10
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_SPARSEMEM_DEFAULT=y
CONFIG_ARCH_SELECT_MEMORY_MODEL=y
# CONFIG_ARCH_MEMORY_PROBE is not set
CONFIG_ARCH_PROC_KCORE_TEXT=y
CONFIG_ILLEGAL_POINTER_VALUE=0xdead000000000000
CONFIG_X86_PMEM_LEGACY_DEVICE=y
CONFIG_X86_PMEM_LEGACY=m
CONFIG_X86_CHECK_BIOS_CORRUPTION=y
# CONFIG_X86_BOOTPARAM_MEMORY_CORRUPTION_CHECK is not set
CONFIG_X86_RESERVE_LOW=64
CONFIG_MTRR=y
CONFIG_MTRR_SANITIZER=y
CONFIG_MTRR_SANITIZER_ENABLE_DEFAULT=1
CONFIG_MTRR_SANITIZER_SPARE_REG_NR_DEFAULT=1
CONFIG_X86_PAT=y
CONFIG_ARCH_USES_PG_UNCACHED=y
CONFIG_ARCH_RANDOM=y
CONFIG_X86_SMAP=y
CONFIG_X86_UMIP=y
CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS=y
CONFIG_X86_INTEL_TSX_MODE_OFF=y
# CONFIG_X86_INTEL_TSX_MODE_ON is not set
# CONFIG_X86_INTEL_TSX_MODE_AUTO is not set
# CONFIG_X86_SGX is not set
CONFIG_EFI=y
CONFIG_EFI_STUB=y
CONFIG_EFI_MIXED=y
# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
# CONFIG_HZ_300 is not set
CONFIG_HZ_1000=y
CONFIG_HZ=1000
CONFIG_SCHED_HRTICK=y
CONFIG_KEXEC=y
CONFIG_KEXEC_FILE=y
CONFIG_ARCH_HAS_KEXEC_PURGATORY=y
# CONFIG_KEXEC_SIG is not set
CONFIG_CRASH_DUMP=y
CONFIG_KEXEC_JUMP=y
CONFIG_PHYSICAL_START=0x1000000
CONFIG_RELOCATABLE=y
CONFIG_RANDOMIZE_BASE=y
CONFIG_X86_NEED_RELOCS=y
CONFIG_PHYSICAL_ALIGN=0x200000
CONFIG_DYNAMIC_MEMORY_LAYOUT=y
CONFIG_RANDOMIZE_MEMORY=y
CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING=0xa
CONFIG_HOTPLUG_CPU=y
CONFIG_BOOTPARAM_HOTPLUG_CPU0=y
# CONFIG_DEBUG_HOTPLUG_CPU0 is not set
# CONFIG_COMPAT_VDSO is not set
CONFIG_LEGACY_VSYSCALL_EMULATE=y
# CONFIG_LEGACY_VSYSCALL_XONLY is not set
# CONFIG_LEGACY_VSYSCALL_NONE is not set
# CONFIG_CMDLINE_BOOL is not set
CONFIG_MODIFY_LDT_SYSCALL=y
CONFIG_HAVE_LIVEPATCH=y
CONFIG_LIVEPATCH=y
# end of Processor type and features

CONFIG_ARCH_HAS_ADD_PAGES=y
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE=y
CONFIG_USE_PERCPU_NUMA_NODE_ID=y
CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK=y
CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION=y
CONFIG_ARCH_ENABLE_THP_MIGRATION=y

#
# Power management and ACPI options
#
CONFIG_ARCH_HIBERNATION_HEADER=y
CONFIG_SUSPEND=y
CONFIG_SUSPEND_FREEZER=y
CONFIG_HIBERNATE_CALLBACKS=y
CONFIG_HIBERNATION=y
CONFIG_HIBERNATION_SNAPSHOT_DEV=y
CONFIG_PM_STD_PARTITION=""
CONFIG_PM_SLEEP=y
CONFIG_PM_SLEEP_SMP=y
# CONFIG_PM_AUTOSLEEP is not set
# CONFIG_PM_WAKELOCKS is not set
CONFIG_PM=y
CONFIG_PM_DEBUG=y
# CONFIG_PM_ADVANCED_DEBUG is not set
# CONFIG_PM_TEST_SUSPEND is not set
CONFIG_PM_SLEEP_DEBUG=y
# CONFIG_PM_TRACE_RTC is not set
CONFIG_PM_CLK=y
# CONFIG_WQ_POWER_EFFICIENT_DEFAULT is not set
# CONFIG_ENERGY_MODEL is not set
CONFIG_ARCH_SUPPORTS_ACPI=y
CONFIG_ACPI=y
CONFIG_ACPI_LEGACY_TABLES_LOOKUP=y
CONFIG_ARCH_MIGHT_HAVE_ACPI_PDC=y
CONFIG_ACPI_SYSTEM_POWER_STATES_SUPPORT=y
# CONFIG_ACPI_DEBUGGER is not set
CONFIG_ACPI_SPCR_TABLE=y
# CONFIG_ACPI_FPDT is not set
CONFIG_ACPI_LPIT=y
CONFIG_ACPI_SLEEP=y
CONFIG_ACPI_REV_OVERRIDE_POSSIBLE=y
CONFIG_ACPI_EC_DEBUGFS=m
CONFIG_ACPI_AC=y
CONFIG_ACPI_BATTERY=y
CONFIG_ACPI_BUTTON=y
CONFIG_ACPI_VIDEO=m
CONFIG_ACPI_FAN=y
CONFIG_ACPI_TAD=m
CONFIG_ACPI_DOCK=y
CONFIG_ACPI_CPU_FREQ_PSS=y
CONFIG_ACPI_PROCESSOR_CSTATE=y
CONFIG_ACPI_PROCESSOR_IDLE=y
CONFIG_ACPI_CPPC_LIB=y
CONFIG_ACPI_PROCESSOR=y
CONFIG_ACPI_IPMI=m
CONFIG_ACPI_HOTPLUG_CPU=y
CONFIG_ACPI_PROCESSOR_AGGREGATOR=m
CONFIG_ACPI_THERMAL=y
CONFIG_ACPI_PLATFORM_PROFILE=m
CONFIG_ARCH_HAS_ACPI_TABLE_UPGRADE=y
CONFIG_ACPI_TABLE_UPGRADE=y
# CONFIG_ACPI_DEBUG is not set
CONFIG_ACPI_PCI_SLOT=y
CONFIG_ACPI_CONTAINER=y
CONFIG_ACPI_HOTPLUG_MEMORY=y
CONFIG_ACPI_HOTPLUG_IOAPIC=y
CONFIG_ACPI_SBS=m
CONFIG_ACPI_HED=y
# CONFIG_ACPI_CUSTOM_METHOD is not set
CONFIG_ACPI_BGRT=y
CONFIG_ACPI_NFIT=m
# CONFIG_NFIT_SECURITY_DEBUG is not set
CONFIG_ACPI_NUMA=y
# CONFIG_ACPI_HMAT is not set
CONFIG_HAVE_ACPI_APEI=y
CONFIG_HAVE_ACPI_APEI_NMI=y
CONFIG_ACPI_APEI=y
CONFIG_ACPI_APEI_GHES=y
CONFIG_ACPI_APEI_PCIEAER=y
CONFIG_ACPI_APEI_MEMORY_FAILURE=y
CONFIG_ACPI_APEI_EINJ=m
CONFIG_ACPI_APEI_ERST_DEBUG=y
# CONFIG_ACPI_DPTF is not set
CONFIG_ACPI_WATCHDOG=y
CONFIG_ACPI_EXTLOG=m
CONFIG_ACPI_ADXL=y
# CONFIG_ACPI_CONFIGFS is not set
CONFIG_PMIC_OPREGION=y
CONFIG_X86_PM_TIMER=y

#
# CPU Frequency scaling
#
CONFIG_CPU_FREQ=y
CONFIG_CPU_FREQ_GOV_ATTR_SET=y
CONFIG_CPU_FREQ_GOV_COMMON=y
CONFIG_CPU_FREQ_STAT=y
CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_SCHEDUTIL is not set
CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
CONFIG_CPU_FREQ_GOV_POWERSAVE=y
CONFIG_CPU_FREQ_GOV_USERSPACE=y
CONFIG_CPU_FREQ_GOV_ONDEMAND=y
CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y
CONFIG_CPU_FREQ_GOV_SCHEDUTIL=y

#
# CPU frequency scaling drivers
#
CONFIG_X86_INTEL_PSTATE=y
# CONFIG_X86_PCC_CPUFREQ is not set
CONFIG_X86_ACPI_CPUFREQ=m
CONFIG_X86_ACPI_CPUFREQ_CPB=y
CONFIG_X86_POWERNOW_K8=m
CONFIG_X86_AMD_FREQ_SENSITIVITY=m
# CONFIG_X86_SPEEDSTEP_CENTRINO is not set
CONFIG_X86_P4_CLOCKMOD=m

#
# shared options
#
CONFIG_X86_SPEEDSTEP_LIB=m
# end of CPU Frequency scaling

#
# CPU Idle
#
CONFIG_CPU_IDLE=y
# CONFIG_CPU_IDLE_GOV_LADDER is not set
CONFIG_CPU_IDLE_GOV_MENU=y
# CONFIG_CPU_IDLE_GOV_TEO is not set
# CONFIG_CPU_IDLE_GOV_HALTPOLL is not set
CONFIG_HALTPOLL_CPUIDLE=y
# end of CPU Idle

CONFIG_INTEL_IDLE=y
# end of Power management and ACPI options

#
# Bus options (PCI etc.)
#
CONFIG_PCI_DIRECT=y
CONFIG_PCI_MMCONFIG=y
CONFIG_PCI_XEN=y
CONFIG_MMCONF_FAM10H=y
CONFIG_ISA_DMA_API=y
CONFIG_AMD_NB=y
# CONFIG_X86_SYSFB is not set
# end of Bus options (PCI etc.)

#
# Binary Emulations
#
CONFIG_IA32_EMULATION=y
# CONFIG_X86_X32 is not set
CONFIG_COMPAT_32=y
CONFIG_COMPAT=y
CONFIG_COMPAT_FOR_U64_ALIGNMENT=y
CONFIG_SYSVIPC_COMPAT=y
# end of Binary Emulations

#
# Firmware Drivers
#
CONFIG_EDD=m
# CONFIG_EDD_OFF is not set
CONFIG_FIRMWARE_MEMMAP=y
CONFIG_DMIID=y
CONFIG_DMI_SYSFS=y
CONFIG_DMI_SCAN_MACHINE_NON_EFI_FALLBACK=y
# CONFIG_ISCSI_IBFT is not set
CONFIG_FW_CFG_SYSFS=y
# CONFIG_FW_CFG_SYSFS_CMDLINE is not set
# CONFIG_GOOGLE_FIRMWARE is not set

#
# EFI (Extensible Firmware Interface) Support
#
CONFIG_EFI_VARS=y
CONFIG_EFI_ESRT=y
CONFIG_EFI_VARS_PSTORE=y
CONFIG_EFI_VARS_PSTORE_DEFAULT_DISABLE=y
CONFIG_EFI_RUNTIME_MAP=y
# CONFIG_EFI_FAKE_MEMMAP is not set
CONFIG_EFI_RUNTIME_WRAPPERS=y
CONFIG_EFI_GENERIC_STUB_INITRD_CMDLINE_LOADER=y
# CONFIG_EFI_BOOTLOADER_CONTROL is not set
# CONFIG_EFI_CAPSULE_LOADER is not set
# CONFIG_EFI_TEST is not set
CONFIG_APPLE_PROPERTIES=y
# CONFIG_RESET_ATTACK_MITIGATION is not set
# CONFIG_EFI_RCI2_TABLE is not set
# CONFIG_EFI_DISABLE_PCI_DMA is not set
# end of EFI (Extensible Firmware Interface) Support

CONFIG_UEFI_CPER=y
CONFIG_UEFI_CPER_X86=y
CONFIG_EFI_DEV_PATH_PARSER=y
CONFIG_EFI_EARLYCON=y
CONFIG_EFI_CUSTOM_SSDT_OVERLAYS=y

#
# Tegra firmware driver
#
# end of Tegra firmware driver
# end of Firmware Drivers

CONFIG_HAVE_KVM=y
CONFIG_HAVE_KVM_IRQCHIP=y
CONFIG_HAVE_KVM_IRQFD=y
CONFIG_HAVE_KVM_IRQ_ROUTING=y
CONFIG_HAVE_KVM_EVENTFD=y
CONFIG_KVM_MMIO=y
CONFIG_KVM_ASYNC_PF=y
CONFIG_HAVE_KVM_MSI=y
CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT=y
CONFIG_KVM_VFIO=y
CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT=y
CONFIG_KVM_COMPAT=y
CONFIG_HAVE_KVM_IRQ_BYPASS=y
CONFIG_HAVE_KVM_NO_POLL=y
CONFIG_KVM_XFER_TO_GUEST_WORK=y
CONFIG_VIRTUALIZATION=y
CONFIG_KVM=m
CONFIG_KVM_INTEL=m
# CONFIG_KVM_AMD is not set
# CONFIG_KVM_XEN is not set
CONFIG_KVM_MMU_AUDIT=y
CONFIG_AS_AVX512=y
CONFIG_AS_SHA1_NI=y
CONFIG_AS_SHA256_NI=y
CONFIG_AS_TPAUSE=y

#
# General architecture-dependent options
#
CONFIG_CRASH_CORE=y
CONFIG_KEXEC_CORE=y
CONFIG_HOTPLUG_SMT=y
CONFIG_GENERIC_ENTRY=y
CONFIG_KPROBES=y
CONFIG_JUMP_LABEL=y
# CONFIG_STATIC_KEYS_SELFTEST is not set
# CONFIG_STATIC_CALL_SELFTEST is not set
CONFIG_OPTPROBES=y
CONFIG_KPROBES_ON_FTRACE=y
CONFIG_UPROBES=y
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
CONFIG_ARCH_USE_BUILTIN_BSWAP=y
CONFIG_KRETPROBES=y
CONFIG_USER_RETURN_NOTIFIER=y
CONFIG_HAVE_IOREMAP_PROT=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_HAVE_OPTPROBES=y
CONFIG_HAVE_KPROBES_ON_FTRACE=y
CONFIG_HAVE_FUNCTION_ERROR_INJECTION=y
CONFIG_HAVE_NMI=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
CONFIG_HAVE_DMA_CONTIGUOUS=y
CONFIG_GENERIC_SMP_IDLE_THREAD=y
CONFIG_ARCH_HAS_FORTIFY_SOURCE=y
CONFIG_ARCH_HAS_SET_MEMORY=y
CONFIG_ARCH_HAS_SET_DIRECT_MAP=y
CONFIG_HAVE_ARCH_THREAD_STRUCT_WHITELIST=y
CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT=y
CONFIG_HAVE_ASM_MODVERSIONS=y
CONFIG_HAVE_REGS_AND_STACK_ACCESS_API=y
CONFIG_HAVE_RSEQ=y
CONFIG_HAVE_FUNCTION_ARG_ACCESS_API=y
CONFIG_HAVE_HW_BREAKPOINT=y
CONFIG_HAVE_MIXED_BREAKPOINTS_REGS=y
CONFIG_HAVE_USER_RETURN_NOTIFIER=y
CONFIG_HAVE_PERF_EVENTS_NMI=y
CONFIG_HAVE_HARDLOCKUP_DETECTOR_PERF=y
CONFIG_HAVE_PERF_REGS=y
CONFIG_HAVE_PERF_USER_STACK_DUMP=y
CONFIG_HAVE_ARCH_JUMP_LABEL=y
CONFIG_HAVE_ARCH_JUMP_LABEL_RELATIVE=y
CONFIG_MMU_GATHER_TABLE_FREE=y
CONFIG_MMU_GATHER_RCU_TABLE_FREE=y
CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG=y
CONFIG_HAVE_ALIGNED_STRUCT_PAGE=y
CONFIG_HAVE_CMPXCHG_LOCAL=y
CONFIG_HAVE_CMPXCHG_DOUBLE=y
CONFIG_ARCH_WANT_COMPAT_IPC_PARSE_VERSION=y
CONFIG_ARCH_WANT_OLD_COMPAT_IPC=y
CONFIG_HAVE_ARCH_SECCOMP=y
CONFIG_HAVE_ARCH_SECCOMP_FILTER=y
CONFIG_SECCOMP=y
CONFIG_SECCOMP_FILTER=y
# CONFIG_SECCOMP_CACHE_DEBUG is not set
CONFIG_HAVE_ARCH_STACKLEAK=y
CONFIG_HAVE_STACKPROTECTOR=y
CONFIG_STACKPROTECTOR=y
CONFIG_STACKPROTECTOR_STRONG=y
CONFIG_ARCH_SUPPORTS_LTO_CLANG=y
CONFIG_ARCH_SUPPORTS_LTO_CLANG_THIN=y
CONFIG_LTO_NONE=y
CONFIG_HAVE_ARCH_WITHIN_STACK_FRAMES=y
CONFIG_HAVE_CONTEXT_TRACKING=y
CONFIG_HAVE_CONTEXT_TRACKING_OFFSTACK=y
CONFIG_HAVE_VIRT_CPU_ACCOUNTING_GEN=y
CONFIG_HAVE_IRQ_TIME_ACCOUNTING=y
CONFIG_HAVE_MOVE_PUD=y
CONFIG_HAVE_MOVE_PMD=y
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD=y
CONFIG_HAVE_ARCH_HUGE_VMAP=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_HAVE_ARCH_SOFT_DIRTY=y
CONFIG_HAVE_MOD_ARCH_SPECIFIC=y
CONFIG_MODULES_USE_ELF_RELA=y
CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK=y
CONFIG_HAVE_SOFTIRQ_ON_OWN_STACK=y
CONFIG_ARCH_HAS_ELF_RANDOMIZE=y
CONFIG_HAVE_ARCH_MMAP_RND_BITS=y
CONFIG_HAVE_EXIT_THREAD=y
CONFIG_ARCH_MMAP_RND_BITS=28
CONFIG_HAVE_ARCH_MMAP_RND_COMPAT_BITS=y
CONFIG_ARCH_MMAP_RND_COMPAT_BITS=8
CONFIG_HAVE_ARCH_COMPAT_MMAP_BASES=y
CONFIG_HAVE_STACK_VALIDATION=y
CONFIG_HAVE_RELIABLE_STACKTRACE=y
CONFIG_OLD_SIGSUSPEND3=y
CONFIG_COMPAT_OLD_SIGACTION=y
CONFIG_COMPAT_32BIT_TIME=y
CONFIG_HAVE_ARCH_VMAP_STACK=y
CONFIG_VMAP_STACK=y
CONFIG_ARCH_HAS_STRICT_KERNEL_RWX=y
CONFIG_STRICT_KERNEL_RWX=y
CONFIG_ARCH_HAS_STRICT_MODULE_RWX=y
CONFIG_STRICT_MODULE_RWX=y
CONFIG_HAVE_ARCH_PREL32_RELOCATIONS=y
CONFIG_ARCH_USE_MEMREMAP_PROT=y
# CONFIG_LOCK_EVENT_COUNTS is not set
CONFIG_ARCH_HAS_MEM_ENCRYPT=y
CONFIG_HAVE_STATIC_CALL=y
CONFIG_HAVE_STATIC_CALL_INLINE=y
CONFIG_HAVE_PREEMPT_DYNAMIC=y
CONFIG_ARCH_WANT_LD_ORPHAN_WARN=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_ARCH_HAS_ELFCORE_COMPAT=y

#
# GCOV-based kernel profiling
#
# CONFIG_GCOV_KERNEL is not set
CONFIG_ARCH_HAS_GCOV_PROFILE_ALL=y
# end of GCOV-based kernel profiling

CONFIG_HAVE_GCC_PLUGINS=y
# end of General architecture-dependent options

CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
CONFIG_MODULE_SIG_FORMAT=y
CONFIG_MODULES=y
CONFIG_MODULE_FORCE_LOAD=y
CONFIG_MODULE_UNLOAD=y
# CONFIG_MODULE_FORCE_UNLOAD is not set
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
CONFIG_MODULE_SIG=y
# CONFIG_MODULE_SIG_FORCE is not set
CONFIG_MODULE_SIG_ALL=y
# CONFIG_MODULE_SIG_SHA1 is not set
# CONFIG_MODULE_SIG_SHA224 is not set
CONFIG_MODULE_SIG_SHA256=y
# CONFIG_MODULE_SIG_SHA384 is not set
# CONFIG_MODULE_SIG_SHA512 is not set
CONFIG_MODULE_SIG_HASH="sha256"
# CONFIG_MODULE_COMPRESS is not set
# CONFIG_MODULE_ALLOW_MISSING_NAMESPACE_IMPORTS is not set
CONFIG_MODULES_TREE_LOOKUP=y
CONFIG_BLOCK=y
CONFIG_BLK_SCSI_REQUEST=y
CONFIG_BLK_CGROUP_RWSTAT=y
CONFIG_BLK_DEV_BSG=y
CONFIG_BLK_DEV_BSGLIB=y
CONFIG_BLK_DEV_INTEGRITY=y
CONFIG_BLK_DEV_INTEGRITY_T10=m
CONFIG_BLK_DEV_ZONED=y
CONFIG_BLK_DEV_THROTTLING=y
# CONFIG_BLK_DEV_THROTTLING_LOW is not set
# CONFIG_BLK_CMDLINE_PARSER is not set
CONFIG_BLK_WBT=y
# CONFIG_BLK_CGROUP_IOLATENCY is not set
# CONFIG_BLK_CGROUP_IOCOST is not set
CONFIG_BLK_WBT_MQ=y
CONFIG_BLK_DEBUG_FS=y
CONFIG_BLK_DEBUG_FS_ZONED=y
# CONFIG_BLK_SED_OPAL is not set
# CONFIG_BLK_INLINE_ENCRYPTION is not set

#
# Partition Types
#
CONFIG_PARTITION_ADVANCED=y
# CONFIG_ACORN_PARTITION is not set
# CONFIG_AIX_PARTITION is not set
CONFIG_OSF_PARTITION=y
CONFIG_AMIGA_PARTITION=y
# CONFIG_ATARI_PARTITION is not set
CONFIG_MAC_PARTITION=y
CONFIG_MSDOS_PARTITION=y
CONFIG_BSD_DISKLABEL=y
CONFIG_MINIX_SUBPARTITION=y
CONFIG_SOLARIS_X86_PARTITION=y
CONFIG_UNIXWARE_DISKLABEL=y
# CONFIG_LDM_PARTITION is not set
CONFIG_SGI_PARTITION=y
# CONFIG_ULTRIX_PARTITION is not set
CONFIG_SUN_PARTITION=y
CONFIG_KARMA_PARTITION=y
CONFIG_EFI_PARTITION=y
# CONFIG_SYSV68_PARTITION is not set
# CONFIG_CMDLINE_PARTITION is not set
# end of Partition Types

CONFIG_BLOCK_COMPAT=y
CONFIG_BLK_MQ_PCI=y
CONFIG_BLK_MQ_VIRTIO=y
CONFIG_BLK_MQ_RDMA=y
CONFIG_BLK_PM=y

#
# IO Schedulers
#
CONFIG_MQ_IOSCHED_DEADLINE=y
CONFIG_MQ_IOSCHED_KYBER=y
CONFIG_IOSCHED_BFQ=y
CONFIG_BFQ_GROUP_IOSCHED=y
# CONFIG_BFQ_CGROUP_DEBUG is not set
# end of IO Schedulers

CONFIG_PREEMPT_NOTIFIERS=y
CONFIG_PADATA=y
CONFIG_ASN1=y
CONFIG_INLINE_SPIN_UNLOCK_IRQ=y
CONFIG_INLINE_READ_UNLOCK=y
CONFIG_INLINE_READ_UNLOCK_IRQ=y
CONFIG_INLINE_WRITE_UNLOCK=y
CONFIG_INLINE_WRITE_UNLOCK_IRQ=y
CONFIG_ARCH_SUPPORTS_ATOMIC_RMW=y
CONFIG_MUTEX_SPIN_ON_OWNER=y
CONFIG_RWSEM_SPIN_ON_OWNER=y
CONFIG_LOCK_SPIN_ON_OWNER=y
CONFIG_ARCH_USE_QUEUED_SPINLOCKS=y
CONFIG_QUEUED_SPINLOCKS=y
CONFIG_ARCH_USE_QUEUED_RWLOCKS=y
CONFIG_QUEUED_RWLOCKS=y
CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE=y
CONFIG_ARCH_HAS_SYNC_CORE_BEFORE_USERMODE=y
CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y
CONFIG_FREEZER=y

#
# Executable file formats
#
CONFIG_BINFMT_ELF=y
CONFIG_COMPAT_BINFMT_ELF=y
CONFIG_ELFCORE=y
CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS=y
CONFIG_BINFMT_SCRIPT=y
CONFIG_BINFMT_MISC=m
CONFIG_COREDUMP=y
# end of Executable file formats

#
# Memory Management options
#
CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_SPARSEMEM_MANUAL=y
CONFIG_SPARSEMEM=y
CONFIG_NEED_MULTIPLE_NODES=y
CONFIG_SPARSEMEM_EXTREME=y
CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
CONFIG_SPARSEMEM_VMEMMAP=y
CONFIG_HAVE_FAST_GUP=y
CONFIG_NUMA_KEEP_MEMINFO=y
CONFIG_MEMORY_ISOLATION=y
CONFIG_HAVE_BOOTMEM_INFO_NODE=y
CONFIG_MEMORY_HOTPLUG=y
CONFIG_MEMORY_HOTPLUG_SPARSE=y
# CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE is not set
CONFIG_MEMORY_HOTREMOVE=y
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_MEMORY_BALLOON=y
CONFIG_BALLOON_COMPACTION=y
CONFIG_COMPACTION=y
CONFIG_PAGE_REPORTING=y
CONFIG_MIGRATION=y
CONFIG_CONTIG_ALLOC=y
CONFIG_PHYS_ADDR_T_64BIT=y
CONFIG_BOUNCE=y
CONFIG_VIRT_TO_BUS=y
CONFIG_MMU_NOTIFIER=y
CONFIG_KSM=y
CONFIG_DEFAULT_MMAP_MIN_ADDR=4096
CONFIG_ARCH_SUPPORTS_MEMORY_FAILURE=y
CONFIG_MEMORY_FAILURE=y
CONFIG_HWPOISON_INJECT=m
CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
# CONFIG_TRANSPARENT_HUGEPAGE_MADVISE is not set
CONFIG_ARCH_WANTS_THP_SWAP=y
CONFIG_THP_SWAP=y
CONFIG_CLEANCACHE=y
CONFIG_FRONTSWAP=y
CONFIG_CMA=y
# CONFIG_CMA_DEBUG is not set
# CONFIG_CMA_DEBUGFS is not set
CONFIG_CMA_AREAS=19
CONFIG_ZSWAP=y
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_DEFLATE is not set
CONFIG_ZSWAP_COMPRESSOR_DEFAULT_LZO=y
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_842 is not set
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_LZ4 is not set
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_LZ4HC is not set
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_ZSTD is not set
CONFIG_ZSWAP_COMPRESSOR_DEFAULT="lzo"
CONFIG_ZSWAP_ZPOOL_DEFAULT_ZBUD=y
# CONFIG_ZSWAP_ZPOOL_DEFAULT_Z3FOLD is not set
# CONFIG_ZSWAP_ZPOOL_DEFAULT_ZSMALLOC is not set
CONFIG_ZSWAP_ZPOOL_DEFAULT="zbud"
# CONFIG_ZSWAP_DEFAULT_ON is not set
CONFIG_ZPOOL=y
CONFIG_ZBUD=y
# CONFIG_Z3FOLD is not set
CONFIG_ZSMALLOC=y
CONFIG_ZSMALLOC_STAT=y
CONFIG_GENERIC_EARLY_IOREMAP=y
CONFIG_DEFERRED_STRUCT_PAGE_INIT=y
CONFIG_IDLE_PAGE_TRACKING=y
CONFIG_ARCH_HAS_PTE_DEVMAP=y
CONFIG_ZONE_DEVICE=y
CONFIG_DEV_PAGEMAP_OPS=y
CONFIG_HMM_MIRROR=y
CONFIG_DEVICE_PRIVATE=y
CONFIG_VMAP_PFN=y
CONFIG_ARCH_USES_HIGH_VMA_FLAGS=y
CONFIG_ARCH_HAS_PKEYS=y
# CONFIG_PERCPU_STATS is not set
# CONFIG_GUP_TEST is not set
# CONFIG_READ_ONLY_THP_FOR_FS is not set
CONFIG_ARCH_HAS_PTE_SPECIAL=y
# end of Memory Management options

CONFIG_NET=y
CONFIG_COMPAT_NETLINK_MESSAGES=y
CONFIG_NET_INGRESS=y
CONFIG_NET_EGRESS=y
CONFIG_SKB_EXTENSIONS=y

#
# Networking options
#
CONFIG_PACKET=y
CONFIG_PACKET_DIAG=m
CONFIG_UNIX=y
CONFIG_UNIX_SCM=y
CONFIG_UNIX_DIAG=m
CONFIG_TLS=m
CONFIG_TLS_DEVICE=y
# CONFIG_TLS_TOE is not set
CONFIG_XFRM=y
CONFIG_XFRM_OFFLOAD=y
CONFIG_XFRM_ALGO=y
CONFIG_XFRM_USER=y
# CONFIG_XFRM_USER_COMPAT is not set
# CONFIG_XFRM_INTERFACE is not set
CONFIG_XFRM_SUB_POLICY=y
CONFIG_XFRM_MIGRATE=y
CONFIG_XFRM_STATISTICS=y
CONFIG_XFRM_AH=m
CONFIG_XFRM_ESP=m
CONFIG_XFRM_IPCOMP=m
CONFIG_NET_KEY=m
CONFIG_NET_KEY_MIGRATE=y
# CONFIG_SMC is not set
CONFIG_XDP_SOCKETS=y
# CONFIG_XDP_SOCKETS_DIAG is not set
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_ADVANCED_ROUTER=y
CONFIG_IP_FIB_TRIE_STATS=y
CONFIG_IP_MULTIPLE_TABLES=y
CONFIG_IP_ROUTE_MULTIPATH=y
CONFIG_IP_ROUTE_VERBOSE=y
CONFIG_IP_ROUTE_CLASSID=y
CONFIG_IP_PNP=y
CONFIG_IP_PNP_DHCP=y
# CONFIG_IP_PNP_BOOTP is not set
# CONFIG_IP_PNP_RARP is not set
CONFIG_NET_IPIP=m
CONFIG_NET_IPGRE_DEMUX=m
CONFIG_NET_IP_TUNNEL=m
CONFIG_NET_IPGRE=m
CONFIG_NET_IPGRE_BROADCAST=y
CONFIG_IP_MROUTE_COMMON=y
CONFIG_IP_MROUTE=y
CONFIG_IP_MROUTE_MULTIPLE_TABLES=y
CONFIG_IP_PIMSM_V1=y
CONFIG_IP_PIMSM_V2=y
CONFIG_SYN_COOKIES=y
CONFIG_NET_IPVTI=m
CONFIG_NET_UDP_TUNNEL=m
# CONFIG_NET_FOU is not set
# CONFIG_NET_FOU_IP_TUNNELS is not set
CONFIG_INET_AH=m
CONFIG_INET_ESP=m
CONFIG_INET_ESP_OFFLOAD=m
# CONFIG_INET_ESPINTCP is not set
CONFIG_INET_IPCOMP=m
CONFIG_INET_XFRM_TUNNEL=m
CONFIG_INET_TUNNEL=m
CONFIG_INET_DIAG=m
CONFIG_INET_TCP_DIAG=m
CONFIG_INET_UDP_DIAG=m
CONFIG_INET_RAW_DIAG=m
# CONFIG_INET_DIAG_DESTROY is not set
CONFIG_TCP_CONG_ADVANCED=y
CONFIG_TCP_CONG_BIC=m
CONFIG_TCP_CONG_CUBIC=y
CONFIG_TCP_CONG_WESTWOOD=m
CONFIG_TCP_CONG_HTCP=m
CONFIG_TCP_CONG_HSTCP=m
CONFIG_TCP_CONG_HYBLA=m
CONFIG_TCP_CONG_VEGAS=m
CONFIG_TCP_CONG_NV=m
CONFIG_TCP_CONG_SCALABLE=m
CONFIG_TCP_CONG_LP=m
CONFIG_TCP_CONG_VENO=m
CONFIG_TCP_CONG_YEAH=m
CONFIG_TCP_CONG_ILLINOIS=m
CONFIG_TCP_CONG_DCTCP=m
# CONFIG_TCP_CONG_CDG is not set
CONFIG_TCP_CONG_BBR=m
CONFIG_DEFAULT_CUBIC=y
# CONFIG_DEFAULT_RENO is not set
CONFIG_DEFAULT_TCP_CONG="cubic"
CONFIG_TCP_MD5SIG=y
CONFIG_IPV6=y
CONFIG_IPV6_ROUTER_PREF=y
CONFIG_IPV6_ROUTE_INFO=y
CONFIG_IPV6_OPTIMISTIC_DAD=y
CONFIG_INET6_AH=m
CONFIG_INET6_ESP=m
CONFIG_INET6_ESP_OFFLOAD=m
# CONFIG_INET6_ESPINTCP is not set
CONFIG_INET6_IPCOMP=m
CONFIG_IPV6_MIP6=m
# CONFIG_IPV6_ILA is not set
CONFIG_INET6_XFRM_TUNNEL=m
CONFIG_INET6_TUNNEL=m
CONFIG_IPV6_VTI=m
CONFIG_IPV6_SIT=m
CONFIG_IPV6_SIT_6RD=y
CONFIG_IPV6_NDISC_NODETYPE=y
CONFIG_IPV6_TUNNEL=m
CONFIG_IPV6_GRE=m
CONFIG_IPV6_MULTIPLE_TABLES=y
# CONFIG_IPV6_SUBTREES is not set
CONFIG_IPV6_MROUTE=y
CONFIG_IPV6_MROUTE_MULTIPLE_TABLES=y
CONFIG_IPV6_PIMSM_V2=y
# CONFIG_IPV6_SEG6_LWTUNNEL is not set
# CONFIG_IPV6_SEG6_HMAC is not set
# CONFIG_IPV6_RPL_LWTUNNEL is not set
CONFIG_NETLABEL=y
# CONFIG_MPTCP is not set
CONFIG_NETWORK_SECMARK=y
CONFIG_NET_PTP_CLASSIFY=y
CONFIG_NETWORK_PHY_TIMESTAMPING=y
CONFIG_NETFILTER=y
CONFIG_NETFILTER_ADVANCED=y
CONFIG_BRIDGE_NETFILTER=m

#
# Core Netfilter Configuration
#
CONFIG_NETFILTER_INGRESS=y
CONFIG_NETFILTER_NETLINK=m
CONFIG_NETFILTER_FAMILY_BRIDGE=y
CONFIG_NETFILTER_FAMILY_ARP=y
# CONFIG_NETFILTER_NETLINK_ACCT is not set
CONFIG_NETFILTER_NETLINK_QUEUE=m
CONFIG_NETFILTER_NETLINK_LOG=m
CONFIG_NETFILTER_NETLINK_OSF=m
CONFIG_NF_CONNTRACK=m
CONFIG_NF_LOG_COMMON=m
CONFIG_NF_LOG_NETDEV=m
CONFIG_NETFILTER_CONNCOUNT=m
CONFIG_NF_CONNTRACK_MARK=y
CONFIG_NF_CONNTRACK_SECMARK=y
CONFIG_NF_CONNTRACK_ZONES=y
CONFIG_NF_CONNTRACK_PROCFS=y
CONFIG_NF_CONNTRACK_EVENTS=y
CONFIG_NF_CONNTRACK_TIMEOUT=y
CONFIG_NF_CONNTRACK_TIMESTAMP=y
CONFIG_NF_CONNTRACK_LABELS=y
CONFIG_NF_CT_PROTO_DCCP=y
CONFIG_NF_CT_PROTO_GRE=y
CONFIG_NF_CT_PROTO_SCTP=y
CONFIG_NF_CT_PROTO_UDPLITE=y
CONFIG_NF_CONNTRACK_AMANDA=m
CONFIG_NF_CONNTRACK_FTP=m
CONFIG_NF_CONNTRACK_H323=m
CONFIG_NF_CONNTRACK_IRC=m
CONFIG_NF_CONNTRACK_BROADCAST=m
CONFIG_NF_CONNTRACK_NETBIOS_NS=m
CONFIG_NF_CONNTRACK_SNMP=m
CONFIG_NF_CONNTRACK_PPTP=m
CONFIG_NF_CONNTRACK_SANE=m
CONFIG_NF_CONNTRACK_SIP=m
CONFIG_NF_CONNTRACK_TFTP=m
CONFIG_NF_CT_NETLINK=m
CONFIG_NF_CT_NETLINK_TIMEOUT=m
CONFIG_NF_CT_NETLINK_HELPER=m
CONFIG_NETFILTER_NETLINK_GLUE_CT=y
CONFIG_NF_NAT=m
CONFIG_NF_NAT_AMANDA=m
CONFIG_NF_NAT_FTP=m
CONFIG_NF_NAT_IRC=m
CONFIG_NF_NAT_SIP=m
CONFIG_NF_NAT_TFTP=m
CONFIG_NF_NAT_REDIRECT=y
CONFIG_NF_NAT_MASQUERADE=y
CONFIG_NETFILTER_SYNPROXY=m
CONFIG_NF_TABLES=m
CONFIG_NF_TABLES_INET=y
CONFIG_NF_TABLES_NETDEV=y
CONFIG_NFT_NUMGEN=m
CONFIG_NFT_CT=m
CONFIG_NFT_COUNTER=m
CONFIG_NFT_CONNLIMIT=m
CONFIG_NFT_LOG=m
CONFIG_NFT_LIMIT=m
CONFIG_NFT_MASQ=m
CONFIG_NFT_REDIR=m
CONFIG_NFT_NAT=m
# CONFIG_NFT_TUNNEL is not set
CONFIG_NFT_OBJREF=m
CONFIG_NFT_QUEUE=m
CONFIG_NFT_QUOTA=m
CONFIG_NFT_REJECT=m
CONFIG_NFT_REJECT_INET=m
CONFIG_NFT_COMPAT=m
CONFIG_NFT_HASH=m
CONFIG_NFT_FIB=m
CONFIG_NFT_FIB_INET=m
# CONFIG_NFT_XFRM is not set
CONFIG_NFT_SOCKET=m
# CONFIG_NFT_OSF is not set
# CONFIG_NFT_TPROXY is not set
# CONFIG_NFT_SYNPROXY is not set
CONFIG_NF_DUP_NETDEV=m
CONFIG_NFT_DUP_NETDEV=m
CONFIG_NFT_FWD_NETDEV=m
CONFIG_NFT_FIB_NETDEV=m
# CONFIG_NFT_REJECT_NETDEV is not set
# CONFIG_NF_FLOW_TABLE is not set
CONFIG_NETFILTER_XTABLES=y

#
# Xtables combined modules
#
CONFIG_NETFILTER_XT_MARK=m
CONFIG_NETFILTER_XT_CONNMARK=m
CONFIG_NETFILTER_XT_SET=m

#
# Xtables targets
#
CONFIG_NETFILTER_XT_TARGET_AUDIT=m
CONFIG_NETFILTER_XT_TARGET_CHECKSUM=m
CONFIG_NETFILTER_XT_TARGET_CLASSIFY=m
CONFIG_NETFILTER_XT_TARGET_CONNMARK=m
CONFIG_NETFILTER_XT_TARGET_CONNSECMARK=m
CONFIG_NETFILTER_XT_TARGET_CT=m
CONFIG_NETFILTER_XT_TARGET_DSCP=m
CONFIG_NETFILTER_XT_TARGET_HL=m
CONFIG_NETFILTER_XT_TARGET_HMARK=m
CONFIG_NETFILTER_XT_TARGET_IDLETIMER=m
# CONFIG_NETFILTER_XT_TARGET_LED is not set
CONFIG_NETFILTER_XT_TARGET_LOG=m
CONFIG_NETFILTER_XT_TARGET_MARK=m
CONFIG_NETFILTER_XT_NAT=m
CONFIG_NETFILTER_XT_TARGET_NETMAP=m
CONFIG_NETFILTER_XT_TARGET_NFLOG=m
CONFIG_NETFILTER_XT_TARGET_NFQUEUE=m
CONFIG_NETFILTER_XT_TARGET_NOTRACK=m
CONFIG_NETFILTER_XT_TARGET_RATEEST=m
CONFIG_NETFILTER_XT_TARGET_REDIRECT=m
CONFIG_NETFILTER_XT_TARGET_MASQUERADE=m
CONFIG_NETFILTER_XT_TARGET_TEE=m
CONFIG_NETFILTER_XT_TARGET_TPROXY=m
CONFIG_NETFILTER_XT_TARGET_TRACE=m
CONFIG_NETFILTER_XT_TARGET_SECMARK=m
CONFIG_NETFILTER_XT_TARGET_TCPMSS=m
CONFIG_NETFILTER_XT_TARGET_TCPOPTSTRIP=m

#
# Xtables matches
#
CONFIG_NETFILTER_XT_MATCH_ADDRTYPE=m
CONFIG_NETFILTER_XT_MATCH_BPF=m
CONFIG_NETFILTER_XT_MATCH_CGROUP=m
CONFIG_NETFILTER_XT_MATCH_CLUSTER=m
CONFIG_NETFILTER_XT_MATCH_COMMENT=m
CONFIG_NETFILTER_XT_MATCH_CONNBYTES=m
CONFIG_NETFILTER_XT_MATCH_CONNLABEL=m
CONFIG_NETFILTER_XT_MATCH_CONNLIMIT=m
CONFIG_NETFILTER_XT_MATCH_CONNMARK=m
CONFIG_NETFILTER_XT_MATCH_CONNTRACK=m
CONFIG_NETFILTER_XT_MATCH_CPU=m
CONFIG_NETFILTER_XT_MATCH_DCCP=m
CONFIG_NETFILTER_XT_MATCH_DEVGROUP=m
CONFIG_NETFILTER_XT_MATCH_DSCP=m
CONFIG_NETFILTER_XT_MATCH_ECN=m
CONFIG_NETFILTER_XT_MATCH_ESP=m
CONFIG_NETFILTER_XT_MATCH_HASHLIMIT=m
CONFIG_NETFILTER_XT_MATCH_HELPER=m
CONFIG_NETFILTER_XT_MATCH_HL=m
# CONFIG_NETFILTER_XT_MATCH_IPCOMP is not set
CONFIG_NETFILTER_XT_MATCH_IPRANGE=m
CONFIG_NETFILTER_XT_MATCH_IPVS=m
# CONFIG_NETFILTER_XT_MATCH_L2TP is not set
CONFIG_NETFILTER_XT_MATCH_LENGTH=m
CONFIG_NETFILTER_XT_MATCH_LIMIT=m
CONFIG_NETFILTER_XT_MATCH_MAC=m
CONFIG_NETFILTER_XT_MATCH_MARK=m
CONFIG_NETFILTER_XT_MATCH_MULTIPORT=m
# CONFIG_NETFILTER_XT_MATCH_NFACCT is not set
CONFIG_NETFILTER_XT_MATCH_OSF=m
CONFIG_NETFILTER_XT_MATCH_OWNER=m
CONFIG_NETFILTER_XT_MATCH_POLICY=m
CONFIG_NETFILTER_XT_MATCH_PHYSDEV=m
CONFIG_NETFILTER_XT_MATCH_PKTTYPE=m
CONFIG_NETFILTER_XT_MATCH_QUOTA=m
CONFIG_NETFILTER_XT_MATCH_RATEEST=m
CONFIG_NETFILTER_XT_MATCH_REALM=m
CONFIG_NETFILTER_XT_MATCH_RECENT=m
CONFIG_NETFILTER_XT_MATCH_SCTP=m
CONFIG_NETFILTER_XT_MATCH_SOCKET=m
CONFIG_NETFILTER_XT_MATCH_STATE=m
CONFIG_NETFILTER_XT_MATCH_STATISTIC=m
CONFIG_NETFILTER_XT_MATCH_STRING=m
CONFIG_NETFILTER_XT_MATCH_TCPMSS=m
# CONFIG_NETFILTER_XT_MATCH_TIME is not set
# CONFIG_NETFILTER_XT_MATCH_U32 is not set
# end of Core Netfilter Configuration

CONFIG_IP_SET=m
CONFIG_IP_SET_MAX=256
CONFIG_IP_SET_BITMAP_IP=m
CONFIG_IP_SET_BITMAP_IPMAC=m
CONFIG_IP_SET_BITMAP_PORT=m
CONFIG_IP_SET_HASH_IP=m
CONFIG_IP_SET_HASH_IPMARK=m
CONFIG_IP_SET_HASH_IPPORT=m
CONFIG_IP_SET_HASH_IPPORTIP=m
CONFIG_IP_SET_HASH_IPPORTNET=m
CONFIG_IP_SET_HASH_IPMAC=m
CONFIG_IP_SET_HASH_MAC=m
CONFIG_IP_SET_HASH_NETPORTNET=m
CONFIG_IP_SET_HASH_NET=m
CONFIG_IP_SET_HASH_NETNET=m
CONFIG_IP_SET_HASH_NETPORT=m
CONFIG_IP_SET_HASH_NETIFACE=m
CONFIG_IP_SET_LIST_SET=m
CONFIG_IP_VS=m
CONFIG_IP_VS_IPV6=y
# CONFIG_IP_VS_DEBUG is not set
CONFIG_IP_VS_TAB_BITS=12

#
# IPVS transport protocol load balancing support
#
CONFIG_IP_VS_PROTO_TCP=y
CONFIG_IP_VS_PROTO_UDP=y
CONFIG_IP_VS_PROTO_AH_ESP=y
CONFIG_IP_VS_PROTO_ESP=y
CONFIG_IP_VS_PROTO_AH=y
CONFIG_IP_VS_PROTO_SCTP=y

#
# IPVS scheduler
#
CONFIG_IP_VS_RR=m
CONFIG_IP_VS_WRR=m
CONFIG_IP_VS_LC=m
CONFIG_IP_VS_WLC=m
CONFIG_IP_VS_FO=m
CONFIG_IP_VS_OVF=m
CONFIG_IP_VS_LBLC=m
CONFIG_IP_VS_LBLCR=m
CONFIG_IP_VS_DH=m
CONFIG_IP_VS_SH=m
# CONFIG_IP_VS_MH is not set
CONFIG_IP_VS_SED=m
CONFIG_IP_VS_NQ=m
# CONFIG_IP_VS_TWOS is not set

#
# IPVS SH scheduler
#
CONFIG_IP_VS_SH_TAB_BITS=8

#
# IPVS MH scheduler
#
CONFIG_IP_VS_MH_TAB_INDEX=12

#
# IPVS application helper
#
CONFIG_IP_VS_FTP=m
CONFIG_IP_VS_NFCT=y
CONFIG_IP_VS_PE_SIP=m

#
# IP: Netfilter Configuration
#
CONFIG_NF_DEFRAG_IPV4=m
CONFIG_NF_SOCKET_IPV4=m
CONFIG_NF_TPROXY_IPV4=m
CONFIG_NF_TABLES_IPV4=y
CONFIG_NFT_REJECT_IPV4=m
CONFIG_NFT_DUP_IPV4=m
CONFIG_NFT_FIB_IPV4=m
CONFIG_NF_TABLES_ARP=y
CONFIG_NF_DUP_IPV4=m
CONFIG_NF_LOG_ARP=m
CONFIG_NF_LOG_IPV4=m
CONFIG_NF_REJECT_IPV4=m
CONFIG_NF_NAT_SNMP_BASIC=m
CONFIG_NF_NAT_PPTP=m
CONFIG_NF_NAT_H323=m
CONFIG_IP_NF_IPTABLES=m
CONFIG_IP_NF_MATCH_AH=m
CONFIG_IP_NF_MATCH_ECN=m
CONFIG_IP_NF_MATCH_RPFILTER=m
CONFIG_IP_NF_MATCH_TTL=m
CONFIG_IP_NF_FILTER=m
CONFIG_IP_NF_TARGET_REJECT=m
CONFIG_IP_NF_TARGET_SYNPROXY=m
CONFIG_IP_NF_NAT=m
CONFIG_IP_NF_TARGET_MASQUERADE=m
CONFIG_IP_NF_TARGET_NETMAP=m
CONFIG_IP_NF_TARGET_REDIRECT=m
CONFIG_IP_NF_MANGLE=m
# CONFIG_IP_NF_TARGET_CLUSTERIP is not set
CONFIG_IP_NF_TARGET_ECN=m
CONFIG_IP_NF_TARGET_TTL=m
CONFIG_IP_NF_RAW=m
CONFIG_IP_NF_SECURITY=m
CONFIG_IP_NF_ARPTABLES=m
CONFIG_IP_NF_ARPFILTER=m
CONFIG_IP_NF_ARP_MANGLE=m
# end of IP: Netfilter Configuration

#
# IPv6: Netfilter Configuration
#
CONFIG_NF_SOCKET_IPV6=m
CONFIG_NF_TPROXY_IPV6=m
CONFIG_NF_TABLES_IPV6=y
CONFIG_NFT_REJECT_IPV6=m
CONFIG_NFT_DUP_IPV6=m
CONFIG_NFT_FIB_IPV6=m
CONFIG_NF_DUP_IPV6=m
CONFIG_NF_REJECT_IPV6=m
CONFIG_NF_LOG_IPV6=m
CONFIG_IP6_NF_IPTABLES=m
CONFIG_IP6_NF_MATCH_AH=m
CONFIG_IP6_NF_MATCH_EUI64=m
CONFIG_IP6_NF_MATCH_FRAG=m
CONFIG_IP6_NF_MATCH_OPTS=m
CONFIG_IP6_NF_MATCH_HL=m
CONFIG_IP6_NF_MATCH_IPV6HEADER=m
CONFIG_IP6_NF_MATCH_MH=m
CONFIG_IP6_NF_MATCH_RPFILTER=m
CONFIG_IP6_NF_MATCH_RT=m
# CONFIG_IP6_NF_MATCH_SRH is not set
# CONFIG_IP6_NF_TARGET_HL is not set
CONFIG_IP6_NF_FILTER=m
CONFIG_IP6_NF_TARGET_REJECT=m
CONFIG_IP6_NF_TARGET_SYNPROXY=m
CONFIG_IP6_NF_MANGLE=m
CONFIG_IP6_NF_RAW=m
CONFIG_IP6_NF_SECURITY=m
CONFIG_IP6_NF_NAT=m
CONFIG_IP6_NF_TARGET_MASQUERADE=m
CONFIG_IP6_NF_TARGET_NPT=m
# end of IPv6: Netfilter Configuration

CONFIG_NF_DEFRAG_IPV6=m
CONFIG_NF_TABLES_BRIDGE=m
# CONFIG_NFT_BRIDGE_META is not set
CONFIG_NFT_BRIDGE_REJECT=m
CONFIG_NF_LOG_BRIDGE=m
# CONFIG_NF_CONNTRACK_BRIDGE is not set
CONFIG_BRIDGE_NF_EBTABLES=m
CONFIG_BRIDGE_EBT_BROUTE=m
CONFIG_BRIDGE_EBT_T_FILTER=m
CONFIG_BRIDGE_EBT_T_NAT=m
CONFIG_BRIDGE_EBT_802_3=m
CONFIG_BRIDGE_EBT_AMONG=m
CONFIG_BRIDGE_EBT_ARP=m
CONFIG_BRIDGE_EBT_IP=m
CONFIG_BRIDGE_EBT_IP6=m
CONFIG_BRIDGE_EBT_LIMIT=m
CONFIG_BRIDGE_EBT_MARK=m
CONFIG_BRIDGE_EBT_PKTTYPE=m
CONFIG_BRIDGE_EBT_STP=m
CONFIG_BRIDGE_EBT_VLAN=m
CONFIG_BRIDGE_EBT_ARPREPLY=m
CONFIG_BRIDGE_EBT_DNAT=m
CONFIG_BRIDGE_EBT_MARK_T=m
CONFIG_BRIDGE_EBT_REDIRECT=m
CONFIG_BRIDGE_EBT_SNAT=m
CONFIG_BRIDGE_EBT_LOG=m
CONFIG_BRIDGE_EBT_NFLOG=m
# CONFIG_BPFILTER is not set
# CONFIG_IP_DCCP is not set
CONFIG_IP_SCTP=m
# CONFIG_SCTP_DBG_OBJCNT is not set
# CONFIG_SCTP_DEFAULT_COOKIE_HMAC_MD5 is not set
CONFIG_SCTP_DEFAULT_COOKIE_HMAC_SHA1=y
# CONFIG_SCTP_DEFAULT_COOKIE_HMAC_NONE is not set
CONFIG_SCTP_COOKIE_HMAC_MD5=y
CONFIG_SCTP_COOKIE_HMAC_SHA1=y
CONFIG_INET_SCTP_DIAG=m
# CONFIG_RDS is not set
CONFIG_TIPC=m
# CONFIG_TIPC_MEDIA_IB is not set
CONFIG_TIPC_MEDIA_UDP=y
CONFIG_TIPC_CRYPTO=y
CONFIG_TIPC_DIAG=m
CONFIG_ATM=m
CONFIG_ATM_CLIP=m
# CONFIG_ATM_CLIP_NO_ICMP is not set
CONFIG_ATM_LANE=m
# CONFIG_ATM_MPOA is not set
CONFIG_ATM_BR2684=m
# CONFIG_ATM_BR2684_IPFILTER is not set
CONFIG_L2TP=m
CONFIG_L2TP_DEBUGFS=m
CONFIG_L2TP_V3=y
CONFIG_L2TP_IP=m
CONFIG_L2TP_ETH=m
CONFIG_STP=m
CONFIG_GARP=m
CONFIG_MRP=m
CONFIG_BRIDGE=m
CONFIG_BRIDGE_IGMP_SNOOPING=y
CONFIG_BRIDGE_VLAN_FILTERING=y
# CONFIG_BRIDGE_MRP is not set
# CONFIG_BRIDGE_CFM is not set
CONFIG_HAVE_NET_DSA=y
# CONFIG_NET_DSA is not set
CONFIG_VLAN_8021Q=m
CONFIG_VLAN_8021Q_GVRP=y
CONFIG_VLAN_8021Q_MVRP=y
# CONFIG_DECNET is not set
CONFIG_LLC=m
# CONFIG_LLC2 is not set
# CONFIG_ATALK is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_PHONET is not set
CONFIG_6LOWPAN=m
# CONFIG_6LOWPAN_DEBUGFS is not set
# CONFIG_6LOWPAN_NHC is not set
CONFIG_IEEE802154=m
# CONFIG_IEEE802154_NL802154_EXPERIMENTAL is not set
CONFIG_IEEE802154_SOCKET=m
CONFIG_IEEE802154_6LOWPAN=m
CONFIG_MAC802154=m
CONFIG_NET_SCHED=y

#
# Queueing/Scheduling
#
CONFIG_NET_SCH_CBQ=m
CONFIG_NET_SCH_HTB=m
CONFIG_NET_SCH_HFSC=m
CONFIG_NET_SCH_ATM=m
CONFIG_NET_SCH_PRIO=m
CONFIG_NET_SCH_MULTIQ=m
CONFIG_NET_SCH_RED=m
CONFIG_NET_SCH_SFB=m
CONFIG_NET_SCH_SFQ=m
CONFIG_NET_SCH_TEQL=m
CONFIG_NET_SCH_TBF=m
# CONFIG_NET_SCH_CBS is not set
# CONFIG_NET_SCH_ETF is not set
# CONFIG_NET_SCH_TAPRIO is not set
CONFIG_NET_SCH_GRED=m
CONFIG_NET_SCH_DSMARK=m
CONFIG_NET_SCH_NETEM=m
CONFIG_NET_SCH_DRR=m
CONFIG_NET_SCH_MQPRIO=m
# CONFIG_NET_SCH_SKBPRIO is not set
CONFIG_NET_SCH_CHOKE=m
CONFIG_NET_SCH_QFQ=m
CONFIG_NET_SCH_CODEL=m
CONFIG_NET_SCH_FQ_CODEL=y
# CONFIG_NET_SCH_CAKE is not set
CONFIG_NET_SCH_FQ=m
CONFIG_NET_SCH_HHF=m
CONFIG_NET_SCH_PIE=m
# CONFIG_NET_SCH_FQ_PIE is not set
CONFIG_NET_SCH_INGRESS=m
CONFIG_NET_SCH_PLUG=m
# CONFIG_NET_SCH_ETS is not set
CONFIG_NET_SCH_DEFAULT=y
# CONFIG_DEFAULT_FQ is not set
# CONFIG_DEFAULT_CODEL is not set
CONFIG_DEFAULT_FQ_CODEL=y
# CONFIG_DEFAULT_SFQ is not set
# CONFIG_DEFAULT_PFIFO_FAST is not set
CONFIG_DEFAULT_NET_SCH="fq_codel"

#
# Classification
#
CONFIG_NET_CLS=y
CONFIG_NET_CLS_BASIC=m
CONFIG_NET_CLS_TCINDEX=m
CONFIG_NET_CLS_ROUTE4=m
CONFIG_NET_CLS_FW=m
CONFIG_NET_CLS_U32=m
CONFIG_CLS_U32_PERF=y
CONFIG_CLS_U32_MARK=y
CONFIG_NET_CLS_RSVP=m
CONFIG_NET_CLS_RSVP6=m
CONFIG_NET_CLS_FLOW=m
CONFIG_NET_CLS_CGROUP=y
CONFIG_NET_CLS_BPF=m
CONFIG_NET_CLS_FLOWER=m
CONFIG_NET_CLS_MATCHALL=m
CONFIG_NET_EMATCH=y
CONFIG_NET_EMATCH_STACK=32
CONFIG_NET_EMATCH_CMP=m
CONFIG_NET_EMATCH_NBYTE=m
CONFIG_NET_EMATCH_U32=m
CONFIG_NET_EMATCH_META=m
CONFIG_NET_EMATCH_TEXT=m
# CONFIG_NET_EMATCH_CANID is not set
CONFIG_NET_EMATCH_IPSET=m
# CONFIG_NET_EMATCH_IPT is not set
CONFIG_NET_CLS_ACT=y
CONFIG_NET_ACT_POLICE=m
CONFIG_NET_ACT_GACT=m
CONFIG_GACT_PROB=y
CONFIG_NET_ACT_MIRRED=m
CONFIG_NET_ACT_SAMPLE=m
# CONFIG_NET_ACT_IPT is not set
CONFIG_NET_ACT_NAT=m
CONFIG_NET_ACT_PEDIT=m
CONFIG_NET_ACT_SIMP=m
CONFIG_NET_ACT_SKBEDIT=m
CONFIG_NET_ACT_CSUM=m
# CONFIG_NET_ACT_MPLS is not set
CONFIG_NET_ACT_VLAN=m
CONFIG_NET_ACT_BPF=m
# CONFIG_NET_ACT_CONNMARK is not set
# CONFIG_NET_ACT_CTINFO is not set
CONFIG_NET_ACT_SKBMOD=m
# CONFIG_NET_ACT_IFE is not set
CONFIG_NET_ACT_TUNNEL_KEY=m
# CONFIG_NET_ACT_GATE is not set
# CONFIG_NET_TC_SKB_EXT is not set
CONFIG_NET_SCH_FIFO=y
CONFIG_DCB=y
CONFIG_DNS_RESOLVER=m
# CONFIG_BATMAN_ADV is not set
CONFIG_OPENVSWITCH=m
CONFIG_OPENVSWITCH_GRE=m
CONFIG_VSOCKETS=m
CONFIG_VSOCKETS_DIAG=m
CONFIG_VSOCKETS_LOOPBACK=m
CONFIG_VMWARE_VMCI_VSOCKETS=m
CONFIG_VIRTIO_VSOCKETS=m
CONFIG_VIRTIO_VSOCKETS_COMMON=m
CONFIG_HYPERV_VSOCKETS=m
CONFIG_NETLINK_DIAG=m
CONFIG_MPLS=y
CONFIG_NET_MPLS_GSO=y
CONFIG_MPLS_ROUTING=m
CONFIG_MPLS_IPTUNNEL=m
CONFIG_NET_NSH=y
# CONFIG_HSR is not set
CONFIG_NET_SWITCHDEV=y
CONFIG_NET_L3_MASTER_DEV=y
# CONFIG_QRTR is not set
# CONFIG_NET_NCSI is not set
CONFIG_RPS=y
CONFIG_RFS_ACCEL=y
CONFIG_SOCK_RX_QUEUE_MAPPING=y
CONFIG_XPS=y
CONFIG_CGROUP_NET_PRIO=y
CONFIG_CGROUP_NET_CLASSID=y
CONFIG_NET_RX_BUSY_POLL=y
CONFIG_BQL=y
CONFIG_BPF_JIT=y
CONFIG_BPF_STREAM_PARSER=y
CONFIG_NET_FLOW_LIMIT=y

#
# Network testing
#
CONFIG_NET_PKTGEN=m
CONFIG_NET_DROP_MONITOR=y
# end of Network testing
# end of Networking options

# CONFIG_HAMRADIO is not set
CONFIG_CAN=m
CONFIG_CAN_RAW=m
CONFIG_CAN_BCM=m
CONFIG_CAN_GW=m
# CONFIG_CAN_J1939 is not set
# CONFIG_CAN_ISOTP is not set

#
# CAN Device Drivers
#
CONFIG_CAN_VCAN=m
# CONFIG_CAN_VXCAN is not set
CONFIG_CAN_SLCAN=m
CONFIG_CAN_DEV=m
CONFIG_CAN_CALC_BITTIMING=y
# CONFIG_CAN_KVASER_PCIEFD is not set
CONFIG_CAN_C_CAN=m
CONFIG_CAN_C_CAN_PLATFORM=m
CONFIG_CAN_C_CAN_PCI=m
CONFIG_CAN_CC770=m
# CONFIG_CAN_CC770_ISA is not set
CONFIG_CAN_CC770_PLATFORM=m
# CONFIG_CAN_IFI_CANFD is not set
# CONFIG_CAN_M_CAN is not set
# CONFIG_CAN_PEAK_PCIEFD is not set
CONFIG_CAN_SJA1000=m
CONFIG_CAN_EMS_PCI=m
# CONFIG_CAN_F81601 is not set
CONFIG_CAN_KVASER_PCI=m
CONFIG_CAN_PEAK_PCI=m
CONFIG_CAN_PEAK_PCIEC=y
CONFIG_CAN_PLX_PCI=m
# CONFIG_CAN_SJA1000_ISA is not set
CONFIG_CAN_SJA1000_PLATFORM=m
CONFIG_CAN_SOFTING=m

#
# CAN SPI interfaces
#
# CONFIG_CAN_HI311X is not set
# CONFIG_CAN_MCP251X is not set
# CONFIG_CAN_MCP251XFD is not set
# end of CAN SPI interfaces

#
# CAN USB interfaces
#
# CONFIG_CAN_8DEV_USB is not set
# CONFIG_CAN_EMS_USB is not set
# CONFIG_CAN_ESD_USB2 is not set
# CONFIG_CAN_GS_USB is not set
# CONFIG_CAN_KVASER_USB is not set
# CONFIG_CAN_MCBA_USB is not set
# CONFIG_CAN_PEAK_USB is not set
# CONFIG_CAN_UCAN is not set
# end of CAN USB interfaces

# CONFIG_CAN_DEBUG_DEVICES is not set
# end of CAN Device Drivers

CONFIG_BT=m
CONFIG_BT_BREDR=y
CONFIG_BT_RFCOMM=m
CONFIG_BT_RFCOMM_TTY=y
CONFIG_BT_BNEP=m
CONFIG_BT_BNEP_MC_FILTER=y
CONFIG_BT_BNEP_PROTO_FILTER=y
CONFIG_BT_HIDP=m
CONFIG_BT_HS=y
CONFIG_BT_LE=y
# CONFIG_BT_6LOWPAN is not set
# CONFIG_BT_LEDS is not set
# CONFIG_BT_MSFTEXT is not set
CONFIG_BT_DEBUGFS=y
# CONFIG_BT_SELFTEST is not set

#
# Bluetooth device drivers
#
# CONFIG_BT_HCIBTUSB is not set
# CONFIG_BT_HCIBTSDIO is not set
CONFIG_BT_HCIUART=m
CONFIG_BT_HCIUART_H4=y
CONFIG_BT_HCIUART_BCSP=y
CONFIG_BT_HCIUART_ATH3K=y
# CONFIG_BT_HCIUART_INTEL is not set
# CONFIG_BT_HCIUART_AG6XX is not set
# CONFIG_BT_HCIBCM203X is not set
# CONFIG_BT_HCIBPA10X is not set
# CONFIG_BT_HCIBFUSB is not set
CONFIG_BT_HCIVHCI=m
CONFIG_BT_MRVL=m
# CONFIG_BT_MRVL_SDIO is not set
# CONFIG_BT_MTKSDIO is not set
# end of Bluetooth device drivers

# CONFIG_AF_RXRPC is not set
# CONFIG_AF_KCM is not set
CONFIG_STREAM_PARSER=y
CONFIG_FIB_RULES=y
CONFIG_WIRELESS=y
CONFIG_WEXT_CORE=y
CONFIG_WEXT_PROC=y
CONFIG_CFG80211=m
# CONFIG_NL80211_TESTMODE is not set
# CONFIG_CFG80211_DEVELOPER_WARNINGS is not set
CONFIG_CFG80211_REQUIRE_SIGNED_REGDB=y
CONFIG_CFG80211_USE_KERNEL_REGDB_KEYS=y
CONFIG_CFG80211_DEFAULT_PS=y
# CONFIG_CFG80211_DEBUGFS is not set
CONFIG_CFG80211_CRDA_SUPPORT=y
CONFIG_CFG80211_WEXT=y
CONFIG_MAC80211=m
CONFIG_MAC80211_HAS_RC=y
CONFIG_MAC80211_RC_MINSTREL=y
CONFIG_MAC80211_RC_DEFAULT_MINSTREL=y
CONFIG_MAC80211_RC_DEFAULT="minstrel_ht"
CONFIG_MAC80211_MESH=y
CONFIG_MAC80211_LEDS=y
CONFIG_MAC80211_DEBUGFS=y
# CONFIG_MAC80211_MESSAGE_TRACING is not set
# CONFIG_MAC80211_DEBUG_MENU is not set
CONFIG_MAC80211_STA_HASH_MAX_SIZE=0
CONFIG_RFKILL=m
CONFIG_RFKILL_LEDS=y
CONFIG_RFKILL_INPUT=y
# CONFIG_RFKILL_GPIO is not set
CONFIG_NET_9P=y
CONFIG_NET_9P_VIRTIO=y
# CONFIG_NET_9P_XEN is not set
# CONFIG_NET_9P_RDMA is not set
# CONFIG_NET_9P_DEBUG is not set
# CONFIG_CAIF is not set
CONFIG_CEPH_LIB=m
# CONFIG_CEPH_LIB_PRETTYDEBUG is not set
CONFIG_CEPH_LIB_USE_DNS_RESOLVER=y
# CONFIG_NFC is not set
CONFIG_PSAMPLE=m
# CONFIG_NET_IFE is not set
CONFIG_LWTUNNEL=y
CONFIG_LWTUNNEL_BPF=y
CONFIG_DST_CACHE=y
CONFIG_GRO_CELLS=y
CONFIG_SOCK_VALIDATE_XMIT=y
CONFIG_NET_SOCK_MSG=y
CONFIG_NET_DEVLINK=y
CONFIG_PAGE_POOL=y
CONFIG_FAILOVER=m
CONFIG_ETHTOOL_NETLINK=y
CONFIG_HAVE_EBPF_JIT=y

#
# Device Drivers
#
CONFIG_HAVE_EISA=y
# CONFIG_EISA is not set
CONFIG_HAVE_PCI=y
CONFIG_PCI=y
CONFIG_PCI_DOMAINS=y
CONFIG_PCIEPORTBUS=y
CONFIG_HOTPLUG_PCI_PCIE=y
CONFIG_PCIEAER=y
CONFIG_PCIEAER_INJECT=m
CONFIG_PCIE_ECRC=y
CONFIG_PCIEASPM=y
CONFIG_PCIEASPM_DEFAULT=y
# CONFIG_PCIEASPM_POWERSAVE is not set
# CONFIG_PCIEASPM_POWER_SUPERSAVE is not set
# CONFIG_PCIEASPM_PERFORMANCE is not set
CONFIG_PCIE_PME=y
CONFIG_PCIE_DPC=y
# CONFIG_PCIE_PTM is not set
# CONFIG_PCIE_EDR is not set
CONFIG_PCI_MSI=y
CONFIG_PCI_MSI_IRQ_DOMAIN=y
CONFIG_PCI_QUIRKS=y
# CONFIG_PCI_DEBUG is not set
# CONFIG_PCI_REALLOC_ENABLE_AUTO is not set
CONFIG_PCI_STUB=y
CONFIG_PCI_PF_STUB=m
# CONFIG_XEN_PCIDEV_FRONTEND is not set
CONFIG_PCI_ATS=y
CONFIG_PCI_LOCKLESS_CONFIG=y
CONFIG_PCI_IOV=y
CONFIG_PCI_PRI=y
CONFIG_PCI_PASID=y
# CONFIG_PCI_P2PDMA is not set
CONFIG_PCI_LABEL=y
CONFIG_PCI_HYPERV=m
CONFIG_HOTPLUG_PCI=y
CONFIG_HOTPLUG_PCI_ACPI=y
CONFIG_HOTPLUG_PCI_ACPI_IBM=m
# CONFIG_HOTPLUG_PCI_CPCI is not set
CONFIG_HOTPLUG_PCI_SHPC=y

#
# PCI controller drivers
#
CONFIG_VMD=y
CONFIG_PCI_HYPERV_INTERFACE=m

#
# DesignWare PCI Core Support
#
# CONFIG_PCIE_DW_PLAT_HOST is not set
# CONFIG_PCI_MESON is not set
# end of DesignWare PCI Core Support

#
# Mobiveil PCIe Core Support
#
# end of Mobiveil PCIe Core Support

#
# Cadence PCIe controllers support
#
# end of Cadence PCIe controllers support
# end of PCI controller drivers

#
# PCI Endpoint
#
# CONFIG_PCI_ENDPOINT is not set
# end of PCI Endpoint

#
# PCI switch controller drivers
#
# CONFIG_PCI_SW_SWITCHTEC is not set
# end of PCI switch controller drivers

# CONFIG_CXL_BUS is not set
# CONFIG_PCCARD is not set
# CONFIG_RAPIDIO is not set

#
# Generic Driver Options
#
# CONFIG_UEVENT_HELPER is not set
CONFIG_DEVTMPFS=y
CONFIG_DEVTMPFS_MOUNT=y
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y

#
# Firmware loader
#
CONFIG_FW_LOADER=y
CONFIG_FW_LOADER_PAGED_BUF=y
CONFIG_EXTRA_FIRMWARE=""
CONFIG_FW_LOADER_USER_HELPER=y
# CONFIG_FW_LOADER_USER_HELPER_FALLBACK is not set
# CONFIG_FW_LOADER_COMPRESS is not set
CONFIG_FW_CACHE=y
# end of Firmware loader

CONFIG_ALLOW_DEV_COREDUMP=y
# CONFIG_DEBUG_DRIVER is not set
# CONFIG_DEBUG_DEVRES is not set
# CONFIG_DEBUG_TEST_DRIVER_REMOVE is not set
# CONFIG_PM_QOS_KUNIT_TEST is not set
# CONFIG_TEST_ASYNC_DRIVER_PROBE is not set
CONFIG_KUNIT_DRIVER_PE_TEST=y
CONFIG_SYS_HYPERVISOR=y
CONFIG_GENERIC_CPU_AUTOPROBE=y
CONFIG_GENERIC_CPU_VULNERABILITIES=y
CONFIG_REGMAP=y
CONFIG_REGMAP_I2C=m
CONFIG_REGMAP_SPI=m
CONFIG_DMA_SHARED_BUFFER=y
# CONFIG_DMA_FENCE_TRACE is not set
# end of Generic Driver Options

#
# Bus devices
#
# CONFIG_MHI_BUS is not set
# end of Bus devices

CONFIG_CONNECTOR=y
CONFIG_PROC_EVENTS=y
# CONFIG_GNSS is not set
# CONFIG_MTD is not set
# CONFIG_OF is not set
CONFIG_ARCH_MIGHT_HAVE_PC_PARPORT=y
CONFIG_PARPORT=m
CONFIG_PARPORT_PC=m
CONFIG_PARPORT_SERIAL=m
# CONFIG_PARPORT_PC_FIFO is not set
# CONFIG_PARPORT_PC_SUPERIO is not set
# CONFIG_PARPORT_AX88796 is not set
CONFIG_PARPORT_1284=y
CONFIG_PNP=y
# CONFIG_PNP_DEBUG_MESSAGES is not set

#
# Protocols
#
CONFIG_PNPACPI=y
CONFIG_BLK_DEV=y
CONFIG_BLK_DEV_NULL_BLK=m
CONFIG_BLK_DEV_NULL_BLK_FAULT_INJECTION=y
# CONFIG_BLK_DEV_FD is not set
CONFIG_CDROM=m
# CONFIG_PARIDE is not set
# CONFIG_BLK_DEV_PCIESSD_MTIP32XX is not set
# CONFIG_ZRAM is not set
# CONFIG_BLK_DEV_UMEM is not set
CONFIG_BLK_DEV_LOOP=m
CONFIG_BLK_DEV_LOOP_MIN_COUNT=0
# CONFIG_BLK_DEV_CRYPTOLOOP is not set
# CONFIG_BLK_DEV_DRBD is not set
CONFIG_BLK_DEV_NBD=m
# CONFIG_BLK_DEV_SX8 is not set
CONFIG_BLK_DEV_RAM=m
CONFIG_BLK_DEV_RAM_COUNT=16
CONFIG_BLK_DEV_RAM_SIZE=16384
CONFIG_CDROM_PKTCDVD=m
CONFIG_CDROM_PKTCDVD_BUFFERS=8
# CONFIG_CDROM_PKTCDVD_WCACHE is not set
# CONFIG_ATA_OVER_ETH is not set
CONFIG_XEN_BLKDEV_FRONTEND=m
CONFIG_VIRTIO_BLK=m
CONFIG_BLK_DEV_RBD=m
# CONFIG_BLK_DEV_RSXX is not set

#
# NVME Support
#
CONFIG_NVME_CORE=m
CONFIG_BLK_DEV_NVME=m
CONFIG_NVME_MULTIPATH=y
# CONFIG_NVME_HWMON is not set
CONFIG_NVME_FABRICS=m
# CONFIG_NVME_RDMA is not set
CONFIG_NVME_FC=m
# CONFIG_NVME_TCP is not set
CONFIG_NVME_TARGET=m
# CONFIG_NVME_TARGET_PASSTHRU is not set
CONFIG_NVME_TARGET_LOOP=m
# CONFIG_NVME_TARGET_RDMA is not set
CONFIG_NVME_TARGET_FC=m
CONFIG_NVME_TARGET_FCLOOP=m
# CONFIG_NVME_TARGET_TCP is not set
# end of NVME Support

#
# Misc devices
#
CONFIG_SENSORS_LIS3LV02D=m
# CONFIG_AD525X_DPOT is not set
# CONFIG_DUMMY_IRQ is not set
# CONFIG_IBM_ASM is not set
# CONFIG_PHANTOM is not set
CONFIG_TIFM_CORE=m
CONFIG_TIFM_7XX1=m
# CONFIG_ICS932S401 is not set
CONFIG_ENCLOSURE_SERVICES=m
CONFIG_SGI_XP=m
CONFIG_HP_ILO=m
CONFIG_SGI_GRU=m
# CONFIG_SGI_GRU_DEBUG is not set
CONFIG_APDS9802ALS=m
CONFIG_ISL29003=m
CONFIG_ISL29020=m
CONFIG_SENSORS_TSL2550=m
CONFIG_SENSORS_BH1770=m
CONFIG_SENSORS_APDS990X=m
# CONFIG_HMC6352 is not set
# CONFIG_DS1682 is not set
CONFIG_VMWARE_BALLOON=m
# CONFIG_LATTICE_ECP3_CONFIG is not set
# CONFIG_SRAM is not set
# CONFIG_PCI_ENDPOINT_TEST is not set
# CONFIG_XILINX_SDFEC is not set
CONFIG_MISC_RTSX=m
CONFIG_PVPANIC=y
# CONFIG_C2PORT is not set

#
# EEPROM support
#
# CONFIG_EEPROM_AT24 is not set
# CONFIG_EEPROM_AT25 is not set
CONFIG_EEPROM_LEGACY=m
CONFIG_EEPROM_MAX6875=m
CONFIG_EEPROM_93CX6=m
# CONFIG_EEPROM_93XX46 is not set
# CONFIG_EEPROM_IDT_89HPESX is not set
# CONFIG_EEPROM_EE1004 is not set
# end of EEPROM support

CONFIG_CB710_CORE=m
# CONFIG_CB710_DEBUG is not set
CONFIG_CB710_DEBUG_ASSUMPTIONS=y

#
# Texas Instruments shared transport line discipline
#
# CONFIG_TI_ST is not set
# end of Texas Instruments shared transport line discipline

CONFIG_SENSORS_LIS3_I2C=m
CONFIG_ALTERA_STAPL=m
CONFIG_INTEL_MEI=m
CONFIG_INTEL_MEI_ME=m
# CONFIG_INTEL_MEI_TXE is not set
# CONFIG_INTEL_MEI_HDCP is not set
CONFIG_VMWARE_VMCI=m
# CONFIG_GENWQE is not set
# CONFIG_ECHO is not set
# CONFIG_BCM_VK is not set
# CONFIG_MISC_ALCOR_PCI is not set
CONFIG_MISC_RTSX_PCI=m
# CONFIG_MISC_RTSX_USB is not set
# CONFIG_HABANA_AI is not set
# CONFIG_UACCE is not set
# end of Misc devices

CONFIG_HAVE_IDE=y
# CONFIG_IDE is not set

#
# SCSI device support
#
CONFIG_SCSI_MOD=y
CONFIG_RAID_ATTRS=m
CONFIG_SCSI=y
CONFIG_SCSI_DMA=y
CONFIG_SCSI_NETLINK=y
CONFIG_SCSI_PROC_FS=y

#
# SCSI support type (disk, tape, CD-ROM)
#
CONFIG_BLK_DEV_SD=m
CONFIG_CHR_DEV_ST=m
CONFIG_BLK_DEV_SR=m
CONFIG_CHR_DEV_SG=m
CONFIG_CHR_DEV_SCH=m
CONFIG_SCSI_ENCLOSURE=m
CONFIG_SCSI_CONSTANTS=y
CONFIG_SCSI_LOGGING=y
CONFIG_SCSI_SCAN_ASYNC=y

#
# SCSI Transports
#
CONFIG_SCSI_SPI_ATTRS=m
CONFIG_SCSI_FC_ATTRS=m
CONFIG_SCSI_ISCSI_ATTRS=m
CONFIG_SCSI_SAS_ATTRS=m
CONFIG_SCSI_SAS_LIBSAS=m
CONFIG_SCSI_SAS_ATA=y
CONFIG_SCSI_SAS_HOST_SMP=y
CONFIG_SCSI_SRP_ATTRS=m
# end of SCSI Transports

CONFIG_SCSI_LOWLEVEL=y
# CONFIG_ISCSI_TCP is not set
# CONFIG_ISCSI_BOOT_SYSFS is not set
# CONFIG_SCSI_CXGB3_ISCSI is not set
# CONFIG_SCSI_CXGB4_ISCSI is not set
# CONFIG_SCSI_BNX2_ISCSI is not set
# CONFIG_BE2ISCSI is not set
# CONFIG_BLK_DEV_3W_XXXX_RAID is not set
# CONFIG_SCSI_HPSA is not set
# CONFIG_SCSI_3W_9XXX is not set
# CONFIG_SCSI_3W_SAS is not set
# CONFIG_SCSI_ACARD is not set
# CONFIG_SCSI_AACRAID is not set
# CONFIG_SCSI_AIC7XXX is not set
# CONFIG_SCSI_AIC79XX is not set
# CONFIG_SCSI_AIC94XX is not set
# CONFIG_SCSI_MVSAS is not set
# CONFIG_SCSI_MVUMI is not set
# CONFIG_SCSI_DPT_I2O is not set
# CONFIG_SCSI_ADVANSYS is not set
# CONFIG_SCSI_ARCMSR is not set
# CONFIG_SCSI_ESAS2R is not set
# CONFIG_MEGARAID_NEWGEN is not set
# CONFIG_MEGARAID_LEGACY is not set
# CONFIG_MEGARAID_SAS is not set
CONFIG_SCSI_MPT3SAS=m
CONFIG_SCSI_MPT2SAS_MAX_SGE=128
CONFIG_SCSI_MPT3SAS_MAX_SGE=128
# CONFIG_SCSI_MPT2SAS is not set
# CONFIG_SCSI_SMARTPQI is not set
# CONFIG_SCSI_UFSHCD is not set
# CONFIG_SCSI_HPTIOP is not set
# CONFIG_SCSI_BUSLOGIC is not set
# CONFIG_SCSI_MYRB is not set
# CONFIG_SCSI_MYRS is not set
# CONFIG_VMWARE_PVSCSI is not set
# CONFIG_XEN_SCSI_FRONTEND is not set
CONFIG_HYPERV_STORAGE=m
# CONFIG_LIBFC is not set
# CONFIG_SCSI_SNIC is not set
# CONFIG_SCSI_DMX3191D is not set
# CONFIG_SCSI_FDOMAIN_PCI is not set
CONFIG_SCSI_ISCI=m
# CONFIG_SCSI_IPS is not set
# CONFIG_SCSI_INITIO is not set
# CONFIG_SCSI_INIA100 is not set
# CONFIG_SCSI_PPA is not set
# CONFIG_SCSI_IMM is not set
# CONFIG_SCSI_STEX is not set
# CONFIG_SCSI_SYM53C8XX_2 is not set
# CONFIG_SCSI_IPR is not set
# CONFIG_SCSI_QLOGIC_1280 is not set
# CONFIG_SCSI_QLA_FC is not set
# CONFIG_SCSI_QLA_ISCSI is not set
# CONFIG_SCSI_LPFC is not set
# CONFIG_SCSI_DC395x is not set
# CONFIG_SCSI_AM53C974 is not set
# CONFIG_SCSI_WD719X is not set
CONFIG_SCSI_DEBUG=m
# CONFIG_SCSI_PMCRAID is not set
# CONFIG_SCSI_PM8001 is not set
# CONFIG_SCSI_BFA_FC is not set
# CONFIG_SCSI_VIRTIO is not set
# CONFIG_SCSI_CHELSIO_FCOE is not set
CONFIG_SCSI_DH=y
CONFIG_SCSI_DH_RDAC=y
CONFIG_SCSI_DH_HP_SW=y
CONFIG_SCSI_DH_EMC=y
CONFIG_SCSI_DH_ALUA=y
# end of SCSI device support

CONFIG_ATA=m
CONFIG_SATA_HOST=y
CONFIG_PATA_TIMINGS=y
CONFIG_ATA_VERBOSE_ERROR=y
CONFIG_ATA_FORCE=y
CONFIG_ATA_ACPI=y
# CONFIG_SATA_ZPODD is not set
CONFIG_SATA_PMP=y

#
# Controllers with non-SFF native interface
#
CONFIG_SATA_AHCI=m
CONFIG_SATA_MOBILE_LPM_POLICY=0
CONFIG_SATA_AHCI_PLATFORM=m
# CONFIG_SATA_INIC162X is not set
# CONFIG_SATA_ACARD_AHCI is not set
# CONFIG_SATA_SIL24 is not set
CONFIG_ATA_SFF=y

#
# SFF controllers with custom DMA interface
#
# CONFIG_PDC_ADMA is not set
# CONFIG_SATA_QSTOR is not set
# CONFIG_SATA_SX4 is not set
CONFIG_ATA_BMDMA=y

#
# SATA SFF controllers with BMDMA
#
CONFIG_ATA_PIIX=m
# CONFIG_SATA_DWC is not set
# CONFIG_SATA_MV is not set
# CONFIG_SATA_NV is not set
# CONFIG_SATA_PROMISE is not set
# CONFIG_SATA_SIL is not set
# CONFIG_SATA_SIS is not set
# CONFIG_SATA_SVW is not set
# CONFIG_SATA_ULI is not set
# CONFIG_SATA_VIA is not set
# CONFIG_SATA_VITESSE is not set

#
# PATA SFF controllers with BMDMA
#
# CONFIG_PATA_ALI is not set
# CONFIG_PATA_AMD is not set
# CONFIG_PATA_ARTOP is not set
# CONFIG_PATA_ATIIXP is not set
# CONFIG_PATA_ATP867X is not set
# CONFIG_PATA_CMD64X is not set
# CONFIG_PATA_CYPRESS is not set
# CONFIG_PATA_EFAR is not set
# CONFIG_PATA_HPT366 is not set
# CONFIG_PATA_HPT37X is not set
# CONFIG_PATA_HPT3X2N is not set
# CONFIG_PATA_HPT3X3 is not set
# CONFIG_PATA_IT8213 is not set
# CONFIG_PATA_IT821X is not set
# CONFIG_PATA_JMICRON is not set
# CONFIG_PATA_MARVELL is not set
# CONFIG_PATA_NETCELL is not set
# CONFIG_PATA_NINJA32 is not set
# CONFIG_PATA_NS87415 is not set
# CONFIG_PATA_OLDPIIX is not set
# CONFIG_PATA_OPTIDMA is not set
# CONFIG_PATA_PDC2027X is not set
# CONFIG_PATA_PDC_OLD is not set
# CONFIG_PATA_RADISYS is not set
# CONFIG_PATA_RDC is not set
# CONFIG_PATA_SCH is not set
# CONFIG_PATA_SERVERWORKS is not set
# CONFIG_PATA_SIL680 is not set
# CONFIG_PATA_SIS is not set
# CONFIG_PATA_TOSHIBA is not set
# CONFIG_PATA_TRIFLEX is not set
# CONFIG_PATA_VIA is not set
# CONFIG_PATA_WINBOND is not set

#
# PIO-only SFF controllers
#
# CONFIG_PATA_CMD640_PCI is not set
# CONFIG_PATA_MPIIX is not set
# CONFIG_PATA_NS87410 is not set
# CONFIG_PATA_OPTI is not set
# CONFIG_PATA_RZ1000 is not set

#
# Generic fallback / legacy drivers
#
# CONFIG_PATA_ACPI is not set
CONFIG_ATA_GENERIC=m
# CONFIG_PATA_LEGACY is not set
CONFIG_MD=y
CONFIG_BLK_DEV_MD=y
CONFIG_MD_AUTODETECT=y
CONFIG_MD_LINEAR=m
CONFIG_MD_RAID0=m
CONFIG_MD_RAID1=m
CONFIG_MD_RAID10=m
CONFIG_MD_RAID456=m
CONFIG_MD_MULTIPATH=m
CONFIG_MD_FAULTY=m
CONFIG_MD_CLUSTER=m
# CONFIG_BCACHE is not set
CONFIG_BLK_DEV_DM_BUILTIN=y
CONFIG_BLK_DEV_DM=m
CONFIG_DM_DEBUG=y
CONFIG_DM_BUFIO=m
# CONFIG_DM_DEBUG_BLOCK_MANAGER_LOCKING is not set
CONFIG_DM_BIO_PRISON=m
CONFIG_DM_PERSISTENT_DATA=m
# CONFIG_DM_UNSTRIPED is not set
CONFIG_DM_CRYPT=m
CONFIG_DM_SNAPSHOT=m
CONFIG_DM_THIN_PROVISIONING=m
CONFIG_DM_CACHE=m
CONFIG_DM_CACHE_SMQ=m
CONFIG_DM_WRITECACHE=m
# CONFIG_DM_EBS is not set
CONFIG_DM_ERA=m
# CONFIG_DM_CLONE is not set
CONFIG_DM_MIRROR=m
CONFIG_DM_LOG_USERSPACE=m
CONFIG_DM_RAID=m
CONFIG_DM_ZERO=m
CONFIG_DM_MULTIPATH=m
CONFIG_DM_MULTIPATH_QL=m
CONFIG_DM_MULTIPATH_ST=m
# CONFIG_DM_MULTIPATH_HST is not set
# CONFIG_DM_MULTIPATH_IOA is not set
CONFIG_DM_DELAY=m
# CONFIG_DM_DUST is not set
CONFIG_DM_UEVENT=y
CONFIG_DM_FLAKEY=m
CONFIG_DM_VERITY=m
# CONFIG_DM_VERITY_VERIFY_ROOTHASH_SIG is not set
# CONFIG_DM_VERITY_FEC is not set
CONFIG_DM_SWITCH=m
CONFIG_DM_LOG_WRITES=m
CONFIG_DM_INTEGRITY=m
# CONFIG_DM_ZONED is not set
CONFIG_TARGET_CORE=m
CONFIG_TCM_IBLOCK=m
CONFIG_TCM_FILEIO=m
CONFIG_TCM_PSCSI=m
CONFIG_TCM_USER2=m
CONFIG_LOOPBACK_TARGET=m
CONFIG_ISCSI_TARGET=m
# CONFIG_SBP_TARGET is not set
# CONFIG_FUSION is not set

#
# IEEE 1394 (FireWire) support
#
CONFIG_FIREWIRE=m
CONFIG_FIREWIRE_OHCI=m
CONFIG_FIREWIRE_SBP2=m
CONFIG_FIREWIRE_NET=m
# CONFIG_FIREWIRE_NOSY is not set
# end of IEEE 1394 (FireWire) support

CONFIG_MACINTOSH_DRIVERS=y
CONFIG_MAC_EMUMOUSEBTN=y
CONFIG_NETDEVICES=y
CONFIG_MII=y
CONFIG_NET_CORE=y
# CONFIG_BONDING is not set
# CONFIG_DUMMY is not set
# CONFIG_WIREGUARD is not set
# CONFIG_EQUALIZER is not set
# CONFIG_NET_FC is not set
# CONFIG_IFB is not set
# CONFIG_NET_TEAM is not set
# CONFIG_MACVLAN is not set
# CONFIG_IPVLAN is not set
# CONFIG_VXLAN is not set
# CONFIG_GENEVE is not set
# CONFIG_BAREUDP is not set
# CONFIG_GTP is not set
# CONFIG_MACSEC is not set
CONFIG_NETCONSOLE=m
CONFIG_NETCONSOLE_DYNAMIC=y
CONFIG_NETPOLL=y
CONFIG_NET_POLL_CONTROLLER=y
CONFIG_TUN=m
# CONFIG_TUN_VNET_CROSS_LE is not set
CONFIG_VETH=m
CONFIG_VIRTIO_NET=m
# CONFIG_NLMON is not set
# CONFIG_NET_VRF is not set
# CONFIG_VSOCKMON is not set
# CONFIG_ARCNET is not set
CONFIG_ATM_DRIVERS=y
# CONFIG_ATM_DUMMY is not set
# CONFIG_ATM_TCP is not set
# CONFIG_ATM_LANAI is not set
# CONFIG_ATM_ENI is not set
# CONFIG_ATM_FIRESTREAM is not set
# CONFIG_ATM_ZATM is not set
# CONFIG_ATM_NICSTAR is not set
# CONFIG_ATM_IDT77252 is not set
# CONFIG_ATM_AMBASSADOR is not set
# CONFIG_ATM_HORIZON is not set
# CONFIG_ATM_IA is not set
# CONFIG_ATM_FORE200E is not set
# CONFIG_ATM_HE is not set
# CONFIG_ATM_SOLOS is not set

#
# Distributed Switch Architecture drivers
#
# CONFIG_NET_DSA_MV88E6XXX_PTP is not set
# end of Distributed Switch Architecture drivers

CONFIG_ETHERNET=y
CONFIG_MDIO=y
CONFIG_NET_VENDOR_3COM=y
# CONFIG_VORTEX is not set
# CONFIG_TYPHOON is not set
CONFIG_NET_VENDOR_ADAPTEC=y
# CONFIG_ADAPTEC_STARFIRE is not set
CONFIG_NET_VENDOR_AGERE=y
# CONFIG_ET131X is not set
CONFIG_NET_VENDOR_ALACRITECH=y
# CONFIG_SLICOSS is not set
CONFIG_NET_VENDOR_ALTEON=y
# CONFIG_ACENIC is not set
# CONFIG_ALTERA_TSE is not set
CONFIG_NET_VENDOR_AMAZON=y
# CONFIG_ENA_ETHERNET is not set
CONFIG_NET_VENDOR_AMD=y
# CONFIG_AMD8111_ETH is not set
# CONFIG_PCNET32 is not set
# CONFIG_AMD_XGBE is not set
CONFIG_NET_VENDOR_AQUANTIA=y
# CONFIG_AQTION is not set
CONFIG_NET_VENDOR_ARC=y
CONFIG_NET_VENDOR_ATHEROS=y
# CONFIG_ATL2 is not set
# CONFIG_ATL1 is not set
# CONFIG_ATL1E is not set
# CONFIG_ATL1C is not set
# CONFIG_ALX is not set
CONFIG_NET_VENDOR_BROADCOM=y
# CONFIG_B44 is not set
# CONFIG_BCMGENET is not set
# CONFIG_BNX2 is not set
# CONFIG_CNIC is not set
# CONFIG_TIGON3 is not set
# CONFIG_BNX2X is not set
# CONFIG_SYSTEMPORT is not set
# CONFIG_BNXT is not set
CONFIG_NET_VENDOR_BROCADE=y
# CONFIG_BNA is not set
CONFIG_NET_VENDOR_CADENCE=y
# CONFIG_MACB is not set
CONFIG_NET_VENDOR_CAVIUM=y
# CONFIG_THUNDER_NIC_PF is not set
# CONFIG_THUNDER_NIC_VF is not set
# CONFIG_THUNDER_NIC_BGX is not set
# CONFIG_THUNDER_NIC_RGX is not set
CONFIG_CAVIUM_PTP=y
# CONFIG_LIQUIDIO is not set
# CONFIG_LIQUIDIO_VF is not set
CONFIG_NET_VENDOR_CHELSIO=y
# CONFIG_CHELSIO_T1 is not set
# CONFIG_CHELSIO_T3 is not set
# CONFIG_CHELSIO_T4 is not set
# CONFIG_CHELSIO_T4VF is not set
CONFIG_NET_VENDOR_CISCO=y
# CONFIG_ENIC is not set
CONFIG_NET_VENDOR_CORTINA=y
# CONFIG_CX_ECAT is not set
# CONFIG_DNET is not set
CONFIG_NET_VENDOR_DEC=y
# CONFIG_NET_TULIP is not set
CONFIG_NET_VENDOR_DLINK=y
# CONFIG_DL2K is not set
# CONFIG_SUNDANCE is not set
CONFIG_NET_VENDOR_EMULEX=y
# CONFIG_BE2NET is not set
CONFIG_NET_VENDOR_EZCHIP=y
CONFIG_NET_VENDOR_GOOGLE=y
# CONFIG_GVE is not set
CONFIG_NET_VENDOR_HUAWEI=y
# CONFIG_HINIC is not set
CONFIG_NET_VENDOR_I825XX=y
CONFIG_NET_VENDOR_INTEL=y
# CONFIG_E100 is not set
CONFIG_E1000=y
CONFIG_E1000E=y
CONFIG_E1000E_HWTS=y
CONFIG_IGB=y
CONFIG_IGB_HWMON=y
# CONFIG_IGBVF is not set
# CONFIG_IXGB is not set
CONFIG_IXGBE=y
CONFIG_IXGBE_HWMON=y
# CONFIG_IXGBE_DCB is not set
CONFIG_IXGBE_IPSEC=y
# CONFIG_IXGBEVF is not set
CONFIG_I40E=y
# CONFIG_I40E_DCB is not set
# CONFIG_I40EVF is not set
# CONFIG_ICE is not set
# CONFIG_FM10K is not set
CONFIG_IGC=y
# CONFIG_JME is not set
CONFIG_NET_VENDOR_MARVELL=y
# CONFIG_MVMDIO is not set
# CONFIG_SKGE is not set
# CONFIG_SKY2 is not set
# CONFIG_PRESTERA is not set
CONFIG_NET_VENDOR_MELLANOX=y
# CONFIG_MLX4_EN is not set
# CONFIG_MLX5_CORE is not set
# CONFIG_MLXSW_CORE is not set
# CONFIG_MLXFW is not set
CONFIG_NET_VENDOR_MICREL=y
# CONFIG_KS8842 is not set
# CONFIG_KS8851 is not set
# CONFIG_KS8851_MLL is not set
# CONFIG_KSZ884X_PCI is not set
CONFIG_NET_VENDOR_MICROCHIP=y
# CONFIG_ENC28J60 is not set
# CONFIG_ENCX24J600 is not set
# CONFIG_LAN743X is not set
CONFIG_NET_VENDOR_MICROSEMI=y
CONFIG_NET_VENDOR_MYRI=y
# CONFIG_MYRI10GE is not set
# CONFIG_FEALNX is not set
CONFIG_NET_VENDOR_NATSEMI=y
# CONFIG_NATSEMI is not set
# CONFIG_NS83820 is not set
CONFIG_NET_VENDOR_NETERION=y
# CONFIG_S2IO is not set
# CONFIG_VXGE is not set
CONFIG_NET_VENDOR_NETRONOME=y
# CONFIG_NFP is not set
CONFIG_NET_VENDOR_NI=y
# CONFIG_NI_XGE_MANAGEMENT_ENET is not set
CONFIG_NET_VENDOR_8390=y
# CONFIG_NE2K_PCI is not set
CONFIG_NET_VENDOR_NVIDIA=y
# CONFIG_FORCEDETH is not set
CONFIG_NET_VENDOR_OKI=y
# CONFIG_ETHOC is not set
CONFIG_NET_VENDOR_PACKET_ENGINES=y
# CONFIG_HAMACHI is not set
# CONFIG_YELLOWFIN is not set
CONFIG_NET_VENDOR_PENSANDO=y
# CONFIG_IONIC is not set
CONFIG_NET_VENDOR_QLOGIC=y
# CONFIG_QLA3XXX is not set
# CONFIG_QLCNIC is not set
# CONFIG_NETXEN_NIC is not set
# CONFIG_QED is not set
CONFIG_NET_VENDOR_QUALCOMM=y
# CONFIG_QCOM_EMAC is not set
# CONFIG_RMNET is not set
CONFIG_NET_VENDOR_RDC=y
# CONFIG_R6040 is not set
CONFIG_NET_VENDOR_REALTEK=y
# CONFIG_ATP is not set
# CONFIG_8139CP is not set
# CONFIG_8139TOO is not set
CONFIG_R8169=y
CONFIG_NET_VENDOR_RENESAS=y
CONFIG_NET_VENDOR_ROCKER=y
# CONFIG_ROCKER is not set
CONFIG_NET_VENDOR_SAMSUNG=y
# CONFIG_SXGBE_ETH is not set
CONFIG_NET_VENDOR_SEEQ=y
CONFIG_NET_VENDOR_SOLARFLARE=y
# CONFIG_SFC is not set
# CONFIG_SFC_FALCON is not set
CONFIG_NET_VENDOR_SILAN=y
# CONFIG_SC92031 is not set
CONFIG_NET_VENDOR_SIS=y
# CONFIG_SIS900 is not set
# CONFIG_SIS190 is not set
CONFIG_NET_VENDOR_SMSC=y
# CONFIG_EPIC100 is not set
# CONFIG_SMSC911X is not set
# CONFIG_SMSC9420 is not set
CONFIG_NET_VENDOR_SOCIONEXT=y
CONFIG_NET_VENDOR_STMICRO=y
# CONFIG_STMMAC_ETH is not set
CONFIG_NET_VENDOR_SUN=y
# CONFIG_HAPPYMEAL is not set
# CONFIG_SUNGEM is not set
# CONFIG_CASSINI is not set
# CONFIG_NIU is not set
CONFIG_NET_VENDOR_SYNOPSYS=y
# CONFIG_DWC_XLGMAC is not set
CONFIG_NET_VENDOR_TEHUTI=y
# CONFIG_TEHUTI is not set
CONFIG_NET_VENDOR_TI=y
# CONFIG_TI_CPSW_PHY_SEL is not set
# CONFIG_TLAN is not set
CONFIG_NET_VENDOR_VIA=y
# CONFIG_VIA_RHINE is not set
# CONFIG_VIA_VELOCITY is not set
CONFIG_NET_VENDOR_WIZNET=y
# CONFIG_WIZNET_W5100 is not set
# CONFIG_WIZNET_W5300 is not set
CONFIG_NET_VENDOR_XILINX=y
# CONFIG_XILINX_EMACLITE is not set
# CONFIG_XILINX_AXI_EMAC is not set
# CONFIG_XILINX_LL_TEMAC is not set
# CONFIG_FDDI is not set
# CONFIG_HIPPI is not set
# CONFIG_NET_SB1000 is not set
CONFIG_PHYLIB=y
# CONFIG_LED_TRIGGER_PHY is not set
# CONFIG_FIXED_PHY is not set

#
# MII PHY device drivers
#
# CONFIG_AMD_PHY is not set
# CONFIG_ADIN_PHY is not set
# CONFIG_AQUANTIA_PHY is not set
# CONFIG_AX88796B_PHY is not set
# CONFIG_BROADCOM_PHY is not set
# CONFIG_BCM54140_PHY is not set
# CONFIG_BCM7XXX_PHY is not set
# CONFIG_BCM84881_PHY is not set
# CONFIG_BCM87XX_PHY is not set
# CONFIG_CICADA_PHY is not set
# CONFIG_CORTINA_PHY is not set
# CONFIG_DAVICOM_PHY is not set
# CONFIG_ICPLUS_PHY is not set
# CONFIG_LXT_PHY is not set
# CONFIG_INTEL_XWAY_PHY is not set
# CONFIG_LSI_ET1011C_PHY is not set
# CONFIG_MARVELL_PHY is not set
# CONFIG_MARVELL_10G_PHY is not set
# CONFIG_MICREL_PHY is not set
# CONFIG_MICROCHIP_PHY is not set
# CONFIG_MICROCHIP_T1_PHY is not set
# CONFIG_MICROSEMI_PHY is not set
# CONFIG_NATIONAL_PHY is not set
# CONFIG_NXP_TJA11XX_PHY is not set
# CONFIG_QSEMI_PHY is not set
CONFIG_REALTEK_PHY=y
# CONFIG_RENESAS_PHY is not set
# CONFIG_ROCKCHIP_PHY is not set
# CONFIG_SMSC_PHY is not set
# CONFIG_STE10XP is not set
# CONFIG_TERANETICS_PHY is not set
# CONFIG_DP83822_PHY is not set
# CONFIG_DP83TC811_PHY is not set
# CONFIG_DP83848_PHY is not set
# CONFIG_DP83867_PHY is not set
# CONFIG_DP83869_PHY is not set
# CONFIG_VITESSE_PHY is not set
# CONFIG_XILINX_GMII2RGMII is not set
# CONFIG_MICREL_KS8995MA is not set
CONFIG_MDIO_DEVICE=y
CONFIG_MDIO_BUS=y
CONFIG_MDIO_DEVRES=y
# CONFIG_MDIO_BITBANG is not set
# CONFIG_MDIO_BCM_UNIMAC is not set
# CONFIG_MDIO_MVUSB is not set
# CONFIG_MDIO_MSCC_MIIM is not set
# CONFIG_MDIO_THUNDER is not set

#
# MDIO Multiplexers
#

#
# PCS device drivers
#
# CONFIG_PCS_XPCS is not set
# end of PCS device drivers

# CONFIG_PLIP is not set
# CONFIG_PPP is not set
# CONFIG_SLIP is not set
CONFIG_USB_NET_DRIVERS=y
# CONFIG_USB_CATC is not set
# CONFIG_USB_KAWETH is not set
# CONFIG_USB_PEGASUS is not set
# CONFIG_USB_RTL8150 is not set
CONFIG_USB_RTL8152=y
# CONFIG_USB_LAN78XX is not set
CONFIG_USB_USBNET=y
CONFIG_USB_NET_AX8817X=y
CONFIG_USB_NET_AX88179_178A=y
# CONFIG_USB_NET_CDCETHER is not set
# CONFIG_USB_NET_CDC_EEM is not set
# CONFIG_USB_NET_CDC_NCM is not set
# CONFIG_USB_NET_HUAWEI_CDC_NCM is not set
# CONFIG_USB_NET_CDC_MBIM is not set
# CONFIG_USB_NET_DM9601 is not set
# CONFIG_USB_NET_SR9700 is not set
# CONFIG_USB_NET_SR9800 is not set
# CONFIG_USB_NET_SMSC75XX is not set
# CONFIG_USB_NET_SMSC95XX is not set
# CONFIG_USB_NET_GL620A is not set
# CONFIG_USB_NET_NET1080 is not set
# CONFIG_USB_NET_PLUSB is not set
# CONFIG_USB_NET_MCS7830 is not set
# CONFIG_USB_NET_RNDIS_HOST is not set
# CONFIG_USB_NET_CDC_SUBSET is not set
# CONFIG_USB_NET_ZAURUS is not set
# CONFIG_USB_NET_CX82310_ETH is not set
# CONFIG_USB_NET_KALMIA is not set
# CONFIG_USB_NET_QMI_WWAN is not set
# CONFIG_USB_HSO is not set
# CONFIG_USB_NET_INT51X1 is not set
# CONFIG_USB_IPHETH is not set
# CONFIG_USB_SIERRA_NET is not set
# CONFIG_USB_NET_CH9200 is not set
# CONFIG_USB_NET_AQC111 is not set
CONFIG_WLAN=y
CONFIG_WLAN_VENDOR_ADMTEK=y
# CONFIG_ADM8211 is not set
CONFIG_WLAN_VENDOR_ATH=y
# CONFIG_ATH_DEBUG is not set
# CONFIG_ATH5K is not set
# CONFIG_ATH5K_PCI is not set
# CONFIG_ATH9K is not set
# CONFIG_ATH9K_HTC is not set
# CONFIG_CARL9170 is not set
# CONFIG_ATH6KL is not set
# CONFIG_AR5523 is not set
# CONFIG_WIL6210 is not set
# CONFIG_ATH10K is not set
# CONFIG_WCN36XX is not set
# CONFIG_ATH11K is not set
CONFIG_WLAN_VENDOR_ATMEL=y
# CONFIG_ATMEL is not set
# CONFIG_AT76C50X_USB is not set
CONFIG_WLAN_VENDOR_BROADCOM=y
# CONFIG_B43 is not set
# CONFIG_B43LEGACY is not set
# CONFIG_BRCMSMAC is not set
# CONFIG_BRCMFMAC is not set
CONFIG_WLAN_VENDOR_CISCO=y
# CONFIG_AIRO is not set
CONFIG_WLAN_VENDOR_INTEL=y
# CONFIG_IPW2100 is not set
# CONFIG_IPW2200 is not set
# CONFIG_IWL4965 is not set
# CONFIG_IWL3945 is not set
# CONFIG_IWLWIFI is not set
CONFIG_WLAN_VENDOR_INTERSIL=y
# CONFIG_HOSTAP is not set
# CONFIG_HERMES is not set
# CONFIG_P54_COMMON is not set
# CONFIG_PRISM54 is not set
CONFIG_WLAN_VENDOR_MARVELL=y
# CONFIG_LIBERTAS is not set
# CONFIG_LIBERTAS_THINFIRM is not set
# CONFIG_MWIFIEX is not set
# CONFIG_MWL8K is not set
CONFIG_WLAN_VENDOR_MEDIATEK=y
# CONFIG_MT7601U is not set
# CONFIG_MT76x0U is not set
# CONFIG_MT76x0E is not set
# CONFIG_MT76x2E is not set
# CONFIG_MT76x2U is not set
# CONFIG_MT7603E is not set
# CONFIG_MT7615E is not set
# CONFIG_MT7663U is not set
# CONFIG_MT7663S is not set
# CONFIG_MT7915E is not set
# CONFIG_MT7921E is not set
CONFIG_WLAN_VENDOR_MICROCHIP=y
# CONFIG_WILC1000_SDIO is not set
# CONFIG_WILC1000_SPI is not set
CONFIG_WLAN_VENDOR_RALINK=y
# CONFIG_RT2X00 is not set
CONFIG_WLAN_VENDOR_REALTEK=y
# CONFIG_RTL8180 is not set
# CONFIG_RTL8187 is not set
CONFIG_RTL_CARDS=m
# CONFIG_RTL8192CE is not set
# CONFIG_RTL8192SE is not set
# CONFIG_RTL8192DE is not set
# CONFIG_RTL8723AE is not set
# CONFIG_RTL8723BE is not set
# CONFIG_RTL8188EE is not set
# CONFIG_RTL8192EE is not set
# CONFIG_RTL8821AE is not set
# CONFIG_RTL8192CU is not set
# CONFIG_RTL8XXXU is not set
# CONFIG_RTW88 is not set
CONFIG_WLAN_VENDOR_RSI=y
# CONFIG_RSI_91X is not set
CONFIG_WLAN_VENDOR_ST=y
# CONFIG_CW1200 is not set
CONFIG_WLAN_VENDOR_TI=y
# CONFIG_WL1251 is not set
# CONFIG_WL12XX is not set
# CONFIG_WL18XX is not set
# CONFIG_WLCORE is not set
CONFIG_WLAN_VENDOR_ZYDAS=y
# CONFIG_USB_ZD1201 is not set
# CONFIG_ZD1211RW is not set
CONFIG_WLAN_VENDOR_QUANTENNA=y
# CONFIG_QTNFMAC_PCIE is not set
CONFIG_MAC80211_HWSIM=m
# CONFIG_USB_NET_RNDIS_WLAN is not set
# CONFIG_VIRT_WIFI is not set
# CONFIG_WAN is not set
CONFIG_IEEE802154_DRIVERS=m
# CONFIG_IEEE802154_FAKELB is not set
# CONFIG_IEEE802154_AT86RF230 is not set
# CONFIG_IEEE802154_MRF24J40 is not set
# CONFIG_IEEE802154_CC2520 is not set
# CONFIG_IEEE802154_ATUSB is not set
# CONFIG_IEEE802154_ADF7242 is not set
# CONFIG_IEEE802154_CA8210 is not set
# CONFIG_IEEE802154_MCR20A is not set
# CONFIG_IEEE802154_HWSIM is not set
CONFIG_XEN_NETDEV_FRONTEND=y
# CONFIG_VMXNET3 is not set
# CONFIG_FUJITSU_ES is not set
# CONFIG_HYPERV_NET is not set
CONFIG_NETDEVSIM=m
CONFIG_NET_FAILOVER=m
# CONFIG_ISDN is not set
# CONFIG_NVM is not set

#
# Input device support
#
CONFIG_INPUT=y
CONFIG_INPUT_LEDS=y
CONFIG_INPUT_FF_MEMLESS=m
CONFIG_INPUT_SPARSEKMAP=m
# CONFIG_INPUT_MATRIXKMAP is not set

#
# Userland interfaces
#
CONFIG_INPUT_MOUSEDEV=y
# CONFIG_INPUT_MOUSEDEV_PSAUX is not set
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
CONFIG_INPUT_JOYDEV=m
CONFIG_INPUT_EVDEV=y
# CONFIG_INPUT_EVBUG is not set

#
# Input Device Drivers
#
CONFIG_INPUT_KEYBOARD=y
# CONFIG_KEYBOARD_ADP5588 is not set
# CONFIG_KEYBOARD_ADP5589 is not set
# CONFIG_KEYBOARD_APPLESPI is not set
CONFIG_KEYBOARD_ATKBD=y
# CONFIG_KEYBOARD_QT1050 is not set
# CONFIG_KEYBOARD_QT1070 is not set
# CONFIG_KEYBOARD_QT2160 is not set
# CONFIG_KEYBOARD_DLINK_DIR685 is not set
# CONFIG_KEYBOARD_LKKBD is not set
# CONFIG_KEYBOARD_GPIO is not set
# CONFIG_KEYBOARD_GPIO_POLLED is not set
# CONFIG_KEYBOARD_TCA6416 is not set
# CONFIG_KEYBOARD_TCA8418 is not set
# CONFIG_KEYBOARD_MATRIX is not set
# CONFIG_KEYBOARD_LM8323 is not set
# CONFIG_KEYBOARD_LM8333 is not set
# CONFIG_KEYBOARD_MAX7359 is not set
# CONFIG_KEYBOARD_MCS is not set
# CONFIG_KEYBOARD_MPR121 is not set
# CONFIG_KEYBOARD_NEWTON is not set
# CONFIG_KEYBOARD_OPENCORES is not set
# CONFIG_KEYBOARD_SAMSUNG is not set
# CONFIG_KEYBOARD_STOWAWAY is not set
# CONFIG_KEYBOARD_SUNKBD is not set
# CONFIG_KEYBOARD_TM2_TOUCHKEY is not set
# CONFIG_KEYBOARD_XTKBD is not set
CONFIG_INPUT_MOUSE=y
CONFIG_MOUSE_PS2=y
CONFIG_MOUSE_PS2_ALPS=y
CONFIG_MOUSE_PS2_BYD=y
CONFIG_MOUSE_PS2_LOGIPS2PP=y
CONFIG_MOUSE_PS2_SYNAPTICS=y
CONFIG_MOUSE_PS2_SYNAPTICS_SMBUS=y
CONFIG_MOUSE_PS2_CYPRESS=y
CONFIG_MOUSE_PS2_LIFEBOOK=y
CONFIG_MOUSE_PS2_TRACKPOINT=y
CONFIG_MOUSE_PS2_ELANTECH=y
CONFIG_MOUSE_PS2_ELANTECH_SMBUS=y
CONFIG_MOUSE_PS2_SENTELIC=y
# CONFIG_MOUSE_PS2_TOUCHKIT is not set
CONFIG_MOUSE_PS2_FOCALTECH=y
CONFIG_MOUSE_PS2_VMMOUSE=y
CONFIG_MOUSE_PS2_SMBUS=y
CONFIG_MOUSE_SERIAL=m
# CONFIG_MOUSE_APPLETOUCH is not set
# CONFIG_MOUSE_BCM5974 is not set
CONFIG_MOUSE_CYAPA=m
CONFIG_MOUSE_ELAN_I2C=m
CONFIG_MOUSE_ELAN_I2C_I2C=y
CONFIG_MOUSE_ELAN_I2C_SMBUS=y
CONFIG_MOUSE_VSXXXAA=m
# CONFIG_MOUSE_GPIO is not set
CONFIG_MOUSE_SYNAPTICS_I2C=m
# CONFIG_MOUSE_SYNAPTICS_USB is not set
# CONFIG_INPUT_JOYSTICK is not set
# CONFIG_INPUT_TABLET is not set
# CONFIG_INPUT_TOUCHSCREEN is not set
# CONFIG_INPUT_MISC is not set
CONFIG_RMI4_CORE=m
CONFIG_RMI4_I2C=m
CONFIG_RMI4_SPI=m
CONFIG_RMI4_SMB=m
CONFIG_RMI4_F03=y
CONFIG_RMI4_F03_SERIO=m
CONFIG_RMI4_2D_SENSOR=y
CONFIG_RMI4_F11=y
CONFIG_RMI4_F12=y
CONFIG_RMI4_F30=y
CONFIG_RMI4_F34=y
# CONFIG_RMI4_F3A is not set
# CONFIG_RMI4_F54 is not set
CONFIG_RMI4_F55=y

#
# Hardware I/O ports
#
CONFIG_SERIO=y
CONFIG_ARCH_MIGHT_HAVE_PC_SERIO=y
CONFIG_SERIO_I8042=y
CONFIG_SERIO_SERPORT=y
# CONFIG_SERIO_CT82C710 is not set
# CONFIG_SERIO_PARKBD is not set
# CONFIG_SERIO_PCIPS2 is not set
CONFIG_SERIO_LIBPS2=y
CONFIG_SERIO_RAW=m
CONFIG_SERIO_ALTERA_PS2=m
# CONFIG_SERIO_PS2MULT is not set
CONFIG_SERIO_ARC_PS2=m
CONFIG_HYPERV_KEYBOARD=m
# CONFIG_SERIO_GPIO_PS2 is not set
# CONFIG_USERIO is not set
# CONFIG_GAMEPORT is not set
# end of Hardware I/O ports
# end of Input device support

#
# Character devices
#
CONFIG_TTY=y
CONFIG_VT=y
CONFIG_CONSOLE_TRANSLATIONS=y
CONFIG_VT_CONSOLE=y
CONFIG_VT_CONSOLE_SLEEP=y
CONFIG_HW_CONSOLE=y
CONFIG_VT_HW_CONSOLE_BINDING=y
CONFIG_UNIX98_PTYS=y
# CONFIG_LEGACY_PTYS is not set
CONFIG_LDISC_AUTOLOAD=y

#
# Serial drivers
#
CONFIG_SERIAL_EARLYCON=y
CONFIG_SERIAL_8250=y
# CONFIG_SERIAL_8250_DEPRECATED_OPTIONS is not set
CONFIG_SERIAL_8250_PNP=y
# CONFIG_SERIAL_8250_16550A_VARIANTS is not set
# CONFIG_SERIAL_8250_FINTEK is not set
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_SERIAL_8250_DMA=y
CONFIG_SERIAL_8250_PCI=y
CONFIG_SERIAL_8250_EXAR=y
CONFIG_SERIAL_8250_NR_UARTS=64
CONFIG_SERIAL_8250_RUNTIME_UARTS=4
CONFIG_SERIAL_8250_EXTENDED=y
CONFIG_SERIAL_8250_MANY_PORTS=y
CONFIG_SERIAL_8250_SHARE_IRQ=y
# CONFIG_SERIAL_8250_DETECT_IRQ is not set
CONFIG_SERIAL_8250_RSA=y
CONFIG_SERIAL_8250_DWLIB=y
CONFIG_SERIAL_8250_DW=y
# CONFIG_SERIAL_8250_RT288X is not set
CONFIG_SERIAL_8250_LPSS=y
CONFIG_SERIAL_8250_MID=y

#
# Non-8250 serial port support
#
# CONFIG_SERIAL_MAX3100 is not set
# CONFIG_SERIAL_MAX310X is not set
# CONFIG_SERIAL_UARTLITE is not set
CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_CORE_CONSOLE=y
CONFIG_SERIAL_JSM=m
# CONFIG_SERIAL_LANTIQ is not set
# CONFIG_SERIAL_SCCNXP is not set
# CONFIG_SERIAL_SC16IS7XX is not set
# CONFIG_SERIAL_BCM63XX is not set
# CONFIG_SERIAL_ALTERA_JTAGUART is not set
# CONFIG_SERIAL_ALTERA_UART is not set
CONFIG_SERIAL_ARC=m
CONFIG_SERIAL_ARC_NR_PORTS=1
# CONFIG_SERIAL_RP2 is not set
# CONFIG_SERIAL_FSL_LPUART is not set
# CONFIG_SERIAL_FSL_LINFLEXUART is not set
# CONFIG_SERIAL_SPRD is not set
# end of Serial drivers

CONFIG_SERIAL_MCTRL_GPIO=y
CONFIG_SERIAL_NONSTANDARD=y
# CONFIG_ROCKETPORT is not set
CONFIG_CYCLADES=m
# CONFIG_CYZ_INTR is not set
# CONFIG_MOXA_INTELLIO is not set
# CONFIG_MOXA_SMARTIO is not set
CONFIG_SYNCLINK_GT=m
# CONFIG_ISI is not set
CONFIG_N_HDLC=m
CONFIG_N_GSM=m
CONFIG_NOZOMI=m
# CONFIG_NULL_TTY is not set
# CONFIG_TRACE_SINK is not set
CONFIG_HVC_DRIVER=y
CONFIG_HVC_IRQ=y
CONFIG_HVC_XEN=y
CONFIG_HVC_XEN_FRONTEND=y
# CONFIG_SERIAL_DEV_BUS is not set
CONFIG_PRINTER=m
# CONFIG_LP_CONSOLE is not set
CONFIG_PPDEV=m
CONFIG_VIRTIO_CONSOLE=m
CONFIG_IPMI_HANDLER=m
CONFIG_IPMI_DMI_DECODE=y
CONFIG_IPMI_PLAT_DATA=y
CONFIG_IPMI_PANIC_EVENT=y
CONFIG_IPMI_PANIC_STRING=y
CONFIG_IPMI_DEVICE_INTERFACE=m
CONFIG_IPMI_SI=m
CONFIG_IPMI_SSIF=m
CONFIG_IPMI_WATCHDOG=m
CONFIG_IPMI_POWEROFF=m
CONFIG_HW_RANDOM=y
CONFIG_HW_RANDOM_TIMERIOMEM=m
CONFIG_HW_RANDOM_INTEL=m
CONFIG_HW_RANDOM_AMD=m
# CONFIG_HW_RANDOM_BA431 is not set
CONFIG_HW_RANDOM_VIA=m
CONFIG_HW_RANDOM_VIRTIO=y
# CONFIG_HW_RANDOM_XIPHERA is not set
# CONFIG_APPLICOM is not set
# CONFIG_MWAVE is not set
CONFIG_DEVMEM=y
# CONFIG_DEVKMEM is not set
CONFIG_NVRAM=y
CONFIG_RAW_DRIVER=y
CONFIG_MAX_RAW_DEVS=8192
CONFIG_DEVPORT=y
CONFIG_HPET=y
CONFIG_HPET_MMAP=y
# CONFIG_HPET_MMAP_DEFAULT is not set
CONFIG_HANGCHECK_TIMER=m
CONFIG_UV_MMTIMER=m
CONFIG_TCG_TPM=y
CONFIG_HW_RANDOM_TPM=y
CONFIG_TCG_TIS_CORE=y
CONFIG_TCG_TIS=y
# CONFIG_TCG_TIS_SPI is not set
# CONFIG_TCG_TIS_I2C_CR50 is not set
CONFIG_TCG_TIS_I2C_ATMEL=m
CONFIG_TCG_TIS_I2C_INFINEON=m
CONFIG_TCG_TIS_I2C_NUVOTON=m
CONFIG_TCG_NSC=m
CONFIG_TCG_ATMEL=m
CONFIG_TCG_INFINEON=m
# CONFIG_TCG_XEN is not set
CONFIG_TCG_CRB=y
# CONFIG_TCG_VTPM_PROXY is not set
CONFIG_TCG_TIS_ST33ZP24=m
CONFIG_TCG_TIS_ST33ZP24_I2C=m
# CONFIG_TCG_TIS_ST33ZP24_SPI is not set
CONFIG_TELCLOCK=m
# CONFIG_XILLYBUS is not set
# end of Character devices

# CONFIG_RANDOM_TRUST_CPU is not set
# CONFIG_RANDOM_TRUST_BOOTLOADER is not set

#
# I2C support
#
CONFIG_I2C=y
CONFIG_ACPI_I2C_OPREGION=y
CONFIG_I2C_BOARDINFO=y
CONFIG_I2C_COMPAT=y
CONFIG_I2C_CHARDEV=m
CONFIG_I2C_MUX=m

#
# Multiplexer I2C Chip support
#
# CONFIG_I2C_MUX_GPIO is not set
# CONFIG_I2C_MUX_LTC4306 is not set
# CONFIG_I2C_MUX_PCA9541 is not set
# CONFIG_I2C_MUX_PCA954x is not set
# CONFIG_I2C_MUX_REG is not set
CONFIG_I2C_MUX_MLXCPLD=m
# end of Multiplexer I2C Chip support

CONFIG_I2C_HELPER_AUTO=y
CONFIG_I2C_SMBUS=y
CONFIG_I2C_ALGOBIT=y
CONFIG_I2C_ALGOPCA=m

#
# I2C Hardware Bus support
#

#
# PC SMBus host controller drivers
#
# CONFIG_I2C_ALI1535 is not set
# CONFIG_I2C_ALI1563 is not set
# CONFIG_I2C_ALI15X3 is not set
CONFIG_I2C_AMD756=m
CONFIG_I2C_AMD756_S4882=m
CONFIG_I2C_AMD8111=m
# CONFIG_I2C_AMD_MP2 is not set
CONFIG_I2C_I801=y
CONFIG_I2C_ISCH=m
CONFIG_I2C_ISMT=m
CONFIG_I2C_PIIX4=m
CONFIG_I2C_NFORCE2=m
CONFIG_I2C_NFORCE2_S4985=m
# CONFIG_I2C_NVIDIA_GPU is not set
# CONFIG_I2C_SIS5595 is not set
# CONFIG_I2C_SIS630 is not set
CONFIG_I2C_SIS96X=m
CONFIG_I2C_VIA=m
CONFIG_I2C_VIAPRO=m

#
# ACPI drivers
#
CONFIG_I2C_SCMI=m

#
# I2C system bus drivers (mostly embedded / system-on-chip)
#
# CONFIG_I2C_CBUS_GPIO is not set
CONFIG_I2C_DESIGNWARE_CORE=m
# CONFIG_I2C_DESIGNWARE_SLAVE is not set
CONFIG_I2C_DESIGNWARE_PLATFORM=m
CONFIG_I2C_DESIGNWARE_BAYTRAIL=y
# CONFIG_I2C_DESIGNWARE_PCI is not set
# CONFIG_I2C_EMEV2 is not set
# CONFIG_I2C_GPIO is not set
# CONFIG_I2C_OCORES is not set
CONFIG_I2C_PCA_PLATFORM=m
CONFIG_I2C_SIMTEC=m
# CONFIG_I2C_XILINX is not set

#
# External I2C/SMBus adapter drivers
#
# CONFIG_I2C_DIOLAN_U2C is not set
CONFIG_I2C_PARPORT=m
# CONFIG_I2C_ROBOTFUZZ_OSIF is not set
# CONFIG_I2C_TAOS_EVM is not set
# CONFIG_I2C_TINY_USB is not set

#
# Other I2C/SMBus bus drivers
#
CONFIG_I2C_MLXCPLD=m
# end of I2C Hardware Bus support

CONFIG_I2C_STUB=m
# CONFIG_I2C_SLAVE is not set
# CONFIG_I2C_DEBUG_CORE is not set
# CONFIG_I2C_DEBUG_ALGO is not set
# CONFIG_I2C_DEBUG_BUS is not set
# end of I2C support

# CONFIG_I3C is not set
CONFIG_SPI=y
# CONFIG_SPI_DEBUG is not set
CONFIG_SPI_MASTER=y
# CONFIG_SPI_MEM is not set

#
# SPI Master Controller Drivers
#
# CONFIG_SPI_ALTERA is not set
# CONFIG_SPI_AXI_SPI_ENGINE is not set
# CONFIG_SPI_BITBANG is not set
# CONFIG_SPI_BUTTERFLY is not set
# CONFIG_SPI_CADENCE is not set
# CONFIG_SPI_DESIGNWARE is not set
# CONFIG_SPI_NXP_FLEXSPI is not set
# CONFIG_SPI_GPIO is not set
# CONFIG_SPI_LM70_LLP is not set
# CONFIG_SPI_LANTIQ_SSC is not set
# CONFIG_SPI_OC_TINY is not set
# CONFIG_SPI_PXA2XX is not set
# CONFIG_SPI_ROCKCHIP is not set
# CONFIG_SPI_SC18IS602 is not set
# CONFIG_SPI_SIFIVE is not set
# CONFIG_SPI_MXIC is not set
# CONFIG_SPI_XCOMM is not set
# CONFIG_SPI_XILINX is not set
# CONFIG_SPI_ZYNQMP_GQSPI is not set
# CONFIG_SPI_AMD is not set

#
# SPI Multiplexer support
#
# CONFIG_SPI_MUX is not set

#
# SPI Protocol Masters
#
# CONFIG_SPI_SPIDEV is not set
# CONFIG_SPI_LOOPBACK_TEST is not set
# CONFIG_SPI_TLE62X0 is not set
# CONFIG_SPI_SLAVE is not set
CONFIG_SPI_DYNAMIC=y
# CONFIG_SPMI is not set
# CONFIG_HSI is not set
CONFIG_PPS=y
# CONFIG_PPS_DEBUG is not set

#
# PPS clients support
#
# CONFIG_PPS_CLIENT_KTIMER is not set
CONFIG_PPS_CLIENT_LDISC=m
CONFIG_PPS_CLIENT_PARPORT=m
CONFIG_PPS_CLIENT_GPIO=m

#
# PPS generators support
#

#
# PTP clock support
#
CONFIG_PTP_1588_CLOCK=y
# CONFIG_DP83640_PHY is not set
# CONFIG_PTP_1588_CLOCK_INES is not set
CONFIG_PTP_1588_CLOCK_KVM=m
# CONFIG_PTP_1588_CLOCK_IDT82P33 is not set
# CONFIG_PTP_1588_CLOCK_IDTCM is not set
# CONFIG_PTP_1588_CLOCK_VMW is not set
# CONFIG_PTP_1588_CLOCK_OCP is not set
# end of PTP clock support

CONFIG_PINCTRL=y
CONFIG_PINMUX=y
CONFIG_PINCONF=y
CONFIG_GENERIC_PINCONF=y
# CONFIG_DEBUG_PINCTRL is not set
CONFIG_PINCTRL_AMD=m
# CONFIG_PINCTRL_MCP23S08 is not set
# CONFIG_PINCTRL_SX150X is not set
CONFIG_PINCTRL_BAYTRAIL=y
# CONFIG_PINCTRL_CHERRYVIEW is not set
# CONFIG_PINCTRL_LYNXPOINT is not set
CONFIG_PINCTRL_INTEL=y
# CONFIG_PINCTRL_ALDERLAKE is not set
CONFIG_PINCTRL_BROXTON=m
CONFIG_PINCTRL_CANNONLAKE=m
CONFIG_PINCTRL_CEDARFORK=m
CONFIG_PINCTRL_DENVERTON=m
# CONFIG_PINCTRL_ELKHARTLAKE is not set
# CONFIG_PINCTRL_EMMITSBURG is not set
CONFIG_PINCTRL_GEMINILAKE=m
# CONFIG_PINCTRL_ICELAKE is not set
# CONFIG_PINCTRL_JASPERLAKE is not set
# CONFIG_PINCTRL_LAKEFIELD is not set
CONFIG_PINCTRL_LEWISBURG=m
CONFIG_PINCTRL_SUNRISEPOINT=m
# CONFIG_PINCTRL_TIGERLAKE is not set

#
# Renesas pinctrl drivers
#
# end of Renesas pinctrl drivers

CONFIG_GPIOLIB=y
CONFIG_GPIOLIB_FASTPATH_LIMIT=512
CONFIG_GPIO_ACPI=y
CONFIG_GPIOLIB_IRQCHIP=y
# CONFIG_DEBUG_GPIO is not set
CONFIG_GPIO_CDEV=y
CONFIG_GPIO_CDEV_V1=y
CONFIG_GPIO_GENERIC=m

#
# Memory mapped GPIO drivers
#
CONFIG_GPIO_AMDPT=m
# CONFIG_GPIO_DWAPB is not set
# CONFIG_GPIO_EXAR is not set
# CONFIG_GPIO_GENERIC_PLATFORM is not set
CONFIG_GPIO_ICH=m
# CONFIG_GPIO_MB86S7X is not set
# CONFIG_GPIO_VX855 is not set
# CONFIG_GPIO_AMD_FCH is not set
# end of Memory mapped GPIO drivers

#
# Port-mapped I/O GPIO drivers
#
# CONFIG_GPIO_F7188X is not set
# CONFIG_GPIO_IT87 is not set
# CONFIG_GPIO_SCH is not set
# CONFIG_GPIO_SCH311X is not set
# CONFIG_GPIO_WINBOND is not set
# CONFIG_GPIO_WS16C48 is not set
# end of Port-mapped I/O GPIO drivers

#
# I2C GPIO expanders
#
# CONFIG_GPIO_ADP5588 is not set
# CONFIG_GPIO_MAX7300 is not set
# CONFIG_GPIO_MAX732X is not set
# CONFIG_GPIO_PCA953X is not set
# CONFIG_GPIO_PCA9570 is not set
# CONFIG_GPIO_PCF857X is not set
# CONFIG_GPIO_TPIC2810 is not set
# end of I2C GPIO expanders

#
# MFD GPIO expanders
#
# end of MFD GPIO expanders

#
# PCI GPIO expanders
#
# CONFIG_GPIO_AMD8111 is not set
# CONFIG_GPIO_BT8XX is not set
# CONFIG_GPIO_ML_IOH is not set
# CONFIG_GPIO_PCI_IDIO_16 is not set
# CONFIG_GPIO_PCIE_IDIO_24 is not set
# CONFIG_GPIO_RDC321X is not set
# end of PCI GPIO expanders

#
# SPI GPIO expanders
#
# CONFIG_GPIO_MAX3191X is not set
# CONFIG_GPIO_MAX7301 is not set
# CONFIG_GPIO_MC33880 is not set
# CONFIG_GPIO_PISOSR is not set
# CONFIG_GPIO_XRA1403 is not set
# end of SPI GPIO expanders

#
# USB GPIO expanders
#
# end of USB GPIO expanders

#
# Virtual GPIO drivers
#
# CONFIG_GPIO_AGGREGATOR is not set
# CONFIG_GPIO_MOCKUP is not set
# end of Virtual GPIO drivers

# CONFIG_W1 is not set
CONFIG_POWER_RESET=y
# CONFIG_POWER_RESET_RESTART is not set
CONFIG_POWER_SUPPLY=y
# CONFIG_POWER_SUPPLY_DEBUG is not set
CONFIG_POWER_SUPPLY_HWMON=y
# CONFIG_PDA_POWER is not set
# CONFIG_TEST_POWER is not set
# CONFIG_CHARGER_ADP5061 is not set
# CONFIG_BATTERY_CW2015 is not set
# CONFIG_BATTERY_DS2780 is not set
# CONFIG_BATTERY_DS2781 is not set
# CONFIG_BATTERY_DS2782 is not set
# CONFIG_BATTERY_SBS is not set
# CONFIG_CHARGER_SBS is not set
# CONFIG_MANAGER_SBS is not set
# CONFIG_BATTERY_BQ27XXX is not set
# CONFIG_BATTERY_MAX17040 is not set
# CONFIG_BATTERY_MAX17042 is not set
# CONFIG_CHARGER_MAX8903 is not set
# CONFIG_CHARGER_LP8727 is not set
# CONFIG_CHARGER_GPIO is not set
# CONFIG_CHARGER_LT3651 is not set
# CONFIG_CHARGER_LTC4162L is not set
# CONFIG_CHARGER_BQ2415X is not set
# CONFIG_CHARGER_BQ24257 is not set
# CONFIG_CHARGER_BQ24735 is not set
# CONFIG_CHARGER_BQ2515X is not set
# CONFIG_CHARGER_BQ25890 is not set
# CONFIG_CHARGER_BQ25980 is not set
# CONFIG_CHARGER_BQ256XX is not set
CONFIG_CHARGER_SMB347=m
# CONFIG_BATTERY_GAUGE_LTC2941 is not set
# CONFIG_CHARGER_RT9455 is not set
# CONFIG_CHARGER_BD99954 is not set
CONFIG_HWMON=y
CONFIG_HWMON_VID=m
# CONFIG_HWMON_DEBUG_CHIP is not set

#
# Native drivers
#
CONFIG_SENSORS_ABITUGURU=m
CONFIG_SENSORS_ABITUGURU3=m
# CONFIG_SENSORS_AD7314 is not set
CONFIG_SENSORS_AD7414=m
CONFIG_SENSORS_AD7418=m
CONFIG_SENSORS_ADM1021=m
CONFIG_SENSORS_ADM1025=m
CONFIG_SENSORS_ADM1026=m
CONFIG_SENSORS_ADM1029=m
CONFIG_SENSORS_ADM1031=m
# CONFIG_SENSORS_ADM1177 is not set
CONFIG_SENSORS_ADM9240=m
CONFIG_SENSORS_ADT7X10=m
# CONFIG_SENSORS_ADT7310 is not set
CONFIG_SENSORS_ADT7410=m
CONFIG_SENSORS_ADT7411=m
CONFIG_SENSORS_ADT7462=m
CONFIG_SENSORS_ADT7470=m
CONFIG_SENSORS_ADT7475=m
# CONFIG_SENSORS_AHT10 is not set
# CONFIG_SENSORS_AS370 is not set
CONFIG_SENSORS_ASC7621=m
# CONFIG_SENSORS_AXI_FAN_CONTROL is not set
CONFIG_SENSORS_K8TEMP=m
CONFIG_SENSORS_K10TEMP=m
CONFIG_SENSORS_FAM15H_POWER=m
# CONFIG_SENSORS_AMD_ENERGY is not set
CONFIG_SENSORS_APPLESMC=m
CONFIG_SENSORS_ASB100=m
# CONFIG_SENSORS_ASPEED is not set
CONFIG_SENSORS_ATXP1=m
# CONFIG_SENSORS_CORSAIR_CPRO is not set
# CONFIG_SENSORS_CORSAIR_PSU is not set
# CONFIG_SENSORS_DRIVETEMP is not set
CONFIG_SENSORS_DS620=m
CONFIG_SENSORS_DS1621=m
CONFIG_SENSORS_DELL_SMM=m
CONFIG_SENSORS_I5K_AMB=m
CONFIG_SENSORS_F71805F=m
CONFIG_SENSORS_F71882FG=m
CONFIG_SENSORS_F75375S=m
CONFIG_SENSORS_FSCHMD=m
# CONFIG_SENSORS_FTSTEUTATES is not set
CONFIG_SENSORS_GL518SM=m
CONFIG_SENSORS_GL520SM=m
CONFIG_SENSORS_G760A=m
# CONFIG_SENSORS_G762 is not set
# CONFIG_SENSORS_HIH6130 is not set
CONFIG_SENSORS_IBMAEM=m
CONFIG_SENSORS_IBMPEX=m
CONFIG_SENSORS_I5500=m
CONFIG_SENSORS_CORETEMP=m
CONFIG_SENSORS_IT87=m
CONFIG_SENSORS_JC42=m
# CONFIG_SENSORS_POWR1220 is not set
CONFIG_SENSORS_LINEAGE=m
# CONFIG_SENSORS_LTC2945 is not set
# CONFIG_SENSORS_LTC2947_I2C is not set
# CONFIG_SENSORS_LTC2947_SPI is not set
# CONFIG_SENSORS_LTC2990 is not set
# CONFIG_SENSORS_LTC2992 is not set
CONFIG_SENSORS_LTC4151=m
CONFIG_SENSORS_LTC4215=m
# CONFIG_SENSORS_LTC4222 is not set
CONFIG_SENSORS_LTC4245=m
# CONFIG_SENSORS_LTC4260 is not set
CONFIG_SENSORS_LTC4261=m
# CONFIG_SENSORS_MAX1111 is not set
# CONFIG_SENSORS_MAX127 is not set
CONFIG_SENSORS_MAX16065=m
CONFIG_SENSORS_MAX1619=m
CONFIG_SENSORS_MAX1668=m
CONFIG_SENSORS_MAX197=m
# CONFIG_SENSORS_MAX31722 is not set
# CONFIG_SENSORS_MAX31730 is not set
# CONFIG_SENSORS_MAX6621 is not set
CONFIG_SENSORS_MAX6639=m
CONFIG_SENSORS_MAX6642=m
CONFIG_SENSORS_MAX6650=m
CONFIG_SENSORS_MAX6697=m
# CONFIG_SENSORS_MAX31790 is not set
CONFIG_SENSORS_MCP3021=m
# CONFIG_SENSORS_MLXREG_FAN is not set
# CONFIG_SENSORS_TC654 is not set
# CONFIG_SENSORS_TPS23861 is not set
# CONFIG_SENSORS_MR75203 is not set
# CONFIG_SENSORS_ADCXX is not set
CONFIG_SENSORS_LM63=m
# CONFIG_SENSORS_LM70 is not set
CONFIG_SENSORS_LM73=m
CONFIG_SENSORS_LM75=m
CONFIG_SENSORS_LM77=m
CONFIG_SENSORS_LM78=m
CONFIG_SENSORS_LM80=m
CONFIG_SENSORS_LM83=m
CONFIG_SENSORS_LM85=m
CONFIG_SENSORS_LM87=m
CONFIG_SENSORS_LM90=m
CONFIG_SENSORS_LM92=m
CONFIG_SENSORS_LM93=m
CONFIG_SENSORS_LM95234=m
CONFIG_SENSORS_LM95241=m
CONFIG_SENSORS_LM95245=m
CONFIG_SENSORS_PC87360=m
CONFIG_SENSORS_PC87427=m
CONFIG_SENSORS_NTC_THERMISTOR=m
# CONFIG_SENSORS_NCT6683 is not set
CONFIG_SENSORS_NCT6775=m
# CONFIG_SENSORS_NCT7802 is not set
# CONFIG_SENSORS_NCT7904 is not set
# CONFIG_SENSORS_NPCM7XX is not set
CONFIG_SENSORS_PCF8591=m
CONFIG_PMBUS=m
CONFIG_SENSORS_PMBUS=m
# CONFIG_SENSORS_ADM1266 is not set
CONFIG_SENSORS_ADM1275=m
# CONFIG_SENSORS_BEL_PFE is not set
# CONFIG_SENSORS_IBM_CFFPS is not set
# CONFIG_SENSORS_INSPUR_IPSPS is not set
# CONFIG_SENSORS_IR35221 is not set
# CONFIG_SENSORS_IR38064 is not set
# CONFIG_SENSORS_IRPS5401 is not set
# CONFIG_SENSORS_ISL68137 is not set
CONFIG_SENSORS_LM25066=m
CONFIG_SENSORS_LTC2978=m
# CONFIG_SENSORS_LTC3815 is not set
CONFIG_SENSORS_MAX16064=m
# CONFIG_SENSORS_MAX16601 is not set
# CONFIG_SENSORS_MAX20730 is not set
# CONFIG_SENSORS_MAX20751 is not set
# CONFIG_SENSORS_MAX31785 is not set
CONFIG_SENSORS_MAX34440=m
CONFIG_SENSORS_MAX8688=m
# CONFIG_SENSORS_MP2975 is not set
# CONFIG_SENSORS_PM6764TR is not set
# CONFIG_SENSORS_PXE1610 is not set
# CONFIG_SENSORS_Q54SJ108A2 is not set
# CONFIG_SENSORS_TPS40422 is not set
# CONFIG_SENSORS_TPS53679 is not set
CONFIG_SENSORS_UCD9000=m
CONFIG_SENSORS_UCD9200=m
# CONFIG_SENSORS_XDPE122 is not set
CONFIG_SENSORS_ZL6100=m
# CONFIG_SENSORS_SBTSI is not set
CONFIG_SENSORS_SHT15=m
CONFIG_SENSORS_SHT21=m
# CONFIG_SENSORS_SHT3x is not set
# CONFIG_SENSORS_SHTC1 is not set
CONFIG_SENSORS_SIS5595=m
CONFIG_SENSORS_DME1737=m
CONFIG_SENSORS_EMC1403=m
# CONFIG_SENSORS_EMC2103 is not set
CONFIG_SENSORS_EMC6W201=m
CONFIG_SENSORS_SMSC47M1=m
CONFIG_SENSORS_SMSC47M192=m
CONFIG_SENSORS_SMSC47B397=m
CONFIG_SENSORS_SCH56XX_COMMON=m
CONFIG_SENSORS_SCH5627=m
CONFIG_SENSORS_SCH5636=m
# CONFIG_SENSORS_STTS751 is not set
# CONFIG_SENSORS_SMM665 is not set
# CONFIG_SENSORS_ADC128D818 is not set
CONFIG_SENSORS_ADS7828=m
# CONFIG_SENSORS_ADS7871 is not set
CONFIG_SENSORS_AMC6821=m
CONFIG_SENSORS_INA209=m
CONFIG_SENSORS_INA2XX=m
# CONFIG_SENSORS_INA3221 is not set
# CONFIG_SENSORS_TC74 is not set
CONFIG_SENSORS_THMC50=m
CONFIG_SENSORS_TMP102=m
# CONFIG_SENSORS_TMP103 is not set
# CONFIG_SENSORS_TMP108 is not set
CONFIG_SENSORS_TMP401=m
CONFIG_SENSORS_TMP421=m
# CONFIG_SENSORS_TMP513 is not set
CONFIG_SENSORS_VIA_CPUTEMP=m
CONFIG_SENSORS_VIA686A=m
CONFIG_SENSORS_VT1211=m
CONFIG_SENSORS_VT8231=m
# CONFIG_SENSORS_W83773G is not set
CONFIG_SENSORS_W83781D=m
CONFIG_SENSORS_W83791D=m
CONFIG_SENSORS_W83792D=m
CONFIG_SENSORS_W83793=m
CONFIG_SENSORS_W83795=m
# CONFIG_SENSORS_W83795_FANCTRL is not set
CONFIG_SENSORS_W83L785TS=m
CONFIG_SENSORS_W83L786NG=m
CONFIG_SENSORS_W83627HF=m
CONFIG_SENSORS_W83627EHF=m
# CONFIG_SENSORS_XGENE is not set

#
# ACPI drivers
#
CONFIG_SENSORS_ACPI_POWER=m
CONFIG_SENSORS_ATK0110=m
CONFIG_THERMAL=y
# CONFIG_THERMAL_NETLINK is not set
# CONFIG_THERMAL_STATISTICS is not set
CONFIG_THERMAL_EMERGENCY_POWEROFF_DELAY_MS=0
CONFIG_THERMAL_HWMON=y
CONFIG_THERMAL_WRITABLE_TRIPS=y
CONFIG_THERMAL_DEFAULT_GOV_STEP_WISE=y
# CONFIG_THERMAL_DEFAULT_GOV_FAIR_SHARE is not set
# CONFIG_THERMAL_DEFAULT_GOV_USER_SPACE is not set
CONFIG_THERMAL_GOV_FAIR_SHARE=y
CONFIG_THERMAL_GOV_STEP_WISE=y
CONFIG_THERMAL_GOV_BANG_BANG=y
CONFIG_THERMAL_GOV_USER_SPACE=y
# CONFIG_THERMAL_EMULATION is not set

#
# Intel thermal drivers
#
CONFIG_INTEL_POWERCLAMP=m
CONFIG_X86_THERMAL_VECTOR=y
CONFIG_X86_PKG_TEMP_THERMAL=m
CONFIG_INTEL_SOC_DTS_IOSF_CORE=m
# CONFIG_INTEL_SOC_DTS_THERMAL is not set

#
# ACPI INT340X thermal drivers
#
CONFIG_INT340X_THERMAL=m
CONFIG_ACPI_THERMAL_REL=m
# CONFIG_INT3406_THERMAL is not set
CONFIG_PROC_THERMAL_MMIO_RAPL=m
# end of ACPI INT340X thermal drivers

CONFIG_INTEL_PCH_THERMAL=m
# end of Intel thermal drivers

CONFIG_WATCHDOG=y
CONFIG_WATCHDOG_CORE=y
# CONFIG_WATCHDOG_NOWAYOUT is not set
CONFIG_WATCHDOG_HANDLE_BOOT_ENABLED=y
CONFIG_WATCHDOG_OPEN_TIMEOUT=0
CONFIG_WATCHDOG_SYSFS=y

#
# Watchdog Pretimeout Governors
#
# CONFIG_WATCHDOG_PRETIMEOUT_GOV is not set

#
# Watchdog Device Drivers
#
CONFIG_SOFT_WATCHDOG=m
CONFIG_WDAT_WDT=m
# CONFIG_XILINX_WATCHDOG is not set
# CONFIG_ZIIRAVE_WATCHDOG is not set
# CONFIG_MLX_WDT is not set
# CONFIG_CADENCE_WATCHDOG is not set
# CONFIG_DW_WATCHDOG is not set
# CONFIG_MAX63XX_WATCHDOG is not set
# CONFIG_ACQUIRE_WDT is not set
# CONFIG_ADVANTECH_WDT is not set
CONFIG_ALIM1535_WDT=m
CONFIG_ALIM7101_WDT=m
# CONFIG_EBC_C384_WDT is not set
CONFIG_F71808E_WDT=m
CONFIG_SP5100_TCO=m
CONFIG_SBC_FITPC2_WATCHDOG=m
# CONFIG_EUROTECH_WDT is not set
CONFIG_IB700_WDT=m
CONFIG_IBMASR=m
# CONFIG_WAFER_WDT is not set
CONFIG_I6300ESB_WDT=y
CONFIG_IE6XX_WDT=m
CONFIG_ITCO_WDT=y
CONFIG_ITCO_VENDOR_SUPPORT=y
CONFIG_IT8712F_WDT=m
CONFIG_IT87_WDT=m
CONFIG_HP_WATCHDOG=m
CONFIG_HPWDT_NMI_DECODING=y
# CONFIG_SC1200_WDT is not set
# CONFIG_PC87413_WDT is not set
CONFIG_NV_TCO=m
# CONFIG_60XX_WDT is not set
# CONFIG_CPU5_WDT is not set
CONFIG_SMSC_SCH311X_WDT=m
# CONFIG_SMSC37B787_WDT is not set
# CONFIG_TQMX86_WDT is not set
CONFIG_VIA_WDT=m
CONFIG_W83627HF_WDT=m
CONFIG_W83877F_WDT=m
CONFIG_W83977F_WDT=m
CONFIG_MACHZ_WDT=m
# CONFIG_SBC_EPX_C3_WATCHDOG is not set
CONFIG_INTEL_MEI_WDT=m
# CONFIG_NI903X_WDT is not set
# CONFIG_NIC7018_WDT is not set
# CONFIG_MEN_A21_WDT is not set
CONFIG_XEN_WDT=m

#
# PCI-based Watchdog Cards
#
CONFIG_PCIPCWATCHDOG=m
CONFIG_WDTPCI=m

#
# USB-based Watchdog Cards
#
# CONFIG_USBPCWATCHDOG is not set
CONFIG_SSB_POSSIBLE=y
# CONFIG_SSB is not set
CONFIG_BCMA_POSSIBLE=y
CONFIG_BCMA=m
CONFIG_BCMA_HOST_PCI_POSSIBLE=y
CONFIG_BCMA_HOST_PCI=y
# CONFIG_BCMA_HOST_SOC is not set
CONFIG_BCMA_DRIVER_PCI=y
CONFIG_BCMA_DRIVER_GMAC_CMN=y
CONFIG_BCMA_DRIVER_GPIO=y
# CONFIG_BCMA_DEBUG is not set

#
# Multifunction device drivers
#
CONFIG_MFD_CORE=y
# CONFIG_MFD_AS3711 is not set
# CONFIG_PMIC_ADP5520 is not set
# CONFIG_MFD_AAT2870_CORE is not set
# CONFIG_MFD_BCM590XX is not set
# CONFIG_MFD_BD9571MWV is not set
# CONFIG_MFD_AXP20X_I2C is not set
# CONFIG_MFD_MADERA is not set
# CONFIG_PMIC_DA903X is not set
# CONFIG_MFD_DA9052_SPI is not set
# CONFIG_MFD_DA9052_I2C is not set
# CONFIG_MFD_DA9055 is not set
# CONFIG_MFD_DA9062 is not set
# CONFIG_MFD_DA9063 is not set
# CONFIG_MFD_DA9150 is not set
# CONFIG_MFD_DLN2 is not set
# CONFIG_MFD_MC13XXX_SPI is not set
# CONFIG_MFD_MC13XXX_I2C is not set
# CONFIG_MFD_MP2629 is not set
# CONFIG_HTC_PASIC3 is not set
# CONFIG_HTC_I2CPLD is not set
# CONFIG_MFD_INTEL_QUARK_I2C_GPIO is not set
CONFIG_LPC_ICH=y
CONFIG_LPC_SCH=m
# CONFIG_INTEL_SOC_PMIC_CHTDC_TI is not set
CONFIG_MFD_INTEL_LPSS=y
CONFIG_MFD_INTEL_LPSS_ACPI=y
CONFIG_MFD_INTEL_LPSS_PCI=y
# CONFIG_MFD_INTEL_PMC_BXT is not set
# CONFIG_MFD_INTEL_PMT is not set
# CONFIG_MFD_IQS62X is not set
# CONFIG_MFD_JANZ_CMODIO is not set
# CONFIG_MFD_KEMPLD is not set
# CONFIG_MFD_88PM800 is not set
# CONFIG_MFD_88PM805 is not set
# CONFIG_MFD_88PM860X is not set
# CONFIG_MFD_MAX14577 is not set
# CONFIG_MFD_MAX77693 is not set
# CONFIG_MFD_MAX77843 is not set
# CONFIG_MFD_MAX8907 is not set
# CONFIG_MFD_MAX8925 is not set
# CONFIG_MFD_MAX8997 is not set
# CONFIG_MFD_MAX8998 is not set
# CONFIG_MFD_MT6360 is not set
# CONFIG_MFD_MT6397 is not set
# CONFIG_MFD_MENF21BMC is not set
# CONFIG_EZX_PCAP is not set
# CONFIG_MFD_VIPERBOARD is not set
# CONFIG_MFD_RETU is not set
# CONFIG_MFD_PCF50633 is not set
# CONFIG_MFD_RDC321X is not set
# CONFIG_MFD_RT5033 is not set
# CONFIG_MFD_RC5T583 is not set
# CONFIG_MFD_SEC_CORE is not set
# CONFIG_MFD_SI476X_CORE is not set
CONFIG_MFD_SM501=m
CONFIG_MFD_SM501_GPIO=y
# CONFIG_MFD_SKY81452 is not set
# CONFIG_ABX500_CORE is not set
# CONFIG_MFD_SYSCON is not set
# CONFIG_MFD_TI_AM335X_TSCADC is not set
# CONFIG_MFD_LP3943 is not set
# CONFIG_MFD_LP8788 is not set
# CONFIG_MFD_TI_LMU is not set
# CONFIG_MFD_PALMAS is not set
# CONFIG_TPS6105X is not set
# CONFIG_TPS65010 is not set
# CONFIG_TPS6507X is not set
# CONFIG_MFD_TPS65086 is not set
# CONFIG_MFD_TPS65090 is not set
# CONFIG_MFD_TI_LP873X is not set
# CONFIG_MFD_TPS6586X is not set
# CONFIG_MFD_TPS65910 is not set
# CONFIG_MFD_TPS65912_I2C is not set
# CONFIG_MFD_TPS65912_SPI is not set
# CONFIG_MFD_TPS80031 is not set
# CONFIG_TWL4030_CORE is not set
# CONFIG_TWL6040_CORE is not set
# CONFIG_MFD_WL1273_CORE is not set
# CONFIG_MFD_LM3533 is not set
# CONFIG_MFD_TQMX86 is not set
CONFIG_MFD_VX855=m
# CONFIG_MFD_ARIZONA_I2C is not set
# CONFIG_MFD_ARIZONA_SPI is not set
# CONFIG_MFD_WM8400 is not set
# CONFIG_MFD_WM831X_I2C is not set
# CONFIG_MFD_WM831X_SPI is not set
# CONFIG_MFD_WM8350_I2C is not set
# CONFIG_MFD_WM8994 is not set
# CONFIG_MFD_INTEL_M10_BMC is not set
# end of Multifunction device drivers

# CONFIG_REGULATOR is not set
CONFIG_RC_CORE=m
CONFIG_RC_MAP=m
CONFIG_LIRC=y
CONFIG_RC_DECODERS=y
CONFIG_IR_NEC_DECODER=m
CONFIG_IR_RC5_DECODER=m
CONFIG_IR_RC6_DECODER=m
CONFIG_IR_JVC_DECODER=m
CONFIG_IR_SONY_DECODER=m
CONFIG_IR_SANYO_DECODER=m
# CONFIG_IR_SHARP_DECODER is not set
CONFIG_IR_MCE_KBD_DECODER=m
# CONFIG_IR_XMP_DECODER is not set
CONFIG_IR_IMON_DECODER=m
# CONFIG_IR_RCMM_DECODER is not set
CONFIG_RC_DEVICES=y
# CONFIG_RC_ATI_REMOTE is not set
CONFIG_IR_ENE=m
# CONFIG_IR_IMON is not set
# CONFIG_IR_IMON_RAW is not set
# CONFIG_IR_MCEUSB is not set
CONFIG_IR_ITE_CIR=m
CONFIG_IR_FINTEK=m
CONFIG_IR_NUVOTON=m
# CONFIG_IR_REDRAT3 is not set
# CONFIG_IR_STREAMZAP is not set
CONFIG_IR_WINBOND_CIR=m
# CONFIG_IR_IGORPLUGUSB is not set
# CONFIG_IR_IGUANA is not set
# CONFIG_IR_TTUSBIR is not set
# CONFIG_RC_LOOPBACK is not set
CONFIG_IR_SERIAL=m
CONFIG_IR_SERIAL_TRANSMITTER=y
CONFIG_IR_SIR=m
# CONFIG_RC_XBOX_DVD is not set
# CONFIG_IR_TOY is not set
CONFIG_MEDIA_CEC_SUPPORT=y
# CONFIG_CEC_CH7322 is not set
# CONFIG_CEC_SECO is not set
# CONFIG_USB_PULSE8_CEC is not set
# CONFIG_USB_RAINSHADOW_CEC is not set
CONFIG_MEDIA_SUPPORT=m
# CONFIG_MEDIA_SUPPORT_FILTER is not set
# CONFIG_MEDIA_SUBDRV_AUTOSELECT is not set

#
# Media device types
#
CONFIG_MEDIA_CAMERA_SUPPORT=y
CONFIG_MEDIA_ANALOG_TV_SUPPORT=y
CONFIG_MEDIA_DIGITAL_TV_SUPPORT=y
CONFIG_MEDIA_RADIO_SUPPORT=y
CONFIG_MEDIA_SDR_SUPPORT=y
CONFIG_MEDIA_PLATFORM_SUPPORT=y
CONFIG_MEDIA_TEST_SUPPORT=y
# end of Media device types

#
# Media core support
#
CONFIG_VIDEO_DEV=m
CONFIG_MEDIA_CONTROLLER=y
CONFIG_DVB_CORE=m
# end of Media core support

#
# Video4Linux options
#
CONFIG_VIDEO_V4L2=m
CONFIG_VIDEO_V4L2_I2C=y
CONFIG_VIDEO_V4L2_SUBDEV_API=y
# CONFIG_VIDEO_ADV_DEBUG is not set
# CONFIG_VIDEO_FIXED_MINOR_RANGES is not set
# end of Video4Linux options

#
# Media controller options
#
# CONFIG_MEDIA_CONTROLLER_DVB is not set
# end of Media controller options

#
# Digital TV options
#
# CONFIG_DVB_MMAP is not set
CONFIG_DVB_NET=y
CONFIG_DVB_MAX_ADAPTERS=16
CONFIG_DVB_DYNAMIC_MINORS=y
# CONFIG_DVB_DEMUX_SECTION_LOSS_LOG is not set
# CONFIG_DVB_ULE_DEBUG is not set
# end of Digital TV options

#
# Media drivers
#
# CONFIG_MEDIA_USB_SUPPORT is not set
# CONFIG_MEDIA_PCI_SUPPORT is not set
CONFIG_RADIO_ADAPTERS=y
# CONFIG_RADIO_SI470X is not set
# CONFIG_RADIO_SI4713 is not set
# CONFIG_USB_MR800 is not set
# CONFIG_USB_DSBR is not set
# CONFIG_RADIO_MAXIRADIO is not set
# CONFIG_RADIO_SHARK is not set
# CONFIG_RADIO_SHARK2 is not set
# CONFIG_USB_KEENE is not set
# CONFIG_USB_RAREMONO is not set
# CONFIG_USB_MA901 is not set
# CONFIG_RADIO_TEA5764 is not set
# CONFIG_RADIO_SAA7706H is not set
# CONFIG_RADIO_TEF6862 is not set
# CONFIG_RADIO_WL1273 is not set
CONFIG_VIDEOBUF2_CORE=m
CONFIG_VIDEOBUF2_V4L2=m
CONFIG_VIDEOBUF2_MEMOPS=m
CONFIG_VIDEOBUF2_VMALLOC=m
# CONFIG_V4L_PLATFORM_DRIVERS is not set
# CONFIG_V4L_MEM2MEM_DRIVERS is not set
# CONFIG_DVB_PLATFORM_DRIVERS is not set
# CONFIG_SDR_PLATFORM_DRIVERS is not set

#
# MMC/SDIO DVB adapters
#
# CONFIG_SMS_SDIO_DRV is not set
# CONFIG_V4L_TEST_DRIVERS is not set
# CONFIG_DVB_TEST_DRIVERS is not set

#
# FireWire (IEEE 1394) Adapters
#
# CONFIG_DVB_FIREDTV is not set
# end of Media drivers

#
# Media ancillary drivers
#
CONFIG_MEDIA_ATTACH=y
CONFIG_VIDEO_IR_I2C=m

#
# Audio decoders, processors and mixers
#
# CONFIG_VIDEO_TVAUDIO is not set
# CONFIG_VIDEO_TDA7432 is not set
# CONFIG_VIDEO_TDA9840 is not set
# CONFIG_VIDEO_TEA6415C is not set
# CONFIG_VIDEO_TEA6420 is not set
# CONFIG_VIDEO_MSP3400 is not set
# CONFIG_VIDEO_CS3308 is not set
# CONFIG_VIDEO_CS5345 is not set
# CONFIG_VIDEO_CS53L32A is not set
# CONFIG_VIDEO_TLV320AIC23B is not set
# CONFIG_VIDEO_UDA1342 is not set
# CONFIG_VIDEO_WM8775 is not set
# CONFIG_VIDEO_WM8739 is not set
# CONFIG_VIDEO_VP27SMPX is not set
# CONFIG_VIDEO_SONY_BTF_MPX is not set
# end of Audio decoders, processors and mixers

#
# RDS decoders
#
# CONFIG_VIDEO_SAA6588 is not set
# end of RDS decoders

#
# Video decoders
#
# CONFIG_VIDEO_ADV7180 is not set
# CONFIG_VIDEO_ADV7183 is not set
# CONFIG_VIDEO_ADV7604 is not set
# CONFIG_VIDEO_ADV7842 is not set
# CONFIG_VIDEO_BT819 is not set
# CONFIG_VIDEO_BT856 is not set
# CONFIG_VIDEO_BT866 is not set
# CONFIG_VIDEO_KS0127 is not set
# CONFIG_VIDEO_ML86V7667 is not set
# CONFIG_VIDEO_SAA7110 is not set
# CONFIG_VIDEO_SAA711X is not set
# CONFIG_VIDEO_TC358743 is not set
# CONFIG_VIDEO_TVP514X is not set
# CONFIG_VIDEO_TVP5150 is not set
# CONFIG_VIDEO_TVP7002 is not set
# CONFIG_VIDEO_TW2804 is not set
# CONFIG_VIDEO_TW9903 is not set
# CONFIG_VIDEO_TW9906 is not set
# CONFIG_VIDEO_TW9910 is not set
# CONFIG_VIDEO_VPX3220 is not set

#
# Video and audio decoders
#
# CONFIG_VIDEO_SAA717X is not set
# CONFIG_VIDEO_CX25840 is not set
# end of Video decoders

#
# Video encoders
#
# CONFIG_VIDEO_SAA7127 is not set
# CONFIG_VIDEO_SAA7185 is not set
# CONFIG_VIDEO_ADV7170 is not set
# CONFIG_VIDEO_ADV7175 is not set
# CONFIG_VIDEO_ADV7343 is not set
# CONFIG_VIDEO_ADV7393 is not set
# CONFIG_VIDEO_ADV7511 is not set
# CONFIG_VIDEO_AD9389B is not set
# CONFIG_VIDEO_AK881X is not set
# CONFIG_VIDEO_THS8200 is not set
# end of Video encoders

#
# Video improvement chips
#
# CONFIG_VIDEO_UPD64031A is not set
# CONFIG_VIDEO_UPD64083 is not set
# end of Video improvement chips

#
# Audio/Video compression chips
#
# CONFIG_VIDEO_SAA6752HS is not set
# end of Audio/Video compression chips

#
# SDR tuner chips
#
# CONFIG_SDR_MAX2175 is not set
# end of SDR tuner chips

#
# Miscellaneous helper chips
#
# CONFIG_VIDEO_THS7303 is not set
# CONFIG_VIDEO_M52790 is not set
# CONFIG_VIDEO_I2C is not set
# CONFIG_VIDEO_ST_MIPID02 is not set
# end of Miscellaneous helper chips

#
# Camera sensor devices
#
# CONFIG_VIDEO_HI556 is not set
# CONFIG_VIDEO_IMX214 is not set
# CONFIG_VIDEO_IMX219 is not set
# CONFIG_VIDEO_IMX258 is not set
# CONFIG_VIDEO_IMX274 is not set
# CONFIG_VIDEO_IMX290 is not set
# CONFIG_VIDEO_IMX319 is not set
# CONFIG_VIDEO_IMX355 is not set
# CONFIG_VIDEO_OV02A10 is not set
# CONFIG_VIDEO_OV2640 is not set
# CONFIG_VIDEO_OV2659 is not set
# CONFIG_VIDEO_OV2680 is not set
# CONFIG_VIDEO_OV2685 is not set
# CONFIG_VIDEO_OV2740 is not set
# CONFIG_VIDEO_OV5647 is not set
# CONFIG_VIDEO_OV5648 is not set
# CONFIG_VIDEO_OV6650 is not set
# CONFIG_VIDEO_OV5670 is not set
# CONFIG_VIDEO_OV5675 is not set
# CONFIG_VIDEO_OV5695 is not set
# CONFIG_VIDEO_OV7251 is not set
# CONFIG_VIDEO_OV772X is not set
# CONFIG_VIDEO_OV7640 is not set
# CONFIG_VIDEO_OV7670 is not set
# CONFIG_VIDEO_OV7740 is not set
# CONFIG_VIDEO_OV8856 is not set
# CONFIG_VIDEO_OV8865 is not set
# CONFIG_VIDEO_OV9640 is not set
# CONFIG_VIDEO_OV9650 is not set
# CONFIG_VIDEO_OV9734 is not set
# CONFIG_VIDEO_OV13858 is not set
# CONFIG_VIDEO_VS6624 is not set
# CONFIG_VIDEO_MT9M001 is not set
# CONFIG_VIDEO_MT9M032 is not set
# CONFIG_VIDEO_MT9M111 is not set
# CONFIG_VIDEO_MT9P031 is not set
# CONFIG_VIDEO_MT9T001 is not set
# CONFIG_VIDEO_MT9T112 is not set
# CONFIG_VIDEO_MT9V011 is not set
# CONFIG_VIDEO_MT9V032 is not set
# CONFIG_VIDEO_MT9V111 is not set
# CONFIG_VIDEO_SR030PC30 is not set
# CONFIG_VIDEO_NOON010PC30 is not set
# CONFIG_VIDEO_M5MOLS is not set
# CONFIG_VIDEO_RDACM20 is not set
# CONFIG_VIDEO_RDACM21 is not set
# CONFIG_VIDEO_RJ54N1 is not set
# CONFIG_VIDEO_S5K6AA is not set
# CONFIG_VIDEO_S5K6A3 is not set
# CONFIG_VIDEO_S5K4ECGX is not set
# CONFIG_VIDEO_S5K5BAF is not set
# CONFIG_VIDEO_CCS is not set
# CONFIG_VIDEO_ET8EK8 is not set
# CONFIG_VIDEO_S5C73M3 is not set
# end of Camera sensor devices

#
# Lens drivers
#
# CONFIG_VIDEO_AD5820 is not set
# CONFIG_VIDEO_AK7375 is not set
# CONFIG_VIDEO_DW9714 is not set
# CONFIG_VIDEO_DW9768 is not set
# CONFIG_VIDEO_DW9807_VCM is not set
# end of Lens drivers

#
# Flash devices
#
# CONFIG_VIDEO_ADP1653 is not set
# CONFIG_VIDEO_LM3560 is not set
# CONFIG_VIDEO_LM3646 is not set
# end of Flash devices

#
# SPI helper chips
#
# CONFIG_VIDEO_GS1662 is not set
# end of SPI helper chips

#
# Media SPI Adapters
#
CONFIG_CXD2880_SPI_DRV=m
# end of Media SPI Adapters

CONFIG_MEDIA_TUNER=m

#
# Customize TV tuners
#
CONFIG_MEDIA_TUNER_SIMPLE=m
CONFIG_MEDIA_TUNER_TDA18250=m
CONFIG_MEDIA_TUNER_TDA8290=m
CONFIG_MEDIA_TUNER_TDA827X=m
CONFIG_MEDIA_TUNER_TDA18271=m
CONFIG_MEDIA_TUNER_TDA9887=m
CONFIG_MEDIA_TUNER_TEA5761=m
CONFIG_MEDIA_TUNER_TEA5767=m
CONFIG_MEDIA_TUNER_MSI001=m
CONFIG_MEDIA_TUNER_MT20XX=m
CONFIG_MEDIA_TUNER_MT2060=m
CONFIG_MEDIA_TUNER_MT2063=m
CONFIG_MEDIA_TUNER_MT2266=m
CONFIG_MEDIA_TUNER_MT2131=m
CONFIG_MEDIA_TUNER_QT1010=m
CONFIG_MEDIA_TUNER_XC2028=m
CONFIG_MEDIA_TUNER_XC5000=m
CONFIG_MEDIA_TUNER_XC4000=m
CONFIG_MEDIA_TUNER_MXL5005S=m
CONFIG_MEDIA_TUNER_MXL5007T=m
CONFIG_MEDIA_TUNER_MC44S803=m
CONFIG_MEDIA_TUNER_MAX2165=m
CONFIG_MEDIA_TUNER_TDA18218=m
CONFIG_MEDIA_TUNER_FC0011=m
CONFIG_MEDIA_TUNER_FC0012=m
CONFIG_MEDIA_TUNER_FC0013=m
CONFIG_MEDIA_TUNER_TDA18212=m
CONFIG_MEDIA_TUNER_E4000=m
CONFIG_MEDIA_TUNER_FC2580=m
CONFIG_MEDIA_TUNER_M88RS6000T=m
CONFIG_MEDIA_TUNER_TUA9001=m
CONFIG_MEDIA_TUNER_SI2157=m
CONFIG_MEDIA_TUNER_IT913X=m
CONFIG_MEDIA_TUNER_R820T=m
CONFIG_MEDIA_TUNER_MXL301RF=m
CONFIG_MEDIA_TUNER_QM1D1C0042=m
CONFIG_MEDIA_TUNER_QM1D1B0004=m
# end of Customize TV tuners

#
# Customise DVB Frontends
#

#
# Multistandard (satellite) frontends
#
CONFIG_DVB_STB0899=m
CONFIG_DVB_STB6100=m
CONFIG_DVB_STV090x=m
CONFIG_DVB_STV0910=m
CONFIG_DVB_STV6110x=m
CONFIG_DVB_STV6111=m
CONFIG_DVB_MXL5XX=m
CONFIG_DVB_M88DS3103=m

#
# Multistandard (cable + terrestrial) frontends
#
CONFIG_DVB_DRXK=m
CONFIG_DVB_TDA18271C2DD=m
CONFIG_DVB_SI2165=m
CONFIG_DVB_MN88472=m
CONFIG_DVB_MN88473=m

#
# DVB-S (satellite) frontends
#
CONFIG_DVB_CX24110=m
CONFIG_DVB_CX24123=m
CONFIG_DVB_MT312=m
CONFIG_DVB_ZL10036=m
CONFIG_DVB_ZL10039=m
CONFIG_DVB_S5H1420=m
CONFIG_DVB_STV0288=m
CONFIG_DVB_STB6000=m
CONFIG_DVB_STV0299=m
CONFIG_DVB_STV6110=m
CONFIG_DVB_STV0900=m
CONFIG_DVB_TDA8083=m
CONFIG_DVB_TDA10086=m
CONFIG_DVB_TDA8261=m
CONFIG_DVB_VES1X93=m
CONFIG_DVB_TUNER_ITD1000=m
CONFIG_DVB_TUNER_CX24113=m
CONFIG_DVB_TDA826X=m
CONFIG_DVB_TUA6100=m
CONFIG_DVB_CX24116=m
CONFIG_DVB_CX24117=m
CONFIG_DVB_CX24120=m
CONFIG_DVB_SI21XX=m
CONFIG_DVB_TS2020=m
CONFIG_DVB_DS3000=m
CONFIG_DVB_MB86A16=m
CONFIG_DVB_TDA10071=m

#
# DVB-T (terrestrial) frontends
#
CONFIG_DVB_SP8870=m
CONFIG_DVB_SP887X=m
CONFIG_DVB_CX22700=m
CONFIG_DVB_CX22702=m
CONFIG_DVB_S5H1432=m
CONFIG_DVB_DRXD=m
CONFIG_DVB_L64781=m
CONFIG_DVB_TDA1004X=m
CONFIG_DVB_NXT6000=m
CONFIG_DVB_MT352=m
CONFIG_DVB_ZL10353=m
CONFIG_DVB_DIB3000MB=m
CONFIG_DVB_DIB3000MC=m
CONFIG_DVB_DIB7000M=m
CONFIG_DVB_DIB7000P=m
CONFIG_DVB_DIB9000=m
CONFIG_DVB_TDA10048=m
CONFIG_DVB_AF9013=m
CONFIG_DVB_EC100=m
CONFIG_DVB_STV0367=m
CONFIG_DVB_CXD2820R=m
CONFIG_DVB_CXD2841ER=m
CONFIG_DVB_RTL2830=m
CONFIG_DVB_RTL2832=m
CONFIG_DVB_RTL2832_SDR=m
CONFIG_DVB_SI2168=m
CONFIG_DVB_ZD1301_DEMOD=m
CONFIG_DVB_CXD2880=m

#
# DVB-C (cable) frontends
#
CONFIG_DVB_VES1820=m
CONFIG_DVB_TDA10021=m
CONFIG_DVB_TDA10023=m
CONFIG_DVB_STV0297=m

#
# ATSC (North American/Korean Terrestrial/Cable DTV) frontends
#
CONFIG_DVB_NXT200X=m
CONFIG_DVB_OR51211=m
CONFIG_DVB_OR51132=m
CONFIG_DVB_BCM3510=m
CONFIG_DVB_LGDT330X=m
CONFIG_DVB_LGDT3305=m
CONFIG_DVB_LGDT3306A=m
CONFIG_DVB_LG2160=m
CONFIG_DVB_S5H1409=m
CONFIG_DVB_AU8522=m
CONFIG_DVB_AU8522_DTV=m
CONFIG_DVB_AU8522_V4L=m
CONFIG_DVB_S5H1411=m
CONFIG_DVB_MXL692=m

#
# ISDB-T (terrestrial) frontends
#
CONFIG_DVB_S921=m
CONFIG_DVB_DIB8000=m
CONFIG_DVB_MB86A20S=m

#
# ISDB-S (satellite) & ISDB-T (terrestrial) frontends
#
CONFIG_DVB_TC90522=m
CONFIG_DVB_MN88443X=m

#
# Digital terrestrial only tuners/PLL
#
CONFIG_DVB_PLL=m
CONFIG_DVB_TUNER_DIB0070=m
CONFIG_DVB_TUNER_DIB0090=m

#
# SEC control devices for DVB-S
#
CONFIG_DVB_DRX39XYJ=m
CONFIG_DVB_LNBH25=m
CONFIG_DVB_LNBH29=m
CONFIG_DVB_LNBP21=m
CONFIG_DVB_LNBP22=m
CONFIG_DVB_ISL6405=m
CONFIG_DVB_ISL6421=m
CONFIG_DVB_ISL6423=m
CONFIG_DVB_A8293=m
CONFIG_DVB_LGS8GL5=m
CONFIG_DVB_LGS8GXX=m
CONFIG_DVB_ATBM8830=m
CONFIG_DVB_TDA665x=m
CONFIG_DVB_IX2505V=m
CONFIG_DVB_M88RS2000=m
CONFIG_DVB_AF9033=m
CONFIG_DVB_HORUS3A=m
CONFIG_DVB_ASCOT2E=m
CONFIG_DVB_HELENE=m

#
# Common Interface (EN50221) controller drivers
#
CONFIG_DVB_CXD2099=m
CONFIG_DVB_SP2=m
# end of Customise DVB Frontends

#
# Tools to develop new frontends
#
# CONFIG_DVB_DUMMY_FE is not set
# end of Media ancillary drivers

#
# Graphics support
#
# CONFIG_AGP is not set
CONFIG_INTEL_GTT=m
CONFIG_VGA_ARB=y
CONFIG_VGA_ARB_MAX_GPUS=64
CONFIG_VGA_SWITCHEROO=y
CONFIG_DRM=m
CONFIG_DRM_MIPI_DSI=y
CONFIG_DRM_DP_AUX_CHARDEV=y
# CONFIG_DRM_DEBUG_SELFTEST is not set
CONFIG_DRM_KMS_HELPER=m
CONFIG_DRM_FBDEV_EMULATION=y
CONFIG_DRM_FBDEV_OVERALLOC=100
CONFIG_DRM_LOAD_EDID_FIRMWARE=y
# CONFIG_DRM_DP_CEC is not set
CONFIG_DRM_TTM=m
CONFIG_DRM_VRAM_HELPER=m
CONFIG_DRM_TTM_HELPER=m
CONFIG_DRM_GEM_SHMEM_HELPER=y

#
# I2C encoder or helper chips
#
CONFIG_DRM_I2C_CH7006=m
CONFIG_DRM_I2C_SIL164=m
# CONFIG_DRM_I2C_NXP_TDA998X is not set
# CONFIG_DRM_I2C_NXP_TDA9950 is not set
# end of I2C encoder or helper chips

#
# ARM devices
#
# end of ARM devices

# CONFIG_DRM_RADEON is not set
# CONFIG_DRM_AMDGPU is not set
# CONFIG_DRM_NOUVEAU is not set
CONFIG_DRM_I915=m
CONFIG_DRM_I915_FORCE_PROBE=""
CONFIG_DRM_I915_CAPTURE_ERROR=y
CONFIG_DRM_I915_COMPRESS_ERROR=y
CONFIG_DRM_I915_USERPTR=y
CONFIG_DRM_I915_GVT=y
CONFIG_DRM_I915_GVT_KVMGT=m
CONFIG_DRM_I915_REQUEST_TIMEOUT=20000
CONFIG_DRM_I915_FENCE_TIMEOUT=10000
CONFIG_DRM_I915_USERFAULT_AUTOSUSPEND=250
CONFIG_DRM_I915_HEARTBEAT_INTERVAL=2500
CONFIG_DRM_I915_PREEMPT_TIMEOUT=640
CONFIG_DRM_I915_MAX_REQUEST_BUSYWAIT=8000
CONFIG_DRM_I915_STOP_TIMEOUT=100
CONFIG_DRM_I915_TIMESLICE_DURATION=1
# CONFIG_DRM_VGEM is not set
# CONFIG_DRM_VKMS is not set
# CONFIG_DRM_VMWGFX is not set
CONFIG_DRM_GMA500=m
# CONFIG_DRM_UDL is not set
CONFIG_DRM_AST=m
CONFIG_DRM_MGAG200=m
CONFIG_DRM_QXL=m
CONFIG_DRM_BOCHS=m
CONFIG_DRM_VIRTIO_GPU=m
CONFIG_DRM_PANEL=y

#
# Display Panels
#
# CONFIG_DRM_PANEL_RASPBERRYPI_TOUCHSCREEN is not set
# end of Display Panels

CONFIG_DRM_BRIDGE=y
CONFIG_DRM_PANEL_BRIDGE=y

#
# Display Interface Bridges
#
# CONFIG_DRM_ANALOGIX_ANX78XX is not set
# end of Display Interface Bridges

# CONFIG_DRM_ETNAVIV is not set
CONFIG_DRM_CIRRUS_QEMU=m
# CONFIG_DRM_GM12U320 is not set
# CONFIG_DRM_SIMPLEDRM is not set
# CONFIG_TINYDRM_HX8357D is not set
# CONFIG_TINYDRM_ILI9225 is not set
# CONFIG_TINYDRM_ILI9341 is not set
# CONFIG_TINYDRM_ILI9486 is not set
# CONFIG_TINYDRM_MI0283QT is not set
# CONFIG_TINYDRM_REPAPER is not set
# CONFIG_TINYDRM_ST7586 is not set
# CONFIG_TINYDRM_ST7735R is not set
# CONFIG_DRM_XEN_FRONTEND is not set
# CONFIG_DRM_VBOXVIDEO is not set
# CONFIG_DRM_GUD is not set
# CONFIG_DRM_LEGACY is not set
CONFIG_DRM_PANEL_ORIENTATION_QUIRKS=y

#
# Frame buffer Devices
#
CONFIG_FB_CMDLINE=y
CONFIG_FB_NOTIFY=y
CONFIG_FB=y
# CONFIG_FIRMWARE_EDID is not set
CONFIG_FB_BOOT_VESA_SUPPORT=y
CONFIG_FB_CFB_FILLRECT=y
CONFIG_FB_CFB_COPYAREA=y
CONFIG_FB_CFB_IMAGEBLIT=y
CONFIG_FB_SYS_FILLRECT=m
CONFIG_FB_SYS_COPYAREA=m
CONFIG_FB_SYS_IMAGEBLIT=m
# CONFIG_FB_FOREIGN_ENDIAN is not set
CONFIG_FB_SYS_FOPS=m
CONFIG_FB_DEFERRED_IO=y
# CONFIG_FB_MODE_HELPERS is not set
CONFIG_FB_TILEBLITTING=y

#
# Frame buffer hardware drivers
#
# CONFIG_FB_CIRRUS is not set
# CONFIG_FB_PM2 is not set
# CONFIG_FB_CYBER2000 is not set
# CONFIG_FB_ARC is not set
# CONFIG_FB_ASILIANT is not set
# CONFIG_FB_IMSTT is not set
# CONFIG_FB_VGA16 is not set
# CONFIG_FB_UVESA is not set
CONFIG_FB_VESA=y
CONFIG_FB_EFI=y
# CONFIG_FB_N411 is not set
# CONFIG_FB_HGA is not set
# CONFIG_FB_OPENCORES is not set
# CONFIG_FB_S1D13XXX is not set
# CONFIG_FB_NVIDIA is not set
# CONFIG_FB_RIVA is not set
# CONFIG_FB_I740 is not set
# CONFIG_FB_LE80578 is not set
# CONFIG_FB_MATROX is not set
# CONFIG_FB_RADEON is not set
# CONFIG_FB_ATY128 is not set
# CONFIG_FB_ATY is not set
# CONFIG_FB_S3 is not set
# CONFIG_FB_SAVAGE is not set
# CONFIG_FB_SIS is not set
# CONFIG_FB_VIA is not set
# CONFIG_FB_NEOMAGIC is not set
# CONFIG_FB_KYRO is not set
# CONFIG_FB_3DFX is not set
# CONFIG_FB_VOODOO1 is not set
# CONFIG_FB_VT8623 is not set
# CONFIG_FB_TRIDENT is not set
# CONFIG_FB_ARK is not set
# CONFIG_FB_PM3 is not set
# CONFIG_FB_CARMINE is not set
# CONFIG_FB_SM501 is not set
# CONFIG_FB_SMSCUFX is not set
# CONFIG_FB_UDL is not set
# CONFIG_FB_IBM_GXT4500 is not set
# CONFIG_FB_VIRTUAL is not set
# CONFIG_XEN_FBDEV_FRONTEND is not set
# CONFIG_FB_METRONOME is not set
# CONFIG_FB_MB862XX is not set
CONFIG_FB_HYPERV=m
# CONFIG_FB_SIMPLE is not set
# CONFIG_FB_SM712 is not set
# end of Frame buffer Devices

#
# Backlight & LCD device support
#
CONFIG_LCD_CLASS_DEVICE=m
# CONFIG_LCD_L4F00242T03 is not set
# CONFIG_LCD_LMS283GF05 is not set
# CONFIG_LCD_LTV350QV is not set
# CONFIG_LCD_ILI922X is not set
# CONFIG_LCD_ILI9320 is not set
# CONFIG_LCD_TDO24M is not set
# CONFIG_LCD_VGG2432A4 is not set
CONFIG_LCD_PLATFORM=m
# CONFIG_LCD_AMS369FG06 is not set
# CONFIG_LCD_LMS501KF03 is not set
# CONFIG_LCD_HX8357 is not set
# CONFIG_LCD_OTM3225A is not set
CONFIG_BACKLIGHT_CLASS_DEVICE=y
# CONFIG_BACKLIGHT_KTD253 is not set
# CONFIG_BACKLIGHT_PWM is not set
CONFIG_BACKLIGHT_APPLE=m
# CONFIG_BACKLIGHT_QCOM_WLED is not set
# CONFIG_BACKLIGHT_SAHARA is not set
# CONFIG_BACKLIGHT_ADP8860 is not set
# CONFIG_BACKLIGHT_ADP8870 is not set
# CONFIG_BACKLIGHT_LM3630A is not set
# CONFIG_BACKLIGHT_LM3639 is not set
CONFIG_BACKLIGHT_LP855X=m
# CONFIG_BACKLIGHT_GPIO is not set
# CONFIG_BACKLIGHT_LV5207LP is not set
# CONFIG_BACKLIGHT_BD6107 is not set
# CONFIG_BACKLIGHT_ARCXCNN is not set
# end of Backlight & LCD device support

CONFIG_HDMI=y

#
# Console display driver support
#
CONFIG_VGA_CONSOLE=y
CONFIG_DUMMY_CONSOLE=y
CONFIG_DUMMY_CONSOLE_COLUMNS=80
CONFIG_DUMMY_CONSOLE_ROWS=25
CONFIG_FRAMEBUFFER_CONSOLE=y
CONFIG_FRAMEBUFFER_CONSOLE_DETECT_PRIMARY=y
CONFIG_FRAMEBUFFER_CONSOLE_ROTATION=y
# CONFIG_FRAMEBUFFER_CONSOLE_DEFERRED_TAKEOVER is not set
# end of Console display driver support

CONFIG_LOGO=y
# CONFIG_LOGO_LINUX_MONO is not set
# CONFIG_LOGO_LINUX_VGA16 is not set
CONFIG_LOGO_LINUX_CLUT224=y
# end of Graphics support

# CONFIG_SOUND is not set

#
# HID support
#
CONFIG_HID=y
CONFIG_HID_BATTERY_STRENGTH=y
CONFIG_HIDRAW=y
CONFIG_UHID=m
CONFIG_HID_GENERIC=y

#
# Special HID drivers
#
CONFIG_HID_A4TECH=m
# CONFIG_HID_ACCUTOUCH is not set
CONFIG_HID_ACRUX=m
# CONFIG_HID_ACRUX_FF is not set
CONFIG_HID_APPLE=m
# CONFIG_HID_APPLEIR is not set
CONFIG_HID_ASUS=m
CONFIG_HID_AUREAL=m
CONFIG_HID_BELKIN=m
# CONFIG_HID_BETOP_FF is not set
# CONFIG_HID_BIGBEN_FF is not set
CONFIG_HID_CHERRY=m
CONFIG_HID_CHICONY=m
# CONFIG_HID_CORSAIR is not set
# CONFIG_HID_COUGAR is not set
# CONFIG_HID_MACALLY is not set
CONFIG_HID_CMEDIA=m
# CONFIG_HID_CP2112 is not set
# CONFIG_HID_CREATIVE_SB0540 is not set
CONFIG_HID_CYPRESS=m
CONFIG_HID_DRAGONRISE=m
# CONFIG_DRAGONRISE_FF is not set
# CONFIG_HID_EMS_FF is not set
# CONFIG_HID_ELAN is not set
CONFIG_HID_ELECOM=m
# CONFIG_HID_ELO is not set
CONFIG_HID_EZKEY=m
CONFIG_HID_GEMBIRD=m
CONFIG_HID_GFRM=m
# CONFIG_HID_GLORIOUS is not set
# CONFIG_HID_HOLTEK is not set
# CONFIG_HID_VIVALDI is not set
# CONFIG_HID_GT683R is not set
CONFIG_HID_KEYTOUCH=m
CONFIG_HID_KYE=m
# CONFIG_HID_UCLOGIC is not set
CONFIG_HID_WALTOP=m
# CONFIG_HID_VIEWSONIC is not set
CONFIG_HID_GYRATION=m
CONFIG_HID_ICADE=m
CONFIG_HID_ITE=m
CONFIG_HID_JABRA=m
CONFIG_HID_TWINHAN=m
CONFIG_HID_KENSINGTON=m
CONFIG_HID_LCPOWER=m
CONFIG_HID_LED=m
CONFIG_HID_LENOVO=m
CONFIG_HID_LOGITECH=m
CONFIG_HID_LOGITECH_DJ=m
CONFIG_HID_LOGITECH_HIDPP=m
# CONFIG_LOGITECH_FF is not set
# CONFIG_LOGIRUMBLEPAD2_FF is not set
# CONFIG_LOGIG940_FF is not set
# CONFIG_LOGIWHEELS_FF is not set
CONFIG_HID_MAGICMOUSE=y
# CONFIG_HID_MALTRON is not set
# CONFIG_HID_MAYFLASH is not set
# CONFIG_HID_REDRAGON is not set
CONFIG_HID_MICROSOFT=m
CONFIG_HID_MONTEREY=m
CONFIG_HID_MULTITOUCH=m
CONFIG_HID_NTI=m
# CONFIG_HID_NTRIG is not set
CONFIG_HID_ORTEK=m
CONFIG_HID_PANTHERLORD=m
# CONFIG_PANTHERLORD_FF is not set
# CONFIG_HID_PENMOUNT is not set
CONFIG_HID_PETALYNX=m
CONFIG_HID_PICOLCD=m
CONFIG_HID_PICOLCD_FB=y
CONFIG_HID_PICOLCD_BACKLIGHT=y
CONFIG_HID_PICOLCD_LCD=y
CONFIG_HID_PICOLCD_LEDS=y
CONFIG_HID_PICOLCD_CIR=y
CONFIG_HID_PLANTRONICS=m
# CONFIG_HID_PLAYSTATION is not set
CONFIG_HID_PRIMAX=m
# CONFIG_HID_RETRODE is not set
# CONFIG_HID_ROCCAT is not set
CONFIG_HID_SAITEK=m
CONFIG_HID_SAMSUNG=m
# CONFIG_HID_SONY is not set
CONFIG_HID_SPEEDLINK=m
# CONFIG_HID_STEAM is not set
CONFIG_HID_STEELSERIES=m
CONFIG_HID_SUNPLUS=m
CONFIG_HID_RMI=m
CONFIG_HID_GREENASIA=m
# CONFIG_GREENASIA_FF is not set
CONFIG_HID_HYPERV_MOUSE=m
CONFIG_HID_SMARTJOYPLUS=m
# CONFIG_SMARTJOYPLUS_FF is not set
CONFIG_HID_TIVO=m
CONFIG_HID_TOPSEED=m
CONFIG_HID_THINGM=m
CONFIG_HID_THRUSTMASTER=m
# CONFIG_THRUSTMASTER_FF is not set
# CONFIG_HID_UDRAW_PS3 is not set
# CONFIG_HID_U2FZERO is not set
# CONFIG_HID_WACOM is not set
CONFIG_HID_WIIMOTE=m
CONFIG_HID_XINMO=m
CONFIG_HID_ZEROPLUS=m
# CONFIG_ZEROPLUS_FF is not set
CONFIG_HID_ZYDACRON=m
CONFIG_HID_SENSOR_HUB=y
CONFIG_HID_SENSOR_CUSTOM_SENSOR=m
CONFIG_HID_ALPS=m
# CONFIG_HID_MCP2221 is not set
# end of Special HID drivers

#
# USB HID support
#
CONFIG_USB_HID=y
# CONFIG_HID_PID is not set
# CONFIG_USB_HIDDEV is not set
# end of USB HID support

#
# I2C HID support
#
# CONFIG_I2C_HID_ACPI is not set
# end of I2C HID support

#
# Intel ISH HID support
#
CONFIG_INTEL_ISH_HID=m
# CONFIG_INTEL_ISH_FIRMWARE_DOWNLOADER is not set
# end of Intel ISH HID support

#
# AMD SFH HID Support
#
# CONFIG_AMD_SFH_HID is not set
# end of AMD SFH HID Support
# end of HID support

CONFIG_USB_OHCI_LITTLE_ENDIAN=y
CONFIG_USB_SUPPORT=y
CONFIG_USB_COMMON=y
# CONFIG_USB_LED_TRIG is not set
# CONFIG_USB_ULPI_BUS is not set
# CONFIG_USB_CONN_GPIO is not set
CONFIG_USB_ARCH_HAS_HCD=y
CONFIG_USB=y
CONFIG_USB_PCI=y
CONFIG_USB_ANNOUNCE_NEW_DEVICES=y

#
# Miscellaneous USB options
#
CONFIG_USB_DEFAULT_PERSIST=y
# CONFIG_USB_FEW_INIT_RETRIES is not set
# CONFIG_USB_DYNAMIC_MINORS is not set
# CONFIG_USB_OTG is not set
# CONFIG_USB_OTG_PRODUCTLIST is not set
CONFIG_USB_LEDS_TRIGGER_USBPORT=y
CONFIG_USB_AUTOSUSPEND_DELAY=2
CONFIG_USB_MON=y

#
# USB Host Controller Drivers
#
# CONFIG_USB_C67X00_HCD is not set
CONFIG_USB_XHCI_HCD=y
# CONFIG_USB_XHCI_DBGCAP is not set
CONFIG_USB_XHCI_PCI=y
# CONFIG_USB_XHCI_PCI_RENESAS is not set
# CONFIG_USB_XHCI_PLATFORM is not set
CONFIG_USB_EHCI_HCD=y
CONFIG_USB_EHCI_ROOT_HUB_TT=y
CONFIG_USB_EHCI_TT_NEWSCHED=y
CONFIG_USB_EHCI_PCI=y
# CONFIG_USB_EHCI_FSL is not set
# CONFIG_USB_EHCI_HCD_PLATFORM is not set
# CONFIG_USB_OXU210HP_HCD is not set
# CONFIG_USB_ISP116X_HCD is not set
# CONFIG_USB_FOTG210_HCD is not set
# CONFIG_USB_MAX3421_HCD is not set
CONFIG_USB_OHCI_HCD=y
CONFIG_USB_OHCI_HCD_PCI=y
# CONFIG_USB_OHCI_HCD_PLATFORM is not set
CONFIG_USB_UHCI_HCD=y
# CONFIG_USB_SL811_HCD is not set
# CONFIG_USB_R8A66597_HCD is not set
# CONFIG_USB_HCD_BCMA is not set
# CONFIG_USB_HCD_TEST_MODE is not set

#
# USB Device Class drivers
#
# CONFIG_USB_ACM is not set
# CONFIG_USB_PRINTER is not set
# CONFIG_USB_WDM is not set
# CONFIG_USB_TMC is not set

#
# NOTE: USB_STORAGE depends on SCSI but BLK_DEV_SD may
# also be needed; see USB_STORAGE Help for more info
#
CONFIG_USB_STORAGE=m
# CONFIG_USB_STORAGE_DEBUG is not set
# CONFIG_USB_STORAGE_REALTEK is not set
# CONFIG_USB_STORAGE_DATAFAB is not set
# CONFIG_USB_STORAGE_FREECOM is not set
# CONFIG_USB_STORAGE_ISD200 is not set
# CONFIG_USB_STORAGE_USBAT is not set
# CONFIG_USB_STORAGE_SDDR09 is not set
# CONFIG_USB_STORAGE_SDDR55 is not set
# CONFIG_USB_STORAGE_JUMPSHOT is not set
# CONFIG_USB_STORAGE_ALAUDA is not set
# CONFIG_USB_STORAGE_ONETOUCH is not set
# CONFIG_USB_STORAGE_KARMA is not set
# CONFIG_USB_STORAGE_CYPRESS_ATACB is not set
# CONFIG_USB_STORAGE_ENE_UB6250 is not set
# CONFIG_USB_UAS is not set

#
# USB Imaging devices
#
# CONFIG_USB_MDC800 is not set
# CONFIG_USB_MICROTEK is not set
# CONFIG_USBIP_CORE is not set
# CONFIG_USB_CDNS_SUPPORT is not set
# CONFIG_USB_MUSB_HDRC is not set
# CONFIG_USB_DWC3 is not set
# CONFIG_USB_DWC2 is not set
# CONFIG_USB_CHIPIDEA is not set
# CONFIG_USB_ISP1760 is not set

#
# USB port drivers
#
# CONFIG_USB_USS720 is not set
CONFIG_USB_SERIAL=m
CONFIG_USB_SERIAL_GENERIC=y
# CONFIG_USB_SERIAL_SIMPLE is not set
# CONFIG_USB_SERIAL_AIRCABLE is not set
# CONFIG_USB_SERIAL_ARK3116 is not set
# CONFIG_USB_SERIAL_BELKIN is not set
# CONFIG_USB_SERIAL_CH341 is not set
# CONFIG_USB_SERIAL_WHITEHEAT is not set
# CONFIG_USB_SERIAL_DIGI_ACCELEPORT is not set
# CONFIG_USB_SERIAL_CP210X is not set
# CONFIG_USB_SERIAL_CYPRESS_M8 is not set
# CONFIG_USB_SERIAL_EMPEG is not set
# CONFIG_USB_SERIAL_FTDI_SIO is not set
# CONFIG_USB_SERIAL_VISOR is not set
# CONFIG_USB_SERIAL_IPAQ is not set
# CONFIG_USB_SERIAL_IR is not set
# CONFIG_USB_SERIAL_EDGEPORT is not set
# CONFIG_USB_SERIAL_EDGEPORT_TI is not set
# CONFIG_USB_SERIAL_F81232 is not set
# CONFIG_USB_SERIAL_F8153X is not set
# CONFIG_USB_SERIAL_GARMIN is not set
# CONFIG_USB_SERIAL_IPW is not set
# CONFIG_USB_SERIAL_IUU is not set
# CONFIG_USB_SERIAL_KEYSPAN_PDA is not set
# CONFIG_USB_SERIAL_KEYSPAN is not set
# CONFIG_USB_SERIAL_KLSI is not set
# CONFIG_USB_SERIAL_KOBIL_SCT is not set
# CONFIG_USB_SERIAL_MCT_U232 is not set
# CONFIG_USB_SERIAL_METRO is not set
# CONFIG_USB_SERIAL_MOS7720 is not set
# CONFIG_USB_SERIAL_MOS7840 is not set
# CONFIG_USB_SERIAL_MXUPORT is not set
# CONFIG_USB_SERIAL_NAVMAN is not set
# CONFIG_USB_SERIAL_PL2303 is not set
# CONFIG_USB_SERIAL_OTI6858 is not set
# CONFIG_USB_SERIAL_QCAUX is not set
# CONFIG_USB_SERIAL_QUALCOMM is not set
# CONFIG_USB_SERIAL_SPCP8X5 is not set
# CONFIG_USB_SERIAL_SAFE is not set
# CONFIG_USB_SERIAL_SIERRAWIRELESS is not set
# CONFIG_USB_SERIAL_SYMBOL is not set
# CONFIG_USB_SERIAL_TI is not set
# CONFIG_USB_SERIAL_CYBERJACK is not set
# CONFIG_USB_SERIAL_OPTION is not set
# CONFIG_USB_SERIAL_OMNINET is not set
# CONFIG_USB_SERIAL_OPTICON is not set
# CONFIG_USB_SERIAL_XSENS_MT is not set
# CONFIG_USB_SERIAL_WISHBONE is not set
# CONFIG_USB_SERIAL_SSU100 is not set
# CONFIG_USB_SERIAL_QT2 is not set
# CONFIG_USB_SERIAL_UPD78F0730 is not set
# CONFIG_USB_SERIAL_XR is not set
CONFIG_USB_SERIAL_DEBUG=m

#
# USB Miscellaneous drivers
#
# CONFIG_USB_EMI62 is not set
# CONFIG_USB_EMI26 is not set
# CONFIG_USB_ADUTUX is not set
# CONFIG_USB_SEVSEG is not set
# CONFIG_USB_LEGOTOWER is not set
# CONFIG_USB_LCD is not set
# CONFIG_USB_CYPRESS_CY7C63 is not set
# CONFIG_USB_CYTHERM is not set
# CONFIG_USB_IDMOUSE is not set
# CONFIG_USB_FTDI_ELAN is not set
# CONFIG_USB_APPLEDISPLAY is not set
# CONFIG_APPLE_MFI_FASTCHARGE is not set
# CONFIG_USB_SISUSBVGA is not set
# CONFIG_USB_LD is not set
# CONFIG_USB_TRANCEVIBRATOR is not set
# CONFIG_USB_IOWARRIOR is not set
# CONFIG_USB_TEST is not set
# CONFIG_USB_EHSET_TEST_FIXTURE is not set
# CONFIG_USB_ISIGHTFW is not set
# CONFIG_USB_YUREX is not set
# CONFIG_USB_EZUSB_FX2 is not set
# CONFIG_USB_HUB_USB251XB is not set
# CONFIG_USB_HSIC_USB3503 is not set
# CONFIG_USB_HSIC_USB4604 is not set
# CONFIG_USB_LINK_LAYER_TEST is not set
# CONFIG_USB_CHAOSKEY is not set
# CONFIG_USB_ATM is not set

#
# USB Physical Layer drivers
#
# CONFIG_NOP_USB_XCEIV is not set
# CONFIG_USB_GPIO_VBUS is not set
# CONFIG_USB_ISP1301 is not set
# end of USB Physical Layer drivers

# CONFIG_USB_GADGET is not set
CONFIG_TYPEC=y
# CONFIG_TYPEC_TCPM is not set
CONFIG_TYPEC_UCSI=y
# CONFIG_UCSI_CCG is not set
CONFIG_UCSI_ACPI=y
# CONFIG_TYPEC_TPS6598X is not set
# CONFIG_TYPEC_STUSB160X is not set

#
# USB Type-C Multiplexer/DeMultiplexer Switch support
#
# CONFIG_TYPEC_MUX_PI3USB30532 is not set
# end of USB Type-C Multiplexer/DeMultiplexer Switch support

#
# USB Type-C Alternate Mode drivers
#
# CONFIG_TYPEC_DP_ALTMODE is not set
# end of USB Type-C Alternate Mode drivers

# CONFIG_USB_ROLE_SWITCH is not set
CONFIG_MMC=m
CONFIG_MMC_BLOCK=m
CONFIG_MMC_BLOCK_MINORS=8
CONFIG_SDIO_UART=m
# CONFIG_MMC_TEST is not set

#
# MMC/SD/SDIO Host Controller Drivers
#
# CONFIG_MMC_DEBUG is not set
CONFIG_MMC_SDHCI=m
CONFIG_MMC_SDHCI_IO_ACCESSORS=y
CONFIG_MMC_SDHCI_PCI=m
CONFIG_MMC_RICOH_MMC=y
CONFIG_MMC_SDHCI_ACPI=m
CONFIG_MMC_SDHCI_PLTFM=m
# CONFIG_MMC_SDHCI_F_SDH30 is not set
# CONFIG_MMC_WBSD is not set
# CONFIG_MMC_TIFM_SD is not set
# CONFIG_MMC_SPI is not set
# CONFIG_MMC_CB710 is not set
# CONFIG_MMC_VIA_SDMMC is not set
# CONFIG_MMC_VUB300 is not set
# CONFIG_MMC_USHC is not set
# CONFIG_MMC_USDHI6ROL0 is not set
# CONFIG_MMC_REALTEK_PCI is not set
CONFIG_MMC_CQHCI=m
# CONFIG_MMC_HSQ is not set
# CONFIG_MMC_TOSHIBA_PCI is not set
# CONFIG_MMC_MTK is not set
# CONFIG_MMC_SDHCI_XENON is not set
# CONFIG_MEMSTICK is not set
CONFIG_NEW_LEDS=y
CONFIG_LEDS_CLASS=y
# CONFIG_LEDS_CLASS_FLASH is not set
# CONFIG_LEDS_CLASS_MULTICOLOR is not set
# CONFIG_LEDS_BRIGHTNESS_HW_CHANGED is not set

#
# LED drivers
#
# CONFIG_LEDS_APU is not set
CONFIG_LEDS_LM3530=m
# CONFIG_LEDS_LM3532 is not set
# CONFIG_LEDS_LM3642 is not set
# CONFIG_LEDS_PCA9532 is not set
# CONFIG_LEDS_GPIO is not set
CONFIG_LEDS_LP3944=m
# CONFIG_LEDS_LP3952 is not set
# CONFIG_LEDS_LP50XX is not set
CONFIG_LEDS_CLEVO_MAIL=m
# CONFIG_LEDS_PCA955X is not set
# CONFIG_LEDS_PCA963X is not set
# CONFIG_LEDS_DAC124S085 is not set
# CONFIG_LEDS_PWM is not set
# CONFIG_LEDS_BD2802 is not set
CONFIG_LEDS_INTEL_SS4200=m
# CONFIG_LEDS_TCA6507 is not set
# CONFIG_LEDS_TLC591XX is not set
# CONFIG_LEDS_LM355x is not set

#
# LED driver for blink(1) USB RGB LED is under Special HID drivers (HID_THINGM)
#
CONFIG_LEDS_BLINKM=m
CONFIG_LEDS_MLXCPLD=m
# CONFIG_LEDS_MLXREG is not set
# CONFIG_LEDS_USER is not set
# CONFIG_LEDS_NIC78BX is not set
# CONFIG_LEDS_TI_LMU_COMMON is not set

#
# Flash and Torch LED drivers
#

#
# LED Triggers
#
CONFIG_LEDS_TRIGGERS=y
CONFIG_LEDS_TRIGGER_TIMER=m
CONFIG_LEDS_TRIGGER_ONESHOT=m
# CONFIG_LEDS_TRIGGER_DISK is not set
CONFIG_LEDS_TRIGGER_HEARTBEAT=m
CONFIG_LEDS_TRIGGER_BACKLIGHT=m
# CONFIG_LEDS_TRIGGER_CPU is not set
# CONFIG_LEDS_TRIGGER_ACTIVITY is not set
CONFIG_LEDS_TRIGGER_GPIO=m
CONFIG_LEDS_TRIGGER_DEFAULT_ON=m

#
# iptables trigger is under Netfilter config (LED target)
#
CONFIG_LEDS_TRIGGER_TRANSIENT=m
CONFIG_LEDS_TRIGGER_CAMERA=m
# CONFIG_LEDS_TRIGGER_PANIC is not set
# CONFIG_LEDS_TRIGGER_NETDEV is not set
# CONFIG_LEDS_TRIGGER_PATTERN is not set
CONFIG_LEDS_TRIGGER_AUDIO=m
# CONFIG_LEDS_TRIGGER_TTY is not set

#
# LED Blink
#
# CONFIG_LEDS_BLINK is not set
# CONFIG_ACCESSIBILITY is not set
CONFIG_INFINIBAND=m
CONFIG_INFINIBAND_USER_MAD=m
CONFIG_INFINIBAND_USER_ACCESS=m
CONFIG_INFINIBAND_USER_MEM=y
CONFIG_INFINIBAND_ON_DEMAND_PAGING=y
CONFIG_INFINIBAND_ADDR_TRANS=y
CONFIG_INFINIBAND_ADDR_TRANS_CONFIGFS=y
CONFIG_INFINIBAND_VIRT_DMA=y
# CONFIG_INFINIBAND_MTHCA is not set
# CONFIG_INFINIBAND_EFA is not set
# CONFIG_INFINIBAND_I40IW is not set
# CONFIG_MLX4_INFINIBAND is not set
# CONFIG_INFINIBAND_OCRDMA is not set
# CONFIG_INFINIBAND_USNIC is not set
# CONFIG_INFINIBAND_BNXT_RE is not set
# CONFIG_INFINIBAND_RDMAVT is not set
CONFIG_RDMA_RXE=m
CONFIG_RDMA_SIW=m
CONFIG_INFINIBAND_IPOIB=m
# CONFIG_INFINIBAND_IPOIB_CM is not set
CONFIG_INFINIBAND_IPOIB_DEBUG=y
# CONFIG_INFINIBAND_IPOIB_DEBUG_DATA is not set
CONFIG_INFINIBAND_SRP=m
CONFIG_INFINIBAND_SRPT=m
# CONFIG_INFINIBAND_ISER is not set
# CONFIG_INFINIBAND_ISERT is not set
# CONFIG_INFINIBAND_RTRS_CLIENT is not set
# CONFIG_INFINIBAND_RTRS_SERVER is not set
# CONFIG_INFINIBAND_OPA_VNIC is not set
CONFIG_EDAC_ATOMIC_SCRUB=y
CONFIG_EDAC_SUPPORT=y
CONFIG_EDAC=y
CONFIG_EDAC_LEGACY_SYSFS=y
# CONFIG_EDAC_DEBUG is not set
CONFIG_EDAC_DECODE_MCE=m
CONFIG_EDAC_GHES=y
CONFIG_EDAC_AMD64=m
CONFIG_EDAC_E752X=m
CONFIG_EDAC_I82975X=m
CONFIG_EDAC_I3000=m
CONFIG_EDAC_I3200=m
CONFIG_EDAC_IE31200=m
CONFIG_EDAC_X38=m
CONFIG_EDAC_I5400=m
CONFIG_EDAC_I7CORE=m
CONFIG_EDAC_I5000=m
CONFIG_EDAC_I5100=m
CONFIG_EDAC_I7300=m
CONFIG_EDAC_SBRIDGE=m
CONFIG_EDAC_SKX=m
# CONFIG_EDAC_I10NM is not set
CONFIG_EDAC_PND2=m
# CONFIG_EDAC_IGEN6 is not set
CONFIG_RTC_LIB=y
CONFIG_RTC_MC146818_LIB=y
CONFIG_RTC_CLASS=y
CONFIG_RTC_HCTOSYS=y
CONFIG_RTC_HCTOSYS_DEVICE="rtc0"
# CONFIG_RTC_SYSTOHC is not set
# CONFIG_RTC_DEBUG is not set
CONFIG_RTC_NVMEM=y

#
# RTC interfaces
#
CONFIG_RTC_INTF_SYSFS=y
CONFIG_RTC_INTF_PROC=y
CONFIG_RTC_INTF_DEV=y
# CONFIG_RTC_INTF_DEV_UIE_EMUL is not set
# CONFIG_RTC_DRV_TEST is not set

#
# I2C RTC drivers
#
# CONFIG_RTC_DRV_ABB5ZES3 is not set
# CONFIG_RTC_DRV_ABEOZ9 is not set
# CONFIG_RTC_DRV_ABX80X is not set
CONFIG_RTC_DRV_DS1307=m
# CONFIG_RTC_DRV_DS1307_CENTURY is not set
CONFIG_RTC_DRV_DS1374=m
# CONFIG_RTC_DRV_DS1374_WDT is not set
CONFIG_RTC_DRV_DS1672=m
CONFIG_RTC_DRV_MAX6900=m
CONFIG_RTC_DRV_RS5C372=m
CONFIG_RTC_DRV_ISL1208=m
CONFIG_RTC_DRV_ISL12022=m
CONFIG_RTC_DRV_X1205=m
CONFIG_RTC_DRV_PCF8523=m
# CONFIG_RTC_DRV_PCF85063 is not set
# CONFIG_RTC_DRV_PCF85363 is not set
CONFIG_RTC_DRV_PCF8563=m
CONFIG_RTC_DRV_PCF8583=m
CONFIG_RTC_DRV_M41T80=m
CONFIG_RTC_DRV_M41T80_WDT=y
CONFIG_RTC_DRV_BQ32K=m
# CONFIG_RTC_DRV_S35390A is not set
CONFIG_RTC_DRV_FM3130=m
# CONFIG_RTC_DRV_RX8010 is not set
CONFIG_RTC_DRV_RX8581=m
CONFIG_RTC_DRV_RX8025=m
CONFIG_RTC_DRV_EM3027=m
# CONFIG_RTC_DRV_RV3028 is not set
# CONFIG_RTC_DRV_RV3032 is not set
# CONFIG_RTC_DRV_RV8803 is not set
# CONFIG_RTC_DRV_SD3078 is not set

#
# SPI RTC drivers
#
# CONFIG_RTC_DRV_M41T93 is not set
# CONFIG_RTC_DRV_M41T94 is not set
# CONFIG_RTC_DRV_DS1302 is not set
# CONFIG_RTC_DRV_DS1305 is not set
# CONFIG_RTC_DRV_DS1343 is not set
# CONFIG_RTC_DRV_DS1347 is not set
# CONFIG_RTC_DRV_DS1390 is not set
# CONFIG_RTC_DRV_MAX6916 is not set
# CONFIG_RTC_DRV_R9701 is not set
CONFIG_RTC_DRV_RX4581=m
# CONFIG_RTC_DRV_RS5C348 is not set
# CONFIG_RTC_DRV_MAX6902 is not set
# CONFIG_RTC_DRV_PCF2123 is not set
# CONFIG_RTC_DRV_MCP795 is not set
CONFIG_RTC_I2C_AND_SPI=y

#
# SPI and I2C RTC drivers
#
CONFIG_RTC_DRV_DS3232=m
CONFIG_RTC_DRV_DS3232_HWMON=y
# CONFIG_RTC_DRV_PCF2127 is not set
CONFIG_RTC_DRV_RV3029C2=m
# CONFIG_RTC_DRV_RV3029_HWMON is not set
# CONFIG_RTC_DRV_RX6110 is not set

#
# Platform RTC drivers
#
CONFIG_RTC_DRV_CMOS=y
CONFIG_RTC_DRV_DS1286=m
CONFIG_RTC_DRV_DS1511=m
CONFIG_RTC_DRV_DS1553=m
# CONFIG_RTC_DRV_DS1685_FAMILY is not set
CONFIG_RTC_DRV_DS1742=m
CONFIG_RTC_DRV_DS2404=m
CONFIG_RTC_DRV_STK17TA8=m
# CONFIG_RTC_DRV_M48T86 is not set
CONFIG_RTC_DRV_M48T35=m
CONFIG_RTC_DRV_M48T59=m
CONFIG_RTC_DRV_MSM6242=m
CONFIG_RTC_DRV_BQ4802=m
CONFIG_RTC_DRV_RP5C01=m
CONFIG_RTC_DRV_V3020=m

#
# on-CPU RTC drivers
#
# CONFIG_RTC_DRV_FTRTC010 is not set

#
# HID Sensor RTC drivers
#
CONFIG_DMADEVICES=y
# CONFIG_DMADEVICES_DEBUG is not set

#
# DMA Devices
#
CONFIG_DMA_ENGINE=y
CONFIG_DMA_VIRTUAL_CHANNELS=y
CONFIG_DMA_ACPI=y
# CONFIG_ALTERA_MSGDMA is not set
CONFIG_INTEL_IDMA64=m
# CONFIG_INTEL_IDXD is not set
CONFIG_INTEL_IOATDMA=m
# CONFIG_PLX_DMA is not set
# CONFIG_XILINX_ZYNQMP_DPDMA is not set
# CONFIG_QCOM_HIDMA_MGMT is not set
# CONFIG_QCOM_HIDMA is not set
CONFIG_DW_DMAC_CORE=y
CONFIG_DW_DMAC=m
CONFIG_DW_DMAC_PCI=y
# CONFIG_DW_EDMA is not set
# CONFIG_DW_EDMA_PCIE is not set
CONFIG_HSU_DMA=y
# CONFIG_SF_PDMA is not set
# CONFIG_INTEL_LDMA is not set

#
# DMA Clients
#
CONFIG_ASYNC_TX_DMA=y
CONFIG_DMATEST=m
CONFIG_DMA_ENGINE_RAID=y

#
# DMABUF options
#
CONFIG_SYNC_FILE=y
# CONFIG_SW_SYNC is not set
# CONFIG_UDMABUF is not set
# CONFIG_DMABUF_MOVE_NOTIFY is not set
# CONFIG_DMABUF_DEBUG is not set
# CONFIG_DMABUF_SELFTESTS is not set
# CONFIG_DMABUF_HEAPS is not set
# end of DMABUF options

CONFIG_DCA=m
# CONFIG_AUXDISPLAY is not set
# CONFIG_PANEL is not set
CONFIG_UIO=m
CONFIG_UIO_CIF=m
CONFIG_UIO_PDRV_GENIRQ=m
# CONFIG_UIO_DMEM_GENIRQ is not set
CONFIG_UIO_AEC=m
CONFIG_UIO_SERCOS3=m
CONFIG_UIO_PCI_GENERIC=m
# CONFIG_UIO_NETX is not set
# CONFIG_UIO_PRUSS is not set
# CONFIG_UIO_MF624 is not set
CONFIG_UIO_HV_GENERIC=m
CONFIG_VFIO_IOMMU_TYPE1=m
CONFIG_VFIO_VIRQFD=m
CONFIG_VFIO=m
CONFIG_VFIO_NOIOMMU=y
CONFIG_VFIO_PCI=m
# CONFIG_VFIO_PCI_VGA is not set
CONFIG_VFIO_PCI_MMAP=y
CONFIG_VFIO_PCI_INTX=y
# CONFIG_VFIO_PCI_IGD is not set
CONFIG_VFIO_MDEV=m
CONFIG_VFIO_MDEV_DEVICE=m
CONFIG_IRQ_BYPASS_MANAGER=m
# CONFIG_VIRT_DRIVERS is not set
CONFIG_VIRTIO=y
CONFIG_VIRTIO_PCI_LIB=y
CONFIG_VIRTIO_MENU=y
CONFIG_VIRTIO_PCI=y
CONFIG_VIRTIO_PCI_LEGACY=y
# CONFIG_VIRTIO_PMEM is not set
CONFIG_VIRTIO_BALLOON=m
CONFIG_VIRTIO_MEM=m
CONFIG_VIRTIO_INPUT=m
# CONFIG_VIRTIO_MMIO is not set
CONFIG_VIRTIO_DMA_SHARED_BUFFER=m
# CONFIG_VDPA is not set
CONFIG_VHOST_IOTLB=m
CONFIG_VHOST=m
CONFIG_VHOST_MENU=y
CONFIG_VHOST_NET=m
# CONFIG_VHOST_SCSI is not set
CONFIG_VHOST_VSOCK=m
# CONFIG_VHOST_CROSS_ENDIAN_LEGACY is not set

#
# Microsoft Hyper-V guest support
#
CONFIG_HYPERV=m
CONFIG_HYPERV_TIMER=y
CONFIG_HYPERV_UTILS=m
CONFIG_HYPERV_BALLOON=m
# end of Microsoft Hyper-V guest support

#
# Xen driver support
#
# CONFIG_XEN_BALLOON is not set
CONFIG_XEN_DEV_EVTCHN=m
# CONFIG_XEN_BACKEND is not set
CONFIG_XENFS=m
CONFIG_XEN_COMPAT_XENFS=y
CONFIG_XEN_SYS_HYPERVISOR=y
CONFIG_XEN_XENBUS_FRONTEND=y
# CONFIG_XEN_GNTDEV is not set
# CONFIG_XEN_GRANT_DEV_ALLOC is not set
# CONFIG_XEN_GRANT_DMA_ALLOC is not set
CONFIG_SWIOTLB_XEN=y
# CONFIG_XEN_PVCALLS_FRONTEND is not set
CONFIG_XEN_PRIVCMD=m
CONFIG_XEN_EFI=y
CONFIG_XEN_AUTO_XLATE=y
CONFIG_XEN_ACPI=y
# CONFIG_XEN_UNPOPULATED_ALLOC is not set
# end of Xen driver support

# CONFIG_GREYBUS is not set
# CONFIG_STAGING is not set
CONFIG_X86_PLATFORM_DEVICES=y
CONFIG_ACPI_WMI=m
CONFIG_WMI_BMOF=m
# CONFIG_HUAWEI_WMI is not set
# CONFIG_UV_SYSFS is not set
# CONFIG_INTEL_WMI_SBL_FW_UPDATE is not set
CONFIG_INTEL_WMI_THUNDERBOLT=m
CONFIG_MXM_WMI=m
# CONFIG_PEAQ_WMI is not set
# CONFIG_XIAOMI_WMI is not set
CONFIG_ACERHDF=m
# CONFIG_ACER_WIRELESS is not set
CONFIG_ACER_WMI=m
# CONFIG_AMD_PMC is not set
CONFIG_APPLE_GMUX=m
CONFIG_ASUS_LAPTOP=m
# CONFIG_ASUS_WIRELESS is not set
CONFIG_ASUS_WMI=m
CONFIG_ASUS_NB_WMI=m
CONFIG_EEEPC_LAPTOP=m
CONFIG_EEEPC_WMI=m
# CONFIG_X86_PLATFORM_DRIVERS_DELL is not set
CONFIG_AMILO_RFKILL=m
CONFIG_FUJITSU_LAPTOP=m
CONFIG_FUJITSU_TABLET=m
# CONFIG_GPD_POCKET_FAN is not set
CONFIG_HP_ACCEL=m
CONFIG_HP_WIRELESS=m
CONFIG_HP_WMI=m
# CONFIG_IBM_RTL is not set
CONFIG_IDEAPAD_LAPTOP=m
CONFIG_SENSORS_HDAPS=m
CONFIG_THINKPAD_ACPI=m
# CONFIG_THINKPAD_ACPI_DEBUGFACILITIES is not set
# CONFIG_THINKPAD_ACPI_DEBUG is not set
# CONFIG_THINKPAD_ACPI_UNSAFE_LEDS is not set
CONFIG_THINKPAD_ACPI_VIDEO=y
CONFIG_THINKPAD_ACPI_HOTKEY_POLL=y
# CONFIG_INTEL_ATOMISP2_PM is not set
CONFIG_INTEL_HID_EVENT=m
# CONFIG_INTEL_INT0002_VGPIO is not set
# CONFIG_INTEL_MENLOW is not set
CONFIG_INTEL_OAKTRAIL=m
CONFIG_INTEL_VBTN=m
CONFIG_MSI_LAPTOP=m
CONFIG_MSI_WMI=m
# CONFIG_PCENGINES_APU2 is not set
CONFIG_SAMSUNG_LAPTOP=m
CONFIG_SAMSUNG_Q10=m
CONFIG_TOSHIBA_BT_RFKILL=m
# CONFIG_TOSHIBA_HAPS is not set
# CONFIG_TOSHIBA_WMI is not set
CONFIG_ACPI_CMPC=m
CONFIG_COMPAL_LAPTOP=m
# CONFIG_LG_LAPTOP is not set
CONFIG_PANASONIC_LAPTOP=m
CONFIG_SONY_LAPTOP=m
CONFIG_SONYPI_COMPAT=y
# CONFIG_SYSTEM76_ACPI is not set
CONFIG_TOPSTAR_LAPTOP=m
# CONFIG_I2C_MULTI_INSTANTIATE is not set
CONFIG_MLX_PLATFORM=m
CONFIG_INTEL_IPS=m
CONFIG_INTEL_RST=m
# CONFIG_INTEL_SMARTCONNECT is not set

#
# Intel Speed Select Technology interface support
#
# CONFIG_INTEL_SPEED_SELECT_INTERFACE is not set
# end of Intel Speed Select Technology interface support

CONFIG_INTEL_TURBO_MAX_3=y
# CONFIG_INTEL_UNCORE_FREQ_CONTROL is not set
CONFIG_INTEL_PMC_CORE=m
# CONFIG_INTEL_PUNIT_IPC is not set
# CONFIG_INTEL_SCU_PCI is not set
# CONFIG_INTEL_SCU_PLATFORM is not set
CONFIG_PMC_ATOM=y
# CONFIG_CHROME_PLATFORMS is not set
CONFIG_MELLANOX_PLATFORM=y
CONFIG_MLXREG_HOTPLUG=m
# CONFIG_MLXREG_IO is not set
CONFIG_SURFACE_PLATFORMS=y
# CONFIG_SURFACE3_WMI is not set
# CONFIG_SURFACE_3_POWER_OPREGION is not set
# CONFIG_SURFACE_GPE is not set
# CONFIG_SURFACE_HOTPLUG is not set
# CONFIG_SURFACE_PRO3_BUTTON is not set
CONFIG_HAVE_CLK=y
CONFIG_CLKDEV_LOOKUP=y
CONFIG_HAVE_CLK_PREPARE=y
CONFIG_COMMON_CLK=y
# CONFIG_COMMON_CLK_MAX9485 is not set
# CONFIG_COMMON_CLK_SI5341 is not set
# CONFIG_COMMON_CLK_SI5351 is not set
# CONFIG_COMMON_CLK_SI544 is not set
# CONFIG_COMMON_CLK_CDCE706 is not set
# CONFIG_COMMON_CLK_CS2000_CP is not set
# CONFIG_COMMON_CLK_PWM is not set
# CONFIG_XILINX_VCU is not set
CONFIG_HWSPINLOCK=y

#
# Clock Source drivers
#
CONFIG_CLKEVT_I8253=y
CONFIG_I8253_LOCK=y
CONFIG_CLKBLD_I8253=y
# end of Clock Source drivers

CONFIG_MAILBOX=y
CONFIG_PCC=y
# CONFIG_ALTERA_MBOX is not set
CONFIG_IOMMU_IOVA=y
CONFIG_IOASID=y
CONFIG_IOMMU_API=y
CONFIG_IOMMU_SUPPORT=y

#
# Generic IOMMU Pagetable Support
#
CONFIG_IOMMU_IO_PGTABLE=y
# end of Generic IOMMU Pagetable Support

# CONFIG_IOMMU_DEBUGFS is not set
# CONFIG_IOMMU_DEFAULT_PASSTHROUGH is not set
CONFIG_IOMMU_DMA=y
CONFIG_AMD_IOMMU=y
CONFIG_AMD_IOMMU_V2=m
CONFIG_DMAR_TABLE=y
CONFIG_INTEL_IOMMU=y
# CONFIG_INTEL_IOMMU_SVM is not set
# CONFIG_INTEL_IOMMU_DEFAULT_ON is not set
CONFIG_INTEL_IOMMU_FLOPPY_WA=y
# CONFIG_INTEL_IOMMU_SCALABLE_MODE_DEFAULT_ON is not set
CONFIG_IRQ_REMAP=y
CONFIG_HYPERV_IOMMU=y

#
# Remoteproc drivers
#
# CONFIG_REMOTEPROC is not set
# end of Remoteproc drivers

#
# Rpmsg drivers
#
# CONFIG_RPMSG_QCOM_GLINK_RPM is not set
# CONFIG_RPMSG_VIRTIO is not set
# end of Rpmsg drivers

# CONFIG_SOUNDWIRE is not set

#
# SOC (System On Chip) specific Drivers
#

#
# Amlogic SoC drivers
#
# end of Amlogic SoC drivers

#
# Broadcom SoC drivers
#
# end of Broadcom SoC drivers

#
# NXP/Freescale QorIQ SoC drivers
#
# end of NXP/Freescale QorIQ SoC drivers

#
# i.MX SoC drivers
#
# end of i.MX SoC drivers

#
# Enable LiteX SoC Builder specific drivers
#
# end of Enable LiteX SoC Builder specific drivers

#
# Qualcomm SoC drivers
#
# end of Qualcomm SoC drivers

# CONFIG_SOC_TI is not set

#
# Xilinx SoC drivers
#
# end of Xilinx SoC drivers
# end of SOC (System On Chip) specific Drivers

# CONFIG_PM_DEVFREQ is not set
# CONFIG_EXTCON is not set
# CONFIG_MEMORY is not set
# CONFIG_IIO is not set
CONFIG_NTB=m
# CONFIG_NTB_MSI is not set
# CONFIG_NTB_AMD is not set
# CONFIG_NTB_IDT is not set
# CONFIG_NTB_INTEL is not set
# CONFIG_NTB_EPF is not set
# CONFIG_NTB_SWITCHTEC is not set
# CONFIG_NTB_PINGPONG is not set
# CONFIG_NTB_TOOL is not set
# CONFIG_NTB_PERF is not set
# CONFIG_NTB_TRANSPORT is not set
# CONFIG_VME_BUS is not set
CONFIG_PWM=y
CONFIG_PWM_SYSFS=y
# CONFIG_PWM_DEBUG is not set
# CONFIG_PWM_DWC is not set
CONFIG_PWM_LPSS=m
CONFIG_PWM_LPSS_PCI=m
CONFIG_PWM_LPSS_PLATFORM=m
# CONFIG_PWM_PCA9685 is not set

#
# IRQ chip support
#
# end of IRQ chip support

# CONFIG_IPACK_BUS is not set
# CONFIG_RESET_CONTROLLER is not set

#
# PHY Subsystem
#
# CONFIG_GENERIC_PHY is not set
# CONFIG_USB_LGM_PHY is not set
# CONFIG_BCM_KONA_USB2_PHY is not set
# CONFIG_PHY_PXA_28NM_HSIC is not set
# CONFIG_PHY_PXA_28NM_USB2 is not set
# CONFIG_PHY_INTEL_LGM_EMMC is not set
# end of PHY Subsystem

CONFIG_POWERCAP=y
CONFIG_INTEL_RAPL_CORE=m
CONFIG_INTEL_RAPL=m
# CONFIG_IDLE_INJECT is not set
# CONFIG_DTPM is not set
# CONFIG_MCB is not set

#
# Performance monitor support
#
# end of Performance monitor support

CONFIG_RAS=y
# CONFIG_RAS_CEC is not set
# CONFIG_USB4 is not set

#
# Android
#
# CONFIG_ANDROID is not set
# end of Android

CONFIG_LIBNVDIMM=m
CONFIG_BLK_DEV_PMEM=m
CONFIG_ND_BLK=m
CONFIG_ND_CLAIM=y
CONFIG_ND_BTT=m
CONFIG_BTT=y
CONFIG_ND_PFN=m
CONFIG_NVDIMM_PFN=y
CONFIG_NVDIMM_DAX=y
CONFIG_NVDIMM_KEYS=y
CONFIG_DAX_DRIVER=y
CONFIG_DAX=y
CONFIG_DEV_DAX=m
CONFIG_DEV_DAX_PMEM=m
CONFIG_DEV_DAX_KMEM=m
CONFIG_DEV_DAX_PMEM_COMPAT=m
CONFIG_NVMEM=y
CONFIG_NVMEM_SYSFS=y
# CONFIG_NVMEM_RMEM is not set

#
# HW tracing support
#
CONFIG_STM=m
# CONFIG_STM_PROTO_BASIC is not set
# CONFIG_STM_PROTO_SYS_T is not set
CONFIG_STM_DUMMY=m
CONFIG_STM_SOURCE_CONSOLE=m
CONFIG_STM_SOURCE_HEARTBEAT=m
CONFIG_STM_SOURCE_FTRACE=m
CONFIG_INTEL_TH=m
CONFIG_INTEL_TH_PCI=m
CONFIG_INTEL_TH_ACPI=m
CONFIG_INTEL_TH_GTH=m
CONFIG_INTEL_TH_STH=m
CONFIG_INTEL_TH_MSU=m
CONFIG_INTEL_TH_PTI=m
# CONFIG_INTEL_TH_DEBUG is not set
# end of HW tracing support

# CONFIG_FPGA is not set
# CONFIG_TEE is not set
# CONFIG_UNISYS_VISORBUS is not set
# CONFIG_SIOX is not set
# CONFIG_SLIMBUS is not set
# CONFIG_INTERCONNECT is not set
# CONFIG_COUNTER is not set
# CONFIG_MOST is not set
# end of Device Drivers

#
# File systems
#
CONFIG_DCACHE_WORD_ACCESS=y
# CONFIG_VALIDATE_FS_PARSER is not set
CONFIG_FS_IOMAP=y
CONFIG_EXT2_FS=m
CONFIG_EXT2_FS_XATTR=y
CONFIG_EXT2_FS_POSIX_ACL=y
CONFIG_EXT2_FS_SECURITY=y
# CONFIG_EXT3_FS is not set
CONFIG_EXT4_FS=y
CONFIG_EXT4_FS_POSIX_ACL=y
CONFIG_EXT4_FS_SECURITY=y
# CONFIG_EXT4_DEBUG is not set
CONFIG_EXT4_KUNIT_TESTS=m
CONFIG_JBD2=y
# CONFIG_JBD2_DEBUG is not set
CONFIG_FS_MBCACHE=y
# CONFIG_REISERFS_FS is not set
# CONFIG_JFS_FS is not set
CONFIG_XFS_FS=m
CONFIG_XFS_SUPPORT_V4=y
CONFIG_XFS_QUOTA=y
CONFIG_XFS_POSIX_ACL=y
CONFIG_XFS_RT=y
CONFIG_XFS_ONLINE_SCRUB=y
CONFIG_XFS_ONLINE_REPAIR=y
CONFIG_XFS_DEBUG=y
CONFIG_XFS_ASSERT_FATAL=y
CONFIG_GFS2_FS=m
CONFIG_GFS2_FS_LOCKING_DLM=y
CONFIG_OCFS2_FS=m
CONFIG_OCFS2_FS_O2CB=m
CONFIG_OCFS2_FS_USERSPACE_CLUSTER=m
CONFIG_OCFS2_FS_STATS=y
CONFIG_OCFS2_DEBUG_MASKLOG=y
# CONFIG_OCFS2_DEBUG_FS is not set
CONFIG_BTRFS_FS=m
CONFIG_BTRFS_FS_POSIX_ACL=y
# CONFIG_BTRFS_FS_CHECK_INTEGRITY is not set
# CONFIG_BTRFS_FS_RUN_SANITY_TESTS is not set
# CONFIG_BTRFS_DEBUG is not set
# CONFIG_BTRFS_ASSERT is not set
# CONFIG_BTRFS_FS_REF_VERIFY is not set
# CONFIG_NILFS2_FS is not set
CONFIG_F2FS_FS=m
CONFIG_F2FS_STAT_FS=y
CONFIG_F2FS_FS_XATTR=y
CONFIG_F2FS_FS_POSIX_ACL=y
CONFIG_F2FS_FS_SECURITY=y
# CONFIG_F2FS_CHECK_FS is not set
# CONFIG_F2FS_FAULT_INJECTION is not set
# CONFIG_F2FS_FS_COMPRESSION is not set
# CONFIG_ZONEFS_FS is not set
CONFIG_FS_DAX=y
CONFIG_FS_DAX_PMD=y
CONFIG_FS_POSIX_ACL=y
CONFIG_EXPORTFS=y
CONFIG_EXPORTFS_BLOCK_OPS=y
CONFIG_FILE_LOCKING=y
CONFIG_MANDATORY_FILE_LOCKING=y
CONFIG_FS_ENCRYPTION=y
CONFIG_FS_ENCRYPTION_ALGS=y
# CONFIG_FS_VERITY is not set
CONFIG_FSNOTIFY=y
CONFIG_DNOTIFY=y
CONFIG_INOTIFY_USER=y
CONFIG_FANOTIFY=y
CONFIG_FANOTIFY_ACCESS_PERMISSIONS=y
CONFIG_QUOTA=y
CONFIG_QUOTA_NETLINK_INTERFACE=y
CONFIG_PRINT_QUOTA_WARNING=y
# CONFIG_QUOTA_DEBUG is not set
CONFIG_QUOTA_TREE=y
# CONFIG_QFMT_V1 is not set
CONFIG_QFMT_V2=y
CONFIG_QUOTACTL=y
CONFIG_AUTOFS4_FS=y
CONFIG_AUTOFS_FS=y
CONFIG_FUSE_FS=m
CONFIG_CUSE=m
# CONFIG_VIRTIO_FS is not set
CONFIG_OVERLAY_FS=m
# CONFIG_OVERLAY_FS_REDIRECT_DIR is not set
# CONFIG_OVERLAY_FS_REDIRECT_ALWAYS_FOLLOW is not set
# CONFIG_OVERLAY_FS_INDEX is not set
# CONFIG_OVERLAY_FS_XINO_AUTO is not set
# CONFIG_OVERLAY_FS_METACOPY is not set

#
# Caches
#
CONFIG_FSCACHE=m
CONFIG_FSCACHE_STATS=y
# CONFIG_FSCACHE_HISTOGRAM is not set
# CONFIG_FSCACHE_DEBUG is not set
# CONFIG_FSCACHE_OBJECT_LIST is not set
CONFIG_CACHEFILES=m
# CONFIG_CACHEFILES_DEBUG is not set
# CONFIG_CACHEFILES_HISTOGRAM is not set
# end of Caches

#
# CD-ROM/DVD Filesystems
#
CONFIG_ISO9660_FS=m
CONFIG_JOLIET=y
CONFIG_ZISOFS=y
CONFIG_UDF_FS=m
# end of CD-ROM/DVD Filesystems

#
# DOS/FAT/EXFAT/NT Filesystems
#
CONFIG_FAT_FS=m
CONFIG_MSDOS_FS=m
CONFIG_VFAT_FS=m
CONFIG_FAT_DEFAULT_CODEPAGE=437
CONFIG_FAT_DEFAULT_IOCHARSET="ascii"
# CONFIG_FAT_DEFAULT_UTF8 is not set
# CONFIG_EXFAT_FS is not set
# CONFIG_NTFS_FS is not set
# end of DOS/FAT/EXFAT/NT Filesystems

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_PROC_VMCORE=y
CONFIG_PROC_VMCORE_DEVICE_DUMP=y
CONFIG_PROC_SYSCTL=y
CONFIG_PROC_PAGE_MONITOR=y
CONFIG_PROC_CHILDREN=y
CONFIG_PROC_PID_ARCH_STATUS=y
CONFIG_PROC_CPU_RESCTRL=y
CONFIG_KERNFS=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
CONFIG_TMPFS_POSIX_ACL=y
CONFIG_TMPFS_XATTR=y
# CONFIG_TMPFS_INODE64 is not set
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y
CONFIG_MEMFD_CREATE=y
CONFIG_ARCH_HAS_GIGANTIC_PAGE=y
CONFIG_CONFIGFS_FS=y
CONFIG_EFIVAR_FS=y
# end of Pseudo filesystems

CONFIG_MISC_FILESYSTEMS=y
# CONFIG_ORANGEFS_FS is not set
# CONFIG_ADFS_FS is not set
# CONFIG_AFFS_FS is not set
# CONFIG_ECRYPT_FS is not set
# CONFIG_HFS_FS is not set
# CONFIG_HFSPLUS_FS is not set
# CONFIG_BEFS_FS is not set
# CONFIG_BFS_FS is not set
# CONFIG_EFS_FS is not set
CONFIG_CRAMFS=m
CONFIG_CRAMFS_BLOCKDEV=y
CONFIG_SQUASHFS=m
# CONFIG_SQUASHFS_FILE_CACHE is not set
CONFIG_SQUASHFS_FILE_DIRECT=y
# CONFIG_SQUASHFS_DECOMP_SINGLE is not set
# CONFIG_SQUASHFS_DECOMP_MULTI is not set
CONFIG_SQUASHFS_DECOMP_MULTI_PERCPU=y
CONFIG_SQUASHFS_XATTR=y
CONFIG_SQUASHFS_ZLIB=y
# CONFIG_SQUASHFS_LZ4 is not set
CONFIG_SQUASHFS_LZO=y
CONFIG_SQUASHFS_XZ=y
# CONFIG_SQUASHFS_ZSTD is not set
# CONFIG_SQUASHFS_4K_DEVBLK_SIZE is not set
# CONFIG_SQUASHFS_EMBEDDED is not set
CONFIG_SQUASHFS_FRAGMENT_CACHE_SIZE=3
# CONFIG_VXFS_FS is not set
CONFIG_MINIX_FS=m
# CONFIG_OMFS_FS is not set
# CONFIG_HPFS_FS is not set
# CONFIG_QNX4FS_FS is not set
# CONFIG_QNX6FS_FS is not set
# CONFIG_ROMFS_FS is not set
CONFIG_PSTORE=y
CONFIG_PSTORE_DEFAULT_KMSG_BYTES=10240
CONFIG_PSTORE_DEFLATE_COMPRESS=y
# CONFIG_PSTORE_LZO_COMPRESS is not set
# CONFIG_PSTORE_LZ4_COMPRESS is not set
# CONFIG_PSTORE_LZ4HC_COMPRESS is not set
# CONFIG_PSTORE_842_COMPRESS is not set
# CONFIG_PSTORE_ZSTD_COMPRESS is not set
CONFIG_PSTORE_COMPRESS=y
CONFIG_PSTORE_DEFLATE_COMPRESS_DEFAULT=y
CONFIG_PSTORE_COMPRESS_DEFAULT="deflate"
# CONFIG_PSTORE_CONSOLE is not set
# CONFIG_PSTORE_PMSG is not set
# CONFIG_PSTORE_FTRACE is not set
CONFIG_PSTORE_RAM=m
# CONFIG_PSTORE_BLK is not set
# CONFIG_SYSV_FS is not set
# CONFIG_UFS_FS is not set
# CONFIG_EROFS_FS is not set
CONFIG_NETWORK_FILESYSTEMS=y
CONFIG_NFS_FS=y
# CONFIG_NFS_V2 is not set
CONFIG_NFS_V3=y
CONFIG_NFS_V3_ACL=y
CONFIG_NFS_V4=m
# CONFIG_NFS_SWAP is not set
CONFIG_NFS_V4_1=y
CONFIG_NFS_V4_2=y
CONFIG_PNFS_FILE_LAYOUT=m
CONFIG_PNFS_BLOCK=m
CONFIG_PNFS_FLEXFILE_LAYOUT=m
CONFIG_NFS_V4_1_IMPLEMENTATION_ID_DOMAIN="kernel.org"
# CONFIG_NFS_V4_1_MIGRATION is not set
CONFIG_NFS_V4_SECURITY_LABEL=y
CONFIG_ROOT_NFS=y
# CONFIG_NFS_USE_LEGACY_DNS is not set
CONFIG_NFS_USE_KERNEL_DNS=y
CONFIG_NFS_DEBUG=y
CONFIG_NFS_DISABLE_UDP_SUPPORT=y
# CONFIG_NFS_V4_2_READ_PLUS is not set
CONFIG_NFSD=m
CONFIG_NFSD_V2_ACL=y
CONFIG_NFSD_V3=y
CONFIG_NFSD_V3_ACL=y
CONFIG_NFSD_V4=y
CONFIG_NFSD_PNFS=y
# CONFIG_NFSD_BLOCKLAYOUT is not set
CONFIG_NFSD_SCSILAYOUT=y
# CONFIG_NFSD_FLEXFILELAYOUT is not set
# CONFIG_NFSD_V4_2_INTER_SSC is not set
CONFIG_NFSD_V4_SECURITY_LABEL=y
CONFIG_GRACE_PERIOD=y
CONFIG_LOCKD=y
CONFIG_LOCKD_V4=y
CONFIG_NFS_ACL_SUPPORT=y
CONFIG_NFS_COMMON=y
CONFIG_NFS_V4_2_SSC_HELPER=y
CONFIG_SUNRPC=y
CONFIG_SUNRPC_GSS=m
CONFIG_SUNRPC_BACKCHANNEL=y
CONFIG_RPCSEC_GSS_KRB5=m
# CONFIG_SUNRPC_DISABLE_INSECURE_ENCTYPES is not set
CONFIG_SUNRPC_DEBUG=y
CONFIG_SUNRPC_XPRT_RDMA=m
CONFIG_CEPH_FS=m
# CONFIG_CEPH_FSCACHE is not set
CONFIG_CEPH_FS_POSIX_ACL=y
# CONFIG_CEPH_FS_SECURITY_LABEL is not set
CONFIG_CIFS=m
# CONFIG_CIFS_STATS2 is not set
CONFIG_CIFS_ALLOW_INSECURE_LEGACY=y
CONFIG_CIFS_WEAK_PW_HASH=y
CONFIG_CIFS_UPCALL=y
CONFIG_CIFS_XATTR=y
CONFIG_CIFS_POSIX=y
CONFIG_CIFS_DEBUG=y
# CONFIG_CIFS_DEBUG2 is not set
# CONFIG_CIFS_DEBUG_DUMP_KEYS is not set
CONFIG_CIFS_DFS_UPCALL=y
# CONFIG_CIFS_SWN_UPCALL is not set
# CONFIG_CIFS_SMB_DIRECT is not set
# CONFIG_CIFS_FSCACHE is not set
# CONFIG_CODA_FS is not set
# CONFIG_AFS_FS is not set
# CONFIG_9P_FS is not set
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="utf8"
CONFIG_NLS_CODEPAGE_437=y
CONFIG_NLS_CODEPAGE_737=m
CONFIG_NLS_CODEPAGE_775=m
CONFIG_NLS_CODEPAGE_850=m
CONFIG_NLS_CODEPAGE_852=m
CONFIG_NLS_CODEPAGE_855=m
CONFIG_NLS_CODEPAGE_857=m
CONFIG_NLS_CODEPAGE_860=m
CONFIG_NLS_CODEPAGE_861=m
CONFIG_NLS_CODEPAGE_862=m
CONFIG_NLS_CODEPAGE_863=m
CONFIG_NLS_CODEPAGE_864=m
CONFIG_NLS_CODEPAGE_865=m
CONFIG_NLS_CODEPAGE_866=m
CONFIG_NLS_CODEPAGE_869=m
CONFIG_NLS_CODEPAGE_936=m
CONFIG_NLS_CODEPAGE_950=m
CONFIG_NLS_CODEPAGE_932=m
CONFIG_NLS_CODEPAGE_949=m
CONFIG_NLS_CODEPAGE_874=m
CONFIG_NLS_ISO8859_8=m
CONFIG_NLS_CODEPAGE_1250=m
CONFIG_NLS_CODEPAGE_1251=m
CONFIG_NLS_ASCII=y
CONFIG_NLS_ISO8859_1=m
CONFIG_NLS_ISO8859_2=m
CONFIG_NLS_ISO8859_3=m
CONFIG_NLS_ISO8859_4=m
CONFIG_NLS_ISO8859_5=m
CONFIG_NLS_ISO8859_6=m
CONFIG_NLS_ISO8859_7=m
CONFIG_NLS_ISO8859_9=m
CONFIG_NLS_ISO8859_13=m
CONFIG_NLS_ISO8859_14=m
CONFIG_NLS_ISO8859_15=m
CONFIG_NLS_KOI8_R=m
CONFIG_NLS_KOI8_U=m
CONFIG_NLS_MAC_ROMAN=m
CONFIG_NLS_MAC_CELTIC=m
CONFIG_NLS_MAC_CENTEURO=m
CONFIG_NLS_MAC_CROATIAN=m
CONFIG_NLS_MAC_CYRILLIC=m
CONFIG_NLS_MAC_GAELIC=m
CONFIG_NLS_MAC_GREEK=m
CONFIG_NLS_MAC_ICELAND=m
CONFIG_NLS_MAC_INUIT=m
CONFIG_NLS_MAC_ROMANIAN=m
CONFIG_NLS_MAC_TURKISH=m
CONFIG_NLS_UTF8=m
CONFIG_DLM=m
CONFIG_DLM_DEBUG=y
# CONFIG_UNICODE is not set
CONFIG_IO_WQ=y
# end of File systems

#
# Security options
#
CONFIG_KEYS=y
# CONFIG_KEYS_REQUEST_CACHE is not set
CONFIG_PERSISTENT_KEYRINGS=y
CONFIG_TRUSTED_KEYS=y
CONFIG_ENCRYPTED_KEYS=y
# CONFIG_KEY_DH_OPERATIONS is not set
# CONFIG_SECURITY_DMESG_RESTRICT is not set
CONFIG_SECURITY=y
CONFIG_SECURITY_WRITABLE_HOOKS=y
CONFIG_SECURITYFS=y
CONFIG_SECURITY_NETWORK=y
CONFIG_PAGE_TABLE_ISOLATION=y
# CONFIG_SECURITY_INFINIBAND is not set
CONFIG_SECURITY_NETWORK_XFRM=y
CONFIG_SECURITY_PATH=y
CONFIG_INTEL_TXT=y
CONFIG_LSM_MMAP_MIN_ADDR=65535
CONFIG_HAVE_HARDENED_USERCOPY_ALLOCATOR=y
CONFIG_HARDENED_USERCOPY=y
CONFIG_HARDENED_USERCOPY_FALLBACK=y
CONFIG_FORTIFY_SOURCE=y
# CONFIG_STATIC_USERMODEHELPER is not set
# CONFIG_STRICT_FOLLOW_PFN is not set
CONFIG_SECURITY_SELINUX=y
CONFIG_SECURITY_SELINUX_BOOTPARAM=y
CONFIG_SECURITY_SELINUX_DISABLE=y
CONFIG_SECURITY_SELINUX_DEVELOP=y
CONFIG_SECURITY_SELINUX_AVC_STATS=y
CONFIG_SECURITY_SELINUX_CHECKREQPROT_VALUE=1
CONFIG_SECURITY_SELINUX_SIDTAB_HASH_BITS=9
CONFIG_SECURITY_SELINUX_SID2STR_CACHE_SIZE=256
# CONFIG_SECURITY_SMACK is not set
# CONFIG_SECURITY_TOMOYO is not set
CONFIG_SECURITY_APPARMOR=y
CONFIG_SECURITY_APPARMOR_HASH=y
CONFIG_SECURITY_APPARMOR_HASH_DEFAULT=y
# CONFIG_SECURITY_APPARMOR_DEBUG is not set
# CONFIG_SECURITY_APPARMOR_KUNIT_TEST is not set
# CONFIG_SECURITY_LOADPIN is not set
CONFIG_SECURITY_YAMA=y
# CONFIG_SECURITY_SAFESETID is not set
# CONFIG_SECURITY_LOCKDOWN_LSM is not set
CONFIG_INTEGRITY=y
CONFIG_INTEGRITY_SIGNATURE=y
CONFIG_INTEGRITY_ASYMMETRIC_KEYS=y
CONFIG_INTEGRITY_TRUSTED_KEYRING=y
# CONFIG_INTEGRITY_PLATFORM_KEYRING is not set
CONFIG_INTEGRITY_AUDIT=y
CONFIG_IMA=y
CONFIG_IMA_MEASURE_PCR_IDX=10
CONFIG_IMA_LSM_RULES=y
# CONFIG_IMA_TEMPLATE is not set
CONFIG_IMA_NG_TEMPLATE=y
# CONFIG_IMA_SIG_TEMPLATE is not set
CONFIG_IMA_DEFAULT_TEMPLATE="ima-ng"
CONFIG_IMA_DEFAULT_HASH_SHA1=y
# CONFIG_IMA_DEFAULT_HASH_SHA256 is not set
# CONFIG_IMA_DEFAULT_HASH_SHA512 is not set
CONFIG_IMA_DEFAULT_HASH="sha1"
# CONFIG_IMA_WRITE_POLICY is not set
# CONFIG_IMA_READ_POLICY is not set
CONFIG_IMA_APPRAISE=y
# CONFIG_IMA_ARCH_POLICY is not set
# CONFIG_IMA_APPRAISE_BUILD_POLICY is not set
CONFIG_IMA_APPRAISE_BOOTPARAM=y
# CONFIG_IMA_APPRAISE_MODSIG is not set
CONFIG_IMA_TRUSTED_KEYRING=y
# CONFIG_IMA_BLACKLIST_KEYRING is not set
# CONFIG_IMA_LOAD_X509 is not set
CONFIG_IMA_MEASURE_ASYMMETRIC_KEYS=y
CONFIG_IMA_QUEUE_EARLY_BOOT_KEYS=y
# CONFIG_IMA_SECURE_AND_OR_TRUSTED_BOOT is not set
CONFIG_EVM=y
CONFIG_EVM_ATTR_FSUUID=y
# CONFIG_EVM_ADD_XATTRS is not set
# CONFIG_EVM_LOAD_X509 is not set
CONFIG_DEFAULT_SECURITY_SELINUX=y
# CONFIG_DEFAULT_SECURITY_APPARMOR is not set
# CONFIG_DEFAULT_SECURITY_DAC is not set
CONFIG_LSM="lockdown,yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor,bpf"

#
# Kernel hardening options
#

#
# Memory initialization
#
CONFIG_INIT_STACK_NONE=y
# CONFIG_INIT_ON_ALLOC_DEFAULT_ON is not set
# CONFIG_INIT_ON_FREE_DEFAULT_ON is not set
# end of Memory initialization
# end of Kernel hardening options
# end of Security options

CONFIG_XOR_BLOCKS=m
CONFIG_ASYNC_CORE=m
CONFIG_ASYNC_MEMCPY=m
CONFIG_ASYNC_XOR=m
CONFIG_ASYNC_PQ=m
CONFIG_ASYNC_RAID6_RECOV=m
CONFIG_CRYPTO=y

#
# Crypto core or helper
#
CONFIG_CRYPTO_ALGAPI=y
CONFIG_CRYPTO_ALGAPI2=y
CONFIG_CRYPTO_AEAD=y
CONFIG_CRYPTO_AEAD2=y
CONFIG_CRYPTO_SKCIPHER=y
CONFIG_CRYPTO_SKCIPHER2=y
CONFIG_CRYPTO_HASH=y
CONFIG_CRYPTO_HASH2=y
CONFIG_CRYPTO_RNG=y
CONFIG_CRYPTO_RNG2=y
CONFIG_CRYPTO_RNG_DEFAULT=y
CONFIG_CRYPTO_AKCIPHER2=y
CONFIG_CRYPTO_AKCIPHER=y
CONFIG_CRYPTO_KPP2=y
CONFIG_CRYPTO_KPP=m
CONFIG_CRYPTO_ACOMP2=y
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_MANAGER2=y
CONFIG_CRYPTO_USER=m
CONFIG_CRYPTO_MANAGER_DISABLE_TESTS=y
CONFIG_CRYPTO_GF128MUL=y
CONFIG_CRYPTO_NULL=y
CONFIG_CRYPTO_NULL2=y
CONFIG_CRYPTO_PCRYPT=m
CONFIG_CRYPTO_CRYPTD=y
CONFIG_CRYPTO_AUTHENC=m
CONFIG_CRYPTO_TEST=m
CONFIG_CRYPTO_SIMD=y

#
# Public-key cryptography
#
CONFIG_CRYPTO_RSA=y
CONFIG_CRYPTO_DH=m
CONFIG_CRYPTO_ECC=m
CONFIG_CRYPTO_ECDH=m
# CONFIG_CRYPTO_ECRDSA is not set
# CONFIG_CRYPTO_SM2 is not set
# CONFIG_CRYPTO_CURVE25519 is not set
# CONFIG_CRYPTO_CURVE25519_X86 is not set

#
# Authenticated Encryption with Associated Data
#
CONFIG_CRYPTO_CCM=m
CONFIG_CRYPTO_GCM=y
CONFIG_CRYPTO_CHACHA20POLY1305=m
# CONFIG_CRYPTO_AEGIS128 is not set
# CONFIG_CRYPTO_AEGIS128_AESNI_SSE2 is not set
CONFIG_CRYPTO_SEQIV=y
CONFIG_CRYPTO_ECHAINIV=m

#
# Block modes
#
CONFIG_CRYPTO_CBC=y
CONFIG_CRYPTO_CFB=y
CONFIG_CRYPTO_CTR=y
CONFIG_CRYPTO_CTS=y
CONFIG_CRYPTO_ECB=y
CONFIG_CRYPTO_LRW=m
# CONFIG_CRYPTO_OFB is not set
CONFIG_CRYPTO_PCBC=m
CONFIG_CRYPTO_XTS=y
# CONFIG_CRYPTO_KEYWRAP is not set
# CONFIG_CRYPTO_NHPOLY1305_SSE2 is not set
# CONFIG_CRYPTO_NHPOLY1305_AVX2 is not set
# CONFIG_CRYPTO_ADIANTUM is not set
CONFIG_CRYPTO_ESSIV=m

#
# Hash modes
#
CONFIG_CRYPTO_CMAC=m
CONFIG_CRYPTO_HMAC=y
CONFIG_CRYPTO_XCBC=m
CONFIG_CRYPTO_VMAC=m

#
# Digest
#
CONFIG_CRYPTO_CRC32C=y
CONFIG_CRYPTO_CRC32C_INTEL=m
CONFIG_CRYPTO_CRC32=m
CONFIG_CRYPTO_CRC32_PCLMUL=m
CONFIG_CRYPTO_XXHASH=m
CONFIG_CRYPTO_BLAKE2B=m
# CONFIG_CRYPTO_BLAKE2S is not set
# CONFIG_CRYPTO_BLAKE2S_X86 is not set
CONFIG_CRYPTO_CRCT10DIF=y
CONFIG_CRYPTO_CRCT10DIF_PCLMUL=m
CONFIG_CRYPTO_GHASH=y
CONFIG_CRYPTO_POLY1305=m
CONFIG_CRYPTO_POLY1305_X86_64=m
CONFIG_CRYPTO_MD4=m
CONFIG_CRYPTO_MD5=y
CONFIG_CRYPTO_MICHAEL_MIC=m
CONFIG_CRYPTO_RMD160=m
CONFIG_CRYPTO_SHA1=y
CONFIG_CRYPTO_SHA1_SSSE3=y
CONFIG_CRYPTO_SHA256_SSSE3=y
CONFIG_CRYPTO_SHA512_SSSE3=m
CONFIG_CRYPTO_SHA256=y
CONFIG_CRYPTO_SHA512=y
CONFIG_CRYPTO_SHA3=m
# CONFIG_CRYPTO_SM3 is not set
# CONFIG_CRYPTO_STREEBOG is not set
CONFIG_CRYPTO_WP512=m
CONFIG_CRYPTO_GHASH_CLMUL_NI_INTEL=m

#
# Ciphers
#
CONFIG_CRYPTO_AES=y
# CONFIG_CRYPTO_AES_TI is not set
CONFIG_CRYPTO_AES_NI_INTEL=y
CONFIG_CRYPTO_ANUBIS=m
CONFIG_CRYPTO_ARC4=m
CONFIG_CRYPTO_BLOWFISH=m
CONFIG_CRYPTO_BLOWFISH_COMMON=m
CONFIG_CRYPTO_BLOWFISH_X86_64=m
CONFIG_CRYPTO_CAMELLIA=m
CONFIG_CRYPTO_CAMELLIA_X86_64=m
CONFIG_CRYPTO_CAMELLIA_AESNI_AVX_X86_64=m
CONFIG_CRYPTO_CAMELLIA_AESNI_AVX2_X86_64=m
CONFIG_CRYPTO_CAST_COMMON=m
CONFIG_CRYPTO_CAST5=m
CONFIG_CRYPTO_CAST5_AVX_X86_64=m
CONFIG_CRYPTO_CAST6=m
CONFIG_CRYPTO_CAST6_AVX_X86_64=m
CONFIG_CRYPTO_DES=m
CONFIG_CRYPTO_DES3_EDE_X86_64=m
CONFIG_CRYPTO_FCRYPT=m
CONFIG_CRYPTO_KHAZAD=m
CONFIG_CRYPTO_CHACHA20=m
CONFIG_CRYPTO_CHACHA20_X86_64=m
CONFIG_CRYPTO_SEED=m
CONFIG_CRYPTO_SERPENT=m
CONFIG_CRYPTO_SERPENT_SSE2_X86_64=m
CONFIG_CRYPTO_SERPENT_AVX_X86_64=m
CONFIG_CRYPTO_SERPENT_AVX2_X86_64=m
# CONFIG_CRYPTO_SM4 is not set
CONFIG_CRYPTO_TEA=m
CONFIG_CRYPTO_TWOFISH=m
CONFIG_CRYPTO_TWOFISH_COMMON=m
CONFIG_CRYPTO_TWOFISH_X86_64=m
CONFIG_CRYPTO_TWOFISH_X86_64_3WAY=m
CONFIG_CRYPTO_TWOFISH_AVX_X86_64=m

#
# Compression
#
CONFIG_CRYPTO_DEFLATE=y
CONFIG_CRYPTO_LZO=y
# CONFIG_CRYPTO_842 is not set
# CONFIG_CRYPTO_LZ4 is not set
# CONFIG_CRYPTO_LZ4HC is not set
# CONFIG_CRYPTO_ZSTD is not set

#
# Random Number Generation
#
CONFIG_CRYPTO_ANSI_CPRNG=m
CONFIG_CRYPTO_DRBG_MENU=y
CONFIG_CRYPTO_DRBG_HMAC=y
CONFIG_CRYPTO_DRBG_HASH=y
CONFIG_CRYPTO_DRBG_CTR=y
CONFIG_CRYPTO_DRBG=y
CONFIG_CRYPTO_JITTERENTROPY=y
CONFIG_CRYPTO_USER_API=y
CONFIG_CRYPTO_USER_API_HASH=y
CONFIG_CRYPTO_USER_API_SKCIPHER=y
CONFIG_CRYPTO_USER_API_RNG=y
# CONFIG_CRYPTO_USER_API_RNG_CAVP is not set
CONFIG_CRYPTO_USER_API_AEAD=y
CONFIG_CRYPTO_USER_API_ENABLE_OBSOLETE=y
# CONFIG_CRYPTO_STATS is not set
CONFIG_CRYPTO_HASH_INFO=y

#
# Crypto library routines
#
CONFIG_CRYPTO_LIB_AES=y
CONFIG_CRYPTO_LIB_ARC4=m
# CONFIG_CRYPTO_LIB_BLAKE2S is not set
CONFIG_CRYPTO_ARCH_HAVE_LIB_CHACHA=m
CONFIG_CRYPTO_LIB_CHACHA_GENERIC=m
# CONFIG_CRYPTO_LIB_CHACHA is not set
# CONFIG_CRYPTO_LIB_CURVE25519 is not set
CONFIG_CRYPTO_LIB_DES=m
CONFIG_CRYPTO_LIB_POLY1305_RSIZE=11
CONFIG_CRYPTO_ARCH_HAVE_LIB_POLY1305=m
CONFIG_CRYPTO_LIB_POLY1305_GENERIC=m
# CONFIG_CRYPTO_LIB_POLY1305 is not set
# CONFIG_CRYPTO_LIB_CHACHA20POLY1305 is not set
CONFIG_CRYPTO_LIB_SHA256=y
CONFIG_CRYPTO_HW=y
CONFIG_CRYPTO_DEV_PADLOCK=m
CONFIG_CRYPTO_DEV_PADLOCK_AES=m
CONFIG_CRYPTO_DEV_PADLOCK_SHA=m
# CONFIG_CRYPTO_DEV_ATMEL_ECC is not set
# CONFIG_CRYPTO_DEV_ATMEL_SHA204A is not set
CONFIG_CRYPTO_DEV_CCP=y
CONFIG_CRYPTO_DEV_CCP_DD=m
CONFIG_CRYPTO_DEV_SP_CCP=y
CONFIG_CRYPTO_DEV_CCP_CRYPTO=m
CONFIG_CRYPTO_DEV_SP_PSP=y
# CONFIG_CRYPTO_DEV_CCP_DEBUGFS is not set
CONFIG_CRYPTO_DEV_QAT=m
CONFIG_CRYPTO_DEV_QAT_DH895xCC=m
CONFIG_CRYPTO_DEV_QAT_C3XXX=m
CONFIG_CRYPTO_DEV_QAT_C62X=m
# CONFIG_CRYPTO_DEV_QAT_4XXX is not set
CONFIG_CRYPTO_DEV_QAT_DH895xCCVF=m
CONFIG_CRYPTO_DEV_QAT_C3XXXVF=m
CONFIG_CRYPTO_DEV_QAT_C62XVF=m
CONFIG_CRYPTO_DEV_NITROX=m
CONFIG_CRYPTO_DEV_NITROX_CNN55XX=m
# CONFIG_CRYPTO_DEV_VIRTIO is not set
# CONFIG_CRYPTO_DEV_SAFEXCEL is not set
# CONFIG_CRYPTO_DEV_AMLOGIC_GXL is not set
CONFIG_ASYMMETRIC_KEY_TYPE=y
CONFIG_ASYMMETRIC_PUBLIC_KEY_SUBTYPE=y
# CONFIG_ASYMMETRIC_TPM_KEY_SUBTYPE is not set
CONFIG_X509_CERTIFICATE_PARSER=y
# CONFIG_PKCS8_PRIVATE_KEY_PARSER is not set
CONFIG_PKCS7_MESSAGE_PARSER=y
# CONFIG_PKCS7_TEST_KEY is not set
CONFIG_SIGNED_PE_FILE_VERIFICATION=y

#
# Certificates for signature checking
#
CONFIG_MODULE_SIG_KEY="certs/signing_key.pem"
CONFIG_SYSTEM_TRUSTED_KEYRING=y
CONFIG_SYSTEM_TRUSTED_KEYS=""
# CONFIG_SYSTEM_EXTRA_CERTIFICATE is not set
# CONFIG_SECONDARY_TRUSTED_KEYRING is not set
CONFIG_SYSTEM_BLACKLIST_KEYRING=y
CONFIG_SYSTEM_BLACKLIST_HASH_LIST=""
# end of Certificates for signature checking

CONFIG_BINARY_PRINTF=y

#
# Library routines
#
CONFIG_RAID6_PQ=m
CONFIG_RAID6_PQ_BENCHMARK=y
# CONFIG_PACKING is not set
CONFIG_BITREVERSE=y
CONFIG_GENERIC_STRNCPY_FROM_USER=y
CONFIG_GENERIC_STRNLEN_USER=y
CONFIG_GENERIC_NET_UTILS=y
CONFIG_GENERIC_FIND_FIRST_BIT=y
CONFIG_CORDIC=m
# CONFIG_PRIME_NUMBERS is not set
CONFIG_RATIONAL=y
CONFIG_GENERIC_PCI_IOMAP=y
CONFIG_GENERIC_IOMAP=y
CONFIG_ARCH_USE_CMPXCHG_LOCKREF=y
CONFIG_ARCH_HAS_FAST_MULTIPLIER=y
CONFIG_ARCH_USE_SYM_ANNOTATIONS=y
CONFIG_CRC_CCITT=y
CONFIG_CRC16=y
CONFIG_CRC_T10DIF=y
CONFIG_CRC_ITU_T=m
CONFIG_CRC32=y
# CONFIG_CRC32_SELFTEST is not set
CONFIG_CRC32_SLICEBY8=y
# CONFIG_CRC32_SLICEBY4 is not set
# CONFIG_CRC32_SARWATE is not set
# CONFIG_CRC32_BIT is not set
# CONFIG_CRC64 is not set
# CONFIG_CRC4 is not set
CONFIG_CRC7=m
CONFIG_LIBCRC32C=m
CONFIG_CRC8=m
CONFIG_XXHASH=y
# CONFIG_RANDOM32_SELFTEST is not set
CONFIG_ZLIB_INFLATE=y
CONFIG_ZLIB_DEFLATE=y
CONFIG_LZO_COMPRESS=y
CONFIG_LZO_DECOMPRESS=y
CONFIG_LZ4_DECOMPRESS=y
CONFIG_ZSTD_COMPRESS=m
CONFIG_ZSTD_DECOMPRESS=y
CONFIG_XZ_DEC=y
CONFIG_XZ_DEC_X86=y
CONFIG_XZ_DEC_POWERPC=y
CONFIG_XZ_DEC_IA64=y
CONFIG_XZ_DEC_ARM=y
CONFIG_XZ_DEC_ARMTHUMB=y
CONFIG_XZ_DEC_SPARC=y
CONFIG_XZ_DEC_BCJ=y
# CONFIG_XZ_DEC_TEST is not set
CONFIG_DECOMPRESS_GZIP=y
CONFIG_DECOMPRESS_BZIP2=y
CONFIG_DECOMPRESS_LZMA=y
CONFIG_DECOMPRESS_XZ=y
CONFIG_DECOMPRESS_LZO=y
CONFIG_DECOMPRESS_LZ4=y
CONFIG_DECOMPRESS_ZSTD=y
CONFIG_GENERIC_ALLOCATOR=y
CONFIG_REED_SOLOMON=m
CONFIG_REED_SOLOMON_ENC8=y
CONFIG_REED_SOLOMON_DEC8=y
CONFIG_TEXTSEARCH=y
CONFIG_TEXTSEARCH_KMP=m
CONFIG_TEXTSEARCH_BM=m
CONFIG_TEXTSEARCH_FSM=m
CONFIG_INTERVAL_TREE=y
CONFIG_XARRAY_MULTI=y
CONFIG_ASSOCIATIVE_ARRAY=y
CONFIG_HAS_IOMEM=y
CONFIG_HAS_IOPORT_MAP=y
CONFIG_HAS_DMA=y
CONFIG_DMA_OPS=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_ARCH_DMA_ADDR_T_64BIT=y
CONFIG_ARCH_HAS_FORCE_DMA_UNENCRYPTED=y
CONFIG_SWIOTLB=y
CONFIG_DMA_COHERENT_POOL=y
CONFIG_DMA_CMA=y
# CONFIG_DMA_PERNUMA_CMA is not set

#
# Default contiguous memory area size:
#
CONFIG_CMA_SIZE_MBYTES=200
CONFIG_CMA_SIZE_SEL_MBYTES=y
# CONFIG_CMA_SIZE_SEL_PERCENTAGE is not set
# CONFIG_CMA_SIZE_SEL_MIN is not set
# CONFIG_CMA_SIZE_SEL_MAX is not set
CONFIG_CMA_ALIGNMENT=8
# CONFIG_DMA_API_DEBUG is not set
# CONFIG_DMA_MAP_BENCHMARK is not set
CONFIG_SGL_ALLOC=y
CONFIG_CHECK_SIGNATURE=y
CONFIG_CPUMASK_OFFSTACK=y
CONFIG_CPU_RMAP=y
CONFIG_DQL=y
CONFIG_GLOB=y
# CONFIG_GLOB_SELFTEST is not set
CONFIG_NLATTR=y
CONFIG_CLZ_TAB=y
CONFIG_IRQ_POLL=y
CONFIG_MPILIB=y
CONFIG_SIGNATURE=y
CONFIG_DIMLIB=y
CONFIG_OID_REGISTRY=y
CONFIG_UCS2_STRING=y
CONFIG_HAVE_GENERIC_VDSO=y
CONFIG_GENERIC_GETTIMEOFDAY=y
CONFIG_GENERIC_VDSO_TIME_NS=y
CONFIG_FONT_SUPPORT=y
# CONFIG_FONTS is not set
CONFIG_FONT_8x8=y
CONFIG_FONT_8x16=y
CONFIG_SG_POOL=y
CONFIG_ARCH_HAS_PMEM_API=y
CONFIG_MEMREGION=y
CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE=y
CONFIG_ARCH_HAS_COPY_MC=y
CONFIG_ARCH_STACKWALK=y
CONFIG_SBITMAP=y
# CONFIG_STRING_SELFTEST is not set
# end of Library routines

#
# Kernel hacking
#

#
# printk and dmesg options
#
CONFIG_PRINTK_TIME=y
# CONFIG_PRINTK_CALLER is not set
CONFIG_CONSOLE_LOGLEVEL_DEFAULT=7
CONFIG_CONSOLE_LOGLEVEL_QUIET=4
CONFIG_MESSAGE_LOGLEVEL_DEFAULT=4
CONFIG_BOOT_PRINTK_DELAY=y
CONFIG_DYNAMIC_DEBUG=y
CONFIG_DYNAMIC_DEBUG_CORE=y
CONFIG_SYMBOLIC_ERRNAME=y
CONFIG_DEBUG_BUGVERBOSE=y
# end of printk and dmesg options

#
# Compile-time checks and compiler options
#
CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_INFO_REDUCED=y
# CONFIG_DEBUG_INFO_COMPRESSED is not set
# CONFIG_DEBUG_INFO_SPLIT is not set
# CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT is not set
CONFIG_DEBUG_INFO_DWARF4=y
# CONFIG_DEBUG_INFO_DWARF5 is not set
CONFIG_PAHOLE_HAS_SPLIT_BTF=y
# CONFIG_GDB_SCRIPTS is not set
CONFIG_FRAME_WARN=2048
CONFIG_STRIP_ASM_SYMS=y
# CONFIG_READABLE_ASM is not set
# CONFIG_HEADERS_INSTALL is not set
CONFIG_DEBUG_SECTION_MISMATCH=y
CONFIG_SECTION_MISMATCH_WARN_ONLY=y
CONFIG_STACK_VALIDATION=y
# CONFIG_DEBUG_FORCE_WEAK_PER_CPU is not set
# end of Compile-time checks and compiler options

#
# Generic Kernel Debugging Instruments
#
CONFIG_MAGIC_SYSRQ=y
CONFIG_MAGIC_SYSRQ_DEFAULT_ENABLE=0x1
CONFIG_MAGIC_SYSRQ_SERIAL=y
CONFIG_MAGIC_SYSRQ_SERIAL_SEQUENCE=""
CONFIG_DEBUG_FS=y
CONFIG_DEBUG_FS_ALLOW_ALL=y
# CONFIG_DEBUG_FS_DISALLOW_MOUNT is not set
# CONFIG_DEBUG_FS_ALLOW_NONE is not set
CONFIG_HAVE_ARCH_KGDB=y
# CONFIG_KGDB is not set
CONFIG_ARCH_HAS_UBSAN_SANITIZE_ALL=y
# CONFIG_UBSAN is not set
CONFIG_HAVE_ARCH_KCSAN=y
# end of Generic Kernel Debugging Instruments

CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_MISC=y

#
# Memory Debugging
#
# CONFIG_PAGE_EXTENSION is not set
# CONFIG_DEBUG_PAGEALLOC is not set
# CONFIG_PAGE_OWNER is not set
# CONFIG_PAGE_POISONING is not set
# CONFIG_DEBUG_PAGE_REF is not set
# CONFIG_DEBUG_RODATA_TEST is not set
CONFIG_ARCH_HAS_DEBUG_WX=y
# CONFIG_DEBUG_WX is not set
CONFIG_GENERIC_PTDUMP=y
# CONFIG_PTDUMP_DEBUGFS is not set
# CONFIG_DEBUG_OBJECTS is not set
# CONFIG_SLUB_DEBUG_ON is not set
# CONFIG_SLUB_STATS is not set
CONFIG_HAVE_DEBUG_KMEMLEAK=y
# CONFIG_DEBUG_KMEMLEAK is not set
# CONFIG_DEBUG_STACK_USAGE is not set
# CONFIG_SCHED_STACK_END_CHECK is not set
CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE=y
# CONFIG_DEBUG_VM is not set
# CONFIG_DEBUG_VM_PGTABLE is not set
CONFIG_ARCH_HAS_DEBUG_VIRTUAL=y
# CONFIG_DEBUG_VIRTUAL is not set
CONFIG_DEBUG_MEMORY_INIT=y
# CONFIG_DEBUG_PER_CPU_MAPS is not set
CONFIG_HAVE_ARCH_KASAN=y
CONFIG_HAVE_ARCH_KASAN_VMALLOC=y
CONFIG_CC_HAS_KASAN_GENERIC=y
CONFIG_CC_HAS_WORKING_NOSANITIZE_ADDRESS=y
# CONFIG_KASAN is not set
CONFIG_HAVE_ARCH_KFENCE=y
# CONFIG_KFENCE is not set
# end of Memory Debugging

CONFIG_DEBUG_SHIRQ=y

#
# Debug Oops, Lockups and Hangs
#
CONFIG_PANIC_ON_OOPS=y
CONFIG_PANIC_ON_OOPS_VALUE=1
CONFIG_PANIC_TIMEOUT=0
CONFIG_LOCKUP_DETECTOR=y
CONFIG_SOFTLOCKUP_DETECTOR=y
# CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC is not set
CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE=0
CONFIG_HARDLOCKUP_DETECTOR_PERF=y
CONFIG_HARDLOCKUP_CHECK_TIMESTAMP=y
CONFIG_HARDLOCKUP_DETECTOR=y
CONFIG_BOOTPARAM_HARDLOCKUP_PANIC=y
CONFIG_BOOTPARAM_HARDLOCKUP_PANIC_VALUE=1
# CONFIG_DETECT_HUNG_TASK is not set
# CONFIG_WQ_WATCHDOG is not set
# CONFIG_TEST_LOCKUP is not set
# end of Debug Oops, Lockups and Hangs

#
# Scheduler Debugging
#
CONFIG_SCHED_DEBUG=y
CONFIG_SCHED_INFO=y
CONFIG_SCHEDSTATS=y
# end of Scheduler Debugging

# CONFIG_DEBUG_TIMEKEEPING is not set

#
# Lock Debugging (spinlocks, mutexes, etc...)
#
CONFIG_LOCK_DEBUGGING_SUPPORT=y
# CONFIG_PROVE_LOCKING is not set
# CONFIG_LOCK_STAT is not set
# CONFIG_DEBUG_RT_MUTEXES is not set
# CONFIG_DEBUG_SPINLOCK is not set
# CONFIG_DEBUG_MUTEXES is not set
# CONFIG_DEBUG_WW_MUTEX_SLOWPATH is not set
# CONFIG_DEBUG_RWSEMS is not set
# CONFIG_DEBUG_LOCK_ALLOC is not set
CONFIG_DEBUG_ATOMIC_SLEEP=y
# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
CONFIG_LOCK_TORTURE_TEST=m
# CONFIG_WW_MUTEX_SELFTEST is not set
# CONFIG_SCF_TORTURE_TEST is not set
# CONFIG_CSD_LOCK_WAIT_DEBUG is not set
# end of Lock Debugging (spinlocks, mutexes, etc...)

# CONFIG_DEBUG_IRQFLAGS is not set
CONFIG_STACKTRACE=y
# CONFIG_WARN_ALL_UNSEEDED_RANDOM is not set
# CONFIG_DEBUG_KOBJECT is not set

#
# Debug kernel data structures
#
CONFIG_DEBUG_LIST=y
# CONFIG_DEBUG_PLIST is not set
# CONFIG_DEBUG_SG is not set
# CONFIG_DEBUG_NOTIFIERS is not set
CONFIG_BUG_ON_DATA_CORRUPTION=y
# end of Debug kernel data structures

# CONFIG_DEBUG_CREDENTIALS is not set

#
# RCU Debugging
#
CONFIG_TORTURE_TEST=m
CONFIG_RCU_SCALE_TEST=m
CONFIG_RCU_TORTURE_TEST=m
# CONFIG_RCU_REF_SCALE_TEST is not set
CONFIG_RCU_CPU_STALL_TIMEOUT=60
# CONFIG_RCU_TRACE is not set
# CONFIG_RCU_EQS_DEBUG is not set
# end of RCU Debugging

# CONFIG_DEBUG_WQ_FORCE_RR_CPU is not set
# CONFIG_DEBUG_BLOCK_EXT_DEVT is not set
# CONFIG_CPU_HOTPLUG_STATE_CONTROL is not set
CONFIG_LATENCYTOP=y
CONFIG_USER_STACKTRACE_SUPPORT=y
CONFIG_NOP_TRACER=y
CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
CONFIG_HAVE_FENTRY=y
CONFIG_HAVE_OBJTOOL_MCOUNT=y
CONFIG_HAVE_C_RECORDMCOUNT=y
CONFIG_TRACER_MAX_TRACE=y
CONFIG_TRACE_CLOCK=y
CONFIG_RING_BUFFER=y
CONFIG_EVENT_TRACING=y
CONFIG_CONTEXT_SWITCH_TRACER=y
CONFIG_TRACING=y
CONFIG_GLOBAL_TRACE_BUF_SIZE=1441792
CONFIG_GENERIC_TRACER=y
CONFIG_TRACING_SUPPORT=y
CONFIG_FTRACE=y
# CONFIG_BOOTTIME_TRACING is not set
CONFIG_FUNCTION_TRACER=y
CONFIG_FUNCTION_GRAPH_TRACER=y
CONFIG_DYNAMIC_FTRACE=y
CONFIG_DYNAMIC_FTRACE_WITH_REGS=y
CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS=y
CONFIG_FUNCTION_PROFILER=y
CONFIG_STACK_TRACER=y
# CONFIG_IRQSOFF_TRACER is not set
CONFIG_SCHED_TRACER=y
CONFIG_HWLAT_TRACER=y
# CONFIG_MMIOTRACE is not set
CONFIG_FTRACE_SYSCALLS=y
CONFIG_TRACER_SNAPSHOT=y
# CONFIG_TRACER_SNAPSHOT_PER_CPU_SWAP is not set
CONFIG_BRANCH_PROFILE_NONE=y
# CONFIG_PROFILE_ANNOTATED_BRANCHES is not set
CONFIG_BLK_DEV_IO_TRACE=y
CONFIG_KPROBE_EVENTS=y
# CONFIG_KPROBE_EVENTS_ON_NOTRACE is not set
CONFIG_UPROBE_EVENTS=y
CONFIG_BPF_EVENTS=y
CONFIG_DYNAMIC_EVENTS=y
CONFIG_PROBE_EVENTS=y
# CONFIG_BPF_KPROBE_OVERRIDE is not set
CONFIG_FTRACE_MCOUNT_RECORD=y
CONFIG_FTRACE_MCOUNT_USE_CC=y
CONFIG_TRACING_MAP=y
CONFIG_SYNTH_EVENTS=y
CONFIG_HIST_TRIGGERS=y
# CONFIG_TRACE_EVENT_INJECT is not set
# CONFIG_TRACEPOINT_BENCHMARK is not set
CONFIG_RING_BUFFER_BENCHMARK=m
# CONFIG_TRACE_EVAL_MAP_FILE is not set
# CONFIG_FTRACE_RECORD_RECURSION is not set
# CONFIG_FTRACE_STARTUP_TEST is not set
# CONFIG_RING_BUFFER_STARTUP_TEST is not set
# CONFIG_RING_BUFFER_VALIDATE_TIME_DELTAS is not set
# CONFIG_PREEMPTIRQ_DELAY_TEST is not set
# CONFIG_SYNTH_EVENT_GEN_TEST is not set
# CONFIG_KPROBE_EVENT_GEN_TEST is not set
# CONFIG_HIST_TRIGGERS_DEBUG is not set
CONFIG_PROVIDE_OHCI1394_DMA_INIT=y
# CONFIG_SAMPLES is not set
CONFIG_ARCH_HAS_DEVMEM_IS_ALLOWED=y
CONFIG_STRICT_DEVMEM=y
# CONFIG_IO_STRICT_DEVMEM is not set

#
# x86 Debugging
#
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
CONFIG_TRACE_IRQFLAGS_NMI_SUPPORT=y
CONFIG_EARLY_PRINTK_USB=y
CONFIG_X86_VERBOSE_BOOTUP=y
CONFIG_EARLY_PRINTK=y
CONFIG_EARLY_PRINTK_DBGP=y
CONFIG_EARLY_PRINTK_USB_XDBC=y
# CONFIG_EFI_PGT_DUMP is not set
# CONFIG_DEBUG_TLBFLUSH is not set
CONFIG_HAVE_MMIOTRACE_SUPPORT=y
CONFIG_X86_DECODER_SELFTEST=y
CONFIG_IO_DELAY_0X80=y
# CONFIG_IO_DELAY_0XED is not set
# CONFIG_IO_DELAY_UDELAY is not set
# CONFIG_IO_DELAY_NONE is not set
CONFIG_DEBUG_BOOT_PARAMS=y
# CONFIG_CPA_DEBUG is not set
# CONFIG_DEBUG_ENTRY is not set
# CONFIG_DEBUG_NMI_SELFTEST is not set
# CONFIG_X86_DEBUG_FPU is not set
# CONFIG_PUNIT_ATOM_DEBUG is not set
CONFIG_UNWINDER_ORC=y
# CONFIG_UNWINDER_FRAME_POINTER is not set
# end of x86 Debugging

#
# Kernel Testing and Coverage
#
CONFIG_KUNIT=y
# CONFIG_KUNIT_DEBUGFS is not set
CONFIG_KUNIT_TEST=m
CONFIG_KUNIT_EXAMPLE_TEST=m
# CONFIG_KUNIT_ALL_TESTS is not set
# CONFIG_NOTIFIER_ERROR_INJECTION is not set
CONFIG_FUNCTION_ERROR_INJECTION=y
CONFIG_FAULT_INJECTION=y
# CONFIG_FAILSLAB is not set
# CONFIG_FAIL_PAGE_ALLOC is not set
# CONFIG_FAULT_INJECTION_USERCOPY is not set
CONFIG_FAIL_MAKE_REQUEST=y
# CONFIG_FAIL_IO_TIMEOUT is not set
# CONFIG_FAIL_FUTEX is not set
CONFIG_FAULT_INJECTION_DEBUG_FS=y
# CONFIG_FAIL_FUNCTION is not set
# CONFIG_FAIL_MMC_REQUEST is not set
CONFIG_ARCH_HAS_KCOV=y
CONFIG_CC_HAS_SANCOV_TRACE_PC=y
# CONFIG_KCOV is not set
CONFIG_RUNTIME_TESTING_MENU=y
# CONFIG_LKDTM is not set
# CONFIG_TEST_LIST_SORT is not set
# CONFIG_TEST_MIN_HEAP is not set
# CONFIG_TEST_SORT is not set
# CONFIG_KPROBES_SANITY_TEST is not set
# CONFIG_BACKTRACE_SELF_TEST is not set
# CONFIG_RBTREE_TEST is not set
# CONFIG_REED_SOLOMON_TEST is not set
# CONFIG_INTERVAL_TREE_TEST is not set
# CONFIG_PERCPU_TEST is not set
CONFIG_ATOMIC64_SELFTEST=y
# CONFIG_ASYNC_RAID6_TEST is not set
# CONFIG_TEST_HEXDUMP is not set
# CONFIG_TEST_STRING_HELPERS is not set
# CONFIG_TEST_STRSCPY is not set
# CONFIG_TEST_KSTRTOX is not set
# CONFIG_TEST_PRINTF is not set
# CONFIG_TEST_BITMAP is not set
# CONFIG_TEST_UUID is not set
# CONFIG_TEST_XARRAY is not set
# CONFIG_TEST_OVERFLOW is not set
# CONFIG_TEST_RHASHTABLE is not set
# CONFIG_TEST_HASH is not set
# CONFIG_TEST_IDA is not set
# CONFIG_TEST_LKM is not set
# CONFIG_TEST_BITOPS is not set
# CONFIG_TEST_VMALLOC is not set
# CONFIG_TEST_USER_COPY is not set
CONFIG_TEST_BPF=m
# CONFIG_TEST_BLACKHOLE_DEV is not set
# CONFIG_FIND_BIT_BENCHMARK is not set
# CONFIG_TEST_FIRMWARE is not set
# CONFIG_TEST_SYSCTL is not set
# CONFIG_BITFIELD_KUNIT is not set
# CONFIG_RESOURCE_KUNIT_TEST is not set
CONFIG_SYSCTL_KUNIT_TEST=m
CONFIG_LIST_KUNIT_TEST=m
# CONFIG_LINEAR_RANGES_TEST is not set
# CONFIG_CMDLINE_KUNIT_TEST is not set
# CONFIG_BITS_TEST is not set
# CONFIG_TEST_UDELAY is not set
# CONFIG_TEST_STATIC_KEYS is not set
# CONFIG_TEST_KMOD is not set
# CONFIG_TEST_MEMCAT_P is not set
# CONFIG_TEST_LIVEPATCH is not set
# CONFIG_TEST_STACKINIT is not set
# CONFIG_TEST_MEMINIT is not set
# CONFIG_TEST_HMM is not set
# CONFIG_TEST_FREE_PAGES is not set
# CONFIG_TEST_FPU is not set
# CONFIG_MEMTEST is not set
# CONFIG_HYPERV_TESTING is not set
# end of Kernel Testing and Coverage
# end of Kernel hacking

[-- Attachment #3: job-script --]
[-- Type: text/plain, Size: 5980 bytes --]

#!/bin/sh

export_top_env()
{
	export suite='xfstests'
	export testcase='xfstests'
	export category='functional'
	export need_memory='3G'
	export job_origin='xfstests-xfs-part3.yaml'
	export queue_cmdline_keys='branch
commit
queue_at_least_once'
	export queue='validate'
	export testbox='lkp-ivb-d04'
	export tbox_group='lkp-ivb-d04'
	export kconfig='x86_64-rhel-8.3'
	export submit_id='6097318b969c5ce718bf5289'
	export job_file='/lkp/jobs/scheduled/lkp-ivb-d04/xfstests-4HDD-xfs-xfs-group-23-ucode=0x21-debian-10.4-x86_64-20200603.cgz-07336fb545bfa9794d5b4146809355dffc93f0aa-20210509-59160-8p1dz9-2.yaml'
	export id='43bebad66295de72603d17e75f29d459d095e52d'
	export queuer_version='/lkp-src'
	export model='Ivy Bridge'
	export nr_node=1
	export nr_cpu=4
	export memory='8G'
	export nr_ssd_partitions=1
	export nr_hdd_partitions=4
	export ssd_partitions='/dev/disk/by-id/ata-INTEL_SSDSC2KB240G8_BTYF836606UQ240AGN-part1'
	export hdd_partitions='/dev/disk/by-id/ata-WDC_WD20EZRX-00D8PB0_WD-WCC4M0KTT6NK-part2 /dev/disk/by-id/ata-WDC_WD20EZRX-00D8PB0_WD-WCC4M0KTT6NK-part3 /dev/disk/by-id/ata-WDC_WD20EZRX-00D8PB0_WD-WCC4M0KTT6NK-part4 /dev/disk/by-id/ata-WDC_WD20EZRX-00D8PB0_WD-WCC4M0KTT6NK-part5'
	export rootfs_partition='/dev/disk/by-id/ata-WDC_WD20EZRX-00D8PB0_WD-WCC4M0KTT6NK-part1'
	export brand='Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz'
	export need_kconfig='CONFIG_BLK_DEV_SD
CONFIG_SCSI
CONFIG_BLOCK=y
CONFIG_SATA_AHCI
CONFIG_SATA_AHCI_PLATFORM
CONFIG_ATA
CONFIG_PCI=y
CONFIG_XFS_FS'
	export commit='07336fb545bfa9794d5b4146809355dffc93f0aa'
	export netconsole_port=6676
	export ucode='0x21'
	export need_kconfig_hw='CONFIG_R8169=y
CONFIG_SATA_AHCI
CONFIG_DRM_I915'
	export bisect_dmesg=true
	export enqueue_time='2021-05-09 08:49:15 +0800'
	export _id='6097318d969c5ce718bf528a'
	export _rt='/result/xfstests/4HDD-xfs-xfs-group-23-ucode=0x21/lkp-ivb-d04/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3/gcc-9/07336fb545bfa9794d5b4146809355dffc93f0aa'
	export user='lkp'
	export compiler='gcc-9'
	export LKP_SERVER='internal-lkp-server'
	export head_commit='db948cc95b8d6a62f78c77986c9e12341ad612db'
	export base_commit='9f4ad9e425a1d3b6a34617b8ea226d56a119a717'
	export branch='linux-review/Matthew-Brost/Basic-GuC-submission-support-in-the-i915/20210507-030308'
	export rootfs='debian-10.4-x86_64-20200603.cgz'
	export result_root='/result/xfstests/4HDD-xfs-xfs-group-23-ucode=0x21/lkp-ivb-d04/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3/gcc-9/07336fb545bfa9794d5b4146809355dffc93f0aa/3'
	export scheduler_version='/lkp/lkp/.src-20210506-110429'
	export arch='x86_64'
	export max_uptime=2100
	export initrd='/osimage/debian/debian-10.4-x86_64-20200603.cgz'
	export bootloader_append='root=/dev/ram0
user=lkp
job=/lkp/jobs/scheduled/lkp-ivb-d04/xfstests-4HDD-xfs-xfs-group-23-ucode=0x21-debian-10.4-x86_64-20200603.cgz-07336fb545bfa9794d5b4146809355dffc93f0aa-20210509-59160-8p1dz9-2.yaml
ARCH=x86_64
kconfig=x86_64-rhel-8.3
branch=linux-review/Matthew-Brost/Basic-GuC-submission-support-in-the-i915/20210507-030308
commit=07336fb545bfa9794d5b4146809355dffc93f0aa
BOOT_IMAGE=/pkg/linux/x86_64-rhel-8.3/gcc-9/07336fb545bfa9794d5b4146809355dffc93f0aa/vmlinuz-5.12.0-02605-g07336fb545bf
max_uptime=2100
RESULT_ROOT=/result/xfstests/4HDD-xfs-xfs-group-23-ucode=0x21/lkp-ivb-d04/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3/gcc-9/07336fb545bfa9794d5b4146809355dffc93f0aa/3
LKP_SERVER=internal-lkp-server
nokaslr
selinux=0
debug
apic=debug
sysrq_always_enabled
rcupdate.rcu_cpu_stall_timeout=100
net.ifnames=0
printk.devkmsg=on
panic=-1
softlockup_panic=1
nmi_watchdog=panic
oops=panic
load_ramdisk=2
prompt_ramdisk=0
drbd.minor_count=8
systemd.log_level=err
ignore_loglevel
console=tty0
earlyprintk=ttyS0,115200
console=ttyS0,115200
vga=normal
rw'
	export modules_initrd='/pkg/linux/x86_64-rhel-8.3/gcc-9/07336fb545bfa9794d5b4146809355dffc93f0aa/modules.cgz'
	export bm_initrd='/osimage/deps/debian-10.4-x86_64-20200603.cgz/run-ipconfig_20200608.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/lkp_20201211.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/rsync-rootfs_20200608.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/fs_20200714.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/xfstests_20210401.cgz,/osimage/pkg/debian-10.4-x86_64-20200603.cgz/xfstests-x86_64-73c0871-1_20210401.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/hw_20200715.cgz'
	export ucode_initrd='/osimage/ucode/intel-ucode-20210222.cgz'
	export lkp_initrd='/osimage/user/lkp/lkp-x86_64.cgz'
	export site='inn'
	export LKP_CGI_PORT=80
	export LKP_CIFS_PORT=139
	export last_kernel='4.20.0'
	export repeat_to=6
	export queue_at_least_once=1
	export kernel='/pkg/linux/x86_64-rhel-8.3/gcc-9/07336fb545bfa9794d5b4146809355dffc93f0aa/vmlinuz-5.12.0-02605-g07336fb545bf'
	export dequeue_time='2021-05-09 08:57:24 +0800'
	export job_initrd='/lkp/jobs/scheduled/lkp-ivb-d04/xfstests-4HDD-xfs-xfs-group-23-ucode=0x21-debian-10.4-x86_64-20200603.cgz-07336fb545bfa9794d5b4146809355dffc93f0aa-20210509-59160-8p1dz9-2.cgz'

	[ -n "$LKP_SRC" ] ||
	export LKP_SRC=/lkp/${user:-lkp}/src
}

run_job()
{
	echo $$ > $TMP/run-job.pid

	. $LKP_SRC/lib/http.sh
	. $LKP_SRC/lib/job.sh
	. $LKP_SRC/lib/env.sh

	export_top_env

	run_setup nr_hdd=4 $LKP_SRC/setup/disk

	run_setup fs='xfs' $LKP_SRC/setup/fs

	run_monitor $LKP_SRC/monitors/wrapper kmsg
	run_monitor $LKP_SRC/monitors/wrapper heartbeat
	run_monitor $LKP_SRC/monitors/wrapper meminfo
	run_monitor $LKP_SRC/monitors/wrapper oom-killer
	run_monitor $LKP_SRC/monitors/plain/watchdog

	run_test test='xfs-group-23' $LKP_SRC/tests/wrapper xfstests
}

extract_stats()
{
	export stats_part_begin=
	export stats_part_end=

	env test='xfs-group-23' $LKP_SRC/stats/wrapper xfstests
	$LKP_SRC/stats/wrapper kmsg
	$LKP_SRC/stats/wrapper meminfo

	$LKP_SRC/stats/wrapper time xfstests.time
	$LKP_SRC/stats/wrapper dmesg
	$LKP_SRC/stats/wrapper kmsg
	$LKP_SRC/stats/wrapper last_state
	$LKP_SRC/stats/wrapper stderr
	$LKP_SRC/stats/wrapper time
}

"$@"

[-- Attachment #4: kmsg.xz --]
[-- Type: application/x-xz, Size: 24128 bytes --]

[-- Attachment #5: xfstests --]
[-- Type: text/plain, Size: 844 bytes --]

2021-05-09 00:58:35 export TEST_DIR=/fs/sda2
2021-05-09 00:58:35 export TEST_DEV=/dev/sda2
2021-05-09 00:58:35 export FSTYP=xfs
2021-05-09 00:58:35 export SCRATCH_MNT=/fs/scratch
2021-05-09 00:58:35 mkdir /fs/scratch -p
2021-05-09 00:58:35 export SCRATCH_DEV=/dev/sda5
2021-05-09 00:58:35 export SCRATCH_LOGDEV=/dev/sda3
2021-05-09 00:58:35 export SCRATCH_XFS_LIST_METADATA_FIELDS=u3.sfdir3.hdr.parent.i4
2021-05-09 00:58:35 export SCRATCH_XFS_LIST_FUZZ_VERBS=random
2021-05-09 00:58:35 sed "s:^:xfs/:" //lkp/benchmarks/xfstests/tests/xfs-group-23
2021-05-09 00:58:35 ./check xfs/238
FSTYP         -- xfs (debug)
PLATFORM      -- Linux/x86_64 lkp-ivb-d04 5.12.0-02605-g07336fb545bf #1 SMP Sun May 9 03:33:15 CST 2021
MKFS_OPTIONS  -- -f -bsize=4096 /dev/sda5
MOUNT_OPTIONS -- /dev/sda5 /fs/scratch

xfs/238	 3s
Ran: xfs/238
Passed all 1 tests


[-- Attachment #6: job.yaml --]
[-- Type: text/plain, Size: 4878 bytes --]

---

#! jobs/xfstests-xfs-part3.yaml
suite: xfstests
testcase: xfstests
category: functional
need_memory: 3G
disk: 4HDD
fs: xfs
xfstests:
  test: xfs-group-23
job_origin: xfstests-xfs-part3.yaml

#! queue options
queue_cmdline_keys:
- branch
- commit
queue: bisect
testbox: lkp-ivb-d04
tbox_group: lkp-ivb-d04
kconfig: x86_64-rhel-8.3
submit_id: 609729e0969c5ce18702d211
job_file: "/lkp/jobs/scheduled/lkp-ivb-d04/xfstests-4HDD-xfs-xfs-group-23-ucode=0x21-debian-10.4-x86_64-20200603.cgz-07336fb545bfa9794d5b4146809355dffc93f0aa-20210509-57735-1irgc8f-0.yaml"
id: a625e60575ffc3de015a8772581215966ceca000
queuer_version: "/lkp-src"

#! hosts/lkp-ivb-d04
model: Ivy Bridge
nr_node: 1
nr_cpu: 4
memory: 8G
nr_ssd_partitions: 1
nr_hdd_partitions: 4
ssd_partitions: "/dev/disk/by-id/ata-INTEL_SSDSC2KB240G8_BTYF836606UQ240AGN-part1"
hdd_partitions: "/dev/disk/by-id/ata-WDC_WD20EZRX-00D8PB0_WD-WCC4M0KTT6NK-part2 /dev/disk/by-id/ata-WDC_WD20EZRX-00D8PB0_WD-WCC4M0KTT6NK-part3
  /dev/disk/by-id/ata-WDC_WD20EZRX-00D8PB0_WD-WCC4M0KTT6NK-part4 /dev/disk/by-id/ata-WDC_WD20EZRX-00D8PB0_WD-WCC4M0KTT6NK-part5"
rootfs_partition: "/dev/disk/by-id/ata-WDC_WD20EZRX-00D8PB0_WD-WCC4M0KTT6NK-part1"
brand: Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz

#! include/category/functional
kmsg: 
heartbeat: 
meminfo: 

#! include/disk/nr_hdd
need_kconfig:
- CONFIG_BLK_DEV_SD
- CONFIG_SCSI
- CONFIG_BLOCK=y
- CONFIG_SATA_AHCI
- CONFIG_SATA_AHCI_PLATFORM
- CONFIG_ATA
- CONFIG_PCI=y
- CONFIG_XFS_FS

#! include/queue/cyclic
commit: 07336fb545bfa9794d5b4146809355dffc93f0aa

#! include/testbox/lkp-ivb-d04
netconsole_port: 6676
ucode: '0x21'
need_kconfig_hw:
- CONFIG_R8169=y
- CONFIG_SATA_AHCI
- CONFIG_DRM_I915
bisect_dmesg: true

#! include/fs/OTHERS
enqueue_time: 2021-05-09 08:16:32.410010899 +08:00
_id: 609729e0969c5ce18702d211
_rt: "/result/xfstests/4HDD-xfs-xfs-group-23-ucode=0x21/lkp-ivb-d04/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3/gcc-9/07336fb545bfa9794d5b4146809355dffc93f0aa"

#! schedule options
user: lkp
compiler: gcc-9
LKP_SERVER: internal-lkp-server
head_commit: db948cc95b8d6a62f78c77986c9e12341ad612db
base_commit: 9f4ad9e425a1d3b6a34617b8ea226d56a119a717
branch: linux-devel/devel-hourly-20210507-105946
rootfs: debian-10.4-x86_64-20200603.cgz
result_root: "/result/xfstests/4HDD-xfs-xfs-group-23-ucode=0x21/lkp-ivb-d04/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3/gcc-9/07336fb545bfa9794d5b4146809355dffc93f0aa/0"
scheduler_version: "/lkp/lkp/.src-20210506-110429"
arch: x86_64
max_uptime: 2100
initrd: "/osimage/debian/debian-10.4-x86_64-20200603.cgz"
bootloader_append:
- root=/dev/ram0
- user=lkp
- job=/lkp/jobs/scheduled/lkp-ivb-d04/xfstests-4HDD-xfs-xfs-group-23-ucode=0x21-debian-10.4-x86_64-20200603.cgz-07336fb545bfa9794d5b4146809355dffc93f0aa-20210509-57735-1irgc8f-0.yaml
- ARCH=x86_64
- kconfig=x86_64-rhel-8.3
- branch=linux-devel/devel-hourly-20210507-105946
- commit=07336fb545bfa9794d5b4146809355dffc93f0aa
- BOOT_IMAGE=/pkg/linux/x86_64-rhel-8.3/gcc-9/07336fb545bfa9794d5b4146809355dffc93f0aa/vmlinuz-5.12.0-02605-g07336fb545bf
- max_uptime=2100
- RESULT_ROOT=/result/xfstests/4HDD-xfs-xfs-group-23-ucode=0x21/lkp-ivb-d04/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3/gcc-9/07336fb545bfa9794d5b4146809355dffc93f0aa/0
- LKP_SERVER=internal-lkp-server
- nokaslr
- selinux=0
- debug
- apic=debug
- sysrq_always_enabled
- rcupdate.rcu_cpu_stall_timeout=100
- net.ifnames=0
- printk.devkmsg=on
- panic=-1
- softlockup_panic=1
- nmi_watchdog=panic
- oops=panic
- load_ramdisk=2
- prompt_ramdisk=0
- drbd.minor_count=8
- systemd.log_level=err
- ignore_loglevel
- console=tty0
- earlyprintk=ttyS0,115200
- console=ttyS0,115200
- vga=normal
- rw
modules_initrd: "/pkg/linux/x86_64-rhel-8.3/gcc-9/07336fb545bfa9794d5b4146809355dffc93f0aa/modules.cgz"
bm_initrd: "/osimage/deps/debian-10.4-x86_64-20200603.cgz/run-ipconfig_20200608.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/lkp_20201211.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/rsync-rootfs_20200608.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/fs_20200714.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/xfstests_20210401.cgz,/osimage/pkg/debian-10.4-x86_64-20200603.cgz/xfstests-x86_64-73c0871-1_20210401.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/hw_20200715.cgz"
ucode_initrd: "/osimage/ucode/intel-ucode-20210222.cgz"
lkp_initrd: "/osimage/user/lkp/lkp-x86_64.cgz"
site: inn

#! /lkp/lkp/.src-20210506-110429/include/site/inn
LKP_CGI_PORT: 80
LKP_CIFS_PORT: 139
oom-killer: 
watchdog: 

#! runtime status
last_kernel: 4.20.0

#! user overrides
kernel: "/pkg/linux/x86_64-rhel-8.3/gcc-9/07336fb545bfa9794d5b4146809355dffc93f0aa/vmlinuz-5.12.0-02605-g07336fb545bf"
dequeue_time: 2021-05-09 08:18:47.692595189 +08:00
job_state: finished
loadavg: 1.19 0.34 0.12 1/149 2587
start_time: '1620519596'
end_time: '1620519603'
version: "/lkp/lkp/.src-20210506-110504:c8d5f8a8:89926580f"

[-- Attachment #7: reproduce --]
[-- Type: text/plain, Size: 852 bytes --]

dmsetup remove_all
wipefs -a --force /dev/sda2
wipefs -a --force /dev/sda3
wipefs -a --force /dev/sda4
wipefs -a --force /dev/sda5
mkfs -t xfs -f /dev/sda2
mkfs -t xfs -f /dev/sda3
mkfs -t xfs -f /dev/sda4
mkfs -t xfs -f /dev/sda5
mkdir -p /fs/sda2
modprobe xfs
mount -t xfs -o inode64 /dev/sda2 /fs/sda2
mkdir -p /fs/sda3
mount -t xfs -o inode64 /dev/sda3 /fs/sda3
mkdir -p /fs/sda4
mount -t xfs -o inode64 /dev/sda4 /fs/sda4
mkdir -p /fs/sda5
mount -t xfs -o inode64 /dev/sda5 /fs/sda5
export TEST_DIR=/fs/sda2
export TEST_DEV=/dev/sda2
export FSTYP=xfs
export SCRATCH_MNT=/fs/scratch
mkdir /fs/scratch -p
export SCRATCH_DEV=/dev/sda5
export SCRATCH_LOGDEV=/dev/sda3
export SCRATCH_XFS_LIST_METADATA_FIELDS=u3.sfdir3.hdr.parent.i4
export SCRATCH_XFS_LIST_FUZZ_VERBS=random
sed "s:^:xfs/:" //lkp/benchmarks/xfstests/tests/xfs-group-23
./check xfs/238

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 20/97] drm/i915/guc: Introduce unified HXG messages
  2021-05-06 19:13 ` [RFC PATCH 20/97] drm/i915/guc: Introduce unified HXG messages Matthew Brost
@ 2021-05-11 15:16   ` Daniel Vetter
  2021-05-11 17:59     ` Matthew Brost
  2021-05-11 22:11     ` Michal Wajdeczko
  0 siblings, 2 replies; 249+ messages in thread
From: Daniel Vetter @ 2021-05-11 15:16 UTC (permalink / raw)
  To: Matthew Brost
  Cc: tvrtko.ursulin, intel-gfx, dri-devel, jason.ekstrand,
	daniele.ceraolospurio, jon.bloomfield, daniel.vetter,
	john.c.harrison

On Thu, May 06, 2021 at 12:13:34PM -0700, Matthew Brost wrote:
> From: Michal Wajdeczko <michal.wajdeczko@intel.com>
> 
> New GuC firmware will unify the format of MMIO and CTB H2G messages.
> Introduce their definitions now to allow a gradual transition of
> our code to match the new changes.
> 
> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Cc: Michał Winiarski <michal.winiarski@intel.com>
> ---
>  .../gpu/drm/i915/gt/uc/abi/guc_messages_abi.h | 226 ++++++++++++++++++
>  1 file changed, 226 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h
> index 775e21f3058c..1c264819aa03 100644
> --- a/drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h
> +++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h
> @@ -6,6 +6,232 @@
>  #ifndef _ABI_GUC_MESSAGES_ABI_H
>  #define _ABI_GUC_MESSAGES_ABI_H
>  
> +/**
> + * DOC: HXG Message

These aren't useful if we don't pull them in somewhere in the
Documentation/gpu hierarchy. General comment, and also please check that
it all renders correctly still.

btw if you respin a patch not originally by you we generally add a (v1) to
the original s-o-b line (or wherever the version split was) and explain in
the usual changelog in the commit message what was changed.

This holds for the entire series ofc.
-Daniel

> + *
> + * All messages exchanged with GuC are defined using 32 bit dwords.
> + * First dword is treated as a message header. Remaining dwords are optional.
> + *
> + * .. _HXG Message:
> + *
> + *  +---+-------+--------------------------------------------------------------+
> + *  |   | Bits  | Description                                                  |
> + *  +===+=======+==============================================================+
> + *  |   |       |                                                              |
> + *  | 0 |    31 | **ORIGIN** - originator of the message                       |
> + *  |   |       |   - _`GUC_HXG_ORIGIN_HOST` = 0                               |
> + *  |   |       |   - _`GUC_HXG_ORIGIN_GUC` = 1                                |
> + *  |   |       |                                                              |
> + *  |   +-------+--------------------------------------------------------------+
> + *  |   | 30:28 | **TYPE** - message type                                      |
> + *  |   |       |   - _`GUC_HXG_TYPE_REQUEST` = 0                              |
> + *  |   |       |   - _`GUC_HXG_TYPE_EVENT` = 1                                |
> + *  |   |       |   - _`GUC_HXG_TYPE_NO_RESPONSE_BUSY` = 3                     |
> + *  |   |       |   - _`GUC_HXG_TYPE_NO_RESPONSE_RETRY` = 5                    |
> + *  |   |       |   - _`GUC_HXG_TYPE_RESPONSE_FAILURE` = 6                     |
> + *  |   |       |   - _`GUC_HXG_TYPE_RESPONSE_SUCCESS` = 7                     |
> + *  |   +-------+--------------------------------------------------------------+
> + *  |   |  27:0 | **AUX** - auxiliary data (depends TYPE)                      |
> + *  +---+-------+--------------------------------------------------------------+
> + *  | 1 |  31:0 | optional payload (depends on TYPE)                           |
> + *  +---+-------+                                                              |
> + *  |...|       |                                                              |
> + *  +---+-------+                                                              |
> + *  | n |  31:0 |                                                              |
> + *  +---+-------+--------------------------------------------------------------+
> + */
> +
> +#define GUC_HXG_MSG_MIN_LEN			1u
> +#define GUC_HXG_MSG_0_ORIGIN			(0x1 << 31)
> +#define   GUC_HXG_ORIGIN_HOST			0u
> +#define   GUC_HXG_ORIGIN_GUC			1u
> +#define GUC_HXG_MSG_0_TYPE			(0x7 << 28)
> +#define   GUC_HXG_TYPE_REQUEST			0u
> +#define   GUC_HXG_TYPE_EVENT			1u
> +#define   GUC_HXG_TYPE_NO_RESPONSE_BUSY		3u
> +#define   GUC_HXG_TYPE_NO_RESPONSE_RETRY	5u
> +#define   GUC_HXG_TYPE_RESPONSE_FAILURE		6u
> +#define   GUC_HXG_TYPE_RESPONSE_SUCCESS		7u
> +#define GUC_HXG_MSG_0_AUX			(0xfffffff << 0)
> +
> +/**
> + * DOC: HXG Request
> + *
> + * The `HXG Request`_ message should be used to initiate synchronous activity
> + * for which confirmation or return data is expected.
> + *
> + * The recipient of this message shall use `HXG Response`_, `HXG Failure`_
> + * or `HXG Retry`_ message as a definite reply, and may use `HXG Busy`_
> + * message as an intermediate reply.
> + *
> + * Format of @DATA0 and all @DATAn fields depends on the @ACTION code.
> + *
> + * .. _HXG Request:
> + *
> + *  +---+-------+--------------------------------------------------------------+
> + *  |   | Bits  | Description                                                  |
> + *  +===+=======+==============================================================+
> + *  | 0 |    31 | ORIGIN                                                       |
> + *  |   +-------+--------------------------------------------------------------+
> + *  |   | 30:28 | TYPE = GUC_HXG_TYPE_REQUEST_                                 |
> + *  |   +-------+--------------------------------------------------------------+
> + *  |   | 27:16 | **DATA0** - request data (depends on ACTION)                 |
> + *  |   +-------+--------------------------------------------------------------+
> + *  |   |  15:0 | **ACTION** - requested action code                           |
> + *  +---+-------+--------------------------------------------------------------+
> + *  | 1 |  31:0 | **DATA1** - optional data (depends on ACTION)                |
> + *  +---+-------+--------------------------------------------------------------+
> + *  |...|       |                                                              |
> + *  +---+-------+--------------------------------------------------------------+
> + *  | n |  31:0 | **DATAn** - optional data (depends on ACTION)                |
> + *  +---+-------+--------------------------------------------------------------+
> + */
> +
> +#define GUC_HXG_REQUEST_MSG_MIN_LEN		GUC_HXG_MSG_MIN_LEN
> +#define GUC_HXG_REQUEST_MSG_0_DATA0		(0xfff << 16)
> +#define GUC_HXG_REQUEST_MSG_0_ACTION		(0xffff << 0)
> +#define GUC_HXG_REQUEST_MSG_n_DATAn		(0xffffffff << 0)
> +
> +/**
> + * DOC: HXG Event
> + *
> + * The `HXG Event`_ message should be used to initiate asynchronous activity
> + * that does not involve immediate confirmation or data.
> + *
> + * Format of @DATA0 and all @DATAn fields depends on the @ACTION code.
> + *
> + * .. _HXG Event:
> + *
> + *  +---+-------+--------------------------------------------------------------+
> + *  |   | Bits  | Description                                                  |
> + *  +===+=======+==============================================================+
> + *  | 0 |    31 | ORIGIN                                                       |
> + *  |   +-------+--------------------------------------------------------------+
> + *  |   | 30:28 | TYPE = GUC_HXG_TYPE_EVENT_                                   |
> + *  |   +-------+--------------------------------------------------------------+
> + *  |   | 27:16 | **DATA0** - event data (depends on ACTION)                   |
> + *  |   +-------+--------------------------------------------------------------+
> + *  |   |  15:0 | **ACTION** - event action code                               |
> + *  +---+-------+--------------------------------------------------------------+
> + *  | 1 |  31:0 | **DATA1** - optional event data (depends on ACTION)          |
> + *  +---+-------+--------------------------------------------------------------+
> + *  |...|       |                                                              |
> + *  +---+-------+--------------------------------------------------------------+
> + *  | n |  31:0 | **DATAn** - optional event  data (depends on ACTION)         |
> + *  +---+-------+--------------------------------------------------------------+
> + */
> +
> +#define GUC_HXG_EVENT_MSG_MIN_LEN		GUC_HXG_MSG_MIN_LEN
> +#define GUC_HXG_EVENT_MSG_0_DATA0		(0xfff << 16)
> +#define GUC_HXG_EVENT_MSG_0_ACTION		(0xffff << 0)
> +#define GUC_HXG_EVENT_MSG_n_DATAn		(0xffffffff << 0)
> +
> +/**
> + * DOC: HXG Busy
> + *
> + * The `HXG Busy`_ message may be used to acknowledge reception of the `HXG Request`_
> + * message if the recipient expects that its processing will take longer than the
> + * default timeout.
> + *
> + * The @COUNTER field may be used as a progress indicator.
> + *
> + * .. _HXG Busy:
> + *
> + *  +---+-------+--------------------------------------------------------------+
> + *  |   | Bits  | Description                                                  |
> + *  +===+=======+==============================================================+
> + *  | 0 |    31 | ORIGIN                                                       |
> + *  |   +-------+--------------------------------------------------------------+
> + *  |   | 30:28 | TYPE = GUC_HXG_TYPE_NO_RESPONSE_BUSY_                        |
> + *  |   +-------+--------------------------------------------------------------+
> + *  |   |  27:0 | **COUNTER** - progress indicator                             |
> + *  +---+-------+--------------------------------------------------------------+
> + */
> +
> +#define GUC_HXG_BUSY_MSG_LEN			GUC_HXG_MSG_MIN_LEN
> +#define GUC_HXG_BUSY_MSG_0_COUNTER		GUC_HXG_MSG_0_AUX
> +
> +/**
> + * DOC: HXG Retry
> + *
> + * The `HXG Retry`_ message should be used by the recipient to indicate that the
> + * `HXG Request`_ message was dropped and should be resent.
> + *
> + * The @REASON field may be used to provide additional information.
> + *
> + * .. _HXG Retry:
> + *
> + *  +---+-------+--------------------------------------------------------------+
> + *  |   | Bits  | Description                                                  |
> + *  +===+=======+==============================================================+
> + *  | 0 |    31 | ORIGIN                                                       |
> + *  |   +-------+--------------------------------------------------------------+
> + *  |   | 30:28 | TYPE = GUC_HXG_TYPE_NO_RESPONSE_RETRY_                       |
> + *  |   +-------+--------------------------------------------------------------+
> + *  |   |  27:0 | **REASON** - reason for retry                                |
> + *  |   |       |  - _`GUC_HXG_RETRY_REASON_UNSPECIFIED` = 0                   |
> + *  +---+-------+--------------------------------------------------------------+
> + */
> +
> +#define GUC_HXG_RETRY_MSG_LEN			GUC_HXG_MSG_MIN_LEN
> +#define GUC_HXG_RETRY_MSG_0_REASON		GUC_HXG_MSG_0_AUX
> +#define   GUC_HXG_RETRY_REASON_UNSPECIFIED	0u
> +
> +/**
> + * DOC: HXG Failure
> + *
> + * The `HXG Failure`_ message shall be used as a reply to the `HXG Request`_
> + * message that could not be processed due to an error.
> + *
> + * .. _HXG Failure:
> + *
> + *  +---+-------+--------------------------------------------------------------+
> + *  |   | Bits  | Description                                                  |
> + *  +===+=======+==============================================================+
> + *  | 0 |    31 | ORIGIN                                                       |
> + *  |   +-------+--------------------------------------------------------------+
> + *  |   | 30:28 | TYPE = GUC_HXG_TYPE_RESPONSE_FAILURE_                        |
> + *  |   +-------+--------------------------------------------------------------+
> + *  |   | 27:16 | **HINT** - additional error hint                             |
> + *  |   +-------+--------------------------------------------------------------+
> + *  |   |  15:0 | **ERROR** - error/result code                                |
> + *  +---+-------+--------------------------------------------------------------+
> + */
> +
> +#define GUC_HXG_FAILURE_MSG_LEN			GUC_HXG_MSG_MIN_LEN
> +#define GUC_HXG_FAILURE_MSG_0_HINT		(0xfff << 16)
> +#define GUC_HXG_FAILURE_MSG_0_ERROR		(0xffff << 0)
> +
> +/**
> + * DOC: HXG Response
> + *
> + * The `HXG Response`_ message SHALL be used as a reply to the `HXG Request`_
> + * message that was successfully processed without an error.
> + *
> + * .. _HXG Response:
> + *
> + *  +---+-------+--------------------------------------------------------------+
> + *  |   | Bits  | Description                                                  |
> + *  +===+=======+==============================================================+
> + *  | 0 |    31 | ORIGIN                                                       |
> + *  |   +-------+--------------------------------------------------------------+
> + *  |   | 30:28 | TYPE = GUC_HXG_TYPE_RESPONSE_SUCCESS_                        |
> + *  |   +-------+--------------------------------------------------------------+
> + *  |   |  27:0 | **DATA0** - data (depends on ACTION from `HXG Request`_)     |
> + *  +---+-------+--------------------------------------------------------------+
> + *  | 1 |  31:0 | **DATA1** - data (depends on ACTION from `HXG Request`_)     |
> + *  +---+-------+--------------------------------------------------------------+
> + *  |...|       |                                                              |
> + *  +---+-------+--------------------------------------------------------------+
> + *  | n |  31:0 | **DATAn** - data (depends on ACTION from `HXG Request`_)     |
> + *  +---+-------+--------------------------------------------------------------+
> + */
> +
> +#define GUC_HXG_RESPONSE_MSG_MIN_LEN		GUC_HXG_MSG_MIN_LEN
> +#define GUC_HXG_RESPONSE_MSG_0_DATA0		GUC_HXG_MSG_0_AUX
> +#define GUC_HXG_RESPONSE_MSG_n_DATAn		(0xffffffff << 0)
> +
> +/* deprecated */
>  #define INTEL_GUC_MSG_TYPE_SHIFT	28
>  #define INTEL_GUC_MSG_TYPE_MASK		(0xF << INTEL_GUC_MSG_TYPE_SHIFT)
>  #define INTEL_GUC_MSG_DATA_SHIFT	16
> -- 
> 2.28.0
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 249+ messages in thread
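
The HXG layout quoted in the patch above is a plain 32-bit header dword followed
by optional payload dwords, so the masks from guc_messages_abi.h can be combined
with the kernel's generic bitfield helpers. A minimal sketch follows; the helper
function names are illustrative only and not part of the patch, and only the masks
and values are taken from the quoted header:

#include <linux/bitfield.h>
#include <linux/types.h>

/* Masks and values as defined in the quoted guc_messages_abi.h. */
#define GUC_HXG_MSG_0_ORIGIN		(0x1 << 31)
#define   GUC_HXG_ORIGIN_HOST		0u
#define GUC_HXG_MSG_0_TYPE		(0x7 << 28)
#define   GUC_HXG_TYPE_REQUEST		0u
#define GUC_HXG_REQUEST_MSG_0_DATA0	(0xfff << 16)
#define GUC_HXG_REQUEST_MSG_0_ACTION	(0xffff << 0)

/* Pack dword 0 of an HXG request: ORIGIN=HOST, TYPE=REQUEST, plus DATA0/ACTION. */
static u32 hxg_request_header(u32 action, u32 data0)
{
	return FIELD_PREP(GUC_HXG_MSG_0_ORIGIN, GUC_HXG_ORIGIN_HOST) |
	       FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
	       FIELD_PREP(GUC_HXG_REQUEST_MSG_0_DATA0, data0) |
	       FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION, action);
}

/* Unpack the TYPE field from dword 0 of any received HXG message. */
static u32 hxg_msg_type(u32 msg0)
{
	return FIELD_GET(GUC_HXG_MSG_0_TYPE, msg0);
}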

* Re: [RFC PATCH 32/97] drm/i915: Introduce i915_sched_engine object
  2021-05-06 19:13 ` [RFC PATCH 32/97] drm/i915: Introduce i915_sched_engine object Matthew Brost
@ 2021-05-11 15:18   ` Daniel Vetter
  2021-05-11 17:56     ` Matthew Brost
  0 siblings, 1 reply; 249+ messages in thread
From: Daniel Vetter @ 2021-05-11 15:18 UTC (permalink / raw)
  To: Matthew Brost
  Cc: tvrtko.ursulin, intel-gfx, dri-devel, jason.ekstrand,
	daniele.ceraolospurio, jon.bloomfield, daniel.vetter,
	john.c.harrison

On Thu, May 06, 2021 at 12:13:46PM -0700, Matthew Brost wrote:
> Introduce the i915_sched_engine object, which is a lower-level data structure
> that i915_scheduler / generic code can operate on without touching
> execlist-specific structures. This allows additional submission backends
> to be added without breaking the layering.

Maybe add a comment here that this is de facto a detour since we're now
aiming to use drm/scheduler instead. But also since the current code is a
bit of a mess, we expect this detour to be overall faster since we can then
refactor in-tree.

Maybe also highlight this a bit more in the rfc to make sure this is
clear.
-Daniel

> 
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_wait.c      |   4 +-
>  drivers/gpu/drm/i915/gt/intel_engine.h        |  16 -
>  drivers/gpu/drm/i915/gt/intel_engine_cs.c     |  77 ++--
>  .../gpu/drm/i915/gt/intel_engine_heartbeat.c  |   4 +-
>  drivers/gpu/drm/i915/gt/intel_engine_pm.c     |  10 +-
>  drivers/gpu/drm/i915/gt/intel_engine_types.h  |  42 +--
>  drivers/gpu/drm/i915/gt/intel_engine_user.c   |   2 +-
>  .../drm/i915/gt/intel_execlists_submission.c  | 350 +++++++++++-------
>  .../gpu/drm/i915/gt/intel_ring_submission.c   |  13 +-
>  drivers/gpu/drm/i915/gt/mock_engine.c         |  17 +-
>  drivers/gpu/drm/i915/gt/selftest_execlists.c  |  36 +-
>  drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |   6 +-
>  drivers/gpu/drm/i915/gt/selftest_lrc.c        |   6 +-
>  drivers/gpu/drm/i915/gt/selftest_reset.c      |   2 +-
>  .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  75 ++--
>  drivers/gpu/drm/i915/i915_gpu_error.c         |   7 +-
>  drivers/gpu/drm/i915/i915_request.c           |  50 +--
>  drivers/gpu/drm/i915/i915_request.h           |   2 +-
>  drivers/gpu/drm/i915/i915_scheduler.c         | 168 ++++-----
>  drivers/gpu/drm/i915/i915_scheduler.h         |  65 +++-
>  drivers/gpu/drm/i915/i915_scheduler_types.h   |  63 ++++
>  21 files changed, 575 insertions(+), 440 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_wait.c b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
> index 4b9856d5ba14..af1fbf8e2a9a 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_wait.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
> @@ -104,8 +104,8 @@ static void fence_set_priority(struct dma_fence *fence,
>  	engine = rq->engine;
>  
>  	rcu_read_lock(); /* RCU serialisation for set-wedged protection */
> -	if (engine->schedule)
> -		engine->schedule(rq, attr);
> +	if (engine->sched_engine->schedule)
> +		engine->sched_engine->schedule(rq, attr);
>  	rcu_read_unlock();
>  }
>  
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
> index 8d9184920c51..988d9688ae4d 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine.h
> @@ -123,20 +123,6 @@ execlists_active(const struct intel_engine_execlists *execlists)
>  	return active;
>  }
>  
> -static inline void
> -execlists_active_lock_bh(struct intel_engine_execlists *execlists)
> -{
> -	local_bh_disable(); /* prevent local softirq and lock recursion */
> -	tasklet_lock(&execlists->tasklet);
> -}
> -
> -static inline void
> -execlists_active_unlock_bh(struct intel_engine_execlists *execlists)
> -{
> -	tasklet_unlock(&execlists->tasklet);
> -	local_bh_enable(); /* restore softirq, and kick ksoftirqd! */
> -}
> -
>  struct i915_request *
>  execlists_unwind_incomplete_requests(struct intel_engine_execlists *execlists);
>  
> @@ -257,8 +243,6 @@ intel_engine_find_active_request(struct intel_engine_cs *engine);
>  
>  u32 intel_engine_context_size(struct intel_gt *gt, u8 class);
>  
> -void intel_engine_init_active(struct intel_engine_cs *engine,
> -			      unsigned int subclass);
>  #define ENGINE_PHYSICAL	0
>  #define ENGINE_MOCK	1
>  #define ENGINE_VIRTUAL	2
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> index 828e1669f92c..ec82a7ec0c8d 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> @@ -8,6 +8,7 @@
>  #include "gem/i915_gem_context.h"
>  
>  #include "i915_drv.h"
> +#include "i915_scheduler.h"
>  
>  #include "intel_breadcrumbs.h"
>  #include "intel_context.h"
> @@ -326,9 +327,6 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id)
>  	if (engine->context_size)
>  		DRIVER_CAPS(i915)->has_logical_contexts = true;
>  
> -	/* Nothing to do here, execute in order of dependencies */
> -	engine->schedule = NULL;
> -
>  	ewma__engine_latency_init(&engine->latency);
>  	seqcount_init(&engine->stats.lock);
>  
> @@ -583,9 +581,6 @@ void intel_engine_init_execlists(struct intel_engine_cs *engine)
>  	memset(execlists->pending, 0, sizeof(execlists->pending));
>  	execlists->active =
>  		memset(execlists->inflight, 0, sizeof(execlists->inflight));
> -
> -	execlists->queue_priority_hint = INT_MIN;
> -	execlists->queue = RB_ROOT_CACHED;
>  }
>  
>  static void cleanup_status_page(struct intel_engine_cs *engine)
> @@ -712,11 +707,17 @@ static int engine_setup_common(struct intel_engine_cs *engine)
>  		goto err_status;
>  	}
>  
> +	engine->sched_engine = i915_sched_engine_create(ENGINE_PHYSICAL);
> +	if (!engine->sched_engine) {
> +		err = -ENOMEM;
> +		goto err_sched_engine;
> +	}
> +	engine->sched_engine->engine = engine;
> +
>  	err = intel_engine_init_cmd_parser(engine);
>  	if (err)
>  		goto err_cmd_parser;
>  
> -	intel_engine_init_active(engine, ENGINE_PHYSICAL);
>  	intel_engine_init_execlists(engine);
>  	intel_engine_init__pm(engine);
>  	intel_engine_init_retire(engine);
> @@ -735,6 +736,8 @@ static int engine_setup_common(struct intel_engine_cs *engine)
>  	return 0;
>  
>  err_cmd_parser:
> +	i915_sched_engine_put(engine->sched_engine);
> +err_sched_engine:
>  	intel_breadcrumbs_free(engine->breadcrumbs);
>  err_status:
>  	cleanup_status_page(engine);
> @@ -773,11 +776,11 @@ static int measure_breadcrumb_dw(struct intel_context *ce)
>  	frame->rq.ring = &frame->ring;
>  
>  	mutex_lock(&ce->timeline->mutex);
> -	spin_lock_irq(&engine->active.lock);
> +	spin_lock_irq(&engine->sched_engine->lock);
>  
>  	dw = engine->emit_fini_breadcrumb(&frame->rq, frame->cs) - frame->cs;
>  
> -	spin_unlock_irq(&engine->active.lock);
> +	spin_unlock_irq(&engine->sched_engine->lock);
>  	mutex_unlock(&ce->timeline->mutex);
>  
>  	GEM_BUG_ON(dw & 1); /* RING_TAIL must be qword aligned */
> @@ -786,28 +789,6 @@ static int measure_breadcrumb_dw(struct intel_context *ce)
>  	return dw;
>  }
>  
> -void
> -intel_engine_init_active(struct intel_engine_cs *engine, unsigned int subclass)
> -{
> -	INIT_LIST_HEAD(&engine->active.requests);
> -	INIT_LIST_HEAD(&engine->active.hold);
> -
> -	spin_lock_init(&engine->active.lock);
> -	lockdep_set_subclass(&engine->active.lock, subclass);
> -
> -	/*
> -	 * Due to an interesting quirk in lockdep's internal debug tracking,
> -	 * after setting a subclass we must ensure the lock is used. Otherwise,
> -	 * nr_unused_locks is incremented once too often.
> -	 */
> -#ifdef CONFIG_DEBUG_LOCK_ALLOC
> -	local_irq_disable();
> -	lock_map_acquire(&engine->active.lock.dep_map);
> -	lock_map_release(&engine->active.lock.dep_map);
> -	local_irq_enable();
> -#endif
> -}
> -
>  static struct intel_context *
>  create_pinned_context(struct intel_engine_cs *engine,
>  		      unsigned int hwsp,
> @@ -955,10 +936,10 @@ int intel_engines_init(struct intel_gt *gt)
>   */
>  void intel_engine_cleanup_common(struct intel_engine_cs *engine)
>  {
> -	GEM_BUG_ON(!list_empty(&engine->active.requests));
> -	tasklet_kill(&engine->execlists.tasklet); /* flush the callback */
> +	GEM_BUG_ON(!list_empty(&engine->sched_engine->requests));
>  
>  	intel_breadcrumbs_free(engine->breadcrumbs);
> +	i915_sched_engine_put(engine->sched_engine);
>  
>  	intel_engine_fini_retire(engine);
>  	intel_engine_cleanup_cmd_parser(engine);
> @@ -1241,7 +1222,7 @@ static bool ring_is_idle(struct intel_engine_cs *engine)
>  
>  void __intel_engine_flush_submission(struct intel_engine_cs *engine, bool sync)
>  {
> -	struct tasklet_struct *t = &engine->execlists.tasklet;
> +	struct tasklet_struct *t = &engine->sched_engine->tasklet;
>  
>  	if (!t->callback)
>  		return;
> @@ -1281,7 +1262,7 @@ bool intel_engine_is_idle(struct intel_engine_cs *engine)
>  	intel_engine_flush_submission(engine);
>  
>  	/* ELSP is empty, but there are ready requests? E.g. after reset */
> -	if (!RB_EMPTY_ROOT(&engine->execlists.queue.rb_root))
> +	if (!i915_sched_engine_is_empty(engine->sched_engine))
>  		return false;
>  
>  	/* Ring stopped? */
> @@ -1347,7 +1328,7 @@ static struct intel_timeline *get_timeline(struct i915_request *rq)
>  	struct intel_timeline *tl;
>  
>  	/*
> -	 * Even though we are holding the engine->active.lock here, there
> +	 * Even though we are holding the engine->sched_engine->lock here, there
>  	 * is no control over the submission queue per-se and we are
>  	 * inspecting the active state at a random point in time, with an
>  	 * unknown queue. Play safe and make sure the timeline remains valid.
> @@ -1502,10 +1483,10 @@ static void intel_engine_print_registers(struct intel_engine_cs *engine,
>  
>  		drm_printf(m, "\tExeclist tasklet queued? %s (%s), preempt? %s, timeslice? %s\n",
>  			   yesno(test_bit(TASKLET_STATE_SCHED,
> -					  &engine->execlists.tasklet.state)),
> -			   enableddisabled(!atomic_read(&engine->execlists.tasklet.count)),
> -			   repr_timer(&engine->execlists.preempt),
> -			   repr_timer(&engine->execlists.timer));
> +					  &engine->sched_engine->tasklet.state)),
> +			   enableddisabled(!atomic_read(&engine->sched_engine->tasklet.count)),
> +			   repr_timer(&execlists->preempt),
> +			   repr_timer(&execlists->timer));
>  
>  		read = execlists->csb_head;
>  		write = READ_ONCE(*execlists->csb_write);
> @@ -1527,7 +1508,7 @@ static void intel_engine_print_registers(struct intel_engine_cs *engine,
>  				   idx, hws[idx * 2], hws[idx * 2 + 1]);
>  		}
>  
> -		execlists_active_lock_bh(execlists);
> +		sched_engine_active_lock_bh(engine->sched_engine);
>  		rcu_read_lock();
>  		for (port = execlists->active; (rq = *port); port++) {
>  			char hdr[160];
> @@ -1558,7 +1539,7 @@ static void intel_engine_print_registers(struct intel_engine_cs *engine,
>  			i915_request_show(m, rq, hdr, 0);
>  		}
>  		rcu_read_unlock();
> -		execlists_active_unlock_bh(execlists);
> +		sched_engine_active_unlock_bh(engine->sched_engine);
>  	} else if (INTEL_GEN(dev_priv) > 6) {
>  		drm_printf(m, "\tPP_DIR_BASE: 0x%08x\n",
>  			   ENGINE_READ(engine, RING_PP_DIR_BASE));
> @@ -1694,7 +1675,7 @@ void intel_engine_dump(struct intel_engine_cs *engine,
>  
>  	drm_printf(m, "\tRequests:\n");
>  
> -	spin_lock_irqsave(&engine->active.lock, flags);
> +	spin_lock_irqsave(&engine->sched_engine->lock, flags);
>  	rq = intel_engine_find_active_request(engine);
>  	if (rq) {
>  		struct intel_timeline *tl = get_timeline(rq);
> @@ -1725,8 +1706,9 @@ void intel_engine_dump(struct intel_engine_cs *engine,
>  			hexdump(m, rq->context->lrc_reg_state, PAGE_SIZE);
>  		}
>  	}
> -	drm_printf(m, "\tOn hold?: %lu\n", list_count(&engine->active.hold));
> -	spin_unlock_irqrestore(&engine->active.lock, flags);
> +	drm_printf(m, "\tOn hold?: %lu\n",
> +		   list_count(&engine->sched_engine->hold));
> +	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
>  
>  	drm_printf(m, "\tMMIO base:  0x%08x\n", engine->mmio_base);
>  	wakeref = intel_runtime_pm_get_if_in_use(engine->uncore->rpm);
> @@ -1806,7 +1788,7 @@ intel_engine_find_active_request(struct intel_engine_cs *engine)
>  	 * At all other times, we must assume the GPU is still running, but
>  	 * we only care about the snapshot of this moment.
>  	 */
> -	lockdep_assert_held(&engine->active.lock);
> +	lockdep_assert_held(&engine->sched_engine->lock);
>  
>  	rcu_read_lock();
>  	request = execlists_active(&engine->execlists);
> @@ -1824,7 +1806,8 @@ intel_engine_find_active_request(struct intel_engine_cs *engine)
>  	if (active)
>  		return active;
>  
> -	list_for_each_entry(request, &engine->active.requests, sched.link) {
> +	list_for_each_entry(request, &engine->sched_engine->requests,
> +			    sched.link) {
>  		if (__i915_request_is_complete(request))
>  			continue;
>  
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> index b99ac41695f3..b6a305e6a974 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> @@ -121,7 +121,7 @@ static void heartbeat(struct work_struct *wrk)
>  			 * but all other contexts, including the kernel
>  			 * context are stuck waiting for the signal.
>  			 */
> -		} else if (engine->schedule &&
> +		} else if (engine->sched_engine->schedule &&
>  			   rq->sched.attr.priority < I915_PRIORITY_BARRIER) {
>  			/*
>  			 * Gradually raise the priority of the heartbeat to
> @@ -136,7 +136,7 @@ static void heartbeat(struct work_struct *wrk)
>  				attr.priority = I915_PRIORITY_BARRIER;
>  
>  			local_bh_disable();
> -			engine->schedule(rq, &attr);
> +			engine->sched_engine->schedule(rq, &attr);
>  			local_bh_enable();
>  		} else {
>  			if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
> index 47f4397095e5..ba6a9931c4e8 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
> @@ -274,14 +274,16 @@ static int __engine_park(struct intel_wakeref *wf)
>  	intel_engine_park_heartbeat(engine);
>  	intel_breadcrumbs_park(engine->breadcrumbs);
>  
> -	/* Must be reset upon idling, or we may miss the busy wakeup. */
> -	GEM_BUG_ON(engine->execlists.queue_priority_hint != INT_MIN);
> +	/*
> +	 * XXX: Must be reset upon idling, or we may miss the busy wakeup.
> +	 * queue_priority_hint only used in execlists submission but works in
> +	 * other modes as default is INT_MIN.
> +	 */
> +	GEM_BUG_ON(engine->sched_engine->queue_priority_hint != INT_MIN);
>  
>  	if (engine->park)
>  		engine->park(engine);
>  
> -	engine->execlists.no_priolist = false;
> -
>  	/* While gt calls i915_vma_parked(), we have to break the lock cycle */
>  	intel_gt_pm_put_async(engine->gt);
>  	return 0;
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> index 9ef349cd5cea..93aa22680db0 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> @@ -59,6 +59,7 @@ struct drm_i915_reg_table;
>  struct i915_gem_context;
>  struct i915_request;
>  struct i915_sched_attr;
> +struct i915_sched_engine;
>  struct intel_gt;
>  struct intel_ring;
>  struct intel_uncore;
> @@ -137,11 +138,6 @@ struct st_preempt_hang {
>   * driver and the hardware state for execlist mode of submission.
>   */
>  struct intel_engine_execlists {
> -	/**
> -	 * @tasklet: softirq tasklet for bottom handler
> -	 */
> -	struct tasklet_struct tasklet;
> -
>  	/**
>  	 * @timer: kick the current context if its timeslice expires
>  	 */
> @@ -152,11 +148,6 @@ struct intel_engine_execlists {
>  	 */
>  	struct timer_list preempt;
>  
> -	/**
> -	 * @default_priolist: priority list for I915_PRIORITY_NORMAL
> -	 */
> -	struct i915_priolist default_priolist;
> -
>  	/**
>  	 * @ccid: identifier for contexts submitted to this engine
>  	 */
> @@ -191,11 +182,6 @@ struct intel_engine_execlists {
>  	 */
>  	u32 reset_ccid;
>  
> -	/**
> -	 * @no_priolist: priority lists disabled
> -	 */
> -	bool no_priolist;
> -
>  	/**
>  	 * @submit_reg: gen-specific execlist submission register
>  	 * set to the ExecList Submission Port (elsp) register pre-Gen11 and to
> @@ -238,23 +224,8 @@ struct intel_engine_execlists {
>  	unsigned int port_mask;
>  
>  	/**
> -	 * @queue_priority_hint: Highest pending priority.
> -	 *
> -	 * When we add requests into the queue, or adjust the priority of
> -	 * executing requests, we compute the maximum priority of those
> -	 * pending requests. We can then use this value to determine if
> -	 * we need to preempt the executing requests to service the queue.
> -	 * However, since the we may have recorded the priority of an inflight
> -	 * request we wanted to preempt but since completed, at the time of
> -	 * dequeuing the priority hint may no longer may match the highest
> -	 * available request priority.
> +	 * @virtual: queue of virtual engine requests, in priority lists
>  	 */
> -	int queue_priority_hint;
> -
> -	/**
> -	 * @queue: queue of requests, in priority lists
> -	 */
> -	struct rb_root_cached queue;
>  	struct rb_root_cached virtual;
>  
>  	/**
> @@ -326,11 +297,7 @@ struct intel_engine_cs {
>  
>  	struct intel_sseu sseu;
>  
> -	struct {
> -		spinlock_t lock;
> -		struct list_head requests;
> -		struct list_head hold; /* ready requests, but on hold */
> -	} active;
> +	struct i915_sched_engine *sched_engine;
>  
>  	/* keep a request in reserve for a [pm] barrier under oom */
>  	struct i915_request *request_pool;
> @@ -459,9 +426,6 @@ struct intel_engine_cs {
>  	 * dependencies may need rescheduling. Note the request itself may
>  	 * not be ready to run!
>  	 */
> -	void		(*schedule)(struct i915_request *request,
> -				    const struct i915_sched_attr *attr);
> -
>  	void		(*release)(struct intel_engine_cs *engine);
>  
>  	struct intel_engine_execlists execlists;
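
For readers following the conversion: the members dropped here from struct
intel_engine_execlists and struct intel_engine_cs (tasklet, default_priolist,
no_priolist, queue_priority_hint, queue, the active.{lock,requests,hold} trio
and the schedule() hook) all reappear as fields of the new
struct i915_sched_engine reached via engine->sched_engine. A rough sketch of
the assumed layout, reconstructed purely from the call sites in this patch
(field names match their uses here; the real definition elsewhere in the
series may differ):

	#include <linux/interrupt.h>	/* tasklet_struct */
	#include <linux/kref.h>
	#include <linux/rbtree.h>
	#include <linux/spinlock.h>

	/* Sketch only: assumed layout of the new scheduling object. */
	struct i915_sched_engine {
		struct kref ref;		/* assumed: backs i915_sched_engine_put() */

		spinlock_t lock;		/* protects requests, hold and queue */
		struct list_head requests;	/* in-flight, was engine->active.requests */
		struct list_head hold;		/* ready requests parked on hold */

		struct rb_root_cached queue;	/* priority lists, was execlists->queue */
		int queue_priority_hint;	/* highest pending prio, INT_MIN when idle */
		struct i915_priolist default_priolist;
		bool no_priolist;

		struct tasklet_struct tasklet;	/* submission bottom half, was execlists->tasklet */
		struct intel_engine_cs *engine;	/* backend back-pointer used by the tasklets */

		void (*schedule)(struct i915_request *rq,
				 const struct i915_sched_attr *attr);
		void (*kick_backend)(const struct i915_request *rq, int prio);
	};
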
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c b/drivers/gpu/drm/i915/gt/intel_engine_user.c
> index 1cbd84eb24e4..d6dcdeace174 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
> @@ -107,7 +107,7 @@ static void set_scheduler_caps(struct drm_i915_private *i915)
>  	for_each_uabi_engine(engine, i915) { /* all engines must agree! */
>  		int i;
>  
> -		if (engine->schedule)
> +		if (engine->sched_engine->schedule)
>  			enabled |= (I915_SCHEDULER_CAP_ENABLED |
>  				    I915_SCHEDULER_CAP_PRIORITY);
>  		else
> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> index 8db200422950..0927a2416b52 100644
> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> @@ -273,11 +273,11 @@ static int effective_prio(const struct i915_request *rq)
>  	return prio;
>  }
>  
> -static int queue_prio(const struct intel_engine_execlists *execlists)
> +static int queue_prio(const struct i915_sched_engine *sched_engine)
>  {
>  	struct rb_node *rb;
>  
> -	rb = rb_first_cached(&execlists->queue);
> +	rb = rb_first_cached(&sched_engine->queue);
>  	if (!rb)
>  		return INT_MIN;
>  
> @@ -318,14 +318,14 @@ static bool need_preempt(const struct intel_engine_cs *engine,
>  	 * to preserve FIFO ordering of dependencies.
>  	 */
>  	last_prio = max(effective_prio(rq), I915_PRIORITY_NORMAL - 1);
> -	if (engine->execlists.queue_priority_hint <= last_prio)
> +	if (engine->sched_engine->queue_priority_hint <= last_prio)
>  		return false;
>  
>  	/*
>  	 * Check against the first request in ELSP[1], it will, thanks to the
>  	 * power of PI, be the highest priority of that context.
>  	 */
> -	if (!list_is_last(&rq->sched.link, &engine->active.requests) &&
> +	if (!list_is_last(&rq->sched.link, &engine->sched_engine->requests) &&
>  	    rq_prio(list_next_entry(rq, sched.link)) > last_prio)
>  		return true;
>  
> @@ -340,7 +340,7 @@ static bool need_preempt(const struct intel_engine_cs *engine,
>  	 * context, it's priority would not exceed ELSP[0] aka last_prio.
>  	 */
>  	return max(virtual_prio(&engine->execlists),
> -		   queue_prio(&engine->execlists)) > last_prio;
> +		   queue_prio(engine->sched_engine)) > last_prio;
>  }
>  
>  __maybe_unused static bool
> @@ -367,10 +367,10 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
>  	struct list_head *pl;
>  	int prio = I915_PRIORITY_INVALID;
>  
> -	lockdep_assert_held(&engine->active.lock);
> +	lockdep_assert_held(&engine->sched_engine->lock);
>  
>  	list_for_each_entry_safe_reverse(rq, rn,
> -					 &engine->active.requests,
> +					 &engine->sched_engine->requests,
>  					 sched.link) {
>  		if (__i915_request_is_complete(rq)) {
>  			list_del_init(&rq->sched.link);
> @@ -382,9 +382,10 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
>  		GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID);
>  		if (rq_prio(rq) != prio) {
>  			prio = rq_prio(rq);
> -			pl = i915_sched_lookup_priolist(engine, prio);
> +			pl = i915_sched_lookup_priolist(engine->sched_engine,
> +							prio);
>  		}
> -		GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root));
> +		GEM_BUG_ON(i915_sched_engine_is_empty(engine->sched_engine));
>  
>  		list_move(&rq->sched.link, pl);
>  		set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> @@ -534,13 +535,13 @@ resubmit_virtual_request(struct i915_request *rq, struct virtual_engine *ve)
>  {
>  	struct intel_engine_cs *engine = rq->engine;
>  
> -	spin_lock_irq(&engine->active.lock);
> +	spin_lock_irq(&engine->sched_engine->lock);
>  
>  	clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
>  	WRITE_ONCE(rq->engine, &ve->base);
>  	ve->base.submit_request(rq);
>  
> -	spin_unlock_irq(&engine->active.lock);
> +	spin_unlock_irq(&engine->sched_engine->lock);
>  }
>  
>  static void kick_siblings(struct i915_request *rq, struct intel_context *ce)
> @@ -569,7 +570,7 @@ static void kick_siblings(struct i915_request *rq, struct intel_context *ce)
>  		resubmit_virtual_request(rq, ve);
>  
>  	if (READ_ONCE(ve->request))
> -		tasklet_hi_schedule(&ve->base.execlists.tasklet);
> +		i915_sched_engine_hi_kick(ve->base.sched_engine);
>  }
>  
>  static void __execlists_schedule_out(struct i915_request * const rq,
> @@ -579,7 +580,7 @@ static void __execlists_schedule_out(struct i915_request * const rq,
>  	unsigned int ccid;
>  
>  	/*
> -	 * NB process_csb() is not under the engine->active.lock and hence
> +	 * NB process_csb() is not under the engine->sched_engine->lock and hence
>  	 * schedule_out can race with schedule_in meaning that we should
>  	 * refrain from doing non-trivial work here.
>  	 */
> @@ -721,12 +722,11 @@ dump_port(char *buf, int buflen, const char *prefix, struct i915_request *rq)
>  }
>  
>  static __maybe_unused noinline void
> -trace_ports(const struct intel_engine_execlists *execlists,
> +trace_ports(const struct intel_engine_cs *engine,
> +	    const struct intel_engine_execlists *execlists,
>  	    const char *msg,
>  	    struct i915_request * const *ports)
>  {
> -	const struct intel_engine_cs *engine =
> -		container_of(execlists, typeof(*engine), execlists);
>  	char __maybe_unused p0[40], p1[40];
>  
>  	if (!ports[0])
> @@ -738,25 +738,24 @@ trace_ports(const struct intel_engine_execlists *execlists,
>  }
>  
>  static bool
> -reset_in_progress(const struct intel_engine_execlists *execlists)
> +reset_in_progress(const struct intel_engine_cs *engine)
>  {
> -	return unlikely(!__tasklet_is_enabled(&execlists->tasklet));
> +	return unlikely(!__tasklet_is_enabled(&engine->sched_engine->tasklet));
>  }
>  
>  static __maybe_unused noinline bool
> -assert_pending_valid(const struct intel_engine_execlists *execlists,
> +assert_pending_valid(struct intel_engine_cs *engine,
> +		     const struct intel_engine_execlists *execlists,
>  		     const char *msg)
>  {
> -	struct intel_engine_cs *engine =
> -		container_of(execlists, typeof(*engine), execlists);
>  	struct i915_request * const *port, *rq, *prev = NULL;
>  	struct intel_context *ce = NULL;
>  	u32 ccid = -1;
>  
> -	trace_ports(execlists, msg, execlists->pending);
> +	trace_ports(engine, execlists, msg, execlists->pending);
>  
>  	/* We may be messing around with the lists during reset, lalala */
> -	if (reset_in_progress(execlists))
> +	if (reset_in_progress(engine))
>  		return true;
>  
>  	if (!execlists->pending[0]) {
> @@ -878,7 +877,7 @@ static void execlists_submit_ports(struct intel_engine_cs *engine)
>  	struct intel_engine_execlists *execlists = &engine->execlists;
>  	unsigned int n;
>  
> -	GEM_BUG_ON(!assert_pending_valid(execlists, "submit"));
> +	GEM_BUG_ON(!assert_pending_valid(engine, execlists, "submit"));
>  
>  	/*
>  	 * We can skip acquiring intel_runtime_pm_get() here as it was taken
> @@ -1096,7 +1095,8 @@ static void defer_active(struct intel_engine_cs *engine)
>  	if (!rq)
>  		return;
>  
> -	defer_request(rq, i915_sched_lookup_priolist(engine, rq_prio(rq)));
> +	defer_request(rq, i915_sched_lookup_priolist(engine->sched_engine,
> +						     rq_prio(rq)));
>  }
>  
>  static bool
> @@ -1133,13 +1133,14 @@ static bool needs_timeslice(const struct intel_engine_cs *engine,
>  		return false;
>  
>  	/* If ELSP[1] is occupied, always check to see if worth slicing */
> -	if (!list_is_last_rcu(&rq->sched.link, &engine->active.requests)) {
> +	if (!list_is_last_rcu(&rq->sched.link,
> +			      &engine->sched_engine->requests)) {
>  		ENGINE_TRACE(engine, "timeslice required for second inflight context\n");
>  		return true;
>  	}
>  
>  	/* Otherwise, ELSP[0] is by itself, but may be waiting in the queue */
> -	if (!RB_EMPTY_ROOT(&engine->execlists.queue.rb_root)) {
> +	if (!i915_sched_engine_is_empty(engine->sched_engine)) {
>  		ENGINE_TRACE(engine, "timeslice required for queue\n");
>  		return true;
>  	}
> @@ -1187,7 +1188,7 @@ static void start_timeslice(struct intel_engine_cs *engine)
>  			 * its timeslice, so recheck.
>  			 */
>  			if (!timer_pending(&el->timer))
> -				tasklet_hi_schedule(&el->tasklet);
> +				i915_sched_engine_hi_kick(engine->sched_engine);
>  			return;
>  		}
>  
> @@ -1235,6 +1236,7 @@ static bool completed(const struct i915_request *rq)
>  
>  static void execlists_dequeue(struct intel_engine_cs *engine)
>  {
> +	struct i915_sched_engine * const sched_engine = engine->sched_engine;
>  	struct intel_engine_execlists * const execlists = &engine->execlists;
>  	struct i915_request **port = execlists->pending;
>  	struct i915_request ** const last_port = port + execlists->port_mask;
> @@ -1265,7 +1267,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>  	 * and context switches) submission.
>  	 */
>  
> -	spin_lock(&engine->active.lock);
> +	spin_lock(&engine->sched_engine->lock);
>  
>  	/*
>  	 * If the queue is higher priority than the last
> @@ -1287,7 +1289,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>  				     last->fence.context,
>  				     last->fence.seqno,
>  				     last->sched.attr.priority,
> -				     execlists->queue_priority_hint);
> +				     sched_engine->queue_priority_hint);
>  			record_preemption(execlists);
>  
>  			/*
> @@ -1313,7 +1315,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>  				     yesno(timer_expired(&execlists->timer)),
>  				     last->fence.context, last->fence.seqno,
>  				     rq_prio(last),
> -				     execlists->queue_priority_hint,
> +				     sched_engine->queue_priority_hint,
>  				     yesno(timeslice_yield(execlists, last)));
>  
>  			/*
> @@ -1365,7 +1367,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>  				 * Even if ELSP[1] is occupied and not worthy
>  				 * of timeslices, our queue might be.
>  				 */
> -				spin_unlock(&engine->active.lock);
> +				spin_unlock(&sched_engine->lock);
>  				return;
>  			}
>  		}
> @@ -1375,7 +1377,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>  	while ((ve = first_virtual_engine(engine))) {
>  		struct i915_request *rq;
>  
> -		spin_lock(&ve->base.active.lock);
> +		spin_lock(&ve->base.sched_engine->lock);
>  
>  		rq = ve->request;
>  		if (unlikely(!virtual_matches(ve, rq, engine)))
> @@ -1384,14 +1386,14 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>  		GEM_BUG_ON(rq->engine != &ve->base);
>  		GEM_BUG_ON(rq->context != &ve->context);
>  
> -		if (unlikely(rq_prio(rq) < queue_prio(execlists))) {
> -			spin_unlock(&ve->base.active.lock);
> +		if (unlikely(rq_prio(rq) < queue_prio(sched_engine))) {
> +			spin_unlock(&ve->base.sched_engine->lock);
>  			break;
>  		}
>  
>  		if (last && !can_merge_rq(last, rq)) {
> -			spin_unlock(&ve->base.active.lock);
> -			spin_unlock(&engine->active.lock);
> +			spin_unlock(&ve->base.sched_engine->lock);
> +			spin_unlock(&sched_engine->lock);
>  			return; /* leave this for another sibling */
>  		}
>  
> @@ -1405,7 +1407,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>  			     yesno(engine != ve->siblings[0]));
>  
>  		WRITE_ONCE(ve->request, NULL);
> -		WRITE_ONCE(ve->base.execlists.queue_priority_hint, INT_MIN);
> +		WRITE_ONCE(ve->base.sched_engine->queue_priority_hint, INT_MIN);
>  
>  		rb = &ve->nodes[engine->id].rb;
>  		rb_erase_cached(rb, &execlists->virtual);
> @@ -1437,7 +1439,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>  
>  		i915_request_put(rq);
>  unlock:
> -		spin_unlock(&ve->base.active.lock);
> +		spin_unlock(&ve->base.sched_engine->lock);
>  
>  		/*
>  		 * Hmm, we have a bunch of virtual engine requests,
> @@ -1450,7 +1452,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>  			break;
>  	}
>  
> -	while ((rb = rb_first_cached(&execlists->queue))) {
> +	while ((rb = rb_first_cached(&sched_engine->queue))) {
>  		struct i915_priolist *p = to_priolist(rb);
>  		struct i915_request *rq, *rn;
>  
> @@ -1529,7 +1531,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>  			}
>  		}
>  
> -		rb_erase_cached(&p->node, &execlists->queue);
> +		rb_erase_cached(&p->node, &sched_engine->queue);
>  		i915_priolist_free(p);
>  	}
>  done:
> @@ -1551,8 +1553,9 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>  	 * request triggering preemption on the next dequeue (or subsequent
>  	 * interrupt for secondary ports).
>  	 */
> -	execlists->queue_priority_hint = queue_prio(execlists);
> -	spin_unlock(&engine->active.lock);
> +	sched_engine->queue_priority_hint = queue_prio(sched_engine);
> +	i915_sched_engine_reset_on_empty(sched_engine);
> +	spin_unlock(&sched_engine->lock);
>  
>  	/*
>  	 * We can skip poking the HW if we ended up with exactly the same set
> @@ -1767,8 +1770,8 @@ process_csb(struct intel_engine_cs *engine, struct i915_request **inactive)
>  	 * access. Either we are inside the tasklet, or the tasklet is disabled
>  	 * and we assume that is only inside the reset paths and so serialised.
>  	 */
> -	GEM_BUG_ON(!tasklet_is_locked(&execlists->tasklet) &&
> -		   !reset_in_progress(execlists));
> +	GEM_BUG_ON(!tasklet_is_locked(&engine->sched_engine->tasklet) &&
> +		   !reset_in_progress(engine));
>  
>  	/*
>  	 * Note that csb_write, csb_status may be either in HWSP or mmio.
> @@ -1866,12 +1869,12 @@ process_csb(struct intel_engine_cs *engine, struct i915_request **inactive)
>  			smp_wmb(); /* notify execlists_active() */
>  
>  			/* cancel old inflight, prepare for switch */
> -			trace_ports(execlists, "preempted", old);
> +			trace_ports(engine, execlists, "preempted", old);
>  			while (*old)
>  				*inactive++ = *old++;
>  
>  			/* switch pending to inflight */
> -			GEM_BUG_ON(!assert_pending_valid(execlists, "promote"));
> +			GEM_BUG_ON(!assert_pending_valid(engine, execlists, "promote"));
>  			copy_ports(execlists->inflight,
>  				   execlists->pending,
>  				   execlists_num_ports(execlists));
> @@ -1889,7 +1892,7 @@ process_csb(struct intel_engine_cs *engine, struct i915_request **inactive)
>  			}
>  
>  			/* port0 completed, advanced to port1 */
> -			trace_ports(execlists, "completed", execlists->active);
> +			trace_ports(engine, execlists, "completed", execlists->active);
>  
>  			/*
>  			 * We rely on the hardware being strongly
> @@ -1979,7 +1982,7 @@ static void __execlists_hold(struct i915_request *rq)
>  			__i915_request_unsubmit(rq);
>  
>  		clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> -		list_move_tail(&rq->sched.link, &rq->engine->active.hold);
> +		list_move_tail(&rq->sched.link, &rq->engine->sched_engine->hold);
>  		i915_request_set_hold(rq);
>  		RQ_TRACE(rq, "on hold\n");
>  
> @@ -2016,7 +2019,7 @@ static bool execlists_hold(struct intel_engine_cs *engine,
>  	if (i915_request_on_hold(rq))
>  		return false;
>  
> -	spin_lock_irq(&engine->active.lock);
> +	spin_lock_irq(&engine->sched_engine->lock);
>  
>  	if (__i915_request_is_complete(rq)) { /* too late! */
>  		rq = NULL;
> @@ -2032,10 +2035,10 @@ static bool execlists_hold(struct intel_engine_cs *engine,
>  	GEM_BUG_ON(i915_request_on_hold(rq));
>  	GEM_BUG_ON(rq->engine != engine);
>  	__execlists_hold(rq);
> -	GEM_BUG_ON(list_empty(&engine->active.hold));
> +	GEM_BUG_ON(list_empty(&engine->sched_engine->hold));
>  
>  unlock:
> -	spin_unlock_irq(&engine->active.lock);
> +	spin_unlock_irq(&engine->sched_engine->lock);
>  	return rq;
>  }
>  
> @@ -2079,7 +2082,7 @@ static void __execlists_unhold(struct i915_request *rq)
>  
>  		i915_request_clear_hold(rq);
>  		list_move_tail(&rq->sched.link,
> -			       i915_sched_lookup_priolist(rq->engine,
> +			       i915_sched_lookup_priolist(rq->engine->sched_engine,
>  							  rq_prio(rq)));
>  		set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
>  
> @@ -2115,7 +2118,7 @@ static void __execlists_unhold(struct i915_request *rq)
>  static void execlists_unhold(struct intel_engine_cs *engine,
>  			     struct i915_request *rq)
>  {
> -	spin_lock_irq(&engine->active.lock);
> +	spin_lock_irq(&engine->sched_engine->lock);
>  
>  	/*
>  	 * Move this request back to the priority queue, and all of its
> @@ -2123,12 +2126,12 @@ static void execlists_unhold(struct intel_engine_cs *engine,
>  	 */
>  	__execlists_unhold(rq);
>  
> -	if (rq_prio(rq) > engine->execlists.queue_priority_hint) {
> -		engine->execlists.queue_priority_hint = rq_prio(rq);
> -		tasklet_hi_schedule(&engine->execlists.tasklet);
> +	if (rq_prio(rq) > engine->sched_engine->queue_priority_hint) {
> +		engine->sched_engine->queue_priority_hint = rq_prio(rq);
> +		i915_sched_engine_hi_kick(engine->sched_engine);
>  	}
>  
> -	spin_unlock_irq(&engine->active.lock);
> +	spin_unlock_irq(&engine->sched_engine->lock);
>  }
>  
>  struct execlists_capture {
> @@ -2258,13 +2261,13 @@ static void execlists_capture(struct intel_engine_cs *engine)
>  	if (!cap)
>  		return;
>  
> -	spin_lock_irq(&engine->active.lock);
> +	spin_lock_irq(&engine->sched_engine->lock);
>  	cap->rq = active_context(engine, active_ccid(engine));
>  	if (cap->rq) {
>  		cap->rq = active_request(cap->rq->context->timeline, cap->rq);
>  		cap->rq = i915_request_get_rcu(cap->rq);
>  	}
> -	spin_unlock_irq(&engine->active.lock);
> +	spin_unlock_irq(&engine->sched_engine->lock);
>  	if (!cap->rq)
>  		goto err_free;
>  
> @@ -2316,13 +2319,13 @@ static void execlists_reset(struct intel_engine_cs *engine, const char *msg)
>  	ENGINE_TRACE(engine, "reset for %s\n", msg);
>  
>  	/* Mark this tasklet as disabled to avoid waiting for it to complete */
> -	tasklet_disable_nosync(&engine->execlists.tasklet);
> +	tasklet_disable_nosync(&engine->sched_engine->tasklet);
>  
>  	ring_set_paused(engine, 1); /* Freeze the current request in place */
>  	execlists_capture(engine);
>  	intel_engine_reset(engine, msg);
>  
> -	tasklet_enable(&engine->execlists.tasklet);
> +	tasklet_enable(&engine->sched_engine->tasklet);
>  	clear_and_wake_up_bit(bit, lock);
>  }
>  
> @@ -2345,8 +2348,9 @@ static bool preempt_timeout(const struct intel_engine_cs *const engine)
>   */
>  static void execlists_submission_tasklet(struct tasklet_struct *t)
>  {
> -	struct intel_engine_cs * const engine =
> -		from_tasklet(engine, t, execlists.tasklet);
> +	struct i915_sched_engine *sched_engine =
> +		from_tasklet(sched_engine, t, tasklet);
> +	struct intel_engine_cs * const engine = sched_engine->engine;
>  	struct i915_request *post[2 * EXECLIST_MAX_PORTS];
>  	struct i915_request **inactive;
>  
> @@ -2421,13 +2425,16 @@ static void execlists_irq_handler(struct intel_engine_cs *engine, u16 iir)
>  		intel_engine_signal_breadcrumbs(engine);
>  
>  	if (tasklet)
> -		tasklet_hi_schedule(&engine->execlists.tasklet);
> +		i915_sched_engine_hi_kick(engine->sched_engine);
>  }
>  
>  static void __execlists_kick(struct intel_engine_execlists *execlists)
>  {
> +	struct intel_engine_cs *engine =
> +		container_of(execlists, typeof(*engine), execlists);
> +
>  	/* Kick the tasklet for some interrupt coalescing and reset handling */
> -	tasklet_hi_schedule(&execlists->tasklet);
> +	i915_sched_engine_hi_kick(engine->sched_engine);
>  }
>  
>  #define execlists_kick(t, member) \
> @@ -2448,19 +2455,20 @@ static void queue_request(struct intel_engine_cs *engine,
>  {
>  	GEM_BUG_ON(!list_empty(&rq->sched.link));
>  	list_add_tail(&rq->sched.link,
> -		      i915_sched_lookup_priolist(engine, rq_prio(rq)));
> +		      i915_sched_lookup_priolist(engine->sched_engine,
> +						 rq_prio(rq)));
>  	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
>  }
>  
>  static bool submit_queue(struct intel_engine_cs *engine,
>  			 const struct i915_request *rq)
>  {
> -	struct intel_engine_execlists *execlists = &engine->execlists;
> +	struct i915_sched_engine *sched_engine = engine->sched_engine;
>  
> -	if (rq_prio(rq) <= execlists->queue_priority_hint)
> +	if (rq_prio(rq) <= sched_engine->queue_priority_hint)
>  		return false;
>  
> -	execlists->queue_priority_hint = rq_prio(rq);
> +	sched_engine->queue_priority_hint = rq_prio(rq);
>  	return true;
>  }
>  
> @@ -2468,7 +2476,7 @@ static bool ancestor_on_hold(const struct intel_engine_cs *engine,
>  			     const struct i915_request *rq)
>  {
>  	GEM_BUG_ON(i915_request_on_hold(rq));
> -	return !list_empty(&engine->active.hold) && hold_request(rq);
> +	return !list_empty(&engine->sched_engine->hold) && hold_request(rq);
>  }
>  
>  static void execlists_submit_request(struct i915_request *request)
> @@ -2477,23 +2485,24 @@ static void execlists_submit_request(struct i915_request *request)
>  	unsigned long flags;
>  
>  	/* Will be called from irq-context when using foreign fences. */
> -	spin_lock_irqsave(&engine->active.lock, flags);
> +	spin_lock_irqsave(&engine->sched_engine->lock, flags);
>  
>  	if (unlikely(ancestor_on_hold(engine, request))) {
>  		RQ_TRACE(request, "ancestor on hold\n");
> -		list_add_tail(&request->sched.link, &engine->active.hold);
> +		list_add_tail(&request->sched.link,
> +			      &engine->sched_engine->hold);
>  		i915_request_set_hold(request);
>  	} else {
>  		queue_request(engine, request);
>  
> -		GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root));
> +		GEM_BUG_ON(i915_sched_engine_is_empty(engine->sched_engine));
>  		GEM_BUG_ON(list_empty(&request->sched.link));
>  
>  		if (submit_queue(engine, request))
> -			__execlists_kick(&engine->execlists);
> +			i915_sched_engine_hi_kick(engine->sched_engine);
>  	}
>  
> -	spin_unlock_irqrestore(&engine->active.lock, flags);
> +	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
>  }
>  
>  static int
> @@ -2800,10 +2809,10 @@ static int execlists_resume(struct intel_engine_cs *engine)
>  
>  static void execlists_reset_prepare(struct intel_engine_cs *engine)
>  {
> -	struct intel_engine_execlists * const execlists = &engine->execlists;
> +	struct i915_sched_engine * const sched_engine = engine->sched_engine;
>  
>  	ENGINE_TRACE(engine, "depth<-%d\n",
> -		     atomic_read(&execlists->tasklet.count));
> +		     atomic_read(&sched_engine->tasklet.count));
>  
>  	/*
>  	 * Prevent request submission to the hardware until we have
> @@ -2814,8 +2823,8 @@ static void execlists_reset_prepare(struct intel_engine_cs *engine)
>  	 * Turning off the execlists->tasklet until the reset is over
>  	 * prevents the race.
>  	 */
> -	__tasklet_disable_sync_once(&execlists->tasklet);
> -	GEM_BUG_ON(!reset_in_progress(execlists));
> +	__tasklet_disable_sync_once(&sched_engine->tasklet);
> +	GEM_BUG_ON(!reset_in_progress(engine));
>  
>  	/*
>  	 * We stop engines, otherwise we might get failed reset and a
> @@ -2957,23 +2966,25 @@ static void execlists_reset_rewind(struct intel_engine_cs *engine, bool stalled)
>  
>  	/* Push back any incomplete requests for replay after the reset. */
>  	rcu_read_lock();
> -	spin_lock_irqsave(&engine->active.lock, flags);
> +	spin_lock_irqsave(&engine->sched_engine->lock, flags);
>  	__unwind_incomplete_requests(engine);
> -	spin_unlock_irqrestore(&engine->active.lock, flags);
> +	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
>  	rcu_read_unlock();
>  }
>  
>  static void nop_submission_tasklet(struct tasklet_struct *t)
>  {
> -	struct intel_engine_cs * const engine =
> -		from_tasklet(engine, t, execlists.tasklet);
> +	struct i915_sched_engine *sched_engine =
> +		from_tasklet(sched_engine, t, tasklet);
> +	struct intel_engine_cs * const engine = sched_engine->engine;
>  
>  	/* The driver is wedged; don't process any more events. */
> -	WRITE_ONCE(engine->execlists.queue_priority_hint, INT_MIN);
> +	WRITE_ONCE(engine->sched_engine->queue_priority_hint, INT_MIN);
>  }
>  
>  static void execlists_reset_cancel(struct intel_engine_cs *engine)
>  {
> +	struct i915_sched_engine * const sched_engine = engine->sched_engine;
>  	struct intel_engine_execlists * const execlists = &engine->execlists;
>  	struct i915_request *rq, *rn;
>  	struct rb_node *rb;
> @@ -2998,15 +3009,15 @@ static void execlists_reset_cancel(struct intel_engine_cs *engine)
>  	execlists_reset_csb(engine, true);
>  
>  	rcu_read_lock();
> -	spin_lock_irqsave(&engine->active.lock, flags);
> +	spin_lock_irqsave(&sched_engine->lock, flags);
>  
>  	/* Mark all executing requests as skipped. */
> -	list_for_each_entry(rq, &engine->active.requests, sched.link)
> +	list_for_each_entry(rq, &sched_engine->requests, sched.link)
>  		i915_request_put(i915_request_mark_eio(rq));
>  	intel_engine_signal_breadcrumbs(engine);
>  
>  	/* Flush the queued requests to the timeline list (for retiring). */
> -	while ((rb = rb_first_cached(&execlists->queue))) {
> +	while ((rb = rb_first_cached(&sched_engine->queue))) {
>  		struct i915_priolist *p = to_priolist(rb);
>  
>  		priolist_for_each_request_consume(rq, rn, p) {
> @@ -3016,12 +3027,12 @@ static void execlists_reset_cancel(struct intel_engine_cs *engine)
>  			}
>  		}
>  
> -		rb_erase_cached(&p->node, &execlists->queue);
> +		rb_erase_cached(&p->node, &sched_engine->queue);
>  		i915_priolist_free(p);
>  	}
>  
>  	/* On-hold requests will be flushed to timeline upon their release */
> -	list_for_each_entry(rq, &engine->active.hold, sched.link)
> +	list_for_each_entry(rq, &sched_engine->hold, sched.link)
>  		i915_request_put(i915_request_mark_eio(rq));
>  
>  	/* Cancel all attached virtual engines */
> @@ -3032,7 +3043,7 @@ static void execlists_reset_cancel(struct intel_engine_cs *engine)
>  		rb_erase_cached(rb, &execlists->virtual);
>  		RB_CLEAR_NODE(rb);
>  
> -		spin_lock(&ve->base.active.lock);
> +		spin_lock(&ve->base.sched_engine->lock);
>  		rq = fetch_and_zero(&ve->request);
>  		if (rq) {
>  			if (i915_request_mark_eio(rq)) {
> @@ -3042,26 +3053,26 @@ static void execlists_reset_cancel(struct intel_engine_cs *engine)
>  			}
>  			i915_request_put(rq);
>  
> -			ve->base.execlists.queue_priority_hint = INT_MIN;
> +			ve->base.sched_engine->queue_priority_hint = INT_MIN;
>  		}
> -		spin_unlock(&ve->base.active.lock);
> +		spin_unlock(&ve->base.sched_engine->lock);
>  	}
>  
>  	/* Remaining _unready_ requests will be nop'ed when submitted */
>  
> -	execlists->queue_priority_hint = INT_MIN;
> -	execlists->queue = RB_ROOT_CACHED;
> +	sched_engine->queue_priority_hint = INT_MIN;
> +	sched_engine->queue = RB_ROOT_CACHED;
>  
> -	GEM_BUG_ON(__tasklet_is_enabled(&execlists->tasklet));
> -	execlists->tasklet.callback = nop_submission_tasklet;
> +	GEM_BUG_ON(__tasklet_is_enabled(&sched_engine->tasklet));
> +	sched_engine->tasklet.callback = nop_submission_tasklet;
>  
> -	spin_unlock_irqrestore(&engine->active.lock, flags);
> +	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
>  	rcu_read_unlock();
>  }
>  
>  static void execlists_reset_finish(struct intel_engine_cs *engine)
>  {
> -	struct intel_engine_execlists * const execlists = &engine->execlists;
> +	struct i915_sched_engine * const sched_engine = engine->sched_engine;
>  
>  	/*
>  	 * After a GPU reset, we may have requests to replay. Do so now while
> @@ -3073,14 +3084,14 @@ static void execlists_reset_finish(struct intel_engine_cs *engine)
>  	 * reset as the next level of recovery, and as a final resort we
>  	 * will declare the device wedged.
>  	 */
> -	GEM_BUG_ON(!reset_in_progress(execlists));
> +	GEM_BUG_ON(!reset_in_progress(engine));
>  
>  	/* And kick in case we missed a new request submission. */
> -	if (__tasklet_enable(&execlists->tasklet))
> -		__execlists_kick(execlists);
> +	if (__tasklet_enable(&sched_engine->tasklet))
> +		i915_sched_engine_hi_kick(sched_engine);
>  
>  	ENGINE_TRACE(engine, "depth->%d\n",
> -		     atomic_read(&execlists->tasklet.count));
> +		     atomic_read(&sched_engine->tasklet.count));
>  }
>  
>  static void gen8_logical_ring_enable_irq(struct intel_engine_cs *engine)
> @@ -3110,11 +3121,59 @@ static bool can_preempt(struct intel_engine_cs *engine)
>  	return engine->class != RENDER_CLASS;
>  }
>  
> +static void kick_execlists(const struct i915_request *rq, int prio)
> +{
> +	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
> +	const struct i915_request *inflight;
> +
> +	/*
> +	 * We only need to kick the tasklet once for the high priority
> +	 * new context we add into the queue.
> +	 */
> +	if (prio <= sched_engine->queue_priority_hint)
> +		return;
> +
> +	rcu_read_lock();
> +
> +	/* Nothing currently active? We're overdue for a submission! */
> +	inflight = execlists_active(&rq->engine->execlists);
> +	if (!inflight)
> +		goto unlock;
> +
> +	/*
> +	 * If we are already the currently executing context, don't
> +	 * bother evaluating if we should preempt ourselves.
> +	 */
> +	if (inflight->context == rq->context)
> +		goto unlock;
> +
> +	ENGINE_TRACE(rq->engine,
> +		     "bumping queue-priority-hint:%d for rq:%llx:%lld, inflight:%llx:%lld prio %d\n",
> +		     prio,
> +		     rq->fence.context, rq->fence.seqno,
> +		     inflight->fence.context, inflight->fence.seqno,
> +		     inflight->sched.attr.priority);
> +
> +	sched_engine->queue_priority_hint = prio;
> +
> +	/*
> +	 * Allow preemption of low -> normal -> high, but we do
> +	 * not allow low priority tasks to preempt other low priority
> +	 * tasks under the impression that latency for low priority
> +	 * tasks does not matter (as much as background throughput),
> +	 * so kiss.
> +	 */
> +	if (prio >= max(I915_PRIORITY_NORMAL, rq_prio(inflight)))
> +		i915_sched_engine_hi_kick(sched_engine);
> +
> +unlock:
> +	rcu_read_unlock();
> +}
> +
>  static void execlists_set_default_submission(struct intel_engine_cs *engine)
>  {
>  	engine->submit_request = execlists_submit_request;
> -	engine->schedule = i915_schedule;
> -	engine->execlists.tasklet.callback = execlists_submission_tasklet;
> +	engine->sched_engine->tasklet.callback = execlists_submission_tasklet;
>  }
>  
>  static void execlists_shutdown(struct intel_engine_cs *engine)
> @@ -3122,7 +3181,7 @@ static void execlists_shutdown(struct intel_engine_cs *engine)
>  	/* Synchronise with residual timers and any softirq they raise */
>  	del_timer_sync(&engine->execlists.timer);
>  	del_timer_sync(&engine->execlists.preempt);
> -	tasklet_kill(&engine->execlists.tasklet);
> +	i915_sched_engine_kill(engine->sched_engine);
>  }
>  
>  static void execlists_release(struct intel_engine_cs *engine)
> @@ -3238,10 +3297,14 @@ int intel_execlists_submission_setup(struct intel_engine_cs *engine)
>  	struct intel_uncore *uncore = engine->uncore;
>  	u32 base = engine->mmio_base;
>  
> -	tasklet_setup(&engine->execlists.tasklet, execlists_submission_tasklet);
> +	tasklet_setup(&engine->sched_engine->tasklet,
> +		      execlists_submission_tasklet);
>  	timer_setup(&engine->execlists.timer, execlists_timeslice, 0);
>  	timer_setup(&engine->execlists.preempt, execlists_preempt, 0);
>  
> +	engine->sched_engine->schedule = i915_schedule;
> +	engine->sched_engine->kick_backend = kick_execlists;
> +
>  	logical_ring_default_vfuncs(engine);
>  	logical_ring_default_irqs(engine);
>  
> @@ -3286,7 +3349,7 @@ int intel_execlists_submission_setup(struct intel_engine_cs *engine)
>  
>  static struct list_head *virtual_queue(struct virtual_engine *ve)
>  {
> -	return &ve->base.execlists.default_priolist.requests;
> +	return &ve->base.sched_engine->default_priolist.requests;
>  }
>  
>  static void rcu_virtual_context_destroy(struct work_struct *wrk)
> @@ -3301,7 +3364,7 @@ static void rcu_virtual_context_destroy(struct work_struct *wrk)
>  	if (unlikely(ve->request)) {
>  		struct i915_request *old;
>  
> -		spin_lock_irq(&ve->base.active.lock);
> +		spin_lock_irq(&ve->base.sched_engine->lock);
>  
>  		old = fetch_and_zero(&ve->request);
>  		if (old) {
> @@ -3310,7 +3373,7 @@ static void rcu_virtual_context_destroy(struct work_struct *wrk)
>  			i915_request_put(old);
>  		}
>  
> -		spin_unlock_irq(&ve->base.active.lock);
> +		spin_unlock_irq(&ve->base.sched_engine->lock);
>  	}
>  
>  	/*
> @@ -3320,7 +3383,7 @@ static void rcu_virtual_context_destroy(struct work_struct *wrk)
>  	 * rbtrees as in the case it is running in parallel, it may reinsert
>  	 * the rb_node into a sibling.
>  	 */
> -	tasklet_kill(&ve->base.execlists.tasklet);
> +	i915_sched_engine_kill(ve->base.sched_engine);
>  
>  	/* Decouple ourselves from the siblings, no more access allowed. */
>  	for (n = 0; n < ve->num_siblings; n++) {
> @@ -3330,21 +3393,23 @@ static void rcu_virtual_context_destroy(struct work_struct *wrk)
>  		if (RB_EMPTY_NODE(node))
>  			continue;
>  
> -		spin_lock_irq(&sibling->active.lock);
> +		spin_lock_irq(&sibling->sched_engine->lock);
>  
>  		/* Detachment is lazily performed in the execlists tasklet */
>  		if (!RB_EMPTY_NODE(node))
>  			rb_erase_cached(node, &sibling->execlists.virtual);
>  
> -		spin_unlock_irq(&sibling->active.lock);
> +		spin_unlock_irq(&sibling->sched_engine->lock);
>  	}
> -	GEM_BUG_ON(__tasklet_is_scheduled(&ve->base.execlists.tasklet));
> +	GEM_BUG_ON(__tasklet_is_scheduled(&ve->base.sched_engine->tasklet));
>  	GEM_BUG_ON(!list_empty(virtual_queue(ve)));
>  
>  	lrc_fini(&ve->context);
>  	intel_context_fini(&ve->context);
>  
>  	intel_breadcrumbs_free(ve->base.breadcrumbs);
> +	if (ve->base.sched_engine)
> +		i915_sched_engine_put(ve->base.sched_engine);
>  	intel_engine_free_request_pool(&ve->base);
>  
>  	kfree(ve->bonds);
> @@ -3475,16 +3540,18 @@ static intel_engine_mask_t virtual_submission_mask(struct virtual_engine *ve)
>  
>  	ENGINE_TRACE(&ve->base, "rq=%llx:%lld, mask=%x, prio=%d\n",
>  		     rq->fence.context, rq->fence.seqno,
> -		     mask, ve->base.execlists.queue_priority_hint);
> +		     mask, ve->base.sched_engine->queue_priority_hint);
>  
>  	return mask;
>  }
>  
>  static void virtual_submission_tasklet(struct tasklet_struct *t)
>  {
> +	struct i915_sched_engine *sched_engine =
> +		from_tasklet(sched_engine, t, tasklet);
>  	struct virtual_engine * const ve =
> -		from_tasklet(ve, t, base.execlists.tasklet);
> -	const int prio = READ_ONCE(ve->base.execlists.queue_priority_hint);
> +		(struct virtual_engine *)sched_engine->engine;
> +	const int prio = READ_ONCE(ve->base.sched_engine->queue_priority_hint);
>  	intel_engine_mask_t mask;
>  	unsigned int n;
>  
> @@ -3503,7 +3570,7 @@ static void virtual_submission_tasklet(struct tasklet_struct *t)
>  		if (!READ_ONCE(ve->request))
>  			break; /* already handled by a sibling's tasklet */
>  
> -		spin_lock_irq(&sibling->active.lock);
> +		spin_lock_irq(&sibling->sched_engine->lock);
>  
>  		if (unlikely(!(mask & sibling->mask))) {
>  			if (!RB_EMPTY_NODE(&node->rb)) {
> @@ -3552,11 +3619,11 @@ static void virtual_submission_tasklet(struct tasklet_struct *t)
>  submit_engine:
>  		GEM_BUG_ON(RB_EMPTY_NODE(&node->rb));
>  		node->prio = prio;
> -		if (first && prio > sibling->execlists.queue_priority_hint)
> -			tasklet_hi_schedule(&sibling->execlists.tasklet);
> +		if (first && prio > sibling->sched_engine->queue_priority_hint)
> +			i915_sched_engine_hi_kick(sibling->sched_engine);
>  
>  unlock_engine:
> -		spin_unlock_irq(&sibling->active.lock);
> +		spin_unlock_irq(&sibling->sched_engine->lock);
>  
>  		if (intel_context_inflight(&ve->context))
>  			break;
> @@ -3574,7 +3641,7 @@ static void virtual_submit_request(struct i915_request *rq)
>  
>  	GEM_BUG_ON(ve->base.submit_request != virtual_submit_request);
>  
> -	spin_lock_irqsave(&ve->base.active.lock, flags);
> +	spin_lock_irqsave(&ve->base.sched_engine->lock, flags);
>  
>  	/* By the time we resubmit a request, it may be completed */
>  	if (__i915_request_is_complete(rq)) {
> @@ -3588,16 +3655,16 @@ static void virtual_submit_request(struct i915_request *rq)
>  		i915_request_put(ve->request);
>  	}
>  
> -	ve->base.execlists.queue_priority_hint = rq_prio(rq);
> +	ve->base.sched_engine->queue_priority_hint = rq_prio(rq);
>  	ve->request = i915_request_get(rq);
>  
>  	GEM_BUG_ON(!list_empty(virtual_queue(ve)));
>  	list_move_tail(&rq->sched.link, virtual_queue(ve));
>  
> -	tasklet_hi_schedule(&ve->base.execlists.tasklet);
> +	i915_sched_engine_hi_kick(ve->base.sched_engine);
>  
>  unlock:
> -	spin_unlock_irqrestore(&ve->base.active.lock, flags);
> +	spin_unlock_irqrestore(&ve->base.sched_engine->lock, flags);
>  }
>  
>  static struct ve_bond *
> @@ -3681,19 +3748,24 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings,
>  
>  	snprintf(ve->base.name, sizeof(ve->base.name), "virtual");
>  
> -	intel_engine_init_active(&ve->base, ENGINE_VIRTUAL);
> -	intel_engine_init_execlists(&ve->base);
> +	ve->base.sched_engine = i915_sched_engine_create(ENGINE_VIRTUAL);
> +	if (!ve->base.sched_engine) {
> +		kfree(ve);
> +		return ERR_PTR(-ENOMEM);
> +	}
> +	ve->base.sched_engine->engine = &ve->base;
>  
>  	ve->base.cops = &virtual_context_ops;
>  	ve->base.request_alloc = execlists_request_alloc;
>  
> -	ve->base.schedule = i915_schedule;
> +	ve->base.sched_engine->schedule = i915_schedule;
>  	ve->base.submit_request = virtual_submit_request;
>  	ve->base.bond_execute = virtual_bond_execute;
>  
>  	INIT_LIST_HEAD(virtual_queue(ve));
> -	ve->base.execlists.queue_priority_hint = INT_MIN;
> -	tasklet_setup(&ve->base.execlists.tasklet, virtual_submission_tasklet);
> +	ve->base.sched_engine->queue_priority_hint = INT_MIN;
> +	tasklet_setup(&ve->base.sched_engine->tasklet,
> +		      virtual_submission_tasklet);
>  
>  	intel_context_init(&ve->context, &ve->base);
>  
> @@ -3721,7 +3793,7 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings,
>  		 * layering if we handle cloning of the requests and
>  		 * submitting a copy into each backend.
>  		 */
> -		if (sibling->execlists.tasklet.callback !=
> +		if (sibling->sched_engine->tasklet.callback !=
>  		    execlists_submission_tasklet) {
>  			err = -ENODEV;
>  			goto err_put;
> @@ -3756,6 +3828,9 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings,
>  			 "v%dx%d", ve->base.class, count);
>  		ve->base.context_size = sibling->context_size;
>  
> +		ve->base.sched_engine->kick_backend =
> +			sibling->sched_engine->kick_backend;
> +
>  		ve->base.emit_bb_start = sibling->emit_bb_start;
>  		ve->base.emit_flush = sibling->emit_flush;
>  		ve->base.emit_init_breadcrumb = sibling->emit_init_breadcrumb;
> @@ -3848,17 +3923,18 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine,
>  							int indent),
>  				   unsigned int max)
>  {
> +	const struct i915_sched_engine *sched_engine = engine->sched_engine;
>  	const struct intel_engine_execlists *execlists = &engine->execlists;
>  	struct i915_request *rq, *last;
>  	unsigned long flags;
>  	unsigned int count;
>  	struct rb_node *rb;
>  
> -	spin_lock_irqsave(&engine->active.lock, flags);
> +	spin_lock_irqsave(&engine->sched_engine->lock, flags);
>  
>  	last = NULL;
>  	count = 0;
> -	list_for_each_entry(rq, &engine->active.requests, sched.link) {
> +	list_for_each_entry(rq, &sched_engine->requests, sched.link) {
>  		if (count++ < max - 1)
>  			show_request(m, rq, "\t\t", 0);
>  		else
> @@ -3873,13 +3949,13 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine,
>  		show_request(m, last, "\t\t", 0);
>  	}
>  
> -	if (execlists->queue_priority_hint != INT_MIN)
> +	if (sched_engine->queue_priority_hint != INT_MIN)
>  		drm_printf(m, "\t\tQueue priority hint: %d\n",
> -			   READ_ONCE(execlists->queue_priority_hint));
> +			   READ_ONCE(sched_engine->queue_priority_hint));
>  
>  	last = NULL;
>  	count = 0;
> -	for (rb = rb_first_cached(&execlists->queue); rb; rb = rb_next(rb)) {
> +	for (rb = rb_first_cached(&sched_engine->queue); rb; rb = rb_next(rb)) {
>  		struct i915_priolist *p = rb_entry(rb, typeof(*p), node);
>  
>  		priolist_for_each_request(rq, p) {
> @@ -3921,7 +3997,7 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine,
>  		show_request(m, last, "\t\t", 0);
>  	}
>  
> -	spin_unlock_irqrestore(&engine->active.lock, flags);
> +	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
>  }
>  
>  #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
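
The i915_sched_engine_*() helpers used throughout the execlists conversion
above replace direct pokes at execlists->tasklet and execlists->queue, and
i915_sched_lookup_priolist() now takes the sched_engine rather than the
engine. Their exact definitions live elsewhere in the series; judging from
the call sites they are assumed to be thin wrappers along these lines:

	/* Sketch only: behaviour inferred from the call sites above. */

	static inline bool
	i915_sched_engine_is_empty(const struct i915_sched_engine *sched_engine)
	{
		/* replaces the RB_EMPTY_ROOT(&execlists->queue.rb_root) checks */
		return RB_EMPTY_ROOT(&sched_engine->queue.rb_root);
	}

	static inline void
	i915_sched_engine_reset_on_empty(struct i915_sched_engine *sched_engine)
	{
		/*
		 * Drop the priority hint back to INT_MIN once the queue drains,
		 * matching the GEM_BUG_ON() in __engine_park().
		 */
		if (i915_sched_engine_is_empty(sched_engine))
			sched_engine->queue_priority_hint = INT_MIN;
	}

	static inline void
	i915_sched_engine_hi_kick(struct i915_sched_engine *sched_engine)
	{
		/* replaces tasklet_hi_schedule(&execlists->tasklet) */
		tasklet_hi_schedule(&sched_engine->tasklet);
	}

	static inline void
	i915_sched_engine_kill(struct i915_sched_engine *sched_engine)
	{
		/* replaces tasklet_kill(&execlists->tasklet) on shutdown */
		tasklet_kill(&sched_engine->tasklet);
	}
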
> diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> index 2b6dffcc2262..14aa31879a37 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> @@ -339,9 +339,9 @@ static void reset_rewind(struct intel_engine_cs *engine, bool stalled)
>  	u32 head;
>  
>  	rq = NULL;
> -	spin_lock_irqsave(&engine->active.lock, flags);
> +	spin_lock_irqsave(&engine->sched_engine->lock, flags);
>  	rcu_read_lock();
> -	list_for_each_entry(pos, &engine->active.requests, sched.link) {
> +	list_for_each_entry(pos, &engine->sched_engine->requests, sched.link) {
>  		if (!__i915_request_is_complete(pos)) {
>  			rq = pos;
>  			break;
> @@ -396,7 +396,7 @@ static void reset_rewind(struct intel_engine_cs *engine, bool stalled)
>  	}
>  	engine->legacy.ring->head = intel_ring_wrap(engine->legacy.ring, head);
>  
> -	spin_unlock_irqrestore(&engine->active.lock, flags);
> +	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
>  }
>  
>  static void reset_finish(struct intel_engine_cs *engine)
> @@ -408,16 +408,17 @@ static void reset_cancel(struct intel_engine_cs *engine)
>  	struct i915_request *request;
>  	unsigned long flags;
>  
> -	spin_lock_irqsave(&engine->active.lock, flags);
> +	spin_lock_irqsave(&engine->sched_engine->lock, flags);
>  
>  	/* Mark all submitted requests as skipped. */
> -	list_for_each_entry(request, &engine->active.requests, sched.link)
> +	list_for_each_entry(request, &engine->sched_engine->requests,
> +			    sched.link)
>  		i915_request_put(i915_request_mark_eio(request));
>  	intel_engine_signal_breadcrumbs(engine);
>  
>  	/* Remaining _unready_ requests will be nop'ed when submitted */
>  
> -	spin_unlock_irqrestore(&engine->active.lock, flags);
> +	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
>  }
>  
>  static void i9xx_submit_request(struct i915_request *request)
> diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
> index 32589c6625e1..bd005c1b6fd5 100644
> --- a/drivers/gpu/drm/i915/gt/mock_engine.c
> +++ b/drivers/gpu/drm/i915/gt/mock_engine.c
> @@ -253,10 +253,10 @@ static void mock_reset_cancel(struct intel_engine_cs *engine)
>  
>  	del_timer_sync(&mock->hw_delay);
>  
> -	spin_lock_irqsave(&engine->active.lock, flags);
> +	spin_lock_irqsave(&engine->sched_engine->lock, flags);
>  
>  	/* Mark all submitted requests as skipped. */
> -	list_for_each_entry(rq, &engine->active.requests, sched.link)
> +	list_for_each_entry(rq, &engine->sched_engine->requests, sched.link)
>  		i915_request_put(i915_request_mark_eio(rq));
>  	intel_engine_signal_breadcrumbs(engine);
>  
> @@ -269,7 +269,7 @@ static void mock_reset_cancel(struct intel_engine_cs *engine)
>  	}
>  	INIT_LIST_HEAD(&mock->hw_queue);
>  
> -	spin_unlock_irqrestore(&engine->active.lock, flags);
> +	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
>  }
>  
>  static void mock_reset_finish(struct intel_engine_cs *engine)
> @@ -283,6 +283,7 @@ static void mock_engine_release(struct intel_engine_cs *engine)
>  
>  	GEM_BUG_ON(timer_pending(&mock->hw_delay));
>  
> +	i915_sched_engine_put(engine->sched_engine);
>  	intel_breadcrumbs_free(engine->breadcrumbs);
>  
>  	intel_context_unpin(engine->kernel_context);
> @@ -345,14 +346,18 @@ int mock_engine_init(struct intel_engine_cs *engine)
>  {
>  	struct intel_context *ce;
>  
> -	intel_engine_init_active(engine, ENGINE_MOCK);
> +	engine->sched_engine = i915_sched_engine_create(ENGINE_MOCK);
> +	if (!engine->sched_engine)
> +		return -ENOMEM;
> +	engine->sched_engine->engine = engine;
> +
>  	intel_engine_init_execlists(engine);
>  	intel_engine_init__pm(engine);
>  	intel_engine_init_retire(engine);
>  
>  	engine->breadcrumbs = intel_breadcrumbs_create(NULL);
>  	if (!engine->breadcrumbs)
> -		return -ENOMEM;
> +		goto err_schedule;
>  
>  	ce = create_kernel_context(engine);
>  	if (IS_ERR(ce))
> @@ -366,6 +371,8 @@ int mock_engine_init(struct intel_engine_cs *engine)
>  
>  err_breadcrumbs:
>  	intel_breadcrumbs_free(engine->breadcrumbs);
> +err_schedule:
> +	i915_sched_engine_put(engine->sched_engine);
>  	return -ENOMEM;
>  }
>  
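
The mock engine here (and the virtual engine earlier in this patch) now
allocates its scheduler object with i915_sched_engine_create(ENGINE_*) and
drops it with i915_sched_engine_put() on teardown. The create/put pair is
defined elsewhere in the series; from the NULL-on-failure check and the put()
call it is assumed to look roughly like this (the kref and the lockdep
subclass handling are assumptions):

	/* Sketch only: assumed shape of the create/put pair. */
	struct i915_sched_engine *
	i915_sched_engine_create(unsigned int subclass)
	{
		struct i915_sched_engine *sched_engine;

		sched_engine = kzalloc(sizeof(*sched_engine), GFP_KERNEL);
		if (!sched_engine)
			return NULL;

		kref_init(&sched_engine->ref);

		sched_engine->queue = RB_ROOT_CACHED;
		sched_engine->queue_priority_hint = INT_MIN;
		INIT_LIST_HEAD(&sched_engine->requests);
		INIT_LIST_HEAD(&sched_engine->hold);

		spin_lock_init(&sched_engine->lock);
		lockdep_set_subclass(&sched_engine->lock, subclass);

		return sched_engine;
	}

	static void i915_sched_engine_destroy(struct kref *kref)
	{
		kfree(container_of(kref, struct i915_sched_engine, ref));
	}

	void i915_sched_engine_put(struct i915_sched_engine *sched_engine)
	{
		kref_put(&sched_engine->ref, i915_sched_engine_destroy);
	}
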
> diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c
> index 1f93591a8c69..f349048ccbf6 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_execlists.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c
> @@ -43,7 +43,7 @@ static int wait_for_submit(struct intel_engine_cs *engine,
>  			   unsigned long timeout)
>  {
>  	/* Ignore our own attempts to suppress excess tasklets */
> -	tasklet_hi_schedule(&engine->execlists.tasklet);
> +	i915_sched_engine_hi_kick(engine->sched_engine);
>  
>  	timeout += jiffies;
>  	do {
> @@ -273,7 +273,7 @@ static int live_unlite_restore(struct intel_gt *gt, int prio)
>  			};
>  
>  			/* Alternatively preempt the spinner with ce[1] */
> -			engine->schedule(rq[1], &attr);
> +			engine->sched_engine->schedule(rq[1], &attr);
>  		}
>  
>  		/* And switch back to ce[0] for good measure */
> @@ -606,9 +606,9 @@ static int live_hold_reset(void *arg)
>  			err = -EBUSY;
>  			goto out;
>  		}
> -		tasklet_disable(&engine->execlists.tasklet);
> +		tasklet_disable(&engine->sched_engine->tasklet);
>  
> -		engine->execlists.tasklet.callback(&engine->execlists.tasklet);
> +		engine->sched_engine->tasklet.callback(&engine->sched_engine->tasklet);
>  		GEM_BUG_ON(execlists_active(&engine->execlists) != rq);
>  
>  		i915_request_get(rq);
> @@ -618,7 +618,7 @@ static int live_hold_reset(void *arg)
>  		__intel_engine_reset_bh(engine, NULL);
>  		GEM_BUG_ON(rq->fence.error != -EIO);
>  
> -		tasklet_enable(&engine->execlists.tasklet);
> +		tasklet_enable(&engine->sched_engine->tasklet);
>  		clear_and_wake_up_bit(I915_RESET_ENGINE + id,
>  				      &gt->reset.flags);
>  		local_bh_enable();
> @@ -900,7 +900,7 @@ release_queue(struct intel_engine_cs *engine,
>  	i915_request_add(rq);
>  
>  	local_bh_disable();
> -	engine->schedule(rq, &attr);
> +	engine->sched_engine->schedule(rq, &attr);
>  	local_bh_enable(); /* kick tasklet */
>  
>  	i915_request_put(rq);
> @@ -1183,7 +1183,7 @@ static int live_timeslice_rewind(void *arg)
>  		while (i915_request_is_active(rq[A2])) { /* semaphore yield! */
>  			/* Wait for the timeslice to kick in */
>  			del_timer(&engine->execlists.timer);
> -			tasklet_hi_schedule(&engine->execlists.tasklet);
> +			i915_sched_engine_hi_kick(engine->sched_engine);
>  			intel_engine_flush_submission(engine);
>  		}
>  		/* -> ELSP[] = { { A:rq1 }, { B:rq1 } } */
> @@ -1325,7 +1325,7 @@ static int live_timeslice_queue(void *arg)
>  			err = PTR_ERR(rq);
>  			goto err_heartbeat;
>  		}
> -		engine->schedule(rq, &attr);
> +		engine->sched_engine->schedule(rq, &attr);
>  		err = wait_for_submit(engine, rq, HZ / 2);
>  		if (err) {
>  			pr_err("%s: Timed out trying to submit semaphores\n",
> @@ -1867,7 +1867,7 @@ static int live_late_preempt(void *arg)
>  		}
>  
>  		attr.priority = I915_PRIORITY_MAX;
> -		engine->schedule(rq, &attr);
> +		engine->sched_engine->schedule(rq, &attr);
>  
>  		if (!igt_wait_for_spinner(&spin_hi, rq)) {
>  			pr_err("High priority context failed to preempt the low priority context\n");
> @@ -2480,7 +2480,7 @@ static int live_suppress_self_preempt(void *arg)
>  			i915_request_add(rq_b);
>  
>  			GEM_BUG_ON(i915_request_completed(rq_a));
> -			engine->schedule(rq_a, &attr);
> +			engine->sched_engine->schedule(rq_a, &attr);
>  			igt_spinner_end(&a.spin);
>  
>  			if (!igt_wait_for_spinner(&b.spin, rq_b)) {
> @@ -2612,7 +2612,7 @@ static int live_chain_preempt(void *arg)
>  
>  			i915_request_get(rq);
>  			i915_request_add(rq);
> -			engine->schedule(rq, &attr);
> +			engine->sched_engine->schedule(rq, &attr);
>  
>  			igt_spinner_end(&hi.spin);
>  			if (i915_request_wait(rq, 0, HZ / 5) < 0) {
> @@ -2971,7 +2971,7 @@ static int live_preempt_gang(void *arg)
>  				break;
>  
>  			/* Submit each spinner at increasing priority */
> -			engine->schedule(rq, &attr);
> +			engine->sched_engine->schedule(rq, &attr);
>  		} while (prio <= I915_PRIORITY_MAX &&
>  			 !__igt_timeout(end_time, NULL));
>  		pr_debug("%s: Preempt chain of %d requests\n",
> @@ -3219,7 +3219,7 @@ static int preempt_user(struct intel_engine_cs *engine,
>  	i915_request_get(rq);
>  	i915_request_add(rq);
>  
> -	engine->schedule(rq, &attr);
> +	engine->sched_engine->schedule(rq, &attr);
>  
>  	if (i915_request_wait(rq, 0, HZ / 2) < 0)
>  		err = -ETIME;
> @@ -4593,15 +4593,15 @@ static int reset_virtual_engine(struct intel_gt *gt,
>  		err = -EBUSY;
>  		goto out_heartbeat;
>  	}
> -	tasklet_disable(&engine->execlists.tasklet);
> +	tasklet_disable(&engine->sched_engine->tasklet);
>  
> -	engine->execlists.tasklet.callback(&engine->execlists.tasklet);
> +	engine->sched_engine->tasklet.callback(&engine->sched_engine->tasklet);
>  	GEM_BUG_ON(execlists_active(&engine->execlists) != rq);
>  
>  	/* Fake a preemption event; failed of course */
> -	spin_lock_irq(&engine->active.lock);
> +	spin_lock_irq(&engine->sched_engine->lock);
>  	__unwind_incomplete_requests(engine);
> -	spin_unlock_irq(&engine->active.lock);
> +	spin_unlock_irq(&engine->sched_engine->lock);
>  	GEM_BUG_ON(rq->engine != engine);
>  
>  	/* Reset the engine while keeping our active request on hold */
> @@ -4612,7 +4612,7 @@ static int reset_virtual_engine(struct intel_gt *gt,
>  	GEM_BUG_ON(rq->fence.error != -EIO);
>  
>  	/* Release our grasp on the engine, letting CS flow again */
> -	tasklet_enable(&engine->execlists.tasklet);
> +	tasklet_enable(&engine->sched_engine->tasklet);
>  	clear_and_wake_up_bit(I915_RESET_ENGINE + engine->id, &gt->reset.flags);
>  	local_bh_enable();
>  
> diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> index 5b63d4df8c93..cbcb800e2ca0 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> @@ -858,12 +858,12 @@ static int active_engine(void *data)
>  		rq[idx] = i915_request_get(new);
>  		i915_request_add(new);
>  
> -		if (engine->schedule && arg->flags & TEST_PRIORITY) {
> +		if (engine->sched_engine->schedule && arg->flags & TEST_PRIORITY) {
>  			struct i915_sched_attr attr = {
>  				.priority =
>  					i915_prandom_u32_max_state(512, &prng),
>  			};
> -			engine->schedule(rq[idx], &attr);
> +			engine->sched_engine->schedule(rq[idx], &attr);
>  		}
>  
>  		err = active_request_put(old);
> @@ -1702,7 +1702,7 @@ static int __igt_atomic_reset_engine(struct intel_engine_cs *engine,
>  				     const struct igt_atomic_section *p,
>  				     const char *mode)
>  {
> -	struct tasklet_struct * const t = &engine->execlists.tasklet;
> +	struct tasklet_struct * const t = &engine->sched_engine->tasklet;
>  	int err;
>  
>  	GEM_TRACE("i915_reset_engine(%s:%s) under %s\n",
> diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c
> index d8f6623524e8..5b40def7cd9d 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
> @@ -49,7 +49,7 @@ static int wait_for_submit(struct intel_engine_cs *engine,
>  			   unsigned long timeout)
>  {
>  	/* Ignore our own attempts to suppress excess tasklets */
> -	tasklet_hi_schedule(&engine->execlists.tasklet);
> +	i915_sched_engine_hi_kick(engine->sched_engine);
>  
>  	timeout += jiffies;
>  	do {
> @@ -1613,12 +1613,12 @@ static void garbage_reset(struct intel_engine_cs *engine,
>  
>  	local_bh_disable();
>  	if (!test_and_set_bit(bit, lock)) {
> -		tasklet_disable(&engine->execlists.tasklet);
> +		tasklet_disable(&engine->sched_engine->tasklet);
>  
>  		if (!rq->fence.error)
>  			__intel_engine_reset_bh(engine, NULL);
>  
> -		tasklet_enable(&engine->execlists.tasklet);
> +		tasklet_enable(&engine->sched_engine->tasklet);
>  		clear_and_wake_up_bit(bit, lock);
>  	}
>  	local_bh_enable();
> diff --git a/drivers/gpu/drm/i915/gt/selftest_reset.c b/drivers/gpu/drm/i915/gt/selftest_reset.c
> index 8784257ec808..7a50c9f4071b 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_reset.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_reset.c
> @@ -321,7 +321,7 @@ static int igt_atomic_engine_reset(void *arg)
>  		goto out_unlock;
>  
>  	for_each_engine(engine, gt, id) {
> -		struct tasklet_struct *t = &engine->execlists.tasklet;
> +		struct tasklet_struct *t = &engine->sched_engine->tasklet;
>  
>  		if (t->func)
>  			tasklet_disable(t);
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 38cda5d599a6..b8f9c71af13e 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -181,6 +181,7 @@ static void schedule_out(struct i915_request *rq)
>  
>  static void __guc_dequeue(struct intel_engine_cs *engine)
>  {
> +	struct i915_sched_engine * const sched_engine = engine->sched_engine;
>  	struct intel_engine_execlists * const execlists = &engine->execlists;
>  	struct i915_request **first = execlists->inflight;
>  	struct i915_request ** const last_port = first + execlists->port_mask;
> @@ -189,7 +190,7 @@ static void __guc_dequeue(struct intel_engine_cs *engine)
>  	bool submit = false;
>  	struct rb_node *rb;
>  
> -	lockdep_assert_held(&engine->active.lock);
> +	lockdep_assert_held(&engine->sched_engine->lock);
>  
>  	if (last) {
>  		if (*++first)
> @@ -204,7 +205,7 @@ static void __guc_dequeue(struct intel_engine_cs *engine)
>  	 * event.
>  	 */
>  	port = first;
> -	while ((rb = rb_first_cached(&execlists->queue))) {
> +	while ((rb = rb_first_cached(&sched_engine->queue))) {
>  		struct i915_priolist *p = to_priolist(rb);
>  		struct i915_request *rq, *rn;
>  
> @@ -224,11 +225,11 @@ static void __guc_dequeue(struct intel_engine_cs *engine)
>  			last = rq;
>  		}
>  
> -		rb_erase_cached(&p->node, &execlists->queue);
> +		rb_erase_cached(&p->node, &sched_engine->queue);
>  		i915_priolist_free(p);
>  	}
>  done:
> -	execlists->queue_priority_hint =
> +	sched_engine->queue_priority_hint =
>  		rb ? to_priolist(rb)->priority : INT_MIN;
>  	if (submit) {
>  		*port = schedule_in(last, port - execlists->inflight);
> @@ -240,13 +241,14 @@ static void __guc_dequeue(struct intel_engine_cs *engine)
>  
>  static void guc_submission_tasklet(struct tasklet_struct *t)
>  {
> -	struct intel_engine_cs * const engine =
> -		from_tasklet(engine, t, execlists.tasklet);
> +	struct i915_sched_engine *sched_engine =
> +		from_tasklet(sched_engine, t, tasklet);
> +	struct intel_engine_cs * const engine = sched_engine->engine;
>  	struct intel_engine_execlists * const execlists = &engine->execlists;
>  	struct i915_request **port, *rq;
>  	unsigned long flags;
>  
> -	spin_lock_irqsave(&engine->active.lock, flags);
> +	spin_lock_irqsave(&engine->sched_engine->lock, flags);
>  
>  	for (port = execlists->inflight; (rq = *port); port++) {
>  		if (!i915_request_completed(rq))
> @@ -262,20 +264,22 @@ static void guc_submission_tasklet(struct tasklet_struct *t)
>  
>  	__guc_dequeue(engine);
>  
> -	spin_unlock_irqrestore(&engine->active.lock, flags);
> +	i915_sched_engine_reset_on_empty(engine->sched_engine);
> +
> +	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
>  }
>  
>  static void cs_irq_handler(struct intel_engine_cs *engine, u16 iir)
>  {
>  	if (iir & GT_RENDER_USER_INTERRUPT) {
>  		intel_engine_signal_breadcrumbs(engine);
> -		tasklet_hi_schedule(&engine->execlists.tasklet);
> +		i915_sched_engine_hi_kick(engine->sched_engine);
>  	}
>  }
>  
>  static void guc_reset_prepare(struct intel_engine_cs *engine)
>  {
> -	struct intel_engine_execlists * const execlists = &engine->execlists;
> +	struct i915_sched_engine * const sched_engine = engine->sched_engine;
>  
>  	ENGINE_TRACE(engine, "\n");
>  
> @@ -283,12 +287,12 @@ static void guc_reset_prepare(struct intel_engine_cs *engine)
>  	 * Prevent request submission to the hardware until we have
>  	 * completed the reset in i915_gem_reset_finish(). If a request
>  	 * is completed by one engine, it may then queue a request
> -	 * to a second via its execlists->tasklet *just* as we are
> +	 * to a second via its sched_engine->tasklet *just* as we are
>  	 * calling engine->init_hw() and also writing the ELSP.
> -	 * Turning off the execlists->tasklet until the reset is over
> +	 * Turning off the sched_engine->tasklet until the reset is over
>  	 * prevents the race.
>  	 */
> -	__tasklet_disable_sync_once(&execlists->tasklet);
> +	__tasklet_disable_sync_once(&sched_engine->tasklet);
>  }
>  
>  static void guc_reset_state(struct intel_context *ce,
> @@ -319,7 +323,7 @@ static void guc_reset_rewind(struct intel_engine_cs *engine, bool stalled)
>  	struct i915_request *rq;
>  	unsigned long flags;
>  
> -	spin_lock_irqsave(&engine->active.lock, flags);
> +	spin_lock_irqsave(&engine->sched_engine->lock, flags);
>  
>  	/* Push back any incomplete requests for replay after the reset. */
>  	rq = execlists_unwind_incomplete_requests(execlists);
> @@ -333,12 +337,12 @@ static void guc_reset_rewind(struct intel_engine_cs *engine, bool stalled)
>  	guc_reset_state(rq->context, engine, rq->head, stalled);
>  
>  out_unlock:
> -	spin_unlock_irqrestore(&engine->active.lock, flags);
> +	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
>  }
>  
>  static void guc_reset_cancel(struct intel_engine_cs *engine)
>  {
> -	struct intel_engine_execlists * const execlists = &engine->execlists;
> +	struct i915_sched_engine * const sched_engine = engine->sched_engine;
>  	struct i915_request *rq, *rn;
>  	struct rb_node *rb;
>  	unsigned long flags;
> @@ -359,16 +363,16 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
>  	 * submission's irq state, we also wish to remind ourselves that
>  	 * it is irq state.)
>  	 */
> -	spin_lock_irqsave(&engine->active.lock, flags);
> +	spin_lock_irqsave(&engine->sched_engine->lock, flags);
>  
>  	/* Mark all executing requests as skipped. */
> -	list_for_each_entry(rq, &engine->active.requests, sched.link) {
> +	list_for_each_entry(rq, &engine->sched_engine->requests, sched.link) {
>  		i915_request_set_error_once(rq, -EIO);
>  		i915_request_mark_complete(rq);
>  	}
>  
>  	/* Flush the queued requests to the timeline list (for retiring). */
> -	while ((rb = rb_first_cached(&execlists->queue))) {
> +	while ((rb = rb_first_cached(&sched_engine->queue))) {
>  		struct i915_priolist *p = to_priolist(rb);
>  
>  		priolist_for_each_request_consume(rq, rn, p) {
> @@ -378,28 +382,28 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
>  			i915_request_mark_complete(rq);
>  		}
>  
> -		rb_erase_cached(&p->node, &execlists->queue);
> +		rb_erase_cached(&p->node, &sched_engine->queue);
>  		i915_priolist_free(p);
>  	}
>  
>  	/* Remaining _unready_ requests will be nop'ed when submitted */
>  
> -	execlists->queue_priority_hint = INT_MIN;
> -	execlists->queue = RB_ROOT_CACHED;
> +	sched_engine->queue_priority_hint = INT_MIN;
> +	sched_engine->queue = RB_ROOT_CACHED;
>  
> -	spin_unlock_irqrestore(&engine->active.lock, flags);
> +	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
>  }
>  
>  static void guc_reset_finish(struct intel_engine_cs *engine)
>  {
> -	struct intel_engine_execlists * const execlists = &engine->execlists;
> +	struct i915_sched_engine * const sched_engine = engine->sched_engine;
>  
> -	if (__tasklet_enable(&execlists->tasklet))
> +	if (__tasklet_enable(&sched_engine->tasklet))
>  		/* And kick in case we missed a new request submission. */
> -		tasklet_hi_schedule(&execlists->tasklet);
> +		i915_sched_engine_hi_kick(sched_engine);
>  
>  	ENGINE_TRACE(engine, "depth->%d\n",
> -		     atomic_read(&execlists->tasklet.count));
> +		     atomic_read(&sched_engine->tasklet.count));
>  }
>  
>  /*
> @@ -500,7 +504,7 @@ static inline void queue_request(struct intel_engine_cs *engine,
>  {
>  	GEM_BUG_ON(!list_empty(&rq->sched.link));
>  	list_add_tail(&rq->sched.link,
> -		      i915_sched_lookup_priolist(engine, prio));
> +		      i915_sched_lookup_priolist(engine->sched_engine, prio));
>  	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
>  }
>  
> @@ -510,16 +514,16 @@ static void guc_submit_request(struct i915_request *rq)
>  	unsigned long flags;
>  
>  	/* Will be called from irq-context when using foreign fences. */
> -	spin_lock_irqsave(&engine->active.lock, flags);
> +	spin_lock_irqsave(&engine->sched_engine->lock, flags);
>  
>  	queue_request(engine, rq, rq_prio(rq));
>  
> -	GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root));
> +	GEM_BUG_ON(i915_sched_engine_is_empty(engine->sched_engine));
>  	GEM_BUG_ON(list_empty(&rq->sched.link));
>  
> -	tasklet_hi_schedule(&engine->execlists.tasklet);
> +	i915_sched_engine_hi_kick(engine->sched_engine);
>  
> -	spin_unlock_irqrestore(&engine->active.lock, flags);
> +	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
>  }
>  
>  static void sanitize_hwsp(struct intel_engine_cs *engine)
> @@ -597,7 +601,7 @@ static void guc_release(struct intel_engine_cs *engine)
>  {
>  	engine->sanitize = NULL; /* no longer in control, nothing to sanitize */
>  
> -	tasklet_kill(&engine->execlists.tasklet);
> +	tasklet_kill(&engine->sched_engine->tasklet);
>  
>  	intel_engine_cleanup_common(engine);
>  	lrc_fini_wa_ctx(engine);
> @@ -612,7 +616,7 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
>  	engine->cops = &guc_context_ops;
>  	engine->request_alloc = guc_request_alloc;
>  
> -	engine->schedule = i915_schedule;
> +	engine->sched_engine->schedule = i915_schedule;
>  
>  	engine->reset.prepare = guc_reset_prepare;
>  	engine->reset.rewind = guc_reset_rewind;
> @@ -676,7 +680,8 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
>  	 */
>  	GEM_BUG_ON(INTEL_GEN(i915) < 11);
>  
> -	tasklet_setup(&engine->execlists.tasklet, guc_submission_tasklet);
> +	tasklet_setup(&engine->sched_engine->tasklet, guc_submission_tasklet);
> +	engine->sched_engine->schedule = i915_schedule;
>  
>  	guc_default_vfuncs(engine);
>  	guc_default_irqs(engine);
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index bb181fe5d47e..3352f56bcf63 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -1247,7 +1247,8 @@ static void record_request(const struct i915_request *request,
>  
>  static void engine_record_execlists(struct intel_engine_coredump *ee)
>  {
> -	const struct intel_engine_execlists * const el = &ee->engine->execlists;
> +	const struct intel_engine_execlists * const el =
> +		&ee->engine->execlists;
>  	struct i915_request * const *port = el->active;
>  	unsigned int n = 0;
>  
> @@ -1441,12 +1442,12 @@ capture_engine(struct intel_engine_cs *engine,
>  	if (!ee)
>  		return NULL;
>  
> -	spin_lock_irqsave(&engine->active.lock, flags);
> +	spin_lock_irqsave(&engine->sched_engine->lock, flags);
>  	rq = intel_engine_find_active_request(engine);
>  	if (rq)
>  		capture = intel_engine_coredump_add_request(ee, rq,
>  							    ATOMIC_MAYFAIL);
> -	spin_unlock_irqrestore(&engine->active.lock, flags);
> +	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
>  	if (!capture) {
>  		kfree(ee);
>  		return NULL;
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 970d8f4986bb..4c0df56e3b86 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -272,11 +272,11 @@ i915_request_active_engine(struct i915_request *rq,
>  	 * check that we have acquired the lock on the final engine.
>  	 */
>  	locked = READ_ONCE(rq->engine);
> -	spin_lock_irq(&locked->active.lock);
> +	spin_lock_irq(&locked->sched_engine->lock);
>  	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
> -		spin_unlock(&locked->active.lock);
> +		spin_unlock(&locked->sched_engine->lock);
>  		locked = engine;
> -		spin_lock(&locked->active.lock);
> +		spin_lock(&locked->sched_engine->lock);
>  	}
>  
>  	if (i915_request_is_active(rq)) {
> @@ -285,7 +285,7 @@ i915_request_active_engine(struct i915_request *rq,
>  		ret = true;
>  	}
>  
> -	spin_unlock_irq(&locked->active.lock);
> +	spin_unlock_irq(&locked->sched_engine->lock);
>  
>  	return ret;
>  }
> @@ -302,10 +302,10 @@ static void remove_from_engine(struct i915_request *rq)
>  	 * check that the rq still belongs to the newly locked engine.
>  	 */
>  	locked = READ_ONCE(rq->engine);
> -	spin_lock_irq(&locked->active.lock);
> +	spin_lock_irq(&locked->sched_engine->lock);
>  	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
> -		spin_unlock(&locked->active.lock);
> -		spin_lock(&engine->active.lock);
> +		spin_unlock(&locked->sched_engine->lock);
> +		spin_lock(&engine->sched_engine->lock);
>  		locked = engine;
>  	}
>  	list_del_init(&rq->sched.link);
> @@ -316,7 +316,7 @@ static void remove_from_engine(struct i915_request *rq)
>  	/* Prevent further __await_execution() registering a cb, then flush */
>  	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
>  
> -	spin_unlock_irq(&locked->active.lock);
> +	spin_unlock_irq(&locked->sched_engine->lock);
>  
>  	__notify_execute_cb_imm(rq);
>  }
> @@ -481,7 +481,7 @@ static bool __request_in_flight(const struct i915_request *signal)
>  	 * may either perform a context switch to the second inflight execlists,
>  	 * or it may switch to the pending set of execlists. In the case of the
>  	 * latter, it may send the ACK and we process the event copying the
> -	 * pending[] over top of inflight[], _overwriting_ our *active. Since
> +	 * pending[] over top of inflight[], _overwriting_ our *active-> Since
>  	 * this implies the HW is arbitrating and not struck in *active, we do
>  	 * not worry about complete accuracy, but we do require no read/write
>  	 * tearing of the pointer [the read of the pointer must be valid, even
> @@ -490,7 +490,7 @@ static bool __request_in_flight(const struct i915_request *signal)
>  	 *
>  	 * Note that the read of *execlists->active may race with the promotion
>  	 * of execlists->pending[] to execlists->inflight[], overwritting
> -	 * the value at *execlists->active. This is fine. The promotion implies
> +	 * the value at *execlists->active-> This is fine. The promotion implies
>  	 * that we received an ACK from the HW, and so the context is not
>  	 * stuck -- if we do not see ourselves in *active, the inflight status
>  	 * is valid. If instead we see ourselves being copied into *active,
> @@ -545,7 +545,7 @@ __await_execution(struct i915_request *rq,
>  
>  	/*
>  	 * Register the callback first, then see if the signaler is already
> -	 * active. This ensures that if we race with the
> +	 * active-> This ensures that if we race with the
>  	 * __notify_execute_cb from i915_request_submit() and we are not
>  	 * included in that list, we get a second bite of the cherry and
>  	 * execute it ourselves. After this point, a future
> @@ -637,7 +637,7 @@ bool __i915_request_submit(struct i915_request *request)
>  	RQ_TRACE(request, "\n");
>  
>  	GEM_BUG_ON(!irqs_disabled());
> -	lockdep_assert_held(&engine->active.lock);
> +	lockdep_assert_held(&engine->sched_engine->lock);
>  
>  	/*
>  	 * With the advent of preempt-to-busy, we frequently encounter
> @@ -649,9 +649,9 @@ bool __i915_request_submit(struct i915_request *request)
>  	 *
>  	 * We must remove the request from the caller's priority queue,
>  	 * and the caller must only call us when the request is in their
> -	 * priority queue, under the active.lock. This ensures that the
> +	 * priority queue, under the sched_engine->lock. This ensures that the
>  	 * request has *not* yet been retired and we can safely move
> -	 * the request into the engine->active.list where it will be
> +	 * the request into the engine->sched_engine->requests where it will be
>  	 * dropped upon retiring. (Otherwise if resubmit a *retired*
>  	 * request, this would be a horrible use-after-free.)
>  	 */
> @@ -694,7 +694,7 @@ bool __i915_request_submit(struct i915_request *request)
>  	result = true;
>  
>  	GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
> -	list_move_tail(&request->sched.link, &engine->active.requests);
> +	list_move_tail(&request->sched.link, &engine->sched_engine->requests);
>  active:
>  	clear_bit(I915_FENCE_FLAG_PQUEUE, &request->fence.flags);
>  	set_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags);
> @@ -724,11 +724,11 @@ void i915_request_submit(struct i915_request *request)
>  	unsigned long flags;
>  
>  	/* Will be called from irq-context when using foreign fences. */
> -	spin_lock_irqsave(&engine->active.lock, flags);
> +	spin_lock_irqsave(&engine->sched_engine->lock, flags);
>  
>  	__i915_request_submit(request);
>  
> -	spin_unlock_irqrestore(&engine->active.lock, flags);
> +	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
>  }
>  
>  void __i915_request_unsubmit(struct i915_request *request)
> @@ -742,7 +742,7 @@ void __i915_request_unsubmit(struct i915_request *request)
>  	RQ_TRACE(request, "\n");
>  
>  	GEM_BUG_ON(!irqs_disabled());
> -	lockdep_assert_held(&engine->active.lock);
> +	lockdep_assert_held(&engine->sched_engine->lock);
>  
>  	/*
>  	 * Before we remove this breadcrumb from the signal list, we have
> @@ -775,11 +775,11 @@ void i915_request_unsubmit(struct i915_request *request)
>  	unsigned long flags;
>  
>  	/* Will be called from irq-context when using foreign fences. */
> -	spin_lock_irqsave(&engine->active.lock, flags);
> +	spin_lock_irqsave(&engine->sched_engine->lock, flags);
>  
>  	__i915_request_unsubmit(request);
>  
> -	spin_unlock_irqrestore(&engine->active.lock, flags);
> +	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
>  }
>  
>  static void __cancel_request(struct i915_request *rq)
> @@ -1343,7 +1343,7 @@ __i915_request_await_execution(struct i915_request *to,
>  	}
>  
>  	/* Couple the dependency tree for PI on this exposed to->fence */
> -	if (to->engine->schedule) {
> +	if (to->engine->sched_engine->schedule) {
>  		err = i915_sched_node_add_dependency(&to->sched,
>  						     &from->sched,
>  						     I915_DEPENDENCY_WEAK);
> @@ -1484,7 +1484,7 @@ i915_request_await_request(struct i915_request *to, struct i915_request *from)
>  		return 0;
>  	}
>  
> -	if (to->engine->schedule) {
> +	if (to->engine->sched_engine->schedule) {
>  		ret = i915_sched_node_add_dependency(&to->sched,
>  						     &from->sched,
>  						     I915_DEPENDENCY_EXTERNAL);
> @@ -1671,7 +1671,7 @@ __i915_request_add_to_timeline(struct i915_request *rq)
>  			__i915_sw_fence_await_dma_fence(&rq->submit,
>  							&prev->fence,
>  							&rq->dmaq);
> -		if (rq->engine->schedule)
> +		if (rq->engine->sched_engine->schedule)
>  			__i915_sched_node_add_dependency(&rq->sched,
>  							 &prev->sched,
>  							 &rq->dep,
> @@ -1743,8 +1743,8 @@ void __i915_request_queue(struct i915_request *rq,
>  	 * decide whether to preempt the entire chain so that it is ready to
>  	 * run at the earliest possible convenience.
>  	 */
> -	if (attr && rq->engine->schedule)
> -		rq->engine->schedule(rq, attr);
> +	if (attr && rq->engine->sched_engine->schedule)
> +		rq->engine->sched_engine->schedule(rq, attr);
>  
>  	local_bh_disable();
>  	__i915_request_queue_bh(rq);
> diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
> index 270f6cd37650..239964bec1fa 100644
> --- a/drivers/gpu/drm/i915/i915_request.h
> +++ b/drivers/gpu/drm/i915/i915_request.h
> @@ -613,7 +613,7 @@ i915_request_active_timeline(const struct i915_request *rq)
>  	 * this submission.
>  	 */
>  	return rcu_dereference_protected(rq->timeline,
> -					 lockdep_is_held(&rq->engine->active.lock));
> +					 lockdep_is_held(&rq->engine->sched_engine->lock));
>  }
>  
>  static inline u32
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
> index efa638c3acc7..28d403a8d7d2 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler.c
> +++ b/drivers/gpu/drm/i915/i915_scheduler.c
> @@ -40,7 +40,7 @@ static inline struct i915_priolist *to_priolist(struct rb_node *rb)
>  	return rb_entry(rb, struct i915_priolist, node);
>  }
>  
> -static void assert_priolists(struct intel_engine_execlists * const execlists)
> +static void assert_priolists(struct i915_sched_engine * const sched_engine)
>  {
>  	struct rb_node *rb;
>  	long last_prio;
> @@ -48,11 +48,11 @@ static void assert_priolists(struct intel_engine_execlists * const execlists)
>  	if (!IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
>  		return;
>  
> -	GEM_BUG_ON(rb_first_cached(&execlists->queue) !=
> -		   rb_first(&execlists->queue.rb_root));
> +	GEM_BUG_ON(rb_first_cached(&sched_engine->queue) !=
> +		   rb_first(&sched_engine->queue.rb_root));
>  
>  	last_prio = INT_MAX;
> -	for (rb = rb_first_cached(&execlists->queue); rb; rb = rb_next(rb)) {
> +	for (rb = rb_first_cached(&sched_engine->queue); rb; rb = rb_next(rb)) {
>  		const struct i915_priolist *p = to_priolist(rb);
>  
>  		GEM_BUG_ON(p->priority > last_prio);
> @@ -61,23 +61,22 @@ static void assert_priolists(struct intel_engine_execlists * const execlists)
>  }
>  
>  struct list_head *
> -i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio)
> +i915_sched_lookup_priolist(struct i915_sched_engine *sched_engine, int prio)
>  {
> -	struct intel_engine_execlists * const execlists = &engine->execlists;
>  	struct i915_priolist *p;
>  	struct rb_node **parent, *rb;
>  	bool first = true;
>  
> -	lockdep_assert_held(&engine->active.lock);
> -	assert_priolists(execlists);
> +	lockdep_assert_held(&sched_engine->lock);
> +	assert_priolists(sched_engine);
>  
> -	if (unlikely(execlists->no_priolist))
> +	if (unlikely(sched_engine->no_priolist))
>  		prio = I915_PRIORITY_NORMAL;
>  
>  find_priolist:
>  	/* most positive priority is scheduled first, equal priorities fifo */
>  	rb = NULL;
> -	parent = &execlists->queue.rb_root.rb_node;
> +	parent = &sched_engine->queue.rb_root.rb_node;
>  	while (*parent) {
>  		rb = *parent;
>  		p = to_priolist(rb);
> @@ -92,7 +91,7 @@ i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio)
>  	}
>  
>  	if (prio == I915_PRIORITY_NORMAL) {
> -		p = &execlists->default_priolist;
> +		p = &sched_engine->default_priolist;
>  	} else {
>  		p = kmem_cache_alloc(global.slab_priorities, GFP_ATOMIC);
>  		/* Convert an allocation failure to a priority bump */
> @@ -107,7 +106,7 @@ i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio)
>  			 * requests, so if userspace lied about their
>  			 * dependencies that reordering may be visible.
>  			 */
> -			execlists->no_priolist = true;
> +			sched_engine->no_priolist = true;
>  			goto find_priolist;
>  		}
>  	}
> @@ -116,7 +115,7 @@ i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio)
>  	INIT_LIST_HEAD(&p->requests);
>  
>  	rb_link_node(&p->node, rb, parent);
> -	rb_insert_color_cached(&p->node, &execlists->queue, first);
> +	rb_insert_color_cached(&p->node, &sched_engine->queue, first);
>  
>  	return &p->requests;
>  }
> @@ -130,13 +129,13 @@ struct sched_cache {
>  	struct list_head *priolist;
>  };
>  
> -static struct intel_engine_cs *
> -sched_lock_engine(const struct i915_sched_node *node,
> -		  struct intel_engine_cs *locked,
> +static struct i915_sched_engine *
> +lock_sched_engine(struct i915_sched_node *node,
> +		  struct i915_sched_engine *locked,
>  		  struct sched_cache *cache)
>  {
>  	const struct i915_request *rq = node_to_request(node);
> -	struct intel_engine_cs *engine;
> +	struct i915_sched_engine *sched_engine;
>  
>  	GEM_BUG_ON(!locked);
>  
> @@ -146,81 +145,22 @@ sched_lock_engine(const struct i915_sched_node *node,
>  	 * engine lock. The simple ploy we use is to take the lock then
>  	 * check that the rq still belongs to the newly locked engine.
>  	 */
> -	while (locked != (engine = READ_ONCE(rq->engine))) {
> -		spin_unlock(&locked->active.lock);
> +	while (locked != (sched_engine = rq->engine->sched_engine)) {
> +		spin_unlock(&locked->lock);
>  		memset(cache, 0, sizeof(*cache));
> -		spin_lock(&engine->active.lock);
> -		locked = engine;
> +		spin_lock(&sched_engine->lock);
> +		locked = sched_engine;
>  	}
>  
> -	GEM_BUG_ON(locked != engine);
> +	GEM_BUG_ON(locked != sched_engine);
>  	return locked;
>  }
>  
> -static inline int rq_prio(const struct i915_request *rq)
> -{
> -	return rq->sched.attr.priority;
> -}
> -
> -static inline bool need_preempt(int prio, int active)
> -{
> -	/*
> -	 * Allow preemption of low -> normal -> high, but we do
> -	 * not allow low priority tasks to preempt other low priority
> -	 * tasks under the impression that latency for low priority
> -	 * tasks does not matter (as much as background throughput),
> -	 * so kiss.
> -	 */
> -	return prio >= max(I915_PRIORITY_NORMAL, active);
> -}
> -
> -static void kick_submission(struct intel_engine_cs *engine,
> -			    const struct i915_request *rq,
> -			    int prio)
> -{
> -	const struct i915_request *inflight;
> -
> -	/*
> -	 * We only need to kick the tasklet once for the high priority
> -	 * new context we add into the queue.
> -	 */
> -	if (prio <= engine->execlists.queue_priority_hint)
> -		return;
> -
> -	rcu_read_lock();
> -
> -	/* Nothing currently active? We're overdue for a submission! */
> -	inflight = execlists_active(&engine->execlists);
> -	if (!inflight)
> -		goto unlock;
> -
> -	/*
> -	 * If we are already the currently executing context, don't
> -	 * bother evaluating if we should preempt ourselves.
> -	 */
> -	if (inflight->context == rq->context)
> -		goto unlock;
> -
> -	ENGINE_TRACE(engine,
> -		     "bumping queue-priority-hint:%d for rq:%llx:%lld, inflight:%llx:%lld prio %d\n",
> -		     prio,
> -		     rq->fence.context, rq->fence.seqno,
> -		     inflight->fence.context, inflight->fence.seqno,
> -		     inflight->sched.attr.priority);
> -
> -	engine->execlists.queue_priority_hint = prio;
> -	if (need_preempt(prio, rq_prio(inflight)))
> -		tasklet_hi_schedule(&engine->execlists.tasklet);
> -
> -unlock:
> -	rcu_read_unlock();
> -}
> -
>  static void __i915_schedule(struct i915_sched_node *node,
>  			    const struct i915_sched_attr *attr)
>  {
>  	const int prio = max(attr->priority, node->attr.priority);
> -	struct intel_engine_cs *engine;
> +	struct i915_sched_engine *sched_engine;
>  	struct i915_dependency *dep, *p;
>  	struct i915_dependency stack;
>  	struct sched_cache cache;
> @@ -295,23 +235,24 @@ static void __i915_schedule(struct i915_sched_node *node,
>  	}
>  
>  	memset(&cache, 0, sizeof(cache));
> -	engine = node_to_request(node)->engine;
> -	spin_lock(&engine->active.lock);
> +	sched_engine = node_to_request(node)->engine->sched_engine;
> +	spin_lock(&sched_engine->lock);
>  
>  	/* Fifo and depth-first replacement ensure our deps execute before us */
> -	engine = sched_lock_engine(node, engine, &cache);
> +	sched_engine = lock_sched_engine(node, sched_engine, &cache);
>  	list_for_each_entry_safe_reverse(dep, p, &dfs, dfs_link) {
>  		INIT_LIST_HEAD(&dep->dfs_link);
>  
>  		node = dep->signaler;
> -		engine = sched_lock_engine(node, engine, &cache);
> -		lockdep_assert_held(&engine->active.lock);
> +		sched_engine = lock_sched_engine(node, sched_engine, &cache);
> +		lockdep_assert_held(&sched_engine->lock);
>  
>  		/* Recheck after acquiring the engine->timeline.lock */
>  		if (prio <= node->attr.priority || node_signaled(node))
>  			continue;
>  
> -		GEM_BUG_ON(node_to_request(node)->engine != engine);
> +		GEM_BUG_ON(node_to_request(node)->engine->sched_engine !=
> +			   sched_engine);
>  
>  		WRITE_ONCE(node->attr.priority, prio);
>  
> @@ -329,16 +270,17 @@ static void __i915_schedule(struct i915_sched_node *node,
>  		if (i915_request_in_priority_queue(node_to_request(node))) {
>  			if (!cache.priolist)
>  				cache.priolist =
> -					i915_sched_lookup_priolist(engine,
> +					i915_sched_lookup_priolist(sched_engine,
>  								   prio);
>  			list_move_tail(&node->link, cache.priolist);
>  		}
>  
>  		/* Defer (tasklet) submission until after all of our updates. */
> -		kick_submission(engine, node_to_request(node), prio);
> +		if (sched_engine->kick_backend)
> +			sched_engine->kick_backend(node_to_request(node), prio);
>  	}
>  
> -	spin_unlock(&engine->active.lock);
> +	spin_unlock(&sched_engine->lock);
>  }
>  
>  void i915_schedule(struct i915_request *rq, const struct i915_sched_attr *attr)
> @@ -489,6 +431,50 @@ void i915_request_show_with_schedule(struct drm_printer *m,
>  	rcu_read_unlock();
>  }
>  
> +void i915_sched_engine_free(struct kref *kref)
> +{
> +	struct i915_sched_engine *sched_engine =
> +		container_of(kref, typeof(*sched_engine), ref);
> +
> +	i915_sched_engine_kill(sched_engine); /* flush the callback */
> +	kfree(sched_engine);
> +}
> +
> +struct i915_sched_engine *
> +i915_sched_engine_create(unsigned int subclass)
> +{
> +	struct i915_sched_engine *sched_engine;
> +
> +	sched_engine = kzalloc(sizeof(*sched_engine), GFP_KERNEL);
> +	if (!sched_engine)
> +		return NULL;
> +
> +	kref_init(&sched_engine->ref);
> +
> +	sched_engine->queue = RB_ROOT_CACHED;
> +	sched_engine->queue_priority_hint = INT_MIN;
> +
> +	INIT_LIST_HEAD(&sched_engine->requests);
> +	INIT_LIST_HEAD(&sched_engine->hold);
> +
> +	spin_lock_init(&sched_engine->lock);
> +	lockdep_set_subclass(&sched_engine->lock, subclass);
> +
> +	/*
> +	 * Due to an interesting quirk in lockdep's internal debug tracking,
> +	 * after setting a subclass we must ensure the lock is used. Otherwise,
> +	 * nr_unused_locks is incremented once too often.
> +	 */
> +#ifdef CONFIG_DEBUG_LOCK_ALLOC
> +	local_irq_disable();
> +	lock_map_acquire(&sched_engine->lock.dep_map);
> +	lock_map_release(&sched_engine->lock.dep_map);
> +	local_irq_enable();
> +#endif
> +
> +	return sched_engine;
> +}
> +
>  static void i915_global_scheduler_shrink(void)
>  {
>  	kmem_cache_shrink(global.slab_dependencies);
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
> index 858a0938f47a..a78b1f50ecb4 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler.h
> +++ b/drivers/gpu/drm/i915/i915_scheduler.h
> @@ -39,7 +39,7 @@ void i915_schedule(struct i915_request *request,
>  		   const struct i915_sched_attr *attr);
>  
>  struct list_head *
> -i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio);
> +i915_sched_lookup_priolist(struct i915_sched_engine *sched_engine, int prio);
>  
>  void __i915_priolist_free(struct i915_priolist *p);
>  static inline void i915_priolist_free(struct i915_priolist *p)
> @@ -53,4 +53,67 @@ void i915_request_show_with_schedule(struct drm_printer *m,
>  				     const char *prefix,
>  				     int indent);
>  
> +struct i915_sched_engine *
> +i915_sched_engine_create(unsigned int subclass);
> +
> +void i915_sched_engine_free(struct kref *kref);
> +
> +static inline struct i915_sched_engine *
> +i915_sched_engine_get(struct i915_sched_engine *sched_engine)
> +{
> +	kref_get(&sched_engine->ref);
> +	return sched_engine;
> +}
> +
> +static inline void
> +i915_sched_engine_put(struct i915_sched_engine *sched_engine)
> +{
> +	kref_put(&sched_engine->ref, i915_sched_engine_free);
> +}
> +
> +static inline bool
> +i915_sched_engine_is_empty(struct i915_sched_engine *sched_engine)
> +{
> +	return RB_EMPTY_ROOT(&sched_engine->queue.rb_root);
> +}
> +
> +static inline void
> +i915_sched_engine_reset_on_empty(struct i915_sched_engine *sched_engine)
> +{
> +	if (i915_sched_engine_is_empty(sched_engine))
> +		sched_engine->no_priolist = false;
> +}
> +
> +static inline void
> +i915_sched_engine_hi_kick(struct i915_sched_engine *sched_engine)
> +{
> +	tasklet_hi_schedule(&sched_engine->tasklet);
> +}
> +
> +static inline void
> +i915_sched_engine_kick(struct i915_sched_engine *sched_engine)
> +{
> +	tasklet_schedule(&sched_engine->tasklet);
> +}
> +
> +static inline void
> +i915_sched_engine_kill(struct i915_sched_engine *sched_engine)
> +{
> +	tasklet_kill(&sched_engine->tasklet);
> +}
> +
> +static inline void
> +sched_engine_active_lock_bh(struct i915_sched_engine *sched_engine)
> +{
> +	local_bh_disable(); /* prevent local softirq and lock recursion */
> +	tasklet_lock(&sched_engine->tasklet);
> +}
> +
> +static inline void
> +sched_engine_active_unlock_bh(struct i915_sched_engine *sched_engine)
> +{
> +	tasklet_unlock(&sched_engine->tasklet);
> +	local_bh_enable(); /* restore softirq, and kick ksoftirqd! */
> +}
> +
>  #endif /* _I915_SCHEDULER_H_ */
> diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h b/drivers/gpu/drm/i915/i915_scheduler_types.h
> index 343ed44d5ed4..90b389ba661b 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler_types.h
> +++ b/drivers/gpu/drm/i915/i915_scheduler_types.h
> @@ -91,4 +91,67 @@ struct i915_dependency {
>  				&(rq__)->sched.signalers_list, \
>  				signal_link)
>  
> +struct i915_sched_engine {
> +	struct kref ref;
> +
> +	/*
> +	 * @lock: Protects requests in priority lists, requests, hold, and
> +	 * tasklet while running.
> +	 */
> +	spinlock_t lock;
> +
> +	/* Execlist specific lists, needed here as protected by lock */
> +	struct list_head requests;
> +	struct list_head hold; /* ready requests, but on hold */
> +
> +	/**
> +	 * @tasklet: softirq tasklet for bottom handler
> +	 */
> +	struct tasklet_struct tasklet;
> +
> +	/**
> +	 * @default_priolist: priority list for I915_PRIORITY_NORMAL
> +	 */
> +	struct i915_priolist default_priolist;
> +
> +	/**
> +	 * @queue_priority_hint: Highest pending priority.
> +	 *
> +	 * When we add requests into the queue, or adjust the priority of
> +	 * executing requests, we compute the maximum priority of those
> +	 * pending requests. We can then use this value to determine if
> +	 * we need to preempt the executing requests to service the queue.
> +	 * However, since the we may have recorded the priority of an inflight
> +	 * request we wanted to preempt but since completed, at the time of
> +	 * However, since we may have recorded the priority of an inflight
> +	 * request we wanted to preempt but since completed, at the time of
> +	 * dequeuing the priority hint may no longer match the highest
> +	int queue_priority_hint;
> +
> +	/**
> +	 * @queue: queue of requests, in priority lists
> +	 */
> +	struct rb_root_cached queue;
> +
> +	/**
> +	 * @no_priolist: priority lists disabled
> +	 */
> +	bool no_priolist;
> +
> +	/* Back pointer to engine */
> +	struct intel_engine_cs *engine;
> +
> +	/* Kick backend */
> +	void	(*kick_backend)(const struct i915_request *rq,
> +				int prio);
> +
> +	/*
> +	 * Call when the priority on a request has changed and it and its
> +	 * dependencies may need rescheduling. Note the request itself may
> +	 * not be ready to run!
> +	 */
> +	void	(*schedule)(struct i915_request *request,
> +			    const struct i915_sched_attr *attr);
> +};
> +
>  #endif /* _I915_SCHEDULER_TYPES_H_ */
> -- 
> 2.28.0
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 43/97] drm/i915/guc: Add lrc descriptor context lookup array
  2021-05-06 19:13 ` [RFC PATCH 43/97] drm/i915/guc: Add lrc descriptor context lookup array Matthew Brost
@ 2021-05-11 15:26   ` Daniel Vetter
  2021-05-11 17:01     ` Matthew Brost
  0 siblings, 1 reply; 249+ messages in thread
From: Daniel Vetter @ 2021-05-11 15:26 UTC (permalink / raw)
  To: Matthew Brost
  Cc: tvrtko.ursulin, intel-gfx, dri-devel, jason.ekstrand,
	daniele.ceraolospurio, jon.bloomfield, daniel.vetter,
	john.c.harrison

On Thu, May 06, 2021 at 12:13:57PM -0700, Matthew Brost wrote:
> Add lrc descriptor context lookup array which can resolve the
> intel_context from the lrc descriptor index. In addition to lookup, it
> can determine if a context is currently registered with
> the GuC by checking if an entry for a descriptor index is present.
> Future patches in the series will make use of this array.
> 
> Cc: John Harrison <john.c.harrison@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  5 +++
>  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 32 +++++++++++++++++--
>  2 files changed, 35 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> index d84f37afb9d8..2eb6c497e43c 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> @@ -6,6 +6,8 @@
>  #ifndef _INTEL_GUC_H_
>  #define _INTEL_GUC_H_
>  
> +#include "linux/xarray.h"
> +
>  #include "intel_uncore.h"
>  #include "intel_guc_fw.h"
>  #include "intel_guc_fwif.h"
> @@ -47,6 +49,9 @@ struct intel_guc {
>  	struct i915_vma *lrc_desc_pool;
>  	void *lrc_desc_pool_vaddr;
>  
> +	/* guc_id to intel_context lookup */
> +	struct xarray context_lookup;

The current code sets a disastrous example, but for stuff like this it's
always good to explain the locking, and who's holding references and how
you're handling cycles. Since I guess the intel_context also holds the
guc_id alive somehow.
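
To make this concrete, something along these lines is roughly what I'm
after - note the lifetime/locking rules below are only my guess from
reading this diff, so treat it as a sketch to be corrected rather than
the actual contract:

	/**
	 * @context_lookup: guc_id (lrc descriptor index) -> intel_context.
	 *
	 * Updates go through xa_store_irq()/xa_erase_irq(), i.e. they rely
	 * on the xarray's internal lock with interrupts disabled, presumably
	 * because entries are also touched from the G2H/irq path. The xarray
	 * does not hold its own reference on the intel_context: an entry is
	 * only valid while the context owns that guc_id, and must be erased
	 * before the guc_id is released or the context freed.
	 */
	struct xarray context_lookup;

Doesn't have to be exactly that wording, but every non-obvious member
should get this treatment.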

The same holds for the entire series, wherever it makes sense (i.e. for code
we don't expect to rewrite entirely anyway).
-Daniel

> +
>  	/* Control params for fw initialization */
>  	u32 params[GUC_CTL_MAX_DWORDS];
>  
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 6acc1ef34f92..c2b6d27404b7 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -65,8 +65,6 @@ static inline struct i915_priolist *to_priolist(struct rb_node *rb)
>  	return rb_entry(rb, struct i915_priolist, node);
>  }
>  
> -/* Future patches will use this function */
> -__attribute__ ((unused))
>  static struct guc_lrc_desc *__get_lrc_desc(struct intel_guc *guc, u32 index)
>  {
>  	struct guc_lrc_desc *base = guc->lrc_desc_pool_vaddr;
> @@ -76,6 +74,15 @@ static struct guc_lrc_desc *__get_lrc_desc(struct intel_guc *guc, u32 index)
>  	return &base[index];
>  }
>  
> +static inline struct intel_context *__get_context(struct intel_guc *guc, u32 id)
> +{
> +	struct intel_context *ce = xa_load(&guc->context_lookup, id);
> +
> +	GEM_BUG_ON(id >= GUC_MAX_LRC_DESCRIPTORS);
> +
> +	return ce;
> +}
> +
>  static int guc_lrc_desc_pool_create(struct intel_guc *guc)
>  {
>  	u32 size;
> @@ -96,6 +103,25 @@ static void guc_lrc_desc_pool_destroy(struct intel_guc *guc)
>  	i915_vma_unpin_and_release(&guc->lrc_desc_pool, I915_VMA_RELEASE_MAP);
>  }
>  
> +static inline void reset_lrc_desc(struct intel_guc *guc, u32 id)
> +{
> +	struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
> +
> +	memset(desc, 0, sizeof(*desc));
> +	xa_erase_irq(&guc->context_lookup, id);
> +}
> +
> +static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
> +{
> +	return __get_context(guc, id);
> +}
> +
> +static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
> +					   struct intel_context *ce)
> +{
> +	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
> +}
> +
>  static void guc_add_request(struct intel_guc *guc, struct i915_request *rq)
>  {
>  	/* Leaving stub as this function will be used in future patches */
> @@ -404,6 +430,8 @@ int intel_guc_submission_init(struct intel_guc *guc)
>  	 */
>  	GEM_BUG_ON(!guc->lrc_desc_pool);
>  
> +	xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ);
> +
>  	return 0;
>  }
>  
> -- 
> 2.28.0
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 249+ messages in thread

* RE: [RFC PATCH 00/97] Basic GuC submission support in the i915
  2021-05-11  8:06         ` Martin Peres
@ 2021-05-11 15:26           ` Bloomfield, Jon
  2021-05-11 16:39             ` Matthew Brost
  0 siblings, 1 reply; 249+ messages in thread
From: Bloomfield, Jon @ 2021-05-11 15:26 UTC (permalink / raw)
  To: Martin Peres, Daniel Vetter
  Cc: Brost, Matthew, Ursulin, Tvrtko, intel-gfx, dri-devel, Ekstrand,
	Jason, Ceraolo Spurio, Daniele, Jason Ekstrand, Vetter, Daniel,
	Harrison, John C

> -----Original Message-----
> From: Martin Peres <martin.peres@free.fr>
> Sent: Tuesday, May 11, 2021 1:06 AM
> To: Daniel Vetter <daniel@ffwll.ch>
> Cc: Jason Ekstrand <jason@jlekstrand.net>; Brost, Matthew
> <matthew.brost@intel.com>; intel-gfx <intel-gfx@lists.freedesktop.org>;
> dri-devel <dri-devel@lists.freedesktop.org>; Ursulin, Tvrtko
> <tvrtko.ursulin@intel.com>; Ekstrand, Jason <jason.ekstrand@intel.com>;
> Ceraolo Spurio, Daniele <daniele.ceraolospurio@intel.com>; Bloomfield, Jon
> <jon.bloomfield@intel.com>; Vetter, Daniel <daniel.vetter@intel.com>;
> Harrison, John C <john.c.harrison@intel.com>
> Subject: Re: [RFC PATCH 00/97] Basic GuC submission support in the i915
> 
> On 10/05/2021 19:33, Daniel Vetter wrote:
> > On Mon, May 10, 2021 at 3:55 PM Martin Peres <martin.peres@free.fr> wrote:
> >>
> >> On 10/05/2021 02:11, Jason Ekstrand wrote:
> >>> On May 9, 2021 12:12:36 Martin Peres <martin.peres@free.fr> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> On 06/05/2021 22:13, Matthew Brost wrote:
> >>>>> Basic GuC submission support. This is the first bullet point in the
> >>>>> upstreaming plan covered in the following RFC [1].
> >>>>>
> >>>>> At a very high level the GuC is a piece of firmware which sits between
> >>>>> the i915 and the GPU. It offloads some of the scheduling of contexts
> >>>>> from the i915 and programs the GPU to submit contexts. The i915
> >>>>> communicates with the GuC and the GuC communicates with the GPU.
> >>>>
> >>>> May I ask what will GuC command submission do that execlist won't/can't
> >>>> do? And what would be the impact on users? Even forgetting the troubled
> >>>> history of GuC (instability, performance regression, poor level of user
> >>>> support, 6+ years of trying to upstream it...), adding this much code
> >>>> and doubling the amount of validation needed should come with a
> >>>> rationale making it feel worth it... and I am not seeing here. Would you
> >>>> mind providing the rationale behind this work?
> >>>>
> >>>>>
> >>>>> GuC submission will be disabled by default on all current upstream
> >>>>> platforms behind a module parameter - enable_guc. A value of 3 will
> >>>>> enable submission and HuC loading via the GuC. GuC submission should
> >>>>> work on all gen11+ platforms assuming the GuC firmware is present.
> >>>>
> >>>> What is the plan here when it comes to keeping support for execlist? I
> >>>> am afraid that landing GuC support in Linux is the first step towards
> >>>> killing the execlist, which would force users to use proprietary
> >>>> firmwares that even most Intel engineers have little influence over.
> >>>> Indeed, if "drm/i915/guc: Disable semaphores when using GuC scheduling"
> >>>> which states "Disable semaphores when using GuC scheduling as semaphores
> >>>> are broken in the current GuC firmware." is anything to go by, it means
> >>>> that even Intel developers seem to prefer working around the GuC
> >>>> firmware, rather than fixing it.
> >>>
> >>> Yes, landing GuC support may be the first step in removing execlist
> >>> support. The inevitable reality is that GPU scheduling is coming and
> >>> likely to be the only path in the not-too-distant future. (See also
> >>> the ongoing thread with AMD about fences.) I'm not going to pass
> >>> judgement on whether or not this is a good thing.  I'm just reading the
> >>> winds and, in my view, this is where things are headed for good or ill.
> >>>
> >>> In answer to the question above, the answer to "what do we gain from
> >>> GuC?" may soon be, "you get to use your GPU."  We're not there yet and,
> >>> again, I'm not necessarily advocating for it, but that is likely where
> >>> things are headed.
> >>
> >> This will be a sad day, especially since it seems fundamentally opposed
> >> with any long-term support, on top of taking away user freedom to
> >> fix/tweak their system when Intel won't.
> >>
> >>> A firmware-based submission model isn't a bad design IMO and, aside from
> >>> the firmware freedom issues, I think there are actual advantages to the
> >>> model. Immediately, it'll unlock a few features like parallel submission
> >>> (more on that in a bit) and long-running compute because they're
> >>> implemented in GuC and the work to implement them properly in the
> >>> execlist scheduler is highly non-trivial. Longer term, it may (no
> >>> guarantees) unlock some performance by getting the kernel out of the way.
> >>
> >> Oh, I definitely agree with firmware-based submission model not being a
> >> bad design. I was even cheering for it in 2015. Experience with it made
> >> me regret that deeply since :s
> >>
> >> But with the DRM scheduler being responsible for most things, I fail to
> >> see what we could offload in the GuC except context switching (like
> >> every other manufacturer). The problem is, the GuC does way more than
> >> just switching registers in bulk, and if the number of revisions of the
> >> GuC is anything to go by, it is way too complex for me to feel
> >> comfortable with it.
> >
> > We need to flesh out that part of the plan more, but we're not going
> > to use drm scheduler for everything. It's only to handle the dma-fence
> > legacy side of things, which means:
> > - timeout handling for batches that take too long
> > - dma_fence dependency sorting/handling
> > - boosting of context from display flips (currently missing, needs to
> > be ported from drm/i915)
> >
> > The actual round-robin/preempt/priority handling is still left to the
> > backend, in this case here the fw. So there's large chunks of
> > code/functionality where drm/scheduler wont be involved in, and like
> > Jason says: The hw direction winds definitely blow in the direction
> > that this is all handled in hw.
> 
> The plan makes sense for an SRIOV-enabled GPU, yes.
> 
> However, if the GuC is actually helping i915, then why not open source
> it and drop all the issues related to its stability? Wouldn't it be the
> perfect solution, as it would allow dropping execlist support for newer
> HW, and it would eliminate the concerns about maintenance of stable
> releases of Linux?

That the major version of the FW is high is not due to bugs - bugs don't trigger major version bumps anyway. Only interface changes increment the major version, and we do add features to keep the firmware relevant to the evolving hardware and OS landscape. When only Windows used GuC there was no reason to minimize interface creep - GuC and KMD are released as an atomic bundle on Windows. With Linux, this is no longer the case, and has not been for some time.

We have been using GuC as the sole mechanism for submission on Windows since Gen8, and it has proven very reliable. This is in large part because it is simple, and designed from day 1 as a cohesive solution alongside the hardware.

Will there be bugs in the future? Of course. It's a new i915 backend. There are bugs in the execlist backend too, and the runlist backend, and the majority of real-world software ever written. But the i915 GuC backend is way simpler than execlist, much easier to understand, and therefore much easier to maintain. It's a net win for i915 and Linux.

Jon

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 49/97] drm/i915/guc: Disable engine barriers with GuC during unpin
  2021-05-06 19:14 ` [RFC PATCH 49/97] drm/i915/guc: Disable engine barriers with GuC during unpin Matthew Brost
@ 2021-05-11 15:37   ` Daniel Vetter
  2021-05-11 16:31     ` Matthew Brost
  2021-05-26 10:26   ` [Intel-gfx] " Tvrtko Ursulin
  1 sibling, 1 reply; 249+ messages in thread
From: Daniel Vetter @ 2021-05-11 15:37 UTC (permalink / raw)
  To: Matthew Brost
  Cc: tvrtko.ursulin, intel-gfx, dri-devel, jason.ekstrand,
	daniele.ceraolospurio, jon.bloomfield, daniel.vetter,
	john.c.harrison

On Thu, May 06, 2021 at 12:14:03PM -0700, Matthew Brost wrote:
> Disable engine barriers for unpinning with GuC. This feature isn't
> needed with the GuC as it disables context scheduling before unpinning
> which guarantees the HW will not reference the context. Hence it is
> not necessary to defer unpinning until a kernel context request
> completes on each engine in the context engine mask.
> 
> Cc: John Harrison <john.c.harrison@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

Instead of these ifs in the code, can we push this barrier business down
into backends?
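
Just to illustrate what I mean (hook name completely made up, this is a
sketch and not a concrete proposal): let the backend declare whether it
needs the idle-barrier dance at all, e.g. a new op

	/* hypothetical addition to struct intel_context_ops */
	bool (*needs_idle_barriers)(const struct intel_context *ce);

which guc_context_ops would implement as "return false", so that the
check in intel_context_active_acquire() becomes

	if (intel_context_is_barrier(ce) || !ce->ops->needs_idle_barriers(ce))
		return 0;

and the common code (and the selftests) never has to ask "is this GuC?"
explicitly.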

Not in this series, but as one of the things to sort out as part of the
conversion to drm/scheduler.
-Daniel

> ---
>  drivers/gpu/drm/i915/gt/intel_context.c    |  2 +-
>  drivers/gpu/drm/i915/gt/intel_context.h    |  1 +
>  drivers/gpu/drm/i915/gt/selftest_context.c | 10 ++++++++++
>  drivers/gpu/drm/i915/i915_active.c         |  3 +++
>  4 files changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
> index 1499b8aace2a..7f97753ab164 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context.c
> +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> @@ -80,7 +80,7 @@ static int intel_context_active_acquire(struct intel_context *ce)
>  
>  	__i915_active_acquire(&ce->active);
>  
> -	if (intel_context_is_barrier(ce))
> +	if (intel_context_is_barrier(ce) || intel_engine_uses_guc(ce->engine))
>  		return 0;
>  
>  	/* Preallocate tracking nodes */
> diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
> index 92ecbab8c1cd..9b211ca5ecc7 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context.h
> +++ b/drivers/gpu/drm/i915/gt/intel_context.h
> @@ -16,6 +16,7 @@
>  #include "intel_engine_types.h"
>  #include "intel_ring_types.h"
>  #include "intel_timeline_types.h"
> +#include "uc/intel_guc_submission.h"
>  
>  #define CE_TRACE(ce, fmt, ...) do {					\
>  	const struct intel_context *ce__ = (ce);			\
> diff --git a/drivers/gpu/drm/i915/gt/selftest_context.c b/drivers/gpu/drm/i915/gt/selftest_context.c
> index 26685b927169..fa7b99a671dd 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_context.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_context.c
> @@ -209,7 +209,13 @@ static int __live_active_context(struct intel_engine_cs *engine)
>  	 * This test makes sure that the context is kept alive until a
>  	 * subsequent idle-barrier (emitted when the engine wakeref hits 0
>  	 * with no more outstanding requests).
> +	 *
> +	 * In GuC submission mode we don't use idle barriers and we instead
> +	 * get a message from the GuC to signal that it is safe to unpin the
> +	 * context from memory.
>  	 */
> +	if (intel_engine_uses_guc(engine))
> +		return 0;
>  
>  	if (intel_engine_pm_is_awake(engine)) {
>  		pr_err("%s is awake before starting %s!\n",
> @@ -357,7 +363,11 @@ static int __live_remote_context(struct intel_engine_cs *engine)
>  	 * on the context image remotely (intel_context_prepare_remote_request),
>  	 * which inserts foreign fences into intel_context.active, does not
>  	 * clobber the idle-barrier.
> +	 *
> +	 * In GuC submission mode we don't use idle barriers.
>  	 */
> +	if (intel_engine_uses_guc(engine))
> +		return 0;
>  
>  	if (intel_engine_pm_is_awake(engine)) {
>  		pr_err("%s is awake before starting %s!\n",
> diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
> index b1aa1c482c32..9a264898bb91 100644
> --- a/drivers/gpu/drm/i915/i915_active.c
> +++ b/drivers/gpu/drm/i915/i915_active.c
> @@ -968,6 +968,9 @@ void i915_active_acquire_barrier(struct i915_active *ref)
>  
>  	GEM_BUG_ON(i915_active_is_idle(ref));
>  
> +	if (llist_empty(&ref->preallocated_barriers))
> +		return;
> +
>  	/*
>  	 * Transfer the list of preallocated barriers into the
>  	 * i915_active rbtree, but only as proto-nodes. They will be
> -- 
> 2.28.0
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 68/97] drm/i915/guc: Handle context reset notification
  2021-05-06 19:14 ` [RFC PATCH 68/97] drm/i915/guc: Handle context reset notification Matthew Brost
@ 2021-05-11 16:25   ` Daniel Vetter
  0 siblings, 0 replies; 249+ messages in thread
From: Daniel Vetter @ 2021-05-11 16:25 UTC (permalink / raw)
  To: Matthew Brost; +Cc: jason.ekstrand, daniel.vetter, intel-gfx, dri-devel

On Thu, May 06, 2021 at 12:14:22PM -0700, Matthew Brost wrote:
> GuC will issue a reset on detecting an engine hang and will notify
> the driver via a G2H message. The driver will service the notification
> by resetting the guilty context to a simple state or banning it
> completely.
> 
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: John Harrison <John.C.Harrison@Intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>

Entirely aside, but I wonder whether we shouldn't just make
non-recoverable contexts the only thing we support. But probably too big a
can of worms.
-Daniel

> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  2 ++
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |  6 ++++
>  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 35 +++++++++++++++++++
>  drivers/gpu/drm/i915/i915_trace.h             | 10 ++++++
>  4 files changed, 53 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> index 277b4496a20e..a2abe1c422e3 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> @@ -263,6 +263,8 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
>  					  const u32 *msg, u32 len);
>  int intel_guc_sched_done_process_msg(struct intel_guc *guc,
>  				     const u32 *msg, u32 len);
> +int intel_guc_context_reset_process_msg(struct intel_guc *guc,
> +					const u32 *msg, u32 len);
>  
>  void intel_guc_submission_reset_prepare(struct intel_guc *guc);
>  void intel_guc_submission_reset(struct intel_guc *guc, bool stalled);
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index b3194d753b13..9c84b2ba63a8 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -941,6 +941,12 @@ static int ct_process_request(struct intel_guc_ct *ct, struct ct_incoming_msg *r
>  			CT_ERROR(ct, "schedule context failed %x %*ph\n",
>  				  action, 4 * len, payload);
>  		break;
> +	case INTEL_GUC_ACTION_CONTEXT_RESET_NOTIFICATION:
> +		ret = intel_guc_context_reset_process_msg(guc, payload, len);
> +		if (unlikely(ret))
> +			CT_ERROR(ct, "context reset notification failed %x %*ph\n",
> +				  action, 4 * len, payload);
> +		break;
>  	default:
>  		ret = -EOPNOTSUPP;
>  		break;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 2c3791fc24b7..940017495731 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -2192,6 +2192,41 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
>  	return 0;
>  }
>  
> +static void guc_context_replay(struct intel_context *ce)
> +{
> +	struct i915_sched_engine *sched_engine = ce->engine->sched_engine;
> +
> +	__guc_reset_context(ce, true);
> +	i915_sched_engine_hi_kick(sched_engine);
> +}
> +
> +static void guc_handle_context_reset(struct intel_guc *guc,
> +				     struct intel_context *ce)
> +{
> +	trace_intel_context_reset(ce);
> +	guc_context_replay(ce);
> +}
> +
> +int intel_guc_context_reset_process_msg(struct intel_guc *guc,
> +					const u32 *msg, u32 len)
> +{
> +	struct intel_context *ce;
> +	int desc_idx = msg[0];
> +
> +	if (unlikely(len != 1)) {
> +		drm_dbg(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);
> +		return -EPROTO;
> +	}
> +
> +	ce = g2h_context_lookup(guc, desc_idx);
> +	if (unlikely(!ce))
> +		return -EPROTO;
> +
> +	guc_handle_context_reset(guc, ce);
> +
> +	return 0;
> +}
> +
>  void intel_guc_log_submission_info(struct intel_guc *guc,
>  				   struct drm_printer *p)
>  {
> diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
> index 97c2e83984ed..c095c4d39456 100644
> --- a/drivers/gpu/drm/i915/i915_trace.h
> +++ b/drivers/gpu/drm/i915/i915_trace.h
> @@ -929,6 +929,11 @@ DECLARE_EVENT_CLASS(intel_context,
>  		      __entry->guc_sched_state_no_lock)
>  );
>  
> +DEFINE_EVENT(intel_context, intel_context_reset,
> +	     TP_PROTO(struct intel_context *ce),
> +	     TP_ARGS(ce)
> +);
> +
>  DEFINE_EVENT(intel_context, intel_context_register,
>  	     TP_PROTO(struct intel_context *ce),
>  	     TP_ARGS(ce)
> @@ -1026,6 +1031,11 @@ trace_i915_request_out(struct i915_request *rq)
>  {
>  }
>  
> +static inline void
> +trace_intel_context_reset(struct intel_context *ce)
> +{
> +}
> +
>  static inline void
>  trace_intel_context_register(struct intel_context *ce)
>  {
> -- 
> 2.28.0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 74/97] drm/i915/guc: Capture error state on context reset
  2021-05-06 19:14 ` [RFC PATCH 74/97] drm/i915/guc: Capture error state on context reset Matthew Brost
@ 2021-05-11 16:28   ` Daniel Vetter
  2021-05-11 17:12     ` Matthew Brost
  0 siblings, 1 reply; 249+ messages in thread
From: Daniel Vetter @ 2021-05-11 16:28 UTC (permalink / raw)
  To: Matthew Brost; +Cc: jason.ekstrand, daniel.vetter, intel-gfx, dri-devel

On Thu, May 06, 2021 at 12:14:28PM -0700, Matthew Brost wrote:
> We receive notification of an engine reset from GuC at its
> completion, meaning GuC has potentially cleared any HW state
> we may have been interested in capturing. GuC resumes scheduling
> on the engine post-reset, as the resets are meant to be transparent,
> further muddling our error state.
> 
> There is ongoing work to define an API for a GuC debug state dump. The
> suggestion for now is to manually disable FW initiated resets in cases
> where debug state is needed.
> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>

This looks a bit backwards to me:

- I figured we should capture error state when we get the G2H, in which
  case I hope we do know which context was the offending one that got shot.

- For now we're missing the hw state, but we should still be able to
  capture the buffers userspace wants us to capture. So that could be
  wired up already?

But yeah register state capturing needs support from GuC fw.
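
Roughly what I have in mind for the first point (hand-waving sketch; the
actual capture entry point is left as a placeholder comment since it
depends on how the rest of this patch wires things up):

	static void guc_handle_context_reset(struct intel_guc *guc,
					     struct intel_context *ce)
	{
		trace_intel_context_reset(ce);

		/*
		 * Point error capture at the guilty context before we
		 * replay it. Register state is already gone (GuC has reset
		 * the engine), but SW state and the user-requested capture
		 * buffers are still intact at this point.
		 */
		intel_engine_set_hung_context(ce->engine, ce);
		/* ... trigger the usual error capture for ce->engine ... */

		guc_context_replay(ce);
	}

i.e. do it straight from the G2H handler, while we still know exactly
which context got shot.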

I think this is a big enough miss in GuC features that we should list it
on the rfc as a thing to fix.
-Daniel

> ---
>  drivers/gpu/drm/i915/gt/intel_context.c       | 20 +++++++++++
>  drivers/gpu/drm/i915/gt/intel_context.h       |  3 ++
>  drivers/gpu/drm/i915/gt/intel_engine.h        | 21 ++++++++++-
>  drivers/gpu/drm/i915/gt/intel_engine_cs.c     | 11 ++++--
>  drivers/gpu/drm/i915/gt/intel_engine_types.h  |  2 ++
>  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 35 +++++++++----------
>  drivers/gpu/drm/i915/i915_gpu_error.c         | 25 ++++++++++---
>  7 files changed, 91 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
> index 2f01437056a8..3fe7794b2bfd 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context.c
> +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> @@ -514,6 +514,26 @@ struct i915_request *intel_context_create_request(struct intel_context *ce)
>  	return rq;
>  }
>  
> +struct i915_request *intel_context_find_active_request(struct intel_context *ce)
> +{
> +	struct i915_request *rq, *active = NULL;
> +	unsigned long flags;
> +
> +	GEM_BUG_ON(!intel_engine_uses_guc(ce->engine));
> +
> +	spin_lock_irqsave(&ce->guc_active.lock, flags);
> +	list_for_each_entry_reverse(rq, &ce->guc_active.requests,
> +				    sched.link) {
> +		if (i915_request_completed(rq))
> +			break;
> +
> +		active = rq;
> +	}
> +	spin_unlock_irqrestore(&ce->guc_active.lock, flags);
> +
> +	return active;
> +}
> +
>  #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
>  #include "selftest_context.c"
>  #endif
> diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
> index 9b211ca5ecc7..d2b499ed8a05 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context.h
> +++ b/drivers/gpu/drm/i915/gt/intel_context.h
> @@ -195,6 +195,9 @@ int intel_context_prepare_remote_request(struct intel_context *ce,
>  
>  struct i915_request *intel_context_create_request(struct intel_context *ce);
>  
> +struct i915_request *
> +intel_context_find_active_request(struct intel_context *ce);
> +
>  static inline struct intel_ring *__intel_context_ring_size(u64 sz)
>  {
>  	return u64_to_ptr(struct intel_ring, sz);
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
> index 3321d0917a99..bb94963a9fa2 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine.h
> @@ -242,7 +242,7 @@ ktime_t intel_engine_get_busy_time(struct intel_engine_cs *engine,
>  				   ktime_t *now);
>  
>  struct i915_request *
> -intel_engine_find_active_request(struct intel_engine_cs *engine);
> +intel_engine_execlist_find_hung_request(struct intel_engine_cs *engine);
>  
>  u32 intel_engine_context_size(struct intel_gt *gt, u8 class);
>  
> @@ -316,4 +316,23 @@ intel_engine_get_sibling(struct intel_engine_cs *engine, unsigned int sibling)
>  	return engine->cops->get_sibling(engine, sibling);
>  }
>  
> +static inline void
> +intel_engine_set_hung_context(struct intel_engine_cs *engine,
> +			      struct intel_context *ce)
> +{
> +	engine->hung_ce = ce;
> +}
> +
> +static inline void
> +intel_engine_clear_hung_context(struct intel_engine_cs *engine)
> +{
> +	intel_engine_set_hung_context(engine, NULL);
> +}
> +
> +static inline struct intel_context *
> +intel_engine_get_hung_context(struct intel_engine_cs *engine)
> +{
> +	return engine->hung_ce;
> +}
> +
>  #endif /* _INTEL_RINGBUFFER_H_ */
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> index 10300db1c9a6..ad3987289f09 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> @@ -1727,7 +1727,7 @@ void intel_engine_dump(struct intel_engine_cs *engine,
>  	drm_printf(m, "\tRequests:\n");
>  
>  	spin_lock_irqsave(&engine->sched_engine->lock, flags);
> -	rq = intel_engine_find_active_request(engine);
> +	rq = intel_engine_execlist_find_hung_request(engine);
>  	if (rq) {
>  		struct intel_timeline *tl = get_timeline(rq);
>  
> @@ -1838,10 +1838,17 @@ static bool match_ring(struct i915_request *rq)
>  }
>  
>  struct i915_request *
> -intel_engine_find_active_request(struct intel_engine_cs *engine)
> +intel_engine_execlist_find_hung_request(struct intel_engine_cs *engine)
>  {
>  	struct i915_request *request, *active = NULL;
>  
> +	/*
> +	 * This search does not work in GuC submission mode. However, the GuC
> +	 * will report the hanging context directly to the driver itself. So
> +	 * the driver should never get here when in GuC mode.
> +	 */
> +	GEM_BUG_ON(intel_uc_uses_guc_submission(&engine->gt->uc));
> +
>  	/*
>  	 * We are called by the error capture, reset and to dump engine
>  	 * state at random points in time. In particular, note that neither is
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> index b84562b2708b..bba53e3b39b9 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> @@ -304,6 +304,8 @@ struct intel_engine_cs {
>  	/* keep a request in reserve for a [pm] barrier under oom */
>  	struct i915_request *request_pool;
>  
> +	struct intel_context *hung_ce;
> +
>  	struct llist_head barrier_tasks;
>  
>  	struct intel_context *kernel_context; /* pinned */
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 22f17a055b21..6b3b74e50b31 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -726,24 +726,6 @@ __unwind_incomplete_requests(struct intel_context *ce)
>  	spin_unlock_irqrestore(&sched_engine->lock, flags);
>  }
>  
> -static struct i915_request *context_find_active_request(struct intel_context *ce)
> -{
> -	struct i915_request *rq, *active = NULL;
> -	unsigned long flags;
> -
> -	spin_lock_irqsave(&ce->guc_active.lock, flags);
> -	list_for_each_entry_reverse(rq, &ce->guc_active.requests,
> -				    sched.link) {
> -		if (i915_request_completed(rq))
> -			break;
> -
> -		active = rq;
> -	}
> -	spin_unlock_irqrestore(&ce->guc_active.lock, flags);
> -
> -	return active;
> -}
> -
>  static void __guc_reset_context(struct intel_context *ce, bool stalled)
>  {
>  	struct i915_request *rq;
> @@ -757,7 +739,7 @@ static void __guc_reset_context(struct intel_context *ce, bool stalled)
>  	 */
>  	clr_context_enabled(ce);
>  
> -	rq = context_find_active_request(ce);
> +	rq = intel_context_find_active_request(ce);
>  	if (!rq) {
>  		head = ce->ring->tail;
>  		stalled = false;
> @@ -2192,6 +2174,20 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
>  	return 0;
>  }
>  
> +static void capture_error_state(struct intel_guc *guc,
> +				struct intel_context *ce)
> +{
> +	struct intel_gt *gt = guc_to_gt(guc);
> +	struct drm_i915_private *i915 = gt->i915;
> +	struct intel_engine_cs *engine = __context_to_physical_engine(ce);
> +	intel_wakeref_t wakeref;
> +
> +	intel_engine_set_hung_context(engine, ce);
> +	with_intel_runtime_pm(&i915->runtime_pm, wakeref)
> +		i915_capture_error_state(gt, engine->mask);
> +	atomic_inc(&i915->gpu_error.reset_engine_count[engine->uabi_class]);
> +}
> +
>  static void guc_context_replay(struct intel_context *ce)
>  {
>  	struct i915_sched_engine *sched_engine = ce->engine->sched_engine;
> @@ -2204,6 +2200,7 @@ static void guc_handle_context_reset(struct intel_guc *guc,
>  				     struct intel_context *ce)
>  {
>  	trace_intel_context_reset(ce);
> +	capture_error_state(guc, ce);
>  	guc_context_replay(ce);
>  }
>  
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 3352f56bcf63..825bdfe44225 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -1435,20 +1435,37 @@ capture_engine(struct intel_engine_cs *engine,
>  {
>  	struct intel_engine_capture_vma *capture = NULL;
>  	struct intel_engine_coredump *ee;
> -	struct i915_request *rq;
> +	struct intel_context *ce;
> +	struct i915_request *rq = NULL;
>  	unsigned long flags;
>  
>  	ee = intel_engine_coredump_alloc(engine, GFP_KERNEL);
>  	if (!ee)
>  		return NULL;
>  
> -	spin_lock_irqsave(&engine->sched_engine->lock, flags);
> -	rq = intel_engine_find_active_request(engine);
> +	ce = intel_engine_get_hung_context(engine);
> +	if (ce) {
> +		intel_engine_clear_hung_context(engine);
> +		rq = intel_context_find_active_request(ce);
> +		if (!rq || !i915_request_started(rq))
> +			goto no_request_capture;
> +	} else {
> +		/*
> +		 * Getting here with GuC enabled means it is a forced error capture
> +		 * with no actual hang. So, no need to attempt the execlist search.
> +		 */
> +		if (!intel_uc_uses_guc_submission(&engine->gt->uc)) {
> +			spin_lock_irqsave(&engine->sched_engine->lock, flags);
> +			rq = intel_engine_execlist_find_hung_request(engine);
> +			spin_unlock_irqrestore(&engine->sched_engine->lock,
> +					       flags);
> +		}
> +	}
>  	if (rq)
>  		capture = intel_engine_coredump_add_request(ee, rq,
>  							    ATOMIC_MAYFAIL);
> -	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
>  	if (!capture) {
> +no_request_capture:
>  		kfree(ee);
>  		return NULL;
>  	}
> -- 
> 2.28.0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 49/97] drm/i915/guc: Disable engine barriers with GuC during unpin
  2021-05-11 15:37   ` Daniel Vetter
@ 2021-05-11 16:31     ` Matthew Brost
  0 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-11 16:31 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: tvrtko.ursulin, intel-gfx, dri-devel, jason.ekstrand,
	daniele.ceraolospurio, jon.bloomfield, daniel.vetter,
	john.c.harrison

On Tue, May 11, 2021 at 05:37:54PM +0200, Daniel Vetter wrote:
> On Thu, May 06, 2021 at 12:14:03PM -0700, Matthew Brost wrote:
> > Disable engine barriers for unpinning with GuC. This feature isn't
> > needed with the GuC as it disables context scheduling before unpinning
> > which guarantees the HW will not reference the context. Hence it is
> > not necessary to defer unpinning until a kernel context request
> > completes on each engine in the context engine mask.
> > 
> > Cc: John Harrison <john.c.harrison@intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> 
> Instead of these ifs in the code, can we push this barrier business down
> into backends?
> 

Not a bad idea. This is an example of what I think of implict behavior of the
backend creeping into the higher levels.

> Not in this series, but as one of the things to sort out as part of the
> conversion to drm/scheduler.

Agree. After basic GuC submission gets merged maybe we go through the code and
remove all the implict backend assumptions.

Matt

> -Daniel
> 
> > ---
> >  drivers/gpu/drm/i915/gt/intel_context.c    |  2 +-
> >  drivers/gpu/drm/i915/gt/intel_context.h    |  1 +
> >  drivers/gpu/drm/i915/gt/selftest_context.c | 10 ++++++++++
> >  drivers/gpu/drm/i915/i915_active.c         |  3 +++
> >  4 files changed, 15 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
> > index 1499b8aace2a..7f97753ab164 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_context.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> > @@ -80,7 +80,7 @@ static int intel_context_active_acquire(struct intel_context *ce)
> >  
> >  	__i915_active_acquire(&ce->active);
> >  
> > -	if (intel_context_is_barrier(ce))
> > +	if (intel_context_is_barrier(ce) || intel_engine_uses_guc(ce->engine))
> >  		return 0;
> >  
> >  	/* Preallocate tracking nodes */
> > diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
> > index 92ecbab8c1cd..9b211ca5ecc7 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_context.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_context.h
> > @@ -16,6 +16,7 @@
> >  #include "intel_engine_types.h"
> >  #include "intel_ring_types.h"
> >  #include "intel_timeline_types.h"
> > +#include "uc/intel_guc_submission.h"
> >  
> >  #define CE_TRACE(ce, fmt, ...) do {					\
> >  	const struct intel_context *ce__ = (ce);			\
> > diff --git a/drivers/gpu/drm/i915/gt/selftest_context.c b/drivers/gpu/drm/i915/gt/selftest_context.c
> > index 26685b927169..fa7b99a671dd 100644
> > --- a/drivers/gpu/drm/i915/gt/selftest_context.c
> > +++ b/drivers/gpu/drm/i915/gt/selftest_context.c
> > @@ -209,7 +209,13 @@ static int __live_active_context(struct intel_engine_cs *engine)
> >  	 * This test makes sure that the context is kept alive until a
> >  	 * subsequent idle-barrier (emitted when the engine wakeref hits 0
> >  	 * with no more outstanding requests).
> > +	 *
> > +	 * In GuC submission mode we don't use idle barriers and we instead
> > +	 * get a message from the GuC to signal that it is safe to unpin the
> > +	 * context from memory.
> >  	 */
> > +	if (intel_engine_uses_guc(engine))
> > +		return 0;
> >  
> >  	if (intel_engine_pm_is_awake(engine)) {
> >  		pr_err("%s is awake before starting %s!\n",
> > @@ -357,7 +363,11 @@ static int __live_remote_context(struct intel_engine_cs *engine)
> >  	 * on the context image remotely (intel_context_prepare_remote_request),
> >  	 * which inserts foreign fences into intel_context.active, does not
> >  	 * clobber the idle-barrier.
> > +	 *
> > +	 * In GuC submission mode we don't use idle barriers.
> >  	 */
> > +	if (intel_engine_uses_guc(engine))
> > +		return 0;
> >  
> >  	if (intel_engine_pm_is_awake(engine)) {
> >  		pr_err("%s is awake before starting %s!\n",
> > diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
> > index b1aa1c482c32..9a264898bb91 100644
> > --- a/drivers/gpu/drm/i915/i915_active.c
> > +++ b/drivers/gpu/drm/i915/i915_active.c
> > @@ -968,6 +968,9 @@ void i915_active_acquire_barrier(struct i915_active *ref)
> >  
> >  	GEM_BUG_ON(i915_active_is_idle(ref));
> >  
> > +	if (llist_empty(&ref->preallocated_barriers))
> > +		return;
> > +
> >  	/*
> >  	 * Transfer the list of preallocated barriers into the
> >  	 * i915_active rbtree, but only as proto-nodes. They will be
> > -- 
> > 2.28.0
> > 
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 00/97] Basic GuC submission support in the i915
  2021-05-11 15:26           ` Bloomfield, Jon
@ 2021-05-11 16:39             ` Matthew Brost
  2021-05-12  6:26               ` Martin Peres
  0 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-11 16:39 UTC (permalink / raw)
  To: Bloomfield, Jon
  Cc: Ursulin, Tvrtko, intel-gfx, dri-devel, Ekstrand, Jason,
	Ceraolo Spurio, Daniele, Jason Ekstrand, Vetter, Daniel,
	Harrison, John C

On Tue, May 11, 2021 at 08:26:59AM -0700, Bloomfield, Jon wrote:
> > -----Original Message-----
> > From: Martin Peres <martin.peres@free.fr>
> > Sent: Tuesday, May 11, 2021 1:06 AM
> > To: Daniel Vetter <daniel@ffwll.ch>
> > Cc: Jason Ekstrand <jason@jlekstrand.net>; Brost, Matthew
> > <matthew.brost@intel.com>; intel-gfx <intel-gfx@lists.freedesktop.org>;
> > dri-devel <dri-devel@lists.freedesktop.org>; Ursulin, Tvrtko
> > <tvrtko.ursulin@intel.com>; Ekstrand, Jason <jason.ekstrand@intel.com>;
> > Ceraolo Spurio, Daniele <daniele.ceraolospurio@intel.com>; Bloomfield, Jon
> > <jon.bloomfield@intel.com>; Vetter, Daniel <daniel.vetter@intel.com>;
> > Harrison, John C <john.c.harrison@intel.com>
> > Subject: Re: [RFC PATCH 00/97] Basic GuC submission support in the i915
> > 
> > On 10/05/2021 19:33, Daniel Vetter wrote:
> > > On Mon, May 10, 2021 at 3:55 PM Martin Peres <martin.peres@free.fr>
> > wrote:
> > >>
> > >> On 10/05/2021 02:11, Jason Ekstrand wrote:
> > >>> On May 9, 2021 12:12:36 Martin Peres <martin.peres@free.fr> wrote:
> > >>>
> > >>>> Hi,
> > >>>>
> > >>>> On 06/05/2021 22:13, Matthew Brost wrote:
> > >>>>> Basic GuC submission support. This is the first bullet point in the
> > >>>>> upstreaming plan covered in the following RFC [1].
> > >>>>>
> > >>>>> At a very high level the GuC is a piece of firmware which sits between
> > >>>>> the i915 and the GPU. It offloads some of the scheduling of contexts
> > >>>>> from the i915 and programs the GPU to submit contexts. The i915
> > >>>>> communicates with the GuC and the GuC communicates with the
> > GPU.
> > >>>>
> > >>>> May I ask what will GuC command submission do that execlist
> > won't/can't
> > >>>> do? And what would be the impact on users? Even forgetting the
> > troubled
> > >>>> history of GuC (instability, performance regression, poor level of user
> > >>>> support, 6+ years of trying to upstream it...), adding this much code
> > >>>> and doubling the amount of validation needed should come with a
> > >>>> rationale making it feel worth it... and I am not seeing here. Would you
> > >>>> mind providing the rationale behind this work?
> > >>>>
> > >>>>>
> > >>>>> GuC submission will be disabled by default on all current upstream
> > >>>>> platforms behind a module parameter - enable_guc. A value of 3 will
> > >>>>> enable submission and HuC loading via the GuC. GuC submission
> > should
> > >>>>> work on all gen11+ platforms assuming the GuC firmware is present.
> > >>>>
> > >>>> What is the plan here when it comes to keeping support for execlist? I
> > >>>> am afraid that landing GuC support in Linux is the first step towards
> > >>>> killing the execlist, which would force users to use proprietary
> > >>>> firmwares that even most Intel engineers have little influence over.
> > >>>> Indeed, if "drm/i915/guc: Disable semaphores when using GuC
> > scheduling"
> > >>>> which states "Disable semaphores when using GuC scheduling as
> > semaphores
> > >>>> are broken in the current GuC firmware." is anything to go by, it means
> > >>>> that even Intel developers seem to prefer working around the GuC
> > >>>> firmware, rather than fixing it.
> > >>>
> > >>> Yes, landing GuC support may be the first step in removing execlist
> > >>> support. The inevitable reality is that GPU scheduling is coming and
> > >>> likely to be the only path in the not-too-distant future. (See also
> > >>> the ongoing thread with AMD about fences.) I'm not going to pass
> > >>> judgement on whether or not this is a good thing.  I'm just reading the
> > >>> winds and, in my view, this is where things are headed for good or ill.
> > >>>
> > >>> In answer to the question above, the answer to "what do we gain from
> > >>> GuC?" may soon be, "you get to use your GPU."  We're not there yet
> > and,
> > >>> again, I'm not necessarily advocating for it, but that is likely where
> > >>> things are headed.
> > >>
> > >> This will be a sad day, especially since it seems fundamentally opposed
> > >> to any long-term support, on top of taking away user freedom to
> > >> fix/tweak their system when Intel won't.
> > >>
> > >>> A firmware-based submission model isn't a bad design IMO and, aside
> > from
> > >>> the firmware freedom issues, I think there are actual advantages to the
> > >>> model. Immediately, it'll unlock a few features like parallel submission
> > >>> (more on that in a bit) and long-running compute because they're
> > >>> implemented in GuC and the work to implement them properly in the
> > >>> execlist scheduler is highly non-trivial. Longer term, it may (no
> > >>> guarantees) unlock some performance by getting the kernel out of the
> > way.
> > >>
> > >> Oh, I definitely agree with firmware-based submission model not being a
> > >> bad design. I was even cheering for it in 2015. Experience with it made
> > >> me regret that deeply since :s
> > >>
> > >> But with the DRM scheduler being responsible for most things, I fail to
> > >> see what we could offload in the GuC except context switching (like
> > >> every other manufacturer). The problem is, the GuC does way more than
> > >> just switching registers in bulk, and if the number of revisions of the
> > >> GuC is anything to go by, it is way too complex for me to feel
> > >> comfortable with it.
> > >
> > > We need to flesh out that part of the plan more, but we're not going
> > > to use drm scheduler for everything. It's only to handle the dma-fence
> > > legacy side of things, which means:
> > > - timeout handling for batches that take too long
> > > - dma_fence dependency sorting/handling
> > > - boosting of context from display flips (currently missing, needs to
> > > be ported from drm/i915)
> > >
> > > The actual round-robin/preempt/priority handling is still left to the
> > > backend, in this case here the fw. So there are large chunks of
> > > code/functionality where drm/scheduler won't be involved, and like
> > > Jason says: The hw direction winds definitely blow in the direction
> > > that this is all handled in hw.
> > 
> > The plan makes sense for an SRIOV-enabled GPU, yes.
> > 
> > However, if the GuC is actually helping i915, then why not open source
> > it and drop all the issues related to its stability? Wouldn't it be the
> > perfect solution, as it would allow dropping execlist support for newer
> > HW, and it would eliminate the concerns about maintenance of stable
> > releases of Linux?
> 
> That the major version of the FW is high is not due to bugs - Bugs don't trigger major version bumps anyway. Only interface changes increment the major version, and we do add features, to keep it relevant to the evolving hardware and OS landscape. When only Windows used GuC there was no reason not to minimize interface creep - GuC and KMD are released as an atomic bundle on Windows. With Linux, this is no longer the case, and has not been for some time.
> 

Jon hit the nail on the head here - there hasn't been any reason not to bump
the GuC version / change the interface until there is code upstream using the
GuC. Once we push something upstream, that totally changes. Once SRIOV lands we
literally can't change the interface without breaking the world. Our goal is to
get this right before something lands, hence the high version number.

Matt

> We have been using GuC as the sole mechanism for submission on Windows since Gen8, and it has proven very reliable. This is in large part because it is simple, and designed from day 1 as a cohesive solution alongside the hardware.
> 
> Will there be bugs in the future? Of course. It's a new i915 backend. There are bugs in the execlist backend too, and the runlist backend, and the majority of real-world software ever written. But the i915 GuC backend is way simpler than execlist, much easier to understand, and therefore much easier to maintain. It's a net win for i915 and Linux.
> 
> Jon

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 43/97] drm/i915/guc: Add lrc descriptor context lookup array
  2021-05-11 15:26   ` Daniel Vetter
@ 2021-05-11 17:01     ` Matthew Brost
  2021-05-11 17:43       ` Daniel Vetter
  0 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-11 17:01 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: tvrtko.ursulin, intel-gfx, dri-devel, jason.ekstrand,
	daniele.ceraolospurio, jon.bloomfield, daniel.vetter,
	john.c.harrison

On Tue, May 11, 2021 at 05:26:34PM +0200, Daniel Vetter wrote:
> On Thu, May 06, 2021 at 12:13:57PM -0700, Matthew Brost wrote:
> > Add lrc descriptor context lookup array which can resolve the
> > intel_context from the lrc descriptor index. In addition to lookup, it
> > can determine if the lrc descriptor context is currently registered with
> > the GuC by checking if an entry for a descriptor index is present.
> > Future patches in the series will make use of this array.
> > 
> > Cc: John Harrison <john.c.harrison@intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >  drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  5 +++
> >  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 32 +++++++++++++++++--
> >  2 files changed, 35 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > index d84f37afb9d8..2eb6c497e43c 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > @@ -6,6 +6,8 @@
> >  #ifndef _INTEL_GUC_H_
> >  #define _INTEL_GUC_H_
> >  
> > +#include "linux/xarray.h"
> > +
> >  #include "intel_uncore.h"
> >  #include "intel_guc_fw.h"
> >  #include "intel_guc_fwif.h"
> > @@ -47,6 +49,9 @@ struct intel_guc {
> >  	struct i915_vma *lrc_desc_pool;
> >  	void *lrc_desc_pool_vaddr;
> >  
> > +	/* guc_id to intel_context lookup */
> > +	struct xarray context_lookup;
> 
> The current code sets a disastrous example, but for stuff like this it's
> always good to explain the locking, and who's holding references and how
> you're handling cycles. Since I guess the intel_context also holds the
> guc_id alive somehow.
> 

I think (?) I know what you mean by this comment. How about adding:

'If an entry in context_lookup is present, that means a context associated
with the guc_id is registered with the GuC. We use this xarray as a lookup
mechanism when the GuC communicates with the i915 about the context.'
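
Or, fleshed out a bit as actual member documentation - a rough sketch only,
the exact wording and which locking rule we point at are placeholders:

/**
 * @context_lookup: guc_id -> intel_context. An entry is only present while
 * the context is registered with the GuC, and is used to resolve the
 * context when handling G2H messages for that guc_id. Writers use the
 * xarray's internal lock (xa_*_irq variants); see reset_lrc_desc() and
 * set_lrc_desc_registered().
 */
struct xarray context_lookup;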

> Again holds for the entire series, where it makes sense (as in we don't
> expect to rewrite the entire code anyway).

Slightly out of order but one of the last patches in the series, 'Update GuC
documentation' adds a big section of comments that attempts to clarify how all
of this code works. I likely should add a section explaining the data structures
as well.

Matt

> -Daniel
> 
> > +
> >  	/* Control params for fw initialization */
> >  	u32 params[GUC_CTL_MAX_DWORDS];
> >  
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index 6acc1ef34f92..c2b6d27404b7 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -65,8 +65,6 @@ static inline struct i915_priolist *to_priolist(struct rb_node *rb)
> >  	return rb_entry(rb, struct i915_priolist, node);
> >  }
> >  
> > -/* Future patches will use this function */
> > -__attribute__ ((unused))
> >  static struct guc_lrc_desc *__get_lrc_desc(struct intel_guc *guc, u32 index)
> >  {
> >  	struct guc_lrc_desc *base = guc->lrc_desc_pool_vaddr;
> > @@ -76,6 +74,15 @@ static struct guc_lrc_desc *__get_lrc_desc(struct intel_guc *guc, u32 index)
> >  	return &base[index];
> >  }
> >  
> > +static inline struct intel_context *__get_context(struct intel_guc *guc, u32 id)
> > +{
> > +	struct intel_context *ce = xa_load(&guc->context_lookup, id);
> > +
> > +	GEM_BUG_ON(id >= GUC_MAX_LRC_DESCRIPTORS);
> > +
> > +	return ce;
> > +}
> > +
> >  static int guc_lrc_desc_pool_create(struct intel_guc *guc)
> >  {
> >  	u32 size;
> > @@ -96,6 +103,25 @@ static void guc_lrc_desc_pool_destroy(struct intel_guc *guc)
> >  	i915_vma_unpin_and_release(&guc->lrc_desc_pool, I915_VMA_RELEASE_MAP);
> >  }
> >  
> > +static inline void reset_lrc_desc(struct intel_guc *guc, u32 id)
> > +{
> > +	struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
> > +
> > +	memset(desc, 0, sizeof(*desc));
> > +	xa_erase_irq(&guc->context_lookup, id);
> > +}
> > +
> > +static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
> > +{
> > +	return __get_context(guc, id);
> > +}
> > +
> > +static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
> > +					   struct intel_context *ce)
> > +{
> > +	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
> > +}
> > +
> >  static void guc_add_request(struct intel_guc *guc, struct i915_request *rq)
> >  {
> >  	/* Leaving stub as this function will be used in future patches */
> > @@ -404,6 +430,8 @@ int intel_guc_submission_init(struct intel_guc *guc)
> >  	 */
> >  	GEM_BUG_ON(!guc->lrc_desc_pool);
> >  
> > +	xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ);
> > +
> >  	return 0;
> >  }
> >  
> > -- 
> > 2.28.0
> > 
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 74/97] drm/i915/guc: Capture error state on context reset
  2021-05-11 16:28   ` [Intel-gfx] " Daniel Vetter
@ 2021-05-11 17:12     ` Matthew Brost
  2021-05-11 17:45       ` Daniel Vetter
  0 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-11 17:12 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: jason.ekstrand, daniel.vetter, intel-gfx, dri-devel

On Tue, May 11, 2021 at 06:28:25PM +0200, Daniel Vetter wrote:
> On Thu, May 06, 2021 at 12:14:28PM -0700, Matthew Brost wrote:
> > We receive notification of an engine reset from GuC at its
> > completion. Meaning GuC has potentially cleared any HW state
> > we may have been interested in capturing. GuC resumes scheduling
> > on the engine post-reset, as the resets are meant to be transparent,
> > further muddling our error state.
> > 
> > There is ongoing work to define an API for a GuC debug state dump. The
> > suggestion for now is to manually disable FW initiated resets in cases
> > where debug state is needed.
> > 
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> 
> This looks a bit backwards to me:
> 

Definitely a bit hacky, but this patch does its best to capture what error
state it can.

> - I figured we should capture error state when we get the G2H, in which
>   case I hope we do know which offending context got shot.
>

We know which context was shot based on the G2H. See 'hung_ce' in this patch.

> - For now we're missing the hw state, but we should still be able to
>   capture the buffers userspace wants us to capture. So that could be
>   wired up already?

Which buffers exactly? We dump all buffers associated with the context. 

> 
> But yeah register state capturing needs support from GuC fw.
>
> I think this is a big enough miss in GuC features that we should list it
> on the rfc as a thing to fix.

Agree this needs to be fixed.

Matt

> -Daniel
> 
> > ---
> >  drivers/gpu/drm/i915/gt/intel_context.c       | 20 +++++++++++
> >  drivers/gpu/drm/i915/gt/intel_context.h       |  3 ++
> >  drivers/gpu/drm/i915/gt/intel_engine.h        | 21 ++++++++++-
> >  drivers/gpu/drm/i915/gt/intel_engine_cs.c     | 11 ++++--
> >  drivers/gpu/drm/i915/gt/intel_engine_types.h  |  2 ++
> >  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 35 +++++++++----------
> >  drivers/gpu/drm/i915/i915_gpu_error.c         | 25 ++++++++++---
> >  7 files changed, 91 insertions(+), 26 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
> > index 2f01437056a8..3fe7794b2bfd 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_context.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> > @@ -514,6 +514,26 @@ struct i915_request *intel_context_create_request(struct intel_context *ce)
> >  	return rq;
> >  }
> >  
> > +struct i915_request *intel_context_find_active_request(struct intel_context *ce)
> > +{
> > +	struct i915_request *rq, *active = NULL;
> > +	unsigned long flags;
> > +
> > +	GEM_BUG_ON(!intel_engine_uses_guc(ce->engine));
> > +
> > +	spin_lock_irqsave(&ce->guc_active.lock, flags);
> > +	list_for_each_entry_reverse(rq, &ce->guc_active.requests,
> > +				    sched.link) {
> > +		if (i915_request_completed(rq))
> > +			break;
> > +
> > +		active = rq;
> > +	}
> > +	spin_unlock_irqrestore(&ce->guc_active.lock, flags);
> > +
> > +	return active;
> > +}
> > +
> >  #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
> >  #include "selftest_context.c"
> >  #endif
> > diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
> > index 9b211ca5ecc7..d2b499ed8a05 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_context.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_context.h
> > @@ -195,6 +195,9 @@ int intel_context_prepare_remote_request(struct intel_context *ce,
> >  
> >  struct i915_request *intel_context_create_request(struct intel_context *ce);
> >  
> > +struct i915_request *
> > +intel_context_find_active_request(struct intel_context *ce);
> > +
> >  static inline struct intel_ring *__intel_context_ring_size(u64 sz)
> >  {
> >  	return u64_to_ptr(struct intel_ring, sz);
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
> > index 3321d0917a99..bb94963a9fa2 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine.h
> > @@ -242,7 +242,7 @@ ktime_t intel_engine_get_busy_time(struct intel_engine_cs *engine,
> >  				   ktime_t *now);
> >  
> >  struct i915_request *
> > -intel_engine_find_active_request(struct intel_engine_cs *engine);
> > +intel_engine_execlist_find_hung_request(struct intel_engine_cs *engine);
> >  
> >  u32 intel_engine_context_size(struct intel_gt *gt, u8 class);
> >  
> > @@ -316,4 +316,23 @@ intel_engine_get_sibling(struct intel_engine_cs *engine, unsigned int sibling)
> >  	return engine->cops->get_sibling(engine, sibling);
> >  }
> >  
> > +static inline void
> > +intel_engine_set_hung_context(struct intel_engine_cs *engine,
> > +			      struct intel_context *ce)
> > +{
> > +	engine->hung_ce = ce;
> > +}
> > +
> > +static inline void
> > +intel_engine_clear_hung_context(struct intel_engine_cs *engine)
> > +{
> > +	intel_engine_set_hung_context(engine, NULL);
> > +}
> > +
> > +static inline struct intel_context *
> > +intel_engine_get_hung_context(struct intel_engine_cs *engine)
> > +{
> > +	return engine->hung_ce;
> > +}
> > +
> >  #endif /* _INTEL_RINGBUFFER_H_ */
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > index 10300db1c9a6..ad3987289f09 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > @@ -1727,7 +1727,7 @@ void intel_engine_dump(struct intel_engine_cs *engine,
> >  	drm_printf(m, "\tRequests:\n");
> >  
> >  	spin_lock_irqsave(&engine->sched_engine->lock, flags);
> > -	rq = intel_engine_find_active_request(engine);
> > +	rq = intel_engine_execlist_find_hung_request(engine);
> >  	if (rq) {
> >  		struct intel_timeline *tl = get_timeline(rq);
> >  
> > @@ -1838,10 +1838,17 @@ static bool match_ring(struct i915_request *rq)
> >  }
> >  
> >  struct i915_request *
> > -intel_engine_find_active_request(struct intel_engine_cs *engine)
> > +intel_engine_execlist_find_hung_request(struct intel_engine_cs *engine)
> >  {
> >  	struct i915_request *request, *active = NULL;
> >  
> > +	/*
> > +	 * This search does not work in GuC submission mode. However, the GuC
> > +	 * will report the hanging context directly to the driver itself. So
> > +	 * the driver should never get here when in GuC mode.
> > +	 */
> > +	GEM_BUG_ON(intel_uc_uses_guc_submission(&engine->gt->uc));
> > +
> >  	/*
> >  	 * We are called by the error capture, reset and to dump engine
> >  	 * state at random points in time. In particular, note that neither is
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > index b84562b2708b..bba53e3b39b9 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > @@ -304,6 +304,8 @@ struct intel_engine_cs {
> >  	/* keep a request in reserve for a [pm] barrier under oom */
> >  	struct i915_request *request_pool;
> >  
> > +	struct intel_context *hung_ce;
> > +
> >  	struct llist_head barrier_tasks;
> >  
> >  	struct intel_context *kernel_context; /* pinned */
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index 22f17a055b21..6b3b74e50b31 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -726,24 +726,6 @@ __unwind_incomplete_requests(struct intel_context *ce)
> >  	spin_unlock_irqrestore(&sched_engine->lock, flags);
> >  }
> >  
> > -static struct i915_request *context_find_active_request(struct intel_context *ce)
> > -{
> > -	struct i915_request *rq, *active = NULL;
> > -	unsigned long flags;
> > -
> > -	spin_lock_irqsave(&ce->guc_active.lock, flags);
> > -	list_for_each_entry_reverse(rq, &ce->guc_active.requests,
> > -				    sched.link) {
> > -		if (i915_request_completed(rq))
> > -			break;
> > -
> > -		active = rq;
> > -	}
> > -	spin_unlock_irqrestore(&ce->guc_active.lock, flags);
> > -
> > -	return active;
> > -}
> > -
> >  static void __guc_reset_context(struct intel_context *ce, bool stalled)
> >  {
> >  	struct i915_request *rq;
> > @@ -757,7 +739,7 @@ static void __guc_reset_context(struct intel_context *ce, bool stalled)
> >  	 */
> >  	clr_context_enabled(ce);
> >  
> > -	rq = context_find_active_request(ce);
> > +	rq = intel_context_find_active_request(ce);
> >  	if (!rq) {
> >  		head = ce->ring->tail;
> >  		stalled = false;
> > @@ -2192,6 +2174,20 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
> >  	return 0;
> >  }
> >  
> > +static void capture_error_state(struct intel_guc *guc,
> > +				struct intel_context *ce)
> > +{
> > +	struct intel_gt *gt = guc_to_gt(guc);
> > +	struct drm_i915_private *i915 = gt->i915;
> > +	struct intel_engine_cs *engine = __context_to_physical_engine(ce);
> > +	intel_wakeref_t wakeref;
> > +
> > +	intel_engine_set_hung_context(engine, ce);
> > +	with_intel_runtime_pm(&i915->runtime_pm, wakeref)
> > +		i915_capture_error_state(gt, engine->mask);
> > +	atomic_inc(&i915->gpu_error.reset_engine_count[engine->uabi_class]);
> > +}
> > +
> >  static void guc_context_replay(struct intel_context *ce)
> >  {
> >  	struct i915_sched_engine *sched_engine = ce->engine->sched_engine;
> > @@ -2204,6 +2200,7 @@ static void guc_handle_context_reset(struct intel_guc *guc,
> >  				     struct intel_context *ce)
> >  {
> >  	trace_intel_context_reset(ce);
> > +	capture_error_state(guc, ce);
> >  	guc_context_replay(ce);
> >  }
> >  
> > diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> > index 3352f56bcf63..825bdfe44225 100644
> > --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> > +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> > @@ -1435,20 +1435,37 @@ capture_engine(struct intel_engine_cs *engine,
> >  {
> >  	struct intel_engine_capture_vma *capture = NULL;
> >  	struct intel_engine_coredump *ee;
> > -	struct i915_request *rq;
> > +	struct intel_context *ce;
> > +	struct i915_request *rq = NULL;
> >  	unsigned long flags;
> >  
> >  	ee = intel_engine_coredump_alloc(engine, GFP_KERNEL);
> >  	if (!ee)
> >  		return NULL;
> >  
> > -	spin_lock_irqsave(&engine->sched_engine->lock, flags);
> > -	rq = intel_engine_find_active_request(engine);
> > +	ce = intel_engine_get_hung_context(engine);
> > +	if (ce) {
> > +		intel_engine_clear_hung_context(engine);
> > +		rq = intel_context_find_active_request(ce);
> > +		if (!rq || !i915_request_started(rq))
> > +			goto no_request_capture;
> > +	} else {
> > +		/*
> > +		 * Getting here with GuC enabled means it is a forced error capture
> > +		 * with no actual hang. So, no need to attempt the execlist search.
> > +		 */
> > +		if (!intel_uc_uses_guc_submission(&engine->gt->uc)) {
> > +			spin_lock_irqsave(&engine->sched_engine->lock, flags);
> > +			rq = intel_engine_execlist_find_hung_request(engine);
> > +			spin_unlock_irqrestore(&engine->sched_engine->lock,
> > +					       flags);
> > +		}
> > +	}
> >  	if (rq)
> >  		capture = intel_engine_coredump_add_request(ee, rq,
> >  							    ATOMIC_MAYFAIL);
> > -	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
> >  	if (!capture) {
> > +no_request_capture:
> >  		kfree(ee);
> >  		return NULL;
> >  	}
> > -- 
> > 2.28.0
> > 
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 43/97] drm/i915/guc: Add lrc descriptor context lookup array
  2021-05-11 17:01     ` Matthew Brost
@ 2021-05-11 17:43       ` Daniel Vetter
  2021-05-11 19:34         ` Matthew Brost
  0 siblings, 1 reply; 249+ messages in thread
From: Daniel Vetter @ 2021-05-11 17:43 UTC (permalink / raw)
  To: Matthew Brost
  Cc: tvrtko.ursulin, intel-gfx, dri-devel, jason.ekstrand,
	daniele.ceraolospurio, jon.bloomfield, daniel.vetter,
	john.c.harrison

On Tue, May 11, 2021 at 10:01:28AM -0700, Matthew Brost wrote:
> On Tue, May 11, 2021 at 05:26:34PM +0200, Daniel Vetter wrote:
> > On Thu, May 06, 2021 at 12:13:57PM -0700, Matthew Brost wrote:
> > > Add lrc descriptor context lookup array which can resolve the
> > > intel_context from the lrc descriptor index. In addition to lookup, it
> > > can determine if the lrc descriptor context is currently registered with
> > > the GuC by checking if an entry for a descriptor index is present.
> > > Future patches in the series will make use of this array.
> > > 
> > > Cc: John Harrison <john.c.harrison@intel.com>
> > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > ---
> > >  drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  5 +++
> > >  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 32 +++++++++++++++++--
> > >  2 files changed, 35 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > > index d84f37afb9d8..2eb6c497e43c 100644
> > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > > @@ -6,6 +6,8 @@
> > >  #ifndef _INTEL_GUC_H_
> > >  #define _INTEL_GUC_H_
> > >  
> > > +#include "linux/xarray.h"
> > > +
> > >  #include "intel_uncore.h"
> > >  #include "intel_guc_fw.h"
> > >  #include "intel_guc_fwif.h"
> > > @@ -47,6 +49,9 @@ struct intel_guc {
> > >  	struct i915_vma *lrc_desc_pool;
> > >  	void *lrc_desc_pool_vaddr;
> > >  
> > > +	/* guc_id to intel_context lookup */
> > > +	struct xarray context_lookup;
> > 
> > The current code sets a disastrous example, but for stuff like this it's
> > always good to explain the locking, and who's holding references and how
> > you're handling cycles. Since I guess the intel_context also holds the
> > guc_id alive somehow.
> > 
> 
> I think (?) I know what you mean by this comment. How about adding:
> 
> 'If an entry in context_lookup is present, that means a context associated
> with the guc_id is registered with the GuC. We use this xarray as a lookup
> mechanism when the GuC communicates with the i915 about the context.'

So no idea how this works, but generally we put a "Protecte by
&struct.lock" or similar in here (so you get a nice link plus something
you can use as jump label in your ide too). Plus since intel_context has
some lifetime rules, explaining whether you're allowed to use the pointer
after you unlock, or whether you need to grab a reference or what exactly
is going on. Usually there's three options:

- No refcounting, you cannot access a pointer obtained through this after
  you unlock.
- Weak reference, you upgrade to a full reference with
  kref_get_unless_zero. If that fails it indicates a lookup failure, since
  you raced with destruction. If it succeeds you can use the pointer after
  unlock.
- Strong reference, you get your own reference that stays valid with
  kref_get().
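
For the weak-reference option, a minimal sketch of what the lookup side could
look like (the function name is made up; this assumes intel_context keeps its
kref and that the xarray is read under its own lock):

static struct intel_context *guc_context_lookup_get(struct intel_guc *guc, u32 id)
{
	struct intel_context *ce;

	xa_lock_irq(&guc->context_lookup);
	ce = xa_load(&guc->context_lookup, id);
	/* Raced with destruction? Then it is simply a lookup miss. */
	if (ce && !kref_get_unless_zero(&ce->ref))
		ce = NULL;
	xa_unlock_irq(&guc->context_lookup);

	/* On success the caller owns a reference and must intel_context_put() it. */
	return ce;
}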

I'm just bringing this up because the current i915-gem code is full of
very tricky locking and lifetime rules, and explains roughly nothing of it
in the data structures. Minimally some hints about the locking/lifetime
rules of important structs should be there.

For locking rules it's good to double-down on them by adding
lockdep_assert_held to all relevant functions (where appropriate only
ofc).
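
E.g. something along these lines in the helpers that touch the table (the
function and lock names here are purely placeholders):

static void guc_id_release(struct intel_guc *guc, u32 id)
{
	/* Document and enforce the locking rule in one place. */
	lockdep_assert_held(&guc->contexts_lock); /* hypothetical lock */

	xa_erase(&guc->context_lookup, id);
}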

What I generally don't think makes sense is to then also document the
locking in the kerneldoc for the functions. That tends to be one place too
many and ime just gets out of date and not useful at all.

> > Again holds for the entire series, where it makes sense (as in we don't
> > expect to rewrite the entire code anyway).
> 
> Slightly out of order but one of the last patches in the series, 'Update GuC
> documentation' adds a big section of comments that attempts to clarify how all
> of this code works. I likely should add a section explaining the data structures
> as well.

Yeah that would be nice.
-Daniel


> 
> Matt
> 
> > -Daniel
> > 
> > > +
> > >  	/* Control params for fw initialization */
> > >  	u32 params[GUC_CTL_MAX_DWORDS];
> > >  
> > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > index 6acc1ef34f92..c2b6d27404b7 100644
> > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > @@ -65,8 +65,6 @@ static inline struct i915_priolist *to_priolist(struct rb_node *rb)
> > >  	return rb_entry(rb, struct i915_priolist, node);
> > >  }
> > >  
> > > -/* Future patches will use this function */
> > > -__attribute__ ((unused))
> > >  static struct guc_lrc_desc *__get_lrc_desc(struct intel_guc *guc, u32 index)
> > >  {
> > >  	struct guc_lrc_desc *base = guc->lrc_desc_pool_vaddr;
> > > @@ -76,6 +74,15 @@ static struct guc_lrc_desc *__get_lrc_desc(struct intel_guc *guc, u32 index)
> > >  	return &base[index];
> > >  }
> > >  
> > > +static inline struct intel_context *__get_context(struct intel_guc *guc, u32 id)
> > > +{
> > > +	struct intel_context *ce = xa_load(&guc->context_lookup, id);
> > > +
> > > +	GEM_BUG_ON(id >= GUC_MAX_LRC_DESCRIPTORS);
> > > +
> > > +	return ce;
> > > +}
> > > +
> > >  static int guc_lrc_desc_pool_create(struct intel_guc *guc)
> > >  {
> > >  	u32 size;
> > > @@ -96,6 +103,25 @@ static void guc_lrc_desc_pool_destroy(struct intel_guc *guc)
> > >  	i915_vma_unpin_and_release(&guc->lrc_desc_pool, I915_VMA_RELEASE_MAP);
> > >  }
> > >  
> > > +static inline void reset_lrc_desc(struct intel_guc *guc, u32 id)
> > > +{
> > > +	struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
> > > +
> > > +	memset(desc, 0, sizeof(*desc));
> > > +	xa_erase_irq(&guc->context_lookup, id);
> > > +}
> > > +
> > > +static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
> > > +{
> > > +	return __get_context(guc, id);
> > > +}
> > > +
> > > +static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
> > > +					   struct intel_context *ce)
> > > +{
> > > +	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
> > > +}
> > > +
> > >  static void guc_add_request(struct intel_guc *guc, struct i915_request *rq)
> > >  {
> > >  	/* Leaving stub as this function will be used in future patches */
> > > @@ -404,6 +430,8 @@ int intel_guc_submission_init(struct intel_guc *guc)
> > >  	 */
> > >  	GEM_BUG_ON(!guc->lrc_desc_pool);
> > >  
> > > +	xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ);
> > > +
> > >  	return 0;
> > >  }
> > >  
> > > -- 
> > > 2.28.0
> > > 
> > 
> > -- 
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 74/97] drm/i915/guc: Capture error state on context reset
  2021-05-11 17:12     ` Matthew Brost
@ 2021-05-11 17:45       ` Daniel Vetter
  0 siblings, 0 replies; 249+ messages in thread
From: Daniel Vetter @ 2021-05-11 17:45 UTC (permalink / raw)
  To: Matthew Brost; +Cc: jason.ekstrand, daniel.vetter, intel-gfx, dri-devel

On Tue, May 11, 2021 at 10:12:32AM -0700, Matthew Brost wrote:
> On Tue, May 11, 2021 at 06:28:25PM +0200, Daniel Vetter wrote:
> > On Thu, May 06, 2021 at 12:14:28PM -0700, Matthew Brost wrote:
> > > We receive notification of an engine reset from GuC at its
> > > completion. Meaning GuC has potentially cleared any HW state
> > > we may have been interested in capturing. GuC resumes scheduling
> > > on the engine post-reset, as the resets are meant to be transparent,
> > > further muddling our error state.
> > > 
> > > There is ongoing work to define an API for a GuC debug state dump. The
> > > suggestion for now is to manually disable FW initiated resets in cases
> > > where debug state is needed.
> > > 
> > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > 
> > This looks a bit backwards to me:
> > 
> 
> Definitely a bit hacky, but this patch does its best to capture what error
> state it can.
> 
> > - I figured we should capture error state when we get the G2H, in which
> >   case I hope we do know which offending context got shot.
> >
> 
> We know which context was shot based on the G2H. See 'hung_ce' in this patch.

Ah, maybe I should read more. Would be good to have comments on how the
locking works here, especially around reset, where it tends to be tricky.
Comments in the data structs/members would help.

> 
> > - For now we're missing the hw state, but we should still be able to
> >   capture the buffers userspace wants us to capture. So that could be
> >   wired up already?
> 
> Which buffers exactly? We dump all buffers associated with the context. 

There's an opt-in list that userspace can set in execbuf. Maybe that's the
one you mean.
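
For reference, I mean roughly the per-BO capture flag userspace can set on the
execbuf objects (userspace side, sketch only):

struct drm_i915_gem_exec_object2 obj = {
	.handle = bo_handle,           /* some BO the app wants in the dump */
	.flags  = EXEC_OBJECT_CAPTURE, /* ask the kernel to snapshot it on hang */
};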
-Daniel

> 
> > 
> > But yeah register state capturing needs support from GuC fw.
> >
> > I think this is a big enough miss in GuC features that we should list it
> > on the rfc as a thing to fix.
> 
> Agree this needs to be fixed.
> 
> Matt
> 
> > -Daniel
> > 
> > > ---
> > >  drivers/gpu/drm/i915/gt/intel_context.c       | 20 +++++++++++
> > >  drivers/gpu/drm/i915/gt/intel_context.h       |  3 ++
> > >  drivers/gpu/drm/i915/gt/intel_engine.h        | 21 ++++++++++-
> > >  drivers/gpu/drm/i915/gt/intel_engine_cs.c     | 11 ++++--
> > >  drivers/gpu/drm/i915/gt/intel_engine_types.h  |  2 ++
> > >  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 35 +++++++++----------
> > >  drivers/gpu/drm/i915/i915_gpu_error.c         | 25 ++++++++++---
> > >  7 files changed, 91 insertions(+), 26 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
> > > index 2f01437056a8..3fe7794b2bfd 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_context.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> > > @@ -514,6 +514,26 @@ struct i915_request *intel_context_create_request(struct intel_context *ce)
> > >  	return rq;
> > >  }
> > >  
> > > +struct i915_request *intel_context_find_active_request(struct intel_context *ce)
> > > +{
> > > +	struct i915_request *rq, *active = NULL;
> > > +	unsigned long flags;
> > > +
> > > +	GEM_BUG_ON(!intel_engine_uses_guc(ce->engine));
> > > +
> > > +	spin_lock_irqsave(&ce->guc_active.lock, flags);
> > > +	list_for_each_entry_reverse(rq, &ce->guc_active.requests,
> > > +				    sched.link) {
> > > +		if (i915_request_completed(rq))
> > > +			break;
> > > +
> > > +		active = rq;
> > > +	}
> > > +	spin_unlock_irqrestore(&ce->guc_active.lock, flags);
> > > +
> > > +	return active;
> > > +}
> > > +
> > >  #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
> > >  #include "selftest_context.c"
> > >  #endif
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
> > > index 9b211ca5ecc7..d2b499ed8a05 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_context.h
> > > +++ b/drivers/gpu/drm/i915/gt/intel_context.h
> > > @@ -195,6 +195,9 @@ int intel_context_prepare_remote_request(struct intel_context *ce,
> > >  
> > >  struct i915_request *intel_context_create_request(struct intel_context *ce);
> > >  
> > > +struct i915_request *
> > > +intel_context_find_active_request(struct intel_context *ce);
> > > +
> > >  static inline struct intel_ring *__intel_context_ring_size(u64 sz)
> > >  {
> > >  	return u64_to_ptr(struct intel_ring, sz);
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
> > > index 3321d0917a99..bb94963a9fa2 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_engine.h
> > > +++ b/drivers/gpu/drm/i915/gt/intel_engine.h
> > > @@ -242,7 +242,7 @@ ktime_t intel_engine_get_busy_time(struct intel_engine_cs *engine,
> > >  				   ktime_t *now);
> > >  
> > >  struct i915_request *
> > > -intel_engine_find_active_request(struct intel_engine_cs *engine);
> > > +intel_engine_execlist_find_hung_request(struct intel_engine_cs *engine);
> > >  
> > >  u32 intel_engine_context_size(struct intel_gt *gt, u8 class);
> > >  
> > > @@ -316,4 +316,23 @@ intel_engine_get_sibling(struct intel_engine_cs *engine, unsigned int sibling)
> > >  	return engine->cops->get_sibling(engine, sibling);
> > >  }
> > >  
> > > +static inline void
> > > +intel_engine_set_hung_context(struct intel_engine_cs *engine,
> > > +			      struct intel_context *ce)
> > > +{
> > > +	engine->hung_ce = ce;
> > > +}
> > > +
> > > +static inline void
> > > +intel_engine_clear_hung_context(struct intel_engine_cs *engine)
> > > +{
> > > +	intel_engine_set_hung_context(engine, NULL);
> > > +}
> > > +
> > > +static inline struct intel_context *
> > > +intel_engine_get_hung_context(struct intel_engine_cs *engine)
> > > +{
> > > +	return engine->hung_ce;
> > > +}
> > > +
> > >  #endif /* _INTEL_RINGBUFFER_H_ */
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > > index 10300db1c9a6..ad3987289f09 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > > @@ -1727,7 +1727,7 @@ void intel_engine_dump(struct intel_engine_cs *engine,
> > >  	drm_printf(m, "\tRequests:\n");
> > >  
> > >  	spin_lock_irqsave(&engine->sched_engine->lock, flags);
> > > -	rq = intel_engine_find_active_request(engine);
> > > +	rq = intel_engine_execlist_find_hung_request(engine);
> > >  	if (rq) {
> > >  		struct intel_timeline *tl = get_timeline(rq);
> > >  
> > > @@ -1838,10 +1838,17 @@ static bool match_ring(struct i915_request *rq)
> > >  }
> > >  
> > >  struct i915_request *
> > > -intel_engine_find_active_request(struct intel_engine_cs *engine)
> > > +intel_engine_execlist_find_hung_request(struct intel_engine_cs *engine)
> > >  {
> > >  	struct i915_request *request, *active = NULL;
> > >  
> > > +	/*
> > > +	 * This search does not work in GuC submission mode. However, the GuC
> > > +	 * will report the hanging context directly to the driver itself. So
> > > +	 * the driver should never get here when in GuC mode.
> > > +	 */
> > > +	GEM_BUG_ON(intel_uc_uses_guc_submission(&engine->gt->uc));
> > > +
> > >  	/*
> > >  	 * We are called by the error capture, reset and to dump engine
> > >  	 * state at random points in time. In particular, note that neither is
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > > index b84562b2708b..bba53e3b39b9 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > > +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > > @@ -304,6 +304,8 @@ struct intel_engine_cs {
> > >  	/* keep a request in reserve for a [pm] barrier under oom */
> > >  	struct i915_request *request_pool;
> > >  
> > > +	struct intel_context *hung_ce;
> > > +
> > >  	struct llist_head barrier_tasks;
> > >  
> > >  	struct intel_context *kernel_context; /* pinned */
> > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > index 22f17a055b21..6b3b74e50b31 100644
> > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > @@ -726,24 +726,6 @@ __unwind_incomplete_requests(struct intel_context *ce)
> > >  	spin_unlock_irqrestore(&sched_engine->lock, flags);
> > >  }
> > >  
> > > -static struct i915_request *context_find_active_request(struct intel_context *ce)
> > > -{
> > > -	struct i915_request *rq, *active = NULL;
> > > -	unsigned long flags;
> > > -
> > > -	spin_lock_irqsave(&ce->guc_active.lock, flags);
> > > -	list_for_each_entry_reverse(rq, &ce->guc_active.requests,
> > > -				    sched.link) {
> > > -		if (i915_request_completed(rq))
> > > -			break;
> > > -
> > > -		active = rq;
> > > -	}
> > > -	spin_unlock_irqrestore(&ce->guc_active.lock, flags);
> > > -
> > > -	return active;
> > > -}
> > > -
> > >  static void __guc_reset_context(struct intel_context *ce, bool stalled)
> > >  {
> > >  	struct i915_request *rq;
> > > @@ -757,7 +739,7 @@ static void __guc_reset_context(struct intel_context *ce, bool stalled)
> > >  	 */
> > >  	clr_context_enabled(ce);
> > >  
> > > -	rq = context_find_active_request(ce);
> > > +	rq = intel_context_find_active_request(ce);
> > >  	if (!rq) {
> > >  		head = ce->ring->tail;
> > >  		stalled = false;
> > > @@ -2192,6 +2174,20 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
> > >  	return 0;
> > >  }
> > >  
> > > +static void capture_error_state(struct intel_guc *guc,
> > > +				struct intel_context *ce)
> > > +{
> > > +	struct intel_gt *gt = guc_to_gt(guc);
> > > +	struct drm_i915_private *i915 = gt->i915;
> > > +	struct intel_engine_cs *engine = __context_to_physical_engine(ce);
> > > +	intel_wakeref_t wakeref;
> > > +
> > > +	intel_engine_set_hung_context(engine, ce);
> > > +	with_intel_runtime_pm(&i915->runtime_pm, wakeref)
> > > +		i915_capture_error_state(gt, engine->mask);
> > > +	atomic_inc(&i915->gpu_error.reset_engine_count[engine->uabi_class]);
> > > +}
> > > +
> > >  static void guc_context_replay(struct intel_context *ce)
> > >  {
> > >  	struct i915_sched_engine *sched_engine = ce->engine->sched_engine;
> > > @@ -2204,6 +2200,7 @@ static void guc_handle_context_reset(struct intel_guc *guc,
> > >  				     struct intel_context *ce)
> > >  {
> > >  	trace_intel_context_reset(ce);
> > > +	capture_error_state(guc, ce);
> > >  	guc_context_replay(ce);
> > >  }
> > >  
> > > diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> > > index 3352f56bcf63..825bdfe44225 100644
> > > --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> > > +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> > > @@ -1435,20 +1435,37 @@ capture_engine(struct intel_engine_cs *engine,
> > >  {
> > >  	struct intel_engine_capture_vma *capture = NULL;
> > >  	struct intel_engine_coredump *ee;
> > > -	struct i915_request *rq;
> > > +	struct intel_context *ce;
> > > +	struct i915_request *rq = NULL;
> > >  	unsigned long flags;
> > >  
> > >  	ee = intel_engine_coredump_alloc(engine, GFP_KERNEL);
> > >  	if (!ee)
> > >  		return NULL;
> > >  
> > > -	spin_lock_irqsave(&engine->sched_engine->lock, flags);
> > > -	rq = intel_engine_find_active_request(engine);
> > > +	ce = intel_engine_get_hung_context(engine);
> > > +	if (ce) {
> > > +		intel_engine_clear_hung_context(engine);
> > > +		rq = intel_context_find_active_request(ce);
> > > +		if (!rq || !i915_request_started(rq))
> > > +			goto no_request_capture;
> > > +	} else {
> > > +		/*
> > > +		 * Getting here with GuC enabled means it is a forced error capture
> > > +		 * with no actual hang. So, no need to attempt the execlist search.
> > > +		 */
> > > +		if (!intel_uc_uses_guc_submission(&engine->gt->uc)) {
> > > +			spin_lock_irqsave(&engine->sched_engine->lock, flags);
> > > +			rq = intel_engine_execlist_find_hung_request(engine);
> > > +			spin_unlock_irqrestore(&engine->sched_engine->lock,
> > > +					       flags);
> > > +		}
> > > +	}
> > >  	if (rq)
> > >  		capture = intel_engine_coredump_add_request(ee, rq,
> > >  							    ATOMIC_MAYFAIL);
> > > -	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
> > >  	if (!capture) {
> > > +no_request_capture:
> > >  		kfree(ee);
> > >  		return NULL;
> > >  	}
> > > -- 
> > > 2.28.0
> > > 
> > > _______________________________________________
> > > Intel-gfx mailing list
> > > Intel-gfx@lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
> > 
> > -- 
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 32/97] drm/i915: Introduce i915_sched_engine object
  2021-05-11 15:18   ` Daniel Vetter
@ 2021-05-11 17:56     ` Matthew Brost
  0 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-11 17:56 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: tvrtko.ursulin, intel-gfx, dri-devel, jason.ekstrand,
	daniele.ceraolospurio, jon.bloomfield, daniel.vetter,
	john.c.harrison

On Tue, May 11, 2021 at 05:18:22PM +0200, Daniel Vetter wrote:
> On Thu, May 06, 2021 at 12:13:46PM -0700, Matthew Brost wrote:
> > Introduce the i915_sched_engine object, a lower-level data structure
> > that i915_scheduler / generic code can operate on without touching
> > execlist-specific structures. This allows additional submission backends
> > to be added without breaking the layering.
> 
> Maybe add a comment here that this is de facto a detour, since we're now
> aiming to use drm/scheduler instead. But also, since the current code is a
> bit of a mess, we expect this detour to be overall faster because we can
> then refactor in-tree.
> 

Agree. I think in the end we will still have an i915_sched_engine which more
or less encapsulates a 'struct drm_gpu_scheduler' plus a few common variables
shared between the execlist and GuC backends.
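
A rough sketch of what that could end up looking like (purely illustrative:
the 'base' member embedding 'struct drm_gpu_scheduler' is an assumption
about the future direction, nothing in this series implements it; the other
fields mirror what this patch already moves into i915_sched_engine):

struct i915_sched_engine {
	/* assumed: embed the common DRM scheduler instance later on */
	struct drm_gpu_scheduler base;

	/* state shared by the execlist and GuC backends today */
	spinlock_t lock;
	struct list_head requests;
	struct rb_root_cached queue;
	int queue_priority_hint;
	struct tasklet_struct tasklet;

	/* backend hook for priority changes */
	void (*schedule)(struct i915_request *rq,
			 const struct i915_sched_attr *attr);
};

So the speculative part is only the embedded drm_gpu_scheduler; the rest is
the common state this patch is already factoring out of intel_engine_cs.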

Matt

> Maybe also highlight this a bit more in the rfc to make sure this is
> clear.
> -Daniel
> 
> > 
> > Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >  drivers/gpu/drm/i915/gem/i915_gem_wait.c      |   4 +-
> >  drivers/gpu/drm/i915/gt/intel_engine.h        |  16 -
> >  drivers/gpu/drm/i915/gt/intel_engine_cs.c     |  77 ++--
> >  .../gpu/drm/i915/gt/intel_engine_heartbeat.c  |   4 +-
> >  drivers/gpu/drm/i915/gt/intel_engine_pm.c     |  10 +-
> >  drivers/gpu/drm/i915/gt/intel_engine_types.h  |  42 +--
> >  drivers/gpu/drm/i915/gt/intel_engine_user.c   |   2 +-
> >  .../drm/i915/gt/intel_execlists_submission.c  | 350 +++++++++++-------
> >  .../gpu/drm/i915/gt/intel_ring_submission.c   |  13 +-
> >  drivers/gpu/drm/i915/gt/mock_engine.c         |  17 +-
> >  drivers/gpu/drm/i915/gt/selftest_execlists.c  |  36 +-
> >  drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |   6 +-
> >  drivers/gpu/drm/i915/gt/selftest_lrc.c        |   6 +-
> >  drivers/gpu/drm/i915/gt/selftest_reset.c      |   2 +-
> >  .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  75 ++--
> >  drivers/gpu/drm/i915/i915_gpu_error.c         |   7 +-
> >  drivers/gpu/drm/i915/i915_request.c           |  50 +--
> >  drivers/gpu/drm/i915/i915_request.h           |   2 +-
> >  drivers/gpu/drm/i915/i915_scheduler.c         | 168 ++++-----
> >  drivers/gpu/drm/i915/i915_scheduler.h         |  65 +++-
> >  drivers/gpu/drm/i915/i915_scheduler_types.h   |  63 ++++
> >  21 files changed, 575 insertions(+), 440 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_wait.c b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
> > index 4b9856d5ba14..af1fbf8e2a9a 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_wait.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
> > @@ -104,8 +104,8 @@ static void fence_set_priority(struct dma_fence *fence,
> >  	engine = rq->engine;
> >  
> >  	rcu_read_lock(); /* RCU serialisation for set-wedged protection */
> > -	if (engine->schedule)
> > -		engine->schedule(rq, attr);
> > +	if (engine->sched_engine->schedule)
> > +		engine->sched_engine->schedule(rq, attr);
> >  	rcu_read_unlock();
> >  }
> >  
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
> > index 8d9184920c51..988d9688ae4d 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine.h
> > @@ -123,20 +123,6 @@ execlists_active(const struct intel_engine_execlists *execlists)
> >  	return active;
> >  }
> >  
> > -static inline void
> > -execlists_active_lock_bh(struct intel_engine_execlists *execlists)
> > -{
> > -	local_bh_disable(); /* prevent local softirq and lock recursion */
> > -	tasklet_lock(&execlists->tasklet);
> > -}
> > -
> > -static inline void
> > -execlists_active_unlock_bh(struct intel_engine_execlists *execlists)
> > -{
> > -	tasklet_unlock(&execlists->tasklet);
> > -	local_bh_enable(); /* restore softirq, and kick ksoftirqd! */
> > -}
> > -
> >  struct i915_request *
> >  execlists_unwind_incomplete_requests(struct intel_engine_execlists *execlists);
> >  
> > @@ -257,8 +243,6 @@ intel_engine_find_active_request(struct intel_engine_cs *engine);
> >  
> >  u32 intel_engine_context_size(struct intel_gt *gt, u8 class);
> >  
> > -void intel_engine_init_active(struct intel_engine_cs *engine,
> > -			      unsigned int subclass);
> >  #define ENGINE_PHYSICAL	0
> >  #define ENGINE_MOCK	1
> >  #define ENGINE_VIRTUAL	2
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > index 828e1669f92c..ec82a7ec0c8d 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > @@ -8,6 +8,7 @@
> >  #include "gem/i915_gem_context.h"
> >  
> >  #include "i915_drv.h"
> > +#include "i915_scheduler.h"
> >  
> >  #include "intel_breadcrumbs.h"
> >  #include "intel_context.h"
> > @@ -326,9 +327,6 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id)
> >  	if (engine->context_size)
> >  		DRIVER_CAPS(i915)->has_logical_contexts = true;
> >  
> > -	/* Nothing to do here, execute in order of dependencies */
> > -	engine->schedule = NULL;
> > -
> >  	ewma__engine_latency_init(&engine->latency);
> >  	seqcount_init(&engine->stats.lock);
> >  
> > @@ -583,9 +581,6 @@ void intel_engine_init_execlists(struct intel_engine_cs *engine)
> >  	memset(execlists->pending, 0, sizeof(execlists->pending));
> >  	execlists->active =
> >  		memset(execlists->inflight, 0, sizeof(execlists->inflight));
> > -
> > -	execlists->queue_priority_hint = INT_MIN;
> > -	execlists->queue = RB_ROOT_CACHED;
> >  }
> >  
> >  static void cleanup_status_page(struct intel_engine_cs *engine)
> > @@ -712,11 +707,17 @@ static int engine_setup_common(struct intel_engine_cs *engine)
> >  		goto err_status;
> >  	}
> >  
> > +	engine->sched_engine = i915_sched_engine_create(ENGINE_PHYSICAL);
> > +	if (!engine->sched_engine) {
> > +		err = -ENOMEM;
> > +		goto err_sched_engine;
> > +	}
> > +	engine->sched_engine->engine = engine;
> > +
> >  	err = intel_engine_init_cmd_parser(engine);
> >  	if (err)
> >  		goto err_cmd_parser;
> >  
> > -	intel_engine_init_active(engine, ENGINE_PHYSICAL);
> >  	intel_engine_init_execlists(engine);
> >  	intel_engine_init__pm(engine);
> >  	intel_engine_init_retire(engine);
> > @@ -735,6 +736,8 @@ static int engine_setup_common(struct intel_engine_cs *engine)
> >  	return 0;
> >  
> >  err_cmd_parser:
> > +	i915_sched_engine_put(engine->sched_engine);
> > +err_sched_engine:
> >  	intel_breadcrumbs_free(engine->breadcrumbs);
> >  err_status:
> >  	cleanup_status_page(engine);
> > @@ -773,11 +776,11 @@ static int measure_breadcrumb_dw(struct intel_context *ce)
> >  	frame->rq.ring = &frame->ring;
> >  
> >  	mutex_lock(&ce->timeline->mutex);
> > -	spin_lock_irq(&engine->active.lock);
> > +	spin_lock_irq(&engine->sched_engine->lock);
> >  
> >  	dw = engine->emit_fini_breadcrumb(&frame->rq, frame->cs) - frame->cs;
> >  
> > -	spin_unlock_irq(&engine->active.lock);
> > +	spin_unlock_irq(&engine->sched_engine->lock);
> >  	mutex_unlock(&ce->timeline->mutex);
> >  
> >  	GEM_BUG_ON(dw & 1); /* RING_TAIL must be qword aligned */
> > @@ -786,28 +789,6 @@ static int measure_breadcrumb_dw(struct intel_context *ce)
> >  	return dw;
> >  }
> >  
> > -void
> > -intel_engine_init_active(struct intel_engine_cs *engine, unsigned int subclass)
> > -{
> > -	INIT_LIST_HEAD(&engine->active.requests);
> > -	INIT_LIST_HEAD(&engine->active.hold);
> > -
> > -	spin_lock_init(&engine->active.lock);
> > -	lockdep_set_subclass(&engine->active.lock, subclass);
> > -
> > -	/*
> > -	 * Due to an interesting quirk in lockdep's internal debug tracking,
> > -	 * after setting a subclass we must ensure the lock is used. Otherwise,
> > -	 * nr_unused_locks is incremented once too often.
> > -	 */
> > -#ifdef CONFIG_DEBUG_LOCK_ALLOC
> > -	local_irq_disable();
> > -	lock_map_acquire(&engine->active.lock.dep_map);
> > -	lock_map_release(&engine->active.lock.dep_map);
> > -	local_irq_enable();
> > -#endif
> > -}
> > -
> >  static struct intel_context *
> >  create_pinned_context(struct intel_engine_cs *engine,
> >  		      unsigned int hwsp,
> > @@ -955,10 +936,10 @@ int intel_engines_init(struct intel_gt *gt)
> >   */
> >  void intel_engine_cleanup_common(struct intel_engine_cs *engine)
> >  {
> > -	GEM_BUG_ON(!list_empty(&engine->active.requests));
> > -	tasklet_kill(&engine->execlists.tasklet); /* flush the callback */
> > +	GEM_BUG_ON(!list_empty(&engine->sched_engine->requests));
> >  
> >  	intel_breadcrumbs_free(engine->breadcrumbs);
> > +	i915_sched_engine_put(engine->sched_engine);
> >  
> >  	intel_engine_fini_retire(engine);
> >  	intel_engine_cleanup_cmd_parser(engine);
> > @@ -1241,7 +1222,7 @@ static bool ring_is_idle(struct intel_engine_cs *engine)
> >  
> >  void __intel_engine_flush_submission(struct intel_engine_cs *engine, bool sync)
> >  {
> > -	struct tasklet_struct *t = &engine->execlists.tasklet;
> > +	struct tasklet_struct *t = &engine->sched_engine->tasklet;
> >  
> >  	if (!t->callback)
> >  		return;
> > @@ -1281,7 +1262,7 @@ bool intel_engine_is_idle(struct intel_engine_cs *engine)
> >  	intel_engine_flush_submission(engine);
> >  
> >  	/* ELSP is empty, but there are ready requests? E.g. after reset */
> > -	if (!RB_EMPTY_ROOT(&engine->execlists.queue.rb_root))
> > +	if (!i915_sched_engine_is_empty(engine->sched_engine))
> >  		return false;
> >  
> >  	/* Ring stopped? */
> > @@ -1347,7 +1328,7 @@ static struct intel_timeline *get_timeline(struct i915_request *rq)
> >  	struct intel_timeline *tl;
> >  
> >  	/*
> > -	 * Even though we are holding the engine->active.lock here, there
> > +	 * Even though we are holding the engine->sched_engine->lock here, there
> >  	 * is no control over the submission queue per-se and we are
> >  	 * inspecting the active state at a random point in time, with an
> >  	 * unknown queue. Play safe and make sure the timeline remains valid.
> > @@ -1502,10 +1483,10 @@ static void intel_engine_print_registers(struct intel_engine_cs *engine,
> >  
> >  		drm_printf(m, "\tExeclist tasklet queued? %s (%s), preempt? %s, timeslice? %s\n",
> >  			   yesno(test_bit(TASKLET_STATE_SCHED,
> > -					  &engine->execlists.tasklet.state)),
> > -			   enableddisabled(!atomic_read(&engine->execlists.tasklet.count)),
> > -			   repr_timer(&engine->execlists.preempt),
> > -			   repr_timer(&engine->execlists.timer));
> > +					  &engine->sched_engine->tasklet.state)),
> > +			   enableddisabled(!atomic_read(&engine->sched_engine->tasklet.count)),
> > +			   repr_timer(&execlists->preempt),
> > +			   repr_timer(&execlists->timer));
> >  
> >  		read = execlists->csb_head;
> >  		write = READ_ONCE(*execlists->csb_write);
> > @@ -1527,7 +1508,7 @@ static void intel_engine_print_registers(struct intel_engine_cs *engine,
> >  				   idx, hws[idx * 2], hws[idx * 2 + 1]);
> >  		}
> >  
> > -		execlists_active_lock_bh(execlists);
> > +		sched_engine_active_lock_bh(engine->sched_engine);
> >  		rcu_read_lock();
> >  		for (port = execlists->active; (rq = *port); port++) {
> >  			char hdr[160];
> > @@ -1558,7 +1539,7 @@ static void intel_engine_print_registers(struct intel_engine_cs *engine,
> >  			i915_request_show(m, rq, hdr, 0);
> >  		}
> >  		rcu_read_unlock();
> > -		execlists_active_unlock_bh(execlists);
> > +		sched_engine_active_unlock_bh(engine->sched_engine);
> >  	} else if (INTEL_GEN(dev_priv) > 6) {
> >  		drm_printf(m, "\tPP_DIR_BASE: 0x%08x\n",
> >  			   ENGINE_READ(engine, RING_PP_DIR_BASE));
> > @@ -1694,7 +1675,7 @@ void intel_engine_dump(struct intel_engine_cs *engine,
> >  
> >  	drm_printf(m, "\tRequests:\n");
> >  
> > -	spin_lock_irqsave(&engine->active.lock, flags);
> > +	spin_lock_irqsave(&engine->sched_engine->lock, flags);
> >  	rq = intel_engine_find_active_request(engine);
> >  	if (rq) {
> >  		struct intel_timeline *tl = get_timeline(rq);
> > @@ -1725,8 +1706,9 @@ void intel_engine_dump(struct intel_engine_cs *engine,
> >  			hexdump(m, rq->context->lrc_reg_state, PAGE_SIZE);
> >  		}
> >  	}
> > -	drm_printf(m, "\tOn hold?: %lu\n", list_count(&engine->active.hold));
> > -	spin_unlock_irqrestore(&engine->active.lock, flags);
> > +	drm_printf(m, "\tOn hold?: %lu\n",
> > +		   list_count(&engine->sched_engine->hold));
> > +	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
> >  
> >  	drm_printf(m, "\tMMIO base:  0x%08x\n", engine->mmio_base);
> >  	wakeref = intel_runtime_pm_get_if_in_use(engine->uncore->rpm);
> > @@ -1806,7 +1788,7 @@ intel_engine_find_active_request(struct intel_engine_cs *engine)
> >  	 * At all other times, we must assume the GPU is still running, but
> >  	 * we only care about the snapshot of this moment.
> >  	 */
> > -	lockdep_assert_held(&engine->active.lock);
> > +	lockdep_assert_held(&engine->sched_engine->lock);
> >  
> >  	rcu_read_lock();
> >  	request = execlists_active(&engine->execlists);
> > @@ -1824,7 +1806,8 @@ intel_engine_find_active_request(struct intel_engine_cs *engine)
> >  	if (active)
> >  		return active;
> >  
> > -	list_for_each_entry(request, &engine->active.requests, sched.link) {
> > +	list_for_each_entry(request, &engine->sched_engine->requests,
> > +			    sched.link) {
> >  		if (__i915_request_is_complete(request))
> >  			continue;
> >  
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> > index b99ac41695f3..b6a305e6a974 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> > @@ -121,7 +121,7 @@ static void heartbeat(struct work_struct *wrk)
> >  			 * but all other contexts, including the kernel
> >  			 * context are stuck waiting for the signal.
> >  			 */
> > -		} else if (engine->schedule &&
> > +		} else if (engine->sched_engine->schedule &&
> >  			   rq->sched.attr.priority < I915_PRIORITY_BARRIER) {
> >  			/*
> >  			 * Gradually raise the priority of the heartbeat to
> > @@ -136,7 +136,7 @@ static void heartbeat(struct work_struct *wrk)
> >  				attr.priority = I915_PRIORITY_BARRIER;
> >  
> >  			local_bh_disable();
> > -			engine->schedule(rq, &attr);
> > +			engine->sched_engine->schedule(rq, &attr);
> >  			local_bh_enable();
> >  		} else {
> >  			if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
> > index 47f4397095e5..ba6a9931c4e8 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
> > @@ -274,14 +274,16 @@ static int __engine_park(struct intel_wakeref *wf)
> >  	intel_engine_park_heartbeat(engine);
> >  	intel_breadcrumbs_park(engine->breadcrumbs);
> >  
> > -	/* Must be reset upon idling, or we may miss the busy wakeup. */
> > -	GEM_BUG_ON(engine->execlists.queue_priority_hint != INT_MIN);
> > +	/*
> > +	 * XXX: Must be reset upon idling, or we may miss the busy wakeup.
> > +	 * queue_priority_hint only used in execlists submission but works in
> > +	 * other modes as default is INT_MIN.
> > +	 */
> > +	GEM_BUG_ON(engine->sched_engine->queue_priority_hint != INT_MIN);
> >  
> >  	if (engine->park)
> >  		engine->park(engine);
> >  
> > -	engine->execlists.no_priolist = false;
> > -
> >  	/* While gt calls i915_vma_parked(), we have to break the lock cycle */
> >  	intel_gt_pm_put_async(engine->gt);
> >  	return 0;
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > index 9ef349cd5cea..93aa22680db0 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > @@ -59,6 +59,7 @@ struct drm_i915_reg_table;
> >  struct i915_gem_context;
> >  struct i915_request;
> >  struct i915_sched_attr;
> > +struct i915_sched_engine;
> >  struct intel_gt;
> >  struct intel_ring;
> >  struct intel_uncore;
> > @@ -137,11 +138,6 @@ struct st_preempt_hang {
> >   * driver and the hardware state for execlist mode of submission.
> >   */
> >  struct intel_engine_execlists {
> > -	/**
> > -	 * @tasklet: softirq tasklet for bottom handler
> > -	 */
> > -	struct tasklet_struct tasklet;
> > -
> >  	/**
> >  	 * @timer: kick the current context if its timeslice expires
> >  	 */
> > @@ -152,11 +148,6 @@ struct intel_engine_execlists {
> >  	 */
> >  	struct timer_list preempt;
> >  
> > -	/**
> > -	 * @default_priolist: priority list for I915_PRIORITY_NORMAL
> > -	 */
> > -	struct i915_priolist default_priolist;
> > -
> >  	/**
> >  	 * @ccid: identifier for contexts submitted to this engine
> >  	 */
> > @@ -191,11 +182,6 @@ struct intel_engine_execlists {
> >  	 */
> >  	u32 reset_ccid;
> >  
> > -	/**
> > -	 * @no_priolist: priority lists disabled
> > -	 */
> > -	bool no_priolist;
> > -
> >  	/**
> >  	 * @submit_reg: gen-specific execlist submission register
> >  	 * set to the ExecList Submission Port (elsp) register pre-Gen11 and to
> > @@ -238,23 +224,8 @@ struct intel_engine_execlists {
> >  	unsigned int port_mask;
> >  
> >  	/**
> > -	 * @queue_priority_hint: Highest pending priority.
> > -	 *
> > -	 * When we add requests into the queue, or adjust the priority of
> > -	 * executing requests, we compute the maximum priority of those
> > -	 * pending requests. We can then use this value to determine if
> > -	 * we need to preempt the executing requests to service the queue.
> > -	 * However, since the we may have recorded the priority of an inflight
> > -	 * request we wanted to preempt but since completed, at the time of
> > -	 * dequeuing the priority hint may no longer may match the highest
> > -	 * available request priority.
> > +	 * @virtual: virtual of requests, in priority lists
> >  	 */
> > -	int queue_priority_hint;
> > -
> > -	/**
> > -	 * @queue: queue of requests, in priority lists
> > -	 */
> > -	struct rb_root_cached queue;
> >  	struct rb_root_cached virtual;
> >  
> >  	/**
> > @@ -326,11 +297,7 @@ struct intel_engine_cs {
> >  
> >  	struct intel_sseu sseu;
> >  
> > -	struct {
> > -		spinlock_t lock;
> > -		struct list_head requests;
> > -		struct list_head hold; /* ready requests, but on hold */
> > -	} active;
> > +	struct i915_sched_engine *sched_engine;
> >  
> >  	/* keep a request in reserve for a [pm] barrier under oom */
> >  	struct i915_request *request_pool;
> > @@ -459,9 +426,6 @@ struct intel_engine_cs {
> >  	 * dependencies may need rescheduling. Note the request itself may
> >  	 * not be ready to run!
> >  	 */
> > -	void		(*schedule)(struct i915_request *request,
> > -				    const struct i915_sched_attr *attr);
> > -
> >  	void		(*release)(struct intel_engine_cs *engine);
> >  
> >  	struct intel_engine_execlists execlists;
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c b/drivers/gpu/drm/i915/gt/intel_engine_user.c
> > index 1cbd84eb24e4..d6dcdeace174 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
> > @@ -107,7 +107,7 @@ static void set_scheduler_caps(struct drm_i915_private *i915)
> >  	for_each_uabi_engine(engine, i915) { /* all engines must agree! */
> >  		int i;
> >  
> > -		if (engine->schedule)
> > +		if (engine->sched_engine->schedule)
> >  			enabled |= (I915_SCHEDULER_CAP_ENABLED |
> >  				    I915_SCHEDULER_CAP_PRIORITY);
> >  		else
> > diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > index 8db200422950..0927a2416b52 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > @@ -273,11 +273,11 @@ static int effective_prio(const struct i915_request *rq)
> >  	return prio;
> >  }
> >  
> > -static int queue_prio(const struct intel_engine_execlists *execlists)
> > +static int queue_prio(const struct i915_sched_engine *sched_engine)
> >  {
> >  	struct rb_node *rb;
> >  
> > -	rb = rb_first_cached(&execlists->queue);
> > +	rb = rb_first_cached(&sched_engine->queue);
> >  	if (!rb)
> >  		return INT_MIN;
> >  
> > @@ -318,14 +318,14 @@ static bool need_preempt(const struct intel_engine_cs *engine,
> >  	 * to preserve FIFO ordering of dependencies.
> >  	 */
> >  	last_prio = max(effective_prio(rq), I915_PRIORITY_NORMAL - 1);
> > -	if (engine->execlists.queue_priority_hint <= last_prio)
> > +	if (engine->sched_engine->queue_priority_hint <= last_prio)
> >  		return false;
> >  
> >  	/*
> >  	 * Check against the first request in ELSP[1], it will, thanks to the
> >  	 * power of PI, be the highest priority of that context.
> >  	 */
> > -	if (!list_is_last(&rq->sched.link, &engine->active.requests) &&
> > +	if (!list_is_last(&rq->sched.link, &engine->sched_engine->requests) &&
> >  	    rq_prio(list_next_entry(rq, sched.link)) > last_prio)
> >  		return true;
> >  
> > @@ -340,7 +340,7 @@ static bool need_preempt(const struct intel_engine_cs *engine,
> >  	 * context, it's priority would not exceed ELSP[0] aka last_prio.
> >  	 */
> >  	return max(virtual_prio(&engine->execlists),
> > -		   queue_prio(&engine->execlists)) > last_prio;
> > +		   queue_prio(engine->sched_engine)) > last_prio;
> >  }
> >  
> >  __maybe_unused static bool
> > @@ -367,10 +367,10 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
> >  	struct list_head *pl;
> >  	int prio = I915_PRIORITY_INVALID;
> >  
> > -	lockdep_assert_held(&engine->active.lock);
> > +	lockdep_assert_held(&engine->sched_engine->lock);
> >  
> >  	list_for_each_entry_safe_reverse(rq, rn,
> > -					 &engine->active.requests,
> > +					 &engine->sched_engine->requests,
> >  					 sched.link) {
> >  		if (__i915_request_is_complete(rq)) {
> >  			list_del_init(&rq->sched.link);
> > @@ -382,9 +382,10 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
> >  		GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID);
> >  		if (rq_prio(rq) != prio) {
> >  			prio = rq_prio(rq);
> > -			pl = i915_sched_lookup_priolist(engine, prio);
> > +			pl = i915_sched_lookup_priolist(engine->sched_engine,
> > +							prio);
> >  		}
> > -		GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root));
> > +		GEM_BUG_ON(i915_sched_engine_is_empty(engine->sched_engine));
> >  
> >  		list_move(&rq->sched.link, pl);
> >  		set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> > @@ -534,13 +535,13 @@ resubmit_virtual_request(struct i915_request *rq, struct virtual_engine *ve)
> >  {
> >  	struct intel_engine_cs *engine = rq->engine;
> >  
> > -	spin_lock_irq(&engine->active.lock);
> > +	spin_lock_irq(&engine->sched_engine->lock);
> >  
> >  	clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> >  	WRITE_ONCE(rq->engine, &ve->base);
> >  	ve->base.submit_request(rq);
> >  
> > -	spin_unlock_irq(&engine->active.lock);
> > +	spin_unlock_irq(&engine->sched_engine->lock);
> >  }
> >  
> >  static void kick_siblings(struct i915_request *rq, struct intel_context *ce)
> > @@ -569,7 +570,7 @@ static void kick_siblings(struct i915_request *rq, struct intel_context *ce)
> >  		resubmit_virtual_request(rq, ve);
> >  
> >  	if (READ_ONCE(ve->request))
> > -		tasklet_hi_schedule(&ve->base.execlists.tasklet);
> > +		i915_sched_engine_hi_kick(ve->base.sched_engine);
> >  }
> >  
> >  static void __execlists_schedule_out(struct i915_request * const rq,
> > @@ -579,7 +580,7 @@ static void __execlists_schedule_out(struct i915_request * const rq,
> >  	unsigned int ccid;
> >  
> >  	/*
> > -	 * NB process_csb() is not under the engine->active.lock and hence
> > +	 * NB process_csb() is not under the engine->sched_engine->lock and hence
> >  	 * schedule_out can race with schedule_in meaning that we should
> >  	 * refrain from doing non-trivial work here.
> >  	 */
> > @@ -721,12 +722,11 @@ dump_port(char *buf, int buflen, const char *prefix, struct i915_request *rq)
> >  }
> >  
> >  static __maybe_unused noinline void
> > -trace_ports(const struct intel_engine_execlists *execlists,
> > +trace_ports(const struct intel_engine_cs *engine,
> > +	    const struct intel_engine_execlists *execlists,
> >  	    const char *msg,
> >  	    struct i915_request * const *ports)
> >  {
> > -	const struct intel_engine_cs *engine =
> > -		container_of(execlists, typeof(*engine), execlists);
> >  	char __maybe_unused p0[40], p1[40];
> >  
> >  	if (!ports[0])
> > @@ -738,25 +738,24 @@ trace_ports(const struct intel_engine_execlists *execlists,
> >  }
> >  
> >  static bool
> > -reset_in_progress(const struct intel_engine_execlists *execlists)
> > +reset_in_progress(const struct intel_engine_cs *engine)
> >  {
> > -	return unlikely(!__tasklet_is_enabled(&execlists->tasklet));
> > +	return unlikely(!__tasklet_is_enabled(&engine->sched_engine->tasklet));
> >  }
> >  
> >  static __maybe_unused noinline bool
> > -assert_pending_valid(const struct intel_engine_execlists *execlists,
> > +assert_pending_valid(struct intel_engine_cs *engine,
> > +		     const struct intel_engine_execlists *execlists,
> >  		     const char *msg)
> >  {
> > -	struct intel_engine_cs *engine =
> > -		container_of(execlists, typeof(*engine), execlists);
> >  	struct i915_request * const *port, *rq, *prev = NULL;
> >  	struct intel_context *ce = NULL;
> >  	u32 ccid = -1;
> >  
> > -	trace_ports(execlists, msg, execlists->pending);
> > +	trace_ports(engine, execlists, msg, execlists->pending);
> >  
> >  	/* We may be messing around with the lists during reset, lalala */
> > -	if (reset_in_progress(execlists))
> > +	if (reset_in_progress(engine))
> >  		return true;
> >  
> >  	if (!execlists->pending[0]) {
> > @@ -878,7 +877,7 @@ static void execlists_submit_ports(struct intel_engine_cs *engine)
> >  	struct intel_engine_execlists *execlists = &engine->execlists;
> >  	unsigned int n;
> >  
> > -	GEM_BUG_ON(!assert_pending_valid(execlists, "submit"));
> > +	GEM_BUG_ON(!assert_pending_valid(engine, execlists, "submit"));
> >  
> >  	/*
> >  	 * We can skip acquiring intel_runtime_pm_get() here as it was taken
> > @@ -1096,7 +1095,8 @@ static void defer_active(struct intel_engine_cs *engine)
> >  	if (!rq)
> >  		return;
> >  
> > -	defer_request(rq, i915_sched_lookup_priolist(engine, rq_prio(rq)));
> > +	defer_request(rq, i915_sched_lookup_priolist(engine->sched_engine,
> > +						     rq_prio(rq)));
> >  }
> >  
> >  static bool
> > @@ -1133,13 +1133,14 @@ static bool needs_timeslice(const struct intel_engine_cs *engine,
> >  		return false;
> >  
> >  	/* If ELSP[1] is occupied, always check to see if worth slicing */
> > -	if (!list_is_last_rcu(&rq->sched.link, &engine->active.requests)) {
> > +	if (!list_is_last_rcu(&rq->sched.link,
> > +			      &engine->sched_engine->requests)) {
> >  		ENGINE_TRACE(engine, "timeslice required for second inflight context\n");
> >  		return true;
> >  	}
> >  
> >  	/* Otherwise, ELSP[0] is by itself, but may be waiting in the queue */
> > -	if (!RB_EMPTY_ROOT(&engine->execlists.queue.rb_root)) {
> > +	if (!i915_sched_engine_is_empty(engine->sched_engine)) {
> >  		ENGINE_TRACE(engine, "timeslice required for queue\n");
> >  		return true;
> >  	}
> > @@ -1187,7 +1188,7 @@ static void start_timeslice(struct intel_engine_cs *engine)
> >  			 * its timeslice, so recheck.
> >  			 */
> >  			if (!timer_pending(&el->timer))
> > -				tasklet_hi_schedule(&el->tasklet);
> > +				i915_sched_engine_hi_kick(engine->sched_engine);
> >  			return;
> >  		}
> >  
> > @@ -1235,6 +1236,7 @@ static bool completed(const struct i915_request *rq)
> >  
> >  static void execlists_dequeue(struct intel_engine_cs *engine)
> >  {
> > +	struct i915_sched_engine * const sched_engine = engine->sched_engine;
> >  	struct intel_engine_execlists * const execlists = &engine->execlists;
> >  	struct i915_request **port = execlists->pending;
> >  	struct i915_request ** const last_port = port + execlists->port_mask;
> > @@ -1265,7 +1267,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> >  	 * and context switches) submission.
> >  	 */
> >  
> > -	spin_lock(&engine->active.lock);
> > +	spin_lock(&engine->sched_engine->lock);
> >  
> >  	/*
> >  	 * If the queue is higher priority than the last
> > @@ -1287,7 +1289,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> >  				     last->fence.context,
> >  				     last->fence.seqno,
> >  				     last->sched.attr.priority,
> > -				     execlists->queue_priority_hint);
> > +				     sched_engine->queue_priority_hint);
> >  			record_preemption(execlists);
> >  
> >  			/*
> > @@ -1313,7 +1315,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> >  				     yesno(timer_expired(&execlists->timer)),
> >  				     last->fence.context, last->fence.seqno,
> >  				     rq_prio(last),
> > -				     execlists->queue_priority_hint,
> > +				     sched_engine->queue_priority_hint,
> >  				     yesno(timeslice_yield(execlists, last)));
> >  
> >  			/*
> > @@ -1365,7 +1367,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> >  				 * Even if ELSP[1] is occupied and not worthy
> >  				 * of timeslices, our queue might be.
> >  				 */
> > -				spin_unlock(&engine->active.lock);
> > +				spin_unlock(&sched_engine->lock);
> >  				return;
> >  			}
> >  		}
> > @@ -1375,7 +1377,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> >  	while ((ve = first_virtual_engine(engine))) {
> >  		struct i915_request *rq;
> >  
> > -		spin_lock(&ve->base.active.lock);
> > +		spin_lock(&ve->base.sched_engine->lock);
> >  
> >  		rq = ve->request;
> >  		if (unlikely(!virtual_matches(ve, rq, engine)))
> > @@ -1384,14 +1386,14 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> >  		GEM_BUG_ON(rq->engine != &ve->base);
> >  		GEM_BUG_ON(rq->context != &ve->context);
> >  
> > -		if (unlikely(rq_prio(rq) < queue_prio(execlists))) {
> > -			spin_unlock(&ve->base.active.lock);
> > +		if (unlikely(rq_prio(rq) < queue_prio(sched_engine))) {
> > +			spin_unlock(&ve->base.sched_engine->lock);
> >  			break;
> >  		}
> >  
> >  		if (last && !can_merge_rq(last, rq)) {
> > -			spin_unlock(&ve->base.active.lock);
> > -			spin_unlock(&engine->active.lock);
> > +			spin_unlock(&ve->base.sched_engine->lock);
> > +			spin_unlock(&sched_engine->lock);
> >  			return; /* leave this for another sibling */
> >  		}
> >  
> > @@ -1405,7 +1407,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> >  			     yesno(engine != ve->siblings[0]));
> >  
> >  		WRITE_ONCE(ve->request, NULL);
> > -		WRITE_ONCE(ve->base.execlists.queue_priority_hint, INT_MIN);
> > +		WRITE_ONCE(ve->base.sched_engine->queue_priority_hint, INT_MIN);
> >  
> >  		rb = &ve->nodes[engine->id].rb;
> >  		rb_erase_cached(rb, &execlists->virtual);
> > @@ -1437,7 +1439,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> >  
> >  		i915_request_put(rq);
> >  unlock:
> > -		spin_unlock(&ve->base.active.lock);
> > +		spin_unlock(&ve->base.sched_engine->lock);
> >  
> >  		/*
> >  		 * Hmm, we have a bunch of virtual engine requests,
> > @@ -1450,7 +1452,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> >  			break;
> >  	}
> >  
> > -	while ((rb = rb_first_cached(&execlists->queue))) {
> > +	while ((rb = rb_first_cached(&sched_engine->queue))) {
> >  		struct i915_priolist *p = to_priolist(rb);
> >  		struct i915_request *rq, *rn;
> >  
> > @@ -1529,7 +1531,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> >  			}
> >  		}
> >  
> > -		rb_erase_cached(&p->node, &execlists->queue);
> > +		rb_erase_cached(&p->node, &sched_engine->queue);
> >  		i915_priolist_free(p);
> >  	}
> >  done:
> > @@ -1551,8 +1553,9 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> >  	 * request triggering preemption on the next dequeue (or subsequent
> >  	 * interrupt for secondary ports).
> >  	 */
> > -	execlists->queue_priority_hint = queue_prio(execlists);
> > -	spin_unlock(&engine->active.lock);
> > +	sched_engine->queue_priority_hint = queue_prio(sched_engine);
> > +	i915_sched_engine_reset_on_empty(sched_engine);
> > +	spin_unlock(&sched_engine->lock);
> >  
> >  	/*
> >  	 * We can skip poking the HW if we ended up with exactly the same set
> > @@ -1767,8 +1770,8 @@ process_csb(struct intel_engine_cs *engine, struct i915_request **inactive)
> >  	 * access. Either we are inside the tasklet, or the tasklet is disabled
> >  	 * and we assume that is only inside the reset paths and so serialised.
> >  	 */
> > -	GEM_BUG_ON(!tasklet_is_locked(&execlists->tasklet) &&
> > -		   !reset_in_progress(execlists));
> > +	GEM_BUG_ON(!tasklet_is_locked(&engine->sched_engine->tasklet) &&
> > +		   !reset_in_progress(engine));
> >  
> >  	/*
> >  	 * Note that csb_write, csb_status may be either in HWSP or mmio.
> > @@ -1866,12 +1869,12 @@ process_csb(struct intel_engine_cs *engine, struct i915_request **inactive)
> >  			smp_wmb(); /* notify execlists_active() */
> >  
> >  			/* cancel old inflight, prepare for switch */
> > -			trace_ports(execlists, "preempted", old);
> > +			trace_ports(engine, execlists, "preempted", old);
> >  			while (*old)
> >  				*inactive++ = *old++;
> >  
> >  			/* switch pending to inflight */
> > -			GEM_BUG_ON(!assert_pending_valid(execlists, "promote"));
> > +			GEM_BUG_ON(!assert_pending_valid(engine, execlists, "promote"));
> >  			copy_ports(execlists->inflight,
> >  				   execlists->pending,
> >  				   execlists_num_ports(execlists));
> > @@ -1889,7 +1892,7 @@ process_csb(struct intel_engine_cs *engine, struct i915_request **inactive)
> >  			}
> >  
> >  			/* port0 completed, advanced to port1 */
> > -			trace_ports(execlists, "completed", execlists->active);
> > +			trace_ports(engine, execlists, "completed", execlists->active);
> >  
> >  			/*
> >  			 * We rely on the hardware being strongly
> > @@ -1979,7 +1982,7 @@ static void __execlists_hold(struct i915_request *rq)
> >  			__i915_request_unsubmit(rq);
> >  
> >  		clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> > -		list_move_tail(&rq->sched.link, &rq->engine->active.hold);
> > +		list_move_tail(&rq->sched.link, &rq->engine->sched_engine->hold);
> >  		i915_request_set_hold(rq);
> >  		RQ_TRACE(rq, "on hold\n");
> >  
> > @@ -2016,7 +2019,7 @@ static bool execlists_hold(struct intel_engine_cs *engine,
> >  	if (i915_request_on_hold(rq))
> >  		return false;
> >  
> > -	spin_lock_irq(&engine->active.lock);
> > +	spin_lock_irq(&engine->sched_engine->lock);
> >  
> >  	if (__i915_request_is_complete(rq)) { /* too late! */
> >  		rq = NULL;
> > @@ -2032,10 +2035,10 @@ static bool execlists_hold(struct intel_engine_cs *engine,
> >  	GEM_BUG_ON(i915_request_on_hold(rq));
> >  	GEM_BUG_ON(rq->engine != engine);
> >  	__execlists_hold(rq);
> > -	GEM_BUG_ON(list_empty(&engine->active.hold));
> > +	GEM_BUG_ON(list_empty(&engine->sched_engine->hold));
> >  
> >  unlock:
> > -	spin_unlock_irq(&engine->active.lock);
> > +	spin_unlock_irq(&engine->sched_engine->lock);
> >  	return rq;
> >  }
> >  
> > @@ -2079,7 +2082,7 @@ static void __execlists_unhold(struct i915_request *rq)
> >  
> >  		i915_request_clear_hold(rq);
> >  		list_move_tail(&rq->sched.link,
> > -			       i915_sched_lookup_priolist(rq->engine,
> > +			       i915_sched_lookup_priolist(rq->engine->sched_engine,
> >  							  rq_prio(rq)));
> >  		set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> >  
> > @@ -2115,7 +2118,7 @@ static void __execlists_unhold(struct i915_request *rq)
> >  static void execlists_unhold(struct intel_engine_cs *engine,
> >  			     struct i915_request *rq)
> >  {
> > -	spin_lock_irq(&engine->active.lock);
> > +	spin_lock_irq(&engine->sched_engine->lock);
> >  
> >  	/*
> >  	 * Move this request back to the priority queue, and all of its
> > @@ -2123,12 +2126,12 @@ static void execlists_unhold(struct intel_engine_cs *engine,
> >  	 */
> >  	__execlists_unhold(rq);
> >  
> > -	if (rq_prio(rq) > engine->execlists.queue_priority_hint) {
> > -		engine->execlists.queue_priority_hint = rq_prio(rq);
> > -		tasklet_hi_schedule(&engine->execlists.tasklet);
> > +	if (rq_prio(rq) > engine->sched_engine->queue_priority_hint) {
> > +		engine->sched_engine->queue_priority_hint = rq_prio(rq);
> > +		i915_sched_engine_hi_kick(engine->sched_engine);
> >  	}
> >  
> > -	spin_unlock_irq(&engine->active.lock);
> > +	spin_unlock_irq(&engine->sched_engine->lock);
> >  }
> >  
> >  struct execlists_capture {
> > @@ -2258,13 +2261,13 @@ static void execlists_capture(struct intel_engine_cs *engine)
> >  	if (!cap)
> >  		return;
> >  
> > -	spin_lock_irq(&engine->active.lock);
> > +	spin_lock_irq(&engine->sched_engine->lock);
> >  	cap->rq = active_context(engine, active_ccid(engine));
> >  	if (cap->rq) {
> >  		cap->rq = active_request(cap->rq->context->timeline, cap->rq);
> >  		cap->rq = i915_request_get_rcu(cap->rq);
> >  	}
> > -	spin_unlock_irq(&engine->active.lock);
> > +	spin_unlock_irq(&engine->sched_engine->lock);
> >  	if (!cap->rq)
> >  		goto err_free;
> >  
> > @@ -2316,13 +2319,13 @@ static void execlists_reset(struct intel_engine_cs *engine, const char *msg)
> >  	ENGINE_TRACE(engine, "reset for %s\n", msg);
> >  
> >  	/* Mark this tasklet as disabled to avoid waiting for it to complete */
> > -	tasklet_disable_nosync(&engine->execlists.tasklet);
> > +	tasklet_disable_nosync(&engine->sched_engine->tasklet);
> >  
> >  	ring_set_paused(engine, 1); /* Freeze the current request in place */
> >  	execlists_capture(engine);
> >  	intel_engine_reset(engine, msg);
> >  
> > -	tasklet_enable(&engine->execlists.tasklet);
> > +	tasklet_enable(&engine->sched_engine->tasklet);
> >  	clear_and_wake_up_bit(bit, lock);
> >  }
> >  
> > @@ -2345,8 +2348,9 @@ static bool preempt_timeout(const struct intel_engine_cs *const engine)
> >   */
> >  static void execlists_submission_tasklet(struct tasklet_struct *t)
> >  {
> > -	struct intel_engine_cs * const engine =
> > -		from_tasklet(engine, t, execlists.tasklet);
> > +	struct i915_sched_engine *sched_engine =
> > +		from_tasklet(sched_engine, t, tasklet);
> > +	struct intel_engine_cs * const engine = sched_engine->engine;
> >  	struct i915_request *post[2 * EXECLIST_MAX_PORTS];
> >  	struct i915_request **inactive;
> >  
> > @@ -2421,13 +2425,16 @@ static void execlists_irq_handler(struct intel_engine_cs *engine, u16 iir)
> >  		intel_engine_signal_breadcrumbs(engine);
> >  
> >  	if (tasklet)
> > -		tasklet_hi_schedule(&engine->execlists.tasklet);
> > +		i915_sched_engine_hi_kick(engine->sched_engine);
> >  }
> >  
> >  static void __execlists_kick(struct intel_engine_execlists *execlists)
> >  {
> > +	struct intel_engine_cs *engine =
> > +		container_of(execlists, typeof(*engine), execlists);
> > +
> >  	/* Kick the tasklet for some interrupt coalescing and reset handling */
> > -	tasklet_hi_schedule(&execlists->tasklet);
> > +	i915_sched_engine_hi_kick(engine->sched_engine);
> >  }
> >  
> >  #define execlists_kick(t, member) \
> > @@ -2448,19 +2455,20 @@ static void queue_request(struct intel_engine_cs *engine,
> >  {
> >  	GEM_BUG_ON(!list_empty(&rq->sched.link));
> >  	list_add_tail(&rq->sched.link,
> > -		      i915_sched_lookup_priolist(engine, rq_prio(rq)));
> > +		      i915_sched_lookup_priolist(engine->sched_engine,
> > +						 rq_prio(rq)));
> >  	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> >  }
> >  
> >  static bool submit_queue(struct intel_engine_cs *engine,
> >  			 const struct i915_request *rq)
> >  {
> > -	struct intel_engine_execlists *execlists = &engine->execlists;
> > +	struct i915_sched_engine *sched_engine = engine->sched_engine;
> >  
> > -	if (rq_prio(rq) <= execlists->queue_priority_hint)
> > +	if (rq_prio(rq) <= sched_engine->queue_priority_hint)
> >  		return false;
> >  
> > -	execlists->queue_priority_hint = rq_prio(rq);
> > +	sched_engine->queue_priority_hint = rq_prio(rq);
> >  	return true;
> >  }
> >  
> > @@ -2468,7 +2476,7 @@ static bool ancestor_on_hold(const struct intel_engine_cs *engine,
> >  			     const struct i915_request *rq)
> >  {
> >  	GEM_BUG_ON(i915_request_on_hold(rq));
> > -	return !list_empty(&engine->active.hold) && hold_request(rq);
> > +	return !list_empty(&engine->sched_engine->hold) && hold_request(rq);
> >  }
> >  
> >  static void execlists_submit_request(struct i915_request *request)
> > @@ -2477,23 +2485,24 @@ static void execlists_submit_request(struct i915_request *request)
> >  	unsigned long flags;
> >  
> >  	/* Will be called from irq-context when using foreign fences. */
> > -	spin_lock_irqsave(&engine->active.lock, flags);
> > +	spin_lock_irqsave(&engine->sched_engine->lock, flags);
> >  
> >  	if (unlikely(ancestor_on_hold(engine, request))) {
> >  		RQ_TRACE(request, "ancestor on hold\n");
> > -		list_add_tail(&request->sched.link, &engine->active.hold);
> > +		list_add_tail(&request->sched.link,
> > +			      &engine->sched_engine->hold);
> >  		i915_request_set_hold(request);
> >  	} else {
> >  		queue_request(engine, request);
> >  
> > -		GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root));
> > +		GEM_BUG_ON(i915_sched_engine_is_empty(engine->sched_engine));
> >  		GEM_BUG_ON(list_empty(&request->sched.link));
> >  
> >  		if (submit_queue(engine, request))
> > -			__execlists_kick(&engine->execlists);
> > +			i915_sched_engine_hi_kick(engine->sched_engine);
> >  	}
> >  
> > -	spin_unlock_irqrestore(&engine->active.lock, flags);
> > +	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
> >  }
> >  
> >  static int
> > @@ -2800,10 +2809,10 @@ static int execlists_resume(struct intel_engine_cs *engine)
> >  
> >  static void execlists_reset_prepare(struct intel_engine_cs *engine)
> >  {
> > -	struct intel_engine_execlists * const execlists = &engine->execlists;
> > +	struct i915_sched_engine * const sched_engine = engine->sched_engine;
> >  
> >  	ENGINE_TRACE(engine, "depth<-%d\n",
> > -		     atomic_read(&execlists->tasklet.count));
> > +		     atomic_read(&sched_engine->tasklet.count));
> >  
> >  	/*
> >  	 * Prevent request submission to the hardware until we have
> > @@ -2814,8 +2823,8 @@ static void execlists_reset_prepare(struct intel_engine_cs *engine)
> >  	 * Turning off the execlists->tasklet until the reset is over
> >  	 * prevents the race.
> >  	 */
> > -	__tasklet_disable_sync_once(&execlists->tasklet);
> > -	GEM_BUG_ON(!reset_in_progress(execlists));
> > +	__tasklet_disable_sync_once(&sched_engine->tasklet);
> > +	GEM_BUG_ON(!reset_in_progress(engine));
> >  
> >  	/*
> >  	 * We stop engines, otherwise we might get failed reset and a
> > @@ -2957,23 +2966,25 @@ static void execlists_reset_rewind(struct intel_engine_cs *engine, bool stalled)
> >  
> >  	/* Push back any incomplete requests for replay after the reset. */
> >  	rcu_read_lock();
> > -	spin_lock_irqsave(&engine->active.lock, flags);
> > +	spin_lock_irqsave(&engine->sched_engine->lock, flags);
> >  	__unwind_incomplete_requests(engine);
> > -	spin_unlock_irqrestore(&engine->active.lock, flags);
> > +	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
> >  	rcu_read_unlock();
> >  }
> >  
> >  static void nop_submission_tasklet(struct tasklet_struct *t)
> >  {
> > -	struct intel_engine_cs * const engine =
> > -		from_tasklet(engine, t, execlists.tasklet);
> > +	struct i915_sched_engine *sched_engine =
> > +		from_tasklet(sched_engine, t, tasklet);
> > +	struct intel_engine_cs * const engine = sched_engine->engine;
> >  
> >  	/* The driver is wedged; don't process any more events. */
> > -	WRITE_ONCE(engine->execlists.queue_priority_hint, INT_MIN);
> > +	WRITE_ONCE(engine->sched_engine->queue_priority_hint, INT_MIN);
> >  }
> >  
> >  static void execlists_reset_cancel(struct intel_engine_cs *engine)
> >  {
> > +	struct i915_sched_engine * const sched_engine = engine->sched_engine;
> >  	struct intel_engine_execlists * const execlists = &engine->execlists;
> >  	struct i915_request *rq, *rn;
> >  	struct rb_node *rb;
> > @@ -2998,15 +3009,15 @@ static void execlists_reset_cancel(struct intel_engine_cs *engine)
> >  	execlists_reset_csb(engine, true);
> >  
> >  	rcu_read_lock();
> > -	spin_lock_irqsave(&engine->active.lock, flags);
> > +	spin_lock_irqsave(&sched_engine->lock, flags);
> >  
> >  	/* Mark all executing requests as skipped. */
> > -	list_for_each_entry(rq, &engine->active.requests, sched.link)
> > +	list_for_each_entry(rq, &sched_engine->requests, sched.link)
> >  		i915_request_put(i915_request_mark_eio(rq));
> >  	intel_engine_signal_breadcrumbs(engine);
> >  
> >  	/* Flush the queued requests to the timeline list (for retiring). */
> > -	while ((rb = rb_first_cached(&execlists->queue))) {
> > +	while ((rb = rb_first_cached(&sched_engine->queue))) {
> >  		struct i915_priolist *p = to_priolist(rb);
> >  
> >  		priolist_for_each_request_consume(rq, rn, p) {
> > @@ -3016,12 +3027,12 @@ static void execlists_reset_cancel(struct intel_engine_cs *engine)
> >  			}
> >  		}
> >  
> > -		rb_erase_cached(&p->node, &execlists->queue);
> > +		rb_erase_cached(&p->node, &sched_engine->queue);
> >  		i915_priolist_free(p);
> >  	}
> >  
> >  	/* On-hold requests will be flushed to timeline upon their release */
> > -	list_for_each_entry(rq, &engine->active.hold, sched.link)
> > +	list_for_each_entry(rq, &sched_engine->hold, sched.link)
> >  		i915_request_put(i915_request_mark_eio(rq));
> >  
> >  	/* Cancel all attached virtual engines */
> > @@ -3032,7 +3043,7 @@ static void execlists_reset_cancel(struct intel_engine_cs *engine)
> >  		rb_erase_cached(rb, &execlists->virtual);
> >  		RB_CLEAR_NODE(rb);
> >  
> > -		spin_lock(&ve->base.active.lock);
> > +		spin_lock(&ve->base.sched_engine->lock);
> >  		rq = fetch_and_zero(&ve->request);
> >  		if (rq) {
> >  			if (i915_request_mark_eio(rq)) {
> > @@ -3042,26 +3053,26 @@ static void execlists_reset_cancel(struct intel_engine_cs *engine)
> >  			}
> >  			i915_request_put(rq);
> >  
> > -			ve->base.execlists.queue_priority_hint = INT_MIN;
> > +			ve->base.sched_engine->queue_priority_hint = INT_MIN;
> >  		}
> > -		spin_unlock(&ve->base.active.lock);
> > +		spin_unlock(&ve->base.sched_engine->lock);
> >  	}
> >  
> >  	/* Remaining _unready_ requests will be nop'ed when submitted */
> >  
> > -	execlists->queue_priority_hint = INT_MIN;
> > -	execlists->queue = RB_ROOT_CACHED;
> > +	sched_engine->queue_priority_hint = INT_MIN;
> > +	sched_engine->queue = RB_ROOT_CACHED;
> >  
> > -	GEM_BUG_ON(__tasklet_is_enabled(&execlists->tasklet));
> > -	execlists->tasklet.callback = nop_submission_tasklet;
> > +	GEM_BUG_ON(__tasklet_is_enabled(&sched_engine->tasklet));
> > +	sched_engine->tasklet.callback = nop_submission_tasklet;
> >  
> > -	spin_unlock_irqrestore(&engine->active.lock, flags);
> > +	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
> >  	rcu_read_unlock();
> >  }
> >  
> >  static void execlists_reset_finish(struct intel_engine_cs *engine)
> >  {
> > -	struct intel_engine_execlists * const execlists = &engine->execlists;
> > +	struct i915_sched_engine * const sched_engine = engine->sched_engine;
> >  
> >  	/*
> >  	 * After a GPU reset, we may have requests to replay. Do so now while
> > @@ -3073,14 +3084,14 @@ static void execlists_reset_finish(struct intel_engine_cs *engine)
> >  	 * reset as the next level of recovery, and as a final resort we
> >  	 * will declare the device wedged.
> >  	 */
> > -	GEM_BUG_ON(!reset_in_progress(execlists));
> > +	GEM_BUG_ON(!reset_in_progress(engine));
> >  
> >  	/* And kick in case we missed a new request submission. */
> > -	if (__tasklet_enable(&execlists->tasklet))
> > -		__execlists_kick(execlists);
> > +	if (__tasklet_enable(&sched_engine->tasklet))
> > +		i915_sched_engine_hi_kick(sched_engine);
> >  
> >  	ENGINE_TRACE(engine, "depth->%d\n",
> > -		     atomic_read(&execlists->tasklet.count));
> > +		     atomic_read(&sched_engine->tasklet.count));
> >  }
> >  
> >  static void gen8_logical_ring_enable_irq(struct intel_engine_cs *engine)
> > @@ -3110,11 +3121,59 @@ static bool can_preempt(struct intel_engine_cs *engine)
> >  	return engine->class != RENDER_CLASS;
> >  }
> >  
> > +static void kick_execlists(const struct i915_request *rq, int prio)
> > +{
> > +	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
> > +	const struct i915_request *inflight;
> > +
> > +	/*
> > +	 * We only need to kick the tasklet once for the high priority
> > +	 * new context we add into the queue.
> > +	 */
> > +	if (prio <= sched_engine->queue_priority_hint)
> > +		return;
> > +
> > +	rcu_read_lock();
> > +
> > +	/* Nothing currently active? We're overdue for a submission! */
> > +	inflight = execlists_active(&rq->engine->execlists);
> > +	if (!inflight)
> > +		goto unlock;
> > +
> > +	/*
> > +	 * If we are already the currently executing context, don't
> > +	 * bother evaluating if we should preempt ourselves.
> > +	 */
> > +	if (inflight->context == rq->context)
> > +		goto unlock;
> > +
> > +	ENGINE_TRACE(rq->engine,
> > +		     "bumping queue-priority-hint:%d for rq:%llx:%lld, inflight:%llx:%lld prio %d\n",
> > +		     prio,
> > +		     rq->fence.context, rq->fence.seqno,
> > +		     inflight->fence.context, inflight->fence.seqno,
> > +		     inflight->sched.attr.priority);
> > +
> > +	sched_engine->queue_priority_hint = prio;
> > +
> > +	/*
> > +	 * Allow preemption of low -> normal -> high, but we do
> > +	 * not allow low priority tasks to preempt other low priority
> > +	 * tasks under the impression that latency for low priority
> > +	 * tasks does not matter (as much as background throughput),
> > +	 * so kiss.
> > +	 */
> > +	if (prio >= max(I915_PRIORITY_NORMAL, rq_prio(inflight)))
> > +		i915_sched_engine_hi_kick(sched_engine);
> > +
> > +unlock:
> > +	rcu_read_unlock();
> > +}
> > +
> >  static void execlists_set_default_submission(struct intel_engine_cs *engine)
> >  {
> >  	engine->submit_request = execlists_submit_request;
> > -	engine->schedule = i915_schedule;
> > -	engine->execlists.tasklet.callback = execlists_submission_tasklet;
> > +	engine->sched_engine->tasklet.callback = execlists_submission_tasklet;
> >  }
> >  
> >  static void execlists_shutdown(struct intel_engine_cs *engine)
> > @@ -3122,7 +3181,7 @@ static void execlists_shutdown(struct intel_engine_cs *engine)
> >  	/* Synchronise with residual timers and any softirq they raise */
> >  	del_timer_sync(&engine->execlists.timer);
> >  	del_timer_sync(&engine->execlists.preempt);
> > -	tasklet_kill(&engine->execlists.tasklet);
> > +	i915_sched_engine_kill(engine->sched_engine);
> >  }
> >  
> >  static void execlists_release(struct intel_engine_cs *engine)
> > @@ -3238,10 +3297,14 @@ int intel_execlists_submission_setup(struct intel_engine_cs *engine)
> >  	struct intel_uncore *uncore = engine->uncore;
> >  	u32 base = engine->mmio_base;
> >  
> > -	tasklet_setup(&engine->execlists.tasklet, execlists_submission_tasklet);
> > +	tasklet_setup(&engine->sched_engine->tasklet,
> > +		      execlists_submission_tasklet);
> >  	timer_setup(&engine->execlists.timer, execlists_timeslice, 0);
> >  	timer_setup(&engine->execlists.preempt, execlists_preempt, 0);
> >  
> > +	engine->sched_engine->schedule = i915_schedule;
> > +	engine->sched_engine->kick_backend = kick_execlists;
> > +
> >  	logical_ring_default_vfuncs(engine);
> >  	logical_ring_default_irqs(engine);
> >  
> > @@ -3286,7 +3349,7 @@ int intel_execlists_submission_setup(struct intel_engine_cs *engine)
> >  
> >  static struct list_head *virtual_queue(struct virtual_engine *ve)
> >  {
> > -	return &ve->base.execlists.default_priolist.requests;
> > +	return &ve->base.sched_engine->default_priolist.requests;
> >  }
> >  
> >  static void rcu_virtual_context_destroy(struct work_struct *wrk)
> > @@ -3301,7 +3364,7 @@ static void rcu_virtual_context_destroy(struct work_struct *wrk)
> >  	if (unlikely(ve->request)) {
> >  		struct i915_request *old;
> >  
> > -		spin_lock_irq(&ve->base.active.lock);
> > +		spin_lock_irq(&ve->base.sched_engine->lock);
> >  
> >  		old = fetch_and_zero(&ve->request);
> >  		if (old) {
> > @@ -3310,7 +3373,7 @@ static void rcu_virtual_context_destroy(struct work_struct *wrk)
> >  			i915_request_put(old);
> >  		}
> >  
> > -		spin_unlock_irq(&ve->base.active.lock);
> > +		spin_unlock_irq(&ve->base.sched_engine->lock);
> >  	}
> >  
> >  	/*
> > @@ -3320,7 +3383,7 @@ static void rcu_virtual_context_destroy(struct work_struct *wrk)
> >  	 * rbtrees as in the case it is running in parallel, it may reinsert
> >  	 * the rb_node into a sibling.
> >  	 */
> > -	tasklet_kill(&ve->base.execlists.tasklet);
> > +	i915_sched_engine_kill(ve->base.sched_engine);
> >  
> >  	/* Decouple ourselves from the siblings, no more access allowed. */
> >  	for (n = 0; n < ve->num_siblings; n++) {
> > @@ -3330,21 +3393,23 @@ static void rcu_virtual_context_destroy(struct work_struct *wrk)
> >  		if (RB_EMPTY_NODE(node))
> >  			continue;
> >  
> > -		spin_lock_irq(&sibling->active.lock);
> > +		spin_lock_irq(&sibling->sched_engine->lock);
> >  
> >  		/* Detachment is lazily performed in the execlists tasklet */
> >  		if (!RB_EMPTY_NODE(node))
> >  			rb_erase_cached(node, &sibling->execlists.virtual);
> >  
> > -		spin_unlock_irq(&sibling->active.lock);
> > +		spin_unlock_irq(&sibling->sched_engine->lock);
> >  	}
> > -	GEM_BUG_ON(__tasklet_is_scheduled(&ve->base.execlists.tasklet));
> > +	GEM_BUG_ON(__tasklet_is_scheduled(&ve->base.sched_engine->tasklet));
> >  	GEM_BUG_ON(!list_empty(virtual_queue(ve)));
> >  
> >  	lrc_fini(&ve->context);
> >  	intel_context_fini(&ve->context);
> >  
> >  	intel_breadcrumbs_free(ve->base.breadcrumbs);
> > +	if (ve->base.sched_engine)
> > +		i915_sched_engine_put(ve->base.sched_engine);
> >  	intel_engine_free_request_pool(&ve->base);
> >  
> >  	kfree(ve->bonds);
> > @@ -3475,16 +3540,18 @@ static intel_engine_mask_t virtual_submission_mask(struct virtual_engine *ve)
> >  
> >  	ENGINE_TRACE(&ve->base, "rq=%llx:%lld, mask=%x, prio=%d\n",
> >  		     rq->fence.context, rq->fence.seqno,
> > -		     mask, ve->base.execlists.queue_priority_hint);
> > +		     mask, ve->base.sched_engine->queue_priority_hint);
> >  
> >  	return mask;
> >  }
> >  
> >  static void virtual_submission_tasklet(struct tasklet_struct *t)
> >  {
> > +	struct i915_sched_engine *sched_engine =
> > +		from_tasklet(sched_engine, t, tasklet);
> >  	struct virtual_engine * const ve =
> > -		from_tasklet(ve, t, base.execlists.tasklet);
> > -	const int prio = READ_ONCE(ve->base.execlists.queue_priority_hint);
> > +		(struct virtual_engine *)sched_engine->engine;
> > +	const int prio = READ_ONCE(ve->base.sched_engine->queue_priority_hint);
> >  	intel_engine_mask_t mask;
> >  	unsigned int n;
> >  
> > @@ -3503,7 +3570,7 @@ static void virtual_submission_tasklet(struct tasklet_struct *t)
> >  		if (!READ_ONCE(ve->request))
> >  			break; /* already handled by a sibling's tasklet */
> >  
> > -		spin_lock_irq(&sibling->active.lock);
> > +		spin_lock_irq(&sibling->sched_engine->lock);
> >  
> >  		if (unlikely(!(mask & sibling->mask))) {
> >  			if (!RB_EMPTY_NODE(&node->rb)) {
> > @@ -3552,11 +3619,11 @@ static void virtual_submission_tasklet(struct tasklet_struct *t)
> >  submit_engine:
> >  		GEM_BUG_ON(RB_EMPTY_NODE(&node->rb));
> >  		node->prio = prio;
> > -		if (first && prio > sibling->execlists.queue_priority_hint)
> > -			tasklet_hi_schedule(&sibling->execlists.tasklet);
> > +		if (first && prio > sibling->sched_engine->queue_priority_hint)
> > +			i915_sched_engine_hi_kick(sibling->sched_engine);
> >  
> >  unlock_engine:
> > -		spin_unlock_irq(&sibling->active.lock);
> > +		spin_unlock_irq(&sibling->sched_engine->lock);
> >  
> >  		if (intel_context_inflight(&ve->context))
> >  			break;
> > @@ -3574,7 +3641,7 @@ static void virtual_submit_request(struct i915_request *rq)
> >  
> >  	GEM_BUG_ON(ve->base.submit_request != virtual_submit_request);
> >  
> > -	spin_lock_irqsave(&ve->base.active.lock, flags);
> > +	spin_lock_irqsave(&ve->base.sched_engine->lock, flags);
> >  
> >  	/* By the time we resubmit a request, it may be completed */
> >  	if (__i915_request_is_complete(rq)) {
> > @@ -3588,16 +3655,16 @@ static void virtual_submit_request(struct i915_request *rq)
> >  		i915_request_put(ve->request);
> >  	}
> >  
> > -	ve->base.execlists.queue_priority_hint = rq_prio(rq);
> > +	ve->base.sched_engine->queue_priority_hint = rq_prio(rq);
> >  	ve->request = i915_request_get(rq);
> >  
> >  	GEM_BUG_ON(!list_empty(virtual_queue(ve)));
> >  	list_move_tail(&rq->sched.link, virtual_queue(ve));
> >  
> > -	tasklet_hi_schedule(&ve->base.execlists.tasklet);
> > +	i915_sched_engine_hi_kick(ve->base.sched_engine);
> >  
> >  unlock:
> > -	spin_unlock_irqrestore(&ve->base.active.lock, flags);
> > +	spin_unlock_irqrestore(&ve->base.sched_engine->lock, flags);
> >  }
> >  
> >  static struct ve_bond *
> > @@ -3681,19 +3748,24 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings,
> >  
> >  	snprintf(ve->base.name, sizeof(ve->base.name), "virtual");
> >  
> > -	intel_engine_init_active(&ve->base, ENGINE_VIRTUAL);
> > -	intel_engine_init_execlists(&ve->base);
> > +	ve->base.sched_engine = i915_sched_engine_create(ENGINE_VIRTUAL);
> > +	if (!ve->base.sched_engine) {
> > +		kfree(ve);
> > +		return ERR_PTR(-ENOMEM);
> > +	}
> > +	ve->base.sched_engine->engine = &ve->base;
> >  
> >  	ve->base.cops = &virtual_context_ops;
> >  	ve->base.request_alloc = execlists_request_alloc;
> >  
> > -	ve->base.schedule = i915_schedule;
> > +	ve->base.sched_engine->schedule = i915_schedule;
> >  	ve->base.submit_request = virtual_submit_request;
> >  	ve->base.bond_execute = virtual_bond_execute;
> >  
> >  	INIT_LIST_HEAD(virtual_queue(ve));
> > -	ve->base.execlists.queue_priority_hint = INT_MIN;
> > -	tasklet_setup(&ve->base.execlists.tasklet, virtual_submission_tasklet);
> > +	ve->base.sched_engine->queue_priority_hint = INT_MIN;
> > +	tasklet_setup(&ve->base.sched_engine->tasklet,
> > +		      virtual_submission_tasklet);
> >  
> >  	intel_context_init(&ve->context, &ve->base);
> >  
> > @@ -3721,7 +3793,7 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings,
> >  		 * layering if we handle cloning of the requests and
> >  		 * submitting a copy into each backend.
> >  		 */
> > -		if (sibling->execlists.tasklet.callback !=
> > +		if (sibling->sched_engine->tasklet.callback !=
> >  		    execlists_submission_tasklet) {
> >  			err = -ENODEV;
> >  			goto err_put;
> > @@ -3756,6 +3828,9 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings,
> >  			 "v%dx%d", ve->base.class, count);
> >  		ve->base.context_size = sibling->context_size;
> >  
> > +		ve->base.sched_engine->kick_backend =
> > +			sibling->sched_engine->kick_backend;
> > +
> >  		ve->base.emit_bb_start = sibling->emit_bb_start;
> >  		ve->base.emit_flush = sibling->emit_flush;
> >  		ve->base.emit_init_breadcrumb = sibling->emit_init_breadcrumb;
> > @@ -3848,17 +3923,18 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine,
> >  							int indent),
> >  				   unsigned int max)
> >  {
> > +	const struct i915_sched_engine *sched_engine = engine->sched_engine;
> >  	const struct intel_engine_execlists *execlists = &engine->execlists;
> >  	struct i915_request *rq, *last;
> >  	unsigned long flags;
> >  	unsigned int count;
> >  	struct rb_node *rb;
> >  
> > -	spin_lock_irqsave(&engine->active.lock, flags);
> > +	spin_lock_irqsave(&engine->sched_engine->lock, flags);
> >  
> >  	last = NULL;
> >  	count = 0;
> > -	list_for_each_entry(rq, &engine->active.requests, sched.link) {
> > +	list_for_each_entry(rq, &sched_engine->requests, sched.link) {
> >  		if (count++ < max - 1)
> >  			show_request(m, rq, "\t\t", 0);
> >  		else
> > @@ -3873,13 +3949,13 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine,
> >  		show_request(m, last, "\t\t", 0);
> >  	}
> >  
> > -	if (execlists->queue_priority_hint != INT_MIN)
> > +	if (sched_engine->queue_priority_hint != INT_MIN)
> >  		drm_printf(m, "\t\tQueue priority hint: %d\n",
> > -			   READ_ONCE(execlists->queue_priority_hint));
> > +			   READ_ONCE(sched_engine->queue_priority_hint));
> >  
> >  	last = NULL;
> >  	count = 0;
> > -	for (rb = rb_first_cached(&execlists->queue); rb; rb = rb_next(rb)) {
> > +	for (rb = rb_first_cached(&sched_engine->queue); rb; rb = rb_next(rb)) {
> >  		struct i915_priolist *p = rb_entry(rb, typeof(*p), node);
> >  
> >  		priolist_for_each_request(rq, p) {
> > @@ -3921,7 +3997,7 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine,
> >  		show_request(m, last, "\t\t", 0);
> >  	}
> >  
> > -	spin_unlock_irqrestore(&engine->active.lock, flags);
> > +	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
> >  }
> >  
> >  #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
> > diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> > index 2b6dffcc2262..14aa31879a37 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> > @@ -339,9 +339,9 @@ static void reset_rewind(struct intel_engine_cs *engine, bool stalled)
> >  	u32 head;
> >  
> >  	rq = NULL;
> > -	spin_lock_irqsave(&engine->active.lock, flags);
> > +	spin_lock_irqsave(&engine->sched_engine->lock, flags);
> >  	rcu_read_lock();
> > -	list_for_each_entry(pos, &engine->active.requests, sched.link) {
> > +	list_for_each_entry(pos, &engine->sched_engine->requests, sched.link) {
> >  		if (!__i915_request_is_complete(pos)) {
> >  			rq = pos;
> >  			break;
> > @@ -396,7 +396,7 @@ static void reset_rewind(struct intel_engine_cs *engine, bool stalled)
> >  	}
> >  	engine->legacy.ring->head = intel_ring_wrap(engine->legacy.ring, head);
> >  
> > -	spin_unlock_irqrestore(&engine->active.lock, flags);
> > +	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
> >  }
> >  
> >  static void reset_finish(struct intel_engine_cs *engine)
> > @@ -408,16 +408,17 @@ static void reset_cancel(struct intel_engine_cs *engine)
> >  	struct i915_request *request;
> >  	unsigned long flags;
> >  
> > -	spin_lock_irqsave(&engine->active.lock, flags);
> > +	spin_lock_irqsave(&engine->sched_engine->lock, flags);
> >  
> >  	/* Mark all submitted requests as skipped. */
> > -	list_for_each_entry(request, &engine->active.requests, sched.link)
> > +	list_for_each_entry(request, &engine->sched_engine->requests,
> > +			    sched.link)
> >  		i915_request_put(i915_request_mark_eio(request));
> >  	intel_engine_signal_breadcrumbs(engine);
> >  
> >  	/* Remaining _unready_ requests will be nop'ed when submitted */
> >  
> > -	spin_unlock_irqrestore(&engine->active.lock, flags);
> > +	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
> >  }
> >  
> >  static void i9xx_submit_request(struct i915_request *request)
> > diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
> > index 32589c6625e1..bd005c1b6fd5 100644
> > --- a/drivers/gpu/drm/i915/gt/mock_engine.c
> > +++ b/drivers/gpu/drm/i915/gt/mock_engine.c
> > @@ -253,10 +253,10 @@ static void mock_reset_cancel(struct intel_engine_cs *engine)
> >  
> >  	del_timer_sync(&mock->hw_delay);
> >  
> > -	spin_lock_irqsave(&engine->active.lock, flags);
> > +	spin_lock_irqsave(&engine->sched_engine->lock, flags);
> >  
> >  	/* Mark all submitted requests as skipped. */
> > -	list_for_each_entry(rq, &engine->active.requests, sched.link)
> > +	list_for_each_entry(rq, &engine->sched_engine->requests, sched.link)
> >  		i915_request_put(i915_request_mark_eio(rq));
> >  	intel_engine_signal_breadcrumbs(engine);
> >  
> > @@ -269,7 +269,7 @@ static void mock_reset_cancel(struct intel_engine_cs *engine)
> >  	}
> >  	INIT_LIST_HEAD(&mock->hw_queue);
> >  
> > -	spin_unlock_irqrestore(&engine->active.lock, flags);
> > +	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
> >  }
> >  
> >  static void mock_reset_finish(struct intel_engine_cs *engine)
> > @@ -283,6 +283,7 @@ static void mock_engine_release(struct intel_engine_cs *engine)
> >  
> >  	GEM_BUG_ON(timer_pending(&mock->hw_delay));
> >  
> > +	i915_sched_engine_put(engine->sched_engine);
> >  	intel_breadcrumbs_free(engine->breadcrumbs);
> >  
> >  	intel_context_unpin(engine->kernel_context);
> > @@ -345,14 +346,18 @@ int mock_engine_init(struct intel_engine_cs *engine)
> >  {
> >  	struct intel_context *ce;
> >  
> > -	intel_engine_init_active(engine, ENGINE_MOCK);
> > +	engine->sched_engine = i915_sched_engine_create(ENGINE_MOCK);
> > +	if (!engine->sched_engine)
> > +		return -ENOMEM;
> > +	engine->sched_engine->engine = engine;
> > +
> >  	intel_engine_init_execlists(engine);
> >  	intel_engine_init__pm(engine);
> >  	intel_engine_init_retire(engine);
> >  
> >  	engine->breadcrumbs = intel_breadcrumbs_create(NULL);
> >  	if (!engine->breadcrumbs)
> > -		return -ENOMEM;
> > +		goto err_schedule;
> >  
> >  	ce = create_kernel_context(engine);
> >  	if (IS_ERR(ce))
> > @@ -366,6 +371,8 @@ int mock_engine_init(struct intel_engine_cs *engine)
> >  
> >  err_breadcrumbs:
> >  	intel_breadcrumbs_free(engine->breadcrumbs);
> > +err_schedule:
> > +	i915_sched_engine_put(engine->sched_engine);
> >  	return -ENOMEM;
> >  }
> >  
> > diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c
> > index 1f93591a8c69..f349048ccbf6 100644
> > --- a/drivers/gpu/drm/i915/gt/selftest_execlists.c
> > +++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c
> > @@ -43,7 +43,7 @@ static int wait_for_submit(struct intel_engine_cs *engine,
> >  			   unsigned long timeout)
> >  {
> >  	/* Ignore our own attempts to suppress excess tasklets */
> > -	tasklet_hi_schedule(&engine->execlists.tasklet);
> > +	i915_sched_engine_hi_kick(engine->sched_engine);
> >  
> >  	timeout += jiffies;
> >  	do {
> > @@ -273,7 +273,7 @@ static int live_unlite_restore(struct intel_gt *gt, int prio)
> >  			};
> >  
> >  			/* Alternatively preempt the spinner with ce[1] */
> > -			engine->schedule(rq[1], &attr);
> > +			engine->sched_engine->schedule(rq[1], &attr);
> >  		}
> >  
> >  		/* And switch back to ce[0] for good measure */
> > @@ -606,9 +606,9 @@ static int live_hold_reset(void *arg)
> >  			err = -EBUSY;
> >  			goto out;
> >  		}
> > -		tasklet_disable(&engine->execlists.tasklet);
> > +		tasklet_disable(&engine->sched_engine->tasklet);
> >  
> > -		engine->execlists.tasklet.callback(&engine->execlists.tasklet);
> > +		engine->sched_engine->tasklet.callback(&engine->sched_engine->tasklet);
> >  		GEM_BUG_ON(execlists_active(&engine->execlists) != rq);
> >  
> >  		i915_request_get(rq);
> > @@ -618,7 +618,7 @@ static int live_hold_reset(void *arg)
> >  		__intel_engine_reset_bh(engine, NULL);
> >  		GEM_BUG_ON(rq->fence.error != -EIO);
> >  
> > -		tasklet_enable(&engine->execlists.tasklet);
> > +		tasklet_enable(&engine->sched_engine->tasklet);
> >  		clear_and_wake_up_bit(I915_RESET_ENGINE + id,
> >  				      &gt->reset.flags);
> >  		local_bh_enable();
> > @@ -900,7 +900,7 @@ release_queue(struct intel_engine_cs *engine,
> >  	i915_request_add(rq);
> >  
> >  	local_bh_disable();
> > -	engine->schedule(rq, &attr);
> > +	engine->sched_engine->schedule(rq, &attr);
> >  	local_bh_enable(); /* kick tasklet */
> >  
> >  	i915_request_put(rq);
> > @@ -1183,7 +1183,7 @@ static int live_timeslice_rewind(void *arg)
> >  		while (i915_request_is_active(rq[A2])) { /* semaphore yield! */
> >  			/* Wait for the timeslice to kick in */
> >  			del_timer(&engine->execlists.timer);
> > -			tasklet_hi_schedule(&engine->execlists.tasklet);
> > +			i915_sched_engine_hi_kick(engine->sched_engine);
> >  			intel_engine_flush_submission(engine);
> >  		}
> >  		/* -> ELSP[] = { { A:rq1 }, { B:rq1 } } */
> > @@ -1325,7 +1325,7 @@ static int live_timeslice_queue(void *arg)
> >  			err = PTR_ERR(rq);
> >  			goto err_heartbeat;
> >  		}
> > -		engine->schedule(rq, &attr);
> > +		engine->sched_engine->schedule(rq, &attr);
> >  		err = wait_for_submit(engine, rq, HZ / 2);
> >  		if (err) {
> >  			pr_err("%s: Timed out trying to submit semaphores\n",
> > @@ -1867,7 +1867,7 @@ static int live_late_preempt(void *arg)
> >  		}
> >  
> >  		attr.priority = I915_PRIORITY_MAX;
> > -		engine->schedule(rq, &attr);
> > +		engine->sched_engine->schedule(rq, &attr);
> >  
> >  		if (!igt_wait_for_spinner(&spin_hi, rq)) {
> >  			pr_err("High priority context failed to preempt the low priority context\n");
> > @@ -2480,7 +2480,7 @@ static int live_suppress_self_preempt(void *arg)
> >  			i915_request_add(rq_b);
> >  
> >  			GEM_BUG_ON(i915_request_completed(rq_a));
> > -			engine->schedule(rq_a, &attr);
> > +			engine->sched_engine->schedule(rq_a, &attr);
> >  			igt_spinner_end(&a.spin);
> >  
> >  			if (!igt_wait_for_spinner(&b.spin, rq_b)) {
> > @@ -2612,7 +2612,7 @@ static int live_chain_preempt(void *arg)
> >  
> >  			i915_request_get(rq);
> >  			i915_request_add(rq);
> > -			engine->schedule(rq, &attr);
> > +			engine->sched_engine->schedule(rq, &attr);
> >  
> >  			igt_spinner_end(&hi.spin);
> >  			if (i915_request_wait(rq, 0, HZ / 5) < 0) {
> > @@ -2971,7 +2971,7 @@ static int live_preempt_gang(void *arg)
> >  				break;
> >  
> >  			/* Submit each spinner at increasing priority */
> > -			engine->schedule(rq, &attr);
> > +			engine->sched_engine->schedule(rq, &attr);
> >  		} while (prio <= I915_PRIORITY_MAX &&
> >  			 !__igt_timeout(end_time, NULL));
> >  		pr_debug("%s: Preempt chain of %d requests\n",
> > @@ -3219,7 +3219,7 @@ static int preempt_user(struct intel_engine_cs *engine,
> >  	i915_request_get(rq);
> >  	i915_request_add(rq);
> >  
> > -	engine->schedule(rq, &attr);
> > +	engine->sched_engine->schedule(rq, &attr);
> >  
> >  	if (i915_request_wait(rq, 0, HZ / 2) < 0)
> >  		err = -ETIME;
> > @@ -4593,15 +4593,15 @@ static int reset_virtual_engine(struct intel_gt *gt,
> >  		err = -EBUSY;
> >  		goto out_heartbeat;
> >  	}
> > -	tasklet_disable(&engine->execlists.tasklet);
> > +	tasklet_disable(&engine->sched_engine->tasklet);
> >  
> > -	engine->execlists.tasklet.callback(&engine->execlists.tasklet);
> > +	engine->sched_engine->tasklet.callback(&engine->sched_engine->tasklet);
> >  	GEM_BUG_ON(execlists_active(&engine->execlists) != rq);
> >  
> >  	/* Fake a preemption event; failed of course */
> > -	spin_lock_irq(&engine->active.lock);
> > +	spin_lock_irq(&engine->sched_engine->lock);
> >  	__unwind_incomplete_requests(engine);
> > -	spin_unlock_irq(&engine->active.lock);
> > +	spin_unlock_irq(&engine->sched_engine->lock);
> >  	GEM_BUG_ON(rq->engine != engine);
> >  
> >  	/* Reset the engine while keeping our active request on hold */
> > @@ -4612,7 +4612,7 @@ static int reset_virtual_engine(struct intel_gt *gt,
> >  	GEM_BUG_ON(rq->fence.error != -EIO);
> >  
> >  	/* Release our grasp on the engine, letting CS flow again */
> > -	tasklet_enable(&engine->execlists.tasklet);
> > +	tasklet_enable(&engine->sched_engine->tasklet);
> >  	clear_and_wake_up_bit(I915_RESET_ENGINE + engine->id, &gt->reset.flags);
> >  	local_bh_enable();
> >  
> > diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> > index 5b63d4df8c93..cbcb800e2ca0 100644
> > --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> > +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> > @@ -858,12 +858,12 @@ static int active_engine(void *data)
> >  		rq[idx] = i915_request_get(new);
> >  		i915_request_add(new);
> >  
> > -		if (engine->schedule && arg->flags & TEST_PRIORITY) {
> > +		if (engine->sched_engine->schedule && arg->flags & TEST_PRIORITY) {
> >  			struct i915_sched_attr attr = {
> >  				.priority =
> >  					i915_prandom_u32_max_state(512, &prng),
> >  			};
> > -			engine->schedule(rq[idx], &attr);
> > +			engine->sched_engine->schedule(rq[idx], &attr);
> >  		}
> >  
> >  		err = active_request_put(old);
> > @@ -1702,7 +1702,7 @@ static int __igt_atomic_reset_engine(struct intel_engine_cs *engine,
> >  				     const struct igt_atomic_section *p,
> >  				     const char *mode)
> >  {
> > -	struct tasklet_struct * const t = &engine->execlists.tasklet;
> > +	struct tasklet_struct * const t = &engine->sched_engine->tasklet;
> >  	int err;
> >  
> >  	GEM_TRACE("i915_reset_engine(%s:%s) under %s\n",
> > diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c
> > index d8f6623524e8..5b40def7cd9d 100644
> > --- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
> > +++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
> > @@ -49,7 +49,7 @@ static int wait_for_submit(struct intel_engine_cs *engine,
> >  			   unsigned long timeout)
> >  {
> >  	/* Ignore our own attempts to suppress excess tasklets */
> > -	tasklet_hi_schedule(&engine->execlists.tasklet);
> > +	i915_sched_engine_hi_kick(engine->sched_engine);
> >  
> >  	timeout += jiffies;
> >  	do {
> > @@ -1613,12 +1613,12 @@ static void garbage_reset(struct intel_engine_cs *engine,
> >  
> >  	local_bh_disable();
> >  	if (!test_and_set_bit(bit, lock)) {
> > -		tasklet_disable(&engine->execlists.tasklet);
> > +		tasklet_disable(&engine->sched_engine->tasklet);
> >  
> >  		if (!rq->fence.error)
> >  			__intel_engine_reset_bh(engine, NULL);
> >  
> > -		tasklet_enable(&engine->execlists.tasklet);
> > +		tasklet_enable(&engine->sched_engine->tasklet);
> >  		clear_and_wake_up_bit(bit, lock);
> >  	}
> >  	local_bh_enable();
> > diff --git a/drivers/gpu/drm/i915/gt/selftest_reset.c b/drivers/gpu/drm/i915/gt/selftest_reset.c
> > index 8784257ec808..7a50c9f4071b 100644
> > --- a/drivers/gpu/drm/i915/gt/selftest_reset.c
> > +++ b/drivers/gpu/drm/i915/gt/selftest_reset.c
> > @@ -321,7 +321,7 @@ static int igt_atomic_engine_reset(void *arg)
> >  		goto out_unlock;
> >  
> >  	for_each_engine(engine, gt, id) {
> > -		struct tasklet_struct *t = &engine->execlists.tasklet;
> > +		struct tasklet_struct *t = &engine->sched_engine->tasklet;
> >  
> >  		if (t->func)
> >  			tasklet_disable(t);
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index 38cda5d599a6..b8f9c71af13e 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -181,6 +181,7 @@ static void schedule_out(struct i915_request *rq)
> >  
> >  static void __guc_dequeue(struct intel_engine_cs *engine)
> >  {
> > +	struct i915_sched_engine * const sched_engine = engine->sched_engine;
> >  	struct intel_engine_execlists * const execlists = &engine->execlists;
> >  	struct i915_request **first = execlists->inflight;
> >  	struct i915_request ** const last_port = first + execlists->port_mask;
> > @@ -189,7 +190,7 @@ static void __guc_dequeue(struct intel_engine_cs *engine)
> >  	bool submit = false;
> >  	struct rb_node *rb;
> >  
> > -	lockdep_assert_held(&engine->active.lock);
> > +	lockdep_assert_held(&engine->sched_engine->lock);
> >  
> >  	if (last) {
> >  		if (*++first)
> > @@ -204,7 +205,7 @@ static void __guc_dequeue(struct intel_engine_cs *engine)
> >  	 * event.
> >  	 */
> >  	port = first;
> > -	while ((rb = rb_first_cached(&execlists->queue))) {
> > +	while ((rb = rb_first_cached(&sched_engine->queue))) {
> >  		struct i915_priolist *p = to_priolist(rb);
> >  		struct i915_request *rq, *rn;
> >  
> > @@ -224,11 +225,11 @@ static void __guc_dequeue(struct intel_engine_cs *engine)
> >  			last = rq;
> >  		}
> >  
> > -		rb_erase_cached(&p->node, &execlists->queue);
> > +		rb_erase_cached(&p->node, &sched_engine->queue);
> >  		i915_priolist_free(p);
> >  	}
> >  done:
> > -	execlists->queue_priority_hint =
> > +	sched_engine->queue_priority_hint =
> >  		rb ? to_priolist(rb)->priority : INT_MIN;
> >  	if (submit) {
> >  		*port = schedule_in(last, port - execlists->inflight);
> > @@ -240,13 +241,14 @@ static void __guc_dequeue(struct intel_engine_cs *engine)
> >  
> >  static void guc_submission_tasklet(struct tasklet_struct *t)
> >  {
> > -	struct intel_engine_cs * const engine =
> > -		from_tasklet(engine, t, execlists.tasklet);
> > +	struct i915_sched_engine *sched_engine =
> > +		from_tasklet(sched_engine, t, tasklet);
> > +	struct intel_engine_cs * const engine = sched_engine->engine;
> >  	struct intel_engine_execlists * const execlists = &engine->execlists;
> >  	struct i915_request **port, *rq;
> >  	unsigned long flags;
> >  
> > -	spin_lock_irqsave(&engine->active.lock, flags);
> > +	spin_lock_irqsave(&engine->sched_engine->lock, flags);
> >  
> >  	for (port = execlists->inflight; (rq = *port); port++) {
> >  		if (!i915_request_completed(rq))
> > @@ -262,20 +264,22 @@ static void guc_submission_tasklet(struct tasklet_struct *t)
> >  
> >  	__guc_dequeue(engine);
> >  
> > -	spin_unlock_irqrestore(&engine->active.lock, flags);
> > +	i915_sched_engine_reset_on_empty(engine->sched_engine);
> > +
> > +	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
> >  }
> >  
> >  static void cs_irq_handler(struct intel_engine_cs *engine, u16 iir)
> >  {
> >  	if (iir & GT_RENDER_USER_INTERRUPT) {
> >  		intel_engine_signal_breadcrumbs(engine);
> > -		tasklet_hi_schedule(&engine->execlists.tasklet);
> > +		i915_sched_engine_hi_kick(engine->sched_engine);
> >  	}
> >  }
> >  
> >  static void guc_reset_prepare(struct intel_engine_cs *engine)
> >  {
> > -	struct intel_engine_execlists * const execlists = &engine->execlists;
> > +	struct i915_sched_engine * const sched_engine = engine->sched_engine;
> >  
> >  	ENGINE_TRACE(engine, "\n");
> >  
> > @@ -283,12 +287,12 @@ static void guc_reset_prepare(struct intel_engine_cs *engine)
> >  	 * Prevent request submission to the hardware until we have
> >  	 * completed the reset in i915_gem_reset_finish(). If a request
> >  	 * is completed by one engine, it may then queue a request
> > -	 * to a second via its execlists->tasklet *just* as we are
> > +	 * to a second via its sched_engine->tasklet *just* as we are
> >  	 * calling engine->init_hw() and also writing the ELSP.
> > -	 * Turning off the execlists->tasklet until the reset is over
> > +	 * Turning off the sched_engine->tasklet until the reset is over
> >  	 * prevents the race.
> >  	 */
> > -	__tasklet_disable_sync_once(&execlists->tasklet);
> > +	__tasklet_disable_sync_once(&sched_engine->tasklet);
> >  }
> >  
> >  static void guc_reset_state(struct intel_context *ce,
> > @@ -319,7 +323,7 @@ static void guc_reset_rewind(struct intel_engine_cs *engine, bool stalled)
> >  	struct i915_request *rq;
> >  	unsigned long flags;
> >  
> > -	spin_lock_irqsave(&engine->active.lock, flags);
> > +	spin_lock_irqsave(&engine->sched_engine->lock, flags);
> >  
> >  	/* Push back any incomplete requests for replay after the reset. */
> >  	rq = execlists_unwind_incomplete_requests(execlists);
> > @@ -333,12 +337,12 @@ static void guc_reset_rewind(struct intel_engine_cs *engine, bool stalled)
> >  	guc_reset_state(rq->context, engine, rq->head, stalled);
> >  
> >  out_unlock:
> > -	spin_unlock_irqrestore(&engine->active.lock, flags);
> > +	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
> >  }
> >  
> >  static void guc_reset_cancel(struct intel_engine_cs *engine)
> >  {
> > -	struct intel_engine_execlists * const execlists = &engine->execlists;
> > +	struct i915_sched_engine * const sched_engine = engine->sched_engine;
> >  	struct i915_request *rq, *rn;
> >  	struct rb_node *rb;
> >  	unsigned long flags;
> > @@ -359,16 +363,16 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
> >  	 * submission's irq state, we also wish to remind ourselves that
> >  	 * it is irq state.)
> >  	 */
> > -	spin_lock_irqsave(&engine->active.lock, flags);
> > +	spin_lock_irqsave(&engine->sched_engine->lock, flags);
> >  
> >  	/* Mark all executing requests as skipped. */
> > -	list_for_each_entry(rq, &engine->active.requests, sched.link) {
> > +	list_for_each_entry(rq, &engine->sched_engine->requests, sched.link) {
> >  		i915_request_set_error_once(rq, -EIO);
> >  		i915_request_mark_complete(rq);
> >  	}
> >  
> >  	/* Flush the queued requests to the timeline list (for retiring). */
> > -	while ((rb = rb_first_cached(&execlists->queue))) {
> > +	while ((rb = rb_first_cached(&sched_engine->queue))) {
> >  		struct i915_priolist *p = to_priolist(rb);
> >  
> >  		priolist_for_each_request_consume(rq, rn, p) {
> > @@ -378,28 +382,28 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
> >  			i915_request_mark_complete(rq);
> >  		}
> >  
> > -		rb_erase_cached(&p->node, &execlists->queue);
> > +		rb_erase_cached(&p->node, &sched_engine->queue);
> >  		i915_priolist_free(p);
> >  	}
> >  
> >  	/* Remaining _unready_ requests will be nop'ed when submitted */
> >  
> > -	execlists->queue_priority_hint = INT_MIN;
> > -	execlists->queue = RB_ROOT_CACHED;
> > +	sched_engine->queue_priority_hint = INT_MIN;
> > +	sched_engine->queue = RB_ROOT_CACHED;
> >  
> > -	spin_unlock_irqrestore(&engine->active.lock, flags);
> > +	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
> >  }
> >  
> >  static void guc_reset_finish(struct intel_engine_cs *engine)
> >  {
> > -	struct intel_engine_execlists * const execlists = &engine->execlists;
> > +	struct i915_sched_engine * const sched_engine = engine->sched_engine;
> >  
> > -	if (__tasklet_enable(&execlists->tasklet))
> > +	if (__tasklet_enable(&sched_engine->tasklet))
> >  		/* And kick in case we missed a new request submission. */
> > -		tasklet_hi_schedule(&execlists->tasklet);
> > +		i915_sched_engine_hi_kick(sched_engine);
> >  
> >  	ENGINE_TRACE(engine, "depth->%d\n",
> > -		     atomic_read(&execlists->tasklet.count));
> > +		     atomic_read(&sched_engine->tasklet.count));
> >  }
> >  
> >  /*
> > @@ -500,7 +504,7 @@ static inline void queue_request(struct intel_engine_cs *engine,
> >  {
> >  	GEM_BUG_ON(!list_empty(&rq->sched.link));
> >  	list_add_tail(&rq->sched.link,
> > -		      i915_sched_lookup_priolist(engine, prio));
> > +		      i915_sched_lookup_priolist(engine->sched_engine, prio));
> >  	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> >  }
> >  
> > @@ -510,16 +514,16 @@ static void guc_submit_request(struct i915_request *rq)
> >  	unsigned long flags;
> >  
> >  	/* Will be called from irq-context when using foreign fences. */
> > -	spin_lock_irqsave(&engine->active.lock, flags);
> > +	spin_lock_irqsave(&engine->sched_engine->lock, flags);
> >  
> >  	queue_request(engine, rq, rq_prio(rq));
> >  
> > -	GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root));
> > +	GEM_BUG_ON(i915_sched_engine_is_empty(engine->sched_engine));
> >  	GEM_BUG_ON(list_empty(&rq->sched.link));
> >  
> > -	tasklet_hi_schedule(&engine->execlists.tasklet);
> > +	i915_sched_engine_hi_kick(engine->sched_engine);
> >  
> > -	spin_unlock_irqrestore(&engine->active.lock, flags);
> > +	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
> >  }
> >  
> >  static void sanitize_hwsp(struct intel_engine_cs *engine)
> > @@ -597,7 +601,7 @@ static void guc_release(struct intel_engine_cs *engine)
> >  {
> >  	engine->sanitize = NULL; /* no longer in control, nothing to sanitize */
> >  
> > -	tasklet_kill(&engine->execlists.tasklet);
> > +	tasklet_kill(&engine->sched_engine->tasklet);
> >  
> >  	intel_engine_cleanup_common(engine);
> >  	lrc_fini_wa_ctx(engine);
> > @@ -612,7 +616,7 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
> >  	engine->cops = &guc_context_ops;
> >  	engine->request_alloc = guc_request_alloc;
> >  
> > -	engine->schedule = i915_schedule;
> > +	engine->sched_engine->schedule = i915_schedule;
> >  
> >  	engine->reset.prepare = guc_reset_prepare;
> >  	engine->reset.rewind = guc_reset_rewind;
> > @@ -676,7 +680,8 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
> >  	 */
> >  	GEM_BUG_ON(INTEL_GEN(i915) < 11);
> >  
> > -	tasklet_setup(&engine->execlists.tasklet, guc_submission_tasklet);
> > +	tasklet_setup(&engine->sched_engine->tasklet, guc_submission_tasklet);
> > +	engine->sched_engine->schedule = i915_schedule;
> >  
> >  	guc_default_vfuncs(engine);
> >  	guc_default_irqs(engine);
> > diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> > index bb181fe5d47e..3352f56bcf63 100644
> > --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> > +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> > @@ -1247,7 +1247,8 @@ static void record_request(const struct i915_request *request,
> >  
> >  static void engine_record_execlists(struct intel_engine_coredump *ee)
> >  {
> > -	const struct intel_engine_execlists * const el = &ee->engine->execlists;
> > +	const struct intel_engine_execlists * const el =
> > +		&ee->engine->execlists;
> >  	struct i915_request * const *port = el->active;
> >  	unsigned int n = 0;
> >  
> > @@ -1441,12 +1442,12 @@ capture_engine(struct intel_engine_cs *engine,
> >  	if (!ee)
> >  		return NULL;
> >  
> > -	spin_lock_irqsave(&engine->active.lock, flags);
> > +	spin_lock_irqsave(&engine->sched_engine->lock, flags);
> >  	rq = intel_engine_find_active_request(engine);
> >  	if (rq)
> >  		capture = intel_engine_coredump_add_request(ee, rq,
> >  							    ATOMIC_MAYFAIL);
> > -	spin_unlock_irqrestore(&engine->active.lock, flags);
> > +	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
> >  	if (!capture) {
> >  		kfree(ee);
> >  		return NULL;
> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > index 970d8f4986bb..4c0df56e3b86 100644
> > --- a/drivers/gpu/drm/i915/i915_request.c
> > +++ b/drivers/gpu/drm/i915/i915_request.c
> > @@ -272,11 +272,11 @@ i915_request_active_engine(struct i915_request *rq,
> >  	 * check that we have acquired the lock on the final engine.
> >  	 */
> >  	locked = READ_ONCE(rq->engine);
> > -	spin_lock_irq(&locked->active.lock);
> > +	spin_lock_irq(&locked->sched_engine->lock);
> >  	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
> > -		spin_unlock(&locked->active.lock);
> > +		spin_unlock(&locked->sched_engine->lock);
> >  		locked = engine;
> > -		spin_lock(&locked->active.lock);
> > +		spin_lock(&locked->sched_engine->lock);
> >  	}
> >  
> >  	if (i915_request_is_active(rq)) {
> > @@ -285,7 +285,7 @@ i915_request_active_engine(struct i915_request *rq,
> >  		ret = true;
> >  	}
> >  
> > -	spin_unlock_irq(&locked->active.lock);
> > +	spin_unlock_irq(&locked->sched_engine->lock);
> >  
> >  	return ret;
> >  }
> > @@ -302,10 +302,10 @@ static void remove_from_engine(struct i915_request *rq)
> >  	 * check that the rq still belongs to the newly locked engine.
> >  	 */
> >  	locked = READ_ONCE(rq->engine);
> > -	spin_lock_irq(&locked->active.lock);
> > +	spin_lock_irq(&locked->sched_engine->lock);
> >  	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
> > -		spin_unlock(&locked->active.lock);
> > -		spin_lock(&engine->active.lock);
> > +		spin_unlock(&locked->sched_engine->lock);
> > +		spin_lock(&engine->sched_engine->lock);
> >  		locked = engine;
> >  	}
> >  	list_del_init(&rq->sched.link);
> > @@ -316,7 +316,7 @@ static void remove_from_engine(struct i915_request *rq)
> >  	/* Prevent further __await_execution() registering a cb, then flush */
> >  	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
> >  
> > -	spin_unlock_irq(&locked->active.lock);
> > +	spin_unlock_irq(&locked->sched_engine->lock);
> >  
> >  	__notify_execute_cb_imm(rq);
> >  }
> > @@ -481,7 +481,7 @@ static bool __request_in_flight(const struct i915_request *signal)
> >  	 * may either perform a context switch to the second inflight execlists,
> >  	 * or it may switch to the pending set of execlists. In the case of the
> >  	 * latter, it may send the ACK and we process the event copying the
> > -	 * pending[] over top of inflight[], _overwriting_ our *active. Since
> > +	 * pending[] over top of inflight[], _overwriting_ our *active-> Since
> >  	 * this implies the HW is arbitrating and not struck in *active, we do
> >  	 * not worry about complete accuracy, but we do require no read/write
> >  	 * tearing of the pointer [the read of the pointer must be valid, even
> > @@ -490,7 +490,7 @@ static bool __request_in_flight(const struct i915_request *signal)
> >  	 *
> >  	 * Note that the read of *execlists->active may race with the promotion
> >  	 * of execlists->pending[] to execlists->inflight[], overwritting
> > -	 * the value at *execlists->active. This is fine. The promotion implies
> > +	 * the value at *execlists->active-> This is fine. The promotion implies
> >  	 * that we received an ACK from the HW, and so the context is not
> >  	 * stuck -- if we do not see ourselves in *active, the inflight status
> >  	 * is valid. If instead we see ourselves being copied into *active,
> > @@ -545,7 +545,7 @@ __await_execution(struct i915_request *rq,
> >  
> >  	/*
> >  	 * Register the callback first, then see if the signaler is already
> > -	 * active. This ensures that if we race with the
> > +	 * active-> This ensures that if we race with the
> >  	 * __notify_execute_cb from i915_request_submit() and we are not
> >  	 * included in that list, we get a second bite of the cherry and
> >  	 * execute it ourselves. After this point, a future
> > @@ -637,7 +637,7 @@ bool __i915_request_submit(struct i915_request *request)
> >  	RQ_TRACE(request, "\n");
> >  
> >  	GEM_BUG_ON(!irqs_disabled());
> > -	lockdep_assert_held(&engine->active.lock);
> > +	lockdep_assert_held(&engine->sched_engine->lock);
> >  
> >  	/*
> >  	 * With the advent of preempt-to-busy, we frequently encounter
> > @@ -649,9 +649,9 @@ bool __i915_request_submit(struct i915_request *request)
> >  	 *
> >  	 * We must remove the request from the caller's priority queue,
> >  	 * and the caller must only call us when the request is in their
> > -	 * priority queue, under the active.lock. This ensures that the
> > +	 * priority queue, under the active->lock. This ensures that the
> >  	 * request has *not* yet been retired and we can safely move
> > -	 * the request into the engine->active.list where it will be
> > +	 * the request into the engine->sched_engine->list where it will be
> >  	 * dropped upon retiring. (Otherwise if resubmit a *retired*
> >  	 * request, this would be a horrible use-after-free.)
> >  	 */
> > @@ -694,7 +694,7 @@ bool __i915_request_submit(struct i915_request *request)
> >  	result = true;
> >  
> >  	GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
> > -	list_move_tail(&request->sched.link, &engine->active.requests);
> > +	list_move_tail(&request->sched.link, &engine->sched_engine->requests);
> >  active:
> >  	clear_bit(I915_FENCE_FLAG_PQUEUE, &request->fence.flags);
> >  	set_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags);
> > @@ -724,11 +724,11 @@ void i915_request_submit(struct i915_request *request)
> >  	unsigned long flags;
> >  
> >  	/* Will be called from irq-context when using foreign fences. */
> > -	spin_lock_irqsave(&engine->active.lock, flags);
> > +	spin_lock_irqsave(&engine->sched_engine->lock, flags);
> >  
> >  	__i915_request_submit(request);
> >  
> > -	spin_unlock_irqrestore(&engine->active.lock, flags);
> > +	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
> >  }
> >  
> >  void __i915_request_unsubmit(struct i915_request *request)
> > @@ -742,7 +742,7 @@ void __i915_request_unsubmit(struct i915_request *request)
> >  	RQ_TRACE(request, "\n");
> >  
> >  	GEM_BUG_ON(!irqs_disabled());
> > -	lockdep_assert_held(&engine->active.lock);
> > +	lockdep_assert_held(&engine->sched_engine->lock);
> >  
> >  	/*
> >  	 * Before we remove this breadcrumb from the signal list, we have
> > @@ -775,11 +775,11 @@ void i915_request_unsubmit(struct i915_request *request)
> >  	unsigned long flags;
> >  
> >  	/* Will be called from irq-context when using foreign fences. */
> > -	spin_lock_irqsave(&engine->active.lock, flags);
> > +	spin_lock_irqsave(&engine->sched_engine->lock, flags);
> >  
> >  	__i915_request_unsubmit(request);
> >  
> > -	spin_unlock_irqrestore(&engine->active.lock, flags);
> > +	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
> >  }
> >  
> >  static void __cancel_request(struct i915_request *rq)
> > @@ -1343,7 +1343,7 @@ __i915_request_await_execution(struct i915_request *to,
> >  	}
> >  
> >  	/* Couple the dependency tree for PI on this exposed to->fence */
> > -	if (to->engine->schedule) {
> > +	if (to->engine->sched_engine->schedule) {
> >  		err = i915_sched_node_add_dependency(&to->sched,
> >  						     &from->sched,
> >  						     I915_DEPENDENCY_WEAK);
> > @@ -1484,7 +1484,7 @@ i915_request_await_request(struct i915_request *to, struct i915_request *from)
> >  		return 0;
> >  	}
> >  
> > -	if (to->engine->schedule) {
> > +	if (to->engine->sched_engine->schedule) {
> >  		ret = i915_sched_node_add_dependency(&to->sched,
> >  						     &from->sched,
> >  						     I915_DEPENDENCY_EXTERNAL);
> > @@ -1671,7 +1671,7 @@ __i915_request_add_to_timeline(struct i915_request *rq)
> >  			__i915_sw_fence_await_dma_fence(&rq->submit,
> >  							&prev->fence,
> >  							&rq->dmaq);
> > -		if (rq->engine->schedule)
> > +		if (rq->engine->sched_engine->schedule)
> >  			__i915_sched_node_add_dependency(&rq->sched,
> >  							 &prev->sched,
> >  							 &rq->dep,
> > @@ -1743,8 +1743,8 @@ void __i915_request_queue(struct i915_request *rq,
> >  	 * decide whether to preempt the entire chain so that it is ready to
> >  	 * run at the earliest possible convenience.
> >  	 */
> > -	if (attr && rq->engine->schedule)
> > -		rq->engine->schedule(rq, attr);
> > +	if (attr && rq->engine->sched_engine->schedule)
> > +		rq->engine->sched_engine->schedule(rq, attr);
> >  
> >  	local_bh_disable();
> >  	__i915_request_queue_bh(rq);
> > diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
> > index 270f6cd37650..239964bec1fa 100644
> > --- a/drivers/gpu/drm/i915/i915_request.h
> > +++ b/drivers/gpu/drm/i915/i915_request.h
> > @@ -613,7 +613,7 @@ i915_request_active_timeline(const struct i915_request *rq)
> >  	 * this submission.
> >  	 */
> >  	return rcu_dereference_protected(rq->timeline,
> > -					 lockdep_is_held(&rq->engine->active.lock));
> > +					 lockdep_is_held(&rq->engine->sched_engine->lock));
> >  }
> >  
> >  static inline u32
> > diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
> > index efa638c3acc7..28d403a8d7d2 100644
> > --- a/drivers/gpu/drm/i915/i915_scheduler.c
> > +++ b/drivers/gpu/drm/i915/i915_scheduler.c
> > @@ -40,7 +40,7 @@ static inline struct i915_priolist *to_priolist(struct rb_node *rb)
> >  	return rb_entry(rb, struct i915_priolist, node);
> >  }
> >  
> > -static void assert_priolists(struct intel_engine_execlists * const execlists)
> > +static void assert_priolists(struct i915_sched_engine * const sched_engine)
> >  {
> >  	struct rb_node *rb;
> >  	long last_prio;
> > @@ -48,11 +48,11 @@ static void assert_priolists(struct intel_engine_execlists * const execlists)
> >  	if (!IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
> >  		return;
> >  
> > -	GEM_BUG_ON(rb_first_cached(&execlists->queue) !=
> > -		   rb_first(&execlists->queue.rb_root));
> > +	GEM_BUG_ON(rb_first_cached(&sched_engine->queue) !=
> > +		   rb_first(&sched_engine->queue.rb_root));
> >  
> >  	last_prio = INT_MAX;
> > -	for (rb = rb_first_cached(&execlists->queue); rb; rb = rb_next(rb)) {
> > +	for (rb = rb_first_cached(&sched_engine->queue); rb; rb = rb_next(rb)) {
> >  		const struct i915_priolist *p = to_priolist(rb);
> >  
> >  		GEM_BUG_ON(p->priority > last_prio);
> > @@ -61,23 +61,22 @@ static void assert_priolists(struct intel_engine_execlists * const execlists)
> >  }
> >  
> >  struct list_head *
> > -i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio)
> > +i915_sched_lookup_priolist(struct i915_sched_engine *sched_engine, int prio)
> >  {
> > -	struct intel_engine_execlists * const execlists = &engine->execlists;
> >  	struct i915_priolist *p;
> >  	struct rb_node **parent, *rb;
> >  	bool first = true;
> >  
> > -	lockdep_assert_held(&engine->active.lock);
> > -	assert_priolists(execlists);
> > +	lockdep_assert_held(&sched_engine->lock);
> > +	assert_priolists(sched_engine);
> >  
> > -	if (unlikely(execlists->no_priolist))
> > +	if (unlikely(sched_engine->no_priolist))
> >  		prio = I915_PRIORITY_NORMAL;
> >  
> >  find_priolist:
> >  	/* most positive priority is scheduled first, equal priorities fifo */
> >  	rb = NULL;
> > -	parent = &execlists->queue.rb_root.rb_node;
> > +	parent = &sched_engine->queue.rb_root.rb_node;
> >  	while (*parent) {
> >  		rb = *parent;
> >  		p = to_priolist(rb);
> > @@ -92,7 +91,7 @@ i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio)
> >  	}
> >  
> >  	if (prio == I915_PRIORITY_NORMAL) {
> > -		p = &execlists->default_priolist;
> > +		p = &sched_engine->default_priolist;
> >  	} else {
> >  		p = kmem_cache_alloc(global.slab_priorities, GFP_ATOMIC);
> >  		/* Convert an allocation failure to a priority bump */
> > @@ -107,7 +106,7 @@ i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio)
> >  			 * requests, so if userspace lied about their
> >  			 * dependencies that reordering may be visible.
> >  			 */
> > -			execlists->no_priolist = true;
> > +			sched_engine->no_priolist = true;
> >  			goto find_priolist;
> >  		}
> >  	}
> > @@ -116,7 +115,7 @@ i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio)
> >  	INIT_LIST_HEAD(&p->requests);
> >  
> >  	rb_link_node(&p->node, rb, parent);
> > -	rb_insert_color_cached(&p->node, &execlists->queue, first);
> > +	rb_insert_color_cached(&p->node, &sched_engine->queue, first);
> >  
> >  	return &p->requests;
> >  }
> > @@ -130,13 +129,13 @@ struct sched_cache {
> >  	struct list_head *priolist;
> >  };
> >  
> > -static struct intel_engine_cs *
> > -sched_lock_engine(const struct i915_sched_node *node,
> > -		  struct intel_engine_cs *locked,
> > +static struct i915_sched_engine *
> > +lock_sched_engine(struct i915_sched_node *node,
> > +		  struct i915_sched_engine *locked,
> >  		  struct sched_cache *cache)
> >  {
> >  	const struct i915_request *rq = node_to_request(node);
> > -	struct intel_engine_cs *engine;
> > +	struct i915_sched_engine *sched_engine;
> >  
> >  	GEM_BUG_ON(!locked);
> >  
> > @@ -146,81 +145,22 @@ sched_lock_engine(const struct i915_sched_node *node,
> >  	 * engine lock. The simple ploy we use is to take the lock then
> >  	 * check that the rq still belongs to the newly locked engine.
> >  	 */
> > -	while (locked != (engine = READ_ONCE(rq->engine))) {
> > -		spin_unlock(&locked->active.lock);
> > +	while (locked != (sched_engine = rq->engine->sched_engine)) {
> > +		spin_unlock(&locked->lock);
> >  		memset(cache, 0, sizeof(*cache));
> > -		spin_lock(&engine->active.lock);
> > -		locked = engine;
> > +		spin_lock(&sched_engine->lock);
> > +		locked = sched_engine;
> >  	}
> >  
> > -	GEM_BUG_ON(locked != engine);
> > +	GEM_BUG_ON(locked != sched_engine);
> >  	return locked;
> >  }
> >  
> > -static inline int rq_prio(const struct i915_request *rq)
> > -{
> > -	return rq->sched.attr.priority;
> > -}
> > -
> > -static inline bool need_preempt(int prio, int active)
> > -{
> > -	/*
> > -	 * Allow preemption of low -> normal -> high, but we do
> > -	 * not allow low priority tasks to preempt other low priority
> > -	 * tasks under the impression that latency for low priority
> > -	 * tasks does not matter (as much as background throughput),
> > -	 * so kiss.
> > -	 */
> > -	return prio >= max(I915_PRIORITY_NORMAL, active);
> > -}
> > -
> > -static void kick_submission(struct intel_engine_cs *engine,
> > -			    const struct i915_request *rq,
> > -			    int prio)
> > -{
> > -	const struct i915_request *inflight;
> > -
> > -	/*
> > -	 * We only need to kick the tasklet once for the high priority
> > -	 * new context we add into the queue.
> > -	 */
> > -	if (prio <= engine->execlists.queue_priority_hint)
> > -		return;
> > -
> > -	rcu_read_lock();
> > -
> > -	/* Nothing currently active? We're overdue for a submission! */
> > -	inflight = execlists_active(&engine->execlists);
> > -	if (!inflight)
> > -		goto unlock;
> > -
> > -	/*
> > -	 * If we are already the currently executing context, don't
> > -	 * bother evaluating if we should preempt ourselves.
> > -	 */
> > -	if (inflight->context == rq->context)
> > -		goto unlock;
> > -
> > -	ENGINE_TRACE(engine,
> > -		     "bumping queue-priority-hint:%d for rq:%llx:%lld, inflight:%llx:%lld prio %d\n",
> > -		     prio,
> > -		     rq->fence.context, rq->fence.seqno,
> > -		     inflight->fence.context, inflight->fence.seqno,
> > -		     inflight->sched.attr.priority);
> > -
> > -	engine->execlists.queue_priority_hint = prio;
> > -	if (need_preempt(prio, rq_prio(inflight)))
> > -		tasklet_hi_schedule(&engine->execlists.tasklet);
> > -
> > -unlock:
> > -	rcu_read_unlock();
> > -}
> > -
> >  static void __i915_schedule(struct i915_sched_node *node,
> >  			    const struct i915_sched_attr *attr)
> >  {
> >  	const int prio = max(attr->priority, node->attr.priority);
> > -	struct intel_engine_cs *engine;
> > +	struct i915_sched_engine *sched_engine;
> >  	struct i915_dependency *dep, *p;
> >  	struct i915_dependency stack;
> >  	struct sched_cache cache;
> > @@ -295,23 +235,24 @@ static void __i915_schedule(struct i915_sched_node *node,
> >  	}
> >  
> >  	memset(&cache, 0, sizeof(cache));
> > -	engine = node_to_request(node)->engine;
> > -	spin_lock(&engine->active.lock);
> > +	sched_engine = node_to_request(node)->engine->sched_engine;
> > +	spin_lock(&sched_engine->lock);
> >  
> >  	/* Fifo and depth-first replacement ensure our deps execute before us */
> > -	engine = sched_lock_engine(node, engine, &cache);
> > +	sched_engine = lock_sched_engine(node, sched_engine, &cache);
> >  	list_for_each_entry_safe_reverse(dep, p, &dfs, dfs_link) {
> >  		INIT_LIST_HEAD(&dep->dfs_link);
> >  
> >  		node = dep->signaler;
> > -		engine = sched_lock_engine(node, engine, &cache);
> > -		lockdep_assert_held(&engine->active.lock);
> > +		sched_engine = lock_sched_engine(node, sched_engine, &cache);
> > +		lockdep_assert_held(&sched_engine->lock);
> >  
> >  		/* Recheck after acquiring the engine->timeline.lock */
> >  		if (prio <= node->attr.priority || node_signaled(node))
> >  			continue;
> >  
> > -		GEM_BUG_ON(node_to_request(node)->engine != engine);
> > +		GEM_BUG_ON(node_to_request(node)->engine->sched_engine !=
> > +			   sched_engine);
> >  
> >  		WRITE_ONCE(node->attr.priority, prio);
> >  
> > @@ -329,16 +270,17 @@ static void __i915_schedule(struct i915_sched_node *node,
> >  		if (i915_request_in_priority_queue(node_to_request(node))) {
> >  			if (!cache.priolist)
> >  				cache.priolist =
> > -					i915_sched_lookup_priolist(engine,
> > +					i915_sched_lookup_priolist(sched_engine,
> >  								   prio);
> >  			list_move_tail(&node->link, cache.priolist);
> >  		}
> >  
> >  		/* Defer (tasklet) submission until after all of our updates. */
> > -		kick_submission(engine, node_to_request(node), prio);
> > +		if (sched_engine->kick_backend)
> > +			sched_engine->kick_backend(node_to_request(node), prio);
> >  	}
> >  
> > -	spin_unlock(&engine->active.lock);
> > +	spin_unlock(&sched_engine->lock);
> >  }
> >  
> >  void i915_schedule(struct i915_request *rq, const struct i915_sched_attr *attr)
> > @@ -489,6 +431,50 @@ void i915_request_show_with_schedule(struct drm_printer *m,
> >  	rcu_read_unlock();
> >  }
> >  
> > +void i915_sched_engine_free(struct kref *kref)
> > +{
> > +	struct i915_sched_engine *sched_engine =
> > +		container_of(kref, typeof(*sched_engine), ref);
> > +
> > +	i915_sched_engine_kill(sched_engine); /* flush the callback */
> > +	kfree(sched_engine);
> > +}
> > +
> > +struct i915_sched_engine *
> > +i915_sched_engine_create(unsigned int subclass)
> > +{
> > +	struct i915_sched_engine *sched_engine;
> > +
> > +	sched_engine = kzalloc(sizeof(*sched_engine), GFP_KERNEL);
> > +	if (!sched_engine)
> > +		return NULL;
> > +
> > +	kref_init(&sched_engine->ref);
> > +
> > +	sched_engine->queue = RB_ROOT_CACHED;
> > +	sched_engine->queue_priority_hint = INT_MIN;
> > +
> > +	INIT_LIST_HEAD(&sched_engine->requests);
> > +	INIT_LIST_HEAD(&sched_engine->hold);
> > +
> > +	spin_lock_init(&sched_engine->lock);
> > +	lockdep_set_subclass(&sched_engine->lock, subclass);
> > +
> > +	/*
> > +	 * Due to an interesting quirk in lockdep's internal debug tracking,
> > +	 * after setting a subclass we must ensure the lock is used. Otherwise,
> > +	 * nr_unused_locks is incremented once too often.
> > +	 */
> > +#ifdef CONFIG_DEBUG_LOCK_ALLOC
> > +	local_irq_disable();
> > +	lock_map_acquire(&sched_engine->lock.dep_map);
> > +	lock_map_release(&sched_engine->lock.dep_map);
> > +	local_irq_enable();
> > +#endif
> > +
> > +	return sched_engine;
> > +}
> > +
> >  static void i915_global_scheduler_shrink(void)
> >  {
> >  	kmem_cache_shrink(global.slab_dependencies);
> > diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
> > index 858a0938f47a..a78b1f50ecb4 100644
> > --- a/drivers/gpu/drm/i915/i915_scheduler.h
> > +++ b/drivers/gpu/drm/i915/i915_scheduler.h
> > @@ -39,7 +39,7 @@ void i915_schedule(struct i915_request *request,
> >  		   const struct i915_sched_attr *attr);
> >  
> >  struct list_head *
> > -i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio);
> > +i915_sched_lookup_priolist(struct i915_sched_engine *sched_engine, int prio);
> >  
> >  void __i915_priolist_free(struct i915_priolist *p);
> >  static inline void i915_priolist_free(struct i915_priolist *p)
> > @@ -53,4 +53,67 @@ void i915_request_show_with_schedule(struct drm_printer *m,
> >  				     const char *prefix,
> >  				     int indent);
> >  
> > +struct i915_sched_engine *
> > +i915_sched_engine_create(unsigned int subclass);
> > +
> > +void i915_sched_engine_free(struct kref *kref);
> > +
> > +static inline struct i915_sched_engine *
> > +i915_sched_engine_get(struct i915_sched_engine *sched_engine)
> > +{
> > +	kref_get(&sched_engine->ref);
> > +	return sched_engine;
> > +}
> > +
> > +static inline void
> > +i915_sched_engine_put(struct i915_sched_engine *sched_engine)
> > +{
> > +	kref_put(&sched_engine->ref, i915_sched_engine_free);
> > +}
> > +
> > +static inline bool
> > +i915_sched_engine_is_empty(struct i915_sched_engine *sched_engine)
> > +{
> > +	return RB_EMPTY_ROOT(&sched_engine->queue.rb_root);
> > +}
> > +
> > +static inline void
> > +i915_sched_engine_reset_on_empty(struct i915_sched_engine *sched_engine)
> > +{
> > +	if (i915_sched_engine_is_empty(sched_engine))
> > +		sched_engine->no_priolist = false;
> > +}
> > +
> > +static inline void
> > +i915_sched_engine_hi_kick(struct i915_sched_engine *sched_engine)
> > +{
> > +	tasklet_hi_schedule(&sched_engine->tasklet);
> > +}
> > +
> > +static inline void
> > +i915_sched_engine_kick(struct i915_sched_engine *sched_engine)
> > +{
> > +	tasklet_schedule(&sched_engine->tasklet);
> > +}
> > +
> > +static inline void
> > +i915_sched_engine_kill(struct i915_sched_engine *sched_engine)
> > +{
> > +	tasklet_kill(&sched_engine->tasklet);
> > +}
> > +
> > +static inline void
> > +sched_engine_active_lock_bh(struct i915_sched_engine *sched_engine)
> > +{
> > +	local_bh_disable(); /* prevent local softirq and lock recursion */
> > +	tasklet_lock(&sched_engine->tasklet);
> > +}
> > +
> > +static inline void
> > +sched_engine_active_unlock_bh(struct i915_sched_engine *sched_engine)
> > +{
> > +	tasklet_unlock(&sched_engine->tasklet);
> > +	local_bh_enable(); /* restore softirq, and kick ksoftirqd! */
> > +}
> > +
> >  #endif /* _I915_SCHEDULER_H_ */
> > diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h b/drivers/gpu/drm/i915/i915_scheduler_types.h
> > index 343ed44d5ed4..90b389ba661b 100644
> > --- a/drivers/gpu/drm/i915/i915_scheduler_types.h
> > +++ b/drivers/gpu/drm/i915/i915_scheduler_types.h
> > @@ -91,4 +91,67 @@ struct i915_dependency {
> >  				&(rq__)->sched.signalers_list, \
> >  				signal_link)
> >  
> > +struct i915_sched_engine {
> > +	struct kref ref;
> > +
> > +	/*
> > +	 * @lock: Protects requests in priority lists, requests, hold, and
> > +	 * tasklet while running.
> > +	 */
> > +	spinlock_t lock;
> > +
> > +	/* Execlist specific lists, needed here as protected by lock */
> > +	struct list_head requests;
> > +	struct list_head hold; /* ready requests, but on hold */
> > +
> > +	/**
> > +	 * @tasklet: softirq tasklet for bottom handler
> > +	 */
> > +	struct tasklet_struct tasklet;
> > +
> > +	/**
> > +	 * @default_priolist: priority list for I915_PRIORITY_NORMAL
> > +	 */
> > +	struct i915_priolist default_priolist;
> > +
> > +	/**
> > +	 * @queue_priority_hint: Highest pending priority.
> > +	 *
> > +	 * When we add requests into the queue, or adjust the priority of
> > +	 * executing requests, we compute the maximum priority of those
> > +	 * pending requests. We can then use this value to determine if
> > +	 * we need to preempt the executing requests to service the queue.
> > +	 * However, since we may have recorded the priority of an inflight
> > +	 * request we wanted to preempt, but which has since completed, at
> > +	 * the time of dequeuing the priority hint may no longer match the
> > +	 * highest available request priority.
> > +	 */
> > +	int queue_priority_hint;
> > +
> > +	/**
> > +	 * @queue: queue of requests, in priority lists
> > +	 */
> > +	struct rb_root_cached queue;
> > +
> > +	/**
> > +	 * @no_priolist: priority lists disabled
> > +	 */
> > +	bool no_priolist;
> > +
> > +	/* Back pointer to engine */
> > +	struct intel_engine_cs *engine;
> > +
> > +	/* Kick backend */
> > +	void	(*kick_backend)(const struct i915_request *rq,
> > +				int prio);
> > +
> > +	/*
> > +	 * Call when the priority on a request has changed and it and its
> > +	 * dependencies may need rescheduling. Note the request itself may
> > +	 * not be ready to run!
> > +	 */
> > +	void	(*schedule)(struct i915_request *request,
> > +			    const struct i915_sched_attr *attr);
> > +};
> > +
> >  #endif /* _I915_SCHEDULER_TYPES_H_ */
> > -- 
> > 2.28.0
> > 
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 20/97] drm/i915/guc: Introduce unified HXG messages
  2021-05-11 15:16   ` Daniel Vetter
@ 2021-05-11 17:59     ` Matthew Brost
  2021-05-11 22:11     ` Michal Wajdeczko
  1 sibling, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-11 17:59 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: tvrtko.ursulin, intel-gfx, dri-devel, jason.ekstrand,
	daniele.ceraolospurio, jon.bloomfield, daniel.vetter,
	john.c.harrison

On Tue, May 11, 2021 at 05:16:38PM +0200, Daniel Vetter wrote:
> On Thu, May 06, 2021 at 12:13:34PM -0700, Matthew Brost wrote:
> > From: Michal Wajdeczko <michal.wajdeczko@intel.com>
> > 
> > New GuC firmware will unify format of MMIO and CTB H2G messages.
> > Introduce their definitions now to allow gradual transition of
> > our code to match new changes.
> > 
> > Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > Cc: Michał Winiarski <michal.winiarski@intel.com>
> > ---
> >  .../gpu/drm/i915/gt/uc/abi/guc_messages_abi.h | 226 ++++++++++++++++++
> >  1 file changed, 226 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h
> > index 775e21f3058c..1c264819aa03 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h
> > @@ -6,6 +6,232 @@
> >  #ifndef _ABI_GUC_MESSAGES_ABI_H
> >  #define _ABI_GUC_MESSAGES_ABI_H
> >  
> > +/**
> > + * DOC: HXG Message
> 
> These aren't useful if we don't pull them in somewhere in the
> Documentation/gpu hierarchy. General comment, and also please check that
> it all renders correctly still.
>

Sure. Let me figure this out before my next rev.
 
> btw if you respin a patch not originally by you we generally add a (v1) to
> the original s-o-b line (or wherever the version split was) and explain in
> the usual changelog in the commit message what was changed.
> 

Still new to this process. Will do.

Matt

> This holds for the entire series ofc.
> -Daniel
> 
> > + *
> > + * All messages exchanged with GuC are defined using 32 bit dwords.
> > + * First dword is treated as a message header. Remaining dwords are optional.
> > + *
> > + * .. _HXG Message:
> > + *
> > + *  +---+-------+--------------------------------------------------------------+
> > + *  |   | Bits  | Description                                                  |
> > + *  +===+=======+==============================================================+
> > + *  |   |       |                                                              |
> > + *  | 0 |    31 | **ORIGIN** - originator of the message                       |
> > + *  |   |       |   - _`GUC_HXG_ORIGIN_HOST` = 0                               |
> > + *  |   |       |   - _`GUC_HXG_ORIGIN_GUC` = 1                                |
> > + *  |   |       |                                                              |
> > + *  |   +-------+--------------------------------------------------------------+
> > + *  |   | 30:28 | **TYPE** - message type                                      |
> > + *  |   |       |   - _`GUC_HXG_TYPE_REQUEST` = 0                              |
> > + *  |   |       |   - _`GUC_HXG_TYPE_EVENT` = 1                                |
> > + *  |   |       |   - _`GUC_HXG_TYPE_NO_RESPONSE_BUSY` = 3                     |
> > + *  |   |       |   - _`GUC_HXG_TYPE_NO_RESPONSE_RETRY` = 5                    |
> > + *  |   |       |   - _`GUC_HXG_TYPE_RESPONSE_FAILURE` = 6                     |
> > + *  |   |       |   - _`GUC_HXG_TYPE_RESPONSE_SUCCESS` = 7                     |
> > + *  |   +-------+--------------------------------------------------------------+
> > + *  |   |  27:0 | **AUX** - auxiliary data (depends TYPE)                      |
> > + *  +---+-------+--------------------------------------------------------------+
> > + *  | 1 |  31:0 | optional payload (depends on TYPE)                           |
> > + *  +---+-------+                                                              |
> > + *  |...|       |                                                              |
> > + *  +---+-------+                                                              |
> > + *  | n |  31:0 |                                                              |
> > + *  +---+-------+--------------------------------------------------------------+
> > + */
> > +
> > +#define GUC_HXG_MSG_MIN_LEN			1u
> > +#define GUC_HXG_MSG_0_ORIGIN			(0x1 << 31)
> > +#define   GUC_HXG_ORIGIN_HOST			0u
> > +#define   GUC_HXG_ORIGIN_GUC			1u
> > +#define GUC_HXG_MSG_0_TYPE			(0x7 << 28)
> > +#define   GUC_HXG_TYPE_REQUEST			0u
> > +#define   GUC_HXG_TYPE_EVENT			1u
> > +#define   GUC_HXG_TYPE_NO_RESPONSE_BUSY		3u
> > +#define   GUC_HXG_TYPE_NO_RESPONSE_RETRY	5u
> > +#define   GUC_HXG_TYPE_RESPONSE_FAILURE		6u
> > +#define   GUC_HXG_TYPE_RESPONSE_SUCCESS		7u
> > +#define GUC_HXG_MSG_0_AUX			(0xfffffff << 0)
> > +
> > +/**
> > + * DOC: HXG Request
> > + *
> > + * The `HXG Request`_ message should be used to initiate synchronous activity
> > + * for which confirmation or return data is expected.
> > + *
> > + * The recipient of this message shall use `HXG Response`_, `HXG Failure`_
> > + * or `HXG Retry`_ message as a definite reply, and may use `HXG Busy`_
> > + * message as an intermediate reply.
> > + *
> > + * Format of @DATA0 and all @DATAn fields depends on the @ACTION code.
> > + *
> > + * _HXG Request:
> > + *
> > + *  +---+-------+--------------------------------------------------------------+
> > + *  |   | Bits  | Description                                                  |
> > + *  +===+=======+==============================================================+
> > + *  | 0 |    31 | ORIGIN                                                       |
> > + *  |   +-------+--------------------------------------------------------------+
> > + *  |   | 30:28 | TYPE = GUC_HXG_TYPE_REQUEST_                                 |
> > + *  |   +-------+--------------------------------------------------------------+
> > + *  |   | 27:16 | **DATA0** - request data (depends on ACTION)                 |
> > + *  |   +-------+--------------------------------------------------------------+
> > + *  |   |  15:0 | **ACTION** - requested action code                           |
> > + *  +---+-------+--------------------------------------------------------------+
> > + *  | 1 |  31:0 | **DATA1** - optional data (depends on ACTION)                |
> > + *  +---+-------+--------------------------------------------------------------+
> > + *  |...|       |                                                              |
> > + *  +---+-------+--------------------------------------------------------------+
> > + *  | n |  31:0 | **DATAn** - optional data (depends on ACTION)                |
> > + *  +---+-------+--------------------------------------------------------------+
> > + */
> > +
> > +#define GUC_HXG_REQUEST_MSG_MIN_LEN		GUC_HXG_MSG_MIN_LEN
> > +#define GUC_HXG_REQUEST_MSG_0_DATA0		(0xfff << 16)
> > +#define GUC_HXG_REQUEST_MSG_0_ACTION		(0xffff << 0)
> > +#define GUC_HXG_REQUEST_MSG_n_DATAn		(0xffffffff << 0)
> > +
> > +/**
> > + * DOC: HXG Event
> > + *
> > + * The `HXG Event`_ message should be used to initiate asynchronous activity
> > + * that does not involve immediate confirmation or data.
> > + *
> > + * Format of @DATA0 and all @DATAn fields depends on the @ACTION code.
> > + *
> > + * .. _HXG Event:
> > + *
> > + *  +---+-------+--------------------------------------------------------------+
> > + *  |   | Bits  | Description                                                  |
> > + *  +===+=======+==============================================================+
> > + *  | 0 |    31 | ORIGIN                                                       |
> > + *  |   +-------+--------------------------------------------------------------+
> > + *  |   | 30:28 | TYPE = GUC_HXG_TYPE_EVENT_                                   |
> > + *  |   +-------+--------------------------------------------------------------+
> > + *  |   | 27:16 | **DATA0** - event data (depends on ACTION)                   |
> > + *  |   +-------+--------------------------------------------------------------+
> > + *  |   |  15:0 | **ACTION** - event action code                               |
> > + *  +---+-------+--------------------------------------------------------------+
> > + *  | 1 |  31:0 | **DATA1** - optional event data (depends on ACTION)          |
> > + *  +---+-------+--------------------------------------------------------------+
> > + *  |...|       |                                                              |
> > + *  +---+-------+--------------------------------------------------------------+
> > + *  | n |  31:0 | **DATAn** - optional event  data (depends on ACTION)         |
> > + *  +---+-------+--------------------------------------------------------------+
> > + */
> > +
> > +#define GUC_HXG_EVENT_MSG_MIN_LEN		GUC_HXG_MSG_MIN_LEN
> > +#define GUC_HXG_EVENT_MSG_0_DATA0		(0xfff << 16)
> > +#define GUC_HXG_EVENT_MSG_0_ACTION		(0xffff << 0)
> > +#define GUC_HXG_EVENT_MSG_n_DATAn		(0xffffffff << 0)
> > +
> > +/**
> > + * DOC: HXG Busy
> > + *
> > + * The `HXG Busy`_ message may be used to acknowledge reception of the `HXG Request`_
> > + * message if the recipient expects that its processing will take longer than the default
> > + * timeout.
> > + *
> > + * The @COUNTER field may be used as a progress indicator.
> > + *
> > + * .. _HXG Busy:
> > + *
> > + *  +---+-------+--------------------------------------------------------------+
> > + *  |   | Bits  | Description                                                  |
> > + *  +===+=======+==============================================================+
> > + *  | 0 |    31 | ORIGIN                                                       |
> > + *  |   +-------+--------------------------------------------------------------+
> > + *  |   | 30:28 | TYPE = GUC_HXG_TYPE_NO_RESPONSE_BUSY_                        |
> > + *  |   +-------+--------------------------------------------------------------+
> > + *  |   |  27:0 | **COUNTER** - progress indicator                             |
> > + *  +---+-------+--------------------------------------------------------------+
> > + */
> > +
> > +#define GUC_HXG_BUSY_MSG_LEN			GUC_HXG_MSG_MIN_LEN
> > +#define GUC_HXG_BUSY_MSG_0_COUNTER		GUC_HXG_MSG_0_AUX
> > +
> > +/**
> > + * DOC: HXG Retry
> > + *
> > + * The `HXG Retry`_ message should be used by the recipient to indicate that the
> > + * `HXG Request`_ message was dropped and should be resent.
> > + *
> > + * The @REASON field may be used to provide additional information.
> > + *
> > + * .. _HXG Retry:
> > + *
> > + *  +---+-------+--------------------------------------------------------------+
> > + *  |   | Bits  | Description                                                  |
> > + *  +===+=======+==============================================================+
> > + *  | 0 |    31 | ORIGIN                                                       |
> > + *  |   +-------+--------------------------------------------------------------+
> > + *  |   | 30:28 | TYPE = GUC_HXG_TYPE_NO_RESPONSE_RETRY_                       |
> > + *  |   +-------+--------------------------------------------------------------+
> > + *  |   |  27:0 | **REASON** - reason for retry                                |
> > + *  |   |       |  - _`GUC_HXG_RETRY_REASON_UNSPECIFIED` = 0                   |
> > + *  +---+-------+--------------------------------------------------------------+
> > + */
> > +
> > +#define GUC_HXG_RETRY_MSG_LEN			GUC_HXG_MSG_MIN_LEN
> > +#define GUC_HXG_RETRY_MSG_0_REASON		GUC_HXG_MSG_0_AUX
> > +#define   GUC_HXG_RETRY_REASON_UNSPECIFIED	0u
> > +
> > +/**
> > + * DOC: HXG Failure
> > + *
> > + * The `HXG Failure`_ message shall be used as a reply to the `HXG Request`_
> > + * message that could not be processed due to an error.
> > + *
> > + * .. _HXG Failure:
> > + *
> > + *  +---+-------+--------------------------------------------------------------+
> > + *  |   | Bits  | Description                                                  |
> > + *  +===+=======+==============================================================+
> > + *  | 0 |    31 | ORIGIN                                                       |
> > + *  |   +-------+--------------------------------------------------------------+
> > + *  |   | 30:28 | TYPE = GUC_HXG_TYPE_RESPONSE_FAILURE_                        |
> > + *  |   +-------+--------------------------------------------------------------+
> > + *  |   | 27:16 | **HINT** - additional error hint                             |
> > + *  |   +-------+--------------------------------------------------------------+
> > + *  |   |  15:0 | **ERROR** - error/result code                                |
> > + *  +---+-------+--------------------------------------------------------------+
> > + */
> > +
> > +#define GUC_HXG_FAILURE_MSG_LEN			GUC_HXG_MSG_MIN_LEN
> > +#define GUC_HXG_FAILURE_MSG_0_HINT		(0xfff << 16)
> > +#define GUC_HXG_FAILURE_MSG_0_ERROR		(0xffff << 0)
> > +
> > +/**
> > + * DOC: HXG Response
> > + *
> > + * The `HXG Response`_ message SHALL be used as a reply to the `HXG Request`_
> > + * message that was successfully processed without an error.
> > + *
> > + * .. _HXG Response:
> > + *
> > + *  +---+-------+--------------------------------------------------------------+
> > + *  |   | Bits  | Description                                                  |
> > + *  +===+=======+==============================================================+
> > + *  | 0 |    31 | ORIGIN                                                       |
> > + *  |   +-------+--------------------------------------------------------------+
> > + *  |   | 30:28 | TYPE = GUC_HXG_TYPE_RESPONSE_SUCCESS_                        |
> > + *  |   +-------+--------------------------------------------------------------+
> > + *  |   |  27:0 | **DATA0** - data (depends on ACTION from `HXG Request`_)     |
> > + *  +---+-------+--------------------------------------------------------------+
> > + *  | 1 |  31:0 | **DATA1** - data (depends on ACTION from `HXG Request`_)     |
> > + *  +---+-------+--------------------------------------------------------------+
> > + *  |...|       |                                                              |
> > + *  +---+-------+--------------------------------------------------------------+
> > + *  | n |  31:0 | **DATAn** - data (depends on ACTION from `HXG Request`_)     |
> > + *  +---+-------+--------------------------------------------------------------+
> > + */
> > +
> > +#define GUC_HXG_RESPONSE_MSG_MIN_LEN		GUC_HXG_MSG_MIN_LEN
> > +#define GUC_HXG_RESPONSE_MSG_0_DATA0		GUC_HXG_MSG_0_AUX
> > +#define GUC_HXG_RESPONSE_MSG_n_DATAn		(0xffffffff << 0)
> > +
> > +/* deprecated */
> >  #define INTEL_GUC_MSG_TYPE_SHIFT	28
> >  #define INTEL_GUC_MSG_TYPE_MASK		(0xF << INTEL_GUC_MSG_TYPE_SHIFT)
> >  #define INTEL_GUC_MSG_DATA_SHIFT	16
> > -- 
> > 2.28.0
> > 
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 249+ messages in thread
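A minimal sketch of how the GUC_HXG_* masks quoted above compose into the first
dword of a request and how a reply header can be checked, assuming the usual
<linux/bitfield.h> FIELD_PREP()/FIELD_GET() helpers; the two function names are
made up for the example:

static u32 hxg_request_header(u32 action, u32 data0)
{
	/* dword 0 of an H2G request: ORIGIN | TYPE | DATA0 | ACTION */
	return FIELD_PREP(GUC_HXG_MSG_0_ORIGIN, GUC_HXG_ORIGIN_HOST) |
	       FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
	       FIELD_PREP(GUC_HXG_REQUEST_MSG_0_DATA0, data0) |
	       FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION, action);
}

static bool hxg_is_success_reply(u32 header)
{
	/* a definite successful reply comes from the GuC with TYPE = SUCCESS */
	return FIELD_GET(GUC_HXG_MSG_0_ORIGIN, header) == GUC_HXG_ORIGIN_GUC &&
	       FIELD_GET(GUC_HXG_MSG_0_TYPE, header) == GUC_HXG_TYPE_RESPONSE_SUCCESS;
}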

* Re: [RFC PATCH 43/97] drm/i915/guc: Add lrc descriptor context lookup array
  2021-05-11 17:43       ` Daniel Vetter
@ 2021-05-11 19:34         ` Matthew Brost
  0 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-11 19:34 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: tvrtko.ursulin, intel-gfx, dri-devel, jason.ekstrand,
	daniele.ceraolospurio, jon.bloomfield, daniel.vetter,
	john.c.harrison

On Tue, May 11, 2021 at 07:43:30PM +0200, Daniel Vetter wrote:
> On Tue, May 11, 2021 at 10:01:28AM -0700, Matthew Brost wrote:
> > On Tue, May 11, 2021 at 05:26:34PM +0200, Daniel Vetter wrote:
> > > On Thu, May 06, 2021 at 12:13:57PM -0700, Matthew Brost wrote:
> > > > Add lrc descriptor context lookup array which can resolve the
> > > > intel_context from the lrc descriptor index. In addition to lookup, it
> > > > can determine if the lrc descriptor context is currently registered with
> > > > the GuC by checking if an entry for a descriptor index is present.
> > > > Future patches in the series will make use of this array.
> > > > 
> > > > Cc: John Harrison <john.c.harrison@intel.com>
> > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > ---
> > > >  drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  5 +++
> > > >  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 32 +++++++++++++++++--
> > > >  2 files changed, 35 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > > > index d84f37afb9d8..2eb6c497e43c 100644
> > > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > > > @@ -6,6 +6,8 @@
> > > >  #ifndef _INTEL_GUC_H_
> > > >  #define _INTEL_GUC_H_
> > > >  
> > > > +#include "linux/xarray.h"
> > > > +
> > > >  #include "intel_uncore.h"
> > > >  #include "intel_guc_fw.h"
> > > >  #include "intel_guc_fwif.h"
> > > > @@ -47,6 +49,9 @@ struct intel_guc {
> > > >  	struct i915_vma *lrc_desc_pool;
> > > >  	void *lrc_desc_pool_vaddr;
> > > >  
> > > > +	/* guc_id to intel_context lookup */
> > > > +	struct xarray context_lookup;
> > > 
> > > The current code sets a disastrous example, but for stuff like this it's
> > > always good to explain the locking, and who's holding references and how
> > > you're handling cycles. Since I guess the intel_context also holds the
> > > guc_id alive somehow.
> > > 
> > 
> > I think (?) I know what you mean by this comment. How about adding:
> > 
> > 'If an entry in the context_lookup is present, that means a context
> > associated with the guc_id is registered with the GuC. We use this xarray as a
> > lookup mechanism when the GuC communicates with the i915 about the context.'
> 
> So no idea how this works, but generally we put a "Protected by
> &struct.lock" or similar in here (so you get a nice link plus something
> you can use as jump label in your ide too). Plus since intel_context has
> some lifetime rules, explaining whether you're allowed to use the pointer
> after you unlock, or whether you need to grab a reference or what exactly
> is going on. Usually there's three options:
> 
> - No refcounting, you cannot access a pointer obtained through this after
>   you unlock.
> - Weak reference, you upgrade to a full reference with
>   kref_get_unless_zero. If that fails it indicates a lookup failure, since
>   you raced with destruction. If it succeeds you can use the pointer after
>   unlock.
> - Strong reference, you get your own reference that stays valid with
>   kref_get().
> 

I think the rules for this are 'if this exists in the xarray, we have a ref'.
Likewise, if the GuC knows about the context, we have a ref to the context.
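In code, that rule looks roughly like the sketch below (names other than the
xarray and intel_context ones are illustrative): the entry in guc->context_lookup
already holds a reference, so a lookup that wants to use the context after
dropping the lock takes its own reference while the entry is known to be present.

static struct intel_context *get_context(struct intel_guc *guc, u32 id)
{
	struct intel_context *ce;
	unsigned long flags;

	xa_lock_irqsave(&guc->context_lookup, flags);
	ce = xa_load(&guc->context_lookup, id);
	if (ce)
		intel_context_get(ce);	/* our own ref, safe to use after unlock */
	xa_unlock_irqrestore(&guc->context_lookup, flags);

	return ce;
}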

> I'm just bringing this up because the current i915-gem code is full of
> very tricky locking and lifetime rules, and explains roughly nothing of it
> in the data structures. Minimally some hints about the locking/lifetime
> rules of important structs should be there.
>

Agree. I'll add some comments here and to other structures this code uses.
 
> For locking rules it's good to double-down on them by adding
> lockdep_assert_held to all relevant functions (where appropriate only
> ofc).
>

Agree. I think I mostly do that in this series. That being said, the locking is
going to be a bit ugly until we switch to the DRM scheduler, because currently
multiple processes can enter the GuC backend in parallel. With the DRM scheduler
we allow a single point of entry, which simplifies things quite a bit.

The current locking rules are explained in the documentation patch: 'Update GuC
documentation'. As the locking evolves so will the documentation + lockdep
asserts.
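
A tiny sketch of the kind of assert meant here, reusing the fields of the
i915_sched_engine structure from earlier in the thread (the function name is
made up):

static void kick_queue_locked(struct i915_sched_engine *sched_engine)
{
	/* the rule is documented in the struct, enforced here */
	lockdep_assert_held(&sched_engine->lock);

	/* ... walk sched_engine->queue, which sched_engine->lock protects ... */
}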

Matt
 
> What I generally don't think makes sense is to then also document the
> locking in the kerneldoc for the functions. That tends to be one place too
> many and ime just gets out of date and not useful at all.
> 
> > > Again holds for the entire series, where it makes sense (as in we don't
> > > expect to rewrite the entire code anyway).
> > 
> > Slightly out of order but one of the last patches in the series, 'Update GuC
> > documentation' adds a big section of comments that attempts to clarify how all
> > of this code works. I likely should add a section explaining the data structures
> > as well.
> 
> Yeah that would be nice.
> -Daniel
> 
> 
> > 
> > Matt
> > 
> > > -Daniel
> > > 
> > > > +
> > > >  	/* Control params for fw initialization */
> > > >  	u32 params[GUC_CTL_MAX_DWORDS];
> > > >  
> > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > > index 6acc1ef34f92..c2b6d27404b7 100644
> > > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > > @@ -65,8 +65,6 @@ static inline struct i915_priolist *to_priolist(struct rb_node *rb)
> > > >  	return rb_entry(rb, struct i915_priolist, node);
> > > >  }
> > > >  
> > > > -/* Future patches will use this function */
> > > > -__attribute__ ((unused))
> > > >  static struct guc_lrc_desc *__get_lrc_desc(struct intel_guc *guc, u32 index)
> > > >  {
> > > >  	struct guc_lrc_desc *base = guc->lrc_desc_pool_vaddr;
> > > > @@ -76,6 +74,15 @@ static struct guc_lrc_desc *__get_lrc_desc(struct intel_guc *guc, u32 index)
> > > >  	return &base[index];
> > > >  }
> > > >  
> > > > +static inline struct intel_context *__get_context(struct intel_guc *guc, u32 id)
> > > > +{
> > > > +	struct intel_context *ce = xa_load(&guc->context_lookup, id);
> > > > +
> > > > +	GEM_BUG_ON(id >= GUC_MAX_LRC_DESCRIPTORS);
> > > > +
> > > > +	return ce;
> > > > +}
> > > > +
> > > >  static int guc_lrc_desc_pool_create(struct intel_guc *guc)
> > > >  {
> > > >  	u32 size;
> > > > @@ -96,6 +103,25 @@ static void guc_lrc_desc_pool_destroy(struct intel_guc *guc)
> > > >  	i915_vma_unpin_and_release(&guc->lrc_desc_pool, I915_VMA_RELEASE_MAP);
> > > >  }
> > > >  
> > > > +static inline void reset_lrc_desc(struct intel_guc *guc, u32 id)
> > > > +{
> > > > +	struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
> > > > +
> > > > +	memset(desc, 0, sizeof(*desc));
> > > > +	xa_erase_irq(&guc->context_lookup, id);
> > > > +}
> > > > +
> > > > +static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
> > > > +{
> > > > +	return __get_context(guc, id);
> > > > +}
> > > > +
> > > > +static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
> > > > +					   struct intel_context *ce)
> > > > +{
> > > > +	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
> > > > +}
> > > > +
> > > >  static void guc_add_request(struct intel_guc *guc, struct i915_request *rq)
> > > >  {
> > > >  	/* Leaving stub as this function will be used in future patches */
> > > > @@ -404,6 +430,8 @@ int intel_guc_submission_init(struct intel_guc *guc)
> > > >  	 */
> > > >  	GEM_BUG_ON(!guc->lrc_desc_pool);
> > > >  
> > > > +	xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ);
> > > > +
> > > >  	return 0;
> > > >  }
> > > >  
> > > > -- 
> > > > 2.28.0
> > > > 
> > > 
> > > -- 
> > > Daniel Vetter
> > > Software Engineer, Intel Corporation
> > > http://blog.ffwll.ch
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 20/97] drm/i915/guc: Introduce unified HXG messages
  2021-05-11 15:16   ` Daniel Vetter
  2021-05-11 17:59     ` Matthew Brost
@ 2021-05-11 22:11     ` Michal Wajdeczko
  2021-05-12  8:40       ` Daniel Vetter
  1 sibling, 1 reply; 249+ messages in thread
From: Michal Wajdeczko @ 2021-05-11 22:11 UTC (permalink / raw)
  To: Daniel Vetter, Matthew Brost
  Cc: tvrtko.ursulin, intel-gfx, dri-devel, jason.ekstrand,
	daniele.ceraolospurio, jon.bloomfield, daniel.vetter,
	john.c.harrison



On 11.05.2021 17:16, Daniel Vetter wrote:
> On Thu, May 06, 2021 at 12:13:34PM -0700, Matthew Brost wrote:
>> From: Michal Wajdeczko <michal.wajdeczko@intel.com>
>>
>> New GuC firmware will unify format of MMIO and CTB H2G messages.
>> Introduce their definitions now to allow gradual transition of
>> our code to match new changes.
>>
>> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>> Cc: Michał Winiarski <michal.winiarski@intel.com>
>> ---
>>  .../gpu/drm/i915/gt/uc/abi/guc_messages_abi.h | 226 ++++++++++++++++++
>>  1 file changed, 226 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h
>> index 775e21f3058c..1c264819aa03 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h
>> +++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h
>> @@ -6,6 +6,232 @@
>>  #ifndef _ABI_GUC_MESSAGES_ABI_H
>>  #define _ABI_GUC_MESSAGES_ABI_H
>>  
>> +/**
>> + * DOC: HXG Message
> 
> These aren't useful if we don't pull them in somewhere in the
> Documentation/gpu hierarchy. General comment, and also please check that
> it all renders correctly still.

Patch that connects all these DOC sections into i915.rst is still on
private branch, where I'm trying to verify all html rendering, and ...

> 
> btw if you respin a patch not originally by you we generally add a (v1) to
> the original s-o-b line (or whever the version split was) and explain in
> the usual changelog in the commit message what was changed.
> 
> This holds for the entire series ofc.
> -Daniel
> 
>> + *
>> + * All messages exchanged with GuC are defined using 32 bit dwords.
>> + * First dword is treated as a message header. Remaining dwords are optional.
>> + *
>> + * .. _HXG Message:

where such workarounds from early documentation are already removed,
since they are not needed any more starting from commit ef09989594bf
("scripts/kernel-doc: add internal hyperlink to DOC: sections")

Michal

>> + *
>> + *  +---+-------+--------------------------------------------------------------+
>> + *  |   | Bits  | Description                                                  |
>> + *  +===+=======+==============================================================+
>> + *  |   |       |                                                              |
>> + *  | 0 |    31 | **ORIGIN** - originator of the message                       |
>> + *  |   |       |   - _`GUC_HXG_ORIGIN_HOST` = 0                               |
>> + *  |   |       |   - _`GUC_HXG_ORIGIN_GUC` = 1                                |
>> + *  |   |       |                                                              |
>> + *  |   +-------+--------------------------------------------------------------+
>> + *  |   | 30:28 | **TYPE** - message type                                      |
>> + *  |   |       |   - _`GUC_HXG_TYPE_REQUEST` = 0                              |
>> + *  |   |       |   - _`GUC_HXG_TYPE_EVENT` = 1                                |
>> + *  |   |       |   - _`GUC_HXG_TYPE_NO_RESPONSE_BUSY` = 3                     |
>> + *  |   |       |   - _`GUC_HXG_TYPE_NO_RESPONSE_RETRY` = 5                    |
>> + *  |   |       |   - _`GUC_HXG_TYPE_RESPONSE_FAILURE` = 6                     |
>> + *  |   |       |   - _`GUC_HXG_TYPE_RESPONSE_SUCCESS` = 7                     |
>> + *  |   +-------+--------------------------------------------------------------+
>> + *  |   |  27:0 | **AUX** - auxiliary data (depends TYPE)                      |
>> + *  +---+-------+--------------------------------------------------------------+
>> + *  | 1 |  31:0 | optional payload (depends on TYPE)                           |
>> + *  +---+-------+                                                              |
>> + *  |...|       |                                                              |
>> + *  +---+-------+                                                              |
>> + *  | n |  31:0 |                                                              |
>> + *  +---+-------+--------------------------------------------------------------+
>> + */
>> +
>> +#define GUC_HXG_MSG_MIN_LEN			1u
>> +#define GUC_HXG_MSG_0_ORIGIN			(0x1 << 31)
>> +#define   GUC_HXG_ORIGIN_HOST			0u
>> +#define   GUC_HXG_ORIGIN_GUC			1u
>> +#define GUC_HXG_MSG_0_TYPE			(0x7 << 28)
>> +#define   GUC_HXG_TYPE_REQUEST			0u
>> +#define   GUC_HXG_TYPE_EVENT			1u
>> +#define   GUC_HXG_TYPE_NO_RESPONSE_BUSY		3u
>> +#define   GUC_HXG_TYPE_NO_RESPONSE_RETRY	5u
>> +#define   GUC_HXG_TYPE_RESPONSE_FAILURE		6u
>> +#define   GUC_HXG_TYPE_RESPONSE_SUCCESS		7u
>> +#define GUC_HXG_MSG_0_AUX			(0xfffffff << 0)
>> +
>> +/**
>> + * DOC: HXG Request
>> + *
>> + * The `HXG Request`_ message should be used to initiate synchronous activity
>> + * for which confirmation or return data is expected.
>> + *
>> + * The recipient of this message shall use `HXG Response`_, `HXG Failure`_
>> + * or `HXG Retry`_ message as a definite reply, and may use `HXG Busy`_
>> + * message as an intermediate reply.
>> + *
>> + * Format of @DATA0 and all @DATAn fields depends on the @ACTION code.
>> + *
>> + * _HXG Request:
>> + *
>> + *  +---+-------+--------------------------------------------------------------+
>> + *  |   | Bits  | Description                                                  |
>> + *  +===+=======+==============================================================+
>> + *  | 0 |    31 | ORIGIN                                                       |
>> + *  |   +-------+--------------------------------------------------------------+
>> + *  |   | 30:28 | TYPE = GUC_HXG_TYPE_REQUEST_                                 |
>> + *  |   +-------+--------------------------------------------------------------+
>> + *  |   | 27:16 | **DATA0** - request data (depends on ACTION)                 |
>> + *  |   +-------+--------------------------------------------------------------+
>> + *  |   |  15:0 | **ACTION** - requested action code                           |
>> + *  +---+-------+--------------------------------------------------------------+
>> + *  | 1 |  31:0 | **DATA1** - optional data (depends on ACTION)                |
>> + *  +---+-------+--------------------------------------------------------------+
>> + *  |...|       |                                                              |
>> + *  +---+-------+--------------------------------------------------------------+
>> + *  | n |  31:0 | **DATAn** - optional data (depends on ACTION)                |
>> + *  +---+-------+--------------------------------------------------------------+
>> + */
>> +
>> +#define GUC_HXG_REQUEST_MSG_MIN_LEN		GUC_HXG_MSG_MIN_LEN
>> +#define GUC_HXG_REQUEST_MSG_0_DATA0		(0xfff << 16)
>> +#define GUC_HXG_REQUEST_MSG_0_ACTION		(0xffff << 0)
>> +#define GUC_HXG_REQUEST_MSG_n_DATAn		(0xffffffff << 0)
>> +
>> +/**
>> + * DOC: HXG Event
>> + *
>> + * The `HXG Event`_ message should be used to initiate asynchronous activity
>> + * that does not involve immediate confirmation or data.
>> + *
>> + * Format of @DATA0 and all @DATAn fields depends on the @ACTION code.
>> + *
>> + * .. _HXG Event:
>> + *
>> + *  +---+-------+--------------------------------------------------------------+
>> + *  |   | Bits  | Description                                                  |
>> + *  +===+=======+==============================================================+
>> + *  | 0 |    31 | ORIGIN                                                       |
>> + *  |   +-------+--------------------------------------------------------------+
>> + *  |   | 30:28 | TYPE = GUC_HXG_TYPE_EVENT_                                   |
>> + *  |   +-------+--------------------------------------------------------------+
>> + *  |   | 27:16 | **DATA0** - event data (depends on ACTION)                   |
>> + *  |   +-------+--------------------------------------------------------------+
>> + *  |   |  15:0 | **ACTION** - event action code                               |
>> + *  +---+-------+--------------------------------------------------------------+
>> + *  | 1 |  31:0 | **DATA1** - optional event data (depends on ACTION)          |
>> + *  +---+-------+--------------------------------------------------------------+
>> + *  |...|       |                                                              |
>> + *  +---+-------+--------------------------------------------------------------+
>> + *  | n |  31:0 | **DATAn** - optional event  data (depends on ACTION)         |
>> + *  +---+-------+--------------------------------------------------------------+
>> + */
>> +
>> +#define GUC_HXG_EVENT_MSG_MIN_LEN		GUC_HXG_MSG_MIN_LEN
>> +#define GUC_HXG_EVENT_MSG_0_DATA0		(0xfff << 16)
>> +#define GUC_HXG_EVENT_MSG_0_ACTION		(0xffff << 0)
>> +#define GUC_HXG_EVENT_MSG_n_DATAn		(0xffffffff << 0)
>> +
>> +/**
>> + * DOC: HXG Busy
>> + *
>> + * The `HXG Busy`_ message may be used to acknowledge reception of the `HXG Request`_
>> + * message if the recipient expects that its processing will take longer than the default
>> + * timeout.
>> + *
>> + * The @COUNTER field may be used as a progress indicator.
>> + *
>> + * .. _HXG Busy:
>> + *
>> + *  +---+-------+--------------------------------------------------------------+
>> + *  |   | Bits  | Description                                                  |
>> + *  +===+=======+==============================================================+
>> + *  | 0 |    31 | ORIGIN                                                       |
>> + *  |   +-------+--------------------------------------------------------------+
>> + *  |   | 30:28 | TYPE = GUC_HXG_TYPE_NO_RESPONSE_BUSY_                        |
>> + *  |   +-------+--------------------------------------------------------------+
>> + *  |   |  27:0 | **COUNTER** - progress indicator                             |
>> + *  +---+-------+--------------------------------------------------------------+
>> + */
>> +
>> +#define GUC_HXG_BUSY_MSG_LEN			GUC_HXG_MSG_MIN_LEN
>> +#define GUC_HXG_BUSY_MSG_0_COUNTER		GUC_HXG_MSG_0_AUX
>> +
>> +/**
>> + * DOC: HXG Retry
>> + *
>> + * The `HXG Retry`_ message should be used by the recipient to indicate that the
>> + * `HXG Request`_ message was dropped and should be resent.
>> + *
>> + * The @REASON field may be used to provide additional information.
>> + *
>> + * .. _HXG Retry:
>> + *
>> + *  +---+-------+--------------------------------------------------------------+
>> + *  |   | Bits  | Description                                                  |
>> + *  +===+=======+==============================================================+
>> + *  | 0 |    31 | ORIGIN                                                       |
>> + *  |   +-------+--------------------------------------------------------------+
>> + *  |   | 30:28 | TYPE = GUC_HXG_TYPE_NO_RESPONSE_RETRY_                       |
>> + *  |   +-------+--------------------------------------------------------------+
>> + *  |   |  27:0 | **REASON** - reason for retry                                |
>> + *  |   |       |  - _`GUC_HXG_RETRY_REASON_UNSPECIFIED` = 0                   |
>> + *  +---+-------+--------------------------------------------------------------+
>> + */
>> +
>> +#define GUC_HXG_RETRY_MSG_LEN			GUC_HXG_MSG_MIN_LEN
>> +#define GUC_HXG_RETRY_MSG_0_REASON		GUC_HXG_MSG_0_AUX
>> +#define   GUC_HXG_RETRY_REASON_UNSPECIFIED	0u
>> +
>> +/**
>> + * DOC: HXG Failure
>> + *
>> + * The `HXG Failure`_ message shall be used as a reply to the `HXG Request`_
>> + * message that could not be processed due to an error.
>> + *
>> + * .. _HXG Failure:
>> + *
>> + *  +---+-------+--------------------------------------------------------------+
>> + *  |   | Bits  | Description                                                  |
>> + *  +===+=======+==============================================================+
>> + *  | 0 |    31 | ORIGIN                                                       |
>> + *  |   +-------+--------------------------------------------------------------+
>> + *  |   | 30:28 | TYPE = GUC_HXG_TYPE_RESPONSE_FAILURE_                        |
>> + *  |   +-------+--------------------------------------------------------------+
>> + *  |   | 27:16 | **HINT** - additional error hint                             |
>> + *  |   +-------+--------------------------------------------------------------+
>> + *  |   |  15:0 | **ERROR** - error/result code                                |
>> + *  +---+-------+--------------------------------------------------------------+
>> + */
>> +
>> +#define GUC_HXG_FAILURE_MSG_LEN			GUC_HXG_MSG_MIN_LEN
>> +#define GUC_HXG_FAILURE_MSG_0_HINT		(0xfff << 16)
>> +#define GUC_HXG_FAILURE_MSG_0_ERROR		(0xffff << 0)
>> +
>> +/**
>> + * DOC: HXG Response
>> + *
>> + * The `HXG Response`_ message SHALL be used as a reply to the `HXG Request`_
>> + * message that was successfully processed without an error.
>> + *
>> + * .. _HXG Response:
>> + *
>> + *  +---+-------+--------------------------------------------------------------+
>> + *  |   | Bits  | Description                                                  |
>> + *  +===+=======+==============================================================+
>> + *  | 0 |    31 | ORIGIN                                                       |
>> + *  |   +-------+--------------------------------------------------------------+
>> + *  |   | 30:28 | TYPE = GUC_HXG_TYPE_RESPONSE_SUCCESS_                        |
>> + *  |   +-------+--------------------------------------------------------------+
>> + *  |   |  27:0 | **DATA0** - data (depends on ACTION from `HXG Request`_)     |
>> + *  +---+-------+--------------------------------------------------------------+
>> + *  | 1 |  31:0 | **DATA1** - data (depends on ACTION from `HXG Request`_)     |
>> + *  +---+-------+--------------------------------------------------------------+
>> + *  |...|       |                                                              |
>> + *  +---+-------+--------------------------------------------------------------+
>> + *  | n |  31:0 | **DATAn** - data (depends on ACTION from `HXG Request`_)     |
>> + *  +---+-------+--------------------------------------------------------------+
>> + */
>> +
>> +#define GUC_HXG_RESPONSE_MSG_MIN_LEN		GUC_HXG_MSG_MIN_LEN
>> +#define GUC_HXG_RESPONSE_MSG_0_DATA0		GUC_HXG_MSG_0_AUX
>> +#define GUC_HXG_RESPONSE_MSG_n_DATAn		(0xffffffff << 0)
>> +
>> +/* deprecated */
>>  #define INTEL_GUC_MSG_TYPE_SHIFT	28
>>  #define INTEL_GUC_MSG_TYPE_MASK		(0xF << INTEL_GUC_MSG_TYPE_SHIFT)
>>  #define INTEL_GUC_MSG_DATA_SHIFT	16
>> -- 
>> 2.28.0
>>
> 

^ permalink raw reply	[flat|nested] 249+ messages in thread
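For completeness, a sketch of how a caller could map the reply TYPEs defined
above onto errnos; only the GUC_HXG_* names come from the patch, everything else
is illustrative and assumes <linux/bitfield.h> and <linux/errno.h>:

static int hxg_reply_to_errno(u32 header)
{
	switch (FIELD_GET(GUC_HXG_MSG_0_TYPE, header)) {
	case GUC_HXG_TYPE_RESPONSE_SUCCESS:
		return 0;
	case GUC_HXG_TYPE_NO_RESPONSE_BUSY:
		return -EBUSY;		/* keep waiting, recipient is still working */
	case GUC_HXG_TYPE_NO_RESPONSE_RETRY:
		return -EAGAIN;		/* resend the request */
	case GUC_HXG_TYPE_RESPONSE_FAILURE:
		return -EIO;		/* inspect the HINT/ERROR fields for details */
	default:
		return -EPROTO;
	}
}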

* Re: [RFC PATCH 00/97] Basic GuC submission support in the i915
  2021-05-11 16:39             ` Matthew Brost
@ 2021-05-12  6:26               ` Martin Peres
  2021-05-14 16:31                 ` Jason Ekstrand
  0 siblings, 1 reply; 249+ messages in thread
From: Martin Peres @ 2021-05-12  6:26 UTC (permalink / raw)
  To: Matthew Brost, Bloomfield, Jon
  Cc: Ursulin, Tvrtko, intel-gfx, dri-devel, Ekstrand, Jason,
	Ceraolo Spurio, Daniele, Jason Ekstrand, Vetter, Daniel,
	Harrison, John C

On 11/05/2021 19:39, Matthew Brost wrote:
> On Tue, May 11, 2021 at 08:26:59AM -0700, Bloomfield, Jon wrote:
>>> -----Original Message-----
>>> From: Martin Peres <martin.peres@free.fr>
>>> Sent: Tuesday, May 11, 2021 1:06 AM
>>> To: Daniel Vetter <daniel@ffwll.ch>
>>> Cc: Jason Ekstrand <jason@jlekstrand.net>; Brost, Matthew
>>> <matthew.brost@intel.com>; intel-gfx <intel-gfx@lists.freedesktop.org>;
>>> dri-devel <dri-devel@lists.freedesktop.org>; Ursulin, Tvrtko
>>> <tvrtko.ursulin@intel.com>; Ekstrand, Jason <jason.ekstrand@intel.com>;
>>> Ceraolo Spurio, Daniele <daniele.ceraolospurio@intel.com>; Bloomfield, Jon
>>> <jon.bloomfield@intel.com>; Vetter, Daniel <daniel.vetter@intel.com>;
>>> Harrison, John C <john.c.harrison@intel.com>
>>> Subject: Re: [RFC PATCH 00/97] Basic GuC submission support in the i915
>>>
>>> On 10/05/2021 19:33, Daniel Vetter wrote:
>>>> On Mon, May 10, 2021 at 3:55 PM Martin Peres <martin.peres@free.fr>
>>> wrote:
>>>>>
>>>>> On 10/05/2021 02:11, Jason Ekstrand wrote:
>>>>>> On May 9, 2021 12:12:36 Martin Peres <martin.peres@free.fr> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> On 06/05/2021 22:13, Matthew Brost wrote:
>>>>>>>> Basic GuC submission support. This is the first bullet point in the
>>>>>>>> upstreaming plan covered in the following RFC [1].
>>>>>>>>
>>>>>>>> At a very high level the GuC is a piece of firmware which sits between
>>>>>>>> the i915 and the GPU. It offloads some of the scheduling of contexts
>>>>>>>> from the i915 and programs the GPU to submit contexts. The i915
>>>>>>>> communicates with the GuC and the GuC communicates with the
>>> GPU.
>>>>>>>
>>>>>>> May I ask what will GuC command submission do that execlist
>>> won't/can't
>>>>>>> do? And what would be the impact on users? Even forgetting the
>>> troubled
>>>>>>> history of GuC (instability, performance regression, poor level of user
>>>>>>> support, 6+ years of trying to upstream it...), adding this much code
>>>>>>> and doubling the amount of validation needed should come with a
>>>>>>> rationale making it feel worth it... and I am not seeing here. Would you
>>>>>>> mind providing the rationale behind this work?
>>>>>>>
>>>>>>>>
>>>>>>>> GuC submission will be disabled by default on all current upstream
>>>>>>>> platforms behind a module parameter - enable_guc. A value of 3 will
>>>>>>>> enable submission and HuC loading via the GuC. GuC submission
>>> should
>>>>>>>> work on all gen11+ platforms assuming the GuC firmware is present.
>>>>>>>
>>>>>>> What is the plan here when it comes to keeping support for execlist? I
>>>>>>> am afraid that landing GuC support in Linux is the first step towards
>>>>>>> killing the execlist, which would force users to use proprietary
>>>>>>> firmwares that even most Intel engineers have little influence over.
>>>>>>> Indeed, if "drm/i915/guc: Disable semaphores when using GuC
>>> scheduling"
>>>>>>> which states "Disable semaphores when using GuC scheduling as
>>> semaphores
>>>>>>> are broken in the current GuC firmware." is anything to go by, it means
>>>>>>> that even Intel developers seem to prefer working around the GuC
>>>>>>> firmware, rather than fixing it.
>>>>>>
>>>>>> Yes, landing GuC support may be the first step in removing execlist
>>>>>> support. The inevitable reality is that GPU scheduling is coming and
>>>>>> likely to be the only path in the not-too-distant future. (See also
>>>>>> the ongoing thread with AMD about fences.) I'm not going to pass
>>>>>> judgement on whether or not this is a good thing.  I'm just reading the
>>>>>> winds and, in my view, this is where things are headed for good or ill.
>>>>>>
>>>>>> In answer to the question above, the answer to "what do we gain from
>>>>>> GuC?" may soon be, "you get to use your GPU."  We're not there yet
>>> and,
>>>>>> again, I'm not necessarily advocating for it, but that is likely where
>>>>>> things are headed.
>>>>>
>>>>> This will be a sad day, especially since it seems fundamentally opposed
>>>>> to any long-term support, on top of taking away user freedom to
>>>>> fix/tweak their system when Intel won't.
>>>>>
>>>>>> A firmware-based submission model isn't a bad design IMO and, aside
>>> from
>>>>>> the firmware freedom issues, I think there are actual advantages to the
>>>>>> model. Immediately, it'll unlock a few features like parallel submission
>>>>>> (more on that in a bit) and long-running compute because they're
>>>>>> implemented in GuC and the work to implement them properly in the
>>>>>> execlist scheduler is highly non-trivial. Longer term, it may (no
>>>>>> guarantees) unlock some performance by getting the kernel out of the
>>> way.
>>>>>
>>>>> Oh, I definitely agree with firmware-based submission model not being a
>>>>> bad design. I was even cheering for it in 2015. Experience with it made
>>>>> me regret that deeply since :s
>>>>>
>>>>> But with the DRM scheduler being responsible for most things, I fail to
>>>>> see what we could offload in the GuC except context switching (like
>>>>> every other manufacturer). The problem is, the GuC does way more than
>>>>> just switching registers in bulk, and if the number of revisions of the
>>>>> GuC is anything to go by, it is way too complex for me to feel
>>>>> comfortable with it.
>>>>
>>>> We need to flesh out that part of the plan more, but we're not going
>>>> to use drm scheduler for everything. It's only to handle the dma-fence
>>>> legacy side of things, which means:
>>>> - timeout handling for batches that take too long
>>>> - dma_fence dependency sorting/handling
>>>> - boosting of context from display flips (currently missing, needs to
>>>> be ported from drm/i915)
>>>>
>>>> The actual round-robin/preempt/priority handling is still left to the
>>>> backend, in this case here the fw. So there's large chunks of
>>>> code/functionality where drm/scheduler wont be involved in, and like
>>>> Jason says: The hw direction winds definitely blow in the direction
>>>> that this is all handled in hw.
>>>
>>> The plan makes sense for a SRIOV-enable GPU, yes.
>>>
>>> However, if the GuC is actually helping i915, then why not open source
>>> it and drop all the issues related to its stability? Wouldn't it be the
>>> perfect solution, as it would allow dropping execlist support for newer
>>> HW, and it would eliminate the concerns about maintenance of stable
>>> releases of Linux?
>>
>> That the major version of the FW is high is not due to bugs - Bugs don't trigger major version bumps anyway. 

Of course, where did I say they would?

> Only interface changes increment the major version, and we do add features, to keep it relevant to the evolving hardware and OS landscape. When only Windows used GuC there was no reason to minimize interface creep - GuC and KMD are released as an atomic bundle on Windows. With Linux, this is no longer the case, and has not been for some time.

AFAIK, Intel has been shipping GuC to customers since gen9, and upstream 
has been supporting command submission (albeit in a broken form) for 
years... until Michal finally disabled it after I asked for it a bit
over 2 years ago[1], when GuC was at major version 32.

So... not sure I would trust your word so blindly here.

[1] 
https://patchwork.freedesktop.org/patch/297997/?series=58760&rev=2#comment_559594
>>
> 
> Jon hit the nail on the head here - there hasn't been any reason not to bump
> the GuC version / change the interface until there is code upstream using the
> GuC. Once we push something upstream, that totally changes. Once SRIOV lands we
> literally can't change the interface without breaking the world. Our goal is to
> get this right before something lands, hence the high version number.

Good to hear! But Intel will continue to change the interface as new 
generations are made, so what is the support model for older GPUs / 
kernels which will be stuck on older major revisions?

> 
> Matt
> 
>> We have been using GuC as the sole mechanism for submission on Windows since Gen8, and it has proven very reliable. This is in large part because it is simple, and designed from day 1 as a cohesive solution alongside the hardware.

Exactly, the GuC was designed with Windows' GPU model... which is not 
directly applicable to Linux. Also, Windows does not care as much about 
submission latency, whereas most Linux users still depend on glamor for 
2D acceleration which is pretty much the biggest stress test for command 
submission latency. Also, features not used by the Windows driver or 
used in a different way are/will get broken (see the semaphore patch 
that works around it).

>>
>> Will there be bugs in the future? Of course. It's a new i915 backend. There are bugs in the execlist backend too, and the runlist backend, and the majority of real-world software ever written. But the i915 GuC backend is way simpler than execlist, much easier to understand, and therefore much easier to maintain. It's a net win for i915 and Linux.

I am more than willing to accept the fact that the interface would be 
easier to work with, and I welcome anything that will simplify the 
driver... but not at the expense of regressing the user experience. One 
has to prove more than *just* code maintainability.

Feel free to iterate/land the code, but enabling GuC-based command
submission is waaaaayyyy too early, no matter how much you want it. This
patch will remain a NACK from me until I see more of: the plan to support
*users* who are willing to use a proprietary firmware, a performance
analysis, the plan for users who will not want to use it, and the
capabilities of GuC which could be used for privilege escalation, along
with what is done to mitigate that.

Thanks,
Martin

>>
>> Jon

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 20/97] drm/i915/guc: Introduce unified HXG messages
  2021-05-11 22:11     ` Michal Wajdeczko
@ 2021-05-12  8:40       ` Daniel Vetter
  0 siblings, 0 replies; 249+ messages in thread
From: Daniel Vetter @ 2021-05-12  8:40 UTC (permalink / raw)
  To: Michal Wajdeczko
  Cc: Matthew Brost, tvrtko.ursulin, intel-gfx, dri-devel,
	jason.ekstrand, daniele.ceraolospurio, jon.bloomfield,
	daniel.vetter, john.c.harrison

On Wed, May 12, 2021 at 12:11:40AM +0200, Michal Wajdeczko wrote:
> 
> 
> On 11.05.2021 17:16, Daniel Vetter wrote:
> > On Thu, May 06, 2021 at 12:13:34PM -0700, Matthew Brost wrote:
> >> From: Michal Wajdeczko <michal.wajdeczko@intel.com>
> >>
> >> New GuC firmware will unify format of MMIO and CTB H2G messages.
> >> Introduce their definitions now to allow gradual transition of
> >> our code to match new changes.
> >>
> >> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
> >> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> >> Cc: Michał Winiarski <michal.winiarski@intel.com>
> >> ---
> >>  .../gpu/drm/i915/gt/uc/abi/guc_messages_abi.h | 226 ++++++++++++++++++
> >>  1 file changed, 226 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h
> >> index 775e21f3058c..1c264819aa03 100644
> >> --- a/drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h
> >> +++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h
> >> @@ -6,6 +6,232 @@
> >>  #ifndef _ABI_GUC_MESSAGES_ABI_H
> >>  #define _ABI_GUC_MESSAGES_ABI_H
> >>  
> >> +/**
> >> + * DOC: HXG Message
> > 
> > These aren't useful if we don't pull them in somewhere in the
> > Documentation/gpu hierarchy. General comment, and also please check that
> > it all renders correctly still.
> 
> Patch that connects all these DOC sections into i915.rst is still on
> private branch, where I'm trying to verify all html rendering, and ...
> 
> > 
> > btw if you respin a patch not originally by you we generally add a (v1) to
> > the original s-o-b line (or wherever the version split was) and explain in
> > the usual changelog in the commit message what was changed.
> > 
> > This holds for the entire series ofc.
> > -Daniel
> > 
> >> + *
> >> + * All messages exchanged with GuC are defined using 32 bit dwords.
> >> + * First dword is treated as a message header. Remaining dwords are optional.
> >> + *
> >> + * .. _HXG Message:
> 
> where such workarounds from early documentation are already removed,
> since they are not needed any more starting from commit ef09989594bf
> ("scripts/kernel-doc: add internal hyperlink to DOC: sections")

Oh this is nice. Fwiw the upstream commit is:

commit 06a755d6269c072ed0c9b84227eaf33113dc243f
Author: Michal Wajdeczko <michal.wajdeczko@intel.com>
Date:   Mon Jan 18 12:08:13 2021 +0100

    scripts/kernel-doc: add internal hyperlink to DOC: sections

I guess the sha1 you have is from your own branch?
-Daniel


> 
> Michal
> 
> >> + *
> >> + *  +---+-------+--------------------------------------------------------------+
> >> + *  |   | Bits  | Description                                                  |
> >> + *  +===+=======+==============================================================+
> >> + *  |   |       |                                                              |
> >> + *  | 0 |    31 | **ORIGIN** - originator of the message                       |
> >> + *  |   |       |   - _`GUC_HXG_ORIGIN_HOST` = 0                               |
> >> + *  |   |       |   - _`GUC_HXG_ORIGIN_GUC` = 1                                |
> >> + *  |   |       |                                                              |
> >> + *  |   +-------+--------------------------------------------------------------+
> >> + *  |   | 30:28 | **TYPE** - message type                                      |
> >> + *  |   |       |   - _`GUC_HXG_TYPE_REQUEST` = 0                              |
> >> + *  |   |       |   - _`GUC_HXG_TYPE_EVENT` = 1                                |
> >> + *  |   |       |   - _`GUC_HXG_TYPE_NO_RESPONSE_BUSY` = 3                     |
> >> + *  |   |       |   - _`GUC_HXG_TYPE_NO_RESPONSE_RETRY` = 5                    |
> >> + *  |   |       |   - _`GUC_HXG_TYPE_RESPONSE_FAILURE` = 6                     |
> >> + *  |   |       |   - _`GUC_HXG_TYPE_RESPONSE_SUCCESS` = 7                     |
> >> + *  |   +-------+--------------------------------------------------------------+
> >> + *  |   |  27:0 | **AUX** - auxiliary data (depends TYPE)                      |
> >> + *  +---+-------+--------------------------------------------------------------+
> >> + *  | 1 |  31:0 | optional payload (depends on TYPE)                           |
> >> + *  +---+-------+                                                              |
> >> + *  |...|       |                                                              |
> >> + *  +---+-------+                                                              |
> >> + *  | n |  31:0 |                                                              |
> >> + *  +---+-------+--------------------------------------------------------------+
> >> + */
> >> +
> >> +#define GUC_HXG_MSG_MIN_LEN			1u
> >> +#define GUC_HXG_MSG_0_ORIGIN			(0x1 << 31)
> >> +#define   GUC_HXG_ORIGIN_HOST			0u
> >> +#define   GUC_HXG_ORIGIN_GUC			1u
> >> +#define GUC_HXG_MSG_0_TYPE			(0x7 << 28)
> >> +#define   GUC_HXG_TYPE_REQUEST			0u
> >> +#define   GUC_HXG_TYPE_EVENT			1u
> >> +#define   GUC_HXG_TYPE_NO_RESPONSE_BUSY		3u
> >> +#define   GUC_HXG_TYPE_NO_RESPONSE_RETRY	5u
> >> +#define   GUC_HXG_TYPE_RESPONSE_FAILURE		6u
> >> +#define   GUC_HXG_TYPE_RESPONSE_SUCCESS		7u
> >> +#define GUC_HXG_MSG_0_AUX			(0xfffffff << 0)
> >> +
> >> +/**
> >> + * DOC: HXG Request
> >> + *
> >> + * The `HXG Request`_ message should be used to initiate synchronous activity
> >> + * for which confirmation or return data is expected.
> >> + *
> >> + * The recipient of this message shall use `HXG Response`_, `HXG Failure`_
> >> + * or `HXG Retry`_ message as a definite reply, and may use `HXG Busy`_
> >> + * message as an intermediate reply.
> >> + *
> >> + * Format of @DATA0 and all @DATAn fields depends on the @ACTION code.
> >> + *
> >> + * _HXG Request:
> >> + *
> >> + *  +---+-------+--------------------------------------------------------------+
> >> + *  |   | Bits  | Description                                                  |
> >> + *  +===+=======+==============================================================+
> >> + *  | 0 |    31 | ORIGIN                                                       |
> >> + *  |   +-------+--------------------------------------------------------------+
> >> + *  |   | 30:28 | TYPE = GUC_HXG_TYPE_REQUEST_                                 |
> >> + *  |   +-------+--------------------------------------------------------------+
> >> + *  |   | 27:16 | **DATA0** - request data (depends on ACTION)                 |
> >> + *  |   +-------+--------------------------------------------------------------+
> >> + *  |   |  15:0 | **ACTION** - requested action code                           |
> >> + *  +---+-------+--------------------------------------------------------------+
> >> + *  | 1 |  31:0 | **DATA1** - optional data (depends on ACTION)                |
> >> + *  +---+-------+--------------------------------------------------------------+
> >> + *  |...|       |                                                              |
> >> + *  +---+-------+--------------------------------------------------------------+
> >> + *  | n |  31:0 | **DATAn** - optional data (depends on ACTION)                |
> >> + *  +---+-------+--------------------------------------------------------------+
> >> + */
> >> +
> >> +#define GUC_HXG_REQUEST_MSG_MIN_LEN		GUC_HXG_MSG_MIN_LEN
> >> +#define GUC_HXG_REQUEST_MSG_0_DATA0		(0xfff << 16)
> >> +#define GUC_HXG_REQUEST_MSG_0_ACTION		(0xffff << 0)
> >> +#define GUC_HXG_REQUEST_MSG_n_DATAn		(0xffffffff << 0)
> >> +
> >> +/**
> >> + * DOC: HXG Event
> >> + *
> >> + * The `HXG Event`_ message should be used to initiate asynchronous activity
> >> + * that does not involve immediate confirmation or data.
> >> + *
> >> + * Format of @DATA0 and all @DATAn fields depends on the @ACTION code.
> >> + *
> >> + * .. _HXG Event:
> >> + *
> >> + *  +---+-------+--------------------------------------------------------------+
> >> + *  |   | Bits  | Description                                                  |
> >> + *  +===+=======+==============================================================+
> >> + *  | 0 |    31 | ORIGIN                                                       |
> >> + *  |   +-------+--------------------------------------------------------------+
> >> + *  |   | 30:28 | TYPE = GUC_HXG_TYPE_EVENT_                                   |
> >> + *  |   +-------+--------------------------------------------------------------+
> >> + *  |   | 27:16 | **DATA0** - event data (depends on ACTION)                   |
> >> + *  |   +-------+--------------------------------------------------------------+
> >> + *  |   |  15:0 | **ACTION** - event action code                               |
> >> + *  +---+-------+--------------------------------------------------------------+
> >> + *  | 1 |  31:0 | **DATA1** - optional event data (depends on ACTION)          |
> >> + *  +---+-------+--------------------------------------------------------------+
> >> + *  |...|       |                                                              |
> >> + *  +---+-------+--------------------------------------------------------------+
> >> + *  | n |  31:0 | **DATAn** - optional event  data (depends on ACTION)         |
> >> + *  +---+-------+--------------------------------------------------------------+
> >> + */
> >> +
> >> +#define GUC_HXG_EVENT_MSG_MIN_LEN		GUC_HXG_MSG_MIN_LEN
> >> +#define GUC_HXG_EVENT_MSG_0_DATA0		(0xfff << 16)
> >> +#define GUC_HXG_EVENT_MSG_0_ACTION		(0xffff << 0)
> >> +#define GUC_HXG_EVENT_MSG_n_DATAn		(0xffffffff << 0)
> >> +
> >> +/**
> >> + * DOC: HXG Busy
> >> + *
> >> + * The `HXG Busy`_ message may be used to acknowledge reception of the `HXG Request`_
> >> + * message if the recipient expects that its processing will take longer than the
> >> + * default timeout.
> >> + *
> >> + * The @COUNTER field may be used as a progress indicator.
> >> + *
> >> + * .. _HXG Busy:
> >> + *
> >> + *  +---+-------+--------------------------------------------------------------+
> >> + *  |   | Bits  | Description                                                  |
> >> + *  +===+=======+==============================================================+
> >> + *  | 0 |    31 | ORIGIN                                                       |
> >> + *  |   +-------+--------------------------------------------------------------+
> >> + *  |   | 30:28 | TYPE = GUC_HXG_TYPE_NO_RESPONSE_BUSY_                        |
> >> + *  |   +-------+--------------------------------------------------------------+
> >> + *  |   |  27:0 | **COUNTER** - progress indicator                             |
> >> + *  +---+-------+--------------------------------------------------------------+
> >> + */
> >> +
> >> +#define GUC_HXG_BUSY_MSG_LEN			GUC_HXG_MSG_MIN_LEN
> >> +#define GUC_HXG_BUSY_MSG_0_COUNTER		GUC_HXG_MSG_0_AUX
> >> +
> >> +/**
> >> + * DOC: HXG Retry
> >> + *
> >> + * The `HXG Retry`_ message should be used by the recipient to indicate that
> >> + * the `HXG Request`_ message was dropped and should be resent.
> >> + *
> >> + * The @REASON field may be used to provide additional information.
> >> + *
> >> + * .. _HXG Retry:
> >> + *
> >> + *  +---+-------+--------------------------------------------------------------+
> >> + *  |   | Bits  | Description                                                  |
> >> + *  +===+=======+==============================================================+
> >> + *  | 0 |    31 | ORIGIN                                                       |
> >> + *  |   +-------+--------------------------------------------------------------+
> >> + *  |   | 30:28 | TYPE = GUC_HXG_TYPE_NO_RESPONSE_RETRY_                       |
> >> + *  |   +-------+--------------------------------------------------------------+
> >> + *  |   |  27:0 | **REASON** - reason for retry                                |
> >> + *  |   |       |  - _`GUC_HXG_RETRY_REASON_UNSPECIFIED` = 0                   |
> >> + *  +---+-------+--------------------------------------------------------------+
> >> + */
> >> +
> >> +#define GUC_HXG_RETRY_MSG_LEN			GUC_HXG_MSG_MIN_LEN
> >> +#define GUC_HXG_RETRY_MSG_0_REASON		GUC_HXG_MSG_0_AUX
> >> +#define   GUC_HXG_RETRY_REASON_UNSPECIFIED	0u
> >> +
> >> +/**
> >> + * DOC: HXG Failure
> >> + *
> >> + * The `HXG Failure`_ message shall be used as a reply to the `HXG Request`_
> >> + * message that could not be processed due to an error.
> >> + *
> >> + * .. _HXG Failure:
> >> + *
> >> + *  +---+-------+--------------------------------------------------------------+
> >> + *  |   | Bits  | Description                                                  |
> >> + *  +===+=======+==============================================================+
> >> + *  | 0 |    31 | ORIGIN                                                       |
> >> + *  |   +-------+--------------------------------------------------------------+
> >> + *  |   | 30:28 | TYPE = GUC_HXG_TYPE_RESPONSE_FAILURE_                        |
> >> + *  |   +-------+--------------------------------------------------------------+
> >> + *  |   | 27:16 | **HINT** - additional error hint                             |
> >> + *  |   +-------+--------------------------------------------------------------+
> >> + *  |   |  15:0 | **ERROR** - error/result code                                |
> >> + *  +---+-------+--------------------------------------------------------------+
> >> + */
> >> +
> >> +#define GUC_HXG_FAILURE_MSG_LEN			GUC_HXG_MSG_MIN_LEN
> >> +#define GUC_HXG_FAILURE_MSG_0_HINT		(0xfff << 16)
> >> +#define GUC_HXG_FAILURE_MSG_0_ERROR		(0xffff << 0)
> >> +
> >> +/**
> >> + * DOC: HXG Response
> >> + *
> >> + * The `HXG Response`_ message SHALL be used as a reply to the `HXG Request`_
> >> + * message that was successfully processed without an error.
> >> + *
> >> + * .. _HXG Response:
> >> + *
> >> + *  +---+-------+--------------------------------------------------------------+
> >> + *  |   | Bits  | Description                                                  |
> >> + *  +===+=======+==============================================================+
> >> + *  | 0 |    31 | ORIGIN                                                       |
> >> + *  |   +-------+--------------------------------------------------------------+
> >> + *  |   | 30:28 | TYPE = GUC_HXG_TYPE_RESPONSE_SUCCESS_                        |
> >> + *  |   +-------+--------------------------------------------------------------+
> >> + *  |   |  27:0 | **DATA0** - data (depends on ACTION from `HXG Request`_)     |
> >> + *  +---+-------+--------------------------------------------------------------+
> >> + *  | 1 |  31:0 | **DATA1** - data (depends on ACTION from `HXG Request`_)     |
> >> + *  +---+-------+--------------------------------------------------------------+
> >> + *  |...|       |                                                              |
> >> + *  +---+-------+--------------------------------------------------------------+
> >> + *  | n |  31:0 | **DATAn** - data (depends on ACTION from `HXG Request`_)     |
> >> + *  +---+-------+--------------------------------------------------------------+
> >> + */
> >> +
> >> +#define GUC_HXG_RESPONSE_MSG_MIN_LEN		GUC_HXG_MSG_MIN_LEN
> >> +#define GUC_HXG_RESPONSE_MSG_0_DATA0		GUC_HXG_MSG_0_AUX
> >> +#define GUC_HXG_RESPONSE_MSG_n_DATAn		(0xffffffff << 0)
> >> +
> >> +/* deprecated */
> >>  #define INTEL_GUC_MSG_TYPE_SHIFT	28
> >>  #define INTEL_GUC_MSG_TYPE_MASK		(0xF << INTEL_GUC_MSG_TYPE_SHIFT)
> >>  #define INTEL_GUC_MSG_DATA_SHIFT	16
> >> -- 
> >> 2.28.0
> >>
> > 
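
(An aside for anyone skimming the new header: dw0 is meant to be built and
parsed with the usual <linux/bitfield.h> helpers. A minimal sketch only -
GUC_HXG_MSG_0_ORIGIN, GUC_HXG_MSG_0_TYPE, GUC_HXG_ORIGIN_HOST and
GUC_HXG_TYPE_REQUEST are assumed to be defined in the earlier, unquoted part
of this header:

#include <linux/bitfield.h>

/* Pack dw0 of a host-originated HXG Request from the masks above. */
static u32 host_hxg_request_dw0(u32 action, u32 data0)
{
	return FIELD_PREP(GUC_HXG_MSG_0_ORIGIN, GUC_HXG_ORIGIN_HOST) |
	       FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
	       FIELD_PREP(GUC_HXG_REQUEST_MSG_0_DATA0, data0) |
	       FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION, action);
}

/* Decode the reply type from dw0 of an incoming HXG message. */
static bool hxg_reply_is_success(u32 dw0)
{
	return FIELD_GET(GUC_HXG_MSG_0_TYPE, dw0) ==
	       GUC_HXG_TYPE_RESPONSE_SUCCESS;
}

Not meant as the exact i915 plumbing, just how the masks compose.)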

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 00/97] Basic GuC submission support in the i915
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (97 preceding siblings ...)
  2021-05-09 17:12 ` [RFC PATCH 00/97] Basic GuC submission support in the i915 Martin Peres
@ 2021-05-14 11:11 ` Tvrtko Ursulin
  2021-05-14 16:36   ` Jason Ekstrand
  2021-05-14 16:41   ` Matthew Brost
  2021-05-25 10:32 ` Tvrtko Ursulin
  99 siblings, 2 replies; 249+ messages in thread
From: Tvrtko Ursulin @ 2021-05-14 11:11 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel; +Cc: jason.ekstrand, daniel.vetter


On 06/05/2021 20:13, Matthew Brost wrote:
> Basic GuC submission support. This is the first bullet point in the
> upstreaming plan covered in the following RFC [1].
> 
> At a very high level the GuC is a piece of firmware which sits between
> the i915 and the GPU. It offloads some of the scheduling of contexts
> from the i915 and programs the GPU to submit contexts. The i915
> communicates with the GuC and the GuC communicates with the GPU.
> 
> GuC submission will be disabled by default on all current upstream
> platforms behind a module parameter - enable_guc. A value of 3 will
> enable submission and HuC loading via the GuC. GuC submission should
> work on all gen11+ platforms assuming the GuC firmware is present.

Some thoughts mostly relating to future platforms where GuC will be the 
only option, and to some extent platforms where it will be possible to 
turn it on for one reason or another.

Debuggability - in the context of having an upstream way/tool for 
capturing and viewing GuC logs usable for attaching to bug reports.

Currently, i915 logs, traces via tracepoints and trace printk, and the 
GPU error capture state often provide a sufficient trail of evidence to 
debug issues.

We need to make sure the GuC is not a black box in this respect. By 
this I mean it must not hide a large portion of the execution flows from 
upstream observability.

This could mean a tool in IGT to access/capture GuC logs and update bug 
filing instructions.

Leading from here is probably the need for the GuC firmware team to 
cross the internal-upstream boundary and deal with such bug reports on 
upstream trackers. Upstream GuC is unlikely to work if we don't have 
such a plan and commitment.

Also leading from here is the need for GPU error capture to be on par 
from day one, which I believe is still not there in the firmware.

Another, although unrelated, missing feature on my wish list is firmware 
support for wiring up accurate engine busyness stats to the i915 PMU. I 
believe this is also being worked on but I don't know when the expected 
delivery is.

If we are tracking a TODO list of items somewhere I think these ones 
should definitely be considered.

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 00/97] Basic GuC submission support in the i915
  2021-05-12  6:26               ` Martin Peres
@ 2021-05-14 16:31                 ` Jason Ekstrand
  2021-05-25 15:37                   ` Alex Deucher
  0 siblings, 1 reply; 249+ messages in thread
From: Jason Ekstrand @ 2021-05-14 16:31 UTC (permalink / raw)
  To: Martin Peres
  Cc: Matthew Brost, Ursulin, Tvrtko, Ceraolo Spurio, Daniele,
	intel-gfx, dri-devel, Ekstrand, Jason, Bloomfield, Jon, Vetter,
	Daniel, Harrison, John C

Pulling a few threads together...

On Mon, May 10, 2021 at 1:39 PM Francisco Jerez <currojerez@riseup.net> wrote:
>
> I agree with Martin on this.  Given that using GuC currently involves
> making your open-source graphics stack rely on a closed-source
> cryptographically-protected blob in order to submit commands to the GPU,
> and given that it is /still/ possible to use the GPU without it, I'd
> expect some strong material justification for making the switch (like,
> it improves performance of test-case X and Y by Z%, or, we're truly
> sorry but we cannot program your GPU anymore with a purely open-source
> software stack).  Any argument based on the apparent direction of the
> wind doesn't sound like a material engineering reason to me, and runs
> the risk of being self-fulfilling if it leads us to do the worse thing
> for our users just because we have the vague feeling that it is the
> general trend, even though we may have had the means to obtain a better
> compromise for them.

I think it's important to distinguish between landing code to support
GuC submission and requiring it in order to use the GPU.  We've got
the execlist back-end and it's not going anywhere, at least not for
older hardware, and it will likely keep working as long as execlists
remain in the hardware.  What's being proposed here is a new back-end
which, yes, depends on firmware and can be used for more features.

I'm well aware of the slippery slope argument that's implicitly being
used here even if no one is actually saying it:  If we land GuC
support in i915 in any form then Intel HW engineers will say "See,
Linux supports GuC now; we can rip out execlists" and we'll end up in
the dystopia of closed-source firmware.  If upstream continues to push
back on GuC in any form then they'll be forced to keep execlists.
I'll freely admit that there is probably some truth to this.  However,
I really doubt that it's going to work long-term.  If the HW
architects are determined enough to rip it out, they will.

If GuC is really inevitable, then it's in our best interests to land
at least beta support earlier.  There are a lot of questions that
people have brought up around back-ports, dealing with stable kernels,
stability concerns, etc.  The best way to sort those out is to land
the code and start dealing with the issues.  We can't front-load
solving every possible issue or the code will never land.  But maybe
that's people's actual objective?


On Wed, May 12, 2021 at 1:26 AM Martin Peres <martin.peres@free.fr> wrote:
>
> On 11/05/2021 19:39, Matthew Brost wrote:
> > On Tue, May 11, 2021 at 08:26:59AM -0700, Bloomfield, Jon wrote:
> >>> On 10/05/2021 19:33, Daniel Vetter wrote:
> >>>> On Mon, May 10, 2021 at 3:55 PM Martin Peres <martin.peres@free.fr>
> >>> wrote:
> >>>
> >>> However, if the GuC is actually helping i915, then why not open source
> >>> it and drop all the issues related to its stability? Wouldn't it be the
> >>> perfect solution, as it would allow dropping execlist support for newer
> >>> HW, and it would eliminate the concerns about maintenance of stable
> >>> releases of Linux?

I would like to see that happen.  I know there was some chatter about
it for a while and then the discussions got killed.  I'm not sure what
happened, to be honest.  However, I don't think we can make any
guarantees or assumptions there, I'm afraid. :-(

> >> That the major version of the FW is high is not due to bugs - Bugs don't trigger major version bumps anyway.
>
> Of course, where did I say they would?

I think the concern here is that old kernels will require old major
GuC versions because interfaces won't be backwards-compatible and then
those kernels won't get bug fixes.  That's a legitimate concern.
Given the Linux usage model, I think it's fair to require either
backwards-compatibility with GuC interfaces and validation of that
backwards-compatibility or stable releases with bug fixes for a good
long while.  I honestly can't say whether or not we've really scoped
that.  Jon?

> >> We have been using GuC as the sole mechanism for submission on Windows since Gen8, and it has proven very reliable. This is in large part because it is simple, and designed from day 1 as a cohesive solution alongside the hardware.

There are going to be differences in the usage patterns that i915 and
Windows will hit when it comes to the subtle details of how we bang on
the GuC rings.  Those will likely lead to bugs on Linux that don't
show up on Windows so "it works on Windows" doesn't mean we're headed
for a bug-free future.  It means we have an existence proof that
firmware-based submission can be very reliable.  However, I don't
think anyone on this thread is really questioning that.

> Exactly, the GuC was designed with Windows' GPU model... which is not
> directly applicable to Linux. Also, Windows does not care as much about
> submission latency, whereas most Linux users still depend on glamor for
> 2D acceleration which is pretty much the biggest stress test for command
> submission latency. Also, features not used by the Windows driver or
> used in a different way are/will get broken (see the semaphore patch
> that works around it).

I'm not nearly as deep into benchmarking the delta as you are so I
won't contradict anything said directly.  However, I think it's worth
pointing out a few things:

There isn't really a Windows GPU model.  There's a different
submission model with Win10 vs. Win7 and Linux looks a lot more like
Win7.  I really want Linux to start looking like Win10 at which point
they'll be using roughly the same "GPU model".  There are other OS
differences that matter here such as Windows' substantially higher
interrupt handling latency which GuC theoretically works around.
However, I don't think it's fair to say that the way Linux wants to
program the GPU for command submission is substantially different from
Windows due to userspace software differences.

There are significant differences in terms of dma_fence handling and
implicit synchronization.  However, as has already been mentioned,
those will be handled by drm/scheduler with GuC as a back-end that
manages load-balancing.  And, yes, there will be Linux-specific bugs
(see above) but they're not because of a fundamentally different
model.

One other thing worth mentioning, which doesn't seem to fit anywhere:
If we really care about keeping execlists working for the upcoming
use-cases, it needs major work.  It's currently way too deeply tied
with i915_sw_fence so it can't handle long-running compute batches
without breaking dma-fence rules.  The way it handles bonded submit is
a bolt-on that doesn't actually provide the guarantees that userspace
needs.  It should also probably be re-architected to use drm/scheduler
for dma_fence and look a lot more like GuC on the inside.

The point of bringing this up is that I'm seeing a lot more execlist
love than I think it deserves. :-)  It may be free software but that
doesn't mean it's good software. :-P  To be clear, I don't mean to
unduly insult Chris or any of the other people who have worked on it.
It works and it's perfectly functional for supporting all the good ol'
use-cases us desktop Linux people are used to.  But the ways in which
it would have to change in order to handle the future are substantial.

--Jason

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 00/97] Basic GuC submission support in the i915
  2021-05-14 11:11 ` [Intel-gfx] " Tvrtko Ursulin
@ 2021-05-14 16:36   ` Jason Ekstrand
  2021-05-14 16:46     ` Matthew Brost
  2021-05-14 16:41   ` Matthew Brost
  1 sibling, 1 reply; 249+ messages in thread
From: Jason Ekstrand @ 2021-05-14 16:36 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Matthew Brost, Jason Ekstrand, Intel GFX,
	Maling list - DRI developers, Daniel Vetter

On Fri, May 14, 2021 at 6:12 AM Tvrtko Ursulin
<tvrtko.ursulin@linux.intel.com> wrote:
>
> On 06/05/2021 20:13, Matthew Brost wrote:
> > Basic GuC submission support. This is the first bullet point in the
> > upstreaming plan covered in the following RFC [1].
> >
> > At a very high level the GuC is a piece of firmware which sits between
> > the i915 and the GPU. It offloads some of the scheduling of contexts
> > from the i915 and programs the GPU to submit contexts. The i915
> > communicates with the GuC and the GuC communicates with the GPU.
> >
> > GuC submission will be disabled by default on all current upstream
> > platforms behind a module parameter - enable_guc. A value of 3 will
> > enable submission and HuC loading via the GuC. GuC submission should
> > work on all gen11+ platforms assuming the GuC firmware is present.
>
> Some thoughts mostly relating to future platforms where GuC will be the
> only option, and to some extent platforms where it will be possible to
> turn it on for one reason or another.
>
> Debuggability - in the context of having an upstream way/tool for
> capturing and viewing GuC logs usable for attaching to bug reports.
>
> Currently, i915 logs, traces via tracepoints and trace printk, and the
> GPU error capture state often provide a sufficient trail of evidence to
> debug issues.
>
> We need to make sure the GuC is not a black box in this respect. By
> this I mean it must not hide a large portion of the execution flows
> from upstream observability.

I agree here.  If GuC suddenly makes submission issues massively
harder to debug then that's a regression vs. execlists.  I don't know
what the solution there is but I think the concern is valid.

> This could mean a tool in IGT to access/capture GuC logs and update bug
> filing instructions.
>
> Leading from here is probably the need for the GuC firmware team to
> cross the internal-upstream boundary and deal with such bug reports on
> upstream trackers. Upstream GuC is unlikely to work if we don't have
> such plan and commitment.

I mostly agree here as well.  I'm not sure it'll actually happen but
I'd like anyone who writes code which impacts Linux to be active in
upstream bug trackers.

> Also leading from here is the need for GPU error capture to be on par
> from day one which is I believe still not there in the firmware.

This one has me genuinely concerned.  I've heard rumors that we don't
have competent error captures with GuC yet.  From the Mesa PoV, this
is a non-starter.  We can't be asked to develop graphics drivers with
no error capture.

The good news is that, based on my understanding, it shouldn't be
terrible to support.  We just need the GuC to grab all the registers
for us and shove them in a buffer somewhere before it resets the GPU
and all that data is lost.  I would hope the Windows people have
already done that and we just need to hook it up.  If not, there may
be some GuC engineering required here.

> Another, although unrelated, missing feature on my wish list is firmware
> support for wiring up accurate engine busyness stats to i915 PMU. I
> believe this is also being worked on but I don't know when is the
> expected delivery.
>
> If we are tracking a TODO list of items somewhere I think these ones
> should be definitely considered.

Yup, let's get it all in the ToDo and not flip GuC on by default in
the wild until it's all checked off.

--Jason

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 00/97] Basic GuC submission support in the i915
  2021-05-14 11:11 ` [Intel-gfx] " Tvrtko Ursulin
  2021-05-14 16:36   ` Jason Ekstrand
@ 2021-05-14 16:41   ` Matthew Brost
  1 sibling, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-14 16:41 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: jason.ekstrand, daniel.vetter, intel-gfx, dri-devel

On Fri, May 14, 2021 at 12:11:56PM +0100, Tvrtko Ursulin wrote:
> 
> On 06/05/2021 20:13, Matthew Brost wrote:
> > Basic GuC submission support. This is the first bullet point in the
> > upstreaming plan covered in the following RFC [1].
> > 
> > At a very high level the GuC is a piece of firmware which sits between
> > the i915 and the GPU. It offloads some of the scheduling of contexts
> > from the i915 and programs the GPU to submit contexts. The i915
> > communicates with the GuC and the GuC communicates with the GPU.
> > 
> > GuC submission will be disabled by default on all current upstream
> > platforms behind a module parameter - enable_guc. A value of 3 will
> > enable submission and HuC loading via the GuC. GuC submission should
> > work on all gen11+ platforms assuming the GuC firmware is present.
> 
> Some thoughts mostly relating to future platforms where GuC will be the only
> option, and to some extent platforms where it will be possible to turn it on
> for one reason or another.
> 
> Debuggability - in the context of having an upstream way/tool for capturing
> and viewing GuC logs usable for attaching to bug reports.
> 

Agree. We have discussed this internally as an upstream requirement for quite
some time.

> Currently, i915 logs, traces via tracepoints and trace printk, and the GPU
> error capture state often provide a sufficient trail of evidence to debug
> issues.
> 

If we do this right, we should end up with the same level of observability
with GuC submission.

> We need to make sure the GuC is not a black box in this respect. By this I
> mean it must not hide a large portion of the execution flows from upstream
> observability.
> 
> This could mean a tool in IGT to access/capture GuC logs and update bug
> filing instructions.
> 

We have a few internal tools that decode the GuC logs. One of these will be
open sourced and put on a public repo. We just need to decide which tool and
make sure it works across all the distros.

> Leading from here is probably the need for the GuC firmware team to cross
> the internal-upstream boundary and deal with such bug reports on upstream
> trackers. Upstream GuC is unlikely to work if we don't have such plan and
> commitment.
> 

I think we can land this code first as it is going to be disabled by default.
Certainly once we turn it on by default we need to have everything in place
that you mention in this email.

> Also leading from here is the need for GPU error capture to be on par from
> day one which is I believe still not there in the firmware.
>

We are missing a register dump from the GuC on reset. There is no other way to
say it than this has been a huge miss by the i915 / GuC teams. This is
something we absolutely need and it hasn't gotten done. I'll push on this and
hopefully we can land this feature soon.

> Another, although unrelated, missing feature on my wish list is firmware
> support for wiring up accurate engine busyness stats to i915 PMU. I believe
> this is also being worked on but I don't know when is the expected delivery.
>

This is landing this week, I believe. The next upstream post should include an
updated GuC firmware + code in the i915 that hooks into the PMU stats.

Matt

> If we are tracking a TODO list of items somewhere I think these ones should
> be definitely considered.
> 
> Regards,
> 
> Tvrtko

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 00/97] Basic GuC submission support in the i915
  2021-05-14 16:36   ` Jason Ekstrand
@ 2021-05-14 16:46     ` Matthew Brost
  0 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-14 16:46 UTC (permalink / raw)
  To: Jason Ekstrand
  Cc: Jason Ekstrand, Tvrtko Ursulin, Intel GFX,
	Maling list - DRI developers, Daniel Vetter

On Fri, May 14, 2021 at 11:36:37AM -0500, Jason Ekstrand wrote:
> On Fri, May 14, 2021 at 6:12 AM Tvrtko Ursulin
> <tvrtko.ursulin@linux.intel.com> wrote:
> >
> > On 06/05/2021 20:13, Matthew Brost wrote:
> > > Basic GuC submission support. This is the first bullet point in the
> > > upstreaming plan covered in the following RFC [1].
> > >
> > > At a very high level the GuC is a piece of firmware which sits between
> > > the i915 and the GPU. It offloads some of the scheduling of contexts
> > > from the i915 and programs the GPU to submit contexts. The i915
> > > communicates with the GuC and the GuC communicates with the GPU.
> > >
> > > GuC submission will be disabled by default on all current upstream
> > > platforms behind a module parameter - enable_guc. A value of 3 will
> > > enable submission and HuC loading via the GuC. GuC submission should
> > > work on all gen11+ platforms assuming the GuC firmware is present.
> >
> > Some thoughts mostly relating to future platforms where GuC will be the
> > only option, and to some extent platforms where it will be possible to
> > turn it on for one reason or another.
> >
> > Debuggability - in the context of having an upstream way/tool for
> > capturing and viewing GuC logs usable for attaching to bug reports.
> >
> > Currently, i915 logs, traces via tracepoints and trace printk, and the
> > GPU error capture state often provide a sufficient trail of evidence to
> > debug issues.
> >
> > We need to make sure the GuC is not a black box in this respect. By
> > this I mean it must not hide a large portion of the execution flows
> > from upstream observability.
> 
> I agree here.  If GuC suddenly makes submission issues massively
> harder to debug then that's a regression vs. execlists.  I don't know
> what the solution there is but I think the concern is valid.
> 

Replied to Tvrtko with detailed answers. The TL;DR is I agree with basically
everything he said, we have plans to address it all, and everything must be
addressed before the GuC can be turned on by default.

Matt

> > This could mean a tool in IGT to access/capture GuC logs and update bug
> > filing instructions.
> >
> > Leading from here is probably the need for the GuC firmware team to
> > cross the internal-upstream boundary and deal with such bug reports on
> > upstream trackers. Upstream GuC is unlikely to work if we don't have
> > such plan and commitment.
> 
> I mostly agree here as well.  I'm not sure it'll actually happen but
> I'd like anyone who writes code which impacts Linux to be active in
> upstream bug trackers.
> 
> > Also leading from here is the need for GPU error capture to be on par
> > from day one which is I believe still not there in the firmware.
> 
> This one has me genuinely concerned.  I've heard rumors that we don't
> have competent error captures with GuC yet.  From the Mesa PoV, this
> is a non-starter.  We can't be asked to develop graphics drivers with
> no error capture.
> 
> The good news is that, based on my understanding, it shouldn't be
> terrible to support.  We just need the GuC to grab all the registers
> for us and shove them in a buffer somewhere before it resets the GPU
> and all that data is lost.  I would hope the Windows people have
> already done that and we just need to hook it up.  If not, there may
> be some GuC engineering required here.
> 
> > Another, although unrelated, missing feature on my wish list is firmware
> > support for wiring up accurate engine busyness stats to i915 PMU. I
> > believe this is also being worked on but I don't know when is the
> > expected delivery.
> >
> > If we are tracking a TODO list of items somewhere I think these ones
> > should be definitely considered.
> 
> Yup, let's get it all in the ToDo and not flip GuC on by default in
> the wild until it's all checked off.
> 
> --Jason

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 01/97] drm/i915/gt: Move engine setup out of set_default_submission
  2021-05-06 19:13 ` [RFC PATCH 01/97] drm/i915/gt: Move engine setup out of set_default_submission Matthew Brost
@ 2021-05-19  0:25   ` Matthew Brost
  2021-05-25  8:44   ` [Intel-gfx] " Tvrtko Ursulin
  1 sibling, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-19  0:25 UTC (permalink / raw)
  To: intel-gfx, dri-devel

On Thu, May 06, 2021 at 12:13:15PM -0700, Matthew Brost wrote:
> From: Chris Wilson <chris@chris-wilson.co.uk>
> 
> Now that we no longer switch back and forth between guc and execlists,
> we no longer need to restore the backend's vfuncs and can leave them set
> after initialisation. The only catch is that we lose the submission on
> wedging and still need to reset the submit_request vfunc on unwedging.
> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>
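
For anyone wondering about the "catch" in the commit message, the
wedge/unwedge flow is roughly the following (a sketch from memory only;
the helper names may not match the tree exactly and are not part of this
patch):

static void sketch_wedge(struct intel_gt *gt)
{
	struct intel_engine_cs *engine;
	enum intel_engine_id id;

	/* New requests are short-circuited without touching the HW. */
	for_each_engine(engine, gt, id)
		engine->submit_request = nop_submit_request;
}

static void sketch_unwedge(struct intel_gt *gt)
{
	struct intel_engine_cs *engine;
	enum intel_engine_id id;

	/*
	 * Each backend is asked to restore its submission vfuncs, which
	 * is why set_default_submission() must keep (re)setting
	 * submit_request even though everything else is now set once.
	 */
	for_each_engine(engine, gt, id)
		engine->set_default_submission(engine);
}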

> ---
>  .../drm/i915/gt/intel_execlists_submission.c  | 46 ++++++++---------
>  .../gpu/drm/i915/gt/intel_ring_submission.c   |  4 --
>  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 50 ++++++++-----------
>  3 files changed, 44 insertions(+), 56 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> index de124870af44..1108c193ab65 100644
> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> @@ -3076,29 +3076,6 @@ static void execlists_set_default_submission(struct intel_engine_cs *engine)
>  	engine->submit_request = execlists_submit_request;
>  	engine->schedule = i915_schedule;
>  	engine->execlists.tasklet.callback = execlists_submission_tasklet;
> -
> -	engine->reset.prepare = execlists_reset_prepare;
> -	engine->reset.rewind = execlists_reset_rewind;
> -	engine->reset.cancel = execlists_reset_cancel;
> -	engine->reset.finish = execlists_reset_finish;
> -
> -	engine->park = execlists_park;
> -	engine->unpark = NULL;
> -
> -	engine->flags |= I915_ENGINE_SUPPORTS_STATS;
> -	if (!intel_vgpu_active(engine->i915)) {
> -		engine->flags |= I915_ENGINE_HAS_SEMAPHORES;
> -		if (can_preempt(engine)) {
> -			engine->flags |= I915_ENGINE_HAS_PREEMPTION;
> -			if (IS_ACTIVE(CONFIG_DRM_I915_TIMESLICE_DURATION))
> -				engine->flags |= I915_ENGINE_HAS_TIMESLICES;
> -		}
> -	}
> -
> -	if (intel_engine_has_preemption(engine))
> -		engine->emit_bb_start = gen8_emit_bb_start;
> -	else
> -		engine->emit_bb_start = gen8_emit_bb_start_noarb;
>  }
>  
>  static void execlists_shutdown(struct intel_engine_cs *engine)
> @@ -3129,6 +3106,14 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
>  	engine->cops = &execlists_context_ops;
>  	engine->request_alloc = execlists_request_alloc;
>  
> +	engine->reset.prepare = execlists_reset_prepare;
> +	engine->reset.rewind = execlists_reset_rewind;
> +	engine->reset.cancel = execlists_reset_cancel;
> +	engine->reset.finish = execlists_reset_finish;
> +
> +	engine->park = execlists_park;
> +	engine->unpark = NULL;
> +
>  	engine->emit_flush = gen8_emit_flush_xcs;
>  	engine->emit_init_breadcrumb = gen8_emit_init_breadcrumb;
>  	engine->emit_fini_breadcrumb = gen8_emit_fini_breadcrumb_xcs;
> @@ -3149,6 +3134,21 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
>  		 * until a more refined solution exists.
>  		 */
>  	}
> +
> +	engine->flags |= I915_ENGINE_SUPPORTS_STATS;
> +	if (!intel_vgpu_active(engine->i915)) {
> +		engine->flags |= I915_ENGINE_HAS_SEMAPHORES;
> +		if (can_preempt(engine)) {
> +			engine->flags |= I915_ENGINE_HAS_PREEMPTION;
> +			if (IS_ACTIVE(CONFIG_DRM_I915_TIMESLICE_DURATION))
> +				engine->flags |= I915_ENGINE_HAS_TIMESLICES;
> +		}
> +	}
> +
> +	if (intel_engine_has_preemption(engine))
> +		engine->emit_bb_start = gen8_emit_bb_start;
> +	else
> +		engine->emit_bb_start = gen8_emit_bb_start_noarb;
>  }
>  
>  static void logical_ring_default_irqs(struct intel_engine_cs *engine)
> diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> index 9585546556ee..5f4f7f1df48f 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> @@ -989,14 +989,10 @@ static void gen6_bsd_submit_request(struct i915_request *request)
>  static void i9xx_set_default_submission(struct intel_engine_cs *engine)
>  {
>  	engine->submit_request = i9xx_submit_request;
> -
> -	engine->park = NULL;
> -	engine->unpark = NULL;
>  }
>  
>  static void gen6_bsd_set_default_submission(struct intel_engine_cs *engine)
>  {
> -	i9xx_set_default_submission(engine);
>  	engine->submit_request = gen6_bsd_submit_request;
>  }
>  
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 92688a9b6717..f72faa0b8339 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -608,35 +608,6 @@ static int guc_resume(struct intel_engine_cs *engine)
>  static void guc_set_default_submission(struct intel_engine_cs *engine)
>  {
>  	engine->submit_request = guc_submit_request;
> -	engine->schedule = i915_schedule;
> -	engine->execlists.tasklet.callback = guc_submission_tasklet;
> -
> -	engine->reset.prepare = guc_reset_prepare;
> -	engine->reset.rewind = guc_reset_rewind;
> -	engine->reset.cancel = guc_reset_cancel;
> -	engine->reset.finish = guc_reset_finish;
> -
> -	engine->flags |= I915_ENGINE_NEEDS_BREADCRUMB_TASKLET;
> -	engine->flags |= I915_ENGINE_HAS_PREEMPTION;
> -
> -	/*
> -	 * TODO: GuC supports timeslicing and semaphores as well, but they're
> -	 * handled by the firmware so some minor tweaks are required before
> -	 * enabling.
> -	 *
> -	 * engine->flags |= I915_ENGINE_HAS_TIMESLICES;
> -	 * engine->flags |= I915_ENGINE_HAS_SEMAPHORES;
> -	 */
> -
> -	engine->emit_bb_start = gen8_emit_bb_start;
> -
> -	/*
> -	 * For the breadcrumb irq to work we need the interrupts to stay
> -	 * enabled. However, on all platforms on which we'll have support for
> -	 * GuC submission we don't allow disabling the interrupts at runtime, so
> -	 * we're always safe with the current flow.
> -	 */
> -	GEM_BUG_ON(engine->irq_enable || engine->irq_disable);
>  }
>  
>  static void guc_release(struct intel_engine_cs *engine)
> @@ -658,6 +629,13 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
>  	engine->cops = &guc_context_ops;
>  	engine->request_alloc = guc_request_alloc;
>  
> +	engine->schedule = i915_schedule;
> +
> +	engine->reset.prepare = guc_reset_prepare;
> +	engine->reset.rewind = guc_reset_rewind;
> +	engine->reset.cancel = guc_reset_cancel;
> +	engine->reset.finish = guc_reset_finish;
> +
>  	engine->emit_flush = gen8_emit_flush_xcs;
>  	engine->emit_init_breadcrumb = gen8_emit_init_breadcrumb;
>  	engine->emit_fini_breadcrumb = gen8_emit_fini_breadcrumb_xcs;
> @@ -666,6 +644,20 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
>  		engine->emit_flush = gen12_emit_flush_xcs;
>  	}
>  	engine->set_default_submission = guc_set_default_submission;
> +
> +	engine->flags |= I915_ENGINE_NEEDS_BREADCRUMB_TASKLET;
> +	engine->flags |= I915_ENGINE_HAS_PREEMPTION;
> +
> +	/*
> +	 * TODO: GuC supports timeslicing and semaphores as well, but they're
> +	 * handled by the firmware so some minor tweaks are required before
> +	 * enabling.
> +	 *
> +	 * engine->flags |= I915_ENGINE_HAS_TIMESLICES;
> +	 * engine->flags |= I915_ENGINE_HAS_SEMAPHORES;
> +	 */
> +
> +	engine->emit_bb_start = gen8_emit_bb_start;
>  }
>  
>  static void rcs_submission_override(struct intel_engine_cs *engine)
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 02/97] drm/i915/gt: Move submission_method into intel_gt
  2021-05-06 19:13 ` [RFC PATCH 02/97] drm/i915/gt: Move submission_method into intel_gt Matthew Brost
@ 2021-05-19  3:10   ` Matthew Brost
  2021-05-25  8:44   ` [Intel-gfx] " Tvrtko Ursulin
  1 sibling, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-19  3:10 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: tvrtko.ursulin, daniele.ceraolospurio, jason.ekstrand,
	jon.bloomfield, daniel.vetter, john.c.harrison

On Thu, May 06, 2021 at 12:13:16PM -0700, Matthew Brost wrote:
> From: Chris Wilson <chris@chris-wilson.co.uk>
> 
> Since we set up the submission method for the engines once, it is easy to
> assign an enum and use that instead of probing into the backends.
> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> ---
>  drivers/gpu/drm/i915/gt/intel_engine.h               |  8 +++++++-
>  drivers/gpu/drm/i915/gt/intel_engine_cs.c            | 12 ++++++++----
>  drivers/gpu/drm/i915/gt/intel_execlists_submission.c |  8 --------
>  drivers/gpu/drm/i915/gt/intel_execlists_submission.h |  3 ---
>  drivers/gpu/drm/i915/gt/intel_gt_types.h             |  7 +++++++
>  drivers/gpu/drm/i915/gt/intel_reset.c                |  7 +++----
>  drivers/gpu/drm/i915/gt/selftest_execlists.c         |  2 +-
>  drivers/gpu/drm/i915/gt/selftest_ring_submission.c   |  2 +-
>  drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c    |  5 -----
>  drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h    |  1 -
>  drivers/gpu/drm/i915/i915_perf.c                     | 10 +++++-----
>  11 files changed, 32 insertions(+), 33 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
> index 47ee8578e511..8d9184920c51 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine.h
> @@ -13,8 +13,9 @@
>  #include "i915_reg.h"
>  #include "i915_request.h"
>  #include "i915_selftest.h"
> -#include "gt/intel_timeline.h"
>  #include "intel_engine_types.h"
> +#include "intel_gt_types.h"
> +#include "intel_timeline.h"
>  #include "intel_workarounds.h"
>  
>  struct drm_printer;
> @@ -262,6 +263,11 @@ void intel_engine_init_active(struct intel_engine_cs *engine,
>  #define ENGINE_MOCK	1
>  #define ENGINE_VIRTUAL	2
>  
> +static inline bool intel_engine_uses_guc(const struct intel_engine_cs *engine)
> +{
> +	return engine->gt->submission_method >= INTEL_SUBMISSION_GUC;
> +}
> +
>  static inline bool
>  intel_engine_has_preempt_reset(const struct intel_engine_cs *engine)
>  {
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> index 6dbdbde00f14..0618379b68ca 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> @@ -909,12 +909,16 @@ int intel_engines_init(struct intel_gt *gt)
>  	enum intel_engine_id id;
>  	int err;
>  
> -	if (intel_uc_uses_guc_submission(&gt->uc))
> +	if (intel_uc_uses_guc_submission(&gt->uc)) {
> +		gt->submission_method = INTEL_SUBMISSION_GUC;
>  		setup = intel_guc_submission_setup;
> -	else if (HAS_EXECLISTS(gt->i915))
> +	} else if (HAS_EXECLISTS(gt->i915)) {
> +		gt->submission_method = INTEL_SUBMISSION_ELSP;
>  		setup = intel_execlists_submission_setup;
> -	else
> +	} else {
> +		gt->submission_method = INTEL_SUBMISSION_RING;
>  		setup = intel_ring_submission_setup;
> +	}
>  
>  	for_each_engine(engine, gt, id) {
>  		err = engine_setup_common(engine);
> @@ -1479,7 +1483,7 @@ static void intel_engine_print_registers(struct intel_engine_cs *engine,
>  		drm_printf(m, "\tIPEHR: 0x%08x\n", ENGINE_READ(engine, IPEHR));
>  	}
>  
> -	if (intel_engine_in_guc_submission_mode(engine)) {
> +	if (intel_engine_uses_guc(engine)) {
>  		/* nothing to print yet */
>  	} else if (HAS_EXECLISTS(dev_priv)) {
>  		struct i915_request * const *port, *rq;
> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> index 1108c193ab65..9d2da5ccaef6 100644
> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> @@ -1768,7 +1768,6 @@ process_csb(struct intel_engine_cs *engine, struct i915_request **inactive)
>  	 */
>  	GEM_BUG_ON(!tasklet_is_locked(&execlists->tasklet) &&
>  		   !reset_in_progress(execlists));
> -	GEM_BUG_ON(!intel_engine_in_execlists_submission_mode(engine));
>  
>  	/*
>  	 * Note that csb_write, csb_status may be either in HWSP or mmio.
> @@ -3884,13 +3883,6 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine,
>  	spin_unlock_irqrestore(&engine->active.lock, flags);
>  }
>  
> -bool
> -intel_engine_in_execlists_submission_mode(const struct intel_engine_cs *engine)
> -{
> -	return engine->set_default_submission ==
> -	       execlists_set_default_submission;
> -}
> -
>  #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
>  #include "selftest_execlists.c"
>  #endif
> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.h b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
> index fd61dae820e9..4ca9b475e252 100644
> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
> @@ -43,7 +43,4 @@ int intel_virtual_engine_attach_bond(struct intel_engine_cs *engine,
>  				     const struct intel_engine_cs *master,
>  				     const struct intel_engine_cs *sibling);
>  
> -bool
> -intel_engine_in_execlists_submission_mode(const struct intel_engine_cs *engine);
> -
>  #endif /* __INTEL_EXECLISTS_SUBMISSION_H__ */
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h
> index 0caf6ca0a784..fecfacf551d5 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
> @@ -31,6 +31,12 @@ struct i915_ggtt;
>  struct intel_engine_cs;
>  struct intel_uncore;
>  
> +enum intel_submission_method {
> +	INTEL_SUBMISSION_RING,
> +	INTEL_SUBMISSION_ELSP,
> +	INTEL_SUBMISSION_GUC,
> +};
> +
>  struct intel_gt {
>  	struct drm_i915_private *i915;
>  	struct intel_uncore *uncore;
> @@ -118,6 +124,7 @@ struct intel_gt {
>  	struct intel_engine_cs *engine[I915_NUM_ENGINES];
>  	struct intel_engine_cs *engine_class[MAX_ENGINE_CLASS + 1]
>  					    [MAX_ENGINE_INSTANCE + 1];
> +	enum intel_submission_method submission_method;
>  
>  	/*
>  	 * Default address space (either GGTT or ppGTT depending on arch).
> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> index a377c4588aaa..d5094be6d90f 100644
> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> @@ -1118,7 +1118,6 @@ static int intel_gt_reset_engine(struct intel_engine_cs *engine)
>  int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)
>  {
>  	struct intel_gt *gt = engine->gt;
> -	bool uses_guc = intel_engine_in_guc_submission_mode(engine);
>  	int ret;
>  
>  	ENGINE_TRACE(engine, "flags=%lx\n", gt->reset.flags);
> @@ -1134,10 +1133,10 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)
>  			   "Resetting %s for %s\n", engine->name, msg);
>  	atomic_inc(&engine->i915->gpu_error.reset_engine_count[engine->uabi_class]);
>  
> -	if (!uses_guc)
> -		ret = intel_gt_reset_engine(engine);
> -	else
> +	if (intel_engine_uses_guc(engine))
>  		ret = intel_guc_reset_engine(&engine->gt->uc.guc, engine);
> +	else
> +		ret = intel_gt_reset_engine(engine);
>  	if (ret) {
>  		/* If we fail here, we expect to fallback to a global reset */
>  		ENGINE_TRACE(engine, "Failed to reset, err: %d\n", ret);
> diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c
> index 1081cd36a2bd..1f93591a8c69 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_execlists.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c
> @@ -4716,7 +4716,7 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
>  		SUBTEST(live_virtual_reset),
>  	};
>  
> -	if (!HAS_EXECLISTS(i915))
> +	if (i915->gt.submission_method != INTEL_SUBMISSION_ELSP)
>  		return 0;
>  
>  	if (intel_gt_is_wedged(&i915->gt))
> diff --git a/drivers/gpu/drm/i915/gt/selftest_ring_submission.c b/drivers/gpu/drm/i915/gt/selftest_ring_submission.c
> index 99609271c3a7..c12e74171b63 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_ring_submission.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_ring_submission.c
> @@ -291,7 +291,7 @@ int intel_ring_submission_live_selftests(struct drm_i915_private *i915)
>  		SUBTEST(live_ctx_switch_wa),
>  	};
>  
> -	if (HAS_EXECLISTS(i915))
> +	if (i915->gt.submission_method > INTEL_SUBMISSION_RING)
>  		return 0;
>  
>  	return intel_gt_live_subtests(tests, &i915->gt);
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index f72faa0b8339..17b551a0c89f 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -745,8 +745,3 @@ void intel_guc_submission_init_early(struct intel_guc *guc)
>  {
>  	guc->submission_selected = __guc_submission_selected(guc);
>  }
> -
> -bool intel_engine_in_guc_submission_mode(const struct intel_engine_cs *engine)
> -{
> -	return engine->set_default_submission == guc_set_default_submission;
> -}
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
> index 5f7b9e6347d0..3f7005018939 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
> @@ -20,7 +20,6 @@ void intel_guc_submission_fini(struct intel_guc *guc);
>  int intel_guc_preempt_work_create(struct intel_guc *guc);
>  void intel_guc_preempt_work_destroy(struct intel_guc *guc);
>  int intel_guc_submission_setup(struct intel_engine_cs *engine);
> -bool intel_engine_in_guc_submission_mode(const struct intel_engine_cs *engine);
>  
>  static inline bool intel_guc_submission_is_supported(struct intel_guc *guc)
>  {
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index 85ad62dbabfa..66f1f25119b5 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -1257,11 +1257,7 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
>  	case 8:
>  	case 9:
>  	case 10:
> -		if (intel_engine_in_execlists_submission_mode(ce->engine)) {
> -			stream->specific_ctx_id_mask =
> -				(1U << GEN8_CTX_ID_WIDTH) - 1;
> -			stream->specific_ctx_id = stream->specific_ctx_id_mask;
> -		} else {
> +		if (intel_engine_uses_guc(ce->engine)) {
>  			/*
>  			 * When using GuC, the context descriptor we write in
>  			 * i915 is read by GuC and rewritten before it's
> @@ -1280,6 +1276,10 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
>  			 */
>  			stream->specific_ctx_id_mask =
>  				(1U << (GEN8_CTX_ID_WIDTH - 1)) - 1;
> +		} else {
> +			stream->specific_ctx_id_mask =
> +				(1U << GEN8_CTX_ID_WIDTH) - 1;
> +			stream->specific_ctx_id = stream->specific_ctx_id_mask;
>  		}
>  		break;
>  
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 03/97] drm/i915/gt: Move CS interrupt handler to the backend
  2021-05-06 19:13 ` [RFC PATCH 03/97] drm/i915/gt: Move CS interrupt handler to the backend Matthew Brost
@ 2021-05-19  3:31   ` Matthew Brost
  2021-05-25  8:45   ` [Intel-gfx] " Tvrtko Ursulin
  1 sibling, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-19  3:31 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: tvrtko.ursulin, daniele.ceraolospurio, jason.ekstrand,
	jon.bloomfield, daniel.vetter, john.c.harrison

On Thu, May 06, 2021 at 12:13:17PM -0700, Matthew Brost wrote:
> From: Chris Wilson <chris@chris-wilson.co.uk>
> 
> The different submission backends each have their own preferred
> behaviour and interrupt setup. Let each handle their own interrupts.
> 
> This becomes more useful later as we extract the use of auxiliary
> state in the interrupt handler that is backend specific.
> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>
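
One note for anyone reading the diff below: the dispatch side lives in the
intel_gt_irq.h hunk, which is not fully visible in this quote. From memory
it boils down to a tiny per-engine indirection along these lines (a sketch,
not necessarily the exact hunk):

static inline void
intel_engine_cs_irq(struct intel_engine_cs *engine, u16 iir)
{
	/* Dispatch to whichever handler the backend installed. */
	if (iir)
		engine->irq_handler(engine, iir);
}

static inline void
intel_engine_set_irq_handler(struct intel_engine_cs *engine,
			     void (*fn)(struct intel_engine_cs *engine,
					u16 iir))
{
	/* Installed by each backend from its submission setup. */
	engine->irq_handler = fn;
}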

> ---
>  drivers/gpu/drm/i915/gt/intel_engine_cs.c     |  7 ++
>  drivers/gpu/drm/i915/gt/intel_engine_types.h  | 14 +---
>  .../drm/i915/gt/intel_execlists_submission.c  | 41 ++++++++++
>  drivers/gpu/drm/i915/gt/intel_gt_irq.c        | 82 ++++++-------------
>  drivers/gpu/drm/i915/gt/intel_gt_irq.h        | 23 ++++++
>  .../gpu/drm/i915/gt/intel_ring_submission.c   |  8 ++
>  drivers/gpu/drm/i915/gt/intel_rps.c           |  2 +-
>  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 11 ++-
>  drivers/gpu/drm/i915/i915_irq.c               | 10 ++-
>  9 files changed, 124 insertions(+), 74 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> index 0618379b68ca..828e1669f92c 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> @@ -255,6 +255,11 @@ static void intel_engine_sanitize_mmio(struct intel_engine_cs *engine)
>  	intel_engine_set_hwsp_writemask(engine, ~0u);
>  }
>  
> +static void nop_irq_handler(struct intel_engine_cs *engine, u16 iir)
> +{
> +	GEM_DEBUG_WARN_ON(iir);
> +}
> +
>  static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id)
>  {
>  	const struct engine_info *info = &intel_engines[id];
> @@ -292,6 +297,8 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id)
>  	engine->hw_id = info->hw_id;
>  	engine->guc_id = MAKE_GUC_ID(info->class, info->instance);
>  
> +	engine->irq_handler = nop_irq_handler;
> +
>  	engine->class = info->class;
>  	engine->instance = info->instance;
>  	__sprint_engine_name(engine);
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> index 883bafc44902..9ef349cd5cea 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> @@ -402,6 +402,7 @@ struct intel_engine_cs {
>  	u32		irq_enable_mask; /* bitmask to enable ring interrupt */
>  	void		(*irq_enable)(struct intel_engine_cs *engine);
>  	void		(*irq_disable)(struct intel_engine_cs *engine);
> +	void		(*irq_handler)(struct intel_engine_cs *engine, u16 iir);
>  
>  	void		(*sanitize)(struct intel_engine_cs *engine);
>  	int		(*resume)(struct intel_engine_cs *engine);
> @@ -481,10 +482,9 @@ struct intel_engine_cs {
>  #define I915_ENGINE_HAS_PREEMPTION   BIT(2)
>  #define I915_ENGINE_HAS_SEMAPHORES   BIT(3)
>  #define I915_ENGINE_HAS_TIMESLICES   BIT(4)
> -#define I915_ENGINE_NEEDS_BREADCRUMB_TASKLET BIT(5)
> -#define I915_ENGINE_IS_VIRTUAL       BIT(6)
> -#define I915_ENGINE_HAS_RELATIVE_MMIO BIT(7)
> -#define I915_ENGINE_REQUIRES_CMD_PARSER BIT(8)
> +#define I915_ENGINE_IS_VIRTUAL       BIT(5)
> +#define I915_ENGINE_HAS_RELATIVE_MMIO BIT(6)
> +#define I915_ENGINE_REQUIRES_CMD_PARSER BIT(7)
>  	unsigned int flags;
>  
>  	/*
> @@ -593,12 +593,6 @@ intel_engine_has_timeslices(const struct intel_engine_cs *engine)
>  	return engine->flags & I915_ENGINE_HAS_TIMESLICES;
>  }
>  
> -static inline bool
> -intel_engine_needs_breadcrumb_tasklet(const struct intel_engine_cs *engine)
> -{
> -	return engine->flags & I915_ENGINE_NEEDS_BREADCRUMB_TASKLET;
> -}
> -
>  static inline bool
>  intel_engine_is_virtual(const struct intel_engine_cs *engine)
>  {
> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> index 9d2da5ccaef6..8db200422950 100644
> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> @@ -118,6 +118,7 @@
>  #include "intel_engine_stats.h"
>  #include "intel_execlists_submission.h"
>  #include "intel_gt.h"
> +#include "intel_gt_irq.h"
>  #include "intel_gt_pm.h"
>  #include "intel_gt_requests.h"
>  #include "intel_lrc.h"
> @@ -2384,6 +2385,45 @@ static void execlists_submission_tasklet(struct tasklet_struct *t)
>  	rcu_read_unlock();
>  }
>  
> +static void execlists_irq_handler(struct intel_engine_cs *engine, u16 iir)
> +{
> +	bool tasklet = false;
> +
> +	if (unlikely(iir & GT_CS_MASTER_ERROR_INTERRUPT)) {
> +		u32 eir;
> +
> +		/* Upper 16b are the enabling mask, rsvd for internal errors */
> +		eir = ENGINE_READ(engine, RING_EIR) & GENMASK(15, 0);
> +		ENGINE_TRACE(engine, "CS error: %x\n", eir);
> +
> +		/* Disable the error interrupt until after the reset */
> +		if (likely(eir)) {
> +			ENGINE_WRITE(engine, RING_EMR, ~0u);
> +			ENGINE_WRITE(engine, RING_EIR, eir);
> +			WRITE_ONCE(engine->execlists.error_interrupt, eir);
> +			tasklet = true;
> +		}
> +	}
> +
> +	if (iir & GT_WAIT_SEMAPHORE_INTERRUPT) {
> +		WRITE_ONCE(engine->execlists.yield,
> +			   ENGINE_READ_FW(engine, RING_EXECLIST_STATUS_HI));
> +		ENGINE_TRACE(engine, "semaphore yield: %08x\n",
> +			     engine->execlists.yield);
> +		if (del_timer(&engine->execlists.timer))
> +			tasklet = true;
> +	}
> +
> +	if (iir & GT_CONTEXT_SWITCH_INTERRUPT)
> +		tasklet = true;
> +
> +	if (iir & GT_RENDER_USER_INTERRUPT)
> +		intel_engine_signal_breadcrumbs(engine);
> +
> +	if (tasklet)
> +		tasklet_hi_schedule(&engine->execlists.tasklet);
> +}
> +
>  static void __execlists_kick(struct intel_engine_execlists *execlists)
>  {
>  	/* Kick the tasklet for some interrupt coalescing and reset handling */
> @@ -3133,6 +3173,7 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
>  		 * until a more refined solution exists.
>  		 */
>  	}
> +	intel_engine_set_irq_handler(engine, execlists_irq_handler);
>  
>  	engine->flags |= I915_ENGINE_SUPPORTS_STATS;
>  	if (!intel_vgpu_active(engine->i915)) {
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_irq.c b/drivers/gpu/drm/i915/gt/intel_gt_irq.c
> index 9fc6c912a4e5..d29126c458ba 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_irq.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_irq.c
> @@ -20,48 +20,6 @@ static void guc_irq_handler(struct intel_guc *guc, u16 iir)
>  		intel_guc_to_host_event_handler(guc);
>  }
>  
> -static void
> -cs_irq_handler(struct intel_engine_cs *engine, u32 iir)
> -{
> -	bool tasklet = false;
> -
> -	if (unlikely(iir & GT_CS_MASTER_ERROR_INTERRUPT)) {
> -		u32 eir;
> -
> -		/* Upper 16b are the enabling mask, rsvd for internal errors */
> -		eir = ENGINE_READ(engine, RING_EIR) & GENMASK(15, 0);
> -		ENGINE_TRACE(engine, "CS error: %x\n", eir);
> -
> -		/* Disable the error interrupt until after the reset */
> -		if (likely(eir)) {
> -			ENGINE_WRITE(engine, RING_EMR, ~0u);
> -			ENGINE_WRITE(engine, RING_EIR, eir);
> -			WRITE_ONCE(engine->execlists.error_interrupt, eir);
> -			tasklet = true;
> -		}
> -	}
> -
> -	if (iir & GT_WAIT_SEMAPHORE_INTERRUPT) {
> -		WRITE_ONCE(engine->execlists.yield,
> -			   ENGINE_READ_FW(engine, RING_EXECLIST_STATUS_HI));
> -		ENGINE_TRACE(engine, "semaphore yield: %08x\n",
> -			     engine->execlists.yield);
> -		if (del_timer(&engine->execlists.timer))
> -			tasklet = true;
> -	}
> -
> -	if (iir & GT_CONTEXT_SWITCH_INTERRUPT)
> -		tasklet = true;
> -
> -	if (iir & GT_RENDER_USER_INTERRUPT) {
> -		intel_engine_signal_breadcrumbs(engine);
> -		tasklet |= intel_engine_needs_breadcrumb_tasklet(engine);
> -	}
> -
> -	if (tasklet)
> -		tasklet_hi_schedule(&engine->execlists.tasklet);
> -}
> -
>  static u32
>  gen11_gt_engine_identity(struct intel_gt *gt,
>  			 const unsigned int bank, const unsigned int bit)
> @@ -122,7 +80,7 @@ gen11_engine_irq_handler(struct intel_gt *gt, const u8 class,
>  		engine = NULL;
>  
>  	if (likely(engine))
> -		return cs_irq_handler(engine, iir);
> +		return intel_engine_cs_irq(engine, iir);
>  
>  	WARN_ONCE(1, "unhandled engine interrupt class=0x%x, instance=0x%x\n",
>  		  class, instance);
> @@ -275,9 +233,12 @@ void gen11_gt_irq_postinstall(struct intel_gt *gt)
>  void gen5_gt_irq_handler(struct intel_gt *gt, u32 gt_iir)
>  {
>  	if (gt_iir & GT_RENDER_USER_INTERRUPT)
> -		intel_engine_signal_breadcrumbs(gt->engine_class[RENDER_CLASS][0]);
> +		intel_engine_cs_irq(gt->engine_class[RENDER_CLASS][0],
> +				    gt_iir);
> +
>  	if (gt_iir & ILK_BSD_USER_INTERRUPT)
> -		intel_engine_signal_breadcrumbs(gt->engine_class[VIDEO_DECODE_CLASS][0]);
> +		intel_engine_cs_irq(gt->engine_class[VIDEO_DECODE_CLASS][0],
> +				    gt_iir);
>  }
>  
>  static void gen7_parity_error_irq_handler(struct intel_gt *gt, u32 iir)
> @@ -301,11 +262,16 @@ static void gen7_parity_error_irq_handler(struct intel_gt *gt, u32 iir)
>  void gen6_gt_irq_handler(struct intel_gt *gt, u32 gt_iir)
>  {
>  	if (gt_iir & GT_RENDER_USER_INTERRUPT)
> -		intel_engine_signal_breadcrumbs(gt->engine_class[RENDER_CLASS][0]);
> +		intel_engine_cs_irq(gt->engine_class[RENDER_CLASS][0],
> +				    gt_iir);
> +
>  	if (gt_iir & GT_BSD_USER_INTERRUPT)
> -		intel_engine_signal_breadcrumbs(gt->engine_class[VIDEO_DECODE_CLASS][0]);
> +		intel_engine_cs_irq(gt->engine_class[VIDEO_DECODE_CLASS][0],
> +				    gt_iir >> 12);
> +
>  	if (gt_iir & GT_BLT_USER_INTERRUPT)
> -		intel_engine_signal_breadcrumbs(gt->engine_class[COPY_ENGINE_CLASS][0]);
> +		intel_engine_cs_irq(gt->engine_class[COPY_ENGINE_CLASS][0],
> +				    gt_iir >> 22);
>  
>  	if (gt_iir & (GT_BLT_CS_ERROR_INTERRUPT |
>  		      GT_BSD_CS_ERROR_INTERRUPT |
> @@ -324,10 +290,10 @@ void gen8_gt_irq_handler(struct intel_gt *gt, u32 master_ctl)
>  	if (master_ctl & (GEN8_GT_RCS_IRQ | GEN8_GT_BCS_IRQ)) {
>  		iir = raw_reg_read(regs, GEN8_GT_IIR(0));
>  		if (likely(iir)) {
> -			cs_irq_handler(gt->engine_class[RENDER_CLASS][0],
> -				       iir >> GEN8_RCS_IRQ_SHIFT);
> -			cs_irq_handler(gt->engine_class[COPY_ENGINE_CLASS][0],
> -				       iir >> GEN8_BCS_IRQ_SHIFT);
> +			intel_engine_cs_irq(gt->engine_class[RENDER_CLASS][0],
> +					    iir >> GEN8_RCS_IRQ_SHIFT);
> +			intel_engine_cs_irq(gt->engine_class[COPY_ENGINE_CLASS][0],
> +					    iir >> GEN8_BCS_IRQ_SHIFT);
>  			raw_reg_write(regs, GEN8_GT_IIR(0), iir);
>  		}
>  	}
> @@ -335,10 +301,10 @@ void gen8_gt_irq_handler(struct intel_gt *gt, u32 master_ctl)
>  	if (master_ctl & (GEN8_GT_VCS0_IRQ | GEN8_GT_VCS1_IRQ)) {
>  		iir = raw_reg_read(regs, GEN8_GT_IIR(1));
>  		if (likely(iir)) {
> -			cs_irq_handler(gt->engine_class[VIDEO_DECODE_CLASS][0],
> -				       iir >> GEN8_VCS0_IRQ_SHIFT);
> -			cs_irq_handler(gt->engine_class[VIDEO_DECODE_CLASS][1],
> -				       iir >> GEN8_VCS1_IRQ_SHIFT);
> +			intel_engine_cs_irq(gt->engine_class[VIDEO_DECODE_CLASS][0],
> +					    iir >> GEN8_VCS0_IRQ_SHIFT);
> +			intel_engine_cs_irq(gt->engine_class[VIDEO_DECODE_CLASS][1],
> +					    iir >> GEN8_VCS1_IRQ_SHIFT);
>  			raw_reg_write(regs, GEN8_GT_IIR(1), iir);
>  		}
>  	}
> @@ -346,8 +312,8 @@ void gen8_gt_irq_handler(struct intel_gt *gt, u32 master_ctl)
>  	if (master_ctl & GEN8_GT_VECS_IRQ) {
>  		iir = raw_reg_read(regs, GEN8_GT_IIR(3));
>  		if (likely(iir)) {
> -			cs_irq_handler(gt->engine_class[VIDEO_ENHANCEMENT_CLASS][0],
> -				       iir >> GEN8_VECS_IRQ_SHIFT);
> +			intel_engine_cs_irq(gt->engine_class[VIDEO_ENHANCEMENT_CLASS][0],
> +					    iir >> GEN8_VECS_IRQ_SHIFT);
>  			raw_reg_write(regs, GEN8_GT_IIR(3), iir);
>  		}
>  	}
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_irq.h b/drivers/gpu/drm/i915/gt/intel_gt_irq.h
> index f667e976fb2b..41cad38668c5 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_irq.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_irq.h
> @@ -8,6 +8,8 @@
>  
>  #include <linux/types.h>
>  
> +#include "intel_engine_types.h"
> +
>  struct intel_gt;
>  
>  #define GEN8_GT_IRQS (GEN8_GT_RCS_IRQ | \
> @@ -39,4 +41,25 @@ void gen8_gt_irq_handler(struct intel_gt *gt, u32 master_ctl);
>  void gen8_gt_irq_reset(struct intel_gt *gt);
>  void gen8_gt_irq_postinstall(struct intel_gt *gt);
>  
> +static inline void intel_engine_cs_irq(struct intel_engine_cs *engine, u16 iir)
> +{
> +	if (iir)
> +		engine->irq_handler(engine, iir);
> +}
> +
> +static inline void
> +intel_engine_set_irq_handler(struct intel_engine_cs *engine,
> +			     void (*fn)(struct intel_engine_cs *engine,
> +					u16 iir))
> +{
> +	/*
> +	 * As the interrupt is live as allocate and setup the engines,
> +	 * err on the side of caution and apply barriers to updating
> +	 * the irq handler callback. This assures that when we do use
> +	 * the engine, we will receive interrupts only to ourselves,
> +	 * and not lose any.
> +	 */
> +	smp_store_mb(engine->irq_handler, fn);
> +}
> +
>  #endif /* INTEL_GT_IRQ_H */
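
The resulting flow: a submission backend installs its handler once at
setup time and the per-engine top half then dispatches through it. A
minimal usage sketch (illustrative only, using the two helpers added just
above):

	/* backend setup, e.g. from logical_ring_default_vfuncs() */
	intel_engine_set_irq_handler(engine, execlists_irq_handler);

	/* interrupt top half, per engine */
	intel_engine_cs_irq(engine, iir);	/* no-op when iir == 0 */
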
> diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> index 5f4f7f1df48f..2b6dffcc2262 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> @@ -12,6 +12,7 @@
>  #include "intel_breadcrumbs.h"
>  #include "intel_context.h"
>  #include "intel_gt.h"
> +#include "intel_gt_irq.h"
>  #include "intel_reset.h"
>  #include "intel_ring.h"
>  #include "shmem_utils.h"
> @@ -1017,10 +1018,17 @@ static void ring_release(struct intel_engine_cs *engine)
>  	intel_timeline_put(engine->legacy.timeline);
>  }
>  
> +static void irq_handler(struct intel_engine_cs *engine, u16 iir)
> +{
> +	intel_engine_signal_breadcrumbs(engine);
> +}
> +
>  static void setup_irq(struct intel_engine_cs *engine)
>  {
>  	struct drm_i915_private *i915 = engine->i915;
>  
> +	intel_engine_set_irq_handler(engine, irq_handler);
> +
>  	if (INTEL_GEN(i915) >= 6) {
>  		engine->irq_enable = gen6_irq_enable;
>  		engine->irq_disable = gen6_irq_disable;
> diff --git a/drivers/gpu/drm/i915/gt/intel_rps.c b/drivers/gpu/drm/i915/gt/intel_rps.c
> index 405d814e9040..97cab1b99871 100644
> --- a/drivers/gpu/drm/i915/gt/intel_rps.c
> +++ b/drivers/gpu/drm/i915/gt/intel_rps.c
> @@ -1774,7 +1774,7 @@ void gen6_rps_irq_handler(struct intel_rps *rps, u32 pm_iir)
>  		return;
>  
>  	if (pm_iir & PM_VEBOX_USER_INTERRUPT)
> -		intel_engine_signal_breadcrumbs(gt->engine[VECS0]);
> +		intel_engine_cs_irq(gt->engine[VECS0], pm_iir >> 10);
>  
>  	if (pm_iir & PM_VEBOX_CS_ERROR_INTERRUPT)
>  		DRM_DEBUG("Command parser error, pm_iir 0x%08x\n", pm_iir);
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 17b551a0c89f..335719f17490 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -11,6 +11,7 @@
>  #include "gt/intel_context.h"
>  #include "gt/intel_engine_pm.h"
>  #include "gt/intel_gt.h"
> +#include "gt/intel_gt_irq.h"
>  #include "gt/intel_gt_pm.h"
>  #include "gt/intel_lrc.h"
>  #include "gt/intel_mocs.h"
> @@ -264,6 +265,14 @@ static void guc_submission_tasklet(struct tasklet_struct *t)
>  	spin_unlock_irqrestore(&engine->active.lock, flags);
>  }
>  
> +static void cs_irq_handler(struct intel_engine_cs *engine, u16 iir)
> +{
> +	if (iir & GT_RENDER_USER_INTERRUPT) {
> +		intel_engine_signal_breadcrumbs(engine);
> +		tasklet_hi_schedule(&engine->execlists.tasklet);
> +	}
> +}
> +
>  static void guc_reset_prepare(struct intel_engine_cs *engine)
>  {
>  	struct intel_engine_execlists * const execlists = &engine->execlists;
> @@ -645,7 +654,6 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
>  	}
>  	engine->set_default_submission = guc_set_default_submission;
>  
> -	engine->flags |= I915_ENGINE_NEEDS_BREADCRUMB_TASKLET;
>  	engine->flags |= I915_ENGINE_HAS_PREEMPTION;
>  
>  	/*
> @@ -681,6 +689,7 @@ static void rcs_submission_override(struct intel_engine_cs *engine)
>  static inline void guc_default_irqs(struct intel_engine_cs *engine)
>  {
>  	engine->irq_keep_mask = GT_RENDER_USER_INTERRUPT;
> +	intel_engine_set_irq_handler(engine, cs_irq_handler);
>  }
>  
>  int intel_guc_submission_setup(struct intel_engine_cs *engine)
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index f6967a93ec7a..d58118806299 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -4014,7 +4014,7 @@ static irqreturn_t i8xx_irq_handler(int irq, void *arg)
>  		intel_uncore_write16(&dev_priv->uncore, GEN2_IIR, iir);
>  
>  		if (iir & I915_USER_INTERRUPT)
> -			intel_engine_signal_breadcrumbs(dev_priv->gt.engine[RCS0]);
> +			intel_engine_cs_irq(dev_priv->gt.engine[RCS0], iir);
>  
>  		if (iir & I915_MASTER_ERROR_INTERRUPT)
>  			i8xx_error_irq_handler(dev_priv, eir, eir_stuck);
> @@ -4122,7 +4122,7 @@ static irqreturn_t i915_irq_handler(int irq, void *arg)
>  		intel_uncore_write(&dev_priv->uncore, GEN2_IIR, iir);
>  
>  		if (iir & I915_USER_INTERRUPT)
> -			intel_engine_signal_breadcrumbs(dev_priv->gt.engine[RCS0]);
> +			intel_engine_cs_irq(dev_priv->gt.engine[RCS0], iir);
>  
>  		if (iir & I915_MASTER_ERROR_INTERRUPT)
>  			i9xx_error_irq_handler(dev_priv, eir, eir_stuck);
> @@ -4267,10 +4267,12 @@ static irqreturn_t i965_irq_handler(int irq, void *arg)
>  		intel_uncore_write(&dev_priv->uncore, GEN2_IIR, iir);
>  
>  		if (iir & I915_USER_INTERRUPT)
> -			intel_engine_signal_breadcrumbs(dev_priv->gt.engine[RCS0]);
> +			intel_engine_cs_irq(dev_priv->gt.engine[RCS0],
> +					    iir);
>  
>  		if (iir & I915_BSD_USER_INTERRUPT)
> -			intel_engine_signal_breadcrumbs(dev_priv->gt.engine[VCS0]);
> +			intel_engine_cs_irq(dev_priv->gt.engine[VCS0],
> +					    iir >> 25);
>  
>  		if (iir & I915_MASTER_ERROR_INTERRUPT)
>  			i9xx_error_irq_handler(dev_priv, eir, eir_stuck);
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 04/97] drm/i915/guc: skip disabling CTBs before sanitizing the GuC
  2021-05-06 19:13 ` [RFC PATCH 04/97] drm/i915/guc: skip disabling CTBs before sanitizing the GuC Matthew Brost
@ 2021-05-20 16:47   ` Matthew Brost
  0 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-20 16:47 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: tvrtko.ursulin, daniele.ceraolospurio, jason.ekstrand,
	jon.bloomfield, daniel.vetter, john.c.harrison

On Thu, May 06, 2021 at 12:13:18PM -0700, Matthew Brost wrote:
> From: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> 
> If we're about to sanitize the GuC, something might have gone wrong
> beforehand, so we should avoid trying to talk to it. Even if GuC is
> still running fine, the sanitize will reset its internal state and clear
> the CTB registration, so there is still no need to explicitly do so.
> 
> References: https://gitlab.freedesktop.org/drm/intel/-/issues/2469
> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Cc: John Harrison <John.C.Harrison@Intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> ---
>  drivers/gpu/drm/i915/gt/uc/intel_uc.c | 8 +-------
>  1 file changed, 1 insertion(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> index 6abb8f2dc33d..892c1315ce49 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> @@ -504,7 +504,7 @@ static int __uc_init_hw(struct intel_uc *uc)
>  
>  	ret = intel_guc_sample_forcewake(guc);
>  	if (ret)
> -		goto err_communication;
> +		goto err_log_capture;
>  
>  	if (intel_uc_uses_guc_submission(uc))
>  		intel_guc_submission_enable(guc);
> @@ -529,8 +529,6 @@ static int __uc_init_hw(struct intel_uc *uc)
>  	/*
>  	 * We've failed to load the firmware :(
>  	 */
> -err_communication:
> -	guc_disable_communication(guc);
>  err_log_capture:
>  	__uc_capture_load_err_log(uc);
>  err_out:
> @@ -558,9 +556,6 @@ static void __uc_fini_hw(struct intel_uc *uc)
>  	if (intel_uc_uses_guc_submission(uc))
>  		intel_guc_submission_disable(guc);
>  
> -	if (guc_communication_enabled(guc))
> -		guc_disable_communication(guc);
> -
>  	__uc_sanitize(uc);
>  }
>  
> @@ -577,7 +572,6 @@ void intel_uc_reset_prepare(struct intel_uc *uc)
>  	if (!intel_guc_is_ready(guc))
>  		return;
>  
> -	guc_disable_communication(guc);
>  	__uc_sanitize(uc);
>  }
>  
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 05/97] drm/i915/guc: use probe_error log for CT enablement failure
  2021-05-06 19:13 ` [RFC PATCH 05/97] drm/i915/guc: use probe_error log for CT enablement failure Matthew Brost
@ 2021-05-24 10:30   ` Michal Wajdeczko
  0 siblings, 0 replies; 249+ messages in thread
From: Michal Wajdeczko @ 2021-05-24 10:30 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel
  Cc: tvrtko.ursulin, jason.ekstrand, daniele.ceraolospurio,
	jon.bloomfield, daniel.vetter, john.c.harrison



On 06.05.2021 21:13, Matthew Brost wrote:
> From: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> 
> We have a couple of failure injection points in the CT enablement path,
> so we need to use i915_probe_error() to select the appropriate log level.
> A new macro (CT_PROBE_ERROR) has been added to the set of CT logging
> macros to be used in this scenario and upcoming ones.
> 
> While adding the new macros, fix the underlying logging mechanics used
> by the existing ones (DRM_DEV_* -> drm_*) and move the inlines to
> before they're used inside the macros.
> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 48 ++++++++++++-----------
>  1 file changed, 25 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index fa9e048cc65f..25618649048f 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -7,14 +7,36 @@
>  #include "intel_guc_ct.h"
>  #include "gt/intel_gt.h"
>  
> +static inline struct intel_guc *ct_to_guc(struct intel_guc_ct *ct)
> +{
> +	return container_of(ct, struct intel_guc, ct);
> +}
> +
> +static inline struct intel_gt *ct_to_gt(struct intel_guc_ct *ct)
> +{
> +	return guc_to_gt(ct_to_guc(ct));
> +}
> +
> +static inline struct drm_i915_private *ct_to_i915(struct intel_guc_ct *ct)
> +{
> +	return ct_to_gt(ct)->i915;
> +}
> +
> +static inline struct drm_device *ct_to_drm(struct intel_guc_ct *ct)
> +{
> +	return &ct_to_i915(ct)->drm;
> +}
> +
>  #define CT_ERROR(_ct, _fmt, ...) \
> -	DRM_DEV_ERROR(ct_to_dev(_ct), "CT: " _fmt, ##__VA_ARGS__)
> +	drm_err(ct_to_drm(_ct), "CT: " _fmt, ##__VA_ARGS__)
>  #ifdef CONFIG_DRM_I915_DEBUG_GUC
>  #define CT_DEBUG(_ct, _fmt, ...) \
> -	DRM_DEV_DEBUG_DRIVER(ct_to_dev(_ct), "CT: " _fmt, ##__VA_ARGS__)
> +	drm_dbg(ct_to_drm(_ct), "CT: " _fmt, ##__VA_ARGS__)
>  #else
>  #define CT_DEBUG(...)	do { } while (0)
>  #endif
> +#define CT_PROBE_ERROR(_ct, _fmt, ...) \
> +	i915_probe_error(ct_to_i915(ct), "CT: " _fmt, ##__VA_ARGS__);
>  
>  struct ct_request {
>  	struct list_head link;
> @@ -47,26 +69,6 @@ void intel_guc_ct_init_early(struct intel_guc_ct *ct)
>  	INIT_WORK(&ct->requests.worker, ct_incoming_request_worker_func);
>  }
>  
> -static inline struct intel_guc *ct_to_guc(struct intel_guc_ct *ct)
> -{
> -	return container_of(ct, struct intel_guc, ct);
> -}
> -
> -static inline struct intel_gt *ct_to_gt(struct intel_guc_ct *ct)
> -{
> -	return guc_to_gt(ct_to_guc(ct));
> -}
> -
> -static inline struct drm_i915_private *ct_to_i915(struct intel_guc_ct *ct)
> -{
> -	return ct_to_gt(ct)->i915;
> -}
> -
> -static inline struct device *ct_to_dev(struct intel_guc_ct *ct)
> -{
> -	return ct_to_i915(ct)->drm.dev;
> -}
> -
>  static inline const char *guc_ct_buffer_type_to_str(u32 type)
>  {
>  	switch (type) {
> @@ -264,7 +266,7 @@ int intel_guc_ct_enable(struct intel_guc_ct *ct)
>  err_deregister:
>  	ct_deregister_buffer(ct, INTEL_GUC_CT_BUFFER_TYPE_RECV);
>  err_out:
> -	CT_ERROR(ct, "Failed to open open CT channel (err=%d)\n", err);
> +	CT_PROBE_ERROR(ct, "Failed to open channel (err=%d)\n", err);

nit: while here we can start using %pe to print error
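
e.g. something along these lines, just a sketch of the %pe form (the error
gets wrapped in ERR_PTR() so printk can decode its symbolic name):

	CT_PROBE_ERROR(ct, "Failed to open channel (%pe)\n", ERR_PTR(err));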

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>

>  	return err;
>  }
>  
> 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 07/97] drm/i915/guc: Remove sample_forcewake h2g action
  2021-05-06 19:13 ` [RFC PATCH 07/97] drm/i915/guc: Remove sample_forcewake h2g action Matthew Brost
@ 2021-05-24 10:48   ` Michal Wajdeczko
  2021-05-25  0:36   ` Matthew Brost
  1 sibling, 0 replies; 249+ messages in thread
From: Michal Wajdeczko @ 2021-05-24 10:48 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel
  Cc: tvrtko.ursulin, jason.ekstrand, daniele.ceraolospurio,
	jon.bloomfield, daniel.vetter, john.c.harrison



On 06.05.2021 21:13, Matthew Brost wrote:
> From: Rodrigo Vivi <rodrigo.vivi@intel.com>
> 
> This action is no-op in the GuC side for a few versions already
> and it is getting entirely removed soon, in an upcoming version.
> 
> Time to remove before we face communication issues.
> 
> Cc:  Vinay Belgaumkar <vinay.belgaumkar@intel.com>
> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>

Acked-by: Michal Wajdeczko <michal.wajdeczko@intel.com>

> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc.c      | 16 ----------------
>  drivers/gpu/drm/i915/gt/uc/intel_guc.h      |  1 -
>  drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h |  4 ----
>  drivers/gpu/drm/i915/gt/uc/intel_uc.c       |  4 ----
>  4 files changed, 25 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> index adae04c47aab..ab2c8fe8cdfa 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> @@ -469,22 +469,6 @@ int intel_guc_to_host_process_recv_msg(struct intel_guc *guc,
>  	return 0;
>  }
>  
> -int intel_guc_sample_forcewake(struct intel_guc *guc)
> -{
> -	struct drm_i915_private *dev_priv = guc_to_gt(guc)->i915;
> -	u32 action[2];
> -
> -	action[0] = INTEL_GUC_ACTION_SAMPLE_FORCEWAKE;
> -	/* WaRsDisableCoarsePowerGating:skl,cnl */
> -	if (!HAS_RC6(dev_priv) || NEEDS_WaRsDisableCoarsePowerGating(dev_priv))
> -		action[1] = 0;
> -	else
> -		/* bit 0 and 1 are for Render and Media domain separately */
> -		action[1] = GUC_FORCEWAKE_RENDER | GUC_FORCEWAKE_MEDIA;
> -
> -	return intel_guc_send(guc, action, ARRAY_SIZE(action));
> -}
> -
>  /**
>   * intel_guc_auth_huc() - Send action to GuC to authenticate HuC ucode
>   * @guc: intel_guc structure
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> index bc2ba7d0626c..c20f3839de12 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> @@ -128,7 +128,6 @@ int intel_guc_send_mmio(struct intel_guc *guc, const u32 *action, u32 len,
>  			u32 *response_buf, u32 response_buf_size);
>  int intel_guc_to_host_process_recv_msg(struct intel_guc *guc,
>  				       const u32 *payload, u32 len);
> -int intel_guc_sample_forcewake(struct intel_guc *guc);
>  int intel_guc_auth_huc(struct intel_guc *guc, u32 rsa_offset);
>  int intel_guc_suspend(struct intel_guc *guc);
>  int intel_guc_resume(struct intel_guc *guc);
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
> index 79c560d9c0b6..0f9afcde1d0b 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
> @@ -302,9 +302,6 @@ struct guc_ct_buffer_desc {
>  #define GUC_CT_MSG_ACTION_SHIFT			16
>  #define GUC_CT_MSG_ACTION_MASK			0xFFFF
>  
> -#define GUC_FORCEWAKE_RENDER	(1 << 0)
> -#define GUC_FORCEWAKE_MEDIA	(1 << 1)
> -
>  #define GUC_POWER_UNSPECIFIED	0
>  #define GUC_POWER_D0		1
>  #define GUC_POWER_D1		2
> @@ -558,7 +555,6 @@ enum intel_guc_action {
>  	INTEL_GUC_ACTION_ENTER_S_STATE = 0x501,
>  	INTEL_GUC_ACTION_EXIT_S_STATE = 0x502,
>  	INTEL_GUC_ACTION_SLPC_REQUEST = 0x3003,
> -	INTEL_GUC_ACTION_SAMPLE_FORCEWAKE = 0x3005,
>  	INTEL_GUC_ACTION_AUTHENTICATE_HUC = 0x4000,
>  	INTEL_GUC_ACTION_REGISTER_COMMAND_TRANSPORT_BUFFER = 0x4505,
>  	INTEL_GUC_ACTION_DEREGISTER_COMMAND_TRANSPORT_BUFFER = 0x4506,
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> index 892c1315ce49..ab0789d66e06 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> @@ -502,10 +502,6 @@ static int __uc_init_hw(struct intel_uc *uc)
>  
>  	intel_huc_auth(huc);
>  
> -	ret = intel_guc_sample_forcewake(guc);
> -	if (ret)
> -		goto err_log_capture;
> -
>  	if (intel_uc_uses_guc_submission(uc))
>  		intel_guc_submission_enable(guc);
>  
> 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 30/97] drm/i915/uc: turn on GuC/HuC auto mode by default
  2021-05-06 19:13 ` [RFC PATCH 30/97] drm/i915/uc: turn on GuC/HuC auto mode by default Matthew Brost
@ 2021-05-24 11:00   ` Michal Wajdeczko
  0 siblings, 0 replies; 249+ messages in thread
From: Michal Wajdeczko @ 2021-05-24 11:00 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel
  Cc: tvrtko.ursulin, jason.ekstrand, daniele.ceraolospurio,
	jon.bloomfield, daniel.vetter, john.c.harrison



On 06.05.2021 21:13, Matthew Brost wrote:
> From: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> 
> This will enable HuC loading for Gen11+ by default if the binaries
> are available on the system. GuC submission still requires explicit
> enabling by the user.
> 
> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Cc: John Harrison <John.C.Harrison@Intel.com>
> ---
>  drivers/gpu/drm/i915/i915_params.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_params.h b/drivers/gpu/drm/i915/i915_params.h
> index 14cd64cc61d0..a0575948ab61 100644
> --- a/drivers/gpu/drm/i915/i915_params.h
> +++ b/drivers/gpu/drm/i915/i915_params.h
> @@ -59,7 +59,7 @@ struct drm_printer;
>  	param(int, disable_power_well, -1, 0400) \
>  	param(int, enable_ips, 1, 0600) \
>  	param(int, invert_brightness, 0, 0600) \
> -	param(int, enable_guc, 0, 0400) \
> +	param(int, enable_guc, -1, 0400) \

you also want to update param description from

	"(-1=auto, 0=disable [default], 1=GuC submission, 2=HuC load)");
to
	"(-1=auto [default], 0=disable, 1=GuC submission, 2=HuC load)");

>  	param(int, guc_log_level, -1, 0400) \
>  	param(char *, guc_firmware_path, NULL, 0400) \
>  	param(char *, huc_firmware_path, NULL, 0400) \
> 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 35/97] drm/i915/guc: Improve error message for unsolicited CT response
  2021-05-06 19:13 ` [RFC PATCH 35/97] drm/i915/guc: Improve error message for unsolicited CT response Matthew Brost
@ 2021-05-24 11:59   ` Michal Wajdeczko
  2021-05-25 17:32     ` Matthew Brost
  0 siblings, 1 reply; 249+ messages in thread
From: Michal Wajdeczko @ 2021-05-24 11:59 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel
  Cc: tvrtko.ursulin, jason.ekstrand, daniele.ceraolospurio,
	jon.bloomfield, daniel.vetter, john.c.harrison



On 06.05.2021 21:13, Matthew Brost wrote:
> Improve the error message when an unsolicited CT response is received by
> printing the fence that couldn't be found, the last fence, and all requests
> with a response outstanding.
> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index 217ab3ebd1af..a76603537fa8 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -703,12 +703,16 @@ static int ct_handle_response(struct intel_guc_ct *ct, struct ct_incoming_msg *r
>  		found = true;
>  		break;
>  	}
> -	spin_unlock_irqrestore(&ct->requests.lock, flags);
> -
>  	if (!found) {
>  		CT_ERROR(ct, "Unsolicited response (fence %u)\n", fence);
> -		return -ENOKEY;
> +		CT_ERROR(ct, "Could not find fence=%u, last_fence=%u\n", fence,
> +			 ct->requests.last_fence);

nit: this new wording may suggest that it's our fault, but that's not
necessarily true

> +		list_for_each_entry(req, &ct->requests.pending, link)
> +			CT_ERROR(ct, "request %u awaits response\n",
> +				 req->fence);

usually we don't send multiple requests that expect responses, so it's
very likely that the list of pending requests will be empty, and even if
the list is not empty, I'm not sure what the relation is between those
pending requests and this unsolicited response, so I'm wondering how these
extra errors could improve our debugging experience ?

> +		err = -ENOKEY;
>  	}
> +	spin_unlock_irqrestore(&ct->requests.lock, flags);
>  
>  	if (unlikely(err))
>  		return err;
> 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 36/97] drm/i915/guc: Add non blocking CTB send function
  2021-05-06 19:13 ` [RFC PATCH 36/97] drm/i915/guc: Add non blocking CTB send function Matthew Brost
@ 2021-05-24 12:21   ` Michal Wajdeczko
  2021-05-25 17:30     ` Matthew Brost
  2021-05-25  9:21   ` [Intel-gfx] " Tvrtko Ursulin
  1 sibling, 1 reply; 249+ messages in thread
From: Michal Wajdeczko @ 2021-05-24 12:21 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel
  Cc: tvrtko.ursulin, jason.ekstrand, daniele.ceraolospurio,
	jon.bloomfield, daniel.vetter, john.c.harrison



On 06.05.2021 21:13, Matthew Brost wrote:
> Add non blocking CTB send function, intel_guc_send_nb. In order to
> support a non blocking CTB send function a spin lock is needed to

spin lock was added in 16/97

> protect the CTB descriptors fields. Also the non blocking call must not
> update the fence value as this value is owned by the blocking call
> (intel_guc_send).

all H2G messages use a "fence"; the nb variant also needs to update it

> 
> The blocking CTB now must have a flow control mechanism to ensure the

s/blocking/non-blocking

> buffer isn't overrun. A lazy spin wait is used as we believe the flow
> control condition should be rare with properly sized buffer.

as this new nb function is still not used in this patch, maybe it would be
better to move the flow control to a separate patch for easier review ?

> 
> The function, intel_guc_send_nb, is exported in this patch but unused.
> Several patches later in the series make use of this function.
> 
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc.h    | 12 ++-
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 96 +++++++++++++++++++++--
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  7 +-
>  3 files changed, 105 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> index c20f3839de12..4c0a367e41d8 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> @@ -75,7 +75,15 @@ static inline struct intel_guc *log_to_guc(struct intel_guc_log *log)
>  static
>  inline int intel_guc_send(struct intel_guc *guc, const u32 *action, u32 len)
>  {
> -	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0);
> +	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0, 0);
> +}
> +
> +#define INTEL_GUC_SEND_NB		BIT(31)
> +static
> +inline int intel_guc_send_nb(struct intel_guc *guc, const u32 *action, u32 len)
> +{
> +	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0,
> +				 INTEL_GUC_SEND_NB);
>  }
>  
>  static inline int
> @@ -83,7 +91,7 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len,
>  			   u32 *response_buf, u32 response_buf_size)
>  {
>  	return intel_guc_ct_send(&guc->ct, action, len,
> -				 response_buf, response_buf_size);
> +				 response_buf, response_buf_size, 0);
>  }
>  
>  static inline void intel_guc_to_host_event_handler(struct intel_guc *guc)
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index a76603537fa8..af7314d45a78 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -3,6 +3,11 @@
>   * Copyright © 2016-2019 Intel Corporation
>   */
>  
> +#include <linux/circ_buf.h>
> +#include <linux/ktime.h>
> +#include <linux/time64.h>
> +#include <linux/timekeeping.h>
> +
>  #include "i915_drv.h"
>  #include "intel_guc_ct.h"
>  #include "gt/intel_gt.h"
> @@ -308,6 +313,7 @@ int intel_guc_ct_enable(struct intel_guc_ct *ct)
>  	if (unlikely(err))
>  		goto err_deregister;
>  
> +	ct->requests.last_fence = 1;

not needed

>  	ct->enabled = true;
>  
>  	return 0;
> @@ -343,10 +349,22 @@ static u32 ct_get_next_fence(struct intel_guc_ct *ct)
>  	return ++ct->requests.last_fence;
>  }
>  
> +static void write_barrier(struct intel_guc_ct *ct) {
> +	struct intel_guc *guc = ct_to_guc(ct);
> +	struct intel_gt *gt = guc_to_gt(guc);
> +
> +	if (i915_gem_object_is_lmem(guc->ct.vma->obj)) {
> +		GEM_BUG_ON(guc->send_regs.fw_domains);
> +		intel_uncore_write_fw(gt->uncore, GEN11_SOFT_SCRATCH(0), 0);
> +	} else {
> +		wmb();
> +	}
> +}

this chunk seems to be a good candidate for a separate patch that could be
introduced earlier

> +
>  static int ct_write(struct intel_guc_ct *ct,
>  		    const u32 *action,
>  		    u32 len /* in dwords */,
> -		    u32 fence)
> +		    u32 fence, u32 flags)
>  {
>  	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>  	struct guc_ct_buffer_desc *desc = ctb->desc;
> @@ -393,9 +411,13 @@ static int ct_write(struct intel_guc_ct *ct,
>  		 FIELD_PREP(GUC_CTB_MSG_0_NUM_DWORDS, len) |
>  		 FIELD_PREP(GUC_CTB_MSG_0_FENCE, fence);
>  
> -	hxg = FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
> -	      FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
> -			 GUC_HXG_REQUEST_MSG_0_DATA0, action[0]);
> +	hxg = (flags & INTEL_GUC_SEND_NB) ?
> +		(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_EVENT) |
> +		 FIELD_PREP(GUC_HXG_EVENT_MSG_0_ACTION |
> +			    GUC_HXG_EVENT_MSG_0_DATA0, action[0])) :
> +		(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
> +		 FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
> +			    GUC_HXG_REQUEST_MSG_0_DATA0, action[0]));
>  
>  	CT_DEBUG(ct, "writing (tail %u) %*ph %*ph %*ph\n",
>  		 tail, 4, &header, 4, &hxg, 4 * (len - 1), &action[1]);
> @@ -412,6 +434,12 @@ static int ct_write(struct intel_guc_ct *ct,
>  	}
>  	GEM_BUG_ON(tail > size);
>  
> +	/*
> +	 * make sure H2G buffer update and LRC tail update (if this triggering a
> +	 * submission) are visable before updating the descriptor tail

typo

> +	 */
> +	write_barrier(ct);
> +
>  	/* now update descriptor */
>  	WRITE_ONCE(desc->tail, tail);
>  
> @@ -466,6 +494,46 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
>  	return err;
>  }
>  
> +static inline bool ctb_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
> +{
> +	struct guc_ct_buffer_desc *desc = ctb->desc;
> +	u32 head = READ_ONCE(desc->head);
> +	u32 space;
> +
> +	space = CIRC_SPACE(desc->tail, head, ctb->size);

shouldn't we use READ_ONCE for reading the tail?

> +
> +	return space >= len_dw;
> +}
> +
> +static int ct_send_nb(struct intel_guc_ct *ct,
> +		      const u32 *action,
> +		      u32 len,
> +		      u32 flags)
> +{
> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> +	unsigned long spin_flags;
> +	u32 fence;
> +	int ret;
> +
> +	spin_lock_irqsave(&ctb->lock, spin_flags);
> +
> +	ret = ctb_has_room(ctb, len + 1);

why +1 ?

> +	if (unlikely(ret))
> +		goto out;
> +
> +	fence = ct_get_next_fence(ct);
> +	ret = ct_write(ct, action, len, fence, flags);
> +	if (unlikely(ret))
> +		goto out;
> +
> +	intel_guc_notify(ct_to_guc(ct));
> +
> +out:
> +	spin_unlock_irqrestore(&ctb->lock, spin_flags);
> +
> +	return ret;
> +}
> +
>  static int ct_send(struct intel_guc_ct *ct,
>  		   const u32 *action,
>  		   u32 len,
> @@ -473,6 +541,7 @@ static int ct_send(struct intel_guc_ct *ct,
>  		   u32 response_buf_size,
>  		   u32 *status)
>  {
> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>  	struct ct_request request;
>  	unsigned long flags;
>  	u32 fence;
> @@ -482,8 +551,20 @@ static int ct_send(struct intel_guc_ct *ct,
>  	GEM_BUG_ON(!len);
>  	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
>  	GEM_BUG_ON(!response_buf && response_buf_size);
> +	might_sleep();
>  
> +	/*
> +	 * We use a lazy spin wait loop here as we believe that if the CT
> +	 * buffers are sized correctly the flow control condition should be
> +	 * rare.
> +	 */
> +retry:
>  	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
> +	if (unlikely(!ctb_has_room(ctb, len + 1))) {

why +1 ?

> +		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> +		cond_resched();
> +		goto retry;
> +	}

hmm, a full CTB can also be seen in the nb case, but it looks like you only
want to use the lazy spin for the blocking call, why ?

also, what if the situation is not improving ?
will we be looping here forever ?
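
(For reference, one way to bound this, and roughly what a later patch in
this series ends up doing, is to record when the stall started and give up
with -EDEADLK after a timeout; only a sketch of the idea:)

	if (ct->stall_time == KTIME_MAX)
		ct->stall_time = ktime_get();
	if (ktime_us_delta(ktime_get(), ct->stall_time) > MAX_US_STALL_CTB)
		return -EDEADLK;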

>  
>  	fence = ct_get_next_fence(ct);
>  	request.fence = fence;
> @@ -495,7 +576,7 @@ static int ct_send(struct intel_guc_ct *ct,
>  	list_add_tail(&request.link, &ct->requests.pending);
>  	spin_unlock(&ct->requests.lock);
>  
> -	err = ct_write(ct, action, len, fence);
> +	err = ct_write(ct, action, len, fence, 0);
>  
>  	spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
>  
> @@ -537,7 +618,7 @@ static int ct_send(struct intel_guc_ct *ct,
>   * Command Transport (CT) buffer based GuC send function.
>   */
>  int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
> -		      u32 *response_buf, u32 response_buf_size)
> +		      u32 *response_buf, u32 response_buf_size, u32 flags)
>  {
>  	u32 status = ~0; /* undefined */
>  	int ret;
> @@ -547,6 +628,9 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
>  		return -ENODEV;
>  	}
>  
> +	if (flags & INTEL_GUC_SEND_NB)
> +		return ct_send_nb(ct, action, len, flags);
> +
>  	ret = ct_send(ct, action, len, response_buf, response_buf_size, &status);
>  	if (unlikely(ret < 0)) {
>  		CT_ERROR(ct, "Sending action %#x failed (err=%d status=%#X)\n",
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> index 1ae2dde6db93..55ef7c52472f 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> @@ -9,6 +9,7 @@
>  #include <linux/interrupt.h>
>  #include <linux/spinlock.h>
>  #include <linux/workqueue.h>
> +#include <linux/ktime.h>
>  
>  #include "intel_guc_fwif.h"
>  
> @@ -42,7 +43,6 @@ struct intel_guc_ct_buffer {
>  	bool broken;
>  };
>  
> -
>  /** Top-level structure for Command Transport related data
>   *
>   * Includes a pair of CT buffers for bi-directional communication and tracking
> @@ -69,6 +69,9 @@ struct intel_guc_ct {
>  		struct list_head incoming; /* incoming requests */
>  		struct work_struct worker; /* handler for incoming requests */
>  	} requests;
> +
> +	/** @stall_time: time of first time a CTB submission is stalled */
> +	ktime_t stall_time;

this should be introduced in 37/97

>  };
>  
>  void intel_guc_ct_init_early(struct intel_guc_ct *ct);
> @@ -88,7 +91,7 @@ static inline bool intel_guc_ct_enabled(struct intel_guc_ct *ct)
>  }
>  
>  int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
> -		      u32 *response_buf, u32 response_buf_size);
> +		      u32 *response_buf, u32 response_buf_size, u32 flags);
>  void intel_guc_ct_event_handler(struct intel_guc_ct *ct);
>  
>  #endif /* _INTEL_GUC_CT_H_ */
> 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 37/97] drm/i915/guc: Add stall timer to non blocking CTB send function
  2021-05-06 19:13 ` [RFC PATCH 37/97] drm/i915/guc: Add stall timer to " Matthew Brost
@ 2021-05-24 12:58   ` Michal Wajdeczko
  2021-05-24 18:35     ` Matthew Brost
  0 siblings, 1 reply; 249+ messages in thread
From: Michal Wajdeczko @ 2021-05-24 12:58 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel
  Cc: tvrtko.ursulin, jason.ekstrand, daniele.ceraolospurio,
	jon.bloomfield, daniel.vetter, john.c.harrison



On 06.05.2021 21:13, Matthew Brost wrote:
> Implement a stall timer which fails H2G CTBs once a period of time
> with no forward progress is reached to prevent deadlock.
> 
> Also update to ct_write to return -EDEADLK rather than -EPIPE on a
> corrupted descriptor.

a broken descriptor is really a separate issue from no progress on the
GuC side; I would really like to keep the old error code

note that a broken CTB descriptor is an unrecoverable error, while on the
other hand, in theory, we could recover from a temporarily non-moving CTB

> 
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 48 +++++++++++++++++++++--
>  1 file changed, 45 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index af7314d45a78..4eab319d61be 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -69,6 +69,8 @@ static inline struct drm_device *ct_to_drm(struct intel_guc_ct *ct)
>  #define CTB_H2G_BUFFER_SIZE	(SZ_4K)
>  #define CTB_G2H_BUFFER_SIZE	(SZ_4K)
>  
> +#define MAX_US_STALL_CTB	1000000

nit: maybe we should make it a CONFIG value ?
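
e.g. keep the define but source the value from Kconfig; a rough sketch of
what that could look like (CONFIG_DRM_I915_GUC_CTB_STALL_TIMEOUT_US being a
hypothetical, not-yet-existing option):

	#define MAX_US_STALL_CTB	CONFIG_DRM_I915_GUC_CTB_STALL_TIMEOUT_US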

> +
>  struct ct_request {
>  	struct list_head link;
>  	u32 fence;
> @@ -315,6 +317,7 @@ int intel_guc_ct_enable(struct intel_guc_ct *ct)
>  
>  	ct->requests.last_fence = 1;
>  	ct->enabled = true;
> +	ct->stall_time = KTIME_MAX;
>  
>  	return 0;
>  
> @@ -378,7 +381,7 @@ static int ct_write(struct intel_guc_ct *ct,
>  	unsigned int i;
>  
>  	if (unlikely(ctb->broken))
> -		return -EPIPE;
> +		return -EDEADLK;
>  
>  	if (unlikely(desc->status))
>  		goto corrupted;
> @@ -449,7 +452,7 @@ static int ct_write(struct intel_guc_ct *ct,
>  	CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u status=%#x\n",
>  		 desc->head, desc->tail, desc->status);
>  	ctb->broken = true;
> -	return -EPIPE;
> +	return -EDEADLK;
>  }
>  
>  /**
> @@ -494,6 +497,17 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
>  	return err;
>  }
>  
> +static inline bool ct_deadlocked(struct intel_guc_ct *ct)
> +{
> +	bool ret = ktime_us_delta(ktime_get(), ct->stall_time) >
> +		MAX_US_STALL_CTB;
> +
> +	if (unlikely(ret))
> +		CT_ERROR(ct, "CT deadlocked\n");
> +
> +	return ret;
> +}
> +
>  static inline bool ctb_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
>  {
>  	struct guc_ct_buffer_desc *desc = ctb->desc;
> @@ -505,6 +519,26 @@ static inline bool ctb_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
>  	return space >= len_dw;
>  }
>  
> +static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
> +{
> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> +
> +	lockdep_assert_held(&ct->ctbs.send.lock);
> +
> +	if (unlikely(!ctb_has_room(ctb, len_dw))) {
> +		if (ct->stall_time == KTIME_MAX)
> +			ct->stall_time = ktime_get();
> +
> +		if (unlikely(ct_deadlocked(ct)))
> +			return -EDEADLK;
> +		else
> +			return -EBUSY;
> +	}
> +
> +	ct->stall_time = KTIME_MAX;
> +	return 0;
> +}
> +
>  static int ct_send_nb(struct intel_guc_ct *ct,
>  		      const u32 *action,
>  		      u32 len,
> @@ -517,7 +551,7 @@ static int ct_send_nb(struct intel_guc_ct *ct,
>  
>  	spin_lock_irqsave(&ctb->lock, spin_flags);
>  
> -	ret = ctb_has_room(ctb, len + 1);
> +	ret = has_room_nb(ct, len + 1);
>  	if (unlikely(ret))
>  		goto out;
>  
> @@ -561,11 +595,19 @@ static int ct_send(struct intel_guc_ct *ct,
>  retry:
>  	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
>  	if (unlikely(!ctb_has_room(ctb, len + 1))) {
> +		if (ct->stall_time == KTIME_MAX)
> +			ct->stall_time = ktime_get();
>  		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> +
> +		if (unlikely(ct_deadlocked(ct)))
> +			return -EDEADLK;
> +

likely, instead of duplicating code, you can reuse has_room_nb here
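
i.e. something along these lines in the retry loop (sketch only, reusing
has_room_nb() from earlier in this patch, which also resets stall_time on
success):

retry:
	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
	err = has_room_nb(ct, len + 1);
	if (unlikely(err)) {
		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
		if (err == -EDEADLK)
			return err;
		cond_resched();
		goto retry;
	}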

>  		cond_resched();
>  		goto retry;
>  	}
>  
> +	ct->stall_time = KTIME_MAX;
> +
>  	fence = ct_get_next_fence(ct);
>  	request.fence = fence;
>  	request.status = 0;
> 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 38/97] drm/i915/guc: Optimize CTB writes and reads
  2021-05-06 19:13 ` [RFC PATCH 38/97] drm/i915/guc: Optimize CTB writes and reads Matthew Brost
@ 2021-05-24 13:31   ` Michal Wajdeczko
  2021-05-25 17:39     ` Matthew Brost
  0 siblings, 1 reply; 249+ messages in thread
From: Michal Wajdeczko @ 2021-05-24 13:31 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel
  Cc: tvrtko.ursulin, jason.ekstrand, daniele.ceraolospurio,
	jon.bloomfield, daniel.vetter, john.c.harrison



On 06.05.2021 21:13, Matthew Brost wrote:
> CTB writes are now in the path of command submission and should be
> optimized for performance. Rather than reading CTB descriptor values
> (e.g. head, tail, size) which could result in accesses across the PCIe

size was removed from the descriptor in 25/97

> bus, store shadow local copies and only read/write the descriptor
> values when absolutely necessary.

maybe it's worth adding some words about caching the available space ?

> 
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 78 +++++++++++++----------
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  6 ++
>  2 files changed, 52 insertions(+), 32 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index 4eab319d61be..77dfbc94dcc3 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -127,6 +127,10 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc)
>  static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb)
>  {
>  	ctb->broken = false;
> +	ctb->tail = 0;
> +	ctb->head = 0;
> +	ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
> +
>  	guc_ct_buffer_desc_init(ctb->desc);
>  }
>  
> @@ -371,10 +375,8 @@ static int ct_write(struct intel_guc_ct *ct,
>  {
>  	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>  	struct guc_ct_buffer_desc *desc = ctb->desc;
> -	u32 head = desc->head;
> -	u32 tail = desc->tail;
> +	u32 tail = ctb->tail;
>  	u32 size = ctb->size;
> -	u32 used;
>  	u32 header;
>  	u32 hxg;
>  	u32 *cmds = ctb->cmds;
> @@ -386,25 +388,14 @@ static int ct_write(struct intel_guc_ct *ct,
>  	if (unlikely(desc->status))
>  		goto corrupted;
>  
> -	if (unlikely((tail | head) >= size)) {
> +#ifdef CONFIG_DRM_I915_DEBUG_GUC
> +	if (unlikely((desc->tail | desc->head) >= size)) {
>  		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
> -			 head, tail, size);
> +			 desc->head, desc->tail, size);
>  		desc->status |= GUC_CTB_STATUS_OVERFLOW;
>  		goto corrupted;

nit: as we are now caching the tail value, we can start comparing it with
the value in the descriptor and report GUC_CTB_STATUS_MISMATCH if needed
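
e.g. (sketch only; assumes a GUC_CTB_STATUS_MISMATCH flag exists next to
the other status bits):

	if (unlikely(desc->tail != ctb->tail)) {
		CT_ERROR(ct, "Tail mismatch: expected %u found %u\n",
			 ctb->tail, desc->tail);
		desc->status |= GUC_CTB_STATUS_MISMATCH;
		goto corrupted;
	}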

>  	}
> -
> -	/*
> -	 * tail == head condition indicates empty. GuC FW does not support
> -	 * using up the entire buffer to get tail == head meaning full.
> -	 */
> -	if (tail < head)
> -		used = (size - head) + tail;
> -	else
> -		used = tail - head;
> -
> -	/* make sure there is a space including extra dw for the fence */
> -	if (unlikely(used + len + 1 >= size))
> -		return -ENOSPC;
> +#endif
>  
>  	/*
>  	 * dw0: CT header (including fence)
> @@ -444,7 +435,9 @@ static int ct_write(struct intel_guc_ct *ct,
>  	write_barrier(ct);
>  
>  	/* now update descriptor */
> +	ctb->tail = tail;
>  	WRITE_ONCE(desc->tail, tail);
> +	ctb->space -= len + 1;
>  
>  	return 0;
>  
> @@ -460,7 +453,7 @@ static int ct_write(struct intel_guc_ct *ct,
>   * @req:	pointer to pending request
>   * @status:	placeholder for status
>   *
> - * For each sent request, Guc shall send bac CT response message.
> + * For each sent request, GuC shall send back CT response message.
>   * Our message handler will update status of tracked request once
>   * response message with given fence is received. Wait here and
>   * check for valid response status value.
> @@ -508,24 +501,35 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct)
>  	return ret;
>  }
>  
> -static inline bool ctb_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
> +static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw)

this function was introduced a moment ago ...
can we minimize the number of changes between patches ?

>  {
> -	struct guc_ct_buffer_desc *desc = ctb->desc;
> -	u32 head = READ_ONCE(desc->head);
> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> +	u32 head;
>  	u32 space;
>  
> -	space = CIRC_SPACE(desc->tail, head, ctb->size);
> +	if (ctb->space >= len_dw)
> +		return true;
> +
> +	head = READ_ONCE(ctb->desc->head);
> +	if (unlikely(head > ctb->size)) {
> +		CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u size=%u\n",
> +			  ctb->desc->head, ctb->desc->tail, ctb->size);
> +		ctb->desc->status |= GUC_CTB_STATUS_OVERFLOW;
> +		ctb->broken = true;
> +		return false;
> +	}
> +
> +	space = CIRC_SPACE(ctb->tail, head, ctb->size);
> +	ctb->space = space;

shouldn't we update our ctb->head with the new head value?
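
i.e. something like this, so the cached copy stays in sync with what was
just read back from the descriptor (sketch):

	ctb->head = head;	/* keep shadow head in sync */
	ctb->space = space;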

>  
>  	return space >= len_dw;
>  }
>  
>  static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
>  {
> -	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> -
>  	lockdep_assert_held(&ct->ctbs.send.lock);
>  
> -	if (unlikely(!ctb_has_room(ctb, len_dw))) {
> +	if (unlikely(!h2g_has_room(ct, len_dw))) {
>  		if (ct->stall_time == KTIME_MAX)
>  			ct->stall_time = ktime_get();
>  
> @@ -593,11 +597,11 @@ static int ct_send(struct intel_guc_ct *ct,
>  	 * rare.
>  	 */
>  retry:
> -	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
> -	if (unlikely(!ctb_has_room(ctb, len + 1))) {
> +	spin_lock_irqsave(&ctb->lock, flags);
> +	if (unlikely(!h2g_has_room(ct, len + 1))) {
>  		if (ct->stall_time == KTIME_MAX)
>  			ct->stall_time = ktime_get();
> -		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> +		spin_unlock_irqrestore(&ctb->lock, flags);

this ...

>  
>  		if (unlikely(ct_deadlocked(ct)))
>  			return -EDEADLK;
> @@ -620,7 +624,7 @@ static int ct_send(struct intel_guc_ct *ct,
>  
>  	err = ct_write(ct, action, len, fence, 0);
>  
> -	spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> +	spin_unlock_irqrestore(&ctb->lock, flags);

and this likely could be done in an earlier patch

>  
>  	if (unlikely(err))
>  		goto unlink;
> @@ -708,7 +712,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>  {
>  	struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv;
>  	struct guc_ct_buffer_desc *desc = ctb->desc;
> -	u32 head = desc->head;
> +	u32 head = ctb->head;
>  	u32 tail = desc->tail;
>  	u32 size = ctb->size;
>  	u32 *cmds = ctb->cmds;
> @@ -723,12 +727,21 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>  	if (unlikely(desc->status))
>  		goto corrupted;
>  
> -	if (unlikely((tail | head) >= size)) {
> +#ifdef CONFIG_DRM_I915_DEBUG_GUC
> +	if (unlikely((desc->tail | desc->head) >= size)) {
>  		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
>  			 head, tail, size);
>  		desc->status |= GUC_CTB_STATUS_OVERFLOW;
>  		goto corrupted;
>  	}
> +#else
> +	if (unlikely((tail | ctb->head) >= size)) {

we are in control of the cached 'ctb->head', so it shall never be > size

> +		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
> +			 head, tail, size);
> +		desc->status |= GUC_CTB_STATUS_OVERFLOW;
> +		goto corrupted;
> +	}
> +#endif
>  
>  	/* tail == head condition indicates empty */
>  	available = tail - head;
> @@ -778,6 +791,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>  	}
>  	CT_DEBUG(ct, "received %*ph\n", 4 * len, (*msg)->msg);
>  
> +	ctb->head = head;
>  	/* now update descriptor */
>  	WRITE_ONCE(desc->head, head);
>  
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> index 55ef7c52472f..9924335e2ee6 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> @@ -33,6 +33,9 @@ struct intel_guc;
>   * @desc: pointer to the buffer descriptor
>   * @cmds: pointer to the commands buffer
>   * @size: size of the commands buffer in dwords
> + * @head: local shadow copy of head in dwords
> + * @tail: local shadow copy of tail in dwords
> + * @space: local shadow copy of space in dwords
>   * @broken: flag to indicate if descriptor data is broken
>   */
>  struct intel_guc_ct_buffer {
> @@ -40,6 +43,9 @@ struct intel_guc_ct_buffer {
>  	struct guc_ct_buffer_desc *desc;
>  	u32 *cmds;
>  	u32 size;
> +	u32 tail;
> +	u32 head;
> +	u32 space;
>  	bool broken;
>  };
>  
> 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 39/97] drm/i915/guc: Increase size of CTB buffers
  2021-05-06 19:13 ` [RFC PATCH 39/97] drm/i915/guc: Increase size of CTB buffers Matthew Brost
@ 2021-05-24 13:43   ` Michal Wajdeczko
  2021-05-24 18:40     ` Matthew Brost
  2021-05-25  9:24   ` Tvrtko Ursulin
  1 sibling, 1 reply; 249+ messages in thread
From: Michal Wajdeczko @ 2021-05-24 13:43 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel; +Cc: jason.ekstrand, daniel.vetter



On 06.05.2021 21:13, Matthew Brost wrote:
> With the introduction of non-blocking CTBs more than one CTB can be in
> flight at a time. Increasing the size of the CTBs should reduce how
> often software hits the case where no space is available in the CTB
> buffer.
> 
> Cc: John Harrison <john.c.harrison@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index 77dfbc94dcc3..d6895d29ed2d 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -63,11 +63,16 @@ static inline struct drm_device *ct_to_drm(struct intel_guc_ct *ct)
>   *      +--------+-----------------------------------------------+------+
>   *
>   * Size of each `CT Buffer`_ must be multiple of 4K.
> - * As we don't expect too many messages, for now use minimum sizes.
> + * We don't expect too many messages in flight at any time, unless we are
> + * using the GuC submission. In that case each request requires a minimum
> + * 16 bytes which gives us a maximum of 256 queued requests. Hopefully this

nit: all our CTB calculations are in dwords now, not bytes

> + * is enough space to avoid backpressure on the driver. We increase the size
> + * of the receive buffer (relative to the send) to ensure a G2H response
> + * CTB has a landing spot.

hmm, but we are not checking the G2H CTB yet;
we will start doing that around patch 54/97,
so maybe that other patch should be introduced earlier ?

>   */
>  #define CTB_DESC_SIZE		ALIGN(sizeof(struct guc_ct_buffer_desc), SZ_2K)
>  #define CTB_H2G_BUFFER_SIZE	(SZ_4K)
> -#define CTB_G2H_BUFFER_SIZE	(SZ_4K)
> +#define CTB_G2H_BUFFER_SIZE	(4 * CTB_H2G_BUFFER_SIZE)

in theory, we (the host) should be faster than the GuC, so the G2H CTB
should almost always be empty; if that is not the case, maybe we should
start monitoring what is happening and report some warnings if G2H is
half full ?
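
e.g. a cheap check in ct_read() could be enough to spot that early (rough
sketch; CIRC_CNT comes from linux/circ_buf.h, already pulled in earlier in
the series):

	/* flag when the G2H buffer is running more than half full */
	if (unlikely(CIRC_CNT(tail, head, size) > size / 2))
		CT_DEBUG(ct, "G2H backlog: head=%u tail=%u size=%u\n",
			 head, tail, size);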

>  
>  #define MAX_US_STALL_CTB	1000000
>  
> @@ -753,7 +758,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>  	/* beware of buffer wrap case */
>  	if (unlikely(available < 0))
>  		available += size;
> -	CT_DEBUG(ct, "available %d (%u:%u)\n", available, head, tail);
> +	CT_DEBUG(ct, "available %d (%u:%u:%u)\n", available, head, tail, size);
>  	GEM_BUG_ON(available < 0);
>  
>  	header = cmds[head];
> 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 40/97] drm/i915/guc: Module load failure test for CT buffer creation
  2021-05-06 19:13 ` [RFC PATCH 40/97] drm/i915/guc: Module load failure test for CT buffer creation Matthew Brost
@ 2021-05-24 13:45   ` Michal Wajdeczko
  0 siblings, 0 replies; 249+ messages in thread
From: Michal Wajdeczko @ 2021-05-24 13:45 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel
  Cc: tvrtko.ursulin, jason.ekstrand, daniele.ceraolospurio,
	jon.bloomfield, daniel.vetter, john.c.harrison



On 06.05.2021 21:13, Matthew Brost wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> Add several module failure load inject points in the CT buffer creation
> code path.
> 
> Signed-off-by: John Harrison <john.c.harrison@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index d6895d29ed2d..586e6efc3558 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -177,6 +177,10 @@ static int ct_register_buffer(struct intel_guc_ct *ct, u32 type,
>  {
>  	int err;
>  
> +	err = i915_inject_probe_error(guc_to_gt(ct_to_guc(ct))->i915, -ENXIO);
> +	if (unlikely(err))
> +		return err;
> +
>  	err = guc_action_register_ct_buffer(ct_to_guc(ct), type,
>  					    desc_addr, buff_addr, size);
>  	if (unlikely(err))
> @@ -228,6 +232,10 @@ int intel_guc_ct_init(struct intel_guc_ct *ct)
>  	u32 *cmds;
>  	int err;
>  
> +	err = i915_inject_probe_error(guc_to_gt(guc)->i915, -ENXIO);
> +	if (err)
> +		return err;
> +
>  	GEM_BUG_ON(ct->vma);
>  
>  	blob_size = 2 * CTB_DESC_SIZE + CTB_H2G_BUFFER_SIZE + CTB_G2H_BUFFER_SIZE;
> 

likely could be introduced earlier, maybe right after patch 5/97

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 37/97] drm/i915/guc: Add stall timer to non blocking CTB send function
  2021-05-24 12:58   ` Michal Wajdeczko
@ 2021-05-24 18:35     ` Matthew Brost
  2021-05-25 14:15       ` Michal Wajdeczko
  0 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-24 18:35 UTC (permalink / raw)
  To: Michal Wajdeczko
  Cc: tvrtko.ursulin, intel-gfx, dri-devel, jason.ekstrand,
	daniele.ceraolospurio, jon.bloomfield, daniel.vetter,
	john.c.harrison

On Mon, May 24, 2021 at 02:58:12PM +0200, Michal Wajdeczko wrote:
> 
> 
> On 06.05.2021 21:13, Matthew Brost wrote:
> > Implement a stall timer which fails H2G CTBs once a period of time
> > with no forward progress is reached to prevent deadlock.
> > 
> > Also update to ct_write to return -EDEADLK rather than -EPIPE on a
> > corrupted descriptor.
> 
> a broken descriptor is really a separate issue from no progress on the
> GuC side; I would really like to keep the old error code
>

I know you do, as you have brought it up several times. Again, to the rest
of the stack these two things mean the exact same thing.
 
> note that a broken CTB descriptor is an unrecoverable error, while on the
> other hand, in theory, we could recover from a temporarily non-moving CTB
> 

Yeah, but we don't; in both cases we disable submission, which in turn
triggers a full GPU reset.

> > 
> > Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> > Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 48 +++++++++++++++++++++--
> >  1 file changed, 45 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > index af7314d45a78..4eab319d61be 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > @@ -69,6 +69,8 @@ static inline struct drm_device *ct_to_drm(struct intel_guc_ct *ct)
> >  #define CTB_H2G_BUFFER_SIZE	(SZ_4K)
> >  #define CTB_G2H_BUFFER_SIZE	(SZ_4K)
> >  
> > +#define MAX_US_STALL_CTB	1000000
> 
> nit: maybe we should make it a CONFIG value ?
> 

Sure.
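
Something along these lines, perhaps - the CONFIG_ name below is
hypothetical, just to show the shape:

        /* hypothetical Kconfig knob replacing the hardcoded timeout */
        #define MAX_US_STALL_CTB	CONFIG_DRM_I915_GUC_CTB_STALL_TIMEOUT_US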

> > +
> >  struct ct_request {
> >  	struct list_head link;
> >  	u32 fence;
> > @@ -315,6 +317,7 @@ int intel_guc_ct_enable(struct intel_guc_ct *ct)
> >  
> >  	ct->requests.last_fence = 1;
> >  	ct->enabled = true;
> > +	ct->stall_time = KTIME_MAX;
> >  
> >  	return 0;
> >  
> > @@ -378,7 +381,7 @@ static int ct_write(struct intel_guc_ct *ct,
> >  	unsigned int i;
> >  
> >  	if (unlikely(ctb->broken))
> > -		return -EPIPE;
> > +		return -EDEADLK;
> >  
> >  	if (unlikely(desc->status))
> >  		goto corrupted;
> > @@ -449,7 +452,7 @@ static int ct_write(struct intel_guc_ct *ct,
> >  	CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u status=%#x\n",
> >  		 desc->head, desc->tail, desc->status);
> >  	ctb->broken = true;
> > -	return -EPIPE;
> > +	return -EDEADLK;
> >  }
> >  
> >  /**
> > @@ -494,6 +497,17 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
> >  	return err;
> >  }
> >  
> > +static inline bool ct_deadlocked(struct intel_guc_ct *ct)
> > +{
> > +	bool ret = ktime_us_delta(ktime_get(), ct->stall_time) >
> > +		MAX_US_STALL_CTB;
> > +
> > +	if (unlikely(ret))
> > +		CT_ERROR(ct, "CT deadlocked\n");
> > +
> > +	return ret;
> > +}
> > +
> >  static inline bool ctb_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
> >  {
> >  	struct guc_ct_buffer_desc *desc = ctb->desc;
> > @@ -505,6 +519,26 @@ static inline bool ctb_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
> >  	return space >= len_dw;
> >  }
> >  
> > +static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
> > +{
> > +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> > +
> > +	lockdep_assert_held(&ct->ctbs.send.lock);
> > +
> > +	if (unlikely(!ctb_has_room(ctb, len_dw))) {
> > +		if (ct->stall_time == KTIME_MAX)
> > +			ct->stall_time = ktime_get();
> > +
> > +		if (unlikely(ct_deadlocked(ct)))
> > +			return -EDEADLK;
> > +		else
> > +			return -EBUSY;
> > +	}
> > +
> > +	ct->stall_time = KTIME_MAX;
> > +	return 0;
> > +}
> > +
> >  static int ct_send_nb(struct intel_guc_ct *ct,
> >  		      const u32 *action,
> >  		      u32 len,
> > @@ -517,7 +551,7 @@ static int ct_send_nb(struct intel_guc_ct *ct,
> >  
> >  	spin_lock_irqsave(&ctb->lock, spin_flags);
> >  
> > -	ret = ctb_has_room(ctb, len + 1);
> > +	ret = has_room_nb(ct, len + 1);
> >  	if (unlikely(ret))
> >  		goto out;
> >  
> > @@ -561,11 +595,19 @@ static int ct_send(struct intel_guc_ct *ct,
> >  retry:
> >  	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
> >  	if (unlikely(!ctb_has_room(ctb, len + 1))) {
> > +		if (ct->stall_time == KTIME_MAX)
> > +			ct->stall_time = ktime_get();
> >  		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> > +
> > +		if (unlikely(ct_deadlocked(ct)))
> > +			return -EDEADLK;
> > +
> 
> likely, instead of duplicating code, you can reuse has_room_nb here
>

In this patch yes, but in the following patch no, as this check diverges
between the non-blocking and blocking paths once we introduce G2H
credits. I'd rather just leave it as is than churn on the patches.
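
Roughly, the direction is this (a hypothetical sketch only - the helper
names are made up, not the actual follow-up patch): the non-blocking
path has to reserve both H2G space and a G2H credit up front, while the
blocking path only cares about H2G space and can simply retry:

        /* non-blocking send: fail fast unless both resources are free */
        if (!h2g_has_room(ctb, len_dw) || !g2h_credit_available(ct))
                return -EBUSY;

        /* blocking send: only H2G space matters; drop the lock, sleep, retry */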

Matt 
 
> >  		cond_resched();
> >  		goto retry;
> >  	}
> >  
> > +	ct->stall_time = KTIME_MAX;
> > +
> >  	fence = ct_get_next_fence(ct);
> >  	request.fence = fence;
> >  	request.status = 0;
> > 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 39/97] drm/i915/guc: Increase size of CTB buffers
  2021-05-24 13:43   ` [Intel-gfx] " Michal Wajdeczko
@ 2021-05-24 18:40     ` Matthew Brost
  0 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-24 18:40 UTC (permalink / raw)
  To: Michal Wajdeczko; +Cc: jason.ekstrand, daniel.vetter, intel-gfx, dri-devel

On Mon, May 24, 2021 at 03:43:11PM +0200, Michal Wajdeczko wrote:
> 
> 
> On 06.05.2021 21:13, Matthew Brost wrote:
> > With the introduction of non-blocking CTBs more than one CTB can be in
> > flight at a time. Increasing the size of the CTBs should reduce how
> > often software hits the case where no space is available in the CTB
> > buffer.
> > 
> > Cc: John Harrison <john.c.harrison@intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 11 ++++++++---
> >  1 file changed, 8 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > index 77dfbc94dcc3..d6895d29ed2d 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > @@ -63,11 +63,16 @@ static inline struct drm_device *ct_to_drm(struct intel_guc_ct *ct)
> >   *      +--------+-----------------------------------------------+------+
> >   *
> >   * Size of each `CT Buffer`_ must be multiple of 4K.
> > - * As we don't expect too many messages, for now use minimum sizes.
> > + * We don't expect too many messages in flight at any time, unless we are
> > + * using the GuC submission. In that case each request requires a minimum
> > + * 16 bytes which gives us a maximum 256 queue'd requests. Hopefully this
> 
> nit: all our CTB calculations are in dwords now, not bytes
> 

I can change the wording to DW sizes.
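
For reference, a quick back-of-envelope check of the numbers in that
comment, using the sizes from the patch:

        /* minimal H2G request = 4 dwords = 16 bytes                */
        /* CTB_H2G_BUFFER_SIZE = SZ_4K -> 4096 / 16 = 256 requests  */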

> > + * enough space to avoid backpressure on the driver. We increase the size
> > + * of the receive buffer (relative to the send) to ensure a G2H response
> > + * CTB has a landing spot.
> 
> hmm, but we are not checking G2H CTB yet
> will start doing it around patch 54/97
> so maybe this other patch should be introduced earlier ?
>

Yes, that patch is going to be pulled down to an earlier spot in the
series.
 
> >   */
> >  #define CTB_DESC_SIZE		ALIGN(sizeof(struct guc_ct_buffer_desc), SZ_2K)
> >  #define CTB_H2G_BUFFER_SIZE	(SZ_4K)
> > -#define CTB_G2H_BUFFER_SIZE	(SZ_4K)
> > +#define CTB_G2H_BUFFER_SIZE	(4 * CTB_H2G_BUFFER_SIZE)
> 
> in theory, we (host) should be faster than GuC, so G2H CTB shall be
> almost always empty, if this is not a case, maybe we should start
> monitoring what is happening and report some warnings if G2H is half full ?
>

Certainly some IGTs put more pressure on the G2H channel than on the
H2G channel, at least I think. This is something we can tune over time
after this lands upstream. IMO a warning message at this point is overkill.
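
For the record, the check being suggested could be as small as the
following (purely illustrative, not part of the series), dropped into
ct_read() where 'available' and 'size' are already computed:

        /* hypothetical: flag a suspiciously full G2H buffer */
        if (unlikely(available > size / 2))
                CT_DEBUG(ct, "G2H buffer over half full: %d of %u dwords\n",
                         available, size);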

Matt
 
> >  
> >  #define MAX_US_STALL_CTB	1000000
> >  
> > @@ -753,7 +758,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
> >  	/* beware of buffer wrap case */
> >  	if (unlikely(available < 0))
> >  		available += size;
> > -	CT_DEBUG(ct, "available %d (%u:%u)\n", available, head, tail);
> > +	CT_DEBUG(ct, "available %d (%u:%u:%u)\n", available, head, tail, size);
> >  	GEM_BUG_ON(available < 0);
> >  
> >  	header = cmds[head];
> > 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 08/97] drm/i915/guc: Keep strict GuC ABI definitions
  2021-05-06 19:13 ` [RFC PATCH 08/97] drm/i915/guc: Keep strict GuC ABI definitions Matthew Brost
@ 2021-05-24 23:52   ` Michał Winiarski
  0 siblings, 0 replies; 249+ messages in thread
From: Michał Winiarski @ 2021-05-24 23:52 UTC (permalink / raw)
  To: Matthew Brost, dri-devel, intel-gfx
  Cc: matthew.brost, tvrtko.ursulin, jason.ekstrand,
	daniele.ceraolospurio, jon.bloomfield, daniel.vetter,
	john.c.harrison

Quoting Matthew Brost (2021-05-06 21:13:22)
> From: Michal Wajdeczko <michal.wajdeczko@intel.com>
> 
> Our fwif.h file is now mix of strict firmware ABI definitions and
> set of our helpers. In anticipation of upcoming changes to the GuC
> interface try to keep them separate in smaller maintainable files.
> 
> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Cc: Michał Winiarski <michal.winiarski@intel.com>

Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>

-Michał

> ---
>  .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h  |  51 +++++
>  .../gt/uc/abi/guc_communication_ctb_abi.h     | 106 +++++++++
>  .../gt/uc/abi/guc_communication_mmio_abi.h    |  52 +++++
>  .../gpu/drm/i915/gt/uc/abi/guc_errors_abi.h   |  14 ++
>  .../gpu/drm/i915/gt/uc/abi/guc_messages_abi.h |  21 ++
>  drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   | 203 +-----------------
>  6 files changed, 250 insertions(+), 197 deletions(-)
>  create mode 100644 drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
>  create mode 100644 drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h
>  create mode 100644 drivers/gpu/drm/i915/gt/uc/abi/guc_communication_mmio_abi.h
>  create mode 100644 drivers/gpu/drm/i915/gt/uc/abi/guc_errors_abi.h
>  create mode 100644 drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 06/97] drm/i915/guc: enable only the user interrupt when using GuC submission
  2021-05-06 19:13 ` [RFC PATCH 06/97] drm/i915/guc: enable only the user interrupt when using GuC submission Matthew Brost
@ 2021-05-25  0:31   ` Matthew Brost
  0 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-25  0:31 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: tvrtko.ursulin, daniele.ceraolospurio, jason.ekstrand,
	jon.bloomfield, daniel.vetter, john.c.harrison

On Thu, May 06, 2021 at 12:13:20PM -0700, Matthew Brost wrote:
> From: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> 
> In GuC submission mode the CS is owned by the GuC FW, so all CS status
> interrupts are handled by it. We only need the user interrupt as that
> signals request completion.
> 
> Since we're now starting the engines directly in GuC submission mode
> when selected, we can stop switching back and forth between the
> execlists and the GuC programming and select directly the correct
> interrupt mask.
> 
> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> Cc: John Harrison <john.c.harrison@intel.com>
> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_gt_irq.c        | 18 ++++++-----
>  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 31 -------------------
>  2 files changed, 11 insertions(+), 38 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_irq.c b/drivers/gpu/drm/i915/gt/intel_gt_irq.c
> index d29126c458ba..f88c10366e58 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_irq.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_irq.c
> @@ -194,14 +194,18 @@ void gen11_gt_irq_reset(struct intel_gt *gt)
>  
>  void gen11_gt_irq_postinstall(struct intel_gt *gt)
>  {
> -	const u32 irqs =
> -		GT_CS_MASTER_ERROR_INTERRUPT |
> -		GT_RENDER_USER_INTERRUPT |
> -		GT_CONTEXT_SWITCH_INTERRUPT |
> -		GT_WAIT_SEMAPHORE_INTERRUPT;
>  	struct intel_uncore *uncore = gt->uncore;
> -	const u32 dmask = irqs << 16 | irqs;
> -	const u32 smask = irqs << 16;
> +	u32 irqs = GT_RENDER_USER_INTERRUPT;
> +	u32 dmask;
> +	u32 smask;
> +
> +	if (!intel_uc_wants_guc_submission(&gt->uc))
> +		irqs |= GT_CS_MASTER_ERROR_INTERRUPT |
> +			GT_CONTEXT_SWITCH_INTERRUPT |
> +			GT_WAIT_SEMAPHORE_INTERRUPT;
> +
> +	dmask = irqs << 16 | irqs;
> +	smask = irqs << 16;
>  
>  	BUILD_BUG_ON(irqs & 0xffff0000);
>  
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 335719f17490..38cda5d599a6 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -432,32 +432,6 @@ void intel_guc_submission_fini(struct intel_guc *guc)
>  	}
>  }
>  
> -static void guc_interrupts_capture(struct intel_gt *gt)
> -{
> -	struct intel_uncore *uncore = gt->uncore;
> -	u32 irqs = GT_CONTEXT_SWITCH_INTERRUPT;
> -	u32 dmask = irqs << 16 | irqs;
> -
> -	GEM_BUG_ON(INTEL_GEN(gt->i915) < 11);
> -
> -	/* Don't handle the ctx switch interrupt in GuC submission mode */
> -	intel_uncore_rmw(uncore, GEN11_RENDER_COPY_INTR_ENABLE, dmask, 0);
> -	intel_uncore_rmw(uncore, GEN11_VCS_VECS_INTR_ENABLE, dmask, 0);
> -}
> -
> -static void guc_interrupts_release(struct intel_gt *gt)
> -{
> -	struct intel_uncore *uncore = gt->uncore;
> -	u32 irqs = GT_CONTEXT_SWITCH_INTERRUPT;
> -	u32 dmask = irqs << 16 | irqs;
> -
> -	GEM_BUG_ON(INTEL_GEN(gt->i915) < 11);
> -
> -	/* Handle ctx switch interrupts again */
> -	intel_uncore_rmw(uncore, GEN11_RENDER_COPY_INTR_ENABLE, 0, dmask);
> -	intel_uncore_rmw(uncore, GEN11_VCS_VECS_INTR_ENABLE, 0, dmask);
> -}
> -
>  static int guc_context_alloc(struct intel_context *ce)
>  {
>  	return lrc_alloc(ce, ce->engine);
> @@ -722,9 +696,6 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
>  void intel_guc_submission_enable(struct intel_guc *guc)
>  {
>  	guc_stage_desc_init(guc);
> -
> -	/* Take over from manual control of ELSP (execlists) */
> -	guc_interrupts_capture(guc_to_gt(guc));
>  }
>  
>  void intel_guc_submission_disable(struct intel_guc *guc)
> @@ -735,8 +706,6 @@ void intel_guc_submission_disable(struct intel_guc *guc)
>  
>  	/* Note: By the time we're here, GuC may have already been reset */
>  
> -	guc_interrupts_release(gt);
> -
>  	guc_stage_desc_fini(guc);
>  }
>  
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 07/97] drm/i915/guc: Remove sample_forcewake h2g action
  2021-05-06 19:13 ` [RFC PATCH 07/97] drm/i915/guc: Remove sample_forcewake h2g action Matthew Brost
  2021-05-24 10:48   ` Michal Wajdeczko
@ 2021-05-25  0:36   ` Matthew Brost
  1 sibling, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-25  0:36 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: tvrtko.ursulin, daniele.ceraolospurio, jason.ekstrand,
	jon.bloomfield, daniel.vetter, john.c.harrison

On Thu, May 06, 2021 at 12:13:21PM -0700, Matthew Brost wrote:
> From: Rodrigo Vivi <rodrigo.vivi@intel.com>
> 
> This action is no-op in the GuC side for a few versions already
> and it is getting entirely removed soon, in an upcoming version.
> 
> Time to remove before we face communication issues.
> 
> Cc:  Vinay Belgaumkar <vinay.belgaumkar@intel.com>
> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc.c      | 16 ----------------
>  drivers/gpu/drm/i915/gt/uc/intel_guc.h      |  1 -
>  drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h |  4 ----
>  drivers/gpu/drm/i915/gt/uc/intel_uc.c       |  4 ----
>  4 files changed, 25 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> index adae04c47aab..ab2c8fe8cdfa 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> @@ -469,22 +469,6 @@ int intel_guc_to_host_process_recv_msg(struct intel_guc *guc,
>  	return 0;
>  }
>  
> -int intel_guc_sample_forcewake(struct intel_guc *guc)
> -{
> -	struct drm_i915_private *dev_priv = guc_to_gt(guc)->i915;
> -	u32 action[2];
> -
> -	action[0] = INTEL_GUC_ACTION_SAMPLE_FORCEWAKE;
> -	/* WaRsDisableCoarsePowerGating:skl,cnl */
> -	if (!HAS_RC6(dev_priv) || NEEDS_WaRsDisableCoarsePowerGating(dev_priv))
> -		action[1] = 0;
> -	else
> -		/* bit 0 and 1 are for Render and Media domain separately */
> -		action[1] = GUC_FORCEWAKE_RENDER | GUC_FORCEWAKE_MEDIA;
> -
> -	return intel_guc_send(guc, action, ARRAY_SIZE(action));
> -}
> -
>  /**
>   * intel_guc_auth_huc() - Send action to GuC to authenticate HuC ucode
>   * @guc: intel_guc structure
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> index bc2ba7d0626c..c20f3839de12 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> @@ -128,7 +128,6 @@ int intel_guc_send_mmio(struct intel_guc *guc, const u32 *action, u32 len,
>  			u32 *response_buf, u32 response_buf_size);
>  int intel_guc_to_host_process_recv_msg(struct intel_guc *guc,
>  				       const u32 *payload, u32 len);
> -int intel_guc_sample_forcewake(struct intel_guc *guc);
>  int intel_guc_auth_huc(struct intel_guc *guc, u32 rsa_offset);
>  int intel_guc_suspend(struct intel_guc *guc);
>  int intel_guc_resume(struct intel_guc *guc);
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
> index 79c560d9c0b6..0f9afcde1d0b 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
> @@ -302,9 +302,6 @@ struct guc_ct_buffer_desc {
>  #define GUC_CT_MSG_ACTION_SHIFT			16
>  #define GUC_CT_MSG_ACTION_MASK			0xFFFF
>  
> -#define GUC_FORCEWAKE_RENDER	(1 << 0)
> -#define GUC_FORCEWAKE_MEDIA	(1 << 1)
> -
>  #define GUC_POWER_UNSPECIFIED	0
>  #define GUC_POWER_D0		1
>  #define GUC_POWER_D1		2
> @@ -558,7 +555,6 @@ enum intel_guc_action {
>  	INTEL_GUC_ACTION_ENTER_S_STATE = 0x501,
>  	INTEL_GUC_ACTION_EXIT_S_STATE = 0x502,
>  	INTEL_GUC_ACTION_SLPC_REQUEST = 0x3003,
> -	INTEL_GUC_ACTION_SAMPLE_FORCEWAKE = 0x3005,
>  	INTEL_GUC_ACTION_AUTHENTICATE_HUC = 0x4000,
>  	INTEL_GUC_ACTION_REGISTER_COMMAND_TRANSPORT_BUFFER = 0x4505,
>  	INTEL_GUC_ACTION_DEREGISTER_COMMAND_TRANSPORT_BUFFER = 0x4506,
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> index 892c1315ce49..ab0789d66e06 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> @@ -502,10 +502,6 @@ static int __uc_init_hw(struct intel_uc *uc)
>  
>  	intel_huc_auth(huc);
>  
> -	ret = intel_guc_sample_forcewake(guc);
> -	if (ret)
> -		goto err_log_capture;
> -
>  	if (intel_uc_uses_guc_submission(uc))
>  		intel_guc_submission_enable(guc);
>  
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 10/97] drm/i915: Promote ptrdiff() to i915_utils.h
  2021-05-06 19:13 ` [RFC PATCH 10/97] drm/i915: Promote ptrdiff() to i915_utils.h Matthew Brost
@ 2021-05-25  0:42   ` Matthew Brost
  0 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-25  0:42 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: tvrtko.ursulin, daniele.ceraolospurio, jason.ekstrand,
	jon.bloomfield, daniel.vetter, john.c.harrison

On Thu, May 06, 2021 at 12:13:24PM -0700, Matthew Brost wrote:
> From: Michal Wajdeczko <michal.wajdeczko@intel.com>
> 
> Generic helpers should be placed in i915_utils.h.
> 
> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> ---
>  drivers/gpu/drm/i915/i915_utils.h | 5 +++++
>  drivers/gpu/drm/i915/i915_vma.h   | 5 -----
>  2 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_utils.h b/drivers/gpu/drm/i915/i915_utils.h
> index f02f52ab5070..5259edacde38 100644
> --- a/drivers/gpu/drm/i915/i915_utils.h
> +++ b/drivers/gpu/drm/i915/i915_utils.h
> @@ -201,6 +201,11 @@ __check_struct_size(size_t base, size_t arr, size_t count, size_t *size)
>  	__T;								\
>  })
>  
> +static __always_inline ptrdiff_t ptrdiff(const void *a, const void *b)
> +{
> +	return a - b;
> +}
> +
>  /*
>   * container_of_user: Extract the superclass from a pointer to a member.
>   *
> diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
> index 8df784a026d2..a29a158990c6 100644
> --- a/drivers/gpu/drm/i915/i915_vma.h
> +++ b/drivers/gpu/drm/i915/i915_vma.h
> @@ -146,11 +146,6 @@ static inline void i915_vma_put(struct i915_vma *vma)
>  	i915_gem_object_put(vma->obj);
>  }
>  
> -static __always_inline ptrdiff_t ptrdiff(const void *a, const void *b)
> -{
> -	return a - b;
> -}
> -
>  static inline long
>  i915_vma_compare(struct i915_vma *vma,
>  		 struct i915_address_space *vm,
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 28/97] drm/i915/guc: Kill guc_clients.ct_pool
  2021-05-06 19:13 ` [RFC PATCH 28/97] drm/i915/guc: Kill guc_clients.ct_pool Matthew Brost
@ 2021-05-25  1:01   ` Matthew Brost
  0 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-25  1:01 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: tvrtko.ursulin, daniele.ceraolospurio, jason.ekstrand,
	jon.bloomfield, daniel.vetter, john.c.harrison

On Thu, May 06, 2021 at 12:13:42PM -0700, Matthew Brost wrote:
> From: Michal Wajdeczko <michal.wajdeczko@intel.com>
> 
> CTB pool is now maintained internally by the GuC as part of its
> "private data". No need to allocate separate buffer and pass it
> to GuC as yet another ADS.
> 
> GuC: 57.0.0
> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Cc: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> Cc: Matthew Brost <matthew.brost@intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c  | 12 ------------
>  drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h | 12 +-----------
>  2 files changed, 1 insertion(+), 23 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> index 648e1767b17a..775f00d706fa 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> @@ -25,8 +25,6 @@
>   *      +---------------------------------------+
>   *      | guc_clients_info                      |
>   *      +---------------------------------------+
> - *      | guc_ct_pool_entry[size]               |
> - *      +---------------------------------------+
>   *      | padding                               |
>   *      +---------------------------------------+ <== 4K aligned
>   *      | private data                          |
> @@ -39,7 +37,6 @@ struct __guc_ads_blob {
>  	struct guc_policies policies;
>  	struct guc_gt_system_info system_info;
>  	struct guc_clients_info clients_info;
> -	struct guc_ct_pool_entry ct_pool[GUC_CT_POOL_SIZE];
>  } __packed;
>  
>  static u32 guc_ads_private_data_size(struct intel_guc *guc)
> @@ -67,11 +64,6 @@ static void guc_policies_init(struct guc_policies *policies)
>  	policies->is_valid = 1;
>  }
>  
> -static void guc_ct_pool_entries_init(struct guc_ct_pool_entry *pool, u32 num)
> -{
> -	memset(pool, 0, num * sizeof(*pool));
> -}
> -
>  static void guc_mapping_table_init(struct intel_gt *gt,
>  				   struct guc_gt_system_info *system_info)
>  {
> @@ -157,11 +149,7 @@ static void __guc_ads_init(struct intel_guc *guc)
>  	base = intel_guc_ggtt_offset(guc, guc->ads_vma);
>  
>  	/* Clients info  */
> -	guc_ct_pool_entries_init(blob->ct_pool, ARRAY_SIZE(blob->ct_pool));
> -
>  	blob->clients_info.clients_num = 1;
> -	blob->clients_info.ct_pool_addr = base + ptr_offset(blob, ct_pool);
> -	blob->clients_info.ct_pool_count = ARRAY_SIZE(blob->ct_pool);
>  
>  	/* ADS */
>  	blob->ads.scheduler_policies = base + ptr_offset(blob, policies);
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
> index 95db4a7d3f4d..301b173a26bc 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
> @@ -269,19 +269,9 @@ struct guc_gt_system_info {
>  } __packed;
>  
>  /* Clients info */
> -struct guc_ct_pool_entry {
> -	struct guc_ct_buffer_desc desc;
> -	u32 reserved[7];
> -} __packed;
> -
> -#define GUC_CT_POOL_SIZE	2
> -
>  struct guc_clients_info {
>  	u32 clients_num;
> -	u32 reserved0[13];
> -	u32 ct_pool_addr;
> -	u32 ct_pool_count;
> -	u32 reserved[4];
> +	u32 reserved[19];
>  } __packed;
>  
>  /* GuC Additional Data Struct */
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 23/97] drm/i915/guc: Support per context scheduling policies
  2021-05-06 19:13 ` [RFC PATCH 23/97] drm/i915/guc: Support per context scheduling policies Matthew Brost
@ 2021-05-25  1:15   ` Matthew Brost
  0 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-25  1:15 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: tvrtko.ursulin, daniele.ceraolospurio, jason.ekstrand,
	jon.bloomfield, daniel.vetter, john.c.harrison

On Thu, May 06, 2021 at 12:13:37PM -0700, Matthew Brost wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> GuC firmware v53.0.0 introduced per context scheduling policies. This
> includes changes to some of the ADS structures which are required to
> load the firmware even if not using GuC submission.
> 
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c  | 26 +++--------------
>  drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h | 31 +++++----------------
>  2 files changed, 11 insertions(+), 46 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> index 17526717368c..648e1767b17a 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> @@ -58,30 +58,12 @@ static u32 guc_ads_blob_size(struct intel_guc *guc)
>  	       guc_ads_private_data_size(guc);
>  }
>  
> -static void guc_policy_init(struct guc_policy *policy)
> -{
> -	policy->execution_quantum = POLICY_DEFAULT_EXECUTION_QUANTUM_US;
> -	policy->preemption_time = POLICY_DEFAULT_PREEMPTION_TIME_US;
> -	policy->fault_time = POLICY_DEFAULT_FAULT_TIME_US;
> -	policy->policy_flags = 0;
> -}
> -
>  static void guc_policies_init(struct guc_policies *policies)
>  {
> -	struct guc_policy *policy;
> -	u32 p, i;
> -
> -	policies->dpc_promote_time = POLICY_DEFAULT_DPC_PROMOTE_TIME_US;
> -	policies->max_num_work_items = POLICY_MAX_NUM_WI;
> -
> -	for (p = 0; p < GUC_CLIENT_PRIORITY_NUM; p++) {
> -		for (i = 0; i < GUC_MAX_ENGINE_CLASSES; i++) {
> -			policy = &policies->policy[p][i];
> -
> -			guc_policy_init(policy);
> -		}
> -	}
> -
> +	policies->dpc_promote_time = GLOBAL_POLICY_DEFAULT_DPC_PROMOTE_TIME_US;
> +	policies->max_num_work_items = GLOBAL_POLICY_MAX_NUM_WI;
> +	/* Disable automatic resets as not yet supported. */
> +	policies->global_flags = GLOBAL_POLICY_DISABLE_ENGINE_RESET;
>  	policies->is_valid = 1;
>  }
>  
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
> index d445f6b77db4..95db4a7d3f4d 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
> @@ -221,32 +221,14 @@ struct guc_stage_desc {
>  
>  /* Scheduling policy settings */
>  
> -/* Reset engine upon preempt failure */
> -#define POLICY_RESET_ENGINE		(1<<0)
> -/* Preempt to idle on quantum expiry */
> -#define POLICY_PREEMPT_TO_IDLE		(1<<1)
> -
> -#define POLICY_MAX_NUM_WI 15
> -#define POLICY_DEFAULT_DPC_PROMOTE_TIME_US 500000
> -#define POLICY_DEFAULT_EXECUTION_QUANTUM_US 1000000
> -#define POLICY_DEFAULT_PREEMPTION_TIME_US 500000
> -#define POLICY_DEFAULT_FAULT_TIME_US 250000
> -
> -struct guc_policy {
> -	/* Time for one workload to execute. (in micro seconds) */
> -	u32 execution_quantum;
> -	/* Time to wait for a preemption request to completed before issuing a
> -	 * reset. (in micro seconds). */
> -	u32 preemption_time;
> -	/* How much time to allow to run after the first fault is observed.
> -	 * Then preempt afterwards. (in micro seconds) */
> -	u32 fault_time;
> -	u32 policy_flags;
> -	u32 reserved[8];
> -} __packed;
> +#define GLOBAL_POLICY_MAX_NUM_WI 15
> +
> +/* Don't reset an engine upon preemption failure */
> +#define GLOBAL_POLICY_DISABLE_ENGINE_RESET				BIT(0)
> +
> +#define GLOBAL_POLICY_DEFAULT_DPC_PROMOTE_TIME_US 500000
>  
>  struct guc_policies {
> -	struct guc_policy policy[GUC_CLIENT_PRIORITY_NUM][GUC_MAX_ENGINE_CLASSES];
>  	u32 submission_queue_depth[GUC_MAX_ENGINE_CLASSES];
>  	/* In micro seconds. How much time to allow before DPC processing is
>  	 * called back via interrupt (to prevent DPC queue drain starving).
> @@ -260,6 +242,7 @@ struct guc_policies {
>  	 * idle. */
>  	u32 max_num_work_items;
>  
> +	u32 global_flags;
>  	u32 reserved[4];
>  } __packed;
>  
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 09/97] drm/i915/guc: Stop using fence/status from CTB descriptor
  2021-05-06 19:13 ` [RFC PATCH 09/97] drm/i915/guc: Stop using fence/status from CTB descriptor Matthew Brost
@ 2021-05-25  2:38   ` Matthew Brost
  0 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-25  2:38 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: tvrtko.ursulin, daniele.ceraolospurio, jason.ekstrand,
	jon.bloomfield, daniel.vetter, john.c.harrison

On Thu, May 06, 2021 at 12:13:23PM -0700, Matthew Brost wrote:
> From: Michal Wajdeczko <michal.wajdeczko@intel.com>
> 
> Stop using fence/status from CTB descriptor as future GuC ABI will
> no longer support replies over CTB descriptor.
> 
> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> ---
>  .../gt/uc/abi/guc_communication_ctb_abi.h     |  4 +-
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     | 72 ++-----------------
>  2 files changed, 6 insertions(+), 70 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h
> index ebd8c3e0e4bb..d38935f47ecf 100644
> --- a/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h
> +++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h
> @@ -71,8 +71,8 @@ struct guc_ct_buffer_desc {
>  	u32 head;		/* offset updated by GuC*/
>  	u32 tail;		/* offset updated by owner */
>  	u32 is_in_error;	/* error indicator */
> -	u32 fence;		/* fence updated by GuC */
> -	u32 status;		/* status updated by GuC */
> +	u32 reserved1;
> +	u32 reserved2;
>  	u32 owner;		/* id of the channel owner */
>  	u32 owner_sub_id;	/* owner-defined field for extra tracking */
>  	u32 reserved[5];
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index 25618649048f..4cc8c0b71699 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -90,13 +90,6 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc,
>  	desc->owner = CTB_OWNER_HOST;
>  }
>  
> -static void guc_ct_buffer_desc_reset(struct guc_ct_buffer_desc *desc)
> -{
> -	desc->head = 0;
> -	desc->tail = 0;
> -	desc->is_in_error = 0;
> -}
> -
>  static int guc_action_register_ct_buffer(struct intel_guc *guc,
>  					 u32 desc_addr,
>  					 u32 type)
> @@ -315,8 +308,7 @@ static u32 ct_get_next_fence(struct intel_guc_ct *ct)
>  static int ct_write(struct intel_guc_ct *ct,
>  		    const u32 *action,
>  		    u32 len /* in dwords */,
> -		    u32 fence,
> -		    bool want_response)
> +		    u32 fence)
>  {
>  	struct intel_guc_ct_buffer *ctb = &ct->ctbs[CTB_SEND];
>  	struct guc_ct_buffer_desc *desc = ctb->desc;
> @@ -360,8 +352,7 @@ static int ct_write(struct intel_guc_ct *ct,
>  	 * DW2+: action data
>  	 */
>  	header = (len << GUC_CT_MSG_LEN_SHIFT) |
> -		 (GUC_CT_MSG_WRITE_FENCE_TO_DESC) |
> -		 (want_response ? GUC_CT_MSG_SEND_STATUS : 0) |
> +		 GUC_CT_MSG_SEND_STATUS |
>  		 (action[0] << GUC_CT_MSG_ACTION_SHIFT);
>  
>  	CT_DEBUG(ct, "writing %*ph %*ph %*ph\n",
> @@ -390,56 +381,6 @@ static int ct_write(struct intel_guc_ct *ct,
>  	return -EPIPE;
>  }
>  
> -/**
> - * wait_for_ctb_desc_update - Wait for the CT buffer descriptor update.
> - * @desc:	buffer descriptor
> - * @fence:	response fence
> - * @status:	placeholder for status
> - *
> - * Guc will update CT buffer descriptor with new fence and status
> - * after processing the command identified by the fence. Wait for
> - * specified fence and then read from the descriptor status of the
> - * command.
> - *
> - * Return:
> - * *	0 response received (status is valid)
> - * *	-ETIMEDOUT no response within hardcoded timeout
> - * *	-EPROTO no response, CT buffer is in error
> - */
> -static int wait_for_ctb_desc_update(struct guc_ct_buffer_desc *desc,
> -				    u32 fence,
> -				    u32 *status)
> -{
> -	int err;
> -
> -	/*
> -	 * Fast commands should complete in less than 10us, so sample quickly
> -	 * up to that length of time, then switch to a slower sleep-wait loop.
> -	 * No GuC command should ever take longer than 10ms.
> -	 */
> -#define done (READ_ONCE(desc->fence) == fence)
> -	err = wait_for_us(done, 10);
> -	if (err)
> -		err = wait_for(done, 10);
> -#undef done
> -
> -	if (unlikely(err)) {
> -		DRM_ERROR("CT: fence %u failed; reported fence=%u\n",
> -			  fence, desc->fence);
> -
> -		if (WARN_ON(desc->is_in_error)) {
> -			/* Something went wrong with the messaging, try to reset
> -			 * the buffer and hope for the best
> -			 */
> -			guc_ct_buffer_desc_reset(desc);
> -			err = -EPROTO;
> -		}
> -	}
> -
> -	*status = desc->status;
> -	return err;
> -}
> -
>  /**
>   * wait_for_ct_request_update - Wait for CT request state update.
>   * @req:	pointer to pending request
> @@ -483,8 +424,6 @@ static int ct_send(struct intel_guc_ct *ct,
>  		   u32 response_buf_size,
>  		   u32 *status)
>  {
> -	struct intel_guc_ct_buffer *ctb = &ct->ctbs[CTB_SEND];
> -	struct guc_ct_buffer_desc *desc = ctb->desc;
>  	struct ct_request request;
>  	unsigned long flags;
>  	u32 fence;
> @@ -505,16 +444,13 @@ static int ct_send(struct intel_guc_ct *ct,
>  	list_add_tail(&request.link, &ct->requests.pending);
>  	spin_unlock_irqrestore(&ct->requests.lock, flags);
>  
> -	err = ct_write(ct, action, len, fence, !!response_buf);
> +	err = ct_write(ct, action, len, fence);
>  	if (unlikely(err))
>  		goto unlink;
>  
>  	intel_guc_notify(ct_to_guc(ct));
>  
> -	if (response_buf)
> -		err = wait_for_ct_request_update(&request, status);
> -	else
> -		err = wait_for_ctb_desc_update(desc, fence, status);
> +	err = wait_for_ct_request_update(&request, status);
>  	if (unlikely(err))
>  		goto unlink;
>  
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 11/97] drm/i915/guc: Only rely on own CTB size
  2021-05-06 19:13 ` [RFC PATCH 11/97] drm/i915/guc: Only rely on own CTB size Matthew Brost
@ 2021-05-25  2:47   ` Matthew Brost
  2021-05-25 12:48     ` [Intel-gfx] " Michal Wajdeczko
  0 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-25  2:47 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: tvrtko.ursulin, daniele.ceraolospurio, jason.ekstrand,
	jon.bloomfield, daniel.vetter, john.c.harrison

On Thu, May 06, 2021 at 12:13:25PM -0700, Matthew Brost wrote:
> From: Michal Wajdeczko <michal.wajdeczko@intel.com>
> 
> In upcoming GuC firmware, CTB size will be removed from the CTB
> descriptor so we must keep it locally for any calculations.
> 
> While around, improve some debug messages and helpers.
> 

desc->size is still used in this patch and, per this comment, really
shouldn't be, but a patch later in the series drops it. Seeing as this
patch and that patch are going to be squashed into a single patch
upgrading the GuC firmware, I think that is ok.

With that:
Reviewed-by: Matthew Brost <matthew.brost@intel.com> 

> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 55 +++++++++++++++++------
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  2 +
>  2 files changed, 43 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index 4cc8c0b71699..dbece569fbe4 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -90,6 +90,24 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc,
>  	desc->owner = CTB_OWNER_HOST;
>  }
>  
> +static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb, u32 cmds_addr)
> +{
> +	guc_ct_buffer_desc_init(ctb->desc, cmds_addr, ctb->size);
> +}
> +
> +static void guc_ct_buffer_init(struct intel_guc_ct_buffer *ctb,
> +			       struct guc_ct_buffer_desc *desc,
> +			       u32 *cmds, u32 size)
> +{
> +	GEM_BUG_ON(size % 4);
> +
> +	ctb->desc = desc;
> +	ctb->cmds = cmds;
> +	ctb->size = size;
> +
> +	guc_ct_buffer_reset(ctb, 0);
> +}
> +
>  static int guc_action_register_ct_buffer(struct intel_guc *guc,
>  					 u32 desc_addr,
>  					 u32 type)
> @@ -148,7 +166,10 @@ static int ct_deregister_buffer(struct intel_guc_ct *ct, u32 type)
>  int intel_guc_ct_init(struct intel_guc_ct *ct)
>  {
>  	struct intel_guc *guc = ct_to_guc(ct);
> +	struct guc_ct_buffer_desc *desc;
> +	u32 blob_size;
>  	void *blob;
> +	u32 *cmds;
>  	int err;
>  	int i;
>  
> @@ -176,19 +197,24 @@ int intel_guc_ct_init(struct intel_guc_ct *ct)
>  	 * other code will need updating as well.
>  	 */
>  
> -	err = intel_guc_allocate_and_map_vma(guc, PAGE_SIZE, &ct->vma, &blob);
> +	blob_size = PAGE_SIZE;
> +	err = intel_guc_allocate_and_map_vma(guc, blob_size, &ct->vma, &blob);
>  	if (unlikely(err)) {
> -		CT_ERROR(ct, "Failed to allocate CT channel (err=%d)\n", err);
> +		CT_PROBE_ERROR(ct, "Failed to allocate %u for CTB data (%pe)\n",
> +			       blob_size, ERR_PTR(err));
>  		return err;
>  	}
>  
> -	CT_DEBUG(ct, "vma base=%#x\n", intel_guc_ggtt_offset(guc, ct->vma));
> +	CT_DEBUG(ct, "base=%#x size=%u\n", intel_guc_ggtt_offset(guc, ct->vma), blob_size);
>  
>  	/* store pointers to desc and cmds */
>  	for (i = 0; i < ARRAY_SIZE(ct->ctbs); i++) {
>  		GEM_BUG_ON((i !=  CTB_SEND) && (i != CTB_RECV));
> -		ct->ctbs[i].desc = blob + PAGE_SIZE/4 * i;
> -		ct->ctbs[i].cmds = blob + PAGE_SIZE/4 * i + PAGE_SIZE/2;
> +
> +		desc = blob + PAGE_SIZE / 4 * i;
> +		cmds = blob + PAGE_SIZE / 4 * i + PAGE_SIZE / 2;
> +
> +		guc_ct_buffer_init(&ct->ctbs[i], desc, cmds, PAGE_SIZE / 4);
>  	}
>  
>  	return 0;
> @@ -217,7 +243,7 @@ void intel_guc_ct_fini(struct intel_guc_ct *ct)
>  int intel_guc_ct_enable(struct intel_guc_ct *ct)
>  {
>  	struct intel_guc *guc = ct_to_guc(ct);
> -	u32 base, cmds, size;
> +	u32 base, cmds;
>  	int err;
>  	int i;
>  
> @@ -232,10 +258,11 @@ int intel_guc_ct_enable(struct intel_guc_ct *ct)
>  	 */
>  	for (i = 0; i < ARRAY_SIZE(ct->ctbs); i++) {
>  		GEM_BUG_ON((i != CTB_SEND) && (i != CTB_RECV));
> +
>  		cmds = base + PAGE_SIZE / 4 * i + PAGE_SIZE / 2;
> -		size = PAGE_SIZE / 4;
> -		CT_DEBUG(ct, "%d: addr=%#x size=%u\n", i, cmds, size);
> -		guc_ct_buffer_desc_init(ct->ctbs[i].desc, cmds, size);
> +		CT_DEBUG(ct, "%d: cmds addr=%#x\n", i, cmds);
> +
> +		guc_ct_buffer_reset(&ct->ctbs[i], cmds);
>  	}
>  
>  	/*
> @@ -259,7 +286,7 @@ int intel_guc_ct_enable(struct intel_guc_ct *ct)
>  err_deregister:
>  	ct_deregister_buffer(ct, INTEL_GUC_CT_BUFFER_TYPE_RECV);
>  err_out:
> -	CT_PROBE_ERROR(ct, "Failed to open channel (err=%d)\n", err);
> +	CT_PROBE_ERROR(ct, "Failed to enable CTB (%pe)\n", ERR_PTR(err));
>  	return err;
>  }
>  
> @@ -314,7 +341,7 @@ static int ct_write(struct intel_guc_ct *ct,
>  	struct guc_ct_buffer_desc *desc = ctb->desc;
>  	u32 head = desc->head;
>  	u32 tail = desc->tail;
> -	u32 size = desc->size;
> +	u32 size = ctb->size;
>  	u32 used;
>  	u32 header;
>  	u32 *cmds = ctb->cmds;
> @@ -323,7 +350,7 @@ static int ct_write(struct intel_guc_ct *ct,
>  	if (unlikely(desc->is_in_error))
>  		return -EPIPE;
>  
> -	if (unlikely(!IS_ALIGNED(head | tail | size, 4) ||
> +	if (unlikely(!IS_ALIGNED(head | tail, 4) ||
>  		     (tail | head) >= size))
>  		goto corrupted;
>  
> @@ -530,7 +557,7 @@ static int ct_read(struct intel_guc_ct *ct, u32 *data)
>  	struct guc_ct_buffer_desc *desc = ctb->desc;
>  	u32 head = desc->head;
>  	u32 tail = desc->tail;
> -	u32 size = desc->size;
> +	u32 size = ctb->size;
>  	u32 *cmds = ctb->cmds;
>  	s32 available;
>  	unsigned int len;
> @@ -539,7 +566,7 @@ static int ct_read(struct intel_guc_ct *ct, u32 *data)
>  	if (unlikely(desc->is_in_error))
>  		return -EPIPE;
>  
> -	if (unlikely(!IS_ALIGNED(head | tail | size, 4) ||
> +	if (unlikely(!IS_ALIGNED(head | tail, 4) ||
>  		     (tail | head) >= size))
>  		goto corrupted;
>  
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> index 494a51a5200f..4009e2dd0de4 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> @@ -29,10 +29,12 @@ struct intel_guc;
>   *
>   * @desc: pointer to the buffer descriptor
>   * @cmds: pointer to the commands buffer
> + * @size: size of the commands buffer
>   */
>  struct intel_guc_ct_buffer {
>  	struct guc_ct_buffer_desc *desc;
>  	u32 *cmds;
> +	u32 size;
>  };
>  
>  
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 12/97] drm/i915/guc: Don't repeat CTB layout calculations
  2021-05-06 19:13 ` [RFC PATCH 12/97] drm/i915/guc: Don't repeat CTB layout calculations Matthew Brost
@ 2021-05-25  2:53   ` Matthew Brost
  2021-05-25 13:07     ` Michal Wajdeczko
  0 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-25  2:53 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: tvrtko.ursulin, daniele.ceraolospurio, jason.ekstrand,
	jon.bloomfield, daniel.vetter, john.c.harrison

On Thu, May 06, 2021 at 12:13:26PM -0700, Matthew Brost wrote:
> From: Michal Wajdeczko <michal.wajdeczko@intel.com>
> 
> We can retrieve offsets to cmds buffers and descriptor from
> actual pointers that we already keep locally.
> 
> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index dbece569fbe4..fbd6bd20f588 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -244,6 +244,7 @@ int intel_guc_ct_enable(struct intel_guc_ct *ct)
>  {
>  	struct intel_guc *guc = ct_to_guc(ct);
>  	u32 base, cmds;
> +	void *blob;
>  	int err;
>  	int i;
>  
> @@ -251,15 +252,18 @@ int intel_guc_ct_enable(struct intel_guc_ct *ct)
>  
>  	/* vma should be already allocated and map'ed */
>  	GEM_BUG_ON(!ct->vma);
> +	GEM_BUG_ON(!i915_gem_object_has_pinned_pages(ct->vma->obj));

This doesn't really have anything to do with this patch, but again this
patch will be squashed into a large patch updating the GuC firmware, so
I think this is fine.

With that:
Reviewed-by: Matthew Brost <matthew.brost@intel.com>

>  	base = intel_guc_ggtt_offset(guc, ct->vma);
>  
> -	/* (re)initialize descriptors
> -	 * cmds buffers are in the second half of the blob page
> -	 */
> +	/* blob should start with send descriptor */
> +	blob = __px_vaddr(ct->vma->obj);
> +	GEM_BUG_ON(blob != ct->ctbs[CTB_SEND].desc);
> +
> +	/* (re)initialize descriptors */
>  	for (i = 0; i < ARRAY_SIZE(ct->ctbs); i++) {
>  		GEM_BUG_ON((i != CTB_SEND) && (i != CTB_RECV));
>  
> -		cmds = base + PAGE_SIZE / 4 * i + PAGE_SIZE / 2;
> +		cmds = base + ptrdiff(ct->ctbs[i].cmds, blob);
>  		CT_DEBUG(ct, "%d: cmds addr=%#x\n", i, cmds);
>  
>  		guc_ct_buffer_reset(&ct->ctbs[i], cmds);
> @@ -269,12 +273,12 @@ int intel_guc_ct_enable(struct intel_guc_ct *ct)
>  	 * Register both CT buffers starting with RECV buffer.
>  	 * Descriptors are in first half of the blob.
>  	 */
> -	err = ct_register_buffer(ct, base + PAGE_SIZE / 4 * CTB_RECV,
> +	err = ct_register_buffer(ct, base + ptrdiff(ct->ctbs[CTB_RECV].desc, blob),
>  				 INTEL_GUC_CT_BUFFER_TYPE_RECV);
>  	if (unlikely(err))
>  		goto err_out;
>  
> -	err = ct_register_buffer(ct, base + PAGE_SIZE / 4 * CTB_SEND,
> +	err = ct_register_buffer(ct, base + ptrdiff(ct->ctbs[CTB_SEND].desc, blob),
>  				 INTEL_GUC_CT_BUFFER_TYPE_SEND);
>  	if (unlikely(err))
>  		goto err_deregister;
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 14/97] drm/i915/guc: Update sizes of CTB buffers
  2021-05-06 19:13 ` [RFC PATCH 14/97] drm/i915/guc: Update sizes of CTB buffers Matthew Brost
@ 2021-05-25  2:56   ` Matthew Brost
  0 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-25  2:56 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: tvrtko.ursulin, daniele.ceraolospurio, jason.ekstrand,
	jon.bloomfield, daniel.vetter, john.c.harrison

On Thu, May 06, 2021 at 12:13:28PM -0700, Matthew Brost wrote:
> From: Michal Wajdeczko <michal.wajdeczko@intel.com>
> 
> Future GuC will require CTB buffers sizes to be multiple of 4K.
> Make these changes now as this shouldn't impact us too much.
> 
> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Cc: John Harrison <john.c.harrison@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 60 ++++++++++++-----------
>  1 file changed, 32 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index c54a29176862..c87a0a8bef26 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -38,6 +38,32 @@ static inline struct drm_device *ct_to_drm(struct intel_guc_ct *ct)
>  #define CT_PROBE_ERROR(_ct, _fmt, ...) \
>  	i915_probe_error(ct_to_i915(ct), "CT: " _fmt, ##__VA_ARGS__);
>  
> +/**
> + * DOC: CTB Blob
> + *
> + * We allocate single blob to hold both CTB descriptors and buffers:
> + *
> + *      +--------+-----------------------------------------------+------+
> + *      | offset | contents                                      | size |
> + *      +========+===============================================+======+
> + *      | 0x0000 | H2G `CTB Descriptor`_ (send)                  |      |
> + *      +--------+-----------------------------------------------+  4K  |
> + *      | 0x0800 | G2H `CTB Descriptor`_ (recv)                  |      |
> + *      +--------+-----------------------------------------------+------+
> + *      | 0x1000 | H2G `CT Buffer`_ (send)                       | n*4K |
> + *      |        |                                               |      |
> + *      +--------+-----------------------------------------------+------+
> + *      | 0x1000 | G2H `CT Buffer`_ (recv)                       | m*4K |
> + *      | + n*4K |                                               |      |
> + *      +--------+-----------------------------------------------+------+
> + *
> + * Size of each `CT Buffer`_ must be multiple of 4K.
> + * As we don't expect too many messages, for now use minimum sizes.
> + */
> +#define CTB_DESC_SIZE		ALIGN(sizeof(struct guc_ct_buffer_desc), SZ_2K)
> +#define CTB_H2G_BUFFER_SIZE	(SZ_4K)
> +#define CTB_G2H_BUFFER_SIZE	(SZ_4K)
> +
>  struct ct_request {
>  	struct list_head link;
>  	u32 fence;
> @@ -175,29 +201,7 @@ int intel_guc_ct_init(struct intel_guc_ct *ct)
>  
>  	GEM_BUG_ON(ct->vma);
>  
> -	/* We allocate 1 page to hold both descriptors and both buffers.
> -	 *       ___________.....................
> -	 *      |desc (SEND)|                   :
> -	 *      |___________|                   PAGE/4
> -	 *      :___________....................:
> -	 *      |desc (RECV)|                   :
> -	 *      |___________|                   PAGE/4
> -	 *      :_______________________________:
> -	 *      |cmds (SEND)                    |
> -	 *      |                               PAGE/4
> -	 *      |_______________________________|
> -	 *      |cmds (RECV)                    |
> -	 *      |                               PAGE/4
> -	 *      |_______________________________|
> -	 *
> -	 * Each message can use a maximum of 32 dwords and we don't expect to
> -	 * have more than 1 in flight at any time, so we have enough space.
> -	 * Some logic further ahead will rely on the fact that there is only 1
> -	 * page and that it is always mapped, so if the size is changed the
> -	 * other code will need updating as well.
> -	 */
> -
> -	blob_size = PAGE_SIZE;
> +	blob_size = 2 * CTB_DESC_SIZE + CTB_H2G_BUFFER_SIZE + CTB_G2H_BUFFER_SIZE;
>  	err = intel_guc_allocate_and_map_vma(guc, blob_size, &ct->vma, &blob);
>  	if (unlikely(err)) {
>  		CT_PROBE_ERROR(ct, "Failed to allocate %u for CTB data (%pe)\n",
> @@ -209,17 +213,17 @@ int intel_guc_ct_init(struct intel_guc_ct *ct)
>  
>  	/* store pointers to desc and cmds for send ctb */
>  	desc = blob;
> -	cmds = blob + PAGE_SIZE / 2;
> -	cmds_size = PAGE_SIZE / 4;
> +	cmds = blob + 2 * CTB_DESC_SIZE;

2 is a magic number here. I think it would be clearer with a
CTB_NUMBER_DESC define here.

Michal, what do you think? We can fix this in the next post of the
series with your blessing.
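
Purely for illustration, the shape I have in mind (using the name
floated above) would be:

        #define CTB_NUMBER_DESC		2	/* send + recv descriptors */

        blob_size = CTB_NUMBER_DESC * CTB_DESC_SIZE +
                    CTB_H2G_BUFFER_SIZE + CTB_G2H_BUFFER_SIZE;
        cmds = blob + CTB_NUMBER_DESC * CTB_DESC_SIZE;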

With that nit:
Reviewed-by: Matthew Brost <matthew.brost@intel.com> 

> +	cmds_size = CTB_H2G_BUFFER_SIZE;
>  	CT_DEBUG(ct, "%s desc %#lx cmds %#lx size %u\n", "send",
>  		 ptrdiff(desc, blob), ptrdiff(cmds, blob), cmds_size);
>  
>  	guc_ct_buffer_init(&ct->ctbs.send, desc, cmds, cmds_size);
>  
>  	/* store pointers to desc and cmds for recv ctb */
> -	desc = blob + PAGE_SIZE / 4;
> -	cmds = blob + PAGE_SIZE / 4 + PAGE_SIZE / 2;
> -	cmds_size = PAGE_SIZE / 4;
> +	desc = blob + CTB_DESC_SIZE;
> +	cmds = blob + 2 * CTB_DESC_SIZE + CTB_H2G_BUFFER_SIZE;
> +	cmds_size = CTB_G2H_BUFFER_SIZE;
>  	CT_DEBUG(ct, "%s desc %#lx cmds %#lx size %u\n", "recv",
>  		 ptrdiff(desc, blob), ptrdiff(cmds, blob), cmds_size);
>  
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 13/97] drm/i915/guc: Replace CTB array with explicit members
  2021-05-06 19:13 ` [RFC PATCH 13/97] drm/i915/guc: Replace CTB array with explicit members Matthew Brost
@ 2021-05-25  3:15   ` Matthew Brost
  0 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-25  3:15 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: tvrtko.ursulin, daniele.ceraolospurio, jason.ekstrand,
	jon.bloomfield, daniel.vetter, john.c.harrison

On Thu, May 06, 2021 at 12:13:27PM -0700, Matthew Brost wrote:
> From: Michal Wajdeczko <michal.wajdeczko@intel.com>
> 
> Upcoming GuC firmware will always require just two CTBs and we
> also plan to configure them with different sizes, so definining
> them as array is no longer suitable.
> 
> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 46 ++++++++++++-----------
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  7 +++-
>  2 files changed, 30 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index fbd6bd20f588..c54a29176862 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -168,10 +168,10 @@ int intel_guc_ct_init(struct intel_guc_ct *ct)
>  	struct intel_guc *guc = ct_to_guc(ct);
>  	struct guc_ct_buffer_desc *desc;
>  	u32 blob_size;
> +	u32 cmds_size;
>  	void *blob;
>  	u32 *cmds;
>  	int err;
> -	int i;
>  
>  	GEM_BUG_ON(ct->vma);
>  
> @@ -207,15 +207,23 @@ int intel_guc_ct_init(struct intel_guc_ct *ct)
>  
>  	CT_DEBUG(ct, "base=%#x size=%u\n", intel_guc_ggtt_offset(guc, ct->vma), blob_size);
>  
> -	/* store pointers to desc and cmds */
> -	for (i = 0; i < ARRAY_SIZE(ct->ctbs); i++) {
> -		GEM_BUG_ON((i !=  CTB_SEND) && (i != CTB_RECV));
> +	/* store pointers to desc and cmds for send ctb */
> +	desc = blob;
> +	cmds = blob + PAGE_SIZE / 2;
> +	cmds_size = PAGE_SIZE / 4;
> +	CT_DEBUG(ct, "%s desc %#lx cmds %#lx size %u\n", "send",
> +		 ptrdiff(desc, blob), ptrdiff(cmds, blob), cmds_size);
>  
> -		desc = blob + PAGE_SIZE / 4 * i;
> -		cmds = blob + PAGE_SIZE / 4 * i + PAGE_SIZE / 2;
> +	guc_ct_buffer_init(&ct->ctbs.send, desc, cmds, cmds_size);
>  
> -		guc_ct_buffer_init(&ct->ctbs[i], desc, cmds, PAGE_SIZE / 4);
> -	}
> +	/* store pointers to desc and cmds for recv ctb */
> +	desc = blob + PAGE_SIZE / 4;
> +	cmds = blob + PAGE_SIZE / 4 + PAGE_SIZE / 2;
> +	cmds_size = PAGE_SIZE / 4;
> +	CT_DEBUG(ct, "%s desc %#lx cmds %#lx size %u\n", "recv",
> +		 ptrdiff(desc, blob), ptrdiff(cmds, blob), cmds_size);
> +
> +	guc_ct_buffer_init(&ct->ctbs.recv, desc, cmds, cmds_size);
>  
>  	return 0;
>  }
> @@ -246,7 +254,6 @@ int intel_guc_ct_enable(struct intel_guc_ct *ct)
>  	u32 base, cmds;
>  	void *blob;
>  	int err;
> -	int i;
>  
>  	GEM_BUG_ON(ct->enabled);
>  
> @@ -257,28 +264,25 @@ int intel_guc_ct_enable(struct intel_guc_ct *ct)
>  
>  	/* blob should start with send descriptor */
>  	blob = __px_vaddr(ct->vma->obj);
> -	GEM_BUG_ON(blob != ct->ctbs[CTB_SEND].desc);
> +	GEM_BUG_ON(blob != ct->ctbs.send.desc);
>  
>  	/* (re)initialize descriptors */
> -	for (i = 0; i < ARRAY_SIZE(ct->ctbs); i++) {
> -		GEM_BUG_ON((i != CTB_SEND) && (i != CTB_RECV));
> +	cmds = base + ptrdiff(ct->ctbs.send.cmds, blob);
> +	guc_ct_buffer_reset(&ct->ctbs.send, cmds);
>  
> -		cmds = base + ptrdiff(ct->ctbs[i].cmds, blob);
> -		CT_DEBUG(ct, "%d: cmds addr=%#x\n", i, cmds);
> -
> -		guc_ct_buffer_reset(&ct->ctbs[i], cmds);
> -	}
> +	cmds = base + ptrdiff(ct->ctbs.recv.cmds, blob);
> +	guc_ct_buffer_reset(&ct->ctbs.recv, cmds);
>  
>  	/*
>  	 * Register both CT buffers starting with RECV buffer.
>  	 * Descriptors are in first half of the blob.
>  	 */
> -	err = ct_register_buffer(ct, base + ptrdiff(ct->ctbs[CTB_RECV].desc, blob),
> +	err = ct_register_buffer(ct, base + ptrdiff(ct->ctbs.recv.desc, blob),
>  				 INTEL_GUC_CT_BUFFER_TYPE_RECV);
>  	if (unlikely(err))
>  		goto err_out;
>  
> -	err = ct_register_buffer(ct, base + ptrdiff(ct->ctbs[CTB_SEND].desc, blob),
> +	err = ct_register_buffer(ct, base + ptrdiff(ct->ctbs.send.desc, blob),
>  				 INTEL_GUC_CT_BUFFER_TYPE_SEND);
>  	if (unlikely(err))
>  		goto err_deregister;
> @@ -341,7 +345,7 @@ static int ct_write(struct intel_guc_ct *ct,
>  		    u32 len /* in dwords */,
>  		    u32 fence)
>  {
> -	struct intel_guc_ct_buffer *ctb = &ct->ctbs[CTB_SEND];
> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>  	struct guc_ct_buffer_desc *desc = ctb->desc;
>  	u32 head = desc->head;
>  	u32 tail = desc->tail;
> @@ -557,7 +561,7 @@ static inline bool ct_header_is_response(u32 header)
>  
>  static int ct_read(struct intel_guc_ct *ct, u32 *data)
>  {
> -	struct intel_guc_ct_buffer *ctb = &ct->ctbs[CTB_RECV];
> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv;
>  	struct guc_ct_buffer_desc *desc = ctb->desc;
>  	u32 head = desc->head;
>  	u32 tail = desc->tail;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> index 4009e2dd0de4..fc9486779e87 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> @@ -47,8 +47,11 @@ struct intel_guc_ct {
>  	struct i915_vma *vma;
>  	bool enabled;
>  
> -	/* buffers for sending(0) and receiving(1) commands */
> -	struct intel_guc_ct_buffer ctbs[2];
> +	/* buffers for sending and receiving commands */
> +	struct {
> +		struct intel_guc_ct_buffer send;
> +		struct intel_guc_ct_buffer recv;
> +	} ctbs;
>  
>  	struct {
>  		u32 last_fence; /* last fence used to send request */
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 16/97] drm/i915/guc: Start protecting access to CTB descriptors
  2021-05-06 19:13 ` [RFC PATCH 16/97] drm/i915/guc: Start protecting access to CTB descriptors Matthew Brost
@ 2021-05-25  3:21   ` Matthew Brost
  2021-05-25  3:21   ` Matthew Brost
  1 sibling, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-25  3:21 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: tvrtko.ursulin, daniele.ceraolospurio, jason.ekstrand,
	jon.bloomfield, daniel.vetter, john.c.harrison

On Thu, May 06, 2021 at 12:13:30PM -0700, Matthew Brost wrote:
> From: Michal Wajdeczko <michal.wajdeczko@intel.com>
> 
> We want to stop using guc.send_mutex while sending CTB messages
> so we have to start protecting access to CTB send descriptor.
> 
> For completeness protect also CTB send descriptor.

Michal, I think you have a typo here; it should be the receive descriptor,
right? Again, this is going to get squashed in the firmware update patch,
but I thought I'd mention it.

With that:
Reviewed-by: Matthew Brost <matthew.brost@intel.com> 

> 
> Add spinlock to struct intel_guc_ct_buffer and start using it.
> 
> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 14 ++++++++++++--
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  2 ++
>  2 files changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index a4b2e7fe318b..bee0958d8bae 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -89,6 +89,8 @@ static void ct_incoming_request_worker_func(struct work_struct *w);
>   */
>  void intel_guc_ct_init_early(struct intel_guc_ct *ct)
>  {
> +	spin_lock_init(&ct->ctbs.send.lock);
> +	spin_lock_init(&ct->ctbs.recv.lock);
>  	spin_lock_init(&ct->requests.lock);
>  	INIT_LIST_HEAD(&ct->requests.pending);
>  	INIT_LIST_HEAD(&ct->requests.incoming);
> @@ -479,17 +481,22 @@ static int ct_send(struct intel_guc_ct *ct,
>  	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
>  	GEM_BUG_ON(!response_buf && response_buf_size);
>  
> +	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
> +
>  	fence = ct_get_next_fence(ct);
>  	request.fence = fence;
>  	request.status = 0;
>  	request.response_len = response_buf_size;
>  	request.response_buf = response_buf;
>  
> -	spin_lock_irqsave(&ct->requests.lock, flags);
> +	spin_lock(&ct->requests.lock);
>  	list_add_tail(&request.link, &ct->requests.pending);
> -	spin_unlock_irqrestore(&ct->requests.lock, flags);
> +	spin_unlock(&ct->requests.lock);
>  
>  	err = ct_write(ct, action, len, fence);
> +
> +	spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> +
>  	if (unlikely(err))
>  		goto unlink;
>  
> @@ -825,6 +832,7 @@ static int ct_handle_request(struct intel_guc_ct *ct, const u32 *msg)
>  void intel_guc_ct_event_handler(struct intel_guc_ct *ct)
>  {
>  	u32 msg[GUC_CT_MSG_LEN_MASK + 1]; /* one extra dw for the header */
> +	unsigned long flags;
>  	int err = 0;
>  
>  	if (unlikely(!ct->enabled)) {
> @@ -833,7 +841,9 @@ void intel_guc_ct_event_handler(struct intel_guc_ct *ct)
>  	}
>  
>  	do {
> +		spin_lock_irqsave(&ct->ctbs.recv.lock, flags);
>  		err = ct_read(ct, msg);
> +		spin_unlock_irqrestore(&ct->ctbs.recv.lock, flags);
>  		if (err)
>  			break;
>  
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> index fc9486779e87..bc52dc479a14 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> @@ -27,11 +27,13 @@ struct intel_guc;
>   * record (command transport buffer descriptor) and the actual buffer which
>   * holds the commands.
>   *
> + * @lock: protects access to the commands buffer and buffer descriptor
>   * @desc: pointer to the buffer descriptor
>   * @cmds: pointer to the commands buffer
>   * @size: size of the commands buffer
>   */
>  struct intel_guc_ct_buffer {
> +	spinlock_t lock;
>  	struct guc_ct_buffer_desc *desc;
>  	u32 *cmds;
>  	u32 size;
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 16/97] drm/i915/guc: Start protecting access to CTB descriptors
  2021-05-06 19:13 ` [RFC PATCH 16/97] drm/i915/guc: Start protecting access to CTB descriptors Matthew Brost
  2021-05-25  3:21   ` Matthew Brost
@ 2021-05-25  3:21   ` Matthew Brost
  1 sibling, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-25  3:21 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: tvrtko.ursulin, daniele.ceraolospurio, jason.ekstrand,
	jon.bloomfield, daniel.vetter, john.c.harrison

On Thu, May 06, 2021 at 12:13:30PM -0700, Matthew Brost wrote:
> From: Michal Wajdeczko <michal.wajdeczko@intel.com>
> 
> We want to stop using guc.send_mutex while sending CTB messages
> so we have to start protecting access to CTB send descriptor.
> 
> For completeness protect also CTB send descriptor.
> 
> Add spinlock to struct intel_guc_ct_buffer and start using it.
> 
> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 14 ++++++++++++--
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  2 ++
>  2 files changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index a4b2e7fe318b..bee0958d8bae 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -89,6 +89,8 @@ static void ct_incoming_request_worker_func(struct work_struct *w);
>   */
>  void intel_guc_ct_init_early(struct intel_guc_ct *ct)
>  {
> +	spin_lock_init(&ct->ctbs.send.lock);
> +	spin_lock_init(&ct->ctbs.recv.lock);
>  	spin_lock_init(&ct->requests.lock);
>  	INIT_LIST_HEAD(&ct->requests.pending);
>  	INIT_LIST_HEAD(&ct->requests.incoming);
> @@ -479,17 +481,22 @@ static int ct_send(struct intel_guc_ct *ct,
>  	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
>  	GEM_BUG_ON(!response_buf && response_buf_size);
>  
> +	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
> +
>  	fence = ct_get_next_fence(ct);
>  	request.fence = fence;
>  	request.status = 0;
>  	request.response_len = response_buf_size;
>  	request.response_buf = response_buf;
>  
> -	spin_lock_irqsave(&ct->requests.lock, flags);
> +	spin_lock(&ct->requests.lock);
>  	list_add_tail(&request.link, &ct->requests.pending);
> -	spin_unlock_irqrestore(&ct->requests.lock, flags);
> +	spin_unlock(&ct->requests.lock);
>  
>  	err = ct_write(ct, action, len, fence);
> +
> +	spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> +
>  	if (unlikely(err))
>  		goto unlink;
>  
> @@ -825,6 +832,7 @@ static int ct_handle_request(struct intel_guc_ct *ct, const u32 *msg)
>  void intel_guc_ct_event_handler(struct intel_guc_ct *ct)
>  {
>  	u32 msg[GUC_CT_MSG_LEN_MASK + 1]; /* one extra dw for the header */
> +	unsigned long flags;
>  	int err = 0;
>  
>  	if (unlikely(!ct->enabled)) {
> @@ -833,7 +841,9 @@ void intel_guc_ct_event_handler(struct intel_guc_ct *ct)
>  	}
>  
>  	do {
> +		spin_lock_irqsave(&ct->ctbs.recv.lock, flags);
>  		err = ct_read(ct, msg);
> +		spin_unlock_irqrestore(&ct->ctbs.recv.lock, flags);
>  		if (err)
>  			break;
>  
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> index fc9486779e87..bc52dc479a14 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> @@ -27,11 +27,13 @@ struct intel_guc;
>   * record (command transport buffer descriptor) and the actual buffer which
>   * holds the commands.
>   *
> + * @lock: protects access to the commands buffer and buffer descriptor
>   * @desc: pointer to the buffer descriptor
>   * @cmds: pointer to the commands buffer
>   * @size: size of the commands buffer
>   */
>  struct intel_guc_ct_buffer {
> +	spinlock_t lock;
>  	struct guc_ct_buffer_desc *desc;
>  	u32 *cmds;
>  	u32 size;
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 01/97] drm/i915/gt: Move engine setup out of set_default_submission
  2021-05-06 19:13 ` [RFC PATCH 01/97] drm/i915/gt: Move engine setup out of set_default_submission Matthew Brost
  2021-05-19  0:25   ` Matthew Brost
@ 2021-05-25  8:44   ` Tvrtko Ursulin
  1 sibling, 0 replies; 249+ messages in thread
From: Tvrtko Ursulin @ 2021-05-25  8:44 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel; +Cc: jason.ekstrand, daniel.vetter


On 06/05/2021 20:13, Matthew Brost wrote:
> From: Chris Wilson <chris@chris-wilson.co.uk>
> 
> Now that we no longer switch back and forth between guc and execlists,
> we no longer need to restore the backend's vfunc and can leave them set
> after initialisation. The only catch is that we lose the submission on
> wedging and still need to reset the submit_request vfunc on unwedging.
> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

This patch had my r-b already so I'll repeat it:

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

> ---
>   .../drm/i915/gt/intel_execlists_submission.c  | 46 ++++++++---------
>   .../gpu/drm/i915/gt/intel_ring_submission.c   |  4 --
>   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 50 ++++++++-----------
>   3 files changed, 44 insertions(+), 56 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> index de124870af44..1108c193ab65 100644
> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> @@ -3076,29 +3076,6 @@ static void execlists_set_default_submission(struct intel_engine_cs *engine)
>   	engine->submit_request = execlists_submit_request;
>   	engine->schedule = i915_schedule;
>   	engine->execlists.tasklet.callback = execlists_submission_tasklet;
> -
> -	engine->reset.prepare = execlists_reset_prepare;
> -	engine->reset.rewind = execlists_reset_rewind;
> -	engine->reset.cancel = execlists_reset_cancel;
> -	engine->reset.finish = execlists_reset_finish;
> -
> -	engine->park = execlists_park;
> -	engine->unpark = NULL;
> -
> -	engine->flags |= I915_ENGINE_SUPPORTS_STATS;
> -	if (!intel_vgpu_active(engine->i915)) {
> -		engine->flags |= I915_ENGINE_HAS_SEMAPHORES;
> -		if (can_preempt(engine)) {
> -			engine->flags |= I915_ENGINE_HAS_PREEMPTION;
> -			if (IS_ACTIVE(CONFIG_DRM_I915_TIMESLICE_DURATION))
> -				engine->flags |= I915_ENGINE_HAS_TIMESLICES;
> -		}
> -	}
> -
> -	if (intel_engine_has_preemption(engine))
> -		engine->emit_bb_start = gen8_emit_bb_start;
> -	else
> -		engine->emit_bb_start = gen8_emit_bb_start_noarb;
>   }
>   
>   static void execlists_shutdown(struct intel_engine_cs *engine)
> @@ -3129,6 +3106,14 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
>   	engine->cops = &execlists_context_ops;
>   	engine->request_alloc = execlists_request_alloc;
>   
> +	engine->reset.prepare = execlists_reset_prepare;
> +	engine->reset.rewind = execlists_reset_rewind;
> +	engine->reset.cancel = execlists_reset_cancel;
> +	engine->reset.finish = execlists_reset_finish;
> +
> +	engine->park = execlists_park;
> +	engine->unpark = NULL;
> +
>   	engine->emit_flush = gen8_emit_flush_xcs;
>   	engine->emit_init_breadcrumb = gen8_emit_init_breadcrumb;
>   	engine->emit_fini_breadcrumb = gen8_emit_fini_breadcrumb_xcs;
> @@ -3149,6 +3134,21 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
>   		 * until a more refined solution exists.
>   		 */
>   	}
> +
> +	engine->flags |= I915_ENGINE_SUPPORTS_STATS;
> +	if (!intel_vgpu_active(engine->i915)) {
> +		engine->flags |= I915_ENGINE_HAS_SEMAPHORES;
> +		if (can_preempt(engine)) {
> +			engine->flags |= I915_ENGINE_HAS_PREEMPTION;
> +			if (IS_ACTIVE(CONFIG_DRM_I915_TIMESLICE_DURATION))
> +				engine->flags |= I915_ENGINE_HAS_TIMESLICES;
> +		}
> +	}
> +
> +	if (intel_engine_has_preemption(engine))
> +		engine->emit_bb_start = gen8_emit_bb_start;
> +	else
> +		engine->emit_bb_start = gen8_emit_bb_start_noarb;
>   }
>   
>   static void logical_ring_default_irqs(struct intel_engine_cs *engine)
> diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> index 9585546556ee..5f4f7f1df48f 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> @@ -989,14 +989,10 @@ static void gen6_bsd_submit_request(struct i915_request *request)
>   static void i9xx_set_default_submission(struct intel_engine_cs *engine)
>   {
>   	engine->submit_request = i9xx_submit_request;
> -
> -	engine->park = NULL;
> -	engine->unpark = NULL;
>   }
>   
>   static void gen6_bsd_set_default_submission(struct intel_engine_cs *engine)
>   {
> -	i9xx_set_default_submission(engine);
>   	engine->submit_request = gen6_bsd_submit_request;
>   }
>   
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 92688a9b6717..f72faa0b8339 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -608,35 +608,6 @@ static int guc_resume(struct intel_engine_cs *engine)
>   static void guc_set_default_submission(struct intel_engine_cs *engine)
>   {
>   	engine->submit_request = guc_submit_request;
> -	engine->schedule = i915_schedule;
> -	engine->execlists.tasklet.callback = guc_submission_tasklet;
> -
> -	engine->reset.prepare = guc_reset_prepare;
> -	engine->reset.rewind = guc_reset_rewind;
> -	engine->reset.cancel = guc_reset_cancel;
> -	engine->reset.finish = guc_reset_finish;
> -
> -	engine->flags |= I915_ENGINE_NEEDS_BREADCRUMB_TASKLET;
> -	engine->flags |= I915_ENGINE_HAS_PREEMPTION;
> -
> -	/*
> -	 * TODO: GuC supports timeslicing and semaphores as well, but they're
> -	 * handled by the firmware so some minor tweaks are required before
> -	 * enabling.
> -	 *
> -	 * engine->flags |= I915_ENGINE_HAS_TIMESLICES;
> -	 * engine->flags |= I915_ENGINE_HAS_SEMAPHORES;
> -	 */
> -
> -	engine->emit_bb_start = gen8_emit_bb_start;
> -
> -	/*
> -	 * For the breadcrumb irq to work we need the interrupts to stay
> -	 * enabled. However, on all platforms on which we'll have support for
> -	 * GuC submission we don't allow disabling the interrupts at runtime, so
> -	 * we're always safe with the current flow.
> -	 */
> -	GEM_BUG_ON(engine->irq_enable || engine->irq_disable);
>   }
>   
>   static void guc_release(struct intel_engine_cs *engine)
> @@ -658,6 +629,13 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
>   	engine->cops = &guc_context_ops;
>   	engine->request_alloc = guc_request_alloc;
>   
> +	engine->schedule = i915_schedule;
> +
> +	engine->reset.prepare = guc_reset_prepare;
> +	engine->reset.rewind = guc_reset_rewind;
> +	engine->reset.cancel = guc_reset_cancel;
> +	engine->reset.finish = guc_reset_finish;
> +
>   	engine->emit_flush = gen8_emit_flush_xcs;
>   	engine->emit_init_breadcrumb = gen8_emit_init_breadcrumb;
>   	engine->emit_fini_breadcrumb = gen8_emit_fini_breadcrumb_xcs;
> @@ -666,6 +644,20 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
>   		engine->emit_flush = gen12_emit_flush_xcs;
>   	}
>   	engine->set_default_submission = guc_set_default_submission;
> +
> +	engine->flags |= I915_ENGINE_NEEDS_BREADCRUMB_TASKLET;
> +	engine->flags |= I915_ENGINE_HAS_PREEMPTION;
> +
> +	/*
> +	 * TODO: GuC supports timeslicing and semaphores as well, but they're
> +	 * handled by the firmware so some minor tweaks are required before
> +	 * enabling.
> +	 *
> +	 * engine->flags |= I915_ENGINE_HAS_TIMESLICES;
> +	 * engine->flags |= I915_ENGINE_HAS_SEMAPHORES;
> +	 */
> +
> +	engine->emit_bb_start = gen8_emit_bb_start;
>   }
>   
>   static void rcs_submission_override(struct intel_engine_cs *engine)
> 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 02/97] drm/i915/gt: Move submission_method into intel_gt
  2021-05-06 19:13 ` [RFC PATCH 02/97] drm/i915/gt: Move submission_method into intel_gt Matthew Brost
  2021-05-19  3:10   ` Matthew Brost
@ 2021-05-25  8:44   ` Tvrtko Ursulin
  1 sibling, 0 replies; 249+ messages in thread
From: Tvrtko Ursulin @ 2021-05-25  8:44 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel; +Cc: jason.ekstrand, daniel.vetter



On 06/05/2021 20:13, Matthew Brost wrote:
> From: Chris Wilson <chris@chris-wilson.co.uk>
> 
> Since we setup the submission method for the engines once, it is easy to
> assign an enum and use that instead of probing into the backends.
> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Same, this patch had my r-b already so I'll repeat it:

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

> ---
>   drivers/gpu/drm/i915/gt/intel_engine.h               |  8 +++++++-
>   drivers/gpu/drm/i915/gt/intel_engine_cs.c            | 12 ++++++++----
>   drivers/gpu/drm/i915/gt/intel_execlists_submission.c |  8 --------
>   drivers/gpu/drm/i915/gt/intel_execlists_submission.h |  3 ---
>   drivers/gpu/drm/i915/gt/intel_gt_types.h             |  7 +++++++
>   drivers/gpu/drm/i915/gt/intel_reset.c                |  7 +++----
>   drivers/gpu/drm/i915/gt/selftest_execlists.c         |  2 +-
>   drivers/gpu/drm/i915/gt/selftest_ring_submission.c   |  2 +-
>   drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c    |  5 -----
>   drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h    |  1 -
>   drivers/gpu/drm/i915/i915_perf.c                     | 10 +++++-----
>   11 files changed, 32 insertions(+), 33 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
> index 47ee8578e511..8d9184920c51 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine.h
> @@ -13,8 +13,9 @@
>   #include "i915_reg.h"
>   #include "i915_request.h"
>   #include "i915_selftest.h"
> -#include "gt/intel_timeline.h"
>   #include "intel_engine_types.h"
> +#include "intel_gt_types.h"
> +#include "intel_timeline.h"
>   #include "intel_workarounds.h"
>   
>   struct drm_printer;
> @@ -262,6 +263,11 @@ void intel_engine_init_active(struct intel_engine_cs *engine,
>   #define ENGINE_MOCK	1
>   #define ENGINE_VIRTUAL	2
>   
> +static inline bool intel_engine_uses_guc(const struct intel_engine_cs *engine)
> +{
> +	return engine->gt->submission_method >= INTEL_SUBMISSION_GUC;
> +}
> +
>   static inline bool
>   intel_engine_has_preempt_reset(const struct intel_engine_cs *engine)
>   {
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> index 6dbdbde00f14..0618379b68ca 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> @@ -909,12 +909,16 @@ int intel_engines_init(struct intel_gt *gt)
>   	enum intel_engine_id id;
>   	int err;
>   
> -	if (intel_uc_uses_guc_submission(&gt->uc))
> +	if (intel_uc_uses_guc_submission(&gt->uc)) {
> +		gt->submission_method = INTEL_SUBMISSION_GUC;
>   		setup = intel_guc_submission_setup;
> -	else if (HAS_EXECLISTS(gt->i915))
> +	} else if (HAS_EXECLISTS(gt->i915)) {
> +		gt->submission_method = INTEL_SUBMISSION_ELSP;
>   		setup = intel_execlists_submission_setup;
> -	else
> +	} else {
> +		gt->submission_method = INTEL_SUBMISSION_RING;
>   		setup = intel_ring_submission_setup;
> +	}
>   
>   	for_each_engine(engine, gt, id) {
>   		err = engine_setup_common(engine);
> @@ -1479,7 +1483,7 @@ static void intel_engine_print_registers(struct intel_engine_cs *engine,
>   		drm_printf(m, "\tIPEHR: 0x%08x\n", ENGINE_READ(engine, IPEHR));
>   	}
>   
> -	if (intel_engine_in_guc_submission_mode(engine)) {
> +	if (intel_engine_uses_guc(engine)) {
>   		/* nothing to print yet */
>   	} else if (HAS_EXECLISTS(dev_priv)) {
>   		struct i915_request * const *port, *rq;
> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> index 1108c193ab65..9d2da5ccaef6 100644
> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> @@ -1768,7 +1768,6 @@ process_csb(struct intel_engine_cs *engine, struct i915_request **inactive)
>   	 */
>   	GEM_BUG_ON(!tasklet_is_locked(&execlists->tasklet) &&
>   		   !reset_in_progress(execlists));
> -	GEM_BUG_ON(!intel_engine_in_execlists_submission_mode(engine));
>   
>   	/*
>   	 * Note that csb_write, csb_status may be either in HWSP or mmio.
> @@ -3884,13 +3883,6 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine,
>   	spin_unlock_irqrestore(&engine->active.lock, flags);
>   }
>   
> -bool
> -intel_engine_in_execlists_submission_mode(const struct intel_engine_cs *engine)
> -{
> -	return engine->set_default_submission ==
> -	       execlists_set_default_submission;
> -}
> -
>   #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
>   #include "selftest_execlists.c"
>   #endif
> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.h b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
> index fd61dae820e9..4ca9b475e252 100644
> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
> @@ -43,7 +43,4 @@ int intel_virtual_engine_attach_bond(struct intel_engine_cs *engine,
>   				     const struct intel_engine_cs *master,
>   				     const struct intel_engine_cs *sibling);
>   
> -bool
> -intel_engine_in_execlists_submission_mode(const struct intel_engine_cs *engine);
> -
>   #endif /* __INTEL_EXECLISTS_SUBMISSION_H__ */
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h
> index 0caf6ca0a784..fecfacf551d5 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
> @@ -31,6 +31,12 @@ struct i915_ggtt;
>   struct intel_engine_cs;
>   struct intel_uncore;
>   
> +enum intel_submission_method {
> +	INTEL_SUBMISSION_RING,
> +	INTEL_SUBMISSION_ELSP,
> +	INTEL_SUBMISSION_GUC,
> +};
> +
>   struct intel_gt {
>   	struct drm_i915_private *i915;
>   	struct intel_uncore *uncore;
> @@ -118,6 +124,7 @@ struct intel_gt {
>   	struct intel_engine_cs *engine[I915_NUM_ENGINES];
>   	struct intel_engine_cs *engine_class[MAX_ENGINE_CLASS + 1]
>   					    [MAX_ENGINE_INSTANCE + 1];
> +	enum intel_submission_method submission_method;
>   
>   	/*
>   	 * Default address space (either GGTT or ppGTT depending on arch).
> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> index a377c4588aaa..d5094be6d90f 100644
> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> @@ -1118,7 +1118,6 @@ static int intel_gt_reset_engine(struct intel_engine_cs *engine)
>   int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)
>   {
>   	struct intel_gt *gt = engine->gt;
> -	bool uses_guc = intel_engine_in_guc_submission_mode(engine);
>   	int ret;
>   
>   	ENGINE_TRACE(engine, "flags=%lx\n", gt->reset.flags);
> @@ -1134,10 +1133,10 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)
>   			   "Resetting %s for %s\n", engine->name, msg);
>   	atomic_inc(&engine->i915->gpu_error.reset_engine_count[engine->uabi_class]);
>   
> -	if (!uses_guc)
> -		ret = intel_gt_reset_engine(engine);
> -	else
> +	if (intel_engine_uses_guc(engine))
>   		ret = intel_guc_reset_engine(&engine->gt->uc.guc, engine);
> +	else
> +		ret = intel_gt_reset_engine(engine);
>   	if (ret) {
>   		/* If we fail here, we expect to fallback to a global reset */
>   		ENGINE_TRACE(engine, "Failed to reset, err: %d\n", ret);
> diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c
> index 1081cd36a2bd..1f93591a8c69 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_execlists.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c
> @@ -4716,7 +4716,7 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
>   		SUBTEST(live_virtual_reset),
>   	};
>   
> -	if (!HAS_EXECLISTS(i915))
> +	if (i915->gt.submission_method != INTEL_SUBMISSION_ELSP)
>   		return 0;
>   
>   	if (intel_gt_is_wedged(&i915->gt))
> diff --git a/drivers/gpu/drm/i915/gt/selftest_ring_submission.c b/drivers/gpu/drm/i915/gt/selftest_ring_submission.c
> index 99609271c3a7..c12e74171b63 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_ring_submission.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_ring_submission.c
> @@ -291,7 +291,7 @@ int intel_ring_submission_live_selftests(struct drm_i915_private *i915)
>   		SUBTEST(live_ctx_switch_wa),
>   	};
>   
> -	if (HAS_EXECLISTS(i915))
> +	if (i915->gt.submission_method > INTEL_SUBMISSION_RING)
>   		return 0;
>   
>   	return intel_gt_live_subtests(tests, &i915->gt);
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index f72faa0b8339..17b551a0c89f 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -745,8 +745,3 @@ void intel_guc_submission_init_early(struct intel_guc *guc)
>   {
>   	guc->submission_selected = __guc_submission_selected(guc);
>   }
> -
> -bool intel_engine_in_guc_submission_mode(const struct intel_engine_cs *engine)
> -{
> -	return engine->set_default_submission == guc_set_default_submission;
> -}
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
> index 5f7b9e6347d0..3f7005018939 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
> @@ -20,7 +20,6 @@ void intel_guc_submission_fini(struct intel_guc *guc);
>   int intel_guc_preempt_work_create(struct intel_guc *guc);
>   void intel_guc_preempt_work_destroy(struct intel_guc *guc);
>   int intel_guc_submission_setup(struct intel_engine_cs *engine);
> -bool intel_engine_in_guc_submission_mode(const struct intel_engine_cs *engine);
>   
>   static inline bool intel_guc_submission_is_supported(struct intel_guc *guc)
>   {
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index 85ad62dbabfa..66f1f25119b5 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -1257,11 +1257,7 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
>   	case 8:
>   	case 9:
>   	case 10:
> -		if (intel_engine_in_execlists_submission_mode(ce->engine)) {
> -			stream->specific_ctx_id_mask =
> -				(1U << GEN8_CTX_ID_WIDTH) - 1;
> -			stream->specific_ctx_id = stream->specific_ctx_id_mask;
> -		} else {
> +		if (intel_engine_uses_guc(ce->engine)) {
>   			/*
>   			 * When using GuC, the context descriptor we write in
>   			 * i915 is read by GuC and rewritten before it's
> @@ -1280,6 +1276,10 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
>   			 */
>   			stream->specific_ctx_id_mask =
>   				(1U << (GEN8_CTX_ID_WIDTH - 1)) - 1;
> +		} else {
> +			stream->specific_ctx_id_mask =
> +				(1U << GEN8_CTX_ID_WIDTH) - 1;
> +			stream->specific_ctx_id = stream->specific_ctx_id_mask;
>   		}
>   		break;
>   
> 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 03/97] drm/i915/gt: Move CS interrupt handler to the backend
  2021-05-06 19:13 ` [RFC PATCH 03/97] drm/i915/gt: Move CS interrupt handler to the backend Matthew Brost
  2021-05-19  3:31   ` Matthew Brost
@ 2021-05-25  8:45   ` Tvrtko Ursulin
  1 sibling, 0 replies; 249+ messages in thread
From: Tvrtko Ursulin @ 2021-05-25  8:45 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel; +Cc: jason.ekstrand, daniel.vetter


On 06/05/2021 20:13, Matthew Brost wrote:
> From: Chris Wilson <chris@chris-wilson.co.uk>
> 
> The different submission backends each have their own preferred
> behaviour and interrupt setup. Let each handle their own interrupts.
> 
> This becomes more useful later as we extract the use of auxiliary
> state in the interrupt handler that is backend specific.
> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Same, this patch had my r-b already so I'll repeat it:

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

> ---
>   drivers/gpu/drm/i915/gt/intel_engine_cs.c     |  7 ++
>   drivers/gpu/drm/i915/gt/intel_engine_types.h  | 14 +---
>   .../drm/i915/gt/intel_execlists_submission.c  | 41 ++++++++++
>   drivers/gpu/drm/i915/gt/intel_gt_irq.c        | 82 ++++++-------------
>   drivers/gpu/drm/i915/gt/intel_gt_irq.h        | 23 ++++++
>   .../gpu/drm/i915/gt/intel_ring_submission.c   |  8 ++
>   drivers/gpu/drm/i915/gt/intel_rps.c           |  2 +-
>   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 11 ++-
>   drivers/gpu/drm/i915/i915_irq.c               | 10 ++-
>   9 files changed, 124 insertions(+), 74 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> index 0618379b68ca..828e1669f92c 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> @@ -255,6 +255,11 @@ static void intel_engine_sanitize_mmio(struct intel_engine_cs *engine)
>   	intel_engine_set_hwsp_writemask(engine, ~0u);
>   }
>   
> +static void nop_irq_handler(struct intel_engine_cs *engine, u16 iir)
> +{
> +	GEM_DEBUG_WARN_ON(iir);
> +}
> +
>   static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id)
>   {
>   	const struct engine_info *info = &intel_engines[id];
> @@ -292,6 +297,8 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id)
>   	engine->hw_id = info->hw_id;
>   	engine->guc_id = MAKE_GUC_ID(info->class, info->instance);
>   
> +	engine->irq_handler = nop_irq_handler;
> +
>   	engine->class = info->class;
>   	engine->instance = info->instance;
>   	__sprint_engine_name(engine);
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> index 883bafc44902..9ef349cd5cea 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> @@ -402,6 +402,7 @@ struct intel_engine_cs {
>   	u32		irq_enable_mask; /* bitmask to enable ring interrupt */
>   	void		(*irq_enable)(struct intel_engine_cs *engine);
>   	void		(*irq_disable)(struct intel_engine_cs *engine);
> +	void		(*irq_handler)(struct intel_engine_cs *engine, u16 iir);
>   
>   	void		(*sanitize)(struct intel_engine_cs *engine);
>   	int		(*resume)(struct intel_engine_cs *engine);
> @@ -481,10 +482,9 @@ struct intel_engine_cs {
>   #define I915_ENGINE_HAS_PREEMPTION   BIT(2)
>   #define I915_ENGINE_HAS_SEMAPHORES   BIT(3)
>   #define I915_ENGINE_HAS_TIMESLICES   BIT(4)
> -#define I915_ENGINE_NEEDS_BREADCRUMB_TASKLET BIT(5)
> -#define I915_ENGINE_IS_VIRTUAL       BIT(6)
> -#define I915_ENGINE_HAS_RELATIVE_MMIO BIT(7)
> -#define I915_ENGINE_REQUIRES_CMD_PARSER BIT(8)
> +#define I915_ENGINE_IS_VIRTUAL       BIT(5)
> +#define I915_ENGINE_HAS_RELATIVE_MMIO BIT(6)
> +#define I915_ENGINE_REQUIRES_CMD_PARSER BIT(7)
>   	unsigned int flags;
>   
>   	/*
> @@ -593,12 +593,6 @@ intel_engine_has_timeslices(const struct intel_engine_cs *engine)
>   	return engine->flags & I915_ENGINE_HAS_TIMESLICES;
>   }
>   
> -static inline bool
> -intel_engine_needs_breadcrumb_tasklet(const struct intel_engine_cs *engine)
> -{
> -	return engine->flags & I915_ENGINE_NEEDS_BREADCRUMB_TASKLET;
> -}
> -
>   static inline bool
>   intel_engine_is_virtual(const struct intel_engine_cs *engine)
>   {
> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> index 9d2da5ccaef6..8db200422950 100644
> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> @@ -118,6 +118,7 @@
>   #include "intel_engine_stats.h"
>   #include "intel_execlists_submission.h"
>   #include "intel_gt.h"
> +#include "intel_gt_irq.h"
>   #include "intel_gt_pm.h"
>   #include "intel_gt_requests.h"
>   #include "intel_lrc.h"
> @@ -2384,6 +2385,45 @@ static void execlists_submission_tasklet(struct tasklet_struct *t)
>   	rcu_read_unlock();
>   }
>   
> +static void execlists_irq_handler(struct intel_engine_cs *engine, u16 iir)
> +{
> +	bool tasklet = false;
> +
> +	if (unlikely(iir & GT_CS_MASTER_ERROR_INTERRUPT)) {
> +		u32 eir;
> +
> +		/* Upper 16b are the enabling mask, rsvd for internal errors */
> +		eir = ENGINE_READ(engine, RING_EIR) & GENMASK(15, 0);
> +		ENGINE_TRACE(engine, "CS error: %x\n", eir);
> +
> +		/* Disable the error interrupt until after the reset */
> +		if (likely(eir)) {
> +			ENGINE_WRITE(engine, RING_EMR, ~0u);
> +			ENGINE_WRITE(engine, RING_EIR, eir);
> +			WRITE_ONCE(engine->execlists.error_interrupt, eir);
> +			tasklet = true;
> +		}
> +	}
> +
> +	if (iir & GT_WAIT_SEMAPHORE_INTERRUPT) {
> +		WRITE_ONCE(engine->execlists.yield,
> +			   ENGINE_READ_FW(engine, RING_EXECLIST_STATUS_HI));
> +		ENGINE_TRACE(engine, "semaphore yield: %08x\n",
> +			     engine->execlists.yield);
> +		if (del_timer(&engine->execlists.timer))
> +			tasklet = true;
> +	}
> +
> +	if (iir & GT_CONTEXT_SWITCH_INTERRUPT)
> +		tasklet = true;
> +
> +	if (iir & GT_RENDER_USER_INTERRUPT)
> +		intel_engine_signal_breadcrumbs(engine);
> +
> +	if (tasklet)
> +		tasklet_hi_schedule(&engine->execlists.tasklet);
> +}
> +
>   static void __execlists_kick(struct intel_engine_execlists *execlists)
>   {
>   	/* Kick the tasklet for some interrupt coalescing and reset handling */
> @@ -3133,6 +3173,7 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
>   		 * until a more refined solution exists.
>   		 */
>   	}
> +	intel_engine_set_irq_handler(engine, execlists_irq_handler);
>   
>   	engine->flags |= I915_ENGINE_SUPPORTS_STATS;
>   	if (!intel_vgpu_active(engine->i915)) {
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_irq.c b/drivers/gpu/drm/i915/gt/intel_gt_irq.c
> index 9fc6c912a4e5..d29126c458ba 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_irq.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_irq.c
> @@ -20,48 +20,6 @@ static void guc_irq_handler(struct intel_guc *guc, u16 iir)
>   		intel_guc_to_host_event_handler(guc);
>   }
>   
> -static void
> -cs_irq_handler(struct intel_engine_cs *engine, u32 iir)
> -{
> -	bool tasklet = false;
> -
> -	if (unlikely(iir & GT_CS_MASTER_ERROR_INTERRUPT)) {
> -		u32 eir;
> -
> -		/* Upper 16b are the enabling mask, rsvd for internal errors */
> -		eir = ENGINE_READ(engine, RING_EIR) & GENMASK(15, 0);
> -		ENGINE_TRACE(engine, "CS error: %x\n", eir);
> -
> -		/* Disable the error interrupt until after the reset */
> -		if (likely(eir)) {
> -			ENGINE_WRITE(engine, RING_EMR, ~0u);
> -			ENGINE_WRITE(engine, RING_EIR, eir);
> -			WRITE_ONCE(engine->execlists.error_interrupt, eir);
> -			tasklet = true;
> -		}
> -	}
> -
> -	if (iir & GT_WAIT_SEMAPHORE_INTERRUPT) {
> -		WRITE_ONCE(engine->execlists.yield,
> -			   ENGINE_READ_FW(engine, RING_EXECLIST_STATUS_HI));
> -		ENGINE_TRACE(engine, "semaphore yield: %08x\n",
> -			     engine->execlists.yield);
> -		if (del_timer(&engine->execlists.timer))
> -			tasklet = true;
> -	}
> -
> -	if (iir & GT_CONTEXT_SWITCH_INTERRUPT)
> -		tasklet = true;
> -
> -	if (iir & GT_RENDER_USER_INTERRUPT) {
> -		intel_engine_signal_breadcrumbs(engine);
> -		tasklet |= intel_engine_needs_breadcrumb_tasklet(engine);
> -	}
> -
> -	if (tasklet)
> -		tasklet_hi_schedule(&engine->execlists.tasklet);
> -}
> -
>   static u32
>   gen11_gt_engine_identity(struct intel_gt *gt,
>   			 const unsigned int bank, const unsigned int bit)
> @@ -122,7 +80,7 @@ gen11_engine_irq_handler(struct intel_gt *gt, const u8 class,
>   		engine = NULL;
>   
>   	if (likely(engine))
> -		return cs_irq_handler(engine, iir);
> +		return intel_engine_cs_irq(engine, iir);
>   
>   	WARN_ONCE(1, "unhandled engine interrupt class=0x%x, instance=0x%x\n",
>   		  class, instance);
> @@ -275,9 +233,12 @@ void gen11_gt_irq_postinstall(struct intel_gt *gt)
>   void gen5_gt_irq_handler(struct intel_gt *gt, u32 gt_iir)
>   {
>   	if (gt_iir & GT_RENDER_USER_INTERRUPT)
> -		intel_engine_signal_breadcrumbs(gt->engine_class[RENDER_CLASS][0]);
> +		intel_engine_cs_irq(gt->engine_class[RENDER_CLASS][0],
> +				    gt_iir);
> +
>   	if (gt_iir & ILK_BSD_USER_INTERRUPT)
> -		intel_engine_signal_breadcrumbs(gt->engine_class[VIDEO_DECODE_CLASS][0]);
> +		intel_engine_cs_irq(gt->engine_class[VIDEO_DECODE_CLASS][0],
> +				    gt_iir);
>   }
>   
>   static void gen7_parity_error_irq_handler(struct intel_gt *gt, u32 iir)
> @@ -301,11 +262,16 @@ static void gen7_parity_error_irq_handler(struct intel_gt *gt, u32 iir)
>   void gen6_gt_irq_handler(struct intel_gt *gt, u32 gt_iir)
>   {
>   	if (gt_iir & GT_RENDER_USER_INTERRUPT)
> -		intel_engine_signal_breadcrumbs(gt->engine_class[RENDER_CLASS][0]);
> +		intel_engine_cs_irq(gt->engine_class[RENDER_CLASS][0],
> +				    gt_iir);
> +
>   	if (gt_iir & GT_BSD_USER_INTERRUPT)
> -		intel_engine_signal_breadcrumbs(gt->engine_class[VIDEO_DECODE_CLASS][0]);
> +		intel_engine_cs_irq(gt->engine_class[VIDEO_DECODE_CLASS][0],
> +				    gt_iir >> 12);
> +
>   	if (gt_iir & GT_BLT_USER_INTERRUPT)
> -		intel_engine_signal_breadcrumbs(gt->engine_class[COPY_ENGINE_CLASS][0]);
> +		intel_engine_cs_irq(gt->engine_class[COPY_ENGINE_CLASS][0],
> +				    gt_iir >> 22);
>   
>   	if (gt_iir & (GT_BLT_CS_ERROR_INTERRUPT |
>   		      GT_BSD_CS_ERROR_INTERRUPT |
> @@ -324,10 +290,10 @@ void gen8_gt_irq_handler(struct intel_gt *gt, u32 master_ctl)
>   	if (master_ctl & (GEN8_GT_RCS_IRQ | GEN8_GT_BCS_IRQ)) {
>   		iir = raw_reg_read(regs, GEN8_GT_IIR(0));
>   		if (likely(iir)) {
> -			cs_irq_handler(gt->engine_class[RENDER_CLASS][0],
> -				       iir >> GEN8_RCS_IRQ_SHIFT);
> -			cs_irq_handler(gt->engine_class[COPY_ENGINE_CLASS][0],
> -				       iir >> GEN8_BCS_IRQ_SHIFT);
> +			intel_engine_cs_irq(gt->engine_class[RENDER_CLASS][0],
> +					    iir >> GEN8_RCS_IRQ_SHIFT);
> +			intel_engine_cs_irq(gt->engine_class[COPY_ENGINE_CLASS][0],
> +					    iir >> GEN8_BCS_IRQ_SHIFT);
>   			raw_reg_write(regs, GEN8_GT_IIR(0), iir);
>   		}
>   	}
> @@ -335,10 +301,10 @@ void gen8_gt_irq_handler(struct intel_gt *gt, u32 master_ctl)
>   	if (master_ctl & (GEN8_GT_VCS0_IRQ | GEN8_GT_VCS1_IRQ)) {
>   		iir = raw_reg_read(regs, GEN8_GT_IIR(1));
>   		if (likely(iir)) {
> -			cs_irq_handler(gt->engine_class[VIDEO_DECODE_CLASS][0],
> -				       iir >> GEN8_VCS0_IRQ_SHIFT);
> -			cs_irq_handler(gt->engine_class[VIDEO_DECODE_CLASS][1],
> -				       iir >> GEN8_VCS1_IRQ_SHIFT);
> +			intel_engine_cs_irq(gt->engine_class[VIDEO_DECODE_CLASS][0],
> +					    iir >> GEN8_VCS0_IRQ_SHIFT);
> +			intel_engine_cs_irq(gt->engine_class[VIDEO_DECODE_CLASS][1],
> +					    iir >> GEN8_VCS1_IRQ_SHIFT);
>   			raw_reg_write(regs, GEN8_GT_IIR(1), iir);
>   		}
>   	}
> @@ -346,8 +312,8 @@ void gen8_gt_irq_handler(struct intel_gt *gt, u32 master_ctl)
>   	if (master_ctl & GEN8_GT_VECS_IRQ) {
>   		iir = raw_reg_read(regs, GEN8_GT_IIR(3));
>   		if (likely(iir)) {
> -			cs_irq_handler(gt->engine_class[VIDEO_ENHANCEMENT_CLASS][0],
> -				       iir >> GEN8_VECS_IRQ_SHIFT);
> +			intel_engine_cs_irq(gt->engine_class[VIDEO_ENHANCEMENT_CLASS][0],
> +					    iir >> GEN8_VECS_IRQ_SHIFT);
>   			raw_reg_write(regs, GEN8_GT_IIR(3), iir);
>   		}
>   	}
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_irq.h b/drivers/gpu/drm/i915/gt/intel_gt_irq.h
> index f667e976fb2b..41cad38668c5 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_irq.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_irq.h
> @@ -8,6 +8,8 @@
>   
>   #include <linux/types.h>
>   
> +#include "intel_engine_types.h"
> +
>   struct intel_gt;
>   
>   #define GEN8_GT_IRQS (GEN8_GT_RCS_IRQ | \
> @@ -39,4 +41,25 @@ void gen8_gt_irq_handler(struct intel_gt *gt, u32 master_ctl);
>   void gen8_gt_irq_reset(struct intel_gt *gt);
>   void gen8_gt_irq_postinstall(struct intel_gt *gt);
>   
> +static inline void intel_engine_cs_irq(struct intel_engine_cs *engine, u16 iir)
> +{
> +	if (iir)
> +		engine->irq_handler(engine, iir);
> +}
> +
> +static inline void
> +intel_engine_set_irq_handler(struct intel_engine_cs *engine,
> +			     void (*fn)(struct intel_engine_cs *engine,
> +					u16 iir))
> +{
> +	/*
> +	 * As the interrupt is live as allocate and setup the engines,
> +	 * err on the side of caution and apply barriers to updating
> +	 * the irq handler callback. This assures that when we do use
> +	 * the engine, we will receive interrupts only to ourselves,
> +	 * and not lose any.
> +	 */
> +	smp_store_mb(engine->irq_handler, fn);
> +}
> +
>   #endif /* INTEL_GT_IRQ_H */
> diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> index 5f4f7f1df48f..2b6dffcc2262 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> @@ -12,6 +12,7 @@
>   #include "intel_breadcrumbs.h"
>   #include "intel_context.h"
>   #include "intel_gt.h"
> +#include "intel_gt_irq.h"
>   #include "intel_reset.h"
>   #include "intel_ring.h"
>   #include "shmem_utils.h"
> @@ -1017,10 +1018,17 @@ static void ring_release(struct intel_engine_cs *engine)
>   	intel_timeline_put(engine->legacy.timeline);
>   }
>   
> +static void irq_handler(struct intel_engine_cs *engine, u16 iir)
> +{
> +	intel_engine_signal_breadcrumbs(engine);
> +}
> +
>   static void setup_irq(struct intel_engine_cs *engine)
>   {
>   	struct drm_i915_private *i915 = engine->i915;
>   
> +	intel_engine_set_irq_handler(engine, irq_handler);
> +
>   	if (INTEL_GEN(i915) >= 6) {
>   		engine->irq_enable = gen6_irq_enable;
>   		engine->irq_disable = gen6_irq_disable;
> diff --git a/drivers/gpu/drm/i915/gt/intel_rps.c b/drivers/gpu/drm/i915/gt/intel_rps.c
> index 405d814e9040..97cab1b99871 100644
> --- a/drivers/gpu/drm/i915/gt/intel_rps.c
> +++ b/drivers/gpu/drm/i915/gt/intel_rps.c
> @@ -1774,7 +1774,7 @@ void gen6_rps_irq_handler(struct intel_rps *rps, u32 pm_iir)
>   		return;
>   
>   	if (pm_iir & PM_VEBOX_USER_INTERRUPT)
> -		intel_engine_signal_breadcrumbs(gt->engine[VECS0]);
> +		intel_engine_cs_irq(gt->engine[VECS0], pm_iir >> 10);
>   
>   	if (pm_iir & PM_VEBOX_CS_ERROR_INTERRUPT)
>   		DRM_DEBUG("Command parser error, pm_iir 0x%08x\n", pm_iir);
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 17b551a0c89f..335719f17490 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -11,6 +11,7 @@
>   #include "gt/intel_context.h"
>   #include "gt/intel_engine_pm.h"
>   #include "gt/intel_gt.h"
> +#include "gt/intel_gt_irq.h"
>   #include "gt/intel_gt_pm.h"
>   #include "gt/intel_lrc.h"
>   #include "gt/intel_mocs.h"
> @@ -264,6 +265,14 @@ static void guc_submission_tasklet(struct tasklet_struct *t)
>   	spin_unlock_irqrestore(&engine->active.lock, flags);
>   }
>   
> +static void cs_irq_handler(struct intel_engine_cs *engine, u16 iir)
> +{
> +	if (iir & GT_RENDER_USER_INTERRUPT) {
> +		intel_engine_signal_breadcrumbs(engine);
> +		tasklet_hi_schedule(&engine->execlists.tasklet);
> +	}
> +}
> +
>   static void guc_reset_prepare(struct intel_engine_cs *engine)
>   {
>   	struct intel_engine_execlists * const execlists = &engine->execlists;
> @@ -645,7 +654,6 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
>   	}
>   	engine->set_default_submission = guc_set_default_submission;
>   
> -	engine->flags |= I915_ENGINE_NEEDS_BREADCRUMB_TASKLET;
>   	engine->flags |= I915_ENGINE_HAS_PREEMPTION;
>   
>   	/*
> @@ -681,6 +689,7 @@ static void rcs_submission_override(struct intel_engine_cs *engine)
>   static inline void guc_default_irqs(struct intel_engine_cs *engine)
>   {
>   	engine->irq_keep_mask = GT_RENDER_USER_INTERRUPT;
> +	intel_engine_set_irq_handler(engine, cs_irq_handler);
>   }
>   
>   int intel_guc_submission_setup(struct intel_engine_cs *engine)
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index f6967a93ec7a..d58118806299 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -4014,7 +4014,7 @@ static irqreturn_t i8xx_irq_handler(int irq, void *arg)
>   		intel_uncore_write16(&dev_priv->uncore, GEN2_IIR, iir);
>   
>   		if (iir & I915_USER_INTERRUPT)
> -			intel_engine_signal_breadcrumbs(dev_priv->gt.engine[RCS0]);
> +			intel_engine_cs_irq(dev_priv->gt.engine[RCS0], iir);
>   
>   		if (iir & I915_MASTER_ERROR_INTERRUPT)
>   			i8xx_error_irq_handler(dev_priv, eir, eir_stuck);
> @@ -4122,7 +4122,7 @@ static irqreturn_t i915_irq_handler(int irq, void *arg)
>   		intel_uncore_write(&dev_priv->uncore, GEN2_IIR, iir);
>   
>   		if (iir & I915_USER_INTERRUPT)
> -			intel_engine_signal_breadcrumbs(dev_priv->gt.engine[RCS0]);
> +			intel_engine_cs_irq(dev_priv->gt.engine[RCS0], iir);
>   
>   		if (iir & I915_MASTER_ERROR_INTERRUPT)
>   			i9xx_error_irq_handler(dev_priv, eir, eir_stuck);
> @@ -4267,10 +4267,12 @@ static irqreturn_t i965_irq_handler(int irq, void *arg)
>   		intel_uncore_write(&dev_priv->uncore, GEN2_IIR, iir);
>   
>   		if (iir & I915_USER_INTERRUPT)
> -			intel_engine_signal_breadcrumbs(dev_priv->gt.engine[RCS0]);
> +			intel_engine_cs_irq(dev_priv->gt.engine[RCS0],
> +					    iir);
>   
>   		if (iir & I915_BSD_USER_INTERRUPT)
> -			intel_engine_signal_breadcrumbs(dev_priv->gt.engine[VCS0]);
> +			intel_engine_cs_irq(dev_priv->gt.engine[VCS0],
> +					    iir >> 25);
>   
>   		if (iir & I915_MASTER_ERROR_INTERRUPT)
>   			i9xx_error_irq_handler(dev_priv, eir, eir_stuck);
> 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 33/97] drm/i915: Engine relative MMIO
  2021-05-06 19:13 ` [RFC PATCH 33/97] drm/i915: Engine relative MMIO Matthew Brost
@ 2021-05-25  9:05   ` Tvrtko Ursulin
  0 siblings, 0 replies; 249+ messages in thread
From: Tvrtko Ursulin @ 2021-05-25  9:05 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel; +Cc: jason.ekstrand, daniel.vetter


On 06/05/2021 20:13, Matthew Brost wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> With virtual engines, it is no longer possible to know which specific
> physical engine a given request will be executed on at the time that
> request is generated. This means that the request itself must be engine
> agnostic - any direct register writes must be relative to the engine
> and not absolute addresses.
> 
> The LRI command has support for engine relative addressing. However,
> the mechanism is not transparent to the driver. The scheme for Gen11
> (MI_LRI_ADD_CS_MMIO_START) requires the LRI address to have no
> absolute engine base component. The hardware then adds on the correct
> engine offset at execution time.
> 
> Due to the non-trivial and differing schemes on different hardware, it
> is not possible to simply update the code that creates the LRI
> commands to set a remap flag and let the hardware get on with it.
> Instead, this patch adds function wrappers for generating the LRI
> command itself and then for constructing the correct address to use
> with the LRI.
> 
> Bspec: 45606
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> CC: Rodrigo Vivi <rodrigo.vivi@intel.com>
> CC: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> CC: Chris P Wilson <chris.p.wilson@intel.com>
> CC: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> ---
>   drivers/gpu/drm/i915/gem/i915_gem_context.c  |  7 +++---
>   drivers/gpu/drm/i915/gt/intel_engine_cs.c    | 25 ++++++++++++++++++++
>   drivers/gpu/drm/i915/gt/intel_engine_types.h |  3 +++
>   drivers/gpu/drm/i915/gt/intel_gpu_commands.h |  5 ++++
>   drivers/gpu/drm/i915/i915_perf.c             |  6 +++++
>   5 files changed, 43 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> index 188dee13e017..993faa213b41 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> @@ -1211,7 +1211,7 @@ static int emit_ppgtt_update(struct i915_request *rq, void *data)
>   {
>   	struct i915_address_space *vm = rq->context->vm;
>   	struct intel_engine_cs *engine = rq->engine;
> -	u32 base = engine->mmio_base;
> +	u32 base = engine->lri_mmio_base;
>   	u32 *cs;
>   	int i;
>   
> @@ -1223,7 +1223,7 @@ static int emit_ppgtt_update(struct i915_request *rq, void *data)
>   		if (IS_ERR(cs))
>   			return PTR_ERR(cs);
>   
> -		*cs++ = MI_LOAD_REGISTER_IMM(2);
> +		*cs++ = MI_LOAD_REGISTER_IMM(2) | engine->lri_cmd_mode;

Would a helper like MI_LOAD_REGISTER_IMM_REL(engine, n) look better?
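
Something along these lines perhaps (untested sketch, the name and exact
shape are only a suggestion; it just wraps the lri_cmd_mode field this
patch already adds):

	/* pick relative vs. absolute LRI encoding based on the engine */
	#define MI_LOAD_REGISTER_IMM_REL(engine, x) \
		(MI_LOAD_REGISTER_IMM(x) | (engine)->lri_cmd_mode)

Then the call sites in emit_ppgtt_update() would read
*cs++ = MI_LOAD_REGISTER_IMM_REL(engine, 2); and the relative/absolute
decision stays hidden behind one macro.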

>   
>   		*cs++ = i915_mmio_reg_offset(GEN8_RING_PDP_UDW(base, 0));
>   		*cs++ = upper_32_bits(pd_daddr);
> @@ -1245,7 +1245,8 @@ static int emit_ppgtt_update(struct i915_request *rq, void *data)
>   		if (IS_ERR(cs))
>   			return PTR_ERR(cs);
>   
> -		*cs++ = MI_LOAD_REGISTER_IMM(2 * GEN8_3LVL_PDPES) | MI_LRI_FORCE_POSTED;
> +		*cs++ = MI_LOAD_REGISTER_IMM(2 * GEN8_3LVL_PDPES) |
> +			MI_LRI_FORCE_POSTED | engine->lri_cmd_mode;
>   		for (i = GEN8_3LVL_PDPES; i--; ) {
>   			const dma_addr_t pd_daddr = i915_page_dir_dma_addr(ppgtt, i);
>   
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> index ec82a7ec0c8d..c88b792c1ab5 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> @@ -16,6 +16,7 @@
>   #include "intel_engine_pm.h"
>   #include "intel_engine_user.h"
>   #include "intel_execlists_submission.h"
> +#include "intel_gpu_commands.h"
>   #include "intel_gt.h"
>   #include "intel_gt_requests.h"
>   #include "intel_gt_pm.h"
> @@ -223,6 +224,28 @@ static u32 __engine_mmio_base(struct drm_i915_private *i915,
>   	return bases[i].base;
>   }
>   
> +static bool i915_engine_has_relative_lri(const struct intel_engine_cs *engine)
> +{
> +	if (INTEL_GEN(engine->i915) < 11)
> +		return false;
> +
> +	if (engine->class == COPY_ENGINE_CLASS)
> +		return false;
> +
> +	return true;
> +}
> +
> +static void lri_init(struct intel_engine_cs *engine)
> +{
> +	if (i915_engine_has_relative_lri(engine)) {
> +		engine->lri_cmd_mode = MI_LRI_LRM_CS_MMIO;
> +		engine->lri_mmio_base = 0;
> +	} else {
> +		engine->lri_cmd_mode = 0;
> +		engine->lri_mmio_base = engine->mmio_base;
> +	}
> +}
> +
>   static void __sprint_engine_name(struct intel_engine_cs *engine)
>   {
>   	/*
> @@ -327,6 +350,8 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id)
>   	if (engine->context_size)
>   		DRIVER_CAPS(i915)->has_logical_contexts = true;
>   
> +	lri_init(engine);
> +
>   	ewma__engine_latency_init(&engine->latency);
>   	seqcount_init(&engine->stats.lock);
>   
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> index 93aa22680db0..86302e6d86b2 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> @@ -281,6 +281,9 @@ struct intel_engine_cs {
>   	u32 context_size;
>   	u32 mmio_base;
>   
> +	u32 lri_mmio_base;
> +	u32 lri_cmd_mode;
> +
>   	/*
>   	 * Some w/a require forcewake to be held (which prevents RC6) while
>   	 * a particular engine is active. If so, we set fw_domain to which
> diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> index 14e2ffb6c0e5..887d59897bc2 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> @@ -134,6 +134,11 @@
>    *   simply ignores the register load under certain conditions.
>    * - One can actually load arbitrary many arbitrary registers: Simply issue x
>    *   address/value pairs. Don't overdue it, though, x <= 2^4 must hold!
> + * - Newer hardware supports engine relative addressing but older hardware does
> + *   not. This is required for hw engine load balancing. Hence the MI_LRI
> + *   instruction itself is prefixed with '__' and should only be used on
> + *   legacy hardware code paths. Generic code must always use the MI_LRI
> + *   and i915_get_lri_reg() helper functions instead.

Stale comment.

>    */
>   #define MI_LOAD_REGISTER_IMM(x)	MI_INSTR(0x22, 2*(x)-1)
>   /* Gen11+. addr = base + (ctx_restore ? offset & GENMASK(12,2) : offset) */
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index 66f1f25119b5..b9cc3f0a616f 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -2118,6 +2118,11 @@ gen8_update_reg_state_unlocked(const struct intel_context *ce,
>   	u32 *reg_state = ce->lrc_reg_state;
>   	int i;
>   
> +	/*
> +	 * NB: The LRI instruction is generated by the hardware.
> +	 * Should we read it in and assert that the offset flag is set?
> +	 */
> +
>   	reg_state[ctx_oactxctrl + 1] =
>   		(stream->period_exponent << GEN8_OA_TIMER_PERIOD_SHIFT) |
>   		(stream->periodic ? GEN8_OA_TIMER_ENABLE : 0) |
> @@ -2174,6 +2179,7 @@ gen8_load_flex(struct i915_request *rq,
>   
>   	*cs++ = MI_LOAD_REGISTER_IMM(count);
>   	do {
> +		/* FIXME: Is this table LRI remap/offset friendly? */
>   		*cs++ = i915_mmio_reg_offset(flex->reg);
>   		*cs++ = flex->value;
>   	} while (flex++, --count);
> 

NB and FIXME would ideally be resolved before merging.

Regards,

Tvrtko



^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 36/97] drm/i915/guc: Add non blocking CTB send function
  2021-05-06 19:13 ` [RFC PATCH 36/97] drm/i915/guc: Add non blocking CTB send function Matthew Brost
  2021-05-24 12:21   ` Michal Wajdeczko
@ 2021-05-25  9:21   ` Tvrtko Ursulin
  2021-05-25 17:21     ` Matthew Brost
  1 sibling, 1 reply; 249+ messages in thread
From: Tvrtko Ursulin @ 2021-05-25  9:21 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel; +Cc: jason.ekstrand, daniel.vetter


On 06/05/2021 20:13, Matthew Brost wrote:
> Add non blocking CTB send function, intel_guc_send_nb. In order to
> support a non blocking CTB send function a spin lock is needed to
> protect the CTB descriptors fields. Also the non blocking call must not
> update the fence value as this value is owned by the blocking call
> (intel_guc_send).

Could the commit message say why the non-blocking send function is needed?

> 
> The blocking CTB now must have a flow control mechanism to ensure the
> buffer isn't overrun. A lazy spin wait is used as we believe the flow
> control condition should be rare with a properly sized buffer.
> 
> The function, intel_guc_send_nb, is exported in this patch but unused.
> Several patches later in the series make use of this function.
> 
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/uc/intel_guc.h    | 12 ++-
>   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 96 +++++++++++++++++++++--
>   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  7 +-
>   3 files changed, 105 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> index c20f3839de12..4c0a367e41d8 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> @@ -75,7 +75,15 @@ static inline struct intel_guc *log_to_guc(struct intel_guc_log *log)
>   static
>   inline int intel_guc_send(struct intel_guc *guc, const u32 *action, u32 len)
>   {
> -	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0);
> +	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0, 0);
> +}
> +
> +#define INTEL_GUC_SEND_NB		BIT(31)
> +static
> +inline int intel_guc_send_nb(struct intel_guc *guc, const u32 *action, u32 len)
> +{
> +	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0,
> +				 INTEL_GUC_SEND_NB);
>   }
>   
>   static inline int
> @@ -83,7 +91,7 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len,
>   			   u32 *response_buf, u32 response_buf_size)
>   {
>   	return intel_guc_ct_send(&guc->ct, action, len,
> -				 response_buf, response_buf_size);
> +				 response_buf, response_buf_size, 0);
>   }
>   
>   static inline void intel_guc_to_host_event_handler(struct intel_guc *guc)
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index a76603537fa8..af7314d45a78 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -3,6 +3,11 @@
>    * Copyright © 2016-2019 Intel Corporation
>    */
>   
> +#include <linux/circ_buf.h>
> +#include <linux/ktime.h>
> +#include <linux/time64.h>
> +#include <linux/timekeeping.h>
> +
>   #include "i915_drv.h"
>   #include "intel_guc_ct.h"
>   #include "gt/intel_gt.h"
> @@ -308,6 +313,7 @@ int intel_guc_ct_enable(struct intel_guc_ct *ct)
>   	if (unlikely(err))
>   		goto err_deregister;
>   
> +	ct->requests.last_fence = 1;
>   	ct->enabled = true;
>   
>   	return 0;
> @@ -343,10 +349,22 @@ static u32 ct_get_next_fence(struct intel_guc_ct *ct)
>   	return ++ct->requests.last_fence;
>   }
>   
> +static void write_barrier(struct intel_guc_ct *ct) {
> +	struct intel_guc *guc = ct_to_guc(ct);
> +	struct intel_gt *gt = guc_to_gt(guc);
> +
> +	if (i915_gem_object_is_lmem(guc->ct.vma->obj)) {
> +		GEM_BUG_ON(guc->send_regs.fw_domains);
> +		intel_uncore_write_fw(gt->uncore, GEN11_SOFT_SCRATCH(0), 0);

Is it safe to write to this reg? Does it need a comment to explain why?

> +	} else {
> +		wmb();
> +	}
> +}
> +
>   static int ct_write(struct intel_guc_ct *ct,
>   		    const u32 *action,
>   		    u32 len /* in dwords */,
> -		    u32 fence)
> +		    u32 fence, u32 flags)
>   {
>   	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>   	struct guc_ct_buffer_desc *desc = ctb->desc;
> @@ -393,9 +411,13 @@ static int ct_write(struct intel_guc_ct *ct,
>   		 FIELD_PREP(GUC_CTB_MSG_0_NUM_DWORDS, len) |
>   		 FIELD_PREP(GUC_CTB_MSG_0_FENCE, fence);
>   
> -	hxg = FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
> -	      FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
> -			 GUC_HXG_REQUEST_MSG_0_DATA0, action[0]);
> +	hxg = (flags & INTEL_GUC_SEND_NB) ?
> +		(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_EVENT) |
> +		 FIELD_PREP(GUC_HXG_EVENT_MSG_0_ACTION |
> +			    GUC_HXG_EVENT_MSG_0_DATA0, action[0])) :
> +		(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
> +		 FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
> +			    GUC_HXG_REQUEST_MSG_0_DATA0, action[0]));
>   
>   	CT_DEBUG(ct, "writing (tail %u) %*ph %*ph %*ph\n",
>   		 tail, 4, &header, 4, &hxg, 4 * (len - 1), &action[1]);
> @@ -412,6 +434,12 @@ static int ct_write(struct intel_guc_ct *ct,
>   	}
>   	GEM_BUG_ON(tail > size);
>   
> +	/*
> +	 * make sure the H2G buffer update and LRC tail update (if this is
> +	 * triggering a submission) are visible before updating the descriptor tail
> +	 */
> +	write_barrier(ct);
> +
>   	/* now update descriptor */
>   	WRITE_ONCE(desc->tail, tail);
>   
> @@ -466,6 +494,46 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
>   	return err;
>   }
>   
> +static inline bool ctb_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
> +{
> +	struct guc_ct_buffer_desc *desc = ctb->desc;
> +	u32 head = READ_ONCE(desc->head);
> +	u32 space;
> +
> +	space = CIRC_SPACE(desc->tail, head, ctb->size);
> +
> +	return space >= len_dw;
> +}
> +
> +static int ct_send_nb(struct intel_guc_ct *ct,
> +		      const u32 *action,
> +		      u32 len,
> +		      u32 flags)
> +{
> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> +	unsigned long spin_flags;
> +	u32 fence;
> +	int ret;
> +
> +	spin_lock_irqsave(&ctb->lock, spin_flags);
> +
> +	ret = ctb_has_room(ctb, len + 1);
> +	if (unlikely(ret))
> +		goto out;
> +
> +	fence = ct_get_next_fence(ct);
> +	ret = ct_write(ct, action, len, fence, flags);
> +	if (unlikely(ret))
> +		goto out;
> +
> +	intel_guc_notify(ct_to_guc(ct));
> +
> +out:
> +	spin_unlock_irqrestore(&ctb->lock, spin_flags);
> +
> +	return ret;
> +}
> +
>   static int ct_send(struct intel_guc_ct *ct,
>   		   const u32 *action,
>   		   u32 len,
> @@ -473,6 +541,7 @@ static int ct_send(struct intel_guc_ct *ct,
>   		   u32 response_buf_size,
>   		   u32 *status)
>   {
> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>   	struct ct_request request;
>   	unsigned long flags;
>   	u32 fence;
> @@ -482,8 +551,20 @@ static int ct_send(struct intel_guc_ct *ct,
>   	GEM_BUG_ON(!len);
>   	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
>   	GEM_BUG_ON(!response_buf && response_buf_size);
> +	might_sleep();

Is the sleep just the cond_resched below, or is there more to it?

>   
> +	/*
> +	 * We use a lazy spin wait loop here as we believe that if the CT
> +	 * buffers are sized correctly the flow control condition should be
> +	 * rare.
> +	 */
> +retry:
>   	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
> +	if (unlikely(!ctb_has_room(ctb, len + 1))) {
> +		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> +		cond_resched();
> +		goto retry;
> +	}

If this patch is about adding a non-blocking send function, and below we 
can see that it creates a fork:

intel_guc_ct_send:
...
	if (flags & INTEL_GUC_SEND_NB)
		return ct_send_nb(ct, action, len, flags);

  	ret = ct_send(ct, action, len, response_buf, response_buf_size, &status);

Then why is there a change in ct_send here, which is not the new 
non-blocking path?

>   
>   	fence = ct_get_next_fence(ct);
>   	request.fence = fence;
> @@ -495,7 +576,7 @@ static int ct_send(struct intel_guc_ct *ct,
>   	list_add_tail(&request.link, &ct->requests.pending);
>   	spin_unlock(&ct->requests.lock);
>   
> -	err = ct_write(ct, action, len, fence);
> +	err = ct_write(ct, action, len, fence, 0);
>   
>   	spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
>   
> @@ -537,7 +618,7 @@ static int ct_send(struct intel_guc_ct *ct,
>    * Command Transport (CT) buffer based GuC send function.
>    */
>   int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
> -		      u32 *response_buf, u32 response_buf_size)
> +		      u32 *response_buf, u32 response_buf_size, u32 flags)
>   {
>   	u32 status = ~0; /* undefined */
>   	int ret;
> @@ -547,6 +628,9 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
>   		return -ENODEV;
>   	}
>   
> +	if (flags & INTEL_GUC_SEND_NB)
> +		return ct_send_nb(ct, action, len, flags);
> +
>   	ret = ct_send(ct, action, len, response_buf, response_buf_size, &status);
>   	if (unlikely(ret < 0)) {
>   		CT_ERROR(ct, "Sending action %#x failed (err=%d status=%#X)\n",
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> index 1ae2dde6db93..55ef7c52472f 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> @@ -9,6 +9,7 @@
>   #include <linux/interrupt.h>
>   #include <linux/spinlock.h>
>   #include <linux/workqueue.h>
> +#include <linux/ktime.h>
>   
>   #include "intel_guc_fwif.h"
>   
> @@ -42,7 +43,6 @@ struct intel_guc_ct_buffer {
>   	bool broken;
>   };
>   
> -
>   /** Top-level structure for Command Transport related data
>    *
>    * Includes a pair of CT buffers for bi-directional communication and tracking
> @@ -69,6 +69,9 @@ struct intel_guc_ct {
>   		struct list_head incoming; /* incoming requests */
>   		struct work_struct worker; /* handler for incoming requests */
>   	} requests;
> +
> +	/** @stall_time: time at which a CTB submission first stalled */
> +	ktime_t stall_time;

Unused in this patch.

>   };
>   
>   void intel_guc_ct_init_early(struct intel_guc_ct *ct);
> @@ -88,7 +91,7 @@ static inline bool intel_guc_ct_enabled(struct intel_guc_ct *ct)
>   }
>   
>   int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
> -		      u32 *response_buf, u32 response_buf_size);
> +		      u32 *response_buf, u32 response_buf_size, u32 flags);
>   void intel_guc_ct_event_handler(struct intel_guc_ct *ct);
>   
>   #endif /* _INTEL_GUC_CT_H_ */
> 

Regards,

Tvrtko


* Re: [Intel-gfx] [RFC PATCH 39/97] drm/i915/guc: Increase size of CTB buffers
  2021-05-06 19:13 ` [RFC PATCH 39/97] drm/i915/guc: Increase size of CTB buffers Matthew Brost
  2021-05-24 13:43   ` [Intel-gfx] " Michal Wajdeczko
@ 2021-05-25  9:24   ` Tvrtko Ursulin
  2021-05-25 17:15     ` Matthew Brost
  1 sibling, 1 reply; 249+ messages in thread
From: Tvrtko Ursulin @ 2021-05-25  9:24 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel; +Cc: jason.ekstrand, daniel.vetter


On 06/05/2021 20:13, Matthew Brost wrote:
> With the introduction of non-blocking CTBs more than one CTB can be in
> flight at a time. Increasing the size of the CTBs should reduce how
> often software hits the case where no space is available in the CTB
> buffer.

I'd move this before the patch which adds the non-blocking send, since 
that one claims congestion should be rare with properly sized buffers. 
So it makes sense to have the buffers properly sized already by the time 
that patch lands.

Regards,

Tvrtko

> Cc: John Harrison <john.c.harrison@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 11 ++++++++---
>   1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index 77dfbc94dcc3..d6895d29ed2d 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -63,11 +63,16 @@ static inline struct drm_device *ct_to_drm(struct intel_guc_ct *ct)
>    *      +--------+-----------------------------------------------+------+
>    *
>    * Size of each `CT Buffer`_ must be multiple of 4K.
> - * As we don't expect too many messages, for now use minimum sizes.
> + * We don't expect too many messages in flight at any time, unless we are
> + * using GuC submission. In that case each request requires a minimum of
> + * 16 bytes, which gives us a maximum of 256 queued requests. Hopefully this
> + * is enough space to avoid backpressure on the driver. We increase the size
> + * of the receive buffer (relative to the send) to ensure a G2H response
> + * CTB has a landing spot.
>    */
>   #define CTB_DESC_SIZE		ALIGN(sizeof(struct guc_ct_buffer_desc), SZ_2K)
>   #define CTB_H2G_BUFFER_SIZE	(SZ_4K)
> -#define CTB_G2H_BUFFER_SIZE	(SZ_4K)
> +#define CTB_G2H_BUFFER_SIZE	(4 * CTB_H2G_BUFFER_SIZE)
>   
>   #define MAX_US_STALL_CTB	1000000
>   
> @@ -753,7 +758,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>   	/* beware of buffer wrap case */
>   	if (unlikely(available < 0))
>   		available += size;
> -	CT_DEBUG(ct, "available %d (%u:%u)\n", available, head, tail);
> +	CT_DEBUG(ct, "available %d (%u:%u:%u)\n", available, head, tail, size);
>   	GEM_BUG_ON(available < 0);
>   
>   	header = cmds[head];
> 


* Re: [Intel-gfx] [RFC PATCH 44/97] drm/i915/guc: Implement GuC submission tasklet
  2021-05-06 19:13 ` [RFC PATCH 44/97] drm/i915/guc: Implement GuC submission tasklet Matthew Brost
@ 2021-05-25  9:43   ` Tvrtko Ursulin
  2021-05-25 17:10     ` Matthew Brost
  0 siblings, 1 reply; 249+ messages in thread
From: Tvrtko Ursulin @ 2021-05-25  9:43 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel; +Cc: jason.ekstrand, daniel.vetter


On 06/05/2021 20:13, Matthew Brost wrote:
> Implement the GuC submission tasklet for the new interface. The new GuC
> interface uses H2G to submit contexts to the GuC. Since H2G uses a single
> channel, a single tasklet is used for the submission path. As such, a
> global struct i915_sched_engine has been added to leverage the existing
> scheduling code.
> 
> Also the per engine interrupt handler has been updated to disable the
> rescheduling of the physical engine tasklet, when using GuC scheduling,
> as the physical engine tasklet is no longer used.
> 
> In this patch the field, guc_id, has been added to intel_context and is
> not assigned. Patches later in the series will assign this value.
> 
> Cc: John Harrison <john.c.harrison@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_context_types.h |   9 +
>   drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   4 +
>   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 233 +++++++++---------
>   3 files changed, 127 insertions(+), 119 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
> index ed8c447a7346..bb6fef7eae52 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> @@ -136,6 +136,15 @@ struct intel_context {
>   	struct intel_sseu sseu;
>   
>   	u8 wa_bb_page; /* if set, page num reserved for context workarounds */
> +
> +	/* GuC scheduling state that does not require a lock. */
> +	atomic_t guc_sched_state_no_lock;
> +
> +	/*
> +	 * GuC lrc descriptor ID - Not assigned in this patch but future patches
> +	 * in the series will assign it.
> +	 */
> +	u16 guc_id;
>   };
>   
>   #endif /* __INTEL_CONTEXT_TYPES__ */
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> index 2eb6c497e43c..d32866fe90ad 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> @@ -30,6 +30,10 @@ struct intel_guc {
>   	struct intel_guc_log log;
>   	struct intel_guc_ct ct;
>   
> +	/* Global engine used to submit requests to GuC */
> +	struct i915_sched_engine *sched_engine;
> +	struct i915_request *stalled_request;
> +
>   	/* intel_guc_recv interrupt related state */
>   	spinlock_t irq_lock;
>   	unsigned int msg_enabled_mask;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index c2b6d27404b7..0955a8b00ee8 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -60,6 +60,30 @@
>   
>   #define GUC_REQUEST_SIZE 64 /* bytes */
>   
> +/*
> + * Below is a set of functions which control the GuC scheduling state which do
> + * not require a lock as all state transitions are mutually exclusive. i.e. It
> + * is not possible for the context pinning code and submission, for the same
> + * context, to be executing simultaneously.
> + */

Is the statement that some other locks, or other guarantees, serialise 
modification of this state? And if so, why is it using atomics?

Regards,

Tvrtko

> +#define SCHED_STATE_NO_LOCK_ENABLED			BIT(0)
> +static inline bool context_enabled(struct intel_context *ce)
> +{
> +	return (atomic_read(&ce->guc_sched_state_no_lock) &
> +		SCHED_STATE_NO_LOCK_ENABLED);
> +}
> +
> +static inline void set_context_enabled(struct intel_context *ce)
> +{
> +	atomic_or(SCHED_STATE_NO_LOCK_ENABLED, &ce->guc_sched_state_no_lock);
> +}
> +
> +static inline void clr_context_enabled(struct intel_context *ce)
> +{
> +	atomic_and((u32)~SCHED_STATE_NO_LOCK_ENABLED,
> +		   &ce->guc_sched_state_no_lock);
> +}
> +
>   static inline struct i915_priolist *to_priolist(struct rb_node *rb)
>   {
>   	return rb_entry(rb, struct i915_priolist, node);
> @@ -122,37 +146,29 @@ static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
>   	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
>   }
>   
> -static void guc_add_request(struct intel_guc *guc, struct i915_request *rq)
> +static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
>   {
> -	/* Leaving stub as this function will be used in future patches */
> -}
> +	int err;
> +	struct intel_context *ce = rq->context;
> +	u32 action[3];
> +	int len = 0;
> +	bool enabled = context_enabled(ce);
>   
> -/*
> - * When we're doing submissions using regular execlists backend, writing to
> - * ELSP from CPU side is enough to make sure that writes to ringbuffer pages
> - * pinned in mappable aperture portion of GGTT are visible to command streamer.
> - * Writes done by GuC on our behalf are not guaranteeing such ordering,
> - * therefore, to ensure the flush, we're issuing a POSTING READ.
> - */
> -static void flush_ggtt_writes(struct i915_vma *vma)
> -{
> -	if (i915_vma_is_map_and_fenceable(vma))
> -		intel_uncore_posting_read_fw(vma->vm->gt->uncore,
> -					     GUC_STATUS);
> -}
> +	if (!enabled) {
> +		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
> +		action[len++] = ce->guc_id;
> +		action[len++] = GUC_CONTEXT_ENABLE;
> +	} else {
> +		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT;
> +		action[len++] = ce->guc_id;
> +	}
>   
> -static void guc_submit(struct intel_engine_cs *engine,
> -		       struct i915_request **out,
> -		       struct i915_request **end)
> -{
> -	struct intel_guc *guc = &engine->gt->uc.guc;
> +	err = intel_guc_send_nb(guc, action, len);
>   
> -	do {
> -		struct i915_request *rq = *out++;
> +	if (!enabled && !err)
> +		set_context_enabled(ce);
>   
> -		flush_ggtt_writes(rq->ring->vma);
> -		guc_add_request(guc, rq);
> -	} while (out != end);
> +	return err;
>   }
>   
>   static inline int rq_prio(const struct i915_request *rq)
> @@ -160,125 +176,88 @@ static inline int rq_prio(const struct i915_request *rq)
>   	return rq->sched.attr.priority;
>   }
>   
> -static struct i915_request *schedule_in(struct i915_request *rq, int idx)
> -{
> -	trace_i915_request_in(rq, idx);
> -
> -	/*
> -	 * Currently we are not tracking the rq->context being inflight
> -	 * (ce->inflight = rq->engine). It is only used by the execlists
> -	 * backend at the moment, a similar counting strategy would be
> -	 * required if we generalise the inflight tracking.
> -	 */
> -
> -	__intel_gt_pm_get(rq->engine->gt);
> -	return i915_request_get(rq);
> -}
> -
> -static void schedule_out(struct i915_request *rq)
> -{
> -	trace_i915_request_out(rq);
> -
> -	intel_gt_pm_put_async(rq->engine->gt);
> -	i915_request_put(rq);
> -}
> -
> -static void __guc_dequeue(struct intel_engine_cs *engine)
> +static int guc_dequeue_one_context(struct intel_guc *guc)
>   {
> -	struct i915_sched_engine * const sched_engine = engine->sched_engine;
> -	struct intel_engine_execlists * const execlists = &engine->execlists;
> -	struct i915_request **first = execlists->inflight;
> -	struct i915_request ** const last_port = first + execlists->port_mask;
> -	struct i915_request *last = first[0];
> -	struct i915_request **port;
> +	struct i915_sched_engine * const sched_engine = guc->sched_engine;
> +	struct i915_request *last = NULL;
>   	bool submit = false;
>   	struct rb_node *rb;
> +	int ret;
>   
> -	lockdep_assert_held(&engine->sched_engine->lock);
> -
> -	if (last) {
> -		if (*++first)
> -			return;
> +	lockdep_assert_held(&sched_engine->lock);
>   
> -		last = NULL;
> +	if (guc->stalled_request) {
> +		submit = true;
> +		last = guc->stalled_request;
> +		goto resubmit;
>   	}
>   
> -	/*
> -	 * We write directly into the execlists->inflight queue and don't use
> -	 * the execlists->pending queue, as we don't have a distinct switch
> -	 * event.
> -	 */
> -	port = first;
>   	while ((rb = rb_first_cached(&sched_engine->queue))) {
>   		struct i915_priolist *p = to_priolist(rb);
>   		struct i915_request *rq, *rn;
>   
>   		priolist_for_each_request_consume(rq, rn, p) {
> -			if (last && rq->context != last->context) {
> -				if (port == last_port)
> -					goto done;
> -
> -				*port = schedule_in(last,
> -						    port - execlists->inflight);
> -				port++;
> -			}
> +			if (last && rq->context != last->context)
> +				goto done;
>   
>   			list_del_init(&rq->sched.link);
> +
>   			__i915_request_submit(rq);
> -			submit = true;
> +
> +			trace_i915_request_in(rq, 0);
>   			last = rq;
> +			submit = true;
>   		}
>   
>   		rb_erase_cached(&p->node, &sched_engine->queue);
>   		i915_priolist_free(p);
>   	}
>   done:
> -	sched_engine->queue_priority_hint =
> -		rb ? to_priolist(rb)->priority : INT_MIN;
>   	if (submit) {
> -		*port = schedule_in(last, port - execlists->inflight);
> -		*++port = NULL;
> -		guc_submit(engine, first, port);
> +		last->context->lrc_reg_state[CTX_RING_TAIL] =
> +			intel_ring_set_tail(last->ring, last->tail);
> +resubmit:
> +		/*
> +		 * We only check for -EBUSY here even though it is possible for
> +		 * -EDEADLK to be returned. If -EDEADLK is returned, the GuC has
> +		 * died and a full GPU reset needs to be done. The hangcheck will
> +		 * eventually detect that the GuC has died and trigger this
> +		 * reset so no need to handle -EDEADLK here.
> +		 */
> +		ret = guc_add_request(guc, last);
> +		if (ret == -EBUSY) {
> +			i915_sched_engine_kick(sched_engine);
> +			guc->stalled_request = last;
> +			return false;
> +		}
>   	}
> -	execlists->active = execlists->inflight;
> +
> +	guc->stalled_request = NULL;
> +	return submit;
>   }
>   
>   static void guc_submission_tasklet(struct tasklet_struct *t)
>   {
>   	struct i915_sched_engine *sched_engine =
>   		from_tasklet(sched_engine, t, tasklet);
> -	struct intel_engine_cs * const engine = sched_engine->engine;
> -	struct intel_engine_execlists * const execlists = &engine->execlists;
> -	struct i915_request **port, *rq;
>   	unsigned long flags;
> +	bool loop;
>   
> -	spin_lock_irqsave(&engine->sched_engine->lock, flags);
> -
> -	for (port = execlists->inflight; (rq = *port); port++) {
> -		if (!i915_request_completed(rq))
> -			break;
> +	spin_lock_irqsave(&sched_engine->lock, flags);
>   
> -		schedule_out(rq);
> -	}
> -	if (port != execlists->inflight) {
> -		int idx = port - execlists->inflight;
> -		int rem = ARRAY_SIZE(execlists->inflight) - idx;
> -		memmove(execlists->inflight, port, rem * sizeof(*port));
> -	}
> -
> -	__guc_dequeue(engine);
> +	do {
> +		loop = guc_dequeue_one_context(&sched_engine->engine->gt->uc.guc);
> +	} while (loop);
>   
> -	i915_sched_engine_reset_on_empty(engine->sched_engine);
> +	i915_sched_engine_reset_on_empty(sched_engine);
>   
> -	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
> +	spin_unlock_irqrestore(&sched_engine->lock, flags);
>   }
>   
>   static void cs_irq_handler(struct intel_engine_cs *engine, u16 iir)
>   {
> -	if (iir & GT_RENDER_USER_INTERRUPT) {
> +	if (iir & GT_RENDER_USER_INTERRUPT)
>   		intel_engine_signal_breadcrumbs(engine);
> -		i915_sched_engine_hi_kick(engine->sched_engine);
> -	}
>   }
>   
>   static void guc_reset_prepare(struct intel_engine_cs *engine)
> @@ -351,6 +330,10 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
>   	struct rb_node *rb;
>   	unsigned long flags;
>   
> +	/* Can be called during boot if GuC fails to load */
> +	if (!engine->gt)
> +		return;
> +
>   	ENGINE_TRACE(engine, "\n");
>   
>   	/*
> @@ -437,8 +420,11 @@ int intel_guc_submission_init(struct intel_guc *guc)
>   
>   void intel_guc_submission_fini(struct intel_guc *guc)
>   {
> -	if (guc->lrc_desc_pool)
> -		guc_lrc_desc_pool_destroy(guc);
> +	if (!guc->lrc_desc_pool)
> +		return;
> +
> +	guc_lrc_desc_pool_destroy(guc);
> +	i915_sched_engine_put(guc->sched_engine);
>   }
>   
>   static int guc_context_alloc(struct intel_context *ce)
> @@ -503,32 +489,32 @@ static int guc_request_alloc(struct i915_request *request)
>   	return 0;
>   }
>   
> -static inline void queue_request(struct intel_engine_cs *engine,
> +static inline void queue_request(struct i915_sched_engine *sched_engine,
>   				 struct i915_request *rq,
>   				 int prio)
>   {
>   	GEM_BUG_ON(!list_empty(&rq->sched.link));
>   	list_add_tail(&rq->sched.link,
> -		      i915_sched_lookup_priolist(engine->sched_engine, prio));
> +		      i915_sched_lookup_priolist(sched_engine, prio));
>   	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
>   }
>   
>   static void guc_submit_request(struct i915_request *rq)
>   {
> -	struct intel_engine_cs *engine = rq->engine;
> +	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
>   	unsigned long flags;
>   
>   	/* Will be called from irq-context when using foreign fences. */
> -	spin_lock_irqsave(&engine->sched_engine->lock, flags);
> +	spin_lock_irqsave(&sched_engine->lock, flags);
>   
> -	queue_request(engine, rq, rq_prio(rq));
> +	queue_request(sched_engine, rq, rq_prio(rq));
>   
> -	GEM_BUG_ON(i915_sched_engine_is_empty(engine->sched_engine));
> +	GEM_BUG_ON(i915_sched_engine_is_empty(sched_engine));
>   	GEM_BUG_ON(list_empty(&rq->sched.link));
>   
> -	i915_sched_engine_hi_kick(engine->sched_engine);
> +	i915_sched_engine_hi_kick(sched_engine);
>   
> -	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
> +	spin_unlock_irqrestore(&sched_engine->lock, flags);
>   }
>   
>   static void sanitize_hwsp(struct intel_engine_cs *engine)
> @@ -606,8 +592,6 @@ static void guc_release(struct intel_engine_cs *engine)
>   {
>   	engine->sanitize = NULL; /* no longer in control, nothing to sanitize */
>   
> -	tasklet_kill(&engine->sched_engine->tasklet);
> -
>   	intel_engine_cleanup_common(engine);
>   	lrc_fini_wa_ctx(engine);
>   }
> @@ -678,6 +662,7 @@ static inline void guc_default_irqs(struct intel_engine_cs *engine)
>   int intel_guc_submission_setup(struct intel_engine_cs *engine)
>   {
>   	struct drm_i915_private *i915 = engine->i915;
> +	struct intel_guc *guc = &engine->gt->uc.guc;
>   
>   	/*
>   	 * The setup relies on several assumptions (e.g. irqs always enabled)
> @@ -685,8 +670,18 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
>   	 */
>   	GEM_BUG_ON(INTEL_GEN(i915) < 11);
>   
> -	tasklet_setup(&engine->sched_engine->tasklet, guc_submission_tasklet);
> -	engine->sched_engine->schedule = i915_schedule;
> +	if (!guc->sched_engine) {
> +		guc->sched_engine = i915_sched_engine_create(ENGINE_VIRTUAL);
> +		if (!guc->sched_engine)
> +			return -ENOMEM;
> +
> +		guc->sched_engine->schedule = i915_schedule;
> +		guc->sched_engine->engine = engine;
> +		tasklet_setup(&guc->sched_engine->tasklet,
> +			      guc_submission_tasklet);
> +	}
> +	i915_sched_engine_put(engine->sched_engine);
> +	engine->sched_engine = i915_sched_engine_get(guc->sched_engine);
>   
>   	guc_default_vfuncs(engine);
>   	guc_default_irqs(engine);
> 


* Re: [Intel-gfx] [RFC PATCH 53/97] drm/i915/guc: Disable semaphores when using GuC scheduling
  2021-05-06 19:14 ` [RFC PATCH 53/97] drm/i915/guc: Disable semaphores when using GuC scheduling Matthew Brost
@ 2021-05-25  9:52   ` Tvrtko Ursulin
  2021-05-25 17:01     ` Matthew Brost
  0 siblings, 1 reply; 249+ messages in thread
From: Tvrtko Ursulin @ 2021-05-25  9:52 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel; +Cc: jason.ekstrand, daniel.vetter


On 06/05/2021 20:14, Matthew Brost wrote:
> Disable semaphores when using GuC scheduling as semaphores are broken in
> the current GuC firmware.

What is "current"? The patch itself is about a year and a half old at this point.

Regards,

Tvrtko

> Cc: John Harrison <john.c.harrison@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/i915/gem/i915_gem_context.c | 6 ++++--
>   1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> index 993faa213b41..d30260ffe2a7 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> @@ -230,7 +230,8 @@ static void intel_context_set_gem(struct intel_context *ce,
>   		ce->timeline = intel_timeline_get(ctx->timeline);
>   
>   	if (ctx->sched.priority >= I915_PRIORITY_NORMAL &&
> -	    intel_engine_has_timeslices(ce->engine))
> +	    intel_engine_has_timeslices(ce->engine) &&
> +	    intel_engine_has_semaphores(ce->engine))
>   		__set_bit(CONTEXT_USE_SEMAPHORES, &ce->flags);
>   
>   	intel_context_set_watchdog_us(ce, ctx->watchdog.timeout_us);
> @@ -1939,7 +1940,8 @@ static int __apply_priority(struct intel_context *ce, void *arg)
>   	if (!intel_engine_has_timeslices(ce->engine))
>   		return 0;
>   
> -	if (ctx->sched.priority >= I915_PRIORITY_NORMAL)
> +	if (ctx->sched.priority >= I915_PRIORITY_NORMAL &&
> +	    intel_engine_has_semaphores(ce->engine))
>   		intel_context_set_use_semaphores(ce);
>   	else
>   		intel_context_clear_use_semaphores(ce);
> 


* Re: [Intel-gfx] [RFC PATCH 55/97] drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC
  2021-05-06 19:14 ` [RFC PATCH 55/97] drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC Matthew Brost
@ 2021-05-25 10:06   ` Tvrtko Ursulin
  2021-05-25 17:07     ` Matthew Brost
  0 siblings, 1 reply; 249+ messages in thread
From: Tvrtko Ursulin @ 2021-05-25 10:06 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel; +Cc: jason.ekstrand, daniel.vetter


On 06/05/2021 20:14, Matthew Brost wrote:
> When running the GuC the GPU can't be considered idle if the GuC still
> has contexts pinned. As such, a call has been added in
> intel_gt_wait_for_idle to idle the UC and in turn the GuC by waiting for
> the number of unpinned contexts to go to zero.
> 
> Cc: John Harrison <john.c.harrison@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/i915/gem/i915_gem_mman.c      |  3 +-
>   drivers/gpu/drm/i915/gt/intel_gt.c            | 18 ++++
>   drivers/gpu/drm/i915/gt/intel_gt.h            |  2 +
>   drivers/gpu/drm/i915/gt/intel_gt_requests.c   | 22 ++---
>   drivers/gpu/drm/i915/gt/intel_gt_requests.h   |  7 +-
>   drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  4 +
>   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |  1 +
>   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h     |  4 +
>   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 91 ++++++++++++++++++-
>   drivers/gpu/drm/i915/gt/uc/intel_uc.h         |  5 +
>   drivers/gpu/drm/i915/i915_debugfs.c           |  1 +
>   drivers/gpu/drm/i915/i915_gem_evict.c         |  1 +
>   .../gpu/drm/i915/selftests/igt_live_test.c    |  2 +-
>   .../gpu/drm/i915/selftests/mock_gem_device.c  |  3 +-
>   14 files changed, 137 insertions(+), 27 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> index 8598a1c78a4c..2f5295c9408d 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> @@ -634,7 +634,8 @@ mmap_offset_attach(struct drm_i915_gem_object *obj,
>   		goto insert;
>   
>   	/* Attempt to reap some mmap space from dead objects */
> -	err = intel_gt_retire_requests_timeout(&i915->gt, MAX_SCHEDULE_TIMEOUT);
> +	err = intel_gt_retire_requests_timeout(&i915->gt, MAX_SCHEDULE_TIMEOUT,
> +					       NULL);
>   	if (err)
>   		goto err;
>   
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> index 8d77dcbad059..1742a8561f69 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> @@ -574,6 +574,24 @@ static void __intel_gt_disable(struct intel_gt *gt)
>   	GEM_BUG_ON(intel_gt_pm_is_awake(gt));
>   }
>   
> +int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout)
> +{
> +	long rtimeout;
> +
> +	/* If the device is asleep, we have no requests outstanding */
> +	if (!intel_gt_pm_is_awake(gt))
> +		return 0;
> +
> +	while ((timeout = intel_gt_retire_requests_timeout(gt, timeout,
> +							   &rtimeout)) > 0) {
> +		cond_resched();
> +		if (signal_pending(current))
> +			return -EINTR;
> +	}
> +
> +	return timeout ? timeout : intel_uc_wait_for_idle(&gt->uc, rtimeout);
> +}
> +
>   int intel_gt_init(struct intel_gt *gt)
>   {
>   	int err;
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h
> index 7ec395cace69..c775043334bf 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.h
> @@ -48,6 +48,8 @@ void intel_gt_driver_release(struct intel_gt *gt);
>   
>   void intel_gt_driver_late_release(struct intel_gt *gt);
>   
> +int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout);
> +
>   void intel_gt_check_and_clear_faults(struct intel_gt *gt);
>   void intel_gt_clear_error_registers(struct intel_gt *gt,
>   				    intel_engine_mask_t engine_mask);
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> index 647eca9d867a..c6c702f236fa 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> @@ -13,6 +13,7 @@
>   #include "intel_gt_pm.h"
>   #include "intel_gt_requests.h"
>   #include "intel_timeline.h"
> +#include "uc/intel_uc.h"
>   
>   static bool retire_requests(struct intel_timeline *tl)
>   {
> @@ -130,7 +131,8 @@ void intel_engine_fini_retire(struct intel_engine_cs *engine)
>   	GEM_BUG_ON(engine->retire);
>   }
>   
> -long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout)
> +long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
> +				      long *rtimeout)

What is 'rtimeout'? I can guess it means remaining, but the name could 
be more self-descriptive to start with.

It also feels a bit churny for what it is. How plausible would the 
alternatives be: change the existing timeout to an in/out parameter, 
measure the elapsed time internally in this function, or just risk 
sleeping up to twice as long by passing the original timeout to the uc 
idle wait as well?
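
For illustration only, the in/out option could look roughly like this 
(a sketch, not tested, exact return semantics to be decided):

  /* timeout is in/out: on return it holds whatever budget is left */
  long intel_gt_retire_requests_timeout(struct intel_gt *gt, long *timeout);

intel_gt_wait_for_idle() could then keep passing &timeout around its 
retire loop and hand the leftover straight to intel_uc_wait_for_idle(), 
without plumbing a second rtimeout pointer through.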

>   {
>   	struct intel_gt_timelines *timelines = &gt->timelines;
>   	struct intel_timeline *tl, *tn;
> @@ -195,22 +197,10 @@ out_active:	spin_lock(&timelines->lock);
>   	if (flush_submission(gt, timeout)) /* Wait, there's more! */
>   		active_count++;
>   
> -	return active_count ? timeout : 0;
> -}
> -
> -int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout)
> -{
> -	/* If the device is asleep, we have no requests outstanding */
> -	if (!intel_gt_pm_is_awake(gt))
> -		return 0;
> -
> -	while ((timeout = intel_gt_retire_requests_timeout(gt, timeout)) > 0) {
> -		cond_resched();
> -		if (signal_pending(current))
> -			return -EINTR;
> -	}
> +	if (rtimeout)
> +		*rtimeout = timeout;
>   
> -	return timeout;
> +	return active_count ? timeout : 0;
>   }
>   
>   static void retire_work_handler(struct work_struct *work)
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.h b/drivers/gpu/drm/i915/gt/intel_gt_requests.h
> index fcc30a6e4fe9..4419787124e2 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.h
> @@ -10,10 +10,11 @@ struct intel_engine_cs;
>   struct intel_gt;
>   struct intel_timeline;
>   
> -long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout);
> +long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
> +				      long *rtimeout);
>   static inline void intel_gt_retire_requests(struct intel_gt *gt)
>   {
> -	intel_gt_retire_requests_timeout(gt, 0);
> +	intel_gt_retire_requests_timeout(gt, 0, NULL);
>   }
>   
>   void intel_engine_init_retire(struct intel_engine_cs *engine);
> @@ -21,8 +22,6 @@ void intel_engine_add_retire(struct intel_engine_cs *engine,
>   			     struct intel_timeline *tl);
>   void intel_engine_fini_retire(struct intel_engine_cs *engine);
>   
> -int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout);
> -
>   void intel_gt_init_requests(struct intel_gt *gt);
>   void intel_gt_park_requests(struct intel_gt *gt);
>   void intel_gt_unpark_requests(struct intel_gt *gt);
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> index 485e98f3f304..47eaa69809e8 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> @@ -38,6 +38,8 @@ struct intel_guc {
>   	spinlock_t irq_lock;
>   	unsigned int msg_enabled_mask;
>   
> +	atomic_t outstanding_submission_g2h;
> +
>   	struct {
>   		bool enabled;
>   		void (*reset)(struct intel_guc *guc);
> @@ -239,6 +241,8 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask)
>   	spin_unlock_irq(&guc->irq_lock);
>   }
>   
> +int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout);
> +
>   int intel_guc_reset_engine(struct intel_guc *guc,
>   			   struct intel_engine_cs *engine);
>   
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index f1893030ca88..cf701056fa14 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -111,6 +111,7 @@ void intel_guc_ct_init_early(struct intel_guc_ct *ct)
>   	INIT_LIST_HEAD(&ct->requests.incoming);
>   	INIT_WORK(&ct->requests.worker, ct_incoming_request_worker_func);
>   	tasklet_init(&ct->receive_tasklet, ct_receive_tasklet_func, (unsigned long)ct);
> +	init_waitqueue_head(&ct->wq);
>   }
>   
>   static inline const char *guc_ct_buffer_type_to_str(u32 type)
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> index 660bf37238e2..ab1b79ab960b 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> @@ -10,6 +10,7 @@
>   #include <linux/spinlock.h>
>   #include <linux/workqueue.h>
>   #include <linux/ktime.h>
> +#include <linux/wait.h>
>   
>   #include "intel_guc_fwif.h"
>   
> @@ -68,6 +69,9 @@ struct intel_guc_ct {
>   
>   	struct tasklet_struct receive_tasklet;
>   
> +	/** @wq: wait queue for the G2H channel */
> +	wait_queue_head_t wq;
> +
>   	struct {
>   		u16 last_fence; /* last fence used to send request */
>   
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index ae0b386467e3..0ff7dd6d337d 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -253,6 +253,74 @@ static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
>   	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
>   }
>   
> +static int guc_submission_busy_loop(struct intel_guc* guc,
> +				    const u32 *action,
> +				    u32 len,
> +				    u32 g2h_len_dw,
> +				    bool loop)
> +{
> +	int err;
> +
> +	err = intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop);
> +
> +	if (!err && g2h_len_dw)
> +		atomic_inc(&guc->outstanding_submission_g2h);
> +
> +	return err;
> +}
> +
> +static int guc_wait_for_pending_msg(struct intel_guc *guc,
> +				    atomic_t *wait_var,
> +				    bool interruptible,
> +				    long timeout)
> +{
> +	const int state = interruptible ?
> +		TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
> +	DEFINE_WAIT(wait);
> +
> +	might_sleep();
> +	GEM_BUG_ON(timeout < 0);
> +
> +	if (!atomic_read(wait_var))
> +		return 0;
> +
> +	if (!timeout)
> +		return -ETIME;
> +
> +	for (;;) {
> +		prepare_to_wait(&guc->ct.wq, &wait, state);
> +
> +		if (!atomic_read(wait_var))
> +			break;
> +
> +		if (signal_pending_state(state, current)) {
> +			timeout = -ERESTARTSYS;
> +			break;
> +		}
> +
> +		if (!timeout) {
> +			timeout = -ETIME;
> +			break;
> +		}
> +
> +		timeout = io_schedule_timeout(timeout);
> +	}
> +	finish_wait(&guc->ct.wq, &wait);
> +
> +	return (timeout < 0) ? timeout : 0;
> +}

See if it is possible to simplify all this with wait_var_event and 
wake_up_var.
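
For illustration, roughly what I have in mind (just a sketch, not 
tested; note that combining an interruptible wait with a timeout may 
need the lower-level ___wait_var_event() helper):

  /* waiter side, instead of the open-coded prepare_to_wait() loop */
  timeout = wait_var_event_timeout(&guc->outstanding_submission_g2h,
				   !atomic_read(&guc->outstanding_submission_g2h),
				   timeout);

  /* waker side, instead of waitqueue_active() + wake_up_all() */
  if (atomic_dec_and_test(&guc->outstanding_submission_g2h))
	  wake_up_var(&guc->outstanding_submission_g2h);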

> +
> +int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout)
> +{
> +	bool interruptible = true;
> +
> +	if (unlikely(timeout < 0))
> +		timeout = -timeout, interruptible = false;
> +
> +	return guc_wait_for_pending_msg(guc, &guc->outstanding_submission_g2h,
> +					interruptible, timeout);
> +}
> +
>   static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
>   {
>   	int err;
> @@ -279,6 +347,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
>   
>   	err = intel_guc_send_nb(guc, action, len, g2h_len_dw);
>   	if (!enabled && !err) {
> +		atomic_inc(&guc->outstanding_submission_g2h);
>   		set_context_enabled(ce);
>   	} else if (!enabled) {
>   		clr_context_pending_enable(ce);
> @@ -734,7 +803,7 @@ static int __guc_action_register_context(struct intel_guc *guc,
>   		offset,
>   	};
>   
> -	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
> +	return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
>   }
>   
>   static int register_context(struct intel_context *ce)
> @@ -754,7 +823,7 @@ static int __guc_action_deregister_context(struct intel_guc *guc,
>   		guc_id,
>   	};
>   
> -	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action),
> +	return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),
>   					G2H_LEN_DW_DEREGISTER_CONTEXT, true);
>   }
>   
> @@ -871,7 +940,9 @@ static int guc_context_pin(struct intel_context *ce, void *vaddr)
>   
>   static void guc_context_unpin(struct intel_context *ce)
>   {
> -	unpin_guc_id(ce_to_guc(ce), ce);
> +	struct intel_guc *guc = ce_to_guc(ce);
> +
> +	unpin_guc_id(guc, ce);
>   	lrc_unpin(ce);
>   }
>   
> @@ -894,7 +965,7 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
>   
>   	intel_context_get(ce);
>   
> -	intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action),
> +	guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),
>   				 G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
>   }
>   
> @@ -1437,6 +1508,15 @@ g2h_context_lookup(struct intel_guc *guc, u32 desc_idx)
>   	return ce;
>   }
>   
> +static void decr_outstanding_submission_g2h(struct intel_guc *guc)
> +{
> +	if (atomic_dec_and_test(&guc->outstanding_submission_g2h)) {
> +		smp_mb();
> +		if (waitqueue_active(&guc->ct.wq))
> +			wake_up_all(&guc->ct.wq);

I keep pointing out that this pattern is racy and at the very least 
needs a comment explaining why it is safe here.
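
(The usual options being either to document the barrier pairing with 
the waiter side, or to just wake unconditionally - sketch only:

  if (atomic_dec_and_test(&guc->outstanding_submission_g2h))
	  wake_up_all(&guc->ct.wq); /* waking an empty queue is cheap */

since wake_up_all() on a queue with no waiters is harmless.)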

Regards,

Tvrtko

> +	}
> +}
> +
>   int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
>   					  const u32 *msg,
>   					  u32 len)
> @@ -1472,6 +1552,8 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
>   		lrc_destroy(&ce->ref);
>   	}
>   
> +	decr_outstanding_submission_g2h(guc);
> +
>   	return 0;
>   }
>   
> @@ -1520,6 +1602,7 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
>   		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
>   	}
>   
> +	decr_outstanding_submission_g2h(guc);
>   	intel_context_put(ce);
>   
>   	return 0;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> index 9c954c589edf..c4cef885e984 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> @@ -81,6 +81,11 @@ uc_state_checkers(guc, guc_submission);
>   #undef uc_state_checkers
>   #undef __uc_state_checker
>   
> +static inline int intel_uc_wait_for_idle(struct intel_uc *uc, long timeout)
> +{
> +	return intel_guc_wait_for_idle(&uc->guc, timeout);
> +}
> +
>   #define intel_uc_ops_function(_NAME, _OPS, _TYPE, _RET) \
>   static inline _TYPE intel_uc_##_NAME(struct intel_uc *uc) \
>   { \
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 8dd374691102..bb29838d1cd7 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -36,6 +36,7 @@
>   #include "gt/intel_gt_clock_utils.h"
>   #include "gt/intel_gt.h"
>   #include "gt/intel_gt_pm.h"
> +#include "gt/intel_gt.h"
>   #include "gt/intel_gt_requests.h"
>   #include "gt/intel_reset.h"
>   #include "gt/intel_rc6.h"
> diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
> index 4d2d59a9942b..2b73ddb11c66 100644
> --- a/drivers/gpu/drm/i915/i915_gem_evict.c
> +++ b/drivers/gpu/drm/i915/i915_gem_evict.c
> @@ -27,6 +27,7 @@
>    */
>   
>   #include "gem/i915_gem_context.h"
> +#include "gt/intel_gt.h"
>   #include "gt/intel_gt_requests.h"
>   
>   #include "i915_drv.h"
> diff --git a/drivers/gpu/drm/i915/selftests/igt_live_test.c b/drivers/gpu/drm/i915/selftests/igt_live_test.c
> index c130010a7033..1c721542e277 100644
> --- a/drivers/gpu/drm/i915/selftests/igt_live_test.c
> +++ b/drivers/gpu/drm/i915/selftests/igt_live_test.c
> @@ -5,7 +5,7 @@
>    */
>   
>   #include "i915_drv.h"
> -#include "gt/intel_gt_requests.h"
> +#include "gt/intel_gt.h"
>   
>   #include "../i915_selftest.h"
>   #include "igt_flush_test.h"
> diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> index cf40004bc92a..6c06816e2b99 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> @@ -51,7 +51,8 @@ void mock_device_flush(struct drm_i915_private *i915)
>   	do {
>   		for_each_engine(engine, gt, id)
>   			mock_engine_flush(engine);
> -	} while (intel_gt_retire_requests_timeout(gt, MAX_SCHEDULE_TIMEOUT));
> +	} while (intel_gt_retire_requests_timeout(gt, MAX_SCHEDULE_TIMEOUT,
> +						  NULL));
>   }
>   
>   static void mock_device_release(struct drm_device *dev)
> 


* Re: [Intel-gfx] [RFC PATCH 60/97] drm/i915: Track 'serial' counts for virtual engines
  2021-05-06 19:14 ` [RFC PATCH 60/97] drm/i915: Track 'serial' counts for " Matthew Brost
@ 2021-05-25 10:16   ` Tvrtko Ursulin
  2021-05-25 17:52     ` Matthew Brost
  2021-06-02 12:09   ` Tvrtko Ursulin
  1 sibling, 1 reply; 249+ messages in thread
From: Tvrtko Ursulin @ 2021-05-25 10:16 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel; +Cc: jason.ekstrand, daniel.vetter


On 06/05/2021 20:14, Matthew Brost wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> The serial number tracking of engines happens at the backend of
> request submission and was expecting to only be given physical
> engines. However, in GuC submission mode, the decomposition of virtual
> to physical engines does not happen in i915. Instead, requests are
> submitted to their virtual engine mask all the way through to the
> hardware (i.e. to GuC). This would mean that the heart beat code
> thinks the physical engines are idle due to the serial number not
> incrementing.
> 
> This patch updates the tracking to decompose virtual engines into
> their physical constituents and tracks the request against each. This
> is not entirely accurate as the GuC will only be issuing the request
> to one physical engine. However, it is the best that i915 can do given
> that it has no knowledge of the GuC's scheduling decisions.

The commit text sounds a bit defeatist. Instead of making up the serial 
counts, which has downsides (could you please document in the commit 
message what they are?), I think we should work out how to design this 
properly.

Regards,

Tvrtko

> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_engine_types.h     |  2 ++
>   .../gpu/drm/i915/gt/intel_execlists_submission.c |  6 ++++++
>   drivers/gpu/drm/i915/gt/intel_ring_submission.c  |  6 ++++++
>   drivers/gpu/drm/i915/gt/mock_engine.c            |  6 ++++++
>   .../gpu/drm/i915/gt/uc/intel_guc_submission.c    | 16 ++++++++++++++++
>   drivers/gpu/drm/i915/i915_request.c              |  4 +++-
>   6 files changed, 39 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> index 86302e6d86b2..e2b5cda6dbc4 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> @@ -389,6 +389,8 @@ struct intel_engine_cs {
>   	void		(*park)(struct intel_engine_cs *engine);
>   	void		(*unpark)(struct intel_engine_cs *engine);
>   
> +	void		(*bump_serial)(struct intel_engine_cs *engine);
> +
>   	void		(*set_default_submission)(struct intel_engine_cs *engine);
>   
>   	const struct intel_context_ops *cops;
> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> index ae12d7f19ecd..02880ea5d693 100644
> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> @@ -3199,6 +3199,11 @@ static void execlists_release(struct intel_engine_cs *engine)
>   	lrc_fini_wa_ctx(engine);
>   }
>   
> +static void execlist_bump_serial(struct intel_engine_cs *engine)
> +{
> +	engine->serial++;
> +}
> +
>   static void
>   logical_ring_default_vfuncs(struct intel_engine_cs *engine)
>   {
> @@ -3208,6 +3213,7 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
>   
>   	engine->cops = &execlists_context_ops;
>   	engine->request_alloc = execlists_request_alloc;
> +	engine->bump_serial = execlist_bump_serial;
>   
>   	engine->reset.prepare = execlists_reset_prepare;
>   	engine->reset.rewind = execlists_reset_rewind;
> diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> index 14aa31879a37..39dd7c4ed0a9 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> @@ -1045,6 +1045,11 @@ static void setup_irq(struct intel_engine_cs *engine)
>   	}
>   }
>   
> +static void ring_bump_serial(struct intel_engine_cs *engine)
> +{
> +	engine->serial++;
> +}
> +
>   static void setup_common(struct intel_engine_cs *engine)
>   {
>   	struct drm_i915_private *i915 = engine->i915;
> @@ -1064,6 +1069,7 @@ static void setup_common(struct intel_engine_cs *engine)
>   
>   	engine->cops = &ring_context_ops;
>   	engine->request_alloc = ring_request_alloc;
> +	engine->bump_serial = ring_bump_serial;
>   
>   	/*
>   	 * Using a global execution timeline; the previous final breadcrumb is
> diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
> index bd005c1b6fd5..97b10fd60b55 100644
> --- a/drivers/gpu/drm/i915/gt/mock_engine.c
> +++ b/drivers/gpu/drm/i915/gt/mock_engine.c
> @@ -292,6 +292,11 @@ static void mock_engine_release(struct intel_engine_cs *engine)
>   	intel_engine_fini_retire(engine);
>   }
>   
> +static void mock_bump_serial(struct intel_engine_cs *engine)
> +{
> +	engine->serial++;
> +}
> +
>   struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
>   				    const char *name,
>   				    int id)
> @@ -318,6 +323,7 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
>   
>   	engine->base.cops = &mock_context_ops;
>   	engine->base.request_alloc = mock_request_alloc;
> +	engine->base.bump_serial = mock_bump_serial;
>   	engine->base.emit_flush = mock_emit_flush;
>   	engine->base.emit_fini_breadcrumb = mock_emit_breadcrumb;
>   	engine->base.submit_request = mock_submit_request;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index dc79d287c50a..f0e5731bcef6 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -1500,6 +1500,20 @@ static void guc_release(struct intel_engine_cs *engine)
>   	lrc_fini_wa_ctx(engine);
>   }
>   
> +static void guc_bump_serial(struct intel_engine_cs *engine)
> +{
> +	engine->serial++;
> +}
> +
> +static void virtual_guc_bump_serial(struct intel_engine_cs *engine)
> +{
> +	struct intel_engine_cs *e;
> +	intel_engine_mask_t tmp, mask = engine->mask;
> +
> +	for_each_engine_masked(e, engine->gt, mask, tmp)
> +		e->serial++;
> +}
> +
>   static void guc_default_vfuncs(struct intel_engine_cs *engine)
>   {
>   	/* Default vfuncs which can be overridden by each engine. */
> @@ -1508,6 +1522,7 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
>   
>   	engine->cops = &guc_context_ops;
>   	engine->request_alloc = guc_request_alloc;
> +	engine->bump_serial = guc_bump_serial;
>   
>   	engine->sched_engine->schedule = i915_schedule;
>   
> @@ -1843,6 +1858,7 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
>   
>   	ve->base.cops = &virtual_guc_context_ops;
>   	ve->base.request_alloc = guc_request_alloc;
> +	ve->base.bump_serial = virtual_guc_bump_serial;
>   
>   	ve->base.submit_request = guc_submit_request;
>   
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 9542a5baa45a..127d60b36422 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -692,7 +692,9 @@ bool __i915_request_submit(struct i915_request *request)
>   				     request->ring->vaddr + request->postfix);
>   
>   	trace_i915_request_execute(request);
> -	engine->serial++;
> +	if (engine->bump_serial)
> +		engine->bump_serial(engine);
> +
>   	result = true;
>   
>   	GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
> 


* Re: [Intel-gfx] [RFC PATCH 00/97] Basic GuC submission support in the i915
  2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
                   ` (98 preceding siblings ...)
  2021-05-14 11:11 ` [Intel-gfx] " Tvrtko Ursulin
@ 2021-05-25 10:32 ` Tvrtko Ursulin
  2021-05-25 16:45   ` Matthew Brost
  99 siblings, 1 reply; 249+ messages in thread
From: Tvrtko Ursulin @ 2021-05-25 10:32 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel; +Cc: jason.ekstrand, daniel.vetter


On 06/05/2021 20:13, Matthew Brost wrote:
> Basic GuC submission support. This is the first bullet point in the
> upstreaming plan covered in the following RFC [1].
> 
> At a very high level the GuC is a piece of firmware which sits between
> the i915 and the GPU. It offloads some of the scheduling of contexts
> from the i915 and programs the GPU to submit contexts. The i915
> communicates with the GuC and the GuC communicates with the GPU.
> 
> GuC submission will be disabled by default on all current upstream
> platforms behind a module parameter - enable_guc. A value of 3 will
> enable submission and HuC loading via the GuC. GuC submission should
> work on all gen11+ platforms assuming the GuC firmware is present.
> 
> This is a huge series and it is completely unrealistic to merge all of
> these patches at once. Fortunately I believe we can break down the
> series into different merges:
> 
> 1. Merge Chris Wilson's patches. These have already been reviewed
> upstream and I fully agree with these patches as a precursor to GuC
> submission.
> 
> 2. Update to GuC 60.1.2. These are largely Michal's patches.
> 
> 3. Turn on GuC/HuC auto mode by default.
> 
> 4. Additional patches needed to support GuC submission. This is any
> patch not covered by 1-3 in the first 34 patches. e.g. 'Engine relative
> MMIO'
> 
> 5. GuC submission support. Patches number 35+. These all don't have to
> merge at once though as we don't actually allow GuC submission until the
> last patch of this series.

For the GuC backend/submission part only - it seems to me none of the 
review comments I made in December 2019 have been implemented. At that 
point I stated, and this was all internal at the time mind you, that I 
did not think the series was ready and that there were several high 
level issues which would need to be sorted out. I don't think I gave my 
ack or r-b back then, and the promise was that a few things would be 
worked on post (internal) merge. That was supposed to include upstream 
refactoring to enable the GuC to slot in better as a backend. Fast 
forward a year and a half and the only progress we had in this area has 
been deleted.

Off the top of my head, and having glanced through the series as posted:

  * The self-churn factor in the series is too high.
  * Patch ordering issues.
  * The GuC context state machine is way too dodgy to have any confidence 
it can be read and the race conditions understood.
  * The context pinning code, with its magical two adds, subtract and 
cmpxchg, is dodgy as well.
  * Kludgy way of interfacing with the rest of the driver instead of 
refactoring to fit (idling, breadcrumbs, scheduler, tasklets, ...).

Now perhaps the latest plan is to ignore all these issues and merge 
anyway, then follow up by throwing it away, mostly or at least largely, 
in which case there isn't really any point in reviewing the current 
state yet again. But it is sad that we got to this point. So just for 
the record - all of this was reviewed in Nov/Dec 2019, by me among other 
folks, and I at least deemed it not ready in this form.

Regards,

Tvrtko


* Re: [Intel-gfx] [RFC PATCH 11/97] drm/i915/guc: Only rely on own CTB size
  2021-05-25  2:47   ` Matthew Brost
@ 2021-05-25 12:48     ` Michal Wajdeczko
  0 siblings, 0 replies; 249+ messages in thread
From: Michal Wajdeczko @ 2021-05-25 12:48 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel; +Cc: jason.ekstrand, daniel.vetter



On 25.05.2021 04:47, Matthew Brost wrote:
> On Thu, May 06, 2021 at 12:13:25PM -0700, Matthew Brost wrote:
>> From: Michal Wajdeczko <michal.wajdeczko@intel.com>
>>
>> In upcoming GuC firmware, CTB size will be removed from the CTB
>> descriptor so we must keep it locally for any calculations.
>>
>> While around, improve some debug messages and helpers.
>>
> 
> desc->size is still used in the patch and really shouldn't be per this

The goal of this patch was to start using the local ctb->size for any 
calculations, not to drop desc->size completely, as at this point 
(GuC 49) it is still part of the CTB protocol.

> comment but a patch later in the series drops it. Seeing as this patch
> and that patch are going to be squashed into a single patch upgrading the
> GuC firmware I think that is ok.

There is no need to squash this patch.

In fact, all CTB patches up to and including 20/97 ("drm/i915/guc:
Introduce unified HXG messages") are compatible with the current GuC 49.0.1
and IMHO *should* be merged separately.

only patches from 21/97 ("drm/i915/guc: Update MMIO based
communication") to 29/97 ("drm/i915/guc: Update firmware to v60.1.2")
must be squashed during merge, as this is where the communication protocol is
changing (from GuC 49 to GuC 60) and we may break existing functionality
(HuC authentication)

Michal

> 
> With that:
> Reviewed-by: Matthew Brost <matthew.brost@intel.com> 
> 
>> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>> ---
>>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 55 +++++++++++++++++------
>>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  2 +
>>  2 files changed, 43 insertions(+), 14 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>> index 4cc8c0b71699..dbece569fbe4 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>> @@ -90,6 +90,24 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc,
>>  	desc->owner = CTB_OWNER_HOST;
>>  }
>>  
>> +static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb, u32 cmds_addr)
>> +{
>> +	guc_ct_buffer_desc_init(ctb->desc, cmds_addr, ctb->size);
>> +}
>> +
>> +static void guc_ct_buffer_init(struct intel_guc_ct_buffer *ctb,
>> +			       struct guc_ct_buffer_desc *desc,
>> +			       u32 *cmds, u32 size)
>> +{
>> +	GEM_BUG_ON(size % 4);
>> +
>> +	ctb->desc = desc;
>> +	ctb->cmds = cmds;
>> +	ctb->size = size;
>> +
>> +	guc_ct_buffer_reset(ctb, 0);
>> +}
>> +
>>  static int guc_action_register_ct_buffer(struct intel_guc *guc,
>>  					 u32 desc_addr,
>>  					 u32 type)
>> @@ -148,7 +166,10 @@ static int ct_deregister_buffer(struct intel_guc_ct *ct, u32 type)
>>  int intel_guc_ct_init(struct intel_guc_ct *ct)
>>  {
>>  	struct intel_guc *guc = ct_to_guc(ct);
>> +	struct guc_ct_buffer_desc *desc;
>> +	u32 blob_size;
>>  	void *blob;
>> +	u32 *cmds;
>>  	int err;
>>  	int i;
>>  
>> @@ -176,19 +197,24 @@ int intel_guc_ct_init(struct intel_guc_ct *ct)
>>  	 * other code will need updating as well.
>>  	 */
>>  
>> -	err = intel_guc_allocate_and_map_vma(guc, PAGE_SIZE, &ct->vma, &blob);
>> +	blob_size = PAGE_SIZE;
>> +	err = intel_guc_allocate_and_map_vma(guc, blob_size, &ct->vma, &blob);
>>  	if (unlikely(err)) {
>> -		CT_ERROR(ct, "Failed to allocate CT channel (err=%d)\n", err);
>> +		CT_PROBE_ERROR(ct, "Failed to allocate %u for CTB data (%pe)\n",
>> +			       blob_size, ERR_PTR(err));
>>  		return err;
>>  	}
>>  
>> -	CT_DEBUG(ct, "vma base=%#x\n", intel_guc_ggtt_offset(guc, ct->vma));
>> +	CT_DEBUG(ct, "base=%#x size=%u\n", intel_guc_ggtt_offset(guc, ct->vma), blob_size);
>>  
>>  	/* store pointers to desc and cmds */
>>  	for (i = 0; i < ARRAY_SIZE(ct->ctbs); i++) {
>>  		GEM_BUG_ON((i !=  CTB_SEND) && (i != CTB_RECV));
>> -		ct->ctbs[i].desc = blob + PAGE_SIZE/4 * i;
>> -		ct->ctbs[i].cmds = blob + PAGE_SIZE/4 * i + PAGE_SIZE/2;
>> +
>> +		desc = blob + PAGE_SIZE / 4 * i;
>> +		cmds = blob + PAGE_SIZE / 4 * i + PAGE_SIZE / 2;
>> +
>> +		guc_ct_buffer_init(&ct->ctbs[i], desc, cmds, PAGE_SIZE / 4);
>>  	}
>>  
>>  	return 0;
>> @@ -217,7 +243,7 @@ void intel_guc_ct_fini(struct intel_guc_ct *ct)
>>  int intel_guc_ct_enable(struct intel_guc_ct *ct)
>>  {
>>  	struct intel_guc *guc = ct_to_guc(ct);
>> -	u32 base, cmds, size;
>> +	u32 base, cmds;
>>  	int err;
>>  	int i;
>>  
>> @@ -232,10 +258,11 @@ int intel_guc_ct_enable(struct intel_guc_ct *ct)
>>  	 */
>>  	for (i = 0; i < ARRAY_SIZE(ct->ctbs); i++) {
>>  		GEM_BUG_ON((i != CTB_SEND) && (i != CTB_RECV));
>> +
>>  		cmds = base + PAGE_SIZE / 4 * i + PAGE_SIZE / 2;
>> -		size = PAGE_SIZE / 4;
>> -		CT_DEBUG(ct, "%d: addr=%#x size=%u\n", i, cmds, size);
>> -		guc_ct_buffer_desc_init(ct->ctbs[i].desc, cmds, size);
>> +		CT_DEBUG(ct, "%d: cmds addr=%#x\n", i, cmds);
>> +
>> +		guc_ct_buffer_reset(&ct->ctbs[i], cmds);
>>  	}
>>  
>>  	/*
>> @@ -259,7 +286,7 @@ int intel_guc_ct_enable(struct intel_guc_ct *ct)
>>  err_deregister:
>>  	ct_deregister_buffer(ct, INTEL_GUC_CT_BUFFER_TYPE_RECV);
>>  err_out:
>> -	CT_PROBE_ERROR(ct, "Failed to open channel (err=%d)\n", err);
>> +	CT_PROBE_ERROR(ct, "Failed to enable CTB (%pe)\n", ERR_PTR(err));
>>  	return err;
>>  }
>>  
>> @@ -314,7 +341,7 @@ static int ct_write(struct intel_guc_ct *ct,
>>  	struct guc_ct_buffer_desc *desc = ctb->desc;
>>  	u32 head = desc->head;
>>  	u32 tail = desc->tail;
>> -	u32 size = desc->size;
>> +	u32 size = ctb->size;
>>  	u32 used;
>>  	u32 header;
>>  	u32 *cmds = ctb->cmds;
>> @@ -323,7 +350,7 @@ static int ct_write(struct intel_guc_ct *ct,
>>  	if (unlikely(desc->is_in_error))
>>  		return -EPIPE;
>>  
>> -	if (unlikely(!IS_ALIGNED(head | tail | size, 4) ||
>> +	if (unlikely(!IS_ALIGNED(head | tail, 4) ||
>>  		     (tail | head) >= size))
>>  		goto corrupted;
>>  
>> @@ -530,7 +557,7 @@ static int ct_read(struct intel_guc_ct *ct, u32 *data)
>>  	struct guc_ct_buffer_desc *desc = ctb->desc;
>>  	u32 head = desc->head;
>>  	u32 tail = desc->tail;
>> -	u32 size = desc->size;
>> +	u32 size = ctb->size;
>>  	u32 *cmds = ctb->cmds;
>>  	s32 available;
>>  	unsigned int len;
>> @@ -539,7 +566,7 @@ static int ct_read(struct intel_guc_ct *ct, u32 *data)
>>  	if (unlikely(desc->is_in_error))
>>  		return -EPIPE;
>>  
>> -	if (unlikely(!IS_ALIGNED(head | tail | size, 4) ||
>> +	if (unlikely(!IS_ALIGNED(head | tail, 4) ||
>>  		     (tail | head) >= size))
>>  		goto corrupted;
>>  
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>> index 494a51a5200f..4009e2dd0de4 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>> @@ -29,10 +29,12 @@ struct intel_guc;
>>   *
>>   * @desc: pointer to the buffer descriptor
>>   * @cmds: pointer to the commands buffer
>> + * @size: size of the commands buffer
>>   */
>>  struct intel_guc_ct_buffer {
>>  	struct guc_ct_buffer_desc *desc;
>>  	u32 *cmds;
>> +	u32 size;
>>  };
>>  
>>  
>> -- 
>> 2.28.0
>>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
> 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 12/97] drm/i915/guc: Don't repeat CTB layout calculations
  2021-05-25  2:53   ` Matthew Brost
@ 2021-05-25 13:07     ` Michal Wajdeczko
  2021-05-25 16:56       ` Matthew Brost
  0 siblings, 1 reply; 249+ messages in thread
From: Michal Wajdeczko @ 2021-05-25 13:07 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel
  Cc: tvrtko.ursulin, jason.ekstrand, daniele.ceraolospurio,
	jon.bloomfield, daniel.vetter, john.c.harrison



On 25.05.2021 04:53, Matthew Brost wrote:
> On Thu, May 06, 2021 at 12:13:26PM -0700, Matthew Brost wrote:
>> From: Michal Wajdeczko <michal.wajdeczko@intel.com>
>>
>> We can retrieve offsets to cmds buffers and descriptor from
>> actual pointers that we already keep locally.
>>
>> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>> ---
>>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 16 ++++++++++------
>>  1 file changed, 10 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>> index dbece569fbe4..fbd6bd20f588 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>> @@ -244,6 +244,7 @@ int intel_guc_ct_enable(struct intel_guc_ct *ct)
>>  {
>>  	struct intel_guc *guc = ct_to_guc(ct);
>>  	u32 base, cmds;
>> +	void *blob;
>>  	int err;
>>  	int i;
>>  
>> @@ -251,15 +252,18 @@ int intel_guc_ct_enable(struct intel_guc_ct *ct)
>>  
>>  	/* vma should be already allocated and map'ed */
>>  	GEM_BUG_ON(!ct->vma);
>> +	GEM_BUG_ON(!i915_gem_object_has_pinned_pages(ct->vma->obj));
> 
> This doesn't really have anything to do with this patch, but again this
> patch will be squashed into a large patch updating the GuC firmware, so
> I think this is fine.

again, no need to squash GuC patches up to 20/97

> 
> With that:
> Reviewed-by: Matthew Brost <matthew.brost@intel.com>
> 
>>  	base = intel_guc_ggtt_offset(guc, ct->vma);
>>  
>> -	/* (re)initialize descriptors
>> -	 * cmds buffers are in the second half of the blob page
>> -	 */
>> +	/* blob should start with send descriptor */
>> +	blob = __px_vaddr(ct->vma->obj);
>> +	GEM_BUG_ON(blob != ct->ctbs[CTB_SEND].desc);
>> +
>> +	/* (re)initialize descriptors */
>>  	for (i = 0; i < ARRAY_SIZE(ct->ctbs); i++) {
>>  		GEM_BUG_ON((i != CTB_SEND) && (i != CTB_RECV));
>>  
>> -		cmds = base + PAGE_SIZE / 4 * i + PAGE_SIZE / 2;
>> +		cmds = base + ptrdiff(ct->ctbs[i].cmds, blob);
>>  		CT_DEBUG(ct, "%d: cmds addr=%#x\n", i, cmds);
>>  
>>  		guc_ct_buffer_reset(&ct->ctbs[i], cmds);
>> @@ -269,12 +273,12 @@ int intel_guc_ct_enable(struct intel_guc_ct *ct)
>>  	 * Register both CT buffers starting with RECV buffer.
>>  	 * Descriptors are in first half of the blob.
>>  	 */
>> -	err = ct_register_buffer(ct, base + PAGE_SIZE / 4 * CTB_RECV,
>> +	err = ct_register_buffer(ct, base + ptrdiff(ct->ctbs[CTB_RECV].desc, blob),
>>  				 INTEL_GUC_CT_BUFFER_TYPE_RECV);
>>  	if (unlikely(err))
>>  		goto err_out;
>>  
>> -	err = ct_register_buffer(ct, base + PAGE_SIZE / 4 * CTB_SEND,
>> +	err = ct_register_buffer(ct, base + ptrdiff(ct->ctbs[CTB_SEND].desc, blob),
>>  				 INTEL_GUC_CT_BUFFER_TYPE_SEND);
>>  	if (unlikely(err))
>>  		goto err_deregister;
>> -- 
>> 2.28.0
>>

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 37/97] drm/i915/guc: Add stall timer to non blocking CTB send function
  2021-05-24 18:35     ` Matthew Brost
@ 2021-05-25 14:15       ` Michal Wajdeczko
  2021-05-25 16:54         ` Matthew Brost
  0 siblings, 1 reply; 249+ messages in thread
From: Michal Wajdeczko @ 2021-05-25 14:15 UTC (permalink / raw)
  To: Matthew Brost
  Cc: tvrtko.ursulin, intel-gfx, dri-devel, jason.ekstrand,
	daniele.ceraolospurio, jon.bloomfield, daniel.vetter,
	john.c.harrison



On 24.05.2021 20:35, Matthew Brost wrote:
> On Mon, May 24, 2021 at 02:58:12PM +0200, Michal Wajdeczko wrote:
>>
>>
>> On 06.05.2021 21:13, Matthew Brost wrote:
>>> Implement a stall timer which fails H2G CTBs once a period of time
>>> with no forward progress is reached to prevent deadlock.
>>>
>>> Also update to ct_write to return -EDEADLK rather than -EPIPE on a
>>> corrupted descriptor.
>>
>> broken descriptor is really separate issue compared to no progress from
>> GuC side, I would really like to keep old error code
>>
> 
> I know you do as you have brought it up several times. Again to the rest
> of the stack these two things mean the exact same thing.

but I guess 'the rest of the stack' is only interested in whether the
returned error is EBUSY or not, as all other errors are treated in the
same way, thus no need to change existing error codes

>  
>> note that broken CTB descriptor is unrecoverable error, while on other
>> hand, in theory, we could recover from temporary non-moving CTB
>>
> 
> Yea but we don't, in both cases we disable submission which in turn
> triggers a full GPU reset.

is this a current limitation or the long term design?

I would rather expect the decision to trigger a full GPU reset to be made
on solid foundations, like definitely lost communication with the GuC or
missed heartbeats, not the fact that we temporarily pushed the CTB to its
limit

or, if we do want to treat CTB processing as a kind of HW health check
too, what if the heartbeat timeout and the CTB stall time do not match?

> 
>>>
>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>> ---
>>>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 48 +++++++++++++++++++++--
>>>  1 file changed, 45 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> index af7314d45a78..4eab319d61be 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> @@ -69,6 +69,8 @@ static inline struct drm_device *ct_to_drm(struct intel_guc_ct *ct)
>>>  #define CTB_H2G_BUFFER_SIZE	(SZ_4K)
>>>  #define CTB_G2H_BUFFER_SIZE	(SZ_4K)
>>>  
>>> +#define MAX_US_STALL_CTB	1000000
>>
>> nit: maybe we should make it a CONFIG value ?
>>
> 
> Sure.
> 
>>> +
>>>  struct ct_request {
>>>  	struct list_head link;
>>>  	u32 fence;
>>> @@ -315,6 +317,7 @@ int intel_guc_ct_enable(struct intel_guc_ct *ct)
>>>  
>>>  	ct->requests.last_fence = 1;
>>>  	ct->enabled = true;
>>> +	ct->stall_time = KTIME_MAX;
>>>  
>>>  	return 0;
>>>  
>>> @@ -378,7 +381,7 @@ static int ct_write(struct intel_guc_ct *ct,
>>>  	unsigned int i;
>>>  
>>>  	if (unlikely(ctb->broken))
>>> -		return -EPIPE;
>>> +		return -EDEADLK;
>>>  
>>>  	if (unlikely(desc->status))
>>>  		goto corrupted;
>>> @@ -449,7 +452,7 @@ static int ct_write(struct intel_guc_ct *ct,
>>>  	CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u status=%#x\n",
>>>  		 desc->head, desc->tail, desc->status);
>>>  	ctb->broken = true;
>>> -	return -EPIPE;
>>> +	return -EDEADLK;
>>>  }
>>>  
>>>  /**
>>> @@ -494,6 +497,17 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
>>>  	return err;
>>>  }
>>>  
>>> +static inline bool ct_deadlocked(struct intel_guc_ct *ct)
>>> +{
>>> +	bool ret = ktime_us_delta(ktime_get(), ct->stall_time) >
>>> +		MAX_US_STALL_CTB;
>>> +
>>> +	if (unlikely(ret))
>>> +		CT_ERROR(ct, "CT deadlocked\n");
>>> +
>>> +	return ret;
>>> +}
>>> +
>>>  static inline bool ctb_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
>>>  {
>>>  	struct guc_ct_buffer_desc *desc = ctb->desc;
>>> @@ -505,6 +519,26 @@ static inline bool ctb_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
>>>  	return space >= len_dw;
>>>  }
>>>  
>>> +static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
>>> +{
>>> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>>> +
>>> +	lockdep_assert_held(&ct->ctbs.send.lock);
>>> +
>>> +	if (unlikely(!ctb_has_room(ctb, len_dw))) {
>>> +		if (ct->stall_time == KTIME_MAX)
>>> +			ct->stall_time = ktime_get();
>>> +
>>> +		if (unlikely(ct_deadlocked(ct)))
>>> +			return -EDEADLK;
>>> +		else
>>> +			return -EBUSY;
>>> +	}
>>> +
>>> +	ct->stall_time = KTIME_MAX;
>>> +	return 0;
>>> +}
>>> +
>>>  static int ct_send_nb(struct intel_guc_ct *ct,
>>>  		      const u32 *action,
>>>  		      u32 len,
>>> @@ -517,7 +551,7 @@ static int ct_send_nb(struct intel_guc_ct *ct,
>>>  
>>>  	spin_lock_irqsave(&ctb->lock, spin_flags);
>>>  
>>> -	ret = ctb_has_room(ctb, len + 1);
>>> +	ret = has_room_nb(ct, len + 1);
>>>  	if (unlikely(ret))
>>>  		goto out;
>>>  
>>> @@ -561,11 +595,19 @@ static int ct_send(struct intel_guc_ct *ct,
>>>  retry:
>>>  	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
>>>  	if (unlikely(!ctb_has_room(ctb, len + 1))) {
>>> +		if (ct->stall_time == KTIME_MAX)
>>> +			ct->stall_time = ktime_get();
>>>  		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
>>> +
>>> +		if (unlikely(ct_deadlocked(ct)))
>>> +			return -EDEADLK;
>>> +
>>
>> likely, instead of duplicating code, you can reuse has_room_nb here
>>
> 
> In this patch yes, in the following patch no as this check changes
> between non-blockig and blocking once we introduce G2H credits. I'd
> rather just leave it as is than churning on the patches.
> 
> Matt 
>  
>>>  		cond_resched();
>>>  		goto retry;
>>>  	}
>>>  
>>> +	ct->stall_time = KTIME_MAX;
>>> +
>>>  	fence = ct_get_next_fence(ct);
>>>  	request.fence = fence;
>>>  	request.status = 0;
>>>

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 00/97] Basic GuC submission support in the i915
  2021-05-14 16:31                 ` Jason Ekstrand
@ 2021-05-25 15:37                   ` Alex Deucher
  0 siblings, 0 replies; 249+ messages in thread
From: Alex Deucher @ 2021-05-25 15:37 UTC (permalink / raw)
  To: Jason Ekstrand
  Cc: Matthew Brost, Ursulin, Tvrtko, intel-gfx, dri-devel, Ekstrand,
	Jason, Ceraolo Spurio, Daniele, Bloomfield, Jon, Vetter, Daniel,
	Harrison, John C

On Fri, May 14, 2021 at 12:31 PM Jason Ekstrand <jason@jlekstrand.net> wrote:
>
> Pulling a few threads together...
>
> On Mon, May 10, 2021 at 1:39 PM Francisco Jerez <currojerez@riseup.net> wrote:
> >
> > I agree with Martin on this.  Given that using GuC currently involves
> > making your open-source graphics stack rely on a closed-source
> > cryptographically-protected blob in order to submit commands to the GPU,
> > and given that it is /still/ possible to use the GPU without it, I'd
> > expect some strong material justification for making the switch (like,
> > it improves performance of test-case X and Y by Z%, or, we're truly
> > sorry but we cannot program your GPU anymore with a purely open-source
> > software stack).  Any argument based on the apparent direction of the
> > wind doesn't sound like a material engineering reason to me, and runs
> > the risk of being self-fulfilling if it leads us to do the worse thing
> > for our users just because we have the vague feeling that it is the
> > general trend, even though we may have had the means to obtain a better
> > compromise for them.
>
> I think it's important to distinguish between landing code to support
> GuC submission and requiring it in order to use the GPU.  We've got
> the execlist back-end and it's not going anywhere, at least not for
> older hardware, and it will likely keep working as long as execlists
> remain in the hardware.  What's being proposed here is a new back-end
> which, yes, depends on firmware and can be used for more features.
>
> I'm well aware of the slippery slope argument that's implicitly being
> used here even if no one is actually saying it:  If we land GuC
> support in i915 in any form then Intel HW engineers will say "See,
> Linux supports GuC now; we can rip out execlists" and we'll end up in
> the dystopia of closed-source firmware.  If upstream continues to push
> back on GuC in any form then they'll be forced to keep execlists.
> I'll freely admit that there is probably some truth to this.  However,
> I really doubt that it's going to work long-term.  If the HW
> architects are determined enough to rip it out, they will.

You want to stay on the same interfaces as Windows does, like it or
not.  The market is bigger and there is a lot more validation effort.
Even if support for the old way doesn't go away, it won't be as well
tested.  For AMD, we tried to stay on some of the older interfaces on
a number of products in the past and ran into lots of subtle issues,
especially around power management related things like clock and power
gating.  There are just too many handshakes and stuff required to make
all of that work smoothly.  It can be especially challenging when the
issues show up well after launch and the firmware and hardware teams
have already moved on to the next projects and have to page the older
projects back into their minds.

Alex


>
> If GuC is really inevitable, then it's in our best interests to land
> at least beta support earlier.  There are a lot of questions that
> people have brought up around back-ports, dealing with stable kernels,
> stability concerns, etc.  The best way to sort those out is to land
> the code and start dealing with the issues.  We can't front-load
> solving every possible issue or the code will never land.  But maybe
> that's people's actual objective?
>
>
> On Wed, May 12, 2021 at 1:26 AM Martin Peres <martin.peres@free.fr> wrote:
> >
> > On 11/05/2021 19:39, Matthew Brost wrote:
> > > On Tue, May 11, 2021 at 08:26:59AM -0700, Bloomfield, Jon wrote:
> > >>> On 10/05/2021 19:33, Daniel Vetter wrote:
> > >>>> On Mon, May 10, 2021 at 3:55 PM Martin Peres <martin.peres@free.fr>
> > >>> wrote:
> > >>>
> > >>> However, if the GuC is actually helping i915, then why not open source
> > >>> it and drop all the issues related to its stability? Wouldn't it be the
> > >>> perfect solution, as it would allow dropping execlist support for newer
> > >>> HW, and it would eliminate the concerns about maintenance of stable
> > >>> releases of Linux?
>
> I would like to see that happen.  I know there was some chatter about
> it for a while and then the discussions got killed.  I'm not sure what
> happened, to be honest.  However, I don't think we can make any
> guarantees or assumptions there, I'm afraid. :-(
>
> > >> That the major version of the FW is high is not due to bugs - Bugs don't trigger major version bumps anyway.
> >
> > Of course, where did I say they would?
>
> I think the concern here is that old kernels will require old major
> GuC versions because interfaces won't be backwards-compatible and then
> those kernels won't get bug fixes.  That's a legitimate concern.
> Given the Linux usage model, I think it's fair to require either
> backwards-compatibility with GuC interfaces and validation of that
> backwards-compatibility or stable releases with bug fixes for a good
> long while.  I honestly can't say whether or not we've really scoped
> that.  Jon?
>
> > >> We have been using GuC as the sole mechanism for submission on Windows since Gen8, and it has proven very reliable. This is in large part because it is simple, and designed from day 1 as a cohesive solution alongside the hardware.
>
> There are going to be differences in the usage patterns that i915 and
> Windows will hit when it comes to the subtle details of how we bang on
> the GuC rings.  Those will likely lead to bugs on Linux that don't
> show up on Windows so "it works on Windows" doesn't mean we're headed
> for a bug-free future.  It means we have an existence proof that
> firmware-based submission can be very reliable.  However, I don't
> think anyone on this thread is really questioning that.
>
> > Exactly, the GuC was designed with Windows' GPU model... which is not
> > directly applicable to Linux. Also, Windows does not care as much about
> > submission latency, whereas most Linux users still depend on glamor for
> > 2D acceleration which is pretty much the biggest stress test for command
> > submission latency. Also, features not used by the Windows driver or
> > used in a different way are/will get broken (see the semaphore patch
> > that works around it).
>
> I'm not nearly as deep into benchmarking the delta as you are so I
> won't contradict anything said directly.  However, I think it's worth
> pointing out a few things:
>
> There isn't really a Windows GPU model.  There's a different
> submission model with Win10 vs. Win7 and Linux looks a lot more like
> Win7.  I really want Linux to start looking like Win10 at which point
> they'll be using roughly the same "GPU model".  There are other OS
> differences that matter here such as Windows' substantially higher
> interrupt handling latency which GuC theoretically works around.
> However, I don't think it's fair to say that the way Linux wants to
> program the GPU for command submission is substantially different from
> Windows due to userspace software differences.
>
> There are significant differences in terms of dma_fence handling and
> implicit synchronization.  However, as has already been mentioned,
> those will be handled by drm/scheduler with GuC as a back-end that
> manages load-balancing.  And, yes, there will be Linux-specific bugs
> (see above) but they're not because of a fundamentally different
> model.
>
> One other thing worth mentioning, which doesn't seem to fit anywhere:
> If we really care about keeping execlists working for the upcoming
> use-cases, it needs major work.  It's currently way too deeply tied
> with i915_sw_fence so it can't handle long-running compute batches
> without breaking dma-fence rules.  The way it handles bonded submit is
> a bolt-on that doesn't actually provide the guarantees that userspace
> needs.  It should also probably be re-architected to use drm/scheduler
> for dma_fence and look a lot more like GuC on the inside.
>
> The point of bringing this up is that I'm seeing a lot more execlist
> love than I think it deserves. :-)  It may be free software but that
> doesn't mean it's good software. :-P  To be clear, I don't mean to
> unduly insult Chris or any of the other people who have worked on it.
> It works and it's perfectly functional for supporting all the good ol'
> use-cases us desktop Linux people are used to.  But the ways in which
> it would have to change in order to handle the future are substantial.
>
> --Jason

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 17/97] drm/i915/guc: Stop using mutex while sending CTB messages
  2021-05-06 19:13 ` [RFC PATCH 17/97] drm/i915/guc: Stop using mutex while sending CTB messages Matthew Brost
@ 2021-05-25 16:14   ` Matthew Brost
  0 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-25 16:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: tvrtko.ursulin, daniele.ceraolospurio, jason.ekstrand,
	jon.bloomfield, daniel.vetter, john.c.harrison

On Thu, May 06, 2021 at 12:13:31PM -0700, Matthew Brost wrote:
> From: Michal Wajdeczko <michal.wajdeczko@intel.com>
> 
> We are no longer using descriptor to hold G2H replies and we are
> protecting access to the descriptor and command buffer by the
> separate spinlock, so we can stop using mutex.
> 
> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 4 ----
>  1 file changed, 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index bee0958d8bae..cb58fa7f970c 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -537,7 +537,6 @@ static int ct_send(struct intel_guc_ct *ct,
>  int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
>  		      u32 *response_buf, u32 response_buf_size)
>  {
> -	struct intel_guc *guc = ct_to_guc(ct);
>  	u32 status = ~0; /* undefined */
>  	int ret;
>  
> @@ -546,8 +545,6 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
>  		return -ENODEV;
>  	}
>  
> -	mutex_lock(&guc->send_mutex);
> -
>  	ret = ct_send(ct, action, len, response_buf, response_buf_size, &status);
>  	if (unlikely(ret < 0)) {
>  		CT_ERROR(ct, "Sending action %#x failed (err=%d status=%#X)\n",
> @@ -557,7 +554,6 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
>  			 action[0], ret, ret);
>  	}
>  
> -	mutex_unlock(&guc->send_mutex);
>  	return ret;
>  }
>  
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 00/97] Basic GuC submission support in the i915
  2021-05-25 10:32 ` Tvrtko Ursulin
@ 2021-05-25 16:45   ` Matthew Brost
  2021-06-02 15:27     ` Tvrtko Ursulin
  0 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-25 16:45 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: jason.ekstrand, daniel.vetter, intel-gfx, dri-devel

On Tue, May 25, 2021 at 11:32:26AM +0100, Tvrtko Ursulin wrote:
> 
> On 06/05/2021 20:13, Matthew Brost wrote:
> > Basic GuC submission support. This is the first bullet point in the
> > upstreaming plan covered in the following RFC [1].
> > 
> > At a very high level the GuC is a piece of firmware which sits between
> > the i915 and the GPU. It offloads some of the scheduling of contexts
> > from the i915 and programs the GPU to submit contexts. The i915
> > communicates with the GuC and the GuC communicates with the GPU.
> > 
> > GuC submission will be disabled by default on all current upstream
> > platforms behind a module parameter - enable_guc. A value of 3 will
> > enable submission and HuC loading via the GuC. GuC submission should
> > work on all gen11+ platforms assuming the GuC firmware is present.
> > 
> > This is a huge series and it is completely unrealistic to merge all of
> > these patches at once. Fortunately I believe we can break down the
> > series into different merges:
> > 
> > 1. Merge Chris Wilson's patches. These have already been reviewed
> > upstream and I fully agree with these patches as a precursor to GuC
> > submission.
> > 
> > 2. Update to GuC 60.1.2. These are largely Michal's patches.
> > 
> > 3. Turn on GuC/HuC auto mode by default.
> > 
> > 4. Additional patches needed to support GuC submission. This is any
> > patch not covered by 1-3 in the first 34 patches. e.g. 'Engine relative
> > MMIO'
> > 
> > 5. GuC submission support. Patches number 35+. These all don't have to
> > merge at once though as we don't actually allow GuC submission until the
> > last patch of this series.
> 
> For the GuC backend/submission part only - it seems to me none of my review
> comments I made in December 2019 have been implemented. At that point I

I wouldn't say none of the fixes have been done; lots have, just not
everything you wanted.

> stated, and this was all internally at the time mind you, that I do not
> think the series is ready and there were several high level issues that
> would need to be sorted out. I don't think I gave my ack or r-b back then
> and the promise was a few things would be worked on post (internal) merge.
> That was supposed to include upstream refactoring to enable GuC better
> slotting in as a backend. Fast forward a year and a half later and the only
> progress we had in this area has been deleted.
> 
> From the top of my head, and having glanced the series as posted:
> 
>  * Self-churn factor in the series is too high.

Not sure what you mean by this? The patches have been reworked
internally too much?

>  * Patch ordering issues.

We are going to clean up some of the ordering as these 97 patches are
posted in smaller mergeable series but at the end of the day this is a
bit of a bikeshed. GuC submission can't be turned on until patch 97, so IMO
the order in which the patches before that land really isn't that big of a
deal, as we are not breaking anything.

>  * GuC context state machine is way too dodgy to have any confidence it can
> be read and race conditions understood.

I know you don't really like the state machine, but without it there is no
real way to avoid a DoS on resources and no real way to fairly distribute
guc_ids. I know you have had other suggestions here, but none of them either
work or end up any less complicated.

For what it is worth, the state machine will get simplified when we hook
into the DRM scheduler as we won't have to deal with submitting from IRQ
contexts in the backend or having more than 1 request in the backend at
a time.

>  * Context pinning code with its magical two adds, subtract and cmpxchg is
> dodgy as well.

Daniele tried to remove this and it proved quite difficult + created
even more races in the backend code. This was prior to the pre-pin and
post-unpin code which makes this even more difficult to fix as I believe
these functions would need to be removed first. Not saying we can't
revisit this someday but I personally really like it - it is a clever
way to avoid reentering the pin / unpin code while asynchronous things
are happening rather than some complex locking scheme. Lastly, this code
has proved incredibly stable as I don't think we've had to fix a single
thing in this area since we've been using this code internally.

>  * Kludgy way of interfacing with rest of the driver instead of refactoring
> to fit (idling, breadcrumbs, scheduler, tasklets, ...).
>

Idling and breadcrumbs seem clean to me. Scheduler + tasklet are going
away once the DRM scheduler lands. No need to rework those as we are just
going to rework them again.
 
> Now perhaps the latest plan is to ignore all these issues and still merge,
> then follow up with throwing it away, mostly or at least largely, in which
> case there isn't any point really to review the current state yet again. But
> it is sad that we got to this state. So just for the record - all this was
> reviewed in Nov/Dec 2019. By me among other folks and I at least deemed it
> not ready in this form.
> 

I personally don't think it is really in that bad of shape. The fact
that I could put together a PoC more or less fully integrating this
backend into the DRM scheduler within a few days I think speaks to the
quality and flexibility of this backend compared to execlists.

Matt 

> Regards,
> 
> Tvrtko

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 37/97] drm/i915/guc: Add stall timer to non blocking CTB send function
  2021-05-25 14:15       ` Michal Wajdeczko
@ 2021-05-25 16:54         ` Matthew Brost
  0 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-25 16:54 UTC (permalink / raw)
  To: Michal Wajdeczko
  Cc: tvrtko.ursulin, intel-gfx, dri-devel, jason.ekstrand,
	daniele.ceraolospurio, jon.bloomfield, daniel.vetter,
	john.c.harrison

On Tue, May 25, 2021 at 04:15:15PM +0200, Michal Wajdeczko wrote:
> 
> 
> On 24.05.2021 20:35, Matthew Brost wrote:
> > On Mon, May 24, 2021 at 02:58:12PM +0200, Michal Wajdeczko wrote:
> >>
> >>
> >> On 06.05.2021 21:13, Matthew Brost wrote:
> >>> Implement a stall timer which fails H2G CTBs once a period of time
> >>> with no forward progress is reached to prevent deadlock.
> >>>
> >>> Also update to ct_write to return -EDEADLK rather than -EPIPE on a
> >>> corrupted descriptor.
> >>
> >> broken descriptor is really separate issue compared to no progress from
> >> GuC side, I would really like to keep old error code
> >>
> > 
> > I know you do as you have brought it up several times. Again to the rest
> > of the stack these two things mean the exact same thing.
> 
> but I guess 'the rest of the stack' is only interested in whether the
> returned error is EBUSY or not, as all other errors are treated in the
> same way, thus no need to change existing error codes
> 

No, -ENODEV means something else too. I'd rather have a single explicit
error code which means the H2G channel is broken and action needs to be
taken to fix it. This is a bikeshed; we made a decision internally to
return a single error code and we really need to move on.

> >  
> >> note that broken CTB descriptor is unrecoverable error, while on other
> >> hand, in theory, we could recover from temporary non-moving CTB
> >>
> > 
> > Yea but we don't, in both cases we disable submission which in turn
> > triggers a full GPU reset.
> 
> is this a current limitation or the long term design?
> 

Long term design.

> I would rather expect the decision to trigger a full GPU reset to be made
> on solid foundations, like definitely lost communication with the GuC or
> missed heartbeats, not the fact that we temporarily pushed the CTB to its
> limit
> 

The intent is to have a large enough value here that if it is reached it
is assumed the GuC is toast and we need a full GPU reset.

> or, if we do want to treat CTB processing as a kind of HW health check
> too, what if the heartbeat timeout and the CTB stall time do not match?
>

It is a health check of the H2G channel.

No need for these two values to match. One is checking whether the GuC can
continue to make forward progress processing H2G messages; the other is
checking whether an engine can make forward progress.

Matt
 
> > 
> >>>
> >>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> >>> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> >>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> >>> ---
> >>>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 48 +++++++++++++++++++++--
> >>>  1 file changed, 45 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> >>> index af7314d45a78..4eab319d61be 100644
> >>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> >>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> >>> @@ -69,6 +69,8 @@ static inline struct drm_device *ct_to_drm(struct intel_guc_ct *ct)
> >>>  #define CTB_H2G_BUFFER_SIZE	(SZ_4K)
> >>>  #define CTB_G2H_BUFFER_SIZE	(SZ_4K)
> >>>  
> >>> +#define MAX_US_STALL_CTB	1000000
> >>
> >> nit: maybe we should make it a CONFIG value ?
> >>
> > 
> > Sure.
> > 
> >>> +
> >>>  struct ct_request {
> >>>  	struct list_head link;
> >>>  	u32 fence;
> >>> @@ -315,6 +317,7 @@ int intel_guc_ct_enable(struct intel_guc_ct *ct)
> >>>  
> >>>  	ct->requests.last_fence = 1;
> >>>  	ct->enabled = true;
> >>> +	ct->stall_time = KTIME_MAX;
> >>>  
> >>>  	return 0;
> >>>  
> >>> @@ -378,7 +381,7 @@ static int ct_write(struct intel_guc_ct *ct,
> >>>  	unsigned int i;
> >>>  
> >>>  	if (unlikely(ctb->broken))
> >>> -		return -EPIPE;
> >>> +		return -EDEADLK;
> >>>  
> >>>  	if (unlikely(desc->status))
> >>>  		goto corrupted;
> >>> @@ -449,7 +452,7 @@ static int ct_write(struct intel_guc_ct *ct,
> >>>  	CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u status=%#x\n",
> >>>  		 desc->head, desc->tail, desc->status);
> >>>  	ctb->broken = true;
> >>> -	return -EPIPE;
> >>> +	return -EDEADLK;
> >>>  }
> >>>  
> >>>  /**
> >>> @@ -494,6 +497,17 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
> >>>  	return err;
> >>>  }
> >>>  
> >>> +static inline bool ct_deadlocked(struct intel_guc_ct *ct)
> >>> +{
> >>> +	bool ret = ktime_us_delta(ktime_get(), ct->stall_time) >
> >>> +		MAX_US_STALL_CTB;
> >>> +
> >>> +	if (unlikely(ret))
> >>> +		CT_ERROR(ct, "CT deadlocked\n");
> >>> +
> >>> +	return ret;
> >>> +}
> >>> +
> >>>  static inline bool ctb_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
> >>>  {
> >>>  	struct guc_ct_buffer_desc *desc = ctb->desc;
> >>> @@ -505,6 +519,26 @@ static inline bool ctb_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
> >>>  	return space >= len_dw;
> >>>  }
> >>>  
> >>> +static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
> >>> +{
> >>> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> >>> +
> >>> +	lockdep_assert_held(&ct->ctbs.send.lock);
> >>> +
> >>> +	if (unlikely(!ctb_has_room(ctb, len_dw))) {
> >>> +		if (ct->stall_time == KTIME_MAX)
> >>> +			ct->stall_time = ktime_get();
> >>> +
> >>> +		if (unlikely(ct_deadlocked(ct)))
> >>> +			return -EDEADLK;
> >>> +		else
> >>> +			return -EBUSY;
> >>> +	}
> >>> +
> >>> +	ct->stall_time = KTIME_MAX;
> >>> +	return 0;
> >>> +}
> >>> +
> >>>  static int ct_send_nb(struct intel_guc_ct *ct,
> >>>  		      const u32 *action,
> >>>  		      u32 len,
> >>> @@ -517,7 +551,7 @@ static int ct_send_nb(struct intel_guc_ct *ct,
> >>>  
> >>>  	spin_lock_irqsave(&ctb->lock, spin_flags);
> >>>  
> >>> -	ret = ctb_has_room(ctb, len + 1);
> >>> +	ret = has_room_nb(ct, len + 1);
> >>>  	if (unlikely(ret))
> >>>  		goto out;
> >>>  
> >>> @@ -561,11 +595,19 @@ static int ct_send(struct intel_guc_ct *ct,
> >>>  retry:
> >>>  	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
> >>>  	if (unlikely(!ctb_has_room(ctb, len + 1))) {
> >>> +		if (ct->stall_time == KTIME_MAX)
> >>> +			ct->stall_time = ktime_get();
> >>>  		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> >>> +
> >>> +		if (unlikely(ct_deadlocked(ct)))
> >>> +			return -EDEADLK;
> >>> +
> >>
> >> likely, instead of duplicating code, you can reuse has_room_nb here
> >>
> > 
> > In this patch yes, in the following patch no as this check changes
> > between non-blockig and blocking once we introduce G2H credits. I'd
> > rather just leave it as is than churning on the patches.
> > 
> > Matt 
> >  
> >>>  		cond_resched();
> >>>  		goto retry;
> >>>  	}
> >>>  
> >>> +	ct->stall_time = KTIME_MAX;
> >>> +
> >>>  	fence = ct_get_next_fence(ct);
> >>>  	request.fence = fence;
> >>>  	request.status = 0;
> >>>

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 12/97] drm/i915/guc: Don't repeat CTB layout calculations
  2021-05-25 13:07     ` Michal Wajdeczko
@ 2021-05-25 16:56       ` Matthew Brost
  0 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-25 16:56 UTC (permalink / raw)
  To: Michal Wajdeczko
  Cc: tvrtko.ursulin, intel-gfx, dri-devel, jason.ekstrand,
	daniele.ceraolospurio, jon.bloomfield, daniel.vetter,
	john.c.harrison

On Tue, May 25, 2021 at 03:07:06PM +0200, Michal Wajdeczko wrote:
> 
> 
> On 25.05.2021 04:53, Matthew Brost wrote:
> > On Thu, May 06, 2021 at 12:13:26PM -0700, Matthew Brost wrote:
> >> From: Michal Wajdeczko <michal.wajdeczko@intel.com>
> >>
> >> We can retrieve offsets to cmds buffers and descriptor from
> >> actual pointers that we already keep locally.
> >>
> >> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
> >> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> >> ---
> >>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 16 ++++++++++------
> >>  1 file changed, 10 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> >> index dbece569fbe4..fbd6bd20f588 100644
> >> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> >> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> >> @@ -244,6 +244,7 @@ int intel_guc_ct_enable(struct intel_guc_ct *ct)
> >>  {
> >>  	struct intel_guc *guc = ct_to_guc(ct);
> >>  	u32 base, cmds;
> >> +	void *blob;
> >>  	int err;
> >>  	int i;
> >>  
> >> @@ -251,15 +252,18 @@ int intel_guc_ct_enable(struct intel_guc_ct *ct)
> >>  
> >>  	/* vma should be already allocated and map'ed */
> >>  	GEM_BUG_ON(!ct->vma);
> >> +	GEM_BUG_ON(!i915_gem_object_has_pinned_pages(ct->vma->obj));
> > 
> > This doesn't really have anything to do with this patch, but again this
> > patch will be squashed into a large patch updating the GuC firmware, so
> > I think this is fine.
> 
> again, no need to squash GuC patches up to 20/97
> 

Got it. As discussed I will post patches 4-20 after I'm done reviewing all
of them.

Matt 

> > 
> > With that:
> > Reviewed-by: Matthew Brost <matthew.brost@intel.com>
> > 
> >>  	base = intel_guc_ggtt_offset(guc, ct->vma);
> >>  
> >> -	/* (re)initialize descriptors
> >> -	 * cmds buffers are in the second half of the blob page
> >> -	 */
> >> +	/* blob should start with send descriptor */
> >> +	blob = __px_vaddr(ct->vma->obj);
> >> +	GEM_BUG_ON(blob != ct->ctbs[CTB_SEND].desc);
> >> +
> >> +	/* (re)initialize descriptors */
> >>  	for (i = 0; i < ARRAY_SIZE(ct->ctbs); i++) {
> >>  		GEM_BUG_ON((i != CTB_SEND) && (i != CTB_RECV));
> >>  
> >> -		cmds = base + PAGE_SIZE / 4 * i + PAGE_SIZE / 2;
> >> +		cmds = base + ptrdiff(ct->ctbs[i].cmds, blob);
> >>  		CT_DEBUG(ct, "%d: cmds addr=%#x\n", i, cmds);
> >>  
> >>  		guc_ct_buffer_reset(&ct->ctbs[i], cmds);
> >> @@ -269,12 +273,12 @@ int intel_guc_ct_enable(struct intel_guc_ct *ct)
> >>  	 * Register both CT buffers starting with RECV buffer.
> >>  	 * Descriptors are in first half of the blob.
> >>  	 */
> >> -	err = ct_register_buffer(ct, base + PAGE_SIZE / 4 * CTB_RECV,
> >> +	err = ct_register_buffer(ct, base + ptrdiff(ct->ctbs[CTB_RECV].desc, blob),
> >>  				 INTEL_GUC_CT_BUFFER_TYPE_RECV);
> >>  	if (unlikely(err))
> >>  		goto err_out;
> >>  
> >> -	err = ct_register_buffer(ct, base + PAGE_SIZE / 4 * CTB_SEND,
> >> +	err = ct_register_buffer(ct, base + ptrdiff(ct->ctbs[CTB_SEND].desc, blob),
> >>  				 INTEL_GUC_CT_BUFFER_TYPE_SEND);
> >>  	if (unlikely(err))
> >>  		goto err_deregister;
> >> -- 
> >> 2.28.0
> >>

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 53/97] drm/i915/guc: Disable semaphores when using GuC scheduling
  2021-05-25  9:52   ` [Intel-gfx] " Tvrtko Ursulin
@ 2021-05-25 17:01     ` Matthew Brost
  2021-05-26  9:25       ` Tvrtko Ursulin
  0 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-25 17:01 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: jason.ekstrand, daniel.vetter, intel-gfx, dri-devel

On Tue, May 25, 2021 at 10:52:01AM +0100, Tvrtko Ursulin wrote:
> 
> On 06/05/2021 20:14, Matthew Brost wrote:
> > Disable semaphores when using GuC scheduling as semaphores are broken in
> > the current GuC firmware.
> 
> What is "current"? Given that the patch itself is like year and a half old.
> 

Stale comment. Semaphores work with the firmware; we just haven't enabled
them in the i915 with GuC submission, as this is an optimization and not
required for functionality.

Matt

> Regards,
> 
> Tvrtko
> 
> > Cc: John Harrison <john.c.harrison@intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >   drivers/gpu/drm/i915/gem/i915_gem_context.c | 6 ++++--
> >   1 file changed, 4 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > index 993faa213b41..d30260ffe2a7 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > @@ -230,7 +230,8 @@ static void intel_context_set_gem(struct intel_context *ce,
> >   		ce->timeline = intel_timeline_get(ctx->timeline);
> >   	if (ctx->sched.priority >= I915_PRIORITY_NORMAL &&
> > -	    intel_engine_has_timeslices(ce->engine))
> > +	    intel_engine_has_timeslices(ce->engine) &&
> > +	    intel_engine_has_semaphores(ce->engine))
> >   		__set_bit(CONTEXT_USE_SEMAPHORES, &ce->flags);
> >   	intel_context_set_watchdog_us(ce, ctx->watchdog.timeout_us);
> > @@ -1939,7 +1940,8 @@ static int __apply_priority(struct intel_context *ce, void *arg)
> >   	if (!intel_engine_has_timeslices(ce->engine))
> >   		return 0;
> > -	if (ctx->sched.priority >= I915_PRIORITY_NORMAL)
> > +	if (ctx->sched.priority >= I915_PRIORITY_NORMAL &&
> > +	    intel_engine_has_semaphores(ce->engine))
> >   		intel_context_set_use_semaphores(ce);
> >   	else
> >   		intel_context_clear_use_semaphores(ce);
> > 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 55/97] drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC
  2021-05-25 10:06   ` [Intel-gfx] " Tvrtko Ursulin
@ 2021-05-25 17:07     ` Matthew Brost
  2021-05-26  9:21       ` Tvrtko Ursulin
  0 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-25 17:07 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: jason.ekstrand, daniel.vetter, intel-gfx, dri-devel

On Tue, May 25, 2021 at 11:06:00AM +0100, Tvrtko Ursulin wrote:
> 
> On 06/05/2021 20:14, Matthew Brost wrote:
> > When running the GuC the GPU can't be considered idle if the GuC still
> > has contexts pinned. As such, a call has been added in
> > intel_gt_wait_for_idle to idle the UC and in turn the GuC by waiting for
> > the number of unpinned contexts to go to zero.
> > 
> > Cc: John Harrison <john.c.harrison@intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >   drivers/gpu/drm/i915/gem/i915_gem_mman.c      |  3 +-
> >   drivers/gpu/drm/i915/gt/intel_gt.c            | 18 ++++
> >   drivers/gpu/drm/i915/gt/intel_gt.h            |  2 +
> >   drivers/gpu/drm/i915/gt/intel_gt_requests.c   | 22 ++---
> >   drivers/gpu/drm/i915/gt/intel_gt_requests.h   |  7 +-
> >   drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  4 +
> >   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |  1 +
> >   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h     |  4 +
> >   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 91 ++++++++++++++++++-
> >   drivers/gpu/drm/i915/gt/uc/intel_uc.h         |  5 +
> >   drivers/gpu/drm/i915/i915_debugfs.c           |  1 +
> >   drivers/gpu/drm/i915/i915_gem_evict.c         |  1 +
> >   .../gpu/drm/i915/selftests/igt_live_test.c    |  2 +-
> >   .../gpu/drm/i915/selftests/mock_gem_device.c  |  3 +-
> >   14 files changed, 137 insertions(+), 27 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> > index 8598a1c78a4c..2f5295c9408d 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> > @@ -634,7 +634,8 @@ mmap_offset_attach(struct drm_i915_gem_object *obj,
> >   		goto insert;
> >   	/* Attempt to reap some mmap space from dead objects */
> > -	err = intel_gt_retire_requests_timeout(&i915->gt, MAX_SCHEDULE_TIMEOUT);
> > +	err = intel_gt_retire_requests_timeout(&i915->gt, MAX_SCHEDULE_TIMEOUT,
> > +					       NULL);
> >   	if (err)
> >   		goto err;
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> > index 8d77dcbad059..1742a8561f69 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> > @@ -574,6 +574,24 @@ static void __intel_gt_disable(struct intel_gt *gt)
> >   	GEM_BUG_ON(intel_gt_pm_is_awake(gt));
> >   }
> > +int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout)
> > +{
> > +	long rtimeout;
> > +
> > +	/* If the device is asleep, we have no requests outstanding */
> > +	if (!intel_gt_pm_is_awake(gt))
> > +		return 0;
> > +
> > +	while ((timeout = intel_gt_retire_requests_timeout(gt, timeout,
> > +							   &rtimeout)) > 0) {
> > +		cond_resched();
> > +		if (signal_pending(current))
> > +			return -EINTR;
> > +	}
> > +
> > +	return timeout ? timeout : intel_uc_wait_for_idle(&gt->uc, rtimeout);
> > +}
> > +
> >   int intel_gt_init(struct intel_gt *gt)
> >   {
> >   	int err;
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h
> > index 7ec395cace69..c775043334bf 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gt.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_gt.h
> > @@ -48,6 +48,8 @@ void intel_gt_driver_release(struct intel_gt *gt);
> >   void intel_gt_driver_late_release(struct intel_gt *gt);
> > +int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout);
> > +
> >   void intel_gt_check_and_clear_faults(struct intel_gt *gt);
> >   void intel_gt_clear_error_registers(struct intel_gt *gt,
> >   				    intel_engine_mask_t engine_mask);
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> > index 647eca9d867a..c6c702f236fa 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> > @@ -13,6 +13,7 @@
> >   #include "intel_gt_pm.h"
> >   #include "intel_gt_requests.h"
> >   #include "intel_timeline.h"
> > +#include "uc/intel_uc.h"
> >   static bool retire_requests(struct intel_timeline *tl)
> >   {
> > @@ -130,7 +131,8 @@ void intel_engine_fini_retire(struct intel_engine_cs *engine)
> >   	GEM_BUG_ON(engine->retire);
> >   }
> > -long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout)
> > +long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
> > +				      long *rtimeout)
> 
> What is 'rtimeout', I know remaining, but it can be more self-descriptive to
> start with.
>

'remaining_timeout' it is.
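I.e. the prototype would read as below (a sketch of just the rename, with
the rest of the patch as posted):

	long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
					      long *remaining_timeout);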

> It feels a bit churny for what it is. How plausible would be alternatives to
> either change existing timeout to in/out, or measure sleep internally in
> this function, or just risk sleeping twice as long by passing the original
> timeout to uc idle as well?
>

Originally I had it just passing in the same value, but got review feedback
saying I should pass in the adjusted value. Hard to make everyone happy.
 
> >   {
> >   	struct intel_gt_timelines *timelines = &gt->timelines;
> >   	struct intel_timeline *tl, *tn;
> > @@ -195,22 +197,10 @@ out_active:	spin_lock(&timelines->lock);
> >   	if (flush_submission(gt, timeout)) /* Wait, there's more! */
> >   		active_count++;
> > -	return active_count ? timeout : 0;
> > -}
> > -
> > -int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout)
> > -{
> > -	/* If the device is asleep, we have no requests outstanding */
> > -	if (!intel_gt_pm_is_awake(gt))
> > -		return 0;
> > -
> > -	while ((timeout = intel_gt_retire_requests_timeout(gt, timeout)) > 0) {
> > -		cond_resched();
> > -		if (signal_pending(current))
> > -			return -EINTR;
> > -	}
> > +	if (rtimeout)
> > +		*rtimeout = timeout;
> > -	return timeout;
> > +	return active_count ? timeout : 0;
> >   }
> >   static void retire_work_handler(struct work_struct *work)
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.h b/drivers/gpu/drm/i915/gt/intel_gt_requests.h
> > index fcc30a6e4fe9..4419787124e2 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.h
> > @@ -10,10 +10,11 @@ struct intel_engine_cs;
> >   struct intel_gt;
> >   struct intel_timeline;
> > -long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout);
> > +long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
> > +				      long *rtimeout);
> >   static inline void intel_gt_retire_requests(struct intel_gt *gt)
> >   {
> > -	intel_gt_retire_requests_timeout(gt, 0);
> > +	intel_gt_retire_requests_timeout(gt, 0, NULL);
> >   }
> >   void intel_engine_init_retire(struct intel_engine_cs *engine);
> > @@ -21,8 +22,6 @@ void intel_engine_add_retire(struct intel_engine_cs *engine,
> >   			     struct intel_timeline *tl);
> >   void intel_engine_fini_retire(struct intel_engine_cs *engine);
> > -int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout);
> > -
> >   void intel_gt_init_requests(struct intel_gt *gt);
> >   void intel_gt_park_requests(struct intel_gt *gt);
> >   void intel_gt_unpark_requests(struct intel_gt *gt);
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > index 485e98f3f304..47eaa69809e8 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > @@ -38,6 +38,8 @@ struct intel_guc {
> >   	spinlock_t irq_lock;
> >   	unsigned int msg_enabled_mask;
> > +	atomic_t outstanding_submission_g2h;
> > +
> >   	struct {
> >   		bool enabled;
> >   		void (*reset)(struct intel_guc *guc);
> > @@ -239,6 +241,8 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask)
> >   	spin_unlock_irq(&guc->irq_lock);
> >   }
> > +int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout);
> > +
> >   int intel_guc_reset_engine(struct intel_guc *guc,
> >   			   struct intel_engine_cs *engine);
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > index f1893030ca88..cf701056fa14 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > @@ -111,6 +111,7 @@ void intel_guc_ct_init_early(struct intel_guc_ct *ct)
> >   	INIT_LIST_HEAD(&ct->requests.incoming);
> >   	INIT_WORK(&ct->requests.worker, ct_incoming_request_worker_func);
> >   	tasklet_init(&ct->receive_tasklet, ct_receive_tasklet_func, (unsigned long)ct);
> > +	init_waitqueue_head(&ct->wq);
> >   }
> >   static inline const char *guc_ct_buffer_type_to_str(u32 type)
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > index 660bf37238e2..ab1b79ab960b 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > @@ -10,6 +10,7 @@
> >   #include <linux/spinlock.h>
> >   #include <linux/workqueue.h>
> >   #include <linux/ktime.h>
> > +#include <linux/wait.h>
> >   #include "intel_guc_fwif.h"
> > @@ -68,6 +69,9 @@ struct intel_guc_ct {
> >   	struct tasklet_struct receive_tasklet;
> > +	/** @wq: wait queue for g2h chanenl */
> > +	wait_queue_head_t wq;
> > +
> >   	struct {
> >   		u16 last_fence; /* last fence used to send request */
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index ae0b386467e3..0ff7dd6d337d 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -253,6 +253,74 @@ static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
> >   	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
> >   }
> > +static int guc_submission_busy_loop(struct intel_guc* guc,
> > +				    const u32 *action,
> > +				    u32 len,
> > +				    u32 g2h_len_dw,
> > +				    bool loop)
> > +{
> > +	int err;
> > +
> > +	err = intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop);
> > +
> > +	if (!err && g2h_len_dw)
> > +		atomic_inc(&guc->outstanding_submission_g2h);
> > +
> > +	return err;
> > +}
> > +
> > +static int guc_wait_for_pending_msg(struct intel_guc *guc,
> > +				    atomic_t *wait_var,
> > +				    bool interruptible,
> > +				    long timeout)
> > +{
> > +	const int state = interruptible ?
> > +		TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
> > +	DEFINE_WAIT(wait);
> > +
> > +	might_sleep();
> > +	GEM_BUG_ON(timeout < 0);
> > +
> > +	if (!atomic_read(wait_var))
> > +		return 0;
> > +
> > +	if (!timeout)
> > +		return -ETIME;
> > +
> > +	for (;;) {
> > +		prepare_to_wait(&guc->ct.wq, &wait, state);
> > +
> > +		if (!atomic_read(wait_var))
> > +			break;
> > +
> > +		if (signal_pending_state(state, current)) {
> > +			timeout = -ERESTARTSYS;
> > +			break;
> > +		}
> > +
> > +		if (!timeout) {
> > +			timeout = -ETIME;
> > +			break;
> > +		}
> > +
> > +		timeout = io_schedule_timeout(timeout);
> > +	}
> > +	finish_wait(&guc->ct.wq, &wait);
> > +
> > +	return (timeout < 0) ? timeout : 0;
> > +}
> 
> See if it is possible to simplify all this with wait_var_event and
> wake_up_var.
>

Let me check on that.
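
An untested sketch of what that simplification might look like, using
the outstanding_submission_g2h counter itself as the wait variable (the
interruptible case is glossed over here and would still need handling):

static int guc_wait_for_pending_msg(struct intel_guc *guc,
				    atomic_t *wait_var,
				    long timeout)
{
	/* Returns remaining jiffies on success, 0 on timeout. */
	long ret = wait_var_event_timeout(wait_var,
					  !atomic_read(wait_var),
					  timeout);

	return ret > 0 ? 0 : -ETIME;
}

static void decr_outstanding_submission_g2h(struct intel_guc *guc)
{
	/* wake_up_var() pairs with wait_var_event_timeout() above. */
	if (atomic_dec_and_test(&guc->outstanding_submission_g2h))
		wake_up_var(&guc->outstanding_submission_g2h);
}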
 
> > +
> > +int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout)
> > +{
> > +	bool interruptible = true;
> > +
> > +	if (unlikely(timeout < 0))
> > +		timeout = -timeout, interruptible = false;
> > +
> > +	return guc_wait_for_pending_msg(guc, &guc->outstanding_submission_g2h,
> > +					interruptible, timeout);
> > +}
> > +
> >   static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
> >   {
> >   	int err;
> > @@ -279,6 +347,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
> >   	err = intel_guc_send_nb(guc, action, len, g2h_len_dw);
> >   	if (!enabled && !err) {
> > +		atomic_inc(&guc->outstanding_submission_g2h);
> >   		set_context_enabled(ce);
> >   	} else if (!enabled) {
> >   		clr_context_pending_enable(ce);
> > @@ -734,7 +803,7 @@ static int __guc_action_register_context(struct intel_guc *guc,
> >   		offset,
> >   	};
> > -	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
> > +	return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
> >   }
> >   static int register_context(struct intel_context *ce)
> > @@ -754,7 +823,7 @@ static int __guc_action_deregister_context(struct intel_guc *guc,
> >   		guc_id,
> >   	};
> > -	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action),
> > +	return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),
> >   					G2H_LEN_DW_DEREGISTER_CONTEXT, true);
> >   }
> > @@ -871,7 +940,9 @@ static int guc_context_pin(struct intel_context *ce, void *vaddr)
> >   static void guc_context_unpin(struct intel_context *ce)
> >   {
> > -	unpin_guc_id(ce_to_guc(ce), ce);
> > +	struct intel_guc *guc = ce_to_guc(ce);
> > +
> > +	unpin_guc_id(guc, ce);
> >   	lrc_unpin(ce);
> >   }
> > @@ -894,7 +965,7 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
> >   	intel_context_get(ce);
> > -	intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action),
> > +	guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),
> >   				 G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
> >   }
> > @@ -1437,6 +1508,15 @@ g2h_context_lookup(struct intel_guc *guc, u32 desc_idx)
> >   	return ce;
> >   }
> > +static void decr_outstanding_submission_g2h(struct intel_guc *guc)
> > +{
> > +	if (atomic_dec_and_test(&guc->outstanding_submission_g2h)) {
> > +		smp_mb();
> > +		if (waitqueue_active(&guc->ct.wq))
> > +			wake_up_all(&guc->ct.wq);
> 
> I keep pointing out this pattern is racy and at least needs comment why it
> is safe.
> 

There is a comment in the wait queue code header (above
waitqueue_active() in include/linux/wait.h) explaining why this pattern
is safe. I don't think we need to repeat it here.

Matt

> Regards,
> 
> Tvrtko
> 
> > +	}
> > +}
> > +
> >   int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
> >   					  const u32 *msg,
> >   					  u32 len)
> > @@ -1472,6 +1552,8 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
> >   		lrc_destroy(&ce->ref);
> >   	}
> > +	decr_outstanding_submission_g2h(guc);
> > +
> >   	return 0;
> >   }
> > @@ -1520,6 +1602,7 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
> >   		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> >   	}
> > +	decr_outstanding_submission_g2h(guc);
> >   	intel_context_put(ce);
> >   	return 0;
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> > index 9c954c589edf..c4cef885e984 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> > @@ -81,6 +81,11 @@ uc_state_checkers(guc, guc_submission);
> >   #undef uc_state_checkers
> >   #undef __uc_state_checker
> > +static inline int intel_uc_wait_for_idle(struct intel_uc *uc, long timeout)
> > +{
> > +	return intel_guc_wait_for_idle(&uc->guc, timeout);
> > +}
> > +
> >   #define intel_uc_ops_function(_NAME, _OPS, _TYPE, _RET) \
> >   static inline _TYPE intel_uc_##_NAME(struct intel_uc *uc) \
> >   { \
> > diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> > index 8dd374691102..bb29838d1cd7 100644
> > --- a/drivers/gpu/drm/i915/i915_debugfs.c
> > +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> > @@ -36,6 +36,7 @@
> >   #include "gt/intel_gt_clock_utils.h"
> >   #include "gt/intel_gt.h"
> >   #include "gt/intel_gt_pm.h"
> > +#include "gt/intel_gt.h"
> >   #include "gt/intel_gt_requests.h"
> >   #include "gt/intel_reset.h"
> >   #include "gt/intel_rc6.h"
> > diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
> > index 4d2d59a9942b..2b73ddb11c66 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_evict.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_evict.c
> > @@ -27,6 +27,7 @@
> >    */
> >   #include "gem/i915_gem_context.h"
> > +#include "gt/intel_gt.h"
> >   #include "gt/intel_gt_requests.h"
> >   #include "i915_drv.h"
> > diff --git a/drivers/gpu/drm/i915/selftests/igt_live_test.c b/drivers/gpu/drm/i915/selftests/igt_live_test.c
> > index c130010a7033..1c721542e277 100644
> > --- a/drivers/gpu/drm/i915/selftests/igt_live_test.c
> > +++ b/drivers/gpu/drm/i915/selftests/igt_live_test.c
> > @@ -5,7 +5,7 @@
> >    */
> >   #include "i915_drv.h"
> > -#include "gt/intel_gt_requests.h"
> > +#include "gt/intel_gt.h"
> >   #include "../i915_selftest.h"
> >   #include "igt_flush_test.h"
> > diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> > index cf40004bc92a..6c06816e2b99 100644
> > --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> > +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> > @@ -51,7 +51,8 @@ void mock_device_flush(struct drm_i915_private *i915)
> >   	do {
> >   		for_each_engine(engine, gt, id)
> >   			mock_engine_flush(engine);
> > -	} while (intel_gt_retire_requests_timeout(gt, MAX_SCHEDULE_TIMEOUT));
> > +	} while (intel_gt_retire_requests_timeout(gt, MAX_SCHEDULE_TIMEOUT,
> > +						  NULL));
> >   }
> >   static void mock_device_release(struct drm_device *dev)
> > 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 44/97] drm/i915/guc: Implement GuC submission tasklet
  2021-05-25  9:43   ` [Intel-gfx] " Tvrtko Ursulin
@ 2021-05-25 17:10     ` Matthew Brost
  0 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-25 17:10 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: jason.ekstrand, daniel.vetter, intel-gfx, dri-devel

On Tue, May 25, 2021 at 10:43:32AM +0100, Tvrtko Ursulin wrote:
> 
> On 06/05/2021 20:13, Matthew Brost wrote:
> > Implement GuC submission tasklet for new interface. The new GuC
> > interface uses H2G to submit contexts to the GuC. Since H2G use a single
> > channel, a single tasklet submits is used for the submission path. As
> > such a global struct intel_engine_cs has been added to leverage the
> > existing scheduling code.
> > 
> > Also the per engine interrupt handler has been updated to disable the
> > rescheduling of the physical engine tasklet, when using GuC scheduling,
> > as the physical engine tasklet is no longer used.
> > 
> > In this patch the field, guc_id, has been added to intel_context and is
> > not assigned. Patches later in the series will assign this value.
> > 
> > Cc: John Harrison <john.c.harrison@intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >   drivers/gpu/drm/i915/gt/intel_context_types.h |   9 +
> >   drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   4 +
> >   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 233 +++++++++---------
> >   3 files changed, 127 insertions(+), 119 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
> > index ed8c447a7346..bb6fef7eae52 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> > @@ -136,6 +136,15 @@ struct intel_context {
> >   	struct intel_sseu sseu;
> >   	u8 wa_bb_page; /* if set, page num reserved for context workarounds */
> > +
> > +	/* GuC scheduling state that does not require a lock. */
> > +	atomic_t guc_sched_state_no_lock;
> > +
> > +	/*
> > +	 * GuC lrc descriptor ID - Not assigned in this patch but future patches
> > +	 * in the series will.
> > +	 */
> > +	u16 guc_id;
> >   };
> >   #endif /* __INTEL_CONTEXT_TYPES__ */
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > index 2eb6c497e43c..d32866fe90ad 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > @@ -30,6 +30,10 @@ struct intel_guc {
> >   	struct intel_guc_log log;
> >   	struct intel_guc_ct ct;
> > +	/* Global engine used to submit requests to GuC */
> > +	struct i915_sched_engine *sched_engine;
> > +	struct i915_request *stalled_request;
> > +
> >   	/* intel_guc_recv interrupt related state */
> >   	spinlock_t irq_lock;
> >   	unsigned int msg_enabled_mask;
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index c2b6d27404b7..0955a8b00ee8 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -60,6 +60,30 @@
> >   #define GUC_REQUEST_SIZE 64 /* bytes */
> > +/*
> > + * Below is a set of functions which control the GuC scheduling state which do
> > + * not require a lock as all state transitions are mutually exclusive. i.e. It
> > + * is not possible for the context pinning code and submission, for the same
> > + * context, to be executing simultaneously.
> > + */
> 
> Is the statement that some other locks, or other guarantees, serialise
> modification of this state, and if so, why is it using atomics?
> 

This should probably be reworded. For the atomic state, transitions can
happen at the same time but each individual transition is safe; for the
state protected by the lock, other decisions are made based on that
state, so we need a lock when transitioning it.

Matt

> Regards,
> 
> Tvrtko
> 
> > +#define SCHED_STATE_NO_LOCK_ENABLED			BIT(0)
> > +static inline bool context_enabled(struct intel_context *ce)
> > +{
> > +	return (atomic_read(&ce->guc_sched_state_no_lock) &
> > +		SCHED_STATE_NO_LOCK_ENABLED);
> > +}
> > +
> > +static inline void set_context_enabled(struct intel_context *ce)
> > +{
> > +	atomic_or(SCHED_STATE_NO_LOCK_ENABLED, &ce->guc_sched_state_no_lock);
> > +}
> > +
> > +static inline void clr_context_enabled(struct intel_context *ce)
> > +{
> > +	atomic_and((u32)~SCHED_STATE_NO_LOCK_ENABLED,
> > +		   &ce->guc_sched_state_no_lock);
> > +}
> > +
> >   static inline struct i915_priolist *to_priolist(struct rb_node *rb)
> >   {
> >   	return rb_entry(rb, struct i915_priolist, node);
> > @@ -122,37 +146,29 @@ static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
> >   	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
> >   }
> > -static void guc_add_request(struct intel_guc *guc, struct i915_request *rq)
> > +static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
> >   {
> > -	/* Leaving stub as this function will be used in future patches */
> > -}
> > +	int err;
> > +	struct intel_context *ce = rq->context;
> > +	u32 action[3];
> > +	int len = 0;
> > +	bool enabled = context_enabled(ce);
> > -/*
> > - * When we're doing submissions using regular execlists backend, writing to
> > - * ELSP from CPU side is enough to make sure that writes to ringbuffer pages
> > - * pinned in mappable aperture portion of GGTT are visible to command streamer.
> > - * Writes done by GuC on our behalf are not guaranteeing such ordering,
> > - * therefore, to ensure the flush, we're issuing a POSTING READ.
> > - */
> > -static void flush_ggtt_writes(struct i915_vma *vma)
> > -{
> > -	if (i915_vma_is_map_and_fenceable(vma))
> > -		intel_uncore_posting_read_fw(vma->vm->gt->uncore,
> > -					     GUC_STATUS);
> > -}
> > +	if (!enabled) {
> > +		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
> > +		action[len++] = ce->guc_id;
> > +		action[len++] = GUC_CONTEXT_ENABLE;
> > +	} else {
> > +		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT;
> > +		action[len++] = ce->guc_id;
> > +	}
> > -static void guc_submit(struct intel_engine_cs *engine,
> > -		       struct i915_request **out,
> > -		       struct i915_request **end)
> > -{
> > -	struct intel_guc *guc = &engine->gt->uc.guc;
> > +	err = intel_guc_send_nb(guc, action, len);
> > -	do {
> > -		struct i915_request *rq = *out++;
> > +	if (!enabled && !err)
> > +		set_context_enabled(ce);
> > -		flush_ggtt_writes(rq->ring->vma);
> > -		guc_add_request(guc, rq);
> > -	} while (out != end);
> > +	return err;
> >   }
> >   static inline int rq_prio(const struct i915_request *rq)
> > @@ -160,125 +176,88 @@ static inline int rq_prio(const struct i915_request *rq)
> >   	return rq->sched.attr.priority;
> >   }
> > -static struct i915_request *schedule_in(struct i915_request *rq, int idx)
> > -{
> > -	trace_i915_request_in(rq, idx);
> > -
> > -	/*
> > -	 * Currently we are not tracking the rq->context being inflight
> > -	 * (ce->inflight = rq->engine). It is only used by the execlists
> > -	 * backend at the moment, a similar counting strategy would be
> > -	 * required if we generalise the inflight tracking.
> > -	 */
> > -
> > -	__intel_gt_pm_get(rq->engine->gt);
> > -	return i915_request_get(rq);
> > -}
> > -
> > -static void schedule_out(struct i915_request *rq)
> > -{
> > -	trace_i915_request_out(rq);
> > -
> > -	intel_gt_pm_put_async(rq->engine->gt);
> > -	i915_request_put(rq);
> > -}
> > -
> > -static void __guc_dequeue(struct intel_engine_cs *engine)
> > +static int guc_dequeue_one_context(struct intel_guc *guc)
> >   {
> > -	struct i915_sched_engine * const sched_engine = engine->sched_engine;
> > -	struct intel_engine_execlists * const execlists = &engine->execlists;
> > -	struct i915_request **first = execlists->inflight;
> > -	struct i915_request ** const last_port = first + execlists->port_mask;
> > -	struct i915_request *last = first[0];
> > -	struct i915_request **port;
> > +	struct i915_sched_engine * const sched_engine = guc->sched_engine;
> > +	struct i915_request *last = NULL;
> >   	bool submit = false;
> >   	struct rb_node *rb;
> > +	int ret;
> > -	lockdep_assert_held(&engine->sched_engine->lock);
> > -
> > -	if (last) {
> > -		if (*++first)
> > -			return;
> > +	lockdep_assert_held(&sched_engine->lock);
> > -		last = NULL;
> > +	if (guc->stalled_request) {
> > +		submit = true;
> > +		last = guc->stalled_request;
> > +		goto resubmit;
> >   	}
> > -	/*
> > -	 * We write directly into the execlists->inflight queue and don't use
> > -	 * the execlists->pending queue, as we don't have a distinct switch
> > -	 * event.
> > -	 */
> > -	port = first;
> >   	while ((rb = rb_first_cached(&sched_engine->queue))) {
> >   		struct i915_priolist *p = to_priolist(rb);
> >   		struct i915_request *rq, *rn;
> >   		priolist_for_each_request_consume(rq, rn, p) {
> > -			if (last && rq->context != last->context) {
> > -				if (port == last_port)
> > -					goto done;
> > -
> > -				*port = schedule_in(last,
> > -						    port - execlists->inflight);
> > -				port++;
> > -			}
> > +			if (last && rq->context != last->context)
> > +				goto done;
> >   			list_del_init(&rq->sched.link);
> > +
> >   			__i915_request_submit(rq);
> > -			submit = true;
> > +
> > +			trace_i915_request_in(rq, 0);
> >   			last = rq;
> > +			submit = true;
> >   		}
> >   		rb_erase_cached(&p->node, &sched_engine->queue);
> >   		i915_priolist_free(p);
> >   	}
> >   done:
> > -	sched_engine->queue_priority_hint =
> > -		rb ? to_priolist(rb)->priority : INT_MIN;
> >   	if (submit) {
> > -		*port = schedule_in(last, port - execlists->inflight);
> > -		*++port = NULL;
> > -		guc_submit(engine, first, port);
> > +		last->context->lrc_reg_state[CTX_RING_TAIL] =
> > +			intel_ring_set_tail(last->ring, last->tail);
> > +resubmit:
> > +		/*
> > +		 * We only check for -EBUSY here even though it is possible for
> > +		 * -EDEADLK to be returned. If -EDEADLK is returned, the GuC has
> > +		 * died and a full GPU needs to be done. The hangcheck will
> > +		 * eventually detect that the GuC has died and trigger this
> > +		 * reset so no need to handle -EDEADLK here.
> > +		 */
> > +		ret = guc_add_request(guc, last);
> > +		if (ret == -EBUSY) {
> > +			i915_sched_engine_kick(sched_engine);
> > +			guc->stalled_request = last;
> > +			return false;
> > +		}
> >   	}
> > -	execlists->active = execlists->inflight;
> > +
> > +	guc->stalled_request = NULL;
> > +	return submit;
> >   }
> >   static void guc_submission_tasklet(struct tasklet_struct *t)
> >   {
> >   	struct i915_sched_engine *sched_engine =
> >   		from_tasklet(sched_engine, t, tasklet);
> > -	struct intel_engine_cs * const engine = sched_engine->engine;
> > -	struct intel_engine_execlists * const execlists = &engine->execlists;
> > -	struct i915_request **port, *rq;
> >   	unsigned long flags;
> > +	bool loop;
> > -	spin_lock_irqsave(&engine->sched_engine->lock, flags);
> > -
> > -	for (port = execlists->inflight; (rq = *port); port++) {
> > -		if (!i915_request_completed(rq))
> > -			break;
> > +	spin_lock_irqsave(&sched_engine->lock, flags);
> > -		schedule_out(rq);
> > -	}
> > -	if (port != execlists->inflight) {
> > -		int idx = port - execlists->inflight;
> > -		int rem = ARRAY_SIZE(execlists->inflight) - idx;
> > -		memmove(execlists->inflight, port, rem * sizeof(*port));
> > -	}
> > -
> > -	__guc_dequeue(engine);
> > +	do {
> > +		loop = guc_dequeue_one_context(&sched_engine->engine->gt->uc.guc);
> > +	} while (loop);
> > -	i915_sched_engine_reset_on_empty(engine->sched_engine);
> > +	i915_sched_engine_reset_on_empty(sched_engine);
> > -	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
> > +	spin_unlock_irqrestore(&sched_engine->lock, flags);
> >   }
> >   static void cs_irq_handler(struct intel_engine_cs *engine, u16 iir)
> >   {
> > -	if (iir & GT_RENDER_USER_INTERRUPT) {
> > +	if (iir & GT_RENDER_USER_INTERRUPT)
> >   		intel_engine_signal_breadcrumbs(engine);
> > -		i915_sched_engine_hi_kick(engine->sched_engine);
> > -	}
> >   }
> >   static void guc_reset_prepare(struct intel_engine_cs *engine)
> > @@ -351,6 +330,10 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
> >   	struct rb_node *rb;
> >   	unsigned long flags;
> > +	/* Can be called during boot if GuC fails to load */
> > +	if (!engine->gt)
> > +		return;
> > +
> >   	ENGINE_TRACE(engine, "\n");
> >   	/*
> > @@ -437,8 +420,11 @@ int intel_guc_submission_init(struct intel_guc *guc)
> >   void intel_guc_submission_fini(struct intel_guc *guc)
> >   {
> > -	if (guc->lrc_desc_pool)
> > -		guc_lrc_desc_pool_destroy(guc);
> > +	if (!guc->lrc_desc_pool)
> > +		return;
> > +
> > +	guc_lrc_desc_pool_destroy(guc);
> > +	i915_sched_engine_put(guc->sched_engine);
> >   }
> >   static int guc_context_alloc(struct intel_context *ce)
> > @@ -503,32 +489,32 @@ static int guc_request_alloc(struct i915_request *request)
> >   	return 0;
> >   }
> > -static inline void queue_request(struct intel_engine_cs *engine,
> > +static inline void queue_request(struct i915_sched_engine *sched_engine,
> >   				 struct i915_request *rq,
> >   				 int prio)
> >   {
> >   	GEM_BUG_ON(!list_empty(&rq->sched.link));
> >   	list_add_tail(&rq->sched.link,
> > -		      i915_sched_lookup_priolist(engine->sched_engine, prio));
> > +		      i915_sched_lookup_priolist(sched_engine, prio));
> >   	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> >   }
> >   static void guc_submit_request(struct i915_request *rq)
> >   {
> > -	struct intel_engine_cs *engine = rq->engine;
> > +	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
> >   	unsigned long flags;
> >   	/* Will be called from irq-context when using foreign fences. */
> > -	spin_lock_irqsave(&engine->sched_engine->lock, flags);
> > +	spin_lock_irqsave(&sched_engine->lock, flags);
> > -	queue_request(engine, rq, rq_prio(rq));
> > +	queue_request(sched_engine, rq, rq_prio(rq));
> > -	GEM_BUG_ON(i915_sched_engine_is_empty(engine->sched_engine));
> > +	GEM_BUG_ON(i915_sched_engine_is_empty(sched_engine));
> >   	GEM_BUG_ON(list_empty(&rq->sched.link));
> > -	i915_sched_engine_hi_kick(engine->sched_engine);
> > +	i915_sched_engine_hi_kick(sched_engine);
> > -	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
> > +	spin_unlock_irqrestore(&sched_engine->lock, flags);
> >   }
> >   static void sanitize_hwsp(struct intel_engine_cs *engine)
> > @@ -606,8 +592,6 @@ static void guc_release(struct intel_engine_cs *engine)
> >   {
> >   	engine->sanitize = NULL; /* no longer in control, nothing to sanitize */
> > -	tasklet_kill(&engine->sched_engine->tasklet);
> > -
> >   	intel_engine_cleanup_common(engine);
> >   	lrc_fini_wa_ctx(engine);
> >   }
> > @@ -678,6 +662,7 @@ static inline void guc_default_irqs(struct intel_engine_cs *engine)
> >   int intel_guc_submission_setup(struct intel_engine_cs *engine)
> >   {
> >   	struct drm_i915_private *i915 = engine->i915;
> > +	struct intel_guc *guc = &engine->gt->uc.guc;
> >   	/*
> >   	 * The setup relies on several assumptions (e.g. irqs always enabled)
> > @@ -685,8 +670,18 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
> >   	 */
> >   	GEM_BUG_ON(INTEL_GEN(i915) < 11);
> > -	tasklet_setup(&engine->sched_engine->tasklet, guc_submission_tasklet);
> > -	engine->sched_engine->schedule = i915_schedule;
> > +	if (!guc->sched_engine) {
> > +		guc->sched_engine = i915_sched_engine_create(ENGINE_VIRTUAL);
> > +		if (!guc->sched_engine)
> > +			return -ENOMEM;
> > +
> > +		guc->sched_engine->schedule = i915_schedule;
> > +		guc->sched_engine->engine = engine;
> > +		tasklet_setup(&guc->sched_engine->tasklet,
> > +			      guc_submission_tasklet);
> > +	}
> > +	i915_sched_engine_put(engine->sched_engine);
> > +	engine->sched_engine = i915_sched_engine_get(guc->sched_engine);
> >   	guc_default_vfuncs(engine);
> >   	guc_default_irqs(engine);
> > 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 39/97] drm/i915/guc: Increase size of CTB buffers
  2021-05-25  9:24   ` Tvrtko Ursulin
@ 2021-05-25 17:15     ` Matthew Brost
  2021-05-26  9:30       ` Tvrtko Ursulin
  0 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-25 17:15 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: jason.ekstrand, daniel.vetter, intel-gfx, dri-devel

On Tue, May 25, 2021 at 10:24:09AM +0100, Tvrtko Ursulin wrote:
> 
> On 06/05/2021 20:13, Matthew Brost wrote:
> > With the introduction of non-blocking CTBs more than one CTB can be in
> > flight at a time. Increasing the size of the CTBs should reduce how
> > often software hits the case where no space is available in the CTB
> > buffer.
> 
> I'd move this before the patch which adds the non-blocking send since that
> one claims congestion should be rare with properly sized buffers. So it
> makes sense to have them sized properly back before that one.
>

IMO patch ordering is a bit of a bikeshed. All of the CTB changes
required for GuC submission (34-40, 54) will get posted as their own
series and merged together. None of the individual patches break
anything, nor is any of this code really used until GuC submission is
turned on. I can move this when I post those patches by themselves, but
I just don't really see the point either way.

Matt
 
> Regards,
> 
> Tvrtko
> 
> > Cc: John Harrison <john.c.harrison@intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 11 ++++++++---
> >   1 file changed, 8 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > index 77dfbc94dcc3..d6895d29ed2d 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > @@ -63,11 +63,16 @@ static inline struct drm_device *ct_to_drm(struct intel_guc_ct *ct)
> >    *      +--------+-----------------------------------------------+------+
> >    *
> >    * Size of each `CT Buffer`_ must be multiple of 4K.
> > - * As we don't expect too many messages, for now use minimum sizes.
> > + * We don't expect too many messages in flight at any time, unless we are
> > + * using the GuC submission. In that case each request requires a minimum
> > + * 16 bytes which gives us a maximum 256 queue'd requests. Hopefully this
> > + * enough space to avoid backpressure on the driver. We increase the size
> > + * of the receive buffer (relative to the send) to ensure a G2H response
> > + * CTB has a landing spot.
> >    */
> >   #define CTB_DESC_SIZE		ALIGN(sizeof(struct guc_ct_buffer_desc), SZ_2K)
> >   #define CTB_H2G_BUFFER_SIZE	(SZ_4K)
> > -#define CTB_G2H_BUFFER_SIZE	(SZ_4K)
> > +#define CTB_G2H_BUFFER_SIZE	(4 * CTB_H2G_BUFFER_SIZE)
> >   #define MAX_US_STALL_CTB	1000000
> > @@ -753,7 +758,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
> >   	/* beware of buffer wrap case */
> >   	if (unlikely(available < 0))
> >   		available += size;
> > -	CT_DEBUG(ct, "available %d (%u:%u)\n", available, head, tail);
> > +	CT_DEBUG(ct, "available %d (%u:%u:%u)\n", available, head, tail, size);
> >   	GEM_BUG_ON(available < 0);
> >   	header = cmds[head];
> > 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 36/97] drm/i915/guc: Add non blocking CTB send function
  2021-05-25  9:21   ` [Intel-gfx] " Tvrtko Ursulin
@ 2021-05-25 17:21     ` Matthew Brost
  2021-05-26  8:57       ` Tvrtko Ursulin
  0 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-25 17:21 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: jason.ekstrand, daniel.vetter, intel-gfx, dri-devel

On Tue, May 25, 2021 at 10:21:00AM +0100, Tvrtko Ursulin wrote:
> 
> On 06/05/2021 20:13, Matthew Brost wrote:
> > Add non blocking CTB send function, intel_guc_send_nb. In order to
> > support a non blocking CTB send function a spin lock is needed to
> > protect the CTB descriptors fields. Also the non blocking call must not
> > update the fence value as this value is owned by the blocking call
> > (intel_guc_send).
> 
> Could the commit message say why the non-blocking send function is needed?
> 

Sure. Something like:

'CTBs will be used in the critical path of GuC submission and there is
no need to wait for each CTB to complete before the i915 moves on.'

> > 
> > The blocking CTB now must have a flow control mechanism to ensure the
> > buffer isn't overrun. A lazy spin wait is used as we believe the flow
> > control condition should be rare with properly sized buffer.
> > 
> > The function, intel_guc_send_nb, is exported in this patch but unused.
> > Several patches later in the series make use of this function.
> > 
> > Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >   drivers/gpu/drm/i915/gt/uc/intel_guc.h    | 12 ++-
> >   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 96 +++++++++++++++++++++--
> >   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  7 +-
> >   3 files changed, 105 insertions(+), 10 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > index c20f3839de12..4c0a367e41d8 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > @@ -75,7 +75,15 @@ static inline struct intel_guc *log_to_guc(struct intel_guc_log *log)
> >   static
> >   inline int intel_guc_send(struct intel_guc *guc, const u32 *action, u32 len)
> >   {
> > -	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0);
> > +	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0, 0);
> > +}
> > +
> > +#define INTEL_GUC_SEND_NB		BIT(31)
> > +static
> > +inline int intel_guc_send_nb(struct intel_guc *guc, const u32 *action, u32 len)
> > +{
> > +	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0,
> > +				 INTEL_GUC_SEND_NB);
> >   }
> >   static inline int
> > @@ -83,7 +91,7 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len,
> >   			   u32 *response_buf, u32 response_buf_size)
> >   {
> >   	return intel_guc_ct_send(&guc->ct, action, len,
> > -				 response_buf, response_buf_size);
> > +				 response_buf, response_buf_size, 0);
> >   }
> >   static inline void intel_guc_to_host_event_handler(struct intel_guc *guc)
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > index a76603537fa8..af7314d45a78 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > @@ -3,6 +3,11 @@
> >    * Copyright © 2016-2019 Intel Corporation
> >    */
> > +#include <linux/circ_buf.h>
> > +#include <linux/ktime.h>
> > +#include <linux/time64.h>
> > +#include <linux/timekeeping.h>
> > +
> >   #include "i915_drv.h"
> >   #include "intel_guc_ct.h"
> >   #include "gt/intel_gt.h"
> > @@ -308,6 +313,7 @@ int intel_guc_ct_enable(struct intel_guc_ct *ct)
> >   	if (unlikely(err))
> >   		goto err_deregister;
> > +	ct->requests.last_fence = 1;
> >   	ct->enabled = true;
> >   	return 0;
> > @@ -343,10 +349,22 @@ static u32 ct_get_next_fence(struct intel_guc_ct *ct)
> >   	return ++ct->requests.last_fence;
> >   }
> > +static void write_barrier(struct intel_guc_ct *ct) {
> > +	struct intel_guc *guc = ct_to_guc(ct);
> > +	struct intel_gt *gt = guc_to_gt(guc);
> > +
> > +	if (i915_gem_object_is_lmem(guc->ct.vma->obj)) {
> > +		GEM_BUG_ON(guc->send_regs.fw_domains);
> > +		intel_uncore_write_fw(gt->uncore, GEN11_SOFT_SCRATCH(0), 0);
> 
> It's safe to write to this reg? Does it need a comment to explain it?
>

Yes, it is safe. IMO 'SCRATCH' in the name is enough documentation.
 
> > +	} else {
> > +		wmb();
> > +	}
> > +}
> > +
> >   static int ct_write(struct intel_guc_ct *ct,
> >   		    const u32 *action,
> >   		    u32 len /* in dwords */,
> > -		    u32 fence)
> > +		    u32 fence, u32 flags)
> >   {
> >   	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> >   	struct guc_ct_buffer_desc *desc = ctb->desc;
> > @@ -393,9 +411,13 @@ static int ct_write(struct intel_guc_ct *ct,
> >   		 FIELD_PREP(GUC_CTB_MSG_0_NUM_DWORDS, len) |
> >   		 FIELD_PREP(GUC_CTB_MSG_0_FENCE, fence);
> > -	hxg = FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
> > -	      FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
> > -			 GUC_HXG_REQUEST_MSG_0_DATA0, action[0]);
> > +	hxg = (flags & INTEL_GUC_SEND_NB) ?
> > +		(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_EVENT) |
> > +		 FIELD_PREP(GUC_HXG_EVENT_MSG_0_ACTION |
> > +			    GUC_HXG_EVENT_MSG_0_DATA0, action[0])) :
> > +		(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
> > +		 FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
> > +			    GUC_HXG_REQUEST_MSG_0_DATA0, action[0]));
> >   	CT_DEBUG(ct, "writing (tail %u) %*ph %*ph %*ph\n",
> >   		 tail, 4, &header, 4, &hxg, 4 * (len - 1), &action[1]);
> > @@ -412,6 +434,12 @@ static int ct_write(struct intel_guc_ct *ct,
> >   	}
> >   	GEM_BUG_ON(tail > size);
> > +	/*
> > +	 * make sure H2G buffer update and LRC tail update (if this triggering a
> > +	 * submission) are visable before updating the descriptor tail
> > +	 */
> > +	write_barrier(ct);
> > +
> >   	/* now update descriptor */
> >   	WRITE_ONCE(desc->tail, tail);
> > @@ -466,6 +494,46 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
> >   	return err;
> >   }
> > +static inline bool ctb_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
> > +{
> > +	struct guc_ct_buffer_desc *desc = ctb->desc;
> > +	u32 head = READ_ONCE(desc->head);
> > +	u32 space;
> > +
> > +	space = CIRC_SPACE(desc->tail, head, ctb->size);
> > +
> > +	return space >= len_dw;
> > +}
> > +
> > +static int ct_send_nb(struct intel_guc_ct *ct,
> > +		      const u32 *action,
> > +		      u32 len,
> > +		      u32 flags)
> > +{
> > +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> > +	unsigned long spin_flags;
> > +	u32 fence;
> > +	int ret;
> > +
> > +	spin_lock_irqsave(&ctb->lock, spin_flags);
> > +
> > +	ret = ctb_has_room(ctb, len + 1);
> > +	if (unlikely(ret))
> > +		goto out;
> > +
> > +	fence = ct_get_next_fence(ct);
> > +	ret = ct_write(ct, action, len, fence, flags);
> > +	if (unlikely(ret))
> > +		goto out;
> > +
> > +	intel_guc_notify(ct_to_guc(ct));
> > +
> > +out:
> > +	spin_unlock_irqrestore(&ctb->lock, spin_flags);
> > +
> > +	return ret;
> > +}
> > +
> >   static int ct_send(struct intel_guc_ct *ct,
> >   		   const u32 *action,
> >   		   u32 len,
> > @@ -473,6 +541,7 @@ static int ct_send(struct intel_guc_ct *ct,
> >   		   u32 response_buf_size,
> >   		   u32 *status)
> >   {
> > +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> >   	struct ct_request request;
> >   	unsigned long flags;
> >   	u32 fence;
> > @@ -482,8 +551,20 @@ static int ct_send(struct intel_guc_ct *ct,
> >   	GEM_BUG_ON(!len);
> >   	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
> >   	GEM_BUG_ON(!response_buf && response_buf_size);
> > +	might_sleep();
> 
> Sleep is just cond_resched below or there is more?
> 

Yes, the cond_resched.

> > +	/*
> > +	 * We use a lazy spin wait loop here as we believe that if the CT
> > +	 * buffers are sized correctly the flow control condition should be
> > +	 * rare.
> > +	 */
> > +retry:
> >   	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
> > +	if (unlikely(!ctb_has_room(ctb, len + 1))) {
> > +		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> > +		cond_resched();
> > +		goto retry;
> > +	}
> 
> If this patch is about adding a non-blocking send function, and below we can
> see that it creates a fork:
> 
> intel_guc_ct_send:
> ...
> 	if (flags & INTEL_GUC_SEND_NB)
> 		return ct_send_nb(ct, action, len, flags);
> 
>  	ret = ct_send(ct, action, len, response_buf, response_buf_size, &status);
> 
> Then why is there a change in ct_send here, which is not the new
> non-blocking path?
>

There is no change to ct_send(), just to intel_guc_ct_send().

As for why intel_guc_ct_send is updated rather than just adding a new
public function, this was another reviewer's suggestion. Again, can't
make everyone happy.
 
> >   	fence = ct_get_next_fence(ct);
> >   	request.fence = fence;
> > @@ -495,7 +576,7 @@ static int ct_send(struct intel_guc_ct *ct,
> >   	list_add_tail(&request.link, &ct->requests.pending);
> >   	spin_unlock(&ct->requests.lock);
> > -	err = ct_write(ct, action, len, fence);
> > +	err = ct_write(ct, action, len, fence, 0);
> >   	spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> > @@ -537,7 +618,7 @@ static int ct_send(struct intel_guc_ct *ct,
> >    * Command Transport (CT) buffer based GuC send function.
> >    */
> >   int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
> > -		      u32 *response_buf, u32 response_buf_size)
> > +		      u32 *response_buf, u32 response_buf_size, u32 flags)
> >   {
> >   	u32 status = ~0; /* undefined */
> >   	int ret;
> > @@ -547,6 +628,9 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
> >   		return -ENODEV;
> >   	}
> > +	if (flags & INTEL_GUC_SEND_NB)
> > +		return ct_send_nb(ct, action, len, flags);
> > +
> >   	ret = ct_send(ct, action, len, response_buf, response_buf_size, &status);
> >   	if (unlikely(ret < 0)) {
> >   		CT_ERROR(ct, "Sending action %#x failed (err=%d status=%#X)\n",
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > index 1ae2dde6db93..55ef7c52472f 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > @@ -9,6 +9,7 @@
> >   #include <linux/interrupt.h>
> >   #include <linux/spinlock.h>
> >   #include <linux/workqueue.h>
> > +#include <linux/ktime.h>
> >   #include "intel_guc_fwif.h"
> > @@ -42,7 +43,6 @@ struct intel_guc_ct_buffer {
> >   	bool broken;
> >   };
> > -
> >   /** Top-level structure for Command Transport related data
> >    *
> >    * Includes a pair of CT buffers for bi-directional communication and tracking
> > @@ -69,6 +69,9 @@ struct intel_guc_ct {
> >   		struct list_head incoming; /* incoming requests */
> >   		struct work_struct worker; /* handler for incoming requests */
> >   	} requests;
> > +
> > +	/** @stall_time: time of first time a CTB submission is stalled */
> > +	ktime_t stall_time;
> 
> Unused in this patch.
>

Yea, wrong patch. Will fix.

Matt
 
> >   };
> >   void intel_guc_ct_init_early(struct intel_guc_ct *ct);
> > @@ -88,7 +91,7 @@ static inline bool intel_guc_ct_enabled(struct intel_guc_ct *ct)
> >   }
> >   int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
> > -		      u32 *response_buf, u32 response_buf_size);
> > +		      u32 *response_buf, u32 response_buf_size, u32 flags);
> >   void intel_guc_ct_event_handler(struct intel_guc_ct *ct);
> >   #endif /* _INTEL_GUC_CT_H_ */
> > 
> 
> Regards,
> 
> Tvrtko

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 36/97] drm/i915/guc: Add non blocking CTB send function
  2021-05-24 12:21   ` Michal Wajdeczko
@ 2021-05-25 17:30     ` Matthew Brost
  0 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-25 17:30 UTC (permalink / raw)
  To: Michal Wajdeczko
  Cc: tvrtko.ursulin, intel-gfx, dri-devel, jason.ekstrand,
	daniele.ceraolospurio, jon.bloomfield, daniel.vetter,
	john.c.harrison

On Mon, May 24, 2021 at 02:21:42PM +0200, Michal Wajdeczko wrote:
> 
> 
> On 06.05.2021 21:13, Matthew Brost wrote:
> > Add non blocking CTB send function, intel_guc_send_nb. In order to
> > support a non blocking CTB send function a spin lock is needed to
> 
> spin lock was added in 16/97
> 
> > protect the CTB descriptors fields. Also the non blocking call must not
> > update the fence value as this value is owned by the blocking call
> > (intel_guc_send).
> 
> all H2G messages are using "fence", nb variant also needs to update it
> 
> > 
> > The blocking CTB now must have a flow control mechanism to ensure the
> 
> s/blocking/non-blocking
> 

Will fix the comments as these are a bit stale.

> > buffer isn't overrun. A lazy spin wait is used as we believe the flow
> > control condition should be rare with properly sized buffer.
> 
> as this new nb function is still not used in this patch, then maybe
> better to move flow control to separate patch for easier review ?
>

You can't do non-blocking without flow control; it just doesn't work.
IMO splitting them up would make the review harder.
 
> > 
> > The function, intel_guc_send_nb, is exported in this patch but unused.
> > Several patches later in the series make use of this function.
> > 
> > Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >  drivers/gpu/drm/i915/gt/uc/intel_guc.h    | 12 ++-
> >  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 96 +++++++++++++++++++++--
> >  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  7 +-
> >  3 files changed, 105 insertions(+), 10 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > index c20f3839de12..4c0a367e41d8 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > @@ -75,7 +75,15 @@ static inline struct intel_guc *log_to_guc(struct intel_guc_log *log)
> >  static
> >  inline int intel_guc_send(struct intel_guc *guc, const u32 *action, u32 len)
> >  {
> > -	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0);
> > +	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0, 0);
> > +}
> > +
> > +#define INTEL_GUC_SEND_NB		BIT(31)
> > +static
> > +inline int intel_guc_send_nb(struct intel_guc *guc, const u32 *action, u32 len)
> > +{
> > +	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0,
> > +				 INTEL_GUC_SEND_NB);
> >  }
> >  
> >  static inline int
> > @@ -83,7 +91,7 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len,
> >  			   u32 *response_buf, u32 response_buf_size)
> >  {
> >  	return intel_guc_ct_send(&guc->ct, action, len,
> > -				 response_buf, response_buf_size);
> > +				 response_buf, response_buf_size, 0);
> >  }
> >  
> >  static inline void intel_guc_to_host_event_handler(struct intel_guc *guc)
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > index a76603537fa8..af7314d45a78 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > @@ -3,6 +3,11 @@
> >   * Copyright © 2016-2019 Intel Corporation
> >   */
> >  
> > +#include <linux/circ_buf.h>
> > +#include <linux/ktime.h>
> > +#include <linux/time64.h>
> > +#include <linux/timekeeping.h>
> > +
> >  #include "i915_drv.h"
> >  #include "intel_guc_ct.h"
> >  #include "gt/intel_gt.h"
> > @@ -308,6 +313,7 @@ int intel_guc_ct_enable(struct intel_guc_ct *ct)
> >  	if (unlikely(err))
> >  		goto err_deregister;
> >  
> > +	ct->requests.last_fence = 1;
> 
> not needed
>

Yep.
 
> >  	ct->enabled = true;
> >  
> >  	return 0;
> > @@ -343,10 +349,22 @@ static u32 ct_get_next_fence(struct intel_guc_ct *ct)
> >  	return ++ct->requests.last_fence;
> >  }
> >  
> > +static void write_barrier(struct intel_guc_ct *ct) {
> > +	struct intel_guc *guc = ct_to_guc(ct);
> > +	struct intel_gt *gt = guc_to_gt(guc);
> > +
> > +	if (i915_gem_object_is_lmem(guc->ct.vma->obj)) {
> > +		GEM_BUG_ON(guc->send_regs.fw_domains);
> > +		intel_uncore_write_fw(gt->uncore, GEN11_SOFT_SCRATCH(0), 0);
> > +	} else {
> > +		wmb();
> > +	}
> > +}
> 
> this chunk seems to be good candidate for separate patch that could be
> introduced earlier
>

Yes. Will include this in the patches 3-20 post, as it is technically
required once the mutex is removed.
 
> > +
> >  static int ct_write(struct intel_guc_ct *ct,
> >  		    const u32 *action,
> >  		    u32 len /* in dwords */,
> > -		    u32 fence)
> > +		    u32 fence, u32 flags)
> >  {
> >  	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> >  	struct guc_ct_buffer_desc *desc = ctb->desc;
> > @@ -393,9 +411,13 @@ static int ct_write(struct intel_guc_ct *ct,
> >  		 FIELD_PREP(GUC_CTB_MSG_0_NUM_DWORDS, len) |
> >  		 FIELD_PREP(GUC_CTB_MSG_0_FENCE, fence);
> >  
> > -	hxg = FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
> > -	      FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
> > -			 GUC_HXG_REQUEST_MSG_0_DATA0, action[0]);
> > +	hxg = (flags & INTEL_GUC_SEND_NB) ?
> > +		(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_EVENT) |
> > +		 FIELD_PREP(GUC_HXG_EVENT_MSG_0_ACTION |
> > +			    GUC_HXG_EVENT_MSG_0_DATA0, action[0])) :
> > +		(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
> > +		 FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
> > +			    GUC_HXG_REQUEST_MSG_0_DATA0, action[0]));
> >  
> >  	CT_DEBUG(ct, "writing (tail %u) %*ph %*ph %*ph\n",
> >  		 tail, 4, &header, 4, &hxg, 4 * (len - 1), &action[1]);
> > @@ -412,6 +434,12 @@ static int ct_write(struct intel_guc_ct *ct,
> >  	}
> >  	GEM_BUG_ON(tail > size);
> >  
> > +	/*
> > +	 * make sure H2G buffer update and LRC tail update (if this triggering a
> > +	 * submission) are visable before updating the descriptor tail
> 
> typo
> 
> > +	 */
> > +	write_barrier(ct);
> > +
> >  	/* now update descriptor */
> >  	WRITE_ONCE(desc->tail, tail);
> >  
> > @@ -466,6 +494,46 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
> >  	return err;
> >  }
> >  
> > +static inline bool ctb_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
> > +{
> > +	struct guc_ct_buffer_desc *desc = ctb->desc;
> > +	u32 head = READ_ONCE(desc->head);
> > +	u32 space;
> > +
> > +	space = CIRC_SPACE(desc->tail, head, ctb->size);
> 
> shouldn't we use READ_ONCE for reading the tail?
>

I don't think so. The above READ_ONCE should be sufficient as a barrier.
 
> > +
> > +	return space >= len_dw;
> > +}
> > +
> > +static int ct_send_nb(struct intel_guc_ct *ct,
> > +		      const u32 *action,
> > +		      u32 len,
> > +		      u32 flags)
> > +{
> > +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> > +	unsigned long spin_flags;
> > +	u32 fence;
> > +	int ret;
> > +
> > +	spin_lock_irqsave(&ctb->lock, spin_flags);
> > +
> > +	ret = ctb_has_room(ctb, len + 1);
> 
> why +1 ?

The header is 2 DWs, while the action array only includes 1 DW for the
action field, which is stuffed into the header.
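
Rough dword accounting, assuming the message layout ct_write() builds
above:

	1         CTB header (fence, format, num dwords)
	1         HXG header (carries action[0])
	len - 1   action[1] .. action[len - 1] payload
	---------
	len + 1   dwords that ctb_has_room(ctb, len + 1) checks for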

> 
> > +	if (unlikely(ret))
> > +		goto out;
> > +
> > +	fence = ct_get_next_fence(ct);
> > +	ret = ct_write(ct, action, len, fence, flags);
> > +	if (unlikely(ret))
> > +		goto out;
> > +
> > +	intel_guc_notify(ct_to_guc(ct));
> > +
> > +out:
> > +	spin_unlock_irqrestore(&ctb->lock, spin_flags);
> > +
> > +	return ret;
> > +}
> > +
> >  static int ct_send(struct intel_guc_ct *ct,
> >  		   const u32 *action,
> >  		   u32 len,
> > @@ -473,6 +541,7 @@ static int ct_send(struct intel_guc_ct *ct,
> >  		   u32 response_buf_size,
> >  		   u32 *status)
> >  {
> > +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> >  	struct ct_request request;
> >  	unsigned long flags;
> >  	u32 fence;
> > @@ -482,8 +551,20 @@ static int ct_send(struct intel_guc_ct *ct,
> >  	GEM_BUG_ON(!len);
> >  	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
> >  	GEM_BUG_ON(!response_buf && response_buf_size);
> > +	might_sleep();
> >  
> > +	/*
> > +	 * We use a lazy spin wait loop here as we believe that if the CT
> > +	 * buffers are sized correctly the flow control condition should be
> > +	 * rare.
> > +	 */
> > +retry:
> >  	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
> > +	if (unlikely(!ctb_has_room(ctb, len + 1))) {
> 
> why +1 ?
>

Same as above.

> > +		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> > +		cond_resched();
> > +		goto retry;
> > +	}
> 
> hmm, full CTB can also be seen in case of nb, but it looks that only in
> case of blocking call you want to use lazy spin, why ?
>

Blocking calls are rare + not having credits is rare. No need to
over-engineer the wait.
 
> also, what if situation is not improving ?
> will we be looping here forever ?
>

Nope, see the following patch:
https://patchwork.freedesktop.org/patch/432325/?series=89844&rev=1
 
> >  
> >  	fence = ct_get_next_fence(ct);
> >  	request.fence = fence;
> > @@ -495,7 +576,7 @@ static int ct_send(struct intel_guc_ct *ct,
> >  	list_add_tail(&request.link, &ct->requests.pending);
> >  	spin_unlock(&ct->requests.lock);
> >  
> > -	err = ct_write(ct, action, len, fence);
> > +	err = ct_write(ct, action, len, fence, 0);
> >  
> >  	spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> >  
> > @@ -537,7 +618,7 @@ static int ct_send(struct intel_guc_ct *ct,
> >   * Command Transport (CT) buffer based GuC send function.
> >   */
> >  int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
> > -		      u32 *response_buf, u32 response_buf_size)
> > +		      u32 *response_buf, u32 response_buf_size, u32 flags)
> >  {
> >  	u32 status = ~0; /* undefined */
> >  	int ret;
> > @@ -547,6 +628,9 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
> >  		return -ENODEV;
> >  	}
> >  
> > +	if (flags & INTEL_GUC_SEND_NB)
> > +		return ct_send_nb(ct, action, len, flags);
> > +
> >  	ret = ct_send(ct, action, len, response_buf, response_buf_size, &status);
> >  	if (unlikely(ret < 0)) {
> >  		CT_ERROR(ct, "Sending action %#x failed (err=%d status=%#X)\n",
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > index 1ae2dde6db93..55ef7c52472f 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > @@ -9,6 +9,7 @@
> >  #include <linux/interrupt.h>
> >  #include <linux/spinlock.h>
> >  #include <linux/workqueue.h>
> > +#include <linux/ktime.h>
> >  
> >  #include "intel_guc_fwif.h"
> >  
> > @@ -42,7 +43,6 @@ struct intel_guc_ct_buffer {
> >  	bool broken;
> >  };
> >  
> > -
> >  /** Top-level structure for Command Transport related data
> >   *
> >   * Includes a pair of CT buffers for bi-directional communication and tracking
> > @@ -69,6 +69,9 @@ struct intel_guc_ct {
> >  		struct list_head incoming; /* incoming requests */
> >  		struct work_struct worker; /* handler for incoming requests */
> >  	} requests;
> > +
> > +	/** @stall_time: time of first time a CTB submission is stalled */
> > +	ktime_t stall_time;
> 
> this should be introduced in 37/97
>

Yep.

Matt
 
> >  };
> >  
> >  void intel_guc_ct_init_early(struct intel_guc_ct *ct);
> > @@ -88,7 +91,7 @@ static inline bool intel_guc_ct_enabled(struct intel_guc_ct *ct)
> >  }
> >  
> >  int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
> > -		      u32 *response_buf, u32 response_buf_size);
> > +		      u32 *response_buf, u32 response_buf_size, u32 flags);
> >  void intel_guc_ct_event_handler(struct intel_guc_ct *ct);
> >  
> >  #endif /* _INTEL_GUC_CT_H_ */
> > 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 35/97] drm/i915/guc: Improve error message for unsolicited CT response
  2021-05-24 11:59   ` Michal Wajdeczko
@ 2021-05-25 17:32     ` Matthew Brost
  0 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-25 17:32 UTC (permalink / raw)
  To: Michal Wajdeczko
  Cc: tvrtko.ursulin, intel-gfx, dri-devel, jason.ekstrand,
	daniele.ceraolospurio, jon.bloomfield, daniel.vetter,
	john.c.harrison

On Mon, May 24, 2021 at 01:59:54PM +0200, Michal Wajdeczko wrote:
> 
> 
> On 06.05.2021 21:13, Matthew Brost wrote:
> > Improve the error message when a unsolicited CT response is received by
> > printing fence that couldn't be found, the last fence, and all requests
> > with a response outstanding.
> > 
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 10 +++++++---
> >  1 file changed, 7 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > index 217ab3ebd1af..a76603537fa8 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > @@ -703,12 +703,16 @@ static int ct_handle_response(struct intel_guc_ct *ct, struct ct_incoming_msg *r
> >  		found = true;
> >  		break;
> >  	}
> > -	spin_unlock_irqrestore(&ct->requests.lock, flags);
> > -
> >  	if (!found) {
> >  		CT_ERROR(ct, "Unsolicited response (fence %u)\n", fence);
> > -		return -ENOKEY;
> > +		CT_ERROR(ct, "Could not find fence=%u, last_fence=%u\n", fence,
> > +			 ct->requests.last_fence);
> 
> nit: this new wording may suggest that it's our fault, but that's not
> necessary true
> 

I don't think it implies whose fault this is either way.

> > +		list_for_each_entry(req, &ct->requests.pending, link)
> > +			CT_ERROR(ct, "request %u awaits response\n",
> > +				 req->fence);
> 
> usually we don't send multiple requests that expects responses, so it's
> very likely that list with pending requests will be empty, and even if
> list is not empty, I'm not sure what is the relation between those
> pending requests to this unsolicited response, thus wondering how these
> extra errors could improve our debugging experience ?
> 

The more information we have when this occurs, the better.

Matt

> > +		err = -ENOKEY;
> >  	}
> > +	spin_unlock_irqrestore(&ct->requests.lock, flags);
> >  
> >  	if (unlikely(err))
> >  		return err;
> > 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 38/97] drm/i915/guc: Optimize CTB writes and reads
  2021-05-24 13:31   ` Michal Wajdeczko
@ 2021-05-25 17:39     ` Matthew Brost
  0 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-25 17:39 UTC (permalink / raw)
  To: Michal Wajdeczko
  Cc: tvrtko.ursulin, intel-gfx, dri-devel, jason.ekstrand,
	daniele.ceraolospurio, jon.bloomfield, daniel.vetter,
	john.c.harrison

On Mon, May 24, 2021 at 03:31:25PM +0200, Michal Wajdeczko wrote:
> 
> 
> On 06.05.2021 21:13, Matthew Brost wrote:
> > CTB writes are now in the path of command submission and should be
> > optimized for performance. Rather than reading CTB descriptor values
> > (e.g. head, tail, size) which could result in accesses across the PCIe
> 
> size was removed from the descriptor in 25/97
> 

Yep, stale comment.

> > bus, store shadow local copies and only read/write the descriptor
> > values when absolutely necessary.
> 
> maybe worth to add some words about caching available space ?
> 

Yea.

> > 
> > Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 78 +++++++++++++----------
> >  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  6 ++
> >  2 files changed, 52 insertions(+), 32 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > index 4eab319d61be..77dfbc94dcc3 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > @@ -127,6 +127,10 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc)
> >  static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb)
> >  {
> >  	ctb->broken = false;
> > +	ctb->tail = 0;
> > +	ctb->head = 0;
> > +	ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
> > +
> >  	guc_ct_buffer_desc_init(ctb->desc);
> >  }
> >  
> > @@ -371,10 +375,8 @@ static int ct_write(struct intel_guc_ct *ct,
> >  {
> >  	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> >  	struct guc_ct_buffer_desc *desc = ctb->desc;
> > -	u32 head = desc->head;
> > -	u32 tail = desc->tail;
> > +	u32 tail = ctb->tail;
> >  	u32 size = ctb->size;
> > -	u32 used;
> >  	u32 header;
> >  	u32 hxg;
> >  	u32 *cmds = ctb->cmds;
> > @@ -386,25 +388,14 @@ static int ct_write(struct intel_guc_ct *ct,
> >  	if (unlikely(desc->status))
> >  		goto corrupted;
> >  
> > -	if (unlikely((tail | head) >= size)) {
> > +#ifdef CONFIG_DRM_I915_DEBUG_GUC
> > +	if (unlikely((desc->tail | desc->head) >= size)) {
> >  		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
> > -			 head, tail, size);
> > +			 desc->head, desc->tail, size);
> >  		desc->status |= GUC_CTB_STATUS_OVERFLOW;
> >  		goto corrupted;
> 
> nit: as we are now caching tail value, we can start comparing it with
> the value in descriptor and report GUC_CTB_STATUS_MISMATCH if needed
>

Sure.
 
> >  	}
> > -
> > -	/*
> > -	 * tail == head condition indicates empty. GuC FW does not support
> > -	 * using up the entire buffer to get tail == head meaning full.
> > -	 */
> > -	if (tail < head)
> > -		used = (size - head) + tail;
> > -	else
> > -		used = tail - head;
> > -
> > -	/* make sure there is a space including extra dw for the fence */
> > -	if (unlikely(used + len + 1 >= size))
> > -		return -ENOSPC;
> > +#endif
> >  
> >  	/*
> >  	 * dw0: CT header (including fence)
> > @@ -444,7 +435,9 @@ static int ct_write(struct intel_guc_ct *ct,
> >  	write_barrier(ct);
> >  
> >  	/* now update descriptor */
> > +	ctb->tail = tail;
> >  	WRITE_ONCE(desc->tail, tail);
> > +	ctb->space -= len + 1;
> >  
> >  	return 0;
> >  
> > @@ -460,7 +453,7 @@ static int ct_write(struct intel_guc_ct *ct,
> >   * @req:	pointer to pending request
> >   * @status:	placeholder for status
> >   *
> > - * For each sent request, Guc shall send bac CT response message.
> > + * For each sent request, GuC shall send back CT response message.
> >   * Our message handler will update status of tracked request once
> >   * response message with given fence is received. Wait here and
> >   * check for valid response status value.
> > @@ -508,24 +501,35 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct)
> >  	return ret;
> >  }
> >  
> > -static inline bool ctb_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
> > +static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw)
> 
> this function was introduced moment ago ...
> can we minimize number of changes between patches ?
>

Yep.
 
> >  {
> > -	struct guc_ct_buffer_desc *desc = ctb->desc;
> > -	u32 head = READ_ONCE(desc->head);
> > +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> > +	u32 head;
> >  	u32 space;
> >  
> > -	space = CIRC_SPACE(desc->tail, head, ctb->size);
> > +	if (ctb->space >= len_dw)
> > +		return true;
> > +
> > +	head = READ_ONCE(ctb->desc->head);
> > +	if (unlikely(head > ctb->size)) {
> > +		CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u size=%u\n",
> > +			  ctb->desc->head, ctb->desc->tail, ctb->size);
> > +		ctb->desc->status |= GUC_CTB_STATUS_OVERFLOW;
> > +		ctb->broken = true;
> > +		return false;
> > +	}
> > +
> > +	space = CIRC_SPACE(ctb->tail, head, ctb->size);
> > +	ctb->space = space;
> 
> shouldn't we update our ctb->head with new head value?
>

Don't need it here as the cached H2G head value isn't used by the i915,
only the space value is. The cached G2H head value, on the other hand, is
used.
 
> >  
> >  	return space >= len_dw;
> >  }
> >  
> >  static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
> >  {
> > -	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> > -
> >  	lockdep_assert_held(&ct->ctbs.send.lock);
> >  
> > -	if (unlikely(!ctb_has_room(ctb, len_dw))) {
> > +	if (unlikely(!h2g_has_room(ct, len_dw))) {
> >  		if (ct->stall_time == KTIME_MAX)
> >  			ct->stall_time = ktime_get();
> >  
> > @@ -593,11 +597,11 @@ static int ct_send(struct intel_guc_ct *ct,
> >  	 * rare.
> >  	 */
> >  retry:
> > -	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
> > -	if (unlikely(!ctb_has_room(ctb, len + 1))) {
> > +	spin_lock_irqsave(&ctb->lock, flags);
> > +	if (unlikely(!h2g_has_room(ct, len + 1))) {
> >  		if (ct->stall_time == KTIME_MAX)
> >  			ct->stall_time = ktime_get();
> > -		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> > +		spin_unlock_irqrestore(&ctb->lock, flags);
> 
> this ...
> 
> >  
> >  		if (unlikely(ct_deadlocked(ct)))
> >  			return -EDEADLK;
> > @@ -620,7 +624,7 @@ static int ct_send(struct intel_guc_ct *ct,
> >  
> >  	err = ct_write(ct, action, len, fence, 0);
> >  
> > -	spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> > +	spin_unlock_irqrestore(&ctb->lock, flags);
> 
> and this likely could be done in earlier patch
> 

Probably.

> >  
> >  	if (unlikely(err))
> >  		goto unlink;
> > @@ -708,7 +712,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
> >  {
> >  	struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv;
> >  	struct guc_ct_buffer_desc *desc = ctb->desc;
> > -	u32 head = desc->head;
> > +	u32 head = ctb->head;
> >  	u32 tail = desc->tail;
> >  	u32 size = ctb->size;
> >  	u32 *cmds = ctb->cmds;
> > @@ -723,12 +727,21 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
> >  	if (unlikely(desc->status))
> >  		goto corrupted;
> >  
> > -	if (unlikely((tail | head) >= size)) {
> > +#ifdef CONFIG_DRM_I915_DEBUG_GUC
> > +	if (unlikely((desc->tail | desc->head) >= size)) {
> >  		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
> >  			 head, tail, size);
> >  		desc->status |= GUC_CTB_STATUS_OVERFLOW;
> >  		goto corrupted;
> >  	}
> > +#else
> > +	if (unlikely((tail | ctb->head) >= size)) {
> 
> we are in control of cached 'ctb->head' so it shall never be > size
> 
> > +		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
> > +			 head, tail, size);
> > +		desc->status |= GUC_CTB_STATUS_OVERFLOW;
> > +		goto corrupted;
> > +	}
> > +#endif
> >  
> >  	/* tail == head condition indicates empty */
> >  	available = tail - head;
> > @@ -778,6 +791,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
> >  	}
> >  	CT_DEBUG(ct, "received %*ph\n", 4 * len, (*msg)->msg);
> >  
> > +	ctb->head = head;
> >  	/* now update descriptor */
> >  	WRITE_ONCE(desc->head, head);
> >  
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > index 55ef7c52472f..9924335e2ee6 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > @@ -33,6 +33,9 @@ struct intel_guc;
> >   * @desc: pointer to the buffer descriptor
> >   * @cmds: pointer to the commands buffer
> >   * @size: size of the commands buffer in dwords
> > + * @head: local shadow copy of head in dwords
> > + * @tail: local shadow copy of tail in dwords
> > + * @space: local shadow copy of space in dwords
> >   * @broken: flag to indicate if descriptor data is broken
> >   */
> >  struct intel_guc_ct_buffer {
> > @@ -40,6 +43,9 @@ struct intel_guc_ct_buffer {
> >  	struct guc_ct_buffer_desc *desc;
> >  	u32 *cmds;
> >  	u32 size;
> > +	u32 tail;
> > +	u32 head;
> > +	u32 space;
> >  	bool broken;
> >  };
> >  
> > 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 60/97] drm/i915: Track 'serial' counts for virtual engines
  2021-05-25 10:16   ` [Intel-gfx] " Tvrtko Ursulin
@ 2021-05-25 17:52     ` Matthew Brost
  2021-05-26  8:40       ` Tvrtko Ursulin
  0 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-25 17:52 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: jason.ekstrand, daniel.vetter, intel-gfx, dri-devel

On Tue, May 25, 2021 at 11:16:12AM +0100, Tvrtko Ursulin wrote:
> 
> On 06/05/2021 20:14, Matthew Brost wrote:
> > From: John Harrison <John.C.Harrison@Intel.com>
> > 
> > The serial number tracking of engines happens at the backend of
> > request submission and was expecting to only be given physical
> > engines. However, in GuC submission mode, the decomposition of virtual
> > to physical engines does not happen in i915. Instead, requests are
> > submitted to their virtual engine mask all the way through to the
> > hardware (i.e. to GuC). This would mean that the heart beat code
> > thinks the physical engines are idle due to the serial number not
> > incrementing.
> > 
> > This patch updates the tracking to decompose virtual engines into
> > their physical constituents and tracks the request against each. This
> > is not entirely accurate as the GuC will only be issuing the request
> > to one physical engine. However, it is the best that i915 can do given
> > that it has no knowledge of the GuC's scheduling decisions.
> 
> Commit text sounds a bit defeatist. I think instead of making up the serial
> counts, which has downsides (could you please document in the commit what
> they are), we should think how to design things properly.
> 

IMO, I don't think fixing serial counts is in the scope of this series. We
should focus on getting GuC submission in, not on cleaning up all the crap
that is in the i915. Let's make a note of this though so we can revisit it
later.

Matt

> Regards,
> 
> Tvrtko
> 
> > Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >   drivers/gpu/drm/i915/gt/intel_engine_types.h     |  2 ++
> >   .../gpu/drm/i915/gt/intel_execlists_submission.c |  6 ++++++
> >   drivers/gpu/drm/i915/gt/intel_ring_submission.c  |  6 ++++++
> >   drivers/gpu/drm/i915/gt/mock_engine.c            |  6 ++++++
> >   .../gpu/drm/i915/gt/uc/intel_guc_submission.c    | 16 ++++++++++++++++
> >   drivers/gpu/drm/i915/i915_request.c              |  4 +++-
> >   6 files changed, 39 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > index 86302e6d86b2..e2b5cda6dbc4 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > @@ -389,6 +389,8 @@ struct intel_engine_cs {
> >   	void		(*park)(struct intel_engine_cs *engine);
> >   	void		(*unpark)(struct intel_engine_cs *engine);
> > +	void		(*bump_serial)(struct intel_engine_cs *engine);
> > +
> >   	void		(*set_default_submission)(struct intel_engine_cs *engine);
> >   	const struct intel_context_ops *cops;
> > diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > index ae12d7f19ecd..02880ea5d693 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > @@ -3199,6 +3199,11 @@ static void execlists_release(struct intel_engine_cs *engine)
> >   	lrc_fini_wa_ctx(engine);
> >   }
> > +static void execlist_bump_serial(struct intel_engine_cs *engine)
> > +{
> > +	engine->serial++;
> > +}
> > +
> >   static void
> >   logical_ring_default_vfuncs(struct intel_engine_cs *engine)
> >   {
> > @@ -3208,6 +3213,7 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
> >   	engine->cops = &execlists_context_ops;
> >   	engine->request_alloc = execlists_request_alloc;
> > +	engine->bump_serial = execlist_bump_serial;
> >   	engine->reset.prepare = execlists_reset_prepare;
> >   	engine->reset.rewind = execlists_reset_rewind;
> > diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> > index 14aa31879a37..39dd7c4ed0a9 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> > @@ -1045,6 +1045,11 @@ static void setup_irq(struct intel_engine_cs *engine)
> >   	}
> >   }
> > +static void ring_bump_serial(struct intel_engine_cs *engine)
> > +{
> > +	engine->serial++;
> > +}
> > +
> >   static void setup_common(struct intel_engine_cs *engine)
> >   {
> >   	struct drm_i915_private *i915 = engine->i915;
> > @@ -1064,6 +1069,7 @@ static void setup_common(struct intel_engine_cs *engine)
> >   	engine->cops = &ring_context_ops;
> >   	engine->request_alloc = ring_request_alloc;
> > +	engine->bump_serial = ring_bump_serial;
> >   	/*
> >   	 * Using a global execution timeline; the previous final breadcrumb is
> > diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
> > index bd005c1b6fd5..97b10fd60b55 100644
> > --- a/drivers/gpu/drm/i915/gt/mock_engine.c
> > +++ b/drivers/gpu/drm/i915/gt/mock_engine.c
> > @@ -292,6 +292,11 @@ static void mock_engine_release(struct intel_engine_cs *engine)
> >   	intel_engine_fini_retire(engine);
> >   }
> > +static void mock_bump_serial(struct intel_engine_cs *engine)
> > +{
> > +	engine->serial++;
> > +}
> > +
> >   struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
> >   				    const char *name,
> >   				    int id)
> > @@ -318,6 +323,7 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
> >   	engine->base.cops = &mock_context_ops;
> >   	engine->base.request_alloc = mock_request_alloc;
> > +	engine->base.bump_serial = mock_bump_serial;
> >   	engine->base.emit_flush = mock_emit_flush;
> >   	engine->base.emit_fini_breadcrumb = mock_emit_breadcrumb;
> >   	engine->base.submit_request = mock_submit_request;
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index dc79d287c50a..f0e5731bcef6 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -1500,6 +1500,20 @@ static void guc_release(struct intel_engine_cs *engine)
> >   	lrc_fini_wa_ctx(engine);
> >   }
> > +static void guc_bump_serial(struct intel_engine_cs *engine)
> > +{
> > +	engine->serial++;
> > +}
> > +
> > +static void virtual_guc_bump_serial(struct intel_engine_cs *engine)
> > +{
> > +	struct intel_engine_cs *e;
> > +	intel_engine_mask_t tmp, mask = engine->mask;
> > +
> > +	for_each_engine_masked(e, engine->gt, mask, tmp)
> > +		e->serial++;
> > +}
> > +
> >   static void guc_default_vfuncs(struct intel_engine_cs *engine)
> >   {
> >   	/* Default vfuncs which can be overridden by each engine. */
> > @@ -1508,6 +1522,7 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
> >   	engine->cops = &guc_context_ops;
> >   	engine->request_alloc = guc_request_alloc;
> > +	engine->bump_serial = guc_bump_serial;
> >   	engine->sched_engine->schedule = i915_schedule;
> > @@ -1843,6 +1858,7 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
> >   	ve->base.cops = &virtual_guc_context_ops;
> >   	ve->base.request_alloc = guc_request_alloc;
> > +	ve->base.bump_serial = virtual_guc_bump_serial;
> >   	ve->base.submit_request = guc_submit_request;
> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > index 9542a5baa45a..127d60b36422 100644
> > --- a/drivers/gpu/drm/i915/i915_request.c
> > +++ b/drivers/gpu/drm/i915/i915_request.c
> > @@ -692,7 +692,9 @@ bool __i915_request_submit(struct i915_request *request)
> >   				     request->ring->vaddr + request->postfix);
> >   	trace_i915_request_execute(request);
> > -	engine->serial++;
> > +	if (engine->bump_serial)
> > +		engine->bump_serial(engine);
> > +
> >   	result = true;
> >   	GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
> > 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 15/97] drm/i915/guc: Relax CTB response timeout
  2021-05-06 19:13 ` [RFC PATCH 15/97] drm/i915/guc: Relax CTB response timeout Matthew Brost
@ 2021-05-25 18:08   ` Matthew Brost
  2021-05-25 19:37     ` [Intel-gfx] " Michal Wajdeczko
  0 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-25 18:08 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: tvrtko.ursulin, daniele.ceraolospurio, jason.ekstrand,
	jon.bloomfield, daniel.vetter, john.c.harrison

On Thu, May 06, 2021 at 12:13:29PM -0700, Matthew Brost wrote:
> From: Michal Wajdeczko <michal.wajdeczko@intel.com>
> 
> In an upcoming patch we will allow more CTB requests to be sent in
> parallel to the GuC for processing, so we shouldn't assume any more
> that the GuC will always reply within 10ms.
> 
> Use bigger value from CONFIG_DRM_I915_HEARTBEAT_INTERVAL instead.
> 

I think this should be its own config option or we combine it with a
config option suggested in patch 37.

What do you think Michal? If you agree I can fix this up in the post of
these patches.

Matt

> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index c87a0a8bef26..a4b2e7fe318b 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -436,17 +436,23 @@ static int ct_write(struct intel_guc_ct *ct,
>   */
>  static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
>  {
> +	long timeout;
>  	int err;
>  
>  	/*
>  	 * Fast commands should complete in less than 10us, so sample quickly
>  	 * up to that length of time, then switch to a slower sleep-wait loop.
>  	 * No GuC command should ever take longer than 10ms.
> +	 *
> +	 * However, there might be other CT requests in flight before this one,
> +	 * so use @CONFIG_DRM_I915_HEARTBEAT_INTERVAL as backup timeout value.
>  	 */
> +	timeout = max(10, CONFIG_DRM_I915_HEARTBEAT_INTERVAL);
> +
>  #define done INTEL_GUC_MSG_IS_RESPONSE(READ_ONCE(req->status))
>  	err = wait_for_us(done, 10);
>  	if (err)
> -		err = wait_for(done, 10);
> +		err = wait_for(done, timeout);
>  #undef done
>  
>  	if (unlikely(err))
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 18/97] drm/i915/guc: Don't receive all G2H messages in irq handler
  2021-05-06 19:13 ` [RFC PATCH 18/97] drm/i915/guc: Don't receive all G2H messages in irq handler Matthew Brost
@ 2021-05-25 18:15   ` Matthew Brost
  2021-05-25 19:43     ` [Intel-gfx] " Michal Wajdeczko
  0 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-25 18:15 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: tvrtko.ursulin, daniele.ceraolospurio, jason.ekstrand,
	jon.bloomfield, daniel.vetter, john.c.harrison

On Thu, May 06, 2021 at 12:13:32PM -0700, Matthew Brost wrote:
> From: Michal Wajdeczko <michal.wajdeczko@intel.com>
> 
> In the irq handler try to receive just a single G2H message, and let
> other messages be received from the tasklet.
> 
> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 67 ++++++++++++++++-------
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  3 +
>  2 files changed, 50 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index cb58fa7f970c..d630ec32decf 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -81,6 +81,7 @@ enum { CTB_SEND = 0, CTB_RECV = 1 };
>  
>  enum { CTB_OWNER_HOST = 0 };
>  
> +static void ct_receive_tasklet_func(unsigned long data);
>  static void ct_incoming_request_worker_func(struct work_struct *w);
>  
>  /**
> @@ -95,6 +96,7 @@ void intel_guc_ct_init_early(struct intel_guc_ct *ct)
>  	INIT_LIST_HEAD(&ct->requests.pending);
>  	INIT_LIST_HEAD(&ct->requests.incoming);
>  	INIT_WORK(&ct->requests.worker, ct_incoming_request_worker_func);
> +	tasklet_init(&ct->receive_tasklet, ct_receive_tasklet_func, (unsigned long)ct);

This function is deprecated. tasklet_setup should be used.
I can fix this up when I post the CTB patches.
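
Roughly something like this (just a sketch against the receive_tasklet
field added by this patch, not the final code):

	static void ct_receive_tasklet_func(struct tasklet_struct *t)
	{
		struct intel_guc_ct *ct = container_of(t, struct intel_guc_ct,
						       receive_tasklet);

		ct_try_receive_message(ct);
	}

and in intel_guc_ct_init_early():

	tasklet_setup(&ct->receive_tasklet, ct_receive_tasklet_func);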

>  }
>  
>  static inline const char *guc_ct_buffer_type_to_str(u32 type)
> @@ -244,6 +246,7 @@ void intel_guc_ct_fini(struct intel_guc_ct *ct)
>  {
>  	GEM_BUG_ON(ct->enabled);
>  
> +	tasklet_kill(&ct->receive_tasklet);
>  	i915_vma_unpin_and_release(&ct->vma, I915_VMA_RELEASE_MAP);
>  	memset(ct, 0, sizeof(*ct));
>  }
> @@ -629,7 +632,7 @@ static int ct_read(struct intel_guc_ct *ct, u32 *data)
>  	CT_DEBUG(ct, "received %*ph\n", 4 * len, data);
>  
>  	desc->head = head * 4;
> -	return 0;
> +	return available - len;
>  
>  corrupted:
>  	CT_ERROR(ct, "Corrupted descriptor addr=%#x head=%u tail=%u size=%u\n",
> @@ -665,10 +668,10 @@ static int ct_handle_response(struct intel_guc_ct *ct, const u32 *msg)
>  	u32 status;
>  	u32 datalen;
>  	struct ct_request *req;
> +	unsigned long flags;
>  	bool found = false;
>  
>  	GEM_BUG_ON(!ct_header_is_response(header));
> -	GEM_BUG_ON(!in_irq());
>  
>  	/* Response payload shall at least include fence and status */
>  	if (unlikely(len < 2)) {
> @@ -688,7 +691,7 @@ static int ct_handle_response(struct intel_guc_ct *ct, const u32 *msg)
>  
>  	CT_DEBUG(ct, "response fence %u status %#x\n", fence, status);
>  
> -	spin_lock(&ct->requests.lock);
> +	spin_lock_irqsave(&ct->requests.lock, flags);
>  	list_for_each_entry(req, &ct->requests.pending, link) {
>  		if (unlikely(fence != req->fence)) {
>  			CT_DEBUG(ct, "request %u awaits response\n",
> @@ -707,7 +710,7 @@ static int ct_handle_response(struct intel_guc_ct *ct, const u32 *msg)
>  		found = true;
>  		break;
>  	}
> -	spin_unlock(&ct->requests.lock);
> +	spin_unlock_irqrestore(&ct->requests.lock, flags);
>  
>  	if (!found)
>  		CT_ERROR(ct, "Unsolicited response %*ph\n", msgsize, msg);
> @@ -821,31 +824,55 @@ static int ct_handle_request(struct intel_guc_ct *ct, const u32 *msg)
>  	return 0;
>  }
>  
> +static int ct_receive(struct intel_guc_ct *ct)
> +{
> +	u32 msg[GUC_CT_MSG_LEN_MASK + 1]; /* one extra dw for the header */
> +	unsigned long flags;
> +	int ret;
> +
> +	spin_lock_irqsave(&ct->ctbs.recv.lock, flags);
> +	ret = ct_read(ct, msg);
> +	spin_unlock_irqrestore(&ct->ctbs.recv.lock, flags);
> +	if (ret < 0)
> +		return ret;
> +
> +	if (ct_header_is_response(msg[0]))
> +		ct_handle_response(ct, msg);
> +	else
> +		ct_handle_request(ct, msg);
> +
> +	return ret;
> +}
> +
> +static void ct_try_receive_message(struct intel_guc_ct *ct)
> +{
> +	int ret;
> +
> +	if (GEM_WARN_ON(!ct->enabled))
> +		return;
> +
> +	ret = ct_receive(ct);
> +	if (ret > 0)
> +		tasklet_hi_schedule(&ct->receive_tasklet);
> +}
> +
> +static void ct_receive_tasklet_func(unsigned long data)
> +{

As mentioned above, tasklet_init is deprecated. The callback now accepts
the tasklet structure and ct can be looked up via the container_of macro.

Everything else looks good.

With that and changing to the new tasklet API:
Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> +	struct intel_guc_ct *ct = (struct intel_guc_ct *)data;
> +
> +	ct_try_receive_message(ct);
> +}
> +
>  /*
>   * When we're communicating with the GuC over CT, GuC uses events
>   * to notify us about new messages being posted on the RECV buffer.
>   */
>  void intel_guc_ct_event_handler(struct intel_guc_ct *ct)
>  {
> -	u32 msg[GUC_CT_MSG_LEN_MASK + 1]; /* one extra dw for the header */
> -	unsigned long flags;
> -	int err = 0;
> -
>  	if (unlikely(!ct->enabled)) {
>  		WARN(1, "Unexpected GuC event received while CT disabled!\n");
>  		return;
>  	}
>  
> -	do {
> -		spin_lock_irqsave(&ct->ctbs.recv.lock, flags);
> -		err = ct_read(ct, msg);
> -		spin_unlock_irqrestore(&ct->ctbs.recv.lock, flags);
> -		if (err)
> -			break;
> -
> -		if (ct_header_is_response(msg[0]))
> -			err = ct_handle_response(ct, msg);
> -		else
> -			err = ct_handle_request(ct, msg);
> -	} while (!err);
> +	ct_try_receive_message(ct);
>  }
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> index bc52dc479a14..cb222f202301 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> @@ -6,6 +6,7 @@
>  #ifndef _INTEL_GUC_CT_H_
>  #define _INTEL_GUC_CT_H_
>  
> +#include <linux/interrupt.h>
>  #include <linux/spinlock.h>
>  #include <linux/workqueue.h>
>  
> @@ -55,6 +56,8 @@ struct intel_guc_ct {
>  		struct intel_guc_ct_buffer recv;
>  	} ctbs;
>  
> +	struct tasklet_struct receive_tasklet;
> +
>  	struct {
>  		u32 last_fence; /* last fence used to send request */
>  
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 19/97] drm/i915/guc: Always copy CT message to new allocation
  2021-05-06 19:13 ` [RFC PATCH 19/97] drm/i915/guc: Always copy CT message to new allocation Matthew Brost
@ 2021-05-25 18:25   ` Matthew Brost
  0 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-25 18:25 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: tvrtko.ursulin, daniele.ceraolospurio, jason.ekstrand,
	jon.bloomfield, daniel.vetter, john.c.harrison

On Thu, May 06, 2021 at 12:13:33PM -0700, Matthew Brost wrote:
> From: Michal Wajdeczko <michal.wajdeczko@intel.com>
> 
> Since most of the future CT traffic will be based on G2H requests,
> instead of copying the incoming CT message to a static buffer and then
> creating a new allocation for such a request, always copy the incoming
> CT message to a new allocation. Also, by doing it while reading the CT
> header, we can safely fall back if that atomic allocation fails.
> 
> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Cc: Piotr Piórkowski <piotr.piorkowski@intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 180 ++++++++++++++--------
>  1 file changed, 120 insertions(+), 60 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index d630ec32decf..a174978c6a27 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -72,8 +72,9 @@ struct ct_request {
>  	u32 *response_buf;
>  };
>  
> -struct ct_incoming_request {
> +struct ct_incoming_msg {
>  	struct list_head link;
> +	u32 size;
>  	u32 msg[];
>  };
>  
> @@ -575,7 +576,26 @@ static inline bool ct_header_is_response(u32 header)
>  	return !!(header & GUC_CT_MSG_IS_RESPONSE);
>  }
>  
> -static int ct_read(struct intel_guc_ct *ct, u32 *data)
> +static struct ct_incoming_msg *ct_alloc_msg(u32 num_dwords)
> +{
> +	struct ct_incoming_msg *msg;
> +
> +	msg = kmalloc(sizeof(*msg) + sizeof(u32) * num_dwords, GFP_ATOMIC);
> +	if (msg)
> +		msg->size = num_dwords;
> +	return msg;
> +}
> +
> +static void ct_free_msg(struct ct_incoming_msg *msg)
> +{
> +	kfree(msg);
> +}
> +
> +/*
> + * Return: number available remaining dwords to read (0 if empty)
> + *         or a negative error code on failure
> + */
> +static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>  {
>  	struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv;
>  	struct guc_ct_buffer_desc *desc = ctb->desc;
> @@ -586,6 +606,7 @@ static int ct_read(struct intel_guc_ct *ct, u32 *data)
>  	s32 available;
>  	unsigned int len;
>  	unsigned int i;
> +	u32 header;
>  
>  	if (unlikely(desc->is_in_error))
>  		return -EPIPE;
> @@ -601,8 +622,10 @@ static int ct_read(struct intel_guc_ct *ct, u32 *data)
>  
>  	/* tail == head condition indicates empty */
>  	available = tail - head;
> -	if (unlikely(available == 0))
> -		return -ENODATA;
> +	if (unlikely(available == 0)) {
> +		*msg = NULL;
> +		return 0;
> +	}
>  
>  	/* beware of buffer wrap case */
>  	if (unlikely(available < 0))
> @@ -610,14 +633,14 @@ static int ct_read(struct intel_guc_ct *ct, u32 *data)
>  	CT_DEBUG(ct, "available %d (%u:%u)\n", available, head, tail);
>  	GEM_BUG_ON(available < 0);
>  
> -	data[0] = cmds[head];
> +	header = cmds[head];
>  	head = (head + 1) % size;
>  
>  	/* message len with header */
> -	len = ct_header_get_len(data[0]) + 1;
> +	len = ct_header_get_len(header) + 1;
>  	if (unlikely(len > (u32)available)) {
>  		CT_ERROR(ct, "Incomplete message %*ph %*ph %*ph\n",
> -			 4, data,
> +			 4, &header,
>  			 4 * (head + available - 1 > size ?
>  			      size - head : available - 1), &cmds[head],
>  			 4 * (head + available - 1 > size ?
> @@ -625,11 +648,24 @@ static int ct_read(struct intel_guc_ct *ct, u32 *data)
>  		goto corrupted;
>  	}
>  
> +	*msg = ct_alloc_msg(len);
> +	if (!*msg) {
> +		CT_ERROR(ct, "No memory for message %*ph %*ph %*ph\n",
> +			 4, &header,
> +			 4 * (head + available - 1 > size ?
> +			      size - head : available - 1), &cmds[head],
> +			 4 * (head + available - 1 > size ?
> +			      available - 1 - size + head : 0), &cmds[0]);
> +		return available;
> +	}
> +
> +	(*msg)->msg[0] = header;
> +
>  	for (i = 1; i < len; i++) {
> -		data[i] = cmds[head];
> +		(*msg)->msg[i] = cmds[head];
>  		head = (head + 1) % size;
>  	}
> -	CT_DEBUG(ct, "received %*ph\n", 4 * len, data);
> +	CT_DEBUG(ct, "received %*ph\n", 4 * len, (*msg)->msg);
>  
>  	desc->head = head * 4;
>  	return available - len;
> @@ -659,33 +695,33 @@ static int ct_read(struct intel_guc_ct *ct, u32 *data)
>   *                   ^-----------------------len-----------------------^
>   */
>  
> -static int ct_handle_response(struct intel_guc_ct *ct, const u32 *msg)
> +static int ct_handle_response(struct intel_guc_ct *ct, struct ct_incoming_msg *response)
>  {
> -	u32 header = msg[0];
> +	u32 header = response->msg[0];
>  	u32 len = ct_header_get_len(header);
> -	u32 msgsize = (len + 1) * sizeof(u32); /* msg size in bytes w/header */
>  	u32 fence;
>  	u32 status;
>  	u32 datalen;
>  	struct ct_request *req;
>  	unsigned long flags;
>  	bool found = false;
> +	int err = 0;
>  
>  	GEM_BUG_ON(!ct_header_is_response(header));
>  
>  	/* Response payload shall at least include fence and status */
>  	if (unlikely(len < 2)) {
> -		CT_ERROR(ct, "Corrupted response %*ph\n", msgsize, msg);
> +		CT_ERROR(ct, "Corrupted response (len %u)\n", len);
>  		return -EPROTO;
>  	}
>  
> -	fence = msg[1];
> -	status = msg[2];
> +	fence = response->msg[1];
> +	status = response->msg[2];
>  	datalen = len - 2;
>  
>  	/* Format of the status follows RESPONSE message */
>  	if (unlikely(!INTEL_GUC_MSG_IS_RESPONSE(status))) {
> -		CT_ERROR(ct, "Corrupted response %*ph\n", msgsize, msg);
> +		CT_ERROR(ct, "Corrupted response (status %#x)\n", status);
>  		return -EPROTO;
>  	}
>  
> @@ -699,12 +735,13 @@ static int ct_handle_response(struct intel_guc_ct *ct, const u32 *msg)
>  			continue;
>  		}
>  		if (unlikely(datalen > req->response_len)) {
> -			CT_ERROR(ct, "Response for %u is too long %*ph\n",
> -				 req->fence, msgsize, msg);
> -			datalen = 0;
> +			CT_ERROR(ct, "Response %u too long (datalen %u > %u)\n",
> +				 req->fence, datalen, req->response_len);
> +			datalen = min(datalen, req->response_len);
> +			err = -EMSGSIZE;
>  		}
>  		if (datalen)
> -			memcpy(req->response_buf, msg + 3, 4 * datalen);
> +			memcpy(req->response_buf, response->msg + 3, 4 * datalen);
>  		req->response_len = datalen;
>  		WRITE_ONCE(req->status, status);
>  		found = true;
> @@ -712,45 +749,61 @@ static int ct_handle_response(struct intel_guc_ct *ct, const u32 *msg)
>  	}
>  	spin_unlock_irqrestore(&ct->requests.lock, flags);
>  
> -	if (!found)
> -		CT_ERROR(ct, "Unsolicited response %*ph\n", msgsize, msg);
> +	if (!found) {
> +		CT_ERROR(ct, "Unsolicited response (fence %u)\n", fence);
> +		return -ENOKEY;
> +	}
> +
> +	if (unlikely(err))
> +		return err;
> +
> +	ct_free_msg(response);
>  	return 0;
>  }
>  
> -static void ct_process_request(struct intel_guc_ct *ct,
> -			       u32 action, u32 len, const u32 *payload)
> +static int ct_process_request(struct intel_guc_ct *ct, struct ct_incoming_msg *request)
>  {
>  	struct intel_guc *guc = ct_to_guc(ct);
> +	u32 header, action, len;
> +	const u32 *payload;
>  	int ret;
>  
> +	header = request->msg[0];
> +	payload = &request->msg[1];
> +	action = ct_header_get_action(header);
> +	len = ct_header_get_len(header);
> +
>  	CT_DEBUG(ct, "request %x %*ph\n", action, 4 * len, payload);
>  
>  	switch (action) {
>  	case INTEL_GUC_ACTION_DEFAULT:
>  		ret = intel_guc_to_host_process_recv_msg(guc, payload, len);
> -		if (unlikely(ret))
> -			goto fail_unexpected;
>  		break;
> -
>  	default:
> -fail_unexpected:
> -		CT_ERROR(ct, "Unexpected request %x %*ph\n",
> -			 action, 4 * len, payload);
> +		ret = -EOPNOTSUPP;
>  		break;
>  	}
> +
> +	if (unlikely(ret)) {
> +		CT_ERROR(ct, "Failed to process request %04x (%pe)\n",
> +			 action, ERR_PTR(ret));
> +		return ret;
> +	}
> +
> +	ct_free_msg(request);
> +	return 0;
>  }
>  
>  static bool ct_process_incoming_requests(struct intel_guc_ct *ct)
>  {
>  	unsigned long flags;
> -	struct ct_incoming_request *request;
> -	u32 header;
> -	u32 *payload;
> +	struct ct_incoming_msg *request;
>  	bool done;
> +	int err;
>  
>  	spin_lock_irqsave(&ct->requests.lock, flags);
>  	request = list_first_entry_or_null(&ct->requests.incoming,
> -					   struct ct_incoming_request, link);
> +					   struct ct_incoming_msg, link);
>  	if (request)
>  		list_del(&request->link);
>  	done = !!list_empty(&ct->requests.incoming);
> @@ -759,14 +812,13 @@ static bool ct_process_incoming_requests(struct intel_guc_ct *ct)
>  	if (!request)
>  		return true;
>  
> -	header = request->msg[0];
> -	payload = &request->msg[1];
> -	ct_process_request(ct,
> -			   ct_header_get_action(header),
> -			   ct_header_get_len(header),
> -			   payload);
> +	err = ct_process_request(ct, request);
> +	if (unlikely(err)) {
> +		CT_ERROR(ct, "Failed to process CT message (%pe) %*ph\n",
> +			 ERR_PTR(err), 4 * request->size, request->msg);
> +		ct_free_msg(request);
> +	}
>  
> -	kfree(request);
>  	return done;
>  }
>  
> @@ -799,22 +851,11 @@ static void ct_incoming_request_worker_func(struct work_struct *w)
>   *                   ^-----------------------len-----------------------^
>   */
>  
> -static int ct_handle_request(struct intel_guc_ct *ct, const u32 *msg)
> +static int ct_handle_request(struct intel_guc_ct *ct, struct ct_incoming_msg *request)
>  {
> -	u32 header = msg[0];
> -	u32 len = ct_header_get_len(header);
> -	u32 msgsize = (len + 1) * sizeof(u32); /* msg size in bytes w/header */
> -	struct ct_incoming_request *request;
>  	unsigned long flags;
>  
> -	GEM_BUG_ON(ct_header_is_response(header));
> -
> -	request = kmalloc(sizeof(*request) + msgsize, GFP_ATOMIC);
> -	if (unlikely(!request)) {
> -		CT_ERROR(ct, "Dropping request %*ph\n", msgsize, msg);
> -		return 0; /* XXX: -ENOMEM ? */
> -	}
> -	memcpy(request->msg, msg, msgsize);
> +	GEM_BUG_ON(ct_header_is_response(request->msg[0]));
>  
>  	spin_lock_irqsave(&ct->requests.lock, flags);
>  	list_add_tail(&request->link, &ct->requests.incoming);
> @@ -824,22 +865,41 @@ static int ct_handle_request(struct intel_guc_ct *ct, const u32 *msg)
>  	return 0;
>  }
>  
> +static void ct_handle_msg(struct intel_guc_ct *ct, struct ct_incoming_msg *msg)
> +{
> +	u32 header = msg->msg[0];
> +	int err;
> +
> +	if (ct_header_is_response(header))
> +		err = ct_handle_response(ct, msg);
> +	else
> +		err = ct_handle_request(ct, msg);
> +
> +	if (unlikely(err)) {
> +		CT_ERROR(ct, "Failed to process CT message (%pe) %*ph\n",
> +			 ERR_PTR(err), 4 * msg->size, msg->msg);
> +		ct_free_msg(msg);
> +	}
> +}
> +
> +/*
> + * Return: number available remaining dwords to read (0 if empty)
> + *         or a negative error code on failure
> + */
>  static int ct_receive(struct intel_guc_ct *ct)
>  {
> -	u32 msg[GUC_CT_MSG_LEN_MASK + 1]; /* one extra dw for the header */
> +	struct ct_incoming_msg *msg = NULL;
>  	unsigned long flags;
>  	int ret;
>  
>  	spin_lock_irqsave(&ct->ctbs.recv.lock, flags);
> -	ret = ct_read(ct, msg);
> +	ret = ct_read(ct, &msg);
>  	spin_unlock_irqrestore(&ct->ctbs.recv.lock, flags);
>  	if (ret < 0)
>  		return ret;
>  
> -	if (ct_header_is_response(msg[0]))
> -		ct_handle_response(ct, msg);
> -	else
> -		ct_handle_request(ct, msg);
> +	if (msg)
> +		ct_handle_msg(ct, msg);
>  
>  	return ret;
>  }
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 15/97] drm/i915/guc: Relax CTB response timeout
  2021-05-25 18:08   ` Matthew Brost
@ 2021-05-25 19:37     ` Michal Wajdeczko
  0 siblings, 0 replies; 249+ messages in thread
From: Michal Wajdeczko @ 2021-05-25 19:37 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel
  Cc: jason.ekstrand, daniel.vetter, Tvrtko Ursulin



On 25.05.2021 20:08, Matthew Brost wrote:
> On Thu, May 06, 2021 at 12:13:29PM -0700, Matthew Brost wrote:
>> From: Michal Wajdeczko <michal.wajdeczko@intel.com>
>>
>> In an upcoming patch we will allow more CTB requests to be sent in
>> parallel to the GuC for processing, so we shouldn't assume any more
>> that the GuC will always reply within 10ms.
>>
>> Use bigger value from CONFIG_DRM_I915_HEARTBEAT_INTERVAL instead.
>>
> 
> I think this should be its own config option or we combine it with a
> config option suggested in patch 37.
> 
> What do you think Michal? If you agree I can fix this up in the post of
> these patches.

+ Tvrtko

yep, use of a dedicated GuC CONFIG is what we also agreed on internally,
the existing HEARTBEAT_INTERVAL was just the fastest way to bypass the
10ms limit

so, yes, please go ahead and do it right
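
For the record, I'd imagine something along these lines (config name and
default are purely illustrative here and would need to be reconciled with
the option proposed in patch 37):

	config DRM_I915_GUC_CTB_TIMEOUT
		int "How long to wait for a GuC CT request to complete (ms)"
		default 1500 # milliseconds
		help
		  Maximum time, in milliseconds, to wait for a response to an
		  H2G CTB request before giving up.

and then wait_for_ct_request_update() would simply use:

	timeout = max(10, CONFIG_DRM_I915_GUC_CTB_TIMEOUT);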

> 
> Matt
> 
>> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>> ---
>>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 8 +++++++-
>>  1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>> index c87a0a8bef26..a4b2e7fe318b 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>> @@ -436,17 +436,23 @@ static int ct_write(struct intel_guc_ct *ct,
>>   */
>>  static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
>>  {
>> +	long timeout;
>>  	int err;
>>  
>>  	/*
>>  	 * Fast commands should complete in less than 10us, so sample quickly
>>  	 * up to that length of time, then switch to a slower sleep-wait loop.
>>  	 * No GuC command should ever take longer than 10ms.
>> +	 *
>> +	 * However, there might be other CT requests in flight before this one,
>> +	 * so use @CONFIG_DRM_I915_HEARTBEAT_INTERVAL as backup timeout value.
>>  	 */
>> +	timeout = max(10, CONFIG_DRM_I915_HEARTBEAT_INTERVAL);
>> +
>>  #define done INTEL_GUC_MSG_IS_RESPONSE(READ_ONCE(req->status))
>>  	err = wait_for_us(done, 10);
>>  	if (err)
>> -		err = wait_for(done, 10);
>> +		err = wait_for(done, timeout);
>>  #undef done
>>  
>>  	if (unlikely(err))
>> -- 
>> 2.28.0
>>
> 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 18/97] drm/i915/guc: Don't receive all G2H messages in irq handler
  2021-05-25 18:15   ` Matthew Brost
@ 2021-05-25 19:43     ` Michal Wajdeczko
  0 siblings, 0 replies; 249+ messages in thread
From: Michal Wajdeczko @ 2021-05-25 19:43 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel; +Cc: jason.ekstrand, daniel.vetter



On 25.05.2021 20:15, Matthew Brost wrote:
> On Thu, May 06, 2021 at 12:13:32PM -0700, Matthew Brost wrote:
>> From: Michal Wajdeczko <michal.wajdeczko@intel.com>
>>
>> In the irq handler try to receive just a single G2H message, and let
>> other messages be received from the tasklet.
>>
>> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>> ---
>>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 67 ++++++++++++++++-------
>>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  3 +
>>  2 files changed, 50 insertions(+), 20 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>> index cb58fa7f970c..d630ec32decf 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>> @@ -81,6 +81,7 @@ enum { CTB_SEND = 0, CTB_RECV = 1 };
>>  
>>  enum { CTB_OWNER_HOST = 0 };
>>  
>> +static void ct_receive_tasklet_func(unsigned long data);
>>  static void ct_incoming_request_worker_func(struct work_struct *w);
>>  
>>  /**
>> @@ -95,6 +96,7 @@ void intel_guc_ct_init_early(struct intel_guc_ct *ct)
>>  	INIT_LIST_HEAD(&ct->requests.pending);
>>  	INIT_LIST_HEAD(&ct->requests.incoming);
>>  	INIT_WORK(&ct->requests.worker, ct_incoming_request_worker_func);
>> +	tasklet_init(&ct->receive_tasklet, ct_receive_tasklet_func, (unsigned long)ct);
> 
> This function is deprecated. tasklet_setup should be used.
> I can fix this up when I post the CTB patches.

I didn't notice that, but yes, passing struct to callback will be
cleaner, so please fix it

Thanks,
Michal

> 
>>  }
>>  
>>  static inline const char *guc_ct_buffer_type_to_str(u32 type)
>> @@ -244,6 +246,7 @@ void intel_guc_ct_fini(struct intel_guc_ct *ct)
>>  {
>>  	GEM_BUG_ON(ct->enabled);
>>  
>> +	tasklet_kill(&ct->receive_tasklet);
>>  	i915_vma_unpin_and_release(&ct->vma, I915_VMA_RELEASE_MAP);
>>  	memset(ct, 0, sizeof(*ct));
>>  }
>> @@ -629,7 +632,7 @@ static int ct_read(struct intel_guc_ct *ct, u32 *data)
>>  	CT_DEBUG(ct, "received %*ph\n", 4 * len, data);
>>  
>>  	desc->head = head * 4;
>> -	return 0;
>> +	return available - len;
>>  
>>  corrupted:
>>  	CT_ERROR(ct, "Corrupted descriptor addr=%#x head=%u tail=%u size=%u\n",
>> @@ -665,10 +668,10 @@ static int ct_handle_response(struct intel_guc_ct *ct, const u32 *msg)
>>  	u32 status;
>>  	u32 datalen;
>>  	struct ct_request *req;
>> +	unsigned long flags;
>>  	bool found = false;
>>  
>>  	GEM_BUG_ON(!ct_header_is_response(header));
>> -	GEM_BUG_ON(!in_irq());
>>  
>>  	/* Response payload shall at least include fence and status */
>>  	if (unlikely(len < 2)) {
>> @@ -688,7 +691,7 @@ static int ct_handle_response(struct intel_guc_ct *ct, const u32 *msg)
>>  
>>  	CT_DEBUG(ct, "response fence %u status %#x\n", fence, status);
>>  
>> -	spin_lock(&ct->requests.lock);
>> +	spin_lock_irqsave(&ct->requests.lock, flags);
>>  	list_for_each_entry(req, &ct->requests.pending, link) {
>>  		if (unlikely(fence != req->fence)) {
>>  			CT_DEBUG(ct, "request %u awaits response\n",
>> @@ -707,7 +710,7 @@ static int ct_handle_response(struct intel_guc_ct *ct, const u32 *msg)
>>  		found = true;
>>  		break;
>>  	}
>> -	spin_unlock(&ct->requests.lock);
>> +	spin_unlock_irqrestore(&ct->requests.lock, flags);
>>  
>>  	if (!found)
>>  		CT_ERROR(ct, "Unsolicited response %*ph\n", msgsize, msg);
>> @@ -821,31 +824,55 @@ static int ct_handle_request(struct intel_guc_ct *ct, const u32 *msg)
>>  	return 0;
>>  }
>>  
>> +static int ct_receive(struct intel_guc_ct *ct)
>> +{
>> +	u32 msg[GUC_CT_MSG_LEN_MASK + 1]; /* one extra dw for the header */
>> +	unsigned long flags;
>> +	int ret;
>> +
>> +	spin_lock_irqsave(&ct->ctbs.recv.lock, flags);
>> +	ret = ct_read(ct, msg);
>> +	spin_unlock_irqrestore(&ct->ctbs.recv.lock, flags);
>> +	if (ret < 0)
>> +		return ret;
>> +
>> +	if (ct_header_is_response(msg[0]))
>> +		ct_handle_response(ct, msg);
>> +	else
>> +		ct_handle_request(ct, msg);
>> +
>> +	return ret;
>> +}
>> +
>> +static void ct_try_receive_message(struct intel_guc_ct *ct)
>> +{
>> +	int ret;
>> +
>> +	if (GEM_WARN_ON(!ct->enabled))
>> +		return;
>> +
>> +	ret = ct_receive(ct);
>> +	if (ret > 0)
>> +		tasklet_hi_schedule(&ct->receive_tasklet);
>> +}
>> +
>> +static void ct_receive_tasklet_func(unsigned long data)
>> +{
> 
> As mentioned above, tasklet_init is deprecated. The callback now accepts
> the tasklet structure and ct can be looked up via the container_of macro.
> 
> Everything else looks good.
> 
> With that and changing to the new tasklet API:
> Reviewed-by: Matthew Brost <matthew.brost@intel.com>
> 
>> +	struct intel_guc_ct *ct = (struct intel_guc_ct *)data;
>> +
>> +	ct_try_receive_message(ct);
>> +}
>> +
>>  /*
>>   * When we're communicating with the GuC over CT, GuC uses events
>>   * to notify us about new messages being posted on the RECV buffer.
>>   */
>>  void intel_guc_ct_event_handler(struct intel_guc_ct *ct)
>>  {
>> -	u32 msg[GUC_CT_MSG_LEN_MASK + 1]; /* one extra dw for the header */
>> -	unsigned long flags;
>> -	int err = 0;
>> -
>>  	if (unlikely(!ct->enabled)) {
>>  		WARN(1, "Unexpected GuC event received while CT disabled!\n");
>>  		return;
>>  	}
>>  
>> -	do {
>> -		spin_lock_irqsave(&ct->ctbs.recv.lock, flags);
>> -		err = ct_read(ct, msg);
>> -		spin_unlock_irqrestore(&ct->ctbs.recv.lock, flags);
>> -		if (err)
>> -			break;
>> -
>> -		if (ct_header_is_response(msg[0]))
>> -			err = ct_handle_response(ct, msg);
>> -		else
>> -			err = ct_handle_request(ct, msg);
>> -	} while (!err);
>> +	ct_try_receive_message(ct);
>>  }
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>> index bc52dc479a14..cb222f202301 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>> @@ -6,6 +6,7 @@
>>  #ifndef _INTEL_GUC_CT_H_
>>  #define _INTEL_GUC_CT_H_
>>  
>> +#include <linux/interrupt.h>
>>  #include <linux/spinlock.h>
>>  #include <linux/workqueue.h>
>>  
>> @@ -55,6 +56,8 @@ struct intel_guc_ct {
>>  		struct intel_guc_ct_buffer recv;
>>  	} ctbs;
>>  
>> +	struct tasklet_struct receive_tasklet;
>> +
>>  	struct {
>>  		u32 last_fence; /* last fence used to send request */
>>  
>> -- 
>> 2.28.0
>>
> 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 60/97] drm/i915: Track 'serial' counts for virtual engines
  2021-05-25 17:52     ` Matthew Brost
@ 2021-05-26  8:40       ` Tvrtko Ursulin
  2021-05-26 18:45         ` John Harrison
  0 siblings, 1 reply; 249+ messages in thread
From: Tvrtko Ursulin @ 2021-05-26  8:40 UTC (permalink / raw)
  To: Matthew Brost; +Cc: jason.ekstrand, daniel.vetter, intel-gfx, dri-devel


On 25/05/2021 18:52, Matthew Brost wrote:
> On Tue, May 25, 2021 at 11:16:12AM +0100, Tvrtko Ursulin wrote:
>>
>> On 06/05/2021 20:14, Matthew Brost wrote:
>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>
>>> The serial number tracking of engines happens at the backend of
>>> request submission and was expecting to only be given physical
>>> engines. However, in GuC submission mode, the decomposition of virtual
>>> to physical engines does not happen in i915. Instead, requests are
>>> submitted to their virtual engine mask all the way through to the
>>> hardware (i.e. to GuC). This would mean that the heart beat code
>>> thinks the physical engines are idle due to the serial number not
>>> incrementing.
>>>
>>> This patch updates the tracking to decompose virtual engines into
>>> their physical constituents and tracks the request against each. This
>>> is not entirely accurate as the GuC will only be issuing the request
>>> to one physical engine. However, it is the best that i915 can do given
>>> that it has no knowledge of the GuC's scheduling decisions.
>>
>> Commit text sounds a bit defeatist. I think instead of making up the serial
>> counts, which has downsides (could you please document in the commit what
>> they are), we should think how to design things properly.
>>
> 
> IMO, I don't think fixing serial counts is in the scope of this series. We
> should focus on getting GuC submission in, not on cleaning up all the crap
> that is in the i915. Let's make a note of this though so we can revisit it
> later.

I will say again - commit message implies it is introducing an 
unspecified downside by not fully fixing an also unspecified issue. It 
is completely reasonable, and customary even, to ask for both to be 
documented in the commit message.

If we are abandoning the normal review process someone please say so, so
that I don't waste my time reading it.

Regards,

Tvrtko

> Matt
> 
>> Regards,
>>
>> Tvrtko
>>
>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/gt/intel_engine_types.h     |  2 ++
>>>    .../gpu/drm/i915/gt/intel_execlists_submission.c |  6 ++++++
>>>    drivers/gpu/drm/i915/gt/intel_ring_submission.c  |  6 ++++++
>>>    drivers/gpu/drm/i915/gt/mock_engine.c            |  6 ++++++
>>>    .../gpu/drm/i915/gt/uc/intel_guc_submission.c    | 16 ++++++++++++++++
>>>    drivers/gpu/drm/i915/i915_request.c              |  4 +++-
>>>    6 files changed, 39 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
>>> index 86302e6d86b2..e2b5cda6dbc4 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
>>> +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
>>> @@ -389,6 +389,8 @@ struct intel_engine_cs {
>>>    	void		(*park)(struct intel_engine_cs *engine);
>>>    	void		(*unpark)(struct intel_engine_cs *engine);
>>> +	void		(*bump_serial)(struct intel_engine_cs *engine);
>>> +
>>>    	void		(*set_default_submission)(struct intel_engine_cs *engine);
>>>    	const struct intel_context_ops *cops;
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
>>> index ae12d7f19ecd..02880ea5d693 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
>>> @@ -3199,6 +3199,11 @@ static void execlists_release(struct intel_engine_cs *engine)
>>>    	lrc_fini_wa_ctx(engine);
>>>    }
>>> +static void execlist_bump_serial(struct intel_engine_cs *engine)
>>> +{
>>> +	engine->serial++;
>>> +}
>>> +
>>>    static void
>>>    logical_ring_default_vfuncs(struct intel_engine_cs *engine)
>>>    {
>>> @@ -3208,6 +3213,7 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
>>>    	engine->cops = &execlists_context_ops;
>>>    	engine->request_alloc = execlists_request_alloc;
>>> +	engine->bump_serial = execlist_bump_serial;
>>>    	engine->reset.prepare = execlists_reset_prepare;
>>>    	engine->reset.rewind = execlists_reset_rewind;
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
>>> index 14aa31879a37..39dd7c4ed0a9 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
>>> @@ -1045,6 +1045,11 @@ static void setup_irq(struct intel_engine_cs *engine)
>>>    	}
>>>    }
>>> +static void ring_bump_serial(struct intel_engine_cs *engine)
>>> +{
>>> +	engine->serial++;
>>> +}
>>> +
>>>    static void setup_common(struct intel_engine_cs *engine)
>>>    {
>>>    	struct drm_i915_private *i915 = engine->i915;
>>> @@ -1064,6 +1069,7 @@ static void setup_common(struct intel_engine_cs *engine)
>>>    	engine->cops = &ring_context_ops;
>>>    	engine->request_alloc = ring_request_alloc;
>>> +	engine->bump_serial = ring_bump_serial;
>>>    	/*
>>>    	 * Using a global execution timeline; the previous final breadcrumb is
>>> diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
>>> index bd005c1b6fd5..97b10fd60b55 100644
>>> --- a/drivers/gpu/drm/i915/gt/mock_engine.c
>>> +++ b/drivers/gpu/drm/i915/gt/mock_engine.c
>>> @@ -292,6 +292,11 @@ static void mock_engine_release(struct intel_engine_cs *engine)
>>>    	intel_engine_fini_retire(engine);
>>>    }
>>> +static void mock_bump_serial(struct intel_engine_cs *engine)
>>> +{
>>> +	engine->serial++;
>>> +}
>>> +
>>>    struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
>>>    				    const char *name,
>>>    				    int id)
>>> @@ -318,6 +323,7 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
>>>    	engine->base.cops = &mock_context_ops;
>>>    	engine->base.request_alloc = mock_request_alloc;
>>> +	engine->base.bump_serial = mock_bump_serial;
>>>    	engine->base.emit_flush = mock_emit_flush;
>>>    	engine->base.emit_fini_breadcrumb = mock_emit_breadcrumb;
>>>    	engine->base.submit_request = mock_submit_request;
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>> index dc79d287c50a..f0e5731bcef6 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>> @@ -1500,6 +1500,20 @@ static void guc_release(struct intel_engine_cs *engine)
>>>    	lrc_fini_wa_ctx(engine);
>>>    }
>>> +static void guc_bump_serial(struct intel_engine_cs *engine)
>>> +{
>>> +	engine->serial++;
>>> +}
>>> +
>>> +static void virtual_guc_bump_serial(struct intel_engine_cs *engine)
>>> +{
>>> +	struct intel_engine_cs *e;
>>> +	intel_engine_mask_t tmp, mask = engine->mask;
>>> +
>>> +	for_each_engine_masked(e, engine->gt, mask, tmp)
>>> +		e->serial++;
>>> +}
>>> +
>>>    static void guc_default_vfuncs(struct intel_engine_cs *engine)
>>>    {
>>>    	/* Default vfuncs which can be overridden by each engine. */
>>> @@ -1508,6 +1522,7 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
>>>    	engine->cops = &guc_context_ops;
>>>    	engine->request_alloc = guc_request_alloc;
>>> +	engine->bump_serial = guc_bump_serial;
>>>    	engine->sched_engine->schedule = i915_schedule;
>>> @@ -1843,6 +1858,7 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
>>>    	ve->base.cops = &virtual_guc_context_ops;
>>>    	ve->base.request_alloc = guc_request_alloc;
>>> +	ve->base.bump_serial = virtual_guc_bump_serial;
>>>    	ve->base.submit_request = guc_submit_request;
>>> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
>>> index 9542a5baa45a..127d60b36422 100644
>>> --- a/drivers/gpu/drm/i915/i915_request.c
>>> +++ b/drivers/gpu/drm/i915/i915_request.c
>>> @@ -692,7 +692,9 @@ bool __i915_request_submit(struct i915_request *request)
>>>    				     request->ring->vaddr + request->postfix);
>>>    	trace_i915_request_execute(request);
>>> -	engine->serial++;
>>> +	if (engine->bump_serial)
>>> +		engine->bump_serial(engine);
>>> +
>>>    	result = true;
>>>    	GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
>>>

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 36/97] drm/i915/guc: Add non blocking CTB send function
  2021-05-25 17:21     ` Matthew Brost
@ 2021-05-26  8:57       ` Tvrtko Ursulin
  2021-05-26 18:10         ` Matthew Brost
  0 siblings, 1 reply; 249+ messages in thread
From: Tvrtko Ursulin @ 2021-05-26  8:57 UTC (permalink / raw)
  To: Matthew Brost; +Cc: jason.ekstrand, daniel.vetter, intel-gfx, dri-devel


On 25/05/2021 18:21, Matthew Brost wrote:
> On Tue, May 25, 2021 at 10:21:00AM +0100, Tvrtko Ursulin wrote:
>>
>> On 06/05/2021 20:13, Matthew Brost wrote:
>>> Add non blocking CTB send function, intel_guc_send_nb. In order to
>>> support a non blocking CTB send function a spin lock is needed to
>>> protect the CTB descriptors fields. Also the non blocking call must not
>>> update the fence value as this value is owned by the blocking call
>>> (intel_guc_send).
>>
>> Could the commit message say why the non-blocking send function is needed?
>>
> 
> Sure. Something like:
> 
> 'CTBs will be used in the critical path of GuC submission and there is
> no need to wait for each CTB to complete before moving on in the i915'

A bit more, like also mentioning that the critical path runs with interrupts disabled or so. And not just that there is no need to wait, but that waiting is not possible because of this or that. So the only choice is to do this busy-loop send. It's a bit horrible so the justification needs to be documented.

>>> The blocking CTB now must have a flow control mechanism to ensure the
>>> buffer isn't overrun. A lazy spin wait is used as we believe the flow
>>> control condition should be rare with properly sized buffer.
>>>
>>> The function, intel_guc_send_nb, is exported in this patch but unused.
>>> Several patches later in the series make use of this function.
>>>
>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/gt/uc/intel_guc.h    | 12 ++-
>>>    drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 96 +++++++++++++++++++++--
>>>    drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  7 +-
>>>    3 files changed, 105 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>> index c20f3839de12..4c0a367e41d8 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>> @@ -75,7 +75,15 @@ static inline struct intel_guc *log_to_guc(struct intel_guc_log *log)
>>>    static
>>>    inline int intel_guc_send(struct intel_guc *guc, const u32 *action, u32 len)
>>>    {
>>> -	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0);
>>> +	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0, 0);
>>> +}
>>> +
>>> +#define INTEL_GUC_SEND_NB		BIT(31)
>>> +static
>>> +inline int intel_guc_send_nb(struct intel_guc *guc, const u32 *action, u32 len)
>>> +{
>>> +	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0,
>>> +				 INTEL_GUC_SEND_NB);
>>>    }
>>>    static inline int
>>> @@ -83,7 +91,7 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len,
>>>    			   u32 *response_buf, u32 response_buf_size)
>>>    {
>>>    	return intel_guc_ct_send(&guc->ct, action, len,
>>> -				 response_buf, response_buf_size);
>>> +				 response_buf, response_buf_size, 0);
>>>    }
>>>    static inline void intel_guc_to_host_event_handler(struct intel_guc *guc)
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> index a76603537fa8..af7314d45a78 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> @@ -3,6 +3,11 @@
>>>     * Copyright © 2016-2019 Intel Corporation
>>>     */
>>> +#include <linux/circ_buf.h>
>>> +#include <linux/ktime.h>
>>> +#include <linux/time64.h>
>>> +#include <linux/timekeeping.h>
>>> +
>>>    #include "i915_drv.h"
>>>    #include "intel_guc_ct.h"
>>>    #include "gt/intel_gt.h"
>>> @@ -308,6 +313,7 @@ int intel_guc_ct_enable(struct intel_guc_ct *ct)
>>>    	if (unlikely(err))
>>>    		goto err_deregister;
>>> +	ct->requests.last_fence = 1;
>>>    	ct->enabled = true;
>>>    	return 0;
>>> @@ -343,10 +349,22 @@ static u32 ct_get_next_fence(struct intel_guc_ct *ct)
>>>    	return ++ct->requests.last_fence;
>>>    }
>>> +static void write_barrier(struct intel_guc_ct *ct) {
>>> +	struct intel_guc *guc = ct_to_guc(ct);
>>> +	struct intel_gt *gt = guc_to_gt(guc);
>>> +
>>> +	if (i915_gem_object_is_lmem(guc->ct.vma->obj)) {
>>> +		GEM_BUG_ON(guc->send_regs.fw_domains);
>>> +		intel_uncore_write_fw(gt->uncore, GEN11_SOFT_SCRATCH(0), 0);
>>
>> It's safe to write to this reg? Does it need a comment to explain it?
>>
> 
> Yes, it is safe. IMO 'SCRATCH' in the name is enough documentation.

Why would it be enough? It requires digging to figure it out since it appears these are some sort of GuC special registers and not generic scratch:

commit 2d4ed3a988e6b1ff9729d0edd74bf4890571253e
Author: Michal Wajdeczko <michal.wajdeczko@intel.com>
Date:   Mon May 27 18:36:05 2019 +0000

     drm/i915/guc: New GuC scratch registers for Gen11

If it was a normal scratch then async trashing of those from a random driver thread isn't per se safe if used from a GPU context running in parallel.

But then according to bspec they are called VF_SW_FLAG_<n> and not GEN11_SOFT_SCRATCH so yeah.

>   
>>> +	} else {
>>> +		wmb();
>>> +	}
>>> +}
>>> +
>>>    static int ct_write(struct intel_guc_ct *ct,
>>>    		    const u32 *action,
>>>    		    u32 len /* in dwords */,
>>> -		    u32 fence)
>>> +		    u32 fence, u32 flags)
>>>    {
>>>    	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>>>    	struct guc_ct_buffer_desc *desc = ctb->desc;
>>> @@ -393,9 +411,13 @@ static int ct_write(struct intel_guc_ct *ct,
>>>    		 FIELD_PREP(GUC_CTB_MSG_0_NUM_DWORDS, len) |
>>>    		 FIELD_PREP(GUC_CTB_MSG_0_FENCE, fence);
>>> -	hxg = FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
>>> -	      FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
>>> -			 GUC_HXG_REQUEST_MSG_0_DATA0, action[0]);
>>> +	hxg = (flags & INTEL_GUC_SEND_NB) ?
>>> +		(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_EVENT) |
>>> +		 FIELD_PREP(GUC_HXG_EVENT_MSG_0_ACTION |
>>> +			    GUC_HXG_EVENT_MSG_0_DATA0, action[0])) :
>>> +		(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
>>> +		 FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
>>> +			    GUC_HXG_REQUEST_MSG_0_DATA0, action[0]));
>>>    	CT_DEBUG(ct, "writing (tail %u) %*ph %*ph %*ph\n",
>>>    		 tail, 4, &header, 4, &hxg, 4 * (len - 1), &action[1]);
>>> @@ -412,6 +434,12 @@ static int ct_write(struct intel_guc_ct *ct,
>>>    	}
>>>    	GEM_BUG_ON(tail > size);
>>> +	/*
>>> +	 * make sure H2G buffer update and LRC tail update (if this is triggering a
>>> +	 * submission) are visible before updating the descriptor tail
>>> +	 */
>>> +	write_barrier(ct);
>>> +
>>>    	/* now update descriptor */
>>>    	WRITE_ONCE(desc->tail, tail);
>>> @@ -466,6 +494,46 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
>>>    	return err;
>>>    }
>>> +static inline bool ctb_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
>>> +{
>>> +	struct guc_ct_buffer_desc *desc = ctb->desc;
>>> +	u32 head = READ_ONCE(desc->head);
>>> +	u32 space;
>>> +
>>> +	space = CIRC_SPACE(desc->tail, head, ctb->size);
>>> +
>>> +	return space >= len_dw;
>>> +}
>>> +
>>> +static int ct_send_nb(struct intel_guc_ct *ct,
>>> +		      const u32 *action,
>>> +		      u32 len,
>>> +		      u32 flags)
>>> +{
>>> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>>> +	unsigned long spin_flags;
>>> +	u32 fence;
>>> +	int ret;
>>> +
>>> +	spin_lock_irqsave(&ctb->lock, spin_flags);
>>> +
>>> +	ret = ctb_has_room(ctb, len + 1);
>>> +	if (unlikely(ret))
>>> +		goto out;
>>> +
>>> +	fence = ct_get_next_fence(ct);
>>> +	ret = ct_write(ct, action, len, fence, flags);
>>> +	if (unlikely(ret))
>>> +		goto out;
>>> +
>>> +	intel_guc_notify(ct_to_guc(ct));
>>> +
>>> +out:
>>> +	spin_unlock_irqrestore(&ctb->lock, spin_flags);
>>> +
>>> +	return ret;
>>> +}
>>> +
>>>    static int ct_send(struct intel_guc_ct *ct,
>>>    		   const u32 *action,
>>>    		   u32 len,
>>> @@ -473,6 +541,7 @@ static int ct_send(struct intel_guc_ct *ct,
>>>    		   u32 response_buf_size,
>>>    		   u32 *status)
>>>    {
>>> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>>>    	struct ct_request request;
>>>    	unsigned long flags;
>>>    	u32 fence;
>>> @@ -482,8 +551,20 @@ static int ct_send(struct intel_guc_ct *ct,
>>>    	GEM_BUG_ON(!len);
>>>    	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
>>>    	GEM_BUG_ON(!response_buf && response_buf_size);
>>> +	might_sleep();
>>
>> Sleep is just the cond_resched below, or is there more?
>>
> 
> Yes, the cond_resched.
> 
>>> +	/*
>>> +	 * We use a lazy spin wait loop here as we believe that if the CT
>>> +	 * buffers are sized correctly the flow control condition should be
>>> +	 * rare.
>>> +	 */
>>> +retry:
>>>    	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
>>> +	if (unlikely(!ctb_has_room(ctb, len + 1))) {
>>> +		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
>>> +		cond_resched();
>>> +		goto retry;
>>> +	}
>>
>> If this patch is about adding a non-blocking send function, and below we can
>> see that it creates a fork:
>>
>> intel_guc_ct_send:
>> ...
>> 	if (flags & INTEL_GUC_SEND_NB)
>> 		return ct_send_nb(ct, action, len, flags);
>>
>>   	ret = ct_send(ct, action, len, response_buf, response_buf_size, &status);
>>
>> Then why is there a change in ct_send here, which is not the new
>> non-blocking path?
>>
> 
> There is not a change to ct_send(), just to intel_guc_ct_send.

I was going by the diff which says:

  static int ct_send(struct intel_guc_ct *ct,
  		   const u32 *action,
  		   u32 len,
@@ -473,6 +541,7 @@ static int ct_send(struct intel_guc_ct *ct,
  		   u32 response_buf_size,
  		   u32 *status)
  {
+	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
  	struct ct_request request;
  	unsigned long flags;
  	u32 fence;
@@ -482,8 +551,20 @@ static int ct_send(struct intel_guc_ct *ct,
  	GEM_BUG_ON(!len);
  	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
  	GEM_BUG_ON(!response_buf && response_buf_size);
+	might_sleep();
  
+	/*
+	 * We use a lazy spin wait loop here as we believe that if the CT
+	 * buffers are sized correctly the flow control condition should be
+	 * rare.
+	 */
+retry:
  	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
+	if (unlikely(!ctb_has_room(ctb, len + 1))) {
+		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
+		cond_resched();
+		goto retry;
+	}

So it looks like a change to ct_send to me. Is that wrong?

Regards,

Tvrtko

> As for why intel_guc_ct_send is updated and we don't just add a new public
> function, this was another reviewer's suggestion. Again, can't make
> everyone happy.
>   
>>>    	fence = ct_get_next_fence(ct);
>>>    	request.fence = fence;
>>> @@ -495,7 +576,7 @@ static int ct_send(struct intel_guc_ct *ct,
>>>    	list_add_tail(&request.link, &ct->requests.pending);
>>>    	spin_unlock(&ct->requests.lock);
>>> -	err = ct_write(ct, action, len, fence);
>>> +	err = ct_write(ct, action, len, fence, 0);
>>>    	spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
>>> @@ -537,7 +618,7 @@ static int ct_send(struct intel_guc_ct *ct,
>>>     * Command Transport (CT) buffer based GuC send function.
>>>     */
>>>    int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
>>> -		      u32 *response_buf, u32 response_buf_size)
>>> +		      u32 *response_buf, u32 response_buf_size, u32 flags)
>>>    {
>>>    	u32 status = ~0; /* undefined */
>>>    	int ret;
>>> @@ -547,6 +628,9 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
>>>    		return -ENODEV;
>>>    	}
>>> +	if (flags & INTEL_GUC_SEND_NB)
>>> +		return ct_send_nb(ct, action, len, flags);
>>> +
>>>    	ret = ct_send(ct, action, len, response_buf, response_buf_size, &status);
>>>    	if (unlikely(ret < 0)) {
>>>    		CT_ERROR(ct, "Sending action %#x failed (err=%d status=%#X)\n",
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>> index 1ae2dde6db93..55ef7c52472f 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>> @@ -9,6 +9,7 @@
>>>    #include <linux/interrupt.h>
>>>    #include <linux/spinlock.h>
>>>    #include <linux/workqueue.h>
>>> +#include <linux/ktime.h>
>>>    #include "intel_guc_fwif.h"
>>> @@ -42,7 +43,6 @@ struct intel_guc_ct_buffer {
>>>    	bool broken;
>>>    };
>>> -
>>>    /** Top-level structure for Command Transport related data
>>>     *
>>>     * Includes a pair of CT buffers for bi-directional communication and tracking
>>> @@ -69,6 +69,9 @@ struct intel_guc_ct {
>>>    		struct list_head incoming; /* incoming requests */
>>>    		struct work_struct worker; /* handler for incoming requests */
>>>    	} requests;
>>> +
>>> +	/** @stall_time: time of first time a CTB submission is stalled */
>>> +	ktime_t stall_time;
>>
>> Unused in this patch.
>>
> 
> Yea, wrong patch. Will fix.
> 
> Matt
>   
>>>    };
>>>    void intel_guc_ct_init_early(struct intel_guc_ct *ct);
>>> @@ -88,7 +91,7 @@ static inline bool intel_guc_ct_enabled(struct intel_guc_ct *ct)
>>>    }
>>>    int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
>>> -		      u32 *response_buf, u32 response_buf_size);
>>> +		      u32 *response_buf, u32 response_buf_size, u32 flags);
>>>    void intel_guc_ct_event_handler(struct intel_guc_ct *ct);
>>>    #endif /* _INTEL_GUC_CT_H_ */
>>>
>>
>> Regards,
>>
>> Tvrtko

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 55/97] drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC
  2021-05-25 17:07     ` Matthew Brost
@ 2021-05-26  9:21       ` Tvrtko Ursulin
  2021-05-26 18:18         ` Matthew Brost
  0 siblings, 1 reply; 249+ messages in thread
From: Tvrtko Ursulin @ 2021-05-26  9:21 UTC (permalink / raw)
  To: Matthew Brost; +Cc: jason.ekstrand, daniel.vetter, intel-gfx, dri-devel


On 25/05/2021 18:07, Matthew Brost wrote:
> On Tue, May 25, 2021 at 11:06:00AM +0100, Tvrtko Ursulin wrote:
>>
>> On 06/05/2021 20:14, Matthew Brost wrote:
>>> When running the GuC the GPU can't be considered idle if the GuC still
>>> has contexts pinned. As such, a call has been added in
>>> intel_gt_wait_for_idle to idle the UC and in turn the GuC by waiting for
>>> the number of unpinned contexts to go to zero.
>>>
>>> Cc: John Harrison <john.c.harrison@intel.com>
>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/gem/i915_gem_mman.c      |  3 +-
>>>    drivers/gpu/drm/i915/gt/intel_gt.c            | 18 ++++
>>>    drivers/gpu/drm/i915/gt/intel_gt.h            |  2 +
>>>    drivers/gpu/drm/i915/gt/intel_gt_requests.c   | 22 ++---
>>>    drivers/gpu/drm/i915/gt/intel_gt_requests.h   |  7 +-
>>>    drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  4 +
>>>    drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |  1 +
>>>    drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h     |  4 +
>>>    .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 91 ++++++++++++++++++-
>>>    drivers/gpu/drm/i915/gt/uc/intel_uc.h         |  5 +
>>>    drivers/gpu/drm/i915/i915_debugfs.c           |  1 +
>>>    drivers/gpu/drm/i915/i915_gem_evict.c         |  1 +
>>>    .../gpu/drm/i915/selftests/igt_live_test.c    |  2 +-
>>>    .../gpu/drm/i915/selftests/mock_gem_device.c  |  3 +-
>>>    14 files changed, 137 insertions(+), 27 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>>> index 8598a1c78a4c..2f5295c9408d 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>>> @@ -634,7 +634,8 @@ mmap_offset_attach(struct drm_i915_gem_object *obj,
>>>    		goto insert;
>>>    	/* Attempt to reap some mmap space from dead objects */
>>> -	err = intel_gt_retire_requests_timeout(&i915->gt, MAX_SCHEDULE_TIMEOUT);
>>> +	err = intel_gt_retire_requests_timeout(&i915->gt, MAX_SCHEDULE_TIMEOUT,
>>> +					       NULL);
>>>    	if (err)
>>>    		goto err;
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
>>> index 8d77dcbad059..1742a8561f69 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
>>> @@ -574,6 +574,24 @@ static void __intel_gt_disable(struct intel_gt *gt)
>>>    	GEM_BUG_ON(intel_gt_pm_is_awake(gt));
>>>    }
>>> +int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout)
>>> +{
>>> +	long rtimeout;
>>> +
>>> +	/* If the device is asleep, we have no requests outstanding */
>>> +	if (!intel_gt_pm_is_awake(gt))
>>> +		return 0;
>>> +
>>> +	while ((timeout = intel_gt_retire_requests_timeout(gt, timeout,
>>> +							   &rtimeout)) > 0) {
>>> +		cond_resched();
>>> +		if (signal_pending(current))
>>> +			return -EINTR;
>>> +	}
>>> +
>>> +	return timeout ? timeout : intel_uc_wait_for_idle(&gt->uc, rtimeout);
>>> +}
>>> +
>>>    int intel_gt_init(struct intel_gt *gt)
>>>    {
>>>    	int err;
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h
>>> index 7ec395cace69..c775043334bf 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_gt.h
>>> +++ b/drivers/gpu/drm/i915/gt/intel_gt.h
>>> @@ -48,6 +48,8 @@ void intel_gt_driver_release(struct intel_gt *gt);
>>>    void intel_gt_driver_late_release(struct intel_gt *gt);
>>> +int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout);
>>> +
>>>    void intel_gt_check_and_clear_faults(struct intel_gt *gt);
>>>    void intel_gt_clear_error_registers(struct intel_gt *gt,
>>>    				    intel_engine_mask_t engine_mask);
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
>>> index 647eca9d867a..c6c702f236fa 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
>>> @@ -13,6 +13,7 @@
>>>    #include "intel_gt_pm.h"
>>>    #include "intel_gt_requests.h"
>>>    #include "intel_timeline.h"
>>> +#include "uc/intel_uc.h"
>>>    static bool retire_requests(struct intel_timeline *tl)
>>>    {
>>> @@ -130,7 +131,8 @@ void intel_engine_fini_retire(struct intel_engine_cs *engine)
>>>    	GEM_BUG_ON(engine->retire);
>>>    }
>>> -long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout)
>>> +long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
>>> +				      long *rtimeout)
>>
>> What is 'rtimeout'? I know it means remaining, but it could be more
>> self-descriptive to start with.
>>
> 
> 'remaining_timeout' it is.
> 
>> It feels a bit churny for what it is. How plausible would the alternatives be:
>> either change the existing timeout to an in/out parameter, or measure the sleep
>> internally in this function, or just risk sleeping twice as long by passing the
>> original timeout to uc idle as well?
>>
> 
> Originally had it just passing in the same value, got review feedback
> saying I should pass in the adjusted value. Hard to make everyone happy.

Ok.

>   
>>>    {
>>>    	struct intel_gt_timelines *timelines = &gt->timelines;
>>>    	struct intel_timeline *tl, *tn;
>>> @@ -195,22 +197,10 @@ out_active:	spin_lock(&timelines->lock);
>>>    	if (flush_submission(gt, timeout)) /* Wait, there's more! */
>>>    		active_count++;
>>> -	return active_count ? timeout : 0;
>>> -}
>>> -
>>> -int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout)
>>> -{
>>> -	/* If the device is asleep, we have no requests outstanding */
>>> -	if (!intel_gt_pm_is_awake(gt))
>>> -		return 0;
>>> -
>>> -	while ((timeout = intel_gt_retire_requests_timeout(gt, timeout)) > 0) {
>>> -		cond_resched();
>>> -		if (signal_pending(current))
>>> -			return -EINTR;
>>> -	}
>>> +	if (rtimeout)
>>> +		*rtimeout = timeout;
>>> -	return timeout;
>>> +	return active_count ? timeout : 0;
>>>    }
>>>    static void retire_work_handler(struct work_struct *work)
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.h b/drivers/gpu/drm/i915/gt/intel_gt_requests.h
>>> index fcc30a6e4fe9..4419787124e2 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.h
>>> +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.h
>>> @@ -10,10 +10,11 @@ struct intel_engine_cs;
>>>    struct intel_gt;
>>>    struct intel_timeline;
>>> -long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout);
>>> +long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
>>> +				      long *rtimeout);
>>>    static inline void intel_gt_retire_requests(struct intel_gt *gt)
>>>    {
>>> -	intel_gt_retire_requests_timeout(gt, 0);
>>> +	intel_gt_retire_requests_timeout(gt, 0, NULL);
>>>    }
>>>    void intel_engine_init_retire(struct intel_engine_cs *engine);
>>> @@ -21,8 +22,6 @@ void intel_engine_add_retire(struct intel_engine_cs *engine,
>>>    			     struct intel_timeline *tl);
>>>    void intel_engine_fini_retire(struct intel_engine_cs *engine);
>>> -int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout);
>>> -
>>>    void intel_gt_init_requests(struct intel_gt *gt);
>>>    void intel_gt_park_requests(struct intel_gt *gt);
>>>    void intel_gt_unpark_requests(struct intel_gt *gt);
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>> index 485e98f3f304..47eaa69809e8 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>> @@ -38,6 +38,8 @@ struct intel_guc {
>>>    	spinlock_t irq_lock;
>>>    	unsigned int msg_enabled_mask;
>>> +	atomic_t outstanding_submission_g2h;
>>> +
>>>    	struct {
>>>    		bool enabled;
>>>    		void (*reset)(struct intel_guc *guc);
>>> @@ -239,6 +241,8 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask)
>>>    	spin_unlock_irq(&guc->irq_lock);
>>>    }
>>> +int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout);
>>> +
>>>    int intel_guc_reset_engine(struct intel_guc *guc,
>>>    			   struct intel_engine_cs *engine);
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> index f1893030ca88..cf701056fa14 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> @@ -111,6 +111,7 @@ void intel_guc_ct_init_early(struct intel_guc_ct *ct)
>>>    	INIT_LIST_HEAD(&ct->requests.incoming);
>>>    	INIT_WORK(&ct->requests.worker, ct_incoming_request_worker_func);
>>>    	tasklet_init(&ct->receive_tasklet, ct_receive_tasklet_func, (unsigned long)ct);
>>> +	init_waitqueue_head(&ct->wq);
>>>    }
>>>    static inline const char *guc_ct_buffer_type_to_str(u32 type)
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>> index 660bf37238e2..ab1b79ab960b 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>> @@ -10,6 +10,7 @@
>>>    #include <linux/spinlock.h>
>>>    #include <linux/workqueue.h>
>>>    #include <linux/ktime.h>
>>> +#include <linux/wait.h>
>>>    #include "intel_guc_fwif.h"
>>> @@ -68,6 +69,9 @@ struct intel_guc_ct {
>>>    	struct tasklet_struct receive_tasklet;
>>> +	/** @wq: wait queue for g2h channel */
>>> +	wait_queue_head_t wq;
>>> +
>>>    	struct {
>>>    		u16 last_fence; /* last fence used to send request */
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>> index ae0b386467e3..0ff7dd6d337d 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>> @@ -253,6 +253,74 @@ static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
>>>    	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
>>>    }
>>> +static int guc_submission_busy_loop(struct intel_guc* guc,
>>> +				    const u32 *action,
>>> +				    u32 len,
>>> +				    u32 g2h_len_dw,
>>> +				    bool loop)
>>> +{
>>> +	int err;
>>> +
>>> +	err = intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop);
>>> +
>>> +	if (!err && g2h_len_dw)
>>> +		atomic_inc(&guc->outstanding_submission_g2h);
>>> +
>>> +	return err;
>>> +}
>>> +
>>> +static int guc_wait_for_pending_msg(struct intel_guc *guc,
>>> +				    atomic_t *wait_var,
>>> +				    bool interruptible,
>>> +				    long timeout)
>>> +{
>>> +	const int state = interruptible ?
>>> +		TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
>>> +	DEFINE_WAIT(wait);
>>> +
>>> +	might_sleep();
>>> +	GEM_BUG_ON(timeout < 0);
>>> +
>>> +	if (!atomic_read(wait_var))
>>> +		return 0;
>>> +
>>> +	if (!timeout)
>>> +		return -ETIME;
>>> +
>>> +	for (;;) {
>>> +		prepare_to_wait(&guc->ct.wq, &wait, state);
>>> +
>>> +		if (!atomic_read(wait_var))
>>> +			break;
>>> +
>>> +		if (signal_pending_state(state, current)) {
>>> +			timeout = -ERESTARTSYS;
>>> +			break;
>>> +		}
>>> +
>>> +		if (!timeout) {
>>> +			timeout = -ETIME;
>>> +			break;
>>> +		}
>>> +
>>> +		timeout = io_schedule_timeout(timeout);
>>> +	}
>>> +	finish_wait(&guc->ct.wq, &wait);
>>> +
>>> +	return (timeout < 0) ? timeout : 0;
>>> +}
>>
>> See if it is possible to simplify all this with wait_var_event and
>> wake_up_var.
>>
> 
> Let me check on that.
>   
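For reference, roughly what I had in mind - a sketch only, untested, and with
the interruptible handling omitted for brevity; it assumes wait_var_event_timeout()
and wake_up_var() are usable on the outstanding_submission_g2h counter:

	static int guc_wait_for_pending_msg(struct intel_guc *guc,
					    atomic_t *wait_var,
					    bool interruptible,
					    long timeout)
	{
		/* interruptible case omitted for brevity */
		if (!atomic_read(wait_var))
			return 0;
		if (!timeout)
			return -ETIME;

		return wait_var_event_timeout(wait_var,
					      !atomic_read(wait_var),
					      timeout) ? 0 : -ETIME;
	}

	/* and on the waker side */
	if (atomic_dec_and_test(&guc->outstanding_submission_g2h))
		wake_up_var(&guc->outstanding_submission_g2h);
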
>>> +
>>> +int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout)
>>> +{
>>> +	bool interruptible = true;
>>> +
>>> +	if (unlikely(timeout < 0))
>>> +		timeout = -timeout, interruptible = false;
>>> +
>>> +	return guc_wait_for_pending_msg(guc, &guc->outstanding_submission_g2h,
>>> +					interruptible, timeout);
>>> +}
>>> +
>>>    static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
>>>    {
>>>    	int err;
>>> @@ -279,6 +347,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
>>>    	err = intel_guc_send_nb(guc, action, len, g2h_len_dw);
>>>    	if (!enabled && !err) {
>>> +		atomic_inc(&guc->outstanding_submission_g2h);
>>>    		set_context_enabled(ce);
>>>    	} else if (!enabled) {
>>>    		clr_context_pending_enable(ce);
>>> @@ -734,7 +803,7 @@ static int __guc_action_register_context(struct intel_guc *guc,
>>>    		offset,
>>>    	};
>>> -	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
>>> +	return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
>>>    }
>>>    static int register_context(struct intel_context *ce)
>>> @@ -754,7 +823,7 @@ static int __guc_action_deregister_context(struct intel_guc *guc,
>>>    		guc_id,
>>>    	};
>>> -	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action),
>>> +	return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),
>>>    					G2H_LEN_DW_DEREGISTER_CONTEXT, true);
>>>    }
>>> @@ -871,7 +940,9 @@ static int guc_context_pin(struct intel_context *ce, void *vaddr)
>>>    static void guc_context_unpin(struct intel_context *ce)
>>>    {
>>> -	unpin_guc_id(ce_to_guc(ce), ce);
>>> +	struct intel_guc *guc = ce_to_guc(ce);
>>> +
>>> +	unpin_guc_id(guc, ce);
>>>    	lrc_unpin(ce);
>>>    }
>>> @@ -894,7 +965,7 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
>>>    	intel_context_get(ce);
>>> -	intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action),
>>> +	guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),
>>>    				 G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
>>>    }
>>> @@ -1437,6 +1508,15 @@ g2h_context_lookup(struct intel_guc *guc, u32 desc_idx)
>>>    	return ce;
>>>    }
>>> +static void decr_outstanding_submission_g2h(struct intel_guc *guc)
>>> +{
>>> +	if (atomic_dec_and_test(&guc->outstanding_submission_g2h)) {
>>> +		smp_mb();
>>> +		if (waitqueue_active(&guc->ct.wq))
>>> +			wake_up_all(&guc->ct.wq);
>>
>> I keep pointing out this pattern is racy and at least needs comment why it
>> is safe.
>>
> 
> There is a comment in the waitqueue code header saying why this is safe. I
> don't think we need to repeat this here.

Yeah, _describing how to make it safe_, after it starts with:

  * NOTE: this function is lockless and requires care, incorrect usage _will_
  * lead to sporadic and non-obvious failure.

Then it also says:

  * Also note that this 'optimization' trades a spin_lock() for an smp_mb(),
  * which (when the lock is uncontended) are of roughly equal cost.

I question the need to optimize this path, since it means the reader has to figure out if it is safe, while a simple wake_up_all after atomic_dec_and_test would have done it.

Is the case of no waiters a predominant one? It at least deserves a comment explaining why the optimisation is important.
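
I.e. something along these lines would seem enough (sketch only, untested):

	static void decr_outstanding_submission_g2h(struct intel_guc *guc)
	{
		if (atomic_dec_and_test(&guc->outstanding_submission_g2h))
			wake_up_all(&guc->ct.wq);
	}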

Regards,

Tvrtko

> 
> Matt
> 
>> Regards,
>>
>> Tvrtko
>>
>>> +	}
>>> +}
>>> +
>>>    int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
>>>    					  const u32 *msg,
>>>    					  u32 len)
>>> @@ -1472,6 +1552,8 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
>>>    		lrc_destroy(&ce->ref);
>>>    	}
>>> +	decr_outstanding_submission_g2h(guc);
>>> +
>>>    	return 0;
>>>    }
>>> @@ -1520,6 +1602,7 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
>>>    		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
>>>    	}
>>> +	decr_outstanding_submission_g2h(guc);
>>>    	intel_context_put(ce);
>>>    	return 0;
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
>>> index 9c954c589edf..c4cef885e984 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
>>> @@ -81,6 +81,11 @@ uc_state_checkers(guc, guc_submission);
>>>    #undef uc_state_checkers
>>>    #undef __uc_state_checker
>>> +static inline int intel_uc_wait_for_idle(struct intel_uc *uc, long timeout)
>>> +{
>>> +	return intel_guc_wait_for_idle(&uc->guc, timeout);
>>> +}
>>> +
>>>    #define intel_uc_ops_function(_NAME, _OPS, _TYPE, _RET) \
>>>    static inline _TYPE intel_uc_##_NAME(struct intel_uc *uc) \
>>>    { \
>>> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
>>> index 8dd374691102..bb29838d1cd7 100644
>>> --- a/drivers/gpu/drm/i915/i915_debugfs.c
>>> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
>>> @@ -36,6 +36,7 @@
>>>    #include "gt/intel_gt_clock_utils.h"
>>>    #include "gt/intel_gt.h"
>>>    #include "gt/intel_gt_pm.h"
>>> +#include "gt/intel_gt.h"
>>>    #include "gt/intel_gt_requests.h"
>>>    #include "gt/intel_reset.h"
>>>    #include "gt/intel_rc6.h"
>>> diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
>>> index 4d2d59a9942b..2b73ddb11c66 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem_evict.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem_evict.c
>>> @@ -27,6 +27,7 @@
>>>     */
>>>    #include "gem/i915_gem_context.h"
>>> +#include "gt/intel_gt.h"
>>>    #include "gt/intel_gt_requests.h"
>>>    #include "i915_drv.h"
>>> diff --git a/drivers/gpu/drm/i915/selftests/igt_live_test.c b/drivers/gpu/drm/i915/selftests/igt_live_test.c
>>> index c130010a7033..1c721542e277 100644
>>> --- a/drivers/gpu/drm/i915/selftests/igt_live_test.c
>>> +++ b/drivers/gpu/drm/i915/selftests/igt_live_test.c
>>> @@ -5,7 +5,7 @@
>>>     */
>>>    #include "i915_drv.h"
>>> -#include "gt/intel_gt_requests.h"
>>> +#include "gt/intel_gt.h"
>>>    #include "../i915_selftest.h"
>>>    #include "igt_flush_test.h"
>>> diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
>>> index cf40004bc92a..6c06816e2b99 100644
>>> --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
>>> +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
>>> @@ -51,7 +51,8 @@ void mock_device_flush(struct drm_i915_private *i915)
>>>    	do {
>>>    		for_each_engine(engine, gt, id)
>>>    			mock_engine_flush(engine);
>>> -	} while (intel_gt_retire_requests_timeout(gt, MAX_SCHEDULE_TIMEOUT));
>>> +	} while (intel_gt_retire_requests_timeout(gt, MAX_SCHEDULE_TIMEOUT,
>>> +						  NULL));
>>>    }
>>>    static void mock_device_release(struct drm_device *dev)
>>>

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 53/97] drm/i915/guc: Disable semaphores when using GuC scheduling
  2021-05-25 17:01     ` Matthew Brost
@ 2021-05-26  9:25       ` Tvrtko Ursulin
  2021-05-26 18:15         ` Matthew Brost
  0 siblings, 1 reply; 249+ messages in thread
From: Tvrtko Ursulin @ 2021-05-26  9:25 UTC (permalink / raw)
  To: Matthew Brost; +Cc: jason.ekstrand, daniel.vetter, intel-gfx, dri-devel


On 25/05/2021 18:01, Matthew Brost wrote:
> On Tue, May 25, 2021 at 10:52:01AM +0100, Tvrtko Ursulin wrote:
>>
>> On 06/05/2021 20:14, Matthew Brost wrote:
>>> Disable semaphores when using GuC scheduling as semaphores are broken in
>>> the current GuC firmware.
>>
>> What is "current"? Given that the patch itself is like year and a half old.
>>
> 
> Stale comment. Semaphores work with the firmware; we just haven't enabled
> them in the i915 with GuC submission, as this is an optimization and not
> required for functionality.

How will the updated commit message look in terms of remaining reasons 
why semaphores won't/can't be enabled?

They were a nice performance win on some media workloads although 
granted a lot of tweaking was required to find a good balance on when to 
use them and when not to.

Regards,

Tvrtko

> Matt
> 
>> Regards,
>>
>> Tvrtko
>>
>>> Cc: John Harrison <john.c.harrison@intel.com>
>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/gem/i915_gem_context.c | 6 ++++--
>>>    1 file changed, 4 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
>>> index 993faa213b41..d30260ffe2a7 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
>>> @@ -230,7 +230,8 @@ static void intel_context_set_gem(struct intel_context *ce,
>>>    		ce->timeline = intel_timeline_get(ctx->timeline);
>>>    	if (ctx->sched.priority >= I915_PRIORITY_NORMAL &&
>>> -	    intel_engine_has_timeslices(ce->engine))
>>> +	    intel_engine_has_timeslices(ce->engine) &&
>>> +	    intel_engine_has_semaphores(ce->engine))
>>>    		__set_bit(CONTEXT_USE_SEMAPHORES, &ce->flags);
>>>    	intel_context_set_watchdog_us(ce, ctx->watchdog.timeout_us);
>>> @@ -1939,7 +1940,8 @@ static int __apply_priority(struct intel_context *ce, void *arg)
>>>    	if (!intel_engine_has_timeslices(ce->engine))
>>>    		return 0;
>>> -	if (ctx->sched.priority >= I915_PRIORITY_NORMAL)
>>> +	if (ctx->sched.priority >= I915_PRIORITY_NORMAL &&
>>> +	    intel_engine_has_semaphores(ce->engine))
>>>    		intel_context_set_use_semaphores(ce);
>>>    	else
>>>    		intel_context_clear_use_semaphores(ce);
>>>

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 39/97] drm/i915/guc: Increase size of CTB buffers
  2021-05-25 17:15     ` Matthew Brost
@ 2021-05-26  9:30       ` Tvrtko Ursulin
  2021-05-26 18:20         ` Matthew Brost
  0 siblings, 1 reply; 249+ messages in thread
From: Tvrtko Ursulin @ 2021-05-26  9:30 UTC (permalink / raw)
  To: Matthew Brost; +Cc: jason.ekstrand, daniel.vetter, intel-gfx, dri-devel


On 25/05/2021 18:15, Matthew Brost wrote:
> On Tue, May 25, 2021 at 10:24:09AM +0100, Tvrtko Ursulin wrote:
>>
>> On 06/05/2021 20:13, Matthew Brost wrote:
>>> With the introduction of non-blocking CTBs more than one CTB can be in
>>> flight at a time. Increasing the size of the CTBs should reduce how
>>> often software hits the case where no space is available in the CTB
>>> buffer.
>>
>> I'd move this before the patch which adds the non-blocking send since that
>> one claims congestion should be rare with properly sized buffers. So it
>> makes sense to have them sized properly back before that one.
>>
> 
> IMO patch ordering is a bit of a bikeshed. All these CTB changes required
> for GuC submission (34-40, 54) will get posted as their own series and get
> merged together. None of the individual patches break anything, nor is any
> of this code really used until GuC submission is turned on. I can move
> this when I post these patches by themselves, but I just don't really see
> the point either way.

As a general principle we do try to have the work in an order which makes
sense functionality-wise.

That includes trying to avoid adding and then removing, or changing a 
lot, the same code within the series. And also adding functionality 
which is known to not work completely well until later in the series.

With a master switch at the end of the series you can sometimes get away
with it, but if nothing else it at least makes the series much easier to
read if things are flowing in the expected way within it.

In this particular example, sizing the buffers appropriately before
starting to use the facility a lot more certainly sounds like a
no-brainer to me, especially since the patch is so trivial to move
conflict-wise.
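
As an aside, the sizing argument in the quoted comment checks out as a
back-of-the-envelope calculation (taking the 16 byte minimum per request
from the comment at face value): 4096 B / 16 B = 256 requests can be
queued in the 4K send buffer, and the receive buffer at 4 * 4K = 16K
leaves room for the matching G2H responses.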

Regards,

Tvrtko

> Matt
>   
>> Regards,
>>
>> Tvrtko
>>
>>> Cc: John Harrison <john.c.harrison@intel.com>
>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 11 ++++++++---
>>>    1 file changed, 8 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> index 77dfbc94dcc3..d6895d29ed2d 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> @@ -63,11 +63,16 @@ static inline struct drm_device *ct_to_drm(struct intel_guc_ct *ct)
>>>     *      +--------+-----------------------------------------------+------+
>>>     *
>>>     * Size of each `CT Buffer`_ must be multiple of 4K.
>>> - * As we don't expect too many messages, for now use minimum sizes.
>>> + * We don't expect too many messages in flight at any time, unless we are
>>> + * using GuC submission. In that case each request requires a minimum of
>>> + * 16 bytes, which gives us a maximum of 256 queued requests. Hopefully this
>>> + * is enough space to avoid backpressure on the driver. We increase the size
>>> + * of the receive buffer (relative to the send) to ensure a G2H response
>>> + * CTB has a landing spot.
>>>     */
>>>    #define CTB_DESC_SIZE		ALIGN(sizeof(struct guc_ct_buffer_desc), SZ_2K)
>>>    #define CTB_H2G_BUFFER_SIZE	(SZ_4K)
>>> -#define CTB_G2H_BUFFER_SIZE	(SZ_4K)
>>> +#define CTB_G2H_BUFFER_SIZE	(4 * CTB_H2G_BUFFER_SIZE)
>>>    #define MAX_US_STALL_CTB	1000000
>>> @@ -753,7 +758,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>>>    	/* beware of buffer wrap case */
>>>    	if (unlikely(available < 0))
>>>    		available += size;
>>> -	CT_DEBUG(ct, "available %d (%u:%u)\n", available, head, tail);
>>> +	CT_DEBUG(ct, "available %d (%u:%u:%u)\n", available, head, tail, size);
>>>    	GEM_BUG_ON(available < 0);
>>>    	header = cmds[head];
>>>

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 49/97] drm/i915/guc: Disable engine barriers with GuC during unpin
  2021-05-06 19:14 ` [RFC PATCH 49/97] drm/i915/guc: Disable engine barriers with GuC during unpin Matthew Brost
  2021-05-11 15:37   ` Daniel Vetter
@ 2021-05-26 10:26   ` Tvrtko Ursulin
  1 sibling, 0 replies; 249+ messages in thread
From: Tvrtko Ursulin @ 2021-05-26 10:26 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel; +Cc: jason.ekstrand, daniel.vetter


On 06/05/2021 20:14, Matthew Brost wrote:
> Disable engine barriers for unpinning with GuC. This feature isn't
> needed with the GuC as it disables context scheduling before unpinning

Just isn't needed or causes a problem somehow?

> which guarantees the HW will not reference the context. Hence it is
> not necessary to defer unpinning until a kernel context request
> completes on each engine in the context engine mask.

Hm, the context engine mask does not come across as something used in this patch.

Engine PM works fine with this change - i915 does not turn off the
engine/gt too early? I mean, context unpin is on retire and the GuC
disable of context scheduling is sync or async? Even when the kernel
context request gets emitted on engine pm put, there is no race?

> 
> Cc: John Harrison <john.c.harrison@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_context.c    |  2 +-
>   drivers/gpu/drm/i915/gt/intel_context.h    |  1 +
>   drivers/gpu/drm/i915/gt/selftest_context.c | 10 ++++++++++
>   drivers/gpu/drm/i915/i915_active.c         |  3 +++
>   4 files changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
> index 1499b8aace2a..7f97753ab164 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context.c
> +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> @@ -80,7 +80,7 @@ static int intel_context_active_acquire(struct intel_context *ce)
>   
>   	__i915_active_acquire(&ce->active);
>   
> -	if (intel_context_is_barrier(ce))
> +	if (intel_context_is_barrier(ce) || intel_engine_uses_guc(ce->engine))
>   		return 0;
>   
>   	/* Preallocate tracking nodes */
> diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
> index 92ecbab8c1cd..9b211ca5ecc7 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context.h
> +++ b/drivers/gpu/drm/i915/gt/intel_context.h
> @@ -16,6 +16,7 @@
>   #include "intel_engine_types.h"
>   #include "intel_ring_types.h"
>   #include "intel_timeline_types.h"
> +#include "uc/intel_guc_submission.h"
>   
>   #define CE_TRACE(ce, fmt, ...) do {					\
>   	const struct intel_context *ce__ = (ce);			\
> diff --git a/drivers/gpu/drm/i915/gt/selftest_context.c b/drivers/gpu/drm/i915/gt/selftest_context.c
> index 26685b927169..fa7b99a671dd 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_context.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_context.c
> @@ -209,7 +209,13 @@ static int __live_active_context(struct intel_engine_cs *engine)
>   	 * This test makes sure that the context is kept alive until a
>   	 * subsequent idle-barrier (emitted when the engine wakeref hits 0
>   	 * with no more outstanding requests).
> +	 *
> +	 * In GuC submission mode we don't use idle barriers and we instead
> +	 * get a message from the GuC to signal that it is safe to unpin the
> +	 * context from memory.
>   	 */
> +	if (intel_engine_uses_guc(engine))
> +		return 0;
>   
>   	if (intel_engine_pm_is_awake(engine)) {
>   		pr_err("%s is awake before starting %s!\n",
> @@ -357,7 +363,11 @@ static int __live_remote_context(struct intel_engine_cs *engine)
>   	 * on the context image remotely (intel_context_prepare_remote_request),
>   	 * which inserts foreign fences into intel_context.active, does not
>   	 * clobber the idle-barrier.
> +	 *
> +	 * In GuC submission mode we don't use idle barriers.
>   	 */
> +	if (intel_engine_uses_guc(engine))
> +		return 0;
>   
>   	if (intel_engine_pm_is_awake(engine)) {
>   		pr_err("%s is awake before starting %s!\n",
> diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
> index b1aa1c482c32..9a264898bb91 100644
> --- a/drivers/gpu/drm/i915/i915_active.c
> +++ b/drivers/gpu/drm/i915/i915_active.c
> @@ -968,6 +968,9 @@ void i915_active_acquire_barrier(struct i915_active *ref)
>   
>   	GEM_BUG_ON(i915_active_is_idle(ref));
>   
> +	if (llist_empty(&ref->preallocated_barriers))
> +		return;

This hunk is not needed since effectively the same check is a few lines below.

Regards,

Tvrtko

> +
>   	/*
>   	 * Transfer the list of preallocated barriers into the
>   	 * i915_active rbtree, but only as proto-nodes. They will be
> 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 36/97] drm/i915/guc: Add non blocking CTB send function
  2021-05-26  8:57       ` Tvrtko Ursulin
@ 2021-05-26 18:10         ` Matthew Brost
  2021-05-27 10:02           ` Tvrtko Ursulin
  0 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-26 18:10 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: jason.ekstrand, daniel.vetter, intel-gfx, dri-devel

On Wed, May 26, 2021 at 09:57:10AM +0100, Tvrtko Ursulin wrote:
> 
> On 25/05/2021 18:21, Matthew Brost wrote:
> > On Tue, May 25, 2021 at 10:21:00AM +0100, Tvrtko Ursulin wrote:
> > > 
> > > On 06/05/2021 20:13, Matthew Brost wrote:
> > > > Add non blocking CTB send function, intel_guc_send_nb. In order to
> > > > support a non blocking CTB send function a spin lock is needed to
> > > > protect the CTB descriptors fields. Also the non blocking call must not
> > > > update the fence value as this value is owned by the blocking call
> > > > (intel_guc_send).
> > > 
> > > Could the commit message say why the non-blocking send function is needed?
> > > 
> > 
> > Sure. Something like:
> > 
> > 'CTBs will be used in the critical path of GuC submission and there is
> > no need to wait for each CTB to complete before moving on in the i915'
> 
> A bit more, like also mentioning that the critical path is with interrupts disabled or so. And not just that there is no need to wait, but that waiting is not possible because of this or that. So the only choice is to do this busy-loop send. It's a bit horrible, so the justification needs to be documented.
> 

Don't I basically say all this? Anyway, I'll scrub this comment.

> > > > The blocking CTB now must have a flow control mechanism to ensure the
> > > > buffer isn't overrun. A lazy spin wait is used as we believe the flow
> > > > control condition should be rare with properly sized buffer.
> > > > 
> > > > The function, intel_guc_send_nb, is exported in this patch but unused.
> > > > Several patches later in the series make use of this function.
> > > > 
> > > > Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > ---
> > > >    drivers/gpu/drm/i915/gt/uc/intel_guc.h    | 12 ++-
> > > >    drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 96 +++++++++++++++++++++--
> > > >    drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  7 +-
> > > >    3 files changed, 105 insertions(+), 10 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > > > index c20f3839de12..4c0a367e41d8 100644
> > > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > > > @@ -75,7 +75,15 @@ static inline struct intel_guc *log_to_guc(struct intel_guc_log *log)
> > > >    static
> > > >    inline int intel_guc_send(struct intel_guc *guc, const u32 *action, u32 len)
> > > >    {
> > > > -	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0);
> > > > +	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0, 0);
> > > > +}
> > > > +
> > > > +#define INTEL_GUC_SEND_NB		BIT(31)
> > > > +static
> > > > +inline int intel_guc_send_nb(struct intel_guc *guc, const u32 *action, u32 len)
> > > > +{
> > > > +	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0,
> > > > +				 INTEL_GUC_SEND_NB);
> > > >    }
> > > >    static inline int
> > > > @@ -83,7 +91,7 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len,
> > > >    			   u32 *response_buf, u32 response_buf_size)
> > > >    {
> > > >    	return intel_guc_ct_send(&guc->ct, action, len,
> > > > -				 response_buf, response_buf_size);
> > > > +				 response_buf, response_buf_size, 0);
> > > >    }
> > > >    static inline void intel_guc_to_host_event_handler(struct intel_guc *guc)
> > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > > > index a76603537fa8..af7314d45a78 100644
> > > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > > > @@ -3,6 +3,11 @@
> > > >     * Copyright © 2016-2019 Intel Corporation
> > > >     */
> > > > +#include <linux/circ_buf.h>
> > > > +#include <linux/ktime.h>
> > > > +#include <linux/time64.h>
> > > > +#include <linux/timekeeping.h>
> > > > +
> > > >    #include "i915_drv.h"
> > > >    #include "intel_guc_ct.h"
> > > >    #include "gt/intel_gt.h"
> > > > @@ -308,6 +313,7 @@ int intel_guc_ct_enable(struct intel_guc_ct *ct)
> > > >    	if (unlikely(err))
> > > >    		goto err_deregister;
> > > > +	ct->requests.last_fence = 1;
> > > >    	ct->enabled = true;
> > > >    	return 0;
> > > > @@ -343,10 +349,22 @@ static u32 ct_get_next_fence(struct intel_guc_ct *ct)
> > > >    	return ++ct->requests.last_fence;
> > > >    }
> > > > +static void write_barrier(struct intel_guc_ct *ct) {
> > > > +	struct intel_guc *guc = ct_to_guc(ct);
> > > > +	struct intel_gt *gt = guc_to_gt(guc);
> > > > +
> > > > +	if (i915_gem_object_is_lmem(guc->ct.vma->obj)) {
> > > > +		GEM_BUG_ON(guc->send_regs.fw_domains);
> > > > +		intel_uncore_write_fw(gt->uncore, GEN11_SOFT_SCRATCH(0), 0);
> > > 
> > > It's safe to write to this reg? Does it need a comment to explain it?
> > > 
> > 
> > Yes, it is safe. IMO 'SCRATCH' in the name is enough documentation.
> 
> Why would it be enough? It requires digging to figure it out since it appears these are some sort of GuC special registers and not generic scratch:
> 
> commit 2d4ed3a988e6b1ff9729d0edd74bf4890571253e
> Author: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Date:   Mon May 27 18:36:05 2019 +0000
> 
>     drm/i915/guc: New GuC scratch registers for Gen11
> 
> If it was a normal scratch then async trashing of those from a random driver thread isn't per se safe if used from a GPU context running in parallel.
> 
> But then according to bspec they are called VF_SW_FLAG_<n> and not GEN11_SOFT_SCRATCH so yeah.
> 

Moved this part into its own patch and added a comment indicating why they
are safe to use.
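
For reference, the gist of that rationale - a sketch only, the exact wording
in the standalone patch may differ:

	if (i915_gem_object_is_lmem(guc->ct.vma->obj)) {
		GEM_BUG_ON(guc->send_regs.fw_domains);
		/*
		 * This register is used by i915 and the GuC for MMIO based
		 * communication. Once CTBs are enabled they are the only
		 * mechanism i915 uses to talk to the GuC, so it is safe to
		 * write here (a value of 0 is a no-op for the MMIO protocol).
		 * If we ever start mixing CTBs and MMIO again, a different
		 * register would have to be chosen.
		 */
		intel_uncore_write_fw(gt->uncore, GEN11_SOFT_SCRATCH(0), 0);
	} else {
		/* for system memory a plain wmb() is sufficient */
		wmb();
	}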

Matt 

> > > > +	} else {
> > > > +		wmb();
> > > > +	}
> > > > +}
> > > > +
> > > >    static int ct_write(struct intel_guc_ct *ct,
> > > >    		    const u32 *action,
> > > >    		    u32 len /* in dwords */,
> > > > -		    u32 fence)
> > > > +		    u32 fence, u32 flags)
> > > >    {
> > > >    	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> > > >    	struct guc_ct_buffer_desc *desc = ctb->desc;
> > > > @@ -393,9 +411,13 @@ static int ct_write(struct intel_guc_ct *ct,
> > > >    		 FIELD_PREP(GUC_CTB_MSG_0_NUM_DWORDS, len) |
> > > >    		 FIELD_PREP(GUC_CTB_MSG_0_FENCE, fence);
> > > > -	hxg = FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
> > > > -	      FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
> > > > -			 GUC_HXG_REQUEST_MSG_0_DATA0, action[0]);
> > > > +	hxg = (flags & INTEL_GUC_SEND_NB) ?
> > > > +		(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_EVENT) |
> > > > +		 FIELD_PREP(GUC_HXG_EVENT_MSG_0_ACTION |
> > > > +			    GUC_HXG_EVENT_MSG_0_DATA0, action[0])) :
> > > > +		(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
> > > > +		 FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
> > > > +			    GUC_HXG_REQUEST_MSG_0_DATA0, action[0]));
> > > >    	CT_DEBUG(ct, "writing (tail %u) %*ph %*ph %*ph\n",
> > > >    		 tail, 4, &header, 4, &hxg, 4 * (len - 1), &action[1]);
> > > > @@ -412,6 +434,12 @@ static int ct_write(struct intel_guc_ct *ct,
> > > >    	}
> > > >    	GEM_BUG_ON(tail > size);
> > > > +	/*
> > > > +	 * make sure H2G buffer update and LRC tail update (if this triggering a
> > > > +	 * make sure H2G buffer update and LRC tail update (if this is triggering a
> > > > +	 * submission) are visible before updating the descriptor tail
> > > > +	write_barrier(ct);
> > > > +
> > > >    	/* now update descriptor */
> > > >    	WRITE_ONCE(desc->tail, tail);
> > > > @@ -466,6 +494,46 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
> > > >    	return err;
> > > >    }
> > > > +static inline bool ctb_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
> > > > +{
> > > > +	struct guc_ct_buffer_desc *desc = ctb->desc;
> > > > +	u32 head = READ_ONCE(desc->head);
> > > > +	u32 space;
> > > > +
> > > > +	space = CIRC_SPACE(desc->tail, head, ctb->size);
> > > > +
> > > > +	return space >= len_dw;
> > > > +}
> > > > +
> > > > +static int ct_send_nb(struct intel_guc_ct *ct,
> > > > +		      const u32 *action,
> > > > +		      u32 len,
> > > > +		      u32 flags)
> > > > +{
> > > > +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> > > > +	unsigned long spin_flags;
> > > > +	u32 fence;
> > > > +	int ret;
> > > > +
> > > > +	spin_lock_irqsave(&ctb->lock, spin_flags);
> > > > +
> > > > +	ret = ctb_has_room(ctb, len + 1);
> > > > +	if (unlikely(ret))
> > > > +		goto out;
> > > > +
> > > > +	fence = ct_get_next_fence(ct);
> > > > +	ret = ct_write(ct, action, len, fence, flags);
> > > > +	if (unlikely(ret))
> > > > +		goto out;
> > > > +
> > > > +	intel_guc_notify(ct_to_guc(ct));
> > > > +
> > > > +out:
> > > > +	spin_unlock_irqrestore(&ctb->lock, spin_flags);
> > > > +
> > > > +	return ret;
> > > > +}
> > > > +
> > > >    static int ct_send(struct intel_guc_ct *ct,
> > > >    		   const u32 *action,
> > > >    		   u32 len,
> > > > @@ -473,6 +541,7 @@ static int ct_send(struct intel_guc_ct *ct,
> > > >    		   u32 response_buf_size,
> > > >    		   u32 *status)
> > > >    {
> > > > +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> > > >    	struct ct_request request;
> > > >    	unsigned long flags;
> > > >    	u32 fence;
> > > > @@ -482,8 +551,20 @@ static int ct_send(struct intel_guc_ct *ct,
> > > >    	GEM_BUG_ON(!len);
> > > >    	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
> > > >    	GEM_BUG_ON(!response_buf && response_buf_size);
> > > > +	might_sleep();
> > > 
> > > Sleep is just the cond_resched below, or is there more?
> > > 
> > 
> > Yes, the cond_resched.
> > 
> > > > +	/*
> > > > +	 * We use a lazy spin wait loop here as we believe that if the CT
> > > > +	 * buffers are sized correctly the flow control condition should be
> > > > +	 * rare.
> > > > +	 */
> > > > +retry:
> > > >    	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
> > > > +	if (unlikely(!ctb_has_room(ctb, len + 1))) {
> > > > +		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> > > > +		cond_resched();
> > > > +		goto retry;
> > > > +	}
> > > 
> > > If this patch is about adding a non-blocking send function, and below we can
> > > see that it creates a fork:
> > > 
> > > intel_guc_ct_send:
> > > ...
> > > 	if (flags & INTEL_GUC_SEND_NB)
> > > 		return ct_send_nb(ct, action, len, flags);
> > > 
> > >   	ret = ct_send(ct, action, len, response_buf, response_buf_size, &status);
> > > 
> > > Then why is there a change in ct_send here, which is not the new
> > > non-blocking path?
> > > 
> > 
> > There is not a change to ct_send(), just to intel_guc_ct_send.
> 
> I was going by the diff which says:
> 
>  static int ct_send(struct intel_guc_ct *ct,
>  		   const u32 *action,
>  		   u32 len,
> @@ -473,6 +541,7 @@ static int ct_send(struct intel_guc_ct *ct,
>  		   u32 response_buf_size,
>  		   u32 *status)
>  {
> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>  	struct ct_request request;
>  	unsigned long flags;
>  	u32 fence;
> @@ -482,8 +551,20 @@ static int ct_send(struct intel_guc_ct *ct,
>  	GEM_BUG_ON(!len);
>  	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
>  	GEM_BUG_ON(!response_buf && response_buf_size);
> +	might_sleep();
> +	/*
> +	 * We use a lazy spin wait loop here as we believe that if the CT
> +	 * buffers are sized correctly the flow control condition should be
> +	 * rare.
> +	 */
> +retry:
>  	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
> +	if (unlikely(!ctb_has_room(ctb, len + 1))) {
> +		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> +		cond_resched();
> +		goto retry;
> +	}
> 
> So it looks like a change to ct_send to me. Is that wrong?
> 
> Regards,
> 
> Tvrtko
> 
> > As for why intel_guc_ct_send is updated and we don't just add a new public
> > function, this was another reviewer's suggestion. Again, can't make
> > everyone happy.
> > > >    	fence = ct_get_next_fence(ct);
> > > >    	request.fence = fence;
> > > > @@ -495,7 +576,7 @@ static int ct_send(struct intel_guc_ct *ct,
> > > >    	list_add_tail(&request.link, &ct->requests.pending);
> > > >    	spin_unlock(&ct->requests.lock);
> > > > -	err = ct_write(ct, action, len, fence);
> > > > +	err = ct_write(ct, action, len, fence, 0);
> > > >    	spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> > > > @@ -537,7 +618,7 @@ static int ct_send(struct intel_guc_ct *ct,
> > > >     * Command Transport (CT) buffer based GuC send function.
> > > >     */
> > > >    int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
> > > > -		      u32 *response_buf, u32 response_buf_size)
> > > > +		      u32 *response_buf, u32 response_buf_size, u32 flags)
> > > >    {
> > > >    	u32 status = ~0; /* undefined */
> > > >    	int ret;
> > > > @@ -547,6 +628,9 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
> > > >    		return -ENODEV;
> > > >    	}
> > > > +	if (flags & INTEL_GUC_SEND_NB)
> > > > +		return ct_send_nb(ct, action, len, flags);
> > > > +
> > > >    	ret = ct_send(ct, action, len, response_buf, response_buf_size, &status);
> > > >    	if (unlikely(ret < 0)) {
> > > >    		CT_ERROR(ct, "Sending action %#x failed (err=%d status=%#X)\n",
> > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > > > index 1ae2dde6db93..55ef7c52472f 100644
> > > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > > > @@ -9,6 +9,7 @@
> > > >    #include <linux/interrupt.h>
> > > >    #include <linux/spinlock.h>
> > > >    #include <linux/workqueue.h>
> > > > +#include <linux/ktime.h>
> > > >    #include "intel_guc_fwif.h"
> > > > @@ -42,7 +43,6 @@ struct intel_guc_ct_buffer {
> > > >    	bool broken;
> > > >    };
> > > > -
> > > >    /** Top-level structure for Command Transport related data
> > > >     *
> > > >     * Includes a pair of CT buffers for bi-directional communication and tracking
> > > > @@ -69,6 +69,9 @@ struct intel_guc_ct {
> > > >    		struct list_head incoming; /* incoming requests */
> > > >    		struct work_struct worker; /* handler for incoming requests */
> > > >    	} requests;
> > > > +
> > > > +	/** @stall_time: time of first time a CTB submission is stalled */
> > > > +	ktime_t stall_time;
> > > 
> > > Unused in this patch.
> > > 
> > 
> > Yea, wrong patch. Will fix.
> > 
> > Matt
> > > >    };
> > > >    void intel_guc_ct_init_early(struct intel_guc_ct *ct);
> > > > @@ -88,7 +91,7 @@ static inline bool intel_guc_ct_enabled(struct intel_guc_ct *ct)
> > > >    }
> > > >    int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
> > > > -		      u32 *response_buf, u32 response_buf_size);
> > > > +		      u32 *response_buf, u32 response_buf_size, u32 flags);
> > > >    void intel_guc_ct_event_handler(struct intel_guc_ct *ct);
> > > >    #endif /* _INTEL_GUC_CT_H_ */
> > > > 
> > > 
> > > Regards,
> > > 
> > > Tvrtko

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 53/97] drm/i915/guc: Disable semaphores when using GuC scheduling
  2021-05-26  9:25       ` Tvrtko Ursulin
@ 2021-05-26 18:15         ` Matthew Brost
  2021-05-27  8:41           ` Tvrtko Ursulin
  0 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-26 18:15 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: jason.ekstrand, daniel.vetter, intel-gfx, dri-devel

On Wed, May 26, 2021 at 10:25:13AM +0100, Tvrtko Ursulin wrote:
> 
> On 25/05/2021 18:01, Matthew Brost wrote:
> > On Tue, May 25, 2021 at 10:52:01AM +0100, Tvrtko Ursulin wrote:
> > > 
> > > On 06/05/2021 20:14, Matthew Brost wrote:
> > > > Disable semaphores when using GuC scheduling as semaphores are broken in
> > > > the current GuC firmware.
> > > 
> > > What is "current"? Given that the patch itself is like year and a half old.
> > > 
> > 
> > Stale comment. Semaphores work with the firmware; we just haven't enabled
> > them in the i915 with GuC submission, as this is an optimization and not
> > required for functionality.
> 
> How will the updated commit message look in terms of remaining reasons why
> semaphores won't/can't be enabled?
> 

Semaphores are an optimization and are not required for basic GuC submission
to work properly. Disable them until we have time to do the implementation
and tune them for performance.

> They were a nice performance win on some media workloads although granted a
> lot of tweaking was required to find a good balance on when to use them and
> when not to.
>

The same tweaking would have to be done with GuC submission. Let's
get basic submission in first, then tweak for performance.

Matt 
 
> Regards,
> 
> Tvrtko
> 
> > Matt
> > 
> > > Regards,
> > > 
> > > Tvrtko
> > > 
> > > > Cc: John Harrison <john.c.harrison@intel.com>
> > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > ---
> > > >    drivers/gpu/drm/i915/gem/i915_gem_context.c | 6 ++++--
> > > >    1 file changed, 4 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > > > index 993faa213b41..d30260ffe2a7 100644
> > > > --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > > > @@ -230,7 +230,8 @@ static void intel_context_set_gem(struct intel_context *ce,
> > > >    		ce->timeline = intel_timeline_get(ctx->timeline);
> > > >    	if (ctx->sched.priority >= I915_PRIORITY_NORMAL &&
> > > > -	    intel_engine_has_timeslices(ce->engine))
> > > > +	    intel_engine_has_timeslices(ce->engine) &&
> > > > +	    intel_engine_has_semaphores(ce->engine))
> > > >    		__set_bit(CONTEXT_USE_SEMAPHORES, &ce->flags);
> > > >    	intel_context_set_watchdog_us(ce, ctx->watchdog.timeout_us);
> > > > @@ -1939,7 +1940,8 @@ static int __apply_priority(struct intel_context *ce, void *arg)
> > > >    	if (!intel_engine_has_timeslices(ce->engine))
> > > >    		return 0;
> > > > -	if (ctx->sched.priority >= I915_PRIORITY_NORMAL)
> > > > +	if (ctx->sched.priority >= I915_PRIORITY_NORMAL &&
> > > > +	    intel_engine_has_semaphores(ce->engine))
> > > >    		intel_context_set_use_semaphores(ce);
> > > >    	else
> > > >    		intel_context_clear_use_semaphores(ce);
> > > > 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 55/97] drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC
  2021-05-26  9:21       ` Tvrtko Ursulin
@ 2021-05-26 18:18         ` Matthew Brost
  2021-05-27  9:02           ` Tvrtko Ursulin
  0 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-26 18:18 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: jason.ekstrand, daniel.vetter, intel-gfx, dri-devel

On Wed, May 26, 2021 at 10:21:05AM +0100, Tvrtko Ursulin wrote:
> 
> On 25/05/2021 18:07, Matthew Brost wrote:
> > On Tue, May 25, 2021 at 11:06:00AM +0100, Tvrtko Ursulin wrote:
> > > 
> > > On 06/05/2021 20:14, Matthew Brost wrote:
> > > > When running the GuC the GPU can't be considered idle if the GuC still
> > > > has contexts pinned. As such, a call has been added in
> > > > intel_gt_wait_for_idle to idle the UC and in turn the GuC by waiting for
> > > > the number of unpinned contexts to go to zero.
> > > > 
> > > > Cc: John Harrison <john.c.harrison@intel.com>
> > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > ---
> > > >    drivers/gpu/drm/i915/gem/i915_gem_mman.c      |  3 +-
> > > >    drivers/gpu/drm/i915/gt/intel_gt.c            | 18 ++++
> > > >    drivers/gpu/drm/i915/gt/intel_gt.h            |  2 +
> > > >    drivers/gpu/drm/i915/gt/intel_gt_requests.c   | 22 ++---
> > > >    drivers/gpu/drm/i915/gt/intel_gt_requests.h   |  7 +-
> > > >    drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  4 +
> > > >    drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |  1 +
> > > >    drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h     |  4 +
> > > >    .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 91 ++++++++++++++++++-
> > > >    drivers/gpu/drm/i915/gt/uc/intel_uc.h         |  5 +
> > > >    drivers/gpu/drm/i915/i915_debugfs.c           |  1 +
> > > >    drivers/gpu/drm/i915/i915_gem_evict.c         |  1 +
> > > >    .../gpu/drm/i915/selftests/igt_live_test.c    |  2 +-
> > > >    .../gpu/drm/i915/selftests/mock_gem_device.c  |  3 +-
> > > >    14 files changed, 137 insertions(+), 27 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> > > > index 8598a1c78a4c..2f5295c9408d 100644
> > > > --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> > > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> > > > @@ -634,7 +634,8 @@ mmap_offset_attach(struct drm_i915_gem_object *obj,
> > > >    		goto insert;
> > > >    	/* Attempt to reap some mmap space from dead objects */
> > > > -	err = intel_gt_retire_requests_timeout(&i915->gt, MAX_SCHEDULE_TIMEOUT);
> > > > +	err = intel_gt_retire_requests_timeout(&i915->gt, MAX_SCHEDULE_TIMEOUT,
> > > > +					       NULL);
> > > >    	if (err)
> > > >    		goto err;
> > > > diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> > > > index 8d77dcbad059..1742a8561f69 100644
> > > > --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> > > > +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> > > > @@ -574,6 +574,24 @@ static void __intel_gt_disable(struct intel_gt *gt)
> > > >    	GEM_BUG_ON(intel_gt_pm_is_awake(gt));
> > > >    }
> > > > +int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout)
> > > > +{
> > > > +	long rtimeout;
> > > > +
> > > > +	/* If the device is asleep, we have no requests outstanding */
> > > > +	if (!intel_gt_pm_is_awake(gt))
> > > > +		return 0;
> > > > +
> > > > +	while ((timeout = intel_gt_retire_requests_timeout(gt, timeout,
> > > > +							   &rtimeout)) > 0) {
> > > > +		cond_resched();
> > > > +		if (signal_pending(current))
> > > > +			return -EINTR;
> > > > +	}
> > > > +
> > > > +	return timeout ? timeout : intel_uc_wait_for_idle(&gt->uc, rtimeout);
> > > > +}
> > > > +
> > > >    int intel_gt_init(struct intel_gt *gt)
> > > >    {
> > > >    	int err;
> > > > diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h
> > > > index 7ec395cace69..c775043334bf 100644
> > > > --- a/drivers/gpu/drm/i915/gt/intel_gt.h
> > > > +++ b/drivers/gpu/drm/i915/gt/intel_gt.h
> > > > @@ -48,6 +48,8 @@ void intel_gt_driver_release(struct intel_gt *gt);
> > > >    void intel_gt_driver_late_release(struct intel_gt *gt);
> > > > +int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout);
> > > > +
> > > >    void intel_gt_check_and_clear_faults(struct intel_gt *gt);
> > > >    void intel_gt_clear_error_registers(struct intel_gt *gt,
> > > >    				    intel_engine_mask_t engine_mask);
> > > > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> > > > index 647eca9d867a..c6c702f236fa 100644
> > > > --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> > > > +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> > > > @@ -13,6 +13,7 @@
> > > >    #include "intel_gt_pm.h"
> > > >    #include "intel_gt_requests.h"
> > > >    #include "intel_timeline.h"
> > > > +#include "uc/intel_uc.h"
> > > >    static bool retire_requests(struct intel_timeline *tl)
> > > >    {
> > > > @@ -130,7 +131,8 @@ void intel_engine_fini_retire(struct intel_engine_cs *engine)
> > > >    	GEM_BUG_ON(engine->retire);
> > > >    }
> > > > -long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout)
> > > > +long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
> > > > +				      long *rtimeout)
> > > 
> > > What is 'rtimeout', I know remaining, but it can be more self-descriptive to
> > > start with.
> > > 
> > 
> > 'remaining_timeout' it is.
> > 
> > > It feels a bit churny for what it is. How plausible would alternatives be to
> > > either change existing timeout to in/out, or measure sleep internally in
> > > this function, or just risk sleeping twice as long by passing the original
> > > timeout to uc idle as well?
> > > 
> > 
> > Originally had it just passing in the same value, got review feedback
> > saying I should pass in the adjusted value. Hard to make everyone happy.
> 
> Ok.
> 
> > > >    {
> > > >    	struct intel_gt_timelines *timelines = &gt->timelines;
> > > >    	struct intel_timeline *tl, *tn;
> > > > @@ -195,22 +197,10 @@ out_active:	spin_lock(&timelines->lock);
> > > >    	if (flush_submission(gt, timeout)) /* Wait, there's more! */
> > > >    		active_count++;
> > > > -	return active_count ? timeout : 0;
> > > > -}
> > > > -
> > > > -int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout)
> > > > -{
> > > > -	/* If the device is asleep, we have no requests outstanding */
> > > > -	if (!intel_gt_pm_is_awake(gt))
> > > > -		return 0;
> > > > -
> > > > -	while ((timeout = intel_gt_retire_requests_timeout(gt, timeout)) > 0) {
> > > > -		cond_resched();
> > > > -		if (signal_pending(current))
> > > > -			return -EINTR;
> > > > -	}
> > > > +	if (rtimeout)
> > > > +		*rtimeout = timeout;
> > > > -	return timeout;
> > > > +	return active_count ? timeout : 0;
> > > >    }
> > > >    static void retire_work_handler(struct work_struct *work)
> > > > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.h b/drivers/gpu/drm/i915/gt/intel_gt_requests.h
> > > > index fcc30a6e4fe9..4419787124e2 100644
> > > > --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.h
> > > > +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.h
> > > > @@ -10,10 +10,11 @@ struct intel_engine_cs;
> > > >    struct intel_gt;
> > > >    struct intel_timeline;
> > > > -long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout);
> > > > +long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
> > > > +				      long *rtimeout);
> > > >    static inline void intel_gt_retire_requests(struct intel_gt *gt)
> > > >    {
> > > > -	intel_gt_retire_requests_timeout(gt, 0);
> > > > +	intel_gt_retire_requests_timeout(gt, 0, NULL);
> > > >    }
> > > >    void intel_engine_init_retire(struct intel_engine_cs *engine);
> > > > @@ -21,8 +22,6 @@ void intel_engine_add_retire(struct intel_engine_cs *engine,
> > > >    			     struct intel_timeline *tl);
> > > >    void intel_engine_fini_retire(struct intel_engine_cs *engine);
> > > > -int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout);
> > > > -
> > > >    void intel_gt_init_requests(struct intel_gt *gt);
> > > >    void intel_gt_park_requests(struct intel_gt *gt);
> > > >    void intel_gt_unpark_requests(struct intel_gt *gt);
> > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > > > index 485e98f3f304..47eaa69809e8 100644
> > > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > > > @@ -38,6 +38,8 @@ struct intel_guc {
> > > >    	spinlock_t irq_lock;
> > > >    	unsigned int msg_enabled_mask;
> > > > +	atomic_t outstanding_submission_g2h;
> > > > +
> > > >    	struct {
> > > >    		bool enabled;
> > > >    		void (*reset)(struct intel_guc *guc);
> > > > @@ -239,6 +241,8 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask)
> > > >    	spin_unlock_irq(&guc->irq_lock);
> > > >    }
> > > > +int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout);
> > > > +
> > > >    int intel_guc_reset_engine(struct intel_guc *guc,
> > > >    			   struct intel_engine_cs *engine);
> > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > > > index f1893030ca88..cf701056fa14 100644
> > > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > > > @@ -111,6 +111,7 @@ void intel_guc_ct_init_early(struct intel_guc_ct *ct)
> > > >    	INIT_LIST_HEAD(&ct->requests.incoming);
> > > >    	INIT_WORK(&ct->requests.worker, ct_incoming_request_worker_func);
> > > >    	tasklet_init(&ct->receive_tasklet, ct_receive_tasklet_func, (unsigned long)ct);
> > > > +	init_waitqueue_head(&ct->wq);
> > > >    }
> > > >    static inline const char *guc_ct_buffer_type_to_str(u32 type)
> > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > > > index 660bf37238e2..ab1b79ab960b 100644
> > > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > > > @@ -10,6 +10,7 @@
> > > >    #include <linux/spinlock.h>
> > > >    #include <linux/workqueue.h>
> > > >    #include <linux/ktime.h>
> > > > +#include <linux/wait.h>
> > > >    #include "intel_guc_fwif.h"
> > > > @@ -68,6 +69,9 @@ struct intel_guc_ct {
> > > >    	struct tasklet_struct receive_tasklet;
> > > > +	/** @wq: wait queue for g2h channel */
> > > > +	wait_queue_head_t wq;
> > > > +
> > > >    	struct {
> > > >    		u16 last_fence; /* last fence used to send request */
> > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > > index ae0b386467e3..0ff7dd6d337d 100644
> > > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > > @@ -253,6 +253,74 @@ static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
> > > >    	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
> > > >    }
> > > > +static int guc_submission_busy_loop(struct intel_guc* guc,
> > > > +				    const u32 *action,
> > > > +				    u32 len,
> > > > +				    u32 g2h_len_dw,
> > > > +				    bool loop)
> > > > +{
> > > > +	int err;
> > > > +
> > > > +	err = intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop);
> > > > +
> > > > +	if (!err && g2h_len_dw)
> > > > +		atomic_inc(&guc->outstanding_submission_g2h);
> > > > +
> > > > +	return err;
> > > > +}
> > > > +
> > > > +static int guc_wait_for_pending_msg(struct intel_guc *guc,
> > > > +				    atomic_t *wait_var,
> > > > +				    bool interruptible,
> > > > +				    long timeout)
> > > > +{
> > > > +	const int state = interruptible ?
> > > > +		TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
> > > > +	DEFINE_WAIT(wait);
> > > > +
> > > > +	might_sleep();
> > > > +	GEM_BUG_ON(timeout < 0);
> > > > +
> > > > +	if (!atomic_read(wait_var))
> > > > +		return 0;
> > > > +
> > > > +	if (!timeout)
> > > > +		return -ETIME;
> > > > +
> > > > +	for (;;) {
> > > > +		prepare_to_wait(&guc->ct.wq, &wait, state);
> > > > +
> > > > +		if (!atomic_read(wait_var))
> > > > +			break;
> > > > +
> > > > +		if (signal_pending_state(state, current)) {
> > > > +			timeout = -ERESTARTSYS;
> > > > +			break;
> > > > +		}
> > > > +
> > > > +		if (!timeout) {
> > > > +			timeout = -ETIME;
> > > > +			break;
> > > > +		}
> > > > +
> > > > +		timeout = io_schedule_timeout(timeout);
> > > > +	}
> > > > +	finish_wait(&guc->ct.wq, &wait);
> > > > +
> > > > +	return (timeout < 0) ? timeout : 0;
> > > > +}
> > > 
> > > See if it is possible to simplify all this with wait_var_event and
> > > wake_up_var.
> > > 
> > 
> > Let me check on that.
> > > > +
> > > > +int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout)
> > > > +{
> > > > +	bool interruptible = true;
> > > > +
> > > > +	if (unlikely(timeout < 0))
> > > > +		timeout = -timeout, interruptible = false;
> > > > +
> > > > +	return guc_wait_for_pending_msg(guc, &guc->outstanding_submission_g2h,
> > > > +					interruptible, timeout);
> > > > +}
> > > > +
> > > >    static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
> > > >    {
> > > >    	int err;
> > > > @@ -279,6 +347,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
> > > >    	err = intel_guc_send_nb(guc, action, len, g2h_len_dw);
> > > >    	if (!enabled && !err) {
> > > > +		atomic_inc(&guc->outstanding_submission_g2h);
> > > >    		set_context_enabled(ce);
> > > >    	} else if (!enabled) {
> > > >    		clr_context_pending_enable(ce);
> > > > @@ -734,7 +803,7 @@ static int __guc_action_register_context(struct intel_guc *guc,
> > > >    		offset,
> > > >    	};
> > > > -	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
> > > > +	return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
> > > >    }
> > > >    static int register_context(struct intel_context *ce)
> > > > @@ -754,7 +823,7 @@ static int __guc_action_deregister_context(struct intel_guc *guc,
> > > >    		guc_id,
> > > >    	};
> > > > -	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action),
> > > > +	return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),
> > > >    					G2H_LEN_DW_DEREGISTER_CONTEXT, true);
> > > >    }
> > > > @@ -871,7 +940,9 @@ static int guc_context_pin(struct intel_context *ce, void *vaddr)
> > > >    static void guc_context_unpin(struct intel_context *ce)
> > > >    {
> > > > -	unpin_guc_id(ce_to_guc(ce), ce);
> > > > +	struct intel_guc *guc = ce_to_guc(ce);
> > > > +
> > > > +	unpin_guc_id(guc, ce);
> > > >    	lrc_unpin(ce);
> > > >    }
> > > > @@ -894,7 +965,7 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
> > > >    	intel_context_get(ce);
> > > > -	intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action),
> > > > +	guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),
> > > >    				 G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
> > > >    }
> > > > @@ -1437,6 +1508,15 @@ g2h_context_lookup(struct intel_guc *guc, u32 desc_idx)
> > > >    	return ce;
> > > >    }
> > > > +static void decr_outstanding_submission_g2h(struct intel_guc *guc)
> > > > +{
> > > > +	if (atomic_dec_and_test(&guc->outstanding_submission_g2h)) {
> > > > +		smp_mb();
> > > > +		if (waitqueue_active(&guc->ct.wq))
> > > > +			wake_up_all(&guc->ct.wq);
> > > 
> > > I keep pointing out this pattern is racy and at least needs comment why it
> > > is safe.
> > > 
> > 
> > There is a comment in the wait queue code header saying why this is safe. I
> > don't think we need to repeat this here.
> 
> Yeah, _describing how to make it safe_, after it starts with:
> 
>  * NOTE: this function is lockless and requires care, incorrect usage _will_
>  * lead to sporadic and non-obvious failure.
> 
> Then it also says:
> 
>  * Also note that this 'optimization' trades a spin_lock() for an smp_mb(),
>  * which (when the lock is uncontended) are of roughly equal cost.
> 
> I question the need to optimize this path since it means the reader has to figure out if it is safe, while a simple wake_up_all after atomic_dec_and_test would have done it.
> 
> Is the case of no waiters a predominant one? It at least deserves a comment explaining why the optimisation is important.
> 

I just didn't want to add a spin_lock if there is a known working code
path without one and our code fits into that path. I can add a comment
but I don't really think it necessary.
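
Roughly, the comment would say something like this (a sketch of the
function already in the patch above, not new code):

static void decr_outstanding_submission_g2h(struct intel_guc *guc)
{
	if (atomic_dec_and_test(&guc->outstanding_submission_g2h)) {
		/*
		 * atomic_dec_and_test() is a fully ordered RMW and the
		 * smp_mb() below reinforces that ordering: the decrement is
		 * visible before the waitqueue_active() check, pairing with
		 * the barrier in prepare_to_wait() in
		 * guc_wait_for_pending_msg(). Either the waiter sees the
		 * count hit zero, or we see the waiter on the queue and wake
		 * it - the lockless pattern described in the
		 * waitqueue_active() kerneldoc.
		 */
		smp_mb();
		if (waitqueue_active(&guc->ct.wq))
			wake_up_all(&guc->ct.wq);
	}
}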

Matt 

> Regards,
> 
> Tvrtko
> 
> > 
> > Matt
> > 
> > > Regards,
> > > 
> > > Tvrtko
> > > 
> > > > +	}
> > > > +}
> > > > +
> > > >    int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
> > > >    					  const u32 *msg,
> > > >    					  u32 len)
> > > > @@ -1472,6 +1552,8 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
> > > >    		lrc_destroy(&ce->ref);
> > > >    	}
> > > > +	decr_outstanding_submission_g2h(guc);
> > > > +
> > > >    	return 0;
> > > >    }
> > > > @@ -1520,6 +1602,7 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
> > > >    		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> > > >    	}
> > > > +	decr_outstanding_submission_g2h(guc);
> > > >    	intel_context_put(ce);
> > > >    	return 0;
> > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> > > > index 9c954c589edf..c4cef885e984 100644
> > > > --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> > > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> > > > @@ -81,6 +81,11 @@ uc_state_checkers(guc, guc_submission);
> > > >    #undef uc_state_checkers
> > > >    #undef __uc_state_checker
> > > > +static inline int intel_uc_wait_for_idle(struct intel_uc *uc, long timeout)
> > > > +{
> > > > +	return intel_guc_wait_for_idle(&uc->guc, timeout);
> > > > +}
> > > > +
> > > >    #define intel_uc_ops_function(_NAME, _OPS, _TYPE, _RET) \
> > > >    static inline _TYPE intel_uc_##_NAME(struct intel_uc *uc) \
> > > >    { \
> > > > diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> > > > index 8dd374691102..bb29838d1cd7 100644
> > > > --- a/drivers/gpu/drm/i915/i915_debugfs.c
> > > > +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> > > > @@ -36,6 +36,7 @@
> > > >    #include "gt/intel_gt_clock_utils.h"
> > > >    #include "gt/intel_gt.h"
> > > >    #include "gt/intel_gt_pm.h"
> > > > +#include "gt/intel_gt.h"
> > > >    #include "gt/intel_gt_requests.h"
> > > >    #include "gt/intel_reset.h"
> > > >    #include "gt/intel_rc6.h"
> > > > diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
> > > > index 4d2d59a9942b..2b73ddb11c66 100644
> > > > --- a/drivers/gpu/drm/i915/i915_gem_evict.c
> > > > +++ b/drivers/gpu/drm/i915/i915_gem_evict.c
> > > > @@ -27,6 +27,7 @@
> > > >     */
> > > >    #include "gem/i915_gem_context.h"
> > > > +#include "gt/intel_gt.h"
> > > >    #include "gt/intel_gt_requests.h"
> > > >    #include "i915_drv.h"
> > > > diff --git a/drivers/gpu/drm/i915/selftests/igt_live_test.c b/drivers/gpu/drm/i915/selftests/igt_live_test.c
> > > > index c130010a7033..1c721542e277 100644
> > > > --- a/drivers/gpu/drm/i915/selftests/igt_live_test.c
> > > > +++ b/drivers/gpu/drm/i915/selftests/igt_live_test.c
> > > > @@ -5,7 +5,7 @@
> > > >     */
> > > >    #include "i915_drv.h"
> > > > -#include "gt/intel_gt_requests.h"
> > > > +#include "gt/intel_gt.h"
> > > >    #include "../i915_selftest.h"
> > > >    #include "igt_flush_test.h"
> > > > diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> > > > index cf40004bc92a..6c06816e2b99 100644
> > > > --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> > > > +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> > > > @@ -51,7 +51,8 @@ void mock_device_flush(struct drm_i915_private *i915)
> > > >    	do {
> > > >    		for_each_engine(engine, gt, id)
> > > >    			mock_engine_flush(engine);
> > > > -	} while (intel_gt_retire_requests_timeout(gt, MAX_SCHEDULE_TIMEOUT));
> > > > +	} while (intel_gt_retire_requests_timeout(gt, MAX_SCHEDULE_TIMEOUT,
> > > > +						  NULL));
> > > >    }
> > > >    static void mock_device_release(struct drm_device *dev)
> > > > 
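
As a footnote on the wait_var_event() / wake_up_var() suggestion earlier in
the thread: a rough sketch of that variant (not part of this series, and
ignoring the interruptible/negative-timeout case) could look something like
the below.

#include <linux/wait_bit.h>	/* wait_var_event_timeout(), wake_up_var() */

/* Sketch only - not the code posted in this patch. */
static int guc_wait_for_idle_sketch(struct intel_guc *guc, long timeout)
{
	/* Sleep until the outstanding G2H count drops to zero or we time out. */
	timeout = wait_var_event_timeout(&guc->outstanding_submission_g2h,
					 !atomic_read(&guc->outstanding_submission_g2h),
					 timeout);

	return timeout ? 0 : -ETIME;
}

/* The waker side would then reduce to: */
static void decr_outstanding_submission_g2h_sketch(struct intel_guc *guc)
{
	if (atomic_dec_and_test(&guc->outstanding_submission_g2h))
		wake_up_var(&guc->outstanding_submission_g2h);
}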

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 39/97] drm/i915/guc: Increase size of CTB buffers
  2021-05-26  9:30       ` Tvrtko Ursulin
@ 2021-05-26 18:20         ` Matthew Brost
  0 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-26 18:20 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: jason.ekstrand, daniel.vetter, intel-gfx, dri-devel

On Wed, May 26, 2021 at 10:30:27AM +0100, Tvrtko Ursulin wrote:
> 
> On 25/05/2021 18:15, Matthew Brost wrote:
> > On Tue, May 25, 2021 at 10:24:09AM +0100, Tvrtko Ursulin wrote:
> > > 
> > > On 06/05/2021 20:13, Matthew Brost wrote:
> > > > With the introduction of non-blocking CTBs more than one CTB can be in
> > > > flight at a time. Increasing the size of the CTBs should reduce how
> > > > often software hits the case where no space is available in the CTB
> > > > buffer.
> > > 
> > > I'd move this before the patch which adds the non-blocking send since that
> > > one claims congestion should be rare with properly sized buffers. So it
> > > makes sense to have them sized properly back before that one.
> > > 
> > 
> > IMO patch ordering is a bit of a bikeshed. All these CTB changes required
> > for GuC submission (34-40, 54) will get posted as their own series and get
> > merged together. None of the individual patches breaks anything, nor is any
> > of this code really used until GuC submission is turned on. I can move
> > this when I post these patches by themselves but I just don't really see
> > the point either way.
> 
> As a general principle we do try to have work in the order which makes sense
> functionality wise.
> 
> That includes trying to avoid adding and then removing, or changing a lot,
> the same code within the series. And also adding functionality which is
> known to not work completely well until later in the series.
> 
> With a master switch at the end of series you can sometimes get away with
> it, but if nothing else it at least makes it much easier to read if things
> are flowing in the expected way within (the series).
> 
> In this particular example sizing the buffers appropriately before starting
> to use the facility a lot more certainly sounds like a no brainer to me,
> especially since the patch is so trivial to move conflict wise.
> 

Fair enough. I'll reorder these patches when I do a post to merge these
ones.

Matt

> Regards,
> 
> Tvrtko
> 
> > Matt
> > > Regards,
> > > 
> > > Tvrtko
> > > 
> > > > Cc: John Harrison <john.c.harrison@intel.com>
> > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > ---
> > > >    drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 11 ++++++++---
> > > >    1 file changed, 8 insertions(+), 3 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > > > index 77dfbc94dcc3..d6895d29ed2d 100644
> > > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > > > @@ -63,11 +63,16 @@ static inline struct drm_device *ct_to_drm(struct intel_guc_ct *ct)
> > > >     *      +--------+-----------------------------------------------+------+
> > > >     *
> > > >     * Size of each `CT Buffer`_ must be multiple of 4K.
> > > > - * As we don't expect too many messages, for now use minimum sizes.
> > > > + * We don't expect too many messages in flight at any time, unless we are
> > > > + * using GuC submission. In that case each request requires a minimum of
> > > > + * 16 bytes, which gives us a maximum of 256 queued requests. Hopefully this
> > > > + * is enough space to avoid backpressure on the driver. We increase the size
> > > > + * of the receive buffer (relative to the send) to ensure a G2H response
> > > > + * CTB has a landing spot.
> > > >     */
> > > >    #define CTB_DESC_SIZE		ALIGN(sizeof(struct guc_ct_buffer_desc), SZ_2K)
> > > >    #define CTB_H2G_BUFFER_SIZE	(SZ_4K)
> > > > -#define CTB_G2H_BUFFER_SIZE	(SZ_4K)
> > > > +#define CTB_G2H_BUFFER_SIZE	(4 * CTB_H2G_BUFFER_SIZE)
> > > >    #define MAX_US_STALL_CTB	1000000
> > > > @@ -753,7 +758,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
> > > >    	/* beware of buffer wrap case */
> > > >    	if (unlikely(available < 0))
> > > >    		available += size;
> > > > -	CT_DEBUG(ct, "available %d (%u:%u)\n", available, head, tail);
> > > > +	CT_DEBUG(ct, "available %d (%u:%u:%u)\n", available, head, tail, size);
> > > >    	GEM_BUG_ON(available < 0);
> > > >    	header = cmds[head];
> > > > 
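
For reference, the sizing in the new comment follows directly from the
defines in the hunk above:

	CTB_H2G_BUFFER_SIZE = SZ_4K                   -> 4096 / 16 bytes per request = 256 queued requests
	CTB_G2H_BUFFER_SIZE = 4 * CTB_H2G_BUFFER_SIZE -> 16K of receive space for G2H responses to land in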

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 60/97] drm/i915: Track 'serial' counts for virtual engines
  2021-05-26  8:40       ` Tvrtko Ursulin
@ 2021-05-26 18:45         ` John Harrison
  2021-05-27  8:53           ` Tvrtko Ursulin
  0 siblings, 1 reply; 249+ messages in thread
From: John Harrison @ 2021-05-26 18:45 UTC (permalink / raw)
  To: Tvrtko Ursulin, Matthew Brost
  Cc: jason.ekstrand, daniel.vetter, intel-gfx, dri-devel

On 5/26/2021 01:40, Tvrtko Ursulin wrote:
> On 25/05/2021 18:52, Matthew Brost wrote:
>> On Tue, May 25, 2021 at 11:16:12AM +0100, Tvrtko Ursulin wrote:
>>>
>>> On 06/05/2021 20:14, Matthew Brost wrote:
>>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>>
>>>> The serial number tracking of engines happens at the backend of
>>>> request submission and was expecting to only be given physical
>>>> engines. However, in GuC submission mode, the decomposition of virtual
>>>> to physical engines does not happen in i915. Instead, requests are
>>>> submitted to their virtual engine mask all the way through to the
>>>> hardware (i.e. to GuC). This would mean that the heart beat code
>>>> thinks the physical engines are idle due to the serial number not
>>>> incrementing.
>>>>
>>>> This patch updates the tracking to decompose virtual engines into
>>>> their physical constituents and tracks the request against each. This
>>>> is not entirely accurate as the GuC will only be issuing the request
>>>> to one physical engine. However, it is the best that i915 can do given
>>>> that it has no knowledge of the GuC's scheduling decisions.
>>>
>>> Commit text sounds a bit defeatist. I think instead of making up the 
>>> serial
>>> counts, which has downsides (could you please document in the commit 
>>> what
>>> they are), we should think how to design things properly.
>>>
>>
>> IMO, I don't think fixing serial counts is in the scope of this series. We
>> should focus on getting GuC submission in, not cleaning up all the crap
>> that is in the i915. Let's make a note of this though so we can revisit
>> later.
>
> I will say again - commit message implies it is introducing an 
> unspecified downside by not fully fixing an also unspecified issue. It 
> is completely reasonable, and customary even, to ask for both to be 
> documented in the commit message.
Not sure what exactly is 'unspecified'. I thought the commit message 
described both the problem (heartbeat not running when using virtual 
engines) and the result (heartbeat running on more engines than strictly 
necessary). But in greater detail...

The serial number tracking is a hack for the heartbeat code to know 
whether an engine is busy or idle, and therefore whether it should be 
pinged for aliveness. Whenever a submission is made to an engine, the 
serial number is incremented. The heartbeat code keeps a copy of the 
value. If the value has changed, the engine is busy and needs to be pinged.

This works fine for execlist mode where virtual engine decomposition is 
done inside i915. It fails miserably for GuC mode where the 
decomposition is done by the hardware. The reason being that the 
heartbeat code only looks at physical engines but the serial count is 
only incremented on the virtual engine. Thus, the heartbeat sees 
everything as idle and does not ping.

This patch decomposes the virtual engines for the sake of incrementing 
the serial count on each sub-engine in order to keep the heartbeat code 
happy. The downside is that now the heartbeat sees all sub-engines as 
busy rather than only the one the submission actually ends up on. There 
really isn't much that can be done about that. The heartbeat code is in 
i915 not GuC, the scheduler is in GuC not i915. The only way to improve 
it is to either move the heartbeat code into GuC as well and completely 
disable the i915 side, or add some way for i915 to interrogate GuC as to 
which engines are or are not active. Technically, we do have both. GuC 
has (or at least had) an option to force a context switch on every 
execution quantum pre-emption. However, that is much, much, more heavy 
weight than the heartbeat. For the latter, we do (almost) have the 
engine usage statistics for PMU and such like. I'm not sure how much 
effort it would be to wire that up to the heartbeat code instead of 
using the serial count.

In short, the serial count is ever so slightly inefficient in that it 
causes heartbeat pings on engines which are idle. On the other hand, it 
is way more efficient and simpler than the current alternatives.
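
(To make that concrete, the check boils down to roughly the hypothetical 
helper below - purely illustrative, the real heartbeat code is organised 
differently.)

/* Illustrative only - hypothetical helper, not the actual i915 code. */
static bool engine_needs_heartbeat(struct intel_engine_cs *engine,
				   unsigned int *last_serial)
{
	unsigned int serial = READ_ONCE(engine->serial);

	if (serial == *last_serial)
		return false;	/* no submissions since last check: idle */

	*last_serial = serial;	/* remember for the next heartbeat period */
	return true;		/* engine saw work: ping it */
}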

Does that answer the questions?

John.


>
> If we are abandoning the normal review process someone please say so I 
> don't waste my time reading it.
>
> Regards,
>
> Tvrtko
>
>> Matt
>>
>>> Regards,
>>>
>>> Tvrtko
>>>
>>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>>> ---
>>>>    drivers/gpu/drm/i915/gt/intel_engine_types.h     |  2 ++
>>>>    .../gpu/drm/i915/gt/intel_execlists_submission.c |  6 ++++++
>>>>    drivers/gpu/drm/i915/gt/intel_ring_submission.c  |  6 ++++++
>>>>    drivers/gpu/drm/i915/gt/mock_engine.c            |  6 ++++++
>>>>    .../gpu/drm/i915/gt/uc/intel_guc_submission.c    | 16 
>>>> ++++++++++++++++
>>>>    drivers/gpu/drm/i915/i915_request.c              |  4 +++-
>>>>    6 files changed, 39 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h 
>>>> b/drivers/gpu/drm/i915/gt/intel_engine_types.h
>>>> index 86302e6d86b2..e2b5cda6dbc4 100644
>>>> --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
>>>> +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
>>>> @@ -389,6 +389,8 @@ struct intel_engine_cs {
>>>>        void        (*park)(struct intel_engine_cs *engine);
>>>>        void        (*unpark)(struct intel_engine_cs *engine);
>>>> +    void        (*bump_serial)(struct intel_engine_cs *engine);
>>>> +
>>>>        void        (*set_default_submission)(struct intel_engine_cs 
>>>> *engine);
>>>>        const struct intel_context_ops *cops;
>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
>>>> b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
>>>> index ae12d7f19ecd..02880ea5d693 100644
>>>> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
>>>> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
>>>> @@ -3199,6 +3199,11 @@ static void execlists_release(struct 
>>>> intel_engine_cs *engine)
>>>>        lrc_fini_wa_ctx(engine);
>>>>    }
>>>> +static void execlist_bump_serial(struct intel_engine_cs *engine)
>>>> +{
>>>> +    engine->serial++;
>>>> +}
>>>> +
>>>>    static void
>>>>    logical_ring_default_vfuncs(struct intel_engine_cs *engine)
>>>>    {
>>>> @@ -3208,6 +3213,7 @@ logical_ring_default_vfuncs(struct 
>>>> intel_engine_cs *engine)
>>>>        engine->cops = &execlists_context_ops;
>>>>        engine->request_alloc = execlists_request_alloc;
>>>> +    engine->bump_serial = execlist_bump_serial;
>>>>        engine->reset.prepare = execlists_reset_prepare;
>>>>        engine->reset.rewind = execlists_reset_rewind;
>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c 
>>>> b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
>>>> index 14aa31879a37..39dd7c4ed0a9 100644
>>>> --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
>>>> +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
>>>> @@ -1045,6 +1045,11 @@ static void setup_irq(struct intel_engine_cs 
>>>> *engine)
>>>>        }
>>>>    }
>>>> +static void ring_bump_serial(struct intel_engine_cs *engine)
>>>> +{
>>>> +    engine->serial++;
>>>> +}
>>>> +
>>>>    static void setup_common(struct intel_engine_cs *engine)
>>>>    {
>>>>        struct drm_i915_private *i915 = engine->i915;
>>>> @@ -1064,6 +1069,7 @@ static void setup_common(struct 
>>>> intel_engine_cs *engine)
>>>>        engine->cops = &ring_context_ops;
>>>>        engine->request_alloc = ring_request_alloc;
>>>> +    engine->bump_serial = ring_bump_serial;
>>>>        /*
>>>>         * Using a global execution timeline; the previous final 
>>>> breadcrumb is
>>>> diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c 
>>>> b/drivers/gpu/drm/i915/gt/mock_engine.c
>>>> index bd005c1b6fd5..97b10fd60b55 100644
>>>> --- a/drivers/gpu/drm/i915/gt/mock_engine.c
>>>> +++ b/drivers/gpu/drm/i915/gt/mock_engine.c
>>>> @@ -292,6 +292,11 @@ static void mock_engine_release(struct 
>>>> intel_engine_cs *engine)
>>>>        intel_engine_fini_retire(engine);
>>>>    }
>>>> +static void mock_bump_serial(struct intel_engine_cs *engine)
>>>> +{
>>>> +    engine->serial++;
>>>> +}
>>>> +
>>>>    struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
>>>>                        const char *name,
>>>>                        int id)
>>>> @@ -318,6 +323,7 @@ struct intel_engine_cs *mock_engine(struct 
>>>> drm_i915_private *i915,
>>>>        engine->base.cops = &mock_context_ops;
>>>>        engine->base.request_alloc = mock_request_alloc;
>>>> +    engine->base.bump_serial = mock_bump_serial;
>>>>        engine->base.emit_flush = mock_emit_flush;
>>>>        engine->base.emit_fini_breadcrumb = mock_emit_breadcrumb;
>>>>        engine->base.submit_request = mock_submit_request;
>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
>>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>> index dc79d287c50a..f0e5731bcef6 100644
>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>> @@ -1500,6 +1500,20 @@ static void guc_release(struct 
>>>> intel_engine_cs *engine)
>>>>        lrc_fini_wa_ctx(engine);
>>>>    }
>>>> +static void guc_bump_serial(struct intel_engine_cs *engine)
>>>> +{
>>>> +    engine->serial++;
>>>> +}
>>>> +
>>>> +static void virtual_guc_bump_serial(struct intel_engine_cs *engine)
>>>> +{
>>>> +    struct intel_engine_cs *e;
>>>> +    intel_engine_mask_t tmp, mask = engine->mask;
>>>> +
>>>> +    for_each_engine_masked(e, engine->gt, mask, tmp)
>>>> +        e->serial++;
>>>> +}
>>>> +
>>>>    static void guc_default_vfuncs(struct intel_engine_cs *engine)
>>>>    {
>>>>        /* Default vfuncs which can be overridden by each engine. */
>>>> @@ -1508,6 +1522,7 @@ static void guc_default_vfuncs(struct 
>>>> intel_engine_cs *engine)
>>>>        engine->cops = &guc_context_ops;
>>>>        engine->request_alloc = guc_request_alloc;
>>>> +    engine->bump_serial = guc_bump_serial;
>>>>        engine->sched_engine->schedule = i915_schedule;
>>>> @@ -1843,6 +1858,7 @@ guc_create_virtual(struct intel_engine_cs 
>>>> **siblings, unsigned int count)
>>>>        ve->base.cops = &virtual_guc_context_ops;
>>>>        ve->base.request_alloc = guc_request_alloc;
>>>> +    ve->base.bump_serial = virtual_guc_bump_serial;
>>>>        ve->base.submit_request = guc_submit_request;
>>>> diff --git a/drivers/gpu/drm/i915/i915_request.c 
>>>> b/drivers/gpu/drm/i915/i915_request.c
>>>> index 9542a5baa45a..127d60b36422 100644
>>>> --- a/drivers/gpu/drm/i915/i915_request.c
>>>> +++ b/drivers/gpu/drm/i915/i915_request.c
>>>> @@ -692,7 +692,9 @@ bool __i915_request_submit(struct i915_request 
>>>> *request)
>>>>                         request->ring->vaddr + request->postfix);
>>>>        trace_i915_request_execute(request);
>>>> -    engine->serial++;
>>>> +    if (engine->bump_serial)
>>>> +        engine->bump_serial(engine);
>>>> +
>>>>        result = true;
>>>>        GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, 
>>>> &request->fence.flags));
>>>>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx


^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 31/97] drm/i915/guc: Early initialization of GuC send registers
  2021-05-06 19:13 ` [RFC PATCH 31/97] drm/i915/guc: Early initialization of GuC send registers Matthew Brost
@ 2021-05-26 20:28   ` Matthew Brost
  0 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-26 20:28 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: tvrtko.ursulin, daniele.ceraolospurio, jason.ekstrand,
	jon.bloomfield, daniel.vetter, john.c.harrison

On Thu, May 06, 2021 at 12:13:45PM -0700, Matthew Brost wrote:
> From: Michal Wajdeczko <michal.wajdeczko@intel.com>
> 
> Base offset and count of the GuC scratch registers, used for
> sending MMIO messages to GuC, can be initialized earlier with
> other GuC members that also depend on the platform.
> 
> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc.c | 18 +++++++++---------
>  1 file changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> index 454c8d886499..235c1997f32d 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> @@ -60,15 +60,8 @@ void intel_guc_init_send_regs(struct intel_guc *guc)
>  	enum forcewake_domains fw_domains = 0;
>  	unsigned int i;
>  
> -	if (INTEL_GEN(gt->i915) >= 11) {
> -		guc->send_regs.base =
> -				i915_mmio_reg_offset(GEN11_SOFT_SCRATCH(0));
> -		guc->send_regs.count = GEN11_SOFT_SCRATCH_COUNT;
> -	} else {
> -		guc->send_regs.base = i915_mmio_reg_offset(SOFT_SCRATCH(0));
> -		guc->send_regs.count = GUC_MAX_MMIO_MSG_LEN;
> -		BUILD_BUG_ON(GUC_MAX_MMIO_MSG_LEN > SOFT_SCRATCH_COUNT);
> -	}
> +	GEM_BUG_ON(!guc->send_regs.base);
> +	GEM_BUG_ON(!guc->send_regs.count);
>  
>  	for (i = 0; i < guc->send_regs.count; i++) {
>  		fw_domains |= intel_uncore_forcewake_for_reg(gt->uncore,
> @@ -181,11 +174,18 @@ void intel_guc_init_early(struct intel_guc *guc)
>  		guc->interrupts.reset = gen11_reset_guc_interrupts;
>  		guc->interrupts.enable = gen11_enable_guc_interrupts;
>  		guc->interrupts.disable = gen11_disable_guc_interrupts;
> +		guc->send_regs.base =
> +			i915_mmio_reg_offset(GEN11_SOFT_SCRATCH(0));
> +		guc->send_regs.count = GEN11_SOFT_SCRATCH_COUNT;
> +
>  	} else {
>  		guc->notify_reg = GUC_SEND_INTERRUPT;
>  		guc->interrupts.reset = gen9_reset_guc_interrupts;
>  		guc->interrupts.enable = gen9_enable_guc_interrupts;
>  		guc->interrupts.disable = gen9_disable_guc_interrupts;
> +		guc->send_regs.base = i915_mmio_reg_offset(SOFT_SCRATCH(0));
> +		guc->send_regs.count = GUC_MAX_MMIO_MSG_LEN;
> +		BUILD_BUG_ON(GUC_MAX_MMIO_MSG_LEN > SOFT_SCRATCH_COUNT);
>  	}
>  }
>  
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 34/97] drm/i915/guc: Use guc_class instead of engine_class in fw interface
  2021-05-06 19:13 ` [RFC PATCH 34/97] drm/i915/guc: Use guc_class instead of engine_class in fw interface Matthew Brost
@ 2021-05-26 20:41   ` Matthew Brost
  0 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-26 20:41 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: tvrtko.ursulin, daniele.ceraolospurio, jason.ekstrand,
	jon.bloomfield, daniel.vetter, john.c.harrison

On Thu, May 06, 2021 at 12:13:48PM -0700, Matthew Brost wrote:
> From: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> 
> GuC has its own defines for the engine classes. They're currently
> mapping 1:1 to the defines used by the driver, but there is no guarantee
> this will continue in the future. Given that we've been caught off-guard
> in the past by similar divergences, we can prepare for the changes by
> introducing helper functions to convert from engine class to GuC class and
> back again.
> 
> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Cc: John Harrison <John.C.Harrison@Intel.com>
> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> ---
>  drivers/gpu/drm/i915/gt/intel_engine_cs.c   |  6 +++--
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c  | 20 +++++++++-------
>  drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h | 26 +++++++++++++++++++++
>  3 files changed, 42 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> index c88b792c1ab5..7866ff0c2673 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> @@ -289,6 +289,7 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id)
>  	const struct engine_info *info = &intel_engines[id];
>  	struct drm_i915_private *i915 = gt->i915;
>  	struct intel_engine_cs *engine;
> +	u8 guc_class;
>  
>  	BUILD_BUG_ON(MAX_ENGINE_CLASS >= BIT(GEN11_ENGINE_CLASS_WIDTH));
>  	BUILD_BUG_ON(MAX_ENGINE_INSTANCE >= BIT(GEN11_ENGINE_INSTANCE_WIDTH));
> @@ -317,9 +318,10 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id)
>  	engine->i915 = i915;
>  	engine->gt = gt;
>  	engine->uncore = gt->uncore;
> -	engine->mmio_base = __engine_mmio_base(i915, info->mmio_bases);
>  	engine->hw_id = info->hw_id;
> -	engine->guc_id = MAKE_GUC_ID(info->class, info->instance);
> +	guc_class = engine_class_to_guc_class(info->class);
> +	engine->guc_id = MAKE_GUC_ID(guc_class, info->instance);
> +	engine->mmio_base = __engine_mmio_base(i915, info->mmio_bases);
>  
>  	engine->irq_handler = nop_irq_handler;
>  
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> index 775f00d706fa..ecd18531b40a 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> @@ -6,6 +6,7 @@
>  #include "gt/intel_gt.h"
>  #include "gt/intel_lrc.h"
>  #include "intel_guc_ads.h"
> +#include "intel_guc_fwif.h"
>  #include "intel_uc.h"
>  #include "i915_drv.h"
>  
> @@ -78,7 +79,7 @@ static void guc_mapping_table_init(struct intel_gt *gt,
>  				GUC_MAX_INSTANCES_PER_CLASS;
>  
>  	for_each_engine(engine, gt, id) {
> -		u8 guc_class = engine->class;
> +		u8 guc_class = engine_class_to_guc_class(engine->class);
>  
>  		system_info->mapping_table[guc_class][engine->instance] =
>  			engine->instance;
> @@ -98,7 +99,7 @@ static void __guc_ads_init(struct intel_guc *guc)
>  	struct __guc_ads_blob *blob = guc->ads_blob;
>  	const u32 skipped_size = LRC_PPHWSP_SZ * PAGE_SIZE + LR_HW_CONTEXT_SIZE;
>  	u32 base;
> -	u8 engine_class;
> +	u8 engine_class, guc_class;
>  
>  	/* GuC scheduling policies */
>  	guc_policies_init(&blob->policies);
> @@ -114,22 +115,25 @@ static void __guc_ads_init(struct intel_guc *guc)
>  	for (engine_class = 0; engine_class <= MAX_ENGINE_CLASS; ++engine_class) {
>  		if (engine_class == OTHER_CLASS)
>  			continue;
> +
> +		guc_class = engine_class_to_guc_class(engine_class);
> +
>  		/*
>  		 * TODO: Set context pointer to default state to allow
>  		 * GuC to re-init guilty contexts after internal reset.
>  		 */
> -		blob->ads.golden_context_lrca[engine_class] = 0;
> -		blob->ads.eng_state_size[engine_class] =
> +		blob->ads.golden_context_lrca[guc_class] = 0;
> +		blob->ads.eng_state_size[guc_class] =
>  			intel_engine_context_size(guc_to_gt(guc),
>  						  engine_class) -
>  			skipped_size;
>  	}
>  
>  	/* System info */
> -	blob->system_info.engine_enabled_masks[RENDER_CLASS] = 1;
> -	blob->system_info.engine_enabled_masks[COPY_ENGINE_CLASS] = 1;
> -	blob->system_info.engine_enabled_masks[VIDEO_DECODE_CLASS] = VDBOX_MASK(gt);
> -	blob->system_info.engine_enabled_masks[VIDEO_ENHANCEMENT_CLASS] = VEBOX_MASK(gt);
> +	blob->system_info.engine_enabled_masks[GUC_RENDER_CLASS] = 1;
> +	blob->system_info.engine_enabled_masks[GUC_BLITTER_CLASS] = 1;
> +	blob->system_info.engine_enabled_masks[GUC_VIDEO_CLASS] = VDBOX_MASK(gt);
> +	blob->system_info.engine_enabled_masks[GUC_VIDEOENHANCE_CLASS] = VEBOX_MASK(gt);
>  
>  	blob->system_info.generic_gt_sysinfo[GUC_GENERIC_GT_SYSINFO_SLICE_ENABLED] =
>  		hweight8(gt->info.sseu.slice_mask);
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
> index 301b173a26bc..558cfe168cb7 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
> @@ -15,6 +15,7 @@
>  #include "abi/guc_communication_mmio_abi.h"
>  #include "abi/guc_communication_ctb_abi.h"
>  #include "abi/guc_messages_abi.h"
> +#include "gt/intel_engine_types.h"
>  
>  #define GUC_CLIENT_PRIORITY_KMD_HIGH	0
>  #define GUC_CLIENT_PRIORITY_HIGH	1
> @@ -32,6 +33,12 @@
>  #define GUC_VIDEO_ENGINE2		4
>  #define GUC_MAX_ENGINES_NUM		(GUC_VIDEO_ENGINE2 + 1)
>  
> +#define GUC_RENDER_CLASS		0
> +#define GUC_VIDEO_CLASS			1
> +#define GUC_VIDEOENHANCE_CLASS		2
> +#define GUC_BLITTER_CLASS		3
> +#define GUC_RESERVED_CLASS		4
> +#define GUC_LAST_ENGINE_CLASS		GUC_RESERVED_CLASS
>  #define GUC_MAX_ENGINE_CLASSES		16
>  #define GUC_MAX_INSTANCES_PER_CLASS	32
>  
> @@ -129,6 +136,25 @@
>  #define GUC_ID_TO_ENGINE_INSTANCE(guc_id) \
>  	(((guc_id) & GUC_ENGINE_INSTANCE_MASK) >> GUC_ENGINE_INSTANCE_SHIFT)
>  
> +static inline u8 engine_class_to_guc_class(u8 class)
> +{
> +	BUILD_BUG_ON(GUC_RENDER_CLASS != RENDER_CLASS);
> +	BUILD_BUG_ON(GUC_BLITTER_CLASS != COPY_ENGINE_CLASS);
> +	BUILD_BUG_ON(GUC_VIDEO_CLASS != VIDEO_DECODE_CLASS);
> +	BUILD_BUG_ON(GUC_VIDEOENHANCE_CLASS != VIDEO_ENHANCEMENT_CLASS);
> +	GEM_BUG_ON(class > MAX_ENGINE_CLASS || class == OTHER_CLASS);
> +
> +	return class;
> +}
> +
> +static inline u8 guc_class_to_engine_class(u8 guc_class)
> +{
> +	GEM_BUG_ON(guc_class > GUC_LAST_ENGINE_CLASS);
> +	GEM_BUG_ON(guc_class == GUC_RESERVED_CLASS);
> +
> +	return guc_class;
> +}
> +
>  /* Work item for submitting workloads into work queue of GuC. */
>  struct guc_wq_item {
>  	u32 header;
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 53/97] drm/i915/guc: Disable semaphores when using GuC scheduling
  2021-05-26 18:15         ` Matthew Brost
@ 2021-05-27  8:41           ` Tvrtko Ursulin
  2021-05-27 14:38             ` Matthew Brost
  0 siblings, 1 reply; 249+ messages in thread
From: Tvrtko Ursulin @ 2021-05-27  8:41 UTC (permalink / raw)
  To: Matthew Brost; +Cc: jason.ekstrand, daniel.vetter, intel-gfx, dri-devel


On 26/05/2021 19:15, Matthew Brost wrote:
> On Wed, May 26, 2021 at 10:25:13AM +0100, Tvrtko Ursulin wrote:
>>
>> On 25/05/2021 18:01, Matthew Brost wrote:
>>> On Tue, May 25, 2021 at 10:52:01AM +0100, Tvrtko Ursulin wrote:
>>>>
>>>> On 06/05/2021 20:14, Matthew Brost wrote:
>>>>> Disable semaphores when using GuC scheduling as semaphores are broken in
>>>>> the current GuC firmware.
>>>>
>>>> What is "current"? Given that the patch itself is like a year and a half old.
>>>>
>>>
>>> Stale comment. Semaphores work with the firmware; we just haven't enabled
>>> them in the i915 with GuC submission, as this is an optimization and not
>>> required for functionality.
>>
>> How will the updated commit message look in terms of remaining reasons why
>> semaphores won't/can't be enabled?
>>
> 
> Semaphores are an optimization and not required for basic GuC submission
> to work properly. Disable until we have time to do the implementation to
> enable semaphores and tune them for performance.
> 
>> They were a nice performance win on some media workloads although granted a
>> lot of tweaking was required to find a good balance on when to use them and
>> when not to.
>>
> 
> The same tweaking would have to be done with GuC submission. Let's get
> basic submission working first, then tweak for performance.

I don't have fundamental complaints as long as the commit message is 
improved and it is understood the absence of semaphores is extremely 
likely to regress transcode performance tests. The latter probably doesn't 
matter (for some definition of it) unless either there will be a 
platform where both GuC and execlists can be supported, or there will be 
two separate platforms similar in hw performance in theory, both 
relevant to transcode customers, one using execlists and one using GuC.

Regards,

Tvrtko



^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 60/97] drm/i915: Track 'serial' counts for virtual engines
  2021-05-26 18:45         ` John Harrison
@ 2021-05-27  8:53           ` Tvrtko Ursulin
  2021-05-27 17:01             ` John Harrison
  0 siblings, 1 reply; 249+ messages in thread
From: Tvrtko Ursulin @ 2021-05-27  8:53 UTC (permalink / raw)
  To: John Harrison, Matthew Brost
  Cc: jason.ekstrand, daniel.vetter, intel-gfx, dri-devel


On 26/05/2021 19:45, John Harrison wrote:
> On 5/26/2021 01:40, Tvrtko Ursulin wrote:
>> On 25/05/2021 18:52, Matthew Brost wrote:
>>> On Tue, May 25, 2021 at 11:16:12AM +0100, Tvrtko Ursulin wrote:
>>>>
>>>> On 06/05/2021 20:14, Matthew Brost wrote:
>>>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>>>
>>>>> The serial number tracking of engines happens at the backend of
>>>>> request submission and was expecting to only be given physical
>>>>> engines. However, in GuC submission mode, the decomposition of virtual
>>>>> to physical engines does not happen in i915. Instead, requests are
>>>>> submitted to their virtual engine mask all the way through to the
>>>>> hardware (i.e. to GuC). This would mean that the heart beat code
>>>>> thinks the physical engines are idle due to the serial number not
>>>>> incrementing.
>>>>>
>>>>> This patch updates the tracking to decompose virtual engines into
>>>>> their physical constituents and tracks the request against each. This
>>>>> is not entirely accurate as the GuC will only be issuing the request
>>>>> to one physical engine. However, it is the best that i915 can do given
>>>>> that it has no knowledge of the GuC's scheduling decisions.
>>>>
>>>> Commit text sounds a bit defeatist. I think instead of making up the 
>>>> serial
>>>> counts, which has downsides (could you please document in the commit 
>>>> what
>>>> they are), we should think how to design things properly.
>>>>
>>>
>>> IMO, I don't think fixing serial counts is in the scope of this series. We
>>> should focus on getting GuC submission in, not cleaning up all the crap
>>> that is in the i915. Let's make a note of this though so we can revisit
>>> later.
>>
>> I will say again - commit message implies it is introducing an 
>> unspecified downside by not fully fixing an also unspecified issue. It 
>> is completely reasonable, and customary even, to ask for both to be 
>> documented in the commit message.
> Not sure what exactly is 'unspecified'. I thought the commit message 
> described both the problem (heartbeat not running when using virtual 
> engines) and the result (heartbeat running on more engines than strictly 
> necessary). But in greater detail...
> 
> The serial number tracking is a hack for the heartbeat code to know 
> whether an engine is busy or idle, and therefore whether it should be 
> pinged for aliveness. Whenever a submission is made to an engine, the 
> serial number is incremented. The heartbeat code keeps a copy of the 
> value. If the value has changed, the engine is busy and needs to be pinged.
> 
> This works fine for execlist mode where virtual engine decomposition is 
> done inside i915. It fails miserably for GuC mode where the 
> decomposition is done by the hardware. The reason being that the 
> heartbeat code only looks at physical engines but the serial count is 
> only incremented on the virtual engine. Thus, the heartbeat sees 
> everything as idle and does not ping.

So hangcheck does not work. Or it works because GuC does it anyway. 
Either way, that's one thing to explicitly state in the commit message.

> This patch decomposes the virtual engines for the sake of incrementing 
> the serial count on each sub-engine in order to keep the heartbeat code 
> happy. The downside is that now the heartbeat sees all sub-engines as 
> busy rather than only the one the submission actually ends up on. There 
> really isn't much that can be done about that. The heartbeat code is in 
> i915 not GuC, the scheduler is in GuC not i915. The only way to improve 
> it is to either move the heartbeat code into GuC as well and completely 
> disable the i915 side, or add some way for i915 to interrogate GuC as to 
> which engines are or are not active. Technically, we do have both. GuC 
> has (or at least had) an option to force a context switch on every 
> execution quantum pre-emption. However, that is much, much, more heavy 
> weight than the heartbeat. For the latter, we do (almost) have the 
> engine usage statistics for PMU and such like. I'm not sure how much 
> effort it would be to wire that up to the heartbeat code instead of 
> using the serial count.
> 
> In short, the serial count is ever so slightly inefficient in that it 
> causes heartbeat pings on engines which are idle. On the other hand, it 
> is way more efficient and simpler than the current alternatives.

And the hack to make hangcheck work creates this inefficiency where 
heartbeats are sent to idle engines. Which is probably fine; it just 
needs to be explained.

> Does that answer the questions?

With the two points I re-raise clearly explained, possibly even the patch 
title changed, yeah. I just want it to be more easily obvious to the patch 
reader what it is functionally about - not just what implementation 
details have been changed but why as well.

Regards,

Tvrtko

> John.
> 
> 
>>
>> If we are abandoning the normal review process someone please say so I 
>> don't waste my time reading it.
>>
>> Regards,
>>
>> Tvrtko
>>
>>> Matt
>>>
>>>> Regards,
>>>>
>>>> Tvrtko
>>>>
>>>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>>>> ---
>>>>>    drivers/gpu/drm/i915/gt/intel_engine_types.h     |  2 ++
>>>>>    .../gpu/drm/i915/gt/intel_execlists_submission.c |  6 ++++++
>>>>>    drivers/gpu/drm/i915/gt/intel_ring_submission.c  |  6 ++++++
>>>>>    drivers/gpu/drm/i915/gt/mock_engine.c            |  6 ++++++
>>>>>    .../gpu/drm/i915/gt/uc/intel_guc_submission.c    | 16 
>>>>> ++++++++++++++++
>>>>>    drivers/gpu/drm/i915/i915_request.c              |  4 +++-
>>>>>    6 files changed, 39 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h 
>>>>> b/drivers/gpu/drm/i915/gt/intel_engine_types.h
>>>>> index 86302e6d86b2..e2b5cda6dbc4 100644
>>>>> --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
>>>>> @@ -389,6 +389,8 @@ struct intel_engine_cs {
>>>>>        void        (*park)(struct intel_engine_cs *engine);
>>>>>        void        (*unpark)(struct intel_engine_cs *engine);
>>>>> +    void        (*bump_serial)(struct intel_engine_cs *engine);
>>>>> +
>>>>>        void        (*set_default_submission)(struct intel_engine_cs 
>>>>> *engine);
>>>>>        const struct intel_context_ops *cops;
>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
>>>>> b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
>>>>> index ae12d7f19ecd..02880ea5d693 100644
>>>>> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
>>>>> @@ -3199,6 +3199,11 @@ static void execlists_release(struct 
>>>>> intel_engine_cs *engine)
>>>>>        lrc_fini_wa_ctx(engine);
>>>>>    }
>>>>> +static void execlist_bump_serial(struct intel_engine_cs *engine)
>>>>> +{
>>>>> +    engine->serial++;
>>>>> +}
>>>>> +
>>>>>    static void
>>>>>    logical_ring_default_vfuncs(struct intel_engine_cs *engine)
>>>>>    {
>>>>> @@ -3208,6 +3213,7 @@ logical_ring_default_vfuncs(struct 
>>>>> intel_engine_cs *engine)
>>>>>        engine->cops = &execlists_context_ops;
>>>>>        engine->request_alloc = execlists_request_alloc;
>>>>> +    engine->bump_serial = execlist_bump_serial;
>>>>>        engine->reset.prepare = execlists_reset_prepare;
>>>>>        engine->reset.rewind = execlists_reset_rewind;
>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c 
>>>>> b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
>>>>> index 14aa31879a37..39dd7c4ed0a9 100644
>>>>> --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
>>>>> @@ -1045,6 +1045,11 @@ static void setup_irq(struct intel_engine_cs 
>>>>> *engine)
>>>>>        }
>>>>>    }
>>>>> +static void ring_bump_serial(struct intel_engine_cs *engine)
>>>>> +{
>>>>> +    engine->serial++;
>>>>> +}
>>>>> +
>>>>>    static void setup_common(struct intel_engine_cs *engine)
>>>>>    {
>>>>>        struct drm_i915_private *i915 = engine->i915;
>>>>> @@ -1064,6 +1069,7 @@ static void setup_common(struct 
>>>>> intel_engine_cs *engine)
>>>>>        engine->cops = &ring_context_ops;
>>>>>        engine->request_alloc = ring_request_alloc;
>>>>> +    engine->bump_serial = ring_bump_serial;
>>>>>        /*
>>>>>         * Using a global execution timeline; the previous final 
>>>>> breadcrumb is
>>>>> diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c 
>>>>> b/drivers/gpu/drm/i915/gt/mock_engine.c
>>>>> index bd005c1b6fd5..97b10fd60b55 100644
>>>>> --- a/drivers/gpu/drm/i915/gt/mock_engine.c
>>>>> +++ b/drivers/gpu/drm/i915/gt/mock_engine.c
>>>>> @@ -292,6 +292,11 @@ static void mock_engine_release(struct 
>>>>> intel_engine_cs *engine)
>>>>>        intel_engine_fini_retire(engine);
>>>>>    }
>>>>> +static void mock_bump_serial(struct intel_engine_cs *engine)
>>>>> +{
>>>>> +    engine->serial++;
>>>>> +}
>>>>> +
>>>>>    struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
>>>>>                        const char *name,
>>>>>                        int id)
>>>>> @@ -318,6 +323,7 @@ struct intel_engine_cs *mock_engine(struct 
>>>>> drm_i915_private *i915,
>>>>>        engine->base.cops = &mock_context_ops;
>>>>>        engine->base.request_alloc = mock_request_alloc;
>>>>> +    engine->base.bump_serial = mock_bump_serial;
>>>>>        engine->base.emit_flush = mock_emit_flush;
>>>>>        engine->base.emit_fini_breadcrumb = mock_emit_breadcrumb;
>>>>>        engine->base.submit_request = mock_submit_request;
>>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
>>>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>>> index dc79d287c50a..f0e5731bcef6 100644
>>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>>> @@ -1500,6 +1500,20 @@ static void guc_release(struct 
>>>>> intel_engine_cs *engine)
>>>>>        lrc_fini_wa_ctx(engine);
>>>>>    }
>>>>> +static void guc_bump_serial(struct intel_engine_cs *engine)
>>>>> +{
>>>>> +    engine->serial++;
>>>>> +}
>>>>> +
>>>>> +static void virtual_guc_bump_serial(struct intel_engine_cs *engine)
>>>>> +{
>>>>> +    struct intel_engine_cs *e;
>>>>> +    intel_engine_mask_t tmp, mask = engine->mask;
>>>>> +
>>>>> +    for_each_engine_masked(e, engine->gt, mask, tmp)
>>>>> +        e->serial++;
>>>>> +}
>>>>> +
>>>>>    static void guc_default_vfuncs(struct intel_engine_cs *engine)
>>>>>    {
>>>>>        /* Default vfuncs which can be overridden by each engine. */
>>>>> @@ -1508,6 +1522,7 @@ static void guc_default_vfuncs(struct 
>>>>> intel_engine_cs *engine)
>>>>>        engine->cops = &guc_context_ops;
>>>>>        engine->request_alloc = guc_request_alloc;
>>>>> +    engine->bump_serial = guc_bump_serial;
>>>>>        engine->sched_engine->schedule = i915_schedule;
>>>>> @@ -1843,6 +1858,7 @@ guc_create_virtual(struct intel_engine_cs 
>>>>> **siblings, unsigned int count)
>>>>>        ve->base.cops = &virtual_guc_context_ops;
>>>>>        ve->base.request_alloc = guc_request_alloc;
>>>>> +    ve->base.bump_serial = virtual_guc_bump_serial;
>>>>>        ve->base.submit_request = guc_submit_request;
>>>>> diff --git a/drivers/gpu/drm/i915/i915_request.c 
>>>>> b/drivers/gpu/drm/i915/i915_request.c
>>>>> index 9542a5baa45a..127d60b36422 100644
>>>>> --- a/drivers/gpu/drm/i915/i915_request.c
>>>>> +++ b/drivers/gpu/drm/i915/i915_request.c
>>>>> @@ -692,7 +692,9 @@ bool __i915_request_submit(struct i915_request 
>>>>> *request)
>>>>>                         request->ring->vaddr + request->postfix);
>>>>>        trace_i915_request_execute(request);
>>>>> -    engine->serial++;
>>>>> +    if (engine->bump_serial)
>>>>> +        engine->bump_serial(engine);
>>>>> +
>>>>>        result = true;
>>>>>        GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, 
>>>>> &request->fence.flags));
>>>>>
>> _______________________________________________
>> Intel-gfx mailing list
>> Intel-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
> 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 55/97] drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC
  2021-05-26 18:18         ` Matthew Brost
@ 2021-05-27  9:02           ` Tvrtko Ursulin
  2021-05-27 14:37             ` Matthew Brost
  0 siblings, 1 reply; 249+ messages in thread
From: Tvrtko Ursulin @ 2021-05-27  9:02 UTC (permalink / raw)
  To: Matthew Brost; +Cc: jason.ekstrand, daniel.vetter, intel-gfx, dri-devel


On 26/05/2021 19:18, Matthew Brost wrote:
> On Wed, May 26, 2021 at 10:21:05AM +0100, Tvrtko Ursulin wrote:
>>
>> On 25/05/2021 18:07, Matthew Brost wrote:
>>> On Tue, May 25, 2021 at 11:06:00AM +0100, Tvrtko Ursulin wrote:
>>>>
>>>> On 06/05/2021 20:14, Matthew Brost wrote:
>>>>> When running the GuC the GPU can't be considered idle if the GuC still
>>>>> has contexts pinned. As such, a call has been added in
>>>>> intel_gt_wait_for_idle to idle the UC and in turn the GuC by waiting for
>>>>> the number of unpinned contexts to go to zero.
>>>>>
>>>>> Cc: John Harrison <john.c.harrison@intel.com>
>>>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>>>> ---
>>>>>     drivers/gpu/drm/i915/gem/i915_gem_mman.c      |  3 +-
>>>>>     drivers/gpu/drm/i915/gt/intel_gt.c            | 18 ++++
>>>>>     drivers/gpu/drm/i915/gt/intel_gt.h            |  2 +
>>>>>     drivers/gpu/drm/i915/gt/intel_gt_requests.c   | 22 ++---
>>>>>     drivers/gpu/drm/i915/gt/intel_gt_requests.h   |  7 +-
>>>>>     drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  4 +
>>>>>     drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |  1 +
>>>>>     drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h     |  4 +
>>>>>     .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 91 ++++++++++++++++++-
>>>>>     drivers/gpu/drm/i915/gt/uc/intel_uc.h         |  5 +
>>>>>     drivers/gpu/drm/i915/i915_debugfs.c           |  1 +
>>>>>     drivers/gpu/drm/i915/i915_gem_evict.c         |  1 +
>>>>>     .../gpu/drm/i915/selftests/igt_live_test.c    |  2 +-
>>>>>     .../gpu/drm/i915/selftests/mock_gem_device.c  |  3 +-
>>>>>     14 files changed, 137 insertions(+), 27 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>>>>> index 8598a1c78a4c..2f5295c9408d 100644
>>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>>>>> @@ -634,7 +634,8 @@ mmap_offset_attach(struct drm_i915_gem_object *obj,
>>>>>     		goto insert;
>>>>>     	/* Attempt to reap some mmap space from dead objects */
>>>>> -	err = intel_gt_retire_requests_timeout(&i915->gt, MAX_SCHEDULE_TIMEOUT);
>>>>> +	err = intel_gt_retire_requests_timeout(&i915->gt, MAX_SCHEDULE_TIMEOUT,
>>>>> +					       NULL);
>>>>>     	if (err)
>>>>>     		goto err;
>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
>>>>> index 8d77dcbad059..1742a8561f69 100644
>>>>> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
>>>>> @@ -574,6 +574,24 @@ static void __intel_gt_disable(struct intel_gt *gt)
>>>>>     	GEM_BUG_ON(intel_gt_pm_is_awake(gt));
>>>>>     }
>>>>> +int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout)
>>>>> +{
>>>>> +	long rtimeout;
>>>>> +
>>>>> +	/* If the device is asleep, we have no requests outstanding */
>>>>> +	if (!intel_gt_pm_is_awake(gt))
>>>>> +		return 0;
>>>>> +
>>>>> +	while ((timeout = intel_gt_retire_requests_timeout(gt, timeout,
>>>>> +							   &rtimeout)) > 0) {
>>>>> +		cond_resched();
>>>>> +		if (signal_pending(current))
>>>>> +			return -EINTR;
>>>>> +	}
>>>>> +
>>>>> +	return timeout ? timeout : intel_uc_wait_for_idle(&gt->uc, rtimeout);
>>>>> +}
>>>>> +
>>>>>     int intel_gt_init(struct intel_gt *gt)
>>>>>     {
>>>>>     	int err;
>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h
>>>>> index 7ec395cace69..c775043334bf 100644
>>>>> --- a/drivers/gpu/drm/i915/gt/intel_gt.h
>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_gt.h
>>>>> @@ -48,6 +48,8 @@ void intel_gt_driver_release(struct intel_gt *gt);
>>>>>     void intel_gt_driver_late_release(struct intel_gt *gt);
>>>>> +int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout);
>>>>> +
>>>>>     void intel_gt_check_and_clear_faults(struct intel_gt *gt);
>>>>>     void intel_gt_clear_error_registers(struct intel_gt *gt,
>>>>>     				    intel_engine_mask_t engine_mask);
>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
>>>>> index 647eca9d867a..c6c702f236fa 100644
>>>>> --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
>>>>> @@ -13,6 +13,7 @@
>>>>>     #include "intel_gt_pm.h"
>>>>>     #include "intel_gt_requests.h"
>>>>>     #include "intel_timeline.h"
>>>>> +#include "uc/intel_uc.h"
>>>>>     static bool retire_requests(struct intel_timeline *tl)
>>>>>     {
>>>>> @@ -130,7 +131,8 @@ void intel_engine_fini_retire(struct intel_engine_cs *engine)
>>>>>     	GEM_BUG_ON(engine->retire);
>>>>>     }
>>>>> -long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout)
>>>>> +long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
>>>>> +				      long *rtimeout)
>>>>
>>>> What is 'rtimeout', I know remaining, but it can be more self-descriptive to
>>>> start with.
>>>>
>>>
>>> 'remaining_timeout' it is.
>>>
>>>> It feels a bit churny for what it is. How plausible would be alternatives to
>>>> either change existing timeout to in/out, or measure sleep internally in
>>>> this function, or just risk sleeping twice as long by passing the original
>>>> timeout to uc idle as well?
>>>>
>>>
>>> Originally had it just passing in the same value, got review feedback
>>> saying I should pass in the adjusted value. Hard to make everyone happy.
>>
>> Ok.
>>
>>>>>     {
>>>>>     	struct intel_gt_timelines *timelines = &gt->timelines;
>>>>>     	struct intel_timeline *tl, *tn;
>>>>> @@ -195,22 +197,10 @@ out_active:	spin_lock(&timelines->lock);
>>>>>     	if (flush_submission(gt, timeout)) /* Wait, there's more! */
>>>>>     		active_count++;
>>>>> -	return active_count ? timeout : 0;
>>>>> -}
>>>>> -
>>>>> -int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout)
>>>>> -{
>>>>> -	/* If the device is asleep, we have no requests outstanding */
>>>>> -	if (!intel_gt_pm_is_awake(gt))
>>>>> -		return 0;
>>>>> -
>>>>> -	while ((timeout = intel_gt_retire_requests_timeout(gt, timeout)) > 0) {
>>>>> -		cond_resched();
>>>>> -		if (signal_pending(current))
>>>>> -			return -EINTR;
>>>>> -	}
>>>>> +	if (rtimeout)
>>>>> +		*rtimeout = timeout;
>>>>> -	return timeout;
>>>>> +	return active_count ? timeout : 0;
>>>>>     }
>>>>>     static void retire_work_handler(struct work_struct *work)
>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.h b/drivers/gpu/drm/i915/gt/intel_gt_requests.h
>>>>> index fcc30a6e4fe9..4419787124e2 100644
>>>>> --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.h
>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.h
>>>>> @@ -10,10 +10,11 @@ struct intel_engine_cs;
>>>>>     struct intel_gt;
>>>>>     struct intel_timeline;
>>>>> -long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout);
>>>>> +long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
>>>>> +				      long *rtimeout);
>>>>>     static inline void intel_gt_retire_requests(struct intel_gt *gt)
>>>>>     {
>>>>> -	intel_gt_retire_requests_timeout(gt, 0);
>>>>> +	intel_gt_retire_requests_timeout(gt, 0, NULL);
>>>>>     }
>>>>>     void intel_engine_init_retire(struct intel_engine_cs *engine);
>>>>> @@ -21,8 +22,6 @@ void intel_engine_add_retire(struct intel_engine_cs *engine,
>>>>>     			     struct intel_timeline *tl);
>>>>>     void intel_engine_fini_retire(struct intel_engine_cs *engine);
>>>>> -int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout);
>>>>> -
>>>>>     void intel_gt_init_requests(struct intel_gt *gt);
>>>>>     void intel_gt_park_requests(struct intel_gt *gt);
>>>>>     void intel_gt_unpark_requests(struct intel_gt *gt);
>>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>>>> index 485e98f3f304..47eaa69809e8 100644
>>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>>>> @@ -38,6 +38,8 @@ struct intel_guc {
>>>>>     	spinlock_t irq_lock;
>>>>>     	unsigned int msg_enabled_mask;
>>>>> +	atomic_t outstanding_submission_g2h;
>>>>> +
>>>>>     	struct {
>>>>>     		bool enabled;
>>>>>     		void (*reset)(struct intel_guc *guc);
>>>>> @@ -239,6 +241,8 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask)
>>>>>     	spin_unlock_irq(&guc->irq_lock);
>>>>>     }
>>>>> +int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout);
>>>>> +
>>>>>     int intel_guc_reset_engine(struct intel_guc *guc,
>>>>>     			   struct intel_engine_cs *engine);
>>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>>>> index f1893030ca88..cf701056fa14 100644
>>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>>>> @@ -111,6 +111,7 @@ void intel_guc_ct_init_early(struct intel_guc_ct *ct)
>>>>>     	INIT_LIST_HEAD(&ct->requests.incoming);
>>>>>     	INIT_WORK(&ct->requests.worker, ct_incoming_request_worker_func);
>>>>>     	tasklet_init(&ct->receive_tasklet, ct_receive_tasklet_func, (unsigned long)ct);
>>>>> +	init_waitqueue_head(&ct->wq);
>>>>>     }
>>>>>     static inline const char *guc_ct_buffer_type_to_str(u32 type)
>>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>>>> index 660bf37238e2..ab1b79ab960b 100644
>>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>>>> @@ -10,6 +10,7 @@
>>>>>     #include <linux/spinlock.h>
>>>>>     #include <linux/workqueue.h>
>>>>>     #include <linux/ktime.h>
>>>>> +#include <linux/wait.h>
>>>>>     #include "intel_guc_fwif.h"
>>>>> @@ -68,6 +69,9 @@ struct intel_guc_ct {
>>>>>     	struct tasklet_struct receive_tasklet;
>>>>> +	/** @wq: wait queue for g2h channel */
>>>>> +	wait_queue_head_t wq;
>>>>> +
>>>>>     	struct {
>>>>>     		u16 last_fence; /* last fence used to send request */
>>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>>> index ae0b386467e3..0ff7dd6d337d 100644
>>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>>> @@ -253,6 +253,74 @@ static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
>>>>>     	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
>>>>>     }
>>>>> +static int guc_submission_busy_loop(struct intel_guc* guc,
>>>>> +				    const u32 *action,
>>>>> +				    u32 len,
>>>>> +				    u32 g2h_len_dw,
>>>>> +				    bool loop)
>>>>> +{
>>>>> +	int err;
>>>>> +
>>>>> +	err = intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop);
>>>>> +
>>>>> +	if (!err && g2h_len_dw)
>>>>> +		atomic_inc(&guc->outstanding_submission_g2h);
>>>>> +
>>>>> +	return err;
>>>>> +}
>>>>> +
>>>>> +static int guc_wait_for_pending_msg(struct intel_guc *guc,
>>>>> +				    atomic_t *wait_var,
>>>>> +				    bool interruptible,
>>>>> +				    long timeout)
>>>>> +{
>>>>> +	const int state = interruptible ?
>>>>> +		TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
>>>>> +	DEFINE_WAIT(wait);
>>>>> +
>>>>> +	might_sleep();
>>>>> +	GEM_BUG_ON(timeout < 0);
>>>>> +
>>>>> +	if (!atomic_read(wait_var))
>>>>> +		return 0;
>>>>> +
>>>>> +	if (!timeout)
>>>>> +		return -ETIME;
>>>>> +
>>>>> +	for (;;) {
>>>>> +		prepare_to_wait(&guc->ct.wq, &wait, state);
>>>>> +
>>>>> +		if (!atomic_read(wait_var))
>>>>> +			break;
>>>>> +
>>>>> +		if (signal_pending_state(state, current)) {
>>>>> +			timeout = -ERESTARTSYS;
>>>>> +			break;
>>>>> +		}
>>>>> +
>>>>> +		if (!timeout) {
>>>>> +			timeout = -ETIME;
>>>>> +			break;
>>>>> +		}
>>>>> +
>>>>> +		timeout = io_schedule_timeout(timeout);
>>>>> +	}
>>>>> +	finish_wait(&guc->ct.wq, &wait);
>>>>> +
>>>>> +	return (timeout < 0) ? timeout : 0;
>>>>> +}
>>>>
>>>> See if it is possible to simplify all this with wait_var_event and
>>>> wake_up_var.
>>>>
>>>
>>> Let me check on that.
>>>>> +
>>>>> +int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout)
>>>>> +{
>>>>> +	bool interruptible = true;
>>>>> +
>>>>> +	if (unlikely(timeout < 0))
>>>>> +		timeout = -timeout, interruptible = false;
>>>>> +
>>>>> +	return guc_wait_for_pending_msg(guc, &guc->outstanding_submission_g2h,
>>>>> +					interruptible, timeout);
>>>>> +}
>>>>> +
>>>>>     static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
>>>>>     {
>>>>>     	int err;
>>>>> @@ -279,6 +347,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
>>>>>     	err = intel_guc_send_nb(guc, action, len, g2h_len_dw);
>>>>>     	if (!enabled && !err) {
>>>>> +		atomic_inc(&guc->outstanding_submission_g2h);
>>>>>     		set_context_enabled(ce);
>>>>>     	} else if (!enabled) {
>>>>>     		clr_context_pending_enable(ce);
>>>>> @@ -734,7 +803,7 @@ static int __guc_action_register_context(struct intel_guc *guc,
>>>>>     		offset,
>>>>>     	};
>>>>> -	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
>>>>> +	return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
>>>>>     }
>>>>>     static int register_context(struct intel_context *ce)
>>>>> @@ -754,7 +823,7 @@ static int __guc_action_deregister_context(struct intel_guc *guc,
>>>>>     		guc_id,
>>>>>     	};
>>>>> -	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action),
>>>>> +	return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),
>>>>>     					G2H_LEN_DW_DEREGISTER_CONTEXT, true);
>>>>>     }
>>>>> @@ -871,7 +940,9 @@ static int guc_context_pin(struct intel_context *ce, void *vaddr)
>>>>>     static void guc_context_unpin(struct intel_context *ce)
>>>>>     {
>>>>> -	unpin_guc_id(ce_to_guc(ce), ce);
>>>>> +	struct intel_guc *guc = ce_to_guc(ce);
>>>>> +
>>>>> +	unpin_guc_id(guc, ce);
>>>>>     	lrc_unpin(ce);
>>>>>     }
>>>>> @@ -894,7 +965,7 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
>>>>>     	intel_context_get(ce);
>>>>> -	intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action),
>>>>> +	guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),
>>>>>     				 G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
>>>>>     }
>>>>> @@ -1437,6 +1508,15 @@ g2h_context_lookup(struct intel_guc *guc, u32 desc_idx)
>>>>>     	return ce;
>>>>>     }
>>>>> +static void decr_outstanding_submission_g2h(struct intel_guc *guc)
>>>>> +{
>>>>> +	if (atomic_dec_and_test(&guc->outstanding_submission_g2h)) {
>>>>> +		smp_mb();
>>>>> +		if (waitqueue_active(&guc->ct.wq))
>>>>> +			wake_up_all(&guc->ct.wq);
>>>>
>>>> I keep pointing out this pattern is racy and at least needs comment why it
>>>> is safe.
>>>>
>>>
>>> There is a comment in the wait queue code header saying why this is safe. I
>>> don't think we need to repeat this here.
>>
>> Yeah, _describing how to make it safe_, after it starts with:
>>
>>   * NOTE: this function is lockless and requires care, incorrect usage _will_
>>   * lead to sporadic and non-obvious failure.
>>
>> Then it also says:
>>
>>   * Also note that this 'optimization' trades a spin_lock() for an smp_mb(),
>>   * which (when the lock is uncontended) are of roughly equal cost.
>>
>> I question the need to optimize this path since it means the reader has to figure out if it is safe, while a simple wake_up_all after atomic_dec_and_test would have done it.
>>
>> Is the case of no waiters a predominant one? It at least deserves a comment explaining why the optimisation is important.
>>
> 
> I just didn't want to add a spin_lock if there is a known working code 
> path without one and our code fits into that path. I can add a comment 
> but I don't really think it's necessary.

The lock already exists inside wake_up_all, it is not about adding your own.

As premature optimisations are usually best avoided, it is simply about 
how you justify a):

+static void decr_outstanding_submission_g2h(struct intel_guc *guc)
+{
+	if (atomic_dec_and_test(&guc->outstanding_submission_g2h)) {
+		smp_mb();
+		if (waitqueue_active(&guc->ct.wq))
+			wake_up_all(&guc->ct.wq);
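
(To spell out what a) signs every reader up for: the waker must publish 
the condition and provide the barrier before peeking at the queue, and 
the waiter must prepare_to_wait() before re-checking the condition. A 
minimal, untested sketch of the protocol, with illustrative cond/wq 
names standing in for the real ones:

static DECLARE_WAIT_QUEUE_HEAD(wq);
static bool cond;

static void waker(void)
{
	WRITE_ONCE(cond, true);		/* 1. publish the condition */
	smp_mb();			/* 2. order it against the queue check */
	if (waitqueue_active(&wq))	/* 3. only now is peeking at the queue safe */
		wake_up(&wq);
}

static void waiter(void)
{
	DEFINE_WAIT(wait);

	for (;;) {
		prepare_to_wait(&wq, &wait, TASK_UNINTERRUPTIBLE);
		if (READ_ONCE(cond))	/* ordered by prepare_to_wait() */
			break;
		schedule();
	}
	finish_wait(&wq, &wait);
}

Every waitqueue_active() caller has to be re-audited against that pairing.)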

When the easy alternative (easy to read, easy to review, easy to 
maintain) is b):

+static void decr_outstanding_submission_g2h(struct intel_guc *guc)
+{
+	if (atomic_dec_and_test(&guc->outstanding_submission_g2h))
+		wake_up_all(&guc->ct.wq);

For me as an external reader the question seems to be, I will say it 
again: is the case of no waiters a common one, and is this a hot path 
that justifies avoiding a function call at the cost of the mental 
complexity explained in the waitqueue_active comment? Here and in the 
other places in the GuC backend where waitqueue_active is used.
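
FWIW, if the wait_var_event/wake_up_var route suggested earlier works 
out, I would expect the whole waker/waiter pair to shrink to roughly the 
below - an untested sketch, ignoring the interruptible/negative timeout 
handling and needing linux/wait_bit.h:

static void decr_outstanding_submission_g2h(struct intel_guc *guc)
{
	if (atomic_dec_and_test(&guc->outstanding_submission_g2h))
		wake_up_var(&guc->outstanding_submission_g2h);
}

int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout)
{
	/* wait_var_event_timeout() returns remaining jiffies, 0 on timeout */
	return wait_var_event_timeout(&guc->outstanding_submission_g2h,
				      !atomic_read(&guc->outstanding_submission_g2h),
				      timeout) ? 0 : -ETIME;
}

No open coded wait loop, no extra waitqueue in intel_guc_ct, and nothing 
for the next reader to double-check against the memory-ordering rules.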

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 36/97] drm/i915/guc: Add non blocking CTB send function
  2021-05-26 18:10         ` Matthew Brost
@ 2021-05-27 10:02           ` Tvrtko Ursulin
  2021-05-27 14:35             ` Matthew Brost
  0 siblings, 1 reply; 249+ messages in thread
From: Tvrtko Ursulin @ 2021-05-27 10:02 UTC (permalink / raw)
  To: Matthew Brost; +Cc: jason.ekstrand, daniel.vetter, intel-gfx, dri-devel


On 26/05/2021 19:10, Matthew Brost wrote:

[snip]

>>>>> +static int ct_send_nb(struct intel_guc_ct *ct,
>>>>> +		      const u32 *action,
>>>>> +		      u32 len,
>>>>> +		      u32 flags)
>>>>> +{
>>>>> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>>>>> +	unsigned long spin_flags;
>>>>> +	u32 fence;
>>>>> +	int ret;
>>>>> +
>>>>> +	spin_lock_irqsave(&ctb->lock, spin_flags);
>>>>> +
>>>>> +	ret = ctb_has_room(ctb, len + 1);
>>>>> +	if (unlikely(ret))
>>>>> +		goto out;
>>>>> +
>>>>> +	fence = ct_get_next_fence(ct);
>>>>> +	ret = ct_write(ct, action, len, fence, flags);
>>>>> +	if (unlikely(ret))
>>>>> +		goto out;
>>>>> +
>>>>> +	intel_guc_notify(ct_to_guc(ct));
>>>>> +
>>>>> +out:
>>>>> +	spin_unlock_irqrestore(&ctb->lock, spin_flags);
>>>>> +
>>>>> +	return ret;
>>>>> +}
>>>>> +
>>>>>     static int ct_send(struct intel_guc_ct *ct,
>>>>>     		   const u32 *action,
>>>>>     		   u32 len,
>>>>> @@ -473,6 +541,7 @@ static int ct_send(struct intel_guc_ct *ct,
>>>>>     		   u32 response_buf_size,
>>>>>     		   u32 *status)
>>>>>     {
>>>>> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>>>>>     	struct ct_request request;
>>>>>     	unsigned long flags;
>>>>>     	u32 fence;
>>>>> @@ -482,8 +551,20 @@ static int ct_send(struct intel_guc_ct *ct,
>>>>>     	GEM_BUG_ON(!len);
>>>>>     	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
>>>>>     	GEM_BUG_ON(!response_buf && response_buf_size);
>>>>> +	might_sleep();
>>>>
>>>> Sleep is just cond_resched below or there is more?
>>>>
>>>
>>> Yes, the cond_resched.
>>>
>>>>> +	/*
>>>>> +	 * We use a lazy spin wait loop here as we believe that if the CT
>>>>> +	 * buffers are sized correctly the flow control condition should be
>>>>> +	 * rare.
>>>>> +	 */
>>>>> +retry:
>>>>>     	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
>>>>> +	if (unlikely(!ctb_has_room(ctb, len + 1))) {
>>>>> +		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
>>>>> +		cond_resched();
>>>>> +		goto retry;
>>>>> +	}
>>>>
>>>> If this patch is about adding a non-blocking send function, and below we can
>>>> see that it creates a fork:
>>>>
>>>> intel_guc_ct_send:
>>>> ...
>>>> 	if (flags & INTEL_GUC_SEND_NB)
>>>> 		return ct_send_nb(ct, action, len, flags);
>>>>
>>>>    	ret = ct_send(ct, action, len, response_buf, response_buf_size, &status);
>>>>
>>>> Then why is there a change in ct_send here, which is not the new
>>>> non-blocking path?
>>>>
>>>
>>> There is not a change to ct_send(), just to intel_guc_ct_send.
>>
>> I was going by the diff which says:
>>
>>   static int ct_send(struct intel_guc_ct *ct,
>>   		   const u32 *action,
>>   		   u32 len,
>> @@ -473,6 +541,7 @@ static int ct_send(struct intel_guc_ct *ct,
>>   		   u32 response_buf_size,
>>   		   u32 *status)
>>   {
>> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>>   	struct ct_request request;
>>   	unsigned long flags;
>>   	u32 fence;
>> @@ -482,8 +551,20 @@ static int ct_send(struct intel_guc_ct *ct,
>>   	GEM_BUG_ON(!len);
>>   	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
>>   	GEM_BUG_ON(!response_buf && response_buf_size);
>> +	might_sleep();
>> +	/*
>> +	 * We use a lazy spin wait loop here as we believe that if the CT
>> +	 * buffers are sized correctly the flow control condition should be
>> +	 * rare.
>> +	 */
>> +retry:
>>   	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
>> +	if (unlikely(!ctb_has_room(ctb, len + 1))) {
>> +		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
>> +		cond_resched();
>> +		goto retry;
>> +	}
>>
>> So it looks like a change to ct_send to me. Is that wrong?

What about this part - is the patch changing the blocking ct_send or 
not, and if it is, why?

Regards,

Tvrtko


>>
>> Regards,
>>
>> Tvrtko
>>
>>> As for why intel_guc_ct_send is updated and we don't just add a new public
>>> function, this was another reviewer's suggestion. Again, can't make
>>> everyone happy.
>>>>>     	fence = ct_get_next_fence(ct);
>>>>>     	request.fence = fence;
>>>>> @@ -495,7 +576,7 @@ static int ct_send(struct intel_guc_ct *ct,
>>>>>     	list_add_tail(&request.link, &ct->requests.pending);
>>>>>     	spin_unlock(&ct->requests.lock);
>>>>> -	err = ct_write(ct, action, len, fence);
>>>>> +	err = ct_write(ct, action, len, fence, 0);
>>>>>     	spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
>>>>> @@ -537,7 +618,7 @@ static int ct_send(struct intel_guc_ct *ct,
>>>>>      * Command Transport (CT) buffer based GuC send function.
>>>>>      */
>>>>>     int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
>>>>> -		      u32 *response_buf, u32 response_buf_size)
>>>>> +		      u32 *response_buf, u32 response_buf_size, u32 flags)
>>>>>     {
>>>>>     	u32 status = ~0; /* undefined */
>>>>>     	int ret;
>>>>> @@ -547,6 +628,9 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
>>>>>     		return -ENODEV;
>>>>>     	}
>>>>> +	if (flags & INTEL_GUC_SEND_NB)
>>>>> +		return ct_send_nb(ct, action, len, flags);
>>>>> +
>>>>>     	ret = ct_send(ct, action, len, response_buf, response_buf_size, &status);
>>>>>     	if (unlikely(ret < 0)) {
>>>>>     		CT_ERROR(ct, "Sending action %#x failed (err=%d status=%#X)\n",
>>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>>>> index 1ae2dde6db93..55ef7c52472f 100644
>>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>>>> @@ -9,6 +9,7 @@
>>>>>     #include <linux/interrupt.h>
>>>>>     #include <linux/spinlock.h>
>>>>>     #include <linux/workqueue.h>
>>>>> +#include <linux/ktime.h>
>>>>>     #include "intel_guc_fwif.h"
>>>>> @@ -42,7 +43,6 @@ struct intel_guc_ct_buffer {
>>>>>     	bool broken;
>>>>>     };
>>>>> -
>>>>>     /** Top-level structure for Command Transport related data
>>>>>      *
>>>>>      * Includes a pair of CT buffers for bi-directional communication and tracking
>>>>> @@ -69,6 +69,9 @@ struct intel_guc_ct {
>>>>>     		struct list_head incoming; /* incoming requests */
>>>>>     		struct work_struct worker; /* handler for incoming requests */
>>>>>     	} requests;
>>>>> +
>>>>> +	/** @stall_time: time of first time a CTB submission is stalled */
>>>>> +	ktime_t stall_time;
>>>>
>>>> Unused in this patch.
>>>>
>>>
>>> Yea, wrong patch. Will fix.
>>>
>>> Matt
>>>>>     };
>>>>>     void intel_guc_ct_init_early(struct intel_guc_ct *ct);
>>>>> @@ -88,7 +91,7 @@ static inline bool intel_guc_ct_enabled(struct intel_guc_ct *ct)
>>>>>     }
>>>>>     int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
>>>>> -		      u32 *response_buf, u32 response_buf_size);
>>>>> +		      u32 *response_buf, u32 response_buf_size, u32 flags);
>>>>>     void intel_guc_ct_event_handler(struct intel_guc_ct *ct);
>>>>>     #endif /* _INTEL_GUC_CT_H_ */
>>>>>
>>>>
>>>> Regards,
>>>>
>>>> Tvrtko

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 36/97] drm/i915/guc: Add non blocking CTB send function
  2021-05-27 10:02           ` Tvrtko Ursulin
@ 2021-05-27 14:35             ` Matthew Brost
  2021-05-27 15:11               ` Tvrtko Ursulin
  0 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-05-27 14:35 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: jason.ekstrand, daniel.vetter, intel-gfx, dri-devel

On Thu, May 27, 2021 at 11:02:24AM +0100, Tvrtko Ursulin wrote:
> 
> On 26/05/2021 19:10, Matthew Brost wrote:
> 
> [snip]
> 
> > > > > > +static int ct_send_nb(struct intel_guc_ct *ct,
> > > > > > +		      const u32 *action,
> > > > > > +		      u32 len,
> > > > > > +		      u32 flags)
> > > > > > +{
> > > > > > +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> > > > > > +	unsigned long spin_flags;
> > > > > > +	u32 fence;
> > > > > > +	int ret;
> > > > > > +
> > > > > > +	spin_lock_irqsave(&ctb->lock, spin_flags);
> > > > > > +
> > > > > > +	ret = ctb_has_room(ctb, len + 1);
> > > > > > +	if (unlikely(ret))
> > > > > > +		goto out;
> > > > > > +
> > > > > > +	fence = ct_get_next_fence(ct);
> > > > > > +	ret = ct_write(ct, action, len, fence, flags);
> > > > > > +	if (unlikely(ret))
> > > > > > +		goto out;
> > > > > > +
> > > > > > +	intel_guc_notify(ct_to_guc(ct));
> > > > > > +
> > > > > > +out:
> > > > > > +	spin_unlock_irqrestore(&ctb->lock, spin_flags);
> > > > > > +
> > > > > > +	return ret;
> > > > > > +}
> > > > > > +
> > > > > >     static int ct_send(struct intel_guc_ct *ct,
> > > > > >     		   const u32 *action,
> > > > > >     		   u32 len,
> > > > > > @@ -473,6 +541,7 @@ static int ct_send(struct intel_guc_ct *ct,
> > > > > >     		   u32 response_buf_size,
> > > > > >     		   u32 *status)
> > > > > >     {
> > > > > > +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> > > > > >     	struct ct_request request;
> > > > > >     	unsigned long flags;
> > > > > >     	u32 fence;
> > > > > > @@ -482,8 +551,20 @@ static int ct_send(struct intel_guc_ct *ct,
> > > > > >     	GEM_BUG_ON(!len);
> > > > > >     	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
> > > > > >     	GEM_BUG_ON(!response_buf && response_buf_size);
> > > > > > +	might_sleep();
> > > > > 
> > > > > Sleep is just cond_resched below or there is more?
> > > > > 
> > > > 
> > > > Yes, the cond_resched.
> > > > 
> > > > > > +	/*
> > > > > > +	 * We use a lazy spin wait loop here as we believe that if the CT
> > > > > > +	 * buffers are sized correctly the flow control condition should be
> > > > > > +	 * rare.
> > > > > > +	 */
> > > > > > +retry:
> > > > > >     	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
> > > > > > +	if (unlikely(!ctb_has_room(ctb, len + 1))) {
> > > > > > +		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> > > > > > +		cond_resched();
> > > > > > +		goto retry;
> > > > > > +	}
> > > > > 
> > > > > If this patch is about adding a non-blocking send function, and below we can
> > > > > see that it creates a fork:
> > > > > 
> > > > > intel_guc_ct_send:
> > > > > ...
> > > > > 	if (flags & INTEL_GUC_SEND_NB)
> > > > > 		return ct_send_nb(ct, action, len, flags);
> > > > > 
> > > > >    	ret = ct_send(ct, action, len, response_buf, response_buf_size, &status);
> > > > > 
> > > > > Then why is there a change in ct_send here, which is not the new
> > > > > non-blocking path?
> > > > > 
> > > > 
> > > > There is not a change to ct_send(), just to intel_guc_ct_send.
> > > 
> > > I was going by the diff which says:
> > > 
> > >   static int ct_send(struct intel_guc_ct *ct,
> > >   		   const u32 *action,
> > >   		   u32 len,
> > > @@ -473,6 +541,7 @@ static int ct_send(struct intel_guc_ct *ct,
> > >   		   u32 response_buf_size,
> > >   		   u32 *status)
> > >   {
> > > +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> > >   	struct ct_request request;
> > >   	unsigned long flags;
> > >   	u32 fence;
> > > @@ -482,8 +551,20 @@ static int ct_send(struct intel_guc_ct *ct,
> > >   	GEM_BUG_ON(!len);
> > >   	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
> > >   	GEM_BUG_ON(!response_buf && response_buf_size);
> > > +	might_sleep();
> > > +	/*
> > > +	 * We use a lazy spin wait loop here as we believe that if the CT
> > > +	 * buffers are sized correctly the flow control condition should be
> > > +	 * rare.
> > > +	 */
> > > +retry:
> > >   	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
> > > +	if (unlikely(!ctb_has_room(ctb, len + 1))) {
> > > +		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> > > +		cond_resched();
> > > +		goto retry;
> > > +	}
> > > 
> > > So it looks like a change to ct_send to me. Is that wrong?
> 
> What about this part - is the patch changing the blocking ct_send or not,
> and if it is, why?
> 

Yes, ct_send() changes. Sorry for the confusion.

This function needs to be updated to account for the H2G space and
back off if no space is available.

Matt

> Regards,
> 
> Tvrtko
> 
> 
> > > 
> > > Regards,
> > > 
> > > Tvrtko
> > > 
> > > > As for why intel_guc_ct_send is updated and we don't just add a new public
> > > > function, this was another reviewer's suggestion. Again, can't make
> > > > everyone happy.
> > > > > >     	fence = ct_get_next_fence(ct);
> > > > > >     	request.fence = fence;
> > > > > > @@ -495,7 +576,7 @@ static int ct_send(struct intel_guc_ct *ct,
> > > > > >     	list_add_tail(&request.link, &ct->requests.pending);
> > > > > >     	spin_unlock(&ct->requests.lock);
> > > > > > -	err = ct_write(ct, action, len, fence);
> > > > > > +	err = ct_write(ct, action, len, fence, 0);
> > > > > >     	spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> > > > > > @@ -537,7 +618,7 @@ static int ct_send(struct intel_guc_ct *ct,
> > > > > >      * Command Transport (CT) buffer based GuC send function.
> > > > > >      */
> > > > > >     int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
> > > > > > -		      u32 *response_buf, u32 response_buf_size)
> > > > > > +		      u32 *response_buf, u32 response_buf_size, u32 flags)
> > > > > >     {
> > > > > >     	u32 status = ~0; /* undefined */
> > > > > >     	int ret;
> > > > > > @@ -547,6 +628,9 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
> > > > > >     		return -ENODEV;
> > > > > >     	}
> > > > > > +	if (flags & INTEL_GUC_SEND_NB)
> > > > > > +		return ct_send_nb(ct, action, len, flags);
> > > > > > +
> > > > > >     	ret = ct_send(ct, action, len, response_buf, response_buf_size, &status);
> > > > > >     	if (unlikely(ret < 0)) {
> > > > > >     		CT_ERROR(ct, "Sending action %#x failed (err=%d status=%#X)\n",
> > > > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > > > > > index 1ae2dde6db93..55ef7c52472f 100644
> > > > > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > > > > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > > > > > @@ -9,6 +9,7 @@
> > > > > >     #include <linux/interrupt.h>
> > > > > >     #include <linux/spinlock.h>
> > > > > >     #include <linux/workqueue.h>
> > > > > > +#include <linux/ktime.h>
> > > > > >     #include "intel_guc_fwif.h"
> > > > > > @@ -42,7 +43,6 @@ struct intel_guc_ct_buffer {
> > > > > >     	bool broken;
> > > > > >     };
> > > > > > -
> > > > > >     /** Top-level structure for Command Transport related data
> > > > > >      *
> > > > > >      * Includes a pair of CT buffers for bi-directional communication and tracking
> > > > > > @@ -69,6 +69,9 @@ struct intel_guc_ct {
> > > > > >     		struct list_head incoming; /* incoming requests */
> > > > > >     		struct work_struct worker; /* handler for incoming requests */
> > > > > >     	} requests;
> > > > > > +
> > > > > > +	/** @stall_time: time of first time a CTB submission is stalled */
> > > > > > +	ktime_t stall_time;
> > > > > 
> > > > > Unused in this patch.
> > > > > 
> > > > 
> > > > Yea, wrong patch. Will fix.
> > > > 
> > > > Matt
> > > > > >     };
> > > > > >     void intel_guc_ct_init_early(struct intel_guc_ct *ct);
> > > > > > @@ -88,7 +91,7 @@ static inline bool intel_guc_ct_enabled(struct intel_guc_ct *ct)
> > > > > >     }
> > > > > >     int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
> > > > > > -		      u32 *response_buf, u32 response_buf_size);
> > > > > > +		      u32 *response_buf, u32 response_buf_size, u32 flags);
> > > > > >     void intel_guc_ct_event_handler(struct intel_guc_ct *ct);
> > > > > >     #endif /* _INTEL_GUC_CT_H_ */
> > > > > > 
> > > > > 
> > > > > Regards,
> > > > > 
> > > > > Tvrtko

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 55/97] drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC
  2021-05-27  9:02           ` Tvrtko Ursulin
@ 2021-05-27 14:37             ` Matthew Brost
  0 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-27 14:37 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: jason.ekstrand, daniel.vetter, intel-gfx, dri-devel

On Thu, May 27, 2021 at 10:02:55AM +0100, Tvrtko Ursulin wrote:
> 
> On 26/05/2021 19:18, Matthew Brost wrote:
> > On Wed, May 26, 2021 at 10:21:05AM +0100, Tvrtko Ursulin wrote:
> > > 
> > > On 25/05/2021 18:07, Matthew Brost wrote:
> > > > On Tue, May 25, 2021 at 11:06:00AM +0100, Tvrtko Ursulin wrote:
> > > > > 
> > > > > On 06/05/2021 20:14, Matthew Brost wrote:
> > > > > > When running the GuC the GPU can't be considered idle if the GuC still
> > > > > > has contexts pinned. As such, a call has been added in
> > > > > > intel_gt_wait_for_idle to idle the UC and in turn the GuC by waiting for
> > > > > > the number of unpinned contexts to go to zero.
> > > > > > 
> > > > > > Cc: John Harrison <john.c.harrison@intel.com>
> > > > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > > > ---
> > > > > >     drivers/gpu/drm/i915/gem/i915_gem_mman.c      |  3 +-
> > > > > >     drivers/gpu/drm/i915/gt/intel_gt.c            | 18 ++++
> > > > > >     drivers/gpu/drm/i915/gt/intel_gt.h            |  2 +
> > > > > >     drivers/gpu/drm/i915/gt/intel_gt_requests.c   | 22 ++---
> > > > > >     drivers/gpu/drm/i915/gt/intel_gt_requests.h   |  7 +-
> > > > > >     drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  4 +
> > > > > >     drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |  1 +
> > > > > >     drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h     |  4 +
> > > > > >     .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 91 ++++++++++++++++++-
> > > > > >     drivers/gpu/drm/i915/gt/uc/intel_uc.h         |  5 +
> > > > > >     drivers/gpu/drm/i915/i915_debugfs.c           |  1 +
> > > > > >     drivers/gpu/drm/i915/i915_gem_evict.c         |  1 +
> > > > > >     .../gpu/drm/i915/selftests/igt_live_test.c    |  2 +-
> > > > > >     .../gpu/drm/i915/selftests/mock_gem_device.c  |  3 +-
> > > > > >     14 files changed, 137 insertions(+), 27 deletions(-)
> > > > > > 
> > > > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> > > > > > index 8598a1c78a4c..2f5295c9408d 100644
> > > > > > --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> > > > > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> > > > > > @@ -634,7 +634,8 @@ mmap_offset_attach(struct drm_i915_gem_object *obj,
> > > > > >     		goto insert;
> > > > > >     	/* Attempt to reap some mmap space from dead objects */
> > > > > > -	err = intel_gt_retire_requests_timeout(&i915->gt, MAX_SCHEDULE_TIMEOUT);
> > > > > > +	err = intel_gt_retire_requests_timeout(&i915->gt, MAX_SCHEDULE_TIMEOUT,
> > > > > > +					       NULL);
> > > > > >     	if (err)
> > > > > >     		goto err;
> > > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> > > > > > index 8d77dcbad059..1742a8561f69 100644
> > > > > > --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> > > > > > +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> > > > > > @@ -574,6 +574,24 @@ static void __intel_gt_disable(struct intel_gt *gt)
> > > > > >     	GEM_BUG_ON(intel_gt_pm_is_awake(gt));
> > > > > >     }
> > > > > > +int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout)
> > > > > > +{
> > > > > > +	long rtimeout;
> > > > > > +
> > > > > > +	/* If the device is asleep, we have no requests outstanding */
> > > > > > +	if (!intel_gt_pm_is_awake(gt))
> > > > > > +		return 0;
> > > > > > +
> > > > > > +	while ((timeout = intel_gt_retire_requests_timeout(gt, timeout,
> > > > > > +							   &rtimeout)) > 0) {
> > > > > > +		cond_resched();
> > > > > > +		if (signal_pending(current))
> > > > > > +			return -EINTR;
> > > > > > +	}
> > > > > > +
> > > > > > +	return timeout ? timeout : intel_uc_wait_for_idle(&gt->uc, rtimeout);
> > > > > > +}
> > > > > > +
> > > > > >     int intel_gt_init(struct intel_gt *gt)
> > > > > >     {
> > > > > >     	int err;
> > > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h
> > > > > > index 7ec395cace69..c775043334bf 100644
> > > > > > --- a/drivers/gpu/drm/i915/gt/intel_gt.h
> > > > > > +++ b/drivers/gpu/drm/i915/gt/intel_gt.h
> > > > > > @@ -48,6 +48,8 @@ void intel_gt_driver_release(struct intel_gt *gt);
> > > > > >     void intel_gt_driver_late_release(struct intel_gt *gt);
> > > > > > +int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout);
> > > > > > +
> > > > > >     void intel_gt_check_and_clear_faults(struct intel_gt *gt);
> > > > > >     void intel_gt_clear_error_registers(struct intel_gt *gt,
> > > > > >     				    intel_engine_mask_t engine_mask);
> > > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> > > > > > index 647eca9d867a..c6c702f236fa 100644
> > > > > > --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> > > > > > +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> > > > > > @@ -13,6 +13,7 @@
> > > > > >     #include "intel_gt_pm.h"
> > > > > >     #include "intel_gt_requests.h"
> > > > > >     #include "intel_timeline.h"
> > > > > > +#include "uc/intel_uc.h"
> > > > > >     static bool retire_requests(struct intel_timeline *tl)
> > > > > >     {
> > > > > > @@ -130,7 +131,8 @@ void intel_engine_fini_retire(struct intel_engine_cs *engine)
> > > > > >     	GEM_BUG_ON(engine->retire);
> > > > > >     }
> > > > > > -long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout)
> > > > > > +long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
> > > > > > +				      long *rtimeout)
> > > > > 
> > > > > What is 'rtimeout', I know remaining, but it can be more self-descriptive to
> > > > > start with.
> > > > > 
> > > > 
> > > > 'remaining_timeout' it is.
> > > > 
> > > > > It feels a bit churny for what it is. How plausible would be alternatives to
> > > > > either change existing timeout to in/out, or measure sleep internally in
> > > > > this function, or just risk sleeping twice as long by passing the original
> > > > > timeout to uc idle as well?
> > > > > 
> > > > 
> > > > Originally had it just passing in the same value, got review feedback
> > > > saying I should pass in the adjusted value. Hard to make everyone happy.
> > > 
> > > Ok.
> > > 
> > > > > >     {
> > > > > >     	struct intel_gt_timelines *timelines = &gt->timelines;
> > > > > >     	struct intel_timeline *tl, *tn;
> > > > > > @@ -195,22 +197,10 @@ out_active:	spin_lock(&timelines->lock);
> > > > > >     	if (flush_submission(gt, timeout)) /* Wait, there's more! */
> > > > > >     		active_count++;
> > > > > > -	return active_count ? timeout : 0;
> > > > > > -}
> > > > > > -
> > > > > > -int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout)
> > > > > > -{
> > > > > > -	/* If the device is asleep, we have no requests outstanding */
> > > > > > -	if (!intel_gt_pm_is_awake(gt))
> > > > > > -		return 0;
> > > > > > -
> > > > > > -	while ((timeout = intel_gt_retire_requests_timeout(gt, timeout)) > 0) {
> > > > > > -		cond_resched();
> > > > > > -		if (signal_pending(current))
> > > > > > -			return -EINTR;
> > > > > > -	}
> > > > > > +	if (rtimeout)
> > > > > > +		*rtimeout = timeout;
> > > > > > -	return timeout;
> > > > > > +	return active_count ? timeout : 0;
> > > > > >     }
> > > > > >     static void retire_work_handler(struct work_struct *work)
> > > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.h b/drivers/gpu/drm/i915/gt/intel_gt_requests.h
> > > > > > index fcc30a6e4fe9..4419787124e2 100644
> > > > > > --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.h
> > > > > > +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.h
> > > > > > @@ -10,10 +10,11 @@ struct intel_engine_cs;
> > > > > >     struct intel_gt;
> > > > > >     struct intel_timeline;
> > > > > > -long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout);
> > > > > > +long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
> > > > > > +				      long *rtimeout);
> > > > > >     static inline void intel_gt_retire_requests(struct intel_gt *gt)
> > > > > >     {
> > > > > > -	intel_gt_retire_requests_timeout(gt, 0);
> > > > > > +	intel_gt_retire_requests_timeout(gt, 0, NULL);
> > > > > >     }
> > > > > >     void intel_engine_init_retire(struct intel_engine_cs *engine);
> > > > > > @@ -21,8 +22,6 @@ void intel_engine_add_retire(struct intel_engine_cs *engine,
> > > > > >     			     struct intel_timeline *tl);
> > > > > >     void intel_engine_fini_retire(struct intel_engine_cs *engine);
> > > > > > -int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout);
> > > > > > -
> > > > > >     void intel_gt_init_requests(struct intel_gt *gt);
> > > > > >     void intel_gt_park_requests(struct intel_gt *gt);
> > > > > >     void intel_gt_unpark_requests(struct intel_gt *gt);
> > > > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > > > > > index 485e98f3f304..47eaa69809e8 100644
> > > > > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > > > > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > > > > > @@ -38,6 +38,8 @@ struct intel_guc {
> > > > > >     	spinlock_t irq_lock;
> > > > > >     	unsigned int msg_enabled_mask;
> > > > > > +	atomic_t outstanding_submission_g2h;
> > > > > > +
> > > > > >     	struct {
> > > > > >     		bool enabled;
> > > > > >     		void (*reset)(struct intel_guc *guc);
> > > > > > @@ -239,6 +241,8 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask)
> > > > > >     	spin_unlock_irq(&guc->irq_lock);
> > > > > >     }
> > > > > > +int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout);
> > > > > > +
> > > > > >     int intel_guc_reset_engine(struct intel_guc *guc,
> > > > > >     			   struct intel_engine_cs *engine);
> > > > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > > > > > index f1893030ca88..cf701056fa14 100644
> > > > > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > > > > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > > > > > @@ -111,6 +111,7 @@ void intel_guc_ct_init_early(struct intel_guc_ct *ct)
> > > > > >     	INIT_LIST_HEAD(&ct->requests.incoming);
> > > > > >     	INIT_WORK(&ct->requests.worker, ct_incoming_request_worker_func);
> > > > > >     	tasklet_init(&ct->receive_tasklet, ct_receive_tasklet_func, (unsigned long)ct);
> > > > > > +	init_waitqueue_head(&ct->wq);
> > > > > >     }
> > > > > >     static inline const char *guc_ct_buffer_type_to_str(u32 type)
> > > > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > > > > > index 660bf37238e2..ab1b79ab960b 100644
> > > > > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > > > > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > > > > > @@ -10,6 +10,7 @@
> > > > > >     #include <linux/spinlock.h>
> > > > > >     #include <linux/workqueue.h>
> > > > > >     #include <linux/ktime.h>
> > > > > > +#include <linux/wait.h>
> > > > > >     #include "intel_guc_fwif.h"
> > > > > > @@ -68,6 +69,9 @@ struct intel_guc_ct {
> > > > > >     	struct tasklet_struct receive_tasklet;
> > > > > > +	/** @wq: wait queue for g2h channel */
> > > > > > +	wait_queue_head_t wq;
> > > > > > +
> > > > > >     	struct {
> > > > > >     		u16 last_fence; /* last fence used to send request */
> > > > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > > > > index ae0b386467e3..0ff7dd6d337d 100644
> > > > > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > > > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > > > > @@ -253,6 +253,74 @@ static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
> > > > > >     	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
> > > > > >     }
> > > > > > +static int guc_submission_busy_loop(struct intel_guc* guc,
> > > > > > +				    const u32 *action,
> > > > > > +				    u32 len,
> > > > > > +				    u32 g2h_len_dw,
> > > > > > +				    bool loop)
> > > > > > +{
> > > > > > +	int err;
> > > > > > +
> > > > > > +	err = intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop);
> > > > > > +
> > > > > > +	if (!err && g2h_len_dw)
> > > > > > +		atomic_inc(&guc->outstanding_submission_g2h);
> > > > > > +
> > > > > > +	return err;
> > > > > > +}
> > > > > > +
> > > > > > +static int guc_wait_for_pending_msg(struct intel_guc *guc,
> > > > > > +				    atomic_t *wait_var,
> > > > > > +				    bool interruptible,
> > > > > > +				    long timeout)
> > > > > > +{
> > > > > > +	const int state = interruptible ?
> > > > > > +		TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
> > > > > > +	DEFINE_WAIT(wait);
> > > > > > +
> > > > > > +	might_sleep();
> > > > > > +	GEM_BUG_ON(timeout < 0);
> > > > > > +
> > > > > > +	if (!atomic_read(wait_var))
> > > > > > +		return 0;
> > > > > > +
> > > > > > +	if (!timeout)
> > > > > > +		return -ETIME;
> > > > > > +
> > > > > > +	for (;;) {
> > > > > > +		prepare_to_wait(&guc->ct.wq, &wait, state);
> > > > > > +
> > > > > > +		if (!atomic_read(wait_var))
> > > > > > +			break;
> > > > > > +
> > > > > > +		if (signal_pending_state(state, current)) {
> > > > > > +			timeout = -ERESTARTSYS;
> > > > > > +			break;
> > > > > > +		}
> > > > > > +
> > > > > > +		if (!timeout) {
> > > > > > +			timeout = -ETIME;
> > > > > > +			break;
> > > > > > +		}
> > > > > > +
> > > > > > +		timeout = io_schedule_timeout(timeout);
> > > > > > +	}
> > > > > > +	finish_wait(&guc->ct.wq, &wait);
> > > > > > +
> > > > > > +	return (timeout < 0) ? timeout : 0;
> > > > > > +}
> > > > > 
> > > > > See if it is possible to simplify all this with wait_var_event and
> > > > > wake_up_var.
> > > > > 
> > > > 
> > > > Let me check on that.
> > > > > > +
> > > > > > +int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout)
> > > > > > +{
> > > > > > +	bool interruptible = true;
> > > > > > +
> > > > > > +	if (unlikely(timeout < 0))
> > > > > > +		timeout = -timeout, interruptible = false;
> > > > > > +
> > > > > > +	return guc_wait_for_pending_msg(guc, &guc->outstanding_submission_g2h,
> > > > > > +					interruptible, timeout);
> > > > > > +}
> > > > > > +
> > > > > >     static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
> > > > > >     {
> > > > > >     	int err;
> > > > > > @@ -279,6 +347,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
> > > > > >     	err = intel_guc_send_nb(guc, action, len, g2h_len_dw);
> > > > > >     	if (!enabled && !err) {
> > > > > > +		atomic_inc(&guc->outstanding_submission_g2h);
> > > > > >     		set_context_enabled(ce);
> > > > > >     	} else if (!enabled) {
> > > > > >     		clr_context_pending_enable(ce);
> > > > > > @@ -734,7 +803,7 @@ static int __guc_action_register_context(struct intel_guc *guc,
> > > > > >     		offset,
> > > > > >     	};
> > > > > > -	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
> > > > > > +	return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
> > > > > >     }
> > > > > >     static int register_context(struct intel_context *ce)
> > > > > > @@ -754,7 +823,7 @@ static int __guc_action_deregister_context(struct intel_guc *guc,
> > > > > >     		guc_id,
> > > > > >     	};
> > > > > > -	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action),
> > > > > > +	return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),
> > > > > >     					G2H_LEN_DW_DEREGISTER_CONTEXT, true);
> > > > > >     }
> > > > > > @@ -871,7 +940,9 @@ static int guc_context_pin(struct intel_context *ce, void *vaddr)
> > > > > >     static void guc_context_unpin(struct intel_context *ce)
> > > > > >     {
> > > > > > -	unpin_guc_id(ce_to_guc(ce), ce);
> > > > > > +	struct intel_guc *guc = ce_to_guc(ce);
> > > > > > +
> > > > > > +	unpin_guc_id(guc, ce);
> > > > > >     	lrc_unpin(ce);
> > > > > >     }
> > > > > > @@ -894,7 +965,7 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
> > > > > >     	intel_context_get(ce);
> > > > > > -	intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action),
> > > > > > +	guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),
> > > > > >     				 G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
> > > > > >     }
> > > > > > @@ -1437,6 +1508,15 @@ g2h_context_lookup(struct intel_guc *guc, u32 desc_idx)
> > > > > >     	return ce;
> > > > > >     }
> > > > > > +static void decr_outstanding_submission_g2h(struct intel_guc *guc)
> > > > > > +{
> > > > > > +	if (atomic_dec_and_test(&guc->outstanding_submission_g2h)) {
> > > > > > +		smp_mb();
> > > > > > +		if (waitqueue_active(&guc->ct.wq))
> > > > > > +			wake_up_all(&guc->ct.wq);
> > > > > 
> > > > > I keep pointing out this pattern is racy and at least needs comment why it
> > > > > is safe.
> > > > > 
> > > > 
> > > > There is a comment in the wait queue code header saying why this is safe. I
> > > > don't think we need to repeat this here.
> > > 
> > > Yeah, _describing how to make it safe_, after it starts with:
> > > 
> > >   * NOTE: this function is lockless and requires care, incorrect usage _will_
> > >   * lead to sporadic and non-obvious failure.
> > > 
> > > Then it also says:
> > > 
> > >   * Also note that this 'optimization' trades a spin_lock() for an smp_mb(),
> > >   * which (when the lock is uncontended) are of roughly equal cost.
> > > 
> > > I question the need to optimize this path since it means the reader has to figure out if it is safe, while a simple wake_up_all after atomic_dec_and_test would have done it.
> > > 
> > > Is the case of no waiters a predominant one? It at least deserves a comment explaining why the optimisation is important.
> > > 
> > 
> > I just didn't want to add a spin_lock if there is a known working code
> > path without one and our code fits into that path. I can add a comment
> > but I don't really think it's necessary.
> 
> The lock already exists inside wake_up_all, it is not about adding your own.
> 
> As premature optimisations are usually best avoided, it is simply about how
> you justify a):
> 
> +static void decr_outstanding_submission_g2h(struct intel_guc *guc)
> +{
> +	if (atomic_dec_and_test(&guc->outstanding_submission_g2h)) {
> +		smp_mb();
> +		if (waitqueue_active(&guc->ct.wq))
> +			wake_up_all(&guc->ct.wq);
> 
> When the easy alternative (easy to read, easy to review, easy to maintain)
> is b):
> 
> +static void decr_outstanding_submission_g2h(struct intel_guc *guc)
> +{
> +	if (atomic_dec_and_test(&guc->outstanding_submission_g2h))
> +		wake_up_all(&guc->ct.wq);
> 

I'll go with option B.

Matt

> For me as an external reader the question seems to be, I will say it again:
> is the case of no waiters a common one, and is this a hot path that justifies
> avoiding a function call at the cost of the mental complexity explained in the
> waitqueue_active comment? Here and in the other places in the GuC backend
> where waitqueue_active is used.
> 
> Regards,
> 
> Tvrtko

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 53/97] drm/i915/guc: Disable semaphores when using GuC scheduling
  2021-05-27  8:41           ` Tvrtko Ursulin
@ 2021-05-27 14:38             ` Matthew Brost
  0 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-27 14:38 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: jason.ekstrand, daniel.vetter, intel-gfx, dri-devel

On Thu, May 27, 2021 at 09:41:46AM +0100, Tvrtko Ursulin wrote:
> 
> On 26/05/2021 19:15, Matthew Brost wrote:
> > On Wed, May 26, 2021 at 10:25:13AM +0100, Tvrtko Ursulin wrote:
> > > 
> > > On 25/05/2021 18:01, Matthew Brost wrote:
> > > > On Tue, May 25, 2021 at 10:52:01AM +0100, Tvrtko Ursulin wrote:
> > > > > 
> > > > > On 06/05/2021 20:14, Matthew Brost wrote:
> > > > > > Disable semaphores when using GuC scheduling as semaphores are broken in
> > > > > > the current GuC firmware.
> > > > > 
> > > > > What is "current"? Given that the patch itself is like year and a half old.
> > > > > 
> > > > 
> > > > Stale comment. Semaphores work with the firmware; we just haven't enabled
> > > > them in the i915 with GuC submission, as this is an optimization and not
> > > > required for functionality.
> > > 
> > > How will the updated commit message look in terms of remaining reasons why
> > > semaphores won't/can't be enabled?
> > > 
> > 
> > Semaphores are an optimization and not required for basic GuC submission
> > to work properly. Disable until we have time to do the implementation to
> > enable semaphores and tune them for performance.
> > 
> > > They were a nice performance win on some media workloads although granted a
> > > lot of tweaking was required to find a good balance on when to use them and
> > > when not to.
> > > 
> > 
> > The same tweaking would have to be done with GuC submission. Let's
> > get basic submission in first, then tweak for performance.
> 
> I don't have fundamental complaints as long as the commit message is improved
> and it is understood the absence of semaphores is extremely likely to
> regress transcode performance tests. The latter probably doesn't matter (for
> some definition of it) unless either there will be a platform where both GuC
> and execlists can be supported, or there will be two separate platforms
> similar in hw performance in theory, both relevant to transcode customers,
> one using execlists and one using GuC.
> 

Sounds good. I already have this commit message updated in my branch and it
will be included in the next post.

Matt

> Regards,
> 
> Tvrtko
> 
> 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 36/97] drm/i915/guc: Add non blocking CTB send function
  2021-05-27 14:35             ` Matthew Brost
@ 2021-05-27 15:11               ` Tvrtko Ursulin
  2021-06-07 17:31                 ` Matthew Brost
  0 siblings, 1 reply; 249+ messages in thread
From: Tvrtko Ursulin @ 2021-05-27 15:11 UTC (permalink / raw)
  To: Matthew Brost; +Cc: jason.ekstrand, daniel.vetter, intel-gfx, dri-devel


On 27/05/2021 15:35, Matthew Brost wrote:
> On Thu, May 27, 2021 at 11:02:24AM +0100, Tvrtko Ursulin wrote:
>>
>> On 26/05/2021 19:10, Matthew Brost wrote:
>>
>> [snip]
>>
>>>>>>> +static int ct_send_nb(struct intel_guc_ct *ct,
>>>>>>> +		      const u32 *action,
>>>>>>> +		      u32 len,
>>>>>>> +		      u32 flags)
>>>>>>> +{
>>>>>>> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>>>>>>> +	unsigned long spin_flags;
>>>>>>> +	u32 fence;
>>>>>>> +	int ret;
>>>>>>> +
>>>>>>> +	spin_lock_irqsave(&ctb->lock, spin_flags);
>>>>>>> +
>>>>>>> +	ret = ctb_has_room(ctb, len + 1);
>>>>>>> +	if (unlikely(ret))
>>>>>>> +		goto out;
>>>>>>> +
>>>>>>> +	fence = ct_get_next_fence(ct);
>>>>>>> +	ret = ct_write(ct, action, len, fence, flags);
>>>>>>> +	if (unlikely(ret))
>>>>>>> +		goto out;
>>>>>>> +
>>>>>>> +	intel_guc_notify(ct_to_guc(ct));
>>>>>>> +
>>>>>>> +out:
>>>>>>> +	spin_unlock_irqrestore(&ctb->lock, spin_flags);
>>>>>>> +
>>>>>>> +	return ret;
>>>>>>> +}
>>>>>>> +
>>>>>>>      static int ct_send(struct intel_guc_ct *ct,
>>>>>>>      		   const u32 *action,
>>>>>>>      		   u32 len,
>>>>>>> @@ -473,6 +541,7 @@ static int ct_send(struct intel_guc_ct *ct,
>>>>>>>      		   u32 response_buf_size,
>>>>>>>      		   u32 *status)
>>>>>>>      {
>>>>>>> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>>>>>>>      	struct ct_request request;
>>>>>>>      	unsigned long flags;
>>>>>>>      	u32 fence;
>>>>>>> @@ -482,8 +551,20 @@ static int ct_send(struct intel_guc_ct *ct,
>>>>>>>      	GEM_BUG_ON(!len);
>>>>>>>      	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
>>>>>>>      	GEM_BUG_ON(!response_buf && response_buf_size);
>>>>>>> +	might_sleep();
>>>>>>
>>>>>> Sleep is just the cond_resched below, or is there more?
>>>>>>
>>>>>
>>>>> Yes, the cond_resched.
>>>>>
>>>>>>> +	/*
>>>>>>> +	 * We use a lazy spin wait loop here as we believe that if the CT
>>>>>>> +	 * buffers are sized correctly the flow control condition should be
>>>>>>> +	 * rare.
>>>>>>> +	 */
>>>>>>> +retry:
>>>>>>>      	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
>>>>>>> +	if (unlikely(!ctb_has_room(ctb, len + 1))) {
>>>>>>> +		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
>>>>>>> +		cond_resched();
>>>>>>> +		goto retry;
>>>>>>> +	}
>>>>>>
>>>>>> If this patch is about adding a non-blocking send function, and below we can
>>>>>> see that it creates a fork:
>>>>>>
>>>>>> intel_guc_ct_send:
>>>>>> ...
>>>>>> 	if (flags & INTEL_GUC_SEND_NB)
>>>>>> 		return ct_send_nb(ct, action, len, flags);
>>>>>>
>>>>>>     	ret = ct_send(ct, action, len, response_buf, response_buf_size, &status);
>>>>>>
>>>>>> Then why is there a change in ct_send here, which is not the new
>>>>>> non-blocking path?
>>>>>>
>>>>>
>>>>> There is not a change to ct_send(), just to intel_guc_ct_send.
>>>>
>>>> I was going by the diff, which says:
>>>>
>>>>    static int ct_send(struct intel_guc_ct *ct,
>>>>    		   const u32 *action,
>>>>    		   u32 len,
>>>> @@ -473,6 +541,7 @@ static int ct_send(struct intel_guc_ct *ct,
>>>>    		   u32 response_buf_size,
>>>>    		   u32 *status)
>>>>    {
>>>> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>>>>    	struct ct_request request;
>>>>    	unsigned long flags;
>>>>    	u32 fence;
>>>> @@ -482,8 +551,20 @@ static int ct_send(struct intel_guc_ct *ct,
>>>>    	GEM_BUG_ON(!len);
>>>>    	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
>>>>    	GEM_BUG_ON(!response_buf && response_buf_size);
>>>> +	might_sleep();
>>>> +	/*
>>>> +	 * We use a lazy spin wait loop here as we believe that if the CT
>>>> +	 * buffers are sized correctly the flow control condition should be
>>>> +	 * rare.
>>>> +	 */
>>>> +retry:
>>>>    	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
>>>> +	if (unlikely(!ctb_has_room(ctb, len + 1))) {
>>>> +		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
>>>> +		cond_resched();
>>>> +		goto retry;
>>>> +	}
>>>>
>>>> So it looks like a change to ct_send to me. Is that wrong?
>>
>> What about this part - is the patch changing the blocking ct_send or not,
>> and if it is, why?
>>
> 
> Yes, ct_send() changes. Sorry for the confusion.
> 
> This function needs to be updated to account for the H2G space and
> backoff if no space is available.

Since this one is the sleeping path, it probably can, and needs to, be
smarter than having a cond_resched busy loop added - for instance, sleep
and get woken up when there is space. Otherwise it can degenerate into
busy looping via contention with the non-blocking path.
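
Something along these lines perhaps (rough sketch only - it assumes the
receive/G2H side does a wake_up_all() on a waitqueue in intel_guc_ct,
either the existing ct.wq used for outstanding G2H credits or a dedicated
one, whenever it frees space in the send buffer):

retry:
	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
	if (unlikely(!ctb_has_room(ctb, len + 1))) {
		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
		/* Sleep until woken instead of spinning with cond_resched(). */
		ret = wait_event_interruptible(ct->wq,
					       ctb_has_room(ctb, len + 1));
		if (ret)
			return ret;
		goto retry;
	}

The condition is still re-checked under the lock after every wake up, so
the unlocked check inside wait_event_interruptible() is only a hint.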

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 60/97] drm/i915: Track 'serial' counts for virtual engines
  2021-05-27  8:53           ` Tvrtko Ursulin
@ 2021-05-27 17:01             ` John Harrison
  2021-06-01  9:31               ` Tvrtko Ursulin
  0 siblings, 1 reply; 249+ messages in thread
From: John Harrison @ 2021-05-27 17:01 UTC (permalink / raw)
  To: Tvrtko Ursulin, Matthew Brost
  Cc: jason.ekstrand, daniel.vetter, intel-gfx, dri-devel

On 5/27/2021 01:53, Tvrtko Ursulin wrote:
> On 26/05/2021 19:45, John Harrison wrote:
>> On 5/26/2021 01:40, Tvrtko Ursulin wrote:
>>> On 25/05/2021 18:52, Matthew Brost wrote:
>>>> On Tue, May 25, 2021 at 11:16:12AM +0100, Tvrtko Ursulin wrote:
>>>>>
>>>>> On 06/05/2021 20:14, Matthew Brost wrote:
>>>>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>>>>
>>>>>> The serial number tracking of engines happens at the backend of
>>>>>> request submission and was expecting to only be given physical
>>>>>> engines. However, in GuC submission mode, the decomposition of 
>>>>>> virtual
>>>>>> to physical engines does not happen in i915. Instead, requests are
>>>>>> submitted to their virtual engine mask all the way through to the
>>>>>> hardware (i.e. to GuC). This would mean that the heart beat code
>>>>>> thinks the physical engines are idle due to the serial number not
>>>>>> incrementing.
>>>>>>
>>>>>> This patch updates the tracking to decompose virtual engines into
>>>>>> their physical constituents and tracks the request against each. 
>>>>>> This
>>>>>> is not entirely accurate as the GuC will only be issuing the request
>>>>>> to one physical engine. However, it is the best that i915 can do 
>>>>>> given
>>>>>> that it has no knowledge of the GuC's scheduling decisions.
>>>>>
>>>>> Commit text sounds a bit defeatist. I think instead of making up 
>>>>> the serial
>>>>> counts, which has downsides (could you please document in the 
>>>>> commit what
>>>>> they are), we should think how to design things properly.
>>>>>
>>>>
>>>>> IMO, I don't think fixing serial counts is in the scope of this series.
>>>>> We should focus on getting GuC submission in, not cleaning up all the
>>>>> crap that is in the i915. Let's make a note of this though so we can
>>>>> revisit later.
>>>
>>> I will say again - commit message implies it is introducing an 
>>> unspecified downside by not fully fixing an also unspecified issue. 
>>> It is completely reasonable, and customary even, to ask for both to 
>>> be documented in the commit message.
>> Not sure what exactly is 'unspecified'. I thought the commit message 
>> described both the problem (heartbeat not running when using virtual 
>> engines) and the result (heartbeat running on more engines than 
>> strictly necessary). But in greater detail...
>>
>> The serial number tracking is a hack for the heartbeat code to know 
>> whether an engine is busy or idle, and therefore whether it should be 
>> pinged for aliveness. Whenever a submission is made to an engine, the 
>> serial number is incremented. The heartbeat code keeps a copy of the 
>> value. If the value has changed, the engine is busy and needs to be 
>> pinged.
>>
>> This works fine for execlist mode where virtual engine decomposition 
>> is done inside i915. It fails miserably for GuC mode where the 
>> decomposition is done by the hardware. The reason being that the 
>> heartbeat code only looks at physical engines but the serial count is 
>> only incremented on the virtual engine. Thus, the heartbeat sees 
>> everything as idle and does not ping.
>
> So hangcheck does not work. Or it works because GuC does it anyway. 
> Either way, that's one thing to explicitly state in the commit message.
>
>> This patch decomposes the virtual engines for the sake of 
>> incrementing the serial count on each sub-engine in order to keep the 
>> heartbeat code happy. The downside is that now the heartbeat sees all 
>> sub-engines as busy rather than only the one the submission actually 
>> ends up on. There really isn't much that can be done about that. The 
>> heartbeat code is in i915 not GuC, the scheduler is in GuC not i915. 
>> The only way to improve it is to either move the heartbeat code into 
>> GuC as well and completely disable the i915 side, or add some way for 
>> i915 to interrogate GuC as to which engines are or are not active. 
>> Technically, we do have both. GuC has (or at least had) an option to 
>> force a context switch on every execution quantum pre-emption. 
>> However, that is much, much, more heavy weight than the heartbeat. 
>> For the latter, we do (almost) have the engine usage statistics for 
>> PMU and such like. I'm not sure how much effort it would be to wire 
>> that up to the heartbeat code instead of using the serial count.
>>
>> In short, the serial count is ever so slightly inefficient in that it 
>> causes heartbeat pings on engines which are idle. On the other hand, 
>> it is way more efficient and simpler than the current alternatives.
>
> And the hack to make hangcheck work creates this inefficiency where 
> heartbeats are sent to idle engines. Which is probably fine, it just needs 
> to be explained.
>
>> Does that answer the questions?
>
> With the two points I re-raise clearly explained, possibly even the patch 
> title changed, yeah. I just want it to be more easily 
> obvious to the patch reader what it is functionally about - not just what 
> implementation details have been changed but why as well.
>
My understanding is that we don't explain every piece of code in minute 
detail in every checkin email that touches it. I thought my description 
was already pretty verbose. I've certainly seen way less informative 
checkins that apparently made it through review without issue.

Regarding the problem statement, I thought it was fairly clear that 
the heartbeat was broken for virtual engines:

    This would mean that the heart beat code
    thinks the physical engines are idle due to the serial number not
    incrementing.


Regarding the inefficiency of heartbeating all physical engines in a 
virtual engine, again, this seems clear to me:

    decompose virtual engines into
    their physical constituents and tracks the request against each. This
    is not entirely accurate as the GuC will only be issuing the request
    to one physical engine.


For the subject, I guess you could say "Track 'heartbeat serial' counts 
for virtual engines". However, the serial tracking count is not 
explicitly named for heartbeats so it seems inaccurate to rename it for 
a checkin email subject.

If you have a suggestion for better wording then feel free to propose 
something.

John.


> Regards,
>
> Tvrtko
>
>> John.
>>
>>
>>>
>>> If we are abandoning the normal review process someone please say so, so 
>>> that I don't waste my time reading it.
>>>
>>> Regards,
>>>
>>> Tvrtko
>>>
>>>> Matt
>>>>
>>>>> Regards,
>>>>>
>>>>> Tvrtko
>>>>>
>>>>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>>>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>>>>> ---
>>>>>>    drivers/gpu/drm/i915/gt/intel_engine_types.h     |  2 ++
>>>>>>    .../gpu/drm/i915/gt/intel_execlists_submission.c |  6 ++++++
>>>>>>    drivers/gpu/drm/i915/gt/intel_ring_submission.c  |  6 ++++++
>>>>>>    drivers/gpu/drm/i915/gt/mock_engine.c            |  6 ++++++
>>>>>>    .../gpu/drm/i915/gt/uc/intel_guc_submission.c    | 16 
>>>>>> ++++++++++++++++
>>>>>>    drivers/gpu/drm/i915/i915_request.c              |  4 +++-
>>>>>>    6 files changed, 39 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h 
>>>>>> b/drivers/gpu/drm/i915/gt/intel_engine_types.h
>>>>>> index 86302e6d86b2..e2b5cda6dbc4 100644
>>>>>> --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
>>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
>>>>>> @@ -389,6 +389,8 @@ struct intel_engine_cs {
>>>>>>        void        (*park)(struct intel_engine_cs *engine);
>>>>>>        void        (*unpark)(struct intel_engine_cs *engine);
>>>>>> +    void        (*bump_serial)(struct intel_engine_cs *engine);
>>>>>> +
>>>>>>        void        (*set_default_submission)(struct 
>>>>>> intel_engine_cs *engine);
>>>>>>        const struct intel_context_ops *cops;
>>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
>>>>>> b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
>>>>>> index ae12d7f19ecd..02880ea5d693 100644
>>>>>> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
>>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
>>>>>> @@ -3199,6 +3199,11 @@ static void execlists_release(struct 
>>>>>> intel_engine_cs *engine)
>>>>>>        lrc_fini_wa_ctx(engine);
>>>>>>    }
>>>>>> +static void execlist_bump_serial(struct intel_engine_cs *engine)
>>>>>> +{
>>>>>> +    engine->serial++;
>>>>>> +}
>>>>>> +
>>>>>>    static void
>>>>>>    logical_ring_default_vfuncs(struct intel_engine_cs *engine)
>>>>>>    {
>>>>>> @@ -3208,6 +3213,7 @@ logical_ring_default_vfuncs(struct 
>>>>>> intel_engine_cs *engine)
>>>>>>        engine->cops = &execlists_context_ops;
>>>>>>        engine->request_alloc = execlists_request_alloc;
>>>>>> +    engine->bump_serial = execlist_bump_serial;
>>>>>>        engine->reset.prepare = execlists_reset_prepare;
>>>>>>        engine->reset.rewind = execlists_reset_rewind;
>>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c 
>>>>>> b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
>>>>>> index 14aa31879a37..39dd7c4ed0a9 100644
>>>>>> --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
>>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
>>>>>> @@ -1045,6 +1045,11 @@ static void setup_irq(struct 
>>>>>> intel_engine_cs *engine)
>>>>>>        }
>>>>>>    }
>>>>>> +static void ring_bump_serial(struct intel_engine_cs *engine)
>>>>>> +{
>>>>>> +    engine->serial++;
>>>>>> +}
>>>>>> +
>>>>>>    static void setup_common(struct intel_engine_cs *engine)
>>>>>>    {
>>>>>>        struct drm_i915_private *i915 = engine->i915;
>>>>>> @@ -1064,6 +1069,7 @@ static void setup_common(struct 
>>>>>> intel_engine_cs *engine)
>>>>>>        engine->cops = &ring_context_ops;
>>>>>>        engine->request_alloc = ring_request_alloc;
>>>>>> +    engine->bump_serial = ring_bump_serial;
>>>>>>        /*
>>>>>>         * Using a global execution timeline; the previous final 
>>>>>> breadcrumb is
>>>>>> diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c 
>>>>>> b/drivers/gpu/drm/i915/gt/mock_engine.c
>>>>>> index bd005c1b6fd5..97b10fd60b55 100644
>>>>>> --- a/drivers/gpu/drm/i915/gt/mock_engine.c
>>>>>> +++ b/drivers/gpu/drm/i915/gt/mock_engine.c
>>>>>> @@ -292,6 +292,11 @@ static void mock_engine_release(struct 
>>>>>> intel_engine_cs *engine)
>>>>>>        intel_engine_fini_retire(engine);
>>>>>>    }
>>>>>> +static void mock_bump_serial(struct intel_engine_cs *engine)
>>>>>> +{
>>>>>> +    engine->serial++;
>>>>>> +}
>>>>>> +
>>>>>>    struct intel_engine_cs *mock_engine(struct drm_i915_private 
>>>>>> *i915,
>>>>>>                        const char *name,
>>>>>>                        int id)
>>>>>> @@ -318,6 +323,7 @@ struct intel_engine_cs *mock_engine(struct 
>>>>>> drm_i915_private *i915,
>>>>>>        engine->base.cops = &mock_context_ops;
>>>>>>        engine->base.request_alloc = mock_request_alloc;
>>>>>> +    engine->base.bump_serial = mock_bump_serial;
>>>>>>        engine->base.emit_flush = mock_emit_flush;
>>>>>>        engine->base.emit_fini_breadcrumb = mock_emit_breadcrumb;
>>>>>>        engine->base.submit_request = mock_submit_request;
>>>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
>>>>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>>>> index dc79d287c50a..f0e5731bcef6 100644
>>>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>>>> @@ -1500,6 +1500,20 @@ static void guc_release(struct 
>>>>>> intel_engine_cs *engine)
>>>>>>        lrc_fini_wa_ctx(engine);
>>>>>>    }
>>>>>> +static void guc_bump_serial(struct intel_engine_cs *engine)
>>>>>> +{
>>>>>> +    engine->serial++;
>>>>>> +}
>>>>>> +
>>>>>> +static void virtual_guc_bump_serial(struct intel_engine_cs *engine)
>>>>>> +{
>>>>>> +    struct intel_engine_cs *e;
>>>>>> +    intel_engine_mask_t tmp, mask = engine->mask;
>>>>>> +
>>>>>> +    for_each_engine_masked(e, engine->gt, mask, tmp)
>>>>>> +        e->serial++;
>>>>>> +}
>>>>>> +
>>>>>>    static void guc_default_vfuncs(struct intel_engine_cs *engine)
>>>>>>    {
>>>>>>        /* Default vfuncs which can be overridden by each engine. */
>>>>>> @@ -1508,6 +1522,7 @@ static void guc_default_vfuncs(struct 
>>>>>> intel_engine_cs *engine)
>>>>>>        engine->cops = &guc_context_ops;
>>>>>>        engine->request_alloc = guc_request_alloc;
>>>>>> +    engine->bump_serial = guc_bump_serial;
>>>>>>        engine->sched_engine->schedule = i915_schedule;
>>>>>> @@ -1843,6 +1858,7 @@ guc_create_virtual(struct intel_engine_cs 
>>>>>> **siblings, unsigned int count)
>>>>>>        ve->base.cops = &virtual_guc_context_ops;
>>>>>>        ve->base.request_alloc = guc_request_alloc;
>>>>>> +    ve->base.bump_serial = virtual_guc_bump_serial;
>>>>>>        ve->base.submit_request = guc_submit_request;
>>>>>> diff --git a/drivers/gpu/drm/i915/i915_request.c 
>>>>>> b/drivers/gpu/drm/i915/i915_request.c
>>>>>> index 9542a5baa45a..127d60b36422 100644
>>>>>> --- a/drivers/gpu/drm/i915/i915_request.c
>>>>>> +++ b/drivers/gpu/drm/i915/i915_request.c
>>>>>> @@ -692,7 +692,9 @@ bool __i915_request_submit(struct 
>>>>>> i915_request *request)
>>>>>>                         request->ring->vaddr + request->postfix);
>>>>>>        trace_i915_request_execute(request);
>>>>>> -    engine->serial++;
>>>>>> +    if (engine->bump_serial)
>>>>>> +        engine->bump_serial(engine);
>>>>>> +
>>>>>>        result = true;
>>>>>>        GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, 
>>>>>> &request->fence.flags));
>>>>>>
>>



^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 24/97] drm/i915/guc: Add flag for mark broken CTB
  2021-05-06 19:13 ` [RFC PATCH 24/97] drm/i915/guc: Add flag for mark broken CTB Matthew Brost
@ 2021-05-27 19:44   ` Matthew Brost
  0 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-05-27 19:44 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: tvrtko.ursulin, daniele.ceraolospurio, jason.ekstrand,
	jon.bloomfield, daniel.vetter, john.c.harrison

On Thu, May 06, 2021 at 12:13:38PM -0700, Matthew Brost wrote:
> From: Michal Wajdeczko <michal.wajdeczko@intel.com>
> 
> Once CTB descriptor is found in error state, either set by GuC
> or us, there is no need continue checking descriptor any more,
> we can rely on our internal flag.
> 
> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> Cc: Piotr Piórkowski <piotr.piorkowski@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 13 +++++++++++--
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  2 ++
>  2 files changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index 1afdeac683b5..178f73ab2c96 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -123,6 +123,7 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc,
>  
>  static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb, u32 cmds_addr)
>  {
> +	ctb->broken = false;
>  	guc_ct_buffer_desc_init(ctb->desc, cmds_addr, ctb->size);
>  }
>  
> @@ -365,9 +366,12 @@ static int ct_write(struct intel_guc_ct *ct,
>  	u32 *cmds = ctb->cmds;
>  	unsigned int i;
>  
> -	if (unlikely(desc->is_in_error))
> +	if (unlikely(ctb->broken))
>  		return -EPIPE;
>  
> +	if (unlikely(desc->is_in_error))
> +		goto corrupted;
> +
>  	if (unlikely(!IS_ALIGNED(head | tail, 4) ||
>  		     (tail | head) >= size))
>  		goto corrupted;
> @@ -423,6 +427,7 @@ static int ct_write(struct intel_guc_ct *ct,
>  	CT_ERROR(ct, "Corrupted descriptor addr=%#x head=%u tail=%u size=%u\n",
>  		 desc->addr, desc->head, desc->tail, desc->size);
>  	desc->is_in_error = 1;
> +	ctb->broken = true;
>  	return -EPIPE;
>  }
>  
> @@ -608,9 +613,12 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>  	unsigned int i;
>  	u32 header;
>  
> -	if (unlikely(desc->is_in_error))
> +	if (unlikely(ctb->broken))
>  		return -EPIPE;
>  
> +	if (unlikely(desc->is_in_error))
> +		goto corrupted;
> +
>  	if (unlikely(!IS_ALIGNED(head | tail, 4) ||
>  		     (tail | head) >= size))
>  		goto corrupted;
> @@ -674,6 +682,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>  	CT_ERROR(ct, "Corrupted descriptor addr=%#x head=%u tail=%u size=%u\n",
>  		 desc->addr, desc->head, desc->tail, desc->size);
>  	desc->is_in_error = 1;
> +	ctb->broken = true;
>  	return -EPIPE;
>  }
>  
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> index cb222f202301..7d3cd375d6a7 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> @@ -32,12 +32,14 @@ struct intel_guc;
>   * @desc: pointer to the buffer descriptor
>   * @cmds: pointer to the commands buffer
>   * @size: size of the commands buffer
> + * @broken: flag to indicate if descriptor data is broken
>   */
>  struct intel_guc_ct_buffer {
>  	spinlock_t lock;
>  	struct guc_ct_buffer_desc *desc;
>  	u32 *cmds;
>  	u32 size;
> +	bool broken;
>  };
>  
>  
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [RFC PATCH 46/97] drm/i915/guc: Implement GuC context operations for new inteface
  2021-05-06 19:14 ` [RFC PATCH 46/97] drm/i915/guc: Implement GuC context operations for new inteface Matthew Brost
@ 2021-05-29 20:32   ` Michal Wajdeczko
  0 siblings, 0 replies; 249+ messages in thread
From: Michal Wajdeczko @ 2021-05-29 20:32 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel
  Cc: tvrtko.ursulin, jason.ekstrand, daniele.ceraolospurio,
	jon.bloomfield, daniel.vetter, john.c.harrison



On 06.05.2021 21:14, Matthew Brost wrote:
> Implement GuC context operations which includes GuC specific operations
> pin, unpin, and destroy.
> 
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_context.c       |   5 +
>  drivers/gpu/drm/i915/gt/intel_context_types.h |  22 +-
>  drivers/gpu/drm/i915/gt/intel_lrc_reg.h       |   1 -
>  drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  34 +
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |   7 +
>  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 663 ++++++++++++++++--
>  drivers/gpu/drm/i915/i915_reg.h               |   1 +
>  drivers/gpu/drm/i915/i915_request.c           |   1 +
>  8 files changed, 680 insertions(+), 54 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
> index 4033184f13b9..2b68af16222c 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context.c
> +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> @@ -383,6 +383,11 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
>  
>  	mutex_init(&ce->pin_mutex);
>  
> +	spin_lock_init(&ce->guc_state.lock);
> +
> +	ce->guc_id = GUC_INVALID_LRC_ID;
> +	INIT_LIST_HEAD(&ce->guc_id_link);
> +
>  	i915_active_init(&ce->active,
>  			 __intel_context_active, __intel_context_retire, 0);
>  }
> diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
> index bb6fef7eae52..ce7c69b34cd1 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> @@ -95,6 +95,7 @@ struct intel_context {
>  #define CONTEXT_BANNED			6
>  #define CONTEXT_FORCE_SINGLE_SUBMISSION	7
>  #define CONTEXT_NOPREEMPT		8
> +#define CONTEXT_LRCA_DIRTY		9
>  
>  	struct {
>  		u64 timeout_us;
> @@ -137,14 +138,29 @@ struct intel_context {
>  
>  	u8 wa_bb_page; /* if set, page num reserved for context workarounds */
>  
> +	struct {
> +		/** lock: protects everything in guc_state */
> +		spinlock_t lock;
> +		/**
> +		 * sched_state: scheduling state of this context using GuC
> +		 * submission
> +		 */
> +		u8 sched_state;
> +	} guc_state;
> +
>  	/* GuC scheduling state that does not require a lock. */
>  	atomic_t guc_sched_state_no_lock;
>  
> +	/* GuC lrc descriptor ID */
> +	u16 guc_id;
> +
> +	/* GuC lrc descriptor reference count */
> +	atomic_t guc_id_ref;
> +
>  	/*
> -	 * GuC lrc descriptor ID - Not assigned in this patch but future patches
> -	 * in the series will.
> +	 * GuC ID link - in list when unpinned but guc_id still valid in GuC

nit: now there is no need for a multi-line comment
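
i.e. it could simply be:

	/* GuC ID link - in list when unpinned but guc_id still valid in GuC */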

>  	 */
> -	u16 guc_id;
> +	struct list_head guc_id_link;
>  };
>  
>  #endif /* __INTEL_CONTEXT_TYPES__ */
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
> index 41e5350a7a05..49d4857ad9b7 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
> @@ -87,7 +87,6 @@
>  #define GEN11_CSB_WRITE_PTR_MASK	(GEN11_CSB_PTR_MASK << 0)
>  
>  #define MAX_CONTEXT_HW_ID	(1 << 21) /* exclusive */
> -#define MAX_GUC_CONTEXT_HW_ID	(1 << 20) /* exclusive */
>  #define GEN11_MAX_CONTEXT_HW_ID	(1 << 11) /* exclusive */
>  /* in Gen12 ID 0x7FF is reserved to indicate idle */
>  #define GEN12_MAX_CONTEXT_HW_ID	(GEN11_MAX_CONTEXT_HW_ID - 1)
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> index d32866fe90ad..85ff32bfd074 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> @@ -45,6 +45,14 @@ struct intel_guc {
>  		void (*disable)(struct intel_guc *guc);
>  	} interrupts;
>  
> +	/*
> +	 * contexts_lock protects the pool of free guc ids and a linked list of
> +	 * guc ids available to be stolden

typo

> +	 */
> +	spinlock_t contexts_lock;
> +	struct ida guc_ids;
> +	struct list_head guc_id_list;
> +
>  	bool submission_selected;
>  
>  	struct i915_vma *ads_vma;
> @@ -103,6 +111,29 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len,
>  				 response_buf, response_buf_size, 0);
>  }
>  
> +static inline int intel_guc_send_busy_loop(struct intel_guc* guc,
> +					   const u32 *action,
> +					   u32 len,
> +					   bool loop)
> +{
> +	int err;
> +
> +	/* No sleeping with spin locks, just busy loop */
> +	might_sleep_if(loop && (!in_atomic() && !irqs_disabled()));
> +
> +retry:
> +	err = intel_guc_send_nb(guc, action, len);
> +	if (unlikely(err == -EBUSY && loop)) {
> +		if (likely(!in_atomic() && !irqs_disabled()))
> +			cond_resched();
> +		else
> +			cpu_relax();
> +		goto retry;
> +	}
> +
> +	return err;
> +}
> +
>  static inline void intel_guc_to_host_event_handler(struct intel_guc *guc)
>  {
>  	intel_guc_ct_event_handler(&guc->ct);
> @@ -204,6 +235,9 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask)
>  int intel_guc_reset_engine(struct intel_guc *guc,
>  			   struct intel_engine_cs *engine);
>  
> +int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
> +					  const u32 *msg, u32 len);
> +
>  void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p);
>  
>  #endif
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index 586e6efc3558..51c5efdf543a 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -893,6 +893,13 @@ static int ct_process_request(struct intel_guc_ct *ct, struct ct_incoming_msg *r
>  	case INTEL_GUC_ACTION_DEFAULT:
>  		ret = intel_guc_to_host_process_recv_msg(guc, payload, len);
>  		break;
> +	case INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE:
> +		ret = intel_guc_deregister_done_process_msg(guc, payload,
> +							    len);
> +		if (unlikely(ret))
> +			CT_ERROR(ct, "deregister context failed %x %*ph\n",
> +				  action, 4 * len, payload);

errors like this should be printed directly from GuC submission code,
not from here and definitely not as CT_ERROR

and btw, the handler function should rather return an error only if there was
a CTB related failure (like a truncated/corrupted message)
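
i.e. something like this (sketch only, not tested):

	case INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE:
		ret = intel_guc_deregister_done_process_msg(guc, payload, len);
		break;

with any "context not found" style problems reported (and swallowed) by
intel_guc_deregister_done_process_msg() itself over in
intel_guc_submission.c, so only genuine CTB failures propagate back here.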

> +		break;
>  	default:
>  		ret = -EOPNOTSUPP;
>  		break;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 2fd83562c1d1..eada9ffc1a54 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -13,7 +13,9 @@
>  #include "gt/intel_gt.h"
>  #include "gt/intel_gt_irq.h"
>  #include "gt/intel_gt_pm.h"
> +#include "gt/intel_gt_requests.h"
>  #include "gt/intel_lrc.h"
> +#include "gt/intel_lrc_reg.h"
>  #include "gt/intel_mocs.h"
>  #include "gt/intel_ring.h"
>  
> @@ -84,6 +86,73 @@ static inline void clr_context_enabled(struct intel_context *ce)
>  		   &ce->guc_sched_state_no_lock);
>  }
>  
> +/*
> + * Below is a set of functions which control the GuC scheduling state which
> + * require a lock, aside from the special case where the functions are called
> + * from guc_lrc_desc_pin(). In that case it isn't possible for any other code
> + * path to be executing on the context.
> + */
> +#define SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER	BIT(0)
> +#define SCHED_STATE_DESTROYED				BIT(1)
> +static inline void init_sched_state(struct intel_context *ce)
> +{
> +	/* Only should be called from guc_lrc_desc_pin() */
> +	atomic_set(&ce->guc_sched_state_no_lock, 0);
> +	ce->guc_state.sched_state = 0;
> +}
> +
> +static inline bool
> +context_wait_for_deregister_to_register(struct intel_context *ce)
> +{
> +	return (ce->guc_state.sched_state &
> +		SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER);
> +}
> +
> +static inline void
> +set_context_wait_for_deregister_to_register(struct intel_context *ce)
> +{
> +	/* Only should be called from guc_lrc_desc_pin() */
> +	ce->guc_state.sched_state |=
> +		SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER;
> +}
> +
> +static inline void
> +clr_context_wait_for_deregister_to_register(struct intel_context *ce)
> +{
> +	lockdep_assert_held(&ce->guc_state.lock);
> +	ce->guc_state.sched_state =
> +		(ce->guc_state.sched_state &
> +		 ~SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER);
> +}
> +
> +static inline bool
> +context_destroyed(struct intel_context *ce)
> +{
> +	return (ce->guc_state.sched_state & SCHED_STATE_DESTROYED);
> +}
> +
> +static inline void
> +set_context_destroyed(struct intel_context *ce)
> +{
> +	lockdep_assert_held(&ce->guc_state.lock);
> +	ce->guc_state.sched_state |= SCHED_STATE_DESTROYED;
> +}
> +
> +static inline bool context_guc_id_invalid(struct intel_context *ce)
> +{
> +	return (ce->guc_id == GUC_INVALID_LRC_ID);
> +}
> +
> +static inline void set_context_guc_id_invalid(struct intel_context *ce)
> +{
> +	ce->guc_id = GUC_INVALID_LRC_ID;
> +}
> +
> +static inline struct intel_guc *ce_to_guc(struct intel_context *ce)
> +{
> +	return &ce->engine->gt->uc.guc;
> +}
> +
>  static inline struct i915_priolist *to_priolist(struct rb_node *rb)
>  {
>  	return rb_entry(rb, struct i915_priolist, node);
> @@ -154,6 +223,9 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
>  	int len = 0;
>  	bool enabled = context_enabled(ce);
>  
> +	GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
> +	GEM_BUG_ON(context_guc_id_invalid(ce));
> +
>  	if (!enabled) {
>  		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
>  		action[len++] = ce->guc_id;
> @@ -420,6 +492,10 @@ int intel_guc_submission_init(struct intel_guc *guc)
>  
>  	xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ);
>  
> +	spin_lock_init(&guc->contexts_lock);
> +	INIT_LIST_HEAD(&guc->guc_id_list);
> +	ida_init(&guc->guc_ids);
> +
>  	return 0;
>  }
>  
> @@ -432,9 +508,303 @@ void intel_guc_submission_fini(struct intel_guc *guc)
>  	i915_sched_engine_put(guc->sched_engine);
>  }
>  
> -static int guc_context_alloc(struct intel_context *ce)
> +static inline void queue_request(struct i915_sched_engine *sched_engine,
> +				 struct i915_request *rq,
> +				 int prio)
>  {
> -	return lrc_alloc(ce, ce->engine);
> +	GEM_BUG_ON(!list_empty(&rq->sched.link));
> +	list_add_tail(&rq->sched.link,
> +		      i915_sched_lookup_priolist(sched_engine, prio));
> +	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> +}
> +
> +static int guc_bypass_tasklet_submit(struct intel_guc *guc,
> +				     struct i915_request *rq)
> +{
> +	int ret;
> +
> +	__i915_request_submit(rq);
> +
> +	trace_i915_request_in(rq, 0);
> +
> +	guc_set_lrc_tail(rq);
> +	ret = guc_add_request(guc, rq);
> +	if (ret == -EBUSY)
> +		guc->stalled_request = rq;
> +
> +	return ret;
> +}
> +
> +static void guc_submit_request(struct i915_request *rq)
> +{
> +	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
> +	struct intel_guc *guc = &rq->engine->gt->uc.guc;
> +	unsigned long flags;
> +
> +	/* Will be called from irq-context when using foreign fences. */
> +	spin_lock_irqsave(&sched_engine->lock, flags);
> +
> +	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
> +		queue_request(sched_engine, rq, rq_prio(rq));
> +	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
> +		i915_sched_engine_hi_kick(sched_engine);
> +
> +	spin_unlock_irqrestore(&sched_engine->lock, flags);
> +}
> +
> +#define GUC_ID_START	64	/* First 64 guc_ids reserved */
> +static int new_guc_id(struct intel_guc *guc)
> +{
> +	return ida_simple_get(&guc->guc_ids, GUC_ID_START,
> +			      GUC_MAX_LRC_DESCRIPTORS, GFP_KERNEL |
> +			      __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
> +}
> +
> +static void __release_guc_id(struct intel_guc *guc, struct intel_context *ce)
> +{
> +	if (!context_guc_id_invalid(ce)) {
> +		ida_simple_remove(&guc->guc_ids, ce->guc_id);
> +		reset_lrc_desc(guc, ce->guc_id);
> +		set_context_guc_id_invalid(ce);
> +	}
> +	if (!list_empty(&ce->guc_id_link))
> +		list_del_init(&ce->guc_id_link);
> +}
> +
> +static void release_guc_id(struct intel_guc *guc, struct intel_context *ce)
> +{
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&guc->contexts_lock, flags);
> +	__release_guc_id(guc, ce);
> +	spin_unlock_irqrestore(&guc->contexts_lock, flags);
> +}
> +
> +static int steal_guc_id(struct intel_guc *guc)
> +{
> +	struct intel_context *ce;
> +	int guc_id;
> +
> +	if (!list_empty(&guc->guc_id_list)) {
> +		ce = list_first_entry(&guc->guc_id_list,
> +				      struct intel_context,
> +				      guc_id_link);
> +
> +		GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
> +		GEM_BUG_ON(context_guc_id_invalid(ce));
> +
> +		list_del_init(&ce->guc_id_link);
> +		guc_id = ce->guc_id;
> +		set_context_guc_id_invalid(ce);
> +		return guc_id;
> +	} else {
> +		return -EAGAIN;
> +	}
> +}
> +
> +static int assign_guc_id(struct intel_guc *guc, u16 *out)
> +{
> +	int ret;
> +
> +	ret = new_guc_id(guc);
> +	if (unlikely(ret < 0)) {
> +		ret = steal_guc_id(guc);
> +		if (ret < 0)
> +			return ret;
> +	}
> +
> +	*out = ret;
> +	return 0;
> +}
> +
> +#define PIN_GUC_ID_TRIES	4
> +static int pin_guc_id(struct intel_guc *guc, struct intel_context *ce)
> +{
> +	int ret = 0;
> +	unsigned long flags, tries = PIN_GUC_ID_TRIES;
> +
> +	GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
> +
> +try_again:
> +	spin_lock_irqsave(&guc->contexts_lock, flags);
> +
> +	if (context_guc_id_invalid(ce)) {
> +		ret = assign_guc_id(guc, &ce->guc_id);
> +		if (ret)
> +			goto out_unlock;
> +		ret = 1;	// Indidcates newly assigned HW context
> +	}
> +	if (!list_empty(&ce->guc_id_link))
> +		list_del_init(&ce->guc_id_link);
> +	atomic_inc(&ce->guc_id_ref);
> +
> +out_unlock:
> +	spin_unlock_irqrestore(&guc->contexts_lock, flags);
> +
> +	/*
> +	 * -EAGAIN indicates no guc_ids are available, let's retire any
> +	 * outstanding requests to see if that frees up a guc_id. If the first
> +	 * retire didn't help, insert a sleep with the timeslice duration before
> +	 * attempting to retire more requests. Double the sleep period each
> +	 * subsequent pass before finally giving up. The sleep period has max of
> +	 * 100ms and minimum of 1ms.
> +	 */
> +	if (ret == -EAGAIN && --tries) {
> +		if (PIN_GUC_ID_TRIES - tries > 1) {
> +			unsigned int timeslice_shifted =
> +				ce->engine->props.timeslice_duration_ms <<
> +				(PIN_GUC_ID_TRIES - tries - 2);
> +			unsigned int max = min_t(unsigned int, 100,
> +						 timeslice_shifted);
> +
> +			msleep(max_t(unsigned int, max, 1));
> +		}
> +		intel_gt_retire_requests(guc_to_gt(guc));
> +		goto try_again;
> +	}
> +
> +	return ret;
> +}
> +
> +static void unpin_guc_id(struct intel_guc *guc, struct intel_context *ce)
> +{
> +	unsigned long flags;
> +
> +	GEM_BUG_ON(atomic_read(&ce->guc_id_ref) < 0);
> +
> +	spin_lock_irqsave(&guc->contexts_lock, flags);
> +	if (!context_guc_id_invalid(ce) && list_empty(&ce->guc_id_link) &&
> +	    !atomic_read(&ce->guc_id_ref))
> +		list_add_tail(&ce->guc_id_link, &guc->guc_id_list);
> +	spin_unlock_irqrestore(&guc->contexts_lock, flags);
> +}
> +
> +static int __guc_action_register_context(struct intel_guc *guc,
> +					 u32 guc_id,
> +					 u32 offset)
> +{
> +	u32 action[] = {
> +		INTEL_GUC_ACTION_REGISTER_CONTEXT,
> +		guc_id,
> +		offset,
> +	};
> +
> +	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
> +}
> +
> +static int register_context(struct intel_context *ce)
> +{
> +	struct intel_guc *guc = ce_to_guc(ce);
> +	u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
> +		ce->guc_id * sizeof(struct guc_lrc_desc);
> +
> +	return __guc_action_register_context(guc, ce->guc_id, offset);
> +}
> +
> +static int __guc_action_deregister_context(struct intel_guc *guc,
> +					   u32 guc_id)
> +{
> +	u32 action[] = {
> +		INTEL_GUC_ACTION_DEREGISTER_CONTEXT,
> +		guc_id,
> +	};
> +
> +	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
> +}
> +
> +static int deregister_context(struct intel_context *ce, u32 guc_id)
> +{
> +	struct intel_guc *guc = ce_to_guc(ce);
> +
> +	return __guc_action_deregister_context(guc, guc_id);
> +}
> +
> +static intel_engine_mask_t adjust_engine_mask(u8 class, intel_engine_mask_t mask)
> +{
> +	switch (class) {
> +	case RENDER_CLASS:
> +		return mask >> RCS0;
> +	case VIDEO_ENHANCEMENT_CLASS:
> +		return mask >> VECS0;
> +	case VIDEO_DECODE_CLASS:
> +		return mask >> VCS0;
> +	case COPY_ENGINE_CLASS:
> +		return mask >> BCS0;
> +	default:
> +		GEM_BUG_ON("Invalid Class");
> +		return 0;
> +	}
> +}
> +
> +static void guc_context_policy_init(struct intel_engine_cs *engine,
> +				    struct guc_lrc_desc *desc)
> +{
> +	desc->policy_flags = 0;
> +
> +	desc->execution_quantum = CONTEXT_POLICY_DEFAULT_EXECUTION_QUANTUM_US;
> +	desc->preemption_timeout = CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US;
> +}
> +
> +static int guc_lrc_desc_pin(struct intel_context *ce)
> +{
> +	struct intel_runtime_pm *runtime_pm =
> +		&ce->engine->gt->i915->runtime_pm;
> +	struct intel_engine_cs *engine = ce->engine;
> +	struct intel_guc *guc = &engine->gt->uc.guc;

can you reorder/optimize the locals above?
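
e.g. (just a suggestion):

	struct intel_engine_cs *engine = ce->engine;
	struct intel_runtime_pm *runtime_pm = &engine->gt->i915->runtime_pm;
	struct intel_guc *guc = &engine->gt->uc.guc;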

> +	u32 desc_idx = ce->guc_id;
> +	struct guc_lrc_desc *desc;
> +	bool context_registered;
> +	intel_wakeref_t wakeref;
> +	int ret = 0;
> +
> +	GEM_BUG_ON(!engine->mask);
> +
> +	/*
> +	 * Ensure LRC + CT vmas are is same region as write barrier is done
> +	 * based on CT vma region.
> +	 */
> +	GEM_BUG_ON(i915_gem_object_is_lmem(guc->ct.vma->obj) !=
> +		   i915_gem_object_is_lmem(ce->ring->vma->obj));
> +
> +	context_registered = lrc_desc_registered(guc, desc_idx);
> +
> +	reset_lrc_desc(guc, desc_idx);
> +	set_lrc_desc_registered(guc, desc_idx, ce);
> +
> +	desc = __get_lrc_desc(guc, desc_idx);
> +	desc->engine_class = engine_class_to_guc_class(engine->class);
> +	desc->engine_submit_mask = adjust_engine_mask(engine->class,
> +						      engine->mask);
> +	desc->hw_context_desc = ce->lrc.lrca;
> +	desc->priority = GUC_CLIENT_PRIORITY_KMD_NORMAL;
> +	desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
> +	guc_context_policy_init(engine, desc);
> +	init_sched_state(ce);
> +
> +	/*
> +	 * The context_lookup xarray is used to determine if the hardware
> +	 * context is currently registered. There are two cases in which it
> +	 * could be regisgered either the guc_id has been stole from from

typo

> +	 * another context or the lrc descriptor address of this context has
> +	 * changed. In either case the context needs to be deregistered with the
> +	 * GuC before registering this context.
> +	 */
> +	if (context_registered) {
> +		set_context_wait_for_deregister_to_register(ce);
> +		intel_context_get(ce);
> +
> +		/*
> +		 * If stealing the guc_id, this ce has the same guc_id as the
> +		 * context whos guc_id was stole.
> +		 */
> +		with_intel_runtime_pm(runtime_pm, wakeref)
> +			ret = deregister_context(ce, ce->guc_id);
> +	} else {
> +		with_intel_runtime_pm(runtime_pm, wakeref)
> +			ret = register_context(ce);
> +	}
> +
> +	return ret;
>  }
>  
>  static int guc_context_pre_pin(struct intel_context *ce,
> @@ -446,36 +816,137 @@ static int guc_context_pre_pin(struct intel_context *ce,
>  
>  static int guc_context_pin(struct intel_context *ce, void *vaddr)
>  {
> +	if (i915_ggtt_offset(ce->state) !=
> +	    (ce->lrc.lrca & CTX_GTT_ADDRESS_MASK))
> +		set_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
> +
>  	return lrc_pin(ce, ce->engine, vaddr);
>  }
>  
> +static void guc_context_unpin(struct intel_context *ce)
> +{
> +	unpin_guc_id(ce_to_guc(ce), ce);
> +	lrc_unpin(ce);
> +}
> +
> +static void guc_context_post_unpin(struct intel_context *ce)
> +{
> +	lrc_post_unpin(ce);
> +}
> +
> +static inline void guc_lrc_desc_unpin(struct intel_context *ce)
> +{
> +	struct intel_engine_cs *engine = ce->engine;
> +	struct intel_guc *guc = &engine->gt->uc.guc;
> +	unsigned long flags;
> +
> +	GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id));
> +	GEM_BUG_ON(ce != __get_context(guc, ce->guc_id));
> +
> +	spin_lock_irqsave(&ce->guc_state.lock, flags);
> +	set_context_destroyed(ce);
> +	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> +
> +	deregister_context(ce, ce->guc_id);
> +}
> +
> +static void guc_context_destroy(struct kref *kref)
> +{
> +	struct intel_context *ce = container_of(kref, typeof(*ce), ref);
> +	struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm;
> +	struct intel_guc *guc = &ce->engine->gt->uc.guc;
> +	intel_wakeref_t wakeref;
> +	unsigned long flags;
> +
> +	/*
> +	 * If the guc_id is invalid this context has been stolen and we can free
> +	 * it immediately. Also can be freed immediately if the context is not
> +	 * registered with the GuC.
> +	 */
> +	if (context_guc_id_invalid(ce) ||
> +	    !lrc_desc_registered(guc, ce->guc_id)) {
> +		release_guc_id(guc, ce);
> +		lrc_destroy(kref);
> +		return;
> +	}
> +
> +	/*
> +	 * We have to acquire the context spinlock and check guc_id again, if it
> +	 * is valid it hasn't been stolen and needs to be deregistered. We
> +	 * delete this context from the list of unpinned guc_ids available to
> +	 * stole to seal a race with guc_lrc_desc_pin(). When the G2H CTB
> +	 * returns indicating this context has been deregistered the guc_id is
> +	 * returned to the pool of available guc_ids.
> +	 */
> +	spin_lock_irqsave(&guc->contexts_lock, flags);
> +	if (context_guc_id_invalid(ce)) {
> +		__release_guc_id(guc, ce);
> +		spin_unlock_irqrestore(&guc->contexts_lock, flags);
> +		lrc_destroy(kref);
> +		return;
> +	}
> +
> +	if (!list_empty(&ce->guc_id_link))
> +		list_del_init(&ce->guc_id_link);
> +	spin_unlock_irqrestore(&guc->contexts_lock, flags);
> +
> +	/*
> +	 * We defer GuC context deregistration until the context is destroyed
> +	 * in order to save on CTBs. With this optimization ideally we only need
> +	 * 1 CTB to register the context during the first pin and 1 CTB to
> +	 * deregister the context when the context is destroyed. Without this
> +	 * optimization, a CTB would be needed every pin & unpin.
> +	 *
> +	 * XXX: Need to acqiure the runtime wakeref as this can be triggered
> +	 * from context_free_worker when not runtime wakeref is held.
> +	 * guc_lrc_desc_unpin requires the runtime as a GuC register is written
> +	 * in H2G CTB to deregister the context. A future patch may defer this
> +	 * H2G CTB if the runtime wakeref is zero.
> +	 */
> +	with_intel_runtime_pm(runtime_pm, wakeref)
> +		guc_lrc_desc_unpin(ce);
> +}
> +
> +static int guc_context_alloc(struct intel_context *ce)
> +{
> +	return lrc_alloc(ce, ce->engine);
> +}
> +
>  static const struct intel_context_ops guc_context_ops = {
>  	.alloc = guc_context_alloc,
>  
>  	.pre_pin = guc_context_pre_pin,
>  	.pin = guc_context_pin,
> -	.unpin = lrc_unpin,
> -	.post_unpin = lrc_post_unpin,
> +	.unpin = guc_context_unpin,
> +	.post_unpin = guc_context_post_unpin,
>  
>  	.enter = intel_context_enter_engine,
>  	.exit = intel_context_exit_engine,
>  
>  	.reset = lrc_reset,
> -	.destroy = lrc_destroy,
> +	.destroy = guc_context_destroy,
>  };
>  
> -static int guc_request_alloc(struct i915_request *request)
> +static bool context_needs_register(struct intel_context *ce, bool new_guc_id)
> +{
> +	return new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
> +		!lrc_desc_registered(ce_to_guc(ce), ce->guc_id);
> +}
> +
> +static int guc_request_alloc(struct i915_request *rq)
>  {
> +	struct intel_context *ce = rq->context;
> +	struct intel_guc *guc = ce_to_guc(ce);
>  	int ret;
>  
> -	GEM_BUG_ON(!intel_context_is_pinned(request->context));
> +	GEM_BUG_ON(!intel_context_is_pinned(rq->context));
>  
>  	/*
>  	 * Flush enough space to reduce the likelihood of waiting after
>  	 * we start building the request - in which case we will just
>  	 * have to repeat work.
>  	 */
> -	request->reserved_space += GUC_REQUEST_SIZE;
> +	rq->reserved_space += GUC_REQUEST_SIZE;
>  
>  	/*
>  	 * Note that after this point, we have committed to using
> @@ -486,56 +957,48 @@ static int guc_request_alloc(struct i915_request *request)
>  	 */
>  
>  	/* Unconditionally invalidate GPU caches and TLBs. */
> -	ret = request->engine->emit_flush(request, EMIT_INVALIDATE);
> +	ret = rq->engine->emit_flush(rq, EMIT_INVALIDATE);
>  	if (ret)
>  		return ret;
>  
> -	request->reserved_space -= GUC_REQUEST_SIZE;
> -	return 0;
> -}
> -
> -static inline void queue_request(struct i915_sched_engine *sched_engine,
> -				 struct i915_request *rq,
> -				 int prio)
> -{
> -	GEM_BUG_ON(!list_empty(&rq->sched.link));
> -	list_add_tail(&rq->sched.link,
> -		      i915_sched_lookup_priolist(sched_engine, prio));
> -	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> -}
> -
> -static int guc_bypass_tasklet_submit(struct intel_guc *guc,
> -				     struct i915_request *rq)
> -{
> -	int ret;
> +	rq->reserved_space -= GUC_REQUEST_SIZE;
>  
> -	__i915_request_submit(rq);
> -
> -	trace_i915_request_in(rq, 0);
> -
> -	guc_set_lrc_tail(rq);
> -	ret = guc_add_request(guc, rq);
> -	if (ret == -EBUSY)
> -		guc->stalled_request = rq;
> -
> -	return ret;
> -}
> +	/*
> +	 * Call pin_guc_id here rather than in the pinning step as with
> +	 * dma_resv, contexts can be repeatedly pinned / unpinned trashing the
> +	 * guc_ids and creating horrible race conditions. This is especially bad
> +	 * when guc_ids are being stolen due to over subscription. By the time
> +	 * this function is reached, it is guaranteed that the guc_id will be
> +	 * persistent until the generated request is retired. Thus, sealing these
> +	 * race conditions. It is still safe to fail here if guc_ids are
> +	 * exhausted and return -EAGAIN to the user indicating that they can try
> +	 * again in the future.
> +	 *
> +	 * There is no need for a lock here as the timeline mutex ensures at
> +	 * most one context can be executing this code path at once. The
> +	 * guc_id_ref is incremented once for every request in flight and
> +	 * decremented on each retire. When it is zero, a lock around the
> +	 * increment (in pin_guc_id) is needed to seal a race with unpin_guc_id.
> +	 */
> +	if (atomic_add_unless(&ce->guc_id_ref, 1, 0))
> +		return 0;
>  
> -static void guc_submit_request(struct i915_request *rq)
> -{
> -	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
> -	struct intel_guc *guc = &rq->engine->gt->uc.guc;
> -	unsigned long flags;
> +	ret = pin_guc_id(guc, ce);	/* returns 1 if new guc_id assigned */
> +	if (unlikely(ret < 0))
> +		return ret;;
>  
> -	/* Will be called from irq-context when using foreign fences. */
> -	spin_lock_irqsave(&sched_engine->lock, flags);
> +	if (context_needs_register(ce, !!ret)) {
> +		ret = guc_lrc_desc_pin(ce);
> +		if (unlikely(ret)) {	/* unwind */
> +			atomic_dec(&ce->guc_id_ref);
> +			unpin_guc_id(guc, ce);
> +			return ret;
> +		}
> +	}
>  
> -	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
> -		queue_request(sched_engine, rq, rq_prio(rq));
> -	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
> -		i915_sched_engine_hi_kick(sched_engine);
> +	clear_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
>  
> -	spin_unlock_irqrestore(&sched_engine->lock, flags);
> +	return 0;
>  }
>  
>  static void sanitize_hwsp(struct intel_engine_cs *engine)
> @@ -609,6 +1072,46 @@ static void guc_set_default_submission(struct intel_engine_cs *engine)
>  	engine->submit_request = guc_submit_request;
>  }
>  
> +static inline void guc_kernel_context_pin(struct intel_guc *guc,
> +					  struct intel_context *ce)
> +{
> +	if (context_guc_id_invalid(ce))
> +		pin_guc_id(guc, ce);
> +	guc_lrc_desc_pin(ce);
> +}
> +
> +static inline void guc_init_lrc_mapping(struct intel_guc *guc)
> +{
> +	struct intel_gt *gt = guc_to_gt(guc);
> +	struct intel_engine_cs *engine;
> +	enum intel_engine_id id;
> +
> +	/* make sure all descriptors are clean... */
> +	xa_destroy(&guc->context_lookup);
> +
> +	/*
> +	 * Some contexts might have been pinned before we enabled GuC
> +	 * submission, so we need to add them to the GuC bookeeping.
> +	 * Also, after a reset the GuC we want to make sure that the information
> +	 * shared with GuC is properly reset. The kernel lrcs are not attached
> +	 * to the gem_context, so they need to be added separately.
> +	 *
> +	 * Note: we purposely do not check the error return of
> +	 * guc_lrc_desc_pin, because that function can only fail in two cases.
> +	 * One, if there aren't enough free IDs, but we're guaranteed to have
> +	 * enough here (we're either only pinning a handful of lrc on first boot
> +	 * or we're re-pinning lrcs that were already pinned before the reset).
> +	 * Two, if the GuC has died and CTBs can't make forward progress.
> +	 * Presumably, the GuC should be alive as this function is called on
> +	 * driver load or after a reset. Even if it is dead, another full GPU
> +	 * reset will be triggered and this function would be called again.
> +	 */
> +
> +	for_each_engine(engine, gt, id)
> +		if (engine->kernel_context)
> +			guc_kernel_context_pin(guc, engine->kernel_context);
> +}
> +
>  static void guc_release(struct intel_engine_cs *engine)
>  {
>  	engine->sanitize = NULL; /* no longer in control, nothing to sanitize */
> @@ -721,6 +1224,7 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
>  
>  void intel_guc_submission_enable(struct intel_guc *guc)
>  {
> +	guc_init_lrc_mapping(guc);
>  }
>  
>  void intel_guc_submission_disable(struct intel_guc *guc)
> @@ -746,3 +1250,62 @@ void intel_guc_submission_init_early(struct intel_guc *guc)
>  {
>  	guc->submission_selected = __guc_submission_selected(guc);
>  }
> +
> +static inline struct intel_context *
> +g2h_context_lookup(struct intel_guc *guc, u32 desc_idx)
> +{
> +	struct intel_context *ce;
> +
> +	if (unlikely(desc_idx >= GUC_MAX_LRC_DESCRIPTORS)) {
> +		drm_dbg(&guc_to_gt(guc)->i915->drm,
> +			"Invalid desc_idx %u", desc_idx);
> +		return NULL;
> +	}
> +
> +	ce = __get_context(guc, desc_idx);
> +	if (unlikely(!ce)) {
> +		drm_dbg(&guc_to_gt(guc)->i915->drm,
> +			"Context is NULL, desc_idx %u", desc_idx);
> +		return NULL;
> +	}
> +
> +	return ce;
> +}
> +
> +int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
> +					  const u32 *msg,
> +					  u32 len)
> +{
> +	struct intel_context *ce;
> +	u32 desc_idx = msg[0];
> +
> +	if (unlikely(len < 1)) {
> +		drm_dbg(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);
> +		return -EPROTO;
> +	}
> +
> +	ce = g2h_context_lookup(guc, desc_idx);
> +	if (unlikely(!ce))
> +		return -EPROTO;
> +
> +	if (context_wait_for_deregister_to_register(ce)) {
> +		struct intel_runtime_pm *runtime_pm =
> +			&ce->engine->gt->i915->runtime_pm;
> +		intel_wakeref_t wakeref;
> +
> +		/*
> +		 * Previous owner of this guc_id has been deregistered, now safe
> +		 * to register this context.
> +		 */
> +		with_intel_runtime_pm(runtime_pm, wakeref)
> +			register_context(ce);
> +		clr_context_wait_for_deregister_to_register(ce);
> +		intel_context_put(ce);
> +	} else if (context_destroyed(ce)) {
> +		/* Context has been destroyed */
> +		release_guc_id(guc, ce);
> +		lrc_destroy(&ce->ref);
> +	}
> +
> +	return 0;
> +}
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index 9ffd173f8b7f..db151b522825 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -4127,6 +4127,7 @@ enum {
>  	FAULT_AND_CONTINUE /* Unsupported */
>  };
>  
> +#define CTX_GTT_ADDRESS_MASK GENMASK(31, 12)
>  #define GEN8_CTX_VALID (1 << 0)
>  #define GEN8_CTX_FORCE_PD_RESTORE (1 << 1)
>  #define GEN8_CTX_FORCE_RESTORE (1 << 2)
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 4c0df56e3b86..56860b7d065b 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -419,6 +419,7 @@ bool i915_request_retire(struct i915_request *rq)
>  	 */
>  	if (!list_empty(&rq->sched.link))
>  		remove_from_engine(rq);
> +	atomic_dec(&rq->context->guc_id_ref);
>  	GEM_BUG_ON(!llist_empty(&rq->execute_cb));
>  
>  	__list_del_entry(&rq->link); /* poison neither prev/next (RCU walks) */
> 


* Re: [Intel-gfx] [RFC PATCH 60/97] drm/i915: Track 'serial' counts for virtual engines
  2021-05-27 17:01             ` John Harrison
@ 2021-06-01  9:31               ` Tvrtko Ursulin
  2021-06-02  1:20                 ` John Harrison
  0 siblings, 1 reply; 249+ messages in thread
From: Tvrtko Ursulin @ 2021-06-01  9:31 UTC (permalink / raw)
  To: John Harrison, Matthew Brost
  Cc: jason.ekstrand, daniel.vetter, intel-gfx, dri-devel


On 27/05/2021 18:01, John Harrison wrote:
> On 5/27/2021 01:53, Tvrtko Ursulin wrote:
>> On 26/05/2021 19:45, John Harrison wrote:
>>> On 5/26/2021 01:40, Tvrtko Ursulin wrote:
>>>> On 25/05/2021 18:52, Matthew Brost wrote:
>>>>> On Tue, May 25, 2021 at 11:16:12AM +0100, Tvrtko Ursulin wrote:
>>>>>>
>>>>>> On 06/05/2021 20:14, Matthew Brost wrote:
>>>>>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>>>>>
>>>>>>> The serial number tracking of engines happens at the backend of
>>>>>>> request submission and was expecting to only be given physical
>>>>>>> engines. However, in GuC submission mode, the decomposition of 
>>>>>>> virtual
>>>>>>> to physical engines does not happen in i915. Instead, requests are
>>>>>>> submitted to their virtual engine mask all the way through to the
>>>>>>> hardware (i.e. to GuC). This would mean that the heart beat code
>>>>>>> thinks the physical engines are idle due to the serial number not
>>>>>>> incrementing.
>>>>>>>
>>>>>>> This patch updates the tracking to decompose virtual engines into
>>>>>>> their physical constituents and tracks the request against each. 
>>>>>>> This
>>>>>>> is not entirely accurate as the GuC will only be issuing the request
>>>>>>> to one physical engine. However, it is the best that i915 can do 
>>>>>>> given
>>>>>>> that it has no knowledge of the GuC's scheduling decisions.
>>>>>>
>>>>>> Commit text sounds a bit defeatist. I think instead of making up 
>>>>>> the serial
>>>>>> counts, which has downsides (could you please document in the 
>>>>>> commit what
>>>>>> they are), we should think how to design things properly.
>>>>>>
>>>>>
>>>>> IMO, I don't think fixing serial counts is the scope of this 
>>>>> series. We
>>>>> should focus on getting GuC submission in not cleaning up all the crap
>>>>> that is in the i915. Let's make a note of this though so we can 
>>>>> revisit
>>>>> later.
>>>>
>>>> I will say again - commit message implies it is introducing an 
>>>> unspecified downside by not fully fixing an also unspecified issue. 
>>>> It is completely reasonable, and customary even, to ask for both to 
>>>> be documented in the commit message.
>>> Not sure what exactly is 'unspecified'. I thought the commit message 
>>> described both the problem (heartbeat not running when using virtual 
>>> engines) and the result (heartbeat running on more engines than 
>>> strictly necessary). But in greater detail...
>>>
>>> The serial number tracking is a hack for the heartbeat code to know 
>>> whether an engine is busy or idle, and therefore whether it should be 
>>> pinged for aliveness. Whenever a submission is made to an engine, the 
>>> serial number is incremented. The heartbeat code keeps a copy of the 
>>> value. If the value has changed, the engine is busy and needs to be 
>>> pinged.
>>>
>>> This works fine for execlist mode where virtual engine decomposition 
>>> is done inside i915. It fails miserably for GuC mode where the 
>>> decomposition is done by the hardware. The reason being that the 
>>> heartbeat code only looks at physical engines but the serial count is 
>>> only incremented on the virtual engine. Thus, the heartbeat sees 
>>> everything as idle and does not ping.
>>
>> So hangcheck does not work. Or it works because GuC does it anyway. 
>> Either way, that's one thing to explicitly state in the commit message.
>>
>>> This patch decomposes the virtual engines for the sake of 
>>> incrementing the serial count on each sub-engine in order to keep the 
>>> heartbeat code happy. The downside is that now the heartbeat sees all 
>>> sub-engines as busy rather than only the one the submission actually 
>>> ends up on. There really isn't much that can be done about that. The 
>>> heartbeat code is in i915 not GuC, the scheduler is in GuC not i915. 
>>> The only way to improve it is to either move the heartbeat code into 
>>> GuC as well and completely disable the i915 side, or add some way for 
>>> i915 to interrogate GuC as to which engines are or are not active. 
>>> Technically, we do have both. GuC has (or at least had) an option to 
>>> force a context switch on every execution quantum pre-emption. 
>>> However, that is much, much, more heavy weight than the heartbeat. 
>>> For the latter, we do (almost) have the engine usage statistics for 
>>> PMU and such like. I'm not sure how much effort it would be to wire 
>>> that up to the heartbeat code instead of using the serial count.
>>>
>>> In short, the serial count is ever so slightly inefficient in that it 
>>> causes heartbeat pings on engines which are idle. On the other hand, 
>>> it is way more efficient and simpler than the current alternatives.
>>
>> And the hack to make hangcheck work creates this inefficiency where 
>> heartbeats are sent to idle engines. Which is probably fine just needs 
>> to be explained.
>>
>>> Does that answer the questions?
>>
>> With the two points I re-raise clearly explained, possibly even patch 
>> title changed, yeah. I am just wanting for it to be more easily 
>> obvious to patch reader what it is functionally about - not just what 
>> implementation details have been change but why as well.
>>
> My understanding is that we don't explain every piece of code in minute 
> detail in every checkin email that touches it. I thought my description 
> was already pretty verbose. I've certainly seen way less informative 
> checkins that apparently made it through review without issue.
> 
> Regarding the problem statement, I thought this was fairly clear that 
> the heartbeat was broken for virtual engines:
> 
>     This would mean that the heart beat code
>     thinks the physical engines are idle due to the serial number not
>     incrementing.
> 
> 
> Regarding the inefficiency about heartbeating all physical engines in a 
> virtual engine, again, this seems clear to me:
> 
>     decompose virtual engines into
>     their physical constituents and tracks the request against each. This
>     is not entirely accurate as the GuC will only be issuing the request
>     to one physical engine.
> 
> 
> For the subject, I guess you could say "Track 'heartbeat serial' counts 
> for virtual engines". However, the serial tracking count is not 
> explicitly named for heartbeats so it seems inaccurate to rename it for 
> a checkin email subject.
> 
> If you have a suggestion for better wording then feel free to propose 
> something.

Sigh, I am not asking for more low level detail but for more to-the-point 
high level naming and description.

"drm/i915: Fix hangchek for guc virtual engines"

"..Blah blah, but hack because it is not ideal due xyz which needlessly 
wakes up all engines which has an effect on power yes/no? Latency? 
Throughput when high prio pulse triggers pointless preemption?"

Also, can we fix it properly without introducing inefficiencies? Do we 
even need heartbeats when GuC is in charge of engine resets? And if we 
do, can we make them work better?

Regards,

Tvrtko


* Re: [Intel-gfx] [RFC PATCH 60/97] drm/i915: Track 'serial' counts for virtual engines
  2021-06-01  9:31               ` Tvrtko Ursulin
@ 2021-06-02  1:20                 ` John Harrison
  2021-06-02 12:04                   ` Tvrtko Ursulin
  0 siblings, 1 reply; 249+ messages in thread
From: John Harrison @ 2021-06-02  1:20 UTC (permalink / raw)
  To: Tvrtko Ursulin, Matthew Brost
  Cc: jason.ekstrand, daniel.vetter, intel-gfx, dri-devel

On 6/1/2021 02:31, Tvrtko Ursulin wrote:
> On 27/05/2021 18:01, John Harrison wrote:
>> On 5/27/2021 01:53, Tvrtko Ursulin wrote:
>>> On 26/05/2021 19:45, John Harrison wrote:
>>>> On 5/26/2021 01:40, Tvrtko Ursulin wrote:
>>>>> On 25/05/2021 18:52, Matthew Brost wrote:
>>>>>> On Tue, May 25, 2021 at 11:16:12AM +0100, Tvrtko Ursulin wrote:
>>>>>>>
>>>>>>> On 06/05/2021 20:14, Matthew Brost wrote:
>>>>>>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>>>>>>
>>>>>>>> The serial number tracking of engines happens at the backend of
>>>>>>>> request submission and was expecting to only be given physical
>>>>>>>> engines. However, in GuC submission mode, the decomposition of 
>>>>>>>> virtual
>>>>>>>> to physical engines does not happen in i915. Instead, requests are
>>>>>>>> submitted to their virtual engine mask all the way through to the
>>>>>>>> hardware (i.e. to GuC). This would mean that the heart beat code
>>>>>>>> thinks the physical engines are idle due to the serial number not
>>>>>>>> incrementing.
>>>>>>>>
>>>>>>>> This patch updates the tracking to decompose virtual engines into
>>>>>>>> their physical constituents and tracks the request against 
>>>>>>>> each. This
>>>>>>>> is not entirely accurate as the GuC will only be issuing the 
>>>>>>>> request
>>>>>>>> to one physical engine. However, it is the best that i915 can 
>>>>>>>> do given
>>>>>>>> that it has no knowledge of the GuC's scheduling decisions.
>>>>>>>
>>>>>>> Commit text sounds a bit defeatist. I think instead of making up 
>>>>>>> the serial
>>>>>>> counts, which has downsides (could you please document in the 
>>>>>>> commit what
>>>>>>> they are), we should think how to design things properly.
>>>>>>>
>>>>>>
>>>>>> IMO, I don't think fixing serial counts is the scope of this 
>>>>>> series. We
>>>>>> should focus on getting GuC submission in not cleaning up all the 
>>>>>> crap
>>>>>> that is in the i915. Let's make a note of this though so we can 
>>>>>> revisit
>>>>>> later.
>>>>>
>>>>> I will say again - commit message implies it is introducing an 
>>>>> unspecified downside by not fully fixing an also unspecified 
>>>>> issue. It is completely reasonable, and customary even, to ask for 
>>>>> both to be documented in the commit message.
>>>> Not sure what exactly is 'unspecified'. I thought the commit 
>>>> message described both the problem (heartbeat not running when 
>>>> using virtual engines) and the result (heartbeat running on more 
>>>> engines than strictly necessary). But in greater detail...
>>>>
>>>> The serial number tracking is a hack for the heartbeat code to know 
>>>> whether an engine is busy or idle, and therefore whether it should 
>>>> be pinged for aliveness. Whenever a submission is made to an 
>>>> engine, the serial number is incremented. The heartbeat code keeps 
>>>> a copy of the value. If the value has changed, the engine is busy 
>>>> and needs to be pinged.
>>>>
>>>> This works fine for execlist mode where virtual engine 
>>>> decomposition is done inside i915. It fails miserably for GuC mode 
>>>> where the decomposition is done by the hardware. The reason being 
>>>> that the heartbeat code only looks at physical engines but the 
>>>> serial count is only incremented on the virtual engine. Thus, the 
>>>> heartbeat sees everything as idle and does not ping.
>>>
>>> So hangcheck does not work. Or it works because GuC does it anyway. 
>>> Either way, that's one thing to explicitly state in the commit message.
>>>
>>>> This patch decomposes the virtual engines for the sake of 
>>>> incrementing the serial count on each sub-engine in order to keep 
>>>> the heartbeat code happy. The downside is that now the heartbeat 
>>>> sees all sub-engines as busy rather than only the one the 
>>>> submission actually ends up on. There really isn't much that can be 
>>>> done about that. The heartbeat code is in i915 not GuC, the 
>>>> scheduler is in GuC not i915. The only way to improve it is to 
>>>> either move the heartbeat code into GuC as well and completely 
>>>> disable the i915 side, or add some way for i915 to interrogate GuC 
>>>> as to which engines are or are not active. Technically, we do have 
>>>> both. GuC has (or at least had) an option to force a context switch 
>>>> on every execution quantum pre-emption. However, that is much, 
>>>> much, more heavy weight than the heartbeat. For the latter, we do 
>>>> (almost) have the engine usage statistics for PMU and such like. 
>>>> I'm not sure how much effort it would be to wire that up to the 
>>>> heartbeat code instead of using the serial count.
>>>>
>>>> In short, the serial count is ever so slightly inefficient in that 
>>>> it causes heartbeat pings on engines which are idle. On the other 
>>>> hand, it is way more efficient and simpler than the current 
>>>> alternatives.
>>>
>>> And the hack to make hangcheck work creates this inefficiency where 
>>> heartbeats are sent to idle engines. Which is probably fine just 
>>> needs to be explained.
>>>
>>>> Does that answer the questions?
>>>
>>> With the two points I re-raise clearly explained, possibly even 
>>> patch title changed, yeah. I am just wanting for it to be more 
>>> easily obvious to patch reader what it is functionally about - not 
>>> just what implementation details have been change but why as well.
>>>
>> My understanding is that we don't explain every piece of code in 
>> minute detail in every checkin email that touches it. I thought my 
>> description was already pretty verbose. I've certainly seen way less 
>> informative checkins that apparently made it through review without 
>> issue.
>>
>> Regarding the problem statement, I thought this was fairly clear that 
>> the heartbeat was broken for virtual engines:
>>
>>     This would mean that the heart beat code
>>     thinks the physical engines are idle due to the serial number not
>>     incrementing.
>>
>>
>> Regarding the inefficiency about heartbeating all physical engines in 
>> a virtual engine, again, this seems clear to me:
>>
>>     decompose virtual engines into
>>     their physical constituents and tracks the request against each. 
>> This
>>     is not entirely accurate as the GuC will only be issuing the request
>>     to one physical engine.
>>
>>
>> For the subject, I guess you could say "Track 'heartbeat serial' 
>> counts for virtual engines". However, the serial tracking count is 
>> not explicitly named for heartbeats so it seems inaccurate to rename 
>> it for a checkin email subject.
>>
>> If you have a suggestion for better wording then feel free to propose 
>> something.
>
> Sigh, I am not asking for more low level detail but for more up to 
> point high level naming and high level description.
>
> "drm/i915: Fix hangchek for guc virtual engines"
I would argue that the bug is not a hangcheck bug and only 
tangentially a GuC bug. It is really a bug in the serial number 
tracking of virtual engines in general, and in the lack of support for 
non-execlist backends in the serial number implementation. Hangcheck 
makes use of the serial number. It is not clear from the code whether 
anything else currently uses it, or used to. Certainly, 
there is no documentation on the serial number declaration in the engine 
structure to explain its purpose. Likewise, there is nothing GuC 
specific about delaying the decomposition of virtual engines. Any 
externally scheduled backend would do something similar. E.g. once the execlist 
backend moves to using the DRM scheduler then maybe it will have delayed 
decomposition as well, and therefore also fall foul of the missing 
serial number updates.
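
To spell the fix out in code, it just walks the virtual engine's sibling
mask and bumps each physical engine's serial; roughly (a condensed sketch
of the patch, which is quoted in full further down the thread):

	static void virtual_guc_bump_serial(struct intel_engine_cs *engine)
	{
		struct intel_engine_cs *e;
		intel_engine_mask_t tmp, mask = engine->mask;

		/* a virtual engine's mask covers all its physical siblings */
		for_each_engine_masked(e, engine->gt, mask, tmp)
			e->serial++;
	}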


>
> "..Blah blah, but hack because it is not ideal due xyz which 
> needlessly wakes up all engines which has an effect on power yes/no? 
> Latency? Throughput when high prio pulse triggers pointless preemption?"
Yes to all the above but that is already true of the heartbeat mechanism 
in general and I do not see any documentation in the code as to what the 
effect of the heartbeat mechanism is on power, latency, throughput, etc. 
My assumption is that the heartbeat is considered to have a slow enough 
period that any performance impact is negligible. And if the system 
is loaded to the point where the heartbeat is having an impact then all 
engines within the virtual set are going to be in use (because if they 
aren't then the system is obviously not heavily loaded), in which case 
the heartbeat would be pinging all engines anyway.

>
> Also, can we fix it properly without introducing inefficiencies? Do we 
> even need heartbeats when GuC is in charge of engine resets? And if we 
> do can we make them work better?
In short, no, not easily.

The GuC's internal hang detection and recovery mechanism relies on 
pre-emption timeouts for the detection part. However, if only one 
context is active on a given engine, there will be no pre-emptions and 
thus the GuC will not be able to detect if that context is making 
forward progress or not. That's where the heartbeat comes in. It sends a 
dummy request on a different context and thus causes a pre-emption to 
occur. So the architecture level decision was to keep the heartbeat 
enabled even with the GuC submission backend. Unless you are running 
OpenCL of course, in which case we turn everything off :(.
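
For reference, the i915 side is conceptually along these lines (a
simplified sketch only, not the actual intel_engine_heartbeat.c code;
the helpers marked hypothetical do not exist under those names):

	static void heartbeat_tick(struct intel_engine_cs *engine)
	{
		struct i915_request *rq;

		if (engine->heartbeat.systole &&
		    !i915_request_completed(engine->heartbeat.systole)) {
			/* previous pulse is stuck: escalate, eventually reset */
			escalate_or_reset(engine);	/* hypothetical */
			return;
		}

		/* dummy kernel-context request; forces a pre-emption */
		rq = create_kernel_context_request(engine);	/* hypothetical */
		engine->heartbeat.systole = i915_request_get(rq);
		i915_request_add(rq);
	}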

As for doing something better, not easily. GuC is not able to generate 
requests itself, so it can't replicate the heartbeat's operation 
internally. There is an option to force a context switch to idle on 
every quantum expiration. However, that is deemed too intrusive and 
costly from a performance viewpoint. It might be possible to add an 
independent heartbeat timer to the GuC firmware and use that to trigger 
less frequent forced pre-emptions. That would be more efficient and more 
targeted. Whether it is worth the effort required is another matter 
given how small an impact the heartbeat itself currently has.

It would still be my view that the serial count should be fixed anyway. 
It is broken for virtual engines. End of story. Whether that actually 
affects the users of the count is a separate issue that is dependent 
upon those users. But that just changes the severity of the bug, not its 
validity.

John.


>
> Regards,
>
> Tvrtko



* Re: [Intel-gfx] [RFC PATCH 60/97] drm/i915: Track 'serial' counts for virtual engines
  2021-06-02  1:20                 ` John Harrison
@ 2021-06-02 12:04                   ` Tvrtko Ursulin
  0 siblings, 0 replies; 249+ messages in thread
From: Tvrtko Ursulin @ 2021-06-02 12:04 UTC (permalink / raw)
  To: John Harrison, Matthew Brost
  Cc: jason.ekstrand, daniel.vetter, intel-gfx, dri-devel


On 02/06/2021 02:20, John Harrison wrote:
> On 6/1/2021 02:31, Tvrtko Ursulin wrote:
>> On 27/05/2021 18:01, John Harrison wrote:
>>> On 5/27/2021 01:53, Tvrtko Ursulin wrote:
>>>> On 26/05/2021 19:45, John Harrison wrote:
>>>>> On 5/26/2021 01:40, Tvrtko Ursulin wrote:
>>>>>> On 25/05/2021 18:52, Matthew Brost wrote:
>>>>>>> On Tue, May 25, 2021 at 11:16:12AM +0100, Tvrtko Ursulin wrote:
>>>>>>>>
>>>>>>>> On 06/05/2021 20:14, Matthew Brost wrote:
>>>>>>>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>>>>>>>
>>>>>>>>> The serial number tracking of engines happens at the backend of
>>>>>>>>> request submission and was expecting to only be given physical
>>>>>>>>> engines. However, in GuC submission mode, the decomposition of 
>>>>>>>>> virtual
>>>>>>>>> to physical engines does not happen in i915. Instead, requests are
>>>>>>>>> submitted to their virtual engine mask all the way through to the
>>>>>>>>> hardware (i.e. to GuC). This would mean that the heart beat code
>>>>>>>>> thinks the physical engines are idle due to the serial number not
>>>>>>>>> incrementing.
>>>>>>>>>
>>>>>>>>> This patch updates the tracking to decompose virtual engines into
>>>>>>>>> their physical constituents and tracks the request against 
>>>>>>>>> each. This
>>>>>>>>> is not entirely accurate as the GuC will only be issuing the 
>>>>>>>>> request
>>>>>>>>> to one physical engine. However, it is the best that i915 can 
>>>>>>>>> do given
>>>>>>>>> that it has no knowledge of the GuC's scheduling decisions.
>>>>>>>>
>>>>>>>> Commit text sounds a bit defeatist. I think instead of making up 
>>>>>>>> the serial
>>>>>>>> counts, which has downsides (could you please document in the 
>>>>>>>> commit what
>>>>>>>> they are), we should think how to design things properly.
>>>>>>>>
>>>>>>>
>>>>>>> IMO, I don't think fixing serial counts is the scope of this 
>>>>>>> series. We
>>>>>>> should focus on getting GuC submission in not cleaning up all the 
>>>>>>> crap
>>>>>>> that is in the i915. Let's make a note of this though so we can 
>>>>>>> revisit
>>>>>>> later.
>>>>>>
>>>>>> I will say again - commit message implies it is introducing an 
>>>>>> unspecified downside by not fully fixing an also unspecified 
>>>>>> issue. It is completely reasonable, and customary even, to ask for 
>>>>>> both to be documented in the commit message.
>>>>> Not sure what exactly is 'unspecified'. I thought the commit 
>>>>> message described both the problem (heartbeat not running when 
>>>>> using virtual engines) and the result (heartbeat running on more 
>>>>> engines than strictly necessary). But in greater detail...
>>>>>
>>>>> The serial number tracking is a hack for the heartbeat code to know 
>>>>> whether an engine is busy or idle, and therefore whether it should 
>>>>> be pinged for aliveness. Whenever a submission is made to an 
>>>>> engine, the serial number is incremented. The heartbeat code keeps 
>>>>> a copy of the value. If the value has changed, the engine is busy 
>>>>> and needs to be pinged.
>>>>>
>>>>> This works fine for execlist mode where virtual engine 
>>>>> decomposition is done inside i915. It fails miserably for GuC mode 
>>>>> where the decomposition is done by the hardware. The reason being 
>>>>> that the heartbeat code only looks at physical engines but the 
>>>>> serial count is only incremented on the virtual engine. Thus, the 
>>>>> heartbeat sees everything as idle and does not ping.
>>>>
>>>> So hangcheck does not work. Or it works because GuC does it anyway. 
>>>> Either way, that's one thing to explicitly state in the commit message.
>>>>
>>>>> This patch decomposes the virtual engines for the sake of 
>>>>> incrementing the serial count on each sub-engine in order to keep 
>>>>> the heartbeat code happy. The downside is that now the heartbeat 
>>>>> sees all sub-engines as busy rather than only the one the 
>>>>> submission actually ends up on. There really isn't much that can be 
>>>>> done about that. The heartbeat code is in i915 not GuC, the 
>>>>> scheduler is in GuC not i915. The only way to improve it is to 
>>>>> either move the heartbeat code into GuC as well and completely 
>>>>> disable the i915 side, or add some way for i915 to interrogate GuC 
>>>>> as to which engines are or are not active. Technically, we do have 
>>>>> both. GuC has (or at least had) an option to force a context switch 
>>>>> on every execution quantum pre-emption. However, that is much, 
>>>>> much, more heavy weight than the heartbeat. For the latter, we do 
>>>>> (almost) have the engine usage statistics for PMU and such like. 
>>>>> I'm not sure how much effort it would be to wire that up to the 
>>>>> heartbeat code instead of using the serial count.
>>>>>
>>>>> In short, the serial count is ever so slightly inefficient in that 
>>>>> it causes heartbeat pings on engines which are idle. On the other 
>>>>> hand, it is way more efficient and simpler than the current 
>>>>> alternatives.
>>>>
>>>> And the hack to make hangcheck work creates this inefficiency where 
>>>> heartbeats are sent to idle engines. Which is probably fine just 
>>>> needs to be explained.
>>>>
>>>>> Does that answer the questions?
>>>>
>>>> With the two points I re-raise clearly explained, possibly even 
>>>> patch title changed, yeah. I am just wanting for it to be more 
>>>> easily obvious to patch reader what it is functionally about - not 
>>>> just what implementation details have been change but why as well.
>>>>
>>> My understanding is that we don't explain every piece of code in 
>>> minute detail in every checkin email that touches it. I thought my 
>>> description was already pretty verbose. I've certainly seen way less 
>>> informative checkins that apparently made it through review without 
>>> issue.
>>>
>>> Regarding the problem statement, I thought this was fairly clear that 
>>> the heartbeat was broken for virtual engines:
>>>
>>>     This would mean that the heart beat code
>>>     thinks the physical engines are idle due to the serial number not
>>>     incrementing.
>>>
>>>
>>> Regarding the inefficiency about heartbeating all physical engines in 
>>> a virtual engine, again, this seems clear to me:
>>>
>>>     decompose virtual engines into
>>>     their physical constituents and tracks the request against each. 
>>> This
>>>     is not entirely accurate as the GuC will only be issuing the request
>>>     to one physical engine.
>>>
>>>
>>> For the subject, I guess you could say "Track 'heartbeat serial' 
>>> counts for virtual engines". However, the serial tracking count is 
>>> not explicitly named for heartbeats so it seems inaccurate to rename 
>>> it for a checkin email subject.
>>>
>>> If you have a suggestion for better wording then feel free to propose 
>>> something.
>>
>> Sigh, I am not asking for more low level detail but for more up to 
>> point high level naming and high level description.
>>
>> "drm/i915: Fix hangchek for guc virtual engines"
> I would argue that the bug is not a with hangcheck bug and only 
> tangentially a GuC bug. It is really a bug with the serial number 
> tracking of virtual engines in general and the lack of support for 

You argue it is a bug in general but nothing is currently broken apart 
from hangcheck with GuC virtual engines? :) That could mean, say, that 
it is not actually broken but designed for the current code base.

Maybe "drm/i915: Make hangcheck work with GuC virtual engines" then if 
you object on the word fix? Would that make it immediately clear why is 
this patch must have/desirable?

> non-execlist backends in the serial number implementation. Hangcheck 
> makes use of the serial number. It is not clear from the code whether 
> anything else does currently or used to previously use them. Certainly, 

Engine pm clearly uses it to know when it is safe to park the engine. I 
think I asked earlier in the series whether the interactions in that area 
have been looked at. I don't know myself, since I think that GuC changes 
how engine parking is done, but I am not really familiar with it. Now that 
I think of it, there possibly is a patch which keeps all engines unparked 
for virtual engines, so that's looking okay.

> there is no documentation on the serial number declaration in the engine 
> structure to explain its purpose. Likewise, there is nothing GuC 
> specific about delaying the decomposition of virtual engines. Any 
> externally scheduled backed end would do similar. E.g. once the execlist 
> backend moves to using the DRM scheduler then maybe it will have delayed 
> decomposition as well, and therefore also fall foul of the missing 
> serial number updates.

I don't think we know yet how drm/scheduler will be used to go that far.

>> "..Blah blah, but hack because it is not ideal due xyz which 
>> needlessly wakes up all engines which has an effect on power yes/no? 
>> Latency? Throughput when high prio pulse triggers pointless preemption?"
> Yes to all the above but that is already true of the heartbeat mechanism 
> in general and I do not see any documentation in the code as to what the 
> effect of the heartbeat mechanism is on power, latency, throughput, etc. 

The difference is that the current code does not emit heartbeats on idle 
engines. So if we have a virtual engine built of, say, four engines of the 
same class, then the proposal here is to keep pinging all four in parallel, 
even if only a single context is executing. I am not saying that cost is big but 
honestly I don't understand why it is difficult to mention this in the 
commit message using clear and direct language.
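
For reference, the "no heartbeats on idle engines" behaviour comes down to
a serial comparison along these lines (a simplified sketch, not the
verbatim intel_engine_heartbeat.c code; the pulse helper name is made up):

	serial = READ_ONCE(engine->serial);
	if (engine->wakeref_serial == serial)
		return;	/* nothing submitted since the last pulse: stay idle */

	send_heartbeat_pulse(engine);	/* hypothetical helper */
	engine->wakeref_serial = serial + 1;

With the patch every sibling of a GuC virtual engine sees its serial
advance on each submission, so every sibling takes the pulse path.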

> My assumption is that the heartbeat is considered slow enough 
> periodicity that any performance impact is negligible. And if the system 
> is loaded to the point where the heartbeat is having an impact then all 
> engines within the virtual set are going to be in use (because if they 
> aren't then the system is obviously not heavily loaded), in which case 
> the heartbeat would be pinging all engines anyway.
> 
>>
>> Also, can we fix it properly without introducing inefficiencies? Do we 
>> even need heartbeats when GuC is in charge of engine resets? And if we 
>> do can we make them work better?
> In short, no, not easily.
> 
> The GuC's internal hang detection and recovery mechanism relies on 
> pre-emption timeouts for the detection part. However, if only one 
> context is active on a given engine, there will be no pre-emptions and 
> thus the GuC will not be able to detect if that context is making 
> forward progress or not. That's where the heartbeat comes in. It sends a 
> dummy request on a different context and thus causes a pre-emption to 
> occur. So the architecture level decision was to keep the heartbeat 
> enabled even with the GuC submission backend. Unless you are running 
> OpenCL of course, in which case we turn everything off :(.
> 
> As for doing something better, not easily. GuC is not able to generate 
> requests itself, so it can't replicate the heartbeat's operation 
> internally. There is an option to force a context switch to idle on 
> every quantum expiration. However, that is deemed too intrusive and 
> costly from a performance viewpoint. It might be possible to add an 
> independent heartbeat timer to the GuC firmware and use that to trigger 
> less frequent forced pre-emptions. That would be more efficient and more 
> targetted. Whether it is worth the effort required is another matter 
> given how small an impact the heartbeat itself currently is.

Well GuC could obviously do it in many ways and not all are expensive. 
If it can force a context switch on quantum expiration, it could force 
it on heartbeat expiration as you say. That would actually be a more proper 
design than this kludge, which leaves a bad taste regardless of how little 
it costs. Or it could perhaps track some sort of serials in a shared 
memory page.

But anyway, all I am asking here is that the patch subject and commit 
message are made clear and direct. Here, I add two sentences as what I 
think is the minimum:

drm/i915/guc: Make hangcheck work with GuC virtual engines

The serial number tracking of engines happens at the backend of
request submission and was expecting to only be given physical
engines. However, in GuC submission mode, the decomposition of virtual
to physical engines does not happen in i915. Instead, requests are
submitted to their virtual engine mask all the way through to the
hardware (i.e. to GuC). This would mean that the heart beat code
thinks the physical engines are idle due to the serial number not
incrementing. <added>Which in turn means hangcheck does not work for 
GuC virtual engines.</added>

This patch updates the tracking to decompose virtual engines into
their physical constituents and tracks the request against each. This
is not entirely accurate as the GuC will only be issuing the request
to one physical engine. However, it is the best that i915 can do given
that it has no knowledge of the GuC's scheduling decisions.

<added>The downside of this is that all physical engines constituting a GuC 
virtual engine will be periodically unparked (even while just a single 
context is executing) in order to be pinged with a heartbeat request. 
However, the power and performance cost of this is not expected to be 
measurable (due to the low frequency of heartbeat pulses) and it is 
considered an easier option than trying to make changes to the GuC 
firmware.</added>

> I would still be my view that the serial count should be fixed anyway. 
> It is broken for virtual engines. End of story. Whether that actually 
> affects the users of the count is a separate issue that is dependent 
> upon those users. But that just changes the severity of the bug, not its 
> validity.

It is clearly not broken for the current codebase, otherwise this patch 
would come with virtual_execlists_bump_serial and would be called something like 
"drm/i915: Fix hangcheck on virtual engines". :)

Regards,

Tvrtko


* Re: [Intel-gfx] [RFC PATCH 60/97] drm/i915: Track 'serial' counts for virtual engines
  2021-05-06 19:14 ` [RFC PATCH 60/97] drm/i915: Track 'serial' counts for " Matthew Brost
  2021-05-25 10:16   ` [Intel-gfx] " Tvrtko Ursulin
@ 2021-06-02 12:09   ` Tvrtko Ursulin
  1 sibling, 0 replies; 249+ messages in thread
From: Tvrtko Ursulin @ 2021-06-02 12:09 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel; +Cc: jason.ekstrand, daniel.vetter


On 06/05/2021 20:14, Matthew Brost wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> The serial number tracking of engines happens at the backend of
> request submission and was expecting to only be given physical
> engines. However, in GuC submission mode, the decomposition of virtual
> to physical engines does not happen in i915. Instead, requests are
> submitted to their virtual engine mask all the way through to the
> hardware (i.e. to GuC). This would mean that the heart beat code
> thinks the physical engines are idle due to the serial number not
> incrementing.
> 
> This patch updates the tracking to decompose virtual engines into
> their physical constituents and tracks the request against each. This
> is not entirely accurate as the GuC will only be issuing the request
> to one physical engine. However, it is the best that i915 can do given
> that it has no knowledge of the GuC's scheduling decisions.
> 
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_engine_types.h     |  2 ++
>   .../gpu/drm/i915/gt/intel_execlists_submission.c |  6 ++++++
>   drivers/gpu/drm/i915/gt/intel_ring_submission.c  |  6 ++++++
>   drivers/gpu/drm/i915/gt/mock_engine.c            |  6 ++++++
>   .../gpu/drm/i915/gt/uc/intel_guc_submission.c    | 16 ++++++++++++++++
>   drivers/gpu/drm/i915/i915_request.c              |  4 +++-
>   6 files changed, 39 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> index 86302e6d86b2..e2b5cda6dbc4 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> @@ -389,6 +389,8 @@ struct intel_engine_cs {
>   	void		(*park)(struct intel_engine_cs *engine);
>   	void		(*unpark)(struct intel_engine_cs *engine);
>   
> +	void		(*bump_serial)(struct intel_engine_cs *engine);
> +
>   	void		(*set_default_submission)(struct intel_engine_cs *engine);
>   
>   	const struct intel_context_ops *cops;
> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> index ae12d7f19ecd..02880ea5d693 100644
> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> @@ -3199,6 +3199,11 @@ static void execlists_release(struct intel_engine_cs *engine)
>   	lrc_fini_wa_ctx(engine);
>   }
>   
> +static void execlist_bump_serial(struct intel_engine_cs *engine)
> +{
> +	engine->serial++;
> +}
> +
>   static void
>   logical_ring_default_vfuncs(struct intel_engine_cs *engine)
>   {
> @@ -3208,6 +3213,7 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
>   
>   	engine->cops = &execlists_context_ops;
>   	engine->request_alloc = execlists_request_alloc;
> +	engine->bump_serial = execlist_bump_serial;
>   
>   	engine->reset.prepare = execlists_reset_prepare;
>   	engine->reset.rewind = execlists_reset_rewind;
> diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> index 14aa31879a37..39dd7c4ed0a9 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> @@ -1045,6 +1045,11 @@ static void setup_irq(struct intel_engine_cs *engine)
>   	}
>   }
>   
> +static void ring_bump_serial(struct intel_engine_cs *engine)
> +{
> +	engine->serial++;
> +}
> +
>   static void setup_common(struct intel_engine_cs *engine)
>   {
>   	struct drm_i915_private *i915 = engine->i915;
> @@ -1064,6 +1069,7 @@ static void setup_common(struct intel_engine_cs *engine)
>   
>   	engine->cops = &ring_context_ops;
>   	engine->request_alloc = ring_request_alloc;
> +	engine->bump_serial = ring_bump_serial;
>   
>   	/*
>   	 * Using a global execution timeline; the previous final breadcrumb is
> diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
> index bd005c1b6fd5..97b10fd60b55 100644
> --- a/drivers/gpu/drm/i915/gt/mock_engine.c
> +++ b/drivers/gpu/drm/i915/gt/mock_engine.c
> @@ -292,6 +292,11 @@ static void mock_engine_release(struct intel_engine_cs *engine)
>   	intel_engine_fini_retire(engine);
>   }
>   
> +static void mock_bump_serial(struct intel_engine_cs *engine)
> +{
> +	engine->serial++;
> +}
> +
>   struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
>   				    const char *name,
>   				    int id)
> @@ -318,6 +323,7 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
>   
>   	engine->base.cops = &mock_context_ops;
>   	engine->base.request_alloc = mock_request_alloc;
> +	engine->base.bump_serial = mock_bump_serial;
>   	engine->base.emit_flush = mock_emit_flush;
>   	engine->base.emit_fini_breadcrumb = mock_emit_breadcrumb;
>   	engine->base.submit_request = mock_submit_request;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index dc79d287c50a..f0e5731bcef6 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -1500,6 +1500,20 @@ static void guc_release(struct intel_engine_cs *engine)
>   	lrc_fini_wa_ctx(engine);
>   }
>   
> +static void guc_bump_serial(struct intel_engine_cs *engine)
> +{
> +	engine->serial++;
> +}
> +
> +static void virtual_guc_bump_serial(struct intel_engine_cs *engine)
> +{
> +	struct intel_engine_cs *e;
> +	intel_engine_mask_t tmp, mask = engine->mask;
> +
> +	for_each_engine_masked(e, engine->gt, mask, tmp)
> +		e->serial++;
> +}
> +
>   static void guc_default_vfuncs(struct intel_engine_cs *engine)
>   {
>   	/* Default vfuncs which can be overridden by each engine. */
> @@ -1508,6 +1522,7 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
>   
>   	engine->cops = &guc_context_ops;
>   	engine->request_alloc = guc_request_alloc;
> +	engine->bump_serial = guc_bump_serial;
>   
>   	engine->sched_engine->schedule = i915_schedule;
>   
> @@ -1843,6 +1858,7 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
>   
>   	ve->base.cops = &virtual_guc_context_ops;
>   	ve->base.request_alloc = guc_request_alloc;
> +	ve->base.bump_serial = virtual_guc_bump_serial;
>   
>   	ve->base.submit_request = guc_submit_request;
>   
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 9542a5baa45a..127d60b36422 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -692,7 +692,9 @@ bool __i915_request_submit(struct i915_request *request)
>   				     request->ring->vaddr + request->postfix);
>   
>   	trace_i915_request_execute(request);
> -	engine->serial++;
> +	if (engine->bump_serial)
> +		engine->bump_serial(engine);
> +

As long as you have to handle a NULL vfunc, you could make the patch way 
smaller by doing:

   if (engine->bump_serial)
	engine->bump_serial(engine);
   else
	engine->serial++;

As an added bonus you avoid a function call with execlists, so the patch 
does not introduce a double penalty. Or just make bump_serial always point 
to a valid/default function. No need for both a new branch *and* a function 
call, I think. I'd prefer the code snippet as above though.
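
(The alternative would look roughly like this; illustrative only, the
default's name is made up and the rest follows the patch:)

	/* default installed for every backend by the common setup code */
	static void engine_bump_serial(struct intel_engine_cs *engine)
	{
		engine->serial++;
	}

	/* so __i915_request_submit() can stay unconditional */
	engine->bump_serial(engine);

	/* and only the GuC virtual engine overrides it */
	ve->base.bump_serial = virtual_guc_bump_serial;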

Regards,

Tvrtko

>   	result = true;
>   
>   	GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
> 


* Re: [Intel-gfx] [RFC PATCH 61/97] drm/i915: Hold reference to intel_context over life of i915_request
  2021-05-06 19:14 ` [RFC PATCH 61/97] drm/i915: Hold reference to intel_context over life of i915_request Matthew Brost
@ 2021-06-02 12:18   ` Tvrtko Ursulin
  0 siblings, 0 replies; 249+ messages in thread
From: Tvrtko Ursulin @ 2021-06-02 12:18 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel; +Cc: jason.ekstrand, daniel.vetter


On 06/05/2021 20:14, Matthew Brost wrote:
> Hold a reference to the intel_context over life of an i915_request.
> Without this an i915_request can exist after the context has been
> destroyed (e.g. request retired, context closed, but user space holds a
> reference to the request from an out fence). In the case of GuC
> submission + virtual engine, the engine that the request references is
> also destroyed which can trigger bad pointer dref in fence ops (e.g.
> i915_fence_get_driver_name). We could likely change
> i915_fence_get_driver_name to avoid touching the engine but let's just
> be safe and hold the intel_context reference.

Isn't this a bug in present upstream as well? Like calling sync fence 
info on retired requests or something else?

If it is a bug in upstream then I think a single patch to deal with the 
issue should be posted independently. It may be as simple as checking 
for the signaled bit in i915_fence_get_driver_name and dereferencing 
with RCU protection.
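
Something along these lines perhaps (purely a sketch of the idea, not a
tested fix; the fallback string is made up, the RCU side is not shown,
and it assumes the current code reaches the device name via rq->engine):

	static const char *i915_fence_get_driver_name(struct dma_fence *fence)
	{
		struct i915_request *rq = to_request(fence);

		/*
		 * Once signaled the request may have dropped its engine/
		 * context references, so do not chase those pointers.
		 */
		if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
			return "i915";

		return dev_name(rq->engine->i915->drm.dev);
	}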

Regards,

Tvrtko

> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_request.c | 54 ++++++++++++-----------------
>   1 file changed, 22 insertions(+), 32 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 127d60b36422..0b96b824ea06 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -126,39 +126,17 @@ static void i915_fence_release(struct dma_fence *fence)
>   	i915_sw_fence_fini(&rq->semaphore);
>   
>   	/*
> -	 * Keep one request on each engine for reserved use under mempressure
> -	 *
> -	 * We do not hold a reference to the engine here and so have to be
> -	 * very careful in what rq->engine we poke. The virtual engine is
> -	 * referenced via the rq->context and we released that ref during
> -	 * i915_request_retire(), ergo we must not dereference a virtual
> -	 * engine here. Not that we would want to, as the only consumer of
> -	 * the reserved engine->request_pool is the power management parking,
> -	 * which must-not-fail, and that is only run on the physical engines.
> -	 *
> -	 * Since the request must have been executed to be have completed,
> -	 * we know that it will have been processed by the HW and will
> -	 * not be unsubmitted again, so rq->engine and rq->execution_mask
> -	 * at this point is stable. rq->execution_mask will be a single
> -	 * bit if the last and _only_ engine it could execution on was a
> -	 * physical engine, if it's multiple bits then it started on and
> -	 * could still be on a virtual engine. Thus if the mask is not a
> -	 * power-of-two we assume that rq->engine may still be a virtual
> -	 * engine and so a dangling invalid pointer that we cannot dereference
> -	 *
> -	 * For example, consider the flow of a bonded request through a virtual
> -	 * engine. The request is created with a wide engine mask (all engines
> -	 * that we might execute on). On processing the bond, the request mask
> -	 * is reduced to one or more engines. If the request is subsequently
> -	 * bound to a single engine, it will then be constrained to only
> -	 * execute on that engine and never returned to the virtual engine
> -	 * after timeslicing away, see __unwind_incomplete_requests(). Thus we
> -	 * know that if the rq->execution_mask is a single bit, rq->engine
> -	 * can be a physical engine with the exact corresponding mask.
> +	 * Keep one request on each engine for reserved use under mempressure,
> +	 * do not use with virtual engines as this really is only needed for
> +	 * kernel contexts.
>   	 */
> -	if (is_power_of_2(rq->execution_mask) &&
> -	    !cmpxchg(&rq->engine->request_pool, NULL, rq))
> +	if (!intel_engine_is_virtual(rq->engine) &&
> +	    !cmpxchg(&rq->engine->request_pool, NULL, rq)) {
> +		intel_context_put(rq->context);
>   		return;
> +	}
> +
> +	intel_context_put(rq->context);
>   
>   	kmem_cache_free(global.slab_requests, rq);
>   }
> @@ -977,7 +955,18 @@ __i915_request_create(struct intel_context *ce, gfp_t gfp)
>   		}
>   	}
>   
> -	rq->context = ce;
> +	/*
> +	 * Hold a reference to the intel_context over life of an i915_request.
> +	 * Without this an i915_request can exist after the context has been
> +	 * destroyed (e.g. request retired, context closed, but user space holds
> +	 * a reference to the request from an out fence). In the case of GuC
> +	 * submission + virtual engine, the engine that the request references
> +	 * is also destroyed which can trigger bad pointer dref in fence ops
> +	 * (e.g. i915_fence_get_driver_name). We could likely change these
> +	 * functions to avoid touching the engine but let's just be safe and
> +	 * hold the intel_context reference.
> +	 */
> +	rq->context = intel_context_get(ce);
>   	rq->engine = ce->engine;
>   	rq->ring = ce->ring;
>   	rq->execution_mask = ce->engine->mask;
> @@ -1054,6 +1043,7 @@ __i915_request_create(struct intel_context *ce, gfp_t gfp)
>   	GEM_BUG_ON(!list_empty(&rq->sched.waiters_list));
>   
>   err_free:
> +	intel_context_put(ce);
>   	kmem_cache_free(global.slab_requests, rq);
>   err_unreserve:
>   	intel_context_unpin(ce);
> 


* Re: [Intel-gfx] [RFC PATCH 63/97] drm/i915/guc: Direct all breadcrumbs for a class to single breadcrumbs
  2021-05-06 19:14 ` [RFC PATCH 63/97] drm/i915/guc: Direct all breadcrumbs for a class to single breadcrumbs Matthew Brost
@ 2021-06-02 13:31   ` Tvrtko Ursulin
  0 siblings, 0 replies; 249+ messages in thread
From: Tvrtko Ursulin @ 2021-06-02 13:31 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel; +Cc: jason.ekstrand, daniel.vetter


On 06/05/2021 20:14, Matthew Brost wrote:
> With GuC virtual engines the physical engine which a request executes
> and completes on isn't known to the i915. Therefore we can't attach a
> request to a physical engines breadcrumbs. To work around this we create
> a single breadcrumbs per engine class when using GuC submission and
> direct all physical engine interrupts to this breadcrumbs.
> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> CC: John Harrison <John.C.Harrison@Intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_breadcrumbs.c   | 41 +++++-------
>   drivers/gpu/drm/i915/gt/intel_breadcrumbs.h   | 14 +++-
>   .../gpu/drm/i915/gt/intel_breadcrumbs_types.h |  7 ++
>   drivers/gpu/drm/i915/gt/intel_engine.h        |  3 +
>   drivers/gpu/drm/i915/gt/intel_engine_cs.c     | 28 +++++++-
>   drivers/gpu/drm/i915/gt/intel_engine_types.h  |  1 -
>   .../drm/i915/gt/intel_execlists_submission.c  |  4 +-
>   drivers/gpu/drm/i915/gt/mock_engine.c         |  4 +-
>   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 67 +++++++++++++++++--
>   9 files changed, 133 insertions(+), 36 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> index 38cc42783dfb..2007dc6f6b99 100644
> --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> @@ -15,28 +15,14 @@
>   #include "intel_gt_pm.h"
>   #include "intel_gt_requests.h"
>   
> -static bool irq_enable(struct intel_engine_cs *engine)
> +static bool irq_enable(struct intel_breadcrumbs *b)
>   {
> -	if (!engine->irq_enable)
> -		return false;
> -
> -	/* Caller disables interrupts */
> -	spin_lock(&engine->gt->irq_lock);
> -	engine->irq_enable(engine);
> -	spin_unlock(&engine->gt->irq_lock);
> -
> -	return true;
> +	return intel_engine_irq_enable(b->irq_engine);
>   }
>   
> -static void irq_disable(struct intel_engine_cs *engine)
> +static void irq_disable(struct intel_breadcrumbs *b)
>   {
> -	if (!engine->irq_disable)
> -		return;
> -
> -	/* Caller disables interrupts */
> -	spin_lock(&engine->gt->irq_lock);
> -	engine->irq_disable(engine);
> -	spin_unlock(&engine->gt->irq_lock);
> +	intel_engine_irq_disable(b->irq_engine);
>   }
>   
>   static void __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b)
> @@ -57,7 +43,7 @@ static void __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b)
>   	WRITE_ONCE(b->irq_armed, true);
>   
>   	/* Requests may have completed before we could enable the interrupt. */
> -	if (!b->irq_enabled++ && irq_enable(b->irq_engine))
> +	if (!b->irq_enabled++ && b->irq_enable(b))
>   		irq_work_queue(&b->irq_work);
>   }
>   
> @@ -76,7 +62,7 @@ static void __intel_breadcrumbs_disarm_irq(struct intel_breadcrumbs *b)
>   {
>   	GEM_BUG_ON(!b->irq_enabled);
>   	if (!--b->irq_enabled)
> -		irq_disable(b->irq_engine);
> +		b->irq_disable(b);
>   
>   	WRITE_ONCE(b->irq_armed, false);
>   	intel_gt_pm_put_async(b->irq_engine->gt);
> @@ -281,7 +267,7 @@ intel_breadcrumbs_create(struct intel_engine_cs *irq_engine)
>   	if (!b)
>   		return NULL;
>   
> -	b->irq_engine = irq_engine;
> +	kref_init(&b->ref);
>   
>   	spin_lock_init(&b->signalers_lock);
>   	INIT_LIST_HEAD(&b->signalers);
> @@ -290,6 +276,10 @@ intel_breadcrumbs_create(struct intel_engine_cs *irq_engine)
>   	spin_lock_init(&b->irq_lock);
>   	init_irq_work(&b->irq_work, signal_irq_work);
>   
> +	b->irq_engine = irq_engine;
> +	b->irq_enable = irq_enable;
> +	b->irq_disable = irq_disable;
> +
>   	return b;
>   }
>   
> @@ -303,9 +293,9 @@ void intel_breadcrumbs_reset(struct intel_breadcrumbs *b)
>   	spin_lock_irqsave(&b->irq_lock, flags);
>   
>   	if (b->irq_enabled)
> -		irq_enable(b->irq_engine);
> +		b->irq_enable(b);
>   	else
> -		irq_disable(b->irq_engine);
> +		b->irq_disable(b);
>   
>   	spin_unlock_irqrestore(&b->irq_lock, flags);
>   }
> @@ -325,11 +315,14 @@ void __intel_breadcrumbs_park(struct intel_breadcrumbs *b)
>   	}
>   }
>   
> -void intel_breadcrumbs_free(struct intel_breadcrumbs *b)
> +void intel_breadcrumbs_free(struct kref *kref)
>   {
> +	struct intel_breadcrumbs *b = container_of(kref, typeof(*b), ref);
> +
>   	irq_work_sync(&b->irq_work);
>   	GEM_BUG_ON(!list_empty(&b->signalers));
>   	GEM_BUG_ON(b->irq_armed);
> +
>   	kfree(b);
>   }
>   
> diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h
> index 3ce5ce270b04..72105b74663d 100644
> --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h
> +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h
> @@ -17,7 +17,7 @@ struct intel_breadcrumbs;
>   
>   struct intel_breadcrumbs *
>   intel_breadcrumbs_create(struct intel_engine_cs *irq_engine);
> -void intel_breadcrumbs_free(struct intel_breadcrumbs *b);
> +void intel_breadcrumbs_free(struct kref *kref);
>   
>   void intel_breadcrumbs_reset(struct intel_breadcrumbs *b);
>   void __intel_breadcrumbs_park(struct intel_breadcrumbs *b);
> @@ -48,4 +48,16 @@ void i915_request_cancel_breadcrumb(struct i915_request *request);
>   void intel_context_remove_breadcrumbs(struct intel_context *ce,
>   				      struct intel_breadcrumbs *b);
>   
> +static inline struct intel_breadcrumbs *
> +intel_breadcrumbs_get(struct intel_breadcrumbs *b)
> +{
> +	kref_get(&b->ref);
> +	return b;
> +}
> +
> +static inline void intel_breadcrumbs_put(struct intel_breadcrumbs *b)
> +{
> +	kref_put(&b->ref, intel_breadcrumbs_free);
> +}
> +
>   #endif /* __INTEL_BREADCRUMBS__ */
> diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h b/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
> index 3a084ce8ff5e..a4e146684be8 100644
> --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
> @@ -7,10 +7,13 @@
>   #define __INTEL_BREADCRUMBS_TYPES__
>   
>   #include <linux/irq_work.h>
> +#include <linux/kref.h>
>   #include <linux/list.h>
>   #include <linux/spinlock.h>
>   #include <linux/types.h>
>   
> +typedef u8 intel_engine_mask_t;

Why not include the engine types header? The engine mask typedef belongs 
there and I wouldn't move it, especially since otherwise a "dangling" 
#define ALL_ENGINES ((intel_engine_mask_t)~0ul) remains over there.
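
(I.e. something like the below instead of duplicating the typedef;
illustrative only, assuming it does not create an include cycle:)

	/* in intel_breadcrumbs_types.h */
	#include "intel_engine_types.h" /* intel_engine_mask_t, ALL_ENGINES */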

> +
>   /*
>    * Rather than have every client wait upon all user interrupts,
>    * with the herd waking after every interrupt and each doing the
> @@ -29,6 +32,7 @@
>    * the overhead of waking that client is much preferred.
>    */
>   struct intel_breadcrumbs {
> +	struct kref ref;
>   	atomic_t active;
>   
>   	spinlock_t signalers_lock; /* protects the list of signalers */
> @@ -42,7 +46,10 @@ struct intel_breadcrumbs {
>   	bool irq_armed;
>   
>   	/* Not all breadcrumbs are attached to physical HW */
> +	intel_engine_mask_t	engine_mask;
>   	struct intel_engine_cs *irq_engine;
> +	bool	(*irq_enable)(struct intel_breadcrumbs *b);
> +	void	(*irq_disable)(struct intel_breadcrumbs *b);
>   };
>   
>   #endif /* __INTEL_BREADCRUMBS_TYPES__ */
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
> index 3cd09381b6f8..3321d0917a99 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine.h
> @@ -209,6 +209,9 @@ void intel_engine_get_instdone(const struct intel_engine_cs *engine,
>   
>   void intel_engine_init_execlists(struct intel_engine_cs *engine);
>   
> +bool intel_engine_irq_enable(struct intel_engine_cs *engine);
> +void intel_engine_irq_disable(struct intel_engine_cs *engine);
> +
>   static inline void __intel_engine_reset(struct intel_engine_cs *engine,
>   					bool stalled)
>   {
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> index 903f72f0953a..10300db1c9a6 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> @@ -765,7 +765,7 @@ static int engine_setup_common(struct intel_engine_cs *engine)
>   err_cmd_parser:
>   	i915_sched_engine_put(engine->sched_engine);
>   err_sched_engine:
> -	intel_breadcrumbs_free(engine->breadcrumbs);
> +	intel_breadcrumbs_put(engine->breadcrumbs);
>   err_status:
>   	cleanup_status_page(engine);
>   	return err;
> @@ -965,7 +965,7 @@ void intel_engine_cleanup_common(struct intel_engine_cs *engine)
>   {
>   	GEM_BUG_ON(!list_empty(&engine->sched_engine->requests));
>   
> -	intel_breadcrumbs_free(engine->breadcrumbs);
> +	intel_breadcrumbs_put(engine->breadcrumbs);
>   	i915_sched_engine_put(engine->sched_engine);
>   
>   	intel_engine_fini_retire(engine);
> @@ -1320,6 +1320,30 @@ bool intel_engines_are_idle(struct intel_gt *gt)
>   	return true;
>   }
>   
> +bool intel_engine_irq_enable(struct intel_engine_cs *engine)
> +{
> +	if (!engine->irq_enable)
> +		return false;
> +
> +	/* Caller disables interrupts */
> +	spin_lock(&engine->gt->irq_lock);
> +	engine->irq_enable(engine);
> +	spin_unlock(&engine->gt->irq_lock);
> +
> +	return true;
> +}
> +
> +void intel_engine_irq_disable(struct intel_engine_cs *engine)
> +{
> +	if (!engine->irq_disable)
> +		return;
> +
> +	/* Caller disables interrupts */
> +	spin_lock(&engine->gt->irq_lock);
> +	engine->irq_disable(engine);
> +	spin_unlock(&engine->gt->irq_lock);
> +}
> +
>   void intel_engines_reset_default_submission(struct intel_gt *gt)
>   {
>   	struct intel_engine_cs *engine;
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> index e2b5cda6dbc4..f7b6eed586ce 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> @@ -64,7 +64,6 @@ struct intel_gt;
>   struct intel_ring;
>   struct intel_uncore;
>   
> -typedef u8 intel_engine_mask_t;
>   #define ALL_ENGINES ((intel_engine_mask_t)~0ul)
>   
>   struct intel_hw_status_page {
> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> index 02880ea5d693..396b1356ea3e 100644
> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> @@ -3418,9 +3418,11 @@ static void rcu_virtual_context_destroy(struct work_struct *wrk)
>   	lrc_fini(&ve->context);
>   	intel_context_fini(&ve->context);
>   
> -	intel_breadcrumbs_free(ve->base.breadcrumbs);
> +	if (ve->base.breadcrumbs)
> +		intel_breadcrumbs_put(ve->base.breadcrumbs);
>   	if (ve->base.sched_engine)
>   		i915_sched_engine_put(ve->base.sched_engine);
> +
>   	intel_engine_free_request_pool(&ve->base);
>   
>   	kfree(ve->bonds);
> diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
> index 97b10fd60b55..4d023b5cd5da 100644
> --- a/drivers/gpu/drm/i915/gt/mock_engine.c
> +++ b/drivers/gpu/drm/i915/gt/mock_engine.c
> @@ -284,7 +284,7 @@ static void mock_engine_release(struct intel_engine_cs *engine)
>   	GEM_BUG_ON(timer_pending(&mock->hw_delay));
>   
>   	i915_sched_engine_put(engine->sched_engine);
> -	intel_breadcrumbs_free(engine->breadcrumbs);
> +	intel_breadcrumbs_put(engine->breadcrumbs);
>   
>   	intel_context_unpin(engine->kernel_context);
>   	intel_context_put(engine->kernel_context);
> @@ -376,7 +376,7 @@ int mock_engine_init(struct intel_engine_cs *engine)
>   	return 0;
>   
>   err_breadcrumbs:
> -	intel_breadcrumbs_free(engine->breadcrumbs);
> +	intel_breadcrumbs_put(engine->breadcrumbs);
>   err_schedule:
>   	i915_sched_engine_put(engine->sched_engine);
>   	return -ENOMEM;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index f0e5731bcef6..80b89171b35a 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -1077,6 +1077,9 @@ static void __guc_context_destroy(struct intel_context *ce)
>   		struct guc_virtual_engine *ve =
>   			container_of(ce, typeof(*ve), context);
>   
> +		if (ve->base.breadcrumbs)
> +			intel_breadcrumbs_put(ve->base.breadcrumbs);
> +
>   		kfree(ve);
>   	} else {
>   		intel_context_free(ce);
> @@ -1381,6 +1384,62 @@ static const struct intel_context_ops virtual_guc_context_ops = {
>   	.get_sibling = guc_virtual_get_sibling,
>   };
>   
> +static bool
> +guc_irq_enable_breadcrumbs(struct intel_breadcrumbs *b)
> +{
> +	struct intel_engine_cs *sibling;
> +	intel_engine_mask_t tmp, mask = b->engine_mask;
> +	bool result = false;
> +
> +	for_each_engine_masked(sibling, b->irq_engine->gt, mask, tmp)
> +		result |= intel_engine_irq_enable(sibling);
> +
> +	return result;
> +}
> +
> +static void
> +guc_irq_disable_breadcrumbs(struct intel_breadcrumbs *b)
> +{
> +	struct intel_engine_cs *sibling;
> +	intel_engine_mask_t tmp, mask = b->engine_mask;
> +
> +	for_each_engine_masked(sibling, b->irq_engine->gt, mask, tmp)
> +		intel_engine_irq_disable(sibling);
> +}
> +
> +static void guc_init_breadcrumbs(struct intel_engine_cs *engine)
> +{
> +	int i;
> +
> +       /*
> +        * In GuC submission mode we do not know which physical engine a request
> +        * will be scheduled on, this creates a problem because the breadcrumb
> +        * interrupt is per physical engine. To work around this we attach
> +        * requests and direct all breadcrumb interrupts to the first instance
> +        * of an engine per class. In addition all breadcrumb interrupts are
> +	* enaled / disabled across an engine class in unison.

enabled

So the problem statement only applies to virtual engines, but this code
runs for physical engines as well.

> +        */
> +	for (i = 0; i < MAX_ENGINE_INSTANCE; ++i) {
> +		struct intel_engine_cs *sibling =
> +			engine->gt->engine_class[engine->class][i];
> +
> +		if (sibling) {
> +			if (engine->breadcrumbs != sibling->breadcrumbs) {
> +				intel_breadcrumbs_put(engine->breadcrumbs);
> +				engine->breadcrumbs =
> +					intel_breadcrumbs_get(sibling->breadcrumbs);
> +			}

...and it frees the breadcrumb instances previously created, replacing
them with...

> +			break;
> +		}
> +	}
> +
> +	if (engine->breadcrumbs) {
> +		engine->breadcrumbs->engine_mask |= engine->mask;
> +		engine->breadcrumbs->irq_enable = guc_irq_enable_breadcrumbs;
> +		engine->breadcrumbs->irq_disable = guc_irq_disable_breadcrumbs;
> +	}

...a patched-up version of the breadcrumbs from the first engine instance?

It's too kludgy in my view. Rather than create, then destroy and patch
up, the setup code should just do the right thing from the start.

This means if the design is one breadcrumbs object per engine class, have
that created and stored somewhere under gt.guc and then just take a kref
when GuC is in use, instead of creating per-engine breadcrumbs and then
destroying them.
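
Something along these lines is what I have in mind - a completely
untested sketch, with the storage location and helper name invented,
error handling skipped, and assuming the common engine setup no longer
creates per-engine breadcrumbs when GuC submission is used:

/* e.g. in struct intel_guc - one breadcrumbs object per engine class */
struct intel_breadcrumbs *class_breadcrumbs[MAX_ENGINE_CLASS + 1];

static struct intel_breadcrumbs *
guc_class_breadcrumbs(struct intel_guc *guc, struct intel_engine_cs *engine)
{
	struct intel_breadcrumbs **b = &guc->class_breadcrumbs[engine->class];

	/* Lazily create the per-class instance on first use. */
	if (!*b) {
		*b = intel_breadcrumbs_create(engine);
		(*b)->irq_enable = guc_irq_enable_breadcrumbs;
		(*b)->irq_disable = guc_irq_disable_breadcrumbs;
	}

	return *b;
}

static void guc_init_breadcrumbs(struct intel_engine_cs *engine)
{
	struct intel_guc *guc = &engine->gt->uc.guc;

	/* Just take a reference - nothing is created and then thrown away. */
	engine->breadcrumbs =
		intel_breadcrumbs_get(guc_class_breadcrumbs(guc, engine));
	engine->breadcrumbs->engine_mask |= engine->mask;
}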

On a related note, it's unfortunate how the very thing which should
offload work from the CPU actually creates more work for the CPU in
several areas (breadcrumbs, engine serial, CT busy looping, probably
more...). And we have no idea if, or when, it will come out better
overall.

But yeah, I don't have any ideas on how to do it better at the high
level. Clearly interrupts need to be enabled for all engines the
virtuals are composed of, and clearly there needs to be a single tree of
all of those so the code can find them.

Actually, this approach doesn't work for the mixed-class virtual engines
which were considered at one point. Oh well... pain for another day.

Regards,

Tvrtko

> +}
> +
>   static void sanitize_hwsp(struct intel_engine_cs *engine)
>   {
>   	struct intel_timeline *tl;
> @@ -1604,6 +1663,7 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
>   
>   	guc_default_vfuncs(engine);
>   	guc_default_irqs(engine);
> +	guc_init_breadcrumbs(engine);
>   
>   	if (engine->class == RENDER_CLASS)
>   		rcs_submission_override(engine);
> @@ -1846,11 +1906,6 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
>   	ve->base.instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
>   	ve->base.uabi_instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
>   	ve->base.saturated = ALL_ENGINES;
> -	ve->base.breadcrumbs = intel_breadcrumbs_create(&ve->base);
> -	if (!ve->base.breadcrumbs) {
> -		kfree(ve);
> -		return ERR_PTR(-ENOMEM);
> -	}
>   
>   	snprintf(ve->base.name, sizeof(ve->base.name), "virtual");
>   
> @@ -1899,6 +1954,8 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
>   				sibling->emit_fini_breadcrumb;
>   			ve->base.emit_fini_breadcrumb_dw =
>   				sibling->emit_fini_breadcrumb_dw;
> +			ve->base.breadcrumbs =
> +				intel_breadcrumbs_get(sibling->breadcrumbs);
>   
>   			ve->base.flags |= sibling->flags;
>   
> 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 64/97] drm/i915/guc: Reset implementation for new GuC interface
  2021-05-06 19:14 ` [RFC PATCH 64/97] drm/i915/guc: Reset implementation for new GuC interface Matthew Brost
@ 2021-06-02 14:33   ` Tvrtko Ursulin
  2021-06-04  3:17     ` Matthew Brost
  0 siblings, 1 reply; 249+ messages in thread
From: Tvrtko Ursulin @ 2021-06-02 14:33 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel; +Cc: jason.ekstrand, daniel.vetter


On 06/05/2021 20:14, Matthew Brost wrote:
> Reset implementation for new GuC interface. This is the legacy reset
> implementation which is called when the i915 owns the engine hang check.
> Future patches will offload the engine hang check to GuC but we will
> continue to maintain this legacy path as a fallback and this code path
> is also required if the GuC dies.
> 
> With the new GuC interface it is not possible to reset individual
> engines - it is only possible to reset the GPU entirely. This patch
> forces an entire chip reset if any engine hangs.
> 
> Cc: John Harrison <john.c.harrison@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_context.c       |   3 +
>   drivers/gpu/drm/i915/gt/intel_context_types.h |   7 +
>   drivers/gpu/drm/i915/gt/intel_engine_types.h  |   6 +
>   .../drm/i915/gt/intel_execlists_submission.c  |  40 ++
>   drivers/gpu/drm/i915/gt/intel_gt_pm.c         |   6 +-
>   drivers/gpu/drm/i915/gt/intel_reset.c         |  18 +-
>   .../gpu/drm/i915/gt/intel_ring_submission.c   |  22 +
>   drivers/gpu/drm/i915/gt/mock_engine.c         |  31 +
>   drivers/gpu/drm/i915/gt/uc/intel_guc.c        |  16 +-
>   drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   8 +-
>   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 580 ++++++++++++++----
>   drivers/gpu/drm/i915/gt/uc/intel_uc.c         |  34 +-
>   drivers/gpu/drm/i915/gt/uc/intel_uc.h         |   3 +
>   drivers/gpu/drm/i915/i915_request.c           |  41 +-
>   drivers/gpu/drm/i915/i915_request.h           |   2 +
>   15 files changed, 643 insertions(+), 174 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
> index b24a1b7a3f88..2f01437056a8 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context.c
> +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> @@ -392,6 +392,9 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
>   	spin_lock_init(&ce->guc_state.lock);
>   	INIT_LIST_HEAD(&ce->guc_state.fences);
>   
> +	spin_lock_init(&ce->guc_active.lock);
> +	INIT_LIST_HEAD(&ce->guc_active.requests);
> +
>   	ce->guc_id = GUC_INVALID_LRC_ID;
>   	INIT_LIST_HEAD(&ce->guc_id_link);
>   
> diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
> index 6945963a31ba..b63c8cf7823b 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> @@ -165,6 +165,13 @@ struct intel_context {
>   		struct list_head fences;
>   	} guc_state;
>   
> +	struct {
> +		/** lock: protects everything in guc_active */
> +		spinlock_t lock;
> +		/** requests: active requests on this context */
> +		struct list_head requests;
> +	} guc_active;

More accounting - yeah, this is another case of the GuC giving with one
hand and taking away with the other. :(

> +
>   	/* GuC scheduling state that does not require a lock. */
>   	atomic_t guc_sched_state_no_lock;
>   
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> index f7b6eed586ce..b84562b2708b 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> @@ -432,6 +432,12 @@ struct intel_engine_cs {
>   	 */
>   	void		(*release)(struct intel_engine_cs *engine);
>   
> +	/*
> +	 * Add / remove request from engine active tracking
> +	 */
> +	void		(*add_active_request)(struct i915_request *rq);
> +	void		(*remove_active_request)(struct i915_request *rq);
> +
>   	struct intel_engine_execlists execlists;
>   
>   	/*
> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> index 396b1356ea3e..54518b64bdbd 100644
> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> @@ -3117,6 +3117,42 @@ static void execlists_park(struct intel_engine_cs *engine)
>   	cancel_timer(&engine->execlists.preempt);
>   }
>   
> +static void add_to_engine(struct i915_request *rq)
> +{
> +	lockdep_assert_held(&rq->engine->sched_engine->lock);
> +	list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
> +}
> +
> +static void remove_from_engine(struct i915_request *rq)
> +{
> +	struct intel_engine_cs *engine, *locked;
> +
> +	/*
> +	 * Virtual engines complicate acquiring the engine timeline lock,
> +	 * as their rq->engine pointer is not stable until under that
> +	 * engine lock. The simple ploy we use is to take the lock then
> +	 * check that the rq still belongs to the newly locked engine.
> +	 */
> +	locked = READ_ONCE(rq->engine);
> +	spin_lock_irq(&locked->sched_engine->lock);
> +	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
> +		spin_unlock(&locked->sched_engine->lock);
> +		spin_lock(&engine->sched_engine->lock);
> +		locked = engine;
> +	}

Could use i915_request_active_engine although tbf I don't remember why I 
did not convert all callers when I added it. Perhaps I just did not find 
them all.

> +	list_del_init(&rq->sched.link);
> +
> +	clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> +	clear_bit(I915_FENCE_FLAG_HOLD, &rq->fence.flags);
> +
> +	/* Prevent further __await_execution() registering a cb, then flush */
> +	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
> +
> +	spin_unlock_irq(&locked->sched_engine->lock);
> +
> +	i915_request_notify_execute_cb_imm(rq);
> +}
> +
>   static bool can_preempt(struct intel_engine_cs *engine)
>   {
>   	if (INTEL_GEN(engine->i915) > 8)
> @@ -3214,6 +3250,8 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
>   	engine->cops = &execlists_context_ops;
>   	engine->request_alloc = execlists_request_alloc;
>   	engine->bump_serial = execlist_bump_serial;
> +	engine->add_active_request = add_to_engine;
> +	engine->remove_active_request = remove_from_engine;
>   
>   	engine->reset.prepare = execlists_reset_prepare;
>   	engine->reset.rewind = execlists_reset_rewind;
> @@ -3915,6 +3953,8 @@ execlists_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
>   		ve->base.sched_engine->kick_backend =
>   			sibling->sched_engine->kick_backend;
>   
> +		ve->base.add_active_request = sibling->add_active_request;
> +		ve->base.remove_active_request = sibling->remove_active_request;
>   		ve->base.emit_bb_start = sibling->emit_bb_start;
>   		ve->base.emit_flush = sibling->emit_flush;
>   		ve->base.emit_init_breadcrumb = sibling->emit_init_breadcrumb;
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> index aef3084e8b16..463a6ae605a0 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> @@ -174,8 +174,6 @@ static void gt_sanitize(struct intel_gt *gt, bool force)
>   	if (intel_gt_is_wedged(gt))
>   		intel_gt_unset_wedged(gt);
>   
> -	intel_uc_sanitize(&gt->uc);
> -
>   	for_each_engine(engine, gt, id)
>   		if (engine->reset.prepare)
>   			engine->reset.prepare(engine);
> @@ -191,6 +189,8 @@ static void gt_sanitize(struct intel_gt *gt, bool force)
>   			__intel_engine_reset(engine, false);
>   	}
>   
> +	intel_uc_reset(&gt->uc, false);
> +
>   	for_each_engine(engine, gt, id)
>   		if (engine->reset.finish)
>   			engine->reset.finish(engine);
> @@ -243,6 +243,8 @@ int intel_gt_resume(struct intel_gt *gt)
>   		goto err_wedged;
>   	}
>   
> +	intel_uc_reset_finish(&gt->uc);
> +
>   	intel_rps_enable(&gt->rps);
>   	intel_llc_enable(&gt->llc);
>   
> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> index d5094be6d90f..ce3ef26ffe2d 100644
> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> @@ -758,6 +758,8 @@ static int gt_reset(struct intel_gt *gt, intel_engine_mask_t stalled_mask)
>   		__intel_engine_reset(engine, stalled_mask & engine->mask);
>   	local_bh_enable();
>   
> +	intel_uc_reset(&gt->uc, true);
> +
>   	intel_ggtt_restore_fences(gt->ggtt);
>   
>   	return err;
> @@ -782,6 +784,8 @@ static void reset_finish(struct intel_gt *gt, intel_engine_mask_t awake)
>   		if (awake & engine->mask)
>   			intel_engine_pm_put(engine);
>   	}
> +
> +	intel_uc_reset_finish(&gt->uc);
>   }
>   
>   static void nop_submit_request(struct i915_request *request)
> @@ -835,6 +839,7 @@ static void __intel_gt_set_wedged(struct intel_gt *gt)
>   	for_each_engine(engine, gt, id)
>   		if (engine->reset.cancel)
>   			engine->reset.cancel(engine);
> +	intel_uc_cancel_requests(&gt->uc);
>   	local_bh_enable();
>   
>   	reset_finish(gt, awake);
> @@ -1123,6 +1128,9 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)
>   	ENGINE_TRACE(engine, "flags=%lx\n", gt->reset.flags);
>   	GEM_BUG_ON(!test_bit(I915_RESET_ENGINE + engine->id, &gt->reset.flags));
>   
> +	if (intel_engine_uses_guc(engine))
> +		return -ENODEV;
> +
>   	if (!intel_engine_pm_get_if_awake(engine))
>   		return 0;
>   
> @@ -1133,13 +1141,10 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)
>   			   "Resetting %s for %s\n", engine->name, msg);
>   	atomic_inc(&engine->i915->gpu_error.reset_engine_count[engine->uabi_class]);
>   
> -	if (intel_engine_uses_guc(engine))
> -		ret = intel_guc_reset_engine(&engine->gt->uc.guc, engine);
> -	else
> -		ret = intel_gt_reset_engine(engine);
> +	ret = intel_gt_reset_engine(engine);
>   	if (ret) {
>   		/* If we fail here, we expect to fallback to a global reset */
> -		ENGINE_TRACE(engine, "Failed to reset, err: %d\n", ret);
> +		ENGINE_TRACE(engine, "Failed to reset %s, err: %d\n", engine->name, ret);
>   		goto out;
>   	}
>   
> @@ -1273,7 +1278,8 @@ void intel_gt_handle_error(struct intel_gt *gt,
>   	 * Try engine reset when available. We fall back to full reset if
>   	 * single reset fails.
>   	 */
> -	if (intel_has_reset_engine(gt) && !intel_gt_is_wedged(gt)) {
> +	if (!intel_uc_uses_guc_submission(&gt->uc) &&
> +	    intel_has_reset_engine(gt) && !intel_gt_is_wedged(gt)) {

If the driver cannot do engine resets with GuC, could
intel_has_reset_engine just return false in that case, so the GuC check
wouldn't have to be added here? Also noticed this is the same open I had
in 2019, and someone said it could and would be folded in. ;(
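
Roughly what I mean, assuming intel_has_reset_engine() still looks more
or less like it does today:

bool intel_has_reset_engine(struct intel_gt *gt)
{
	if (!gt->i915->params.reset)
		return false;

	/* With GuC submission i915 cannot reset individual engines. */
	if (intel_uc_uses_guc_submission(&gt->uc))
		return false;

	return INTEL_INFO(gt->i915)->has_reset_engine;
}

Then this call site, and any future ones, stay untouched.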

>   		local_bh_disable();
>   		for_each_engine_masked(engine, gt, engine_mask, tmp) {
>   			BUILD_BUG_ON(I915_RESET_MODESET >= I915_RESET_ENGINE);
> diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> index 39dd7c4ed0a9..7d05bf16094c 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> @@ -1050,6 +1050,25 @@ static void ring_bump_serial(struct intel_engine_cs *engine)
>   	engine->serial++;
>   }
>   
> +static void add_to_engine(struct i915_request *rq)
> +{
> +	lockdep_assert_held(&rq->engine->sched_engine->lock);
> +	list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
> +}
> +
> +static void remove_from_engine(struct i915_request *rq)
> +{
> +	spin_lock_irq(&rq->engine->sched_engine->lock);
> +	list_del_init(&rq->sched.link);
> +
> +	/* Prevent further __await_execution() registering a cb, then flush */
> +	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
> +
> +	spin_unlock_irq(&rq->engine->sched_engine->lock);
> +
> +	i915_request_notify_execute_cb_imm(rq);
> +}
> +
>   static void setup_common(struct intel_engine_cs *engine)
>   {
>   	struct drm_i915_private *i915 = engine->i915;
> @@ -1067,6 +1086,9 @@ static void setup_common(struct intel_engine_cs *engine)
>   	engine->reset.cancel = reset_cancel;
>   	engine->reset.finish = reset_finish;
>   
> +	engine->add_active_request = add_to_engine;
> +	engine->remove_active_request = remove_from_engine;
> +
>   	engine->cops = &ring_context_ops;
>   	engine->request_alloc = ring_request_alloc;
>   	engine->bump_serial = ring_bump_serial;
> diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
> index 4d023b5cd5da..dccf5fce980a 100644
> --- a/drivers/gpu/drm/i915/gt/mock_engine.c
> +++ b/drivers/gpu/drm/i915/gt/mock_engine.c
> @@ -235,6 +235,35 @@ static void mock_submit_request(struct i915_request *request)
>   	spin_unlock_irqrestore(&engine->hw_lock, flags);
>   }
>   
> +static void mock_add_to_engine(struct i915_request *rq)
> +{
> +	lockdep_assert_held(&rq->engine->sched_engine->lock);
> +	list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
> +}
> +
> +static void mock_remove_from_engine(struct i915_request *rq)
> +{
> +	struct intel_engine_cs *engine, *locked;
> +
> +	/*
> +	 * Virtual engines complicate acquiring the engine timeline lock,
> +	 * as their rq->engine pointer is not stable until under that
> +	 * engine lock. The simple ploy we use is to take the lock then
> +	 * check that the rq still belongs to the newly locked engine.
> +	 */
> +
> +	locked = READ_ONCE(rq->engine);
> +	spin_lock_irq(&locked->sched_engine->lock);
> +	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
> +		spin_unlock(&locked->sched_engine->lock);
> +		spin_lock(&engine->sched_engine->lock);
> +		locked = engine;
> +	}
> +	list_del_init(&rq->sched.link);
> +	spin_unlock_irq(&locked->sched_engine->lock);
> +}
> +
> +
>   static void mock_reset_prepare(struct intel_engine_cs *engine)
>   {
>   }
> @@ -327,6 +356,8 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
>   	engine->base.emit_flush = mock_emit_flush;
>   	engine->base.emit_fini_breadcrumb = mock_emit_breadcrumb;
>   	engine->base.submit_request = mock_submit_request;
> +	engine->base.add_active_request = mock_add_to_engine;
> +	engine->base.remove_active_request = mock_remove_from_engine;
>   
>   	engine->base.reset.prepare = mock_reset_prepare;
>   	engine->base.reset.rewind = mock_reset_rewind;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> index 235c1997f32d..864b14e313a3 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> @@ -146,6 +146,9 @@ static void gen11_disable_guc_interrupts(struct intel_guc *guc)
>   {
>   	struct intel_gt *gt = guc_to_gt(guc);
>   
> +	if (!guc->interrupts.enabled)
> +		return;
> +
>   	spin_lock_irq(&gt->irq_lock);
>   	guc->interrupts.enabled = false;
>   
> @@ -579,19 +582,6 @@ int intel_guc_suspend(struct intel_guc *guc)
>   	return 0;
>   }
>   
> -/**
> - * intel_guc_reset_engine() - ask GuC to reset an engine
> - * @guc:	intel_guc structure
> - * @engine:	engine to be reset
> - */
> -int intel_guc_reset_engine(struct intel_guc *guc,
> -			   struct intel_engine_cs *engine)
> -{
> -	/* XXX: to be implemented with submission interface rework */
> -
> -	return -ENODEV;
> -}
> -
>   /**
>    * intel_guc_resume() - notify GuC resuming from suspend state
>    * @guc:	the guc
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> index 47eaa69809e8..afea04d56494 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> @@ -243,14 +243,16 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask)
>   
>   int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout);
>   
> -int intel_guc_reset_engine(struct intel_guc *guc,
> -			   struct intel_engine_cs *engine);
> -
>   int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
>   					  const u32 *msg, u32 len);
>   int intel_guc_sched_done_process_msg(struct intel_guc *guc,
>   				     const u32 *msg, u32 len);
>   
> +void intel_guc_submission_reset_prepare(struct intel_guc *guc);
> +void intel_guc_submission_reset(struct intel_guc *guc, bool stalled);
> +void intel_guc_submission_reset_finish(struct intel_guc *guc);
> +void intel_guc_submission_cancel_requests(struct intel_guc *guc);
> +
>   void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p);
>   
>   #endif
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 80b89171b35a..8c093bc2d3a4 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -140,7 +140,7 @@ context_wait_for_deregister_to_register(struct intel_context *ce)
>   static inline void
>   set_context_wait_for_deregister_to_register(struct intel_context *ce)
>   {
> -	/* Only should be called from guc_lrc_desc_pin() */
> +	/* Only should be called from guc_lrc_desc_pin() without lock */
>   	ce->guc_state.sched_state |=
>   		SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER;
>   }
> @@ -240,15 +240,31 @@ static int guc_lrc_desc_pool_create(struct intel_guc *guc)
>   
>   static void guc_lrc_desc_pool_destroy(struct intel_guc *guc)
>   {
> +	guc->lrc_desc_pool_vaddr = NULL;
>   	i915_vma_unpin_and_release(&guc->lrc_desc_pool, I915_VMA_RELEASE_MAP);
>   }
>   
> +static inline bool guc_submission_initialized(struct intel_guc *guc)
> +{
> +	return guc->lrc_desc_pool_vaddr != NULL;
> +}
> +
>   static inline void reset_lrc_desc(struct intel_guc *guc, u32 id)
>   {
> -	struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
> +	if (likely(guc_submission_initialized(guc))) {
> +		struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
> +		unsigned long flags;
>   
> -	memset(desc, 0, sizeof(*desc));
> -	xa_erase_irq(&guc->context_lookup, id);
> +		memset(desc, 0, sizeof(*desc));
> +
> +		/*
> +		 * xarray API doesn't have xa_erase_irqsave wrapper, so calling
> +		 * the lower level functions directly.
> +		 */
> +		xa_lock_irqsave(&guc->context_lookup, flags);
> +		__xa_erase(&guc->context_lookup, id);
> +		xa_unlock_irqrestore(&guc->context_lookup, flags);
> +	}
>   }
>   
>   static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
> @@ -259,7 +275,15 @@ static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
>   static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
>   					   struct intel_context *ce)
>   {
> -	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
> +	unsigned long flags;
> +
> +	/*
> +	 * xarray API doesn't have xa_save_irqsave wrapper, so calling the
> +	 * lower level functions directly.
> +	 */
> +	xa_lock_irqsave(&guc->context_lookup, flags);
> +	__xa_store(&guc->context_lookup, id, ce, GFP_ATOMIC);
> +	xa_unlock_irqrestore(&guc->context_lookup, flags);
>   }
>   
>   static int guc_submission_busy_loop(struct intel_guc* guc,
> @@ -330,6 +354,8 @@ int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout)
>   					interruptible, timeout);
>   }
>   
> +static int guc_lrc_desc_pin(struct intel_context *ce, bool loop);
> +
>   static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
>   {
>   	int err;
> @@ -337,11 +363,22 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
>   	u32 action[3];
>   	int len = 0;
>   	u32 g2h_len_dw = 0;
> -	bool enabled = context_enabled(ce);
> +	bool enabled;
>   
>   	GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
>   	GEM_BUG_ON(context_guc_id_invalid(ce));
>   
> +	/*
> +	 * Corner case where the GuC firmware was blown away and reloaded while
> +	 * this context was pinned.
> +	 */
> +	if (unlikely(!lrc_desc_registered(guc, ce->guc_id))) {
> +		err = guc_lrc_desc_pin(ce, false);
> +		if (unlikely(err))
> +			goto out;
> +	}
> +	enabled = context_enabled(ce);
> +
>   	if (!enabled) {
>   		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
>   		action[len++] = ce->guc_id;
> @@ -364,6 +401,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
>   		intel_context_put(ce);
>   	}
>   
> +out:
>   	return err;
>   }
>   
> @@ -418,15 +456,10 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
>   	if (submit) {
>   		guc_set_lrc_tail(last);
>   resubmit:
> -		/*
> -		 * We only check for -EBUSY here even though it is possible for
> -		 * -EDEADLK to be returned. If -EDEADLK is returned, the GuC has
> -		 * died and a full GPU needs to be done. The hangcheck will
> -		 * eventually detect that the GuC has died and trigger this
> -		 * reset so no need to handle -EDEADLK here.
> -		 */
>   		ret = guc_add_request(guc, last);
> -		if (ret == -EBUSY) {
> +		if (unlikely(ret == -EDEADLK))
> +			goto deadlk;
> +		else if (ret == -EBUSY) {
>   			i915_sched_engine_kick(sched_engine);
>   			guc->stalled_request = last;
>   			return false;
> @@ -436,6 +469,11 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
>   
>   	guc->stalled_request = NULL;
>   	return submit;
> +
> +deadlk:
> +	sched_engine->tasklet.callback = NULL;
> +	tasklet_disable_nosync(&sched_engine->tasklet);
> +	return false;
>   }
>   
>   static void guc_submission_tasklet(struct tasklet_struct *t)
> @@ -462,29 +500,165 @@ static void cs_irq_handler(struct intel_engine_cs *engine, u16 iir)
>   		intel_engine_signal_breadcrumbs(engine);
>   }
>   
> -static void guc_reset_prepare(struct intel_engine_cs *engine)
> +static void __guc_context_destroy(struct intel_context *ce);
> +static void release_guc_id(struct intel_guc *guc, struct intel_context *ce);
> +static void guc_signal_context_fence(struct intel_context *ce);
> +
> +static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
>   {
> -	struct i915_sched_engine * const sched_engine = engine->sched_engine;
> +	struct intel_context *ce;
> +	unsigned long index, flags;
> +	bool pending_disable, pending_enable, deregister, destroyed;
>   
> -	ENGINE_TRACE(engine, "\n");
> +	xa_for_each(&guc->context_lookup, index, ce) {
> +		/* Flush context */
> +		spin_lock_irqsave(&ce->guc_state.lock, flags);
> +		spin_unlock_irqrestore(&ce->guc_state.lock, flags);

Very unusual pattern - what does it do?
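
If the intent is the empty lock/unlock acting as a barrier - i.e. wait
for any concurrent holder of ce->guc_state.lock to drop it, so that the
flag reads below see their updates - then it at least needs a comment
spelling that out. Something like this, though that is only my guess at
the intent:

		/*
		 * Lock and immediately unlock as a barrier: any update to
		 * ce->guc_state.sched_state made while holding the lock
		 * has completed and is visible before the flags are
		 * sampled below.
		 */
		spin_lock_irqsave(&ce->guc_state.lock, flags);
		spin_unlock_irqrestore(&ce->guc_state.lock, flags);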

> +
> +		/*
> +		 * Once we are at this point submission_disabled() is guaranteed
> +		 * to visible to all callers who set the below flags (see above
> +		 * flush and flushes in reset_prepare). If submission_disabled()
> +		 * is set, the caller shouldn't set these flags.
> +		 */
> +
> +		destroyed = context_destroyed(ce);
> +		pending_enable = context_pending_enable(ce);
> +		pending_disable = context_pending_disable(ce);
> +		deregister = context_wait_for_deregister_to_register(ce);
> +		init_sched_state(ce);
> +
> +		if (pending_enable || destroyed || deregister) {
> +			atomic_dec(&guc->outstanding_submission_g2h);
> +			if (deregister)
> +				guc_signal_context_fence(ce);
> +			if (destroyed) {
> +				release_guc_id(guc, ce);
> +				__guc_context_destroy(ce);
> +			}
> +			if (pending_enable|| deregister)
> +				intel_context_put(ce);
> +		}
> +
> +		/* Not mutualy exclusive with above if statement. */
> +		if (pending_disable) {
> +			guc_signal_context_fence(ce);
> +			intel_context_sched_disable_unpin(ce);
> +			atomic_dec(&guc->outstanding_submission_g2h);
> +			intel_context_put(ce);
> +		}

Yeah, this function is a taste of the state machine which I think is
_extremely_ hard to review and know with any confidence that it does the
right thing.

> +	}
> +}
> +
> +static inline bool
> +submission_disabled(struct intel_guc *guc)
> +{
> +	struct i915_sched_engine * const sched_engine = guc->sched_engine;
> +
> +	return unlikely(!__tasklet_is_enabled(&sched_engine->tasklet));
> +}
> +
> +static void disable_submission(struct intel_guc *guc)
> +{
> +	struct i915_sched_engine * const sched_engine = guc->sched_engine;
> +
> +	if (__tasklet_is_enabled(&sched_engine->tasklet)) {
> +		GEM_BUG_ON(!guc->ct.enabled);
> +		__tasklet_disable_sync_once(&sched_engine->tasklet);
> +		sched_engine->tasklet.callback = NULL;
> +	}
> +}
> +
> +static void enable_submission(struct intel_guc *guc)
> +{
> +	struct i915_sched_engine * const sched_engine = guc->sched_engine;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&guc->sched_engine->lock, flags);
> +	sched_engine->tasklet.callback = guc_submission_tasklet;
> +	wmb();

All memory barriers must be documented.

> +	if (!__tasklet_is_enabled(&sched_engine->tasklet) &&
> +	    __tasklet_enable(&sched_engine->tasklet)) {
> +		GEM_BUG_ON(!guc->ct.enabled);
> +
> +		/* And kick in case we missed a new request submission. */
> +		i915_sched_engine_hi_kick(sched_engine);
> +	}
> +	spin_unlock_irqrestore(&guc->sched_engine->lock, flags);
> +}
> +
> +static void guc_flush_submissions(struct intel_guc *guc)
> +{
> +	struct i915_sched_engine * const sched_engine = guc->sched_engine;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&sched_engine->lock, flags);
> +	spin_unlock_irqrestore(&sched_engine->lock, flags);

Oh right, more of this. No idea.

> +}
> +
> +void intel_guc_submission_reset_prepare(struct intel_guc *guc)
> +{
> +	int i;
> +
> +	if (unlikely(!guc_submission_initialized(guc)))
> +		/* Reset called during driver load? GuC not yet initialised! */
> +		return;
> +
> +	disable_submission(guc);
> +	guc->interrupts.disable(guc);
> +
> +	/* Flush IRQ handler */
> +	spin_lock_irq(&guc_to_gt(guc)->irq_lock);
> +	spin_unlock_irq(&guc_to_gt(guc)->irq_lock);
> +
> +	guc_flush_submissions(guc);
>   
>   	/*
> -	 * Prevent request submission to the hardware until we have
> -	 * completed the reset in i915_gem_reset_finish(). If a request
> -	 * is completed by one engine, it may then queue a request
> -	 * to a second via its sched_engine->tasklet *just* as we are
> -	 * calling engine->init_hw() and also writing the ELSP.
> -	 * Turning off the sched_engine->tasklet until the reset is over
> -	 * prevents the race.
> +	 * Handle any outstanding G2Hs before reset. Call IRQ handler directly
> +	 * each pass as interrupt have been disabled. We always scrub for
> +	 * outstanding G2H as it is possible for outstanding_submission_g2h to
> +	 * be incremented after the context state update.
>   	 */
> -	__tasklet_disable_sync_once(&sched_engine->tasklet);
> +	for (i = 0; i < 4 && atomic_read(&guc->outstanding_submission_g2h); ++i) {

Why is four the magic number and what happens if it is not enough?

> +		intel_guc_to_host_event_handler(guc);
> +#define wait_for_reset(guc, wait_var) \
> +		guc_wait_for_pending_msg(guc, wait_var, false, (HZ / 20))
> +		do {
> +			wait_for_reset(guc, &guc->outstanding_submission_g2h);
> +		} while (!list_empty(&guc->ct.requests.incoming));
> +	}
> +	scrub_guc_desc_for_outstanding_g2h(guc);
> +}
> +
> +static struct intel_engine_cs *
> +guc_virtual_get_sibling(struct intel_engine_cs *ve, unsigned int sibling)
> +{
> +	struct intel_engine_cs *engine;
> +	intel_engine_mask_t tmp, mask = ve->mask;
> +	unsigned int num_siblings = 0;
> +
> +	for_each_engine_masked(engine, ve->gt, mask, tmp)
> +		if (num_siblings++ == sibling)
> +			return engine;

Not sure how often this is used overall and whether just storing the
sibling array in the ve could be justified.
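
A rough sketch of the caching idea - it assumes (invented) siblings[] /
num_siblings fields in guc_virtual_engine, populated once at
guc_create_virtual() time:

static struct intel_engine_cs *
guc_virtual_get_sibling(struct intel_engine_cs *ve, unsigned int sibling)
{
	struct guc_virtual_engine *gve =
		container_of(ve, struct guc_virtual_engine, base);

	if (sibling >= gve->num_siblings)
		return NULL;

	return gve->siblings[sibling];
}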

> +
> +	return NULL;
> +}
> +
> +static inline struct intel_engine_cs *
> +__context_to_physical_engine(struct intel_context *ce)
> +{
> +	struct intel_engine_cs *engine = ce->engine;
> +
> +	if (intel_engine_is_virtual(engine))
> +		engine = guc_virtual_get_sibling(engine, 0);
> +
> +	return engine;
>   }
>   
> -static void guc_reset_state(struct intel_context *ce,
> -			    struct intel_engine_cs *engine,
> -			    u32 head,
> -			    bool scrub)
> +static void guc_reset_state(struct intel_context *ce, u32 head, bool scrub)
>   {
> +	struct intel_engine_cs *engine = __context_to_physical_engine(ce);
> +
>   	GEM_BUG_ON(!intel_context_is_pinned(ce));
>   
>   	/*
> @@ -502,42 +676,147 @@ static void guc_reset_state(struct intel_context *ce,
>   	lrc_update_regs(ce, engine, head);
>   }
>   
> -static void guc_reset_rewind(struct intel_engine_cs *engine, bool stalled)
> +static void guc_reset_nop(struct intel_engine_cs *engine)
>   {
> -	struct intel_engine_execlists * const execlists = &engine->execlists;
> -	struct i915_request *rq;
> +}
> +
> +static void guc_rewind_nop(struct intel_engine_cs *engine, bool stalled)
> +{
> +}
> +
> +static void
> +__unwind_incomplete_requests(struct intel_context *ce)
> +{
> +	struct i915_request *rq, *rn;
> +	struct list_head *pl;
> +	int prio = I915_PRIORITY_INVALID;
> +	struct i915_sched_engine * const sched_engine =
> +		ce->engine->sched_engine;
>   	unsigned long flags;
>   
> -	spin_lock_irqsave(&engine->sched_engine->lock, flags);
> +	spin_lock_irqsave(&sched_engine->lock, flags);
> +	spin_lock(&ce->guc_active.lock);
> +	list_for_each_entry_safe(rq, rn,
> +				 &ce->guc_active.requests,
> +				 sched.link) {
> +		if (i915_request_completed(rq))
> +			continue;
> +
> +		list_del_init(&rq->sched.link);
> +		spin_unlock(&ce->guc_active.lock);

Dropping the lock and continuing to iterate the same list - is that
safe? A comment is needed I think, and I do remember worrying about
this, or similar instances, in GuC code before.

> +
> +		__i915_request_unsubmit(rq);
> +
> +		/* Push the request back into the queue for later resubmission. */
> +		GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID);
> +		if (rq_prio(rq) != prio) {
> +			prio = rq_prio(rq);
> +			pl = i915_sched_lookup_priolist(sched_engine, prio);
> +		}
> +		GEM_BUG_ON(i915_sched_engine_is_empty(sched_engine));
>   
> -	/* Push back any incomplete requests for replay after the reset. */
> -	rq = execlists_unwind_incomplete_requests(execlists);
> -	if (!rq)
> -		goto out_unlock;
> +		list_add_tail(&rq->sched.link, pl);
> +		set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> +
> +		spin_lock(&ce->guc_active.lock);
> +	}
> +	spin_unlock(&ce->guc_active.lock);
> +	spin_unlock_irqrestore(&sched_engine->lock, flags);
> +}
> +
> +static struct i915_request *context_find_active_request(struct intel_context *ce)
> +{
> +	struct i915_request *rq, *active = NULL;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&ce->guc_active.lock, flags);
> +	list_for_each_entry_reverse(rq, &ce->guc_active.requests,
> +				    sched.link) {
> +		if (i915_request_completed(rq))
> +			break;
> +
> +		active = rq;
> +	}
> +	spin_unlock_irqrestore(&ce->guc_active.lock, flags);
> +
> +	return active;
> +}
> +
> +static void __guc_reset_context(struct intel_context *ce, bool stalled)
> +{
> +	struct i915_request *rq;
> +	u32 head;
> +
> +	/*
> +	 * GuC will implicitly mark the context as non-schedulable
> +	 * when it sends the reset notification. Make sure our state
> +	 * reflects this change. The context will be marked enabled
> +	 * on resubmission.
> +	 */
> +	clr_context_enabled(ce);
> +
> +	rq = context_find_active_request(ce);
> +	if (!rq) {
> +		head = ce->ring->tail;
> +		stalled = false;
> +		goto out_replay;
> +	}
>   
>   	if (!i915_request_started(rq))
>   		stalled = false;
>   
> +	GEM_BUG_ON(i915_active_is_idle(&ce->active));
> +	head = intel_ring_wrap(ce->ring, rq->head);
>   	__i915_request_reset(rq, stalled);
> -	guc_reset_state(rq->context, engine, rq->head, stalled);
>   
> -out_unlock:
> -	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
> +out_replay:
> +	guc_reset_state(ce, head, stalled);
> +	__unwind_incomplete_requests(ce);
> +}
> +
> +void intel_guc_submission_reset(struct intel_guc *guc, bool stalled)
> +{
> +	struct intel_context *ce;
> +	unsigned long index;
> +
> +	if (unlikely(!guc_submission_initialized(guc)))
> +		/* Reset called during driver load? GuC not yet initialised! */
> +		return;
> +
> +	xa_for_each(&guc->context_lookup, index, ce)
> +		if (intel_context_is_pinned(ce))
> +			__guc_reset_context(ce, stalled);
> +
> +	/* GuC is blown away, drop all references to contexts */
> +	xa_destroy(&guc->context_lookup);
> +}
> +
> +static void guc_cancel_context_requests(struct intel_context *ce)
> +{
> +	struct i915_sched_engine *sched_engine = ce_to_guc(ce)->sched_engine;
> +	struct i915_request *rq;
> +	unsigned long flags;
> +
> +	/* Mark all executing requests as skipped. */
> +	spin_lock_irqsave(&sched_engine->lock, flags);
> +	spin_lock(&ce->guc_active.lock);
> +	list_for_each_entry(rq, &ce->guc_active.requests, sched.link)
> +		i915_request_put(i915_request_mark_eio(rq));
> +	spin_unlock(&ce->guc_active.lock);
> +	spin_unlock_irqrestore(&sched_engine->lock, flags);

I suppose somewhere it will need to be documented what the two locks are
protecting and why both are needed in some places.

>   }
>   
> -static void guc_reset_cancel(struct intel_engine_cs *engine)
> +static void
> +guc_cancel_sched_engine_requests(struct i915_sched_engine *sched_engine)
>   {
> -	struct i915_sched_engine * const sched_engine = engine->sched_engine;
>   	struct i915_request *rq, *rn;
>   	struct rb_node *rb;
>   	unsigned long flags;
>   
>   	/* Can be called during boot if GuC fails to load */
> -	if (!engine->gt)
> +	if (!sched_engine)
>   		return;
>   
> -	ENGINE_TRACE(engine, "\n");
> -
>   	/*
>   	 * Before we call engine->cancel_requests(), we should have exclusive
>   	 * access to the submission state. This is arranged for us by the
> @@ -552,13 +831,7 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
>   	 * submission's irq state, we also wish to remind ourselves that
>   	 * it is irq state.)
>   	 */
> -	spin_lock_irqsave(&engine->sched_engine->lock, flags);
> -
> -	/* Mark all executing requests as skipped. */
> -	list_for_each_entry(rq, &engine->sched_engine->requests, sched.link) {
> -		i915_request_set_error_once(rq, -EIO);
> -		i915_request_mark_complete(rq);
> -	}
> +	spin_lock_irqsave(&sched_engine->lock, flags);
>   
>   	/* Flush the queued requests to the timeline list (for retiring). */
>   	while ((rb = rb_first_cached(&sched_engine->queue))) {
> @@ -566,9 +839,10 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
>   
>   		priolist_for_each_request_consume(rq, rn, p) {
>   			list_del_init(&rq->sched.link);
> +
>   			__i915_request_submit(rq);
> -			dma_fence_set_error(&rq->fence, -EIO);
> -			i915_request_mark_complete(rq);
> +
> +			i915_request_put(i915_request_mark_eio(rq));
>   		}
>   
>   		rb_erase_cached(&p->node, &sched_engine->queue);
> @@ -580,19 +854,41 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
>   	sched_engine->queue_priority_hint = INT_MIN;
>   	sched_engine->queue = RB_ROOT_CACHED;
>   
> -	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
> +	spin_unlock_irqrestore(&sched_engine->lock, flags);
>   }
>   
> -static void guc_reset_finish(struct intel_engine_cs *engine)
> +void intel_guc_submission_cancel_requests(struct intel_guc *guc)
>   {
> -	struct i915_sched_engine * const sched_engine = engine->sched_engine;
> +	struct intel_context *ce;
> +	unsigned long index;
>   
> -	if (__tasklet_enable(&sched_engine->tasklet))
> -		/* And kick in case we missed a new request submission. */
> -		i915_sched_engine_hi_kick(sched_engine);
> +	xa_for_each(&guc->context_lookup, index, ce)
> +		if (intel_context_is_pinned(ce))
> +			guc_cancel_context_requests(ce);
> +
> +	guc_cancel_sched_engine_requests(guc->sched_engine);
> +
> +	/* GuC is blown away, drop all references to contexts */
> +	xa_destroy(&guc->context_lookup);
> +}
> +
> +void intel_guc_submission_reset_finish(struct intel_guc *guc)
> +{
> +	/* Reset called during driver load or during wedge? */
> +	if (unlikely(!guc_submission_initialized(guc) ||
> +		     test_bit(I915_WEDGED, &guc_to_gt(guc)->reset.flags)))
> +		return;
>   
> -	ENGINE_TRACE(engine, "depth->%d\n",
> -		     atomic_read(&sched_engine->tasklet.count));
> +	/*
> +	 * Technically possible for either of these values to be non-zero here,
> +	 * but very unlikely + harmless. Regardless let's add a warn so we can
> +	 * see in CI if this happens frequently / a precursor to taking down the
> +	 * machine.

And what did CI say over the time this was in?

It needs to be explained when it can be non-zero and whether or not it
can become non-zero just after the atomic_set below - or if not, why not.

> +	 */
> +	GEM_WARN_ON(atomic_read(&guc->outstanding_submission_g2h));
> +	atomic_set(&guc->outstanding_submission_g2h, 0);
> +
> +	enable_submission(guc);
>   }
>   
>   /*
> @@ -659,6 +955,9 @@ static int guc_bypass_tasklet_submit(struct intel_guc *guc,
>   	else
>   		trace_i915_request_guc_submit(rq);
>   
> +	if (unlikely(ret == -EDEADLK))
> +		disable_submission(guc);
> +
>   	return ret;
>   }
>   
> @@ -671,7 +970,8 @@ static void guc_submit_request(struct i915_request *rq)
>   	/* Will be called from irq-context when using foreign fences. */
>   	spin_lock_irqsave(&sched_engine->lock, flags);
>   
> -	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
> +	if (submission_disabled(guc) || guc->stalled_request ||
> +	    !i915_sched_engine_is_empty(sched_engine))
>   		queue_request(sched_engine, rq, rq_prio(rq));
>   	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
>   		i915_sched_engine_hi_kick(sched_engine);
> @@ -808,7 +1108,8 @@ static void unpin_guc_id(struct intel_guc *guc, struct intel_context *ce)
>   
>   static int __guc_action_register_context(struct intel_guc *guc,
>   					 u32 guc_id,
> -					 u32 offset)
> +					 u32 offset,
> +					 bool loop)
>   {
>   	u32 action[] = {
>   		INTEL_GUC_ACTION_REGISTER_CONTEXT,
> @@ -816,10 +1117,10 @@ static int __guc_action_register_context(struct intel_guc *guc,
>   		offset,
>   	};
>   
> -	return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
> +	return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action), 0, loop);
>   }
>   
> -static int register_context(struct intel_context *ce)
> +static int register_context(struct intel_context *ce, bool loop)
>   {
>   	struct intel_guc *guc = ce_to_guc(ce);
>   	u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
> @@ -827,11 +1128,12 @@ static int register_context(struct intel_context *ce)
>   
>   	trace_intel_context_register(ce);
>   
> -	return __guc_action_register_context(guc, ce->guc_id, offset);
> +	return __guc_action_register_context(guc, ce->guc_id, offset, loop);
>   }
>   
>   static int __guc_action_deregister_context(struct intel_guc *guc,
> -					   u32 guc_id)
> +					   u32 guc_id,
> +					   bool loop)
>   {
>   	u32 action[] = {
>   		INTEL_GUC_ACTION_DEREGISTER_CONTEXT,
> @@ -839,16 +1141,16 @@ static int __guc_action_deregister_context(struct intel_guc *guc,
>   	};
>   
>   	return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),
> -					G2H_LEN_DW_DEREGISTER_CONTEXT, true);
> +					G2H_LEN_DW_DEREGISTER_CONTEXT, loop);
>   }
>   
> -static int deregister_context(struct intel_context *ce, u32 guc_id)
> +static int deregister_context(struct intel_context *ce, u32 guc_id, bool loop)
>   {
>   	struct intel_guc *guc = ce_to_guc(ce);
>   
>   	trace_intel_context_deregister(ce);
>   
> -	return __guc_action_deregister_context(guc, guc_id);
> +	return __guc_action_deregister_context(guc, guc_id, loop);
>   }
>   
>   static intel_engine_mask_t adjust_engine_mask(u8 class, intel_engine_mask_t mask)
> @@ -877,7 +1179,7 @@ static void guc_context_policy_init(struct intel_engine_cs *engine,
>   	desc->preemption_timeout = CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US;
>   }
>   
> -static int guc_lrc_desc_pin(struct intel_context *ce)
> +static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
>   {
>   	struct intel_runtime_pm *runtime_pm =
>   		&ce->engine->gt->i915->runtime_pm;
> @@ -923,18 +1225,44 @@ static int guc_lrc_desc_pin(struct intel_context *ce)
>   	 */
>   	if (context_registered) {
>   		trace_intel_context_steal_guc_id(ce);
> -		set_context_wait_for_deregister_to_register(ce);
> -		intel_context_get(ce);
> +		if (!loop) {
> +			set_context_wait_for_deregister_to_register(ce);
> +			intel_context_get(ce);
> +		} else {
> +			bool disabled;
> +			unsigned long flags;
> +
> +			/* Seal race with Reset */

Needs to be more descriptive.

> +			spin_lock_irqsave(&ce->guc_state.lock, flags);
> +			disabled = submission_disabled(guc);
> +			if (likely(!disabled)) {
> +				set_context_wait_for_deregister_to_register(ce);
> +				intel_context_get(ce);
> +			}
> +			spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> +			if (unlikely(disabled)) {
> +				reset_lrc_desc(guc, desc_idx);
> +				return 0;	/* Will get registered later */
> +			}
> +		}
>   
>   		/*
>   		 * If stealing the guc_id, this ce has the same guc_id as the
>   		 * context whos guc_id was stole.
>   		 */
>   		with_intel_runtime_pm(runtime_pm, wakeref)
> -			ret = deregister_context(ce, ce->guc_id);
> +			ret = deregister_context(ce, ce->guc_id, loop);
> +		if (unlikely(ret == -EBUSY)) {
> +			clr_context_wait_for_deregister_to_register(ce);
> +			intel_context_put(ce);
> +		}
>   	} else {
>   		with_intel_runtime_pm(runtime_pm, wakeref)
> -			ret = register_context(ce);
> +			ret = register_context(ce, loop);
> +		if (unlikely(ret == -EBUSY))
> +			reset_lrc_desc(guc, desc_idx);
> +		else if (unlikely(ret == -ENODEV))
> +			ret = 0;	/* Will get registered later */
>   	}
>   
>   	return ret;
> @@ -997,7 +1325,6 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
>   	GEM_BUG_ON(guc_id == GUC_INVALID_LRC_ID);
>   
>   	trace_intel_context_sched_disable(ce);
> -	intel_context_get(ce);
>   
>   	guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),
>   				 G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
> @@ -1007,6 +1334,7 @@ static u16 prep_context_pending_disable(struct intel_context *ce)
>   {
>   	set_context_pending_disable(ce);
>   	clr_context_enabled(ce);
> +	intel_context_get(ce);
>   
>   	return ce->guc_id;
>   }
> @@ -1019,7 +1347,7 @@ static void guc_context_sched_disable(struct intel_context *ce)
>   	u16 guc_id;
>   	intel_wakeref_t wakeref;
>   
> -	if (context_guc_id_invalid(ce) ||
> +	if (submission_disabled(guc) || context_guc_id_invalid(ce) ||
>   	    !lrc_desc_registered(guc, ce->guc_id)) {
>   		clr_context_enabled(ce);
>   		goto unpin;
> @@ -1053,19 +1381,13 @@ static void guc_context_sched_disable(struct intel_context *ce)
>   
>   static inline void guc_lrc_desc_unpin(struct intel_context *ce)
>   {
> -	struct intel_engine_cs *engine = ce->engine;
> -	struct intel_guc *guc = &engine->gt->uc.guc;
> -	unsigned long flags;
> +	struct intel_guc *guc = ce_to_guc(ce);
>   
>   	GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id));
>   	GEM_BUG_ON(ce != __get_context(guc, ce->guc_id));
>   	GEM_BUG_ON(context_enabled(ce));
>   
> -	spin_lock_irqsave(&ce->guc_state.lock, flags);
> -	set_context_destroyed(ce);
> -	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> -
> -	deregister_context(ce, ce->guc_id);
> +	deregister_context(ce, ce->guc_id, true);
>   }
>   
>   static void __guc_context_destroy(struct intel_context *ce)
> @@ -1093,13 +1415,15 @@ static void guc_context_destroy(struct kref *kref)
>   	struct intel_guc *guc = &ce->engine->gt->uc.guc;
>   	intel_wakeref_t wakeref;
>   	unsigned long flags;
> +	bool disabled;
>   
>   	/*
>   	 * If the guc_id is invalid this context has been stolen and we can free
>   	 * it immediately. Also can be freed immediately if the context is not
>   	 * registered with the GuC.
>   	 */
> -	if (context_guc_id_invalid(ce) ||
> +	if (submission_disabled(guc) ||
> +	    context_guc_id_invalid(ce) ||
>   	    !lrc_desc_registered(guc, ce->guc_id)) {
>   		release_guc_id(guc, ce);
>   		__guc_context_destroy(ce);
> @@ -1126,6 +1450,18 @@ static void guc_context_destroy(struct kref *kref)
>   		list_del_init(&ce->guc_id_link);
>   	spin_unlock_irqrestore(&guc->contexts_lock, flags);
>   
> +	/* Seal race with Reset */
> +	spin_lock_irqsave(&ce->guc_state.lock, flags);
> +	disabled = submission_disabled(guc);
> +	if (likely(!disabled))
> +		set_context_destroyed(ce);
> +	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> +	if (unlikely(disabled)) {
> +		release_guc_id(guc, ce);
> +		__guc_context_destroy(ce);
> +		return;

Same as above, needs a better comment. It is also hard for the reader to
know whether the snapshot of disabled taken under the lock is still valid
after the lock has been released, and why.

Regards,

Tvrtko

> +	}
> +
>   	/*
>   	 * We defer GuC context deregistration until the context is destroyed
>   	 * in order to save on CTBs. With this optimization ideally we only need
> @@ -1148,6 +1484,33 @@ static int guc_context_alloc(struct intel_context *ce)
>   	return lrc_alloc(ce, ce->engine);
>   }
>   
> +static void add_to_context(struct i915_request *rq)
> +{
> +	struct intel_context *ce = rq->context;
> +
> +	spin_lock(&ce->guc_active.lock);
> +	list_move_tail(&rq->sched.link, &ce->guc_active.requests);
> +	spin_unlock(&ce->guc_active.lock);
> +}
> +
> +static void remove_from_context(struct i915_request *rq)
> +{
> +	struct intel_context *ce = rq->context;
> +
> +	spin_lock_irq(&ce->guc_active.lock);
> +
> +	list_del_init(&rq->sched.link);
> +	clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> +
> +	/* Prevent further __await_execution() registering a cb, then flush */
> +	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
> +
> +	spin_unlock_irq(&ce->guc_active.lock);
> +
> +	atomic_dec(&ce->guc_id_ref);
> +	i915_request_notify_execute_cb_imm(rq);
> +}
> +
>   static const struct intel_context_ops guc_context_ops = {
>   	.alloc = guc_context_alloc,
>   
> @@ -1186,8 +1549,6 @@ static void guc_signal_context_fence(struct intel_context *ce)
>   {
>   	unsigned long flags;
>   
> -	GEM_BUG_ON(!context_wait_for_deregister_to_register(ce));
> -
>   	spin_lock_irqsave(&ce->guc_state.lock, flags);
>   	clr_context_wait_for_deregister_to_register(ce);
>   	__guc_signal_context_fence(ce);
> @@ -1196,8 +1557,9 @@ static void guc_signal_context_fence(struct intel_context *ce)
>   
>   static bool context_needs_register(struct intel_context *ce, bool new_guc_id)
>   {
> -	return new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
> -		!lrc_desc_registered(ce_to_guc(ce), ce->guc_id);
> +	return (new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
> +		!lrc_desc_registered(ce_to_guc(ce), ce->guc_id)) &&
> +		!submission_disabled(ce_to_guc(ce));
>   }
>   
>   static int guc_request_alloc(struct i915_request *rq)
> @@ -1256,8 +1618,10 @@ static int guc_request_alloc(struct i915_request *rq)
>   		return ret;;
>   
>   	if (context_needs_register(ce, !!ret)) {
> -		ret = guc_lrc_desc_pin(ce);
> +		ret = guc_lrc_desc_pin(ce, true);
>   		if (unlikely(ret)) {	/* unwind */
> +			if (ret == -EDEADLK)
> +				disable_submission(guc);
>   			atomic_dec(&ce->guc_id_ref);
>   			unpin_guc_id(guc, ce);
>   			return ret;
> @@ -1294,20 +1658,6 @@ static int guc_request_alloc(struct i915_request *rq)
>   	return 0;
>   }
>   
> -static struct intel_engine_cs *
> -guc_virtual_get_sibling(struct intel_engine_cs *ve, unsigned int sibling)
> -{
> -	struct intel_engine_cs *engine;
> -	intel_engine_mask_t tmp, mask = ve->mask;
> -	unsigned int num_siblings = 0;
> -
> -	for_each_engine_masked(engine, ve->gt, mask, tmp)
> -		if (num_siblings++ == sibling)
> -			return engine;
> -
> -	return NULL;
> -}
> -
>   static int guc_virtual_context_pre_pin(struct intel_context *ce,
>   				       struct i915_gem_ww_ctx *ww,
>   				       void **vaddr)
> @@ -1516,7 +1866,7 @@ static inline void guc_kernel_context_pin(struct intel_guc *guc,
>   {
>   	if (context_guc_id_invalid(ce))
>   		pin_guc_id(guc, ce);
> -	guc_lrc_desc_pin(ce);
> +	guc_lrc_desc_pin(ce, true);
>   }
>   
>   static inline void guc_init_lrc_mapping(struct intel_guc *guc)
> @@ -1582,13 +1932,15 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
>   	engine->cops = &guc_context_ops;
>   	engine->request_alloc = guc_request_alloc;
>   	engine->bump_serial = guc_bump_serial;
> +	engine->add_active_request = add_to_context;
> +	engine->remove_active_request = remove_from_context;
>   
>   	engine->sched_engine->schedule = i915_schedule;
>   
> -	engine->reset.prepare = guc_reset_prepare;
> -	engine->reset.rewind = guc_reset_rewind;
> -	engine->reset.cancel = guc_reset_cancel;
> -	engine->reset.finish = guc_reset_finish;
> +	engine->reset.prepare = guc_reset_nop;
> +	engine->reset.rewind = guc_rewind_nop;
> +	engine->reset.cancel = guc_reset_nop;
> +	engine->reset.finish = guc_reset_nop;
>   
>   	engine->emit_flush = gen8_emit_flush_xcs;
>   	engine->emit_init_breadcrumb = gen8_emit_init_breadcrumb;
> @@ -1764,7 +2116,7 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
>   		 * register this context.
>   		 */
>   		with_intel_runtime_pm(runtime_pm, wakeref)
> -			register_context(ce);
> +			register_context(ce, true);
>   		guc_signal_context_fence(ce);
>   		intel_context_put(ce);
>   	} else if (context_destroyed(ce)) {
> @@ -1946,6 +2298,10 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
>   				 "v%dx%d", ve->base.class, count);
>   			ve->base.context_size = sibling->context_size;
>   
> +			ve->base.add_active_request =
> +				sibling->add_active_request;
> +			ve->base.remove_active_request =
> +				sibling->remove_active_request;
>   			ve->base.emit_bb_start = sibling->emit_bb_start;
>   			ve->base.emit_flush = sibling->emit_flush;
>   			ve->base.emit_init_breadcrumb =
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> index ab0789d66e06..d5ccffbb89ae 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> @@ -565,12 +565,44 @@ void intel_uc_reset_prepare(struct intel_uc *uc)
>   {
>   	struct intel_guc *guc = &uc->guc;
>   
> +	/* Firmware expected to be running when this function is called */
>   	if (!intel_guc_is_ready(guc))
> -		return;
> +		goto sanitize;
> +
> +	if (intel_uc_uses_guc_submission(uc))
> +		intel_guc_submission_reset_prepare(guc);
>   
> +sanitize:
>   	__uc_sanitize(uc);
>   }
>   
> +void intel_uc_reset(struct intel_uc *uc, bool stalled)
> +{
> +	struct intel_guc *guc = &uc->guc;
> +
> +	/* Firmware can not be running when this function is called  */
> +	if (intel_uc_uses_guc_submission(uc))
> +		intel_guc_submission_reset(guc, stalled);
> +}
> +
> +void intel_uc_reset_finish(struct intel_uc *uc)
> +{
> +	struct intel_guc *guc = &uc->guc;
> +
> +	/* Firmware expected to be running when this function is called */
> +	if (intel_guc_is_fw_running(guc) && intel_uc_uses_guc_submission(uc))
> +		intel_guc_submission_reset_finish(guc);
> +}
> +
> +void intel_uc_cancel_requests(struct intel_uc *uc)
> +{
> +	struct intel_guc *guc = &uc->guc;
> +
> +	/* Firmware can not be running when this function is called  */
> +	if (intel_uc_uses_guc_submission(uc))
> +		intel_guc_submission_cancel_requests(guc);
> +}
> +
>   void intel_uc_runtime_suspend(struct intel_uc *uc)
>   {
>   	struct intel_guc *guc = &uc->guc;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> index c4cef885e984..eaa3202192ac 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> @@ -37,6 +37,9 @@ void intel_uc_driver_late_release(struct intel_uc *uc);
>   void intel_uc_driver_remove(struct intel_uc *uc);
>   void intel_uc_init_mmio(struct intel_uc *uc);
>   void intel_uc_reset_prepare(struct intel_uc *uc);
> +void intel_uc_reset(struct intel_uc *uc, bool stalled);
> +void intel_uc_reset_finish(struct intel_uc *uc);
> +void intel_uc_cancel_requests(struct intel_uc *uc);
>   void intel_uc_suspend(struct intel_uc *uc);
>   void intel_uc_runtime_suspend(struct intel_uc *uc);
>   int intel_uc_resume(struct intel_uc *uc);
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 0b96b824ea06..4855cf7ebe21 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -194,7 +194,7 @@ static bool irq_work_imm(struct irq_work *wrk)
>   	return false;
>   }
>   
> -static void __notify_execute_cb_imm(struct i915_request *rq)
> +void i915_request_notify_execute_cb_imm(struct i915_request *rq)
>   {
>   	__notify_execute_cb(rq, irq_work_imm);
>   }
> @@ -268,37 +268,6 @@ i915_request_active_engine(struct i915_request *rq,
>   	return ret;
>   }
>   
> -
> -static void remove_from_engine(struct i915_request *rq)
> -{
> -	struct intel_engine_cs *engine, *locked;
> -
> -	/*
> -	 * Virtual engines complicate acquiring the engine timeline lock,
> -	 * as their rq->engine pointer is not stable until under that
> -	 * engine lock. The simple ploy we use is to take the lock then
> -	 * check that the rq still belongs to the newly locked engine.
> -	 */
> -	locked = READ_ONCE(rq->engine);
> -	spin_lock_irq(&locked->sched_engine->lock);
> -	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
> -		spin_unlock(&locked->sched_engine->lock);
> -		spin_lock(&engine->sched_engine->lock);
> -		locked = engine;
> -	}
> -	list_del_init(&rq->sched.link);
> -
> -	clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> -	clear_bit(I915_FENCE_FLAG_HOLD, &rq->fence.flags);
> -
> -	/* Prevent further __await_execution() registering a cb, then flush */
> -	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
> -
> -	spin_unlock_irq(&locked->sched_engine->lock);
> -
> -	__notify_execute_cb_imm(rq);
> -}
> -
>   static void __rq_init_watchdog(struct i915_request *rq)
>   {
>   	rq->watchdog.timer.function = NULL;
> @@ -395,9 +364,7 @@ bool i915_request_retire(struct i915_request *rq)
>   	 * after removing the breadcrumb and signaling it, so that we do not
>   	 * inadvertently attach the breadcrumb to a completed request.
>   	 */
> -	if (!list_empty(&rq->sched.link))
> -		remove_from_engine(rq);
> -	atomic_dec(&rq->context->guc_id_ref);
> +	rq->engine->remove_active_request(rq);
>   	GEM_BUG_ON(!llist_empty(&rq->execute_cb));
>   
>   	__list_del_entry(&rq->link); /* poison neither prev/next (RCU walks) */
> @@ -539,7 +506,7 @@ __await_execution(struct i915_request *rq,
>   	if (llist_add(&cb->work.node.llist, &signal->execute_cb)) {
>   		if (i915_request_is_active(signal) ||
>   		    __request_in_flight(signal))
> -			__notify_execute_cb_imm(signal);
> +			i915_request_notify_execute_cb_imm(signal);
>   	}
>   
>   	return 0;
> @@ -676,7 +643,7 @@ bool __i915_request_submit(struct i915_request *request)
>   	result = true;
>   
>   	GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
> -	list_move_tail(&request->sched.link, &engine->sched_engine->requests);
> +	engine->add_active_request(request);
>   active:
>   	clear_bit(I915_FENCE_FLAG_PQUEUE, &request->fence.flags);
>   	set_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags);
> diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
> index f870cd75a001..bcc6340c505e 100644
> --- a/drivers/gpu/drm/i915/i915_request.h
> +++ b/drivers/gpu/drm/i915/i915_request.h
> @@ -649,4 +649,6 @@ bool
>   i915_request_active_engine(struct i915_request *rq,
>   			   struct intel_engine_cs **active);
>   
> +void i915_request_notify_execute_cb_imm(struct i915_request *rq);
> +
>   #endif /* I915_REQUEST_H */
> 

* Re: [Intel-gfx] [RFC PATCH 65/97] drm/i915: Reset GPU immediately if submission is disabled
  2021-05-06 19:14 ` [RFC PATCH 65/97] drm/i915: Reset GPU immediately if submission is disabled Matthew Brost
@ 2021-06-02 14:36   ` Tvrtko Ursulin
  0 siblings, 0 replies; 249+ messages in thread
From: Tvrtko Ursulin @ 2021-06-02 14:36 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel; +Cc: jason.ekstrand, daniel.vetter


On 06/05/2021 20:14, Matthew Brost wrote:
> If submission is disabled by the backend for any reason, reset the GPU
> immediately in the heartbeat code.

Okay, that's the what, but the why is also often good to have in commit messages.

Regards,

Tvrtko

> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   .../gpu/drm/i915/gt/intel_engine_heartbeat.c  | 63 +++++++++++++++----
>   .../gpu/drm/i915/gt/intel_engine_heartbeat.h  |  4 ++
>   .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  9 +++
>   drivers/gpu/drm/i915/i915_scheduler.c         |  6 ++
>   drivers/gpu/drm/i915/i915_scheduler.h         |  6 ++
>   drivers/gpu/drm/i915/i915_scheduler_types.h   |  3 +
>   6 files changed, 78 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> index b6a305e6a974..a8495364d906 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> @@ -70,12 +70,30 @@ static void show_heartbeat(const struct i915_request *rq,
>   {
>   	struct drm_printer p = drm_debug_printer("heartbeat");
>   
> -	intel_engine_dump(engine, &p,
> -			  "%s heartbeat {seqno:%llx:%lld, prio:%d} not ticking\n",
> -			  engine->name,
> -			  rq->fence.context,
> -			  rq->fence.seqno,
> -			  rq->sched.attr.priority);
> +	if (!rq) {
> +		intel_engine_dump(engine, &p,
> +				  "%s heartbeat not ticking\n",
> +				  engine->name);
> +	} else {
> +		intel_engine_dump(engine, &p,
> +				  "%s heartbeat {seqno:%llx:%lld, prio:%d} not ticking\n",
> +				  engine->name,
> +				  rq->fence.context,
> +				  rq->fence.seqno,
> +				  rq->sched.attr.priority);
> +	}
> +}
> +
> +static void
> +reset_engine(struct intel_engine_cs *engine, struct i915_request *rq)
> +{
> +	if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
> +		show_heartbeat(rq, engine);
> +
> +	intel_gt_handle_error(engine->gt, engine->mask,
> +			      I915_ERROR_CAPTURE,
> +			      "stopped heartbeat on %s",
> +			      engine->name);
>   }
>   
>   static void heartbeat(struct work_struct *wrk)
> @@ -102,6 +120,11 @@ static void heartbeat(struct work_struct *wrk)
>   	if (intel_gt_is_wedged(engine->gt))
>   		goto out;
>   
> +	if (i915_sched_engine_disabled(engine->sched_engine)) {
> +		reset_engine(engine, engine->heartbeat.systole);
> +		goto out;
> +	}
> +
>   	if (engine->heartbeat.systole) {
>   		long delay = READ_ONCE(engine->props.heartbeat_interval_ms);
>   
> @@ -139,13 +162,7 @@ static void heartbeat(struct work_struct *wrk)
>   			engine->sched_engine->schedule(rq, &attr);
>   			local_bh_enable();
>   		} else {
> -			if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
> -				show_heartbeat(rq, engine);
> -
> -			intel_gt_handle_error(engine->gt, engine->mask,
> -					      I915_ERROR_CAPTURE,
> -					      "stopped heartbeat on %s",
> -					      engine->name);
> +			reset_engine(engine, rq);
>   		}
>   
>   		rq->emitted_jiffies = jiffies;
> @@ -194,6 +211,26 @@ void intel_engine_park_heartbeat(struct intel_engine_cs *engine)
>   		i915_request_put(fetch_and_zero(&engine->heartbeat.systole));
>   }
>   
> +void intel_gt_unpark_heartbeats(struct intel_gt *gt)
> +{
> +	struct intel_engine_cs *engine;
> +	enum intel_engine_id id;
> +
> +	for_each_engine(engine, gt, id)
> +		if (intel_engine_pm_is_awake(engine))
> +			intel_engine_unpark_heartbeat(engine);
> +
> +}
> +
> +void intel_gt_park_heartbeats(struct intel_gt *gt)
> +{
> +	struct intel_engine_cs *engine;
> +	enum intel_engine_id id;
> +
> +	for_each_engine(engine, gt, id)
> +		intel_engine_park_heartbeat(engine);
> +}
> +
>   void intel_engine_init_heartbeat(struct intel_engine_cs *engine)
>   {
>   	INIT_DELAYED_WORK(&engine->heartbeat.work, heartbeat);
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h
> index a488ea3e84a3..5da6d809a87a 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h
> @@ -7,6 +7,7 @@
>   #define INTEL_ENGINE_HEARTBEAT_H
>   
>   struct intel_engine_cs;
> +struct intel_gt;
>   
>   void intel_engine_init_heartbeat(struct intel_engine_cs *engine);
>   
> @@ -16,6 +17,9 @@ int intel_engine_set_heartbeat(struct intel_engine_cs *engine,
>   void intel_engine_park_heartbeat(struct intel_engine_cs *engine);
>   void intel_engine_unpark_heartbeat(struct intel_engine_cs *engine);
>   
> +void intel_gt_park_heartbeats(struct intel_gt *gt);
> +void intel_gt_unpark_heartbeats(struct intel_gt *gt);
> +
>   int intel_engine_pulse(struct intel_engine_cs *engine);
>   int intel_engine_flush_barriers(struct intel_engine_cs *engine);
>   
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 8c093bc2d3a4..a5997d6b4aa4 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -10,6 +10,7 @@
>   #include "gt/intel_breadcrumbs.h"
>   #include "gt/intel_context.h"
>   #include "gt/intel_engine_pm.h"
> +#include "gt/intel_engine_heartbeat.h"
>   #include "gt/intel_gt.h"
>   #include "gt/intel_gt_irq.h"
>   #include "gt/intel_gt_pm.h"
> @@ -604,6 +605,7 @@ void intel_guc_submission_reset_prepare(struct intel_guc *guc)
>   		/* Reset called during driver load? GuC not yet initialised! */
>   		return;
>   
> +	intel_gt_park_heartbeats(guc_to_gt(guc));
>   	disable_submission(guc);
>   	guc->interrupts.disable(guc);
>   
> @@ -889,6 +891,7 @@ void intel_guc_submission_reset_finish(struct intel_guc *guc)
>   	atomic_set(&guc->outstanding_submission_g2h, 0);
>   
>   	enable_submission(guc);
> +	intel_gt_unpark_heartbeats(guc_to_gt(guc));
>   }
>   
>   /*
> @@ -1856,6 +1859,11 @@ static int guc_resume(struct intel_engine_cs *engine)
>   	return 0;
>   }
>   
> +static bool guc_sched_engine_disabled(struct i915_sched_engine *sched_engine)
> +{
> +	return !sched_engine->tasklet.callback;
> +}
> +
>   static void guc_set_default_submission(struct intel_engine_cs *engine)
>   {
>   	engine->submit_request = guc_submit_request;
> @@ -2006,6 +2014,7 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
>   			return -ENOMEM;
>   
>   		guc->sched_engine->schedule = i915_schedule;
> +		guc->sched_engine->disabled = guc_sched_engine_disabled;
>   		guc->sched_engine->engine = engine;
>   		tasklet_setup(&guc->sched_engine->tasklet,
>   			      guc_submission_tasklet);
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
> index 28d403a8d7d2..72a9bee3026f 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler.c
> +++ b/drivers/gpu/drm/i915/i915_scheduler.c
> @@ -440,6 +440,11 @@ void i915_sched_engine_free(struct kref *kref)
>   	kfree(sched_engine);
>   }
>   
> +static bool default_disabled(struct i915_sched_engine *sched_engine)
> +{
> +	return false;
> +}
> +
>   struct i915_sched_engine *
>   i915_sched_engine_create(unsigned int subclass)
>   {
> @@ -453,6 +458,7 @@ i915_sched_engine_create(unsigned int subclass)
>   
>   	sched_engine->queue = RB_ROOT_CACHED;
>   	sched_engine->queue_priority_hint = INT_MIN;
> +	sched_engine->disabled = default_disabled;
>   
>   	INIT_LIST_HEAD(&sched_engine->requests);
>   	INIT_LIST_HEAD(&sched_engine->hold);
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
> index a78b1f50ecb4..ec8dfa87cbb6 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler.h
> +++ b/drivers/gpu/drm/i915/i915_scheduler.h
> @@ -116,4 +116,10 @@ sched_engine_active_unlock_bh(struct i915_sched_engine *sched_engine)
>   	local_bh_enable(); /* restore softirq, and kick ksoftirqd! */
>   }
>   
> +static inline bool
> +i915_sched_engine_disabled(struct i915_sched_engine *sched_engine)
> +{
> +	return sched_engine->disabled(sched_engine);
> +}
> +
>   #endif /* _I915_SCHEDULER_H_ */
> diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h b/drivers/gpu/drm/i915/i915_scheduler_types.h
> index 90b389ba661b..a7183792d110 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler_types.h
> +++ b/drivers/gpu/drm/i915/i915_scheduler_types.h
> @@ -141,6 +141,9 @@ struct i915_sched_engine {
>   	/* Back pointer to engine */
>   	struct intel_engine_cs *engine;
>   
> +	/* Schedule engine is disabled by backend */
> +	bool	(*disabled)(struct i915_sched_engine *sched_engine);
> +
>   	/* Kick backend */
>   	void	(*kick_backend)(const struct i915_request *rq,
>   				int prio);
> 

* Re: [Intel-gfx] [RFC PATCH 00/97] Basic GuC submission support in the i915
  2021-05-25 16:45   ` Matthew Brost
@ 2021-06-02 15:27     ` Tvrtko Ursulin
  2021-06-02 18:57       ` Daniel Vetter
  2021-06-03  4:10       ` Matthew Brost
  0 siblings, 2 replies; 249+ messages in thread
From: Tvrtko Ursulin @ 2021-06-02 15:27 UTC (permalink / raw)
  To: Matthew Brost; +Cc: jason.ekstrand, daniel.vetter, intel-gfx, dri-devel


On 25/05/2021 17:45, Matthew Brost wrote:
> On Tue, May 25, 2021 at 11:32:26AM +0100, Tvrtko Ursulin wrote:
>>
>> On 06/05/2021 20:13, Matthew Brost wrote:
>>> Basic GuC submission support. This is the first bullet point in the
>>> upstreaming plan covered in the following RFC [1].
>>>
>>> At a very high level the GuC is a piece of firmware which sits between
>>> the i915 and the GPU. It offloads some of the scheduling of contexts
>>> from the i915 and programs the GPU to submit contexts. The i915
>>> communicates with the GuC and the GuC communicates with the GPU.
>>>
>>> GuC submission will be disabled by default on all current upstream
>>> platforms behind a module parameter - enable_guc. A value of 3 will
>>> enable submission and HuC loading via the GuC. GuC submission should
>>> work on all gen11+ platforms assuming the GuC firmware is present.
>>>
>>> This is a huge series and it is completely unrealistic to merge all of
>>> these patches at once. Fortunately I believe we can break down the
>>> series into different merges:
>>>
>>> 1. Merge Chris Wilson's patches. These have already been reviewed
>>> upstream and I fully agree with these patches as a precursor to GuC
>>> submission.
>>>
>>> 2. Update to GuC 60.1.2. These are largely Michal's patches.
>>>
>>> 3. Turn on GuC/HuC auto mode by default.
>>>
>>> 4. Additional patches needed to support GuC submission. This is any
>>> patch not covered by 1-3 in the first 34 patches. e.g. 'Engine relative
>>> MMIO'
>>>
>>> 5. GuC submission support. Patches number 35+. These all don't have to
>>> merge at once though as we don't actually allow GuC submission until the
>>> last patch of this series.
>>
>> For the GuC backend/submission part only - it seems to me none of my review
>> comments I made in December 2019 have been implemented. At that point I
> 
> I wouldn't say none of the fixes have been done; lots have, just not
> everything you wanted.
> 
>> stated, and this was all internally at the time mind you, that I do not
>> think the series is ready and there were several high level issues that
>> would need to be sorted out. I don't think I gave my ack or r-b back then
>> and the promise was a few things would be worked on post (internal) merge.
>> That was supposed to include upstream refactoring to enable GuC better
>> slotting in as a backed. Fast forward a year and a half later and the only
>> progress we had in this area has been deleted.
>>
>>  From the top of my head, and having glanced the series as posted:
>>
>>   * Self-churn factor in the series is too high.
> 
> Not sure what you mean by this? The patches have been reworked
> internally too much?

No, I meant the series adds, removes and changes the same code a bit too 
much, which makes it harder to review. It is much easier when the flow is 
logical and typical, where it starts with refactoring, generalising, 
building infrastructure and then plugging bits in, than it is to review 
patches which add stuff which then gets removed or changed significantly 
a few patches down the line.

>>   * Patch ordering issues.
> 
> We are going to clean up some of the ordering as these 97 patches are
> posted in smaller mergeable series but at the end of the day this is a
> bit of a bikeshed. GuC submission can't be turned on until patch 97 so IMO
> it really isn't all that big of a deal which order the patches before
> that land in, as we are not breaking anything.

Yes some leeway for ordering is fine.

>>   * GuC context state machine is way too dodgy to have any confidence it can
>> be read and race conditions understood.
> 
> I know you don't really like the state machine but there is no other real
> way to avoid a DoS on resources and no real way to fairly distribute guc_ids
> without it. I know you have had other suggestions here but none of your
> suggestions will either work or end up being any less complicated.
> 
> For what it is worth, the state machine will get simplified when we hook
> into the DRM scheduler as we won't have to deal with submitting from IRQ
> contexts in the backend or having more than 1 request in the backend at
> a time.

Dunno. A mix of self-churn, locks, inconsistent naming, verbosity and 
magic makes it super hard to review. The state handling in functions like 
guc_context_ban, guc_context_sched_disable, guc_context_block, ... I find 
it impossible to follow what's going on. Some of it under the lock, some 
outside, jumps, returns, adding the magic two... Perhaps it is just me, so 
let's wait and see what other reviewers think.

>>   * Context pinning code with it's magical two adds, subtract and cmpxchg is
>> dodgy as well.
> 
> Daniele tried to remove this and it proved quite difficult + created
> even more races in the backend code. This was prior to the pre-pin and
> post-unpin code which makes this even more difficult to fix as I believe
> these functions would need to be removed first. Not saying we can't
> revisit this someday but I personally really like it - it is a clever
> way to avoid reentering the pin / unpin code while asynchronous things
> are happening rather than some complex locking scheme. Lastly, this code
> has proved incredibly stable as I don't think we've had to fix a single
> thing in this area since we've been using this code internally.

Pretty much same as above. The code like:

static inline void __intel_context_unpin(struct intel_context *ce)
{
	if (!ce->ops->sched_disable) {
		__intel_context_do_unpin(ce, 1);
	} else {
		while (!atomic_add_unless(&ce->pin_count, -1, 1)) {
			if (atomic_cmpxchg(&ce->pin_count, 1, 2) == 1) {
				ce->ops->sched_disable(ce);
				break;
			}
		}
	}
}

That's pretty much impenetrable for me and the only thing I can think of 
here is **ALARM** must be broken! See what others think..

>>   * Kludgy way of interfacing with rest of the driver instead of refactoring
>> to fit (idling, breadcrumbs, scheduler, tasklets, ...).
>>
> 
> Idling and breadcrumbs seem clean to me. Scheduler + tasklet are going
> away once the DRM scheduler lands. No need to rework those as we are just
> going to rework them again.

Well today I read the breadcrumbs patch and there is no way that's 
clean. It goes and creates one object per engine, then deletes them, 
replacing them with a GuC-specific one. All in the same engine setup. The 
same pattern of bolting on the GuC repeats too much for my taste.

>> Now perhaps the latest plan is to ignore all these issues and still merge,
>> then follow up with throwing it away, mostly or at least largely, in which
>> case there isn't any point really to review the current state yet again. But
>> it is sad that we got to this state. So just for the record - all this was
>> reviewed in Nov/Dec 2019. By me among other folks and I at least deemed it
>> not ready in this form.
>>
> 
> I personally don't think it is really in that bad of shape. The fact
> that I could put together a PoC more or less fully integrating this
> backend into the DRM scheduler within a few days I think speaks to the
> quality and flexibility of this backend compared to execlists.

Or that you are much more familiar with it. Anyway, it's not the line of 
argument I think we should continue.

Regards,

Tvrtko

* Re: [Intel-gfx] [RFC PATCH 00/97] Basic GuC submission support in the i915
  2021-06-02 15:27     ` Tvrtko Ursulin
@ 2021-06-02 18:57       ` Daniel Vetter
  2021-06-03  3:41         ` Matthew Brost
  2021-06-03  4:10       ` Matthew Brost
  1 sibling, 1 reply; 249+ messages in thread
From: Daniel Vetter @ 2021-06-02 18:57 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Matthew Brost, Jason Ekstrand, intel-gfx, dri-devel, Daniel Vetter

On Wed, Jun 2, 2021 at 5:27 PM Tvrtko Ursulin
<tvrtko.ursulin@linux.intel.com> wrote:
> On 25/05/2021 17:45, Matthew Brost wrote:
> > On Tue, May 25, 2021 at 11:32:26AM +0100, Tvrtko Ursulin wrote:
> >>   * Context pinning code with it's magical two adds, subtract and cmpxchg is
> >> dodgy as well.
> >
> > Daniele tried to remove this and it proved quite difficult + created
> > even more races in the backend code. This was prior to the pre-pin and
> > post-unpin code which makes this even more difficult to fix as I believe
> > these functions would need to be removed first. Not saying we can't
> > revisit this someday but I personally really like it - it is a clever
> > way to avoid reentering the pin / unpin code while asynchronous things
> > are happening rather than some complex locking scheme. Lastly, this code
> > has proved incredibly stable as I don't think we've had to fix a single
> > thing in this area since we've been using this code internally.
>
> Pretty much same as above. The code like:
>
> static inline void __intel_context_unpin(struct intel_context *ce)
> {
>         if (!ce->ops->sched_disable) {
>                 __intel_context_do_unpin(ce, 1);
>         } else {
>                 while (!atomic_add_unless(&ce->pin_count, -1, 1)) {
>                         if (atomic_cmpxchg(&ce->pin_count, 1, 2) == 1) {
>                                 ce->ops->sched_disable(ce);
>                                 break;
>                         }
>                 }
>         }
> }
>
> That's pretty much impenetrable for me and the only thing I can think of
> here is **ALARM** must be broken! See what others think..

pin_count is a hand-rolled mutex, except not actually a real one, and
it's absolutely hilarious in its various incarnations (there's one
each on i915_vm, vma, obj and probably a few more).

Not the worst one I've seen by far in the code we've merged already.
Minimally this needs a comment here and in the struct next to
@pin_count to explain where all this is abused, which would already
make it better than most of the in-tree ones.

As part of the ttm conversion we have a plan to sunset the "pin_count
as a lock" stuff, depending how bad that goes we might need to split
up the task for each struct that has such a pin_count.

-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [Intel-gfx] [RFC PATCH 00/97] Basic GuC submission support in the i915
  2021-06-02 18:57       ` Daniel Vetter
@ 2021-06-03  3:41         ` Matthew Brost
  2021-06-03  4:47           ` Daniel Vetter
  2021-06-03 10:52           ` Tvrtko Ursulin
  0 siblings, 2 replies; 249+ messages in thread
From: Matthew Brost @ 2021-06-03  3:41 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Jason Ekstrand, Tvrtko Ursulin, intel-gfx, dri-devel, Daniel Vetter

On Wed, Jun 02, 2021 at 08:57:02PM +0200, Daniel Vetter wrote:
> On Wed, Jun 2, 2021 at 5:27 PM Tvrtko Ursulin
> <tvrtko.ursulin@linux.intel.com> wrote:
> > On 25/05/2021 17:45, Matthew Brost wrote:
> > > On Tue, May 25, 2021 at 11:32:26AM +0100, Tvrtko Ursulin wrote:
> > >>   * Context pinning code with it's magical two adds, subtract and cmpxchg is
> > >> dodgy as well.
> > >
> > > Daniele tried to remove this and it proved quite difficult + created
> > > even more races in the backend code. This was prior to the pre-pin and
> > > post-unpin code which makes this even more difficult to fix as I believe
> > > these functions would need to be removed first. Not saying we can't
> > > revisit this someday but I personally really like it - it is a clever
> > > way to avoid reentering the pin / unpin code while asynchronous things
> > > are happening rather than some complex locking scheme. Lastly, this code
> > > has proved incredibly stable as I don't think we've had to fix a single
> > > thing in this area since we've been using this code internally.
> >
> > Pretty much same as above. The code like:
> >
> > static inline void __intel_context_unpin(struct intel_context *ce)
> > {
> >         if (!ce->ops->sched_disable) {
> >                 __intel_context_do_unpin(ce, 1);
> >         } else {
> >                 while (!atomic_add_unless(&ce->pin_count, -1, 1)) {
> >                         if (atomic_cmpxchg(&ce->pin_count, 1, 2) == 1) {
> >                                 ce->ops->sched_disable(ce);
> >                                 break;
> >                         }
> >                 }
> >         }
> > }
> >
> > That's pretty much impenetrable for me and the only thing I can think of
> > here is **ALARM** must be broken! See what others think..

Yea, probably should add a comment:

/*
 * If the context has the sched_disable function, it isn't safe to unpin
 * until this function completes. This function is allowed to complete
 * asynchronously too. To avoid this function from being entered twice
 * and move ownership of the unpin to this function's completion, adjust
 * the pin count to 2 before it is entered. When this function completes
 * the context can call intel_context_sched_unpin which decrements the
 * pin count by 2 potentially resulting in an unpin.
 *
 * A while loop is needed to ensure the atomicity of the pin count. e.g.
 * The below if / else statement has a race:
 * 
 * if (atomic_cmpxchg(&ce->pin_count, 1, 2) == 1)
 * 	ce->ops->sched_disable(ce);
 * else
 * 	atomic_dec(ce, 1);
 *
 * Two threads could simultaneously fail the if clause resulting in the
 * pin_count going to 0 with scheduling enabled + the context pinned. 
 */
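
For illustration, the completion side that comment refers to could look
something like the below (sketch only, assuming __intel_context_do_unpin()
takes the decrement count as its second argument, as in the hunk above; not
necessarily the exact code in the series):

static inline void intel_context_sched_unpin(struct intel_context *ce)
{
	/* Drop the caller's pin and the extra one taken before sched_disable() */
	__intel_context_do_unpin(ce, 2);
}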

> 
> pin_count is a hand-rolled mutex, except not actually a real one, and
> it's absolutely hiliarous in it's various incarnations (there's one
> each on i915_vm, vma, obj and probably a few more).
>
> Not the worst one I've seen by far in the code we've merged already.
> Minimally this needs a comment here and in the struct next to
> @pin_count to explain where all this is abused, which would already
> make it better than most of the in-tree ones.
> 
> As part of the ttm conversion we have a plan to sunset the "pin_count
> as a lock" stuff, depending how bad that goes we might need to split
> up the task for each struct that has such a pin_count.
>

Didn't know that with the TTM rework this value might go away. If that
is truly the direction I don't see the point in reworking this now. It
100% works and with a comment I think it can be understood what it is
doing.

Matt

> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

* Re: [Intel-gfx] [RFC PATCH 00/97] Basic GuC submission support in the i915
  2021-06-02 15:27     ` Tvrtko Ursulin
  2021-06-02 18:57       ` Daniel Vetter
@ 2021-06-03  4:10       ` Matthew Brost
  2021-06-03  8:51         ` Tvrtko Ursulin
  1 sibling, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-06-03  4:10 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: jason.ekstrand, daniel.vetter, intel-gfx, dri-devel

On Wed, Jun 02, 2021 at 04:27:18PM +0100, Tvrtko Ursulin wrote:
> 
> On 25/05/2021 17:45, Matthew Brost wrote:
> > On Tue, May 25, 2021 at 11:32:26AM +0100, Tvrtko Ursulin wrote:
> > > 
> > > On 06/05/2021 20:13, Matthew Brost wrote:
> > > > Basic GuC submission support. This is the first bullet point in the
> > > > upstreaming plan covered in the following RFC [1].
> > > > 
> > > > At a very high level the GuC is a piece of firmware which sits between
> > > > the i915 and the GPU. It offloads some of the scheduling of contexts
> > > > from the i915 and programs the GPU to submit contexts. The i915
> > > > communicates with the GuC and the GuC communicates with the GPU.
> > > > 
> > > > GuC submission will be disabled by default on all current upstream
> > > > platforms behind a module parameter - enable_guc. A value of 3 will
> > > > enable submission and HuC loading via the GuC. GuC submission should
> > > > work on all gen11+ platforms assuming the GuC firmware is present.
> > > > 
> > > > This is a huge series and it is completely unrealistic to merge all of
> > > > these patches at once. Fortunately I believe we can break down the
> > > > series into different merges:
> > > > 
> > > > 1. Merge Chris Wilson's patches. These have already been reviewed
> > > > upstream and I fully agree with these patches as a precursor to GuC
> > > > submission.
> > > > 
> > > > 2. Update to GuC 60.1.2. These are largely Michal's patches.
> > > > 
> > > > 3. Turn on GuC/HuC auto mode by default.
> > > > 
> > > > 4. Additional patches needed to support GuC submission. This is any
> > > > patch not covered by 1-3 in the first 34 patches. e.g. 'Engine relative
> > > > MMIO'
> > > > 
> > > > 5. GuC submission support. Patches number 35+. These all don't have to
> > > > merge at once though as we don't actually allow GuC submission until the
> > > > last patch of this series.
> > > 
> > > For the GuC backend/submission part only - it seems to me none of my review
> > > comments I made in December 2019 have been implemented. At that point I
> > 
> > I wouldn't say none of the fixes have done, lots have just not
> > everything you wanted.
> > 
> > > stated, and this was all internally at the time mind you, that I do not
> > > think the series is ready and there were several high level issues that
> > > would need to be sorted out. I don't think I gave my ack or r-b back then
> > > and the promise was a few things would be worked on post (internal) merge.
> > > That was supposed to include upstream refactoring to enable GuC better
> > > slotting in as a backed. Fast forward a year and a half later and the only
> > > progress we had in this area has been deleted.
> > > 
> > >  From the top of my head, and having glanced the series as posted:
> > > 
> > >   * Self-churn factor in the series is too high.
> > 
> > Not sure what you mean by this? The patches have been reworked
> > internally too much?
> 
> No, I meant series adds and removes, changes the same code a bit much which
> makes it harder to review. It is much easier when the flow is logical and
> typical, where it starts with refactoring, generalising, building
> infrastructure and then plugging bits in, than it is to review patches which
> add stuff which then get removed or changed significantly a few patches down
> the line.
>

This has been part of the internal churn but most of this should go
away as it gets posted / merged in smaller sets of patches.
 
> > >   * Patch ordering issues.
> > 
> > We are going to clean up some of the ordering as these 97 patches are
> > posted in smaller mergeable series but at the end of the day this is a
> > bit of a bikeshed. GuC submission can't be turned until patch 97 so IMO
> > it really isn't all that big of a deal the order of which patches before
> > that land as we are not breaking anything.
> 
> Yes some leeway for ordering is fine.
> 
> > >   * GuC context state machine is way too dodgy to have any confidence it can
> > > be read and race conditions understood.
> > 
> > I know you don't really like the state machine but no other real way to
> > not have DoS on resources and no real way to fairly distribute guc_ids
> > without it. I know you have had other suggestions here but none of your
> > suggestions either will work or they are no less complicated in the end.
> > 
> > For what it is worth, the state machine will get simplified when we hook
> > into the DRM scheduler as won't have to deal with submitting from IRQ
> > contexts in the backend or having more than 1 request in the backend at
> > a time.
> 
> Dunno. A mix of self-churn, locks, inconsistent naming, verbosity and magic
> makes it super hard to review. States in functions like guc_context_ban,
> guc_context_sched_disable, guc_context_block, .. I find it impossible to
> follow what's going on. Some under lock, some outside, jumps, returns, add
> magic two .. Perhaps it is just me so wait and see what other reviewers will
> think.
> 

No doubt it is a bit complex as all of the above functions can be
executing at the same time, so can a reset, so can a submission, and the
GuC is also responding to all of the above functions asynchronously.
When you have 6 things that can be operating on the same state, yes the
locking is going to be a bit confusing. I do write documentation in a
patch towards the end of this series explaining the locking rules + all
the races.

> > >   * Context pinning code with it's magical two adds, subtract and cmpxchg is
> > > dodgy as well.
> > 
> > Daniele tried to remove this and it proved quite difficult + created
> > even more races in the backend code. This was prior to the pre-pin and
> > post-unpin code which makes this even more difficult to fix as I believe
> > these functions would need to be removed first. Not saying we can't
> > revisit this someday but I personally really like it - it is a clever
> > way to avoid reentering the pin / unpin code while asynchronous things
> > are happening rather than some complex locking scheme. Lastly, this code
> > has proved incredibly stable as I don't think we've had to fix a single
> > thing in this area since we've been using this code internally.
> 
> Pretty much same as above. The code like:
> 
> static inline void __intel_context_unpin(struct intel_context *ce)
> {
> 	if (!ce->ops->sched_disable) {
> 		__intel_context_do_unpin(ce, 1);
> 	} else {
> 		while (!atomic_add_unless(&ce->pin_count, -1, 1)) {
> 			if (atomic_cmpxchg(&ce->pin_count, 1, 2) == 1) {
> 				ce->ops->sched_disable(ce);
> 				break;
> 			}
> 		}
> 	}
> }
> 
> That's pretty much impenetrable for me and the only thing I can think of
> here is **ALARM** must be broken! See what others think..
> 

Answered in a reply to Daniel's reply but I'll repeat. Should have a
comment here:

/*
 * If the context has the sched_disable function, it isn't safe to unpin
 * until this function completes. This function is allowed to complete
 * asynchronously too. To avoid this function from being entered twice
 * and move ownership of the unpin to this function's completion, adjust
 * the pin count to 2 before it is entered. When this function completes
 * the context can call intel_context_sched_unpin which decrements the
 * pin count by 2 potentially resulting in an unpin.
 *
 * A while loop is needed to ensure the atomicity of the pin count. e.g.
 * The below if / else statement has a race:
 *
 * if (atomic_cmpxchg(&ce->pin_count, 1, 2) == 1)
 * 	ce->ops->sched_disable(ce);
 * else
 * 	atomic_dec(ce, 1);
 *
 * Two threads could simultaneously fail the if clause resulting in the
 * pin_count going to 0 with scheduling enabled + the context pinned. 
 */

> > >   * Kludgy way of interfacing with rest of the driver instead of refactoring
> > > to fit (idling, breadcrumbs, scheduler, tasklets, ...).
> > > 
> > 
> > Idling and breadcrumbs seem clean to me. Scheduler + tasklet are going
> > away once the DRM scheduler lands. No need rework those as we are just
> > going to rework this again.
> 
> Well today I read the breadcrumbs patch and there is no way that's clean. It
> goes and creates one object per engine, then deletes them, replacing with
> GuC special one. All in the same engine setup. The same pattern of bolting
> on the GuC repeats too much for my taste.
> 

I don't think creating a default object w/ a ref count then decrementing
the ref count + replacing it with a new object is that hard to
understand. IMO that is way better than how things worked previously,
where we just made implicit assumptions about the execlists backend
behavior all over the driver. If this was done properly in the current
i915 code base this really wouldn't be an issue.

> > > Now perhaps the latest plan is to ignore all these issues and still merge,
> > > then follow up with throwing it away, mostly or at least largely, in which
> > > case there isn't any point really to review the current state yet again. But
> > > it is sad that we got to this state. So just for the record - all this was
> > > reviewed in Nov/Dec 2019. By me among other folks and I at least deemed it
> > > not ready in this form.
> > > 
> > 
> > I personally don't think it is really in that bad of shape. The fact
> > that I could put together a PoC more or less fully integrating this
> > backend into the DRM scheduler within a few days I think speaks to the
> > quality and flexablitiy of this backend compared to execlists.
> 
> Or that you are much more familiar with it. Anyway, it's not the line of
> argument I think we should continue.
>

Yes, obviously more familiar with this code but I think the argument
holds when it relates to the DRM scheduler. Please, someone who is familiar
with the execlists backend, try to integrate that with the DRM scheduler
- I guarantee it will be a nightmare / total hack job.

Matt

> Regards,
> 
> Tvrtko

* Re: [Intel-gfx] [RFC PATCH 00/97] Basic GuC submission support in the i915
  2021-06-03  3:41         ` Matthew Brost
@ 2021-06-03  4:47           ` Daniel Vetter
  2021-06-03  9:49             ` Tvrtko Ursulin
  2021-06-03 10:52           ` Tvrtko Ursulin
  1 sibling, 1 reply; 249+ messages in thread
From: Daniel Vetter @ 2021-06-03  4:47 UTC (permalink / raw)
  To: Matthew Brost, Thomas Hellström, Maarten Lankhorst, Matthew Auld
  Cc: Jason Ekstrand, Tvrtko Ursulin, intel-gfx, dri-devel, Daniel Vetter

On Thu, Jun 3, 2021 at 5:48 AM Matthew Brost <matthew.brost@intel.com> wrote:
>
> On Wed, Jun 02, 2021 at 08:57:02PM +0200, Daniel Vetter wrote:
> > On Wed, Jun 2, 2021 at 5:27 PM Tvrtko Ursulin
> > <tvrtko.ursulin@linux.intel.com> wrote:
> > > On 25/05/2021 17:45, Matthew Brost wrote:
> > > > On Tue, May 25, 2021 at 11:32:26AM +0100, Tvrtko Ursulin wrote:
> > > >>   * Context pinning code with it's magical two adds, subtract and cmpxchg is
> > > >> dodgy as well.
> > > >
> > > > Daniele tried to remove this and it proved quite difficult + created
> > > > even more races in the backend code. This was prior to the pre-pin and
> > > > post-unpin code which makes this even more difficult to fix as I believe
> > > > these functions would need to be removed first. Not saying we can't
> > > > revisit this someday but I personally really like it - it is a clever
> > > > way to avoid reentering the pin / unpin code while asynchronous things
> > > > are happening rather than some complex locking scheme. Lastly, this code
> > > > has proved incredibly stable as I don't think we've had to fix a single
> > > > thing in this area since we've been using this code internally.
> > >
> > > Pretty much same as above. The code like:
> > >
> > > static inline void __intel_context_unpin(struct intel_context *ce)
> > > {
> > >         if (!ce->ops->sched_disable) {
> > >                 __intel_context_do_unpin(ce, 1);
> > >         } else {
> > >                 while (!atomic_add_unless(&ce->pin_count, -1, 1)) {
> > >                         if (atomic_cmpxchg(&ce->pin_count, 1, 2) == 1) {
> > >                                 ce->ops->sched_disable(ce);
> > >                                 break;
> > >                         }
> > >                 }
> > >         }
> > > }
> > >
> > > That's pretty much impenetrable for me and the only thing I can think of
> > > here is **ALARM** must be broken! See what others think..
>
> Yea, probably should add a comment:
>
> /*
>  * If the context has the sched_disable function, it isn't safe to unpin
>  * until this function completes. This function is allowed to complete
>  * asynchronously too. To avoid this function from being entered twice
>  * and move ownership of the unpin to this function's completion, adjust
>  * the pin count to 2 before it is entered. When this function completes
>  * the context can call intel_context_sched_unpin which decrements the
>  * pin count by 2 potentially resulting in an unpin.
>  *
>  * A while loop is needed to ensure the atomicity of the pin count. e.g.
>  * The below if / else statement has a race:
>  *
>  * if (atomic_cmpxchg(&ce->pin_count, 1, 2) == 1)
>  *      ce->ops->sched_disable(ce);
>  * else
>  *      atomic_dec(ce, 1);
>  *
>  * Two threads could simultaneously fail the if clause resulting in the
>  * pin_count going to 0 with scheduling enabled + the context pinned.
>  */
>
> >
> > pin_count is a hand-rolled mutex, except not actually a real one, and
> > it's absolutely hiliarous in it's various incarnations (there's one
> > each on i915_vm, vma, obj and probably a few more).
> >
> > Not the worst one I've seen by far in the code we've merged already.
> > Minimally this needs a comment here and in the struct next to
> > @pin_count to explain where all this is abused, which would already
> > make it better than most of the in-tree ones.
> >
> > As part of the ttm conversion we have a plan to sunset the "pin_count
> > as a lock" stuff, depending how bad that goes we might need to split
> > up the task for each struct that has such a pin_count.
> >
>
> Didn't know that with the TTM rework this value might go away. If that
> is truely the direction I don't see the point in reworking this now. It
> 100% works and with a comment I think it can be understood what it is
> doing.

Well not go away, but things will change. Currently the various
->pin_count sprinkled all over the place have essentially two uses
- pinning stuff long term (scanout, ctxs, anything that stays pinned
after the ioctl is done essentially)
- short-term lock-like construct

There's going to be two changes:
- The short-term pins will be replaced by dma_resv_lock/unlock pairs
- the locking rules for long-term pins will change, because we'll
require that you must hold dma_resv_lock for unpinning. So no more
atomic_t, also no more races for final unpins vs cleanup work

Also now that you've explained the why for this dance, especially the
async part: since the new unpin will hold dma_resv_lock, we can
create & attach a dma_fence for tracking async completion, which the
next operation can then wait on.
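
Rough sketch of that direction, just to illustrate (the helper
ce_sched_disable_fence() is made up and error handling is mostly elided, so
treat this as a sketch rather than the plan of record):

static void intel_context_unpin_under_resv(struct intel_context *ce)
{
	struct dma_resv *resv = ce->state->obj->base.resv;
	struct dma_fence *fence;

	if (dma_resv_lock(resv, NULL))
		return;

	/* hypothetical: fence signalled once the async sched_disable completes */
	fence = ce_sched_disable_fence(ce);
	if (fence) {
		dma_resv_add_excl_fence(resv, fence);
		dma_fence_put(fence);
	}

	__intel_context_do_unpin(ce, 1);
	dma_resv_unlock(resv);
}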

The awkward state we have right now is that there's a lot of places
where we require the unpin to be done locklessly with these atomic
tricks, so there's going to be quite some surgery involved all over
the code.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [Intel-gfx] [RFC PATCH 00/97] Basic GuC submission support in the i915
  2021-06-03  4:10       ` Matthew Brost
@ 2021-06-03  8:51         ` Tvrtko Ursulin
  2021-06-03 16:34           ` Matthew Brost
  0 siblings, 1 reply; 249+ messages in thread
From: Tvrtko Ursulin @ 2021-06-03  8:51 UTC (permalink / raw)
  To: Matthew Brost; +Cc: jason.ekstrand, daniel.vetter, intel-gfx, dri-devel


On 03/06/2021 05:10, Matthew Brost wrote:
> On Wed, Jun 02, 2021 at 04:27:18PM +0100, Tvrtko Ursulin wrote:
>>
>> On 25/05/2021 17:45, Matthew Brost wrote:

[snip]

>>>>    * Kludgy way of interfacing with rest of the driver instead of refactoring
>>>> to fit (idling, breadcrumbs, scheduler, tasklets, ...).
>>>>
>>>
>>> Idling and breadcrumbs seem clean to me. Scheduler + tasklet are going
>>> away once the DRM scheduler lands. No need rework those as we are just
>>> going to rework this again.
>>
>> Well today I read the breadcrumbs patch and there is no way that's clean. It
>> goes and creates one object per engine, then deletes them, replacing with
>> GuC special one. All in the same engine setup. The same pattern of bolting
>> on the GuC repeats too much for my taste.
>>
> 
> I don't think creating a default object /w a ref count then decrementing
> the ref count + replacing it with a new object is that hard to
> understand. IMO that is way better than how things worked previously

It's not about it being hard to understand, although it certainly is far 
from the usual patterns, but about it being lazy design which in normal 
times would never be allowed. Because, reduced and flattened to highlight 
the principal complaint, it looks like this:

engine_setup_for_each_engine:
    engine->breadcrumbs = create_breadcrumbs();
    if (guc) {
       if (!first_class_engine) {
         kfree(engine->breadcrumbs);
         engine->breadcrumbs = first_class_engine->breadcrumbs;
       } else {
         first_class_engine->breadcrumbs->vfuncs = guc_vfuncs;
       }
    }

What I suggested is that patch should not break and hack the object 
oriented design and instead could do along the lines:

gt_guc_setup:
    for_each_class:
       gt->uc.breadcrumbs[class] = create_guc_breadcrumbs();

engine_setup_for_each_engine:
    if (guc)
       engine->breadcrumbs = get(gt->uc.breadcrumbs[class]);
    else
       engine->breadcrumbs = create_breadcrumbs();
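
Where get()/put() is just plain reference counting on the shared per-class
object, something along these lines (sketch only, assuming a kref and a
matching release helper are added to intel_breadcrumbs; names made up):

static struct intel_breadcrumbs *
breadcrumbs_get(struct intel_breadcrumbs *b)
{
	kref_get(&b->ref);	/* assumed struct kref member */
	return b;
}

static void breadcrumbs_put(struct intel_breadcrumbs *b)
{
	kref_put(&b->ref, breadcrumbs_release);	/* assumed release helper */
}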

> where we just made implicit assumptions all over the driver of the
> execlists backend behavior. If this was done properly in the current
> i915 code base this really wouldn't be an issue.

Don't really follow you here, but it probably goes back to how the upstream 
code was there, available to be refactored, all this time.

Regards,

Tvrtko

* Re: [Intel-gfx] [RFC PATCH 00/97] Basic GuC submission support in the i915
  2021-06-03  4:47           ` Daniel Vetter
@ 2021-06-03  9:49             ` Tvrtko Ursulin
  0 siblings, 0 replies; 249+ messages in thread
From: Tvrtko Ursulin @ 2021-06-03  9:49 UTC (permalink / raw)
  To: Daniel Vetter, Matthew Brost, Thomas Hellström,
	Maarten Lankhorst, Matthew Auld
  Cc: Jason Ekstrand, Daniel Vetter, intel-gfx, dri-devel


On 03/06/2021 05:47, Daniel Vetter wrote:
> On Thu, Jun 3, 2021 at 5:48 AM Matthew Brost <matthew.brost@intel.com> wrote:
>>
>> On Wed, Jun 02, 2021 at 08:57:02PM +0200, Daniel Vetter wrote:
>>> On Wed, Jun 2, 2021 at 5:27 PM Tvrtko Ursulin
>>> <tvrtko.ursulin@linux.intel.com> wrote:
>>>> On 25/05/2021 17:45, Matthew Brost wrote:
>>>>> On Tue, May 25, 2021 at 11:32:26AM +0100, Tvrtko Ursulin wrote:
>>>>>>    * Context pinning code with it's magical two adds, subtract and cmpxchg is
>>>>>> dodgy as well.
>>>>>
>>>>> Daniele tried to remove this and it proved quite difficult + created
>>>>> even more races in the backend code. This was prior to the pre-pin and
>>>>> post-unpin code which makes this even more difficult to fix as I believe
>>>>> these functions would need to be removed first. Not saying we can't
>>>>> revisit this someday but I personally really like it - it is a clever
>>>>> way to avoid reentering the pin / unpin code while asynchronous things
>>>>> are happening rather than some complex locking scheme. Lastly, this code
>>>>> has proved incredibly stable as I don't think we've had to fix a single
>>>>> thing in this area since we've been using this code internally.
>>>>
>>>> Pretty much same as above. The code like:
>>>>
>>>> static inline void __intel_context_unpin(struct intel_context *ce)
>>>> {
>>>>          if (!ce->ops->sched_disable) {
>>>>                  __intel_context_do_unpin(ce, 1);
>>>>          } else {
>>>>                  while (!atomic_add_unless(&ce->pin_count, -1, 1)) {
>>>>                          if (atomic_cmpxchg(&ce->pin_count, 1, 2) == 1) {
>>>>                                  ce->ops->sched_disable(ce);
>>>>                                  break;
>>>>                          }
>>>>                  }
>>>>          }
>>>> }
>>>>
>>>> That's pretty much impenetrable for me and the only thing I can think of
>>>> here is **ALARM** must be broken! See what others think..
>>
>> Yea, probably should add a comment:
>>
>> /*
>>   * If the context has the sched_disable function, it isn't safe to unpin
>>   * until this function completes. This function is allowed to complete
>>   * asynchronously too. To avoid this function from being entered twice
>>   * and move ownership of the unpin to this function's completion, adjust
>>   * the pin count to 2 before it is entered. When this function completes
>>   * the context can call intel_context_sched_unpin which decrements the
>>   * pin count by 2 potentially resulting in an unpin.
>>   *
>>   * A while loop is needed to ensure the atomicity of the pin count. e.g.
>>   * The below if / else statement has a race:
>>   *
>>   * if (atomic_cmpxchg(&ce->pin_count, 1, 2) == 1)
>>   *      ce->ops->sched_disable(ce);
>>   * else
>>   *      atomic_dec(ce, 1);
>>   *
>>   * Two threads could simultaneously fail the if clause resulting in the
>>   * pin_count going to 0 with scheduling enabled + the context pinned.
>>   */
>>
>>>
>>> pin_count is a hand-rolled mutex, except not actually a real one, and
>>> it's absolutely hiliarous in it's various incarnations (there's one
>>> each on i915_vm, vma, obj and probably a few more).
>>>
>>> Not the worst one I've seen by far in the code we've merged already.
>>> Minimally this needs a comment here and in the struct next to
>>> @pin_count to explain where all this is abused, which would already
>>> make it better than most of the in-tree ones.
>>>
>>> As part of the ttm conversion we have a plan to sunset the "pin_count
>>> as a lock" stuff, depending how bad that goes we might need to split
>>> up the task for each struct that has such a pin_count.
>>>
>>
>> Didn't know that with the TTM rework this value might go away. If that
>> is truely the direction I don't see the point in reworking this now. It
>> 100% works and with a comment I think it can be understood what it is
>> doing.
> 
> Well not go away, but things will change. Currently the various
> ->pin_count sprinkled all over the place have essentially two uses
> - pinning stuff long term (scanout, ctxs, anything that stays pinned
> after the ioctl is done essentially)
> - short-term lock-like construct
> 
> There's going to be two changes:
> - The short-term pins will be replaced by dma_resv_lock/unlock pairs
> - the locking rules for long-term pins will change, because we'll
> require that you must hold dma_resv_lock for unpinning. So no more
> atomic_t, also no more races for final unpins vs cleanup work
> 
> Also now that you've explained the why for this dance, especially the
> async part: Since the new unpin will hold dma_resv_lock, we can
> create&attach dma_fence for tracking async completion which then the
> next operation can wait on.

Yes, async would be an improvement in principle, because...

> The awkward state we have right now is that there's a lot of places
> where we require the unpin to be done locklessly with these atomic
> tricks, so there's going to be quite some surgery involved all over
> the code.

... I think the main problem with how impenetrable both this and the 
GuC context state machine in general are stems from the fact that the 
design is not right.

For instance we have intel_context which is one thing to i915, but with 
the GuC adaptations the guc state machine handling has been shoved 
inside it. That makes it complex and destroys the separation of duties.

Instead intel_context should remain the common layer and the handling of GuC 
firmware needs should go in a layer under it. Whether it would subclass, 
or use a different pattern, I can't tell right now. But if it was clearly 
separated then the state machine handling would have its place away 
from the common code.
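
To illustrate one possible shape of that separation (purely a sketch, names
made up):

struct guc_context {
	struct intel_context base;	/* common, backend-agnostic part */

	/* GuC-only state lives here instead of in intel_context */
	spinlock_t guc_state_lock;
	unsigned long guc_sched_state;
	u16 guc_id;
};

static inline struct guc_context *to_guc_context(struct intel_context *ce)
{
	return container_of(ce, struct guc_context, base);
}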

In 2019 I did push to at least prefix the GuC-specific fields with guc, 
as a minimum, but I don't think they, and the accompanying code, really 
should be present in the backend-agnostic intel_context.

Regards,

Tvrtko

* Re: [Intel-gfx] [RFC PATCH 00/97] Basic GuC submission support in the i915
  2021-06-03  3:41         ` Matthew Brost
  2021-06-03  4:47           ` Daniel Vetter
@ 2021-06-03 10:52           ` Tvrtko Ursulin
  1 sibling, 0 replies; 249+ messages in thread
From: Tvrtko Ursulin @ 2021-06-03 10:52 UTC (permalink / raw)
  To: Matthew Brost, Daniel Vetter
  Cc: Jason Ekstrand, Daniel Vetter, intel-gfx, dri-devel


On 03/06/2021 04:41, Matthew Brost wrote:
> On Wed, Jun 02, 2021 at 08:57:02PM +0200, Daniel Vetter wrote:
>> On Wed, Jun 2, 2021 at 5:27 PM Tvrtko Ursulin
>> <tvrtko.ursulin@linux.intel.com> wrote:
>>> On 25/05/2021 17:45, Matthew Brost wrote:
>>>> On Tue, May 25, 2021 at 11:32:26AM +0100, Tvrtko Ursulin wrote:
>>>>>    * Context pinning code with it's magical two adds, subtract and cmpxchg is
>>>>> dodgy as well.
>>>>
>>>> Daniele tried to remove this and it proved quite difficult + created
>>>> even more races in the backend code. This was prior to the pre-pin and
>>>> post-unpin code which makes this even more difficult to fix as I believe
>>>> these functions would need to be removed first. Not saying we can't
>>>> revisit this someday but I personally really like it - it is a clever
>>>> way to avoid reentering the pin / unpin code while asynchronous things
>>>> are happening rather than some complex locking scheme. Lastly, this code
>>>> has proved incredibly stable as I don't think we've had to fix a single
>>>> thing in this area since we've been using this code internally.
>>>
>>> Pretty much same as above. The code like:
>>>
>>> static inline void __intel_context_unpin(struct intel_context *ce)
>>> {
>>>          if (!ce->ops->sched_disable) {
>>>                  __intel_context_do_unpin(ce, 1);
>>>          } else {
>>>                  while (!atomic_add_unless(&ce->pin_count, -1, 1)) {
>>>                          if (atomic_cmpxchg(&ce->pin_count, 1, 2) == 1) {
>>>                                  ce->ops->sched_disable(ce);
>>>                                  break;
>>>                          }
>>>                  }
>>>          }
>>> }
>>>
>>> That's pretty much impenetrable for me and the only thing I can think of
>>> here is **ALARM** must be broken! See what others think..
> 
> Yea, probably should add a comment:
> 
> /*
>   * If the context has the sched_disable function, it isn't safe to unpin
>   * until this function completes. This function is allowed to complete
>   * asynchronously too. To avoid this function from being entered twice
>   * and move ownership of the unpin to this function's completion, adjust
>   * the pin count to 2 before it is entered. When this function completes
>   * the context can call intel_context_sched_unpin which decrements the
>   * pin count by 2 potentially resulting in an unpin.
>   *
>   * A while loop is needed to ensure the atomicity of the pin count. e.g.
>   * The below if / else statement has a race:
>   *
>   * if (atomic_cmpxchg(&ce->pin_count, 1, 2) == 1)
>   * 	ce->ops->sched_disable(ce);
>   * else
>   * 	atomic_dec(ce, 1);
>   *
>   * Two threads could simultaneously fail the if clause resulting in the
>   * pin_count going to 0 with scheduling enabled + the context pinned.
>   */
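
To make the interleaving that comment warns about concrete, here is a
minimal stand-alone model (userspace C11 atomics, nothing below is i915
code) of the naive if / else with two threads unpinning a context whose
pin_count starts at 2:

	#include <stdatomic.h>
	#include <stdio.h>

	int main(void)
	{
		atomic_int pin_count = 2;	/* two outstanding pins */
		int a = 1, b = 1;

		/* threads A and B both run the naive unpin, interleaved */
		atomic_compare_exchange_strong(&pin_count, &a, 2); /* A: fails, count is 2 */
		atomic_compare_exchange_strong(&pin_count, &b, 2); /* B: fails, count is 2 */
		atomic_fetch_sub(&pin_count, 1);		   /* A: else branch, 2 -> 1 */
		atomic_fetch_sub(&pin_count, 1);		   /* B: else branch, 1 -> 0 */

		/* count reaches 0 with scheduling still enabled and
		 * sched_disable() never sent */
		printf("pin_count = %d\n", atomic_load(&pin_count));
		return 0;
	}

The while (!atomic_add_unless(&ce->pin_count, -1, 1)) loop closes that
window because a decrement only happens while the count is not 1, and
the single thread whose cmpxchg(1, 2) succeeds owns sending
sched_disable().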

I have many questions here..

How time bound is the busy loop?

In guc_context_sched_disable the case where someone pins after the magic 
2 has been set is handled.

But what if pin_count got to 2 legitimately, via an unpin and pin racing 
in between the atomic_cmpxchg in __intel_context_unpin and the relevant 
lines in guc_context_sched_disable getting to execute?

Why is the pin_count dec in guc_context_sched_disable under the 
ce->guc_state.lock?

What is the point of:

	enabled = context_enabled(ce);
	if (unlikely(!enabled || submission_disabled(guc))) {
		if (!enabled)
			clr_context_enabled(ce);

Reads like clearing the enabled bit if it is not set?!

Why is:

static inline void clr_context_enabled(struct intel_context *ce)
{
	atomic_and((u32)~SCHED_STATE_NO_LOCK_ENABLED,
		   &ce->guc_sched_state_no_lock);
}

Operating on a field called "guc_sched_state_no_lock" (no lock!) while 
the caller is holding guc_state.lock while manipulating that field.

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 00/97] Basic GuC submission support in the i915
  2021-06-03  8:51         ` Tvrtko Ursulin
@ 2021-06-03 16:34           ` Matthew Brost
  0 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-06-03 16:34 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: jason.ekstrand, daniel.vetter, intel-gfx, dri-devel

On Thu, Jun 03, 2021 at 09:51:19AM +0100, Tvrtko Ursulin wrote:
> 
> On 03/06/2021 05:10, Matthew Brost wrote:
> > On Wed, Jun 02, 2021 at 04:27:18PM +0100, Tvrtko Ursulin wrote:
> > > 
> > > On 25/05/2021 17:45, Matthew Brost wrote:
> 
> [snip]
> 
> > > > >    * Kludgy way of interfacing with rest of the driver instead of refactoring
> > > > > to fit (idling, breadcrumbs, scheduler, tasklets, ...).
> > > > > 
> > > > 
> > > > Idling and breadcrumbs seem clean to me. Scheduler + tasklet are going
> > > > away once the DRM scheduler lands. No need rework those as we are just
> > > > going to rework this again.
> > > 
> > > Well today I read the breadcrumbs patch and there is no way that's clean. It
> > > goes and creates one object per engine, then deletes them, replacing with
> > > GuC special one. All in the same engine setup. The same pattern of bolting
> > > on the GuC repeats too much for my taste.
> > > 
> > 
> > I don't think creating a default object /w a ref count then decrementing
> > the ref count + replacing it with a new object is that hard to
> > understand. IMO that is way better than how things worked previously
> 
> It's not about it being hard to understand, although it certainly is far
> from the usual patterns, but about it being lazy design which in normal
> times would never be allowed. Because reduced and flattened to highlight the
> principal complaint it looks like this:
> 
> engine_setup_for_each_engine:
>    engine->breadcrumbs = create_breadcrumbs();
>    if (guc) {
>       if (!first_class_engine) {
>         kfree(engine->breadcrumbs);
>         engine->breadcrumbs = first_class_engine->breadcrumbs;
>       } else {
>         first_class_engine->breadcrumbs->vfuncs = guc_vfuncs;
>       }
>    }
> 

I think we are diving way too deep into individual patches on the cover
letter.

Agree this could be refactored a bit more. Let me try a rework of this
patch in particular before it gets posted again.

Matt 

> What I suggested is that patch should not break and hack the object oriented
> design and instead could do along the lines:
> 
> gt_guc_setup:
>    for_each_class:
>       gt->uc.breadcrumbs[class] = create_guc_breadcrumbs();
> 
> engine_setup_for_each_engine:
>    if (guc)
>       engine->breadcrumbs = get(gt->uc.breadcrumbs[class]);
>    else
>       engine->breadcrumbs = create_breadcrumbs();
> 
> > where we just made implicit assumptions all over the driver of the
> > execlists backend behavior. If this was done properly in the current
> > i915 code base this really wouldn't be an issue.
> 
> Don't really follow you here but it probably goes back to how upstream code
> was there available to be refactored all this time.
> 
> Regards,
> 
> Tvrtko

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 64/97] drm/i915/guc: Reset implementation for new GuC interface
  2021-06-02 14:33   ` [Intel-gfx] " Tvrtko Ursulin
@ 2021-06-04  3:17     ` Matthew Brost
  2021-06-04  8:16       ` Daniel Vetter
  0 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-06-04  3:17 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: jason.ekstrand, daniel.vetter, intel-gfx, dri-devel

On Wed, Jun 02, 2021 at 03:33:43PM +0100, Tvrtko Ursulin wrote:
> 
> On 06/05/2021 20:14, Matthew Brost wrote:
> > Reset implementation for new GuC interface. This is the legacy reset
> > implementation which is called when the i915 owns the engine hang check.
> > Future patches will offload the engine hang check to GuC but we will
> > continue to maintain this legacy path as a fallback and this code path
> > is also required if the GuC dies.
> > 
> > With the new GuC interface it is not possible to reset individual
> > engines - it is only possible to reset the GPU entirely. This patch
> > forces an entire chip reset if any engine hangs.
> > 
> > Cc: John Harrison <john.c.harrison@intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >   drivers/gpu/drm/i915/gt/intel_context.c       |   3 +
> >   drivers/gpu/drm/i915/gt/intel_context_types.h |   7 +
> >   drivers/gpu/drm/i915/gt/intel_engine_types.h  |   6 +
> >   .../drm/i915/gt/intel_execlists_submission.c  |  40 ++
> >   drivers/gpu/drm/i915/gt/intel_gt_pm.c         |   6 +-
> >   drivers/gpu/drm/i915/gt/intel_reset.c         |  18 +-
> >   .../gpu/drm/i915/gt/intel_ring_submission.c   |  22 +
> >   drivers/gpu/drm/i915/gt/mock_engine.c         |  31 +
> >   drivers/gpu/drm/i915/gt/uc/intel_guc.c        |  16 +-
> >   drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   8 +-
> >   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 580 ++++++++++++++----
> >   drivers/gpu/drm/i915/gt/uc/intel_uc.c         |  34 +-
> >   drivers/gpu/drm/i915/gt/uc/intel_uc.h         |   3 +
> >   drivers/gpu/drm/i915/i915_request.c           |  41 +-
> >   drivers/gpu/drm/i915/i915_request.h           |   2 +
> >   15 files changed, 643 insertions(+), 174 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
> > index b24a1b7a3f88..2f01437056a8 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_context.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> > @@ -392,6 +392,9 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
> >   	spin_lock_init(&ce->guc_state.lock);
> >   	INIT_LIST_HEAD(&ce->guc_state.fences);
> > +	spin_lock_init(&ce->guc_active.lock);
> > +	INIT_LIST_HEAD(&ce->guc_active.requests);
> > +
> >   	ce->guc_id = GUC_INVALID_LRC_ID;
> >   	INIT_LIST_HEAD(&ce->guc_id_link);
> > diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
> > index 6945963a31ba..b63c8cf7823b 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> > @@ -165,6 +165,13 @@ struct intel_context {
> >   		struct list_head fences;
> >   	} guc_state;
> > +	struct {
> > +		/** lock: protects everything in guc_active */
> > +		spinlock_t lock;
> > +		/** requests: active requests on this context */
> > +		struct list_head requests;
> > +	} guc_active;
> 
> More accounting, yeah, this is more of that where GuC gives with one hand
> and takes away with the other. :(
> 

Yep, but we probably can drop this once we switch to the DRM scheduler.
The drm_gpu_scheduler has a list of jobs, and if we don't mind searching the
whole thing on a reset that will probably work too. I think the only
reason we have a per-context list is because of feedback I received a
while ago saying resets are per context with GuC, so keep a list on the
context - an engine list didn't really fit either. I'll make a note to circle
back to this when we hook into the DRM scheduler.

> > +
> >   	/* GuC scheduling state that does not require a lock. */
> >   	atomic_t guc_sched_state_no_lock;
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > index f7b6eed586ce..b84562b2708b 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > @@ -432,6 +432,12 @@ struct intel_engine_cs {
> >   	 */
> >   	void		(*release)(struct intel_engine_cs *engine);
> > +	/*
> > +	 * Add / remove request from engine active tracking
> > +	 */
> > +	void		(*add_active_request)(struct i915_request *rq);
> > +	void		(*remove_active_request)(struct i915_request *rq);
> > +
> >   	struct intel_engine_execlists execlists;
> >   	/*
> > diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > index 396b1356ea3e..54518b64bdbd 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > @@ -3117,6 +3117,42 @@ static void execlists_park(struct intel_engine_cs *engine)
> >   	cancel_timer(&engine->execlists.preempt);
> >   }
> > +static void add_to_engine(struct i915_request *rq)
> > +{
> > +	lockdep_assert_held(&rq->engine->sched_engine->lock);
> > +	list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
> > +}
> > +
> > +static void remove_from_engine(struct i915_request *rq)
> > +{
> > +	struct intel_engine_cs *engine, *locked;
> > +
> > +	/*
> > +	 * Virtual engines complicate acquiring the engine timeline lock,
> > +	 * as their rq->engine pointer is not stable until under that
> > +	 * engine lock. The simple ploy we use is to take the lock then
> > +	 * check that the rq still belongs to the newly locked engine.
> > +	 */
> > +	locked = READ_ONCE(rq->engine);
> > +	spin_lock_irq(&locked->sched_engine->lock);
> > +	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
> > +		spin_unlock(&locked->sched_engine->lock);
> > +		spin_lock(&engine->sched_engine->lock);
> > +		locked = engine;
> > +	}
> 
> Could use i915_request_active_engine although tbf I don't remember why I did
> not convert all callers when I added it. Perhaps I just did not find them
> all.
>

I think this is a copy/paste from the existing code, or at least it should be.
It just moves implicit execlists behavior from common code to an
execlists-specific vfunc.
 
> > +	list_del_init(&rq->sched.link);
> > +
> > +	clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> > +	clear_bit(I915_FENCE_FLAG_HOLD, &rq->fence.flags);
> > +
> > +	/* Prevent further __await_execution() registering a cb, then flush */
> > +	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
> > +
> > +	spin_unlock_irq(&locked->sched_engine->lock);
> > +
> > +	i915_request_notify_execute_cb_imm(rq);
> > +}
> > +
> >   static bool can_preempt(struct intel_engine_cs *engine)
> >   {
> >   	if (INTEL_GEN(engine->i915) > 8)
> > @@ -3214,6 +3250,8 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
> >   	engine->cops = &execlists_context_ops;
> >   	engine->request_alloc = execlists_request_alloc;
> >   	engine->bump_serial = execlist_bump_serial;
> > +	engine->add_active_request = add_to_engine;
> > +	engine->remove_active_request = remove_from_engine;
> >   	engine->reset.prepare = execlists_reset_prepare;
> >   	engine->reset.rewind = execlists_reset_rewind;
> > @@ -3915,6 +3953,8 @@ execlists_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
> >   		ve->base.sched_engine->kick_backend =
> >   			sibling->sched_engine->kick_backend;
> > +		ve->base.add_active_request = sibling->add_active_request;
> > +		ve->base.remove_active_request = sibling->remove_active_request;
> >   		ve->base.emit_bb_start = sibling->emit_bb_start;
> >   		ve->base.emit_flush = sibling->emit_flush;
> >   		ve->base.emit_init_breadcrumb = sibling->emit_init_breadcrumb;
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> > index aef3084e8b16..463a6ae605a0 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> > @@ -174,8 +174,6 @@ static void gt_sanitize(struct intel_gt *gt, bool force)
> >   	if (intel_gt_is_wedged(gt))
> >   		intel_gt_unset_wedged(gt);
> > -	intel_uc_sanitize(&gt->uc);
> > -
> >   	for_each_engine(engine, gt, id)
> >   		if (engine->reset.prepare)
> >   			engine->reset.prepare(engine);
> > @@ -191,6 +189,8 @@ static void gt_sanitize(struct intel_gt *gt, bool force)
> >   			__intel_engine_reset(engine, false);
> >   	}
> > +	intel_uc_reset(&gt->uc, false);
> > +
> >   	for_each_engine(engine, gt, id)
> >   		if (engine->reset.finish)
> >   			engine->reset.finish(engine);
> > @@ -243,6 +243,8 @@ int intel_gt_resume(struct intel_gt *gt)
> >   		goto err_wedged;
> >   	}
> > +	intel_uc_reset_finish(&gt->uc);
> > +
> >   	intel_rps_enable(&gt->rps);
> >   	intel_llc_enable(&gt->llc);
> > diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> > index d5094be6d90f..ce3ef26ffe2d 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> > @@ -758,6 +758,8 @@ static int gt_reset(struct intel_gt *gt, intel_engine_mask_t stalled_mask)
> >   		__intel_engine_reset(engine, stalled_mask & engine->mask);
> >   	local_bh_enable();
> > +	intel_uc_reset(&gt->uc, true);
> > +
> >   	intel_ggtt_restore_fences(gt->ggtt);
> >   	return err;
> > @@ -782,6 +784,8 @@ static void reset_finish(struct intel_gt *gt, intel_engine_mask_t awake)
> >   		if (awake & engine->mask)
> >   			intel_engine_pm_put(engine);
> >   	}
> > +
> > +	intel_uc_reset_finish(&gt->uc);
> >   }
> >   static void nop_submit_request(struct i915_request *request)
> > @@ -835,6 +839,7 @@ static void __intel_gt_set_wedged(struct intel_gt *gt)
> >   	for_each_engine(engine, gt, id)
> >   		if (engine->reset.cancel)
> >   			engine->reset.cancel(engine);
> > +	intel_uc_cancel_requests(&gt->uc);
> >   	local_bh_enable();
> >   	reset_finish(gt, awake);
> > @@ -1123,6 +1128,9 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)
> >   	ENGINE_TRACE(engine, "flags=%lx\n", gt->reset.flags);
> >   	GEM_BUG_ON(!test_bit(I915_RESET_ENGINE + engine->id, &gt->reset.flags));
> > +	if (intel_engine_uses_guc(engine))
> > +		return -ENODEV;
> > +
> >   	if (!intel_engine_pm_get_if_awake(engine))
> >   		return 0;
> > @@ -1133,13 +1141,10 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)
> >   			   "Resetting %s for %s\n", engine->name, msg);
> >   	atomic_inc(&engine->i915->gpu_error.reset_engine_count[engine->uabi_class]);
> > -	if (intel_engine_uses_guc(engine))
> > -		ret = intel_guc_reset_engine(&engine->gt->uc.guc, engine);
> > -	else
> > -		ret = intel_gt_reset_engine(engine);
> > +	ret = intel_gt_reset_engine(engine);
> >   	if (ret) {
> >   		/* If we fail here, we expect to fallback to a global reset */
> > -		ENGINE_TRACE(engine, "Failed to reset, err: %d\n", ret);
> > +		ENGINE_TRACE(engine, "Failed to reset %s, err: %d\n", engine->name, ret);
> >   		goto out;
> >   	}
> > @@ -1273,7 +1278,8 @@ void intel_gt_handle_error(struct intel_gt *gt,
> >   	 * Try engine reset when available. We fall back to full reset if
> >   	 * single reset fails.
> >   	 */
> > -	if (intel_has_reset_engine(gt) && !intel_gt_is_wedged(gt)) {
> > +	if (!intel_uc_uses_guc_submission(&gt->uc) &&
> > +	    intel_has_reset_engine(gt) && !intel_gt_is_wedged(gt)) {
> 
> If with guc driver cannot do engine reset, could intel_has_reset_engine just
> say false in that case so guc check wouldn't have to be added here? Also
> noticed this is the same open I had in 2019. and someone said it can and
> would be folded. ;(
>

Let me look into that before the next rev. I briefly looked at this and
it does seem plausible this function could return false. The only concern
here is that reset code is notoriously delicate, so I am wary of changing
this in this series. We have a live list of follow-ups; this could be
included there if it doesn't get fixed in this patch.
 
> >   		local_bh_disable();
> >   		for_each_engine_masked(engine, gt, engine_mask, tmp) {
> >   			BUILD_BUG_ON(I915_RESET_MODESET >= I915_RESET_ENGINE);
> > diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> > index 39dd7c4ed0a9..7d05bf16094c 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> > @@ -1050,6 +1050,25 @@ static void ring_bump_serial(struct intel_engine_cs *engine)
> >   	engine->serial++;
> >   }
> > +static void add_to_engine(struct i915_request *rq)
> > +{
> > +	lockdep_assert_held(&rq->engine->sched_engine->lock);
> > +	list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
> > +}
> > +
> > +static void remove_from_engine(struct i915_request *rq)
> > +{
> > +	spin_lock_irq(&rq->engine->sched_engine->lock);
> > +	list_del_init(&rq->sched.link);
> > +
> > +	/* Prevent further __await_execution() registering a cb, then flush */
> > +	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
> > +
> > +	spin_unlock_irq(&rq->engine->sched_engine->lock);
> > +
> > +	i915_request_notify_execute_cb_imm(rq);
> > +}
> > +
> >   static void setup_common(struct intel_engine_cs *engine)
> >   {
> >   	struct drm_i915_private *i915 = engine->i915;
> > @@ -1067,6 +1086,9 @@ static void setup_common(struct intel_engine_cs *engine)
> >   	engine->reset.cancel = reset_cancel;
> >   	engine->reset.finish = reset_finish;
> > +	engine->add_active_request = add_to_engine;
> > +	engine->remove_active_request = remove_from_engine;
> > +
> >   	engine->cops = &ring_context_ops;
> >   	engine->request_alloc = ring_request_alloc;
> >   	engine->bump_serial = ring_bump_serial;
> > diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
> > index 4d023b5cd5da..dccf5fce980a 100644
> > --- a/drivers/gpu/drm/i915/gt/mock_engine.c
> > +++ b/drivers/gpu/drm/i915/gt/mock_engine.c
> > @@ -235,6 +235,35 @@ static void mock_submit_request(struct i915_request *request)
> >   	spin_unlock_irqrestore(&engine->hw_lock, flags);
> >   }
> > +static void mock_add_to_engine(struct i915_request *rq)
> > +{
> > +	lockdep_assert_held(&rq->engine->sched_engine->lock);
> > +	list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
> > +}
> > +
> > +static void mock_remove_from_engine(struct i915_request *rq)
> > +{
> > +	struct intel_engine_cs *engine, *locked;
> > +
> > +	/*
> > +	 * Virtual engines complicate acquiring the engine timeline lock,
> > +	 * as their rq->engine pointer is not stable until under that
> > +	 * engine lock. The simple ploy we use is to take the lock then
> > +	 * check that the rq still belongs to the newly locked engine.
> > +	 */
> > +
> > +	locked = READ_ONCE(rq->engine);
> > +	spin_lock_irq(&locked->sched_engine->lock);
> > +	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
> > +		spin_unlock(&locked->sched_engine->lock);
> > +		spin_lock(&engine->sched_engine->lock);
> > +		locked = engine;
> > +	}
> > +	list_del_init(&rq->sched.link);
> > +	spin_unlock_irq(&locked->sched_engine->lock);
> > +}
> > +
> > +
> >   static void mock_reset_prepare(struct intel_engine_cs *engine)
> >   {
> >   }
> > @@ -327,6 +356,8 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
> >   	engine->base.emit_flush = mock_emit_flush;
> >   	engine->base.emit_fini_breadcrumb = mock_emit_breadcrumb;
> >   	engine->base.submit_request = mock_submit_request;
> > +	engine->base.add_active_request = mock_add_to_engine;
> > +	engine->base.remove_active_request = mock_remove_from_engine;
> >   	engine->base.reset.prepare = mock_reset_prepare;
> >   	engine->base.reset.rewind = mock_reset_rewind;
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> > index 235c1997f32d..864b14e313a3 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> > @@ -146,6 +146,9 @@ static void gen11_disable_guc_interrupts(struct intel_guc *guc)
> >   {
> >   	struct intel_gt *gt = guc_to_gt(guc);
> > +	if (!guc->interrupts.enabled)
> > +		return;
> > +
> >   	spin_lock_irq(&gt->irq_lock);
> >   	guc->interrupts.enabled = false;
> > @@ -579,19 +582,6 @@ int intel_guc_suspend(struct intel_guc *guc)
> >   	return 0;
> >   }
> > -/**
> > - * intel_guc_reset_engine() - ask GuC to reset an engine
> > - * @guc:	intel_guc structure
> > - * @engine:	engine to be reset
> > - */
> > -int intel_guc_reset_engine(struct intel_guc *guc,
> > -			   struct intel_engine_cs *engine)
> > -{
> > -	/* XXX: to be implemented with submission interface rework */
> > -
> > -	return -ENODEV;
> > -}
> > -
> >   /**
> >    * intel_guc_resume() - notify GuC resuming from suspend state
> >    * @guc:	the guc
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > index 47eaa69809e8..afea04d56494 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > @@ -243,14 +243,16 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask)
> >   int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout);
> > -int intel_guc_reset_engine(struct intel_guc *guc,
> > -			   struct intel_engine_cs *engine);
> > -
> >   int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
> >   					  const u32 *msg, u32 len);
> >   int intel_guc_sched_done_process_msg(struct intel_guc *guc,
> >   				     const u32 *msg, u32 len);
> > +void intel_guc_submission_reset_prepare(struct intel_guc *guc);
> > +void intel_guc_submission_reset(struct intel_guc *guc, bool stalled);
> > +void intel_guc_submission_reset_finish(struct intel_guc *guc);
> > +void intel_guc_submission_cancel_requests(struct intel_guc *guc);
> > +
> >   void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p);
> >   #endif
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index 80b89171b35a..8c093bc2d3a4 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -140,7 +140,7 @@ context_wait_for_deregister_to_register(struct intel_context *ce)
> >   static inline void
> >   set_context_wait_for_deregister_to_register(struct intel_context *ce)
> >   {
> > -	/* Only should be called from guc_lrc_desc_pin() */
> > +	/* Only should be called from guc_lrc_desc_pin() without lock */
> >   	ce->guc_state.sched_state |=
> >   		SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER;
> >   }
> > @@ -240,15 +240,31 @@ static int guc_lrc_desc_pool_create(struct intel_guc *guc)
> >   static void guc_lrc_desc_pool_destroy(struct intel_guc *guc)
> >   {
> > +	guc->lrc_desc_pool_vaddr = NULL;
> >   	i915_vma_unpin_and_release(&guc->lrc_desc_pool, I915_VMA_RELEASE_MAP);
> >   }
> > +static inline bool guc_submission_initialized(struct intel_guc *guc)
> > +{
> > +	return guc->lrc_desc_pool_vaddr != NULL;
> > +}
> > +
> >   static inline void reset_lrc_desc(struct intel_guc *guc, u32 id)
> >   {
> > -	struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
> > +	if (likely(guc_submission_initialized(guc))) {
> > +		struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
> > +		unsigned long flags;
> > -	memset(desc, 0, sizeof(*desc));
> > -	xa_erase_irq(&guc->context_lookup, id);
> > +		memset(desc, 0, sizeof(*desc));
> > +
> > +		/*
> > +		 * xarray API doesn't have xa_erase_irqsave wrapper, so calling
> > +		 * the lower level functions directly.
> > +		 */
> > +		xa_lock_irqsave(&guc->context_lookup, flags);
> > +		__xa_erase(&guc->context_lookup, id);
> > +		xa_unlock_irqrestore(&guc->context_lookup, flags);
> > +	}
> >   }
> >   static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
> > @@ -259,7 +275,15 @@ static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
> >   static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
> >   					   struct intel_context *ce)
> >   {
> > -	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
> > +	unsigned long flags;
> > +
> > +	/*
> > +	 * xarray API doesn't have xa_save_irqsave wrapper, so calling the
> > +	 * lower level functions directly.
> > +	 */
> > +	xa_lock_irqsave(&guc->context_lookup, flags);
> > +	__xa_store(&guc->context_lookup, id, ce, GFP_ATOMIC);
> > +	xa_unlock_irqrestore(&guc->context_lookup, flags);
> >   }
> >   static int guc_submission_busy_loop(struct intel_guc* guc,
> > @@ -330,6 +354,8 @@ int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout)
> >   					interruptible, timeout);
> >   }
> > +static int guc_lrc_desc_pin(struct intel_context *ce, bool loop);
> > +
> >   static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
> >   {
> >   	int err;
> > @@ -337,11 +363,22 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
> >   	u32 action[3];
> >   	int len = 0;
> >   	u32 g2h_len_dw = 0;
> > -	bool enabled = context_enabled(ce);
> > +	bool enabled;
> >   	GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
> >   	GEM_BUG_ON(context_guc_id_invalid(ce));
> > +	/*
> > +	 * Corner case where the GuC firmware was blown away and reloaded while
> > +	 * this context was pinned.
> > +	 */
> > +	if (unlikely(!lrc_desc_registered(guc, ce->guc_id))) {
> > +		err = guc_lrc_desc_pin(ce, false);
> > +		if (unlikely(err))
> > +			goto out;
> > +	}
> > +	enabled = context_enabled(ce);
> > +
> >   	if (!enabled) {
> >   		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
> >   		action[len++] = ce->guc_id;
> > @@ -364,6 +401,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
> >   		intel_context_put(ce);
> >   	}
> > +out:
> >   	return err;
> >   }
> > @@ -418,15 +456,10 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
> >   	if (submit) {
> >   		guc_set_lrc_tail(last);
> >   resubmit:
> > -		/*
> > -		 * We only check for -EBUSY here even though it is possible for
> > -		 * -EDEADLK to be returned. If -EDEADLK is returned, the GuC has
> > -		 * died and a full GPU needs to be done. The hangcheck will
> > -		 * eventually detect that the GuC has died and trigger this
> > -		 * reset so no need to handle -EDEADLK here.
> > -		 */
> >   		ret = guc_add_request(guc, last);
> > -		if (ret == -EBUSY) {
> > +		if (unlikely(ret == -EDEADLK))
> > +			goto deadlk;
> > +		else if (ret == -EBUSY) {
> >   			i915_sched_engine_kick(sched_engine);
> >   			guc->stalled_request = last;
> >   			return false;
> > @@ -436,6 +469,11 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
> >   	guc->stalled_request = NULL;
> >   	return submit;
> > +
> > +deadlk:
> > +	sched_engine->tasklet.callback = NULL;
> > +	tasklet_disable_nosync(&sched_engine->tasklet);
> > +	return false;
> >   }
> >   static void guc_submission_tasklet(struct tasklet_struct *t)
> > @@ -462,29 +500,165 @@ static void cs_irq_handler(struct intel_engine_cs *engine, u16 iir)
> >   		intel_engine_signal_breadcrumbs(engine);
> >   }
> > -static void guc_reset_prepare(struct intel_engine_cs *engine)
> > +static void __guc_context_destroy(struct intel_context *ce);
> > +static void release_guc_id(struct intel_guc *guc, struct intel_context *ce);
> > +static void guc_signal_context_fence(struct intel_context *ce);
> > +
> > +static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
> >   {
> > -	struct i915_sched_engine * const sched_engine = engine->sched_engine;
> > +	struct intel_context *ce;
> > +	unsigned long index, flags;
> > +	bool pending_disable, pending_enable, deregister, destroyed;
> > -	ENGINE_TRACE(engine, "\n");
> > +	xa_for_each(&guc->context_lookup, index, ce) {
> > +		/* Flush context */
> > +		spin_lock_irqsave(&ce->guc_state.lock, flags);
> > +		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> 
> Very unusual pattern - what does it do?
> 

The comment below tries to explain this. Basically, cycling the lock
guarantees that submission_disabled() is visible to all callers that touch
the below flags.

> > +
> > +		/*
> > +		 * Once we are at this point submission_disabled() is guaranteed
> > +		 * to visible to all callers who set the below flags (see above
> > +		 * flush and flushes in reset_prepare). If submission_disabled()
> > +		 * is set, the caller shouldn't set these flags.
> > +		 */
> > +
> > +		destroyed = context_destroyed(ce);
> > +		pending_enable = context_pending_enable(ce);
> > +		pending_disable = context_pending_disable(ce);
> > +		deregister = context_wait_for_deregister_to_register(ce);
> > +		init_sched_state(ce);
> > +
> > +		if (pending_enable || destroyed || deregister) {
> > +			atomic_dec(&guc->outstanding_submission_g2h);
> > +			if (deregister)
> > +				guc_signal_context_fence(ce);
> > +			if (destroyed) {
> > +				release_guc_id(guc, ce);
> > +				__guc_context_destroy(ce);
> > +			}
> > +			if (pending_enable|| deregister)
> > +				intel_context_put(ce);
> > +		}
> > +
> > +		/* Not mutualy exclusive with above if statement. */
> > +		if (pending_disable) {
> > +			guc_signal_context_fence(ce);
> > +			intel_context_sched_disable_unpin(ce);
> > +			atomic_dec(&guc->outstanding_submission_g2h);
> > +			intel_context_put(ce);
> > +		}
> 
> Yeah this function is a taste of the state machine I think is _extremely_
> hard to review and know with any confidence it does the right thing.
>

What is the other option? Block every time we issue an asynchronous
command to the GuC? If we want to do everything asynchronously we have to
track state and take further actions when the GuC finally responds. We
also have to deal with the GuC dying and taking those actions on our
own.

Luckily we do have several others aside from myself who do understand
this quite well.

> > +	}
> > +}
> > +
> > +static inline bool
> > +submission_disabled(struct intel_guc *guc)
> > +{
> > +	struct i915_sched_engine * const sched_engine = guc->sched_engine;
> > +
> > +	return unlikely(!__tasklet_is_enabled(&sched_engine->tasklet));
> > +}
> > +
> > +static void disable_submission(struct intel_guc *guc)
> > +{
> > +	struct i915_sched_engine * const sched_engine = guc->sched_engine;
> > +
> > +	if (__tasklet_is_enabled(&sched_engine->tasklet)) {
> > +		GEM_BUG_ON(!guc->ct.enabled);
> > +		__tasklet_disable_sync_once(&sched_engine->tasklet);
> > +		sched_engine->tasklet.callback = NULL;
> > +	}
> > +}
> > +
> > +static void enable_submission(struct intel_guc *guc)
> > +{
> > +	struct i915_sched_engine * const sched_engine = guc->sched_engine;
> > +	unsigned long flags;
> > +
> > +	spin_lock_irqsave(&guc->sched_engine->lock, flags);
> > +	sched_engine->tasklet.callback = guc_submission_tasklet;
> > +	wmb();
> 
> All memory barriers must be documented.
>

Found that out from checkpatch the other day, will fix.

> > +	if (!__tasklet_is_enabled(&sched_engine->tasklet) &&
> > +	    __tasklet_enable(&sched_engine->tasklet)) {
> > +		GEM_BUG_ON(!guc->ct.enabled);
> > +
> > +		/* And kick in case we missed a new request submission. */
> > +		i915_sched_engine_hi_kick(sched_engine);
> > +	}
> > +	spin_unlock_irqrestore(&guc->sched_engine->lock, flags);
> > +}
> > +
> > +static void guc_flush_submissions(struct intel_guc *guc)
> > +{
> > +	struct i915_sched_engine * const sched_engine = guc->sched_engine;
> > +	unsigned long flags;
> > +
> > +	spin_lock_irqsave(&sched_engine->lock, flags);
> > +	spin_unlock_irqrestore(&sched_engine->lock, flags);
> 
> Oh right, more of this. No idea.
>

Same as above. If you change some state and then cycle a lock, it is
guaranteed that the state is visible the next time someone grabs the lock. I do
explain these races in the documentation patch near the end of the
series. Without a BKL I don't see how else to avoid these reset races.
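
The generic shape of it, as a sketch only (not i915 code, do_work() is
just a placeholder):

	static bool stop;			/* the state being published */
	static DEFINE_SPINLOCK(lock);

	static void writer(void)
	{
		WRITE_ONCE(stop, true);

		/*
		 * Cycling the lock means every reader already inside the
		 * critical section has finished (we couldn't take the lock
		 * otherwise), and every reader that takes the lock after us
		 * is ordered after the store, so it sees stop == true.
		 */
		spin_lock(&lock);
		spin_unlock(&lock);
	}

	static void reader(void)
	{
		spin_lock(&lock);
		if (!READ_ONCE(stop))
			do_work();
		spin_unlock(&lock);
	}

guc_flush_submissions() and the guc_state.lock flush in
scrub_guc_desc_for_outstanding_g2h() are both the writer side of that
pattern.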
 
> > +}
> > +
> > +void intel_guc_submission_reset_prepare(struct intel_guc *guc)
> > +{
> > +	int i;
> > +
> > +	if (unlikely(!guc_submission_initialized(guc)))
> > +		/* Reset called during driver load? GuC not yet initialised! */
> > +		return;
> > +
> > +	disable_submission(guc);
> > +	guc->interrupts.disable(guc);
> > +
> > +	/* Flush IRQ handler */
> > +	spin_lock_irq(&guc_to_gt(guc)->irq_lock);
> > +	spin_unlock_irq(&guc_to_gt(guc)->irq_lock);
> > +
> > +	guc_flush_submissions(guc);
> >   	/*
> > -	 * Prevent request submission to the hardware until we have
> > -	 * completed the reset in i915_gem_reset_finish(). If a request
> > -	 * is completed by one engine, it may then queue a request
> > -	 * to a second via its sched_engine->tasklet *just* as we are
> > -	 * calling engine->init_hw() and also writing the ELSP.
> > -	 * Turning off the sched_engine->tasklet until the reset is over
> > -	 * prevents the race.
> > +	 * Handle any outstanding G2Hs before reset. Call IRQ handler directly
> > +	 * each pass as interrupt have been disabled. We always scrub for
> > +	 * outstanding G2H as it is possible for outstanding_submission_g2h to
> > +	 * be incremented after the context state update.
> >   	 */
> > -	__tasklet_disable_sync_once(&sched_engine->tasklet);
> > +	for (i = 0; i < 4 && atomic_read(&guc->outstanding_submission_g2h); ++i) {
> 
> Why is four the magic number and what happens if it is not enough?
>

I just picked a number. Regardless of whether the normal G2H path processes all
the G2H, we scrub all the context state for any lost ones.
 
> > +		intel_guc_to_host_event_handler(guc);
> > +#define wait_for_reset(guc, wait_var) \
> > +		guc_wait_for_pending_msg(guc, wait_var, false, (HZ / 20))
> > +		do {
> > +			wait_for_reset(guc, &guc->outstanding_submission_g2h);
> > +		} while (!list_empty(&guc->ct.requests.incoming));
> > +	}
> > +	scrub_guc_desc_for_outstanding_g2h(guc);
> > +}
> > +
> > +static struct intel_engine_cs *
> > +guc_virtual_get_sibling(struct intel_engine_cs *ve, unsigned int sibling)
> > +{
> > +	struct intel_engine_cs *engine;
> > +	intel_engine_mask_t tmp, mask = ve->mask;
> > +	unsigned int num_siblings = 0;
> > +
> > +	for_each_engine_masked(engine, ve->gt, mask, tmp)
> > +		if (num_siblings++ == sibling)
> > +			return engine;
> 
> Not sure how often is this used overall and whether just storing the array
> in ve could be justified.
>

It really is only used with sibling == 0, so it should be fast.
 
> > +
> > +	return NULL;
> > +}
> > +
> > +static inline struct intel_engine_cs *
> > +__context_to_physical_engine(struct intel_context *ce)
> > +{
> > +	struct intel_engine_cs *engine = ce->engine;
> > +
> > +	if (intel_engine_is_virtual(engine))
> > +		engine = guc_virtual_get_sibling(engine, 0);
> > +
> > +	return engine;
> >   }
> > -static void guc_reset_state(struct intel_context *ce,
> > -			    struct intel_engine_cs *engine,
> > -			    u32 head,
> > -			    bool scrub)
> > +static void guc_reset_state(struct intel_context *ce, u32 head, bool scrub)
> >   {
> > +	struct intel_engine_cs *engine = __context_to_physical_engine(ce);
> > +
> >   	GEM_BUG_ON(!intel_context_is_pinned(ce));
> >   	/*
> > @@ -502,42 +676,147 @@ static void guc_reset_state(struct intel_context *ce,
> >   	lrc_update_regs(ce, engine, head);
> >   }
> > -static void guc_reset_rewind(struct intel_engine_cs *engine, bool stalled)
> > +static void guc_reset_nop(struct intel_engine_cs *engine)
> >   {
> > -	struct intel_engine_execlists * const execlists = &engine->execlists;
> > -	struct i915_request *rq;
> > +}
> > +
> > +static void guc_rewind_nop(struct intel_engine_cs *engine, bool stalled)
> > +{
> > +}
> > +
> > +static void
> > +__unwind_incomplete_requests(struct intel_context *ce)
> > +{
> > +	struct i915_request *rq, *rn;
> > +	struct list_head *pl;
> > +	int prio = I915_PRIORITY_INVALID;
> > +	struct i915_sched_engine * const sched_engine =
> > +		ce->engine->sched_engine;
> >   	unsigned long flags;
> > -	spin_lock_irqsave(&engine->sched_engine->lock, flags);
> > +	spin_lock_irqsave(&sched_engine->lock, flags);
> > +	spin_lock(&ce->guc_active.lock);
> > +	list_for_each_entry_safe(rq, rn,
> > +				 &ce->guc_active.requests,
> > +				 sched.link) {
> > +		if (i915_request_completed(rq))
> > +			continue;
> > +
> > +		list_del_init(&rq->sched.link);
> > +		spin_unlock(&ce->guc_active.lock);
> 
> Drops the lock and continues iterating the same list is safe? Comment needed
> I think and I do remember I worried about this, or similar instances, in GuC
> code before.
>

We only need the active lock for the ce->guc_active.requests list. It is
indeed safe to drop the lock.
 
> > +
> > +		__i915_request_unsubmit(rq);
> > +
> > +		/* Push the request back into the queue for later resubmission. */
> > +		GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID);
> > +		if (rq_prio(rq) != prio) {
> > +			prio = rq_prio(rq);
> > +			pl = i915_sched_lookup_priolist(sched_engine, prio);
> > +		}
> > +		GEM_BUG_ON(i915_sched_engine_is_empty(sched_engine));
> > -	/* Push back any incomplete requests for replay after the reset. */
> > -	rq = execlists_unwind_incomplete_requests(execlists);
> > -	if (!rq)
> > -		goto out_unlock;
> > +		list_add_tail(&rq->sched.link, pl);
> > +		set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> > +
> > +		spin_lock(&ce->guc_active.lock);
> > +	}
> > +	spin_unlock(&ce->guc_active.lock);
> > +	spin_unlock_irqrestore(&sched_engine->lock, flags);
> > +}
> > +
> > +static struct i915_request *context_find_active_request(struct intel_context *ce)
> > +{
> > +	struct i915_request *rq, *active = NULL;
> > +	unsigned long flags;
> > +
> > +	spin_lock_irqsave(&ce->guc_active.lock, flags);
> > +	list_for_each_entry_reverse(rq, &ce->guc_active.requests,
> > +				    sched.link) {
> > +		if (i915_request_completed(rq))
> > +			break;
> > +
> > +		active = rq;
> > +	}
> > +	spin_unlock_irqrestore(&ce->guc_active.lock, flags);
> > +
> > +	return active;
> > +}
> > +
> > +static void __guc_reset_context(struct intel_context *ce, bool stalled)
> > +{
> > +	struct i915_request *rq;
> > +	u32 head;
> > +
> > +	/*
> > +	 * GuC will implicitly mark the context as non-schedulable
> > +	 * when it sends the reset notification. Make sure our state
> > +	 * reflects this change. The context will be marked enabled
> > +	 * on resubmission.
> > +	 */
> > +	clr_context_enabled(ce);
> > +
> > +	rq = context_find_active_request(ce);
> > +	if (!rq) {
> > +		head = ce->ring->tail;
> > +		stalled = false;
> > +		goto out_replay;
> > +	}
> >   	if (!i915_request_started(rq))
> >   		stalled = false;
> > +	GEM_BUG_ON(i915_active_is_idle(&ce->active));
> > +	head = intel_ring_wrap(ce->ring, rq->head);
> >   	__i915_request_reset(rq, stalled);
> > -	guc_reset_state(rq->context, engine, rq->head, stalled);
> > -out_unlock:
> > -	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
> > +out_replay:
> > +	guc_reset_state(ce, head, stalled);
> > +	__unwind_incomplete_requests(ce);
> > +}
> > +
> > +void intel_guc_submission_reset(struct intel_guc *guc, bool stalled)
> > +{
> > +	struct intel_context *ce;
> > +	unsigned long index;
> > +
> > +	if (unlikely(!guc_submission_initialized(guc)))
> > +		/* Reset called during driver load? GuC not yet initialised! */
> > +		return;
> > +
> > +	xa_for_each(&guc->context_lookup, index, ce)
> > +		if (intel_context_is_pinned(ce))
> > +			__guc_reset_context(ce, stalled);
> > +
> > +	/* GuC is blown away, drop all references to contexts */
> > +	xa_destroy(&guc->context_lookup);
> > +}
> > +
> > +static void guc_cancel_context_requests(struct intel_context *ce)
> > +{
> > +	struct i915_sched_engine *sched_engine = ce_to_guc(ce)->sched_engine;
> > +	struct i915_request *rq;
> > +	unsigned long flags;
> > +
> > +	/* Mark all executing requests as skipped. */
> > +	spin_lock_irqsave(&sched_engine->lock, flags);
> > +	spin_lock(&ce->guc_active.lock);
> > +	list_for_each_entry(rq, &ce->guc_active.requests, sched.link)
> > +		i915_request_put(i915_request_mark_eio(rq));
> > +	spin_unlock(&ce->guc_active.lock);
> > +	spin_unlock_irqrestore(&sched_engine->lock, flags);
> 
> I suppose somewhere it will need to be documented what are the two locks
> protecting and why both are needed at some places.
> 

Yep, I have a locking section in the doc patch near the end of the series.
Basically, here we don't want any new submissions processed while we
are canceling requests - that is the outer lock. The inner lock again
protects the ce->guc_active.requests list.

BTW - I think I am overly careful with the locks (when in doubt grab a
lock) in the reset / cancel code, as there is no expectation that this
needs to perform well and resets are by far the raciest code in the i915.

> >   }
> > -static void guc_reset_cancel(struct intel_engine_cs *engine)
> > +static void
> > +guc_cancel_sched_engine_requests(struct i915_sched_engine *sched_engine)
> >   {
> > -	struct i915_sched_engine * const sched_engine = engine->sched_engine;
> >   	struct i915_request *rq, *rn;
> >   	struct rb_node *rb;
> >   	unsigned long flags;
> >   	/* Can be called during boot if GuC fails to load */
> > -	if (!engine->gt)
> > +	if (!sched_engine)
> >   		return;
> > -	ENGINE_TRACE(engine, "\n");
> > -
> >   	/*
> >   	 * Before we call engine->cancel_requests(), we should have exclusive
> >   	 * access to the submission state. This is arranged for us by the
> > @@ -552,13 +831,7 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
> >   	 * submission's irq state, we also wish to remind ourselves that
> >   	 * it is irq state.)
> >   	 */
> > -	spin_lock_irqsave(&engine->sched_engine->lock, flags);
> > -
> > -	/* Mark all executing requests as skipped. */
> > -	list_for_each_entry(rq, &engine->sched_engine->requests, sched.link) {
> > -		i915_request_set_error_once(rq, -EIO);
> > -		i915_request_mark_complete(rq);
> > -	}
> > +	spin_lock_irqsave(&sched_engine->lock, flags);
> >   	/* Flush the queued requests to the timeline list (for retiring). */
> >   	while ((rb = rb_first_cached(&sched_engine->queue))) {
> > @@ -566,9 +839,10 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
> >   		priolist_for_each_request_consume(rq, rn, p) {
> >   			list_del_init(&rq->sched.link);
> > +
> >   			__i915_request_submit(rq);
> > -			dma_fence_set_error(&rq->fence, -EIO);
> > -			i915_request_mark_complete(rq);
> > +
> > +			i915_request_put(i915_request_mark_eio(rq));
> >   		}
> >   		rb_erase_cached(&p->node, &sched_engine->queue);
> > @@ -580,19 +854,41 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
> >   	sched_engine->queue_priority_hint = INT_MIN;
> >   	sched_engine->queue = RB_ROOT_CACHED;
> > -	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
> > +	spin_unlock_irqrestore(&sched_engine->lock, flags);
> >   }
> > -static void guc_reset_finish(struct intel_engine_cs *engine)
> > +void intel_guc_submission_cancel_requests(struct intel_guc *guc)
> >   {
> > -	struct i915_sched_engine * const sched_engine = engine->sched_engine;
> > +	struct intel_context *ce;
> > +	unsigned long index;
> > -	if (__tasklet_enable(&sched_engine->tasklet))
> > -		/* And kick in case we missed a new request submission. */
> > -		i915_sched_engine_hi_kick(sched_engine);
> > +	xa_for_each(&guc->context_lookup, index, ce)
> > +		if (intel_context_is_pinned(ce))
> > +			guc_cancel_context_requests(ce);
> > +
> > +	guc_cancel_sched_engine_requests(guc->sched_engine);
> > +
> > +	/* GuC is blown away, drop all references to contexts */
> > +	xa_destroy(&guc->context_lookup);
> > +}
> > +
> > +void intel_guc_submission_reset_finish(struct intel_guc *guc)
> > +{
> > +	/* Reset called during driver load or during wedge? */
> > +	if (unlikely(!guc_submission_initialized(guc) ||
> > +		     test_bit(I915_WEDGED, &guc_to_gt(guc)->reset.flags)))
> > +		return;
> > -	ENGINE_TRACE(engine, "depth->%d\n",
> > -		     atomic_read(&sched_engine->tasklet.count));
> > +	/*
> > +	 * Technically possible for either of these values to be non-zero here,
> > +	 * but very unlikely + harmless. Regardless let's add a warn so we can
> > +	 * see in CI if this happens frequently / a precursor to taking down the
> > +	 * machine.
> 
> And what did CI say over time this was in?
> 

It hasn't popped yet. This is more for upcoming code where we have
G2Hs we can't scrub (e.g. a TLB invalidation, engine class scheduling
disable, etc...).

> It needs to be explained when it can be non zero and whether or not it can
> go to non zero just after the atomic_set below. Or if not why not.
>

At this point we could probably turn this into a BUG_ON.
 
> > +	 */
> > +	GEM_WARN_ON(atomic_read(&guc->outstanding_submission_g2h));
> > +	atomic_set(&guc->outstanding_submission_g2h, 0);
> > +
> > +	enable_submission(guc);
> >   }
> >   /*
> > @@ -659,6 +955,9 @@ static int guc_bypass_tasklet_submit(struct intel_guc *guc,
> >   	else
> >   		trace_i915_request_guc_submit(rq);
> > +	if (unlikely(ret == -EDEADLK))
> > +		disable_submission(guc);
> > +
> >   	return ret;
> >   }
> > @@ -671,7 +970,8 @@ static void guc_submit_request(struct i915_request *rq)
> >   	/* Will be called from irq-context when using foreign fences. */
> >   	spin_lock_irqsave(&sched_engine->lock, flags);
> > -	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
> > +	if (submission_disabled(guc) || guc->stalled_request ||
> > +	    !i915_sched_engine_is_empty(sched_engine))
> >   		queue_request(sched_engine, rq, rq_prio(rq));
> >   	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
> >   		i915_sched_engine_hi_kick(sched_engine);
> > @@ -808,7 +1108,8 @@ static void unpin_guc_id(struct intel_guc *guc, struct intel_context *ce)
> >   static int __guc_action_register_context(struct intel_guc *guc,
> >   					 u32 guc_id,
> > -					 u32 offset)
> > +					 u32 offset,
> > +					 bool loop)
> >   {
> >   	u32 action[] = {
> >   		INTEL_GUC_ACTION_REGISTER_CONTEXT,
> > @@ -816,10 +1117,10 @@ static int __guc_action_register_context(struct intel_guc *guc,
> >   		offset,
> >   	};
> > -	return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
> > +	return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action), 0, loop);
> >   }
> > -static int register_context(struct intel_context *ce)
> > +static int register_context(struct intel_context *ce, bool loop)
> >   {
> >   	struct intel_guc *guc = ce_to_guc(ce);
> >   	u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
> > @@ -827,11 +1128,12 @@ static int register_context(struct intel_context *ce)
> >   	trace_intel_context_register(ce);
> > -	return __guc_action_register_context(guc, ce->guc_id, offset);
> > +	return __guc_action_register_context(guc, ce->guc_id, offset, loop);
> >   }
> >   static int __guc_action_deregister_context(struct intel_guc *guc,
> > -					   u32 guc_id)
> > +					   u32 guc_id,
> > +					   bool loop)
> >   {
> >   	u32 action[] = {
> >   		INTEL_GUC_ACTION_DEREGISTER_CONTEXT,
> > @@ -839,16 +1141,16 @@ static int __guc_action_deregister_context(struct intel_guc *guc,
> >   	};
> >   	return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),
> > -					G2H_LEN_DW_DEREGISTER_CONTEXT, true);
> > +					G2H_LEN_DW_DEREGISTER_CONTEXT, loop);
> >   }
> > -static int deregister_context(struct intel_context *ce, u32 guc_id)
> > +static int deregister_context(struct intel_context *ce, u32 guc_id, bool loop)
> >   {
> >   	struct intel_guc *guc = ce_to_guc(ce);
> >   	trace_intel_context_deregister(ce);
> > -	return __guc_action_deregister_context(guc, guc_id);
> > +	return __guc_action_deregister_context(guc, guc_id, loop);
> >   }
> >   static intel_engine_mask_t adjust_engine_mask(u8 class, intel_engine_mask_t mask)
> > @@ -877,7 +1179,7 @@ static void guc_context_policy_init(struct intel_engine_cs *engine,
> >   	desc->preemption_timeout = CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US;
> >   }
> > -static int guc_lrc_desc_pin(struct intel_context *ce)
> > +static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
> >   {
> >   	struct intel_runtime_pm *runtime_pm =
> >   		&ce->engine->gt->i915->runtime_pm;
> > @@ -923,18 +1225,44 @@ static int guc_lrc_desc_pin(struct intel_context *ce)
> >   	 */
> >   	if (context_registered) {
> >   		trace_intel_context_steal_guc_id(ce);
> > -		set_context_wait_for_deregister_to_register(ce);
> > -		intel_context_get(ce);
> > +		if (!loop) {
> > +			set_context_wait_for_deregister_to_register(ce);
> > +			intel_context_get(ce);
> > +		} else {
> > +			bool disabled;
> > +			unsigned long flags;
> > +
> > +			/* Seal race with Reset */
> 
> Needs to be more descriptive.
> 

Again, I have a comment about this in the doc patch. Basically this goes back to
your other questions about cycling a lock. You must check
submission_disabled() within a lock, otherwise there is a race between
resets and updating a context's state.

> > +			spin_lock_irqsave(&ce->guc_state.lock, flags);
> > +			disabled = submission_disabled(guc);
> > +			if (likely(!disabled)) {
> > +				set_context_wait_for_deregister_to_register(ce);
> > +				intel_context_get(ce);
> > +			}
> > +			spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> > +			if (unlikely(disabled)) {
> > +				reset_lrc_desc(guc, desc_idx);
> > +				return 0;	/* Will get registered later */
> > +			}
> > +		}
> >   		/*
> >   		 * If stealing the guc_id, this ce has the same guc_id as the
> >   		 * context whos guc_id was stole.
> >   		 */
> >   		with_intel_runtime_pm(runtime_pm, wakeref)
> > -			ret = deregister_context(ce, ce->guc_id);
> > +			ret = deregister_context(ce, ce->guc_id, loop);
> > +		if (unlikely(ret == -EBUSY)) {
> > +			clr_context_wait_for_deregister_to_register(ce);
> > +			intel_context_put(ce);
> > +		}
> >   	} else {
> >   		with_intel_runtime_pm(runtime_pm, wakeref)
> > -			ret = register_context(ce);
> > +			ret = register_context(ce, loop);
> > +		if (unlikely(ret == -EBUSY))
> > +			reset_lrc_desc(guc, desc_idx);
> > +		else if (unlikely(ret == -ENODEV))
> > +			ret = 0;	/* Will get registered later */
> >   	}
> >   	return ret;
> > @@ -997,7 +1325,6 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
> >   	GEM_BUG_ON(guc_id == GUC_INVALID_LRC_ID);
> >   	trace_intel_context_sched_disable(ce);
> > -	intel_context_get(ce);
> >   	guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),
> >   				 G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
> > @@ -1007,6 +1334,7 @@ static u16 prep_context_pending_disable(struct intel_context *ce)
> >   {
> >   	set_context_pending_disable(ce);
> >   	clr_context_enabled(ce);
> > +	intel_context_get(ce);
> >   	return ce->guc_id;
> >   }
> > @@ -1019,7 +1347,7 @@ static void guc_context_sched_disable(struct intel_context *ce)
> >   	u16 guc_id;
> >   	intel_wakeref_t wakeref;
> > -	if (context_guc_id_invalid(ce) ||
> > +	if (submission_disabled(guc) || context_guc_id_invalid(ce) ||
> >   	    !lrc_desc_registered(guc, ce->guc_id)) {
> >   		clr_context_enabled(ce);
> >   		goto unpin;
> > @@ -1053,19 +1381,13 @@ static void guc_context_sched_disable(struct intel_context *ce)
> >   static inline void guc_lrc_desc_unpin(struct intel_context *ce)
> >   {
> > -	struct intel_engine_cs *engine = ce->engine;
> > -	struct intel_guc *guc = &engine->gt->uc.guc;
> > -	unsigned long flags;
> > +	struct intel_guc *guc = ce_to_guc(ce);
> >   	GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id));
> >   	GEM_BUG_ON(ce != __get_context(guc, ce->guc_id));
> >   	GEM_BUG_ON(context_enabled(ce));
> > -	spin_lock_irqsave(&ce->guc_state.lock, flags);
> > -	set_context_destroyed(ce);
> > -	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> > -
> > -	deregister_context(ce, ce->guc_id);
> > +	deregister_context(ce, ce->guc_id, true);
> >   }
> >   static void __guc_context_destroy(struct intel_context *ce)
> > @@ -1093,13 +1415,15 @@ static void guc_context_destroy(struct kref *kref)
> >   	struct intel_guc *guc = &ce->engine->gt->uc.guc;
> >   	intel_wakeref_t wakeref;
> >   	unsigned long flags;
> > +	bool disabled;
> >   	/*
> >   	 * If the guc_id is invalid this context has been stolen and we can free
> >   	 * it immediately. Also can be freed immediately if the context is not
> >   	 * registered with the GuC.
> >   	 */
> > -	if (context_guc_id_invalid(ce) ||
> > +	if (submission_disabled(guc) ||
> > +	    context_guc_id_invalid(ce) ||
> >   	    !lrc_desc_registered(guc, ce->guc_id)) {
> >   		release_guc_id(guc, ce);
> >   		__guc_context_destroy(ce);
> > @@ -1126,6 +1450,18 @@ static void guc_context_destroy(struct kref *kref)
> >   		list_del_init(&ce->guc_id_link);
> >   	spin_unlock_irqrestore(&guc->contexts_lock, flags);
> > +	/* Seal race with Reset */
> > +	spin_lock_irqsave(&ce->guc_state.lock, flags);
> > +	disabled = submission_disabled(guc);
> > +	if (likely(!disabled))
> > +		set_context_destroyed(ce);
> > +	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> > +	if (unlikely(disabled)) {
> > +		release_guc_id(guc, ce);
> > +		__guc_context_destroy(ce);
> > +		return;
> 
> Same as above, needs a better comment. It is also hard for reader to know if
> snapshot of disabled taked under the lock is still valid after the lock has
> been released and why.
> 

Will pull the doc comment into this patch.

Matt

> Regards,
> 
> Tvrtko
> 
> > +	}
> > +
> >   	/*
> >   	 * We defer GuC context deregistration until the context is destroyed
> >   	 * in order to save on CTBs. With this optimization ideally we only need
> > @@ -1148,6 +1484,33 @@ static int guc_context_alloc(struct intel_context *ce)
> >   	return lrc_alloc(ce, ce->engine);
> >   }
> > +static void add_to_context(struct i915_request *rq)
> > +{
> > +	struct intel_context *ce = rq->context;
> > +
> > +	spin_lock(&ce->guc_active.lock);
> > +	list_move_tail(&rq->sched.link, &ce->guc_active.requests);
> > +	spin_unlock(&ce->guc_active.lock);
> > +}
> > +
> > +static void remove_from_context(struct i915_request *rq)
> > +{
> > +	struct intel_context *ce = rq->context;
> > +
> > +	spin_lock_irq(&ce->guc_active.lock);
> > +
> > +	list_del_init(&rq->sched.link);
> > +	clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> > +
> > +	/* Prevent further __await_execution() registering a cb, then flush */
> > +	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
> > +
> > +	spin_unlock_irq(&ce->guc_active.lock);
> > +
> > +	atomic_dec(&ce->guc_id_ref);
> > +	i915_request_notify_execute_cb_imm(rq);
> > +}
> > +
> >   static const struct intel_context_ops guc_context_ops = {
> >   	.alloc = guc_context_alloc,
> > @@ -1186,8 +1549,6 @@ static void guc_signal_context_fence(struct intel_context *ce)
> >   {
> >   	unsigned long flags;
> > -	GEM_BUG_ON(!context_wait_for_deregister_to_register(ce));
> > -
> >   	spin_lock_irqsave(&ce->guc_state.lock, flags);
> >   	clr_context_wait_for_deregister_to_register(ce);
> >   	__guc_signal_context_fence(ce);
> > @@ -1196,8 +1557,9 @@ static void guc_signal_context_fence(struct intel_context *ce)
> >   static bool context_needs_register(struct intel_context *ce, bool new_guc_id)
> >   {
> > -	return new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
> > -		!lrc_desc_registered(ce_to_guc(ce), ce->guc_id);
> > +	return (new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
> > +		!lrc_desc_registered(ce_to_guc(ce), ce->guc_id)) &&
> > +		!submission_disabled(ce_to_guc(ce));
> >   }
> >   static int guc_request_alloc(struct i915_request *rq)
> > @@ -1256,8 +1618,10 @@ static int guc_request_alloc(struct i915_request *rq)
> >   		return ret;
> >   	if (context_needs_register(ce, !!ret)) {
> > -		ret = guc_lrc_desc_pin(ce);
> > +		ret = guc_lrc_desc_pin(ce, true);
> >   		if (unlikely(ret)) {	/* unwind */
> > +			if (ret == -EDEADLK)
> > +				disable_submission(guc);
> >   			atomic_dec(&ce->guc_id_ref);
> >   			unpin_guc_id(guc, ce);
> >   			return ret;
> > @@ -1294,20 +1658,6 @@ static int guc_request_alloc(struct i915_request *rq)
> >   	return 0;
> >   }
> > -static struct intel_engine_cs *
> > -guc_virtual_get_sibling(struct intel_engine_cs *ve, unsigned int sibling)
> > -{
> > -	struct intel_engine_cs *engine;
> > -	intel_engine_mask_t tmp, mask = ve->mask;
> > -	unsigned int num_siblings = 0;
> > -
> > -	for_each_engine_masked(engine, ve->gt, mask, tmp)
> > -		if (num_siblings++ == sibling)
> > -			return engine;
> > -
> > -	return NULL;
> > -}
> > -
> >   static int guc_virtual_context_pre_pin(struct intel_context *ce,
> >   				       struct i915_gem_ww_ctx *ww,
> >   				       void **vaddr)
> > @@ -1516,7 +1866,7 @@ static inline void guc_kernel_context_pin(struct intel_guc *guc,
> >   {
> >   	if (context_guc_id_invalid(ce))
> >   		pin_guc_id(guc, ce);
> > -	guc_lrc_desc_pin(ce);
> > +	guc_lrc_desc_pin(ce, true);
> >   }
> >   static inline void guc_init_lrc_mapping(struct intel_guc *guc)
> > @@ -1582,13 +1932,15 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
> >   	engine->cops = &guc_context_ops;
> >   	engine->request_alloc = guc_request_alloc;
> >   	engine->bump_serial = guc_bump_serial;
> > +	engine->add_active_request = add_to_context;
> > +	engine->remove_active_request = remove_from_context;
> >   	engine->sched_engine->schedule = i915_schedule;
> > -	engine->reset.prepare = guc_reset_prepare;
> > -	engine->reset.rewind = guc_reset_rewind;
> > -	engine->reset.cancel = guc_reset_cancel;
> > -	engine->reset.finish = guc_reset_finish;
> > +	engine->reset.prepare = guc_reset_nop;
> > +	engine->reset.rewind = guc_rewind_nop;
> > +	engine->reset.cancel = guc_reset_nop;
> > +	engine->reset.finish = guc_reset_nop;
> >   	engine->emit_flush = gen8_emit_flush_xcs;
> >   	engine->emit_init_breadcrumb = gen8_emit_init_breadcrumb;
> > @@ -1764,7 +2116,7 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
> >   		 * register this context.
> >   		 */
> >   		with_intel_runtime_pm(runtime_pm, wakeref)
> > -			register_context(ce);
> > +			register_context(ce, true);
> >   		guc_signal_context_fence(ce);
> >   		intel_context_put(ce);
> >   	} else if (context_destroyed(ce)) {
> > @@ -1946,6 +2298,10 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
> >   				 "v%dx%d", ve->base.class, count);
> >   			ve->base.context_size = sibling->context_size;
> > +			ve->base.add_active_request =
> > +				sibling->add_active_request;
> > +			ve->base.remove_active_request =
> > +				sibling->remove_active_request;
> >   			ve->base.emit_bb_start = sibling->emit_bb_start;
> >   			ve->base.emit_flush = sibling->emit_flush;
> >   			ve->base.emit_init_breadcrumb =
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> > index ab0789d66e06..d5ccffbb89ae 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> > @@ -565,12 +565,44 @@ void intel_uc_reset_prepare(struct intel_uc *uc)
> >   {
> >   	struct intel_guc *guc = &uc->guc;
> > +	/* Firmware expected to be running when this function is called */
> >   	if (!intel_guc_is_ready(guc))
> > -		return;
> > +		goto sanitize;
> > +
> > +	if (intel_uc_uses_guc_submission(uc))
> > +		intel_guc_submission_reset_prepare(guc);
> > +sanitize:
> >   	__uc_sanitize(uc);
> >   }
> > +void intel_uc_reset(struct intel_uc *uc, bool stalled)
> > +{
> > +	struct intel_guc *guc = &uc->guc;
> > +
> > +	/* Firmware can not be running when this function is called  */
> > +	if (intel_uc_uses_guc_submission(uc))
> > +		intel_guc_submission_reset(guc, stalled);
> > +}
> > +
> > +void intel_uc_reset_finish(struct intel_uc *uc)
> > +{
> > +	struct intel_guc *guc = &uc->guc;
> > +
> > +	/* Firmware expected to be running when this function is called */
> > +	if (intel_guc_is_fw_running(guc) && intel_uc_uses_guc_submission(uc))
> > +		intel_guc_submission_reset_finish(guc);
> > +}
> > +
> > +void intel_uc_cancel_requests(struct intel_uc *uc)
> > +{
> > +	struct intel_guc *guc = &uc->guc;
> > +
> > +	/* Firmware can not be running when this function is called  */
> > +	if (intel_uc_uses_guc_submission(uc))
> > +		intel_guc_submission_cancel_requests(guc);
> > +}
> > +
> >   void intel_uc_runtime_suspend(struct intel_uc *uc)
> >   {
> >   	struct intel_guc *guc = &uc->guc;
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> > index c4cef885e984..eaa3202192ac 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> > @@ -37,6 +37,9 @@ void intel_uc_driver_late_release(struct intel_uc *uc);
> >   void intel_uc_driver_remove(struct intel_uc *uc);
> >   void intel_uc_init_mmio(struct intel_uc *uc);
> >   void intel_uc_reset_prepare(struct intel_uc *uc);
> > +void intel_uc_reset(struct intel_uc *uc, bool stalled);
> > +void intel_uc_reset_finish(struct intel_uc *uc);
> > +void intel_uc_cancel_requests(struct intel_uc *uc);
> >   void intel_uc_suspend(struct intel_uc *uc);
> >   void intel_uc_runtime_suspend(struct intel_uc *uc);
> >   int intel_uc_resume(struct intel_uc *uc);
> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > index 0b96b824ea06..4855cf7ebe21 100644
> > --- a/drivers/gpu/drm/i915/i915_request.c
> > +++ b/drivers/gpu/drm/i915/i915_request.c
> > @@ -194,7 +194,7 @@ static bool irq_work_imm(struct irq_work *wrk)
> >   	return false;
> >   }
> > -static void __notify_execute_cb_imm(struct i915_request *rq)
> > +void i915_request_notify_execute_cb_imm(struct i915_request *rq)
> >   {
> >   	__notify_execute_cb(rq, irq_work_imm);
> >   }
> > @@ -268,37 +268,6 @@ i915_request_active_engine(struct i915_request *rq,
> >   	return ret;
> >   }
> > -
> > -static void remove_from_engine(struct i915_request *rq)
> > -{
> > -	struct intel_engine_cs *engine, *locked;
> > -
> > -	/*
> > -	 * Virtual engines complicate acquiring the engine timeline lock,
> > -	 * as their rq->engine pointer is not stable until under that
> > -	 * engine lock. The simple ploy we use is to take the lock then
> > -	 * check that the rq still belongs to the newly locked engine.
> > -	 */
> > -	locked = READ_ONCE(rq->engine);
> > -	spin_lock_irq(&locked->sched_engine->lock);
> > -	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
> > -		spin_unlock(&locked->sched_engine->lock);
> > -		spin_lock(&engine->sched_engine->lock);
> > -		locked = engine;
> > -	}
> > -	list_del_init(&rq->sched.link);
> > -
> > -	clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> > -	clear_bit(I915_FENCE_FLAG_HOLD, &rq->fence.flags);
> > -
> > -	/* Prevent further __await_execution() registering a cb, then flush */
> > -	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
> > -
> > -	spin_unlock_irq(&locked->sched_engine->lock);
> > -
> > -	__notify_execute_cb_imm(rq);
> > -}
> > -
> >   static void __rq_init_watchdog(struct i915_request *rq)
> >   {
> >   	rq->watchdog.timer.function = NULL;
> > @@ -395,9 +364,7 @@ bool i915_request_retire(struct i915_request *rq)
> >   	 * after removing the breadcrumb and signaling it, so that we do not
> >   	 * inadvertently attach the breadcrumb to a completed request.
> >   	 */
> > -	if (!list_empty(&rq->sched.link))
> > -		remove_from_engine(rq);
> > -	atomic_dec(&rq->context->guc_id_ref);
> > +	rq->engine->remove_active_request(rq);
> >   	GEM_BUG_ON(!llist_empty(&rq->execute_cb));
> >   	__list_del_entry(&rq->link); /* poison neither prev/next (RCU walks) */
> > @@ -539,7 +506,7 @@ __await_execution(struct i915_request *rq,
> >   	if (llist_add(&cb->work.node.llist, &signal->execute_cb)) {
> >   		if (i915_request_is_active(signal) ||
> >   		    __request_in_flight(signal))
> > -			__notify_execute_cb_imm(signal);
> > +			i915_request_notify_execute_cb_imm(signal);
> >   	}
> >   	return 0;
> > @@ -676,7 +643,7 @@ bool __i915_request_submit(struct i915_request *request)
> >   	result = true;
> >   	GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
> > -	list_move_tail(&request->sched.link, &engine->sched_engine->requests);
> > +	engine->add_active_request(request);
> >   active:
> >   	clear_bit(I915_FENCE_FLAG_PQUEUE, &request->fence.flags);
> >   	set_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags);
> > diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
> > index f870cd75a001..bcc6340c505e 100644
> > --- a/drivers/gpu/drm/i915/i915_request.h
> > +++ b/drivers/gpu/drm/i915/i915_request.h
> > @@ -649,4 +649,6 @@ bool
> >   i915_request_active_engine(struct i915_request *rq,
> >   			   struct intel_engine_cs **active);
> > +void i915_request_notify_execute_cb_imm(struct i915_request *rq);
> > +
> >   #endif /* I915_REQUEST_H */
> > 

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 64/97] drm/i915/guc: Reset implementation for new GuC interface
  2021-06-04  3:17     ` Matthew Brost
@ 2021-06-04  8:16       ` Daniel Vetter
  2021-06-04 18:02         ` Matthew Brost
  0 siblings, 1 reply; 249+ messages in thread
From: Daniel Vetter @ 2021-06-04  8:16 UTC (permalink / raw)
  To: Matthew Brost
  Cc: Jason Ekstrand, Tvrtko Ursulin, intel-gfx, dri-devel, Daniel Vetter

On Fri, Jun 4, 2021 at 5:25 AM Matthew Brost <matthew.brost@intel.com> wrote:
>
> On Wed, Jun 02, 2021 at 03:33:43PM +0100, Tvrtko Ursulin wrote:
> >
> > On 06/05/2021 20:14, Matthew Brost wrote:
> > > Reset implementation for new GuC interface. This is the legacy reset
> > > implementation which is called when the i915 owns the engine hang check.
> > > Future patches will offload the engine hang check to GuC but we will
> > > continue to maintain this legacy path as a fallback and this code path
> > > is also required if the GuC dies.
> > >
> > > With the new GuC interface it is not possible to reset individual
> > > engines - it is only possible to reset the GPU entirely. This patch
> > > forces an entire chip reset if any engine hangs.
> > >
> > > Cc: John Harrison <john.c.harrison@intel.com>
> > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > ---
> > >   drivers/gpu/drm/i915/gt/intel_context.c       |   3 +
> > >   drivers/gpu/drm/i915/gt/intel_context_types.h |   7 +
> > >   drivers/gpu/drm/i915/gt/intel_engine_types.h  |   6 +
> > >   .../drm/i915/gt/intel_execlists_submission.c  |  40 ++
> > >   drivers/gpu/drm/i915/gt/intel_gt_pm.c         |   6 +-
> > >   drivers/gpu/drm/i915/gt/intel_reset.c         |  18 +-
> > >   .../gpu/drm/i915/gt/intel_ring_submission.c   |  22 +
> > >   drivers/gpu/drm/i915/gt/mock_engine.c         |  31 +
> > >   drivers/gpu/drm/i915/gt/uc/intel_guc.c        |  16 +-
> > >   drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   8 +-
> > >   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 580 ++++++++++++++----
> > >   drivers/gpu/drm/i915/gt/uc/intel_uc.c         |  34 +-
> > >   drivers/gpu/drm/i915/gt/uc/intel_uc.h         |   3 +
> > >   drivers/gpu/drm/i915/i915_request.c           |  41 +-
> > >   drivers/gpu/drm/i915/i915_request.h           |   2 +
> > >   15 files changed, 643 insertions(+), 174 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
> > > index b24a1b7a3f88..2f01437056a8 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_context.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> > > @@ -392,6 +392,9 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
> > >     spin_lock_init(&ce->guc_state.lock);
> > >     INIT_LIST_HEAD(&ce->guc_state.fences);
> > > +   spin_lock_init(&ce->guc_active.lock);
> > > +   INIT_LIST_HEAD(&ce->guc_active.requests);
> > > +
> > >     ce->guc_id = GUC_INVALID_LRC_ID;
> > >     INIT_LIST_HEAD(&ce->guc_id_link);
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
> > > index 6945963a31ba..b63c8cf7823b 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> > > +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> > > @@ -165,6 +165,13 @@ struct intel_context {
> > >             struct list_head fences;
> > >     } guc_state;
> > > +   struct {
> > > +           /** lock: protects everything in guc_active */
> > > +           spinlock_t lock;
> > > +           /** requests: active requests on this context */
> > > +           struct list_head requests;
> > > +   } guc_active;
> >
> > More accounting, yeah, this is more of that where GuC gives with one hand
> > and takes away with the other. :(
> >
>
> Yep, but we can probably drop this once we switch to the DRM scheduler.
> The drm_gpu_scheduler has a list of jobs and, if we don't mind searching the
> whole thing on a reset, that will probably work too. I think the only
> reason we have a per-context list is feedback I received a while ago
> saying resets are per context with GuC, so keep a list on the context;
> an engine list didn't really fit either. I'll make a note to circle
> back to this when we hook into the DRM scheduler.

Please add a FIXME or similar to the kerneldoc comment for stuff like
this. We have a lot of things to recheck once the big picture is
sorted, and it's easy to forget them.
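
E.g. something next to the new guc_active member along these lines (a rough
sketch only, not the final wording):

	/*
	 * FIXME: Per-context active request list is only needed so the
	 * reset path can find in-flight requests; revisit (or replace
	 * with searching the drm_gpu_scheduler job list) once the GuC
	 * backend is hooked into the DRM scheduler.
	 */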

Similar for anything else where we have opens about how to structure
things once it's cut over.
-Daniel

>
> > > +
> > >     /* GuC scheduling state that does not require a lock. */
> > >     atomic_t guc_sched_state_no_lock;
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > > index f7b6eed586ce..b84562b2708b 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > > +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > > @@ -432,6 +432,12 @@ struct intel_engine_cs {
> > >      */
> > >     void            (*release)(struct intel_engine_cs *engine);
> > > +   /*
> > > +    * Add / remove request from engine active tracking
> > > +    */
> > > +   void            (*add_active_request)(struct i915_request *rq);
> > > +   void            (*remove_active_request)(struct i915_request *rq);
> > > +
> > >     struct intel_engine_execlists execlists;
> > >     /*
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > > index 396b1356ea3e..54518b64bdbd 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > > @@ -3117,6 +3117,42 @@ static void execlists_park(struct intel_engine_cs *engine)
> > >     cancel_timer(&engine->execlists.preempt);
> > >   }
> > > +static void add_to_engine(struct i915_request *rq)
> > > +{
> > > +   lockdep_assert_held(&rq->engine->sched_engine->lock);
> > > +   list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
> > > +}
> > > +
> > > +static void remove_from_engine(struct i915_request *rq)
> > > +{
> > > +   struct intel_engine_cs *engine, *locked;
> > > +
> > > +   /*
> > > +    * Virtual engines complicate acquiring the engine timeline lock,
> > > +    * as their rq->engine pointer is not stable until under that
> > > +    * engine lock. The simple ploy we use is to take the lock then
> > > +    * check that the rq still belongs to the newly locked engine.
> > > +    */
> > > +   locked = READ_ONCE(rq->engine);
> > > +   spin_lock_irq(&locked->sched_engine->lock);
> > > +   while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
> > > +           spin_unlock(&locked->sched_engine->lock);
> > > +           spin_lock(&engine->sched_engine->lock);
> > > +           locked = engine;
> > > +   }
> >
> > Could use i915_request_active_engine although tbf I don't remember why I did
> > not convert all callers when I added it. Perhaps I just did not find them
> > all.
> >
>
> I think this is a copy-paste from the existing code, or at least it should be.
> It just moves implicit execlists behavior from common code to an
> execlists-specific vfunc.
>
> > > +   list_del_init(&rq->sched.link);
> > > +
> > > +   clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> > > +   clear_bit(I915_FENCE_FLAG_HOLD, &rq->fence.flags);
> > > +
> > > +   /* Prevent further __await_execution() registering a cb, then flush */
> > > +   set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
> > > +
> > > +   spin_unlock_irq(&locked->sched_engine->lock);
> > > +
> > > +   i915_request_notify_execute_cb_imm(rq);
> > > +}
> > > +
> > >   static bool can_preempt(struct intel_engine_cs *engine)
> > >   {
> > >     if (INTEL_GEN(engine->i915) > 8)
> > > @@ -3214,6 +3250,8 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
> > >     engine->cops = &execlists_context_ops;
> > >     engine->request_alloc = execlists_request_alloc;
> > >     engine->bump_serial = execlist_bump_serial;
> > > +   engine->add_active_request = add_to_engine;
> > > +   engine->remove_active_request = remove_from_engine;
> > >     engine->reset.prepare = execlists_reset_prepare;
> > >     engine->reset.rewind = execlists_reset_rewind;
> > > @@ -3915,6 +3953,8 @@ execlists_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
> > >             ve->base.sched_engine->kick_backend =
> > >                     sibling->sched_engine->kick_backend;
> > > +           ve->base.add_active_request = sibling->add_active_request;
> > > +           ve->base.remove_active_request = sibling->remove_active_request;
> > >             ve->base.emit_bb_start = sibling->emit_bb_start;
> > >             ve->base.emit_flush = sibling->emit_flush;
> > >             ve->base.emit_init_breadcrumb = sibling->emit_init_breadcrumb;
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> > > index aef3084e8b16..463a6ae605a0 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> > > @@ -174,8 +174,6 @@ static void gt_sanitize(struct intel_gt *gt, bool force)
> > >     if (intel_gt_is_wedged(gt))
> > >             intel_gt_unset_wedged(gt);
> > > -   intel_uc_sanitize(&gt->uc);
> > > -
> > >     for_each_engine(engine, gt, id)
> > >             if (engine->reset.prepare)
> > >                     engine->reset.prepare(engine);
> > > @@ -191,6 +189,8 @@ static void gt_sanitize(struct intel_gt *gt, bool force)
> > >                     __intel_engine_reset(engine, false);
> > >     }
> > > +   intel_uc_reset(&gt->uc, false);
> > > +
> > >     for_each_engine(engine, gt, id)
> > >             if (engine->reset.finish)
> > >                     engine->reset.finish(engine);
> > > @@ -243,6 +243,8 @@ int intel_gt_resume(struct intel_gt *gt)
> > >             goto err_wedged;
> > >     }
> > > +   intel_uc_reset_finish(&gt->uc);
> > > +
> > >     intel_rps_enable(&gt->rps);
> > >     intel_llc_enable(&gt->llc);
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> > > index d5094be6d90f..ce3ef26ffe2d 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> > > @@ -758,6 +758,8 @@ static int gt_reset(struct intel_gt *gt, intel_engine_mask_t stalled_mask)
> > >             __intel_engine_reset(engine, stalled_mask & engine->mask);
> > >     local_bh_enable();
> > > +   intel_uc_reset(&gt->uc, true);
> > > +
> > >     intel_ggtt_restore_fences(gt->ggtt);
> > >     return err;
> > > @@ -782,6 +784,8 @@ static void reset_finish(struct intel_gt *gt, intel_engine_mask_t awake)
> > >             if (awake & engine->mask)
> > >                     intel_engine_pm_put(engine);
> > >     }
> > > +
> > > +   intel_uc_reset_finish(&gt->uc);
> > >   }
> > >   static void nop_submit_request(struct i915_request *request)
> > > @@ -835,6 +839,7 @@ static void __intel_gt_set_wedged(struct intel_gt *gt)
> > >     for_each_engine(engine, gt, id)
> > >             if (engine->reset.cancel)
> > >                     engine->reset.cancel(engine);
> > > +   intel_uc_cancel_requests(&gt->uc);
> > >     local_bh_enable();
> > >     reset_finish(gt, awake);
> > > @@ -1123,6 +1128,9 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)
> > >     ENGINE_TRACE(engine, "flags=%lx\n", gt->reset.flags);
> > >     GEM_BUG_ON(!test_bit(I915_RESET_ENGINE + engine->id, &gt->reset.flags));
> > > +   if (intel_engine_uses_guc(engine))
> > > +           return -ENODEV;
> > > +
> > >     if (!intel_engine_pm_get_if_awake(engine))
> > >             return 0;
> > > @@ -1133,13 +1141,10 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)
> > >                        "Resetting %s for %s\n", engine->name, msg);
> > >     atomic_inc(&engine->i915->gpu_error.reset_engine_count[engine->uabi_class]);
> > > -   if (intel_engine_uses_guc(engine))
> > > -           ret = intel_guc_reset_engine(&engine->gt->uc.guc, engine);
> > > -   else
> > > -           ret = intel_gt_reset_engine(engine);
> > > +   ret = intel_gt_reset_engine(engine);
> > >     if (ret) {
> > >             /* If we fail here, we expect to fallback to a global reset */
> > > -           ENGINE_TRACE(engine, "Failed to reset, err: %d\n", ret);
> > > +           ENGINE_TRACE(engine, "Failed to reset %s, err: %d\n", engine->name, ret);
> > >             goto out;
> > >     }
> > > @@ -1273,7 +1278,8 @@ void intel_gt_handle_error(struct intel_gt *gt,
> > >      * Try engine reset when available. We fall back to full reset if
> > >      * single reset fails.
> > >      */
> > > -   if (intel_has_reset_engine(gt) && !intel_gt_is_wedged(gt)) {
> > > +   if (!intel_uc_uses_guc_submission(&gt->uc) &&
> > > +       intel_has_reset_engine(gt) && !intel_gt_is_wedged(gt)) {
> >
> > If the driver cannot do an engine reset with GuC, could intel_has_reset_engine
> > just say false in that case, so the GuC check wouldn't have to be added here?
> > Also noticed this is the same open I had in 2019, and someone said it can and
> > would be folded. ;(
> >
>
> Let me look into that before the next rev. I briefly looked at this and
> it does seem plausible this function could return false. The only concern
> here is that reset code is notoriously delicate, so I am wary of changing
> this in this series. We have a live list of follow-ups; this could be
> included there if it doesn't get fixed in this patch.
>
> > >             local_bh_disable();
> > >             for_each_engine_masked(engine, gt, engine_mask, tmp) {
> > >                     BUILD_BUG_ON(I915_RESET_MODESET >= I915_RESET_ENGINE);
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> > > index 39dd7c4ed0a9..7d05bf16094c 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> > > @@ -1050,6 +1050,25 @@ static void ring_bump_serial(struct intel_engine_cs *engine)
> > >     engine->serial++;
> > >   }
> > > +static void add_to_engine(struct i915_request *rq)
> > > +{
> > > +   lockdep_assert_held(&rq->engine->sched_engine->lock);
> > > +   list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
> > > +}
> > > +
> > > +static void remove_from_engine(struct i915_request *rq)
> > > +{
> > > +   spin_lock_irq(&rq->engine->sched_engine->lock);
> > > +   list_del_init(&rq->sched.link);
> > > +
> > > +   /* Prevent further __await_execution() registering a cb, then flush */
> > > +   set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
> > > +
> > > +   spin_unlock_irq(&rq->engine->sched_engine->lock);
> > > +
> > > +   i915_request_notify_execute_cb_imm(rq);
> > > +}
> > > +
> > >   static void setup_common(struct intel_engine_cs *engine)
> > >   {
> > >     struct drm_i915_private *i915 = engine->i915;
> > > @@ -1067,6 +1086,9 @@ static void setup_common(struct intel_engine_cs *engine)
> > >     engine->reset.cancel = reset_cancel;
> > >     engine->reset.finish = reset_finish;
> > > +   engine->add_active_request = add_to_engine;
> > > +   engine->remove_active_request = remove_from_engine;
> > > +
> > >     engine->cops = &ring_context_ops;
> > >     engine->request_alloc = ring_request_alloc;
> > >     engine->bump_serial = ring_bump_serial;
> > > diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
> > > index 4d023b5cd5da..dccf5fce980a 100644
> > > --- a/drivers/gpu/drm/i915/gt/mock_engine.c
> > > +++ b/drivers/gpu/drm/i915/gt/mock_engine.c
> > > @@ -235,6 +235,35 @@ static void mock_submit_request(struct i915_request *request)
> > >     spin_unlock_irqrestore(&engine->hw_lock, flags);
> > >   }
> > > +static void mock_add_to_engine(struct i915_request *rq)
> > > +{
> > > +   lockdep_assert_held(&rq->engine->sched_engine->lock);
> > > +   list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
> > > +}
> > > +
> > > +static void mock_remove_from_engine(struct i915_request *rq)
> > > +{
> > > +   struct intel_engine_cs *engine, *locked;
> > > +
> > > +   /*
> > > +    * Virtual engines complicate acquiring the engine timeline lock,
> > > +    * as their rq->engine pointer is not stable until under that
> > > +    * engine lock. The simple ploy we use is to take the lock then
> > > +    * check that the rq still belongs to the newly locked engine.
> > > +    */
> > > +
> > > +   locked = READ_ONCE(rq->engine);
> > > +   spin_lock_irq(&locked->sched_engine->lock);
> > > +   while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
> > > +           spin_unlock(&locked->sched_engine->lock);
> > > +           spin_lock(&engine->sched_engine->lock);
> > > +           locked = engine;
> > > +   }
> > > +   list_del_init(&rq->sched.link);
> > > +   spin_unlock_irq(&locked->sched_engine->lock);
> > > +}
> > > +
> > > +
> > >   static void mock_reset_prepare(struct intel_engine_cs *engine)
> > >   {
> > >   }
> > > @@ -327,6 +356,8 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
> > >     engine->base.emit_flush = mock_emit_flush;
> > >     engine->base.emit_fini_breadcrumb = mock_emit_breadcrumb;
> > >     engine->base.submit_request = mock_submit_request;
> > > +   engine->base.add_active_request = mock_add_to_engine;
> > > +   engine->base.remove_active_request = mock_remove_from_engine;
> > >     engine->base.reset.prepare = mock_reset_prepare;
> > >     engine->base.reset.rewind = mock_reset_rewind;
> > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> > > index 235c1997f32d..864b14e313a3 100644
> > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> > > @@ -146,6 +146,9 @@ static void gen11_disable_guc_interrupts(struct intel_guc *guc)
> > >   {
> > >     struct intel_gt *gt = guc_to_gt(guc);
> > > +   if (!guc->interrupts.enabled)
> > > +           return;
> > > +
> > >     spin_lock_irq(&gt->irq_lock);
> > >     guc->interrupts.enabled = false;
> > > @@ -579,19 +582,6 @@ int intel_guc_suspend(struct intel_guc *guc)
> > >     return 0;
> > >   }
> > > -/**
> > > - * intel_guc_reset_engine() - ask GuC to reset an engine
> > > - * @guc:   intel_guc structure
> > > - * @engine:        engine to be reset
> > > - */
> > > -int intel_guc_reset_engine(struct intel_guc *guc,
> > > -                      struct intel_engine_cs *engine)
> > > -{
> > > -   /* XXX: to be implemented with submission interface rework */
> > > -
> > > -   return -ENODEV;
> > > -}
> > > -
> > >   /**
> > >    * intel_guc_resume() - notify GuC resuming from suspend state
> > >    * @guc:  the guc
> > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > > index 47eaa69809e8..afea04d56494 100644
> > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > > @@ -243,14 +243,16 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask)
> > >   int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout);
> > > -int intel_guc_reset_engine(struct intel_guc *guc,
> > > -                      struct intel_engine_cs *engine);
> > > -
> > >   int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
> > >                                       const u32 *msg, u32 len);
> > >   int intel_guc_sched_done_process_msg(struct intel_guc *guc,
> > >                                  const u32 *msg, u32 len);
> > > +void intel_guc_submission_reset_prepare(struct intel_guc *guc);
> > > +void intel_guc_submission_reset(struct intel_guc *guc, bool stalled);
> > > +void intel_guc_submission_reset_finish(struct intel_guc *guc);
> > > +void intel_guc_submission_cancel_requests(struct intel_guc *guc);
> > > +
> > >   void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p);
> > >   #endif
> > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > index 80b89171b35a..8c093bc2d3a4 100644
> > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > @@ -140,7 +140,7 @@ context_wait_for_deregister_to_register(struct intel_context *ce)
> > >   static inline void
> > >   set_context_wait_for_deregister_to_register(struct intel_context *ce)
> > >   {
> > > -   /* Only should be called from guc_lrc_desc_pin() */
> > > +   /* Only should be called from guc_lrc_desc_pin() without lock */
> > >     ce->guc_state.sched_state |=
> > >             SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER;
> > >   }
> > > @@ -240,15 +240,31 @@ static int guc_lrc_desc_pool_create(struct intel_guc *guc)
> > >   static void guc_lrc_desc_pool_destroy(struct intel_guc *guc)
> > >   {
> > > +   guc->lrc_desc_pool_vaddr = NULL;
> > >     i915_vma_unpin_and_release(&guc->lrc_desc_pool, I915_VMA_RELEASE_MAP);
> > >   }
> > > +static inline bool guc_submission_initialized(struct intel_guc *guc)
> > > +{
> > > +   return guc->lrc_desc_pool_vaddr != NULL;
> > > +}
> > > +
> > >   static inline void reset_lrc_desc(struct intel_guc *guc, u32 id)
> > >   {
> > > -   struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
> > > +   if (likely(guc_submission_initialized(guc))) {
> > > +           struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
> > > +           unsigned long flags;
> > > -   memset(desc, 0, sizeof(*desc));
> > > -   xa_erase_irq(&guc->context_lookup, id);
> > > +           memset(desc, 0, sizeof(*desc));
> > > +
> > > +           /*
> > > +            * xarray API doesn't have xa_erase_irqsave wrapper, so calling
> > > +            * the lower level functions directly.
> > > +            */
> > > +           xa_lock_irqsave(&guc->context_lookup, flags);
> > > +           __xa_erase(&guc->context_lookup, id);
> > > +           xa_unlock_irqrestore(&guc->context_lookup, flags);
> > > +   }
> > >   }
> > >   static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
> > > @@ -259,7 +275,15 @@ static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
> > >   static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
> > >                                        struct intel_context *ce)
> > >   {
> > > -   xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
> > > +   unsigned long flags;
> > > +
> > > +   /*
> > > +    * xarray API doesn't have xa_save_irqsave wrapper, so calling the
> > > +    * lower level functions directly.
> > > +    */
> > > +   xa_lock_irqsave(&guc->context_lookup, flags);
> > > +   __xa_store(&guc->context_lookup, id, ce, GFP_ATOMIC);
> > > +   xa_unlock_irqrestore(&guc->context_lookup, flags);
> > >   }
> > >   static int guc_submission_busy_loop(struct intel_guc* guc,
> > > @@ -330,6 +354,8 @@ int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout)
> > >                                     interruptible, timeout);
> > >   }
> > > +static int guc_lrc_desc_pin(struct intel_context *ce, bool loop);
> > > +
> > >   static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
> > >   {
> > >     int err;
> > > @@ -337,11 +363,22 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
> > >     u32 action[3];
> > >     int len = 0;
> > >     u32 g2h_len_dw = 0;
> > > -   bool enabled = context_enabled(ce);
> > > +   bool enabled;
> > >     GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
> > >     GEM_BUG_ON(context_guc_id_invalid(ce));
> > > +   /*
> > > +    * Corner case where the GuC firmware was blown away and reloaded while
> > > +    * this context was pinned.
> > > +    */
> > > +   if (unlikely(!lrc_desc_registered(guc, ce->guc_id))) {
> > > +           err = guc_lrc_desc_pin(ce, false);
> > > +           if (unlikely(err))
> > > +                   goto out;
> > > +   }
> > > +   enabled = context_enabled(ce);
> > > +
> > >     if (!enabled) {
> > >             action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
> > >             action[len++] = ce->guc_id;
> > > @@ -364,6 +401,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
> > >             intel_context_put(ce);
> > >     }
> > > +out:
> > >     return err;
> > >   }
> > > @@ -418,15 +456,10 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
> > >     if (submit) {
> > >             guc_set_lrc_tail(last);
> > >   resubmit:
> > > -           /*
> > > -            * We only check for -EBUSY here even though it is possible for
> > > -            * -EDEADLK to be returned. If -EDEADLK is returned, the GuC has
> > > -            * died and a full GPU needs to be done. The hangcheck will
> > > -            * eventually detect that the GuC has died and trigger this
> > > -            * reset so no need to handle -EDEADLK here.
> > > -            */
> > >             ret = guc_add_request(guc, last);
> > > -           if (ret == -EBUSY) {
> > > +           if (unlikely(ret == -EDEADLK))
> > > +                   goto deadlk;
> > > +           else if (ret == -EBUSY) {
> > >                     i915_sched_engine_kick(sched_engine);
> > >                     guc->stalled_request = last;
> > >                     return false;
> > > @@ -436,6 +469,11 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
> > >     guc->stalled_request = NULL;
> > >     return submit;
> > > +
> > > +deadlk:
> > > +   sched_engine->tasklet.callback = NULL;
> > > +   tasklet_disable_nosync(&sched_engine->tasklet);
> > > +   return false;
> > >   }
> > >   static void guc_submission_tasklet(struct tasklet_struct *t)
> > > @@ -462,29 +500,165 @@ static void cs_irq_handler(struct intel_engine_cs *engine, u16 iir)
> > >             intel_engine_signal_breadcrumbs(engine);
> > >   }
> > > -static void guc_reset_prepare(struct intel_engine_cs *engine)
> > > +static void __guc_context_destroy(struct intel_context *ce);
> > > +static void release_guc_id(struct intel_guc *guc, struct intel_context *ce);
> > > +static void guc_signal_context_fence(struct intel_context *ce);
> > > +
> > > +static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
> > >   {
> > > -   struct i915_sched_engine * const sched_engine = engine->sched_engine;
> > > +   struct intel_context *ce;
> > > +   unsigned long index, flags;
> > > +   bool pending_disable, pending_enable, deregister, destroyed;
> > > -   ENGINE_TRACE(engine, "\n");
> > > +   xa_for_each(&guc->context_lookup, index, ce) {
> > > +           /* Flush context */
> > > +           spin_lock_irqsave(&ce->guc_state.lock, flags);
> > > +           spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> >
> > Very unusual pattern - what does it do?
> >
>
> The comment below tries to explain this. Basically by cycling the lock
> it guarantees submission_disabled() is visible to all callers that touch
> the below flags.
>
> > > +
> > > +           /*
> > > +            * Once we are at this point submission_disabled() is guaranteed
> > > +            * to visible to all callers who set the below flags (see above
> > > +            * flush and flushes in reset_prepare). If submission_disabled()
> > > +            * is set, the caller shouldn't set these flags.
> > > +            */
> > > +
> > > +           destroyed = context_destroyed(ce);
> > > +           pending_enable = context_pending_enable(ce);
> > > +           pending_disable = context_pending_disable(ce);
> > > +           deregister = context_wait_for_deregister_to_register(ce);
> > > +           init_sched_state(ce);
> > > +
> > > +           if (pending_enable || destroyed || deregister) {
> > > +                   atomic_dec(&guc->outstanding_submission_g2h);
> > > +                   if (deregister)
> > > +                           guc_signal_context_fence(ce);
> > > +                   if (destroyed) {
> > > +                           release_guc_id(guc, ce);
> > > +                           __guc_context_destroy(ce);
> > > +                   }
> > > +                   if (pending_enable|| deregister)
> > > +                           intel_context_put(ce);
> > > +           }
> > > +
> > > +           /* Not mutually exclusive with above if statement. */
> > > +           if (pending_disable) {
> > > +                   guc_signal_context_fence(ce);
> > > +                   intel_context_sched_disable_unpin(ce);
> > > +                   atomic_dec(&guc->outstanding_submission_g2h);
> > > +                   intel_context_put(ce);
> > > +           }
> >
> > Yeah, this function is a taste of the state machine which I think is _extremely_
> > hard to review and know with any confidence it does the right thing.
> >
>
> What is the other option? Block every time we issue an asynchronous
> command to the GuC? If we want to do everything asynchronously we have to
> track state and take further actions when the GuC finally responds. We
> also have to deal with the GuC dying and take those actions on our
> own.
>
> Luckily we do have several others aside from myself who understand
> this quite well.
>
> > > +   }
> > > +}
> > > +
> > > +static inline bool
> > > +submission_disabled(struct intel_guc *guc)
> > > +{
> > > +   struct i915_sched_engine * const sched_engine = guc->sched_engine;
> > > +
> > > +   return unlikely(!__tasklet_is_enabled(&sched_engine->tasklet));
> > > +}
> > > +
> > > +static void disable_submission(struct intel_guc *guc)
> > > +{
> > > +   struct i915_sched_engine * const sched_engine = guc->sched_engine;
> > > +
> > > +   if (__tasklet_is_enabled(&sched_engine->tasklet)) {
> > > +           GEM_BUG_ON(!guc->ct.enabled);
> > > +           __tasklet_disable_sync_once(&sched_engine->tasklet);
> > > +           sched_engine->tasklet.callback = NULL;
> > > +   }
> > > +}
> > > +
> > > +static void enable_submission(struct intel_guc *guc)
> > > +{
> > > +   struct i915_sched_engine * const sched_engine = guc->sched_engine;
> > > +   unsigned long flags;
> > > +
> > > +   spin_lock_irqsave(&guc->sched_engine->lock, flags);
> > > +   sched_engine->tasklet.callback = guc_submission_tasklet;
> > > +   wmb();
> >
> > All memory barriers must be documented.
> >
>
> Found that out from checkpatch the other day, will fix.
>
> > > +   if (!__tasklet_is_enabled(&sched_engine->tasklet) &&
> > > +       __tasklet_enable(&sched_engine->tasklet)) {
> > > +           GEM_BUG_ON(!guc->ct.enabled);
> > > +
> > > +           /* And kick in case we missed a new request submission. */
> > > +           i915_sched_engine_hi_kick(sched_engine);
> > > +   }
> > > +   spin_unlock_irqrestore(&guc->sched_engine->lock, flags);
> > > +}
> > > +
> > > +static void guc_flush_submissions(struct intel_guc *guc)
> > > +{
> > > +   struct i915_sched_engine * const sched_engine = guc->sched_engine;
> > > +   unsigned long flags;
> > > +
> > > +   spin_lock_irqsave(&sched_engine->lock, flags);
> > > +   spin_unlock_irqrestore(&sched_engine->lock, flags);
> >
> > Oh right, more of this. No idea.
> >
>
> Same as above. If you change some state and then cycle a lock, it is
> guaranteed that the state is visible the next time someone grabs the lock.
> I do explain these races in the documentation patch near the end of the
> series. Without a BKL I don't see how else to avoid these reset races.
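> 
> As a minimal sketch of the pattern (generic, not tied to any particular lock
> in the patch):
> 
> 	/* writer: publish the new state, then cycle the lock */
> 	disable_submission(guc);
> 	spin_lock_irqsave(&lock, flags);
> 	spin_unlock_irqrestore(&lock, flags);
> 
> 	/* reader: anything sampled under the lock after the cycle above is
> 	 * guaranteed to see the new state */
> 	spin_lock_irqsave(&lock, flags);
> 	disabled = submission_disabled(guc);
> 	/* ... act on 'disabled' while still holding the lock ... */
> 	spin_unlock_irqrestore(&lock, flags);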
>
> > > +}
> > > +
> > > +void intel_guc_submission_reset_prepare(struct intel_guc *guc)
> > > +{
> > > +   int i;
> > > +
> > > +   if (unlikely(!guc_submission_initialized(guc)))
> > > +           /* Reset called during driver load? GuC not yet initialised! */
> > > +           return;
> > > +
> > > +   disable_submission(guc);
> > > +   guc->interrupts.disable(guc);
> > > +
> > > +   /* Flush IRQ handler */
> > > +   spin_lock_irq(&guc_to_gt(guc)->irq_lock);
> > > +   spin_unlock_irq(&guc_to_gt(guc)->irq_lock);
> > > +
> > > +   guc_flush_submissions(guc);
> > >     /*
> > > -    * Prevent request submission to the hardware until we have
> > > -    * completed the reset in i915_gem_reset_finish(). If a request
> > > -    * is completed by one engine, it may then queue a request
> > > -    * to a second via its sched_engine->tasklet *just* as we are
> > > -    * calling engine->init_hw() and also writing the ELSP.
> > > -    * Turning off the sched_engine->tasklet until the reset is over
> > > -    * prevents the race.
> > > +    * Handle any outstanding G2Hs before reset. Call IRQ handler directly
> > > +    * each pass as interrupt have been disabled. We always scrub for
> > > +    * outstanding G2H as it is possible for outstanding_submission_g2h to
> > > +    * be incremented after the context state update.
> > >      */
> > > -   __tasklet_disable_sync_once(&sched_engine->tasklet);
> > > +   for (i = 0; i < 4 && atomic_read(&guc->outstanding_submission_g2h); ++i) {
> >
> > Why is four the magic number and what happens if it is not enough?
> >
>
> I just picked a number. Regardless of whether the normal G2H path processes
> all the G2H messages, we scrub all the context state for lost ones.
>
> > > +           intel_guc_to_host_event_handler(guc);
> > > +#define wait_for_reset(guc, wait_var) \
> > > +           guc_wait_for_pending_msg(guc, wait_var, false, (HZ / 20))
> > > +           do {
> > > +                   wait_for_reset(guc, &guc->outstanding_submission_g2h);
> > > +           } while (!list_empty(&guc->ct.requests.incoming));
> > > +   }
> > > +   scrub_guc_desc_for_outstanding_g2h(guc);
> > > +}
> > > +
> > > +static struct intel_engine_cs *
> > > +guc_virtual_get_sibling(struct intel_engine_cs *ve, unsigned int sibling)
> > > +{
> > > +   struct intel_engine_cs *engine;
> > > +   intel_engine_mask_t tmp, mask = ve->mask;
> > > +   unsigned int num_siblings = 0;
> > > +
> > > +   for_each_engine_masked(engine, ve->gt, mask, tmp)
> > > +           if (num_siblings++ == sibling)
> > > +                   return engine;
> >
> > Not sure how often this is used overall and whether just storing the array
> > in ve could be justified.
> >
>
> It really is only used with sibling == 0, so it should be fast.
>
> > > +
> > > +   return NULL;
> > > +}
> > > +
> > > +static inline struct intel_engine_cs *
> > > +__context_to_physical_engine(struct intel_context *ce)
> > > +{
> > > +   struct intel_engine_cs *engine = ce->engine;
> > > +
> > > +   if (intel_engine_is_virtual(engine))
> > > +           engine = guc_virtual_get_sibling(engine, 0);
> > > +
> > > +   return engine;
> > >   }
> > > -static void guc_reset_state(struct intel_context *ce,
> > > -                       struct intel_engine_cs *engine,
> > > -                       u32 head,
> > > -                       bool scrub)
> > > +static void guc_reset_state(struct intel_context *ce, u32 head, bool scrub)
> > >   {
> > > +   struct intel_engine_cs *engine = __context_to_physical_engine(ce);
> > > +
> > >     GEM_BUG_ON(!intel_context_is_pinned(ce));
> > >     /*
> > > @@ -502,42 +676,147 @@ static void guc_reset_state(struct intel_context *ce,
> > >     lrc_update_regs(ce, engine, head);
> > >   }
> > > -static void guc_reset_rewind(struct intel_engine_cs *engine, bool stalled)
> > > +static void guc_reset_nop(struct intel_engine_cs *engine)
> > >   {
> > > -   struct intel_engine_execlists * const execlists = &engine->execlists;
> > > -   struct i915_request *rq;
> > > +}
> > > +
> > > +static void guc_rewind_nop(struct intel_engine_cs *engine, bool stalled)
> > > +{
> > > +}
> > > +
> > > +static void
> > > +__unwind_incomplete_requests(struct intel_context *ce)
> > > +{
> > > +   struct i915_request *rq, *rn;
> > > +   struct list_head *pl;
> > > +   int prio = I915_PRIORITY_INVALID;
> > > +   struct i915_sched_engine * const sched_engine =
> > > +           ce->engine->sched_engine;
> > >     unsigned long flags;
> > > -   spin_lock_irqsave(&engine->sched_engine->lock, flags);
> > > +   spin_lock_irqsave(&sched_engine->lock, flags);
> > > +   spin_lock(&ce->guc_active.lock);
> > > +   list_for_each_entry_safe(rq, rn,
> > > +                            &ce->guc_active.requests,
> > > +                            sched.link) {
> > > +           if (i915_request_completed(rq))
> > > +                   continue;
> > > +
> > > +           list_del_init(&rq->sched.link);
> > > +           spin_unlock(&ce->guc_active.lock);
> >
> > Dropping the lock and continuing to iterate the same list is safe? A comment
> > is needed I think, and I do remember I worried about this, or similar
> > instances, in GuC code before.
> >
>
> We only need the active lock for the ce->guc_active.requests list. It is
> indeed safe to drop the lock.
>
> > > +
> > > +           __i915_request_unsubmit(rq);
> > > +
> > > +           /* Push the request back into the queue for later resubmission. */
> > > +           GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID);
> > > +           if (rq_prio(rq) != prio) {
> > > +                   prio = rq_prio(rq);
> > > +                   pl = i915_sched_lookup_priolist(sched_engine, prio);
> > > +           }
> > > +           GEM_BUG_ON(i915_sched_engine_is_empty(sched_engine));
> > > -   /* Push back any incomplete requests for replay after the reset. */
> > > -   rq = execlists_unwind_incomplete_requests(execlists);
> > > -   if (!rq)
> > > -           goto out_unlock;
> > > +           list_add_tail(&rq->sched.link, pl);
> > > +           set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> > > +
> > > +           spin_lock(&ce->guc_active.lock);
> > > +   }
> > > +   spin_unlock(&ce->guc_active.lock);
> > > +   spin_unlock_irqrestore(&sched_engine->lock, flags);
> > > +}
> > > +
> > > +static struct i915_request *context_find_active_request(struct intel_context *ce)
> > > +{
> > > +   struct i915_request *rq, *active = NULL;
> > > +   unsigned long flags;
> > > +
> > > +   spin_lock_irqsave(&ce->guc_active.lock, flags);
> > > +   list_for_each_entry_reverse(rq, &ce->guc_active.requests,
> > > +                               sched.link) {
> > > +           if (i915_request_completed(rq))
> > > +                   break;
> > > +
> > > +           active = rq;
> > > +   }
> > > +   spin_unlock_irqrestore(&ce->guc_active.lock, flags);
> > > +
> > > +   return active;
> > > +}
> > > +
> > > +static void __guc_reset_context(struct intel_context *ce, bool stalled)
> > > +{
> > > +   struct i915_request *rq;
> > > +   u32 head;
> > > +
> > > +   /*
> > > +    * GuC will implicitly mark the context as non-schedulable
> > > +    * when it sends the reset notification. Make sure our state
> > > +    * reflects this change. The context will be marked enabled
> > > +    * on resubmission.
> > > +    */
> > > +   clr_context_enabled(ce);
> > > +
> > > +   rq = context_find_active_request(ce);
> > > +   if (!rq) {
> > > +           head = ce->ring->tail;
> > > +           stalled = false;
> > > +           goto out_replay;
> > > +   }
> > >     if (!i915_request_started(rq))
> > >             stalled = false;
> > > +   GEM_BUG_ON(i915_active_is_idle(&ce->active));
> > > +   head = intel_ring_wrap(ce->ring, rq->head);
> > >     __i915_request_reset(rq, stalled);
> > > -   guc_reset_state(rq->context, engine, rq->head, stalled);
> > > -out_unlock:
> > > -   spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
> > > +out_replay:
> > > +   guc_reset_state(ce, head, stalled);
> > > +   __unwind_incomplete_requests(ce);
> > > +}
> > > +
> > > +void intel_guc_submission_reset(struct intel_guc *guc, bool stalled)
> > > +{
> > > +   struct intel_context *ce;
> > > +   unsigned long index;
> > > +
> > > +   if (unlikely(!guc_submission_initialized(guc)))
> > > +           /* Reset called during driver load? GuC not yet initialised! */
> > > +           return;
> > > +
> > > +   xa_for_each(&guc->context_lookup, index, ce)
> > > +           if (intel_context_is_pinned(ce))
> > > +                   __guc_reset_context(ce, stalled);
> > > +
> > > +   /* GuC is blown away, drop all references to contexts */
> > > +   xa_destroy(&guc->context_lookup);
> > > +}
> > > +
> > > +static void guc_cancel_context_requests(struct intel_context *ce)
> > > +{
> > > +   struct i915_sched_engine *sched_engine = ce_to_guc(ce)->sched_engine;
> > > +   struct i915_request *rq;
> > > +   unsigned long flags;
> > > +
> > > +   /* Mark all executing requests as skipped. */
> > > +   spin_lock_irqsave(&sched_engine->lock, flags);
> > > +   spin_lock(&ce->guc_active.lock);
> > > +   list_for_each_entry(rq, &ce->guc_active.requests, sched.link)
> > > +           i915_request_put(i915_request_mark_eio(rq));
> > > +   spin_unlock(&ce->guc_active.lock);
> > > +   spin_unlock_irqrestore(&sched_engine->lock, flags);
> >
> > I suppose somewhere it will need to be documented what the two locks are
> > protecting and why both are needed in some places.
> >
>
> Yep, I have a locking section in the doc patch near the end of the series.
> Basically, here we don't want any new submissions processed while we
> are canceling requests - that is the outer lock. The inner lock again
> protects the ce->guc_active.requests list.
>
> BTW - I think I am overly careful with the locks (when in doubt, grab a
> lock) in the reset / cancel code, as there is no expectation that this
> needs to perform well and resets are by far the raciest code in the i915.
>
> > >   }
> > > -static void guc_reset_cancel(struct intel_engine_cs *engine)
> > > +static void
> > > +guc_cancel_sched_engine_requests(struct i915_sched_engine *sched_engine)
> > >   {
> > > -   struct i915_sched_engine * const sched_engine = engine->sched_engine;
> > >     struct i915_request *rq, *rn;
> > >     struct rb_node *rb;
> > >     unsigned long flags;
> > >     /* Can be called during boot if GuC fails to load */
> > > -   if (!engine->gt)
> > > +   if (!sched_engine)
> > >             return;
> > > -   ENGINE_TRACE(engine, "\n");
> > > -
> > >     /*
> > >      * Before we call engine->cancel_requests(), we should have exclusive
> > >      * access to the submission state. This is arranged for us by the
> > > @@ -552,13 +831,7 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
> > >      * submission's irq state, we also wish to remind ourselves that
> > >      * it is irq state.)
> > >      */
> > > -   spin_lock_irqsave(&engine->sched_engine->lock, flags);
> > > -
> > > -   /* Mark all executing requests as skipped. */
> > > -   list_for_each_entry(rq, &engine->sched_engine->requests, sched.link) {
> > > -           i915_request_set_error_once(rq, -EIO);
> > > -           i915_request_mark_complete(rq);
> > > -   }
> > > +   spin_lock_irqsave(&sched_engine->lock, flags);
> > >     /* Flush the queued requests to the timeline list (for retiring). */
> > >     while ((rb = rb_first_cached(&sched_engine->queue))) {
> > > @@ -566,9 +839,10 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
> > >             priolist_for_each_request_consume(rq, rn, p) {
> > >                     list_del_init(&rq->sched.link);
> > > +
> > >                     __i915_request_submit(rq);
> > > -                   dma_fence_set_error(&rq->fence, -EIO);
> > > -                   i915_request_mark_complete(rq);
> > > +
> > > +                   i915_request_put(i915_request_mark_eio(rq));
> > >             }
> > >             rb_erase_cached(&p->node, &sched_engine->queue);
> > > @@ -580,19 +854,41 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
> > >     sched_engine->queue_priority_hint = INT_MIN;
> > >     sched_engine->queue = RB_ROOT_CACHED;
> > > -   spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
> > > +   spin_unlock_irqrestore(&sched_engine->lock, flags);
> > >   }
> > > -static void guc_reset_finish(struct intel_engine_cs *engine)
> > > +void intel_guc_submission_cancel_requests(struct intel_guc *guc)
> > >   {
> > > -   struct i915_sched_engine * const sched_engine = engine->sched_engine;
> > > +   struct intel_context *ce;
> > > +   unsigned long index;
> > > -   if (__tasklet_enable(&sched_engine->tasklet))
> > > -           /* And kick in case we missed a new request submission. */
> > > -           i915_sched_engine_hi_kick(sched_engine);
> > > +   xa_for_each(&guc->context_lookup, index, ce)
> > > +           if (intel_context_is_pinned(ce))
> > > +                   guc_cancel_context_requests(ce);
> > > +
> > > +   guc_cancel_sched_engine_requests(guc->sched_engine);
> > > +
> > > +   /* GuC is blown away, drop all references to contexts */
> > > +   xa_destroy(&guc->context_lookup);
> > > +}
> > > +
> > > +void intel_guc_submission_reset_finish(struct intel_guc *guc)
> > > +{
> > > +   /* Reset called during driver load or during wedge? */
> > > +   if (unlikely(!guc_submission_initialized(guc) ||
> > > +                test_bit(I915_WEDGED, &guc_to_gt(guc)->reset.flags)))
> > > +           return;
> > > -   ENGINE_TRACE(engine, "depth->%d\n",
> > > -                atomic_read(&sched_engine->tasklet.count));
> > > +   /*
> > > +    * Technically possible for either of these values to be non-zero here,
> > > +    * but very unlikely + harmless. Regardless let's add a warn so we can
> > > +    * see in CI if this happens frequently / a precursor to taking down the
> > > +    * machine.
> >
> > And what did CI say over time this was in?
> >
>
> It hasn't popped yet. This is more for upcoming code where we have
> G2Hs we can't scrub (e.g. a TLB invalidation, engine class scheduling
> disable, etc.).
>
> > It needs to be explained when it can be non-zero and whether or not it can
> > go to non-zero just after the atomic_set below. Or if not, why not.
> >
>
> At this point we could probably turn this into a BUG_ON.
>
> > > +    */
> > > +   GEM_WARN_ON(atomic_read(&guc->outstanding_submission_g2h));
> > > +   atomic_set(&guc->outstanding_submission_g2h, 0);
> > > +
> > > +   enable_submission(guc);
> > >   }
> > >   /*
> > > @@ -659,6 +955,9 @@ static int guc_bypass_tasklet_submit(struct intel_guc *guc,
> > >     else
> > >             trace_i915_request_guc_submit(rq);
> > > +   if (unlikely(ret == -EDEADLK))
> > > +           disable_submission(guc);
> > > +
> > >     return ret;
> > >   }
> > > @@ -671,7 +970,8 @@ static void guc_submit_request(struct i915_request *rq)
> > >     /* Will be called from irq-context when using foreign fences. */
> > >     spin_lock_irqsave(&sched_engine->lock, flags);
> > > -   if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
> > > +   if (submission_disabled(guc) || guc->stalled_request ||
> > > +       !i915_sched_engine_is_empty(sched_engine))
> > >             queue_request(sched_engine, rq, rq_prio(rq));
> > >     else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
> > >             i915_sched_engine_hi_kick(sched_engine);
> > > @@ -808,7 +1108,8 @@ static void unpin_guc_id(struct intel_guc *guc, struct intel_context *ce)
> > >   static int __guc_action_register_context(struct intel_guc *guc,
> > >                                      u32 guc_id,
> > > -                                    u32 offset)
> > > +                                    u32 offset,
> > > +                                    bool loop)
> > >   {
> > >     u32 action[] = {
> > >             INTEL_GUC_ACTION_REGISTER_CONTEXT,
> > > @@ -816,10 +1117,10 @@ static int __guc_action_register_context(struct intel_guc *guc,
> > >             offset,
> > >     };
> > > -   return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
> > > +   return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action), 0, loop);
> > >   }
> > > -static int register_context(struct intel_context *ce)
> > > +static int register_context(struct intel_context *ce, bool loop)
> > >   {
> > >     struct intel_guc *guc = ce_to_guc(ce);
> > >     u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
> > > @@ -827,11 +1128,12 @@ static int register_context(struct intel_context *ce)
> > >     trace_intel_context_register(ce);
> > > -   return __guc_action_register_context(guc, ce->guc_id, offset);
> > > +   return __guc_action_register_context(guc, ce->guc_id, offset, loop);
> > >   }
> > >   static int __guc_action_deregister_context(struct intel_guc *guc,
> > > -                                      u32 guc_id)
> > > +                                      u32 guc_id,
> > > +                                      bool loop)
> > >   {
> > >     u32 action[] = {
> > >             INTEL_GUC_ACTION_DEREGISTER_CONTEXT,
> > > @@ -839,16 +1141,16 @@ static int __guc_action_deregister_context(struct intel_guc *guc,
> > >     };
> > >     return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),
> > > -                                   G2H_LEN_DW_DEREGISTER_CONTEXT, true);
> > > +                                   G2H_LEN_DW_DEREGISTER_CONTEXT, loop);
> > >   }
> > > -static int deregister_context(struct intel_context *ce, u32 guc_id)
> > > +static int deregister_context(struct intel_context *ce, u32 guc_id, bool loop)
> > >   {
> > >     struct intel_guc *guc = ce_to_guc(ce);
> > >     trace_intel_context_deregister(ce);
> > > -   return __guc_action_deregister_context(guc, guc_id);
> > > +   return __guc_action_deregister_context(guc, guc_id, loop);
> > >   }
> > >   static intel_engine_mask_t adjust_engine_mask(u8 class, intel_engine_mask_t mask)
> > > @@ -877,7 +1179,7 @@ static void guc_context_policy_init(struct intel_engine_cs *engine,
> > >     desc->preemption_timeout = CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US;
> > >   }
> > > -static int guc_lrc_desc_pin(struct intel_context *ce)
> > > +static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
> > >   {
> > >     struct intel_runtime_pm *runtime_pm =
> > >             &ce->engine->gt->i915->runtime_pm;
> > > @@ -923,18 +1225,44 @@ static int guc_lrc_desc_pin(struct intel_context *ce)
> > >      */
> > >     if (context_registered) {
> > >             trace_intel_context_steal_guc_id(ce);
> > > -           set_context_wait_for_deregister_to_register(ce);
> > > -           intel_context_get(ce);
> > > +           if (!loop) {
> > > +                   set_context_wait_for_deregister_to_register(ce);
> > > +                   intel_context_get(ce);
> > > +           } else {
> > > +                   bool disabled;
> > > +                   unsigned long flags;
> > > +
> > > +                   /* Seal race with Reset */
> >
> > Needs to be more descriptive.
> >
>
> Again, I have a comment about this in the doc patch. Basically this goes
> back to your other questions about cycling a lock. You must check
> submission_disabled() within the lock, otherwise there is a race between
> resets and updating a context's state.
>
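
A minimal sketch of the locking pattern being described, with a hypothetical
update_context_state() standing in for the per-context state change; the point
is that the disable check and the state update happen under the same lock, so
a reset cannot observe a half-updated context:

    spin_lock_irqsave(&ce->guc_state.lock, flags);
    disabled = submission_disabled(guc);
    if (likely(!disabled))
            update_context_state(ce);      /* hypothetical helper */
    spin_unlock_irqrestore(&ce->guc_state.lock, flags);
    if (unlikely(disabled))
            return 0;      /* reset in flight, redo this work later */
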
> > > +                   spin_lock_irqsave(&ce->guc_state.lock, flags);
> > > +                   disabled = submission_disabled(guc);
> > > +                   if (likely(!disabled)) {
> > > +                           set_context_wait_for_deregister_to_register(ce);
> > > +                           intel_context_get(ce);
> > > +                   }
> > > +                   spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> > > +                   if (unlikely(disabled)) {
> > > +                           reset_lrc_desc(guc, desc_idx);
> > > +                           return 0;       /* Will get registered later */
> > > +                   }
> > > +           }
> > >             /*
> > >              * If stealing the guc_id, this ce has the same guc_id as the
> > >              * context whos guc_id was stole.
> > >              */
> > >             with_intel_runtime_pm(runtime_pm, wakeref)
> > > -                   ret = deregister_context(ce, ce->guc_id);
> > > +                   ret = deregister_context(ce, ce->guc_id, loop);
> > > +           if (unlikely(ret == -EBUSY)) {
> > > +                   clr_context_wait_for_deregister_to_register(ce);
> > > +                   intel_context_put(ce);
> > > +           }
> > >     } else {
> > >             with_intel_runtime_pm(runtime_pm, wakeref)
> > > -                   ret = register_context(ce);
> > > +                   ret = register_context(ce, loop);
> > > +           if (unlikely(ret == -EBUSY))
> > > +                   reset_lrc_desc(guc, desc_idx);
> > > +           else if (unlikely(ret == -ENODEV))
> > > +                   ret = 0;        /* Will get registered later */
> > >     }
> > >     return ret;
> > > @@ -997,7 +1325,6 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
> > >     GEM_BUG_ON(guc_id == GUC_INVALID_LRC_ID);
> > >     trace_intel_context_sched_disable(ce);
> > > -   intel_context_get(ce);
> > >     guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),
> > >                              G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
> > > @@ -1007,6 +1334,7 @@ static u16 prep_context_pending_disable(struct intel_context *ce)
> > >   {
> > >     set_context_pending_disable(ce);
> > >     clr_context_enabled(ce);
> > > +   intel_context_get(ce);
> > >     return ce->guc_id;
> > >   }
> > > @@ -1019,7 +1347,7 @@ static void guc_context_sched_disable(struct intel_context *ce)
> > >     u16 guc_id;
> > >     intel_wakeref_t wakeref;
> > > -   if (context_guc_id_invalid(ce) ||
> > > +   if (submission_disabled(guc) || context_guc_id_invalid(ce) ||
> > >         !lrc_desc_registered(guc, ce->guc_id)) {
> > >             clr_context_enabled(ce);
> > >             goto unpin;
> > > @@ -1053,19 +1381,13 @@ static void guc_context_sched_disable(struct intel_context *ce)
> > >   static inline void guc_lrc_desc_unpin(struct intel_context *ce)
> > >   {
> > > -   struct intel_engine_cs *engine = ce->engine;
> > > -   struct intel_guc *guc = &engine->gt->uc.guc;
> > > -   unsigned long flags;
> > > +   struct intel_guc *guc = ce_to_guc(ce);
> > >     GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id));
> > >     GEM_BUG_ON(ce != __get_context(guc, ce->guc_id));
> > >     GEM_BUG_ON(context_enabled(ce));
> > > -   spin_lock_irqsave(&ce->guc_state.lock, flags);
> > > -   set_context_destroyed(ce);
> > > -   spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> > > -
> > > -   deregister_context(ce, ce->guc_id);
> > > +   deregister_context(ce, ce->guc_id, true);
> > >   }
> > >   static void __guc_context_destroy(struct intel_context *ce)
> > > @@ -1093,13 +1415,15 @@ static void guc_context_destroy(struct kref *kref)
> > >     struct intel_guc *guc = &ce->engine->gt->uc.guc;
> > >     intel_wakeref_t wakeref;
> > >     unsigned long flags;
> > > +   bool disabled;
> > >     /*
> > >      * If the guc_id is invalid this context has been stolen and we can free
> > >      * it immediately. Also can be freed immediately if the context is not
> > >      * registered with the GuC.
> > >      */
> > > -   if (context_guc_id_invalid(ce) ||
> > > +   if (submission_disabled(guc) ||
> > > +       context_guc_id_invalid(ce) ||
> > >         !lrc_desc_registered(guc, ce->guc_id)) {
> > >             release_guc_id(guc, ce);
> > >             __guc_context_destroy(ce);
> > > @@ -1126,6 +1450,18 @@ static void guc_context_destroy(struct kref *kref)
> > >             list_del_init(&ce->guc_id_link);
> > >     spin_unlock_irqrestore(&guc->contexts_lock, flags);
> > > +   /* Seal race with Reset */
> > > +   spin_lock_irqsave(&ce->guc_state.lock, flags);
> > > +   disabled = submission_disabled(guc);
> > > +   if (likely(!disabled))
> > > +           set_context_destroyed(ce);
> > > +   spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> > > +   if (unlikely(disabled)) {
> > > +           release_guc_id(guc, ce);
> > > +           __guc_context_destroy(ce);
> > > +           return;
> >
> > Same as above, needs a better comment. It is also hard for the reader to know
> > if the snapshot of disabled taken under the lock is still valid after the lock
> > has been released, and why.
> >
>
> Will pull the doc comment into this patch.
>
> Matt
>
> > Regards,
> >
> > Tvrtko
> >
> > > +   }
> > > +
> > >     /*
> > >      * We defer GuC context deregistration until the context is destroyed
> > >      * in order to save on CTBs. With this optimization ideally we only need
> > > @@ -1148,6 +1484,33 @@ static int guc_context_alloc(struct intel_context *ce)
> > >     return lrc_alloc(ce, ce->engine);
> > >   }
> > > +static void add_to_context(struct i915_request *rq)
> > > +{
> > > +   struct intel_context *ce = rq->context;
> > > +
> > > +   spin_lock(&ce->guc_active.lock);
> > > +   list_move_tail(&rq->sched.link, &ce->guc_active.requests);
> > > +   spin_unlock(&ce->guc_active.lock);
> > > +}
> > > +
> > > +static void remove_from_context(struct i915_request *rq)
> > > +{
> > > +   struct intel_context *ce = rq->context;
> > > +
> > > +   spin_lock_irq(&ce->guc_active.lock);
> > > +
> > > +   list_del_init(&rq->sched.link);
> > > +   clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> > > +
> > > +   /* Prevent further __await_execution() registering a cb, then flush */
> > > +   set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
> > > +
> > > +   spin_unlock_irq(&ce->guc_active.lock);
> > > +
> > > +   atomic_dec(&ce->guc_id_ref);
> > > +   i915_request_notify_execute_cb_imm(rq);
> > > +}
> > > +
> > >   static const struct intel_context_ops guc_context_ops = {
> > >     .alloc = guc_context_alloc,
> > > @@ -1186,8 +1549,6 @@ static void guc_signal_context_fence(struct intel_context *ce)
> > >   {
> > >     unsigned long flags;
> > > -   GEM_BUG_ON(!context_wait_for_deregister_to_register(ce));
> > > -
> > >     spin_lock_irqsave(&ce->guc_state.lock, flags);
> > >     clr_context_wait_for_deregister_to_register(ce);
> > >     __guc_signal_context_fence(ce);
> > > @@ -1196,8 +1557,9 @@ static void guc_signal_context_fence(struct intel_context *ce)
> > >   static bool context_needs_register(struct intel_context *ce, bool new_guc_id)
> > >   {
> > > -   return new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
> > > -           !lrc_desc_registered(ce_to_guc(ce), ce->guc_id);
> > > +   return (new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
> > > +           !lrc_desc_registered(ce_to_guc(ce), ce->guc_id)) &&
> > > +           !submission_disabled(ce_to_guc(ce));
> > >   }
> > >   static int guc_request_alloc(struct i915_request *rq)
> > > @@ -1256,8 +1618,10 @@ static int guc_request_alloc(struct i915_request *rq)
> > >             return ret;;
> > >     if (context_needs_register(ce, !!ret)) {
> > > -           ret = guc_lrc_desc_pin(ce);
> > > +           ret = guc_lrc_desc_pin(ce, true);
> > >             if (unlikely(ret)) {    /* unwind */
> > > +                   if (ret == -EDEADLK)
> > > +                           disable_submission(guc);
> > >                     atomic_dec(&ce->guc_id_ref);
> > >                     unpin_guc_id(guc, ce);
> > >                     return ret;
> > > @@ -1294,20 +1658,6 @@ static int guc_request_alloc(struct i915_request *rq)
> > >     return 0;
> > >   }
> > > -static struct intel_engine_cs *
> > > -guc_virtual_get_sibling(struct intel_engine_cs *ve, unsigned int sibling)
> > > -{
> > > -   struct intel_engine_cs *engine;
> > > -   intel_engine_mask_t tmp, mask = ve->mask;
> > > -   unsigned int num_siblings = 0;
> > > -
> > > -   for_each_engine_masked(engine, ve->gt, mask, tmp)
> > > -           if (num_siblings++ == sibling)
> > > -                   return engine;
> > > -
> > > -   return NULL;
> > > -}
> > > -
> > >   static int guc_virtual_context_pre_pin(struct intel_context *ce,
> > >                                    struct i915_gem_ww_ctx *ww,
> > >                                    void **vaddr)
> > > @@ -1516,7 +1866,7 @@ static inline void guc_kernel_context_pin(struct intel_guc *guc,
> > >   {
> > >     if (context_guc_id_invalid(ce))
> > >             pin_guc_id(guc, ce);
> > > -   guc_lrc_desc_pin(ce);
> > > +   guc_lrc_desc_pin(ce, true);
> > >   }
> > >   static inline void guc_init_lrc_mapping(struct intel_guc *guc)
> > > @@ -1582,13 +1932,15 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
> > >     engine->cops = &guc_context_ops;
> > >     engine->request_alloc = guc_request_alloc;
> > >     engine->bump_serial = guc_bump_serial;
> > > +   engine->add_active_request = add_to_context;
> > > +   engine->remove_active_request = remove_from_context;
> > >     engine->sched_engine->schedule = i915_schedule;
> > > -   engine->reset.prepare = guc_reset_prepare;
> > > -   engine->reset.rewind = guc_reset_rewind;
> > > -   engine->reset.cancel = guc_reset_cancel;
> > > -   engine->reset.finish = guc_reset_finish;
> > > +   engine->reset.prepare = guc_reset_nop;
> > > +   engine->reset.rewind = guc_rewind_nop;
> > > +   engine->reset.cancel = guc_reset_nop;
> > > +   engine->reset.finish = guc_reset_nop;
> > >     engine->emit_flush = gen8_emit_flush_xcs;
> > >     engine->emit_init_breadcrumb = gen8_emit_init_breadcrumb;
> > > @@ -1764,7 +2116,7 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
> > >              * register this context.
> > >              */
> > >             with_intel_runtime_pm(runtime_pm, wakeref)
> > > -                   register_context(ce);
> > > +                   register_context(ce, true);
> > >             guc_signal_context_fence(ce);
> > >             intel_context_put(ce);
> > >     } else if (context_destroyed(ce)) {
> > > @@ -1946,6 +2298,10 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
> > >                              "v%dx%d", ve->base.class, count);
> > >                     ve->base.context_size = sibling->context_size;
> > > +                   ve->base.add_active_request =
> > > +                           sibling->add_active_request;
> > > +                   ve->base.remove_active_request =
> > > +                           sibling->remove_active_request;
> > >                     ve->base.emit_bb_start = sibling->emit_bb_start;
> > >                     ve->base.emit_flush = sibling->emit_flush;
> > >                     ve->base.emit_init_breadcrumb =
> > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> > > index ab0789d66e06..d5ccffbb89ae 100644
> > > --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> > > @@ -565,12 +565,44 @@ void intel_uc_reset_prepare(struct intel_uc *uc)
> > >   {
> > >     struct intel_guc *guc = &uc->guc;
> > > +   /* Firmware expected to be running when this function is called */
> > >     if (!intel_guc_is_ready(guc))
> > > -           return;
> > > +           goto sanitize;
> > > +
> > > +   if (intel_uc_uses_guc_submission(uc))
> > > +           intel_guc_submission_reset_prepare(guc);
> > > +sanitize:
> > >     __uc_sanitize(uc);
> > >   }
> > > +void intel_uc_reset(struct intel_uc *uc, bool stalled)
> > > +{
> > > +   struct intel_guc *guc = &uc->guc;
> > > +
> > > +   /* Firmware can not be running when this function is called  */
> > > +   if (intel_uc_uses_guc_submission(uc))
> > > +           intel_guc_submission_reset(guc, stalled);
> > > +}
> > > +
> > > +void intel_uc_reset_finish(struct intel_uc *uc)
> > > +{
> > > +   struct intel_guc *guc = &uc->guc;
> > > +
> > > +   /* Firmware expected to be running when this function is called */
> > > +   if (intel_guc_is_fw_running(guc) && intel_uc_uses_guc_submission(uc))
> > > +           intel_guc_submission_reset_finish(guc);
> > > +}
> > > +
> > > +void intel_uc_cancel_requests(struct intel_uc *uc)
> > > +{
> > > +   struct intel_guc *guc = &uc->guc;
> > > +
> > > +   /* Firmware can not be running when this function is called  */
> > > +   if (intel_uc_uses_guc_submission(uc))
> > > +           intel_guc_submission_cancel_requests(guc);
> > > +}
> > > +
> > >   void intel_uc_runtime_suspend(struct intel_uc *uc)
> > >   {
> > >     struct intel_guc *guc = &uc->guc;
> > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> > > index c4cef885e984..eaa3202192ac 100644
> > > --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> > > @@ -37,6 +37,9 @@ void intel_uc_driver_late_release(struct intel_uc *uc);
> > >   void intel_uc_driver_remove(struct intel_uc *uc);
> > >   void intel_uc_init_mmio(struct intel_uc *uc);
> > >   void intel_uc_reset_prepare(struct intel_uc *uc);
> > > +void intel_uc_reset(struct intel_uc *uc, bool stalled);
> > > +void intel_uc_reset_finish(struct intel_uc *uc);
> > > +void intel_uc_cancel_requests(struct intel_uc *uc);
> > >   void intel_uc_suspend(struct intel_uc *uc);
> > >   void intel_uc_runtime_suspend(struct intel_uc *uc);
> > >   int intel_uc_resume(struct intel_uc *uc);
> > > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > > index 0b96b824ea06..4855cf7ebe21 100644
> > > --- a/drivers/gpu/drm/i915/i915_request.c
> > > +++ b/drivers/gpu/drm/i915/i915_request.c
> > > @@ -194,7 +194,7 @@ static bool irq_work_imm(struct irq_work *wrk)
> > >     return false;
> > >   }
> > > -static void __notify_execute_cb_imm(struct i915_request *rq)
> > > +void i915_request_notify_execute_cb_imm(struct i915_request *rq)
> > >   {
> > >     __notify_execute_cb(rq, irq_work_imm);
> > >   }
> > > @@ -268,37 +268,6 @@ i915_request_active_engine(struct i915_request *rq,
> > >     return ret;
> > >   }
> > > -
> > > -static void remove_from_engine(struct i915_request *rq)
> > > -{
> > > -   struct intel_engine_cs *engine, *locked;
> > > -
> > > -   /*
> > > -    * Virtual engines complicate acquiring the engine timeline lock,
> > > -    * as their rq->engine pointer is not stable until under that
> > > -    * engine lock. The simple ploy we use is to take the lock then
> > > -    * check that the rq still belongs to the newly locked engine.
> > > -    */
> > > -   locked = READ_ONCE(rq->engine);
> > > -   spin_lock_irq(&locked->sched_engine->lock);
> > > -   while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
> > > -           spin_unlock(&locked->sched_engine->lock);
> > > -           spin_lock(&engine->sched_engine->lock);
> > > -           locked = engine;
> > > -   }
> > > -   list_del_init(&rq->sched.link);
> > > -
> > > -   clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> > > -   clear_bit(I915_FENCE_FLAG_HOLD, &rq->fence.flags);
> > > -
> > > -   /* Prevent further __await_execution() registering a cb, then flush */
> > > -   set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
> > > -
> > > -   spin_unlock_irq(&locked->sched_engine->lock);
> > > -
> > > -   __notify_execute_cb_imm(rq);
> > > -}
> > > -
> > >   static void __rq_init_watchdog(struct i915_request *rq)
> > >   {
> > >     rq->watchdog.timer.function = NULL;
> > > @@ -395,9 +364,7 @@ bool i915_request_retire(struct i915_request *rq)
> > >      * after removing the breadcrumb and signaling it, so that we do not
> > >      * inadvertently attach the breadcrumb to a completed request.
> > >      */
> > > -   if (!list_empty(&rq->sched.link))
> > > -           remove_from_engine(rq);
> > > -   atomic_dec(&rq->context->guc_id_ref);
> > > +   rq->engine->remove_active_request(rq);
> > >     GEM_BUG_ON(!llist_empty(&rq->execute_cb));
> > >     __list_del_entry(&rq->link); /* poison neither prev/next (RCU walks) */
> > > @@ -539,7 +506,7 @@ __await_execution(struct i915_request *rq,
> > >     if (llist_add(&cb->work.node.llist, &signal->execute_cb)) {
> > >             if (i915_request_is_active(signal) ||
> > >                 __request_in_flight(signal))
> > > -                   __notify_execute_cb_imm(signal);
> > > +                   i915_request_notify_execute_cb_imm(signal);
> > >     }
> > >     return 0;
> > > @@ -676,7 +643,7 @@ bool __i915_request_submit(struct i915_request *request)
> > >     result = true;
> > >     GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
> > > -   list_move_tail(&request->sched.link, &engine->sched_engine->requests);
> > > +   engine->add_active_request(request);
> > >   active:
> > >     clear_bit(I915_FENCE_FLAG_PQUEUE, &request->fence.flags);
> > >     set_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags);
> > > diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
> > > index f870cd75a001..bcc6340c505e 100644
> > > --- a/drivers/gpu/drm/i915/i915_request.h
> > > +++ b/drivers/gpu/drm/i915/i915_request.h
> > > @@ -649,4 +649,6 @@ bool
> > >   i915_request_active_engine(struct i915_request *rq,
> > >                        struct intel_engine_cs **active);
> > > +void i915_request_notify_execute_cb_imm(struct i915_request *rq);
> > > +
> > >   #endif /* I915_REQUEST_H */
> > >
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 64/97] drm/i915/guc: Reset implementation for new GuC interface
  2021-06-04  8:16       ` Daniel Vetter
@ 2021-06-04 18:02         ` Matthew Brost
  0 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-06-04 18:02 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Jason Ekstrand, Tvrtko Ursulin, intel-gfx, dri-devel, Daniel Vetter

On Fri, Jun 04, 2021 at 10:16:14AM +0200, Daniel Vetter wrote:
> On Fri, Jun 4, 2021 at 5:25 AM Matthew Brost <matthew.brost@intel.com> wrote:
> >
> > On Wed, Jun 02, 2021 at 03:33:43PM +0100, Tvrtko Ursulin wrote:
> > >
> > > On 06/05/2021 20:14, Matthew Brost wrote:
> > > > Reset implementation for new GuC interface. This is the legacy reset
> > > > implementation which is called when the i915 owns the engine hang check.
> > > > Future patches will offload the engine hang check to GuC but we will
> > > > continue to maintain this legacy path as a fallback and this code path
> > > > is also required if the GuC dies.
> > > >
> > > > With the new GuC interface it is not possible to reset individual
> > > > engines - it is only possible to reset the GPU entirely. This patch
> > > > forces an entire chip reset if any engine hangs.
> > > >
> > > > Cc: John Harrison <john.c.harrison@intel.com>
> > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > ---
> > > >   drivers/gpu/drm/i915/gt/intel_context.c       |   3 +
> > > >   drivers/gpu/drm/i915/gt/intel_context_types.h |   7 +
> > > >   drivers/gpu/drm/i915/gt/intel_engine_types.h  |   6 +
> > > >   .../drm/i915/gt/intel_execlists_submission.c  |  40 ++
> > > >   drivers/gpu/drm/i915/gt/intel_gt_pm.c         |   6 +-
> > > >   drivers/gpu/drm/i915/gt/intel_reset.c         |  18 +-
> > > >   .../gpu/drm/i915/gt/intel_ring_submission.c   |  22 +
> > > >   drivers/gpu/drm/i915/gt/mock_engine.c         |  31 +
> > > >   drivers/gpu/drm/i915/gt/uc/intel_guc.c        |  16 +-
> > > >   drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   8 +-
> > > >   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 580 ++++++++++++++----
> > > >   drivers/gpu/drm/i915/gt/uc/intel_uc.c         |  34 +-
> > > >   drivers/gpu/drm/i915/gt/uc/intel_uc.h         |   3 +
> > > >   drivers/gpu/drm/i915/i915_request.c           |  41 +-
> > > >   drivers/gpu/drm/i915/i915_request.h           |   2 +
> > > >   15 files changed, 643 insertions(+), 174 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
> > > > index b24a1b7a3f88..2f01437056a8 100644
> > > > --- a/drivers/gpu/drm/i915/gt/intel_context.c
> > > > +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> > > > @@ -392,6 +392,9 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
> > > >     spin_lock_init(&ce->guc_state.lock);
> > > >     INIT_LIST_HEAD(&ce->guc_state.fences);
> > > > +   spin_lock_init(&ce->guc_active.lock);
> > > > +   INIT_LIST_HEAD(&ce->guc_active.requests);
> > > > +
> > > >     ce->guc_id = GUC_INVALID_LRC_ID;
> > > >     INIT_LIST_HEAD(&ce->guc_id_link);
> > > > diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
> > > > index 6945963a31ba..b63c8cf7823b 100644
> > > > --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> > > > +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> > > > @@ -165,6 +165,13 @@ struct intel_context {
> > > >             struct list_head fences;
> > > >     } guc_state;
> > > > +   struct {
> > > > +           /** lock: protects everything in guc_active */
> > > > +           spinlock_t lock;
> > > > +           /** requests: active requests on this context */
> > > > +           struct list_head requests;
> > > > +   } guc_active;
> > >
> > > More accounting, yeah, this is more of that where GuC gives with one hand
> > > and takes away with the other. :(
> > >
> >
> > Yep, but we can probably drop this once we switch to the DRM scheduler.
> > The drm_gpu_scheduler has a list of jobs, and if we don't mind searching
> > the whole thing on a reset that will probably work too. I think the only
> > reason we have a per-context list is because of feedback I received a
> > while ago saying resets are per context with GuC, so we keep a list on the
> > context; the engine list didn't really fit either. I'll make a note to
> > circle back to this when we hook into the DRM scheduler.
> 
> Please add a FIXME or similar to the kerneldoc comment for stuff like
> this. We have a lot of things to recheck once the big picture is
> sorted, and it's easy to forget them.
> 
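
For illustration, a FIXME of the kind Daniel is asking for could sit in the
guc_active kerneldoc shown above; the wording here is only a suggestion:

    struct {
            /** lock: protects everything in guc_active */
            spinlock_t lock;
            /**
             * requests: active requests on this context
             *
             * FIXME: per-context request tracking only exists for the
             * reset path; revisit (and likely drop) once submission is
             * hooked into the DRM scheduler, which already tracks
             * in-flight jobs.
             */
            struct list_head requests;
    } guc_active;
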

Sure, will do in the next rev. But I think a lot of things like this will
come naturally when we switch over to the DRM scheduler. I didn't quite get
to deleting this list in my PoC, but I could clearly see it was no longer
needed and I actually have a TODO in the code to delete it.

Matt

> Similar for anything else where we have opens about how to structure
> things once it's cut over.
> -Daniel
> 
> >
> > > > +
> > > >     /* GuC scheduling state that does not require a lock. */
> > > >     atomic_t guc_sched_state_no_lock;
> > > > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > > > index f7b6eed586ce..b84562b2708b 100644
> > > > --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > > > +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > > > @@ -432,6 +432,12 @@ struct intel_engine_cs {
> > > >      */
> > > >     void            (*release)(struct intel_engine_cs *engine);
> > > > +   /*
> > > > +    * Add / remove request from engine active tracking
> > > > +    */
> > > > +   void            (*add_active_request)(struct i915_request *rq);
> > > > +   void            (*remove_active_request)(struct i915_request *rq);
> > > > +
> > > >     struct intel_engine_execlists execlists;
> > > >     /*
> > > > diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > > > index 396b1356ea3e..54518b64bdbd 100644
> > > > --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > > > +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > > > @@ -3117,6 +3117,42 @@ static void execlists_park(struct intel_engine_cs *engine)
> > > >     cancel_timer(&engine->execlists.preempt);
> > > >   }
> > > > +static void add_to_engine(struct i915_request *rq)
> > > > +{
> > > > +   lockdep_assert_held(&rq->engine->sched_engine->lock);
> > > > +   list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
> > > > +}
> > > > +
> > > > +static void remove_from_engine(struct i915_request *rq)
> > > > +{
> > > > +   struct intel_engine_cs *engine, *locked;
> > > > +
> > > > +   /*
> > > > +    * Virtual engines complicate acquiring the engine timeline lock,
> > > > +    * as their rq->engine pointer is not stable until under that
> > > > +    * engine lock. The simple ploy we use is to take the lock then
> > > > +    * check that the rq still belongs to the newly locked engine.
> > > > +    */
> > > > +   locked = READ_ONCE(rq->engine);
> > > > +   spin_lock_irq(&locked->sched_engine->lock);
> > > > +   while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
> > > > +           spin_unlock(&locked->sched_engine->lock);
> > > > +           spin_lock(&engine->sched_engine->lock);
> > > > +           locked = engine;
> > > > +   }
> > >
> > > Could use i915_request_active_engine although tbf I don't remember why I did
> > > not convert all callers when I added it. Perhaps I just did not find them
> > > all.
> > >
> >
> > I think this is a copy-paste from the existing code, or at least it should
> > be. It just moves implicit execlists behavior from common code to an
> > execlists-specific vfunc.
> >
> > > > +   list_del_init(&rq->sched.link);
> > > > +
> > > > +   clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> > > > +   clear_bit(I915_FENCE_FLAG_HOLD, &rq->fence.flags);
> > > > +
> > > > +   /* Prevent further __await_execution() registering a cb, then flush */
> > > > +   set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
> > > > +
> > > > +   spin_unlock_irq(&locked->sched_engine->lock);
> > > > +
> > > > +   i915_request_notify_execute_cb_imm(rq);
> > > > +}
> > > > +
> > > >   static bool can_preempt(struct intel_engine_cs *engine)
> > > >   {
> > > >     if (INTEL_GEN(engine->i915) > 8)
> > > > @@ -3214,6 +3250,8 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
> > > >     engine->cops = &execlists_context_ops;
> > > >     engine->request_alloc = execlists_request_alloc;
> > > >     engine->bump_serial = execlist_bump_serial;
> > > > +   engine->add_active_request = add_to_engine;
> > > > +   engine->remove_active_request = remove_from_engine;
> > > >     engine->reset.prepare = execlists_reset_prepare;
> > > >     engine->reset.rewind = execlists_reset_rewind;
> > > > @@ -3915,6 +3953,8 @@ execlists_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
> > > >             ve->base.sched_engine->kick_backend =
> > > >                     sibling->sched_engine->kick_backend;
> > > > +           ve->base.add_active_request = sibling->add_active_request;
> > > > +           ve->base.remove_active_request = sibling->remove_active_request;
> > > >             ve->base.emit_bb_start = sibling->emit_bb_start;
> > > >             ve->base.emit_flush = sibling->emit_flush;
> > > >             ve->base.emit_init_breadcrumb = sibling->emit_init_breadcrumb;
> > > > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> > > > index aef3084e8b16..463a6ae605a0 100644
> > > > --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> > > > +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> > > > @@ -174,8 +174,6 @@ static void gt_sanitize(struct intel_gt *gt, bool force)
> > > >     if (intel_gt_is_wedged(gt))
> > > >             intel_gt_unset_wedged(gt);
> > > > -   intel_uc_sanitize(&gt->uc);
> > > > -
> > > >     for_each_engine(engine, gt, id)
> > > >             if (engine->reset.prepare)
> > > >                     engine->reset.prepare(engine);
> > > > @@ -191,6 +189,8 @@ static void gt_sanitize(struct intel_gt *gt, bool force)
> > > >                     __intel_engine_reset(engine, false);
> > > >     }
> > > > +   intel_uc_reset(&gt->uc, false);
> > > > +
> > > >     for_each_engine(engine, gt, id)
> > > >             if (engine->reset.finish)
> > > >                     engine->reset.finish(engine);
> > > > @@ -243,6 +243,8 @@ int intel_gt_resume(struct intel_gt *gt)
> > > >             goto err_wedged;
> > > >     }
> > > > +   intel_uc_reset_finish(&gt->uc);
> > > > +
> > > >     intel_rps_enable(&gt->rps);
> > > >     intel_llc_enable(&gt->llc);
> > > > diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> > > > index d5094be6d90f..ce3ef26ffe2d 100644
> > > > --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> > > > +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> > > > @@ -758,6 +758,8 @@ static int gt_reset(struct intel_gt *gt, intel_engine_mask_t stalled_mask)
> > > >             __intel_engine_reset(engine, stalled_mask & engine->mask);
> > > >     local_bh_enable();
> > > > +   intel_uc_reset(&gt->uc, true);
> > > > +
> > > >     intel_ggtt_restore_fences(gt->ggtt);
> > > >     return err;
> > > > @@ -782,6 +784,8 @@ static void reset_finish(struct intel_gt *gt, intel_engine_mask_t awake)
> > > >             if (awake & engine->mask)
> > > >                     intel_engine_pm_put(engine);
> > > >     }
> > > > +
> > > > +   intel_uc_reset_finish(&gt->uc);
> > > >   }
> > > >   static void nop_submit_request(struct i915_request *request)
> > > > @@ -835,6 +839,7 @@ static void __intel_gt_set_wedged(struct intel_gt *gt)
> > > >     for_each_engine(engine, gt, id)
> > > >             if (engine->reset.cancel)
> > > >                     engine->reset.cancel(engine);
> > > > +   intel_uc_cancel_requests(&gt->uc);
> > > >     local_bh_enable();
> > > >     reset_finish(gt, awake);
> > > > @@ -1123,6 +1128,9 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)
> > > >     ENGINE_TRACE(engine, "flags=%lx\n", gt->reset.flags);
> > > >     GEM_BUG_ON(!test_bit(I915_RESET_ENGINE + engine->id, &gt->reset.flags));
> > > > +   if (intel_engine_uses_guc(engine))
> > > > +           return -ENODEV;
> > > > +
> > > >     if (!intel_engine_pm_get_if_awake(engine))
> > > >             return 0;
> > > > @@ -1133,13 +1141,10 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)
> > > >                        "Resetting %s for %s\n", engine->name, msg);
> > > >     atomic_inc(&engine->i915->gpu_error.reset_engine_count[engine->uabi_class]);
> > > > -   if (intel_engine_uses_guc(engine))
> > > > -           ret = intel_guc_reset_engine(&engine->gt->uc.guc, engine);
> > > > -   else
> > > > -           ret = intel_gt_reset_engine(engine);
> > > > +   ret = intel_gt_reset_engine(engine);
> > > >     if (ret) {
> > > >             /* If we fail here, we expect to fallback to a global reset */
> > > > -           ENGINE_TRACE(engine, "Failed to reset, err: %d\n", ret);
> > > > +           ENGINE_TRACE(engine, "Failed to reset %s, err: %d\n", engine->name, ret);
> > > >             goto out;
> > > >     }
> > > > @@ -1273,7 +1278,8 @@ void intel_gt_handle_error(struct intel_gt *gt,
> > > >      * Try engine reset when available. We fall back to full reset if
> > > >      * single reset fails.
> > > >      */
> > > > -   if (intel_has_reset_engine(gt) && !intel_gt_is_wedged(gt)) {
> > > > +   if (!intel_uc_uses_guc_submission(&gt->uc) &&
> > > > +       intel_has_reset_engine(gt) && !intel_gt_is_wedged(gt)) {
> > >
> > > If the driver cannot do engine reset with GuC, could intel_has_reset_engine
> > > just say false in that case, so the GuC check wouldn't have to be added here?
> > > Also noticed this is the same open I had in 2019, and someone said it can
> > > and would be folded. ;(
> > >
> >
> > Let me look into that before the next rev; I briefly looked at this and
> > it does seem plausible this function could return false. The only concern
> > here is that reset code is notoriously delicate, so I am wary of changing
> > this in this series. We have a live list of follow-ups; this could be
> > included if it doesn't get fixed in this patch.
> >
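
Purely as a sketch of the fold being suggested, assuming the existing helper
roughly checks the reset modparam and the platform flag (the real body may
differ):

    static bool intel_has_reset_engine(struct intel_gt *gt)
    {
            if (!gt->i915->params.reset)
                    return false;

            /* GuC submission only supports full GT reset (for now) */
            if (intel_uc_uses_guc_submission(&gt->uc))
                    return false;

            return INTEL_INFO(gt->i915)->has_reset_engine;
    }

With something like that, the extra intel_uc_uses_guc_submission() check in
intel_gt_handle_error() would not be needed.
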
> > > >             local_bh_disable();
> > > >             for_each_engine_masked(engine, gt, engine_mask, tmp) {
> > > >                     BUILD_BUG_ON(I915_RESET_MODESET >= I915_RESET_ENGINE);
> > > > diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> > > > index 39dd7c4ed0a9..7d05bf16094c 100644
> > > > --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> > > > +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> > > > @@ -1050,6 +1050,25 @@ static void ring_bump_serial(struct intel_engine_cs *engine)
> > > >     engine->serial++;
> > > >   }
> > > > +static void add_to_engine(struct i915_request *rq)
> > > > +{
> > > > +   lockdep_assert_held(&rq->engine->sched_engine->lock);
> > > > +   list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
> > > > +}
> > > > +
> > > > +static void remove_from_engine(struct i915_request *rq)
> > > > +{
> > > > +   spin_lock_irq(&rq->engine->sched_engine->lock);
> > > > +   list_del_init(&rq->sched.link);
> > > > +
> > > > +   /* Prevent further __await_execution() registering a cb, then flush */
> > > > +   set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
> > > > +
> > > > +   spin_unlock_irq(&rq->engine->sched_engine->lock);
> > > > +
> > > > +   i915_request_notify_execute_cb_imm(rq);
> > > > +}
> > > > +
> > > >   static void setup_common(struct intel_engine_cs *engine)
> > > >   {
> > > >     struct drm_i915_private *i915 = engine->i915;
> > > > @@ -1067,6 +1086,9 @@ static void setup_common(struct intel_engine_cs *engine)
> > > >     engine->reset.cancel = reset_cancel;
> > > >     engine->reset.finish = reset_finish;
> > > > +   engine->add_active_request = add_to_engine;
> > > > +   engine->remove_active_request = remove_from_engine;
> > > > +
> > > >     engine->cops = &ring_context_ops;
> > > >     engine->request_alloc = ring_request_alloc;
> > > >     engine->bump_serial = ring_bump_serial;
> > > > diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
> > > > index 4d023b5cd5da..dccf5fce980a 100644
> > > > --- a/drivers/gpu/drm/i915/gt/mock_engine.c
> > > > +++ b/drivers/gpu/drm/i915/gt/mock_engine.c
> > > > @@ -235,6 +235,35 @@ static void mock_submit_request(struct i915_request *request)
> > > >     spin_unlock_irqrestore(&engine->hw_lock, flags);
> > > >   }
> > > > +static void mock_add_to_engine(struct i915_request *rq)
> > > > +{
> > > > +   lockdep_assert_held(&rq->engine->sched_engine->lock);
> > > > +   list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
> > > > +}
> > > > +
> > > > +static void mock_remove_from_engine(struct i915_request *rq)
> > > > +{
> > > > +   struct intel_engine_cs *engine, *locked;
> > > > +
> > > > +   /*
> > > > +    * Virtual engines complicate acquiring the engine timeline lock,
> > > > +    * as their rq->engine pointer is not stable until under that
> > > > +    * engine lock. The simple ploy we use is to take the lock then
> > > > +    * check that the rq still belongs to the newly locked engine.
> > > > +    */
> > > > +
> > > > +   locked = READ_ONCE(rq->engine);
> > > > +   spin_lock_irq(&locked->sched_engine->lock);
> > > > +   while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
> > > > +           spin_unlock(&locked->sched_engine->lock);
> > > > +           spin_lock(&engine->sched_engine->lock);
> > > > +           locked = engine;
> > > > +   }
> > > > +   list_del_init(&rq->sched.link);
> > > > +   spin_unlock_irq(&locked->sched_engine->lock);
> > > > +}
> > > > +
> > > > +
> > > >   static void mock_reset_prepare(struct intel_engine_cs *engine)
> > > >   {
> > > >   }
> > > > @@ -327,6 +356,8 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
> > > >     engine->base.emit_flush = mock_emit_flush;
> > > >     engine->base.emit_fini_breadcrumb = mock_emit_breadcrumb;
> > > >     engine->base.submit_request = mock_submit_request;
> > > > +   engine->base.add_active_request = mock_add_to_engine;
> > > > +   engine->base.remove_active_request = mock_remove_from_engine;
> > > >     engine->base.reset.prepare = mock_reset_prepare;
> > > >     engine->base.reset.rewind = mock_reset_rewind;
> > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> > > > index 235c1997f32d..864b14e313a3 100644
> > > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> > > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> > > > @@ -146,6 +146,9 @@ static void gen11_disable_guc_interrupts(struct intel_guc *guc)
> > > >   {
> > > >     struct intel_gt *gt = guc_to_gt(guc);
> > > > +   if (!guc->interrupts.enabled)
> > > > +           return;
> > > > +
> > > >     spin_lock_irq(&gt->irq_lock);
> > > >     guc->interrupts.enabled = false;
> > > > @@ -579,19 +582,6 @@ int intel_guc_suspend(struct intel_guc *guc)
> > > >     return 0;
> > > >   }
> > > > -/**
> > > > - * intel_guc_reset_engine() - ask GuC to reset an engine
> > > > - * @guc:   intel_guc structure
> > > > - * @engine:        engine to be reset
> > > > - */
> > > > -int intel_guc_reset_engine(struct intel_guc *guc,
> > > > -                      struct intel_engine_cs *engine)
> > > > -{
> > > > -   /* XXX: to be implemented with submission interface rework */
> > > > -
> > > > -   return -ENODEV;
> > > > -}
> > > > -
> > > >   /**
> > > >    * intel_guc_resume() - notify GuC resuming from suspend state
> > > >    * @guc:  the guc
> > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > > > index 47eaa69809e8..afea04d56494 100644
> > > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > > > @@ -243,14 +243,16 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask)
> > > >   int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout);
> > > > -int intel_guc_reset_engine(struct intel_guc *guc,
> > > > -                      struct intel_engine_cs *engine);
> > > > -
> > > >   int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
> > > >                                       const u32 *msg, u32 len);
> > > >   int intel_guc_sched_done_process_msg(struct intel_guc *guc,
> > > >                                  const u32 *msg, u32 len);
> > > > +void intel_guc_submission_reset_prepare(struct intel_guc *guc);
> > > > +void intel_guc_submission_reset(struct intel_guc *guc, bool stalled);
> > > > +void intel_guc_submission_reset_finish(struct intel_guc *guc);
> > > > +void intel_guc_submission_cancel_requests(struct intel_guc *guc);
> > > > +
> > > >   void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p);
> > > >   #endif
> > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > > index 80b89171b35a..8c093bc2d3a4 100644
> > > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > > @@ -140,7 +140,7 @@ context_wait_for_deregister_to_register(struct intel_context *ce)
> > > >   static inline void
> > > >   set_context_wait_for_deregister_to_register(struct intel_context *ce)
> > > >   {
> > > > -   /* Only should be called from guc_lrc_desc_pin() */
> > > > +   /* Only should be called from guc_lrc_desc_pin() without lock */
> > > >     ce->guc_state.sched_state |=
> > > >             SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER;
> > > >   }
> > > > @@ -240,15 +240,31 @@ static int guc_lrc_desc_pool_create(struct intel_guc *guc)
> > > >   static void guc_lrc_desc_pool_destroy(struct intel_guc *guc)
> > > >   {
> > > > +   guc->lrc_desc_pool_vaddr = NULL;
> > > >     i915_vma_unpin_and_release(&guc->lrc_desc_pool, I915_VMA_RELEASE_MAP);
> > > >   }
> > > > +static inline bool guc_submission_initialized(struct intel_guc *guc)
> > > > +{
> > > > +   return guc->lrc_desc_pool_vaddr != NULL;
> > > > +}
> > > > +
> > > >   static inline void reset_lrc_desc(struct intel_guc *guc, u32 id)
> > > >   {
> > > > -   struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
> > > > +   if (likely(guc_submission_initialized(guc))) {
> > > > +           struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
> > > > +           unsigned long flags;
> > > > -   memset(desc, 0, sizeof(*desc));
> > > > -   xa_erase_irq(&guc->context_lookup, id);
> > > > +           memset(desc, 0, sizeof(*desc));
> > > > +
> > > > +           /*
> > > > +            * xarray API doesn't have xa_erase_irqsave wrapper, so calling
> > > > +            * the lower level functions directly.
> > > > +            */
> > > > +           xa_lock_irqsave(&guc->context_lookup, flags);
> > > > +           __xa_erase(&guc->context_lookup, id);
> > > > +           xa_unlock_irqrestore(&guc->context_lookup, flags);
> > > > +   }
> > > >   }
> > > >   static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
> > > > @@ -259,7 +275,15 @@ static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
> > > >   static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
> > > >                                        struct intel_context *ce)
> > > >   {
> > > > -   xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
> > > > +   unsigned long flags;
> > > > +
> > > > +   /*
> > > > +    * xarray API doesn't have xa_save_irqsave wrapper, so calling the
> > > > +    * lower level functions directly.
> > > > +    */
> > > > +   xa_lock_irqsave(&guc->context_lookup, flags);
> > > > +   __xa_store(&guc->context_lookup, id, ce, GFP_ATOMIC);
> > > > +   xa_unlock_irqrestore(&guc->context_lookup, flags);
> > > >   }
> > > >   static int guc_submission_busy_loop(struct intel_guc* guc,
> > > > @@ -330,6 +354,8 @@ int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout)
> > > >                                     interruptible, timeout);
> > > >   }
> > > > +static int guc_lrc_desc_pin(struct intel_context *ce, bool loop);
> > > > +
> > > >   static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
> > > >   {
> > > >     int err;
> > > > @@ -337,11 +363,22 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
> > > >     u32 action[3];
> > > >     int len = 0;
> > > >     u32 g2h_len_dw = 0;
> > > > -   bool enabled = context_enabled(ce);
> > > > +   bool enabled;
> > > >     GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
> > > >     GEM_BUG_ON(context_guc_id_invalid(ce));
> > > > +   /*
> > > > +    * Corner case where the GuC firmware was blown away and reloaded while
> > > > +    * this context was pinned.
> > > > +    */
> > > > +   if (unlikely(!lrc_desc_registered(guc, ce->guc_id))) {
> > > > +           err = guc_lrc_desc_pin(ce, false);
> > > > +           if (unlikely(err))
> > > > +                   goto out;
> > > > +   }
> > > > +   enabled = context_enabled(ce);
> > > > +
> > > >     if (!enabled) {
> > > >             action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
> > > >             action[len++] = ce->guc_id;
> > > > @@ -364,6 +401,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
> > > >             intel_context_put(ce);
> > > >     }
> > > > +out:
> > > >     return err;
> > > >   }
> > > > @@ -418,15 +456,10 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
> > > >     if (submit) {
> > > >             guc_set_lrc_tail(last);
> > > >   resubmit:
> > > > -           /*
> > > > -            * We only check for -EBUSY here even though it is possible for
> > > > -            * -EDEADLK to be returned. If -EDEADLK is returned, the GuC has
> > > > -            * died and a full GPU needs to be done. The hangcheck will
> > > > -            * eventually detect that the GuC has died and trigger this
> > > > -            * reset so no need to handle -EDEADLK here.
> > > > -            */
> > > >             ret = guc_add_request(guc, last);
> > > > -           if (ret == -EBUSY) {
> > > > +           if (unlikely(ret == -EDEADLK))
> > > > +                   goto deadlk;
> > > > +           else if (ret == -EBUSY) {
> > > >                     i915_sched_engine_kick(sched_engine);
> > > >                     guc->stalled_request = last;
> > > >                     return false;
> > > > @@ -436,6 +469,11 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
> > > >     guc->stalled_request = NULL;
> > > >     return submit;
> > > > +
> > > > +deadlk:
> > > > +   sched_engine->tasklet.callback = NULL;
> > > > +   tasklet_disable_nosync(&sched_engine->tasklet);
> > > > +   return false;
> > > >   }
> > > >   static void guc_submission_tasklet(struct tasklet_struct *t)
> > > > @@ -462,29 +500,165 @@ static void cs_irq_handler(struct intel_engine_cs *engine, u16 iir)
> > > >             intel_engine_signal_breadcrumbs(engine);
> > > >   }
> > > > -static void guc_reset_prepare(struct intel_engine_cs *engine)
> > > > +static void __guc_context_destroy(struct intel_context *ce);
> > > > +static void release_guc_id(struct intel_guc *guc, struct intel_context *ce);
> > > > +static void guc_signal_context_fence(struct intel_context *ce);
> > > > +
> > > > +static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
> > > >   {
> > > > -   struct i915_sched_engine * const sched_engine = engine->sched_engine;
> > > > +   struct intel_context *ce;
> > > > +   unsigned long index, flags;
> > > > +   bool pending_disable, pending_enable, deregister, destroyed;
> > > > -   ENGINE_TRACE(engine, "\n");
> > > > +   xa_for_each(&guc->context_lookup, index, ce) {
> > > > +           /* Flush context */
> > > > +           spin_lock_irqsave(&ce->guc_state.lock, flags);
> > > > +           spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> > >
> > > Very unusual pattern - what does it do?
> > >
> >
> > The comment below tries to explain this. Basically, cycling the lock
> > guarantees that submission_disabled() is visible to all callers that touch
> > the below flags.
> >
> > > > +
> > > > +           /*
> > > > +            * Once we are at this point submission_disabled() is guaranteed
> > > > +            * to visible to all callers who set the below flags (see above
> > > > +            * flush and flushes in reset_prepare). If submission_disabled()
> > > > +            * is set, the caller shouldn't set these flags.
> > > > +            */
> > > > +
> > > > +           destroyed = context_destroyed(ce);
> > > > +           pending_enable = context_pending_enable(ce);
> > > > +           pending_disable = context_pending_disable(ce);
> > > > +           deregister = context_wait_for_deregister_to_register(ce);
> > > > +           init_sched_state(ce);
> > > > +
> > > > +           if (pending_enable || destroyed || deregister) {
> > > > +                   atomic_dec(&guc->outstanding_submission_g2h);
> > > > +                   if (deregister)
> > > > +                           guc_signal_context_fence(ce);
> > > > +                   if (destroyed) {
> > > > +                           release_guc_id(guc, ce);
> > > > +                           __guc_context_destroy(ce);
> > > > +                   }
> > > > +                   if (pending_enable|| deregister)
> > > > +                           intel_context_put(ce);
> > > > +           }
> > > > +
> > > > +           /* Not mutualy exclusive with above if statement. */
> > > > +           if (pending_disable) {
> > > > +                   guc_signal_context_fence(ce);
> > > > +                   intel_context_sched_disable_unpin(ce);
> > > > +                   atomic_dec(&guc->outstanding_submission_g2h);
> > > > +                   intel_context_put(ce);
> > > > +           }
> > >
> > > Yeah, this function is a taste of the state machine I think is _extremely_
> > > hard to review and know with any confidence it does the right thing.
> > >
> >
> > What is the other option? Block every time we issue an asynchronous
> > command to the GuC? If we want to do everything asynchronously we have to
> > track state and take further actions when the GuC finally responds. We
> > also have to deal with the GuC dying and take those actions on our
> > own.
> >
> > Luckily we do have several others aside from myself who understand
> > this quite well.
> >
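
The bookkeeping pattern being defended, in sketch form; the pending-reply flag
helpers are hypothetical, only outstanding_submission_g2h matches the patch:

    /* When firing an H2G action that expects a G2H reply */
    static void send_async_h2g(struct intel_guc *guc, struct intel_context *ce)
    {
            set_context_pending_reply(ce);          /* hypothetical flag */
            atomic_inc(&guc->outstanding_submission_g2h);
            /* ... emit the H2G over the CT buffer ... */
    }

    /* In the G2H handler, once the GuC responds */
    static void handle_g2h_reply(struct intel_guc *guc, struct intel_context *ce)
    {
            clr_context_pending_reply(ce);
            atomic_dec(&guc->outstanding_submission_g2h);
    }

    /*
     * On reset, replies that will never arrive are scrubbed by walking
     * guc->context_lookup and completing the pending action locally.
     */
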
> > > > +   }
> > > > +}
> > > > +
> > > > +static inline bool
> > > > +submission_disabled(struct intel_guc *guc)
> > > > +{
> > > > +   struct i915_sched_engine * const sched_engine = guc->sched_engine;
> > > > +
> > > > +   return unlikely(!__tasklet_is_enabled(&sched_engine->tasklet));
> > > > +}
> > > > +
> > > > +static void disable_submission(struct intel_guc *guc)
> > > > +{
> > > > +   struct i915_sched_engine * const sched_engine = guc->sched_engine;
> > > > +
> > > > +   if (__tasklet_is_enabled(&sched_engine->tasklet)) {
> > > > +           GEM_BUG_ON(!guc->ct.enabled);
> > > > +           __tasklet_disable_sync_once(&sched_engine->tasklet);
> > > > +           sched_engine->tasklet.callback = NULL;
> > > > +   }
> > > > +}
> > > > +
> > > > +static void enable_submission(struct intel_guc *guc)
> > > > +{
> > > > +   struct i915_sched_engine * const sched_engine = guc->sched_engine;
> > > > +   unsigned long flags;
> > > > +
> > > > +   spin_lock_irqsave(&guc->sched_engine->lock, flags);
> > > > +   sched_engine->tasklet.callback = guc_submission_tasklet;
> > > > +   wmb();
> > >
> > > All memory barriers must be documented.
> > >
> >
> > Found that out from checkpatch the other day, will fix.
> >
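
A possible way to document that barrier, going by what the surrounding code
does (exact wording is of course up to the author):

    sched_engine->tasklet.callback = guc_submission_tasklet;

    /*
     * Ensure the callback pointer is published before the tasklet is
     * re-enabled and kicked below, so a tasklet run on another CPU
     * cannot observe an enabled tasklet with a NULL callback.
     */
    wmb();
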
> > > > +   if (!__tasklet_is_enabled(&sched_engine->tasklet) &&
> > > > +       __tasklet_enable(&sched_engine->tasklet)) {
> > > > +           GEM_BUG_ON(!guc->ct.enabled);
> > > > +
> > > > +           /* And kick in case we missed a new request submission. */
> > > > +           i915_sched_engine_hi_kick(sched_engine);
> > > > +   }
> > > > +   spin_unlock_irqrestore(&guc->sched_engine->lock, flags);
> > > > +}
> > > > +
> > > > +static void guc_flush_submissions(struct intel_guc *guc)
> > > > +{
> > > > +   struct i915_sched_engine * const sched_engine = guc->sched_engine;
> > > > +   unsigned long flags;
> > > > +
> > > > +   spin_lock_irqsave(&sched_engine->lock, flags);
> > > > +   spin_unlock_irqrestore(&sched_engine->lock, flags);
> > >
> > > Oh right, more of this. No idea.
> > >
> >
> > Same as above. If you change some state and then cycle a lock, it is
> > guaranteed that the state is visible the next time someone grabs the lock.
> > I do explain these races in the documentation patch near the end of the
> > series. Without a BKL I don't see how else to avoid these reset races.
> >
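
The lock-cycling idiom reduced to its essentials; the comments spell out what
the empty critical section buys:

    /* Publish the state change (e.g. submission disabled) first ... */
    disable_submission(guc);

    /*
     * ... then cycle the lock: acquiring it waits for anyone already
     * inside the critical section to finish, and lock/unlock ordering
     * guarantees the next holder of the lock sees our update.
     */
    spin_lock_irqsave(&sched_engine->lock, flags);
    spin_unlock_irqrestore(&sched_engine->lock, flags);
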
> > > > +}
> > > > +
> > > > +void intel_guc_submission_reset_prepare(struct intel_guc *guc)
> > > > +{
> > > > +   int i;
> > > > +
> > > > +   if (unlikely(!guc_submission_initialized(guc)))
> > > > +           /* Reset called during driver load? GuC not yet initialised! */
> > > > +           return;
> > > > +
> > > > +   disable_submission(guc);
> > > > +   guc->interrupts.disable(guc);
> > > > +
> > > > +   /* Flush IRQ handler */
> > > > +   spin_lock_irq(&guc_to_gt(guc)->irq_lock);
> > > > +   spin_unlock_irq(&guc_to_gt(guc)->irq_lock);
> > > > +
> > > > +   guc_flush_submissions(guc);
> > > >     /*
> > > > -    * Prevent request submission to the hardware until we have
> > > > -    * completed the reset in i915_gem_reset_finish(). If a request
> > > > -    * is completed by one engine, it may then queue a request
> > > > -    * to a second via its sched_engine->tasklet *just* as we are
> > > > -    * calling engine->init_hw() and also writing the ELSP.
> > > > -    * Turning off the sched_engine->tasklet until the reset is over
> > > > -    * prevents the race.
> > > > +    * Handle any outstanding G2Hs before reset. Call IRQ handler directly
> > > > +    * each pass as interrupt have been disabled. We always scrub for
> > > > +    * outstanding G2H as it is possible for outstanding_submission_g2h to
> > > > +    * be incremented after the context state update.
> > > >      */
> > > > -   __tasklet_disable_sync_once(&sched_engine->tasklet);
> > > > +   for (i = 0; i < 4 && atomic_read(&guc->outstanding_submission_g2h); ++i) {
> > >
> > > Why is four the magic number and what happens if it is not enough?
> > >
> >
> > I just picked a number. Regardless of whether the normal G2H path processes
> > all the G2H, we scrub all the context state for any lost ones.
> >
> > > > +           intel_guc_to_host_event_handler(guc);
> > > > +#define wait_for_reset(guc, wait_var) \
> > > > +           guc_wait_for_pending_msg(guc, wait_var, false, (HZ / 20))
> > > > +           do {
> > > > +                   wait_for_reset(guc, &guc->outstanding_submission_g2h);
> > > > +           } while (!list_empty(&guc->ct.requests.incoming));
> > > > +   }
> > > > +   scrub_guc_desc_for_outstanding_g2h(guc);
> > > > +}
> > > > +
> > > > +static struct intel_engine_cs *
> > > > +guc_virtual_get_sibling(struct intel_engine_cs *ve, unsigned int sibling)
> > > > +{
> > > > +   struct intel_engine_cs *engine;
> > > > +   intel_engine_mask_t tmp, mask = ve->mask;
> > > > +   unsigned int num_siblings = 0;
> > > > +
> > > > +   for_each_engine_masked(engine, ve->gt, mask, tmp)
> > > > +           if (num_siblings++ == sibling)
> > > > +                   return engine;
> > >
> > > Not sure how often this is used overall and whether just storing the array
> > > in ve could be justified.
> > >
> >
> > It really is only used with sibling == 0, so it should be fast.
> >
> > > > +
> > > > +   return NULL;
> > > > +}
> > > > +
> > > > +static inline struct intel_engine_cs *
> > > > +__context_to_physical_engine(struct intel_context *ce)
> > > > +{
> > > > +   struct intel_engine_cs *engine = ce->engine;
> > > > +
> > > > +   if (intel_engine_is_virtual(engine))
> > > > +           engine = guc_virtual_get_sibling(engine, 0);
> > > > +
> > > > +   return engine;
> > > >   }
> > > > -static void guc_reset_state(struct intel_context *ce,
> > > > -                       struct intel_engine_cs *engine,
> > > > -                       u32 head,
> > > > -                       bool scrub)
> > > > +static void guc_reset_state(struct intel_context *ce, u32 head, bool scrub)
> > > >   {
> > > > +   struct intel_engine_cs *engine = __context_to_physical_engine(ce);
> > > > +
> > > >     GEM_BUG_ON(!intel_context_is_pinned(ce));
> > > >     /*
> > > > @@ -502,42 +676,147 @@ static void guc_reset_state(struct intel_context *ce,
> > > >     lrc_update_regs(ce, engine, head);
> > > >   }
> > > > -static void guc_reset_rewind(struct intel_engine_cs *engine, bool stalled)
> > > > +static void guc_reset_nop(struct intel_engine_cs *engine)
> > > >   {
> > > > -   struct intel_engine_execlists * const execlists = &engine->execlists;
> > > > -   struct i915_request *rq;
> > > > +}
> > > > +
> > > > +static void guc_rewind_nop(struct intel_engine_cs *engine, bool stalled)
> > > > +{
> > > > +}
> > > > +
> > > > +static void
> > > > +__unwind_incomplete_requests(struct intel_context *ce)
> > > > +{
> > > > +   struct i915_request *rq, *rn;
> > > > +   struct list_head *pl;
> > > > +   int prio = I915_PRIORITY_INVALID;
> > > > +   struct i915_sched_engine * const sched_engine =
> > > > +           ce->engine->sched_engine;
> > > >     unsigned long flags;
> > > > -   spin_lock_irqsave(&engine->sched_engine->lock, flags);
> > > > +   spin_lock_irqsave(&sched_engine->lock, flags);
> > > > +   spin_lock(&ce->guc_active.lock);
> > > > +   list_for_each_entry_safe(rq, rn,
> > > > +                            &ce->guc_active.requests,
> > > > +                            sched.link) {
> > > > +           if (i915_request_completed(rq))
> > > > +                   continue;
> > > > +
> > > > +           list_del_init(&rq->sched.link);
> > > > +           spin_unlock(&ce->guc_active.lock);
> > >
> > > Dropping the lock and continuing to iterate the same list is safe? A comment
> > > is needed I think, and I do remember worrying about this, or similar
> > > instances, in GuC code before.
> > >
> >
> > We only need the active lock for the ce->guc_active.requests list. It is
> > indeed safe to drop the lock.
> >
> > > > +
> > > > +           __i915_request_unsubmit(rq);
> > > > +
> > > > +           /* Push the request back into the queue for later resubmission. */
> > > > +           GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID);
> > > > +           if (rq_prio(rq) != prio) {
> > > > +                   prio = rq_prio(rq);
> > > > +                   pl = i915_sched_lookup_priolist(sched_engine, prio);
> > > > +           }
> > > > +           GEM_BUG_ON(i915_sched_engine_is_empty(sched_engine));
> > > > -   /* Push back any incomplete requests for replay after the reset. */
> > > > -   rq = execlists_unwind_incomplete_requests(execlists);
> > > > -   if (!rq)
> > > > -           goto out_unlock;
> > > > +           list_add_tail(&rq->sched.link, pl);
> > > > +           set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> > > > +
> > > > +           spin_lock(&ce->guc_active.lock);
> > > > +   }
> > > > +   spin_unlock(&ce->guc_active.lock);
> > > > +   spin_unlock_irqrestore(&sched_engine->lock, flags);
> > > > +}
> > > > +
> > > > +static struct i915_request *context_find_active_request(struct intel_context *ce)
> > > > +{
> > > > +   struct i915_request *rq, *active = NULL;
> > > > +   unsigned long flags;
> > > > +
> > > > +   spin_lock_irqsave(&ce->guc_active.lock, flags);
> > > > +   list_for_each_entry_reverse(rq, &ce->guc_active.requests,
> > > > +                               sched.link) {
> > > > +           if (i915_request_completed(rq))
> > > > +                   break;
> > > > +
> > > > +           active = rq;
> > > > +   }
> > > > +   spin_unlock_irqrestore(&ce->guc_active.lock, flags);
> > > > +
> > > > +   return active;
> > > > +}
> > > > +
> > > > +static void __guc_reset_context(struct intel_context *ce, bool stalled)
> > > > +{
> > > > +   struct i915_request *rq;
> > > > +   u32 head;
> > > > +
> > > > +   /*
> > > > +    * GuC will implicitly mark the context as non-schedulable
> > > > +    * when it sends the reset notification. Make sure our state
> > > > +    * reflects this change. The context will be marked enabled
> > > > +    * on resubmission.
> > > > +    */
> > > > +   clr_context_enabled(ce);
> > > > +
> > > > +   rq = context_find_active_request(ce);
> > > > +   if (!rq) {
> > > > +           head = ce->ring->tail;
> > > > +           stalled = false;
> > > > +           goto out_replay;
> > > > +   }
> > > >     if (!i915_request_started(rq))
> > > >             stalled = false;
> > > > +   GEM_BUG_ON(i915_active_is_idle(&ce->active));
> > > > +   head = intel_ring_wrap(ce->ring, rq->head);
> > > >     __i915_request_reset(rq, stalled);
> > > > -   guc_reset_state(rq->context, engine, rq->head, stalled);
> > > > -out_unlock:
> > > > -   spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
> > > > +out_replay:
> > > > +   guc_reset_state(ce, head, stalled);
> > > > +   __unwind_incomplete_requests(ce);
> > > > +}
> > > > +
> > > > +void intel_guc_submission_reset(struct intel_guc *guc, bool stalled)
> > > > +{
> > > > +   struct intel_context *ce;
> > > > +   unsigned long index;
> > > > +
> > > > +   if (unlikely(!guc_submission_initialized(guc)))
> > > > +           /* Reset called during driver load? GuC not yet initialised! */
> > > > +           return;
> > > > +
> > > > +   xa_for_each(&guc->context_lookup, index, ce)
> > > > +           if (intel_context_is_pinned(ce))
> > > > +                   __guc_reset_context(ce, stalled);
> > > > +
> > > > +   /* GuC is blown away, drop all references to contexts */
> > > > +   xa_destroy(&guc->context_lookup);
> > > > +}
> > > > +
> > > > +static void guc_cancel_context_requests(struct intel_context *ce)
> > > > +{
> > > > +   struct i915_sched_engine *sched_engine = ce_to_guc(ce)->sched_engine;
> > > > +   struct i915_request *rq;
> > > > +   unsigned long flags;
> > > > +
> > > > +   /* Mark all executing requests as skipped. */
> > > > +   spin_lock_irqsave(&sched_engine->lock, flags);
> > > > +   spin_lock(&ce->guc_active.lock);
> > > > +   list_for_each_entry(rq, &ce->guc_active.requests, sched.link)
> > > > +           i915_request_put(i915_request_mark_eio(rq));
> > > > +   spin_unlock(&ce->guc_active.lock);
> > > > +   spin_unlock_irqrestore(&sched_engine->lock, flags);
> > >
> > > I suppose somewhere it will need to be documented what the two locks are
> > > protecting and why both are needed in some places.
> > >
> >
> > Yep, I have a locking section in the doc patch near the end of the series.
> > Basically the idea here is that we don't want any new submissions processed
> > while we are canceling requests - that is the outer lock. The inner lock
> > again protects the ce->guc_active.requests list.
> >
> > BTW - I think I am overly careful with the locks (when in doubt grab a
> > lock) in the reset / cancel code as there is no expectation that this
> > needs to perform well, and resets are by far the raciest code in the i915.
> >
> > > >   }
> > > > -static void guc_reset_cancel(struct intel_engine_cs *engine)
> > > > +static void
> > > > +guc_cancel_sched_engine_requests(struct i915_sched_engine *sched_engine)
> > > >   {
> > > > -   struct i915_sched_engine * const sched_engine = engine->sched_engine;
> > > >     struct i915_request *rq, *rn;
> > > >     struct rb_node *rb;
> > > >     unsigned long flags;
> > > >     /* Can be called during boot if GuC fails to load */
> > > > -   if (!engine->gt)
> > > > +   if (!sched_engine)
> > > >             return;
> > > > -   ENGINE_TRACE(engine, "\n");
> > > > -
> > > >     /*
> > > >      * Before we call engine->cancel_requests(), we should have exclusive
> > > >      * access to the submission state. This is arranged for us by the
> > > > @@ -552,13 +831,7 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
> > > >      * submission's irq state, we also wish to remind ourselves that
> > > >      * it is irq state.)
> > > >      */
> > > > -   spin_lock_irqsave(&engine->sched_engine->lock, flags);
> > > > -
> > > > -   /* Mark all executing requests as skipped. */
> > > > -   list_for_each_entry(rq, &engine->sched_engine->requests, sched.link) {
> > > > -           i915_request_set_error_once(rq, -EIO);
> > > > -           i915_request_mark_complete(rq);
> > > > -   }
> > > > +   spin_lock_irqsave(&sched_engine->lock, flags);
> > > >     /* Flush the queued requests to the timeline list (for retiring). */
> > > >     while ((rb = rb_first_cached(&sched_engine->queue))) {
> > > > @@ -566,9 +839,10 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
> > > >             priolist_for_each_request_consume(rq, rn, p) {
> > > >                     list_del_init(&rq->sched.link);
> > > > +
> > > >                     __i915_request_submit(rq);
> > > > -                   dma_fence_set_error(&rq->fence, -EIO);
> > > > -                   i915_request_mark_complete(rq);
> > > > +
> > > > +                   i915_request_put(i915_request_mark_eio(rq));
> > > >             }
> > > >             rb_erase_cached(&p->node, &sched_engine->queue);
> > > > @@ -580,19 +854,41 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
> > > >     sched_engine->queue_priority_hint = INT_MIN;
> > > >     sched_engine->queue = RB_ROOT_CACHED;
> > > > -   spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
> > > > +   spin_unlock_irqrestore(&sched_engine->lock, flags);
> > > >   }
> > > > -static void guc_reset_finish(struct intel_engine_cs *engine)
> > > > +void intel_guc_submission_cancel_requests(struct intel_guc *guc)
> > > >   {
> > > > -   struct i915_sched_engine * const sched_engine = engine->sched_engine;
> > > > +   struct intel_context *ce;
> > > > +   unsigned long index;
> > > > -   if (__tasklet_enable(&sched_engine->tasklet))
> > > > -           /* And kick in case we missed a new request submission. */
> > > > -           i915_sched_engine_hi_kick(sched_engine);
> > > > +   xa_for_each(&guc->context_lookup, index, ce)
> > > > +           if (intel_context_is_pinned(ce))
> > > > +                   guc_cancel_context_requests(ce);
> > > > +
> > > > +   guc_cancel_sched_engine_requests(guc->sched_engine);
> > > > +
> > > > +   /* GuC is blown away, drop all references to contexts */
> > > > +   xa_destroy(&guc->context_lookup);
> > > > +}
> > > > +
> > > > +void intel_guc_submission_reset_finish(struct intel_guc *guc)
> > > > +{
> > > > +   /* Reset called during driver load or during wedge? */
> > > > +   if (unlikely(!guc_submission_initialized(guc) ||
> > > > +                test_bit(I915_WEDGED, &guc_to_gt(guc)->reset.flags)))
> > > > +           return;
> > > > -   ENGINE_TRACE(engine, "depth->%d\n",
> > > > -                atomic_read(&sched_engine->tasklet.count));
> > > > +   /*
> > > > +    * Technically possible for either of these values to be non-zero here,
> > > > +    * but very unlikely + harmless. Regardless let's add a warn so we can
> > > > +    * see in CI if this happens frequently / a precursor to taking down the
> > > > +    * machine.
> > >
> > > And what did CI say over the time this was in?
> > >
> >
> > It hasn't popped yet. This is more for upcoming code where we have
> > G2Hs we can't scrub (e.g. a TLB invalidation, engine class scheduling
> > disable, etc.).
> >
> > > It needs to be explained when it can be non-zero and whether or not it can
> > > go to non-zero just after the atomic_set below. Or if not, why not.
> > >
> >
> > At this point we could probably turn this into a BUG_ON.
> >
> > > > +    */
> > > > +   GEM_WARN_ON(atomic_read(&guc->outstanding_submission_g2h));
> > > > +   atomic_set(&guc->outstanding_submission_g2h, 0);
> > > > +
> > > > +   enable_submission(guc);
> > > >   }
> > > >   /*
> > > > @@ -659,6 +955,9 @@ static int guc_bypass_tasklet_submit(struct intel_guc *guc,
> > > >     else
> > > >             trace_i915_request_guc_submit(rq);
> > > > +   if (unlikely(ret == -EDEADLK))
> > > > +           disable_submission(guc);
> > > > +
> > > >     return ret;
> > > >   }
> > > > @@ -671,7 +970,8 @@ static void guc_submit_request(struct i915_request *rq)
> > > >     /* Will be called from irq-context when using foreign fences. */
> > > >     spin_lock_irqsave(&sched_engine->lock, flags);
> > > > -   if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
> > > > +   if (submission_disabled(guc) || guc->stalled_request ||
> > > > +       !i915_sched_engine_is_empty(sched_engine))
> > > >             queue_request(sched_engine, rq, rq_prio(rq));
> > > >     else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
> > > >             i915_sched_engine_hi_kick(sched_engine);
> > > > @@ -808,7 +1108,8 @@ static void unpin_guc_id(struct intel_guc *guc, struct intel_context *ce)
> > > >   static int __guc_action_register_context(struct intel_guc *guc,
> > > >                                      u32 guc_id,
> > > > -                                    u32 offset)
> > > > +                                    u32 offset,
> > > > +                                    bool loop)
> > > >   {
> > > >     u32 action[] = {
> > > >             INTEL_GUC_ACTION_REGISTER_CONTEXT,
> > > > @@ -816,10 +1117,10 @@ static int __guc_action_register_context(struct intel_guc *guc,
> > > >             offset,
> > > >     };
> > > > -   return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
> > > > +   return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action), 0, loop);
> > > >   }
> > > > -static int register_context(struct intel_context *ce)
> > > > +static int register_context(struct intel_context *ce, bool loop)
> > > >   {
> > > >     struct intel_guc *guc = ce_to_guc(ce);
> > > >     u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
> > > > @@ -827,11 +1128,12 @@ static int register_context(struct intel_context *ce)
> > > >     trace_intel_context_register(ce);
> > > > -   return __guc_action_register_context(guc, ce->guc_id, offset);
> > > > +   return __guc_action_register_context(guc, ce->guc_id, offset, loop);
> > > >   }
> > > >   static int __guc_action_deregister_context(struct intel_guc *guc,
> > > > -                                      u32 guc_id)
> > > > +                                      u32 guc_id,
> > > > +                                      bool loop)
> > > >   {
> > > >     u32 action[] = {
> > > >             INTEL_GUC_ACTION_DEREGISTER_CONTEXT,
> > > > @@ -839,16 +1141,16 @@ static int __guc_action_deregister_context(struct intel_guc *guc,
> > > >     };
> > > >     return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),
> > > > -                                   G2H_LEN_DW_DEREGISTER_CONTEXT, true);
> > > > +                                   G2H_LEN_DW_DEREGISTER_CONTEXT, loop);
> > > >   }
> > > > -static int deregister_context(struct intel_context *ce, u32 guc_id)
> > > > +static int deregister_context(struct intel_context *ce, u32 guc_id, bool loop)
> > > >   {
> > > >     struct intel_guc *guc = ce_to_guc(ce);
> > > >     trace_intel_context_deregister(ce);
> > > > -   return __guc_action_deregister_context(guc, guc_id);
> > > > +   return __guc_action_deregister_context(guc, guc_id, loop);
> > > >   }
> > > >   static intel_engine_mask_t adjust_engine_mask(u8 class, intel_engine_mask_t mask)
> > > > @@ -877,7 +1179,7 @@ static void guc_context_policy_init(struct intel_engine_cs *engine,
> > > >     desc->preemption_timeout = CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US;
> > > >   }
> > > > -static int guc_lrc_desc_pin(struct intel_context *ce)
> > > > +static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
> > > >   {
> > > >     struct intel_runtime_pm *runtime_pm =
> > > >             &ce->engine->gt->i915->runtime_pm;
> > > > @@ -923,18 +1225,44 @@ static int guc_lrc_desc_pin(struct intel_context *ce)
> > > >      */
> > > >     if (context_registered) {
> > > >             trace_intel_context_steal_guc_id(ce);
> > > > -           set_context_wait_for_deregister_to_register(ce);
> > > > -           intel_context_get(ce);
> > > > +           if (!loop) {
> > > > +                   set_context_wait_for_deregister_to_register(ce);
> > > > +                   intel_context_get(ce);
> > > > +           } else {
> > > > +                   bool disabled;
> > > > +                   unsigned long flags;
> > > > +
> > > > +                   /* Seal race with Reset */
> > >
> > > Needs to be more descriptive.
> > >
> >
> > Again, I have a comment about this in the doc patch. Basically this goes back
> > to your other questions about cycling a lock. You must check
> > submission_disabled() within a lock, otherwise there is a race between
> > resets and updating a context's state.
> >
> > > > +                   spin_lock_irqsave(&ce->guc_state.lock, flags);
> > > > +                   disabled = submission_disabled(guc);
> > > > +                   if (likely(!disabled)) {
> > > > +                           set_context_wait_for_deregister_to_register(ce);
> > > > +                           intel_context_get(ce);
> > > > +                   }
> > > > +                   spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> > > > +                   if (unlikely(disabled)) {
> > > > +                           reset_lrc_desc(guc, desc_idx);
> > > > +                           return 0;       /* Will get registered later */
> > > > +                   }
> > > > +           }
> > > >             /*
> > > >              * If stealing the guc_id, this ce has the same guc_id as the
> > > >              * context whos guc_id was stole.
> > > >              */
> > > >             with_intel_runtime_pm(runtime_pm, wakeref)
> > > > -                   ret = deregister_context(ce, ce->guc_id);
> > > > +                   ret = deregister_context(ce, ce->guc_id, loop);
> > > > +           if (unlikely(ret == -EBUSY)) {
> > > > +                   clr_context_wait_for_deregister_to_register(ce);
> > > > +                   intel_context_put(ce);
> > > > +           }
> > > >     } else {
> > > >             with_intel_runtime_pm(runtime_pm, wakeref)
> > > > -                   ret = register_context(ce);
> > > > +                   ret = register_context(ce, loop);
> > > > +           if (unlikely(ret == -EBUSY))
> > > > +                   reset_lrc_desc(guc, desc_idx);
> > > > +           else if (unlikely(ret == -ENODEV))
> > > > +                   ret = 0;        /* Will get registered later */
> > > >     }
> > > >     return ret;
> > > > @@ -997,7 +1325,6 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
> > > >     GEM_BUG_ON(guc_id == GUC_INVALID_LRC_ID);
> > > >     trace_intel_context_sched_disable(ce);
> > > > -   intel_context_get(ce);
> > > >     guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),
> > > >                              G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
> > > > @@ -1007,6 +1334,7 @@ static u16 prep_context_pending_disable(struct intel_context *ce)
> > > >   {
> > > >     set_context_pending_disable(ce);
> > > >     clr_context_enabled(ce);
> > > > +   intel_context_get(ce);
> > > >     return ce->guc_id;
> > > >   }
> > > > @@ -1019,7 +1347,7 @@ static void guc_context_sched_disable(struct intel_context *ce)
> > > >     u16 guc_id;
> > > >     intel_wakeref_t wakeref;
> > > > -   if (context_guc_id_invalid(ce) ||
> > > > +   if (submission_disabled(guc) || context_guc_id_invalid(ce) ||
> > > >         !lrc_desc_registered(guc, ce->guc_id)) {
> > > >             clr_context_enabled(ce);
> > > >             goto unpin;
> > > > @@ -1053,19 +1381,13 @@ static void guc_context_sched_disable(struct intel_context *ce)
> > > >   static inline void guc_lrc_desc_unpin(struct intel_context *ce)
> > > >   {
> > > > -   struct intel_engine_cs *engine = ce->engine;
> > > > -   struct intel_guc *guc = &engine->gt->uc.guc;
> > > > -   unsigned long flags;
> > > > +   struct intel_guc *guc = ce_to_guc(ce);
> > > >     GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id));
> > > >     GEM_BUG_ON(ce != __get_context(guc, ce->guc_id));
> > > >     GEM_BUG_ON(context_enabled(ce));
> > > > -   spin_lock_irqsave(&ce->guc_state.lock, flags);
> > > > -   set_context_destroyed(ce);
> > > > -   spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> > > > -
> > > > -   deregister_context(ce, ce->guc_id);
> > > > +   deregister_context(ce, ce->guc_id, true);
> > > >   }
> > > >   static void __guc_context_destroy(struct intel_context *ce)
> > > > @@ -1093,13 +1415,15 @@ static void guc_context_destroy(struct kref *kref)
> > > >     struct intel_guc *guc = &ce->engine->gt->uc.guc;
> > > >     intel_wakeref_t wakeref;
> > > >     unsigned long flags;
> > > > +   bool disabled;
> > > >     /*
> > > >      * If the guc_id is invalid this context has been stolen and we can free
> > > >      * it immediately. Also can be freed immediately if the context is not
> > > >      * registered with the GuC.
> > > >      */
> > > > -   if (context_guc_id_invalid(ce) ||
> > > > +   if (submission_disabled(guc) ||
> > > > +       context_guc_id_invalid(ce) ||
> > > >         !lrc_desc_registered(guc, ce->guc_id)) {
> > > >             release_guc_id(guc, ce);
> > > >             __guc_context_destroy(ce);
> > > > @@ -1126,6 +1450,18 @@ static void guc_context_destroy(struct kref *kref)
> > > >             list_del_init(&ce->guc_id_link);
> > > >     spin_unlock_irqrestore(&guc->contexts_lock, flags);
> > > > +   /* Seal race with Reset */
> > > > +   spin_lock_irqsave(&ce->guc_state.lock, flags);
> > > > +   disabled = submission_disabled(guc);
> > > > +   if (likely(!disabled))
> > > > +           set_context_destroyed(ce);
> > > > +   spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> > > > +   if (unlikely(disabled)) {
> > > > +           release_guc_id(guc, ce);
> > > > +           __guc_context_destroy(ce);
> > > > +           return;
> > >
> > > Same as above, needs a better comment. It is also hard for the reader to know
> > > if the snapshot of disabled taken under the lock is still valid after the lock
> > > has been released, and why.
> > >
> >
> > Will pull the doc comment into this patch.
> >
> > Matt
> >
> > > Regards,
> > >
> > > Tvrtko
> > >
> > > > +   }
> > > > +
> > > >     /*
> > > >      * We defer GuC context deregistration until the context is destroyed
> > > >      * in order to save on CTBs. With this optimization ideally we only need
> > > > @@ -1148,6 +1484,33 @@ static int guc_context_alloc(struct intel_context *ce)
> > > >     return lrc_alloc(ce, ce->engine);
> > > >   }
> > > > +static void add_to_context(struct i915_request *rq)
> > > > +{
> > > > +   struct intel_context *ce = rq->context;
> > > > +
> > > > +   spin_lock(&ce->guc_active.lock);
> > > > +   list_move_tail(&rq->sched.link, &ce->guc_active.requests);
> > > > +   spin_unlock(&ce->guc_active.lock);
> > > > +}
> > > > +
> > > > +static void remove_from_context(struct i915_request *rq)
> > > > +{
> > > > +   struct intel_context *ce = rq->context;
> > > > +
> > > > +   spin_lock_irq(&ce->guc_active.lock);
> > > > +
> > > > +   list_del_init(&rq->sched.link);
> > > > +   clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> > > > +
> > > > +   /* Prevent further __await_execution() registering a cb, then flush */
> > > > +   set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
> > > > +
> > > > +   spin_unlock_irq(&ce->guc_active.lock);
> > > > +
> > > > +   atomic_dec(&ce->guc_id_ref);
> > > > +   i915_request_notify_execute_cb_imm(rq);
> > > > +}
> > > > +
> > > >   static const struct intel_context_ops guc_context_ops = {
> > > >     .alloc = guc_context_alloc,
> > > > @@ -1186,8 +1549,6 @@ static void guc_signal_context_fence(struct intel_context *ce)
> > > >   {
> > > >     unsigned long flags;
> > > > -   GEM_BUG_ON(!context_wait_for_deregister_to_register(ce));
> > > > -
> > > >     spin_lock_irqsave(&ce->guc_state.lock, flags);
> > > >     clr_context_wait_for_deregister_to_register(ce);
> > > >     __guc_signal_context_fence(ce);
> > > > @@ -1196,8 +1557,9 @@ static void guc_signal_context_fence(struct intel_context *ce)
> > > >   static bool context_needs_register(struct intel_context *ce, bool new_guc_id)
> > > >   {
> > > > -   return new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
> > > > -           !lrc_desc_registered(ce_to_guc(ce), ce->guc_id);
> > > > +   return (new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
> > > > +           !lrc_desc_registered(ce_to_guc(ce), ce->guc_id)) &&
> > > > +           !submission_disabled(ce_to_guc(ce));
> > > >   }
> > > >   static int guc_request_alloc(struct i915_request *rq)
> > > > @@ -1256,8 +1618,10 @@ static int guc_request_alloc(struct i915_request *rq)
> > > >             return ret;;
> > > >     if (context_needs_register(ce, !!ret)) {
> > > > -           ret = guc_lrc_desc_pin(ce);
> > > > +           ret = guc_lrc_desc_pin(ce, true);
> > > >             if (unlikely(ret)) {    /* unwind */
> > > > +                   if (ret == -EDEADLK)
> > > > +                           disable_submission(guc);
> > > >                     atomic_dec(&ce->guc_id_ref);
> > > >                     unpin_guc_id(guc, ce);
> > > >                     return ret;
> > > > @@ -1294,20 +1658,6 @@ static int guc_request_alloc(struct i915_request *rq)
> > > >     return 0;
> > > >   }
> > > > -static struct intel_engine_cs *
> > > > -guc_virtual_get_sibling(struct intel_engine_cs *ve, unsigned int sibling)
> > > > -{
> > > > -   struct intel_engine_cs *engine;
> > > > -   intel_engine_mask_t tmp, mask = ve->mask;
> > > > -   unsigned int num_siblings = 0;
> > > > -
> > > > -   for_each_engine_masked(engine, ve->gt, mask, tmp)
> > > > -           if (num_siblings++ == sibling)
> > > > -                   return engine;
> > > > -
> > > > -   return NULL;
> > > > -}
> > > > -
> > > >   static int guc_virtual_context_pre_pin(struct intel_context *ce,
> > > >                                    struct i915_gem_ww_ctx *ww,
> > > >                                    void **vaddr)
> > > > @@ -1516,7 +1866,7 @@ static inline void guc_kernel_context_pin(struct intel_guc *guc,
> > > >   {
> > > >     if (context_guc_id_invalid(ce))
> > > >             pin_guc_id(guc, ce);
> > > > -   guc_lrc_desc_pin(ce);
> > > > +   guc_lrc_desc_pin(ce, true);
> > > >   }
> > > >   static inline void guc_init_lrc_mapping(struct intel_guc *guc)
> > > > @@ -1582,13 +1932,15 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
> > > >     engine->cops = &guc_context_ops;
> > > >     engine->request_alloc = guc_request_alloc;
> > > >     engine->bump_serial = guc_bump_serial;
> > > > +   engine->add_active_request = add_to_context;
> > > > +   engine->remove_active_request = remove_from_context;
> > > >     engine->sched_engine->schedule = i915_schedule;
> > > > -   engine->reset.prepare = guc_reset_prepare;
> > > > -   engine->reset.rewind = guc_reset_rewind;
> > > > -   engine->reset.cancel = guc_reset_cancel;
> > > > -   engine->reset.finish = guc_reset_finish;
> > > > +   engine->reset.prepare = guc_reset_nop;
> > > > +   engine->reset.rewind = guc_rewind_nop;
> > > > +   engine->reset.cancel = guc_reset_nop;
> > > > +   engine->reset.finish = guc_reset_nop;
> > > >     engine->emit_flush = gen8_emit_flush_xcs;
> > > >     engine->emit_init_breadcrumb = gen8_emit_init_breadcrumb;
> > > > @@ -1764,7 +2116,7 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
> > > >              * register this context.
> > > >              */
> > > >             with_intel_runtime_pm(runtime_pm, wakeref)
> > > > -                   register_context(ce);
> > > > +                   register_context(ce, true);
> > > >             guc_signal_context_fence(ce);
> > > >             intel_context_put(ce);
> > > >     } else if (context_destroyed(ce)) {
> > > > @@ -1946,6 +2298,10 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
> > > >                              "v%dx%d", ve->base.class, count);
> > > >                     ve->base.context_size = sibling->context_size;
> > > > +                   ve->base.add_active_request =
> > > > +                           sibling->add_active_request;
> > > > +                   ve->base.remove_active_request =
> > > > +                           sibling->remove_active_request;
> > > >                     ve->base.emit_bb_start = sibling->emit_bb_start;
> > > >                     ve->base.emit_flush = sibling->emit_flush;
> > > >                     ve->base.emit_init_breadcrumb =
> > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> > > > index ab0789d66e06..d5ccffbb89ae 100644
> > > > --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> > > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> > > > @@ -565,12 +565,44 @@ void intel_uc_reset_prepare(struct intel_uc *uc)
> > > >   {
> > > >     struct intel_guc *guc = &uc->guc;
> > > > +   /* Firmware expected to be running when this function is called */
> > > >     if (!intel_guc_is_ready(guc))
> > > > -           return;
> > > > +           goto sanitize;
> > > > +
> > > > +   if (intel_uc_uses_guc_submission(uc))
> > > > +           intel_guc_submission_reset_prepare(guc);
> > > > +sanitize:
> > > >     __uc_sanitize(uc);
> > > >   }
> > > > +void intel_uc_reset(struct intel_uc *uc, bool stalled)
> > > > +{
> > > > +   struct intel_guc *guc = &uc->guc;
> > > > +
> > > > +   /* Firmware can not be running when this function is called  */
> > > > +   if (intel_uc_uses_guc_submission(uc))
> > > > +           intel_guc_submission_reset(guc, stalled);
> > > > +}
> > > > +
> > > > +void intel_uc_reset_finish(struct intel_uc *uc)
> > > > +{
> > > > +   struct intel_guc *guc = &uc->guc;
> > > > +
> > > > +   /* Firmware expected to be running when this function is called */
> > > > +   if (intel_guc_is_fw_running(guc) && intel_uc_uses_guc_submission(uc))
> > > > +           intel_guc_submission_reset_finish(guc);
> > > > +}
> > > > +
> > > > +void intel_uc_cancel_requests(struct intel_uc *uc)
> > > > +{
> > > > +   struct intel_guc *guc = &uc->guc;
> > > > +
> > > > +   /* Firmware can not be running when this function is called  */
> > > > +   if (intel_uc_uses_guc_submission(uc))
> > > > +           intel_guc_submission_cancel_requests(guc);
> > > > +}
> > > > +
> > > >   void intel_uc_runtime_suspend(struct intel_uc *uc)
> > > >   {
> > > >     struct intel_guc *guc = &uc->guc;
> > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> > > > index c4cef885e984..eaa3202192ac 100644
> > > > --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> > > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> > > > @@ -37,6 +37,9 @@ void intel_uc_driver_late_release(struct intel_uc *uc);
> > > >   void intel_uc_driver_remove(struct intel_uc *uc);
> > > >   void intel_uc_init_mmio(struct intel_uc *uc);
> > > >   void intel_uc_reset_prepare(struct intel_uc *uc);
> > > > +void intel_uc_reset(struct intel_uc *uc, bool stalled);
> > > > +void intel_uc_reset_finish(struct intel_uc *uc);
> > > > +void intel_uc_cancel_requests(struct intel_uc *uc);
> > > >   void intel_uc_suspend(struct intel_uc *uc);
> > > >   void intel_uc_runtime_suspend(struct intel_uc *uc);
> > > >   int intel_uc_resume(struct intel_uc *uc);
> > > > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > > > index 0b96b824ea06..4855cf7ebe21 100644
> > > > --- a/drivers/gpu/drm/i915/i915_request.c
> > > > +++ b/drivers/gpu/drm/i915/i915_request.c
> > > > @@ -194,7 +194,7 @@ static bool irq_work_imm(struct irq_work *wrk)
> > > >     return false;
> > > >   }
> > > > -static void __notify_execute_cb_imm(struct i915_request *rq)
> > > > +void i915_request_notify_execute_cb_imm(struct i915_request *rq)
> > > >   {
> > > >     __notify_execute_cb(rq, irq_work_imm);
> > > >   }
> > > > @@ -268,37 +268,6 @@ i915_request_active_engine(struct i915_request *rq,
> > > >     return ret;
> > > >   }
> > > > -
> > > > -static void remove_from_engine(struct i915_request *rq)
> > > > -{
> > > > -   struct intel_engine_cs *engine, *locked;
> > > > -
> > > > -   /*
> > > > -    * Virtual engines complicate acquiring the engine timeline lock,
> > > > -    * as their rq->engine pointer is not stable until under that
> > > > -    * engine lock. The simple ploy we use is to take the lock then
> > > > -    * check that the rq still belongs to the newly locked engine.
> > > > -    */
> > > > -   locked = READ_ONCE(rq->engine);
> > > > -   spin_lock_irq(&locked->sched_engine->lock);
> > > > -   while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
> > > > -           spin_unlock(&locked->sched_engine->lock);
> > > > -           spin_lock(&engine->sched_engine->lock);
> > > > -           locked = engine;
> > > > -   }
> > > > -   list_del_init(&rq->sched.link);
> > > > -
> > > > -   clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> > > > -   clear_bit(I915_FENCE_FLAG_HOLD, &rq->fence.flags);
> > > > -
> > > > -   /* Prevent further __await_execution() registering a cb, then flush */
> > > > -   set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
> > > > -
> > > > -   spin_unlock_irq(&locked->sched_engine->lock);
> > > > -
> > > > -   __notify_execute_cb_imm(rq);
> > > > -}
> > > > -
> > > >   static void __rq_init_watchdog(struct i915_request *rq)
> > > >   {
> > > >     rq->watchdog.timer.function = NULL;
> > > > @@ -395,9 +364,7 @@ bool i915_request_retire(struct i915_request *rq)
> > > >      * after removing the breadcrumb and signaling it, so that we do not
> > > >      * inadvertently attach the breadcrumb to a completed request.
> > > >      */
> > > > -   if (!list_empty(&rq->sched.link))
> > > > -           remove_from_engine(rq);
> > > > -   atomic_dec(&rq->context->guc_id_ref);
> > > > +   rq->engine->remove_active_request(rq);
> > > >     GEM_BUG_ON(!llist_empty(&rq->execute_cb));
> > > >     __list_del_entry(&rq->link); /* poison neither prev/next (RCU walks) */
> > > > @@ -539,7 +506,7 @@ __await_execution(struct i915_request *rq,
> > > >     if (llist_add(&cb->work.node.llist, &signal->execute_cb)) {
> > > >             if (i915_request_is_active(signal) ||
> > > >                 __request_in_flight(signal))
> > > > -                   __notify_execute_cb_imm(signal);
> > > > +                   i915_request_notify_execute_cb_imm(signal);
> > > >     }
> > > >     return 0;
> > > > @@ -676,7 +643,7 @@ bool __i915_request_submit(struct i915_request *request)
> > > >     result = true;
> > > >     GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
> > > > -   list_move_tail(&request->sched.link, &engine->sched_engine->requests);
> > > > +   engine->add_active_request(request);
> > > >   active:
> > > >     clear_bit(I915_FENCE_FLAG_PQUEUE, &request->fence.flags);
> > > >     set_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags);
> > > > diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
> > > > index f870cd75a001..bcc6340c505e 100644
> > > > --- a/drivers/gpu/drm/i915/i915_request.h
> > > > +++ b/drivers/gpu/drm/i915/i915_request.h
> > > > @@ -649,4 +649,6 @@ bool
> > > >   i915_request_active_engine(struct i915_request *rq,
> > > >                        struct intel_engine_cs **active);
> > > > +void i915_request_notify_execute_cb_imm(struct i915_request *rq);
> > > > +
> > > >   #endif /* I915_REQUEST_H */
> > > >
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
> 
> 
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 36/97] drm/i915/guc: Add non blocking CTB send function
  2021-05-27 15:11               ` Tvrtko Ursulin
@ 2021-06-07 17:31                 ` Matthew Brost
  2021-06-08  8:39                   ` Tvrtko Ursulin
  2021-06-09 14:14                   ` Michal Wajdeczko
  0 siblings, 2 replies; 249+ messages in thread
From: Matthew Brost @ 2021-06-07 17:31 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: jason.ekstrand, daniel.vetter, intel-gfx, dri-devel

On Thu, May 27, 2021 at 04:11:50PM +0100, Tvrtko Ursulin wrote:
> 
> On 27/05/2021 15:35, Matthew Brost wrote:
> > On Thu, May 27, 2021 at 11:02:24AM +0100, Tvrtko Ursulin wrote:
> > > 
> > > On 26/05/2021 19:10, Matthew Brost wrote:
> > > 
> > > [snip]
> > > 
> > > > > > > > +static int ct_send_nb(struct intel_guc_ct *ct,
> > > > > > > > +		      const u32 *action,
> > > > > > > > +		      u32 len,
> > > > > > > > +		      u32 flags)
> > > > > > > > +{
> > > > > > > > +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> > > > > > > > +	unsigned long spin_flags;
> > > > > > > > +	u32 fence;
> > > > > > > > +	int ret;
> > > > > > > > +
> > > > > > > > +	spin_lock_irqsave(&ctb->lock, spin_flags);
> > > > > > > > +
> > > > > > > > +	ret = ctb_has_room(ctb, len + 1);
> > > > > > > > +	if (unlikely(ret))
> > > > > > > > +		goto out;
> > > > > > > > +
> > > > > > > > +	fence = ct_get_next_fence(ct);
> > > > > > > > +	ret = ct_write(ct, action, len, fence, flags);
> > > > > > > > +	if (unlikely(ret))
> > > > > > > > +		goto out;
> > > > > > > > +
> > > > > > > > +	intel_guc_notify(ct_to_guc(ct));
> > > > > > > > +
> > > > > > > > +out:
> > > > > > > > +	spin_unlock_irqrestore(&ctb->lock, spin_flags);
> > > > > > > > +
> > > > > > > > +	return ret;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > >      static int ct_send(struct intel_guc_ct *ct,
> > > > > > > >      		   const u32 *action,
> > > > > > > >      		   u32 len,
> > > > > > > > @@ -473,6 +541,7 @@ static int ct_send(struct intel_guc_ct *ct,
> > > > > > > >      		   u32 response_buf_size,
> > > > > > > >      		   u32 *status)
> > > > > > > >      {
> > > > > > > > +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> > > > > > > >      	struct ct_request request;
> > > > > > > >      	unsigned long flags;
> > > > > > > >      	u32 fence;
> > > > > > > > @@ -482,8 +551,20 @@ static int ct_send(struct intel_guc_ct *ct,
> > > > > > > >      	GEM_BUG_ON(!len);
> > > > > > > >      	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
> > > > > > > >      	GEM_BUG_ON(!response_buf && response_buf_size);
> > > > > > > > +	might_sleep();
> > > > > > > 
> > > > > > > Sleep is just cond_resched below or there is more?
> > > > > > > 
> > > > > > 
> > > > > > Yes, the cond_resched.
> > > > > > 
> > > > > > > > +	/*
> > > > > > > > +	 * We use a lazy spin wait loop here as we believe that if the CT
> > > > > > > > +	 * buffers are sized correctly the flow control condition should be
> > > > > > > > +	 * rare.
> > > > > > > > +	 */
> > > > > > > > +retry:
> > > > > > > >      	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
> > > > > > > > +	if (unlikely(!ctb_has_room(ctb, len + 1))) {
> > > > > > > > +		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> > > > > > > > +		cond_resched();
> > > > > > > > +		goto retry;
> > > > > > > > +	}
> > > > > > > 
> > > > > > > If this patch is about adding a non-blocking send function, and below we can
> > > > > > > see that it creates a fork:
> > > > > > > 
> > > > > > > intel_guc_ct_send:
> > > > > > > ...
> > > > > > > 	if (flags & INTEL_GUC_SEND_NB)
> > > > > > > 		return ct_send_nb(ct, action, len, flags);
> > > > > > > 
> > > > > > >     	ret = ct_send(ct, action, len, response_buf, response_buf_size, &status);
> > > > > > > 
> > > > > > > Then why is there a change in ct_send here, which is not the new
> > > > > > > non-blocking path?
> > > > > > > 
> > > > > > 
> > > > > > There is not a change to ct_send(), just to intel_guc_ct_send.
> > > > > 
> > > > > I was doing by the diff which says:
> > > > > 
> > > > >    static int ct_send(struct intel_guc_ct *ct,
> > > > >    		   const u32 *action,
> > > > >    		   u32 len,
> > > > > @@ -473,6 +541,7 @@ static int ct_send(struct intel_guc_ct *ct,
> > > > >    		   u32 response_buf_size,
> > > > >    		   u32 *status)
> > > > >    {
> > > > > +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> > > > >    	struct ct_request request;
> > > > >    	unsigned long flags;
> > > > >    	u32 fence;
> > > > > @@ -482,8 +551,20 @@ static int ct_send(struct intel_guc_ct *ct,
> > > > >    	GEM_BUG_ON(!len);
> > > > >    	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
> > > > >    	GEM_BUG_ON(!response_buf && response_buf_size);
> > > > > +	might_sleep();
> > > > > +	/*
> > > > > +	 * We use a lazy spin wait loop here as we believe that if the CT
> > > > > +	 * buffers are sized correctly the flow control condition should be
> > > > > +	 * rare.
> > > > > +	 */
> > > > > +retry:
> > > > >    	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
> > > > > +	if (unlikely(!ctb_has_room(ctb, len + 1))) {
> > > > > +		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> > > > > +		cond_resched();
> > > > > +		goto retry;
> > > > > +	}
> > > > > 
> > > > > So it looks like a change to ct_send to me. Is that wrong?
> > > 
> > > What about this part - is the patch changing the blocking ct_send or not,
> > > and if it is why?
> > > 
> > 
> > Yes, ct_send() changes. Sorry for the confusion.
> > 
> > This function needs to be updated to account for the H2G space and
> > backoff if no space is available.
> 
> Since this one is the sleeping path, it probably can and needs to be smarter
> than having a cond_resched busy loop added. Like sleep and get woken up when
> there is space. Otherwise it can degenerate to busy looping via contention
> with the non-blocking path.
> 

That screams over-engineering a simple problem to me. If the CT channel
is full we are really in trouble anyway - i.e. the performance is going
to be terrible as we have overwhelmed the GuC with traffic. That being said,
IGTs can do this but that really isn't a real world use case. For the
real world, this buffer is large enough that it won't ever be full, hence
the comment + lazy spin loop.

Next, it isn't like we get an interrupt or something when space
becomes available, so how would we wake this thread? Could we come up
with a convoluted scheme where we insert ops that generate an interrupt
at regular intervals? Probably. Would it be super complicated, totally
unnecessary, and gain us nothing? Absolutely.

Lastly, blocking CTBs really shouldn't ever be used. Certainly the
submission code doesn't use these. I think SRIOV might, but those can
probably be reworked too to use non-blocking. At some point we might
want to scrub the driver and just delete the blocking path.
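
For reference, this is how the submission side already handles the CTB
being full - it never blocks, -EBUSY is just treated as back pressure and
the tasklet picks the request up later (roughly the guc_submit_request()
hunk from earlier in this series):

	/* called under sched_engine->lock, from guc_submit_request() */
	if (submission_disabled(guc) || guc->stalled_request ||
	    !i915_sched_engine_is_empty(sched_engine))
		queue_request(sched_engine, rq, rq_prio(rq));
	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
		i915_sched_engine_hi_kick(sched_engine);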

Matt

> Regards,

> 
> Tvrtko

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 36/97] drm/i915/guc: Add non blocking CTB send function
  2021-06-07 17:31                 ` Matthew Brost
@ 2021-06-08  8:39                   ` Tvrtko Ursulin
  2021-06-08  8:46                     ` Daniel Vetter
  2021-06-09 13:58                     ` Michal Wajdeczko
  2021-06-09 14:14                   ` Michal Wajdeczko
  1 sibling, 2 replies; 249+ messages in thread
From: Tvrtko Ursulin @ 2021-06-08  8:39 UTC (permalink / raw)
  To: Matthew Brost; +Cc: jason.ekstrand, daniel.vetter, intel-gfx, dri-devel


On 07/06/2021 18:31, Matthew Brost wrote:
> On Thu, May 27, 2021 at 04:11:50PM +0100, Tvrtko Ursulin wrote:
>>
>> On 27/05/2021 15:35, Matthew Brost wrote:
>>> On Thu, May 27, 2021 at 11:02:24AM +0100, Tvrtko Ursulin wrote:
>>>>
>>>> On 26/05/2021 19:10, Matthew Brost wrote:
>>>>
>>>> [snip]
>>>>
>>>>>>>>> +static int ct_send_nb(struct intel_guc_ct *ct,
>>>>>>>>> +		      const u32 *action,
>>>>>>>>> +		      u32 len,
>>>>>>>>> +		      u32 flags)
>>>>>>>>> +{
>>>>>>>>> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>>>>>>>>> +	unsigned long spin_flags;
>>>>>>>>> +	u32 fence;
>>>>>>>>> +	int ret;
>>>>>>>>> +
>>>>>>>>> +	spin_lock_irqsave(&ctb->lock, spin_flags);
>>>>>>>>> +
>>>>>>>>> +	ret = ctb_has_room(ctb, len + 1);
>>>>>>>>> +	if (unlikely(ret))
>>>>>>>>> +		goto out;
>>>>>>>>> +
>>>>>>>>> +	fence = ct_get_next_fence(ct);
>>>>>>>>> +	ret = ct_write(ct, action, len, fence, flags);
>>>>>>>>> +	if (unlikely(ret))
>>>>>>>>> +		goto out;
>>>>>>>>> +
>>>>>>>>> +	intel_guc_notify(ct_to_guc(ct));
>>>>>>>>> +
>>>>>>>>> +out:
>>>>>>>>> +	spin_unlock_irqrestore(&ctb->lock, spin_flags);
>>>>>>>>> +
>>>>>>>>> +	return ret;
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>>       static int ct_send(struct intel_guc_ct *ct,
>>>>>>>>>       		   const u32 *action,
>>>>>>>>>       		   u32 len,
>>>>>>>>> @@ -473,6 +541,7 @@ static int ct_send(struct intel_guc_ct *ct,
>>>>>>>>>       		   u32 response_buf_size,
>>>>>>>>>       		   u32 *status)
>>>>>>>>>       {
>>>>>>>>> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>>>>>>>>>       	struct ct_request request;
>>>>>>>>>       	unsigned long flags;
>>>>>>>>>       	u32 fence;
>>>>>>>>> @@ -482,8 +551,20 @@ static int ct_send(struct intel_guc_ct *ct,
>>>>>>>>>       	GEM_BUG_ON(!len);
>>>>>>>>>       	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
>>>>>>>>>       	GEM_BUG_ON(!response_buf && response_buf_size);
>>>>>>>>> +	might_sleep();
>>>>>>>>
>>>>>>>> Sleep is just cond_resched below or there is more?
>>>>>>>>
>>>>>>>
>>>>>>> Yes, the cond_resched.
>>>>>>>
>>>>>>>>> +	/*
>>>>>>>>> +	 * We use a lazy spin wait loop here as we believe that if the CT
>>>>>>>>> +	 * buffers are sized correctly the flow control condition should be
>>>>>>>>> +	 * rare.
>>>>>>>>> +	 */
>>>>>>>>> +retry:
>>>>>>>>>       	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
>>>>>>>>> +	if (unlikely(!ctb_has_room(ctb, len + 1))) {
>>>>>>>>> +		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
>>>>>>>>> +		cond_resched();
>>>>>>>>> +		goto retry;
>>>>>>>>> +	}
>>>>>>>>
>>>>>>>> If this patch is about adding a non-blocking send function, and below we can
>>>>>>>> see that it creates a fork:
>>>>>>>>
>>>>>>>> intel_guc_ct_send:
>>>>>>>> ...
>>>>>>>> 	if (flags & INTEL_GUC_SEND_NB)
>>>>>>>> 		return ct_send_nb(ct, action, len, flags);
>>>>>>>>
>>>>>>>>      	ret = ct_send(ct, action, len, response_buf, response_buf_size, &status);
>>>>>>>>
>>>>>>>> Then why is there a change in ct_send here, which is not the new
>>>>>>>> non-blocking path?
>>>>>>>>
>>>>>>>
>>>>>>> There is not a change to ct_send(), just to intel_guc_ct_send.
>>>>>>
>>>>>> I was doing by the diff which says:
>>>>>>
>>>>>>     static int ct_send(struct intel_guc_ct *ct,
>>>>>>     		   const u32 *action,
>>>>>>     		   u32 len,
>>>>>> @@ -473,6 +541,7 @@ static int ct_send(struct intel_guc_ct *ct,
>>>>>>     		   u32 response_buf_size,
>>>>>>     		   u32 *status)
>>>>>>     {
>>>>>> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>>>>>>     	struct ct_request request;
>>>>>>     	unsigned long flags;
>>>>>>     	u32 fence;
>>>>>> @@ -482,8 +551,20 @@ static int ct_send(struct intel_guc_ct *ct,
>>>>>>     	GEM_BUG_ON(!len);
>>>>>>     	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
>>>>>>     	GEM_BUG_ON(!response_buf && response_buf_size);
>>>>>> +	might_sleep();
>>>>>> +	/*
>>>>>> +	 * We use a lazy spin wait loop here as we believe that if the CT
>>>>>> +	 * buffers are sized correctly the flow control condition should be
>>>>>> +	 * rare.
>>>>>> +	 */
>>>>>> +retry:
>>>>>>     	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
>>>>>> +	if (unlikely(!ctb_has_room(ctb, len + 1))) {
>>>>>> +		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
>>>>>> +		cond_resched();
>>>>>> +		goto retry;
>>>>>> +	}
>>>>>>
>>>>>> So it looks like a change to ct_send to me. Is that wrong?
>>>>
>>>> What about this part - is the patch changing the blocking ct_send or not,
>>>> and if it is why?
>>>>
>>>
>>> Yes, ct_send() changes. Sorry for the confusion.
>>>
>>> This function needs to be updated to account for the H2G space and
>>> backoff if no space is available.
>>
>> Since this one is the sleeping path, it probably can and needs to be smarter
>> than having a cond_resched busy loop added. Like sleep and get woken up when
>> there is space. Otherwise it can degenerate to busy looping via contention
>> with the non-blocking path.
>>
> 
> That screams over enginerring a simple problem to me. If the CT channel
> is full we are really in trouble anyways - i.e. the performance is going
> to terrible as we overwhelmed the GuC with traffic. That being said,

Performance of what would be terrible? Something relating to submitting 
new jobs to the GPU I guess. Or something SRIOV related as you hint below.

But there is no real reason why CPU cycles/power should suffer if GuC is 
busy.

Okay, if it can't happen in the real world then it's possibly passable as a 
design of a communication interface. But to me it leaves a bad taste and 
a doubt that there is this other aspect of the real world. And that is 
when the unexpected happens. Even the most trivial thing like a bug in 
GuC firmware causes the driver to busy-spin in there. So not much 
happening on the machine, but CPU cores pinned burning cycles in this 
code. It's just a lazy and not robust design. "Bug #nnnnn - High CPU usage 
and GUI blocked - Solution: Upgrade GuC firmware and _reboot_ the 
machine". Oh well..

At least I think the commit message should spell out clearly that a busy 
looping path is being added to the sleeping send as a downside of 
implementation choices. Still, for the record, I object to the design.
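
To illustrate what I mean, the usual shape would be something like the
below. Completely hand-wavy sketch - ct->wq is made up, and where exactly
the wake-up would come from is of course the open question:

	/* sleeping sender - condition re-checked under the lock before writing */
	ret = wait_event_interruptible(ct->wq, ctb_has_room(ctb, len + 1));
	if (ret)
		return ret;

	/* whoever learns that the GuC has freed up space in the buffer */
	wake_up(&ct->wq);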

Regards,

Tvrtko

> IGTs can do this but that really isn't a real world use case. For the
> real world, this buffer is large enough that it won't ever be full hence
> the comment + lazy spin loop.
> 
> Next, it isn't like we get an interrupt or something when space
> becomes available so how would we wake this thread? Could we come up
> with a convoluted scheme where we insert ops that generated an interrupt
> at regular intervals, probably? Would it be super complicated, totally
> unnecessary, and gain use nothing - absolutely.
> 
> Lastly, blocking CTBs really shouldn't ever be used. Certainly the
> submission code doesn't use these. I think SRIOV might, but those can
> probably be reworked too to use non-blocking. At some point we might
> want to scrub the driver and just delete the blocking path.
> 
> Matt
> 
>> Regards,
> 
>>
>> Tvrtko

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 36/97] drm/i915/guc: Add non blocking CTB send function
  2021-06-08  8:39                   ` Tvrtko Ursulin
@ 2021-06-08  8:46                     ` Daniel Vetter
  2021-06-09 23:10                       ` Matthew Brost
  2021-06-09 13:58                     ` Michal Wajdeczko
  1 sibling, 1 reply; 249+ messages in thread
From: Daniel Vetter @ 2021-06-08  8:46 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Matthew Brost, Jason Ekstrand, intel-gfx, dri-devel, Daniel Vetter

On Tue, Jun 8, 2021 at 10:39 AM Tvrtko Ursulin
<tvrtko.ursulin@linux.intel.com> wrote:
>
>
> On 07/06/2021 18:31, Matthew Brost wrote:
> > On Thu, May 27, 2021 at 04:11:50PM +0100, Tvrtko Ursulin wrote:
> >>
> >> On 27/05/2021 15:35, Matthew Brost wrote:
> >>> On Thu, May 27, 2021 at 11:02:24AM +0100, Tvrtko Ursulin wrote:
> >>>>
> >>>> On 26/05/2021 19:10, Matthew Brost wrote:
> >>>>
> >>>> [snip]
> >>>>
> >>>>>>>>> +static int ct_send_nb(struct intel_guc_ct *ct,
> >>>>>>>>> +                   const u32 *action,
> >>>>>>>>> +                   u32 len,
> >>>>>>>>> +                   u32 flags)
> >>>>>>>>> +{
> >>>>>>>>> +     struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> >>>>>>>>> +     unsigned long spin_flags;
> >>>>>>>>> +     u32 fence;
> >>>>>>>>> +     int ret;
> >>>>>>>>> +
> >>>>>>>>> +     spin_lock_irqsave(&ctb->lock, spin_flags);
> >>>>>>>>> +
> >>>>>>>>> +     ret = ctb_has_room(ctb, len + 1);
> >>>>>>>>> +     if (unlikely(ret))
> >>>>>>>>> +             goto out;
> >>>>>>>>> +
> >>>>>>>>> +     fence = ct_get_next_fence(ct);
> >>>>>>>>> +     ret = ct_write(ct, action, len, fence, flags);
> >>>>>>>>> +     if (unlikely(ret))
> >>>>>>>>> +             goto out;
> >>>>>>>>> +
> >>>>>>>>> +     intel_guc_notify(ct_to_guc(ct));
> >>>>>>>>> +
> >>>>>>>>> +out:
> >>>>>>>>> +     spin_unlock_irqrestore(&ctb->lock, spin_flags);
> >>>>>>>>> +
> >>>>>>>>> +     return ret;
> >>>>>>>>> +}
> >>>>>>>>> +
> >>>>>>>>>       static int ct_send(struct intel_guc_ct *ct,
> >>>>>>>>>                          const u32 *action,
> >>>>>>>>>                          u32 len,
> >>>>>>>>> @@ -473,6 +541,7 @@ static int ct_send(struct intel_guc_ct *ct,
> >>>>>>>>>                          u32 response_buf_size,
> >>>>>>>>>                          u32 *status)
> >>>>>>>>>       {
> >>>>>>>>> +     struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> >>>>>>>>>               struct ct_request request;
> >>>>>>>>>               unsigned long flags;
> >>>>>>>>>               u32 fence;
> >>>>>>>>> @@ -482,8 +551,20 @@ static int ct_send(struct intel_guc_ct *ct,
> >>>>>>>>>               GEM_BUG_ON(!len);
> >>>>>>>>>               GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
> >>>>>>>>>               GEM_BUG_ON(!response_buf && response_buf_size);
> >>>>>>>>> +     might_sleep();
> >>>>>>>>
> >>>>>>>> Sleep is just cond_resched below or there is more?
> >>>>>>>>
> >>>>>>>
> >>>>>>> Yes, the cond_resched.
> >>>>>>>
> >>>>>>>>> +     /*
> >>>>>>>>> +      * We use a lazy spin wait loop here as we believe that if the CT
> >>>>>>>>> +      * buffers are sized correctly the flow control condition should be
> >>>>>>>>> +      * rare.
> >>>>>>>>> +      */
> >>>>>>>>> +retry:
> >>>>>>>>>               spin_lock_irqsave(&ct->ctbs.send.lock, flags);
> >>>>>>>>> +     if (unlikely(!ctb_has_room(ctb, len + 1))) {
> >>>>>>>>> +             spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> >>>>>>>>> +             cond_resched();
> >>>>>>>>> +             goto retry;
> >>>>>>>>> +     }
> >>>>>>>>
> >>>>>>>> If this patch is about adding a non-blocking send function, and below we can
> >>>>>>>> see that it creates a fork:
> >>>>>>>>
> >>>>>>>> intel_guc_ct_send:
> >>>>>>>> ...
> >>>>>>>>        if (flags & INTEL_GUC_SEND_NB)
> >>>>>>>>                return ct_send_nb(ct, action, len, flags);
> >>>>>>>>
> >>>>>>>>        ret = ct_send(ct, action, len, response_buf, response_buf_size, &status);
> >>>>>>>>
> >>>>>>>> Then why is there a change in ct_send here, which is not the new
> >>>>>>>> non-blocking path?
> >>>>>>>>
> >>>>>>>
> >>>>>>> There is not a change to ct_send(), just to intel_guc_ct_send.
> >>>>>>
> >>>>>> I was doing by the diff which says:
> >>>>>>
> >>>>>>     static int ct_send(struct intel_guc_ct *ct,
> >>>>>>                     const u32 *action,
> >>>>>>                     u32 len,
> >>>>>> @@ -473,6 +541,7 @@ static int ct_send(struct intel_guc_ct *ct,
> >>>>>>                     u32 response_buf_size,
> >>>>>>                     u32 *status)
> >>>>>>     {
> >>>>>> +        struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> >>>>>>          struct ct_request request;
> >>>>>>          unsigned long flags;
> >>>>>>          u32 fence;
> >>>>>> @@ -482,8 +551,20 @@ static int ct_send(struct intel_guc_ct *ct,
> >>>>>>          GEM_BUG_ON(!len);
> >>>>>>          GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
> >>>>>>          GEM_BUG_ON(!response_buf && response_buf_size);
> >>>>>> +        might_sleep();
> >>>>>> +        /*
> >>>>>> +         * We use a lazy spin wait loop here as we believe that if the CT
> >>>>>> +         * buffers are sized correctly the flow control condition should be
> >>>>>> +         * rare.
> >>>>>> +         */
> >>>>>> +retry:
> >>>>>>          spin_lock_irqsave(&ct->ctbs.send.lock, flags);
> >>>>>> +        if (unlikely(!ctb_has_room(ctb, len + 1))) {
> >>>>>> +                spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> >>>>>> +                cond_resched();
> >>>>>> +                goto retry;
> >>>>>> +        }
> >>>>>>
> >>>>>> So it looks like a change to ct_send to me. Is that wrong?
> >>>>
> >>>> What about this part - is the patch changing the blocking ct_send or not,
> >>>> and if it is why?
> >>>>
> >>>
> >>> Yes, ct_send() changes. Sorry for the confusion.
> >>>
> >>> This function needs to be updated to account for the H2G space and
> >>> backoff if no space is available.
> >>
> >> Since this one is the sleeping path, it probably can and needs to be smarter
> >> than having a cond_resched busy loop added. Like sleep and get woken up when
> >> there is space. Otherwise it can degenerate to busy looping via contention
> >> with the non-blocking path.
> >>
> >
> > That screams over enginerring a simple problem to me. If the CT channel
> > is full we are really in trouble anyways - i.e. the performance is going
> > to terrible as we overwhelmed the GuC with traffic. That being said,
>
> Performance of what would be terrible? Something relating to submitting
> new jobs to the GPU I guess. Or something SRIOV related as you hint below.
>
> But there is no real reason why CPU cycles/power should suffer if GuC is
> busy.
>
> Okay, if it can't happen in real world then it's possibly passable as a
> design of a communication interface. But to me it leaves a bad taste and
> a doubt that there is this other aspect of the real world. And that is
> when the unexpected happens. Even the most trivial things like a bug in
> GuC firmware causes the driver to busy spin in there. So not much
> happening on the machine but CPU cores pinned burning cycles in this
> code. It's just lazy and not robust design. "Bug #nnnnn - High CPU usage
> and GUI blocked - Solution: Upgrade GuC firmware and _reboot_ the
> machine". Oh well..
>
> At least I think the commit message should spell out clearly that a busy
> looping path is being added to the sleeping send as a downside of
> implementation choices. Still, for the record, I object to the design.
>
> > IGTs can do this but that really isn't a real world use case. For the
> > real world, this buffer is large enough that it won't ever be full hence
> > the comment + lazy spin loop.
> >
> > Next, it isn't like we get an interrupt or something when space
> > becomes available so how would we wake this thread? Could we come up
> > with a convoluted scheme where we insert ops that generated an interrupt
> > at regular intervals, probably? Would it be super complicated, totally
> > unnecessary, and gain use nothing - absolutely.
> >
> > Lastly, blocking CTBs really shouldn't ever be used. Certainly the
> > submission code doesn't use these. I think SRIOV might, but those can
> > probably be reworked too to use non-blocking. At some point we might
> > want to scrub the driver and just delete the blocking path.

I'd do an s/cond_resched()/msleep(1)/ and a comment explaining why we
just don't care about this. That takes care of the CPU wasting in this
case (GuC is overloaded, it won't come back anytime soon anyway) and
explains why we really don't want to make this any more clever or
complex (the comment can explain why we won't hit this in actual real
world usage except when something else is on fire already anyway).

If you want to go absolutely overkill and it's not too much work, make
the msleep interruptible or check for signals, and bail out. That way
the process can be made unstuck with ^C at least.
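
For illustration only, a hypothetical sketch of that suggestion applied
to the retry loop in ct_send() - not the actual patch - could look
roughly like this:

	/*
	 * Sketch only: back off with an interruptible sleep instead of
	 * cond_resched(), and bail out if a signal is pending so the
	 * caller can be unstuck with ^C.
	 */
retry:
	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
	if (unlikely(!ctb_has_room(ctb, len + 1))) {
		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
		if (msleep_interruptible(1))
			return -EINTR;	/* signal pending, give up */
		goto retry;
	}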
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 36/97] drm/i915/guc: Add non blocking CTB send function
  2021-06-08  8:39                   ` Tvrtko Ursulin
  2021-06-08  8:46                     ` Daniel Vetter
@ 2021-06-09 13:58                     ` Michal Wajdeczko
  2021-06-09 23:05                       ` Matthew Brost
  1 sibling, 1 reply; 249+ messages in thread
From: Michal Wajdeczko @ 2021-06-09 13:58 UTC (permalink / raw)
  To: Tvrtko Ursulin, Matthew Brost
  Cc: jason.ekstrand, daniel.vetter, intel-gfx, dri-devel



On 08.06.2021 10:39, Tvrtko Ursulin wrote:
> 
> On 07/06/2021 18:31, Matthew Brost wrote:
>> On Thu, May 27, 2021 at 04:11:50PM +0100, Tvrtko Ursulin wrote:
>>>
>>> On 27/05/2021 15:35, Matthew Brost wrote:
>>>> On Thu, May 27, 2021 at 11:02:24AM +0100, Tvrtko Ursulin wrote:
>>>>>
>>>>> On 26/05/2021 19:10, Matthew Brost wrote:
>>>>>
>>>>> [snip]
>>>>>
>>>>>>>>>> +static int ct_send_nb(struct intel_guc_ct *ct,
>>>>>>>>>> +              const u32 *action,
>>>>>>>>>> +              u32 len,
>>>>>>>>>> +              u32 flags)
>>>>>>>>>> +{
>>>>>>>>>> +    struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>>>>>>>>>> +    unsigned long spin_flags;
>>>>>>>>>> +    u32 fence;
>>>>>>>>>> +    int ret;
>>>>>>>>>> +
>>>>>>>>>> +    spin_lock_irqsave(&ctb->lock, spin_flags);
>>>>>>>>>> +
>>>>>>>>>> +    ret = ctb_has_room(ctb, len + 1);
>>>>>>>>>> +    if (unlikely(ret))
>>>>>>>>>> +        goto out;
>>>>>>>>>> +
>>>>>>>>>> +    fence = ct_get_next_fence(ct);
>>>>>>>>>> +    ret = ct_write(ct, action, len, fence, flags);
>>>>>>>>>> +    if (unlikely(ret))
>>>>>>>>>> +        goto out;
>>>>>>>>>> +
>>>>>>>>>> +    intel_guc_notify(ct_to_guc(ct));
>>>>>>>>>> +
>>>>>>>>>> +out:
>>>>>>>>>> +    spin_unlock_irqrestore(&ctb->lock, spin_flags);
>>>>>>>>>> +
>>>>>>>>>> +    return ret;
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>>       static int ct_send(struct intel_guc_ct *ct,
>>>>>>>>>>                  const u32 *action,
>>>>>>>>>>                  u32 len,
>>>>>>>>>> @@ -473,6 +541,7 @@ static int ct_send(struct intel_guc_ct *ct,
>>>>>>>>>>                  u32 response_buf_size,
>>>>>>>>>>                  u32 *status)
>>>>>>>>>>       {
>>>>>>>>>> +    struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>>>>>>>>>>           struct ct_request request;
>>>>>>>>>>           unsigned long flags;
>>>>>>>>>>           u32 fence;
>>>>>>>>>> @@ -482,8 +551,20 @@ static int ct_send(struct intel_guc_ct *ct,
>>>>>>>>>>           GEM_BUG_ON(!len);
>>>>>>>>>>           GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
>>>>>>>>>>           GEM_BUG_ON(!response_buf && response_buf_size);
>>>>>>>>>> +    might_sleep();
>>>>>>>>>
>>>>>>>>> Sleep is just cond_resched below or there is more?
>>>>>>>>>
>>>>>>>>
>>>>>>>> Yes, the cond_resched.
>>>>>>>>
>>>>>>>>>> +    /*
>>>>>>>>>> +     * We use a lazy spin wait loop here as we believe that
>>>>>>>>>> if the CT
>>>>>>>>>> +     * buffers are sized correctly the flow control condition
>>>>>>>>>> should be
>>>>>>>>>> +     * rare.
>>>>>>>>>> +     */
>>>>>>>>>> +retry:
>>>>>>>>>>           spin_lock_irqsave(&ct->ctbs.send.lock, flags);
>>>>>>>>>> +    if (unlikely(!ctb_has_room(ctb, len + 1))) {
>>>>>>>>>> +        spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
>>>>>>>>>> +        cond_resched();
>>>>>>>>>> +        goto retry;
>>>>>>>>>> +    }
>>>>>>>>>
>>>>>>>>> If this patch is about adding a non-blocking send function, and
>>>>>>>>> below we can
>>>>>>>>> see that it creates a fork:
>>>>>>>>>
>>>>>>>>> intel_guc_ct_send:
>>>>>>>>> ...
>>>>>>>>>     if (flags & INTEL_GUC_SEND_NB)
>>>>>>>>>         return ct_send_nb(ct, action, len, flags);
>>>>>>>>>
>>>>>>>>>          ret = ct_send(ct, action, len, response_buf,
>>>>>>>>> response_buf_size, &status);
>>>>>>>>>
>>>>>>>>> Then why is there a change in ct_send here, which is not the new
>>>>>>>>> non-blocking path?
>>>>>>>>>
>>>>>>>>
>>>>>>>> There is not a change to ct_send(), just to intel_guc_ct_send.
>>>>>>>
>>>>>>> I was doing by the diff which says:
>>>>>>>
>>>>>>>     static int ct_send(struct intel_guc_ct *ct,
>>>>>>>                const u32 *action,
>>>>>>>                u32 len,
>>>>>>> @@ -473,6 +541,7 @@ static int ct_send(struct intel_guc_ct *ct,
>>>>>>>                u32 response_buf_size,
>>>>>>>                u32 *status)
>>>>>>>     {
>>>>>>> +    struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>>>>>>>         struct ct_request request;
>>>>>>>         unsigned long flags;
>>>>>>>         u32 fence;
>>>>>>> @@ -482,8 +551,20 @@ static int ct_send(struct intel_guc_ct *ct,
>>>>>>>         GEM_BUG_ON(!len);
>>>>>>>         GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
>>>>>>>         GEM_BUG_ON(!response_buf && response_buf_size);
>>>>>>> +    might_sleep();
>>>>>>> +    /*
>>>>>>> +     * We use a lazy spin wait loop here as we believe that if
>>>>>>> the CT
>>>>>>> +     * buffers are sized correctly the flow control condition
>>>>>>> should be
>>>>>>> +     * rare.
>>>>>>> +     */
>>>>>>> +retry:
>>>>>>>         spin_lock_irqsave(&ct->ctbs.send.lock, flags);
>>>>>>> +    if (unlikely(!ctb_has_room(ctb, len + 1))) {
>>>>>>> +        spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
>>>>>>> +        cond_resched();
>>>>>>> +        goto retry;
>>>>>>> +    }
>>>>>>>
>>>>>>> So it looks like a change to ct_send to me. Is that wrong?
>>>>>
>>>>> What about this part - is the patch changing the blocking ct_send
>>>>> or not,
>>>>> and if it is why?
>>>>>
>>>>
>>>> Yes, ct_send() changes. Sorry for the confusion.
>>>>
>>>> This function needs to be updated to account for the H2G space and
>>>> backoff if no space is available.
>>>
>>> Since this one is the sleeping path, it probably can and needs to be
>>> smarter
>>> than having a cond_resched busy loop added. Like sleep and get woken
>>> up when
>>> there is space. Otherwise it can degenerate to busy looping via
>>> contention
>>> with the non-blocking path.
>>>
>>
>> That screams over enginerring a simple problem to me. If the CT channel
>> is full we are really in trouble anyways - i.e. the performance is going
>> to terrible as we overwhelmed the GuC with traffic. That being said,
> 
> Performance of what would be terrible? Something relating to submitting
> new jobs to the GPU I guess. Or something SRIOV related as you hint below.
> 
> But there is no real reason why CPU cycles/power should suffer if GuC is
> busy.
> 
> Okay, if it can't happen in real world then it's possibly passable as a

if that can't happen in the real world, then maybe we can just return
-ENOSPC/-EBUSY to report that 'unexpected' case, instead of hiding it
behind a silent busy loop?
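
(a minimal, hypothetical sketch of that alternative - not part of this
series - could be:)

	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
	if (unlikely(!ctb_has_room(ctb, len + 1))) {
		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
		/* surface the 'unexpected' full-CTB case to the caller */
		return -EBUSY;
	}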

> design of a communication interface. But to me it leaves a bad taste and
> a doubt that there is this other aspect of the real world. And that is
> when the unexpected happens. Even the most trivial things like a bug in
> GuC firmware causes the driver to busy spin in there. So not much
> happening on the machine but CPU cores pinned burning cycles in this
> code. It's just lazy and not robust design. "Bug #nnnnn - High CPU usage
> and GUI blocked - Solution: Upgrade GuC firmware and _reboot_ the
> machine". Oh well..
> 
> At least I think the commit message should spell out clearly that a busy
> looping path is being added to the sleeping send as a downside of
> implementation choices. Still, for the record, I object to the design.
> 
> Regards,
> 
> Tvrtko
> 
>> IGTs can do this but that really isn't a real world use case. For the
>> real world, this buffer is large enough that it won't ever be full hence
>> the comment + lazy spin loop.
>>
>> Next, it isn't like we get an interrupt or something when space
>> becomes available so how would we wake this thread? Could we come up
>> with a convoluted scheme where we insert ops that generated an interrupt
>> at regular intervals, probably? Would it be super complicated, totally
>> unnecessary, and gain use nothing - absolutely.
>>
>> Lastly, blocking CTBs really shouldn't ever be used. Certainly the
>> submission code doesn't use these. I think SRIOV might, but those can
>> probably be reworked too to use non-blocking. At some point we might
>> want to scrub the driver and just delete the blocking path.
>>
>> Matt
>>
>>> Regards,
>>
>>>
>>> Tvrtko
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 36/97] drm/i915/guc: Add non blocking CTB send function
  2021-06-07 17:31                 ` Matthew Brost
  2021-06-08  8:39                   ` Tvrtko Ursulin
@ 2021-06-09 14:14                   ` Michal Wajdeczko
  2021-06-09 23:13                     ` Matthew Brost
  1 sibling, 1 reply; 249+ messages in thread
From: Michal Wajdeczko @ 2021-06-09 14:14 UTC (permalink / raw)
  To: Matthew Brost, Tvrtko Ursulin
  Cc: jason.ekstrand, daniel.vetter, intel-gfx, dri-devel



On 07.06.2021 19:31, Matthew Brost wrote:
> On Thu, May 27, 2021 at 04:11:50PM +0100, Tvrtko Ursulin wrote:
>>
>> On 27/05/2021 15:35, Matthew Brost wrote:
>>> On Thu, May 27, 2021 at 11:02:24AM +0100, Tvrtko Ursulin wrote:
>>>>
>>>> On 26/05/2021 19:10, Matthew Brost wrote:
>>>>
>>>> [snip]
>>>>
>>>>>>>>> +static int ct_send_nb(struct intel_guc_ct *ct,
>>>>>>>>> +		      const u32 *action,
>>>>>>>>> +		      u32 len,
>>>>>>>>> +		      u32 flags)
>>>>>>>>> +{
>>>>>>>>> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>>>>>>>>> +	unsigned long spin_flags;
>>>>>>>>> +	u32 fence;
>>>>>>>>> +	int ret;
>>>>>>>>> +
>>>>>>>>> +	spin_lock_irqsave(&ctb->lock, spin_flags);
>>>>>>>>> +
>>>>>>>>> +	ret = ctb_has_room(ctb, len + 1);
>>>>>>>>> +	if (unlikely(ret))
>>>>>>>>> +		goto out;
>>>>>>>>> +
>>>>>>>>> +	fence = ct_get_next_fence(ct);
>>>>>>>>> +	ret = ct_write(ct, action, len, fence, flags);
>>>>>>>>> +	if (unlikely(ret))
>>>>>>>>> +		goto out;
>>>>>>>>> +
>>>>>>>>> +	intel_guc_notify(ct_to_guc(ct));
>>>>>>>>> +
>>>>>>>>> +out:
>>>>>>>>> +	spin_unlock_irqrestore(&ctb->lock, spin_flags);
>>>>>>>>> +
>>>>>>>>> +	return ret;
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>>      static int ct_send(struct intel_guc_ct *ct,
>>>>>>>>>      		   const u32 *action,
>>>>>>>>>      		   u32 len,
>>>>>>>>> @@ -473,6 +541,7 @@ static int ct_send(struct intel_guc_ct *ct,
>>>>>>>>>      		   u32 response_buf_size,
>>>>>>>>>      		   u32 *status)
>>>>>>>>>      {
>>>>>>>>> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>>>>>>>>>      	struct ct_request request;
>>>>>>>>>      	unsigned long flags;
>>>>>>>>>      	u32 fence;
>>>>>>>>> @@ -482,8 +551,20 @@ static int ct_send(struct intel_guc_ct *ct,
>>>>>>>>>      	GEM_BUG_ON(!len);
>>>>>>>>>      	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
>>>>>>>>>      	GEM_BUG_ON(!response_buf && response_buf_size);
>>>>>>>>> +	might_sleep();
>>>>>>>>
>>>>>>>> Sleep is just cond_resched below or there is more?
>>>>>>>>
>>>>>>>
>>>>>>> Yes, the cond_resched.
>>>>>>>
>>>>>>>>> +	/*
>>>>>>>>> +	 * We use a lazy spin wait loop here as we believe that if the CT
>>>>>>>>> +	 * buffers are sized correctly the flow control condition should be
>>>>>>>>> +	 * rare.
>>>>>>>>> +	 */
>>>>>>>>> +retry:
>>>>>>>>>      	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
>>>>>>>>> +	if (unlikely(!ctb_has_room(ctb, len + 1))) {
>>>>>>>>> +		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
>>>>>>>>> +		cond_resched();
>>>>>>>>> +		goto retry;
>>>>>>>>> +	}
>>>>>>>>
>>>>>>>> If this patch is about adding a non-blocking send function, and below we can
>>>>>>>> see that it creates a fork:
>>>>>>>>
>>>>>>>> intel_guc_ct_send:
>>>>>>>> ...
>>>>>>>> 	if (flags & INTEL_GUC_SEND_NB)
>>>>>>>> 		return ct_send_nb(ct, action, len, flags);
>>>>>>>>
>>>>>>>>     	ret = ct_send(ct, action, len, response_buf, response_buf_size, &status);
>>>>>>>>
>>>>>>>> Then why is there a change in ct_send here, which is not the new
>>>>>>>> non-blocking path?
>>>>>>>>
>>>>>>>
>>>>>>> There is not a change to ct_send(), just to intel_guc_ct_send.
>>>>>>
>>>>>> I was doing by the diff which says:
>>>>>>
>>>>>>    static int ct_send(struct intel_guc_ct *ct,
>>>>>>    		   const u32 *action,
>>>>>>    		   u32 len,
>>>>>> @@ -473,6 +541,7 @@ static int ct_send(struct intel_guc_ct *ct,
>>>>>>    		   u32 response_buf_size,
>>>>>>    		   u32 *status)
>>>>>>    {
>>>>>> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>>>>>>    	struct ct_request request;
>>>>>>    	unsigned long flags;
>>>>>>    	u32 fence;
>>>>>> @@ -482,8 +551,20 @@ static int ct_send(struct intel_guc_ct *ct,
>>>>>>    	GEM_BUG_ON(!len);
>>>>>>    	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
>>>>>>    	GEM_BUG_ON(!response_buf && response_buf_size);
>>>>>> +	might_sleep();
>>>>>> +	/*
>>>>>> +	 * We use a lazy spin wait loop here as we believe that if the CT
>>>>>> +	 * buffers are sized correctly the flow control condition should be
>>>>>> +	 * rare.
>>>>>> +	 */
>>>>>> +retry:
>>>>>>    	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
>>>>>> +	if (unlikely(!ctb_has_room(ctb, len + 1))) {
>>>>>> +		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
>>>>>> +		cond_resched();
>>>>>> +		goto retry;
>>>>>> +	}
>>>>>>
>>>>>> So it looks like a change to ct_send to me. Is that wrong?
>>>>
>>>> What about this part - is the patch changing the blocking ct_send or not,
>>>> and if it is why?
>>>>
>>>
>>> Yes, ct_send() changes. Sorry for the confusion.
>>>
>>> This function needs to be updated to account for the H2G space and
>>> backoff if no space is available.
>>
>> Since this one is the sleeping path, it probably can and needs to be smarter
>> than having a cond_resched busy loop added. Like sleep and get woken up when
>> there is space. Otherwise it can degenerate to busy looping via contention
>> with the non-blocking path.
>>
> 
> That screams over enginerring a simple problem to me. If the CT channel
> is full we are really in trouble anyways - i.e. the performance is going
> to terrible as we overwhelmed the GuC with traffic. That being said,
> IGTs can do this but that really isn't a real world use case. For the
> real world, this buffer is large enough that it won't ever be full hence
> the comment + lazy spin loop.
> 
> Next, it isn't like we get an interrupt or something when space
> becomes available so how would we wake this thread? Could we come up
> with a convoluted scheme where we insert ops that generated an interrupt
> at regular intervals, probably? Would it be super complicated, totally
> unnecessary, and gain use nothing - absolutely.
> 
> Lastly, blocking CTBs really shouldn't ever be used. Certainly the
> submission code doesn't use these. I think SRIOV might, but those can
> probably be reworked too to use non-blocking. At some point we might
> want to scrub the driver and just delete the blocking path.

I guess the main problem is not with "blocking CTBs", as now only the
calling thread is "blocked" waiting for a reply and other threads can
still send their CTBs (blocking/non-blocking), but the fact that we are
sending too many messages, stopping only when the CTB is full, and even
then trying hard to squeeze that message in again.

it should be the caller's responsibility to throttle its stream of
non-blocking CTBs either when we are running out of CTB space or when
we have too many "non-blocking" requests in flight.

making the CTB buffer larger and larger does not solve the problem,
only makes it less visible

and as you are using a busy-loop to send even 'non-blocking' CTBs, it
might indicate that your code is not prepared to step back in case of
any temporary CTB congestion

also note that currently all CTB messages are asynchronous, so a
REQUEST / RESPONSE pair could be processed in a fully non-blocking
approach, but that would require refactoring part of the driver into an
event-driven state machine, as sometimes we can't move forward without
information that we are waiting for from the GuC (and blocking was the
simplest solution for that)

but if your submission code is already event-driven, then it should be
easier to trigger the state machine into 'retry' mode without using
this busy-loop
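
(as a rough, hypothetical illustration of such caller-side throttling -
the outstanding_requests counter, the helper and the limit below are
invented here, not part of the driver - something like:)

	/* invented example: cap the number of non-blocking H2G CTBs in flight */
	#define MAX_OUTSTANDING_H2G	64	/* made-up budget */

	static int guc_send_throttled(struct intel_guc_ct *ct,
				      const u32 *action, u32 len)
	{
		if (atomic_read(&ct->outstanding_requests) >= MAX_OUTSTANDING_H2G)
			return -EBUSY;	/* caller's state machine retries later */

		atomic_inc(&ct->outstanding_requests);
		/* decremented again when the matching G2H response is processed */
		return ct_send_nb(ct, action, len, INTEL_GUC_SEND_NB);
	}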

> 
> Matt
> 
>> Regards,
> 
>>
>> Tvrtko

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 36/97] drm/i915/guc: Add non blocking CTB send function
  2021-06-09 13:58                     ` Michal Wajdeczko
@ 2021-06-09 23:05                       ` Matthew Brost
  0 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-06-09 23:05 UTC (permalink / raw)
  To: Michal Wajdeczko
  Cc: jason.ekstrand, Tvrtko Ursulin, intel-gfx, dri-devel, daniel.vetter

On Wed, Jun 09, 2021 at 03:58:38PM +0200, Michal Wajdeczko wrote:
> 
> 
> On 08.06.2021 10:39, Tvrtko Ursulin wrote:
> > 
> > On 07/06/2021 18:31, Matthew Brost wrote:
> >> On Thu, May 27, 2021 at 04:11:50PM +0100, Tvrtko Ursulin wrote:
> >>>
> >>> On 27/05/2021 15:35, Matthew Brost wrote:
> >>>> On Thu, May 27, 2021 at 11:02:24AM +0100, Tvrtko Ursulin wrote:
> >>>>>
> >>>>> On 26/05/2021 19:10, Matthew Brost wrote:
> >>>>>
> >>>>> [snip]
> >>>>>
> >>>>>>>>>> +static int ct_send_nb(struct intel_guc_ct *ct,
> >>>>>>>>>> +              const u32 *action,
> >>>>>>>>>> +              u32 len,
> >>>>>>>>>> +              u32 flags)
> >>>>>>>>>> +{
> >>>>>>>>>> +    struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> >>>>>>>>>> +    unsigned long spin_flags;
> >>>>>>>>>> +    u32 fence;
> >>>>>>>>>> +    int ret;
> >>>>>>>>>> +
> >>>>>>>>>> +    spin_lock_irqsave(&ctb->lock, spin_flags);
> >>>>>>>>>> +
> >>>>>>>>>> +    ret = ctb_has_room(ctb, len + 1);
> >>>>>>>>>> +    if (unlikely(ret))
> >>>>>>>>>> +        goto out;
> >>>>>>>>>> +
> >>>>>>>>>> +    fence = ct_get_next_fence(ct);
> >>>>>>>>>> +    ret = ct_write(ct, action, len, fence, flags);
> >>>>>>>>>> +    if (unlikely(ret))
> >>>>>>>>>> +        goto out;
> >>>>>>>>>> +
> >>>>>>>>>> +    intel_guc_notify(ct_to_guc(ct));
> >>>>>>>>>> +
> >>>>>>>>>> +out:
> >>>>>>>>>> +    spin_unlock_irqrestore(&ctb->lock, spin_flags);
> >>>>>>>>>> +
> >>>>>>>>>> +    return ret;
> >>>>>>>>>> +}
> >>>>>>>>>> +
> >>>>>>>>>>       static int ct_send(struct intel_guc_ct *ct,
> >>>>>>>>>>                  const u32 *action,
> >>>>>>>>>>                  u32 len,
> >>>>>>>>>> @@ -473,6 +541,7 @@ static int ct_send(struct intel_guc_ct *ct,
> >>>>>>>>>>                  u32 response_buf_size,
> >>>>>>>>>>                  u32 *status)
> >>>>>>>>>>       {
> >>>>>>>>>> +    struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> >>>>>>>>>>           struct ct_request request;
> >>>>>>>>>>           unsigned long flags;
> >>>>>>>>>>           u32 fence;
> >>>>>>>>>> @@ -482,8 +551,20 @@ static int ct_send(struct intel_guc_ct *ct,
> >>>>>>>>>>           GEM_BUG_ON(!len);
> >>>>>>>>>>           GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
> >>>>>>>>>>           GEM_BUG_ON(!response_buf && response_buf_size);
> >>>>>>>>>> +    might_sleep();
> >>>>>>>>>
> >>>>>>>>> Sleep is just cond_resched below or there is more?
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> Yes, the cond_resched.
> >>>>>>>>
> >>>>>>>>>> +    /*
> >>>>>>>>>> +     * We use a lazy spin wait loop here as we believe that
> >>>>>>>>>> if the CT
> >>>>>>>>>> +     * buffers are sized correctly the flow control condition
> >>>>>>>>>> should be
> >>>>>>>>>> +     * rare.
> >>>>>>>>>> +     */
> >>>>>>>>>> +retry:
> >>>>>>>>>>           spin_lock_irqsave(&ct->ctbs.send.lock, flags);
> >>>>>>>>>> +    if (unlikely(!ctb_has_room(ctb, len + 1))) {
> >>>>>>>>>> +        spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> >>>>>>>>>> +        cond_resched();
> >>>>>>>>>> +        goto retry;
> >>>>>>>>>> +    }
> >>>>>>>>>
> >>>>>>>>> If this patch is about adding a non-blocking send function, and
> >>>>>>>>> below we can
> >>>>>>>>> see that it creates a fork:
> >>>>>>>>>
> >>>>>>>>> intel_guc_ct_send:
> >>>>>>>>> ...
> >>>>>>>>>     if (flags & INTEL_GUC_SEND_NB)
> >>>>>>>>>         return ct_send_nb(ct, action, len, flags);
> >>>>>>>>>
> >>>>>>>>>          ret = ct_send(ct, action, len, response_buf,
> >>>>>>>>> response_buf_size, &status);
> >>>>>>>>>
> >>>>>>>>> Then why is there a change in ct_send here, which is not the new
> >>>>>>>>> non-blocking path?
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> There is not a change to ct_send(), just to intel_guc_ct_send.
> >>>>>>>
> >>>>>>> I was doing by the diff which says:
> >>>>>>>
> >>>>>>>     static int ct_send(struct intel_guc_ct *ct,
> >>>>>>>                const u32 *action,
> >>>>>>>                u32 len,
> >>>>>>> @@ -473,6 +541,7 @@ static int ct_send(struct intel_guc_ct *ct,
> >>>>>>>                u32 response_buf_size,
> >>>>>>>                u32 *status)
> >>>>>>>     {
> >>>>>>> +    struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> >>>>>>>         struct ct_request request;
> >>>>>>>         unsigned long flags;
> >>>>>>>         u32 fence;
> >>>>>>> @@ -482,8 +551,20 @@ static int ct_send(struct intel_guc_ct *ct,
> >>>>>>>         GEM_BUG_ON(!len);
> >>>>>>>         GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
> >>>>>>>         GEM_BUG_ON(!response_buf && response_buf_size);
> >>>>>>> +    might_sleep();
> >>>>>>> +    /*
> >>>>>>> +     * We use a lazy spin wait loop here as we believe that if
> >>>>>>> the CT
> >>>>>>> +     * buffers are sized correctly the flow control condition
> >>>>>>> should be
> >>>>>>> +     * rare.
> >>>>>>> +     */
> >>>>>>> +retry:
> >>>>>>>         spin_lock_irqsave(&ct->ctbs.send.lock, flags);
> >>>>>>> +    if (unlikely(!ctb_has_room(ctb, len + 1))) {
> >>>>>>> +        spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> >>>>>>> +        cond_resched();
> >>>>>>> +        goto retry;
> >>>>>>> +    }
> >>>>>>>
> >>>>>>> So it looks like a change to ct_send to me. Is that wrong?
> >>>>>
> >>>>> What about this part - is the patch changing the blocking ct_send
> >>>>> or not,
> >>>>> and if it is why?
> >>>>>
> >>>>
> >>>> Yes, ct_send() changes. Sorry for the confusion.
> >>>>
> >>>> This function needs to be updated to account for the H2G space and
> >>>> backoff if no space is available.
> >>>
> >>> Since this one is the sleeping path, it probably can and needs to be
> >>> smarter
> >>> than having a cond_resched busy loop added. Like sleep and get woken
> >>> up when
> >>> there is space. Otherwise it can degenerate to busy looping via
> >>> contention
> >>> with the non-blocking path.
> >>>
> >>
> >> That screams over enginerring a simple problem to me. If the CT channel
> >> is full we are really in trouble anyways - i.e. the performance is going
> >> to terrible as we overwhelmed the GuC with traffic. That being said,
> > 
> > Performance of what would be terrible? Something relating to submitting
> > new jobs to the GPU I guess. Or something SRIOV related as you hint below.
> > 
> > But there is no real reason why CPU cycles/power should suffer if GuC is
> > busy.
> > 
> > Okay, if it can't happen in real world then it's possibly passable as a
> 
> if that can't happen in real world, then maybe we can just return
> -ENOSPC/-EBUSY to report that 'unexpected' case, instead of hiding it
> behind silent busy loop ?
> 

No. This is a blocking call, hence it is ok for the function to block,
with a timeout, if it doesn't have space.

Matt

> > design of a communication interface. But to me it leaves a bad taste and
> > a doubt that there is this other aspect of the real world. And that is
> > when the unexpected happens. Even the most trivial things like a bug in
> > GuC firmware causes the driver to busy spin in there. So not much
> > happening on the machine but CPU cores pinned burning cycles in this
> > code. It's just lazy and not robust design. "Bug #nnnnn - High CPU usage
> > and GUI blocked - Solution: Upgrade GuC firmware and _reboot_ the
> > machine". Oh well..
> > 
> > At least I think the commit message should spell out clearly that a busy
> > looping path is being added to the sleeping send as a downside of
> > implementation choices. Still, for the record, I object to the design.
> > 
> > Regards,
> > 
> > Tvrtko
> > 
> >> IGTs can do this but that really isn't a real world use case. For the
> >> real world, this buffer is large enough that it won't ever be full hence
> >> the comment + lazy spin loop.
> >>
> >> Next, it isn't like we get an interrupt or something when space
> >> becomes available so how would we wake this thread? Could we come up
> >> with a convoluted scheme where we insert ops that generated an interrupt
> >> at regular intervals, probably? Would it be super complicated, totally
> >> unnecessary, and gain use nothing - absolutely.
> >>
> >> Lastly, blocking CTBs really shouldn't ever be used. Certainly the
> >> submission code doesn't use these. I think SRIOV might, but those can
> >> probably be reworked too to use non-blocking. At some point we might
> >> want to scrub the driver and just delete the blocking path.
> >>
> >> Matt
> >>
> >>> Regards,
> >>
> >>>
> >>> Tvrtko
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 36/97] drm/i915/guc: Add non blocking CTB send function
  2021-06-08  8:46                     ` Daniel Vetter
@ 2021-06-09 23:10                       ` Matthew Brost
  2021-06-10 15:27                         ` Daniel Vetter
  0 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-06-09 23:10 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Jason Ekstrand, Tvrtko Ursulin, intel-gfx, dri-devel, Daniel Vetter

On Tue, Jun 08, 2021 at 10:46:15AM +0200, Daniel Vetter wrote:
> On Tue, Jun 8, 2021 at 10:39 AM Tvrtko Ursulin
> <tvrtko.ursulin@linux.intel.com> wrote:
> >
> >
> > On 07/06/2021 18:31, Matthew Brost wrote:
> > > On Thu, May 27, 2021 at 04:11:50PM +0100, Tvrtko Ursulin wrote:
> > >>
> > >> On 27/05/2021 15:35, Matthew Brost wrote:
> > >>> On Thu, May 27, 2021 at 11:02:24AM +0100, Tvrtko Ursulin wrote:
> > >>>>
> > >>>> On 26/05/2021 19:10, Matthew Brost wrote:
> > >>>>
> > >>>> [snip]
> > >>>>
> > >>>>>>>>> +static int ct_send_nb(struct intel_guc_ct *ct,
> > >>>>>>>>> +                   const u32 *action,
> > >>>>>>>>> +                   u32 len,
> > >>>>>>>>> +                   u32 flags)
> > >>>>>>>>> +{
> > >>>>>>>>> +     struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> > >>>>>>>>> +     unsigned long spin_flags;
> > >>>>>>>>> +     u32 fence;
> > >>>>>>>>> +     int ret;
> > >>>>>>>>> +
> > >>>>>>>>> +     spin_lock_irqsave(&ctb->lock, spin_flags);
> > >>>>>>>>> +
> > >>>>>>>>> +     ret = ctb_has_room(ctb, len + 1);
> > >>>>>>>>> +     if (unlikely(ret))
> > >>>>>>>>> +             goto out;
> > >>>>>>>>> +
> > >>>>>>>>> +     fence = ct_get_next_fence(ct);
> > >>>>>>>>> +     ret = ct_write(ct, action, len, fence, flags);
> > >>>>>>>>> +     if (unlikely(ret))
> > >>>>>>>>> +             goto out;
> > >>>>>>>>> +
> > >>>>>>>>> +     intel_guc_notify(ct_to_guc(ct));
> > >>>>>>>>> +
> > >>>>>>>>> +out:
> > >>>>>>>>> +     spin_unlock_irqrestore(&ctb->lock, spin_flags);
> > >>>>>>>>> +
> > >>>>>>>>> +     return ret;
> > >>>>>>>>> +}
> > >>>>>>>>> +
> > >>>>>>>>>       static int ct_send(struct intel_guc_ct *ct,
> > >>>>>>>>>                          const u32 *action,
> > >>>>>>>>>                          u32 len,
> > >>>>>>>>> @@ -473,6 +541,7 @@ static int ct_send(struct intel_guc_ct *ct,
> > >>>>>>>>>                          u32 response_buf_size,
> > >>>>>>>>>                          u32 *status)
> > >>>>>>>>>       {
> > >>>>>>>>> +     struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> > >>>>>>>>>               struct ct_request request;
> > >>>>>>>>>               unsigned long flags;
> > >>>>>>>>>               u32 fence;
> > >>>>>>>>> @@ -482,8 +551,20 @@ static int ct_send(struct intel_guc_ct *ct,
> > >>>>>>>>>               GEM_BUG_ON(!len);
> > >>>>>>>>>               GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
> > >>>>>>>>>               GEM_BUG_ON(!response_buf && response_buf_size);
> > >>>>>>>>> +     might_sleep();
> > >>>>>>>>
> > >>>>>>>> Sleep is just cond_resched below or there is more?
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>> Yes, the cond_resched.
> > >>>>>>>
> > >>>>>>>>> +     /*
> > >>>>>>>>> +      * We use a lazy spin wait loop here as we believe that if the CT
> > >>>>>>>>> +      * buffers are sized correctly the flow control condition should be
> > >>>>>>>>> +      * rare.
> > >>>>>>>>> +      */
> > >>>>>>>>> +retry:
> > >>>>>>>>>               spin_lock_irqsave(&ct->ctbs.send.lock, flags);
> > >>>>>>>>> +     if (unlikely(!ctb_has_room(ctb, len + 1))) {
> > >>>>>>>>> +             spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> > >>>>>>>>> +             cond_resched();
> > >>>>>>>>> +             goto retry;
> > >>>>>>>>> +     }
> > >>>>>>>>
> > >>>>>>>> If this patch is about adding a non-blocking send function, and below we can
> > >>>>>>>> see that it creates a fork:
> > >>>>>>>>
> > >>>>>>>> intel_guc_ct_send:
> > >>>>>>>> ...
> > >>>>>>>>        if (flags & INTEL_GUC_SEND_NB)
> > >>>>>>>>                return ct_send_nb(ct, action, len, flags);
> > >>>>>>>>
> > >>>>>>>>        ret = ct_send(ct, action, len, response_buf, response_buf_size, &status);
> > >>>>>>>>
> > >>>>>>>> Then why is there a change in ct_send here, which is not the new
> > >>>>>>>> non-blocking path?
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>> There is not a change to ct_send(), just to intel_guc_ct_send.
> > >>>>>>
> > >>>>>> I was doing by the diff which says:
> > >>>>>>
> > >>>>>>     static int ct_send(struct intel_guc_ct *ct,
> > >>>>>>                     const u32 *action,
> > >>>>>>                     u32 len,
> > >>>>>> @@ -473,6 +541,7 @@ static int ct_send(struct intel_guc_ct *ct,
> > >>>>>>                     u32 response_buf_size,
> > >>>>>>                     u32 *status)
> > >>>>>>     {
> > >>>>>> +        struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> > >>>>>>          struct ct_request request;
> > >>>>>>          unsigned long flags;
> > >>>>>>          u32 fence;
> > >>>>>> @@ -482,8 +551,20 @@ static int ct_send(struct intel_guc_ct *ct,
> > >>>>>>          GEM_BUG_ON(!len);
> > >>>>>>          GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
> > >>>>>>          GEM_BUG_ON(!response_buf && response_buf_size);
> > >>>>>> +        might_sleep();
> > >>>>>> +        /*
> > >>>>>> +         * We use a lazy spin wait loop here as we believe that if the CT
> > >>>>>> +         * buffers are sized correctly the flow control condition should be
> > >>>>>> +         * rare.
> > >>>>>> +         */
> > >>>>>> +retry:
> > >>>>>>          spin_lock_irqsave(&ct->ctbs.send.lock, flags);
> > >>>>>> +        if (unlikely(!ctb_has_room(ctb, len + 1))) {
> > >>>>>> +                spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> > >>>>>> +                cond_resched();
> > >>>>>> +                goto retry;
> > >>>>>> +        }
> > >>>>>>
> > >>>>>> So it looks like a change to ct_send to me. Is that wrong?
> > >>>>
> > >>>> What about this part - is the patch changing the blocking ct_send or not,
> > >>>> and if it is why?
> > >>>>
> > >>>
> > >>> Yes, ct_send() changes. Sorry for the confusion.
> > >>>
> > >>> This function needs to be updated to account for the H2G space and
> > >>> backoff if no space is available.
> > >>
> > >> Since this one is the sleeping path, it probably can and needs to be smarter
> > >> than having a cond_resched busy loop added. Like sleep and get woken up when
> > >> there is space. Otherwise it can degenerate to busy looping via contention
> > >> with the non-blocking path.
> > >>
> > >
> > > That screams over enginerring a simple problem to me. If the CT channel
> > > is full we are really in trouble anyways - i.e. the performance is going
> > > to terrible as we overwhelmed the GuC with traffic. That being said,
> >
> > Performance of what would be terrible? Something relating to submitting
> > new jobs to the GPU I guess. Or something SRIOV related as you hint below.
> >
> > But there is no real reason why CPU cycles/power should suffer if GuC is
> > busy.
> >
> > Okay, if it can't happen in real world then it's possibly passable as a
> > design of a communication interface. But to me it leaves a bad taste and
> > a doubt that there is this other aspect of the real world. And that is
> > when the unexpected happens. Even the most trivial things like a bug in
> > GuC firmware causes the driver to busy spin in there. So not much
> > happening on the machine but CPU cores pinned burning cycles in this
> > code. It's just lazy and not robust design. "Bug #nnnnn - High CPU usage
> > and GUI blocked - Solution: Upgrade GuC firmware and _reboot_ the
> > machine". Oh well..
> >
> > At least I think the commit message should spell out clearly that a busy
> > looping path is being added to the sleeping send as a downside of
> > implementation choices. Still, for the record, I object to the design.
> >
> > > IGTs can do this but that really isn't a real world use case. For the
> > > real world, this buffer is large enough that it won't ever be full hence
> > > the comment + lazy spin loop.
> > >
> > > Next, it isn't like we get an interrupt or something when space
> > > becomes available so how would we wake this thread? Could we come up
> > > with a convoluted scheme where we insert ops that generated an interrupt
> > > at regular intervals, probably? Would it be super complicated, totally
> > > unnecessary, and gain use nothing - absolutely.
> > >
> > > Lastly, blocking CTBs really shouldn't ever be used. Certainly the
> > > submission code doesn't use these. I think SRIOV might, but those can
> > > probably be reworked too to use non-blocking. At some point we might
> > > want to scrub the driver and just delete the blocking path.
> 
> I'd do an s/cond_resched()/msleep(1)/ and comment explaining why we
> just don't care about this. That checks of the cpu wasting in this
> case (GuC is overloaded, it wont come back anytime soon anyway) and
> explains why we really don't want to make this any more clever or
> complex code (because comment can explain why we wont hit this in
> actual real world usage except when something else is on fire already
> anyway).
> 

Sounds good.

> If you want to go absolutely overkill and it's not too much work, make
> the msleep interruptible or check for signals, and bail out. That way
> the process can be made unstuck with ^C at least.

This loop is already bounded by a timer and if no forward progress is
made we pop out of this loop. It is assumed that if this happens the
GuC / GPU is dead and a full GPU reset will have to be issued. A
following patch adds the timer, and a bit later in the submission
section of the series a patch is added to trigger the reset.
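
(for reference, a simplified, hypothetical sketch of such a
timer-bounded loop - the actual patch and timeout value may differ -
is:)

	/* sketch: bound the busy-wait, give up if no forward progress */
	unsigned long end = jiffies + msecs_to_jiffies(1500);	/* made-up budget */

retry:
	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
	if (unlikely(!ctb_has_room(ctb, len + 1))) {
		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
		if (time_after(jiffies, end))
			return -ETIMEDOUT;	/* GuC presumed dead, trigger reset */
		msleep(1);
		goto retry;
	}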

Matt

> -Daniel
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 36/97] drm/i915/guc: Add non blocking CTB send function
  2021-06-09 14:14                   ` Michal Wajdeczko
@ 2021-06-09 23:13                     ` Matthew Brost
  0 siblings, 0 replies; 249+ messages in thread
From: Matthew Brost @ 2021-06-09 23:13 UTC (permalink / raw)
  To: Michal Wajdeczko
  Cc: jason.ekstrand, Tvrtko Ursulin, intel-gfx, dri-devel, daniel.vetter

On Wed, Jun 09, 2021 at 04:14:05PM +0200, Michal Wajdeczko wrote:
> 
> 
> On 07.06.2021 19:31, Matthew Brost wrote:
> > On Thu, May 27, 2021 at 04:11:50PM +0100, Tvrtko Ursulin wrote:
> >>
> >> On 27/05/2021 15:35, Matthew Brost wrote:
> >>> On Thu, May 27, 2021 at 11:02:24AM +0100, Tvrtko Ursulin wrote:
> >>>>
> >>>> On 26/05/2021 19:10, Matthew Brost wrote:
> >>>>
> >>>> [snip]
> >>>>
> >>>>>>>>> +static int ct_send_nb(struct intel_guc_ct *ct,
> >>>>>>>>> +		      const u32 *action,
> >>>>>>>>> +		      u32 len,
> >>>>>>>>> +		      u32 flags)
> >>>>>>>>> +{
> >>>>>>>>> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> >>>>>>>>> +	unsigned long spin_flags;
> >>>>>>>>> +	u32 fence;
> >>>>>>>>> +	int ret;
> >>>>>>>>> +
> >>>>>>>>> +	spin_lock_irqsave(&ctb->lock, spin_flags);
> >>>>>>>>> +
> >>>>>>>>> +	ret = ctb_has_room(ctb, len + 1);
> >>>>>>>>> +	if (unlikely(ret))
> >>>>>>>>> +		goto out;
> >>>>>>>>> +
> >>>>>>>>> +	fence = ct_get_next_fence(ct);
> >>>>>>>>> +	ret = ct_write(ct, action, len, fence, flags);
> >>>>>>>>> +	if (unlikely(ret))
> >>>>>>>>> +		goto out;
> >>>>>>>>> +
> >>>>>>>>> +	intel_guc_notify(ct_to_guc(ct));
> >>>>>>>>> +
> >>>>>>>>> +out:
> >>>>>>>>> +	spin_unlock_irqrestore(&ctb->lock, spin_flags);
> >>>>>>>>> +
> >>>>>>>>> +	return ret;
> >>>>>>>>> +}
> >>>>>>>>> +
> >>>>>>>>>      static int ct_send(struct intel_guc_ct *ct,
> >>>>>>>>>      		   const u32 *action,
> >>>>>>>>>      		   u32 len,
> >>>>>>>>> @@ -473,6 +541,7 @@ static int ct_send(struct intel_guc_ct *ct,
> >>>>>>>>>      		   u32 response_buf_size,
> >>>>>>>>>      		   u32 *status)
> >>>>>>>>>      {
> >>>>>>>>> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> >>>>>>>>>      	struct ct_request request;
> >>>>>>>>>      	unsigned long flags;
> >>>>>>>>>      	u32 fence;
> >>>>>>>>> @@ -482,8 +551,20 @@ static int ct_send(struct intel_guc_ct *ct,
> >>>>>>>>>      	GEM_BUG_ON(!len);
> >>>>>>>>>      	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
> >>>>>>>>>      	GEM_BUG_ON(!response_buf && response_buf_size);
> >>>>>>>>> +	might_sleep();
> >>>>>>>>
> >>>>>>>> Sleep is just cond_resched below or there is more?
> >>>>>>>>
> >>>>>>>
> >>>>>>> Yes, the cond_resched.
> >>>>>>>
> >>>>>>>>> +	/*
> >>>>>>>>> +	 * We use a lazy spin wait loop here as we believe that if the CT
> >>>>>>>>> +	 * buffers are sized correctly the flow control condition should be
> >>>>>>>>> +	 * rare.
> >>>>>>>>> +	 */
> >>>>>>>>> +retry:
> >>>>>>>>>      	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
> >>>>>>>>> +	if (unlikely(!ctb_has_room(ctb, len + 1))) {
> >>>>>>>>> +		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> >>>>>>>>> +		cond_resched();
> >>>>>>>>> +		goto retry;
> >>>>>>>>> +	}
> >>>>>>>>
> >>>>>>>> If this patch is about adding a non-blocking send function, and below we can
> >>>>>>>> see that it creates a fork:
> >>>>>>>>
> >>>>>>>> intel_guc_ct_send:
> >>>>>>>> ...
> >>>>>>>> 	if (flags & INTEL_GUC_SEND_NB)
> >>>>>>>> 		return ct_send_nb(ct, action, len, flags);
> >>>>>>>>
> >>>>>>>>     	ret = ct_send(ct, action, len, response_buf, response_buf_size, &status);
> >>>>>>>>
> >>>>>>>> Then why is there a change in ct_send here, which is not the new
> >>>>>>>> non-blocking path?
> >>>>>>>>
> >>>>>>>
> >>>>>>> There is not a change to ct_send(), just to intel_guc_ct_send.
> >>>>>>
> >>>>>> I was doing by the diff which says:
> >>>>>>
> >>>>>>    static int ct_send(struct intel_guc_ct *ct,
> >>>>>>    		   const u32 *action,
> >>>>>>    		   u32 len,
> >>>>>> @@ -473,6 +541,7 @@ static int ct_send(struct intel_guc_ct *ct,
> >>>>>>    		   u32 response_buf_size,
> >>>>>>    		   u32 *status)
> >>>>>>    {
> >>>>>> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> >>>>>>    	struct ct_request request;
> >>>>>>    	unsigned long flags;
> >>>>>>    	u32 fence;
> >>>>>> @@ -482,8 +551,20 @@ static int ct_send(struct intel_guc_ct *ct,
> >>>>>>    	GEM_BUG_ON(!len);
> >>>>>>    	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
> >>>>>>    	GEM_BUG_ON(!response_buf && response_buf_size);
> >>>>>> +	might_sleep();
> >>>>>> +	/*
> >>>>>> +	 * We use a lazy spin wait loop here as we believe that if the CT
> >>>>>> +	 * buffers are sized correctly the flow control condition should be
> >>>>>> +	 * rare.
> >>>>>> +	 */
> >>>>>> +retry:
> >>>>>>    	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
> >>>>>> +	if (unlikely(!ctb_has_room(ctb, len + 1))) {
> >>>>>> +		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> >>>>>> +		cond_resched();
> >>>>>> +		goto retry;
> >>>>>> +	}
> >>>>>>
> >>>>>> So it looks like a change to ct_send to me. Is that wrong?
> >>>>
> >>>> What about this part - is the patch changing the blocking ct_send or not,
> >>>> and if it is why?
> >>>>
> >>>
> >>> Yes, ct_send() changes. Sorry for the confusion.
> >>>
> >>> This function needs to be updated to account for the H2G space and
> >>> backoff if no space is available.
> >>
> >> Since this one is the sleeping path, it probably can and needs to be smarter
> >> than having a cond_resched busy loop added. Like sleep and get woken up when
> >> there is space. Otherwise it can degenerate to busy looping via contention
> >> with the non-blocking path.
> >>
> > 
> > That screams over enginerring a simple problem to me. If the CT channel
> > is full we are really in trouble anyways - i.e. the performance is going
> > to terrible as we overwhelmed the GuC with traffic. That being said,
> > IGTs can do this but that really isn't a real world use case. For the
> > real world, this buffer is large enough that it won't ever be full hence
> > the comment + lazy spin loop.
> > 
> > Next, it isn't like we get an interrupt or something when space
> > becomes available so how would we wake this thread? Could we come up
> > with a convoluted scheme where we insert ops that generated an interrupt
> > at regular intervals, probably? Would it be super complicated, totally
> > unnecessary, and gain use nothing - absolutely.
> > 
> > Lastly, blocking CTBs really shouldn't ever be used. Certainly the
> > submission code doesn't use these. I think SRIOV might, but those can
> > probably be reworked too to use non-blocking. At some point we might
> > want to scrub the driver and just delete the blocking path.
> 
> I guess the main problem is not with "blocking CTBs", as now only
> calling thread is "blocked" waiting for reply and other threads can
> still send their CTBs (blocked/nonblocking), but the fact that we are
> sending too many messages, stopping only when CTB is full, and even then
> trying hard to squeeze that message again.
> 
> it should be caller responsibility to throttle its stream of
> non-blocking CTBs if either we are running out of CTB but if we have too
> many "non-blocking" requests in flight.
> 
> making CTB buffer just larger and larger does not solve the problem,
> only makes it less visible
> 
> and as you are using busy-loop to send even 'non-blocking' CTBs, it
> might indicate that your code is not prepared to step-back in case of
> any temporary CTB congestion
> 
> also note that currently all CTB messages are asynchronous, REQUEST /
> RESPONSE pair could be processed in fully non-blocking approach, but
> that would require refactoring of part driver into event-driven state
> machine, as sometimes we can't move forward without information that we
> are waiting from the GuC (and blocking was simplest solution for that)
> 
> but if your submission code is already  event-driven, then it should be
> easier to trigger state machine into 'retry' mode without using this
> busy-loop

Yes, the state machine is used as a back off in most cases where it
makes sense. In some cases we still just use a busy-loop. See my
comments about over engineering solutions - sometimes it is better to
use something simple for something that is rare.

Matt

> 
> > 
> > Matt
> > 
> >> Regards,
> > 
> >>
> >> Tvrtko

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 36/97] drm/i915/guc: Add non blocking CTB send function
  2021-06-09 23:10                       ` Matthew Brost
@ 2021-06-10 15:27                         ` Daniel Vetter
  2021-06-24 16:38                           ` Matthew Brost
  0 siblings, 1 reply; 249+ messages in thread
From: Daniel Vetter @ 2021-06-10 15:27 UTC (permalink / raw)
  To: Matthew Brost
  Cc: Tvrtko Ursulin, intel-gfx, dri-devel, Jason Ekstrand, Daniel Vetter

On Wed, Jun 09, 2021 at 04:10:23PM -0700, Matthew Brost wrote:
> On Tue, Jun 08, 2021 at 10:46:15AM +0200, Daniel Vetter wrote:
> > On Tue, Jun 8, 2021 at 10:39 AM Tvrtko Ursulin
> > <tvrtko.ursulin@linux.intel.com> wrote:
> > >
> > >
> > > On 07/06/2021 18:31, Matthew Brost wrote:
> > > > On Thu, May 27, 2021 at 04:11:50PM +0100, Tvrtko Ursulin wrote:
> > > >>
> > > >> On 27/05/2021 15:35, Matthew Brost wrote:
> > > >>> On Thu, May 27, 2021 at 11:02:24AM +0100, Tvrtko Ursulin wrote:
> > > >>>>
> > > >>>> On 26/05/2021 19:10, Matthew Brost wrote:
> > > >>>>
> > > >>>> [snip]
> > > >>>>
> > > >>>>>>>>> +static int ct_send_nb(struct intel_guc_ct *ct,
> > > >>>>>>>>> +                   const u32 *action,
> > > >>>>>>>>> +                   u32 len,
> > > >>>>>>>>> +                   u32 flags)
> > > >>>>>>>>> +{
> > > >>>>>>>>> +     struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> > > >>>>>>>>> +     unsigned long spin_flags;
> > > >>>>>>>>> +     u32 fence;
> > > >>>>>>>>> +     int ret;
> > > >>>>>>>>> +
> > > >>>>>>>>> +     spin_lock_irqsave(&ctb->lock, spin_flags);
> > > >>>>>>>>> +
> > > >>>>>>>>> +     ret = ctb_has_room(ctb, len + 1);
> > > >>>>>>>>> +     if (unlikely(ret))
> > > >>>>>>>>> +             goto out;
> > > >>>>>>>>> +
> > > >>>>>>>>> +     fence = ct_get_next_fence(ct);
> > > >>>>>>>>> +     ret = ct_write(ct, action, len, fence, flags);
> > > >>>>>>>>> +     if (unlikely(ret))
> > > >>>>>>>>> +             goto out;
> > > >>>>>>>>> +
> > > >>>>>>>>> +     intel_guc_notify(ct_to_guc(ct));
> > > >>>>>>>>> +
> > > >>>>>>>>> +out:
> > > >>>>>>>>> +     spin_unlock_irqrestore(&ctb->lock, spin_flags);
> > > >>>>>>>>> +
> > > >>>>>>>>> +     return ret;
> > > >>>>>>>>> +}
> > > >>>>>>>>> +
> > > >>>>>>>>>       static int ct_send(struct intel_guc_ct *ct,
> > > >>>>>>>>>                          const u32 *action,
> > > >>>>>>>>>                          u32 len,
> > > >>>>>>>>> @@ -473,6 +541,7 @@ static int ct_send(struct intel_guc_ct *ct,
> > > >>>>>>>>>                          u32 response_buf_size,
> > > >>>>>>>>>                          u32 *status)
> > > >>>>>>>>>       {
> > > >>>>>>>>> +     struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> > > >>>>>>>>>               struct ct_request request;
> > > >>>>>>>>>               unsigned long flags;
> > > >>>>>>>>>               u32 fence;
> > > >>>>>>>>> @@ -482,8 +551,20 @@ static int ct_send(struct intel_guc_ct *ct,
> > > >>>>>>>>>               GEM_BUG_ON(!len);
> > > >>>>>>>>>               GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
> > > >>>>>>>>>               GEM_BUG_ON(!response_buf && response_buf_size);
> > > >>>>>>>>> +     might_sleep();
> > > >>>>>>>>
> > > >>>>>>>> Sleep is just cond_resched below or there is more?
> > > >>>>>>>>
> > > >>>>>>>
> > > >>>>>>> Yes, the cond_resched.
> > > >>>>>>>
> > > >>>>>>>>> +     /*
> > > >>>>>>>>> +      * We use a lazy spin wait loop here as we believe that if the CT
> > > >>>>>>>>> +      * buffers are sized correctly the flow control condition should be
> > > >>>>>>>>> +      * rare.
> > > >>>>>>>>> +      */
> > > >>>>>>>>> +retry:
> > > >>>>>>>>>               spin_lock_irqsave(&ct->ctbs.send.lock, flags);
> > > >>>>>>>>> +     if (unlikely(!ctb_has_room(ctb, len + 1))) {
> > > >>>>>>>>> +             spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> > > >>>>>>>>> +             cond_resched();
> > > >>>>>>>>> +             goto retry;
> > > >>>>>>>>> +     }
> > > >>>>>>>>
> > > >>>>>>>> If this patch is about adding a non-blocking send function, and below we can
> > > >>>>>>>> see that it creates a fork:
> > > >>>>>>>>
> > > >>>>>>>> intel_guc_ct_send:
> > > >>>>>>>> ...
> > > >>>>>>>>        if (flags & INTEL_GUC_SEND_NB)
> > > >>>>>>>>                return ct_send_nb(ct, action, len, flags);
> > > >>>>>>>>
> > > >>>>>>>>        ret = ct_send(ct, action, len, response_buf, response_buf_size, &status);
> > > >>>>>>>>
> > > >>>>>>>> Then why is there a change in ct_send here, which is not the new
> > > >>>>>>>> non-blocking path?
> > > >>>>>>>>
> > > >>>>>>>
> > > >>>>>>> There is not a change to ct_send(), just to intel_guc_ct_send.
> > > >>>>>>
> > > >>>>>> I was doing by the diff which says:
> > > >>>>>>
> > > >>>>>>     static int ct_send(struct intel_guc_ct *ct,
> > > >>>>>>                     const u32 *action,
> > > >>>>>>                     u32 len,
> > > >>>>>> @@ -473,6 +541,7 @@ static int ct_send(struct intel_guc_ct *ct,
> > > >>>>>>                     u32 response_buf_size,
> > > >>>>>>                     u32 *status)
> > > >>>>>>     {
> > > >>>>>> +        struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> > > >>>>>>          struct ct_request request;
> > > >>>>>>          unsigned long flags;
> > > >>>>>>          u32 fence;
> > > >>>>>> @@ -482,8 +551,20 @@ static int ct_send(struct intel_guc_ct *ct,
> > > >>>>>>          GEM_BUG_ON(!len);
> > > >>>>>>          GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
> > > >>>>>>          GEM_BUG_ON(!response_buf && response_buf_size);
> > > >>>>>> +        might_sleep();
> > > >>>>>> +        /*
> > > >>>>>> +         * We use a lazy spin wait loop here as we believe that if the CT
> > > >>>>>> +         * buffers are sized correctly the flow control condition should be
> > > >>>>>> +         * rare.
> > > >>>>>> +         */
> > > >>>>>> +retry:
> > > >>>>>>          spin_lock_irqsave(&ct->ctbs.send.lock, flags);
> > > >>>>>> +        if (unlikely(!ctb_has_room(ctb, len + 1))) {
> > > >>>>>> +                spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> > > >>>>>> +                cond_resched();
> > > >>>>>> +                goto retry;
> > > >>>>>> +        }
> > > >>>>>>
> > > >>>>>> So it looks like a change to ct_send to me. Is that wrong?
> > > >>>>
> > > >>>> What about this part - is the patch changing the blocking ct_send or not,
> > > >>>> and if it is why?
> > > >>>>
> > > >>>
> > > >>> Yes, ct_send() changes. Sorry for the confusion.
> > > >>>
> > > >>> This function needs to be updated to account for the H2G space and
> > > >>> backoff if no space is available.
> > > >>
> > > >> Since this one is the sleeping path, it probably can and needs to be smarter
> > > >> than having a cond_resched busy loop added. Like sleep and get woken up when
> > > >> there is space. Otherwise it can degenerate to busy looping via contention
> > > >> with the non-blocking path.
> > > >>
> > > >
> > > > That screams over enginerring a simple problem to me. If the CT channel
> > > > is full we are really in trouble anyways - i.e. the performance is going
> > > > to terrible as we overwhelmed the GuC with traffic. That being said,
> > >
> > > Performance of what would be terrible? Something relating to submitting
> > > new jobs to the GPU I guess. Or something SRIOV related as you hint below.
> > >
> > > But there is no real reason why CPU cycles/power should suffer if GuC is
> > > busy.
> > >
> > > Okay, if it can't happen in real world then it's possibly passable as a
> > > design of a communication interface. But to me it leaves a bad taste and
> > > a doubt that there is this other aspect of the real world. And that is
> > > when the unexpected happens. Even the most trivial things like a bug in
> > > GuC firmware causes the driver to busy spin in there. So not much
> > > happening on the machine but CPU cores pinned burning cycles in this
> > > code. It's just lazy and not robust design. "Bug #nnnnn - High CPU usage
> > > and GUI blocked - Solution: Upgrade GuC firmware and _reboot_ the
> > > machine". Oh well..
> > >
> > > At least I think the commit message should spell out clearly that a busy
> > > looping path is being added to the sleeping send as a downside of
> > > implementation choices. Still, for the record, I object to the design.
> > >
> > > > IGTs can do this but that really isn't a real world use case. For the
> > > > real world, this buffer is large enough that it won't ever be full hence
> > > > the comment + lazy spin loop.
> > > >
> > > > Next, it isn't like we get an interrupt or something when space
> > > > becomes available so how would we wake this thread? Could we come up
> > > > with a convoluted scheme where we insert ops that generated an interrupt
> > > > at regular intervals, probably? Would it be super complicated, totally
> > > > unnecessary, and gain use nothing - absolutely.
> > > >
> > > > Lastly, blocking CTBs really shouldn't ever be used. Certainly the
> > > > submission code doesn't use these. I think SRIOV might, but those can
> > > > probably be reworked too to use non-blocking. At some point we might
> > > > want to scrub the driver and just delete the blocking path.
> > 
> > I'd do an s/cond_resched()/msleep(1)/ and comment explaining why we
> > just don't care about this. That checks of the cpu wasting in this
> > case (GuC is overloaded, it wont come back anytime soon anyway) and
> > explains why we really don't want to make this any more clever or
> > complex code (because comment can explain why we wont hit this in
> > actual real world usage except when something else is on fire already
> > anyway).
> > 
> 
> Sounds good.
> 
> > If you want to go absolutely overkill and it's not too much work, make
> > the msleep interruptible or check for signals, and bail out. That way
> > the process can be made unstuck with ^C at least.
> 
> This loop is already bound by a timer and if no forward progress is made
> we pop out of this loop. It is assumed if this happens the GuC / GPU is
> dead a and full GPU reset will have to be issued. A following patch
> adds the timer, a bit later in submission section of the series a patch
> is added to trigger the reset.

Yeah timeout bail-out works too, and if you then make that timed wait
also interruptible it shouldn't be much more code. It's just nice to
not have any uninterruptible sleep.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 36/97] drm/i915/guc: Add non blocking CTB send function
  2021-06-10 15:27                         ` Daniel Vetter
@ 2021-06-24 16:38                           ` Matthew Brost
  2021-06-24 17:25                             ` Daniel Vetter
  0 siblings, 1 reply; 249+ messages in thread
From: Matthew Brost @ 2021-06-24 16:38 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Jason Ekstrand, Tvrtko Ursulin, intel-gfx, dri-devel, Daniel Vetter

On Thu, Jun 10, 2021 at 05:27:48PM +0200, Daniel Vetter wrote:
> On Wed, Jun 09, 2021 at 04:10:23PM -0700, Matthew Brost wrote:
> > On Tue, Jun 08, 2021 at 10:46:15AM +0200, Daniel Vetter wrote:
> > > On Tue, Jun 8, 2021 at 10:39 AM Tvrtko Ursulin
> > > <tvrtko.ursulin@linux.intel.com> wrote:
> > > >
> > > >
> > > > On 07/06/2021 18:31, Matthew Brost wrote:
> > > > > On Thu, May 27, 2021 at 04:11:50PM +0100, Tvrtko Ursulin wrote:
> > > > >>
> > > > >> On 27/05/2021 15:35, Matthew Brost wrote:
> > > > >>> On Thu, May 27, 2021 at 11:02:24AM +0100, Tvrtko Ursulin wrote:
> > > > >>>>
> > > > >>>> On 26/05/2021 19:10, Matthew Brost wrote:
> > > > >>>>
> > > > >>>> [snip]
> > > > >>>>
> > > > >>>>>>>>> +static int ct_send_nb(struct intel_guc_ct *ct,
> > > > >>>>>>>>> +                   const u32 *action,
> > > > >>>>>>>>> +                   u32 len,
> > > > >>>>>>>>> +                   u32 flags)
> > > > >>>>>>>>> +{
> > > > >>>>>>>>> +     struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> > > > >>>>>>>>> +     unsigned long spin_flags;
> > > > >>>>>>>>> +     u32 fence;
> > > > >>>>>>>>> +     int ret;
> > > > >>>>>>>>> +
> > > > >>>>>>>>> +     spin_lock_irqsave(&ctb->lock, spin_flags);
> > > > >>>>>>>>> +
> > > > >>>>>>>>> +     ret = ctb_has_room(ctb, len + 1);
> > > > >>>>>>>>> +     if (unlikely(ret))
> > > > >>>>>>>>> +             goto out;
> > > > >>>>>>>>> +
> > > > >>>>>>>>> +     fence = ct_get_next_fence(ct);
> > > > >>>>>>>>> +     ret = ct_write(ct, action, len, fence, flags);
> > > > >>>>>>>>> +     if (unlikely(ret))
> > > > >>>>>>>>> +             goto out;
> > > > >>>>>>>>> +
> > > > >>>>>>>>> +     intel_guc_notify(ct_to_guc(ct));
> > > > >>>>>>>>> +
> > > > >>>>>>>>> +out:
> > > > >>>>>>>>> +     spin_unlock_irqrestore(&ctb->lock, spin_flags);
> > > > >>>>>>>>> +
> > > > >>>>>>>>> +     return ret;
> > > > >>>>>>>>> +}
> > > > >>>>>>>>> +
> > > > >>>>>>>>>       static int ct_send(struct intel_guc_ct *ct,
> > > > >>>>>>>>>                          const u32 *action,
> > > > >>>>>>>>>                          u32 len,
> > > > >>>>>>>>> @@ -473,6 +541,7 @@ static int ct_send(struct intel_guc_ct *ct,
> > > > >>>>>>>>>                          u32 response_buf_size,
> > > > >>>>>>>>>                          u32 *status)
> > > > >>>>>>>>>       {
> > > > >>>>>>>>> +     struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> > > > >>>>>>>>>               struct ct_request request;
> > > > >>>>>>>>>               unsigned long flags;
> > > > >>>>>>>>>               u32 fence;
> > > > >>>>>>>>> @@ -482,8 +551,20 @@ static int ct_send(struct intel_guc_ct *ct,
> > > > >>>>>>>>>               GEM_BUG_ON(!len);
> > > > >>>>>>>>>               GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
> > > > >>>>>>>>>               GEM_BUG_ON(!response_buf && response_buf_size);
> > > > >>>>>>>>> +     might_sleep();
> > > > >>>>>>>>
> > > > >>>>>>>> Sleep is just cond_resched below or there is more?
> > > > >>>>>>>>
> > > > >>>>>>>
> > > > >>>>>>> Yes, the cond_resched.
> > > > >>>>>>>
> > > > >>>>>>>>> +     /*
> > > > >>>>>>>>> +      * We use a lazy spin wait loop here as we believe that if the CT
> > > > >>>>>>>>> +      * buffers are sized correctly the flow control condition should be
> > > > >>>>>>>>> +      * rare.
> > > > >>>>>>>>> +      */
> > > > >>>>>>>>> +retry:
> > > > >>>>>>>>>               spin_lock_irqsave(&ct->ctbs.send.lock, flags);
> > > > >>>>>>>>> +     if (unlikely(!ctb_has_room(ctb, len + 1))) {
> > > > >>>>>>>>> +             spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> > > > >>>>>>>>> +             cond_resched();
> > > > >>>>>>>>> +             goto retry;
> > > > >>>>>>>>> +     }
> > > > >>>>>>>>
> > > > >>>>>>>> If this patch is about adding a non-blocking send function, and below we can
> > > > >>>>>>>> see that it creates a fork:
> > > > >>>>>>>>
> > > > >>>>>>>> intel_guc_ct_send:
> > > > >>>>>>>> ...
> > > > >>>>>>>>        if (flags & INTEL_GUC_SEND_NB)
> > > > >>>>>>>>                return ct_send_nb(ct, action, len, flags);
> > > > >>>>>>>>
> > > > >>>>>>>>        ret = ct_send(ct, action, len, response_buf, response_buf_size, &status);
> > > > >>>>>>>>
> > > > >>>>>>>> Then why is there a change in ct_send here, which is not the new
> > > > >>>>>>>> non-blocking path?
> > > > >>>>>>>>
> > > > >>>>>>>
> > > > >>>>>>> There is not a change to ct_send(), just to intel_guc_ct_send.
> > > > >>>>>>
> > > > >>>>>> I was going by the diff which says:
> > > > >>>>>>
> > > > >>>>>>     static int ct_send(struct intel_guc_ct *ct,
> > > > >>>>>>                     const u32 *action,
> > > > >>>>>>                     u32 len,
> > > > >>>>>> @@ -473,6 +541,7 @@ static int ct_send(struct intel_guc_ct *ct,
> > > > >>>>>>                     u32 response_buf_size,
> > > > >>>>>>                     u32 *status)
> > > > >>>>>>     {
> > > > >>>>>> +        struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> > > > >>>>>>          struct ct_request request;
> > > > >>>>>>          unsigned long flags;
> > > > >>>>>>          u32 fence;
> > > > >>>>>> @@ -482,8 +551,20 @@ static int ct_send(struct intel_guc_ct *ct,
> > > > >>>>>>          GEM_BUG_ON(!len);
> > > > >>>>>>          GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
> > > > >>>>>>          GEM_BUG_ON(!response_buf && response_buf_size);
> > > > >>>>>> +        might_sleep();
> > > > >>>>>> +        /*
> > > > >>>>>> +         * We use a lazy spin wait loop here as we believe that if the CT
> > > > >>>>>> +         * buffers are sized correctly the flow control condition should be
> > > > >>>>>> +         * rare.
> > > > >>>>>> +         */
> > > > >>>>>> +retry:
> > > > >>>>>>          spin_lock_irqsave(&ct->ctbs.send.lock, flags);
> > > > >>>>>> +        if (unlikely(!ctb_has_room(ctb, len + 1))) {
> > > > >>>>>> +                spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> > > > >>>>>> +                cond_resched();
> > > > >>>>>> +                goto retry;
> > > > >>>>>> +        }
> > > > >>>>>>
> > > > >>>>>> So it looks like a change to ct_send to me. Is that wrong?
> > > > >>>>
> > > > >>>> What about this part - is the patch changing the blocking ct_send or not,
> > > > >>>> and if it is why?
> > > > >>>>
> > > > >>>
> > > > >>> Yes, ct_send() changes. Sorry for the confusion.
> > > > >>>
> > > > >>> This function needs to be updated to account for the H2G space and
> > > > >>> back off if no space is available.
> > > > >>
> > > > >> Since this one is the sleeping path, it probably can and needs to be smarter
> > > > >> than having a cond_resched busy loop added. Like sleep and get woken up when
> > > > >> there is space. Otherwise it can degenerate to busy looping via contention
> > > > >> with the non-blocking path.
> > > > >>
> > > > >
> > > > > That screams over-engineering a simple problem to me. If the CT channel
> > > > > is full we are really in trouble anyways - i.e. the performance is going
> > > > > to be terrible as we have overwhelmed the GuC with traffic. That being said,
> > > >
> > > > Performance of what would be terrible? Something relating to submitting
> > > > new jobs to the GPU I guess. Or something SRIOV related as you hint below.
> > > >
> > > > But there is no real reason why CPU cycles/power should suffer if GuC is
> > > > busy.
> > > >
> > > > Okay, if it can't happen in the real world then it's possibly passable as a
> > > > design of a communication interface. But to me it leaves a bad taste and
> > > > a doubt that there is this other aspect of the real world. And that is
> > > > when the unexpected happens. Even the most trivial things like a bug in
> > > > GuC firmware cause the driver to busy spin in there. So not much
> > > > happening on the machine but CPU cores pinned burning cycles in this
> > > > code. It's just lazy and not robust design. "Bug #nnnnn - High CPU usage
> > > > and GUI blocked - Solution: Upgrade GuC firmware and _reboot_ the
> > > > machine". Oh well..
> > > >
> > > > At least I think the commit message should spell out clearly that a busy
> > > > looping path is being added to the sleeping send as a downside of
> > > > implementation choices. Still, for the record, I object to the design.
> > > >
> > > > > IGTs can do this but that really isn't a real world use case. For the
> > > > > real world, this buffer is large enough that it won't ever be full hence
> > > > > the comment + lazy spin loop.
> > > > >
> > > > > Next, it isn't like we get an interrupt or something when space
> > > > > becomes available, so how would we wake this thread? Could we come up
> > > > > with a convoluted scheme where we insert ops that generate an interrupt
> > > > > at regular intervals? Probably. Would it be super complicated, totally
> > > > > unnecessary, and gain us nothing? Absolutely.
> > > > >
> > > > > Lastly, blocking CTBs really shouldn't ever be used. Certainly the
> > > > > submission code doesn't use these. I think SRIOV might, but those can
> > > > > probably be reworked too to use non-blocking. At some point we might
> > > > > want to scrub the driver and just delete the blocking path.
> > > 
> > > I'd do an s/cond_resched()/msleep(1)/ and a comment explaining why we
> > > just don't care about this. That takes care of the cpu wasting in this
> > > case (GuC is overloaded, it won't come back anytime soon anyway) and
> > > explains why we really don't want to make this any more clever or
> > > complex (because the comment can explain why we won't hit this in
> > > actual real world usage except when something else is on fire already
> > > anyway).
> > > 
> > 
> > Sounds good.
> > 
> > > If you want to go absolutely overkill and it's not too much work, make
> > > the msleep interruptible or check for signals, and bail out. That way
> > > the process can be made unstuck with ^C at least.
> > 
> > This loop is already bound by a timer and if no forward progress is made
> > we pop out of this loop. It is assumed that if this happens the GuC / GPU is
> > dead and a full GPU reset will have to be issued. A following patch
> > adds the timer; a bit later, in the submission section of the series, a patch
> > is added to trigger the reset.
> 
> Yeah timeout bail-out works too, and if you then switch it from timeout to
> also interruptible it shouldn't be much more code. It's just nice to not
> have any uninterruptible sleep.
> -Daniel

I didn't get this in my next rev as I didn't know how to do this off
hand, but I think all I need to add is something like this to each
iteration of the busy loops, right?

if (signal_pending_state(TASK_INTERRUPTIBLE, current))
	bail out of the busy loop and return an error
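
Fleshed out a bit - just a rough, untested sketch on my side, reusing ctb,
flags and ctb_has_room() from the patch above and guessing -EINTR as the
error the callers would want back - the blocking retry in ct_send() would
look something like:

retry:
	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
	if (unlikely(!ctb_has_room(ctb, len + 1))) {
		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);

		/* Let ^C (or any pending signal) break us out of the busy loop */
		if (signal_pending_state(TASK_INTERRUPTIBLE, current))
			return -EINTR;

		cond_resched();
		goto retry;
	}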

Matt

> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 249+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 36/97] drm/i915/guc: Add non blocking CTB send function
  2021-06-24 16:38                           ` Matthew Brost
@ 2021-06-24 17:25                             ` Daniel Vetter
  0 siblings, 0 replies; 249+ messages in thread
From: Daniel Vetter @ 2021-06-24 17:25 UTC (permalink / raw)
  To: Matthew Brost
  Cc: Tvrtko Ursulin, intel-gfx, dri-devel, Jason Ekstrand, Daniel Vetter

On Thu, Jun 24, 2021 at 09:38:33AM -0700, Matthew Brost wrote:
> On Thu, Jun 10, 2021 at 05:27:48PM +0200, Daniel Vetter wrote:
> > On Wed, Jun 09, 2021 at 04:10:23PM -0700, Matthew Brost wrote:
> > > On Tue, Jun 08, 2021 at 10:46:15AM +0200, Daniel Vetter wrote:
> > > > On Tue, Jun 8, 2021 at 10:39 AM Tvrtko Ursulin
> > > > <tvrtko.ursulin@linux.intel.com> wrote:
> > > > >
> > > > >
> > > > > On 07/06/2021 18:31, Matthew Brost wrote:
> > > > > > On Thu, May 27, 2021 at 04:11:50PM +0100, Tvrtko Ursulin wrote:
> > > > > >>
> > > > > >> On 27/05/2021 15:35, Matthew Brost wrote:
> > > > > >>> On Thu, May 27, 2021 at 11:02:24AM +0100, Tvrtko Ursulin wrote:
> > > > > >>>>
> > > > > >>>> On 26/05/2021 19:10, Matthew Brost wrote:
> > > > > >>>>
> > > > > >>>> [snip]
> > > > > >>>>
> > > > > >>>>>>>>> +static int ct_send_nb(struct intel_guc_ct *ct,
> > > > > >>>>>>>>> +                   const u32 *action,
> > > > > >>>>>>>>> +                   u32 len,
> > > > > >>>>>>>>> +                   u32 flags)
> > > > > >>>>>>>>> +{
> > > > > >>>>>>>>> +     struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> > > > > >>>>>>>>> +     unsigned long spin_flags;
> > > > > >>>>>>>>> +     u32 fence;
> > > > > >>>>>>>>> +     int ret;
> > > > > >>>>>>>>> +
> > > > > >>>>>>>>> +     spin_lock_irqsave(&ctb->lock, spin_flags);
> > > > > >>>>>>>>> +
> > > > > >>>>>>>>> +     ret = ctb_has_room(ctb, len + 1);
> > > > > >>>>>>>>> +     if (unlikely(ret))
> > > > > >>>>>>>>> +             goto out;
> > > > > >>>>>>>>> +
> > > > > >>>>>>>>> +     fence = ct_get_next_fence(ct);
> > > > > >>>>>>>>> +     ret = ct_write(ct, action, len, fence, flags);
> > > > > >>>>>>>>> +     if (unlikely(ret))
> > > > > >>>>>>>>> +             goto out;
> > > > > >>>>>>>>> +
> > > > > >>>>>>>>> +     intel_guc_notify(ct_to_guc(ct));
> > > > > >>>>>>>>> +
> > > > > >>>>>>>>> +out:
> > > > > >>>>>>>>> +     spin_unlock_irqrestore(&ctb->lock, spin_flags);
> > > > > >>>>>>>>> +
> > > > > >>>>>>>>> +     return ret;
> > > > > >>>>>>>>> +}
> > > > > >>>>>>>>> +
> > > > > >>>>>>>>>       static int ct_send(struct intel_guc_ct *ct,
> > > > > >>>>>>>>>                          const u32 *action,
> > > > > >>>>>>>>>                          u32 len,
> > > > > >>>>>>>>> @@ -473,6 +541,7 @@ static int ct_send(struct intel_guc_ct *ct,
> > > > > >>>>>>>>>                          u32 response_buf_size,
> > > > > >>>>>>>>>                          u32 *status)
> > > > > >>>>>>>>>       {
> > > > > >>>>>>>>> +     struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> > > > > >>>>>>>>>               struct ct_request request;
> > > > > >>>>>>>>>               unsigned long flags;
> > > > > >>>>>>>>>               u32 fence;
> > > > > >>>>>>>>> @@ -482,8 +551,20 @@ static int ct_send(struct intel_guc_ct *ct,
> > > > > >>>>>>>>>               GEM_BUG_ON(!len);
> > > > > >>>>>>>>>               GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
> > > > > >>>>>>>>>               GEM_BUG_ON(!response_buf && response_buf_size);
> > > > > >>>>>>>>> +     might_sleep();
> > > > > >>>>>>>>
> > > > > >>>>>>>> Sleep is just cond_resched below or there is more?
> > > > > >>>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>> Yes, the cond_resched.
> > > > > >>>>>>>
> > > > > >>>>>>>>> +     /*
> > > > > >>>>>>>>> +      * We use a lazy spin wait loop here as we believe that if the CT
> > > > > >>>>>>>>> +      * buffers are sized correctly the flow control condition should be
> > > > > >>>>>>>>> +      * rare.
> > > > > >>>>>>>>> +      */
> > > > > >>>>>>>>> +retry:
> > > > > >>>>>>>>>               spin_lock_irqsave(&ct->ctbs.send.lock, flags);
> > > > > >>>>>>>>> +     if (unlikely(!ctb_has_room(ctb, len + 1))) {
> > > > > >>>>>>>>> +             spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> > > > > >>>>>>>>> +             cond_resched();
> > > > > >>>>>>>>> +             goto retry;
> > > > > >>>>>>>>> +     }
> > > > > >>>>>>>>
> > > > > >>>>>>>> If this patch is about adding a non-blocking send function, and below we can
> > > > > >>>>>>>> see that it creates a fork:
> > > > > >>>>>>>>
> > > > > >>>>>>>> intel_guc_ct_send:
> > > > > >>>>>>>> ...
> > > > > >>>>>>>>        if (flags & INTEL_GUC_SEND_NB)
> > > > > >>>>>>>>                return ct_send_nb(ct, action, len, flags);
> > > > > >>>>>>>>
> > > > > >>>>>>>>        ret = ct_send(ct, action, len, response_buf, response_buf_size, &status);
> > > > > >>>>>>>>
> > > > > >>>>>>>> Then why is there a change in ct_send here, which is not the new
> > > > > >>>>>>>> non-blocking path?
> > > > > >>>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>> There is not a change to ct_send(), just to intel_guc_ct_send.
> > > > > >>>>>>
> > > > > >>>>>> I was going by the diff which says:
> > > > > >>>>>>
> > > > > >>>>>>     static int ct_send(struct intel_guc_ct *ct,
> > > > > >>>>>>                     const u32 *action,
> > > > > >>>>>>                     u32 len,
> > > > > >>>>>> @@ -473,6 +541,7 @@ static int ct_send(struct intel_guc_ct *ct,
> > > > > >>>>>>                     u32 response_buf_size,
> > > > > >>>>>>                     u32 *status)
> > > > > >>>>>>     {
> > > > > >>>>>> +        struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> > > > > >>>>>>          struct ct_request request;
> > > > > >>>>>>          unsigned long flags;
> > > > > >>>>>>          u32 fence;
> > > > > >>>>>> @@ -482,8 +551,20 @@ static int ct_send(struct intel_guc_ct *ct,
> > > > > >>>>>>          GEM_BUG_ON(!len);
> > > > > >>>>>>          GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
> > > > > >>>>>>          GEM_BUG_ON(!response_buf && response_buf_size);
> > > > > >>>>>> +        might_sleep();
> > > > > >>>>>> +        /*
> > > > > >>>>>> +         * We use a lazy spin wait loop here as we believe that if the CT
> > > > > >>>>>> +         * buffers are sized correctly the flow control condition should be
> > > > > >>>>>> +         * rare.
> > > > > >>>>>> +         */
> > > > > >>>>>> +retry:
> > > > > >>>>>>          spin_lock_irqsave(&ct->ctbs.send.lock, flags);
> > > > > >>>>>> +        if (unlikely(!ctb_has_room(ctb, len + 1))) {
> > > > > >>>>>> +                spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> > > > > >>>>>> +                cond_resched();
> > > > > >>>>>> +                goto retry;
> > > > > >>>>>> +        }
> > > > > >>>>>>
> > > > > >>>>>> So it looks like a change to ct_send to me. Is that wrong?
> > > > > >>>>
> > > > > >>>> What about this part - is the patch changing the blocking ct_send or not,
> > > > > >>>> and if it is why?
> > > > > >>>>
> > > > > >>>
> > > > > >>> Yes, ct_send() changes. Sorry for the confusion.
> > > > > >>>
> > > > > >>> This function needs to be updated to account for the H2G space and
> > > > > >>> back off if no space is available.
> > > > > >>
> > > > > >> Since this one is the sleeping path, it probably can and needs to be smarter
> > > > > >> than having a cond_resched busy loop added. Like sleep and get woken up when
> > > > > >> there is space. Otherwise it can degenerate to busy looping via contention
> > > > > >> with the non-blocking path.
> > > > > >>
> > > > > >
> > > > > > That screams over-engineering a simple problem to me. If the CT channel
> > > > > > is full we are really in trouble anyways - i.e. the performance is going
> > > > > > to be terrible as we have overwhelmed the GuC with traffic. That being said,
> > > > >
> > > > > Performance of what would be terrible? Something relating to submitting
> > > > > new jobs to the GPU I guess. Or something SRIOV related as you hint below.
> > > > >
> > > > > But there is no real reason why CPU cycles/power should suffer if GuC is
> > > > > busy.
> > > > >
> > > > > Okay, if it can't happen in the real world then it's possibly passable as a
> > > > > design of a communication interface. But to me it leaves a bad taste and
> > > > > a doubt that there is this other aspect of the real world. And that is
> > > > > when the unexpected happens. Even the most trivial things like a bug in
> > > > > GuC firmware cause the driver to busy spin in there. So not much
> > > > > happening on the machine but CPU cores pinned burning cycles in this
> > > > > code. It's just lazy and not robust design. "Bug #nnnnn - High CPU usage
> > > > > and GUI blocked - Solution: Upgrade GuC firmware and _reboot_ the
> > > > > machine". Oh well..
> > > > >
> > > > > At least I think the commit message should spell out clearly that a busy
> > > > > looping path is being added to the sleeping send as a downside of
> > > > > implementation choices. Still, for the record, I object to the design.
> > > > >
> > > > > > IGTs can do this but that really isn't a real world use case. For the
> > > > > > real world, this buffer is large enough that it won't ever be full hence
> > > > > > the comment + lazy spin loop.
> > > > > >
> > > > > > Next, it isn't like we get an interrupt or something when space
> > > > > > becomes available, so how would we wake this thread? Could we come up
> > > > > > with a convoluted scheme where we insert ops that generate an interrupt
> > > > > > at regular intervals? Probably. Would it be super complicated, totally
> > > > > > unnecessary, and gain us nothing? Absolutely.
> > > > > >
> > > > > > Lastly, blocking CTBs really shouldn't ever be used. Certainly the
> > > > > > submission code doesn't use these. I think SRIOV might, but those can
> > > > > > probably be reworked too to use non-blocking. At some point we might
> > > > > > want to scrub the driver and just delete the blocking path.
> > > > 
> > > > I'd do an s/cond_resched()/msleep(1)/ and a comment explaining why we
> > > > just don't care about this. That takes care of the cpu wasting in this
> > > > case (GuC is overloaded, it won't come back anytime soon anyway) and
> > > > explains why we really don't want to make this any more clever or
> > > > complex (because the comment can explain why we won't hit this in
> > > > actual real world usage except when something else is on fire already
> > > > anyway).
> > > > 
> > > 
> > > Sounds good.
> > > 
> > > > If you want to go absolutely overkill and it's not too much work, make
> > > > the msleep interruptible or check for signals, and bail out. That way
> > > > the process can be made unstuck with ^C at least.
> > > 
> > > This loop is already bound by a timer and if no forward progress is made
> > > we pop out of this loop. It is assumed that if this happens the GuC / GPU is
> > > dead and a full GPU reset will have to be issued. A following patch
> > > adds the timer; a bit later, in the submission section of the series, a patch
> > > is added to trigger the reset.
> > 
> > Yeah timeout bail-out works too, and if you then switch it from timeout to
> > also interruptible it shouldn't be much more code. It's just nice to not
> > have any uninterruptible sleep.
> > -Daniel
> 
> I didn't get this in my next rev as I didn't know how to do this off
> hand, but I think all I need to add is something like this to each
> iteration of the busy loops, right?
>
> if (signal_pending_state(TASK_INTERRUPTIBLE, current))
> 	bail out of the busy loop and return an error

Yeah. Or since this is all hopeless already anyway, go with
msleep_interruptible().
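
I.e. roughly, as an untested sketch (and -EINTR is just a guess at what the
callers want to see here), the flow-control branch would become:

	spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);

	/* msleep_interruptible() returns non-zero if a signal woke us up early */
	if (msleep_interruptible(1))
		return -EINTR;

	goto retry;

That way the "don't burn cpu while the GuC is stuck" part and the "^C gets
you unstuck" part come from a single call.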
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 249+ messages in thread

end of thread, other threads:[~2021-06-24 17:25 UTC | newest]

Thread overview: 249+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-06 19:13 [RFC PATCH 00/97] Basic GuC submission support in the i915 Matthew Brost
2021-05-06 19:13 ` [RFC PATCH 01/97] drm/i915/gt: Move engine setup out of set_default_submission Matthew Brost
2021-05-19  0:25   ` Matthew Brost
2021-05-25  8:44   ` [Intel-gfx] " Tvrtko Ursulin
2021-05-06 19:13 ` [RFC PATCH 02/97] drm/i915/gt: Move submission_method into intel_gt Matthew Brost
2021-05-19  3:10   ` Matthew Brost
2021-05-25  8:44   ` [Intel-gfx] " Tvrtko Ursulin
2021-05-06 19:13 ` [RFC PATCH 03/97] drm/i915/gt: Move CS interrupt handler to the backend Matthew Brost
2021-05-19  3:31   ` Matthew Brost
2021-05-25  8:45   ` [Intel-gfx] " Tvrtko Ursulin
2021-05-06 19:13 ` [RFC PATCH 04/97] drm/i915/guc: skip disabling CTBs before sanitizing the GuC Matthew Brost
2021-05-20 16:47   ` Matthew Brost
2021-05-06 19:13 ` [RFC PATCH 05/97] drm/i915/guc: use probe_error log for CT enablement failure Matthew Brost
2021-05-24 10:30   ` Michal Wajdeczko
2021-05-06 19:13 ` [RFC PATCH 06/97] drm/i915/guc: enable only the user interrupt when using GuC submission Matthew Brost
2021-05-25  0:31   ` Matthew Brost
2021-05-06 19:13 ` [RFC PATCH 07/97] drm/i915/guc: Remove sample_forcewake h2g action Matthew Brost
2021-05-24 10:48   ` Michal Wajdeczko
2021-05-25  0:36   ` Matthew Brost
2021-05-06 19:13 ` [RFC PATCH 08/97] drm/i915/guc: Keep strict GuC ABI definitions Matthew Brost
2021-05-24 23:52   ` Michał Winiarski
2021-05-06 19:13 ` [RFC PATCH 09/97] drm/i915/guc: Stop using fence/status from CTB descriptor Matthew Brost
2021-05-25  2:38   ` Matthew Brost
2021-05-06 19:13 ` [RFC PATCH 10/97] drm/i915: Promote ptrdiff() to i915_utils.h Matthew Brost
2021-05-25  0:42   ` Matthew Brost
2021-05-06 19:13 ` [RFC PATCH 11/97] drm/i915/guc: Only rely on own CTB size Matthew Brost
2021-05-25  2:47   ` Matthew Brost
2021-05-25 12:48     ` [Intel-gfx] " Michal Wajdeczko
2021-05-06 19:13 ` [RFC PATCH 12/97] drm/i915/guc: Don't repeat CTB layout calculations Matthew Brost
2021-05-25  2:53   ` Matthew Brost
2021-05-25 13:07     ` Michal Wajdeczko
2021-05-25 16:56       ` Matthew Brost
2021-05-06 19:13 ` [RFC PATCH 13/97] drm/i915/guc: Replace CTB array with explicit members Matthew Brost
2021-05-25  3:15   ` Matthew Brost
2021-05-06 19:13 ` [RFC PATCH 14/97] drm/i915/guc: Update sizes of CTB buffers Matthew Brost
2021-05-25  2:56   ` Matthew Brost
2021-05-06 19:13 ` [RFC PATCH 15/97] drm/i915/guc: Relax CTB response timeout Matthew Brost
2021-05-25 18:08   ` Matthew Brost
2021-05-25 19:37     ` [Intel-gfx] " Michal Wajdeczko
2021-05-06 19:13 ` [RFC PATCH 16/97] drm/i915/guc: Start protecting access to CTB descriptors Matthew Brost
2021-05-25  3:21   ` Matthew Brost
2021-05-25  3:21   ` Matthew Brost
2021-05-06 19:13 ` [RFC PATCH 17/97] drm/i915/guc: Stop using mutex while sending CTB messages Matthew Brost
2021-05-25 16:14   ` Matthew Brost
2021-05-06 19:13 ` [RFC PATCH 18/97] drm/i915/guc: Don't receive all G2H messages in irq handler Matthew Brost
2021-05-25 18:15   ` Matthew Brost
2021-05-25 19:43     ` [Intel-gfx] " Michal Wajdeczko
2021-05-06 19:13 ` [RFC PATCH 19/97] drm/i915/guc: Always copy CT message to new allocation Matthew Brost
2021-05-25 18:25   ` Matthew Brost
2021-05-06 19:13 ` [RFC PATCH 20/97] drm/i915/guc: Introduce unified HXG messages Matthew Brost
2021-05-11 15:16   ` Daniel Vetter
2021-05-11 17:59     ` Matthew Brost
2021-05-11 22:11     ` Michal Wajdeczko
2021-05-12  8:40       ` Daniel Vetter
2021-05-06 19:13 ` [RFC PATCH 21/97] drm/i915/guc: Update MMIO based communication Matthew Brost
2021-05-06 19:13 ` [RFC PATCH 22/97] drm/i915/guc: Update CTB response status Matthew Brost
2021-05-06 19:13 ` [RFC PATCH 23/97] drm/i915/guc: Support per context scheduling policies Matthew Brost
2021-05-25  1:15   ` Matthew Brost
2021-05-06 19:13 ` [RFC PATCH 24/97] drm/i915/guc: Add flag for mark broken CTB Matthew Brost
2021-05-27 19:44   ` Matthew Brost
2021-05-06 19:13 ` [RFC PATCH 25/97] drm/i915/guc: New definition of the CTB descriptor Matthew Brost
2021-05-06 19:13 ` [RFC PATCH 26/97] drm/i915/guc: New definition of the CTB registration action Matthew Brost
2021-05-06 19:13 ` [RFC PATCH 27/97] drm/i915/guc: New CTB based communication Matthew Brost
2021-05-06 19:13 ` [RFC PATCH 28/97] drm/i915/guc: Kill guc_clients.ct_pool Matthew Brost
2021-05-25  1:01   ` Matthew Brost
2021-05-06 19:13 ` [RFC PATCH 29/97] drm/i915/guc: Update firmware to v60.1.2 Matthew Brost
2021-05-06 19:13 ` [RFC PATCH 30/97] drm/i915/uc: turn on GuC/HuC auto mode by default Matthew Brost
2021-05-24 11:00   ` Michal Wajdeczko
2021-05-06 19:13 ` [RFC PATCH 31/97] drm/i915/guc: Early initialization of GuC send registers Matthew Brost
2021-05-26 20:28   ` Matthew Brost
2021-05-06 19:13 ` [RFC PATCH 32/97] drm/i915: Introduce i915_sched_engine object Matthew Brost
2021-05-11 15:18   ` Daniel Vetter
2021-05-11 17:56     ` Matthew Brost
2021-05-06 19:13 ` [RFC PATCH 33/97] drm/i915: Engine relative MMIO Matthew Brost
2021-05-25  9:05   ` [Intel-gfx] " Tvrtko Ursulin
2021-05-06 19:13 ` [RFC PATCH 34/97] drm/i915/guc: Use guc_class instead of engine_class in fw interface Matthew Brost
2021-05-26 20:41   ` Matthew Brost
2021-05-06 19:13 ` [RFC PATCH 35/97] drm/i915/guc: Improve error message for unsolicited CT response Matthew Brost
2021-05-24 11:59   ` Michal Wajdeczko
2021-05-25 17:32     ` Matthew Brost
2021-05-06 19:13 ` [RFC PATCH 36/97] drm/i915/guc: Add non blocking CTB send function Matthew Brost
2021-05-24 12:21   ` Michal Wajdeczko
2021-05-25 17:30     ` Matthew Brost
2021-05-25  9:21   ` [Intel-gfx] " Tvrtko Ursulin
2021-05-25 17:21     ` Matthew Brost
2021-05-26  8:57       ` Tvrtko Ursulin
2021-05-26 18:10         ` Matthew Brost
2021-05-27 10:02           ` Tvrtko Ursulin
2021-05-27 14:35             ` Matthew Brost
2021-05-27 15:11               ` Tvrtko Ursulin
2021-06-07 17:31                 ` Matthew Brost
2021-06-08  8:39                   ` Tvrtko Ursulin
2021-06-08  8:46                     ` Daniel Vetter
2021-06-09 23:10                       ` Matthew Brost
2021-06-10 15:27                         ` Daniel Vetter
2021-06-24 16:38                           ` Matthew Brost
2021-06-24 17:25                             ` Daniel Vetter
2021-06-09 13:58                     ` Michal Wajdeczko
2021-06-09 23:05                       ` Matthew Brost
2021-06-09 14:14                   ` Michal Wajdeczko
2021-06-09 23:13                     ` Matthew Brost
2021-05-06 19:13 ` [RFC PATCH 37/97] drm/i915/guc: Add stall timer to " Matthew Brost
2021-05-24 12:58   ` Michal Wajdeczko
2021-05-24 18:35     ` Matthew Brost
2021-05-25 14:15       ` Michal Wajdeczko
2021-05-25 16:54         ` Matthew Brost
2021-05-06 19:13 ` [RFC PATCH 38/97] drm/i915/guc: Optimize CTB writes and reads Matthew Brost
2021-05-24 13:31   ` Michal Wajdeczko
2021-05-25 17:39     ` Matthew Brost
2021-05-06 19:13 ` [RFC PATCH 39/97] drm/i915/guc: Increase size of CTB buffers Matthew Brost
2021-05-24 13:43   ` [Intel-gfx] " Michal Wajdeczko
2021-05-24 18:40     ` Matthew Brost
2021-05-25  9:24   ` Tvrtko Ursulin
2021-05-25 17:15     ` Matthew Brost
2021-05-26  9:30       ` Tvrtko Ursulin
2021-05-26 18:20         ` Matthew Brost
2021-05-06 19:13 ` [RFC PATCH 40/97] drm/i915/guc: Module load failure test for CT buffer creation Matthew Brost
2021-05-24 13:45   ` Michal Wajdeczko
2021-05-06 19:13 ` [RFC PATCH 41/97] drm/i915/guc: Add new GuC interface defines and structures Matthew Brost
2021-05-06 19:13 ` [RFC PATCH 42/97] drm/i915/guc: Remove GuC stage descriptor, add lrc descriptor Matthew Brost
2021-05-06 19:13 ` [RFC PATCH 43/97] drm/i915/guc: Add lrc descriptor context lookup array Matthew Brost
2021-05-11 15:26   ` Daniel Vetter
2021-05-11 17:01     ` Matthew Brost
2021-05-11 17:43       ` Daniel Vetter
2021-05-11 19:34         ` Matthew Brost
2021-05-06 19:13 ` [RFC PATCH 44/97] drm/i915/guc: Implement GuC submission tasklet Matthew Brost
2021-05-25  9:43   ` [Intel-gfx] " Tvrtko Ursulin
2021-05-25 17:10     ` Matthew Brost
2021-05-06 19:13 ` [RFC PATCH 45/97] drm/i915/guc: Add bypass tasklet submission path to GuC Matthew Brost
2021-05-06 19:14 ` [RFC PATCH 46/97] drm/i915/guc: Implement GuC context operations for new inteface Matthew Brost
2021-05-29 20:32   ` Michal Wajdeczko
2021-05-06 19:14 ` [RFC PATCH 47/97] drm/i915/guc: Insert fence on context when deregistering Matthew Brost
2021-05-06 19:14 ` [RFC PATCH 48/97] drm/i915/guc: Defer context unpin until scheduling is disabled Matthew Brost
2021-05-06 19:14 ` [RFC PATCH 49/97] drm/i915/guc: Disable engine barriers with GuC during unpin Matthew Brost
2021-05-11 15:37   ` Daniel Vetter
2021-05-11 16:31     ` Matthew Brost
2021-05-26 10:26   ` [Intel-gfx] " Tvrtko Ursulin
2021-05-06 19:14 ` [RFC PATCH 50/97] drm/i915/guc: Extend deregistration fence to schedule disable Matthew Brost
2021-05-06 19:14 ` [RFC PATCH 51/97] drm/i915: Disable preempt busywait when using GuC scheduling Matthew Brost
2021-05-06 19:14 ` [RFC PATCH 52/97] drm/i915/guc: Ensure request ordering via completion fences Matthew Brost
2021-05-06 19:14 ` [RFC PATCH 53/97] drm/i915/guc: Disable semaphores when using GuC scheduling Matthew Brost
2021-05-25  9:52   ` [Intel-gfx] " Tvrtko Ursulin
2021-05-25 17:01     ` Matthew Brost
2021-05-26  9:25       ` Tvrtko Ursulin
2021-05-26 18:15         ` Matthew Brost
2021-05-27  8:41           ` Tvrtko Ursulin
2021-05-27 14:38             ` Matthew Brost
2021-05-06 19:14 ` [RFC PATCH 54/97] drm/i915/guc: Ensure G2H response has space in buffer Matthew Brost
2021-05-06 19:14 ` [RFC PATCH 55/97] drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC Matthew Brost
2021-05-25 10:06   ` [Intel-gfx] " Tvrtko Ursulin
2021-05-25 17:07     ` Matthew Brost
2021-05-26  9:21       ` Tvrtko Ursulin
2021-05-26 18:18         ` Matthew Brost
2021-05-27  9:02           ` Tvrtko Ursulin
2021-05-27 14:37             ` Matthew Brost
2021-05-06 19:14 ` [RFC PATCH 56/97] drm/i915/guc: Update GuC debugfs to support new GuC Matthew Brost
2021-05-06 19:14 ` [RFC PATCH 57/97] drm/i915/guc: Add several request trace points Matthew Brost
2021-05-06 19:14 ` [RFC PATCH 58/97] drm/i915: Add intel_context tracing Matthew Brost
2021-05-06 19:14 ` [RFC PATCH 59/97] drm/i915/guc: GuC virtual engines Matthew Brost
2021-05-06 19:14 ` [RFC PATCH 60/97] drm/i915: Track 'serial' counts for " Matthew Brost
2021-05-25 10:16   ` [Intel-gfx] " Tvrtko Ursulin
2021-05-25 17:52     ` Matthew Brost
2021-05-26  8:40       ` Tvrtko Ursulin
2021-05-26 18:45         ` John Harrison
2021-05-27  8:53           ` Tvrtko Ursulin
2021-05-27 17:01             ` John Harrison
2021-06-01  9:31               ` Tvrtko Ursulin
2021-06-02  1:20                 ` John Harrison
2021-06-02 12:04                   ` Tvrtko Ursulin
2021-06-02 12:09   ` Tvrtko Ursulin
2021-05-06 19:14 ` [RFC PATCH 61/97] drm/i915: Hold reference to intel_context over life of i915_request Matthew Brost
2021-06-02 12:18   ` [Intel-gfx] " Tvrtko Ursulin
2021-05-06 19:14 ` [RFC PATCH 62/97] drm/i915/guc: Disable bonding extension with GuC submission Matthew Brost
2021-05-06 19:14 ` [RFC PATCH 63/97] drm/i915/guc: Direct all breadcrumbs for a class to single breadcrumbs Matthew Brost
2021-06-02 13:31   ` [Intel-gfx] " Tvrtko Ursulin
2021-05-06 19:14 ` [RFC PATCH 64/97] drm/i915/guc: Reset implementation for new GuC interface Matthew Brost
2021-06-02 14:33   ` [Intel-gfx] " Tvrtko Ursulin
2021-06-04  3:17     ` Matthew Brost
2021-06-04  8:16       ` Daniel Vetter
2021-06-04 18:02         ` Matthew Brost
2021-05-06 19:14 ` [RFC PATCH 65/97] drm/i915: Reset GPU immediately if submission is disabled Matthew Brost
2021-06-02 14:36   ` [Intel-gfx] " Tvrtko Ursulin
2021-05-06 19:14 ` [RFC PATCH 66/97] drm/i915/guc: Add disable interrupts to guc sanitize Matthew Brost
2021-05-11  8:16   ` [drm/i915/guc] 07336fb545: WARNING:at_drivers/gpu/drm/i915/gt/uc/intel_uc.c:#__uc_sanitize[i915] kernel test robot
2021-05-06 19:14 ` [RFC PATCH 67/97] drm/i915/guc: Suspend/resume implementation for new interface Matthew Brost
2021-05-06 19:14 ` [RFC PATCH 68/97] drm/i915/guc: Handle context reset notification Matthew Brost
2021-05-11 16:25   ` [Intel-gfx] " Daniel Vetter
2021-05-06 19:14 ` [RFC PATCH 69/97] drm/i915/guc: Handle engine reset failure notification Matthew Brost
2021-05-06 19:14 ` [RFC PATCH 70/97] drm/i915/guc: Enable the timer expired interrupt for GuC Matthew Brost
2021-05-06 19:14 ` [RFC PATCH 71/97] drm/i915/guc: Provide mmio list to be saved/restored on engine reset Matthew Brost
2021-05-06 19:14 ` [RFC PATCH 72/97] drm/i915/guc: Don't complain about reset races Matthew Brost
2021-05-06 19:14 ` [RFC PATCH 73/97] drm/i915/guc: Enable GuC engine reset Matthew Brost
2021-05-06 19:14 ` [RFC PATCH 74/97] drm/i915/guc: Capture error state on context reset Matthew Brost
2021-05-11 16:28   ` [Intel-gfx] " Daniel Vetter
2021-05-11 17:12     ` Matthew Brost
2021-05-11 17:45       ` Daniel Vetter
2021-05-06 19:14 ` [RFC PATCH 75/97] drm/i915/guc: Fix for error capture after full GPU reset with GuC Matthew Brost
2021-05-06 19:14 ` [RFC PATCH 76/97] drm/i915/guc: Hook GuC scheduling policies up Matthew Brost
2021-05-06 19:14 ` [RFC PATCH 77/97] drm/i915/guc: Connect reset modparam updates to GuC policy flags Matthew Brost
2021-05-06 19:14 ` [RFC PATCH 78/97] drm/i915/guc: Include scheduling policies in the debugfs state dump Matthew Brost
2021-05-06 19:14 ` [RFC PATCH 79/97] drm/i915/guc: Don't call ring_is_idle in GuC submission Matthew Brost
2021-05-06 19:14 ` [RFC PATCH 80/97] drm/i915/guc: Implement banned contexts for " Matthew Brost
2021-05-06 19:14 ` [RFC PATCH 81/97] drm/i915/guc: Allow flexible number of context ids Matthew Brost
2021-05-06 19:14 ` [RFC PATCH 82/97] drm/i915/guc: Connect the number of guc_ids to debugfs Matthew Brost
2021-05-06 19:14 ` [RFC PATCH 83/97] drm/i915/guc: Don't return -EAGAIN to user when guc_ids exhausted Matthew Brost
2021-05-06 19:14 ` [RFC PATCH 84/97] drm/i915/guc: Don't allow requests not ready to consume all guc_ids Matthew Brost
2021-05-06 19:14 ` [RFC PATCH 85/97] drm/i915/guc: Introduce guc_submit_engine object Matthew Brost
2021-05-06 19:14 ` [RFC PATCH 86/97] drm/i915/guc: Add golden context to GuC ADS Matthew Brost
2021-05-06 19:14 ` [RFC PATCH 87/97] drm/i915/guc: Implement GuC priority management Matthew Brost
2021-05-06 19:14 ` [RFC PATCH 88/97] drm/i915/guc: Support request cancellation Matthew Brost
2021-05-06 19:14 ` [RFC PATCH 89/97] drm/i915/guc: Check return of __xa_store when registering a context Matthew Brost
2021-05-06 19:14 ` [RFC PATCH 90/97] drm/i915/guc: Non-static lrc descriptor registration buffer Matthew Brost
2021-05-06 19:14 ` [RFC PATCH 91/97] drm/i915/guc: Take GT PM ref when deregistering context Matthew Brost
2021-05-06 19:14 ` [RFC PATCH 92/97] drm/i915: Add GT PM delayed worker Matthew Brost
2021-05-06 19:14 ` [RFC PATCH 93/97] drm/i915/guc: Take engine PM when a context is pinned with GuC submission Matthew Brost
2021-05-06 19:14 ` [RFC PATCH 94/97] drm/i915/guc: Don't call switch_to_kernel_context " Matthew Brost
2021-05-06 19:14 ` [RFC PATCH 95/97] drm/i915/guc: Selftest for GuC flow control Matthew Brost
2021-05-06 19:14 ` [RFC PATCH 96/97] drm/i915/guc: Update GuC documentation Matthew Brost
2021-05-06 19:14 ` [RFC PATCH 97/97] drm/i915/guc: Unblock GuC submission on Gen11+ Matthew Brost
2021-05-09 17:12 ` [RFC PATCH 00/97] Basic GuC submission support in the i915 Martin Peres
2021-05-09 23:11   ` Jason Ekstrand
2021-05-10 13:55     ` Martin Peres
2021-05-10 16:25       ` Jason Ekstrand
2021-05-11  8:01         ` Martin Peres
2021-05-10 16:33       ` Daniel Vetter
2021-05-10 18:30         ` [Intel-gfx] " Francisco Jerez
2021-05-11  8:06         ` Martin Peres
2021-05-11 15:26           ` Bloomfield, Jon
2021-05-11 16:39             ` Matthew Brost
2021-05-12  6:26               ` Martin Peres
2021-05-14 16:31                 ` Jason Ekstrand
2021-05-25 15:37                   ` Alex Deucher
2021-05-11  2:58     ` Dixit, Ashutosh
2021-05-11  7:47       ` Martin Peres
2021-05-14 11:11 ` [Intel-gfx] " Tvrtko Ursulin
2021-05-14 16:36   ` Jason Ekstrand
2021-05-14 16:46     ` Matthew Brost
2021-05-14 16:41   ` Matthew Brost
2021-05-25 10:32 ` Tvrtko Ursulin
2021-05-25 16:45   ` Matthew Brost
2021-06-02 15:27     ` Tvrtko Ursulin
2021-06-02 18:57       ` Daniel Vetter
2021-06-03  3:41         ` Matthew Brost
2021-06-03  4:47           ` Daniel Vetter
2021-06-03  9:49             ` Tvrtko Ursulin
2021-06-03 10:52           ` Tvrtko Ursulin
2021-06-03  4:10       ` Matthew Brost
2021-06-03  8:51         ` Tvrtko Ursulin
2021-06-03 16:34           ` Matthew Brost

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).