intel-gfx.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
* [Intel-gfx] [PATCH 00/47] GuC submission support
@ 2021-06-24  7:04 Matthew Brost
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 01/47] drm/i915/guc: Relax CTB response timeout Matthew Brost
                   ` (51 more replies)
  0 siblings, 52 replies; 170+ messages in thread
From: Matthew Brost @ 2021-06-24  7:04 UTC (permalink / raw)
  To: intel-gfx, dri-devel

As discussed in [1], [2] we are enabling GuC submission support in the
i915. This is a subset of the patches in step 5 described in [1],
basically it is absolute to enable CI with GuC submission on gen11+
platforms.

This series itself will likely be broken down into smaller patch sets to
merge. Likely into CTBs changes, basic submission, virtual engines, and
resets.

A following series will address the missing patches remaining from [1].

Locally tested on TGL machine and basic tests seem to be passing.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>

[1] https://patchwork.freedesktop.org/series/89844/
[2] https://patchwork.freedesktop.org/series/91417/

Daniele Ceraolo Spurio (1):
  drm/i915/guc: Unblock GuC submission on Gen11+

John Harrison (10):
  drm/i915/guc: Module load failure test for CT buffer creation
  drm/i915: Track 'serial' counts for virtual engines
  drm/i915/guc: Provide mmio list to be saved/restored on engine reset
  drm/i915/guc: Don't complain about reset races
  drm/i915/guc: Enable GuC engine reset
  drm/i915/guc: Fix for error capture after full GPU reset with GuC
  drm/i915/guc: Hook GuC scheduling policies up
  drm/i915/guc: Connect reset modparam updates to GuC policy flags
  drm/i915/guc: Include scheduling policies in the debugfs state dump
  drm/i915/guc: Add golden context to GuC ADS

Matthew Brost (36):
  drm/i915/guc: Relax CTB response timeout
  drm/i915/guc: Improve error message for unsolicited CT response
  drm/i915/guc: Increase size of CTB buffers
  drm/i915/guc: Add non blocking CTB send function
  drm/i915/guc: Add stall timer to non blocking CTB send function
  drm/i915/guc: Optimize CTB writes and reads
  drm/i915/guc: Add new GuC interface defines and structures
  drm/i915/guc: Remove GuC stage descriptor, add lrc descriptor
  drm/i915/guc: Add lrc descriptor context lookup array
  drm/i915/guc: Implement GuC submission tasklet
  drm/i915/guc: Add bypass tasklet submission path to GuC
  drm/i915/guc: Implement GuC context operations for new inteface
  drm/i915/guc: Insert fence on context when deregistering
  drm/i915/guc: Defer context unpin until scheduling is disabled
  drm/i915/guc: Disable engine barriers with GuC during unpin
  drm/i915/guc: Extend deregistration fence to schedule disable
  drm/i915: Disable preempt busywait when using GuC scheduling
  drm/i915/guc: Ensure request ordering via completion fences
  drm/i915/guc: Disable semaphores when using GuC scheduling
  drm/i915/guc: Ensure G2H response has space in buffer
  drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC
  drm/i915/guc: Update GuC debugfs to support new GuC
  drm/i915/guc: Add several request trace points
  drm/i915: Add intel_context tracing
  drm/i915/guc: GuC virtual engines
  drm/i915: Hold reference to intel_context over life of i915_request
  drm/i915/guc: Disable bonding extension with GuC submission
  drm/i915/guc: Direct all breadcrumbs for a class to single breadcrumbs
  drm/i915/guc: Reset implementation for new GuC interface
  drm/i915: Reset GPU immediately if submission is disabled
  drm/i915/guc: Add disable interrupts to guc sanitize
  drm/i915/guc: Suspend/resume implementation for new interface
  drm/i915/guc: Handle context reset notification
  drm/i915/guc: Handle engine reset failure notification
  drm/i915/guc: Enable the timer expired interrupt for GuC
  drm/i915/guc: Capture error state on context reset

 drivers/gpu/drm/i915/gem/i915_gem_context.c   |   30 +-
 drivers/gpu/drm/i915/gem/i915_gem_context.h   |    1 +
 drivers/gpu/drm/i915/gem/i915_gem_mman.c      |    3 +-
 drivers/gpu/drm/i915/gt/gen8_engine_cs.c      |    6 +-
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c   |   41 +-
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.h   |   14 +-
 .../gpu/drm/i915/gt/intel_breadcrumbs_types.h |    7 +
 drivers/gpu/drm/i915/gt/intel_context.c       |   41 +-
 drivers/gpu/drm/i915/gt/intel_context.h       |   31 +-
 drivers/gpu/drm/i915/gt/intel_context_types.h |   49 +
 drivers/gpu/drm/i915/gt/intel_engine.h        |   72 +-
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     |  182 +-
 .../gpu/drm/i915/gt/intel_engine_heartbeat.c  |   71 +-
 .../gpu/drm/i915/gt/intel_engine_heartbeat.h  |    4 +
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |   12 +-
 .../drm/i915/gt/intel_execlists_submission.c  |  234 +-
 .../drm/i915/gt/intel_execlists_submission.h  |   11 -
 drivers/gpu/drm/i915/gt/intel_gt.c            |   21 +
 drivers/gpu/drm/i915/gt/intel_gt.h            |    2 +
 drivers/gpu/drm/i915/gt/intel_gt_pm.c         |    6 +-
 drivers/gpu/drm/i915/gt/intel_gt_requests.c   |   22 +-
 drivers/gpu/drm/i915/gt/intel_gt_requests.h   |    9 +-
 drivers/gpu/drm/i915/gt/intel_lrc_reg.h       |    1 -
 drivers/gpu/drm/i915/gt/intel_reset.c         |   20 +-
 .../gpu/drm/i915/gt/intel_ring_submission.c   |   28 +
 drivers/gpu/drm/i915/gt/intel_rps.c           |    4 +
 drivers/gpu/drm/i915/gt/intel_workarounds.c   |   46 +-
 .../gpu/drm/i915/gt/intel_workarounds_types.h |    1 +
 drivers/gpu/drm/i915/gt/mock_engine.c         |   41 +-
 drivers/gpu/drm/i915/gt/selftest_context.c    |   10 +
 drivers/gpu/drm/i915/gt/selftest_execlists.c  |   20 +-
 .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h  |   15 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.c        |   82 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  106 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c    |  460 +++-
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h    |    3 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |  318 ++-
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h     |   22 +-
 .../gpu/drm/i915/gt/uc/intel_guc_debugfs.c    |   25 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |   88 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 2197 +++++++++++++++--
 .../gpu/drm/i915/gt/uc/intel_guc_submission.h |   17 +-
 drivers/gpu/drm/i915/gt/uc/intel_uc.c         |  102 +-
 drivers/gpu/drm/i915/gt/uc/intel_uc.h         |   11 +
 drivers/gpu/drm/i915/i915_debugfs.c           |    2 +
 drivers/gpu/drm/i915/i915_debugfs_params.c    |   31 +
 drivers/gpu/drm/i915/i915_gem_evict.c         |    1 +
 drivers/gpu/drm/i915/i915_gpu_error.c         |   25 +-
 drivers/gpu/drm/i915/i915_reg.h               |    2 +
 drivers/gpu/drm/i915/i915_request.c           |  159 +-
 drivers/gpu/drm/i915/i915_request.h           |   21 +
 drivers/gpu/drm/i915/i915_scheduler.c         |    6 +
 drivers/gpu/drm/i915/i915_scheduler.h         |    6 +
 drivers/gpu/drm/i915/i915_scheduler_types.h   |    5 +
 drivers/gpu/drm/i915/i915_trace.h             |  197 +-
 .../gpu/drm/i915/selftests/igt_live_test.c    |    2 +-
 .../gpu/drm/i915/selftests/mock_gem_device.c  |    3 +-
 57 files changed, 4159 insertions(+), 787 deletions(-)

-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* [Intel-gfx] [PATCH 01/47] drm/i915/guc: Relax CTB response timeout
  2021-06-24  7:04 [Intel-gfx] [PATCH 00/47] GuC submission support Matthew Brost
@ 2021-06-24  7:04 ` Matthew Brost
  2021-06-24 17:23   ` Michal Wajdeczko
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 02/47] drm/i915/guc: Improve error message for unsolicited CT response Matthew Brost
                   ` (50 subsequent siblings)
  51 siblings, 1 reply; 170+ messages in thread
From: Matthew Brost @ 2021-06-24  7:04 UTC (permalink / raw)
  To: intel-gfx, dri-devel

In upcoming patch we will allow more CTB requests to be sent in
parallel to the GuC for processing, so we shouldn't assume any more
that GuC will always reply without 10ms.

Use bigger value hardcoded value of 1s instead.

v2: Add CONFIG_DRM_I915_GUC_CTB_TIMEOUT config option
v3:
 (Daniel Vetter)
  - Use hardcoded value of 1s rather than config option

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 43409044528e..a59e239497ee 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -474,14 +474,16 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
 	/*
 	 * Fast commands should complete in less than 10us, so sample quickly
 	 * up to that length of time, then switch to a slower sleep-wait loop.
-	 * No GuC command should ever take longer than 10ms.
+	 * No GuC command should ever take longer than 10ms but many GuC
+	 * commands can be inflight at time, so use a 1s timeout on the slower
+	 * sleep-wait loop.
 	 */
 #define done \
 	(FIELD_GET(GUC_HXG_MSG_0_ORIGIN, READ_ONCE(req->status)) == \
 	 GUC_HXG_ORIGIN_GUC)
 	err = wait_for_us(done, 10);
 	if (err)
-		err = wait_for(done, 10);
+		err = wait_for(done, 1000);
 #undef done
 
 	if (unlikely(err))
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* [Intel-gfx] [PATCH 02/47] drm/i915/guc: Improve error message for unsolicited CT response
  2021-06-24  7:04 [Intel-gfx] [PATCH 00/47] GuC submission support Matthew Brost
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 01/47] drm/i915/guc: Relax CTB response timeout Matthew Brost
@ 2021-06-24  7:04 ` Matthew Brost
  2021-06-25 11:58   ` Michal Wajdeczko
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 03/47] drm/i915/guc: Increase size of CTB buffers Matthew Brost
                   ` (49 subsequent siblings)
  51 siblings, 1 reply; 170+ messages in thread
From: Matthew Brost @ 2021-06-24  7:04 UTC (permalink / raw)
  To: intel-gfx, dri-devel

Improve the error message when a unsolicited CT response is received by
printing fence that couldn't be found, the last fence, and all requests
with a response outstanding.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index a59e239497ee..07f080ddb9ae 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -730,12 +730,16 @@ static int ct_handle_response(struct intel_guc_ct *ct, struct ct_incoming_msg *r
 		found = true;
 		break;
 	}
-	spin_unlock_irqrestore(&ct->requests.lock, flags);
-
 	if (!found) {
 		CT_ERROR(ct, "Unsolicited response (fence %u)\n", fence);
-		return -ENOKEY;
+		CT_ERROR(ct, "Could not find fence=%u, last_fence=%u\n", fence,
+			 ct->requests.last_fence);
+		list_for_each_entry(req, &ct->requests.pending, link)
+			CT_ERROR(ct, "request %u awaits response\n",
+				 req->fence);
+		err = -ENOKEY;
 	}
+	spin_unlock_irqrestore(&ct->requests.lock, flags);
 
 	if (unlikely(err))
 		return err;
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* [Intel-gfx] [PATCH 03/47] drm/i915/guc: Increase size of CTB buffers
  2021-06-24  7:04 [Intel-gfx] [PATCH 00/47] GuC submission support Matthew Brost
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 01/47] drm/i915/guc: Relax CTB response timeout Matthew Brost
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 02/47] drm/i915/guc: Improve error message for unsolicited CT response Matthew Brost
@ 2021-06-24  7:04 ` Matthew Brost
  2021-06-24 13:49   ` Michal Wajdeczko
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 04/47] drm/i915/guc: Add non blocking CTB send function Matthew Brost
                   ` (48 subsequent siblings)
  51 siblings, 1 reply; 170+ messages in thread
From: Matthew Brost @ 2021-06-24  7:04 UTC (permalink / raw)
  To: intel-gfx, dri-devel

With the introduction of non-blocking CTBs more than one CTB can be in
flight at a time. Increasing the size of the CTBs should reduce how
often software hits the case where no space is available in the CTB
buffer.

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 07f080ddb9ae..a17215920e58 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -58,11 +58,16 @@ static inline struct drm_device *ct_to_drm(struct intel_guc_ct *ct)
  *      +--------+-----------------------------------------------+------+
  *
  * Size of each `CT Buffer`_ must be multiple of 4K.
- * As we don't expect too many messages, for now use minimum sizes.
+ * We don't expect too many messages in flight at any time, unless we are
+ * using the GuC submission. In that case each request requires a minimum
+ * 2 dwords which gives us a maximum 256 queue'd requests. Hopefully this
+ * enough space to avoid backpressure on the driver. We increase the size
+ * of the receive buffer (relative to the send) to ensure a G2H response
+ * CTB has a landing spot.
  */
 #define CTB_DESC_SIZE		ALIGN(sizeof(struct guc_ct_buffer_desc), SZ_2K)
 #define CTB_H2G_BUFFER_SIZE	(SZ_4K)
-#define CTB_G2H_BUFFER_SIZE	(SZ_4K)
+#define CTB_G2H_BUFFER_SIZE	(4 * CTB_H2G_BUFFER_SIZE)
 
 struct ct_request {
 	struct list_head link;
@@ -641,7 +646,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 	/* beware of buffer wrap case */
 	if (unlikely(available < 0))
 		available += size;
-	CT_DEBUG(ct, "available %d (%u:%u)\n", available, head, tail);
+	CT_DEBUG(ct, "available %d (%u:%u:%u)\n", available, head, tail, size);
 	GEM_BUG_ON(available < 0);
 
 	header = cmds[head];
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* [Intel-gfx] [PATCH 04/47] drm/i915/guc: Add non blocking CTB send function
  2021-06-24  7:04 [Intel-gfx] [PATCH 00/47] GuC submission support Matthew Brost
                   ` (2 preceding siblings ...)
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 03/47] drm/i915/guc: Increase size of CTB buffers Matthew Brost
@ 2021-06-24  7:04 ` Matthew Brost
  2021-06-24 14:48   ` Michal Wajdeczko
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 05/47] drm/i915/guc: Add stall timer to " Matthew Brost
                   ` (47 subsequent siblings)
  51 siblings, 1 reply; 170+ messages in thread
From: Matthew Brost @ 2021-06-24  7:04 UTC (permalink / raw)
  To: intel-gfx, dri-devel

Add non blocking CTB send function, intel_guc_send_nb. GuC submission
will send CTBs in the critical path and does not need to wait for these
CTBs to complete before moving on, hence the need for this new function.

The non-blocking CTB now must have a flow control mechanism to ensure
the buffer isn't overrun. A lazy spin wait is used as we believe the
flow control condition should be rare with a properly sized buffer.

The function, intel_guc_send_nb, is exported in this patch but unused.
Several patches later in the series make use of this function.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc.h    | 12 +++-
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 77 +++++++++++++++++++++--
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  3 +-
 3 files changed, 82 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 4abc59f6f3cd..24b1df6ad4ae 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -74,7 +74,15 @@ static inline struct intel_guc *log_to_guc(struct intel_guc_log *log)
 static
 inline int intel_guc_send(struct intel_guc *guc, const u32 *action, u32 len)
 {
-	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0);
+	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0, 0);
+}
+
+#define INTEL_GUC_SEND_NB		BIT(31)
+static
+inline int intel_guc_send_nb(struct intel_guc *guc, const u32 *action, u32 len)
+{
+	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0,
+				 INTEL_GUC_SEND_NB);
 }
 
 static inline int
@@ -82,7 +90,7 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len,
 			   u32 *response_buf, u32 response_buf_size)
 {
 	return intel_guc_ct_send(&guc->ct, action, len,
-				 response_buf, response_buf_size);
+				 response_buf, response_buf_size, 0);
 }
 
 static inline void intel_guc_to_host_event_handler(struct intel_guc *guc)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index a17215920e58..c9a65d05911f 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -3,6 +3,11 @@
  * Copyright © 2016-2019 Intel Corporation
  */
 
+#include <linux/circ_buf.h>
+#include <linux/ktime.h>
+#include <linux/time64.h>
+#include <linux/timekeeping.h>
+
 #include "i915_drv.h"
 #include "intel_guc_ct.h"
 #include "gt/intel_gt.h"
@@ -373,7 +378,7 @@ static void write_barrier(struct intel_guc_ct *ct)
 static int ct_write(struct intel_guc_ct *ct,
 		    const u32 *action,
 		    u32 len /* in dwords */,
-		    u32 fence)
+		    u32 fence, u32 flags)
 {
 	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
 	struct guc_ct_buffer_desc *desc = ctb->desc;
@@ -421,9 +426,13 @@ static int ct_write(struct intel_guc_ct *ct,
 		 FIELD_PREP(GUC_CTB_MSG_0_NUM_DWORDS, len) |
 		 FIELD_PREP(GUC_CTB_MSG_0_FENCE, fence);
 
-	hxg = FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
-	      FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
-			 GUC_HXG_REQUEST_MSG_0_DATA0, action[0]);
+	hxg = (flags & INTEL_GUC_SEND_NB) ?
+		(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_EVENT) |
+		 FIELD_PREP(GUC_HXG_EVENT_MSG_0_ACTION |
+			    GUC_HXG_EVENT_MSG_0_DATA0, action[0])) :
+		(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
+		 FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
+			    GUC_HXG_REQUEST_MSG_0_DATA0, action[0]));
 
 	CT_DEBUG(ct, "writing (tail %u) %*ph %*ph %*ph\n",
 		 tail, 4, &header, 4, &hxg, 4 * (len - 1), &action[1]);
@@ -498,6 +507,46 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
 	return err;
 }
 
+static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
+{
+	struct guc_ct_buffer_desc *desc = ctb->desc;
+	u32 head = READ_ONCE(desc->head);
+	u32 space;
+
+	space = CIRC_SPACE(desc->tail, head, ctb->size);
+
+	return space >= len_dw;
+}
+
+static int ct_send_nb(struct intel_guc_ct *ct,
+		      const u32 *action,
+		      u32 len,
+		      u32 flags)
+{
+	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
+	unsigned long spin_flags;
+	u32 fence;
+	int ret;
+
+	spin_lock_irqsave(&ctb->lock, spin_flags);
+
+	ret = h2g_has_room(ctb, len + 1);
+	if (unlikely(ret))
+		goto out;
+
+	fence = ct_get_next_fence(ct);
+	ret = ct_write(ct, action, len, fence, flags);
+	if (unlikely(ret))
+		goto out;
+
+	intel_guc_notify(ct_to_guc(ct));
+
+out:
+	spin_unlock_irqrestore(&ctb->lock, spin_flags);
+
+	return ret;
+}
+
 static int ct_send(struct intel_guc_ct *ct,
 		   const u32 *action,
 		   u32 len,
@@ -505,6 +554,7 @@ static int ct_send(struct intel_guc_ct *ct,
 		   u32 response_buf_size,
 		   u32 *status)
 {
+	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
 	struct ct_request request;
 	unsigned long flags;
 	u32 fence;
@@ -514,8 +564,20 @@ static int ct_send(struct intel_guc_ct *ct,
 	GEM_BUG_ON(!len);
 	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
 	GEM_BUG_ON(!response_buf && response_buf_size);
+	might_sleep();
 
+	/*
+	 * We use a lazy spin wait loop here as we believe that if the CT
+	 * buffers are sized correctly the flow control condition should be
+	 * rare.
+	 */
+retry:
 	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
+	if (unlikely(!h2g_has_room(ctb, len + 1))) {
+		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
+		cond_resched();
+		goto retry;
+	}
 
 	fence = ct_get_next_fence(ct);
 	request.fence = fence;
@@ -527,7 +589,7 @@ static int ct_send(struct intel_guc_ct *ct,
 	list_add_tail(&request.link, &ct->requests.pending);
 	spin_unlock(&ct->requests.lock);
 
-	err = ct_write(ct, action, len, fence);
+	err = ct_write(ct, action, len, fence, 0);
 
 	spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
 
@@ -569,7 +631,7 @@ static int ct_send(struct intel_guc_ct *ct,
  * Command Transport (CT) buffer based GuC send function.
  */
 int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
-		      u32 *response_buf, u32 response_buf_size)
+		      u32 *response_buf, u32 response_buf_size, u32 flags)
 {
 	u32 status = ~0; /* undefined */
 	int ret;
@@ -579,6 +641,9 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
 		return -ENODEV;
 	}
 
+	if (flags & INTEL_GUC_SEND_NB)
+		return ct_send_nb(ct, action, len, flags);
+
 	ret = ct_send(ct, action, len, response_buf, response_buf_size, &status);
 	if (unlikely(ret < 0)) {
 		CT_ERROR(ct, "Sending action %#x failed (err=%d status=%#X)\n",
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
index 1ae2dde6db93..eb69263324ba 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
@@ -42,7 +42,6 @@ struct intel_guc_ct_buffer {
 	bool broken;
 };
 
-
 /** Top-level structure for Command Transport related data
  *
  * Includes a pair of CT buffers for bi-directional communication and tracking
@@ -88,7 +87,7 @@ static inline bool intel_guc_ct_enabled(struct intel_guc_ct *ct)
 }
 
 int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
-		      u32 *response_buf, u32 response_buf_size);
+		      u32 *response_buf, u32 response_buf_size, u32 flags);
 void intel_guc_ct_event_handler(struct intel_guc_ct *ct);
 
 #endif /* _INTEL_GUC_CT_H_ */
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* [Intel-gfx] [PATCH 05/47] drm/i915/guc: Add stall timer to non blocking CTB send function
  2021-06-24  7:04 [Intel-gfx] [PATCH 00/47] GuC submission support Matthew Brost
                   ` (3 preceding siblings ...)
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 04/47] drm/i915/guc: Add non blocking CTB send function Matthew Brost
@ 2021-06-24  7:04 ` Matthew Brost
  2021-06-24 17:37   ` Michal Wajdeczko
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 06/47] drm/i915/guc: Optimize CTB writes and reads Matthew Brost
                   ` (46 subsequent siblings)
  51 siblings, 1 reply; 170+ messages in thread
From: Matthew Brost @ 2021-06-24  7:04 UTC (permalink / raw)
  To: intel-gfx, dri-devel

Implement a stall timer which fails H2G CTBs once a period of time
with no forward progress is reached to prevent deadlock.

Also update to ct_write to return -EIO rather than -EPIPE on a
corrupted descriptor.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 47 +++++++++++++++++++++--
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  4 ++
 2 files changed, 48 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index c9a65d05911f..27ec30b5ef47 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -319,6 +319,7 @@ int intel_guc_ct_enable(struct intel_guc_ct *ct)
 		goto err_deregister;
 
 	ct->enabled = true;
+	ct->stall_time = KTIME_MAX;
 
 	return 0;
 
@@ -392,7 +393,7 @@ static int ct_write(struct intel_guc_ct *ct,
 	unsigned int i;
 
 	if (unlikely(ctb->broken))
-		return -EPIPE;
+		return -EIO;
 
 	if (unlikely(desc->status))
 		goto corrupted;
@@ -464,7 +465,7 @@ static int ct_write(struct intel_guc_ct *ct,
 	CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u status=%#x\n",
 		 desc->head, desc->tail, desc->status);
 	ctb->broken = true;
-	return -EPIPE;
+	return -EIO;
 }
 
 /**
@@ -507,6 +508,18 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
 	return err;
 }
 
+#define GUC_CTB_TIMEOUT_MS	1500
+static inline bool ct_deadlocked(struct intel_guc_ct *ct)
+{
+	long timeout = GUC_CTB_TIMEOUT_MS;
+	bool ret = ktime_ms_delta(ktime_get(), ct->stall_time) > timeout;
+
+	if (unlikely(ret))
+		CT_ERROR(ct, "CT deadlocked\n");
+
+	return ret;
+}
+
 static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
 {
 	struct guc_ct_buffer_desc *desc = ctb->desc;
@@ -518,6 +531,26 @@ static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
 	return space >= len_dw;
 }
 
+static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
+{
+	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
+
+	lockdep_assert_held(&ct->ctbs.send.lock);
+
+	if (unlikely(!h2g_has_room(ctb, len_dw))) {
+		if (ct->stall_time == KTIME_MAX)
+			ct->stall_time = ktime_get();
+
+		if (unlikely(ct_deadlocked(ct)))
+			return -EIO;
+		else
+			return -EBUSY;
+	}
+
+	ct->stall_time = KTIME_MAX;
+	return 0;
+}
+
 static int ct_send_nb(struct intel_guc_ct *ct,
 		      const u32 *action,
 		      u32 len,
@@ -530,7 +563,7 @@ static int ct_send_nb(struct intel_guc_ct *ct,
 
 	spin_lock_irqsave(&ctb->lock, spin_flags);
 
-	ret = h2g_has_room(ctb, len + 1);
+	ret = has_room_nb(ct, len + 1);
 	if (unlikely(ret))
 		goto out;
 
@@ -574,11 +607,19 @@ static int ct_send(struct intel_guc_ct *ct,
 retry:
 	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
 	if (unlikely(!h2g_has_room(ctb, len + 1))) {
+		if (ct->stall_time == KTIME_MAX)
+			ct->stall_time = ktime_get();
 		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
+
+		if (unlikely(ct_deadlocked(ct)))
+			return -EIO;
+
 		cond_resched();
 		goto retry;
 	}
 
+	ct->stall_time = KTIME_MAX;
+
 	fence = ct_get_next_fence(ct);
 	request.fence = fence;
 	request.status = 0;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
index eb69263324ba..55ef7c52472f 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
@@ -9,6 +9,7 @@
 #include <linux/interrupt.h>
 #include <linux/spinlock.h>
 #include <linux/workqueue.h>
+#include <linux/ktime.h>
 
 #include "intel_guc_fwif.h"
 
@@ -68,6 +69,9 @@ struct intel_guc_ct {
 		struct list_head incoming; /* incoming requests */
 		struct work_struct worker; /* handler for incoming requests */
 	} requests;
+
+	/** @stall_time: time of first time a CTB submission is stalled */
+	ktime_t stall_time;
 };
 
 void intel_guc_ct_init_early(struct intel_guc_ct *ct);
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* [Intel-gfx] [PATCH 06/47] drm/i915/guc: Optimize CTB writes and reads
  2021-06-24  7:04 [Intel-gfx] [PATCH 00/47] GuC submission support Matthew Brost
                   ` (4 preceding siblings ...)
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 05/47] drm/i915/guc: Add stall timer to " Matthew Brost
@ 2021-06-24  7:04 ` Matthew Brost
  2021-06-25 13:09   ` Michal Wajdeczko
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 07/47] drm/i915/guc: Module load failure test for CT buffer creation Matthew Brost
                   ` (45 subsequent siblings)
  51 siblings, 1 reply; 170+ messages in thread
From: Matthew Brost @ 2021-06-24  7:04 UTC (permalink / raw)
  To: intel-gfx, dri-devel

CTB writes are now in the path of command submission and should be
optimized for performance. Rather than reading CTB descriptor values
(e.g. head, tail) which could result in accesses across the PCIe bus,
store shadow local copies and only read/write the descriptor values when
absolutely necessary. Also store the current space in the each channel
locally.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 76 ++++++++++++++---------
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  6 ++
 2 files changed, 51 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 27ec30b5ef47..1fd5c69358ef 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -130,6 +130,10 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc)
 static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb)
 {
 	ctb->broken = false;
+	ctb->tail = 0;
+	ctb->head = 0;
+	ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
+
 	guc_ct_buffer_desc_init(ctb->desc);
 }
 
@@ -383,10 +387,8 @@ static int ct_write(struct intel_guc_ct *ct,
 {
 	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
 	struct guc_ct_buffer_desc *desc = ctb->desc;
-	u32 head = desc->head;
-	u32 tail = desc->tail;
+	u32 tail = ctb->tail;
 	u32 size = ctb->size;
-	u32 used;
 	u32 header;
 	u32 hxg;
 	u32 *cmds = ctb->cmds;
@@ -398,25 +400,14 @@ static int ct_write(struct intel_guc_ct *ct,
 	if (unlikely(desc->status))
 		goto corrupted;
 
-	if (unlikely((tail | head) >= size)) {
+#ifdef CONFIG_DRM_I915_DEBUG_GUC
+	if (unlikely((desc->tail | desc->head) >= size)) {
 		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
-			 head, tail, size);
+			 desc->head, desc->tail, size);
 		desc->status |= GUC_CTB_STATUS_OVERFLOW;
 		goto corrupted;
 	}
-
-	/*
-	 * tail == head condition indicates empty. GuC FW does not support
-	 * using up the entire buffer to get tail == head meaning full.
-	 */
-	if (tail < head)
-		used = (size - head) + tail;
-	else
-		used = tail - head;
-
-	/* make sure there is a space including extra dw for the fence */
-	if (unlikely(used + len + 1 >= size))
-		return -ENOSPC;
+#endif
 
 	/*
 	 * dw0: CT header (including fence)
@@ -457,7 +448,9 @@ static int ct_write(struct intel_guc_ct *ct,
 	write_barrier(ct);
 
 	/* now update descriptor */
+	ctb->tail = tail;
 	WRITE_ONCE(desc->tail, tail);
+	ctb->space -= len + 1;
 
 	return 0;
 
@@ -473,7 +466,7 @@ static int ct_write(struct intel_guc_ct *ct,
  * @req:	pointer to pending request
  * @status:	placeholder for status
  *
- * For each sent request, Guc shall send bac CT response message.
+ * For each sent request, GuC shall send back CT response message.
  * Our message handler will update status of tracked request once
  * response message with given fence is received. Wait here and
  * check for valid response status value.
@@ -520,24 +513,35 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct)
 	return ret;
 }
 
-static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
+static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw)
 {
-	struct guc_ct_buffer_desc *desc = ctb->desc;
-	u32 head = READ_ONCE(desc->head);
+	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
+	u32 head;
 	u32 space;
 
-	space = CIRC_SPACE(desc->tail, head, ctb->size);
+	if (ctb->space >= len_dw)
+		return true;
+
+	head = READ_ONCE(ctb->desc->head);
+	if (unlikely(head > ctb->size)) {
+		CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u size=%u\n",
+			  ctb->desc->head, ctb->desc->tail, ctb->size);
+		ctb->desc->status |= GUC_CTB_STATUS_OVERFLOW;
+		ctb->broken = true;
+		return false;
+	}
+
+	space = CIRC_SPACE(ctb->tail, head, ctb->size);
+	ctb->space = space;
 
 	return space >= len_dw;
 }
 
 static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
 {
-	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
-
 	lockdep_assert_held(&ct->ctbs.send.lock);
 
-	if (unlikely(!h2g_has_room(ctb, len_dw))) {
+	if (unlikely(!h2g_has_room(ct, len_dw))) {
 		if (ct->stall_time == KTIME_MAX)
 			ct->stall_time = ktime_get();
 
@@ -606,10 +610,10 @@ static int ct_send(struct intel_guc_ct *ct,
 	 */
 retry:
 	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
-	if (unlikely(!h2g_has_room(ctb, len + 1))) {
+	if (unlikely(!h2g_has_room(ct, len + 1))) {
 		if (ct->stall_time == KTIME_MAX)
 			ct->stall_time = ktime_get();
-		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
+		spin_unlock_irqrestore(&ctb->lock, flags);
 
 		if (unlikely(ct_deadlocked(ct)))
 			return -EIO;
@@ -632,7 +636,7 @@ static int ct_send(struct intel_guc_ct *ct,
 
 	err = ct_write(ct, action, len, fence, 0);
 
-	spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
+	spin_unlock_irqrestore(&ctb->lock, flags);
 
 	if (unlikely(err))
 		goto unlink;
@@ -720,7 +724,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 {
 	struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv;
 	struct guc_ct_buffer_desc *desc = ctb->desc;
-	u32 head = desc->head;
+	u32 head = ctb->head;
 	u32 tail = desc->tail;
 	u32 size = ctb->size;
 	u32 *cmds = ctb->cmds;
@@ -735,12 +739,21 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 	if (unlikely(desc->status))
 		goto corrupted;
 
-	if (unlikely((tail | head) >= size)) {
+#ifdef CONFIG_DRM_I915_DEBUG_GUC
+	if (unlikely((desc->tail | desc->head) >= size)) {
 		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
 			 head, tail, size);
 		desc->status |= GUC_CTB_STATUS_OVERFLOW;
 		goto corrupted;
 	}
+#else
+	if (unlikely((tail | ctb->head) >= size)) {
+		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
+			 head, tail, size);
+		desc->status |= GUC_CTB_STATUS_OVERFLOW;
+		goto corrupted;
+	}
+#endif
 
 	/* tail == head condition indicates empty */
 	available = tail - head;
@@ -790,6 +803,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 	}
 	CT_DEBUG(ct, "received %*ph\n", 4 * len, (*msg)->msg);
 
+	ctb->head = head;
 	/* now update descriptor */
 	WRITE_ONCE(desc->head, head);
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
index 55ef7c52472f..9924335e2ee6 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
@@ -33,6 +33,9 @@ struct intel_guc;
  * @desc: pointer to the buffer descriptor
  * @cmds: pointer to the commands buffer
  * @size: size of the commands buffer in dwords
+ * @head: local shadow copy of head in dwords
+ * @tail: local shadow copy of tail in dwords
+ * @space: local shadow copy of space in dwords
  * @broken: flag to indicate if descriptor data is broken
  */
 struct intel_guc_ct_buffer {
@@ -40,6 +43,9 @@ struct intel_guc_ct_buffer {
 	struct guc_ct_buffer_desc *desc;
 	u32 *cmds;
 	u32 size;
+	u32 tail;
+	u32 head;
+	u32 space;
 	bool broken;
 };
 
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* [Intel-gfx] [PATCH 07/47] drm/i915/guc: Module load failure test for CT buffer creation
  2021-06-24  7:04 [Intel-gfx] [PATCH 00/47] GuC submission support Matthew Brost
                   ` (5 preceding siblings ...)
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 06/47] drm/i915/guc: Optimize CTB writes and reads Matthew Brost
@ 2021-06-24  7:04 ` Matthew Brost
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 08/47] drm/i915/guc: Add new GuC interface defines and structures Matthew Brost
                   ` (44 subsequent siblings)
  51 siblings, 0 replies; 170+ messages in thread
From: Matthew Brost @ 2021-06-24  7:04 UTC (permalink / raw)
  To: intel-gfx, dri-devel

From: John Harrison <John.C.Harrison@Intel.com>

Add several module failure load inject points in the CT buffer creation
code path.

Signed-off-by: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 1fd5c69358ef..8e0ed7d8feb3 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -175,6 +175,10 @@ static int ct_register_buffer(struct intel_guc_ct *ct, u32 type,
 {
 	int err;
 
+	err = i915_inject_probe_error(guc_to_gt(ct_to_guc(ct))->i915, -ENXIO);
+	if (unlikely(err))
+		return err;
+
 	err = guc_action_register_ct_buffer(ct_to_guc(ct), type,
 					    desc_addr, buff_addr, size);
 	if (unlikely(err))
@@ -226,6 +230,10 @@ int intel_guc_ct_init(struct intel_guc_ct *ct)
 	u32 *cmds;
 	int err;
 
+	err = i915_inject_probe_error(guc_to_gt(guc)->i915, -ENXIO);
+	if (err)
+		return err;
+
 	GEM_BUG_ON(ct->vma);
 
 	blob_size = 2 * CTB_DESC_SIZE + CTB_H2G_BUFFER_SIZE + CTB_G2H_BUFFER_SIZE;
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* [Intel-gfx] [PATCH 08/47] drm/i915/guc: Add new GuC interface defines and structures
  2021-06-24  7:04 [Intel-gfx] [PATCH 00/47] GuC submission support Matthew Brost
                   ` (6 preceding siblings ...)
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 07/47] drm/i915/guc: Module load failure test for CT buffer creation Matthew Brost
@ 2021-06-24  7:04 ` Matthew Brost
  2021-06-29 21:11   ` John Harrison
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 09/47] drm/i915/guc: Remove GuC stage descriptor, add lrc descriptor Matthew Brost
                   ` (43 subsequent siblings)
  51 siblings, 1 reply; 170+ messages in thread
From: Matthew Brost @ 2021-06-24  7:04 UTC (permalink / raw)
  To: intel-gfx, dri-devel

Add new GuC interface defines and structures while maintaining old ones
in parallel.

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h  | 14 +++++++
 drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   | 41 +++++++++++++++++++
 2 files changed, 55 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
index 2d6198e63ebe..57e18babdf4b 100644
--- a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
+++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
@@ -124,10 +124,24 @@ enum intel_guc_action {
 	INTEL_GUC_ACTION_FORCE_LOG_BUFFER_FLUSH = 0x302,
 	INTEL_GUC_ACTION_ENTER_S_STATE = 0x501,
 	INTEL_GUC_ACTION_EXIT_S_STATE = 0x502,
+	INTEL_GUC_ACTION_GLOBAL_SCHED_POLICY_CHANGE = 0x506,
+	INTEL_GUC_ACTION_SCHED_CONTEXT = 0x1000,
+	INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET = 0x1001,
+	INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE = 0x1002,
+	INTEL_GUC_ACTION_SCHED_ENGINE_MODE_SET = 0x1003,
+	INTEL_GUC_ACTION_SCHED_ENGINE_MODE_DONE = 0x1004,
+	INTEL_GUC_ACTION_SET_CONTEXT_PRIORITY = 0x1005,
+	INTEL_GUC_ACTION_SET_CONTEXT_EXECUTION_QUANTUM = 0x1006,
+	INTEL_GUC_ACTION_SET_CONTEXT_PREEMPTION_TIMEOUT = 0x1007,
+	INTEL_GUC_ACTION_CONTEXT_RESET_NOTIFICATION = 0x1008,
+	INTEL_GUC_ACTION_ENGINE_FAILURE_NOTIFICATION = 0x1009,
 	INTEL_GUC_ACTION_SLPC_REQUEST = 0x3003,
 	INTEL_GUC_ACTION_AUTHENTICATE_HUC = 0x4000,
+	INTEL_GUC_ACTION_REGISTER_CONTEXT = 0x4502,
+	INTEL_GUC_ACTION_DEREGISTER_CONTEXT = 0x4503,
 	INTEL_GUC_ACTION_REGISTER_COMMAND_TRANSPORT_BUFFER = 0x4505,
 	INTEL_GUC_ACTION_DEREGISTER_COMMAND_TRANSPORT_BUFFER = 0x4506,
+	INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE = 0x4600,
 	INTEL_GUC_ACTION_LIMIT
 };
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
index 617ec601648d..28245a217a39 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
@@ -17,6 +17,9 @@
 #include "abi/guc_communication_ctb_abi.h"
 #include "abi/guc_messages_abi.h"
 
+#define GUC_CONTEXT_DISABLE		0
+#define GUC_CONTEXT_ENABLE		1
+
 #define GUC_CLIENT_PRIORITY_KMD_HIGH	0
 #define GUC_CLIENT_PRIORITY_HIGH	1
 #define GUC_CLIENT_PRIORITY_KMD_NORMAL	2
@@ -26,6 +29,9 @@
 #define GUC_MAX_STAGE_DESCRIPTORS	1024
 #define	GUC_INVALID_STAGE_ID		GUC_MAX_STAGE_DESCRIPTORS
 
+#define GUC_MAX_LRC_DESCRIPTORS		65535
+#define	GUC_INVALID_LRC_ID		GUC_MAX_LRC_DESCRIPTORS
+
 #define GUC_RENDER_ENGINE		0
 #define GUC_VIDEO_ENGINE		1
 #define GUC_BLITTER_ENGINE		2
@@ -237,6 +243,41 @@ struct guc_stage_desc {
 	u64 desc_private;
 } __packed;
 
+#define CONTEXT_REGISTRATION_FLAG_KMD	BIT(0)
+
+#define CONTEXT_POLICY_DEFAULT_EXECUTION_QUANTUM_US 1000000
+#define CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US 500000
+
+/* Preempt to idle on quantum expiry */
+#define CONTEXT_POLICY_FLAG_PREEMPT_TO_IDLE	BIT(0)
+
+/*
+ * GuC Context registration descriptor.
+ * FIXME: This is only required to exist during context registration.
+ * The current 1:1 between guc_lrc_desc and LRCs for the lifetime of the LRC
+ * is not required.
+ */
+struct guc_lrc_desc {
+	u32 hw_context_desc;
+	u32 slpm_perf_mode_hint;	/* SPLC v1 only */
+	u32 slpm_freq_hint;
+	u32 engine_submit_mask;		/* In logical space */
+	u8 engine_class;
+	u8 reserved0[3];
+	u32 priority;
+	u32 process_desc;
+	u32 wq_addr;
+	u32 wq_size;
+	u32 context_flags;		/* CONTEXT_REGISTRATION_* */
+	/* Time for one workload to execute. (in micro seconds) */
+	u32 execution_quantum;
+	/* Time to wait for a preemption request to complete before issuing a
+	 * reset. (in micro seconds). */
+	u32 preemption_timeout;
+	u32 policy_flags;		/* CONTEXT_POLICY_* */
+	u32 reserved1[19];
+} __packed;
+
 #define GUC_POWER_UNSPECIFIED	0
 #define GUC_POWER_D0		1
 #define GUC_POWER_D1		2
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* [Intel-gfx] [PATCH 09/47] drm/i915/guc: Remove GuC stage descriptor, add lrc descriptor
  2021-06-24  7:04 [Intel-gfx] [PATCH 00/47] GuC submission support Matthew Brost
                   ` (7 preceding siblings ...)
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 08/47] drm/i915/guc: Add new GuC interface defines and structures Matthew Brost
@ 2021-06-24  7:04 ` Matthew Brost
  2021-06-25 19:44   ` John Harrison
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 10/47] drm/i915/guc: Add lrc descriptor context lookup array Matthew Brost
                   ` (42 subsequent siblings)
  51 siblings, 1 reply; 170+ messages in thread
From: Matthew Brost @ 2021-06-24  7:04 UTC (permalink / raw)
  To: intel-gfx, dri-devel

Remove old GuC stage descriptor, add lrc descriptor which will be used
by the new GuC interface implemented in this patch series.

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  4 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   | 65 -----------------
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 72 ++++++-------------
 3 files changed, 25 insertions(+), 116 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 24b1df6ad4ae..b28fa54214f2 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -43,8 +43,8 @@ struct intel_guc {
 	struct i915_vma *ads_vma;
 	struct __guc_ads_blob *ads_blob;
 
-	struct i915_vma *stage_desc_pool;
-	void *stage_desc_pool_vaddr;
+	struct i915_vma *lrc_desc_pool;
+	void *lrc_desc_pool_vaddr;
 
 	/* Control params for fw initialization */
 	u32 params[GUC_CTL_MAX_DWORDS];
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
index 28245a217a39..4e4edc368b77 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
@@ -26,9 +26,6 @@
 #define GUC_CLIENT_PRIORITY_NORMAL	3
 #define GUC_CLIENT_PRIORITY_NUM		4
 
-#define GUC_MAX_STAGE_DESCRIPTORS	1024
-#define	GUC_INVALID_STAGE_ID		GUC_MAX_STAGE_DESCRIPTORS
-
 #define GUC_MAX_LRC_DESCRIPTORS		65535
 #define	GUC_INVALID_LRC_ID		GUC_MAX_LRC_DESCRIPTORS
 
@@ -181,68 +178,6 @@ struct guc_process_desc {
 	u32 reserved[30];
 } __packed;
 
-/* engine id and context id is packed into guc_execlist_context.context_id*/
-#define GUC_ELC_CTXID_OFFSET		0
-#define GUC_ELC_ENGINE_OFFSET		29
-
-/* The execlist context including software and HW information */
-struct guc_execlist_context {
-	u32 context_desc;
-	u32 context_id;
-	u32 ring_status;
-	u32 ring_lrca;
-	u32 ring_begin;
-	u32 ring_end;
-	u32 ring_next_free_location;
-	u32 ring_current_tail_pointer_value;
-	u8 engine_state_submit_value;
-	u8 engine_state_wait_value;
-	u16 pagefault_count;
-	u16 engine_submit_queue_count;
-} __packed;
-
-/*
- * This structure describes a stage set arranged for a particular communication
- * between uKernel (GuC) and Driver (KMD). Technically, this is known as a
- * "GuC Context descriptor" in the specs, but we use the term "stage descriptor"
- * to avoid confusion with all the other things already named "context" in the
- * driver. A static pool of these descriptors are stored inside a GEM object
- * (stage_desc_pool) which is held for the entire lifetime of our interaction
- * with the GuC, being allocated before the GuC is loaded with its firmware.
- */
-struct guc_stage_desc {
-	u32 sched_common_area;
-	u32 stage_id;
-	u32 pas_id;
-	u8 engines_used;
-	u64 db_trigger_cpu;
-	u32 db_trigger_uk;
-	u64 db_trigger_phy;
-	u16 db_id;
-
-	struct guc_execlist_context lrc[GUC_MAX_ENGINES_NUM];
-
-	u8 attribute;
-
-	u32 priority;
-
-	u32 wq_sampled_tail_offset;
-	u32 wq_total_submit_enqueues;
-
-	u32 process_desc;
-	u32 wq_addr;
-	u32 wq_size;
-
-	u32 engine_presence;
-
-	u8 engine_suspended;
-
-	u8 reserved0[3];
-	u64 reserved1[1];
-
-	u64 desc_private;
-} __packed;
-
 #define CONTEXT_REGISTRATION_FLAG_KMD	BIT(0)
 
 #define CONTEXT_POLICY_DEFAULT_EXECUTION_QUANTUM_US 1000000
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index e9c237b18692..a366890fb840 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -65,57 +65,35 @@ static inline struct i915_priolist *to_priolist(struct rb_node *rb)
 	return rb_entry(rb, struct i915_priolist, node);
 }
 
-static struct guc_stage_desc *__get_stage_desc(struct intel_guc *guc, u32 id)
+/* Future patches will use this function */
+__attribute__ ((unused))
+static struct guc_lrc_desc *__get_lrc_desc(struct intel_guc *guc, u32 index)
 {
-	struct guc_stage_desc *base = guc->stage_desc_pool_vaddr;
+	struct guc_lrc_desc *base = guc->lrc_desc_pool_vaddr;
 
-	return &base[id];
-}
-
-static int guc_stage_desc_pool_create(struct intel_guc *guc)
-{
-	u32 size = PAGE_ALIGN(sizeof(struct guc_stage_desc) *
-			      GUC_MAX_STAGE_DESCRIPTORS);
+	GEM_BUG_ON(index >= GUC_MAX_LRC_DESCRIPTORS);
 
-	return intel_guc_allocate_and_map_vma(guc, size, &guc->stage_desc_pool,
-					      &guc->stage_desc_pool_vaddr);
+	return &base[index];
 }
 
-static void guc_stage_desc_pool_destroy(struct intel_guc *guc)
-{
-	i915_vma_unpin_and_release(&guc->stage_desc_pool, I915_VMA_RELEASE_MAP);
-}
-
-/*
- * Initialise/clear the stage descriptor shared with the GuC firmware.
- *
- * This descriptor tells the GuC where (in GGTT space) to find the important
- * data structures related to work submission (process descriptor, write queue,
- * etc).
- */
-static void guc_stage_desc_init(struct intel_guc *guc)
+static int guc_lrc_desc_pool_create(struct intel_guc *guc)
 {
-	struct guc_stage_desc *desc;
-
-	/* we only use 1 stage desc, so hardcode it to 0 */
-	desc = __get_stage_desc(guc, 0);
-	memset(desc, 0, sizeof(*desc));
-
-	desc->attribute = GUC_STAGE_DESC_ATTR_ACTIVE |
-			  GUC_STAGE_DESC_ATTR_KERNEL;
+	u32 size;
+	int ret;
 
-	desc->stage_id = 0;
-	desc->priority = GUC_CLIENT_PRIORITY_KMD_NORMAL;
+	size = PAGE_ALIGN(sizeof(struct guc_lrc_desc) *
+			  GUC_MAX_LRC_DESCRIPTORS);
+	ret = intel_guc_allocate_and_map_vma(guc, size, &guc->lrc_desc_pool,
+					     (void **)&guc->lrc_desc_pool_vaddr);
+	if (ret)
+		return ret;
 
-	desc->wq_size = GUC_WQ_SIZE;
+	return 0;
 }
 
-static void guc_stage_desc_fini(struct intel_guc *guc)
+static void guc_lrc_desc_pool_destroy(struct intel_guc *guc)
 {
-	struct guc_stage_desc *desc;
-
-	desc = __get_stage_desc(guc, 0);
-	memset(desc, 0, sizeof(*desc));
+	i915_vma_unpin_and_release(&guc->lrc_desc_pool, I915_VMA_RELEASE_MAP);
 }
 
 static void guc_add_request(struct intel_guc *guc, struct i915_request *rq)
@@ -410,26 +388,25 @@ int intel_guc_submission_init(struct intel_guc *guc)
 {
 	int ret;
 
-	if (guc->stage_desc_pool)
+	if (guc->lrc_desc_pool)
 		return 0;
 
-	ret = guc_stage_desc_pool_create(guc);
+	ret = guc_lrc_desc_pool_create(guc);
 	if (ret)
 		return ret;
 	/*
 	 * Keep static analysers happy, let them know that we allocated the
 	 * vma after testing that it didn't exist earlier.
 	 */
-	GEM_BUG_ON(!guc->stage_desc_pool);
+	GEM_BUG_ON(!guc->lrc_desc_pool);
 
 	return 0;
 }
 
 void intel_guc_submission_fini(struct intel_guc *guc)
 {
-	if (guc->stage_desc_pool) {
-		guc_stage_desc_pool_destroy(guc);
-	}
+	if (guc->lrc_desc_pool)
+		guc_lrc_desc_pool_destroy(guc);
 }
 
 static int guc_context_alloc(struct intel_context *ce)
@@ -695,7 +672,6 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
 
 void intel_guc_submission_enable(struct intel_guc *guc)
 {
-	guc_stage_desc_init(guc);
 }
 
 void intel_guc_submission_disable(struct intel_guc *guc)
@@ -705,8 +681,6 @@ void intel_guc_submission_disable(struct intel_guc *guc)
 	GEM_BUG_ON(gt->awake); /* GT should be parked first */
 
 	/* Note: By the time we're here, GuC may have already been reset */
-
-	guc_stage_desc_fini(guc);
 }
 
 static bool __guc_submission_selected(struct intel_guc *guc)
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* [Intel-gfx] [PATCH 10/47] drm/i915/guc: Add lrc descriptor context lookup array
  2021-06-24  7:04 [Intel-gfx] [PATCH 00/47] GuC submission support Matthew Brost
                   ` (8 preceding siblings ...)
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 09/47] drm/i915/guc: Remove GuC stage descriptor, add lrc descriptor Matthew Brost
@ 2021-06-24  7:04 ` Matthew Brost
  2021-06-25 13:17   ` Michal Wajdeczko
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 11/47] drm/i915/guc: Implement GuC submission tasklet Matthew Brost
                   ` (41 subsequent siblings)
  51 siblings, 1 reply; 170+ messages in thread
From: Matthew Brost @ 2021-06-24  7:04 UTC (permalink / raw)
  To: intel-gfx, dri-devel

Add lrc descriptor context lookup array which can resolve the
intel_context from the lrc descriptor index. In addition to lookup, it
can determine in the lrc descriptor context is currently registered with
the GuC by checking if an entry for a descriptor index is present.
Future patches in the series will make use of this array.

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  5 +++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 32 +++++++++++++++++--
 2 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index b28fa54214f2..2313d9fc087b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -6,6 +6,8 @@
 #ifndef _INTEL_GUC_H_
 #define _INTEL_GUC_H_
 
+#include "linux/xarray.h"
+
 #include "intel_uncore.h"
 #include "intel_guc_fw.h"
 #include "intel_guc_fwif.h"
@@ -46,6 +48,9 @@ struct intel_guc {
 	struct i915_vma *lrc_desc_pool;
 	void *lrc_desc_pool_vaddr;
 
+	/* guc_id to intel_context lookup */
+	struct xarray context_lookup;
+
 	/* Control params for fw initialization */
 	u32 params[GUC_CTL_MAX_DWORDS];
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index a366890fb840..23a94a896a0b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -65,8 +65,6 @@ static inline struct i915_priolist *to_priolist(struct rb_node *rb)
 	return rb_entry(rb, struct i915_priolist, node);
 }
 
-/* Future patches will use this function */
-__attribute__ ((unused))
 static struct guc_lrc_desc *__get_lrc_desc(struct intel_guc *guc, u32 index)
 {
 	struct guc_lrc_desc *base = guc->lrc_desc_pool_vaddr;
@@ -76,6 +74,15 @@ static struct guc_lrc_desc *__get_lrc_desc(struct intel_guc *guc, u32 index)
 	return &base[index];
 }
 
+static inline struct intel_context *__get_context(struct intel_guc *guc, u32 id)
+{
+	struct intel_context *ce = xa_load(&guc->context_lookup, id);
+
+	GEM_BUG_ON(id >= GUC_MAX_LRC_DESCRIPTORS);
+
+	return ce;
+}
+
 static int guc_lrc_desc_pool_create(struct intel_guc *guc)
 {
 	u32 size;
@@ -96,6 +103,25 @@ static void guc_lrc_desc_pool_destroy(struct intel_guc *guc)
 	i915_vma_unpin_and_release(&guc->lrc_desc_pool, I915_VMA_RELEASE_MAP);
 }
 
+static inline void reset_lrc_desc(struct intel_guc *guc, u32 id)
+{
+	struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
+
+	memset(desc, 0, sizeof(*desc));
+	xa_erase_irq(&guc->context_lookup, id);
+}
+
+static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
+{
+	return __get_context(guc, id);
+}
+
+static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
+					   struct intel_context *ce)
+{
+	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
+}
+
 static void guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 {
 	/* Leaving stub as this function will be used in future patches */
@@ -400,6 +426,8 @@ int intel_guc_submission_init(struct intel_guc *guc)
 	 */
 	GEM_BUG_ON(!guc->lrc_desc_pool);
 
+	xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ);
+
 	return 0;
 }
 
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* [Intel-gfx] [PATCH 11/47] drm/i915/guc: Implement GuC submission tasklet
  2021-06-24  7:04 [Intel-gfx] [PATCH 00/47] GuC submission support Matthew Brost
                   ` (9 preceding siblings ...)
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 10/47] drm/i915/guc: Add lrc descriptor context lookup array Matthew Brost
@ 2021-06-24  7:04 ` Matthew Brost
  2021-06-29 22:04   ` John Harrison
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 12/47] drm/i915/guc: Add bypass tasklet submission path to GuC Matthew Brost
                   ` (40 subsequent siblings)
  51 siblings, 1 reply; 170+ messages in thread
From: Matthew Brost @ 2021-06-24  7:04 UTC (permalink / raw)
  To: intel-gfx, dri-devel

Implement GuC submission tasklet for new interface. The new GuC
interface uses H2G to submit contexts to the GuC. Since H2G use a single
channel, a single tasklet submits is used for the submission path.

Also the per engine interrupt handler has been updated to disable the
rescheduling of the physical engine tasklet, when using GuC scheduling,
as the physical engine tasklet is no longer used.

In this patch the field, guc_id, has been added to intel_context and is
not assigned. Patches later in the series will assign this value.

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context_types.h |   9 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   4 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 231 +++++++++---------
 3 files changed, 127 insertions(+), 117 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index ed8c447a7346..bb6fef7eae52 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -136,6 +136,15 @@ struct intel_context {
 	struct intel_sseu sseu;
 
 	u8 wa_bb_page; /* if set, page num reserved for context workarounds */
+
+	/* GuC scheduling state that does not require a lock. */
+	atomic_t guc_sched_state_no_lock;
+
+	/*
+	 * GuC lrc descriptor ID - Not assigned in this patch but future patches
+	 * in the series will.
+	 */
+	u16 guc_id;
 };
 
 #endif /* __INTEL_CONTEXT_TYPES__ */
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 2313d9fc087b..9ba8219475b2 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -30,6 +30,10 @@ struct intel_guc {
 	struct intel_guc_log log;
 	struct intel_guc_ct ct;
 
+	/* Global engine used to submit requests to GuC */
+	struct i915_sched_engine *sched_engine;
+	struct i915_request *stalled_request;
+
 	/* intel_guc_recv interrupt related state */
 	spinlock_t irq_lock;
 	unsigned int msg_enabled_mask;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 23a94a896a0b..ee933efbf0ff 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -60,6 +60,31 @@
 
 #define GUC_REQUEST_SIZE 64 /* bytes */
 
+/*
+ * Below is a set of functions which control the GuC scheduling state which do
+ * not require a lock as all state transitions are mutually exclusive. i.e. It
+ * is not possible for the context pinning code and submission, for the same
+ * context, to be executing simultaneously. We still need an atomic as it is
+ * possible for some of the bits to changing at the same time though.
+ */
+#define SCHED_STATE_NO_LOCK_ENABLED			BIT(0)
+static inline bool context_enabled(struct intel_context *ce)
+{
+	return (atomic_read(&ce->guc_sched_state_no_lock) &
+		SCHED_STATE_NO_LOCK_ENABLED);
+}
+
+static inline void set_context_enabled(struct intel_context *ce)
+{
+	atomic_or(SCHED_STATE_NO_LOCK_ENABLED, &ce->guc_sched_state_no_lock);
+}
+
+static inline void clr_context_enabled(struct intel_context *ce)
+{
+	atomic_and((u32)~SCHED_STATE_NO_LOCK_ENABLED,
+		   &ce->guc_sched_state_no_lock);
+}
+
 static inline struct i915_priolist *to_priolist(struct rb_node *rb)
 {
 	return rb_entry(rb, struct i915_priolist, node);
@@ -122,37 +147,29 @@ static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
 	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
 }
 
-static void guc_add_request(struct intel_guc *guc, struct i915_request *rq)
+static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 {
-	/* Leaving stub as this function will be used in future patches */
-}
+	int err;
+	struct intel_context *ce = rq->context;
+	u32 action[3];
+	int len = 0;
+	bool enabled = context_enabled(ce);
 
-/*
- * When we're doing submissions using regular execlists backend, writing to
- * ELSP from CPU side is enough to make sure that writes to ringbuffer pages
- * pinned in mappable aperture portion of GGTT are visible to command streamer.
- * Writes done by GuC on our behalf are not guaranteeing such ordering,
- * therefore, to ensure the flush, we're issuing a POSTING READ.
- */
-static void flush_ggtt_writes(struct i915_vma *vma)
-{
-	if (i915_vma_is_map_and_fenceable(vma))
-		intel_uncore_posting_read_fw(vma->vm->gt->uncore,
-					     GUC_STATUS);
-}
+	if (!enabled) {
+		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
+		action[len++] = ce->guc_id;
+		action[len++] = GUC_CONTEXT_ENABLE;
+	} else {
+		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT;
+		action[len++] = ce->guc_id;
+	}
 
-static void guc_submit(struct intel_engine_cs *engine,
-		       struct i915_request **out,
-		       struct i915_request **end)
-{
-	struct intel_guc *guc = &engine->gt->uc.guc;
+	err = intel_guc_send_nb(guc, action, len);
 
-	do {
-		struct i915_request *rq = *out++;
+	if (!enabled && !err)
+		set_context_enabled(ce);
 
-		flush_ggtt_writes(rq->ring->vma);
-		guc_add_request(guc, rq);
-	} while (out != end);
+	return err;
 }
 
 static inline int rq_prio(const struct i915_request *rq)
@@ -160,125 +177,88 @@ static inline int rq_prio(const struct i915_request *rq)
 	return rq->sched.attr.priority;
 }
 
-static struct i915_request *schedule_in(struct i915_request *rq, int idx)
+static int guc_dequeue_one_context(struct intel_guc *guc)
 {
-	trace_i915_request_in(rq, idx);
-
-	/*
-	 * Currently we are not tracking the rq->context being inflight
-	 * (ce->inflight = rq->engine). It is only used by the execlists
-	 * backend at the moment, a similar counting strategy would be
-	 * required if we generalise the inflight tracking.
-	 */
-
-	__intel_gt_pm_get(rq->engine->gt);
-	return i915_request_get(rq);
-}
-
-static void schedule_out(struct i915_request *rq)
-{
-	trace_i915_request_out(rq);
-
-	intel_gt_pm_put_async(rq->engine->gt);
-	i915_request_put(rq);
-}
-
-static void __guc_dequeue(struct intel_engine_cs *engine)
-{
-	struct intel_engine_execlists * const execlists = &engine->execlists;
-	struct i915_sched_engine * const sched_engine = engine->sched_engine;
-	struct i915_request **first = execlists->inflight;
-	struct i915_request ** const last_port = first + execlists->port_mask;
-	struct i915_request *last = first[0];
-	struct i915_request **port;
+	struct i915_sched_engine * const sched_engine = guc->sched_engine;
+	struct i915_request *last = NULL;
 	bool submit = false;
 	struct rb_node *rb;
+	int ret;
 
 	lockdep_assert_held(&sched_engine->lock);
 
-	if (last) {
-		if (*++first)
-			return;
-
-		last = NULL;
+	if (guc->stalled_request) {
+		submit = true;
+		last = guc->stalled_request;
+		goto resubmit;
 	}
 
-	/*
-	 * We write directly into the execlists->inflight queue and don't use
-	 * the execlists->pending queue, as we don't have a distinct switch
-	 * event.
-	 */
-	port = first;
 	while ((rb = rb_first_cached(&sched_engine->queue))) {
 		struct i915_priolist *p = to_priolist(rb);
 		struct i915_request *rq, *rn;
 
 		priolist_for_each_request_consume(rq, rn, p) {
-			if (last && rq->context != last->context) {
-				if (port == last_port)
-					goto done;
-
-				*port = schedule_in(last,
-						    port - execlists->inflight);
-				port++;
-			}
+			if (last && rq->context != last->context)
+				goto done;
 
 			list_del_init(&rq->sched.link);
+
 			__i915_request_submit(rq);
-			submit = true;
+
+			trace_i915_request_in(rq, 0);
 			last = rq;
+			submit = true;
 		}
 
 		rb_erase_cached(&p->node, &sched_engine->queue);
 		i915_priolist_free(p);
 	}
 done:
-	sched_engine->queue_priority_hint =
-		rb ? to_priolist(rb)->priority : INT_MIN;
 	if (submit) {
-		*port = schedule_in(last, port - execlists->inflight);
-		*++port = NULL;
-		guc_submit(engine, first, port);
+		last->context->lrc_reg_state[CTX_RING_TAIL] =
+			intel_ring_set_tail(last->ring, last->tail);
+resubmit:
+		/*
+		 * We only check for -EBUSY here even though it is possible for
+		 * -EDEADLK to be returned. If -EDEADLK is returned, the GuC has
+		 * died and a full GPU needs to be done. The hangcheck will
+		 * eventually detect that the GuC has died and trigger this
+		 * reset so no need to handle -EDEADLK here.
+		 */
+		ret = guc_add_request(guc, last);
+		if (ret == -EBUSY) {
+			tasklet_schedule(&sched_engine->tasklet);
+			guc->stalled_request = last;
+			return false;
+		}
 	}
-	execlists->active = execlists->inflight;
+
+	guc->stalled_request = NULL;
+	return submit;
 }
 
 static void guc_submission_tasklet(struct tasklet_struct *t)
 {
 	struct i915_sched_engine *sched_engine =
 		from_tasklet(sched_engine, t, tasklet);
-	struct intel_engine_cs * const engine = sched_engine->private_data;
-	struct intel_engine_execlists * const execlists = &engine->execlists;
-	struct i915_request **port, *rq;
 	unsigned long flags;
+	bool loop;
 
-	spin_lock_irqsave(&engine->sched_engine->lock, flags);
-
-	for (port = execlists->inflight; (rq = *port); port++) {
-		if (!i915_request_completed(rq))
-			break;
-
-		schedule_out(rq);
-	}
-	if (port != execlists->inflight) {
-		int idx = port - execlists->inflight;
-		int rem = ARRAY_SIZE(execlists->inflight) - idx;
-		memmove(execlists->inflight, port, rem * sizeof(*port));
-	}
+	spin_lock_irqsave(&sched_engine->lock, flags);
 
-	__guc_dequeue(engine);
+	do {
+		loop = guc_dequeue_one_context(sched_engine->private_data);
+	} while (loop);
 
-	i915_sched_engine_reset_on_empty(engine->sched_engine);
+	i915_sched_engine_reset_on_empty(sched_engine);
 
-	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
+	spin_unlock_irqrestore(&sched_engine->lock, flags);
 }
 
 static void cs_irq_handler(struct intel_engine_cs *engine, u16 iir)
 {
-	if (iir & GT_RENDER_USER_INTERRUPT) {
+	if (iir & GT_RENDER_USER_INTERRUPT)
 		intel_engine_signal_breadcrumbs(engine);
-		tasklet_hi_schedule(&engine->sched_engine->tasklet);
-	}
 }
 
 static void guc_reset_prepare(struct intel_engine_cs *engine)
@@ -349,6 +329,10 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
 	struct rb_node *rb;
 	unsigned long flags;
 
+	/* Can be called during boot if GuC fails to load */
+	if (!engine->gt)
+		return;
+
 	ENGINE_TRACE(engine, "\n");
 
 	/*
@@ -433,8 +417,11 @@ int intel_guc_submission_init(struct intel_guc *guc)
 
 void intel_guc_submission_fini(struct intel_guc *guc)
 {
-	if (guc->lrc_desc_pool)
-		guc_lrc_desc_pool_destroy(guc);
+	if (!guc->lrc_desc_pool)
+		return;
+
+	guc_lrc_desc_pool_destroy(guc);
+	i915_sched_engine_put(guc->sched_engine);
 }
 
 static int guc_context_alloc(struct intel_context *ce)
@@ -499,32 +486,32 @@ static int guc_request_alloc(struct i915_request *request)
 	return 0;
 }
 
-static inline void queue_request(struct intel_engine_cs *engine,
+static inline void queue_request(struct i915_sched_engine *sched_engine,
 				 struct i915_request *rq,
 				 int prio)
 {
 	GEM_BUG_ON(!list_empty(&rq->sched.link));
 	list_add_tail(&rq->sched.link,
-		      i915_sched_lookup_priolist(engine->sched_engine, prio));
+		      i915_sched_lookup_priolist(sched_engine, prio));
 	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
 }
 
 static void guc_submit_request(struct i915_request *rq)
 {
-	struct intel_engine_cs *engine = rq->engine;
+	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
 	unsigned long flags;
 
 	/* Will be called from irq-context when using foreign fences. */
-	spin_lock_irqsave(&engine->sched_engine->lock, flags);
+	spin_lock_irqsave(&sched_engine->lock, flags);
 
-	queue_request(engine, rq, rq_prio(rq));
+	queue_request(sched_engine, rq, rq_prio(rq));
 
-	GEM_BUG_ON(i915_sched_engine_is_empty(engine->sched_engine));
+	GEM_BUG_ON(i915_sched_engine_is_empty(sched_engine));
 	GEM_BUG_ON(list_empty(&rq->sched.link));
 
-	tasklet_hi_schedule(&engine->sched_engine->tasklet);
+	tasklet_hi_schedule(&sched_engine->tasklet);
 
-	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
+	spin_unlock_irqrestore(&sched_engine->lock, flags);
 }
 
 static void sanitize_hwsp(struct intel_engine_cs *engine)
@@ -602,8 +589,6 @@ static void guc_release(struct intel_engine_cs *engine)
 {
 	engine->sanitize = NULL; /* no longer in control, nothing to sanitize */
 
-	tasklet_kill(&engine->sched_engine->tasklet);
-
 	intel_engine_cleanup_common(engine);
 	lrc_fini_wa_ctx(engine);
 }
@@ -674,6 +659,7 @@ static inline void guc_default_irqs(struct intel_engine_cs *engine)
 int intel_guc_submission_setup(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *i915 = engine->i915;
+	struct intel_guc *guc = &engine->gt->uc.guc;
 
 	/*
 	 * The setup relies on several assumptions (e.g. irqs always enabled)
@@ -681,7 +667,18 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
 	 */
 	GEM_BUG_ON(GRAPHICS_VER(i915) < 11);
 
-	tasklet_setup(&engine->sched_engine->tasklet, guc_submission_tasklet);
+	if (!guc->sched_engine) {
+		guc->sched_engine = i915_sched_engine_create(ENGINE_VIRTUAL);
+		if (!guc->sched_engine)
+			return -ENOMEM;
+
+		guc->sched_engine->schedule = i915_schedule;
+		guc->sched_engine->private_data = guc;
+		tasklet_setup(&guc->sched_engine->tasklet,
+			      guc_submission_tasklet);
+	}
+	i915_sched_engine_put(engine->sched_engine);
+	engine->sched_engine = i915_sched_engine_get(guc->sched_engine);
 
 	guc_default_vfuncs(engine);
 	guc_default_irqs(engine);
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* [Intel-gfx] [PATCH 12/47] drm/i915/guc: Add bypass tasklet submission path to GuC
  2021-06-24  7:04 [Intel-gfx] [PATCH 00/47] GuC submission support Matthew Brost
                   ` (10 preceding siblings ...)
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 11/47] drm/i915/guc: Implement GuC submission tasklet Matthew Brost
@ 2021-06-24  7:04 ` Matthew Brost
  2021-06-29 22:09   ` John Harrison
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 13/47] drm/i915/guc: Implement GuC context operations for new inteface Matthew Brost
                   ` (39 subsequent siblings)
  51 siblings, 1 reply; 170+ messages in thread
From: Matthew Brost @ 2021-06-24  7:04 UTC (permalink / raw)
  To: intel-gfx, dri-devel

Add bypass tasklet submission path to GuC. The tasklet is only used if H2G
channel has backpresure.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 37 +++++++++++++++----
 1 file changed, 29 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index ee933efbf0ff..38aff83ee9fa 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -172,6 +172,12 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 	return err;
 }
 
+static inline void guc_set_lrc_tail(struct i915_request *rq)
+{
+	rq->context->lrc_reg_state[CTX_RING_TAIL] =
+		intel_ring_set_tail(rq->ring, rq->tail);
+}
+
 static inline int rq_prio(const struct i915_request *rq)
 {
 	return rq->sched.attr.priority;
@@ -215,8 +221,7 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
 	}
 done:
 	if (submit) {
-		last->context->lrc_reg_state[CTX_RING_TAIL] =
-			intel_ring_set_tail(last->ring, last->tail);
+		guc_set_lrc_tail(last);
 resubmit:
 		/*
 		 * We only check for -EBUSY here even though it is possible for
@@ -496,20 +501,36 @@ static inline void queue_request(struct i915_sched_engine *sched_engine,
 	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
 }
 
+static int guc_bypass_tasklet_submit(struct intel_guc *guc,
+				     struct i915_request *rq)
+{
+	int ret;
+
+	__i915_request_submit(rq);
+
+	trace_i915_request_in(rq, 0);
+
+	guc_set_lrc_tail(rq);
+	ret = guc_add_request(guc, rq);
+	if (ret == -EBUSY)
+		guc->stalled_request = rq;
+
+	return ret;
+}
+
 static void guc_submit_request(struct i915_request *rq)
 {
 	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
+	struct intel_guc *guc = &rq->engine->gt->uc.guc;
 	unsigned long flags;
 
 	/* Will be called from irq-context when using foreign fences. */
 	spin_lock_irqsave(&sched_engine->lock, flags);
 
-	queue_request(sched_engine, rq, rq_prio(rq));
-
-	GEM_BUG_ON(i915_sched_engine_is_empty(sched_engine));
-	GEM_BUG_ON(list_empty(&rq->sched.link));
-
-	tasklet_hi_schedule(&sched_engine->tasklet);
+	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
+		queue_request(sched_engine, rq, rq_prio(rq));
+	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
+		tasklet_hi_schedule(&sched_engine->tasklet);
 
 	spin_unlock_irqrestore(&sched_engine->lock, flags);
 }
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* [Intel-gfx] [PATCH 13/47] drm/i915/guc: Implement GuC context operations for new inteface
  2021-06-24  7:04 [Intel-gfx] [PATCH 00/47] GuC submission support Matthew Brost
                   ` (11 preceding siblings ...)
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 12/47] drm/i915/guc: Add bypass tasklet submission path to GuC Matthew Brost
@ 2021-06-24  7:04 ` Matthew Brost
  2021-06-25 13:25   ` Michal Wajdeczko
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 14/47] drm/i915/guc: Insert fence on context when deregistering Matthew Brost
                   ` (38 subsequent siblings)
  51 siblings, 1 reply; 170+ messages in thread
From: Matthew Brost @ 2021-06-24  7:04 UTC (permalink / raw)
  To: intel-gfx, dri-devel

Implement GuC context operations which includes GuC specific operations
alloc, pin, unpin, and destroy.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context.c       |   5 +
 drivers/gpu/drm/i915/gt/intel_context_types.h |  22 +-
 drivers/gpu/drm/i915/gt/intel_lrc_reg.h       |   1 -
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  34 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |   4 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 664 ++++++++++++++++--
 drivers/gpu/drm/i915/i915_reg.h               |   1 +
 drivers/gpu/drm/i915/i915_request.c           |   1 +
 8 files changed, 677 insertions(+), 55 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index 4033184f13b9..2b68af16222c 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -383,6 +383,11 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
 
 	mutex_init(&ce->pin_mutex);
 
+	spin_lock_init(&ce->guc_state.lock);
+
+	ce->guc_id = GUC_INVALID_LRC_ID;
+	INIT_LIST_HEAD(&ce->guc_id_link);
+
 	i915_active_init(&ce->active,
 			 __intel_context_active, __intel_context_retire, 0);
 }
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index bb6fef7eae52..ce7c69b34cd1 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -95,6 +95,7 @@ struct intel_context {
 #define CONTEXT_BANNED			6
 #define CONTEXT_FORCE_SINGLE_SUBMISSION	7
 #define CONTEXT_NOPREEMPT		8
+#define CONTEXT_LRCA_DIRTY		9
 
 	struct {
 		u64 timeout_us;
@@ -137,14 +138,29 @@ struct intel_context {
 
 	u8 wa_bb_page; /* if set, page num reserved for context workarounds */
 
+	struct {
+		/** lock: protects everything in guc_state */
+		spinlock_t lock;
+		/**
+		 * sched_state: scheduling state of this context using GuC
+		 * submission
+		 */
+		u8 sched_state;
+	} guc_state;
+
 	/* GuC scheduling state that does not require a lock. */
 	atomic_t guc_sched_state_no_lock;
 
+	/* GuC lrc descriptor ID */
+	u16 guc_id;
+
+	/* GuC lrc descriptor reference count */
+	atomic_t guc_id_ref;
+
 	/*
-	 * GuC lrc descriptor ID - Not assigned in this patch but future patches
-	 * in the series will.
+	 * GuC ID link - in list when unpinned but guc_id still valid in GuC
 	 */
-	u16 guc_id;
+	struct list_head guc_id_link;
 };
 
 #endif /* __INTEL_CONTEXT_TYPES__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
index 41e5350a7a05..49d4857ad9b7 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
+++ b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
@@ -87,7 +87,6 @@
 #define GEN11_CSB_WRITE_PTR_MASK	(GEN11_CSB_PTR_MASK << 0)
 
 #define MAX_CONTEXT_HW_ID	(1 << 21) /* exclusive */
-#define MAX_GUC_CONTEXT_HW_ID	(1 << 20) /* exclusive */
 #define GEN11_MAX_CONTEXT_HW_ID	(1 << 11) /* exclusive */
 /* in Gen12 ID 0x7FF is reserved to indicate idle */
 #define GEN12_MAX_CONTEXT_HW_ID	(GEN11_MAX_CONTEXT_HW_ID - 1)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 9ba8219475b2..d44316dc914b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -44,6 +44,14 @@ struct intel_guc {
 		void (*disable)(struct intel_guc *guc);
 	} interrupts;
 
+	/*
+	 * contexts_lock protects the pool of free guc ids and a linked list of
+	 * guc ids available to be stolen
+	 */
+	spinlock_t contexts_lock;
+	struct ida guc_ids;
+	struct list_head guc_id_list;
+
 	bool submission_selected;
 
 	struct i915_vma *ads_vma;
@@ -102,6 +110,29 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len,
 				 response_buf, response_buf_size, 0);
 }
 
+static inline int intel_guc_send_busy_loop(struct intel_guc* guc,
+					   const u32 *action,
+					   u32 len,
+					   bool loop)
+{
+	int err;
+
+	/* No sleeping with spin locks, just busy loop */
+	might_sleep_if(loop && (!in_atomic() && !irqs_disabled()));
+
+retry:
+	err = intel_guc_send_nb(guc, action, len);
+	if (unlikely(err == -EBUSY && loop)) {
+		if (likely(!in_atomic() && !irqs_disabled()))
+			cond_resched();
+		else
+			cpu_relax();
+		goto retry;
+	}
+
+	return err;
+}
+
 static inline void intel_guc_to_host_event_handler(struct intel_guc *guc)
 {
 	intel_guc_ct_event_handler(&guc->ct);
@@ -203,6 +234,9 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask)
 int intel_guc_reset_engine(struct intel_guc *guc,
 			   struct intel_engine_cs *engine);
 
+int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
+					  const u32 *msg, u32 len);
+
 void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p);
 
 #endif
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 8e0ed7d8feb3..42a7daef2ff6 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -901,6 +901,10 @@ static int ct_process_request(struct intel_guc_ct *ct, struct ct_incoming_msg *r
 	case INTEL_GUC_ACTION_DEFAULT:
 		ret = intel_guc_to_host_process_recv_msg(guc, payload, len);
 		break;
+	case INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE:
+		ret = intel_guc_deregister_done_process_msg(guc, payload,
+							    len);
+		break;
 	default:
 		ret = -EOPNOTSUPP;
 		break;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 38aff83ee9fa..d39579ac2faa 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -13,7 +13,9 @@
 #include "gt/intel_gt.h"
 #include "gt/intel_gt_irq.h"
 #include "gt/intel_gt_pm.h"
+#include "gt/intel_gt_requests.h"
 #include "gt/intel_lrc.h"
+#include "gt/intel_lrc_reg.h"
 #include "gt/intel_mocs.h"
 #include "gt/intel_ring.h"
 
@@ -85,6 +87,73 @@ static inline void clr_context_enabled(struct intel_context *ce)
 		   &ce->guc_sched_state_no_lock);
 }
 
+/*
+ * Below is a set of functions which control the GuC scheduling state which
+ * require a lock, aside from the special case where the functions are called
+ * from guc_lrc_desc_pin(). In that case it isn't possible for any other code
+ * path to be executing on the context.
+ */
+#define SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER	BIT(0)
+#define SCHED_STATE_DESTROYED				BIT(1)
+static inline void init_sched_state(struct intel_context *ce)
+{
+	/* Only should be called from guc_lrc_desc_pin() */
+	atomic_set(&ce->guc_sched_state_no_lock, 0);
+	ce->guc_state.sched_state = 0;
+}
+
+static inline bool
+context_wait_for_deregister_to_register(struct intel_context *ce)
+{
+	return (ce->guc_state.sched_state &
+		SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER);
+}
+
+static inline void
+set_context_wait_for_deregister_to_register(struct intel_context *ce)
+{
+	/* Only should be called from guc_lrc_desc_pin() */
+	ce->guc_state.sched_state |=
+		SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER;
+}
+
+static inline void
+clr_context_wait_for_deregister_to_register(struct intel_context *ce)
+{
+	lockdep_assert_held(&ce->guc_state.lock);
+	ce->guc_state.sched_state =
+		(ce->guc_state.sched_state &
+		 ~SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER);
+}
+
+static inline bool
+context_destroyed(struct intel_context *ce)
+{
+	return (ce->guc_state.sched_state & SCHED_STATE_DESTROYED);
+}
+
+static inline void
+set_context_destroyed(struct intel_context *ce)
+{
+	lockdep_assert_held(&ce->guc_state.lock);
+	ce->guc_state.sched_state |= SCHED_STATE_DESTROYED;
+}
+
+static inline bool context_guc_id_invalid(struct intel_context *ce)
+{
+	return (ce->guc_id == GUC_INVALID_LRC_ID);
+}
+
+static inline void set_context_guc_id_invalid(struct intel_context *ce)
+{
+	ce->guc_id = GUC_INVALID_LRC_ID;
+}
+
+static inline struct intel_guc *ce_to_guc(struct intel_context *ce)
+{
+	return &ce->engine->gt->uc.guc;
+}
+
 static inline struct i915_priolist *to_priolist(struct rb_node *rb)
 {
 	return rb_entry(rb, struct i915_priolist, node);
@@ -155,6 +224,9 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 	int len = 0;
 	bool enabled = context_enabled(ce);
 
+	GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
+	GEM_BUG_ON(context_guc_id_invalid(ce));
+
 	if (!enabled) {
 		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
 		action[len++] = ce->guc_id;
@@ -417,6 +489,10 @@ int intel_guc_submission_init(struct intel_guc *guc)
 
 	xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ);
 
+	spin_lock_init(&guc->contexts_lock);
+	INIT_LIST_HEAD(&guc->guc_id_list);
+	ida_init(&guc->guc_ids);
+
 	return 0;
 }
 
@@ -429,9 +505,303 @@ void intel_guc_submission_fini(struct intel_guc *guc)
 	i915_sched_engine_put(guc->sched_engine);
 }
 
-static int guc_context_alloc(struct intel_context *ce)
+static inline void queue_request(struct i915_sched_engine *sched_engine,
+				 struct i915_request *rq,
+				 int prio)
 {
-	return lrc_alloc(ce, ce->engine);
+	GEM_BUG_ON(!list_empty(&rq->sched.link));
+	list_add_tail(&rq->sched.link,
+		      i915_sched_lookup_priolist(sched_engine, prio));
+	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
+}
+
+static int guc_bypass_tasklet_submit(struct intel_guc *guc,
+				     struct i915_request *rq)
+{
+	int ret;
+
+	__i915_request_submit(rq);
+
+	trace_i915_request_in(rq, 0);
+
+	guc_set_lrc_tail(rq);
+	ret = guc_add_request(guc, rq);
+	if (ret == -EBUSY)
+		guc->stalled_request = rq;
+
+	return ret;
+}
+
+static void guc_submit_request(struct i915_request *rq)
+{
+	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
+	struct intel_guc *guc = &rq->engine->gt->uc.guc;
+	unsigned long flags;
+
+	/* Will be called from irq-context when using foreign fences. */
+	spin_lock_irqsave(&sched_engine->lock, flags);
+
+	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
+		queue_request(sched_engine, rq, rq_prio(rq));
+	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
+		tasklet_hi_schedule(&sched_engine->tasklet);
+
+	spin_unlock_irqrestore(&sched_engine->lock, flags);
+}
+
+#define GUC_ID_START	64	/* First 64 guc_ids reserved */
+static int new_guc_id(struct intel_guc *guc)
+{
+	return ida_simple_get(&guc->guc_ids, GUC_ID_START,
+			      GUC_MAX_LRC_DESCRIPTORS, GFP_KERNEL |
+			      __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
+}
+
+static void __release_guc_id(struct intel_guc *guc, struct intel_context *ce)
+{
+	if (!context_guc_id_invalid(ce)) {
+		ida_simple_remove(&guc->guc_ids, ce->guc_id);
+		reset_lrc_desc(guc, ce->guc_id);
+		set_context_guc_id_invalid(ce);
+	}
+	if (!list_empty(&ce->guc_id_link))
+		list_del_init(&ce->guc_id_link);
+}
+
+static void release_guc_id(struct intel_guc *guc, struct intel_context *ce)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&guc->contexts_lock, flags);
+	__release_guc_id(guc, ce);
+	spin_unlock_irqrestore(&guc->contexts_lock, flags);
+}
+
+static int steal_guc_id(struct intel_guc *guc)
+{
+	struct intel_context *ce;
+	int guc_id;
+
+	if (!list_empty(&guc->guc_id_list)) {
+		ce = list_first_entry(&guc->guc_id_list,
+				      struct intel_context,
+				      guc_id_link);
+
+		GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
+		GEM_BUG_ON(context_guc_id_invalid(ce));
+
+		list_del_init(&ce->guc_id_link);
+		guc_id = ce->guc_id;
+		set_context_guc_id_invalid(ce);
+		return guc_id;
+	} else {
+		return -EAGAIN;
+	}
+}
+
+static int assign_guc_id(struct intel_guc *guc, u16 *out)
+{
+	int ret;
+
+	ret = new_guc_id(guc);
+	if (unlikely(ret < 0)) {
+		ret = steal_guc_id(guc);
+		if (ret < 0)
+			return ret;
+	}
+
+	*out = ret;
+	return 0;
+}
+
+#define PIN_GUC_ID_TRIES	4
+static int pin_guc_id(struct intel_guc *guc, struct intel_context *ce)
+{
+	int ret = 0;
+	unsigned long flags, tries = PIN_GUC_ID_TRIES;
+
+	GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
+
+try_again:
+	spin_lock_irqsave(&guc->contexts_lock, flags);
+
+	if (context_guc_id_invalid(ce)) {
+		ret = assign_guc_id(guc, &ce->guc_id);
+		if (ret)
+			goto out_unlock;
+		ret = 1;	// Indidcates newly assigned HW context
+	}
+	if (!list_empty(&ce->guc_id_link))
+		list_del_init(&ce->guc_id_link);
+	atomic_inc(&ce->guc_id_ref);
+
+out_unlock:
+	spin_unlock_irqrestore(&guc->contexts_lock, flags);
+
+	/*
+	 * -EAGAIN indicates no guc_ids are available, let's retire any
+	 * outstanding requests to see if that frees up a guc_id. If the first
+	 * retire didn't help, insert a sleep with the timeslice duration before
+	 * attempting to retire more requests. Double the sleep period each
+	 * subsequent pass before finally giving up. The sleep period has max of
+	 * 100ms and minimum of 1ms.
+	 */
+	if (ret == -EAGAIN && --tries) {
+		if (PIN_GUC_ID_TRIES - tries > 1) {
+			unsigned int timeslice_shifted =
+				ce->engine->props.timeslice_duration_ms <<
+				(PIN_GUC_ID_TRIES - tries - 2);
+			unsigned int max = min_t(unsigned int, 100,
+						 timeslice_shifted);
+
+			msleep(max_t(unsigned int, max, 1));
+		}
+		intel_gt_retire_requests(guc_to_gt(guc));
+		goto try_again;
+	}
+
+	return ret;
+}
+
+static void unpin_guc_id(struct intel_guc *guc, struct intel_context *ce)
+{
+	unsigned long flags;
+
+	GEM_BUG_ON(atomic_read(&ce->guc_id_ref) < 0);
+
+	spin_lock_irqsave(&guc->contexts_lock, flags);
+	if (!context_guc_id_invalid(ce) && list_empty(&ce->guc_id_link) &&
+	    !atomic_read(&ce->guc_id_ref))
+		list_add_tail(&ce->guc_id_link, &guc->guc_id_list);
+	spin_unlock_irqrestore(&guc->contexts_lock, flags);
+}
+
+static int __guc_action_register_context(struct intel_guc *guc,
+					 u32 guc_id,
+					 u32 offset)
+{
+	u32 action[] = {
+		INTEL_GUC_ACTION_REGISTER_CONTEXT,
+		guc_id,
+		offset,
+	};
+
+	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
+}
+
+static int register_context(struct intel_context *ce)
+{
+	struct intel_guc *guc = ce_to_guc(ce);
+	u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
+		ce->guc_id * sizeof(struct guc_lrc_desc);
+
+	return __guc_action_register_context(guc, ce->guc_id, offset);
+}
+
+static int __guc_action_deregister_context(struct intel_guc *guc,
+					   u32 guc_id)
+{
+	u32 action[] = {
+		INTEL_GUC_ACTION_DEREGISTER_CONTEXT,
+		guc_id,
+	};
+
+	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
+}
+
+static int deregister_context(struct intel_context *ce, u32 guc_id)
+{
+	struct intel_guc *guc = ce_to_guc(ce);
+
+	return __guc_action_deregister_context(guc, guc_id);
+}
+
+static intel_engine_mask_t adjust_engine_mask(u8 class, intel_engine_mask_t mask)
+{
+	switch (class) {
+	case RENDER_CLASS:
+		return mask >> RCS0;
+	case VIDEO_ENHANCEMENT_CLASS:
+		return mask >> VECS0;
+	case VIDEO_DECODE_CLASS:
+		return mask >> VCS0;
+	case COPY_ENGINE_CLASS:
+		return mask >> BCS0;
+	default:
+		GEM_BUG_ON("Invalid Class");
+		return 0;
+	}
+}
+
+static void guc_context_policy_init(struct intel_engine_cs *engine,
+				    struct guc_lrc_desc *desc)
+{
+	desc->policy_flags = 0;
+
+	desc->execution_quantum = CONTEXT_POLICY_DEFAULT_EXECUTION_QUANTUM_US;
+	desc->preemption_timeout = CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US;
+}
+
+static int guc_lrc_desc_pin(struct intel_context *ce)
+{
+	struct intel_runtime_pm *runtime_pm =
+		&ce->engine->gt->i915->runtime_pm;
+	struct intel_engine_cs *engine = ce->engine;
+	struct intel_guc *guc = &engine->gt->uc.guc;
+	u32 desc_idx = ce->guc_id;
+	struct guc_lrc_desc *desc;
+	bool context_registered;
+	intel_wakeref_t wakeref;
+	int ret = 0;
+
+	GEM_BUG_ON(!engine->mask);
+
+	/*
+	 * Ensure LRC + CT vmas are is same region as write barrier is done
+	 * based on CT vma region.
+	 */
+	GEM_BUG_ON(i915_gem_object_is_lmem(guc->ct.vma->obj) !=
+		   i915_gem_object_is_lmem(ce->ring->vma->obj));
+
+	context_registered = lrc_desc_registered(guc, desc_idx);
+
+	reset_lrc_desc(guc, desc_idx);
+	set_lrc_desc_registered(guc, desc_idx, ce);
+
+	desc = __get_lrc_desc(guc, desc_idx);
+	desc->engine_class = engine_class_to_guc_class(engine->class);
+	desc->engine_submit_mask = adjust_engine_mask(engine->class,
+						      engine->mask);
+	desc->hw_context_desc = ce->lrc.lrca;
+	desc->priority = GUC_CLIENT_PRIORITY_KMD_NORMAL;
+	desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
+	guc_context_policy_init(engine, desc);
+	init_sched_state(ce);
+
+	/*
+	 * The context_lookup xarray is used to determine if the hardware
+	 * context is currently registered. There are two cases in which it
+	 * could be regisgered either the guc_id has been stole from from
+	 * another context or the lrc descriptor address of this context has
+	 * changed. In either case the context needs to be deregistered with the
+	 * GuC before registering this context.
+	 */
+	if (context_registered) {
+		set_context_wait_for_deregister_to_register(ce);
+		intel_context_get(ce);
+
+		/*
+		 * If stealing the guc_id, this ce has the same guc_id as the
+		 * context whos guc_id was stole.
+		 */
+		with_intel_runtime_pm(runtime_pm, wakeref)
+			ret = deregister_context(ce, ce->guc_id);
+	} else {
+		with_intel_runtime_pm(runtime_pm, wakeref)
+			ret = register_context(ce);
+	}
+
+	return ret;
 }
 
 static int guc_context_pre_pin(struct intel_context *ce,
@@ -443,36 +813,137 @@ static int guc_context_pre_pin(struct intel_context *ce,
 
 static int guc_context_pin(struct intel_context *ce, void *vaddr)
 {
+	if (i915_ggtt_offset(ce->state) !=
+	    (ce->lrc.lrca & CTX_GTT_ADDRESS_MASK))
+		set_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
+
 	return lrc_pin(ce, ce->engine, vaddr);
 }
 
+static void guc_context_unpin(struct intel_context *ce)
+{
+	unpin_guc_id(ce_to_guc(ce), ce);
+	lrc_unpin(ce);
+}
+
+static void guc_context_post_unpin(struct intel_context *ce)
+{
+	lrc_post_unpin(ce);
+}
+
+static inline void guc_lrc_desc_unpin(struct intel_context *ce)
+{
+	struct intel_engine_cs *engine = ce->engine;
+	struct intel_guc *guc = &engine->gt->uc.guc;
+	unsigned long flags;
+
+	GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id));
+	GEM_BUG_ON(ce != __get_context(guc, ce->guc_id));
+
+	spin_lock_irqsave(&ce->guc_state.lock, flags);
+	set_context_destroyed(ce);
+	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+
+	deregister_context(ce, ce->guc_id);
+}
+
+static void guc_context_destroy(struct kref *kref)
+{
+	struct intel_context *ce = container_of(kref, typeof(*ce), ref);
+	struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm;
+	struct intel_guc *guc = &ce->engine->gt->uc.guc;
+	intel_wakeref_t wakeref;
+	unsigned long flags;
+
+	/*
+	 * If the guc_id is invalid this context has been stolen and we can free
+	 * it immediately. Also can be freed immediately if the context is not
+	 * registered with the GuC.
+	 */
+	if (context_guc_id_invalid(ce) ||
+	    !lrc_desc_registered(guc, ce->guc_id)) {
+		release_guc_id(guc, ce);
+		lrc_destroy(kref);
+		return;
+	}
+
+	/*
+	 * We have to acquire the context spinlock and check guc_id again, if it
+	 * is valid it hasn't been stolen and needs to be deregistered. We
+	 * delete this context from the list of unpinned guc_ids available to
+	 * stole to seal a race with guc_lrc_desc_pin(). When the G2H CTB
+	 * returns indicating this context has been deregistered the guc_id is
+	 * returned to the pool of available guc_ids.
+	 */
+	spin_lock_irqsave(&guc->contexts_lock, flags);
+	if (context_guc_id_invalid(ce)) {
+		__release_guc_id(guc, ce);
+		spin_unlock_irqrestore(&guc->contexts_lock, flags);
+		lrc_destroy(kref);
+		return;
+	}
+
+	if (!list_empty(&ce->guc_id_link))
+		list_del_init(&ce->guc_id_link);
+	spin_unlock_irqrestore(&guc->contexts_lock, flags);
+
+	/*
+	 * We defer GuC context deregistration until the context is destroyed
+	 * in order to save on CTBs. With this optimization ideally we only need
+	 * 1 CTB to register the context during the first pin and 1 CTB to
+	 * deregister the context when the context is destroyed. Without this
+	 * optimization, a CTB would be needed every pin & unpin.
+	 *
+	 * XXX: Need to acqiure the runtime wakeref as this can be triggered
+	 * from context_free_worker when not runtime wakeref is held.
+	 * guc_lrc_desc_unpin requires the runtime as a GuC register is written
+	 * in H2G CTB to deregister the context. A future patch may defer this
+	 * H2G CTB if the runtime wakeref is zero.
+	 */
+	with_intel_runtime_pm(runtime_pm, wakeref)
+		guc_lrc_desc_unpin(ce);
+}
+
+static int guc_context_alloc(struct intel_context *ce)
+{
+	return lrc_alloc(ce, ce->engine);
+}
+
 static const struct intel_context_ops guc_context_ops = {
 	.alloc = guc_context_alloc,
 
 	.pre_pin = guc_context_pre_pin,
 	.pin = guc_context_pin,
-	.unpin = lrc_unpin,
-	.post_unpin = lrc_post_unpin,
+	.unpin = guc_context_unpin,
+	.post_unpin = guc_context_post_unpin,
 
 	.enter = intel_context_enter_engine,
 	.exit = intel_context_exit_engine,
 
 	.reset = lrc_reset,
-	.destroy = lrc_destroy,
+	.destroy = guc_context_destroy,
 };
 
-static int guc_request_alloc(struct i915_request *request)
+static bool context_needs_register(struct intel_context *ce, bool new_guc_id)
 {
+	return new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
+		!lrc_desc_registered(ce_to_guc(ce), ce->guc_id);
+}
+
+static int guc_request_alloc(struct i915_request *rq)
+{
+	struct intel_context *ce = rq->context;
+	struct intel_guc *guc = ce_to_guc(ce);
 	int ret;
 
-	GEM_BUG_ON(!intel_context_is_pinned(request->context));
+	GEM_BUG_ON(!intel_context_is_pinned(rq->context));
 
 	/*
 	 * Flush enough space to reduce the likelihood of waiting after
 	 * we start building the request - in which case we will just
 	 * have to repeat work.
 	 */
-	request->reserved_space += GUC_REQUEST_SIZE;
+	rq->reserved_space += GUC_REQUEST_SIZE;
 
 	/*
 	 * Note that after this point, we have committed to using
@@ -483,56 +954,47 @@ static int guc_request_alloc(struct i915_request *request)
 	 */
 
 	/* Unconditionally invalidate GPU caches and TLBs. */
-	ret = request->engine->emit_flush(request, EMIT_INVALIDATE);
+	ret = rq->engine->emit_flush(rq, EMIT_INVALIDATE);
 	if (ret)
 		return ret;
 
-	request->reserved_space -= GUC_REQUEST_SIZE;
-	return 0;
-}
-
-static inline void queue_request(struct i915_sched_engine *sched_engine,
-				 struct i915_request *rq,
-				 int prio)
-{
-	GEM_BUG_ON(!list_empty(&rq->sched.link));
-	list_add_tail(&rq->sched.link,
-		      i915_sched_lookup_priolist(sched_engine, prio));
-	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
-}
-
-static int guc_bypass_tasklet_submit(struct intel_guc *guc,
-				     struct i915_request *rq)
-{
-	int ret;
-
-	__i915_request_submit(rq);
+	rq->reserved_space -= GUC_REQUEST_SIZE;
 
-	trace_i915_request_in(rq, 0);
-
-	guc_set_lrc_tail(rq);
-	ret = guc_add_request(guc, rq);
-	if (ret == -EBUSY)
-		guc->stalled_request = rq;
-
-	return ret;
-}
-
-static void guc_submit_request(struct i915_request *rq)
-{
-	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
-	struct intel_guc *guc = &rq->engine->gt->uc.guc;
-	unsigned long flags;
+	/*
+	 * Call pin_guc_id here rather than in the pinning step as with
+	 * dma_resv, contexts can be repeatedly pinned / unpinned trashing the
+	 * guc_ids and creating horrible race conditions. This is especially bad
+	 * when guc_ids are being stolen due to over subscription. By the time
+	 * this function is reached, it is guaranteed that the guc_id will be
+	 * persistent until the generated request is retired. Thus, sealing these
+	 * race conditions. It is still safe to fail here if guc_ids are
+	 * exhausted and return -EAGAIN to the user indicating that they can try
+	 * again in the future.
+	 *
+	 * There is no need for a lock here as the timeline mutex ensures at
+	 * most one context can be executing this code path at once. The
+	 * guc_id_ref is incremented once for every request in flight and
+	 * decremented on each retire. When it is zero, a lock around the
+	 * increment (in pin_guc_id) is needed to seal a race with unpin_guc_id.
+	 */
+	if (atomic_add_unless(&ce->guc_id_ref, 1, 0))
+		return 0;
 
-	/* Will be called from irq-context when using foreign fences. */
-	spin_lock_irqsave(&sched_engine->lock, flags);
+	ret = pin_guc_id(guc, ce);	/* returns 1 if new guc_id assigned */
+	if (unlikely(ret < 0))
+		return ret;;
+	if (context_needs_register(ce, !!ret)) {
+		ret = guc_lrc_desc_pin(ce);
+		if (unlikely(ret)) {	/* unwind */
+			atomic_dec(&ce->guc_id_ref);
+			unpin_guc_id(guc, ce);
+			return ret;
+		}
+	}
 
-	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
-		queue_request(sched_engine, rq, rq_prio(rq));
-	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
-		tasklet_hi_schedule(&sched_engine->tasklet);
+	clear_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
 
-	spin_unlock_irqrestore(&sched_engine->lock, flags);
+	return 0;
 }
 
 static void sanitize_hwsp(struct intel_engine_cs *engine)
@@ -606,6 +1068,46 @@ static void guc_set_default_submission(struct intel_engine_cs *engine)
 	engine->submit_request = guc_submit_request;
 }
 
+static inline void guc_kernel_context_pin(struct intel_guc *guc,
+					  struct intel_context *ce)
+{
+	if (context_guc_id_invalid(ce))
+		pin_guc_id(guc, ce);
+	guc_lrc_desc_pin(ce);
+}
+
+static inline void guc_init_lrc_mapping(struct intel_guc *guc)
+{
+	struct intel_gt *gt = guc_to_gt(guc);
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+
+	/* make sure all descriptors are clean... */
+	xa_destroy(&guc->context_lookup);
+
+	/*
+	 * Some contexts might have been pinned before we enabled GuC
+	 * submission, so we need to add them to the GuC bookeeping.
+	 * Also, after a reset the GuC we want to make sure that the information
+	 * shared with GuC is properly reset. The kernel lrcs are not attached
+	 * to the gem_context, so they need to be added separately.
+	 *
+	 * Note: we purposely do not check the error return of
+	 * guc_lrc_desc_pin, because that function can only fail in two cases.
+	 * One, if there aren't enough free IDs, but we're guaranteed to have
+	 * enough here (we're either only pinning a handful of lrc on first boot
+	 * or we're re-pinning lrcs that were already pinned before the reset).
+	 * Two, if the GuC has died and CTBs can't make forward progress.
+	 * Presumably, the GuC should be alive as this function is called on
+	 * driver load or after a reset. Even if it is dead, another full GPU
+	 * reset will be triggered and this function would be called again.
+	 */
+
+	for_each_engine(engine, gt, id)
+		if (engine->kernel_context)
+			guc_kernel_context_pin(guc, engine->kernel_context);
+}
+
 static void guc_release(struct intel_engine_cs *engine)
 {
 	engine->sanitize = NULL; /* no longer in control, nothing to sanitize */
@@ -718,6 +1220,7 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
 
 void intel_guc_submission_enable(struct intel_guc *guc)
 {
+	guc_init_lrc_mapping(guc);
 }
 
 void intel_guc_submission_disable(struct intel_guc *guc)
@@ -743,3 +1246,62 @@ void intel_guc_submission_init_early(struct intel_guc *guc)
 {
 	guc->submission_selected = __guc_submission_selected(guc);
 }
+
+static inline struct intel_context *
+g2h_context_lookup(struct intel_guc *guc, u32 desc_idx)
+{
+	struct intel_context *ce;
+
+	if (unlikely(desc_idx >= GUC_MAX_LRC_DESCRIPTORS)) {
+		drm_dbg(&guc_to_gt(guc)->i915->drm,
+			"Invalid desc_idx %u", desc_idx);
+		return NULL;
+	}
+
+	ce = __get_context(guc, desc_idx);
+	if (unlikely(!ce)) {
+		drm_dbg(&guc_to_gt(guc)->i915->drm,
+			"Context is NULL, desc_idx %u", desc_idx);
+		return NULL;
+	}
+
+	return ce;
+}
+
+int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
+					  const u32 *msg,
+					  u32 len)
+{
+	struct intel_context *ce;
+	u32 desc_idx = msg[0];
+
+	if (unlikely(len < 1)) {
+		drm_dbg(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);
+		return -EPROTO;
+	}
+
+	ce = g2h_context_lookup(guc, desc_idx);
+	if (unlikely(!ce))
+		return -EPROTO;
+
+	if (context_wait_for_deregister_to_register(ce)) {
+		struct intel_runtime_pm *runtime_pm =
+			&ce->engine->gt->i915->runtime_pm;
+		intel_wakeref_t wakeref;
+
+		/*
+		 * Previous owner of this guc_id has been deregistered, now safe
+		 * register this context.
+		 */
+		with_intel_runtime_pm(runtime_pm, wakeref)
+			register_context(ce);
+		clr_context_wait_for_deregister_to_register(ce);
+		intel_context_put(ce);
+	} else if (context_destroyed(ce)) {
+		/* Context has been destroyed */
+		release_guc_id(guc, ce);
+		lrc_destroy(&ce->ref);
+	}
+
+	return 0;
+}
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index c857fafb8a30..a9c2242d61a2 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -4142,6 +4142,7 @@ enum {
 	FAULT_AND_CONTINUE /* Unsupported */
 };
 
+#define CTX_GTT_ADDRESS_MASK GENMASK(31, 12)
 #define GEN8_CTX_VALID (1 << 0)
 #define GEN8_CTX_FORCE_PD_RESTORE (1 << 1)
 #define GEN8_CTX_FORCE_RESTORE (1 << 2)
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index c5989c0b83d3..9dad3df5eaf7 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -419,6 +419,7 @@ bool i915_request_retire(struct i915_request *rq)
 	 */
 	if (!list_empty(&rq->sched.link))
 		remove_from_engine(rq);
+	atomic_dec(&rq->context->guc_id_ref);
 	GEM_BUG_ON(!llist_empty(&rq->execute_cb));
 
 	__list_del_entry(&rq->link); /* poison neither prev/next (RCU walks) */
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* [Intel-gfx] [PATCH 14/47] drm/i915/guc: Insert fence on context when deregistering
  2021-06-24  7:04 [Intel-gfx] [PATCH 00/47] GuC submission support Matthew Brost
                   ` (12 preceding siblings ...)
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 13/47] drm/i915/guc: Implement GuC context operations for new inteface Matthew Brost
@ 2021-06-24  7:04 ` Matthew Brost
  2021-07-09 22:39   ` John Harrison
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 15/47] drm/i915/guc: Defer context unpin until scheduling is disabled Matthew Brost
                   ` (37 subsequent siblings)
  51 siblings, 1 reply; 170+ messages in thread
From: Matthew Brost @ 2021-06-24  7:04 UTC (permalink / raw)
  To: intel-gfx, dri-devel

Sometime during context pinning a context with the same guc_id is
registered with the GuC. In this a case deregister must be before before
the context can be registered. A fence is inserted on all requests while
the deregister is in flight. Once the G2H is received indicating the
deregistration is complete the context is registered and the fence is
released.

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context.c       |  1 +
 drivers/gpu/drm/i915/gt/intel_context_types.h |  5 ++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 51 ++++++++++++++++++-
 drivers/gpu/drm/i915/i915_request.h           |  8 +++
 4 files changed, 63 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index 2b68af16222c..f750c826e19d 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -384,6 +384,7 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
 	mutex_init(&ce->pin_mutex);
 
 	spin_lock_init(&ce->guc_state.lock);
+	INIT_LIST_HEAD(&ce->guc_state.fences);
 
 	ce->guc_id = GUC_INVALID_LRC_ID;
 	INIT_LIST_HEAD(&ce->guc_id_link);
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index ce7c69b34cd1..beafe55a9101 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -146,6 +146,11 @@ struct intel_context {
 		 * submission
 		 */
 		u8 sched_state;
+		/*
+		 * fences: maintains of list of requests that have a submit
+		 * fence related to GuC submission
+		 */
+		struct list_head fences;
 	} guc_state;
 
 	/* GuC scheduling state that does not require a lock. */
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index d39579ac2faa..49e5d460d54b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -924,6 +924,30 @@ static const struct intel_context_ops guc_context_ops = {
 	.destroy = guc_context_destroy,
 };
 
+static void __guc_signal_context_fence(struct intel_context *ce)
+{
+	struct i915_request *rq;
+
+	lockdep_assert_held(&ce->guc_state.lock);
+
+	list_for_each_entry(rq, &ce->guc_state.fences, guc_fence_link)
+		i915_sw_fence_complete(&rq->submit);
+
+	INIT_LIST_HEAD(&ce->guc_state.fences);
+}
+
+static void guc_signal_context_fence(struct intel_context *ce)
+{
+	unsigned long flags;
+
+	GEM_BUG_ON(!context_wait_for_deregister_to_register(ce));
+
+	spin_lock_irqsave(&ce->guc_state.lock, flags);
+	clr_context_wait_for_deregister_to_register(ce);
+	__guc_signal_context_fence(ce);
+	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+}
+
 static bool context_needs_register(struct intel_context *ce, bool new_guc_id)
 {
 	return new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
@@ -934,6 +958,7 @@ static int guc_request_alloc(struct i915_request *rq)
 {
 	struct intel_context *ce = rq->context;
 	struct intel_guc *guc = ce_to_guc(ce);
+	unsigned long flags;
 	int ret;
 
 	GEM_BUG_ON(!intel_context_is_pinned(rq->context));
@@ -978,7 +1003,7 @@ static int guc_request_alloc(struct i915_request *rq)
 	 * increment (in pin_guc_id) is needed to seal a race with unpin_guc_id.
 	 */
 	if (atomic_add_unless(&ce->guc_id_ref, 1, 0))
-		return 0;
+		goto out;
 
 	ret = pin_guc_id(guc, ce);	/* returns 1 if new guc_id assigned */
 	if (unlikely(ret < 0))
@@ -994,6 +1019,28 @@ static int guc_request_alloc(struct i915_request *rq)
 
 	clear_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
 
+out:
+	/*
+	 * We block all requests on this context if a G2H is pending for a
+	 * context deregistration as the GuC will fail a context registration
+	 * while this G2H is pending. Once a G2H returns, the fence is released
+	 * that is blocking these requests (see guc_signal_context_fence).
+	 *
+	 * We can safely check the below field outside of the lock as it isn't
+	 * possible for this field to transition from being clear to set but
+	 * converse is possible, hence the need for the check within the lock.
+	 */
+	if (likely(!context_wait_for_deregister_to_register(ce)))
+		return 0;
+
+	spin_lock_irqsave(&ce->guc_state.lock, flags);
+	if (context_wait_for_deregister_to_register(ce)) {
+		i915_sw_fence_await(&rq->submit);
+
+		list_add_tail(&rq->guc_fence_link, &ce->guc_state.fences);
+	}
+	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+
 	return 0;
 }
 
@@ -1295,7 +1342,7 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 		 */
 		with_intel_runtime_pm(runtime_pm, wakeref)
 			register_context(ce);
-		clr_context_wait_for_deregister_to_register(ce);
+		guc_signal_context_fence(ce);
 		intel_context_put(ce);
 	} else if (context_destroyed(ce)) {
 		/* Context has been destroyed */
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 239964bec1fa..f870cd75a001 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -285,6 +285,14 @@ struct i915_request {
 		struct hrtimer timer;
 	} watchdog;
 
+	/*
+	 * Requests may need to be stalled when using GuC submission waiting for
+	 * certain GuC operations to complete. If that is the case, stalled
+	 * requests are added to a per context list of stalled requests. The
+	 * below list_head is the link in that list.
+	 */
+	struct list_head guc_fence_link;
+
 	I915_SELFTEST_DECLARE(struct {
 		struct list_head link;
 		unsigned long delay;
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* [Intel-gfx] [PATCH 15/47] drm/i915/guc: Defer context unpin until scheduling is disabled
  2021-06-24  7:04 [Intel-gfx] [PATCH 00/47] GuC submission support Matthew Brost
                   ` (13 preceding siblings ...)
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 14/47] drm/i915/guc: Insert fence on context when deregistering Matthew Brost
@ 2021-06-24  7:04 ` Matthew Brost
  2021-07-09 22:48   ` John Harrison
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 16/47] drm/i915/guc: Disable engine barriers with GuC during unpin Matthew Brost
                   ` (36 subsequent siblings)
  51 siblings, 1 reply; 170+ messages in thread
From: Matthew Brost @ 2021-06-24  7:04 UTC (permalink / raw)
  To: intel-gfx, dri-devel

With GuC scheduling, it isn't safe to unpin a context while scheduling
is enabled for that context as the GuC may touch some of the pinned
state (e.g. LRC). To ensure scheduling isn't enabled when an unpin is
done, a call back is added to intel_context_unpin when pin count == 1
to disable scheduling for that context. When the response CTB is
received it is safe to do the final unpin.

Future patches may add a heuristic / delay to schedule the disable
call back to avoid thrashing on schedule enable / disable.

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context.c       |   4 +-
 drivers/gpu/drm/i915/gt/intel_context.h       |  27 +++-
 drivers/gpu/drm/i915/gt/intel_context_types.h |   2 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   2 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |   3 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 145 +++++++++++++++++-
 6 files changed, 179 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index f750c826e19d..1499b8aace2a 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -306,9 +306,9 @@ int __intel_context_do_pin(struct intel_context *ce)
 	return err;
 }
 
-void intel_context_unpin(struct intel_context *ce)
+void __intel_context_do_unpin(struct intel_context *ce, int sub)
 {
-	if (!atomic_dec_and_test(&ce->pin_count))
+	if (!atomic_sub_and_test(sub, &ce->pin_count))
 		return;
 
 	CE_TRACE(ce, "unpin\n");
diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
index f83a73a2b39f..8a7199afbe61 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -113,7 +113,32 @@ static inline void __intel_context_pin(struct intel_context *ce)
 	atomic_inc(&ce->pin_count);
 }
 
-void intel_context_unpin(struct intel_context *ce);
+void __intel_context_do_unpin(struct intel_context *ce, int sub);
+
+static inline void intel_context_sched_disable_unpin(struct intel_context *ce)
+{
+	__intel_context_do_unpin(ce, 2);
+}
+
+static inline void intel_context_unpin(struct intel_context *ce)
+{
+	if (!ce->ops->sched_disable) {
+		__intel_context_do_unpin(ce, 1);
+	} else {
+		/*
+		 * Move ownership of this pin to the scheduling disable which is
+		 * an async operation. When that operation completes the above
+		 * intel_context_sched_disable_unpin is called potentially
+		 * unpinning the context.
+		 */
+		while (!atomic_add_unless(&ce->pin_count, -1, 1)) {
+			if (atomic_cmpxchg(&ce->pin_count, 1, 2) == 1) {
+				ce->ops->sched_disable(ce);
+				break;
+			}
+		}
+	}
+}
 
 void intel_context_enter_engine(struct intel_context *ce);
 void intel_context_exit_engine(struct intel_context *ce);
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index beafe55a9101..e7af6a2368f8 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -43,6 +43,8 @@ struct intel_context_ops {
 	void (*enter)(struct intel_context *ce);
 	void (*exit)(struct intel_context *ce);
 
+	void (*sched_disable)(struct intel_context *ce);
+
 	void (*reset)(struct intel_context *ce);
 	void (*destroy)(struct kref *kref);
 };
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index d44316dc914b..b43ec56986b5 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -236,6 +236,8 @@ int intel_guc_reset_engine(struct intel_guc *guc,
 
 int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 					  const u32 *msg, u32 len);
+int intel_guc_sched_done_process_msg(struct intel_guc *guc,
+				     const u32 *msg, u32 len);
 
 void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p);
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 42a7daef2ff6..7491f041859e 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -905,6 +905,9 @@ static int ct_process_request(struct intel_guc_ct *ct, struct ct_incoming_msg *r
 		ret = intel_guc_deregister_done_process_msg(guc, payload,
 							    len);
 		break;
+	case INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE:
+		ret = intel_guc_sched_done_process_msg(guc, payload, len);
+		break;
 	default:
 		ret = -EOPNOTSUPP;
 		break;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 49e5d460d54b..0386ccd5a481 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -70,6 +70,7 @@
  * possible for some of the bits to changing at the same time though.
  */
 #define SCHED_STATE_NO_LOCK_ENABLED			BIT(0)
+#define SCHED_STATE_NO_LOCK_PENDING_ENABLE		BIT(1)
 static inline bool context_enabled(struct intel_context *ce)
 {
 	return (atomic_read(&ce->guc_sched_state_no_lock) &
@@ -87,6 +88,24 @@ static inline void clr_context_enabled(struct intel_context *ce)
 		   &ce->guc_sched_state_no_lock);
 }
 
+static inline bool context_pending_enable(struct intel_context *ce)
+{
+	return (atomic_read(&ce->guc_sched_state_no_lock) &
+		SCHED_STATE_NO_LOCK_PENDING_ENABLE);
+}
+
+static inline void set_context_pending_enable(struct intel_context *ce)
+{
+	atomic_or(SCHED_STATE_NO_LOCK_PENDING_ENABLE,
+		  &ce->guc_sched_state_no_lock);
+}
+
+static inline void clr_context_pending_enable(struct intel_context *ce)
+{
+	atomic_and((u32)~SCHED_STATE_NO_LOCK_PENDING_ENABLE,
+		   &ce->guc_sched_state_no_lock);
+}
+
 /*
  * Below is a set of functions which control the GuC scheduling state which
  * require a lock, aside from the special case where the functions are called
@@ -95,6 +114,7 @@ static inline void clr_context_enabled(struct intel_context *ce)
  */
 #define SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER	BIT(0)
 #define SCHED_STATE_DESTROYED				BIT(1)
+#define SCHED_STATE_PENDING_DISABLE			BIT(2)
 static inline void init_sched_state(struct intel_context *ce)
 {
 	/* Only should be called from guc_lrc_desc_pin() */
@@ -139,6 +159,24 @@ set_context_destroyed(struct intel_context *ce)
 	ce->guc_state.sched_state |= SCHED_STATE_DESTROYED;
 }
 
+static inline bool context_pending_disable(struct intel_context *ce)
+{
+	return (ce->guc_state.sched_state & SCHED_STATE_PENDING_DISABLE);
+}
+
+static inline void set_context_pending_disable(struct intel_context *ce)
+{
+	lockdep_assert_held(&ce->guc_state.lock);
+	ce->guc_state.sched_state |= SCHED_STATE_PENDING_DISABLE;
+}
+
+static inline void clr_context_pending_disable(struct intel_context *ce)
+{
+	lockdep_assert_held(&ce->guc_state.lock);
+	ce->guc_state.sched_state =
+		(ce->guc_state.sched_state & ~SCHED_STATE_PENDING_DISABLE);
+}
+
 static inline bool context_guc_id_invalid(struct intel_context *ce)
 {
 	return (ce->guc_id == GUC_INVALID_LRC_ID);
@@ -231,6 +269,8 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
 		action[len++] = ce->guc_id;
 		action[len++] = GUC_CONTEXT_ENABLE;
+		set_context_pending_enable(ce);
+		intel_context_get(ce);
 	} else {
 		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT;
 		action[len++] = ce->guc_id;
@@ -238,8 +278,12 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 
 	err = intel_guc_send_nb(guc, action, len);
 
-	if (!enabled && !err)
+	if (!enabled && !err) {
 		set_context_enabled(ce);
+	} else if (!enabled) {
+		clr_context_pending_enable(ce);
+		intel_context_put(ce);
+	}
 
 	return err;
 }
@@ -831,6 +875,60 @@ static void guc_context_post_unpin(struct intel_context *ce)
 	lrc_post_unpin(ce);
 }
 
+static void __guc_context_sched_disable(struct intel_guc *guc,
+					struct intel_context *ce,
+					u16 guc_id)
+{
+	u32 action[] = {
+		INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET,
+		guc_id,	/* ce->guc_id not stable */
+		GUC_CONTEXT_DISABLE
+	};
+
+	GEM_BUG_ON(guc_id == GUC_INVALID_LRC_ID);
+
+	intel_context_get(ce);
+
+	intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
+}
+
+static u16 prep_context_pending_disable(struct intel_context *ce)
+{
+	set_context_pending_disable(ce);
+	clr_context_enabled(ce);
+
+	return ce->guc_id;
+}
+
+static void guc_context_sched_disable(struct intel_context *ce)
+{
+	struct intel_guc *guc = ce_to_guc(ce);
+	struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm;
+	unsigned long flags;
+	u16 guc_id;
+	intel_wakeref_t wakeref;
+
+	if (context_guc_id_invalid(ce) ||
+	    !lrc_desc_registered(guc, ce->guc_id)) {
+		clr_context_enabled(ce);
+		goto unpin;
+	}
+
+	if (!context_enabled(ce))
+		goto unpin;
+
+	spin_lock_irqsave(&ce->guc_state.lock, flags);
+	guc_id = prep_context_pending_disable(ce);
+	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+
+	with_intel_runtime_pm(runtime_pm, wakeref)
+		__guc_context_sched_disable(guc, ce, guc_id);
+
+	return;
+unpin:
+	intel_context_sched_disable_unpin(ce);
+}
+
 static inline void guc_lrc_desc_unpin(struct intel_context *ce)
 {
 	struct intel_engine_cs *engine = ce->engine;
@@ -839,6 +937,7 @@ static inline void guc_lrc_desc_unpin(struct intel_context *ce)
 
 	GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id));
 	GEM_BUG_ON(ce != __get_context(guc, ce->guc_id));
+	GEM_BUG_ON(context_enabled(ce));
 
 	spin_lock_irqsave(&ce->guc_state.lock, flags);
 	set_context_destroyed(ce);
@@ -920,6 +1019,8 @@ static const struct intel_context_ops guc_context_ops = {
 	.enter = intel_context_enter_engine,
 	.exit = intel_context_exit_engine,
 
+	.sched_disable = guc_context_sched_disable,
+
 	.reset = lrc_reset,
 	.destroy = guc_context_destroy,
 };
@@ -1352,3 +1453,45 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 
 	return 0;
 }
+
+int intel_guc_sched_done_process_msg(struct intel_guc *guc,
+				     const u32 *msg,
+				     u32 len)
+{
+	struct intel_context *ce;
+	unsigned long flags;
+	u32 desc_idx = msg[0];
+
+	if (unlikely(len < 2)) {
+		drm_dbg(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);
+		return -EPROTO;
+	}
+
+	ce = g2h_context_lookup(guc, desc_idx);
+	if (unlikely(!ce))
+		return -EPROTO;
+
+	if (unlikely(context_destroyed(ce) ||
+		     (!context_pending_enable(ce) &&
+		     !context_pending_disable(ce)))) {
+		drm_dbg(&guc_to_gt(guc)->i915->drm,
+			"Bad context sched_state 0x%x, 0x%x, desc_idx %u",
+			atomic_read(&ce->guc_sched_state_no_lock),
+			ce->guc_state.sched_state, desc_idx);
+		return -EPROTO;
+	}
+
+	if (context_pending_enable(ce)) {
+		clr_context_pending_enable(ce);
+	} else if (context_pending_disable(ce)) {
+		intel_context_sched_disable_unpin(ce);
+
+		spin_lock_irqsave(&ce->guc_state.lock, flags);
+		clr_context_pending_disable(ce);
+		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+	}
+
+	intel_context_put(ce);
+
+	return 0;
+}
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* [Intel-gfx] [PATCH 16/47] drm/i915/guc: Disable engine barriers with GuC during unpin
  2021-06-24  7:04 [Intel-gfx] [PATCH 00/47] GuC submission support Matthew Brost
                   ` (14 preceding siblings ...)
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 15/47] drm/i915/guc: Defer context unpin until scheduling is disabled Matthew Brost
@ 2021-06-24  7:04 ` Matthew Brost
  2021-07-09 22:53   ` John Harrison
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 17/47] drm/i915/guc: Extend deregistration fence to schedule disable Matthew Brost
                   ` (35 subsequent siblings)
  51 siblings, 1 reply; 170+ messages in thread
From: Matthew Brost @ 2021-06-24  7:04 UTC (permalink / raw)
  To: intel-gfx, dri-devel

Disable engine barriers for unpinning with GuC. This feature isn't
needed with the GuC as it disables context scheduling before unpinning
which guarantees the HW will not reference the context. Hence it is
not necessary to defer unpinning until a kernel context request
completes on each engine in the context engine mask.

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context.c    |  2 +-
 drivers/gpu/drm/i915/gt/intel_context.h    |  1 +
 drivers/gpu/drm/i915/gt/selftest_context.c | 10 ++++++++++
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index 1499b8aace2a..7f97753ab164 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -80,7 +80,7 @@ static int intel_context_active_acquire(struct intel_context *ce)
 
 	__i915_active_acquire(&ce->active);
 
-	if (intel_context_is_barrier(ce))
+	if (intel_context_is_barrier(ce) || intel_engine_uses_guc(ce->engine))
 		return 0;
 
 	/* Preallocate tracking nodes */
diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
index 8a7199afbe61..a592a9605dc8 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -16,6 +16,7 @@
 #include "intel_engine_types.h"
 #include "intel_ring_types.h"
 #include "intel_timeline_types.h"
+#include "uc/intel_guc_submission.h"
 
 #define CE_TRACE(ce, fmt, ...) do {					\
 	const struct intel_context *ce__ = (ce);			\
diff --git a/drivers/gpu/drm/i915/gt/selftest_context.c b/drivers/gpu/drm/i915/gt/selftest_context.c
index 26685b927169..fa7b99a671dd 100644
--- a/drivers/gpu/drm/i915/gt/selftest_context.c
+++ b/drivers/gpu/drm/i915/gt/selftest_context.c
@@ -209,7 +209,13 @@ static int __live_active_context(struct intel_engine_cs *engine)
 	 * This test makes sure that the context is kept alive until a
 	 * subsequent idle-barrier (emitted when the engine wakeref hits 0
 	 * with no more outstanding requests).
+	 *
+	 * In GuC submission mode we don't use idle barriers and we instead
+	 * get a message from the GuC to signal that it is safe to unpin the
+	 * context from memory.
 	 */
+	if (intel_engine_uses_guc(engine))
+		return 0;
 
 	if (intel_engine_pm_is_awake(engine)) {
 		pr_err("%s is awake before starting %s!\n",
@@ -357,7 +363,11 @@ static int __live_remote_context(struct intel_engine_cs *engine)
 	 * on the context image remotely (intel_context_prepare_remote_request),
 	 * which inserts foreign fences into intel_context.active, does not
 	 * clobber the idle-barrier.
+	 *
+	 * In GuC submission mode we don't use idle barriers.
 	 */
+	if (intel_engine_uses_guc(engine))
+		return 0;
 
 	if (intel_engine_pm_is_awake(engine)) {
 		pr_err("%s is awake before starting %s!\n",
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* [Intel-gfx] [PATCH 17/47] drm/i915/guc: Extend deregistration fence to schedule disable
  2021-06-24  7:04 [Intel-gfx] [PATCH 00/47] GuC submission support Matthew Brost
                   ` (15 preceding siblings ...)
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 16/47] drm/i915/guc: Disable engine barriers with GuC during unpin Matthew Brost
@ 2021-06-24  7:04 ` Matthew Brost
  2021-07-09 22:59   ` John Harrison
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 18/47] drm/i915: Disable preempt busywait when using GuC scheduling Matthew Brost
                   ` (34 subsequent siblings)
  51 siblings, 1 reply; 170+ messages in thread
From: Matthew Brost @ 2021-06-24  7:04 UTC (permalink / raw)
  To: intel-gfx, dri-devel

Extend the deregistration context fence to fence whne a GuC context has
scheduling disable pending.

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 37 +++++++++++++++----
 1 file changed, 30 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 0386ccd5a481..0a6ccdf32316 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -918,7 +918,19 @@ static void guc_context_sched_disable(struct intel_context *ce)
 		goto unpin;
 
 	spin_lock_irqsave(&ce->guc_state.lock, flags);
+
+	/*
+	 * We have to check if the context has been pinned again as another pin
+	 * operation is allowed to pass this function. Checking the pin count
+	 * here synchronizes this function with guc_request_alloc ensuring a
+	 * request doesn't slip through the 'context_pending_disable' fence.
+	 */
+	if (unlikely(atomic_add_unless(&ce->pin_count, -2, 2))) {
+		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+		return;
+	}
 	guc_id = prep_context_pending_disable(ce);
+
 	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 
 	with_intel_runtime_pm(runtime_pm, wakeref)
@@ -1123,19 +1135,22 @@ static int guc_request_alloc(struct i915_request *rq)
 out:
 	/*
 	 * We block all requests on this context if a G2H is pending for a
-	 * context deregistration as the GuC will fail a context registration
-	 * while this G2H is pending. Once a G2H returns, the fence is released
-	 * that is blocking these requests (see guc_signal_context_fence).
+	 * schedule disable or context deregistration as the GuC will fail a
+	 * schedule enable or context registration if either G2H is pending
+	 * respectfully. Once a G2H returns, the fence is released that is
+	 * blocking these requests (see guc_signal_context_fence).
 	 *
-	 * We can safely check the below field outside of the lock as it isn't
-	 * possible for this field to transition from being clear to set but
+	 * We can safely check the below fields outside of the lock as it isn't
+	 * possible for these fields to transition from being clear to set but
 	 * converse is possible, hence the need for the check within the lock.
 	 */
-	if (likely(!context_wait_for_deregister_to_register(ce)))
+	if (likely(!context_wait_for_deregister_to_register(ce) &&
+		   !context_pending_disable(ce)))
 		return 0;
 
 	spin_lock_irqsave(&ce->guc_state.lock, flags);
-	if (context_wait_for_deregister_to_register(ce)) {
+	if (context_wait_for_deregister_to_register(ce) ||
+	    context_pending_disable(ce)) {
 		i915_sw_fence_await(&rq->submit);
 
 		list_add_tail(&rq->guc_fence_link, &ce->guc_state.fences);
@@ -1484,10 +1499,18 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
 	if (context_pending_enable(ce)) {
 		clr_context_pending_enable(ce);
 	} else if (context_pending_disable(ce)) {
+		/*
+		 * Unpin must be done before __guc_signal_context_fence,
+		 * otherwise a race exists between the requests getting
+		 * submitted + retired before this unpin completes resulting in
+		 * the pin_count going to zero and the context still being
+		 * enabled.
+		 */
 		intel_context_sched_disable_unpin(ce);
 
 		spin_lock_irqsave(&ce->guc_state.lock, flags);
 		clr_context_pending_disable(ce);
+		__guc_signal_context_fence(ce);
 		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 	}
 
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* [Intel-gfx] [PATCH 18/47] drm/i915: Disable preempt busywait when using GuC scheduling
  2021-06-24  7:04 [Intel-gfx] [PATCH 00/47] GuC submission support Matthew Brost
                   ` (16 preceding siblings ...)
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 17/47] drm/i915/guc: Extend deregistration fence to schedule disable Matthew Brost
@ 2021-06-24  7:04 ` Matthew Brost
  2021-07-09 23:03   ` John Harrison
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 19/47] drm/i915/guc: Ensure request ordering via completion fences Matthew Brost
                   ` (33 subsequent siblings)
  51 siblings, 1 reply; 170+ messages in thread
From: Matthew Brost @ 2021-06-24  7:04 UTC (permalink / raw)
  To: intel-gfx, dri-devel

Disable preempt busywait when using GuC scheduling. This isn't need as
the GuC control preemption when scheduling.

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
index 87b06572fd2e..f7aae502ec3d 100644
--- a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
@@ -506,7 +506,8 @@ gen8_emit_fini_breadcrumb_tail(struct i915_request *rq, u32 *cs)
 	*cs++ = MI_USER_INTERRUPT;
 
 	*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
-	if (intel_engine_has_semaphores(rq->engine))
+	if (intel_engine_has_semaphores(rq->engine) &&
+	    !intel_uc_uses_guc_submission(&rq->engine->gt->uc))
 		cs = emit_preempt_busywait(rq, cs);
 
 	rq->tail = intel_ring_offset(rq, cs);
@@ -598,7 +599,8 @@ gen12_emit_fini_breadcrumb_tail(struct i915_request *rq, u32 *cs)
 	*cs++ = MI_USER_INTERRUPT;
 
 	*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
-	if (intel_engine_has_semaphores(rq->engine))
+	if (intel_engine_has_semaphores(rq->engine) &&
+	    !intel_uc_uses_guc_submission(&rq->engine->gt->uc))
 		cs = gen12_emit_preempt_busywait(rq, cs);
 
 	rq->tail = intel_ring_offset(rq, cs);
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* [Intel-gfx] [PATCH 19/47] drm/i915/guc: Ensure request ordering via completion fences
  2021-06-24  7:04 [Intel-gfx] [PATCH 00/47] GuC submission support Matthew Brost
                   ` (17 preceding siblings ...)
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 18/47] drm/i915: Disable preempt busywait when using GuC scheduling Matthew Brost
@ 2021-06-24  7:04 ` Matthew Brost
  2021-07-15  1:51   ` Daniele Ceraolo Spurio
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 20/47] drm/i915/guc: Disable semaphores when using GuC scheduling Matthew Brost
                   ` (32 subsequent siblings)
  51 siblings, 1 reply; 170+ messages in thread
From: Matthew Brost @ 2021-06-24  7:04 UTC (permalink / raw)
  To: intel-gfx, dri-devel

If two requests are on the same ring, they are explicitly ordered by the
HW. So, a submission fence is sufficient to ensure ordering when using
the new GuC submission interface. Conversely, if two requests share a
timeline and are on the same physical engine but different context this
doesn't ensure ordering on the new GuC submission interface. So, a
completion fence needs to be used to ensure ordering.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c   |  1 -
 drivers/gpu/drm/i915/i915_request.c             | 17 +++++++++++++----
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 0a6ccdf32316..010e46dd6b16 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -926,7 +926,6 @@ static void guc_context_sched_disable(struct intel_context *ce)
 	 * request doesn't slip through the 'context_pending_disable' fence.
 	 */
 	if (unlikely(atomic_add_unless(&ce->pin_count, -2, 2))) {
-		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 		return;
 	}
 	guc_id = prep_context_pending_disable(ce);
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 9dad3df5eaf7..d92c9f25c9f4 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -444,6 +444,7 @@ void i915_request_retire_upto(struct i915_request *rq)
 
 	do {
 		tmp = list_first_entry(&tl->requests, typeof(*tmp), link);
+		GEM_BUG_ON(!i915_request_completed(tmp));
 	} while (i915_request_retire(tmp) && tmp != rq);
 }
 
@@ -1405,6 +1406,9 @@ i915_request_await_external(struct i915_request *rq, struct dma_fence *fence)
 	return err;
 }
 
+static int
+i915_request_await_request(struct i915_request *to, struct i915_request *from);
+
 int
 i915_request_await_execution(struct i915_request *rq,
 			     struct dma_fence *fence,
@@ -1464,12 +1468,13 @@ await_request_submit(struct i915_request *to, struct i915_request *from)
 	 * the waiter to be submitted immediately to the physical engine
 	 * as it may then bypass the virtual request.
 	 */
-	if (to->engine == READ_ONCE(from->engine))
+	if (to->engine == READ_ONCE(from->engine)) {
 		return i915_sw_fence_await_sw_fence_gfp(&to->submit,
 							&from->submit,
 							I915_FENCE_GFP);
-	else
+	} else {
 		return __i915_request_await_execution(to, from, NULL);
+	}
 }
 
 static int
@@ -1493,7 +1498,8 @@ i915_request_await_request(struct i915_request *to, struct i915_request *from)
 			return ret;
 	}
 
-	if (is_power_of_2(to->execution_mask | READ_ONCE(from->execution_mask)))
+	if (!intel_engine_uses_guc(to->engine) &&
+	    is_power_of_2(to->execution_mask | READ_ONCE(from->execution_mask)))
 		ret = await_request_submit(to, from);
 	else
 		ret = emit_semaphore_wait(to, from, I915_FENCE_GFP);
@@ -1654,6 +1660,8 @@ __i915_request_add_to_timeline(struct i915_request *rq)
 	prev = to_request(__i915_active_fence_set(&timeline->last_request,
 						  &rq->fence));
 	if (prev && !__i915_request_is_complete(prev)) {
+		bool uses_guc = intel_engine_uses_guc(rq->engine);
+
 		/*
 		 * The requests are supposed to be kept in order. However,
 		 * we need to be wary in case the timeline->last_request
@@ -1664,7 +1672,8 @@ __i915_request_add_to_timeline(struct i915_request *rq)
 			   i915_seqno_passed(prev->fence.seqno,
 					     rq->fence.seqno));
 
-		if (is_power_of_2(READ_ONCE(prev->engine)->mask | rq->engine->mask))
+		if ((!uses_guc && is_power_of_2(READ_ONCE(prev->engine)->mask | rq->engine->mask)) ||
+		    (uses_guc && prev->context == rq->context))
 			i915_sw_fence_await_sw_fence(&rq->submit,
 						     &prev->submit,
 						     &rq->submitq);
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* [Intel-gfx] [PATCH 20/47] drm/i915/guc: Disable semaphores when using GuC scheduling
  2021-06-24  7:04 [Intel-gfx] [PATCH 00/47] GuC submission support Matthew Brost
                   ` (18 preceding siblings ...)
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 19/47] drm/i915/guc: Ensure request ordering via completion fences Matthew Brost
@ 2021-06-24  7:04 ` Matthew Brost
  2021-07-09 23:53   ` John Harrison
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 21/47] drm/i915/guc: Ensure G2H response has space in buffer Matthew Brost
                   ` (31 subsequent siblings)
  51 siblings, 1 reply; 170+ messages in thread
From: Matthew Brost @ 2021-06-24  7:04 UTC (permalink / raw)
  To: intel-gfx, dri-devel

Semaphores are an optimization and not required for basic GuC submission
to work properly. Disable until we have time to do the implementation to
enable semaphores and tune them for performance. Also long direction is
just to delete semaphores from the i915 so another reason to not enable
these for GuC submission.

v2: Reword commit message

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 7720b8c22c81..5c07e6abf16a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -230,7 +230,8 @@ static void intel_context_set_gem(struct intel_context *ce,
 		ce->timeline = intel_timeline_get(ctx->timeline);
 
 	if (ctx->sched.priority >= I915_PRIORITY_NORMAL &&
-	    intel_engine_has_timeslices(ce->engine))
+	    intel_engine_has_timeslices(ce->engine) &&
+	    intel_engine_has_semaphores(ce->engine))
 		__set_bit(CONTEXT_USE_SEMAPHORES, &ce->flags);
 
 	intel_context_set_watchdog_us(ce, ctx->watchdog.timeout_us);
@@ -1938,7 +1939,8 @@ static int __apply_priority(struct intel_context *ce, void *arg)
 	if (!intel_engine_has_timeslices(ce->engine))
 		return 0;
 
-	if (ctx->sched.priority >= I915_PRIORITY_NORMAL)
+	if (ctx->sched.priority >= I915_PRIORITY_NORMAL &&
+	    intel_engine_has_semaphores(ce->engine))
 		intel_context_set_use_semaphores(ce);
 	else
 		intel_context_clear_use_semaphores(ce);
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* [Intel-gfx] [PATCH 21/47] drm/i915/guc: Ensure G2H response has space in buffer
  2021-06-24  7:04 [Intel-gfx] [PATCH 00/47] GuC submission support Matthew Brost
                   ` (19 preceding siblings ...)
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 20/47] drm/i915/guc: Disable semaphores when using GuC scheduling Matthew Brost
@ 2021-06-24  7:04 ` Matthew Brost
  2021-07-13 18:36   ` John Harrison
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 22/47] drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC Matthew Brost
                   ` (30 subsequent siblings)
  51 siblings, 1 reply; 170+ messages in thread
From: Matthew Brost @ 2021-06-24  7:04 UTC (permalink / raw)
  To: intel-gfx, dri-devel

Ensure G2H response has space in the buffer before sending H2G CTB as
the GuC can't handle any backpressure on the G2H interface.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        | 13 +++-
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     | 76 +++++++++++++++----
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h     |  4 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |  4 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 13 ++--
 5 files changed, 87 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index b43ec56986b5..24e7a924134e 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -95,11 +95,17 @@ inline int intel_guc_send(struct intel_guc *guc, const u32 *action, u32 len)
 }
 
 #define INTEL_GUC_SEND_NB		BIT(31)
+#define INTEL_GUC_SEND_G2H_DW_SHIFT	0
+#define INTEL_GUC_SEND_G2H_DW_MASK	(0xff << INTEL_GUC_SEND_G2H_DW_SHIFT)
+#define MAKE_SEND_FLAGS(len) \
+	({GEM_BUG_ON(!FIELD_FIT(INTEL_GUC_SEND_G2H_DW_MASK, len)); \
+	(FIELD_PREP(INTEL_GUC_SEND_G2H_DW_MASK, len) | INTEL_GUC_SEND_NB);})
 static
-inline int intel_guc_send_nb(struct intel_guc *guc, const u32 *action, u32 len)
+inline int intel_guc_send_nb(struct intel_guc *guc, const u32 *action, u32 len,
+			     u32 g2h_len_dw)
 {
 	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0,
-				 INTEL_GUC_SEND_NB);
+				 MAKE_SEND_FLAGS(g2h_len_dw));
 }
 
 static inline int
@@ -113,6 +119,7 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len,
 static inline int intel_guc_send_busy_loop(struct intel_guc* guc,
 					   const u32 *action,
 					   u32 len,
+					   u32 g2h_len_dw,
 					   bool loop)
 {
 	int err;
@@ -121,7 +128,7 @@ static inline int intel_guc_send_busy_loop(struct intel_guc* guc,
 	might_sleep_if(loop && (!in_atomic() && !irqs_disabled()));
 
 retry:
-	err = intel_guc_send_nb(guc, action, len);
+	err = intel_guc_send_nb(guc, action, len, g2h_len_dw);
 	if (unlikely(err == -EBUSY && loop)) {
 		if (likely(!in_atomic() && !irqs_disabled()))
 			cond_resched();
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 7491f041859e..a60970e85635 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -73,6 +73,7 @@ static inline struct drm_device *ct_to_drm(struct intel_guc_ct *ct)
 #define CTB_DESC_SIZE		ALIGN(sizeof(struct guc_ct_buffer_desc), SZ_2K)
 #define CTB_H2G_BUFFER_SIZE	(SZ_4K)
 #define CTB_G2H_BUFFER_SIZE	(4 * CTB_H2G_BUFFER_SIZE)
+#define G2H_ROOM_BUFFER_SIZE	(PAGE_SIZE)
 
 struct ct_request {
 	struct list_head link;
@@ -129,23 +130,27 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc)
 
 static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb)
 {
+	u32 space;
+
 	ctb->broken = false;
 	ctb->tail = 0;
 	ctb->head = 0;
-	ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
+	space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size) - ctb->resv_space;
+	atomic_set(&ctb->space, space);
 
 	guc_ct_buffer_desc_init(ctb->desc);
 }
 
 static void guc_ct_buffer_init(struct intel_guc_ct_buffer *ctb,
 			       struct guc_ct_buffer_desc *desc,
-			       u32 *cmds, u32 size_in_bytes)
+			       u32 *cmds, u32 size_in_bytes, u32 resv_space)
 {
 	GEM_BUG_ON(size_in_bytes % 4);
 
 	ctb->desc = desc;
 	ctb->cmds = cmds;
 	ctb->size = size_in_bytes / 4;
+	ctb->resv_space = resv_space / 4;
 
 	guc_ct_buffer_reset(ctb);
 }
@@ -226,6 +231,7 @@ int intel_guc_ct_init(struct intel_guc_ct *ct)
 	struct guc_ct_buffer_desc *desc;
 	u32 blob_size;
 	u32 cmds_size;
+	u32 resv_space;
 	void *blob;
 	u32 *cmds;
 	int err;
@@ -250,19 +256,23 @@ int intel_guc_ct_init(struct intel_guc_ct *ct)
 	desc = blob;
 	cmds = blob + 2 * CTB_DESC_SIZE;
 	cmds_size = CTB_H2G_BUFFER_SIZE;
-	CT_DEBUG(ct, "%s desc %#tx cmds %#tx size %u\n", "send",
-		 ptrdiff(desc, blob), ptrdiff(cmds, blob), cmds_size);
+	resv_space = 0;
+	CT_DEBUG(ct, "%s desc %#tx cmds %#tx size %u/%u\n", "send",
+		 ptrdiff(desc, blob), ptrdiff(cmds, blob), cmds_size,
+		 resv_space);
 
-	guc_ct_buffer_init(&ct->ctbs.send, desc, cmds, cmds_size);
+	guc_ct_buffer_init(&ct->ctbs.send, desc, cmds, cmds_size, resv_space);
 
 	/* store pointers to desc and cmds for recv ctb */
 	desc = blob + CTB_DESC_SIZE;
 	cmds = blob + 2 * CTB_DESC_SIZE + CTB_H2G_BUFFER_SIZE;
 	cmds_size = CTB_G2H_BUFFER_SIZE;
-	CT_DEBUG(ct, "%s desc %#tx cmds %#tx size %u\n", "recv",
-		 ptrdiff(desc, blob), ptrdiff(cmds, blob), cmds_size);
+	resv_space = G2H_ROOM_BUFFER_SIZE;
+	CT_DEBUG(ct, "%s desc %#tx cmds %#tx size %u/%u\n", "recv",
+		 ptrdiff(desc, blob), ptrdiff(cmds, blob), cmds_size,
+		 resv_space);
 
-	guc_ct_buffer_init(&ct->ctbs.recv, desc, cmds, cmds_size);
+	guc_ct_buffer_init(&ct->ctbs.recv, desc, cmds, cmds_size, resv_space);
 
 	return 0;
 }
@@ -458,7 +468,7 @@ static int ct_write(struct intel_guc_ct *ct,
 	/* now update descriptor */
 	ctb->tail = tail;
 	WRITE_ONCE(desc->tail, tail);
-	ctb->space -= len + 1;
+	atomic_sub(len + 1, &ctb->space);
 
 	return 0;
 
@@ -521,13 +531,34 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct)
 	return ret;
 }
 
+static inline bool g2h_has_room(struct intel_guc_ct *ct, u32 g2h_len_dw)
+{
+	struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv;
+
+	/*
+	 * We leave a certain amount of space in the G2H CTB buffer for
+	 * unexpected G2H CTBs (e.g. logging, engine hang, etc...)
+	 */
+	return !g2h_len_dw || atomic_read(&ctb->space) >= g2h_len_dw;
+}
+
+static inline void g2h_reserve_space(struct intel_guc_ct *ct, u32 g2h_len_dw)
+{
+	lockdep_assert_held(&ct->ctbs.send.lock);
+
+	GEM_BUG_ON(!g2h_has_room(ct, g2h_len_dw));
+
+	if (g2h_len_dw)
+		atomic_sub(g2h_len_dw, &ct->ctbs.recv.space);
+}
+
 static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw)
 {
 	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
 	u32 head;
 	u32 space;
 
-	if (ctb->space >= len_dw)
+	if (atomic_read(&ctb->space) >= len_dw)
 		return true;
 
 	head = READ_ONCE(ctb->desc->head);
@@ -540,16 +571,16 @@ static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw)
 	}
 
 	space = CIRC_SPACE(ctb->tail, head, ctb->size);
-	ctb->space = space;
+	atomic_set(&ctb->space, space);
 
 	return space >= len_dw;
 }
 
-static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
+static int has_room_nb(struct intel_guc_ct *ct, u32 h2g_dw, u32 g2h_dw)
 {
 	lockdep_assert_held(&ct->ctbs.send.lock);
 
-	if (unlikely(!h2g_has_room(ct, len_dw))) {
+	if (unlikely(!h2g_has_room(ct, h2g_dw) || !g2h_has_room(ct, g2h_dw))) {
 		if (ct->stall_time == KTIME_MAX)
 			ct->stall_time = ktime_get();
 
@@ -563,6 +594,9 @@ static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
 	return 0;
 }
 
+#define G2H_LEN_DW(f) \
+	FIELD_GET(INTEL_GUC_SEND_G2H_DW_MASK, f) ? \
+	FIELD_GET(INTEL_GUC_SEND_G2H_DW_MASK, f) + GUC_CTB_HXG_MSG_MIN_LEN : 0
 static int ct_send_nb(struct intel_guc_ct *ct,
 		      const u32 *action,
 		      u32 len,
@@ -570,12 +604,13 @@ static int ct_send_nb(struct intel_guc_ct *ct,
 {
 	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
 	unsigned long spin_flags;
+	u32 g2h_len_dw = G2H_LEN_DW(flags);
 	u32 fence;
 	int ret;
 
 	spin_lock_irqsave(&ctb->lock, spin_flags);
 
-	ret = has_room_nb(ct, len + 1);
+	ret = has_room_nb(ct, len + 1, g2h_len_dw);
 	if (unlikely(ret))
 		goto out;
 
@@ -584,6 +619,7 @@ static int ct_send_nb(struct intel_guc_ct *ct,
 	if (unlikely(ret))
 		goto out;
 
+	g2h_reserve_space(ct, g2h_len_dw);
 	intel_guc_notify(ct_to_guc(ct));
 
 out:
@@ -965,10 +1001,22 @@ static void ct_incoming_request_worker_func(struct work_struct *w)
 static int ct_handle_event(struct intel_guc_ct *ct, struct ct_incoming_msg *request)
 {
 	const u32 *hxg = &request->msg[GUC_CTB_MSG_MIN_LEN];
+	u32 action = FIELD_GET(GUC_HXG_EVENT_MSG_0_ACTION, hxg[0]);
 	unsigned long flags;
 
 	GEM_BUG_ON(FIELD_GET(GUC_HXG_MSG_0_TYPE, hxg[0]) != GUC_HXG_TYPE_EVENT);
 
+	/*
+	 * Adjusting the space must be done in IRQ or deadlock can occur as the
+	 * CTB processing in the below workqueue can send CTBs which creates a
+	 * circular dependency if the space was returned there.
+	 */
+	switch (action) {
+	case INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE:
+	case INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE:
+		atomic_add(request->size, &ct->ctbs.recv.space);
+	}
+
 	spin_lock_irqsave(&ct->requests.lock, flags);
 	list_add_tail(&request->link, &ct->requests.incoming);
 	spin_unlock_irqrestore(&ct->requests.lock, flags);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
index 9924335e2ee6..660bf37238e2 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
@@ -33,6 +33,7 @@ struct intel_guc;
  * @desc: pointer to the buffer descriptor
  * @cmds: pointer to the commands buffer
  * @size: size of the commands buffer in dwords
+ * @resv_space: reserved space in buffer in dwords
  * @head: local shadow copy of head in dwords
  * @tail: local shadow copy of tail in dwords
  * @space: local shadow copy of space in dwords
@@ -43,9 +44,10 @@ struct intel_guc_ct_buffer {
 	struct guc_ct_buffer_desc *desc;
 	u32 *cmds;
 	u32 size;
+	u32 resv_space;
 	u32 tail;
 	u32 head;
-	u32 space;
+	atomic_t space;
 	bool broken;
 };
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
index 4e4edc368b77..94bb1ca6f889 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
@@ -17,6 +17,10 @@
 #include "abi/guc_communication_ctb_abi.h"
 #include "abi/guc_messages_abi.h"
 
+/* Payload length only i.e. don't include G2H header length */
+#define G2H_LEN_DW_SCHED_CONTEXT_MODE_SET	2
+#define G2H_LEN_DW_DEREGISTER_CONTEXT		1
+
 #define GUC_CONTEXT_DISABLE		0
 #define GUC_CONTEXT_ENABLE		1
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 010e46dd6b16..ef24758c4266 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -260,6 +260,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 	struct intel_context *ce = rq->context;
 	u32 action[3];
 	int len = 0;
+	u32 g2h_len_dw = 0;
 	bool enabled = context_enabled(ce);
 
 	GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
@@ -271,13 +272,13 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 		action[len++] = GUC_CONTEXT_ENABLE;
 		set_context_pending_enable(ce);
 		intel_context_get(ce);
+		g2h_len_dw = G2H_LEN_DW_SCHED_CONTEXT_MODE_SET;
 	} else {
 		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT;
 		action[len++] = ce->guc_id;
 	}
 
-	err = intel_guc_send_nb(guc, action, len);
-
+	err = intel_guc_send_nb(guc, action, len, g2h_len_dw);
 	if (!enabled && !err) {
 		set_context_enabled(ce);
 	} else if (!enabled) {
@@ -730,7 +731,7 @@ static int __guc_action_register_context(struct intel_guc *guc,
 		offset,
 	};
 
-	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
+	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
 }
 
 static int register_context(struct intel_context *ce)
@@ -750,7 +751,8 @@ static int __guc_action_deregister_context(struct intel_guc *guc,
 		guc_id,
 	};
 
-	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
+	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action),
+					G2H_LEN_DW_DEREGISTER_CONTEXT, true);
 }
 
 static int deregister_context(struct intel_context *ce, u32 guc_id)
@@ -889,7 +891,8 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
 
 	intel_context_get(ce);
 
-	intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
+	intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action),
+				 G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
 }
 
 static u16 prep_context_pending_disable(struct intel_context *ce)
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* [Intel-gfx] [PATCH 22/47] drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC
  2021-06-24  7:04 [Intel-gfx] [PATCH 00/47] GuC submission support Matthew Brost
                   ` (20 preceding siblings ...)
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 21/47] drm/i915/guc: Ensure G2H response has space in buffer Matthew Brost
@ 2021-06-24  7:04 ` Matthew Brost
  2021-07-10  0:16   ` John Harrison
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 23/47] drm/i915/guc: Update GuC debugfs to support new GuC Matthew Brost
                   ` (29 subsequent siblings)
  51 siblings, 1 reply; 170+ messages in thread
From: Matthew Brost @ 2021-06-24  7:04 UTC (permalink / raw)
  To: intel-gfx, dri-devel

When running the GuC the GPU can't be considered idle if the GuC still
has contexts pinned. As such, a call has been added in
intel_gt_wait_for_idle to idle the UC and in turn the GuC by waiting for
the number of unpinned contexts to go to zero.

v2: rtimeout -> remaining_timeout

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_mman.c      |  3 +-
 drivers/gpu/drm/i915/gt/intel_gt.c            | 19 ++++
 drivers/gpu/drm/i915/gt/intel_gt.h            |  2 +
 drivers/gpu/drm/i915/gt/intel_gt_requests.c   | 22 ++---
 drivers/gpu/drm/i915/gt/intel_gt_requests.h   |  9 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  4 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |  1 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h     |  4 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 88 ++++++++++++++++++-
 drivers/gpu/drm/i915/gt/uc/intel_uc.h         |  5 ++
 drivers/gpu/drm/i915/i915_debugfs.c           |  1 +
 drivers/gpu/drm/i915/i915_gem_evict.c         |  1 +
 .../gpu/drm/i915/selftests/igt_live_test.c    |  2 +-
 .../gpu/drm/i915/selftests/mock_gem_device.c  |  3 +-
 14 files changed, 137 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
index 2fd155742bd2..335b955d5b4b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -644,7 +644,8 @@ mmap_offset_attach(struct drm_i915_gem_object *obj,
 		goto insert;
 
 	/* Attempt to reap some mmap space from dead objects */
-	err = intel_gt_retire_requests_timeout(&i915->gt, MAX_SCHEDULE_TIMEOUT);
+	err = intel_gt_retire_requests_timeout(&i915->gt, MAX_SCHEDULE_TIMEOUT,
+					       NULL);
 	if (err)
 		goto err;
 
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index e714e21c0a4d..acfdd53b2678 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -585,6 +585,25 @@ static void __intel_gt_disable(struct intel_gt *gt)
 	GEM_BUG_ON(intel_gt_pm_is_awake(gt));
 }
 
+int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout)
+{
+	long remaining_timeout;
+
+	/* If the device is asleep, we have no requests outstanding */
+	if (!intel_gt_pm_is_awake(gt))
+		return 0;
+
+	while ((timeout = intel_gt_retire_requests_timeout(gt, timeout,
+							   &remaining_timeout)) > 0) {
+		cond_resched();
+		if (signal_pending(current))
+			return -EINTR;
+	}
+
+	return timeout ? timeout : intel_uc_wait_for_idle(&gt->uc,
+							  remaining_timeout);
+}
+
 int intel_gt_init(struct intel_gt *gt)
 {
 	int err;
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h
index e7aabe0cc5bf..74e771871a9b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt.h
@@ -48,6 +48,8 @@ void intel_gt_driver_release(struct intel_gt *gt);
 
 void intel_gt_driver_late_release(struct intel_gt *gt);
 
+int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout);
+
 void intel_gt_check_and_clear_faults(struct intel_gt *gt);
 void intel_gt_clear_error_registers(struct intel_gt *gt,
 				    intel_engine_mask_t engine_mask);
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
index 647eca9d867a..39f5e824dac5 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
@@ -13,6 +13,7 @@
 #include "intel_gt_pm.h"
 #include "intel_gt_requests.h"
 #include "intel_timeline.h"
+#include "uc/intel_uc.h"
 
 static bool retire_requests(struct intel_timeline *tl)
 {
@@ -130,7 +131,8 @@ void intel_engine_fini_retire(struct intel_engine_cs *engine)
 	GEM_BUG_ON(engine->retire);
 }
 
-long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout)
+long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
+				      long *remaining_timeout)
 {
 	struct intel_gt_timelines *timelines = &gt->timelines;
 	struct intel_timeline *tl, *tn;
@@ -195,22 +197,10 @@ out_active:	spin_lock(&timelines->lock);
 	if (flush_submission(gt, timeout)) /* Wait, there's more! */
 		active_count++;
 
-	return active_count ? timeout : 0;
-}
-
-int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout)
-{
-	/* If the device is asleep, we have no requests outstanding */
-	if (!intel_gt_pm_is_awake(gt))
-		return 0;
-
-	while ((timeout = intel_gt_retire_requests_timeout(gt, timeout)) > 0) {
-		cond_resched();
-		if (signal_pending(current))
-			return -EINTR;
-	}
+	if (remaining_timeout)
+		*remaining_timeout = timeout;
 
-	return timeout;
+	return active_count ? timeout : 0;
 }
 
 static void retire_work_handler(struct work_struct *work)
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.h b/drivers/gpu/drm/i915/gt/intel_gt_requests.h
index fcc30a6e4fe9..51dbe0e3294e 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_requests.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.h
@@ -6,14 +6,17 @@
 #ifndef INTEL_GT_REQUESTS_H
 #define INTEL_GT_REQUESTS_H
 
+#include <stddef.h>
+
 struct intel_engine_cs;
 struct intel_gt;
 struct intel_timeline;
 
-long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout);
+long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
+				      long *remaining_timeout);
 static inline void intel_gt_retire_requests(struct intel_gt *gt)
 {
-	intel_gt_retire_requests_timeout(gt, 0);
+	intel_gt_retire_requests_timeout(gt, 0, NULL);
 }
 
 void intel_engine_init_retire(struct intel_engine_cs *engine);
@@ -21,8 +24,6 @@ void intel_engine_add_retire(struct intel_engine_cs *engine,
 			     struct intel_timeline *tl);
 void intel_engine_fini_retire(struct intel_engine_cs *engine);
 
-int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout);
-
 void intel_gt_init_requests(struct intel_gt *gt);
 void intel_gt_park_requests(struct intel_gt *gt);
 void intel_gt_unpark_requests(struct intel_gt *gt);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 24e7a924134e..22eb1e9cca41 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -38,6 +38,8 @@ struct intel_guc {
 	spinlock_t irq_lock;
 	unsigned int msg_enabled_mask;
 
+	atomic_t outstanding_submission_g2h;
+
 	struct {
 		void (*reset)(struct intel_guc *guc);
 		void (*enable)(struct intel_guc *guc);
@@ -238,6 +240,8 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask)
 	spin_unlock_irq(&guc->irq_lock);
 }
 
+int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout);
+
 int intel_guc_reset_engine(struct intel_guc *guc,
 			   struct intel_engine_cs *engine);
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index a60970e85635..e0f92e28350c 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -109,6 +109,7 @@ void intel_guc_ct_init_early(struct intel_guc_ct *ct)
 	INIT_LIST_HEAD(&ct->requests.incoming);
 	INIT_WORK(&ct->requests.worker, ct_incoming_request_worker_func);
 	tasklet_setup(&ct->receive_tasklet, ct_receive_tasklet_func);
+	init_waitqueue_head(&ct->wq);
 }
 
 static inline const char *guc_ct_buffer_type_to_str(u32 type)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
index 660bf37238e2..ab1b79ab960b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
@@ -10,6 +10,7 @@
 #include <linux/spinlock.h>
 #include <linux/workqueue.h>
 #include <linux/ktime.h>
+#include <linux/wait.h>
 
 #include "intel_guc_fwif.h"
 
@@ -68,6 +69,9 @@ struct intel_guc_ct {
 
 	struct tasklet_struct receive_tasklet;
 
+	/** @wq: wait queue for g2h chanenl */
+	wait_queue_head_t wq;
+
 	struct {
 		u16 last_fence; /* last fence used to send request */
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index ef24758c4266..d1a28283a9ae 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -254,6 +254,74 @@ static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
 	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
 }
 
+static int guc_submission_busy_loop(struct intel_guc* guc,
+				    const u32 *action,
+				    u32 len,
+				    u32 g2h_len_dw,
+				    bool loop)
+{
+	int err;
+
+	err = intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop);
+
+	if (!err && g2h_len_dw)
+		atomic_inc(&guc->outstanding_submission_g2h);
+
+	return err;
+}
+
+static int guc_wait_for_pending_msg(struct intel_guc *guc,
+				    atomic_t *wait_var,
+				    bool interruptible,
+				    long timeout)
+{
+	const int state = interruptible ?
+		TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
+	DEFINE_WAIT(wait);
+
+	might_sleep();
+	GEM_BUG_ON(timeout < 0);
+
+	if (!atomic_read(wait_var))
+		return 0;
+
+	if (!timeout)
+		return -ETIME;
+
+	for (;;) {
+		prepare_to_wait(&guc->ct.wq, &wait, state);
+
+		if (!atomic_read(wait_var))
+			break;
+
+		if (signal_pending_state(state, current)) {
+			timeout = -ERESTARTSYS;
+			break;
+		}
+
+		if (!timeout) {
+			timeout = -ETIME;
+			break;
+		}
+
+		timeout = io_schedule_timeout(timeout);
+	}
+	finish_wait(&guc->ct.wq, &wait);
+
+	return (timeout < 0) ? timeout : 0;
+}
+
+int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout)
+{
+	bool interruptible = true;
+
+	if (unlikely(timeout < 0))
+		timeout = -timeout, interruptible = false;
+
+	return guc_wait_for_pending_msg(guc, &guc->outstanding_submission_g2h,
+					interruptible, timeout);
+}
+
 static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 {
 	int err;
@@ -280,6 +348,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 
 	err = intel_guc_send_nb(guc, action, len, g2h_len_dw);
 	if (!enabled && !err) {
+		atomic_inc(&guc->outstanding_submission_g2h);
 		set_context_enabled(ce);
 	} else if (!enabled) {
 		clr_context_pending_enable(ce);
@@ -731,7 +800,7 @@ static int __guc_action_register_context(struct intel_guc *guc,
 		offset,
 	};
 
-	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
+	return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
 }
 
 static int register_context(struct intel_context *ce)
@@ -751,7 +820,7 @@ static int __guc_action_deregister_context(struct intel_guc *guc,
 		guc_id,
 	};
 
-	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action),
+	return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),
 					G2H_LEN_DW_DEREGISTER_CONTEXT, true);
 }
 
@@ -868,7 +937,9 @@ static int guc_context_pin(struct intel_context *ce, void *vaddr)
 
 static void guc_context_unpin(struct intel_context *ce)
 {
-	unpin_guc_id(ce_to_guc(ce), ce);
+	struct intel_guc *guc = ce_to_guc(ce);
+
+	unpin_guc_id(guc, ce);
 	lrc_unpin(ce);
 }
 
@@ -891,7 +962,7 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
 
 	intel_context_get(ce);
 
-	intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action),
+	guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),
 				 G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
 }
 
@@ -1433,6 +1504,12 @@ g2h_context_lookup(struct intel_guc *guc, u32 desc_idx)
 	return ce;
 }
 
+static void decr_outstanding_submission_g2h(struct intel_guc *guc)
+{
+	if (atomic_dec_and_test(&guc->outstanding_submission_g2h))
+		wake_up_all(&guc->ct.wq);
+}
+
 int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 					  const u32 *msg,
 					  u32 len)
@@ -1468,6 +1545,8 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 		lrc_destroy(&ce->ref);
 	}
 
+	decr_outstanding_submission_g2h(guc);
+
 	return 0;
 }
 
@@ -1516,6 +1595,7 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
 		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 	}
 
+	decr_outstanding_submission_g2h(guc);
 	intel_context_put(ce);
 
 	return 0;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
index 9c954c589edf..c4cef885e984 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
@@ -81,6 +81,11 @@ uc_state_checkers(guc, guc_submission);
 #undef uc_state_checkers
 #undef __uc_state_checker
 
+static inline int intel_uc_wait_for_idle(struct intel_uc *uc, long timeout)
+{
+	return intel_guc_wait_for_idle(&uc->guc, timeout);
+}
+
 #define intel_uc_ops_function(_NAME, _OPS, _TYPE, _RET) \
 static inline _TYPE intel_uc_##_NAME(struct intel_uc *uc) \
 { \
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index cc745751ac53..277800987bf8 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -36,6 +36,7 @@
 #include "gt/intel_gt_clock_utils.h"
 #include "gt/intel_gt.h"
 #include "gt/intel_gt_pm.h"
+#include "gt/intel_gt.h"
 #include "gt/intel_gt_requests.h"
 #include "gt/intel_reset.h"
 #include "gt/intel_rc6.h"
diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index 4d2d59a9942b..2b73ddb11c66 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -27,6 +27,7 @@
  */
 
 #include "gem/i915_gem_context.h"
+#include "gt/intel_gt.h"
 #include "gt/intel_gt_requests.h"
 
 #include "i915_drv.h"
diff --git a/drivers/gpu/drm/i915/selftests/igt_live_test.c b/drivers/gpu/drm/i915/selftests/igt_live_test.c
index c130010a7033..1c721542e277 100644
--- a/drivers/gpu/drm/i915/selftests/igt_live_test.c
+++ b/drivers/gpu/drm/i915/selftests/igt_live_test.c
@@ -5,7 +5,7 @@
  */
 
 #include "i915_drv.h"
-#include "gt/intel_gt_requests.h"
+#include "gt/intel_gt.h"
 
 #include "../i915_selftest.h"
 #include "igt_flush_test.h"
diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
index d189c4bd4bef..4f8180146888 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
@@ -52,7 +52,8 @@ void mock_device_flush(struct drm_i915_private *i915)
 	do {
 		for_each_engine(engine, gt, id)
 			mock_engine_flush(engine);
-	} while (intel_gt_retire_requests_timeout(gt, MAX_SCHEDULE_TIMEOUT));
+	} while (intel_gt_retire_requests_timeout(gt, MAX_SCHEDULE_TIMEOUT,
+						  NULL));
 }
 
 static void mock_device_release(struct drm_device *dev)
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* [Intel-gfx] [PATCH 23/47] drm/i915/guc: Update GuC debugfs to support new GuC
  2021-06-24  7:04 [Intel-gfx] [PATCH 00/47] GuC submission support Matthew Brost
                   ` (21 preceding siblings ...)
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 22/47] drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC Matthew Brost
@ 2021-06-24  7:04 ` Matthew Brost
  2021-07-12 18:05   ` John Harrison
  2021-07-13  8:51   ` Michal Wajdeczko
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 24/47] drm/i915/guc: Add several request trace points Matthew Brost
                   ` (28 subsequent siblings)
  51 siblings, 2 replies; 170+ messages in thread
From: Matthew Brost @ 2021-06-24  7:04 UTC (permalink / raw)
  To: intel-gfx, dri-devel

Update GuC debugfs to support the new GuC structures.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     | 22 ++++++++
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h     |  3 ++
 .../gpu/drm/i915/gt/uc/intel_guc_debugfs.c    | 23 +++++++-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 52 +++++++++++++++++++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.h |  4 ++
 drivers/gpu/drm/i915/i915_debugfs.c           |  1 +
 6 files changed, 104 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index e0f92e28350c..4ed074df88e5 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -1135,3 +1135,25 @@ void intel_guc_ct_event_handler(struct intel_guc_ct *ct)
 
 	ct_try_receive_message(ct);
 }
+
+void intel_guc_log_ct_info(struct intel_guc_ct *ct,
+			   struct drm_printer *p)
+{
+	if (!ct->enabled) {
+		drm_puts(p, "CT disabled\n");
+		return;
+	}
+
+	drm_printf(p, "H2G Space: %u\n",
+		   atomic_read(&ct->ctbs.send.space) * 4);
+	drm_printf(p, "Head: %u\n",
+		   ct->ctbs.send.desc->head);
+	drm_printf(p, "Tail: %u\n",
+		   ct->ctbs.send.desc->tail);
+	drm_printf(p, "G2H Space: %u\n",
+		   atomic_read(&ct->ctbs.recv.space) * 4);
+	drm_printf(p, "Head: %u\n",
+		   ct->ctbs.recv.desc->head);
+	drm_printf(p, "Tail: %u\n",
+		   ct->ctbs.recv.desc->tail);
+}
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
index ab1b79ab960b..f62eb06b32fc 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
@@ -16,6 +16,7 @@
 
 struct i915_vma;
 struct intel_guc;
+struct drm_printer;
 
 /**
  * DOC: Command Transport (CT).
@@ -106,4 +107,6 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
 		      u32 *response_buf, u32 response_buf_size, u32 flags);
 void intel_guc_ct_event_handler(struct intel_guc_ct *ct);
 
+void intel_guc_log_ct_info(struct intel_guc_ct *ct, struct drm_printer *p);
+
 #endif /* _INTEL_GUC_CT_H_ */
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
index fe7cb7b29a1e..62b9ce0fafaa 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
@@ -9,6 +9,8 @@
 #include "intel_guc.h"
 #include "intel_guc_debugfs.h"
 #include "intel_guc_log_debugfs.h"
+#include "gt/uc/intel_guc_ct.h"
+#include "gt/uc/intel_guc_submission.h"
 
 static int guc_info_show(struct seq_file *m, void *data)
 {
@@ -22,16 +24,35 @@ static int guc_info_show(struct seq_file *m, void *data)
 	drm_puts(&p, "\n");
 	intel_guc_log_info(&guc->log, &p);
 
-	/* Add more as required ... */
+	if (!intel_guc_submission_is_used(guc))
+		return 0;
+
+	intel_guc_log_ct_info(&guc->ct, &p);
+	intel_guc_log_submission_info(guc, &p);
 
 	return 0;
 }
 DEFINE_GT_DEBUGFS_ATTRIBUTE(guc_info);
 
+static int guc_registered_contexts_show(struct seq_file *m, void *data)
+{
+	struct intel_guc *guc = m->private;
+	struct drm_printer p = drm_seq_file_printer(m);
+
+	if (!intel_guc_submission_is_used(guc))
+		return -ENODEV;
+
+	intel_guc_log_context_info(guc, &p);
+
+	return 0;
+}
+DEFINE_GT_DEBUGFS_ATTRIBUTE(guc_registered_contexts);
+
 void intel_guc_debugfs_register(struct intel_guc *guc, struct dentry *root)
 {
 	static const struct debugfs_gt_file files[] = {
 		{ "guc_info", &guc_info_fops, NULL },
+		{ "guc_registered_contexts", &guc_registered_contexts_fops, NULL },
 	};
 
 	if (!intel_guc_is_supported(guc))
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index d1a28283a9ae..89b3c7e5d15b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1600,3 +1600,55 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
 
 	return 0;
 }
+
+void intel_guc_log_submission_info(struct intel_guc *guc,
+				   struct drm_printer *p)
+{
+	struct i915_sched_engine *sched_engine = guc->sched_engine;
+	struct rb_node *rb;
+	unsigned long flags;
+
+	drm_printf(p, "GuC Number Outstanding Submission G2H: %u\n",
+		   atomic_read(&guc->outstanding_submission_g2h));
+	drm_printf(p, "GuC tasklet count: %u\n\n",
+		   atomic_read(&sched_engine->tasklet.count));
+
+	spin_lock_irqsave(&sched_engine->lock, flags);
+	drm_printf(p, "Requests in GuC submit tasklet:\n");
+	for (rb = rb_first_cached(&sched_engine->queue); rb; rb = rb_next(rb)) {
+		struct i915_priolist *pl = to_priolist(rb);
+		struct i915_request *rq;
+
+		priolist_for_each_request(rq, pl)
+			drm_printf(p, "guc_id=%u, seqno=%llu\n",
+				   rq->context->guc_id,
+				   rq->fence.seqno);
+	}
+	spin_unlock_irqrestore(&sched_engine->lock, flags);
+	drm_printf(p, "\n");
+}
+
+void intel_guc_log_context_info(struct intel_guc *guc,
+				struct drm_printer *p)
+{
+	struct intel_context *ce;
+	unsigned long index;
+
+	xa_for_each(&guc->context_lookup, index, ce) {
+		drm_printf(p, "GuC lrc descriptor %u:\n", ce->guc_id);
+		drm_printf(p, "\tHW Context Desc: 0x%08x\n", ce->lrc.lrca);
+		drm_printf(p, "\t\tLRC Head: Internal %u, Memory %u\n",
+			   ce->ring->head,
+			   ce->lrc_reg_state[CTX_RING_HEAD]);
+		drm_printf(p, "\t\tLRC Tail: Internal %u, Memory %u\n",
+			   ce->ring->tail,
+			   ce->lrc_reg_state[CTX_RING_TAIL]);
+		drm_printf(p, "\t\tContext Pin Count: %u\n",
+			   atomic_read(&ce->pin_count));
+		drm_printf(p, "\t\tGuC ID Ref Count: %u\n",
+			   atomic_read(&ce->guc_id_ref));
+		drm_printf(p, "\t\tSchedule State: 0x%x, 0x%x\n\n",
+			   ce->guc_state.sched_state,
+			   atomic_read(&ce->guc_sched_state_no_lock));
+	}
+}
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
index 3f7005018939..6453e2bfa151 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
@@ -10,6 +10,7 @@
 
 #include "intel_guc.h"
 
+struct drm_printer;
 struct intel_engine_cs;
 
 void intel_guc_submission_init_early(struct intel_guc *guc);
@@ -20,6 +21,9 @@ void intel_guc_submission_fini(struct intel_guc *guc);
 int intel_guc_preempt_work_create(struct intel_guc *guc);
 void intel_guc_preempt_work_destroy(struct intel_guc *guc);
 int intel_guc_submission_setup(struct intel_engine_cs *engine);
+void intel_guc_log_submission_info(struct intel_guc *guc,
+				   struct drm_printer *p);
+void intel_guc_log_context_info(struct intel_guc *guc, struct drm_printer *p);
 
 static inline bool intel_guc_submission_is_supported(struct intel_guc *guc)
 {
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 277800987bf8..a9084789deff 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -50,6 +50,7 @@
 #include "i915_trace.h"
 #include "intel_pm.h"
 #include "intel_sideband.h"
+#include "gt/intel_lrc_reg.h"
 
 static inline struct drm_i915_private *node_to_i915(struct drm_info_node *node)
 {
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* [Intel-gfx] [PATCH 24/47] drm/i915/guc: Add several request trace points
  2021-06-24  7:04 [Intel-gfx] [PATCH 00/47] GuC submission support Matthew Brost
                   ` (22 preceding siblings ...)
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 23/47] drm/i915/guc: Update GuC debugfs to support new GuC Matthew Brost
@ 2021-06-24  7:04 ` Matthew Brost
  2021-07-12 18:08   ` John Harrison
  2021-07-13  9:06   ` Tvrtko Ursulin
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 25/47] drm/i915: Add intel_context tracing Matthew Brost
                   ` (27 subsequent siblings)
  51 siblings, 2 replies; 170+ messages in thread
From: Matthew Brost @ 2021-06-24  7:04 UTC (permalink / raw)
  To: intel-gfx, dri-devel

Add trace points for request dependencies and GuC submit. Extended
existing request trace points to include submit fence value,, guc_id,
and ring tail value.

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  3 ++
 drivers/gpu/drm/i915/i915_request.c           |  3 ++
 drivers/gpu/drm/i915/i915_trace.h             | 39 ++++++++++++++++++-
 3 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 89b3c7e5d15b..c2327eebc09c 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -422,6 +422,7 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
 			guc->stalled_request = last;
 			return false;
 		}
+		trace_i915_request_guc_submit(last);
 	}
 
 	guc->stalled_request = NULL;
@@ -642,6 +643,8 @@ static int guc_bypass_tasklet_submit(struct intel_guc *guc,
 	ret = guc_add_request(guc, rq);
 	if (ret == -EBUSY)
 		guc->stalled_request = rq;
+	else
+		trace_i915_request_guc_submit(rq);
 
 	return ret;
 }
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index d92c9f25c9f4..7f7aa096e873 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1344,6 +1344,9 @@ __i915_request_await_execution(struct i915_request *to,
 			return err;
 	}
 
+	trace_i915_request_dep_to(to);
+	trace_i915_request_dep_from(from);
+
 	/* Couple the dependency tree for PI on this exposed to->fence */
 	if (to->engine->sched_engine->schedule) {
 		err = i915_sched_node_add_dependency(&to->sched,
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 6778ad2a14a4..b02d04b6c8f6 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -794,22 +794,27 @@ DECLARE_EVENT_CLASS(i915_request,
 	    TP_STRUCT__entry(
 			     __field(u32, dev)
 			     __field(u64, ctx)
+			     __field(u32, guc_id)
 			     __field(u16, class)
 			     __field(u16, instance)
 			     __field(u32, seqno)
+			     __field(u32, tail)
 			     ),
 
 	    TP_fast_assign(
 			   __entry->dev = rq->engine->i915->drm.primary->index;
 			   __entry->class = rq->engine->uabi_class;
 			   __entry->instance = rq->engine->uabi_instance;
+			   __entry->guc_id = rq->context->guc_id;
 			   __entry->ctx = rq->fence.context;
 			   __entry->seqno = rq->fence.seqno;
+			   __entry->tail = rq->tail;
 			   ),
 
-	    TP_printk("dev=%u, engine=%u:%u, ctx=%llu, seqno=%u",
+	    TP_printk("dev=%u, engine=%u:%u, guc_id=%u, ctx=%llu, seqno=%u, tail=%u",
 		      __entry->dev, __entry->class, __entry->instance,
-		      __entry->ctx, __entry->seqno)
+		      __entry->guc_id, __entry->ctx, __entry->seqno,
+		      __entry->tail)
 );
 
 DEFINE_EVENT(i915_request, i915_request_add,
@@ -818,6 +823,21 @@ DEFINE_EVENT(i915_request, i915_request_add,
 );
 
 #if defined(CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS)
+DEFINE_EVENT(i915_request, i915_request_dep_to,
+	     TP_PROTO(struct i915_request *rq),
+	     TP_ARGS(rq)
+);
+
+DEFINE_EVENT(i915_request, i915_request_dep_from,
+	     TP_PROTO(struct i915_request *rq),
+	     TP_ARGS(rq)
+);
+
+DEFINE_EVENT(i915_request, i915_request_guc_submit,
+	     TP_PROTO(struct i915_request *rq),
+	     TP_ARGS(rq)
+);
+
 DEFINE_EVENT(i915_request, i915_request_submit,
 	     TP_PROTO(struct i915_request *rq),
 	     TP_ARGS(rq)
@@ -887,6 +907,21 @@ TRACE_EVENT(i915_request_out,
 
 #else
 #if !defined(TRACE_HEADER_MULTI_READ)
+static inline void
+trace_i915_request_dep_to(struct i915_request *rq)
+{
+}
+
+static inline void
+trace_i915_request_dep_from(struct i915_request *rq)
+{
+}
+
+static inline void
+trace_i915_request_guc_submit(struct i915_request *rq)
+{
+}
+
 static inline void
 trace_i915_request_submit(struct i915_request *rq)
 {
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* [Intel-gfx] [PATCH 25/47] drm/i915: Add intel_context tracing
  2021-06-24  7:04 [Intel-gfx] [PATCH 00/47] GuC submission support Matthew Brost
                   ` (23 preceding siblings ...)
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 24/47] drm/i915/guc: Add several request trace points Matthew Brost
@ 2021-06-24  7:04 ` Matthew Brost
  2021-07-12 18:10   ` John Harrison
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 26/47] drm/i915/guc: GuC virtual engines Matthew Brost
                   ` (26 subsequent siblings)
  51 siblings, 1 reply; 170+ messages in thread
From: Matthew Brost @ 2021-06-24  7:04 UTC (permalink / raw)
  To: intel-gfx, dri-devel

Add intel_context tracing. These trace points are particular helpful
when debugging the GuC firmware and can be enabled via
CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS kernel config option.

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context.c       |   6 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  14 ++
 drivers/gpu/drm/i915/i915_trace.h             | 148 +++++++++++++++++-
 3 files changed, 166 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index 7f97753ab164..b24a1b7a3f88 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -8,6 +8,7 @@
 
 #include "i915_drv.h"
 #include "i915_globals.h"
+#include "i915_trace.h"
 
 #include "intel_context.h"
 #include "intel_engine.h"
@@ -28,6 +29,7 @@ static void rcu_context_free(struct rcu_head *rcu)
 {
 	struct intel_context *ce = container_of(rcu, typeof(*ce), rcu);
 
+	trace_intel_context_free(ce);
 	kmem_cache_free(global.slab_ce, ce);
 }
 
@@ -46,6 +48,7 @@ intel_context_create(struct intel_engine_cs *engine)
 		return ERR_PTR(-ENOMEM);
 
 	intel_context_init(ce, engine);
+	trace_intel_context_create(ce);
 	return ce;
 }
 
@@ -268,6 +271,8 @@ int __intel_context_do_pin_ww(struct intel_context *ce,
 
 	GEM_BUG_ON(!intel_context_is_pinned(ce)); /* no overflow! */
 
+	trace_intel_context_do_pin(ce);
+
 err_unlock:
 	mutex_unlock(&ce->pin_mutex);
 err_post_unpin:
@@ -323,6 +328,7 @@ void __intel_context_do_unpin(struct intel_context *ce, int sub)
 	 */
 	intel_context_get(ce);
 	intel_context_active_release(ce);
+	trace_intel_context_do_unpin(ce);
 	intel_context_put(ce);
 }
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index c2327eebc09c..d605af0d66e6 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -348,6 +348,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 
 	err = intel_guc_send_nb(guc, action, len, g2h_len_dw);
 	if (!enabled && !err) {
+		trace_intel_context_sched_enable(ce);
 		atomic_inc(&guc->outstanding_submission_g2h);
 		set_context_enabled(ce);
 	} else if (!enabled) {
@@ -812,6 +813,8 @@ static int register_context(struct intel_context *ce)
 	u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
 		ce->guc_id * sizeof(struct guc_lrc_desc);
 
+	trace_intel_context_register(ce);
+
 	return __guc_action_register_context(guc, ce->guc_id, offset);
 }
 
@@ -831,6 +834,8 @@ static int deregister_context(struct intel_context *ce, u32 guc_id)
 {
 	struct intel_guc *guc = ce_to_guc(ce);
 
+	trace_intel_context_deregister(ce);
+
 	return __guc_action_deregister_context(guc, guc_id);
 }
 
@@ -905,6 +910,7 @@ static int guc_lrc_desc_pin(struct intel_context *ce)
 	 * GuC before registering this context.
 	 */
 	if (context_registered) {
+		trace_intel_context_steal_guc_id(ce);
 		set_context_wait_for_deregister_to_register(ce);
 		intel_context_get(ce);
 
@@ -963,6 +969,7 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
 
 	GEM_BUG_ON(guc_id == GUC_INVALID_LRC_ID);
 
+	trace_intel_context_sched_disable(ce);
 	intel_context_get(ce);
 
 	guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),
@@ -1119,6 +1126,9 @@ static void __guc_signal_context_fence(struct intel_context *ce)
 
 	lockdep_assert_held(&ce->guc_state.lock);
 
+	if (!list_empty(&ce->guc_state.fences))
+		trace_intel_context_fence_release(ce);
+
 	list_for_each_entry(rq, &ce->guc_state.fences, guc_fence_link)
 		i915_sw_fence_complete(&rq->submit);
 
@@ -1529,6 +1539,8 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 	if (unlikely(!ce))
 		return -EPROTO;
 
+	trace_intel_context_deregister_done(ce);
+
 	if (context_wait_for_deregister_to_register(ce)) {
 		struct intel_runtime_pm *runtime_pm =
 			&ce->engine->gt->i915->runtime_pm;
@@ -1580,6 +1592,8 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
 		return -EPROTO;
 	}
 
+	trace_intel_context_sched_done(ce);
+
 	if (context_pending_enable(ce)) {
 		clr_context_pending_enable(ce);
 	} else if (context_pending_disable(ce)) {
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index b02d04b6c8f6..97c2e83984ed 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -818,8 +818,8 @@ DECLARE_EVENT_CLASS(i915_request,
 );
 
 DEFINE_EVENT(i915_request, i915_request_add,
-	    TP_PROTO(struct i915_request *rq),
-	    TP_ARGS(rq)
+	     TP_PROTO(struct i915_request *rq),
+	     TP_ARGS(rq)
 );
 
 #if defined(CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS)
@@ -905,6 +905,90 @@ TRACE_EVENT(i915_request_out,
 			      __entry->ctx, __entry->seqno, __entry->completed)
 );
 
+DECLARE_EVENT_CLASS(intel_context,
+	    TP_PROTO(struct intel_context *ce),
+	    TP_ARGS(ce),
+
+	    TP_STRUCT__entry(
+			     __field(u32, guc_id)
+			     __field(int, pin_count)
+			     __field(u32, sched_state)
+			     __field(u32, guc_sched_state_no_lock)
+			     ),
+
+	    TP_fast_assign(
+			   __entry->guc_id = ce->guc_id;
+			   __entry->pin_count = atomic_read(&ce->pin_count);
+			   __entry->sched_state = ce->guc_state.sched_state;
+			   __entry->guc_sched_state_no_lock =
+			   atomic_read(&ce->guc_sched_state_no_lock);
+			   ),
+
+	    TP_printk("guc_id=%d, pin_count=%d sched_state=0x%x,0x%x",
+		      __entry->guc_id, __entry->pin_count, __entry->sched_state,
+		      __entry->guc_sched_state_no_lock)
+);
+
+DEFINE_EVENT(intel_context, intel_context_register,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_deregister,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_deregister_done,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_sched_enable,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_sched_disable,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_sched_done,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_create,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_fence_release,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_free,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_steal_guc_id,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_do_pin,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_do_unpin,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
 #else
 #if !defined(TRACE_HEADER_MULTI_READ)
 static inline void
@@ -941,6 +1025,66 @@ static inline void
 trace_i915_request_out(struct i915_request *rq)
 {
 }
+
+static inline void
+trace_intel_context_register(struct intel_context *ce)
+{
+}
+
+static inline void
+trace_intel_context_deregister(struct intel_context *ce)
+{
+}
+
+static inline void
+trace_intel_context_deregister_done(struct intel_context *ce)
+{
+}
+
+static inline void
+trace_intel_context_sched_enable(struct intel_context *ce)
+{
+}
+
+static inline void
+trace_intel_context_sched_disable(struct intel_context *ce)
+{
+}
+
+static inline void
+trace_intel_context_sched_done(struct intel_context *ce)
+{
+}
+
+static inline void
+trace_intel_context_create(struct intel_context *ce)
+{
+}
+
+static inline void
+trace_intel_context_fence_release(struct intel_context *ce)
+{
+}
+
+static inline void
+trace_intel_context_free(struct intel_context *ce)
+{
+}
+
+static inline void
+trace_intel_context_steal_guc_id(struct intel_context *ce)
+{
+}
+
+static inline void
+trace_intel_context_do_pin(struct intel_context *ce)
+{
+}
+
+static inline void
+trace_intel_context_do_unpin(struct intel_context *ce)
+{
+}
 #endif
 #endif
 
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* [Intel-gfx] [PATCH 26/47] drm/i915/guc: GuC virtual engines
  2021-06-24  7:04 [Intel-gfx] [PATCH 00/47] GuC submission support Matthew Brost
                   ` (24 preceding siblings ...)
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 25/47] drm/i915: Add intel_context tracing Matthew Brost
@ 2021-06-24  7:04 ` Matthew Brost
  2021-07-15  1:21   ` Daniele Ceraolo Spurio
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 27/47] drm/i915: Track 'serial' counts for " Matthew Brost
                   ` (25 subsequent siblings)
  51 siblings, 1 reply; 170+ messages in thread
From: Matthew Brost @ 2021-06-24  7:04 UTC (permalink / raw)
  To: intel-gfx, dri-devel

Implement GuC virtual engines. Rather simple implementation, basically
just allocate an engine, setup context enter / exit function to virtual
engine specific functions, set all other variables / functions to guc
versions, and set the engine mask to that of all the siblings.

Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c   |  19 +-
 drivers/gpu/drm/i915/gem/i915_gem_context.h   |   1 +
 drivers/gpu/drm/i915/gt/intel_context_types.h |  10 +
 drivers/gpu/drm/i915/gt/intel_engine.h        |  45 +++-
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     |  14 +
 .../drm/i915/gt/intel_execlists_submission.c  | 186 +++++++------
 .../drm/i915/gt/intel_execlists_submission.h  |  11 -
 drivers/gpu/drm/i915/gt/selftest_execlists.c  |  20 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 253 +++++++++++++++++-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.h |   2 +
 10 files changed, 429 insertions(+), 132 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 5c07e6abf16a..8a9293e0ca92 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -72,7 +72,6 @@
 #include "gt/intel_context_param.h"
 #include "gt/intel_engine_heartbeat.h"
 #include "gt/intel_engine_user.h"
-#include "gt/intel_execlists_submission.h" /* virtual_engine */
 #include "gt/intel_gpu_commands.h"
 #include "gt/intel_ring.h"
 
@@ -1568,9 +1567,6 @@ set_engines__load_balance(struct i915_user_extension __user *base, void *data)
 	if (!HAS_EXECLISTS(i915))
 		return -ENODEV;
 
-	if (intel_uc_uses_guc_submission(&i915->gt.uc))
-		return -ENODEV; /* not implement yet */
-
 	if (get_user(idx, &ext->engine_index))
 		return -EFAULT;
 
@@ -1627,7 +1623,7 @@ set_engines__load_balance(struct i915_user_extension __user *base, void *data)
 		}
 	}
 
-	ce = intel_execlists_create_virtual(siblings, n);
+	ce = intel_engine_create_virtual(siblings, n);
 	if (IS_ERR(ce)) {
 		err = PTR_ERR(ce);
 		goto out_siblings;
@@ -1723,13 +1719,9 @@ set_engines__bond(struct i915_user_extension __user *base, void *data)
 		 * A non-virtual engine has no siblings to choose between; and
 		 * a submit fence will always be directed to the one engine.
 		 */
-		if (intel_engine_is_virtual(virtual)) {
-			err = intel_virtual_engine_attach_bond(virtual,
-							       master,
-							       bond);
-			if (err)
-				return err;
-		}
+		err = intel_engine_attach_bond(virtual, master, bond);
+		if (err)
+			return err;
 	}
 
 	return 0;
@@ -2116,8 +2108,7 @@ static int clone_engines(struct i915_gem_context *dst,
 		 * the virtual engine instead.
 		 */
 		if (intel_engine_is_virtual(engine))
-			clone->engines[n] =
-				intel_execlists_clone_virtual(engine);
+			clone->engines[n] = intel_engine_clone_virtual(engine);
 		else
 			clone->engines[n] = intel_context_create(engine);
 		if (IS_ERR_OR_NULL(clone->engines[n])) {
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.h b/drivers/gpu/drm/i915/gem/i915_gem_context.h
index b5c908f3f4f2..ba772762f7b9 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.h
@@ -10,6 +10,7 @@
 #include "i915_gem_context_types.h"
 
 #include "gt/intel_context.h"
+#include "gt/intel_engine.h"
 
 #include "i915_drv.h"
 #include "i915_gem.h"
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index e7af6a2368f8..6945963a31ba 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -47,6 +47,16 @@ struct intel_context_ops {
 
 	void (*reset)(struct intel_context *ce);
 	void (*destroy)(struct kref *kref);
+
+	/* virtual engine/context interface */
+	struct intel_context *(*create_virtual)(struct intel_engine_cs **engine,
+						unsigned int count);
+	struct intel_context *(*clone_virtual)(struct intel_engine_cs *engine);
+	struct intel_engine_cs *(*get_sibling)(struct intel_engine_cs *engine,
+					       unsigned int sibling);
+	int (*attach_bond)(struct intel_engine_cs *engine,
+			   const struct intel_engine_cs *master,
+			   const struct intel_engine_cs *sibling);
 };
 
 struct intel_context {
diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
index f911c1224ab2..923eaee627b3 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -273,13 +273,56 @@ intel_engine_has_preempt_reset(const struct intel_engine_cs *engine)
 	return intel_engine_has_preemption(engine);
 }
 
+struct intel_context *
+intel_engine_create_virtual(struct intel_engine_cs **siblings,
+			    unsigned int count);
+
+static inline bool
+intel_virtual_engine_has_heartbeat(const struct intel_engine_cs *engine)
+{
+	if (intel_engine_uses_guc(engine))
+		return intel_guc_virtual_engine_has_heartbeat(engine);
+	else
+		GEM_BUG_ON("Only should be called in GuC submission");
+
+	return false;
+}
+
 static inline bool
 intel_engine_has_heartbeat(const struct intel_engine_cs *engine)
 {
 	if (!IS_ACTIVE(CONFIG_DRM_I915_HEARTBEAT_INTERVAL))
 		return false;
 
-	return READ_ONCE(engine->props.heartbeat_interval_ms);
+	if (intel_engine_is_virtual(engine))
+		return intel_virtual_engine_has_heartbeat(engine);
+	else
+		return READ_ONCE(engine->props.heartbeat_interval_ms);
+}
+
+static inline struct intel_context *
+intel_engine_clone_virtual(struct intel_engine_cs *src)
+{
+	GEM_BUG_ON(!intel_engine_is_virtual(src));
+	return src->cops->clone_virtual(src);
+}
+
+static inline int
+intel_engine_attach_bond(struct intel_engine_cs *engine,
+			 const struct intel_engine_cs *master,
+			 const struct intel_engine_cs *sibling)
+{
+	if (!engine->cops->attach_bond)
+		return 0;
+
+	return engine->cops->attach_bond(engine, master, sibling);
+}
+
+static inline struct intel_engine_cs *
+intel_engine_get_sibling(struct intel_engine_cs *engine, unsigned int sibling)
+{
+	GEM_BUG_ON(!intel_engine_is_virtual(engine));
+	return engine->cops->get_sibling(engine, sibling);
 }
 
 #endif /* _INTEL_RINGBUFFER_H_ */
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 88694822716a..d13b1716c29e 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -1736,6 +1736,20 @@ ktime_t intel_engine_get_busy_time(struct intel_engine_cs *engine, ktime_t *now)
 	return total;
 }
 
+struct intel_context *
+intel_engine_create_virtual(struct intel_engine_cs **siblings,
+			    unsigned int count)
+{
+	if (count == 0)
+		return ERR_PTR(-EINVAL);
+
+	if (count == 1)
+		return intel_context_create(siblings[0]);
+
+	GEM_BUG_ON(!siblings[0]->cops->create_virtual);
+	return siblings[0]->cops->create_virtual(siblings, count);
+}
+
 static bool match_ring(struct i915_request *rq)
 {
 	u32 ring = ENGINE_READ(rq->engine, RING_START);
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index cdb2126a159a..bd4ced794ff9 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -205,6 +205,9 @@ static struct virtual_engine *to_virtual_engine(struct intel_engine_cs *engine)
 	return container_of(engine, struct virtual_engine, base);
 }
 
+static struct intel_context *
+execlists_create_virtual(struct intel_engine_cs **siblings, unsigned int count);
+
 static struct i915_request *
 __active_request(const struct intel_timeline * const tl,
 		 struct i915_request *rq,
@@ -2560,6 +2563,8 @@ static const struct intel_context_ops execlists_context_ops = {
 
 	.reset = lrc_reset,
 	.destroy = lrc_destroy,
+
+	.create_virtual = execlists_create_virtual,
 };
 
 static int emit_pdps(struct i915_request *rq)
@@ -3506,6 +3511,94 @@ static void virtual_context_exit(struct intel_context *ce)
 		intel_engine_pm_put(ve->siblings[n]);
 }
 
+static struct intel_engine_cs *
+virtual_get_sibling(struct intel_engine_cs *engine, unsigned int sibling)
+{
+	struct virtual_engine *ve = to_virtual_engine(engine);
+
+	if (sibling >= ve->num_siblings)
+		return NULL;
+
+	return ve->siblings[sibling];
+}
+
+static struct intel_context *
+virtual_clone(struct intel_engine_cs *src)
+{
+	struct virtual_engine *se = to_virtual_engine(src);
+	struct intel_context *dst;
+
+	dst = execlists_create_virtual(se->siblings, se->num_siblings);
+	if (IS_ERR(dst))
+		return dst;
+
+	if (se->num_bonds) {
+		struct virtual_engine *de = to_virtual_engine(dst->engine);
+
+		de->bonds = kmemdup(se->bonds,
+				    sizeof(*se->bonds) * se->num_bonds,
+				    GFP_KERNEL);
+		if (!de->bonds) {
+			intel_context_put(dst);
+			return ERR_PTR(-ENOMEM);
+		}
+
+		de->num_bonds = se->num_bonds;
+	}
+
+	return dst;
+}
+
+static struct ve_bond *
+virtual_find_bond(struct virtual_engine *ve,
+		  const struct intel_engine_cs *master)
+{
+	int i;
+
+	for (i = 0; i < ve->num_bonds; i++) {
+		if (ve->bonds[i].master == master)
+			return &ve->bonds[i];
+	}
+
+	return NULL;
+}
+
+static int virtual_attach_bond(struct intel_engine_cs *engine,
+			       const struct intel_engine_cs *master,
+			       const struct intel_engine_cs *sibling)
+{
+	struct virtual_engine *ve = to_virtual_engine(engine);
+	struct ve_bond *bond;
+	int n;
+
+	/* Sanity check the sibling is part of the virtual engine */
+	for (n = 0; n < ve->num_siblings; n++)
+		if (sibling == ve->siblings[n])
+			break;
+	if (n == ve->num_siblings)
+		return -EINVAL;
+
+	bond = virtual_find_bond(ve, master);
+	if (bond) {
+		bond->sibling_mask |= sibling->mask;
+		return 0;
+	}
+
+	bond = krealloc(ve->bonds,
+			sizeof(*bond) * (ve->num_bonds + 1),
+			GFP_KERNEL);
+	if (!bond)
+		return -ENOMEM;
+
+	bond[ve->num_bonds].master = master;
+	bond[ve->num_bonds].sibling_mask = sibling->mask;
+
+	ve->bonds = bond;
+	ve->num_bonds++;
+
+	return 0;
+}
+
 static const struct intel_context_ops virtual_context_ops = {
 	.flags = COPS_HAS_INFLIGHT,
 
@@ -3520,6 +3613,10 @@ static const struct intel_context_ops virtual_context_ops = {
 	.exit = virtual_context_exit,
 
 	.destroy = virtual_context_destroy,
+
+	.clone_virtual = virtual_clone,
+	.get_sibling = virtual_get_sibling,
+	.attach_bond = virtual_attach_bond,
 };
 
 static intel_engine_mask_t virtual_submission_mask(struct virtual_engine *ve)
@@ -3668,20 +3765,6 @@ static void virtual_submit_request(struct i915_request *rq)
 	spin_unlock_irqrestore(&ve->base.sched_engine->lock, flags);
 }
 
-static struct ve_bond *
-virtual_find_bond(struct virtual_engine *ve,
-		  const struct intel_engine_cs *master)
-{
-	int i;
-
-	for (i = 0; i < ve->num_bonds; i++) {
-		if (ve->bonds[i].master == master)
-			return &ve->bonds[i];
-	}
-
-	return NULL;
-}
-
 static void
 virtual_bond_execute(struct i915_request *rq, struct dma_fence *signal)
 {
@@ -3704,20 +3787,13 @@ virtual_bond_execute(struct i915_request *rq, struct dma_fence *signal)
 	to_request(signal)->execution_mask &= ~allowed;
 }
 
-struct intel_context *
-intel_execlists_create_virtual(struct intel_engine_cs **siblings,
-			       unsigned int count)
+static struct intel_context *
+execlists_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
 {
 	struct virtual_engine *ve;
 	unsigned int n;
 	int err;
 
-	if (count == 0)
-		return ERR_PTR(-EINVAL);
-
-	if (count == 1)
-		return intel_context_create(siblings[0]);
-
 	ve = kzalloc(struct_size(ve, siblings, count), GFP_KERNEL);
 	if (!ve)
 		return ERR_PTR(-ENOMEM);
@@ -3850,70 +3926,6 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings,
 	return ERR_PTR(err);
 }
 
-struct intel_context *
-intel_execlists_clone_virtual(struct intel_engine_cs *src)
-{
-	struct virtual_engine *se = to_virtual_engine(src);
-	struct intel_context *dst;
-
-	dst = intel_execlists_create_virtual(se->siblings,
-					     se->num_siblings);
-	if (IS_ERR(dst))
-		return dst;
-
-	if (se->num_bonds) {
-		struct virtual_engine *de = to_virtual_engine(dst->engine);
-
-		de->bonds = kmemdup(se->bonds,
-				    sizeof(*se->bonds) * se->num_bonds,
-				    GFP_KERNEL);
-		if (!de->bonds) {
-			intel_context_put(dst);
-			return ERR_PTR(-ENOMEM);
-		}
-
-		de->num_bonds = se->num_bonds;
-	}
-
-	return dst;
-}
-
-int intel_virtual_engine_attach_bond(struct intel_engine_cs *engine,
-				     const struct intel_engine_cs *master,
-				     const struct intel_engine_cs *sibling)
-{
-	struct virtual_engine *ve = to_virtual_engine(engine);
-	struct ve_bond *bond;
-	int n;
-
-	/* Sanity check the sibling is part of the virtual engine */
-	for (n = 0; n < ve->num_siblings; n++)
-		if (sibling == ve->siblings[n])
-			break;
-	if (n == ve->num_siblings)
-		return -EINVAL;
-
-	bond = virtual_find_bond(ve, master);
-	if (bond) {
-		bond->sibling_mask |= sibling->mask;
-		return 0;
-	}
-
-	bond = krealloc(ve->bonds,
-			sizeof(*bond) * (ve->num_bonds + 1),
-			GFP_KERNEL);
-	if (!bond)
-		return -ENOMEM;
-
-	bond[ve->num_bonds].master = master;
-	bond[ve->num_bonds].sibling_mask = sibling->mask;
-
-	ve->bonds = bond;
-	ve->num_bonds++;
-
-	return 0;
-}
-
 void intel_execlists_show_requests(struct intel_engine_cs *engine,
 				   struct drm_printer *m,
 				   void (*show_request)(struct drm_printer *m,
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.h b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
index 4ca9b475e252..74041b1994af 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
@@ -32,15 +32,4 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine,
 							int indent),
 				   unsigned int max);
 
-struct intel_context *
-intel_execlists_create_virtual(struct intel_engine_cs **siblings,
-			       unsigned int count);
-
-struct intel_context *
-intel_execlists_clone_virtual(struct intel_engine_cs *src);
-
-int intel_virtual_engine_attach_bond(struct intel_engine_cs *engine,
-				     const struct intel_engine_cs *master,
-				     const struct intel_engine_cs *sibling);
-
 #endif /* __INTEL_EXECLISTS_SUBMISSION_H__ */
diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c
index 08896ae027d5..88aac9977e09 100644
--- a/drivers/gpu/drm/i915/gt/selftest_execlists.c
+++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c
@@ -3727,7 +3727,7 @@ static int nop_virtual_engine(struct intel_gt *gt,
 	GEM_BUG_ON(!nctx || nctx > ARRAY_SIZE(ve));
 
 	for (n = 0; n < nctx; n++) {
-		ve[n] = intel_execlists_create_virtual(siblings, nsibling);
+		ve[n] = intel_engine_create_virtual(siblings, nsibling);
 		if (IS_ERR(ve[n])) {
 			err = PTR_ERR(ve[n]);
 			nctx = n;
@@ -3923,7 +3923,7 @@ static int mask_virtual_engine(struct intel_gt *gt,
 	 * restrict it to our desired engine within the virtual engine.
 	 */
 
-	ve = intel_execlists_create_virtual(siblings, nsibling);
+	ve = intel_engine_create_virtual(siblings, nsibling);
 	if (IS_ERR(ve)) {
 		err = PTR_ERR(ve);
 		goto out_close;
@@ -4054,7 +4054,7 @@ static int slicein_virtual_engine(struct intel_gt *gt,
 		i915_request_add(rq);
 	}
 
-	ce = intel_execlists_create_virtual(siblings, nsibling);
+	ce = intel_engine_create_virtual(siblings, nsibling);
 	if (IS_ERR(ce)) {
 		err = PTR_ERR(ce);
 		goto out;
@@ -4106,7 +4106,7 @@ static int sliceout_virtual_engine(struct intel_gt *gt,
 
 	/* XXX We do not handle oversubscription and fairness with normal rq */
 	for (n = 0; n < nsibling; n++) {
-		ce = intel_execlists_create_virtual(siblings, nsibling);
+		ce = intel_engine_create_virtual(siblings, nsibling);
 		if (IS_ERR(ce)) {
 			err = PTR_ERR(ce);
 			goto out;
@@ -4208,7 +4208,7 @@ static int preserved_virtual_engine(struct intel_gt *gt,
 	if (err)
 		goto out_scratch;
 
-	ve = intel_execlists_create_virtual(siblings, nsibling);
+	ve = intel_engine_create_virtual(siblings, nsibling);
 	if (IS_ERR(ve)) {
 		err = PTR_ERR(ve);
 		goto out_scratch;
@@ -4431,16 +4431,16 @@ static int bond_virtual_engine(struct intel_gt *gt,
 		for (n = 0; n < nsibling; n++) {
 			struct intel_context *ve;
 
-			ve = intel_execlists_create_virtual(siblings, nsibling);
+			ve = intel_engine_create_virtual(siblings, nsibling);
 			if (IS_ERR(ve)) {
 				err = PTR_ERR(ve);
 				onstack_fence_fini(&fence);
 				goto out;
 			}
 
-			err = intel_virtual_engine_attach_bond(ve->engine,
-							       master,
-							       siblings[n]);
+			err = intel_engine_attach_bond(ve->engine,
+						       master,
+						       siblings[n]);
 			if (err) {
 				intel_context_put(ve);
 				onstack_fence_fini(&fence);
@@ -4576,7 +4576,7 @@ static int reset_virtual_engine(struct intel_gt *gt,
 	if (igt_spinner_init(&spin, gt))
 		return -ENOMEM;
 
-	ve = intel_execlists_create_virtual(siblings, nsibling);
+	ve = intel_engine_create_virtual(siblings, nsibling);
 	if (IS_ERR(ve)) {
 		err = PTR_ERR(ve);
 		goto out_spin;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index d605af0d66e6..ccbcf024b31b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -60,6 +60,15 @@
  *
  */
 
+/* GuC Virtual Engine */
+struct guc_virtual_engine {
+	struct intel_engine_cs base;
+	struct intel_context context;
+};
+
+static struct intel_context *
+guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count);
+
 #define GUC_REQUEST_SIZE 64 /* bytes */
 
 /*
@@ -928,20 +937,35 @@ static int guc_lrc_desc_pin(struct intel_context *ce)
 	return ret;
 }
 
-static int guc_context_pre_pin(struct intel_context *ce,
-			       struct i915_gem_ww_ctx *ww,
-			       void **vaddr)
+static int __guc_context_pre_pin(struct intel_context *ce,
+				 struct intel_engine_cs *engine,
+				 struct i915_gem_ww_ctx *ww,
+				 void **vaddr)
 {
-	return lrc_pre_pin(ce, ce->engine, ww, vaddr);
+	return lrc_pre_pin(ce, engine, ww, vaddr);
 }
 
-static int guc_context_pin(struct intel_context *ce, void *vaddr)
+static int __guc_context_pin(struct intel_context *ce,
+			     struct intel_engine_cs *engine,
+			     void *vaddr)
 {
 	if (i915_ggtt_offset(ce->state) !=
 	    (ce->lrc.lrca & CTX_GTT_ADDRESS_MASK))
 		set_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
 
-	return lrc_pin(ce, ce->engine, vaddr);
+	return lrc_pin(ce, engine, vaddr);
+}
+
+static int guc_context_pre_pin(struct intel_context *ce,
+			       struct i915_gem_ww_ctx *ww,
+			       void **vaddr)
+{
+	return __guc_context_pre_pin(ce, ce->engine, ww, vaddr);
+}
+
+static int guc_context_pin(struct intel_context *ce, void *vaddr)
+{
+	return __guc_context_pin(ce, ce->engine, vaddr);
 }
 
 static void guc_context_unpin(struct intel_context *ce)
@@ -1041,6 +1065,21 @@ static inline void guc_lrc_desc_unpin(struct intel_context *ce)
 	deregister_context(ce, ce->guc_id);
 }
 
+static void __guc_context_destroy(struct intel_context *ce)
+{
+	lrc_fini(ce);
+	intel_context_fini(ce);
+
+	if (intel_engine_is_virtual(ce->engine)) {
+		struct guc_virtual_engine *ve =
+			container_of(ce, typeof(*ve), context);
+
+		kfree(ve);
+	} else {
+		intel_context_free(ce);
+	}
+}
+
 static void guc_context_destroy(struct kref *kref)
 {
 	struct intel_context *ce = container_of(kref, typeof(*ce), ref);
@@ -1057,7 +1096,7 @@ static void guc_context_destroy(struct kref *kref)
 	if (context_guc_id_invalid(ce) ||
 	    !lrc_desc_registered(guc, ce->guc_id)) {
 		release_guc_id(guc, ce);
-		lrc_destroy(kref);
+		__guc_context_destroy(ce);
 		return;
 	}
 
@@ -1073,7 +1112,7 @@ static void guc_context_destroy(struct kref *kref)
 	if (context_guc_id_invalid(ce)) {
 		__release_guc_id(guc, ce);
 		spin_unlock_irqrestore(&guc->contexts_lock, flags);
-		lrc_destroy(kref);
+		__guc_context_destroy(ce);
 		return;
 	}
 
@@ -1118,6 +1157,8 @@ static const struct intel_context_ops guc_context_ops = {
 
 	.reset = lrc_reset,
 	.destroy = guc_context_destroy,
+
+	.create_virtual = guc_create_virtual,
 };
 
 static void __guc_signal_context_fence(struct intel_context *ce)
@@ -1246,6 +1287,96 @@ static int guc_request_alloc(struct i915_request *rq)
 	return 0;
 }
 
+static struct intel_engine_cs *
+guc_virtual_get_sibling(struct intel_engine_cs *ve, unsigned int sibling)
+{
+	struct intel_engine_cs *engine;
+	intel_engine_mask_t tmp, mask = ve->mask;
+	unsigned int num_siblings = 0;
+
+	for_each_engine_masked(engine, ve->gt, mask, tmp)
+		if (num_siblings++ == sibling)
+			return engine;
+
+	return NULL;
+}
+
+static int guc_virtual_context_pre_pin(struct intel_context *ce,
+				       struct i915_gem_ww_ctx *ww,
+				       void **vaddr)
+{
+	struct intel_engine_cs *engine = guc_virtual_get_sibling(ce->engine, 0);
+
+	return __guc_context_pre_pin(ce, engine, ww, vaddr);
+}
+
+static int guc_virtual_context_pin(struct intel_context *ce, void *vaddr)
+{
+	struct intel_engine_cs *engine = guc_virtual_get_sibling(ce->engine, 0);
+
+	return __guc_context_pin(ce, engine, vaddr);
+}
+
+static void guc_virtual_context_enter(struct intel_context *ce)
+{
+	intel_engine_mask_t tmp, mask = ce->engine->mask;
+	struct intel_engine_cs *engine;
+
+	for_each_engine_masked(engine, ce->engine->gt, mask, tmp)
+		intel_engine_pm_get(engine);
+
+	intel_timeline_enter(ce->timeline);
+}
+
+static void guc_virtual_context_exit(struct intel_context *ce)
+{
+	intel_engine_mask_t tmp, mask = ce->engine->mask;
+	struct intel_engine_cs *engine;
+
+	for_each_engine_masked(engine, ce->engine->gt, mask, tmp)
+		intel_engine_pm_put(engine);
+
+	intel_timeline_exit(ce->timeline);
+}
+
+static int guc_virtual_context_alloc(struct intel_context *ce)
+{
+	struct intel_engine_cs *engine = guc_virtual_get_sibling(ce->engine, 0);
+
+	return lrc_alloc(ce, engine);
+}
+
+static struct intel_context *guc_clone_virtual(struct intel_engine_cs *src)
+{
+	struct intel_engine_cs *siblings[GUC_MAX_INSTANCES_PER_CLASS], *engine;
+	intel_engine_mask_t tmp, mask = src->mask;
+	unsigned int num_siblings = 0;
+
+	for_each_engine_masked(engine, src->gt, mask, tmp)
+		siblings[num_siblings++] = engine;
+
+	return guc_create_virtual(siblings, num_siblings);
+}
+
+static const struct intel_context_ops virtual_guc_context_ops = {
+	.alloc = guc_virtual_context_alloc,
+
+	.pre_pin = guc_virtual_context_pre_pin,
+	.pin = guc_virtual_context_pin,
+	.unpin = guc_context_unpin,
+	.post_unpin = guc_context_post_unpin,
+
+	.enter = guc_virtual_context_enter,
+	.exit = guc_virtual_context_exit,
+
+	.sched_disable = guc_context_sched_disable,
+
+	.destroy = guc_context_destroy,
+
+	.clone_virtual = guc_clone_virtual,
+	.get_sibling = guc_virtual_get_sibling,
+};
+
 static void sanitize_hwsp(struct intel_engine_cs *engine)
 {
 	struct intel_timeline *tl;
@@ -1557,7 +1688,7 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 	} else if (context_destroyed(ce)) {
 		/* Context has been destroyed */
 		release_guc_id(guc, ce);
-		lrc_destroy(&ce->ref);
+		__guc_context_destroy(ce);
 	}
 
 	decr_outstanding_submission_g2h(guc);
@@ -1669,3 +1800,107 @@ void intel_guc_log_context_info(struct intel_guc *guc,
 			   atomic_read(&ce->guc_sched_state_no_lock));
 	}
 }
+
+static struct intel_context *
+guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
+{
+	struct guc_virtual_engine *ve;
+	struct intel_guc *guc;
+	unsigned int n;
+	int err;
+
+	ve = kzalloc(sizeof(*ve), GFP_KERNEL);
+	if (!ve)
+		return ERR_PTR(-ENOMEM);
+
+	guc = &siblings[0]->gt->uc.guc;
+
+	ve->base.i915 = siblings[0]->i915;
+	ve->base.gt = siblings[0]->gt;
+	ve->base.uncore = siblings[0]->uncore;
+	ve->base.id = -1;
+
+	ve->base.uabi_class = I915_ENGINE_CLASS_INVALID;
+	ve->base.instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
+	ve->base.uabi_instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
+	ve->base.saturated = ALL_ENGINES;
+	ve->base.breadcrumbs = intel_breadcrumbs_create(&ve->base);
+	if (!ve->base.breadcrumbs) {
+		kfree(ve);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	snprintf(ve->base.name, sizeof(ve->base.name), "virtual");
+
+	ve->base.sched_engine = i915_sched_engine_get(guc->sched_engine);
+
+	ve->base.cops = &virtual_guc_context_ops;
+	ve->base.request_alloc = guc_request_alloc;
+
+	ve->base.submit_request = guc_submit_request;
+
+	ve->base.flags = I915_ENGINE_IS_VIRTUAL;
+
+	intel_context_init(&ve->context, &ve->base);
+
+	for (n = 0; n < count; n++) {
+		struct intel_engine_cs *sibling = siblings[n];
+
+		GEM_BUG_ON(!is_power_of_2(sibling->mask));
+		if (sibling->mask & ve->base.mask) {
+			DRM_DEBUG("duplicate %s entry in load balancer\n",
+				  sibling->name);
+			err = -EINVAL;
+			goto err_put;
+		}
+
+		ve->base.mask |= sibling->mask;
+
+		if (n != 0 && ve->base.class != sibling->class) {
+			DRM_DEBUG("invalid mixing of engine class, sibling %d, already %d\n",
+				  sibling->class, ve->base.class);
+			err = -EINVAL;
+			goto err_put;
+		} else if (n == 0) {
+			ve->base.class = sibling->class;
+			ve->base.uabi_class = sibling->uabi_class;
+			snprintf(ve->base.name, sizeof(ve->base.name),
+				 "v%dx%d", ve->base.class, count);
+			ve->base.context_size = sibling->context_size;
+
+			ve->base.emit_bb_start = sibling->emit_bb_start;
+			ve->base.emit_flush = sibling->emit_flush;
+			ve->base.emit_init_breadcrumb =
+				sibling->emit_init_breadcrumb;
+			ve->base.emit_fini_breadcrumb =
+				sibling->emit_fini_breadcrumb;
+			ve->base.emit_fini_breadcrumb_dw =
+				sibling->emit_fini_breadcrumb_dw;
+
+			ve->base.flags |= sibling->flags;
+
+			ve->base.props.timeslice_duration_ms =
+				sibling->props.timeslice_duration_ms;
+		}
+	}
+
+	return &ve->context;
+
+err_put:
+	intel_context_put(&ve->context);
+	return ERR_PTR(err);
+}
+
+
+
+bool intel_guc_virtual_engine_has_heartbeat(const struct intel_engine_cs *ve)
+{
+	struct intel_engine_cs *engine;
+	intel_engine_mask_t tmp, mask = ve->mask;
+
+	for_each_engine_masked(engine, ve->gt, mask, tmp)
+		if (READ_ONCE(engine->props.heartbeat_interval_ms))
+			return true;
+
+	return false;
+}
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
index 6453e2bfa151..95df5ab06031 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
@@ -25,6 +25,8 @@ void intel_guc_log_submission_info(struct intel_guc *guc,
 				   struct drm_printer *p);
 void intel_guc_log_context_info(struct intel_guc *guc, struct drm_printer *p);
 
+bool intel_guc_virtual_engine_has_heartbeat(const struct intel_engine_cs *ve);
+
 static inline bool intel_guc_submission_is_supported(struct intel_guc *guc)
 {
 	/* XXX: GuC submission is unavailable for now */
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* [Intel-gfx] [PATCH 27/47] drm/i915: Track 'serial' counts for virtual engines
  2021-06-24  7:04 [Intel-gfx] [PATCH 00/47] GuC submission support Matthew Brost
                   ` (25 preceding siblings ...)
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 26/47] drm/i915/guc: GuC virtual engines Matthew Brost
@ 2021-06-24  7:04 ` Matthew Brost
  2021-07-12 18:11   ` John Harrison
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 28/47] drm/i915: Hold reference to intel_context over life of i915_request Matthew Brost
                   ` (24 subsequent siblings)
  51 siblings, 1 reply; 170+ messages in thread
From: Matthew Brost @ 2021-06-24  7:04 UTC (permalink / raw)
  To: intel-gfx, dri-devel

From: John Harrison <John.C.Harrison@Intel.com>

The serial number tracking of engines happens at the backend of
request submission and was expecting to only be given physical
engines. However, in GuC submission mode, the decomposition of virtual
to physical engines does not happen in i915. Instead, requests are
submitted to their virtual engine mask all the way through to the
hardware (i.e. to GuC). This would mean that the heart beat code
thinks the physical engines are idle due to the serial number not
incrementing.

This patch updates the tracking to decompose virtual engines into
their physical constituents and tracks the request against each. This
is not entirely accurate as the GuC will only be issuing the request
to one physical engine. However, it is the best that i915 can do given
that it has no knowledge of the GuC's scheduling decisions.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_engine_types.h     |  2 ++
 .../gpu/drm/i915/gt/intel_execlists_submission.c |  6 ++++++
 drivers/gpu/drm/i915/gt/intel_ring_submission.c  |  6 ++++++
 drivers/gpu/drm/i915/gt/mock_engine.c            |  6 ++++++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c    | 16 ++++++++++++++++
 drivers/gpu/drm/i915/i915_request.c              |  4 +++-
 6 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 5b91068ab277..1dc59e6c9a92 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -388,6 +388,8 @@ struct intel_engine_cs {
 	void		(*park)(struct intel_engine_cs *engine);
 	void		(*unpark)(struct intel_engine_cs *engine);
 
+	void		(*bump_serial)(struct intel_engine_cs *engine);
+
 	void		(*set_default_submission)(struct intel_engine_cs *engine);
 
 	const struct intel_context_ops *cops;
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index bd4ced794ff9..9cfb8800a0e6 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -3203,6 +3203,11 @@ static void execlists_release(struct intel_engine_cs *engine)
 	lrc_fini_wa_ctx(engine);
 }
 
+static void execlist_bump_serial(struct intel_engine_cs *engine)
+{
+	engine->serial++;
+}
+
 static void
 logical_ring_default_vfuncs(struct intel_engine_cs *engine)
 {
@@ -3212,6 +3217,7 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
 
 	engine->cops = &execlists_context_ops;
 	engine->request_alloc = execlists_request_alloc;
+	engine->bump_serial = execlist_bump_serial;
 
 	engine->reset.prepare = execlists_reset_prepare;
 	engine->reset.rewind = execlists_reset_rewind;
diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
index 5d42a12ef3d6..e1506b280df1 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
@@ -1044,6 +1044,11 @@ static void setup_irq(struct intel_engine_cs *engine)
 	}
 }
 
+static void ring_bump_serial(struct intel_engine_cs *engine)
+{
+	engine->serial++;
+}
+
 static void setup_common(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *i915 = engine->i915;
@@ -1063,6 +1068,7 @@ static void setup_common(struct intel_engine_cs *engine)
 
 	engine->cops = &ring_context_ops;
 	engine->request_alloc = ring_request_alloc;
+	engine->bump_serial = ring_bump_serial;
 
 	/*
 	 * Using a global execution timeline; the previous final breadcrumb is
diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
index 68970398e4ef..9203c766db80 100644
--- a/drivers/gpu/drm/i915/gt/mock_engine.c
+++ b/drivers/gpu/drm/i915/gt/mock_engine.c
@@ -292,6 +292,11 @@ static void mock_engine_release(struct intel_engine_cs *engine)
 	intel_engine_fini_retire(engine);
 }
 
+static void mock_bump_serial(struct intel_engine_cs *engine)
+{
+	engine->serial++;
+}
+
 struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
 				    const char *name,
 				    int id)
@@ -318,6 +323,7 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
 
 	engine->base.cops = &mock_context_ops;
 	engine->base.request_alloc = mock_request_alloc;
+	engine->base.bump_serial = mock_bump_serial;
 	engine->base.emit_flush = mock_emit_flush;
 	engine->base.emit_fini_breadcrumb = mock_emit_breadcrumb;
 	engine->base.submit_request = mock_submit_request;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index ccbcf024b31b..d1badd7137b7 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1496,6 +1496,20 @@ static void guc_release(struct intel_engine_cs *engine)
 	lrc_fini_wa_ctx(engine);
 }
 
+static void guc_bump_serial(struct intel_engine_cs *engine)
+{
+	engine->serial++;
+}
+
+static void virtual_guc_bump_serial(struct intel_engine_cs *engine)
+{
+	struct intel_engine_cs *e;
+	intel_engine_mask_t tmp, mask = engine->mask;
+
+	for_each_engine_masked(e, engine->gt, mask, tmp)
+		e->serial++;
+}
+
 static void guc_default_vfuncs(struct intel_engine_cs *engine)
 {
 	/* Default vfuncs which can be overridden by each engine. */
@@ -1504,6 +1518,7 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
 
 	engine->cops = &guc_context_ops;
 	engine->request_alloc = guc_request_alloc;
+	engine->bump_serial = guc_bump_serial;
 
 	engine->sched_engine->schedule = i915_schedule;
 
@@ -1836,6 +1851,7 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
 
 	ve->base.cops = &virtual_guc_context_ops;
 	ve->base.request_alloc = guc_request_alloc;
+	ve->base.bump_serial = virtual_guc_bump_serial;
 
 	ve->base.submit_request = guc_submit_request;
 
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 7f7aa096e873..de9deb95b8b1 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -692,7 +692,9 @@ bool __i915_request_submit(struct i915_request *request)
 				     request->ring->vaddr + request->postfix);
 
 	trace_i915_request_execute(request);
-	engine->serial++;
+	if (engine->bump_serial)
+		engine->bump_serial(engine);
+
 	result = true;
 
 	GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* [Intel-gfx] [PATCH 28/47] drm/i915: Hold reference to intel_context over life of i915_request
  2021-06-24  7:04 [Intel-gfx] [PATCH 00/47] GuC submission support Matthew Brost
                   ` (26 preceding siblings ...)
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 27/47] drm/i915: Track 'serial' counts for " Matthew Brost
@ 2021-06-24  7:04 ` Matthew Brost
  2021-07-12 18:23   ` John Harrison
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 29/47] drm/i915/guc: Disable bonding extension with GuC submission Matthew Brost
                   ` (23 subsequent siblings)
  51 siblings, 1 reply; 170+ messages in thread
From: Matthew Brost @ 2021-06-24  7:04 UTC (permalink / raw)
  To: intel-gfx, dri-devel

Hold a reference to the intel_context over life of an i915_request.
Without this an i915_request can exist after the context has been
destroyed (e.g. request retired, context closed, but user space holds a
reference to the request from an out fence). In the case of GuC
submission + virtual engine, the engine that the request references is
also destroyed which can trigger bad pointer dref in fence ops (e.g.
i915_fence_get_driver_name). We could likely change
i915_fence_get_driver_name to avoid touching the engine but let's just
be safe and hold the intel_context reference.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/i915_request.c | 54 ++++++++++++-----------------
 1 file changed, 22 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index de9deb95b8b1..dec5a35c9aa2 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -126,39 +126,17 @@ static void i915_fence_release(struct dma_fence *fence)
 	i915_sw_fence_fini(&rq->semaphore);
 
 	/*
-	 * Keep one request on each engine for reserved use under mempressure
-	 *
-	 * We do not hold a reference to the engine here and so have to be
-	 * very careful in what rq->engine we poke. The virtual engine is
-	 * referenced via the rq->context and we released that ref during
-	 * i915_request_retire(), ergo we must not dereference a virtual
-	 * engine here. Not that we would want to, as the only consumer of
-	 * the reserved engine->request_pool is the power management parking,
-	 * which must-not-fail, and that is only run on the physical engines.
-	 *
-	 * Since the request must have been executed to be have completed,
-	 * we know that it will have been processed by the HW and will
-	 * not be unsubmitted again, so rq->engine and rq->execution_mask
-	 * at this point is stable. rq->execution_mask will be a single
-	 * bit if the last and _only_ engine it could execution on was a
-	 * physical engine, if it's multiple bits then it started on and
-	 * could still be on a virtual engine. Thus if the mask is not a
-	 * power-of-two we assume that rq->engine may still be a virtual
-	 * engine and so a dangling invalid pointer that we cannot dereference
-	 *
-	 * For example, consider the flow of a bonded request through a virtual
-	 * engine. The request is created with a wide engine mask (all engines
-	 * that we might execute on). On processing the bond, the request mask
-	 * is reduced to one or more engines. If the request is subsequently
-	 * bound to a single engine, it will then be constrained to only
-	 * execute on that engine and never returned to the virtual engine
-	 * after timeslicing away, see __unwind_incomplete_requests(). Thus we
-	 * know that if the rq->execution_mask is a single bit, rq->engine
-	 * can be a physical engine with the exact corresponding mask.
+	 * Keep one request on each engine for reserved use under mempressure,
+	 * do not use with virtual engines as this really is only needed for
+	 * kernel contexts.
 	 */
-	if (is_power_of_2(rq->execution_mask) &&
-	    !cmpxchg(&rq->engine->request_pool, NULL, rq))
+	if (!intel_engine_is_virtual(rq->engine) &&
+	    !cmpxchg(&rq->engine->request_pool, NULL, rq)) {
+		intel_context_put(rq->context);
 		return;
+	}
+
+	intel_context_put(rq->context);
 
 	kmem_cache_free(global.slab_requests, rq);
 }
@@ -977,7 +955,18 @@ __i915_request_create(struct intel_context *ce, gfp_t gfp)
 		}
 	}
 
-	rq->context = ce;
+	/*
+	 * Hold a reference to the intel_context over life of an i915_request.
+	 * Without this an i915_request can exist after the context has been
+	 * destroyed (e.g. request retired, context closed, but user space holds
+	 * a reference to the request from an out fence). In the case of GuC
+	 * submission + virtual engine, the engine that the request references
+	 * is also destroyed which can trigger bad pointer dref in fence ops
+	 * (e.g. i915_fence_get_driver_name). We could likely change these
+	 * functions to avoid touching the engine but let's just be safe and
+	 * hold the intel_context reference.
+	 */
+	rq->context = intel_context_get(ce);
 	rq->engine = ce->engine;
 	rq->ring = ce->ring;
 	rq->execution_mask = ce->engine->mask;
@@ -1054,6 +1043,7 @@ __i915_request_create(struct intel_context *ce, gfp_t gfp)
 	GEM_BUG_ON(!list_empty(&rq->sched.waiters_list));
 
 err_free:
+	intel_context_put(ce);
 	kmem_cache_free(global.slab_requests, rq);
 err_unreserve:
 	intel_context_unpin(ce);
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* [Intel-gfx] [PATCH 29/47] drm/i915/guc: Disable bonding extension with GuC submission
  2021-06-24  7:04 [Intel-gfx] [PATCH 00/47] GuC submission support Matthew Brost
                   ` (27 preceding siblings ...)
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 28/47] drm/i915: Hold reference to intel_context over life of i915_request Matthew Brost
@ 2021-06-24  7:04 ` Matthew Brost
  2021-07-12 18:23   ` John Harrison
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 30/47] drm/i915/guc: Direct all breadcrumbs for a class to single breadcrumbs Matthew Brost
                   ` (22 subsequent siblings)
  51 siblings, 1 reply; 170+ messages in thread
From: Matthew Brost @ 2021-06-24  7:04 UTC (permalink / raw)
  To: intel-gfx, dri-devel

Update the bonding extension to return -ENODEV when using GuC submission
as this extension fundamentally will not work with the GuC submission
interface.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 8a9293e0ca92..0429aa4172bf 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1674,6 +1674,11 @@ set_engines__bond(struct i915_user_extension __user *base, void *data)
 	}
 	virtual = set->engines->engines[idx]->engine;
 
+	if (intel_engine_uses_guc(virtual)) {
+		DRM_DEBUG("bonding extension not supported with GuC submission");
+		return -ENODEV;
+	}
+
 	err = check_user_mbz(&ext->flags);
 	if (err)
 		return err;
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* [Intel-gfx] [PATCH 30/47] drm/i915/guc: Direct all breadcrumbs for a class to single breadcrumbs
  2021-06-24  7:04 [Intel-gfx] [PATCH 00/47] GuC submission support Matthew Brost
                   ` (28 preceding siblings ...)
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 29/47] drm/i915/guc: Disable bonding extension with GuC submission Matthew Brost
@ 2021-06-24  7:04 ` Matthew Brost
  2021-07-12 19:19   ` John Harrison
  2021-06-24  7:05 ` [Intel-gfx] [PATCH 31/47] drm/i915/guc: Reset implementation for new GuC interface Matthew Brost
                   ` (21 subsequent siblings)
  51 siblings, 1 reply; 170+ messages in thread
From: Matthew Brost @ 2021-06-24  7:04 UTC (permalink / raw)
  To: intel-gfx, dri-devel

With GuC virtual engines the physical engine which a request executes
and completes on isn't known to the i915. Therefore we can't attach a
request to a physical engines breadcrumbs. To work around this we create
a single breadcrumbs per engine class when using GuC submission and
direct all physical engine interrupts to this breadcrumbs.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
CC: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c   | 41 +++++-------
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.h   | 14 +++-
 .../gpu/drm/i915/gt/intel_breadcrumbs_types.h |  7 ++
 drivers/gpu/drm/i915/gt/intel_engine.h        |  3 +
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     | 28 +++++++-
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |  1 -
 .../drm/i915/gt/intel_execlists_submission.c  |  2 +-
 drivers/gpu/drm/i915/gt/mock_engine.c         |  4 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 67 +++++++++++++++++--
 9 files changed, 131 insertions(+), 36 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index 38cc42783dfb..2007dc6f6b99 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -15,28 +15,14 @@
 #include "intel_gt_pm.h"
 #include "intel_gt_requests.h"
 
-static bool irq_enable(struct intel_engine_cs *engine)
+static bool irq_enable(struct intel_breadcrumbs *b)
 {
-	if (!engine->irq_enable)
-		return false;
-
-	/* Caller disables interrupts */
-	spin_lock(&engine->gt->irq_lock);
-	engine->irq_enable(engine);
-	spin_unlock(&engine->gt->irq_lock);
-
-	return true;
+	return intel_engine_irq_enable(b->irq_engine);
 }
 
-static void irq_disable(struct intel_engine_cs *engine)
+static void irq_disable(struct intel_breadcrumbs *b)
 {
-	if (!engine->irq_disable)
-		return;
-
-	/* Caller disables interrupts */
-	spin_lock(&engine->gt->irq_lock);
-	engine->irq_disable(engine);
-	spin_unlock(&engine->gt->irq_lock);
+	intel_engine_irq_disable(b->irq_engine);
 }
 
 static void __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b)
@@ -57,7 +43,7 @@ static void __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b)
 	WRITE_ONCE(b->irq_armed, true);
 
 	/* Requests may have completed before we could enable the interrupt. */
-	if (!b->irq_enabled++ && irq_enable(b->irq_engine))
+	if (!b->irq_enabled++ && b->irq_enable(b))
 		irq_work_queue(&b->irq_work);
 }
 
@@ -76,7 +62,7 @@ static void __intel_breadcrumbs_disarm_irq(struct intel_breadcrumbs *b)
 {
 	GEM_BUG_ON(!b->irq_enabled);
 	if (!--b->irq_enabled)
-		irq_disable(b->irq_engine);
+		b->irq_disable(b);
 
 	WRITE_ONCE(b->irq_armed, false);
 	intel_gt_pm_put_async(b->irq_engine->gt);
@@ -281,7 +267,7 @@ intel_breadcrumbs_create(struct intel_engine_cs *irq_engine)
 	if (!b)
 		return NULL;
 
-	b->irq_engine = irq_engine;
+	kref_init(&b->ref);
 
 	spin_lock_init(&b->signalers_lock);
 	INIT_LIST_HEAD(&b->signalers);
@@ -290,6 +276,10 @@ intel_breadcrumbs_create(struct intel_engine_cs *irq_engine)
 	spin_lock_init(&b->irq_lock);
 	init_irq_work(&b->irq_work, signal_irq_work);
 
+	b->irq_engine = irq_engine;
+	b->irq_enable = irq_enable;
+	b->irq_disable = irq_disable;
+
 	return b;
 }
 
@@ -303,9 +293,9 @@ void intel_breadcrumbs_reset(struct intel_breadcrumbs *b)
 	spin_lock_irqsave(&b->irq_lock, flags);
 
 	if (b->irq_enabled)
-		irq_enable(b->irq_engine);
+		b->irq_enable(b);
 	else
-		irq_disable(b->irq_engine);
+		b->irq_disable(b);
 
 	spin_unlock_irqrestore(&b->irq_lock, flags);
 }
@@ -325,11 +315,14 @@ void __intel_breadcrumbs_park(struct intel_breadcrumbs *b)
 	}
 }
 
-void intel_breadcrumbs_free(struct intel_breadcrumbs *b)
+void intel_breadcrumbs_free(struct kref *kref)
 {
+	struct intel_breadcrumbs *b = container_of(kref, typeof(*b), ref);
+
 	irq_work_sync(&b->irq_work);
 	GEM_BUG_ON(!list_empty(&b->signalers));
 	GEM_BUG_ON(b->irq_armed);
+
 	kfree(b);
 }
 
diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h
index 3ce5ce270b04..72105b74663d 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h
@@ -17,7 +17,7 @@ struct intel_breadcrumbs;
 
 struct intel_breadcrumbs *
 intel_breadcrumbs_create(struct intel_engine_cs *irq_engine);
-void intel_breadcrumbs_free(struct intel_breadcrumbs *b);
+void intel_breadcrumbs_free(struct kref *kref);
 
 void intel_breadcrumbs_reset(struct intel_breadcrumbs *b);
 void __intel_breadcrumbs_park(struct intel_breadcrumbs *b);
@@ -48,4 +48,16 @@ void i915_request_cancel_breadcrumb(struct i915_request *request);
 void intel_context_remove_breadcrumbs(struct intel_context *ce,
 				      struct intel_breadcrumbs *b);
 
+static inline struct intel_breadcrumbs *
+intel_breadcrumbs_get(struct intel_breadcrumbs *b)
+{
+	kref_get(&b->ref);
+	return b;
+}
+
+static inline void intel_breadcrumbs_put(struct intel_breadcrumbs *b)
+{
+	kref_put(&b->ref, intel_breadcrumbs_free);
+}
+
 #endif /* __INTEL_BREADCRUMBS__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h b/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
index 3a084ce8ff5e..a4e146684be8 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
@@ -7,10 +7,13 @@
 #define __INTEL_BREADCRUMBS_TYPES__
 
 #include <linux/irq_work.h>
+#include <linux/kref.h>
 #include <linux/list.h>
 #include <linux/spinlock.h>
 #include <linux/types.h>
 
+typedef u8 intel_engine_mask_t;
+
 /*
  * Rather than have every client wait upon all user interrupts,
  * with the herd waking after every interrupt and each doing the
@@ -29,6 +32,7 @@
  * the overhead of waking that client is much preferred.
  */
 struct intel_breadcrumbs {
+	struct kref ref;
 	atomic_t active;
 
 	spinlock_t signalers_lock; /* protects the list of signalers */
@@ -42,7 +46,10 @@ struct intel_breadcrumbs {
 	bool irq_armed;
 
 	/* Not all breadcrumbs are attached to physical HW */
+	intel_engine_mask_t	engine_mask;
 	struct intel_engine_cs *irq_engine;
+	bool	(*irq_enable)(struct intel_breadcrumbs *b);
+	void	(*irq_disable)(struct intel_breadcrumbs *b);
 };
 
 #endif /* __INTEL_BREADCRUMBS_TYPES__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
index 923eaee627b3..e9e0657f847a 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -212,6 +212,9 @@ void intel_engine_get_instdone(const struct intel_engine_cs *engine,
 
 void intel_engine_init_execlists(struct intel_engine_cs *engine);
 
+bool intel_engine_irq_enable(struct intel_engine_cs *engine);
+void intel_engine_irq_disable(struct intel_engine_cs *engine);
+
 static inline void __intel_engine_reset(struct intel_engine_cs *engine,
 					bool stalled)
 {
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index d13b1716c29e..69245670b8b0 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -739,7 +739,7 @@ static int engine_setup_common(struct intel_engine_cs *engine)
 err_cmd_parser:
 	i915_sched_engine_put(engine->sched_engine);
 err_sched_engine:
-	intel_breadcrumbs_free(engine->breadcrumbs);
+	intel_breadcrumbs_put(engine->breadcrumbs);
 err_status:
 	cleanup_status_page(engine);
 	return err;
@@ -947,7 +947,7 @@ void intel_engine_cleanup_common(struct intel_engine_cs *engine)
 	GEM_BUG_ON(!list_empty(&engine->sched_engine->requests));
 
 	i915_sched_engine_put(engine->sched_engine);
-	intel_breadcrumbs_free(engine->breadcrumbs);
+	intel_breadcrumbs_put(engine->breadcrumbs);
 
 	intel_engine_fini_retire(engine);
 	intel_engine_cleanup_cmd_parser(engine);
@@ -1264,6 +1264,30 @@ bool intel_engines_are_idle(struct intel_gt *gt)
 	return true;
 }
 
+bool intel_engine_irq_enable(struct intel_engine_cs *engine)
+{
+	if (!engine->irq_enable)
+		return false;
+
+	/* Caller disables interrupts */
+	spin_lock(&engine->gt->irq_lock);
+	engine->irq_enable(engine);
+	spin_unlock(&engine->gt->irq_lock);
+
+	return true;
+}
+
+void intel_engine_irq_disable(struct intel_engine_cs *engine)
+{
+	if (!engine->irq_disable)
+		return;
+
+	/* Caller disables interrupts */
+	spin_lock(&engine->gt->irq_lock);
+	engine->irq_disable(engine);
+	spin_unlock(&engine->gt->irq_lock);
+}
+
 void intel_engines_reset_default_submission(struct intel_gt *gt)
 {
 	struct intel_engine_cs *engine;
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 1dc59e6c9a92..e7cb6a06db9d 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -64,7 +64,6 @@ struct intel_gt;
 struct intel_ring;
 struct intel_uncore;
 
-typedef u8 intel_engine_mask_t;
 #define ALL_ENGINES ((intel_engine_mask_t)~0ul)
 
 struct intel_hw_status_page {
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 9cfb8800a0e6..c10ea6080752 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -3419,7 +3419,7 @@ static void rcu_virtual_context_destroy(struct work_struct *wrk)
 	intel_context_fini(&ve->context);
 
 	if (ve->base.breadcrumbs)
-		intel_breadcrumbs_free(ve->base.breadcrumbs);
+		intel_breadcrumbs_put(ve->base.breadcrumbs);
 	if (ve->base.sched_engine)
 		i915_sched_engine_put(ve->base.sched_engine);
 	intel_engine_free_request_pool(&ve->base);
diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
index 9203c766db80..fc5a65ab1937 100644
--- a/drivers/gpu/drm/i915/gt/mock_engine.c
+++ b/drivers/gpu/drm/i915/gt/mock_engine.c
@@ -284,7 +284,7 @@ static void mock_engine_release(struct intel_engine_cs *engine)
 	GEM_BUG_ON(timer_pending(&mock->hw_delay));
 
 	i915_sched_engine_put(engine->sched_engine);
-	intel_breadcrumbs_free(engine->breadcrumbs);
+	intel_breadcrumbs_put(engine->breadcrumbs);
 
 	intel_context_unpin(engine->kernel_context);
 	intel_context_put(engine->kernel_context);
@@ -376,7 +376,7 @@ int mock_engine_init(struct intel_engine_cs *engine)
 	return 0;
 
 err_breadcrumbs:
-	intel_breadcrumbs_free(engine->breadcrumbs);
+	intel_breadcrumbs_put(engine->breadcrumbs);
 err_schedule:
 	i915_sched_engine_put(engine->sched_engine);
 	return -ENOMEM;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index d1badd7137b7..83058df5ba01 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1074,6 +1074,9 @@ static void __guc_context_destroy(struct intel_context *ce)
 		struct guc_virtual_engine *ve =
 			container_of(ce, typeof(*ve), context);
 
+		if (ve->base.breadcrumbs)
+			intel_breadcrumbs_put(ve->base.breadcrumbs);
+
 		kfree(ve);
 	} else {
 		intel_context_free(ce);
@@ -1377,6 +1380,62 @@ static const struct intel_context_ops virtual_guc_context_ops = {
 	.get_sibling = guc_virtual_get_sibling,
 };
 
+static bool
+guc_irq_enable_breadcrumbs(struct intel_breadcrumbs *b)
+{
+	struct intel_engine_cs *sibling;
+	intel_engine_mask_t tmp, mask = b->engine_mask;
+	bool result = false;
+
+	for_each_engine_masked(sibling, b->irq_engine->gt, mask, tmp)
+		result |= intel_engine_irq_enable(sibling);
+
+	return result;
+}
+
+static void
+guc_irq_disable_breadcrumbs(struct intel_breadcrumbs *b)
+{
+	struct intel_engine_cs *sibling;
+	intel_engine_mask_t tmp, mask = b->engine_mask;
+
+	for_each_engine_masked(sibling, b->irq_engine->gt, mask, tmp)
+		intel_engine_irq_disable(sibling);
+}
+
+static void guc_init_breadcrumbs(struct intel_engine_cs *engine)
+{
+	int i;
+
+       /*
+        * In GuC submission mode we do not know which physical engine a request
+        * will be scheduled on, this creates a problem because the breadcrumb
+        * interrupt is per physical engine. To work around this we attach
+        * requests and direct all breadcrumb interrupts to the first instance
+        * of an engine per class. In addition all breadcrumb interrupts are
+	* enabled / disabled across an engine class in unison.
+        */
+	for (i = 0; i < MAX_ENGINE_INSTANCE; ++i) {
+		struct intel_engine_cs *sibling =
+			engine->gt->engine_class[engine->class][i];
+
+		if (sibling) {
+			if (engine->breadcrumbs != sibling->breadcrumbs) {
+				intel_breadcrumbs_put(engine->breadcrumbs);
+				engine->breadcrumbs =
+					intel_breadcrumbs_get(sibling->breadcrumbs);
+			}
+			break;
+		}
+	}
+
+	if (engine->breadcrumbs) {
+		engine->breadcrumbs->engine_mask |= engine->mask;
+		engine->breadcrumbs->irq_enable = guc_irq_enable_breadcrumbs;
+		engine->breadcrumbs->irq_disable = guc_irq_disable_breadcrumbs;
+	}
+}
+
 static void sanitize_hwsp(struct intel_engine_cs *engine)
 {
 	struct intel_timeline *tl;
@@ -1600,6 +1659,7 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
 
 	guc_default_vfuncs(engine);
 	guc_default_irqs(engine);
+	guc_init_breadcrumbs(engine);
 
 	if (engine->class == RENDER_CLASS)
 		rcs_submission_override(engine);
@@ -1839,11 +1899,6 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
 	ve->base.instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
 	ve->base.uabi_instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
 	ve->base.saturated = ALL_ENGINES;
-	ve->base.breadcrumbs = intel_breadcrumbs_create(&ve->base);
-	if (!ve->base.breadcrumbs) {
-		kfree(ve);
-		return ERR_PTR(-ENOMEM);
-	}
 
 	snprintf(ve->base.name, sizeof(ve->base.name), "virtual");
 
@@ -1892,6 +1947,8 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
 				sibling->emit_fini_breadcrumb;
 			ve->base.emit_fini_breadcrumb_dw =
 				sibling->emit_fini_breadcrumb_dw;
+			ve->base.breadcrumbs =
+				intel_breadcrumbs_get(sibling->breadcrumbs);
 
 			ve->base.flags |= sibling->flags;
 
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* [Intel-gfx] [PATCH 31/47] drm/i915/guc: Reset implementation for new GuC interface
  2021-06-24  7:04 [Intel-gfx] [PATCH 00/47] GuC submission support Matthew Brost
                   ` (29 preceding siblings ...)
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 30/47] drm/i915/guc: Direct all breadcrumbs for a class to single breadcrumbs Matthew Brost
@ 2021-06-24  7:05 ` Matthew Brost
  2021-07-12 19:58   ` John Harrison
  2021-07-15  9:36   ` Tvrtko Ursulin
  2021-06-24  7:05 ` [Intel-gfx] [PATCH 32/47] drm/i915: Reset GPU immediately if submission is disabled Matthew Brost
                   ` (20 subsequent siblings)
  51 siblings, 2 replies; 170+ messages in thread
From: Matthew Brost @ 2021-06-24  7:05 UTC (permalink / raw)
  To: intel-gfx, dri-devel

Reset implementation for new GuC interface. This is the legacy reset
implementation which is called when the i915 owns the engine hang check.
Future patches will offload the engine hang check to GuC but we will
continue to maintain this legacy path as a fallback and this code path
is also required if the GuC dies.

With the new GuC interface it is not possible to reset individual
engines - it is only possible to reset the GPU entirely. This patch
forces an entire chip reset if any engine hangs.

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context.c       |   3 +
 drivers/gpu/drm/i915/gt/intel_context_types.h |   7 +
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |   6 +
 .../drm/i915/gt/intel_execlists_submission.c  |  40 ++
 drivers/gpu/drm/i915/gt/intel_gt_pm.c         |   6 +-
 drivers/gpu/drm/i915/gt/intel_reset.c         |  18 +-
 .../gpu/drm/i915/gt/intel_ring_submission.c   |  22 +
 drivers/gpu/drm/i915/gt/mock_engine.c         |  31 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.c        |  13 -
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   8 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 581 ++++++++++++++----
 drivers/gpu/drm/i915/gt/uc/intel_uc.c         |  39 +-
 drivers/gpu/drm/i915/gt/uc/intel_uc.h         |   3 +
 drivers/gpu/drm/i915/i915_request.c           |  41 +-
 drivers/gpu/drm/i915/i915_request.h           |   2 +
 15 files changed, 649 insertions(+), 171 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index b24a1b7a3f88..2f01437056a8 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -392,6 +392,9 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
 	spin_lock_init(&ce->guc_state.lock);
 	INIT_LIST_HEAD(&ce->guc_state.fences);
 
+	spin_lock_init(&ce->guc_active.lock);
+	INIT_LIST_HEAD(&ce->guc_active.requests);
+
 	ce->guc_id = GUC_INVALID_LRC_ID;
 	INIT_LIST_HEAD(&ce->guc_id_link);
 
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 6945963a31ba..b63c8cf7823b 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -165,6 +165,13 @@ struct intel_context {
 		struct list_head fences;
 	} guc_state;
 
+	struct {
+		/** lock: protects everything in guc_active */
+		spinlock_t lock;
+		/** requests: active requests on this context */
+		struct list_head requests;
+	} guc_active;
+
 	/* GuC scheduling state that does not require a lock. */
 	atomic_t guc_sched_state_no_lock;
 
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index e7cb6a06db9d..f9d264c008e8 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -426,6 +426,12 @@ struct intel_engine_cs {
 
 	void		(*release)(struct intel_engine_cs *engine);
 
+	/*
+	 * Add / remove request from engine active tracking
+	 */
+	void		(*add_active_request)(struct i915_request *rq);
+	void		(*remove_active_request)(struct i915_request *rq);
+
 	struct intel_engine_execlists execlists;
 
 	/*
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index c10ea6080752..c301a2d088b1 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -3118,6 +3118,42 @@ static void execlists_park(struct intel_engine_cs *engine)
 	cancel_timer(&engine->execlists.preempt);
 }
 
+static void add_to_engine(struct i915_request *rq)
+{
+	lockdep_assert_held(&rq->engine->sched_engine->lock);
+	list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
+}
+
+static void remove_from_engine(struct i915_request *rq)
+{
+	struct intel_engine_cs *engine, *locked;
+
+	/*
+	 * Virtual engines complicate acquiring the engine timeline lock,
+	 * as their rq->engine pointer is not stable until under that
+	 * engine lock. The simple ploy we use is to take the lock then
+	 * check that the rq still belongs to the newly locked engine.
+	 */
+	locked = READ_ONCE(rq->engine);
+	spin_lock_irq(&locked->sched_engine->lock);
+	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
+		spin_unlock(&locked->sched_engine->lock);
+		spin_lock(&engine->sched_engine->lock);
+		locked = engine;
+	}
+	list_del_init(&rq->sched.link);
+
+	clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
+	clear_bit(I915_FENCE_FLAG_HOLD, &rq->fence.flags);
+
+	/* Prevent further __await_execution() registering a cb, then flush */
+	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
+
+	spin_unlock_irq(&locked->sched_engine->lock);
+
+	i915_request_notify_execute_cb_imm(rq);
+}
+
 static bool can_preempt(struct intel_engine_cs *engine)
 {
 	if (GRAPHICS_VER(engine->i915) > 8)
@@ -3218,6 +3254,8 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
 	engine->cops = &execlists_context_ops;
 	engine->request_alloc = execlists_request_alloc;
 	engine->bump_serial = execlist_bump_serial;
+	engine->add_active_request = add_to_engine;
+	engine->remove_active_request = remove_from_engine;
 
 	engine->reset.prepare = execlists_reset_prepare;
 	engine->reset.rewind = execlists_reset_rewind;
@@ -3912,6 +3950,8 @@ execlists_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
 			 "v%dx%d", ve->base.class, count);
 		ve->base.context_size = sibling->context_size;
 
+		ve->base.add_active_request = sibling->add_active_request;
+		ve->base.remove_active_request = sibling->remove_active_request;
 		ve->base.emit_bb_start = sibling->emit_bb_start;
 		ve->base.emit_flush = sibling->emit_flush;
 		ve->base.emit_init_breadcrumb = sibling->emit_init_breadcrumb;
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
index aef3084e8b16..463a6ae605a0 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
@@ -174,8 +174,6 @@ static void gt_sanitize(struct intel_gt *gt, bool force)
 	if (intel_gt_is_wedged(gt))
 		intel_gt_unset_wedged(gt);
 
-	intel_uc_sanitize(&gt->uc);
-
 	for_each_engine(engine, gt, id)
 		if (engine->reset.prepare)
 			engine->reset.prepare(engine);
@@ -191,6 +189,8 @@ static void gt_sanitize(struct intel_gt *gt, bool force)
 			__intel_engine_reset(engine, false);
 	}
 
+	intel_uc_reset(&gt->uc, false);
+
 	for_each_engine(engine, gt, id)
 		if (engine->reset.finish)
 			engine->reset.finish(engine);
@@ -243,6 +243,8 @@ int intel_gt_resume(struct intel_gt *gt)
 		goto err_wedged;
 	}
 
+	intel_uc_reset_finish(&gt->uc);
+
 	intel_rps_enable(&gt->rps);
 	intel_llc_enable(&gt->llc);
 
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index 72251638d4ea..2987282dff6d 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -826,6 +826,8 @@ static int gt_reset(struct intel_gt *gt, intel_engine_mask_t stalled_mask)
 		__intel_engine_reset(engine, stalled_mask & engine->mask);
 	local_bh_enable();
 
+	intel_uc_reset(&gt->uc, true);
+
 	intel_ggtt_restore_fences(gt->ggtt);
 
 	return err;
@@ -850,6 +852,8 @@ static void reset_finish(struct intel_gt *gt, intel_engine_mask_t awake)
 		if (awake & engine->mask)
 			intel_engine_pm_put(engine);
 	}
+
+	intel_uc_reset_finish(&gt->uc);
 }
 
 static void nop_submit_request(struct i915_request *request)
@@ -903,6 +907,7 @@ static void __intel_gt_set_wedged(struct intel_gt *gt)
 	for_each_engine(engine, gt, id)
 		if (engine->reset.cancel)
 			engine->reset.cancel(engine);
+	intel_uc_cancel_requests(&gt->uc);
 	local_bh_enable();
 
 	reset_finish(gt, awake);
@@ -1191,6 +1196,9 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)
 	ENGINE_TRACE(engine, "flags=%lx\n", gt->reset.flags);
 	GEM_BUG_ON(!test_bit(I915_RESET_ENGINE + engine->id, &gt->reset.flags));
 
+	if (intel_engine_uses_guc(engine))
+		return -ENODEV;
+
 	if (!intel_engine_pm_get_if_awake(engine))
 		return 0;
 
@@ -1201,13 +1209,10 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)
 			   "Resetting %s for %s\n", engine->name, msg);
 	atomic_inc(&engine->i915->gpu_error.reset_engine_count[engine->uabi_class]);
 
-	if (intel_engine_uses_guc(engine))
-		ret = intel_guc_reset_engine(&engine->gt->uc.guc, engine);
-	else
-		ret = intel_gt_reset_engine(engine);
+	ret = intel_gt_reset_engine(engine);
 	if (ret) {
 		/* If we fail here, we expect to fallback to a global reset */
-		ENGINE_TRACE(engine, "Failed to reset, err: %d\n", ret);
+		ENGINE_TRACE(engine, "Failed to reset %s, err: %d\n", engine->name, ret);
 		goto out;
 	}
 
@@ -1341,7 +1346,8 @@ void intel_gt_handle_error(struct intel_gt *gt,
 	 * Try engine reset when available. We fall back to full reset if
 	 * single reset fails.
 	 */
-	if (intel_has_reset_engine(gt) && !intel_gt_is_wedged(gt)) {
+	if (!intel_uc_uses_guc_submission(&gt->uc) &&
+	    intel_has_reset_engine(gt) && !intel_gt_is_wedged(gt)) {
 		local_bh_disable();
 		for_each_engine_masked(engine, gt, engine_mask, tmp) {
 			BUILD_BUG_ON(I915_RESET_MODESET >= I915_RESET_ENGINE);
diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
index e1506b280df1..99dcdc8fba12 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
@@ -1049,6 +1049,25 @@ static void ring_bump_serial(struct intel_engine_cs *engine)
 	engine->serial++;
 }
 
+static void add_to_engine(struct i915_request *rq)
+{
+	lockdep_assert_held(&rq->engine->sched_engine->lock);
+	list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
+}
+
+static void remove_from_engine(struct i915_request *rq)
+{
+	spin_lock_irq(&rq->engine->sched_engine->lock);
+	list_del_init(&rq->sched.link);
+
+	/* Prevent further __await_execution() registering a cb, then flush */
+	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
+
+	spin_unlock_irq(&rq->engine->sched_engine->lock);
+
+	i915_request_notify_execute_cb_imm(rq);
+}
+
 static void setup_common(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *i915 = engine->i915;
@@ -1066,6 +1085,9 @@ static void setup_common(struct intel_engine_cs *engine)
 	engine->reset.cancel = reset_cancel;
 	engine->reset.finish = reset_finish;
 
+	engine->add_active_request = add_to_engine;
+	engine->remove_active_request = remove_from_engine;
+
 	engine->cops = &ring_context_ops;
 	engine->request_alloc = ring_request_alloc;
 	engine->bump_serial = ring_bump_serial;
diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
index fc5a65ab1937..c12ff3a75ce6 100644
--- a/drivers/gpu/drm/i915/gt/mock_engine.c
+++ b/drivers/gpu/drm/i915/gt/mock_engine.c
@@ -235,6 +235,35 @@ static void mock_submit_request(struct i915_request *request)
 	spin_unlock_irqrestore(&engine->hw_lock, flags);
 }
 
+static void mock_add_to_engine(struct i915_request *rq)
+{
+	lockdep_assert_held(&rq->engine->sched_engine->lock);
+	list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
+}
+
+static void mock_remove_from_engine(struct i915_request *rq)
+{
+	struct intel_engine_cs *engine, *locked;
+
+	/*
+	 * Virtual engines complicate acquiring the engine timeline lock,
+	 * as their rq->engine pointer is not stable until under that
+	 * engine lock. The simple ploy we use is to take the lock then
+	 * check that the rq still belongs to the newly locked engine.
+	 */
+
+	locked = READ_ONCE(rq->engine);
+	spin_lock_irq(&locked->sched_engine->lock);
+	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
+		spin_unlock(&locked->sched_engine->lock);
+		spin_lock(&engine->sched_engine->lock);
+		locked = engine;
+	}
+	list_del_init(&rq->sched.link);
+	spin_unlock_irq(&locked->sched_engine->lock);
+}
+
+
 static void mock_reset_prepare(struct intel_engine_cs *engine)
 {
 }
@@ -327,6 +356,8 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
 	engine->base.emit_flush = mock_emit_flush;
 	engine->base.emit_fini_breadcrumb = mock_emit_breadcrumb;
 	engine->base.submit_request = mock_submit_request;
+	engine->base.add_active_request = mock_add_to_engine;
+	engine->base.remove_active_request = mock_remove_from_engine;
 
 	engine->base.reset.prepare = mock_reset_prepare;
 	engine->base.reset.rewind = mock_reset_rewind;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
index 6661dcb02239..9b09395b998f 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
@@ -572,19 +572,6 @@ int intel_guc_suspend(struct intel_guc *guc)
 	return 0;
 }
 
-/**
- * intel_guc_reset_engine() - ask GuC to reset an engine
- * @guc:	intel_guc structure
- * @engine:	engine to be reset
- */
-int intel_guc_reset_engine(struct intel_guc *guc,
-			   struct intel_engine_cs *engine)
-{
-	/* XXX: to be implemented with submission interface rework */
-
-	return -ENODEV;
-}
-
 /**
  * intel_guc_resume() - notify GuC resuming from suspend state
  * @guc:	the guc
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 22eb1e9cca41..40c9868762d7 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -242,14 +242,16 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask)
 
 int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout);
 
-int intel_guc_reset_engine(struct intel_guc *guc,
-			   struct intel_engine_cs *engine);
-
 int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 					  const u32 *msg, u32 len);
 int intel_guc_sched_done_process_msg(struct intel_guc *guc,
 				     const u32 *msg, u32 len);
 
+void intel_guc_submission_reset_prepare(struct intel_guc *guc);
+void intel_guc_submission_reset(struct intel_guc *guc, bool stalled);
+void intel_guc_submission_reset_finish(struct intel_guc *guc);
+void intel_guc_submission_cancel_requests(struct intel_guc *guc);
+
 void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p);
 
 #endif
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 83058df5ba01..b8c894ad8caf 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -141,7 +141,7 @@ context_wait_for_deregister_to_register(struct intel_context *ce)
 static inline void
 set_context_wait_for_deregister_to_register(struct intel_context *ce)
 {
-	/* Only should be called from guc_lrc_desc_pin() */
+	/* Only should be called from guc_lrc_desc_pin() without lock */
 	ce->guc_state.sched_state |=
 		SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER;
 }
@@ -241,15 +241,31 @@ static int guc_lrc_desc_pool_create(struct intel_guc *guc)
 
 static void guc_lrc_desc_pool_destroy(struct intel_guc *guc)
 {
+	guc->lrc_desc_pool_vaddr = NULL;
 	i915_vma_unpin_and_release(&guc->lrc_desc_pool, I915_VMA_RELEASE_MAP);
 }
 
+static inline bool guc_submission_initialized(struct intel_guc *guc)
+{
+	return guc->lrc_desc_pool_vaddr != NULL;
+}
+
 static inline void reset_lrc_desc(struct intel_guc *guc, u32 id)
 {
-	struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
+	if (likely(guc_submission_initialized(guc))) {
+		struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
+		unsigned long flags;
 
-	memset(desc, 0, sizeof(*desc));
-	xa_erase_irq(&guc->context_lookup, id);
+		memset(desc, 0, sizeof(*desc));
+
+		/*
+		 * xarray API doesn't have xa_erase_irqsave wrapper, so calling
+		 * the lower level functions directly.
+		 */
+		xa_lock_irqsave(&guc->context_lookup, flags);
+		__xa_erase(&guc->context_lookup, id);
+		xa_unlock_irqrestore(&guc->context_lookup, flags);
+	}
 }
 
 static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
@@ -260,7 +276,15 @@ static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
 static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
 					   struct intel_context *ce)
 {
-	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
+	unsigned long flags;
+
+	/*
+	 * xarray API doesn't have xa_save_irqsave wrapper, so calling the
+	 * lower level functions directly.
+	 */
+	xa_lock_irqsave(&guc->context_lookup, flags);
+	__xa_store(&guc->context_lookup, id, ce, GFP_ATOMIC);
+	xa_unlock_irqrestore(&guc->context_lookup, flags);
 }
 
 static int guc_submission_busy_loop(struct intel_guc* guc,
@@ -331,6 +355,8 @@ int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout)
 					interruptible, timeout);
 }
 
+static int guc_lrc_desc_pin(struct intel_context *ce, bool loop);
+
 static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 {
 	int err;
@@ -338,11 +364,22 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 	u32 action[3];
 	int len = 0;
 	u32 g2h_len_dw = 0;
-	bool enabled = context_enabled(ce);
+	bool enabled;
 
 	GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
 	GEM_BUG_ON(context_guc_id_invalid(ce));
 
+	/*
+	 * Corner case where the GuC firmware was blown away and reloaded while
+	 * this context was pinned.
+	 */
+	if (unlikely(!lrc_desc_registered(guc, ce->guc_id))) {
+		err = guc_lrc_desc_pin(ce, false);
+		if (unlikely(err))
+			goto out;
+	}
+	enabled = context_enabled(ce);
+
 	if (!enabled) {
 		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
 		action[len++] = ce->guc_id;
@@ -365,6 +402,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 		intel_context_put(ce);
 	}
 
+out:
 	return err;
 }
 
@@ -419,15 +457,10 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
 	if (submit) {
 		guc_set_lrc_tail(last);
 resubmit:
-		/*
-		 * We only check for -EBUSY here even though it is possible for
-		 * -EDEADLK to be returned. If -EDEADLK is returned, the GuC has
-		 * died and a full GPU needs to be done. The hangcheck will
-		 * eventually detect that the GuC has died and trigger this
-		 * reset so no need to handle -EDEADLK here.
-		 */
 		ret = guc_add_request(guc, last);
-		if (ret == -EBUSY) {
+		if (unlikely(ret == -EIO))
+			goto deadlk;
+		else if (ret == -EBUSY) {
 			tasklet_schedule(&sched_engine->tasklet);
 			guc->stalled_request = last;
 			return false;
@@ -437,6 +470,11 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
 
 	guc->stalled_request = NULL;
 	return submit;
+
+deadlk:
+	sched_engine->tasklet.callback = NULL;
+	tasklet_disable_nosync(&sched_engine->tasklet);
+	return false;
 }
 
 static void guc_submission_tasklet(struct tasklet_struct *t)
@@ -463,27 +501,165 @@ static void cs_irq_handler(struct intel_engine_cs *engine, u16 iir)
 		intel_engine_signal_breadcrumbs(engine);
 }
 
-static void guc_reset_prepare(struct intel_engine_cs *engine)
+static void __guc_context_destroy(struct intel_context *ce);
+static void release_guc_id(struct intel_guc *guc, struct intel_context *ce);
+static void guc_signal_context_fence(struct intel_context *ce);
+
+static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
+{
+	struct intel_context *ce;
+	unsigned long index, flags;
+	bool pending_disable, pending_enable, deregister, destroyed;
+
+	xa_for_each(&guc->context_lookup, index, ce) {
+		/* Flush context */
+		spin_lock_irqsave(&ce->guc_state.lock, flags);
+		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+
+		/*
+		 * Once we are at this point submission_disabled() is guaranteed
+		 * to visible to all callers who set the below flags (see above
+		 * flush and flushes in reset_prepare). If submission_disabled()
+		 * is set, the caller shouldn't set these flags.
+		 */
+
+		destroyed = context_destroyed(ce);
+		pending_enable = context_pending_enable(ce);
+		pending_disable = context_pending_disable(ce);
+		deregister = context_wait_for_deregister_to_register(ce);
+		init_sched_state(ce);
+
+		if (pending_enable || destroyed || deregister) {
+			atomic_dec(&guc->outstanding_submission_g2h);
+			if (deregister)
+				guc_signal_context_fence(ce);
+			if (destroyed) {
+				release_guc_id(guc, ce);
+				__guc_context_destroy(ce);
+			}
+			if (pending_enable|| deregister)
+				intel_context_put(ce);
+		}
+
+		/* Not mutualy exclusive with above if statement. */
+		if (pending_disable) {
+			guc_signal_context_fence(ce);
+			intel_context_sched_disable_unpin(ce);
+			atomic_dec(&guc->outstanding_submission_g2h);
+			intel_context_put(ce);
+		}
+	}
+}
+
+static inline bool
+submission_disabled(struct intel_guc *guc)
+{
+	struct i915_sched_engine * const sched_engine = guc->sched_engine;
+
+	return unlikely(!__tasklet_is_enabled(&sched_engine->tasklet));
+}
+
+static void disable_submission(struct intel_guc *guc)
+{
+	struct i915_sched_engine * const sched_engine = guc->sched_engine;
+
+	if (__tasklet_is_enabled(&sched_engine->tasklet)) {
+		GEM_BUG_ON(!guc->ct.enabled);
+		__tasklet_disable_sync_once(&sched_engine->tasklet);
+		sched_engine->tasklet.callback = NULL;
+	}
+}
+
+static void enable_submission(struct intel_guc *guc)
+{
+	struct i915_sched_engine * const sched_engine = guc->sched_engine;
+	unsigned long flags;
+
+	spin_lock_irqsave(&guc->sched_engine->lock, flags);
+	sched_engine->tasklet.callback = guc_submission_tasklet;
+	wmb();
+	if (!__tasklet_is_enabled(&sched_engine->tasklet) &&
+	    __tasklet_enable(&sched_engine->tasklet)) {
+		GEM_BUG_ON(!guc->ct.enabled);
+
+		/* And kick in case we missed a new request submission. */
+		tasklet_hi_schedule(&sched_engine->tasklet);
+	}
+	spin_unlock_irqrestore(&guc->sched_engine->lock, flags);
+}
+
+static void guc_flush_submissions(struct intel_guc *guc)
 {
-	ENGINE_TRACE(engine, "\n");
+	struct i915_sched_engine * const sched_engine = guc->sched_engine;
+	unsigned long flags;
+
+	spin_lock_irqsave(&sched_engine->lock, flags);
+	spin_unlock_irqrestore(&sched_engine->lock, flags);
+}
+
+void intel_guc_submission_reset_prepare(struct intel_guc *guc)
+{
+	int i;
+
+	if (unlikely(!guc_submission_initialized(guc)))
+		/* Reset called during driver load? GuC not yet initialised! */
+		return;
+
+	disable_submission(guc);
+	guc->interrupts.disable(guc);
+
+	/* Flush IRQ handler */
+	spin_lock_irq(&guc_to_gt(guc)->irq_lock);
+	spin_unlock_irq(&guc_to_gt(guc)->irq_lock);
+
+	guc_flush_submissions(guc);
 
 	/*
-	 * Prevent request submission to the hardware until we have
-	 * completed the reset in i915_gem_reset_finish(). If a request
-	 * is completed by one engine, it may then queue a request
-	 * to a second via its execlists->tasklet *just* as we are
-	 * calling engine->init_hw() and also writing the ELSP.
-	 * Turning off the execlists->tasklet until the reset is over
-	 * prevents the race.
-	 */
-	__tasklet_disable_sync_once(&engine->sched_engine->tasklet);
+	 * Handle any outstanding G2Hs before reset. Call IRQ handler directly
+	 * each pass as interrupt have been disabled. We always scrub for
+	 * outstanding G2H as it is possible for outstanding_submission_g2h to
+	 * be incremented after the context state update.
+ 	 */
+	for (i = 0; i < 4 && atomic_read(&guc->outstanding_submission_g2h); ++i) {
+		intel_guc_to_host_event_handler(guc);
+#define wait_for_reset(guc, wait_var) \
+		guc_wait_for_pending_msg(guc, wait_var, false, (HZ / 20))
+		do {
+			wait_for_reset(guc, &guc->outstanding_submission_g2h);
+		} while (!list_empty(&guc->ct.requests.incoming));
+	}
+	scrub_guc_desc_for_outstanding_g2h(guc);
 }
 
-static void guc_reset_state(struct intel_context *ce,
-			    struct intel_engine_cs *engine,
-			    u32 head,
-			    bool scrub)
+static struct intel_engine_cs *
+guc_virtual_get_sibling(struct intel_engine_cs *ve, unsigned int sibling)
 {
+	struct intel_engine_cs *engine;
+	intel_engine_mask_t tmp, mask = ve->mask;
+	unsigned int num_siblings = 0;
+
+	for_each_engine_masked(engine, ve->gt, mask, tmp)
+		if (num_siblings++ == sibling)
+			return engine;
+
+	return NULL;
+}
+
+static inline struct intel_engine_cs *
+__context_to_physical_engine(struct intel_context *ce)
+{
+	struct intel_engine_cs *engine = ce->engine;
+
+	if (intel_engine_is_virtual(engine))
+		engine = guc_virtual_get_sibling(engine, 0);
+
+	return engine;
+}
+
+static void guc_reset_state(struct intel_context *ce, u32 head, bool scrub)
+{
+	struct intel_engine_cs *engine = __context_to_physical_engine(ce);
+
 	GEM_BUG_ON(!intel_context_is_pinned(ce));
 
 	/*
@@ -501,42 +677,147 @@ static void guc_reset_state(struct intel_context *ce,
 	lrc_update_regs(ce, engine, head);
 }
 
-static void guc_reset_rewind(struct intel_engine_cs *engine, bool stalled)
+static void guc_reset_nop(struct intel_engine_cs *engine)
 {
-	struct intel_engine_execlists * const execlists = &engine->execlists;
-	struct i915_request *rq;
+}
+
+static void guc_rewind_nop(struct intel_engine_cs *engine, bool stalled)
+{
+}
+
+static void
+__unwind_incomplete_requests(struct intel_context *ce)
+{
+	struct i915_request *rq, *rn;
+	struct list_head *pl;
+	int prio = I915_PRIORITY_INVALID;
+	struct i915_sched_engine * const sched_engine =
+		ce->engine->sched_engine;
 	unsigned long flags;
 
-	spin_lock_irqsave(&engine->sched_engine->lock, flags);
+	spin_lock_irqsave(&sched_engine->lock, flags);
+	spin_lock(&ce->guc_active.lock);
+	list_for_each_entry_safe(rq, rn,
+				 &ce->guc_active.requests,
+				 sched.link) {
+		if (i915_request_completed(rq))
+			continue;
+
+		list_del_init(&rq->sched.link);
+		spin_unlock(&ce->guc_active.lock);
+
+		__i915_request_unsubmit(rq);
+
+		/* Push the request back into the queue for later resubmission. */
+		GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID);
+		if (rq_prio(rq) != prio) {
+			prio = rq_prio(rq);
+			pl = i915_sched_lookup_priolist(sched_engine, prio);
+		}
+		GEM_BUG_ON(i915_sched_engine_is_empty(sched_engine));
+
+		list_add_tail(&rq->sched.link, pl);
+		set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
 
-	/* Push back any incomplete requests for replay after the reset. */
-	rq = execlists_unwind_incomplete_requests(execlists);
-	if (!rq)
-		goto out_unlock;
+		spin_lock(&ce->guc_active.lock);
+	}
+	spin_unlock(&ce->guc_active.lock);
+	spin_unlock_irqrestore(&sched_engine->lock, flags);
+}
+
+static struct i915_request *context_find_active_request(struct intel_context *ce)
+{
+	struct i915_request *rq, *active = NULL;
+	unsigned long flags;
+
+	spin_lock_irqsave(&ce->guc_active.lock, flags);
+	list_for_each_entry_reverse(rq, &ce->guc_active.requests,
+				    sched.link) {
+		if (i915_request_completed(rq))
+			break;
+
+		active = rq;
+	}
+	spin_unlock_irqrestore(&ce->guc_active.lock, flags);
+
+	return active;
+}
+
+static void __guc_reset_context(struct intel_context *ce, bool stalled)
+{
+	struct i915_request *rq;
+	u32 head;
+
+	/*
+	 * GuC will implicitly mark the context as non-schedulable
+	 * when it sends the reset notification. Make sure our state
+	 * reflects this change. The context will be marked enabled
+	 * on resubmission.
+	 */
+	clr_context_enabled(ce);
+
+	rq = context_find_active_request(ce);
+	if (!rq) {
+		head = ce->ring->tail;
+		stalled = false;
+		goto out_replay;
+	}
 
 	if (!i915_request_started(rq))
 		stalled = false;
 
+	GEM_BUG_ON(i915_active_is_idle(&ce->active));
+	head = intel_ring_wrap(ce->ring, rq->head);
 	__i915_request_reset(rq, stalled);
-	guc_reset_state(rq->context, engine, rq->head, stalled);
 
-out_unlock:
-	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
+out_replay:
+	guc_reset_state(ce, head, stalled);
+	__unwind_incomplete_requests(ce);
 }
 
-static void guc_reset_cancel(struct intel_engine_cs *engine)
+void intel_guc_submission_reset(struct intel_guc *guc, bool stalled)
+{
+	struct intel_context *ce;
+	unsigned long index;
+
+	if (unlikely(!guc_submission_initialized(guc)))
+		/* Reset called during driver load? GuC not yet initialised! */
+		return;
+
+	xa_for_each(&guc->context_lookup, index, ce)
+		if (intel_context_is_pinned(ce))
+			__guc_reset_context(ce, stalled);
+
+	/* GuC is blown away, drop all references to contexts */
+	xa_destroy(&guc->context_lookup);
+}
+
+static void guc_cancel_context_requests(struct intel_context *ce)
+{
+	struct i915_sched_engine *sched_engine = ce_to_guc(ce)->sched_engine;
+	struct i915_request *rq;
+	unsigned long flags;
+
+	/* Mark all executing requests as skipped. */
+	spin_lock_irqsave(&sched_engine->lock, flags);
+	spin_lock(&ce->guc_active.lock);
+	list_for_each_entry(rq, &ce->guc_active.requests, sched.link)
+		i915_request_put(i915_request_mark_eio(rq));
+	spin_unlock(&ce->guc_active.lock);
+	spin_unlock_irqrestore(&sched_engine->lock, flags);
+}
+
+static void
+guc_cancel_sched_engine_requests(struct i915_sched_engine *sched_engine)
 {
-	struct i915_sched_engine * const sched_engine = engine->sched_engine;
 	struct i915_request *rq, *rn;
 	struct rb_node *rb;
 	unsigned long flags;
 
 	/* Can be called during boot if GuC fails to load */
-	if (!engine->gt)
+	if (!sched_engine)
 		return;
 
-	ENGINE_TRACE(engine, "\n");
-
 	/*
 	 * Before we call engine->cancel_requests(), we should have exclusive
 	 * access to the submission state. This is arranged for us by the
@@ -553,21 +834,16 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
 	 */
 	spin_lock_irqsave(&sched_engine->lock, flags);
 
-	/* Mark all executing requests as skipped. */
-	list_for_each_entry(rq, &sched_engine->requests, sched.link) {
-		i915_request_set_error_once(rq, -EIO);
-		i915_request_mark_complete(rq);
-	}
-
 	/* Flush the queued requests to the timeline list (for retiring). */
 	while ((rb = rb_first_cached(&sched_engine->queue))) {
 		struct i915_priolist *p = to_priolist(rb);
 
 		priolist_for_each_request_consume(rq, rn, p) {
 			list_del_init(&rq->sched.link);
+
 			__i915_request_submit(rq);
-			dma_fence_set_error(&rq->fence, -EIO);
-			i915_request_mark_complete(rq);
+
+			i915_request_put(i915_request_mark_eio(rq));
 		}
 
 		rb_erase_cached(&p->node, &sched_engine->queue);
@@ -582,14 +858,38 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
 	spin_unlock_irqrestore(&sched_engine->lock, flags);
 }
 
-static void guc_reset_finish(struct intel_engine_cs *engine)
+void intel_guc_submission_cancel_requests(struct intel_guc *guc)
 {
-	if (__tasklet_enable(&engine->sched_engine->tasklet))
-		/* And kick in case we missed a new request submission. */
-		tasklet_hi_schedule(&engine->sched_engine->tasklet);
+	struct intel_context *ce;
+	unsigned long index;
+
+	xa_for_each(&guc->context_lookup, index, ce)
+		if (intel_context_is_pinned(ce))
+			guc_cancel_context_requests(ce);
 
-	ENGINE_TRACE(engine, "depth->%d\n",
-		     atomic_read(&engine->sched_engine->tasklet.count));
+	guc_cancel_sched_engine_requests(guc->sched_engine);
+
+	/* GuC is blown away, drop all references to contexts */
+	xa_destroy(&guc->context_lookup);
+}
+
+void intel_guc_submission_reset_finish(struct intel_guc *guc)
+{
+	/* Reset called during driver load or during wedge? */
+	if (unlikely(!guc_submission_initialized(guc) ||
+		     test_bit(I915_WEDGED, &guc_to_gt(guc)->reset.flags)))
+		return;
+
+	/*
+	 * Technically possible for either of these values to be non-zero here,
+	 * but very unlikely + harmless. Regardless let's add a warn so we can
+	 * see in CI if this happens frequently / a precursor to taking down the
+	 * machine.
+	 */
+	GEM_WARN_ON(atomic_read(&guc->outstanding_submission_g2h));
+	atomic_set(&guc->outstanding_submission_g2h, 0);
+
+	enable_submission(guc);
 }
 
 /*
@@ -656,6 +956,9 @@ static int guc_bypass_tasklet_submit(struct intel_guc *guc,
 	else
 		trace_i915_request_guc_submit(rq);
 
+	if (unlikely(ret == -EIO))
+		disable_submission(guc);
+
 	return ret;
 }
 
@@ -668,7 +971,8 @@ static void guc_submit_request(struct i915_request *rq)
 	/* Will be called from irq-context when using foreign fences. */
 	spin_lock_irqsave(&sched_engine->lock, flags);
 
-	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
+	if (submission_disabled(guc) || guc->stalled_request ||
+	    !i915_sched_engine_is_empty(sched_engine))
 		queue_request(sched_engine, rq, rq_prio(rq));
 	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
 		tasklet_hi_schedule(&sched_engine->tasklet);
@@ -805,7 +1109,8 @@ static void unpin_guc_id(struct intel_guc *guc, struct intel_context *ce)
 
 static int __guc_action_register_context(struct intel_guc *guc,
 					 u32 guc_id,
-					 u32 offset)
+					 u32 offset,
+					 bool loop)
 {
 	u32 action[] = {
 		INTEL_GUC_ACTION_REGISTER_CONTEXT,
@@ -813,10 +1118,10 @@ static int __guc_action_register_context(struct intel_guc *guc,
 		offset,
 	};
 
-	return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
+	return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action), 0, loop);
 }
 
-static int register_context(struct intel_context *ce)
+static int register_context(struct intel_context *ce, bool loop)
 {
 	struct intel_guc *guc = ce_to_guc(ce);
 	u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
@@ -824,11 +1129,12 @@ static int register_context(struct intel_context *ce)
 
 	trace_intel_context_register(ce);
 
-	return __guc_action_register_context(guc, ce->guc_id, offset);
+	return __guc_action_register_context(guc, ce->guc_id, offset, loop);
 }
 
 static int __guc_action_deregister_context(struct intel_guc *guc,
-					   u32 guc_id)
+					   u32 guc_id,
+					   bool loop)
 {
 	u32 action[] = {
 		INTEL_GUC_ACTION_DEREGISTER_CONTEXT,
@@ -836,16 +1142,16 @@ static int __guc_action_deregister_context(struct intel_guc *guc,
 	};
 
 	return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),
-					G2H_LEN_DW_DEREGISTER_CONTEXT, true);
+					G2H_LEN_DW_DEREGISTER_CONTEXT, loop);
 }
 
-static int deregister_context(struct intel_context *ce, u32 guc_id)
+static int deregister_context(struct intel_context *ce, u32 guc_id, bool loop)
 {
 	struct intel_guc *guc = ce_to_guc(ce);
 
 	trace_intel_context_deregister(ce);
 
-	return __guc_action_deregister_context(guc, guc_id);
+	return __guc_action_deregister_context(guc, guc_id, loop);
 }
 
 static intel_engine_mask_t adjust_engine_mask(u8 class, intel_engine_mask_t mask)
@@ -874,7 +1180,7 @@ static void guc_context_policy_init(struct intel_engine_cs *engine,
 	desc->preemption_timeout = CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US;
 }
 
-static int guc_lrc_desc_pin(struct intel_context *ce)
+static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
 {
 	struct intel_runtime_pm *runtime_pm =
 		&ce->engine->gt->i915->runtime_pm;
@@ -920,18 +1226,44 @@ static int guc_lrc_desc_pin(struct intel_context *ce)
 	 */
 	if (context_registered) {
 		trace_intel_context_steal_guc_id(ce);
-		set_context_wait_for_deregister_to_register(ce);
-		intel_context_get(ce);
+		if (!loop) {
+			set_context_wait_for_deregister_to_register(ce);
+			intel_context_get(ce);
+		} else {
+			bool disabled;
+			unsigned long flags;
+
+			/* Seal race with Reset */
+			spin_lock_irqsave(&ce->guc_state.lock, flags);
+			disabled = submission_disabled(guc);
+			if (likely(!disabled)) {
+				set_context_wait_for_deregister_to_register(ce);
+				intel_context_get(ce);
+			}
+			spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+			if (unlikely(disabled)) {
+				reset_lrc_desc(guc, desc_idx);
+				return 0;	/* Will get registered later */
+			}
+		}
 
 		/*
 		 * If stealing the guc_id, this ce has the same guc_id as the
 		 * context whos guc_id was stole.
 		 */
 		with_intel_runtime_pm(runtime_pm, wakeref)
-			ret = deregister_context(ce, ce->guc_id);
+			ret = deregister_context(ce, ce->guc_id, loop);
+		if (unlikely(ret == -EBUSY)) {
+			clr_context_wait_for_deregister_to_register(ce);
+			intel_context_put(ce);
+		}
 	} else {
 		with_intel_runtime_pm(runtime_pm, wakeref)
-			ret = register_context(ce);
+			ret = register_context(ce, loop);
+		if (unlikely(ret == -EBUSY))
+			reset_lrc_desc(guc, desc_idx);
+		else if (unlikely(ret == -ENODEV))
+			ret = 0;	/* Will get registered later */
 	}
 
 	return ret;
@@ -994,7 +1326,6 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
 	GEM_BUG_ON(guc_id == GUC_INVALID_LRC_ID);
 
 	trace_intel_context_sched_disable(ce);
-	intel_context_get(ce);
 
 	guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),
 				 G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
@@ -1004,6 +1335,7 @@ static u16 prep_context_pending_disable(struct intel_context *ce)
 {
 	set_context_pending_disable(ce);
 	clr_context_enabled(ce);
+	intel_context_get(ce);
 
 	return ce->guc_id;
 }
@@ -1016,7 +1348,7 @@ static void guc_context_sched_disable(struct intel_context *ce)
 	u16 guc_id;
 	intel_wakeref_t wakeref;
 
-	if (context_guc_id_invalid(ce) ||
+	if (submission_disabled(guc) || context_guc_id_invalid(ce) ||
 	    !lrc_desc_registered(guc, ce->guc_id)) {
 		clr_context_enabled(ce);
 		goto unpin;
@@ -1034,6 +1366,7 @@ static void guc_context_sched_disable(struct intel_context *ce)
 	 * request doesn't slip through the 'context_pending_disable' fence.
 	 */
 	if (unlikely(atomic_add_unless(&ce->pin_count, -2, 2))) {
+		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 		return;
 	}
 	guc_id = prep_context_pending_disable(ce);
@@ -1050,19 +1383,13 @@ static void guc_context_sched_disable(struct intel_context *ce)
 
 static inline void guc_lrc_desc_unpin(struct intel_context *ce)
 {
-	struct intel_engine_cs *engine = ce->engine;
-	struct intel_guc *guc = &engine->gt->uc.guc;
-	unsigned long flags;
+	struct intel_guc *guc = ce_to_guc(ce);
 
 	GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id));
 	GEM_BUG_ON(ce != __get_context(guc, ce->guc_id));
 	GEM_BUG_ON(context_enabled(ce));
 
-	spin_lock_irqsave(&ce->guc_state.lock, flags);
-	set_context_destroyed(ce);
-	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
-
-	deregister_context(ce, ce->guc_id);
+	deregister_context(ce, ce->guc_id, true);
 }
 
 static void __guc_context_destroy(struct intel_context *ce)
@@ -1090,13 +1417,15 @@ static void guc_context_destroy(struct kref *kref)
 	struct intel_guc *guc = &ce->engine->gt->uc.guc;
 	intel_wakeref_t wakeref;
 	unsigned long flags;
+	bool disabled;
 
 	/*
 	 * If the guc_id is invalid this context has been stolen and we can free
 	 * it immediately. Also can be freed immediately if the context is not
 	 * registered with the GuC.
 	 */
-	if (context_guc_id_invalid(ce) ||
+	if (submission_disabled(guc) ||
+	    context_guc_id_invalid(ce) ||
 	    !lrc_desc_registered(guc, ce->guc_id)) {
 		release_guc_id(guc, ce);
 		__guc_context_destroy(ce);
@@ -1123,6 +1452,18 @@ static void guc_context_destroy(struct kref *kref)
 		list_del_init(&ce->guc_id_link);
 	spin_unlock_irqrestore(&guc->contexts_lock, flags);
 
+	/* Seal race with Reset */
+	spin_lock_irqsave(&ce->guc_state.lock, flags);
+	disabled = submission_disabled(guc);
+	if (likely(!disabled))
+		set_context_destroyed(ce);
+	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+	if (unlikely(disabled)) {
+		release_guc_id(guc, ce);
+		__guc_context_destroy(ce);
+		return;
+	}
+
 	/*
 	 * We defer GuC context deregistration until the context is destroyed
 	 * in order to save on CTBs. With this optimization ideally we only need
@@ -1145,6 +1486,33 @@ static int guc_context_alloc(struct intel_context *ce)
 	return lrc_alloc(ce, ce->engine);
 }
 
+static void add_to_context(struct i915_request *rq)
+{
+	struct intel_context *ce = rq->context;
+
+	spin_lock(&ce->guc_active.lock);
+	list_move_tail(&rq->sched.link, &ce->guc_active.requests);
+	spin_unlock(&ce->guc_active.lock);
+}
+
+static void remove_from_context(struct i915_request *rq)
+{
+	struct intel_context *ce = rq->context;
+
+	spin_lock_irq(&ce->guc_active.lock);
+
+	list_del_init(&rq->sched.link);
+	clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
+
+	/* Prevent further __await_execution() registering a cb, then flush */
+	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
+
+	spin_unlock_irq(&ce->guc_active.lock);
+
+	atomic_dec(&ce->guc_id_ref);
+	i915_request_notify_execute_cb_imm(rq);
+}
+
 static const struct intel_context_ops guc_context_ops = {
 	.alloc = guc_context_alloc,
 
@@ -1183,8 +1551,6 @@ static void guc_signal_context_fence(struct intel_context *ce)
 {
 	unsigned long flags;
 
-	GEM_BUG_ON(!context_wait_for_deregister_to_register(ce));
-
 	spin_lock_irqsave(&ce->guc_state.lock, flags);
 	clr_context_wait_for_deregister_to_register(ce);
 	__guc_signal_context_fence(ce);
@@ -1193,8 +1559,9 @@ static void guc_signal_context_fence(struct intel_context *ce)
 
 static bool context_needs_register(struct intel_context *ce, bool new_guc_id)
 {
-	return new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
-		!lrc_desc_registered(ce_to_guc(ce), ce->guc_id);
+	return (new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
+		!lrc_desc_registered(ce_to_guc(ce), ce->guc_id)) &&
+		!submission_disabled(ce_to_guc(ce));
 }
 
 static int guc_request_alloc(struct i915_request *rq)
@@ -1252,8 +1619,12 @@ static int guc_request_alloc(struct i915_request *rq)
 	if (unlikely(ret < 0))
 		return ret;;
 	if (context_needs_register(ce, !!ret)) {
-		ret = guc_lrc_desc_pin(ce);
+		ret = guc_lrc_desc_pin(ce, true);
 		if (unlikely(ret)) {	/* unwind */
+			if (ret == -EIO) {
+				disable_submission(guc);
+				goto out;	/* GPU will be reset */
+			}
 			atomic_dec(&ce->guc_id_ref);
 			unpin_guc_id(guc, ce);
 			return ret;
@@ -1290,20 +1661,6 @@ static int guc_request_alloc(struct i915_request *rq)
 	return 0;
 }
 
-static struct intel_engine_cs *
-guc_virtual_get_sibling(struct intel_engine_cs *ve, unsigned int sibling)
-{
-	struct intel_engine_cs *engine;
-	intel_engine_mask_t tmp, mask = ve->mask;
-	unsigned int num_siblings = 0;
-
-	for_each_engine_masked(engine, ve->gt, mask, tmp)
-		if (num_siblings++ == sibling)
-			return engine;
-
-	return NULL;
-}
-
 static int guc_virtual_context_pre_pin(struct intel_context *ce,
 				       struct i915_gem_ww_ctx *ww,
 				       void **vaddr)
@@ -1512,7 +1869,7 @@ static inline void guc_kernel_context_pin(struct intel_guc *guc,
 {
 	if (context_guc_id_invalid(ce))
 		pin_guc_id(guc, ce);
-	guc_lrc_desc_pin(ce);
+	guc_lrc_desc_pin(ce, true);
 }
 
 static inline void guc_init_lrc_mapping(struct intel_guc *guc)
@@ -1578,13 +1935,15 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
 	engine->cops = &guc_context_ops;
 	engine->request_alloc = guc_request_alloc;
 	engine->bump_serial = guc_bump_serial;
+	engine->add_active_request = add_to_context;
+	engine->remove_active_request = remove_from_context;
 
 	engine->sched_engine->schedule = i915_schedule;
 
-	engine->reset.prepare = guc_reset_prepare;
-	engine->reset.rewind = guc_reset_rewind;
-	engine->reset.cancel = guc_reset_cancel;
-	engine->reset.finish = guc_reset_finish;
+	engine->reset.prepare = guc_reset_nop;
+	engine->reset.rewind = guc_rewind_nop;
+	engine->reset.cancel = guc_reset_nop;
+	engine->reset.finish = guc_reset_nop;
 
 	engine->emit_flush = gen8_emit_flush_xcs;
 	engine->emit_init_breadcrumb = gen8_emit_init_breadcrumb;
@@ -1757,7 +2116,7 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 		 * register this context.
 		 */
 		with_intel_runtime_pm(runtime_pm, wakeref)
-			register_context(ce);
+			register_context(ce, true);
 		guc_signal_context_fence(ce);
 		intel_context_put(ce);
 	} else if (context_destroyed(ce)) {
@@ -1939,6 +2298,10 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
 				 "v%dx%d", ve->base.class, count);
 			ve->base.context_size = sibling->context_size;
 
+			ve->base.add_active_request =
+				sibling->add_active_request;
+			ve->base.remove_active_request =
+				sibling->remove_active_request;
 			ve->base.emit_bb_start = sibling->emit_bb_start;
 			ve->base.emit_flush = sibling->emit_flush;
 			ve->base.emit_init_breadcrumb =
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index 6d8b9233214e..f0b02200aa01 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -565,12 +565,49 @@ void intel_uc_reset_prepare(struct intel_uc *uc)
 {
 	struct intel_guc *guc = &uc->guc;
 
-	if (!intel_guc_is_ready(guc))
+
+	/* Nothing to do if GuC isn't supported */
+	if (!intel_uc_supports_guc(uc))
 		return;
 
+	/* Firmware expected to be running when this function is called */
+	if (!intel_guc_is_ready(guc))
+		goto sanitize;
+
+	if (intel_uc_uses_guc_submission(uc))
+		intel_guc_submission_reset_prepare(guc);
+
+sanitize:
 	__uc_sanitize(uc);
 }
 
+void intel_uc_reset(struct intel_uc *uc, bool stalled)
+{
+	struct intel_guc *guc = &uc->guc;
+
+	/* Firmware can not be running when this function is called  */
+	if (intel_uc_uses_guc_submission(uc))
+		intel_guc_submission_reset(guc, stalled);
+}
+
+void intel_uc_reset_finish(struct intel_uc *uc)
+{
+	struct intel_guc *guc = &uc->guc;
+
+	/* Firmware expected to be running when this function is called */
+	if (intel_guc_is_fw_running(guc) && intel_uc_uses_guc_submission(uc))
+		intel_guc_submission_reset_finish(guc);
+}
+
+void intel_uc_cancel_requests(struct intel_uc *uc)
+{
+	struct intel_guc *guc = &uc->guc;
+
+	/* Firmware can not be running when this function is called  */
+	if (intel_uc_uses_guc_submission(uc))
+		intel_guc_submission_cancel_requests(guc);
+}
+
 void intel_uc_runtime_suspend(struct intel_uc *uc)
 {
 	struct intel_guc *guc = &uc->guc;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
index c4cef885e984..eaa3202192ac 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
@@ -37,6 +37,9 @@ void intel_uc_driver_late_release(struct intel_uc *uc);
 void intel_uc_driver_remove(struct intel_uc *uc);
 void intel_uc_init_mmio(struct intel_uc *uc);
 void intel_uc_reset_prepare(struct intel_uc *uc);
+void intel_uc_reset(struct intel_uc *uc, bool stalled);
+void intel_uc_reset_finish(struct intel_uc *uc);
+void intel_uc_cancel_requests(struct intel_uc *uc);
 void intel_uc_suspend(struct intel_uc *uc);
 void intel_uc_runtime_suspend(struct intel_uc *uc);
 int intel_uc_resume(struct intel_uc *uc);
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index dec5a35c9aa2..192784875a1d 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -194,7 +194,7 @@ static bool irq_work_imm(struct irq_work *wrk)
 	return false;
 }
 
-static void __notify_execute_cb_imm(struct i915_request *rq)
+void i915_request_notify_execute_cb_imm(struct i915_request *rq)
 {
 	__notify_execute_cb(rq, irq_work_imm);
 }
@@ -268,37 +268,6 @@ i915_request_active_engine(struct i915_request *rq,
 	return ret;
 }
 
-
-static void remove_from_engine(struct i915_request *rq)
-{
-	struct intel_engine_cs *engine, *locked;
-
-	/*
-	 * Virtual engines complicate acquiring the engine timeline lock,
-	 * as their rq->engine pointer is not stable until under that
-	 * engine lock. The simple ploy we use is to take the lock then
-	 * check that the rq still belongs to the newly locked engine.
-	 */
-	locked = READ_ONCE(rq->engine);
-	spin_lock_irq(&locked->sched_engine->lock);
-	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
-		spin_unlock(&locked->sched_engine->lock);
-		spin_lock(&engine->sched_engine->lock);
-		locked = engine;
-	}
-	list_del_init(&rq->sched.link);
-
-	clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
-	clear_bit(I915_FENCE_FLAG_HOLD, &rq->fence.flags);
-
-	/* Prevent further __await_execution() registering a cb, then flush */
-	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
-
-	spin_unlock_irq(&locked->sched_engine->lock);
-
-	__notify_execute_cb_imm(rq);
-}
-
 static void __rq_init_watchdog(struct i915_request *rq)
 {
 	rq->watchdog.timer.function = NULL;
@@ -395,9 +364,7 @@ bool i915_request_retire(struct i915_request *rq)
 	 * after removing the breadcrumb and signaling it, so that we do not
 	 * inadvertently attach the breadcrumb to a completed request.
 	 */
-	if (!list_empty(&rq->sched.link))
-		remove_from_engine(rq);
-	atomic_dec(&rq->context->guc_id_ref);
+	rq->engine->remove_active_request(rq);
 	GEM_BUG_ON(!llist_empty(&rq->execute_cb));
 
 	__list_del_entry(&rq->link); /* poison neither prev/next (RCU walks) */
@@ -539,7 +506,7 @@ __await_execution(struct i915_request *rq,
 	if (llist_add(&cb->work.node.llist, &signal->execute_cb)) {
 		if (i915_request_is_active(signal) ||
 		    __request_in_flight(signal))
-			__notify_execute_cb_imm(signal);
+			i915_request_notify_execute_cb_imm(signal);
 	}
 
 	return 0;
@@ -676,7 +643,7 @@ bool __i915_request_submit(struct i915_request *request)
 	result = true;
 
 	GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
-	list_move_tail(&request->sched.link, &engine->sched_engine->requests);
+	engine->add_active_request(request);
 active:
 	clear_bit(I915_FENCE_FLAG_PQUEUE, &request->fence.flags);
 	set_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags);
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index f870cd75a001..bcc6340c505e 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -649,4 +649,6 @@ bool
 i915_request_active_engine(struct i915_request *rq,
 			   struct intel_engine_cs **active);
 
+void i915_request_notify_execute_cb_imm(struct i915_request *rq);
+
 #endif /* I915_REQUEST_H */
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* [Intel-gfx] [PATCH 32/47] drm/i915: Reset GPU immediately if submission is disabled
  2021-06-24  7:04 [Intel-gfx] [PATCH 00/47] GuC submission support Matthew Brost
                   ` (30 preceding siblings ...)
  2021-06-24  7:05 ` [Intel-gfx] [PATCH 31/47] drm/i915/guc: Reset implementation for new GuC interface Matthew Brost
@ 2021-06-24  7:05 ` Matthew Brost
  2021-07-12 20:01   ` John Harrison
  2021-06-24  7:05 ` [Intel-gfx] [PATCH 33/47] drm/i915/guc: Add disable interrupts to guc sanitize Matthew Brost
                   ` (19 subsequent siblings)
  51 siblings, 1 reply; 170+ messages in thread
From: Matthew Brost @ 2021-06-24  7:05 UTC (permalink / raw)
  To: intel-gfx, dri-devel

If submission is disabled by the backend for any reason, reset the GPU
immediately in the heartbeat code as the backend can't be reenabled
until the GPU is reset.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 .../gpu/drm/i915/gt/intel_engine_heartbeat.c  | 63 +++++++++++++++----
 .../gpu/drm/i915/gt/intel_engine_heartbeat.h  |  4 ++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  9 +++
 drivers/gpu/drm/i915/i915_scheduler.c         |  6 ++
 drivers/gpu/drm/i915/i915_scheduler.h         |  6 ++
 drivers/gpu/drm/i915/i915_scheduler_types.h   |  5 ++
 6 files changed, 80 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
index b6a305e6a974..a8495364d906 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
@@ -70,12 +70,30 @@ static void show_heartbeat(const struct i915_request *rq,
 {
 	struct drm_printer p = drm_debug_printer("heartbeat");
 
-	intel_engine_dump(engine, &p,
-			  "%s heartbeat {seqno:%llx:%lld, prio:%d} not ticking\n",
-			  engine->name,
-			  rq->fence.context,
-			  rq->fence.seqno,
-			  rq->sched.attr.priority);
+	if (!rq) {
+		intel_engine_dump(engine, &p,
+				  "%s heartbeat not ticking\n",
+				  engine->name);
+	} else {
+		intel_engine_dump(engine, &p,
+				  "%s heartbeat {seqno:%llx:%lld, prio:%d} not ticking\n",
+				  engine->name,
+				  rq->fence.context,
+				  rq->fence.seqno,
+				  rq->sched.attr.priority);
+	}
+}
+
+static void
+reset_engine(struct intel_engine_cs *engine, struct i915_request *rq)
+{
+	if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
+		show_heartbeat(rq, engine);
+
+	intel_gt_handle_error(engine->gt, engine->mask,
+			      I915_ERROR_CAPTURE,
+			      "stopped heartbeat on %s",
+			      engine->name);
 }
 
 static void heartbeat(struct work_struct *wrk)
@@ -102,6 +120,11 @@ static void heartbeat(struct work_struct *wrk)
 	if (intel_gt_is_wedged(engine->gt))
 		goto out;
 
+	if (i915_sched_engine_disabled(engine->sched_engine)) {
+		reset_engine(engine, engine->heartbeat.systole);
+		goto out;
+	}
+
 	if (engine->heartbeat.systole) {
 		long delay = READ_ONCE(engine->props.heartbeat_interval_ms);
 
@@ -139,13 +162,7 @@ static void heartbeat(struct work_struct *wrk)
 			engine->sched_engine->schedule(rq, &attr);
 			local_bh_enable();
 		} else {
-			if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
-				show_heartbeat(rq, engine);
-
-			intel_gt_handle_error(engine->gt, engine->mask,
-					      I915_ERROR_CAPTURE,
-					      "stopped heartbeat on %s",
-					      engine->name);
+			reset_engine(engine, rq);
 		}
 
 		rq->emitted_jiffies = jiffies;
@@ -194,6 +211,26 @@ void intel_engine_park_heartbeat(struct intel_engine_cs *engine)
 		i915_request_put(fetch_and_zero(&engine->heartbeat.systole));
 }
 
+void intel_gt_unpark_heartbeats(struct intel_gt *gt)
+{
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+
+	for_each_engine(engine, gt, id)
+		if (intel_engine_pm_is_awake(engine))
+			intel_engine_unpark_heartbeat(engine);
+
+}
+
+void intel_gt_park_heartbeats(struct intel_gt *gt)
+{
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+
+	for_each_engine(engine, gt, id)
+		intel_engine_park_heartbeat(engine);
+}
+
 void intel_engine_init_heartbeat(struct intel_engine_cs *engine)
 {
 	INIT_DELAYED_WORK(&engine->heartbeat.work, heartbeat);
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h
index a488ea3e84a3..5da6d809a87a 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h
@@ -7,6 +7,7 @@
 #define INTEL_ENGINE_HEARTBEAT_H
 
 struct intel_engine_cs;
+struct intel_gt;
 
 void intel_engine_init_heartbeat(struct intel_engine_cs *engine);
 
@@ -16,6 +17,9 @@ int intel_engine_set_heartbeat(struct intel_engine_cs *engine,
 void intel_engine_park_heartbeat(struct intel_engine_cs *engine);
 void intel_engine_unpark_heartbeat(struct intel_engine_cs *engine);
 
+void intel_gt_park_heartbeats(struct intel_gt *gt);
+void intel_gt_unpark_heartbeats(struct intel_gt *gt);
+
 int intel_engine_pulse(struct intel_engine_cs *engine);
 int intel_engine_flush_barriers(struct intel_engine_cs *engine);
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index b8c894ad8caf..59fca9748c15 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -10,6 +10,7 @@
 #include "gt/intel_breadcrumbs.h"
 #include "gt/intel_context.h"
 #include "gt/intel_engine_pm.h"
+#include "gt/intel_engine_heartbeat.h"
 #include "gt/intel_gt.h"
 #include "gt/intel_gt_irq.h"
 #include "gt/intel_gt_pm.h"
@@ -605,6 +606,7 @@ void intel_guc_submission_reset_prepare(struct intel_guc *guc)
 		/* Reset called during driver load? GuC not yet initialised! */
 		return;
 
+	intel_gt_park_heartbeats(guc_to_gt(guc));
 	disable_submission(guc);
 	guc->interrupts.disable(guc);
 
@@ -890,6 +892,7 @@ void intel_guc_submission_reset_finish(struct intel_guc *guc)
 	atomic_set(&guc->outstanding_submission_g2h, 0);
 
 	enable_submission(guc);
+	intel_gt_unpark_heartbeats(guc_to_gt(guc));
 }
 
 /*
@@ -1859,6 +1862,11 @@ static int guc_resume(struct intel_engine_cs *engine)
 	return 0;
 }
 
+static bool guc_sched_engine_disabled(struct i915_sched_engine *sched_engine)
+{
+	return !sched_engine->tasklet.callback;
+}
+
 static void guc_set_default_submission(struct intel_engine_cs *engine)
 {
 	engine->submit_request = guc_submit_request;
@@ -2009,6 +2017,7 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
 			return -ENOMEM;
 
 		guc->sched_engine->schedule = i915_schedule;
+		guc->sched_engine->disabled = guc_sched_engine_disabled;
 		guc->sched_engine->private_data = guc;
 		tasklet_setup(&guc->sched_engine->tasklet,
 			      guc_submission_tasklet);
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 3a58a9130309..3fb009ea2cb2 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -440,6 +440,11 @@ void i915_sched_engine_free(struct kref *kref)
 	kfree(sched_engine);
 }
 
+static bool default_disabled(struct i915_sched_engine *sched_engine)
+{
+	return false;
+}
+
 struct i915_sched_engine *
 i915_sched_engine_create(unsigned int subclass)
 {
@@ -453,6 +458,7 @@ i915_sched_engine_create(unsigned int subclass)
 
 	sched_engine->queue = RB_ROOT_CACHED;
 	sched_engine->queue_priority_hint = INT_MIN;
+	sched_engine->disabled = default_disabled;
 
 	INIT_LIST_HEAD(&sched_engine->requests);
 	INIT_LIST_HEAD(&sched_engine->hold);
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 650ab8e0db9f..72105a53b0e1 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -98,4 +98,10 @@ void i915_request_show_with_schedule(struct drm_printer *m,
 				     const char *prefix,
 				     int indent);
 
+static inline bool
+i915_sched_engine_disabled(struct i915_sched_engine *sched_engine)
+{
+	return sched_engine->disabled(sched_engine);
+}
+
 #endif /* _I915_SCHEDULER_H_ */
diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h b/drivers/gpu/drm/i915/i915_scheduler_types.h
index 5935c3152bdc..cfaf52e528d0 100644
--- a/drivers/gpu/drm/i915/i915_scheduler_types.h
+++ b/drivers/gpu/drm/i915/i915_scheduler_types.h
@@ -163,6 +163,11 @@ struct i915_sched_engine {
 	 */
 	void *private_data;
 
+	/**
+	 * @disabled: check if backend has disabled submission
+	 */
+	bool	(*disabled)(struct i915_sched_engine *sched_engine);
+
 	/**
 	 * @kick_backend: kick backend after a request's priority has changed
 	 */
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* [Intel-gfx] [PATCH 33/47] drm/i915/guc: Add disable interrupts to guc sanitize
  2021-06-24  7:04 [Intel-gfx] [PATCH 00/47] GuC submission support Matthew Brost
                   ` (31 preceding siblings ...)
  2021-06-24  7:05 ` [Intel-gfx] [PATCH 32/47] drm/i915: Reset GPU immediately if submission is disabled Matthew Brost
@ 2021-06-24  7:05 ` Matthew Brost
  2021-07-12 20:11   ` John Harrison
  2021-06-24  7:05 ` [Intel-gfx] [PATCH 34/47] drm/i915/guc: Suspend/resume implementation for new interface Matthew Brost
                   ` (18 subsequent siblings)
  51 siblings, 1 reply; 170+ messages in thread
From: Matthew Brost @ 2021-06-24  7:05 UTC (permalink / raw)
  To: intel-gfx, dri-devel

Add disable GuC interrupts to intel_guc_sanitize(). Part of this
requires moving the guc_*_interrupt wrapper function into header file
intel_guc.h.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com
---
 drivers/gpu/drm/i915/gt/uc/intel_guc.h | 16 ++++++++++++++++
 drivers/gpu/drm/i915/gt/uc/intel_uc.c  | 21 +++------------------
 2 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 40c9868762d7..85ef6767f13b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -217,9 +217,25 @@ static inline bool intel_guc_is_ready(struct intel_guc *guc)
 	return intel_guc_is_fw_running(guc) && intel_guc_ct_enabled(&guc->ct);
 }
 
+static inline void intel_guc_reset_interrupts(struct intel_guc *guc)
+{
+	guc->interrupts.reset(guc);
+}
+
+static inline void intel_guc_enable_interrupts(struct intel_guc *guc)
+{
+	guc->interrupts.enable(guc);
+}
+
+static inline void intel_guc_disable_interrupts(struct intel_guc *guc)
+{
+	guc->interrupts.disable(guc);
+}
+
 static inline int intel_guc_sanitize(struct intel_guc *guc)
 {
 	intel_uc_fw_sanitize(&guc->fw);
+	intel_guc_disable_interrupts(guc);
 	intel_guc_ct_sanitize(&guc->ct);
 	guc->mmio_msg = 0;
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index f0b02200aa01..ab11fe731ee7 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -207,21 +207,6 @@ static void guc_handle_mmio_msg(struct intel_guc *guc)
 	spin_unlock_irq(&guc->irq_lock);
 }
 
-static void guc_reset_interrupts(struct intel_guc *guc)
-{
-	guc->interrupts.reset(guc);
-}
-
-static void guc_enable_interrupts(struct intel_guc *guc)
-{
-	guc->interrupts.enable(guc);
-}
-
-static void guc_disable_interrupts(struct intel_guc *guc)
-{
-	guc->interrupts.disable(guc);
-}
-
 static int guc_enable_communication(struct intel_guc *guc)
 {
 	struct intel_gt *gt = guc_to_gt(guc);
@@ -242,7 +227,7 @@ static int guc_enable_communication(struct intel_guc *guc)
 	guc_get_mmio_msg(guc);
 	guc_handle_mmio_msg(guc);
 
-	guc_enable_interrupts(guc);
+	intel_guc_enable_interrupts(guc);
 
 	/* check for CT messages received before we enabled interrupts */
 	spin_lock_irq(&gt->irq_lock);
@@ -265,7 +250,7 @@ static void guc_disable_communication(struct intel_guc *guc)
 	 */
 	guc_clear_mmio_msg(guc);
 
-	guc_disable_interrupts(guc);
+	intel_guc_disable_interrupts(guc);
 
 	intel_guc_ct_disable(&guc->ct);
 
@@ -463,7 +448,7 @@ static int __uc_init_hw(struct intel_uc *uc)
 	if (ret)
 		goto err_out;
 
-	guc_reset_interrupts(guc);
+	intel_guc_reset_interrupts(guc);
 
 	/* WaEnableuKernelHeaderValidFix:skl */
 	/* WaEnableGuCBootHashCheckNotSet:skl,bxt,kbl */
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* [Intel-gfx] [PATCH 34/47] drm/i915/guc: Suspend/resume implementation for new interface
  2021-06-24  7:04 [Intel-gfx] [PATCH 00/47] GuC submission support Matthew Brost
                   ` (32 preceding siblings ...)
  2021-06-24  7:05 ` [Intel-gfx] [PATCH 33/47] drm/i915/guc: Add disable interrupts to guc sanitize Matthew Brost
@ 2021-06-24  7:05 ` Matthew Brost
  2021-07-12 22:56   ` John Harrison
  2021-06-24  7:05 ` [Intel-gfx] [PATCH 35/47] drm/i915/guc: Handle context reset notification Matthew Brost
                   ` (17 subsequent siblings)
  51 siblings, 1 reply; 170+ messages in thread
From: Matthew Brost @ 2021-06-24  7:05 UTC (permalink / raw)
  To: intel-gfx, dri-devel

The new GuC interface introduces an MMIO H2G command,
INTEL_GUC_ACTION_RESET_CLIENT, which is used to implement suspend. This
MMIO tears down any active contexts generating a context reset G2H CTB
for each. Once that step completes the GuC tears down the CTB
channels. It is safe to suspend once this MMIO H2G command completes
and all G2H CTBs have been processed. In practice the i915 will likely
never receive a G2H as suspend should only be called after the GPU is
idle.

Resume is implemented in the same manner as before - simply reload the
GuC firmware and reinitialize everything (e.g. CTB channels, contexts,
etc..).

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
---
 .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h  |  1 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.c        | 64 ++++++++-----------
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 14 ++--
 .../gpu/drm/i915/gt/uc/intel_guc_submission.h |  5 ++
 drivers/gpu/drm/i915/gt/uc/intel_uc.c         | 20 ++++--
 5 files changed, 53 insertions(+), 51 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
index 57e18babdf4b..596cf4b818e5 100644
--- a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
+++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
@@ -142,6 +142,7 @@ enum intel_guc_action {
 	INTEL_GUC_ACTION_REGISTER_COMMAND_TRANSPORT_BUFFER = 0x4505,
 	INTEL_GUC_ACTION_DEREGISTER_COMMAND_TRANSPORT_BUFFER = 0x4506,
 	INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE = 0x4600,
+	INTEL_GUC_ACTION_RESET_CLIENT = 0x5B01,
 	INTEL_GUC_ACTION_LIMIT
 };
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
index 9b09395b998f..68266cbffd1f 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
@@ -524,51 +524,34 @@ int intel_guc_auth_huc(struct intel_guc *guc, u32 rsa_offset)
  */
 int intel_guc_suspend(struct intel_guc *guc)
 {
-	struct intel_uncore *uncore = guc_to_gt(guc)->uncore;
 	int ret;
-	u32 status;
 	u32 action[] = {
-		INTEL_GUC_ACTION_ENTER_S_STATE,
-		GUC_POWER_D1, /* any value greater than GUC_POWER_D0 */
+		INTEL_GUC_ACTION_RESET_CLIENT,
 	};
 
-	/*
-	 * If GuC communication is enabled but submission is not supported,
-	 * we do not need to suspend the GuC.
-	 */
-	if (!intel_guc_submission_is_used(guc) || !intel_guc_is_ready(guc))
+	if (!intel_guc_is_ready(guc))
 		return 0;
 
-	/*
-	 * The ENTER_S_STATE action queues the save/restore operation in GuC FW
-	 * and then returns, so waiting on the H2G is not enough to guarantee
-	 * GuC is done. When all the processing is done, GuC writes
-	 * INTEL_GUC_SLEEP_STATE_SUCCESS to scratch register 14, so we can poll
-	 * on that. Note that GuC does not ensure that the value in the register
-	 * is different from INTEL_GUC_SLEEP_STATE_SUCCESS while the action is
-	 * in progress so we need to take care of that ourselves as well.
-	 */
-
-	intel_uncore_write(uncore, SOFT_SCRATCH(14),
-			   INTEL_GUC_SLEEP_STATE_INVALID_MASK);
-
-	ret = intel_guc_send(guc, action, ARRAY_SIZE(action));
-	if (ret)
-		return ret;
-
-	ret = __intel_wait_for_register(uncore, SOFT_SCRATCH(14),
-					INTEL_GUC_SLEEP_STATE_INVALID_MASK,
-					0, 0, 10, &status);
-	if (ret)
-		return ret;
-
-	if (status != INTEL_GUC_SLEEP_STATE_SUCCESS) {
-		DRM_ERROR("GuC failed to change sleep state. "
-			  "action=0x%x, err=%u\n",
-			  action[0], status);
-		return -EIO;
+	if (intel_guc_submission_is_used(guc)) {
+		/*
+		 * This H2G MMIO command tears down the GuC in two steps. First it will
+		 * generate a G2H CTB for every active context indicating a reset. In
+		 * practice the i915 shouldn't ever get a G2H as suspend should only be
+		 * called when the GPU is idle. Next, it tears down the CTBs and this
+		 * H2G MMIO command completes.
+		 *
+		 * Don't abort on a failure code from the GuC. Keep going and do the
+		 * clean up in santize() and re-initialisation on resume and hopefully
+		 * the error here won't be problematic.
+		 */
+		ret = intel_guc_send_mmio(guc, action, ARRAY_SIZE(action), NULL, 0);
+		if (ret)
+			DRM_ERROR("GuC suspend: RESET_CLIENT action failed with error %d!\n", ret);
 	}
 
+	/* Signal that the GuC isn't running. */
+	intel_guc_sanitize(guc);
+
 	return 0;
 }
 
@@ -578,7 +561,12 @@ int intel_guc_suspend(struct intel_guc *guc)
  */
 int intel_guc_resume(struct intel_guc *guc)
 {
-	/* XXX: to be implemented with submission interface rework */
+	/*
+	 * NB: This function can still be called even if GuC submission is
+	 * disabled, e.g. if GuC is enabled for HuC authentication only. Thus,
+	 * if any code is later added here, it must be support doing nothing
+	 * if submission is disabled (as per intel_guc_suspend).
+	 */
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 59fca9748c15..16b61fe71b07 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -304,10 +304,10 @@ static int guc_submission_busy_loop(struct intel_guc* guc,
 	return err;
 }
 
-static int guc_wait_for_pending_msg(struct intel_guc *guc,
-				    atomic_t *wait_var,
-				    bool interruptible,
-				    long timeout)
+int intel_guc_wait_for_pending_msg(struct intel_guc *guc,
+				   atomic_t *wait_var,
+				   bool interruptible,
+				   long timeout)
 {
 	const int state = interruptible ?
 		TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
@@ -352,8 +352,8 @@ int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout)
 	if (unlikely(timeout < 0))
 		timeout = -timeout, interruptible = false;
 
-	return guc_wait_for_pending_msg(guc, &guc->outstanding_submission_g2h,
-					interruptible, timeout);
+	return intel_guc_wait_for_pending_msg(guc, &guc->outstanding_submission_g2h,
+					      interruptible, timeout);
 }
 
 static int guc_lrc_desc_pin(struct intel_context *ce, bool loop);
@@ -625,7 +625,7 @@ void intel_guc_submission_reset_prepare(struct intel_guc *guc)
 	for (i = 0; i < 4 && atomic_read(&guc->outstanding_submission_g2h); ++i) {
 		intel_guc_to_host_event_handler(guc);
 #define wait_for_reset(guc, wait_var) \
-		guc_wait_for_pending_msg(guc, wait_var, false, (HZ / 20))
+		intel_guc_wait_for_pending_msg(guc, wait_var, false, (HZ / 20))
 		do {
 			wait_for_reset(guc, &guc->outstanding_submission_g2h);
 		} while (!list_empty(&guc->ct.requests.incoming));
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
index 95df5ab06031..b9b9f0f60f91 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
@@ -27,6 +27,11 @@ void intel_guc_log_context_info(struct intel_guc *guc, struct drm_printer *p);
 
 bool intel_guc_virtual_engine_has_heartbeat(const struct intel_engine_cs *ve);
 
+int intel_guc_wait_for_pending_msg(struct intel_guc *guc,
+				   atomic_t *wait_var,
+				   bool interruptible,
+				   long timeout);
+
 static inline bool intel_guc_submission_is_supported(struct intel_guc *guc)
 {
 	/* XXX: GuC submission is unavailable for now */
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index ab11fe731ee7..b523a8521351 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -596,14 +596,18 @@ void intel_uc_cancel_requests(struct intel_uc *uc)
 void intel_uc_runtime_suspend(struct intel_uc *uc)
 {
 	struct intel_guc *guc = &uc->guc;
-	int err;
 
 	if (!intel_guc_is_ready(guc))
 		return;
 
-	err = intel_guc_suspend(guc);
-	if (err)
-		DRM_DEBUG_DRIVER("Failed to suspend GuC, err=%d", err);
+	/*
+	 * Wait for any outstanding CTB before tearing down communication /w the
+	 * GuC.
+	 */
+#define OUTSTANDING_CTB_TIMEOUT_PERIOD	(HZ / 5)
+	intel_guc_wait_for_pending_msg(guc, &guc->outstanding_submission_g2h,
+				       false, OUTSTANDING_CTB_TIMEOUT_PERIOD);
+	GEM_WARN_ON(atomic_read(&guc->outstanding_submission_g2h));
 
 	guc_disable_communication(guc);
 }
@@ -612,12 +616,16 @@ void intel_uc_suspend(struct intel_uc *uc)
 {
 	struct intel_guc *guc = &uc->guc;
 	intel_wakeref_t wakeref;
+	int err;
 
 	if (!intel_guc_is_ready(guc))
 		return;
 
-	with_intel_runtime_pm(uc_to_gt(uc)->uncore->rpm, wakeref)
-		intel_uc_runtime_suspend(uc);
+	with_intel_runtime_pm(&uc_to_gt(uc)->i915->runtime_pm, wakeref) {
+		err = intel_guc_suspend(guc);
+		if (err)
+			DRM_DEBUG_DRIVER("Failed to suspend GuC, err=%d", err);
+	}
 }
 
 static int __uc_resume(struct intel_uc *uc, bool enable_communication)
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* [Intel-gfx] [PATCH 35/47] drm/i915/guc: Handle context reset notification
  2021-06-24  7:04 [Intel-gfx] [PATCH 00/47] GuC submission support Matthew Brost
                   ` (33 preceding siblings ...)
  2021-06-24  7:05 ` [Intel-gfx] [PATCH 34/47] drm/i915/guc: Suspend/resume implementation for new interface Matthew Brost
@ 2021-06-24  7:05 ` Matthew Brost
  2021-07-12 22:58   ` John Harrison
  2021-06-24  7:05 ` [Intel-gfx] [PATCH 36/47] drm/i915/guc: Handle engine reset failure notification Matthew Brost
                   ` (16 subsequent siblings)
  51 siblings, 1 reply; 170+ messages in thread
From: Matthew Brost @ 2021-06-24  7:05 UTC (permalink / raw)
  To: intel-gfx, dri-devel

GuC will issue a reset on detecting an engine hang and will notify
the driver via a G2H message. The driver will service the notification
by resetting the guilty context to a simple state or banning it
completely.

Cc: Matthew Brost <matthew.brost@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  2 ++
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |  3 ++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 35 +++++++++++++++++++
 drivers/gpu/drm/i915/i915_trace.h             | 10 ++++++
 4 files changed, 50 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 85ef6767f13b..e94b0ef733da 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -262,6 +262,8 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 					  const u32 *msg, u32 len);
 int intel_guc_sched_done_process_msg(struct intel_guc *guc,
 				     const u32 *msg, u32 len);
+int intel_guc_context_reset_process_msg(struct intel_guc *guc,
+					const u32 *msg, u32 len);
 
 void intel_guc_submission_reset_prepare(struct intel_guc *guc);
 void intel_guc_submission_reset(struct intel_guc *guc, bool stalled);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 4ed074df88e5..a2020373b8e8 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -945,6 +945,9 @@ static int ct_process_request(struct intel_guc_ct *ct, struct ct_incoming_msg *r
 	case INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE:
 		ret = intel_guc_sched_done_process_msg(guc, payload, len);
 		break;
+	case INTEL_GUC_ACTION_CONTEXT_RESET_NOTIFICATION:
+		ret = intel_guc_context_reset_process_msg(guc, payload, len);
+		break;
 	default:
 		ret = -EOPNOTSUPP;
 		break;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 16b61fe71b07..9845c5bd9832 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -2192,6 +2192,41 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
 	return 0;
 }
 
+static void guc_context_replay(struct intel_context *ce)
+{
+	struct i915_sched_engine *sched_engine = ce->engine->sched_engine;
+
+	__guc_reset_context(ce, true);
+	tasklet_hi_schedule(&sched_engine->tasklet);
+}
+
+static void guc_handle_context_reset(struct intel_guc *guc,
+				     struct intel_context *ce)
+{
+	trace_intel_context_reset(ce);
+	guc_context_replay(ce);
+}
+
+int intel_guc_context_reset_process_msg(struct intel_guc *guc,
+					const u32 *msg, u32 len)
+{
+	struct intel_context *ce;
+	int desc_idx = msg[0];
+
+	if (unlikely(len != 1)) {
+		drm_dbg(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);
+		return -EPROTO;
+	}
+
+	ce = g2h_context_lookup(guc, desc_idx);
+	if (unlikely(!ce))
+		return -EPROTO;
+
+	guc_handle_context_reset(guc, ce);
+
+	return 0;
+}
+
 void intel_guc_log_submission_info(struct intel_guc *guc,
 				   struct drm_printer *p)
 {
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 97c2e83984ed..c095c4d39456 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -929,6 +929,11 @@ DECLARE_EVENT_CLASS(intel_context,
 		      __entry->guc_sched_state_no_lock)
 );
 
+DEFINE_EVENT(intel_context, intel_context_reset,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
 DEFINE_EVENT(intel_context, intel_context_register,
 	     TP_PROTO(struct intel_context *ce),
 	     TP_ARGS(ce)
@@ -1026,6 +1031,11 @@ trace_i915_request_out(struct i915_request *rq)
 {
 }
 
+static inline void
+trace_intel_context_reset(struct intel_context *ce)
+{
+}
+
 static inline void
 trace_intel_context_register(struct intel_context *ce)
 {
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* [Intel-gfx] [PATCH 36/47] drm/i915/guc: Handle engine reset failure notification
  2021-06-24  7:04 [Intel-gfx] [PATCH 00/47] GuC submission support Matthew Brost
                   ` (34 preceding siblings ...)
  2021-06-24  7:05 ` [Intel-gfx] [PATCH 35/47] drm/i915/guc: Handle context reset notification Matthew Brost
@ 2021-06-24  7:05 ` Matthew Brost
  2021-07-12 22:59   ` John Harrison
  2021-06-24  7:05 ` [Intel-gfx] [PATCH 37/47] drm/i915/guc: Enable the timer expired interrupt for GuC Matthew Brost
                   ` (15 subsequent siblings)
  51 siblings, 1 reply; 170+ messages in thread
From: Matthew Brost @ 2021-06-24  7:05 UTC (permalink / raw)
  To: intel-gfx, dri-devel

GuC will notify the driver, via G2H, if it fails to
reset an engine. We recover by resorting to a full GPU
reset.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Fernando Pacheco <fernando.pacheco@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  2 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |  3 ++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 43 +++++++++++++++++++
 3 files changed, 48 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index e94b0ef733da..99742625e6ff 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -264,6 +264,8 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
 				     const u32 *msg, u32 len);
 int intel_guc_context_reset_process_msg(struct intel_guc *guc,
 					const u32 *msg, u32 len);
+int intel_guc_engine_failure_process_msg(struct intel_guc *guc,
+					 const u32 *msg, u32 len);
 
 void intel_guc_submission_reset_prepare(struct intel_guc *guc);
 void intel_guc_submission_reset(struct intel_guc *guc, bool stalled);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index a2020373b8e8..dd6177c8d75c 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -948,6 +948,9 @@ static int ct_process_request(struct intel_guc_ct *ct, struct ct_incoming_msg *r
 	case INTEL_GUC_ACTION_CONTEXT_RESET_NOTIFICATION:
 		ret = intel_guc_context_reset_process_msg(guc, payload, len);
 		break;
+	case INTEL_GUC_ACTION_ENGINE_FAILURE_NOTIFICATION:
+		ret = intel_guc_engine_failure_process_msg(guc, payload, len);
+		break;
 	default:
 		ret = -EOPNOTSUPP;
 		break;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 9845c5bd9832..c3223958dfe0 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -2227,6 +2227,49 @@ int intel_guc_context_reset_process_msg(struct intel_guc *guc,
 	return 0;
 }
 
+static struct intel_engine_cs *
+guc_lookup_engine(struct intel_guc *guc, u8 guc_class, u8 instance)
+{
+	struct intel_gt *gt = guc_to_gt(guc);
+	u8 engine_class = guc_class_to_engine_class(guc_class);
+
+	/* Class index is checked in class converter */
+	GEM_BUG_ON(instance > MAX_ENGINE_INSTANCE);
+
+	return gt->engine_class[engine_class][instance];
+}
+
+int intel_guc_engine_failure_process_msg(struct intel_guc *guc,
+					 const u32 *msg, u32 len)
+{
+	struct intel_engine_cs *engine;
+	u8 guc_class, instance;
+	u32 reason;
+
+	if (unlikely(len != 3)) {
+		drm_dbg(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);
+		return -EPROTO;
+	}
+
+	guc_class = msg[0];
+	instance = msg[1];
+	reason = msg[2];
+
+	engine = guc_lookup_engine(guc, guc_class, instance);
+	if (unlikely(!engine)) {
+		drm_dbg(&guc_to_gt(guc)->i915->drm,
+			"Invalid engine %d:%d", guc_class, instance);
+		return -EPROTO;
+	}
+
+	intel_gt_handle_error(guc_to_gt(guc), engine->mask,
+			      I915_ERROR_CAPTURE,
+			      "GuC failed to reset %s (reason=0x%08x)\n",
+			      engine->name, reason);
+
+	return 0;
+}
+
 void intel_guc_log_submission_info(struct intel_guc *guc,
 				   struct drm_printer *p)
 {
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* [Intel-gfx] [PATCH 37/47] drm/i915/guc: Enable the timer expired interrupt for GuC
  2021-06-24  7:04 [Intel-gfx] [PATCH 00/47] GuC submission support Matthew Brost
                   ` (35 preceding siblings ...)
  2021-06-24  7:05 ` [Intel-gfx] [PATCH 36/47] drm/i915/guc: Handle engine reset failure notification Matthew Brost
@ 2021-06-24  7:05 ` Matthew Brost
  2021-07-12 23:00   ` John Harrison
  2021-06-24  7:05 ` [Intel-gfx] [PATCH 38/47] drm/i915/guc: Provide mmio list to be saved/restored on engine reset Matthew Brost
                   ` (14 subsequent siblings)
  51 siblings, 1 reply; 170+ messages in thread
From: Matthew Brost @ 2021-06-24  7:05 UTC (permalink / raw)
  To: intel-gfx, dri-devel

The GuC can implement execution qunatums, detect hung contexts and
other such things but it requires the timer expired interrupt to do so.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
CC: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/gt/intel_rps.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_rps.c b/drivers/gpu/drm/i915/gt/intel_rps.c
index 06e9a8ed4e03..0c8e7f2b06f0 100644
--- a/drivers/gpu/drm/i915/gt/intel_rps.c
+++ b/drivers/gpu/drm/i915/gt/intel_rps.c
@@ -1877,6 +1877,10 @@ void intel_rps_init(struct intel_rps *rps)
 
 	if (GRAPHICS_VER(i915) >= 8 && GRAPHICS_VER(i915) < 11)
 		rps->pm_intrmsk_mbz |= GEN8_PMINTR_DISABLE_REDIRECT_TO_GUC;
+
+	/* GuC needs ARAT expired interrupt unmasked */
+	if (intel_uc_uses_guc_submission(&rps_to_gt(rps)->uc))
+		rps->pm_intrmsk_mbz |= ARAT_EXPIRED_INTRMSK;
 }
 
 void intel_rps_sanitize(struct intel_rps *rps)
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* [Intel-gfx] [PATCH 38/47] drm/i915/guc: Provide mmio list to be saved/restored on engine reset
  2021-06-24  7:04 [Intel-gfx] [PATCH 00/47] GuC submission support Matthew Brost
                   ` (36 preceding siblings ...)
  2021-06-24  7:05 ` [Intel-gfx] [PATCH 37/47] drm/i915/guc: Enable the timer expired interrupt for GuC Matthew Brost
@ 2021-06-24  7:05 ` Matthew Brost
  2021-06-24  7:05 ` [Intel-gfx] [PATCH 39/47] drm/i915/guc: Don't complain about reset races Matthew Brost
                   ` (13 subsequent siblings)
  51 siblings, 0 replies; 170+ messages in thread
From: Matthew Brost @ 2021-06-24  7:05 UTC (permalink / raw)
  To: intel-gfx, dri-devel

From: John Harrison <John.C.Harrison@Intel.com>

The driver must provide GuC with a list of mmio registers
that should be saved/restored during a GuC-based engine reset.
Unfortunately, the list must be dynamically allocated as its size is
variable. That means the driver must generate the list twice - once to
work out the size and a second time to actually save it.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Fernando Pacheco <fernando.pacheco@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_workarounds.c   |  46 ++--
 .../gpu/drm/i915/gt/intel_workarounds_types.h |   1 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   1 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c    | 199 +++++++++++++++++-
 drivers/gpu/drm/i915/i915_reg.h               |   1 +
 5 files changed, 222 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
index d9a5a445ceec..9bb85187f071 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
@@ -150,13 +150,14 @@ static void _wa_add(struct i915_wa_list *wal, const struct i915_wa *wa)
 }
 
 static void wa_add(struct i915_wa_list *wal, i915_reg_t reg,
-		   u32 clear, u32 set, u32 read_mask)
+		   u32 clear, u32 set, u32 read_mask, bool masked_reg)
 {
 	struct i915_wa wa = {
 		.reg  = reg,
 		.clr  = clear,
 		.set  = set,
 		.read = read_mask,
+		.masked_reg = masked_reg,
 	};
 
 	_wa_add(wal, &wa);
@@ -165,7 +166,7 @@ static void wa_add(struct i915_wa_list *wal, i915_reg_t reg,
 static void
 wa_write_clr_set(struct i915_wa_list *wal, i915_reg_t reg, u32 clear, u32 set)
 {
-	wa_add(wal, reg, clear, set, clear);
+	wa_add(wal, reg, clear, set, clear, false);
 }
 
 static void
@@ -200,20 +201,20 @@ wa_write_clr(struct i915_wa_list *wal, i915_reg_t reg, u32 clr)
 static void
 wa_masked_en(struct i915_wa_list *wal, i915_reg_t reg, u32 val)
 {
-	wa_add(wal, reg, 0, _MASKED_BIT_ENABLE(val), val);
+	wa_add(wal, reg, 0, _MASKED_BIT_ENABLE(val), val, true);
 }
 
 static void
 wa_masked_dis(struct i915_wa_list *wal, i915_reg_t reg, u32 val)
 {
-	wa_add(wal, reg, 0, _MASKED_BIT_DISABLE(val), val);
+	wa_add(wal, reg, 0, _MASKED_BIT_DISABLE(val), val, true);
 }
 
 static void
 wa_masked_field_set(struct i915_wa_list *wal, i915_reg_t reg,
 		    u32 mask, u32 val)
 {
-	wa_add(wal, reg, 0, _MASKED_FIELD(mask, val), mask);
+	wa_add(wal, reg, 0, _MASKED_FIELD(mask, val), mask, true);
 }
 
 static void gen6_ctx_workarounds_init(struct intel_engine_cs *engine,
@@ -583,10 +584,10 @@ static void icl_ctx_workarounds_init(struct intel_engine_cs *engine,
 			     GEN11_BLEND_EMB_FIX_DISABLE_IN_RCC);
 
 	/* WaEnableFloatBlendOptimization:icl */
-	wa_write_clr_set(wal,
-			 GEN10_CACHE_MODE_SS,
-			 0, /* write-only, so skip validation */
-			 _MASKED_BIT_ENABLE(FLOAT_BLEND_OPTIMIZATION_ENABLE));
+	wa_add(wal, GEN10_CACHE_MODE_SS, 0,
+	       _MASKED_BIT_ENABLE(FLOAT_BLEND_OPTIMIZATION_ENABLE),
+	       0 /* write-only, so skip validation */,
+	       true);
 
 	/* WaDisableGPGPUMidThreadPreemption:icl */
 	wa_masked_field_set(wal, GEN8_CS_CHICKEN1,
@@ -631,7 +632,7 @@ static void gen12_ctx_gt_tuning_init(struct intel_engine_cs *engine,
 	       FF_MODE2,
 	       FF_MODE2_TDS_TIMER_MASK,
 	       FF_MODE2_TDS_TIMER_128,
-	       0);
+	       0, false);
 }
 
 static void gen12_ctx_workarounds_init(struct intel_engine_cs *engine,
@@ -669,7 +670,7 @@ static void gen12_ctx_workarounds_init(struct intel_engine_cs *engine,
 	       FF_MODE2,
 	       FF_MODE2_GS_TIMER_MASK,
 	       FF_MODE2_GS_TIMER_224,
-	       0);
+	       0, false);
 }
 
 static void dg1_ctx_workarounds_init(struct intel_engine_cs *engine,
@@ -840,7 +841,7 @@ hsw_gt_workarounds_init(struct drm_i915_private *i915, struct i915_wa_list *wal)
 	wa_add(wal,
 	       HSW_ROW_CHICKEN3, 0,
 	       _MASKED_BIT_ENABLE(HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE),
-		0 /* XXX does this reg exist? */);
+	       0 /* XXX does this reg exist? */, true);
 
 	/* WaVSRefCountFullforceMissDisable:hsw */
 	wa_write_clr(wal, GEN7_FF_THREAD_MODE, GEN7_FF_VS_REF_CNT_FFME);
@@ -1929,10 +1930,10 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
 		 * disable bit, which we don't touch here, but it's good
 		 * to keep in mind (see 3DSTATE_PS and 3DSTATE_WM).
 		 */
-		wa_add(wal, GEN7_GT_MODE, 0,
-		       _MASKED_FIELD(GEN6_WIZ_HASHING_MASK,
-				     GEN6_WIZ_HASHING_16x4),
-		       GEN6_WIZ_HASHING_16x4);
+		wa_masked_field_set(wal,
+				    GEN7_GT_MODE,
+				    GEN6_WIZ_HASHING_MASK,
+				    GEN6_WIZ_HASHING_16x4);
 	}
 
 	if (IS_GRAPHICS_VER(i915, 6, 7))
@@ -1982,10 +1983,10 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
 		 * disable bit, which we don't touch here, but it's good
 		 * to keep in mind (see 3DSTATE_PS and 3DSTATE_WM).
 		 */
-		wa_add(wal,
-		       GEN6_GT_MODE, 0,
-		       _MASKED_FIELD(GEN6_WIZ_HASHING_MASK, GEN6_WIZ_HASHING_16x4),
-		       GEN6_WIZ_HASHING_16x4);
+		wa_masked_field_set(wal,
+				    GEN7_GT_MODE,
+				    GEN6_WIZ_HASHING_MASK,
+				    GEN6_WIZ_HASHING_16x4);
 
 		/* WaDisable_RenderCache_OperationalFlush:snb */
 		wa_masked_dis(wal, CACHE_MODE_0, RC_OP_FLUSH_ENABLE);
@@ -2006,7 +2007,7 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
 		wa_add(wal, MI_MODE,
 		       0, _MASKED_BIT_ENABLE(VS_TIMER_DISPATCH),
 		       /* XXX bit doesn't stick on Broadwater */
-		       IS_I965G(i915) ? 0 : VS_TIMER_DISPATCH);
+		       IS_I965G(i915) ? 0 : VS_TIMER_DISPATCH, true);
 
 	if (GRAPHICS_VER(i915) == 4)
 		/*
@@ -2021,7 +2022,8 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
 		 */
 		wa_add(wal, ECOSKPD,
 		       0, _MASKED_BIT_ENABLE(ECO_CONSTANT_BUFFER_SR_DISABLE),
-		       0 /* XXX bit doesn't stick on Broadwater */);
+		       0 /* XXX bit doesn't stick on Broadwater */,
+		       true);
 }
 
 static void
diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds_types.h b/drivers/gpu/drm/i915/gt/intel_workarounds_types.h
index c214111ea367..1e873681795d 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds_types.h
@@ -15,6 +15,7 @@ struct i915_wa {
 	u32		clr;
 	u32		set;
 	u32		read;
+	bool		masked_reg;
 };
 
 struct i915_wa_list {
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 99742625e6ff..ab1a85b508db 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -58,6 +58,7 @@ struct intel_guc {
 
 	struct i915_vma *ads_vma;
 	struct __guc_ads_blob *ads_blob;
+	u32 ads_regset_size;
 
 	struct i915_vma *lrc_desc_pool;
 	void *lrc_desc_pool_vaddr;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
index b82145652d57..9fd3c911f5fb 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
@@ -3,6 +3,8 @@
  * Copyright © 2014-2019 Intel Corporation
  */
 
+#include <linux/bsearch.h>
+
 #include "gt/intel_gt.h"
 #include "gt/intel_lrc.h"
 #include "intel_guc_ads.h"
@@ -23,7 +25,12 @@
  *      | guc_policies                          |
  *      +---------------------------------------+
  *      | guc_gt_system_info                    |
- *      +---------------------------------------+
+ *      +---------------------------------------+ <== static
+ *      | guc_mmio_reg[countA] (engine 0.0)     |
+ *      | guc_mmio_reg[countB] (engine 0.1)     |
+ *      | guc_mmio_reg[countC] (engine 1.0)     |
+ *      |   ...                                 |
+ *      +---------------------------------------+ <== dynamic
  *      | padding                               |
  *      +---------------------------------------+ <== 4K aligned
  *      | private data                          |
@@ -35,16 +42,33 @@ struct __guc_ads_blob {
 	struct guc_ads ads;
 	struct guc_policies policies;
 	struct guc_gt_system_info system_info;
+	/* From here on, location is dynamic! Refer to above diagram. */
+	struct guc_mmio_reg regset[0];
 } __packed;
 
+static u32 guc_ads_regset_size(struct intel_guc *guc)
+{
+	GEM_BUG_ON(!guc->ads_regset_size);
+	return guc->ads_regset_size;
+}
+
 static u32 guc_ads_private_data_size(struct intel_guc *guc)
 {
 	return PAGE_ALIGN(guc->fw.private_data_size);
 }
 
+static u32 guc_ads_regset_offset(struct intel_guc *guc)
+{
+	return offsetof(struct __guc_ads_blob, regset);
+}
+
 static u32 guc_ads_private_data_offset(struct intel_guc *guc)
 {
-	return PAGE_ALIGN(sizeof(struct __guc_ads_blob));
+	u32 offset;
+
+	offset = guc_ads_regset_offset(guc) +
+		 guc_ads_regset_size(guc);
+	return PAGE_ALIGN(offset);
 }
 
 static u32 guc_ads_blob_size(struct intel_guc *guc)
@@ -83,6 +107,165 @@ static void guc_mapping_table_init(struct intel_gt *gt,
 	}
 }
 
+/*
+ * The save/restore register list must be pre-calculated to a temporary
+ * buffer of driver defined size before it can be generated in place
+ * inside the ADS.
+ */
+#define MAX_MMIO_REGS	128	/* Arbitrary size, increase as needed */
+struct temp_regset {
+	struct guc_mmio_reg *registers;
+	u32 used;
+	u32 size;
+};
+
+static int guc_mmio_reg_cmp(const void *a, const void *b)
+{
+	const struct guc_mmio_reg *ra = a;
+	const struct guc_mmio_reg *rb = b;
+
+	return (int)ra->offset - (int)rb->offset;
+}
+
+static void guc_mmio_reg_add(struct temp_regset *regset,
+			     u32 offset, u32 flags)
+{
+	u32 count = regset->used;
+	struct guc_mmio_reg reg = {
+		.offset = offset,
+		.flags = flags,
+	};
+	struct guc_mmio_reg *slot;
+
+	GEM_BUG_ON(count >= regset->size);
+
+	/*
+	 * The mmio list is built using separate lists within the driver.
+	 * It's possible that at some point we may attempt to add the same
+	 * register more than once. Do not consider this an error; silently
+	 * move on if the register is already in the list.
+	 */
+	if (bsearch(&reg, regset->registers, count,
+		    sizeof(reg), guc_mmio_reg_cmp))
+		return;
+
+	slot = &regset->registers[count];
+	regset->used++;
+	*slot = reg;
+
+	while (slot-- > regset->registers) {
+		GEM_BUG_ON(slot[0].offset == slot[1].offset);
+		if (slot[1].offset > slot[0].offset)
+			break;
+
+		swap(slot[1], slot[0]);
+	}
+}
+
+#define GUC_MMIO_REG_ADD(regset, reg, masked) \
+	guc_mmio_reg_add(regset, \
+			 i915_mmio_reg_offset((reg)), \
+			 (masked) ? GUC_REGSET_MASKED : 0)
+
+static void guc_mmio_regset_init(struct temp_regset *regset,
+				 struct intel_engine_cs *engine)
+{
+	const u32 base = engine->mmio_base;
+	struct i915_wa_list *wal = &engine->wa_list;
+	struct i915_wa *wa;
+	unsigned int i;
+
+	regset->used = 0;
+
+	GUC_MMIO_REG_ADD(regset, RING_MODE_GEN7(base), true);
+	GUC_MMIO_REG_ADD(regset, RING_HWS_PGA(base), false);
+	GUC_MMIO_REG_ADD(regset, RING_IMR(base), false);
+
+	for (i = 0, wa = wal->list; i < wal->count; i++, wa++)
+		GUC_MMIO_REG_ADD(regset, wa->reg, wa->masked_reg);
+
+	/* Be extra paranoid and include all whitelist registers. */
+	for (i = 0; i < RING_MAX_NONPRIV_SLOTS; i++)
+		GUC_MMIO_REG_ADD(regset,
+				 RING_FORCE_TO_NONPRIV(base, i),
+				 false);
+
+	/* add in local MOCS registers */
+	for (i = 0; i < GEN9_LNCFCMOCS_REG_COUNT; i++)
+		GUC_MMIO_REG_ADD(regset, GEN9_LNCFCMOCS(i), false);
+}
+
+static int guc_mmio_reg_state_query(struct intel_guc *guc)
+{
+	struct intel_gt *gt = guc_to_gt(guc);
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	struct temp_regset temp_set;
+	u32 total;
+
+	/*
+	 * Need to actually build the list in order to filter out
+	 * duplicates and other such data dependent constructions.
+	 */
+	temp_set.size = MAX_MMIO_REGS;
+	temp_set.registers = kmalloc_array(temp_set.size,
+					  sizeof(*temp_set.registers),
+					  GFP_KERNEL);
+	if (!temp_set.registers)
+		return -ENOMEM;
+
+	total = 0;
+	for_each_engine(engine, gt, id) {
+		guc_mmio_regset_init(&temp_set, engine);
+		total += temp_set.used;
+	}
+
+	kfree(temp_set.registers);
+
+	return total * sizeof(struct guc_mmio_reg);
+}
+
+static void guc_mmio_reg_state_init(struct intel_guc *guc,
+				    struct __guc_ads_blob *blob)
+{
+	struct intel_gt *gt = guc_to_gt(guc);
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	struct temp_regset temp_set;
+	struct guc_mmio_reg_set *ads_reg_set;
+	u32 addr_ggtt, offset;
+	u8 guc_class;
+
+	offset = guc_ads_regset_offset(guc);
+	addr_ggtt = intel_guc_ggtt_offset(guc, guc->ads_vma) + offset;
+	temp_set.registers = (struct guc_mmio_reg *) (((u8 *) blob) + offset);
+	temp_set.size = guc->ads_regset_size / sizeof(temp_set.registers[0]);
+
+	for_each_engine(engine, gt, id) {
+		/* Class index is checked in class converter */
+		GEM_BUG_ON(engine->instance >= GUC_MAX_INSTANCES_PER_CLASS);
+
+		guc_class = engine_class_to_guc_class(engine->class);
+		ads_reg_set = &blob->ads.reg_state_list[guc_class][engine->instance];
+
+		guc_mmio_regset_init(&temp_set, engine);
+		if (!temp_set.used) {
+			ads_reg_set->address = 0;
+			ads_reg_set->count = 0;
+			continue;
+		}
+
+		ads_reg_set->address = addr_ggtt;
+		ads_reg_set->count = temp_set.used;
+
+		temp_set.size -= temp_set.used;
+		temp_set.registers += temp_set.used;
+		addr_ggtt += temp_set.used * sizeof(struct guc_mmio_reg);
+	}
+
+	GEM_BUG_ON(temp_set.size);
+}
+
 /*
  * The first 80 dwords of the register state context, containing the
  * execlists and ppgtt registers.
@@ -121,8 +304,7 @@ static void __guc_ads_init(struct intel_guc *guc)
 		 */
 		blob->ads.golden_context_lrca[guc_class] = 0;
 		blob->ads.eng_state_size[guc_class] =
-			intel_engine_context_size(guc_to_gt(guc),
-						  engine_class) -
+			intel_engine_context_size(gt, engine_class) -
 			skipped_size;
 	}
 
@@ -153,6 +335,9 @@ static void __guc_ads_init(struct intel_guc *guc)
 	blob->ads.scheduler_policies = base + ptr_offset(blob, policies);
 	blob->ads.gt_system_info = base + ptr_offset(blob, system_info);
 
+	/* MMIO save/restore list */
+	guc_mmio_reg_state_init(guc, blob);
+
 	/* Private Data */
 	blob->ads.private_data = base + guc_ads_private_data_offset(guc);
 
@@ -173,6 +358,12 @@ int intel_guc_ads_create(struct intel_guc *guc)
 
 	GEM_BUG_ON(guc->ads_vma);
 
+	/* Need to calculate the reg state size dynamically: */
+	ret = guc_mmio_reg_state_query(guc);
+	if (ret < 0)
+		return ret;
+	guc->ads_regset_size = ret;
+
 	size = guc_ads_blob_size(guc);
 
 	ret = intel_guc_allocate_and_map_vma(guc, size, &guc->ads_vma,
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index a9c2242d61a2..f1217b5c2ff3 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -12289,6 +12289,7 @@ enum skl_power_gate {
 
 /* MOCS (Memory Object Control State) registers */
 #define GEN9_LNCFCMOCS(i)	_MMIO(0xb020 + (i) * 4)	/* L3 Cache Control */
+#define GEN9_LNCFCMOCS_REG_COUNT	32
 
 #define __GEN9_RCS0_MOCS0	0xc800
 #define GEN9_GFX_MOCS(i)	_MMIO(__GEN9_RCS0_MOCS0 + (i) * 4)
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* [Intel-gfx] [PATCH 39/47] drm/i915/guc: Don't complain about reset races
  2021-06-24  7:04 [Intel-gfx] [PATCH 00/47] GuC submission support Matthew Brost
                   ` (37 preceding siblings ...)
  2021-06-24  7:05 ` [Intel-gfx] [PATCH 38/47] drm/i915/guc: Provide mmio list to be saved/restored on engine reset Matthew Brost
@ 2021-06-24  7:05 ` Matthew Brost
  2021-06-24 15:55   ` Matthew Brost
  2021-06-24  7:05 ` [Intel-gfx] [PATCH 40/47] drm/i915/guc: Enable GuC engine reset Matthew Brost
                   ` (12 subsequent siblings)
  51 siblings, 1 reply; 170+ messages in thread
From: Matthew Brost @ 2021-06-24  7:05 UTC (permalink / raw)
  To: intel-gfx, dri-devel

From: John Harrison <John.C.Harrison@Intel.com>

It is impossible to seal all race conditions of resets occurring
concurrent to other operations. At least, not without introducing
excesive mutex locking. Instead, don't complain if it occurs. In
particular, don't complain if trying to send a H2G during a reset.
Whatever the H2G was about should get redone once the reset is over.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 5 ++++-
 drivers/gpu/drm/i915/gt/uc/intel_uc.c     | 3 +++
 drivers/gpu/drm/i915/gt/uc/intel_uc.h     | 2 ++
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index dd6177c8d75c..3b32755f892e 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -727,7 +727,10 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
 	int ret;
 
 	if (unlikely(!ct->enabled)) {
-		WARN(1, "Unexpected send: action=%#x\n", *action);
+		struct intel_guc *guc = ct_to_guc(ct);
+		struct intel_uc *uc = container_of(guc, struct intel_uc, guc);
+
+		WARN(!uc->reset_in_progress, "Unexpected send: action=%#x\n", *action);
 		return -ENODEV;
 	}
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index b523a8521351..77c1fe2ed883 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -550,6 +550,7 @@ void intel_uc_reset_prepare(struct intel_uc *uc)
 {
 	struct intel_guc *guc = &uc->guc;
 
+	uc->reset_in_progress = true;
 
 	/* Nothing to do if GuC isn't supported */
 	if (!intel_uc_supports_guc(uc))
@@ -579,6 +580,8 @@ void intel_uc_reset_finish(struct intel_uc *uc)
 {
 	struct intel_guc *guc = &uc->guc;
 
+	uc->reset_in_progress = false;
+
 	/* Firmware expected to be running when this function is called */
 	if (intel_guc_is_fw_running(guc) && intel_uc_uses_guc_submission(uc))
 		intel_guc_submission_reset_finish(guc);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
index eaa3202192ac..91315e3f1c58 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
@@ -30,6 +30,8 @@ struct intel_uc {
 
 	/* Snapshot of GuC log from last failed load */
 	struct drm_i915_gem_object *load_err_log;
+
+	bool reset_in_progress;
 };
 
 void intel_uc_init_early(struct intel_uc *uc);
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* [Intel-gfx] [PATCH 40/47] drm/i915/guc: Enable GuC engine reset
  2021-06-24  7:04 [Intel-gfx] [PATCH 00/47] GuC submission support Matthew Brost
                   ` (38 preceding siblings ...)
  2021-06-24  7:05 ` [Intel-gfx] [PATCH 39/47] drm/i915/guc: Don't complain about reset races Matthew Brost
@ 2021-06-24  7:05 ` Matthew Brost
  2021-06-24 16:19   ` Matthew Brost
  2021-06-24  7:05 ` [Intel-gfx] [PATCH 41/47] drm/i915/guc: Capture error state on context reset Matthew Brost
                   ` (11 subsequent siblings)
  51 siblings, 1 reply; 170+ messages in thread
From: Matthew Brost @ 2021-06-24  7:05 UTC (permalink / raw)
  To: intel-gfx, dri-devel

From: John Harrison <John.C.Harrison@Intel.com>

Clear the 'disable resets' flag to allow GuC to reset hung contexts
(detected via pre-emption timeout).

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
index 9fd3c911f5fb..d3e86ab7508f 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
@@ -81,8 +81,7 @@ static void guc_policies_init(struct guc_policies *policies)
 {
 	policies->dpc_promote_time = GLOBAL_POLICY_DEFAULT_DPC_PROMOTE_TIME_US;
 	policies->max_num_work_items = GLOBAL_POLICY_MAX_NUM_WI;
-	/* Disable automatic resets as not yet supported. */
-	policies->global_flags = GLOBAL_POLICY_DISABLE_ENGINE_RESET;
+	policies->global_flags = 0;
 	policies->is_valid = 1;
 }
 
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* [Intel-gfx] [PATCH 41/47] drm/i915/guc: Capture error state on context reset
  2021-06-24  7:04 [Intel-gfx] [PATCH 00/47] GuC submission support Matthew Brost
                   ` (39 preceding siblings ...)
  2021-06-24  7:05 ` [Intel-gfx] [PATCH 40/47] drm/i915/guc: Enable GuC engine reset Matthew Brost
@ 2021-06-24  7:05 ` Matthew Brost
  2021-07-12 23:05   ` John Harrison
  2021-06-24  7:05 ` [Intel-gfx] [PATCH 42/47] drm/i915/guc: Fix for error capture after full GPU reset with GuC Matthew Brost
                   ` (10 subsequent siblings)
  51 siblings, 1 reply; 170+ messages in thread
From: Matthew Brost @ 2021-06-24  7:05 UTC (permalink / raw)
  To: intel-gfx, dri-devel

We receive notification of an engine reset from GuC at its
completion. Meaning GuC has potentially cleared any HW state
we may have been interested in capturing. GuC resumes scheduling
on the engine post-reset, as the resets are meant to be transparent,
further muddling our error state.

There is ongoing work to define an API for a GuC debug state dump. The
suggestion for now is to manually disable FW initiated resets in cases
where debug state is needed.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context.c       | 20 +++++++++++
 drivers/gpu/drm/i915/gt/intel_context.h       |  3 ++
 drivers/gpu/drm/i915/gt/intel_engine.h        | 21 ++++++++++-
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     | 11 ++++--
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |  2 ++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 35 +++++++++----------
 drivers/gpu/drm/i915/i915_gpu_error.c         | 25 ++++++++++---
 7 files changed, 91 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index 2f01437056a8..3fe7794b2bfd 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -514,6 +514,26 @@ struct i915_request *intel_context_create_request(struct intel_context *ce)
 	return rq;
 }
 
+struct i915_request *intel_context_find_active_request(struct intel_context *ce)
+{
+	struct i915_request *rq, *active = NULL;
+	unsigned long flags;
+
+	GEM_BUG_ON(!intel_engine_uses_guc(ce->engine));
+
+	spin_lock_irqsave(&ce->guc_active.lock, flags);
+	list_for_each_entry_reverse(rq, &ce->guc_active.requests,
+				    sched.link) {
+		if (i915_request_completed(rq))
+			break;
+
+		active = rq;
+	}
+	spin_unlock_irqrestore(&ce->guc_active.lock, flags);
+
+	return active;
+}
+
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftest_context.c"
 #endif
diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
index a592a9605dc8..3363b59c0c40 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -201,6 +201,9 @@ int intel_context_prepare_remote_request(struct intel_context *ce,
 
 struct i915_request *intel_context_create_request(struct intel_context *ce);
 
+struct i915_request *
+intel_context_find_active_request(struct intel_context *ce);
+
 static inline struct intel_ring *__intel_context_ring_size(u64 sz)
 {
 	return u64_to_ptr(struct intel_ring, sz);
diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
index e9e0657f847a..6ea5643a3aaa 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -245,7 +245,7 @@ ktime_t intel_engine_get_busy_time(struct intel_engine_cs *engine,
 				   ktime_t *now);
 
 struct i915_request *
-intel_engine_find_active_request(struct intel_engine_cs *engine);
+intel_engine_execlist_find_hung_request(struct intel_engine_cs *engine);
 
 u32 intel_engine_context_size(struct intel_gt *gt, u8 class);
 struct intel_context *
@@ -328,4 +328,23 @@ intel_engine_get_sibling(struct intel_engine_cs *engine, unsigned int sibling)
 	return engine->cops->get_sibling(engine, sibling);
 }
 
+static inline void
+intel_engine_set_hung_context(struct intel_engine_cs *engine,
+			      struct intel_context *ce)
+{
+	engine->hung_ce = ce;
+}
+
+static inline void
+intel_engine_clear_hung_context(struct intel_engine_cs *engine)
+{
+	intel_engine_set_hung_context(engine, NULL);
+}
+
+static inline struct intel_context *
+intel_engine_get_hung_context(struct intel_engine_cs *engine)
+{
+	return engine->hung_ce;
+}
+
 #endif /* _INTEL_RINGBUFFER_H_ */
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 69245670b8b0..1d243b83b023 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -1671,7 +1671,7 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 	drm_printf(m, "\tRequests:\n");
 
 	spin_lock_irqsave(&engine->sched_engine->lock, flags);
-	rq = intel_engine_find_active_request(engine);
+	rq = intel_engine_execlist_find_hung_request(engine);
 	if (rq) {
 		struct intel_timeline *tl = get_timeline(rq);
 
@@ -1782,10 +1782,17 @@ static bool match_ring(struct i915_request *rq)
 }
 
 struct i915_request *
-intel_engine_find_active_request(struct intel_engine_cs *engine)
+intel_engine_execlist_find_hung_request(struct intel_engine_cs *engine)
 {
 	struct i915_request *request, *active = NULL;
 
+	/*
+	 * This search does not work in GuC submission mode. However, the GuC
+	 * will report the hanging context directly to the driver itself. So
+	 * the driver should never get here when in GuC mode.
+	 */
+	GEM_BUG_ON(intel_uc_uses_guc_submission(&engine->gt->uc));
+
 	/*
 	 * We are called by the error capture, reset and to dump engine
 	 * state at random points in time. In particular, note that neither is
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index f9d264c008e8..0ceffa2be7a7 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -303,6 +303,8 @@ struct intel_engine_cs {
 	/* keep a request in reserve for a [pm] barrier under oom */
 	struct i915_request *request_pool;
 
+	struct intel_context *hung_ce;
+
 	struct llist_head barrier_tasks;
 
 	struct intel_context *kernel_context; /* pinned */
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index c3223958dfe0..315edeaa186a 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -727,24 +727,6 @@ __unwind_incomplete_requests(struct intel_context *ce)
 	spin_unlock_irqrestore(&sched_engine->lock, flags);
 }
 
-static struct i915_request *context_find_active_request(struct intel_context *ce)
-{
-	struct i915_request *rq, *active = NULL;
-	unsigned long flags;
-
-	spin_lock_irqsave(&ce->guc_active.lock, flags);
-	list_for_each_entry_reverse(rq, &ce->guc_active.requests,
-				    sched.link) {
-		if (i915_request_completed(rq))
-			break;
-
-		active = rq;
-	}
-	spin_unlock_irqrestore(&ce->guc_active.lock, flags);
-
-	return active;
-}
-
 static void __guc_reset_context(struct intel_context *ce, bool stalled)
 {
 	struct i915_request *rq;
@@ -758,7 +740,7 @@ static void __guc_reset_context(struct intel_context *ce, bool stalled)
 	 */
 	clr_context_enabled(ce);
 
-	rq = context_find_active_request(ce);
+	rq = intel_context_find_active_request(ce);
 	if (!rq) {
 		head = ce->ring->tail;
 		stalled = false;
@@ -2192,6 +2174,20 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
 	return 0;
 }
 
+static void capture_error_state(struct intel_guc *guc,
+				struct intel_context *ce)
+{
+	struct intel_gt *gt = guc_to_gt(guc);
+	struct drm_i915_private *i915 = gt->i915;
+	struct intel_engine_cs *engine = __context_to_physical_engine(ce);
+	intel_wakeref_t wakeref;
+
+	intel_engine_set_hung_context(engine, ce);
+	with_intel_runtime_pm(&i915->runtime_pm, wakeref)
+		i915_capture_error_state(gt, engine->mask);
+	atomic_inc(&i915->gpu_error.reset_engine_count[engine->uabi_class]);
+}
+
 static void guc_context_replay(struct intel_context *ce)
 {
 	struct i915_sched_engine *sched_engine = ce->engine->sched_engine;
@@ -2204,6 +2200,7 @@ static void guc_handle_context_reset(struct intel_guc *guc,
 				     struct intel_context *ce)
 {
 	trace_intel_context_reset(ce);
+	capture_error_state(guc, ce);
 	guc_context_replay(ce);
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index cb182c6d265a..20e0a1bfadc1 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1429,20 +1429,37 @@ capture_engine(struct intel_engine_cs *engine,
 {
 	struct intel_engine_capture_vma *capture = NULL;
 	struct intel_engine_coredump *ee;
-	struct i915_request *rq;
+	struct intel_context *ce;
+	struct i915_request *rq = NULL;
 	unsigned long flags;
 
 	ee = intel_engine_coredump_alloc(engine, GFP_KERNEL);
 	if (!ee)
 		return NULL;
 
-	spin_lock_irqsave(&engine->sched_engine->lock, flags);
-	rq = intel_engine_find_active_request(engine);
+	ce = intel_engine_get_hung_context(engine);
+	if (ce) {
+		intel_engine_clear_hung_context(engine);
+		rq = intel_context_find_active_request(ce);
+		if (!rq || !i915_request_started(rq))
+			goto no_request_capture;
+	} else {
+		/*
+		 * Getting here with GuC enabled means it is a forced error capture
+		 * with no actual hang. So, no need to attempt the execlist search.
+		 */
+		if (!intel_uc_uses_guc_submission(&engine->gt->uc)) {
+			spin_lock_irqsave(&engine->sched_engine->lock, flags);
+			rq = intel_engine_execlist_find_hung_request(engine);
+			spin_unlock_irqrestore(&engine->sched_engine->lock,
+					       flags);
+		}
+	}
 	if (rq)
 		capture = intel_engine_coredump_add_request(ee, rq,
 							    ATOMIC_MAYFAIL);
-	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
 	if (!capture) {
+no_request_capture:
 		kfree(ee);
 		return NULL;
 	}
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* [Intel-gfx] [PATCH 42/47] drm/i915/guc: Fix for error capture after full GPU reset with GuC
  2021-06-24  7:04 [Intel-gfx] [PATCH 00/47] GuC submission support Matthew Brost
                   ` (40 preceding siblings ...)
  2021-06-24  7:05 ` [Intel-gfx] [PATCH 41/47] drm/i915/guc: Capture error state on context reset Matthew Brost
@ 2021-06-24  7:05 ` Matthew Brost
  2021-07-15  0:43   ` Matthew Brost
  2021-06-24  7:05 ` [Intel-gfx] [PATCH 43/47] drm/i915/guc: Hook GuC scheduling policies up Matthew Brost
                   ` (9 subsequent siblings)
  51 siblings, 1 reply; 170+ messages in thread
From: Matthew Brost @ 2021-06-24  7:05 UTC (permalink / raw)
  To: intel-gfx, dri-devel

From: John Harrison <John.C.Harrison@Intel.com>

In the case of a full GPU reset (e.g. because GuC has died or because
GuC's hang detection has been disabled), the driver can't rely on GuC
reporting the guilty context. Instead, the driver needs to scan all
active contexts and find one that is currently executing, as per the
execlist mode behaviour. In GuC mode, this scan is different to
execlist mode as the active request list is handled very differently.

Similarly, the request state dump in debugfs needs to be handled
differently when in GuC submission mode.

Also refactured some of the request scanning code to avoid duplication
across the multiple code paths that are now replicating it.

Signed-off-by: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_engine.h        |   3 +
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     | 139 ++++++++++++------
 .../gpu/drm/i915/gt/intel_engine_heartbeat.c  |   8 +
 drivers/gpu/drm/i915/gt/intel_reset.c         |   2 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   2 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  67 +++++++++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.h |   3 +
 drivers/gpu/drm/i915/i915_request.c           |  41 ++++++
 drivers/gpu/drm/i915/i915_request.h           |  11 ++
 9 files changed, 229 insertions(+), 47 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
index 6ea5643a3aaa..9ba131175564 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -240,6 +240,9 @@ __printf(3, 4)
 void intel_engine_dump(struct intel_engine_cs *engine,
 		       struct drm_printer *m,
 		       const char *header, ...);
+void intel_engine_dump_active_requests(struct list_head *requests,
+				       struct i915_request *hung_rq,
+				       struct drm_printer *m);
 
 ktime_t intel_engine_get_busy_time(struct intel_engine_cs *engine,
 				   ktime_t *now);
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 1d243b83b023..bbea7c9a367d 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -1624,6 +1624,97 @@ static void print_properties(struct intel_engine_cs *engine,
 			   read_ul(&engine->defaults, p->offset));
 }
 
+static void engine_dump_request(struct i915_request *rq, struct drm_printer *m, const char *msg)
+{
+	struct intel_timeline *tl = get_timeline(rq);
+
+	i915_request_show(m, rq, msg, 0);
+
+	drm_printf(m, "\t\tring->start:  0x%08x\n",
+		   i915_ggtt_offset(rq->ring->vma));
+	drm_printf(m, "\t\tring->head:   0x%08x\n",
+		   rq->ring->head);
+	drm_printf(m, "\t\tring->tail:   0x%08x\n",
+		   rq->ring->tail);
+	drm_printf(m, "\t\tring->emit:   0x%08x\n",
+		   rq->ring->emit);
+	drm_printf(m, "\t\tring->space:  0x%08x\n",
+		   rq->ring->space);
+
+	if (tl) {
+		drm_printf(m, "\t\tring->hwsp:   0x%08x\n",
+			   tl->hwsp_offset);
+		intel_timeline_put(tl);
+	}
+
+	print_request_ring(m, rq);
+
+	if (rq->context->lrc_reg_state) {
+		drm_printf(m, "Logical Ring Context:\n");
+		hexdump(m, rq->context->lrc_reg_state, PAGE_SIZE);
+	}
+}
+
+void intel_engine_dump_active_requests(struct list_head *requests,
+				       struct i915_request *hung_rq,
+				       struct drm_printer *m)
+{
+	struct i915_request *rq;
+	const char *msg;
+	enum i915_request_state state;
+
+	list_for_each_entry(rq, requests, sched.link) {
+		if (rq == hung_rq)
+			continue;
+
+		state = i915_test_request_state(rq);
+		if (state < I915_REQUEST_QUEUED)
+			continue;
+
+		if (state == I915_REQUEST_ACTIVE)
+			msg = "\t\tactive on engine";
+		else
+			msg = "\t\tactive in queue";
+
+		engine_dump_request(rq, m, msg);
+	}
+}
+
+static void engine_dump_active_requests(struct intel_engine_cs *engine, struct drm_printer *m)
+{
+	struct i915_request *hung_rq = NULL;
+	struct intel_context *ce;
+	bool guc;
+
+	/*
+	 * No need for an engine->irq_seqno_barrier() before the seqno reads.
+	 * The GPU is still running so requests are still executing and any
+	 * hardware reads will be out of date by the time they are reported.
+	 * But the intention here is just to report an instantaneous snapshot
+	 * so that's fine.
+	 */
+	lockdep_assert_held(&engine->sched_engine->lock);
+
+	drm_printf(m, "\tRequests:\n");
+
+	guc = intel_uc_uses_guc_submission(&engine->gt->uc);
+	if (guc) {
+		ce = intel_engine_get_hung_context(engine);
+		if (ce)
+			hung_rq = intel_context_find_active_request(ce);
+	} else
+		hung_rq = intel_engine_execlist_find_hung_request(engine);
+
+	if (hung_rq)
+		engine_dump_request(hung_rq, m, "\t\thung");
+
+	if (guc)
+		intel_guc_dump_active_requests(engine, hung_rq, m);
+	else
+		intel_engine_dump_active_requests(&engine->sched_engine->requests,
+						  hung_rq, m);
+}
+
 void intel_engine_dump(struct intel_engine_cs *engine,
 		       struct drm_printer *m,
 		       const char *header, ...)
@@ -1668,39 +1759,9 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 		   i915_reset_count(error));
 	print_properties(engine, m);
 
-	drm_printf(m, "\tRequests:\n");
-
 	spin_lock_irqsave(&engine->sched_engine->lock, flags);
-	rq = intel_engine_execlist_find_hung_request(engine);
-	if (rq) {
-		struct intel_timeline *tl = get_timeline(rq);
-
-		i915_request_show(m, rq, "\t\tactive ", 0);
-
-		drm_printf(m, "\t\tring->start:  0x%08x\n",
-			   i915_ggtt_offset(rq->ring->vma));
-		drm_printf(m, "\t\tring->head:   0x%08x\n",
-			   rq->ring->head);
-		drm_printf(m, "\t\tring->tail:   0x%08x\n",
-			   rq->ring->tail);
-		drm_printf(m, "\t\tring->emit:   0x%08x\n",
-			   rq->ring->emit);
-		drm_printf(m, "\t\tring->space:  0x%08x\n",
-			   rq->ring->space);
-
-		if (tl) {
-			drm_printf(m, "\t\tring->hwsp:   0x%08x\n",
-				   tl->hwsp_offset);
-			intel_timeline_put(tl);
-		}
-
-		print_request_ring(m, rq);
+	engine_dump_active_requests(engine, m);
 
-		if (rq->context->lrc_reg_state) {
-			drm_printf(m, "Logical Ring Context:\n");
-			hexdump(m, rq->context->lrc_reg_state, PAGE_SIZE);
-		}
-	}
 	drm_printf(m, "\tOn hold?: %lu\n",
 		   list_count(&engine->sched_engine->hold));
 	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
@@ -1774,13 +1835,6 @@ intel_engine_create_virtual(struct intel_engine_cs **siblings,
 	return siblings[0]->cops->create_virtual(siblings, count);
 }
 
-static bool match_ring(struct i915_request *rq)
-{
-	u32 ring = ENGINE_READ(rq->engine, RING_START);
-
-	return ring == i915_ggtt_offset(rq->ring->vma);
-}
-
 struct i915_request *
 intel_engine_execlist_find_hung_request(struct intel_engine_cs *engine)
 {
@@ -1824,14 +1878,7 @@ intel_engine_execlist_find_hung_request(struct intel_engine_cs *engine)
 
 	list_for_each_entry(request, &engine->sched_engine->requests,
 			    sched.link) {
-		if (__i915_request_is_complete(request))
-			continue;
-
-		if (!__i915_request_has_started(request))
-			continue;
-
-		/* More than one preemptible request may match! */
-		if (!match_ring(request))
+		if (i915_test_request_state(request) != I915_REQUEST_ACTIVE)
 			continue;
 
 		active = request;
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
index a8495364d906..f0768824de6f 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
@@ -90,6 +90,14 @@ reset_engine(struct intel_engine_cs *engine, struct i915_request *rq)
 	if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
 		show_heartbeat(rq, engine);
 
+	if (intel_engine_uses_guc(engine))
+		/*
+		 * GuC itself is toast or GuC's hang detection
+		 * is disabled. Either way, need to find the
+		 * hang culprit manually.
+		 */
+		intel_guc_find_hung_context(engine);
+
 	intel_gt_handle_error(engine->gt, engine->mask,
 			      I915_ERROR_CAPTURE,
 			      "stopped heartbeat on %s",
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index 2987282dff6d..f3cdbf4ba5c8 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -156,7 +156,7 @@ void __i915_request_reset(struct i915_request *rq, bool guilty)
 	if (guilty) {
 		i915_request_set_error_once(rq, -EIO);
 		__i915_request_skip(rq);
-		if (mark_guilty(rq))
+		if (mark_guilty(rq) && !intel_engine_uses_guc(rq->engine))
 			skip_context(rq);
 	} else {
 		i915_request_set_error_once(rq, -EAGAIN);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index ab1a85b508db..c38365cd5fab 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -268,6 +268,8 @@ int intel_guc_context_reset_process_msg(struct intel_guc *guc,
 int intel_guc_engine_failure_process_msg(struct intel_guc *guc,
 					 const u32 *msg, u32 len);
 
+void intel_guc_find_hung_context(struct intel_engine_cs *engine);
+
 void intel_guc_submission_reset_prepare(struct intel_guc *guc);
 void intel_guc_submission_reset(struct intel_guc *guc, bool stalled);
 void intel_guc_submission_reset_finish(struct intel_guc *guc);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 315edeaa186a..6188189314d5 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -2267,6 +2267,73 @@ int intel_guc_engine_failure_process_msg(struct intel_guc *guc,
 	return 0;
 }
 
+void intel_guc_find_hung_context(struct intel_engine_cs *engine)
+{
+	struct intel_guc *guc = &engine->gt->uc.guc;
+	struct intel_context *ce;
+	struct i915_request *rq;
+	unsigned long index;
+
+	/* Reset called during driver load? GuC not yet initialised! */
+	if (unlikely(!guc_submission_initialized(guc)))
+		return;
+
+	xa_for_each(&guc->context_lookup, index, ce) {
+		if (!intel_context_is_pinned(ce))
+			continue;
+
+		if (intel_engine_is_virtual(ce->engine)) {
+			if (!(ce->engine->mask & engine->mask))
+				continue;
+		} else {
+			if (ce->engine != engine)
+				continue;
+		}
+
+		list_for_each_entry(rq, &ce->guc_active.requests, sched.link) {
+			if (i915_test_request_state(rq) != I915_REQUEST_ACTIVE)
+				continue;
+
+			intel_engine_set_hung_context(engine, ce);
+
+			/* Can only cope with one hang at a time... */
+			return;
+		}
+	}
+}
+
+void intel_guc_dump_active_requests(struct intel_engine_cs *engine,
+				    struct i915_request *hung_rq,
+				    struct drm_printer *m)
+{
+	struct intel_guc *guc = &engine->gt->uc.guc;
+	struct intel_context *ce;
+	unsigned long index;
+	unsigned long flags;
+
+	/* Reset called during driver load? GuC not yet initialised! */
+	if (unlikely(!guc_submission_initialized(guc)))
+		return;
+
+	xa_for_each(&guc->context_lookup, index, ce) {
+		if (!intel_context_is_pinned(ce))
+			continue;
+
+		if (intel_engine_is_virtual(ce->engine)) {
+			if (!(ce->engine->mask & engine->mask))
+				continue;
+		} else {
+			if (ce->engine != engine)
+				continue;
+		}
+
+		spin_lock_irqsave(&ce->guc_active.lock, flags);
+		intel_engine_dump_active_requests(&ce->guc_active.requests,
+						  hung_rq, m);
+		spin_unlock_irqrestore(&ce->guc_active.lock, flags);
+	}
+}
+
 void intel_guc_log_submission_info(struct intel_guc *guc,
 				   struct drm_printer *p)
 {
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
index b9b9f0f60f91..a2a3fad72be1 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
@@ -24,6 +24,9 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine);
 void intel_guc_log_submission_info(struct intel_guc *guc,
 				   struct drm_printer *p);
 void intel_guc_log_context_info(struct intel_guc *guc, struct drm_printer *p);
+void intel_guc_dump_active_requests(struct intel_engine_cs *engine,
+				    struct i915_request *hung_rq,
+				    struct drm_printer *m);
 
 bool intel_guc_virtual_engine_has_heartbeat(const struct intel_engine_cs *ve);
 
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 192784875a1d..2978c8d45021 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -2076,6 +2076,47 @@ void i915_request_show(struct drm_printer *m,
 		   name);
 }
 
+static bool engine_match_ring(struct intel_engine_cs *engine, struct i915_request *rq)
+{
+	u32 ring = ENGINE_READ(engine, RING_START);
+
+	return ring == i915_ggtt_offset(rq->ring->vma);
+}
+
+static bool match_ring(struct i915_request *rq)
+{
+	struct intel_engine_cs *engine;
+	bool found;
+	int i;
+
+	if (!intel_engine_is_virtual(rq->engine))
+		return engine_match_ring(rq->engine, rq);
+
+	found = false;
+	i = 0;
+	while ((engine = intel_engine_get_sibling(rq->engine, i++))) {
+		found = engine_match_ring(engine, rq);
+		if (found)
+			break;
+	}
+
+	return found;
+}
+
+enum i915_request_state i915_test_request_state(struct i915_request *rq)
+{
+	if (i915_request_completed(rq))
+		return I915_REQUEST_COMPLETE;
+
+	if (!i915_request_started(rq))
+		return I915_REQUEST_PENDING;
+
+	if (match_ring(rq))
+		return I915_REQUEST_ACTIVE;
+
+	return I915_REQUEST_QUEUED;
+}
+
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftests/mock_request.c"
 #include "selftests/i915_request.c"
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index bcc6340c505e..f98385f72782 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -651,4 +651,15 @@ i915_request_active_engine(struct i915_request *rq,
 
 void i915_request_notify_execute_cb_imm(struct i915_request *rq);
 
+enum i915_request_state
+{
+	I915_REQUEST_UNKNOWN = 0,
+	I915_REQUEST_COMPLETE,
+	I915_REQUEST_PENDING,
+	I915_REQUEST_QUEUED,
+	I915_REQUEST_ACTIVE,
+};
+
+enum i915_request_state i915_test_request_state(struct i915_request *rq);
+
 #endif /* I915_REQUEST_H */
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* [Intel-gfx] [PATCH 43/47] drm/i915/guc: Hook GuC scheduling policies up
  2021-06-24  7:04 [Intel-gfx] [PATCH 00/47] GuC submission support Matthew Brost
                   ` (41 preceding siblings ...)
  2021-06-24  7:05 ` [Intel-gfx] [PATCH 42/47] drm/i915/guc: Fix for error capture after full GPU reset with GuC Matthew Brost
@ 2021-06-24  7:05 ` Matthew Brost
  2021-06-25  0:59   ` Matthew Brost
  2021-06-24  7:05 ` [Intel-gfx] [PATCH 44/47] drm/i915/guc: Connect reset modparam updates to GuC policy flags Matthew Brost
                   ` (8 subsequent siblings)
  51 siblings, 1 reply; 170+ messages in thread
From: Matthew Brost @ 2021-06-24  7:05 UTC (permalink / raw)
  To: intel-gfx, dri-devel

From: John Harrison <John.C.Harrison@Intel.com>

Use the official driver default scheduling policies for configuring
the GuC scheduler rather than a bunch of hardcoded values.

Signed-off-by: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Jose Souza <jose.souza@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |  1 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  2 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c    | 44 ++++++++++++++++++-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 11 +++--
 4 files changed, 53 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 0ceffa2be7a7..37db857bb56c 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -455,6 +455,7 @@ struct intel_engine_cs {
 #define I915_ENGINE_IS_VIRTUAL       BIT(5)
 #define I915_ENGINE_HAS_RELATIVE_MMIO BIT(6)
 #define I915_ENGINE_REQUIRES_CMD_PARSER BIT(7)
+#define I915_ENGINE_WANT_FORCED_PREEMPTION BIT(8)
 	unsigned int flags;
 
 	/*
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index c38365cd5fab..905ecbc7dbe3 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -270,6 +270,8 @@ int intel_guc_engine_failure_process_msg(struct intel_guc *guc,
 
 void intel_guc_find_hung_context(struct intel_engine_cs *engine);
 
+int intel_guc_global_policies_update(struct intel_guc *guc);
+
 void intel_guc_submission_reset_prepare(struct intel_guc *guc);
 void intel_guc_submission_reset(struct intel_guc *guc, bool stalled);
 void intel_guc_submission_reset_finish(struct intel_guc *guc);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
index d3e86ab7508f..2ad5fcd4e1b7 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
@@ -77,14 +77,54 @@ static u32 guc_ads_blob_size(struct intel_guc *guc)
 	       guc_ads_private_data_size(guc);
 }
 
-static void guc_policies_init(struct guc_policies *policies)
+static void guc_policies_init(struct intel_guc *guc, struct guc_policies *policies)
 {
+	struct intel_gt *gt = guc_to_gt(guc);
+	struct drm_i915_private *i915 = gt->i915;
+
 	policies->dpc_promote_time = GLOBAL_POLICY_DEFAULT_DPC_PROMOTE_TIME_US;
 	policies->max_num_work_items = GLOBAL_POLICY_MAX_NUM_WI;
+
 	policies->global_flags = 0;
+	if (i915->params.reset < 2)
+		policies->global_flags |= GLOBAL_POLICY_DISABLE_ENGINE_RESET;
+
 	policies->is_valid = 1;
 }
 
+static int guc_action_policies_update(struct intel_guc *guc, u32 policy_offset)
+{
+	u32 action[] = {
+		INTEL_GUC_ACTION_GLOBAL_SCHED_POLICY_CHANGE,
+		policy_offset
+	};
+
+	return intel_guc_send(guc, action, ARRAY_SIZE(action));
+}
+
+int intel_guc_global_policies_update(struct intel_guc *guc)
+{
+	struct __guc_ads_blob *blob = guc->ads_blob;
+	struct intel_gt *gt = guc_to_gt(guc);
+	intel_wakeref_t wakeref;
+	int ret;
+
+	if (!blob)
+		return -ENOTSUPP;
+
+	GEM_BUG_ON(!blob->ads.scheduler_policies);
+
+	guc_policies_init(guc, &blob->policies);
+
+	if (!intel_guc_is_ready(guc))
+		return 0;
+
+	with_intel_runtime_pm(&gt->i915->runtime_pm, wakeref)
+		ret = guc_action_policies_update(guc, blob->ads.scheduler_policies);
+
+	return ret;
+}
+
 static void guc_mapping_table_init(struct intel_gt *gt,
 				   struct guc_gt_system_info *system_info)
 {
@@ -281,7 +321,7 @@ static void __guc_ads_init(struct intel_guc *guc)
 	u8 engine_class, guc_class;
 
 	/* GuC scheduling policies */
-	guc_policies_init(&blob->policies);
+	guc_policies_init(guc, &blob->policies);
 
 	/*
 	 * GuC expects a per-engine-class context image and size
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 6188189314d5..a427336ce916 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -873,6 +873,7 @@ void intel_guc_submission_reset_finish(struct intel_guc *guc)
 	GEM_WARN_ON(atomic_read(&guc->outstanding_submission_g2h));
 	atomic_set(&guc->outstanding_submission_g2h, 0);
 
+	intel_guc_global_policies_update(guc);
 	enable_submission(guc);
 	intel_gt_unpark_heartbeats(guc_to_gt(guc));
 }
@@ -1161,8 +1162,12 @@ static void guc_context_policy_init(struct intel_engine_cs *engine,
 {
 	desc->policy_flags = 0;
 
-	desc->execution_quantum = CONTEXT_POLICY_DEFAULT_EXECUTION_QUANTUM_US;
-	desc->preemption_timeout = CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US;
+	if (engine->flags & I915_ENGINE_WANT_FORCED_PREEMPTION)
+		desc->policy_flags |= CONTEXT_POLICY_FLAG_PREEMPT_TO_IDLE;
+
+	/* NB: For both of these, zero means disabled. */
+	desc->execution_quantum = engine->props.timeslice_duration_ms * 1000;
+	desc->preemption_timeout = engine->props.preempt_timeout_ms * 1000;
 }
 
 static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
@@ -1945,13 +1950,13 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
 	engine->set_default_submission = guc_set_default_submission;
 
 	engine->flags |= I915_ENGINE_HAS_PREEMPTION;
+	engine->flags |= I915_ENGINE_HAS_TIMESLICES;
 
 	/*
 	 * TODO: GuC supports timeslicing and semaphores as well, but they're
 	 * handled by the firmware so some minor tweaks are required before
 	 * enabling.
 	 *
-	 * engine->flags |= I915_ENGINE_HAS_TIMESLICES;
 	 * engine->flags |= I915_ENGINE_HAS_SEMAPHORES;
 	 */
 
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* [Intel-gfx] [PATCH 44/47] drm/i915/guc: Connect reset modparam updates to GuC policy flags
  2021-06-24  7:04 [Intel-gfx] [PATCH 00/47] GuC submission support Matthew Brost
                   ` (42 preceding siblings ...)
  2021-06-24  7:05 ` [Intel-gfx] [PATCH 43/47] drm/i915/guc: Hook GuC scheduling policies up Matthew Brost
@ 2021-06-24  7:05 ` Matthew Brost
  2021-06-25  1:10   ` Matthew Brost
  2021-06-24  7:05 ` [Intel-gfx] [PATCH 45/47] drm/i915/guc: Include scheduling policies in the debugfs state dump Matthew Brost
                   ` (7 subsequent siblings)
  51 siblings, 1 reply; 170+ messages in thread
From: Matthew Brost @ 2021-06-24  7:05 UTC (permalink / raw)
  To: intel-gfx, dri-devel

From: John Harrison <John.C.Harrison@Intel.com>

Changing the reset module parameter has no effect on a running GuC.
The corresponding entry in the ADS must be updated and then the GuC
informed via a Host2GuC message.

The new debugfs interface to module parameters allows this to happen.
However, connecting the parameter data address back to anything useful
is messy. One option would be to pass a new private data structure
address through instead of just the parameter pointer. However, that
means having a new (and different) data structure for each parameter
and a new (and different) write function for each parameter. This
method keeps everything generic by instead using a string lookup on
the directory entry name.

Signed-off-by: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c |  2 +-
 drivers/gpu/drm/i915/i915_debugfs_params.c | 31 ++++++++++++++++++++++
 2 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
index 2ad5fcd4e1b7..c6d0b762d82c 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
@@ -99,7 +99,7 @@ static int guc_action_policies_update(struct intel_guc *guc, u32 policy_offset)
 		policy_offset
 	};
 
-	return intel_guc_send(guc, action, ARRAY_SIZE(action));
+	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
 }
 
 int intel_guc_global_policies_update(struct intel_guc *guc)
diff --git a/drivers/gpu/drm/i915/i915_debugfs_params.c b/drivers/gpu/drm/i915/i915_debugfs_params.c
index 4e2b077692cb..8ecd8b42f048 100644
--- a/drivers/gpu/drm/i915/i915_debugfs_params.c
+++ b/drivers/gpu/drm/i915/i915_debugfs_params.c
@@ -6,9 +6,20 @@
 #include <linux/kernel.h>
 
 #include "i915_debugfs_params.h"
+#include "gt/intel_gt.h"
+#include "gt/uc/intel_guc.h"
 #include "i915_drv.h"
 #include "i915_params.h"
 
+#define MATCH_DEBUGFS_NODE_NAME(_file, _name)	(strcmp((_file)->f_path.dentry->d_name.name, (_name)) == 0)
+
+#define GET_I915(i915, name, ptr)	\
+	do {	\
+		struct i915_params *params;	\
+		params = container_of(((void *) (ptr)), typeof(*params), name);	\
+		(i915) = container_of(params, typeof(*(i915)), params);	\
+	} while(0)
+
 /* int param */
 static int i915_param_int_show(struct seq_file *m, void *data)
 {
@@ -24,6 +35,16 @@ static int i915_param_int_open(struct inode *inode, struct file *file)
 	return single_open(file, i915_param_int_show, inode->i_private);
 }
 
+static int notify_guc(struct drm_i915_private *i915)
+{
+	int ret = 0;
+
+	if (intel_uc_uses_guc_submission(&i915->gt.uc))
+		ret = intel_guc_global_policies_update(&i915->gt.uc.guc);
+
+	return ret;
+}
+
 static ssize_t i915_param_int_write(struct file *file,
 				    const char __user *ubuf, size_t len,
 				    loff_t *offp)
@@ -81,8 +102,10 @@ static ssize_t i915_param_uint_write(struct file *file,
 				     const char __user *ubuf, size_t len,
 				     loff_t *offp)
 {
+	struct drm_i915_private *i915;
 	struct seq_file *m = file->private_data;
 	unsigned int *value = m->private;
+	unsigned int old = *value;
 	int ret;
 
 	ret = kstrtouint_from_user(ubuf, len, 0, value);
@@ -95,6 +118,14 @@ static ssize_t i915_param_uint_write(struct file *file,
 			*value = b;
 	}
 
+	if (!ret && MATCH_DEBUGFS_NODE_NAME(file, "reset")) {
+		GET_I915(i915, reset, value);
+
+		ret = notify_guc(i915);
+		if (ret)
+			*value = old;
+	}
+
 	return ret ?: len;
 }
 
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* [Intel-gfx] [PATCH 45/47] drm/i915/guc: Include scheduling policies in the debugfs state dump
  2021-06-24  7:04 [Intel-gfx] [PATCH 00/47] GuC submission support Matthew Brost
                   ` (43 preceding siblings ...)
  2021-06-24  7:05 ` [Intel-gfx] [PATCH 44/47] drm/i915/guc: Connect reset modparam updates to GuC policy flags Matthew Brost
@ 2021-06-24  7:05 ` Matthew Brost
  2021-06-24 16:34   ` Matthew Brost
  2021-06-24  7:05 ` [Intel-gfx] [PATCH 46/47] drm/i915/guc: Add golden context to GuC ADS Matthew Brost
                   ` (6 subsequent siblings)
  51 siblings, 1 reply; 170+ messages in thread
From: Matthew Brost @ 2021-06-24  7:05 UTC (permalink / raw)
  To: intel-gfx, dri-devel

From: John Harrison <John.C.Harrison@Intel.com>

Added the scheduling policy parameters to the 'guc_info' debugfs state
dump.

Signed-off-by: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c     | 13 +++++++++++++
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h     |  2 ++
 drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c |  2 ++
 3 files changed, 17 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
index c6d0b762d82c..b8182844aa00 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
@@ -92,6 +92,19 @@ static void guc_policies_init(struct intel_guc *guc, struct guc_policies *polici
 	policies->is_valid = 1;
 }
 
+void intel_guc_log_policy_info(struct intel_guc *guc, struct drm_printer *dp)
+{
+	struct __guc_ads_blob *blob = guc->ads_blob;
+
+	if (unlikely(!blob))
+		return;
+
+	drm_printf(dp, "Global scheduling policies:\n");
+	drm_printf(dp, "  DPC promote time   = %u\n", blob->policies.dpc_promote_time);
+	drm_printf(dp, "  Max num work items = %u\n", blob->policies.max_num_work_items);
+	drm_printf(dp, "  Flags              = %u\n", blob->policies.global_flags);
+}
+
 static int guc_action_policies_update(struct intel_guc *guc, u32 policy_offset)
 {
 	u32 action[] = {
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h
index b00d3ae1113a..0fdcb3583601 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h
@@ -7,9 +7,11 @@
 #define _INTEL_GUC_ADS_H_
 
 struct intel_guc;
+struct drm_printer;
 
 int intel_guc_ads_create(struct intel_guc *guc);
 void intel_guc_ads_destroy(struct intel_guc *guc);
 void intel_guc_ads_reset(struct intel_guc *guc);
+void intel_guc_log_policy_info(struct intel_guc *guc, struct drm_printer *p);
 
 #endif
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
index 62b9ce0fafaa..9a03ff56e654 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
@@ -10,6 +10,7 @@
 #include "intel_guc_debugfs.h"
 #include "intel_guc_log_debugfs.h"
 #include "gt/uc/intel_guc_ct.h"
+#include "gt/uc/intel_guc_ads.h"
 #include "gt/uc/intel_guc_submission.h"
 
 static int guc_info_show(struct seq_file *m, void *data)
@@ -29,6 +30,7 @@ static int guc_info_show(struct seq_file *m, void *data)
 
 	intel_guc_log_ct_info(&guc->ct, &p);
 	intel_guc_log_submission_info(guc, &p);
+	intel_guc_log_policy_info(guc, &p);
 
 	return 0;
 }
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* [Intel-gfx] [PATCH 46/47] drm/i915/guc: Add golden context to GuC ADS
  2021-06-24  7:04 [Intel-gfx] [PATCH 00/47] GuC submission support Matthew Brost
                   ` (44 preceding siblings ...)
  2021-06-24  7:05 ` [Intel-gfx] [PATCH 45/47] drm/i915/guc: Include scheduling policies in the debugfs state dump Matthew Brost
@ 2021-06-24  7:05 ` Matthew Brost
  2021-06-24  7:05 ` [Intel-gfx] [PATCH 47/47] drm/i915/guc: Unblock GuC submission on Gen11+ Matthew Brost
                   ` (5 subsequent siblings)
  51 siblings, 0 replies; 170+ messages in thread
From: Matthew Brost @ 2021-06-24  7:05 UTC (permalink / raw)
  To: intel-gfx, dri-devel

From: John Harrison <John.C.Harrison@Intel.com>

The media watchdog mechanism involves GuC doing a silent reset and
continue of the hung context. This requires the i915 driver provide a
golden context to GuC in the ADS.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gt.c         |   2 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.c     |   5 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.h     |   2 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c | 213 ++++++++++++++++++---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h |   1 +
 drivers/gpu/drm/i915/gt/uc/intel_uc.c      |   5 +
 drivers/gpu/drm/i915/gt/uc/intel_uc.h      |   1 +
 7 files changed, 199 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index acfdd53b2678..ceeb517ba259 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -654,6 +654,8 @@ int intel_gt_init(struct intel_gt *gt)
 	if (err)
 		goto err_gt;
 
+	intel_uc_init_late(&gt->uc);
+
 	err = i915_inject_probe_error(gt->i915, -EIO);
 	if (err)
 		goto err_gt;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
index 68266cbffd1f..979128e28372 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
@@ -180,6 +180,11 @@ void intel_guc_init_early(struct intel_guc *guc)
 	}
 }
 
+void intel_guc_init_late(struct intel_guc *guc)
+{
+	intel_guc_ads_init_late(guc);
+}
+
 static u32 guc_ctl_debug_flags(struct intel_guc *guc)
 {
 	u32 level = intel_guc_log_get_level(&guc->log);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 905ecbc7dbe3..fae01dc8e1b9 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -59,6 +59,7 @@ struct intel_guc {
 	struct i915_vma *ads_vma;
 	struct __guc_ads_blob *ads_blob;
 	u32 ads_regset_size;
+	u32 ads_golden_ctxt_size;
 
 	struct i915_vma *lrc_desc_pool;
 	void *lrc_desc_pool_vaddr;
@@ -176,6 +177,7 @@ static inline u32 intel_guc_ggtt_offset(struct intel_guc *guc,
 }
 
 void intel_guc_init_early(struct intel_guc *guc);
+void intel_guc_init_late(struct intel_guc *guc);
 void intel_guc_init_send_regs(struct intel_guc *guc);
 void intel_guc_write_params(struct intel_guc *guc);
 int intel_guc_init(struct intel_guc *guc);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
index b8182844aa00..dfaeafc512fb 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
@@ -7,6 +7,7 @@
 
 #include "gt/intel_gt.h"
 #include "gt/intel_lrc.h"
+#include "gt/shmem_utils.h"
 #include "intel_guc_ads.h"
 #include "intel_guc_fwif.h"
 #include "intel_uc.h"
@@ -33,6 +34,10 @@
  *      +---------------------------------------+ <== dynamic
  *      | padding                               |
  *      +---------------------------------------+ <== 4K aligned
+ *      | golden contexts                       |
+ *      +---------------------------------------+
+ *      | padding                               |
+ *      +---------------------------------------+ <== 4K aligned
  *      | private data                          |
  *      +---------------------------------------+
  *      | padding                               |
@@ -52,6 +57,11 @@ static u32 guc_ads_regset_size(struct intel_guc *guc)
 	return guc->ads_regset_size;
 }
 
+static u32 guc_ads_golden_ctxt_size(struct intel_guc *guc)
+{
+	return PAGE_ALIGN(guc->ads_golden_ctxt_size);
+}
+
 static u32 guc_ads_private_data_size(struct intel_guc *guc)
 {
 	return PAGE_ALIGN(guc->fw.private_data_size);
@@ -62,12 +72,23 @@ static u32 guc_ads_regset_offset(struct intel_guc *guc)
 	return offsetof(struct __guc_ads_blob, regset);
 }
 
-static u32 guc_ads_private_data_offset(struct intel_guc *guc)
+static u32 guc_ads_golden_ctxt_offset(struct intel_guc *guc)
 {
 	u32 offset;
 
 	offset = guc_ads_regset_offset(guc) +
 		 guc_ads_regset_size(guc);
+
+	return PAGE_ALIGN(offset);
+}
+
+static u32 guc_ads_private_data_offset(struct intel_guc *guc)
+{
+	u32 offset;
+
+	offset = guc_ads_golden_ctxt_offset(guc) +
+		 guc_ads_golden_ctxt_size(guc);
+
 	return PAGE_ALIGN(offset);
 }
 
@@ -318,53 +339,163 @@ static void guc_mmio_reg_state_init(struct intel_guc *guc,
 	GEM_BUG_ON(temp_set.size);
 }
 
-/*
- * The first 80 dwords of the register state context, containing the
- * execlists and ppgtt registers.
- */
-#define LR_HW_CONTEXT_SIZE	(80 * sizeof(u32))
+static void fill_engine_enable_masks(struct intel_gt *gt,
+				     struct guc_gt_system_info *info)
+{
+	info->engine_enabled_masks[GUC_RENDER_CLASS] = 1;
+	info->engine_enabled_masks[GUC_BLITTER_CLASS] = 1;
+	info->engine_enabled_masks[GUC_VIDEO_CLASS] = VDBOX_MASK(gt);
+	info->engine_enabled_masks[GUC_VIDEOENHANCE_CLASS] = VEBOX_MASK(gt);
+}
 
-static void __guc_ads_init(struct intel_guc *guc)
+/* Skip execlist and PPGTT registers */
+#define LR_HW_CONTEXT_SIZE      (80 * sizeof(u32))
+#define SKIP_SIZE               (LRC_PPHWSP_SZ * PAGE_SIZE + LR_HW_CONTEXT_SIZE)
+
+static int guc_prep_golden_context(struct intel_guc *guc,
+				   struct __guc_ads_blob *blob)
 {
 	struct intel_gt *gt = guc_to_gt(guc);
-	struct drm_i915_private *i915 = gt->i915;
+	u32 addr_ggtt, offset;
+	u32 total_size = 0, alloc_size, real_size;
+	u8 engine_class, guc_class;
+	struct guc_gt_system_info *info, local_info;
+
+	/*
+	 * Reserve the memory for the golden contexts and point GuC at it but
+	 * leave it empty for now. The context data will be filled in later
+	 * once there is something available to put there.
+	 *
+	 * Note that the HWSP and ring context are not included.
+	 *
+	 * Note also that the storage must be pinned in the GGTT, so that the
+	 * address won't change after GuC has been told where to find it. The
+	 * GuC will also validate that the LRC base + size fall within the
+	 * allowed GGTT range.
+	 */
+	if (blob) {
+		offset = guc_ads_golden_ctxt_offset(guc);
+		addr_ggtt = intel_guc_ggtt_offset(guc, guc->ads_vma) + offset;
+		info = &blob->system_info;
+	} else {
+		memset(&local_info, 0, sizeof(local_info));
+		info = &local_info;
+		fill_engine_enable_masks(gt, info);
+	}
+
+	for (engine_class = 0; engine_class <= MAX_ENGINE_CLASS; ++engine_class) {
+		if (engine_class == OTHER_CLASS)
+			continue;
+
+		guc_class = engine_class_to_guc_class(engine_class);
+
+		if (!info->engine_enabled_masks[guc_class])
+			continue;
+
+		real_size = intel_engine_context_size(gt, engine_class);
+		alloc_size = PAGE_ALIGN(real_size);
+		total_size += alloc_size;
+
+		if (!blob)
+			continue;
+
+		blob->ads.eng_state_size[guc_class] = real_size;
+		blob->ads.golden_context_lrca[guc_class] = addr_ggtt;
+		addr_ggtt += alloc_size;
+	}
+
+	if (!blob)
+		return total_size;
+
+	GEM_BUG_ON(guc->ads_golden_ctxt_size != total_size);
+	return total_size;
+}
+
+static struct intel_engine_cs *find_engine_state(struct intel_gt *gt, u8 engine_class)
+{
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+
+	for_each_engine(engine, gt, id) {
+		if (engine->class != engine_class)
+			continue;
+
+		if (!engine->default_state)
+			continue;
+
+		return engine;
+	}
+
+	return NULL;
+}
+
+static void guc_init_golden_context(struct intel_guc *guc)
+{
 	struct __guc_ads_blob *blob = guc->ads_blob;
-	const u32 skipped_size = LRC_PPHWSP_SZ * PAGE_SIZE + LR_HW_CONTEXT_SIZE;
-	u32 base;
+	struct intel_engine_cs *engine;
+	struct intel_gt *gt = guc_to_gt(guc);
+	u32 addr_ggtt, offset;
+	u32 total_size = 0, alloc_size, real_size;
 	u8 engine_class, guc_class;
+	u8 *ptr;
 
-	/* GuC scheduling policies */
-	guc_policies_init(guc, &blob->policies);
+	if (!intel_uc_uses_guc_submission(&gt->uc))
+		return;
+
+	GEM_BUG_ON(!blob);
 
 	/*
-	 * GuC expects a per-engine-class context image and size
-	 * (minus hwsp and ring context). The context image will be
-	 * used to reinitialize engines after a reset. It must exist
-	 * and be pinned in the GGTT, so that the address won't change after
-	 * we have told GuC where to find it. The context size will be used
-	 * to validate that the LRC base + size fall within allowed GGTT.
+	 * Go back and fill in the golden context data now that it is
+	 * available.
 	 */
+	offset = guc_ads_golden_ctxt_offset(guc);
+	addr_ggtt = intel_guc_ggtt_offset(guc, guc->ads_vma) + offset;
+	ptr = ((u8 *) blob) + offset;
+
 	for (engine_class = 0; engine_class <= MAX_ENGINE_CLASS; ++engine_class) {
 		if (engine_class == OTHER_CLASS)
 			continue;
 
 		guc_class = engine_class_to_guc_class(engine_class);
 
-		/*
-		 * TODO: Set context pointer to default state to allow
-		 * GuC to re-init guilty contexts after internal reset.
-		 */
-		blob->ads.golden_context_lrca[guc_class] = 0;
-		blob->ads.eng_state_size[guc_class] =
-			intel_engine_context_size(gt, engine_class) -
-			skipped_size;
+		if (!blob->system_info.engine_enabled_masks[guc_class])
+			continue;
+
+		real_size = intel_engine_context_size(gt, engine_class);
+		alloc_size = PAGE_ALIGN(real_size);
+		total_size += alloc_size;
+
+		engine = find_engine_state(gt, engine_class);
+		if (!engine) {
+			drm_err(&gt->i915->drm, "No engine state recorded for class %d!\n", engine_class);
+			blob->ads.eng_state_size[guc_class] = 0;
+			blob->ads.golden_context_lrca[guc_class] = 0;
+			continue;
+		}
+
+		GEM_BUG_ON(blob->ads.eng_state_size[guc_class] != real_size);
+		GEM_BUG_ON(blob->ads.golden_context_lrca[guc_class] != addr_ggtt);
+		addr_ggtt += alloc_size;
+
+		shmem_read(engine->default_state, SKIP_SIZE, ptr + SKIP_SIZE, real_size);
+		ptr += alloc_size;
 	}
 
+	GEM_BUG_ON(guc->ads_golden_ctxt_size != total_size);
+}
+
+static void __guc_ads_init(struct intel_guc *guc)
+{
+	struct intel_gt *gt = guc_to_gt(guc);
+	struct drm_i915_private *i915 = gt->i915;
+	struct __guc_ads_blob *blob = guc->ads_blob;
+	u32 base;
+
+	/* GuC scheduling policies */
+	guc_policies_init(guc, &blob->policies);
+
 	/* System info */
-	blob->system_info.engine_enabled_masks[GUC_RENDER_CLASS] = 1;
-	blob->system_info.engine_enabled_masks[GUC_BLITTER_CLASS] = 1;
-	blob->system_info.engine_enabled_masks[GUC_VIDEO_CLASS] = VDBOX_MASK(gt);
-	blob->system_info.engine_enabled_masks[GUC_VIDEOENHANCE_CLASS] = VEBOX_MASK(gt);
+	fill_engine_enable_masks(gt, &blob->system_info);
 
 	blob->system_info.generic_gt_sysinfo[GUC_GENERIC_GT_SYSINFO_SLICE_ENABLED] =
 		hweight8(gt->info.sseu.slice_mask);
@@ -379,6 +510,9 @@ static void __guc_ads_init(struct intel_guc *guc)
 			 GEN12_DOORBELLS_PER_SQIDI) + 1;
 	}
 
+	/* Golden contexts for re-initialising after a watchdog reset */
+	guc_prep_golden_context(guc, blob);
+
 	guc_mapping_table_init(guc_to_gt(guc), &blob->system_info);
 
 	base = intel_guc_ggtt_offset(guc, guc->ads_vma);
@@ -416,6 +550,13 @@ int intel_guc_ads_create(struct intel_guc *guc)
 		return ret;
 	guc->ads_regset_size = ret;
 
+	/* Likewise the golden contexts: */
+	ret = guc_prep_golden_context(guc, NULL);
+	if (ret < 0)
+		return ret;
+	guc->ads_golden_ctxt_size = ret;
+
+	/* Now the total size can be determined: */
 	size = guc_ads_blob_size(guc);
 
 	ret = intel_guc_allocate_and_map_vma(guc, size, &guc->ads_vma,
@@ -428,6 +569,18 @@ int intel_guc_ads_create(struct intel_guc *guc)
 	return 0;
 }
 
+void intel_guc_ads_init_late(struct intel_guc *guc)
+{
+	/*
+	 * The golden context setup requires the saved engine state from
+	 * __engines_record_defaults(). However, that requires engines to be
+	 * operational which means the ADS must already have been configured.
+	 * Fortunately, the golden context state is not needed until a hang
+	 * occurs, so it can be filled in during this late init phase.
+	 */
+	guc_init_golden_context(guc);
+}
+
 void intel_guc_ads_destroy(struct intel_guc *guc)
 {
 	i915_vma_unpin_and_release(&guc->ads_vma, I915_VMA_RELEASE_MAP);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h
index 0fdcb3583601..dac0dc32da34 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h
@@ -11,6 +11,7 @@ struct drm_printer;
 
 int intel_guc_ads_create(struct intel_guc *guc);
 void intel_guc_ads_destroy(struct intel_guc *guc);
+void intel_guc_ads_init_late(struct intel_guc *guc);
 void intel_guc_ads_reset(struct intel_guc *guc);
 void intel_guc_log_policy_info(struct intel_guc *guc, struct drm_printer *p);
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index 77c1fe2ed883..7a69c3c027e9 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -120,6 +120,11 @@ void intel_uc_init_early(struct intel_uc *uc)
 		uc->ops = &uc_ops_off;
 }
 
+void intel_uc_init_late(struct intel_uc *uc)
+{
+	intel_guc_init_late(&uc->guc);
+}
+
 void intel_uc_driver_late_release(struct intel_uc *uc)
 {
 }
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
index 91315e3f1c58..e2da2b6e76e1 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
@@ -35,6 +35,7 @@ struct intel_uc {
 };
 
 void intel_uc_init_early(struct intel_uc *uc);
+void intel_uc_init_late(struct intel_uc *uc);
 void intel_uc_driver_late_release(struct intel_uc *uc);
 void intel_uc_driver_remove(struct intel_uc *uc);
 void intel_uc_init_mmio(struct intel_uc *uc);
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* [Intel-gfx] [PATCH 47/47] drm/i915/guc: Unblock GuC submission on Gen11+
  2021-06-24  7:04 [Intel-gfx] [PATCH 00/47] GuC submission support Matthew Brost
                   ` (45 preceding siblings ...)
  2021-06-24  7:05 ` [Intel-gfx] [PATCH 46/47] drm/i915/guc: Add golden context to GuC ADS Matthew Brost
@ 2021-06-24  7:05 ` Matthew Brost
  2021-06-30  8:22   ` Martin Peres
  2021-07-15  0:49   ` Matthew Brost
  2021-06-24  7:17 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for GuC submission support Patchwork
                   ` (4 subsequent siblings)
  51 siblings, 2 replies; 170+ messages in thread
From: Matthew Brost @ 2021-06-24  7:05 UTC (permalink / raw)
  To: intel-gfx, dri-devel

From: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

Unblock GuC submission on Gen11+ platforms.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc.h            |  1 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c |  8 ++++++++
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h |  3 +--
 drivers/gpu/drm/i915/gt/uc/intel_uc.c             | 14 +++++++++-----
 4 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index fae01dc8e1b9..77981788204f 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -54,6 +54,7 @@ struct intel_guc {
 	struct ida guc_ids;
 	struct list_head guc_id_list;
 
+	bool submission_supported;
 	bool submission_selected;
 
 	struct i915_vma *ads_vma;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index a427336ce916..405339202280 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -2042,6 +2042,13 @@ void intel_guc_submission_disable(struct intel_guc *guc)
 	/* Note: By the time we're here, GuC may have already been reset */
 }
 
+static bool __guc_submission_supported(struct intel_guc *guc)
+{
+	/* GuC submission is unavailable for pre-Gen11 */
+	return intel_guc_is_supported(guc) &&
+	       INTEL_GEN(guc_to_gt(guc)->i915) >= 11;
+}
+
 static bool __guc_submission_selected(struct intel_guc *guc)
 {
 	struct drm_i915_private *i915 = guc_to_gt(guc)->i915;
@@ -2054,6 +2061,7 @@ static bool __guc_submission_selected(struct intel_guc *guc)
 
 void intel_guc_submission_init_early(struct intel_guc *guc)
 {
+	guc->submission_supported = __guc_submission_supported(guc);
 	guc->submission_selected = __guc_submission_selected(guc);
 }
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
index a2a3fad72be1..be767eb6ff71 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
@@ -37,8 +37,7 @@ int intel_guc_wait_for_pending_msg(struct intel_guc *guc,
 
 static inline bool intel_guc_submission_is_supported(struct intel_guc *guc)
 {
-	/* XXX: GuC submission is unavailable for now */
-	return false;
+	return guc->submission_supported;
 }
 
 static inline bool intel_guc_submission_is_wanted(struct intel_guc *guc)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index 7a69c3c027e9..61be0aa81492 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -34,8 +34,15 @@ static void uc_expand_default_options(struct intel_uc *uc)
 		return;
 	}
 
-	/* Default: enable HuC authentication only */
-	i915->params.enable_guc = ENABLE_GUC_LOAD_HUC;
+	/* Intermediate platforms are HuC authentication only */
+	if (IS_DG1(i915) || IS_ALDERLAKE_S(i915)) {
+		drm_dbg(&i915->drm, "Disabling GuC only due to old platform\n");
+		i915->params.enable_guc = ENABLE_GUC_LOAD_HUC;
+		return;
+	}
+
+	/* Default: enable HuC authentication and GuC submission */
+	i915->params.enable_guc = ENABLE_GUC_LOAD_HUC | ENABLE_GUC_SUBMISSION;
 }
 
 /* Reset GuC providing us with fresh state for both GuC and HuC.
@@ -313,9 +320,6 @@ static int __uc_init(struct intel_uc *uc)
 	if (i915_inject_probe_failure(uc_to_gt(uc)->i915))
 		return -ENOMEM;
 
-	/* XXX: GuC submission is unavailable for now */
-	GEM_BUG_ON(intel_uc_uses_guc_submission(uc));
-
 	ret = intel_guc_init(guc);
 	if (ret)
 		return ret;
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for GuC submission support
  2021-06-24  7:04 [Intel-gfx] [PATCH 00/47] GuC submission support Matthew Brost
                   ` (46 preceding siblings ...)
  2021-06-24  7:05 ` [Intel-gfx] [PATCH 47/47] drm/i915/guc: Unblock GuC submission on Gen11+ Matthew Brost
@ 2021-06-24  7:17 ` Patchwork
  2021-06-24  7:19 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
                   ` (3 subsequent siblings)
  51 siblings, 0 replies; 170+ messages in thread
From: Patchwork @ 2021-06-24  7:17 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-gfx

== Series Details ==

Series: GuC submission support
URL   : https://patchwork.freedesktop.org/series/91840/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
a582e9c666ff drm/i915/guc: Relax CTB response timeout
743d703ec315 drm/i915/guc: Improve error message for unsolicited CT response
beb15fc92c92 drm/i915/guc: Increase size of CTB buffers
3d4dab7e7c81 drm/i915/guc: Add non blocking CTB send function
c0779d6fc3b7 drm/i915/guc: Add stall timer to non blocking CTB send function
2c9efd827987 drm/i915/guc: Optimize CTB writes and reads
-:112: CHECK:PARENTHESIS_ALIGNMENT: Alignment should match open parenthesis
#112: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c:528:
+		CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u size=%u\n",
+			  ctb->desc->head, ctb->desc->tail, ctb->size);

total: 0 errors, 0 warnings, 1 checks, 184 lines checked
28324246d8aa drm/i915/guc: Module load failure test for CT buffer creation
-:38: WARNING:FROM_SIGN_OFF_MISMATCH: From:/Signed-off-by: email address mismatch: 'From: John Harrison <John.C.Harrison@Intel.com>' != 'Signed-off-by: John Harrison <john.c.harrison@intel.com>'

total: 0 errors, 1 warnings, 0 checks, 20 lines checked
071d7fa76a91 drm/i915/guc: Add new GuC interface defines and structures
-:98: WARNING:BLOCK_COMMENT_STYLE: Block comments use a trailing */ on a separate line
#98: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h:275:
+	 * reset. (in micro seconds). */

total: 0 errors, 1 warnings, 0 checks, 83 lines checked
9cc26b390c65 drm/i915/guc: Remove GuC stage descriptor, add lrc descriptor
-:120: WARNING:PREFER_DEFINED_ATTRIBUTE_MACRO: __always_unused or __maybe_unused is preferred over __attribute__((__unused__))
#120: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:69:
+__attribute__ ((unused))

total: 0 errors, 1 warnings, 0 checks, 208 lines checked
dbd8df00f28f drm/i915/guc: Add lrc descriptor context lookup array
0478ac5fcc47 drm/i915/guc: Implement GuC submission tasklet
86db42bdbf1a drm/i915/guc: Add bypass tasklet submission path to GuC
b9faec389671 drm/i915/guc: Implement GuC context operations for new inteface
-:109: ERROR:POINTER_LOCATION: "foo* bar" should be "foo *bar"
#109: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc.h:113:
+static inline int intel_guc_send_busy_loop(struct intel_guc* guc,

-:117: ERROR:IN_ATOMIC: do not use in_atomic in drivers
#117: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc.h:121:
+	might_sleep_if(loop && (!in_atomic() && !irqs_disabled()));

-:122: ERROR:IN_ATOMIC: do not use in_atomic in drivers
#122: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc.h:126:
+		if (likely(!in_atomic() && !irqs_disabled()))

-:551: WARNING:REPEATED_WORD: Possible repeated word: 'from'
#551: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:784:
+	 * could be regisgered either the guc_id has been stole from from

-:789: WARNING:ONE_SEMICOLON: Statements terminations use 1 semicolon
#789: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:985:
+		return ret;;

total: 3 errors, 2 warnings, 0 checks, 887 lines checked
0ddf4a0e5ead drm/i915/guc: Insert fence on context when deregistering
-:7: WARNING:REPEATED_WORD: Possible repeated word: 'before'
#7: 
registered with the GuC. In this a case deregister must be before before

total: 0 errors, 1 warnings, 0 checks, 113 lines checked
53890c0d5072 drm/i915/guc: Defer context unpin until scheduling is disabled
9e9fc047e7c1 drm/i915/guc: Disable engine barriers with GuC during unpin
e7452c180185 drm/i915/guc: Extend deregistration fence to schedule disable
195129a34fbe drm/i915: Disable preempt busywait when using GuC scheduling
e53180c87b60 drm/i915/guc: Ensure request ordering via completion fences
-:90: WARNING:LONG_LINE: line length of 101 exceeds 100 columns
#90: FILE: drivers/gpu/drm/i915/i915_request.c:1675:
+		if ((!uses_guc && is_power_of_2(READ_ONCE(prev->engine)->mask | rq->engine->mask)) ||

total: 0 errors, 1 warnings, 0 checks, 64 lines checked
59c796b965e6 drm/i915/guc: Disable semaphores when using GuC scheduling
588df5bce44c drm/i915/guc: Ensure G2H response has space in buffer
-:22: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'len' - possible side-effects?
#22: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc.h:100:
+#define MAKE_SEND_FLAGS(len) \
+	({GEM_BUG_ON(!FIELD_FIT(INTEL_GUC_SEND_G2H_DW_MASK, len)); \
+	(FIELD_PREP(INTEL_GUC_SEND_G2H_DW_MASK, len) | INTEL_GUC_SEND_NB);})

-:24: ERROR:SPACING: space required after that ';' (ctx:VxV)
#24: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc.h:102:
+	(FIELD_PREP(INTEL_GUC_SEND_G2H_DW_MASK, len) | INTEL_GUC_SEND_NB);})
 	                                                                 ^

-:202: ERROR:COMPLEX_MACRO: Macros with complex values should be enclosed in parentheses
#202: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c:597:
+#define G2H_LEN_DW(f) \
+	FIELD_GET(INTEL_GUC_SEND_G2H_DW_MASK, f) ? \
+	FIELD_GET(INTEL_GUC_SEND_G2H_DW_MASK, f) + GUC_CTB_HXG_MSG_MIN_LEN : 0

-:202: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'f' - possible side-effects?
#202: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c:597:
+#define G2H_LEN_DW(f) \
+	FIELD_GET(INTEL_GUC_SEND_G2H_DW_MASK, f) ? \
+	FIELD_GET(INTEL_GUC_SEND_G2H_DW_MASK, f) + GUC_CTB_HXG_MSG_MIN_LEN : 0

total: 2 errors, 0 warnings, 2 checks, 296 lines checked
a03438ead8d3 drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC
-:218: ERROR:POINTER_LOCATION: "foo* bar" should be "foo *bar"
#218: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:257:
+static int guc_submission_busy_loop(struct intel_guc* guc,

total: 1 errors, 0 warnings, 0 checks, 333 lines checked
c84dc5dbebea drm/i915/guc: Update GuC debugfs to support new GuC
4adb7ce61e5f drm/i915/guc: Add several request trace points
37ebee77b9c2 drm/i915: Add intel_context tracing
-:152: CHECK:PARENTHESIS_ALIGNMENT: Alignment should match open parenthesis
#152: FILE: drivers/gpu/drm/i915/i915_trace.h:909:
+DECLARE_EVENT_CLASS(intel_context,
+	    TP_PROTO(struct intel_context *ce),

-:155: CHECK:OPEN_ENDED_LINE: Lines should not end with a '('
#155: FILE: drivers/gpu/drm/i915/i915_trace.h:912:
+	    TP_STRUCT__entry(

-:162: CHECK:OPEN_ENDED_LINE: Lines should not end with a '('
#162: FILE: drivers/gpu/drm/i915/i915_trace.h:919:
+	    TP_fast_assign(

total: 0 errors, 0 warnings, 3 checks, 264 lines checked
351767d30288 drm/i915/guc: GuC virtual engines
-:122: WARNING:UNNECESSARY_ELSE: else is not generally useful after a break or return
#122: FILE: drivers/gpu/drm/i915/gt/intel_engine.h:285:
+		return intel_guc_virtual_engine_has_heartbeat(engine);
+	else

-:847: CHECK:LINE_SPACING: Please don't use multiple blank lines
#847: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:1894:
+
+

total: 0 errors, 1 warnings, 1 checks, 785 lines checked
5ca8f1a503a8 drm/i915: Track 'serial' counts for virtual engines
81e0788642e7 drm/i915: Hold reference to intel_context over life of i915_request
2fca6597dcd8 drm/i915/guc: Disable bonding extension with GuC submission
7ae740c21ec4 drm/i915/guc: Direct all breadcrumbs for a class to single breadcrumbs
-:346: ERROR:CODE_INDENT: code indent should use tabs where possible
#346: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:1411:
+        * In GuC submission mode we do not know which physical engine a request$

-:347: ERROR:CODE_INDENT: code indent should use tabs where possible
#347: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:1412:
+        * will be scheduled on, this creates a problem because the breadcrumb$

-:348: ERROR:CODE_INDENT: code indent should use tabs where possible
#348: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:1413:
+        * interrupt is per physical engine. To work around this we attach$

-:349: ERROR:CODE_INDENT: code indent should use tabs where possible
#349: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:1414:
+        * requests and direct all breadcrumb interrupts to the first instance$

-:350: ERROR:CODE_INDENT: code indent should use tabs where possible
#350: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:1415:
+        * of an engine per class. In addition all breadcrumb interrupts are$

-:352: ERROR:CODE_INDENT: code indent should use tabs where possible
#352: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:1417:
+        */$

total: 6 errors, 0 warnings, 0 checks, 329 lines checked
3c79abc34b26 drm/i915/guc: Reset implementation for new GuC interface
-:306: CHECK:LINE_SPACING: Please don't use multiple blank lines
#306: FILE: drivers/gpu/drm/i915/gt/mock_engine.c:266:
+
+

-:390: CHECK:COMPARISON_TO_NULL: Comparison to NULL could be written "guc->lrc_desc_pool_vaddr"
#390: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:250:
+	return guc->lrc_desc_pool_vaddr != NULL;

-:545: ERROR:SPACING: spaces required around that '||' (ctx:VxW)
#545: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:540:
+			if (pending_enable|| deregister)
 			                  ^

-:585: WARNING:MEMORY_BARRIER: memory barrier without comment
#585: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:580:
+	wmb();

-:637: ERROR:CODE_INDENT: code indent should use tabs where possible
#637: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:622:
+ ^I */$

-:637: WARNING:SPACE_BEFORE_TAB: please, no space before tabs
#637: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:622:
+ ^I */$

-:1294: CHECK:LINE_SPACING: Please don't use multiple blank lines
#1294: FILE: drivers/gpu/drm/i915/gt/uc/intel_uc.c:568:
 
+

total: 2 errors, 2 warnings, 3 checks, 1301 lines checked
848ca056fc3e drm/i915: Reset GPU immediately if submission is disabled
-:93: CHECK:BRACES: Blank lines aren't necessary before a close brace '}'
#93: FILE: drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c:223:
+
+}

total: 0 errors, 0 warnings, 1 checks, 181 lines checked
17b20c0c8bd7 drm/i915/guc: Add disable interrupts to guc sanitize
-:11: ERROR:BAD_SIGN_OFF: Unrecognized email address: 'Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com'
#11: 
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com

total: 1 errors, 0 warnings, 0 checks, 70 lines checked
345f1eca0a86 drm/i915/guc: Suspend/resume implementation for new interface
a921fc062048 drm/i915/guc: Handle context reset notification
5560a36d3f00 drm/i915/guc: Handle engine reset failure notification
ee41dea700c0 drm/i915/guc: Enable the timer expired interrupt for GuC
adcd1418b513 drm/i915/guc: Provide mmio list to be saved/restored on engine reset
-:355: CHECK:PARENTHESIS_ALIGNMENT: Alignment should match open parenthesis
#355: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c:212:
+	temp_set.registers = kmalloc_array(temp_set.size,
+					  sizeof(*temp_set.registers),

-:384: CHECK:SPACING: No space is necessary after a cast
#384: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c:241:
+	temp_set.registers = (struct guc_mmio_reg *) (((u8 *) blob) + offset);

total: 0 errors, 0 warnings, 2 checks, 400 lines checked
b7f2c290b8da drm/i915/guc: Don't complain about reset races
cf156af71348 drm/i915/guc: Enable GuC engine reset
d43f902c9f69 drm/i915/guc: Capture error state on context reset
c24a302b3409 drm/i915/guc: Fix for error capture after full GPU reset with GuC
-:119: CHECK:BRACES: braces {} should be used on all arms of this statement
#119: FILE: drivers/gpu/drm/i915/gt/intel_engine_cs.c:1701:
+	if (guc) {
[...]
+	} else
[...]

-:123: CHECK:BRACES: Unbalanced braces around else statement
#123: FILE: drivers/gpu/drm/i915/gt/intel_engine_cs.c:1705:
+	} else

-:408: ERROR:OPEN_BRACE: open brace '{' following enum go on the same line
#408: FILE: drivers/gpu/drm/i915/i915_request.h:655:
+enum i915_request_state
+{

-:418: WARNING:FROM_SIGN_OFF_MISMATCH: From:/Signed-off-by: email address mismatch: 'From: John Harrison <John.C.Harrison@Intel.com>' != 'Signed-off-by: John Harrison <john.c.harrison@intel.com>'

total: 1 errors, 1 warnings, 2 checks, 348 lines checked
2529cfc15930 drm/i915/guc: Hook GuC scheduling policies up
-:80: WARNING:ENOTSUPP: ENOTSUPP is not a SUSV4 error code, prefer EOPNOTSUPP
#80: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c:113:
+		return -ENOTSUPP;

-:148: WARNING:FROM_SIGN_OFF_MISMATCH: From:/Signed-off-by: email address mismatch: 'From: John Harrison <John.C.Harrison@Intel.com>' != 'Signed-off-by: John Harrison <john.c.harrison@intel.com>'

total: 0 errors, 2 warnings, 0 checks, 113 lines checked
055c438902ea drm/i915/guc: Connect reset modparam updates to GuC policy flags
-:49: WARNING:LONG_LINE: line length of 107 exceeds 100 columns
#49: FILE: drivers/gpu/drm/i915/i915_debugfs_params.c:14:
+#define MATCH_DEBUGFS_NODE_NAME(_file, _name)	(strcmp((_file)->f_path.dentry->d_name.name, (_name)) == 0)

-:51: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'i915' - possible side-effects?
#51: FILE: drivers/gpu/drm/i915/i915_debugfs_params.c:16:
+#define GET_I915(i915, name, ptr)	\
+	do {	\
+		struct i915_params *params;	\
+		params = container_of(((void *) (ptr)), typeof(*params), name);	\
+		(i915) = container_of(params, typeof(*(i915)), params);	\
+	} while(0)

-:54: CHECK:SPACING: No space is necessary after a cast
#54: FILE: drivers/gpu/drm/i915/i915_debugfs_params.c:19:
+		params = container_of(((void *) (ptr)), typeof(*params), name);	\

-:56: ERROR:SPACING: space required before the open parenthesis '('
#56: FILE: drivers/gpu/drm/i915/i915_debugfs_params.c:21:
+	} while(0)

-:103: WARNING:FROM_SIGN_OFF_MISMATCH: From:/Signed-off-by: email address mismatch: 'From: John Harrison <John.C.Harrison@Intel.com>' != 'Signed-off-by: John Harrison <john.c.harrison@intel.com>'

total: 1 errors, 2 warnings, 2 checks, 68 lines checked
9bedfd84c9e8 drm/i915/guc: Include scheduling policies in the debugfs state dump
-:72: WARNING:FROM_SIGN_OFF_MISMATCH: From:/Signed-off-by: email address mismatch: 'From: John Harrison <John.C.Harrison@Intel.com>' != 'Signed-off-by: John Harrison <john.c.harrison@intel.com>'

total: 0 errors, 1 warnings, 0 checks, 44 lines checked
b49edbcc9280 drm/i915/guc: Add golden context to GuC ADS
-:254: CHECK:SPACING: No space is necessary after a cast
#254: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c:453:
+	ptr = ((u8 *) blob) + offset;

-:279: WARNING:LONG_LINE: line length of 106 exceeds 100 columns
#279: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c:470:
+			drm_err(&gt->i915->drm, "No engine state recorded for class %d!\n", engine_class);

total: 0 errors, 1 warnings, 1 checks, 342 lines checked
8a320e973471 drm/i915/guc: Unblock GuC submission on Gen11+


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* [Intel-gfx] ✗ Fi.CI.SPARSE: warning for GuC submission support
  2021-06-24  7:04 [Intel-gfx] [PATCH 00/47] GuC submission support Matthew Brost
                   ` (47 preceding siblings ...)
  2021-06-24  7:17 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for GuC submission support Patchwork
@ 2021-06-24  7:19 ` Patchwork
  2021-06-24  7:47 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
                   ` (2 subsequent siblings)
  51 siblings, 0 replies; 170+ messages in thread
From: Patchwork @ 2021-06-24  7:19 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-gfx

== Series Details ==

Series: GuC submission support
URL   : https://patchwork.freedesktop.org/series/91840/
State : warning

== Summary ==

$ dim sparse --fast origin/drm-tip
Sparse version: v0.6.2
Fast mode used, each commit won't be checked separately.
+./include/linux/stddef.h:17:9: this was the original definition
+./include/linux/stddef.h:17:9: this was the original definition
+./include/linux/stddef.h:17:9: this was the original definition
+./include/linux/stddef.h:17:9: this was the original definition
+./include/linux/stddef.h:17:9: this was the original definition
+./include/linux/stddef.h:17:9: this was the original definition
+./include/linux/stddef.h:17:9: this was the original definition
+./include/linux/stddef.h:17:9: this was the original definition
+./include/linux/stddef.h:17:9: this was the original definition
+./include/linux/stddef.h:17:9: this was the original definition
+./include/linux/stddef.h:17:9: this was the original definition
+./include/linux/stddef.h:17:9: this was the original definition
+./include/linux/stddef.h:17:9: this was the original definition
+./include/linux/stddef.h:17:9: this was the original definition
+./include/linux/stddef.h:17:9: this was the original definition
+./include/linux/stddef.h:17:9: this was the original definition
+./include/linux/stddef.h:17:9: this was the original definition
+./include/linux/stddef.h:17:9: this was the original definition
+./include/linux/stddef.h:17:9: this was the original definition
+./include/linux/stddef.h:17:9: this was the original definition
+./include/linux/stddef.h:17:9: this was the original definition
+./include/linux/stddef.h:17:9: this was the original definition
+/usr/lib/gcc/x86_64-linux-gnu/8/include/stddef.h:417:9: warning: preprocessor token offsetof redefined
+/usr/lib/gcc/x86_64-linux-gnu/8/include/stddef.h:417:9: warning: preprocessor token offsetof redefined
+/usr/lib/gcc/x86_64-linux-gnu/8/include/stddef.h:417:9: warning: preprocessor token offsetof redefined
+/usr/lib/gcc/x86_64-linux-gnu/8/include/stddef.h:417:9: warning: preprocessor token offsetof redefined
+/usr/lib/gcc/x86_64-linux-gnu/8/include/stddef.h:417:9: warning: preprocessor token offsetof redefined
+/usr/lib/gcc/x86_64-linux-gnu/8/include/stddef.h:417:9: warning: preprocessor token offsetof redefined
+/usr/lib/gcc/x86_64-linux-gnu/8/include/stddef.h:417:9: warning: preprocessor token offsetof redefined
+/usr/lib/gcc/x86_64-linux-gnu/8/include/stddef.h:417:9: warning: preprocessor token offsetof redefined
+/usr/lib/gcc/x86_64-linux-gnu/8/include/stddef.h:417:9: warning: preprocessor token offsetof redefined
+/usr/lib/gcc/x86_64-linux-gnu/8/include/stddef.h:417:9: warning: preprocessor token offsetof redefined
+/usr/lib/gcc/x86_64-linux-gnu/8/include/stddef.h:417:9: warning: preprocessor token offsetof redefined
+/usr/lib/gcc/x86_64-linux-gnu/8/include/stddef.h:417:9: warning: preprocessor token offsetof redefined
+/usr/lib/gcc/x86_64-linux-gnu/8/include/stddef.h:417:9: warning: preprocessor token offsetof redefined
+/usr/lib/gcc/x86_64-linux-gnu/8/include/stddef.h:417:9: warning: preprocessor token offsetof redefined
+/usr/lib/gcc/x86_64-linux-gnu/8/include/stddef.h:417:9: warning: preprocessor token offsetof redefined
+/usr/lib/gcc/x86_64-linux-gnu/8/include/stddef.h:417:9: warning: preprocessor token offsetof redefined
+/usr/lib/gcc/x86_64-linux-gnu/8/include/stddef.h:417:9: warning: preprocessor token offsetof redefined
+/usr/lib/gcc/x86_64-linux-gnu/8/include/stddef.h:417:9: warning: preprocessor token offsetof redefined
+/usr/lib/gcc/x86_64-linux-gnu/8/include/stddef.h:417:9: warning: preprocessor token offsetof redefined
+/usr/lib/gcc/x86_64-linux-gnu/8/include/stddef.h:417:9: warning: preprocessor token offsetof redefined
+/usr/lib/gcc/x86_64-linux-gnu/8/include/stddef.h:417:9: warning: preprocessor token offsetof redefined
+/usr/lib/gcc/x86_64-linux-gnu/8/include/stddef.h:417:9: warning: preprocessor token offsetof redefined


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* [Intel-gfx] ✗ Fi.CI.BAT: failure for GuC submission support
  2021-06-24  7:04 [Intel-gfx] [PATCH 00/47] GuC submission support Matthew Brost
                   ` (48 preceding siblings ...)
  2021-06-24  7:19 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
@ 2021-06-24  7:47 ` Patchwork
  2021-07-12 19:23 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for GuC submission support (rev2) Patchwork
  2021-10-22  9:35 ` [Intel-gfx] [PATCH 00/47] GuC submission support Joonas Lahtinen
  51 siblings, 0 replies; 170+ messages in thread
From: Patchwork @ 2021-06-24  7:47 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-gfx


[-- Attachment #1.1: Type: text/plain, Size: 7120 bytes --]

== Series Details ==

Series: GuC submission support
URL   : https://patchwork.freedesktop.org/series/91840/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_10271 -> Patchwork_20449
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_20449 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_20449, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20449/index.html

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_20449:

### IGT changes ###

#### Possible regressions ####

  * igt@core_auth@basic-auth:
    - fi-kbl-8809g:       [PASS][1] -> [DMESG-WARN][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10271/fi-kbl-8809g/igt@core_auth@basic-auth.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20449/fi-kbl-8809g/igt@core_auth@basic-auth.html

  * igt@i915_selftest@live@workarounds:
    - fi-snb-2520m:       [PASS][3] -> [DMESG-FAIL][4]
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10271/fi-snb-2520m/igt@i915_selftest@live@workarounds.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20449/fi-snb-2520m/igt@i915_selftest@live@workarounds.html
    - fi-snb-2600:        [PASS][5] -> [DMESG-FAIL][6]
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10271/fi-snb-2600/igt@i915_selftest@live@workarounds.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20449/fi-snb-2600/igt@i915_selftest@live@workarounds.html

  
Known issues
------------

  Here are the changes found in Patchwork_20449 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@amdgpu/amd_basic@cs-gfx:
    - fi-kbl-soraka:      NOTRUN -> [SKIP][7] ([fdo#109271]) +15 similar issues
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20449/fi-kbl-soraka/igt@amdgpu/amd_basic@cs-gfx.html

  * igt@kms_chamelium@dp-crc-fast:
    - fi-bsw-nick:        NOTRUN -> [SKIP][8] ([fdo#109271] / [fdo#111827]) +8 similar issues
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20449/fi-bsw-nick/igt@kms_chamelium@dp-crc-fast.html

  * igt@prime_vgem@basic-fence-flip:
    - fi-bsw-nick:        NOTRUN -> [SKIP][9] ([fdo#109271]) +63 similar issues
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20449/fi-bsw-nick/igt@prime_vgem@basic-fence-flip.html

  * igt@runner@aborted:
    - fi-kbl-8809g:       NOTRUN -> [FAIL][10] ([i915#3363])
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20449/fi-kbl-8809g/igt@runner@aborted.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827
  [i915#3303]: https://gitlab.freedesktop.org/drm/intel/issues/3303
  [i915#3363]: https://gitlab.freedesktop.org/drm/intel/issues/3363


Participating hosts (41 -> 38)
------------------------------

  Missing    (3): fi-ilk-m540 fi-bsw-cyan fi-bdw-samus 


Build changes
-------------

  * Linux: CI_DRM_10271 -> Patchwork_20449

  CI-20190529: 20190529
  CI_DRM_10271: 7a4a01d6716339c418394dbeb9a20d55bbb9a9ba @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_6117: 3ba0a02404f243d6d8f232c6215163cc4b0fd699 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_20449: 8a320e973471dadebf185470543bfff31807af3f @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

8a320e973471 drm/i915/guc: Unblock GuC submission on Gen11+
b49edbcc9280 drm/i915/guc: Add golden context to GuC ADS
9bedfd84c9e8 drm/i915/guc: Include scheduling policies in the debugfs state dump
055c438902ea drm/i915/guc: Connect reset modparam updates to GuC policy flags
2529cfc15930 drm/i915/guc: Hook GuC scheduling policies up
c24a302b3409 drm/i915/guc: Fix for error capture after full GPU reset with GuC
d43f902c9f69 drm/i915/guc: Capture error state on context reset
cf156af71348 drm/i915/guc: Enable GuC engine reset
b7f2c290b8da drm/i915/guc: Don't complain about reset races
adcd1418b513 drm/i915/guc: Provide mmio list to be saved/restored on engine reset
ee41dea700c0 drm/i915/guc: Enable the timer expired interrupt for GuC
5560a36d3f00 drm/i915/guc: Handle engine reset failure notification
a921fc062048 drm/i915/guc: Handle context reset notification
345f1eca0a86 drm/i915/guc: Suspend/resume implementation for new interface
17b20c0c8bd7 drm/i915/guc: Add disable interrupts to guc sanitize
848ca056fc3e drm/i915: Reset GPU immediately if submission is disabled
3c79abc34b26 drm/i915/guc: Reset implementation for new GuC interface
7ae740c21ec4 drm/i915/guc: Direct all breadcrumbs for a class to single breadcrumbs
2fca6597dcd8 drm/i915/guc: Disable bonding extension with GuC submission
81e0788642e7 drm/i915: Hold reference to intel_context over life of i915_request
5ca8f1a503a8 drm/i915: Track 'serial' counts for virtual engines
351767d30288 drm/i915/guc: GuC virtual engines
37ebee77b9c2 drm/i915: Add intel_context tracing
4adb7ce61e5f drm/i915/guc: Add several request trace points
c84dc5dbebea drm/i915/guc: Update GuC debugfs to support new GuC
a03438ead8d3 drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC
588df5bce44c drm/i915/guc: Ensure G2H response has space in buffer
59c796b965e6 drm/i915/guc: Disable semaphores when using GuC scheduling
e53180c87b60 drm/i915/guc: Ensure request ordering via completion fences
195129a34fbe drm/i915: Disable preempt busywait when using GuC scheduling
e7452c180185 drm/i915/guc: Extend deregistration fence to schedule disable
9e9fc047e7c1 drm/i915/guc: Disable engine barriers with GuC during unpin
53890c0d5072 drm/i915/guc: Defer context unpin until scheduling is disabled
0ddf4a0e5ead drm/i915/guc: Insert fence on context when deregistering
b9faec389671 drm/i915/guc: Implement GuC context operations for new inteface
86db42bdbf1a drm/i915/guc: Add bypass tasklet submission path to GuC
0478ac5fcc47 drm/i915/guc: Implement GuC submission tasklet
dbd8df00f28f drm/i915/guc: Add lrc descriptor context lookup array
9cc26b390c65 drm/i915/guc: Remove GuC stage descriptor, add lrc descriptor
071d7fa76a91 drm/i915/guc: Add new GuC interface defines and structures
28324246d8aa drm/i915/guc: Module load failure test for CT buffer creation
2c9efd827987 drm/i915/guc: Optimize CTB writes and reads
c0779d6fc3b7 drm/i915/guc: Add stall timer to non blocking CTB send function
3d4dab7e7c81 drm/i915/guc: Add non blocking CTB send function
beb15fc92c92 drm/i915/guc: Increase size of CTB buffers
743d703ec315 drm/i915/guc: Improve error message for unsolicited CT response
a582e9c666ff drm/i915/guc: Relax CTB response timeout

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20449/index.html

[-- Attachment #1.2: Type: text/html, Size: 8257 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [Intel-gfx] [PATCH 03/47] drm/i915/guc: Increase size of CTB buffers
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 03/47] drm/i915/guc: Increase size of CTB buffers Matthew Brost
@ 2021-06-24 13:49   ` Michal Wajdeczko
  2021-06-24 15:41     ` Matthew Brost
  0 siblings, 1 reply; 170+ messages in thread
From: Michal Wajdeczko @ 2021-06-24 13:49 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel



On 24.06.2021 09:04, Matthew Brost wrote:
> With the introduction of non-blocking CTBs more than one CTB can be in
> flight at a time. Increasing the size of the CTBs should reduce how
> often software hits the case where no space is available in the CTB
> buffer.
> 
> Cc: John Harrison <john.c.harrison@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index 07f080ddb9ae..a17215920e58 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -58,11 +58,16 @@ static inline struct drm_device *ct_to_drm(struct intel_guc_ct *ct)
>   *      +--------+-----------------------------------------------+------+
>   *
>   * Size of each `CT Buffer`_ must be multiple of 4K.
> - * As we don't expect too many messages, for now use minimum sizes.
> + * We don't expect too many messages in flight at any time, unless we are
> + * using the GuC submission. In that case each request requires a minimum
> + * 2 dwords which gives us a maximum 256 queue'd requests. Hopefully this
> + * enough space to avoid backpressure on the driver. We increase the size
> + * of the receive buffer (relative to the send) to ensure a G2H response
> + * CTB has a landing spot.
>   */
>  #define CTB_DESC_SIZE		ALIGN(sizeof(struct guc_ct_buffer_desc), SZ_2K)
>  #define CTB_H2G_BUFFER_SIZE	(SZ_4K)
> -#define CTB_G2H_BUFFER_SIZE	(SZ_4K)
> +#define CTB_G2H_BUFFER_SIZE	(4 * CTB_H2G_BUFFER_SIZE)
>  
>  struct ct_request {
>  	struct list_head link;
> @@ -641,7 +646,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>  	/* beware of buffer wrap case */
>  	if (unlikely(available < 0))
>  		available += size;
> -	CT_DEBUG(ct, "available %d (%u:%u)\n", available, head, tail);
> +	CT_DEBUG(ct, "available %d (%u:%u:%u)\n", available, head, tail, size);

CTB size is already printed in intel_guc_ct_init() and is fixed so not
sure if repeating it on every ct_read has any benefit

>  	GEM_BUG_ON(available < 0);
>  
>  	header = cmds[head];
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [Intel-gfx] [PATCH 04/47] drm/i915/guc: Add non blocking CTB send function
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 04/47] drm/i915/guc: Add non blocking CTB send function Matthew Brost
@ 2021-06-24 14:48   ` Michal Wajdeczko
  2021-06-24 15:49     ` Matthew Brost
  0 siblings, 1 reply; 170+ messages in thread
From: Michal Wajdeczko @ 2021-06-24 14:48 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel



On 24.06.2021 09:04, Matthew Brost wrote:
> Add non blocking CTB send function, intel_guc_send_nb. GuC submission
> will send CTBs in the critical path and does not need to wait for these
> CTBs to complete before moving on, hence the need for this new function.
> 
> The non-blocking CTB now must have a flow control mechanism to ensure
> the buffer isn't overrun. A lazy spin wait is used as we believe the
> flow control condition should be rare with a properly sized buffer.
> 
> The function, intel_guc_send_nb, is exported in this patch but unused.
> Several patches later in the series make use of this function.
> 
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc.h    | 12 +++-
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 77 +++++++++++++++++++++--
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  3 +-
>  3 files changed, 82 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> index 4abc59f6f3cd..24b1df6ad4ae 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> @@ -74,7 +74,15 @@ static inline struct intel_guc *log_to_guc(struct intel_guc_log *log)
>  static
>  inline int intel_guc_send(struct intel_guc *guc, const u32 *action, u32 len)
>  {
> -	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0);
> +	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0, 0);
> +}
> +
> +#define INTEL_GUC_SEND_NB		BIT(31)

hmm, this flag really belongs to intel_guc_ct_send() so it should be
defined as CTB flag near that function declaration

> +static
> +inline int intel_guc_send_nb(struct intel_guc *guc, const u32 *action, u32 len)
> +{
> +	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0,
> +				 INTEL_GUC_SEND_NB);
>  }
>  
>  static inline int
> @@ -82,7 +90,7 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len,
>  			   u32 *response_buf, u32 response_buf_size)
>  {
>  	return intel_guc_ct_send(&guc->ct, action, len,
> -				 response_buf, response_buf_size);
> +				 response_buf, response_buf_size, 0);
>  }
>  
>  static inline void intel_guc_to_host_event_handler(struct intel_guc *guc)
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index a17215920e58..c9a65d05911f 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -3,6 +3,11 @@
>   * Copyright © 2016-2019 Intel Corporation
>   */
>  
> +#include <linux/circ_buf.h>
> +#include <linux/ktime.h>
> +#include <linux/time64.h>
> +#include <linux/timekeeping.h>
> +
>  #include "i915_drv.h"
>  #include "intel_guc_ct.h"
>  #include "gt/intel_gt.h"
> @@ -373,7 +378,7 @@ static void write_barrier(struct intel_guc_ct *ct)
>  static int ct_write(struct intel_guc_ct *ct,
>  		    const u32 *action,
>  		    u32 len /* in dwords */,
> -		    u32 fence)
> +		    u32 fence, u32 flags)
>  {
>  	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>  	struct guc_ct_buffer_desc *desc = ctb->desc;
> @@ -421,9 +426,13 @@ static int ct_write(struct intel_guc_ct *ct,
>  		 FIELD_PREP(GUC_CTB_MSG_0_NUM_DWORDS, len) |
>  		 FIELD_PREP(GUC_CTB_MSG_0_FENCE, fence);
>  
> -	hxg = FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
> -	      FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
> -			 GUC_HXG_REQUEST_MSG_0_DATA0, action[0]);
> +	hxg = (flags & INTEL_GUC_SEND_NB) ?
> +		(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_EVENT) |
> +		 FIELD_PREP(GUC_HXG_EVENT_MSG_0_ACTION |
> +			    GUC_HXG_EVENT_MSG_0_DATA0, action[0])) :
> +		(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
> +		 FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
> +			    GUC_HXG_REQUEST_MSG_0_DATA0, action[0]));

or as we already switched to accept and return whole HXG messages in
guc_send_mmio() maybe we should do the same for CTB variant too and
instead of using extra flag just let caller to prepare proper HXG header
with HXG_EVENT type and then in CTB code just look at this type to make
decision which code path to use

note that existing callers should not be impacted, as full HXG header
for the REQUEST message looks exactly the same as "action" code alone.

>  
>  	CT_DEBUG(ct, "writing (tail %u) %*ph %*ph %*ph\n",
>  		 tail, 4, &header, 4, &hxg, 4 * (len - 1), &action[1]);
> @@ -498,6 +507,46 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
>  	return err;
>  }
>  
> +static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
> +{
> +	struct guc_ct_buffer_desc *desc = ctb->desc;
> +	u32 head = READ_ONCE(desc->head);
> +	u32 space;
> +
> +	space = CIRC_SPACE(desc->tail, head, ctb->size);
> +
> +	return space >= len_dw;

here you are returning true(1) as has room

> +}
> +
> +static int ct_send_nb(struct intel_guc_ct *ct,
> +		      const u32 *action,
> +		      u32 len,
> +		      u32 flags)
> +{
> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> +	unsigned long spin_flags;
> +	u32 fence;
> +	int ret;
> +
> +	spin_lock_irqsave(&ctb->lock, spin_flags);
> +
> +	ret = h2g_has_room(ctb, len + 1);

but here you treat "1" it as en error

and this "1" is GUC_HXG_MSG_MIN_LEN, right ?

> +	if (unlikely(ret))
> +		goto out;
> +
> +	fence = ct_get_next_fence(ct);
> +	ret = ct_write(ct, action, len, fence, flags);
> +	if (unlikely(ret))
> +		goto out;
> +
> +	intel_guc_notify(ct_to_guc(ct));
> +
> +out:
> +	spin_unlock_irqrestore(&ctb->lock, spin_flags);
> +
> +	return ret;
> +}
> +
>  static int ct_send(struct intel_guc_ct *ct,
>  		   const u32 *action,
>  		   u32 len,
> @@ -505,6 +554,7 @@ static int ct_send(struct intel_guc_ct *ct,
>  		   u32 response_buf_size,
>  		   u32 *status)
>  {
> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>  	struct ct_request request;
>  	unsigned long flags;
>  	u32 fence;
> @@ -514,8 +564,20 @@ static int ct_send(struct intel_guc_ct *ct,
>  	GEM_BUG_ON(!len);
>  	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
>  	GEM_BUG_ON(!response_buf && response_buf_size);
> +	might_sleep();
>  
> +	/*
> +	 * We use a lazy spin wait loop here as we believe that if the CT
> +	 * buffers are sized correctly the flow control condition should be
> +	 * rare.

shouldn't we at least try to log such cases with RATE_LIMITED to find
out how "rare" it is, or if really unlikely just return -EBUSY as in
case of non-blocking send ?

> +	 */
> +retry:
>  	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
> +	if (unlikely(!h2g_has_room(ctb, len + 1))) {
> +		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> +		cond_resched();
> +		goto retry;
> +	}
>  
>  	fence = ct_get_next_fence(ct);
>  	request.fence = fence;
> @@ -527,7 +589,7 @@ static int ct_send(struct intel_guc_ct *ct,
>  	list_add_tail(&request.link, &ct->requests.pending);
>  	spin_unlock(&ct->requests.lock);
>  
> -	err = ct_write(ct, action, len, fence);
> +	err = ct_write(ct, action, len, fence, 0);
>  
>  	spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
>  
> @@ -569,7 +631,7 @@ static int ct_send(struct intel_guc_ct *ct,
>   * Command Transport (CT) buffer based GuC send function.
>   */
>  int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
> -		      u32 *response_buf, u32 response_buf_size)
> +		      u32 *response_buf, u32 response_buf_size, u32 flags)
>  {
>  	u32 status = ~0; /* undefined */
>  	int ret;
> @@ -579,6 +641,9 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
>  		return -ENODEV;
>  	}
>  
> +	if (flags & INTEL_GUC_SEND_NB)
> +		return ct_send_nb(ct, action, len, flags);
> +
>  	ret = ct_send(ct, action, len, response_buf, response_buf_size, &status);
>  	if (unlikely(ret < 0)) {
>  		CT_ERROR(ct, "Sending action %#x failed (err=%d status=%#X)\n",
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> index 1ae2dde6db93..eb69263324ba 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> @@ -42,7 +42,6 @@ struct intel_guc_ct_buffer {
>  	bool broken;
>  };
>  
> -
>  /** Top-level structure for Command Transport related data
>   *
>   * Includes a pair of CT buffers for bi-directional communication and tracking
> @@ -88,7 +87,7 @@ static inline bool intel_guc_ct_enabled(struct intel_guc_ct *ct)
>  }
>  
>  int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
> -		      u32 *response_buf, u32 response_buf_size);
> +		      u32 *response_buf, u32 response_buf_size, u32 flags);
>  void intel_guc_ct_event_handler(struct intel_guc_ct *ct);
>  
>  #endif /* _INTEL_GUC_CT_H_ */
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [Intel-gfx] [PATCH 03/47] drm/i915/guc: Increase size of CTB buffers
  2021-06-24 13:49   ` Michal Wajdeczko
@ 2021-06-24 15:41     ` Matthew Brost
  2021-06-25 12:03       ` Michal Wajdeczko
  0 siblings, 1 reply; 170+ messages in thread
From: Matthew Brost @ 2021-06-24 15:41 UTC (permalink / raw)
  To: Michal Wajdeczko; +Cc: intel-gfx, dri-devel

On Thu, Jun 24, 2021 at 03:49:55PM +0200, Michal Wajdeczko wrote:
> 
> 
> On 24.06.2021 09:04, Matthew Brost wrote:
> > With the introduction of non-blocking CTBs more than one CTB can be in
> > flight at a time. Increasing the size of the CTBs should reduce how
> > often software hits the case where no space is available in the CTB
> > buffer.
> > 
> > Cc: John Harrison <john.c.harrison@intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 11 ++++++++---
> >  1 file changed, 8 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > index 07f080ddb9ae..a17215920e58 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > @@ -58,11 +58,16 @@ static inline struct drm_device *ct_to_drm(struct intel_guc_ct *ct)
> >   *      +--------+-----------------------------------------------+------+
> >   *
> >   * Size of each `CT Buffer`_ must be multiple of 4K.
> > - * As we don't expect too many messages, for now use minimum sizes.
> > + * We don't expect too many messages in flight at any time, unless we are
> > + * using the GuC submission. In that case each request requires a minimum
> > + * 2 dwords which gives us a maximum 256 queue'd requests. Hopefully this
> > + * enough space to avoid backpressure on the driver. We increase the size
> > + * of the receive buffer (relative to the send) to ensure a G2H response
> > + * CTB has a landing spot.
> >   */
> >  #define CTB_DESC_SIZE		ALIGN(sizeof(struct guc_ct_buffer_desc), SZ_2K)
> >  #define CTB_H2G_BUFFER_SIZE	(SZ_4K)
> > -#define CTB_G2H_BUFFER_SIZE	(SZ_4K)
> > +#define CTB_G2H_BUFFER_SIZE	(4 * CTB_H2G_BUFFER_SIZE)
> >  
> >  struct ct_request {
> >  	struct list_head link;
> > @@ -641,7 +646,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
> >  	/* beware of buffer wrap case */
> >  	if (unlikely(available < 0))
> >  		available += size;
> > -	CT_DEBUG(ct, "available %d (%u:%u)\n", available, head, tail);
> > +	CT_DEBUG(ct, "available %d (%u:%u:%u)\n", available, head, tail, size);
> 
> CTB size is already printed in intel_guc_ct_init() and is fixed so not
> sure if repeating it on every ct_read has any benefit
> 

I'd say more debug the better and if CT_DEBUG is enabled the logs are
very verbose so an extra value doesn't really hurt.

Matt

> >  	GEM_BUG_ON(available < 0);
> >  
> >  	header = cmds[head];
> > 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [Intel-gfx] [PATCH 04/47] drm/i915/guc: Add non blocking CTB send function
  2021-06-24 14:48   ` Michal Wajdeczko
@ 2021-06-24 15:49     ` Matthew Brost
  2021-06-24 17:02       ` Michal Wajdeczko
  0 siblings, 1 reply; 170+ messages in thread
From: Matthew Brost @ 2021-06-24 15:49 UTC (permalink / raw)
  To: Michal Wajdeczko; +Cc: intel-gfx, dri-devel

On Thu, Jun 24, 2021 at 04:48:32PM +0200, Michal Wajdeczko wrote:
> 
> 
> On 24.06.2021 09:04, Matthew Brost wrote:
> > Add non blocking CTB send function, intel_guc_send_nb. GuC submission
> > will send CTBs in the critical path and does not need to wait for these
> > CTBs to complete before moving on, hence the need for this new function.
> > 
> > The non-blocking CTB now must have a flow control mechanism to ensure
> > the buffer isn't overrun. A lazy spin wait is used as we believe the
> > flow control condition should be rare with a properly sized buffer.
> > 
> > The function, intel_guc_send_nb, is exported in this patch but unused.
> > Several patches later in the series make use of this function.
> > 
> > Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >  drivers/gpu/drm/i915/gt/uc/intel_guc.h    | 12 +++-
> >  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 77 +++++++++++++++++++++--
> >  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  3 +-
> >  3 files changed, 82 insertions(+), 10 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > index 4abc59f6f3cd..24b1df6ad4ae 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > @@ -74,7 +74,15 @@ static inline struct intel_guc *log_to_guc(struct intel_guc_log *log)
> >  static
> >  inline int intel_guc_send(struct intel_guc *guc, const u32 *action, u32 len)
> >  {
> > -	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0);
> > +	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0, 0);
> > +}
> > +
> > +#define INTEL_GUC_SEND_NB		BIT(31)
> 
> hmm, this flag really belongs to intel_guc_ct_send() so it should be
> defined as CTB flag near that function declaration
> 

I can move this up a few lines.

> > +static
> > +inline int intel_guc_send_nb(struct intel_guc *guc, const u32 *action, u32 len)
> > +{
> > +	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0,
> > +				 INTEL_GUC_SEND_NB);
> >  }
> >  
> >  static inline int
> > @@ -82,7 +90,7 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len,
> >  			   u32 *response_buf, u32 response_buf_size)
> >  {
> >  	return intel_guc_ct_send(&guc->ct, action, len,
> > -				 response_buf, response_buf_size);
> > +				 response_buf, response_buf_size, 0);
> >  }
> >  
> >  static inline void intel_guc_to_host_event_handler(struct intel_guc *guc)
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > index a17215920e58..c9a65d05911f 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > @@ -3,6 +3,11 @@
> >   * Copyright © 2016-2019 Intel Corporation
> >   */
> >  
> > +#include <linux/circ_buf.h>
> > +#include <linux/ktime.h>
> > +#include <linux/time64.h>
> > +#include <linux/timekeeping.h>
> > +
> >  #include "i915_drv.h"
> >  #include "intel_guc_ct.h"
> >  #include "gt/intel_gt.h"
> > @@ -373,7 +378,7 @@ static void write_barrier(struct intel_guc_ct *ct)
> >  static int ct_write(struct intel_guc_ct *ct,
> >  		    const u32 *action,
> >  		    u32 len /* in dwords */,
> > -		    u32 fence)
> > +		    u32 fence, u32 flags)
> >  {
> >  	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> >  	struct guc_ct_buffer_desc *desc = ctb->desc;
> > @@ -421,9 +426,13 @@ static int ct_write(struct intel_guc_ct *ct,
> >  		 FIELD_PREP(GUC_CTB_MSG_0_NUM_DWORDS, len) |
> >  		 FIELD_PREP(GUC_CTB_MSG_0_FENCE, fence);
> >  
> > -	hxg = FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
> > -	      FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
> > -			 GUC_HXG_REQUEST_MSG_0_DATA0, action[0]);
> > +	hxg = (flags & INTEL_GUC_SEND_NB) ?
> > +		(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_EVENT) |
> > +		 FIELD_PREP(GUC_HXG_EVENT_MSG_0_ACTION |
> > +			    GUC_HXG_EVENT_MSG_0_DATA0, action[0])) :
> > +		(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
> > +		 FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
> > +			    GUC_HXG_REQUEST_MSG_0_DATA0, action[0]));
> 
> or as we already switched to accept and return whole HXG messages in
> guc_send_mmio() maybe we should do the same for CTB variant too and
> instead of using extra flag just let caller to prepare proper HXG header
> with HXG_EVENT type and then in CTB code just look at this type to make
> decision which code path to use
>

Not sure I follow. Anyways could this be done in a follow up by you if
want this change.
 
> note that existing callers should not be impacted, as full HXG header
> for the REQUEST message looks exactly the same as "action" code alone.
> 
> >  
> >  	CT_DEBUG(ct, "writing (tail %u) %*ph %*ph %*ph\n",
> >  		 tail, 4, &header, 4, &hxg, 4 * (len - 1), &action[1]);
> > @@ -498,6 +507,46 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
> >  	return err;
> >  }
> >  
> > +static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
> > +{
> > +	struct guc_ct_buffer_desc *desc = ctb->desc;
> > +	u32 head = READ_ONCE(desc->head);
> > +	u32 space;
> > +
> > +	space = CIRC_SPACE(desc->tail, head, ctb->size);
> > +
> > +	return space >= len_dw;
> 
> here you are returning true(1) as has room
>

See below.
 
> > +}
> > +
> > +static int ct_send_nb(struct intel_guc_ct *ct,
> > +		      const u32 *action,
> > +		      u32 len,
> > +		      u32 flags)
> > +{
> > +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> > +	unsigned long spin_flags;
> > +	u32 fence;
> > +	int ret;
> > +
> > +	spin_lock_irqsave(&ctb->lock, spin_flags);
> > +
> > +	ret = h2g_has_room(ctb, len + 1);
> 
> but here you treat "1" it as en error
> 

Yes, this patch is broken but fixed in a follow up one. Regardless I'll
fix this patch in place.

> and this "1" is GUC_HXG_MSG_MIN_LEN, right ?
>

Not exactly. This is following how ct_send() uses the action + len
field. Action[0] field goes in the HXG header and extra + 1 is for the
CT header.

> > +	if (unlikely(ret))
> > +		goto out;
> > +
> > +	fence = ct_get_next_fence(ct);
> > +	ret = ct_write(ct, action, len, fence, flags);
> > +	if (unlikely(ret))
> > +		goto out;
> > +
> > +	intel_guc_notify(ct_to_guc(ct));
> > +
> > +out:
> > +	spin_unlock_irqrestore(&ctb->lock, spin_flags);
> > +
> > +	return ret;
> > +}
> > +
> >  static int ct_send(struct intel_guc_ct *ct,
> >  		   const u32 *action,
> >  		   u32 len,
> > @@ -505,6 +554,7 @@ static int ct_send(struct intel_guc_ct *ct,
> >  		   u32 response_buf_size,
> >  		   u32 *status)
> >  {
> > +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> >  	struct ct_request request;
> >  	unsigned long flags;
> >  	u32 fence;
> > @@ -514,8 +564,20 @@ static int ct_send(struct intel_guc_ct *ct,
> >  	GEM_BUG_ON(!len);
> >  	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
> >  	GEM_BUG_ON(!response_buf && response_buf_size);
> > +	might_sleep();
> >  
> > +	/*
> > +	 * We use a lazy spin wait loop here as we believe that if the CT
> > +	 * buffers are sized correctly the flow control condition should be
> > +	 * rare.
> 
> shouldn't we at least try to log such cases with RATE_LIMITED to find
> out how "rare" it is, or if really unlikely just return -EBUSY as in
> case of non-blocking send ?
>

Definitely not return -EBUSY as this a blocking call. Perhaps we can log
this, but IGTs likely can hit rather easily. It really is only
interesting if real workloads hit this. Regardless that can be a follow
up.

Matt
 
> > +	 */
> > +retry:
> >  	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
> > +	if (unlikely(!h2g_has_room(ctb, len + 1))) {
> > +		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> > +		cond_resched();
> > +		goto retry;
> > +	}
> >  
> >  	fence = ct_get_next_fence(ct);
> >  	request.fence = fence;
> > @@ -527,7 +589,7 @@ static int ct_send(struct intel_guc_ct *ct,
> >  	list_add_tail(&request.link, &ct->requests.pending);
> >  	spin_unlock(&ct->requests.lock);
> >  
> > -	err = ct_write(ct, action, len, fence);
> > +	err = ct_write(ct, action, len, fence, 0);
> >  
> >  	spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> >  
> > @@ -569,7 +631,7 @@ static int ct_send(struct intel_guc_ct *ct,
> >   * Command Transport (CT) buffer based GuC send function.
> >   */
> >  int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
> > -		      u32 *response_buf, u32 response_buf_size)
> > +		      u32 *response_buf, u32 response_buf_size, u32 flags)
> >  {
> >  	u32 status = ~0; /* undefined */
> >  	int ret;
> > @@ -579,6 +641,9 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
> >  		return -ENODEV;
> >  	}
> >  
> > +	if (flags & INTEL_GUC_SEND_NB)
> > +		return ct_send_nb(ct, action, len, flags);
> > +
> >  	ret = ct_send(ct, action, len, response_buf, response_buf_size, &status);
> >  	if (unlikely(ret < 0)) {
> >  		CT_ERROR(ct, "Sending action %#x failed (err=%d status=%#X)\n",
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > index 1ae2dde6db93..eb69263324ba 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > @@ -42,7 +42,6 @@ struct intel_guc_ct_buffer {
> >  	bool broken;
> >  };
> >  
> > -
> >  /** Top-level structure for Command Transport related data
> >   *
> >   * Includes a pair of CT buffers for bi-directional communication and tracking
> > @@ -88,7 +87,7 @@ static inline bool intel_guc_ct_enabled(struct intel_guc_ct *ct)
> >  }
> >  
> >  int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
> > -		      u32 *response_buf, u32 response_buf_size);
> > +		      u32 *response_buf, u32 response_buf_size, u32 flags);
> >  void intel_guc_ct_event_handler(struct intel_guc_ct *ct);
> >  
> >  #endif /* _INTEL_GUC_CT_H_ */
> > 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [Intel-gfx] [PATCH 39/47] drm/i915/guc: Don't complain about reset races
  2021-06-24  7:05 ` [Intel-gfx] [PATCH 39/47] drm/i915/guc: Don't complain about reset races Matthew Brost
@ 2021-06-24 15:55   ` Matthew Brost
  0 siblings, 0 replies; 170+ messages in thread
From: Matthew Brost @ 2021-06-24 15:55 UTC (permalink / raw)
  To: intel-gfx, dri-devel

On Thu, Jun 24, 2021 at 12:05:08AM -0700, Matthew Brost wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> It is impossible to seal all race conditions of resets occurring
> concurrent to other operations. At least, not without introducing
> excesive mutex locking. Instead, don't complain if it occurs. In
> particular, don't complain if trying to send a H2G during a reset.
> Whatever the H2G was about should get redone once the reset is over.
> 
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 5 ++++-
>  drivers/gpu/drm/i915/gt/uc/intel_uc.c     | 3 +++
>  drivers/gpu/drm/i915/gt/uc/intel_uc.h     | 2 ++
>  3 files changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index dd6177c8d75c..3b32755f892e 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -727,7 +727,10 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
>  	int ret;
>  
>  	if (unlikely(!ct->enabled)) {
> -		WARN(1, "Unexpected send: action=%#x\n", *action);
> +		struct intel_guc *guc = ct_to_guc(ct);
> +		struct intel_uc *uc = container_of(guc, struct intel_uc, guc);
> +
> +		WARN(!uc->reset_in_progress, "Unexpected send: action=%#x\n", *action);
>  		return -ENODEV;
>  	}
>  
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> index b523a8521351..77c1fe2ed883 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> @@ -550,6 +550,7 @@ void intel_uc_reset_prepare(struct intel_uc *uc)
>  {
>  	struct intel_guc *guc = &uc->guc;
>  
> +	uc->reset_in_progress = true;
>  
>  	/* Nothing to do if GuC isn't supported */
>  	if (!intel_uc_supports_guc(uc))
> @@ -579,6 +580,8 @@ void intel_uc_reset_finish(struct intel_uc *uc)
>  {
>  	struct intel_guc *guc = &uc->guc;
>  
> +	uc->reset_in_progress = false;
> +
>  	/* Firmware expected to be running when this function is called */
>  	if (intel_guc_is_fw_running(guc) && intel_uc_uses_guc_submission(uc))
>  		intel_guc_submission_reset_finish(guc);
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> index eaa3202192ac..91315e3f1c58 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> @@ -30,6 +30,8 @@ struct intel_uc {
>  
>  	/* Snapshot of GuC log from last failed load */
>  	struct drm_i915_gem_object *load_err_log;
> +
> +	bool reset_in_progress;
>  };
>  
>  void intel_uc_init_early(struct intel_uc *uc);
> -- 
> 2.28.0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [Intel-gfx] [PATCH 40/47] drm/i915/guc: Enable GuC engine reset
  2021-06-24  7:05 ` [Intel-gfx] [PATCH 40/47] drm/i915/guc: Enable GuC engine reset Matthew Brost
@ 2021-06-24 16:19   ` Matthew Brost
  0 siblings, 0 replies; 170+ messages in thread
From: Matthew Brost @ 2021-06-24 16:19 UTC (permalink / raw)
  To: intel-gfx, dri-devel

On Thu, Jun 24, 2021 at 12:05:09AM -0700, Matthew Brost wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> Clear the 'disable resets' flag to allow GuC to reset hung contexts
> (detected via pre-emption timeout).
> 
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> index 9fd3c911f5fb..d3e86ab7508f 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> @@ -81,8 +81,7 @@ static void guc_policies_init(struct guc_policies *policies)
>  {
>  	policies->dpc_promote_time = GLOBAL_POLICY_DEFAULT_DPC_PROMOTE_TIME_US;
>  	policies->max_num_work_items = GLOBAL_POLICY_MAX_NUM_WI;
> -	/* Disable automatic resets as not yet supported. */
> -	policies->global_flags = GLOBAL_POLICY_DISABLE_ENGINE_RESET;
> +	policies->global_flags = 0;
>  	policies->is_valid = 1;
>  }
>  
> -- 
> 2.28.0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [Intel-gfx] [PATCH 45/47] drm/i915/guc: Include scheduling policies in the debugfs state dump
  2021-06-24  7:05 ` [Intel-gfx] [PATCH 45/47] drm/i915/guc: Include scheduling policies in the debugfs state dump Matthew Brost
@ 2021-06-24 16:34   ` Matthew Brost
  0 siblings, 0 replies; 170+ messages in thread
From: Matthew Brost @ 2021-06-24 16:34 UTC (permalink / raw)
  To: intel-gfx, dri-devel

On Thu, Jun 24, 2021 at 12:05:14AM -0700, Matthew Brost wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> Added the scheduling policy parameters to the 'guc_info' debugfs state
> dump.
> 
> Signed-off-by: John Harrison <john.c.harrison@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c     | 13 +++++++++++++
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h     |  2 ++
>  drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c |  2 ++
>  3 files changed, 17 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> index c6d0b762d82c..b8182844aa00 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> @@ -92,6 +92,19 @@ static void guc_policies_init(struct intel_guc *guc, struct guc_policies *polici
>  	policies->is_valid = 1;
>  }
>  
> +void intel_guc_log_policy_info(struct intel_guc *guc, struct drm_printer *dp)
> +{
> +	struct __guc_ads_blob *blob = guc->ads_blob;
> +
> +	if (unlikely(!blob))
> +		return;
> +
> +	drm_printf(dp, "Global scheduling policies:\n");
> +	drm_printf(dp, "  DPC promote time   = %u\n", blob->policies.dpc_promote_time);
> +	drm_printf(dp, "  Max num work items = %u\n", blob->policies.max_num_work_items);
> +	drm_printf(dp, "  Flags              = %u\n", blob->policies.global_flags);
> +}
> +
>  static int guc_action_policies_update(struct intel_guc *guc, u32 policy_offset)
>  {
>  	u32 action[] = {
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h
> index b00d3ae1113a..0fdcb3583601 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h
> @@ -7,9 +7,11 @@
>  #define _INTEL_GUC_ADS_H_
>  
>  struct intel_guc;
> +struct drm_printer;
>  
>  int intel_guc_ads_create(struct intel_guc *guc);
>  void intel_guc_ads_destroy(struct intel_guc *guc);
>  void intel_guc_ads_reset(struct intel_guc *guc);
> +void intel_guc_log_policy_info(struct intel_guc *guc, struct drm_printer *p);
>  
>  #endif
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
> index 62b9ce0fafaa..9a03ff56e654 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
> @@ -10,6 +10,7 @@
>  #include "intel_guc_debugfs.h"
>  #include "intel_guc_log_debugfs.h"
>  #include "gt/uc/intel_guc_ct.h"
> +#include "gt/uc/intel_guc_ads.h"
>  #include "gt/uc/intel_guc_submission.h"
>  
>  static int guc_info_show(struct seq_file *m, void *data)
> @@ -29,6 +30,7 @@ static int guc_info_show(struct seq_file *m, void *data)
>  
>  	intel_guc_log_ct_info(&guc->ct, &p);
>  	intel_guc_log_submission_info(guc, &p);
> +	intel_guc_log_policy_info(guc, &p);
>  
>  	return 0;
>  }
> -- 
> 2.28.0
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [Intel-gfx] [PATCH 04/47] drm/i915/guc: Add non blocking CTB send function
  2021-06-24 15:49     ` Matthew Brost
@ 2021-06-24 17:02       ` Michal Wajdeczko
  2021-06-24 22:41         ` Matthew Brost
  2021-06-24 22:47         ` Matthew Brost
  0 siblings, 2 replies; 170+ messages in thread
From: Michal Wajdeczko @ 2021-06-24 17:02 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-gfx, dri-devel



On 24.06.2021 17:49, Matthew Brost wrote:
> On Thu, Jun 24, 2021 at 04:48:32PM +0200, Michal Wajdeczko wrote:
>>
>>
>> On 24.06.2021 09:04, Matthew Brost wrote:
>>> Add non blocking CTB send function, intel_guc_send_nb. GuC submission
>>> will send CTBs in the critical path and does not need to wait for these
>>> CTBs to complete before moving on, hence the need for this new function.
>>>
>>> The non-blocking CTB now must have a flow control mechanism to ensure
>>> the buffer isn't overrun. A lazy spin wait is used as we believe the
>>> flow control condition should be rare with a properly sized buffer.
>>>
>>> The function, intel_guc_send_nb, is exported in this patch but unused.
>>> Several patches later in the series make use of this function.
>>>
>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>> ---
>>>  drivers/gpu/drm/i915/gt/uc/intel_guc.h    | 12 +++-
>>>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 77 +++++++++++++++++++++--
>>>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  3 +-
>>>  3 files changed, 82 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>> index 4abc59f6f3cd..24b1df6ad4ae 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>> @@ -74,7 +74,15 @@ static inline struct intel_guc *log_to_guc(struct intel_guc_log *log)
>>>  static
>>>  inline int intel_guc_send(struct intel_guc *guc, const u32 *action, u32 len)
>>>  {
>>> -	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0);
>>> +	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0, 0);
>>> +}
>>> +
>>> +#define INTEL_GUC_SEND_NB		BIT(31)
>>
>> hmm, this flag really belongs to intel_guc_ct_send() so it should be
>> defined as CTB flag near that function declaration
>>
> 
> I can move this up a few lines.
> 
>>> +static
>>> +inline int intel_guc_send_nb(struct intel_guc *guc, const u32 *action, u32 len)
>>> +{
>>> +	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0,
>>> +				 INTEL_GUC_SEND_NB);
>>>  }
>>>  
>>>  static inline int
>>> @@ -82,7 +90,7 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len,
>>>  			   u32 *response_buf, u32 response_buf_size)
>>>  {
>>>  	return intel_guc_ct_send(&guc->ct, action, len,
>>> -				 response_buf, response_buf_size);
>>> +				 response_buf, response_buf_size, 0);
>>>  }
>>>  
>>>  static inline void intel_guc_to_host_event_handler(struct intel_guc *guc)
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> index a17215920e58..c9a65d05911f 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> @@ -3,6 +3,11 @@
>>>   * Copyright © 2016-2019 Intel Corporation
>>>   */
>>>  
>>> +#include <linux/circ_buf.h>
>>> +#include <linux/ktime.h>
>>> +#include <linux/time64.h>
>>> +#include <linux/timekeeping.h>
>>> +
>>>  #include "i915_drv.h"
>>>  #include "intel_guc_ct.h"
>>>  #include "gt/intel_gt.h"
>>> @@ -373,7 +378,7 @@ static void write_barrier(struct intel_guc_ct *ct)
>>>  static int ct_write(struct intel_guc_ct *ct,
>>>  		    const u32 *action,
>>>  		    u32 len /* in dwords */,
>>> -		    u32 fence)
>>> +		    u32 fence, u32 flags)
>>>  {
>>>  	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>>>  	struct guc_ct_buffer_desc *desc = ctb->desc;
>>> @@ -421,9 +426,13 @@ static int ct_write(struct intel_guc_ct *ct,
>>>  		 FIELD_PREP(GUC_CTB_MSG_0_NUM_DWORDS, len) |
>>>  		 FIELD_PREP(GUC_CTB_MSG_0_FENCE, fence);
>>>  
>>> -	hxg = FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
>>> -	      FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
>>> -			 GUC_HXG_REQUEST_MSG_0_DATA0, action[0]);
>>> +	hxg = (flags & INTEL_GUC_SEND_NB) ?
>>> +		(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_EVENT) |
>>> +		 FIELD_PREP(GUC_HXG_EVENT_MSG_0_ACTION |
>>> +			    GUC_HXG_EVENT_MSG_0_DATA0, action[0])) :
>>> +		(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
>>> +		 FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
>>> +			    GUC_HXG_REQUEST_MSG_0_DATA0, action[0]));
>>
>> or as we already switched to accept and return whole HXG messages in
>> guc_send_mmio() maybe we should do the same for CTB variant too and
>> instead of using extra flag just let caller to prepare proper HXG header
>> with HXG_EVENT type and then in CTB code just look at this type to make
>> decision which code path to use
>>
> 
> Not sure I follow. Anyways could this be done in a follow up by you if
> want this change.
>  
>> note that existing callers should not be impacted, as full HXG header
>> for the REQUEST message looks exactly the same as "action" code alone.
>>
>>>  
>>>  	CT_DEBUG(ct, "writing (tail %u) %*ph %*ph %*ph\n",
>>>  		 tail, 4, &header, 4, &hxg, 4 * (len - 1), &action[1]);
>>> @@ -498,6 +507,46 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
>>>  	return err;
>>>  }
>>>  
>>> +static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
>>> +{
>>> +	struct guc_ct_buffer_desc *desc = ctb->desc;
>>> +	u32 head = READ_ONCE(desc->head);
>>> +	u32 space;
>>> +
>>> +	space = CIRC_SPACE(desc->tail, head, ctb->size);
>>> +
>>> +	return space >= len_dw;
>>
>> here you are returning true(1) as has room
>>
> 
> See below.
>  
>>> +}
>>> +
>>> +static int ct_send_nb(struct intel_guc_ct *ct,
>>> +		      const u32 *action,
>>> +		      u32 len,
>>> +		      u32 flags)
>>> +{
>>> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>>> +	unsigned long spin_flags;
>>> +	u32 fence;
>>> +	int ret;
>>> +
>>> +	spin_lock_irqsave(&ctb->lock, spin_flags);
>>> +
>>> +	ret = h2g_has_room(ctb, len + 1);
>>
>> but here you treat "1" it as en error
>>
> 
> Yes, this patch is broken but fixed in a follow up one. Regardless I'll
> fix this patch in place.
> 
>> and this "1" is GUC_HXG_MSG_MIN_LEN, right ?
>>
> 
> Not exactly. This is following how ct_send() uses the action + len
> field. Action[0] field goes in the HXG header and extra + 1 is for the
> CT header.

well, "len" already counts "action" so by treating input as full HXG
message (including HXG header) will make it cleaner

we can try do it later but by doing it right now we would avoid
introducing this send_nb() function and deprecating them long term again

> 
>>> +	if (unlikely(ret))
>>> +		goto out;
>>> +
>>> +	fence = ct_get_next_fence(ct);
>>> +	ret = ct_write(ct, action, len, fence, flags);
>>> +	if (unlikely(ret))
>>> +		goto out;
>>> +
>>> +	intel_guc_notify(ct_to_guc(ct));
>>> +
>>> +out:
>>> +	spin_unlock_irqrestore(&ctb->lock, spin_flags);
>>> +
>>> +	return ret;
>>> +}
>>> +
>>>  static int ct_send(struct intel_guc_ct *ct,
>>>  		   const u32 *action,
>>>  		   u32 len,
>>> @@ -505,6 +554,7 @@ static int ct_send(struct intel_guc_ct *ct,
>>>  		   u32 response_buf_size,
>>>  		   u32 *status)
>>>  {
>>> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>>>  	struct ct_request request;
>>>  	unsigned long flags;
>>>  	u32 fence;
>>> @@ -514,8 +564,20 @@ static int ct_send(struct intel_guc_ct *ct,
>>>  	GEM_BUG_ON(!len);
>>>  	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
>>>  	GEM_BUG_ON(!response_buf && response_buf_size);
>>> +	might_sleep();
>>>  
>>> +	/*
>>> +	 * We use a lazy spin wait loop here as we believe that if the CT
>>> +	 * buffers are sized correctly the flow control condition should be
>>> +	 * rare.
>>
>> shouldn't we at least try to log such cases with RATE_LIMITED to find
>> out how "rare" it is, or if really unlikely just return -EBUSY as in
>> case of non-blocking send ?
>>
> 
> Definitely not return -EBUSY as this a blocking call. Perhaps we can log

blocking calls still can fail for various reasons, full CTB is one of
them, and if we return error (now broken) for non-blocking variant then
we should do the same for blocking variant as well and let the caller
decide about next steps

> this, but IGTs likely can hit rather easily. It really is only
> interesting if real workloads hit this. Regardless that can be a follow
> up.

if we hide retry in a silent loop then we will not find it out if we hit
this condition (IGT or real WL) or not

> 
> Matt
>  
>>> +	 */
>>> +retry:
>>>  	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
>>> +	if (unlikely(!h2g_has_room(ctb, len + 1))) {
>>> +		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
>>> +		cond_resched();
>>> +		goto retry;
>>> +	}
>>>  
>>>  	fence = ct_get_next_fence(ct);
>>>  	request.fence = fence;
>>> @@ -527,7 +589,7 @@ static int ct_send(struct intel_guc_ct *ct,
>>>  	list_add_tail(&request.link, &ct->requests.pending);
>>>  	spin_unlock(&ct->requests.lock);
>>>  
>>> -	err = ct_write(ct, action, len, fence);
>>> +	err = ct_write(ct, action, len, fence, 0);
>>>  
>>>  	spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
>>>  
>>> @@ -569,7 +631,7 @@ static int ct_send(struct intel_guc_ct *ct,
>>>   * Command Transport (CT) buffer based GuC send function.
>>>   */
>>>  int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
>>> -		      u32 *response_buf, u32 response_buf_size)
>>> +		      u32 *response_buf, u32 response_buf_size, u32 flags)
>>>  {
>>>  	u32 status = ~0; /* undefined */
>>>  	int ret;
>>> @@ -579,6 +641,9 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
>>>  		return -ENODEV;
>>>  	}
>>>  
>>> +	if (flags & INTEL_GUC_SEND_NB)
>>> +		return ct_send_nb(ct, action, len, flags);
>>> +
>>>  	ret = ct_send(ct, action, len, response_buf, response_buf_size, &status);
>>>  	if (unlikely(ret < 0)) {
>>>  		CT_ERROR(ct, "Sending action %#x failed (err=%d status=%#X)\n",
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>> index 1ae2dde6db93..eb69263324ba 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>> @@ -42,7 +42,6 @@ struct intel_guc_ct_buffer {
>>>  	bool broken;
>>>  };
>>>  
>>> -
>>>  /** Top-level structure for Command Transport related data
>>>   *
>>>   * Includes a pair of CT buffers for bi-directional communication and tracking
>>> @@ -88,7 +87,7 @@ static inline bool intel_guc_ct_enabled(struct intel_guc_ct *ct)
>>>  }
>>>  
>>>  int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
>>> -		      u32 *response_buf, u32 response_buf_size);
>>> +		      u32 *response_buf, u32 response_buf_size, u32 flags);
>>>  void intel_guc_ct_event_handler(struct intel_guc_ct *ct);
>>>  
>>>  #endif /* _INTEL_GUC_CT_H_ */
>>>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [Intel-gfx] [PATCH 01/47] drm/i915/guc: Relax CTB response timeout
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 01/47] drm/i915/guc: Relax CTB response timeout Matthew Brost
@ 2021-06-24 17:23   ` Michal Wajdeczko
  0 siblings, 0 replies; 170+ messages in thread
From: Michal Wajdeczko @ 2021-06-24 17:23 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel



On 24.06.2021 09:04, Matthew Brost wrote:
> In upcoming patch we will allow more CTB requests to be sent in
> parallel to the GuC for processing, so we shouldn't assume any more
> that GuC will always reply without 10ms.
> 
> Use bigger value hardcoded value of 1s instead.
> 
> v2: Add CONFIG_DRM_I915_GUC_CTB_TIMEOUT config option
> v3:
>  (Daniel Vetter)
>   - Use hardcoded value of 1s rather than config option
> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index 43409044528e..a59e239497ee 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -474,14 +474,16 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
>  	/*
>  	 * Fast commands should complete in less than 10us, so sample quickly
>  	 * up to that length of time, then switch to a slower sleep-wait loop.
> -	 * No GuC command should ever take longer than 10ms.
> +	 * No GuC command should ever take longer than 10ms but many GuC
> +	 * commands can be inflight at time, so use a 1s timeout on the slower
> +	 * sleep-wait loop.
>  	 */
>  #define done \
>  	(FIELD_GET(GUC_HXG_MSG_0_ORIGIN, READ_ONCE(req->status)) == \
>  	 GUC_HXG_ORIGIN_GUC)
>  	err = wait_for_us(done, 10);
>  	if (err)
> -		err = wait_for(done, 10);
> +		err = wait_for(done, 1000);

can we add #defines for these 10/1000 values? with that

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>

>  #undef done
>  
>  	if (unlikely(err))
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [Intel-gfx] [PATCH 05/47] drm/i915/guc: Add stall timer to non blocking CTB send function
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 05/47] drm/i915/guc: Add stall timer to " Matthew Brost
@ 2021-06-24 17:37   ` Michal Wajdeczko
  2021-06-24 23:01     ` Matthew Brost
  0 siblings, 1 reply; 170+ messages in thread
From: Michal Wajdeczko @ 2021-06-24 17:37 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel



On 24.06.2021 09:04, Matthew Brost wrote:
> Implement a stall timer which fails H2G CTBs once a period of time
> with no forward progress is reached to prevent deadlock.
> 
> Also update to ct_write to return -EIO rather than -EPIPE on a
> corrupted descriptor.

by doing so you will have the same error code for two different problems:

a) corrupted CTB descriptor (definitely unrecoverable)
b) long stall in CTB processing (still recoverable)

while caller is explicitly instructed to retry only on:

c) temporary stall in CTB processing (likely recoverable)

so why do we want to limit our diagnostics?

> 
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 47 +++++++++++++++++++++--
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  4 ++
>  2 files changed, 48 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index c9a65d05911f..27ec30b5ef47 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -319,6 +319,7 @@ int intel_guc_ct_enable(struct intel_guc_ct *ct)
>  		goto err_deregister;
>  
>  	ct->enabled = true;
> +	ct->stall_time = KTIME_MAX;
>  
>  	return 0;
>  
> @@ -392,7 +393,7 @@ static int ct_write(struct intel_guc_ct *ct,
>  	unsigned int i;
>  
>  	if (unlikely(ctb->broken))
> -		return -EPIPE;
> +		return -EIO;
>  
>  	if (unlikely(desc->status))
>  		goto corrupted;
> @@ -464,7 +465,7 @@ static int ct_write(struct intel_guc_ct *ct,
>  	CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u status=%#x\n",
>  		 desc->head, desc->tail, desc->status);
>  	ctb->broken = true;
> -	return -EPIPE;
> +	return -EIO;
>  }
>  
>  /**
> @@ -507,6 +508,18 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
>  	return err;
>  }
>  
> +#define GUC_CTB_TIMEOUT_MS	1500

it's 150% of core CTB timeout, maybe we should correlate them ?

> +static inline bool ct_deadlocked(struct intel_guc_ct *ct)
> +{
> +	long timeout = GUC_CTB_TIMEOUT_MS;
> +	bool ret = ktime_ms_delta(ktime_get(), ct->stall_time) > timeout;
> +
> +	if (unlikely(ret))
> +		CT_ERROR(ct, "CT deadlocked\n");

nit: in commit message you said all these changes are to "prevent
deadlock" so maybe this message should rather be:

	int delta = ktime_ms_delta(ktime_get(), ct->stall_time);

	CT_ERROR(ct, "Communication stalled for %dms\n", delta);

(note that CT_ERROR already adds "CT" prefix)

> +
> +	return ret;
> +}
> +
>  static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
>  {
>  	struct guc_ct_buffer_desc *desc = ctb->desc;
> @@ -518,6 +531,26 @@ static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
>  	return space >= len_dw;
>  }
>  
> +static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
> +{
> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> +
> +	lockdep_assert_held(&ct->ctbs.send.lock);
> +
> +	if (unlikely(!h2g_has_room(ctb, len_dw))) {
> +		if (ct->stall_time == KTIME_MAX)
> +			ct->stall_time = ktime_get();
> +
> +		if (unlikely(ct_deadlocked(ct)))

and maybe above message should be printed somewhere around here when we
detect "deadlock" for the first time?

> +			return -EIO;
> +		else
> +			return -EBUSY;
> +	}
> +
> +	ct->stall_time = KTIME_MAX;
> +	return 0;
> +}
> +
>  static int ct_send_nb(struct intel_guc_ct *ct,
>  		      const u32 *action,
>  		      u32 len,
> @@ -530,7 +563,7 @@ static int ct_send_nb(struct intel_guc_ct *ct,
>  
>  	spin_lock_irqsave(&ctb->lock, spin_flags);
>  
> -	ret = h2g_has_room(ctb, len + 1);
> +	ret = has_room_nb(ct, len + 1);
>  	if (unlikely(ret))
>  		goto out;
>  
> @@ -574,11 +607,19 @@ static int ct_send(struct intel_guc_ct *ct,
>  retry:
>  	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
>  	if (unlikely(!h2g_has_room(ctb, len + 1))) {
> +		if (ct->stall_time == KTIME_MAX)
> +			ct->stall_time = ktime_get();

as this is a repeated pattern, maybe it should be moved to h2g_has_room
or other wrapper ?

>  		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> +
> +		if (unlikely(ct_deadlocked(ct)))
> +			return -EIO;
> +
>  		cond_resched();
>  		goto retry;
>  	}
>  
> +	ct->stall_time = KTIME_MAX;

this one too

> +
>  	fence = ct_get_next_fence(ct);
>  	request.fence = fence;
>  	request.status = 0;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> index eb69263324ba..55ef7c52472f 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> @@ -9,6 +9,7 @@
>  #include <linux/interrupt.h>
>  #include <linux/spinlock.h>
>  #include <linux/workqueue.h>
> +#include <linux/ktime.h>
>  
>  #include "intel_guc_fwif.h"
>  
> @@ -68,6 +69,9 @@ struct intel_guc_ct {
>  		struct list_head incoming; /* incoming requests */
>  		struct work_struct worker; /* handler for incoming requests */
>  	} requests;
> +
> +	/** @stall_time: time of first time a CTB submission is stalled */
> +	ktime_t stall_time;
>  };
>  
>  void intel_guc_ct_init_early(struct intel_guc_ct *ct);
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [Intel-gfx] [PATCH 04/47] drm/i915/guc: Add non blocking CTB send function
  2021-06-24 17:02       ` Michal Wajdeczko
@ 2021-06-24 22:41         ` Matthew Brost
  2021-06-25 11:50           ` Michal Wajdeczko
  2021-06-24 22:47         ` Matthew Brost
  1 sibling, 1 reply; 170+ messages in thread
From: Matthew Brost @ 2021-06-24 22:41 UTC (permalink / raw)
  To: Michal Wajdeczko; +Cc: intel-gfx, dri-devel

On Thu, Jun 24, 2021 at 07:02:18PM +0200, Michal Wajdeczko wrote:
> 
> 
> On 24.06.2021 17:49, Matthew Brost wrote:
> > On Thu, Jun 24, 2021 at 04:48:32PM +0200, Michal Wajdeczko wrote:
> >>
> >>
> >> On 24.06.2021 09:04, Matthew Brost wrote:
> >>> Add non blocking CTB send function, intel_guc_send_nb. GuC submission
> >>> will send CTBs in the critical path and does not need to wait for these
> >>> CTBs to complete before moving on, hence the need for this new function.
> >>>
> >>> The non-blocking CTB now must have a flow control mechanism to ensure
> >>> the buffer isn't overrun. A lazy spin wait is used as we believe the
> >>> flow control condition should be rare with a properly sized buffer.
> >>>
> >>> The function, intel_guc_send_nb, is exported in this patch but unused.
> >>> Several patches later in the series make use of this function.
> >>>
> >>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> >>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> >>> ---
> >>>  drivers/gpu/drm/i915/gt/uc/intel_guc.h    | 12 +++-
> >>>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 77 +++++++++++++++++++++--
> >>>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  3 +-
> >>>  3 files changed, 82 insertions(+), 10 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> >>> index 4abc59f6f3cd..24b1df6ad4ae 100644
> >>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> >>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> >>> @@ -74,7 +74,15 @@ static inline struct intel_guc *log_to_guc(struct intel_guc_log *log)
> >>>  static
> >>>  inline int intel_guc_send(struct intel_guc *guc, const u32 *action, u32 len)
> >>>  {
> >>> -	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0);
> >>> +	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0, 0);
> >>> +}
> >>> +
> >>> +#define INTEL_GUC_SEND_NB		BIT(31)
> >>
> >> hmm, this flag really belongs to intel_guc_ct_send() so it should be
> >> defined as CTB flag near that function declaration
> >>
> > 
> > I can move this up a few lines.
> > 
> >>> +static
> >>> +inline int intel_guc_send_nb(struct intel_guc *guc, const u32 *action, u32 len)
> >>> +{
> >>> +	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0,
> >>> +				 INTEL_GUC_SEND_NB);
> >>>  }
> >>>  
> >>>  static inline int
> >>> @@ -82,7 +90,7 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len,
> >>>  			   u32 *response_buf, u32 response_buf_size)
> >>>  {
> >>>  	return intel_guc_ct_send(&guc->ct, action, len,
> >>> -				 response_buf, response_buf_size);
> >>> +				 response_buf, response_buf_size, 0);
> >>>  }
> >>>  
> >>>  static inline void intel_guc_to_host_event_handler(struct intel_guc *guc)
> >>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> >>> index a17215920e58..c9a65d05911f 100644
> >>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> >>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> >>> @@ -3,6 +3,11 @@
> >>>   * Copyright © 2016-2019 Intel Corporation
> >>>   */
> >>>  
> >>> +#include <linux/circ_buf.h>
> >>> +#include <linux/ktime.h>
> >>> +#include <linux/time64.h>
> >>> +#include <linux/timekeeping.h>
> >>> +
> >>>  #include "i915_drv.h"
> >>>  #include "intel_guc_ct.h"
> >>>  #include "gt/intel_gt.h"
> >>> @@ -373,7 +378,7 @@ static void write_barrier(struct intel_guc_ct *ct)
> >>>  static int ct_write(struct intel_guc_ct *ct,
> >>>  		    const u32 *action,
> >>>  		    u32 len /* in dwords */,
> >>> -		    u32 fence)
> >>> +		    u32 fence, u32 flags)
> >>>  {
> >>>  	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> >>>  	struct guc_ct_buffer_desc *desc = ctb->desc;
> >>> @@ -421,9 +426,13 @@ static int ct_write(struct intel_guc_ct *ct,
> >>>  		 FIELD_PREP(GUC_CTB_MSG_0_NUM_DWORDS, len) |
> >>>  		 FIELD_PREP(GUC_CTB_MSG_0_FENCE, fence);
> >>>  
> >>> -	hxg = FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
> >>> -	      FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
> >>> -			 GUC_HXG_REQUEST_MSG_0_DATA0, action[0]);
> >>> +	hxg = (flags & INTEL_GUC_SEND_NB) ?
> >>> +		(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_EVENT) |
> >>> +		 FIELD_PREP(GUC_HXG_EVENT_MSG_0_ACTION |
> >>> +			    GUC_HXG_EVENT_MSG_0_DATA0, action[0])) :
> >>> +		(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
> >>> +		 FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
> >>> +			    GUC_HXG_REQUEST_MSG_0_DATA0, action[0]));
> >>
> >> or as we already switched to accept and return whole HXG messages in
> >> guc_send_mmio() maybe we should do the same for CTB variant too and
> >> instead of using extra flag just let caller to prepare proper HXG header
> >> with HXG_EVENT type and then in CTB code just look at this type to make
> >> decision which code path to use
> >>
> > 
> > Not sure I follow. Anyways could this be done in a follow up by you if
> > want this change.
> >  
> >> note that existing callers should not be impacted, as full HXG header
> >> for the REQUEST message looks exactly the same as "action" code alone.
> >>
> >>>  
> >>>  	CT_DEBUG(ct, "writing (tail %u) %*ph %*ph %*ph\n",
> >>>  		 tail, 4, &header, 4, &hxg, 4 * (len - 1), &action[1]);
> >>> @@ -498,6 +507,46 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
> >>>  	return err;
> >>>  }
> >>>  
> >>> +static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
> >>> +{
> >>> +	struct guc_ct_buffer_desc *desc = ctb->desc;
> >>> +	u32 head = READ_ONCE(desc->head);
> >>> +	u32 space;
> >>> +
> >>> +	space = CIRC_SPACE(desc->tail, head, ctb->size);
> >>> +
> >>> +	return space >= len_dw;
> >>
> >> here you are returning true(1) as has room
> >>
> > 
> > See below.
> >  
> >>> +}
> >>> +
> >>> +static int ct_send_nb(struct intel_guc_ct *ct,
> >>> +		      const u32 *action,
> >>> +		      u32 len,
> >>> +		      u32 flags)
> >>> +{
> >>> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> >>> +	unsigned long spin_flags;
> >>> +	u32 fence;
> >>> +	int ret;
> >>> +
> >>> +	spin_lock_irqsave(&ctb->lock, spin_flags);
> >>> +
> >>> +	ret = h2g_has_room(ctb, len + 1);
> >>
> >> but here you treat "1" it as en error
> >>
> > 
> > Yes, this patch is broken but fixed in a follow up one. Regardless I'll
> > fix this patch in place.
> > 
> >> and this "1" is GUC_HXG_MSG_MIN_LEN, right ?
> >>
> > 
> > Not exactly. This is following how ct_send() uses the action + len
> > field. Action[0] field goes in the HXG header and extra + 1 is for the
> > CT header.
> 
> well, "len" already counts "action" so by treating input as full HXG
> message (including HXG header) will make it cleaner
> 

Yes, I know. See above. To me GUC_HXG_MSG_MIN_LEN makes zero sense and
it is worse than adding + 1. This + 1 accounts for the CT header not the
HXG header. If any we add a new define, GUC_CT_HDR_LEN, and add that.

Matt

> we can try do it later but by doing it right now we would avoid
> introducing this send_nb() function and deprecating them long term again
> 
> > 
> >>> +	if (unlikely(ret))
> >>> +		goto out;
> >>> +
> >>> +	fence = ct_get_next_fence(ct);
> >>> +	ret = ct_write(ct, action, len, fence, flags);
> >>> +	if (unlikely(ret))
> >>> +		goto out;
> >>> +
> >>> +	intel_guc_notify(ct_to_guc(ct));
> >>> +
> >>> +out:
> >>> +	spin_unlock_irqrestore(&ctb->lock, spin_flags);
> >>> +
> >>> +	return ret;
> >>> +}
> >>> +
> >>>  static int ct_send(struct intel_guc_ct *ct,
> >>>  		   const u32 *action,
> >>>  		   u32 len,
> >>> @@ -505,6 +554,7 @@ static int ct_send(struct intel_guc_ct *ct,
> >>>  		   u32 response_buf_size,
> >>>  		   u32 *status)
> >>>  {
> >>> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> >>>  	struct ct_request request;
> >>>  	unsigned long flags;
> >>>  	u32 fence;
> >>> @@ -514,8 +564,20 @@ static int ct_send(struct intel_guc_ct *ct,
> >>>  	GEM_BUG_ON(!len);
> >>>  	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
> >>>  	GEM_BUG_ON(!response_buf && response_buf_size);
> >>> +	might_sleep();
> >>>  
> >>> +	/*
> >>> +	 * We use a lazy spin wait loop here as we believe that if the CT
> >>> +	 * buffers are sized correctly the flow control condition should be
> >>> +	 * rare.
> >>
> >> shouldn't we at least try to log such cases with RATE_LIMITED to find
> >> out how "rare" it is, or if really unlikely just return -EBUSY as in
> >> case of non-blocking send ?
> >>
> > 
> > Definitely not return -EBUSY as this a blocking call. Perhaps we can log
> 
> blocking calls still can fail for various reasons, full CTB is one of
> them, and if we return error (now broken) for non-blocking variant then
> we should do the same for blocking variant as well and let the caller
> decide about next steps
> 
> > this, but IGTs likely can hit rather easily. It really is only
> > interesting if real workloads hit this. Regardless that can be a follow
> > up.
> 
> if we hide retry in a silent loop then we will not find it out if we hit
> this condition (IGT or real WL) or not
> 
> > 
> > Matt
> >  
> >>> +	 */
> >>> +retry:
> >>>  	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
> >>> +	if (unlikely(!h2g_has_room(ctb, len + 1))) {
> >>> +		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> >>> +		cond_resched();
> >>> +		goto retry;
> >>> +	}
> >>>  
> >>>  	fence = ct_get_next_fence(ct);
> >>>  	request.fence = fence;
> >>> @@ -527,7 +589,7 @@ static int ct_send(struct intel_guc_ct *ct,
> >>>  	list_add_tail(&request.link, &ct->requests.pending);
> >>>  	spin_unlock(&ct->requests.lock);
> >>>  
> >>> -	err = ct_write(ct, action, len, fence);
> >>> +	err = ct_write(ct, action, len, fence, 0);
> >>>  
> >>>  	spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> >>>  
> >>> @@ -569,7 +631,7 @@ static int ct_send(struct intel_guc_ct *ct,
> >>>   * Command Transport (CT) buffer based GuC send function.
> >>>   */
> >>>  int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
> >>> -		      u32 *response_buf, u32 response_buf_size)
> >>> +		      u32 *response_buf, u32 response_buf_size, u32 flags)
> >>>  {
> >>>  	u32 status = ~0; /* undefined */
> >>>  	int ret;
> >>> @@ -579,6 +641,9 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
> >>>  		return -ENODEV;
> >>>  	}
> >>>  
> >>> +	if (flags & INTEL_GUC_SEND_NB)
> >>> +		return ct_send_nb(ct, action, len, flags);
> >>> +
> >>>  	ret = ct_send(ct, action, len, response_buf, response_buf_size, &status);
> >>>  	if (unlikely(ret < 0)) {
> >>>  		CT_ERROR(ct, "Sending action %#x failed (err=%d status=%#X)\n",
> >>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> >>> index 1ae2dde6db93..eb69263324ba 100644
> >>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> >>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> >>> @@ -42,7 +42,6 @@ struct intel_guc_ct_buffer {
> >>>  	bool broken;
> >>>  };
> >>>  
> >>> -
> >>>  /** Top-level structure for Command Transport related data
> >>>   *
> >>>   * Includes a pair of CT buffers for bi-directional communication and tracking
> >>> @@ -88,7 +87,7 @@ static inline bool intel_guc_ct_enabled(struct intel_guc_ct *ct)
> >>>  }
> >>>  
> >>>  int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
> >>> -		      u32 *response_buf, u32 response_buf_size);
> >>> +		      u32 *response_buf, u32 response_buf_size, u32 flags);
> >>>  void intel_guc_ct_event_handler(struct intel_guc_ct *ct);
> >>>  
> >>>  #endif /* _INTEL_GUC_CT_H_ */
> >>>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [Intel-gfx] [PATCH 04/47] drm/i915/guc: Add non blocking CTB send function
  2021-06-24 17:02       ` Michal Wajdeczko
  2021-06-24 22:41         ` Matthew Brost
@ 2021-06-24 22:47         ` Matthew Brost
  1 sibling, 0 replies; 170+ messages in thread
From: Matthew Brost @ 2021-06-24 22:47 UTC (permalink / raw)
  To: Michal Wajdeczko; +Cc: intel-gfx, dri-devel

On Thu, Jun 24, 2021 at 07:02:18PM +0200, Michal Wajdeczko wrote:
> 
> 
> On 24.06.2021 17:49, Matthew Brost wrote:
> > On Thu, Jun 24, 2021 at 04:48:32PM +0200, Michal Wajdeczko wrote:
> >>
> >>
> >> On 24.06.2021 09:04, Matthew Brost wrote:
> >>> Add non blocking CTB send function, intel_guc_send_nb. GuC submission
> >>> will send CTBs in the critical path and does not need to wait for these
> >>> CTBs to complete before moving on, hence the need for this new function.
> >>>
> >>> The non-blocking CTB now must have a flow control mechanism to ensure
> >>> the buffer isn't overrun. A lazy spin wait is used as we believe the
> >>> flow control condition should be rare with a properly sized buffer.
> >>>
> >>> The function, intel_guc_send_nb, is exported in this patch but unused.
> >>> Several patches later in the series make use of this function.
> >>>
> >>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> >>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> >>> ---
> >>>  drivers/gpu/drm/i915/gt/uc/intel_guc.h    | 12 +++-
> >>>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 77 +++++++++++++++++++++--
> >>>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  3 +-
> >>>  3 files changed, 82 insertions(+), 10 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> >>> index 4abc59f6f3cd..24b1df6ad4ae 100644
> >>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> >>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> >>> @@ -74,7 +74,15 @@ static inline struct intel_guc *log_to_guc(struct intel_guc_log *log)
> >>>  static
> >>>  inline int intel_guc_send(struct intel_guc *guc, const u32 *action, u32 len)
> >>>  {
> >>> -	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0);
> >>> +	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0, 0);
> >>> +}
> >>> +
> >>> +#define INTEL_GUC_SEND_NB		BIT(31)
> >>
> >> hmm, this flag really belongs to intel_guc_ct_send() so it should be
> >> defined as CTB flag near that function declaration
> >>
> > 
> > I can move this up a few lines.
> > 
> >>> +static
> >>> +inline int intel_guc_send_nb(struct intel_guc *guc, const u32 *action, u32 len)
> >>> +{
> >>> +	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0,
> >>> +				 INTEL_GUC_SEND_NB);
> >>>  }
> >>>  
> >>>  static inline int
> >>> @@ -82,7 +90,7 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len,
> >>>  			   u32 *response_buf, u32 response_buf_size)
> >>>  {
> >>>  	return intel_guc_ct_send(&guc->ct, action, len,
> >>> -				 response_buf, response_buf_size);
> >>> +				 response_buf, response_buf_size, 0);
> >>>  }
> >>>  
> >>>  static inline void intel_guc_to_host_event_handler(struct intel_guc *guc)
> >>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> >>> index a17215920e58..c9a65d05911f 100644
> >>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> >>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> >>> @@ -3,6 +3,11 @@
> >>>   * Copyright © 2016-2019 Intel Corporation
> >>>   */
> >>>  
> >>> +#include <linux/circ_buf.h>
> >>> +#include <linux/ktime.h>
> >>> +#include <linux/time64.h>
> >>> +#include <linux/timekeeping.h>
> >>> +
> >>>  #include "i915_drv.h"
> >>>  #include "intel_guc_ct.h"
> >>>  #include "gt/intel_gt.h"
> >>> @@ -373,7 +378,7 @@ static void write_barrier(struct intel_guc_ct *ct)
> >>>  static int ct_write(struct intel_guc_ct *ct,
> >>>  		    const u32 *action,
> >>>  		    u32 len /* in dwords */,
> >>> -		    u32 fence)
> >>> +		    u32 fence, u32 flags)
> >>>  {
> >>>  	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> >>>  	struct guc_ct_buffer_desc *desc = ctb->desc;
> >>> @@ -421,9 +426,13 @@ static int ct_write(struct intel_guc_ct *ct,
> >>>  		 FIELD_PREP(GUC_CTB_MSG_0_NUM_DWORDS, len) |
> >>>  		 FIELD_PREP(GUC_CTB_MSG_0_FENCE, fence);
> >>>  
> >>> -	hxg = FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
> >>> -	      FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
> >>> -			 GUC_HXG_REQUEST_MSG_0_DATA0, action[0]);
> >>> +	hxg = (flags & INTEL_GUC_SEND_NB) ?
> >>> +		(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_EVENT) |
> >>> +		 FIELD_PREP(GUC_HXG_EVENT_MSG_0_ACTION |
> >>> +			    GUC_HXG_EVENT_MSG_0_DATA0, action[0])) :
> >>> +		(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
> >>> +		 FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
> >>> +			    GUC_HXG_REQUEST_MSG_0_DATA0, action[0]));
> >>
> >> or as we already switched to accept and return whole HXG messages in
> >> guc_send_mmio() maybe we should do the same for CTB variant too and
> >> instead of using extra flag just let caller to prepare proper HXG header
> >> with HXG_EVENT type and then in CTB code just look at this type to make
> >> decision which code path to use
> >>
> > 
> > Not sure I follow. Anyways could this be done in a follow up by you if
> > want this change.
> >  
> >> note that existing callers should not be impacted, as full HXG header
> >> for the REQUEST message looks exactly the same as "action" code alone.
> >>
> >>>  
> >>>  	CT_DEBUG(ct, "writing (tail %u) %*ph %*ph %*ph\n",
> >>>  		 tail, 4, &header, 4, &hxg, 4 * (len - 1), &action[1]);
> >>> @@ -498,6 +507,46 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
> >>>  	return err;
> >>>  }
> >>>  
> >>> +static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
> >>> +{
> >>> +	struct guc_ct_buffer_desc *desc = ctb->desc;
> >>> +	u32 head = READ_ONCE(desc->head);
> >>> +	u32 space;
> >>> +
> >>> +	space = CIRC_SPACE(desc->tail, head, ctb->size);
> >>> +
> >>> +	return space >= len_dw;
> >>
> >> here you are returning true(1) as has room
> >>
> > 
> > See below.
> >  
> >>> +}
> >>> +
> >>> +static int ct_send_nb(struct intel_guc_ct *ct,
> >>> +		      const u32 *action,
> >>> +		      u32 len,
> >>> +		      u32 flags)
> >>> +{
> >>> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> >>> +	unsigned long spin_flags;
> >>> +	u32 fence;
> >>> +	int ret;
> >>> +
> >>> +	spin_lock_irqsave(&ctb->lock, spin_flags);
> >>> +
> >>> +	ret = h2g_has_room(ctb, len + 1);
> >>
> >> but here you treat "1" it as en error
> >>
> > 
> > Yes, this patch is broken but fixed in a follow up one. Regardless I'll
> > fix this patch in place.
> > 
> >> and this "1" is GUC_HXG_MSG_MIN_LEN, right ?
> >>
> > 
> > Not exactly. This is following how ct_send() uses the action + len
> > field. Action[0] field goes in the HXG header and extra + 1 is for the
> > CT header.
> 
> well, "len" already counts "action" so by treating input as full HXG
> message (including HXG header) will make it cleaner
> 
> we can try do it later but by doing it right now we would avoid
> introducing this send_nb() function and deprecating them long term again
> 
> > 
> >>> +	if (unlikely(ret))
> >>> +		goto out;
> >>> +
> >>> +	fence = ct_get_next_fence(ct);
> >>> +	ret = ct_write(ct, action, len, fence, flags);
> >>> +	if (unlikely(ret))
> >>> +		goto out;
> >>> +
> >>> +	intel_guc_notify(ct_to_guc(ct));
> >>> +
> >>> +out:
> >>> +	spin_unlock_irqrestore(&ctb->lock, spin_flags);
> >>> +
> >>> +	return ret;
> >>> +}
> >>> +
> >>>  static int ct_send(struct intel_guc_ct *ct,
> >>>  		   const u32 *action,
> >>>  		   u32 len,
> >>> @@ -505,6 +554,7 @@ static int ct_send(struct intel_guc_ct *ct,
> >>>  		   u32 response_buf_size,
> >>>  		   u32 *status)
> >>>  {
> >>> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> >>>  	struct ct_request request;
> >>>  	unsigned long flags;
> >>>  	u32 fence;
> >>> @@ -514,8 +564,20 @@ static int ct_send(struct intel_guc_ct *ct,
> >>>  	GEM_BUG_ON(!len);
> >>>  	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
> >>>  	GEM_BUG_ON(!response_buf && response_buf_size);
> >>> +	might_sleep();
> >>>  
> >>> +	/*
> >>> +	 * We use a lazy spin wait loop here as we believe that if the CT
> >>> +	 * buffers are sized correctly the flow control condition should be
> >>> +	 * rare.
> >>
> >> shouldn't we at least try to log such cases with RATE_LIMITED to find
> >> out how "rare" it is, or if really unlikely just return -EBUSY as in
> >> case of non-blocking send ?
> >>
> > 
> > Definitely not return -EBUSY as this a blocking call. Perhaps we can log
> 
> blocking calls still can fail for various reasons, full CTB is one of
> them, and if we return error (now broken) for non-blocking variant then
> we should do the same for blocking variant as well and let the caller
> decide about next steps
> 

And have to rewrite reset of the stack with the new behavior, that seems
wrong. This function is allowed to block, so let it.

If you want to do this but all means go ahead but I'll likely NACK it as
over engineered.

> > this, but IGTs likely can hit rather easily. It really is only
> > interesting if real workloads hit this. Regardless that can be a follow
> > up.
> 
> if we hide retry in a silent loop then we will not find it out if we hit
> this condition (IGT or real WL) or not
>

We don't care if this is hit in IGTs as this isn't a real world use
case and will spam the dmesg if we hit this. I can make a note of this
as an open and we can revisit later.

Matt
 
> > 
> > Matt
> >  
> >>> +	 */
> >>> +retry:
> >>>  	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
> >>> +	if (unlikely(!h2g_has_room(ctb, len + 1))) {
> >>> +		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> >>> +		cond_resched();
> >>> +		goto retry;
> >>> +	}
> >>>  
> >>>  	fence = ct_get_next_fence(ct);
> >>>  	request.fence = fence;
> >>> @@ -527,7 +589,7 @@ static int ct_send(struct intel_guc_ct *ct,
> >>>  	list_add_tail(&request.link, &ct->requests.pending);
> >>>  	spin_unlock(&ct->requests.lock);
> >>>  
> >>> -	err = ct_write(ct, action, len, fence);
> >>> +	err = ct_write(ct, action, len, fence, 0);
> >>>  
> >>>  	spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> >>>  
> >>> @@ -569,7 +631,7 @@ static int ct_send(struct intel_guc_ct *ct,
> >>>   * Command Transport (CT) buffer based GuC send function.
> >>>   */
> >>>  int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
> >>> -		      u32 *response_buf, u32 response_buf_size)
> >>> +		      u32 *response_buf, u32 response_buf_size, u32 flags)
> >>>  {
> >>>  	u32 status = ~0; /* undefined */
> >>>  	int ret;
> >>> @@ -579,6 +641,9 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
> >>>  		return -ENODEV;
> >>>  	}
> >>>  
> >>> +	if (flags & INTEL_GUC_SEND_NB)
> >>> +		return ct_send_nb(ct, action, len, flags);
> >>> +
> >>>  	ret = ct_send(ct, action, len, response_buf, response_buf_size, &status);
> >>>  	if (unlikely(ret < 0)) {
> >>>  		CT_ERROR(ct, "Sending action %#x failed (err=%d status=%#X)\n",
> >>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> >>> index 1ae2dde6db93..eb69263324ba 100644
> >>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> >>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> >>> @@ -42,7 +42,6 @@ struct intel_guc_ct_buffer {
> >>>  	bool broken;
> >>>  };
> >>>  
> >>> -
> >>>  /** Top-level structure for Command Transport related data
> >>>   *
> >>>   * Includes a pair of CT buffers for bi-directional communication and tracking
> >>> @@ -88,7 +87,7 @@ static inline bool intel_guc_ct_enabled(struct intel_guc_ct *ct)
> >>>  }
> >>>  
> >>>  int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
> >>> -		      u32 *response_buf, u32 response_buf_size);
> >>> +		      u32 *response_buf, u32 response_buf_size, u32 flags);
> >>>  void intel_guc_ct_event_handler(struct intel_guc_ct *ct);
> >>>  
> >>>  #endif /* _INTEL_GUC_CT_H_ */
> >>>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [Intel-gfx] [PATCH 05/47] drm/i915/guc: Add stall timer to non blocking CTB send function
  2021-06-24 17:37   ` Michal Wajdeczko
@ 2021-06-24 23:01     ` Matthew Brost
  0 siblings, 0 replies; 170+ messages in thread
From: Matthew Brost @ 2021-06-24 23:01 UTC (permalink / raw)
  To: Michal Wajdeczko; +Cc: intel-gfx, dri-devel

On Thu, Jun 24, 2021 at 07:37:01PM +0200, Michal Wajdeczko wrote:
> 
> 
> On 24.06.2021 09:04, Matthew Brost wrote:
> > Implement a stall timer which fails H2G CTBs once a period of time
> > with no forward progress is reached to prevent deadlock.
> > 
> > Also update to ct_write to return -EIO rather than -EPIPE on a
> > corrupted descriptor.
> 
> by doing so you will have the same error code for two different problems:
> 
> a) corrupted CTB descriptor (definitely unrecoverable)
> b) long stall in CTB processing (still recoverable)
> 

Already discussed both are treated exactly the same by the rest of the
stack so we return a single error code. 

> while caller is explicitly instructed to retry only on:
> 
> c) temporary stall in CTB processing (likely recoverable)
> 
> so why do we want to limit our diagnostics?
> 
> > 
> > Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> > Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 47 +++++++++++++++++++++--
> >  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  4 ++
> >  2 files changed, 48 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > index c9a65d05911f..27ec30b5ef47 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > @@ -319,6 +319,7 @@ int intel_guc_ct_enable(struct intel_guc_ct *ct)
> >  		goto err_deregister;
> >  
> >  	ct->enabled = true;
> > +	ct->stall_time = KTIME_MAX;
> >  
> >  	return 0;
> >  
> > @@ -392,7 +393,7 @@ static int ct_write(struct intel_guc_ct *ct,
> >  	unsigned int i;
> >  
> >  	if (unlikely(ctb->broken))
> > -		return -EPIPE;
> > +		return -EIO;
> >  
> >  	if (unlikely(desc->status))
> >  		goto corrupted;
> > @@ -464,7 +465,7 @@ static int ct_write(struct intel_guc_ct *ct,
> >  	CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u status=%#x\n",
> >  		 desc->head, desc->tail, desc->status);
> >  	ctb->broken = true;
> > -	return -EPIPE;
> > +	return -EIO;
> >  }
> >  
> >  /**
> > @@ -507,6 +508,18 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
> >  	return err;
> >  }
> >  
> > +#define GUC_CTB_TIMEOUT_MS	1500
> 
> it's 150% of core CTB timeout, maybe we should correlate them ?
>

Seems overkill.
 
> > +static inline bool ct_deadlocked(struct intel_guc_ct *ct)
> > +{
> > +	long timeout = GUC_CTB_TIMEOUT_MS;
> > +	bool ret = ktime_ms_delta(ktime_get(), ct->stall_time) > timeout;
> > +
> > +	if (unlikely(ret))
> > +		CT_ERROR(ct, "CT deadlocked\n");
> 
> nit: in commit message you said all these changes are to "prevent
> deadlock" so maybe this message should rather be:
> 
> 	int delta = ktime_ms_delta(ktime_get(), ct->stall_time);
> 
> 	CT_ERROR(ct, "Communication stalled for %dms\n", delta);
>

Sure.
 
> (note that CT_ERROR already adds "CT" prefix)
>
> > +
> > +	return ret;
> > +}
> > +
> >  static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
> >  {
> >  	struct guc_ct_buffer_desc *desc = ctb->desc;
> > @@ -518,6 +531,26 @@ static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
> >  	return space >= len_dw;
> >  }
> >  
> > +static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
> > +{
> > +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> > +
> > +	lockdep_assert_held(&ct->ctbs.send.lock);
> > +
> > +	if (unlikely(!h2g_has_room(ctb, len_dw))) {
> > +		if (ct->stall_time == KTIME_MAX)
> > +			ct->stall_time = ktime_get();
> > +
> > +		if (unlikely(ct_deadlocked(ct)))
> 
> and maybe above message should be printed somewhere around here when we
> detect "deadlock" for the first time?
> 

Not sure I follow. The error message is in the correct place if ask me.
Probably should set the broken flag though when the message is printed
though. 

> > +			return -EIO;
> > +		else
> > +			return -EBUSY;
> > +	}
> > +
> > +	ct->stall_time = KTIME_MAX;
> > +	return 0;
> > +}
> > +
> >  static int ct_send_nb(struct intel_guc_ct *ct,
> >  		      const u32 *action,
> >  		      u32 len,
> > @@ -530,7 +563,7 @@ static int ct_send_nb(struct intel_guc_ct *ct,
> >  
> >  	spin_lock_irqsave(&ctb->lock, spin_flags);
> >  
> > -	ret = h2g_has_room(ctb, len + 1);
> > +	ret = has_room_nb(ct, len + 1);
> >  	if (unlikely(ret))
> >  		goto out;
> >  
> > @@ -574,11 +607,19 @@ static int ct_send(struct intel_guc_ct *ct,
> >  retry:
> >  	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
> >  	if (unlikely(!h2g_has_room(ctb, len + 1))) {
> > +		if (ct->stall_time == KTIME_MAX)
> > +			ct->stall_time = ktime_get();
> 
> as this is a repeated pattern, maybe it should be moved to h2g_has_room
> or other wrapper ?
>

Once we check G2H credits the pattern changes, hence the reason for this
outside of the wrapper. Also IMO sometimes we go a little overboard with
wrappers and it makes the code harder to understand.
 
> >  		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> > +
> > +		if (unlikely(ct_deadlocked(ct)))
> > +			return -EIO;
> > +
> >  		cond_resched();
> >  		goto retry;
> >  	}
> >  
> > +	ct->stall_time = KTIME_MAX;
> 
> this one too
> 

Same as above.

Matt

> > +
> >  	fence = ct_get_next_fence(ct);
> >  	request.fence = fence;
> >  	request.status = 0;
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > index eb69263324ba..55ef7c52472f 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > @@ -9,6 +9,7 @@
> >  #include <linux/interrupt.h>
> >  #include <linux/spinlock.h>
> >  #include <linux/workqueue.h>
> > +#include <linux/ktime.h>
> >  
> >  #include "intel_guc_fwif.h"
> >  
> > @@ -68,6 +69,9 @@ struct intel_guc_ct {
> >  		struct list_head incoming; /* incoming requests */
> >  		struct work_struct worker; /* handler for incoming requests */
> >  	} requests;
> > +
> > +	/** @stall_time: time of first time a CTB submission is stalled */
> > +	ktime_t stall_time;
> >  };
> >  
> >  void intel_guc_ct_init_early(struct intel_guc_ct *ct);
> > 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [Intel-gfx] [PATCH 43/47] drm/i915/guc: Hook GuC scheduling policies up
  2021-06-24  7:05 ` [Intel-gfx] [PATCH 43/47] drm/i915/guc: Hook GuC scheduling policies up Matthew Brost
@ 2021-06-25  0:59   ` Matthew Brost
  2021-06-25 19:10     ` John Harrison
  0 siblings, 1 reply; 170+ messages in thread
From: Matthew Brost @ 2021-06-25  0:59 UTC (permalink / raw)
  To: intel-gfx, dri-devel

On Thu, Jun 24, 2021 at 12:05:12AM -0700, Matthew Brost wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> Use the official driver default scheduling policies for configuring
> the GuC scheduler rather than a bunch of hardcoded values.
> 
> Signed-off-by: John Harrison <john.c.harrison@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Cc: Jose Souza <jose.souza@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_engine_types.h  |  1 +
>  drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  2 +
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c    | 44 ++++++++++++++++++-
>  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 11 +++--
>  4 files changed, 53 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> index 0ceffa2be7a7..37db857bb56c 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> @@ -455,6 +455,7 @@ struct intel_engine_cs {
>  #define I915_ENGINE_IS_VIRTUAL       BIT(5)
>  #define I915_ENGINE_HAS_RELATIVE_MMIO BIT(6)
>  #define I915_ENGINE_REQUIRES_CMD_PARSER BIT(7)
> +#define I915_ENGINE_WANT_FORCED_PREEMPTION BIT(8)
>  	unsigned int flags;
>  
>  	/*
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> index c38365cd5fab..905ecbc7dbe3 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> @@ -270,6 +270,8 @@ int intel_guc_engine_failure_process_msg(struct intel_guc *guc,
>  
>  void intel_guc_find_hung_context(struct intel_engine_cs *engine);
>  
> +int intel_guc_global_policies_update(struct intel_guc *guc);
> +
>  void intel_guc_submission_reset_prepare(struct intel_guc *guc);
>  void intel_guc_submission_reset(struct intel_guc *guc, bool stalled);
>  void intel_guc_submission_reset_finish(struct intel_guc *guc);
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> index d3e86ab7508f..2ad5fcd4e1b7 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> @@ -77,14 +77,54 @@ static u32 guc_ads_blob_size(struct intel_guc *guc)
>  	       guc_ads_private_data_size(guc);
>  }
>  
> -static void guc_policies_init(struct guc_policies *policies)
> +static void guc_policies_init(struct intel_guc *guc, struct guc_policies *policies)
>  {
> +	struct intel_gt *gt = guc_to_gt(guc);
> +	struct drm_i915_private *i915 = gt->i915;
> +
>  	policies->dpc_promote_time = GLOBAL_POLICY_DEFAULT_DPC_PROMOTE_TIME_US;
>  	policies->max_num_work_items = GLOBAL_POLICY_MAX_NUM_WI;
> +
>  	policies->global_flags = 0;
> +	if (i915->params.reset < 2)
> +		policies->global_flags |= GLOBAL_POLICY_DISABLE_ENGINE_RESET;
> +
>  	policies->is_valid = 1;
>  }
>  
> +static int guc_action_policies_update(struct intel_guc *guc, u32 policy_offset)
> +{
> +	u32 action[] = {
> +		INTEL_GUC_ACTION_GLOBAL_SCHED_POLICY_CHANGE,
> +		policy_offset
> +	};
> +
> +	return intel_guc_send(guc, action, ARRAY_SIZE(action));
> +}
> +
> +int intel_guc_global_policies_update(struct intel_guc *guc)
> +{
> +	struct __guc_ads_blob *blob = guc->ads_blob;
> +	struct intel_gt *gt = guc_to_gt(guc);
> +	intel_wakeref_t wakeref;
> +	int ret;
> +
> +	if (!blob)
> +		return -ENOTSUPP;
> +
> +	GEM_BUG_ON(!blob->ads.scheduler_policies);
> +
> +	guc_policies_init(guc, &blob->policies);
> +
> +	if (!intel_guc_is_ready(guc))
> +		return 0;
> +
> +	with_intel_runtime_pm(&gt->i915->runtime_pm, wakeref)
> +		ret = guc_action_policies_update(guc, blob->ads.scheduler_policies);
> +
> +	return ret;
> +}
> +
>  static void guc_mapping_table_init(struct intel_gt *gt,
>  				   struct guc_gt_system_info *system_info)
>  {
> @@ -281,7 +321,7 @@ static void __guc_ads_init(struct intel_guc *guc)
>  	u8 engine_class, guc_class;
>  
>  	/* GuC scheduling policies */
> -	guc_policies_init(&blob->policies);
> +	guc_policies_init(guc, &blob->policies);
>  
>  	/*
>  	 * GuC expects a per-engine-class context image and size
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 6188189314d5..a427336ce916 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -873,6 +873,7 @@ void intel_guc_submission_reset_finish(struct intel_guc *guc)
>  	GEM_WARN_ON(atomic_read(&guc->outstanding_submission_g2h));
>  	atomic_set(&guc->outstanding_submission_g2h, 0);
>  
> +	intel_guc_global_policies_update(guc);
>  	enable_submission(guc);
>  	intel_gt_unpark_heartbeats(guc_to_gt(guc));
>  }
> @@ -1161,8 +1162,12 @@ static void guc_context_policy_init(struct intel_engine_cs *engine,
>  {
>  	desc->policy_flags = 0;
>  
> -	desc->execution_quantum = CONTEXT_POLICY_DEFAULT_EXECUTION_QUANTUM_US;
> -	desc->preemption_timeout = CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US;
> +	if (engine->flags & I915_ENGINE_WANT_FORCED_PREEMPTION)

I can't see where we set this in this series, although I do see a
selftest we need to fixup that sets this. Perhaps we drop this until we
fix that selftest? Or at minimum add a comment saying it will be used in
the future by selftests. What do you think John?

> +		desc->policy_flags |= CONTEXT_POLICY_FLAG_PREEMPT_TO_IDLE;
> +
> +	/* NB: For both of these, zero means disabled. */
> +	desc->execution_quantum = engine->props.timeslice_duration_ms * 1000;
> +	desc->preemption_timeout = engine->props.preempt_timeout_ms * 1000;
>  }
>  
>  static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
> @@ -1945,13 +1950,13 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
>  	engine->set_default_submission = guc_set_default_submission;
>  
>  	engine->flags |= I915_ENGINE_HAS_PREEMPTION;
> +	engine->flags |= I915_ENGINE_HAS_TIMESLICES;
>  
>  	/*
>  	 * TODO: GuC supports timeslicing and semaphores as well, but they're

Nit, we now support timeslicing. I can fix that up in next rev.

Matt

>  	 * handled by the firmware so some minor tweaks are required before
>  	 * enabling.
>  	 *
> -	 * engine->flags |= I915_ENGINE_HAS_TIMESLICES;
>  	 * engine->flags |= I915_ENGINE_HAS_SEMAPHORES;
>  	 */
>  
> -- 
> 2.28.0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [Intel-gfx] [PATCH 44/47] drm/i915/guc: Connect reset modparam updates to GuC policy flags
  2021-06-24  7:05 ` [Intel-gfx] [PATCH 44/47] drm/i915/guc: Connect reset modparam updates to GuC policy flags Matthew Brost
@ 2021-06-25  1:10   ` Matthew Brost
  0 siblings, 0 replies; 170+ messages in thread
From: Matthew Brost @ 2021-06-25  1:10 UTC (permalink / raw)
  To: intel-gfx, dri-devel

On Thu, Jun 24, 2021 at 12:05:13AM -0700, Matthew Brost wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> Changing the reset module parameter has no effect on a running GuC.
> The corresponding entry in the ADS must be updated and then the GuC
> informed via a Host2GuC message.
> 
> The new debugfs interface to module parameters allows this to happen.
> However, connecting the parameter data address back to anything useful
> is messy. One option would be to pass a new private data structure
> address through instead of just the parameter pointer. However, that
> means having a new (and different) data structure for each parameter
> and a new (and different) write function for each parameter. This
> method keeps everything generic by instead using a string lookup on
> the directory entry name.
> 
> Signed-off-by: John Harrison <john.c.harrison@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c |  2 +-
>  drivers/gpu/drm/i915/i915_debugfs_params.c | 31 ++++++++++++++++++++++
>  2 files changed, 32 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> index 2ad5fcd4e1b7..c6d0b762d82c 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> @@ -99,7 +99,7 @@ static int guc_action_policies_update(struct intel_guc *guc, u32 policy_offset)
>  		policy_offset
>  	};
>  
> -	return intel_guc_send(guc, action, ARRAY_SIZE(action));
> +	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
>  }
>  
>  int intel_guc_global_policies_update(struct intel_guc *guc)
> diff --git a/drivers/gpu/drm/i915/i915_debugfs_params.c b/drivers/gpu/drm/i915/i915_debugfs_params.c
> index 4e2b077692cb..8ecd8b42f048 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs_params.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs_params.c
> @@ -6,9 +6,20 @@
>  #include <linux/kernel.h>
>  
>  #include "i915_debugfs_params.h"
> +#include "gt/intel_gt.h"
> +#include "gt/uc/intel_guc.h"
>  #include "i915_drv.h"
>  #include "i915_params.h"
>  
> +#define MATCH_DEBUGFS_NODE_NAME(_file, _name)	(strcmp((_file)->f_path.dentry->d_name.name, (_name)) == 0)
> +
> +#define GET_I915(i915, name, ptr)	\
> +	do {	\
> +		struct i915_params *params;	\
> +		params = container_of(((void *) (ptr)), typeof(*params), name);	\
> +		(i915) = container_of(params, typeof(*(i915)), params);	\
> +	} while(0)
> +
>  /* int param */
>  static int i915_param_int_show(struct seq_file *m, void *data)
>  {
> @@ -24,6 +35,16 @@ static int i915_param_int_open(struct inode *inode, struct file *file)
>  	return single_open(file, i915_param_int_show, inode->i_private);
>  }
>  
> +static int notify_guc(struct drm_i915_private *i915)
> +{
> +	int ret = 0;
> +
> +	if (intel_uc_uses_guc_submission(&i915->gt.uc))
> +		ret = intel_guc_global_policies_update(&i915->gt.uc.guc);
> +
> +	return ret;
> +}
> +
>  static ssize_t i915_param_int_write(struct file *file,
>  				    const char __user *ubuf, size_t len,
>  				    loff_t *offp)
> @@ -81,8 +102,10 @@ static ssize_t i915_param_uint_write(struct file *file,
>  				     const char __user *ubuf, size_t len,
>  				     loff_t *offp)
>  {
> +	struct drm_i915_private *i915;
>  	struct seq_file *m = file->private_data;
>  	unsigned int *value = m->private;
> +	unsigned int old = *value;
>  	int ret;
>  
>  	ret = kstrtouint_from_user(ubuf, len, 0, value);
> @@ -95,6 +118,14 @@ static ssize_t i915_param_uint_write(struct file *file,
>  			*value = b;
>  	}
>  
> +	if (!ret && MATCH_DEBUGFS_NODE_NAME(file, "reset")) {
> +		GET_I915(i915, reset, value);

We might want to make this into a macro in case we need to update more
than just "reset" with the GuC going forward but that is not a blocker.

With that:
Reviewed-by: Matthew Brost <matthew.brost@intel.com> 

> +
> +		ret = notify_guc(i915);
> +		if (ret)
> +			*value = old;
> +	}
> +
>  	return ret ?: len;
>  }
>  
> -- 
> 2.28.0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [Intel-gfx] [PATCH 04/47] drm/i915/guc: Add non blocking CTB send function
  2021-06-24 22:41         ` Matthew Brost
@ 2021-06-25 11:50           ` Michal Wajdeczko
  2021-06-25 17:53             ` Matthew Brost
  0 siblings, 1 reply; 170+ messages in thread
From: Michal Wajdeczko @ 2021-06-25 11:50 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-gfx, dri-devel



On 25.06.2021 00:41, Matthew Brost wrote:
> On Thu, Jun 24, 2021 at 07:02:18PM +0200, Michal Wajdeczko wrote:
>>
>>
>> On 24.06.2021 17:49, Matthew Brost wrote:
>>> On Thu, Jun 24, 2021 at 04:48:32PM +0200, Michal Wajdeczko wrote:
>>>>
>>>>
>>>> On 24.06.2021 09:04, Matthew Brost wrote:
>>>>> Add non blocking CTB send function, intel_guc_send_nb. GuC submission
>>>>> will send CTBs in the critical path and does not need to wait for these
>>>>> CTBs to complete before moving on, hence the need for this new function.
>>>>>
>>>>> The non-blocking CTB now must have a flow control mechanism to ensure
>>>>> the buffer isn't overrun. A lazy spin wait is used as we believe the
>>>>> flow control condition should be rare with a properly sized buffer.
>>>>>
>>>>> The function, intel_guc_send_nb, is exported in this patch but unused.
>>>>> Several patches later in the series make use of this function.
>>>>>
>>>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>>>> ---
>>>>>  drivers/gpu/drm/i915/gt/uc/intel_guc.h    | 12 +++-
>>>>>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 77 +++++++++++++++++++++--
>>>>>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  3 +-
>>>>>  3 files changed, 82 insertions(+), 10 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>>>> index 4abc59f6f3cd..24b1df6ad4ae 100644
>>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>>>> @@ -74,7 +74,15 @@ static inline struct intel_guc *log_to_guc(struct intel_guc_log *log)
>>>>>  static
>>>>>  inline int intel_guc_send(struct intel_guc *guc, const u32 *action, u32 len)
>>>>>  {
>>>>> -	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0);
>>>>> +	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0, 0);
>>>>> +}
>>>>> +
>>>>> +#define INTEL_GUC_SEND_NB		BIT(31)
>>>>
>>>> hmm, this flag really belongs to intel_guc_ct_send() so it should be
>>>> defined as CTB flag near that function declaration
>>>>
>>>
>>> I can move this up a few lines.
>>>
>>>>> +static
>>>>> +inline int intel_guc_send_nb(struct intel_guc *guc, const u32 *action, u32 len)
>>>>> +{
>>>>> +	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0,
>>>>> +				 INTEL_GUC_SEND_NB);
>>>>>  }
>>>>>  
>>>>>  static inline int
>>>>> @@ -82,7 +90,7 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len,
>>>>>  			   u32 *response_buf, u32 response_buf_size)
>>>>>  {
>>>>>  	return intel_guc_ct_send(&guc->ct, action, len,
>>>>> -				 response_buf, response_buf_size);
>>>>> +				 response_buf, response_buf_size, 0);
>>>>>  }
>>>>>  
>>>>>  static inline void intel_guc_to_host_event_handler(struct intel_guc *guc)
>>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>>>> index a17215920e58..c9a65d05911f 100644
>>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>>>> @@ -3,6 +3,11 @@
>>>>>   * Copyright © 2016-2019 Intel Corporation
>>>>>   */
>>>>>  
>>>>> +#include <linux/circ_buf.h>
>>>>> +#include <linux/ktime.h>
>>>>> +#include <linux/time64.h>
>>>>> +#include <linux/timekeeping.h>
>>>>> +
>>>>>  #include "i915_drv.h"
>>>>>  #include "intel_guc_ct.h"
>>>>>  #include "gt/intel_gt.h"
>>>>> @@ -373,7 +378,7 @@ static void write_barrier(struct intel_guc_ct *ct)
>>>>>  static int ct_write(struct intel_guc_ct *ct,
>>>>>  		    const u32 *action,
>>>>>  		    u32 len /* in dwords */,
>>>>> -		    u32 fence)
>>>>> +		    u32 fence, u32 flags)
>>>>>  {
>>>>>  	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>>>>>  	struct guc_ct_buffer_desc *desc = ctb->desc;
>>>>> @@ -421,9 +426,13 @@ static int ct_write(struct intel_guc_ct *ct,
>>>>>  		 FIELD_PREP(GUC_CTB_MSG_0_NUM_DWORDS, len) |
>>>>>  		 FIELD_PREP(GUC_CTB_MSG_0_FENCE, fence);
>>>>>  
>>>>> -	hxg = FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
>>>>> -	      FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
>>>>> -			 GUC_HXG_REQUEST_MSG_0_DATA0, action[0]);
>>>>> +	hxg = (flags & INTEL_GUC_SEND_NB) ?
>>>>> +		(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_EVENT) |
>>>>> +		 FIELD_PREP(GUC_HXG_EVENT_MSG_0_ACTION |
>>>>> +			    GUC_HXG_EVENT_MSG_0_DATA0, action[0])) :
>>>>> +		(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
>>>>> +		 FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
>>>>> +			    GUC_HXG_REQUEST_MSG_0_DATA0, action[0]));
>>>>
>>>> or as we already switched to accept and return whole HXG messages in
>>>> guc_send_mmio() maybe we should do the same for CTB variant too and
>>>> instead of using extra flag just let caller to prepare proper HXG header
>>>> with HXG_EVENT type and then in CTB code just look at this type to make
>>>> decision which code path to use
>>>>
>>>
>>> Not sure I follow. Anyways could this be done in a follow up by you if
>>> want this change.
>>>  
>>>> note that existing callers should not be impacted, as full HXG header
>>>> for the REQUEST message looks exactly the same as "action" code alone.
>>>>
>>>>>  
>>>>>  	CT_DEBUG(ct, "writing (tail %u) %*ph %*ph %*ph\n",
>>>>>  		 tail, 4, &header, 4, &hxg, 4 * (len - 1), &action[1]);
>>>>> @@ -498,6 +507,46 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
>>>>>  	return err;
>>>>>  }
>>>>>  
>>>>> +static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
>>>>> +{
>>>>> +	struct guc_ct_buffer_desc *desc = ctb->desc;
>>>>> +	u32 head = READ_ONCE(desc->head);
>>>>> +	u32 space;
>>>>> +
>>>>> +	space = CIRC_SPACE(desc->tail, head, ctb->size);
>>>>> +
>>>>> +	return space >= len_dw;
>>>>
>>>> here you are returning true(1) as has room
>>>>
>>>
>>> See below.
>>>  
>>>>> +}
>>>>> +
>>>>> +static int ct_send_nb(struct intel_guc_ct *ct,
>>>>> +		      const u32 *action,
>>>>> +		      u32 len,
>>>>> +		      u32 flags)
>>>>> +{
>>>>> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>>>>> +	unsigned long spin_flags;
>>>>> +	u32 fence;
>>>>> +	int ret;
>>>>> +
>>>>> +	spin_lock_irqsave(&ctb->lock, spin_flags);
>>>>> +
>>>>> +	ret = h2g_has_room(ctb, len + 1);
>>>>
>>>> but here you treat "1" it as en error
>>>>
>>>
>>> Yes, this patch is broken but fixed in a follow up one. Regardless I'll
>>> fix this patch in place.
>>>
>>>> and this "1" is GUC_HXG_MSG_MIN_LEN, right ?
>>>>
>>>
>>> Not exactly. This is following how ct_send() uses the action + len
>>> field. Action[0] field goes in the HXG header and extra + 1 is for the
>>> CT header.
>>
>> well, "len" already counts "action" so by treating input as full HXG
>> message (including HXG header) will make it cleaner
>>
> 
> Yes, I know. See above. To me GUC_HXG_MSG_MIN_LEN makes zero sense and
> it is worse than adding + 1. This + 1 accounts for the CT header not the
> HXG header. If any we add a new define, GUC_CT_HDR_LEN, and add that.

you mean GUC_CTB_MSG_MIN_LEN ? it's already there [1]

[1]
https://cgit.freedesktop.org/drm/drm-tip/tree/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h#n82

> 
> Matt
> 
>> we can try do it later but by doing it right now we would avoid
>> introducing this send_nb() function and deprecating them long term again
>>
>>>
>>>>> +	if (unlikely(ret))
>>>>> +		goto out;
>>>>> +
>>>>> +	fence = ct_get_next_fence(ct);
>>>>> +	ret = ct_write(ct, action, len, fence, flags);
>>>>> +	if (unlikely(ret))
>>>>> +		goto out;
>>>>> +
>>>>> +	intel_guc_notify(ct_to_guc(ct));
>>>>> +
>>>>> +out:
>>>>> +	spin_unlock_irqrestore(&ctb->lock, spin_flags);
>>>>> +
>>>>> +	return ret;
>>>>> +}
>>>>> +
>>>>>  static int ct_send(struct intel_guc_ct *ct,
>>>>>  		   const u32 *action,
>>>>>  		   u32 len,
>>>>> @@ -505,6 +554,7 @@ static int ct_send(struct intel_guc_ct *ct,
>>>>>  		   u32 response_buf_size,
>>>>>  		   u32 *status)
>>>>>  {
>>>>> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>>>>>  	struct ct_request request;
>>>>>  	unsigned long flags;
>>>>>  	u32 fence;
>>>>> @@ -514,8 +564,20 @@ static int ct_send(struct intel_guc_ct *ct,
>>>>>  	GEM_BUG_ON(!len);
>>>>>  	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
>>>>>  	GEM_BUG_ON(!response_buf && response_buf_size);
>>>>> +	might_sleep();
>>>>>  
>>>>> +	/*
>>>>> +	 * We use a lazy spin wait loop here as we believe that if the CT
>>>>> +	 * buffers are sized correctly the flow control condition should be
>>>>> +	 * rare.
>>>>
>>>> shouldn't we at least try to log such cases with RATE_LIMITED to find
>>>> out how "rare" it is, or if really unlikely just return -EBUSY as in
>>>> case of non-blocking send ?
>>>>
>>>
>>> Definitely not return -EBUSY as this a blocking call. Perhaps we can log
>>
>> blocking calls still can fail for various reasons, full CTB is one of
>> them, and if we return error (now broken) for non-blocking variant then
>> we should do the same for blocking variant as well and let the caller
>> decide about next steps
>>
>>> this, but IGTs likely can hit rather easily. It really is only
>>> interesting if real workloads hit this. Regardless that can be a follow
>>> up.
>>
>> if we hide retry in a silent loop then we will not find it out if we hit
>> this condition (IGT or real WL) or not
>>
>>>
>>> Matt
>>>  
>>>>> +	 */
>>>>> +retry:
>>>>>  	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
>>>>> +	if (unlikely(!h2g_has_room(ctb, len + 1))) {
>>>>> +		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
>>>>> +		cond_resched();
>>>>> +		goto retry;
>>>>> +	}
>>>>>  
>>>>>  	fence = ct_get_next_fence(ct);
>>>>>  	request.fence = fence;
>>>>> @@ -527,7 +589,7 @@ static int ct_send(struct intel_guc_ct *ct,
>>>>>  	list_add_tail(&request.link, &ct->requests.pending);
>>>>>  	spin_unlock(&ct->requests.lock);
>>>>>  
>>>>> -	err = ct_write(ct, action, len, fence);
>>>>> +	err = ct_write(ct, action, len, fence, 0);
>>>>>  
>>>>>  	spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
>>>>>  
>>>>> @@ -569,7 +631,7 @@ static int ct_send(struct intel_guc_ct *ct,
>>>>>   * Command Transport (CT) buffer based GuC send function.
>>>>>   */
>>>>>  int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
>>>>> -		      u32 *response_buf, u32 response_buf_size)
>>>>> +		      u32 *response_buf, u32 response_buf_size, u32 flags)
>>>>>  {
>>>>>  	u32 status = ~0; /* undefined */
>>>>>  	int ret;
>>>>> @@ -579,6 +641,9 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
>>>>>  		return -ENODEV;
>>>>>  	}
>>>>>  
>>>>> +	if (flags & INTEL_GUC_SEND_NB)
>>>>> +		return ct_send_nb(ct, action, len, flags);
>>>>> +
>>>>>  	ret = ct_send(ct, action, len, response_buf, response_buf_size, &status);
>>>>>  	if (unlikely(ret < 0)) {
>>>>>  		CT_ERROR(ct, "Sending action %#x failed (err=%d status=%#X)\n",
>>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>>>> index 1ae2dde6db93..eb69263324ba 100644
>>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>>>> @@ -42,7 +42,6 @@ struct intel_guc_ct_buffer {
>>>>>  	bool broken;
>>>>>  };
>>>>>  
>>>>> -
>>>>>  /** Top-level structure for Command Transport related data
>>>>>   *
>>>>>   * Includes a pair of CT buffers for bi-directional communication and tracking
>>>>> @@ -88,7 +87,7 @@ static inline bool intel_guc_ct_enabled(struct intel_guc_ct *ct)
>>>>>  }
>>>>>  
>>>>>  int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
>>>>> -		      u32 *response_buf, u32 response_buf_size);
>>>>> +		      u32 *response_buf, u32 response_buf_size, u32 flags);
>>>>>  void intel_guc_ct_event_handler(struct intel_guc_ct *ct);
>>>>>  
>>>>>  #endif /* _INTEL_GUC_CT_H_ */
>>>>>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [Intel-gfx] [PATCH 02/47] drm/i915/guc: Improve error message for unsolicited CT response
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 02/47] drm/i915/guc: Improve error message for unsolicited CT response Matthew Brost
@ 2021-06-25 11:58   ` Michal Wajdeczko
  0 siblings, 0 replies; 170+ messages in thread
From: Michal Wajdeczko @ 2021-06-25 11:58 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel



On 24.06.2021 09:04, Matthew Brost wrote:
> Improve the error message when a unsolicited CT response is received by
> printing fence that couldn't be found, the last fence, and all requests
> with a response outstanding.
> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index a59e239497ee..07f080ddb9ae 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -730,12 +730,16 @@ static int ct_handle_response(struct intel_guc_ct *ct, struct ct_incoming_msg *r
>  		found = true;
>  		break;
>  	}
> -	spin_unlock_irqrestore(&ct->requests.lock, flags);
> -
>  	if (!found) {
>  		CT_ERROR(ct, "Unsolicited response (fence %u)\n", fence);
> -		return -ENOKEY;
> +		CT_ERROR(ct, "Could not find fence=%u, last_fence=%u\n", fence,
> +			 ct->requests.last_fence);
> +		list_for_each_entry(req, &ct->requests.pending, link)
> +			CT_ERROR(ct, "request %u awaits response\n",
> +				 req->fence);

not quite sure how listing of awaiting requests could help here (if we
suspect that this is a duplicated reply, then we should rather track
short list of already processed messages to look there) but since it
does not hurt too much, this is:

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>

> +		err = -ENOKEY;
>  	}
> +	spin_unlock_irqrestore(&ct->requests.lock, flags);
>  
>  	if (unlikely(err))
>  		return err;
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [Intel-gfx] [PATCH 03/47] drm/i915/guc: Increase size of CTB buffers
  2021-06-24 15:41     ` Matthew Brost
@ 2021-06-25 12:03       ` Michal Wajdeczko
  0 siblings, 0 replies; 170+ messages in thread
From: Michal Wajdeczko @ 2021-06-25 12:03 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-gfx, dri-devel



On 24.06.2021 17:41, Matthew Brost wrote:
> On Thu, Jun 24, 2021 at 03:49:55PM +0200, Michal Wajdeczko wrote:
>>
>>
>> On 24.06.2021 09:04, Matthew Brost wrote:
>>> With the introduction of non-blocking CTBs more than one CTB can be in
>>> flight at a time. Increasing the size of the CTBs should reduce how
>>> often software hits the case where no space is available in the CTB
>>> buffer.
>>>
>>> Cc: John Harrison <john.c.harrison@intel.com>
>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>> ---
>>>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 11 ++++++++---
>>>  1 file changed, 8 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> index 07f080ddb9ae..a17215920e58 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> @@ -58,11 +58,16 @@ static inline struct drm_device *ct_to_drm(struct intel_guc_ct *ct)
>>>   *      +--------+-----------------------------------------------+------+
>>>   *
>>>   * Size of each `CT Buffer`_ must be multiple of 4K.
>>> - * As we don't expect too many messages, for now use minimum sizes.
>>> + * We don't expect too many messages in flight at any time, unless we are
>>> + * using the GuC submission. In that case each request requires a minimum
>>> + * 2 dwords which gives us a maximum 256 queue'd requests. Hopefully this
>>> + * enough space to avoid backpressure on the driver. We increase the size
>>> + * of the receive buffer (relative to the send) to ensure a G2H response
>>> + * CTB has a landing spot.
>>>   */
>>>  #define CTB_DESC_SIZE		ALIGN(sizeof(struct guc_ct_buffer_desc), SZ_2K)
>>>  #define CTB_H2G_BUFFER_SIZE	(SZ_4K)
>>> -#define CTB_G2H_BUFFER_SIZE	(SZ_4K)
>>> +#define CTB_G2H_BUFFER_SIZE	(4 * CTB_H2G_BUFFER_SIZE)
>>>  
>>>  struct ct_request {
>>>  	struct list_head link;
>>> @@ -641,7 +646,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>>>  	/* beware of buffer wrap case */
>>>  	if (unlikely(available < 0))
>>>  		available += size;
>>> -	CT_DEBUG(ct, "available %d (%u:%u)\n", available, head, tail);
>>> +	CT_DEBUG(ct, "available %d (%u:%u:%u)\n", available, head, tail, size);
>>
>> CTB size is already printed in intel_guc_ct_init() and is fixed so not
>> sure if repeating it on every ct_read has any benefit
>>
> 
> I'd say more debug the better and if CT_DEBUG is enabled the logs are
> very verbose so an extra value doesn't really hurt.

fair, but this doesn't mean we should add little/no value item, anyway
since DEBUG_GUC is if off by default, this is:

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>

> 
> Matt
> 
>>>  	GEM_BUG_ON(available < 0);
>>>  
>>>  	header = cmds[head];
>>>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [Intel-gfx] [PATCH 06/47] drm/i915/guc: Optimize CTB writes and reads
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 06/47] drm/i915/guc: Optimize CTB writes and reads Matthew Brost
@ 2021-06-25 13:09   ` Michal Wajdeczko
  2021-06-25 18:26     ` Matthew Brost
  2021-06-25 20:28     ` Matthew Brost
  0 siblings, 2 replies; 170+ messages in thread
From: Michal Wajdeczko @ 2021-06-25 13:09 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel



On 24.06.2021 09:04, Matthew Brost wrote:
> CTB writes are now in the path of command submission and should be
> optimized for performance. Rather than reading CTB descriptor values
> (e.g. head, tail) which could result in accesses across the PCIe bus,
> store shadow local copies and only read/write the descriptor values when
> absolutely necessary. Also store the current space in the each channel
> locally.
> 
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 76 ++++++++++++++---------
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  6 ++
>  2 files changed, 51 insertions(+), 31 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index 27ec30b5ef47..1fd5c69358ef 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -130,6 +130,10 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc)
>  static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb)
>  {
>  	ctb->broken = false;
> +	ctb->tail = 0;
> +	ctb->head = 0;
> +	ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
> +
>  	guc_ct_buffer_desc_init(ctb->desc);
>  }
>  
> @@ -383,10 +387,8 @@ static int ct_write(struct intel_guc_ct *ct,
>  {
>  	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>  	struct guc_ct_buffer_desc *desc = ctb->desc;
> -	u32 head = desc->head;
> -	u32 tail = desc->tail;
> +	u32 tail = ctb->tail;
>  	u32 size = ctb->size;
> -	u32 used;
>  	u32 header;
>  	u32 hxg;
>  	u32 *cmds = ctb->cmds;
> @@ -398,25 +400,14 @@ static int ct_write(struct intel_guc_ct *ct,
>  	if (unlikely(desc->status))
>  		goto corrupted;
>  
> -	if (unlikely((tail | head) >= size)) {
> +#ifdef CONFIG_DRM_I915_DEBUG_GUC

since we are caching tail, we may want to check if it's sill correct:

	tail = READ_ONCE(desc->tail);
	if (unlikely(tail != ctb->tail)) {
  		CT_ERROR(ct, "Tail was modified %u != %u\n",
			 tail, ctb->tail);
  		desc->status |= GUC_CTB_STATUS_MISMATCH;
  		goto corrupted;
	}

and since we own the tail then we can be more strict:

	GEM_BUG_ON(tail > size);

and then finally just check GuC head:

	head = READ_ONCE(desc->head);
	if (unlikely(head >= size)) {
		...

> +	if (unlikely((desc->tail | desc->head) >= size)) {
>  		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
> -			 head, tail, size);
> +			 desc->head, desc->tail, size);
>  		desc->status |= GUC_CTB_STATUS_OVERFLOW;
>  		goto corrupted;
>  	}
> -
> -	/*
> -	 * tail == head condition indicates empty. GuC FW does not support
> -	 * using up the entire buffer to get tail == head meaning full.
> -	 */
> -	if (tail < head)
> -		used = (size - head) + tail;
> -	else
> -		used = tail - head;
> -
> -	/* make sure there is a space including extra dw for the fence */
> -	if (unlikely(used + len + 1 >= size))
> -		return -ENOSPC;
> +#endif
>  
>  	/*
>  	 * dw0: CT header (including fence)
> @@ -457,7 +448,9 @@ static int ct_write(struct intel_guc_ct *ct,
>  	write_barrier(ct);
>  
>  	/* now update descriptor */
> +	ctb->tail = tail;
>  	WRITE_ONCE(desc->tail, tail);
> +	ctb->space -= len + 1;

this magic "1" is likely GUC_CTB_MSG_MIN_LEN, right ?

>  
>  	return 0;
>  
> @@ -473,7 +466,7 @@ static int ct_write(struct intel_guc_ct *ct,
>   * @req:	pointer to pending request
>   * @status:	placeholder for status
>   *
> - * For each sent request, Guc shall send bac CT response message.
> + * For each sent request, GuC shall send back CT response message.
>   * Our message handler will update status of tracked request once
>   * response message with given fence is received. Wait here and
>   * check for valid response status value.
> @@ -520,24 +513,35 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct)
>  	return ret;
>  }
>  
> -static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
> +static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw)
>  {
> -	struct guc_ct_buffer_desc *desc = ctb->desc;
> -	u32 head = READ_ONCE(desc->head);
> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> +	u32 head;
>  	u32 space;
>  
> -	space = CIRC_SPACE(desc->tail, head, ctb->size);
> +	if (ctb->space >= len_dw)
> +		return true;
> +
> +	head = READ_ONCE(ctb->desc->head);
> +	if (unlikely(head > ctb->size)) {
> +		CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u size=%u\n",
> +			  ctb->desc->head, ctb->desc->tail, ctb->size);
> +		ctb->desc->status |= GUC_CTB_STATUS_OVERFLOW;
> +		ctb->broken = true;
> +		return false;
> +	}
> +
> +	space = CIRC_SPACE(ctb->tail, head, ctb->size);
> +	ctb->space = space;

maybe here we could mark stall_time ?

 	if (space >= len_dw)
		return true;

	if (ct->stall_time == KTIME_MAX)
		ct->stall_time = ktime_get();
	return false;

>  
>  	return space >= len_dw;

btw, maybe to avoid filling CTB to the last dword, this should be

	space > len_dw

note the earlier comment:

/*
 * tail == head condition indicates empty. GuC FW does not support
 * using up the entire buffer to get tail == head meaning full.
 */

>  }
>  
>  static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
>  {
> -	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> -
>  	lockdep_assert_held(&ct->ctbs.send.lock);
>  
> -	if (unlikely(!h2g_has_room(ctb, len_dw))) {
> +	if (unlikely(!h2g_has_room(ct, len_dw))) {
>  		if (ct->stall_time == KTIME_MAX)
>  			ct->stall_time = ktime_get();
>  
> @@ -606,10 +610,10 @@ static int ct_send(struct intel_guc_ct *ct,
>  	 */
>  retry:
>  	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
> -	if (unlikely(!h2g_has_room(ctb, len + 1))) {
> +	if (unlikely(!h2g_has_room(ct, len + 1))) {
>  		if (ct->stall_time == KTIME_MAX)
>  			ct->stall_time = ktime_get();
> -		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> +		spin_unlock_irqrestore(&ctb->lock, flags);
>  
>  		if (unlikely(ct_deadlocked(ct)))
>  			return -EIO;
> @@ -632,7 +636,7 @@ static int ct_send(struct intel_guc_ct *ct,
>  
>  	err = ct_write(ct, action, len, fence, 0);
>  
> -	spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> +	spin_unlock_irqrestore(&ctb->lock, flags);
>  
>  	if (unlikely(err))
>  		goto unlink;
> @@ -720,7 +724,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>  {
>  	struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv;
>  	struct guc_ct_buffer_desc *desc = ctb->desc;
> -	u32 head = desc->head;
> +	u32 head = ctb->head;
>  	u32 tail = desc->tail;
>  	u32 size = ctb->size;
>  	u32 *cmds = ctb->cmds;
> @@ -735,12 +739,21 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>  	if (unlikely(desc->status))
>  		goto corrupted;
>  
> -	if (unlikely((tail | head) >= size)) {
> +#ifdef CONFIG_DRM_I915_DEBUG_GUC

as above we may want to check if our cached head was not modified

> +	if (unlikely((desc->tail | desc->head) >= size)) {
>  		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
>  			 head, tail, size);
>  		desc->status |= GUC_CTB_STATUS_OVERFLOW;
>  		goto corrupted;
>  	}
> +#else
> +	if (unlikely((tail | ctb->head) >= size)) {
> +		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
> +			 head, tail, size);
> +		desc->status |= GUC_CTB_STATUS_OVERFLOW;
> +		goto corrupted;
> +	}
> +#endif
>  
>  	/* tail == head condition indicates empty */
>  	available = tail - head;
> @@ -790,6 +803,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>  	}
>  	CT_DEBUG(ct, "received %*ph\n", 4 * len, (*msg)->msg);
>  
> +	ctb->head = head;
>  	/* now update descriptor */
>  	WRITE_ONCE(desc->head, head);
>  
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> index 55ef7c52472f..9924335e2ee6 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> @@ -33,6 +33,9 @@ struct intel_guc;
>   * @desc: pointer to the buffer descriptor
>   * @cmds: pointer to the commands buffer
>   * @size: size of the commands buffer in dwords
> + * @head: local shadow copy of head in dwords
> + * @tail: local shadow copy of tail in dwords
> + * @space: local shadow copy of space in dwords
>   * @broken: flag to indicate if descriptor data is broken
>   */
>  struct intel_guc_ct_buffer {
> @@ -40,6 +43,9 @@ struct intel_guc_ct_buffer {
>  	struct guc_ct_buffer_desc *desc;
>  	u32 *cmds;
>  	u32 size;
> +	u32 tail;
> +	u32 head;
> +	u32 space;

in later patch this is changing to atomic_t
maybe we can start with it ?

>  	bool broken;
>  };
>  
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [Intel-gfx] [PATCH 10/47] drm/i915/guc: Add lrc descriptor context lookup array
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 10/47] drm/i915/guc: Add lrc descriptor context lookup array Matthew Brost
@ 2021-06-25 13:17   ` Michal Wajdeczko
  2021-06-25 17:26     ` Matthew Brost
  0 siblings, 1 reply; 170+ messages in thread
From: Michal Wajdeczko @ 2021-06-25 13:17 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel



On 24.06.2021 09:04, Matthew Brost wrote:
> Add lrc descriptor context lookup array which can resolve the
> intel_context from the lrc descriptor index. In addition to lookup, it
> can determine in the lrc descriptor context is currently registered with
> the GuC by checking if an entry for a descriptor index is present.
> Future patches in the series will make use of this array.

s/lrc/LRC

> 
> Cc: John Harrison <john.c.harrison@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  5 +++
>  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 32 +++++++++++++++++--
>  2 files changed, 35 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> index b28fa54214f2..2313d9fc087b 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> @@ -6,6 +6,8 @@
>  #ifndef _INTEL_GUC_H_
>  #define _INTEL_GUC_H_
>  
> +#include "linux/xarray.h"

#include <linux/xarray.h>

> +
>  #include "intel_uncore.h"
>  #include "intel_guc_fw.h"
>  #include "intel_guc_fwif.h"
> @@ -46,6 +48,9 @@ struct intel_guc {
>  	struct i915_vma *lrc_desc_pool;
>  	void *lrc_desc_pool_vaddr;
>  
> +	/* guc_id to intel_context lookup */
> +	struct xarray context_lookup;
> +
>  	/* Control params for fw initialization */
>  	u32 params[GUC_CTL_MAX_DWORDS];

btw, IIRC there was idea to move most struct definitions to
intel_guc_types.h, is this still a plan ?

>  
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index a366890fb840..23a94a896a0b 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -65,8 +65,6 @@ static inline struct i915_priolist *to_priolist(struct rb_node *rb)
>  	return rb_entry(rb, struct i915_priolist, node);
>  }
>  
> -/* Future patches will use this function */
> -__attribute__ ((unused))
>  static struct guc_lrc_desc *__get_lrc_desc(struct intel_guc *guc, u32 index)
>  {
>  	struct guc_lrc_desc *base = guc->lrc_desc_pool_vaddr;
> @@ -76,6 +74,15 @@ static struct guc_lrc_desc *__get_lrc_desc(struct intel_guc *guc, u32 index)
>  	return &base[index];
>  }
>  
> +static inline struct intel_context *__get_context(struct intel_guc *guc, u32 id)
> +{
> +	struct intel_context *ce = xa_load(&guc->context_lookup, id);
> +
> +	GEM_BUG_ON(id >= GUC_MAX_LRC_DESCRIPTORS);
> +
> +	return ce;
> +}
> +
>  static int guc_lrc_desc_pool_create(struct intel_guc *guc)
>  {
>  	u32 size;
> @@ -96,6 +103,25 @@ static void guc_lrc_desc_pool_destroy(struct intel_guc *guc)
>  	i915_vma_unpin_and_release(&guc->lrc_desc_pool, I915_VMA_RELEASE_MAP);
>  }
>  
> +static inline void reset_lrc_desc(struct intel_guc *guc, u32 id)
> +{
> +	struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
> +
> +	memset(desc, 0, sizeof(*desc));
> +	xa_erase_irq(&guc->context_lookup, id);
> +}
> +
> +static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
> +{
> +	return __get_context(guc, id);
> +}
> +
> +static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
> +					   struct intel_context *ce)
> +{
> +	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
> +}
> +
>  static void guc_add_request(struct intel_guc *guc, struct i915_request *rq)
>  {
>  	/* Leaving stub as this function will be used in future patches */
> @@ -400,6 +426,8 @@ int intel_guc_submission_init(struct intel_guc *guc)
>  	 */
>  	GEM_BUG_ON(!guc->lrc_desc_pool);
>  
> +	xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ);
> +
>  	return 0;
>  }
>  
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [Intel-gfx] [PATCH 13/47] drm/i915/guc: Implement GuC context operations for new inteface
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 13/47] drm/i915/guc: Implement GuC context operations for new inteface Matthew Brost
@ 2021-06-25 13:25   ` Michal Wajdeczko
  2021-06-25 17:46     ` Matthew Brost
  0 siblings, 1 reply; 170+ messages in thread
From: Michal Wajdeczko @ 2021-06-25 13:25 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel



On 24.06.2021 09:04, Matthew Brost wrote:
> Implement GuC context operations which includes GuC specific operations
> alloc, pin, unpin, and destroy.
> 
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_context.c       |   5 +
>  drivers/gpu/drm/i915/gt/intel_context_types.h |  22 +-
>  drivers/gpu/drm/i915/gt/intel_lrc_reg.h       |   1 -
>  drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  34 +
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |   4 +
>  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 664 ++++++++++++++++--
>  drivers/gpu/drm/i915/i915_reg.h               |   1 +
>  drivers/gpu/drm/i915/i915_request.c           |   1 +
>  8 files changed, 677 insertions(+), 55 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
> index 4033184f13b9..2b68af16222c 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context.c
> +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> @@ -383,6 +383,11 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
>  
>  	mutex_init(&ce->pin_mutex);
>  
> +	spin_lock_init(&ce->guc_state.lock);
> +
> +	ce->guc_id = GUC_INVALID_LRC_ID;
> +	INIT_LIST_HEAD(&ce->guc_id_link);
> +
>  	i915_active_init(&ce->active,
>  			 __intel_context_active, __intel_context_retire, 0);
>  }
> diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
> index bb6fef7eae52..ce7c69b34cd1 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> @@ -95,6 +95,7 @@ struct intel_context {
>  #define CONTEXT_BANNED			6
>  #define CONTEXT_FORCE_SINGLE_SUBMISSION	7
>  #define CONTEXT_NOPREEMPT		8
> +#define CONTEXT_LRCA_DIRTY		9
>  
>  	struct {
>  		u64 timeout_us;
> @@ -137,14 +138,29 @@ struct intel_context {
>  
>  	u8 wa_bb_page; /* if set, page num reserved for context workarounds */
>  
> +	struct {
> +		/** lock: protects everything in guc_state */
> +		spinlock_t lock;
> +		/**
> +		 * sched_state: scheduling state of this context using GuC
> +		 * submission
> +		 */
> +		u8 sched_state;
> +	} guc_state;
> +
>  	/* GuC scheduling state that does not require a lock. */
>  	atomic_t guc_sched_state_no_lock;
>  
> +	/* GuC lrc descriptor ID */
> +	u16 guc_id;
> +
> +	/* GuC lrc descriptor reference count */
> +	atomic_t guc_id_ref;
> +
>  	/*
> -	 * GuC lrc descriptor ID - Not assigned in this patch but future patches
> -	 * in the series will.
> +	 * GuC ID link - in list when unpinned but guc_id still valid in GuC
>  	 */
> -	u16 guc_id;
> +	struct list_head guc_id_link;

some fields are being added with kerneldoc, some without
what's the rule ?

>  };
>  
>  #endif /* __INTEL_CONTEXT_TYPES__ */
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
> index 41e5350a7a05..49d4857ad9b7 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
> @@ -87,7 +87,6 @@
>  #define GEN11_CSB_WRITE_PTR_MASK	(GEN11_CSB_PTR_MASK << 0)
>  
>  #define MAX_CONTEXT_HW_ID	(1 << 21) /* exclusive */
> -#define MAX_GUC_CONTEXT_HW_ID	(1 << 20) /* exclusive */
>  #define GEN11_MAX_CONTEXT_HW_ID	(1 << 11) /* exclusive */
>  /* in Gen12 ID 0x7FF is reserved to indicate idle */
>  #define GEN12_MAX_CONTEXT_HW_ID	(GEN11_MAX_CONTEXT_HW_ID - 1)
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> index 9ba8219475b2..d44316dc914b 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> @@ -44,6 +44,14 @@ struct intel_guc {
>  		void (*disable)(struct intel_guc *guc);
>  	} interrupts;
>  
> +	/*
> +	 * contexts_lock protects the pool of free guc ids and a linked list of
> +	 * guc ids available to be stolen
> +	 */
> +	spinlock_t contexts_lock;
> +	struct ida guc_ids;
> +	struct list_head guc_id_list;
> +
>  	bool submission_selected;
>  
>  	struct i915_vma *ads_vma;
> @@ -102,6 +110,29 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len,
>  				 response_buf, response_buf_size, 0);
>  }
>  
> +static inline int intel_guc_send_busy_loop(struct intel_guc* guc,
> +					   const u32 *action,
> +					   u32 len,
> +					   bool loop)
> +{
> +	int err;
> +
> +	/* No sleeping with spin locks, just busy loop */
> +	might_sleep_if(loop && (!in_atomic() && !irqs_disabled()));
> +
> +retry:
> +	err = intel_guc_send_nb(guc, action, len);
> +	if (unlikely(err == -EBUSY && loop)) {
> +		if (likely(!in_atomic() && !irqs_disabled()))
> +			cond_resched();
> +		else
> +			cpu_relax();
> +		goto retry;
> +	}
> +
> +	return err;
> +}
> +
>  static inline void intel_guc_to_host_event_handler(struct intel_guc *guc)
>  {
>  	intel_guc_ct_event_handler(&guc->ct);
> @@ -203,6 +234,9 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask)
>  int intel_guc_reset_engine(struct intel_guc *guc,
>  			   struct intel_engine_cs *engine);
>  
> +int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
> +					  const u32 *msg, u32 len);
> +
>  void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p);
>  
>  #endif
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index 8e0ed7d8feb3..42a7daef2ff6 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -901,6 +901,10 @@ static int ct_process_request(struct intel_guc_ct *ct, struct ct_incoming_msg *r
>  	case INTEL_GUC_ACTION_DEFAULT:
>  		ret = intel_guc_to_host_process_recv_msg(guc, payload, len);
>  		break;
> +	case INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE:
> +		ret = intel_guc_deregister_done_process_msg(guc, payload,
> +							    len);
> +		break;
>  	default:
>  		ret = -EOPNOTSUPP;
>  		break;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 38aff83ee9fa..d39579ac2faa 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -13,7 +13,9 @@
>  #include "gt/intel_gt.h"
>  #include "gt/intel_gt_irq.h"
>  #include "gt/intel_gt_pm.h"
> +#include "gt/intel_gt_requests.h"
>  #include "gt/intel_lrc.h"
> +#include "gt/intel_lrc_reg.h"
>  #include "gt/intel_mocs.h"
>  #include "gt/intel_ring.h"
>  
> @@ -85,6 +87,73 @@ static inline void clr_context_enabled(struct intel_context *ce)
>  		   &ce->guc_sched_state_no_lock);
>  }
>  
> +/*
> + * Below is a set of functions which control the GuC scheduling state which
> + * require a lock, aside from the special case where the functions are called
> + * from guc_lrc_desc_pin(). In that case it isn't possible for any other code
> + * path to be executing on the context.
> + */
> +#define SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER	BIT(0)
> +#define SCHED_STATE_DESTROYED				BIT(1)
> +static inline void init_sched_state(struct intel_context *ce)
> +{
> +	/* Only should be called from guc_lrc_desc_pin() */
> +	atomic_set(&ce->guc_sched_state_no_lock, 0);
> +	ce->guc_state.sched_state = 0;
> +}
> +
> +static inline bool
> +context_wait_for_deregister_to_register(struct intel_context *ce)
> +{
> +	return (ce->guc_state.sched_state &
> +		SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER);
> +}
> +
> +static inline void
> +set_context_wait_for_deregister_to_register(struct intel_context *ce)
> +{
> +	/* Only should be called from guc_lrc_desc_pin() */
> +	ce->guc_state.sched_state |=
> +		SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER;
> +}
> +
> +static inline void
> +clr_context_wait_for_deregister_to_register(struct intel_context *ce)
> +{
> +	lockdep_assert_held(&ce->guc_state.lock);
> +	ce->guc_state.sched_state =
> +		(ce->guc_state.sched_state &
> +		 ~SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER);
> +}
> +
> +static inline bool
> +context_destroyed(struct intel_context *ce)
> +{
> +	return (ce->guc_state.sched_state & SCHED_STATE_DESTROYED);
> +}
> +
> +static inline void
> +set_context_destroyed(struct intel_context *ce)
> +{
> +	lockdep_assert_held(&ce->guc_state.lock);
> +	ce->guc_state.sched_state |= SCHED_STATE_DESTROYED;
> +}
> +
> +static inline bool context_guc_id_invalid(struct intel_context *ce)
> +{
> +	return (ce->guc_id == GUC_INVALID_LRC_ID);
> +}
> +
> +static inline void set_context_guc_id_invalid(struct intel_context *ce)
> +{
> +	ce->guc_id = GUC_INVALID_LRC_ID;
> +}
> +
> +static inline struct intel_guc *ce_to_guc(struct intel_context *ce)
> +{
> +	return &ce->engine->gt->uc.guc;
> +}
> +
>  static inline struct i915_priolist *to_priolist(struct rb_node *rb)
>  {
>  	return rb_entry(rb, struct i915_priolist, node);
> @@ -155,6 +224,9 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
>  	int len = 0;
>  	bool enabled = context_enabled(ce);
>  
> +	GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
> +	GEM_BUG_ON(context_guc_id_invalid(ce));
> +
>  	if (!enabled) {
>  		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
>  		action[len++] = ce->guc_id;
> @@ -417,6 +489,10 @@ int intel_guc_submission_init(struct intel_guc *guc)
>  
>  	xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ);
>  
> +	spin_lock_init(&guc->contexts_lock);
> +	INIT_LIST_HEAD(&guc->guc_id_list);
> +	ida_init(&guc->guc_ids);
> +
>  	return 0;
>  }
>  
> @@ -429,9 +505,303 @@ void intel_guc_submission_fini(struct intel_guc *guc)
>  	i915_sched_engine_put(guc->sched_engine);
>  }
>  
> -static int guc_context_alloc(struct intel_context *ce)
> +static inline void queue_request(struct i915_sched_engine *sched_engine,
> +				 struct i915_request *rq,
> +				 int prio)
>  {
> -	return lrc_alloc(ce, ce->engine);
> +	GEM_BUG_ON(!list_empty(&rq->sched.link));
> +	list_add_tail(&rq->sched.link,
> +		      i915_sched_lookup_priolist(sched_engine, prio));
> +	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> +}
> +
> +static int guc_bypass_tasklet_submit(struct intel_guc *guc,
> +				     struct i915_request *rq)
> +{
> +	int ret;
> +
> +	__i915_request_submit(rq);
> +
> +	trace_i915_request_in(rq, 0);
> +
> +	guc_set_lrc_tail(rq);
> +	ret = guc_add_request(guc, rq);
> +	if (ret == -EBUSY)
> +		guc->stalled_request = rq;
> +
> +	return ret;
> +}
> +
> +static void guc_submit_request(struct i915_request *rq)
> +{
> +	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
> +	struct intel_guc *guc = &rq->engine->gt->uc.guc;
> +	unsigned long flags;
> +
> +	/* Will be called from irq-context when using foreign fences. */
> +	spin_lock_irqsave(&sched_engine->lock, flags);
> +
> +	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
> +		queue_request(sched_engine, rq, rq_prio(rq));
> +	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
> +		tasklet_hi_schedule(&sched_engine->tasklet);
> +
> +	spin_unlock_irqrestore(&sched_engine->lock, flags);
> +}
> +
> +#define GUC_ID_START	64	/* First 64 guc_ids reserved */
> +static int new_guc_id(struct intel_guc *guc)
> +{
> +	return ida_simple_get(&guc->guc_ids, GUC_ID_START,
> +			      GUC_MAX_LRC_DESCRIPTORS, GFP_KERNEL |
> +			      __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
> +}
> +
> +static void __release_guc_id(struct intel_guc *guc, struct intel_context *ce)
> +{
> +	if (!context_guc_id_invalid(ce)) {
> +		ida_simple_remove(&guc->guc_ids, ce->guc_id);
> +		reset_lrc_desc(guc, ce->guc_id);
> +		set_context_guc_id_invalid(ce);
> +	}
> +	if (!list_empty(&ce->guc_id_link))
> +		list_del_init(&ce->guc_id_link);
> +}
> +
> +static void release_guc_id(struct intel_guc *guc, struct intel_context *ce)
> +{
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&guc->contexts_lock, flags);
> +	__release_guc_id(guc, ce);
> +	spin_unlock_irqrestore(&guc->contexts_lock, flags);
> +}
> +
> +static int steal_guc_id(struct intel_guc *guc)
> +{
> +	struct intel_context *ce;
> +	int guc_id;
> +
> +	if (!list_empty(&guc->guc_id_list)) {
> +		ce = list_first_entry(&guc->guc_id_list,
> +				      struct intel_context,
> +				      guc_id_link);
> +
> +		GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
> +		GEM_BUG_ON(context_guc_id_invalid(ce));
> +
> +		list_del_init(&ce->guc_id_link);
> +		guc_id = ce->guc_id;
> +		set_context_guc_id_invalid(ce);
> +		return guc_id;
> +	} else {
> +		return -EAGAIN;
> +	}
> +}
> +
> +static int assign_guc_id(struct intel_guc *guc, u16 *out)
> +{
> +	int ret;
> +
> +	ret = new_guc_id(guc);
> +	if (unlikely(ret < 0)) {
> +		ret = steal_guc_id(guc);
> +		if (ret < 0)
> +			return ret;
> +	}
> +
> +	*out = ret;
> +	return 0;
> +}
> +
> +#define PIN_GUC_ID_TRIES	4
> +static int pin_guc_id(struct intel_guc *guc, struct intel_context *ce)
> +{
> +	int ret = 0;
> +	unsigned long flags, tries = PIN_GUC_ID_TRIES;
> +
> +	GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
> +
> +try_again:
> +	spin_lock_irqsave(&guc->contexts_lock, flags);
> +
> +	if (context_guc_id_invalid(ce)) {
> +		ret = assign_guc_id(guc, &ce->guc_id);
> +		if (ret)
> +			goto out_unlock;
> +		ret = 1;	// Indidcates newly assigned HW context

C++ style comment

> +	}
> +	if (!list_empty(&ce->guc_id_link))
> +		list_del_init(&ce->guc_id_link);
> +	atomic_inc(&ce->guc_id_ref);
> +
> +out_unlock:
> +	spin_unlock_irqrestore(&guc->contexts_lock, flags);
> +
> +	/*
> +	 * -EAGAIN indicates no guc_ids are available, let's retire any
> +	 * outstanding requests to see if that frees up a guc_id. If the first
> +	 * retire didn't help, insert a sleep with the timeslice duration before
> +	 * attempting to retire more requests. Double the sleep period each
> +	 * subsequent pass before finally giving up. The sleep period has max of
> +	 * 100ms and minimum of 1ms.
> +	 */
> +	if (ret == -EAGAIN && --tries) {
> +		if (PIN_GUC_ID_TRIES - tries > 1) {
> +			unsigned int timeslice_shifted =
> +				ce->engine->props.timeslice_duration_ms <<
> +				(PIN_GUC_ID_TRIES - tries - 2);
> +			unsigned int max = min_t(unsigned int, 100,
> +						 timeslice_shifted);
> +
> +			msleep(max_t(unsigned int, max, 1));
> +		}
> +		intel_gt_retire_requests(guc_to_gt(guc));
> +		goto try_again;
> +	}
> +
> +	return ret;
> +}
> +
> +static void unpin_guc_id(struct intel_guc *guc, struct intel_context *ce)
> +{
> +	unsigned long flags;
> +
> +	GEM_BUG_ON(atomic_read(&ce->guc_id_ref) < 0);
> +
> +	spin_lock_irqsave(&guc->contexts_lock, flags);
> +	if (!context_guc_id_invalid(ce) && list_empty(&ce->guc_id_link) &&
> +	    !atomic_read(&ce->guc_id_ref))
> +		list_add_tail(&ce->guc_id_link, &guc->guc_id_list);
> +	spin_unlock_irqrestore(&guc->contexts_lock, flags);
> +}
> +
> +static int __guc_action_register_context(struct intel_guc *guc,
> +					 u32 guc_id,
> +					 u32 offset)
> +{
> +	u32 action[] = {
> +		INTEL_GUC_ACTION_REGISTER_CONTEXT,
> +		guc_id,
> +		offset,
> +	};
> +
> +	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
> +}
> +
> +static int register_context(struct intel_context *ce)
> +{
> +	struct intel_guc *guc = ce_to_guc(ce);
> +	u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
> +		ce->guc_id * sizeof(struct guc_lrc_desc);
> +
> +	return __guc_action_register_context(guc, ce->guc_id, offset);
> +}
> +
> +static int __guc_action_deregister_context(struct intel_guc *guc,
> +					   u32 guc_id)
> +{
> +	u32 action[] = {
> +		INTEL_GUC_ACTION_DEREGISTER_CONTEXT,
> +		guc_id,
> +	};
> +
> +	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
> +}
> +
> +static int deregister_context(struct intel_context *ce, u32 guc_id)
> +{
> +	struct intel_guc *guc = ce_to_guc(ce);
> +
> +	return __guc_action_deregister_context(guc, guc_id);
> +}
> +
> +static intel_engine_mask_t adjust_engine_mask(u8 class, intel_engine_mask_t mask)
> +{
> +	switch (class) {
> +	case RENDER_CLASS:
> +		return mask >> RCS0;
> +	case VIDEO_ENHANCEMENT_CLASS:
> +		return mask >> VECS0;
> +	case VIDEO_DECODE_CLASS:
> +		return mask >> VCS0;
> +	case COPY_ENGINE_CLASS:
> +		return mask >> BCS0;
> +	default:
> +		GEM_BUG_ON("Invalid Class");
> +		return 0;
> +	}
> +}
> +
> +static void guc_context_policy_init(struct intel_engine_cs *engine,
> +				    struct guc_lrc_desc *desc)
> +{
> +	desc->policy_flags = 0;
> +
> +	desc->execution_quantum = CONTEXT_POLICY_DEFAULT_EXECUTION_QUANTUM_US;
> +	desc->preemption_timeout = CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US;
> +}
> +
> +static int guc_lrc_desc_pin(struct intel_context *ce)
> +{
> +	struct intel_runtime_pm *runtime_pm =
> +		&ce->engine->gt->i915->runtime_pm;
> +	struct intel_engine_cs *engine = ce->engine;
> +	struct intel_guc *guc = &engine->gt->uc.guc;
> +	u32 desc_idx = ce->guc_id;
> +	struct guc_lrc_desc *desc;
> +	bool context_registered;
> +	intel_wakeref_t wakeref;
> +	int ret = 0;
> +
> +	GEM_BUG_ON(!engine->mask);
> +
> +	/*
> +	 * Ensure LRC + CT vmas are is same region as write barrier is done
> +	 * based on CT vma region.
> +	 */
> +	GEM_BUG_ON(i915_gem_object_is_lmem(guc->ct.vma->obj) !=
> +		   i915_gem_object_is_lmem(ce->ring->vma->obj));
> +
> +	context_registered = lrc_desc_registered(guc, desc_idx);
> +
> +	reset_lrc_desc(guc, desc_idx);
> +	set_lrc_desc_registered(guc, desc_idx, ce);
> +
> +	desc = __get_lrc_desc(guc, desc_idx);
> +	desc->engine_class = engine_class_to_guc_class(engine->class);
> +	desc->engine_submit_mask = adjust_engine_mask(engine->class,
> +						      engine->mask);
> +	desc->hw_context_desc = ce->lrc.lrca;
> +	desc->priority = GUC_CLIENT_PRIORITY_KMD_NORMAL;
> +	desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
> +	guc_context_policy_init(engine, desc);
> +	init_sched_state(ce);
> +
> +	/*
> +	 * The context_lookup xarray is used to determine if the hardware
> +	 * context is currently registered. There are two cases in which it
> +	 * could be regisgered either the guc_id has been stole from from
> +	 * another context or the lrc descriptor address of this context has
> +	 * changed. In either case the context needs to be deregistered with the
> +	 * GuC before registering this context.
> +	 */
> +	if (context_registered) {
> +		set_context_wait_for_deregister_to_register(ce);
> +		intel_context_get(ce);
> +
> +		/*
> +		 * If stealing the guc_id, this ce has the same guc_id as the
> +		 * context whos guc_id was stole.
> +		 */
> +		with_intel_runtime_pm(runtime_pm, wakeref)
> +			ret = deregister_context(ce, ce->guc_id);
> +	} else {
> +		with_intel_runtime_pm(runtime_pm, wakeref)
> +			ret = register_context(ce);
> +	}
> +
> +	return ret;
>  }
>  
>  static int guc_context_pre_pin(struct intel_context *ce,
> @@ -443,36 +813,137 @@ static int guc_context_pre_pin(struct intel_context *ce,
>  
>  static int guc_context_pin(struct intel_context *ce, void *vaddr)
>  {
> +	if (i915_ggtt_offset(ce->state) !=
> +	    (ce->lrc.lrca & CTX_GTT_ADDRESS_MASK))
> +		set_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
> +
>  	return lrc_pin(ce, ce->engine, vaddr);
>  }
>  
> +static void guc_context_unpin(struct intel_context *ce)
> +{
> +	unpin_guc_id(ce_to_guc(ce), ce);
> +	lrc_unpin(ce);
> +}
> +
> +static void guc_context_post_unpin(struct intel_context *ce)
> +{
> +	lrc_post_unpin(ce);
> +}
> +
> +static inline void guc_lrc_desc_unpin(struct intel_context *ce)
> +{
> +	struct intel_engine_cs *engine = ce->engine;
> +	struct intel_guc *guc = &engine->gt->uc.guc;
> +	unsigned long flags;
> +
> +	GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id));
> +	GEM_BUG_ON(ce != __get_context(guc, ce->guc_id));
> +
> +	spin_lock_irqsave(&ce->guc_state.lock, flags);
> +	set_context_destroyed(ce);
> +	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> +
> +	deregister_context(ce, ce->guc_id);
> +}
> +
> +static void guc_context_destroy(struct kref *kref)
> +{
> +	struct intel_context *ce = container_of(kref, typeof(*ce), ref);
> +	struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm;
> +	struct intel_guc *guc = &ce->engine->gt->uc.guc;
> +	intel_wakeref_t wakeref;
> +	unsigned long flags;
> +
> +	/*
> +	 * If the guc_id is invalid this context has been stolen and we can free
> +	 * it immediately. Also can be freed immediately if the context is not
> +	 * registered with the GuC.
> +	 */
> +	if (context_guc_id_invalid(ce) ||
> +	    !lrc_desc_registered(guc, ce->guc_id)) {
> +		release_guc_id(guc, ce);
> +		lrc_destroy(kref);
> +		return;
> +	}
> +
> +	/*
> +	 * We have to acquire the context spinlock and check guc_id again, if it
> +	 * is valid it hasn't been stolen and needs to be deregistered. We
> +	 * delete this context from the list of unpinned guc_ids available to
> +	 * stole to seal a race with guc_lrc_desc_pin(). When the G2H CTB
> +	 * returns indicating this context has been deregistered the guc_id is
> +	 * returned to the pool of available guc_ids.
> +	 */
> +	spin_lock_irqsave(&guc->contexts_lock, flags);
> +	if (context_guc_id_invalid(ce)) {
> +		__release_guc_id(guc, ce);
> +		spin_unlock_irqrestore(&guc->contexts_lock, flags);
> +		lrc_destroy(kref);
> +		return;
> +	}
> +
> +	if (!list_empty(&ce->guc_id_link))
> +		list_del_init(&ce->guc_id_link);
> +	spin_unlock_irqrestore(&guc->contexts_lock, flags);
> +
> +	/*
> +	 * We defer GuC context deregistration until the context is destroyed
> +	 * in order to save on CTBs. With this optimization ideally we only need
> +	 * 1 CTB to register the context during the first pin and 1 CTB to
> +	 * deregister the context when the context is destroyed. Without this
> +	 * optimization, a CTB would be needed every pin & unpin.
> +	 *
> +	 * XXX: Need to acqiure the runtime wakeref as this can be triggered
> +	 * from context_free_worker when not runtime wakeref is held.
> +	 * guc_lrc_desc_unpin requires the runtime as a GuC register is written
> +	 * in H2G CTB to deregister the context. A future patch may defer this
> +	 * H2G CTB if the runtime wakeref is zero.
> +	 */
> +	with_intel_runtime_pm(runtime_pm, wakeref)
> +		guc_lrc_desc_unpin(ce);
> +}
> +
> +static int guc_context_alloc(struct intel_context *ce)
> +{
> +	return lrc_alloc(ce, ce->engine);
> +}
> +
>  static const struct intel_context_ops guc_context_ops = {
>  	.alloc = guc_context_alloc,
>  
>  	.pre_pin = guc_context_pre_pin,
>  	.pin = guc_context_pin,
> -	.unpin = lrc_unpin,
> -	.post_unpin = lrc_post_unpin,
> +	.unpin = guc_context_unpin,
> +	.post_unpin = guc_context_post_unpin,
>  
>  	.enter = intel_context_enter_engine,
>  	.exit = intel_context_exit_engine,
>  
>  	.reset = lrc_reset,
> -	.destroy = lrc_destroy,
> +	.destroy = guc_context_destroy,
>  };
>  
> -static int guc_request_alloc(struct i915_request *request)
> +static bool context_needs_register(struct intel_context *ce, bool new_guc_id)
>  {
> +	return new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
> +		!lrc_desc_registered(ce_to_guc(ce), ce->guc_id);
> +}
> +
> +static int guc_request_alloc(struct i915_request *rq)
> +{
> +	struct intel_context *ce = rq->context;
> +	struct intel_guc *guc = ce_to_guc(ce);
>  	int ret;
>  
> -	GEM_BUG_ON(!intel_context_is_pinned(request->context));
> +	GEM_BUG_ON(!intel_context_is_pinned(rq->context));
>  
>  	/*
>  	 * Flush enough space to reduce the likelihood of waiting after
>  	 * we start building the request - in which case we will just
>  	 * have to repeat work.
>  	 */
> -	request->reserved_space += GUC_REQUEST_SIZE;
> +	rq->reserved_space += GUC_REQUEST_SIZE;
>  
>  	/*
>  	 * Note that after this point, we have committed to using
> @@ -483,56 +954,47 @@ static int guc_request_alloc(struct i915_request *request)
>  	 */
>  
>  	/* Unconditionally invalidate GPU caches and TLBs. */
> -	ret = request->engine->emit_flush(request, EMIT_INVALIDATE);
> +	ret = rq->engine->emit_flush(rq, EMIT_INVALIDATE);
>  	if (ret)
>  		return ret;
>  
> -	request->reserved_space -= GUC_REQUEST_SIZE;
> -	return 0;
> -}
> -
> -static inline void queue_request(struct i915_sched_engine *sched_engine,
> -				 struct i915_request *rq,
> -				 int prio)
> -{
> -	GEM_BUG_ON(!list_empty(&rq->sched.link));
> -	list_add_tail(&rq->sched.link,
> -		      i915_sched_lookup_priolist(sched_engine, prio));
> -	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> -}
> -
> -static int guc_bypass_tasklet_submit(struct intel_guc *guc,
> -				     struct i915_request *rq)
> -{
> -	int ret;
> -
> -	__i915_request_submit(rq);
> +	rq->reserved_space -= GUC_REQUEST_SIZE;
>  
> -	trace_i915_request_in(rq, 0);
> -
> -	guc_set_lrc_tail(rq);
> -	ret = guc_add_request(guc, rq);
> -	if (ret == -EBUSY)
> -		guc->stalled_request = rq;
> -
> -	return ret;
> -}
> -
> -static void guc_submit_request(struct i915_request *rq)
> -{
> -	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
> -	struct intel_guc *guc = &rq->engine->gt->uc.guc;
> -	unsigned long flags;
> +	/*
> +	 * Call pin_guc_id here rather than in the pinning step as with
> +	 * dma_resv, contexts can be repeatedly pinned / unpinned trashing the
> +	 * guc_ids and creating horrible race conditions. This is especially bad
> +	 * when guc_ids are being stolen due to over subscription. By the time
> +	 * this function is reached, it is guaranteed that the guc_id will be
> +	 * persistent until the generated request is retired. Thus, sealing these
> +	 * race conditions. It is still safe to fail here if guc_ids are
> +	 * exhausted and return -EAGAIN to the user indicating that they can try
> +	 * again in the future.
> +	 *
> +	 * There is no need for a lock here as the timeline mutex ensures at
> +	 * most one context can be executing this code path at once. The
> +	 * guc_id_ref is incremented once for every request in flight and
> +	 * decremented on each retire. When it is zero, a lock around the
> +	 * increment (in pin_guc_id) is needed to seal a race with unpin_guc_id.
> +	 */
> +	if (atomic_add_unless(&ce->guc_id_ref, 1, 0))
> +		return 0;
>  
> -	/* Will be called from irq-context when using foreign fences. */
> -	spin_lock_irqsave(&sched_engine->lock, flags);
> +	ret = pin_guc_id(guc, ce);	/* returns 1 if new guc_id assigned */
> +	if (unlikely(ret < 0))
> +		return ret;;
> +	if (context_needs_register(ce, !!ret)) {
> +		ret = guc_lrc_desc_pin(ce);
> +		if (unlikely(ret)) {	/* unwind */
> +			atomic_dec(&ce->guc_id_ref);
> +			unpin_guc_id(guc, ce);
> +			return ret;
> +		}
> +	}
>  
> -	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
> -		queue_request(sched_engine, rq, rq_prio(rq));
> -	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
> -		tasklet_hi_schedule(&sched_engine->tasklet);
> +	clear_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
>  
> -	spin_unlock_irqrestore(&sched_engine->lock, flags);
> +	return 0;
>  }
>  
>  static void sanitize_hwsp(struct intel_engine_cs *engine)
> @@ -606,6 +1068,46 @@ static void guc_set_default_submission(struct intel_engine_cs *engine)
>  	engine->submit_request = guc_submit_request;
>  }
>  
> +static inline void guc_kernel_context_pin(struct intel_guc *guc,
> +					  struct intel_context *ce)
> +{
> +	if (context_guc_id_invalid(ce))
> +		pin_guc_id(guc, ce);
> +	guc_lrc_desc_pin(ce);
> +}
> +
> +static inline void guc_init_lrc_mapping(struct intel_guc *guc)
> +{
> +	struct intel_gt *gt = guc_to_gt(guc);
> +	struct intel_engine_cs *engine;
> +	enum intel_engine_id id;
> +
> +	/* make sure all descriptors are clean... */
> +	xa_destroy(&guc->context_lookup);
> +
> +	/*
> +	 * Some contexts might have been pinned before we enabled GuC
> +	 * submission, so we need to add them to the GuC bookeeping.
> +	 * Also, after a reset the GuC we want to make sure that the information
> +	 * shared with GuC is properly reset. The kernel lrcs are not attached
> +	 * to the gem_context, so they need to be added separately.
> +	 *
> +	 * Note: we purposely do not check the error return of
> +	 * guc_lrc_desc_pin, because that function can only fail in two cases.
> +	 * One, if there aren't enough free IDs, but we're guaranteed to have
> +	 * enough here (we're either only pinning a handful of lrc on first boot
> +	 * or we're re-pinning lrcs that were already pinned before the reset).
> +	 * Two, if the GuC has died and CTBs can't make forward progress.
> +	 * Presumably, the GuC should be alive as this function is called on
> +	 * driver load or after a reset. Even if it is dead, another full GPU
> +	 * reset will be triggered and this function would be called again.
> +	 */
> +
> +	for_each_engine(engine, gt, id)
> +		if (engine->kernel_context)
> +			guc_kernel_context_pin(guc, engine->kernel_context);
> +}
> +
>  static void guc_release(struct intel_engine_cs *engine)
>  {
>  	engine->sanitize = NULL; /* no longer in control, nothing to sanitize */
> @@ -718,6 +1220,7 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
>  
>  void intel_guc_submission_enable(struct intel_guc *guc)
>  {
> +	guc_init_lrc_mapping(guc);
>  }
>  
>  void intel_guc_submission_disable(struct intel_guc *guc)
> @@ -743,3 +1246,62 @@ void intel_guc_submission_init_early(struct intel_guc *guc)
>  {
>  	guc->submission_selected = __guc_submission_selected(guc);
>  }
> +
> +static inline struct intel_context *
> +g2h_context_lookup(struct intel_guc *guc, u32 desc_idx)
> +{
> +	struct intel_context *ce;
> +
> +	if (unlikely(desc_idx >= GUC_MAX_LRC_DESCRIPTORS)) {
> +		drm_dbg(&guc_to_gt(guc)->i915->drm,
> +			"Invalid desc_idx %u", desc_idx);

just debug? why not an (fatal) error ?

> +		return NULL;
> +	}
> +
> +	ce = __get_context(guc, desc_idx);
> +	if (unlikely(!ce)) {
> +		drm_dbg(&guc_to_gt(guc)->i915->drm,
> +			"Context is NULL, desc_idx %u", desc_idx);

not an error ?

> +		return NULL;
> +	}
> +
> +	return ce;
> +}
> +
> +int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
> +					  const u32 *msg,
> +					  u32 len)
> +{
> +	struct intel_context *ce;
> +	u32 desc_idx = msg[0];
> +
> +	if (unlikely(len < 1)) {
> +		drm_dbg(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);
> +		return -EPROTO;
> +	}
> +
> +	ce = g2h_context_lookup(guc, desc_idx);
> +	if (unlikely(!ce))
> +		return -EPROTO;
> +
> +	if (context_wait_for_deregister_to_register(ce)) {
> +		struct intel_runtime_pm *runtime_pm =
> +			&ce->engine->gt->i915->runtime_pm;
> +		intel_wakeref_t wakeref;
> +
> +		/*
> +		 * Previous owner of this guc_id has been deregistered, now safe
> +		 * register this context.
> +		 */
> +		with_intel_runtime_pm(runtime_pm, wakeref)
> +			register_context(ce);
> +		clr_context_wait_for_deregister_to_register(ce);
> +		intel_context_put(ce);
> +	} else if (context_destroyed(ce)) {
> +		/* Context has been destroyed */
> +		release_guc_id(guc, ce);
> +		lrc_destroy(&ce->ref);
> +	}
> +
> +	return 0;
> +}
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index c857fafb8a30..a9c2242d61a2 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -4142,6 +4142,7 @@ enum {
>  	FAULT_AND_CONTINUE /* Unsupported */
>  };
>  
> +#define CTX_GTT_ADDRESS_MASK GENMASK(31, 12)
>  #define GEN8_CTX_VALID (1 << 0)
>  #define GEN8_CTX_FORCE_PD_RESTORE (1 << 1)
>  #define GEN8_CTX_FORCE_RESTORE (1 << 2)
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index c5989c0b83d3..9dad3df5eaf7 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -419,6 +419,7 @@ bool i915_request_retire(struct i915_request *rq)
>  	 */
>  	if (!list_empty(&rq->sched.link))
>  		remove_from_engine(rq);
> +	atomic_dec(&rq->context->guc_id_ref);
>  	GEM_BUG_ON(!llist_empty(&rq->execute_cb));
>  
>  	__list_del_entry(&rq->link); /* poison neither prev/next (RCU walks) */
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [Intel-gfx] [PATCH 10/47] drm/i915/guc: Add lrc descriptor context lookup array
  2021-06-25 13:17   ` Michal Wajdeczko
@ 2021-06-25 17:26     ` Matthew Brost
  2021-06-29 21:20       ` John Harrison
  0 siblings, 1 reply; 170+ messages in thread
From: Matthew Brost @ 2021-06-25 17:26 UTC (permalink / raw)
  To: Michal Wajdeczko; +Cc: intel-gfx, dri-devel

On Fri, Jun 25, 2021 at 03:17:51PM +0200, Michal Wajdeczko wrote:
> 
> 
> On 24.06.2021 09:04, Matthew Brost wrote:
> > Add lrc descriptor context lookup array which can resolve the
> > intel_context from the lrc descriptor index. In addition to lookup, it
> > can determine in the lrc descriptor context is currently registered with
> > the GuC by checking if an entry for a descriptor index is present.
> > Future patches in the series will make use of this array.
> 
> s/lrc/LRC
> 

I guess? lrc and LRC are used interchangeably throughout the current
code base. 

> > 
> > Cc: John Harrison <john.c.harrison@intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >  drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  5 +++
> >  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 32 +++++++++++++++++--
> >  2 files changed, 35 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > index b28fa54214f2..2313d9fc087b 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > @@ -6,6 +6,8 @@
> >  #ifndef _INTEL_GUC_H_
> >  #define _INTEL_GUC_H_
> >  
> > +#include "linux/xarray.h"
> 
> #include <linux/xarray.h>
> 

Yep.

> > +
> >  #include "intel_uncore.h"
> >  #include "intel_guc_fw.h"
> >  #include "intel_guc_fwif.h"
> > @@ -46,6 +48,9 @@ struct intel_guc {
> >  	struct i915_vma *lrc_desc_pool;
> >  	void *lrc_desc_pool_vaddr;
> >  
> > +	/* guc_id to intel_context lookup */
> > +	struct xarray context_lookup;
> > +
> >  	/* Control params for fw initialization */
> >  	u32 params[GUC_CTL_MAX_DWORDS];
> 
> btw, IIRC there was idea to move most struct definitions to
> intel_guc_types.h, is this still a plan ?
> 

I don't ever recall discussing this but we can certainly do this. For
what it is worth we do introduce intel_guc_submission_types.h a bit
later. I'll make a note about intel_guc_types.h though.

Matt

> >  
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index a366890fb840..23a94a896a0b 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -65,8 +65,6 @@ static inline struct i915_priolist *to_priolist(struct rb_node *rb)
> >  	return rb_entry(rb, struct i915_priolist, node);
> >  }
> >  
> > -/* Future patches will use this function */
> > -__attribute__ ((unused))
> >  static struct guc_lrc_desc *__get_lrc_desc(struct intel_guc *guc, u32 index)
> >  {
> >  	struct guc_lrc_desc *base = guc->lrc_desc_pool_vaddr;
> > @@ -76,6 +74,15 @@ static struct guc_lrc_desc *__get_lrc_desc(struct intel_guc *guc, u32 index)
> >  	return &base[index];
> >  }
> >  
> > +static inline struct intel_context *__get_context(struct intel_guc *guc, u32 id)
> > +{
> > +	struct intel_context *ce = xa_load(&guc->context_lookup, id);
> > +
> > +	GEM_BUG_ON(id >= GUC_MAX_LRC_DESCRIPTORS);
> > +
> > +	return ce;
> > +}
> > +
> >  static int guc_lrc_desc_pool_create(struct intel_guc *guc)
> >  {
> >  	u32 size;
> > @@ -96,6 +103,25 @@ static void guc_lrc_desc_pool_destroy(struct intel_guc *guc)
> >  	i915_vma_unpin_and_release(&guc->lrc_desc_pool, I915_VMA_RELEASE_MAP);
> >  }
> >  
> > +static inline void reset_lrc_desc(struct intel_guc *guc, u32 id)
> > +{
> > +	struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
> > +
> > +	memset(desc, 0, sizeof(*desc));
> > +	xa_erase_irq(&guc->context_lookup, id);
> > +}
> > +
> > +static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
> > +{
> > +	return __get_context(guc, id);
> > +}
> > +
> > +static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
> > +					   struct intel_context *ce)
> > +{
> > +	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
> > +}
> > +
> >  static void guc_add_request(struct intel_guc *guc, struct i915_request *rq)
> >  {
> >  	/* Leaving stub as this function will be used in future patches */
> > @@ -400,6 +426,8 @@ int intel_guc_submission_init(struct intel_guc *guc)
> >  	 */
> >  	GEM_BUG_ON(!guc->lrc_desc_pool);
> >  
> > +	xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ);
> > +
> >  	return 0;
> >  }
> >  
> > 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [Intel-gfx] [PATCH 13/47] drm/i915/guc: Implement GuC context operations for new inteface
  2021-06-25 13:25   ` Michal Wajdeczko
@ 2021-06-25 17:46     ` Matthew Brost
  0 siblings, 0 replies; 170+ messages in thread
From: Matthew Brost @ 2021-06-25 17:46 UTC (permalink / raw)
  To: Michal Wajdeczko; +Cc: intel-gfx, dri-devel

On Fri, Jun 25, 2021 at 03:25:13PM +0200, Michal Wajdeczko wrote:
> 
> 
> On 24.06.2021 09:04, Matthew Brost wrote:
> > Implement GuC context operations which includes GuC specific operations
> > alloc, pin, unpin, and destroy.
> > 
> > Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >  drivers/gpu/drm/i915/gt/intel_context.c       |   5 +
> >  drivers/gpu/drm/i915/gt/intel_context_types.h |  22 +-
> >  drivers/gpu/drm/i915/gt/intel_lrc_reg.h       |   1 -
> >  drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  34 +
> >  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |   4 +
> >  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 664 ++++++++++++++++--
> >  drivers/gpu/drm/i915/i915_reg.h               |   1 +
> >  drivers/gpu/drm/i915/i915_request.c           |   1 +
> >  8 files changed, 677 insertions(+), 55 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
> > index 4033184f13b9..2b68af16222c 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_context.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> > @@ -383,6 +383,11 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
> >  
> >  	mutex_init(&ce->pin_mutex);
> >  
> > +	spin_lock_init(&ce->guc_state.lock);
> > +
> > +	ce->guc_id = GUC_INVALID_LRC_ID;
> > +	INIT_LIST_HEAD(&ce->guc_id_link);
> > +
> >  	i915_active_init(&ce->active,
> >  			 __intel_context_active, __intel_context_retire, 0);
> >  }
> > diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
> > index bb6fef7eae52..ce7c69b34cd1 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> > @@ -95,6 +95,7 @@ struct intel_context {
> >  #define CONTEXT_BANNED			6
> >  #define CONTEXT_FORCE_SINGLE_SUBMISSION	7
> >  #define CONTEXT_NOPREEMPT		8
> > +#define CONTEXT_LRCA_DIRTY		9
> >  
> >  	struct {
> >  		u64 timeout_us;
> > @@ -137,14 +138,29 @@ struct intel_context {
> >  
> >  	u8 wa_bb_page; /* if set, page num reserved for context workarounds */
> >  
> > +	struct {
> > +		/** lock: protects everything in guc_state */
> > +		spinlock_t lock;
> > +		/**
> > +		 * sched_state: scheduling state of this context using GuC
> > +		 * submission
> > +		 */
> > +		u8 sched_state;
> > +	} guc_state;
> > +
> >  	/* GuC scheduling state that does not require a lock. */
> >  	atomic_t guc_sched_state_no_lock;
> >  
> > +	/* GuC lrc descriptor ID */
> > +	u16 guc_id;
> > +
> > +	/* GuC lrc descriptor reference count */
> > +	atomic_t guc_id_ref;
> > +
> >  	/*
> > -	 * GuC lrc descriptor ID - Not assigned in this patch but future patches
> > -	 * in the series will.
> > +	 * GuC ID link - in list when unpinned but guc_id still valid in GuC
> >  	 */
> > -	u16 guc_id;
> > +	struct list_head guc_id_link;
> 
> some fields are being added with kerneldoc, some without
> what's the rule ?
> 

Yea, idk. I think we need to scrub all the structures in the driver and
add kernel doc everywhere. IMO not a blocker too as I think all the
structures are going to be reworked with OO concepts after the GuC code
lands before moving to DRM scheduler. That would be logical time to
update all the kernel doc too.

> >  };
> >  
> >  #endif /* __INTEL_CONTEXT_TYPES__ */
> > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
> > index 41e5350a7a05..49d4857ad9b7 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
> > @@ -87,7 +87,6 @@
> >  #define GEN11_CSB_WRITE_PTR_MASK	(GEN11_CSB_PTR_MASK << 0)
> >  
> >  #define MAX_CONTEXT_HW_ID	(1 << 21) /* exclusive */
> > -#define MAX_GUC_CONTEXT_HW_ID	(1 << 20) /* exclusive */
> >  #define GEN11_MAX_CONTEXT_HW_ID	(1 << 11) /* exclusive */
> >  /* in Gen12 ID 0x7FF is reserved to indicate idle */
> >  #define GEN12_MAX_CONTEXT_HW_ID	(GEN11_MAX_CONTEXT_HW_ID - 1)
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > index 9ba8219475b2..d44316dc914b 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > @@ -44,6 +44,14 @@ struct intel_guc {
> >  		void (*disable)(struct intel_guc *guc);
> >  	} interrupts;
> >  
> > +	/*
> > +	 * contexts_lock protects the pool of free guc ids and a linked list of
> > +	 * guc ids available to be stolen
> > +	 */
> > +	spinlock_t contexts_lock;
> > +	struct ida guc_ids;
> > +	struct list_head guc_id_list;
> > +
> >  	bool submission_selected;
> >  
> >  	struct i915_vma *ads_vma;
> > @@ -102,6 +110,29 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len,
> >  				 response_buf, response_buf_size, 0);
> >  }
> >  
> > +static inline int intel_guc_send_busy_loop(struct intel_guc* guc,
> > +					   const u32 *action,
> > +					   u32 len,
> > +					   bool loop)
> > +{
> > +	int err;
> > +
> > +	/* No sleeping with spin locks, just busy loop */
> > +	might_sleep_if(loop && (!in_atomic() && !irqs_disabled()));
> > +
> > +retry:
> > +	err = intel_guc_send_nb(guc, action, len);
> > +	if (unlikely(err == -EBUSY && loop)) {
> > +		if (likely(!in_atomic() && !irqs_disabled()))
> > +			cond_resched();
> > +		else
> > +			cpu_relax();
> > +		goto retry;
> > +	}
> > +
> > +	return err;
> > +}
> > +
> >  static inline void intel_guc_to_host_event_handler(struct intel_guc *guc)
> >  {
> >  	intel_guc_ct_event_handler(&guc->ct);
> > @@ -203,6 +234,9 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask)
> >  int intel_guc_reset_engine(struct intel_guc *guc,
> >  			   struct intel_engine_cs *engine);
> >  
> > +int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
> > +					  const u32 *msg, u32 len);
> > +
> >  void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p);
> >  
> >  #endif
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > index 8e0ed7d8feb3..42a7daef2ff6 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > @@ -901,6 +901,10 @@ static int ct_process_request(struct intel_guc_ct *ct, struct ct_incoming_msg *r
> >  	case INTEL_GUC_ACTION_DEFAULT:
> >  		ret = intel_guc_to_host_process_recv_msg(guc, payload, len);
> >  		break;
> > +	case INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE:
> > +		ret = intel_guc_deregister_done_process_msg(guc, payload,
> > +							    len);
> > +		break;
> >  	default:
> >  		ret = -EOPNOTSUPP;
> >  		break;
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index 38aff83ee9fa..d39579ac2faa 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -13,7 +13,9 @@
> >  #include "gt/intel_gt.h"
> >  #include "gt/intel_gt_irq.h"
> >  #include "gt/intel_gt_pm.h"
> > +#include "gt/intel_gt_requests.h"
> >  #include "gt/intel_lrc.h"
> > +#include "gt/intel_lrc_reg.h"
> >  #include "gt/intel_mocs.h"
> >  #include "gt/intel_ring.h"
> >  
> > @@ -85,6 +87,73 @@ static inline void clr_context_enabled(struct intel_context *ce)
> >  		   &ce->guc_sched_state_no_lock);
> >  }
> >  
> > +/*
> > + * Below is a set of functions which control the GuC scheduling state which
> > + * require a lock, aside from the special case where the functions are called
> > + * from guc_lrc_desc_pin(). In that case it isn't possible for any other code
> > + * path to be executing on the context.
> > + */
> > +#define SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER	BIT(0)
> > +#define SCHED_STATE_DESTROYED				BIT(1)
> > +static inline void init_sched_state(struct intel_context *ce)
> > +{
> > +	/* Only should be called from guc_lrc_desc_pin() */
> > +	atomic_set(&ce->guc_sched_state_no_lock, 0);
> > +	ce->guc_state.sched_state = 0;
> > +}
> > +
> > +static inline bool
> > +context_wait_for_deregister_to_register(struct intel_context *ce)
> > +{
> > +	return (ce->guc_state.sched_state &
> > +		SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER);
> > +}
> > +
> > +static inline void
> > +set_context_wait_for_deregister_to_register(struct intel_context *ce)
> > +{
> > +	/* Only should be called from guc_lrc_desc_pin() */
> > +	ce->guc_state.sched_state |=
> > +		SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER;
> > +}
> > +
> > +static inline void
> > +clr_context_wait_for_deregister_to_register(struct intel_context *ce)
> > +{
> > +	lockdep_assert_held(&ce->guc_state.lock);
> > +	ce->guc_state.sched_state =
> > +		(ce->guc_state.sched_state &
> > +		 ~SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER);
> > +}
> > +
> > +static inline bool
> > +context_destroyed(struct intel_context *ce)
> > +{
> > +	return (ce->guc_state.sched_state & SCHED_STATE_DESTROYED);
> > +}
> > +
> > +static inline void
> > +set_context_destroyed(struct intel_context *ce)
> > +{
> > +	lockdep_assert_held(&ce->guc_state.lock);
> > +	ce->guc_state.sched_state |= SCHED_STATE_DESTROYED;
> > +}
> > +
> > +static inline bool context_guc_id_invalid(struct intel_context *ce)
> > +{
> > +	return (ce->guc_id == GUC_INVALID_LRC_ID);
> > +}
> > +
> > +static inline void set_context_guc_id_invalid(struct intel_context *ce)
> > +{
> > +	ce->guc_id = GUC_INVALID_LRC_ID;
> > +}
> > +
> > +static inline struct intel_guc *ce_to_guc(struct intel_context *ce)
> > +{
> > +	return &ce->engine->gt->uc.guc;
> > +}
> > +
> >  static inline struct i915_priolist *to_priolist(struct rb_node *rb)
> >  {
> >  	return rb_entry(rb, struct i915_priolist, node);
> > @@ -155,6 +224,9 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
> >  	int len = 0;
> >  	bool enabled = context_enabled(ce);
> >  
> > +	GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
> > +	GEM_BUG_ON(context_guc_id_invalid(ce));
> > +
> >  	if (!enabled) {
> >  		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
> >  		action[len++] = ce->guc_id;
> > @@ -417,6 +489,10 @@ int intel_guc_submission_init(struct intel_guc *guc)
> >  
> >  	xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ);
> >  
> > +	spin_lock_init(&guc->contexts_lock);
> > +	INIT_LIST_HEAD(&guc->guc_id_list);
> > +	ida_init(&guc->guc_ids);
> > +
> >  	return 0;
> >  }
> >  
> > @@ -429,9 +505,303 @@ void intel_guc_submission_fini(struct intel_guc *guc)
> >  	i915_sched_engine_put(guc->sched_engine);
> >  }
> >  
> > -static int guc_context_alloc(struct intel_context *ce)
> > +static inline void queue_request(struct i915_sched_engine *sched_engine,
> > +				 struct i915_request *rq,
> > +				 int prio)
> >  {
> > -	return lrc_alloc(ce, ce->engine);
> > +	GEM_BUG_ON(!list_empty(&rq->sched.link));
> > +	list_add_tail(&rq->sched.link,
> > +		      i915_sched_lookup_priolist(sched_engine, prio));
> > +	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> > +}
> > +
> > +static int guc_bypass_tasklet_submit(struct intel_guc *guc,
> > +				     struct i915_request *rq)
> > +{
> > +	int ret;
> > +
> > +	__i915_request_submit(rq);
> > +
> > +	trace_i915_request_in(rq, 0);
> > +
> > +	guc_set_lrc_tail(rq);
> > +	ret = guc_add_request(guc, rq);
> > +	if (ret == -EBUSY)
> > +		guc->stalled_request = rq;
> > +
> > +	return ret;
> > +}
> > +
> > +static void guc_submit_request(struct i915_request *rq)
> > +{
> > +	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
> > +	struct intel_guc *guc = &rq->engine->gt->uc.guc;
> > +	unsigned long flags;
> > +
> > +	/* Will be called from irq-context when using foreign fences. */
> > +	spin_lock_irqsave(&sched_engine->lock, flags);
> > +
> > +	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
> > +		queue_request(sched_engine, rq, rq_prio(rq));
> > +	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
> > +		tasklet_hi_schedule(&sched_engine->tasklet);
> > +
> > +	spin_unlock_irqrestore(&sched_engine->lock, flags);
> > +}
> > +
> > +#define GUC_ID_START	64	/* First 64 guc_ids reserved */
> > +static int new_guc_id(struct intel_guc *guc)
> > +{
> > +	return ida_simple_get(&guc->guc_ids, GUC_ID_START,
> > +			      GUC_MAX_LRC_DESCRIPTORS, GFP_KERNEL |
> > +			      __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
> > +}
> > +
> > +static void __release_guc_id(struct intel_guc *guc, struct intel_context *ce)
> > +{
> > +	if (!context_guc_id_invalid(ce)) {
> > +		ida_simple_remove(&guc->guc_ids, ce->guc_id);
> > +		reset_lrc_desc(guc, ce->guc_id);
> > +		set_context_guc_id_invalid(ce);
> > +	}
> > +	if (!list_empty(&ce->guc_id_link))
> > +		list_del_init(&ce->guc_id_link);
> > +}
> > +
> > +static void release_guc_id(struct intel_guc *guc, struct intel_context *ce)
> > +{
> > +	unsigned long flags;
> > +
> > +	spin_lock_irqsave(&guc->contexts_lock, flags);
> > +	__release_guc_id(guc, ce);
> > +	spin_unlock_irqrestore(&guc->contexts_lock, flags);
> > +}
> > +
> > +static int steal_guc_id(struct intel_guc *guc)
> > +{
> > +	struct intel_context *ce;
> > +	int guc_id;
> > +
> > +	if (!list_empty(&guc->guc_id_list)) {
> > +		ce = list_first_entry(&guc->guc_id_list,
> > +				      struct intel_context,
> > +				      guc_id_link);
> > +
> > +		GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
> > +		GEM_BUG_ON(context_guc_id_invalid(ce));
> > +
> > +		list_del_init(&ce->guc_id_link);
> > +		guc_id = ce->guc_id;
> > +		set_context_guc_id_invalid(ce);
> > +		return guc_id;
> > +	} else {
> > +		return -EAGAIN;
> > +	}
> > +}
> > +
> > +static int assign_guc_id(struct intel_guc *guc, u16 *out)
> > +{
> > +	int ret;
> > +
> > +	ret = new_guc_id(guc);
> > +	if (unlikely(ret < 0)) {
> > +		ret = steal_guc_id(guc);
> > +		if (ret < 0)
> > +			return ret;
> > +	}
> > +
> > +	*out = ret;
> > +	return 0;
> > +}
> > +
> > +#define PIN_GUC_ID_TRIES	4
> > +static int pin_guc_id(struct intel_guc *guc, struct intel_context *ce)
> > +{
> > +	int ret = 0;
> > +	unsigned long flags, tries = PIN_GUC_ID_TRIES;
> > +
> > +	GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
> > +
> > +try_again:
> > +	spin_lock_irqsave(&guc->contexts_lock, flags);
> > +
> > +	if (context_guc_id_invalid(ce)) {
> > +		ret = assign_guc_id(guc, &ce->guc_id);
> > +		if (ret)
> > +			goto out_unlock;
> > +		ret = 1;	// Indidcates newly assigned HW context
> 
> C++ style comment
> 

Yep, will fix.

> > +	}
> > +	if (!list_empty(&ce->guc_id_link))
> > +		list_del_init(&ce->guc_id_link);
> > +	atomic_inc(&ce->guc_id_ref);
> > +
> > +out_unlock:
> > +	spin_unlock_irqrestore(&guc->contexts_lock, flags);
> > +
> > +	/*
> > +	 * -EAGAIN indicates no guc_ids are available, let's retire any
> > +	 * outstanding requests to see if that frees up a guc_id. If the first
> > +	 * retire didn't help, insert a sleep with the timeslice duration before
> > +	 * attempting to retire more requests. Double the sleep period each
> > +	 * subsequent pass before finally giving up. The sleep period has max of
> > +	 * 100ms and minimum of 1ms.
> > +	 */
> > +	if (ret == -EAGAIN && --tries) {
> > +		if (PIN_GUC_ID_TRIES - tries > 1) {
> > +			unsigned int timeslice_shifted =
> > +				ce->engine->props.timeslice_duration_ms <<
> > +				(PIN_GUC_ID_TRIES - tries - 2);
> > +			unsigned int max = min_t(unsigned int, 100,
> > +						 timeslice_shifted);
> > +
> > +			msleep(max_t(unsigned int, max, 1));
> > +		}
> > +		intel_gt_retire_requests(guc_to_gt(guc));
> > +		goto try_again;
> > +	}
> > +
> > +	return ret;
> > +}
> > +
> > +static void unpin_guc_id(struct intel_guc *guc, struct intel_context *ce)
> > +{
> > +	unsigned long flags;
> > +
> > +	GEM_BUG_ON(atomic_read(&ce->guc_id_ref) < 0);
> > +
> > +	spin_lock_irqsave(&guc->contexts_lock, flags);
> > +	if (!context_guc_id_invalid(ce) && list_empty(&ce->guc_id_link) &&
> > +	    !atomic_read(&ce->guc_id_ref))
> > +		list_add_tail(&ce->guc_id_link, &guc->guc_id_list);
> > +	spin_unlock_irqrestore(&guc->contexts_lock, flags);
> > +}
> > +
> > +static int __guc_action_register_context(struct intel_guc *guc,
> > +					 u32 guc_id,
> > +					 u32 offset)
> > +{
> > +	u32 action[] = {
> > +		INTEL_GUC_ACTION_REGISTER_CONTEXT,
> > +		guc_id,
> > +		offset,
> > +	};
> > +
> > +	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
> > +}
> > +
> > +static int register_context(struct intel_context *ce)
> > +{
> > +	struct intel_guc *guc = ce_to_guc(ce);
> > +	u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
> > +		ce->guc_id * sizeof(struct guc_lrc_desc);
> > +
> > +	return __guc_action_register_context(guc, ce->guc_id, offset);
> > +}
> > +
> > +static int __guc_action_deregister_context(struct intel_guc *guc,
> > +					   u32 guc_id)
> > +{
> > +	u32 action[] = {
> > +		INTEL_GUC_ACTION_DEREGISTER_CONTEXT,
> > +		guc_id,
> > +	};
> > +
> > +	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
> > +}
> > +
> > +static int deregister_context(struct intel_context *ce, u32 guc_id)
> > +{
> > +	struct intel_guc *guc = ce_to_guc(ce);
> > +
> > +	return __guc_action_deregister_context(guc, guc_id);
> > +}
> > +
> > +static intel_engine_mask_t adjust_engine_mask(u8 class, intel_engine_mask_t mask)
> > +{
> > +	switch (class) {
> > +	case RENDER_CLASS:
> > +		return mask >> RCS0;
> > +	case VIDEO_ENHANCEMENT_CLASS:
> > +		return mask >> VECS0;
> > +	case VIDEO_DECODE_CLASS:
> > +		return mask >> VCS0;
> > +	case COPY_ENGINE_CLASS:
> > +		return mask >> BCS0;
> > +	default:
> > +		GEM_BUG_ON("Invalid Class");
> > +		return 0;
> > +	}
> > +}
> > +
> > +static void guc_context_policy_init(struct intel_engine_cs *engine,
> > +				    struct guc_lrc_desc *desc)
> > +{
> > +	desc->policy_flags = 0;
> > +
> > +	desc->execution_quantum = CONTEXT_POLICY_DEFAULT_EXECUTION_QUANTUM_US;
> > +	desc->preemption_timeout = CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US;
> > +}
> > +
> > +static int guc_lrc_desc_pin(struct intel_context *ce)
> > +{
> > +	struct intel_runtime_pm *runtime_pm =
> > +		&ce->engine->gt->i915->runtime_pm;
> > +	struct intel_engine_cs *engine = ce->engine;
> > +	struct intel_guc *guc = &engine->gt->uc.guc;
> > +	u32 desc_idx = ce->guc_id;
> > +	struct guc_lrc_desc *desc;
> > +	bool context_registered;
> > +	intel_wakeref_t wakeref;
> > +	int ret = 0;
> > +
> > +	GEM_BUG_ON(!engine->mask);
> > +
> > +	/*
> > +	 * Ensure LRC + CT vmas are is same region as write barrier is done
> > +	 * based on CT vma region.
> > +	 */
> > +	GEM_BUG_ON(i915_gem_object_is_lmem(guc->ct.vma->obj) !=
> > +		   i915_gem_object_is_lmem(ce->ring->vma->obj));
> > +
> > +	context_registered = lrc_desc_registered(guc, desc_idx);
> > +
> > +	reset_lrc_desc(guc, desc_idx);
> > +	set_lrc_desc_registered(guc, desc_idx, ce);
> > +
> > +	desc = __get_lrc_desc(guc, desc_idx);
> > +	desc->engine_class = engine_class_to_guc_class(engine->class);
> > +	desc->engine_submit_mask = adjust_engine_mask(engine->class,
> > +						      engine->mask);
> > +	desc->hw_context_desc = ce->lrc.lrca;
> > +	desc->priority = GUC_CLIENT_PRIORITY_KMD_NORMAL;
> > +	desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
> > +	guc_context_policy_init(engine, desc);
> > +	init_sched_state(ce);
> > +
> > +	/*
> > +	 * The context_lookup xarray is used to determine if the hardware
> > +	 * context is currently registered. There are two cases in which it
> > +	 * could be regisgered either the guc_id has been stole from from
> > +	 * another context or the lrc descriptor address of this context has
> > +	 * changed. In either case the context needs to be deregistered with the
> > +	 * GuC before registering this context.
> > +	 */
> > +	if (context_registered) {
> > +		set_context_wait_for_deregister_to_register(ce);
> > +		intel_context_get(ce);
> > +
> > +		/*
> > +		 * If stealing the guc_id, this ce has the same guc_id as the
> > +		 * context whos guc_id was stole.
> > +		 */
> > +		with_intel_runtime_pm(runtime_pm, wakeref)
> > +			ret = deregister_context(ce, ce->guc_id);
> > +	} else {
> > +		with_intel_runtime_pm(runtime_pm, wakeref)
> > +			ret = register_context(ce);
> > +	}
> > +
> > +	return ret;
> >  }
> >  
> >  static int guc_context_pre_pin(struct intel_context *ce,
> > @@ -443,36 +813,137 @@ static int guc_context_pre_pin(struct intel_context *ce,
> >  
> >  static int guc_context_pin(struct intel_context *ce, void *vaddr)
> >  {
> > +	if (i915_ggtt_offset(ce->state) !=
> > +	    (ce->lrc.lrca & CTX_GTT_ADDRESS_MASK))
> > +		set_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
> > +
> >  	return lrc_pin(ce, ce->engine, vaddr);
> >  }
> >  
> > +static void guc_context_unpin(struct intel_context *ce)
> > +{
> > +	unpin_guc_id(ce_to_guc(ce), ce);
> > +	lrc_unpin(ce);
> > +}
> > +
> > +static void guc_context_post_unpin(struct intel_context *ce)
> > +{
> > +	lrc_post_unpin(ce);
> > +}
> > +
> > +static inline void guc_lrc_desc_unpin(struct intel_context *ce)
> > +{
> > +	struct intel_engine_cs *engine = ce->engine;
> > +	struct intel_guc *guc = &engine->gt->uc.guc;
> > +	unsigned long flags;
> > +
> > +	GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id));
> > +	GEM_BUG_ON(ce != __get_context(guc, ce->guc_id));
> > +
> > +	spin_lock_irqsave(&ce->guc_state.lock, flags);
> > +	set_context_destroyed(ce);
> > +	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> > +
> > +	deregister_context(ce, ce->guc_id);
> > +}
> > +
> > +static void guc_context_destroy(struct kref *kref)
> > +{
> > +	struct intel_context *ce = container_of(kref, typeof(*ce), ref);
> > +	struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm;
> > +	struct intel_guc *guc = &ce->engine->gt->uc.guc;
> > +	intel_wakeref_t wakeref;
> > +	unsigned long flags;
> > +
> > +	/*
> > +	 * If the guc_id is invalid this context has been stolen and we can free
> > +	 * it immediately. Also can be freed immediately if the context is not
> > +	 * registered with the GuC.
> > +	 */
> > +	if (context_guc_id_invalid(ce) ||
> > +	    !lrc_desc_registered(guc, ce->guc_id)) {
> > +		release_guc_id(guc, ce);
> > +		lrc_destroy(kref);
> > +		return;
> > +	}
> > +
> > +	/*
> > +	 * We have to acquire the context spinlock and check guc_id again, if it
> > +	 * is valid it hasn't been stolen and needs to be deregistered. We
> > +	 * delete this context from the list of unpinned guc_ids available to
> > +	 * stole to seal a race with guc_lrc_desc_pin(). When the G2H CTB
> > +	 * returns indicating this context has been deregistered the guc_id is
> > +	 * returned to the pool of available guc_ids.
> > +	 */
> > +	spin_lock_irqsave(&guc->contexts_lock, flags);
> > +	if (context_guc_id_invalid(ce)) {
> > +		__release_guc_id(guc, ce);
> > +		spin_unlock_irqrestore(&guc->contexts_lock, flags);
> > +		lrc_destroy(kref);
> > +		return;
> > +	}
> > +
> > +	if (!list_empty(&ce->guc_id_link))
> > +		list_del_init(&ce->guc_id_link);
> > +	spin_unlock_irqrestore(&guc->contexts_lock, flags);
> > +
> > +	/*
> > +	 * We defer GuC context deregistration until the context is destroyed
> > +	 * in order to save on CTBs. With this optimization ideally we only need
> > +	 * 1 CTB to register the context during the first pin and 1 CTB to
> > +	 * deregister the context when the context is destroyed. Without this
> > +	 * optimization, a CTB would be needed every pin & unpin.
> > +	 *
> > +	 * XXX: Need to acqiure the runtime wakeref as this can be triggered
> > +	 * from context_free_worker when not runtime wakeref is held.
> > +	 * guc_lrc_desc_unpin requires the runtime as a GuC register is written
> > +	 * in H2G CTB to deregister the context. A future patch may defer this
> > +	 * H2G CTB if the runtime wakeref is zero.
> > +	 */
> > +	with_intel_runtime_pm(runtime_pm, wakeref)
> > +		guc_lrc_desc_unpin(ce);
> > +}
> > +
> > +static int guc_context_alloc(struct intel_context *ce)
> > +{
> > +	return lrc_alloc(ce, ce->engine);
> > +}
> > +
> >  static const struct intel_context_ops guc_context_ops = {
> >  	.alloc = guc_context_alloc,
> >  
> >  	.pre_pin = guc_context_pre_pin,
> >  	.pin = guc_context_pin,
> > -	.unpin = lrc_unpin,
> > -	.post_unpin = lrc_post_unpin,
> > +	.unpin = guc_context_unpin,
> > +	.post_unpin = guc_context_post_unpin,
> >  
> >  	.enter = intel_context_enter_engine,
> >  	.exit = intel_context_exit_engine,
> >  
> >  	.reset = lrc_reset,
> > -	.destroy = lrc_destroy,
> > +	.destroy = guc_context_destroy,
> >  };
> >  
> > -static int guc_request_alloc(struct i915_request *request)
> > +static bool context_needs_register(struct intel_context *ce, bool new_guc_id)
> >  {
> > +	return new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
> > +		!lrc_desc_registered(ce_to_guc(ce), ce->guc_id);
> > +}
> > +
> > +static int guc_request_alloc(struct i915_request *rq)
> > +{
> > +	struct intel_context *ce = rq->context;
> > +	struct intel_guc *guc = ce_to_guc(ce);
> >  	int ret;
> >  
> > -	GEM_BUG_ON(!intel_context_is_pinned(request->context));
> > +	GEM_BUG_ON(!intel_context_is_pinned(rq->context));
> >  
> >  	/*
> >  	 * Flush enough space to reduce the likelihood of waiting after
> >  	 * we start building the request - in which case we will just
> >  	 * have to repeat work.
> >  	 */
> > -	request->reserved_space += GUC_REQUEST_SIZE;
> > +	rq->reserved_space += GUC_REQUEST_SIZE;
> >  
> >  	/*
> >  	 * Note that after this point, we have committed to using
> > @@ -483,56 +954,47 @@ static int guc_request_alloc(struct i915_request *request)
> >  	 */
> >  
> >  	/* Unconditionally invalidate GPU caches and TLBs. */
> > -	ret = request->engine->emit_flush(request, EMIT_INVALIDATE);
> > +	ret = rq->engine->emit_flush(rq, EMIT_INVALIDATE);
> >  	if (ret)
> >  		return ret;
> >  
> > -	request->reserved_space -= GUC_REQUEST_SIZE;
> > -	return 0;
> > -}
> > -
> > -static inline void queue_request(struct i915_sched_engine *sched_engine,
> > -				 struct i915_request *rq,
> > -				 int prio)
> > -{
> > -	GEM_BUG_ON(!list_empty(&rq->sched.link));
> > -	list_add_tail(&rq->sched.link,
> > -		      i915_sched_lookup_priolist(sched_engine, prio));
> > -	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> > -}
> > -
> > -static int guc_bypass_tasklet_submit(struct intel_guc *guc,
> > -				     struct i915_request *rq)
> > -{
> > -	int ret;
> > -
> > -	__i915_request_submit(rq);
> > +	rq->reserved_space -= GUC_REQUEST_SIZE;
> >  
> > -	trace_i915_request_in(rq, 0);
> > -
> > -	guc_set_lrc_tail(rq);
> > -	ret = guc_add_request(guc, rq);
> > -	if (ret == -EBUSY)
> > -		guc->stalled_request = rq;
> > -
> > -	return ret;
> > -}
> > -
> > -static void guc_submit_request(struct i915_request *rq)
> > -{
> > -	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
> > -	struct intel_guc *guc = &rq->engine->gt->uc.guc;
> > -	unsigned long flags;
> > +	/*
> > +	 * Call pin_guc_id here rather than in the pinning step as with
> > +	 * dma_resv, contexts can be repeatedly pinned / unpinned trashing the
> > +	 * guc_ids and creating horrible race conditions. This is especially bad
> > +	 * when guc_ids are being stolen due to over subscription. By the time
> > +	 * this function is reached, it is guaranteed that the guc_id will be
> > +	 * persistent until the generated request is retired. Thus, sealing these
> > +	 * race conditions. It is still safe to fail here if guc_ids are
> > +	 * exhausted and return -EAGAIN to the user indicating that they can try
> > +	 * again in the future.
> > +	 *
> > +	 * There is no need for a lock here as the timeline mutex ensures at
> > +	 * most one context can be executing this code path at once. The
> > +	 * guc_id_ref is incremented once for every request in flight and
> > +	 * decremented on each retire. When it is zero, a lock around the
> > +	 * increment (in pin_guc_id) is needed to seal a race with unpin_guc_id.
> > +	 */
> > +	if (atomic_add_unless(&ce->guc_id_ref, 1, 0))
> > +		return 0;
> >  
> > -	/* Will be called from irq-context when using foreign fences. */
> > -	spin_lock_irqsave(&sched_engine->lock, flags);
> > +	ret = pin_guc_id(guc, ce);	/* returns 1 if new guc_id assigned */
> > +	if (unlikely(ret < 0))
> > +		return ret;;
> > +	if (context_needs_register(ce, !!ret)) {
> > +		ret = guc_lrc_desc_pin(ce);
> > +		if (unlikely(ret)) {	/* unwind */
> > +			atomic_dec(&ce->guc_id_ref);
> > +			unpin_guc_id(guc, ce);
> > +			return ret;
> > +		}
> > +	}
> >  
> > -	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
> > -		queue_request(sched_engine, rq, rq_prio(rq));
> > -	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
> > -		tasklet_hi_schedule(&sched_engine->tasklet);
> > +	clear_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
> >  
> > -	spin_unlock_irqrestore(&sched_engine->lock, flags);
> > +	return 0;
> >  }
> >  
> >  static void sanitize_hwsp(struct intel_engine_cs *engine)
> > @@ -606,6 +1068,46 @@ static void guc_set_default_submission(struct intel_engine_cs *engine)
> >  	engine->submit_request = guc_submit_request;
> >  }
> >  
> > +static inline void guc_kernel_context_pin(struct intel_guc *guc,
> > +					  struct intel_context *ce)
> > +{
> > +	if (context_guc_id_invalid(ce))
> > +		pin_guc_id(guc, ce);
> > +	guc_lrc_desc_pin(ce);
> > +}
> > +
> > +static inline void guc_init_lrc_mapping(struct intel_guc *guc)
> > +{
> > +	struct intel_gt *gt = guc_to_gt(guc);
> > +	struct intel_engine_cs *engine;
> > +	enum intel_engine_id id;
> > +
> > +	/* make sure all descriptors are clean... */
> > +	xa_destroy(&guc->context_lookup);
> > +
> > +	/*
> > +	 * Some contexts might have been pinned before we enabled GuC
> > +	 * submission, so we need to add them to the GuC bookeeping.
> > +	 * Also, after a reset the GuC we want to make sure that the information
> > +	 * shared with GuC is properly reset. The kernel lrcs are not attached
> > +	 * to the gem_context, so they need to be added separately.
> > +	 *
> > +	 * Note: we purposely do not check the error return of
> > +	 * guc_lrc_desc_pin, because that function can only fail in two cases.
> > +	 * One, if there aren't enough free IDs, but we're guaranteed to have
> > +	 * enough here (we're either only pinning a handful of lrc on first boot
> > +	 * or we're re-pinning lrcs that were already pinned before the reset).
> > +	 * Two, if the GuC has died and CTBs can't make forward progress.
> > +	 * Presumably, the GuC should be alive as this function is called on
> > +	 * driver load or after a reset. Even if it is dead, another full GPU
> > +	 * reset will be triggered and this function would be called again.
> > +	 */
> > +
> > +	for_each_engine(engine, gt, id)
> > +		if (engine->kernel_context)
> > +			guc_kernel_context_pin(guc, engine->kernel_context);
> > +}
> > +
> >  static void guc_release(struct intel_engine_cs *engine)
> >  {
> >  	engine->sanitize = NULL; /* no longer in control, nothing to sanitize */
> > @@ -718,6 +1220,7 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
> >  
> >  void intel_guc_submission_enable(struct intel_guc *guc)
> >  {
> > +	guc_init_lrc_mapping(guc);
> >  }
> >  
> >  void intel_guc_submission_disable(struct intel_guc *guc)
> > @@ -743,3 +1246,62 @@ void intel_guc_submission_init_early(struct intel_guc *guc)
> >  {
> >  	guc->submission_selected = __guc_submission_selected(guc);
> >  }
> > +
> > +static inline struct intel_context *
> > +g2h_context_lookup(struct intel_guc *guc, u32 desc_idx)
> > +{
> > +	struct intel_context *ce;
> > +
> > +	if (unlikely(desc_idx >= GUC_MAX_LRC_DESCRIPTORS)) {
> > +		drm_dbg(&guc_to_gt(guc)->i915->drm,
> > +			"Invalid desc_idx %u", desc_idx);
> 
> just debug? why not an (fatal) error ?
> 

This is a G2H communication. We can't crash the driver if the GuC gives
us crap.

> > +		return NULL;
> > +	}
> > +
> > +	ce = __get_context(guc, desc_idx);
> > +	if (unlikely(!ce)) {
> > +		drm_dbg(&guc_to_gt(guc)->i915->drm,
> > +			"Context is NULL, desc_idx %u", desc_idx);
> 
> not an error ?
>

It is an error, we return NULL.

BTW - Can you include your name on the last comment? That makes it
easier to know when I've reached the end of your comments.

Matt

> > +		return NULL;
> > +	}
> > +
> > +	return ce;
> > +}
> > +
> > +int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
> > +					  const u32 *msg,
> > +					  u32 len)
> > +{
> > +	struct intel_context *ce;
> > +	u32 desc_idx = msg[0];
> > +
> > +	if (unlikely(len < 1)) {
> > +		drm_dbg(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);
> > +		return -EPROTO;
> > +	}
> > +
> > +	ce = g2h_context_lookup(guc, desc_idx);
> > +	if (unlikely(!ce))
> > +		return -EPROTO;
> > +
> > +	if (context_wait_for_deregister_to_register(ce)) {
> > +		struct intel_runtime_pm *runtime_pm =
> > +			&ce->engine->gt->i915->runtime_pm;
> > +		intel_wakeref_t wakeref;
> > +
> > +		/*
> > +		 * Previous owner of this guc_id has been deregistered, now safe
> > +		 * register this context.
> > +		 */
> > +		with_intel_runtime_pm(runtime_pm, wakeref)
> > +			register_context(ce);
> > +		clr_context_wait_for_deregister_to_register(ce);
> > +		intel_context_put(ce);
> > +	} else if (context_destroyed(ce)) {
> > +		/* Context has been destroyed */
> > +		release_guc_id(guc, ce);
> > +		lrc_destroy(&ce->ref);
> > +	}
> > +
> > +	return 0;
> > +}
> > diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> > index c857fafb8a30..a9c2242d61a2 100644
> > --- a/drivers/gpu/drm/i915/i915_reg.h
> > +++ b/drivers/gpu/drm/i915/i915_reg.h
> > @@ -4142,6 +4142,7 @@ enum {
> >  	FAULT_AND_CONTINUE /* Unsupported */
> >  };
> >  
> > +#define CTX_GTT_ADDRESS_MASK GENMASK(31, 12)
> >  #define GEN8_CTX_VALID (1 << 0)
> >  #define GEN8_CTX_FORCE_PD_RESTORE (1 << 1)
> >  #define GEN8_CTX_FORCE_RESTORE (1 << 2)
> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > index c5989c0b83d3..9dad3df5eaf7 100644
> > --- a/drivers/gpu/drm/i915/i915_request.c
> > +++ b/drivers/gpu/drm/i915/i915_request.c
> > @@ -419,6 +419,7 @@ bool i915_request_retire(struct i915_request *rq)
> >  	 */
> >  	if (!list_empty(&rq->sched.link))
> >  		remove_from_engine(rq);
> > +	atomic_dec(&rq->context->guc_id_ref);
> >  	GEM_BUG_ON(!llist_empty(&rq->execute_cb));
> >  
> >  	__list_del_entry(&rq->link); /* poison neither prev/next (RCU walks) */
> > 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [Intel-gfx] [PATCH 04/47] drm/i915/guc: Add non blocking CTB send function
  2021-06-25 11:50           ` Michal Wajdeczko
@ 2021-06-25 17:53             ` Matthew Brost
  0 siblings, 0 replies; 170+ messages in thread
From: Matthew Brost @ 2021-06-25 17:53 UTC (permalink / raw)
  To: Michal Wajdeczko; +Cc: intel-gfx, dri-devel

On Fri, Jun 25, 2021 at 01:50:21PM +0200, Michal Wajdeczko wrote:
> 
> 
> On 25.06.2021 00:41, Matthew Brost wrote:
> > On Thu, Jun 24, 2021 at 07:02:18PM +0200, Michal Wajdeczko wrote:
> >>
> >>
> >> On 24.06.2021 17:49, Matthew Brost wrote:
> >>> On Thu, Jun 24, 2021 at 04:48:32PM +0200, Michal Wajdeczko wrote:
> >>>>
> >>>>
> >>>> On 24.06.2021 09:04, Matthew Brost wrote:
> >>>>> Add non blocking CTB send function, intel_guc_send_nb. GuC submission
> >>>>> will send CTBs in the critical path and does not need to wait for these
> >>>>> CTBs to complete before moving on, hence the need for this new function.
> >>>>>
> >>>>> The non-blocking CTB now must have a flow control mechanism to ensure
> >>>>> the buffer isn't overrun. A lazy spin wait is used as we believe the
> >>>>> flow control condition should be rare with a properly sized buffer.
> >>>>>
> >>>>> The function, intel_guc_send_nb, is exported in this patch but unused.
> >>>>> Several patches later in the series make use of this function.
> >>>>>
> >>>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> >>>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> >>>>> ---
> >>>>>  drivers/gpu/drm/i915/gt/uc/intel_guc.h    | 12 +++-
> >>>>>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 77 +++++++++++++++++++++--
> >>>>>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  3 +-
> >>>>>  3 files changed, 82 insertions(+), 10 deletions(-)
> >>>>>
> >>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> >>>>> index 4abc59f6f3cd..24b1df6ad4ae 100644
> >>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> >>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> >>>>> @@ -74,7 +74,15 @@ static inline struct intel_guc *log_to_guc(struct intel_guc_log *log)
> >>>>>  static
> >>>>>  inline int intel_guc_send(struct intel_guc *guc, const u32 *action, u32 len)
> >>>>>  {
> >>>>> -	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0);
> >>>>> +	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0, 0);
> >>>>> +}
> >>>>> +
> >>>>> +#define INTEL_GUC_SEND_NB		BIT(31)
> >>>>
> >>>> hmm, this flag really belongs to intel_guc_ct_send() so it should be
> >>>> defined as CTB flag near that function declaration
> >>>>
> >>>
> >>> I can move this up a few lines.
> >>>
> >>>>> +static
> >>>>> +inline int intel_guc_send_nb(struct intel_guc *guc, const u32 *action, u32 len)
> >>>>> +{
> >>>>> +	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0,
> >>>>> +				 INTEL_GUC_SEND_NB);
> >>>>>  }
> >>>>>  
> >>>>>  static inline int
> >>>>> @@ -82,7 +90,7 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len,
> >>>>>  			   u32 *response_buf, u32 response_buf_size)
> >>>>>  {
> >>>>>  	return intel_guc_ct_send(&guc->ct, action, len,
> >>>>> -				 response_buf, response_buf_size);
> >>>>> +				 response_buf, response_buf_size, 0);
> >>>>>  }
> >>>>>  
> >>>>>  static inline void intel_guc_to_host_event_handler(struct intel_guc *guc)
> >>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> >>>>> index a17215920e58..c9a65d05911f 100644
> >>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> >>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> >>>>> @@ -3,6 +3,11 @@
> >>>>>   * Copyright © 2016-2019 Intel Corporation
> >>>>>   */
> >>>>>  
> >>>>> +#include <linux/circ_buf.h>
> >>>>> +#include <linux/ktime.h>
> >>>>> +#include <linux/time64.h>
> >>>>> +#include <linux/timekeeping.h>
> >>>>> +
> >>>>>  #include "i915_drv.h"
> >>>>>  #include "intel_guc_ct.h"
> >>>>>  #include "gt/intel_gt.h"
> >>>>> @@ -373,7 +378,7 @@ static void write_barrier(struct intel_guc_ct *ct)
> >>>>>  static int ct_write(struct intel_guc_ct *ct,
> >>>>>  		    const u32 *action,
> >>>>>  		    u32 len /* in dwords */,
> >>>>> -		    u32 fence)
> >>>>> +		    u32 fence, u32 flags)
> >>>>>  {
> >>>>>  	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> >>>>>  	struct guc_ct_buffer_desc *desc = ctb->desc;
> >>>>> @@ -421,9 +426,13 @@ static int ct_write(struct intel_guc_ct *ct,
> >>>>>  		 FIELD_PREP(GUC_CTB_MSG_0_NUM_DWORDS, len) |
> >>>>>  		 FIELD_PREP(GUC_CTB_MSG_0_FENCE, fence);
> >>>>>  
> >>>>> -	hxg = FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
> >>>>> -	      FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
> >>>>> -			 GUC_HXG_REQUEST_MSG_0_DATA0, action[0]);
> >>>>> +	hxg = (flags & INTEL_GUC_SEND_NB) ?
> >>>>> +		(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_EVENT) |
> >>>>> +		 FIELD_PREP(GUC_HXG_EVENT_MSG_0_ACTION |
> >>>>> +			    GUC_HXG_EVENT_MSG_0_DATA0, action[0])) :
> >>>>> +		(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
> >>>>> +		 FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
> >>>>> +			    GUC_HXG_REQUEST_MSG_0_DATA0, action[0]));
> >>>>
> >>>> or as we already switched to accept and return whole HXG messages in
> >>>> guc_send_mmio() maybe we should do the same for CTB variant too and
> >>>> instead of using extra flag just let caller to prepare proper HXG header
> >>>> with HXG_EVENT type and then in CTB code just look at this type to make
> >>>> decision which code path to use
> >>>>
> >>>
> >>> Not sure I follow. Anyways could this be done in a follow up by you if
> >>> want this change.
> >>>  
> >>>> note that existing callers should not be impacted, as full HXG header
> >>>> for the REQUEST message looks exactly the same as "action" code alone.
> >>>>
> >>>>>  
> >>>>>  	CT_DEBUG(ct, "writing (tail %u) %*ph %*ph %*ph\n",
> >>>>>  		 tail, 4, &header, 4, &hxg, 4 * (len - 1), &action[1]);
> >>>>> @@ -498,6 +507,46 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
> >>>>>  	return err;
> >>>>>  }
> >>>>>  
> >>>>> +static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
> >>>>> +{
> >>>>> +	struct guc_ct_buffer_desc *desc = ctb->desc;
> >>>>> +	u32 head = READ_ONCE(desc->head);
> >>>>> +	u32 space;
> >>>>> +
> >>>>> +	space = CIRC_SPACE(desc->tail, head, ctb->size);
> >>>>> +
> >>>>> +	return space >= len_dw;
> >>>>
> >>>> here you are returning true(1) as has room
> >>>>
> >>>
> >>> See below.
> >>>  
> >>>>> +}
> >>>>> +
> >>>>> +static int ct_send_nb(struct intel_guc_ct *ct,
> >>>>> +		      const u32 *action,
> >>>>> +		      u32 len,
> >>>>> +		      u32 flags)
> >>>>> +{
> >>>>> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> >>>>> +	unsigned long spin_flags;
> >>>>> +	u32 fence;
> >>>>> +	int ret;
> >>>>> +
> >>>>> +	spin_lock_irqsave(&ctb->lock, spin_flags);
> >>>>> +
> >>>>> +	ret = h2g_has_room(ctb, len + 1);
> >>>>
> >>>> but here you treat "1" it as en error
> >>>>
> >>>
> >>> Yes, this patch is broken but fixed in a follow up one. Regardless I'll
> >>> fix this patch in place.
> >>>
> >>>> and this "1" is GUC_HXG_MSG_MIN_LEN, right ?
> >>>>
> >>>
> >>> Not exactly. This is following how ct_send() uses the action + len
> >>> field. Action[0] field goes in the HXG header and extra + 1 is for the
> >>> CT header.
> >>
> >> well, "len" already counts "action" so by treating input as full HXG
> >> message (including HXG header) will make it cleaner
> >>
> > 
> > Yes, I know. See above. To me GUC_HXG_MSG_MIN_LEN makes zero sense and
> > it is worse than adding + 1. This + 1 accounts for the CT header not the
> > HXG header. If any we add a new define, GUC_CT_HDR_LEN, and add that.
> 
> you mean GUC_CTB_MSG_MIN_LEN ? it's already there [1]
> 

Kinda? I think we should have a define GUC_CTB_HDR_LEN which is 1 and
GUC_CTB_MSG_MIN_LEN is defined as GUC_CTB_HDR_LEN. 'GUC_CTB_HDR_LEN'
makes it clear that the + 1 is referring to the header. I've done this
branch of these patches already.

Matt

> [1]
> https://cgit.freedesktop.org/drm/drm-tip/tree/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h#n82
> 
> > 
> > Matt
> > 
> >> we can try do it later but by doing it right now we would avoid
> >> introducing this send_nb() function and deprecating them long term again
> >>
> >>>
> >>>>> +	if (unlikely(ret))
> >>>>> +		goto out;
> >>>>> +
> >>>>> +	fence = ct_get_next_fence(ct);
> >>>>> +	ret = ct_write(ct, action, len, fence, flags);
> >>>>> +	if (unlikely(ret))
> >>>>> +		goto out;
> >>>>> +
> >>>>> +	intel_guc_notify(ct_to_guc(ct));
> >>>>> +
> >>>>> +out:
> >>>>> +	spin_unlock_irqrestore(&ctb->lock, spin_flags);
> >>>>> +
> >>>>> +	return ret;
> >>>>> +}
> >>>>> +
> >>>>>  static int ct_send(struct intel_guc_ct *ct,
> >>>>>  		   const u32 *action,
> >>>>>  		   u32 len,
> >>>>> @@ -505,6 +554,7 @@ static int ct_send(struct intel_guc_ct *ct,
> >>>>>  		   u32 response_buf_size,
> >>>>>  		   u32 *status)
> >>>>>  {
> >>>>> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> >>>>>  	struct ct_request request;
> >>>>>  	unsigned long flags;
> >>>>>  	u32 fence;
> >>>>> @@ -514,8 +564,20 @@ static int ct_send(struct intel_guc_ct *ct,
> >>>>>  	GEM_BUG_ON(!len);
> >>>>>  	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
> >>>>>  	GEM_BUG_ON(!response_buf && response_buf_size);
> >>>>> +	might_sleep();
> >>>>>  
> >>>>> +	/*
> >>>>> +	 * We use a lazy spin wait loop here as we believe that if the CT
> >>>>> +	 * buffers are sized correctly the flow control condition should be
> >>>>> +	 * rare.
> >>>>
> >>>> shouldn't we at least try to log such cases with RATE_LIMITED to find
> >>>> out how "rare" it is, or if really unlikely just return -EBUSY as in
> >>>> case of non-blocking send ?
> >>>>
> >>>
> >>> Definitely not return -EBUSY as this a blocking call. Perhaps we can log
> >>
> >> blocking calls still can fail for various reasons, full CTB is one of
> >> them, and if we return error (now broken) for non-blocking variant then
> >> we should do the same for blocking variant as well and let the caller
> >> decide about next steps
> >>
> >>> this, but IGTs likely can hit rather easily. It really is only
> >>> interesting if real workloads hit this. Regardless that can be a follow
> >>> up.
> >>
> >> if we hide retry in a silent loop then we will not find it out if we hit
> >> this condition (IGT or real WL) or not
> >>
> >>>
> >>> Matt
> >>>  
> >>>>> +	 */
> >>>>> +retry:
> >>>>>  	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
> >>>>> +	if (unlikely(!h2g_has_room(ctb, len + 1))) {
> >>>>> +		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> >>>>> +		cond_resched();
> >>>>> +		goto retry;
> >>>>> +	}
> >>>>>  
> >>>>>  	fence = ct_get_next_fence(ct);
> >>>>>  	request.fence = fence;
> >>>>> @@ -527,7 +589,7 @@ static int ct_send(struct intel_guc_ct *ct,
> >>>>>  	list_add_tail(&request.link, &ct->requests.pending);
> >>>>>  	spin_unlock(&ct->requests.lock);
> >>>>>  
> >>>>> -	err = ct_write(ct, action, len, fence);
> >>>>> +	err = ct_write(ct, action, len, fence, 0);
> >>>>>  
> >>>>>  	spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> >>>>>  
> >>>>> @@ -569,7 +631,7 @@ static int ct_send(struct intel_guc_ct *ct,
> >>>>>   * Command Transport (CT) buffer based GuC send function.
> >>>>>   */
> >>>>>  int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
> >>>>> -		      u32 *response_buf, u32 response_buf_size)
> >>>>> +		      u32 *response_buf, u32 response_buf_size, u32 flags)
> >>>>>  {
> >>>>>  	u32 status = ~0; /* undefined */
> >>>>>  	int ret;
> >>>>> @@ -579,6 +641,9 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
> >>>>>  		return -ENODEV;
> >>>>>  	}
> >>>>>  
> >>>>> +	if (flags & INTEL_GUC_SEND_NB)
> >>>>> +		return ct_send_nb(ct, action, len, flags);
> >>>>> +
> >>>>>  	ret = ct_send(ct, action, len, response_buf, response_buf_size, &status);
> >>>>>  	if (unlikely(ret < 0)) {
> >>>>>  		CT_ERROR(ct, "Sending action %#x failed (err=%d status=%#X)\n",
> >>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> >>>>> index 1ae2dde6db93..eb69263324ba 100644
> >>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> >>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> >>>>> @@ -42,7 +42,6 @@ struct intel_guc_ct_buffer {
> >>>>>  	bool broken;
> >>>>>  };
> >>>>>  
> >>>>> -
> >>>>>  /** Top-level structure for Command Transport related data
> >>>>>   *
> >>>>>   * Includes a pair of CT buffers for bi-directional communication and tracking
> >>>>> @@ -88,7 +87,7 @@ static inline bool intel_guc_ct_enabled(struct intel_guc_ct *ct)
> >>>>>  }
> >>>>>  
> >>>>>  int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
> >>>>> -		      u32 *response_buf, u32 response_buf_size);
> >>>>> +		      u32 *response_buf, u32 response_buf_size, u32 flags);
> >>>>>  void intel_guc_ct_event_handler(struct intel_guc_ct *ct);
> >>>>>  
> >>>>>  #endif /* _INTEL_GUC_CT_H_ */
> >>>>>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [Intel-gfx] [PATCH 06/47] drm/i915/guc: Optimize CTB writes and reads
  2021-06-25 13:09   ` Michal Wajdeczko
@ 2021-06-25 18:26     ` Matthew Brost
  2021-06-25 20:28     ` Matthew Brost
  1 sibling, 0 replies; 170+ messages in thread
From: Matthew Brost @ 2021-06-25 18:26 UTC (permalink / raw)
  To: Michal Wajdeczko; +Cc: intel-gfx, dri-devel

On Fri, Jun 25, 2021 at 03:09:29PM +0200, Michal Wajdeczko wrote:
> 
> 
> On 24.06.2021 09:04, Matthew Brost wrote:
> > CTB writes are now in the path of command submission and should be
> > optimized for performance. Rather than reading CTB descriptor values
> > (e.g. head, tail) which could result in accesses across the PCIe bus,
> > store shadow local copies and only read/write the descriptor values when
> > absolutely necessary. Also store the current space in the each channel
> > locally.
> > 
> > Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 76 ++++++++++++++---------
> >  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  6 ++
> >  2 files changed, 51 insertions(+), 31 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > index 27ec30b5ef47..1fd5c69358ef 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > @@ -130,6 +130,10 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc)
> >  static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb)
> >  {
> >  	ctb->broken = false;
> > +	ctb->tail = 0;
> > +	ctb->head = 0;
> > +	ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
> > +
> >  	guc_ct_buffer_desc_init(ctb->desc);
> >  }
> >  
> > @@ -383,10 +387,8 @@ static int ct_write(struct intel_guc_ct *ct,
> >  {
> >  	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> >  	struct guc_ct_buffer_desc *desc = ctb->desc;
> > -	u32 head = desc->head;
> > -	u32 tail = desc->tail;
> > +	u32 tail = ctb->tail;
> >  	u32 size = ctb->size;
> > -	u32 used;
> >  	u32 header;
> >  	u32 hxg;
> >  	u32 *cmds = ctb->cmds;
> > @@ -398,25 +400,14 @@ static int ct_write(struct intel_guc_ct *ct,
> >  	if (unlikely(desc->status))
> >  		goto corrupted;
> >  
> > -	if (unlikely((tail | head) >= size)) {
> > +#ifdef CONFIG_DRM_I915_DEBUG_GUC
> 
> since we are caching tail, we may want to check if it's sill correct:
> 
> 	tail = READ_ONCE(desc->tail);
> 	if (unlikely(tail != ctb->tail)) {
>   		CT_ERROR(ct, "Tail was modified %u != %u\n",
> 			 tail, ctb->tail);
>   		desc->status |= GUC_CTB_STATUS_MISMATCH;
>   		goto corrupted;
> 	}
> 
> and since we own the tail then we can be more strict:
> 
> 	GEM_BUG_ON(tail > size);
> 
> and then finally just check GuC head:
> 
> 	head = READ_ONCE(desc->head);
> 	if (unlikely(head >= size)) {
> 		...
> 

Sure, but still hidden behind CONFIG_DRM_I915_DEBUG_GUC, right?

> > +	if (unlikely((desc->tail | desc->head) >= size)) {
> >  		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
> > -			 head, tail, size);
> > +			 desc->head, desc->tail, size);
> >  		desc->status |= GUC_CTB_STATUS_OVERFLOW;
> >  		goto corrupted;
> >  	}
> > -
> > -	/*
> > -	 * tail == head condition indicates empty. GuC FW does not support
> > -	 * using up the entire buffer to get tail == head meaning full.
> > -	 */
> > -	if (tail < head)
> > -		used = (size - head) + tail;
> > -	else
> > -		used = tail - head;
> > -
> > -	/* make sure there is a space including extra dw for the fence */
> > -	if (unlikely(used + len + 1 >= size))
> > -		return -ENOSPC;
> > +#endif
> >  
> >  	/*
> >  	 * dw0: CT header (including fence)
> > @@ -457,7 +448,9 @@ static int ct_write(struct intel_guc_ct *ct,
> >  	write_barrier(ct);
> >  
> >  	/* now update descriptor */
> > +	ctb->tail = tail;
> >  	WRITE_ONCE(desc->tail, tail);
> > +	ctb->space -= len + 1;
> 
> this magic "1" is likely GUC_CTB_MSG_MIN_LEN, right ?
>

Yes.
 
> >  
> >  	return 0;
> >  
> > @@ -473,7 +466,7 @@ static int ct_write(struct intel_guc_ct *ct,
> >   * @req:	pointer to pending request
> >   * @status:	placeholder for status
> >   *
> > - * For each sent request, Guc shall send bac CT response message.
> > + * For each sent request, GuC shall send back CT response message.
> >   * Our message handler will update status of tracked request once
> >   * response message with given fence is received. Wait here and
> >   * check for valid response status value.
> > @@ -520,24 +513,35 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct)
> >  	return ret;
> >  }
> >  
> > -static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
> > +static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw)
> >  {
> > -	struct guc_ct_buffer_desc *desc = ctb->desc;
> > -	u32 head = READ_ONCE(desc->head);
> > +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> > +	u32 head;
> >  	u32 space;
> >  
> > -	space = CIRC_SPACE(desc->tail, head, ctb->size);
> > +	if (ctb->space >= len_dw)
> > +		return true;
> > +
> > +	head = READ_ONCE(ctb->desc->head);
> > +	if (unlikely(head > ctb->size)) {
> > +		CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u size=%u\n",
> > +			  ctb->desc->head, ctb->desc->tail, ctb->size);
> > +		ctb->desc->status |= GUC_CTB_STATUS_OVERFLOW;
> > +		ctb->broken = true;
> > +		return false;
> > +	}
> > +
> > +	space = CIRC_SPACE(ctb->tail, head, ctb->size);
> > +	ctb->space = space;
> 
> maybe here we could mark stall_time ?
> 
>  	if (space >= len_dw)
> 		return true;
> 
> 	if (ct->stall_time == KTIME_MAX)
> 		ct->stall_time = ktime_get();
> 	return false;
>

No. See my eariler comment [1] about why I'd rather leave this to the
caller. 

[1] https://patchwork.freedesktop.org/patch/440703/?series=91840&rev=1

> >  
> >  	return space >= len_dw;
> 
> btw, maybe to avoid filling CTB to the last dword, this should be
> 
> 	space > len_dw
> 

CIRC_SPACE leaves an extra DW already.

> note the earlier comment:
> 
> /*
>  * tail == head condition indicates empty. GuC FW does not support
>  * using up the entire buffer to get tail == head meaning full.
>  */
>

Yes, again CIRC_SPACE uses this same algorithm.

Matt
 
> >  }
> >  
> >  static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
> >  {
> > -	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> > -
> >  	lockdep_assert_held(&ct->ctbs.send.lock);
> >  
> > -	if (unlikely(!h2g_has_room(ctb, len_dw))) {
> > +	if (unlikely(!h2g_has_room(ct, len_dw))) {
> >  		if (ct->stall_time == KTIME_MAX)
> >  			ct->stall_time = ktime_get();
> >  
> > @@ -606,10 +610,10 @@ static int ct_send(struct intel_guc_ct *ct,
> >  	 */
> >  retry:
> >  	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
> > -	if (unlikely(!h2g_has_room(ctb, len + 1))) {
> > +	if (unlikely(!h2g_has_room(ct, len + 1))) {
> >  		if (ct->stall_time == KTIME_MAX)
> >  			ct->stall_time = ktime_get();
> > -		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> > +		spin_unlock_irqrestore(&ctb->lock, flags);
> >  
> >  		if (unlikely(ct_deadlocked(ct)))
> >  			return -EIO;
> > @@ -632,7 +636,7 @@ static int ct_send(struct intel_guc_ct *ct,
> >  
> >  	err = ct_write(ct, action, len, fence, 0);
> >  
> > -	spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> > +	spin_unlock_irqrestore(&ctb->lock, flags);
> >  
> >  	if (unlikely(err))
> >  		goto unlink;
> > @@ -720,7 +724,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
> >  {
> >  	struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv;
> >  	struct guc_ct_buffer_desc *desc = ctb->desc;
> > -	u32 head = desc->head;
> > +	u32 head = ctb->head;
> >  	u32 tail = desc->tail;
> >  	u32 size = ctb->size;
> >  	u32 *cmds = ctb->cmds;
> > @@ -735,12 +739,21 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
> >  	if (unlikely(desc->status))
> >  		goto corrupted;
> >  
> > -	if (unlikely((tail | head) >= size)) {
> > +#ifdef CONFIG_DRM_I915_DEBUG_GUC
> 
> as above we may want to check if our cached head was not modified
> 
> > +	if (unlikely((desc->tail | desc->head) >= size)) {
> >  		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
> >  			 head, tail, size);
> >  		desc->status |= GUC_CTB_STATUS_OVERFLOW;
> >  		goto corrupted;
> >  	}
> > +#else
> > +	if (unlikely((tail | ctb->head) >= size)) {
> > +		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
> > +			 head, tail, size);
> > +		desc->status |= GUC_CTB_STATUS_OVERFLOW;
> > +		goto corrupted;
> > +	}
> > +#endif
> >  
> >  	/* tail == head condition indicates empty */
> >  	available = tail - head;
> > @@ -790,6 +803,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
> >  	}
> >  	CT_DEBUG(ct, "received %*ph\n", 4 * len, (*msg)->msg);
> >  
> > +	ctb->head = head;
> >  	/* now update descriptor */
> >  	WRITE_ONCE(desc->head, head);
> >  
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > index 55ef7c52472f..9924335e2ee6 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > @@ -33,6 +33,9 @@ struct intel_guc;
> >   * @desc: pointer to the buffer descriptor
> >   * @cmds: pointer to the commands buffer
> >   * @size: size of the commands buffer in dwords
> > + * @head: local shadow copy of head in dwords
> > + * @tail: local shadow copy of tail in dwords
> > + * @space: local shadow copy of space in dwords
> >   * @broken: flag to indicate if descriptor data is broken
> >   */
> >  struct intel_guc_ct_buffer {
> > @@ -40,6 +43,9 @@ struct intel_guc_ct_buffer {
> >  	struct guc_ct_buffer_desc *desc;
> >  	u32 *cmds;
> >  	u32 size;
> > +	u32 tail;
> > +	u32 head;
> > +	u32 space;
> 
> in later patch this is changing to atomic_t
> maybe we can start with it ?
> 
> >  	bool broken;
> >  };
> >  
> > 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [Intel-gfx] [PATCH 43/47] drm/i915/guc: Hook GuC scheduling policies up
  2021-06-25  0:59   ` Matthew Brost
@ 2021-06-25 19:10     ` John Harrison
  2021-07-10 18:56       ` Matthew Brost
  0 siblings, 1 reply; 170+ messages in thread
From: John Harrison @ 2021-06-25 19:10 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel

On 6/24/2021 17:59, Matthew Brost wrote:
> On Thu, Jun 24, 2021 at 12:05:12AM -0700, Matthew Brost wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> Use the official driver default scheduling policies for configuring
>> the GuC scheduler rather than a bunch of hardcoded values.
>>
>> Signed-off-by: John Harrison <john.c.harrison@intel.com>
>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>> Cc: Jose Souza <jose.souza@intel.com>
>> ---
>>   drivers/gpu/drm/i915/gt/intel_engine_types.h  |  1 +
>>   drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  2 +
>>   drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c    | 44 ++++++++++++++++++-
>>   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 11 +++--
>>   4 files changed, 53 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
>> index 0ceffa2be7a7..37db857bb56c 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
>> +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
>> @@ -455,6 +455,7 @@ struct intel_engine_cs {
>>   #define I915_ENGINE_IS_VIRTUAL       BIT(5)
>>   #define I915_ENGINE_HAS_RELATIVE_MMIO BIT(6)
>>   #define I915_ENGINE_REQUIRES_CMD_PARSER BIT(7)
>> +#define I915_ENGINE_WANT_FORCED_PREEMPTION BIT(8)
>>   	unsigned int flags;
>>   
>>   	/*
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>> index c38365cd5fab..905ecbc7dbe3 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>> @@ -270,6 +270,8 @@ int intel_guc_engine_failure_process_msg(struct intel_guc *guc,
>>   
>>   void intel_guc_find_hung_context(struct intel_engine_cs *engine);
>>   
>> +int intel_guc_global_policies_update(struct intel_guc *guc);
>> +
>>   void intel_guc_submission_reset_prepare(struct intel_guc *guc);
>>   void intel_guc_submission_reset(struct intel_guc *guc, bool stalled);
>>   void intel_guc_submission_reset_finish(struct intel_guc *guc);
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
>> index d3e86ab7508f..2ad5fcd4e1b7 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
>> @@ -77,14 +77,54 @@ static u32 guc_ads_blob_size(struct intel_guc *guc)
>>   	       guc_ads_private_data_size(guc);
>>   }
>>   
>> -static void guc_policies_init(struct guc_policies *policies)
>> +static void guc_policies_init(struct intel_guc *guc, struct guc_policies *policies)
>>   {
>> +	struct intel_gt *gt = guc_to_gt(guc);
>> +	struct drm_i915_private *i915 = gt->i915;
>> +
>>   	policies->dpc_promote_time = GLOBAL_POLICY_DEFAULT_DPC_PROMOTE_TIME_US;
>>   	policies->max_num_work_items = GLOBAL_POLICY_MAX_NUM_WI;
>> +
>>   	policies->global_flags = 0;
>> +	if (i915->params.reset < 2)
>> +		policies->global_flags |= GLOBAL_POLICY_DISABLE_ENGINE_RESET;
>> +
>>   	policies->is_valid = 1;
>>   }
>>   
>> +static int guc_action_policies_update(struct intel_guc *guc, u32 policy_offset)
>> +{
>> +	u32 action[] = {
>> +		INTEL_GUC_ACTION_GLOBAL_SCHED_POLICY_CHANGE,
>> +		policy_offset
>> +	};
>> +
>> +	return intel_guc_send(guc, action, ARRAY_SIZE(action));
>> +}
>> +
>> +int intel_guc_global_policies_update(struct intel_guc *guc)
>> +{
>> +	struct __guc_ads_blob *blob = guc->ads_blob;
>> +	struct intel_gt *gt = guc_to_gt(guc);
>> +	intel_wakeref_t wakeref;
>> +	int ret;
>> +
>> +	if (!blob)
>> +		return -ENOTSUPP;
>> +
>> +	GEM_BUG_ON(!blob->ads.scheduler_policies);
>> +
>> +	guc_policies_init(guc, &blob->policies);
>> +
>> +	if (!intel_guc_is_ready(guc))
>> +		return 0;
>> +
>> +	with_intel_runtime_pm(&gt->i915->runtime_pm, wakeref)
>> +		ret = guc_action_policies_update(guc, blob->ads.scheduler_policies);
>> +
>> +	return ret;
>> +}
>> +
>>   static void guc_mapping_table_init(struct intel_gt *gt,
>>   				   struct guc_gt_system_info *system_info)
>>   {
>> @@ -281,7 +321,7 @@ static void __guc_ads_init(struct intel_guc *guc)
>>   	u8 engine_class, guc_class;
>>   
>>   	/* GuC scheduling policies */
>> -	guc_policies_init(&blob->policies);
>> +	guc_policies_init(guc, &blob->policies);
>>   
>>   	/*
>>   	 * GuC expects a per-engine-class context image and size
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>> index 6188189314d5..a427336ce916 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>> @@ -873,6 +873,7 @@ void intel_guc_submission_reset_finish(struct intel_guc *guc)
>>   	GEM_WARN_ON(atomic_read(&guc->outstanding_submission_g2h));
>>   	atomic_set(&guc->outstanding_submission_g2h, 0);
>>   
>> +	intel_guc_global_policies_update(guc);
>>   	enable_submission(guc);
>>   	intel_gt_unpark_heartbeats(guc_to_gt(guc));
>>   }
>> @@ -1161,8 +1162,12 @@ static void guc_context_policy_init(struct intel_engine_cs *engine,
>>   {
>>   	desc->policy_flags = 0;
>>   
>> -	desc->execution_quantum = CONTEXT_POLICY_DEFAULT_EXECUTION_QUANTUM_US;
>> -	desc->preemption_timeout = CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US;
>> +	if (engine->flags & I915_ENGINE_WANT_FORCED_PREEMPTION)
> I can't see where we set this in this series, although I do see a
> selftest we need to fixup that sets this. Perhaps we drop this until we
> fix that selftest? Or at minimum add a comment saying it will be used in
> the future by selftests. What do you think John?
Yeah, it is only ever intended to be used by selftests. So yes, it could 
be punted down the road until the selftest patch. Likewise the 
definition for the flag, too.

John.

>> +		desc->policy_flags |= CONTEXT_POLICY_FLAG_PREEMPT_TO_IDLE;
>> +
>> +	/* NB: For both of these, zero means disabled. */
>> +	desc->execution_quantum = engine->props.timeslice_duration_ms * 1000;
>> +	desc->preemption_timeout = engine->props.preempt_timeout_ms * 1000;
>>   }
>>   
>>   static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
>> @@ -1945,13 +1950,13 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
>>   	engine->set_default_submission = guc_set_default_submission;
>>   
>>   	engine->flags |= I915_ENGINE_HAS_PREEMPTION;
>> +	engine->flags |= I915_ENGINE_HAS_TIMESLICES;
>>   
>>   	/*
>>   	 * TODO: GuC supports timeslicing and semaphores as well, but they're
> Nit, we now support timeslicing. I can fix that up in next rev.
>
> Matt
>
>>   	 * handled by the firmware so some minor tweaks are required before
>>   	 * enabling.
>>   	 *
>> -	 * engine->flags |= I915_ENGINE_HAS_TIMESLICES;
>>   	 * engine->flags |= I915_ENGINE_HAS_SEMAPHORES;
>>   	 */
>>   
>> -- 
>> 2.28.0
>>
>> _______________________________________________
>> Intel-gfx mailing list
>> Intel-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [Intel-gfx] [PATCH 09/47] drm/i915/guc: Remove GuC stage descriptor, add lrc descriptor
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 09/47] drm/i915/guc: Remove GuC stage descriptor, add lrc descriptor Matthew Brost
@ 2021-06-25 19:44   ` John Harrison
  0 siblings, 0 replies; 170+ messages in thread
From: John Harrison @ 2021-06-25 19:44 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel

On 6/24/2021 00:04, Matthew Brost wrote:
> Remove old GuC stage descriptor, add lrc descriptor which will be used
> by the new GuC interface implemented in this patch series.
>
> Cc: John Harrison <john.c.harrison@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [Intel-gfx] [PATCH 06/47] drm/i915/guc: Optimize CTB writes and reads
  2021-06-25 13:09   ` Michal Wajdeczko
  2021-06-25 18:26     ` Matthew Brost
@ 2021-06-25 20:28     ` Matthew Brost
  1 sibling, 0 replies; 170+ messages in thread
From: Matthew Brost @ 2021-06-25 20:28 UTC (permalink / raw)
  To: Michal Wajdeczko; +Cc: intel-gfx, dri-devel

On Fri, Jun 25, 2021 at 03:09:29PM +0200, Michal Wajdeczko wrote:
> 
> 
> On 24.06.2021 09:04, Matthew Brost wrote:
> > CTB writes are now in the path of command submission and should be
> > optimized for performance. Rather than reading CTB descriptor values
> > (e.g. head, tail) which could result in accesses across the PCIe bus,
> > store shadow local copies and only read/write the descriptor values when
> > absolutely necessary. Also store the current space in the each channel
> > locally.
> > 

Missed two comments, addressed below.

> > Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 76 ++++++++++++++---------
> >  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  6 ++
> >  2 files changed, 51 insertions(+), 31 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > index 27ec30b5ef47..1fd5c69358ef 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > @@ -130,6 +130,10 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc)
> >  static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb)
> >  {
> >  	ctb->broken = false;
> > +	ctb->tail = 0;
> > +	ctb->head = 0;
> > +	ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
> > +
> >  	guc_ct_buffer_desc_init(ctb->desc);
> >  }
> >  
> > @@ -383,10 +387,8 @@ static int ct_write(struct intel_guc_ct *ct,
> >  {
> >  	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> >  	struct guc_ct_buffer_desc *desc = ctb->desc;
> > -	u32 head = desc->head;
> > -	u32 tail = desc->tail;
> > +	u32 tail = ctb->tail;
> >  	u32 size = ctb->size;
> > -	u32 used;
> >  	u32 header;
> >  	u32 hxg;
> >  	u32 *cmds = ctb->cmds;
> > @@ -398,25 +400,14 @@ static int ct_write(struct intel_guc_ct *ct,
> >  	if (unlikely(desc->status))
> >  		goto corrupted;
> >  
> > -	if (unlikely((tail | head) >= size)) {
> > +#ifdef CONFIG_DRM_I915_DEBUG_GUC
> 
> since we are caching tail, we may want to check if it's sill correct:
> 
> 	tail = READ_ONCE(desc->tail);
> 	if (unlikely(tail != ctb->tail)) {
>   		CT_ERROR(ct, "Tail was modified %u != %u\n",
> 			 tail, ctb->tail);
>   		desc->status |= GUC_CTB_STATUS_MISMATCH;
>   		goto corrupted;
> 	}
> 
> and since we own the tail then we can be more strict:
> 
> 	GEM_BUG_ON(tail > size);
> 
> and then finally just check GuC head:
> 
> 	head = READ_ONCE(desc->head);
> 	if (unlikely(head >= size)) {
> 		...
> 
> > +	if (unlikely((desc->tail | desc->head) >= size)) {
> >  		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
> > -			 head, tail, size);
> > +			 desc->head, desc->tail, size);
> >  		desc->status |= GUC_CTB_STATUS_OVERFLOW;
> >  		goto corrupted;
> >  	}
> > -
> > -	/*
> > -	 * tail == head condition indicates empty. GuC FW does not support
> > -	 * using up the entire buffer to get tail == head meaning full.
> > -	 */
> > -	if (tail < head)
> > -		used = (size - head) + tail;
> > -	else
> > -		used = tail - head;
> > -
> > -	/* make sure there is a space including extra dw for the fence */
> > -	if (unlikely(used + len + 1 >= size))
> > -		return -ENOSPC;
> > +#endif
> >  
> >  	/*
> >  	 * dw0: CT header (including fence)
> > @@ -457,7 +448,9 @@ static int ct_write(struct intel_guc_ct *ct,
> >  	write_barrier(ct);
> >  
> >  	/* now update descriptor */
> > +	ctb->tail = tail;
> >  	WRITE_ONCE(desc->tail, tail);
> > +	ctb->space -= len + 1;
> 
> this magic "1" is likely GUC_CTB_MSG_MIN_LEN, right ?
> 
> >  
> >  	return 0;
> >  
> > @@ -473,7 +466,7 @@ static int ct_write(struct intel_guc_ct *ct,
> >   * @req:	pointer to pending request
> >   * @status:	placeholder for status
> >   *
> > - * For each sent request, Guc shall send bac CT response message.
> > + * For each sent request, GuC shall send back CT response message.
> >   * Our message handler will update status of tracked request once
> >   * response message with given fence is received. Wait here and
> >   * check for valid response status value.
> > @@ -520,24 +513,35 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct)
> >  	return ret;
> >  }
> >  
> > -static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
> > +static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw)
> >  {
> > -	struct guc_ct_buffer_desc *desc = ctb->desc;
> > -	u32 head = READ_ONCE(desc->head);
> > +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> > +	u32 head;
> >  	u32 space;
> >  
> > -	space = CIRC_SPACE(desc->tail, head, ctb->size);
> > +	if (ctb->space >= len_dw)
> > +		return true;
> > +
> > +	head = READ_ONCE(ctb->desc->head);
> > +	if (unlikely(head > ctb->size)) {
> > +		CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u size=%u\n",
> > +			  ctb->desc->head, ctb->desc->tail, ctb->size);
> > +		ctb->desc->status |= GUC_CTB_STATUS_OVERFLOW;
> > +		ctb->broken = true;
> > +		return false;
> > +	}
> > +
> > +	space = CIRC_SPACE(ctb->tail, head, ctb->size);
> > +	ctb->space = space;
> 
> maybe here we could mark stall_time ?
> 
>  	if (space >= len_dw)
> 		return true;
> 
> 	if (ct->stall_time == KTIME_MAX)
> 		ct->stall_time = ktime_get();
> 	return false;
> 
> >  
> >  	return space >= len_dw;
> 
> btw, maybe to avoid filling CTB to the last dword, this should be
> 
> 	space > len_dw
> 
> note the earlier comment:
> 
> /*
>  * tail == head condition indicates empty. GuC FW does not support
>  * using up the entire buffer to get tail == head meaning full.
>  */
> 
> >  }
> >  
> >  static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
> >  {
> > -	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> > -
> >  	lockdep_assert_held(&ct->ctbs.send.lock);
> >  
> > -	if (unlikely(!h2g_has_room(ctb, len_dw))) {
> > +	if (unlikely(!h2g_has_room(ct, len_dw))) {
> >  		if (ct->stall_time == KTIME_MAX)
> >  			ct->stall_time = ktime_get();
> >  
> > @@ -606,10 +610,10 @@ static int ct_send(struct intel_guc_ct *ct,
> >  	 */
> >  retry:
> >  	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
> > -	if (unlikely(!h2g_has_room(ctb, len + 1))) {
> > +	if (unlikely(!h2g_has_room(ct, len + 1))) {
> >  		if (ct->stall_time == KTIME_MAX)
> >  			ct->stall_time = ktime_get();
> > -		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> > +		spin_unlock_irqrestore(&ctb->lock, flags);
> >  
> >  		if (unlikely(ct_deadlocked(ct)))
> >  			return -EIO;
> > @@ -632,7 +636,7 @@ static int ct_send(struct intel_guc_ct *ct,
> >  
> >  	err = ct_write(ct, action, len, fence, 0);
> >  
> > -	spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> > +	spin_unlock_irqrestore(&ctb->lock, flags);
> >  
> >  	if (unlikely(err))
> >  		goto unlink;
> > @@ -720,7 +724,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
> >  {
> >  	struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv;
> >  	struct guc_ct_buffer_desc *desc = ctb->desc;
> > -	u32 head = desc->head;
> > +	u32 head = ctb->head;
> >  	u32 tail = desc->tail;
> >  	u32 size = ctb->size;
> >  	u32 *cmds = ctb->cmds;
> > @@ -735,12 +739,21 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
> >  	if (unlikely(desc->status))
> >  		goto corrupted;
> >  
> > -	if (unlikely((tail | head) >= size)) {
> > +#ifdef CONFIG_DRM_I915_DEBUG_GUC
> 
> as above we may want to check if our cached head was not modified
>

Sure.
 
> > +	if (unlikely((desc->tail | desc->head) >= size)) {
> >  		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
> >  			 head, tail, size);
> >  		desc->status |= GUC_CTB_STATUS_OVERFLOW;
> >  		goto corrupted;
> >  	}
> > +#else
> > +	if (unlikely((tail | ctb->head) >= size)) {
> > +		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
> > +			 head, tail, size);
> > +		desc->status |= GUC_CTB_STATUS_OVERFLOW;
> > +		goto corrupted;
> > +	}
> > +#endif
> >  
> >  	/* tail == head condition indicates empty */
> >  	available = tail - head;
> > @@ -790,6 +803,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
> >  	}
> >  	CT_DEBUG(ct, "received %*ph\n", 4 * len, (*msg)->msg);
> >  
> > +	ctb->head = head;
> >  	/* now update descriptor */
> >  	WRITE_ONCE(desc->head, head);
> >  
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > index 55ef7c52472f..9924335e2ee6 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > @@ -33,6 +33,9 @@ struct intel_guc;
> >   * @desc: pointer to the buffer descriptor
> >   * @cmds: pointer to the commands buffer
> >   * @size: size of the commands buffer in dwords
> > + * @head: local shadow copy of head in dwords
> > + * @tail: local shadow copy of tail in dwords
> > + * @space: local shadow copy of space in dwords
> >   * @broken: flag to indicate if descriptor data is broken
> >   */
> >  struct intel_guc_ct_buffer {
> > @@ -40,6 +43,9 @@ struct intel_guc_ct_buffer {
> >  	struct guc_ct_buffer_desc *desc;
> >  	u32 *cmds;
> >  	u32 size;
> > +	u32 tail;
> > +	u32 head;
> > +	u32 space;
> 
> in later patch this is changing to atomic_t
> maybe we can start with it ?
> 

I'd rather leave this as is. It doesn't make to use an atomic here but
G2H credits patch makes it clear why we need an atomic.

Matt 

> >  	bool broken;
> >  };
> >  
> > 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [Intel-gfx] [PATCH 08/47] drm/i915/guc: Add new GuC interface defines and structures
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 08/47] drm/i915/guc: Add new GuC interface defines and structures Matthew Brost
@ 2021-06-29 21:11   ` John Harrison
  2021-06-30  0:30     ` Matthew Brost
  0 siblings, 1 reply; 170+ messages in thread
From: John Harrison @ 2021-06-29 21:11 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel

On 6/24/2021 00:04, Matthew Brost wrote:
> Add new GuC interface defines and structures while maintaining old ones
> in parallel.
>
> Cc: John Harrison <john.c.harrison@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
I think there was some difference of opinion over whether these 
additions should be squashed in to the specific patches that first use 
them. However, on the grounds that such is basically a patch-only style 
comment and doesn't change the final product plus, we need to get this 
stuff merged efficiently and not spend forever rebasing and refactoring...

Reviewed-by: John Harrison <John.C.Harrison@Intel.com>


> ---
>   .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h  | 14 +++++++
>   drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   | 41 +++++++++++++++++++
>   2 files changed, 55 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
> index 2d6198e63ebe..57e18babdf4b 100644
> --- a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
> +++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
> @@ -124,10 +124,24 @@ enum intel_guc_action {
>   	INTEL_GUC_ACTION_FORCE_LOG_BUFFER_FLUSH = 0x302,
>   	INTEL_GUC_ACTION_ENTER_S_STATE = 0x501,
>   	INTEL_GUC_ACTION_EXIT_S_STATE = 0x502,
> +	INTEL_GUC_ACTION_GLOBAL_SCHED_POLICY_CHANGE = 0x506,
> +	INTEL_GUC_ACTION_SCHED_CONTEXT = 0x1000,
> +	INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET = 0x1001,
> +	INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE = 0x1002,
> +	INTEL_GUC_ACTION_SCHED_ENGINE_MODE_SET = 0x1003,
> +	INTEL_GUC_ACTION_SCHED_ENGINE_MODE_DONE = 0x1004,
> +	INTEL_GUC_ACTION_SET_CONTEXT_PRIORITY = 0x1005,
> +	INTEL_GUC_ACTION_SET_CONTEXT_EXECUTION_QUANTUM = 0x1006,
> +	INTEL_GUC_ACTION_SET_CONTEXT_PREEMPTION_TIMEOUT = 0x1007,
> +	INTEL_GUC_ACTION_CONTEXT_RESET_NOTIFICATION = 0x1008,
> +	INTEL_GUC_ACTION_ENGINE_FAILURE_NOTIFICATION = 0x1009,
>   	INTEL_GUC_ACTION_SLPC_REQUEST = 0x3003,
>   	INTEL_GUC_ACTION_AUTHENTICATE_HUC = 0x4000,
> +	INTEL_GUC_ACTION_REGISTER_CONTEXT = 0x4502,
> +	INTEL_GUC_ACTION_DEREGISTER_CONTEXT = 0x4503,
>   	INTEL_GUC_ACTION_REGISTER_COMMAND_TRANSPORT_BUFFER = 0x4505,
>   	INTEL_GUC_ACTION_DEREGISTER_COMMAND_TRANSPORT_BUFFER = 0x4506,
> +	INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE = 0x4600,
>   	INTEL_GUC_ACTION_LIMIT
>   };
>   
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
> index 617ec601648d..28245a217a39 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
> @@ -17,6 +17,9 @@
>   #include "abi/guc_communication_ctb_abi.h"
>   #include "abi/guc_messages_abi.h"
>   
> +#define GUC_CONTEXT_DISABLE		0
> +#define GUC_CONTEXT_ENABLE		1
> +
>   #define GUC_CLIENT_PRIORITY_KMD_HIGH	0
>   #define GUC_CLIENT_PRIORITY_HIGH	1
>   #define GUC_CLIENT_PRIORITY_KMD_NORMAL	2
> @@ -26,6 +29,9 @@
>   #define GUC_MAX_STAGE_DESCRIPTORS	1024
>   #define	GUC_INVALID_STAGE_ID		GUC_MAX_STAGE_DESCRIPTORS
>   
> +#define GUC_MAX_LRC_DESCRIPTORS		65535
> +#define	GUC_INVALID_LRC_ID		GUC_MAX_LRC_DESCRIPTORS
> +
>   #define GUC_RENDER_ENGINE		0
>   #define GUC_VIDEO_ENGINE		1
>   #define GUC_BLITTER_ENGINE		2
> @@ -237,6 +243,41 @@ struct guc_stage_desc {
>   	u64 desc_private;
>   } __packed;
>   
> +#define CONTEXT_REGISTRATION_FLAG_KMD	BIT(0)
> +
> +#define CONTEXT_POLICY_DEFAULT_EXECUTION_QUANTUM_US 1000000
> +#define CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US 500000
> +
> +/* Preempt to idle on quantum expiry */
> +#define CONTEXT_POLICY_FLAG_PREEMPT_TO_IDLE	BIT(0)
> +
> +/*
> + * GuC Context registration descriptor.
> + * FIXME: This is only required to exist during context registration.
> + * The current 1:1 between guc_lrc_desc and LRCs for the lifetime of the LRC
> + * is not required.
> + */
> +struct guc_lrc_desc {
> +	u32 hw_context_desc;
> +	u32 slpm_perf_mode_hint;	/* SPLC v1 only */
> +	u32 slpm_freq_hint;
> +	u32 engine_submit_mask;		/* In logical space */
> +	u8 engine_class;
> +	u8 reserved0[3];
> +	u32 priority;
> +	u32 process_desc;
> +	u32 wq_addr;
> +	u32 wq_size;
> +	u32 context_flags;		/* CONTEXT_REGISTRATION_* */
> +	/* Time for one workload to execute. (in micro seconds) */
> +	u32 execution_quantum;
> +	/* Time to wait for a preemption request to complete before issuing a
> +	 * reset. (in micro seconds). */
> +	u32 preemption_timeout;
> +	u32 policy_flags;		/* CONTEXT_POLICY_* */
> +	u32 reserved1[19];
> +} __packed;
> +
>   #define GUC_POWER_UNSPECIFIED	0
>   #define GUC_POWER_D0		1
>   #define GUC_POWER_D1		2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [Intel-gfx] [PATCH 10/47] drm/i915/guc: Add lrc descriptor context lookup array
  2021-06-25 17:26     ` Matthew Brost
@ 2021-06-29 21:20       ` John Harrison
  0 siblings, 0 replies; 170+ messages in thread
From: John Harrison @ 2021-06-29 21:20 UTC (permalink / raw)
  To: Matthew Brost, Michal Wajdeczko; +Cc: intel-gfx, dri-devel

On 6/25/2021 10:26, Matthew Brost wrote:
> On Fri, Jun 25, 2021 at 03:17:51PM +0200, Michal Wajdeczko wrote:
>> On 24.06.2021 09:04, Matthew Brost wrote:
>>> Add lrc descriptor context lookup array which can resolve the
>>> intel_context from the lrc descriptor index. In addition to lookup, it
>>> can determine in the lrc descriptor context is currently registered with
>>> the GuC by checking if an entry for a descriptor index is present.
>>> Future patches in the series will make use of this array.
>> s/lrc/LRC
>>
> I guess? lrc and LRC are used interchangeably throughout the current
> code base.
It is an abbreviation so LRC is technically the correct version for a 
comment. The fact that other existing comments are incorrect is not a 
valid reason to perpetuate a mistake :). Might as well fix it if you are 
going to repost the patch anyway for any other reason, but I would not 
call it a blocking issue.

Also, 'can determine in the' should be 'can determine if the'. Again, 
not exactly a blocking issue but should be fixed.

>>> Cc: John Harrison <john.c.harrison@intel.com>
>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  5 +++
>>>   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 32 +++++++++++++++++--
>>>   2 files changed, 35 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>> index b28fa54214f2..2313d9fc087b 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>> @@ -6,6 +6,8 @@
>>>   #ifndef _INTEL_GUC_H_
>>>   #define _INTEL_GUC_H_
>>>   
>>> +#include "linux/xarray.h"
>> #include <linux/xarray.h>
>>
> Yep.
>
>>> +
>>>   #include "intel_uncore.h"
>>>   #include "intel_guc_fw.h"
>>>   #include "intel_guc_fwif.h"
>>> @@ -46,6 +48,9 @@ struct intel_guc {
>>>   	struct i915_vma *lrc_desc_pool;
>>>   	void *lrc_desc_pool_vaddr;
>>>   
>>> +	/* guc_id to intel_context lookup */
>>> +	struct xarray context_lookup;
>>> +
>>>   	/* Control params for fw initialization */
>>>   	u32 params[GUC_CTL_MAX_DWORDS];
>> btw, IIRC there was idea to move most struct definitions to
>> intel_guc_types.h, is this still a plan ?
>>
> I don't ever recall discussing this but we can certainly do this. For
> what it is worth we do introduce intel_guc_submission_types.h a bit
> later. I'll make a note about intel_guc_types.h though.
>
> Matt
Yeah, my only recollection was about the submission types header. Are 
there sufficient non-submission fields in the GuC structure to warrant a 
general GuC types header?

With the commit message tweaks and #include fix mentioned above, it 
looks good to me.
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>


>>>   
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>> index a366890fb840..23a94a896a0b 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>> @@ -65,8 +65,6 @@ static inline struct i915_priolist *to_priolist(struct rb_node *rb)
>>>   	return rb_entry(rb, struct i915_priolist, node);
>>>   }
>>>   
>>> -/* Future patches will use this function */
>>> -__attribute__ ((unused))
>>>   static struct guc_lrc_desc *__get_lrc_desc(struct intel_guc *guc, u32 index)
>>>   {
>>>   	struct guc_lrc_desc *base = guc->lrc_desc_pool_vaddr;
>>> @@ -76,6 +74,15 @@ static struct guc_lrc_desc *__get_lrc_desc(struct intel_guc *guc, u32 index)
>>>   	return &base[index];
>>>   }
>>>   
>>> +static inline struct intel_context *__get_context(struct intel_guc *guc, u32 id)
>>> +{
>>> +	struct intel_context *ce = xa_load(&guc->context_lookup, id);
>>> +
>>> +	GEM_BUG_ON(id >= GUC_MAX_LRC_DESCRIPTORS);
>>> +
>>> +	return ce;
>>> +}
>>> +
>>>   static int guc_lrc_desc_pool_create(struct intel_guc *guc)
>>>   {
>>>   	u32 size;
>>> @@ -96,6 +103,25 @@ static void guc_lrc_desc_pool_destroy(struct intel_guc *guc)
>>>   	i915_vma_unpin_and_release(&guc->lrc_desc_pool, I915_VMA_RELEASE_MAP);
>>>   }
>>>   
>>> +static inline void reset_lrc_desc(struct intel_guc *guc, u32 id)
>>> +{
>>> +	struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
>>> +
>>> +	memset(desc, 0, sizeof(*desc));
>>> +	xa_erase_irq(&guc->context_lookup, id);
>>> +}
>>> +
>>> +static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
>>> +{
>>> +	return __get_context(guc, id);
>>> +}
>>> +
>>> +static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
>>> +					   struct intel_context *ce)
>>> +{
>>> +	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
>>> +}
>>> +
>>>   static void guc_add_request(struct intel_guc *guc, struct i915_request *rq)
>>>   {
>>>   	/* Leaving stub as this function will be used in future patches */
>>> @@ -400,6 +426,8 @@ int intel_guc_submission_init(struct intel_guc *guc)
>>>   	 */
>>>   	GEM_BUG_ON(!guc->lrc_desc_pool);
>>>   
>>> +	xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ);
>>> +
>>>   	return 0;
>>>   }
>>>   
>>>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [Intel-gfx] [PATCH 11/47] drm/i915/guc: Implement GuC submission tasklet
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 11/47] drm/i915/guc: Implement GuC submission tasklet Matthew Brost
@ 2021-06-29 22:04   ` John Harrison
  2021-06-30  0:41     ` Matthew Brost
  0 siblings, 1 reply; 170+ messages in thread
From: John Harrison @ 2021-06-29 22:04 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel

On 6/24/2021 00:04, Matthew Brost wrote:
> Implement GuC submission tasklet for new interface. The new GuC
> interface uses H2G to submit contexts to the GuC. Since H2G use a single
> channel, a single tasklet submits is used for the submission path.
Re-word? 'a single tasklet submits is used...' doesn't make sense.

> Also the per engine interrupt handler has been updated to disable the
> rescheduling of the physical engine tasklet, when using GuC scheduling,
> as the physical engine tasklet is no longer used.
>
> In this patch the field, guc_id, has been added to intel_context and is
> not assigned. Patches later in the series will assign this value.
>
> Cc: John Harrison<john.c.harrison@intel.com>
> Signed-off-by: Matthew Brost<matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_context_types.h |   9 +
>   drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   4 +
>   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 231 +++++++++---------
>   3 files changed, 127 insertions(+), 117 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
> index ed8c447a7346..bb6fef7eae52 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> @@ -136,6 +136,15 @@ struct intel_context {
>   	struct intel_sseu sseu;
>   
>   	u8 wa_bb_page; /* if set, page num reserved for context workarounds */
> +
> +	/* GuC scheduling state that does not require a lock. */
Maybe 'GuC scheduling state flags that do not require a lock'? Otherwise 
it just looks like a counter or something.

> +	atomic_t guc_sched_state_no_lock;
> +
> +	/*
> +	 * GuC lrc descriptor ID - Not assigned in this patch but future patches
Not a blocker but s/lrc/LRC/ would keep Michal happy ;). Although 
presumably this comment is at least being amended by later patches in 
the series.

> +	 * in the series will.
> +	 */
> +	u16 guc_id;
>   };
>   
>   #endif /* __INTEL_CONTEXT_TYPES__ */
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> index 2313d9fc087b..9ba8219475b2 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> @@ -30,6 +30,10 @@ struct intel_guc {
>   	struct intel_guc_log log;
>   	struct intel_guc_ct ct;
>   
> +	/* Global engine used to submit requests to GuC */
> +	struct i915_sched_engine *sched_engine;
> +	struct i915_request *stalled_request;
> +
>   	/* intel_guc_recv interrupt related state */
>   	spinlock_t irq_lock;
>   	unsigned int msg_enabled_mask;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 23a94a896a0b..ee933efbf0ff 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -60,6 +60,31 @@
>   
>   #define GUC_REQUEST_SIZE 64 /* bytes */
>   
> +/*
> + * Below is a set of functions which control the GuC scheduling state which do
> + * not require a lock as all state transitions are mutually exclusive. i.e. It
> + * is not possible for the context pinning code and submission, for the same
> + * context, to be executing simultaneously. We still need an atomic as it is
> + * possible for some of the bits to changing at the same time though.
> + */
> +#define SCHED_STATE_NO_LOCK_ENABLED			BIT(0)
> +static inline bool context_enabled(struct intel_context *ce)
> +{
> +	return (atomic_read(&ce->guc_sched_state_no_lock) &
> +		SCHED_STATE_NO_LOCK_ENABLED);
> +}
> +
> +static inline void set_context_enabled(struct intel_context *ce)
> +{
> +	atomic_or(SCHED_STATE_NO_LOCK_ENABLED, &ce->guc_sched_state_no_lock);
> +}
> +
> +static inline void clr_context_enabled(struct intel_context *ce)
> +{
> +	atomic_and((u32)~SCHED_STATE_NO_LOCK_ENABLED,
> +		   &ce->guc_sched_state_no_lock);
> +}
> +
>   static inline struct i915_priolist *to_priolist(struct rb_node *rb)
>   {
>   	return rb_entry(rb, struct i915_priolist, node);
> @@ -122,37 +147,29 @@ static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
>   	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
>   }
>   
> -static void guc_add_request(struct intel_guc *guc, struct i915_request *rq)
> +static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
>   {
> -	/* Leaving stub as this function will be used in future patches */
> -}
> +	int err;
> +	struct intel_context *ce = rq->context;
> +	u32 action[3];
> +	int len = 0;
> +	bool enabled = context_enabled(ce);
>   
> -/*
> - * When we're doing submissions using regular execlists backend, writing to
> - * ELSP from CPU side is enough to make sure that writes to ringbuffer pages
> - * pinned in mappable aperture portion of GGTT are visible to command streamer.
> - * Writes done by GuC on our behalf are not guaranteeing such ordering,
> - * therefore, to ensure the flush, we're issuing a POSTING READ.
> - */
> -static void flush_ggtt_writes(struct i915_vma *vma)
> -{
> -	if (i915_vma_is_map_and_fenceable(vma))
> -		intel_uncore_posting_read_fw(vma->vm->gt->uncore,
> -					     GUC_STATUS);
> -}
> +	if (!enabled) {
> +		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
> +		action[len++] = ce->guc_id;
> +		action[len++] = GUC_CONTEXT_ENABLE;
> +	} else {
> +		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT;
> +		action[len++] = ce->guc_id;
> +	}
>   
> -static void guc_submit(struct intel_engine_cs *engine,
> -		       struct i915_request **out,
> -		       struct i915_request **end)
> -{
> -	struct intel_guc *guc = &engine->gt->uc.guc;
> +	err = intel_guc_send_nb(guc, action, len);
>   
> -	do {
> -		struct i915_request *rq = *out++;
> +	if (!enabled && !err)
> +		set_context_enabled(ce);
>   
> -		flush_ggtt_writes(rq->ring->vma);
> -		guc_add_request(guc, rq);
> -	} while (out != end);
> +	return err;
>   }
>   
>   static inline int rq_prio(const struct i915_request *rq)
> @@ -160,125 +177,88 @@ static inline int rq_prio(const struct i915_request *rq)
>   	return rq->sched.attr.priority;
>   }
>   
> -static struct i915_request *schedule_in(struct i915_request *rq, int idx)
> +static int guc_dequeue_one_context(struct intel_guc *guc)
>   {
> -	trace_i915_request_in(rq, idx);
> -
> -	/*
> -	 * Currently we are not tracking the rq->context being inflight
> -	 * (ce->inflight = rq->engine). It is only used by the execlists
> -	 * backend at the moment, a similar counting strategy would be
> -	 * required if we generalise the inflight tracking.
> -	 */
> -
> -	__intel_gt_pm_get(rq->engine->gt);
> -	return i915_request_get(rq);
> -}
> -
> -static void schedule_out(struct i915_request *rq)
> -{
> -	trace_i915_request_out(rq);
> -
> -	intel_gt_pm_put_async(rq->engine->gt);
> -	i915_request_put(rq);
> -}
> -
> -static void __guc_dequeue(struct intel_engine_cs *engine)
> -{
> -	struct intel_engine_execlists * const execlists = &engine->execlists;
> -	struct i915_sched_engine * const sched_engine = engine->sched_engine;
> -	struct i915_request **first = execlists->inflight;
> -	struct i915_request ** const last_port = first + execlists->port_mask;
> -	struct i915_request *last = first[0];
> -	struct i915_request **port;
> +	struct i915_sched_engine * const sched_engine = guc->sched_engine;
> +	struct i915_request *last = NULL;
>   	bool submit = false;
>   	struct rb_node *rb;
> +	int ret;
>   
>   	lockdep_assert_held(&sched_engine->lock);
>   
> -	if (last) {
> -		if (*++first)
> -			return;
> -
> -		last = NULL;
> +	if (guc->stalled_request) {
> +		submit = true;
> +		last = guc->stalled_request;
> +		goto resubmit;
>   	}
>   
> -	/*
> -	 * We write directly into the execlists->inflight queue and don't use
> -	 * the execlists->pending queue, as we don't have a distinct switch
> -	 * event.
> -	 */
> -	port = first;
>   	while ((rb = rb_first_cached(&sched_engine->queue))) {
>   		struct i915_priolist *p = to_priolist(rb);
>   		struct i915_request *rq, *rn;
>   
>   		priolist_for_each_request_consume(rq, rn, p) {
> -			if (last && rq->context != last->context) {
> -				if (port == last_port)
> -					goto done;
> -
> -				*port = schedule_in(last,
> -						    port - execlists->inflight);
> -				port++;
> -			}
> +			if (last && rq->context != last->context)
> +				goto done;
>   
>   			list_del_init(&rq->sched.link);
> +
>   			__i915_request_submit(rq);
> -			submit = true;
> +
> +			trace_i915_request_in(rq, 0);
>   			last = rq;
> +			submit = true;
>   		}
>   
>   		rb_erase_cached(&p->node, &sched_engine->queue);
>   		i915_priolist_free(p);
>   	}
>   done:
> -	sched_engine->queue_priority_hint =
> -		rb ? to_priolist(rb)->priority : INT_MIN;
>   	if (submit) {
> -		*port = schedule_in(last, port - execlists->inflight);
> -		*++port = NULL;
> -		guc_submit(engine, first, port);
> +		last->context->lrc_reg_state[CTX_RING_TAIL] =
> +			intel_ring_set_tail(last->ring, last->tail);
> +resubmit:
> +		/*
> +		 * We only check for -EBUSY here even though it is possible for
> +		 * -EDEADLK to be returned. If -EDEADLK is returned, the GuC has
> +		 * died and a full GPU needs to be done. The hangcheck will
'full GPU reset'. Although I believe strictly speaking, it is a 'full GT 
reset'. There are other bits of the GPU beyond the GT.

> +		 * eventually detect that the GuC has died and trigger this
> +		 * reset so no need to handle -EDEADLK here.
> +		 */
> +		ret = guc_add_request(guc, last);
> +		if (ret == -EBUSY) {
> +			tasklet_schedule(&sched_engine->tasklet);
> +			guc->stalled_request = last;
> +			return false;
> +		}
>   	}
> -	execlists->active = execlists->inflight;
> +
> +	guc->stalled_request = NULL;
> +	return submit;
>   }
>   
>   static void guc_submission_tasklet(struct tasklet_struct *t)
>   {
>   	struct i915_sched_engine *sched_engine =
>   		from_tasklet(sched_engine, t, tasklet);
> -	struct intel_engine_cs * const engine = sched_engine->private_data;
> -	struct intel_engine_execlists * const execlists = &engine->execlists;
> -	struct i915_request **port, *rq;
>   	unsigned long flags;
> +	bool loop;
>   
> -	spin_lock_irqsave(&engine->sched_engine->lock, flags);
> -
> -	for (port = execlists->inflight; (rq = *port); port++) {
> -		if (!i915_request_completed(rq))
> -			break;
> -
> -		schedule_out(rq);
> -	}
> -	if (port != execlists->inflight) {
> -		int idx = port - execlists->inflight;
> -		int rem = ARRAY_SIZE(execlists->inflight) - idx;
> -		memmove(execlists->inflight, port, rem * sizeof(*port));
> -	}
> +	spin_lock_irqsave(&sched_engine->lock, flags);
>   
> -	__guc_dequeue(engine);
> +	do {
> +		loop = guc_dequeue_one_context(sched_engine->private_data);
> +	} while (loop);
>   
> -	i915_sched_engine_reset_on_empty(engine->sched_engine);
> +	i915_sched_engine_reset_on_empty(sched_engine);
>   
> -	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
> +	spin_unlock_irqrestore(&sched_engine->lock, flags);
>   }
>   
Not a blocker but it has to be said that it would be much easier to 
remove all of the above if the delete was split into a separate patch. 
Having two completely disparate threads of code interwoven in the diff 
makes it much harder to see what the new version is doing!


>   static void cs_irq_handler(struct intel_engine_cs *engine, u16 iir)
>   {
> -	if (iir & GT_RENDER_USER_INTERRUPT) {
> +	if (iir & GT_RENDER_USER_INTERRUPT)
>   		intel_engine_signal_breadcrumbs(engine);
> -		tasklet_hi_schedule(&engine->sched_engine->tasklet);
> -	}
>   }
>   
>   static void guc_reset_prepare(struct intel_engine_cs *engine)
> @@ -349,6 +329,10 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
>   	struct rb_node *rb;
>   	unsigned long flags;
>   
> +	/* Can be called during boot if GuC fails to load */
> +	if (!engine->gt)
> +		return;
> +
>   	ENGINE_TRACE(engine, "\n");
>   
>   	/*
> @@ -433,8 +417,11 @@ int intel_guc_submission_init(struct intel_guc *guc)
>   
>   void intel_guc_submission_fini(struct intel_guc *guc)
>   {
> -	if (guc->lrc_desc_pool)
> -		guc_lrc_desc_pool_destroy(guc);
> +	if (!guc->lrc_desc_pool)
> +		return;
> +
> +	guc_lrc_desc_pool_destroy(guc);
> +	i915_sched_engine_put(guc->sched_engine);
>   }
>   
>   static int guc_context_alloc(struct intel_context *ce)
> @@ -499,32 +486,32 @@ static int guc_request_alloc(struct i915_request *request)
>   	return 0;
>   }
>   
> -static inline void queue_request(struct intel_engine_cs *engine,
> +static inline void queue_request(struct i915_sched_engine *sched_engine,
>   				 struct i915_request *rq,
>   				 int prio)
>   {
>   	GEM_BUG_ON(!list_empty(&rq->sched.link));
>   	list_add_tail(&rq->sched.link,
> -		      i915_sched_lookup_priolist(engine->sched_engine, prio));
> +		      i915_sched_lookup_priolist(sched_engine, prio));
>   	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
>   }
>   
>   static void guc_submit_request(struct i915_request *rq)
>   {
> -	struct intel_engine_cs *engine = rq->engine;
> +	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
>   	unsigned long flags;
>   
>   	/* Will be called from irq-context when using foreign fences. */
> -	spin_lock_irqsave(&engine->sched_engine->lock, flags);
> +	spin_lock_irqsave(&sched_engine->lock, flags);
>   
> -	queue_request(engine, rq, rq_prio(rq));
> +	queue_request(sched_engine, rq, rq_prio(rq));
>   
> -	GEM_BUG_ON(i915_sched_engine_is_empty(engine->sched_engine));
> +	GEM_BUG_ON(i915_sched_engine_is_empty(sched_engine));
>   	GEM_BUG_ON(list_empty(&rq->sched.link));
>   
> -	tasklet_hi_schedule(&engine->sched_engine->tasklet);
> +	tasklet_hi_schedule(&sched_engine->tasklet);
>   
> -	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
> +	spin_unlock_irqrestore(&sched_engine->lock, flags);
>   }
>   
>   static void sanitize_hwsp(struct intel_engine_cs *engine)
> @@ -602,8 +589,6 @@ static void guc_release(struct intel_engine_cs *engine)
>   {
>   	engine->sanitize = NULL; /* no longer in control, nothing to sanitize */
>   
> -	tasklet_kill(&engine->sched_engine->tasklet);
> -
>   	intel_engine_cleanup_common(engine);
>   	lrc_fini_wa_ctx(engine);
>   }
> @@ -674,6 +659,7 @@ static inline void guc_default_irqs(struct intel_engine_cs *engine)
>   int intel_guc_submission_setup(struct intel_engine_cs *engine)
>   {
>   	struct drm_i915_private *i915 = engine->i915;
> +	struct intel_guc *guc = &engine->gt->uc.guc;
>   
>   	/*
>   	 * The setup relies on several assumptions (e.g. irqs always enabled)
> @@ -681,7 +667,18 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
>   	 */
>   	GEM_BUG_ON(GRAPHICS_VER(i915) < 11);
>   
> -	tasklet_setup(&engine->sched_engine->tasklet, guc_submission_tasklet);
> +	if (!guc->sched_engine) {
> +		guc->sched_engine = i915_sched_engine_create(ENGINE_VIRTUAL);
Does the re-work of the sched_engine create/destroy happen later in this 
patch series? Wasn't there issues with the wrong destroy function being 
called in certain situations? Or do those issues (and fixes) only come 
in with the virtual engine support?

John.

> +		if (!guc->sched_engine)
> +			return -ENOMEM;
> +
> +		guc->sched_engine->schedule = i915_schedule;
> +		guc->sched_engine->private_data = guc;
> +		tasklet_setup(&guc->sched_engine->tasklet,
> +			      guc_submission_tasklet);
> +	}
> +	i915_sched_engine_put(engine->sched_engine);
> +	engine->sched_engine = i915_sched_engine_get(guc->sched_engine);
>   
>   	guc_default_vfuncs(engine);
>   	guc_default_irqs(engine);

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [Intel-gfx] [PATCH 12/47] drm/i915/guc: Add bypass tasklet submission path to GuC
  2021-06-24  7:04 ` [Intel-gfx] [PATCH 12/47] drm/i915/guc: Add bypass tasklet submission path to GuC Matthew Brost
@ 2021-06-29 22:09   ` John Harrison
  0 siblings, 0 replies; 170+ messages in thread
From: John Harrison @ 2021-06-29 22:09 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel

On 6/24/2021 00:04, Matthew Brost wrote:
> Add bypass tasklet submission path to GuC. The tasklet is only used if H2G
> channel has backpresure.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>

> ---
>   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 37 +++++++++++++++----
>   1 file changed, 29 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index ee933efbf0ff..38aff83ee9fa 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -172,6 +172,12 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
>   	return err;
>   }
>   
> +static inline void guc_set_lrc_tail(struct i915_request *rq)
> +{
> +	rq->context->lrc_reg_state[CTX_RING_TAIL] =
> +		intel_ring_set_tail(rq->ring, rq->tail);
> +}
> +
>   static inline int rq_prio(const struct i915_request *rq)
>   {
>   	return rq->sched.attr.priority;
> @@ -215,8 +221,7 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
>   	}
>   done:
>   	if (submit) {
> -		last->context->lrc_reg_state[CTX_RING_TAIL] =
> -			intel_ring_set_tail(last->ring, last->tail);
> +		guc_set_lrc_tail(last);
>   resubmit:
>   		/*
>   		 * We only check for -EBUSY here even though it is possible for
> @@ -496,20 +501,36 @@ static inline void queue_request(struct i915_sched_engine *sched_engine,
>   	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
>   }
>   
> +static int guc_bypass_tasklet_submit(struct intel_guc *guc,
> +				     struct i915_request *rq)
> +{
> +	int ret;
> +
> +	__i915_request_submit(rq);
> +
> +	trace_i915_request_in(rq, 0);
> +
> +	guc_set_lrc_tail(rq);
> +	ret = guc_add_request(guc, rq);
> +	if (ret == -EBUSY)
> +		guc->stalled_request = rq;
> +
> +	return ret;
> +}
> +
>   static void guc_submit_request(struct i915_request *rq)
>   {
>   	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
> +	struct intel_guc *guc = &rq->engine->gt->uc.guc;
>   	unsigned long flags;
>   
>   	/* Will be called from irq-context when using foreign fences. */
>   	spin_lock_irqsave(&sched_engine->lock, flags);
>   
> -	queue_request(sched_engine, rq, rq_prio(rq));
> -
> -	GEM_BUG_ON(i915_sched_engine_is_empty(sched_engine));
> -	GEM_BUG_ON(list_empty(&rq->sched.link));
> -
> -	tasklet_hi_schedule(&sched_engine->tasklet);
> +	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
> +		queue_request(sched_engine, rq, rq_prio(rq));
> +	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
> +		tasklet_hi_schedule(&sched_engine->tasklet);
>   
>   	spin_unlock_irqrestore(&sched_engine->lock, flags);
>   }

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [Intel-gfx] [PATCH 08/47] drm/i915/guc: Add new GuC interface defines and structures
  2021-06-29 21:11   ` John Harrison
@ 2021-06-30  0:30     ` Matthew Brost
  0 siblings, 0 replies; 170+ messages in thread
From: Matthew Brost @ 2021-06-30  0:30 UTC (permalink / raw)
  To: John Harrison; +Cc: intel-gfx, dri-devel

On Tue, Jun 29, 2021 at 02:11:00PM -0700, John Harrison wrote:
> On 6/24/2021 00:04, Matthew Brost wrote:
> > Add new GuC interface defines and structures while maintaining old ones
> > in parallel.
> > 
> > Cc: John Harrison <john.c.harrison@intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> I think there was some difference of opinion over whether these additions
> should be squashed in to the specific patches that first use them. However,
> on the grounds that such is basically a patch-only style comment and doesn't
> change the final product plus, we need to get this stuff merged efficiently
> and not spend forever rebasing and refactoring...
> 

Agree this doesn't need to be squashed as it doesn't break anything and
also this all dead code anyways until we enable submission at the end of
the series.

Matt

> Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
> 
> 
> > ---
> >   .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h  | 14 +++++++
> >   drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   | 41 +++++++++++++++++++
> >   2 files changed, 55 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
> > index 2d6198e63ebe..57e18babdf4b 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
> > @@ -124,10 +124,24 @@ enum intel_guc_action {
> >   	INTEL_GUC_ACTION_FORCE_LOG_BUFFER_FLUSH = 0x302,
> >   	INTEL_GUC_ACTION_ENTER_S_STATE = 0x501,
> >   	INTEL_GUC_ACTION_EXIT_S_STATE = 0x502,
> > +	INTEL_GUC_ACTION_GLOBAL_SCHED_POLICY_CHANGE = 0x506,
> > +	INTEL_GUC_ACTION_SCHED_CONTEXT = 0x1000,
> > +	INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET = 0x1001,
> > +	INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE = 0x1002,
> > +	INTEL_GUC_ACTION_SCHED_ENGINE_MODE_SET = 0x1003,
> > +	INTEL_GUC_ACTION_SCHED_ENGINE_MODE_DONE = 0x1004,
> > +	INTEL_GUC_ACTION_SET_CONTEXT_PRIORITY = 0x1005,
> > +	INTEL_GUC_ACTION_SET_CONTEXT_EXECUTION_QUANTUM = 0x1006,
> > +	INTEL_GUC_ACTION_SET_CONTEXT_PREEMPTION_TIMEOUT = 0x1007,
> > +	INTEL_GUC_ACTION_CONTEXT_RESET_NOTIFICATION = 0x1008,
> > +	INTEL_GUC_ACTION_ENGINE_FAILURE_NOTIFICATION = 0x1009,
> >   	INTEL_GUC_ACTION_SLPC_REQUEST = 0x3003,
> >   	INTEL_GUC_ACTION_AUTHENTICATE_HUC = 0x4000,
> > +	INTEL_GUC_ACTION_REGISTER_CONTEXT = 0x4502,
> > +	INTEL_GUC_ACTION_DEREGISTER_CONTEXT = 0x4503,
> >   	INTEL_GUC_ACTION_REGISTER_COMMAND_TRANSPORT_BUFFER = 0x4505,
> >   	INTEL_GUC_ACTION_DEREGISTER_COMMAND_TRANSPORT_BUFFER = 0x4506,
> > +	INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE = 0x4600,
> >   	INTEL_GUC_ACTION_LIMIT
> >   };
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
> > index 617ec601648d..28245a217a39 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
> > @@ -17,6 +17,9 @@
> >   #include "abi/guc_communication_ctb_abi.h"
> >   #include "abi/guc_messages_abi.h"
> > +#define GUC_CONTEXT_DISABLE		0
> > +#define GUC_CONTEXT_ENABLE		1
> > +
> >   #define GUC_CLIENT_PRIORITY_KMD_HIGH	0
> >   #define GUC_CLIENT_PRIORITY_HIGH	1
> >   #define GUC_CLIENT_PRIORITY_KMD_NORMAL	2
> > @@ -26,6 +29,9 @@
> >   #define GUC_MAX_STAGE_DESCRIPTORS	1024
> >   #define	GUC_INVALID_STAGE_ID		GUC_MAX_STAGE_DESCRIPTORS
> > +#define GUC_MAX_LRC_DESCRIPTORS		65535
> > +#define	GUC_INVALID_LRC_ID		GUC_MAX_LRC_DESCRIPTORS
> > +
> >   #define GUC_RENDER_ENGINE		0
> >   #define GUC_VIDEO_ENGINE		1
> >   #define GUC_BLITTER_ENGINE		2
> > @@ -237,6 +243,41 @@ struct guc_stage_desc {
> >   	u64 desc_private;
> >   } __packed;
> > +#define CONTEXT_REGISTRATION_FLAG_KMD	BIT(0)
> > +
> > +#define CONTEXT_POLICY_DEFAULT_EXECUTION_QUANTUM_US 1000000
> > +#define CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US 500000
> > +
> > +/* Preempt to idle on quantum expiry */
> > +#define CONTEXT_POLICY_FLAG_PREEMPT_TO_IDLE	BIT(0)
> > +
> > +/*
> > + * GuC Context registration descriptor.
> > + * FIXME: This is only required to exist during context registration.
> > + * The current 1:1 between guc_lrc_desc and LRCs for the lifetime of the LRC
> > + * is not required.
> > + */
> > +struct guc_lrc_desc {
> > +	u32 hw_context_desc;
> > +	u32 slpm_perf_mode_hint;	/* SPLC v1 only */
> > +	u32 slpm_freq_hint;
> > +	u32 engine_submit_mask;		/* In logical space */
> > +	u8 engine_class;
> > +	u8 reserved0[3];
> > +	u32 priority;
> > +	u32 process_desc;
> > +	u32 wq_addr;
> > +	u32 wq_size;
> > +	u32 context_flags;		/* CONTEXT_REGISTRATION_* */
> > +	/* Time for one workload to execute. (in micro seconds) */
> > +	u32 execution_quantum;
> > +	/* Time to wait for a preemption request to complete before issuing a
> > +	 * reset. (in micro seconds). */
> > +	u32 preemption_timeout;
> > +	u32 policy_flags;		/* CONTEXT_POLICY_* */
> > +	u32 reserved1[19];
> > +} __packed;
> > +
> >   #define GUC_POWER_UNSPECIFIED	0
> >   #define GUC_POWER_D0		1
> >   #define GUC_POWER_D1		2
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [Intel-gfx] [PATCH 11/47] drm/i915/guc: Implement GuC submission tasklet
  2021-06-29 22:04   ` John Harrison
@ 2021-06-30  0:41     ` Matthew Brost
  0 siblings, 0 replies; 170+ messages in thread
From: Matthew Brost @ 2021-06-30  0:41 UTC (permalink / raw)
  To: John Harrison; +Cc: intel-gfx, dri-devel

On Tue, Jun 29, 2021 at 03:04:56PM -0700, John Harrison wrote:
> On 6/24/2021 00:04, Matthew Brost wrote:
> > Implement GuC submission tasklet for new interface. The new GuC
> > interface uses H2G to submit contexts to the GuC. Since H2G use a single
> > channel, a single tasklet submits is used for the submission path.
> Re-word? 'a single tasklet submits is used...' doesn't make sense.
> 

Will do.

> > Also the per engine interrupt handler has been updated to disable the
> > rescheduling of the physical engine tasklet, when using GuC scheduling,
> > as the physical engine tasklet is no longer used.
> > 
> > In this patch the field, guc_id, has been added to intel_context and is
> > not assigned. Patches later in the series will assign this value.
> > 
> > Cc: John Harrison<john.c.harrison@intel.com>
> > Signed-off-by: Matthew Brost<matthew.brost@intel.com>
> > ---
> >   drivers/gpu/drm/i915/gt/intel_context_types.h |   9 +
> >   drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   4 +
> >   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 231 +++++++++---------
> >   3 files changed, 127 insertions(+), 117 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
> > index ed8c447a7346..bb6fef7eae52 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> > @@ -136,6 +136,15 @@ struct intel_context {
> >   	struct intel_sseu sseu;
> >   	u8 wa_bb_page; /* if set, page num reserved for context workarounds */
> > +
> > +	/* GuC scheduling state that does not require a lock. */
> Maybe 'GuC scheduling state flags that do not require a lock'? Otherwise it
> just looks like a counter or something.
> 

Sure.

> > +	atomic_t guc_sched_state_no_lock;
> > +
> > +	/*
> > +	 * GuC lrc descriptor ID - Not assigned in this patch but future patches
> Not a blocker but s/lrc/LRC/ would keep Michal happy ;). Although presumably
> this comment is at least being amended by later patches in the series.
>

Will fix.

> > +	 * in the series will.
> > +	 */
> > +	u16 guc_id;
> >   };
> >   #endif /* __INTEL_CONTEXT_TYPES__ */
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > index 2313d9fc087b..9ba8219475b2 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > @@ -30,6 +30,10 @@ struct intel_guc {
> >   	struct intel_guc_log log;
> >   	struct intel_guc_ct ct;
> > +	/* Global engine used to submit requests to GuC */
> > +	struct i915_sched_engine *sched_engine;
> > +	struct i915_request *stalled_request;
> > +
> >   	/* intel_guc_recv interrupt related state */
> >   	spinlock_t irq_lock;
> >   	unsigned int msg_enabled_mask;
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index 23a94a896a0b..ee933efbf0ff 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -60,6 +60,31 @@
> >   #define GUC_REQUEST_SIZE 64 /* bytes */
> > +/*
> > + * Below is a set of functions which control the GuC scheduling state which do
> > + * not require a lock as all state transitions are mutually exclusive. i.e. It
> > + * is not possible for the context pinning code and submission, for the same
> > + * context, to be executing simultaneously. We still need an atomic as it is
> > + * possible for some of the bits to changing at the same time though.
> > + */
> > +#define SCHED_STATE_NO_LOCK_ENABLED			BIT(0)
> > +static inline bool context_enabled(struct intel_context *ce)
> > +{
> > +	return (atomic_read(&ce->guc_sched_state_no_lock) &
> > +		SCHED_STATE_NO_LOCK_ENABLED);
> > +}
> > +
> > +static inline void set_context_enabled(struct intel_context *ce)
> > +{
> > +	atomic_or(SCHED_STATE_NO_LOCK_ENABLED, &ce->guc_sched_state_no_lock);
> > +}
> > +
> > +static inline void clr_context_enabled(struct intel_context *ce)
> > +{
> > +	atomic_and((u32)~SCHED_STATE_NO_LOCK_ENABLED,
> > +		   &ce->guc_sched_state_no_lock);
> > +}
> > +
> >   static inline struct i915_priolist *to_priolist(struct rb_node *rb)
> >   {
> >   	return rb_entry(rb, struct i915_priolist, node);
> > @@ -122,37 +147,29 @@ static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
> >   	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
> >   }
> > -static void guc_add_request(struct intel_guc *guc, struct i915_request *rq)
> > +static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
> >   {
> > -	/* Leaving stub as this function will be used in future patches */
> > -}
> > +	int err;
> > +	struct intel_context *ce = rq->context;
> > +	u32 action[3];
> > +	int len = 0;
> > +	bool enabled = context_enabled(ce);
> > -/*
> > - * When we're doing submissions using regular execlists backend, writing to
> > - * ELSP from CPU side is enough to make sure that writes to ringbuffer pages
> > - * pinned in mappable aperture portion of GGTT are visible to command streamer.
> > - * Writes done by GuC on our behalf are not guaranteeing such ordering,
> > - * therefore, to ensure the flush, we're issuing a POSTING READ.
> > - */
> > -static void flush_ggtt_writes(struct i915_vma *vma)
> > -{
> > -	if (i915_vma_is_map_and_fenceable(vma))
> > -		intel_uncore_posting_read_fw(vma->vm->gt->uncore,
> > -					     GUC_STATUS);
> > -}
> > +	if (!enabled) {
> > +		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
> > +		action[len++] = ce->guc_id;
> > +		action[len++] = GUC_CONTEXT_ENABLE;
> > +	} else {
> > +		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT;
> > +		action[len++] = ce->guc_id;
> > +	}
> > -static void guc_submit(struct intel_engine_cs *engine,
> > -		       struct i915_request **out,
> > -		       struct i915_request **end)
> > -{
> > -	struct intel_guc *guc = &engine->gt->uc.guc;
> > +	err = intel_guc_send_nb(guc, action, len);
> > -	do {
> > -		struct i915_request *rq = *out++;
> > +	if (!enabled && !err)
> > +		set_context_enabled(ce);
> > -		flush_ggtt_writes(rq->ring->vma);
> > -		guc_add_request(guc, rq);
> > -	} while (out != end);
> > +	return err;
> >   }
> >   static inline int rq_prio(const struct i915_request *rq)
> > @@ -160,125 +177,88 @@ static inline int rq_prio(const struct i915_request *rq)
> >   	return rq->sched.attr.priority;
> >   }
> > -static struct i915_request *schedule_in(struct i915_request *rq, int idx)
> > +static int guc_dequeue_one_context(struct intel_guc *guc)
> >   {
> > -	trace_i915_request_in(rq, idx);
> > -
> > -	/*
> > -	 * Currently we are not tracking the rq->context being inflight
> > -	 * (ce->inflight = rq->engine). It is only used by the execlists
> > -	 * backend at the moment, a similar counting strategy would be
> > -	 * required if we generalise the inflight tracking.
> > -	 */
> > -
> > -	__intel_gt_pm_get(rq->engine->gt);
> > -	return i915_request_get(rq);
> > -}
> > -
> > -static void schedule_out(struct i915_request *rq)
> > -{
> > -	trace_i915_request_out(rq);
> > -
> > -	intel_gt_pm_put_async(rq->engine->gt);
> > -	i915_request_put(rq);
> > -}
> > -
> > -static void __guc_dequeue(struct intel_engine_cs *engine)
> > -{
> > -	struct intel_engine_execlists * const execlists = &engine->execlists;
> > -	struct i915_sched_engine * const sched_engine = engine->sched_engine;
> > -	struct i915_request **first = execlists->inflight;
> > -	struct i915_request ** const last_port = first + execlists->port_mask;
> > -	struct i915_request *last = first[0];
> > -	struct i915_request **port;
> > +	struct i915_sched_engine * const sched_engine = guc->sched_engine;
> > +	struct i915_request *last = NULL;
> >   	bool submit = false;
> >   	struct rb_node *rb;
> > +	int ret;
> >   	lockdep_assert_held(&sched_engine->lock);
> > -	if (last) {
> > -		if (*++first)
> > -			return;
> > -
> > -		last = NULL;
> > +	if (guc->stalled_request) {
> > +		submit = true;
> > +		last = guc->stalled_request;
> > +		goto resubmit;
> >   	}
> > -	/*
> > -	 * We write directly into the execlists->inflight queue and don't use
> > -	 * the execlists->pending queue, as we don't have a distinct switch
> > -	 * event.
> > -	 */
> > -	port = first;
> >   	while ((rb = rb_first_cached(&sched_engine->queue))) {
> >   		struct i915_priolist *p = to_priolist(rb);
> >   		struct i915_request *rq, *rn;
> >   		priolist_for_each_request_consume(rq, rn, p) {
> > -			if (last && rq->context != last->context) {
> > -				if (port == last_port)
> > -					goto done;
> > -
> > -				*port = schedule_in(last,
> > -						    port - execlists->inflight);
> > -				port++;
> > -			}
> > +			if (last && rq->context != last->context)
> > +				goto done;
> >   			list_del_init(&rq->sched.link);
> > +
> >   			__i915_request_submit(rq);
> > -			submit = true;
> > +
> > +			trace_i915_request_in(rq, 0);
> >   			last = rq;
> > +			submit = true;
> >   		}
> >   		rb_erase_cached(&p->node, &sched_engine->queue);
> >   		i915_priolist_free(p);
> >   	}
> >   done:
> > -	sched_engine->queue_priority_hint =
> > -		rb ? to_priolist(rb)->priority : INT_MIN;
> >   	if (submit) {
> > -		*port = schedule_in(last, port - execlists->inflight);
> > -		*++port = NULL;
> > -		guc_submit(engine, first, port);
> > +		last->context->lrc_reg_state[CTX_RING_TAIL] =
> > +			intel_ring_set_tail(last->ring, last->tail);
> > +resubmit:
> > +		/*
> > +		 * We only check for -EBUSY here even though it is possible for
> > +		 * -EDEADLK to be returned. If -EDEADLK is returned, the GuC has
> > +		 * died and a full GPU needs to be done. The hangcheck will
> 'full GPU reset'. Although I believe strictly speaking, it is a 'full GT
> reset'. There are other bits of the GPU beyond the GT.

Yep, will fix.

> 
> > +		 * eventually detect that the GuC has died and trigger this
> > +		 * reset so no need to handle -EDEADLK here.
> > +		 */
> > +		ret = guc_add_request(guc, last);
> > +		if (ret == -EBUSY) {
> > +			tasklet_schedule(&sched_engine->tasklet);
> > +			guc->stalled_request = last;
> > +			return false;
> > +		}
> >   	}
> > -	execlists->active = execlists->inflight;
> > +
> > +	guc->stalled_request = NULL;
> > +	return submit;
> >   }
> >   static void guc_submission_tasklet(struct tasklet_struct *t)
> >   {
> >   	struct i915_sched_engine *sched_engine =
> >   		from_tasklet(sched_engine, t, tasklet);
> > -	struct intel_engine_cs * const engine = sched_engine->private_data;
> > -	struct intel_engine_execlists * const execlists = &engine->execlists;
> > -	struct i915_request **port, *rq;
> >   	unsigned long flags;
> > +	bool loop;
> > -	spin_lock_irqsave(&engine->sched_engine->lock, flags);
> > -
> > -	for (port = execlists->inflight; (rq = *port); port++) {
> > -		if (!i915_request_completed(rq))
> > -			break;
> > -
> > -		schedule_out(rq);
> > -	}
> > -	if (port != execlists->inflight) {
> > -		int idx = port - execlists->inflight;
> > -		int rem = ARRAY_SIZE(execlists->inflight) - idx;
> > -		memmove(execlists->inflight, port, rem * sizeof(*port));
> > -	}
> > +	spin_lock_irqsave(&sched_engine->lock, flags);
> > -	__guc_dequeue(engine);
> > +	do {
> > +		loop = guc_dequeue_one_context(sched_engine->private_data);
> > +	} while (loop);
> > -	i915_sched_engine_reset_on_empty(engine->sched_engine);
> > +	i915_sched_engine_reset_on_empty(sched_engine);
> > -	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
> > +	spin_unlock_irqrestore(&sched_engine->lock, flags);
> >   }
> Not a blocker but it has to be said that it would be much easier to remove
> all of the above if the delete was split into a separate patch. Having two
> completely disparate threads of code interwoven in the diff makes it much
> harder to see what the new version is doing!
> 

Yes, it would be easier to read if this code was deleted in a seperate
patch. I'll keep that in mind going forward. No promises but perhaps
I'll do this in the next rev.

> 
> >   static void cs_irq_handler(struct intel_engine_cs *engine, u16 iir)
> >   {
> > -	if (iir & GT_RENDER_USER_INTERRUPT) {
> > +	if (iir & GT_RENDER_USER_INTERRUPT)
> >   		intel_engine_signal_breadcrumbs(engine);
> > -		tasklet_hi_schedule(&engine->sched_engine->tasklet);
> > -	}
> >   }
> >   static void guc_reset_prepare(struct intel_engine_cs *engine)
> > @@ -349,6 +329,10 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
> >   	struct rb_node *rb;
> >   	unsigned long flags;
> > +	/* Can be called during boot if GuC fails to load */
> > +	if (!engine->gt)
> > +		return;
> > +
> >   	ENGINE_TRACE(engine, "\n");
> >   	/*
> > @@ -433,8 +417,11 @@ int intel_guc_submission_init(struct intel_guc *guc)
> >   void intel_guc_submission_fini(struct intel_guc *guc)
> >   {
> > -	if (guc->lrc_desc_pool)
> > -		guc_lrc_desc_pool_destroy(guc);
> > +	if (!guc->lrc_desc_pool)
> > +		return;
> > +
> > +	guc_lrc_desc_pool_destroy(guc);
> > +	i915_sched_engine_put(guc->sched_engine);
> >   }
> >   static int guc_context_alloc(struct intel_context *ce)
> > @@ -499,32 +486,32 @@ static int guc_request_alloc(struct i915_request *request)
> >   	return 0;
> >   }
> > -static inline void queue_request(struct intel_engine_cs *engine,
> > +static inline void queue_request(struct i915_sched_engine *sched_engine,
> >   				 struct i915_request *rq,
> >   				 int prio)
> >   {
> >   	GEM_BUG_ON(!list_empty(&rq->sched.link));
> >   	list_add_tail(&rq->sched.link,
> > -		      i915_sched_lookup_priolist(engine->sched_engine, prio));
> > +		      i915_sched_lookup_priolist(sched_engine, prio));
> >   	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> >   }
> >   static void guc_submit_request(struct i915_request *rq)
> >   {
> > -	struct intel_engine_cs *engine = rq->engine;
> > +	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
> >   	unsigned long flags;
> >   	/* Will be called from irq-context when using foreign fences. */
> > -	spin_lock_irqsave(&engine->sched_engine->lock, flags);
> > +	spin_lock_irqsave(&sched_engine->lock, flags);
> > -	queue_request(engine, rq, rq_prio(rq));
> > +	queue_request(sched_engine, rq, rq_prio(rq));
> > -	GEM_BUG_ON(i915_sched_engine_is_empty(engine->sched_engine));
> > +	GEM_BUG_ON(i915_sched_engine_is_empty(sched_engine));
> >   	GEM_BUG_ON(list_empty(&rq->sched.link));
> > -	tasklet_hi_schedule(&engine->sched_engine->tasklet);
> > +	tasklet_hi_schedule(&sched_engine->tasklet);
> > -	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
> > +	spin_unlock_irqrestore(&sched_engine->lock, flags);
> >   }
> >   static void sanitize_hwsp(struct intel_engine_cs *engine)
> > @@ -602,8 +589,6 @@ static void guc_release(struct intel_engine_cs *engine)
> >   {
> >   	engine->sanitize = NULL; /* no longer in control, nothing to sanitize */
> > -	tasklet_kill(&engine->sched_engine->tasklet);
> > -
> >   	intel_engine_cleanup_common(engine);
> >   	lrc_fini_wa_ctx(engine);
> >   }
> > @@ -674,6 +659,7 @@ static inline void guc_default_irqs(struct intel_engine_cs *engine)
> >   int intel_guc_submission_setup(struct intel_engine_cs *engine)
> >   {
> >   	struct drm_i915_private *i915 = engine->i915;
> > +	struct intel_guc *guc = &engine->gt->uc.guc;
> >   	/*
> >   	 * The setup relies on several assumptions (e.g. irqs always enabled)
> > @@ -681,7 +667,18 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
> >   	 */
> >   	GEM_BUG_ON(GRAPHICS_VER(i915) < 11);
> > -	tasklet_setup(&engine->sched_engine->tasklet, guc_submission_tasklet);
> > +	if (!guc->sched_engine) {
> > +		guc->sched_engine = i915_sched_engine_create(ENGINE_VIRTUAL);
> Does the re-work of the sched_engine create/destroy happen later in this
> patch series? Wasn't there issues with the wrong destroy function being
> called in certain situations? Or do those issues (and fixes) only come in
> with the virtual engine support?
> 

We didn't need the destroy until we introduce the guc_submit_engine, but
that is changing after the kasan bug fix I sent out today for the
internal version of this code. I've already reworked my upstream branch
to add a destroy vfunc for sched_engine in seperate patch a bit later in
the series.

Matt

> John.
> 
> > +		if (!guc->sched_engine)
> > +			return -ENOMEM;
> > +
> > +		guc->sched_engine->schedule = i915_schedule;
> > +		guc->sched_engine->private_data = guc;
> > +		tasklet_setup(&guc->sched_engine->tasklet,
> > +			      guc_submission_tasklet);
> > +	}
> > +	i915_sched_engine_put(engine->sched_engine);
> > +	engine->sched_engine = i915_sched_engine_get(guc->sched_engine);
> >   	guc_default_vfuncs(engine);
> >   	guc_default_irqs(engine);
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [Intel-gfx] [PATCH 47/47] drm/i915/guc: Unblock GuC submission on Gen11+
  2021-06-24  7:05 ` [Intel-gfx] [PATCH 47/47] drm/i915/guc: Unblock GuC submission on Gen11+ Matthew Brost
@ 2021-06-30  8:22   ` Martin Peres
  2021-06-30 18:00     ` Matthew Brost
  2021-06-30 18:58     ` John Harrison
  2021-07-15  0:49   ` Matthew Brost
  1 sibling, 2 replies; 170+ messages in thread
From: Martin Peres @ 2021-06-30  8:22 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel



On 24/06/2021 10:05, Matthew Brost wrote:
> From: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> 
> Unblock GuC submission on Gen11+ platforms.
> 
> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/uc/intel_guc.h            |  1 +
>   drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c |  8 ++++++++
>   drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h |  3 +--
>   drivers/gpu/drm/i915/gt/uc/intel_uc.c             | 14 +++++++++-----
>   4 files changed, 19 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> index fae01dc8e1b9..77981788204f 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> @@ -54,6 +54,7 @@ struct intel_guc {
>   	struct ida guc_ids;
>   	struct list_head guc_id_list;
>   
> +	bool submission_supported;
>   	bool submission_selected;
>   
>   	struct i915_vma *ads_vma;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index a427336ce916..405339202280 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -2042,6 +2042,13 @@ void intel_guc_submission_disable(struct intel_guc *guc)
>   	/* Note: By the time we're here, GuC may have already been reset */
>   }
>   
> +static bool __guc_submission_supported(struct intel_guc *guc)
> +{
> +	/* GuC submission is unavailable for pre-Gen11 */
> +	return intel_guc_is_supported(guc) &&
> +	       INTEL_GEN(guc_to_gt(guc)->i915) >= 11;
> +}
> +
>   static bool __guc_submission_selected(struct intel_guc *guc)
>   {
>   	struct drm_i915_private *i915 = guc_to_gt(guc)->i915;
> @@ -2054,6 +2061,7 @@ static bool __guc_submission_selected(struct intel_guc *guc)
>   
>   void intel_guc_submission_init_early(struct intel_guc *guc)
>   {
> +	guc->submission_supported = __guc_submission_supported(guc);
>   	guc->submission_selected = __guc_submission_selected(guc);
>   }
>   
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
> index a2a3fad72be1..be767eb6ff71 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
> @@ -37,8 +37,7 @@ int intel_guc_wait_for_pending_msg(struct intel_guc *guc,
>   
>   static inline bool intel_guc_submission_is_supported(struct intel_guc *guc)
>   {
> -	/* XXX: GuC submission is unavailable for now */
> -	return false;
> +	return guc->submission_supported;
>   }
>   
>   static inline bool intel_guc_submission_is_wanted(struct intel_guc *guc)
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> index 7a69c3c027e9..61be0aa81492 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> @@ -34,8 +34,15 @@ static void uc_expand_default_options(struct intel_uc *uc)
>   		return;
>   	}
>   
> -	/* Default: enable HuC authentication only */
> -	i915->params.enable_guc = ENABLE_GUC_LOAD_HUC;
> +	/* Intermediate platforms are HuC authentication only */
> +	if (IS_DG1(i915) || IS_ALDERLAKE_S(i915)) {
> +		drm_dbg(&i915->drm, "Disabling GuC only due to old platform\n");

This comment does not seem accurate, given that DG1 is barely out, and 
ADL is not out yet. How about:

"Disabling GuC on untested platforms"?

> +		i915->params.enable_guc = ENABLE_GUC_LOAD_HUC;
> +		return;
> +	}
> +
> +	/* Default: enable HuC authentication and GuC submission */
> +	i915->params.enable_guc = ENABLE_GUC_LOAD_HUC | ENABLE_GUC_SUBMISSION;

This seems to be in contradiction with the GuC submission plan which 
states:

"Not enabled by default on any current platforms but can be enabled via 
modparam enable_guc".

When you rework the patch, could you please add a warning when the user 
force-enables the GuC Command Submission? Something like:

"WARNING: The user force-enabled the experimental GuC command submission 
backend using i915.enable_guc. Please disable it if experiencing 
stability issues. No bug reports will be accepted on this backend".

This should allow you to work on the backend, while communicating 
clearly to users that it is not ready just yet. Once it has matured, the 
warning can be removed.

Cheers,
Martin

>   }
>   
>   /* Reset GuC providing us with fresh state for both GuC and HuC.
> @@ -313,9 +320,6 @@ static int __uc_init(struct intel_uc *uc)
>   	if (i915_inject_probe_failure(uc_to_gt(uc)->i915))
>   		return -ENOMEM;
>   
> -	/* XXX: GuC submission is unavailable for now */
> -	GEM_BUG_ON(intel_uc_uses_guc_submission(uc));
> -
>   	ret = intel_guc_init(guc);
>   	if (ret)
>   		return ret;
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [Intel-gfx] [PATCH 47/47] drm/i915/guc: Unblock GuC submission on Gen11+
  2021-06-30  8:22   ` Martin Peres
@ 2021-06-30 18:00     ` Matthew Brost
  2021-07-01 18:24       ` Martin Peres
  2021-06-30 18:58     ` John Harrison
  1 sibling, 1 reply; 170+ messages in thread
From: Matthew Brost @ 2021-06-30 18:00 UTC (permalink / raw)
  To: Martin Peres; +Cc: intel-gfx, dri-devel

On Wed, Jun 30, 2021 at 11:22:38AM +0300, Martin Peres wrote:
> 
> 
> On 24/06/2021 10:05, Matthew Brost wrote:
> > From: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> > 
> > Unblock GuC submission on Gen11+ platforms.
> > 
> > Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
> > Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >   drivers/gpu/drm/i915/gt/uc/intel_guc.h            |  1 +
> >   drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c |  8 ++++++++
> >   drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h |  3 +--
> >   drivers/gpu/drm/i915/gt/uc/intel_uc.c             | 14 +++++++++-----
> >   4 files changed, 19 insertions(+), 7 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > index fae01dc8e1b9..77981788204f 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > @@ -54,6 +54,7 @@ struct intel_guc {
> >   	struct ida guc_ids;
> >   	struct list_head guc_id_list;
> > +	bool submission_supported;
> >   	bool submission_selected;
> >   	struct i915_vma *ads_vma;
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index a427336ce916..405339202280 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -2042,6 +2042,13 @@ void intel_guc_submission_disable(struct intel_guc *guc)
> >   	/* Note: By the time we're here, GuC may have already been reset */
> >   }
> > +static bool __guc_submission_supported(struct intel_guc *guc)
> > +{
> > +	/* GuC submission is unavailable for pre-Gen11 */
> > +	return intel_guc_is_supported(guc) &&
> > +	       INTEL_GEN(guc_to_gt(guc)->i915) >= 11;
> > +}
> > +
> >   static bool __guc_submission_selected(struct intel_guc *guc)
> >   {
> >   	struct drm_i915_private *i915 = guc_to_gt(guc)->i915;
> > @@ -2054,6 +2061,7 @@ static bool __guc_submission_selected(struct intel_guc *guc)
> >   void intel_guc_submission_init_early(struct intel_guc *guc)
> >   {
> > +	guc->submission_supported = __guc_submission_supported(guc);
> >   	guc->submission_selected = __guc_submission_selected(guc);
> >   }
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
> > index a2a3fad72be1..be767eb6ff71 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
> > @@ -37,8 +37,7 @@ int intel_guc_wait_for_pending_msg(struct intel_guc *guc,
> >   static inline bool intel_guc_submission_is_supported(struct intel_guc *guc)
> >   {
> > -	/* XXX: GuC submission is unavailable for now */
> > -	return false;
> > +	return guc->submission_supported;
> >   }
> >   static inline bool intel_guc_submission_is_wanted(struct intel_guc *guc)
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> > index 7a69c3c027e9..61be0aa81492 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> > @@ -34,8 +34,15 @@ static void uc_expand_default_options(struct intel_uc *uc)
> >   		return;
> >   	}
> > -	/* Default: enable HuC authentication only */
> > -	i915->params.enable_guc = ENABLE_GUC_LOAD_HUC;
> > +	/* Intermediate platforms are HuC authentication only */
> > +	if (IS_DG1(i915) || IS_ALDERLAKE_S(i915)) {
> > +		drm_dbg(&i915->drm, "Disabling GuC only due to old platform\n");
> 
> This comment does not seem accurate, given that DG1 is barely out, and ADL
> is not out yet. How about:
> 
> "Disabling GuC on untested platforms"?

This isn't my comment but it seems right to me. AFAIK this describes the
current PR but it is subject to change (i.e. we may enable GuC on DG1 by
default at some point).

> 
> > +		i915->params.enable_guc = ENABLE_GUC_LOAD_HUC;
> > +		return;
> > +	}
> > +
> > +	/* Default: enable HuC authentication and GuC submission */
> > +	i915->params.enable_guc = ENABLE_GUC_LOAD_HUC | ENABLE_GUC_SUBMISSION;
> 
> This seems to be in contradiction with the GuC submission plan which states:
> 
> "Not enabled by default on any current platforms but can be enabled via
> modparam enable_guc".
> 

I don't believe any current platform gets this point where GuC
submission would be enabled by default. The first would be ADL-P which
isn't out yet. 

> When you rework the patch, could you please add a warning when the user
> force-enables the GuC Command Submission? Something like:
> 
> "WARNING: The user force-enabled the experimental GuC command submission
> backend using i915.enable_guc. Please disable it if experiencing stability
> issues. No bug reports will be accepted on this backend".
> 
> This should allow you to work on the backend, while communicating clearly to
> users that it is not ready just yet. Once it has matured, the warning can be
> removed.

This is a good idea but the only issue I see this message blowing up CI.
We plan to enable GuC submission, via a modparam, on several platforms
(e.g. TGL) where TGL isn't the PR in CI. I think if is a debug level
message CI should be happy but I'll double check on this. 

Matt

> 
> Cheers,
> Martin
> 
> >   }
> >   /* Reset GuC providing us with fresh state for both GuC and HuC.
> > @@ -313,9 +320,6 @@ static int __uc_init(struct intel_uc *uc)
> >   	if (i915_inject_probe_failure(uc_to_gt(uc)->i915))
> >   		return -ENOMEM;
> > -	/* XXX: GuC submission is unavailable for now */
> > -	GEM_BUG_ON(intel_uc_uses_guc_submission(uc));
> > -
> >   	ret = intel_guc_init(guc);
> >   	if (ret)
> >   		return ret;
> > 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [Intel-gfx] [PATCH 47/47] drm/i915/guc: Unblock GuC submission on Gen11+
  2021-06-30  8:22   ` Martin Peres
  2021-06-30 18:00     ` Matthew Brost
@ 2021-06-30 18:58     ` John Harrison
  2021-07-01  8:14       ` Pekka Paalanen
  1 sibling, 1 reply; 170+ messages in thread
From: John Harrison @ 2021-06-30 18:58 UTC (permalink / raw)
  To: Martin Peres, Matthew Brost, intel-gfx, dri-devel

On 6/30/2021 01:22, Martin Peres wrote:
> On 24/06/2021 10:05, Matthew Brost wrote:
>> From: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>>
>> Unblock GuC submission on Gen11+ platforms.
>>
>> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
>> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>> ---
>>   drivers/gpu/drm/i915/gt/uc/intel_guc.h            |  1 +
>>   drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c |  8 ++++++++
>>   drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h |  3 +--
>>   drivers/gpu/drm/i915/gt/uc/intel_uc.c             | 14 +++++++++-----
>>   4 files changed, 19 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h 
>> b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>> index fae01dc8e1b9..77981788204f 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>> @@ -54,6 +54,7 @@ struct intel_guc {
>>       struct ida guc_ids;
>>       struct list_head guc_id_list;
>>   +    bool submission_supported;
>>       bool submission_selected;
>>         struct i915_vma *ads_vma;
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>> index a427336ce916..405339202280 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>> @@ -2042,6 +2042,13 @@ void intel_guc_submission_disable(struct 
>> intel_guc *guc)
>>       /* Note: By the time we're here, GuC may have already been 
>> reset */
>>   }
>>   +static bool __guc_submission_supported(struct intel_guc *guc)
>> +{
>> +    /* GuC submission is unavailable for pre-Gen11 */
>> +    return intel_guc_is_supported(guc) &&
>> +           INTEL_GEN(guc_to_gt(guc)->i915) >= 11;
>> +}
>> +
>>   static bool __guc_submission_selected(struct intel_guc *guc)
>>   {
>>       struct drm_i915_private *i915 = guc_to_gt(guc)->i915;
>> @@ -2054,6 +2061,7 @@ static bool __guc_submission_selected(struct 
>> intel_guc *guc)
>>     void intel_guc_submission_init_early(struct intel_guc *guc)
>>   {
>> +    guc->submission_supported = __guc_submission_supported(guc);
>>       guc->submission_selected = __guc_submission_selected(guc);
>>   }
>>   diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h 
>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
>> index a2a3fad72be1..be767eb6ff71 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
>> @@ -37,8 +37,7 @@ int intel_guc_wait_for_pending_msg(struct intel_guc 
>> *guc,
>>     static inline bool intel_guc_submission_is_supported(struct 
>> intel_guc *guc)
>>   {
>> -    /* XXX: GuC submission is unavailable for now */
>> -    return false;
>> +    return guc->submission_supported;
>>   }
>>     static inline bool intel_guc_submission_is_wanted(struct 
>> intel_guc *guc)
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c 
>> b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
>> index 7a69c3c027e9..61be0aa81492 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
>> @@ -34,8 +34,15 @@ static void uc_expand_default_options(struct 
>> intel_uc *uc)
>>           return;
>>       }
>>   -    /* Default: enable HuC authentication only */
>> -    i915->params.enable_guc = ENABLE_GUC