* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for CT changes required for GuC submission (rev3)
  2021-07-06 22:20 ` [Intel-gfx] " Matthew Brost
@ 2021-07-06 22:10 ` Patchwork
  -1 siblings, 0 replies; 54+ messages in thread
From: Patchwork @ 2021-07-06 22:10 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-gfx

== Series Details ==

Series: CT changes required for GuC submission (rev3)
URL   : https://patchwork.freedesktop.org/series/91943/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
d6dbe98e9087 drm/i915/guc: Relax CTB response timeout
5a166441c701 drm/i915/guc: Improve error message for unsolicited CT response
48bbc2746b99 drm/i915/guc: Increase size of CTB buffers
7f796c01cba1 drm/i915/guc: Add non blocking CTB send function
020f0bee0513 drm/i915/guc: Add stall timer to non blocking CTB send function
ed6225a350bc drm/i915/guc: Optimize CTB writes and reads
04ca74f95089 drm/i915/guc: Module load failure test for CT buffer creation
-:38: WARNING:FROM_SIGN_OFF_MISMATCH: From:/Signed-off-by: email address mismatch: 'From: John Harrison <John.C.Harrison@Intel.com>' != 'Signed-off-by: John Harrison <john.c.harrison@intel.com>'

total: 0 errors, 1 warnings, 0 checks, 20 lines checked


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [Intel-gfx] ✗ Fi.CI.SPARSE: warning for CT changes required for GuC submission (rev3)
  2021-07-06 22:20 ` [Intel-gfx] " Matthew Brost
@ 2021-07-06 22:11 ` Patchwork
  -1 siblings, 0 replies; 54+ messages in thread
From: Patchwork @ 2021-07-06 22:11 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-gfx

== Series Details ==

Series: CT changes required for GuC submission (rev3)
URL   : https://patchwork.freedesktop.org/series/91943/
State : warning

== Summary ==

$ dim sparse --fast origin/drm-tip
Sparse version: v0.6.2
Fast mode used, each commit won't be checked separately.
-
+drivers/gpu/drm/i915/display/intel_display.c:1896:21:    expected struct i915_vma *[assigned] vma
+drivers/gpu/drm/i915/display/intel_display.c:1896:21:    got void [noderef] __iomem *[assigned] iomem
+drivers/gpu/drm/i915/display/intel_display.c:1896:21: warning: incorrect type in assignment (different address spaces)
+drivers/gpu/drm/i915/gt/intel_engine_stats.h:27:9: warning: trying to copy expression type 31
+drivers/gpu/drm/i915/gt/intel_engine_stats.h:27:9: warning: trying to copy expression type 31
+drivers/gpu/drm/i915/gt/intel_engine_stats.h:27:9: warning: trying to copy expression type 31
+drivers/gpu/drm/i915/gt/intel_engine_stats.h:32:9: warning: trying to copy expression type 31
+drivers/gpu/drm/i915/gt/intel_engine_stats.h:32:9: warning: trying to copy expression type 31
+drivers/gpu/drm/i915/gt/intel_engine_stats.h:49:9: warning: trying to copy expression type 31
+drivers/gpu/drm/i915/gt/intel_engine_stats.h:49:9: warning: trying to copy expression type 31
+drivers/gpu/drm/i915/gt/intel_engine_stats.h:49:9: warning: trying to copy expression type 31
+drivers/gpu/drm/i915/gt/intel_engine_stats.h:56:9: warning: trying to copy expression type 31
+drivers/gpu/drm/i915/gt/intel_engine_stats.h:56:9: warning: trying to copy expression type 31
+drivers/gpu/drm/i915/gt/intel_reset.c:1396:5: warning: context imbalance in 'intel_gt_reset_trylock' - different lock contexts for basic block
+drivers/gpu/drm/i915/gt/intel_ring_submission.c:1210:24: warning: Using plain integer as NULL pointer
+drivers/gpu/drm/i915/i915_perf.c:1434:15: warning: memset with byte count of 16777216
+drivers/gpu/drm/i915/i915_perf.c:1488:15: warning: memset with byte count of 16777216
+./include/asm-generic/bitops/find.h:112:45: warning: shift count is negative (-262080)
+./include/asm-generic/bitops/find.h:32:31: warning: shift count is negative (-262080)
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'fwtable_read16' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'fwtable_read32' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'fwtable_read64' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'fwtable_read8' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'fwtable_write16' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'fwtable_write32' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'fwtable_write8' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen11_fwtable_read16' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen11_fwtable_read32' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen11_fwtable_read64' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen11_fwtable_read8' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen11_fwtable_write16' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen11_fwtable_write32' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen11_fwtable_write8' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen12_fwtable_read16' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen12_fwtable_read32' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen12_fwtable_read64' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen12_fwtable_read8' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen12_fwtable_write16' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen12_fwtable_write32' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen12_fwtable_write8' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen6_read16' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen6_read32' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen6_read64' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen6_read8' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen6_write16' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen6_write32' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen6_write8' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen8_write16' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen8_write32' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen8_write8' - different lock contexts for basic block




* [PATCH 0/7] CT changes required for GuC submission
@ 2021-07-06 22:20 ` Matthew Brost
  0 siblings, 0 replies; 54+ messages in thread
From: Matthew Brost @ 2021-07-06 22:20 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: john.c.harrison, Michal.Wajdeczko

As part of enabling GuC submission, discussed in [1], [2], and [3], we
need to optimize and update the CT code as it is now in the critical
path of submission. This series includes the patches to do that, namely
the first 7 patches from [3]. The patches should have addressed all the
feedback in [3] and should be ready to merge once CI returns and we get
a few more RBs.

v2: Fix checkpatch warning, address a couple of Michal's comments
v3: Address John Harrison's comments

Signed-off-by: Matthew Brost <matthew.brost@intel.com>

[1] https://patchwork.freedesktop.org/series/89844/
[2] https://patchwork.freedesktop.org/series/91417/
[3] https://patchwork.freedesktop.org/series/91840/

John Harrison (1):
  drm/i915/guc: Module load failure test for CT buffer creation

Matthew Brost (6):
  drm/i915/guc: Relax CTB response timeout
  drm/i915/guc: Improve error message for unsolicited CT response
  drm/i915/guc: Increase size of CTB buffers
  drm/i915/guc: Add non blocking CTB send function
  drm/i915/guc: Add stall timer to non blocking CTB send function
  drm/i915/guc: Optimize CTB writes and reads

 .../gt/uc/abi/guc_communication_ctb_abi.h     |   3 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  11 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     | 249 +++++++++++++++---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h     |  14 +-
 4 files changed, 231 insertions(+), 46 deletions(-)

-- 
2.28.0




* [PATCH 1/7] drm/i915/guc: Relax CTB response timeout
  2021-07-06 22:20 ` [Intel-gfx] " Matthew Brost
@ 2021-07-06 22:20   ` Matthew Brost
  -1 siblings, 0 replies; 54+ messages in thread
From: Matthew Brost @ 2021-07-06 22:20 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: john.c.harrison, Michal.Wajdeczko

In an upcoming patch we will allow more CTB requests to be sent in
parallel to the GuC for processing, so we shouldn't assume any longer
that the GuC will always reply within 10 ms.

Use a bigger hardcoded value of 1 s instead.

v2: Add CONFIG_DRM_I915_GUC_CTB_TIMEOUT config option
v3:
 (Daniel Vetter)
  - Use hardcoded value of 1s rather than config option
v4:
 (Michal)
  - Use defines for timeout values

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 43409044528e..b86575b99537 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -474,14 +474,18 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
 	/*
 	 * Fast commands should complete in less than 10us, so sample quickly
 	 * up to that length of time, then switch to a slower sleep-wait loop.
-	 * No GuC command should ever take longer than 10ms.
+	 * No GuC command should ever take longer than 10ms but many GuC
+	 * commands can be inflight at time, so use a 1s timeout on the slower
+	 * sleep-wait loop.
 	 */
+#define GUC_CTB_RESPONSE_TIMEOUT_SHORT_MS 10
+#define GUC_CTB_RESPONSE_TIMEOUT_LONG_MS 1000
 #define done \
 	(FIELD_GET(GUC_HXG_MSG_0_ORIGIN, READ_ONCE(req->status)) == \
 	 GUC_HXG_ORIGIN_GUC)
-	err = wait_for_us(done, 10);
+	err = wait_for_us(done, GUC_CTB_RESPONSE_TIMEOUT_SHORT_MS);
 	if (err)
-		err = wait_for(done, 10);
+		err = wait_for(done, GUC_CTB_RESPONSE_TIMEOUT_LONG_MS);
 #undef done
 
 	if (unlikely(err))
-- 
2.28.0




* [PATCH 2/7] drm/i915/guc: Improve error message for unsolicited CT response
  2021-07-06 22:20 ` [Intel-gfx] " Matthew Brost
@ 2021-07-06 22:20   ` Matthew Brost
  -1 siblings, 0 replies; 54+ messages in thread
From: Matthew Brost @ 2021-07-06 22:20 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: john.c.harrison, Michal.Wajdeczko

Improve the error message when an unsolicited CT response is received
by printing the fence that couldn't be found, the last fence, and all
requests still awaiting a response.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index b86575b99537..80db59b45c45 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -732,12 +732,16 @@ static int ct_handle_response(struct intel_guc_ct *ct, struct ct_incoming_msg *r
 		found = true;
 		break;
 	}
-	spin_unlock_irqrestore(&ct->requests.lock, flags);
-
 	if (!found) {
 		CT_ERROR(ct, "Unsolicited response (fence %u)\n", fence);
-		return -ENOKEY;
+		CT_ERROR(ct, "Could not find fence=%u, last_fence=%u\n", fence,
+			 ct->requests.last_fence);
+		list_for_each_entry(req, &ct->requests.pending, link)
+			CT_ERROR(ct, "request %u awaits response\n",
+				 req->fence);
+		err = -ENOKEY;
 	}
+	spin_unlock_irqrestore(&ct->requests.lock, flags);
 
 	if (unlikely(err))
 		return err;
-- 
2.28.0




* [PATCH 3/7] drm/i915/guc: Increase size of CTB buffers
  2021-07-06 22:20 ` [Intel-gfx] " Matthew Brost
@ 2021-07-06 22:20   ` Matthew Brost
  -1 siblings, 0 replies; 54+ messages in thread
From: Matthew Brost @ 2021-07-06 22:20 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: john.c.harrison, Michal.Wajdeczko

With the introduction of non-blocking CTBs more than one CTB can be in
flight at a time. Increasing the size of the CTBs should reduce how
often software hits the case where no space is available in the CTB
buffer.

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 80db59b45c45..43e03aa2dde8 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -58,11 +58,16 @@ static inline struct drm_device *ct_to_drm(struct intel_guc_ct *ct)
  *      +--------+-----------------------------------------------+------+
  *
  * Size of each `CT Buffer`_ must be multiple of 4K.
- * As we don't expect too many messages, for now use minimum sizes.
+ * We don't expect too many messages in flight at any time, unless we are
+ * using the GuC submission. In that case each request requires a minimum
+ * 2 dwords which gives us a maximum 256 queue'd requests. Hopefully this
+ * enough space to avoid backpressure on the driver. We increase the size
+ * of the receive buffer (relative to the send) to ensure a G2H response
+ * CTB has a landing spot.
  */
 #define CTB_DESC_SIZE		ALIGN(sizeof(struct guc_ct_buffer_desc), SZ_2K)
 #define CTB_H2G_BUFFER_SIZE	(SZ_4K)
-#define CTB_G2H_BUFFER_SIZE	(SZ_4K)
+#define CTB_G2H_BUFFER_SIZE	(4 * CTB_H2G_BUFFER_SIZE)
 
 struct ct_request {
 	struct list_head link;
@@ -643,7 +648,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 	/* beware of buffer wrap case */
 	if (unlikely(available < 0))
 		available += size;
-	CT_DEBUG(ct, "available %d (%u:%u)\n", available, head, tail);
+	CT_DEBUG(ct, "available %d (%u:%u:%u)\n", available, head, tail, size);
 	GEM_BUG_ON(available < 0);
 
 	header = cmds[head];
-- 
2.28.0




* [PATCH 4/7] drm/i915/guc: Add non blocking CTB send function
  2021-07-06 22:20 ` [Intel-gfx] " Matthew Brost
@ 2021-07-06 22:20   ` Matthew Brost
  -1 siblings, 0 replies; 54+ messages in thread
From: Matthew Brost @ 2021-07-06 22:20 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: john.c.harrison, Michal.Wajdeczko

Add a non-blocking CTB send function, intel_guc_send_nb. GuC submission
will send CTBs in the critical path and does not need to wait for these
CTBs to complete before moving on, hence the need for this new function.

Non-blocking CTB sends must now have a flow control mechanism to ensure
the buffer isn't overrun. A lazy spin wait is used, as we believe the
flow control condition should be rare with a properly sized buffer.

The function, intel_guc_send_nb, is exported in this patch but unused.
Several patches later in the series make use of this function.

v2:
 (Michal)
  - Use define for H2G room calculations
  - Move INTEL_GUC_SEND_NB define
 (Daniel Vetter)
  - Use msleep_interruptible rather than cond_resched
v3:
 (Michal)
  - Move includes to following patch
  - s/INTEL_GUC_SEND_NB/INTEL_GUC_CT_SEND_NB/g
v4:
 (John H)
  - Update comment, add type local variable

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
---
 .../gt/uc/abi/guc_communication_ctb_abi.h     |  3 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        | 11 ++-
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     | 88 ++++++++++++++++---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h     |  4 +-
 4 files changed, 91 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h
index e933ca02d0eb..99e1fad5ca20 100644
--- a/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h
+++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h
@@ -79,7 +79,8 @@ static_assert(sizeof(struct guc_ct_buffer_desc) == 64);
  *  +---+-------+--------------------------------------------------------------+
  */
 
-#define GUC_CTB_MSG_MIN_LEN			1u
+#define GUC_CTB_HDR_LEN				1u
+#define GUC_CTB_MSG_MIN_LEN			GUC_CTB_HDR_LEN
 #define GUC_CTB_MSG_MAX_LEN			256u
 #define GUC_CTB_MSG_0_FENCE			(0xffff << 16)
 #define GUC_CTB_MSG_0_FORMAT			(0xf << 12)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 4abc59f6f3cd..72e4653222e2 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -74,7 +74,14 @@ static inline struct intel_guc *log_to_guc(struct intel_guc_log *log)
 static
 inline int intel_guc_send(struct intel_guc *guc, const u32 *action, u32 len)
 {
-	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0);
+	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0, 0);
+}
+
+static
+inline int intel_guc_send_nb(struct intel_guc *guc, const u32 *action, u32 len)
+{
+	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0,
+				 INTEL_GUC_CT_SEND_NB);
 }
 
 static inline int
@@ -82,7 +89,7 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len,
 			   u32 *response_buf, u32 response_buf_size)
 {
 	return intel_guc_ct_send(&guc->ct, action, len,
-				 response_buf, response_buf_size);
+				 response_buf, response_buf_size, 0);
 }
 
 static inline void intel_guc_to_host_event_handler(struct intel_guc *guc)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 43e03aa2dde8..3d6cba8d91ad 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -3,6 +3,8 @@
  * Copyright © 2016-2019 Intel Corporation
  */
 
+#include <linux/circ_buf.h>
+
 #include "i915_drv.h"
 #include "intel_guc_ct.h"
 #include "gt/intel_gt.h"
@@ -373,7 +375,7 @@ static void write_barrier(struct intel_guc_ct *ct)
 static int ct_write(struct intel_guc_ct *ct,
 		    const u32 *action,
 		    u32 len /* in dwords */,
-		    u32 fence)
+		    u32 fence, u32 flags)
 {
 	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
 	struct guc_ct_buffer_desc *desc = ctb->desc;
@@ -383,6 +385,7 @@ static int ct_write(struct intel_guc_ct *ct,
 	u32 used;
 	u32 header;
 	u32 hxg;
+	u32 type;
 	u32 *cmds = ctb->cmds;
 	unsigned int i;
 
@@ -408,8 +411,8 @@ static int ct_write(struct intel_guc_ct *ct,
 	else
 		used = tail - head;
 
-	/* make sure there is a space including extra dw for the fence */
-	if (unlikely(used + len + 1 >= size))
+	/* make sure there is a space including extra dw for the header */
+	if (unlikely(used + len + GUC_CTB_HDR_LEN >= size))
 		return -ENOSPC;
 
 	/*
@@ -421,9 +424,11 @@ static int ct_write(struct intel_guc_ct *ct,
 		 FIELD_PREP(GUC_CTB_MSG_0_NUM_DWORDS, len) |
 		 FIELD_PREP(GUC_CTB_MSG_0_FENCE, fence);
 
-	hxg = FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
-	      FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
-			 GUC_HXG_REQUEST_MSG_0_DATA0, action[0]);
+	type = (flags & INTEL_GUC_CT_SEND_NB) ? GUC_HXG_TYPE_EVENT :
+		GUC_HXG_TYPE_REQUEST;
+	hxg = FIELD_PREP(GUC_HXG_MSG_0_TYPE, type) |
+		FIELD_PREP(GUC_HXG_EVENT_MSG_0_ACTION |
+			   GUC_HXG_EVENT_MSG_0_DATA0, action[0]);
 
* [Intel-gfx] [PATCH 4/7] drm/i915/guc: Add non blocking CTB send function
@ 2021-07-06 22:20   ` Matthew Brost
  0 siblings, 0 replies; 54+ messages in thread
From: Matthew Brost @ 2021-07-06 22:20 UTC (permalink / raw)
  To: intel-gfx, dri-devel

Add a non-blocking CTB send function, intel_guc_send_nb. GuC submission
will send CTBs in the critical path and does not need to wait for these
CTBs to complete before moving on, hence the need for this new function.

Non-blocking CTB sends require a flow control mechanism to ensure the
buffer isn't overrun. A lazy spin wait is used, as the flow control
condition should be rare with a properly sized buffer.

The function, intel_guc_send_nb, is exported in this patch but unused.
Several patches later in the series make use of this function.

v2:
 (Michal)
  - Use define for H2G room calculations
  - Move INTEL_GUC_SEND_NB define
 (Daniel Vetter)
  - Use msleep_interruptible rather than cond_resched
v3:
 (Michal)
  - Move includes to following patch
  - s/INTEL_GUC_SEND_NB/INTEL_GUC_CT_SEND_NB/g
v4:
 (John H)
  - Update comment, add type local variable

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
---
 .../gt/uc/abi/guc_communication_ctb_abi.h     |  3 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        | 11 ++-
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     | 88 ++++++++++++++++---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h     |  4 +-
 4 files changed, 91 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h
index e933ca02d0eb..99e1fad5ca20 100644
--- a/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h
+++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h
@@ -79,7 +79,8 @@ static_assert(sizeof(struct guc_ct_buffer_desc) == 64);
  *  +---+-------+--------------------------------------------------------------+
  */
 
-#define GUC_CTB_MSG_MIN_LEN			1u
+#define GUC_CTB_HDR_LEN				1u
+#define GUC_CTB_MSG_MIN_LEN			GUC_CTB_HDR_LEN
 #define GUC_CTB_MSG_MAX_LEN			256u
 #define GUC_CTB_MSG_0_FENCE			(0xffff << 16)
 #define GUC_CTB_MSG_0_FORMAT			(0xf << 12)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 4abc59f6f3cd..72e4653222e2 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -74,7 +74,14 @@ static inline struct intel_guc *log_to_guc(struct intel_guc_log *log)
 static
 inline int intel_guc_send(struct intel_guc *guc, const u32 *action, u32 len)
 {
-	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0);
+	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0, 0);
+}
+
+static
+inline int intel_guc_send_nb(struct intel_guc *guc, const u32 *action, u32 len)
+{
+	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0,
+				 INTEL_GUC_CT_SEND_NB);
 }
 
 static inline int
@@ -82,7 +89,7 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len,
 			   u32 *response_buf, u32 response_buf_size)
 {
 	return intel_guc_ct_send(&guc->ct, action, len,
-				 response_buf, response_buf_size);
+				 response_buf, response_buf_size, 0);
 }
 
 static inline void intel_guc_to_host_event_handler(struct intel_guc *guc)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 43e03aa2dde8..3d6cba8d91ad 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -3,6 +3,8 @@
  * Copyright © 2016-2019 Intel Corporation
  */
 
+#include <linux/circ_buf.h>
+
 #include "i915_drv.h"
 #include "intel_guc_ct.h"
 #include "gt/intel_gt.h"
@@ -373,7 +375,7 @@ static void write_barrier(struct intel_guc_ct *ct)
 static int ct_write(struct intel_guc_ct *ct,
 		    const u32 *action,
 		    u32 len /* in dwords */,
-		    u32 fence)
+		    u32 fence, u32 flags)
 {
 	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
 	struct guc_ct_buffer_desc *desc = ctb->desc;
@@ -383,6 +385,7 @@ static int ct_write(struct intel_guc_ct *ct,
 	u32 used;
 	u32 header;
 	u32 hxg;
+	u32 type;
 	u32 *cmds = ctb->cmds;
 	unsigned int i;
 
@@ -408,8 +411,8 @@ static int ct_write(struct intel_guc_ct *ct,
 	else
 		used = tail - head;
 
-	/* make sure there is a space including extra dw for the fence */
-	if (unlikely(used + len + 1 >= size))
+	/* make sure there is a space including extra dw for the header */
+	if (unlikely(used + len + GUC_CTB_HDR_LEN >= size))
 		return -ENOSPC;
 
 	/*
@@ -421,9 +424,11 @@ static int ct_write(struct intel_guc_ct *ct,
 		 FIELD_PREP(GUC_CTB_MSG_0_NUM_DWORDS, len) |
 		 FIELD_PREP(GUC_CTB_MSG_0_FENCE, fence);
 
-	hxg = FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
-	      FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
-			 GUC_HXG_REQUEST_MSG_0_DATA0, action[0]);
+	type = (flags & INTEL_GUC_CT_SEND_NB) ? GUC_HXG_TYPE_EVENT :
+		GUC_HXG_TYPE_REQUEST;
+	hxg = FIELD_PREP(GUC_HXG_MSG_0_TYPE, type) |
+		FIELD_PREP(GUC_HXG_EVENT_MSG_0_ACTION |
+			   GUC_HXG_EVENT_MSG_0_DATA0, action[0]);
 
 	CT_DEBUG(ct, "writing (tail %u) %*ph %*ph %*ph\n",
 		 tail, 4, &header, 4, &hxg, 4 * (len - 1), &action[1]);
@@ -500,6 +505,48 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
 	return err;
 }
 
+static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
+{
+	struct guc_ct_buffer_desc *desc = ctb->desc;
+	u32 head = READ_ONCE(desc->head);
+	u32 space;
+
+	space = CIRC_SPACE(desc->tail, head, ctb->size);
+
+	return space >= len_dw;
+}
+
+static int ct_send_nb(struct intel_guc_ct *ct,
+		      const u32 *action,
+		      u32 len,
+		      u32 flags)
+{
+	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
+	unsigned long spin_flags;
+	u32 fence;
+	int ret;
+
+	spin_lock_irqsave(&ctb->lock, spin_flags);
+
+	ret = h2g_has_room(ctb, len + GUC_CTB_HDR_LEN);
+	if (unlikely(!ret)) {
+		ret = -EBUSY;
+		goto out;
+	}
+
+	fence = ct_get_next_fence(ct);
+	ret = ct_write(ct, action, len, fence, flags);
+	if (unlikely(ret))
+		goto out;
+
+	intel_guc_notify(ct_to_guc(ct));
+
+out:
+	spin_unlock_irqrestore(&ctb->lock, spin_flags);
+
+	return ret;
+}
+
 static int ct_send(struct intel_guc_ct *ct,
 		   const u32 *action,
 		   u32 len,
@@ -507,8 +554,10 @@ static int ct_send(struct intel_guc_ct *ct,
 		   u32 response_buf_size,
 		   u32 *status)
 {
+	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
 	struct ct_request request;
 	unsigned long flags;
+	unsigned int sleep_period_ms = 1;
 	u32 fence;
 	int err;
 
@@ -516,8 +565,24 @@ static int ct_send(struct intel_guc_ct *ct,
 	GEM_BUG_ON(!len);
 	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
 	GEM_BUG_ON(!response_buf && response_buf_size);
+	might_sleep();
+
+	/*
+	 * We use a lazy spin wait loop here as we believe that if the CT
+	 * buffers are sized correctly the flow control condition should be
+	 * rare.
+	 */
+retry:
+	spin_lock_irqsave(&ctb->lock, flags);
+	if (unlikely(!h2g_has_room(ctb, len + GUC_CTB_HDR_LEN))) {
+		spin_unlock_irqrestore(&ctb->lock, flags);
 
-	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
+		if (msleep_interruptible(sleep_period_ms))
+			return -EINTR;
+		sleep_period_ms = sleep_period_ms << 1;
+
+		goto retry;
+	}
 
 	fence = ct_get_next_fence(ct);
 	request.fence = fence;
@@ -529,9 +594,9 @@ static int ct_send(struct intel_guc_ct *ct,
 	list_add_tail(&request.link, &ct->requests.pending);
 	spin_unlock(&ct->requests.lock);
 
-	err = ct_write(ct, action, len, fence);
+	err = ct_write(ct, action, len, fence, 0);
 
-	spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
+	spin_unlock_irqrestore(&ctb->lock, flags);
 
 	if (unlikely(err))
 		goto unlink;
@@ -571,7 +636,7 @@ static int ct_send(struct intel_guc_ct *ct,
  * Command Transport (CT) buffer based GuC send function.
  */
 int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
-		      u32 *response_buf, u32 response_buf_size)
+		      u32 *response_buf, u32 response_buf_size, u32 flags)
 {
 	u32 status = ~0; /* undefined */
 	int ret;
@@ -581,6 +646,9 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
 		return -ENODEV;
 	}
 
+	if (flags & INTEL_GUC_CT_SEND_NB)
+		return ct_send_nb(ct, action, len, flags);
+
 	ret = ct_send(ct, action, len, response_buf, response_buf_size, &status);
 	if (unlikely(ret < 0)) {
 		CT_ERROR(ct, "Sending action %#x failed (err=%d status=%#X)\n",
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
index 1ae2dde6db93..5bb8bef024c8 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
@@ -42,7 +42,6 @@ struct intel_guc_ct_buffer {
 	bool broken;
 };
 
-
 /** Top-level structure for Command Transport related data
  *
  * Includes a pair of CT buffers for bi-directional communication and tracking
@@ -87,8 +86,9 @@ static inline bool intel_guc_ct_enabled(struct intel_guc_ct *ct)
 	return ct->enabled;
 }
 
+#define INTEL_GUC_CT_SEND_NB		BIT(31)
 int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
-		      u32 *response_buf, u32 response_buf_size);
+		      u32 *response_buf, u32 response_buf_size, u32 flags);
 void intel_guc_ct_event_handler(struct intel_guc_ct *ct);
 
 #endif /* _INTEL_GUC_CT_H_ */
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 5/7] drm/i915/guc: Add stall timer to non blocking CTB send function
  2021-07-06 22:20 ` [Intel-gfx] " Matthew Brost
@ 2021-07-06 22:20   ` Matthew Brost
  -1 siblings, 0 replies; 54+ messages in thread
From: Matthew Brost @ 2021-07-06 22:20 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: john.c.harrison, Michal.Wajdeczko

Implement a stall timer which fails H2G CTB sends once a period of time
with no forward progress has elapsed, to prevent deadlock.

v2:
 (Michal)
  - Improve error message in ct_deadlock()
  - Set broken when ct_deadlock() returns true
  - Return -EPIPE on ct_deadlock()
v3:
 (Michal)
  - Add ms to stall timer comment
 (Matthew)
  - Move broken check to intel_guc_ct_send()

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 62 ++++++++++++++++++++---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  4 ++
 2 files changed, 59 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 3d6cba8d91ad..db3e85b89573 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -4,6 +4,9 @@
  */
 
 #include <linux/circ_buf.h>
+#include <linux/ktime.h>
+#include <linux/time64.h>
+#include <linux/timekeeping.h>
 
 #include "i915_drv.h"
 #include "intel_guc_ct.h"
@@ -316,6 +319,7 @@ int intel_guc_ct_enable(struct intel_guc_ct *ct)
 		goto err_deregister;
 
 	ct->enabled = true;
+	ct->stall_time = KTIME_MAX;
 
 	return 0;
 
@@ -389,9 +393,6 @@ static int ct_write(struct intel_guc_ct *ct,
 	u32 *cmds = ctb->cmds;
 	unsigned int i;
 
-	if (unlikely(ctb->broken))
-		return -EPIPE;
-
 	if (unlikely(desc->status))
 		goto corrupted;
 
@@ -505,6 +506,25 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
 	return err;
 }
 
+#define GUC_CTB_TIMEOUT_MS	1500
+static inline bool ct_deadlocked(struct intel_guc_ct *ct)
+{
+	long timeout = GUC_CTB_TIMEOUT_MS;
+	bool ret = ktime_ms_delta(ktime_get(), ct->stall_time) > timeout;
+
+	if (unlikely(ret)) {
+		struct guc_ct_buffer_desc *send = ct->ctbs.send.desc;
+	struct guc_ct_buffer_desc *recv = ct->ctbs.recv.desc;
+
+		CT_ERROR(ct, "Communication stalled for %lld ms, desc status=%#x,%#x\n",
+			 ktime_ms_delta(ktime_get(), ct->stall_time),
+			 send->status, recv->status);
+		ct->ctbs.send.broken = true;
+	}
+
+	return ret;
+}
+
 static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
 {
 	struct guc_ct_buffer_desc *desc = ctb->desc;
@@ -516,6 +536,26 @@ static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
 	return space >= len_dw;
 }
 
+static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
+{
+	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
+
+	lockdep_assert_held(&ct->ctbs.send.lock);
+
+	if (unlikely(!h2g_has_room(ctb, len_dw))) {
+		if (ct->stall_time == KTIME_MAX)
+			ct->stall_time = ktime_get();
+
+		if (unlikely(ct_deadlocked(ct)))
+			return -EPIPE;
+		else
+			return -EBUSY;
+	}
+
+	ct->stall_time = KTIME_MAX;
+	return 0;
+}
+
 static int ct_send_nb(struct intel_guc_ct *ct,
 		      const u32 *action,
 		      u32 len,
@@ -528,11 +568,9 @@ static int ct_send_nb(struct intel_guc_ct *ct,
 
 	spin_lock_irqsave(&ctb->lock, spin_flags);
 
-	ret = h2g_has_room(ctb, len + GUC_CTB_HDR_LEN);
-	if (unlikely(!ret)) {
-		ret = -EBUSY;
+	ret = has_room_nb(ct, len + GUC_CTB_HDR_LEN);
+	if (unlikely(ret))
 		goto out;
-	}
 
 	fence = ct_get_next_fence(ct);
 	ret = ct_write(ct, action, len, fence, flags);
@@ -575,8 +613,13 @@ static int ct_send(struct intel_guc_ct *ct,
 retry:
 	spin_lock_irqsave(&ctb->lock, flags);
 	if (unlikely(!h2g_has_room(ctb, len + GUC_CTB_HDR_LEN))) {
+		if (ct->stall_time == KTIME_MAX)
+			ct->stall_time = ktime_get();
 		spin_unlock_irqrestore(&ctb->lock, flags);
 
+		if (unlikely(ct_deadlocked(ct)))
+			return -EPIPE;
+
 		if (msleep_interruptible(sleep_period_ms))
 			return -EINTR;
 		sleep_period_ms = sleep_period_ms << 1;
@@ -584,6 +627,8 @@ static int ct_send(struct intel_guc_ct *ct,
 		goto retry;
 	}
 
+	ct->stall_time = KTIME_MAX;
+
 	fence = ct_get_next_fence(ct);
 	request.fence = fence;
 	request.status = 0;
@@ -646,6 +691,9 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
 		return -ENODEV;
 	}
 
+	if (unlikely(ct->ctbs.send.broken))
+		return -EPIPE;
+
 	if (flags & INTEL_GUC_CT_SEND_NB)
 		return ct_send_nb(ct, action, len, flags);
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
index 5bb8bef024c8..bee03794c1eb 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
@@ -9,6 +9,7 @@
 #include <linux/interrupt.h>
 #include <linux/spinlock.h>
 #include <linux/workqueue.h>
+#include <linux/ktime.h>
 
 #include "intel_guc_fwif.h"
 
@@ -68,6 +69,9 @@ struct intel_guc_ct {
 		struct list_head incoming; /* incoming requests */
 		struct work_struct worker; /* handler for incoming requests */
 	} requests;
+
+	/** @stall_time: time at which a CTB submission first stalled */
+	ktime_t stall_time;
 };
 
 void intel_guc_ct_init_early(struct intel_guc_ct *ct);
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 54+ messages in thread


* [PATCH 6/7] drm/i915/guc: Optimize CTB writes and reads
  2021-07-06 22:20 ` [Intel-gfx] " Matthew Brost
@ 2021-07-06 22:20   ` Matthew Brost
  -1 siblings, 0 replies; 54+ messages in thread
From: Matthew Brost @ 2021-07-06 22:20 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: john.c.harrison, Michal.Wajdeczko

CTB writes are now in the path of command submission and should be
optimized for performance. Rather than reading CTB descriptor values
(e.g. head, tail), which could result in accesses across the PCIe bus,
store local shadow copies and only read/write the descriptor values when
absolutely necessary. Also store the current space of each channel
locally.

v2:
 (Michal)
  - Add additional sanity checks for head / tail pointers
  - Use GUC_CTB_HDR_LEN rather than magic 1
v3:
 (Michal / John H)
  - Drop redundant check of head value

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 88 +++++++++++++++--------
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  6 ++
 2 files changed, 65 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index db3e85b89573..4a73a1f03a9b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -130,6 +130,10 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc)
 static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb)
 {
 	ctb->broken = false;
+	ctb->tail = 0;
+	ctb->head = 0;
+	ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
+
 	guc_ct_buffer_desc_init(ctb->desc);
 }
 
@@ -383,10 +387,8 @@ static int ct_write(struct intel_guc_ct *ct,
 {
 	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
 	struct guc_ct_buffer_desc *desc = ctb->desc;
-	u32 head = desc->head;
-	u32 tail = desc->tail;
+	u32 tail = ctb->tail;
 	u32 size = ctb->size;
-	u32 used;
 	u32 header;
 	u32 hxg;
 	u32 type;
@@ -396,25 +398,22 @@ static int ct_write(struct intel_guc_ct *ct,
 	if (unlikely(desc->status))
 		goto corrupted;
 
-	if (unlikely((tail | head) >= size)) {
+	GEM_BUG_ON(tail > size);
+
+#ifdef CONFIG_DRM_I915_DEBUG_GUC
+	if (unlikely(tail != READ_ONCE(desc->tail))) {
+		CT_ERROR(ct, "Tail was modified %u != %u\n",
+			 desc->tail, ctb->tail);
+		desc->status |= GUC_CTB_STATUS_MISMATCH;
+		goto corrupted;
+	}
+	if (unlikely((desc->tail | desc->head) >= size)) {
 		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
-			 head, tail, size);
+			 desc->head, desc->tail, size);
 		desc->status |= GUC_CTB_STATUS_OVERFLOW;
 		goto corrupted;
 	}
-
-	/*
-	 * tail == head condition indicates empty. GuC FW does not support
-	 * using up the entire buffer to get tail == head meaning full.
-	 */
-	if (tail < head)
-		used = (size - head) + tail;
-	else
-		used = tail - head;
-
-	/* make sure there is a space including extra dw for the header */
-	if (unlikely(used + len + GUC_CTB_HDR_LEN >= size))
-		return -ENOSPC;
+#endif
 
 	/*
 	 * dw0: CT header (including fence)
@@ -453,7 +452,9 @@ static int ct_write(struct intel_guc_ct *ct,
 	write_barrier(ct);
 
 	/* now update descriptor */
+	ctb->tail = tail;
 	WRITE_ONCE(desc->tail, tail);
+	ctb->space -= len + GUC_CTB_HDR_LEN;
 
 	return 0;
 
@@ -469,7 +470,7 @@ static int ct_write(struct intel_guc_ct *ct,
  * @req:	pointer to pending request
  * @status:	placeholder for status
  *
- * For each sent request, Guc shall send bac CT response message.
+ * For each sent request, GuC shall send back CT response message.
  * Our message handler will update status of tracked request once
  * response message with given fence is received. Wait here and
  * check for valid response status value.
@@ -525,24 +526,35 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct)
 	return ret;
 }
 
-static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
+static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw)
 {
-	struct guc_ct_buffer_desc *desc = ctb->desc;
-	u32 head = READ_ONCE(desc->head);
+	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
+	u32 head;
 	u32 space;
 
-	space = CIRC_SPACE(desc->tail, head, ctb->size);
+	if (ctb->space >= len_dw)
+		return true;
+
+	head = READ_ONCE(ctb->desc->head);
+	if (unlikely(head > ctb->size)) {
+		CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u size=%u\n",
+			 ctb->desc->head, ctb->desc->tail, ctb->size);
+		ctb->desc->status |= GUC_CTB_STATUS_OVERFLOW;
+		ctb->broken = true;
+		return false;
+	}
+
+	space = CIRC_SPACE(ctb->tail, head, ctb->size);
+	ctb->space = space;
 
 	return space >= len_dw;
 }
 
 static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
 {
-	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
-
 	lockdep_assert_held(&ct->ctbs.send.lock);
 
-	if (unlikely(!h2g_has_room(ctb, len_dw))) {
+	if (unlikely(!h2g_has_room(ct, len_dw))) {
 		if (ct->stall_time == KTIME_MAX)
 			ct->stall_time = ktime_get();
 
@@ -612,7 +624,7 @@ static int ct_send(struct intel_guc_ct *ct,
 	 */
 retry:
 	spin_lock_irqsave(&ctb->lock, flags);
-	if (unlikely(!h2g_has_room(ctb, len + GUC_CTB_HDR_LEN))) {
+	if (unlikely(!h2g_has_room(ct, len + GUC_CTB_HDR_LEN))) {
 		if (ct->stall_time == KTIME_MAX)
 			ct->stall_time = ktime_get();
 		spin_unlock_irqrestore(&ctb->lock, flags);
@@ -732,7 +744,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 {
 	struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv;
 	struct guc_ct_buffer_desc *desc = ctb->desc;
-	u32 head = desc->head;
+	u32 head = ctb->head;
 	u32 tail = desc->tail;
 	u32 size = ctb->size;
 	u32 *cmds = ctb->cmds;
@@ -747,12 +759,29 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 	if (unlikely(desc->status))
 		goto corrupted;
 
-	if (unlikely((tail | head) >= size)) {
+	GEM_BUG_ON(head > size);
+
+#ifdef CONFIG_DRM_I915_DEBUG_GUC
+	if (unlikely(head != READ_ONCE(desc->head))) {
+		CT_ERROR(ct, "Head was modified %u != %u\n",
+			 desc->head, ctb->head);
+		desc->status |= GUC_CTB_STATUS_MISMATCH;
+		goto corrupted;
+	}
+	if (unlikely((desc->tail | desc->head) >= size)) {
 		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
 			 head, tail, size);
 		desc->status |= GUC_CTB_STATUS_OVERFLOW;
 		goto corrupted;
 	}
+#else
+	if (unlikely(tail >= size)) {
+		CT_ERROR(ct, "Invalid offsets tail=%u (size=%u)\n",
+			 tail, size);
+		desc->status |= GUC_CTB_STATUS_OVERFLOW;
+		goto corrupted;
+	}
+#endif
 
 	/* tail == head condition indicates empty */
 	available = tail - head;
@@ -802,6 +831,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 	}
 	CT_DEBUG(ct, "received %*ph\n", 4 * len, (*msg)->msg);
 
+	ctb->head = head;
 	/* now update descriptor */
 	WRITE_ONCE(desc->head, head);
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
index bee03794c1eb..edd1bba0445d 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
@@ -33,6 +33,9 @@ struct intel_guc;
  * @desc: pointer to the buffer descriptor
  * @cmds: pointer to the commands buffer
  * @size: size of the commands buffer in dwords
+ * @head: local shadow copy of head in dwords
+ * @tail: local shadow copy of tail in dwords
+ * @space: local shadow copy of space in dwords
  * @broken: flag to indicate if descriptor data is broken
  */
 struct intel_guc_ct_buffer {
@@ -40,6 +43,9 @@ struct intel_guc_ct_buffer {
 	struct guc_ct_buffer_desc *desc;
 	u32 *cmds;
 	u32 size;
+	u32 tail;
+	u32 head;
+	u32 space;
 	bool broken;
 };
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 54+ messages in thread

+ * For each sent request, GuC shall send back CT response message.
  * Our message handler will update status of tracked request once
  * response message with given fence is received. Wait here and
  * check for valid response status value.
@@ -525,24 +526,35 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct)
 	return ret;
 }
 
-static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
+static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw)
 {
-	struct guc_ct_buffer_desc *desc = ctb->desc;
-	u32 head = READ_ONCE(desc->head);
+	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
+	u32 head;
 	u32 space;
 
-	space = CIRC_SPACE(desc->tail, head, ctb->size);
+	if (ctb->space >= len_dw)
+		return true;
+
+	head = READ_ONCE(ctb->desc->head);
+	if (unlikely(head > ctb->size)) {
+		CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u size=%u\n",
+			 ctb->desc->head, ctb->desc->tail, ctb->size);
+		ctb->desc->status |= GUC_CTB_STATUS_OVERFLOW;
+		ctb->broken = true;
+		return false;
+	}
+
+	space = CIRC_SPACE(ctb->tail, head, ctb->size);
+	ctb->space = space;
 
 	return space >= len_dw;
 }
 
 static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
 {
-	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
-
 	lockdep_assert_held(&ct->ctbs.send.lock);
 
-	if (unlikely(!h2g_has_room(ctb, len_dw))) {
+	if (unlikely(!h2g_has_room(ct, len_dw))) {
 		if (ct->stall_time == KTIME_MAX)
 			ct->stall_time = ktime_get();
 
@@ -612,7 +624,7 @@ static int ct_send(struct intel_guc_ct *ct,
 	 */
 retry:
 	spin_lock_irqsave(&ctb->lock, flags);
-	if (unlikely(!h2g_has_room(ctb, len + GUC_CTB_HDR_LEN))) {
+	if (unlikely(!h2g_has_room(ct, len + GUC_CTB_HDR_LEN))) {
 		if (ct->stall_time == KTIME_MAX)
 			ct->stall_time = ktime_get();
 		spin_unlock_irqrestore(&ctb->lock, flags);
@@ -732,7 +744,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 {
 	struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv;
 	struct guc_ct_buffer_desc *desc = ctb->desc;
-	u32 head = desc->head;
+	u32 head = ctb->head;
 	u32 tail = desc->tail;
 	u32 size = ctb->size;
 	u32 *cmds = ctb->cmds;
@@ -747,12 +759,29 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 	if (unlikely(desc->status))
 		goto corrupted;
 
-	if (unlikely((tail | head) >= size)) {
+	GEM_BUG_ON(head > size);
+
+#ifdef CONFIG_DRM_I915_DEBUG_GUC
+	if (unlikely(head != READ_ONCE(desc->head))) {
+		CT_ERROR(ct, "Head was modified %u != %u\n",
+			 desc->head, ctb->head);
+		desc->status |= GUC_CTB_STATUS_MISMATCH;
+		goto corrupted;
+	}
+	if (unlikely((desc->tail | desc->head) >= size)) {
 		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
 			 head, tail, size);
 		desc->status |= GUC_CTB_STATUS_OVERFLOW;
 		goto corrupted;
 	}
+#else
+	if (unlikely(tail >= size)) {
+		CT_ERROR(ct, "Invalid offsets tail=%u (size=%u)\n",
+			 tail, size);
+		desc->status |= GUC_CTB_STATUS_OVERFLOW;
+		goto corrupted;
+	}
+#endif
 
 	/* tail == head condition indicates empty */
 	available = tail - head;
@@ -802,6 +831,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 	}
 	CT_DEBUG(ct, "received %*ph\n", 4 * len, (*msg)->msg);
 
+	ctb->head = head;
 	/* now update descriptor */
 	WRITE_ONCE(desc->head, head);
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
index bee03794c1eb..edd1bba0445d 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
@@ -33,6 +33,9 @@ struct intel_guc;
  * @desc: pointer to the buffer descriptor
  * @cmds: pointer to the commands buffer
  * @size: size of the commands buffer in dwords
+ * @head: local shadow copy of head in dwords
+ * @tail: local shadow copy of tail in dwords
+ * @space: local shadow copy of space in dwords
  * @broken: flag to indicate if descriptor data is broken
  */
 struct intel_guc_ct_buffer {
@@ -40,6 +43,9 @@ struct intel_guc_ct_buffer {
 	struct guc_ct_buffer_desc *desc;
 	u32 *cmds;
 	u32 size;
+	u32 tail;
+	u32 head;
+	u32 space;
 	bool broken;
 };
 
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 7/7] drm/i915/guc: Module load failure test for CT buffer creation
  2021-07-06 22:20 ` [Intel-gfx] " Matthew Brost
@ 2021-07-06 22:20   ` Matthew Brost
  -1 siblings, 0 replies; 54+ messages in thread
From: Matthew Brost @ 2021-07-06 22:20 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: john.c.harrison, Michal.Wajdeczko

From: John Harrison <John.C.Harrison@Intel.com>

Add several module load failure injection points in the CT buffer
creation code path.

Signed-off-by: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 4a73a1f03a9b..5448377026e0 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -175,6 +175,10 @@ static int ct_register_buffer(struct intel_guc_ct *ct, u32 type,
 {
 	int err;
 
+	err = i915_inject_probe_error(guc_to_gt(ct_to_guc(ct))->i915, -ENXIO);
+	if (unlikely(err))
+		return err;
+
 	err = guc_action_register_ct_buffer(ct_to_guc(ct), type,
 					    desc_addr, buff_addr, size);
 	if (unlikely(err))
@@ -226,6 +230,10 @@ int intel_guc_ct_init(struct intel_guc_ct *ct)
 	u32 *cmds;
 	int err;
 
+	err = i915_inject_probe_error(guc_to_gt(guc)->i915, -ENXIO);
+	if (err)
+		return err;
+
 	GEM_BUG_ON(ct->vma);
 
 	blob_size = 2 * CTB_DESC_SIZE + CTB_H2G_BUFFER_SIZE + CTB_G2H_BUFFER_SIZE;
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [Intel-gfx] [PATCH 7/7] drm/i915/guc: Module load failure test for CT buffer creation
@ 2021-07-06 22:20   ` Matthew Brost
  0 siblings, 0 replies; 54+ messages in thread
From: Matthew Brost @ 2021-07-06 22:20 UTC (permalink / raw)
  To: intel-gfx, dri-devel

From: John Harrison <John.C.Harrison@Intel.com>

Add several module load failure injection points in the CT buffer
creation code path.

Signed-off-by: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 4a73a1f03a9b..5448377026e0 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -175,6 +175,10 @@ static int ct_register_buffer(struct intel_guc_ct *ct, u32 type,
 {
 	int err;
 
+	err = i915_inject_probe_error(guc_to_gt(ct_to_guc(ct))->i915, -ENXIO);
+	if (unlikely(err))
+		return err;
+
 	err = guc_action_register_ct_buffer(ct_to_guc(ct), type,
 					    desc_addr, buff_addr, size);
 	if (unlikely(err))
@@ -226,6 +230,10 @@ int intel_guc_ct_init(struct intel_guc_ct *ct)
 	u32 *cmds;
 	int err;
 
+	err = i915_inject_probe_error(guc_to_gt(guc)->i915, -ENXIO);
+	if (err)
+		return err;
+
 	GEM_BUG_ON(ct->vma);
 
 	blob_size = 2 * CTB_DESC_SIZE + CTB_H2G_BUFFER_SIZE + CTB_G2H_BUFFER_SIZE;
-- 
2.28.0

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [Intel-gfx] ✗ Fi.CI.BAT: failure for CT changes required for GuC submission (rev3)
  2021-07-06 22:20 ` [Intel-gfx] " Matthew Brost
                   ` (9 preceding siblings ...)
  (?)
@ 2021-07-06 22:38 ` Patchwork
  -1 siblings, 0 replies; 54+ messages in thread
From: Patchwork @ 2021-07-06 22:38 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-gfx


== Series Details ==

Series: CT changes required for GuC submission (rev3)
URL   : https://patchwork.freedesktop.org/series/91943/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_10307 -> Patchwork_20539
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_20539 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_20539, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20539/index.html

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_20539:

### IGT changes ###

#### Possible regressions ####

  * igt@vgem_basic@unload:
    - fi-bwr-2160:        NOTRUN -> [INCOMPLETE][1]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20539/fi-bwr-2160/igt@vgem_basic@unload.html

  
Known issues
------------

  Here are the changes found in Patchwork_20539 that come from known issues:

### IGT changes ###

#### Possible fixes ####

  * igt@core_hotunplug@unbind-rebind:
    - fi-bwr-2160:        [{ABORT}][2] -> [PASS][3]
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10307/fi-bwr-2160/igt@core_hotunplug@unbind-rebind.html
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20539/fi-bwr-2160/igt@core_hotunplug@unbind-rebind.html

  
#### Warnings ####

  * igt@runner@aborted:
    - fi-bwr-2160:        [FAIL][4] ([i915#2505]) -> [FAIL][5] ([i915#2505] / [i915#2722])
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10307/fi-bwr-2160/igt@runner@aborted.html
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20539/fi-bwr-2160/igt@runner@aborted.html

  
  [i915#2505]: https://gitlab.freedesktop.org/drm/intel/issues/2505
  [i915#2722]: https://gitlab.freedesktop.org/drm/intel/issues/2722


Participating hosts (38 -> 36)
------------------------------

  Missing    (2): fi-bsw-cyan fi-bdw-samus 


Build changes
-------------

  * Linux: CI_DRM_10307 -> Patchwork_20539

  CI-20190529: 20190529
  CI_DRM_10307: 5f62539f797eecbee492eb05bf574b3ea02ad9ff @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_6129: 687589e76f787d26ee2b539e551a9be06bd41ce3 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_20539: 04ca74f95089481a5c31e2eb99c40ec4ff3ff430 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

04ca74f95089 drm/i915/guc: Module load failure test for CT buffer creation
ed6225a350bc drm/i915/guc: Optimize CTB writes and reads
020f0bee0513 drm/i915/guc: Add stall timer to non blocking CTB send function
7f796c01cba1 drm/i915/guc: Add non blocking CTB send function
48bbc2746b99 drm/i915/guc: Increase size of CTB buffers
5a166441c701 drm/i915/guc: Improve error message for unsolicited CT response
d6dbe98e9087 drm/i915/guc: Relax CTB response timeout

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20539/index.html


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 6/7] drm/i915/guc: Optimize CTB writes and reads
  2021-07-06 22:20   ` [Intel-gfx] " Matthew Brost
@ 2021-07-06 22:51     ` John Harrison
  -1 siblings, 0 replies; 54+ messages in thread
From: John Harrison @ 2021-07-06 22:51 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel; +Cc: Michal.Wajdeczko

On 7/6/2021 15:20, Matthew Brost wrote:
> CTB writes are now in the path of command submission and should be
> optimized for performance. Rather than reading CTB descriptor values
> (e.g. head, tail), which could result in accesses across the PCIe bus,
> store local shadow copies and only read/write the descriptor values when
> absolutely necessary. Also store the current space in each channel
> locally.
>
> v2:
>   (Michal)
>    - Add additional sanity checks for head / tail pointers
>    - Use GUC_CTB_HDR_LEN rather than magic 1
> v3:
>   (Michal / John H)
>    - Drop redundant check of head value
>
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 88 +++++++++++++++--------
>   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  6 ++
>   2 files changed, 65 insertions(+), 29 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index db3e85b89573..4a73a1f03a9b 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -130,6 +130,10 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc)
>   static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb)
>   {
>   	ctb->broken = false;
> +	ctb->tail = 0;
> +	ctb->head = 0;
> +	ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
> +
>   	guc_ct_buffer_desc_init(ctb->desc);
>   }
>   
> @@ -383,10 +387,8 @@ static int ct_write(struct intel_guc_ct *ct,
>   {
>   	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>   	struct guc_ct_buffer_desc *desc = ctb->desc;
> -	u32 head = desc->head;
> -	u32 tail = desc->tail;
> +	u32 tail = ctb->tail;
>   	u32 size = ctb->size;
> -	u32 used;
>   	u32 header;
>   	u32 hxg;
>   	u32 type;
> @@ -396,25 +398,22 @@ static int ct_write(struct intel_guc_ct *ct,
>   	if (unlikely(desc->status))
>   		goto corrupted;
>   
> -	if (unlikely((tail | head) >= size)) {
> +	GEM_BUG_ON(tail > size);
> +
> +#ifdef CONFIG_DRM_I915_DEBUG_GUC
> +	if (unlikely(tail != READ_ONCE(desc->tail))) {
> +		CT_ERROR(ct, "Tail was modified %u != %u\n",
> +			 desc->tail, ctb->tail);
> +		desc->status |= GUC_CTB_STATUS_MISMATCH;
> +		goto corrupted;
> +	}
> +	if (unlikely((desc->tail | desc->head) >= size)) {
Same arguments below about head apply to tail here. Also, there is no 
#else check on ctb->head?

>   		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
> -			 head, tail, size);
> +			 desc->head, desc->tail, size);
>   		desc->status |= GUC_CTB_STATUS_OVERFLOW;
>   		goto corrupted;
>   	}
> -
> -	/*
> -	 * tail == head condition indicates empty. GuC FW does not support
> -	 * using up the entire buffer to get tail == head meaning full.
> -	 */
> -	if (tail < head)
> -		used = (size - head) + tail;
> -	else
> -		used = tail - head;
> -
> -	/* make sure there is a space including extra dw for the header */
> -	if (unlikely(used + len + GUC_CTB_HDR_LEN >= size))
> -		return -ENOSPC;
> +#endif
>   
>   	/*
>   	 * dw0: CT header (including fence)
> @@ -453,7 +452,9 @@ static int ct_write(struct intel_guc_ct *ct,
>   	write_barrier(ct);
>   
>   	/* now update descriptor */
> +	ctb->tail = tail;
>   	WRITE_ONCE(desc->tail, tail);
> +	ctb->space -= len + GUC_CTB_HDR_LEN;
>   
>   	return 0;
>   
> @@ -469,7 +470,7 @@ static int ct_write(struct intel_guc_ct *ct,
>    * @req:	pointer to pending request
>    * @status:	placeholder for status
>    *
> - * For each sent request, Guc shall send bac CT response message.
> + * For each sent request, GuC shall send back CT response message.
>    * Our message handler will update status of tracked request once
>    * response message with given fence is received. Wait here and
>    * check for valid response status value.
> @@ -525,24 +526,35 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct)
>   	return ret;
>   }
>   
> -static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
> +static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw)
>   {
> -	struct guc_ct_buffer_desc *desc = ctb->desc;
> -	u32 head = READ_ONCE(desc->head);
> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> +	u32 head;
>   	u32 space;
>   
> -	space = CIRC_SPACE(desc->tail, head, ctb->size);
> +	if (ctb->space >= len_dw)
> +		return true;
> +
> +	head = READ_ONCE(ctb->desc->head);
> +	if (unlikely(head > ctb->size)) {
> +		CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u size=%u\n",
> +			 ctb->desc->head, ctb->desc->tail, ctb->size);
> +		ctb->desc->status |= GUC_CTB_STATUS_OVERFLOW;
> +		ctb->broken = true;
> +		return false;
> +	}
> +
> +	space = CIRC_SPACE(ctb->tail, head, ctb->size);
> +	ctb->space = space;
>   
>   	return space >= len_dw;
>   }
>   
>   static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
>   {
> -	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> -
>   	lockdep_assert_held(&ct->ctbs.send.lock);
>   
> -	if (unlikely(!h2g_has_room(ctb, len_dw))) {
> +	if (unlikely(!h2g_has_room(ct, len_dw))) {
>   		if (ct->stall_time == KTIME_MAX)
>   			ct->stall_time = ktime_get();
>   
> @@ -612,7 +624,7 @@ static int ct_send(struct intel_guc_ct *ct,
>   	 */
>   retry:
>   	spin_lock_irqsave(&ctb->lock, flags);
> -	if (unlikely(!h2g_has_room(ctb, len + GUC_CTB_HDR_LEN))) {
> +	if (unlikely(!h2g_has_room(ct, len + GUC_CTB_HDR_LEN))) {
>   		if (ct->stall_time == KTIME_MAX)
>   			ct->stall_time = ktime_get();
>   		spin_unlock_irqrestore(&ctb->lock, flags);
> @@ -732,7 +744,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>   {
>   	struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv;
>   	struct guc_ct_buffer_desc *desc = ctb->desc;
> -	u32 head = desc->head;
> +	u32 head = ctb->head;
>   	u32 tail = desc->tail;
>   	u32 size = ctb->size;
>   	u32 *cmds = ctb->cmds;
> @@ -747,12 +759,29 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>   	if (unlikely(desc->status))
>   		goto corrupted;
>   
> -	if (unlikely((tail | head) >= size)) {
> +	GEM_BUG_ON(head > size);
> +
> +#ifdef CONFIG_DRM_I915_DEBUG_GUC
> +	if (unlikely(head != READ_ONCE(desc->head))) {
> +		CT_ERROR(ct, "Head was modified %u != %u\n",
> +			 desc->head, ctb->head);
> +		desc->status |= GUC_CTB_STATUS_MISMATCH;
> +		goto corrupted;
> +	}
> +	if (unlikely((desc->tail | desc->head) >= size)) {
As per comment in other thread, the check on head here is redundant 
because you have already hit a BUG_ON(ctb->head > size) followed by 
CT_ERROR(ctb->head != desc->head). Therefore, you can't get here if 
'desc->head > size'.

>   		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
>   			 head, tail, size);
>   		desc->status |= GUC_CTB_STATUS_OVERFLOW;
>   		goto corrupted;
>   	}
> +#else
> +	if (unlikely(tail >= size)) {
> +		CT_ERROR(ct, "Invalid offsets tail=%u (size=%u)\n",
> +			 tail, size);
> +		desc->status |= GUC_CTB_STATUS_OVERFLOW;
> +		goto corrupted;
> +	}
> +#endif
>   
>   	/* tail == head condition indicates empty */
>   	available = tail - head;
> @@ -802,6 +831,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>   	}
>   	CT_DEBUG(ct, "received %*ph\n", 4 * len, (*msg)->msg);
>   
> +	ctb->head = head;
>   	/* now update descriptor */
>   	WRITE_ONCE(desc->head, head);
>   
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> index bee03794c1eb..edd1bba0445d 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> @@ -33,6 +33,9 @@ struct intel_guc;
>    * @desc: pointer to the buffer descriptor
>    * @cmds: pointer to the commands buffer
>    * @size: size of the commands buffer in dwords
> + * @head: local shadow copy of head in dwords
> + * @tail: local shadow copy of tail in dwords
> + * @space: local shadow copy of space in dwords
>    * @broken: flag to indicate if descriptor data is broken
>    */
>   struct intel_guc_ct_buffer {
> @@ -40,6 +43,9 @@ struct intel_guc_ct_buffer {
>   	struct guc_ct_buffer_desc *desc;
>   	u32 *cmds;
>   	u32 size;
> +	u32 tail;
> +	u32 head;
> +	u32 space;
>   	bool broken;
>   };
>   


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Intel-gfx] [PATCH 6/7] drm/i915/guc: Optimize CTB writes and reads
@ 2021-07-06 22:51     ` John Harrison
  0 siblings, 0 replies; 54+ messages in thread
From: John Harrison @ 2021-07-06 22:51 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel

On 7/6/2021 15:20, Matthew Brost wrote:
> CTB writes are now in the path of command submission and should be
> optimized for performance. Rather than reading CTB descriptor values
> (e.g. head, tail), which could result in accesses across the PCIe bus,
> store local shadow copies and only read/write the descriptor values when
> absolutely necessary. Also store the current space in each channel
> locally.
>
> v2:
>   (Michal)
>    - Add additional sanity checks for head / tail pointers
>    - Use GUC_CTB_HDR_LEN rather than magic 1
> v3:
>   (Michal / John H)
>    - Drop redundant check of head value
>
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 88 +++++++++++++++--------
>   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  6 ++
>   2 files changed, 65 insertions(+), 29 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index db3e85b89573..4a73a1f03a9b 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -130,6 +130,10 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc)
>   static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb)
>   {
>   	ctb->broken = false;
> +	ctb->tail = 0;
> +	ctb->head = 0;
> +	ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
> +
>   	guc_ct_buffer_desc_init(ctb->desc);
>   }
>   
> @@ -383,10 +387,8 @@ static int ct_write(struct intel_guc_ct *ct,
>   {
>   	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>   	struct guc_ct_buffer_desc *desc = ctb->desc;
> -	u32 head = desc->head;
> -	u32 tail = desc->tail;
> +	u32 tail = ctb->tail;
>   	u32 size = ctb->size;
> -	u32 used;
>   	u32 header;
>   	u32 hxg;
>   	u32 type;
> @@ -396,25 +398,22 @@ static int ct_write(struct intel_guc_ct *ct,
>   	if (unlikely(desc->status))
>   		goto corrupted;
>   
> -	if (unlikely((tail | head) >= size)) {
> +	GEM_BUG_ON(tail > size);
> +
> +#ifdef CONFIG_DRM_I915_DEBUG_GUC
> +	if (unlikely(tail != READ_ONCE(desc->tail))) {
> +		CT_ERROR(ct, "Tail was modified %u != %u\n",
> +			 desc->tail, ctb->tail);
> +		desc->status |= GUC_CTB_STATUS_MISMATCH;
> +		goto corrupted;
> +	}
> +	if (unlikely((desc->tail | desc->head) >= size)) {
Same arguments below about head apply to tail here. Also, there is no 
#else check on ctb->head?

>   		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
> -			 head, tail, size);
> +			 desc->head, desc->tail, size);
>   		desc->status |= GUC_CTB_STATUS_OVERFLOW;
>   		goto corrupted;
>   	}
> -
> -	/*
> -	 * tail == head condition indicates empty. GuC FW does not support
> -	 * using up the entire buffer to get tail == head meaning full.
> -	 */
> -	if (tail < head)
> -		used = (size - head) + tail;
> -	else
> -		used = tail - head;
> -
> -	/* make sure there is a space including extra dw for the header */
> -	if (unlikely(used + len + GUC_CTB_HDR_LEN >= size))
> -		return -ENOSPC;
> +#endif
>   
>   	/*
>   	 * dw0: CT header (including fence)
> @@ -453,7 +452,9 @@ static int ct_write(struct intel_guc_ct *ct,
>   	write_barrier(ct);
>   
>   	/* now update descriptor */
> +	ctb->tail = tail;
>   	WRITE_ONCE(desc->tail, tail);
> +	ctb->space -= len + GUC_CTB_HDR_LEN;
>   
>   	return 0;
>   
> @@ -469,7 +470,7 @@ static int ct_write(struct intel_guc_ct *ct,
>    * @req:	pointer to pending request
>    * @status:	placeholder for status
>    *
> - * For each sent request, Guc shall send bac CT response message.
> + * For each sent request, GuC shall send back CT response message.
>    * Our message handler will update status of tracked request once
>    * response message with given fence is received. Wait here and
>    * check for valid response status value.
> @@ -525,24 +526,35 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct)
>   	return ret;
>   }
>   
> -static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
> +static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw)
>   {
> -	struct guc_ct_buffer_desc *desc = ctb->desc;
> -	u32 head = READ_ONCE(desc->head);
> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> +	u32 head;
>   	u32 space;
>   
> -	space = CIRC_SPACE(desc->tail, head, ctb->size);
> +	if (ctb->space >= len_dw)
> +		return true;
> +
> +	head = READ_ONCE(ctb->desc->head);
> +	if (unlikely(head > ctb->size)) {
> +		CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u size=%u\n",
> +			 ctb->desc->head, ctb->desc->tail, ctb->size);
> +		ctb->desc->status |= GUC_CTB_STATUS_OVERFLOW;
> +		ctb->broken = true;
> +		return false;
> +	}
> +
> +	space = CIRC_SPACE(ctb->tail, head, ctb->size);
> +	ctb->space = space;
>   
>   	return space >= len_dw;
>   }
>   
>   static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
>   {
> -	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> -
>   	lockdep_assert_held(&ct->ctbs.send.lock);
>   
> -	if (unlikely(!h2g_has_room(ctb, len_dw))) {
> +	if (unlikely(!h2g_has_room(ct, len_dw))) {
>   		if (ct->stall_time == KTIME_MAX)
>   			ct->stall_time = ktime_get();
>   
> @@ -612,7 +624,7 @@ static int ct_send(struct intel_guc_ct *ct,
>   	 */
>   retry:
>   	spin_lock_irqsave(&ctb->lock, flags);
> -	if (unlikely(!h2g_has_room(ctb, len + GUC_CTB_HDR_LEN))) {
> +	if (unlikely(!h2g_has_room(ct, len + GUC_CTB_HDR_LEN))) {
>   		if (ct->stall_time == KTIME_MAX)
>   			ct->stall_time = ktime_get();
>   		spin_unlock_irqrestore(&ctb->lock, flags);
> @@ -732,7 +744,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>   {
>   	struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv;
>   	struct guc_ct_buffer_desc *desc = ctb->desc;
> -	u32 head = desc->head;
> +	u32 head = ctb->head;
>   	u32 tail = desc->tail;
>   	u32 size = ctb->size;
>   	u32 *cmds = ctb->cmds;
> @@ -747,12 +759,29 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>   	if (unlikely(desc->status))
>   		goto corrupted;
>   
> -	if (unlikely((tail | head) >= size)) {
> +	GEM_BUG_ON(head > size);
> +
> +#ifdef CONFIG_DRM_I915_DEBUG_GUC
> +	if (unlikely(head != READ_ONCE(desc->head))) {
> +		CT_ERROR(ct, "Head was modified %u != %u\n",
> +			 desc->head, ctb->head);
> +		desc->status |= GUC_CTB_STATUS_MISMATCH;
> +		goto corrupted;
> +	}
> +	if (unlikely((desc->tail | desc->head) >= size)) {
As per comment in other thread, the check on head here is redundant 
because you have already hit a BUG_ON(ctb->head > size) followed by 
CT_ERROR(ctb->head != desc->head). Therefore, you can't get here if 
'desc->head > size'.

>   		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
>   			 head, tail, size);
>   		desc->status |= GUC_CTB_STATUS_OVERFLOW;
>   		goto corrupted;
>   	}
> +#else
> +	if (unlikely(tail >= size)) {
> +		CT_ERROR(ct, "Invalid offsets tail=%u (size=%u)\n",
> +			 tail, size);
> +		desc->status |= GUC_CTB_STATUS_OVERFLOW;
> +		goto corrupted;
> +	}
> +#endif
>   
>   	/* tail == head condition indicates empty */
>   	available = tail - head;
> @@ -802,6 +831,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>   	}
>   	CT_DEBUG(ct, "received %*ph\n", 4 * len, (*msg)->msg);
>   
> +	ctb->head = head;
>   	/* now update descriptor */
>   	WRITE_ONCE(desc->head, head);
>   
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> index bee03794c1eb..edd1bba0445d 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> @@ -33,6 +33,9 @@ struct intel_guc;
>    * @desc: pointer to the buffer descriptor
>    * @cmds: pointer to the commands buffer
>    * @size: size of the commands buffer in dwords
> + * @head: local shadow copy of head in dwords
> + * @tail: local shadow copy of tail in dwords
> + * @space: local shadow copy of space in dwords
>    * @broken: flag to indicate if descriptor data is broken
>    */
>   struct intel_guc_ct_buffer {
> @@ -40,6 +43,9 @@ struct intel_guc_ct_buffer {
>   	struct guc_ct_buffer_desc *desc;
>   	u32 *cmds;
>   	u32 size;
> +	u32 tail;
> +	u32 head;
> +	u32 space;
>   	bool broken;
>   };
>   


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 6/7] drm/i915/guc: Optimize CTB writes and reads
  2021-07-06 22:51     ` [Intel-gfx] " John Harrison
@ 2021-07-07 17:50       ` Matthew Brost
  -1 siblings, 0 replies; 54+ messages in thread
From: Matthew Brost @ 2021-07-07 17:50 UTC (permalink / raw)
  To: John Harrison; +Cc: intel-gfx, dri-devel, Michal.Wajdeczko

On Tue, Jul 06, 2021 at 03:51:00PM -0700, John Harrison wrote:
> On 7/6/2021 15:20, Matthew Brost wrote:
> > CTB writes are now in the path of command submission and should be
> > optimized for performance. Rather than reading CTB descriptor values
> > (e.g. head, tail), which could result in accesses across the PCIe bus,
> > store local shadow copies and only read/write the descriptor values when
> > absolutely necessary. Also store the current space in each channel
> > locally.
> > 
> > v2:
> >   (Michal)
> >    - Add additional sanity checks for head / tail pointers
> >    - Use GUC_CTB_HDR_LEN rather than magic 1
> > v3:
> >   (Michal / John H)
> >    - Drop redundant check of head value
> > 
> > Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 88 +++++++++++++++--------
> >   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  6 ++
> >   2 files changed, 65 insertions(+), 29 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > index db3e85b89573..4a73a1f03a9b 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > @@ -130,6 +130,10 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc)
> >   static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb)
> >   {
> >   	ctb->broken = false;
> > +	ctb->tail = 0;
> > +	ctb->head = 0;
> > +	ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
> > +
> >   	guc_ct_buffer_desc_init(ctb->desc);
> >   }
> > @@ -383,10 +387,8 @@ static int ct_write(struct intel_guc_ct *ct,
> >   {
> >   	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> >   	struct guc_ct_buffer_desc *desc = ctb->desc;
> > -	u32 head = desc->head;
> > -	u32 tail = desc->tail;
> > +	u32 tail = ctb->tail;
> >   	u32 size = ctb->size;
> > -	u32 used;
> >   	u32 header;
> >   	u32 hxg;
> >   	u32 type;
> > @@ -396,25 +398,22 @@ static int ct_write(struct intel_guc_ct *ct,
> >   	if (unlikely(desc->status))
> >   		goto corrupted;
> > -	if (unlikely((tail | head) >= size)) {
> > +	GEM_BUG_ON(tail > size);
> > +
> > +#ifdef CONFIG_DRM_I915_DEBUG_GUC
> > +	if (unlikely(tail != READ_ONCE(desc->tail))) {
> > +		CT_ERROR(ct, "Tail was modified %u != %u\n",
> > +			 desc->tail, ctb->tail);
> > +		desc->status |= GUC_CTB_STATUS_MISMATCH;
> > +		goto corrupted;
> > +	}
> > +	if (unlikely((desc->tail | desc->head) >= size)) {
> Same arguments below about head apply to tail here. Also, there is no #else

Yes, desc->tail can be removed from this check. Same for head below. Can
you fix this when merging?

> check on ctb->head?

The ctb->head variable isn't used in this path, nor is ctb->tail in the
other. In the other path, desc->tail is checked because it is read, while
desc->head doesn't need to be read here. The other path can also likely
be reworked to pull the tail check outside of the #if/#else block.

> 
> >   		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
> > -			 head, tail, size);
> > +			 desc->head, desc->tail, size);
> >   		desc->status |= GUC_CTB_STATUS_OVERFLOW;
> >   		goto corrupted;
> >   	}
> > -
> > -	/*
> > -	 * tail == head condition indicates empty. GuC FW does not support
> > -	 * using up the entire buffer to get tail == head meaning full.
> > -	 */
> > -	if (tail < head)
> > -		used = (size - head) + tail;
> > -	else
> > -		used = tail - head;
> > -
> > -	/* make sure there is a space including extra dw for the header */
> > -	if (unlikely(used + len + GUC_CTB_HDR_LEN >= size))
> > -		return -ENOSPC;
> > +#endif
> >   	/*
> >   	 * dw0: CT header (including fence)
> > @@ -453,7 +452,9 @@ static int ct_write(struct intel_guc_ct *ct,
> >   	write_barrier(ct);
> >   	/* now update descriptor */
> > +	ctb->tail = tail;
> >   	WRITE_ONCE(desc->tail, tail);
> > +	ctb->space -= len + GUC_CTB_HDR_LEN;
> >   	return 0;
> > @@ -469,7 +470,7 @@ static int ct_write(struct intel_guc_ct *ct,
> >    * @req:	pointer to pending request
> >    * @status:	placeholder for status
> >    *
> > - * For each sent request, Guc shall send bac CT response message.
> > + * For each sent request, GuC shall send back CT response message.
> >    * Our message handler will update status of tracked request once
> >    * response message with given fence is received. Wait here and
> >    * check for valid response status value.
> > @@ -525,24 +526,35 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct)
> >   	return ret;
> >   }
> > -static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
> > +static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw)
> >   {
> > -	struct guc_ct_buffer_desc *desc = ctb->desc;
> > -	u32 head = READ_ONCE(desc->head);
> > +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> > +	u32 head;
> >   	u32 space;
> > -	space = CIRC_SPACE(desc->tail, head, ctb->size);
> > +	if (ctb->space >= len_dw)
> > +		return true;
> > +
> > +	head = READ_ONCE(ctb->desc->head);
> > +	if (unlikely(head > ctb->size)) {
> > +		CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u size=%u\n",
> > +			 ctb->desc->head, ctb->desc->tail, ctb->size);
> > +		ctb->desc->status |= GUC_CTB_STATUS_OVERFLOW;
> > +		ctb->broken = true;
> > +		return false;
> > +	}
> > +
> > +	space = CIRC_SPACE(ctb->tail, head, ctb->size);
> > +	ctb->space = space;
> >   	return space >= len_dw;
> >   }
> >   static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
> >   {
> > -	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> > -
> >   	lockdep_assert_held(&ct->ctbs.send.lock);
> > -	if (unlikely(!h2g_has_room(ctb, len_dw))) {
> > +	if (unlikely(!h2g_has_room(ct, len_dw))) {
> >   		if (ct->stall_time == KTIME_MAX)
> >   			ct->stall_time = ktime_get();
> > @@ -612,7 +624,7 @@ static int ct_send(struct intel_guc_ct *ct,
> >   	 */
> >   retry:
> >   	spin_lock_irqsave(&ctb->lock, flags);
> > -	if (unlikely(!h2g_has_room(ctb, len + GUC_CTB_HDR_LEN))) {
> > +	if (unlikely(!h2g_has_room(ct, len + GUC_CTB_HDR_LEN))) {
> >   		if (ct->stall_time == KTIME_MAX)
> >   			ct->stall_time = ktime_get();
> >   		spin_unlock_irqrestore(&ctb->lock, flags);
> > @@ -732,7 +744,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
> >   {
> >   	struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv;
> >   	struct guc_ct_buffer_desc *desc = ctb->desc;
> > -	u32 head = desc->head;
> > +	u32 head = ctb->head;
> >   	u32 tail = desc->tail;
> >   	u32 size = ctb->size;
> >   	u32 *cmds = ctb->cmds;
> > @@ -747,12 +759,29 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
> >   	if (unlikely(desc->status))
> >   		goto corrupted;
> > -	if (unlikely((tail | head) >= size)) {
> > +	GEM_BUG_ON(head > size);
> > +
> > +#ifdef CONFIG_DRM_I915_DEBUG_GUC
> > +	if (unlikely(head != READ_ONCE(desc->head))) {
> > +		CT_ERROR(ct, "Head was modified %u != %u\n",
> > +			 desc->head, ctb->head);
> > +		desc->status |= GUC_CTB_STATUS_MISMATCH;
> > +		goto corrupted;
> > +	}
> > +	if (unlikely((desc->tail | desc->head) >= size)) {
> As per comment in other thread, the check on head here is redundant because
> you have already hit a BUG_ON(ctb->head > size) followed by
> CT_ERROR(ctb->head != desc->head). Therefore, you can't get here if
> 'desc->head > size'.
>

Yep. See above we can likely just delete this.
 
> >   		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
> >   			 head, tail, size);
> >   		desc->status |= GUC_CTB_STATUS_OVERFLOW;
> >   		goto corrupted;
> >   	}
> > +#else
> > +	if (unlikely(tail >= size)) {
> > +		CT_ERROR(ct, "Invalid offsets tail=%u (size=%u)\n",
> > +			 tail, size);
> > +		desc->status |= GUC_CTB_STATUS_OVERFLOW;
> > +		goto corrupted;
> > +	}

Now we can move this outside the #if/#else block as it is the same check
as above. Again, can you do this when you merge?

Matt

> > +#endif
> >   	/* tail == head condition indicates empty */
> >   	available = tail - head;
> > @@ -802,6 +831,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
> >   	}
> >   	CT_DEBUG(ct, "received %*ph\n", 4 * len, (*msg)->msg);
> > +	ctb->head = head;
> >   	/* now update descriptor */
> >   	WRITE_ONCE(desc->head, head);
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > index bee03794c1eb..edd1bba0445d 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > @@ -33,6 +33,9 @@ struct intel_guc;
> >    * @desc: pointer to the buffer descriptor
> >    * @cmds: pointer to the commands buffer
> >    * @size: size of the commands buffer in dwords
> > + * @head: local shadow copy of head in dwords
> > + * @tail: local shadow copy of tail in dwords
> > + * @space: local shadow copy of space in dwords
> >    * @broken: flag to indicate if descriptor data is broken
> >    */
> >   struct intel_guc_ct_buffer {
> > @@ -40,6 +43,9 @@ struct intel_guc_ct_buffer {
> >   	struct guc_ct_buffer_desc *desc;
> >   	u32 *cmds;
> >   	u32 size;
> > +	u32 tail;
> > +	u32 head;
> > +	u32 space;
> >   	bool broken;
> >   };
> 

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 6/7] drm/i915/guc: Optimize CTB writes and reads
  2021-07-07 17:50       ` [Intel-gfx] " Matthew Brost
@ 2021-07-07 18:19         ` John Harrison
  -1 siblings, 0 replies; 54+ messages in thread
From: John Harrison @ 2021-07-07 18:19 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-gfx, dri-devel, Michal.Wajdeczko

On 7/7/2021 10:50, Matthew Brost wrote:
> On Tue, Jul 06, 2021 at 03:51:00PM -0700, John Harrison wrote:
>> On 7/6/2021 15:20, Matthew Brost wrote:
>>> CTB writes are now in the path of command submission and should be
>>> optimized for performance. Rather than reading CTB descriptor values
>>> (e.g. head, tail) which could result in accesses across the PCIe bus,
>>> store shadow local copies and only read/write the descriptor values when
>>> absolutely necessary. Also store the current space in each channel
>>> locally.
>>>
>>> v2:
>>>    (Michal)
>>>     - Add additional sanity checks for head / tail pointers
>>>     - Use GUC_CTB_HDR_LEN rather than magic 1
>>> v3:
>>>    (Michal / John H)
>>>     - Drop redundant check of head value
>>>
>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 88 +++++++++++++++--------
>>>    drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  6 ++
>>>    2 files changed, 65 insertions(+), 29 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> index db3e85b89573..4a73a1f03a9b 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> @@ -130,6 +130,10 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc)
>>>    static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb)
>>>    {
>>>    	ctb->broken = false;
>>> +	ctb->tail = 0;
>>> +	ctb->head = 0;
>>> +	ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
>>> +
>>>    	guc_ct_buffer_desc_init(ctb->desc);
>>>    }
>>> @@ -383,10 +387,8 @@ static int ct_write(struct intel_guc_ct *ct,
>>>    {
>>>    	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>>>    	struct guc_ct_buffer_desc *desc = ctb->desc;
>>> -	u32 head = desc->head;
>>> -	u32 tail = desc->tail;
>>> +	u32 tail = ctb->tail;
>>>    	u32 size = ctb->size;
>>> -	u32 used;
>>>    	u32 header;
>>>    	u32 hxg;
>>>    	u32 type;
>>> @@ -396,25 +398,22 @@ static int ct_write(struct intel_guc_ct *ct,
>>>    	if (unlikely(desc->status))
>>>    		goto corrupted;
>>> -	if (unlikely((tail | head) >= size)) {
>>> +	GEM_BUG_ON(tail > size);
>>> +
>>> +#ifdef CONFIG_DRM_I915_DEBUG_GUC
>>> +	if (unlikely(tail != READ_ONCE(desc->tail))) {
>>> +		CT_ERROR(ct, "Tail was modified %u != %u\n",
>>> +			 desc->tail, ctb->tail);
>>> +		desc->status |= GUC_CTB_STATUS_MISMATCH;
>>> +		goto corrupted;
>>> +	}
>>> +	if (unlikely((desc->tail | desc->head) >= size)) {
>> Same arguments below about head apply to tail here. Also, there is no #else
> Yes, desc->tail can be removed from this check. Same for head below. Can
> you fix this when merging?
>
>> check on ctb->head?
> The ctb->head variable isn't used in this path, nor is ctb->tail in the
> other. In the other path, desc->tail is checked because it is read, while
> desc->head doesn't need to be read here. The other path can also likely
> be reworked to pull the tail check outside of the #if/#else block.
>
>>>    		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
>>> -			 head, tail, size);
>>> +			 desc->head, desc->tail, size);
>>>    		desc->status |= GUC_CTB_STATUS_OVERFLOW;
>>>    		goto corrupted;
>>>    	}
>>> -
>>> -	/*
>>> -	 * tail == head condition indicates empty. GuC FW does not support
>>> -	 * using up the entire buffer to get tail == head meaning full.
>>> -	 */
>>> -	if (tail < head)
>>> -		used = (size - head) + tail;
>>> -	else
>>> -		used = tail - head;
>>> -
>>> -	/* make sure there is a space including extra dw for the header */
>>> -	if (unlikely(used + len + GUC_CTB_HDR_LEN >= size))
>>> -		return -ENOSPC;
>>> +#endif
>>>    	/*
>>>    	 * dw0: CT header (including fence)
>>> @@ -453,7 +452,9 @@ static int ct_write(struct intel_guc_ct *ct,
>>>    	write_barrier(ct);
>>>    	/* now update descriptor */
>>> +	ctb->tail = tail;
>>>    	WRITE_ONCE(desc->tail, tail);
>>> +	ctb->space -= len + GUC_CTB_HDR_LEN;
>>>    	return 0;
>>> @@ -469,7 +470,7 @@ static int ct_write(struct intel_guc_ct *ct,
>>>     * @req:	pointer to pending request
>>>     * @status:	placeholder for status
>>>     *
>>> - * For each sent request, Guc shall send bac CT response message.
>>> + * For each sent request, GuC shall send back CT response message.
>>>     * Our message handler will update status of tracked request once
>>>     * response message with given fence is received. Wait here and
>>>     * check for valid response status value.
>>> @@ -525,24 +526,35 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct)
>>>    	return ret;
>>>    }
>>> -static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
>>> +static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw)
>>>    {
>>> -	struct guc_ct_buffer_desc *desc = ctb->desc;
>>> -	u32 head = READ_ONCE(desc->head);
>>> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>>> +	u32 head;
>>>    	u32 space;
>>> -	space = CIRC_SPACE(desc->tail, head, ctb->size);
>>> +	if (ctb->space >= len_dw)
>>> +		return true;
>>> +
>>> +	head = READ_ONCE(ctb->desc->head);
>>> +	if (unlikely(head > ctb->size)) {
>>> +		CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u size=%u\n",
>>> +			 ctb->desc->head, ctb->desc->tail, ctb->size);
>>> +		ctb->desc->status |= GUC_CTB_STATUS_OVERFLOW;
>>> +		ctb->broken = true;
>>> +		return false;
>>> +	}
>>> +
>>> +	space = CIRC_SPACE(ctb->tail, head, ctb->size);
>>> +	ctb->space = space;
>>>    	return space >= len_dw;
>>>    }
>>>    static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
>>>    {
>>> -	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>>> -
>>>    	lockdep_assert_held(&ct->ctbs.send.lock);
>>> -	if (unlikely(!h2g_has_room(ctb, len_dw))) {
>>> +	if (unlikely(!h2g_has_room(ct, len_dw))) {
>>>    		if (ct->stall_time == KTIME_MAX)
>>>    			ct->stall_time = ktime_get();
>>> @@ -612,7 +624,7 @@ static int ct_send(struct intel_guc_ct *ct,
>>>    	 */
>>>    retry:
>>>    	spin_lock_irqsave(&ctb->lock, flags);
>>> -	if (unlikely(!h2g_has_room(ctb, len + GUC_CTB_HDR_LEN))) {
>>> +	if (unlikely(!h2g_has_room(ct, len + GUC_CTB_HDR_LEN))) {
>>>    		if (ct->stall_time == KTIME_MAX)
>>>    			ct->stall_time = ktime_get();
>>>    		spin_unlock_irqrestore(&ctb->lock, flags);
>>> @@ -732,7 +744,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>>>    {
>>>    	struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv;
>>>    	struct guc_ct_buffer_desc *desc = ctb->desc;
>>> -	u32 head = desc->head;
>>> +	u32 head = ctb->head;
>>>    	u32 tail = desc->tail;
>>>    	u32 size = ctb->size;
>>>    	u32 *cmds = ctb->cmds;
>>> @@ -747,12 +759,29 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>>>    	if (unlikely(desc->status))
>>>    		goto corrupted;
>>> -	if (unlikely((tail | head) >= size)) {
>>> +	GEM_BUG_ON(head > size);
>>> +
>>> +#ifdef CONFIG_DRM_I915_DEBUG_GUC
>>> +	if (unlikely(head != READ_ONCE(desc->head))) {
>>> +		CT_ERROR(ct, "Head was modified %u != %u\n",
>>> +			 desc->head, ctb->head);
>>> +		desc->status |= GUC_CTB_STATUS_MISMATCH;
>>> +		goto corrupted;
>>> +	}
>>> +	if (unlikely((desc->tail | desc->head) >= size)) {
>> As per comment in other thread, the check on head here is redundant because
>> you have already hit a BUG_ON(ctb->head > size) followed by
>> CT_ERROR(ctb->head != desc->head). Therefore, you can't get here if
>> 'desc->head > size'.
>>
> Yep. See above we can likely just delete this.
>   
>>>    		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
>>>    			 head, tail, size);
>>>    		desc->status |= GUC_CTB_STATUS_OVERFLOW;
>>>    		goto corrupted;
>>>    	}
>>> +#else
>>> +	if (unlikely(tail >= size)) {
>>> +		CT_ERROR(ct, "Invalid offsets tail=%u (size=%u)\n",
>>> +			 tail, size);
>>> +		desc->status |= GUC_CTB_STATUS_OVERFLOW;
>>> +		goto corrupted;
>>> +	}
> Now we can move this outside the #if/#else block as it is the same check as
> above. Again, can you do this when you merge?
>
> Matt
Given that a) there are multiple changes which are not trivial 
one-liners and b) I personally prefer keeping the CT_ERRORs and dropping 
the BUG_ON, I would recommend that you repost an updated patch with the 
changes you want. You shouldn't need to repost the whole set, just this 
one patch. And maybe get it reviewed by Michal, as he seems to be in 
agreement with your preferred direction.

John.


>
>>> +#endif
>>>    	/* tail == head condition indicates empty */
>>>    	available = tail - head;
>>> @@ -802,6 +831,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>>>    	}
>>>    	CT_DEBUG(ct, "received %*ph\n", 4 * len, (*msg)->msg);
>>> +	ctb->head = head;
>>>    	/* now update descriptor */
>>>    	WRITE_ONCE(desc->head, head);
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>> index bee03794c1eb..edd1bba0445d 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>> @@ -33,6 +33,9 @@ struct intel_guc;
>>>     * @desc: pointer to the buffer descriptor
>>>     * @cmds: pointer to the commands buffer
>>>     * @size: size of the commands buffer in dwords
>>> + * @head: local shadow copy of head in dwords
>>> + * @tail: local shadow copy of tail in dwords
>>> + * @space: local shadow copy of space in dwords
>>>     * @broken: flag to indicate if descriptor data is broken
>>>     */
>>>    struct intel_guc_ct_buffer {
>>> @@ -40,6 +43,9 @@ struct intel_guc_ct_buffer {
>>>    	struct guc_ct_buffer_desc *desc;
>>>    	u32 *cmds;
>>>    	u32 size;
>>> +	u32 tail;
>>> +	u32 head;
>>> +	u32 space;
>>>    	bool broken;
>>>    };


^ permalink raw reply	[flat|nested] 54+ messages in thread

>>> +		return true;
>>> +
>>> +	head = READ_ONCE(ctb->desc->head);
>>> +	if (unlikely(head > ctb->size)) {
>>> +		CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u size=%u\n",
>>> +			 ctb->desc->head, ctb->desc->tail, ctb->size);
>>> +		ctb->desc->status |= GUC_CTB_STATUS_OVERFLOW;
>>> +		ctb->broken = true;
>>> +		return false;
>>> +	}
>>> +
>>> +	space = CIRC_SPACE(ctb->tail, head, ctb->size);
>>> +	ctb->space = space;
>>>    	return space >= len_dw;
>>>    }
>>>    static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
>>>    {
>>> -	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>>> -
>>>    	lockdep_assert_held(&ct->ctbs.send.lock);
>>> -	if (unlikely(!h2g_has_room(ctb, len_dw))) {
>>> +	if (unlikely(!h2g_has_room(ct, len_dw))) {
>>>    		if (ct->stall_time == KTIME_MAX)
>>>    			ct->stall_time = ktime_get();
>>> @@ -612,7 +624,7 @@ static int ct_send(struct intel_guc_ct *ct,
>>>    	 */
>>>    retry:
>>>    	spin_lock_irqsave(&ctb->lock, flags);
>>> -	if (unlikely(!h2g_has_room(ctb, len + GUC_CTB_HDR_LEN))) {
>>> +	if (unlikely(!h2g_has_room(ct, len + GUC_CTB_HDR_LEN))) {
>>>    		if (ct->stall_time == KTIME_MAX)
>>>    			ct->stall_time = ktime_get();
>>>    		spin_unlock_irqrestore(&ctb->lock, flags);
>>> @@ -732,7 +744,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>>>    {
>>>    	struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv;
>>>    	struct guc_ct_buffer_desc *desc = ctb->desc;
>>> -	u32 head = desc->head;
>>> +	u32 head = ctb->head;
>>>    	u32 tail = desc->tail;
>>>    	u32 size = ctb->size;
>>>    	u32 *cmds = ctb->cmds;
>>> @@ -747,12 +759,29 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>>>    	if (unlikely(desc->status))
>>>    		goto corrupted;
>>> -	if (unlikely((tail | head) >= size)) {
>>> +	GEM_BUG_ON(head > size);
>>> +
>>> +#ifdef CONFIG_DRM_I915_DEBUG_GUC
>>> +	if (unlikely(head != READ_ONCE(desc->head))) {
>>> +		CT_ERROR(ct, "Head was modified %u != %u\n",
>>> +			 desc->head, ctb->head);
>>> +		desc->status |= GUC_CTB_STATUS_MISMATCH;
>>> +		goto corrupted;
>>> +	}
>>> +	if (unlikely((desc->tail | desc->head) >= size)) {
>> As per comment in other thread, the check on head here is redundant because
>> you have already hit a BUG_ON(ctb->head > size) followed by
>> CT_ERROR(ctb->head != desc->head). Therefore, you can't get here if
>> 'desc->head > size'.
>>
> Yep. See above we can likely just delete this.
>   
>>>    		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
>>>    			 head, tail, size);
>>>    		desc->status |= GUC_CTB_STATUS_OVERFLOW;
>>>    		goto corrupted;
>>>    	}
>>> +#else
>>> +	if (unlikely(tail >= size)) {
>>> +		CT_ERROR(ct, "Invalid offsets tail=%u (size=%u)\n",
>>> +			 tail, size);
>>> +		desc->status |= GUC_CTB_STATUS_OVERFLOW;
>>> +		goto corrupted;
>>> +	}
> Now we can move this outside if/else define block as it is same check as
> above. Again can you do this when you merge this?
>
> Matt
Given that a) there are multiple changes which are not trivial 
one-liners and b) I personally prefer keeping the CT_ERRORs and dropping 
the BUG_ON, I would recommend that you repost an updated patch showing 
how you want it changed. You shouldn't need to repost the whole set, 
just this one patch. And maybe get it reviewed by Michal, as he seems to 
be in agreement with your preferred direction.

John.


>
>>> +#endif
>>>    	/* tail == head condition indicates empty */
>>>    	available = tail - head;
>>> @@ -802,6 +831,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>>>    	}
>>>    	CT_DEBUG(ct, "received %*ph\n", 4 * len, (*msg)->msg);
>>> +	ctb->head = head;
>>>    	/* now update descriptor */
>>>    	WRITE_ONCE(desc->head, head);
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>> index bee03794c1eb..edd1bba0445d 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>> @@ -33,6 +33,9 @@ struct intel_guc;
>>>     * @desc: pointer to the buffer descriptor
>>>     * @cmds: pointer to the commands buffer
>>>     * @size: size of the commands buffer in dwords
>>> + * @head: local shadow copy of head in dwords
>>> + * @tail: local shadow copy of tail in dwords
>>> + * @space: local shadow copy of space in dwords
>>>     * @broken: flag to indicate if descriptor data is broken
>>>     */
>>>    struct intel_guc_ct_buffer {
>>> @@ -40,6 +43,9 @@ struct intel_guc_ct_buffer {
>>>    	struct guc_ct_buffer_desc *desc;
>>>    	u32 *cmds;
>>>    	u32 size;
>>> +	u32 tail;
>>> +	u32 head;
>>> +	u32 space;
>>>    	bool broken;
>>>    };


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 6/7] drm/i915/guc: Optimize CTB writes and reads
  2021-07-07 18:19         ` [Intel-gfx] " John Harrison
@ 2021-07-07 18:56           ` Matthew Brost
  -1 siblings, 0 replies; 54+ messages in thread
From: Matthew Brost @ 2021-07-07 18:56 UTC (permalink / raw)
  To: John Harrison; +Cc: intel-gfx, dri-devel, Michal.Wajdeczko

On Wed, Jul 07, 2021 at 11:19:01AM -0700, John Harrison wrote:
> On 7/7/2021 10:50, Matthew Brost wrote:
> > On Tue, Jul 06, 2021 at 03:51:00PM -0700, John Harrison wrote:
> > > On 7/6/2021 15:20, Matthew Brost wrote:
> > > > CTB writes are now in the path of command submission and should be
> > > > optimized for performance. Rather than reading CTB descriptor values
> > > > (e.g. head, tail) which could result in accesses across the PCIe bus,
> > > > store shadow local copies and only read/write the descriptor values when
> > > > absolutely necessary. Also store the current space in the each channel
> > > > locally.
> > > > 
> > > > v2:
> > > >    (Michal)
> > > >     - Add additional sanity checks for head / tail pointers
> > > >     - Use GUC_CTB_HDR_LEN rather than magic 1
> > > > v3:
> > > >    (Michal / John H)
> > > >     - Drop redundant check of head value
> > > > 
> > > > Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > ---
> > > >    drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 88 +++++++++++++++--------
> > > >    drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  6 ++
> > > >    2 files changed, 65 insertions(+), 29 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > > > index db3e85b89573..4a73a1f03a9b 100644
> > > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > > > @@ -130,6 +130,10 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc)
> > > >    static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb)
> > > >    {
> > > >    	ctb->broken = false;
> > > > +	ctb->tail = 0;
> > > > +	ctb->head = 0;
> > > > +	ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
> > > > +
> > > >    	guc_ct_buffer_desc_init(ctb->desc);
> > > >    }
> > > > @@ -383,10 +387,8 @@ static int ct_write(struct intel_guc_ct *ct,
> > > >    {
> > > >    	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> > > >    	struct guc_ct_buffer_desc *desc = ctb->desc;
> > > > -	u32 head = desc->head;
> > > > -	u32 tail = desc->tail;
> > > > +	u32 tail = ctb->tail;
> > > >    	u32 size = ctb->size;
> > > > -	u32 used;
> > > >    	u32 header;
> > > >    	u32 hxg;
> > > >    	u32 type;
> > > > @@ -396,25 +398,22 @@ static int ct_write(struct intel_guc_ct *ct,
> > > >    	if (unlikely(desc->status))
> > > >    		goto corrupted;
> > > > -	if (unlikely((tail | head) >= size)) {
> > > > +	GEM_BUG_ON(tail > size);
> > > > +
> > > > +#ifdef CONFIG_DRM_I915_DEBUG_GUC
> > > > +	if (unlikely(tail != READ_ONCE(desc->tail))) {
> > > > +		CT_ERROR(ct, "Tail was modified %u != %u\n",
> > > > +			 desc->tail, ctb->tail);
> > > > +		desc->status |= GUC_CTB_STATUS_MISMATCH;
> > > > +		goto corrupted;
> > > > +	}
> > > > +	if (unlikely((desc->tail | desc->head) >= size)) {
> > > Same arguments below about head apply to tail here. Also, there is no #else
> > Yes, desc->tail can be removed from this check. Same for head below. Can
> > you fix this when merging?
> > 
> > > check on ctb->head?
> > ctb->head variable isn't used in this path nor is ctb->tail in the
> > other. In the other path desc->tail is checked as it is read while
> > desc->head isn't needed to be read here. The other path can also likely
> > be reworked to pull the tail check outside of the if / else define
> > block.
> > 
> > > >    		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
> > > > -			 head, tail, size);
> > > > +			 desc->head, desc->tail, size);
> > > >    		desc->status |= GUC_CTB_STATUS_OVERFLOW;
> > > >    		goto corrupted;
> > > >    	}
> > > > -
> > > > -	/*
> > > > -	 * tail == head condition indicates empty. GuC FW does not support
> > > > -	 * using up the entire buffer to get tail == head meaning full.
> > > > -	 */
> > > > -	if (tail < head)
> > > > -		used = (size - head) + tail;
> > > > -	else
> > > > -		used = tail - head;
> > > > -
> > > > -	/* make sure there is a space including extra dw for the header */
> > > > -	if (unlikely(used + len + GUC_CTB_HDR_LEN >= size))
> > > > -		return -ENOSPC;
> > > > +#endif
> > > >    	/*
> > > >    	 * dw0: CT header (including fence)
> > > > @@ -453,7 +452,9 @@ static int ct_write(struct intel_guc_ct *ct,
> > > >    	write_barrier(ct);
> > > >    	/* now update descriptor */
> > > > +	ctb->tail = tail;
> > > >    	WRITE_ONCE(desc->tail, tail);
> > > > +	ctb->space -= len + GUC_CTB_HDR_LEN;
> > > >    	return 0;
> > > > @@ -469,7 +470,7 @@ static int ct_write(struct intel_guc_ct *ct,
> > > >     * @req:	pointer to pending request
> > > >     * @status:	placeholder for status
> > > >     *
> > > > - * For each sent request, Guc shall send bac CT response message.
> > > > + * For each sent request, GuC shall send back CT response message.
> > > >     * Our message handler will update status of tracked request once
> > > >     * response message with given fence is received. Wait here and
> > > >     * check for valid response status value.
> > > > @@ -525,24 +526,35 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct)
> > > >    	return ret;
> > > >    }
> > > > -static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
> > > > +static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw)
> > > >    {
> > > > -	struct guc_ct_buffer_desc *desc = ctb->desc;
> > > > -	u32 head = READ_ONCE(desc->head);
> > > > +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> > > > +	u32 head;
> > > >    	u32 space;
> > > > -	space = CIRC_SPACE(desc->tail, head, ctb->size);
> > > > +	if (ctb->space >= len_dw)
> > > > +		return true;
> > > > +
> > > > +	head = READ_ONCE(ctb->desc->head);
> > > > +	if (unlikely(head > ctb->size)) {
> > > > +		CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u size=%u\n",
> > > > +			 ctb->desc->head, ctb->desc->tail, ctb->size);
> > > > +		ctb->desc->status |= GUC_CTB_STATUS_OVERFLOW;
> > > > +		ctb->broken = true;
> > > > +		return false;
> > > > +	}
> > > > +
> > > > +	space = CIRC_SPACE(ctb->tail, head, ctb->size);
> > > > +	ctb->space = space;
> > > >    	return space >= len_dw;
> > > >    }
> > > >    static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
> > > >    {
> > > > -	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> > > > -
> > > >    	lockdep_assert_held(&ct->ctbs.send.lock);
> > > > -	if (unlikely(!h2g_has_room(ctb, len_dw))) {
> > > > +	if (unlikely(!h2g_has_room(ct, len_dw))) {
> > > >    		if (ct->stall_time == KTIME_MAX)
> > > >    			ct->stall_time = ktime_get();
> > > > @@ -612,7 +624,7 @@ static int ct_send(struct intel_guc_ct *ct,
> > > >    	 */
> > > >    retry:
> > > >    	spin_lock_irqsave(&ctb->lock, flags);
> > > > -	if (unlikely(!h2g_has_room(ctb, len + GUC_CTB_HDR_LEN))) {
> > > > +	if (unlikely(!h2g_has_room(ct, len + GUC_CTB_HDR_LEN))) {
> > > >    		if (ct->stall_time == KTIME_MAX)
> > > >    			ct->stall_time = ktime_get();
> > > >    		spin_unlock_irqrestore(&ctb->lock, flags);
> > > > @@ -732,7 +744,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
> > > >    {
> > > >    	struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv;
> > > >    	struct guc_ct_buffer_desc *desc = ctb->desc;
> > > > -	u32 head = desc->head;
> > > > +	u32 head = ctb->head;
> > > >    	u32 tail = desc->tail;
> > > >    	u32 size = ctb->size;
> > > >    	u32 *cmds = ctb->cmds;
> > > > @@ -747,12 +759,29 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
> > > >    	if (unlikely(desc->status))
> > > >    		goto corrupted;
> > > > -	if (unlikely((tail | head) >= size)) {
> > > > +	GEM_BUG_ON(head > size);
> > > > +
> > > > +#ifdef CONFIG_DRM_I915_DEBUG_GUC
> > > > +	if (unlikely(head != READ_ONCE(desc->head))) {
> > > > +		CT_ERROR(ct, "Head was modified %u != %u\n",
> > > > +			 desc->head, ctb->head);
> > > > +		desc->status |= GUC_CTB_STATUS_MISMATCH;
> > > > +		goto corrupted;
> > > > +	}
> > > > +	if (unlikely((desc->tail | desc->head) >= size)) {
> > > As per comment in other thread, the check on head here is redundant because
> > > you have already hit a BUG_ON(ctb->head > size) followed by
> > > CT_ERROR(ctb->head != desc->head). Therefore, you can't get here if
> > > 'desc->head > size'.
> > > 
> > Yep. See above we can likely just delete this.
> > > >    		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
> > > >    			 head, tail, size);
> > > >    		desc->status |= GUC_CTB_STATUS_OVERFLOW;
> > > >    		goto corrupted;
> > > >    	}
> > > > +#else
> > > > +	if (unlikely(tail >= size)) {
> > > > +		CT_ERROR(ct, "Invalid offsets tail=%u (size=%u)\n",
> > > > +			 tail, size);
> > > > +		desc->status |= GUC_CTB_STATUS_OVERFLOW;
> > > > +		goto corrupted;
> > > > +	}
> > Now we can move this outside if/else define block as it is same check as
> > above. Again can you do this when you merge this?
> > 
> > Matt
> Given that a) there are multiple changes which are not trivial one-liners
> and b) I personally prefer keeping the CT_ERRORs and dropping the BUG_ON, I
> would recommend that you repost an updated patch showing how you want it
> changed. You shouldn't need to repost the whole set, just this one patch.
> And maybe get it reviewed by Michal, as he seems to be in agreement with
> your preferred direction.
> 

Ok, I sent it but it looks like Patchwork didn't like it. Anyway, we
should be able to review that patch.

Matt 

> John.
> 
> 
> > 
> > > > +#endif
> > > >    	/* tail == head condition indicates empty */
> > > >    	available = tail - head;
> > > > @@ -802,6 +831,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
> > > >    	}
> > > >    	CT_DEBUG(ct, "received %*ph\n", 4 * len, (*msg)->msg);
> > > > +	ctb->head = head;
> > > >    	/* now update descriptor */
> > > >    	WRITE_ONCE(desc->head, head);
> > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > > > index bee03794c1eb..edd1bba0445d 100644
> > > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > > > @@ -33,6 +33,9 @@ struct intel_guc;
> > > >     * @desc: pointer to the buffer descriptor
> > > >     * @cmds: pointer to the commands buffer
> > > >     * @size: size of the commands buffer in dwords
> > > > + * @head: local shadow copy of head in dwords
> > > > + * @tail: local shadow copy of tail in dwords
> > > > + * @space: local shadow copy of space in dwords
> > > >     * @broken: flag to indicate if descriptor data is broken
> > > >     */
> > > >    struct intel_guc_ct_buffer {
> > > > @@ -40,6 +43,9 @@ struct intel_guc_ct_buffer {
> > > >    	struct guc_ct_buffer_desc *desc;
> > > >    	u32 *cmds;
> > > >    	u32 size;
> > > > +	u32 tail;
> > > > +	u32 head;
> > > > +	u32 space;
> > > >    	bool broken;
> > > >    };
> 

^ permalink raw reply	[flat|nested] 54+ messages in thread


* [PATCH 06/56] drm/i915/guc: Optimize CTB writes and reads
  2021-07-06 22:20 ` [Intel-gfx] " Matthew Brost
@ 2021-07-07 19:09   ` Matthew Brost
  -1 siblings, 0 replies; 54+ messages in thread
From: Matthew Brost @ 2021-07-07 19:09 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: john.c.harrison, Michal.Wajdeczko

CTB writes are now in the path of command submission and should be
optimized for performance. Rather than reading CTB descriptor values
(e.g. head, tail) which could result in accesses across the PCIe bus,
store shadow local copies and only read/write the descriptor values when
absolutely necessary. Also store the current space in each channel
locally.

v2:
 (Michal)
  - Add additional sanity checks for head / tail pointers
  - Use GUC_CTB_HDR_LEN rather than magic 1
v3:
 (Michal / John H)
  - Drop redundant check of head value
v4:
 (John H)
  - Drop redundant checks of tail / head values

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 87 ++++++++++++++---------
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  6 ++
 2 files changed, 61 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index db3e85b89573..37fe9f3bbce3 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -130,6 +130,10 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc)
 static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb)
 {
 	ctb->broken = false;
+	ctb->tail = 0;
+	ctb->head = 0;
+	ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
+
 	guc_ct_buffer_desc_init(ctb->desc);
 }
 
@@ -383,10 +387,8 @@ static int ct_write(struct intel_guc_ct *ct,
 {
 	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
 	struct guc_ct_buffer_desc *desc = ctb->desc;
-	u32 head = desc->head;
-	u32 tail = desc->tail;
+	u32 tail = ctb->tail;
 	u32 size = ctb->size;
-	u32 used;
 	u32 header;
 	u32 hxg;
 	u32 type;
@@ -396,25 +398,22 @@ static int ct_write(struct intel_guc_ct *ct,
 	if (unlikely(desc->status))
 		goto corrupted;
 
-	if (unlikely((tail | head) >= size)) {
-		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
-			 head, tail, size);
+	GEM_BUG_ON(tail > size);
+
+#ifdef CONFIG_DRM_I915_DEBUG_GUC
+	if (unlikely(tail != READ_ONCE(desc->tail))) {
+		CT_ERROR(ct, "Tail was modified %u != %u\n",
+			 desc->tail, ctb->tail);
+		desc->status |= GUC_CTB_STATUS_MISMATCH;
+		goto corrupted;
+	}
+	if (unlikely(desc->head >= size)) {
+		CT_ERROR(ct, "Invalid offsets head=%u (size=%u)\n",
+			 desc->head, size);
 		desc->status |= GUC_CTB_STATUS_OVERFLOW;
 		goto corrupted;
 	}
-
-	/*
-	 * tail == head condition indicates empty. GuC FW does not support
-	 * using up the entire buffer to get tail == head meaning full.
-	 */
-	if (tail < head)
-		used = (size - head) + tail;
-	else
-		used = tail - head;
-
-	/* make sure there is a space including extra dw for the header */
-	if (unlikely(used + len + GUC_CTB_HDR_LEN >= size))
-		return -ENOSPC;
+#endif
 
 	/*
 	 * dw0: CT header (including fence)
@@ -453,7 +452,9 @@ static int ct_write(struct intel_guc_ct *ct,
 	write_barrier(ct);
 
 	/* now update descriptor */
+	ctb->tail = tail;
 	WRITE_ONCE(desc->tail, tail);
+	ctb->space -= len + GUC_CTB_HDR_LEN;
 
 	return 0;
 
@@ -469,7 +470,7 @@ static int ct_write(struct intel_guc_ct *ct,
  * @req:	pointer to pending request
  * @status:	placeholder for status
  *
- * For each sent request, Guc shall send bac CT response message.
+ * For each sent request, GuC shall send back CT response message.
  * Our message handler will update status of tracked request once
  * response message with given fence is received. Wait here and
  * check for valid response status value.
@@ -525,24 +526,35 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct)
 	return ret;
 }
 
-static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
+static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw)
 {
-	struct guc_ct_buffer_desc *desc = ctb->desc;
-	u32 head = READ_ONCE(desc->head);
+	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
+	u32 head;
 	u32 space;
 
-	space = CIRC_SPACE(desc->tail, head, ctb->size);
+	if (ctb->space >= len_dw)
+		return true;
+
+	head = READ_ONCE(ctb->desc->head);
+	if (unlikely(head > ctb->size)) {
+		CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u size=%u\n",
+			 ctb->desc->head, ctb->desc->tail, ctb->size);
+		ctb->desc->status |= GUC_CTB_STATUS_OVERFLOW;
+		ctb->broken = true;
+		return false;
+	}
+
+	space = CIRC_SPACE(ctb->tail, head, ctb->size);
+	ctb->space = space;
 
 	return space >= len_dw;
 }
 
 static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
 {
-	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
-
 	lockdep_assert_held(&ct->ctbs.send.lock);
 
-	if (unlikely(!h2g_has_room(ctb, len_dw))) {
+	if (unlikely(!h2g_has_room(ct, len_dw))) {
 		if (ct->stall_time == KTIME_MAX)
 			ct->stall_time = ktime_get();
 
@@ -612,7 +624,7 @@ static int ct_send(struct intel_guc_ct *ct,
 	 */
 retry:
 	spin_lock_irqsave(&ctb->lock, flags);
-	if (unlikely(!h2g_has_room(ctb, len + GUC_CTB_HDR_LEN))) {
+	if (unlikely(!h2g_has_room(ct, len + GUC_CTB_HDR_LEN))) {
 		if (ct->stall_time == KTIME_MAX)
 			ct->stall_time = ktime_get();
 		spin_unlock_irqrestore(&ctb->lock, flags);
@@ -732,7 +744,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 {
 	struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv;
 	struct guc_ct_buffer_desc *desc = ctb->desc;
-	u32 head = desc->head;
+	u32 head = ctb->head;
 	u32 tail = desc->tail;
 	u32 size = ctb->size;
 	u32 *cmds = ctb->cmds;
@@ -747,9 +759,19 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 	if (unlikely(desc->status))
 		goto corrupted;
 
-	if (unlikely((tail | head) >= size)) {
-		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
-			 head, tail, size);
+	GEM_BUG_ON(head > size);
+
+#ifdef CONFIG_DRM_I915_DEBUG_GUC
+	if (unlikely(head != READ_ONCE(desc->head))) {
+		CT_ERROR(ct, "Head was modified %u != %u\n",
+			 desc->head, ctb->head);
+		desc->status |= GUC_CTB_STATUS_MISMATCH;
+		goto corrupted;
+	}
+#endif
+	if (unlikely(tail >= size)) {
+		CT_ERROR(ct, "Invalid offsets tail=%u (size=%u)\n",
+			 tail, size);
 		desc->status |= GUC_CTB_STATUS_OVERFLOW;
 		goto corrupted;
 	}
@@ -802,6 +824,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 	}
 	CT_DEBUG(ct, "received %*ph\n", 4 * len, (*msg)->msg);
 
+	ctb->head = head;
 	/* now update descriptor */
 	WRITE_ONCE(desc->head, head);
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
index bee03794c1eb..edd1bba0445d 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
@@ -33,6 +33,9 @@ struct intel_guc;
  * @desc: pointer to the buffer descriptor
  * @cmds: pointer to the commands buffer
  * @size: size of the commands buffer in dwords
+ * @head: local shadow copy of head in dwords
+ * @tail: local shadow copy of tail in dwords
+ * @space: local shadow copy of space in dwords
  * @broken: flag to indicate if descriptor data is broken
  */
 struct intel_guc_ct_buffer {
@@ -40,6 +43,9 @@ struct intel_guc_ct_buffer {
 	struct guc_ct_buffer_desc *desc;
 	u32 *cmds;
 	u32 size;
+	u32 tail;
+	u32 head;
+	u32 space;
 	bool broken;
 };
 
-- 
2.28.0



* [Intel-gfx] [PATCH 06/56] drm/i915/guc: Optimize CTB writes and reads
@ 2021-07-07 19:09   ` Matthew Brost
  0 siblings, 0 replies; 54+ messages in thread
From: Matthew Brost @ 2021-07-07 19:09 UTC (permalink / raw)
  To: intel-gfx, dri-devel

CTB writes are now in the path of command submission and should be
optimized for performance. Rather than reading CTB descriptor values
(e.g. head, tail) which could result in accesses across the PCIe bus,
store shadow local copies and only read/write the descriptor values when
absolutely necessary. Also store the current space in each channel
locally.
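Once the shadows exist, the debug-only consistency check becomes possible. A reduced sketch of the CONFIG_DRM_I915_DEBUG_GUC path in ct_write() is below; the status bit value is assumed for the sketch, the real one comes from the GuC CTB ABI headers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Status bit assumed for this sketch; the real value is defined by
 * the GuC CTB ABI headers. */
#define GUC_CTB_STATUS_MISMATCH	(1u << 2)

struct desc { uint32_t tail, status; };

/*
 * Reduced form of the debug-only check: the local shadow tail must
 * still equal the tail last posted to the shared descriptor.  If it
 * does not, something else wrote the descriptor, and the channel is
 * marked corrupted rather than silently trusting either value.
 */
static bool tail_still_ours(struct desc *desc, uint32_t shadow_tail)
{
	if (shadow_tail != desc->tail) {	/* READ_ONCE() in the kernel */
		desc->status |= GUC_CTB_STATUS_MISMATCH;
		return false;			/* caller jumps to "corrupted" */
	}
	return true;
}
```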

v2:
 (Michal)
  - Add additional sanity checks for head / tail pointers
  - Use GUC_CTB_HDR_LEN rather than magic 1
v3:
 (Michal / John H)
  - Drop redundant check of head value
v4:
 (John H)
  - Drop redundant checks of tail / head values

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 87 ++++++++++++++---------
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  6 ++
 2 files changed, 61 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index db3e85b89573..37fe9f3bbce3 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -130,6 +130,10 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc)
 static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb)
 {
 	ctb->broken = false;
+	ctb->tail = 0;
+	ctb->head = 0;
+	ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
+
 	guc_ct_buffer_desc_init(ctb->desc);
 }
 
@@ -383,10 +387,8 @@ static int ct_write(struct intel_guc_ct *ct,
 {
 	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
 	struct guc_ct_buffer_desc *desc = ctb->desc;
-	u32 head = desc->head;
-	u32 tail = desc->tail;
+	u32 tail = ctb->tail;
 	u32 size = ctb->size;
-	u32 used;
 	u32 header;
 	u32 hxg;
 	u32 type;
@@ -396,25 +398,22 @@ static int ct_write(struct intel_guc_ct *ct,
 	if (unlikely(desc->status))
 		goto corrupted;
 
-	if (unlikely((tail | head) >= size)) {
-		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
-			 head, tail, size);
+	GEM_BUG_ON(tail > size);
+
+#ifdef CONFIG_DRM_I915_DEBUG_GUC
+	if (unlikely(tail != READ_ONCE(desc->tail))) {
+		CT_ERROR(ct, "Tail was modified %u != %u\n",
+			 desc->tail, ctb->tail);
+		desc->status |= GUC_CTB_STATUS_MISMATCH;
+		goto corrupted;
+	}
+	if (unlikely(desc->head >= size)) {
+		CT_ERROR(ct, "Invalid offsets head=%u (size=%u)\n",
+			 desc->head, size);
 		desc->status |= GUC_CTB_STATUS_OVERFLOW;
 		goto corrupted;
 	}
-
-	/*
-	 * tail == head condition indicates empty. GuC FW does not support
-	 * using up the entire buffer to get tail == head meaning full.
-	 */
-	if (tail < head)
-		used = (size - head) + tail;
-	else
-		used = tail - head;
-
-	/* make sure there is a space including extra dw for the header */
-	if (unlikely(used + len + GUC_CTB_HDR_LEN >= size))
-		return -ENOSPC;
+#endif
 
 	/*
 	 * dw0: CT header (including fence)
@@ -453,7 +452,9 @@ static int ct_write(struct intel_guc_ct *ct,
 	write_barrier(ct);
 
 	/* now update descriptor */
+	ctb->tail = tail;
 	WRITE_ONCE(desc->tail, tail);
+	ctb->space -= len + GUC_CTB_HDR_LEN;
 
 	return 0;
 
@@ -469,7 +470,7 @@ static int ct_write(struct intel_guc_ct *ct,
  * @req:	pointer to pending request
  * @status:	placeholder for status
  *
- * For each sent request, Guc shall send bac CT response message.
+ * For each sent request, GuC shall send back CT response message.
  * Our message handler will update status of tracked request once
  * response message with given fence is received. Wait here and
  * check for valid response status value.
@@ -525,24 +526,35 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct)
 	return ret;
 }
 
-static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
+static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw)
 {
-	struct guc_ct_buffer_desc *desc = ctb->desc;
-	u32 head = READ_ONCE(desc->head);
+	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
+	u32 head;
 	u32 space;
 
-	space = CIRC_SPACE(desc->tail, head, ctb->size);
+	if (ctb->space >= len_dw)
+		return true;
+
+	head = READ_ONCE(ctb->desc->head);
+	if (unlikely(head > ctb->size)) {
+		CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u size=%u\n",
+			 ctb->desc->head, ctb->desc->tail, ctb->size);
+		ctb->desc->status |= GUC_CTB_STATUS_OVERFLOW;
+		ctb->broken = true;
+		return false;
+	}
+
+	space = CIRC_SPACE(ctb->tail, head, ctb->size);
+	ctb->space = space;
 
 	return space >= len_dw;
 }
 
 static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
 {
-	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
-
 	lockdep_assert_held(&ct->ctbs.send.lock);
 
-	if (unlikely(!h2g_has_room(ctb, len_dw))) {
+	if (unlikely(!h2g_has_room(ct, len_dw))) {
 		if (ct->stall_time == KTIME_MAX)
 			ct->stall_time = ktime_get();
 
@@ -612,7 +624,7 @@ static int ct_send(struct intel_guc_ct *ct,
 	 */
 retry:
 	spin_lock_irqsave(&ctb->lock, flags);
-	if (unlikely(!h2g_has_room(ctb, len + GUC_CTB_HDR_LEN))) {
+	if (unlikely(!h2g_has_room(ct, len + GUC_CTB_HDR_LEN))) {
 		if (ct->stall_time == KTIME_MAX)
 			ct->stall_time = ktime_get();
 		spin_unlock_irqrestore(&ctb->lock, flags);
@@ -732,7 +744,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 {
 	struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv;
 	struct guc_ct_buffer_desc *desc = ctb->desc;
-	u32 head = desc->head;
+	u32 head = ctb->head;
 	u32 tail = desc->tail;
 	u32 size = ctb->size;
 	u32 *cmds = ctb->cmds;
@@ -747,9 +759,19 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 	if (unlikely(desc->status))
 		goto corrupted;
 
-	if (unlikely((tail | head) >= size)) {
-		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
-			 head, tail, size);
+	GEM_BUG_ON(head > size);
+
+#ifdef CONFIG_DRM_I915_DEBUG_GUC
+	if (unlikely(head != READ_ONCE(desc->head))) {
+		CT_ERROR(ct, "Head was modified %u != %u\n",
+			 desc->head, ctb->head);
+		desc->status |= GUC_CTB_STATUS_MISMATCH;
+		goto corrupted;
+	}
+#endif
+	if (unlikely(tail >= size)) {
+		CT_ERROR(ct, "Invalid offsets tail=%u (size=%u)\n",
+			 tail, size);
 		desc->status |= GUC_CTB_STATUS_OVERFLOW;
 		goto corrupted;
 	}
@@ -802,6 +824,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 	}
 	CT_DEBUG(ct, "received %*ph\n", 4 * len, (*msg)->msg);
 
+	ctb->head = head;
 	/* now update descriptor */
 	WRITE_ONCE(desc->head, head);
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
index bee03794c1eb..edd1bba0445d 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
@@ -33,6 +33,9 @@ struct intel_guc;
  * @desc: pointer to the buffer descriptor
  * @cmds: pointer to the commands buffer
  * @size: size of the commands buffer in dwords
+ * @head: local shadow copy of head in dwords
+ * @tail: local shadow copy of tail in dwords
+ * @space: local shadow copy of space in dwords
  * @broken: flag to indicate if descriptor data is broken
  */
 struct intel_guc_ct_buffer {
@@ -40,6 +43,9 @@ struct intel_guc_ct_buffer {
 	struct guc_ct_buffer_desc *desc;
 	u32 *cmds;
 	u32 size;
+	u32 tail;
+	u32 head;
+	u32 space;
 	bool broken;
 };
 
-- 
2.28.0


* Re: [PATCH 6/7] drm/i915/guc: Optimize CTB writes and reads
  2021-07-07 18:56           ` [Intel-gfx] " Matthew Brost
@ 2021-07-07 20:21             ` John Harrison
  -1 siblings, 0 replies; 54+ messages in thread
From: John Harrison @ 2021-07-07 20:21 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-gfx, dri-devel, Michal.Wajdeczko

On 7/7/2021 11:56, Matthew Brost wrote:
<snip>
> Ok, I sent it but it looks like patchworks didn't like it. Anyways we
> should be able to review that patch.
>
> Matt
Maybe because it came out as 6/56 instead of 6/7? Also, not sure if it 
needs to be in reply to 0/7 or 6/7?

John.



* Re: [Intel-gfx] [PATCH 6/7] drm/i915/guc: Optimize CTB writes and reads
@ 2021-07-07 20:21             ` John Harrison
  0 siblings, 0 replies; 54+ messages in thread
From: John Harrison @ 2021-07-07 20:21 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-gfx, dri-devel

On 7/7/2021 11:56, Matthew Brost wrote:
<snip>
> Ok, I sent it but it looks like patchworks didn't like it. Anyways we
> should be able to review that patch.
>
> Matt
Maybe because it came out as 6/56 instead of 6/7? Also, not sure if it 
needs to be in reply to 0/7 or 6/7?

John.


* Re: [PATCH 6/7] drm/i915/guc: Optimize CTB writes and reads
  2021-07-07 20:21             ` [Intel-gfx] " John Harrison
@ 2021-07-07 20:23               ` Matthew Brost
  -1 siblings, 0 replies; 54+ messages in thread
From: Matthew Brost @ 2021-07-07 20:23 UTC (permalink / raw)
  To: John Harrison; +Cc: intel-gfx, dri-devel, Michal.Wajdeczko

On Wed, Jul 07, 2021 at 01:21:35PM -0700, John Harrison wrote:
> On 7/7/2021 11:56, Matthew Brost wrote:
> <snip>
> > Ok, I sent it but it looks like patchworks didn't like it. Anyways we
> > should be able to review that patch.
> > 
> > Matt
> Maybe because it came out as 6/56 instead of 6/7? Also, not sure if it needs
> to be in reply to 0/7 or 6/7?

Yea, that is probably it. I think 6/7 would've made patchworks happy.

Matt

> 
> John.
> 


* Re: [Intel-gfx] [PATCH 6/7] drm/i915/guc: Optimize CTB writes and reads
@ 2021-07-07 20:23               ` Matthew Brost
  0 siblings, 0 replies; 54+ messages in thread
From: Matthew Brost @ 2021-07-07 20:23 UTC (permalink / raw)
  To: John Harrison; +Cc: intel-gfx, dri-devel

On Wed, Jul 07, 2021 at 01:21:35PM -0700, John Harrison wrote:
> On 7/7/2021 11:56, Matthew Brost wrote:
> <snip>
> > Ok, I sent it but it looks like patchworks didn't like it. Anyways we
> > should be able to review that patch.
> > 
> > Matt
> Maybe because it came out as 6/56 instead of 6/7? Also, not sure if it needs
> to be in reply to 0/7 or 6/7?

Yea, that is probably it. I think 6/7 would've made patchworks happy.

Matt

> 
> John.
> 

* Re: [PATCH 06/56] drm/i915/guc: Optimize CTB writes and reads
  2021-07-07 19:09   ` [Intel-gfx] " Matthew Brost
@ 2021-07-07 20:30     ` Michal Wajdeczko
  -1 siblings, 0 replies; 54+ messages in thread
From: Michal Wajdeczko @ 2021-07-07 20:30 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel; +Cc: john.c.harrison

Hi,

On 07.07.2021 21:09, Matthew Brost wrote:
> CTB writes are now in the path of command submission and should be
> optimized for performance. Rather than reading CTB descriptor values
> (e.g. head, tail) which could result in accesses across the PCIe bus,
> store shadow local copies and only read/write the descriptor values when
> absolutely necessary. Also store the current space in each channel
> locally.
> 
> v2:
>  (Michal)
>   - Add additional sanity checks for head / tail pointers
>   - Use GUC_CTB_HDR_LEN rather than magic 1
> v3:
>  (Michal / John H)
>   - Drop redundant check of head value
> v4:
>  (John H)
>   - Drop redundant checks of tail / head values

mostly nits, but since you will be sending it again...

> 
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 87 ++++++++++++++---------
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  6 ++
>  2 files changed, 61 insertions(+), 32 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index db3e85b89573..37fe9f3bbce3 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -130,6 +130,10 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc)
>  static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb)
>  {
>  	ctb->broken = false;
> +	ctb->tail = 0;
> +	ctb->head = 0;
> +	ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
> +
>  	guc_ct_buffer_desc_init(ctb->desc);
>  }
>  
> @@ -383,10 +387,8 @@ static int ct_write(struct intel_guc_ct *ct,
>  {
>  	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>  	struct guc_ct_buffer_desc *desc = ctb->desc;
> -	u32 head = desc->head;
> -	u32 tail = desc->tail;
> +	u32 tail = ctb->tail;
>  	u32 size = ctb->size;
> -	u32 used;
>  	u32 header;
>  	u32 hxg;
>  	u32 type;
> @@ -396,25 +398,22 @@ static int ct_write(struct intel_guc_ct *ct,
>  	if (unlikely(desc->status))
>  		goto corrupted;
>  
> -	if (unlikely((tail | head) >= size)) {
> -		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
> -			 head, tail, size);
> +	GEM_BUG_ON(tail > size);
> +
> +#ifdef CONFIG_DRM_I915_DEBUG_GUC
> +	if (unlikely(tail != READ_ONCE(desc->tail))) {
> +		CT_ERROR(ct, "Tail was modified %u != %u\n",
> +			 desc->tail, ctb->tail);

here you are accessing desc->tail again, so maybe we can use:

	u32 raw __maybe_unused;

	raw = READ_ONCE(desc->tail);
	if (unlikely(raw != tail)) ...
		CT_ERROR(..., raw, tail);

> +		desc->status |= GUC_CTB_STATUS_MISMATCH;
> +		goto corrupted;
> +	}
> +	if (unlikely(desc->head >= size)) {

above you are reading the value from desc using READ_ONCE; this could be

	raw = READ_ONCE(desc->head);
	if (unlikely(raw >= size)) ...
		CT_ERROR(..., raw, size);

> +		CT_ERROR(ct, "Invalid offsets head=%u (size=%u)\n",

"Invalid head offset %u > %u\n" ?

> +			 desc->head, size);
>  		desc->status |= GUC_CTB_STATUS_OVERFLOW;
>  		goto corrupted;
>  	}
> -
> -	/*
> -	 * tail == head condition indicates empty. GuC FW does not support
> -	 * using up the entire buffer to get tail == head meaning full.
> -	 */
> -	if (tail < head)
> -		used = (size - head) + tail;
> -	else
> -		used = tail - head;
> -
> -	/* make sure there is a space including extra dw for the header */
> -	if (unlikely(used + len + GUC_CTB_HDR_LEN >= size))
> -		return -ENOSPC;
> +#endif
>  
>  	/*
>  	 * dw0: CT header (including fence)
> @@ -453,7 +452,9 @@ static int ct_write(struct intel_guc_ct *ct,
>  	write_barrier(ct);
>  
>  	/* now update descriptor */
> +	ctb->tail = tail;
>  	WRITE_ONCE(desc->tail, tail);
> +	ctb->space -= len + GUC_CTB_HDR_LEN;

maybe keep ctb updates together?

+	/* update local copies */
+	ctb->tail = tail;
+	ctb->space -= len + GUC_CTB_HDR_LEN;
+
  	/* now update descriptor */
 	WRITE_ONCE(desc->tail, tail);

>  
>  	return 0;
>  
> @@ -469,7 +470,7 @@ static int ct_write(struct intel_guc_ct *ct,
>   * @req:	pointer to pending request
>   * @status:	placeholder for status
>   *
> - * For each sent request, Guc shall send bac CT response message.
> + * For each sent request, GuC shall send back CT response message.
>   * Our message handler will update status of tracked request once
>   * response message with given fence is received. Wait here and
>   * check for valid response status value.
> @@ -525,24 +526,35 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct)
>  	return ret;
>  }
>  
> -static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
> +static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw)
>  {
> -	struct guc_ct_buffer_desc *desc = ctb->desc;
> -	u32 head = READ_ONCE(desc->head);
> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;

keep
	struct guc_ct_buffer_desc *desc = ctb->desc;

> +	u32 head;
>  	u32 space;
>  
> -	space = CIRC_SPACE(desc->tail, head, ctb->size);
> +	if (ctb->space >= len_dw)
> +		return true;
> +
> +	head = READ_ONCE(ctb->desc->head);
> +	if (unlikely(head > ctb->size)) {
> +		CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u size=%u\n",

"Invalid head offset %u > %u\n" ?

> +			 ctb->desc->head, ctb->desc->tail, ctb->size);
> +		ctb->desc->status |= GUC_CTB_STATUS_OVERFLOW;
> +		ctb->broken = true;

so the references to desc above will be simpler

> +		return false;
> +	}
> +
> +	space = CIRC_SPACE(ctb->tail, head, ctb->size);
> +	ctb->space = space;
>  
>  	return space >= len_dw;
>  }
>  
>  static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
>  {
> -	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> -
>  	lockdep_assert_held(&ct->ctbs.send.lock);
>  
> -	if (unlikely(!h2g_has_room(ctb, len_dw))) {
> +	if (unlikely(!h2g_has_room(ct, len_dw))) {
>  		if (ct->stall_time == KTIME_MAX)
>  			ct->stall_time = ktime_get();
>  
> @@ -612,7 +624,7 @@ static int ct_send(struct intel_guc_ct *ct,
>  	 */
>  retry:
>  	spin_lock_irqsave(&ctb->lock, flags);
> -	if (unlikely(!h2g_has_room(ctb, len + GUC_CTB_HDR_LEN))) {
> +	if (unlikely(!h2g_has_room(ct, len + GUC_CTB_HDR_LEN))) {
>  		if (ct->stall_time == KTIME_MAX)
>  			ct->stall_time = ktime_get();
>  		spin_unlock_irqrestore(&ctb->lock, flags);
> @@ -732,7 +744,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>  {
>  	struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv;
>  	struct guc_ct_buffer_desc *desc = ctb->desc;
> -	u32 head = desc->head;
> +	u32 head = ctb->head;
>  	u32 tail = desc->tail;

READ_ONCE ?

>  	u32 size = ctb->size;
>  	u32 *cmds = ctb->cmds;
> @@ -747,9 +759,19 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>  	if (unlikely(desc->status))
>  		goto corrupted;
>  
> -	if (unlikely((tail | head) >= size)) {
> -		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
> -			 head, tail, size);
> +	GEM_BUG_ON(head > size);
> +
> +#ifdef CONFIG_DRM_I915_DEBUG_GUC
> +	if (unlikely(head != READ_ONCE(desc->head))) {
> +		CT_ERROR(ct, "Head was modified %u != %u\n",
> +			 desc->head, ctb->head);
> +		desc->status |= GUC_CTB_STATUS_MISMATCH;
> +		goto corrupted;
> +	}
> +#endif
> +	if (unlikely(tail >= size)) {
> +		CT_ERROR(ct, "Invalid offsets tail=%u (size=%u)\n",
> +			 tail, size);
>  		desc->status |= GUC_CTB_STATUS_OVERFLOW;
>  		goto corrupted;
>  	}
> @@ -802,6 +824,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>  	}
>  	CT_DEBUG(ct, "received %*ph\n", 4 * len, (*msg)->msg);
>  
> +	ctb->head = head;

+ empty line

>  	/* now update descriptor */
>  	WRITE_ONCE(desc->head, head);
>  
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> index bee03794c1eb..edd1bba0445d 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> @@ -33,6 +33,9 @@ struct intel_guc;
>   * @desc: pointer to the buffer descriptor
>   * @cmds: pointer to the commands buffer
>   * @size: size of the commands buffer in dwords
> + * @head: local shadow copy of head in dwords
> + * @tail: local shadow copy of tail in dwords
> + * @space: local shadow copy of space in dwords
>   * @broken: flag to indicate if descriptor data is broken
>   */
>  struct intel_guc_ct_buffer {
> @@ -40,6 +43,9 @@ struct intel_guc_ct_buffer {
>  	struct guc_ct_buffer_desc *desc;
>  	u32 *cmds;
>  	u32 size;
> +	u32 tail;
> +	u32 head;
> +	u32 space;

it looks like we update space only for the send/H2G ctb, but maybe we should
also track free space in the recv/G2H ctb? Then we could report different
stress conditions if space drops below 25% of size or so (someday)
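A minimal sketch of that suggestion (the helper name and the 25% threshold are illustrative; nothing like this exists in the driver yet):

```c
#include <assert.h>
#include <stdint.h>

/* Power-of-2 ring free-space helper, as in linux/circ_buf.h. */
#define CIRC_SPACE(tail, head, size) (((head) - ((tail) + 1)) & ((size) - 1))

/*
 * Hypothetical stress check for the recv/G2H ring: the GuC produces at
 * tail, the driver consumes at head, so free space from the producer's
 * point of view is CIRC_SPACE(tail, head, size).  Report stress once
 * less than a quarter of the ring is free.
 */
static int g2h_under_stress(uint32_t head, uint32_t tail, uint32_t size)
{
	uint32_t space = CIRC_SPACE(tail, head, size);

	return space < size / 4;
}
```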

Michal

>  	bool broken;
>  };
>  
> 


* Re: [Intel-gfx] [PATCH 06/56] drm/i915/guc: Optimize CTB writes and reads
@ 2021-07-07 20:30     ` Michal Wajdeczko
  0 siblings, 0 replies; 54+ messages in thread
From: Michal Wajdeczko @ 2021-07-07 20:30 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel

Hi,

On 07.07.2021 21:09, Matthew Brost wrote:
> CTB writes are now in the path of command submission and should be
> optimized for performance. Rather than reading CTB descriptor values
> (e.g. head, tail) which could result in accesses across the PCIe bus,
> store shadow local copies and only read/write the descriptor values when
> absolutely necessary. Also store the current space in each channel
> locally.
> 
> v2:
>  (Michal)
>   - Add additional sanity checks for head / tail pointers
>   - Use GUC_CTB_HDR_LEN rather than magic 1
> v3:
>  (Michal / John H)
>   - Drop redundant check of head value
> v4:
>  (John H)
>   - Drop redundant checks of tail / head values

mostly nits, but since you will be sending it again...

> 
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 87 ++++++++++++++---------
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  6 ++
>  2 files changed, 61 insertions(+), 32 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index db3e85b89573..37fe9f3bbce3 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -130,6 +130,10 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc)
>  static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb)
>  {
>  	ctb->broken = false;
> +	ctb->tail = 0;
> +	ctb->head = 0;
> +	ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
> +
>  	guc_ct_buffer_desc_init(ctb->desc);
>  }
>  
> @@ -383,10 +387,8 @@ static int ct_write(struct intel_guc_ct *ct,
>  {
>  	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>  	struct guc_ct_buffer_desc *desc = ctb->desc;
> -	u32 head = desc->head;
> -	u32 tail = desc->tail;
> +	u32 tail = ctb->tail;
>  	u32 size = ctb->size;
> -	u32 used;
>  	u32 header;
>  	u32 hxg;
>  	u32 type;
> @@ -396,25 +398,22 @@ static int ct_write(struct intel_guc_ct *ct,
>  	if (unlikely(desc->status))
>  		goto corrupted;
>  
> -	if (unlikely((tail | head) >= size)) {
> -		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
> -			 head, tail, size);
> +	GEM_BUG_ON(tail > size);
> +
> +#ifdef CONFIG_DRM_I915_DEBUG_GUC
> +	if (unlikely(tail != READ_ONCE(desc->tail))) {
> +		CT_ERROR(ct, "Tail was modified %u != %u\n",
> +			 desc->tail, ctb->tail);

here you are accessing desc->tail again, so maybe we can use:

	u32 raw __maybe_unused;

	raw = READ_ONCE(desc->tail);
	if (unlikely(raw != tail)) ...
		CT_ERROR(..., raw, tail);

> +		desc->status |= GUC_CTB_STATUS_MISMATCH;
> +		goto corrupted;
> +	}
> +	if (unlikely(desc->head >= size)) {

above you are reading the value from desc using READ_ONCE; this could be

	raw = READ_ONCE(desc->head);
	if (unlikely(raw >= size)) ...
		CT_ERROR(..., raw, size);

> +		CT_ERROR(ct, "Invalid offsets head=%u (size=%u)\n",

"Invalid head offset %u > %u\n" ?

> +			 desc->head, size);
>  		desc->status |= GUC_CTB_STATUS_OVERFLOW;
>  		goto corrupted;
>  	}
> -
> -	/*
> -	 * tail == head condition indicates empty. GuC FW does not support
> -	 * using up the entire buffer to get tail == head meaning full.
> -	 */
> -	if (tail < head)
> -		used = (size - head) + tail;
> -	else
> -		used = tail - head;
> -
> -	/* make sure there is a space including extra dw for the header */
> -	if (unlikely(used + len + GUC_CTB_HDR_LEN >= size))
> -		return -ENOSPC;
> +#endif
>  
>  	/*
>  	 * dw0: CT header (including fence)
> @@ -453,7 +452,9 @@ static int ct_write(struct intel_guc_ct *ct,
>  	write_barrier(ct);
>  
>  	/* now update descriptor */
> +	ctb->tail = tail;
>  	WRITE_ONCE(desc->tail, tail);
> +	ctb->space -= len + GUC_CTB_HDR_LEN;

maybe keep ctb updates together?

+	/* update local copies */
+	ctb->tail = tail;
+	ctb->space -= len + GUC_CTB_HDR_LEN;
+
  	/* now update descriptor */
 	WRITE_ONCE(desc->tail, tail);

>  
>  	return 0;
>  
> @@ -469,7 +470,7 @@ static int ct_write(struct intel_guc_ct *ct,
>   * @req:	pointer to pending request
>   * @status:	placeholder for status
>   *
> - * For each sent request, Guc shall send bac CT response message.
> + * For each sent request, GuC shall send back CT response message.
>   * Our message handler will update status of tracked request once
>   * response message with given fence is received. Wait here and
>   * check for valid response status value.
> @@ -525,24 +526,35 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct)
>  	return ret;
>  }
>  
> -static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
> +static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw)
>  {
> -	struct guc_ct_buffer_desc *desc = ctb->desc;
> -	u32 head = READ_ONCE(desc->head);
> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;

keep
	struct guc_ct_buffer_desc *desc = ctb->desc;

> +	u32 head;
>  	u32 space;
>  
> -	space = CIRC_SPACE(desc->tail, head, ctb->size);
> +	if (ctb->space >= len_dw)
> +		return true;
> +
> +	head = READ_ONCE(ctb->desc->head);
> +	if (unlikely(head > ctb->size)) {
> +		CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u size=%u\n",

"Invalid head offset %u > %u\n" ?

> +			 ctb->desc->head, ctb->desc->tail, ctb->size);
> +		ctb->desc->status |= GUC_CTB_STATUS_OVERFLOW;
> +		ctb->broken = true;

so the references to desc above will be simpler

> +		return false;
> +	}
> +
> +	space = CIRC_SPACE(ctb->tail, head, ctb->size);
> +	ctb->space = space;
>  
>  	return space >= len_dw;
>  }
>  
>  static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
>  {
> -	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> -
>  	lockdep_assert_held(&ct->ctbs.send.lock);
>  
> -	if (unlikely(!h2g_has_room(ctb, len_dw))) {
> +	if (unlikely(!h2g_has_room(ct, len_dw))) {
>  		if (ct->stall_time == KTIME_MAX)
>  			ct->stall_time = ktime_get();
>  
> @@ -612,7 +624,7 @@ static int ct_send(struct intel_guc_ct *ct,
>  	 */
>  retry:
>  	spin_lock_irqsave(&ctb->lock, flags);
> -	if (unlikely(!h2g_has_room(ctb, len + GUC_CTB_HDR_LEN))) {
> +	if (unlikely(!h2g_has_room(ct, len + GUC_CTB_HDR_LEN))) {
>  		if (ct->stall_time == KTIME_MAX)
>  			ct->stall_time = ktime_get();
>  		spin_unlock_irqrestore(&ctb->lock, flags);
> @@ -732,7 +744,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>  {
>  	struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv;
>  	struct guc_ct_buffer_desc *desc = ctb->desc;
> -	u32 head = desc->head;
> +	u32 head = ctb->head;
>  	u32 tail = desc->tail;

READ_ONCE ?

>  	u32 size = ctb->size;
>  	u32 *cmds = ctb->cmds;
> @@ -747,9 +759,19 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>  	if (unlikely(desc->status))
>  		goto corrupted;
>  
> -	if (unlikely((tail | head) >= size)) {
> -		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
> -			 head, tail, size);
> +	GEM_BUG_ON(head > size);
> +
> +#ifdef CONFIG_DRM_I915_DEBUG_GUC
> +	if (unlikely(head != READ_ONCE(desc->head))) {
> +		CT_ERROR(ct, "Head was modified %u != %u\n",
> +			 desc->head, ctb->head);
> +		desc->status |= GUC_CTB_STATUS_MISMATCH;
> +		goto corrupted;
> +	}
> +#endif
> +	if (unlikely(tail >= size)) {
> +		CT_ERROR(ct, "Invalid offsets tail=%u (size=%u)\n",
> +			 tail, size);
>  		desc->status |= GUC_CTB_STATUS_OVERFLOW;
>  		goto corrupted;
>  	}
> @@ -802,6 +824,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>  	}
>  	CT_DEBUG(ct, "received %*ph\n", 4 * len, (*msg)->msg);
>  
> +	ctb->head = head;

+ empty line

>  	/* now update descriptor */
>  	WRITE_ONCE(desc->head, head);
>  
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> index bee03794c1eb..edd1bba0445d 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> @@ -33,6 +33,9 @@ struct intel_guc;
>   * @desc: pointer to the buffer descriptor
>   * @cmds: pointer to the commands buffer
>   * @size: size of the commands buffer in dwords
> + * @head: local shadow copy of head in dwords
> + * @tail: local shadow copy of tail in dwords
> + * @space: local shadow copy of space in dwords
>   * @broken: flag to indicate if descriptor data is broken
>   */
>  struct intel_guc_ct_buffer {
> @@ -40,6 +43,9 @@ struct intel_guc_ct_buffer {
>  	struct guc_ct_buffer_desc *desc;
>  	u32 *cmds;
>  	u32 size;
> +	u32 tail;
> +	u32 head;
> +	u32 space;

It looks like we update space only for the send/H2G ctb, but maybe we
should also track free space in the recv/G2H ctb? Then we could report
different stress conditions if space drops below 25% of the size or so
(someday)

Michal

>  	bool broken;
>  };
>  
> 
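Michal's "someday" suggestion above — shadowing the free space of the
recv/G2H channel too and flagging a stress condition — could be sketched
roughly like this. The struct, the helper name and the 25% threshold are
illustrative assumptions, not driver code; `CIRC_SPACE` is re-derived here
from the kernel's circ_buf semantics so the example stands alone:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Kernel-style CIRC_SPACE for a power-of-two ring: free dwords between
 * the producer offset (w) and the consumer offset (r). */
#define CIRC_SPACE(w, r, size) (((r) - ((w) + 1)) & ((size) - 1))

struct g2h_shadow {
	uint32_t size;	/* ring size in dwords, power of two */
	uint32_t head;	/* local shadow of the driver's read offset */
};

/* Hypothetical stress check: the GuC produces at 'tail'; report pressure
 * once less than a quarter of the ring is still free. */
static bool g2h_under_pressure(const struct g2h_shadow *ctb, uint32_t tail)
{
	uint32_t space = CIRC_SPACE(tail, ctb->head, ctb->size);

	return space < ctb->size / 4;
}
```

A caller on the G2H receive path could evaluate this after each batch of
messages and emit a one-shot warning, without any extra descriptor reads
beyond the tail it already fetched.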
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH 06/7] drm/i915/guc: Optimize CTB writes and reads
  2021-07-06 22:20 ` [Intel-gfx] " Matthew Brost
@ 2021-07-07 23:25   ` Matthew Brost
  -1 siblings, 0 replies; 54+ messages in thread
From: Matthew Brost @ 2021-07-07 23:25 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: john.c.harrison, Michal.Wajdeczko

CTB writes are now in the path of command submission and should be
optimized for performance. Rather than reading CTB descriptor values
(e.g. head, tail) which could result in accesses across the PCIe bus,
store local shadow copies and only read/write the descriptor values when
absolutely necessary. Also store the current space in each channel
locally.
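The fast path this enables can be sketched in simplified standalone form;
the struct layout and helper below mirror the patch's `h2g_has_room` but
are assumptions for illustration, not the actual driver structures:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Kernel-style CIRC_SPACE for a power-of-two ring: free dwords between
 * the producer offset (w) and the consumer offset (r). */
#define CIRC_SPACE(w, r, size) (((r) - ((w) + 1)) & ((size) - 1))

struct ctb_shadow {
	uint32_t size;	/* ring size in dwords, power of two */
	uint32_t tail;	/* local shadow of the driver's write offset */
	uint32_t space;	/* cached free space in dwords */
	volatile uint32_t *desc_head;	/* descriptor head, advanced by the GuC */
};

/* Fast path: trust the cached space so the common case never touches the
 * descriptor (which may sit across the PCIe bus). Only on a miss re-read
 * the head and recompute the free space. */
static bool h2g_has_room(struct ctb_shadow *ctb, uint32_t len_dw)
{
	uint32_t head;

	if (ctb->space >= len_dw)
		return true;

	head = *ctb->desc_head;	/* the one potentially expensive read */
	ctb->space = CIRC_SPACE(ctb->tail, head, ctb->size);

	return ctb->space >= len_dw;
}
```

After a successful write the sender advances `ctb->tail` and decrements
`ctb->space` locally, so consecutive submissions usually take the cached
branch and skip the descriptor read entirely.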

v2:
 (Michal)
  - Add additional sanity checks for head / tail pointers
  - Use GUC_CTB_HDR_LEN rather than magic 1
v3:
 (Michal / John H)
  - Drop redundant check of head value
v4:
 (John H)
  - Drop redundant checks of tail / head values
v5:
 (Michal)
  - Address more nits

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 92 +++++++++++++++--------
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  6 ++
 2 files changed, 66 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index db3e85b89573..d552d3016779 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -130,6 +130,10 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc)
 static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb)
 {
 	ctb->broken = false;
+	ctb->tail = 0;
+	ctb->head = 0;
+	ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
+
 	guc_ct_buffer_desc_init(ctb->desc);
 }
 
@@ -383,10 +387,8 @@ static int ct_write(struct intel_guc_ct *ct,
 {
 	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
 	struct guc_ct_buffer_desc *desc = ctb->desc;
-	u32 head = desc->head;
-	u32 tail = desc->tail;
+	u32 tail = ctb->tail;
 	u32 size = ctb->size;
-	u32 used;
 	u32 header;
 	u32 hxg;
 	u32 type;
@@ -396,25 +398,22 @@ static int ct_write(struct intel_guc_ct *ct,
 	if (unlikely(desc->status))
 		goto corrupted;
 
-	if (unlikely((tail | head) >= size)) {
-		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
-			 head, tail, size);
+	GEM_BUG_ON(tail > size);
+
+#ifdef CONFIG_DRM_I915_DEBUG_GUC
+	if (unlikely(tail != READ_ONCE(desc->tail))) {
+		CT_ERROR(ct, "Tail was modified %u != %u\n",
+			 desc->tail, tail);
+		desc->status |= GUC_CTB_STATUS_MISMATCH;
+		goto corrupted;
+	}
+	if (unlikely(desc->head >= size)) {
+		CT_ERROR(ct, "Invalid head offset %u >= %u\n",
+			 desc->head, size);
 		desc->status |= GUC_CTB_STATUS_OVERFLOW;
 		goto corrupted;
 	}
-
-	/*
-	 * tail == head condition indicates empty. GuC FW does not support
-	 * using up the entire buffer to get tail == head meaning full.
-	 */
-	if (tail < head)
-		used = (size - head) + tail;
-	else
-		used = tail - head;
-
-	/* make sure there is a space including extra dw for the header */
-	if (unlikely(used + len + GUC_CTB_HDR_LEN >= size))
-		return -ENOSPC;
+#endif
 
 	/*
 	 * dw0: CT header (including fence)
@@ -452,6 +451,10 @@ static int ct_write(struct intel_guc_ct *ct,
 	 */
 	write_barrier(ct);
 
+	/* update local copies */
+	ctb->tail = tail;
+	ctb->space -= len + GUC_CTB_HDR_LEN;
+
 	/* now update descriptor */
 	WRITE_ONCE(desc->tail, tail);
 
@@ -469,7 +472,7 @@ static int ct_write(struct intel_guc_ct *ct,
  * @req:	pointer to pending request
  * @status:	placeholder for status
  *
- * For each sent request, Guc shall send bac CT response message.
+ * For each sent request, GuC shall send back CT response message.
  * Our message handler will update status of tracked request once
  * response message with given fence is received. Wait here and
  * check for valid response status value.
@@ -525,24 +528,36 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct)
 	return ret;
 }
 
-static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
+static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw)
 {
+	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
 	struct guc_ct_buffer_desc *desc = ctb->desc;
-	u32 head = READ_ONCE(desc->head);
+	u32 head;
 	u32 space;
 
-	space = CIRC_SPACE(desc->tail, head, ctb->size);
+	if (ctb->space >= len_dw)
+		return true;
+
+	head = READ_ONCE(desc->head);
+	if (unlikely(head > ctb->size)) {
+		CT_ERROR(ct, "Invalid head offset %u > %u\n",
+			 head, ctb->size);
+		desc->status |= GUC_CTB_STATUS_OVERFLOW;
+		ctb->broken = true;
+		return false;
+	}
+
+	space = CIRC_SPACE(ctb->tail, head, ctb->size);
+	ctb->space = space;
 
 	return space >= len_dw;
 }
 
 static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
 {
-	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
-
 	lockdep_assert_held(&ct->ctbs.send.lock);
 
-	if (unlikely(!h2g_has_room(ctb, len_dw))) {
+	if (unlikely(!h2g_has_room(ct, len_dw))) {
 		if (ct->stall_time == KTIME_MAX)
 			ct->stall_time = ktime_get();
 
@@ -612,7 +627,7 @@ static int ct_send(struct intel_guc_ct *ct,
 	 */
 retry:
 	spin_lock_irqsave(&ctb->lock, flags);
-	if (unlikely(!h2g_has_room(ctb, len + GUC_CTB_HDR_LEN))) {
+	if (unlikely(!h2g_has_room(ct, len + GUC_CTB_HDR_LEN))) {
 		if (ct->stall_time == KTIME_MAX)
 			ct->stall_time = ktime_get();
 		spin_unlock_irqrestore(&ctb->lock, flags);
@@ -732,8 +747,8 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 {
 	struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv;
 	struct guc_ct_buffer_desc *desc = ctb->desc;
-	u32 head = desc->head;
-	u32 tail = desc->tail;
+	u32 head = ctb->head;
+	u32 tail = READ_ONCE(desc->tail);
 	u32 size = ctb->size;
 	u32 *cmds = ctb->cmds;
 	s32 available;
@@ -747,9 +762,19 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 	if (unlikely(desc->status))
 		goto corrupted;
 
-	if (unlikely((tail | head) >= size)) {
-		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
-			 head, tail, size);
+	GEM_BUG_ON(head > size);
+
+#ifdef CONFIG_DRM_I915_DEBUG_GUC
+	if (unlikely(head != READ_ONCE(desc->head))) {
+		CT_ERROR(ct, "Head was modified %u != %u\n",
+			 desc->head, head);
+		desc->status |= GUC_CTB_STATUS_MISMATCH;
+		goto corrupted;
+	}
+#endif
+	if (unlikely(tail >= size)) {
+		CT_ERROR(ct, "Invalid tail offset %u >= %u\n",
+			 tail, size);
 		desc->status |= GUC_CTB_STATUS_OVERFLOW;
 		goto corrupted;
 	}
@@ -802,6 +827,9 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 	}
 	CT_DEBUG(ct, "received %*ph\n", 4 * len, (*msg)->msg);
 
+	/* update local copies */
+	ctb->head = head;
+
 	/* now update descriptor */
 	WRITE_ONCE(desc->head, head);
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
index bee03794c1eb..edd1bba0445d 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
@@ -33,6 +33,9 @@ struct intel_guc;
  * @desc: pointer to the buffer descriptor
  * @cmds: pointer to the commands buffer
  * @size: size of the commands buffer in dwords
+ * @head: local shadow copy of head in dwords
+ * @tail: local shadow copy of tail in dwords
+ * @space: local shadow copy of space in dwords
  * @broken: flag to indicate if descriptor data is broken
  */
 struct intel_guc_ct_buffer {
@@ -40,6 +43,9 @@ struct intel_guc_ct_buffer {
 	struct guc_ct_buffer_desc *desc;
 	u32 *cmds;
 	u32 size;
+	u32 tail;
+	u32 head;
+	u32 space;
 	bool broken;
 };
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [Intel-gfx] [PATCH 06/7] drm/i915/guc: Optimize CTB writes and reads
@ 2021-07-07 23:25   ` Matthew Brost
  0 siblings, 0 replies; 54+ messages in thread
From: Matthew Brost @ 2021-07-07 23:25 UTC (permalink / raw)
  To: intel-gfx, dri-devel

CTB writes are now in the path of command submission and should be
optimized for performance. Rather than reading CTB descriptor values
(e.g. head, tail) which could result in accesses across the PCIe bus,
store local shadow copies and only read/write the descriptor values when
absolutely necessary. Also store the current space in each channel
locally.

v2:
 (Michal)
  - Add additional sanity checks for head / tail pointers
  - Use GUC_CTB_HDR_LEN rather than magic 1
v3:
 (Michal / John H)
  - Drop redundant check of head value
v4:
 (John H)
  - Drop redundant checks of tail / head values
v5:
 (Michal)
  - Address more nits

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 92 +++++++++++++++--------
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  6 ++
 2 files changed, 66 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index db3e85b89573..d552d3016779 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -130,6 +130,10 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc)
 static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb)
 {
 	ctb->broken = false;
+	ctb->tail = 0;
+	ctb->head = 0;
+	ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
+
 	guc_ct_buffer_desc_init(ctb->desc);
 }
 
@@ -383,10 +387,8 @@ static int ct_write(struct intel_guc_ct *ct,
 {
 	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
 	struct guc_ct_buffer_desc *desc = ctb->desc;
-	u32 head = desc->head;
-	u32 tail = desc->tail;
+	u32 tail = ctb->tail;
 	u32 size = ctb->size;
-	u32 used;
 	u32 header;
 	u32 hxg;
 	u32 type;
@@ -396,25 +398,22 @@ static int ct_write(struct intel_guc_ct *ct,
 	if (unlikely(desc->status))
 		goto corrupted;
 
-	if (unlikely((tail | head) >= size)) {
-		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
-			 head, tail, size);
+	GEM_BUG_ON(tail > size);
+
+#ifdef CONFIG_DRM_I915_DEBUG_GUC
+	if (unlikely(tail != READ_ONCE(desc->tail))) {
+		CT_ERROR(ct, "Tail was modified %u != %u\n",
+			 desc->tail, tail);
+		desc->status |= GUC_CTB_STATUS_MISMATCH;
+		goto corrupted;
+	}
+	if (unlikely(desc->head >= size)) {
+		CT_ERROR(ct, "Invalid head offset %u >= %u\n",
+			 desc->head, size);
 		desc->status |= GUC_CTB_STATUS_OVERFLOW;
 		goto corrupted;
 	}
-
-	/*
-	 * tail == head condition indicates empty. GuC FW does not support
-	 * using up the entire buffer to get tail == head meaning full.
-	 */
-	if (tail < head)
-		used = (size - head) + tail;
-	else
-		used = tail - head;
-
-	/* make sure there is a space including extra dw for the header */
-	if (unlikely(used + len + GUC_CTB_HDR_LEN >= size))
-		return -ENOSPC;
+#endif
 
 	/*
 	 * dw0: CT header (including fence)
@@ -452,6 +451,10 @@ static int ct_write(struct intel_guc_ct *ct,
 	 */
 	write_barrier(ct);
 
+	/* update local copies */
+	ctb->tail = tail;
+	ctb->space -= len + GUC_CTB_HDR_LEN;
+
 	/* now update descriptor */
 	WRITE_ONCE(desc->tail, tail);
 
@@ -469,7 +472,7 @@ static int ct_write(struct intel_guc_ct *ct,
  * @req:	pointer to pending request
  * @status:	placeholder for status
  *
- * For each sent request, Guc shall send bac CT response message.
+ * For each sent request, GuC shall send back CT response message.
  * Our message handler will update status of tracked request once
  * response message with given fence is received. Wait here and
  * check for valid response status value.
@@ -525,24 +528,36 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct)
 	return ret;
 }
 
-static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
+static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw)
 {
+	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
 	struct guc_ct_buffer_desc *desc = ctb->desc;
-	u32 head = READ_ONCE(desc->head);
+	u32 head;
 	u32 space;
 
-	space = CIRC_SPACE(desc->tail, head, ctb->size);
+	if (ctb->space >= len_dw)
+		return true;
+
+	head = READ_ONCE(desc->head);
+	if (unlikely(head > ctb->size)) {
+		CT_ERROR(ct, "Invalid head offset %u > %u\n",
+			 head, ctb->size);
+		desc->status |= GUC_CTB_STATUS_OVERFLOW;
+		ctb->broken = true;
+		return false;
+	}
+
+	space = CIRC_SPACE(ctb->tail, head, ctb->size);
+	ctb->space = space;
 
 	return space >= len_dw;
 }
 
 static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
 {
-	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
-
 	lockdep_assert_held(&ct->ctbs.send.lock);
 
-	if (unlikely(!h2g_has_room(ctb, len_dw))) {
+	if (unlikely(!h2g_has_room(ct, len_dw))) {
 		if (ct->stall_time == KTIME_MAX)
 			ct->stall_time = ktime_get();
 
@@ -612,7 +627,7 @@ static int ct_send(struct intel_guc_ct *ct,
 	 */
 retry:
 	spin_lock_irqsave(&ctb->lock, flags);
-	if (unlikely(!h2g_has_room(ctb, len + GUC_CTB_HDR_LEN))) {
+	if (unlikely(!h2g_has_room(ct, len + GUC_CTB_HDR_LEN))) {
 		if (ct->stall_time == KTIME_MAX)
 			ct->stall_time = ktime_get();
 		spin_unlock_irqrestore(&ctb->lock, flags);
@@ -732,8 +747,8 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 {
 	struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv;
 	struct guc_ct_buffer_desc *desc = ctb->desc;
-	u32 head = desc->head;
-	u32 tail = desc->tail;
+	u32 head = ctb->head;
+	u32 tail = READ_ONCE(desc->tail);
 	u32 size = ctb->size;
 	u32 *cmds = ctb->cmds;
 	s32 available;
@@ -747,9 +762,19 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 	if (unlikely(desc->status))
 		goto corrupted;
 
-	if (unlikely((tail | head) >= size)) {
-		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
-			 head, tail, size);
+	GEM_BUG_ON(head > size);
+
+#ifdef CONFIG_DRM_I915_DEBUG_GUC
+	if (unlikely(head != READ_ONCE(desc->head))) {
+		CT_ERROR(ct, "Head was modified %u != %u\n",
+			 desc->head, head);
+		desc->status |= GUC_CTB_STATUS_MISMATCH;
+		goto corrupted;
+	}
+#endif
+	if (unlikely(tail >= size)) {
+		CT_ERROR(ct, "Invalid tail offset %u >= %u\n",
+			 tail, size);
 		desc->status |= GUC_CTB_STATUS_OVERFLOW;
 		goto corrupted;
 	}
@@ -802,6 +827,9 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 	}
 	CT_DEBUG(ct, "received %*ph\n", 4 * len, (*msg)->msg);
 
+	/* update local copies */
+	ctb->head = head;
+
 	/* now update descriptor */
 	WRITE_ONCE(desc->head, head);
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
index bee03794c1eb..edd1bba0445d 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
@@ -33,6 +33,9 @@ struct intel_guc;
  * @desc: pointer to the buffer descriptor
  * @cmds: pointer to the commands buffer
  * @size: size of the commands buffer in dwords
+ * @head: local shadow copy of head in dwords
+ * @tail: local shadow copy of tail in dwords
+ * @space: local shadow copy of space in dwords
  * @broken: flag to indicate if descriptor data is broken
  */
 struct intel_guc_ct_buffer {
@@ -40,6 +43,9 @@ struct intel_guc_ct_buffer {
 	struct guc_ct_buffer_desc *desc;
 	u32 *cmds;
 	u32 size;
+	u32 tail;
+	u32 head;
+	u32 space;
 	bool broken;
 };
 
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 0/2] Introduce set_parallel2 extension
  2021-07-06 22:20 ` [Intel-gfx] " Matthew Brost
@ 2021-07-08  0:30   ` Matthew Brost
  -1 siblings, 0 replies; 54+ messages in thread
From: Matthew Brost @ 2021-07-08  0:30 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: john.c.harrison, Michal.Wajdeczko

Based on upstream feedback [1] the current set_parallel extension isn't
suitable. Add a single patch to DII implementing the new interface
agreed to upstream [2]. Intended to enable the UMDs with the upstream
interface while maintaining the old interface on DII.

A quick IGT to prove this is working should hit the list shortly.

v2: Move the single patch into the GuC section of the pile, align with
the agreed upstream interface, only include prelim* definitions.
v3: Enable set_parallel2 via the SET_PARAM IOCTL, resend for CI
v4: Fix regression when the patch was merged - only do parallel checks
on user engine sets

Signed-off-by: Matthew Brost <matthew.brost@intel.com>

[1] https://patchwork.freedesktop.org/patch/432205/?series=89840&rev=1
[2] https://patchwork.freedesktop.org/patch/438911/?series=91417&rev=1

---
baseline: b7227afd06bac1fe6719136e2ddd2bfed1d85feb
pile-commit: b7a2c9136977a385659a71df837cbe5a1f775b32
range-diff:
   -:  ------------ >  930:  ad12b87b91af INTEL_DII/NOT_UPSTREAM: drm/i915: Introduce set_parallel2 extension
1083:  73e59e150cde ! 1084:  79b296835b1c INTEL_DII/FIXME: drm/i915/perf: add a parameter to control the size of OA buffer
1120:  edbc20ae1355 ! 1121:  30d02d618229 INTEL_DII/FIXME: drm/i915: Add context parameter for debug flags
1293:  997b317fc408 ! 1294:  016b5903b0a0 INTEL_DII: drm/i915/perf: Add OA formats for XEHPSDV
1364:  136064b76b92 ! 1365:  5f564d553dc8 INTEL_DII: drm/i915/xehpsdv: Expand total numbers of supported engines up to 256
1403:  67b729033e82 ! 1404:  4398a2322f2f INTEL_DII: drm/i915/xehpsdv: Impose ULLS context restrictions
1405:  b8dd2a22a952 ! 1406:  dd2fab232cf1 INTEL_DII: drm/i915: Add context methods to suspend and resume requests
1670:  b4633106fa13 ! 1671:  53b4a54ee2cc INTEL_DII: drm/i915/pxp: interface for marking contexts as using protected content
1671:  22369ab70556 ! 1672:  42234590cdf5 INTEL_DII: drm/i915/pxp: start the arb session on demand

 series                                             |   1 +
 ...IXME-drm-i915-perf-add-a-parameter-to-con.patch |   4 +-
 ...IXME-drm-i915-Add-context-parameter-for-d.patch |  18 +-
 ...-drm-i915-perf-Add-OA-formats-for-XEHPSDV.patch |   4 +-
 ...rm-i915-xehpsdv-Expand-total-numbers-of-s.patch |   2 +-
 ...rm-i915-xehpsdv-Impose-ULLS-context-restr.patch |  12 +-
 ...rm-i915-Add-context-methods-to-suspend-an.patch |  38 +-
 ...rm-i915-pxp-interface-for-marking-context.patch |  16 +-
 ...rm-i915-pxp-start-the-arb-session-on-dema.patch |   2 +-
 ...OT_UPSTREAM-drm-i915-Introduce-set_parall.patch | 676 +++++++++++++++++++++
 10 files changed, 725 insertions(+), 48 deletions(-)

diff --git a/series b/series
index 8b77d52df40c..7db508ea974d 100644
--- a/series
+++ b/series
@@ -929,6 +929,7 @@
 0001-INTEL_DII-drm-i915-guc-Increase-GuC-log-size-for-CON.patch
 0001-INTEL_DII-NOT_UPSTREAM-drm-i915-Dump-error-capture-t.patch
 0001-INTEL_DII-NOT_UPSTREAM-drm-i915-guc-Dump-error-captu.patch
+0001-INTEL_DII-NOT_UPSTREAM-drm-i915-Introduce-set_parall.patch
 0001-INTEL_DII-END-GuC-submission-and-slpc-support.patch
 0001-INTEL_DII-BEGIN-SR-IOV-ENABLING.patch
 0001-INTEL_DII-drm-i915-guc-Update-GuC-to-62.0.3.patch
diff --git a/0001-INTEL_DII-FIXME-drm-i915-perf-add-a-parameter-to-con.patch b/0001-INTEL_DII-FIXME-drm-i915-perf-add-a-parameter-to-con.patch
index dd654f144374..b7a637b3813f 100644
--- a/0001-INTEL_DII-FIXME-drm-i915-perf-add-a-parameter-to-con.patch
+++ b/0001-INTEL_DII-FIXME-drm-i915-perf-add-a-parameter-to-con.patch
@@ -384,8 +384,8 @@ diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
 diff --git a/include/uapi/drm/i915_drm_prelim.h b/include/uapi/drm/i915_drm_prelim.h
 --- a/include/uapi/drm/i915_drm_prelim.h
 +++ b/include/uapi/drm/i915_drm_prelim.h
-@@ -393,6 +393,36 @@ struct prelim_i915_context_param_engines {
- #define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2 /* see i915_context_engines_parallel_submit */
+@@ -508,6 +508,36 @@ struct prelim_i915_context_param_engines {
+ #define PRELIM_I915_CONTEXT_ENGINES_EXT_PARALLEL2_SUBMIT (PRELIM_I915_USER_EXT | 3) /* see prelim_i915_context_engines_parallel2_submit */
  };
  
 +enum prelim_drm_i915_perf_property_id {
diff --git a/0001-INTEL_DII-FIXME-drm-i915-Add-context-parameter-for-d.patch b/0001-INTEL_DII-FIXME-drm-i915-Add-context-parameter-for-d.patch
index dfd5790ac2b8..71a5943b5536 100644
--- a/0001-INTEL_DII-FIXME-drm-i915-Add-context-parameter-for-d.patch
+++ b/0001-INTEL_DII-FIXME-drm-i915-Add-context-parameter-for-d.patch
@@ -44,7 +44,7 @@ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/
  }
  
  static void __free_engines(struct i915_gem_engines *e, unsigned int count)
-@@ -2252,6 +2255,76 @@ static int set_priority(struct i915_gem_context *ctx,
+@@ -2436,6 +2439,76 @@ static int set_priority(struct i915_gem_context *ctx,
  	return 0;
  }
  
@@ -121,7 +121,7 @@ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/
  static int ctx_setparam(struct drm_i915_file_private *fpriv,
  			struct i915_gem_context *ctx,
  			struct drm_i915_gem_context_param *args)
-@@ -2321,6 +2394,11 @@ static int ctx_setparam(struct drm_i915_file_private *fpriv,
+@@ -2505,6 +2578,11 @@ static int ctx_setparam(struct drm_i915_file_private *fpriv,
  		ret = set_ringsize(ctx, args);
  		break;
  
@@ -133,7 +133,7 @@ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/
  	case I915_CONTEXT_PARAM_BAN_PERIOD:
  	default:
  		ret = -EINVAL;
-@@ -2777,6 +2855,11 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
+@@ -2961,6 +3039,11 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
  		ret = get_ringsize(ctx, args);
  		break;
  
@@ -184,7 +184,7 @@ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h b/drivers/gpu/drm
 diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
 --- a/drivers/gpu/drm/i915/gt/intel_context.h
 +++ b/drivers/gpu/drm/i915/gt/intel_context.h
-@@ -285,6 +285,24 @@ intel_context_clear_nopreempt(struct intel_context *ce)
+@@ -296,6 +296,24 @@ intel_context_clear_nopreempt(struct intel_context *ce)
  		ce->emit_bb_start = ce->engine->emit_bb_start;
  }
  
@@ -212,19 +212,19 @@ diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/i
 diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
 --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
 +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
-@@ -114,6 +114,7 @@ struct intel_context {
- #define CONTEXT_FORCE_SINGLE_SUBMISSION	7
+@@ -115,6 +115,7 @@ struct intel_context {
  #define CONTEXT_NOPREEMPT		8
  #define CONTEXT_LRCA_DIRTY		9
-+#define CONTEXT_DEBUG			10
+ #define CONTEXT_NO_PREEMPT_MID_BATCH	10
++#define CONTEXT_DEBUG			11
  
  	struct {
  		u64 timeout_us;
 diff --git a/include/uapi/drm/i915_drm_prelim.h b/include/uapi/drm/i915_drm_prelim.h
 --- a/include/uapi/drm/i915_drm_prelim.h
 +++ b/include/uapi/drm/i915_drm_prelim.h
-@@ -395,6 +395,32 @@ struct prelim_i915_context_param_engines {
- #define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2 /* see i915_context_engines_parallel_submit */
+@@ -510,6 +510,32 @@ struct prelim_i915_context_param_engines {
+ #define PRELIM_I915_CONTEXT_ENGINES_EXT_PARALLEL2_SUBMIT (PRELIM_I915_USER_EXT | 3) /* see prelim_i915_context_engines_parallel2_submit */
  };
  
 +struct prelim_drm_i915_gem_context_param {
diff --git a/0001-INTEL_DII-drm-i915-perf-Add-OA-formats-for-XEHPSDV.patch b/0001-INTEL_DII-drm-i915-perf-Add-OA-formats-for-XEHPSDV.patch
index 19a07b3926ae..f62d7848e091 100644
--- a/0001-INTEL_DII-drm-i915-perf-Add-OA-formats-for-XEHPSDV.patch
+++ b/0001-INTEL_DII-drm-i915-perf-Add-OA-formats-for-XEHPSDV.patch
@@ -204,8 +204,8 @@ diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
 diff --git a/include/uapi/drm/i915_drm_prelim.h b/include/uapi/drm/i915_drm_prelim.h
 --- a/include/uapi/drm/i915_drm_prelim.h
 +++ b/include/uapi/drm/i915_drm_prelim.h
-@@ -435,6 +435,27 @@ struct prelim_i915_context_param_engines {
- #define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2 /* see i915_context_engines_parallel_submit */
+@@ -550,6 +550,27 @@ struct prelim_i915_context_param_engines {
+ #define PRELIM_I915_CONTEXT_ENGINES_EXT_PARALLEL2_SUBMIT (PRELIM_I915_USER_EXT | 3) /* see prelim_i915_context_engines_parallel2_submit */
  };
  
 +enum prelim_drm_i915_oa_format {
diff --git a/0001-INTEL_DII-drm-i915-xehpsdv-Expand-total-numbers-of-s.patch b/0001-INTEL_DII-drm-i915-xehpsdv-Expand-total-numbers-of-s.patch
index 05a84884a3d1..ee486b95d11e 100644
--- a/0001-INTEL_DII-drm-i915-xehpsdv-Expand-total-numbers-of-s.patch
+++ b/0001-INTEL_DII-drm-i915-xehpsdv-Expand-total-numbers-of-s.patch
@@ -76,7 +76,7 @@ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i9
  
  	/* Kernel clipping was a DRI1 misfeature */
  	if (!(exec->flags & I915_EXEC_FENCE_ARRAY)) {
-@@ -3233,9 +3235,12 @@ eb_select_engine(struct i915_execbuffer *eb)
+@@ -3233,9 +3235,12 @@ eb_select_engine(struct i915_execbuffer *eb, unsigned int batch_number)
  	int err;
  
  	if (i915_gem_context_user_engines(eb->gem_context))
diff --git a/0001-INTEL_DII-drm-i915-xehpsdv-Impose-ULLS-context-restr.patch b/0001-INTEL_DII-drm-i915-xehpsdv-Impose-ULLS-context-restr.patch
index 38ad84c4dc12..80880e3008cc 100644
--- a/0001-INTEL_DII-drm-i915-xehpsdv-Impose-ULLS-context-restr.patch
+++ b/0001-INTEL_DII-drm-i915-xehpsdv-Impose-ULLS-context-restr.patch
@@ -76,7 +76,7 @@ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i9
  	if (intel_context_nopreempt(eb->context) ||
  	    intel_context_debug(eb->context))
  		__set_bit(I915_FENCE_FLAG_NOPREEMPT, &eb->request->fence.flags);
-@@ -3453,6 +3462,13 @@ static int eb_request_add(struct i915_execbuffer *eb, int err)
+@@ -3463,6 +3472,13 @@ static int eb_request_add(struct i915_execbuffer *eb, int err)
  
  	trace_i915_request_add(rq);
  
@@ -90,7 +90,7 @@ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i9
  	prev = __i915_request_commit(rq);
  
  	/* Check that the context wasn't destroyed before submission */
-@@ -3531,6 +3547,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
+@@ -3541,6 +3557,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
  	int err;
  	bool first = batch_number == 0;
  	bool last = batch_number + 1 == num_batches;
@@ -98,7 +98,7 @@ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i9
  
  	BUILD_BUG_ON(__EXEC_INTERNAL_FLAGS & ~__I915_EXEC_ILLEGAL_FLAGS);
  	BUILD_BUG_ON(__EXEC_OBJECT_INTERNAL_FLAGS &
-@@ -3582,6 +3599,13 @@ i915_gem_do_execbuffer(struct drm_device *dev,
+@@ -3592,6 +3609,13 @@ i915_gem_do_execbuffer(struct drm_device *dev,
  	if (unlikely(err))
  		goto err_destroy;
  
@@ -109,7 +109,7 @@ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i9
 +		goto err_context;
 +	}
 +
- 	err = eb_select_engine(&eb);
+ 	err = eb_select_engine(&eb, batch_number);
  	if (unlikely(err))
  		goto err_context;
 diff --git a/drivers/gpu/drm/i915/gem/i915_gem_wait.c b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
@@ -239,7 +239,7 @@ diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_req
  		/*
  		 * Requests on the same timeline are explicitly ordered, along
  		 * with their dependencies, by i915_request_add() which ensures
-@@ -2126,6 +2181,7 @@ long i915_request_wait(struct i915_request *rq,
+@@ -2121,6 +2176,7 @@ long i915_request_wait(struct i915_request *rq,
  {
  	might_sleep();
  	GEM_BUG_ON(timeout < 0);
@@ -247,7 +247,7 @@ diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_req
  
  	if (dma_fence_is_signaled(&rq->fence))
  		return timeout;
-@@ -2331,6 +2387,8 @@ static struct i915_global_request global = { {
+@@ -2326,6 +2382,8 @@ static struct i915_global_request global = { {
  
  int __init i915_global_request_init(void)
  {
diff --git a/0001-INTEL_DII-drm-i915-Add-context-methods-to-suspend-an.patch b/0001-INTEL_DII-drm-i915-Add-context-methods-to-suspend-an.patch
index 7d523c8dadba..44fd93184b8a 100644
--- a/0001-INTEL_DII-drm-i915-Add-context-methods-to-suspend-an.patch
+++ b/0001-INTEL_DII-drm-i915-Add-context-methods-to-suspend-an.patch
@@ -52,7 +52,7 @@ diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/i
  void
  intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
  {
-@@ -475,6 +481,9 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
+@@ -476,6 +482,9 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
  	ce->guc_id = GUC_INVALID_LRC_ID;
  	INIT_LIST_HEAD(&ce->guc_id_link);
  
@@ -62,7 +62,7 @@ diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/i
  	i915_active_init(&ce->active,
  			 __intel_context_active, __intel_context_retire);
  }
-@@ -485,6 +494,7 @@ void intel_context_fini(struct intel_context *ce)
+@@ -486,6 +495,7 @@ void intel_context_fini(struct intel_context *ce)
  
  	if (ce->last_rq)
  		i915_request_put(ce->last_rq);
@@ -73,7 +73,7 @@ diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/i
 diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
 --- a/drivers/gpu/drm/i915/gt/intel_context.h
 +++ b/drivers/gpu/drm/i915/gt/intel_context.h
-@@ -252,6 +252,54 @@ static inline bool intel_context_ban(struct intel_context *ce,
+@@ -263,6 +263,54 @@ static inline bool intel_context_ban(struct intel_context *ce,
  	return ret;
  }
  
@@ -152,10 +152,10 @@ diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i91
  	void (*enter)(struct intel_context *ce);
  	void (*exit)(struct intel_context *ce);
  
-@@ -241,6 +248,9 @@ struct intel_context {
+@@ -245,6 +252,9 @@ struct intel_context {
  
- 	/* Last request submitted on a parent */
- 	struct i915_request *last_rq;
+ 	/* Parallel submit mutex */
+ 	struct mutex parallel_submit;
 +
 +	/* GuC context blocked fence */
 +	struct i915_sw_fence guc_blocked;
@@ -231,7 +231,7 @@ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm
  
  	if (!enabled) {
  		GEM_BUG_ON(context_pending_enable(ce));
-@@ -1103,6 +1137,8 @@ static void __guc_context_destroy(struct intel_context *ce);
+@@ -1102,6 +1136,8 @@ static void __guc_context_destroy(struct intel_context *ce);
  static void release_guc_id(struct intel_guc *guc, struct intel_context *ce);
  static void guc_signal_context_fence(struct intel_context *ce);
  static void guc_cancel_context_requests(struct intel_context *ce);
@@ -240,7 +240,7 @@ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm
  
  static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
  {
-@@ -1143,6 +1179,8 @@ static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
+@@ -1142,6 +1178,8 @@ static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
  
  		/* Not mutualy exclusive with above if statement. */
  		if (pending_disable) {
@@ -249,7 +249,7 @@ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm
  			guc_signal_context_fence(ce);
  			if (banned) {
  				guc_cancel_context_requests(ce);
-@@ -1150,7 +1188,12 @@ static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
+@@ -1149,7 +1187,12 @@ static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
  			}
  			intel_context_sched_disable_unpin(ce);
  			atomic_dec(&guc->outstanding_submission_g2h);
@@ -262,7 +262,7 @@ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm
  		}
  	}
  }
-@@ -2549,6 +2592,22 @@ static void guc_parent_context_unpin(struct intel_context *ce)
+@@ -2551,6 +2594,22 @@ static void guc_parent_context_unpin(struct intel_context *ce)
  	__guc_context_unpin(ce);
  }
  
@@ -285,7 +285,7 @@ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm
  static void __guc_context_sched_disable(struct intel_guc *guc,
  					struct intel_context *ce,
  					u16 guc_id)
-@@ -2576,10 +2635,13 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
+@@ -2578,10 +2637,13 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
  				 G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
  }
  
@@ -299,7 +299,7 @@ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm
  	intel_context_get(ce);
  
  	return ce->guc_id;
-@@ -2677,6 +2739,132 @@ static void guc_context_sched_disable(struct intel_context *ce)
+@@ -2679,6 +2741,132 @@ static void guc_context_sched_disable(struct intel_context *ce)
  	intel_context_sched_disable_unpin(ce);
  }
  
@@ -432,7 +432,7 @@ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm
  int intel_guc_modify_scheduling(struct intel_guc *guc, bool enable)
  {
  	struct intel_gt *gt = guc_to_gt(guc);
-@@ -2991,6 +3179,9 @@ static const struct intel_context_ops guc_context_ops = {
+@@ -2993,6 +3181,9 @@ static const struct intel_context_ops guc_context_ops = {
  
  	.ban = guc_context_ban,
  
@@ -442,7 +442,7 @@ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm
  	.enter = intel_context_enter_engine,
  	.exit = guc_context_exit,
  
-@@ -3380,6 +3571,9 @@ static const struct intel_context_ops virtual_guc_context_ops = {
+@@ -3382,6 +3573,9 @@ static const struct intel_context_ops virtual_guc_context_ops = {
  
  	.ban = guc_context_ban,
  
@@ -452,7 +452,7 @@ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm
  	.enter = guc_virtual_context_enter,
  	.exit = guc_virtual_context_exit,
  
-@@ -3457,6 +3651,9 @@ static const struct intel_context_ops parent_context_ops = {
+@@ -3459,6 +3653,9 @@ static const struct intel_context_ops parent_context_ops = {
  
  	.ban = guc_context_ban,
  
@@ -462,7 +462,7 @@ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm
  	.enter = intel_context_enter_engine,
  	.exit = intel_context_exit_engine,
  
-@@ -3476,6 +3673,9 @@ static const struct intel_context_ops virtual_parent_context_ops = {
+@@ -3478,6 +3675,9 @@ static const struct intel_context_ops virtual_parent_context_ops = {
  
  	.ban = guc_context_ban,
  
@@ -472,7 +472,7 @@ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm
  	.enter = guc_virtual_context_enter,
  	.exit = guc_virtual_context_exit,
  
-@@ -3487,6 +3687,9 @@ static const struct intel_context_ops virtual_parent_context_ops = {
+@@ -3489,6 +3689,9 @@ static const struct intel_context_ops virtual_parent_context_ops = {
  static const struct intel_context_ops child_context_ops = {
  	.alloc = guc_context_alloc,
  
@@ -482,7 +482,7 @@ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm
  	.enter = intel_context_enter_engine,
  	.exit = guc_context_exit,
  
-@@ -3497,6 +3700,9 @@ static const struct intel_context_ops child_context_ops = {
+@@ -3499,6 +3702,9 @@ static const struct intel_context_ops child_context_ops = {
  static const struct intel_context_ops virtual_child_context_ops = {
  	.alloc = guc_virtual_context_alloc,
  
@@ -492,7 +492,7 @@ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm
  	.enter = guc_virtual_context_enter,
  	.exit = guc_virtual_context_exit,
  
-@@ -4440,6 +4646,7 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
+@@ -4441,6 +4647,7 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
  		clr_context_banned(ce);
  		clr_context_pending_disable(ce);
  		__guc_signal_context_fence(ce);
diff --git a/0001-INTEL_DII-drm-i915-pxp-interface-for-marking-context.patch b/0001-INTEL_DII-drm-i915-pxp-interface-for-marking-context.patch
index 6b38bd36d21b..8a6b9561eb24 100644
--- a/0001-INTEL_DII-drm-i915-pxp-interface-for-marking-context.patch
+++ b/0001-INTEL_DII-drm-i915-pxp-interface-for-marking-context.patch
@@ -56,7 +56,7 @@ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/
  #include "i915_gem_context.h"
  #include "i915_gem_ioctls.h"
  #include "i915_globals.h"
-@@ -2574,6 +2576,40 @@ static int set_acc(struct i915_gem_context *ctx,
+@@ -2769,6 +2771,40 @@ static int set_acc(struct i915_gem_context *ctx,
  	return 0;
  }
  
@@ -97,7 +97,7 @@ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/
  static int ctx_setparam(struct drm_i915_file_private *fpriv,
  			struct i915_gem_context *ctx,
  			struct drm_i915_gem_context_param *args,
-@@ -2607,6 +2643,8 @@ static int ctx_setparam(struct drm_i915_file_private *fpriv,
+@@ -2802,6 +2838,8 @@ static int ctx_setparam(struct drm_i915_file_private *fpriv,
  			ret = -EPERM;
  		else if (args->value)
  			i915_gem_context_set_bannable(ctx);
@@ -106,7 +106,7 @@ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/
  		else
  			i915_gem_context_clear_bannable(ctx);
  		break;
-@@ -2614,10 +2652,12 @@ static int ctx_setparam(struct drm_i915_file_private *fpriv,
+@@ -2809,10 +2847,12 @@ static int ctx_setparam(struct drm_i915_file_private *fpriv,
  	case I915_CONTEXT_PARAM_RECOVERABLE:
  		if (args->size)
  			ret = -EINVAL;
@@ -122,7 +122,7 @@ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/
  		break;
  
  	case I915_CONTEXT_PARAM_PRIORITY:
-@@ -2664,6 +2704,9 @@ static int ctx_setparam(struct drm_i915_file_private *fpriv,
+@@ -2865,6 +2905,9 @@ static int ctx_setparam(struct drm_i915_file_private *fpriv,
  	case I915_CONTEXT_PARAM_DEBUG_FLAGS:
  		ret = set_debug_flags(ctx, args);
  		break;
@@ -132,7 +132,7 @@ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/
  
  	case I915_CONTEXT_PARAM_BAN_PERIOD:
  	default:
-@@ -3157,6 +3200,9 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
+@@ -3358,6 +3401,9 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
  	case I915_CONTEXT_PARAM_DEBUG_FLAGS:
  		ret = get_debug_flags(ctx, args);
  		break;
@@ -142,7 +142,7 @@ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/
  
  	case I915_CONTEXT_PARAM_BAN_PERIOD:
  	default:
-@@ -3281,6 +3327,11 @@ int i915_gem_context_reset_stats_ioctl(struct drm_device *dev,
+@@ -3482,6 +3528,11 @@ int i915_gem_context_reset_stats_ioctl(struct drm_device *dev,
  	args->batch_active = atomic_read(&ctx->guilty_count);
  	args->batch_pending = atomic_read(&ctx->active_count);
  
@@ -225,7 +225,7 @@ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i9
  	eb->gem_context = ctx;
  	if (rcu_access_pointer(ctx->vm))
  		eb->invalid_flags |= EXEC_OBJECT_NEEDS_GTT;
-@@ -3301,6 +3308,17 @@ eb_select_engine(struct i915_execbuffer *eb)
+@@ -3311,6 +3318,17 @@ eb_select_engine(struct i915_execbuffer *eb, unsigned int batch_number)
  
  	intel_gt_pm_get(ce->engine->gt);
  
@@ -348,7 +348,7 @@ diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
 diff --git a/include/uapi/drm/i915_drm_prelim.h b/include/uapi/drm/i915_drm_prelim.h
 --- a/include/uapi/drm/i915_drm_prelim.h
 +++ b/include/uapi/drm/i915_drm_prelim.h
-@@ -893,6 +893,26 @@ struct prelim_drm_i915_gem_context_param {
+@@ -1003,6 +1003,26 @@ struct prelim_drm_i915_gem_context_param {
  #define I915_CONTEXT_PARAM_ACC    0xd
  };
  
diff --git a/0001-INTEL_DII-drm-i915-pxp-start-the-arb-session-on-dema.patch b/0001-INTEL_DII-drm-i915-pxp-start-the-arb-session-on-dema.patch
index 5ee627b00811..4b4326057959 100644
--- a/0001-INTEL_DII-drm-i915-pxp-start-the-arb-session-on-dema.patch
+++ b/0001-INTEL_DII-drm-i915-pxp-start-the-arb-session-on-dema.patch
@@ -22,7 +22,7 @@ Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
 diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
 +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
-@@ -3309,9 +3309,11 @@ eb_select_engine(struct i915_execbuffer *eb)
+@@ -3319,9 +3319,11 @@ eb_select_engine(struct i915_execbuffer *eb, unsigned int batch_number)
  	intel_gt_pm_get(ce->engine->gt);
  
  	if (i915_gem_context_uses_protected_content(eb->gem_context)) {
diff --git a/0001-INTEL_DII-NOT_UPSTREAM-drm-i915-Introduce-set_parall.patch b/0001-INTEL_DII-NOT_UPSTREAM-drm-i915-Introduce-set_parall.patch
new file mode 100644
index 000000000000..415fbd930383
--- /dev/null
+++ b/0001-INTEL_DII-NOT_UPSTREAM-drm-i915-Introduce-set_parall.patch
@@ -0,0 +1,676 @@
+From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
+From: Matthew Brost <matthew.brost@intel.com>
+Date: Wed, 7 Jul 2021 16:55:03 -0700
+Subject: [PATCH] INTEL_DII/NOT_UPSTREAM: drm/i915: Introduce set_parallel2
+ extension
+
+Based on upstream feedback the set_parallel extension isn't suitable as
+it looks a bit too much like the bonding extension. Introduce a
+set_parallel2 extension which configures parallel submission in a single
+extension and in a single slot. This contrasts with the old set_parallel
+extension, which configured parallel submission across multiple slots.
+
+Also remove the ability for the user to pass in the number of BBs in
+the execbuf IOCTL. The number of BBs is now inferred from the width
+of the context in the slot.
+
+This patch is intended to enable UMDs for the upstream direction while
+maintaining the old set_parallel extension so as not to break UMDs. Once UMDs
+have been updated to use the new extension, the old one can be removed from
+DII.
+
+v2: Only enable parallel submission on engines set by user
+
+Signed-off-by: Matthew Brost <matthew.brost@intel.com>
+---
+ drivers/gpu/drm/i915/gem/i915_gem_context.c   | 190 +++++++++++++++++-
+ .../gpu/drm/i915/gem/i915_gem_context_types.h |   6 -
+ .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |  73 +++++--
+ drivers/gpu/drm/i915/gt/intel_context.c       |   2 +
+ drivers/gpu/drm/i915/gt/intel_context.h       |  11 +
+ drivers/gpu/drm/i915/gt/intel_context_types.h |   4 +
+ .../gpu/drm/i915/gt/uc/intel_guc_submission.c |   4 +-
+ drivers/gpu/drm/i915/i915_request.c           |   7 +-
+ include/uapi/drm/i915_drm_prelim.h            | 115 +++++++++++
+ 9 files changed, 376 insertions(+), 36 deletions(-)
+
+diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
+--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
++++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
+@@ -374,7 +374,6 @@ void i915_gem_context_release(struct kref *ref)
+ 	mutex_destroy(&ctx->engines_mutex);
+ 	mutex_destroy(&ctx->lut_mutex);
+ 	mutex_destroy(&ctx->mutex);
+-	mutex_destroy(&ctx->parallel_submit);
+ 
+ 	kfree_rcu(ctx, rcu);
+ }
+@@ -699,8 +698,6 @@ __create_context(struct drm_i915_private *i915)
+ 	mutex_init(&ctx->mutex);
+ 	INIT_LIST_HEAD(&ctx->link);
+ 
+-	mutex_init(&ctx->parallel_submit);
+-
+ 	spin_lock_init(&ctx->stale.lock);
+ 	INIT_LIST_HEAD(&ctx->stale.engines);
+ 
+@@ -1857,6 +1854,48 @@ static bool validate_parallel_engines_layout(const struct set_engines *set)
+ 	return true;
+ }
+ 
++/*
++ * Engine must be same class and form a logically contiguous mask.
++ *
++ * FIXME: Logical mask check not 100% correct but good enough for the PoC
++ */
++static bool __validate_parallel_engines_layout(struct drm_i915_private *i915,
++					       struct intel_context *parent)
++{
++	u8 engine_class = parent->engine->class;
++	u8 num_siblings = hweight_long(parent->engine->logical_mask);
++	struct intel_context *child;
++	intel_engine_mask_t logical_mask = parent->engine->logical_mask;
++
++	for_each_child(parent, child) {
++		if (child->engine->class != engine_class) {
++			drm_dbg(&i915->drm, "Class mismatch: %u, %u",
++				engine_class, child->engine->class);
++			return false;
++		}
++		if (hweight_long(child->engine->logical_mask) != num_siblings) {
++			drm_dbg(&i915->drm, "Sibling mismatch: %u, %lu",
++				num_siblings,
++				hweight_long(child->engine->logical_mask));
++			return false;
++		}
++		if (logical_mask & child->engine->logical_mask) {
++			drm_dbg(&i915->drm, "Overlapping logical mask: 0x%04x, 0x%04x",
++				logical_mask, child->engine->logical_mask);
++			return false;
++		}
++		logical_mask |= child->engine->logical_mask;
++	}
++
++	if (!is_power_of_2((logical_mask >> (ffs(logical_mask) - 1)) + 1)) {
++		drm_dbg(&i915->drm, "Non-contiguous logical mask: 0x%04x",
++			logical_mask);
++		return false;
++	}
++
++	return true;
++}
++
+ static int
+ set_engines__parallel_submit(struct i915_user_extension __user *base, void *data)
+ {
+@@ -2009,11 +2048,156 @@ set_engines__parallel_submit(struct i915_user_extension __user *base, void *data
+ 	return err;
+ }
+ 
++static int
++set_engines__parallel2_submit(struct i915_user_extension __user *base,
++			      void *data)
++{
++	struct prelim_drm_i915_context_engines_parallel2_submit __user *ext =
++		container_of_user(base, typeof(*ext), base);
++	const struct set_engines *set = data;
++	struct drm_i915_private *i915 = set->ctx->i915;
++	struct intel_context *parent, *child, *ce;
++	u64 flags;
++	int err = 0, n, i, j;
++	u16 slot, width, num_siblings;
++	struct intel_engine_cs **siblings = NULL;
++
++	if (!(intel_uc_uses_guc_submission(&i915->gt.uc)))
++		return -ENODEV;
++
++	if (get_user(slot, &ext->engine_index))
++		return -EFAULT;
++
++	if (get_user(width, &ext->width))
++		return -EFAULT;
++
++	if (get_user(num_siblings, &ext->num_siblings))
++		return -EFAULT;
++
++	if (slot >= set->engines->num_engines) {
++		drm_dbg(&i915->drm, "Invalid placement value, %d >= %d\n",
++			slot, set->engines->num_engines);
++		return -EINVAL;
++	}
++
++	parent = set->engines->engines[slot];
++	if (parent) {
++		drm_dbg(&i915->drm, "Context index[%d] not NULL\n", slot);
++		return -EINVAL;
++	}
++
++	if (get_user(flags, &ext->flags))
++		return -EFAULT;
++
++	if (flags) {
++		drm_dbg(&i915->drm, "Unknown flags 0x%02llx", flags);
++		return -EINVAL;
++	}
++
++	for (n = 0; n < ARRAY_SIZE(ext->mbz64); n++) {
++		err = check_user_mbz(&ext->mbz64[n]);
++		if (err)
++			return err;
++	}
++
++	if (width < 1) {
++		drm_dbg(&i915->drm, "Width (%d) < 1\n", width);
++		return -EINVAL;
++	}
++
++	if (num_siblings < 1) {
++		drm_dbg(&i915->drm, "Number of siblings (%d) < 1\n",
++			num_siblings);
++		return -EINVAL;
++	}
++
++	siblings = kmalloc_array(num_siblings,
++				 sizeof(*siblings),
++				 GFP_KERNEL);
++	if (!siblings)
++		return -ENOMEM;
++
++	mutex_lock(&set->ctx->mutex);
++
++	/* Create contexts / engines */
++	for (i = 0; i < width; ++i) {
++		for (j = 0; j < num_siblings; ++j) {
++			struct i915_engine_class_instance ci;
++
++			if (copy_from_user(&ci, &ext->engines[i * num_siblings + j],
++					   sizeof(ci))) {
++				err = -EFAULT;
++				goto out_err;
++			}
++
++			siblings[j] = intel_engine_lookup_user(i915,
++							       ci.engine_class,
++							       ci.engine_instance);
++			if (!siblings[j]) {
++				drm_dbg(&i915->drm,
++					"Invalid sibling[%d]: { class:%d, inst:%d }\n",
++					j, ci.engine_class, ci.engine_instance);
++				err = -EINVAL;
++				goto out_err;
++			}
++		}
++
++		ce = intel_engine_create_virtual(siblings, num_siblings,
++						 FORCE_VIRTUAL);
++		if (IS_ERR(ce)) {
++			err = PTR_ERR(ce);
++			goto out_err;
++		}
++		intel_context_set_gem(ce, set->ctx);
++
++		if (i == 0) {
++			parent = ce;
++			__set_bit(CONTEXT_NO_PREEMPT_MID_BATCH, &ce->flags);
++		} else {
++			intel_context_bind_parent_child(parent, ce);
++			err = intel_context_alloc_state(ce);
++			if (err)
++				goto out_err;
++		}
++	}
++
++	if (!__validate_parallel_engines_layout(i915, parent)) {
++		drm_dbg(&i915->drm, "Invalid parallel context layout");
++		err = -EINVAL;
++		goto out_err;
++	}
++
++	intel_guc_configure_parent_context(parent);
++	if (cmpxchg(&set->engines->engines[slot], NULL, parent)) {
++		err = -EEXIST;
++		goto out_err;
++	}
++
++	kfree(siblings);
++	mutex_unlock(&set->ctx->mutex);
++
++	return 0;
++
++out_err:
++	if (parent) {
++		for_each_child(parent, child)
++			intel_context_put(child);
++		intel_context_put(parent);
++		set->engines->engines[slot] = NULL;
++	}
++	kfree(siblings);
++	mutex_unlock(&set->ctx->mutex);
++
++	return err;
++}
++
+ static const i915_user_extension_fn set_engines__extensions[] = {
+ 	[I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE] = set_engines__load_balance,
+ 	[I915_CONTEXT_ENGINES_EXT_BOND] = set_engines__bond,
+ 	[PRELIM_I915_USER_EXT_MASK(PRELIM_I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT)] =
+ 		set_engines__parallel_submit,
++	[PRELIM_I915_USER_EXT_MASK(PRELIM_I915_CONTEXT_ENGINES_EXT_PARALLEL2_SUBMIT)] =
++		set_engines__parallel2_submit,
+ };
+ 
+ static int
+diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
+--- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
++++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
+@@ -194,12 +194,6 @@ struct i915_gem_context {
+ 	 */
+ 	u64 fence_context;
+ 
+-	/**
+-	 * @parallel_submit: Ensure only 1 parallel submission is happening on
+-	 * this context at a time.
+-	 */
+-	struct mutex parallel_submit;
+-
+ 	/**
+ 	 * @seqno: Seqno when using when a parallel context.
+ 	 */
+diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
++++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+@@ -1633,7 +1633,7 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
+ 		goto err_unmap;
+ 
+ 	if (engine == eb->context->engine &&
+-	    !i915_gem_context_is_parallel(eb->gem_context)) {
++	    !intel_context_is_parallel(eb->context)) {
+ 		rq = i915_request_create(eb->context);
+ 	} else {
+ 		struct intel_context *ce = eb->reloc_context;
+@@ -1727,7 +1727,7 @@ static u32 *reloc_gpu(struct i915_execbuffer *eb,
+ 		struct intel_engine_cs *engine = eb->engine;
+ 
+ 		if (!reloc_can_use_engine(engine) ||
+-		    i915_gem_context_is_parallel(eb->gem_context)) {
++		    intel_context_is_parallel(eb->context)) {
+ 			engine = engine->gt->engine_class[COPY_ENGINE_CLASS][0];
+ 			if (!engine)
+ 				return ERR_PTR(-ENODEV);
+@@ -3223,7 +3223,7 @@ eb_select_legacy_ring(struct i915_execbuffer *eb)
+ }
+ 
+ static int
+-eb_select_engine(struct i915_execbuffer *eb)
++eb_select_engine(struct i915_execbuffer *eb, unsigned int batch_number)
+ {
+ 	struct intel_context *ce;
+ 	unsigned int idx;
+@@ -3238,6 +3238,16 @@ eb_select_engine(struct i915_execbuffer *eb)
+ 	if (IS_ERR(ce))
+ 		return PTR_ERR(ce);
+ 
++	if (batch_number > 0 &&
++	    !i915_gem_context_is_parallel(eb->gem_context)) {
++		struct intel_context *parent = ce;
++		for_each_child(parent, ce)
++			if (!--batch_number)
++				break;
++		intel_context_put(parent);
++		intel_context_get(ce);
++	}
++
+ 	intel_gt_pm_get(ce->engine->gt);
+ 
+ 	if (!test_bit(CONTEXT_ALLOC_BIT, &ce->flags)) {
+@@ -3562,7 +3572,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
+ 	if (unlikely(err))
+ 		goto err_destroy;
+ 
+-	err = eb_select_engine(&eb);
++	err = eb_select_engine(&eb, batch_number);
+ 	if (unlikely(err))
+ 		goto err_context;
+ 
+@@ -3751,6 +3761,8 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
+ 	const size_t count = args->buffer_count;
+ 	unsigned int num_batches, i;
+ 	int err, start_context;
++	bool is_parallel = false;
++	struct intel_context *parent = NULL;
+ 
+ 	if (!check_buffer_count(count)) {
+ 		drm_dbg(&i915->drm, "execbuf2 with %zd buffers\n", count);
+@@ -3782,15 +3794,35 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
+ 			I915_EXEC_NUMBER_BB_LSB) +
+ 		       ((args->flags & PRELIM_I915_EXEC_NUMBER_BB_MASK) >>
+ 			PRELIM_I915_EXEC_NUMBER_BB_LSB)) + 1;
+-	if (i915_gem_context_is_parallel(ctx)) {
+-		if (num_batches > count ||
+-		    start_context + num_batches > ctx->width) {
+-			err = -EINVAL;
+-			goto err_context;
++
++	if (i915_gem_context_user_engines(ctx)) {
++		parent = i915_gem_context_get_engine(ctx, start_context);
++		if (IS_ERR(parent)) {
++			i915_gem_context_put(ctx);
++			return PTR_ERR(parent);
+ 		}
+ 
+-		if (i915_gem_context_is_bb_preempt_boundary(ctx) &&
+-		    (start_context || num_batches != ctx->width)) {
++		is_parallel = i915_gem_context_is_parallel(ctx) ||
++			intel_context_is_parallel(parent);
++		if (i915_gem_context_is_parallel(ctx)) {
++			if (num_batches > count ||
++			    start_context + num_batches > ctx->width) {
++				err = -EINVAL;
++				goto err_context;
++			}
++
++			if (i915_gem_context_is_bb_preempt_boundary(ctx) &&
++			    (start_context || num_batches != ctx->width)) {
++				err = -EINVAL;
++				goto err_context;
++			}
++		} else if (intel_context_is_parallel(parent)) {
++			if (num_batches != 1)
++				return -EINVAL;
++			num_batches = parent->guc_number_children + 1;
++			if (num_batches > count)
++				return -EINVAL;
++		} else if (num_batches > 1) {
+ 			err = -EINVAL;
+ 			goto err_context;
+ 		}
+@@ -3827,8 +3859,7 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
+ 	 * properly, also this is needed to create an excl fence for an dma buf
+ 	 * objects these BBs touch.
+ 	 */
+-	if (args->flags & I915_EXEC_FENCE_OUT ||
+-	    i915_gem_context_is_parallel(ctx)) {
++	if (args->flags & I915_EXEC_FENCE_OUT || is_parallel) {
+ 		out_fences = kcalloc(num_batches, sizeof(*out_fences),
+ 				     GFP_KERNEL);
+ 		if (!out_fences) {
+@@ -3874,8 +3905,8 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
+ 	 * in intel_context sequence, thus only 1 submission can happen at a
+ 	 * time.
+ 	 */
+-	if (i915_gem_context_is_parallel(ctx))
+-		mutex_lock(&ctx->parallel_submit);
++	if (is_parallel)
++		mutex_lock(&parent->parallel_submit);
+ 
+ 	err = i915_gem_do_execbuffer(dev, file, args, exec2_list,
+ 				     args->flags & I915_EXEC_BATCH_FIRST ?
+@@ -3889,8 +3920,10 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
+ 				     &ww);
+ 
+ 	for (i = 1; err == 0 && i < num_batches; i++) {
+-		args->flags &= ~I915_EXEC_RING_MASK;
+-		args->flags |= start_context + i;
++		if (i915_gem_context_is_parallel(ctx)) {
++			args->flags &= ~I915_EXEC_RING_MASK;
++			args->flags |= start_context + i;
++		}
+ 		args->batch_len = 0;
+ 
+ 		err = i915_gem_do_execbuffer(dev, file, args, exec2_list,
+@@ -3905,8 +3938,8 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
+ 					     &ww);
+ 	}
+ 
+-	if (i915_gem_context_is_parallel(ctx))
+-		mutex_unlock(&ctx->parallel_submit);
++	if (is_parallel)
++		mutex_unlock(&parent->parallel_submit);
+ 
+ 	/*
+ 	 * Now that we have begun execution of the batchbuffer, we ignore
+@@ -4009,6 +4042,8 @@ end:;
+ 	dma_fence_put(in_fence);
+ err_context:
+ 	i915_gem_context_put(ctx);
++	if (parent)
++		intel_context_put(parent);
+ 
+ 	return err;
+ }
+diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
+--- a/drivers/gpu/drm/i915/gt/intel_context.c
++++ b/drivers/gpu/drm/i915/gt/intel_context.c
+@@ -460,6 +460,7 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
+ 	INIT_LIST_HEAD(&ce->signals);
+ 
+ 	mutex_init(&ce->pin_mutex);
++	mutex_init(&ce->parallel_submit);
+ 
+ 	spin_lock_init(&ce->guc_state.lock);
+ 	INIT_LIST_HEAD(&ce->guc_state.fences);
+@@ -491,6 +492,7 @@ void intel_context_fini(struct intel_context *ce)
+ 			intel_context_put(child);
+ 
+ 	mutex_destroy(&ce->pin_mutex);
++	mutex_destroy(&ce->parallel_submit);
+ 	i915_active_fini(&ce->active);
+ }
+ 
+diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
+--- a/drivers/gpu/drm/i915/gt/intel_context.h
++++ b/drivers/gpu/drm/i915/gt/intel_context.h
+@@ -52,6 +52,11 @@ static inline bool intel_context_is_parent(struct intel_context *ce)
+ 	return !!ce->guc_number_children;
+ }
+ 
++static inline bool intel_context_is_parallel(struct intel_context *ce)
++{
++	return intel_context_is_child(ce) || intel_context_is_parent(ce);
++}
++
+ /* Only should be called directly by selftests */
+ void __intel_context_bind_parent_child(struct intel_context *parent,
+ 				       struct intel_context *child);
+@@ -204,6 +209,12 @@ static inline bool intel_context_is_barrier(const struct intel_context *ce)
+ 	return test_bit(CONTEXT_BARRIER_BIT, &ce->flags);
+ }
+ 
++static inline bool
++intel_context_is_no_preempt_mid_batch(const struct intel_context *ce)
++{
++	return test_bit(CONTEXT_NO_PREEMPT_MID_BATCH, &ce->flags);
++}
++
+ static inline bool intel_context_is_closed(const struct intel_context *ce)
+ {
+ 	return test_bit(CONTEXT_CLOSED_BIT, &ce->flags);
+diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
+--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
++++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
+@@ -114,6 +114,7 @@ struct intel_context {
+ #define CONTEXT_FORCE_SINGLE_SUBMISSION	7
+ #define CONTEXT_NOPREEMPT		8
+ #define CONTEXT_LRCA_DIRTY		9
++#define CONTEXT_NO_PREEMPT_MID_BATCH	10
+ 
+ 	struct {
+ 		u64 timeout_us;
+@@ -239,6 +240,9 @@ struct intel_context {
+ 
+ 	/* Last request submitted on a parent */
+ 	struct i915_request *last_rq;
++
++	/* Parallel submit mutex */
++	struct mutex parallel_submit;
+ };
+ 
+ #endif /* __INTEL_CONTEXT_TYPES__ */
+diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
++++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+@@ -798,8 +798,7 @@ static inline int rq_prio(const struct i915_request *rq)
+ 
+ static inline bool is_multi_lrc(struct intel_context *ce)
+ {
+-	return intel_context_is_child(ce) ||
+-		intel_context_is_parent(ce);
++	return intel_context_is_parallel(ce);
+ }
+ 
+ static inline bool is_multi_lrc_rq(struct i915_request *rq)
+@@ -3458,6 +3457,7 @@ void intel_guc_configure_parent_context(struct intel_context *ce)
+ 		bb_preempt_boundary =
+ 			i915_gem_context_is_bb_preempt_boundary(ctx);
+ 	rcu_read_unlock();
++	bb_preempt_boundary |= intel_context_is_no_preempt_mid_batch(ce);
+ 	if (bb_preempt_boundary) {
+ 		ce->emit_bb_start = emit_bb_start_parent_bb_preempt_boundary;
+ 		ce->emit_fini_breadcrumb =
+diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
+--- a/drivers/gpu/drm/i915/i915_request.c
++++ b/drivers/gpu/drm/i915/i915_request.c
+@@ -1606,14 +1606,9 @@ i915_request_await_object(struct i915_request *to,
+ 	return ret;
+ }
+ 
+-static inline bool is_parallel(struct intel_context *ce)
+-{
+-	return intel_context_is_child(ce) || intel_context_is_parent(ce);
+-}
+-
+ static inline bool is_parallel_rq(struct i915_request *rq)
+ {
+-	return is_parallel(rq->context);
++	return intel_context_is_parallel(rq->context);
+ }
+ 
+ static inline struct intel_context *request_to_parent(struct i915_request *rq)
+diff --git a/include/uapi/drm/i915_drm_prelim.h b/include/uapi/drm/i915_drm_prelim.h
+--- a/include/uapi/drm/i915_drm_prelim.h
++++ b/include/uapi/drm/i915_drm_prelim.h
+@@ -370,9 +370,124 @@ struct prelim_i915_context_engines_parallel_submit {
+ } __attribute__ ((packed));
+ #define i915_context_engines_parallel_submit prelim_i915_context_engines_parallel_submit
+ 
++/**
++ * struct prelim_drm_i915_context_engines_parallel2_submit - Configure engine
++ * for parallel submission.
++ *
++ * Set up a slot in the context engine map to allow multiple BBs to be submitted
++ * in a single execbuf IOCTL. Those BBs will then be scheduled to run on the GPU
++ * in parallel. Multiple hardware contexts are created internally in the i915 to
++ * run these BBs. Once a slot is configured for N BBs, only N BBs can be
++ * submitted in each execbuf IOCTL, and this is implicit behavior, e.g. the user
++ * doesn't tell the execbuf IOCTL there are N BBs; the execbuf IOCTL knows how
++ * many BBs there are based on the slot's configuration. The N BBs are the last
++ * N buffer objects, or the first N if I915_EXEC_BATCH_FIRST is set.
++ *
++ * The default placement behavior is to create implicit bonds between each
++ * context if each context maps to more than 1 physical engine (e.g. context is
++ * a virtual engine). Also, we only allow contexts of the same engine class,
++ * and these contexts must be in logically contiguous order. Examples of the
++ * placement behavior are described below. Lastly, the default is to not allow
++ * BBs to be preempted mid-BB; instead, coordinated preemption is inserted on
++ * all hardware contexts between each set of BBs. Flags may be added in the
++ * future to change both of these default behaviors.
++ *
++ * Returns -EINVAL if hardware context placement configuration is invalid or if
++ * the placement configuration isn't supported on the platform / submission
++ * interface.
++ * Returns -ENODEV if the extension isn't supported on the platform / submission
++ * interface.
++ *
++ * .. code-block::
++ *
++ *	Example 1 pseudo code:
++ *	CS[X] = generic engine of same class, logical instance X
++ *	INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
++ *	set_engines(INVALID)
++ *	set_parallel(engine_index=0, width=2, num_siblings=1,
++ *		     engines=CS[0],CS[1])
++ *
++ *	Results in the following valid placement:
++ *	CS[0], CS[1]
++ *
++ *	Example 2 pseudo code:
++ *	CS[X] = generic engine of same class, logical instance X
++ *	INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
++ *	set_engines(INVALID)
++ *	set_parallel(engine_index=0, width=2, num_siblings=2,
++ *		     engines=CS[0],CS[2],CS[1],CS[3])
++ *
++ *	Results in the following valid placements:
++ *	CS[0], CS[1]
++ *	CS[2], CS[3]
++ *
++ *	This can also be thought of as 2 virtual engines described by a 2-D
++ *	array in the engines field, with bonds placed between each index of
++ *	the virtual engines, e.g. CS[0] is bonded to CS[1], CS[2] is bonded to
++ *	CS[3].
++ *	VE[0] = CS[0], CS[2]
++ *	VE[1] = CS[1], CS[3]
++ *
++ *	Example 3 pseudo code:
++ *	CS[X] = generic engine of same class, logical instance X
++ *	INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
++ *	set_engines(INVALID)
++ *	set_parallel(engine_index=0, width=2, num_siblings=2,
++ *		     engines=CS[0],CS[1],CS[1],CS[3])
++ *
++ *	Results in the following valid and invalid placements:
++ *	CS[0], CS[1]
++ *	CS[1], CS[3] - Not logically contiguous, returns -EINVAL
++ */
++struct prelim_drm_i915_context_engines_parallel2_submit {
++	/**
++	 * @base: base user extension.
++	 */
++	struct i915_user_extension base;
++
++	/**
++	 * @engine_index: slot for parallel engine
++	 */
++	__u16 engine_index;
++
++	/**
++	 * @width: number of contexts per parallel engine
++	 */
++	__u16 width;
++
++	/**
++	 * @num_siblings: number of siblings per context
++	 */
++	__u16 num_siblings;
++
++	/**
++	 * @mbz16: reserved for future use; must be zero
++	 */
++	__u16 mbz16;
++
++	/**
	 * @flags: all undefined flags must be zero; no flags are currently defined
++	 */
++	__u64 flags;
++
++	/**
++	 * @mbz64: reserved for future use; must be zero
++	 */
++	__u64 mbz64[3];
++
++	/**
++	 * @engines: 2-d array of engine instances to configure parallel engine
++	 *
++	 * length = width (i) * num_siblings (j)
++	 * index = j + i * num_siblings
++	 */
++	struct i915_engine_class_instance engines[0];
++} __attribute__ ((packed));
++
+ struct prelim_i915_context_param_engines {
+ #define PRELIM_I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT (PRELIM_I915_USER_EXT | 2) /* see prelim_i915_context_engines_parallel_submit */
+ #define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2 /* see i915_context_engines_parallel_submit */
++#define PRELIM_I915_CONTEXT_ENGINES_EXT_PARALLEL2_SUBMIT (PRELIM_I915_USER_EXT | 3) /* see prelim_i915_context_engines_parallel2_submit */
+ };
+ 
+ enum prelim_drm_i915_gem_memory_class {
--
git-pile 0.97


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [Intel-gfx] [PATCH 0/2] Introduce set_parallel2 extension
@ 2021-07-08  0:30   ` Matthew Brost
  0 siblings, 0 replies; 54+ messages in thread
From: Matthew Brost @ 2021-07-08  0:30 UTC (permalink / raw)
  To: intel-gfx, dri-devel

Based on upstream feedback [1] the current set_parallel extension isn't
suitable. Add a single patch to DII implementing the new interface
agreed to upstream [2]. This is intended to enable the UMDs with the
upstream interface while maintaining the old interface on DII.

A quick IGT to prove this is working should be posted to the list shortly.

v2: Move the single patch into the GuC section of the pile, align with the
agreed upstream interface, only include prelim* definitions.
v3: Enable set_parallel2 via the SET_PARAM IOCTL, resend for CI
v4: Fix regression introduced when the patch was merged - only do parallel
checks on user engine sets

Signed-off-by: Matthew Brost <matthew.brost@intel.com>

[1] https://patchwork.freedesktop.org/patch/432205/?series=89840&rev=1
[2] https://patchwork.freedesktop.org/patch/438911/?series=91417&rev=1



---
baseline: b7227afd06bac1fe6719136e2ddd2bfed1d85feb
pile-commit: b7a2c9136977a385659a71df837cbe5a1f775b32
range-diff:
   -:  ------------ >  930:  ad12b87b91af INTEL_DII/NOT_UPSTREAM: drm/i915: Introduce set_parallel2 extension
1083:  73e59e150cde ! 1084:  79b296835b1c INTEL_DII/FIXME: drm/i915/perf: add a parameter to control the size of OA buffer
1120:  edbc20ae1355 ! 1121:  30d02d618229 INTEL_DII/FIXME: drm/i915: Add context parameter for debug flags
1293:  997b317fc408 ! 1294:  016b5903b0a0 INTEL_DII: drm/i915/perf: Add OA formats for XEHPSDV
1364:  136064b76b92 ! 1365:  5f564d553dc8 INTEL_DII: drm/i915/xehpsdv: Expand total numbers of supported engines up to 256
1403:  67b729033e82 ! 1404:  4398a2322f2f INTEL_DII: drm/i915/xehpsdv: Impose ULLS context restrictions
1405:  b8dd2a22a952 ! 1406:  dd2fab232cf1 INTEL_DII: drm/i915: Add context methods to suspend and resume requests
1670:  b4633106fa13 ! 1671:  53b4a54ee2cc INTEL_DII: drm/i915/pxp: interface for marking contexts as using protected content
1671:  22369ab70556 ! 1672:  42234590cdf5 INTEL_DII: drm/i915/pxp: start the arb session on demand

 series                                             |   1 +
 ...IXME-drm-i915-perf-add-a-parameter-to-con.patch |   4 +-
 ...IXME-drm-i915-Add-context-parameter-for-d.patch |  18 +-
 ...-drm-i915-perf-Add-OA-formats-for-XEHPSDV.patch |   4 +-
 ...rm-i915-xehpsdv-Expand-total-numbers-of-s.patch |   2 +-
 ...rm-i915-xehpsdv-Impose-ULLS-context-restr.patch |  12 +-
 ...rm-i915-Add-context-methods-to-suspend-an.patch |  38 +-
 ...rm-i915-pxp-interface-for-marking-context.patch |  16 +-
 ...rm-i915-pxp-start-the-arb-session-on-dema.patch |   2 +-
 ...OT_UPSTREAM-drm-i915-Introduce-set_parall.patch | 676 +++++++++++++++++++++
 10 files changed, 725 insertions(+), 48 deletions(-)

diff --git a/series b/series
index 8b77d52df40c..7db508ea974d 100644
--- a/series
+++ b/series
@@ -929,6 +929,7 @@
 0001-INTEL_DII-drm-i915-guc-Increase-GuC-log-size-for-CON.patch
 0001-INTEL_DII-NOT_UPSTREAM-drm-i915-Dump-error-capture-t.patch
 0001-INTEL_DII-NOT_UPSTREAM-drm-i915-guc-Dump-error-captu.patch
+0001-INTEL_DII-NOT_UPSTREAM-drm-i915-Introduce-set_parall.patch
 0001-INTEL_DII-END-GuC-submission-and-slpc-support.patch
 0001-INTEL_DII-BEGIN-SR-IOV-ENABLING.patch
 0001-INTEL_DII-drm-i915-guc-Update-GuC-to-62.0.3.patch
diff --git a/0001-INTEL_DII-FIXME-drm-i915-perf-add-a-parameter-to-con.patch b/0001-INTEL_DII-FIXME-drm-i915-perf-add-a-parameter-to-con.patch
index dd654f144374..b7a637b3813f 100644
--- a/0001-INTEL_DII-FIXME-drm-i915-perf-add-a-parameter-to-con.patch
+++ b/0001-INTEL_DII-FIXME-drm-i915-perf-add-a-parameter-to-con.patch
@@ -384,8 +384,8 @@ diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
 diff --git a/include/uapi/drm/i915_drm_prelim.h b/include/uapi/drm/i915_drm_prelim.h
 --- a/include/uapi/drm/i915_drm_prelim.h
 +++ b/include/uapi/drm/i915_drm_prelim.h
-@@ -393,6 +393,36 @@ struct prelim_i915_context_param_engines {
- #define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2 /* see i915_context_engines_parallel_submit */
+@@ -508,6 +508,36 @@ struct prelim_i915_context_param_engines {
+ #define PRELIM_I915_CONTEXT_ENGINES_EXT_PARALLEL2_SUBMIT (PRELIM_I915_USER_EXT | 3) /* see prelim_i915_context_engines_parallel2_submit */
  };
  
 +enum prelim_drm_i915_perf_property_id {
diff --git a/0001-INTEL_DII-FIXME-drm-i915-Add-context-parameter-for-d.patch b/0001-INTEL_DII-FIXME-drm-i915-Add-context-parameter-for-d.patch
index dfd5790ac2b8..71a5943b5536 100644
--- a/0001-INTEL_DII-FIXME-drm-i915-Add-context-parameter-for-d.patch
+++ b/0001-INTEL_DII-FIXME-drm-i915-Add-context-parameter-for-d.patch
@@ -44,7 +44,7 @@ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/
  }
  
  static void __free_engines(struct i915_gem_engines *e, unsigned int count)
-@@ -2252,6 +2255,76 @@ static int set_priority(struct i915_gem_context *ctx,
+@@ -2436,6 +2439,76 @@ static int set_priority(struct i915_gem_context *ctx,
  	return 0;
  }
  
@@ -121,7 +121,7 @@ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/
  static int ctx_setparam(struct drm_i915_file_private *fpriv,
  			struct i915_gem_context *ctx,
  			struct drm_i915_gem_context_param *args)
-@@ -2321,6 +2394,11 @@ static int ctx_setparam(struct drm_i915_file_private *fpriv,
+@@ -2505,6 +2578,11 @@ static int ctx_setparam(struct drm_i915_file_private *fpriv,
  		ret = set_ringsize(ctx, args);
  		break;
  
@@ -133,7 +133,7 @@ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/
  	case I915_CONTEXT_PARAM_BAN_PERIOD:
  	default:
  		ret = -EINVAL;
-@@ -2777,6 +2855,11 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
+@@ -2961,6 +3039,11 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
  		ret = get_ringsize(ctx, args);
  		break;
  
@@ -184,7 +184,7 @@ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h b/drivers/gpu/drm
 diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
 --- a/drivers/gpu/drm/i915/gt/intel_context.h
 +++ b/drivers/gpu/drm/i915/gt/intel_context.h
-@@ -285,6 +285,24 @@ intel_context_clear_nopreempt(struct intel_context *ce)
+@@ -296,6 +296,24 @@ intel_context_clear_nopreempt(struct intel_context *ce)
  		ce->emit_bb_start = ce->engine->emit_bb_start;
  }
  
@@ -212,19 +212,19 @@ diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/i
 diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
 --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
 +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
-@@ -114,6 +114,7 @@ struct intel_context {
- #define CONTEXT_FORCE_SINGLE_SUBMISSION	7
+@@ -115,6 +115,7 @@ struct intel_context {
  #define CONTEXT_NOPREEMPT		8
  #define CONTEXT_LRCA_DIRTY		9
-+#define CONTEXT_DEBUG			10
+ #define CONTEXT_NO_PREEMPT_MID_BATCH	10
++#define CONTEXT_DEBUG			11
  
  	struct {
  		u64 timeout_us;
 diff --git a/include/uapi/drm/i915_drm_prelim.h b/include/uapi/drm/i915_drm_prelim.h
 --- a/include/uapi/drm/i915_drm_prelim.h
 +++ b/include/uapi/drm/i915_drm_prelim.h
-@@ -395,6 +395,32 @@ struct prelim_i915_context_param_engines {
- #define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2 /* see i915_context_engines_parallel_submit */
+@@ -510,6 +510,32 @@ struct prelim_i915_context_param_engines {
+ #define PRELIM_I915_CONTEXT_ENGINES_EXT_PARALLEL2_SUBMIT (PRELIM_I915_USER_EXT | 3) /* see prelim_i915_context_engines_parallel2_submit */
  };
  
 +struct prelim_drm_i915_gem_context_param {
diff --git a/0001-INTEL_DII-drm-i915-perf-Add-OA-formats-for-XEHPSDV.patch b/0001-INTEL_DII-drm-i915-perf-Add-OA-formats-for-XEHPSDV.patch
index 19a07b3926ae..f62d7848e091 100644
--- a/0001-INTEL_DII-drm-i915-perf-Add-OA-formats-for-XEHPSDV.patch
+++ b/0001-INTEL_DII-drm-i915-perf-Add-OA-formats-for-XEHPSDV.patch
@@ -204,8 +204,8 @@ diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
 diff --git a/include/uapi/drm/i915_drm_prelim.h b/include/uapi/drm/i915_drm_prelim.h
 --- a/include/uapi/drm/i915_drm_prelim.h
 +++ b/include/uapi/drm/i915_drm_prelim.h
-@@ -435,6 +435,27 @@ struct prelim_i915_context_param_engines {
- #define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2 /* see i915_context_engines_parallel_submit */
+@@ -550,6 +550,27 @@ struct prelim_i915_context_param_engines {
+ #define PRELIM_I915_CONTEXT_ENGINES_EXT_PARALLEL2_SUBMIT (PRELIM_I915_USER_EXT | 3) /* see prelim_i915_context_engines_parallel2_submit */
  };
  
 +enum prelim_drm_i915_oa_format {
diff --git a/0001-INTEL_DII-drm-i915-xehpsdv-Expand-total-numbers-of-s.patch b/0001-INTEL_DII-drm-i915-xehpsdv-Expand-total-numbers-of-s.patch
index 05a84884a3d1..ee486b95d11e 100644
--- a/0001-INTEL_DII-drm-i915-xehpsdv-Expand-total-numbers-of-s.patch
+++ b/0001-INTEL_DII-drm-i915-xehpsdv-Expand-total-numbers-of-s.patch
@@ -76,7 +76,7 @@ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i9
  
  	/* Kernel clipping was a DRI1 misfeature */
  	if (!(exec->flags & I915_EXEC_FENCE_ARRAY)) {
-@@ -3233,9 +3235,12 @@ eb_select_engine(struct i915_execbuffer *eb)
+@@ -3233,9 +3235,12 @@ eb_select_engine(struct i915_execbuffer *eb, unsigned int batch_number)
  	int err;
  
  	if (i915_gem_context_user_engines(eb->gem_context))
diff --git a/0001-INTEL_DII-drm-i915-xehpsdv-Impose-ULLS-context-restr.patch b/0001-INTEL_DII-drm-i915-xehpsdv-Impose-ULLS-context-restr.patch
index 38ad84c4dc12..80880e3008cc 100644
--- a/0001-INTEL_DII-drm-i915-xehpsdv-Impose-ULLS-context-restr.patch
+++ b/0001-INTEL_DII-drm-i915-xehpsdv-Impose-ULLS-context-restr.patch
@@ -76,7 +76,7 @@ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i9
  	if (intel_context_nopreempt(eb->context) ||
  	    intel_context_debug(eb->context))
  		__set_bit(I915_FENCE_FLAG_NOPREEMPT, &eb->request->fence.flags);
-@@ -3453,6 +3462,13 @@ static int eb_request_add(struct i915_execbuffer *eb, int err)
+@@ -3463,6 +3472,13 @@ static int eb_request_add(struct i915_execbuffer *eb, int err)
  
  	trace_i915_request_add(rq);
  
@@ -90,7 +90,7 @@ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i9
  	prev = __i915_request_commit(rq);
  
  	/* Check that the context wasn't destroyed before submission */
-@@ -3531,6 +3547,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
+@@ -3541,6 +3557,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
  	int err;
  	bool first = batch_number == 0;
  	bool last = batch_number + 1 == num_batches;
@@ -98,7 +98,7 @@ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i9
  
  	BUILD_BUG_ON(__EXEC_INTERNAL_FLAGS & ~__I915_EXEC_ILLEGAL_FLAGS);
  	BUILD_BUG_ON(__EXEC_OBJECT_INTERNAL_FLAGS &
-@@ -3582,6 +3599,13 @@ i915_gem_do_execbuffer(struct drm_device *dev,
+@@ -3592,6 +3609,13 @@ i915_gem_do_execbuffer(struct drm_device *dev,
  	if (unlikely(err))
  		goto err_destroy;
  
@@ -109,7 +109,7 @@ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i9
 +		goto err_context;
 +	}
 +
- 	err = eb_select_engine(&eb);
+ 	err = eb_select_engine(&eb, batch_number);
  	if (unlikely(err))
  		goto err_context;
 diff --git a/drivers/gpu/drm/i915/gem/i915_gem_wait.c b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
@@ -239,7 +239,7 @@ diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_req
  		/*
  		 * Requests on the same timeline are explicitly ordered, along
  		 * with their dependencies, by i915_request_add() which ensures
-@@ -2126,6 +2181,7 @@ long i915_request_wait(struct i915_request *rq,
+@@ -2121,6 +2176,7 @@ long i915_request_wait(struct i915_request *rq,
  {
  	might_sleep();
  	GEM_BUG_ON(timeout < 0);
@@ -247,7 +247,7 @@ diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_req
  
  	if (dma_fence_is_signaled(&rq->fence))
  		return timeout;
-@@ -2331,6 +2387,8 @@ static struct i915_global_request global = { {
+@@ -2326,6 +2382,8 @@ static struct i915_global_request global = { {
  
  int __init i915_global_request_init(void)
  {
diff --git a/0001-INTEL_DII-drm-i915-Add-context-methods-to-suspend-an.patch b/0001-INTEL_DII-drm-i915-Add-context-methods-to-suspend-an.patch
index 7d523c8dadba..44fd93184b8a 100644
--- a/0001-INTEL_DII-drm-i915-Add-context-methods-to-suspend-an.patch
+++ b/0001-INTEL_DII-drm-i915-Add-context-methods-to-suspend-an.patch
@@ -52,7 +52,7 @@ diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/i
  void
  intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
  {
-@@ -475,6 +481,9 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
+@@ -476,6 +482,9 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
  	ce->guc_id = GUC_INVALID_LRC_ID;
  	INIT_LIST_HEAD(&ce->guc_id_link);
  
@@ -62,7 +62,7 @@ diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/i
  	i915_active_init(&ce->active,
  			 __intel_context_active, __intel_context_retire);
  }
-@@ -485,6 +494,7 @@ void intel_context_fini(struct intel_context *ce)
+@@ -486,6 +495,7 @@ void intel_context_fini(struct intel_context *ce)
  
  	if (ce->last_rq)
  		i915_request_put(ce->last_rq);
@@ -73,7 +73,7 @@ diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/i
 diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
 --- a/drivers/gpu/drm/i915/gt/intel_context.h
 +++ b/drivers/gpu/drm/i915/gt/intel_context.h
-@@ -252,6 +252,54 @@ static inline bool intel_context_ban(struct intel_context *ce,
+@@ -263,6 +263,54 @@ static inline bool intel_context_ban(struct intel_context *ce,
  	return ret;
  }
  
@@ -152,10 +152,10 @@ diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i91
  	void (*enter)(struct intel_context *ce);
  	void (*exit)(struct intel_context *ce);
  
-@@ -241,6 +248,9 @@ struct intel_context {
+@@ -245,6 +252,9 @@ struct intel_context {
  
- 	/* Last request submitted on a parent */
- 	struct i915_request *last_rq;
+ 	/* Parallel submit mutex */
+ 	struct mutex parallel_submit;
 +
 +	/* GuC context blocked fence */
 +	struct i915_sw_fence guc_blocked;
@@ -231,7 +231,7 @@ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm
  
  	if (!enabled) {
  		GEM_BUG_ON(context_pending_enable(ce));
-@@ -1103,6 +1137,8 @@ static void __guc_context_destroy(struct intel_context *ce);
+@@ -1102,6 +1136,8 @@ static void __guc_context_destroy(struct intel_context *ce);
  static void release_guc_id(struct intel_guc *guc, struct intel_context *ce);
  static void guc_signal_context_fence(struct intel_context *ce);
  static void guc_cancel_context_requests(struct intel_context *ce);
@@ -240,7 +240,7 @@ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm
  
  static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
  {
-@@ -1143,6 +1179,8 @@ static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
+@@ -1142,6 +1178,8 @@ static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
  
  		/* Not mutualy exclusive with above if statement. */
  		if (pending_disable) {
@@ -249,7 +249,7 @@ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm
  			guc_signal_context_fence(ce);
  			if (banned) {
  				guc_cancel_context_requests(ce);
-@@ -1150,7 +1188,12 @@ static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
+@@ -1149,7 +1187,12 @@ static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
  			}
  			intel_context_sched_disable_unpin(ce);
  			atomic_dec(&guc->outstanding_submission_g2h);
@@ -262,7 +262,7 @@ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm
  		}
  	}
  }
-@@ -2549,6 +2592,22 @@ static void guc_parent_context_unpin(struct intel_context *ce)
+@@ -2551,6 +2594,22 @@ static void guc_parent_context_unpin(struct intel_context *ce)
  	__guc_context_unpin(ce);
  }
  
@@ -285,7 +285,7 @@ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm
  static void __guc_context_sched_disable(struct intel_guc *guc,
  					struct intel_context *ce,
  					u16 guc_id)
-@@ -2576,10 +2635,13 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
+@@ -2578,10 +2637,13 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
  				 G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
  }
  
@@ -299,7 +299,7 @@ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm
  	intel_context_get(ce);
  
  	return ce->guc_id;
-@@ -2677,6 +2739,132 @@ static void guc_context_sched_disable(struct intel_context *ce)
+@@ -2679,6 +2741,132 @@ static void guc_context_sched_disable(struct intel_context *ce)
  	intel_context_sched_disable_unpin(ce);
  }
  
@@ -432,7 +432,7 @@ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm
  int intel_guc_modify_scheduling(struct intel_guc *guc, bool enable)
  {
  	struct intel_gt *gt = guc_to_gt(guc);
-@@ -2991,6 +3179,9 @@ static const struct intel_context_ops guc_context_ops = {
+@@ -2993,6 +3181,9 @@ static const struct intel_context_ops guc_context_ops = {
  
  	.ban = guc_context_ban,
  
@@ -442,7 +442,7 @@ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm
  	.enter = intel_context_enter_engine,
  	.exit = guc_context_exit,
  
-@@ -3380,6 +3571,9 @@ static const struct intel_context_ops virtual_guc_context_ops = {
+@@ -3382,6 +3573,9 @@ static const struct intel_context_ops virtual_guc_context_ops = {
  
  	.ban = guc_context_ban,
  
@@ -452,7 +452,7 @@ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm
  	.enter = guc_virtual_context_enter,
  	.exit = guc_virtual_context_exit,
  
-@@ -3457,6 +3651,9 @@ static const struct intel_context_ops parent_context_ops = {
+@@ -3459,6 +3653,9 @@ static const struct intel_context_ops parent_context_ops = {
  
  	.ban = guc_context_ban,
  
@@ -462,7 +462,7 @@ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm
  	.enter = intel_context_enter_engine,
  	.exit = intel_context_exit_engine,
  
-@@ -3476,6 +3673,9 @@ static const struct intel_context_ops virtual_parent_context_ops = {
+@@ -3478,6 +3675,9 @@ static const struct intel_context_ops virtual_parent_context_ops = {
  
  	.ban = guc_context_ban,
  
@@ -472,7 +472,7 @@ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm
  	.enter = guc_virtual_context_enter,
  	.exit = guc_virtual_context_exit,
  
-@@ -3487,6 +3687,9 @@ static const struct intel_context_ops virtual_parent_context_ops = {
+@@ -3489,6 +3689,9 @@ static const struct intel_context_ops virtual_parent_context_ops = {
  static const struct intel_context_ops child_context_ops = {
  	.alloc = guc_context_alloc,
  
@@ -482,7 +482,7 @@ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm
  	.enter = intel_context_enter_engine,
  	.exit = guc_context_exit,
  
-@@ -3497,6 +3700,9 @@ static const struct intel_context_ops child_context_ops = {
+@@ -3499,6 +3702,9 @@ static const struct intel_context_ops child_context_ops = {
  static const struct intel_context_ops virtual_child_context_ops = {
  	.alloc = guc_virtual_context_alloc,
  
@@ -492,7 +492,7 @@ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm
  	.enter = guc_virtual_context_enter,
  	.exit = guc_virtual_context_exit,
  
-@@ -4440,6 +4646,7 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
+@@ -4441,6 +4647,7 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
  		clr_context_banned(ce);
  		clr_context_pending_disable(ce);
  		__guc_signal_context_fence(ce);
diff --git a/0001-INTEL_DII-drm-i915-pxp-interface-for-marking-context.patch b/0001-INTEL_DII-drm-i915-pxp-interface-for-marking-context.patch
index 6b38bd36d21b..8a6b9561eb24 100644
--- a/0001-INTEL_DII-drm-i915-pxp-interface-for-marking-context.patch
+++ b/0001-INTEL_DII-drm-i915-pxp-interface-for-marking-context.patch
@@ -56,7 +56,7 @@ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/
  #include "i915_gem_context.h"
  #include "i915_gem_ioctls.h"
  #include "i915_globals.h"
-@@ -2574,6 +2576,40 @@ static int set_acc(struct i915_gem_context *ctx,
+@@ -2769,6 +2771,40 @@ static int set_acc(struct i915_gem_context *ctx,
  	return 0;
  }
  
@@ -97,7 +97,7 @@ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/
  static int ctx_setparam(struct drm_i915_file_private *fpriv,
  			struct i915_gem_context *ctx,
  			struct drm_i915_gem_context_param *args,
-@@ -2607,6 +2643,8 @@ static int ctx_setparam(struct drm_i915_file_private *fpriv,
+@@ -2802,6 +2838,8 @@ static int ctx_setparam(struct drm_i915_file_private *fpriv,
  			ret = -EPERM;
  		else if (args->value)
  			i915_gem_context_set_bannable(ctx);
@@ -106,7 +106,7 @@ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/
  		else
  			i915_gem_context_clear_bannable(ctx);
  		break;
-@@ -2614,10 +2652,12 @@ static int ctx_setparam(struct drm_i915_file_private *fpriv,
+@@ -2809,10 +2847,12 @@ static int ctx_setparam(struct drm_i915_file_private *fpriv,
  	case I915_CONTEXT_PARAM_RECOVERABLE:
  		if (args->size)
  			ret = -EINVAL;
@@ -122,7 +122,7 @@ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/
  		break;
  
  	case I915_CONTEXT_PARAM_PRIORITY:
-@@ -2664,6 +2704,9 @@ static int ctx_setparam(struct drm_i915_file_private *fpriv,
+@@ -2865,6 +2905,9 @@ static int ctx_setparam(struct drm_i915_file_private *fpriv,
  	case I915_CONTEXT_PARAM_DEBUG_FLAGS:
  		ret = set_debug_flags(ctx, args);
  		break;
@@ -132,7 +132,7 @@ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/
  
  	case I915_CONTEXT_PARAM_BAN_PERIOD:
  	default:
-@@ -3157,6 +3200,9 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
+@@ -3358,6 +3401,9 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
  	case I915_CONTEXT_PARAM_DEBUG_FLAGS:
  		ret = get_debug_flags(ctx, args);
  		break;
@@ -142,7 +142,7 @@ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/
  
  	case I915_CONTEXT_PARAM_BAN_PERIOD:
  	default:
-@@ -3281,6 +3327,11 @@ int i915_gem_context_reset_stats_ioctl(struct drm_device *dev,
+@@ -3482,6 +3528,11 @@ int i915_gem_context_reset_stats_ioctl(struct drm_device *dev,
  	args->batch_active = atomic_read(&ctx->guilty_count);
  	args->batch_pending = atomic_read(&ctx->active_count);
  
@@ -225,7 +225,7 @@ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i9
  	eb->gem_context = ctx;
  	if (rcu_access_pointer(ctx->vm))
  		eb->invalid_flags |= EXEC_OBJECT_NEEDS_GTT;
-@@ -3301,6 +3308,17 @@ eb_select_engine(struct i915_execbuffer *eb)
+@@ -3311,6 +3318,17 @@ eb_select_engine(struct i915_execbuffer *eb, unsigned int batch_number)
  
  	intel_gt_pm_get(ce->engine->gt);
  
@@ -348,7 +348,7 @@ diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
 diff --git a/include/uapi/drm/i915_drm_prelim.h b/include/uapi/drm/i915_drm_prelim.h
 --- a/include/uapi/drm/i915_drm_prelim.h
 +++ b/include/uapi/drm/i915_drm_prelim.h
-@@ -893,6 +893,26 @@ struct prelim_drm_i915_gem_context_param {
+@@ -1003,6 +1003,26 @@ struct prelim_drm_i915_gem_context_param {
  #define I915_CONTEXT_PARAM_ACC    0xd
  };
  
diff --git a/0001-INTEL_DII-drm-i915-pxp-start-the-arb-session-on-dema.patch b/0001-INTEL_DII-drm-i915-pxp-start-the-arb-session-on-dema.patch
index 5ee627b00811..4b4326057959 100644
--- a/0001-INTEL_DII-drm-i915-pxp-start-the-arb-session-on-dema.patch
+++ b/0001-INTEL_DII-drm-i915-pxp-start-the-arb-session-on-dema.patch
@@ -22,7 +22,7 @@ Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
 diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
 +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
-@@ -3309,9 +3309,11 @@ eb_select_engine(struct i915_execbuffer *eb)
+@@ -3319,9 +3319,11 @@ eb_select_engine(struct i915_execbuffer *eb, unsigned int batch_number)
  	intel_gt_pm_get(ce->engine->gt);
  
  	if (i915_gem_context_uses_protected_content(eb->gem_context)) {
diff --git a/0001-INTEL_DII-NOT_UPSTREAM-drm-i915-Introduce-set_parall.patch b/0001-INTEL_DII-NOT_UPSTREAM-drm-i915-Introduce-set_parall.patch
new file mode 100644
index 000000000000..415fbd930383
--- /dev/null
+++ b/0001-INTEL_DII-NOT_UPSTREAM-drm-i915-Introduce-set_parall.patch
@@ -0,0 +1,676 @@
+From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
+From: Matthew Brost <matthew.brost@intel.com>
+Date: Wed, 7 Jul 2021 16:55:03 -0700
+Subject: [PATCH] INTEL_DII/NOT_UPSTREAM: drm/i915: Introduce set_parallel2
+ extension
+
+Based on upstream feedback the set_parallel extension isn't suitable as
+it looks a bit too much like the bonding extension. Introduce a
+set_parallel2 extension which configures parallel submission in a single
+extension and in a single slot. This compares to old set_parallel
+extension which configured parallel submission across multiple slots.
+
+Also remove the ability for the user to pass in the number of BBs in
+the execbuf IOCTL. The number of BBs is now implied based on the width
+of the context in the slot.
+
+This patch is intended to enable UMDs for the upstream direction while
+maintaining the old set_parallel extension so as not to break UMDs. Once
+UMDs have been updated to use the new extension, the old one can be
+removed from DII.
+
+v2: Only enable parallel submission on engines set by user
+
+Signed-off-by: Matthew Brost <matthew.brost@intel.com>
+---
+ drivers/gpu/drm/i915/gem/i915_gem_context.c   | 190 +++++++++++++++++-
+ .../gpu/drm/i915/gem/i915_gem_context_types.h |   6 -
+ .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |  73 +++++--
+ drivers/gpu/drm/i915/gt/intel_context.c       |   2 +
+ drivers/gpu/drm/i915/gt/intel_context.h       |  11 +
+ drivers/gpu/drm/i915/gt/intel_context_types.h |   4 +
+ .../gpu/drm/i915/gt/uc/intel_guc_submission.c |   4 +-
+ drivers/gpu/drm/i915/i915_request.c           |   7 +-
+ include/uapi/drm/i915_drm_prelim.h            | 115 +++++++++++
+ 9 files changed, 376 insertions(+), 36 deletions(-)
+
+diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
+--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
++++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
+@@ -374,7 +374,6 @@ void i915_gem_context_release(struct kref *ref)
+ 	mutex_destroy(&ctx->engines_mutex);
+ 	mutex_destroy(&ctx->lut_mutex);
+ 	mutex_destroy(&ctx->mutex);
+-	mutex_destroy(&ctx->parallel_submit);
+ 
+ 	kfree_rcu(ctx, rcu);
+ }
+@@ -699,8 +698,6 @@ __create_context(struct drm_i915_private *i915)
+ 	mutex_init(&ctx->mutex);
+ 	INIT_LIST_HEAD(&ctx->link);
+ 
+-	mutex_init(&ctx->parallel_submit);
+-
+ 	spin_lock_init(&ctx->stale.lock);
+ 	INIT_LIST_HEAD(&ctx->stale.engines);
+ 
+@@ -1857,6 +1854,48 @@ static bool validate_parallel_engines_layout(const struct set_engines *set)
+ 	return true;
+ }
+ 
++/*
++ * Engine must be same class and form a logically contiguous mask.
++ *
++ * FIXME: Logical mask check not 100% correct but good enough for the PoC
++ */
++static bool __validate_parallel_engines_layout(struct drm_i915_private *i915,
++					       struct intel_context *parent)
++{
++	u8 engine_class = parent->engine->class;
++	u8 num_siblings = hweight_long(parent->engine->logical_mask);
++	struct intel_context *child;
++	intel_engine_mask_t logical_mask = parent->engine->logical_mask;
++
++	for_each_child(parent, child) {
++		if (child->engine->class != engine_class) {
++			drm_dbg(&i915->drm, "Class mismatch: %u, %u",
++				engine_class, child->engine->class);
++			return false;
++		}
++		if (hweight_long(child->engine->logical_mask) != num_siblings) {
++			drm_dbg(&i915->drm, "Sibling mismatch: %u, %lu",
++				num_siblings,
++				hweight_long(child->engine->logical_mask));
++			return false;
++		}
++		if (logical_mask & child->engine->logical_mask) {
++			drm_dbg(&i915->drm, "Overlapping logical mask: 0x%04x, 0x%04x",
++				logical_mask, child->engine->logical_mask);
++			return false;
++		}
++		logical_mask |= child->engine->logical_mask;
++	}
++
++	if (!is_power_of_2((logical_mask >> (ffs(logical_mask) - 1)) + 1)) {
++		drm_dbg(&i915->drm, "Non-contiguous logical mask: 0x%04x",
++			logical_mask);
++		return false;
++	}
++
++	return true;
++}
++
+ static int
+ set_engines__parallel_submit(struct i915_user_extension __user *base, void *data)
+ {
+@@ -2009,11 +2048,156 @@ set_engines__parallel_submit(struct i915_user_extension __user *base, void *data
+ 	return err;
+ }
+ 
++static int
++set_engines__parallel2_submit(struct i915_user_extension __user *base,
++			      void *data)
++{
++	struct prelim_drm_i915_context_engines_parallel2_submit __user *ext =
++		container_of_user(base, typeof(*ext), base);
++	const struct set_engines *set = data;
++	struct drm_i915_private *i915 = set->ctx->i915;
++	struct intel_context *parent, *child, *ce;
++	u64 flags;
++	int err = 0, n, i, j;
++	u16 slot, width, num_siblings;
++	struct intel_engine_cs **siblings = NULL;
++
++	if (!(intel_uc_uses_guc_submission(&i915->gt.uc)))
++		return -ENODEV;
++
++	if (get_user(slot, &ext->engine_index))
++		return -EFAULT;
++
++	if (get_user(width, &ext->width))
++		return -EFAULT;
++
++	if (get_user(num_siblings, &ext->num_siblings))
++		return -EFAULT;
++
++	if (slot >= set->engines->num_engines) {
++		drm_dbg(&i915->drm, "Invalid placement value, %d >= %d\n",
++			slot, set->engines->num_engines);
++		return -EINVAL;
++	}
++
++	parent = set->engines->engines[slot];
++	if (parent) {
++		drm_dbg(&i915->drm, "Context index[%d] not NULL\n", slot);
++		return -EINVAL;
++	}
++
++	if (get_user(flags, &ext->flags))
++		return -EFAULT;
++
++	if (flags) {
++		drm_dbg(&i915->drm, "Unknown flags 0x%02llx", flags);
++		return -EINVAL;
++	}
++
++	for (n = 0; n < ARRAY_SIZE(ext->mbz64); n++) {
++		err = check_user_mbz(&ext->mbz64[n]);
++		if (err)
++			return err;
++	}
++
++	if (width < 1) {
++		drm_dbg(&i915->drm, "Width (%d) < 1\n", width);
++		return -EINVAL;
++	}
++
++	if (num_siblings < 1) {
++		drm_dbg(&i915->drm, "Number of siblings (%d) < 1\n",
++			num_siblings);
++		return -EINVAL;
++	}
++
++	siblings = kmalloc_array(num_siblings,
++				 sizeof(*siblings),
++				 GFP_KERNEL);
++	if (!siblings)
++		return -ENOMEM;
++
++	mutex_lock(&set->ctx->mutex);
++
++	/* Create contexts / engines */
++	for (i = 0; i < width; ++i) {
++		for (j = 0; j < num_siblings; ++j) {
++			struct i915_engine_class_instance ci;
++
++			if (copy_from_user(&ci, &ext->engines[i * num_siblings + j],
++					   sizeof(ci))) {
++				err = -EFAULT;
++				goto out_err;
++			}
++
++			siblings[j] = intel_engine_lookup_user(i915,
++							       ci.engine_class,
++							       ci.engine_instance);
++			if (!siblings[j]) {
++				drm_dbg(&i915->drm,
++					"Invalid sibling[%d]: { class:%d, inst:%d }\n",
++					i * num_siblings + j, ci.engine_class, ci.engine_instance);
++				err = -EINVAL;
++				goto out_err;
++			}
++		}
++
++		ce = intel_engine_create_virtual(siblings, num_siblings,
++						 FORCE_VIRTUAL);
++		if (IS_ERR(ce)) {
++			err = PTR_ERR(ce);
++			goto out_err;
++		}
++		intel_context_set_gem(ce, set->ctx);
++
++		if (i == 0) {
++			parent = ce;
++			__set_bit(CONTEXT_NO_PREEMPT_MID_BATCH, &ce->flags);
++		} else {
++			intel_context_bind_parent_child(parent, ce);
++			err = intel_context_alloc_state(ce);
++			if (err)
++				goto out_err;
++		}
++	}
++
++	if (!__validate_parallel_engines_layout(i915, parent)) {
++		drm_dbg(&i915->drm, "Invalid parallel context layout");
++		err = -EINVAL;
++		goto out_err;
++	}
++
++	intel_guc_configure_parent_context(parent);
++	if (cmpxchg(&set->engines->engines[slot], NULL, parent)) {
++		err = -EEXIST;
++		goto out_err;
++	}
++
++	kfree(siblings);
++	mutex_unlock(&set->ctx->mutex);
++
++	return 0;
++
++out_err:
++	if (parent) {
++		for_each_child(parent, child)
++			intel_context_put(child);
++		intel_context_put(parent);
++		set->engines->engines[slot] = NULL;
++	}
++	kfree(siblings);
++	mutex_unlock(&set->ctx->mutex);
++
++	return err;
++}
++
+ static const i915_user_extension_fn set_engines__extensions[] = {
+ 	[I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE] = set_engines__load_balance,
+ 	[I915_CONTEXT_ENGINES_EXT_BOND] = set_engines__bond,
+ 	[PRELIM_I915_USER_EXT_MASK(PRELIM_I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT)] =
+ 		set_engines__parallel_submit,
++	[PRELIM_I915_USER_EXT_MASK(PRELIM_I915_CONTEXT_ENGINES_EXT_PARALLEL2_SUBMIT)] =
++		set_engines__parallel2_submit,
+ };
+ 
+ static int
+diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
+--- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
++++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
+@@ -194,12 +194,6 @@ struct i915_gem_context {
+ 	 */
+ 	u64 fence_context;
+ 
+-	/**
+-	 * @parallel_submit: Ensure only 1 parallel submission is happening on
+-	 * this context at a time.
+-	 */
+-	struct mutex parallel_submit;
+-
+ 	/**
+ 	 * @seqno: Seqno when using when a parallel context.
+ 	 */
+diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
++++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+@@ -1633,7 +1633,7 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
+ 		goto err_unmap;
+ 
+ 	if (engine == eb->context->engine &&
+-	    !i915_gem_context_is_parallel(eb->gem_context)) {
++	    !intel_context_is_parallel(eb->context)) {
+ 		rq = i915_request_create(eb->context);
+ 	} else {
+ 		struct intel_context *ce = eb->reloc_context;
+@@ -1727,7 +1727,7 @@ static u32 *reloc_gpu(struct i915_execbuffer *eb,
+ 		struct intel_engine_cs *engine = eb->engine;
+ 
+ 		if (!reloc_can_use_engine(engine) ||
+-		    i915_gem_context_is_parallel(eb->gem_context)) {
++		    intel_context_is_parallel(eb->context)) {
+ 			engine = engine->gt->engine_class[COPY_ENGINE_CLASS][0];
+ 			if (!engine)
+ 				return ERR_PTR(-ENODEV);
+@@ -3223,7 +3223,7 @@ eb_select_legacy_ring(struct i915_execbuffer *eb)
+ }
+ 
+ static int
+-eb_select_engine(struct i915_execbuffer *eb)
++eb_select_engine(struct i915_execbuffer *eb, unsigned int batch_number)
+ {
+ 	struct intel_context *ce;
+ 	unsigned int idx;
+@@ -3238,6 +3238,16 @@ eb_select_engine(struct i915_execbuffer *eb)
+ 	if (IS_ERR(ce))
+ 		return PTR_ERR(ce);
+ 
++	if (batch_number > 0 &&
++	    !i915_gem_context_is_parallel(eb->gem_context)) {
++		struct intel_context *parent = ce;
++		for_each_child(parent, ce)
++			if (!--batch_number)
++				break;
++		intel_context_put(parent);
++		intel_context_get(ce);
++	}
++
+ 	intel_gt_pm_get(ce->engine->gt);
+ 
+ 	if (!test_bit(CONTEXT_ALLOC_BIT, &ce->flags)) {
+@@ -3562,7 +3572,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
+ 	if (unlikely(err))
+ 		goto err_destroy;
+ 
+-	err = eb_select_engine(&eb);
++	err = eb_select_engine(&eb, batch_number);
+ 	if (unlikely(err))
+ 		goto err_context;
+ 
+@@ -3751,6 +3761,8 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
+ 	const size_t count = args->buffer_count;
+ 	unsigned int num_batches, i;
+ 	int err, start_context;
++	bool is_parallel = false;
++	struct intel_context *parent = NULL;
+ 
+ 	if (!check_buffer_count(count)) {
+ 		drm_dbg(&i915->drm, "execbuf2 with %zd buffers\n", count);
+@@ -3782,15 +3794,35 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
+ 			I915_EXEC_NUMBER_BB_LSB) +
+ 		       ((args->flags & PRELIM_I915_EXEC_NUMBER_BB_MASK) >>
+ 			PRELIM_I915_EXEC_NUMBER_BB_LSB)) + 1;
+-	if (i915_gem_context_is_parallel(ctx)) {
+-		if (num_batches > count ||
+-		    start_context + num_batches > ctx->width) {
+-			err = -EINVAL;
+-			goto err_context;
++
++	if (i915_gem_context_user_engines(ctx)) {
++		parent = i915_gem_context_get_engine(ctx, start_context);
++		if (IS_ERR(parent)) {
++			i915_gem_context_put(ctx);
++			return PTR_ERR(parent);
+ 		}
+ 
+-		if (i915_gem_context_is_bb_preempt_boundary(ctx) &&
+-		    (start_context || num_batches != ctx->width)) {
++		is_parallel = i915_gem_context_is_parallel(ctx) ||
++			intel_context_is_parallel(parent);
++		if (i915_gem_context_is_parallel(ctx)) {
++			if (num_batches > count ||
++			    start_context + num_batches > ctx->width) {
++				err = -EINVAL;
++				goto err_context;
++			}
++
++			if (i915_gem_context_is_bb_preempt_boundary(ctx) &&
++			    (start_context || num_batches != ctx->width)) {
++				err = -EINVAL;
++				goto err_context;
++			}
++		} else if (intel_context_is_parallel(parent)) {
++			if (num_batches != 1)
++				return -EINVAL;
++			num_batches = parent->guc_number_children + 1;
++			if (num_batches > count)
++				return -EINVAL;
++		} else if (num_batches > 1) {
+ 			err = -EINVAL;
+ 			goto err_context;
+ 		}
+@@ -3827,8 +3859,7 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
+ 	 * properly, also this is needed to create an excl fence for an dma buf
+ 	 * objects these BBs touch.
+ 	 */
+-	if (args->flags & I915_EXEC_FENCE_OUT ||
+-	    i915_gem_context_is_parallel(ctx)) {
++	if (args->flags & I915_EXEC_FENCE_OUT || is_parallel) {
+ 		out_fences = kcalloc(num_batches, sizeof(*out_fences),
+ 				     GFP_KERNEL);
+ 		if (!out_fences) {
+@@ -3874,8 +3905,8 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
+ 	 * in intel_context sequence, thus only 1 submission can happen at a
+ 	 * time.
+ 	 */
+-	if (i915_gem_context_is_parallel(ctx))
+-		mutex_lock(&ctx->parallel_submit);
++	if (is_parallel)
++		mutex_lock(&parent->parallel_submit);
+ 
+ 	err = i915_gem_do_execbuffer(dev, file, args, exec2_list,
+ 				     args->flags & I915_EXEC_BATCH_FIRST ?
+@@ -3889,8 +3920,10 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
+ 				     &ww);
+ 
+ 	for (i = 1; err == 0 && i < num_batches; i++) {
+-		args->flags &= ~I915_EXEC_RING_MASK;
+-		args->flags |= start_context + i;
++		if (i915_gem_context_is_parallel(ctx)) {
++			args->flags &= ~I915_EXEC_RING_MASK;
++			args->flags |= start_context + i;
++		}
+ 		args->batch_len = 0;
+ 
+ 		err = i915_gem_do_execbuffer(dev, file, args, exec2_list,
+@@ -3905,8 +3938,8 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
+ 					     &ww);
+ 	}
+ 
+-	if (i915_gem_context_is_parallel(ctx))
+-		mutex_unlock(&ctx->parallel_submit);
++	if (is_parallel)
++		mutex_unlock(&parent->parallel_submit);
+ 
+ 	/*
+ 	 * Now that we have begun execution of the batchbuffer, we ignore
+@@ -4009,6 +4042,8 @@ end:;
+ 	dma_fence_put(in_fence);
+ err_context:
+ 	i915_gem_context_put(ctx);
++	if (parent)
++		intel_context_put(parent);
+ 
+ 	return err;
+ }
+diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
+--- a/drivers/gpu/drm/i915/gt/intel_context.c
++++ b/drivers/gpu/drm/i915/gt/intel_context.c
+@@ -460,6 +460,7 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
+ 	INIT_LIST_HEAD(&ce->signals);
+ 
+ 	mutex_init(&ce->pin_mutex);
++	mutex_init(&ce->parallel_submit);
+ 
+ 	spin_lock_init(&ce->guc_state.lock);
+ 	INIT_LIST_HEAD(&ce->guc_state.fences);
+@@ -491,6 +492,7 @@ void intel_context_fini(struct intel_context *ce)
+ 			intel_context_put(child);
+ 
+ 	mutex_destroy(&ce->pin_mutex);
++	mutex_destroy(&ce->parallel_submit);
+ 	i915_active_fini(&ce->active);
+ }
+ 
+diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
+--- a/drivers/gpu/drm/i915/gt/intel_context.h
++++ b/drivers/gpu/drm/i915/gt/intel_context.h
+@@ -52,6 +52,11 @@ static inline bool intel_context_is_parent(struct intel_context *ce)
+ 	return !!ce->guc_number_children;
+ }
+ 
++static inline bool intel_context_is_parallel(struct intel_context *ce)
++{
++	return intel_context_is_child(ce) || intel_context_is_parent(ce);
++}
++
+ /* Only should be called directly by selftests */
+ void __intel_context_bind_parent_child(struct intel_context *parent,
+ 				       struct intel_context *child);
+@@ -204,6 +209,12 @@ static inline bool intel_context_is_barrier(const struct intel_context *ce)
+ 	return test_bit(CONTEXT_BARRIER_BIT, &ce->flags);
+ }
+ 
++static inline bool
++intel_context_is_no_preempt_mid_batch(const struct intel_context *ce)
++{
++	return test_bit(CONTEXT_NO_PREEMPT_MID_BATCH, &ce->flags);
++}
++
+ static inline bool intel_context_is_closed(const struct intel_context *ce)
+ {
+ 	return test_bit(CONTEXT_CLOSED_BIT, &ce->flags);
+diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
+--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
++++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
+@@ -114,6 +114,7 @@ struct intel_context {
+ #define CONTEXT_FORCE_SINGLE_SUBMISSION	7
+ #define CONTEXT_NOPREEMPT		8
+ #define CONTEXT_LRCA_DIRTY		9
++#define CONTEXT_NO_PREEMPT_MID_BATCH	10
+ 
+ 	struct {
+ 		u64 timeout_us;
+@@ -239,6 +240,9 @@ struct intel_context {
+ 
+ 	/* Last request submitted on a parent */
+ 	struct i915_request *last_rq;
++
++	/* Parallel submit mutex */
++	struct mutex parallel_submit;
+ };
+ 
+ #endif /* __INTEL_CONTEXT_TYPES__ */
+diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
++++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+@@ -798,8 +798,7 @@ static inline int rq_prio(const struct i915_request *rq)
+ 
+ static inline bool is_multi_lrc(struct intel_context *ce)
+ {
+-	return intel_context_is_child(ce) ||
+-		intel_context_is_parent(ce);
++	return intel_context_is_parallel(ce);
+ }
+ 
+ static inline bool is_multi_lrc_rq(struct i915_request *rq)
+@@ -3458,6 +3457,7 @@ void intel_guc_configure_parent_context(struct intel_context *ce)
+ 		bb_preempt_boundary =
+ 			i915_gem_context_is_bb_preempt_boundary(ctx);
+ 	rcu_read_unlock();
++	bb_preempt_boundary |= intel_context_is_no_preempt_mid_batch(ce);
+ 	if (bb_preempt_boundary) {
+ 		ce->emit_bb_start = emit_bb_start_parent_bb_preempt_boundary;
+ 		ce->emit_fini_breadcrumb =
+diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
+--- a/drivers/gpu/drm/i915/i915_request.c
++++ b/drivers/gpu/drm/i915/i915_request.c
+@@ -1606,14 +1606,9 @@ i915_request_await_object(struct i915_request *to,
+ 	return ret;
+ }
+ 
+-static inline bool is_parallel(struct intel_context *ce)
+-{
+-	return intel_context_is_child(ce) || intel_context_is_parent(ce);
+-}
+-
+ static inline bool is_parallel_rq(struct i915_request *rq)
+ {
+-	return is_parallel(rq->context);
++	return intel_context_is_parallel(rq->context);
+ }
+ 
+ static inline struct intel_context *request_to_parent(struct i915_request *rq)
+diff --git a/include/uapi/drm/i915_drm_prelim.h b/include/uapi/drm/i915_drm_prelim.h
+--- a/include/uapi/drm/i915_drm_prelim.h
++++ b/include/uapi/drm/i915_drm_prelim.h
+@@ -370,9 +370,124 @@ struct prelim_i915_context_engines_parallel_submit {
+ } __attribute__ ((packed));
+ #define i915_context_engines_parallel_submit prelim_i915_context_engines_parallel_submit
+ 
++/**
++ * struct prelim_drm_i915_context_engines_parallel2_submit - Configure engine
++ * for parallel submission.
++ *
++ * Set up a slot in the context engine map to allow multiple BBs to be submitted
++ * in a single execbuf IOCTL. Those BBs will then be scheduled to run on the GPU
++ * in parallel. Multiple hardware contexts are created internally in the i915 to
++ * run these BBs. Once a slot is configured for N BBs, only N BBs can be
++ * submitted in each execbuf IOCTL; this is implicit behavior, e.g. the user
++ * doesn't tell the execbuf IOCTL there are N BBs, the execbuf IOCTL knows how
++ * many BBs there are based on the slot's configuration. The N BBs are the last
++ * N buffer objects or first N if I915_EXEC_BATCH_FIRST is set.
++ *
++ * The default placement behavior is to create implicit bonds between each
++ * context if each context maps to more than 1 physical engine (e.g. context is
++ * a virtual engine). Also, we only allow contexts of the same engine class,
++ * and these contexts must be in logically contiguous order. Examples of the
++ * placement behavior are described below. Lastly, the default is to not allow
++ * BBs to be preempted mid-BB; rather, coordinated preemption is inserted on
++ * all hardware contexts between each set of BBs. Flags may be added in the
++ * future to change both of these default behaviors.
++ *
++ * Returns -EINVAL if hardware context placement configuration is invalid or if
++ * the placement configuration isn't supported on the platform / submission
++ * interface.
++ * Returns -ENODEV if the extension isn't supported on the platform /
++ * submission interface.
++ *
++ * .. code-block::
++ *
++ *	Example 1 pseudo code:
++ *	CS[X] = generic engine of same class, logical instance X
++ *	INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
++ *	set_engines(INVALID)
++ *	set_parallel(engine_index=0, width=2, num_siblings=1,
++ *		     engines=CS[0],CS[1])
++ *
++ *	Results in the following valid placement:
++ *	CS[0], CS[1]
++ *
++ *	Example 2 pseudo code:
++ *	CS[X] = generic engine of same class, logical instance X
++ *	INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
++ *	set_engines(INVALID)
++ *	set_parallel(engine_index=0, width=2, num_siblings=2,
++ *		     engines=CS[0],CS[2],CS[1],CS[3])
++ *
++ *	Results in the following valid placements:
++ *	CS[0], CS[1]
++ *	CS[2], CS[3]
++ *
++ *	This can also be thought of as 2 virtual engines described by a 2-D
++ *	array in the engines field, with bonds placed between each index of the
++ *	virtual engines, e.g. CS[0] is bonded to CS[1] and CS[2] is bonded to
++ *	CS[3].
++ *	VE[0] = CS[0], CS[2]
++ *	VE[1] = CS[1], CS[3]
++ *
++ *	Example 3 pseudo code:
++ *	CS[X] = generic engine of same class, logical instance X
++ *	INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
++ *	set_engines(INVALID)
++ *	set_parallel(engine_index=0, width=2, num_siblings=2,
++ *		     engines=CS[0],CS[1],CS[1],CS[3])
++ *
++ *	Results in the following valid and invalid placements:
++ *	CS[0], CS[1]
++ *	CS[1], CS[3] - Not logically contiguous, returns -EINVAL
++ */
++struct prelim_drm_i915_context_engines_parallel2_submit {
++	/**
++	 * @base: base user extension.
++	 */
++	struct i915_user_extension base;
++
++	/**
++	 * @engine_index: slot for parallel engine
++	 */
++	__u16 engine_index;
++
++	/**
++	 * @width: number of contexts per parallel engine
++	 */
++	__u16 width;
++
++	/**
++	 * @num_siblings: number of siblings per context
++	 */
++	__u16 num_siblings;
++
++	/**
++	 * @mbz16: reserved for future use; must be zero
++	 */
++	__u16 mbz16;
++
++	/**
++	 * @flags: all undefined flags must be zero; currently no flags are defined
++	 */
++	__u64 flags;
++
++	/**
++	 * @mbz64: reserved for future use; must be zero
++	 */
++	__u64 mbz64[3];
++
++	/**
++	 * @engines: 2-d array of engine instances to configure parallel engine
++	 *
++	 * length = width (i) * num_siblings (j)
++	 * index = j + i * num_siblings
++	 */
++	struct i915_engine_class_instance engines[0];
++} __attribute__ ((packed));
++
+ struct prelim_i915_context_param_engines {
+ #define PRELIM_I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT (PRELIM_I915_USER_EXT | 2) /* see prelim_i915_context_engines_parallel_submit */
+ #define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2 /* see i915_context_engines_parallel_submit */
++#define PRELIM_I915_CONTEXT_ENGINES_EXT_PARALLEL2_SUBMIT (PRELIM_I915_USER_EXT | 3) /* see prelim_i915_context_engines_parallel2_submit */
+ };
+ 
+ enum prelim_drm_i915_gem_memory_class {
--
git-pile 0.97


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 1/2] INTEL_DII/NOT_UPSTREAM: drm/i915: Introduce set_parallel2 extension
  2021-07-08  0:30   ` [Intel-gfx] " Matthew Brost
  (?)
@ 2021-07-08  0:30   ` Matthew Brost
  -1 siblings, 0 replies; 54+ messages in thread
From: Matthew Brost @ 2021-07-08  0:30 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: john.c.harrison, Michal.Wajdeczko

Based on upstream feedback, the set_parallel extension isn't suitable as
it looks too much like the bonding extension. Introduce a set_parallel2
extension which configures parallel submission in a single extension and
in a single slot. This compares to the old set_parallel extension, which
configured parallel submission across multiple slots.

Also remove the ability for the user to pass in the number of BBs in
the execbuf IOCTL. The number of BBs is now implied based on the width
of the context in the slot.

This patch is intended to enable UMDs for the upstream direction while
maintaining the old set_parallel extension so as not to break existing
UMDs. Once UMDs have been updated to use the new extension, the old one
can be removed from DII.

v2: Only enable parallel submission on engines set by user

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c   | 190 +++++++++++++++++-
 .../gpu/drm/i915/gem/i915_gem_context_types.h |   6 -
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |  73 +++++--
 drivers/gpu/drm/i915/gt/intel_context.c       |   2 +
 drivers/gpu/drm/i915/gt/intel_context.h       |  11 +
 drivers/gpu/drm/i915/gt/intel_context_types.h |   4 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |   4 +-
 drivers/gpu/drm/i915/i915_request.c           |   7 +-
 include/uapi/drm/i915_drm_prelim.h            | 115 +++++++++++
 9 files changed, 376 insertions(+), 36 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 2e3c0bf09eff..c8da87dde1fa 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -374,7 +374,6 @@ void i915_gem_context_release(struct kref *ref)
 	mutex_destroy(&ctx->engines_mutex);
 	mutex_destroy(&ctx->lut_mutex);
 	mutex_destroy(&ctx->mutex);
-	mutex_destroy(&ctx->parallel_submit);
 
 	kfree_rcu(ctx, rcu);
 }
@@ -699,8 +698,6 @@ __create_context(struct drm_i915_private *i915)
 	mutex_init(&ctx->mutex);
 	INIT_LIST_HEAD(&ctx->link);
 
-	mutex_init(&ctx->parallel_submit);
-
 	spin_lock_init(&ctx->stale.lock);
 	INIT_LIST_HEAD(&ctx->stale.engines);
 
@@ -1857,6 +1854,48 @@ static bool validate_parallel_engines_layout(const struct set_engines *set)
 	return true;
 }
 
+/*
+ * Engines must be of the same class and form a logically contiguous mask.
+ *
+ * FIXME: Logical mask check not 100% correct but good enough for the PoC
+ */
+static bool __validate_parallel_engines_layout(struct drm_i915_private *i915,
+					       struct intel_context *parent)
+{
+	u8 engine_class = parent->engine->class;
+	u8 num_siblings = hweight_long(parent->engine->logical_mask);
+	struct intel_context *child;
+	intel_engine_mask_t logical_mask = parent->engine->logical_mask;
+
+	for_each_child(parent, child) {
+		if (child->engine->class != engine_class) {
+			drm_dbg(&i915->drm, "Class mismatch: %u, %u",
+				engine_class, child->engine->class);
+			return false;
+		}
+		if (hweight_long(child->engine->logical_mask) != num_siblings) {
+			drm_dbg(&i915->drm, "Sibling mismatch: %u, %lu",
+				num_siblings,
+				hweight_long(child->engine->logical_mask));
+			return false;
+		}
+		if (logical_mask & child->engine->logical_mask) {
+			drm_dbg(&i915->drm, "Overlapping logical mask: 0x%04x, 0x%04x",
+				logical_mask, child->engine->logical_mask);
+			return false;
+		}
+		logical_mask |= child->engine->logical_mask;
+	}
+
+	if (!is_power_of_2((logical_mask >> (ffs(logical_mask) - 1)) + 1)) {
+		drm_dbg(&i915->drm, "Non-contiguous logical mask: 0x%04x",
+			logical_mask);
+		return false;
+	}
+
+	return true;
+}
+
 static int
 set_engines__parallel_submit(struct i915_user_extension __user *base, void *data)
 {
@@ -2009,11 +2048,156 @@ set_engines__parallel_submit(struct i915_user_extension __user *base, void *data
 	return err;
 }
 
+static int
+set_engines__parallel2_submit(struct i915_user_extension __user *base,
+			      void *data)
+{
+	struct prelim_drm_i915_context_engines_parallel2_submit __user *ext =
+		container_of_user(base, typeof(*ext), base);
+	const struct set_engines *set = data;
+	struct drm_i915_private *i915 = set->ctx->i915;
+	struct intel_context *parent, *child, *ce;
+	u64 flags;
+	int err = 0, n, i, j;
+	u16 slot, width, num_siblings;
+	struct intel_engine_cs **siblings = NULL;
+
+	if (!(intel_uc_uses_guc_submission(&i915->gt.uc)))
+		return -ENODEV;
+
+	if (get_user(slot, &ext->engine_index))
+		return -EFAULT;
+
+	if (get_user(width, &ext->width))
+		return -EFAULT;
+
+	if (get_user(num_siblings, &ext->num_siblings))
+		return -EFAULT;
+
+	if (slot >= set->engines->num_engines) {
+		drm_dbg(&i915->drm, "Invalid placement value, %d >= %d\n",
+			slot, set->engines->num_engines);
+		return -EINVAL;
+	}
+
+	parent = set->engines->engines[slot];
+	if (parent) {
+		drm_dbg(&i915->drm, "Context index[%d] not NULL\n", slot);
+		return -EINVAL;
+	}
+
+	if (get_user(flags, &ext->flags))
+		return -EFAULT;
+
+	if (flags) {
+		drm_dbg(&i915->drm, "Unknown flags 0x%02llx", flags);
+		return -EINVAL;
+	}
+
+	for (n = 0; n < ARRAY_SIZE(ext->mbz64); n++) {
+		err = check_user_mbz(&ext->mbz64[n]);
+		if (err)
+			return err;
+	}
+
+	if (width < 1) {
+		drm_dbg(&i915->drm, "Width (%d) < 1\n", width);
+		return -EINVAL;
+	}
+
+	if (num_siblings < 1) {
+		drm_dbg(&i915->drm, "Number of siblings (%d) < 1\n",
+			num_siblings);
+		return -EINVAL;
+	}
+
+	siblings = kmalloc_array(num_siblings,
+				 sizeof(*siblings),
+				 GFP_KERNEL);
+	if (!siblings)
+		return -ENOMEM;
+
+	mutex_lock(&set->ctx->mutex);
+
+	/* Create contexts / engines */
+	for (i = 0; i < width; ++i) {
+		for (j = 0; j < num_siblings; ++j) {
+			struct i915_engine_class_instance ci;
+
+			if (copy_from_user(&ci, &ext->engines[i * num_siblings + j],
+					   sizeof(ci))) {
+				err = -EFAULT;
+				goto out_err;
+			}
+
+			siblings[j] = intel_engine_lookup_user(i915,
+							       ci.engine_class,
+							       ci.engine_instance);
+			if (!siblings[j]) {
+				drm_dbg(&i915->drm,
+					"Invalid sibling[%d]: { class:%d, inst:%d }\n",
+					i * num_siblings + j, ci.engine_class, ci.engine_instance);
+				err = -EINVAL;
+				goto out_err;
+			}
+		}
+
+		ce = intel_engine_create_virtual(siblings, num_siblings,
+						 FORCE_VIRTUAL);
+		if (IS_ERR(ce)) {
+			err = PTR_ERR(ce);
+			goto out_err;
+		}
+		intel_context_set_gem(ce, set->ctx);
+
+		if (i == 0) {
+			parent = ce;
+			__set_bit(CONTEXT_NO_PREEMPT_MID_BATCH, &ce->flags);
+		} else {
+			intel_context_bind_parent_child(parent, ce);
+			err = intel_context_alloc_state(ce);
+			if (err)
+				goto out_err;
+		}
+	}
+
+	if (!__validate_parallel_engines_layout(i915, parent)) {
+		drm_dbg(&i915->drm, "Invalid parallel context layout");
+		err = -EINVAL;
+		goto out_err;
+	}
+
+	intel_guc_configure_parent_context(parent);
+	if (cmpxchg(&set->engines->engines[slot], NULL, parent)) {
+		err = -EEXIST;
+		goto out_err;
+	}
+
+	kfree(siblings);
+	mutex_unlock(&set->ctx->mutex);
+
+	return 0;
+
+out_err:
+	if (parent) {
+		for_each_child(parent, child)
+			intel_context_put(child);
+		intel_context_put(parent);
+		set->engines->engines[slot] = NULL;
+	}
+	kfree(siblings);
+	mutex_unlock(&set->ctx->mutex);
+
+	return err;
+}
+
 static const i915_user_extension_fn set_engines__extensions[] = {
 	[I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE] = set_engines__load_balance,
 	[I915_CONTEXT_ENGINES_EXT_BOND] = set_engines__bond,
 	[PRELIM_I915_USER_EXT_MASK(PRELIM_I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT)] =
 		set_engines__parallel_submit,
+	[PRELIM_I915_USER_EXT_MASK(PRELIM_I915_CONTEXT_ENGINES_EXT_PARALLEL2_SUBMIT)] =
+		set_engines__parallel2_submit,
 };
 
 static int
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
index 4f5785c7528b..0c79000e95fe 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
@@ -194,12 +194,6 @@ struct i915_gem_context {
 	 */
 	u64 fence_context;
 
-	/**
-	 * @parallel_submit: Ensure only 1 parallel submission is happening on
-	 * this context at a time.
-	 */
-	struct mutex parallel_submit;
-
 	/**
 	 * @seqno: Seqno when using when a parallel context.
 	 */
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index d3f70899e7ab..2253336e9791 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -1633,7 +1633,7 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
 		goto err_unmap;
 
 	if (engine == eb->context->engine &&
-	    !i915_gem_context_is_parallel(eb->gem_context)) {
+	    !intel_context_is_parallel(eb->context)) {
 		rq = i915_request_create(eb->context);
 	} else {
 		struct intel_context *ce = eb->reloc_context;
@@ -1727,7 +1727,7 @@ static u32 *reloc_gpu(struct i915_execbuffer *eb,
 		struct intel_engine_cs *engine = eb->engine;
 
 		if (!reloc_can_use_engine(engine) ||
-		    i915_gem_context_is_parallel(eb->gem_context)) {
+		    intel_context_is_parallel(eb->context)) {
 			engine = engine->gt->engine_class[COPY_ENGINE_CLASS][0];
 			if (!engine)
 				return ERR_PTR(-ENODEV);
@@ -3223,7 +3223,7 @@ eb_select_legacy_ring(struct i915_execbuffer *eb)
 }
 
 static int
-eb_select_engine(struct i915_execbuffer *eb)
+eb_select_engine(struct i915_execbuffer *eb, unsigned int batch_number)
 {
 	struct intel_context *ce;
 	unsigned int idx;
@@ -3238,6 +3238,16 @@ eb_select_engine(struct i915_execbuffer *eb)
 	if (IS_ERR(ce))
 		return PTR_ERR(ce);
 
+	if (batch_number > 0 &&
+	    !i915_gem_context_is_parallel(eb->gem_context)) {
+		struct intel_context *parent = ce;
+		for_each_child(parent, ce)
+			if (!--batch_number)
+				break;
+		intel_context_put(parent);
+		intel_context_get(ce);
+	}
+
 	intel_gt_pm_get(ce->engine->gt);
 
 	if (!test_bit(CONTEXT_ALLOC_BIT, &ce->flags)) {
@@ -3562,7 +3572,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	if (unlikely(err))
 		goto err_destroy;
 
-	err = eb_select_engine(&eb);
+	err = eb_select_engine(&eb, batch_number);
 	if (unlikely(err))
 		goto err_context;
 
@@ -3751,6 +3761,8 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
 	const size_t count = args->buffer_count;
 	unsigned int num_batches, i;
 	int err, start_context;
+	bool is_parallel = false;
+	struct intel_context *parent = NULL;
 
 	if (!check_buffer_count(count)) {
 		drm_dbg(&i915->drm, "execbuf2 with %zd buffers\n", count);
@@ -3782,15 +3794,35 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
 			I915_EXEC_NUMBER_BB_LSB) +
 		       ((args->flags & PRELIM_I915_EXEC_NUMBER_BB_MASK) >>
 			PRELIM_I915_EXEC_NUMBER_BB_LSB)) + 1;
-	if (i915_gem_context_is_parallel(ctx)) {
-		if (num_batches > count ||
-		    start_context + num_batches > ctx->width) {
-			err = -EINVAL;
-			goto err_context;
+
+	if (i915_gem_context_user_engines(ctx)) {
+		parent = i915_gem_context_get_engine(ctx, start_context);
+		if (IS_ERR(parent)) {
+			i915_gem_context_put(ctx);
+			return PTR_ERR(parent);
 		}
 
-		if (i915_gem_context_is_bb_preempt_boundary(ctx) &&
-		    (start_context || num_batches != ctx->width)) {
+		is_parallel = i915_gem_context_is_parallel(ctx) ||
+			intel_context_is_parallel(parent);
+		if (i915_gem_context_is_parallel(ctx)) {
+			if (num_batches > count ||
+			    start_context + num_batches > ctx->width) {
+				err = -EINVAL;
+				goto err_context;
+			}
+
+			if (i915_gem_context_is_bb_preempt_boundary(ctx) &&
+			    (start_context || num_batches != ctx->width)) {
+				err = -EINVAL;
+				goto err_context;
+			}
+		} else if (intel_context_is_parallel(parent)) {
+			if (num_batches != 1)
+				return -EINVAL;
+			num_batches = parent->guc_number_children + 1;
+			if (num_batches > count)
+				return -EINVAL;
+		} else if (num_batches > 1) {
 			err = -EINVAL;
 			goto err_context;
 		}
@@ -3827,8 +3859,7 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
 	 * properly, also this is needed to create an excl fence for an dma buf
 	 * objects these BBs touch.
 	 */
-	if (args->flags & I915_EXEC_FENCE_OUT ||
-	    i915_gem_context_is_parallel(ctx)) {
+	if (args->flags & I915_EXEC_FENCE_OUT || is_parallel) {
 		out_fences = kcalloc(num_batches, sizeof(*out_fences),
 				     GFP_KERNEL);
 		if (!out_fences) {
@@ -3874,8 +3905,8 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
 	 * in intel_context sequence, thus only 1 submission can happen at a
 	 * time.
 	 */
-	if (i915_gem_context_is_parallel(ctx))
-		mutex_lock(&ctx->parallel_submit);
+	if (is_parallel)
+		mutex_lock(&parent->parallel_submit);
 
 	err = i915_gem_do_execbuffer(dev, file, args, exec2_list,
 				     args->flags & I915_EXEC_BATCH_FIRST ?
@@ -3889,8 +3920,10 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
 				     &ww);
 
 	for (i = 1; err == 0 && i < num_batches; i++) {
-		args->flags &= ~I915_EXEC_RING_MASK;
-		args->flags |= start_context + i;
+		if (i915_gem_context_is_parallel(ctx)) {
+			args->flags &= ~I915_EXEC_RING_MASK;
+			args->flags |= start_context + i;
+		}
 		args->batch_len = 0;
 
 		err = i915_gem_do_execbuffer(dev, file, args, exec2_list,
@@ -3905,8 +3938,8 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
 					     &ww);
 	}
 
-	if (i915_gem_context_is_parallel(ctx))
-		mutex_unlock(&ctx->parallel_submit);
+	if (is_parallel)
+		mutex_unlock(&parent->parallel_submit);
 
 	/*
 	 * Now that we have begun execution of the batchbuffer, we ignore
@@ -4009,6 +4042,8 @@ end:;
 	dma_fence_put(in_fence);
 err_context:
 	i915_gem_context_put(ctx);
+	if (parent)
+		intel_context_put(parent);
 
 	return err;
 }
diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index 84ce1f4d3051..2e45c128ee0b 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -460,6 +460,7 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
 	INIT_LIST_HEAD(&ce->signals);
 
 	mutex_init(&ce->pin_mutex);
+	mutex_init(&ce->parallel_submit);
 
 	spin_lock_init(&ce->guc_state.lock);
 	INIT_LIST_HEAD(&ce->guc_state.fences);
@@ -491,6 +492,7 @@ void intel_context_fini(struct intel_context *ce)
 			intel_context_put(child);
 
 	mutex_destroy(&ce->pin_mutex);
+	mutex_destroy(&ce->parallel_submit);
 	i915_active_fini(&ce->active);
 }
 
diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
index fc2da541d633..5ae6cc7374c8 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -52,6 +52,11 @@ static inline bool intel_context_is_parent(struct intel_context *ce)
 	return !!ce->guc_number_children;
 }
 
+static inline bool intel_context_is_parallel(struct intel_context *ce)
+{
+	return intel_context_is_child(ce) || intel_context_is_parent(ce);
+}
+
 /* Only should be called directly by selftests */
 void __intel_context_bind_parent_child(struct intel_context *parent,
 				       struct intel_context *child);
@@ -204,6 +209,12 @@ static inline bool intel_context_is_barrier(const struct intel_context *ce)
 	return test_bit(CONTEXT_BARRIER_BIT, &ce->flags);
 }
 
+static inline bool
+intel_context_is_no_preempt_mid_batch(const struct intel_context *ce)
+{
+	return test_bit(CONTEXT_NO_PREEMPT_MID_BATCH, &ce->flags);
+}
+
 static inline bool intel_context_is_closed(const struct intel_context *ce)
 {
 	return test_bit(CONTEXT_CLOSED_BIT, &ce->flags);
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 44bc13ee9f8b..74c23c34ee50 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -114,6 +114,7 @@ struct intel_context {
 #define CONTEXT_FORCE_SINGLE_SUBMISSION	7
 #define CONTEXT_NOPREEMPT		8
 #define CONTEXT_LRCA_DIRTY		9
+#define CONTEXT_NO_PREEMPT_MID_BATCH	10
 
 	struct {
 		u64 timeout_us;
@@ -239,6 +240,9 @@ struct intel_context {
 
 	/* Last request submitted on a parent */
 	struct i915_request *last_rq;
+
+	/* Parallel submit mutex */
+	struct mutex parallel_submit;
 };
 
 #endif /* __INTEL_CONTEXT_TYPES__ */
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index c9e873f075a7..2b2c243ffd16 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -798,8 +798,7 @@ static inline int rq_prio(const struct i915_request *rq)
 
 static inline bool is_multi_lrc(struct intel_context *ce)
 {
-	return intel_context_is_child(ce) ||
-		intel_context_is_parent(ce);
+	return intel_context_is_parallel(ce);
 }
 
 static inline bool is_multi_lrc_rq(struct i915_request *rq)
@@ -3458,6 +3457,7 @@ void intel_guc_configure_parent_context(struct intel_context *ce)
 		bb_preempt_boundary =
 			i915_gem_context_is_bb_preempt_boundary(ctx);
 	rcu_read_unlock();
+	bb_preempt_boundary |= intel_context_is_no_preempt_mid_batch(ce);
 	if (bb_preempt_boundary) {
 		ce->emit_bb_start = emit_bb_start_parent_bb_preempt_boundary;
 		ce->emit_fini_breadcrumb =
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 092e9d892c30..76fa06f06947 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1606,14 +1606,9 @@ i915_request_await_object(struct i915_request *to,
 	return ret;
 }
 
-static inline bool is_parallel(struct intel_context *ce)
-{
-	return intel_context_is_child(ce) || intel_context_is_parent(ce);
-}
-
 static inline bool is_parallel_rq(struct i915_request *rq)
 {
-	return is_parallel(rq->context);
+	return intel_context_is_parallel(rq->context);
 }
 
 static inline struct intel_context *request_to_parent(struct i915_request *rq)
diff --git a/include/uapi/drm/i915_drm_prelim.h b/include/uapi/drm/i915_drm_prelim.h
index a885c14585aa..6d5d7eb54813 100644
--- a/include/uapi/drm/i915_drm_prelim.h
+++ b/include/uapi/drm/i915_drm_prelim.h
@@ -370,9 +370,124 @@ struct prelim_i915_context_engines_parallel_submit {
 } __attribute__ ((packed));
 #define i915_context_engines_parallel_submit prelim_i915_context_engines_parallel_submit
 
+/**
+ * struct prelim_drm_i915_context_engines_parallel2_submit - Configure engine
+ * for parallel submission.
+ *
+ * Set up a slot in the context engine map to allow multiple BBs to be submitted
+ * in a single execbuf IOCTL. Those BBs will then be scheduled to run on the GPU
+ * in parallel. Multiple hardware contexts are created internally in the i915 to
+ * run these BBs. Once a slot is configured for N BBs, only N BBs can be
+ * submitted in each execbuf IOCTL; this is implicit behavior, i.e. the user
+ * doesn't tell the execbuf IOCTL there are N BBs, the execbuf IOCTL knows how
+ * many BBs there are based on the slot's configuration. The N BBs are the last
+ * N buffer objects, or the first N if I915_EXEC_BATCH_FIRST is set.
+ *
+ * The default placement behavior is to create implicit bonds between each
+ * context if each context maps to more than 1 physical engine (e.g. the context
+ * is a virtual engine). Also we only allow contexts of the same engine class,
+ * and these contexts must be in logically contiguous order. Examples of the
+ * placement behavior are described below. Lastly, the default is to not allow
+ * BBs to be preempted mid BB; rather, coordinated preemption is inserted on all
+ * hardware contexts between each set of BBs. Flags may be added in the future
+ * to change both of these default behaviors.
+ *
+ * Returns -EINVAL if the hardware context placement configuration is invalid
+ * or if the placement configuration isn't supported on the platform /
+ * submission interface.
+ * Returns -ENODEV if the extension isn't supported on the platform /
+ * submission interface.
+ *
+ * .. code-block::
+ *
+ *	Example 1 pseudo code:
+ *	CS[X] = generic engine of same class, logical instance X
+ *	INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
+ *	set_engines(INVALID)
+ *	set_parallel(engine_index=0, width=2, num_siblings=1,
+ *		     engines=CS[0],CS[1])
+ *
+ *	Results in the following valid placement:
+ *	CS[0], CS[1]
+ *
+ *	Example 2 pseudo code:
+ *	CS[X] = generic engine of same class, logical instance X
+ *	INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
+ *	set_engines(INVALID)
+ *	set_parallel(engine_index=0, width=2, num_siblings=2,
+ *		     engines=CS[0],CS[2],CS[1],CS[3])
+ *
+ *	Results in the following valid placements:
+ *	CS[0], CS[1]
+ *	CS[2], CS[3]
+ *
+ *	This can also be thought of as 2 virtual engines described by a 2-D
+ *	array in the engines field, with bonds placed between each index of
+ *	the virtual engines. e.g. CS[0] is bonded to CS[1] and CS[2] is
+ *	bonded to CS[3].
+ *	VE[0] = CS[0], CS[2]
+ *	VE[1] = CS[1], CS[3]
+ *
+ *	Example 3 pseudo code:
+ *	CS[X] = generic engine of same class, logical instance X
+ *	INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
+ *	set_engines(INVALID)
+ *	set_parallel(engine_index=0, width=2, num_siblings=2,
+ *		     engines=CS[0],CS[1],CS[1],CS[3])
+ *
+ *	Results in the following valid and invalid placements:
+ *	CS[0], CS[1]
+ *	CS[1], CS[3] - Not logically contiguous, return -EINVAL
+ */
+struct prelim_drm_i915_context_engines_parallel2_submit {
+	/**
+	 * @base: base user extension.
+	 */
+	struct i915_user_extension base;
+
+	/**
+	 * @engine_index: slot for parallel engine
+	 */
+	__u16 engine_index;
+
+	/**
+	 * @width: number of contexts per parallel engine
+	 */
+	__u16 width;
+
+	/**
+	 * @num_siblings: number of siblings per context
+	 */
+	__u16 num_siblings;
+
+	/**
+	 * @mbz16: reserved for future use; must be zero
+	 */
+	__u16 mbz16;
+
+	/**
+	 * @flags: all undefined flags must be zero; no flags are currently defined
+	 */
+	__u64 flags;
+
+	/**
+	 * @mbz64: reserved for future use; must be zero
+	 */
+	__u64 mbz64[3];
+
+	/**
+	 * @engines: 2-d array of engine instances to configure parallel engine
+	 *
+	 * length = width (i) * num_siblings (j)
+	 * index = j + i * num_siblings
+	 */
+	struct i915_engine_class_instance engines[0];
+} __attribute__ ((packed));
+
 struct prelim_i915_context_param_engines {
 #define PRELIM_I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT (PRELIM_I915_USER_EXT | 2) /* see prelim_i915_context_engines_parallel_submit */
 #define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2 /* see i915_context_engines_parallel_submit */
+#define PRELIM_I915_CONTEXT_ENGINES_EXT_PARALLEL2_SUBMIT (PRELIM_I915_USER_EXT | 3) /* see prelim_i915_context_engines_parallel2_submit */
 };
 
 enum prelim_drm_i915_gem_memory_class {

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 2/2] REVIEW: Full tree diff against internal/internal
  2021-07-08  0:30   ` [Intel-gfx] " Matthew Brost
@ 2021-07-08  0:30     ` Matthew Brost
  -1 siblings, 0 replies; 54+ messages in thread
From: Matthew Brost @ 2021-07-08  0:30 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: john.c.harrison, Michal.Wajdeczko

Auto-generated diff between internal/internal..internal
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c       | 190 +++++++++++++++++++++-
 drivers/gpu/drm/i915/gem/i915_gem_context_types.h |   6 -
 drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c    |  73 ++++++---
 drivers/gpu/drm/i915/gt/intel_context.c           |   2 +
 drivers/gpu/drm/i915/gt/intel_context.h           |  11 ++
 drivers/gpu/drm/i915/gt/intel_context_types.h     |   6 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c |   4 +-
 drivers/gpu/drm/i915/i915_request.c               |   7 +-
 include/uapi/drm/i915_drm_prelim.h                | 115 +++++++++++++
 9 files changed, 377 insertions(+), 37 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 82b7af55ba05..583113c58e9c 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -389,7 +389,6 @@ void i915_gem_context_release(struct kref *ref)
 	mutex_destroy(&ctx->engines_mutex);
 	mutex_destroy(&ctx->lut_mutex);
 	mutex_destroy(&ctx->mutex);
-	mutex_destroy(&ctx->parallel_submit);
 
 	kfree_rcu(ctx, rcu);
 }
@@ -769,8 +768,6 @@ static struct i915_gem_context *__create_context(struct intel_gt *gt)
 	mutex_init(&ctx->mutex);
 	INIT_LIST_HEAD(&ctx->link);
 
-	mutex_init(&ctx->parallel_submit);
-
 	spin_lock_init(&ctx->stale.lock);
 	INIT_LIST_HEAD(&ctx->stale.engines);
 
@@ -2093,6 +2090,48 @@ static bool validate_parallel_engines_layout(const struct set_engines *set)
 	return true;
 }
 
+/*
+ * Engines must be of the same class and form a logically contiguous mask.
+ *
+ * FIXME: Logical mask check not 100% correct but good enough for the PoC
+ */
+static bool __validate_parallel_engines_layout(struct drm_i915_private *i915,
+					       struct intel_context *parent)
+{
+	u8 engine_class = parent->engine->class;
+	u8 num_siblings = hweight_long(parent->engine->logical_mask);
+	struct intel_context *child;
+	intel_engine_mask_t logical_mask = parent->engine->logical_mask;
+
+	for_each_child(parent, child) {
+		if (child->engine->class != engine_class) {
+			drm_dbg(&i915->drm, "Class mismatch: %u, %u",
+				engine_class, child->engine->class);
+			return false;
+		}
+		if (hweight_long(child->engine->logical_mask) != num_siblings) {
+			drm_dbg(&i915->drm, "Sibling mismatch: %u, %lu",
+				num_siblings,
+				hweight_long(child->engine->logical_mask));
+			return false;
+		}
+		if (logical_mask & child->engine->logical_mask) {
+			drm_dbg(&i915->drm, "Overlapping logical mask: 0x%04x, 0x%04x",
+				logical_mask, child->engine->logical_mask);
+			return false;
+		}
+		logical_mask |= child->engine->logical_mask;
+	}
+
+	if (!is_power_of_2((logical_mask >> (ffs(logical_mask) - 1)) + 1)) {
+		drm_dbg(&i915->drm, "Non-contiguous logical mask: 0x%04x",
+			logical_mask);
+		return false;
+	}
+
+	return true;
+}
+
 static int
 set_engines__parallel_submit(struct i915_user_extension __user *base, void *data)
 {
@@ -2245,11 +2284,156 @@ set_engines__parallel_submit(struct i915_user_extension __user *base, void *data
 	return err;
 }
 
+static int
+set_engines__parallel2_submit(struct i915_user_extension __user *base,
+			      void *data)
+{
+	struct prelim_drm_i915_context_engines_parallel2_submit __user *ext =
+		container_of_user(base, typeof(*ext), base);
+	const struct set_engines *set = data;
+	struct drm_i915_private *i915 = set->ctx->i915;
+	struct intel_context *parent, *child, *ce;
+	u64 flags;
+	int err = 0, n, i, j;
+	u16 slot, width, num_siblings;
+	struct intel_engine_cs **siblings = NULL;
+
+	if (!intel_uc_uses_guc_submission(&i915->gt.uc))
+		return -ENODEV;
+
+	if (get_user(slot, &ext->engine_index))
+		return -EFAULT;
+
+	if (get_user(width, &ext->width))
+		return -EFAULT;
+
+	if (get_user(num_siblings, &ext->num_siblings))
+		return -EFAULT;
+
+	if (slot >= set->engines->num_engines) {
+		drm_dbg(&i915->drm, "Invalid placement value, %d >= %d\n",
+			slot, set->engines->num_engines);
+		return -EINVAL;
+	}
+
+	parent = set->engines->engines[slot];
+	if (parent) {
+		drm_dbg(&i915->drm, "Context index[%d] not NULL\n", slot);
+		return -EINVAL;
+	}
+
+	if (get_user(flags, &ext->flags))
+		return -EFAULT;
+
+	if (flags) {
+		drm_dbg(&i915->drm, "Unknown flags 0x%02llx", flags);
+		return -EINVAL;
+	}
+
+	for (n = 0; n < ARRAY_SIZE(ext->mbz64); n++) {
+		err = check_user_mbz(&ext->mbz64[n]);
+		if (err)
+			return err;
+	}
+
+	if (width < 1) {
+		drm_dbg(&i915->drm, "Width (%d) < 1\n", width);
+		return -EINVAL;
+	}
+
+	if (num_siblings < 1) {
+		drm_dbg(&i915->drm, "Number of siblings (%d) < 1\n",
+			num_siblings);
+		return -EINVAL;
+	}
+
+	siblings = kmalloc_array(num_siblings,
+				 sizeof(*siblings),
+				 GFP_KERNEL);
+	if (!siblings)
+		return -ENOMEM;
+
+	mutex_lock(&set->ctx->mutex);
+
+	/* Create contexts / engines */
+	for (i = 0; i < width; ++i) {
+		for (j = 0; j < num_siblings; ++j) {
+			struct i915_engine_class_instance ci;
+
+			if (copy_from_user(&ci, &ext->engines[i * num_siblings + j],
+					   sizeof(ci))) {
+				err = -EFAULT;
+				goto out_err;
+			}
+
+			siblings[j] = intel_engine_lookup_user(i915,
+							       ci.engine_class,
+							       ci.engine_instance);
+			if (!siblings[j]) {
+				drm_dbg(&i915->drm,
+					"Invalid sibling[%d]: { class:%d, inst:%d }\n",
+					j, ci.engine_class, ci.engine_instance);
+				err = -EINVAL;
+				goto out_err;
+			}
+		}
+
+		ce = intel_engine_create_virtual(siblings, num_siblings,
+						 FORCE_VIRTUAL);
+		if (IS_ERR(ce)) {
+			err = PTR_ERR(ce);
+			goto out_err;
+		}
+		intel_context_set_gem(ce, set->ctx);
+
+		if (i == 0) {
+			parent = ce;
+			__set_bit(CONTEXT_NO_PREEMPT_MID_BATCH, &ce->flags);
+		} else {
+			intel_context_bind_parent_child(parent, ce);
+			err = intel_context_alloc_state(ce);
+			if (err)
+				goto out_err;
+		}
+	}
+
+	if (!__validate_parallel_engines_layout(i915, parent)) {
+		drm_dbg(&i915->drm, "Invalid parallel context layout");
+		err = -EINVAL;
+		goto out_err;
+	}
+
+	intel_guc_configure_parent_context(parent);
+	if (cmpxchg(&set->engines->engines[slot], NULL, parent)) {
+		err = -EEXIST;
+		goto out_err;
+	}
+
+	kfree(siblings);
+	mutex_unlock(&set->ctx->mutex);
+
+	return 0;
+
+out_err:
+	if (parent) {
+		for_each_child(parent, child)
+			intel_context_put(child);
+		intel_context_put(parent);
+		set->engines->engines[slot] = NULL;
+	}
+	kfree(siblings);
+	mutex_unlock(&set->ctx->mutex);
+
+	return err;
+}
+
 static const i915_user_extension_fn set_engines__extensions[] = {
 	[I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE] = set_engines__load_balance,
 	[I915_CONTEXT_ENGINES_EXT_BOND] = set_engines__bond,
 	[PRELIM_I915_USER_EXT_MASK(PRELIM_I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT)] =
 		set_engines__parallel_submit,
+	[PRELIM_I915_USER_EXT_MASK(PRELIM_I915_CONTEXT_ENGINES_EXT_PARALLEL2_SUBMIT)] =
+		set_engines__parallel2_submit,
 };
 
 static int
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
index 70b6103d9a32..c70c6d042f06 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
@@ -206,12 +206,6 @@ struct i915_gem_context {
 	 */
 	u64 fence_context;
 
-	/**
-	 * @parallel_submit: Ensure only 1 parallel submission is happening on
-	 * this context at a time.
-	 */
-	struct mutex parallel_submit;
-
 	/**
 	 * @seqno: Seqno when using when a parallel context.
 	 */
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index cdf37c65b904..3cd4327c25f3 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -1682,7 +1682,7 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
 		goto err_unmap;
 
 	if (engine == eb->context->engine &&
-	    !i915_gem_context_is_parallel(eb->gem_context)) {
+	    !intel_context_is_parallel(eb->context)) {
 		rq = i915_request_create(eb->context);
 	} else {
 		struct intel_context *ce = eb->reloc_context;
@@ -1776,7 +1776,7 @@ static u32 *reloc_gpu(struct i915_execbuffer *eb,
 		struct intel_engine_cs *engine = eb->engine;
 
 		if (!reloc_can_use_engine(engine) ||
-		    i915_gem_context_is_parallel(eb->gem_context)) {
+		    intel_context_is_parallel(eb->context)) {
 			engine = engine->gt->engine_class[COPY_ENGINE_CLASS][0];
 			if (!engine)
 				return ERR_PTR(-ENODEV);
@@ -3308,7 +3308,7 @@ eb_select_legacy_ring(struct i915_execbuffer *eb)
 }
 
 static int
-eb_select_engine(struct i915_execbuffer *eb)
+eb_select_engine(struct i915_execbuffer *eb, unsigned int batch_number)
 {
 	struct intel_context *ce;
 	unsigned int idx;
@@ -3326,6 +3326,16 @@ eb_select_engine(struct i915_execbuffer *eb)
 	if (IS_ERR(ce))
 		return PTR_ERR(ce);
 
+	if (batch_number > 0 &&
+	    !i915_gem_context_is_parallel(eb->gem_context)) {
+		struct intel_context *parent = ce;
+		for_each_child(parent, ce)
+			if (!--batch_number)
+				break;
+		intel_context_put(parent);
+		intel_context_get(ce);
+	}
+
 	intel_gt_pm_get(ce->engine->gt);
 
 	if (i915_gem_context_uses_protected_content(eb->gem_context)) {
@@ -3687,7 +3697,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 		goto err_context;
 	}
 
-	err = eb_select_engine(&eb);
+	err = eb_select_engine(&eb, batch_number);
 	if (unlikely(err))
 		goto err_context;
 
@@ -3900,6 +3910,8 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
 	const size_t count = args->buffer_count;
 	unsigned int num_batches, i;
 	int err, start_context;
+	bool is_parallel = false;
+	struct intel_context *parent = NULL;
 
 	if (!check_buffer_count(count)) {
 		drm_dbg(&i915->drm, "execbuf2 with %zd buffers\n", count);
@@ -3936,15 +3948,35 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
 			I915_EXEC_NUMBER_BB_LSB) +
 		       ((args->flags & PRELIM_I915_EXEC_NUMBER_BB_MASK) >>
 			PRELIM_I915_EXEC_NUMBER_BB_LSB)) + 1;
-	if (i915_gem_context_is_parallel(ctx)) {
-		if (num_batches > count ||
-		    start_context + num_batches > ctx->width) {
-			err = -EINVAL;
-			goto err_context;
+
+	if (i915_gem_context_user_engines(ctx)) {
+		parent = i915_gem_context_get_engine(ctx, start_context);
+		if (IS_ERR(parent)) {
+			i915_gem_context_put(ctx);
+			return PTR_ERR(parent);
 		}
 
-		if (i915_gem_context_is_bb_preempt_boundary(ctx) &&
-		    (start_context || num_batches != ctx->width)) {
+		is_parallel = i915_gem_context_is_parallel(ctx) ||
+			intel_context_is_parallel(parent);
+		if (i915_gem_context_is_parallel(ctx)) {
+			if (num_batches > count ||
+			    start_context + num_batches > ctx->width) {
+				err = -EINVAL;
+				goto err_context;
+			}
+
+			if (i915_gem_context_is_bb_preempt_boundary(ctx) &&
+			    (start_context || num_batches != ctx->width)) {
+				err = -EINVAL;
+				goto err_context;
+			}
+		} else if (intel_context_is_parallel(parent)) {
+			err = -EINVAL;
+			if (num_batches != 1 ||
+			    parent->guc_number_children + 1 > count)
+				goto err_context;
+			num_batches = parent->guc_number_children + 1;
+		} else if (num_batches > 1) {
 			err = -EINVAL;
 			goto err_context;
 		}
@@ -3981,8 +4013,7 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
 	 * properly, also this is needed to create an excl fence for an dma buf
 	 * objects these BBs touch.
 	 */
-	if (args->flags & I915_EXEC_FENCE_OUT ||
-	    i915_gem_context_is_parallel(ctx)) {
+	if (args->flags & I915_EXEC_FENCE_OUT || is_parallel) {
 		out_fences = kcalloc(num_batches, sizeof(*out_fences),
 				     GFP_KERNEL);
 		if (!out_fences) {
@@ -4028,8 +4059,8 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
 	 * in intel_context sequence, thus only 1 submission can happen at a
 	 * time.
 	 */
-	if (i915_gem_context_is_parallel(ctx))
-		mutex_lock(&ctx->parallel_submit);
+	if (is_parallel)
+		mutex_lock(&parent->parallel_submit);
 
 	err = i915_gem_do_execbuffer(dev, file, args, exec2_list,
 				     args->flags & I915_EXEC_BATCH_FIRST ?
@@ -4043,8 +4074,10 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
 				     &ww);
 
 	for (i = 1; err == 0 && i < num_batches; i++) {
-		args->flags &= ~I915_EXEC_RING_MASK;
-		args->flags |= start_context + i;
+		if (i915_gem_context_is_parallel(ctx)) {
+			args->flags &= ~I915_EXEC_RING_MASK;
+			args->flags |= start_context + i;
+		}
 		args->batch_len = 0;
 
 		err = i915_gem_do_execbuffer(dev, file, args, exec2_list,
@@ -4059,8 +4092,8 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
 					     &ww);
 	}
 
-	if (i915_gem_context_is_parallel(ctx))
-		mutex_unlock(&ctx->parallel_submit);
+	if (is_parallel)
+		mutex_unlock(&parent->parallel_submit);
 
 	/*
 	 * Now that we have begun execution of the batchbuffer, we ignore
@@ -4164,6 +4197,8 @@ end:;
 	dma_fence_put(in_fence);
 err_context:
 	i915_gem_context_put(ctx);
+	if (parent)
+		intel_context_put(parent);
 
 	return err;
 }
diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index 3e704be30501..fa36419b7fa4 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -469,6 +469,7 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
 	INIT_LIST_HEAD(&ce->signals);
 
 	mutex_init(&ce->pin_mutex);
+	mutex_init(&ce->parallel_submit);
 
 	spin_lock_init(&ce->guc_state.lock);
 	INIT_LIST_HEAD(&ce->guc_state.fences);
@@ -507,6 +508,7 @@ void intel_context_fini(struct intel_context *ce)
 			intel_context_put(child);
 
 	mutex_destroy(&ce->pin_mutex);
+	mutex_destroy(&ce->parallel_submit);
 	i915_active_fini(&ce->active);
 }
 
diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
index 88d4a71c5d29..c401cad7a231 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -52,6 +52,11 @@ static inline bool intel_context_is_parent(struct intel_context *ce)
 	return !!ce->guc_number_children;
 }
 
+static inline bool intel_context_is_parallel(struct intel_context *ce)
+{
+	return intel_context_is_child(ce) || intel_context_is_parent(ce);
+}
+
 /* Only should be called directly by selftests */
 void __intel_context_bind_parent_child(struct intel_context *parent,
 				       struct intel_context *child);
@@ -205,6 +210,12 @@ static inline bool intel_context_is_barrier(const struct intel_context *ce)
 	return test_bit(CONTEXT_BARRIER_BIT, &ce->flags);
 }
 
+static inline bool
+intel_context_is_no_preempt_mid_batch(const struct intel_context *ce)
+{
+	return test_bit(CONTEXT_NO_PREEMPT_MID_BATCH, &ce->flags);
+}
+
 static inline bool intel_context_is_closed(const struct intel_context *ce)
 {
 	return test_bit(CONTEXT_CLOSED_BIT, &ce->flags);
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 7bf73704b250..a7c9bf87ef23 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -125,7 +125,8 @@ struct intel_context {
 #define CONTEXT_FORCE_SINGLE_SUBMISSION	7
 #define CONTEXT_NOPREEMPT		8
 #define CONTEXT_LRCA_DIRTY		9
-#define CONTEXT_DEBUG			10
+#define CONTEXT_NO_PREEMPT_MID_BATCH	10
+#define CONTEXT_DEBUG			11
 
 	struct {
 		u64 timeout_us;
@@ -252,6 +253,9 @@ struct intel_context {
 	/* Last request submitted on a parent */
 	struct i915_request *last_rq;
 
+	/* Parallel submit mutex */
+	struct mutex parallel_submit;
+
 	/* GuC context blocked fence */
 	struct i915_sw_fence guc_blocked;
 };
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index c285160be8b3..c935c8b7f557 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -832,8 +832,7 @@ static inline int rq_prio(const struct i915_request *rq)
 
 static inline bool is_multi_lrc(struct intel_context *ce)
 {
-	return intel_context_is_child(ce) ||
-		intel_context_is_parent(ce);
+	return intel_context_is_parallel(ce);
 }
 
 static inline bool is_multi_lrc_rq(struct i915_request *rq)
@@ -3827,6 +3826,7 @@ void intel_guc_configure_parent_context(struct intel_context *ce)
 		bb_preempt_boundary =
 			i915_gem_context_is_bb_preempt_boundary(ctx);
 	rcu_read_unlock();
+	bb_preempt_boundary |= intel_context_is_no_preempt_mid_batch(ce);
 	if (bb_preempt_boundary) {
 		ce->emit_bb_start = emit_bb_start_parent_bb_preempt_boundary;
 		ce->emit_fini_breadcrumb =
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 65c31d359e81..e1db7d9f27f8 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1721,14 +1721,9 @@ i915_request_await_object(struct i915_request *to,
 	return ret;
 }
 
-static inline bool is_parallel(struct intel_context *ce)
-{
-	return intel_context_is_child(ce) || intel_context_is_parent(ce);
-}
-
 static inline bool is_parallel_rq(struct i915_request *rq)
 {
-	return is_parallel(rq->context);
+	return intel_context_is_parallel(rq->context);
 }
 
 static inline struct intel_context *request_to_parent(struct i915_request *rq)
diff --git a/include/uapi/drm/i915_drm_prelim.h b/include/uapi/drm/i915_drm_prelim.h
index 577e99801690..ac5cc639e078 100644
--- a/include/uapi/drm/i915_drm_prelim.h
+++ b/include/uapi/drm/i915_drm_prelim.h
@@ -884,9 +884,124 @@ struct prelim_i915_context_engines_parallel_submit {
 } __attribute__ ((packed));
 #define i915_context_engines_parallel_submit prelim_i915_context_engines_parallel_submit
 
+/**
+ * struct prelim_drm_i915_context_engines_parallel2_submit - Configure engine
+ * for parallel submission.
+ *
+ * Set up a slot in the context engine map to allow multiple BBs to be submitted
+ * in a single execbuf IOCTL. Those BBs will then be scheduled to run on the GPU
+ * in parallel. Multiple hardware contexts are created internally in the i915 to
+ * run these BBs. Once a slot is configured for N BBs, only N BBs can be
+ * submitted in each execbuf IOCTL; this is implicit behavior, i.e. the user
+ * doesn't tell the execbuf IOCTL there are N BBs, the execbuf IOCTL knows how
+ * many BBs there are based on the slot's configuration. The N BBs are the last
+ * N buffer objects, or the first N if I915_EXEC_BATCH_FIRST is set.
+ *
+ * The default placement behavior is to create implicit bonds between each
+ * context if each context maps to more than 1 physical engine (e.g. the context
+ * is a virtual engine). Also we only allow contexts of the same engine class,
+ * and these contexts must be in logically contiguous order. Examples of the
+ * placement behavior are described below. Lastly, the default is to not allow
+ * BBs to be preempted mid BB; rather, coordinated preemption is inserted on all
+ * hardware contexts between each set of BBs. Flags may be added in the future
+ * to change both of these default behaviors.
+ *
+ * Returns -EINVAL if the hardware context placement configuration is invalid
+ * or if the placement configuration isn't supported on the platform /
+ * submission interface.
+ * Returns -ENODEV if the extension isn't supported on the platform /
+ * submission interface.
+ *
+ * .. code-block::
+ *
+ *	Example 1 pseudo code:
+ *	CS[X] = generic engine of same class, logical instance X
+ *	INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
+ *	set_engines(INVALID)
+ *	set_parallel(engine_index=0, width=2, num_siblings=1,
+ *		     engines=CS[0],CS[1])
+ *
+ *	Results in the following valid placement:
+ *	CS[0], CS[1]
+ *
+ *	Example 2 pseudo code:
+ *	CS[X] = generic engine of same class, logical instance X
+ *	INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
+ *	set_engines(INVALID)
+ *	set_parallel(engine_index=0, width=2, num_siblings=2,
+ *		     engines=CS[0],CS[2],CS[1],CS[3])
+ *
+ *	Results in the following valid placements:
+ *	CS[0], CS[1]
+ *	CS[2], CS[3]
+ *
+ *	This can also be thought of as 2 virtual engines described by a 2-D
+ *	array in the engines field, with bonds placed between each index of
+ *	the virtual engines. e.g. CS[0] is bonded to CS[1] and CS[2] is
+ *	bonded to CS[3].
+ *	VE[0] = CS[0], CS[2]
+ *	VE[1] = CS[1], CS[3]
+ *
+ *	Example 3 pseudo code:
+ *	CS[X] = generic engine of same class, logical instance X
+ *	INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
+ *	set_engines(INVALID)
+ *	set_parallel(engine_index=0, width=2, num_siblings=2,
+ *		     engines=CS[0],CS[1],CS[1],CS[3])
+ *
+ *	Results in the following valid and invalid placements:
+ *	CS[0], CS[1]
+ *	CS[1], CS[3] - Not logically contiguous, return -EINVAL
+ */
+struct prelim_drm_i915_context_engines_parallel2_submit {
+	/**
+	 * @base: base user extension.
+	 */
+	struct i915_user_extension base;
+
+	/**
+	 * @engine_index: slot for parallel engine
+	 */
+	__u16 engine_index;
+
+	/**
+	 * @width: number of contexts per parallel engine
+	 */
+	__u16 width;
+
+	/**
+	 * @num_siblings: number of siblings per context
+	 */
+	__u16 num_siblings;
+
+	/**
+	 * @mbz16: reserved for future use; must be zero
+	 */
+	__u16 mbz16;
+
+	/**
+	 * @flags: all undefined flags must be zero; no flags are currently defined
+	 */
+	__u64 flags;
+
+	/**
+	 * @mbz64: reserved for future use; must be zero
+	 */
+	__u64 mbz64[3];
+
+	/**
+	 * @engines: 2-d array of engine instances to configure parallel engine
+	 *
+	 * length = width (i) * num_siblings (j)
+	 * index = j + i * num_siblings
+	 */
+	struct i915_engine_class_instance engines[0];
+} __attribute__ ((packed));
+
 struct prelim_i915_context_param_engines {
 #define PRELIM_I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT (PRELIM_I915_USER_EXT | 2) /* see prelim_i915_context_engines_parallel_submit */
 #define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2 /* see i915_context_engines_parallel_submit */
+#define PRELIM_I915_CONTEXT_ENGINES_EXT_PARALLEL2_SUBMIT (PRELIM_I915_USER_EXT | 3) /* see prelim_i915_context_engines_parallel2_submit */
 };
 
 enum prelim_drm_i915_oa_format {
--
git-pile 0.97


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [Intel-gfx] [PATCH 2/2] REVIEW: Full tree diff against internal/internal
@ 2021-07-08  0:30     ` Matthew Brost
  0 siblings, 0 replies; 54+ messages in thread
From: Matthew Brost @ 2021-07-08  0:30 UTC (permalink / raw)
  To: intel-gfx, dri-devel

Auto-generated diff between internal/internal..internal
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c       | 190 +++++++++++++++++++++-
 drivers/gpu/drm/i915/gem/i915_gem_context_types.h |   6 -
 drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c    |  73 ++++++---
 drivers/gpu/drm/i915/gt/intel_context.c           |   2 +
 drivers/gpu/drm/i915/gt/intel_context.h           |  11 ++
 drivers/gpu/drm/i915/gt/intel_context_types.h     |   6 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c |   4 +-
 drivers/gpu/drm/i915/i915_request.c               |   7 +-
 include/uapi/drm/i915_drm_prelim.h                | 115 +++++++++++++
 9 files changed, 377 insertions(+), 37 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 82b7af55ba05..583113c58e9c 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -389,7 +389,6 @@ void i915_gem_context_release(struct kref *ref)
 	mutex_destroy(&ctx->engines_mutex);
 	mutex_destroy(&ctx->lut_mutex);
 	mutex_destroy(&ctx->mutex);
-	mutex_destroy(&ctx->parallel_submit);
 
 	kfree_rcu(ctx, rcu);
 }
@@ -769,8 +768,6 @@ static struct i915_gem_context *__create_context(struct intel_gt *gt)
 	mutex_init(&ctx->mutex);
 	INIT_LIST_HEAD(&ctx->link);
 
-	mutex_init(&ctx->parallel_submit);
-
 	spin_lock_init(&ctx->stale.lock);
 	INIT_LIST_HEAD(&ctx->stale.engines);
 
@@ -2093,6 +2090,48 @@ static bool validate_parallel_engines_layout(const struct set_engines *set)
 	return true;
 }
 
+/*
+ * Engines must be of the same class and form a logically contiguous mask.
+ *
+ * FIXME: Logical mask check not 100% correct but good enough for the PoC
+ */
+static bool __validate_parallel_engines_layout(struct drm_i915_private *i915,
+					       struct intel_context *parent)
+{
+	u8 engine_class = parent->engine->class;
+	u8 num_siblings = hweight_long(parent->engine->logical_mask);
+	struct intel_context *child;
+	intel_engine_mask_t logical_mask = parent->engine->logical_mask;
+
+	for_each_child(parent, child) {
+		if (child->engine->class != engine_class) {
+			drm_dbg(&i915->drm, "Class mismatch: %u, %u",
+				engine_class, child->engine->class);
+			return false;
+		}
+		if (hweight_long(child->engine->logical_mask) != num_siblings) {
+			drm_dbg(&i915->drm, "Sibling mismatch: %u, %lu",
+				num_siblings,
+				hweight_long(child->engine->logical_mask));
+			return false;
+		}
+		if (logical_mask & child->engine->logical_mask) {
+			drm_dbg(&i915->drm, "Overlapping logical mask: 0x%04x, 0x%04x",
+				logical_mask, child->engine->logical_mask);
+			return false;
+		}
+		logical_mask |= child->engine->logical_mask;
+	}
+
+	if (!is_power_of_2((logical_mask >> (ffs(logical_mask) - 1)) + 1)) {
+		drm_dbg(&i915->drm, "Non-contiguous logical mask: 0x%04x",
+			logical_mask);
+		return false;
+	}
+
+	return true;
+}
+
 static int
 set_engines__parallel_submit(struct i915_user_extension __user *base, void *data)
 {
@@ -2245,11 +2284,156 @@ set_engines__parallel_submit(struct i915_user_extension __user *base, void *data
 	return err;
 }
 
+static int
+set_engines__parallel2_submit(struct i915_user_extension __user *base,
+			      void *data)
+{
+	struct prelim_drm_i915_context_engines_parallel2_submit __user *ext =
+		container_of_user(base, typeof(*ext), base);
+	const struct set_engines *set = data;
+	struct drm_i915_private *i915 = set->ctx->i915;
+	struct intel_context *parent, *child, *ce;
+	u64 flags;
+	int err = 0, n, i, j;
+	u16 slot, width, num_siblings;
+	struct intel_engine_cs **siblings = NULL;
+
+	if (!(intel_uc_uses_guc_submission(&i915->gt.uc)))
+		return -ENODEV;
+
+	if (get_user(slot, &ext->engine_index))
+		return -EFAULT;
+
+	if (get_user(width, &ext->width))
+		return -EFAULT;
+
+	if (get_user(num_siblings, &ext->num_siblings))
+		return -EFAULT;
+
+	if (slot >= set->engines->num_engines) {
+		drm_dbg(&i915->drm, "Invalid placement value, %d >= %d\n",
+			slot, set->engines->num_engines);
+		return -EINVAL;
+	}
+
+	parent = set->engines->engines[slot];
+	if (parent) {
+		drm_dbg(&i915->drm, "Context index[%d] not NULL\n", slot);
+		return -EINVAL;
+	}
+
+	if (get_user(flags, &ext->flags))
+		return -EFAULT;
+
+	if (flags) {
+		drm_dbg(&i915->drm, "Unknown flags 0x%02llx", flags);
+		return -EINVAL;
+	}
+
+	for (n = 0; n < ARRAY_SIZE(ext->mbz64); n++) {
+		err = check_user_mbz(&ext->mbz64[n]);
+		if (err)
+			return err;
+	}
+
+	if (width < 1) {
+		drm_dbg(&i915->drm, "Width (%d) < 1\n", width);
+		return -EINVAL;
+	}
+
+	if (num_siblings < 1) {
+		drm_dbg(&i915->drm, "Number of siblings (%d) < 1\n",
+			num_siblings);
+		return -EINVAL;
+	}
+
+	siblings = kmalloc_array(num_siblings,
+				 sizeof(*siblings),
+				 GFP_KERNEL);
+	if (!siblings)
+		return -ENOMEM;
+
+	mutex_lock(&set->ctx->mutex);
+
+	/* Create contexts / engines */
+	for (i = 0; i < width; ++i) {
+		for (j = 0; j < num_siblings; ++j) {
+			struct i915_engine_class_instance ci;
+
+			if (copy_from_user(&ci, &ext->engines[i * num_siblings + j],
+					   sizeof(ci))) {
+				err = -EFAULT;
+				goto out_err;
+			}
+
+			siblings[j] = intel_engine_lookup_user(i915,
+							       ci.engine_class,
+							       ci.engine_instance);
+			if (!siblings[j]) {
+				drm_dbg(&i915->drm,
+					"Invalid sibling[%d]: { class:%d, inst:%d }\n",
+					j, ci.engine_class, ci.engine_instance);
+				err = -EINVAL;
+				goto out_err;
+			}
+		}
+
+		ce = intel_engine_create_virtual(siblings, num_siblings,
+						 FORCE_VIRTUAL);
+		if (IS_ERR(ce)) {
+			err = PTR_ERR(ce);
+			goto out_err;
+		}
+		intel_context_set_gem(ce, set->ctx);
+
+		if (i == 0) {
+			parent = ce;
+			__set_bit(CONTEXT_NO_PREEMPT_MID_BATCH, &ce->flags);
+		} else {
+			intel_context_bind_parent_child(parent, ce);
+			err = intel_context_alloc_state(ce);
+			if (err)
+				goto out_err;
+		}
+	}
+
+	if (!__validate_parallel_engines_layout(i915, parent)) {
+		drm_dbg(&i915->drm, "Invalid parallel context layout");
+		err = -EINVAL;
+		goto out_err;
+	}
+
+	intel_guc_configure_parent_context(parent);
+	if (cmpxchg(&set->engines->engines[slot], NULL, parent)) {
+		err = -EEXIST;
+		goto out_err;
+	}
+
+	kfree(siblings);
+	mutex_unlock(&set->ctx->mutex);
+
+	return 0;
+
+out_err:
+	if (parent) {
+		for_each_child(parent, child)
+			intel_context_put(child);
+		intel_context_put(parent);
+		set->engines->engines[slot] = NULL;
+	}
+	kfree(siblings);
+	mutex_unlock(&set->ctx->mutex);
+
+	return err;
+}
+
 static const i915_user_extension_fn set_engines__extensions[] = {
 	[I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE] = set_engines__load_balance,
 	[I915_CONTEXT_ENGINES_EXT_BOND] = set_engines__bond,
 	[PRELIM_I915_USER_EXT_MASK(PRELIM_I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT)] =
 		set_engines__parallel_submit,
+	[PRELIM_I915_USER_EXT_MASK(PRELIM_I915_CONTEXT_ENGINES_EXT_PARALLEL2_SUBMIT)] =
+		set_engines__parallel2_submit,
 };
 
 static int
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
index 70b6103d9a32..c70c6d042f06 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
@@ -206,12 +206,6 @@ struct i915_gem_context {
 	 */
 	u64 fence_context;
 
-	/**
-	 * @parallel_submit: Ensure only 1 parallel submission is happening on
-	 * this context at a time.
-	 */
-	struct mutex parallel_submit;
-
 	/**
 	 * @seqno: Seqno when using when a parallel context.
 	 */
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index cdf37c65b904..3cd4327c25f3 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -1682,7 +1682,7 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
 		goto err_unmap;
 
 	if (engine == eb->context->engine &&
-	    !i915_gem_context_is_parallel(eb->gem_context)) {
+	    !intel_context_is_parallel(eb->context)) {
 		rq = i915_request_create(eb->context);
 	} else {
 		struct intel_context *ce = eb->reloc_context;
@@ -1776,7 +1776,7 @@ static u32 *reloc_gpu(struct i915_execbuffer *eb,
 		struct intel_engine_cs *engine = eb->engine;
 
 		if (!reloc_can_use_engine(engine) ||
-		    i915_gem_context_is_parallel(eb->gem_context)) {
+		    intel_context_is_parallel(eb->context)) {
 			engine = engine->gt->engine_class[COPY_ENGINE_CLASS][0];
 			if (!engine)
 				return ERR_PTR(-ENODEV);
@@ -3308,7 +3308,7 @@ eb_select_legacy_ring(struct i915_execbuffer *eb)
 }
 
 static int
-eb_select_engine(struct i915_execbuffer *eb)
+eb_select_engine(struct i915_execbuffer *eb, unsigned int batch_number)
 {
 	struct intel_context *ce;
 	unsigned int idx;
@@ -3326,6 +3326,16 @@ eb_select_engine(struct i915_execbuffer *eb)
 	if (IS_ERR(ce))
 		return PTR_ERR(ce);
 
+	if (batch_number > 0 &&
+	    !i915_gem_context_is_parallel(eb->gem_context)) {
+		struct intel_context *parent = ce;
+		for_each_child(parent, ce)
+			if (!--batch_number)
+				break;
+		intel_context_put(parent);
+		intel_context_get(ce);
+	}
+
 	intel_gt_pm_get(ce->engine->gt);
 
 	if (i915_gem_context_uses_protected_content(eb->gem_context)) {
@@ -3687,7 +3697,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 		goto err_context;
 	}
 
-	err = eb_select_engine(&eb);
+	err = eb_select_engine(&eb, batch_number);
 	if (unlikely(err))
 		goto err_context;
 
@@ -3900,6 +3910,8 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
 	const size_t count = args->buffer_count;
 	unsigned int num_batches, i;
 	int err, start_context;
+	bool is_parallel = false;
+	struct intel_context *parent = NULL;
 
 	if (!check_buffer_count(count)) {
 		drm_dbg(&i915->drm, "execbuf2 with %zd buffers\n", count);
@@ -3936,15 +3948,35 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
 			I915_EXEC_NUMBER_BB_LSB) +
 		       ((args->flags & PRELIM_I915_EXEC_NUMBER_BB_MASK) >>
 			PRELIM_I915_EXEC_NUMBER_BB_LSB)) + 1;
-	if (i915_gem_context_is_parallel(ctx)) {
-		if (num_batches > count ||
-		    start_context + num_batches > ctx->width) {
-			err = -EINVAL;
-			goto err_context;
+
+	if (i915_gem_context_user_engines(ctx)) {
+		parent = i915_gem_context_get_engine(ctx, start_context);
+		if (IS_ERR(parent)) {
+			i915_gem_context_put(ctx);
+			return PTR_ERR(parent);
 		}
 
-		if (i915_gem_context_is_bb_preempt_boundary(ctx) &&
-		    (start_context || num_batches != ctx->width)) {
+		is_parallel = i915_gem_context_is_parallel(ctx) ||
+			intel_context_is_parallel(parent);
+		if (i915_gem_context_is_parallel(ctx)) {
+			if (num_batches > count ||
+			    start_context + num_batches > ctx->width) {
+				err = -EINVAL;
+				goto err_context;
+			}
+
+			if (i915_gem_context_is_bb_preempt_boundary(ctx) &&
+			    (start_context || num_batches != ctx->width)) {
+				err = -EINVAL;
+				goto err_context;
+			}
+		} else if (intel_context_is_parallel(parent)) {
+			if (num_batches != 1)
+				return -EINVAL;
+			num_batches = parent->guc_number_children + 1;
+			if (num_batches > count)
+				return -EINVAL;
+		} else if (num_batches > 1) {
 			err = -EINVAL;
 			goto err_context;
 		}
@@ -3981,8 +4013,7 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
 	 * properly, also this is needed to create an excl fence for an dma buf
 	 * objects these BBs touch.
 	 */
-	if (args->flags & I915_EXEC_FENCE_OUT ||
-	    i915_gem_context_is_parallel(ctx)) {
+	if (args->flags & I915_EXEC_FENCE_OUT || is_parallel) {
 		out_fences = kcalloc(num_batches, sizeof(*out_fences),
 				     GFP_KERNEL);
 		if (!out_fences) {
@@ -4028,8 +4059,8 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
 	 * in intel_context sequence, thus only 1 submission can happen at a
 	 * time.
 	 */
-	if (i915_gem_context_is_parallel(ctx))
-		mutex_lock(&ctx->parallel_submit);
+	if (is_parallel)
+		mutex_lock(&parent->parallel_submit);
 
 	err = i915_gem_do_execbuffer(dev, file, args, exec2_list,
 				     args->flags & I915_EXEC_BATCH_FIRST ?
@@ -4043,8 +4074,10 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
 				     &ww);
 
 	for (i = 1; err == 0 && i < num_batches; i++) {
-		args->flags &= ~I915_EXEC_RING_MASK;
-		args->flags |= start_context + i;
+		if (i915_gem_context_is_parallel(ctx)) {
+			args->flags &= ~I915_EXEC_RING_MASK;
+			args->flags |= start_context + i;
+		}
 		args->batch_len = 0;
 
 		err = i915_gem_do_execbuffer(dev, file, args, exec2_list,
@@ -4059,8 +4092,8 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
 					     &ww);
 	}
 
-	if (i915_gem_context_is_parallel(ctx))
-		mutex_unlock(&ctx->parallel_submit);
+	if (is_parallel)
+		mutex_unlock(&parent->parallel_submit);
 
 	/*
 	 * Now that we have begun execution of the batchbuffer, we ignore
@@ -4164,6 +4197,8 @@ end:;
 	dma_fence_put(in_fence);
 err_context:
 	i915_gem_context_put(ctx);
+	if (parent)
+		intel_context_put(parent);
 
 	return err;
 }
diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index 3e704be30501..fa36419b7fa4 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -469,6 +469,7 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
 	INIT_LIST_HEAD(&ce->signals);
 
 	mutex_init(&ce->pin_mutex);
+	mutex_init(&ce->parallel_submit);
 
 	spin_lock_init(&ce->guc_state.lock);
 	INIT_LIST_HEAD(&ce->guc_state.fences);
@@ -507,6 +508,7 @@ void intel_context_fini(struct intel_context *ce)
 			intel_context_put(child);
 
 	mutex_destroy(&ce->pin_mutex);
+	mutex_destroy(&ce->parallel_submit);
 	i915_active_fini(&ce->active);
 }
 
diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
index 88d4a71c5d29..c401cad7a231 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -52,6 +52,11 @@ static inline bool intel_context_is_parent(struct intel_context *ce)
 	return !!ce->guc_number_children;
 }
 
+static inline bool intel_context_is_parallel(struct intel_context *ce)
+{
+	return intel_context_is_child(ce) || intel_context_is_parent(ce);
+}
+
 /* Only should be called directly by selftests */
 void __intel_context_bind_parent_child(struct intel_context *parent,
 				       struct intel_context *child);
@@ -205,6 +210,12 @@ static inline bool intel_context_is_barrier(const struct intel_context *ce)
 	return test_bit(CONTEXT_BARRIER_BIT, &ce->flags);
 }
 
+static inline bool
+intel_context_is_no_preempt_mid_batch(const struct intel_context *ce)
+{
+	return test_bit(CONTEXT_NO_PREEMPT_MID_BATCH, &ce->flags);
+}
+
 static inline bool intel_context_is_closed(const struct intel_context *ce)
 {
 	return test_bit(CONTEXT_CLOSED_BIT, &ce->flags);
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 7bf73704b250..a7c9bf87ef23 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -125,7 +125,8 @@ struct intel_context {
 #define CONTEXT_FORCE_SINGLE_SUBMISSION	7
 #define CONTEXT_NOPREEMPT		8
 #define CONTEXT_LRCA_DIRTY		9
-#define CONTEXT_DEBUG			10
+#define CONTEXT_NO_PREEMPT_MID_BATCH	10
+#define CONTEXT_DEBUG			11
 
 	struct {
 		u64 timeout_us;
@@ -252,6 +253,9 @@ struct intel_context {
 	/* Last request submitted on a parent */
 	struct i915_request *last_rq;
 
+	/* Parallel submit mutex */
+	struct mutex parallel_submit;
+
 	/* GuC context blocked fence */
 	struct i915_sw_fence guc_blocked;
 };
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index c285160be8b3..c935c8b7f557 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -832,8 +832,7 @@ static inline int rq_prio(const struct i915_request *rq)
 
 static inline bool is_multi_lrc(struct intel_context *ce)
 {
-	return intel_context_is_child(ce) ||
-		intel_context_is_parent(ce);
+	return intel_context_is_parallel(ce);
 }
 
 static inline bool is_multi_lrc_rq(struct i915_request *rq)
@@ -3827,6 +3826,7 @@ void intel_guc_configure_parent_context(struct intel_context *ce)
 		bb_preempt_boundary =
 			i915_gem_context_is_bb_preempt_boundary(ctx);
 	rcu_read_unlock();
+	bb_preempt_boundary |= intel_context_is_no_preempt_mid_batch(ce);
 	if (bb_preempt_boundary) {
 		ce->emit_bb_start = emit_bb_start_parent_bb_preempt_boundary;
 		ce->emit_fini_breadcrumb =
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 65c31d359e81..e1db7d9f27f8 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1721,14 +1721,9 @@ i915_request_await_object(struct i915_request *to,
 	return ret;
 }
 
-static inline bool is_parallel(struct intel_context *ce)
-{
-	return intel_context_is_child(ce) || intel_context_is_parent(ce);
-}
-
 static inline bool is_parallel_rq(struct i915_request *rq)
 {
-	return is_parallel(rq->context);
+	return intel_context_is_parallel(rq->context);
 }
 
 static inline struct intel_context *request_to_parent(struct i915_request *rq)
diff --git a/include/uapi/drm/i915_drm_prelim.h b/include/uapi/drm/i915_drm_prelim.h
index 577e99801690..ac5cc639e078 100644
--- a/include/uapi/drm/i915_drm_prelim.h
+++ b/include/uapi/drm/i915_drm_prelim.h
@@ -884,9 +884,124 @@ struct prelim_i915_context_engines_parallel_submit {
 } __attribute__ ((packed));
 #define i915_context_engines_parallel_submit prelim_i915_context_engines_parallel_submit
 
+/**
+ * struct prelim_drm_i915_context_engines_parallel2_submit - Configure engine
+ * for parallel submission.
+ *
+ * Set up a slot in the context engine map to allow multiple BBs to be submitted
+ * in a single execbuf IOCTL. Those BBs will then be scheduled to run on the GPU
+ * in parallel. Multiple hardware contexts are created internally in the i915
+ * run these BBs. Once a slot is configured for N BBs only N BBs can be
+ * submitted in each execbuf IOCTL and this is implicit behavior e.g. The user
+ * doesn't tell the execbuf IOCTL there are N BBs, the execbuf IOCTL knows how
+ * many BBs there are based on the slot's configuration. The N BBs are the last
+ * N buffer objects or first N if I915_EXEC_BATCH_FIRST is set.
+ *
+ * The default placement behavior is to create implicit bonds between each
+ * context if each context maps to more than 1 physical engine (e.g. context is
+ * a virtual engine). Also we only allow contexts of the same engine class and these
+ * contexts must be in logically contiguous order. Examples of the placement
+ * behavior are described below. Lastly, the default is to not allow BBs to be
+ * preempted mid-BB; rather, coordinated preemption is inserted on all hardware
+ * contexts between each set of BBs. Flags may be added in the future to change
+ * both of these default behaviors.
+ *
+ * Returns -EINVAL if hardware context placement configuration is invalid or if
+ * the placement configuration isn't supported on the platform / submission
+ * interface.
+ * Returns -ENODEV if extension isn't supported on the platform / submission
+ * interface.
+ *
+ * .. code-block::
+ *
+ *	Example 1 pseudo code:
+ *	CS[X] = generic engine of same class, logical instance X
+ *	INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
+ *	set_engines(INVALID)
+ *	set_parallel(engine_index=0, width=2, num_siblings=1,
+ *		     engines=CS[0],CS[1])
+ *
+ *	Results in the following valid placement:
+ *	CS[0], CS[1]
+ *
+ *	Example 2 pseudo code:
+ *	CS[X] = generic engine of same class, logical instance X
+ *	INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
+ *	set_engines(INVALID)
+ *	set_parallel(engine_index=0, width=2, num_siblings=2,
+ *		     engines=CS[0],CS[2],CS[1],CS[3])
+ *
+ *	Results in the following valid placements:
+ *	CS[0], CS[1]
+ *	CS[2], CS[3]
+ *
+ *	This can also be thought of as 2 virtual engines described by a 2-D
+ *	array in the engines field, with bonds placed between each index of
+ *	the virtual engines, e.g. CS[0] is bonded to CS[1], CS[2] is bonded to
+ *	CS[3].
+ *	VE[0] = CS[0], CS[2]
+ *	VE[1] = CS[1], CS[3]
+ *
+ *	Example 3 pseudo code:
+ *	CS[X] = generic engine of same class, logical instance X
+ *	INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
+ *	set_engines(INVALID)
+ *	set_parallel(engine_index=0, width=2, num_siblings=2,
+ *		     engines=CS[0],CS[1],CS[1],CS[3])
+ *
+ *	Results in the following valid and invalid placements:
+ *	CS[0], CS[1]
+ *	CS[1], CS[3] - Not logically contiguous, returns -EINVAL
+ */
+struct prelim_drm_i915_context_engines_parallel2_submit {
+	/**
+	 * @base: base user extension.
+	 */
+	struct i915_user_extension base;
+
+	/**
+	 * @engine_index: slot for parallel engine
+	 */
+	__u16 engine_index;
+
+	/**
+	 * @width: number of contexts per parallel engine
+	 */
+	__u16 width;
+
+	/**
+	 * @num_siblings: number of siblings per context
+	 */
+	__u16 num_siblings;
+
+	/**
+	 * @mbz16: reserved for future use; must be zero
+	 */
+	__u16 mbz16;
+
+	/**
+	 * @flags: all undefined flags must be zero; currently no flags are defined
+	 */
+	__u64 flags;
+
+	/**
+	 * @mbz64: reserved for future use; must be zero
+	 */
+	__u64 mbz64[3];
+
+	/**
+	 * @engines: 2-d array of engine instances to configure parallel engine
+	 *
+	 * length = width (i) * num_siblings (j)
+	 * index = j + i * num_siblings
+	 */
+	struct i915_engine_class_instance engines[0];
+} __attribute__ ((packed));
+
 struct prelim_i915_context_param_engines {
 #define PRELIM_I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT (PRELIM_I915_USER_EXT | 2) /* see prelim_i915_context_engines_parallel_submit */
 #define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2 /* see i915_context_engines_parallel_submit */
+#define PRELIM_I915_CONTEXT_ENGINES_EXT_PARALLEL2_SUBMIT (PRELIM_I915_USER_EXT | 3) /* see prelim_i915_context_engines_parallel2_submit */
 };
 
 enum prelim_drm_i915_oa_format {
--
git-pile 0.97

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [Intel-gfx] ✗ Fi.CI.BUILD: failure for CT changes required for GuC submission (rev4)
  2021-07-06 22:20 ` [Intel-gfx] " Matthew Brost
                   ` (13 preceding siblings ...)
  (?)
@ 2021-07-08  2:18 ` Patchwork
  -1 siblings, 0 replies; 54+ messages in thread
From: Patchwork @ 2021-07-08  2:18 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-gfx

== Series Details ==

Series: CT changes required for GuC submission (rev4)
URL   : https://patchwork.freedesktop.org/series/91943/
State : failure

== Summary ==

Applying: drm/i915/guc: Relax CTB response timeout
Applying: REVIEW: Full tree diff against internal/internal
error: sha1 information is lacking or useless (drivers/gpu/drm/i915/gem/i915_gem_context.c).
error: could not build fake ancestor
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0002 REVIEW: Full tree diff against internal/internal
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 06/7] drm/i915/guc: Optimize CTB writes and reads
  2021-07-07 23:25   ` [Intel-gfx] " Matthew Brost
@ 2021-07-08 13:23     ` Michal Wajdeczko
  -1 siblings, 0 replies; 54+ messages in thread
From: Michal Wajdeczko @ 2021-07-08 13:23 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel; +Cc: john.c.harrison



On 08.07.2021 01:25, Matthew Brost wrote:
> CTB writes are now in the path of command submission and should be
> optimized for performance. Rather than reading CTB descriptor values
> (e.g. head, tail) which could result in accesses across the PCIe bus,
> store shadow local copies and only read/write the descriptor values when
> absolutely necessary. Also store the current space in the each channel
> locally.
> 
> v2:
>  (Michal)
>   - Add additional sanity checks for head / tail pointers
>   - Use GUC_CTB_HDR_LEN rather than magic 1
> v3:
>  (Michal / John H)
>   - Drop redundant check of head value
> v4:
>  (John H)
>   - Drop redundant checks of tail / head values
> v5:
>  (Michal)
>   - Address more nits
> 
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 92 +++++++++++++++--------
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  6 ++
>  2 files changed, 66 insertions(+), 32 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index db3e85b89573..d552d3016779 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -130,6 +130,10 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc)
>  static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb)
>  {
>  	ctb->broken = false;
> +	ctb->tail = 0;
> +	ctb->head = 0;
> +	ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
> +
>  	guc_ct_buffer_desc_init(ctb->desc);
>  }
>  
> @@ -383,10 +387,8 @@ static int ct_write(struct intel_guc_ct *ct,
>  {
>  	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>  	struct guc_ct_buffer_desc *desc = ctb->desc;
> -	u32 head = desc->head;
> -	u32 tail = desc->tail;
> +	u32 tail = ctb->tail;
>  	u32 size = ctb->size;
> -	u32 used;
>  	u32 header;
>  	u32 hxg;
>  	u32 type;
> @@ -396,25 +398,22 @@ static int ct_write(struct intel_guc_ct *ct,
>  	if (unlikely(desc->status))
>  		goto corrupted;
>  
> -	if (unlikely((tail | head) >= size)) {
> -		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
> -			 head, tail, size);
> +	GEM_BUG_ON(tail > size);
> +
> +#ifdef CONFIG_DRM_I915_DEBUG_GUC
> +	if (unlikely(tail != READ_ONCE(desc->tail))) {
> +		CT_ERROR(ct, "Tail was modified %u != %u\n",
> +			 desc->tail, tail);
> +		desc->status |= GUC_CTB_STATUS_MISMATCH;
> +		goto corrupted;
> +	}
> +	if (unlikely(desc->head >= size)) {

READ_ONCE wouldn't hurt

> +		CT_ERROR(ct, "Invalid head offset %u >= %u\n",
> +			 desc->head, size);
>  		desc->status |= GUC_CTB_STATUS_OVERFLOW;
>  		goto corrupted;
>  	}
> -
> -	/*
> -	 * tail == head condition indicates empty. GuC FW does not support
> -	 * using up the entire buffer to get tail == head meaning full.
> -	 */
> -	if (tail < head)
> -		used = (size - head) + tail;
> -	else
> -		used = tail - head;
> -
> -	/* make sure there is a space including extra dw for the header */
> -	if (unlikely(used + len + GUC_CTB_HDR_LEN >= size))
> -		return -ENOSPC;
> +#endif
>  
>  	/*
>  	 * dw0: CT header (including fence)
> @@ -452,6 +451,10 @@ static int ct_write(struct intel_guc_ct *ct,
>  	 */
>  	write_barrier(ct);
>  
> +	/* update local copies */
> +	ctb->tail = tail;
> +	ctb->space -= len + GUC_CTB_HDR_LEN;

it looks like we rely on the previous call to h2g_has_room(), but maybe
for completeness we should have a sanity check in this function as well:

	GEM_BUG_ON(ctb->space < len + HDR_LEN);

not a blocker, other LGTM,

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>

Michal

> +
>  	/* now update descriptor */
>  	WRITE_ONCE(desc->tail, tail);
>  
> @@ -469,7 +472,7 @@ static int ct_write(struct intel_guc_ct *ct,
>   * @req:	pointer to pending request
>   * @status:	placeholder for status
>   *
> - * For each sent request, Guc shall send bac CT response message.
> + * For each sent request, GuC shall send back CT response message.
>   * Our message handler will update status of tracked request once
>   * response message with given fence is received. Wait here and
>   * check for valid response status value.
> @@ -525,24 +528,36 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct)
>  	return ret;
>  }
>  
> -static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
> +static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw)
>  {
> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>  	struct guc_ct_buffer_desc *desc = ctb->desc;
> -	u32 head = READ_ONCE(desc->head);
> +	u32 head;
>  	u32 space;
>  
> -	space = CIRC_SPACE(desc->tail, head, ctb->size);
> +	if (ctb->space >= len_dw)
> +		return true;
> +
> +	head = READ_ONCE(desc->head);
> +	if (unlikely(head > ctb->size)) {
> +		CT_ERROR(ct, "Invalid head offset %u >= %u\n",
> +			 head, ctb->size);
> +		desc->status |= GUC_CTB_STATUS_OVERFLOW;
> +		ctb->broken = true;
> +		return false;
> +	}
> +
> +	space = CIRC_SPACE(ctb->tail, head, ctb->size);
> +	ctb->space = space;
>  
>  	return space >= len_dw;
>  }
>  
>  static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
>  {
> -	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> -
>  	lockdep_assert_held(&ct->ctbs.send.lock);
>  
> -	if (unlikely(!h2g_has_room(ctb, len_dw))) {
> +	if (unlikely(!h2g_has_room(ct, len_dw))) {
>  		if (ct->stall_time == KTIME_MAX)
>  			ct->stall_time = ktime_get();
>  
> @@ -612,7 +627,7 @@ static int ct_send(struct intel_guc_ct *ct,
>  	 */
>  retry:
>  	spin_lock_irqsave(&ctb->lock, flags);
> -	if (unlikely(!h2g_has_room(ctb, len + GUC_CTB_HDR_LEN))) {
> +	if (unlikely(!h2g_has_room(ct, len + GUC_CTB_HDR_LEN))) {
>  		if (ct->stall_time == KTIME_MAX)
>  			ct->stall_time = ktime_get();
>  		spin_unlock_irqrestore(&ctb->lock, flags);
> @@ -732,8 +747,8 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>  {
>  	struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv;
>  	struct guc_ct_buffer_desc *desc = ctb->desc;
> -	u32 head = desc->head;
> -	u32 tail = desc->tail;
> +	u32 head = ctb->head;
> +	u32 tail = READ_ONCE(desc->tail);
>  	u32 size = ctb->size;
>  	u32 *cmds = ctb->cmds;
>  	s32 available;
> @@ -747,9 +762,19 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>  	if (unlikely(desc->status))
>  		goto corrupted;
>  
> -	if (unlikely((tail | head) >= size)) {
> -		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
> -			 head, tail, size);
> +	GEM_BUG_ON(head > size);
> +
> +#ifdef CONFIG_DRM_I915_DEBUG_GUC
> +	if (unlikely(head != READ_ONCE(desc->head))) {
> +		CT_ERROR(ct, "Head was modified %u != %u\n",
> +			 desc->head, head);
> +		desc->status |= GUC_CTB_STATUS_MISMATCH;
> +		goto corrupted;
> +	}
> +#endif
> +	if (unlikely(tail >= size)) {
> +		CT_ERROR(ct, "Invalid tail offset %u >= %u\n",
> +			 tail, size);
>  		desc->status |= GUC_CTB_STATUS_OVERFLOW;
>  		goto corrupted;
>  	}
> @@ -802,6 +827,9 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>  	}
>  	CT_DEBUG(ct, "received %*ph\n", 4 * len, (*msg)->msg);
>  
> +	/* update local copies */
> +	ctb->head = head;
> +
>  	/* now update descriptor */
>  	WRITE_ONCE(desc->head, head);
>  
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> index bee03794c1eb..edd1bba0445d 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> @@ -33,6 +33,9 @@ struct intel_guc;
>   * @desc: pointer to the buffer descriptor
>   * @cmds: pointer to the commands buffer
>   * @size: size of the commands buffer in dwords
> + * @head: local shadow copy of head in dwords
> + * @tail: local shadow copy of tail in dwords
> + * @space: local shadow copy of space in dwords
>   * @broken: flag to indicate if descriptor data is broken
>   */
>  struct intel_guc_ct_buffer {
> @@ -40,6 +43,9 @@ struct intel_guc_ct_buffer {
>  	struct guc_ct_buffer_desc *desc;
>  	u32 *cmds;
>  	u32 size;
> +	u32 tail;
> +	u32 head;
> +	u32 space;
>  	bool broken;
>  };
>  
> 


* Re: [Intel-gfx] [PATCH 06/7] drm/i915/guc: Optimize CTB writes and reads
@ 2021-07-08 13:23     ` Michal Wajdeczko
  0 siblings, 0 replies; 54+ messages in thread
From: Michal Wajdeczko @ 2021-07-08 13:23 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel



On 08.07.2021 01:25, Matthew Brost wrote:
> CTB writes are now in the path of command submission and should be
> optimized for performance. Rather than reading CTB descriptor values
> (e.g. head, tail) which could result in accesses across the PCIe bus,
> store shadow local copies and only read/write the descriptor values when
> absolutely necessary. Also store the current space in each channel
> locally.
> 
> v2:
>  (Michal)
>   - Add additional sanity checks for head / tail pointers
>   - Use GUC_CTB_HDR_LEN rather than magic 1
> v3:
>  (Michal / John H)
>   - Drop redundant check of head value
> v4:
>  (John H)
>   - Drop redundant checks of tail / head values
> v5:
>  (Michal)
>   - Address more nits
> 
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 92 +++++++++++++++--------
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  6 ++
>  2 files changed, 66 insertions(+), 32 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index db3e85b89573..d552d3016779 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -130,6 +130,10 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc)
>  static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb)
>  {
>  	ctb->broken = false;
> +	ctb->tail = 0;
> +	ctb->head = 0;
> +	ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
> +
>  	guc_ct_buffer_desc_init(ctb->desc);
>  }
>  
> @@ -383,10 +387,8 @@ static int ct_write(struct intel_guc_ct *ct,
>  {
>  	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>  	struct guc_ct_buffer_desc *desc = ctb->desc;
> -	u32 head = desc->head;
> -	u32 tail = desc->tail;
> +	u32 tail = ctb->tail;
>  	u32 size = ctb->size;
> -	u32 used;
>  	u32 header;
>  	u32 hxg;
>  	u32 type;
> @@ -396,25 +398,22 @@ static int ct_write(struct intel_guc_ct *ct,
>  	if (unlikely(desc->status))
>  		goto corrupted;
>  
> -	if (unlikely((tail | head) >= size)) {
> -		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
> -			 head, tail, size);
> +	GEM_BUG_ON(tail > size);
> +
> +#ifdef CONFIG_DRM_I915_DEBUG_GUC
> +	if (unlikely(tail != READ_ONCE(desc->tail))) {
> +		CT_ERROR(ct, "Tail was modified %u != %u\n",
> +			 desc->tail, tail);
> +		desc->status |= GUC_CTB_STATUS_MISMATCH;
> +		goto corrupted;
> +	}
> +	if (unlikely(desc->head >= size)) {

READ_ONCE wouldn't hurt

> +		CT_ERROR(ct, "Invalid head offset %u >= %u)\n",
> +			 desc->head, size);
>  		desc->status |= GUC_CTB_STATUS_OVERFLOW;
>  		goto corrupted;
>  	}
> -
> -	/*
> -	 * tail == head condition indicates empty. GuC FW does not support
> -	 * using up the entire buffer to get tail == head meaning full.
> -	 */
> -	if (tail < head)
> -		used = (size - head) + tail;
> -	else
> -		used = tail - head;
> -
> -	/* make sure there is a space including extra dw for the header */
> -	if (unlikely(used + len + GUC_CTB_HDR_LEN >= size))
> -		return -ENOSPC;
> +#endif
>  
>  	/*
>  	 * dw0: CT header (including fence)
> @@ -452,6 +451,10 @@ static int ct_write(struct intel_guc_ct *ct,
>  	 */
>  	write_barrier(ct);
>  
> +	/* update local copies */
> +	ctb->tail = tail;
> +	ctb->space -= len + GUC_CTB_HDR_LEN;

it looks like we rely on the previous call to h2g_has_room(), but maybe for
completeness we should have a sanity check in this function as well:

	GEM_BUG_ON(ctb->space < len + HDR_LEN);

not a blocker, otherwise LGTM,

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>

Michal

> +
>  	/* now update descriptor */
>  	WRITE_ONCE(desc->tail, tail);
>  
> @@ -469,7 +472,7 @@ static int ct_write(struct intel_guc_ct *ct,
>   * @req:	pointer to pending request
>   * @status:	placeholder for status
>   *
> - * For each sent request, Guc shall send bac CT response message.
> + * For each sent request, GuC shall send back CT response message.
>   * Our message handler will update status of tracked request once
>   * response message with given fence is received. Wait here and
>   * check for valid response status value.
> @@ -525,24 +528,36 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct)
>  	return ret;
>  }
>  
> -static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
> +static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw)
>  {
> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>  	struct guc_ct_buffer_desc *desc = ctb->desc;
> -	u32 head = READ_ONCE(desc->head);
> +	u32 head;
>  	u32 space;
>  
> -	space = CIRC_SPACE(desc->tail, head, ctb->size);
> +	if (ctb->space >= len_dw)
> +		return true;
> +
> +	head = READ_ONCE(desc->head);
> +	if (unlikely(head > ctb->size)) {
> +		CT_ERROR(ct, "Invalid head offset %u >= %u)\n",
> +			 head, ctb->size);
> +		desc->status |= GUC_CTB_STATUS_OVERFLOW;
> +		ctb->broken = true;
> +		return false;
> +	}
> +
> +	space = CIRC_SPACE(ctb->tail, head, ctb->size);
> +	ctb->space = space;
>  
>  	return space >= len_dw;
>  }
>  
>  static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
>  {
> -	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> -
>  	lockdep_assert_held(&ct->ctbs.send.lock);
>  
> -	if (unlikely(!h2g_has_room(ctb, len_dw))) {
> +	if (unlikely(!h2g_has_room(ct, len_dw))) {
>  		if (ct->stall_time == KTIME_MAX)
>  			ct->stall_time = ktime_get();
>  
> @@ -612,7 +627,7 @@ static int ct_send(struct intel_guc_ct *ct,
>  	 */
>  retry:
>  	spin_lock_irqsave(&ctb->lock, flags);
> -	if (unlikely(!h2g_has_room(ctb, len + GUC_CTB_HDR_LEN))) {
> +	if (unlikely(!h2g_has_room(ct, len + GUC_CTB_HDR_LEN))) {
>  		if (ct->stall_time == KTIME_MAX)
>  			ct->stall_time = ktime_get();
>  		spin_unlock_irqrestore(&ctb->lock, flags);
> @@ -732,8 +747,8 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>  {
>  	struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv;
>  	struct guc_ct_buffer_desc *desc = ctb->desc;
> -	u32 head = desc->head;
> -	u32 tail = desc->tail;
> +	u32 head = ctb->head;
> +	u32 tail = READ_ONCE(desc->tail);
>  	u32 size = ctb->size;
>  	u32 *cmds = ctb->cmds;
>  	s32 available;
> @@ -747,9 +762,19 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>  	if (unlikely(desc->status))
>  		goto corrupted;
>  
> -	if (unlikely((tail | head) >= size)) {
> -		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
> -			 head, tail, size);
> +	GEM_BUG_ON(head > size);
> +
> +#ifdef CONFIG_DRM_I915_DEBUG_GUC
> +	if (unlikely(head != READ_ONCE(desc->head))) {
> +		CT_ERROR(ct, "Head was modified %u != %u\n",
> +			 desc->head, head);
> +		desc->status |= GUC_CTB_STATUS_MISMATCH;
> +		goto corrupted;
> +	}
> +#endif
> +	if (unlikely(tail >= size)) {
> +		CT_ERROR(ct, "Invalid tail offset %u >= %u)\n",
> +			 tail, size);
>  		desc->status |= GUC_CTB_STATUS_OVERFLOW;
>  		goto corrupted;
>  	}
> @@ -802,6 +827,9 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>  	}
>  	CT_DEBUG(ct, "received %*ph\n", 4 * len, (*msg)->msg);
>  
> +	/* update local copies */
> +	ctb->head = head;
> +
>  	/* now update descriptor */
>  	WRITE_ONCE(desc->head, head);
>  
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> index bee03794c1eb..edd1bba0445d 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> @@ -33,6 +33,9 @@ struct intel_guc;
>   * @desc: pointer to the buffer descriptor
>   * @cmds: pointer to the commands buffer
>   * @size: size of the commands buffer in dwords
> + * @head: local shadow copy of head in dwords
> + * @tail: local shadow copy of tail in dwords
> + * @space: local shadow copy of space in dwords
>   * @broken: flag to indicate if descriptor data is broken
>   */
>  struct intel_guc_ct_buffer {
> @@ -40,6 +43,9 @@ struct intel_guc_ct_buffer {
>  	struct guc_ct_buffer_desc *desc;
>  	u32 *cmds;
>  	u32 size;
> +	u32 tail;
> +	u32 head;
> +	u32 space;
>  	bool broken;
>  };
>  
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* [PATCH 6/7] drm/i915/guc: Optimize CTB writes and reads
  2021-07-08 16:20 [PATCH 0/7] CT changes required for GuC submission Matthew Brost
@ 2021-07-08 16:20 ` Matthew Brost
  0 siblings, 0 replies; 54+ messages in thread
From: Matthew Brost @ 2021-07-08 16:20 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: daniele.ceraolospurio, john.c.harrison, Michal.Wajdeczko

CTB writes are now in the path of command submission and should be
optimized for performance. Rather than reading CTB descriptor values
(e.g. head, tail) which could result in accesses across the PCIe bus,
store shadow local copies and only read/write the descriptor values when
absolutely necessary. Also store the current space in each channel
locally.
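As a rough user-space sketch of the shadow-copy scheme described above (hypothetical names, heavily simplified from the driver; `CIRC_SPACE` is re-defined locally here as a stand-in for the kernel macro, which assumes a power-of-two size and always leaves one slot unused):

```c
#include <assert.h>

/* Local stand-in for the kernel's CIRC_SPACE(): free space in a circular
 * buffer with producer index wr and consumer index rd; size must be a
 * power of two, and one slot is always left unused. */
#define CIRC_SPACE(wr, rd, size) (((rd) - ((wr) + 1)) & ((size) - 1))

/* Hypothetical shadow state, mirroring the new intel_guc_ct_buffer fields. */
struct ctb_shadow {
	unsigned int head;	/* local shadow of consumer index (dwords) */
	unsigned int tail;	/* local shadow of producer index (dwords) */
	unsigned int space;	/* cached free space (dwords) */
	unsigned int size;	/* buffer size in dwords (power of two) */
};

/*
 * Fast path: answer from the cached space value, avoiding a read of the
 * descriptor (which may live across PCIe). Only when the cache says "no
 * room" do we refresh the shadow from the descriptor's head, passed in
 * here as desc_head to keep the sketch self-contained.
 */
static int has_room(struct ctb_shadow *ctb, unsigned int desc_head,
		    unsigned int len_dw)
{
	if (ctb->space >= len_dw)
		return 1;		/* no expensive descriptor read needed */

	ctb->space = CIRC_SPACE(ctb->tail, desc_head, ctb->size);
	return ctb->space >= len_dw;
}
```

After a successful write the producer only updates its local copies (tail and space) and writes the new tail to the descriptor once, so steady-state submission never reads the consumer's head back.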

v2:
 (Michal)
  - Add additional sanity checks for head / tail pointers
  - Use GUC_CTB_HDR_LEN rather than magic 1
v3:
 (Michal / John H)
  - Drop redundant check of head value
v4:
 (John H)
  - Drop redundant checks of tail / head values
v5:
 (Michal)
  - Address more nits
v6:
 (Michal)
  - Add GEM_BUG_ON sanity check on ctb->space

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 93 +++++++++++++++--------
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  6 ++
 2 files changed, 67 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index db3e85b89573..ad33708c2818 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -130,6 +130,10 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc)
 static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb)
 {
 	ctb->broken = false;
+	ctb->tail = 0;
+	ctb->head = 0;
+	ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
+
 	guc_ct_buffer_desc_init(ctb->desc);
 }
 
@@ -383,10 +387,8 @@ static int ct_write(struct intel_guc_ct *ct,
 {
 	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
 	struct guc_ct_buffer_desc *desc = ctb->desc;
-	u32 head = desc->head;
-	u32 tail = desc->tail;
+	u32 tail = ctb->tail;
 	u32 size = ctb->size;
-	u32 used;
 	u32 header;
 	u32 hxg;
 	u32 type;
@@ -396,25 +398,22 @@ static int ct_write(struct intel_guc_ct *ct,
 	if (unlikely(desc->status))
 		goto corrupted;
 
-	if (unlikely((tail | head) >= size)) {
-		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
-			 head, tail, size);
+	GEM_BUG_ON(tail > size);
+
+#ifdef CONFIG_DRM_I915_DEBUG_GUC
+	if (unlikely(tail != READ_ONCE(desc->tail))) {
+		CT_ERROR(ct, "Tail was modified %u != %u\n",
+			 desc->tail, tail);
+		desc->status |= GUC_CTB_STATUS_MISMATCH;
+		goto corrupted;
+	}
+	if (unlikely(READ_ONCE(desc->head) >= size)) {
+		CT_ERROR(ct, "Invalid head offset %u >= %u)\n",
+			 desc->head, size);
 		desc->status |= GUC_CTB_STATUS_OVERFLOW;
 		goto corrupted;
 	}
-
-	/*
-	 * tail == head condition indicates empty. GuC FW does not support
-	 * using up the entire buffer to get tail == head meaning full.
-	 */
-	if (tail < head)
-		used = (size - head) + tail;
-	else
-		used = tail - head;
-
-	/* make sure there is a space including extra dw for the header */
-	if (unlikely(used + len + GUC_CTB_HDR_LEN >= size))
-		return -ENOSPC;
+#endif
 
 	/*
 	 * dw0: CT header (including fence)
@@ -452,6 +451,11 @@ static int ct_write(struct intel_guc_ct *ct,
 	 */
 	write_barrier(ct);
 
+	/* update local copies */
+	ctb->tail = tail;
+	GEM_BUG_ON(ctb->space < len + GUC_CTB_HDR_LEN);
+	ctb->space -= len + GUC_CTB_HDR_LEN;
+
 	/* now update descriptor */
 	WRITE_ONCE(desc->tail, tail);
 
@@ -469,7 +473,7 @@ static int ct_write(struct intel_guc_ct *ct,
  * @req:	pointer to pending request
  * @status:	placeholder for status
  *
- * For each sent request, Guc shall send bac CT response message.
+ * For each sent request, GuC shall send back CT response message.
  * Our message handler will update status of tracked request once
  * response message with given fence is received. Wait here and
  * check for valid response status value.
@@ -525,24 +529,36 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct)
 	return ret;
 }
 
-static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
+static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw)
 {
+	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
 	struct guc_ct_buffer_desc *desc = ctb->desc;
-	u32 head = READ_ONCE(desc->head);
+	u32 head;
 	u32 space;
 
-	space = CIRC_SPACE(desc->tail, head, ctb->size);
+	if (ctb->space >= len_dw)
+		return true;
+
+	head = READ_ONCE(desc->head);
+	if (unlikely(head > ctb->size)) {
+		CT_ERROR(ct, "Invalid head offset %u >= %u)\n",
+			 head, ctb->size);
+		desc->status |= GUC_CTB_STATUS_OVERFLOW;
+		ctb->broken = true;
+		return false;
+	}
+
+	space = CIRC_SPACE(ctb->tail, head, ctb->size);
+	ctb->space = space;
 
 	return space >= len_dw;
 }
 
 static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
 {
-	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
-
 	lockdep_assert_held(&ct->ctbs.send.lock);
 
-	if (unlikely(!h2g_has_room(ctb, len_dw))) {
+	if (unlikely(!h2g_has_room(ct, len_dw))) {
 		if (ct->stall_time == KTIME_MAX)
 			ct->stall_time = ktime_get();
 
@@ -612,7 +628,7 @@ static int ct_send(struct intel_guc_ct *ct,
 	 */
 retry:
 	spin_lock_irqsave(&ctb->lock, flags);
-	if (unlikely(!h2g_has_room(ctb, len + GUC_CTB_HDR_LEN))) {
+	if (unlikely(!h2g_has_room(ct, len + GUC_CTB_HDR_LEN))) {
 		if (ct->stall_time == KTIME_MAX)
 			ct->stall_time = ktime_get();
 		spin_unlock_irqrestore(&ctb->lock, flags);
@@ -732,8 +748,8 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 {
 	struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv;
 	struct guc_ct_buffer_desc *desc = ctb->desc;
-	u32 head = desc->head;
-	u32 tail = desc->tail;
+	u32 head = ctb->head;
+	u32 tail = READ_ONCE(desc->tail);
 	u32 size = ctb->size;
 	u32 *cmds = ctb->cmds;
 	s32 available;
@@ -747,9 +763,19 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 	if (unlikely(desc->status))
 		goto corrupted;
 
-	if (unlikely((tail | head) >= size)) {
-		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
-			 head, tail, size);
+	GEM_BUG_ON(head > size);
+
+#ifdef CONFIG_DRM_I915_DEBUG_GUC
+	if (unlikely(head != READ_ONCE(desc->head))) {
+		CT_ERROR(ct, "Head was modified %u != %u\n",
+			 desc->head, head);
+		desc->status |= GUC_CTB_STATUS_MISMATCH;
+		goto corrupted;
+	}
+#endif
+	if (unlikely(tail >= size)) {
+		CT_ERROR(ct, "Invalid tail offset %u >= %u)\n",
+			 tail, size);
 		desc->status |= GUC_CTB_STATUS_OVERFLOW;
 		goto corrupted;
 	}
@@ -802,6 +828,9 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 	}
 	CT_DEBUG(ct, "received %*ph\n", 4 * len, (*msg)->msg);
 
+	/* update local copies */
+	ctb->head = head;
+
 	/* now update descriptor */
 	WRITE_ONCE(desc->head, head);
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
index bee03794c1eb..edd1bba0445d 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
@@ -33,6 +33,9 @@ struct intel_guc;
  * @desc: pointer to the buffer descriptor
  * @cmds: pointer to the commands buffer
  * @size: size of the commands buffer in dwords
+ * @head: local shadow copy of head in dwords
+ * @tail: local shadow copy of tail in dwords
+ * @space: local shadow copy of space in dwords
  * @broken: flag to indicate if descriptor data is broken
  */
 struct intel_guc_ct_buffer {
@@ -40,6 +43,9 @@ struct intel_guc_ct_buffer {
 	struct guc_ct_buffer_desc *desc;
 	u32 *cmds;
 	u32 size;
+	u32 tail;
+	u32 head;
+	u32 space;
 	bool broken;
 };
 
-- 
2.28.0



* Re: [PATCH 6/7] drm/i915/guc: Optimize CTB writes and reads
  2021-07-06 19:33         ` Michal Wajdeczko
  2021-07-06 21:22           ` Matthew Brost
@ 2021-07-06 22:41           ` John Harrison
  1 sibling, 0 replies; 54+ messages in thread
From: John Harrison @ 2021-07-06 22:41 UTC (permalink / raw)
  To: Michal Wajdeczko, Matthew Brost, intel-gfx, dri-devel

On 7/6/2021 12:33, Michal Wajdeczko wrote:
> On 06.07.2021 21:19, John Harrison wrote:
>> On 7/6/2021 12:12, Michal Wajdeczko wrote:
>>> On 06.07.2021 21:00, John Harrison wrote:
>>>> On 7/1/2021 10:15, Matthew Brost wrote:
>>>>> CTB writes are now in the path of command submission and should be
>>>>> optimized for performance. Rather than reading CTB descriptor values
>>>>> (e.g. head, tail) which could result in accesses across the PCIe bus,
>>>>> store shadow local copies and only read/write the descriptor values
>>>>> when
>>>>> absolutely necessary. Also store the current space in each channel
>>>>> locally.
>>>>>
>>>>> v2:
>>>>>    (Michal)
>>>>>      - Add additional sanity checks for head / tail pointers
>>>>>      - Use GUC_CTB_HDR_LEN rather than magic 1
>>>>>
>>>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>>>> ---
>>>>>     drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 88
>>>>> +++++++++++++++--------
>>>>>     drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  6 ++
>>>>>     2 files changed, 65 insertions(+), 29 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>>>> index a9cb7b608520..5b8b4ff609e2 100644
>>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>>>> @@ -130,6 +130,10 @@ static void guc_ct_buffer_desc_init(struct
>>>>> guc_ct_buffer_desc *desc)
>>>>>     static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb)
>>>>>     {
>>>>>         ctb->broken = false;
>>>>> +    ctb->tail = 0;
>>>>> +    ctb->head = 0;
>>>>> +    ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
>>>>> +
>>>>>         guc_ct_buffer_desc_init(ctb->desc);
>>>>>     }
>>>>>     @@ -383,10 +387,8 @@ static int ct_write(struct intel_guc_ct *ct,
>>>>>     {
>>>>>         struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>>>>>         struct guc_ct_buffer_desc *desc = ctb->desc;
>>>>> -    u32 head = desc->head;
>>>>> -    u32 tail = desc->tail;
>>>>> +    u32 tail = ctb->tail;
>>>>>         u32 size = ctb->size;
>>>>> -    u32 used;
>>>>>         u32 header;
>>>>>         u32 hxg;
>>>>>         u32 *cmds = ctb->cmds;
>>>>> @@ -395,25 +397,22 @@ static int ct_write(struct intel_guc_ct *ct,
>>>>>         if (unlikely(desc->status))
>>>>>             goto corrupted;
>>>>>     -    if (unlikely((tail | head) >= size)) {
>>>>> +    GEM_BUG_ON(tail > size);
>>>>> +
>>>>> +#ifdef CONFIG_DRM_I915_DEBUG_GUC
>>>>> +    if (unlikely(tail != READ_ONCE(desc->tail))) {
>>>>> +        CT_ERROR(ct, "Tail was modified %u != %u\n",
>>>>> +             desc->tail, ctb->tail);
>>>>> +        desc->status |= GUC_CTB_STATUS_MISMATCH;
>>>>> +        goto corrupted;
>>>>> +    }
>>>>> +    if (unlikely((desc->tail | desc->head) >= size)) {
>>>>>             CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
>>>>> -             head, tail, size);
>>>>> +             desc->head, desc->tail, size);
>>>>>             desc->status |= GUC_CTB_STATUS_OVERFLOW;
>>>>>             goto corrupted;
>>>>>         }
>>>>> -
>>>>> -    /*
>>>>> -     * tail == head condition indicates empty. GuC FW does not support
>>>>> -     * using up the entire buffer to get tail == head meaning full.
>>>>> -     */
>>>>> -    if (tail < head)
>>>>> -        used = (size - head) + tail;
>>>>> -    else
>>>>> -        used = tail - head;
>>>>> -
>>>>> -    /* make sure there is a space including extra dw for the fence */
>>>>> -    if (unlikely(used + len + GUC_CTB_HDR_LEN >= size))
>>>>> -        return -ENOSPC;
>>>>> +#endif
>>>>>           /*
>>>>>          * dw0: CT header (including fence)
>>>>> @@ -454,7 +453,9 @@ static int ct_write(struct intel_guc_ct *ct,
>>>>>         write_barrier(ct);
>>>>>           /* now update descriptor */
>>>>> +    ctb->tail = tail;
>>>>>         WRITE_ONCE(desc->tail, tail);
>>>>> +    ctb->space -= len + GUC_CTB_HDR_LEN;
>>>>>           return 0;
>>>>>     @@ -470,7 +471,7 @@ static int ct_write(struct intel_guc_ct *ct,
>>>>>      * @req:    pointer to pending request
>>>>>      * @status:    placeholder for status
>>>>>      *
>>>>> - * For each sent request, Guc shall send bac CT response message.
>>>>> + * For each sent request, GuC shall send back CT response message.
>>>>>      * Our message handler will update status of tracked request once
>>>>>      * response message with given fence is received. Wait here and
>>>>>      * check for valid response status value.
>>>>> @@ -526,24 +527,35 @@ static inline bool ct_deadlocked(struct
>>>>> intel_guc_ct *ct)
>>>>>         return ret;
>>>>>     }
>>>>>     -static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb,
>>>>> u32 len_dw)
>>>>> +static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw)
>>>>>     {
>>>>> -    struct guc_ct_buffer_desc *desc = ctb->desc;
>>>>> -    u32 head = READ_ONCE(desc->head);
>>>>> +    struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>>>>> +    u32 head;
>>>>>         u32 space;
>>>>>     -    space = CIRC_SPACE(desc->tail, head, ctb->size);
>>>>> +    if (ctb->space >= len_dw)
>>>>> +        return true;
>>>>> +
>>>>> +    head = READ_ONCE(ctb->desc->head);
>>>>> +    if (unlikely(head > ctb->size)) {
>>>>> +        CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u size=%u\n",
>>>>> +             ctb->desc->head, ctb->desc->tail, ctb->size);
>>>>> +        ctb->desc->status |= GUC_CTB_STATUS_OVERFLOW;
>>>>> +        ctb->broken = true;
>>>>> +        return false;
>>>>> +    }
>>>>> +
>>>>> +    space = CIRC_SPACE(ctb->tail, head, ctb->size);
>>>>> +    ctb->space = space;
>>>>>           return space >= len_dw;
>>>>>     }
>>>>>       static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
>>>>>     {
>>>>> -    struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>>>>> -
>>>>>         lockdep_assert_held(&ct->ctbs.send.lock);
>>>>>     -    if (unlikely(!h2g_has_room(ctb, len_dw))) {
>>>>> +    if (unlikely(!h2g_has_room(ct, len_dw))) {
>>>>>             if (ct->stall_time == KTIME_MAX)
>>>>>                 ct->stall_time = ktime_get();
>>>>>     @@ -613,7 +625,7 @@ static int ct_send(struct intel_guc_ct *ct,
>>>>>          */
>>>>>     retry:
>>>>>         spin_lock_irqsave(&ctb->lock, flags);
>>>>> -    if (unlikely(!h2g_has_room(ctb, len + GUC_CTB_HDR_LEN))) {
>>>>> +    if (unlikely(!h2g_has_room(ct, len + GUC_CTB_HDR_LEN))) {
>>>>>             if (ct->stall_time == KTIME_MAX)
>>>>>                 ct->stall_time = ktime_get();
>>>>>             spin_unlock_irqrestore(&ctb->lock, flags);
>>>>> @@ -733,7 +745,7 @@ static int ct_read(struct intel_guc_ct *ct, struct
>>>>> ct_incoming_msg **msg)
>>>>>     {
>>>>>         struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv;
>>>>>         struct guc_ct_buffer_desc *desc = ctb->desc;
>>>>> -    u32 head = desc->head;
>>>>> +    u32 head = ctb->head;
>>>>>         u32 tail = desc->tail;
>>>>>         u32 size = ctb->size;
>>>>>         u32 *cmds = ctb->cmds;
>>>>> @@ -748,12 +760,29 @@ static int ct_read(struct intel_guc_ct *ct,
>>>>> struct ct_incoming_msg **msg)
>>>>>         if (unlikely(desc->status))
>>>>>             goto corrupted;
>>>>>     -    if (unlikely((tail | head) >= size)) {
>>>>> +    GEM_BUG_ON(head > size);
>>>> Is the BUG_ON necessary given that both options below do the same check
>>>> but as a corrupted buffer test (with subsequent recovery by GT reset?)
>>>> rather than killing the driver.
>>> "head" and "size" are now fully owned by the driver.
>>> BUGON here is to make sure driver is coded correctly.
>> The point is that both sides of the #if below also validate head. So
> but not the same "head"
It is, from the point of view of whether the BUG_ON is redundant duplication:

+    if (unlikely(head != READ_ONCE(desc->head))) {



>
> under DEBUG we are validating the one from descriptor (together with
> tail) - and that should be recoverable as if this fails it was clearly
> not our fault.
>
> but under non-DEBUG we were attempting to validate again the local one,
> pretending that this is recoverable, but it is not, as this is our fault
> (elsewhere in i915 we don't attempt to recover from obvious coding errors).
If it happens during driver development when someone is editing code 
then it might be an obvious coding error. If it happens once in a blue 
moon on some customer's system then it is a very hard to track down race 
condition that is totally not obvious at all. The former won't make it 
past CI whether it is a BUG_ON or a CT_ERROR. The latter has just killed 
the end user's kernel rather than attempting to recover (and the
recovery would likely have succeeded).

John.
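The trade-off being argued here can be sketched as the difference between an assert that kills the kernel and a check that marks the channel broken and returns an error the caller can act on (illustrative code with hypothetical names, not the driver's):

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical minimal channel state. */
struct chan {
	unsigned int head;	/* index owned by the other side */
	unsigned int size;
	int broken;
};

/*
 * Recoverable style: on corruption (e.g. firmware scribbled on the
 * descriptor), flag the channel and return an error; the caller can
 * then attempt a reset instead of taking the whole kernel down, as a
 * BUG_ON-style assert on the same condition would.
 */
static int validate_head(struct chan *c)
{
	if (c->head >= c->size) {
		c->broken = 1;
		return -EPIPE;
	}
	return 0;
}
```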

>
>> first there is a BUG_ON, then there is the same test but without blowing
>> the driver apart. One or the other is not required. My vote would be to
>> keep the recoverable test rather than the BUG_ON.
> IMHO we should keep GEMBUGON and drop redundant check in non-DEBUG.
>
> But let Matt decide.
>
> Michal
>
>> John.
>>
>>>>> +
>>>>> +#ifdef CONFIG_DRM_I915_DEBUG_GUC
>>>>> +    if (unlikely(head != READ_ONCE(desc->head))) {
>>>>> +        CT_ERROR(ct, "Head was modified %u != %u\n",
>>>>> +             desc->head, ctb->head);
>>>>> +        desc->status |= GUC_CTB_STATUS_MISMATCH;
>>>>> +        goto corrupted;
>>>>> +    }
>>>>> +    if (unlikely((desc->tail | desc->head) >= size)) {
>>>>> +        CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
>>>>> +             head, tail, size);
>>>>> +        desc->status |= GUC_CTB_STATUS_OVERFLOW;
>>>>> +        goto corrupted;
>>>>> +    }
>>>>> +#else
>>>>> +    if (unlikely((tail | ctb->head) >= size)) {
>>>> Could just be 'head' rather than 'ctb->head'.
>>> or drop "ctb->head" completely since this is driver owned field and
>>> above you already have BUGON to test it
>>>
>>> Michal
>>>
>>>> John.
>>>>
>>>>>             CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
>>>>>                  head, tail, size);
>>>>>             desc->status |= GUC_CTB_STATUS_OVERFLOW;
>>>>>             goto corrupted;
>>>>>         }
>>>>> +#endif
>>>>>           /* tail == head condition indicates empty */
>>>>>         available = tail - head;
>>>>> @@ -803,6 +832,7 @@ static int ct_read(struct intel_guc_ct *ct, struct
>>>>> ct_incoming_msg **msg)
>>>>>         }
>>>>>         CT_DEBUG(ct, "received %*ph\n", 4 * len, (*msg)->msg);
>>>>>     +    ctb->head = head;
>>>>>         /* now update descriptor */
>>>>>         WRITE_ONCE(desc->head, head);
>>>>>     diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>>>> index bee03794c1eb..edd1bba0445d 100644
>>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>>>> @@ -33,6 +33,9 @@ struct intel_guc;
>>>>>      * @desc: pointer to the buffer descriptor
>>>>>      * @cmds: pointer to the commands buffer
>>>>>      * @size: size of the commands buffer in dwords
>>>>> + * @head: local shadow copy of head in dwords
>>>>> + * @tail: local shadow copy of tail in dwords
>>>>> + * @space: local shadow copy of space in dwords
>>>>>      * @broken: flag to indicate if descriptor data is broken
>>>>>      */
>>>>>     struct intel_guc_ct_buffer {
>>>>> @@ -40,6 +43,9 @@ struct intel_guc_ct_buffer {
>>>>>         struct guc_ct_buffer_desc *desc;
>>>>>         u32 *cmds;
>>>>>         u32 size;
>>>>> +    u32 tail;
>>>>> +    u32 head;
>>>>> +    u32 space;
>>>>>         bool broken;
>>>>>     };
>>>>>     



* Re: [PATCH 6/7] drm/i915/guc: Optimize CTB writes and reads
  2021-07-06 19:33         ` Michal Wajdeczko
@ 2021-07-06 21:22           ` Matthew Brost
  2021-07-06 22:41           ` John Harrison
  1 sibling, 0 replies; 54+ messages in thread
From: Matthew Brost @ 2021-07-06 21:22 UTC (permalink / raw)
  To: Michal Wajdeczko; +Cc: intel-gfx, dri-devel, John Harrison

On Tue, Jul 06, 2021 at 09:33:23PM +0200, Michal Wajdeczko wrote:
> 
> 
> On 06.07.2021 21:19, John Harrison wrote:
> > On 7/6/2021 12:12, Michal Wajdeczko wrote:
> >> On 06.07.2021 21:00, John Harrison wrote:
> >>> On 7/1/2021 10:15, Matthew Brost wrote:
> >>>> CTB writes are now in the path of command submission and should be
> >>>> optimized for performance. Rather than reading CTB descriptor values
> >>>> (e.g. head, tail) which could result in accesses across the PCIe bus,
> >>>> store shadow local copies and only read/write the descriptor values
> >>>> when
> >>>> absolutely necessary. Also store the current space in each channel
> >>>> locally.
> >>>>
> >>>> v2:
> >>>>    (Michal)
> >>>>     - Add additional sanity checks for head / tail pointers
> >>>>     - Use GUC_CTB_HDR_LEN rather than magic 1
> >>>>
> >>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> >>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> >>>> ---
> >>>>    drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 88
> >>>> +++++++++++++++--------
> >>>>    drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  6 ++
> >>>>    2 files changed, 65 insertions(+), 29 deletions(-)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> >>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> >>>> index a9cb7b608520..5b8b4ff609e2 100644
> >>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> >>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> >>>> @@ -130,6 +130,10 @@ static void guc_ct_buffer_desc_init(struct
> >>>> guc_ct_buffer_desc *desc)
> >>>>    static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb)
> >>>>    {
> >>>>        ctb->broken = false;
> >>>> +    ctb->tail = 0;
> >>>> +    ctb->head = 0;
> >>>> +    ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
> >>>> +
> >>>>        guc_ct_buffer_desc_init(ctb->desc);
> >>>>    }
> >>>>    @@ -383,10 +387,8 @@ static int ct_write(struct intel_guc_ct *ct,
> >>>>    {
> >>>>        struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> >>>>        struct guc_ct_buffer_desc *desc = ctb->desc;
> >>>> -    u32 head = desc->head;
> >>>> -    u32 tail = desc->tail;
> >>>> +    u32 tail = ctb->tail;
> >>>>        u32 size = ctb->size;
> >>>> -    u32 used;
> >>>>        u32 header;
> >>>>        u32 hxg;
> >>>>        u32 *cmds = ctb->cmds;
> >>>> @@ -395,25 +397,22 @@ static int ct_write(struct intel_guc_ct *ct,
> >>>>        if (unlikely(desc->status))
> >>>>            goto corrupted;
> >>>>    -    if (unlikely((tail | head) >= size)) {
> >>>> +    GEM_BUG_ON(tail > size);
> >>>> +
> >>>> +#ifdef CONFIG_DRM_I915_DEBUG_GUC
> >>>> +    if (unlikely(tail != READ_ONCE(desc->tail))) {
> >>>> +        CT_ERROR(ct, "Tail was modified %u != %u\n",
> >>>> +             desc->tail, ctb->tail);
> >>>> +        desc->status |= GUC_CTB_STATUS_MISMATCH;
> >>>> +        goto corrupted;
> >>>> +    }
> >>>> +    if (unlikely((desc->tail | desc->head) >= size)) {
> >>>>            CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
> >>>> -             head, tail, size);
> >>>> +             desc->head, desc->tail, size);
> >>>>            desc->status |= GUC_CTB_STATUS_OVERFLOW;
> >>>>            goto corrupted;
> >>>>        }
> >>>> -
> >>>> -    /*
> >>>> -     * tail == head condition indicates empty. GuC FW does not support
> >>>> -     * using up the entire buffer to get tail == head meaning full.
> >>>> -     */
> >>>> -    if (tail < head)
> >>>> -        used = (size - head) + tail;
> >>>> -    else
> >>>> -        used = tail - head;
> >>>> -
> >>>> -    /* make sure there is a space including extra dw for the fence */
> >>>> -    if (unlikely(used + len + GUC_CTB_HDR_LEN >= size))
> >>>> -        return -ENOSPC;
> >>>> +#endif
> >>>>          /*
> >>>>         * dw0: CT header (including fence)
> >>>> @@ -454,7 +453,9 @@ static int ct_write(struct intel_guc_ct *ct,
> >>>>        write_barrier(ct);
> >>>>          /* now update descriptor */
> >>>> +    ctb->tail = tail;
> >>>>        WRITE_ONCE(desc->tail, tail);
> >>>> +    ctb->space -= len + GUC_CTB_HDR_LEN;
> >>>>          return 0;
> >>>>    @@ -470,7 +471,7 @@ static int ct_write(struct intel_guc_ct *ct,
> >>>>     * @req:    pointer to pending request
> >>>>     * @status:    placeholder for status
> >>>>     *
> >>>> - * For each sent request, Guc shall send bac CT response message.
> >>>> + * For each sent request, GuC shall send back CT response message.
> >>>>     * Our message handler will update status of tracked request once
> >>>>     * response message with given fence is received. Wait here and
> >>>>     * check for valid response status value.
> >>>> @@ -526,24 +527,35 @@ static inline bool ct_deadlocked(struct
> >>>> intel_guc_ct *ct)
> >>>>        return ret;
> >>>>    }
> >>>>    -static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb,
> >>>> u32 len_dw)
> >>>> +static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw)
> >>>>    {
> >>>> -    struct guc_ct_buffer_desc *desc = ctb->desc;
> >>>> -    u32 head = READ_ONCE(desc->head);
> >>>> +    struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> >>>> +    u32 head;
> >>>>        u32 space;
> >>>>    -    space = CIRC_SPACE(desc->tail, head, ctb->size);
> >>>> +    if (ctb->space >= len_dw)
> >>>> +        return true;
> >>>> +
> >>>> +    head = READ_ONCE(ctb->desc->head);
> >>>> +    if (unlikely(head > ctb->size)) {
> >>>> +        CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u size=%u\n",
> >>>> +             ctb->desc->head, ctb->desc->tail, ctb->size);
> >>>> +        ctb->desc->status |= GUC_CTB_STATUS_OVERFLOW;
> >>>> +        ctb->broken = true;
> >>>> +        return false;
> >>>> +    }
> >>>> +
> >>>> +    space = CIRC_SPACE(ctb->tail, head, ctb->size);
> >>>> +    ctb->space = space;
> >>>>          return space >= len_dw;
> >>>>    }
> >>>>      static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
> >>>>    {
> >>>> -    struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> >>>> -
> >>>>        lockdep_assert_held(&ct->ctbs.send.lock);
> >>>>    -    if (unlikely(!h2g_has_room(ctb, len_dw))) {
> >>>> +    if (unlikely(!h2g_has_room(ct, len_dw))) {
> >>>>            if (ct->stall_time == KTIME_MAX)
> >>>>                ct->stall_time = ktime_get();
> >>>>    @@ -613,7 +625,7 @@ static int ct_send(struct intel_guc_ct *ct,
> >>>>         */
> >>>>    retry:
> >>>>        spin_lock_irqsave(&ctb->lock, flags);
> >>>> -    if (unlikely(!h2g_has_room(ctb, len + GUC_CTB_HDR_LEN))) {
> >>>> +    if (unlikely(!h2g_has_room(ct, len + GUC_CTB_HDR_LEN))) {
> >>>>            if (ct->stall_time == KTIME_MAX)
> >>>>                ct->stall_time = ktime_get();
> >>>>            spin_unlock_irqrestore(&ctb->lock, flags);
> >>>> @@ -733,7 +745,7 @@ static int ct_read(struct intel_guc_ct *ct, struct
> >>>> ct_incoming_msg **msg)
> >>>>    {
> >>>>        struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv;
> >>>>        struct guc_ct_buffer_desc *desc = ctb->desc;
> >>>> -    u32 head = desc->head;
> >>>> +    u32 head = ctb->head;
> >>>>        u32 tail = desc->tail;
> >>>>        u32 size = ctb->size;
> >>>>        u32 *cmds = ctb->cmds;
> >>>> @@ -748,12 +760,29 @@ static int ct_read(struct intel_guc_ct *ct,
> >>>> struct ct_incoming_msg **msg)
> >>>>        if (unlikely(desc->status))
> >>>>            goto corrupted;
> >>>>    -    if (unlikely((tail | head) >= size)) {
> >>>> +    GEM_BUG_ON(head > size);

This is a driver-owned field, so I think a GEM_BUG_ON is correct: if this
blows the driver apart, we have a bug in i915.

> >>> Is the BUG_ON necessary given that both options below do the same check
> >>> but as a corrupted buffer test (with subsequent recovery by GT reset?)
> >>> rather than killing the driver.
> >> "head" and "size" are now fully owned by the driver.
> >> BUG_ON here is to make sure the driver is coded correctly.
> > The point is that both sides of the #if below also validate head. So
> 
> but not the same "head"
> 
> under DEBUG we are validating the one from the descriptor (together with
> tail) - and that should be recoverable since, if this fails, it was clearly
> not our fault.
> 
> but under non-DEBUG we were attempting to validate the local one again,
> pretending that this is recoverable, but it is not, as this is our fault
> (elsewhere in i915 we don't attempt to recover from obvious coding errors).
>
> > first there is a BUG_ON, then there is the same test but without blowing
> > the driver apart. One or the other is not required. My vote would be to
> > keep the recoverable test rather than the BUG_ON.
> 
> IMHO we should keep GEM_BUG_ON and drop the redundant check in non-DEBUG.
> 
> But let Matt decide.
>

I think I'll drop the testing of the head value and keep the BUG_ON.

Matt
 
> Michal
> 
> > 
> > John.
> > 
> >>
> >>>> +
> >>>> +#ifdef CONFIG_DRM_I915_DEBUG_GUC
> >>>> +    if (unlikely(head != READ_ONCE(desc->head))) {
> >>>> +        CT_ERROR(ct, "Head was modified %u != %u\n",
> >>>> +             desc->head, ctb->head);
> >>>> +        desc->status |= GUC_CTB_STATUS_MISMATCH;
> >>>> +        goto corrupted;
> >>>> +    }
> >>>> +    if (unlikely((desc->tail | desc->head) >= size)) {
> >>>> +        CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
> >>>> +             head, tail, size);
> >>>> +        desc->status |= GUC_CTB_STATUS_OVERFLOW;
> >>>> +        goto corrupted;
> >>>> +    }
> >>>> +#else
> >>>> +    if (unlikely((tail | ctb->head) >= size)) {
> >>> Could just be 'head' rather than 'ctb->head'.
> >> or drop "ctb->head" completely since this is driver owned field and
> >> above you already have BUGON to test it
> >>
> >> Michal
> >>
> >>> John.
> >>>
> >>>>            CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
> >>>>                 head, tail, size);
> >>>>            desc->status |= GUC_CTB_STATUS_OVERFLOW;
> >>>>            goto corrupted;
> >>>>        }
> >>>> +#endif
> >>>>          /* tail == head condition indicates empty */
> >>>>        available = tail - head;
> >>>> @@ -803,6 +832,7 @@ static int ct_read(struct intel_guc_ct *ct, struct
> >>>> ct_incoming_msg **msg)
> >>>>        }
> >>>>        CT_DEBUG(ct, "received %*ph\n", 4 * len, (*msg)->msg);
> >>>>    +    ctb->head = head;
> >>>>        /* now update descriptor */
> >>>>        WRITE_ONCE(desc->head, head);
> >>>>    diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> >>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> >>>> index bee03794c1eb..edd1bba0445d 100644
> >>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> >>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> >>>> @@ -33,6 +33,9 @@ struct intel_guc;
> >>>>     * @desc: pointer to the buffer descriptor
> >>>>     * @cmds: pointer to the commands buffer
> >>>>     * @size: size of the commands buffer in dwords
> >>>> + * @head: local shadow copy of head in dwords
> >>>> + * @tail: local shadow copy of tail in dwords
> >>>> + * @space: local shadow copy of space in dwords
> >>>>     * @broken: flag to indicate if descriptor data is broken
> >>>>     */
> >>>>    struct intel_guc_ct_buffer {
> >>>> @@ -40,6 +43,9 @@ struct intel_guc_ct_buffer {
> >>>>        struct guc_ct_buffer_desc *desc;
> >>>>        u32 *cmds;
> >>>>        u32 size;
> >>>> +    u32 tail;
> >>>> +    u32 head;
> >>>> +    u32 space;
> >>>>        bool broken;
> >>>>    };
> >>>>    
> > 

^ permalink raw reply	[flat|nested] 54+ messages in thread
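The core optimization the commit message describes - trust a driver-local cached `space` value and only read the descriptor head (which may be an access across the PCIe bus) when the cache says the ring is short - can be sketched standalone as below. The struct layout and names here are illustrative, not the driver's real types; the `CIRC_SPACE`/`CIRC_CNT` macros mirror `linux/circ_buf.h` and require a power-of-two ring size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors linux/circ_buf.h; 'size' must be a power of two. */
#define CIRC_CNT(head, tail, size)   (((head) - (tail)) & ((size) - 1))
#define CIRC_SPACE(head, tail, size) CIRC_CNT((tail), ((head) + 1), (size))

/* Illustrative stand-in for intel_guc_ct_buffer: one descriptor field
 * (notionally in shared memory, slow to read) plus local shadows. */
struct ctb {
	uint32_t desc_head; /* consumer index, owned by the firmware */
	uint32_t tail;      /* driver-local shadow of producer index */
	uint32_t space;     /* cached free space, in dwords          */
	uint32_t size;      /* ring size in dwords (power of two)    */
};

/* Lazy space check: trust the cached 'space' first and fall back to
 * the (costly) descriptor read only when the cache says we are short. */
static bool h2g_has_room(struct ctb *ctb, uint32_t len_dw)
{
	uint32_t head;

	if (ctb->space >= len_dw)
		return true;       /* no descriptor access needed */

	head = ctb->desc_head;     /* the one unavoidable slow read */
	if (head > ctb->size)
		return false;      /* corrupted descriptor */

	ctb->space = CIRC_SPACE(ctb->tail, head, ctb->size);
	return ctb->space >= len_dw;
}
```

In the common case where the ring is mostly empty, submission never touches the descriptor at all; the cache is refreshed only on a miss, which also picks up any progress the firmware's consumer made in the meantime.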

* Re: [PATCH 6/7] drm/i915/guc: Optimize CTB writes and reads
  2021-07-06 19:19       ` John Harrison
@ 2021-07-06 19:33         ` Michal Wajdeczko
  2021-07-06 21:22           ` Matthew Brost
  2021-07-06 22:41           ` John Harrison
  0 siblings, 2 replies; 54+ messages in thread
From: Michal Wajdeczko @ 2021-07-06 19:33 UTC (permalink / raw)
  To: John Harrison, Matthew Brost, intel-gfx, dri-devel



On 06.07.2021 21:19, John Harrison wrote:
> On 7/6/2021 12:12, Michal Wajdeczko wrote:
>> On 06.07.2021 21:00, John Harrison wrote:
>>> On 7/1/2021 10:15, Matthew Brost wrote:
>>>> CTB writes are now in the path of command submission and should be
>>>> optimized for performance. Rather than reading CTB descriptor values
>>>> (e.g. head, tail) which could result in accesses across the PCIe bus,
>>>> store shadow local copies and only read/write the descriptor values
>>>> when
> >>>> absolutely necessary. Also store the current space in each channel
>>>> locally.
>>>>
>>>> v2:
> >>>>    (Michal)
>>>>     - Add additional sanity checks for head / tail pointers
>>>>     - Use GUC_CTB_HDR_LEN rather than magic 1
>>>>
>>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>>> ---
>>>>    drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 88
>>>> +++++++++++++++--------
>>>>    drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  6 ++
>>>>    2 files changed, 65 insertions(+), 29 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>>> index a9cb7b608520..5b8b4ff609e2 100644
>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>>> @@ -130,6 +130,10 @@ static void guc_ct_buffer_desc_init(struct
>>>> guc_ct_buffer_desc *desc)
>>>>    static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb)
>>>>    {
>>>>        ctb->broken = false;
>>>> +    ctb->tail = 0;
>>>> +    ctb->head = 0;
>>>> +    ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
>>>> +
>>>>        guc_ct_buffer_desc_init(ctb->desc);
>>>>    }
>>>>    @@ -383,10 +387,8 @@ static int ct_write(struct intel_guc_ct *ct,
>>>>    {
>>>>        struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>>>>        struct guc_ct_buffer_desc *desc = ctb->desc;
>>>> -    u32 head = desc->head;
>>>> -    u32 tail = desc->tail;
>>>> +    u32 tail = ctb->tail;
>>>>        u32 size = ctb->size;
>>>> -    u32 used;
>>>>        u32 header;
>>>>        u32 hxg;
>>>>        u32 *cmds = ctb->cmds;
>>>> @@ -395,25 +397,22 @@ static int ct_write(struct intel_guc_ct *ct,
>>>>        if (unlikely(desc->status))
>>>>            goto corrupted;
>>>>    -    if (unlikely((tail | head) >= size)) {
>>>> +    GEM_BUG_ON(tail > size);
>>>> +
>>>> +#ifdef CONFIG_DRM_I915_DEBUG_GUC
>>>> +    if (unlikely(tail != READ_ONCE(desc->tail))) {
>>>> +        CT_ERROR(ct, "Tail was modified %u != %u\n",
>>>> +             desc->tail, ctb->tail);
>>>> +        desc->status |= GUC_CTB_STATUS_MISMATCH;
>>>> +        goto corrupted;
>>>> +    }
>>>> +    if (unlikely((desc->tail | desc->head) >= size)) {
>>>>            CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
>>>> -             head, tail, size);
>>>> +             desc->head, desc->tail, size);
>>>>            desc->status |= GUC_CTB_STATUS_OVERFLOW;
>>>>            goto corrupted;
>>>>        }
>>>> -
>>>> -    /*
>>>> -     * tail == head condition indicates empty. GuC FW does not support
>>>> -     * using up the entire buffer to get tail == head meaning full.
>>>> -     */
>>>> -    if (tail < head)
>>>> -        used = (size - head) + tail;
>>>> -    else
>>>> -        used = tail - head;
>>>> -
>>>> -    /* make sure there is a space including extra dw for the fence */
>>>> -    if (unlikely(used + len + GUC_CTB_HDR_LEN >= size))
>>>> -        return -ENOSPC;
>>>> +#endif
>>>>          /*
>>>>         * dw0: CT header (including fence)
>>>> @@ -454,7 +453,9 @@ static int ct_write(struct intel_guc_ct *ct,
>>>>        write_barrier(ct);
>>>>          /* now update descriptor */
>>>> +    ctb->tail = tail;
>>>>        WRITE_ONCE(desc->tail, tail);
>>>> +    ctb->space -= len + GUC_CTB_HDR_LEN;
>>>>          return 0;
>>>>    @@ -470,7 +471,7 @@ static int ct_write(struct intel_guc_ct *ct,
>>>>     * @req:    pointer to pending request
>>>>     * @status:    placeholder for status
>>>>     *
>>>> - * For each sent request, Guc shall send bac CT response message.
>>>> + * For each sent request, GuC shall send back CT response message.
>>>>     * Our message handler will update status of tracked request once
>>>>     * response message with given fence is received. Wait here and
>>>>     * check for valid response status value.
>>>> @@ -526,24 +527,35 @@ static inline bool ct_deadlocked(struct
>>>> intel_guc_ct *ct)
>>>>        return ret;
>>>>    }
>>>>    -static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb,
>>>> u32 len_dw)
>>>> +static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw)
>>>>    {
>>>> -    struct guc_ct_buffer_desc *desc = ctb->desc;
>>>> -    u32 head = READ_ONCE(desc->head);
>>>> +    struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>>>> +    u32 head;
>>>>        u32 space;
>>>>    -    space = CIRC_SPACE(desc->tail, head, ctb->size);
>>>> +    if (ctb->space >= len_dw)
>>>> +        return true;
>>>> +
>>>> +    head = READ_ONCE(ctb->desc->head);
>>>> +    if (unlikely(head > ctb->size)) {
>>>> +        CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u size=%u\n",
>>>> +             ctb->desc->head, ctb->desc->tail, ctb->size);
>>>> +        ctb->desc->status |= GUC_CTB_STATUS_OVERFLOW;
>>>> +        ctb->broken = true;
>>>> +        return false;
>>>> +    }
>>>> +
>>>> +    space = CIRC_SPACE(ctb->tail, head, ctb->size);
>>>> +    ctb->space = space;
>>>>          return space >= len_dw;
>>>>    }
>>>>      static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
>>>>    {
>>>> -    struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>>>> -
>>>>        lockdep_assert_held(&ct->ctbs.send.lock);
>>>>    -    if (unlikely(!h2g_has_room(ctb, len_dw))) {
>>>> +    if (unlikely(!h2g_has_room(ct, len_dw))) {
>>>>            if (ct->stall_time == KTIME_MAX)
>>>>                ct->stall_time = ktime_get();
>>>>    @@ -613,7 +625,7 @@ static int ct_send(struct intel_guc_ct *ct,
>>>>         */
>>>>    retry:
>>>>        spin_lock_irqsave(&ctb->lock, flags);
>>>> -    if (unlikely(!h2g_has_room(ctb, len + GUC_CTB_HDR_LEN))) {
>>>> +    if (unlikely(!h2g_has_room(ct, len + GUC_CTB_HDR_LEN))) {
>>>>            if (ct->stall_time == KTIME_MAX)
>>>>                ct->stall_time = ktime_get();
>>>>            spin_unlock_irqrestore(&ctb->lock, flags);
>>>> @@ -733,7 +745,7 @@ static int ct_read(struct intel_guc_ct *ct, struct
>>>> ct_incoming_msg **msg)
>>>>    {
>>>>        struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv;
>>>>        struct guc_ct_buffer_desc *desc = ctb->desc;
>>>> -    u32 head = desc->head;
>>>> +    u32 head = ctb->head;
>>>>        u32 tail = desc->tail;
>>>>        u32 size = ctb->size;
>>>>        u32 *cmds = ctb->cmds;
>>>> @@ -748,12 +760,29 @@ static int ct_read(struct intel_guc_ct *ct,
>>>> struct ct_incoming_msg **msg)
>>>>        if (unlikely(desc->status))
>>>>            goto corrupted;
>>>>    -    if (unlikely((tail | head) >= size)) {
>>>> +    GEM_BUG_ON(head > size);
>>> Is the BUG_ON necessary given that both options below do the same check
>>> but as a corrupted buffer test (with subsequent recovery by GT reset?)
>>> rather than killing the driver.
>> "head" and "size" are now fully owned by the driver.
>> BUG_ON here is to make sure the driver is coded correctly.
> The point is that both sides of the #if below also validate head. So

but not the same "head"

under DEBUG we are validating the one from the descriptor (together with
tail) - and that should be recoverable since, if this fails, it was clearly
not our fault.

but under non-DEBUG we were attempting to validate the local one again,
pretending that this is recoverable, but it is not, as this is our fault
(elsewhere in i915 we don't attempt to recover from obvious coding errors).

> first there is a BUG_ON, then there is the same test but without blowing
> the driver apart. One or the other is not required. My vote would be to
> keep the recoverable test rather than the BUG_ON.

IMHO we should keep GEM_BUG_ON and drop the redundant check in non-DEBUG.

But let Matt decide.

Michal

> 
> John.
> 
>>
>>>> +
>>>> +#ifdef CONFIG_DRM_I915_DEBUG_GUC
>>>> +    if (unlikely(head != READ_ONCE(desc->head))) {
>>>> +        CT_ERROR(ct, "Head was modified %u != %u\n",
>>>> +             desc->head, ctb->head);
>>>> +        desc->status |= GUC_CTB_STATUS_MISMATCH;
>>>> +        goto corrupted;
>>>> +    }
>>>> +    if (unlikely((desc->tail | desc->head) >= size)) {
>>>> +        CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
>>>> +             head, tail, size);
>>>> +        desc->status |= GUC_CTB_STATUS_OVERFLOW;
>>>> +        goto corrupted;
>>>> +    }
>>>> +#else
>>>> +    if (unlikely((tail | ctb->head) >= size)) {
>>> Could just be 'head' rather than 'ctb->head'.
>> or drop "ctb->head" completely since this is driver owned field and
>> above you already have BUGON to test it
>>
>> Michal
>>
>>> John.
>>>
>>>>            CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
>>>>                 head, tail, size);
>>>>            desc->status |= GUC_CTB_STATUS_OVERFLOW;
>>>>            goto corrupted;
>>>>        }
>>>> +#endif
>>>>          /* tail == head condition indicates empty */
>>>>        available = tail - head;
>>>> @@ -803,6 +832,7 @@ static int ct_read(struct intel_guc_ct *ct, struct
>>>> ct_incoming_msg **msg)
>>>>        }
>>>>        CT_DEBUG(ct, "received %*ph\n", 4 * len, (*msg)->msg);
>>>>    +    ctb->head = head;
>>>>        /* now update descriptor */
>>>>        WRITE_ONCE(desc->head, head);
>>>>    diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>>> index bee03794c1eb..edd1bba0445d 100644
>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>>> @@ -33,6 +33,9 @@ struct intel_guc;
>>>>     * @desc: pointer to the buffer descriptor
>>>>     * @cmds: pointer to the commands buffer
>>>>     * @size: size of the commands buffer in dwords
>>>> + * @head: local shadow copy of head in dwords
>>>> + * @tail: local shadow copy of tail in dwords
>>>> + * @space: local shadow copy of space in dwords
>>>>     * @broken: flag to indicate if descriptor data is broken
>>>>     */
>>>>    struct intel_guc_ct_buffer {
>>>> @@ -40,6 +43,9 @@ struct intel_guc_ct_buffer {
>>>>        struct guc_ct_buffer_desc *desc;
>>>>        u32 *cmds;
>>>>        u32 size;
>>>> +    u32 tail;
>>>> +    u32 head;
>>>> +    u32 space;
>>>>        bool broken;
>>>>    };
>>>>    
> 

^ permalink raw reply	[flat|nested] 54+ messages in thread
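On the write side, the patch touches the shared descriptor only once per message: the payload is copied at the shadow tail, then the new tail is published and the cached space is debited by the message length plus the header. A minimal sketch of that bookkeeping follows; the struct and the single-dword header constant are illustrative stand-ins (the real driver uses `GUC_CTB_HDR_LEN` and `WRITE_ONCE` for the descriptor update).

```c
#include <assert.h>
#include <stdint.h>

#define HDR_LEN 1u /* one dword of CT header, like GUC_CTB_HDR_LEN */

/* Illustrative stand-in: one published descriptor field plus
 * driver-local shadows of the producer state. */
struct ctb {
	uint32_t desc_tail; /* producer index published to firmware */
	uint32_t tail;      /* local shadow of the producer index   */
	uint32_t space;     /* cached free space, in dwords         */
	uint32_t size;      /* ring size in dwords (power of two)   */
};

/* Reserve len_dw payload dwords plus the header: advance the shadow
 * tail, publish it with a single descriptor write, and debit the
 * cached space. The caller has already checked there is room. */
static void ct_write(struct ctb *ctb, uint32_t len_dw)
{
	uint32_t tail = ctb->tail;

	/* ... payload dwords would be copied into the ring at 'tail' ... */
	tail = (tail + len_dw + HDR_LEN) & (ctb->size - 1);

	ctb->tail = tail;      /* update the shadow first          */
	ctb->desc_tail = tail; /* one posted write over the bus    */
	ctb->space -= len_dw + HDR_LEN;
}
```

Because the shadow and the cache are updated together under the channel lock, later space checks can run without rereading either descriptor field.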

* Re: [PATCH 6/7] drm/i915/guc: Optimize CTB writes and reads
  2021-07-06 19:12     ` Michal Wajdeczko
@ 2021-07-06 19:19       ` John Harrison
  2021-07-06 19:33         ` Michal Wajdeczko
  0 siblings, 1 reply; 54+ messages in thread
From: John Harrison @ 2021-07-06 19:19 UTC (permalink / raw)
  To: Michal Wajdeczko, Matthew Brost, intel-gfx, dri-devel

On 7/6/2021 12:12, Michal Wajdeczko wrote:
> On 06.07.2021 21:00, John Harrison wrote:
>> On 7/1/2021 10:15, Matthew Brost wrote:
>>> CTB writes are now in the path of command submission and should be
>>> optimized for performance. Rather than reading CTB descriptor values
>>> (e.g. head, tail) which could result in accesses across the PCIe bus,
>>> store shadow local copies and only read/write the descriptor values when
> >>> absolutely necessary. Also store the current space in each channel
>>> locally.
>>>
>>> v2:
> >>>    (Michal)
>>>     - Add additional sanity checks for head / tail pointers
>>>     - Use GUC_CTB_HDR_LEN rather than magic 1
>>>
>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 88 +++++++++++++++--------
>>>    drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  6 ++
>>>    2 files changed, 65 insertions(+), 29 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> index a9cb7b608520..5b8b4ff609e2 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> @@ -130,6 +130,10 @@ static void guc_ct_buffer_desc_init(struct
>>> guc_ct_buffer_desc *desc)
>>>    static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb)
>>>    {
>>>        ctb->broken = false;
>>> +    ctb->tail = 0;
>>> +    ctb->head = 0;
>>> +    ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
>>> +
>>>        guc_ct_buffer_desc_init(ctb->desc);
>>>    }
>>>    @@ -383,10 +387,8 @@ static int ct_write(struct intel_guc_ct *ct,
>>>    {
>>>        struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>>>        struct guc_ct_buffer_desc *desc = ctb->desc;
>>> -    u32 head = desc->head;
>>> -    u32 tail = desc->tail;
>>> +    u32 tail = ctb->tail;
>>>        u32 size = ctb->size;
>>> -    u32 used;
>>>        u32 header;
>>>        u32 hxg;
>>>        u32 *cmds = ctb->cmds;
>>> @@ -395,25 +397,22 @@ static int ct_write(struct intel_guc_ct *ct,
>>>        if (unlikely(desc->status))
>>>            goto corrupted;
>>>    -    if (unlikely((tail | head) >= size)) {
>>> +    GEM_BUG_ON(tail > size);
>>> +
>>> +#ifdef CONFIG_DRM_I915_DEBUG_GUC
>>> +    if (unlikely(tail != READ_ONCE(desc->tail))) {
>>> +        CT_ERROR(ct, "Tail was modified %u != %u\n",
>>> +             desc->tail, ctb->tail);
>>> +        desc->status |= GUC_CTB_STATUS_MISMATCH;
>>> +        goto corrupted;
>>> +    }
>>> +    if (unlikely((desc->tail | desc->head) >= size)) {
>>>            CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
>>> -             head, tail, size);
>>> +             desc->head, desc->tail, size);
>>>            desc->status |= GUC_CTB_STATUS_OVERFLOW;
>>>            goto corrupted;
>>>        }
>>> -
>>> -    /*
>>> -     * tail == head condition indicates empty. GuC FW does not support
>>> -     * using up the entire buffer to get tail == head meaning full.
>>> -     */
>>> -    if (tail < head)
>>> -        used = (size - head) + tail;
>>> -    else
>>> -        used = tail - head;
>>> -
>>> -    /* make sure there is a space including extra dw for the fence */
>>> -    if (unlikely(used + len + GUC_CTB_HDR_LEN >= size))
>>> -        return -ENOSPC;
>>> +#endif
>>>          /*
>>>         * dw0: CT header (including fence)
>>> @@ -454,7 +453,9 @@ static int ct_write(struct intel_guc_ct *ct,
>>>        write_barrier(ct);
>>>          /* now update descriptor */
>>> +    ctb->tail = tail;
>>>        WRITE_ONCE(desc->tail, tail);
>>> +    ctb->space -= len + GUC_CTB_HDR_LEN;
>>>          return 0;
>>>    @@ -470,7 +471,7 @@ static int ct_write(struct intel_guc_ct *ct,
>>>     * @req:    pointer to pending request
>>>     * @status:    placeholder for status
>>>     *
>>> - * For each sent request, Guc shall send bac CT response message.
>>> + * For each sent request, GuC shall send back CT response message.
>>>     * Our message handler will update status of tracked request once
>>>     * response message with given fence is received. Wait here and
>>>     * check for valid response status value.
>>> @@ -526,24 +527,35 @@ static inline bool ct_deadlocked(struct
>>> intel_guc_ct *ct)
>>>        return ret;
>>>    }
>>>    -static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb,
>>> u32 len_dw)
>>> +static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw)
>>>    {
>>> -    struct guc_ct_buffer_desc *desc = ctb->desc;
>>> -    u32 head = READ_ONCE(desc->head);
>>> +    struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>>> +    u32 head;
>>>        u32 space;
>>>    -    space = CIRC_SPACE(desc->tail, head, ctb->size);
>>> +    if (ctb->space >= len_dw)
>>> +        return true;
>>> +
>>> +    head = READ_ONCE(ctb->desc->head);
>>> +    if (unlikely(head > ctb->size)) {
>>> +        CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u size=%u\n",
>>> +             ctb->desc->head, ctb->desc->tail, ctb->size);
>>> +        ctb->desc->status |= GUC_CTB_STATUS_OVERFLOW;
>>> +        ctb->broken = true;
>>> +        return false;
>>> +    }
>>> +
>>> +    space = CIRC_SPACE(ctb->tail, head, ctb->size);
>>> +    ctb->space = space;
>>>          return space >= len_dw;
>>>    }
>>>      static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
>>>    {
>>> -    struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>>> -
>>>        lockdep_assert_held(&ct->ctbs.send.lock);
>>>    -    if (unlikely(!h2g_has_room(ctb, len_dw))) {
>>> +    if (unlikely(!h2g_has_room(ct, len_dw))) {
>>>            if (ct->stall_time == KTIME_MAX)
>>>                ct->stall_time = ktime_get();
>>>    @@ -613,7 +625,7 @@ static int ct_send(struct intel_guc_ct *ct,
>>>         */
>>>    retry:
>>>        spin_lock_irqsave(&ctb->lock, flags);
>>> -    if (unlikely(!h2g_has_room(ctb, len + GUC_CTB_HDR_LEN))) {
>>> +    if (unlikely(!h2g_has_room(ct, len + GUC_CTB_HDR_LEN))) {
>>>            if (ct->stall_time == KTIME_MAX)
>>>                ct->stall_time = ktime_get();
>>>            spin_unlock_irqrestore(&ctb->lock, flags);
>>> @@ -733,7 +745,7 @@ static int ct_read(struct intel_guc_ct *ct, struct
>>> ct_incoming_msg **msg)
>>>    {
>>>        struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv;
>>>        struct guc_ct_buffer_desc *desc = ctb->desc;
>>> -    u32 head = desc->head;
>>> +    u32 head = ctb->head;
>>>        u32 tail = desc->tail;
>>>        u32 size = ctb->size;
>>>        u32 *cmds = ctb->cmds;
>>> @@ -748,12 +760,29 @@ static int ct_read(struct intel_guc_ct *ct,
>>> struct ct_incoming_msg **msg)
>>>        if (unlikely(desc->status))
>>>            goto corrupted;
>>>    -    if (unlikely((tail | head) >= size)) {
>>> +    GEM_BUG_ON(head > size);
>> Is the BUG_ON necessary given that both options below do the same check
>> but as a corrupted buffer test (with subsequent recovery by GT reset?)
>> rather than killing the driver.
> "head" and "size" are now fully owned by the driver.
> BUG_ON here is to make sure the driver is coded correctly.
The point is that both sides of the #if below also validate head. So 
first there is a BUG_ON, then there is the same test but without blowing 
the driver apart. One or the other is not required. My vote would be to 
keep the recoverable test rather than the BUG_ON.

John.

>
>>> +
>>> +#ifdef CONFIG_DRM_I915_DEBUG_GUC
>>> +    if (unlikely(head != READ_ONCE(desc->head))) {
>>> +        CT_ERROR(ct, "Head was modified %u != %u\n",
>>> +             desc->head, ctb->head);
>>> +        desc->status |= GUC_CTB_STATUS_MISMATCH;
>>> +        goto corrupted;
>>> +    }
>>> +    if (unlikely((desc->tail | desc->head) >= size)) {
>>> +        CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
>>> +             head, tail, size);
>>> +        desc->status |= GUC_CTB_STATUS_OVERFLOW;
>>> +        goto corrupted;
>>> +    }
>>> +#else
>>> +    if (unlikely((tail | ctb->head) >= size)) {
>> Could just be 'head' rather than 'ctb->head'.
> or drop "ctb->head" completely since this is driver owned field and
> above you already have BUGON to test it
>
> Michal
>
>> John.
>>
>>>            CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
>>>                 head, tail, size);
>>>            desc->status |= GUC_CTB_STATUS_OVERFLOW;
>>>            goto corrupted;
>>>        }
>>> +#endif
>>>          /* tail == head condition indicates empty */
>>>        available = tail - head;
>>> @@ -803,6 +832,7 @@ static int ct_read(struct intel_guc_ct *ct, struct
>>> ct_incoming_msg **msg)
>>>        }
>>>        CT_DEBUG(ct, "received %*ph\n", 4 * len, (*msg)->msg);
>>>    +    ctb->head = head;
>>>        /* now update descriptor */
>>>        WRITE_ONCE(desc->head, head);
>>>    diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>> index bee03794c1eb..edd1bba0445d 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>> @@ -33,6 +33,9 @@ struct intel_guc;
>>>     * @desc: pointer to the buffer descriptor
>>>     * @cmds: pointer to the commands buffer
>>>     * @size: size of the commands buffer in dwords
>>> + * @head: local shadow copy of head in dwords
>>> + * @tail: local shadow copy of tail in dwords
>>> + * @space: local shadow copy of space in dwords
>>>     * @broken: flag to indicate if descriptor data is broken
>>>     */
>>>    struct intel_guc_ct_buffer {
>>> @@ -40,6 +43,9 @@ struct intel_guc_ct_buffer {
>>>        struct guc_ct_buffer_desc *desc;
>>>        u32 *cmds;
>>>        u32 size;
>>> +    u32 tail;
>>> +    u32 head;
>>> +    u32 space;
>>>        bool broken;
>>>    };
>>>    


^ permalink raw reply	[flat|nested] 54+ messages in thread
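The conclusion the thread converges on - assert on driver-owned state, but validate and recover from firmware-owned state - can be summarized in a short sketch. Names here are illustrative; `assert` stands in for `GEM_BUG_ON`, and the `broken` flag models the driver's "mark the channel corrupted and recover via reset" path for the G2H (receive) direction, where the firmware owns the tail and the driver owns its shadow head.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct ctb {
	uint32_t desc_tail; /* producer index, owned by the firmware (G2H) */
	uint32_t head;      /* driver-owned shadow of the consumer index   */
	uint32_t size;      /* ring size in dwords                         */
	bool broken;        /* set when descriptor data cannot be trusted  */
};

/* Returns true when it is safe to read messages out of the ring. */
static bool ct_read_precheck(struct ctb *ctb)
{
	/* Driver-owned shadow: out of range means an i915 coding bug,
	 * so fail loudly (GEM_BUG_ON in the real driver). */
	assert(ctb->head <= ctb->size);

	/* Firmware-owned descriptor field: a bad value is not our
	 * fault, so flag the channel as broken and let a reset
	 * recover instead of crashing the kernel. */
	if (ctb->desc_tail >= ctb->size) {
		ctb->broken = true;
		return false;
	}
	return true;
}
```

The split keeps the cheap assertion for invariants only the driver can violate, while anything the other side of the bus can corrupt stays on a recoverable error path.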

* Re: [PATCH 6/7] drm/i915/guc: Optimize CTB writes and reads
  2021-07-06 19:00   ` John Harrison
@ 2021-07-06 19:12     ` Michal Wajdeczko
  2021-07-06 19:19       ` John Harrison
  0 siblings, 1 reply; 54+ messages in thread
From: Michal Wajdeczko @ 2021-07-06 19:12 UTC (permalink / raw)
  To: John Harrison, Matthew Brost, intel-gfx, dri-devel



On 06.07.2021 21:00, John Harrison wrote:
> On 7/1/2021 10:15, Matthew Brost wrote:
>> CTB writes are now in the path of command submission and should be
>> optimized for performance. Rather than reading CTB descriptor values
>> (e.g. head, tail) which could result in accesses across the PCIe bus,
>> store shadow local copies and only read/write the descriptor values when
> >> absolutely necessary. Also store the current space in each channel
>> locally.
>>
>> v2:
> >>   (Michal)
>>    - Add additional sanity checks for head / tail pointers
>>    - Use GUC_CTB_HDR_LEN rather than magic 1
>>
>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>> ---
>>   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 88 +++++++++++++++--------
>>   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  6 ++
>>   2 files changed, 65 insertions(+), 29 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>> index a9cb7b608520..5b8b4ff609e2 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>> @@ -130,6 +130,10 @@ static void guc_ct_buffer_desc_init(struct
>> guc_ct_buffer_desc *desc)
>>   static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb)
>>   {
>>       ctb->broken = false;
>> +    ctb->tail = 0;
>> +    ctb->head = 0;
>> +    ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
>> +
>>       guc_ct_buffer_desc_init(ctb->desc);
>>   }
>>   @@ -383,10 +387,8 @@ static int ct_write(struct intel_guc_ct *ct,
>>   {
>>       struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>>       struct guc_ct_buffer_desc *desc = ctb->desc;
>> -    u32 head = desc->head;
>> -    u32 tail = desc->tail;
>> +    u32 tail = ctb->tail;
>>       u32 size = ctb->size;
>> -    u32 used;
>>       u32 header;
>>       u32 hxg;
>>       u32 *cmds = ctb->cmds;
>> @@ -395,25 +397,22 @@ static int ct_write(struct intel_guc_ct *ct,
>>       if (unlikely(desc->status))
>>           goto corrupted;
>>   -    if (unlikely((tail | head) >= size)) {
>> +    GEM_BUG_ON(tail > size);
>> +
>> +#ifdef CONFIG_DRM_I915_DEBUG_GUC
>> +    if (unlikely(tail != READ_ONCE(desc->tail))) {
>> +        CT_ERROR(ct, "Tail was modified %u != %u\n",
>> +             desc->tail, ctb->tail);
>> +        desc->status |= GUC_CTB_STATUS_MISMATCH;
>> +        goto corrupted;
>> +    }
>> +    if (unlikely((desc->tail | desc->head) >= size)) {
>>           CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
>> -             head, tail, size);
>> +             desc->head, desc->tail, size);
>>           desc->status |= GUC_CTB_STATUS_OVERFLOW;
>>           goto corrupted;
>>       }
>> -
>> -    /*
>> -     * tail == head condition indicates empty. GuC FW does not support
>> -     * using up the entire buffer to get tail == head meaning full.
>> -     */
>> -    if (tail < head)
>> -        used = (size - head) + tail;
>> -    else
>> -        used = tail - head;
>> -
>> -    /* make sure there is a space including extra dw for the fence */
>> -    if (unlikely(used + len + GUC_CTB_HDR_LEN >= size))
>> -        return -ENOSPC;
>> +#endif
>>         /*
>>        * dw0: CT header (including fence)
>> @@ -454,7 +453,9 @@ static int ct_write(struct intel_guc_ct *ct,
>>       write_barrier(ct);
>>         /* now update descriptor */
>> +    ctb->tail = tail;
>>       WRITE_ONCE(desc->tail, tail);
>> +    ctb->space -= len + GUC_CTB_HDR_LEN;
>>         return 0;
>>   @@ -470,7 +471,7 @@ static int ct_write(struct intel_guc_ct *ct,
>>    * @req:    pointer to pending request
>>    * @status:    placeholder for status
>>    *
>> - * For each sent request, Guc shall send bac CT response message.
>> + * For each sent request, GuC shall send back CT response message.
>>    * Our message handler will update status of tracked request once
>>    * response message with given fence is received. Wait here and
>>    * check for valid response status value.
>> @@ -526,24 +527,35 @@ static inline bool ct_deadlocked(struct
>> intel_guc_ct *ct)
>>       return ret;
>>   }
>>   -static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb,
>> u32 len_dw)
>> +static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw)
>>   {
>> -    struct guc_ct_buffer_desc *desc = ctb->desc;
>> -    u32 head = READ_ONCE(desc->head);
>> +    struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>> +    u32 head;
>>       u32 space;
>>   -    space = CIRC_SPACE(desc->tail, head, ctb->size);
>> +    if (ctb->space >= len_dw)
>> +        return true;
>> +
>> +    head = READ_ONCE(ctb->desc->head);
>> +    if (unlikely(head > ctb->size)) {
>> +        CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u size=%u\n",
>> +             ctb->desc->head, ctb->desc->tail, ctb->size);
>> +        ctb->desc->status |= GUC_CTB_STATUS_OVERFLOW;
>> +        ctb->broken = true;
>> +        return false;
>> +    }
>> +
>> +    space = CIRC_SPACE(ctb->tail, head, ctb->size);
>> +    ctb->space = space;
>>         return space >= len_dw;
>>   }
>>     static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
>>   {
>> -    struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>> -
>>       lockdep_assert_held(&ct->ctbs.send.lock);
>>   -    if (unlikely(!h2g_has_room(ctb, len_dw))) {
>> +    if (unlikely(!h2g_has_room(ct, len_dw))) {
>>           if (ct->stall_time == KTIME_MAX)
>>               ct->stall_time = ktime_get();
>>   @@ -613,7 +625,7 @@ static int ct_send(struct intel_guc_ct *ct,
>>        */
>>   retry:
>>       spin_lock_irqsave(&ctb->lock, flags);
>> -    if (unlikely(!h2g_has_room(ctb, len + GUC_CTB_HDR_LEN))) {
>> +    if (unlikely(!h2g_has_room(ct, len + GUC_CTB_HDR_LEN))) {
>>           if (ct->stall_time == KTIME_MAX)
>>               ct->stall_time = ktime_get();
>>           spin_unlock_irqrestore(&ctb->lock, flags);
>> @@ -733,7 +745,7 @@ static int ct_read(struct intel_guc_ct *ct, struct
>> ct_incoming_msg **msg)
>>   {
>>       struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv;
>>       struct guc_ct_buffer_desc *desc = ctb->desc;
>> -    u32 head = desc->head;
>> +    u32 head = ctb->head;
>>       u32 tail = desc->tail;
>>       u32 size = ctb->size;
>>       u32 *cmds = ctb->cmds;
>> @@ -748,12 +760,29 @@ static int ct_read(struct intel_guc_ct *ct,
>> struct ct_incoming_msg **msg)
>>       if (unlikely(desc->status))
>>           goto corrupted;
>>   -    if (unlikely((tail | head) >= size)) {
>> +    GEM_BUG_ON(head > size);
> Is the BUG_ON necessary, given that both options below do the same check
> but as a corrupted-buffer test (with subsequent recovery by GT reset)
> rather than killing the driver?

"head" and "size" are now fully owned by the driver.
The BUG_ON here is to make sure the driver is coded correctly.

> 
>> +
>> +#ifdef CONFIG_DRM_I915_DEBUG_GUC
>> +    if (unlikely(head != READ_ONCE(desc->head))) {
>> +        CT_ERROR(ct, "Head was modified %u != %u\n",
>> +             desc->head, ctb->head);
>> +        desc->status |= GUC_CTB_STATUS_MISMATCH;
>> +        goto corrupted;
>> +    }
>> +    if (unlikely((desc->tail | desc->head) >= size)) {
>> +        CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
>> +             head, tail, size);
>> +        desc->status |= GUC_CTB_STATUS_OVERFLOW;
>> +        goto corrupted;
>> +    }
>> +#else
>> +    if (unlikely((tail | ctb->head) >= size)) {
> Could just be 'head' rather than 'ctb->head'.

or drop "ctb->head" completely, since this is a driver-owned field and
you already have a BUG_ON above to test it

Michal

> 
> John.
> 
>>           CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
>>                head, tail, size);
>>           desc->status |= GUC_CTB_STATUS_OVERFLOW;
>>           goto corrupted;
>>       }
>> +#endif
>>         /* tail == head condition indicates empty */
>>       available = tail - head;
>> @@ -803,6 +832,7 @@ static int ct_read(struct intel_guc_ct *ct, struct
>> ct_incoming_msg **msg)
>>       }
>>       CT_DEBUG(ct, "received %*ph\n", 4 * len, (*msg)->msg);
>>   +    ctb->head = head;
>>       /* now update descriptor */
>>       WRITE_ONCE(desc->head, head);
>>   diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>> index bee03794c1eb..edd1bba0445d 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>> @@ -33,6 +33,9 @@ struct intel_guc;
>>    * @desc: pointer to the buffer descriptor
>>    * @cmds: pointer to the commands buffer
>>    * @size: size of the commands buffer in dwords
>> + * @head: local shadow copy of head in dwords
>> + * @tail: local shadow copy of tail in dwords
>> + * @space: local shadow copy of space in dwords
>>    * @broken: flag to indicate if descriptor data is broken
>>    */
>>   struct intel_guc_ct_buffer {
>> @@ -40,6 +43,9 @@ struct intel_guc_ct_buffer {
>>       struct guc_ct_buffer_desc *desc;
>>       u32 *cmds;
>>       u32 size;
>> +    u32 tail;
>> +    u32 head;
>> +    u32 space;
>>       bool broken;
>>   };
>>   
> 

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 6/7] drm/i915/guc: Optimize CTB writes and reads
  2021-07-01 17:15 ` [PATCH 6/7] drm/i915/guc: Optimize CTB writes and reads Matthew Brost
@ 2021-07-06 19:00   ` John Harrison
  2021-07-06 19:12     ` Michal Wajdeczko
  0 siblings, 1 reply; 54+ messages in thread
From: John Harrison @ 2021-07-06 19:00 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel; +Cc: Michal.Wajdeczko

On 7/1/2021 10:15, Matthew Brost wrote:
> CTB writes are now in the path of command submission and should be
> optimized for performance. Rather than reading CTB descriptor values
> (e.g. head, tail) which could result in accesses across the PCIe bus,
> store shadow local copies and only read/write the descriptor values when
> absolutely necessary. Also store the current space in each channel
> locally.
>
> v2:
>   (Michal)
>    - Add additional sanity checks for head / tail pointers
>    - Use GUC_CTB_HDR_LEN rather than magic 1
>
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 88 +++++++++++++++--------
>   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  6 ++
>   2 files changed, 65 insertions(+), 29 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index a9cb7b608520..5b8b4ff609e2 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -130,6 +130,10 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc)
>   static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb)
>   {
>   	ctb->broken = false;
> +	ctb->tail = 0;
> +	ctb->head = 0;
> +	ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
> +
>   	guc_ct_buffer_desc_init(ctb->desc);
>   }
>   
> @@ -383,10 +387,8 @@ static int ct_write(struct intel_guc_ct *ct,
>   {
>   	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>   	struct guc_ct_buffer_desc *desc = ctb->desc;
> -	u32 head = desc->head;
> -	u32 tail = desc->tail;
> +	u32 tail = ctb->tail;
>   	u32 size = ctb->size;
> -	u32 used;
>   	u32 header;
>   	u32 hxg;
>   	u32 *cmds = ctb->cmds;
> @@ -395,25 +397,22 @@ static int ct_write(struct intel_guc_ct *ct,
>   	if (unlikely(desc->status))
>   		goto corrupted;
>   
> -	if (unlikely((tail | head) >= size)) {
> +	GEM_BUG_ON(tail > size);
> +
> +#ifdef CONFIG_DRM_I915_DEBUG_GUC
> +	if (unlikely(tail != READ_ONCE(desc->tail))) {
> +		CT_ERROR(ct, "Tail was modified %u != %u\n",
> +			 desc->tail, ctb->tail);
> +		desc->status |= GUC_CTB_STATUS_MISMATCH;
> +		goto corrupted;
> +	}
> +	if (unlikely((desc->tail | desc->head) >= size)) {
>   		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
> -			 head, tail, size);
> +			 desc->head, desc->tail, size);
>   		desc->status |= GUC_CTB_STATUS_OVERFLOW;
>   		goto corrupted;
>   	}
> -
> -	/*
> -	 * tail == head condition indicates empty. GuC FW does not support
> -	 * using up the entire buffer to get tail == head meaning full.
> -	 */
> -	if (tail < head)
> -		used = (size - head) + tail;
> -	else
> -		used = tail - head;
> -
> -	/* make sure there is a space including extra dw for the fence */
> -	if (unlikely(used + len + GUC_CTB_HDR_LEN >= size))
> -		return -ENOSPC;
> +#endif
>   
>   	/*
>   	 * dw0: CT header (including fence)
> @@ -454,7 +453,9 @@ static int ct_write(struct intel_guc_ct *ct,
>   	write_barrier(ct);
>   
>   	/* now update descriptor */
> +	ctb->tail = tail;
>   	WRITE_ONCE(desc->tail, tail);
> +	ctb->space -= len + GUC_CTB_HDR_LEN;
>   
>   	return 0;
>   
> @@ -470,7 +471,7 @@ static int ct_write(struct intel_guc_ct *ct,
>    * @req:	pointer to pending request
>    * @status:	placeholder for status
>    *
> - * For each sent request, Guc shall send bac CT response message.
> + * For each sent request, GuC shall send back CT response message.
>    * Our message handler will update status of tracked request once
>    * response message with given fence is received. Wait here and
>    * check for valid response status value.
> @@ -526,24 +527,35 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct)
>   	return ret;
>   }
>   
> -static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
> +static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw)
>   {
> -	struct guc_ct_buffer_desc *desc = ctb->desc;
> -	u32 head = READ_ONCE(desc->head);
> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> +	u32 head;
>   	u32 space;
>   
> -	space = CIRC_SPACE(desc->tail, head, ctb->size);
> +	if (ctb->space >= len_dw)
> +		return true;
> +
> +	head = READ_ONCE(ctb->desc->head);
> +	if (unlikely(head > ctb->size)) {
> +		CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u size=%u\n",
> +			 ctb->desc->head, ctb->desc->tail, ctb->size);
> +		ctb->desc->status |= GUC_CTB_STATUS_OVERFLOW;
> +		ctb->broken = true;
> +		return false;
> +	}
> +
> +	space = CIRC_SPACE(ctb->tail, head, ctb->size);
> +	ctb->space = space;
>   
>   	return space >= len_dw;
>   }
>   
>   static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
>   {
> -	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> -
>   	lockdep_assert_held(&ct->ctbs.send.lock);
>   
> -	if (unlikely(!h2g_has_room(ctb, len_dw))) {
> +	if (unlikely(!h2g_has_room(ct, len_dw))) {
>   		if (ct->stall_time == KTIME_MAX)
>   			ct->stall_time = ktime_get();
>   
> @@ -613,7 +625,7 @@ static int ct_send(struct intel_guc_ct *ct,
>   	 */
>   retry:
>   	spin_lock_irqsave(&ctb->lock, flags);
> -	if (unlikely(!h2g_has_room(ctb, len + GUC_CTB_HDR_LEN))) {
> +	if (unlikely(!h2g_has_room(ct, len + GUC_CTB_HDR_LEN))) {
>   		if (ct->stall_time == KTIME_MAX)
>   			ct->stall_time = ktime_get();
>   		spin_unlock_irqrestore(&ctb->lock, flags);
> @@ -733,7 +745,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>   {
>   	struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv;
>   	struct guc_ct_buffer_desc *desc = ctb->desc;
> -	u32 head = desc->head;
> +	u32 head = ctb->head;
>   	u32 tail = desc->tail;
>   	u32 size = ctb->size;
>   	u32 *cmds = ctb->cmds;
> @@ -748,12 +760,29 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>   	if (unlikely(desc->status))
>   		goto corrupted;
>   
> -	if (unlikely((tail | head) >= size)) {
> +	GEM_BUG_ON(head > size);
Is the BUG_ON necessary, given that both options below do the same check
but as a corrupted-buffer test (with subsequent recovery by GT reset)
rather than killing the driver?

> +
> +#ifdef CONFIG_DRM_I915_DEBUG_GUC
> +	if (unlikely(head != READ_ONCE(desc->head))) {
> +		CT_ERROR(ct, "Head was modified %u != %u\n",
> +			 desc->head, ctb->head);
> +		desc->status |= GUC_CTB_STATUS_MISMATCH;
> +		goto corrupted;
> +	}
> +	if (unlikely((desc->tail | desc->head) >= size)) {
> +		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
> +			 head, tail, size);
> +		desc->status |= GUC_CTB_STATUS_OVERFLOW;
> +		goto corrupted;
> +	}
> +#else
> +	if (unlikely((tail | ctb->head) >= size)) {
Could just be 'head' rather than 'ctb->head'.

John.

>   		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
>   			 head, tail, size);
>   		desc->status |= GUC_CTB_STATUS_OVERFLOW;
>   		goto corrupted;
>   	}
> +#endif
>   
>   	/* tail == head condition indicates empty */
>   	available = tail - head;
> @@ -803,6 +832,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>   	}
>   	CT_DEBUG(ct, "received %*ph\n", 4 * len, (*msg)->msg);
>   
> +	ctb->head = head;
>   	/* now update descriptor */
>   	WRITE_ONCE(desc->head, head);
>   
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> index bee03794c1eb..edd1bba0445d 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> @@ -33,6 +33,9 @@ struct intel_guc;
>    * @desc: pointer to the buffer descriptor
>    * @cmds: pointer to the commands buffer
>    * @size: size of the commands buffer in dwords
> + * @head: local shadow copy of head in dwords
> + * @tail: local shadow copy of tail in dwords
> + * @space: local shadow copy of space in dwords
>    * @broken: flag to indicate if descriptor data is broken
>    */
>   struct intel_guc_ct_buffer {
> @@ -40,6 +43,9 @@ struct intel_guc_ct_buffer {
>   	struct guc_ct_buffer_desc *desc;
>   	u32 *cmds;
>   	u32 size;
> +	u32 tail;
> +	u32 head;
> +	u32 space;
>   	bool broken;
>   };
>   



* [PATCH 6/7] drm/i915/guc: Optimize CTB writes and reads
  2021-07-01 17:15 [PATCH 0/7] CT changes required for GuC submission Matthew Brost
@ 2021-07-01 17:15 ` Matthew Brost
  2021-07-06 19:00   ` John Harrison
  0 siblings, 1 reply; 54+ messages in thread
From: Matthew Brost @ 2021-07-01 17:15 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: john.c.harrison, Michal.Wajdeczko

CTB writes are now in the path of command submission and should be
optimized for performance. Rather than reading CTB descriptor values
(e.g. head, tail) which could result in accesses across the PCIe bus,
store shadow local copies and only read/write the descriptor values when
absolutely necessary. Also store the current space in each channel
locally.
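
[Editor's aside: the shadow bookkeeping described above can be sketched as a
standalone C fragment. CIRC_CNT()/CIRC_SPACE() mirror <linux/circ_buf.h>, and
"struct ctb_shadow" with ctb_shadow_reset() are hypothetical, cut-down
stand-ins for struct intel_guc_ct_buffer and guc_ct_buffer_reset(), not the
real driver types.]

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors <linux/circ_buf.h>: counts in a power-of-two circular buffer. */
#define CIRC_CNT(head, tail, size)   (((head) - (tail)) & ((size) - 1))
#define CIRC_SPACE(head, tail, size) CIRC_CNT((tail), ((head) + 1), (size))

/* Hypothetical cut-down view of the shadow fields this patch adds. */
struct ctb_shadow {
	uint32_t size;	/* commands buffer size in dwords (power of two) */
	uint32_t tail;	/* local shadow of the descriptor tail */
	uint32_t head;	/* local shadow of the descriptor head */
	uint32_t space;	/* cached free space in dwords */
};

/* Mirrors guc_ct_buffer_reset(): zero the shadows, derive the cache. */
static inline void ctb_shadow_reset(struct ctb_shadow *ctb)
{
	ctb->tail = 0;
	ctb->head = 0;
	/* One slot stays unused so tail == head always means "empty". */
	ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
}
```

Keeping head/tail/space in driver memory means the hot ct_write() path never
has to read the descriptor back across the PCIe bus just to know how much
room is left; the descriptor is only written to publish the new tail.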

v2:
 (Michal)
  - Add additional sanity checks for head / tail pointers
  - Use GUC_CTB_HDR_LEN rather than magic 1

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 88 +++++++++++++++--------
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  6 ++
 2 files changed, 65 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index a9cb7b608520..5b8b4ff609e2 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -130,6 +130,10 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc)
 static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb)
 {
 	ctb->broken = false;
+	ctb->tail = 0;
+	ctb->head = 0;
+	ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
+
 	guc_ct_buffer_desc_init(ctb->desc);
 }
 
@@ -383,10 +387,8 @@ static int ct_write(struct intel_guc_ct *ct,
 {
 	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
 	struct guc_ct_buffer_desc *desc = ctb->desc;
-	u32 head = desc->head;
-	u32 tail = desc->tail;
+	u32 tail = ctb->tail;
 	u32 size = ctb->size;
-	u32 used;
 	u32 header;
 	u32 hxg;
 	u32 *cmds = ctb->cmds;
@@ -395,25 +397,22 @@ static int ct_write(struct intel_guc_ct *ct,
 	if (unlikely(desc->status))
 		goto corrupted;
 
-	if (unlikely((tail | head) >= size)) {
+	GEM_BUG_ON(tail > size);
+
+#ifdef CONFIG_DRM_I915_DEBUG_GUC
+	if (unlikely(tail != READ_ONCE(desc->tail))) {
+		CT_ERROR(ct, "Tail was modified %u != %u\n",
+			 desc->tail, ctb->tail);
+		desc->status |= GUC_CTB_STATUS_MISMATCH;
+		goto corrupted;
+	}
+	if (unlikely((desc->tail | desc->head) >= size)) {
 		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
-			 head, tail, size);
+			 desc->head, desc->tail, size);
 		desc->status |= GUC_CTB_STATUS_OVERFLOW;
 		goto corrupted;
 	}
-
-	/*
-	 * tail == head condition indicates empty. GuC FW does not support
-	 * using up the entire buffer to get tail == head meaning full.
-	 */
-	if (tail < head)
-		used = (size - head) + tail;
-	else
-		used = tail - head;
-
-	/* make sure there is a space including extra dw for the fence */
-	if (unlikely(used + len + GUC_CTB_HDR_LEN >= size))
-		return -ENOSPC;
+#endif
 
 	/*
 	 * dw0: CT header (including fence)
@@ -454,7 +453,9 @@ static int ct_write(struct intel_guc_ct *ct,
 	write_barrier(ct);
 
 	/* now update descriptor */
+	ctb->tail = tail;
 	WRITE_ONCE(desc->tail, tail);
+	ctb->space -= len + GUC_CTB_HDR_LEN;
 
 	return 0;
 
@@ -470,7 +471,7 @@ static int ct_write(struct intel_guc_ct *ct,
  * @req:	pointer to pending request
  * @status:	placeholder for status
  *
- * For each sent request, Guc shall send bac CT response message.
+ * For each sent request, GuC shall send back CT response message.
  * Our message handler will update status of tracked request once
  * response message with given fence is received. Wait here and
  * check for valid response status value.
@@ -526,24 +527,35 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct)
 	return ret;
 }
 
-static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
+static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw)
 {
-	struct guc_ct_buffer_desc *desc = ctb->desc;
-	u32 head = READ_ONCE(desc->head);
+	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
+	u32 head;
 	u32 space;
 
-	space = CIRC_SPACE(desc->tail, head, ctb->size);
+	if (ctb->space >= len_dw)
+		return true;
+
+	head = READ_ONCE(ctb->desc->head);
+	if (unlikely(head > ctb->size)) {
+		CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u size=%u\n",
+			 ctb->desc->head, ctb->desc->tail, ctb->size);
+		ctb->desc->status |= GUC_CTB_STATUS_OVERFLOW;
+		ctb->broken = true;
+		return false;
+	}
+
+	space = CIRC_SPACE(ctb->tail, head, ctb->size);
+	ctb->space = space;
 
 	return space >= len_dw;
 }
 
 static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
 {
-	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
-
 	lockdep_assert_held(&ct->ctbs.send.lock);
 
-	if (unlikely(!h2g_has_room(ctb, len_dw))) {
+	if (unlikely(!h2g_has_room(ct, len_dw))) {
 		if (ct->stall_time == KTIME_MAX)
 			ct->stall_time = ktime_get();
 
@@ -613,7 +625,7 @@ static int ct_send(struct intel_guc_ct *ct,
 	 */
 retry:
 	spin_lock_irqsave(&ctb->lock, flags);
-	if (unlikely(!h2g_has_room(ctb, len + GUC_CTB_HDR_LEN))) {
+	if (unlikely(!h2g_has_room(ct, len + GUC_CTB_HDR_LEN))) {
 		if (ct->stall_time == KTIME_MAX)
 			ct->stall_time = ktime_get();
 		spin_unlock_irqrestore(&ctb->lock, flags);
@@ -733,7 +745,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 {
 	struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv;
 	struct guc_ct_buffer_desc *desc = ctb->desc;
-	u32 head = desc->head;
+	u32 head = ctb->head;
 	u32 tail = desc->tail;
 	u32 size = ctb->size;
 	u32 *cmds = ctb->cmds;
@@ -748,12 +760,29 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 	if (unlikely(desc->status))
 		goto corrupted;
 
-	if (unlikely((tail | head) >= size)) {
+	GEM_BUG_ON(head > size);
+
+#ifdef CONFIG_DRM_I915_DEBUG_GUC
+	if (unlikely(head != READ_ONCE(desc->head))) {
+		CT_ERROR(ct, "Head was modified %u != %u\n",
+			 desc->head, ctb->head);
+		desc->status |= GUC_CTB_STATUS_MISMATCH;
+		goto corrupted;
+	}
+	if (unlikely((desc->tail | desc->head) >= size)) {
+		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
+			 head, tail, size);
+		desc->status |= GUC_CTB_STATUS_OVERFLOW;
+		goto corrupted;
+	}
+#else
+	if (unlikely((tail | ctb->head) >= size)) {
 		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
 			 head, tail, size);
 		desc->status |= GUC_CTB_STATUS_OVERFLOW;
 		goto corrupted;
 	}
+#endif
 
 	/* tail == head condition indicates empty */
 	available = tail - head;
@@ -803,6 +832,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 	}
 	CT_DEBUG(ct, "received %*ph\n", 4 * len, (*msg)->msg);
 
+	ctb->head = head;
 	/* now update descriptor */
 	WRITE_ONCE(desc->head, head);
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
index bee03794c1eb..edd1bba0445d 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
@@ -33,6 +33,9 @@ struct intel_guc;
  * @desc: pointer to the buffer descriptor
  * @cmds: pointer to the commands buffer
  * @size: size of the commands buffer in dwords
+ * @head: local shadow copy of head in dwords
+ * @tail: local shadow copy of tail in dwords
+ * @space: local shadow copy of space in dwords
  * @broken: flag to indicate if descriptor data is broken
  */
 struct intel_guc_ct_buffer {
@@ -40,6 +43,9 @@ struct intel_guc_ct_buffer {
 	struct guc_ct_buffer_desc *desc;
 	u32 *cmds;
 	u32 size;
+	u32 tail;
+	u32 head;
+	u32 space;
 	bool broken;
 };
 
-- 
2.28.0



* [PATCH 6/7] drm/i915/guc: Optimize CTB writes and reads
  2021-06-27 23:14 [PATCH 0/7] CT changes required for GuC submission Matthew Brost
@ 2021-06-27 23:14 ` Matthew Brost
  0 siblings, 0 replies; 54+ messages in thread
From: Matthew Brost @ 2021-06-27 23:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: daniele.ceraolospurio, john.c.harrison, Michal.Wajdeczko

CTB writes are now in the path of command submission and should be
optimized for performance. Rather than reading CTB descriptor values
(e.g. head, tail) which could result in accesses across the PCIe bus,
store shadow local copies and only read/write the descriptor values when
absolutely necessary. Also store the current space in each channel
locally.
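
[Editor's aside: the h2g_has_room() fast path this patch introduces can be
sketched outside the driver as below. The names (ctb_cache,
h2g_has_room_sketch, desc_head) are illustrative stand-ins, not the real
i915 symbols; the point is the order of checks: trust the cached space
first, and only on a miss pay for the (on real hardware, cross-PCIe) read
of the descriptor head.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors <linux/circ_buf.h>: counts in a power-of-two circular buffer. */
#define CIRC_CNT(head, tail, size)   (((head) - (tail)) & ((size) - 1))
#define CIRC_SPACE(head, tail, size) CIRC_CNT((tail), ((head) + 1), (size))

/* Hypothetical stand-in for the send channel's shadow state. */
struct ctb_cache {
	uint32_t size;	/* buffer size in dwords (power of two) */
	uint32_t tail;	/* local shadow of the send tail */
	uint32_t space;	/* cached free space in dwords */
	bool broken;
};

static bool h2g_has_room_sketch(struct ctb_cache *ctb,
				const volatile uint32_t *desc_head,
				uint32_t len_dw)
{
	uint32_t head, space;

	if (ctb->space >= len_dw)
		return true;		/* fast path: no descriptor access */

	head = *desc_head;		/* the one "expensive" read */
	if (head > ctb->size) {
		ctb->broken = true;	/* corrupted descriptor */
		return false;
	}

	space = CIRC_SPACE(ctb->tail, head, ctb->size);
	ctb->space = space;		/* refresh the cache for next time */
	return space >= len_dw;
}
```

In the common case the cached space answers the question outright; the
descriptor head is only re-read when the cache says the buffer looks full,
i.e. when the GuC may have consumed messages since the last refresh.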

v2:
 (Michal)
  - Add additional sanity checks for head / tail pointers
  - Use GUC_CTB_HDR_LEN rather than magic 1

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 88 +++++++++++++++--------
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  6 ++
 2 files changed, 65 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 8f553f7f9619..63bce4dc21fa 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -130,6 +130,10 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc)
 static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb)
 {
 	ctb->broken = false;
+	ctb->tail = 0;
+	ctb->head = 0;
+	ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
+
 	guc_ct_buffer_desc_init(ctb->desc);
 }
 
@@ -383,10 +387,8 @@ static int ct_write(struct intel_guc_ct *ct,
 {
 	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
 	struct guc_ct_buffer_desc *desc = ctb->desc;
-	u32 head = desc->head;
-	u32 tail = desc->tail;
+	u32 tail = ctb->tail;
 	u32 size = ctb->size;
-	u32 used;
 	u32 header;
 	u32 hxg;
 	u32 *cmds = ctb->cmds;
@@ -395,25 +397,22 @@ static int ct_write(struct intel_guc_ct *ct,
 	if (unlikely(desc->status))
 		goto corrupted;
 
-	if (unlikely((tail | head) >= size)) {
+	GEM_BUG_ON(tail > size);
+
+#ifdef CONFIG_DRM_I915_DEBUG_GUC
+	if (unlikely(tail != READ_ONCE(desc->tail))) {
+		CT_ERROR(ct, "Tail was modified %u != %u\n",
+			 desc->tail, ctb->tail);
+		desc->status |= GUC_CTB_STATUS_MISMATCH;
+		goto corrupted;
+	}
+	if (unlikely((desc->tail | desc->head) >= size)) {
 		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
-			 head, tail, size);
+			 desc->head, desc->tail, size);
 		desc->status |= GUC_CTB_STATUS_OVERFLOW;
 		goto corrupted;
 	}
-
-	/*
-	 * tail == head condition indicates empty. GuC FW does not support
-	 * using up the entire buffer to get tail == head meaning full.
-	 */
-	if (tail < head)
-		used = (size - head) + tail;
-	else
-		used = tail - head;
-
-	/* make sure there is a space including extra dw for the fence */
-	if (unlikely(used + len + GUC_CTB_HDR_LEN >= size))
-		return -ENOSPC;
+#endif
 
 	/*
 	 * dw0: CT header (including fence)
@@ -454,7 +453,9 @@ static int ct_write(struct intel_guc_ct *ct,
 	write_barrier(ct);
 
 	/* now update descriptor */
+	ctb->tail = tail;
 	WRITE_ONCE(desc->tail, tail);
+	ctb->space -= len + GUC_CTB_HDR_LEN;
 
 	return 0;
 
@@ -470,7 +471,7 @@ static int ct_write(struct intel_guc_ct *ct,
  * @req:	pointer to pending request
  * @status:	placeholder for status
  *
- * For each sent request, Guc shall send bac CT response message.
+ * For each sent request, GuC shall send back CT response message.
  * Our message handler will update status of tracked request once
  * response message with given fence is received. Wait here and
  * check for valid response status value.
@@ -526,24 +527,35 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct)
 	return ret;
 }
 
-static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
+static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw)
 {
-	struct guc_ct_buffer_desc *desc = ctb->desc;
-	u32 head = READ_ONCE(desc->head);
+	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
+	u32 head;
 	u32 space;
 
-	space = CIRC_SPACE(desc->tail, head, ctb->size);
+	if (ctb->space >= len_dw)
+		return true;
+
+	head = READ_ONCE(ctb->desc->head);
+	if (unlikely(head > ctb->size)) {
+		CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u size=%u\n",
+			  ctb->desc->head, ctb->desc->tail, ctb->size);
+		ctb->desc->status |= GUC_CTB_STATUS_OVERFLOW;
+		ctb->broken = true;
+		return false;
+	}
+
+	space = CIRC_SPACE(ctb->tail, head, ctb->size);
+	ctb->space = space;
 
 	return space >= len_dw;
 }
 
 static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
 {
-	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
-
 	lockdep_assert_held(&ct->ctbs.send.lock);
 
-	if (unlikely(!h2g_has_room(ctb, len_dw))) {
+	if (unlikely(!h2g_has_room(ct, len_dw))) {
 		if (ct->stall_time == KTIME_MAX)
 			ct->stall_time = ktime_get();
 
@@ -619,7 +631,7 @@ static int ct_send(struct intel_guc_ct *ct,
 	 */
 retry:
 	spin_lock_irqsave(&ctb->lock, flags);
-	if (unlikely(!h2g_has_room(ctb, len + GUC_CTB_HDR_LEN))) {
+	if (unlikely(!h2g_has_room(ct, len + GUC_CTB_HDR_LEN))) {
 		if (ct->stall_time == KTIME_MAX)
 			ct->stall_time = ktime_get();
 		spin_unlock_irqrestore(&ctb->lock, flags);
@@ -736,7 +748,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 {
 	struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv;
 	struct guc_ct_buffer_desc *desc = ctb->desc;
-	u32 head = desc->head;
+	u32 head = ctb->head;
 	u32 tail = desc->tail;
 	u32 size = ctb->size;
 	u32 *cmds = ctb->cmds;
@@ -751,12 +763,29 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 	if (unlikely(desc->status))
 		goto corrupted;
 
-	if (unlikely((tail | head) >= size)) {
+	GEM_BUG_ON(head > size);
+
+#ifdef CONFIG_DRM_I915_DEBUG_GUC
+	if (unlikely(head != READ_ONCE(desc->head))) {
+		CT_ERROR(ct, "Head was modified %u != %u\n",
+			 desc->head, ctb->head);
+		desc->status |= GUC_CTB_STATUS_MISMATCH;
+		goto corrupted;
+	}
+	if (unlikely((desc->tail | desc->head) >= size)) {
+		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
+			 head, tail, size);
+		desc->status |= GUC_CTB_STATUS_OVERFLOW;
+		goto corrupted;
+	}
+#else
+	if (unlikely((tail | ctb->head) >= size)) {
 		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
 			 head, tail, size);
 		desc->status |= GUC_CTB_STATUS_OVERFLOW;
 		goto corrupted;
 	}
+#endif
 
 	/* tail == head condition indicates empty */
 	available = tail - head;
@@ -806,6 +835,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 	}
 	CT_DEBUG(ct, "received %*ph\n", 4 * len, (*msg)->msg);
 
+	ctb->head = head;
 	/* now update descriptor */
 	WRITE_ONCE(desc->head, head);
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
index c9d6ae7848a7..a5b42b58483b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
@@ -33,6 +33,9 @@ struct intel_guc;
  * @desc: pointer to the buffer descriptor
  * @cmds: pointer to the commands buffer
  * @size: size of the commands buffer in dwords
+ * @head: local shadow copy of head in dwords
+ * @tail: local shadow copy of tail in dwords
+ * @space: local shadow copy of space in dwords
  * @broken: flag to indicate if descriptor data is broken
  */
 struct intel_guc_ct_buffer {
@@ -40,6 +43,9 @@ struct intel_guc_ct_buffer {
 	struct guc_ct_buffer_desc *desc;
 	u32 *cmds;
 	u32 size;
+	u32 tail;
+	u32 head;
+	u32 space;
 	bool broken;
 };
 
-- 
2.28.0



end of thread, other threads:[~2021-07-08 16:04 UTC | newest]

Thread overview: 54+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-06 22:20 [PATCH 0/7] CT changes required for GuC submission Matthew Brost
2021-07-06 22:20 ` [Intel-gfx] " Matthew Brost
2021-07-06 22:10 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for CT changes required for GuC submission (rev3) Patchwork
2021-07-06 22:11 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2021-07-06 22:20 ` [PATCH 1/7] drm/i915/guc: Relax CTB response timeout Matthew Brost
2021-07-06 22:20   ` [Intel-gfx] " Matthew Brost
2021-07-06 22:20 ` [PATCH 2/7] drm/i915/guc: Improve error message for unsolicited CT response Matthew Brost
2021-07-06 22:20   ` [Intel-gfx] " Matthew Brost
2021-07-06 22:20 ` [PATCH 3/7] drm/i915/guc: Increase size of CTB buffers Matthew Brost
2021-07-06 22:20   ` [Intel-gfx] " Matthew Brost
2021-07-06 22:20 ` [PATCH 4/7] drm/i915/guc: Add non blocking CTB send function Matthew Brost
2021-07-06 22:20   ` [Intel-gfx] " Matthew Brost
2021-07-06 22:20 ` [PATCH 5/7] drm/i915/guc: Add stall timer to " Matthew Brost
2021-07-06 22:20   ` [Intel-gfx] " Matthew Brost
2021-07-06 22:20 ` [PATCH 6/7] drm/i915/guc: Optimize CTB writes and reads Matthew Brost
2021-07-06 22:20   ` [Intel-gfx] " Matthew Brost
2021-07-06 22:51   ` John Harrison
2021-07-06 22:51     ` [Intel-gfx] " John Harrison
2021-07-07 17:50     ` Matthew Brost
2021-07-07 17:50       ` [Intel-gfx] " Matthew Brost
2021-07-07 18:19       ` John Harrison
2021-07-07 18:19         ` [Intel-gfx] " John Harrison
2021-07-07 18:56         ` Matthew Brost
2021-07-07 18:56           ` [Intel-gfx] " Matthew Brost
2021-07-07 20:21           ` John Harrison
2021-07-07 20:21             ` [Intel-gfx] " John Harrison
2021-07-07 20:23             ` Matthew Brost
2021-07-07 20:23               ` [Intel-gfx] " Matthew Brost
2021-07-06 22:20 ` [PATCH 7/7] drm/i915/guc: Module load failure test for CT buffer creation Matthew Brost
2021-07-06 22:20   ` [Intel-gfx] " Matthew Brost
2021-07-06 22:38 ` [Intel-gfx] ✗ Fi.CI.BAT: failure for CT changes required for GuC submission (rev3) Patchwork
2021-07-07 19:09 ` [PATCH 06/56] drm/i915/guc: Optimize CTB writes and reads Matthew Brost
2021-07-07 19:09   ` [Intel-gfx] " Matthew Brost
2021-07-07 20:30   ` Michal Wajdeczko
2021-07-07 20:30     ` [Intel-gfx] " Michal Wajdeczko
2021-07-07 23:25 ` [PATCH 06/7] " Matthew Brost
2021-07-07 23:25   ` [Intel-gfx] " Matthew Brost
2021-07-08 13:23   ` Michal Wajdeczko
2021-07-08 13:23     ` [Intel-gfx] " Michal Wajdeczko
2021-07-08  0:30 ` [PATCH 0/2] Introduce set_parallel2 extension Matthew Brost
2021-07-08  0:30   ` [Intel-gfx] " Matthew Brost
2021-07-08  0:30   ` [PATCH 1/2] INTEL_DII/NOT_UPSTREAM: drm/i915: " Matthew Brost
2021-07-08  0:30   ` [PATCH 2/2] REVIEW: Full tree diff against internal/internal Matthew Brost
2021-07-08  0:30     ` [Intel-gfx] " Matthew Brost
2021-07-08  2:18 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for CT changes required for GuC submission (rev4) Patchwork
  -- strict thread matches above, loose matches on Subject: below --
2021-07-08 16:20 [PATCH 0/7] CT changes required for GuC submission Matthew Brost
2021-07-08 16:20 ` [PATCH 6/7] drm/i915/guc: Optimize CTB writes and reads Matthew Brost
2021-07-01 17:15 [PATCH 0/7] CT changes required for GuC submission Matthew Brost
2021-07-01 17:15 ` [PATCH 6/7] drm/i915/guc: Optimize CTB writes and reads Matthew Brost
2021-07-06 19:00   ` John Harrison
2021-07-06 19:12     ` Michal Wajdeczko
2021-07-06 19:19       ` John Harrison
2021-07-06 19:33         ` Michal Wajdeczko
2021-07-06 21:22           ` Matthew Brost
2021-07-06 22:41           ` John Harrison
2021-06-27 23:14 [PATCH 0/7] CT changes required for GuC submission Matthew Brost
2021-06-27 23:14 ` [PATCH 6/7] drm/i915/guc: Optimize CTB writes and reads Matthew Brost
