* [PATCH 0/7] Fix stealing guc_ids + test
@ 2021-12-14 17:04 ` Matthew Brost
  0 siblings, 0 replies; 28+ messages in thread
From: Matthew Brost @ 2021-12-14 17:04 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

Patches 1 & 2 address bugs in stealing of guc_ids and patch 7 tests this
path.

Patches 4-6 address some issues with the CTs exposed by the selftest in
patch 7: if a lot of contexts are deregistered at once, the CT channel
can deadlock.
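
For context, a rough sketch of the failure mode (illustrative pseudo-C;
the real checks are h2g_has_room()/g2h_has_room() in intel_guc_ct.c):

	/*
	 * Each H2G command that expects a G2H reply reserves G2H buffer
	 * space ("credits") up front, and the credits are only returned
	 * once the receive side processes the replies. Deregistering
	 * many contexts at once fills H2G while exhausting G2H credits,
	 * so the send path keeps failing here:
	 */
	if (!h2g_has_room(ct, h2g_dw) || !g2h_has_room(ct, g2h_dw)) {
		/*
		 * Unless something drains G2H (patch 6 kicks the
		 * receive tasklet at this point), the stall never
		 * clears and is eventually reported as a CT deadlock.
		 */
	}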

Patch 3 is a small fix that has already been reviewed but is included
here for CI.

v2: Address comments, resend for CI
v3: Address comments in patch #7

Signed-off-by: Matthew Brost <matthew.brost@intel.com>

John Harrison (1):
  drm/i915/guc: Don't hog IRQs when destroying contexts

Matthew Brost (6):
  drm/i915/guc: Use correct context lock when calling
    clr_context_registered
  drm/i915/guc: Only assign guc_id.id when stealing guc_id
  drm/i915/guc: Remove racey GEM_BUG_ON
  drm/i915/guc: Add extra debug on CT deadlock
  drm/i915/guc: Kick G2H tasklet if no credits
  drm/i915/guc: Selftest for stealing of guc ids

 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  12 ++
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |  18 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  69 ++++---
 drivers/gpu/drm/i915/gt/uc/selftest_guc.c     | 173 ++++++++++++++++++
 4 files changed, 244 insertions(+), 28 deletions(-)

-- 
2.33.1


* [PATCH 1/7] drm/i915/guc: Use correct context lock when calling clr_context_registered
  2021-12-14 17:04 ` [Intel-gfx] " Matthew Brost
@ 2021-12-14 17:04   ` Matthew Brost
  -1 siblings, 0 replies; 28+ messages in thread
From: Matthew Brost @ 2021-12-14 17:04 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

s/ce/cn/ when grabbing guc_state.lock before calling
clr_context_registered.
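
For clarity, the fixed sequence (annotated sketch; the real change is in
the diff below): ce is the context stealing an id and cn is the victim
it is stolen from, so it is cn's state that changes and cn's lock that
must be held.

	/* steal_guc_id(): ce takes over cn's guc_id */
	list_del_init(&cn->guc_id.link);
	ce->guc_id = cn->guc_id;	/* struct copy narrowed in patch 2 */

	spin_lock(&cn->guc_state.lock);		/* cn's lock, not ce's */
	clr_context_registered(cn);
	spin_unlock(&cn->guc_state.lock);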

Fixes: 0f7976506de61 ("drm/i915/guc: Rework and simplify locking")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 1f9d4fde421f..9b7b4f4e0d91 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1937,9 +1937,9 @@ static int steal_guc_id(struct intel_guc *guc, struct intel_context *ce)
 		list_del_init(&cn->guc_id.link);
 		ce->guc_id = cn->guc_id;
 
-		spin_lock(&ce->guc_state.lock);
+		spin_lock(&cn->guc_state.lock);
 		clr_context_registered(cn);
-		spin_unlock(&ce->guc_state.lock);
+		spin_unlock(&cn->guc_state.lock);
 
 		set_context_guc_id_invalid(cn);
 
-- 
2.33.1


* [PATCH 2/7] drm/i915/guc: Only assign guc_id.id when stealing guc_id
  2021-12-14 17:04 ` [Intel-gfx] " Matthew Brost
@ 2021-12-14 17:04   ` Matthew Brost
  -1 siblings, 0 replies; 28+ messages in thread
From: Matthew Brost @ 2021-12-14 17:04 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

Previously the whole guc_id structure (list, spin lock) was assigned,
which is incorrect; only assign the guc_id.id.
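
To illustrate the pitfall (a minimal sketch; field names are simplified,
not the exact driver layout): struct assignment copies every member, so
copying the whole guc_id would also clobber the list linkage that
steal_guc_id() has just unlinked.

	struct guc_id_state {		/* illustrative layout only */
		u16 id;			/* the handle being stolen */
		struct list_head link;	/* linkage in the guc_id list */
	};

	/* Wrong: copies 'link' too, corrupting list membership */
	ce->guc_id = cn->guc_id;

	/* Right: transfer only the identifier */
	ce->guc_id.id = cn->guc_id.id;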

Fixes: 0f7976506de61 ("drm/i915/guc: Rework and simplify locking")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 9b7b4f4e0d91..0fb2eeff0262 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1935,7 +1935,7 @@ static int steal_guc_id(struct intel_guc *guc, struct intel_context *ce)
 		GEM_BUG_ON(intel_context_is_parent(cn));
 
 		list_del_init(&cn->guc_id.link);
-		ce->guc_id = cn->guc_id;
+		ce->guc_id.id = cn->guc_id.id;
 
 		spin_lock(&cn->guc_state.lock);
 		clr_context_registered(cn);
-- 
2.33.1


* [PATCH 3/7] drm/i915/guc: Remove racey GEM_BUG_ON
  2021-12-14 17:04 ` [Intel-gfx] " Matthew Brost
@ 2021-12-14 17:04   ` Matthew Brost
  -1 siblings, 0 replies; 28+ messages in thread
From: Matthew Brost @ 2021-12-14 17:04 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

A full GT reset can race with the last context put, resulting in the
context ref count being zero but the destroyed bit not yet being set.
Remove the GEM_BUG_ON in scrub_guc_desc_for_outstanding_g2h that asserts
the destroyed bit must be set if the ref count is zero.

Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 0fb2eeff0262..36c2965db49b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1040,8 +1040,6 @@ static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
 
 		spin_unlock(&ce->guc_state.lock);
 
-		GEM_BUG_ON(!do_put && !destroyed);
-
 		if (pending_enable || destroyed || deregister) {
 			decr_outstanding_submission_g2h(guc);
 			if (deregister)
-- 
2.33.1


* [PATCH 4/7] drm/i915/guc: Don't hog IRQs when destroying contexts
  2021-12-14 17:04 ` [Intel-gfx] " Matthew Brost
@ 2021-12-14 17:04   ` Matthew Brost
  -1 siblings, 0 replies; 28+ messages in thread
From: Matthew Brost @ 2021-12-14 17:04 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

From: John Harrison <John.C.Harrison@Intel.com>

While attempting to debug a CT deadlock issue in various CI failures
(most easily reproduced with gem_ctx_create/basic-files), I was seeing
CPU deadlock errors being reported. This was because the context
destroy loop was blocking waiting on H2G space from inside an IRQ
spinlock. There was no deadlock as such, it's just that the H2G queue
was full of context destroy commands and GuC was taking a long time to
process them. However, the kernel was seeing the large amount of time
spent inside the IRQ lock as a dead CPU. Various Bad Things(tm) would
then happen (heartbeat failures, CT deadlock errors, outstanding H2G
WARNs, etc.).

Re-working the loop to only acquire the spinlock around the list
management (which is all it is meant to protect) rather than the
entire destroy operation seems to fix all the above issues.
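
In other words, the problematic shape was (simplified sketch of the old
code; see the diff below for the reworked loop):

	spin_lock_irqsave(&guc->submission_state.lock, flags);
	list_for_each_entry_safe(ce, cn, &destroyed_contexts, destroyed_link) {
		list_del_init(&ce->destroyed_link);
		guc_lrc_desc_unpin(ce);	/* can block waiting on H2G space! */
	}
	spin_unlock_irqrestore(&guc->submission_state.lock, flags);

The fix pops one entry at a time under the lock and calls
guc_lrc_desc_unpin() with the lock dropped.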

v2:
 (John Harrison)
  - Fix typo in comment message

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
---
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 45 ++++++++++++-------
 1 file changed, 28 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 36c2965db49b..96fcf869e3ff 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -2644,7 +2644,6 @@ static inline void guc_lrc_desc_unpin(struct intel_context *ce)
 	unsigned long flags;
 	bool disabled;
 
-	lockdep_assert_held(&guc->submission_state.lock);
 	GEM_BUG_ON(!intel_gt_pm_is_awake(gt));
 	GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id.id));
 	GEM_BUG_ON(ce != __get_context(guc, ce->guc_id.id));
@@ -2660,7 +2659,7 @@ static inline void guc_lrc_desc_unpin(struct intel_context *ce)
 	}
 	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 	if (unlikely(disabled)) {
-		__release_guc_id(guc, ce);
+		release_guc_id(guc, ce);
 		__guc_context_destroy(ce);
 		return;
 	}
@@ -2694,36 +2693,48 @@ static void __guc_context_destroy(struct intel_context *ce)
 
 static void guc_flush_destroyed_contexts(struct intel_guc *guc)
 {
-	struct intel_context *ce, *cn;
+	struct intel_context *ce;
 	unsigned long flags;
 
 	GEM_BUG_ON(!submission_disabled(guc) &&
 		   guc_submission_initialized(guc));
 
-	spin_lock_irqsave(&guc->submission_state.lock, flags);
-	list_for_each_entry_safe(ce, cn,
-				 &guc->submission_state.destroyed_contexts,
-				 destroyed_link) {
-		list_del_init(&ce->destroyed_link);
-		__release_guc_id(guc, ce);
+	while (!list_empty(&guc->submission_state.destroyed_contexts)) {
+		spin_lock_irqsave(&guc->submission_state.lock, flags);
+		ce = list_first_entry_or_null(&guc->submission_state.destroyed_contexts,
+					      struct intel_context,
+					      destroyed_link);
+		if (ce)
+			list_del_init(&ce->destroyed_link);
+		spin_unlock_irqrestore(&guc->submission_state.lock, flags);
+
+		if (!ce)
+			break;
+
+		release_guc_id(guc, ce);
 		__guc_context_destroy(ce);
 	}
-	spin_unlock_irqrestore(&guc->submission_state.lock, flags);
 }
 
 static void deregister_destroyed_contexts(struct intel_guc *guc)
 {
-	struct intel_context *ce, *cn;
+	struct intel_context *ce;
 	unsigned long flags;
 
-	spin_lock_irqsave(&guc->submission_state.lock, flags);
-	list_for_each_entry_safe(ce, cn,
-				 &guc->submission_state.destroyed_contexts,
-				 destroyed_link) {
-		list_del_init(&ce->destroyed_link);
+	while (!list_empty(&guc->submission_state.destroyed_contexts)) {
+		spin_lock_irqsave(&guc->submission_state.lock, flags);
+		ce = list_first_entry_or_null(&guc->submission_state.destroyed_contexts,
+					      struct intel_context,
+					      destroyed_link);
+		if (ce)
+			list_del_init(&ce->destroyed_link);
+		spin_unlock_irqrestore(&guc->submission_state.lock, flags);
+
+		if (!ce)
+			break;
+
 		guc_lrc_desc_unpin(ce);
 	}
-	spin_unlock_irqrestore(&guc->submission_state.lock, flags);
 }
 
 static void destroyed_worker_func(struct work_struct *w)
-- 
2.33.1


* [PATCH 5/7] drm/i915/guc: Add extra debug on CT deadlock
  2021-12-14 17:04 ` [Intel-gfx] " Matthew Brost
@ 2021-12-14 17:04   ` Matthew Brost
  -1 siblings, 0 replies; 28+ messages in thread
From: Matthew Brost @ 2021-12-14 17:04 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

Print CT state (H2G + G2H head / tail pointers, credits) on CT
deadlock.

v2:
 (John Harrison)
  - Add units to debug messages

Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index a0cc34be7b56..741be9abab68 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -523,6 +523,15 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct)
 		CT_ERROR(ct, "Communication stalled for %lld ms, desc status=%#x,%#x\n",
 			 ktime_ms_delta(ktime_get(), ct->stall_time),
 			 send->status, recv->status);
+		CT_ERROR(ct, "H2G Space: %u (Bytes)\n",
+			 atomic_read(&ct->ctbs.send.space) * 4);
+		CT_ERROR(ct, "Head: %u (Dwords)\n", ct->ctbs.send.desc->head);
+		CT_ERROR(ct, "Tail: %u (Dwords)\n", ct->ctbs.send.desc->tail);
+		CT_ERROR(ct, "G2H Space: %u (Bytes)\n",
+			 atomic_read(&ct->ctbs.recv.space) * 4);
+		CT_ERROR(ct, "Head: %u\n (Dwords)", ct->ctbs.recv.desc->head);
+		CT_ERROR(ct, "Tail: %u\n (Dwords)", ct->ctbs.recv.desc->tail);
+
 		ct->ctbs.send.broken = true;
 	}
 
-- 
2.33.1


* [PATCH 6/7] drm/i915/guc: Kick G2H tasklet if no credits
  2021-12-14 17:04 ` [Intel-gfx] " Matthew Brost
@ 2021-12-14 17:04   ` Matthew Brost
  -1 siblings, 0 replies; 28+ messages in thread
From: Matthew Brost @ 2021-12-14 17:04 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

Let's be paranoid and kick the G2H tasklet, which dequeues messages, if
G2H credits are exhausted.
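
The receive tasklet is what returns credits, so kicking it from the
starved send path is cheap insurance. A sketch of the credit lifecycle
(illustrative; the reserve/release helpers live in intel_guc_ct.c):

	/*
	 * send:    g2h_reserve_space(ct, g2h_len_dw);  consume credits
	 * receive: g2h_release_space(ct, g2h_len_dw);  return credits
	 *
	 * The release half runs from the receive tasklet; if credits
	 * hit zero and the tasklet is never scheduled, nothing frees
	 * them and the send path stalls indefinitely.
	 */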

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 741be9abab68..aa6dd6415202 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -591,12 +591,19 @@ static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw)
 
 static int has_room_nb(struct intel_guc_ct *ct, u32 h2g_dw, u32 g2h_dw)
 {
+	bool h2g = h2g_has_room(ct, h2g_dw);
+	bool g2h = g2h_has_room(ct, g2h_dw);
+
 	lockdep_assert_held(&ct->ctbs.send.lock);
 
-	if (unlikely(!h2g_has_room(ct, h2g_dw) || !g2h_has_room(ct, g2h_dw))) {
+	if (unlikely(!h2g || !g2h)) {
 		if (ct->stall_time == KTIME_MAX)
 			ct->stall_time = ktime_get();
 
+		/* Be paranoid and kick G2H tasklet to free credits */
+		if (!g2h)
+			tasklet_hi_schedule(&ct->receive_tasklet);
+
 		if (unlikely(ct_deadlocked(ct)))
 			return -EPIPE;
 		else
-- 
2.33.1


* [PATCH 7/7] drm/i915/guc: Selftest for stealing of guc ids
  2021-12-14 17:04 ` [Intel-gfx] " Matthew Brost
@ 2021-12-14 17:05   ` Matthew Brost
  -1 siblings, 0 replies; 28+ messages in thread
From: Matthew Brost @ 2021-12-14 17:05 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

Testing the stealing of guc ids is hard from user space as we have 64k
guc_ids. Add a selftest, which artificially reduces the number of guc
ids, and forces a steal.

The test creates a spinner which is used to block all subsequent
submissions until it completes. Next, a loop creates a context and a NOP
request each iteration until the guc_ids are exhausted (request creation
returns -EAGAIN). The spinner is ended, unblocking all requests created
in the loop. At this point all guc_ids are exhausted but are available
to steal. Try to create another request which should successfully steal
a guc_id. Wait on last request to complete, idle GPU, verify a guc_id
was stolen via a counter, and exit the test. Test also artificially
reduces the number of guc_ids so the test runs in a timely manner.

v2:
 (John Harrison)
  - s/stole/stolen
  - Fix some wording in test description
  - Rework indexing into context array
  - Add test description to commit message
  - Fix typo in commit message
 (Checkpatch)
  - s/guc/(guc) in NUMBER_MULTI_LRC_GUC_ID
v3:
 (John Harrison)
  - Set array value to NULL after extracting error
  - Fix a few typos in comments / error messages
  - Delete redundant comment in commit message

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  12 ++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  16 +-
 drivers/gpu/drm/i915/gt/uc/selftest_guc.c     | 173 ++++++++++++++++++
 3 files changed, 196 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 1cb46098030d..f9240d4baa69 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -94,6 +94,11 @@ struct intel_guc {
 		 * @guc_ids: used to allocate new guc_ids, single-lrc
 		 */
 		struct ida guc_ids;
+		/**
+		 * @num_guc_ids: Number of guc_ids, selftest feature to be able
+		 * to reduce this number while testing.
+		 */
+		int num_guc_ids;
 		/**
 		 * @guc_ids_bitmap: used to allocate new guc_ids, multi-lrc
 		 */
@@ -202,6 +207,13 @@ struct intel_guc {
 		 */
 		struct delayed_work work;
 	} timestamp;
+
+#ifdef CONFIG_DRM_I915_SELFTEST
+	/**
+	 * @number_guc_id_stolen: The number of guc_ids that have been stolen
+	 */
+	int number_guc_id_stolen;
+#endif
 };
 
 static inline struct intel_guc *log_to_guc(struct intel_guc_log *log)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 96fcf869e3ff..99414b49ca6d 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -145,7 +145,8 @@ guc_create_parallel(struct intel_engine_cs **engines,
  * use should be low and 1/16 should be sufficient. Minimum of 32 guc_ids for
  * multi-lrc.
  */
-#define NUMBER_MULTI_LRC_GUC_ID		(GUC_MAX_LRC_DESCRIPTORS / 16)
+#define NUMBER_MULTI_LRC_GUC_ID(guc)	\
+	((guc)->submission_state.num_guc_ids / 16)
 
 /*
  * Below is a set of functions which control the GuC scheduling state which
@@ -1775,7 +1776,7 @@ int intel_guc_submission_init(struct intel_guc *guc)
 		  destroyed_worker_func);
 
 	guc->submission_state.guc_ids_bitmap =
-		bitmap_zalloc(NUMBER_MULTI_LRC_GUC_ID, GFP_KERNEL);
+		bitmap_zalloc(NUMBER_MULTI_LRC_GUC_ID(guc), GFP_KERNEL);
 	if (!guc->submission_state.guc_ids_bitmap)
 		return -ENOMEM;
 
@@ -1869,13 +1870,13 @@ static int new_guc_id(struct intel_guc *guc, struct intel_context *ce)
 
 	if (intel_context_is_parent(ce))
 		ret = bitmap_find_free_region(guc->submission_state.guc_ids_bitmap,
-					      NUMBER_MULTI_LRC_GUC_ID,
+					      NUMBER_MULTI_LRC_GUC_ID(guc),
 					      order_base_2(ce->parallel.number_children
 							   + 1));
 	else
 		ret = ida_simple_get(&guc->submission_state.guc_ids,
-				     NUMBER_MULTI_LRC_GUC_ID,
-				     GUC_MAX_LRC_DESCRIPTORS,
+				     NUMBER_MULTI_LRC_GUC_ID(guc),
+				     guc->submission_state.num_guc_ids,
 				     GFP_KERNEL | __GFP_RETRY_MAYFAIL |
 				     __GFP_NOWARN);
 	if (unlikely(ret < 0))
@@ -1941,6 +1942,10 @@ static int steal_guc_id(struct intel_guc *guc, struct intel_context *ce)
 
 		set_context_guc_id_invalid(cn);
 
+#ifdef CONFIG_DRM_I915_SELFTEST
+		guc->number_guc_id_stolen++;
+#endif
+
 		return 0;
 	} else {
 		return -EAGAIN;
@@ -3779,6 +3784,7 @@ static bool __guc_submission_selected(struct intel_guc *guc)
 
 void intel_guc_submission_init_early(struct intel_guc *guc)
 {
+	guc->submission_state.num_guc_ids = GUC_MAX_LRC_DESCRIPTORS;
 	guc->submission_supported = __guc_submission_supported(guc);
 	guc->submission_selected = __guc_submission_selected(guc);
 }
diff --git a/drivers/gpu/drm/i915/gt/uc/selftest_guc.c b/drivers/gpu/drm/i915/gt/uc/selftest_guc.c
index fb0e4a7bd8ca..2ae414446112 100644
--- a/drivers/gpu/drm/i915/gt/uc/selftest_guc.c
+++ b/drivers/gpu/drm/i915/gt/uc/selftest_guc.c
@@ -3,8 +3,21 @@
  * Copyright © 2021 Intel Corporation
  */
 
+#include "selftests/igt_spinner.h"
 #include "selftests/intel_scheduler_helpers.h"
 
+static int request_add_spin(struct i915_request *rq, struct igt_spinner *spin)
+{
+	int err = 0;
+
+	i915_request_get(rq);
+	i915_request_add(rq);
+	if (spin && !igt_wait_for_spinner(spin, rq))
+		err = -ETIMEDOUT;
+
+	return err;
+}
+
 static struct i915_request *nop_user_request(struct intel_context *ce,
 					     struct i915_request *from)
 {
@@ -110,10 +123,170 @@ static int intel_guc_scrub_ctbs(void *arg)
 	return ret;
 }
 
+/*
+ * intel_guc_steal_guc_ids - Test to exhaust all guc_ids and then steal one
+ *
+ * This test creates a spinner which is used to block all subsequent submissions
+ * until it completes. Next, a loop creates a context and a NOP request each
+ * iteration until the guc_ids are exhausted (request creation returns -EAGAIN).
+ * The spinner is ended, unblocking all requests created in the loop. At this
+ * point all guc_ids are exhausted but are available to steal. Try to create
+ * another request which should successfully steal a guc_id. Wait on last
+ * request to complete, idle GPU, verify a guc_id was stolen via a counter, and
+ * exit the test. Test also artificially reduces the number of guc_ids so the
+ * test runs in a timely manner.
+ */
+static int intel_guc_steal_guc_ids(void *arg)
+{
+	struct intel_gt *gt = arg;
+	struct intel_guc *guc = &gt->uc.guc;
+	int ret, sv, context_index = 0;
+	intel_wakeref_t wakeref;
+	struct intel_engine_cs *engine;
+	struct intel_context **ce;
+	struct igt_spinner spin;
+	struct i915_request *spin_rq = NULL, *rq, *last = NULL;
+	int number_guc_id_stolen = guc->number_guc_id_stolen;
+
+	ce = kzalloc(sizeof(*ce) * GUC_MAX_LRC_DESCRIPTORS, GFP_KERNEL);
+	if (!ce) {
+		pr_err("Context array allocation failed\n");
+		return -ENOMEM;
+	}
+
+	wakeref = intel_runtime_pm_get(gt->uncore->rpm);
+	engine = intel_selftest_find_any_engine(gt);
+	sv = guc->submission_state.num_guc_ids;
+	guc->submission_state.num_guc_ids = 4096;
+
+	/* Create spinner to block requests in below loop */
+	ce[context_index] = intel_context_create(engine);
+	if (IS_ERR(ce[context_index])) {
+		ret = PTR_ERR(ce[context_index]);
+		ce[context_index] = NULL;
+		pr_err("Failed to create context: %d\n", ret);
+		goto err_wakeref;
+	}
+	ret = igt_spinner_init(&spin, engine->gt);
+	if (ret) {
+		pr_err("Failed to create spinner: %d\n", ret);
+		goto err_contexts;
+	}
+	spin_rq = igt_spinner_create_request(&spin, ce[context_index],
+					     MI_ARB_CHECK);
+	if (IS_ERR(spin_rq)) {
+		ret = PTR_ERR(spin_rq);
+		pr_err("Failed to create spinner request: %d\n", ret);
+		goto err_contexts;
+	}
+	ret = request_add_spin(spin_rq, &spin);
+	if (ret) {
+		pr_err("Failed to add Spinner request: %d\n", ret);
+		goto err_spin_rq;
+	}
+
+	/* Use all guc_ids */
+	while (ret != -EAGAIN) {
+		ce[++context_index] = intel_context_create(engine);
+		if (IS_ERR(ce[context_index])) {
+			ret = PTR_ERR(ce[context_index--]);
+			ce[context_index] = NULL;
+			pr_err("Failed to create context: %d\n", ret);
+			goto err_spin_rq;
+		}
+
+		rq = nop_user_request(ce[context_index], spin_rq);
+		if (IS_ERR(rq)) {
+			ret = PTR_ERR(rq);
+			rq = NULL;
+			if (ret != -EAGAIN) {
+				pr_err("Failed to create request, %d: %d\n",
+				       context_index, ret);
+				goto err_spin_rq;
+			}
+		} else {
+			if (last)
+				i915_request_put(last);
+			last = rq;
+		}
+	}
+
+	/* Release blocked requests */
+	igt_spinner_end(&spin);
+	ret = intel_selftest_wait_for_rq(spin_rq);
+	if (ret) {
+		pr_err("Spin request failed to complete: %d\n", ret);
+		i915_request_put(last);
+		goto err_spin_rq;
+	}
+	i915_request_put(spin_rq);
+	igt_spinner_fini(&spin);
+	spin_rq = NULL;
+
+	/* Wait for last request */
+	ret = i915_request_wait(last, 0, HZ * 30);
+	i915_request_put(last);
+	if (ret < 0) {
+		pr_err("Last request failed to complete: %d\n", ret);
+		goto err_spin_rq;
+	}
+
+	/* Try to steal guc_id */
+	rq = nop_user_request(ce[context_index], NULL);
+	if (IS_ERR(rq)) {
+		ret = PTR_ERR(rq);
+		pr_err("Failed to steal guc_id, %d: %d\n", context_index, ret);
+		goto err_spin_rq;
+	}
+
+	/* Wait for request with stolen guc_id */
+	ret = i915_request_wait(rq, 0, HZ);
+	i915_request_put(rq);
+	if (ret < 0) {
+		pr_err("Request with stolen guc_id failed to complete: %d\n",
+		       ret);
+		goto err_spin_rq;
+	}
+
+	/* Wait for idle */
+	ret = intel_gt_wait_for_idle(gt, HZ * 30);
+	if (ret < 0) {
+		pr_err("GT failed to idle: %d\n", ret);
+		goto err_spin_rq;
+	}
+
+	/* Verify a guc_id was stolen */
+	if (guc->number_guc_id_stolen == number_guc_id_stolen) {
+		pr_err("No guc_id was stolen");
+		ret = -EINVAL;
+	} else {
+		ret = 0;
+	}
+
+err_spin_rq:
+	if (spin_rq) {
+		igt_spinner_end(&spin);
+		intel_selftest_wait_for_rq(spin_rq);
+		i915_request_put(spin_rq);
+		igt_spinner_fini(&spin);
+		intel_gt_wait_for_idle(gt, HZ * 30);
+	}
+err_contexts:
+	for (; context_index >= 0 && ce[context_index]; --context_index)
+		intel_context_put(ce[context_index]);
+err_wakeref:
+	intel_runtime_pm_put(gt->uncore->rpm, wakeref);
+	kfree(ce);
+	guc->submission_state.num_guc_ids = sv;
+
+	return ret;
+}
+
 int intel_guc_live_selftests(struct drm_i915_private *i915)
 {
 	static const struct i915_subtest tests[] = {
 		SUBTEST(intel_guc_scrub_ctbs),
+		SUBTEST(intel_guc_steal_guc_ids),
 	};
 	struct intel_gt *gt = &i915->gt;
 
-- 
2.33.1


* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Fix stealing guc_ids + test (rev3)
  2021-12-14 17:04 ` [Intel-gfx] " Matthew Brost
                   ` (7 preceding siblings ...)
  (?)
@ 2021-12-14 18:12 ` Patchwork
  -1 siblings, 0 replies; 28+ messages in thread
From: Patchwork @ 2021-12-14 18:12 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-gfx

== Series Details ==

Series: Fix stealing guc_ids + test (rev3)
URL   : https://patchwork.freedesktop.org/series/97896/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
b855be58ca75 drm/i915/guc: Use correct context lock when calling clr_context_registered
b2c8358f28bf drm/i915/guc: Only assign guc_id.id when stealing guc_id
16e7a94892d5 drm/i915/guc: Remove racey GEM_BUG_ON
4d46938487d5 drm/i915/guc: Don't hog IRQs when destroying contexts
cbcb8fe154c3 drm/i915/guc: Add extra debug on CT deadlock
344e5cf583e1 drm/i915/guc: Kick G2H tasklet if no credits
c07631e2c9df drm/i915/guc: Selftest for stealing of guc ids
-:183: WARNING:OOM_MESSAGE: Possible unnecessary 'out of memory' message
#183: FILE: drivers/gpu/drm/i915/gt/uc/selftest_guc.c:153:
+	if (!ce) {
+		pr_err("Context array allocation failed\n");

total: 0 errors, 1 warnings, 0 checks, 265 lines checked




* [Intel-gfx] ✗ Fi.CI.SPARSE: warning for Fix stealing guc_ids + test (rev3)
  2021-12-14 17:04 ` [Intel-gfx] " Matthew Brost
                   ` (8 preceding siblings ...)
  (?)
@ 2021-12-14 18:13 ` Patchwork
  -1 siblings, 0 replies; 28+ messages in thread
From: Patchwork @ 2021-12-14 18:13 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-gfx

== Series Details ==

Series: Fix stealing guc_ids + test (rev3)
URL   : https://patchwork.freedesktop.org/series/97896/
State : warning

== Summary ==

$ dim sparse --fast origin/drm-tip
Sparse version: v0.6.2
Fast mode used, each commit won't be checked separately.




* [Intel-gfx] ✓ Fi.CI.BAT: success for Fix stealing guc_ids + test (rev3)
  2021-12-14 17:04 ` [Intel-gfx] " Matthew Brost
                   ` (9 preceding siblings ...)
  (?)
@ 2021-12-14 18:42 ` Patchwork
  -1 siblings, 0 replies; 28+ messages in thread
From: Patchwork @ 2021-12-14 18:42 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-gfx

== Series Details ==

Series: Fix stealing guc_ids + test (rev3)
URL   : https://patchwork.freedesktop.org/series/97896/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_11002 -> Patchwork_21847
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/index.html

Participating hosts (46 -> 34)
------------------------------

  Missing    (12): bat-dg1-6 bat-dg1-5 fi-hsw-4200u fi-bdw-gvtdvm fi-bsw-cyan bat-adlp-6 bat-adlp-4 fi-ctg-p8600 fi-pnv-d510 fi-bdw-samus bat-jsl-2 bat-jsl-1 

Known issues
------------

  Here are the changes found in Patchwork_21847 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@amdgpu/amd_basic@query-info:
    - fi-bsw-kefka:       NOTRUN -> [SKIP][1] ([fdo#109271]) +17 similar issues
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/fi-bsw-kefka/igt@amdgpu/amd_basic@query-info.html

  * igt@amdgpu/amd_cs_nop@sync-fork-gfx0:
    - fi-skl-6600u:       NOTRUN -> [SKIP][2] ([fdo#109271]) +21 similar issues
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/fi-skl-6600u/igt@amdgpu/amd_cs_nop@sync-fork-gfx0.html

  * igt@gem_huc_copy@huc-copy:
    - fi-skl-6600u:       NOTRUN -> [SKIP][3] ([fdo#109271] / [i915#2190])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/fi-skl-6600u/igt@gem_huc_copy@huc-copy.html

  * igt@gem_lmem_swapping@verify-random:
    - fi-skl-6600u:       NOTRUN -> [SKIP][4] ([fdo#109271] / [i915#4613]) +3 similar issues
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/fi-skl-6600u/igt@gem_lmem_swapping@verify-random.html

  * igt@kms_chamelium@vga-edid-read:
    - fi-skl-6600u:       NOTRUN -> [SKIP][5] ([fdo#109271] / [fdo#111827]) +8 similar issues
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/fi-skl-6600u/igt@kms_chamelium@vga-edid-read.html

  * igt@kms_pipe_crc_basic@compare-crc-sanitycheck-pipe-d:
    - fi-skl-6600u:       NOTRUN -> [SKIP][6] ([fdo#109271] / [i915#533])
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/fi-skl-6600u/igt@kms_pipe_crc_basic@compare-crc-sanitycheck-pipe-d.html

  
#### Possible fixes ####

  * igt@gem_flink_basic@bad-flink:
    - fi-skl-6600u:       [INCOMPLETE][7] ([i915#4547]) -> [PASS][8]
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11002/fi-skl-6600u/igt@gem_flink_basic@bad-flink.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/fi-skl-6600u/igt@gem_flink_basic@bad-flink.html

  * igt@i915_selftest@live@execlists:
    - fi-bsw-kefka:       [INCOMPLETE][9] ([i915#2940]) -> [PASS][10]
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11002/fi-bsw-kefka/igt@i915_selftest@live@execlists.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/fi-bsw-kefka/igt@i915_selftest@live@execlists.html

  * igt@kms_frontbuffer_tracking@basic:
    - fi-cml-u2:          [DMESG-WARN][11] ([i915#4269]) -> [PASS][12]
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11002/fi-cml-u2/igt@kms_frontbuffer_tracking@basic.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/fi-cml-u2/igt@kms_frontbuffer_tracking@basic.html

  
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827
  [i915#2190]: https://gitlab.freedesktop.org/drm/intel/issues/2190
  [i915#2940]: https://gitlab.freedesktop.org/drm/intel/issues/2940
  [i915#4269]: https://gitlab.freedesktop.org/drm/intel/issues/4269
  [i915#4547]: https://gitlab.freedesktop.org/drm/intel/issues/4547
  [i915#4613]: https://gitlab.freedesktop.org/drm/intel/issues/4613
  [i915#533]: https://gitlab.freedesktop.org/drm/intel/issues/533


Build changes
-------------

  * Linux: CI_DRM_11002 -> Patchwork_21847

  CI-20190529: 20190529
  CI_DRM_11002: 8dabbcc22a6b77b358ff35d31adbaa653ab45857 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_6307: be84fe4f151bc092e068cab5cd0cd19c34948b40 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_21847: c07631e2c9dfcef2599ea8f76bd23049eee201e5 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

c07631e2c9df drm/i915/guc: Selftest for stealing of guc ids
344e5cf583e1 drm/i915/guc: Kick G2H tasklet if no credits
cbcb8fe154c3 drm/i915/guc: Add extra debug on CT deadlock
4d46938487d5 drm/i915/guc: Don't hog IRQs when destroying contexts
16e7a94892d5 drm/i915/guc: Remove racey GEM_BUG_ON
b2c8358f28bf drm/i915/guc: Only assign guc_id.id when stealing guc_id
b855be58ca75 drm/i915/guc: Use correct context lock when calling clr_context_registered

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/index.html


* Re: [PATCH 7/7] drm/i915/guc: Selftest for stealing of guc ids
  2021-12-14 17:05   ` [Intel-gfx] " Matthew Brost
@ 2021-12-14 19:48     ` John Harrison
  -1 siblings, 0 replies; 28+ messages in thread
From: John Harrison @ 2021-12-14 19:48 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel; +Cc: daniele.ceraolospurio

On 12/14/2021 09:05, Matthew Brost wrote:
> Testing the stealing of guc ids is hard from user space as we have 64k
> guc_ids. Add a selftest, which artificially reduces the number of guc
> ids, and forces a steal.
>
> The test creates a spinner which is used to block all subsequent
> submissions until it completes. Next, a loop creates a context and a NOP
> request each iteration until the guc_ids are exhausted (request creation
> returns -EAGAIN). The spinner is ended, unblocking all requests created
> in the loop. At this point all guc_ids are exhausted but are available
> to steal. Try to create another request which should successfully steal
> a guc_id. Wait on last request to complete, idle GPU, verify a guc_id
> was stolen via a counter, and exit the test. Test also artificially
> reduces the number of guc_ids so the test runs in a timely manner.
>
> v2:
>   (John Harrison)
>    - s/stole/stolen
>    - Fix some wording in test description
>    - Rework indexing into context array
>    - Add test description to commit message
>    - Fix typo in commit message
>   (Checkpatch)
>    - s/guc/(guc) in NUMBER_MULTI_LRC_GUC_ID
> v3:
>   (John Harrison)
>    - Set array value to NULL after extracting error
>    - Fix a few typos in comments / error messages
>    - Delete redundant comment in commit message
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>

> ---
>   drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  12 ++
>   .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  16 +-
>   drivers/gpu/drm/i915/gt/uc/selftest_guc.c     | 173 ++++++++++++++++++
>   3 files changed, 196 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> index 1cb46098030d..f9240d4baa69 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> @@ -94,6 +94,11 @@ struct intel_guc {
>   		 * @guc_ids: used to allocate new guc_ids, single-lrc
>   		 */
>   		struct ida guc_ids;
> +		/**
> +		 * @num_guc_ids: Number of guc_ids, selftest feature to be able
> +		 * to reduce this number while testing.
> +		 */
> +		int num_guc_ids;
>   		/**
>   		 * @guc_ids_bitmap: used to allocate new guc_ids, multi-lrc
>   		 */
> @@ -202,6 +207,13 @@ struct intel_guc {
>   		 */
>   		struct delayed_work work;
>   	} timestamp;
> +
> +#ifdef CONFIG_DRM_I915_SELFTEST
> +	/**
> +	 * @number_guc_id_stolen: The number of guc_ids that have been stolen
> +	 */
> +	int number_guc_id_stolen;
> +#endif
>   };
>   
>   static inline struct intel_guc *log_to_guc(struct intel_guc_log *log)
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 96fcf869e3ff..99414b49ca6d 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -145,7 +145,8 @@ guc_create_parallel(struct intel_engine_cs **engines,
>    * use should be low and 1/16 should be sufficient. Minimum of 32 guc_ids for
>    * multi-lrc.
>    */
> -#define NUMBER_MULTI_LRC_GUC_ID		(GUC_MAX_LRC_DESCRIPTORS / 16)
> +#define NUMBER_MULTI_LRC_GUC_ID(guc)	\
> +	((guc)->submission_state.num_guc_ids / 16)
>   
>   /*
>    * Below is a set of functions which control the GuC scheduling state which
> @@ -1775,7 +1776,7 @@ int intel_guc_submission_init(struct intel_guc *guc)
>   		  destroyed_worker_func);
>   
>   	guc->submission_state.guc_ids_bitmap =
> -		bitmap_zalloc(NUMBER_MULTI_LRC_GUC_ID, GFP_KERNEL);
> +		bitmap_zalloc(NUMBER_MULTI_LRC_GUC_ID(guc), GFP_KERNEL);
>   	if (!guc->submission_state.guc_ids_bitmap)
>   		return -ENOMEM;
>   
> @@ -1869,13 +1870,13 @@ static int new_guc_id(struct intel_guc *guc, struct intel_context *ce)
>   
>   	if (intel_context_is_parent(ce))
>   		ret = bitmap_find_free_region(guc->submission_state.guc_ids_bitmap,
> -					      NUMBER_MULTI_LRC_GUC_ID,
> +					      NUMBER_MULTI_LRC_GUC_ID(guc),
>   					      order_base_2(ce->parallel.number_children
>   							   + 1));
>   	else
>   		ret = ida_simple_get(&guc->submission_state.guc_ids,
> -				     NUMBER_MULTI_LRC_GUC_ID,
> -				     GUC_MAX_LRC_DESCRIPTORS,
> +				     NUMBER_MULTI_LRC_GUC_ID(guc),
> +				     guc->submission_state.num_guc_ids,
>   				     GFP_KERNEL | __GFP_RETRY_MAYFAIL |
>   				     __GFP_NOWARN);
>   	if (unlikely(ret < 0))
> @@ -1941,6 +1942,10 @@ static int steal_guc_id(struct intel_guc *guc, struct intel_context *ce)
>   
>   		set_context_guc_id_invalid(cn);
>   
> +#ifdef CONFIG_DRM_I915_SELFTEST
> +		guc->number_guc_id_stolen++;
> +#endif
> +
>   		return 0;
>   	} else {
>   		return -EAGAIN;
> @@ -3779,6 +3784,7 @@ static bool __guc_submission_selected(struct intel_guc *guc)
>   
>   void intel_guc_submission_init_early(struct intel_guc *guc)
>   {
> +	guc->submission_state.num_guc_ids = GUC_MAX_LRC_DESCRIPTORS;
>   	guc->submission_supported = __guc_submission_supported(guc);
>   	guc->submission_selected = __guc_submission_selected(guc);
>   }
> diff --git a/drivers/gpu/drm/i915/gt/uc/selftest_guc.c b/drivers/gpu/drm/i915/gt/uc/selftest_guc.c
> index fb0e4a7bd8ca..2ae414446112 100644
> --- a/drivers/gpu/drm/i915/gt/uc/selftest_guc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/selftest_guc.c
> @@ -3,8 +3,21 @@
>    * Copyright © 2021 Intel Corporation
>    */
>   
> +#include "selftests/igt_spinner.h"
>   #include "selftests/intel_scheduler_helpers.h"
>   
> +static int request_add_spin(struct i915_request *rq, struct igt_spinner *spin)
> +{
> +	int err = 0;
> +
> +	i915_request_get(rq);
> +	i915_request_add(rq);
> +	if (spin && !igt_wait_for_spinner(spin, rq))
> +		err = -ETIMEDOUT;
> +
> +	return err;
> +}
> +
>   static struct i915_request *nop_user_request(struct intel_context *ce,
>   					     struct i915_request *from)
>   {
> @@ -110,10 +123,170 @@ static int intel_guc_scrub_ctbs(void *arg)
>   	return ret;
>   }
>   
> +/*
> + * intel_guc_steal_guc_ids - Test to exhaust all guc_ids and then steal one
> + *
> + * This test creates a spinner which is used to block all subsequent submissions
> + * until it completes. Next, a loop creates a context and a NOP request each
> + * iteration until the guc_ids are exhausted (request creation returns -EAGAIN).
> + * The spinner is ended, unblocking all requests created in the loop. At this
> + * point all guc_ids are exhausted but are available to steal. Try to create
> + * another request which should successfully steal a guc_id. Wait on last
> + * request to complete, idle GPU, verify a guc_id was stolen via a counter, and
> + * exit the test. Test also artificially reduces the number of guc_ids so the
> + * test runs in a timely manner.
> + */
> +static int intel_guc_steal_guc_ids(void *arg)
> +{
> +	struct intel_gt *gt = arg;
> +	struct intel_guc *guc = &gt->uc.guc;
> +	int ret, sv, context_index = 0;
> +	intel_wakeref_t wakeref;
> +	struct intel_engine_cs *engine;
> +	struct intel_context **ce;
> +	struct igt_spinner spin;
> +	struct i915_request *spin_rq = NULL, *rq, *last = NULL;
> +	int number_guc_id_stolen = guc->number_guc_id_stolen;
> +
> +	ce = kzalloc(sizeof(*ce) * GUC_MAX_LRC_DESCRIPTORS, GFP_KERNEL);
> +	if (!ce) {
> +		pr_err("Context array allocation failed\n");
> +		return -ENOMEM;
> +	}
> +
> +	wakeref = intel_runtime_pm_get(gt->uncore->rpm);
> +	engine = intel_selftest_find_any_engine(gt);
> +	sv = guc->submission_state.num_guc_ids;
> +	guc->submission_state.num_guc_ids = 4096;
> +
> +	/* Create spinner to block requests in below loop */
> +	ce[context_index] = intel_context_create(engine);
> +	if (IS_ERR(ce[context_index])) {
> +		ret = PTR_ERR(ce[context_index]);
> +		ce[context_index] = NULL;
> +		pr_err("Failed to create context: %d\n", ret);
> +		goto err_wakeref;
> +	}
> +	ret = igt_spinner_init(&spin, engine->gt);
> +	if (ret) {
> +		pr_err("Failed to create spinner: %d\n", ret);
> +		goto err_contexts;
> +	}
> +	spin_rq = igt_spinner_create_request(&spin, ce[context_index],
> +					     MI_ARB_CHECK);
> +	if (IS_ERR(spin_rq)) {
> +		ret = PTR_ERR(spin_rq);
> +		pr_err("Failed to create spinner request: %d\n", ret);
> +		goto err_contexts;
> +	}
> +	ret = request_add_spin(spin_rq, &spin);
> +	if (ret) {
> +		pr_err("Failed to add spinner request: %d\n", ret);
> +		goto err_spin_rq;
> +	}
> +
> +	/* Use all guc_ids */
> +	while (ret != -EAGAIN) {
> +		ce[++context_index] = intel_context_create(engine);
> +		if (IS_ERR(ce[context_index])) {
> +			ret = PTR_ERR(ce[context_index--]);
> +			ce[context_index] = NULL;
> +			pr_err("Failed to create context: %d\n", ret);
> +			goto err_spin_rq;
> +		}
> +
> +		rq = nop_user_request(ce[context_index], spin_rq);
> +		if (IS_ERR(rq)) {
> +			ret = PTR_ERR(rq);
> +			rq = NULL;
> +			if (ret != -EAGAIN) {
> +				pr_err("Failed to create request, %d: %d\n",
> +				       context_index, ret);
> +				goto err_spin_rq;
> +			}
> +		} else {
> +			if (last)
> +				i915_request_put(last);
> +			last = rq;
> +		}
> +	}
> +
> +	/* Release blocked requests */
> +	igt_spinner_end(&spin);
> +	ret = intel_selftest_wait_for_rq(spin_rq);
> +	if (ret) {
> +		pr_err("Spin request failed to complete: %d\n", ret);
> +		i915_request_put(last);
> +		goto err_spin_rq;
> +	}
> +	i915_request_put(spin_rq);
> +	igt_spinner_fini(&spin);
> +	spin_rq = NULL;
> +
> +	/* Wait for last request */
> +	ret = i915_request_wait(last, 0, HZ * 30);
> +	i915_request_put(last);
> +	if (ret < 0) {
> +		pr_err("Last request failed to complete: %d\n", ret);
> +		goto err_spin_rq;
> +	}
> +
> +	/* Try to steal guc_id */
> +	rq = nop_user_request(ce[context_index], NULL);
> +	if (IS_ERR(rq)) {
> +		ret = PTR_ERR(rq);
> +		pr_err("Failed to steal guc_id, %d: %d\n", context_index, ret);
> +		goto err_spin_rq;
> +	}
> +
> +	/* Wait for request with stolen guc_id */
> +	ret = i915_request_wait(rq, 0, HZ);
> +	i915_request_put(rq);
> +	if (ret < 0) {
> +		pr_err("Request with stolen guc_id failed to complete: %d\n",
> +		       ret);
> +		goto err_spin_rq;
> +	}
> +
> +	/* Wait for idle */
> +	ret = intel_gt_wait_for_idle(gt, HZ * 30);
> +	if (ret < 0) {
> +		pr_err("GT failed to idle: %d\n", ret);
> +		goto err_spin_rq;
> +	}
> +
> +	/* Verify a guc_id was stolen */
> +	if (guc->number_guc_id_stolen == number_guc_id_stolen) {
> +		pr_err("No guc_id was stolen\n");
> +		ret = -EINVAL;
> +	} else {
> +		ret = 0;
> +	}
> +
> +err_spin_rq:
> +	if (spin_rq) {
> +		igt_spinner_end(&spin);
> +		intel_selftest_wait_for_rq(spin_rq);
> +		i915_request_put(spin_rq);
> +		igt_spinner_fini(&spin);
> +		intel_gt_wait_for_idle(gt, HZ * 30);
> +	}
> +err_contexts:
> +	for (; context_index >= 0 && ce[context_index]; --context_index)
> +		intel_context_put(ce[context_index]);
> +err_wakeref:
> +	intel_runtime_pm_put(gt->uncore->rpm, wakeref);
> +	kfree(ce);
> +	guc->submission_state.num_guc_ids = sv;
> +
> +	return ret;
> +}
> +
>   int intel_guc_live_selftests(struct drm_i915_private *i915)
>   {
>   	static const struct i915_subtest tests[] = {
>   		SUBTEST(intel_guc_scrub_ctbs),
> +		SUBTEST(intel_guc_steal_guc_ids),
>   	};
>   	struct intel_gt *gt = &i915->gt;
>   


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Intel-gfx] ✗ Fi.CI.IGT: failure for Fix stealing guc_ids + test (rev3)
  2021-12-14 17:04 ` [Intel-gfx] " Matthew Brost
                   ` (10 preceding siblings ...)
  (?)
@ 2021-12-15  3:28 ` Patchwork
  -1 siblings, 0 replies; 28+ messages in thread
From: Patchwork @ 2021-12-15  3:28 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-gfx

== Series Details ==

Series: Fix stealing guc_ids + test (rev3)
URL   : https://patchwork.freedesktop.org/series/97896/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_11002_full -> Patchwork_21847_full
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_21847_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_21847_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

Participating hosts (10 -> 10)
------------------------------

  No changes in participating hosts

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_21847_full:

### IGT changes ###

#### Possible regressions ####

  * igt@perf@invalid-oa-metric-set-id:
    - shard-iclb:         [PASS][1] -> [INCOMPLETE][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11002/shard-iclb5/igt@perf@invalid-oa-metric-set-id.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-iclb4/igt@perf@invalid-oa-metric-set-id.html

  
Known issues
------------

  Here are the changes found in Patchwork_21847_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@feature_discovery@display-4x:
    - shard-iclb:         NOTRUN -> [SKIP][3] ([i915#1839])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-iclb7/igt@feature_discovery@display-4x.html

  * igt@feature_discovery@psr2:
    - shard-iclb:         [PASS][4] -> [SKIP][5] ([i915#658])
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11002/shard-iclb2/igt@feature_discovery@psr2.html
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-iclb6/igt@feature_discovery@psr2.html

  * igt@gem_exec_capture@pi@bcs0:
    - shard-skl:          [PASS][6] -> [INCOMPLETE][7] ([i915#4547])
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11002/shard-skl4/igt@gem_exec_capture@pi@bcs0.html
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-skl4/igt@gem_exec_capture@pi@bcs0.html

  * igt@gem_exec_fair@basic-none-rrul@rcs0:
    - shard-glk:          [PASS][8] -> [FAIL][9] ([i915#2842]) +1 similar issue
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11002/shard-glk5/igt@gem_exec_fair@basic-none-rrul@rcs0.html
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-glk4/igt@gem_exec_fair@basic-none-rrul@rcs0.html

  * igt@gem_exec_fair@basic-pace@bcs0:
    - shard-tglb:         [PASS][10] -> [FAIL][11] ([i915#2842]) +1 similar issue
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11002/shard-tglb8/igt@gem_exec_fair@basic-pace@bcs0.html
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-tglb6/igt@gem_exec_fair@basic-pace@bcs0.html

  * igt@gem_exec_fair@basic-pace@vecs0:
    - shard-kbl:          [PASS][12] -> [FAIL][13] ([i915#2842])
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11002/shard-kbl6/igt@gem_exec_fair@basic-pace@vecs0.html
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-kbl1/igt@gem_exec_fair@basic-pace@vecs0.html

  * igt@gem_exec_whisper@basic-contexts-forked-all:
    - shard-glk:          [PASS][14] -> [DMESG-WARN][15] ([i915#118])
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11002/shard-glk5/igt@gem_exec_whisper@basic-contexts-forked-all.html
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-glk4/igt@gem_exec_whisper@basic-contexts-forked-all.html

  * igt@gem_lmem_swapping@heavy-verify-multi:
    - shard-kbl:          NOTRUN -> [SKIP][16] ([fdo#109271] / [i915#4613]) +2 similar issues
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-kbl6/igt@gem_lmem_swapping@heavy-verify-multi.html

  * igt@gem_lmem_swapping@parallel-multi:
    - shard-apl:          NOTRUN -> [SKIP][17] ([fdo#109271] / [i915#4613]) +1 similar issue
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-apl3/igt@gem_lmem_swapping@parallel-multi.html
    - shard-skl:          NOTRUN -> [SKIP][18] ([fdo#109271] / [i915#4613])
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-skl10/igt@gem_lmem_swapping@parallel-multi.html
    - shard-tglb:         NOTRUN -> [SKIP][19] ([i915#4613]) +1 similar issue
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-tglb8/igt@gem_lmem_swapping@parallel-multi.html

  * igt@gem_lmem_swapping@smem-oom:
    - shard-iclb:         NOTRUN -> [SKIP][20] ([i915#4613])
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-iclb4/igt@gem_lmem_swapping@smem-oom.html

  * igt@gem_mmap_gtt@coherency:
    - shard-iclb:         NOTRUN -> [SKIP][21] ([fdo#109292])
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-iclb7/igt@gem_mmap_gtt@coherency.html

  * igt@gem_pxp@fail-invalid-protected-context:
    - shard-tglb:         NOTRUN -> [SKIP][22] ([i915#4270])
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-tglb1/igt@gem_pxp@fail-invalid-protected-context.html
    - shard-iclb:         NOTRUN -> [SKIP][23] ([i915#4270])
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-iclb4/igt@gem_pxp@fail-invalid-protected-context.html

  * igt@gem_pxp@regular-baseline-src-copy-readible:
    - shard-kbl:          NOTRUN -> [SKIP][24] ([fdo#109271]) +174 similar issues
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-kbl6/igt@gem_pxp@regular-baseline-src-copy-readible.html

  * igt@gen9_exec_parse@batch-without-end:
    - shard-tglb:         NOTRUN -> [SKIP][25] ([i915#2856])
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-tglb1/igt@gen9_exec_parse@batch-without-end.html

  * igt@i915_pm_dc@dc6-dpms:
    - shard-iclb:         [PASS][26] -> [FAIL][27] ([i915#454])
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11002/shard-iclb1/igt@i915_pm_dc@dc6-dpms.html
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-iclb3/igt@i915_pm_dc@dc6-dpms.html
    - shard-kbl:          NOTRUN -> [FAIL][28] ([i915#454])
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-kbl7/igt@i915_pm_dc@dc6-dpms.html

  * igt@i915_pm_dc@dc9-dpms:
    - shard-tglb:         NOTRUN -> [SKIP][29] ([i915#4281])
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-tglb1/igt@i915_pm_dc@dc9-dpms.html

  * igt@kms_big_fb@x-tiled-max-hw-stride-64bpp-rotate-0-hflip:
    - shard-kbl:          NOTRUN -> [SKIP][30] ([fdo#109271] / [i915#3777]) +2 similar issues
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-kbl6/igt@kms_big_fb@x-tiled-max-hw-stride-64bpp-rotate-0-hflip.html

  * igt@kms_big_fb@y-tiled-64bpp-rotate-270:
    - shard-tglb:         NOTRUN -> [SKIP][31] ([fdo#111614]) +1 similar issue
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-tglb1/igt@kms_big_fb@y-tiled-64bpp-rotate-270.html
    - shard-iclb:         NOTRUN -> [SKIP][32] ([fdo#110725] / [fdo#111614])
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-iclb4/igt@kms_big_fb@y-tiled-64bpp-rotate-270.html

  * igt@kms_big_fb@yf-tiled-32bpp-rotate-90:
    - shard-tglb:         NOTRUN -> [SKIP][33] ([fdo#111615])
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-tglb1/igt@kms_big_fb@yf-tiled-32bpp-rotate-90.html

  * igt@kms_ccs@pipe-a-bad-rotation-90-y_tiled_gen12_rc_ccs_cc:
    - shard-kbl:          NOTRUN -> [SKIP][34] ([fdo#109271] / [i915#3886]) +8 similar issues
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-kbl3/igt@kms_ccs@pipe-a-bad-rotation-90-y_tiled_gen12_rc_ccs_cc.html

  * igt@kms_ccs@pipe-a-ccs-on-another-bo-y_tiled_gen12_mc_ccs:
    - shard-apl:          NOTRUN -> [SKIP][35] ([fdo#109271] / [i915#3886]) +3 similar issues
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-apl1/igt@kms_ccs@pipe-a-ccs-on-another-bo-y_tiled_gen12_mc_ccs.html

  * igt@kms_ccs@pipe-a-ccs-on-another-bo-yf_tiled_ccs:
    - shard-tglb:         NOTRUN -> [SKIP][36] ([fdo#111615] / [i915#3689])
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-tglb1/igt@kms_ccs@pipe-a-ccs-on-another-bo-yf_tiled_ccs.html

  * igt@kms_ccs@pipe-b-missing-ccs-buffer-y_tiled_ccs:
    - shard-tglb:         NOTRUN -> [SKIP][37] ([i915#3689])
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-tglb1/igt@kms_ccs@pipe-b-missing-ccs-buffer-y_tiled_ccs.html

  * igt@kms_ccs@pipe-c-bad-pixel-format-y_tiled_gen12_mc_ccs:
    - shard-tglb:         NOTRUN -> [SKIP][38] ([i915#3689] / [i915#3886])
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-tglb1/igt@kms_ccs@pipe-c-bad-pixel-format-y_tiled_gen12_mc_ccs.html

  * igt@kms_ccs@pipe-c-bad-pixel-format-y_tiled_gen12_rc_ccs_cc:
    - shard-skl:          NOTRUN -> [SKIP][39] ([fdo#109271] / [i915#3886])
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-skl10/igt@kms_ccs@pipe-c-bad-pixel-format-y_tiled_gen12_rc_ccs_cc.html

  * igt@kms_chamelium@dp-edid-change-during-suspend:
    - shard-apl:          NOTRUN -> [SKIP][40] ([fdo#109271] / [fdo#111827]) +7 similar issues
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-apl3/igt@kms_chamelium@dp-edid-change-during-suspend.html

  * igt@kms_color@pipe-d-ctm-green-to-red:
    - shard-iclb:         NOTRUN -> [SKIP][41] ([fdo#109278] / [i915#1149])
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-iclb4/igt@kms_color@pipe-d-ctm-green-to-red.html

  * igt@kms_color_chamelium@pipe-a-ctm-0-75:
    - shard-kbl:          NOTRUN -> [SKIP][42] ([fdo#109271] / [fdo#111827]) +17 similar issues
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-kbl2/igt@kms_color_chamelium@pipe-a-ctm-0-75.html

  * igt@kms_color_chamelium@pipe-a-ctm-green-to-red:
    - shard-iclb:         NOTRUN -> [SKIP][43] ([fdo#109284] / [fdo#111827]) +1 similar issue
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-iclb4/igt@kms_color_chamelium@pipe-a-ctm-green-to-red.html

  * igt@kms_color_chamelium@pipe-c-degamma:
    - shard-tglb:         NOTRUN -> [SKIP][44] ([fdo#109284] / [fdo#111827]) +2 similar issues
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-tglb1/igt@kms_color_chamelium@pipe-c-degamma.html

  * igt@kms_content_protection@atomic-dpms:
    - shard-kbl:          NOTRUN -> [TIMEOUT][45] ([i915#1319])
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-kbl6/igt@kms_content_protection@atomic-dpms.html

  * igt@kms_content_protection@legacy:
    - shard-apl:          NOTRUN -> [TIMEOUT][46] ([i915#1319])
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-apl3/igt@kms_content_protection@legacy.html

  * igt@kms_content_protection@uevent:
    - shard-apl:          NOTRUN -> [FAIL][47] ([i915#2105])
   [47]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-apl1/igt@kms_content_protection@uevent.html

  * igt@kms_cursor_crc@pipe-b-cursor-32x32-onscreen:
    - shard-tglb:         NOTRUN -> [SKIP][48] ([i915#3319])
   [48]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-tglb1/igt@kms_cursor_crc@pipe-b-cursor-32x32-onscreen.html

  * igt@kms_cursor_crc@pipe-b-cursor-512x170-rapid-movement:
    - shard-glk:          NOTRUN -> [SKIP][49] ([fdo#109271]) +1 similar issue
   [49]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-glk4/igt@kms_cursor_crc@pipe-b-cursor-512x170-rapid-movement.html

  * igt@kms_cursor_crc@pipe-c-cursor-max-size-rapid-movement:
    - shard-tglb:         NOTRUN -> [SKIP][50] ([i915#3359])
   [50]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-tglb1/igt@kms_cursor_crc@pipe-c-cursor-max-size-rapid-movement.html

  * igt@kms_cursor_crc@pipe-d-cursor-32x32-onscreen:
    - shard-iclb:         NOTRUN -> [SKIP][51] ([fdo#109278]) +7 similar issues
   [51]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-iclb5/igt@kms_cursor_crc@pipe-d-cursor-32x32-onscreen.html

  * igt@kms_cursor_crc@pipe-d-cursor-512x512-rapid-movement:
    - shard-tglb:         NOTRUN -> [SKIP][52] ([fdo#109279] / [i915#3359])
   [52]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-tglb1/igt@kms_cursor_crc@pipe-d-cursor-512x512-rapid-movement.html

  * igt@kms_cursor_legacy@2x-long-flip-vs-cursor-legacy:
    - shard-iclb:         NOTRUN -> [SKIP][53] ([fdo#109274] / [fdo#109278]) +1 similar issue
   [53]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-iclb5/igt@kms_cursor_legacy@2x-long-flip-vs-cursor-legacy.html

  * igt@kms_cursor_legacy@flip-vs-cursor-varying-size:
    - shard-skl:          [PASS][54] -> [FAIL][55] ([i915#2346])
   [54]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11002/shard-skl6/igt@kms_cursor_legacy@flip-vs-cursor-varying-size.html
   [55]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-skl1/igt@kms_cursor_legacy@flip-vs-cursor-varying-size.html

  * igt@kms_cursor_legacy@pipe-d-forked-move:
    - shard-skl:          NOTRUN -> [SKIP][56] ([fdo#109271]) +14 similar issues
   [56]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-skl10/igt@kms_cursor_legacy@pipe-d-forked-move.html

  * igt@kms_dsc@basic-dsc-enable:
    - shard-iclb:         NOTRUN -> [SKIP][57] ([i915#3840])
   [57]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-iclb4/igt@kms_dsc@basic-dsc-enable.html

  * igt@kms_flip@2x-blocking-wf_vblank:
    - shard-iclb:         NOTRUN -> [SKIP][58] ([fdo#109274]) +2 similar issues
   [58]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-iclb4/igt@kms_flip@2x-blocking-wf_vblank.html

  * igt@kms_flip@flip-vs-expired-vblank-interruptible@c-dp1:
    - shard-apl:          [PASS][59] -> [FAIL][60] ([i915#79])
   [59]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11002/shard-apl7/igt@kms_flip@flip-vs-expired-vblank-interruptible@c-dp1.html
   [60]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-apl6/igt@kms_flip@flip-vs-expired-vblank-interruptible@c-dp1.html

  * igt@kms_flip@flip-vs-suspend-interruptible@c-dp1:
    - shard-apl:          [PASS][61] -> [DMESG-WARN][62] ([i915#180]) +4 similar issues
   [61]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11002/shard-apl3/igt@kms_flip@flip-vs-suspend-interruptible@c-dp1.html
   [62]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-apl8/igt@kms_flip@flip-vs-suspend-interruptible@c-dp1.html

  * igt@kms_flip@flip-vs-suspend@c-dp1:
    - shard-kbl:          [PASS][63] -> [DMESG-WARN][64] ([i915#180]) +7 similar issues
   [63]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11002/shard-kbl6/igt@kms_flip@flip-vs-suspend@c-dp1.html
   [64]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-kbl1/igt@kms_flip@flip-vs-suspend@c-dp1.html

  * igt@kms_flip@plain-flip-fb-recreate@b-edp1:
    - shard-skl:          [PASS][65] -> [FAIL][66] ([i915#2122]) +1 similar issue
   [65]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11002/shard-skl6/igt@kms_flip@plain-flip-fb-recreate@b-edp1.html
   [66]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-skl1/igt@kms_flip@plain-flip-fb-recreate@b-edp1.html

  * igt@kms_flip_scaled_crc@flip-32bpp-ytile-to-32bpp-ytilegen12rcccs:
    - shard-kbl:          NOTRUN -> [SKIP][67] ([fdo#109271] / [i915#2672])
   [67]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-kbl6/igt@kms_flip_scaled_crc@flip-32bpp-ytile-to-32bpp-ytilegen12rcccs.html

  * igt@kms_flip_scaled_crc@flip-64bpp-ytile-to-32bpp-ytilercccs:
    - shard-skl:          NOTRUN -> [SKIP][68] ([fdo#109271] / [i915#2672])
   [68]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-skl10/igt@kms_flip_scaled_crc@flip-64bpp-ytile-to-32bpp-ytilercccs.html

  * igt@kms_frontbuffer_tracking@fbcpsr-2p-primscrn-spr-indfb-move:
    - shard-tglb:         NOTRUN -> [SKIP][69] ([fdo#111825]) +9 similar issues
   [69]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-tglb8/igt@kms_frontbuffer_tracking@fbcpsr-2p-primscrn-spr-indfb-move.html

  * igt@kms_frontbuffer_tracking@fbcpsr-2p-scndscrn-spr-indfb-draw-pwrite:
    - shard-iclb:         NOTRUN -> [SKIP][70] ([fdo#109280]) +4 similar issues
   [70]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-iclb7/igt@kms_frontbuffer_tracking@fbcpsr-2p-scndscrn-spr-indfb-draw-pwrite.html

  * igt@kms_hdr@bpc-switch-dpms:
    - shard-skl:          [PASS][71] -> [FAIL][72] ([i915#1188]) +2 similar issues
   [71]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11002/shard-skl1/igt@kms_hdr@bpc-switch-dpms.html
   [72]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-skl4/igt@kms_hdr@bpc-switch-dpms.html

  * igt@kms_hdr@static-swap:
    - shard-tglb:         NOTRUN -> [SKIP][73] ([i915#1187])
   [73]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-tglb1/igt@kms_hdr@static-swap.html

  * igt@kms_pipe_crc_basic@disable-crc-after-crtc-pipe-d:
    - shard-skl:          NOTRUN -> [SKIP][74] ([fdo#109271] / [i915#533])
   [74]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-skl7/igt@kms_pipe_crc_basic@disable-crc-after-crtc-pipe-d.html

  * igt@kms_plane_alpha_blend@pipe-b-alpha-basic:
    - shard-kbl:          NOTRUN -> [FAIL][75] ([fdo#108145] / [i915#265]) +1 similar issue
   [75]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-kbl6/igt@kms_plane_alpha_blend@pipe-b-alpha-basic.html

  * igt@kms_plane_alpha_blend@pipe-b-alpha-transparent-fb:
    - shard-kbl:          NOTRUN -> [FAIL][76] ([i915#265])
   [76]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-kbl3/igt@kms_plane_alpha_blend@pipe-b-alpha-transparent-fb.html

  * igt@kms_plane_alpha_blend@pipe-b-constant-alpha-max:
    - shard-apl:          NOTRUN -> [FAIL][77] ([fdo#108145] / [i915#265]) +1 similar issue
   [77]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-apl1/igt@kms_plane_alpha_blend@pipe-b-constant-alpha-max.html

  * igt@kms_plane_alpha_blend@pipe-c-constant-alpha-max:
    - shard-skl:          NOTRUN -> [FAIL][78] ([fdo#108145] / [i915#265]) +1 similar issue
   [78]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-skl10/igt@kms_plane_alpha_blend@pipe-c-constant-alpha-max.html

  * igt@kms_plane_lowres@pipe-b-tiling-yf:
    - shard-iclb:         NOTRUN -> [SKIP][79] ([i915#3536])
   [79]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-iclb5/igt@kms_plane_lowres@pipe-b-tiling-yf.html

  * igt@kms_plane_lowres@pipe-c-tiling-y:
    - shard-tglb:         NOTRUN -> [SKIP][80] ([i915#3536])
   [80]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-tglb1/igt@kms_plane_lowres@pipe-c-tiling-y.html

  * igt@kms_psr2_sf@overlay-plane-update-sf-dmg-area:
    - shard-kbl:          NOTRUN -> [SKIP][81] ([fdo#109271] / [i915#658])
   [81]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-kbl2/igt@kms_psr2_sf@overlay-plane-update-sf-dmg-area.html

  * igt@kms_psr2_sf@overlay-primary-update-sf-dmg-area:
    - shard-tglb:         NOTRUN -> [SKIP][82] ([i915#2920])
   [82]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-tglb1/igt@kms_psr2_sf@overlay-primary-update-sf-dmg-area.html

  * igt@kms_psr@psr2_basic:
    - shard-tglb:         NOTRUN -> [FAIL][83] ([i915#132] / [i915#3467])
   [83]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-tglb1/igt@kms_psr@psr2_basic.html

  * igt@kms_psr@psr2_cursor_plane_move:
    - shard-iclb:         [PASS][84] -> [SKIP][85] ([fdo#109441]) +2 similar issues
   [84]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11002/shard-iclb2/igt@kms_psr@psr2_cursor_plane_move.html
   [85]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-iclb8/igt@kms_psr@psr2_cursor_plane_move.html

  * igt@kms_psr@psr2_primary_mmap_cpu:
    - shard-iclb:         NOTRUN -> [SKIP][86] ([fdo#109441]) +1 similar issue
   [86]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-iclb5/igt@kms_psr@psr2_primary_mmap_cpu.html

  * igt@kms_setmode@basic:
    - shard-glk:          [PASS][87] -> [FAIL][88] ([i915#31])
   [87]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11002/shard-glk6/igt@kms_setmode@basic.html
   [88]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-glk1/igt@kms_setmode@basic.html

  * igt@kms_tv_load_detect@load-detect:
    - shard-tglb:         NOTRUN -> [SKIP][89] ([fdo#109309])
   [89]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-tglb1/igt@kms_tv_load_detect@load-detect.html

  * igt@kms_vblank@pipe-a-ts-continuation-suspend:
    - shard-kbl:          [PASS][90] -> [DMESG-WARN][91] ([i915#180] / [i915#295])
   [90]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11002/shard-kbl7/igt@kms_vblank@pipe-a-ts-continuation-suspend.html
   [91]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-kbl1/igt@kms_vblank@pipe-a-ts-continuation-suspend.html

  * igt@kms_vblank@pipe-b-ts-continuation-dpms-suspend:
    - shard-kbl:          [PASS][92] -> [INCOMPLETE][93] ([i915#2828])
   [92]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11002/shard-kbl7/igt@kms_vblank@pipe-b-ts-continuation-dpms-suspend.html
   [93]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-kbl4/igt@kms_vblank@pipe-b-ts-continuation-dpms-suspend.html

  * igt@kms_vblank@pipe-c-ts-continuation-suspend:
    - shard-skl:          [PASS][94] -> [INCOMPLETE][95] ([i915#2828])
   [94]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11002/shard-skl4/igt@kms_vblank@pipe-c-ts-continuation-suspend.html
   [95]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-skl4/igt@kms_vblank@pipe-c-ts-continuation-suspend.html

  * igt@kms_vblank@pipe-d-ts-continuation-idle:
    - shard-apl:          NOTRUN -> [SKIP][96] ([fdo#109271]) +77 similar issues
   [96]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-apl3/igt@kms_vblank@pipe-d-ts-continuation-idle.html

  * igt@nouveau_crc@pipe-b-ctx-flip-detection:
    - shard-tglb:         NOTRUN -> [SKIP][97] ([i915#2530]) +1 similar issue
   [97]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-tglb1/igt@nouveau_crc@pipe-b-ctx-flip-detection.html

  * igt@perf@polling-parameterized:
    - shard-skl:          [PASS][98] -> [FAIL][99] ([i915#1542])
   [98]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11002/shard-skl6/igt@perf@polling-parameterized.html
   [99]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-skl1/igt@perf@polling-parameterized.html

  * igt@perf_pmu@module-unload:
    - shard-apl:          [PASS][100] -> [INCOMPLETE][101] ([i915#1982] / [i915#262])
   [100]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11002/shard-apl4/igt@perf_pmu@module-unload.html
   [101]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-apl6/igt@perf_pmu@module-unload.html

  * igt@sysfs_clients@fair-0:
    - shard-tglb:         NOTRUN -> [SKIP][102] ([i915#2994]) +1 similar issue
   [102]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-tglb8/igt@sysfs_clients@fair-0.html
    - shard-apl:          NOTRUN -> [SKIP][103] ([fdo#109271] / [i915#2994]) +2 similar issues
   [103]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-apl3/igt@sysfs_clients@fair-0.html

  * igt@sysfs_clients@fair-1:
    - shard-skl:          NOTRUN -> [SKIP][104] ([fdo#109271] / [i915#2994])
   [104]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-skl10/igt@sysfs_clients@fair-1.html

  * igt@sysfs_clients@fair-7:
    - shard-iclb:         NOTRUN -> [SKIP][105] ([i915#2994])
   [105]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-iclb5/igt@sysfs_clients@fair-7.html

  * igt@sysfs_clients@sema-50:
    - shard-kbl:          NOTRUN -> [SKIP][106] ([fdo#109271] / [i915#2994]) +2 similar issues
   [106]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-kbl7/igt@sysfs_clients@sema-50.html

  
#### Possible fixes ####

  * igt@gem_eio@in-flight-contexts-immediate:
    - {shard-rkl}:        ([TIMEOUT][107], [PASS][108]) ([i915#3063]) -> [PASS][109]
   [107]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11002/shard-rkl-1/igt@gem_eio@in-flight-contexts-immediate.html
   [108]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11002/shard-rkl-4/igt@gem_eio@in-flight-contexts-immediate.html
   [109]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-rkl-1/igt@gem_eio@in-flight-contexts-immediate.html

  * igt@gem_exec_fair@basic-none-share@rcs0:
    - shard-glk:          [FAIL][110] ([i915#2842]) -> [PASS][111]
   [110]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11002/shard-glk4/igt@gem_exec_fair@basic-none-share@rcs0.html
   [111]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-glk5/igt@gem_exec_fair@basic-none-share@rcs0.html

  * igt@gem_exec_fair@basic-none@vcs0:
    - shard-kbl:          [FAIL][112] ([i915#2842]) -> [PASS][113] +1 similar issue
   [112]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11002/shard-kbl7/igt@gem_exec_fair@basic-none@vcs0.html
   [113]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-kbl4/igt@gem_exec_fair@basic-none@vcs0.html

  * igt@gem_exec_fair@basic-pace-share@rcs0:
    - shard-tglb:         [FAIL][114] ([i915#2842]) -> [PASS][115]
   [114]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11002/shard-tglb1/igt@gem_exec_fair@basic-pace-share@rcs0.html
   [115]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-tglb1/igt@gem_exec_fair@basic-pace-share@rcs0.html

  * igt@gem_exec_fair@basic-throttle@rcs0:
    - shard-iclb:         [FAIL][116] ([i915#2849]) -> [PASS][117]
   [116]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11002/shard-iclb6/igt@gem_exec_fair@basic-throttle@rcs0.html
   [117]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-iclb2/igt@gem_exec_fair@basic-throttle@rcs0.html

  * igt@gem_spin_batch@spin-each:
    - shard-apl:          [FAIL][118] ([i915#2898]) -> [PASS][119]
   [118]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11002/shard-apl8/igt@gem_spin_batch@spin-each.html
   [119]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-apl7/igt@gem_spin_batch@spin-each.html

  * igt@gen9_exec_parse@allowed-single:
    - shard-skl:          [DMESG-WARN][120] ([i915#1436] / [i915#716]) -> [PASS][121]
   [120]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11002/shard-skl1/igt@gen9_exec_parse@allowed-single.html
   [121]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-skl10/igt@gen9_exec_parse@allowed-single.html

  * igt@i915_pm_rpm@drm-resources-equal:
    - shard-kbl:          [DMESG-WARN][122] ([i915#62] / [i915#92]) -> [PASS][123] +34 similar issues
   [122]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11002/shard-kbl6/igt@i915_pm_rpm@drm-resources-equal.html
   [123]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-kbl1/igt@i915_pm_rpm@drm-resources-equal.html

  * igt@i915_suspend@sysfs-reader:
    - shard-apl:          [DMESG-WARN][124] ([i915#180]) -> [PASS][125] +2 similar issues
   [124]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11002/shard-apl1/igt@i915_suspend@sysfs-reader.html
   [125]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-apl1/igt@i915_suspend@sysfs-reader.html

  * igt@kms_big_fb@yf-tiled-32bpp-rotate-0:
    - shard-glk:          [DMESG-WARN][126] ([i915#118]) -> [PASS][127]
   [126]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11002/shard-glk2/igt@kms_big_fb@yf-tiled-32bpp-rotate-0.html
   [127]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-glk8/igt@kms_big_fb@yf-tiled-32bpp-rotate-0.html

  * igt@kms_cursor_legacy@pipe-c-torture-move:
    - {shard-rkl}:        [SKIP][128] ([i915#4070]) -> [PASS][129]
   [128]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11002/shard-rkl-1/igt@kms_cursor_legacy@pipe-c-torture-move.html
   [129]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-rkl-4/igt@kms_cursor_legacy@pipe-c-torture-move.html

  * igt@kms_flip@flip-vs-expired-vblank-interruptible@a-dp1:
    - shard-apl:          [FAIL][130] ([i915#79]) -> [PASS][131]
   [130]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11002/shard-apl7/igt@kms_flip@flip-vs-expired-vblank-interruptible@a-dp1.html
   [131]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-apl6/igt@kms_flip@flip-vs-expired-vblank-interruptible@a-dp1.html

  * igt@kms_flip@flip-vs-suspend-interruptible@a-dp1:
    - shard-kbl:          [DMESG-WARN][132] ([i915#180]) -> [PASS][133] +7 similar issues
   [132]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11002/shard-kbl4/igt@kms_flip@flip-vs-suspend-interruptible@a-dp1.html
   [133]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-kbl6/igt@kms_flip@flip-vs-suspend-interruptible@a-dp1.html

  * igt@kms_flip@flip-vs-wf_vblank-interruptible@a-dp1:
    - shard-kbl:          [DMESG-WARN][134] ([i915#1982] / [i915#62] / [i915#92]) -> [PASS][135]
   [134]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11002/shard-kbl6/igt@kms_flip@flip-vs-wf_vblank-interruptible@a-dp1.html
   [135]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-kbl1/igt@kms_flip@flip-vs-wf_vblank-interruptible@a-dp1.html

  * igt@kms_flip@plain-flip-fb-recreate-interruptible@c-edp1:
    - shard-skl:          [FAIL][136] ([i915#2122]) -> [PASS][137] +1 similar issue
   [136]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11002/shard-skl10/igt@kms_flip@plain-flip-fb-recreate-interruptible@c-edp1.html
   [137]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/shard-skl8/igt@kms_flip@plain-flip-fb-recreate-interruptible@c-edp1.html

  * igt@kms_flip_scaled_crc@flip-32bpp-ytileccs-to-64bpp-ytile:
    - shard-iclb:         [SKIP][138] ([i915#3701]) -> [PASS][139]
   [138]: https://intel-gfx-ci.01.org/tree/

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21847/index.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Intel-gfx] [PATCH 4/7] drm/i915/guc: Don't hog IRQs when destroying contexts
  2021-12-14 17:04   ` [Intel-gfx] " Matthew Brost
  (?)
@ 2021-12-17 11:06   ` Tvrtko Ursulin
  2021-12-17 11:14     ` Tvrtko Ursulin
  -1 siblings, 1 reply; 28+ messages in thread
From: Tvrtko Ursulin @ 2021-12-17 11:06 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel

On 14/12/2021 17:04, Matthew Brost wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> While attempting to debug a CT deadlock issue in various CI failures
> (most easily reproduced with gem_ctx_create/basic-files), I was seeing
> CPU deadlock errors being reported. This were because the context
> destroy loop was blocking waiting on H2G space from inside an IRQ
> spinlock. There no was deadlock as such, it's just that the H2G queue
> was full of context destroy commands and GuC was taking a long time to
> process them. However, the kernel was seeing the large amount of time
> spent inside the IRQ lock as a dead CPU. Various Bad Things(tm) would
> then happen (heartbeat failures, CT deadlock errors, outstanding H2G
> WARNs, etc.).
> 
> Re-working the loop to only acquire the spinlock around the list
> management (which is all it is meant to protect) rather than the
> entire destroy operation seems to fix all the above issues.
> 
> v2:
>   (John Harrison)
>    - Fix typo in comment message
> 
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Reviewed-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 45 ++++++++++++-------
>   1 file changed, 28 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 36c2965db49b..96fcf869e3ff 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -2644,7 +2644,6 @@ static inline void guc_lrc_desc_unpin(struct intel_context *ce)
>   	unsigned long flags;
>   	bool disabled;
>   
> -	lockdep_assert_held(&guc->submission_state.lock);
>   	GEM_BUG_ON(!intel_gt_pm_is_awake(gt));
>   	GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id.id));
>   	GEM_BUG_ON(ce != __get_context(guc, ce->guc_id.id));
> @@ -2660,7 +2659,7 @@ static inline void guc_lrc_desc_unpin(struct intel_context *ce)
>   	}
>   	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
>   	if (unlikely(disabled)) {
> -		__release_guc_id(guc, ce);
> +		release_guc_id(guc, ce);
>   		__guc_context_destroy(ce);
>   		return;
>   	}
> @@ -2694,36 +2693,48 @@ static void __guc_context_destroy(struct intel_context *ce)
>   
>   static void guc_flush_destroyed_contexts(struct intel_guc *guc)
>   {
> -	struct intel_context *ce, *cn;
> +	struct intel_context *ce;
>   	unsigned long flags;
>   
>   	GEM_BUG_ON(!submission_disabled(guc) &&
>   		   guc_submission_initialized(guc));
>   
> -	spin_lock_irqsave(&guc->submission_state.lock, flags);
> -	list_for_each_entry_safe(ce, cn,
> -				 &guc->submission_state.destroyed_contexts,
> -				 destroyed_link) {
> -		list_del_init(&ce->destroyed_link);
> -		__release_guc_id(guc, ce);
> +	while (!list_empty(&guc->submission_state.destroyed_contexts)) {

Are lockless false negatives a concern here - I mean this thread not seeing something that just got added to the list?

> +		spin_lock_irqsave(&guc->submission_state.lock, flags);
> +		ce = list_first_entry_or_null(&guc->submission_state.destroyed_contexts,
> +					      struct intel_context,
> +					      destroyed_link);
> +		if (ce)
> +			list_del_init(&ce->destroyed_link);
> +		spin_unlock_irqrestore(&guc->submission_state.lock, flags);
> +
> +		if (!ce)
> +			break;
> +
> +		release_guc_id(guc, ce);

This looks suboptimal and in conflict with this part of the commit message:

"""
  Re-working the loop to only acquire the spinlock around the list
  management (which is all it is meant to protect) rather than the
  entire destroy operation seems to fix all the above issues.
"""

Because you end up doing:

... loop ...
   spin_lock_irqsave(&guc->submission_state.lock, flags);
   list_del_init(&ce->destroyed_link);
   spin_unlock_irqrestore(&guc->submission_state.lock, flags);

   release_guc_id, which calls:
     spin_lock_irqsave(&guc->submission_state.lock, flags);
     __release_guc_id(guc, ce);
     spin_unlock_irqrestore(&guc->submission_state.lock, flags);

So a) the lock seems to be protecting more than just list management, or release_guc_id is wrong, and b) the loop ends up with highly questionable hammering on the lock.

Is there any point to this part of the patch? Or the only business end of the patch is below:

>   		__guc_context_destroy(ce);
>   	}
> -	spin_unlock_irqrestore(&guc->submission_state.lock, flags);
>   }
>   
>   static void deregister_destroyed_contexts(struct intel_guc *guc)
>   {
> -	struct intel_context *ce, *cn;
> +	struct intel_context *ce;
>   	unsigned long flags;
>   
> -	spin_lock_irqsave(&guc->submission_state.lock, flags);
> -	list_for_each_entry_safe(ce, cn,
> -				 &guc->submission_state.destroyed_contexts,
> -				 destroyed_link) {
> -		list_del_init(&ce->destroyed_link);
> +	while (!list_empty(&guc->submission_state.destroyed_contexts)) {
> +		spin_lock_irqsave(&guc->submission_state.lock, flags);
> +		ce = list_first_entry_or_null(&guc->submission_state.destroyed_contexts,
> +					      struct intel_context,
> +					      destroyed_link);
> +		if (ce)
> +			list_del_init(&ce->destroyed_link);
> +		spin_unlock_irqrestore(&guc->submission_state.lock, flags);
> +
> +		if (!ce)
> +			break;
> +
>   		guc_lrc_desc_unpin(ce);

Here?

Not wanting/needing to nest ce->guc_state.lock under guc->submission_state.lock, and call the CPU cycle expensive deregister_context?

1)
Could you unlink en masse, under the assumption destroyed contexts are not reachable from anywhere else at this point, so under a single lock hold?

2)
But then you also end up with guc_lrc_desc_unpin calling __release_guc_id, which when called by release_guc_id does take guc->submission_state.lock and here it does not. Is it then clear which operations inside __release_guc_id need the lock? Bitmap or IDA?
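
For reference, the split being discussed is the usual kernel locked/unlocked
idiom -- roughly this shape (a sketch only, not the actual i915 code; the
body of __release_guc_id is elided):

static void __release_guc_id(struct intel_guc *guc, struct intel_context *ce)
{
	lockdep_assert_held(&guc->submission_state.lock);

	/* free the guc_id resource (bitmap slot or IDA entry) for ce */
}

static void release_guc_id(struct intel_guc *guc, struct intel_context *ce)
{
	unsigned long flags;

	spin_lock_irqsave(&guc->submission_state.lock, flags);
	__release_guc_id(guc, ce);
	spin_unlock_irqrestore(&guc->submission_state.lock, flags);
}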

Regards,

Tvrtko

>   	}
> -	spin_unlock_irqrestore(&guc->submission_state.lock, flags);
>   }
>   
>   static void destroyed_worker_func(struct work_struct *w)
> 

* Re: [Intel-gfx] [PATCH 4/7] drm/i915/guc: Don't hog IRQs when destroying contexts
  2021-12-17 11:06   ` Tvrtko Ursulin
@ 2021-12-17 11:14     ` Tvrtko Ursulin
  2021-12-22 16:25         ` Tvrtko Ursulin
  0 siblings, 1 reply; 28+ messages in thread
From: Tvrtko Ursulin @ 2021-12-17 11:14 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel


On 17/12/2021 11:06, Tvrtko Ursulin wrote:
> On 14/12/2021 17:04, Matthew Brost wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> While attempting to debug a CT deadlock issue in various CI failures
>> (most easily reproduced with gem_ctx_create/basic-files), I was seeing
>> CPU deadlock errors being reported. These were because the context
>> destroy loop was blocking waiting on H2G space from inside an IRQ
>> spinlock. There was no deadlock as such, it's just that the H2G queue
>> was full of context destroy commands and GuC was taking a long time to
>> process them. However, the kernel was seeing the large amount of time
>> spent inside the IRQ lock as a dead CPU. Various Bad Things(tm) would
>> then happen (heartbeat failures, CT deadlock errors, outstanding H2G
>> WARNs, etc.).
>>
>> Re-working the loop to only acquire the spinlock around the list
>> management (which is all it is meant to protect) rather than the
>> entire destroy operation seems to fix all the above issues.
>>
>> v2:
>>   (John Harrison)
>>    - Fix typo in commit message
>>
>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>> Reviewed-by: Matthew Brost <matthew.brost@intel.com>
>> ---
>>   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 45 ++++++++++++-------
>>   1 file changed, 28 insertions(+), 17 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>> index 36c2965db49b..96fcf869e3ff 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>> @@ -2644,7 +2644,6 @@ static inline void guc_lrc_desc_unpin(struct 
>> intel_context *ce)
>>       unsigned long flags;
>>       bool disabled;
>> -    lockdep_assert_held(&guc->submission_state.lock);
>>       GEM_BUG_ON(!intel_gt_pm_is_awake(gt));
>>       GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id.id));
>>       GEM_BUG_ON(ce != __get_context(guc, ce->guc_id.id));
>> @@ -2660,7 +2659,7 @@ static inline void guc_lrc_desc_unpin(struct 
>> intel_context *ce)
>>       }
>>       spin_unlock_irqrestore(&ce->guc_state.lock, flags);
>>       if (unlikely(disabled)) {
>> -        __release_guc_id(guc, ce);
>> +        release_guc_id(guc, ce);
>>           __guc_context_destroy(ce);
>>           return;
>>       }
>> @@ -2694,36 +2693,48 @@ static void __guc_context_destroy(struct 
>> intel_context *ce)
>>   static void guc_flush_destroyed_contexts(struct intel_guc *guc)
>>   {
>> -    struct intel_context *ce, *cn;
>> +    struct intel_context *ce;
>>       unsigned long flags;
>>       GEM_BUG_ON(!submission_disabled(guc) &&
>>              guc_submission_initialized(guc));
>> -    spin_lock_irqsave(&guc->submission_state.lock, flags);
>> -    list_for_each_entry_safe(ce, cn,
>> -                 &guc->submission_state.destroyed_contexts,
>> -                 destroyed_link) {
>> -        list_del_init(&ce->destroyed_link);
>> -        __release_guc_id(guc, ce);
>> +    while (!list_empty(&guc->submission_state.destroyed_contexts)) {
> 
> Are lockless false negatives a concern here - I mean this thread not 
> seeing something just got added to the list?
> 
>> +        spin_lock_irqsave(&guc->submission_state.lock, flags);
>> +        ce = 
>> list_first_entry_or_null(&guc->submission_state.destroyed_contexts,
>> +                          struct intel_context,
>> +                          destroyed_link);
>> +        if (ce)
>> +            list_del_init(&ce->destroyed_link);
>> +        spin_unlock_irqrestore(&guc->submission_state.lock, flags);
>> +
>> +        if (!ce)
>> +            break;
>> +
>> +        release_guc_id(guc, ce);
> 
> This looks suboptimal and in conflict with this part of the commit message:
> 
> """
>   Re-working the loop to only acquire the spinlock around the list
>   management (which is all it is meant to protect) rather than the
>   entire destroy operation seems to fix all the above issues.
> """
> 
> Because you end up doing:
> 
> ... loop ...
>    spin_lock_irqsave(&guc->submission_state.lock, flags);
>    list_del_init(&ce->destroyed_link);
>    spin_unlock_irqrestore(&guc->submission_state.lock, flags);
> 
>    release_guc_id, which calls:
>      spin_lock_irqsave(&guc->submission_state.lock, flags);
>      __release_guc_id(guc, ce);
>      spin_unlock_irqrestore(&guc->submission_state.lock, flags);
> 
> So a) the lock seems to be protecting more than just list management, or 
>> release_guc_id is wrong, and b) the loop ends up with highly 
> questionable hammering on the lock.
> 
> Is there any point to this part of the patch? Or the only business end 
> of the patch is below:
> 
>>           __guc_context_destroy(ce);
>>       }
>> -    spin_unlock_irqrestore(&guc->submission_state.lock, flags);
>>   }
>>   static void deregister_destroyed_contexts(struct intel_guc *guc)
>>   {
>> -    struct intel_context *ce, *cn;
>> +    struct intel_context *ce;
>>       unsigned long flags;
>> -    spin_lock_irqsave(&guc->submission_state.lock, flags);
>> -    list_for_each_entry_safe(ce, cn,
>> -                 &guc->submission_state.destroyed_contexts,
>> -                 destroyed_link) {
>> -        list_del_init(&ce->destroyed_link);
>> +    while (!list_empty(&guc->submission_state.destroyed_contexts)) {
>> +        spin_lock_irqsave(&guc->submission_state.lock, flags);
>> +        ce = 
>> list_first_entry_or_null(&guc->submission_state.destroyed_contexts,
>> +                          struct intel_context,
>> +                          destroyed_link);
>> +        if (ce)
>> +            list_del_init(&ce->destroyed_link);
>> +        spin_unlock_irqrestore(&guc->submission_state.lock, flags);
>> +
>> +        if (!ce)
>> +            break;
>> +
>>           guc_lrc_desc_unpin(ce);
> 
> Here?
> 
> Not wanting/needing to nest ce->guc_state.lock under 
> guc->submission_state.lock, and call the CPU cycle expensive 
> deregister_context?
> 
> 1)
> Could you unlink en masse, under the assumption destroyed contexts are 
> not reachable from anywhere else at this point, so under a single lock 
> hold?
> 
> 2)
> But then you also end up with guc_lrc_desc_unpin calling 
> __release_guc_id, which when called by release_guc_id does take 
> guc->submission_state.lock and here it does not. Is it then clear which 
> operations inside __release_guc_id need the lock? Bitmap or IDA?

Ah no, with the 2nd point I missed that you changed guc_lrc_desc_unpin to 
call release_guc_id.

The question on the merit of the change in guc_flush_destroyed_contexts 
remains, and also whether at both places you could do a group unlink (one 
lock hold), put the contexts on a private list, and then unpin/deregister.
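
To sketch what I mean -- assuming a destroyed context really is unreachable
once it sits on this list, both sites could detach everything under a single
lock hold and then process off a private list (illustrative only; names
taken from the quoted diff):

static void deregister_destroyed_contexts(struct intel_guc *guc)
{
	struct intel_context *ce, *cn;
	unsigned long flags;
	LIST_HEAD(destroyed);

	/* One lock hold: move the whole list onto a private head. */
	spin_lock_irqsave(&guc->submission_state.lock, flags);
	list_splice_init(&guc->submission_state.destroyed_contexts,
			 &destroyed);
	spin_unlock_irqrestore(&guc->submission_state.lock, flags);

	/* No locking needed from here: nothing else can reach these. */
	list_for_each_entry_safe(ce, cn, &destroyed, destroyed_link) {
		list_del_init(&ce->destroyed_link);
		guc_lrc_desc_unpin(ce);
	}
}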

Regards,

Tvrtko

* Re: [Intel-gfx] [PATCH 4/7] drm/i915/guc: Don't hog IRQs when destroying contexts
  2021-12-17 11:14     ` Tvrtko Ursulin
@ 2021-12-22 16:25         ` Tvrtko Ursulin
  0 siblings, 0 replies; 28+ messages in thread
From: Tvrtko Ursulin @ 2021-12-22 16:25 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel; +Cc: John Harrison


Ping?

Main two points being:

1) Commit message seems in contradiction with the change in 
guc_flush_destroyed_contexts. And the lock drop to immediately 
re-acquire it looks questionable to start with.

2) And in deregister_destroyed_contexts and in 1) I was therefore asking 
if you can unlink all at once and process with reduced hammering on the 
lock.

Regards,

Tvrtko

On 17/12/2021 11:14, Tvrtko Ursulin wrote:
> 
> On 17/12/2021 11:06, Tvrtko Ursulin wrote:
>> On 14/12/2021 17:04, Matthew Brost wrote:
>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>
>>> While attempting to debug a CT deadlock issue in various CI failures
>>> (most easily reproduced with gem_ctx_create/basic-files), I was seeing
>>> CPU deadlock errors being reported. These were because the context
>>> destroy loop was blocking waiting on H2G space from inside an IRQ
>>> spinlock. There was no deadlock as such, it's just that the H2G queue
>>> was full of context destroy commands and GuC was taking a long time to
>>> process them. However, the kernel was seeing the large amount of time
>>> spent inside the IRQ lock as a dead CPU. Various Bad Things(tm) would
>>> then happen (heartbeat failures, CT deadlock errors, outstanding H2G
>>> WARNs, etc.).
>>>
>>> Re-working the loop to only acquire the spinlock around the list
>>> management (which is all it is meant to protect) rather than the
>>> entire destroy operation seems to fix all the above issues.
>>>
>>> v2:
>>>   (John Harrison)
>>>    - Fix typo in commit message
>>>
>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>> Reviewed-by: Matthew Brost <matthew.brost@intel.com>
>>> ---
>>>   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 45 ++++++++++++-------
>>>   1 file changed, 28 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>> index 36c2965db49b..96fcf869e3ff 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>> @@ -2644,7 +2644,6 @@ static inline void guc_lrc_desc_unpin(struct 
>>> intel_context *ce)
>>>       unsigned long flags;
>>>       bool disabled;
>>> -    lockdep_assert_held(&guc->submission_state.lock);
>>>       GEM_BUG_ON(!intel_gt_pm_is_awake(gt));
>>>       GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id.id));
>>>       GEM_BUG_ON(ce != __get_context(guc, ce->guc_id.id));
>>> @@ -2660,7 +2659,7 @@ static inline void guc_lrc_desc_unpin(struct 
>>> intel_context *ce)
>>>       }
>>>       spin_unlock_irqrestore(&ce->guc_state.lock, flags);
>>>       if (unlikely(disabled)) {
>>> -        __release_guc_id(guc, ce);
>>> +        release_guc_id(guc, ce);
>>>           __guc_context_destroy(ce);
>>>           return;
>>>       }
>>> @@ -2694,36 +2693,48 @@ static void __guc_context_destroy(struct 
>>> intel_context *ce)
>>>   static void guc_flush_destroyed_contexts(struct intel_guc *guc)
>>>   {
>>> -    struct intel_context *ce, *cn;
>>> +    struct intel_context *ce;
>>>       unsigned long flags;
>>>       GEM_BUG_ON(!submission_disabled(guc) &&
>>>              guc_submission_initialized(guc));
>>> -    spin_lock_irqsave(&guc->submission_state.lock, flags);
>>> -    list_for_each_entry_safe(ce, cn,
>>> -                 &guc->submission_state.destroyed_contexts,
>>> -                 destroyed_link) {
>>> -        list_del_init(&ce->destroyed_link);
>>> -        __release_guc_id(guc, ce);
>>> +    while (!list_empty(&guc->submission_state.destroyed_contexts)) {
>>
>> Are lockless false negatives a concern here - I mean this thread not 
>> seeing something just got added to the list?
>>
>>> +        spin_lock_irqsave(&guc->submission_state.lock, flags);
>>> +        ce = 
>>> list_first_entry_or_null(&guc->submission_state.destroyed_contexts,
>>> +                          struct intel_context,
>>> +                          destroyed_link);
>>> +        if (ce)
>>> +            list_del_init(&ce->destroyed_link);
>>> +        spin_unlock_irqrestore(&guc->submission_state.lock, flags);
>>> +
>>> +        if (!ce)
>>> +            break;
>>> +
>>> +        release_guc_id(guc, ce);
>>
>> This looks suboptimal and in conflict with this part of the commit 
>> message:
>>
>> """
>>   Re-working the loop to only acquire the spinlock around the list
>>   management (which is all it is meant to protect) rather than the
>>   entire destroy operation seems to fix all the above issues.
>> """
>>
>> Because you end up doing:
>>
>> ... loop ...
>>    spin_lock_irqsave(&guc->submission_state.lock, flags);
>>    list_del_init(&ce->destroyed_link);
>>    spin_unlock_irqrestore(&guc->submission_state.lock, flags);
>>
>>    release_guc_id, which calls:
>>      spin_lock_irqsave(&guc->submission_state.lock, flags);
>>      __release_guc_id(guc, ce);
>>      spin_unlock_irqrestore(&guc->submission_state.lock, flags);
>>
>> So a) the lock seems to be protecting more than just list management, 
>> or release_guc_id is wrong, and b) the loop ends up with highly 
>> questionable hammering on the lock.
>>
>> Is there any point to this part of the patch? Or the only business end 
>> of the patch is below:
>>
>>>           __guc_context_destroy(ce);
>>>       }
>>> -    spin_unlock_irqrestore(&guc->submission_state.lock, flags);
>>>   }
>>>   static void deregister_destroyed_contexts(struct intel_guc *guc)
>>>   {
>>> -    struct intel_context *ce, *cn;
>>> +    struct intel_context *ce;
>>>       unsigned long flags;
>>> -    spin_lock_irqsave(&guc->submission_state.lock, flags);
>>> -    list_for_each_entry_safe(ce, cn,
>>> -                 &guc->submission_state.destroyed_contexts,
>>> -                 destroyed_link) {
>>> -        list_del_init(&ce->destroyed_link);
>>> +    while (!list_empty(&guc->submission_state.destroyed_contexts)) {
>>> +        spin_lock_irqsave(&guc->submission_state.lock, flags);
>>> +        ce = 
>>> list_first_entry_or_null(&guc->submission_state.destroyed_contexts,
>>> +                          struct intel_context,
>>> +                          destroyed_link);
>>> +        if (ce)
>>> +            list_del_init(&ce->destroyed_link);
>>> +        spin_unlock_irqrestore(&guc->submission_state.lock, flags);
>>> +
>>> +        if (!ce)
>>> +            break;
>>> +
>>>           guc_lrc_desc_unpin(ce);
>>
>> Here?
>>
>> Not wanting/needing to nest ce->guc_state.lock under 
>> guc->submission_state.lock, and call the CPU cycle expensive 
>> deregister_context?
>>
>> 1)
>> Could you unlink en masse, under the assumption destroyed contexts are 
>> not reachable from anywhere else at this point, so under a single lock 
>> hold?
>>
>> 2)
>> But then you also end up with guc_lrc_desc_unpin calling 
>> __release_guc_id, which when called by release_guc_id does take 
>> guc->submission_state.lock and here it does not. Is it then clear 
>> which operations inside __release_guc_id need the lock? Bitmap or IDA?
> 
> Ah no, with the 2nd point I missed that you changed guc_lrc_desc_unpin to 
> call release_guc_id.
> 
> The question on the merit of the change in guc_flush_destroyed_contexts 
> remains, and also whether at both places you could do a group unlink (one 
> lock hold), put the contexts on a private list, and then unpin/deregister.
> 
> Regards,
> 
> Tvrtko

* Re: [Intel-gfx] [PATCH 4/7] drm/i915/guc: Don't hog IRQs when destroying contexts
  2021-12-22 16:25         ` Tvrtko Ursulin
@ 2021-12-22 20:38           ` Matthew Brost
  -1 siblings, 0 replies; 28+ messages in thread
From: Matthew Brost @ 2021-12-22 20:38 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx, John Harrison, dri-devel

On Wed, Dec 22, 2021 at 04:25:13PM +0000, Tvrtko Ursulin wrote:
> 
> Ping?
>

Missed this.

This was merged before your comments landed on the list.
 
> Main two points being:
> 
> 1) Commit message seems in contradiction with the change in
> guc_flush_destroyed_contexts. And the lock drop to immediately re-acquire it
> looks questionable to start with.
> 
> 2) And in deregister_destroyed_contexts and in 1) I was therefore asking if
> you can unlink all at once and process with reduced hammering on the lock.
> 

Probably can address both concerns by using a llist, right?
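
Roughly -- assuming destroyed_link becomes a struct llist_node and
destroyed_contexts a struct llist_head (a sketch only, not the final patch):

	/* Producer side, lock-free: */
	llist_add(&ce->destroyed_link,
		  &guc->submission_state.destroyed_contexts);

	/* Consumer side: atomically detach everything, then process.
	 * llist_del_all() hands back the entries newest-first, which is
	 * fine here as destruction order doesn't matter.
	 */
	struct intel_context *ce, *cn;
	struct llist_node *node;

	node = llist_del_all(&guc->submission_state.destroyed_contexts);
	llist_for_each_entry_safe(ce, cn, node, destroyed_link)
		guc_lrc_desc_unpin(ce);

That would drop the lock from the consumer path entirely.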

Be on the lookout for this rework patch over the next week or so.

Matt

> Regards,
> 
> Tvrtko
> 
> On 17/12/2021 11:14, Tvrtko Ursulin wrote:
> > 
> > On 17/12/2021 11:06, Tvrtko Ursulin wrote:
> > > On 14/12/2021 17:04, Matthew Brost wrote:
> > > > From: John Harrison <John.C.Harrison@Intel.com>
> > > > 
> > > > While attempting to debug a CT deadlock issue in various CI failures
> > > > (most easily reproduced with gem_ctx_create/basic-files), I was seeing
> > > > CPU deadlock errors being reported. These were because the context
> > > > destroy loop was blocking waiting on H2G space from inside an IRQ
> > > > spinlock. There was no deadlock as such, it's just that the H2G queue
> > > > was full of context destroy commands and GuC was taking a long time to
> > > > process them. However, the kernel was seeing the large amount of time
> > > > spent inside the IRQ lock as a dead CPU. Various Bad Things(tm) would
> > > > then happen (heartbeat failures, CT deadlock errors, outstanding H2G
> > > > WARNs, etc.).
> > > > 
> > > > Re-working the loop to only acquire the spinlock around the list
> > > > management (which is all it is meant to protect) rather than the
> > > > entire destroy operation seems to fix all the above issues.
> > > > 
> > > > v2:
> > > >   (John Harrison)
> > > >    - Fix typo in commit message
> > > > 
> > > > Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > Reviewed-by: Matthew Brost <matthew.brost@intel.com>
> > > > ---
> > > >   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 45 ++++++++++++-------
> > > >   1 file changed, 28 insertions(+), 17 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > > b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > > index 36c2965db49b..96fcf869e3ff 100644
> > > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > > @@ -2644,7 +2644,6 @@ static inline void
> > > > guc_lrc_desc_unpin(struct intel_context *ce)
> > > >       unsigned long flags;
> > > >       bool disabled;
> > > > -    lockdep_assert_held(&guc->submission_state.lock);
> > > >       GEM_BUG_ON(!intel_gt_pm_is_awake(gt));
> > > >       GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id.id));
> > > >       GEM_BUG_ON(ce != __get_context(guc, ce->guc_id.id));
> > > > @@ -2660,7 +2659,7 @@ static inline void
> > > > guc_lrc_desc_unpin(struct intel_context *ce)
> > > >       }
> > > >       spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> > > >       if (unlikely(disabled)) {
> > > > -        __release_guc_id(guc, ce);
> > > > +        release_guc_id(guc, ce);
> > > >           __guc_context_destroy(ce);
> > > >           return;
> > > >       }
> > > > @@ -2694,36 +2693,48 @@ static void __guc_context_destroy(struct
> > > > intel_context *ce)
> > > >   static void guc_flush_destroyed_contexts(struct intel_guc *guc)
> > > >   {
> > > > -    struct intel_context *ce, *cn;
> > > > +    struct intel_context *ce;
> > > >       unsigned long flags;
> > > >       GEM_BUG_ON(!submission_disabled(guc) &&
> > > >              guc_submission_initialized(guc));
> > > > -    spin_lock_irqsave(&guc->submission_state.lock, flags);
> > > > -    list_for_each_entry_safe(ce, cn,
> > > > -                 &guc->submission_state.destroyed_contexts,
> > > > -                 destroyed_link) {
> > > > -        list_del_init(&ce->destroyed_link);
> > > > -        __release_guc_id(guc, ce);
> > > > +    while (!list_empty(&guc->submission_state.destroyed_contexts)) {
> > > 
> > > Are lockless false negatives a concern here - I mean this thread not
> > > seeing something just got added to the list?
> > > 
> > > > +        spin_lock_irqsave(&guc->submission_state.lock, flags);
> > > > +        ce =
> > > > list_first_entry_or_null(&guc->submission_state.destroyed_contexts,
> > > > +                          struct intel_context,
> > > > +                          destroyed_link);
> > > > +        if (ce)
> > > > +            list_del_init(&ce->destroyed_link);
> > > > +        spin_unlock_irqrestore(&guc->submission_state.lock, flags);
> > > > +
> > > > +        if (!ce)
> > > > +            break;
> > > > +
> > > > +        release_guc_id(guc, ce);
> > > 
> > > This looks suboptimal and in conflict with this part of the commit
> > > message:
> > > 
> > > """
> > >   Re-working the loop to only acquire the spinlock around the list
> > >   management (which is all it is meant to protect) rather than the
> > >   entire destroy operation seems to fix all the above issues.
> > > """
> > > 
> > > Because you end up doing:
> > > 
> > > ... loop ...
> > >    spin_lock_irqsave(&guc->submission_state.lock, flags);
> > >    list_del_init(&ce->destroyed_link);
> > >    spin_unlock_irqrestore(&guc->submission_state.lock, flags);
> > > 
> > >    release_guc_id, which calls:
> > >      spin_lock_irqsave(&guc->submission_state.lock, flags);
> > >      __release_guc_id(guc, ce);
> > >      spin_unlock_irqrestore(&guc->submission_state.lock, flags);
> > > 
> > > So a) the lock seems to be protecting more than just list
> > > management, or release_guc_id is wrong, and b) the loop ends up with
> > > highly questionable hammering on the lock.
> > > 
> > > Is there any point to this part of the patch? Or the only business
> > > end of the patch is below:
> > > 
> > > >           __guc_context_destroy(ce);
> > > >       }
> > > > -    spin_unlock_irqrestore(&guc->submission_state.lock, flags);
> > > >   }
> > > >   static void deregister_destroyed_contexts(struct intel_guc *guc)
> > > >   {
> > > > -    struct intel_context *ce, *cn;
> > > > +    struct intel_context *ce;
> > > >       unsigned long flags;
> > > > -    spin_lock_irqsave(&guc->submission_state.lock, flags);
> > > > -    list_for_each_entry_safe(ce, cn,
> > > > -                 &guc->submission_state.destroyed_contexts,
> > > > -                 destroyed_link) {
> > > > -        list_del_init(&ce->destroyed_link);
> > > > +    while (!list_empty(&guc->submission_state.destroyed_contexts)) {
> > > > +        spin_lock_irqsave(&guc->submission_state.lock, flags);
> > > > +        ce =
> > > > list_first_entry_or_null(&guc->submission_state.destroyed_contexts,
> > > > +                          struct intel_context,
> > > > +                          destroyed_link);
> > > > +        if (ce)
> > > > +            list_del_init(&ce->destroyed_link);
> > > > +        spin_unlock_irqrestore(&guc->submission_state.lock, flags);
> > > > +
> > > > +        if (!ce)
> > > > +            break;
> > > > +
> > > >           guc_lrc_desc_unpin(ce);
> > > 
> > > Here?
> > > 
> > > Not wanting/needing to nest ce->guc_state.lock under
> > > guc->submission_state.lock, and call the CPU cycle expensive
> > > deregister_context?
> > > 
> > > 1)
> > > Could you unlink en masse, under the assumption destroyed contexts
> > > are not reachable from anywhere else at this point, so under a
> > > single lock hold?
> > > 
> > > 2)
> > > But then you also end up with guc_lrc_desc_unpin calling
> > > __release_guc_id, which when called by release_guc_id does take
> > > guc->submission_state.lock and here it does not. Is it then clear
> > > which operations inside __release_guc_id need the lock? Bitmap or
> > > IDA?
> > 
> > Ah no, with the 2nd point I missed that you changed guc_lrc_desc_unpin 
> > to call release_guc_id.
> > 
> > The question on the merit of the change in guc_flush_destroyed_contexts 
> > remains, and also whether at both places you could do a group unlink 
> > (one lock hold), put the contexts on a private list, and then 
> > unpin/deregister.
> > 
> > Regards,
> > 
> > Tvrtko
