[Intel-gfx] [PATCH] drm/i915/gt: Be defensive in the face of false CS events

All of lore.kernel.org
 help / color / mirror / Atom feed

* [Intel-gfx] [PATCH] drm/i915/gt: Be defensive in the face of false CS events
@ 2020-07-10 12:07 Chris Wilson
  2020-07-10 12:15 ` Chris Wilson
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Chris Wilson @ 2020-07-10 12:07 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

If the HW throws a curve ball and reports either en event before it is
possible, or just a completely impossible event, we have to grin and
bear it. The first few events, we will likely not notice as we would be
expecting some event, but as soon as we stop expecting an event and yet
they still keep coming, then we enter into undefined state terrority.
In which case, bail out, stop processing the events, and reset the
engine and our set of queued requests to recover.

The sporadic hangs and warnings will continue to plague CI, but at least
system stability should not be compromised.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2045
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_lrc.c | 8 ++++++--
 drivers/gpu/drm/i915/i915_gem.h     | 2 ++
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index fbcfeaed6441..f22cf8ed47ac 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -2567,6 +2567,7 @@ static void process_csb(struct intel_engine_cs *engine)
 	tail = READ_ONCE(*execlists->csb_write);
 	if (unlikely(head == tail))
 		return;
+	execlists->csb_head = tail;
 
 	/*
 	 * Hopefully paired with a wmb() in HW!
@@ -2613,6 +2614,9 @@ static void process_csb(struct intel_engine_cs *engine)
 		if (promote) {
 			struct i915_request * const *old = execlists->active;
 
+			if (GEM_WARN_ON_ONCE(!*execlists->pending))
+				break;
+
 			ring_set_paused(engine, 0);
 
 			/* Point active to the new ELSP; prevent overwriting */
@@ -2635,7 +2639,8 @@ static void process_csb(struct intel_engine_cs *engine)
 
 			WRITE_ONCE(execlists->pending[0], NULL);
 		} else {
-			GEM_BUG_ON(!*execlists->active);
+			if (GEM_WARN_ON_ONCE(!*execlists->active))
+				break;
 
 			/* port0 completed, advanced to port1 */
 			trace_ports(execlists, "completed", execlists->active);
@@ -2686,7 +2691,6 @@ static void process_csb(struct intel_engine_cs *engine)
 		}
 	} while (head != tail);
 
-	execlists->csb_head = head;
 	set_timeslice(engine);
 
 	/*
diff --git a/drivers/gpu/drm/i915/i915_gem.h b/drivers/gpu/drm/i915/i915_gem.h
index f333e88a2b6e..c0c689fa3f19 100644
--- a/drivers/gpu/drm/i915/i915_gem.h
+++ b/drivers/gpu/drm/i915/i915_gem.h
@@ -46,6 +46,7 @@ struct drm_i915_private;
 		} \
 	} while(0)
 #define GEM_WARN_ON(expr) WARN_ON(expr)
+#define GEM_WARN_ON_ONCE(expr) WARN_ON_ONCE(expr)
 
 #define GEM_DEBUG_DECL(var) var
 #define GEM_DEBUG_EXEC(expr) expr
@@ -58,6 +59,7 @@ struct drm_i915_private;
 
 #define GEM_BUG_ON(expr) BUILD_BUG_ON_INVALID(expr)
 #define GEM_WARN_ON(expr) ({ unlikely(!!(expr)); })
+#define GEM_WARN_ON_ONCE(expr) ({ unlikely(!!(expr)); })
 
 #define GEM_DEBUG_DECL(var)
 #define GEM_DEBUG_EXEC(expr) do { } while (0)
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915/gt: Be defensive in the face of false CS events
  2020-07-10 12:07 [Intel-gfx] [PATCH] drm/i915/gt: Be defensive in the face of false CS events Chris Wilson
@ 2020-07-10 12:15 ` Chris Wilson
  2020-07-10 12:16 ` Chris Wilson
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Chris Wilson @ 2020-07-10 12:15 UTC (permalink / raw)
  To: intel-gfx

Quoting Chris Wilson (2020-07-10 13:07:17)
> If the HW throws a curve ball and reports either en event before it is
> possible, or just a completely impossible event, we have to grin and
> bear it. The first few events, we will likely not notice as we would be
> expecting some event, but as soon as we stop expecting an event and yet
> they still keep coming, then we enter into undefined state terrority.
> In which case, bail out, stop processing the events, and reset the
> engine and our set of queued requests to recover.
> 
> The sporadic hangs and warnings will continue to plague CI, but at least
> system stability should not be compromised.
> 
> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2045
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_lrc.c | 8 ++++++--
>  drivers/gpu/drm/i915/i915_gem.h     | 2 ++
>  2 files changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index fbcfeaed6441..f22cf8ed47ac 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -2567,6 +2567,7 @@ static void process_csb(struct intel_engine_cs *engine)
>         tail = READ_ONCE(*execlists->csb_write);
>         if (unlikely(head == tail))
>                 return;
> +       execlists->csb_head = tail;
>  
>         /*
>          * Hopefully paired with a wmb() in HW!
> @@ -2613,6 +2614,9 @@ static void process_csb(struct intel_engine_cs *engine)
>                 if (promote) {
>                         struct i915_request * const *old = execlists->active;
>  
> +                       if (GEM_WARN_ON_ONCE(!*execlists->pending))

I wonder if we should just default GEM_WARN_ON to be GEM_WARN_ON_ONCE,
CI reboots after a warning so the spam is unhelpful.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Intel-gfx] [PATCH] drm/i915/gt: Be defensive in the face of false CS events
  2020-07-10 12:07 [Intel-gfx] [PATCH] drm/i915/gt: Be defensive in the face of false CS events Chris Wilson
  2020-07-10 12:15 ` Chris Wilson
@ 2020-07-10 12:16 ` Chris Wilson
  2020-07-10 12:30   ` Tvrtko Ursulin
  2020-07-10 17:23   ` Ruhl, Michael J
  2020-07-10 13:05 ` [Intel-gfx] [PATCH v2] " Chris Wilson
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 14+ messages in thread
From: Chris Wilson @ 2020-07-10 12:16 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

If the HW throws a curve ball and reports either en event before it is
possible, or just a completely impossible event, we have to grin and
bear it. The first few events, we will likely not notice as we would be
expecting some event, but as soon as we stop expecting an event and yet
they still keep coming, then we enter into undefined state territory.
In which case, bail out, stop processing the events, and reset the
engine and our set of queued requests to recover.

The sporadic hangs and warnings will continue to plague CI, but at least
system stability should not be compromised.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2045
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_lrc.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index fbcfeaed6441..c86324d2d2bb 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -2567,6 +2567,7 @@ static void process_csb(struct intel_engine_cs *engine)
 	tail = READ_ONCE(*execlists->csb_write);
 	if (unlikely(head == tail))
 		return;
+	execlists->csb_head = tail;
 
 	/*
 	 * Hopefully paired with a wmb() in HW!
@@ -2613,6 +2614,9 @@ static void process_csb(struct intel_engine_cs *engine)
 		if (promote) {
 			struct i915_request * const *old = execlists->active;
 
+			if (GEM_WARN_ON(!*execlists->pending))
+				break;
+
 			ring_set_paused(engine, 0);
 
 			/* Point active to the new ELSP; prevent overwriting */
@@ -2635,7 +2639,8 @@ static void process_csb(struct intel_engine_cs *engine)
 
 			WRITE_ONCE(execlists->pending[0], NULL);
 		} else {
-			GEM_BUG_ON(!*execlists->active);
+			if (GEM_WARN_ON(!*execlists->active))
+				break;
 
 			/* port0 completed, advanced to port1 */
 			trace_ports(execlists, "completed", execlists->active);
@@ -2686,7 +2691,6 @@ static void process_csb(struct intel_engine_cs *engine)
 		}
 	} while (head != tail);
 
-	execlists->csb_head = head;
 	set_timeslice(engine);
 
 	/*
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915/gt: Be defensive in the face of false CS events
  2020-07-10 12:16 ` Chris Wilson
@ 2020-07-10 12:30   ` Tvrtko Ursulin
  2020-07-10 12:35     ` Chris Wilson
  2020-07-10 17:23   ` Ruhl, Michael J
  1 sibling, 1 reply; 14+ messages in thread
From: Tvrtko Ursulin @ 2020-07-10 12:30 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 10/07/2020 13:16, Chris Wilson wrote:
> If the HW throws a curve ball and reports either en event before it is
> possible, or just a completely impossible event, we have to grin and
> bear it. The first few events, we will likely not notice as we would be
> expecting some event, but as soon as we stop expecting an event and yet
> they still keep coming, then we enter into undefined state territory.
> In which case, bail out, stop processing the events, and reset the
> engine and our set of queued requests to recover.
> 
> The sporadic hangs and warnings will continue to plague CI, but at least
> system stability should not be compromised.
> 
> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2045
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_lrc.c | 8 ++++++--
>   1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index fbcfeaed6441..c86324d2d2bb 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -2567,6 +2567,7 @@ static void process_csb(struct intel_engine_cs *engine)
>   	tail = READ_ONCE(*execlists->csb_write);
>   	if (unlikely(head == tail))
>   		return;
> +	execlists->csb_head = tail;

This deserves a comment...

>   
>   	/*
>   	 * Hopefully paired with a wmb() in HW!
> @@ -2613,6 +2614,9 @@ static void process_csb(struct intel_engine_cs *engine)
>   		if (promote) {
>   			struct i915_request * const *old = execlists->active;
>   
> +			if (GEM_WARN_ON(!*execlists->pending))
> +				break;
> +

... but why not continue? You think nothing good can come out of trying 
further and break simply expedites the hang? We have to be confident we 
can cope with any random i915 state caused by skipping maybe valid entries.

Conclusion will define what kind of comment to put above. "Assume we 
always consume all CSB entries, or things are really bad and we mark all 
as invalid upon finding first bad entry"?

Regards,

Tvrtko

>   			ring_set_paused(engine, 0);
>   
>   			/* Point active to the new ELSP; prevent overwriting */
> @@ -2635,7 +2639,8 @@ static void process_csb(struct intel_engine_cs *engine)
>   
>   			WRITE_ONCE(execlists->pending[0], NULL);
>   		} else {
> -			GEM_BUG_ON(!*execlists->active);
> +			if (GEM_WARN_ON(!*execlists->active))
> +				break;
>   
>   			/* port0 completed, advanced to port1 */
>   			trace_ports(execlists, "completed", execlists->active);
> @@ -2686,7 +2691,6 @@ static void process_csb(struct intel_engine_cs *engine)
>   		}
>   	} while (head != tail);
>   
> -	execlists->csb_head = head;
>   	set_timeslice(engine);
>   
>   	/*
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915/gt: Be defensive in the face of false CS events
  2020-07-10 12:30   ` Tvrtko Ursulin
@ 2020-07-10 12:35     ` Chris Wilson
  2020-07-10 12:49       ` Tvrtko Ursulin
  0 siblings, 1 reply; 14+ messages in thread
From: Chris Wilson @ 2020-07-10 12:35 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2020-07-10 13:30:09)
> 
> On 10/07/2020 13:16, Chris Wilson wrote:
> > If the HW throws a curve ball and reports either en event before it is
> > possible, or just a completely impossible event, we have to grin and
> > bear it. The first few events, we will likely not notice as we would be
> > expecting some event, but as soon as we stop expecting an event and yet
> > they still keep coming, then we enter into undefined state territory.
> > In which case, bail out, stop processing the events, and reset the
> > engine and our set of queued requests to recover.
> > 
> > The sporadic hangs and warnings will continue to plague CI, but at least
> > system stability should not be compromised.
> > 
> > Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2045
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > ---
> >   drivers/gpu/drm/i915/gt/intel_lrc.c | 8 ++++++--
> >   1 file changed, 6 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > index fbcfeaed6441..c86324d2d2bb 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > @@ -2567,6 +2567,7 @@ static void process_csb(struct intel_engine_cs *engine)
> >       tail = READ_ONCE(*execlists->csb_write);
> >       if (unlikely(head == tail))
> >               return;
> > +     execlists->csb_head = tail;
> 
> This deserves a comment...
> 
> >   
> >       /*
> >        * Hopefully paired with a wmb() in HW!
> > @@ -2613,6 +2614,9 @@ static void process_csb(struct intel_engine_cs *engine)
> >               if (promote) {
> >                       struct i915_request * const *old = execlists->active;
> >   
> > +                     if (GEM_WARN_ON(!*execlists->pending))
> > +                             break;
> > +
> 
> ... but why not continue? You think nothing good can come out of trying 
> further and break simply expedites the hang? We have to be confident we 
> can cope with any random i915 state caused by skipping maybe valid entries.

We are already past the point of no return as the events coming from HW
do not correspond to our events; continuing on cannot recover, we will
already have made mistakes.
 
> Conclusion will define what kind of comment to put above. "Assume we 
> always consume all CSB entries, or things are really bad and we mark all 
> as invalid upon finding first bad entry"?

It's dead, Jim.

We escape out, reset the engine/GPU, consign the port tracking to the bin,
and reload with the next set of requests.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915/gt: Be defensive in the face of false CS events
  2020-07-10 12:35     ` Chris Wilson
@ 2020-07-10 12:49       ` Tvrtko Ursulin
  0 siblings, 0 replies; 14+ messages in thread
From: Tvrtko Ursulin @ 2020-07-10 12:49 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 10/07/2020 13:35, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2020-07-10 13:30:09)
>>
>> On 10/07/2020 13:16, Chris Wilson wrote:
>>> If the HW throws a curve ball and reports either en event before it is
>>> possible, or just a completely impossible event, we have to grin and
>>> bear it. The first few events, we will likely not notice as we would be
>>> expecting some event, but as soon as we stop expecting an event and yet
>>> they still keep coming, then we enter into undefined state territory.
>>> In which case, bail out, stop processing the events, and reset the
>>> engine and our set of queued requests to recover.
>>>
>>> The sporadic hangs and warnings will continue to plague CI, but at least
>>> system stability should not be compromised.
>>>
>>> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2045
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/gt/intel_lrc.c | 8 ++++++--
>>>    1 file changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
>>> index fbcfeaed6441..c86324d2d2bb 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
>>> @@ -2567,6 +2567,7 @@ static void process_csb(struct intel_engine_cs *engine)
>>>        tail = READ_ONCE(*execlists->csb_write);
>>>        if (unlikely(head == tail))
>>>                return;
>>> +     execlists->csb_head = tail;
>>
>> This deserves a comment...
>>
>>>    
>>>        /*
>>>         * Hopefully paired with a wmb() in HW!
>>> @@ -2613,6 +2614,9 @@ static void process_csb(struct intel_engine_cs *engine)
>>>                if (promote) {
>>>                        struct i915_request * const *old = execlists->active;
>>>    
>>> +                     if (GEM_WARN_ON(!*execlists->pending))
>>> +                             break;
>>> +
>>
>> ... but why not continue? You think nothing good can come out of trying
>> further and break simply expedites the hang? We have to be confident we
>> can cope with any random i915 state caused by skipping maybe valid entries.
> 
> We are already past the point of no return as the events coming from HW
> do not correspond to our events; continuing on cannot recover, we will
> already have made mistakes.

Yeah, I am just worried if between first error and reset, the fact we 
skipped possible valid entries, could cause hitting some other bug on or 
null ptr deref. I don't have anything concrete.. so maybe just FUD.

>> Conclusion will define what kind of comment to put above. "Assume we
>> always consume all CSB entries, or things are really bad and we mark all
>> as invalid upon finding first bad entry"?
> 
> It's dead, Jim.
> 
> We escape out, reset the engine/GPU, consign the port tracking to the bin,
> and reload with the next set of requests.

With a comment at the "execlists->csb_head = tail;" site explaining the 
plan for handling seriously unexpected HW events:

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Intel-gfx] [PATCH v2] drm/i915/gt: Be defensive in the face of false CS events
  2020-07-10 12:07 [Intel-gfx] [PATCH] drm/i915/gt: Be defensive in the face of false CS events Chris Wilson
  2020-07-10 12:15 ` Chris Wilson
  2020-07-10 12:16 ` Chris Wilson
@ 2020-07-10 13:05 ` Chris Wilson
  2020-07-10 13:14   ` Tvrtko Ursulin
  2020-07-10 14:00 ` [Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915/gt: Be defensive in the face of false CS events (rev5) Patchwork
  2020-07-10 16:29 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork
  4 siblings, 1 reply; 14+ messages in thread
From: Chris Wilson @ 2020-07-10 13:05 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

If the HW throws a curve ball and reports either en event before it is
possible, or just a completely impossible event, we have to grin and
bear it. The first few events, we will likely not notice as we would be
expecting some event, but as soon as we stop expecting an event and yet
they still keep coming, then we enter into undefined state territory.
In which case, bail out, stop processing the events, and reset the
engine and our set of queued requests to recover.

The sporadic hangs and warnings will continue to plague CI, but at least
system stability should not be compromised.

v2: Commentary and force the reset-on-error.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2045
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> #v1
---
 drivers/gpu/drm/i915/gt/intel_lrc.c | 35 ++++++++++++++++++++++++-----
 1 file changed, 29 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index fbcfeaed6441..54cd943921a6 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -2568,6 +2568,25 @@ static void process_csb(struct intel_engine_cs *engine)
 	if (unlikely(head == tail))
 		return;
 
+	/*
+	 * We will consume all events from HW, or at least pretend to.
+	 *
+	 * The sequence of events from the HW is deterministic, and derived
+	 * from our writes to the ELSP, with a smidgen of variability for
+	 * the arrival of the asynchronous requests wrt to the inflight
+	 * execution. If the HW sends an event that does not correspond with
+	 * the one we are expecting, we have to abandon all hope as we lose
+	 * all tracking of what the engine is actually executing. We will
+	 * only detect we are out of sequence with the HW when we get an
+	 * 'impossible' event because we have already drained our own
+	 * preemption/promotion queue. If this occurs, we know that we likely
+	 * lost track of execution earlier and must unwind and restart, the
+	 * simplest way is by stop processing the event queue and force the
+	 * engine to reset.
+	 */
+	execlists->csb_head = tail;
+	ENGINE_TRACE(engine, "cs-irq head=%d, tail=%d\n", head, tail);
+
 	/*
 	 * Hopefully paired with a wmb() in HW!
 	 *
@@ -2577,8 +2596,6 @@ static void process_csb(struct intel_engine_cs *engine)
 	 * we perform the READ_ONCE(*csb_write).
 	 */
 	rmb();
-
-	ENGINE_TRACE(engine, "cs-irq head=%d, tail=%d\n", head, tail);
 	do {
 		bool promote;
 
@@ -2613,6 +2630,11 @@ static void process_csb(struct intel_engine_cs *engine)
 		if (promote) {
 			struct i915_request * const *old = execlists->active;
 
+			if (GEM_WARN_ON(!*execlists->pending)) {
+				engine->execlists.error_interrupt = ~0u;
+				break;
+			}
+
 			ring_set_paused(engine, 0);
 
 			/* Point active to the new ELSP; prevent overwriting */
@@ -2635,7 +2657,10 @@ static void process_csb(struct intel_engine_cs *engine)
 
 			WRITE_ONCE(execlists->pending[0], NULL);
 		} else {
-			GEM_BUG_ON(!*execlists->active);
+			if (GEM_WARN_ON(!*execlists->active)) {
+				engine->execlists.error_interrupt = ~0u;
+				break;
+			}
 
 			/* port0 completed, advanced to port1 */
 			trace_ports(execlists, "completed", execlists->active);
@@ -2686,7 +2711,6 @@ static void process_csb(struct intel_engine_cs *engine)
 		}
 	} while (head != tail);
 
-	execlists->csb_head = head;
 	set_timeslice(engine);
 
 	/*
@@ -3118,8 +3142,7 @@ static void execlists_submission_tasklet(unsigned long data)
 
 	if (unlikely(READ_ONCE(engine->execlists.error_interrupt))) {
 		engine->execlists.error_interrupt = 0;
-		if (ENGINE_READ(engine, RING_ESR)) /* confirm the error */
-			execlists_reset(engine, "CS error");
+		execlists_reset(engine, "CS error");
 	}
 
 	if (!READ_ONCE(engine->execlists.pending[0]) || timeout) {
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [Intel-gfx] [PATCH v2] drm/i915/gt: Be defensive in the face of false CS events
  2020-07-10 13:05 ` [Intel-gfx] [PATCH v2] " Chris Wilson
@ 2020-07-10 13:14   ` Tvrtko Ursulin
  2020-07-10 13:27     ` [Intel-gfx] [PATCH v3] " Chris Wilson
  2020-07-10 13:31     ` Chris Wilson
  0 siblings, 2 replies; 14+ messages in thread
From: Tvrtko Ursulin @ 2020-07-10 13:14 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 10/07/2020 14:05, Chris Wilson wrote:
> If the HW throws a curve ball and reports either en event before it is
> possible, or just a completely impossible event, we have to grin and
> bear it. The first few events, we will likely not notice as we would be
> expecting some event, but as soon as we stop expecting an event and yet
> they still keep coming, then we enter into undefined state territory.
> In which case, bail out, stop processing the events, and reset the
> engine and our set of queued requests to recover.
> 
> The sporadic hangs and warnings will continue to plague CI, but at least
> system stability should not be compromised.
> 
> v2: Commentary and force the reset-on-error.
> 
> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2045
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> #v1
> ---
>   drivers/gpu/drm/i915/gt/intel_lrc.c | 35 ++++++++++++++++++++++++-----
>   1 file changed, 29 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index fbcfeaed6441..54cd943921a6 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -2568,6 +2568,25 @@ static void process_csb(struct intel_engine_cs *engine)
>   	if (unlikely(head == tail))
>   		return;
>   
> +	/*
> +	 * We will consume all events from HW, or at least pretend to.
> +	 *
> +	 * The sequence of events from the HW is deterministic, and derived
> +	 * from our writes to the ELSP, with a smidgen of variability for
> +	 * the arrival of the asynchronous requests wrt to the inflight
> +	 * execution. If the HW sends an event that does not correspond with
> +	 * the one we are expecting, we have to abandon all hope as we lose
> +	 * all tracking of what the engine is actually executing. We will
> +	 * only detect we are out of sequence with the HW when we get an
> +	 * 'impossible' event because we have already drained our own
> +	 * preemption/promotion queue. If this occurs, we know that we likely
> +	 * lost track of execution earlier and must unwind and restart, the
> +	 * simplest way is by stop processing the event queue and force the
> +	 * engine to reset.
> +	 */
> +	execlists->csb_head = tail;
> +	ENGINE_TRACE(engine, "cs-irq head=%d, tail=%d\n", head, tail);
> +
>   	/*
>   	 * Hopefully paired with a wmb() in HW!
>   	 *
> @@ -2577,8 +2596,6 @@ static void process_csb(struct intel_engine_cs *engine)
>   	 * we perform the READ_ONCE(*csb_write).
>   	 */
>   	rmb();
> -
> -	ENGINE_TRACE(engine, "cs-irq head=%d, tail=%d\n", head, tail);
>   	do {
>   		bool promote;
>   
> @@ -2613,6 +2630,11 @@ static void process_csb(struct intel_engine_cs *engine)
>   		if (promote) {
>   			struct i915_request * const *old = execlists->active;
>   
> +			if (GEM_WARN_ON(!*execlists->pending)) {
> +				engine->execlists.error_interrupt = ~0u;
> +				break;
> +			}
> +
>   			ring_set_paused(engine, 0);
>   
>   			/* Point active to the new ELSP; prevent overwriting */
> @@ -2635,7 +2657,10 @@ static void process_csb(struct intel_engine_cs *engine)
>   
>   			WRITE_ONCE(execlists->pending[0], NULL);
>   		} else {
> -			GEM_BUG_ON(!*execlists->active);
> +			if (GEM_WARN_ON(!*execlists->active)) {
> +				engine->execlists.error_interrupt = ~0u;
> +				break;
> +			}
>   
>   			/* port0 completed, advanced to port1 */
>   			trace_ports(execlists, "completed", execlists->active);
> @@ -2686,7 +2711,6 @@ static void process_csb(struct intel_engine_cs *engine)
>   		}
>   	} while (head != tail);
>   
> -	execlists->csb_head = head;
>   	set_timeslice(engine);
>   
>   	/*
> @@ -3118,8 +3142,7 @@ static void execlists_submission_tasklet(unsigned long data)
>   
>   	if (unlikely(READ_ONCE(engine->execlists.error_interrupt))) {
>   		engine->execlists.error_interrupt = 0;
> -		if (ENGINE_READ(engine, RING_ESR)) /* confirm the error */
> -			execlists_reset(engine, "CS error");
> +		execlists_reset(engine, "CS error");

Since this is user visible and the rest of logging may not be, I am 
thinking there is value in distinguishing between RING_EIR reported 
errors and the ones we "add". Since RING_EIR is a masked register, it 
seems we could do that easily by making sure upper 16 bits are cleared 
in cs_irq_handler, set in process_csb, and here we choose a log message 
based on it/them?

Regards,

Tvrtko

>   	}
>   
>   	if (!READ_ONCE(engine->execlists.pending[0]) || timeout) {
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Intel-gfx] [PATCH v3] drm/i915/gt: Be defensive in the face of false CS events
  2020-07-10 13:14   ` Tvrtko Ursulin
@ 2020-07-10 13:27     ` Chris Wilson
  2020-07-10 13:31     ` Chris Wilson
  1 sibling, 0 replies; 14+ messages in thread
From: Chris Wilson @ 2020-07-10 13:27 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

If the HW throws a curve ball and reports either en event before it is
possible, or just a completely impossible event, we have to grin and
bear it. The first few events, we will likely not notice as we would be
expecting some event, but as soon as we stop expecting an event and yet
they still keep coming, then we enter into undefined state territory.
In which case, bail out, stop processing the events, and reset the
engine and our set of queued requests to recover.

The sporadic hangs and warnings will continue to plague CI, but at least
system stability should not be compromised.

v2: Commentary and force the reset-on-error.
v3: Customised user facing message for forced resets from internal errors.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2045
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> #v1
---
 drivers/gpu/drm/i915/gt/intel_engine_types.h |  4 ++
 drivers/gpu/drm/i915/gt/intel_gt_irq.c       |  3 +-
 drivers/gpu/drm/i915/gt/intel_lrc.c          | 45 +++++++++++++++++---
 3 files changed, 45 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 490af81bd6f3..d2db74f50277 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -177,8 +177,12 @@ struct intel_engine_execlists {
 	 * the first error interrupt, record the EIR and schedule the tasklet.
 	 * In the tasklet, we process the pending CS events to ensure we have
 	 * the guilty request, and then reset the engine.
+	 *
+	 * Low 16b are used by HW, with the upper 16b used as the enabling mask.
+	 * Reserve the upper 16b for tracking internal errors.
 	 */
 	u32 error_interrupt;
+#define INVALID_CSB BIT(31)
 
 	/**
 	 * @reset_ccid: Active CCID [EXECLISTS_STATUS_HI] at the time of reset
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_irq.c b/drivers/gpu/drm/i915/gt/intel_gt_irq.c
index e1964cf40fd6..b05da68e52f4 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_irq.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_irq.c
@@ -27,7 +27,8 @@ cs_irq_handler(struct intel_engine_cs *engine, u32 iir)
 	if (unlikely(iir & GT_CS_MASTER_ERROR_INTERRUPT)) {
 		u32 eir;
 
-		eir = ENGINE_READ(engine, RING_EIR);
+		/* Upper 16b are the enabling mask, rsvd for internal errors */
+		eir = ENGINE_READ(engine, RING_EIR) & GENMASK(15, 0);
 		ENGINE_TRACE(engine, "CS error: %x\n", eir);
 
 		/* Disable the error interrupt until after the reset */
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index fbcfeaed6441..bb65b0ed6cd7 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -2568,6 +2568,25 @@ static void process_csb(struct intel_engine_cs *engine)
 	if (unlikely(head == tail))
 		return;
 
+	/*
+	 * We will consume all events from HW, or at least pretend to.
+	 *
+	 * The sequence of events from the HW is deterministic, and derived
+	 * from our writes to the ELSP, with a smidgen of variability for
+	 * the arrival of the asynchronous requests wrt to the inflight
+	 * execution. If the HW sends an event that does not correspond with
+	 * the one we are expecting, we have to abandon all hope as we lose
+	 * all tracking of what the engine is actually executing. We will
+	 * only detect we are out of sequence with the HW when we get an
+	 * 'impossible' event because we have already drained our own
+	 * preemption/promotion queue. If this occurs, we know that we likely
+	 * lost track of execution earlier and must unwind and restart, the
+	 * simplest way is by stop processing the event queue and force the
+	 * engine to reset.
+	 */
+	execlists->csb_head = tail;
+	ENGINE_TRACE(engine, "cs-irq head=%d, tail=%d\n", head, tail);
+
 	/*
 	 * Hopefully paired with a wmb() in HW!
 	 *
@@ -2577,8 +2596,6 @@ static void process_csb(struct intel_engine_cs *engine)
 	 * we perform the READ_ONCE(*csb_write).
 	 */
 	rmb();
-
-	ENGINE_TRACE(engine, "cs-irq head=%d, tail=%d\n", head, tail);
 	do {
 		bool promote;
 
@@ -2613,6 +2630,11 @@ static void process_csb(struct intel_engine_cs *engine)
 		if (promote) {
 			struct i915_request * const *old = execlists->active;
 
+			if (GEM_WARN_ON(!*execlists->pending)) {
+				execlists->error_interrupt |= INVALID_CSB;
+				break;
+			}
+
 			ring_set_paused(engine, 0);
 
 			/* Point active to the new ELSP; prevent overwriting */
@@ -2635,7 +2657,10 @@ static void process_csb(struct intel_engine_cs *engine)
 
 			WRITE_ONCE(execlists->pending[0], NULL);
 		} else {
-			GEM_BUG_ON(!*execlists->active);
+			if (GEM_WARN_ON(!*execlists->active)) {
+				execlists->error_interrupt |= INVALID_CSB;
+				break;
+			}
 
 			/* port0 completed, advanced to port1 */
 			trace_ports(execlists, "completed", execlists->active);
@@ -2686,7 +2711,6 @@ static void process_csb(struct intel_engine_cs *engine)
 		}
 	} while (head != tail);
 
-	execlists->csb_head = head;
 	set_timeslice(engine);
 
 	/*
@@ -3117,9 +3141,18 @@ static void execlists_submission_tasklet(unsigned long data)
 	process_csb(engine);
 
 	if (unlikely(READ_ONCE(engine->execlists.error_interrupt))) {
+		const char *msg;
+
+		/* Generate the error message in priority wrt to the user! */
+		if (engine->execlists.error_interrupt & GENMASK(15, 0))
+			msg = "CS error"; /* thrown by a user payload */
+		else if (engine->execlists.error_interrupt & INVALID_CSB)
+			msg = "invalid CSB event";
+		else
+			msg = "internal error";
+
 		engine->execlists.error_interrupt = 0;
-		if (ENGINE_READ(engine, RING_ESR)) /* confirm the error */
-			execlists_reset(engine, "CS error");
+		execlists_reset(engine, msg);
 	}
 
 	if (!READ_ONCE(engine->execlists.pending[0]) || timeout) {
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [Intel-gfx] [PATCH v3] drm/i915/gt: Be defensive in the face of false CS events
  2020-07-10 13:14   ` Tvrtko Ursulin
  2020-07-10 13:27     ` [Intel-gfx] [PATCH v3] " Chris Wilson
@ 2020-07-10 13:31     ` Chris Wilson
  2020-07-10 13:43       ` Tvrtko Ursulin
  1 sibling, 1 reply; 14+ messages in thread
From: Chris Wilson @ 2020-07-10 13:31 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

If the HW throws a curve ball and reports either en event before it is
possible, or just a completely impossible event, we have to grin and
bear it. The first few events, we will likely not notice as we would be
expecting some event, but as soon as we stop expecting an event and yet
they still keep coming, then we enter into undefined state territory.
In which case, bail out, stop processing the events, and reset the
engine and our set of queued requests to recover.

The sporadic hangs and warnings will continue to plague CI, but at least
system stability should not be compromised.

v2: Commentary and force the reset-on-error.
v3: Customised user facing message for forced resets from internal errors.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2045
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> #v1
---
 drivers/gpu/drm/i915/gt/intel_engine_types.h |  4 ++
 drivers/gpu/drm/i915/gt/intel_gt_irq.c       |  3 +-
 drivers/gpu/drm/i915/gt/intel_lrc.c          | 45 +++++++++++++++++---
 3 files changed, 45 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 490af81bd6f3..8de92fd7d392 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -177,8 +177,12 @@ struct intel_engine_execlists {
 	 * the first error interrupt, record the EIR and schedule the tasklet.
 	 * In the tasklet, we process the pending CS events to ensure we have
 	 * the guilty request, and then reset the engine.
+	 *
+	 * Low 16b are used by HW, with the upper 16b used as the enabling mask.
+	 * Reserve the upper 16b for tracking internal errors.
 	 */
 	u32 error_interrupt;
+#define ERROR_CSB BIT(31)
 
 	/**
 	 * @reset_ccid: Active CCID [EXECLISTS_STATUS_HI] at the time of reset
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_irq.c b/drivers/gpu/drm/i915/gt/intel_gt_irq.c
index e1964cf40fd6..b05da68e52f4 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_irq.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_irq.c
@@ -27,7 +27,8 @@ cs_irq_handler(struct intel_engine_cs *engine, u32 iir)
 	if (unlikely(iir & GT_CS_MASTER_ERROR_INTERRUPT)) {
 		u32 eir;
 
-		eir = ENGINE_READ(engine, RING_EIR);
+		/* Upper 16b are the enabling mask, rsvd for internal errors */
+		eir = ENGINE_READ(engine, RING_EIR) & GENMASK(15, 0);
 		ENGINE_TRACE(engine, "CS error: %x\n", eir);
 
 		/* Disable the error interrupt until after the reset */
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index fbcfeaed6441..cd4262cc96e2 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -2568,6 +2568,25 @@ static void process_csb(struct intel_engine_cs *engine)
 	if (unlikely(head == tail))
 		return;
 
+	/*
+	 * We will consume all events from HW, or at least pretend to.
+	 *
+	 * The sequence of events from the HW is deterministic, and derived
+	 * from our writes to the ELSP, with a smidgen of variability for
+	 * the arrival of the asynchronous requests wrt to the inflight
+	 * execution. If the HW sends an event that does not correspond with
+	 * the one we are expecting, we have to abandon all hope as we lose
+	 * all tracking of what the engine is actually executing. We will
+	 * only detect we are out of sequence with the HW when we get an
+	 * 'impossible' event because we have already drained our own
+	 * preemption/promotion queue. If this occurs, we know that we likely
+	 * lost track of execution earlier and must unwind and restart, the
+	 * simplest way is by stop processing the event queue and force the
+	 * engine to reset.
+	 */
+	execlists->csb_head = tail;
+	ENGINE_TRACE(engine, "cs-irq head=%d, tail=%d\n", head, tail);
+
 	/*
 	 * Hopefully paired with a wmb() in HW!
 	 *
@@ -2577,8 +2596,6 @@ static void process_csb(struct intel_engine_cs *engine)
 	 * we perform the READ_ONCE(*csb_write).
 	 */
 	rmb();
-
-	ENGINE_TRACE(engine, "cs-irq head=%d, tail=%d\n", head, tail);
 	do {
 		bool promote;
 
@@ -2613,6 +2630,11 @@ static void process_csb(struct intel_engine_cs *engine)
 		if (promote) {
 			struct i915_request * const *old = execlists->active;
 
+			if (GEM_WARN_ON(!*execlists->pending)) {
+				execlists->error_interrupt |= ERROR_CSB;
+				break;
+			}
+
 			ring_set_paused(engine, 0);
 
 			/* Point active to the new ELSP; prevent overwriting */
@@ -2635,7 +2657,10 @@ static void process_csb(struct intel_engine_cs *engine)
 
 			WRITE_ONCE(execlists->pending[0], NULL);
 		} else {
-			GEM_BUG_ON(!*execlists->active);
+			if (GEM_WARN_ON(!*execlists->active)) {
+				execlists->error_interrupt |= ERROR_CSB;
+				break;
+			}
 
 			/* port0 completed, advanced to port1 */
 			trace_ports(execlists, "completed", execlists->active);
@@ -2686,7 +2711,6 @@ static void process_csb(struct intel_engine_cs *engine)
 		}
 	} while (head != tail);
 
-	execlists->csb_head = head;
 	set_timeslice(engine);
 
 	/*
@@ -3117,9 +3141,18 @@ static void execlists_submission_tasklet(unsigned long data)
 	process_csb(engine);
 
 	if (unlikely(READ_ONCE(engine->execlists.error_interrupt))) {
+		const char *msg;
+
+		/* Generate the error message in priority wrt to the user! */
+		if (engine->execlists.error_interrupt & GENMASK(15, 0))
+			msg = "CS error"; /* thrown by a user payload */
+		else if (engine->execlists.error_interrupt & ERROR_CSB)
+			msg = "invalid CSB event";
+		else
+			msg = "internal error";
+
 		engine->execlists.error_interrupt = 0;
-		if (ENGINE_READ(engine, RING_ESR)) /* confirm the error */
-			execlists_reset(engine, "CS error");
+		execlists_reset(engine, msg);
 	}
 
 	if (!READ_ONCE(engine->execlists.pending[0]) || timeout) {
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [Intel-gfx] [PATCH v3] drm/i915/gt: Be defensive in the face of false CS events
  2020-07-10 13:31     ` Chris Wilson
@ 2020-07-10 13:43       ` Tvrtko Ursulin
  0 siblings, 0 replies; 14+ messages in thread
From: Tvrtko Ursulin @ 2020-07-10 13:43 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 10/07/2020 14:31, Chris Wilson wrote:
> If the HW throws a curve ball and reports either en event before it is
> possible, or just a completely impossible event, we have to grin and
> bear it. The first few events, we will likely not notice as we would be
> expecting some event, but as soon as we stop expecting an event and yet
> they still keep coming, then we enter into undefined state territory.
> In which case, bail out, stop processing the events, and reset the
> engine and our set of queued requests to recover.
> 
> The sporadic hangs and warnings will continue to plague CI, but at least
> system stability should not be compromised.
> 
> v2: Commentary and force the reset-on-error.
> v3: Customised user facing message for forced resets from internal errors.
> 
> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2045
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> #v1
> ---
>   drivers/gpu/drm/i915/gt/intel_engine_types.h |  4 ++
>   drivers/gpu/drm/i915/gt/intel_gt_irq.c       |  3 +-
>   drivers/gpu/drm/i915/gt/intel_lrc.c          | 45 +++++++++++++++++---
>   3 files changed, 45 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> index 490af81bd6f3..8de92fd7d392 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> @@ -177,8 +177,12 @@ struct intel_engine_execlists {
>   	 * the first error interrupt, record the EIR and schedule the tasklet.
>   	 * In the tasklet, we process the pending CS events to ensure we have
>   	 * the guilty request, and then reset the engine.
> +	 *
> +	 * Low 16b are used by HW, with the upper 16b used as the enabling mask.
> +	 * Reserve the upper 16b for tracking internal errors.
>   	 */
>   	u32 error_interrupt;
> +#define ERROR_CSB BIT(31)
>   
>   	/**
>   	 * @reset_ccid: Active CCID [EXECLISTS_STATUS_HI] at the time of reset
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_irq.c b/drivers/gpu/drm/i915/gt/intel_gt_irq.c
> index e1964cf40fd6..b05da68e52f4 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_irq.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_irq.c
> @@ -27,7 +27,8 @@ cs_irq_handler(struct intel_engine_cs *engine, u32 iir)
>   	if (unlikely(iir & GT_CS_MASTER_ERROR_INTERRUPT)) {
>   		u32 eir;
>   
> -		eir = ENGINE_READ(engine, RING_EIR);
> +		/* Upper 16b are the enabling mask, rsvd for internal errors */
> +		eir = ENGINE_READ(engine, RING_EIR) & GENMASK(15, 0);
>   		ENGINE_TRACE(engine, "CS error: %x\n", eir);
>   
>   		/* Disable the error interrupt until after the reset */
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index fbcfeaed6441..cd4262cc96e2 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -2568,6 +2568,25 @@ static void process_csb(struct intel_engine_cs *engine)
>   	if (unlikely(head == tail))
>   		return;
>   
> +	/*
> +	 * We will consume all events from HW, or at least pretend to.
> +	 *
> +	 * The sequence of events from the HW is deterministic, and derived
> +	 * from our writes to the ELSP, with a smidgen of variability for
> +	 * the arrival of the asynchronous requests wrt to the inflight
> +	 * execution. If the HW sends an event that does not correspond with
> +	 * the one we are expecting, we have to abandon all hope as we lose
> +	 * all tracking of what the engine is actually executing. We will
> +	 * only detect we are out of sequence with the HW when we get an
> +	 * 'impossible' event because we have already drained our own
> +	 * preemption/promotion queue. If this occurs, we know that we likely
> +	 * lost track of execution earlier and must unwind and restart, the
> +	 * simplest way is by stop processing the event queue and force the
> +	 * engine to reset.
> +	 */
> +	execlists->csb_head = tail;
> +	ENGINE_TRACE(engine, "cs-irq head=%d, tail=%d\n", head, tail);
> +
>   	/*
>   	 * Hopefully paired with a wmb() in HW!
>   	 *
> @@ -2577,8 +2596,6 @@ static void process_csb(struct intel_engine_cs *engine)
>   	 * we perform the READ_ONCE(*csb_write).
>   	 */
>   	rmb();
> -
> -	ENGINE_TRACE(engine, "cs-irq head=%d, tail=%d\n", head, tail);
>   	do {
>   		bool promote;
>   
> @@ -2613,6 +2630,11 @@ static void process_csb(struct intel_engine_cs *engine)
>   		if (promote) {
>   			struct i915_request * const *old = execlists->active;
>   
> +			if (GEM_WARN_ON(!*execlists->pending)) {
> +				execlists->error_interrupt |= ERROR_CSB;
> +				break;
> +			}
> +
>   			ring_set_paused(engine, 0);
>   
>   			/* Point active to the new ELSP; prevent overwriting */
> @@ -2635,7 +2657,10 @@ static void process_csb(struct intel_engine_cs *engine)
>   
>   			WRITE_ONCE(execlists->pending[0], NULL);
>   		} else {
> -			GEM_BUG_ON(!*execlists->active);
> +			if (GEM_WARN_ON(!*execlists->active)) {
> +				execlists->error_interrupt |= ERROR_CSB;
> +				break;
> +			}
>   
>   			/* port0 completed, advanced to port1 */
>   			trace_ports(execlists, "completed", execlists->active);
> @@ -2686,7 +2711,6 @@ static void process_csb(struct intel_engine_cs *engine)
>   		}
>   	} while (head != tail);
>   
> -	execlists->csb_head = head;
>   	set_timeslice(engine);
>   
>   	/*
> @@ -3117,9 +3141,18 @@ static void execlists_submission_tasklet(unsigned long data)
>   	process_csb(engine);
>   
>   	if (unlikely(READ_ONCE(engine->execlists.error_interrupt))) {
> +		const char *msg;
> +
> +		/* Generate the error message in priority wrt to the user! */
> +		if (engine->execlists.error_interrupt & GENMASK(15, 0))
> +			msg = "CS error"; /* thrown by a user payload */
> +		else if (engine->execlists.error_interrupt & ERROR_CSB)
> +			msg = "invalid CSB event";
> +		else
> +			msg = "internal error";
> +
>   		engine->execlists.error_interrupt = 0;
> -		if (ENGINE_READ(engine, RING_ESR)) /* confirm the error */
> -			execlists_reset(engine, "CS error");
> +		execlists_reset(engine, msg);
>   	}
>   
>   	if (!READ_ONCE(engine->execlists.pending[0]) || timeout) {
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915/gt: Be defensive in the face of false CS events (rev5)
  2020-07-10 12:07 [Intel-gfx] [PATCH] drm/i915/gt: Be defensive in the face of false CS events Chris Wilson
                   ` (2 preceding siblings ...)
  2020-07-10 13:05 ` [Intel-gfx] [PATCH v2] " Chris Wilson
@ 2020-07-10 14:00 ` Patchwork
  2020-07-10 16:29 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork
  4 siblings, 0 replies; 14+ messages in thread
From: Patchwork @ 2020-07-10 14:00 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/gt: Be defensive in the face of false CS events (rev5)
URL   : https://patchwork.freedesktop.org/series/79340/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_8726 -> Patchwork_18132
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/index.html

Known issues
------------

  Here are the changes found in Patchwork_18132 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_exec_suspend@basic-s0:
    - fi-tgl-u2:          [PASS][1] -> [FAIL][2] ([i915#1888])
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/fi-tgl-u2/igt@gem_exec_suspend@basic-s0.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/fi-tgl-u2/igt@gem_exec_suspend@basic-s0.html

  * igt@i915_module_load@reload:
    - fi-byt-j1900:       [PASS][3] -> [DMESG-WARN][4] ([i915#1982]) +1 similar issue
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/fi-byt-j1900/igt@i915_module_load@reload.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/fi-byt-j1900/igt@i915_module_load@reload.html

  * igt@i915_pm_backlight@basic-brightness:
    - fi-whl-u:           [PASS][5] -> [DMESG-WARN][6] ([i915#95])
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/fi-whl-u/igt@i915_pm_backlight@basic-brightness.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/fi-whl-u/igt@i915_pm_backlight@basic-brightness.html

  * igt@i915_pm_rpm@basic-pci-d3-state:
    - fi-bsw-kefka:       [PASS][7] -> [DMESG-WARN][8] ([i915#1982])
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/fi-bsw-kefka/igt@i915_pm_rpm@basic-pci-d3-state.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/fi-bsw-kefka/igt@i915_pm_rpm@basic-pci-d3-state.html

  
#### Possible fixes ####

  * igt@i915_selftest@live@gem_contexts:
    - fi-tgl-u2:          [INCOMPLETE][9] ([i915#1932] / [i915#2045]) -> [PASS][10]
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/fi-tgl-u2/igt@i915_selftest@live@gem_contexts.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/fi-tgl-u2/igt@i915_selftest@live@gem_contexts.html

  * igt@kms_pipe_crc_basic@read-crc-pipe-a-frame-sequence:
    - fi-tgl-u2:          [DMESG-WARN][11] ([i915#402]) -> [PASS][12]
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/fi-tgl-u2/igt@kms_pipe_crc_basic@read-crc-pipe-a-frame-sequence.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/fi-tgl-u2/igt@kms_pipe_crc_basic@read-crc-pipe-a-frame-sequence.html

  
#### Warnings ####

  * igt@kms_chamelium@common-hpd-after-suspend:
    - fi-bxt-dsi:         [SKIP][13] ([fdo#109271] / [fdo#111827]) -> [SKIP][14] ([fdo#109271] / [fdo#111827] / [i915#1635]) +8 similar issues
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/fi-bxt-dsi/igt@kms_chamelium@common-hpd-after-suspend.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/fi-bxt-dsi/igt@kms_chamelium@common-hpd-after-suspend.html

  * igt@kms_chamelium@hdmi-hpd-fast:
    - fi-apl-guc:         [SKIP][15] ([fdo#109271] / [fdo#111827]) -> [SKIP][16] ([fdo#109271] / [fdo#111827] / [i915#1635]) +8 similar issues
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/fi-apl-guc/igt@kms_chamelium@hdmi-hpd-fast.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/fi-apl-guc/igt@kms_chamelium@hdmi-hpd-fast.html

  * igt@kms_cursor_legacy@basic-flip-before-cursor-atomic:
    - fi-kbl-x1275:       [DMESG-WARN][17] ([i915#62] / [i915#92]) -> [DMESG-WARN][18] ([i915#62] / [i915#92] / [i915#95]) +1 similar issue
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/fi-kbl-x1275/igt@kms_cursor_legacy@basic-flip-before-cursor-atomic.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/fi-kbl-x1275/igt@kms_cursor_legacy@basic-flip-before-cursor-atomic.html

  * igt@kms_force_connector_basic@force-load-detect:
    - fi-bxt-dsi:         [SKIP][19] ([fdo#109271]) -> [SKIP][20] ([fdo#109271] / [i915#1635]) +27 similar issues
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/fi-bxt-dsi/igt@kms_force_connector_basic@force-load-detect.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/fi-bxt-dsi/igt@kms_force_connector_basic@force-load-detect.html

  * igt@kms_psr@primary_page_flip:
    - fi-apl-guc:         [SKIP][21] ([fdo#109271]) -> [SKIP][22] ([fdo#109271] / [i915#1635]) +28 similar issues
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/fi-apl-guc/igt@kms_psr@primary_page_flip.html
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/fi-apl-guc/igt@kms_psr@primary_page_flip.html

  * igt@prime_vgem@basic-fence-flip:
    - fi-kbl-x1275:       [DMESG-WARN][23] ([i915#62] / [i915#92] / [i915#95]) -> [DMESG-WARN][24] ([i915#62] / [i915#92]) +5 similar issues
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/fi-kbl-x1275/igt@prime_vgem@basic-fence-flip.html
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/fi-kbl-x1275/igt@prime_vgem@basic-fence-flip.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827
  [i915#1635]: https://gitlab.freedesktop.org/drm/intel/issues/1635
  [i915#1888]: https://gitlab.freedesktop.org/drm/intel/issues/1888
  [i915#1932]: https://gitlab.freedesktop.org/drm/intel/issues/1932
  [i915#1982]: https://gitlab.freedesktop.org/drm/intel/issues/1982
  [i915#2045]: https://gitlab.freedesktop.org/drm/intel/issues/2045
  [i915#402]: https://gitlab.freedesktop.org/drm/intel/issues/402
  [i915#62]: https://gitlab.freedesktop.org/drm/intel/issues/62
  [i915#92]: https://gitlab.freedesktop.org/drm/intel/issues/92
  [i915#95]: https://gitlab.freedesktop.org/drm/intel/issues/95


Participating hosts (40 -> 35)
------------------------------

  Additional (2): fi-tgl-y fi-elk-e7500 
  Missing    (7): fi-ilk-m540 fi-hsw-4200u fi-byt-squawks fi-bsw-cyan fi-ctg-p8600 fi-byt-clapper fi-bdw-samus 


Build changes
-------------

  * Linux: CI_DRM_8726 -> Patchwork_18132

  CI-20190529: 20190529
  CI_DRM_8726: 723780498c9dd2f58b80e6b9daeaa5cd08ec8e7a @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5729: a048d54f58dd70b07dbeb4541b273ec230ddb586 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_18132: 3e4336b168680f261d8bcc7f38beebffeb5c22c1 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

3e4336b16868 drm/i915/gt: Be defensive in the face of false CS events

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/index.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Intel-gfx] ✓ Fi.CI.IGT: success for drm/i915/gt: Be defensive in the face of false CS events (rev5)
  2020-07-10 12:07 [Intel-gfx] [PATCH] drm/i915/gt: Be defensive in the face of false CS events Chris Wilson
                   ` (3 preceding siblings ...)
  2020-07-10 14:00 ` [Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915/gt: Be defensive in the face of false CS events (rev5) Patchwork
@ 2020-07-10 16:29 ` Patchwork
  4 siblings, 0 replies; 14+ messages in thread
From: Patchwork @ 2020-07-10 16:29 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/gt: Be defensive in the face of false CS events (rev5)
URL   : https://patchwork.freedesktop.org/series/79340/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_8726_full -> Patchwork_18132_full
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  

Known issues
------------

  Here are the changes found in Patchwork_18132_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_exec_params@invalid-fence-in:
    - shard-kbl:          [PASS][1] -> [DMESG-WARN][2] ([i915#93] / [i915#95]) +4 similar issues
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/shard-kbl4/igt@gem_exec_params@invalid-fence-in.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/shard-kbl3/igt@gem_exec_params@invalid-fence-in.html

  * igt@gem_exec_whisper@basic-queues-all:
    - shard-glk:          [PASS][3] -> [DMESG-WARN][4] ([i915#118] / [i915#95]) +2 similar issues
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/shard-glk2/igt@gem_exec_whisper@basic-queues-all.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/shard-glk3/igt@gem_exec_whisper@basic-queues-all.html

  * igt@i915_pm_rpm@modeset-lpsp-stress:
    - shard-skl:          [PASS][5] -> [DMESG-WARN][6] ([i915#1982]) +8 similar issues
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/shard-skl6/igt@i915_pm_rpm@modeset-lpsp-stress.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/shard-skl3/igt@i915_pm_rpm@modeset-lpsp-stress.html

  * igt@kms_concurrent@pipe-a:
    - shard-tglb:         [PASS][7] -> [DMESG-WARN][8] ([i915#1982]) +1 similar issue
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/shard-tglb6/igt@kms_concurrent@pipe-a.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/shard-tglb7/igt@kms_concurrent@pipe-a.html

  * igt@kms_cursor_crc@pipe-a-cursor-128x42-sliding:
    - shard-apl:          [PASS][9] -> [DMESG-WARN][10] ([i915#1635] / [i915#95]) +23 similar issues
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/shard-apl8/igt@kms_cursor_crc@pipe-a-cursor-128x42-sliding.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/shard-apl2/igt@kms_cursor_crc@pipe-a-cursor-128x42-sliding.html

  * igt@kms_draw_crc@draw-method-xrgb8888-mmap-wc-ytiled:
    - shard-skl:          [PASS][11] -> [FAIL][12] ([i915#52] / [i915#54])
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/shard-skl4/igt@kms_draw_crc@draw-method-xrgb8888-mmap-wc-ytiled.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/shard-skl8/igt@kms_draw_crc@draw-method-xrgb8888-mmap-wc-ytiled.html

  * igt@kms_frontbuffer_tracking@basic:
    - shard-glk:          [PASS][13] -> [DMESG-WARN][14] ([i915#1982])
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/shard-glk3/igt@kms_frontbuffer_tracking@basic.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/shard-glk9/igt@kms_frontbuffer_tracking@basic.html

  * igt@kms_hdr@bpc-switch:
    - shard-skl:          [PASS][15] -> [FAIL][16] ([i915#1188])
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/shard-skl3/igt@kms_hdr@bpc-switch.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/shard-skl9/igt@kms_hdr@bpc-switch.html

  * igt@kms_plane_alpha_blend@pipe-b-coverage-7efc:
    - shard-skl:          [PASS][17] -> [FAIL][18] ([fdo#108145] / [i915#265]) +1 similar issue
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/shard-skl9/igt@kms_plane_alpha_blend@pipe-b-coverage-7efc.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/shard-skl1/igt@kms_plane_alpha_blend@pipe-b-coverage-7efc.html

  * igt@kms_psr@psr2_cursor_render:
    - shard-iclb:         [PASS][19] -> [SKIP][20] ([fdo#109441]) +5 similar issues
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/shard-iclb2/igt@kms_psr@psr2_cursor_render.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/shard-iclb8/igt@kms_psr@psr2_cursor_render.html

  * igt@kms_setmode@basic:
    - shard-kbl:          [PASS][21] -> [FAIL][22] ([i915#31])
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/shard-kbl2/igt@kms_setmode@basic.html
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/shard-kbl6/igt@kms_setmode@basic.html

  * igt@kms_vblank@pipe-a-ts-continuation-suspend:
    - shard-kbl:          [PASS][23] -> [DMESG-WARN][24] ([i915#180]) +3 similar issues
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/shard-kbl4/igt@kms_vblank@pipe-a-ts-continuation-suspend.html
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/shard-kbl7/igt@kms_vblank@pipe-a-ts-continuation-suspend.html

  * igt@sw_sync@sync_busy_fork_unixsocket:
    - shard-tglb:         [PASS][25] -> [DMESG-WARN][26] ([i915#402])
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/shard-tglb6/igt@sw_sync@sync_busy_fork_unixsocket.html
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/shard-tglb1/igt@sw_sync@sync_busy_fork_unixsocket.html

  
#### Possible fixes ####

  * igt@gem_ctx_param@basic:
    - shard-apl:          [DMESG-WARN][27] ([i915#1635] / [i915#95]) -> [PASS][28] +14 similar issues
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/shard-apl7/igt@gem_ctx_param@basic.html
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/shard-apl8/igt@gem_ctx_param@basic.html

  * igt@gem_exec_balancer@bonded-early:
    - shard-kbl:          [FAIL][29] ([i915#2079]) -> [PASS][30]
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/shard-kbl3/igt@gem_exec_balancer@bonded-early.html
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/shard-kbl1/igt@gem_exec_balancer@bonded-early.html

  * igt@i915_module_load@reload:
    - shard-tglb:         [DMESG-WARN][31] ([i915#402]) -> [PASS][32]
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/shard-tglb2/igt@i915_module_load@reload.html
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/shard-tglb3/igt@i915_module_load@reload.html

  * igt@i915_suspend@forcewake:
    - shard-snb:          [DMESG-WARN][33] ([i915#42]) -> [PASS][34]
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/shard-snb6/igt@i915_suspend@forcewake.html
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/shard-snb6/igt@i915_suspend@forcewake.html

  * igt@kms_color@pipe-b-ctm-negative:
    - shard-skl:          [DMESG-WARN][35] ([i915#1982]) -> [PASS][36] +9 similar issues
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/shard-skl9/igt@kms_color@pipe-b-ctm-negative.html
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/shard-skl1/igt@kms_color@pipe-b-ctm-negative.html

  * igt@kms_flip@basic-flip-vs-dpms@a-dp1:
    - shard-kbl:          [DMESG-WARN][37] ([i915#165]) -> [PASS][38]
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/shard-kbl2/igt@kms_flip@basic-flip-vs-dpms@a-dp1.html
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/shard-kbl2/igt@kms_flip@basic-flip-vs-dpms@a-dp1.html

  * igt@kms_flip@flip-vs-suspend-interruptible@a-dp1:
    - shard-kbl:          [DMESG-WARN][39] ([i915#180]) -> [PASS][40] +10 similar issues
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/shard-kbl4/igt@kms_flip@flip-vs-suspend-interruptible@a-dp1.html
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/shard-kbl3/igt@kms_flip@flip-vs-suspend-interruptible@a-dp1.html

  * igt@kms_frontbuffer_tracking@psr-1p-primscrn-pri-shrfb-draw-pwrite:
    - shard-tglb:         [DMESG-WARN][41] ([i915#1982]) -> [PASS][42] +2 similar issues
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/shard-tglb2/igt@kms_frontbuffer_tracking@psr-1p-primscrn-pri-shrfb-draw-pwrite.html
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/shard-tglb3/igt@kms_frontbuffer_tracking@psr-1p-primscrn-pri-shrfb-draw-pwrite.html

  * igt@kms_psr@psr2_sprite_plane_onoff:
    - shard-iclb:         [SKIP][43] ([fdo#109441]) -> [PASS][44] +1 similar issue
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/shard-iclb4/igt@kms_psr@psr2_sprite_plane_onoff.html
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/shard-iclb2/igt@kms_psr@psr2_sprite_plane_onoff.html

  * igt@prime_busy@hang-wait@bcs0:
    - shard-hsw:          [INCOMPLETE][45] ([i915#409]) -> [PASS][46]
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/shard-hsw6/igt@prime_busy@hang-wait@bcs0.html
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/shard-hsw4/igt@prime_busy@hang-wait@bcs0.html

  
#### Warnings ####

  * igt@gem_exec_reloc@basic-concurrent0:
    - shard-apl:          [FAIL][47] ([i915#1930]) -> [FAIL][48] ([i915#1635] / [i915#1930])
   [47]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/shard-apl4/igt@gem_exec_reloc@basic-concurrent0.html
   [48]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/shard-apl1/igt@gem_exec_reloc@basic-concurrent0.html

  * igt@gem_userptr_blits@process-exit-mmap@wb:
    - shard-apl:          [SKIP][49] ([fdo#109271] / [i915#1699]) -> [SKIP][50] ([fdo#109271] / [i915#1635] / [i915#1699]) +3 similar issues
   [49]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/shard-apl4/igt@gem_userptr_blits@process-exit-mmap@wb.html
   [50]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/shard-apl1/igt@gem_userptr_blits@process-exit-mmap@wb.html

  * igt@i915_pm_dc@dc3co-vpb-simulation:
    - shard-apl:          [SKIP][51] ([fdo#109271] / [i915#658]) -> [SKIP][52] ([fdo#109271] / [i915#1635] / [i915#658])
   [51]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/shard-apl7/igt@i915_pm_dc@dc3co-vpb-simulation.html
   [52]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/shard-apl8/igt@i915_pm_dc@dc3co-vpb-simulation.html

  * igt@kms_color_chamelium@pipe-a-ctm-limited-range:
    - shard-apl:          [SKIP][53] ([fdo#109271] / [fdo#111827]) -> [SKIP][54] ([fdo#109271] / [fdo#111827] / [i915#1635]) +85 similar issues
   [53]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/shard-apl8/igt@kms_color_chamelium@pipe-a-ctm-limited-range.html
   [54]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/shard-apl2/igt@kms_color_chamelium@pipe-a-ctm-limited-range.html

  * igt@kms_content_protection@atomic-dpms:
    - shard-apl:          [FAIL][55] ([fdo#110321] / [fdo#110336]) -> [FAIL][56] ([fdo#110321] / [fdo#110336] / [i915#1635]) +2 similar issues
   [55]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/shard-apl8/igt@kms_content_protection@atomic-dpms.html
   [56]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/shard-apl7/igt@kms_content_protection@atomic-dpms.html

  * igt@kms_content_protection@lic:
    - shard-apl:          [FAIL][57] ([fdo#110321]) -> [FAIL][58] ([fdo#110321] / [i915#1635]) +1 similar issue
   [57]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/shard-apl3/igt@kms_content_protection@lic.html
   [58]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/shard-apl3/igt@kms_content_protection@lic.html

  * igt@kms_content_protection@uevent:
    - shard-apl:          [FAIL][59] ([i915#2105]) -> [FAIL][60] ([i915#1635] / [i915#2105])
   [59]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/shard-apl1/igt@kms_content_protection@uevent.html
   [60]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/shard-apl7/igt@kms_content_protection@uevent.html

  * igt@kms_cursor_crc@pipe-c-cursor-suspend:
    - shard-kbl:          [DMESG-WARN][61] ([i915#93] / [i915#95]) -> [DMESG-WARN][62] ([i915#180] / [i915#93] / [i915#95])
   [61]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/shard-kbl4/igt@kms_cursor_crc@pipe-c-cursor-suspend.html
   [62]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/shard-kbl1/igt@kms_cursor_crc@pipe-c-cursor-suspend.html

  * igt@kms_fbcon_fbt@fbc-suspend:
    - shard-apl:          [FAIL][63] ([i915#1525]) -> [FAIL][64] ([i915#1525] / [i915#1635])
   [63]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/shard-apl3/igt@kms_fbcon_fbt@fbc-suspend.html
   [64]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/shard-apl8/igt@kms_fbcon_fbt@fbc-suspend.html

  * igt@kms_flip@flip-vs-suspend@a-edp1:
    - shard-tglb:         [INCOMPLETE][65] ([i915#1602] / [i915#1887] / [i915#456]) -> [DMESG-WARN][66] ([i915#1602] / [i915#1887])
   [65]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/shard-tglb6/igt@kms_flip@flip-vs-suspend@a-edp1.html
   [66]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/shard-tglb1/igt@kms_flip@flip-vs-suspend@a-edp1.html

  * igt@kms_plane_alpha_blend@pipe-a-alpha-opaque-fb:
    - shard-apl:          [FAIL][67] ([fdo#108145] / [i915#265]) -> [FAIL][68] ([fdo#108145] / [i915#1635] / [i915#265]) +10 similar issues
   [67]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/shard-apl1/igt@kms_plane_alpha_blend@pipe-a-alpha-opaque-fb.html
   [68]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/shard-apl7/igt@kms_plane_alpha_blend@pipe-a-alpha-opaque-fb.html

  * igt@kms_plane_alpha_blend@pipe-b-alpha-transparent-fb:
    - shard-apl:          [FAIL][69] ([i915#265]) -> [FAIL][70] ([i915#1635] / [i915#265]) +1 similar issue
   [69]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/shard-apl2/igt@kms_plane_alpha_blend@pipe-b-alpha-transparent-fb.html
   [70]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/shard-apl4/igt@kms_plane_alpha_blend@pipe-b-alpha-transparent-fb.html

  * igt@kms_setmode@basic:
    - shard-apl:          [FAIL][71] ([i915#31]) -> [FAIL][72] ([i915#1635] / [i915#31])
   [71]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/shard-apl6/igt@kms_setmode@basic.html
   [72]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/shard-apl6/igt@kms_setmode@basic.html

  * igt@kms_sysfs_edid_timing:
    - shard-apl:          [FAIL][73] ([IGT#2]) -> [FAIL][74] ([IGT#2] / [i915#1635])
   [73]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/shard-apl7/igt@kms_sysfs_edid_timing.html
   [74]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/shard-apl2/igt@kms_sysfs_edid_timing.html

  * igt@kms_vblank@pipe-d-ts-continuation-idle:
    - shard-apl:          [SKIP][75] ([fdo#109271]) -> [SKIP][76] ([fdo#109271] / [i915#1635]) +852 similar issues
   [75]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/shard-apl4/igt@kms_vblank@pipe-d-ts-continuation-idle.html
   [76]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/shard-apl4/igt@kms_vblank@pipe-d-ts-continuation-idle.html

  * igt@perf@blocking-parameterized:
    - shard-apl:          [FAIL][77] ([i915#1542]) -> [FAIL][78] ([i915#1542] / [i915#1635])
   [77]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/shard-apl6/igt@perf@blocking-parameterized.html
   [78]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/shard-apl6/igt@perf@blocking-parameterized.html

  * igt@runner@aborted:
    - shard-apl:          ([FAIL][79], [FAIL][80]) ([i915#1610] / [i915#1635] / [i915#2110]) -> [FAIL][81] ([i915#1635] / [i915#2110])
   [79]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/shard-apl4/igt@runner@aborted.html
   [80]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/shard-apl3/igt@runner@aborted.html
   [81]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/shard-apl3/igt@runner@aborted.html
    - shard-tglb:         ([FAIL][82], [FAIL][83], [FAIL][84], [FAIL][85], [FAIL][86]) ([i915#2110] / [i915#2150]) -> ([FAIL][87], [FAIL][88]) ([i915#1764] / [i915#2110] / [i915#2150])
   [82]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/shard-tglb8/igt@runner@aborted.html
   [83]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/shard-tglb6/igt@runner@aborted.html
   [84]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/shard-tglb6/igt@runner@aborted.html
   [85]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/shard-tglb2/igt@runner@aborted.html
   [86]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8726/shard-tglb6/igt@runner@aborted.html
   [87]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/shard-tglb3/igt@runner@aborted.html
   [88]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/shard-tglb1/igt@runner@aborted.html

  
  [IGT#2]: https://gitlab.freedesktop.org/drm/igt-gpu-tools/issues/2
  [fdo#108145]: https://bugs.freedesktop.org/show_bug.cgi?id=108145
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109441]: https://bugs.freedesktop.org/show_bug.cgi?id=109441
  [fdo#110321]: https://bugs.freedesktop.org/show_bug.cgi?id=110321
  [fdo#110336]: https://bugs.freedesktop.org/show_bug.cgi?id=110336
  [fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827
  [i915#118]: https://gitlab.freedesktop.org/drm/intel/issues/118
  [i915#1188]: https://gitlab.freedesktop.org/drm/intel/issues/1188
  [i915#1525]: https://gitlab.freedesktop.org/drm/intel/issues/1525
  [i915#1542]: https://gitlab.freedesktop.org/drm/intel/issues/1542
  [i915#1602]: https://gitlab.freedesktop.org/drm/intel/issues/1602
  [i915#1610]: https://gitlab.freedesktop.org/drm/intel/issues/1610
  [i915#1635]: https://gitlab.freedesktop.org/drm/intel/issues/1635
  [i915#165]: https://gitlab.freedesktop.org/drm/intel/issues/165
  [i915#1699]: https://gitlab.freedesktop.org/drm/intel/issues/1699
  [i915#1764]: https://gitlab.freedesktop.org/drm/intel/issues/1764
  [i915#180]: https://gitlab.freedesktop.org/drm/intel/issues/180
  [i915#1887]: https://gitlab.freedesktop.org/drm/intel/issues/1887
  [i915#1930]: https://gitlab.freedesktop.org/drm/intel/issues/1930
  [i915#1982]: https://gitlab.freedesktop.org/drm/intel/issues/1982
  [i915#2079]: https://gitlab.freedesktop.org/drm/intel/issues/2079
  [i915#2105]: https://gitlab.freedesktop.org/drm/intel/issues/2105
  [i915#2110]: https://gitlab.freedesktop.org/drm/intel/issues/2110
  [i915#2150]: https://gitlab.freedesktop.org/drm/intel/issues/2150
  [i915#265]: https://gitlab.freedesktop.org/drm/intel/issues/265
  [i915#31]: https://gitlab.freedesktop.org/drm/intel/issues/31
  [i915#402]: https://gitlab.freedesktop.org/drm/intel/issues/402
  [i915#409]: https://gitlab.freedesktop.org/drm/intel/issues/409
  [i915#42]: https://gitlab.freedesktop.org/drm/intel/issues/42
  [i915#456]: https://gitlab.freedesktop.org/drm/intel/issues/456
  [i915#52]: https://gitlab.freedesktop.org/drm/intel/issues/52
  [i915#54]: https://gitlab.freedesktop.org/drm/intel/issues/54
  [i915#658]: https://gitlab.freedesktop.org/drm/intel/issues/658
  [i915#93]: https://gitlab.freedesktop.org/drm/intel/issues/93
  [i915#95]: https://gitlab.freedesktop.org/drm/intel/issues/95


Participating hosts (11 -> 11)
------------------------------

  No changes in participating hosts


Build changes
-------------

  * Linux: CI_DRM_8726 -> Patchwork_18132

  CI-20190529: 20190529
  CI_DRM_8726: 723780498c9dd2f58b80e6b9daeaa5cd08ec8e7a @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5729: a048d54f58dd70b07dbeb4541b273ec230ddb586 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_18132: 3e4336b168680f261d8bcc7f38beebffeb5c22c1 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18132/index.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915/gt: Be defensive in the face of false CS events
  2020-07-10 12:16 ` Chris Wilson
  2020-07-10 12:30   ` Tvrtko Ursulin
@ 2020-07-10 17:23   ` Ruhl, Michael J
  1 sibling, 0 replies; 14+ messages in thread
From: Ruhl, Michael J @ 2020-07-10 17:23 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

>-----Original Message-----
>From: Intel-gfx <intel-gfx-bounces@lists.freedesktop.org> On Behalf Of Chris
>Wilson
>Sent: Friday, July 10, 2020 8:16 AM
>To: intel-gfx@lists.freedesktop.org
>Cc: Chris Wilson <chris@chris-wilson.co.uk>
>Subject: [Intel-gfx] [PATCH] drm/i915/gt: Be defensive in the face of false CS
>events
>
>If the HW throws a curve ball and reports either en event before it is
                                                                                         ^^
s/en/an/

?

m

>possible, or just a completely impossible event, we have to grin and
>bear it. The first few events, we will likely not notice as we would be
>expecting some event, but as soon as we stop expecting an event and yet
>they still keep coming, then we enter into undefined state territory.
>In which case, bail out, stop processing the events, and reset the
>engine and our set of queued requests to recover.
>
>The sporadic hangs and warnings will continue to plague CI, but at least
>system stability should not be compromised.
>
>Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2045
>Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>---
> drivers/gpu/drm/i915/gt/intel_lrc.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c
>b/drivers/gpu/drm/i915/gt/intel_lrc.c
>index fbcfeaed6441..c86324d2d2bb 100644
>--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
>+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
>@@ -2567,6 +2567,7 @@ static void process_csb(struct intel_engine_cs
>*engine)
> 	tail = READ_ONCE(*execlists->csb_write);
> 	if (unlikely(head == tail))
> 		return;
>+	execlists->csb_head = tail;
>
> 	/*
> 	 * Hopefully paired with a wmb() in HW!
>@@ -2613,6 +2614,9 @@ static void process_csb(struct intel_engine_cs
>*engine)
> 		if (promote) {
> 			struct i915_request * const *old = execlists->active;
>
>+			if (GEM_WARN_ON(!*execlists->pending))
>+				break;
>+
> 			ring_set_paused(engine, 0);
>
> 			/* Point active to the new ELSP; prevent overwriting
>*/
>@@ -2635,7 +2639,8 @@ static void process_csb(struct intel_engine_cs
>*engine)
>
> 			WRITE_ONCE(execlists->pending[0], NULL);
> 		} else {
>-			GEM_BUG_ON(!*execlists->active);
>+			if (GEM_WARN_ON(!*execlists->active))
>+				break;
>
> 			/* port0 completed, advanced to port1 */
> 			trace_ports(execlists, "completed", execlists->active);
>@@ -2686,7 +2691,6 @@ static void process_csb(struct intel_engine_cs
>*engine)
> 		}
> 	} while (head != tail);
>
>-	execlists->csb_head = head;
> 	set_timeslice(engine);
>
> 	/*
>--
>2.20.1
>
>_______________________________________________
>Intel-gfx mailing list
>Intel-gfx@lists.freedesktop.org
>https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-07-10 17:23 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-10 12:07 [Intel-gfx] [PATCH] drm/i915/gt: Be defensive in the face of false CS events Chris Wilson
2020-07-10 12:15 ` Chris Wilson
2020-07-10 12:16 ` Chris Wilson
2020-07-10 12:30   ` Tvrtko Ursulin
2020-07-10 12:35     ` Chris Wilson
2020-07-10 12:49       ` Tvrtko Ursulin
2020-07-10 17:23   ` Ruhl, Michael J
2020-07-10 13:05 ` [Intel-gfx] [PATCH v2] " Chris Wilson
2020-07-10 13:14   ` Tvrtko Ursulin
2020-07-10 13:27     ` [Intel-gfx] [PATCH v3] " Chris Wilson
2020-07-10 13:31     ` Chris Wilson
2020-07-10 13:43       ` Tvrtko Ursulin
2020-07-10 14:00 ` [Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915/gt: Be defensive in the face of false CS events (rev5) Patchwork
2020-07-10 16:29 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.