* [Intel-gfx] [PATCH] drm/i915: Reboot CI if we get wedged during driver init
@ 2020-07-01 15:07 Michał Winiarski
  2020-07-01 15:16 ` Chris Wilson
  2020-07-01 15:17 ` Chris Wilson
  0 siblings, 2 replies; 4+ messages in thread
From: Michał Winiarski @ 2020-07-01 15:07 UTC (permalink / raw)
  To: intel-gfx; +Cc: Michał Winiarski, Chris Wilson

From: Michał Winiarski <michal.winiarski@intel.com>

Getting wedged device on driver init is pretty much unrecoverable.
Since we're running verious scenarios that may potentially hit this in
CI (module reload / selftests / hotunplug), and if it happens, it means
that we can't trust any subsequent CI results, we should just apply the
taint to let the CI know that it should reboot (CI checks taint between
test runs).

Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Petri Latvala <petri.latvala@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_reset.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index 0156f1f5c736..d27e8bb7d550 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -1360,6 +1360,8 @@ void intel_gt_set_wedged_on_init(struct intel_gt *gt)
 		     I915_WEDGED_ON_INIT);
 	intel_gt_set_wedged(gt);
 	set_bit(I915_WEDGED_ON_INIT, &gt->reset.flags);
+
+	add_taint_for_CI(TAINT_WARN);
 }
 
 void intel_gt_init_reset(struct intel_gt *gt)
-- 
2.27.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
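
The commit message above says CI checks the taint between test runs. A minimal userspace sketch of such a check follows. It assumes the runner reads /proc/sys/kernel/tainted and looks for TAINT_WARN, which the kernel's tainted-kernels documentation lists as bit 9 (value 512). The function names and the reboot policy are hypothetical and are not taken from the actual CI tooling.

```c
#include <stdbool.h>
#include <stdio.h>

/* Bit 9 of the taint mask is TAINT_WARN (value 512), per the kernel's
 * tainted-kernels documentation. */
#define TAINT_WARN_BIT 9

/* Returns true if the given taint mask has TAINT_WARN set. */
bool taint_has_warn(unsigned long mask)
{
	return (mask >> TAINT_WARN_BIT) & 1UL;
}

/* Reads the taint mask from path; returns false if it cannot be read. */
bool read_taint_mask(const char *path, unsigned long *mask)
{
	FILE *f = fopen(path, "r");
	bool ok;

	if (!f)
		return false;
	ok = fscanf(f, "%lu", mask) == 1;
	fclose(f);
	return ok;
}

/* Hypothetical CI policy: reboot when a warning taint is present. */
bool ci_should_reboot(void)
{
	unsigned long mask;

	/* Assumption: an unreadable mask is treated as "no taint". */
	if (!read_taint_mask("/proc/sys/kernel/tainted", &mask))
		return false;
	return taint_has_warn(mask);
}
```

A runner built along these lines would call ci_should_reboot() after each test and reboot the machine before scheduling the next one if it returns true.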

* Re: [Intel-gfx] [PATCH] drm/i915: Reboot CI if we get wedged during driver init
  2020-07-01 15:07 [Intel-gfx] [PATCH] drm/i915: Reboot CI if we get wedged during driver init Michał Winiarski
@ 2020-07-01 15:16 ` Chris Wilson
  2020-07-01 15:17 ` Chris Wilson
  1 sibling, 0 replies; 4+ messages in thread
From: Chris Wilson @ 2020-07-01 15:16 UTC (permalink / raw)
  To: Michał Winiarski, intel-gfx; +Cc: Michał Winiarski

Quoting Michał Winiarski (2020-07-01 16:07:21)
> From: Michał Winiarski <michal.winiarski@intel.com>
> 
> Getting wedged device on driver init is pretty much unrecoverable.
> Since we're running verious scenarios that may potentially hit this in
> CI (module reload / selftests / hotunplug), and if it happens, it means
> that we can't trust any subsequent CI results, we should just apply the
> taint to let the CI know that it should reboot (CI checks taint between
> test runs).

Ok, we treat WEDGED_ON_INIT as non-recoverable [as opposed to the less
wedged WEDGED].
 
> Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Petri Latvala <petri.latvala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris

* Re: [Intel-gfx] [PATCH] drm/i915: Reboot CI if we get wedged during driver init
  2020-07-01 15:07 [Intel-gfx] [PATCH] drm/i915: Reboot CI if we get wedged during driver init Michał Winiarski
  2020-07-01 15:16 ` Chris Wilson
@ 2020-07-01 15:17 ` Chris Wilson
  2020-07-01 17:03   ` Michal Wajdeczko
  1 sibling, 1 reply; 4+ messages in thread
From: Chris Wilson @ 2020-07-01 15:17 UTC (permalink / raw)
  To: Michał Winiarski, intel-gfx; +Cc: Michał Winiarski

Quoting Michał Winiarski (2020-07-01 16:07:21)
> From: Michał Winiarski <michal.winiarski@intel.com>
> 
> Getting wedged device on driver init is pretty much unrecoverable.
> Since we're running verious scenarios that may potentially hit this in
> CI (module reload / selftests / hotunplug), and if it happens, it means
> that we can't trust any subsequent CI results, we should just apply the
> taint to let the CI know that it should reboot (CI checks taint between
> test runs).
> 
> Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Petri Latvala <petri.latvala@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_reset.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> index 0156f1f5c736..d27e8bb7d550 100644
> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> @@ -1360,6 +1360,8 @@ void intel_gt_set_wedged_on_init(struct intel_gt *gt)
>                      I915_WEDGED_ON_INIT);
>         intel_gt_set_wedged(gt);
>         set_bit(I915_WEDGED_ON_INIT, &gt->reset.flags);
> +

Ah, we don't say around here that this WEDGED_ON_INIT is non-recoverable,
could you please add a comment to that effect?

> +       add_taint_for_CI(TAINT_WARN);
>  }
>  
>  void intel_gt_init_reset(struct intel_gt *gt)
> -- 
> 2.27.0
>
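
The distinction discussed in this thread (WEDGED may be cleared later, WEDGED_ON_INIT is treated as non-recoverable) can be illustrated with a small userspace model. This is only a sketch: the struct, the bit positions and every model_* name are hypothetical stand-ins, and the ci_tainted field merely models the effect of add_taint_for_CI(). The comment in model_set_wedged_on_init() shows the kind of note requested in the review; the refusal in model_unset_wedged() is an assumption about how "non-recoverable" shows up.

```c
#include <stdbool.h>

/* Hypothetical bit positions; the real values live in the i915 headers. */
#define MODEL_WEDGED         0
#define MODEL_WEDGED_ON_INIT 1

struct model_gt {
	unsigned long reset_flags;
	bool ci_tainted; /* stand-in for the effect of add_taint_for_CI() */
};

void model_set_wedged(struct model_gt *gt)
{
	gt->reset_flags |= 1UL << MODEL_WEDGED;
}

void model_set_wedged_on_init(struct model_gt *gt)
{
	model_set_wedged(gt);
	gt->reset_flags |= 1UL << MODEL_WEDGED_ON_INIT;

	/* Wedged on init is non-recoverable: ask CI to reboot. */
	gt->ci_tainted = true;
}

/* Returns true if the wedged state was cleared. */
bool model_unset_wedged(struct model_gt *gt)
{
	/* Assumed behaviour: a device wedged on init never recovers. */
	if (gt->reset_flags & (1UL << MODEL_WEDGED_ON_INIT))
		return false;
	gt->reset_flags &= ~(1UL << MODEL_WEDGED);
	return true;
}
```

In this model a plainly wedged device can be unwedged and leaves the CI taint untouched, while a device wedged on init stays wedged and carries the taint.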

* Re: [Intel-gfx] [PATCH] drm/i915: Reboot CI if we get wedged during driver init
  2020-07-01 15:17 ` Chris Wilson
@ 2020-07-01 17:03   ` Michal Wajdeczko
  0 siblings, 0 replies; 4+ messages in thread
From: Michal Wajdeczko @ 2020-07-01 17:03 UTC (permalink / raw)
  To: intel-gfx



On 01.07.2020 17:17, Chris Wilson wrote:
> Quoting Michał Winiarski (2020-07-01 16:07:21)
>> From: Michał Winiarski <michal.winiarski@intel.com>
>>
>> Getting wedged device on driver init is pretty much unrecoverable.
>> Since we're running verious scenarios that may potentially hit this in

typo

>> CI (module reload / selftests / hotunplug), and if it happens, it means
>> that we can't trust any subsequent CI results, we should just apply the
>> taint to let the CI know that it should reboot (CI checks taint between
>> test runs).
>>
>> Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Petri Latvala <petri.latvala@intel.com>
>> ---
>>  drivers/gpu/drm/i915/gt/intel_reset.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
>> index 0156f1f5c736..d27e8bb7d550 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
>> @@ -1360,6 +1360,8 @@ void intel_gt_set_wedged_on_init(struct intel_gt *gt)
>>                      I915_WEDGED_ON_INIT);
>>         intel_gt_set_wedged(gt);
>>         set_bit(I915_WEDGED_ON_INIT, &gt->reset.flags);
>> +
> 
> Ah, we don't say around here that this WEDGED_ON_INIT is non-recoverable,
> could you please add a comment to that effect?
> 

Such a comment is already in the WEDGED_ON_INIT description, but repeating
it here will definitely help.

>> +       add_taint_for_CI(TAINT_WARN);

btw, today we taint the kernel for CI silently and from several different
places, so maybe it is worth adding a debug log there with
__builtin_return_address() to better diagnose why we stopped CI?

with typo/comment fixed,
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>

>>  }
>>  
>>  void intel_gt_init_reset(struct intel_gt *gt)
>> -- 
>> 2.27.0
>>
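
The logging idea raised above can be sketched in userspace with the GCC/Clang builtin __builtin_return_address(). The helper name and the global below are hypothetical and this is not the driver's add_taint_for_CI(). In kernel code the address would more likely be printed with the %pS format so that it resolves to a symbol name.

```c
#include <stdio.h>

/* Last caller recorded by the helper, kept so the effect can be inspected. */
void *last_taint_caller;

/* Hypothetical stand-in for a taint helper that also reports its caller.
 * noinline keeps the return address meaningful. */
__attribute__((noinline)) void model_add_taint_for_ci(void)
{
	/* Level 0 is this function's return address, a location in the caller. */
	last_taint_caller = __builtin_return_address(0);
	fprintf(stderr, "CI tainted from %p\n", last_taint_caller);
}
```

With several call sites tainting the kernel, a log line like this would identify which one caused CI to stop.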
