All of lore.kernel.org
 help / color / mirror / Atom feed
* S4 resume breakage with i915 driver
@ 2016-08-25 13:11 Takashi Iwai
  2016-08-25 15:32 ` Chris Wilson
  0 siblings, 1 reply; 22+ messages in thread
From: Takashi Iwai @ 2016-08-25 13:11 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

Hi,

while debugging our 4.4.x based SLE12-SP2 kernel, we noticed that S4
resume is broken on many machines with i915 gfx, even on the upstream
4.7 kernel.  Originally it was reported by Intel about SKL machines,
but later on, we found that it hits on many other older chips (at
least HSW), too.

This was hard to identify because there have been other S4 resume bugs
until recently.  But even after these fixes, when the system is tested
on i915 gfx, the system gets memory corruption or kernel Oops sooner
or later after a few (usually < 10) S4 cycles.

As the bug happened between 4.2 and 4.3, I bisected and it pointed to
the commit:

  4c436d55b279bbc6b02aac02e7dc683fc09f884e
    drm/i915: Enable Resource Streamer state save/restore on MI_SET_CONTEXT

Indeed, reverting this on top of our 4.4.x kernel seems to make S4
working stably (at least on a test machine).

Does this make any sense to you guys?

Since the commit message doesn't give a good explanation about this
change, I wonder what's the purpose of this commit.  Was it merely
optimization?


Some other things to be noted:
- This might be depending on the kernel config, of course.  We've
  stated hitting this after the deferred page init is enabled, for
  example.  But it's just a coincidence, not the cause.

- S3 seems working stably.  Only S4 is the problem.

- The upstream commits I backported onto 4.4.x are:
  65c0554b73c920023cc8998802e508b798113b46
    x86/power/64: Fix kernel text mapping corruption during image restoration
  406f992e4a372dafbe3c2cff7efbb2002a5c8ebd
    x86 / hibernate: Use hlt_play_dead() when resuming from hibernation

  With these, S4 works very stable on 4.4.x without i915, passed over
  100 S4 cycles.

- 4.8-rc has a mm-related change (the commit
  e6cbd7f2efb433d717af72aa8510a9db6f7a7e05
    mm, page_alloc: remove fair zone allocation policy), and this
  commit alone improves the S4 stability by some mystery reason.  But
  the issue with i915 S4 must remain even with 4.8, I believe.  It's
  just a matter of probability.  Hence, for checking the i915 S4
  issue, it'd be easier to test with a slightly older kernel.


thanks,

Takashi
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: S4 resume breakage with i915 driver
  2016-08-25 13:11 S4 resume breakage with i915 driver Takashi Iwai
@ 2016-08-25 15:32 ` Chris Wilson
  2016-08-25 15:54   ` Takashi Iwai
  2016-08-26 12:29   ` David Weinehall
  0 siblings, 2 replies; 22+ messages in thread
From: Chris Wilson @ 2016-08-25 15:32 UTC (permalink / raw)
  To: Takashi Iwai; +Cc: Daniel Vetter, intel-gfx

On Thu, Aug 25, 2016 at 03:11:35PM +0200, Takashi Iwai wrote:
> Hi,
> 
> while debugging our 4.4.x based SLE12-SP2 kernel, we noticed that S4
> resume is broken on many machines with i915 gfx, even on the upstream
> 4.7 kernel.  Originally it was reported by Intel about SKL machines,
> but later on, we found that it hits on many other older chips (at
> least HSW), too.
> 
> This was hard to identify because there have been other S4 resume bugs
> until recently.  But even after these fixes, when the system is tested
> on i915 gfx, the system gets memory corruption or kernel Oops sooner
> or later after a few (usually < 10) S4 cycles.
> 
> As the bug happened between 4.2 and 4.3, I bisected and it pointed to
> the commit:
> 
>   4c436d55b279bbc6b02aac02e7dc683fc09f884e
>     drm/i915: Enable Resource Streamer state save/restore on MI_SET_CONTEXT
> 
> Indeed, reverting this on top of our 4.4.x kernel seems to make S4
> working stably (at least on a test machine).
> 
> Does this make any sense to you guys?

It could explain a HSW issue, but not SKL and not BDW by default (using
execlists).

All machine big core, no Braswell or other !llc device?
 
> Since the commit message doesn't give a good explanation about this
> change, I wonder what's the purpose of this commit.  Was it merely
> optimization?

It's a step towards enabling the Resource Streamer hardware block in the
GPU, kind of a glorified prefetch. Userspace also has to notify the GPU
to enable it for a batch as well.

The only doubt in mind my about that patch was whether the hw ignored
the RS_RESTORE_STATE if MI_RESTORE_INHIBIT as expected:

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 3c97f0e7a003..c618bb86aeb9 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -574,10 +574,15 @@ mi_set_context(struct drm_i915_gem_request *req, u32 hw_flags)
 
        /* These flags are for resource streamer on HSW+ */
        if (IS_HASWELL(dev_priv) || INTEL_GEN(dev_priv) >= 8)
-               flags |= (HSW_MI_RS_SAVE_STATE_EN | HSW_MI_RS_RESTORE_STATE_EN);
+               flags |= HSW_MI_RS_SAVE_STATE_EN;
        else if (INTEL_GEN(dev_priv) < 8)
-               flags |= (MI_SAVE_EXT_STATE_EN | MI_RESTORE_EXT_STATE_EN);
-
+               flags |= MI_SAVE_EXT_STATE_EN;
+       if ((flags & MI_RESTORE_INHIBIT) == 0) {
+               if (IS_HASWELL(dev_priv) || INTEL_GEN(dev_priv) >= 8)
+                       flags |= HSW_MI_RS_RESTORE_STATE_EN;
+               else if (INTEL_GEN(dev_priv) < 8)
+                       flags |= MI_RESTORE_EXT_STATE_EN;
+       }
 

But this code is only executed for Haswell, and we only inhibit restore
from uninitialised contexts. The biggest fear is that we are restoring
garbage from userspace and making the GPU do strange things.

That still doesn't explain stray writes into system memory, that is more
likely our GTT being broken. Could you confirm that bisect has any
impact on the other machines, and of course double check the result?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: S4 resume breakage with i915 driver
  2016-08-25 15:32 ` Chris Wilson
@ 2016-08-25 15:54   ` Takashi Iwai
  2016-08-25 16:12     ` Chris Wilson
  2016-08-26 12:29   ` David Weinehall
  1 sibling, 1 reply; 22+ messages in thread
From: Takashi Iwai @ 2016-08-25 15:54 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Daniel Vetter, intel-gfx

On Thu, 25 Aug 2016 17:32:41 +0200,
Chris Wilson wrote:
> 
> On Thu, Aug 25, 2016 at 03:11:35PM +0200, Takashi Iwai wrote:
> > Hi,
> > 
> > while debugging our 4.4.x based SLE12-SP2 kernel, we noticed that S4
> > resume is broken on many machines with i915 gfx, even on the upstream
> > 4.7 kernel.  Originally it was reported by Intel about SKL machines,
> > but later on, we found that it hits on many other older chips (at
> > least HSW), too.
> > 
> > This was hard to identify because there have been other S4 resume bugs
> > until recently.  But even after these fixes, when the system is tested
> > on i915 gfx, the system gets memory corruption or kernel Oops sooner
> > or later after a few (usually < 10) S4 cycles.
> > 
> > As the bug happened between 4.2 and 4.3, I bisected and it pointed to
> > the commit:
> > 
> >   4c436d55b279bbc6b02aac02e7dc683fc09f884e
> >     drm/i915: Enable Resource Streamer state save/restore on MI_SET_CONTEXT
> > 
> > Indeed, reverting this on top of our 4.4.x kernel seems to make S4
> > working stably (at least on a test machine).
> > 
> > Does this make any sense to you guys?
> 
> It could explain a HSW issue, but not SKL and not BDW by default (using
> execlists).

Hrm, IIRC, SKL didn't work without the commit.  But it doesn't matter
if this doesn't has a bad influence on SKL, either.

> All machine big core, no Braswell or other !llc device?

Yeah, tested only on SKL, HSW and IVY.  No small boxes.

> > Since the commit message doesn't give a good explanation about this
> > change, I wonder what's the purpose of this commit.  Was it merely
> > optimization?
> 
> It's a step towards enabling the Resource Streamer hardware block in the
> GPU, kind of a glorified prefetch. Userspace also has to notify the GPU
> to enable it for a batch as well.
> 
> The only doubt in mind my about that patch was whether the hw ignored
> the RS_RESTORE_STATE if MI_RESTORE_INHIBIT as expected:
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 3c97f0e7a003..c618bb86aeb9 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -574,10 +574,15 @@ mi_set_context(struct drm_i915_gem_request *req, u32 hw_flags)
>  
>         /* These flags are for resource streamer on HSW+ */
>         if (IS_HASWELL(dev_priv) || INTEL_GEN(dev_priv) >= 8)
> -               flags |= (HSW_MI_RS_SAVE_STATE_EN | HSW_MI_RS_RESTORE_STATE_EN);
> +               flags |= HSW_MI_RS_SAVE_STATE_EN;
>         else if (INTEL_GEN(dev_priv) < 8)
> -               flags |= (MI_SAVE_EXT_STATE_EN | MI_RESTORE_EXT_STATE_EN);
> -
> +               flags |= MI_SAVE_EXT_STATE_EN;
> +       if ((flags & MI_RESTORE_INHIBIT) == 0) {
> +               if (IS_HASWELL(dev_priv) || INTEL_GEN(dev_priv) >= 8)
> +                       flags |= HSW_MI_RS_RESTORE_STATE_EN;
> +               else if (INTEL_GEN(dev_priv) < 8)
> +                       flags |= MI_RESTORE_EXT_STATE_EN;
> +       }
>  
> 
> But this code is only executed for Haswell, and we only inhibit restore
> from uninitialised contexts. The biggest fear is that we are restoring
> garbage from userspace and making the GPU do strange things.
> 
> That still doesn't explain stray writes into system memory, that is more
> likely our GTT being broken.

Yeah, that's my expectation.  But the bisection didn't reach it.
So maybe this casually surfaced the hidden GTT or cache issue.

> Could you confirm that bisect has any
> impact on the other machines, and of course double check the result?

You're asking bisection on all machines from the scratch for such a
bug taking so long time to reproduce, and especially for i915 code
path, that is known to be deadly difficult due to various merge
commits?  I sincerely decline the offer :)

Yes, the result was double-checked.  This has a positive effect on all
our tested machines.  Maybe But it's hard to tell exactly whether this is
the 100% culprit.  As said, there have been multiple S4 bugs, so far.
IVY worked without this patch (after x86 fixes), but obviously this
had no negative effect, either.


thanks,

Takashi
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: S4 resume breakage with i915 driver
  2016-08-25 15:54   ` Takashi Iwai
@ 2016-08-25 16:12     ` Chris Wilson
  2016-08-25 16:15       ` Takashi Iwai
  0 siblings, 1 reply; 22+ messages in thread
From: Chris Wilson @ 2016-08-25 16:12 UTC (permalink / raw)
  To: Takashi Iwai; +Cc: Daniel Vetter, intel-gfx

On Thu, Aug 25, 2016 at 05:54:58PM +0200, Takashi Iwai wrote:
> On Thu, 25 Aug 2016 17:32:41 +0200,
> 
> > Could you confirm that bisect has any
> > impact on the other machines, and of course double check the result?
> 
> You're asking bisection on all machines from the scratch for such a
> bug taking so long time to reproduce, and especially for i915 code
> path, that is known to be deadly difficult due to various merge
> commits?  I sincerely decline the offer :)

;)
 
> Yes, the result was double-checked.  This has a positive effect on all
> our tested machines.

That's more what I wanted to hear, just because it sounds dubious based
on the impact of the bisect result.

> Maybe But it's hard to tell exactly whether this is
> the 100% culprit.  As said, there have been multiple S4 bugs, so far.
> IVY worked without this patch (after x86 fixes), but obviously this
> had no negative effect, either.

Try

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 95ddd56b89f0..913ccf14c5a9 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1946,6 +1946,7 @@ static int i915_pm_thaw(struct device *dev)
 /* restore: called after loading the hibernation image. */
 static int i915_pm_restore_early(struct device *dev)
 {
+       intel_gpu_reset(dev_to_i915(dev), ALL_ENGINES);
        return i915_pm_resume_early(dev);
 }

as a shot in the dark.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: S4 resume breakage with i915 driver
  2016-08-25 16:12     ` Chris Wilson
@ 2016-08-25 16:15       ` Takashi Iwai
  2016-08-26  9:18         ` Takashi Iwai
  0 siblings, 1 reply; 22+ messages in thread
From: Takashi Iwai @ 2016-08-25 16:15 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Daniel Vetter, intel-gfx

On Thu, 25 Aug 2016 18:12:19 +0200,
Chris Wilson wrote:
> 
> > Maybe But it's hard to tell exactly whether this is
> > the 100% culprit.  As said, there have been multiple S4 bugs, so far.
> > IVY worked without this patch (after x86 fixes), but obviously this
> > had no negative effect, either.
> 
> Try
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index 95ddd56b89f0..913ccf14c5a9 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -1946,6 +1946,7 @@ static int i915_pm_thaw(struct device *dev)
>  /* restore: called after loading the hibernation image. */
>  static int i915_pm_restore_early(struct device *dev)
>  {
> +       intel_gpu_reset(dev_to_i915(dev), ALL_ENGINES);
>         return i915_pm_resume_early(dev);
>  }
> 
> as a shot in the dark.

OK, I'll check it.  Maybe the result will be on tomorrow as I'll need
to leave soon from my office.


Thanks!

Takashi
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: S4 resume breakage with i915 driver
  2016-08-25 16:15       ` Takashi Iwai
@ 2016-08-26  9:18         ` Takashi Iwai
  2016-08-26 10:25           ` Takashi Iwai
  0 siblings, 1 reply; 22+ messages in thread
From: Takashi Iwai @ 2016-08-26  9:18 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Daniel Vetter, intel-gfx

On Thu, 25 Aug 2016 18:15:37 +0200,
Takashi Iwai wrote:
> 
> On Thu, 25 Aug 2016 18:12:19 +0200,
> Chris Wilson wrote:
> > 
> > > Maybe But it's hard to tell exactly whether this is
> > > the 100% culprit.  As said, there have been multiple S4 bugs, so far.
> > > IVY worked without this patch (after x86 fixes), but obviously this
> > > had no negative effect, either.
> > 
> > Try
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> > index 95ddd56b89f0..913ccf14c5a9 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.c
> > +++ b/drivers/gpu/drm/i915/i915_drv.c
> > @@ -1946,6 +1946,7 @@ static int i915_pm_thaw(struct device *dev)
> >  /* restore: called after loading the hibernation image. */
> >  static int i915_pm_restore_early(struct device *dev)
> >  {
> > +       intel_gpu_reset(dev_to_i915(dev), ALL_ENGINES);
> >         return i915_pm_resume_early(dev);
> >  }
> > 
> > as a shot in the dark.
> 
> OK, I'll check it.  Maybe the result will be on tomorrow as I'll need
> to leave soon from my office.

I had to modify the intel_gpu_reset() call because the test was done
on the older kernel, so it's like:

+       intel_gpu_reset(dev_to_i915(dev)->dev);

And, it seems working on HSW! \o/

A simple trick, better than the magical register write revert.
I'll check other machines, too, to see whether it has any negative
impact.


Thanks!

Takashi
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: S4 resume breakage with i915 driver
  2016-08-26  9:18         ` Takashi Iwai
@ 2016-08-26 10:25           ` Takashi Iwai
  2016-08-26 10:39             ` Chris Wilson
  0 siblings, 1 reply; 22+ messages in thread
From: Takashi Iwai @ 2016-08-26 10:25 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Daniel Vetter, intel-gfx

On Fri, 26 Aug 2016 11:18:00 +0200,
Takashi Iwai wrote:
> 
> On Thu, 25 Aug 2016 18:15:37 +0200,
> Takashi Iwai wrote:
> > 
> > On Thu, 25 Aug 2016 18:12:19 +0200,
> > Chris Wilson wrote:
> > > 
> > > > Maybe But it's hard to tell exactly whether this is
> > > > the 100% culprit.  As said, there have been multiple S4 bugs, so far.
> > > > IVY worked without this patch (after x86 fixes), but obviously this
> > > > had no negative effect, either.
> > > 
> > > Try
> > > 
> > > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> > > index 95ddd56b89f0..913ccf14c5a9 100644
> > > --- a/drivers/gpu/drm/i915/i915_drv.c
> > > +++ b/drivers/gpu/drm/i915/i915_drv.c
> > > @@ -1946,6 +1946,7 @@ static int i915_pm_thaw(struct device *dev)
> > >  /* restore: called after loading the hibernation image. */
> > >  static int i915_pm_restore_early(struct device *dev)
> > >  {
> > > +       intel_gpu_reset(dev_to_i915(dev), ALL_ENGINES);
> > >         return i915_pm_resume_early(dev);
> > >  }
> > > 
> > > as a shot in the dark.
> > 
> > OK, I'll check it.  Maybe the result will be on tomorrow as I'll need
> > to leave soon from my office.
> 
> I had to modify the intel_gpu_reset() call because the test was done
> on the older kernel, so it's like:
> 
> +       intel_gpu_reset(dev_to_i915(dev)->dev);
> 
> And, it seems working on HSW! \o/
> 
> A simple trick, better than the magical register write revert.
> I'll check other machines, too, to see whether it has any negative
> impact.

The test results look good on all machines.


Takashi
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: S4 resume breakage with i915 driver
  2016-08-26 10:25           ` Takashi Iwai
@ 2016-08-26 10:39             ` Chris Wilson
  2016-08-26 11:10               ` Imre Deak
  0 siblings, 1 reply; 22+ messages in thread
From: Chris Wilson @ 2016-08-26 10:39 UTC (permalink / raw)
  To: Takashi Iwai; +Cc: Daniel Vetter, intel-gfx

On Fri, Aug 26, 2016 at 12:25:01PM +0200, Takashi Iwai wrote:
> On Fri, 26 Aug 2016 11:18:00 +0200,
> Takashi Iwai wrote:
> > I had to modify the intel_gpu_reset() call because the test was done
> > on the older kernel, so it's like:
> > 
> > +       intel_gpu_reset(dev_to_i915(dev)->dev);
> > 
> > And, it seems working on HSW! \o/
> > 
> > A simple trick, better than the magical register write revert.
> > I'll check other machines, too, to see whether it has any negative
> > impact.
> 
> The test results look good on all machines.

The theory then is that the GPU's are active across the load of the
hibernation image and so before the GTT is updated the memory currently
in use by the GPU is reused by the system.

The key question then is the memory of boot kernel still in place during
the hibernate restore phase? If we need to add a ->shutdown callback (if
that is even called before hibernate restore) then we can only fix
future kernels and are still susceptible to corruption when booing from
old kernels.

Any one familiar with the details of the hibernation restore? (And how
much relates to kexec?)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: S4 resume breakage with i915 driver
  2016-08-26 10:39             ` Chris Wilson
@ 2016-08-26 11:10               ` Imre Deak
  2016-08-26 11:42                 ` Imre Deak
  0 siblings, 1 reply; 22+ messages in thread
From: Imre Deak @ 2016-08-26 11:10 UTC (permalink / raw)
  To: Chris Wilson, Takashi Iwai; +Cc: Daniel Vetter, intel-gfx

On pe, 2016-08-26 at 11:39 +0100, Chris Wilson wrote:
> On Fri, Aug 26, 2016 at 12:25:01PM +0200, Takashi Iwai wrote:
> > On Fri, 26 Aug 2016 11:18:00 +0200,
> > Takashi Iwai wrote:
> > > I had to modify the intel_gpu_reset() call because the test was done
> > > on the older kernel, so it's like:
> > > 
> > > +       intel_gpu_reset(dev_to_i915(dev)->dev);
> > > 
> > > And, it seems working on HSW! \o/
> > > 
> > > A simple trick, better than the magical register write revert.
> > > I'll check other machines, too, to see whether it has any negative
> > > impact.
> > 
> > The test results look good on all machines.
> 
> The theory then is that the GPU's are active across the load of the
> hibernation image and so before the GTT is updated the memory currently
> in use by the GPU is reused by the system.
> 
> The key question then is the memory of boot kernel still in place during
> the hibernate restore phase?

Before restoring the image all devices are quiesced by calling their
freeze callback, so the GPU should be idle already
in i915_pm_restore_early() already.

> If we need to add a ->shutdown callback (if
> that is even called before hibernate restore) then we can only fix
> future kernels and are still susceptible to corruption when booing from
> old kernels.
> 
> Any one familiar with the details of the hibernation restore? (And how
> much relates to kexec?)
> -Chris
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: S4 resume breakage with i915 driver
  2016-08-26 11:10               ` Imre Deak
@ 2016-08-26 11:42                 ` Imre Deak
  2016-08-29 13:32                   ` Daniel Vetter
  0 siblings, 1 reply; 22+ messages in thread
From: Imre Deak @ 2016-08-26 11:42 UTC (permalink / raw)
  To: Chris Wilson, Takashi Iwai; +Cc: Daniel Vetter, intel-gfx

On pe, 2016-08-26 at 14:10 +0300, Imre Deak wrote:
> On pe, 2016-08-26 at 11:39 +0100, Chris Wilson wrote:
> > On Fri, Aug 26, 2016 at 12:25:01PM +0200, Takashi Iwai wrote:
> > > On Fri, 26 Aug 2016 11:18:00 +0200,
> > > Takashi Iwai wrote:
> > > > I had to modify the intel_gpu_reset() call because the test was
> > > > done
> > > > on the older kernel, so it's like:
> > > > 
> > > > +       intel_gpu_reset(dev_to_i915(dev)->dev);
> > > > 
> > > > And, it seems working on HSW! \o/
> > > > 
> > > > A simple trick, better than the magical register write revert.
> > > > I'll check other machines, too, to see whether it has any
> > > > negative
> > > > impact.
> > > 
> > > The test results look good on all machines.
> > 
> > The theory then is that the GPU's are active across the load of the
> > hibernation image and so before the GTT is updated the memory
> > currently
> > in use by the GPU is reused by the system.
> > 
> > The key question then is the memory of boot kernel still in place
> > during
> > the hibernate restore phase?
> 
> Before restoring the image all devices are quiesced by calling their
> freeze callback, so the GPU should be idle already
> in i915_pm_restore_early() already.

But this happens in the loader kernel, so if that doesn't have the
driver built-in then the freeze callback won't be called either. So any
possible BIOS related GPU activity/setup should be quiesced from the
restore callback then.

> > If we need to add a ->shutdown callback (if
> > that is even called before hibernate restore) then we can only fix
> > future kernels and are still susceptible to corruption when booing
> > from
> > old kernels.
> > 
> > Any one familiar with the details of the hibernation restore? (And
> > how
> > much relates to kexec?)
> > -Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: S4 resume breakage with i915 driver
  2016-08-25 15:32 ` Chris Wilson
  2016-08-25 15:54   ` Takashi Iwai
@ 2016-08-26 12:29   ` David Weinehall
  2016-08-26 12:38     ` Chris Wilson
  1 sibling, 1 reply; 22+ messages in thread
From: David Weinehall @ 2016-08-26 12:29 UTC (permalink / raw)
  To: Chris Wilson, Takashi Iwai, Daniel Vetter, Abdiel Janulgue, intel-gfx

On Thu, Aug 25, 2016 at 04:32:41PM +0100, Chris Wilson wrote:
[snip]
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 3c97f0e7a003..c618bb86aeb9 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -574,10 +574,15 @@ mi_set_context(struct drm_i915_gem_request *req, u32 hw_flags)
>  
>         /* These flags are for resource streamer on HSW+ */
>         if (IS_HASWELL(dev_priv) || INTEL_GEN(dev_priv) >= 8)
> -               flags |= (HSW_MI_RS_SAVE_STATE_EN | HSW_MI_RS_RESTORE_STATE_EN);
> +               flags |= HSW_MI_RS_SAVE_STATE_EN;
>         else if (INTEL_GEN(dev_priv) < 8)
> -               flags |= (MI_SAVE_EXT_STATE_EN | MI_RESTORE_EXT_STATE_EN);
> -
> +               flags |= MI_SAVE_EXT_STATE_EN;
> +       if ((flags & MI_RESTORE_INHIBIT) == 0) {
> +               if (IS_HASWELL(dev_priv) || INTEL_GEN(dev_priv) >= 8)
> +                       flags |= HSW_MI_RS_RESTORE_STATE_EN;
> +               else if (INTEL_GEN(dev_priv) < 8)
> +                       flags |= MI_RESTORE_EXT_STATE_EN;
> +       }
>  
> 
> But this code is only executed for Haswell, and we only inhibit restore
> from uninitialised contexts. The biggest fear is that we are restoring
> garbage from userspace and making the GPU do strange things.

I notice from the thread that this has already been resolved, but:

You say that this is only executed for Haswell?

Wouldn't:

if (IS_HASWELL(dev_priv) || INTEL_GEN(dev_priv) >= 8)

Make it execute on Haswell *or* anything gen8 or newer?


Regards, David
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: S4 resume breakage with i915 driver
  2016-08-26 12:29   ` David Weinehall
@ 2016-08-26 12:38     ` Chris Wilson
  2016-08-26 12:54       ` David Weinehall
  0 siblings, 1 reply; 22+ messages in thread
From: Chris Wilson @ 2016-08-26 12:38 UTC (permalink / raw)
  To: David Weinehall; +Cc: Takashi Iwai, Daniel Vetter, intel-gfx

On Fri, Aug 26, 2016 at 02:29:41PM +0200, David Weinehall wrote:
> On Thu, Aug 25, 2016 at 04:32:41PM +0100, Chris Wilson wrote:
> [snip]
> > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> > index 3c97f0e7a003..c618bb86aeb9 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_context.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> > @@ -574,10 +574,15 @@ mi_set_context(struct drm_i915_gem_request *req, u32 hw_flags)
> >  
> >         /* These flags are for resource streamer on HSW+ */
> >         if (IS_HASWELL(dev_priv) || INTEL_GEN(dev_priv) >= 8)
> > -               flags |= (HSW_MI_RS_SAVE_STATE_EN | HSW_MI_RS_RESTORE_STATE_EN);
> > +               flags |= HSW_MI_RS_SAVE_STATE_EN;
> >         else if (INTEL_GEN(dev_priv) < 8)
> > -               flags |= (MI_SAVE_EXT_STATE_EN | MI_RESTORE_EXT_STATE_EN);
> > -
> > +               flags |= MI_SAVE_EXT_STATE_EN;
> > +       if ((flags & MI_RESTORE_INHIBIT) == 0) {
> > +               if (IS_HASWELL(dev_priv) || INTEL_GEN(dev_priv) >= 8)
> > +                       flags |= HSW_MI_RS_RESTORE_STATE_EN;
> > +               else if (INTEL_GEN(dev_priv) < 8)
> > +                       flags |= MI_RESTORE_EXT_STATE_EN;
> > +       }
> >  
> > 
> > But this code is only executed for Haswell, and we only inhibit restore
> > from uninitialised contexts. The biggest fear is that we are restoring
> > garbage from userspace and making the GPU do strange things.
> 
> I notice from the thread that this has already been resolved, but:
> 
> You say that this is only executed for Haswell?
> 
> Wouldn't:
> 
> if (IS_HASWELL(dev_priv) || INTEL_GEN(dev_priv) >= 8)
> 
> Make it execute on Haswell *or* anything gen8 or newer?

Only if i915.enable_execlists=0.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: S4 resume breakage with i915 driver
  2016-08-26 12:38     ` Chris Wilson
@ 2016-08-26 12:54       ` David Weinehall
  0 siblings, 0 replies; 22+ messages in thread
From: David Weinehall @ 2016-08-26 12:54 UTC (permalink / raw)
  To: Chris Wilson, Takashi Iwai, Daniel Vetter, Abdiel Janulgue, intel-gfx

On Fri, Aug 26, 2016 at 01:38:26PM +0100, Chris Wilson wrote:
> On Fri, Aug 26, 2016 at 02:29:41PM +0200, David Weinehall wrote:
> > On Thu, Aug 25, 2016 at 04:32:41PM +0100, Chris Wilson wrote:
> > [snip]
> > > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> > > index 3c97f0e7a003..c618bb86aeb9 100644
> > > --- a/drivers/gpu/drm/i915/i915_gem_context.c
> > > +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> > > @@ -574,10 +574,15 @@ mi_set_context(struct drm_i915_gem_request *req, u32 hw_flags)
> > >  
> > >         /* These flags are for resource streamer on HSW+ */
> > >         if (IS_HASWELL(dev_priv) || INTEL_GEN(dev_priv) >= 8)
> > > -               flags |= (HSW_MI_RS_SAVE_STATE_EN | HSW_MI_RS_RESTORE_STATE_EN);
> > > +               flags |= HSW_MI_RS_SAVE_STATE_EN;
> > >         else if (INTEL_GEN(dev_priv) < 8)
> > > -               flags |= (MI_SAVE_EXT_STATE_EN | MI_RESTORE_EXT_STATE_EN);
> > > -
> > > +               flags |= MI_SAVE_EXT_STATE_EN;
> > > +       if ((flags & MI_RESTORE_INHIBIT) == 0) {
> > > +               if (IS_HASWELL(dev_priv) || INTEL_GEN(dev_priv) >= 8)
> > > +                       flags |= HSW_MI_RS_RESTORE_STATE_EN;
> > > +               else if (INTEL_GEN(dev_priv) < 8)
> > > +                       flags |= MI_RESTORE_EXT_STATE_EN;
> > > +       }
> > >  
> > > 
> > > But this code is only executed for Haswell, and we only inhibit restore
> > > from uninitialised contexts. The biggest fear is that we are restoring
> > > garbage from userspace and making the GPU do strange things.
> > 
> > I notice from the thread that this has already been resolved, but:
> > 
> > You say that this is only executed for Haswell?
> > 
> > Wouldn't:
> > 
> > if (IS_HASWELL(dev_priv) || INTEL_GEN(dev_priv) >= 8)
> > 
> > Make it execute on Haswell *or* anything gen8 or newer?
> 
> Only if i915.enable_execlists=0.

Ahhhh. Right.


Kind regards, David
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: S4 resume breakage with i915 driver
  2016-08-26 11:42                 ` Imre Deak
@ 2016-08-29 13:32                   ` Daniel Vetter
  2016-08-29 14:09                     ` Imre Deak
  2016-09-02 18:34                     ` Dave Gordon
  0 siblings, 2 replies; 22+ messages in thread
From: Daniel Vetter @ 2016-08-29 13:32 UTC (permalink / raw)
  To: Imre Deak; +Cc: Takashi Iwai, Daniel Vetter, intel-gfx

On Fri, Aug 26, 2016 at 02:42:47PM +0300, Imre Deak wrote:
> On pe, 2016-08-26 at 14:10 +0300, Imre Deak wrote:
> > On pe, 2016-08-26 at 11:39 +0100, Chris Wilson wrote:
> > > On Fri, Aug 26, 2016 at 12:25:01PM +0200, Takashi Iwai wrote:
> > > > On Fri, 26 Aug 2016 11:18:00 +0200,
> > > > Takashi Iwai wrote:
> > > > > I had to modify the intel_gpu_reset() call because the test was
> > > > > done
> > > > > on the older kernel, so it's like:
> > > > > 
> > > > > +       intel_gpu_reset(dev_to_i915(dev)->dev);
> > > > > 
> > > > > And, it seems working on HSW! \o/
> > > > > 
> > > > > A simple trick, better than the magical register write revert.
> > > > > I'll check other machines, too, to see whether it has any
> > > > > negative
> > > > > impact.
> > > > 
> > > > The test results look good on all machines.
> > > 
> > > The theory then is that the GPU's are active across the load of the
> > > hibernation image and so before the GTT is updated the memory
> > > currently
> > > in use by the GPU is reused by the system.
> > > 
> > > The key question then is the memory of boot kernel still in place
> > > during
> > > the hibernate restore phase?
> > 
> > Before restoring the image all devices are quiesced by calling their
> > freeze callback, so the GPU should be idle already
> > in i915_pm_restore_early() already.
> 
> But this happens in the loader kernel, so if that doesn't have the
> driver built-in then the freeze callback won't be called either. So any
> possible BIOS related GPU activity/setup should be quiesced from the
> restore callback then.

I thought the loader kernel has an entire initrd attached, to allow stuff
like typing in the disk encryption passwd. Which means we very much do
load i915 in the loader kernel already.

So maybe we need to throw a gpu reset into the right hook (shutdown or
whatever it was) to make sure the loader kernel really stops all gpu write
cycles, including anything done due to power saving context restoring. We
already know that the only way to get the gpu off the context image
permanently is a gpu reset, so that would make some sense.
-Daniel

> 
> > > If we need to add a ->shutdown callback (if
> > > that is even called before hibernate restore) then we can only fix
> > > future kernels and are still susceptible to corruption when booing
> > > from
> > > old kernels.
> > > 
> > > Any one familiar with the details of the hibernation restore? (And
> > > how
> > > much relates to kexec?)
> > > -Chris

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: S4 resume breakage with i915 driver
  2016-08-29 13:32                   ` Daniel Vetter
@ 2016-08-29 14:09                     ` Imre Deak
  2016-08-29 14:24                       ` Takashi Iwai
  2016-09-02 18:34                     ` Dave Gordon
  1 sibling, 1 reply; 22+ messages in thread
From: Imre Deak @ 2016-08-29 14:09 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Takashi Iwai, Daniel Vetter, intel-gfx

On ma, 2016-08-29 at 15:32 +0200, Daniel Vetter wrote:
> On Fri, Aug 26, 2016 at 02:42:47PM +0300, Imre Deak wrote:
> > On pe, 2016-08-26 at 14:10 +0300, Imre Deak wrote:
> > > On pe, 2016-08-26 at 11:39 +0100, Chris Wilson wrote:
> > > > On Fri, Aug 26, 2016 at 12:25:01PM +0200, Takashi Iwai wrote:
> > > > > On Fri, 26 Aug 2016 11:18:00 +0200,
> > > > > Takashi Iwai wrote:
> > > > > > I had to modify the intel_gpu_reset() call because the test was
> > > > > > done
> > > > > > on the older kernel, so it's like:
> > > > > > 
> > > > > > +       intel_gpu_reset(dev_to_i915(dev)->dev);
> > > > > > 
> > > > > > And, it seems working on HSW! \o/
> > > > > > 
> > > > > > A simple trick, better than the magical register write revert.
> > > > > > I'll check other machines, too, to see whether it has any
> > > > > > negative
> > > > > > impact.
> > > > > 
> > > > > The test results look good on all machines.
> > > > 
> > > > The theory then is that the GPU's are active across the load of the
> > > > hibernation image and so before the GTT is updated the memory
> > > > currently
> > > > in use by the GPU is reused by the system.
> > > > 
> > > > The key question then is the memory of boot kernel still in place
> > > > during
> > > > the hibernate restore phase?
> > > 
> > > Before restoring the image all devices are quiesced by calling their
> > > freeze callback, so the GPU should be idle already
> > > in i915_pm_restore_early() already.
> > 
> > But this happens in the loader kernel, so if that doesn't have the
> > driver built-in then the freeze callback won't be called either. So any
> > possible BIOS related GPU activity/setup should be quiesced from the
> > restore callback then.
> 
> I thought the loader kernel has an entire initrd attached, to allow stuff
> like typing in the disk encryption passwd. Which means we very much do
> load i915 in the loader kernel already.

AFAICS, the hibernation image is restored from a late_initcall and so
/bin/init etc. won't be run in the loader kernel and so the driver
won't be loaded if built as a module. But in theory at least it's
possible that the driver won't even be configured in the loader kernel.

> So maybe we need to throw a gpu reset into the right hook (shutdown or
> whatever it was) to make sure the loader kernel really stops all gpu write
> cycles, including anything done due to power saving context restoring.

The callback called right before the hibernation image is restored is
freeze. Shutdown is called only after creating the image, before
powering off.

--Imre

> We already know that the only way to get the gpu off the context image
> permanently is a gpu reset, so that would make some sense.
> -Daniel
> 
> > 
> > > > If we need to add a ->shutdown callback (if
> > > > that is even called before hibernate restore) then we can only fix
> > > > future kernels and are still susceptible to corruption when booing
> > > > from
> > > > old kernels.
> > > > 
> > > > Any one familiar with the details of the hibernation restore? (And
> > > > how
> > > > much relates to kexec?)
> > > > -Chris
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: S4 resume breakage with i915 driver
  2016-08-29 14:09                     ` Imre Deak
@ 2016-08-29 14:24                       ` Takashi Iwai
  2016-08-29 14:54                         ` Imre Deak
  0 siblings, 1 reply; 22+ messages in thread
From: Takashi Iwai @ 2016-08-29 14:24 UTC (permalink / raw)
  To: imre.deak; +Cc: Daniel Vetter, intel-gfx

On Mon, 29 Aug 2016 16:09:23 +0200,
Imre Deak wrote:
> 
> On ma, 2016-08-29 at 15:32 +0200, Daniel Vetter wrote:
> > On Fri, Aug 26, 2016 at 02:42:47PM +0300, Imre Deak wrote:
> > > On pe, 2016-08-26 at 14:10 +0300, Imre Deak wrote:
> > > > On pe, 2016-08-26 at 11:39 +0100, Chris Wilson wrote:
> > > > > On Fri, Aug 26, 2016 at 12:25:01PM +0200, Takashi Iwai wrote:
> > > > > > On Fri, 26 Aug 2016 11:18:00 +0200,
> > > > > > Takashi Iwai wrote:
> > > > > > > I had to modify the intel_gpu_reset() call because the test was
> > > > > > > done
> > > > > > > on the older kernel, so it's like:
> > > > > > > 
> > > > > > > +       intel_gpu_reset(dev_to_i915(dev)->dev);
> > > > > > > 
> > > > > > > And, it seems working on HSW! \o/
> > > > > > > 
> > > > > > > A simple trick, better than the magical register write revert.
> > > > > > > I'll check other machines, too, to see whether it has any
> > > > > > > negative
> > > > > > > impact.
> > > > > > 
> > > > > > The test results look good on all machines.
> > > > > 
> > > > > The theory then is that the GPU's are active across the load of the
> > > > > hibernation image and so before the GTT is updated the memory
> > > > > currently
> > > > > in use by the GPU is reused by the system.
> > > > > 
> > > > > The key question then is the memory of boot kernel still in place
> > > > > during
> > > > > the hibernate restore phase?
> > > > 
> > > > Before restoring the image all devices are quiesced by calling their
> > > > freeze callback, so the GPU should be idle already
> > > > in i915_pm_restore_early() already.
> > > 
> > > But this happens in the loader kernel, so if that doesn't have the
> > > driver built-in then the freeze callback won't be called either. So any
> > > possible BIOS related GPU activity/setup should be quiesced from the
> > > restore callback then.
> > 
> > I thought the loader kernel has an entire initrd attached, to allow stuff
> > like typing in the disk encryption passwd. Which means we very much do
> > load i915 in the loader kernel already.
> 
> AFAICS, the hibernation image is restored from a late_initcall and so
> /bin/init etc. won't be run in the loader kernel and so the driver
> won't be loaded if built as a module.

Well, on many systems, it's explicitly triggered from initrd (at
least, (open)SUSE does it so since ages ago).  dracut does it after
the whole driver initializations on initrd, usually.

> But in theory at least it's
> possible that the driver won't even be configured in the loader kernel.
> 
> > So maybe we need to throw a gpu reset into the right hook (shutdown or
> > whatever it was) to make sure the loader kernel really stops all gpu write
> > cycles, including anything done due to power saving context restoring.
> 
> The callback called right before the hibernation image is restored is
> freeze. Shutdown is called only after creating the image, before
> powering off.

Hmm, this always confuses me.  Is the freeze callback called to the
loader kernel?


thanks,

Takashi


> 
> --Imre
> 
> > We already know that the only way to get the gpu off the context image
> > permanently is a gpu reset, so that would make some sense.
> > -Daniel
> > 
> > > 
> > > > > If we need to add a ->shutdown callback (if
> > > > > that is even called before hibernate restore) then we can only fix
> > > > > future kernels and are still susceptible to corruption when booing
> > > > > from
> > > > > old kernels.
> > > > > 
> > > > > Any one familiar with the details of the hibernation restore? (And
> > > > > how
> > > > > much relates to kexec?)
> > > > > -Chris
> > 
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: S4 resume breakage with i915 driver
  2016-08-29 14:24                       ` Takashi Iwai
@ 2016-08-29 14:54                         ` Imre Deak
  2016-08-29 15:25                           ` Chris Wilson
  0 siblings, 1 reply; 22+ messages in thread
From: Imre Deak @ 2016-08-29 14:54 UTC (permalink / raw)
  To: Takashi Iwai; +Cc: Daniel Vetter, intel-gfx

On ma, 2016-08-29 at 16:24 +0200, Takashi Iwai wrote:
> On Mon, 29 Aug 2016 16:09:23 +0200,
> Imre Deak wrote:
> > 
> > On ma, 2016-08-29 at 15:32 +0200, Daniel Vetter wrote:
> > > On Fri, Aug 26, 2016 at 02:42:47PM +0300, Imre Deak wrote:
> > > > On pe, 2016-08-26 at 14:10 +0300, Imre Deak wrote:
> > > > > On pe, 2016-08-26 at 11:39 +0100, Chris Wilson wrote:
> > > > > > On Fri, Aug 26, 2016 at 12:25:01PM +0200, Takashi Iwai
> > > > > > wrote:
> > > > > > > On Fri, 26 Aug 2016 11:18:00 +0200,
> > > > > > > Takashi Iwai wrote:
> > > > > > > > I had to modify the intel_gpu_reset() call because the
> > > > > > > > test was
> > > > > > > > done
> > > > > > > > on the older kernel, so it's like:
> > > > > > > > 
> > > > > > > > +       intel_gpu_reset(dev_to_i915(dev)->dev);
> > > > > > > > 
> > > > > > > > And, it seems working on HSW! \o/
> > > > > > > > 
> > > > > > > > A simple trick, better than the magical register write
> > > > > > > > revert.
> > > > > > > > I'll check other machines, too, to see whether it has
> > > > > > > > any
> > > > > > > > negative
> > > > > > > > impact.
> > > > > > > 
> > > > > > > The test results look good on all machines.
> > > > > > 
> > > > > > The theory then is that the GPU's are active across the
> > > > > > load of the
> > > > > > hibernation image and so before the GTT is updated the
> > > > > > memory
> > > > > > currently
> > > > > > in use by the GPU is reused by the system.
> > > > > > 
> > > > > > The key question then is the memory of boot kernel still in
> > > > > > place
> > > > > > during
> > > > > > the hibernate restore phase?
> > > > > 
> > > > > Before restoring the image all devices are quiesced by
> > > > > calling their
> > > > > freeze callback, so the GPU should be idle already
> > > > > in i915_pm_restore_early() already.
> > > > 
> > > > But this happens in the loader kernel, so if that doesn't have
> > > > the
> > > > driver built-in then the freeze callback won't be called
> > > > either. So any
> > > > possible BIOS related GPU activity/setup should be quiesced
> > > > from the
> > > > restore callback then.
> > > 
> > > I thought the loader kernel has an entire initrd attached, to
> > > allow stuff
> > > like typing in the disk encryption passwd. Which means we very
> > > much do
> > > load i915 in the loader kernel already.
> > 
> > AFAICS, the hibernation image is restored from a late_initcall and
> > so
> > /bin/init etc. won't be run in the loader kernel and so the driver
> > won't be loaded if built as a module.
> 
> Well, on many systems, it's explicitly triggered from initrd (at
> least, (open)SUSE does it so since ages ago).  dracut does it after
> the whole driver initializations on initrd, usually.

Right, with manual resume that will work. But it's still possible not
to have the driver configured or use kernel resume (passing resume=..).

> > But in theory at least it's
> > possible that the driver won't even be configured in the loader
> > kernel.
> > 
> > > So maybe we need to throw a gpu reset into the right hook
> > > (shutdown or
> > > whatever it was) to make sure the loader kernel really stops all
> > > gpu write
> > > cycles, including anything done due to power saving context
> > > restoring.
> > 
> > The callback called right before the hibernation image is restored
> > is
> > freeze. Shutdown is called only after creating the image, before
> > powering off.
> 
> Hmm, this always confuses me.  Is the freeze callback called to the
> loader kernel?

It's called both in loader and target kernel, before creating or
restoring the image.

--Imre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: S4 resume breakage with i915 driver
  2016-08-29 14:54                         ` Imre Deak
@ 2016-08-29 15:25                           ` Chris Wilson
  2016-08-29 16:46                             ` Lukas Wunner
  2016-08-30 11:33                             ` Imre Deak
  0 siblings, 2 replies; 22+ messages in thread
From: Chris Wilson @ 2016-08-29 15:25 UTC (permalink / raw)
  To: Imre Deak; +Cc: Takashi Iwai, Daniel Vetter, intel-gfx

On Mon, Aug 29, 2016 at 05:54:45PM +0300, Imre Deak wrote:
> On ma, 2016-08-29 at 16:24 +0200, Takashi Iwai wrote:
> > Hmm, this always confuses me.  Is the freeze callback called to the
> > loader kernel?
> 
> It's called both in loader and target kernel, before creating or
> restoring the image.

So the right answer for hiberation is?

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 492c4d4e5ebc..892e1626a9ad 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1915,6 +1915,7 @@ static int i915_pm_freeze_late(struct device *kdev)
        if (ret)
                return ret;
 
+       intel_gpu_reset(dev_priv, ALL_ENGINES);
        return 0;
 }

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: S4 resume breakage with i915 driver
  2016-08-29 15:25                           ` Chris Wilson
@ 2016-08-29 16:46                             ` Lukas Wunner
  2016-08-30 11:43                               ` Imre Deak
  2016-08-30 11:33                             ` Imre Deak
  1 sibling, 1 reply; 22+ messages in thread
From: Lukas Wunner @ 2016-08-29 16:46 UTC (permalink / raw)
  To: Chris Wilson, Imre Deak, Takashi Iwai, Daniel Vetter,
	Daniel Vetter, intel-gfx

On Mon, Aug 29, 2016 at 04:25:25PM +0100, Chris Wilson wrote:
> On Mon, Aug 29, 2016 at 05:54:45PM +0300, Imre Deak wrote:
> > On ma, 2016-08-29 at 16:24 +0200, Takashi Iwai wrote:
> > > Hmm, this always confuses me.  Is the freeze callback called to the
> > > loader kernel?
> > 
> > It's called both in loader and target kernel, before creating or
> > restoring the image.
> 
> So the right answer for hiberation is?

According to Documentation/power/devices.txt:

"After the image has been loaded, the devices managed by the boot kernel
need to be prepared for passing control back to the image kernel.  This
is very similar to the initial steps involved in creating a system image,
and it is accomplished in the same way, using prepare, freeze, and
freeze_noirq phases.  However the devices affected by these phases are
only those having drivers in the boot kernel; other devices will still be
in whatever state the boot loader left them."

"Most often the pre-hibernation memory contents are restored successfully
and control is passed to the image kernel, which then becomes responsible
for bringing the system back to the working state.
To achieve this, the image kernel must restore the devices' pre-hibernation
functionality.  The operation is much like waking up from the memory sleep
state, although it involves different phases:
        restore_noirq, restore_early, restore, complete"


If a missing i915.ko in the boot kernel's initrd causes problems, perhaps
this can be solved by amending intel_graphics_quirks() to reset the GPU?

Best regards,

Lukas
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: S4 resume breakage with i915 driver
  2016-08-29 15:25                           ` Chris Wilson
  2016-08-29 16:46                             ` Lukas Wunner
@ 2016-08-30 11:33                             ` Imre Deak
  1 sibling, 0 replies; 22+ messages in thread
From: Imre Deak @ 2016-08-30 11:33 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Takashi Iwai, Daniel Vetter, intel-gfx

On ma, 2016-08-29 at 16:25 +0100, Chris Wilson wrote:
> On Mon, Aug 29, 2016 at 05:54:45PM +0300, Imre Deak wrote:
> > On ma, 2016-08-29 at 16:24 +0200, Takashi Iwai wrote:
> > > Hmm, this always confuses me.  Is the freeze callback called to
> > > the
> > > loader kernel?
> > 
> > It's called both in loader and target kernel, before creating or
> > restoring the image.
> 
> So the right answer for hiberation is?

It would be good to know what goes wrong first. Assuming Takashi has
the driver in his loader kernel these are the GPU specific steps in his
hibernation sequence:

Loader kernel:
i915_pm_freeze()->
  i915_gem_suspend()
i915_pm_freeze_late()->
  pci_disable_device()
  pci_set_power_state(D3)

<restore target kernel image>

Target kernel:
i915_pm_restore_early()->
  pci_set_power_state(D0)
  pci_enable_device()
i915_pm_restore()
  i915_gem_resume()

Nothing seems wrong to me here, and after i915_gem_suspend() there
should be no GPU activity any more. In any case if there is no better
explanation for the root cause I'd try the reset in i915_gem_suspend()
or i915_gem_resume().

--Imre

> diff --git a/drivers/gpu/drm/i915/i915_drv.c
> b/drivers/gpu/drm/i915/i915_drv.c
> index 492c4d4e5ebc..892e1626a9ad 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -1915,6 +1915,7 @@ static int i915_pm_freeze_late(struct device
> *kdev)
>         if (ret)
>                 return ret;
>  
> +       intel_gpu_reset(dev_priv, ALL_ENGINES);
>         return 0;
>  }
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: S4 resume breakage with i915 driver
  2016-08-29 16:46                             ` Lukas Wunner
@ 2016-08-30 11:43                               ` Imre Deak
  0 siblings, 0 replies; 22+ messages in thread
From: Imre Deak @ 2016-08-30 11:43 UTC (permalink / raw)
  To: Lukas Wunner, Chris Wilson, Takashi Iwai, Daniel Vetter,
	Daniel Vetter, intel-gfx

On ma, 2016-08-29 at 18:46 +0200, Lukas Wunner wrote:
> On Mon, Aug 29, 2016 at 04:25:25PM +0100, Chris Wilson wrote:
> > On Mon, Aug 29, 2016 at 05:54:45PM +0300, Imre Deak wrote:
> > > On ma, 2016-08-29 at 16:24 +0200, Takashi Iwai wrote:
> > > > Hmm, this always confuses me.  Is the freeze callback called to the
> > > > loader kernel?
> > > 
> > > It's called both in loader and target kernel, before creating or
> > > restoring the image.
> > 
> > So the right answer for hiberation is?
> 
> According to Documentation/power/devices.txt:
> 
> "After the image has been loaded, the devices managed by the boot kernel
> need to be prepared for passing control back to the image kernel.  This
> is very similar to the initial steps involved in creating a system image,
> and it is accomplished in the same way, using prepare, freeze, and
> freeze_noirq phases.  However the devices affected by these phases are
> only those having drivers in the boot kernel; other devices will still be
> in whatever state the boot loader left them."
> 
> "Most often the pre-hibernation memory contents are restored successfully
> and control is passed to the image kernel, which then becomes responsible
> for bringing the system back to the working state.
> To achieve this, the image kernel must restore the devices' pre-hibernation
> functionality.  The operation is much like waking up from the memory sleep
> state, although it involves different phases:
>         restore_noirq, restore_early, restore, complete"
> 
> 
> If a missing i915.ko in the boot kernel's initrd causes problems, perhaps
> this can be solved by amending intel_graphics_quirks() to reset the GPU?

The problem at hand seems to happen with the driver present in the
loader kernel, so it needs to be fixed in the above PM callbacks. If
the missing driver is really a problem - no proof for this, I only
mentioned it as a possible scenario - then we'd need a separate fix.

--Imre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: S4 resume breakage with i915 driver
  2016-08-29 13:32                   ` Daniel Vetter
  2016-08-29 14:09                     ` Imre Deak
@ 2016-09-02 18:34                     ` Dave Gordon
  1 sibling, 0 replies; 22+ messages in thread
From: Dave Gordon @ 2016-09-02 18:34 UTC (permalink / raw)
  To: Daniel Vetter, Imre Deak; +Cc: Takashi Iwai, Daniel Vetter, intel-gfx

On 29/08/16 14:32, Daniel Vetter wrote:
> On Fri, Aug 26, 2016 at 02:42:47PM +0300, Imre Deak wrote:
>> On pe, 2016-08-26 at 14:10 +0300, Imre Deak wrote:
>>> On pe, 2016-08-26 at 11:39 +0100, Chris Wilson wrote:
>>>> On Fri, Aug 26, 2016 at 12:25:01PM +0200, Takashi Iwai wrote:
>>>>> On Fri, 26 Aug 2016 11:18:00 +0200,
>>>>> Takashi Iwai wrote:
>>>>>> I had to modify the intel_gpu_reset() call because the test was
>>>>>> done
>>>>>> on the older kernel, so it's like:
>>>>>>
>>>>>> +       intel_gpu_reset(dev_to_i915(dev)->dev);
>>>>>>
>>>>>> And, it seems working on HSW! \o/
>>>>>>
>>>>>> A simple trick, better than the magical register write revert.
>>>>>> I'll check other machines, too, to see whether it has any
>>>>>> negative
>>>>>> impact.
>>>>>
>>>>> The test results look good on all machines.
>>>>
>>>> The theory then is that the GPU's are active across the load of the
>>>> hibernation image and so before the GTT is updated the memory
>>>> currently
>>>> in use by the GPU is reused by the system.
>>>>
>>>> The key question then is the memory of boot kernel still in place
>>>> during
>>>> the hibernate restore phase?
>>>
>>> Before restoring the image all devices are quiesced by calling their
>>> freeze callback, so the GPU should be idle already
>>> in i915_pm_restore_early() already.
>>
>> But this happens in the loader kernel, so if that doesn't have the
>> driver built-in then the freeze callback won't be called either. So any
>> possible BIOS related GPU activity/setup should be quiesced from the
>> restore callback then.
>
> I thought the loader kernel has an entire initrd attached, to allow stuff
> like typing in the disk encryption passwd. Which means we very much do
> load i915 in the loader kernel already.
>
> So maybe we need to throw a gpu reset into the right hook (shutdown or
> whatever it was) to make sure the loader kernel really stops all gpu write
> cycles, including anything done due to power saving context restoring. We
> already know that the only way to get the gpu off the context image
> permanently is a gpu reset, so that would make some sense.
> -Daniel
>
>>
>>>> If we need to add a ->shutdown callback (if
>>>> that is even called before hibernate restore) then we can only fix
>>>> future kernels and are still susceptible to corruption when booing
>>>> from
>>>> old kernels.
>>>>
>>>> Any one familiar with the details of the hibernation restore? (And
>>>> how
>>>> much relates to kexec?)
>>>> -Chris

Seems like:

1. the driver should quiesce the h/w *as much as possible* during the 
saving of the hibernation image -- which may mean a reset during this 
phase; otherwise there might be some risk of the saved image being 
incomplete or inconsistent.

2. the driver should make *no assumptions whatsoever* about the state of 
the h/w during resume-from-hibernation, as we don't know what state the 
bios may have left it in, or whether a (possibly completely different) 
version of the driver in the loader kernel played with it at all. So a 
hard reset should be mandatory during resume.

[Aside: that other (fault-tolerant) kernel I used to work on took this 
approach. Whenever a driver is *given* control of a device, it should 
assume *nothing*, reset the h/w, and reprogram from scratch. And when 
it's about to *lose* control of a device, it should quiesce all activity 
and then reset the h/w to the default state before handing it back. That 
way there are no surprises for anybody.]

.Dave.

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2016-09-02 18:34 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-25 13:11 S4 resume breakage with i915 driver Takashi Iwai
2016-08-25 15:32 ` Chris Wilson
2016-08-25 15:54   ` Takashi Iwai
2016-08-25 16:12     ` Chris Wilson
2016-08-25 16:15       ` Takashi Iwai
2016-08-26  9:18         ` Takashi Iwai
2016-08-26 10:25           ` Takashi Iwai
2016-08-26 10:39             ` Chris Wilson
2016-08-26 11:10               ` Imre Deak
2016-08-26 11:42                 ` Imre Deak
2016-08-29 13:32                   ` Daniel Vetter
2016-08-29 14:09                     ` Imre Deak
2016-08-29 14:24                       ` Takashi Iwai
2016-08-29 14:54                         ` Imre Deak
2016-08-29 15:25                           ` Chris Wilson
2016-08-29 16:46                             ` Lukas Wunner
2016-08-30 11:43                               ` Imre Deak
2016-08-30 11:33                             ` Imre Deak
2016-09-02 18:34                     ` Dave Gordon
2016-08-26 12:29   ` David Weinehall
2016-08-26 12:38     ` Chris Wilson
2016-08-26 12:54       ` David Weinehall

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.