S4 resume breakage with i915 driver

From: Takashi Iwai @ 2016-08-25 13:11 UTC
  To: Daniel Vetter; +Cc: intel-gfx

Hi,

while debugging our 4.4.x based SLE12-SP2 kernel, we noticed that S4
resume is broken on many machines with i915 gfx, even on the upstream
4.7 kernel.  Originally it was reported by Intel on SKL machines,
but later on we found that it hits many older chips (at least HSW),
too.

This was hard to identify because there have been other S4 resume bugs
until recently.  But even after these fixes, a system with i915 gfx
hits memory corruption or a kernel Oops sooner or later, after a few
(usually < 10) S4 cycles.

As the regression appeared between 4.2 and 4.3, I bisected and it pointed to
the commit:

  4c436d55b279bbc6b02aac02e7dc683fc09f884e
    drm/i915: Enable Resource Streamer state save/restore on MI_SET_CONTEXT

Indeed, reverting this on top of our 4.4.x kernel seems to make S4
work stably (at least on a test machine).

Does this make any sense to you guys?

Since the commit message doesn't give a good explanation of this
change, I wonder what the purpose of this commit is.  Was it merely
an optimization?

Some other things to be noted:
- This might depend on the kernel config, of course.  We've
  started hitting this after deferred page init was enabled, for
  example.  But that's just a coincidence, not the cause.

- S3 seems to work stably.  Only S4 is the problem.

- The upstream commits I backported onto 4.4.x are:
  65c0554b73c920023cc8998802e508b798113b46
    x86/power/64: Fix kernel text mapping corruption during image restoration
  406f992e4a372dafbe3c2cff7efbb2002a5c8ebd
    x86 / hibernate: Use hlt_play_dead() when resuming from hibernation

  With these, S4 is very stable on 4.4.x without i915; it passed over
  100 S4 cycles.

- 4.8-rc has a mm-related change (the commit
  e6cbd7f2efb433d717af72aa8510a9db6f7a7e05
    mm, page_alloc: remove fair zone allocation policy), and this
  commit alone improves the S4 stability for some mysterious reason.  But
  I believe the i915 S4 issue still remains with 4.8.  It's
  just a matter of probability.  Hence, for checking the i915 S4
  issue, it'd be easier to test with a slightly older kernel.

thanks,

Takashi