[BISECTED REGRESSION v3.18->v3.19-rc1] drm/i915: failure to poweroff after hibernation

* [BISECTED REGRESSION v3.18->v3.19-rc1] drm/i915: failure to poweroff after hibernation
@ 2015-02-24 14:49 Bjørn Mork
  2015-02-24 14:58 ` [PATCH] drm/i915: fix failure to power off after hibernate Bjørn Mork
  0 siblings, 1 reply; 22+ messages in thread
From: Bjørn Mork @ 2015-02-24 14:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: Daniel Vetter, Jani Nikula, intel-gfx, dri-devel, Imre Deak,
	Ville Syrjälä

Hello,

My Lenovo Thinkpad X301 has failed to power off after saving the
hibernation image ever since v3.19-rc1.  This is a regression since
v3.18.  The regression is still present i v4.0-rc1.

The symptoms are: Hibernation proceeds as usual, writing a complete
image. But instead of powering off the laptop stays on in a "dead"
state, with fans running and the "sleep" LED blinking.  The only way out
of this state is by hard poweroff (pressing the power button for 4+
seconds). The system can successfully resume after this, proving that
the hibernation was complete.

I consider the bug somewhat critical as the system will continue to draw
power until it is forcibly powered off, or the battery is completely
dead. This causes unexpected battery usage and unnecessary battery wear
if the power-off failure goes unnoticed.

I finally took the time to bisect the problem, and ended up with this
surprisingly obvious result:

da2bc1b9db3351addd293e5b82757efe1f77ed1d is the first bad commit
commit da2bc1b9db3351addd293e5b82757efe1f77ed1d
Author: Imre Deak <imre.deak@intel.com>
Date:   Thu Oct 23 19:23:26 2014 +0300

    drm/i915: add poweroff_late handler

    The suspend_late handler saves some registers and powers off the device,
    so it doesn't have a big overhead. Calling it at S4 poweroff_late time
    makes the power off handling identical to the S3 suspend and S4 freeze
    handling, so do this for consistency.

    Signed-off-by: Imre Deak <imre.deak@intel.com>
    Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

:040000 040000 367eee899c6c2b2a669e2e46f68529dad0e1f7a3 78c7571e2b18dc0fb77161b8a3e32288bd4cbee8 M      drivers

The bisect log was:

# bad: [97bf6af1f928216fd6c5a66e8a57bfa95a659672] Linux 3.19-rc1
# good: [b2776bf7149bddd1f4161f14f79520f17fc1d71d] Linux 3.18
git bisect start 'v3.19-rc1' 'v3.18'
# good: [70e71ca0af244f48a5dcf56dc435243792e3a495] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
git bisect good 70e71ca0af244f48a5dcf56dc435243792e3a495
# bad: [988adfdffdd43cfd841df734664727993076d7cb] Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux
git bisect bad 988adfdffdd43cfd841df734664727993076d7cb
# good: [e7cf773d431a63a2417902696fcc9e0ebdc83bbe] Merge tag 'usb-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
git bisect good e7cf773d431a63a2417902696fcc9e0ebdc83bbe
# bad: [1a92b7a241dcf06a92d84219b4124dcf420ae315] Merge branch 'linux-3.19' of git://anongit.freedesktop.org/git/nouveau/linux-2.6 into drm-next
git bisect bad 1a92b7a241dcf06a92d84219b4124dcf420ae315
# bad: [fd172d0c47fddff801d998e38c3efdd236ed082f] Merge tag 'drm-intel-next-2014-11-07-fixups' of git://anongit.freedesktop.org/drm-intel into drm-next
git bisect bad fd172d0c47fddff801d998e38c3efdd236ed082f
# bad: [820d2d77482810702758381808bdbb64595298e2] drm/i915/audio: pass intel_encoder on to platform specific ELD functions
git bisect bad 820d2d77482810702758381808bdbb64595298e2
# good: [11c9b6c628c646894e6ef53f92cfd33a814ee553] drm/i915: Tighting frontbuffer tracking around flips
git bisect good 11c9b6c628c646894e6ef53f92cfd33a814ee553
# good: [0b14cbd2f58199a024acbe2994bb27533c97d756] drm/i915: remove dead code from legacy suspend handler
git bisect good 0b14cbd2f58199a024acbe2994bb27533c97d756
# good: [f4a12ead50580c17c3641ac1a453e68b5a5195dd] drm/i915: remove unused restore_gtt_mappings optimization during suspend
git bisect good f4a12ead50580c17c3641ac1a453e68b5a5195dd
# bad: [aff437667b93c3d65576b02628885687c72e1b3b] drm/i915: Move flags describing VMA mappings into the VMA
git bisect bad aff437667b93c3d65576b02628885687c72e1b3b
# bad: [da2bc1b9db3351addd293e5b82757efe1f77ed1d] drm/i915: add poweroff_late handler
git bisect bad da2bc1b9db3351addd293e5b82757efe1f77ed1d
# good: [f2476ae65e6159b41168bc41c630e9fbb1d72dde] drm/i915: disable/re-enable PCI device around S4 freeze/thaw
git bisect good f2476ae65e6159b41168bc41c630e9fbb1d72dde
# good: [5e365c391aeffe8b53d6952c28a68bd5fc856390] drm/i915: sanitize suspend/resume helper function names
git bisect good 5e365c391aeffe8b53d6952c28a68bd5fc856390

My rather old system has a GM45 chip, but I would expect the bug to
show up on most (all?) i915 systems:

nemi:/tmp# lspci -vvvnns 2
00:02.0 VGA compatible controller [0300]: Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller [8086:2a42] (rev 07) (prog-if 00 [VGA controller])
        Subsystem: Lenovo Device [17aa:20e4]
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 28
        Region 0: Memory at f0000000 (64-bit, non-prefetchable) [size=4M]
        Region 2: Memory at d0000000 (64-bit, prefetchable) [size=256M]
        Region 4: I/O ports at 1800 [size=8]
        Expansion ROM at <unassigned> [disabled]
        Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
                Address: fee0100c  Data: 41c2
        Capabilities: [d0] Power Management version 3
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Kernel driver in use: i915

00:02.1 Display controller [0380]: Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller [8086:2a43] (rev 07)
        Subsystem: Lenovo Device [17aa:20e4]
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Region 0: Memory at f0400000 (64-bit, non-prefetchable) [size=1M]
        Capabilities: [d0] Power Management version 3
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-

I'll follow up with a patch reverting the buggy commit.  The patch has
been successfully verified to fix the problem on top of v4.0-rc1, but
should also go into the v3.19 stable series.

Bjørn

^ permalink raw reply	[flat|nested] 22+ messages in thread