linux-kernel.vger.kernel.org archive mirror
* Machine crashes right *after* ~successful resume
@ 2014-10-07 23:20 Wilmer van der Gaast
  2014-10-12 14:30 ` Pavel Machek
  0 siblings, 1 reply; 51+ messages in thread
From: Wilmer van der Gaast @ 2014-10-07 23:20 UTC (permalink / raw)
  To: rafael.j.wysocki, linux-kernel

Hello,

Rafael, including you on this since 
http://linuxconcloudopenna2013.sched.org/event/d708f47d07cd44b9669610778c024708#.VDRzTDS_EUF 
mentions you as the maintainer for Linux + power management. I hope this 
is still accurate.

Since Linux 3.12 (Debian version 3.12.9-1~bpo70+1) and all the way up to 
3.16 (Debian version 3.16.3-2), I'm having suspend-resume issues on my 
machine (Intel Z68, i7-3770K) that are less obvious than a plain failure to resume.

After every boot, I get two successful suspend+resume cycles, but after 
the third suspend the machine won't resume successfully. Nothing useful 
has ever been logged on the VGA console, but I've had more luck over the 
serial console. I seem to get as far as:

[  153.787678] PM: resume of devices complete after 3797.737 msecs
[  153.787775] PM: resume devices took 3.796 seconds
[  154.238612] Restarting tasks ... done.

And indeed, while testing I was running a "ping -i0.01" to a host on my 
network, and it managed to get a few packets out. Timing already seems 
quite off though:

22:11:49.515489 IP 192.168.44.101 > 192.168.44.100: ICMP echo request, 
id 3074, seq 894, length 64
22:11:49.982265 IP 192.168.44.101 > 192.168.44.100: ICMP echo request, 
id 3074, seq 895, length 64
22:11:50.986779 IP 192.168.44.101 > 192.168.44.100: ICMP echo request, 
id 3074, seq 896, length 64

Note the gaps of 0.4-1.0s instead of the 0.01s they should have been. 
To me, these pings going *out* suggest that userland is definitely 
waking up for a while, or at least some processes are. Also, for several 
seconds during the earlier stages of the resume, the machine is already 
responding to echo requests.

Sadly after this message to my serial console and these few ICMP 
packets, the machine locks up quite hard, to the point that SysRq 
doesn't respond anymore. :-(

This has been happening for a while now and makes suspend+resume mostly 
useless on my machine. What other debugging info can I provide to help 
get this fixed?

I've found out about pm_trace, which always points at the same line (and 
no device):

/var/log/syslog.1:Oct 10 16:43:58 ruby kernel: [    0.780503]   Magic number: 0:52:740
/var/log/syslog.1:Oct 10 16:43:58 ruby kernel: [    0.780599]   hash matches /tmp/linux-3.16.3/drivers/base/power/main.c:812

In my source tree that line is:

         TRACE_RESUME(error);

Right at the end of device_resume(), under the Complete: label. Note 
that I might have to redo this, though: I now realise I had only 
recompiled my *kernel* with the PM_TRACE_RTC flag set, not all my 
modules, which I assume is not enough. (I'm thinking of filing a Debian 
bug requesting that this flag be enabled by default.) However, since the 
kernel seems to consider the resume complete, I'm not sure whether 
pm_trace is of any use here.

With kernels 3.10 and older I have no such problems; I can 
suspend+resume as often as I want.

I've already tried to skip the NVidia + VMware modules at boot time (as 
you can see from the logs they're not loaded at any point), but it 
didn't help. I could try omitting more modules.

I'm attaching a full dmesg of boot + a few suspend+resume cycles in 3.10 
and 3.16, and a dump of the serial console showing the last resume cycle 
(which I couldn't get from dmesg of course).

You might notice the message about s2ram segfaulting. I've looked at 
that and it seems to be in VBE-related code, but this problem occurs even 
when I just echo mem to /sys/power/state directly without using s2ram, 
so I assume it's unrelated.

Sorry for the long message. I'd love some ideas for troubleshooting an 
issue like this.

The "attachments" are at http://roy.gaast.net/~wilmer/.lkml/ instead, 
since I just realised >200KB of attachments might not be appreciated. :-)


Cheers,

Wilmer van der Gaast.

-- 
+-------- .''`.     - -- ---+  +        - -- --- ---- ----- ------+
| wilmer : :'  :  gaast.net |  | OSS Programmer   www.bitlbee.org |
| lintux `. `~'  debian.org |  | Full-time geek  wilmer.gaast.net |
+--- -- -  ` ---------------+  +------ ----- ---- --- -- -        +

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Machine crashes right *after* ~successful resume
  2014-10-07 23:20 Machine crashes right *after* ~successful resume Wilmer van der Gaast
@ 2014-10-12 14:30 ` Pavel Machek
  2014-10-12 15:49   ` Wilmer van der Gaast
  0 siblings, 1 reply; 51+ messages in thread
From: Pavel Machek @ 2014-10-12 14:30 UTC (permalink / raw)
  To: Wilmer van der Gaast; +Cc: rafael.j.wysocki, linux-kernel

Hi!

> Rafael, including you on this since http://linuxconcloudopenna2013.sched.org/event/d708f47d07cd44b9669610778c024708#.VDRzTDS_EUF
> mentions you as the maintainer for Linux + power management. I hope this is
> still accurate.
> 
> Since Linux 3.12 (Debian version 3.12.9-1~bpo70+1) and all the way up to
> 3.16 (Debian version 3.16.3-2), I'm having suspend-resume issues on my
> machine (Intel Z68, i7-3770K) that are somewhat less obvious.
> 
> After every boot, I get two successful suspend+resume cycles, but after the
> third suspend, it won't resume successfully. On the VGA console I've never
> had anything useful logged, luckily over the serial console I've had more
> luck. I seem to get as far as:

Has it ever worked ok? ...aha, in 3.10, ok.

> I've found out about pm_trace, which always points at the same line (and no
> device):
> 
> /var/log/syslog.1:Oct 10 16:43:58 ruby kernel: [    0.780503]   Magic
> number: 0:52:740
> /var/log/syslog.1:Oct 10 16:43:58 ruby kernel: [    0.780599]   hash matches
> /tmp/linux-3.16.3/drivers/base/power/main.c:812
> 
> In my source tree that line is:
> 
>         TRACE_RESUME(error);


If it resumes OK, this kind of tracing will not help.

> With kernels 3.10 and older I have no such problems, I can suspend+resume as
> often as I want.

Is there a chance to bisect?

> I've already tried to skip the NVidia + VMware modules at boot time (as you
> can see from the logs they're not loaded at any point), but it didn't help.
> I could try omitting more modules.

Yes, trying with minimal modules (and no s2ram) would be nice.
									
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: Machine crashes right *after* ~successful resume
  2014-10-12 14:30 ` Pavel Machek
@ 2014-10-12 15:49   ` Wilmer van der Gaast
  2014-10-12 20:40     ` Pavel Machek
  0 siblings, 1 reply; 51+ messages in thread
From: Wilmer van der Gaast @ 2014-10-12 15:49 UTC (permalink / raw)
  To: Pavel Machek; +Cc: rafael.j.wysocki, linux-kernel

Hello,

Many thanks for your response!

On 12-10-14 15:30, Pavel Machek wrote:
>
> Has it ever worked ok? ...aha, in 3.10, ok.
>
Correct. And I've now tried a few more kernels, compiled on my own. 3.17 
still has this issue, while 3.10 is completely fine all the way up to 
3.10.57 (I tested just under 50 cycles last night). I also tried 3.11, 
but it seems to have other suspend-resume stability issues that are no 
longer present in later kernels, so I've mostly disregarded those results.

git bisect: I've finally succeeded! I tried automating it completely, 
but sadly Gigabyte couldn't be bothered to wire up the motherboard so 
that the watchdog works. :-(

The culprit appears to be this one: 2e8b5f621dbe29425906852c6079afb6b28720cb

Merge: 07f2daa fed2451
Author: Bjorn Helgaas <bhelgaas@google.com>
Date:   Wed Aug 28 20:55:41 2013 -0600

     Merge branch 'pci/misc' into next

     * pci/misc:
       PCI: Remove pcie_cap_has_devctl()
       PCI: Support PCIe Capability Slot registers only for ports with slots
       PCI: Remove PCIe Capability version checks
       PCI: Allow PCIe Capability link-related register access for switches
       PCI: Add offsets of PCIe capability registers
       PCI: Tidy bitmasks and spacing of PCIe capability definitions
       PCI: Remove obsolete comment reference to pci_pcie_cap2()
       PCI: Clarify PCI_EXP_TYPE_PCI_BRIDGE comment
       PCI: Rename PCIe capability definitions to follow convention
       PCI: Disable decoding for BAR sizing only when it was actually enabled
       PCI: Add comment about needing pci_msi_off() even when CONFIG_PCI_MSI=n
       PCI: Add pcibios_pm_ops for optional arch-specific hibernate functionality

I then tried to narrow down which of the merged changes causes my issue, 
but with no luck, possibly because the problem comes from a combination 
of one of these changes and a change that was not yet in the pci/misc 
branch at the time. I could do a manual test instead.

>> I've already tried to skip the NVidia + VMware modules at boot time (as you
>> can see from the logs they're not loaded at any point), but it didn't help.
>> I could try omitting more modules.
> Yes, try with minimal modules (and no s2ram) would be nice.
> 									
I've tried unloading a bunch of modules (sound and NIC IIRC), same 
results. I can try this again with an even more minimal set. If this 
improves the situation, I'll post again.


Wilmer van der Gaast.


* Re: Machine crashes right *after* ~successful resume
  2014-10-12 15:49   ` Wilmer van der Gaast
@ 2014-10-12 20:40     ` Pavel Machek
  2014-10-12 23:47       ` Wilmer van der Gaast
  2014-10-13 15:06       ` Rafael J. Wysocki
  0 siblings, 2 replies; 51+ messages in thread
From: Pavel Machek @ 2014-10-12 20:40 UTC (permalink / raw)
  To: Wilmer van der Gaast, bhelgaas; +Cc: rafael.j.wysocki, linux-kernel

Bjorn, any ideas?

Would it be feasible to revert 2e8b... to see if it fixes it on 3.17?

Thanks,
									Pavel


* Re: Machine crashes right *after* ~successful resume
  2014-10-12 20:40     ` Pavel Machek
@ 2014-10-12 23:47       ` Wilmer van der Gaast
  2014-10-13 15:06       ` Rafael J. Wysocki
  1 sibling, 0 replies; 51+ messages in thread
From: Wilmer van der Gaast @ 2014-10-12 23:47 UTC (permalink / raw)
  To: Pavel Machek; +Cc: bhelgaas, rafael.j.wysocki, linux-kernel

On 12-10-14 21:40, Pavel Machek wrote:
> Bjorn, any ideas?
>
> Would it be feasible to revert 2e8b... to see if it fixes it on 3.17?
>
I've tried this, too many conflicts unfortunately.

Just noticed this message appear during failing resumes by the way:

[   54.203072] Clocksource tsc unstable (delta = -499956111 ns)
[   54.203151] Switched to clocksource hpet
[   54.203166] PM: resume of devices complete after 2142.341 msecs

It doesn't appear every time, though. It feels like another symptom of 
the same problem; in my original e-mail I already noted the timing 
strangeness, with a 0.01s ping interval growing to 0.4s+.

Anyway, my previous bisect result appears to be wrong. :-( I've done 
another bisect on a narrow range around it, now 
928bea964827d7824b548c1f8e06eccbbc4d0d7d is considered guilty. I've 
rerun the test twice with that revision and the one before it 
(55ed83a615730c2578da155bc99b68f4417ffe20), and the result seems 
consistent now; 928bea gets me just two clean suspend+resumes, 55ed83 more.

I have tried to revert this change in a 3.17 tree, but it didn't apply 
cleanly. One issue was an "Unreversed patch detected!" warning, which 
suggests to me that some of this code has been changed since. I get the 
same issue even against a 3.12 tree.

Just to be sure, I've tried ignoring the unreversed patch warning and 
tweaked the patch in two more places to make it apply, but indeed that 
does not solve my problem.

A Google search for the revision number shows that there has been quite 
a discussion about it already. Maybe my machine has found another issue 
(though I suppose my machine's more guilty than the kernel! :-/).

>> I've tried unloading a bunch of modules (sound and NIC IIRC), same results.
>> I can try this again with an even more minimal set. If this improves the
>> situation, I'll post again.
>>
Done: I'm still seeing the same issue. (And I'm using a raw echo 
mem>/proc/... for all testing now.) Same for a "make defconfig" kernel.


Wilmer van der Gaast.


* Re: Machine crashes right *after* ~successful resume
  2014-10-12 20:40     ` Pavel Machek
  2014-10-12 23:47       ` Wilmer van der Gaast
@ 2014-10-13 15:06       ` Rafael J. Wysocki
  2014-10-15 11:16         ` Wilmer van der Gaast
  1 sibling, 1 reply; 51+ messages in thread
From: Rafael J. Wysocki @ 2014-10-13 15:06 UTC (permalink / raw)
  To: Pavel Machek, Wilmer van der Gaast
  Cc: bhelgaas, rafael.j.wysocki, linux-kernel

On Sunday, October 12, 2014 10:40:32 PM Pavel Machek wrote:
> Bjorn, any ideas?
> 
> Would it be feasible to revert 2e8b... to see if it fixes it on 3.17?

That's a merge, isn't it?

I'd rather check what the pci/misc branch was based on and then bisect that
branch.

If you do

$ git show fed2451

you'll see (among other things) that this indeed is the PCI branch merged
by that commit and that it is based on

3b2f64d00c46 Linux 3.11-rc2

So, you can do

$ git bisect start fed2451 3b2f64d00c46

and see which of the commits in there introduced the problem you're seeing.

Note: Test fed2451 itself *first* and if that is bad already, then the merge
itself was problematic, in which case please let me know.


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


* Re: Machine crashes right *after* ~successful resume
  2014-10-13 15:06       ` Rafael J. Wysocki
@ 2014-10-15 11:16         ` Wilmer van der Gaast
  2014-10-15 13:58           ` Bjorn Helgaas
  0 siblings, 1 reply; 51+ messages in thread
From: Wilmer van der Gaast @ 2014-10-15 11:16 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Pavel Machek, bhelgaas, rafael.j.wysocki, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1302 bytes --]

Hello Rafael,

Rafael J. Wysocki (rjw@rjwysocki.net) wrote:
> > Would it be feasible to revert 2e8b... to see if it fixes it on 3.17?
> That's a merge, isn't it?
> 
Correct, it was, and I did try to figure out which of its parents was
the guilty one, but then I found out the real problem is
928bea964827d7824b548c1f8e06eccbbc4d0d7d.

Not sure why 2e8b... was initially found guilty by git bisect, I fear
that my testing was not thorough enough. I've verified a couple of times
now that 928bea96... does cause crashes and the previous revision does not.

928bea... seems to reshuffle PCI initialisation a little bit and has
caused more trouble, judging from a Google query for it. Some changes
were already made as a result, and this unfortunately makes a revert on
a later kernel tree (to see if that fixes the problem for me) much less
straightforward. :-(

I can look at the code and see how to revert this now, but I'm
definitely not very proficient outside userland.


Wilmer v/d Gaast.

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 173 bytes --]


* Re: Machine crashes right *after* ~successful resume
  2014-10-15 11:16         ` Wilmer van der Gaast
@ 2014-10-15 13:58           ` Bjorn Helgaas
  2014-10-15 18:39             ` Yinghai Lu
  0 siblings, 1 reply; 51+ messages in thread
From: Bjorn Helgaas @ 2014-10-15 13:58 UTC (permalink / raw)
  To: Wilmer van der Gaast
  Cc: Rafael J. Wysocki, Pavel Machek, Rafael Wysocki, linux-kernel,
	Yinghai Lu

[+cc Yinghai, author of 928bea964827 ("PCI: Delay enabling bridges
until they're needed")]

On Wed, Oct 15, 2014 at 5:16 AM, Wilmer van der Gaast <wilmer@gaast.net> wrote:
> Hello Rafael,
>
> Rafael J. Wysocki (rjw@rjwysocki.net) wrote:
>> > Would it be feasible to revert 2e8b... to see if it fixes it on 3.17?
>> That's a merge, isn't it?
>>
> Correct, it was, and I did try to figure out which of its parents was
> the guilty one, but then I found out the real problem is
> 928bea964827d7824b548c1f8e06eccbbc4d0d7d.
>
> Not sure why 2e8b... was initially found guilty by git bisect, I fear
> that my testing was not thorough enough. I've verified a couple of times
> now that 928bea96... does cause crashes and the previous revision does not.
>
> 928bea... seems to reshuffle PCI initialisation a little bit and has
> caused more troubles, judging from a Google query for it. Some changes
> were made already as a result, and this unfortunately makes a revert on
> a later kernel tree (to see if that fixes the problem for me) much less
> straight-forward. :-(

More details (from initial post) here: http://roy.gaast.net/~wilmer/.lkml/

Can you open a report at http://bugzilla.kernel.org, please?  Please
also attach the complete "lspci -vv" output.

Bjorn


* Re: Machine crashes right *after* ~successful resume
  2014-10-15 13:58           ` Bjorn Helgaas
@ 2014-10-15 18:39             ` Yinghai Lu
  2014-10-15 23:34               ` Wilmer van der Gaast
  0 siblings, 1 reply; 51+ messages in thread
From: Yinghai Lu @ 2014-10-15 18:39 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Wilmer van der Gaast, Rafael J. Wysocki, Pavel Machek,
	Rafael Wysocki, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1050 bytes --]

On Wed, Oct 15, 2014 at 6:58 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> [+cc Yinghai, author of 928bea964827 ("PCI: Delay enabling bridges
> until they're needed")]
>
> On Wed, Oct 15, 2014 at 5:16 AM, Wilmer van der Gaast <wilmer@gaast.net>
>> Not sure why 2e8b... was initially found guilty by git bisect, I fear
>> that my testing was not thorough enough. I've verified a couple of times
>> now that 928bea96... does cause crashes and the previous revision does not.

So the third resume will not work? That is strange. The second and
third resumes should not take different code paths...

>>
>> 928bea... seems to reshuffle PCI initialisation a little bit and has
>> caused more troubles, judging from a Google query for it. Some changes
>> were made already as a result, and this unfortunately makes a revert on
>> a later kernel tree (to see if that fixes the problem for me) much less
>> straight-forward. :-(
>
> More details (from initial post) here: http://roy.gaast.net/~wilmer/.lkml/

Please check whether the attached revert patch works on 3.17.

Yinghai

[-- Attachment #2: revert_928bea9_from_3.17.patch --]
[-- Type: text/x-patch, Size: 7187 bytes --]

diff --git a/arch/arm/kernel/bios32.c b/arch/arm/kernel/bios32.c
index 17a26c1..306ca53 100644
--- a/arch/arm/kernel/bios32.c
+++ b/arch/arm/kernel/bios32.c
@@ -538,6 +538,12 @@ void pci_common_init_dev(struct device *parent, struct hw_pci *hw)
 			 * Assign resources.
 			 */
 			pci_bus_assign_resources(bus);
+
+
+			/*
+			 * Enable bridges
+			 */
+			pci_enable_bridges(bus);
 		}
 
 		/*
diff --git a/arch/m68k/coldfire/pci.c b/arch/m68k/coldfire/pci.c
index df96792..b33f97a 100644
--- a/arch/m68k/coldfire/pci.c
+++ b/arch/m68k/coldfire/pci.c
@@ -319,6 +319,7 @@ static int __init mcf_pci_init(void)
 	pci_fixup_irqs(pci_common_swizzle, mcf_pci_map_irq);
 	pci_bus_size_bridges(rootbus);
 	pci_bus_assign_resources(rootbus);
+	pci_enable_bridges(rootbus);
 	return 0;
 }
 
diff --git a/arch/mips/pci/pci.c b/arch/mips/pci/pci.c
index 1bf60b1..4f2e17d 100644
--- a/arch/mips/pci/pci.c
+++ b/arch/mips/pci/pci.c
@@ -113,6 +113,7 @@ static void pcibios_scanbus(struct pci_controller *hose)
 		if (!pci_has_flag(PCI_PROBE_ONLY)) {
 			pci_bus_size_bridges(bus);
 			pci_bus_assign_resources(bus);
+			pci_enable_bridges(bus);
 		}
 	}
 }
diff --git a/arch/sh/drivers/pci/pci.c b/arch/sh/drivers/pci/pci.c
index 1bc09ee..5272327 100644
--- a/arch/sh/drivers/pci/pci.c
+++ b/arch/sh/drivers/pci/pci.c
@@ -69,6 +69,7 @@ static void pcibios_scanbus(struct pci_channel *hose)
 
 		pci_bus_size_bridges(bus);
 		pci_bus_assign_resources(bus);
+		pci_enable_bridges(bus);
 	} else {
 		pci_free_resource_list(&resources);
 	}
diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
index cd4de7e..c15bc3c 100644
--- a/drivers/acpi/pci_root.c
+++ b/drivers/acpi/pci_root.c
@@ -614,6 +614,9 @@ static int acpi_pci_root_add(struct acpi_device *device,
 	if (system_state != SYSTEM_BOOTING) {
 		pcibios_resource_survey_bus(root->bus);
 		pci_assign_unassigned_root_bus_resources(root->bus);
+
+		/* need to after hot-added ioapic is registered */
+		pci_enable_bridges(root->bus);
 	}
 
 	pci_lock_rescan_remove();
diff --git a/drivers/parisc/lba_pci.c b/drivers/parisc/lba_pci.c
index 37e71ff..19f6f70 100644
--- a/drivers/parisc/lba_pci.c
+++ b/drivers/parisc/lba_pci.c
@@ -1590,6 +1590,7 @@ lba_driver_probe(struct parisc_device *dev)
 		lba_dump_res(&lba_dev->hba.lmmio_space, 2);
 #endif
 	}
+	pci_enable_bridges(lba_bus);
 
 	/*
 	** Once PCI register ops has walked the bus, access to config
diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 73aef51..761601e 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -283,6 +283,26 @@ void pci_bus_add_devices(const struct pci_bus *bus)
 }
 EXPORT_SYMBOL(pci_bus_add_devices);
 
+void pci_enable_bridges(struct pci_bus *bus)
+{
+	struct pci_dev *dev;
+	int retval;
+
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+		if (dev->subordinate) {
+			if (!pci_is_enabled(dev)) {
+				retval = pci_enable_device(dev);
+				if (retval)
+					dev_err(&dev->dev, "Error enabling bridge (%d), continuing\n", retval);
+				pci_set_master(dev);
+			}
+			pci_enable_bridges(dev->subordinate);
+		}
+	}
+}
+EXPORT_SYMBOL(pci_enable_bridges);
+
+
 /** pci_walk_bus - walk devices on/under bus, calling callback.
  *  @top      bus whose devices should be walked
  *  @cb       callback to be called for each device found
diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
index bcb90e4..8fadc84 100644
--- a/drivers/pci/hotplug/acpiphp_glue.c
+++ b/drivers/pci/hotplug/acpiphp_glue.c
@@ -511,6 +511,7 @@ static void enable_slot(struct acpiphp_slot *slot)
 	acpiphp_sanitize_bus(bus);
 	pcie_bus_configure_settings(bus);
 	acpiphp_set_acpi_region(slot);
+	pci_enable_bridges(bus);
 
 	list_for_each_entry(dev, &bus->devices, bus_list) {
 		/* Assume that newly added devices are powered on already. */
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 625a4ac..4121518 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1242,31 +1242,8 @@ int pci_reenable_device(struct pci_dev *dev)
 }
 EXPORT_SYMBOL(pci_reenable_device);
 
-static void pci_enable_bridge(struct pci_dev *dev)
-{
-	struct pci_dev *bridge;
-	int retval;
-
-	bridge = pci_upstream_bridge(dev);
-	if (bridge)
-		pci_enable_bridge(bridge);
-
-	if (pci_is_enabled(dev)) {
-		if (!dev->is_busmaster)
-			pci_set_master(dev);
-		return;
-	}
-
-	retval = pci_enable_device(dev);
-	if (retval)
-		dev_err(&dev->dev, "Error enabling bridge (%d), continuing\n",
-			retval);
-	pci_set_master(dev);
-}
-
 static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
 {
-	struct pci_dev *bridge;
 	int err;
 	int i, bars = 0;
 
@@ -1285,10 +1262,6 @@ static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
 	if (atomic_inc_return(&dev->enable_cnt) > 1)
 		return 0;		/* already enabled */
 
-	bridge = pci_upstream_bridge(dev);
-	if (bridge)
-		pci_enable_bridge(bridge);
-
 	/* only skip sriov related */
 	for (i = 0; i <= PCI_ROM_RESOURCE; i++)
 		if (dev->resource[i].flags & flags)
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 5ed9930..df17ba8 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2177,6 +2177,7 @@ unsigned int pci_rescan_bus(struct pci_bus *bus)
 
 	max = pci_scan_child_bus(bus);
 	pci_assign_unassigned_bus_resources(bus);
+	pci_enable_bridges(bus);
 	pci_bus_add_devices(bus);
 
 	return max;
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 0482235..2cfb1eb 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1587,7 +1587,7 @@ again:
 
 	/* any device complain? */
 	if (list_empty(&fail_head))
-		goto dump;
+		goto enable_and_dump;
 
 	if (tried_times >= pci_try_num) {
 		if (enable_local == undefined)
@@ -1596,7 +1596,7 @@ again:
 			dev_info(&bus->dev, "Automatically enabled pci realloc, if you have problem, try booting with pci=realloc=off\n");
 
 		free_list(&fail_head);
-		goto dump;
+		goto enable_and_dump;
 	}
 
 	dev_printk(KERN_DEBUG, &bus->dev,
@@ -1629,7 +1629,10 @@ again:
 
 	goto again;
 
-dump:
+enable_and_dump:
+	/* Depth last, update the hardware. */
+	pci_enable_bridges(bus);
+
 	/* dump the resource on buses */
 	pci_bus_dump_resources(bus);
 }
@@ -1700,6 +1703,7 @@ enable_all:
 	if (retval)
 		dev_err(&bridge->dev, "Error reenabling bridge (%d)\n", retval);
 	pci_set_master(bridge);
+	pci_enable_bridges(parent);
 }
 EXPORT_SYMBOL_GPL(pci_assign_unassigned_bridge_resources);
 
diff --git a/drivers/pcmcia/cardbus.c b/drivers/pcmcia/cardbus.c
index 4fe4cc4..9cbe4cf 100644
--- a/drivers/pcmcia/cardbus.c
+++ b/drivers/pcmcia/cardbus.c
@@ -92,6 +92,7 @@ int __ref cb_alloc(struct pcmcia_socket *s)
 	if (s->tune_bridge)
 		s->tune_bridge(s, bus);
 
+	pci_enable_bridges(bus);
 	pci_bus_add_devices(bus);
 
 	pci_unlock_rescan_remove();
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 5be8db4..1f85fb5 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1105,7 +1105,7 @@ int __must_check pci_bus_alloc_resource(struct pci_bus *bus,
 						  resource_size_t,
 						  resource_size_t),
 			void *alignf_data);
-
+void pci_enable_bridges(struct pci_bus *bus);
 
 int pci_remap_iospace(const struct resource *res, phys_addr_t phys_addr);
 

* Re: Machine crashes right *after* ~successful resume
  2014-10-15 18:39             ` Yinghai Lu
@ 2014-10-15 23:34               ` Wilmer van der Gaast
  2014-10-16  4:32                 ` Yinghai Lu
  0 siblings, 1 reply; 51+ messages in thread
From: Wilmer van der Gaast @ 2014-10-15 23:34 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
	linux-kernel

Hello Yinghai,

On 15-10-14 19:39, Yinghai Lu wrote:
>
> so third resume will not work? that is strange.
> second and third should not use same code path...
>
Always exactly the third time, yes. It does seem strange. :-( I was 
under the impression that the completion time of device resumes was 
growing on each resume, and wondered whether that could be related. 
However, looking back at my logs, this is not consistent; in some cases 
the time is constant.

Anyway, your patch works! I had to tweak it slightly to apply cleanly to 
the 3.17 tarball I have, but my machine has now gone through eleven 
successful suspend+resume cycles again.

Is there anything I can do now to find out why your change is causing my 
machine to crash?

Thank you!


Wilmer v/d Gaast.

-- 
+-------- .''`.     - -- ---+  +        - -- --- ---- ----- ------+
| wilmer : :'  :  gaast.net |  | OSS Programmer   www.bitlbee.org |
| lintux `. `~'  debian.org |  | Full-time geek  wilmer.gaast.net |
+--- -- -  ` ---------------+  +------ ----- ---- --- -- -        +

* Re: Machine crashes right *after* ~successful resume
  2014-10-15 23:34               ` Wilmer van der Gaast
@ 2014-10-16  4:32                 ` Yinghai Lu
  2014-10-16  9:36                   ` Wilmer van der Gaast
  0 siblings, 1 reply; 51+ messages in thread
From: Yinghai Lu @ 2014-10-16  4:32 UTC (permalink / raw)
  To: Wilmer van der Gaast
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
	linux-kernel

[-- Attachment #1: Type: text/plain, Size: 468 bytes --]

On Wed, Oct 15, 2014 at 4:34 PM, Wilmer van der Gaast <wilmer@gaast.net> wrote:
>
> Is there anything I can do now to find out why your change is causing my
> machine to crash?

Can you please try the attached patch? That should work around the
problem.

Some drivers use pci_enable_device() in .resume instead of
pci_reenable_device().

We should skip pci_enable_bridge() in those pci_enable_device() calls
to avoid contention between async device_resume() calls.

Thanks

Yinghai

[-- Attachment #2: skip_enable_bridge_on_resume_path.patch --]
[-- Type: text/x-patch, Size: 1081 bytes --]

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 625a4ac..6567831 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1266,7 +1266,6 @@ static void pci_enable_bridge(struct pci_dev *dev)
 
 static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
 {
-	struct pci_dev *bridge;
 	int err;
 	int i, bars = 0;
 
@@ -1285,9 +1284,19 @@ static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
 	if (atomic_inc_return(&dev->enable_cnt) > 1)
 		return 0;		/* already enabled */
 
-	bridge = pci_upstream_bridge(dev);
-	if (bridge)
-		pci_enable_bridge(bridge);
+	/*
+	 * Do not enable the bridge again on the resume path, as the parent
+	 * state has already been restored.
+	 * This also avoids delays between different async resumes.
+	 */
+	if (!(dev->dev.power.is_suspended ||
+	      dev->dev.power.is_noirq_suspended ||
+	      dev->dev.power.is_late_suspended)) {
+		struct pci_dev *bridge = pci_upstream_bridge(dev);
+
+		if (bridge)
+			pci_enable_bridge(bridge);
+	}
 
 	/* only skip sriov related */
 	for (i = 0; i <= PCI_ROM_RESOURCE; i++)

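The enable-count behaviour described above can be modelled in a few lines of user-space C. This is a hypothetical simplification, not kernel code: model_dev, model_enable_device() and the other names are invented and only loosely mirror pci_enable_device(), pci_enable_bridge(), pci_reenable_device() and pci_disable_device() as discussed in this thread, and the is_suspended flag stands in for the dev->dev.power.is_*suspended checks added by the patch. It assumes the driver calls pci_disable_device() in .suspend, so that pci_enable_device() in .resume takes the 0 -> 1 path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct pci_dev. */
struct model_dev {
	int enable_cnt;            /* like pci_dev->enable_cnt */
	int is_suspended;          /* like dev.power.is_suspended and friends */
	int bridge_walks;          /* times visited as an upstream bridge */
	int reprogrammed;          /* stands in for do_pci_enable_device() calls */
	struct model_dev *upstream;
};

static int model_enable_device(struct model_dev *dev);

/* Rough analogue of pci_enable_bridge(): walk up first, enable if needed. */
static void model_enable_bridge(struct model_dev *br)
{
	br->bridge_walks++;
	if (br->upstream)
		model_enable_bridge(br->upstream);
	if (br->enable_cnt == 0)
		model_enable_device(br);
}

/*
 * Rough analogue of pci_enable_device() with the proposed workaround:
 * upstream bridges are only touched on the 0 -> 1 transition and, with
 * the patch, not while the device is still marked as suspended.
 */
static int model_enable_device(struct model_dev *dev)
{
	if (++dev->enable_cnt > 1)
		return 0;                       /* already enabled */
	if (!dev->is_suspended && dev->upstream)
		model_enable_bridge(dev->upstream);
	dev->reprogrammed++;
	return 0;
}

/* Rough analogue of pci_reenable_device(): never touches count or bridge. */
static int model_reenable_device(struct model_dev *dev)
{
	if (dev->enable_cnt > 0)
		dev->reprogrammed++;
	return 0;
}

/* Rough analogue of pci_disable_device(): only drops the count here. */
static void model_disable_device(struct model_dev *dev)
{
	assert(dev->enable_cnt > 0);
	dev->enable_cnt--;
}
```

In the model, only the plain enable path on the 0 -> 1 transition walks the upstream bridges; the reenable path and the patched "still suspended" path leave them alone, which is the cross-device contention the workaround tries to avoid.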
* Re: Machine crashes right *after* ~successful resume
  2014-10-16  4:32                 ` Yinghai Lu
@ 2014-10-16  9:36                   ` Wilmer van der Gaast
  2014-10-16 16:36                     ` Yinghai Lu
  0 siblings, 1 reply; 51+ messages in thread
From: Wilmer van der Gaast @ 2014-10-16  9:36 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
	linux-kernel

Hello,

On 16-10-14 05:32, Yinghai Lu wrote:
>
> Can you please try attached patch? that should workaround the problem.
>
Sadly, no luck. (I do assume you meant me to use the patch against a 
clean 3.17 tree *without* yesterday's revert patch applied.) Back to a 
crash at/after the third resume:

[  372.502897] usb 3-1.1: reset high-speed USB device number 3 using ehci-pci
[  372.678765] usb 2-1.5: reset low-speed USB device number 3 using ehci-pci
[  373.398437] Clocksource tsc unstable (delta = -136457848 ns)
[  373.897503] Switched to clocksource hpet
[  373.897536] PM: resume of devices complete after 2143.535 msecs
[  373.898225] r8169 0000:07:00.0 eth0: link up
[  374.319311] Restarting tasks ... done.
(And then nothing.)

Interestingly, I did see the "resume of devices" time grow on each resume 
again this time. I'll put the full dmesg dump in the same place as 
before: http://gaast.net/~wilmer/.lkml/

There's an lspci -vv dump there as well, as Bjorn asked for. I'll file a 
bug on Bugzilla tonight.

> as some driver is using pci_enable_device in .resume instead of
> pci_reenable_device....
>
Maybe this doesn't matter, but I could reproduce this issue even with no 
modules loaded at all (so bare-bones that I couldn't even mount my rootfs 
and had to do this testing in the initrd), so with only mainline kernel 
code running.


Thanks,

Wilmer v/d Gaast.

* Re: Machine crashes right *after* ~successful resume
  2014-10-16  9:36                   ` Wilmer van der Gaast
@ 2014-10-16 16:36                     ` Yinghai Lu
  2014-10-16 21:08                       ` Wilmer van der Gaast
  0 siblings, 1 reply; 51+ messages in thread
From: Yinghai Lu @ 2014-10-16 16:36 UTC (permalink / raw)
  To: Wilmer van der Gaast
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
	linux-kernel

On Thu, Oct 16, 2014 at 2:36 AM, Wilmer van der Gaast <wilmer@gaast.net> wrote:
> Hello,
>
> On 16-10-14 05:32, Yinghai Lu wrote:
>>
>>
>> Can you please try attached patch? that should workaround the problem.
>>
> Sadly, no luck. (I do assume you meant me to use the patch against a clean
> 3.17 tree *without* yesterday's revert patch applied.) Back to a crash
> at/after the third resume:
>
> [  372.502897] usb 3-1.1: reset high-speed USB device number 3 using ehci-pci
> [  372.678765] usb 2-1.5: reset low-speed USB device number 3 using ehci-pci
> [  373.398437] Clocksource tsc unstable (delta = -136457848 ns)
> [  373.897503] Switched to clocksource hpet
> [  373.897536] PM: resume of devices complete after 2143.535 msecs
> [  373.898225] r8169 0000:07:00.0 eth0: link up
> [  374.319311] Restarting tasks ... done.
> (And then nothing.)
>
> Interestingly I did see the "resume of devices" time grow on each resume
> again this time. I'll put the full dmesg dump in the same place like before:
> http://gaast.net/~wilmer/.lkml/

I checked that dmesg and the console output; the last resume looks OK.

Can you put "debug ignore_loglevel" on the boot command line?
Then we can compare the serial console output of a good and a bad
resume directly.

Also, did you try removing r8169 before every suspend?

Thanks

Yinghai

* Re: Machine crashes right *after* ~successful resume
  2014-10-16 16:36                     ` Yinghai Lu
@ 2014-10-16 21:08                       ` Wilmer van der Gaast
  2014-10-18 21:28                         ` Yinghai Lu
  0 siblings, 1 reply; 51+ messages in thread
From: Wilmer van der Gaast @ 2014-10-16 21:08 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
	linux-kernel

Hello,

I have filed a bug now:
https://bugzilla.kernel.org/show_bug.cgi?id=86421
Should we continue the discussion there? I've added only you to the CC 
field; I'm not sure who else on this thread is still interested at this 
point.

On 16-10-14 17:36, Yinghai Lu wrote:
>
> Can you put "debug ignore_loglevel" in boot command line?
> So we can compare output from serial console between good one and bad
> one directly.
>
Did that, will throw the output in the same log dir. Those arguments 
resulted in very little extra output. :-/

> Also did you try to remove r8169 every time before suspend?
>
Did that on this run; no difference either. For completeness, I also 
reproduced this problem with no modules loaded at all (done from the 
initramfs), using a kernel with your workaround included. Logs are here: 
http://gaast.net/~wilmer/.lkml/bad3.17-patched-debug-initramfs.txt


Wilmer v/d Gaast.

* Re: Machine crashes right *after* ~successful resume
  2014-10-16 21:08                       ` Wilmer van der Gaast
@ 2014-10-18 21:28                         ` Yinghai Lu
  2014-10-18 23:57                           ` Wilmer van der Gaast
  0 siblings, 1 reply; 51+ messages in thread
From: Yinghai Lu @ 2014-10-18 21:28 UTC (permalink / raw)
  To: Wilmer van der Gaast
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
	linux-kernel

[-- Attachment #1: Type: text/plain, Size: 586 bytes --]

On Thu, Oct 16, 2014 at 2:08 PM, Wilmer van der Gaast <wilmer@gaast.net> wrote:
> Did that on this run, no difference either. For full completeness, I
> reproduced this problem with no modules loaded (done from initramfs) at all,
> with a kernel with your workaround included, logs are here:
> http://gaast.net/~wilmer/.lkml/bad3.17-patched-debug-initramfs.txt

Yes, that output looks good.

Please apply the attached debug patch on top of v3.17 and boot with
"debug ignore_loglevel initcall_debug no_console_suspend".

Hopefully we can find out which notifier block (nb) causes the problem.

Thanks

Yinghai

[-- Attachment #2: debug_suspend_resume_x.patch --]
[-- Type: text/x-patch, Size: 1849 bytes --]

---
 kernel/notifier.c   |    9 +++++++++
 kernel/power/main.c |    4 +++-
 2 files changed, 12 insertions(+), 1 deletion(-)

Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -24,16 +24,18 @@ DEFINE_MUTEX(pm_mutex);
 
 /* Routines for PM-transition notifications */
 
-static BLOCKING_NOTIFIER_HEAD(pm_chain_head);
+BLOCKING_NOTIFIER_HEAD(pm_chain_head);
 
 int register_pm_notifier(struct notifier_block *nb)
 {
+	pr_info("PM: registering nb %pF\n", nb->notifier_call);
 	return blocking_notifier_chain_register(&pm_chain_head, nb);
 }
 EXPORT_SYMBOL_GPL(register_pm_notifier);
 
 int unregister_pm_notifier(struct notifier_block *nb)
 {
+	pr_info("PM: unregistering nb %pF\n", nb->notifier_call);
 	return blocking_notifier_chain_unregister(&pm_chain_head, nb);
 }
 EXPORT_SYMBOL_GPL(unregister_pm_notifier);
Index: linux-2.6/kernel/notifier.c
===================================================================
--- linux-2.6.orig/kernel/notifier.c
+++ linux-2.6/kernel/notifier.c
@@ -59,6 +59,9 @@ static int notifier_chain_unregister(str
 	return -ENOENT;
 }
 
+extern struct blocking_notifier_head pm_chain_head;
+#define PM_POST_SUSPEND		0x0004 /* Suspend finished */
+
 /**
  * notifier_call_chain - Informs the registered notifiers about an event.
  *	@nl:		Pointer to head of the blocking notifier chain
@@ -90,8 +93,14 @@ static int notifier_call_chain(struct no
 			continue;
 		}
 #endif
+		if (nl == &pm_chain_head.head && val == PM_POST_SUSPEND)
+			pr_info("PM: calling nb %pF\n", nb->notifier_call);
+
 		ret = nb->notifier_call(nb, val, v);
 
+		if (nl == &pm_chain_head.head && val == PM_POST_SUSPEND)
+			pr_info("PM: ... nb %pF done\n", nb->notifier_call);
+
 		if (nr_calls)
 			(*nr_calls)++;
 

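The debug patch above brackets each PM_POST_SUSPEND notifier with a "calling" line and a "... done" line, so a hang inside a callback shows up as a "calling" line with no matching "done". A small sketch of that check, assuming the exact message format printed by the patch and that callbacks on the blocking chain run one at a time; find_unfinished_notifier() is a hypothetical helper, not part of the patch:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define CALLING_TAG "PM: calling nb "
#define DONE_TAG    "PM: ... nb "

/*
 * Scan a kernel log for "PM: calling nb <fn>" / "PM: ... nb <fn> done"
 * pairs and report the last notifier that was entered but never
 * finished. Assumes callbacks run one at a time, so any "done" line
 * closes the pending call. Returns 1 and copies the callback name into
 * 'out' if one is unfinished, 0 if every call completed.
 */
static int find_unfinished_notifier(const char *log, char *out, size_t outlen)
{
	char pending[128] = "";
	const char *line = log;

	assert(log != NULL && out != NULL && outlen > 0);
	while (line != NULL && *line != '\0') {
		const char *eol = strchr(line, '\n');
		size_t len = eol ? (size_t)(eol - line) : strlen(line);
		char buf[256];
		const char *p;

		/* Work on a NUL-terminated copy of the current line. */
		if (len > sizeof(buf) - 1)
			len = sizeof(buf) - 1;
		memcpy(buf, line, len);
		buf[len] = '\0';

		if ((p = strstr(buf, CALLING_TAG)) != NULL)
			snprintf(pending, sizeof(pending), "%s", p + strlen(CALLING_TAG));
		else if ((p = strstr(buf, DONE_TAG)) != NULL && strstr(p, " done") != NULL)
			pending[0] = '\0';  /* the pending callback returned */

		line = eol ? eol + 1 : NULL;
	}

	if (pending[0] == '\0')
		return 0;
	snprintf(out, outlen, "%s", pending);
	return 1;
}
```

On the failing-resume logs posted later in this thread, every notifier reports "done", which points past the notifier chain itself.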
^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Machine crashes right *after* ~successful resume
  2014-10-18 21:28                         ` Yinghai Lu
@ 2014-10-18 23:57                           ` Wilmer van der Gaast
  2014-10-19  4:29                             ` Yinghai Lu
  2014-10-19  8:07                             ` Pavel Machek
  0 siblings, 2 replies; 51+ messages in thread
From: Wilmer van der Gaast @ 2014-10-18 23:57 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
	linux-kernel

(Resending, forgot to hit reply-to-all.)

Hello Yinghai,

On 18-10-14 22:28, Yinghai Lu wrote:
>
> Please apply attached debug patch on top of v3.17 and boot with
> "debug ignore_loglevel initcall_debug no_console_suspend".
>
> Hope we can find out which nb notifier cause problem.
>
Did that. Strangely, or rather, quite annoyingly, I'm now getting no 
output at all on the third resume! :-(

I could try a non-serial console instead if you think that's worth a 
shot, but the most annoying thing is that my video doesn't get 
initialised properly after resume unless I have the tainting nvidia 
driver loaded. I could try whether nouveau helps.

I've dropped all the debugging output in the same directory as before; 
look for files named like 
http://roy.gaast.net/~wilmer/.lkml/bad3.17-patched-initcall.txt


Thanks,

Wilmer v/d Gaast.

* Re: Machine crashes right *after* ~successful resume
  2014-10-18 23:57                           ` Wilmer van der Gaast
@ 2014-10-19  4:29                             ` Yinghai Lu
  2014-10-19 10:48                               ` Wilmer van der Gaast
  2014-10-21 21:40                               ` Wilmer van der Gaast
  2014-10-19  8:07                             ` Pavel Machek
  1 sibling, 2 replies; 51+ messages in thread
From: Yinghai Lu @ 2014-10-19  4:29 UTC (permalink / raw)
  To: Wilmer van der Gaast
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
	linux-kernel

On Sat, Oct 18, 2014 at 4:57 PM, Wilmer van der Gaast <wilmer@gaast.net> wrote:
> On 18-10-14 22:28, Yinghai Lu wrote:
>>
>> Please apply attached debug patch on top of v3.17 and boot with
>> "debug ignore_loglevel initcall_debug no_console_suspend".
>>
>> Hope we can find out which nb notifier cause problem.
>>
> Did that. Strangely, or better said, quite annoyingly, I'm now getting no
> output anymore at all on the third resume! :-(
>
> I could try non-serial instead if you think that's worth a shot, but the
> most annoying thing is that my video doesn't get initialised properly after
> resume unless I have the tainting nvidia driver loaded. I could try if
> nouveau helps.

Oh no.

Please try booting with just "debug ignore_loglevel no_console_suspend"
(i.e. without initcall_debug).

Thanks

Yinghai

* Re: Machine crashes right *after* ~successful resume
  2014-10-18 23:57                           ` Wilmer van der Gaast
  2014-10-19  4:29                             ` Yinghai Lu
@ 2014-10-19  8:07                             ` Pavel Machek
  1 sibling, 0 replies; 51+ messages in thread
From: Pavel Machek @ 2014-10-19  8:07 UTC (permalink / raw)
  To: Wilmer van der Gaast
  Cc: Yinghai Lu, Bjorn Helgaas, Rafael J. Wysocki, Rafael Wysocki,
	linux-kernel

On Sun 2014-10-19 00:57:12, Wilmer van der Gaast wrote:
> (Resending, forgot to hit reply-to-all.)
> 
> Hello Yinghai,
> 
> On 18-10-14 22:28, Yinghai Lu wrote:
> >
> > Please apply attached debug patch on top of v3.17 and boot with
> > "debug ignore_loglevel initcall_debug no_console_suspend".
> >
> > Hope we can find out which nb notifier cause problem.
> >
> Did that. Strangely, or better said, quite annoyingly, I'm now getting no
> output anymore at all on the third resume! :-(
> 
> I could try non-serial instead if you think that's worth a shot, but the
> most annoying thing is that my video doesn't get initialised properly after
> resume unless I have the tainting nvidia driver loaded. I could try if
> nouveau helps.

Tainting should not be a problem. If it works for you, it works...

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* Re: Machine crashes right *after* ~successful resume
  2014-10-19  4:29                             ` Yinghai Lu
@ 2014-10-19 10:48                               ` Wilmer van der Gaast
  2014-10-21 21:40                               ` Wilmer van der Gaast
  1 sibling, 0 replies; 51+ messages in thread
From: Wilmer van der Gaast @ 2014-10-19 10:48 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
	linux-kernel

Hello,

On 19-10-14 05:29, Yinghai Lu wrote:
>
> Please try to "debug ignore_loglevel no_console_suspend".
>
Same thing. :-(

[   72.572354] Restarting tasks ... done.
[   72.576554] PM: calling nb rcu_pm_notify+0x0/0x60
[   72.581277] PM: ... nb rcu_pm_notify+0x0/0x60 done
[   72.586115] PM: calling nb cpu_hotplug_pm_callback+0x0/0x50
[   72.591692] PM: ... nb cpu_hotplug_pm_callback+0x0/0x50 done
[   72.597345] PM: calling nb fw_pm_notify+0x0/0x150
[   72.602047] PM: ... nb fw_pm_notify+0x0/0x150 done
[   72.606839] PM: calling nb bsp_pm_callback+0x0/0x50
[   72.611711] PM: ... nb bsp_pm_callback+0x0/0x50 done
[   73.382175] r8169 0000:07:00.0 eth0: link up
[   78.857526] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[   79.025718] ata3.00: configured for UDMA/133
[   81.379533] ata4: softreset failed (device not ready)
[   82.623212] PM: Syncing filesystems ... done.
[   82.661564] PM: Preparing system for mem sleep
[   82.669405] Freezing user space processes ... (elapsed 0.001 seconds) done.
[   82.677729] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[   82.686338] PM: Entering mem sleep

And nothing related to resume. :-(

Is there any point in my retrying with the initcall_debug flag but 
without your patch?

Looking at your patch again, it seems pretty mad that it would make 
such a big difference. Overnight I remembered that my machine has TSC 
issues at the time this bug shows up, so I tried setting hpet as the 
clocksource (hpet=force on the command line did not seem to have that 
effect, so I used sysfs instead). No effect either.
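For reference, switching the clocksource at runtime through sysfs, as described above, amounts to writing a name listed in available_clocksource to current_clocksource under /sys/devices/system/clocksource/clocksource0 (root required). A minimal sketch; set_clocksource() is a hypothetical helper, and the directory is a parameter so it can be pointed at a scratch directory for testing:

```c
#define _POSIX_C_SOURCE 200809L  /* for strtok_r() */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write 'name' to <sysfs_dir>/current_clocksource after checking that
 * it appears in <sysfs_dir>/available_clocksource.
 * Returns 0 on success, -1 on I/O error or if 'name' is not available.
 */
static int set_clocksource(const char *sysfs_dir, const char *name)
{
	char path[512], avail[256];
	char *tok, *save = NULL;
	FILE *f;
	int found = 0;

	assert(sysfs_dir != NULL && name != NULL);
	snprintf(path, sizeof(path), "%s/available_clocksource", sysfs_dir);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(avail, sizeof(avail), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	/* The file holds a space-separated list, e.g. "tsc hpet acpi_pm\n". */
	for (tok = strtok_r(avail, " \n", &save); tok; tok = strtok_r(NULL, " \n", &save))
		if (strcmp(tok, name) == 0)
			found = 1;
	if (!found)
		return -1;

	snprintf(path, sizeof(path), "%s/current_clocksource", sysfs_dir);
	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fprintf(f, "%s\n", name) < 0) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}
```

On a real system this would be called as root with "/sys/devices/system/clocksource/clocksource0" and "hpet"; the shell equivalent is writing hpet into current_clocksource in that directory.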

I need to go now, can experiment a little more tonight.


Thanks,

Wilmer v/d Gaast.

* Re: Machine crashes right *after* ~successful resume
  2014-10-19  4:29                             ` Yinghai Lu
  2014-10-19 10:48                               ` Wilmer van der Gaast
@ 2014-10-21 21:40                               ` Wilmer van der Gaast
  2014-10-21 23:15                                 ` Yinghai Lu
  1 sibling, 1 reply; 51+ messages in thread
From: Wilmer van der Gaast @ 2014-10-21 21:40 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
	linux-kernel

Hello,

Sorry for the delay; I finally poked at this again. It looks like the 
no_console_suspend flag was causing trouble, and I didn't really need it 
anyway with logging going to my serial port.

This is what I get now on the failing resume:

[  112.879390] PM: resume of devices complete after 2239.905 msecs
[  112.880068] r8169 0000:07:00.0 eth0: link up
[  112.880078] Switched to clocksource hpet
[  116.069248] PM: Finishing wakeup.
[  116.072574] Restarting tasks ... done.
[  116.076664] PM: calling nb rcu_pm_notify+0x0/0x60
[  116.081439] PM: ... nb rcu_pm_notify+0x0/0x60 done
[  116.086267] PM: calling nb cpu_hotplug_pm_callback+0x0/0x50
[  116.088526] systemd[1]: Got notification message for unit systemd-journald.service
[  116.099442] PM: ... nb cpu_hotplug_pm_callback+0x0/0x50 done
[  116.105099] PM: calling nb fw_pm_notify+0x0/0x150
[  116.109812] PM: ... nb fw_pm_notify+0x0/0x150 done
[  116.114623] PM: calling nb bsp_pm_callback+0x0/0x50
[  116.119504] PM: ... nb bsp_pm_callback+0x0/0x50 done

And then nothing, and it's hung. Apart from the TSC issues and the HPET 
switch, this looks the same to me as a successful resume:

[   95.499513] PM: resume of devices complete after 1240.115 msecs
[   96.368940] r8169 0000:07:00.0 eth0: link up
[   98.676455] PM: Finishing wakeup.
[   98.679765] Restarting tasks ... done.
[   98.683821] PM: calling nb rcu_pm_notify+0x0/0x60
[   98.688524] PM: ... nb rcu_pm_notify+0x0/0x60 done
[   98.692044] systemd[1]: Got notification message for unit systemd-journald.service
[   98.700897] PM: calling nb cpu_hotplug_pm_callback+0x0/0x50
[   98.706470] PM: ... nb cpu_hotplug_pm_callback+0x0/0x50 done
[   98.712132] PM: calling nb fw_pm_notify+0x0/0x150
[   98.716848] PM: ... nb fw_pm_notify+0x0/0x150 done
[   98.721644] PM: calling nb bsp_pm_callback+0x0/0x50
[   98.726536] PM: ... nb bsp_pm_callback+0x0/0x50 done

Full logs in http://gaast.net/~wilmer/.lkml/bad3.17-patched-megadebug.txt


Wilmer v/d Gaast.

* Re: Machine crashes right *after* ~successful resume
  2014-10-21 21:40                               ` Wilmer van der Gaast
@ 2014-10-21 23:15                                 ` Yinghai Lu
  2014-10-22 12:53                                   ` Wilmer van der Gaast
  0 siblings, 1 reply; 51+ messages in thread
From: Yinghai Lu @ 2014-10-21 23:15 UTC (permalink / raw)
  To: Wilmer van der Gaast
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
	linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1360 bytes --]

On Tue, Oct 21, 2014 at 2:40 PM, Wilmer van der Gaast <wilmer@gaast.net> wrote:
> Hello,
>
> Sorry for the delay, finally poked at this again. It looks like the
> no_console_suspend flag was causing troubles, which I didn't really need
> anyway with logging going to my serial port.
>
> This is what I get now on the failing resume:
>
> [  112.879390] PM: resume of devices complete after 2239.905 msecs
> [  112.880068] r8169 0000:07:00.0 eth0: link up
> [  112.880078] Switched to clocksource hpet
> [  116.069248] PM: Finishing wakeup.
> [  116.072574] Restarting tasks ... done.
> [  116.076664] PM: calling nb rcu_pm_notify+0x0/0x60
> [  116.081439] PM: ... nb rcu_pm_notify+0x0/0x60 done
> [  116.086267] PM: calling nb cpu_hotplug_pm_callback+0x0/0x50
> [  116.088526] systemd[1]: Got notification message for unit systemd-journald.service
> [  116.099442] PM: ... nb cpu_hotplug_pm_callback+0x0/0x50 done
> [  116.105099] PM: calling nb fw_pm_notify+0x0/0x150
> [  116.109812] PM: ... nb fw_pm_notify+0x0/0x150 done
> [  116.114623] PM: calling nb bsp_pm_callback+0x0/0x50
> [  116.119504] PM: ... nb bsp_pm_callback+0x0/0x50 done
>
> And then nothing, and it's hung. Looks the same to me (apart from the tsc
> issues + hpet switch) as a successful resume:

So it gets stuck in pm_restore_console()?

Please try the attached debug patch.

Thanks

Yinghai

[-- Attachment #2: debug_suspend_resume_y.patch --]
[-- Type: text/x-patch, Size: 1770 bytes --]

---
 kernel/power/console.c |    9 +++++++++
 1 file changed, 9 insertions(+)

Index: linux-2.6/kernel/power/console.c
===================================================================
--- linux-2.6.orig/kernel/power/console.c
+++ linux-2.6/kernel/power/console.c
@@ -51,6 +51,7 @@ void pm_vt_switch_required(struct device
 		if (tmp->dev == dev) {
 			/* already registered, update requirement */
 			tmp->required = required;
+			dev_info(dev, "pm_vt_switch_required() update %d\n", required);
 			goto out;
 		}
 	}
@@ -61,6 +62,7 @@ void pm_vt_switch_required(struct device
 
 	entry->required = required;
 	entry->dev = dev;
+	dev_info(dev, "pm_vt_switch_required() added %d\n", required);
 
 	list_add(&entry->head, &pm_vt_switch_list);
 out:
@@ -81,6 +83,7 @@ void pm_vt_switch_unregister(struct devi
 	mutex_lock(&vt_switch_mutex);
 	list_for_each_entry(tmp, &pm_vt_switch_list, head) {
 		if (tmp->dev == dev) {
+			dev_info(dev, "pm_vt_switch_unregister() removed %d\n", tmp->required);
 			list_del(&tmp->head);
 			kfree(tmp);
 			break;
@@ -131,11 +134,14 @@ int pm_prepare_console(void)
 	if (!pm_vt_switch())
 		return 0;
 
+	pr_info("pm_prepare_console() before move\n");
 	orig_fgconsole = vt_move_to_console(SUSPEND_CONSOLE, 1);
 	if (orig_fgconsole < 0)
 		return 1;
 
+	pr_info("pm_prepare_console() before redirect\n");
 	orig_kmsg = vt_kmsg_redirect(SUSPEND_CONSOLE);
+	pr_info("pm_prepare_console() done\n");
 	return 0;
 }
 
@@ -145,7 +151,10 @@ void pm_restore_console(void)
 		return;
 
 	if (orig_fgconsole >= 0) {
+		pr_info("pm_restore_console() before move\n");
 		vt_move_to_console(orig_fgconsole, 0);
+		pr_info("pm_restore_console() before redirect\n");
 		vt_kmsg_redirect(orig_kmsg);
+		pr_info("pm_restore_console() done\n");
 	}
 }

* Re: Machine crashes right *after* ~successful resume
  2014-10-21 23:15                                 ` Yinghai Lu
@ 2014-10-22 12:53                                   ` Wilmer van der Gaast
  2014-10-26 21:53                                     ` Yinghai Lu
  0 siblings, 1 reply; 51+ messages in thread
From: Wilmer van der Gaast @ 2014-10-22 12:53 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
	linux-kernel

Hello Yinghai,

This looks more promising!

Yinghai Lu (yinghai@kernel.org) wrote:
> >
> > And then nothing, and it's hung. Looks the same to me (apart from the tsc
> > issues + hpet switch) as a successful resume:
> 
> then it stuck in pm_restore_console()?
> 
That seems to be the case, yes:

[  106.661152] PM: ... nb fw_pm_notify+0x0/0x150 done
[  106.665939] PM: calling nb bsp_pm_callback+0x0/0x50
[  106.670814] PM: ... nb bsp_pm_callback+0x0/0x50 done
[  106.675775] pm_restore_console() before move

Then nothing, during the third resume.

http://gaast.net/~wilmer/.lkml/bad3.17-patched-console-restore.txt has
the full log.

(Some of your other debug lines in your patch don't seem to be logging
anything during my repro BTW.)


Wilmer v/d Gaast.

* Re: Machine crashes right *after* ~successful resume
  2014-10-22 12:53                                   ` Wilmer van der Gaast
@ 2014-10-26 21:53                                     ` Yinghai Lu
  2014-10-27 10:50                                       ` Wilmer van der Gaast
  0 siblings, 1 reply; 51+ messages in thread
From: Yinghai Lu @ 2014-10-26 21:53 UTC (permalink / raw)
  To: Wilmer van der Gaast
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
	linux-kernel

[-- Attachment #1: Type: text/plain, Size: 672 bytes --]

On Wed, Oct 22, 2014 at 5:53 AM, Wilmer van der Gaast <wilmer@gaast.net> wrote:
> That seems to be the case yes:
>
> [  106.661152] PM: ... nb fw_pm_notify+0x0/0x150 done
> [  106.665939] PM: calling nb bsp_pm_callback+0x0/0x50
> [  106.670814] PM: ... nb bsp_pm_callback+0x0/0x50 done
> [  106.675775] pm_restore_console() before move
>
> Then nothing, during the third resume.
>
> http://gaast.net/~wilmer/.lkml/bad3.17-patched-console-restore.txt has
> the full log.
>
> (Some of your other debug lines in your patch don't seem to be logging
> anything during my repro BTW.)

Please try the two attached debug patches to check the PCI registers
across suspend/resume.

[-- Attachment #2: debug_extra_dump_pci.patch --]
[-- Type: text/x-patch, Size: 1804 bytes --]

Subject: [PATCH] pci: print out about pci=dump

Print PCI config space for debugging, before a driver hangs later on.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/pci.c |   52 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 51 insertions(+), 1 deletion(-)

Index: linux-2.6/drivers/pci/pci.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -3858,6 +3858,54 @@ void __weak pci_fixup_cardbus(struct pci
 }
 EXPORT_SYMBOL(pci_fixup_cardbus);
 
+static void dump_pci_device_range(struct pci_dev *dev, unsigned start_reg,
+					 unsigned size)
+{
+	int i;
+	int j;
+	u32 val;
+	int end = start_reg + size;
+
+	printk(KERN_DEBUG "PCI: %s", pci_name(dev));
+
+	for (i = start_reg; i < end; i += 4) {
+		if (!(i & 0x0f))
+			printk("\n%04x:", i);
+
+		pci_read_config_dword(dev, i, &val);
+		for (j = 0; j < 4; j++) {
+			printk(" %02x", val & 0xff);
+			val >>= 8;
+		}
+	}
+	printk("\n");
+}
+
+static int dump_pci_devices(void)
+{
+	struct pci_dev *dev = NULL;
+
+	while ((dev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, dev)) != NULL)
+		dump_pci_device_range(dev, 0, dev->cfg_size);
+
+	return 0;
+}
+
+static int pci_dump_regs;
+static void pci_dump(void)
+{
+	pci_dump_regs = 1;
+}
+
+static int pci_init(void)
+{
+	if (pci_dump_regs)
+		dump_pci_devices();
+
+	return 0;
+}
+device_initcall(pci_init);
+
 static int __init pci_setup(char *str)
 {
 	while (str) {
@@ -3865,7 +3913,9 @@ static int __init pci_setup(char *str)
 		if (k)
 			*k++ = 0;
 		if (*str && (str = pcibios_setup(str)) && *str) {
-			if (!strcmp(str, "nomsi")) {
+			if (!strcmp(str, "dump")) {
+				pci_dump();
+			} else if (!strcmp(str, "nomsi")) {
 				pci_no_msi();
 			} else if (!strcmp(str, "noaer")) {
 				pci_no_aer();

[-- Attachment #3: debug_suspend_resume_z.patch --]
[-- Type: text/x-patch, Size: 1037 bytes --]

---
 drivers/pci/pci.c      |    2 +-
 kernel/power/suspend.c |    2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

Index: linux-2.6/drivers/pci/pci.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -4462,7 +4462,7 @@ static void dump_pci_device_range(struct
 	printk("\n");
 }
 
-static int dump_pci_devices(void)
+int dump_pci_devices(void)
 {
 	struct pci_dev *dev = NULL;
 
Index: linux-2.6/kernel/power/suspend.c
===================================================================
--- linux-2.6.orig/kernel/power/suspend.c
+++ linux-2.6/kernel/power/suspend.c
@@ -401,6 +401,7 @@ int suspend_devices_and_enter(suspend_st
 	goto Resume_devices;
 }
 
+int dump_pci_devices(void);
 /**
  * suspend_finish - Clean up before finishing the suspend sequence.
  *
@@ -411,6 +412,7 @@ static void suspend_finish(void)
 {
 	suspend_thaw_processes();
 	pm_notifier_call_chain(PM_POST_SUSPEND);
+	dump_pci_devices();
 	pm_restore_console();
 }
 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Machine crashes right *after* ~successful resume
  2014-10-26 21:53                                     ` Yinghai Lu
@ 2014-10-27 10:50                                       ` Wilmer van der Gaast
  2014-10-27 18:23                                         ` Yinghai Lu
  2014-10-27 21:21                                         ` Pavel Machek
  0 siblings, 2 replies; 51+ messages in thread
From: Wilmer van der Gaast @ 2014-10-27 10:50 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
	linux-kernel

Hello Yinghai,

Thanks again for your time!

I've applied your two patches, and as a wild guess also added pci=dump 
to my kernel cmdline, though I guess that just gave me a boot-time dump, 
which mostly didn't make it into my dmesg.

I accidentally booted with no_console_suspend on the first run, which 
still produced no output at all on the failed resume. I'm including that 
output anyway, but I also have a run with that flag removed, and 
annoyingly the crash appears to happen before the post-resume dump 
finishes - while dumping info for this device, it seems:

04:00.0 PCI bridge: Integrated Technology Express, Inc. Device 8892 (rev 
10) (prog-if 01 [Subtractive decode])

(More info in my lspci.txt)

Wondering what device that is exactly, I stumbled upon 
http://sourceforge.net/p/linux1394/mailman/message/29755048/ where 
someone describes it as a "cheap and crappy PCI bridge". More and more I 
wonder if I should just buy a new motherboard - sadly this one wasn't 
even that cheap. :-( Though I don't know whether the output stopping while 
dumping this device means that it is the culprit: is printk() to the 
serial console blocking or buffered in any way?

Anyway, dumps are in:

http://gaast.net/~wilmer/.lkml/bad3.17-pcidumps-no_console_suspend.txt
http://gaast.net/~wilmer/.lkml/bad3.17-pcidumps.txt


Cheers,

Wilmer van der Gaast.

-- 
+-------- .''`.     - -- ---+  +        - -- --- ---- ----- ------+
| wilmer : :'  :  gaast.net |  | OSS Programmer   www.bitlbee.org |
| lintux `. `~'  debian.org |  | Full-time geek  wilmer.gaast.net |
+--- -- -  ` ---------------+  +------ ----- ---- --- -- -        +


* Re: Machine crashes right *after* ~successful resume
  2014-10-27 10:50                                       ` Wilmer van der Gaast
@ 2014-10-27 18:23                                         ` Yinghai Lu
  2014-10-27 22:22                                           ` Wilmer van der Gaast
  2014-10-27 21:21                                         ` Pavel Machek
  1 sibling, 1 reply; 51+ messages in thread
From: Yinghai Lu @ 2014-10-27 18:23 UTC (permalink / raw)
  To: Wilmer van der Gaast
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
	linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1588 bytes --]

On Mon, Oct 27, 2014 at 3:50 AM, Wilmer van der Gaast <wilmer@gaast.net> wrote:

> http://gaast.net/~wilmer/.lkml/bad3.17-pcidumps.txt

[  252.028142] PCI: 0000:04:00.0
0000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0010: ff ff ff ff ff ff ff ff


04:00.0 PCI bridge: Integrated Technology Express, Inc. Device 8892
(rev 10) (prog-if 01 [Subtractive decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 4 bytes
Bus: primary=04, secondary=05, subordinate=05, sec-latency=32
I/O behind bridge: 0000d000-0000dfff
Memory behind bridge: fbc00000-fbcfffff
Secondary status: 66MHz+ FastB2B- ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort+ <SERR- <PERR-
BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr+ DiscTmrStat- DiscTmrSERREn-
Capabilities: [90] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=55mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [a0] Subsystem: Gigabyte Technology Co., Ltd Device 5000

under

00:1c.3 PCI bridge: Intel Corporation 82801 PCI Bridge (rev b5)
(prog-if 01 [Subtractive decode])

So that ITE will not work after suspend/resume?
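The all-0xff dump is the telling part: on typical x86 systems a config read that no device answers completes as all ones, so a header full of 0xff usually means the device (or the path to it) has stopped responding, not that its registers really hold 0xff. The kernel's own presence test looks for a vendor ID of 0xffff. A minimal sketch of both checks over a raw dump; `cfg_all_ones` and `vendor_id_absent` are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return 1 if every byte of a non-empty dump reads 0xff, which on
 * typical x86 systems indicates an unanswered config read rather than
 * genuinely all-ones registers.
 */
static int cfg_all_ones(const uint8_t *cfg, size_t len)
{
    assert(cfg != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        if (cfg[i] != 0xff)
            return 0;
    return len > 0;
}

/* Vendor ID lives at offset 0 (little endian); 0xffff means "no device". */
static int vendor_id_absent(const uint8_t *cfg)
{
    uint16_t vendor = (uint16_t)(cfg[0] | (cfg[1] << 8));

    return vendor == 0xffff;
}
```

Applied to the resume-time dump of 04:00.0 above, both checks fire; applied to its boot-time header (83 12 92 88 ...), neither does.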

Please apply the 4 attached patches and try removing the device with

echo 1 > /sys/bus/pci/devices/0000\:04\:00.0/remove
echo 1 > /sys/bus/pci/devices/0000\:00\:1c.3/pcie_link_disable

before the suspend/resume test.
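The same two writes can be issued from C, for example from a test harness that runs just before suspending. A minimal sketch, assuming root privileges and a kernel carrying the attached patches (the `pcie_link_disable` attribute only exists with them); `sysfs_write` is a hypothetical helper:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write a short string to a sysfs attribute, as "echo VALUE > PATH"
 * would. Returns 0 on success, -1 on failure. The store handler may
 * only run when the stdio buffer is flushed, so fclose() is checked.
 *
 * Usage (hypothetical harness):
 *   sysfs_write("/sys/bus/pci/devices/0000:04:00.0/remove", "1\n");
 *   sysfs_write("/sys/bus/pci/devices/0000:00:1c.3/pcie_link_disable", "1\n");
 */
static int sysfs_write(const char *path, const char *value)
{
    FILE *f;
    int ok;

    assert(path != NULL && value != NULL);
    f = fopen(path, "w");
    if (!f)
        return -1;
    ok = fputs(value, f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

The backslashes in the shell commands only escape the colons; the C paths need none.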

Thanks

Yinghai

[-- Attachment #2: move_pcie_link_disable_1.patch --]
[-- Type: text/x-patch, Size: 2628 bytes --]

Subject: [PATCH] PCI: Add generic pcie_link_disable

Remove the unneeded return value check that Linus pointed out before.

It will be used from /sys/.../pcie/link_disable.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/Makefile    |    2 +-
 drivers/pci/pcie-link.c |   42 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/pci.h     |    2 ++
 3 files changed, 45 insertions(+), 1 deletion(-)

Index: linux-2.6/drivers/pci/pcie-link.c
===================================================================
--- /dev/null
+++ linux-2.6/drivers/pci/pcie-link.c
@@ -0,0 +1,42 @@
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/errno.h>
+#include <linux/jiffies.h>
+#include <linux/delay.h>
+
+int pcie_link_disable_get(struct pci_dev *dev)
+{
+	u16 lnk_ctrl;
+	if (!pci_is_pcie(dev))
+		return 0;
+
+	pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnk_ctrl);
+
+	return !!(lnk_ctrl & PCI_EXP_LNKCTL_LD);
+}
+
+void pcie_link_disable_set(struct pci_dev *dev, int bit)
+{
+	u16 lnk_ctrl, old_lnk_ctrl;
+
+	if (!pci_is_pcie(dev))
+		return;
+
+	pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnk_ctrl);
+	old_lnk_ctrl = lnk_ctrl;
+
+	if (!bit)
+		lnk_ctrl &= ~PCI_EXP_LNKCTL_LD;
+	else
+		lnk_ctrl |= PCI_EXP_LNKCTL_LD;
+
+	if (old_lnk_ctrl == lnk_ctrl)
+		return;
+
+	pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnk_ctrl);
+
+	dev_printk(KERN_DEBUG, &dev->dev, "%s: lnk_ctrl = %x\n", __func__,
+			 lnk_ctrl);
+}
+EXPORT_SYMBOL(pcie_link_disable_set);
Index: linux-2.6/include/linux/pci.h
===================================================================
--- linux-2.6.orig/include/linux/pci.h
+++ linux-2.6/include/linux/pci.h
@@ -842,6 +842,8 @@ struct pci_bus *pci_scan_root_bus(struct
 struct pci_bus *pci_add_new_bus(struct pci_bus *parent, struct pci_dev *dev,
 				int busnr);
 void pcie_update_link_speed(struct pci_bus *bus, u16 link_status);
+void pcie_link_disable_set(struct pci_dev *dev, int bit);
+int pcie_link_disable_get(struct pci_dev *dev);
 struct pci_slot *pci_create_slot(struct pci_bus *parent, int slot_nr,
 				 const char *name,
 				 struct hotplug_slot *hotplug);
Index: linux-2.6/drivers/pci/Makefile
===================================================================
--- linux-2.6.orig/drivers/pci/Makefile
+++ linux-2.6/drivers/pci/Makefile
@@ -4,7 +4,7 @@
 
 obj-y		+= access.o bus.o probe.o host-bridge.o remove.o pci.o \
 			pci-driver.o search.o pci-sysfs.o rom.o setup-res.o \
-			irq.o vpd.o setup-bus.o vc.o
+			irq.o vpd.o setup-bus.o pcie-link.o vc.o
 obj-$(CONFIG_PROC_FS) += proc.o
 obj-$(CONFIG_SYSFS) += slot.o
 

[-- Attachment #3: move_pcie_link_disable_2.patch --]
[-- Type: text/x-patch, Size: 1947 bytes --]

Subject: [PATCH] PCI, pciehp: Use generic pcie_link_disable

Also remove the old version with its unneeded return check.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/hotplug/pciehp_hpc.c |   30 +++---------------------------
 1 file changed, 3 insertions(+), 27 deletions(-)

Index: linux-2.6/drivers/pci/hotplug/pciehp_hpc.c
===================================================================
--- linux-2.6.orig/drivers/pci/hotplug/pciehp_hpc.c
+++ linux-2.6/drivers/pci/hotplug/pciehp_hpc.c
@@ -305,28 +305,6 @@ int pciehp_check_link_status(struct cont
 	return 0;
 }
 
-static int __pciehp_link_set(struct controller *ctrl, bool enable)
-{
-	struct pci_dev *pdev = ctrl_dev(ctrl);
-	u16 lnk_ctrl;
-
-	pcie_capability_read_word(pdev, PCI_EXP_LNKCTL, &lnk_ctrl);
-
-	if (enable)
-		lnk_ctrl &= ~PCI_EXP_LNKCTL_LD;
-	else
-		lnk_ctrl |= PCI_EXP_LNKCTL_LD;
-
-	pcie_capability_write_word(pdev, PCI_EXP_LNKCTL, lnk_ctrl);
-	ctrl_dbg(ctrl, "%s: lnk_ctrl = %x\n", __func__, lnk_ctrl);
-	return 0;
-}
-
-static int pciehp_link_enable(struct controller *ctrl)
-{
-	return __pciehp_link_set(ctrl, true);
-}
-
 void pciehp_get_attention_status(struct slot *slot, u8 *status)
 {
 	struct controller *ctrl = slot->ctrl;
@@ -473,7 +451,6 @@ int pciehp_power_on_slot(struct slot * s
 	struct controller *ctrl = slot->ctrl;
 	struct pci_dev *pdev = ctrl_dev(ctrl);
 	u16 slot_status;
-	int retval;
 
 	/* Clear sticky power-fault bit from previous power failures */
 	pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slot_status);
@@ -487,11 +464,10 @@ int pciehp_power_on_slot(struct slot * s
 		 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL,
 		 PCI_EXP_SLTCTL_PWR_ON);
 
-	retval = pciehp_link_enable(ctrl);
-	if (retval)
-		ctrl_err(ctrl, "%s: Can not enable the link!\n", __func__);
+	/* Enable the link */
+	pcie_link_disable_set(ctrl->pcie->port, 0);
 
-	return retval;
+	return 0;
 }
 
 void pciehp_power_off_slot(struct slot * slot)

[-- Attachment #4: pci_express_link.patch --]
[-- Type: text/x-patch, Size: 2574 bytes --]

Subject: [PATCH] PCI, sysfs: Add pcie attrs for pcie device under pci dev dir.

The group will hold link_disable and link_retrain.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/Makefile     |    2 +-
 drivers/pci/pci-sysfs.c  |    1 +
 drivers/pci/pci.h        |    1 +
 drivers/pci/pcie-sysfs.c |   23 +++++++++++++++++++++++
 4 files changed, 26 insertions(+), 1 deletion(-)

Index: linux-2.6/drivers/pci/pci-sysfs.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-sysfs.c
+++ linux-2.6/drivers/pci/pci-sysfs.c
@@ -1608,6 +1608,7 @@ static struct attribute_group pci_dev_br
 static const struct attribute_group *pci_dev_attr_groups[] = {
 	&pci_dev_attr_group,
 	&pci_dev_bridge_attr_group,
+	&pci_dev_pcie_attr_group,
 	&pci_dev_hp_attr_group,
 #ifdef CONFIG_PCI_IOV
 	&sriov_dev_attr_group,
Index: linux-2.6/drivers/pci/pcie-sysfs.c
===================================================================
--- /dev/null
+++ linux-2.6/drivers/pci/pcie-sysfs.c
@@ -0,0 +1,23 @@
+#include <linux/kernel.h>
+#include <linux/pci.h>
+
+static struct attribute *pci_dev_pcie_dev_attrs[] = {
+	NULL,
+};
+
+static umode_t pci_dev_pcie_attrs_are_visible(struct kobject *kobj,
+						struct attribute *a, int n)
+{
+	struct device *dev = container_of(kobj, struct device, kobj);
+	struct pci_dev *pdev = to_pci_dev(dev);
+
+	if (!pci_is_pcie(pdev))
+		return 0;
+
+	return a->mode;
+}
+
+struct attribute_group pci_dev_pcie_attr_group = {
+	.is_visible = pci_dev_pcie_attrs_are_visible,
+	.attrs	    = pci_dev_pcie_dev_attrs,
+};
Index: linux-2.6/drivers/pci/pci.h
===================================================================
--- linux-2.6.orig/drivers/pci/pci.h
+++ linux-2.6/drivers/pci/pci.h
@@ -152,6 +152,7 @@ static inline int pci_no_d1d2(struct pci
 extern const struct attribute_group *pci_dev_groups[];
 extern const struct attribute_group *pcibus_groups[];
 extern struct device_type pci_dev_type;
+extern struct attribute_group pci_dev_pcie_attr_group;
 extern const struct attribute_group *pci_bus_groups[];
 
 
Index: linux-2.6/drivers/pci/Makefile
===================================================================
--- linux-2.6.orig/drivers/pci/Makefile
+++ linux-2.6/drivers/pci/Makefile
@@ -4,7 +4,7 @@
 
 obj-y		+= access.o bus.o probe.o host-bridge.o remove.o pci.o \
 			pci-driver.o search.o pci-sysfs.o rom.o setup-res.o \
-			irq.o vpd.o setup-bus.o pcie-link.o vc.o
+			irq.o vpd.o setup-bus.o pcie-link.o pcie-sysfs.o vc.o
 obj-$(CONFIG_PROC_FS) += proc.o
 obj-$(CONFIG_SYSFS) += slot.o
 

[-- Attachment #5: pci_express_link_disable.patch --]
[-- Type: text/x-patch, Size: 1930 bytes --]

Subject: [PATCH] PCI: Add link_disable in /sysfs for pcie device

PCIe cards from one vendor were found not to respond to a scan from the
bridge if we change the bus number setting in the bridge device.

A link disable/enable on the PCIe root port is needed to recover them.

So expose the Link Disable bit of the PCIe Link Control register. We can use
 echo 1 > /sys/..../link_disable
 echo 0 > /sys/..../link_disable
to make the PCIe device respond to scans again.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/pcie-sysfs.c |   33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

Index: linux-2.6/drivers/pci/pcie-sysfs.c
===================================================================
--- linux-2.6.orig/drivers/pci/pcie-sysfs.c
+++ linux-2.6/drivers/pci/pcie-sysfs.c
@@ -1,7 +1,35 @@
 #include <linux/kernel.h>
 #include <linux/pci.h>
 
+static ssize_t
+pcie_link_disable_show(struct device *dev, struct device_attribute *attr,
+			char *buf)
+{
+	struct pci_dev *pdev = to_pci_dev(dev);
+
+	return sprintf(buf, "%d\n", pcie_link_disable_get(pdev));
+}
+static ssize_t
+pcie_link_disable_store(struct device *dev, struct device_attribute *attr,
+			const char *buf, size_t count)
+{
+	struct pci_dev *pdev = to_pci_dev(dev);
+	unsigned long val;
+
+	if (kstrtoul(buf, 0, &val) < 0)
+		return -EINVAL;
+
+	pcie_link_disable_set(pdev, val);
+
+	return count;
+}
+
+static struct device_attribute pcie_link_disable_attr =
+		__ATTR(pcie_link_disable, 0644,
+		       pcie_link_disable_show, pcie_link_disable_store);
+
 static struct attribute *pci_dev_pcie_dev_attrs[] = {
+	&pcie_link_disable_attr.attr,
 	NULL,
 };
 
@@ -14,6 +42,11 @@ static umode_t pci_dev_pcie_attrs_are_vi
 	if (!pci_is_pcie(pdev))
 		return 0;
 
+	if (a == &pcie_link_disable_attr.attr)
+		if ((pci_pcie_type(pdev) != PCI_EXP_TYPE_ROOT_PORT) &&
+		    (pci_pcie_type(pdev) != PCI_EXP_TYPE_DOWNSTREAM))
+			return 0;
+
 	return a->mode;
 }
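With this check the attribute only appears on ports that own a downstream link: Root Ports (type 0x4) and switch Downstream Ports (type 0x6), as reported in the Device/Port Type field (bits 7:4) of the PCI Express Capabilities register. A userspace sketch of the same predicate; the enum names and `link_disable_mode` are hypothetical stand-ins for PCI_EXP_TYPE_* and the is_visible callback:

```c
#include <assert.h>

/* Device/Port Type values (PCI Express Capabilities register, bits 7:4). */
enum pcie_port_type {
    TYPE_ENDPOINT   = 0x0, /* PCI_EXP_TYPE_ENDPOINT */
    TYPE_ROOT_PORT  = 0x4, /* PCI_EXP_TYPE_ROOT_PORT */
    TYPE_UPSTREAM   = 0x5, /* PCI_EXP_TYPE_UPSTREAM */
    TYPE_DOWNSTREAM = 0x6, /* PCI_EXP_TYPE_DOWNSTREAM */
};

/*
 * File mode to expose for pcie_link_disable; 0 hides the file.
 * Non-PCIe devices and ports without a downstream link get nothing.
 */
static unsigned link_disable_mode(int is_pcie, int type, unsigned mode)
{
    if (!is_pcie)
        return 0;
    if (type != TYPE_ROOT_PORT && type != TYPE_DOWNSTREAM)
        return 0;
    return mode;
}
```

This is why the test instructions write to pcie_link_disable under the port 00:1c.3 rather than under the ITE bridge itself.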
 


* Re: Machine crashes right *after* ~successful resume
  2014-10-27 10:50                                       ` Wilmer van der Gaast
  2014-10-27 18:23                                         ` Yinghai Lu
@ 2014-10-27 21:21                                         ` Pavel Machek
  1 sibling, 0 replies; 51+ messages in thread
From: Pavel Machek @ 2014-10-27 21:21 UTC (permalink / raw)
  To: Wilmer van der Gaast
  Cc: Yinghai Lu, Bjorn Helgaas, Rafael J. Wysocki, Rafael Wysocki,
	linux-kernel

On Mon 2014-10-27 10:50:04, Wilmer van der Gaast wrote:
> Hello Yinghai,
> 
> Thanks again for your time!
> 
> I've applied your two patches, and as a wild guess also added pci=dump to my
> kernel cmdline though I guess that just gave me a boot-time dump - which
> mostly didn't make it into my dmesg.
> 
> I accidentally booted with no_console_suspend on the first run, which still
> caused no output at all on the failed resume. I'm including the output of
> that anyway, but also I have a run with that flag removed, and annoyingly
> the crash appears to happen before the dump during the crash finishes -
> while dumping info for this device, it seems:
> 
> 04:00.0 PCI bridge: Integrated Technology Express, Inc. Device 8892 (rev 10)
> (prog-if 01 [Subtractive decode])
> 
> (More info in my lspci.txt)
> 
> Wondering what device that is exactly, I stumbled upon
> http://sourceforge.net/p/linux1394/mailman/message/29755048/ where someone
> describes it as a "cheap and crappy PCI bridge". More and more I wonder if I
> should just buy a new motherboard - sadly this one wasn't even that
> cheap.

It is probably not just you that is affected, and we already know what
change broke it. So we really should fix it.

> :-( Though I don't know if the output stopping while dumping output for this
> device means that it is the culprit, is printk() to the serial console in
> any way blocking/buffered?

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: Machine crashes right *after* ~successful resume
  2014-10-27 18:23                                         ` Yinghai Lu
@ 2014-10-27 22:22                                           ` Wilmer van der Gaast
  2014-10-27 23:41                                             ` Yinghai Lu
  0 siblings, 1 reply; 51+ messages in thread
From: Wilmer van der Gaast @ 2014-10-27 22:22 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
	linux-kernel

Hello,

On 27-10-14 18:23, Yinghai Lu wrote:
>
> 04:00.0 PCI bridge: Integrated Technology Express, Inc. Device 8892
>
> So that ITE will not work after suspend/resume?
>
Even after the first one already, you mean?

Honestly, I don't really know what its purpose is, and it doesn't have 
any child nodes in the PCI tree from what I can tell. Possibly because I 
don't have any PCI cards in the machine, just a PCIe video card - 
assuming this is a PCI bridge taking care of legacy PCI plugin cards?

> Please apply 4 attached patches and try to remove the device like
>
> echo 1 > /sys/bus/pci/devices/0000\:04\:00.0/remove
> echo 1 > /sys/bus/pci/devices/0000\:00\:1c.3/pcie_link_disable
>
> before suspend/resume test.
>
That worked! Resumed properly now.

Full log is at http://gaast.net/~wilmer/.lkml/good3.17.txt, including 
the PCI dump at boot time, where that device doesn't dump just ff's.


Wilmer van der Gaast.



* Re: Machine crashes right *after* ~successful resume
  2014-10-27 22:22                                           ` Wilmer van der Gaast
@ 2014-10-27 23:41                                             ` Yinghai Lu
  2014-10-28  0:03                                               ` Wilmer van der Gaast
  0 siblings, 1 reply; 51+ messages in thread
From: Yinghai Lu @ 2014-10-27 23:41 UTC (permalink / raw)
  To: Wilmer van der Gaast
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
	linux-kernel

On Mon, Oct 27, 2014 at 3:22 PM, Wilmer van der Gaast <wilmer@gaast.net> wrote:
> Hello,
>
> On 27-10-14 18:23, Yinghai Lu wrote:
>>
>>
>> 04:00.0 PCI bridge: Integrated Technology Express, Inc. Device 8892
>>
>> So that ITE will not work after suspend/resume?
>>
> Even after the first one already, you mean?

Yes.

>
> Honestly, I don't really know what its purpose is, and it doesn't have any
> child nodes in the PCI tree from what I can tell. Possibly because I don't
> have any PCI cards in the machine, just a PCIe video card - assuming this is
> a PCI bridge taking care of legacy PCI plugin cards?
>
>> Please apply 4 attached patches and try to remove the device like
>>
>> echo 1 > /sys/bus/pci/devices/0000\:04\:00.0/remove
>> echo 1 > /sys/bus/pci/devices/0000\:00\:1c.3/pcie_link_disable
>>
>> before suspend/resume test.
>>
> That worked! Resumed properly now.
>
> Full log in http://gaast.net/~wilmer/.lkml/good3.17.txt . Including the PCI
> dump at boot time, where that device doesn't dump just ff's.

Can you apply only the patch that reverts enable bridge early, plus the
two pci dump patches, to see if the 04:00.0 readout is 0xff?

Thanks

Yinghai


* Re: Machine crashes right *after* ~successful resume
  2014-10-27 23:41                                             ` Yinghai Lu
@ 2014-10-28  0:03                                               ` Wilmer van der Gaast
  2014-10-28  1:12                                                 ` Yinghai Lu
  0 siblings, 1 reply; 51+ messages in thread
From: Wilmer van der Gaast @ 2014-10-28  0:03 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
	linux-kernel

On 27-10-14 23:41, Yinghai Lu wrote:
>
> Can you only apply the patch that revert enable bridge early and
> two pci dump patches to see if 04:00.0 readout is 0xff?
>
I was curious about that already and tried it with a 3.16.6 that I think 
had only your revert applied (using lspci -xxxx to get the dump, which I 
assumed would be the same): no changes to 04:00 at all.

Confirmed that this is the case with 3.17 + those patches as well, it's 
showing this at all times:

[  130.000122] PCI: 0000:04:00.0
0000: 83 12 92 88 07 00 10 00 10 01 04 06 01 00 01 00
0010: 00 00 00 00 00 00 00 00 04 05 05 20 d1 d1 20 22
0020: c0 fb c0 fb f1 ff 01 00 00 00 00 00 00 00 00 00
0030: 00 00 00 00 90 00 00 00 00 00 00 00 0b 01 00 02
0040: 0c 31 00 00 08 06 00 00 00 00 00 00 ff 00 00 00
0050: 72 ab b9 6d 00 00 00 00 20 c9 8e 00 00 00 00 00
0060: 00 00 00 00 aa 0d 00 10 00 44 00 00 00 00 00 80
0070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0090: 01 a0 42 fe 00 00 00 00 00 00 00 00 00 00 00 00
00a0: 0d 00 00 00 58 14 00 50 00 00 00 00 00 00 00 00
00b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00f0: 00 00 00 00 00 1f 00 00 00 00 00 00 00 00 00 00
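This healthy dump can be cross-checked against the lspci output quoted earlier in the thread: offsets 0x00/0x02 hold the vendor and device IDs (little endian; 0x1283 is PCI_VENDOR_ID_ITE), 0x08-0x0b the revision and class code, 0x0e the header type, 0x18-0x1a the primary/secondary/subordinate bus numbers of this type 1 (bridge) header, and 0x34 the capabilities pointer. A small sketch decoding those fields from the first 0x40 bytes shown above; `struct type1_hdr` and `decode_type1` are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

struct type1_hdr {
    uint16_t vendor, device;
    uint8_t revision, prog_if, subclass, class_code, header_type;
    uint8_t primary_bus, secondary_bus, subordinate_bus, cap_ptr;
};

/* Read a little-endian 16-bit field from config space. */
static uint16_t rd16(const uint8_t *cfg, int off)
{
    return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

/* Decode selected fields of a PCI-to-PCI bridge (type 1) header. */
static struct type1_hdr decode_type1(const uint8_t cfg[64])
{
    struct type1_hdr h;

    h.vendor          = rd16(cfg, 0x00);
    h.device          = rd16(cfg, 0x02);
    h.revision        = cfg[0x08];
    h.prog_if         = cfg[0x09];
    h.subclass        = cfg[0x0a];
    h.class_code      = cfg[0x0b];
    h.header_type     = cfg[0x0e] & 0x7f; /* bit 7 is the multi-function flag */
    h.primary_bus     = cfg[0x18];
    h.secondary_bus   = cfg[0x19];
    h.subordinate_bus = cfg[0x1a];
    h.cap_ptr         = cfg[0x34];
    return h;
}
```

The decoded values (class 06/04, prog-if 01, buses 04/05/05, capabilities at 0x90) match "PCI bridge ... Subtractive decode", "primary=04, secondary=05, subordinate=05" and "Capabilities: [90]" in the lspci listing.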


Wilmer v/d Gaast.



* Re: Machine crashes right *after* ~successful resume
  2014-10-28  0:03                                               ` Wilmer van der Gaast
@ 2014-10-28  1:12                                                 ` Yinghai Lu
  2014-10-28  4:03                                                   ` Yinghai Lu
  2014-10-28 23:34                                                   ` Wilmer van der Gaast
  0 siblings, 2 replies; 51+ messages in thread
From: Yinghai Lu @ 2014-10-28  1:12 UTC (permalink / raw)
  To: Wilmer van der Gaast
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
	linux-kernel

On Mon, Oct 27, 2014 at 5:03 PM, Wilmer van der Gaast <wilmer@gaast.net> wrote:
> I was curious about that already, did that with a 3.16.6 that I think just
> had your revert applied (and using lspci -xxxx to get the dump which I
> assumed would be the same): No changes to 04:00 at all.
>
> Confirmed that this is the case with 3.17 + those patches as well, it's
> showing this at all times:

Can you post the output of
lspci -vvxxxx -s 00:1c.3
lspci -vvxxxx -s 04:00.0
before reverting the enable bridge early patch
and after reverting it, on 3.17+?

Thanks

Yinghai


* Re: Machine crashes right *after* ~successful resume
  2014-10-28  1:12                                                 ` Yinghai Lu
@ 2014-10-28  4:03                                                   ` Yinghai Lu
  2014-10-28 10:23                                                     ` Wilmer van der Gaast
  2014-10-28 23:34                                                   ` Wilmer van der Gaast
  1 sibling, 1 reply; 51+ messages in thread
From: Yinghai Lu @ 2014-10-28  4:03 UTC (permalink / raw)
  To: Wilmer van der Gaast
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
	linux-kernel

[-- Attachment #1: Type: text/plain, Size: 707 bytes --]

On Mon, Oct 27, 2014 at 6:12 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Mon, Oct 27, 2014 at 5:03 PM, Wilmer van der Gaast <wilmer@gaast.net> wrote:
>> I was curious about that already, did that with a 3.16.6 that I think just
>> had your revert applied (and using lspci -xxxx to get the dump which I
>> assumed would be the same): No changes to 04:00 at all.
>>
>> Confirmed that this is the case with 3.17 + those patches as well, it's
>> showing this at all times:
>
> can you post
> lspci -vvxxxx -s 00:1c.3
> lspci -vvxxxx -s 04:00.0
> before reverting enable bridge early patch
> and after reverting on 3.17+?

Please check whether the attached patch fixes the problem on your setup.

Thanks

Yinghai

[-- Attachment #2: pci_set_bridge_d0.patch --]
[-- Type: text/x-patch, Size: 793 bytes --]

---
 drivers/pci/quirks.c |    6 ++++++
 1 file changed, 6 insertions(+)

Index: linux-2.6/drivers/pci/quirks.c
===================================================================
--- linux-2.6.orig/drivers/pci/quirks.c
+++ linux-2.6/drivers/pci/quirks.c
@@ -3098,6 +3098,12 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_IN
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x8c02, quirk_remove_d3_delay);
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x8c22, quirk_remove_d3_delay);
 
+static void enable_pci_bridge_d0(struct pci_dev *dev)
+{
+	pci_set_power_state(dev, PCI_D0);
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ITE, 0x8892, enable_pci_bridge_d0);
+
 /*
  * Some devices may pass our check in pci_intx_mask_supported if
  * PCI_COMMAND_INTX_DISABLE works though they actually do not properly


* Re: Machine crashes right *after* ~successful resume
  2014-10-28  4:03                                                   ` Yinghai Lu
@ 2014-10-28 10:23                                                     ` Wilmer van der Gaast
  0 siblings, 0 replies; 51+ messages in thread
From: Wilmer van der Gaast @ 2014-10-28 10:23 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
	linux-kernel

On 28-10-14 04:03, Yinghai Lu wrote:
>
> Please check if attached patch could fix the problem on your setup.
>
Sadly it looks like it did not. :-( Applied your patch on a vanilla 3.17 
tree, still seeing the same crash.

I'll get more debugging output and the output you asked for in your 
previous e-mail tonight, need to go to work now.


Cheers,

Wilmer v/d Gaast.



* Re: Machine crashes right *after* ~successful resume
  2014-10-28  1:12                                                 ` Yinghai Lu
  2014-10-28  4:03                                                   ` Yinghai Lu
@ 2014-10-28 23:34                                                   ` Wilmer van der Gaast
  2014-10-29  5:17                                                     ` Yinghai Lu
  1 sibling, 1 reply; 51+ messages in thread
From: Wilmer van der Gaast @ 2014-10-28 23:34 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
	linux-kernel

Hello,

On 28-10-14 01:12, Yinghai Lu wrote:
> lspci -vvxxxx -s 00:1c.3
> lspci -vvxxxx -s 04:00.0
> before reverting enable bridge early patch

http://gaast.net/~wilmer/.lkml/lspcixx-nopatch.txt (So that's 3.17 + 
your revert patch)

> and after reverting on 3.17+?
>
http://gaast.net/~wilmer/.lkml/lspcixx-patched.txt (plain 3.17)

I've run the commands twice, once before and once after a single 
suspend+resume cycle. There is a small difference, and only before that cycle:

ruby:~/crashit# diff -u lspcixx-*
--- lspcixx-nopatch.txt 2014-10-28 23:26:08.679690828 +0000
+++ lspcixx-patched.txt 2014-10-28 23:10:05.391896757 +0000
@@ -92,10 +92,10 @@
  2f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  310: 00 00 00 00 1b 36 3a 74 00 00 14 14 31 17 42 00
-320: 5b 60 09 00 00 20 00 0a 08 10 b8 04 07 00 d5 0c
+320: 5b 60 09 00 00 20 00 0a 90 10 b8 04 8f 00 5d 0d
  330: 16 00 00 28 bc b5 bc 4a 00 00 00 00 74 4c 85 00
-340: 33 03 33 00 64 03 3f 00 30 00 0c 00 f8 07 d5 00
-350: fe 07 dd 00 01 00 08 00 00 00 00 00 00 00 00 00
+340: 33 03 33 00 64 03 3f 00 30 00 0c 00 f8 07 5d 00
+350: fe 07 65 00 01 00 08 00 00 00 00 00 00 00 00 00
  360: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  370: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

(Diff is in the Intel device, not the ITE one.)


Wilmer v/d Gaast.



* Re: Machine crashes right *after* ~successful resume
  2014-10-28 23:34                                                   ` Wilmer van der Gaast
@ 2014-10-29  5:17                                                     ` Yinghai Lu
  2014-10-29  9:37                                                       ` Wilmer van der Gaast
  0 siblings, 1 reply; 51+ messages in thread
From: Yinghai Lu @ 2014-10-29  5:17 UTC (permalink / raw)
  To: Wilmer van der Gaast
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
	linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1271 bytes --]

On Tue, Oct 28, 2014 at 4:34 PM, Wilmer van der Gaast <wilmer@gaast.net> wrote:
>
> I've run the commands twice, once before and once after a single
> suspend+resume cycle. Small difference and only before that cycle:
>
> ruby:~/crashit# diff -u lspcixx-*
> --- lspcixx-nopatch.txt 2014-10-28 23:26:08.679690828 +0000
> +++ lspcixx-patched.txt 2014-10-28 23:10:05.391896757 +0000
> @@ -92,10 +92,10 @@
>  2f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>  300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>  310: 00 00 00 00 1b 36 3a 74 00 00 14 14 31 17 42 00
> -320: 5b 60 09 00 00 20 00 0a 08 10 b8 04 07 00 d5 0c
> +320: 5b 60 09 00 00 20 00 0a 90 10 b8 04 8f 00 5d 0d
>  330: 16 00 00 28 bc b5 bc 4a 00 00 00 00 74 4c 85 00
> -340: 33 03 33 00 64 03 3f 00 30 00 0c 00 f8 07 d5 00
> -350: fe 07 dd 00 01 00 08 00 00 00 00 00 00 00 00 00
> +340: 33 03 33 00 64 03 3f 00 30 00 0c 00 f8 07 5d 00
> +350: fe 07 65 00 01 00 08 00 00 00 00 00 00 00 00 00
>  360: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>  370: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>  380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> (Diff is in the Intel device, not the ITE one.)
>

That is strange.

Anyway, please try the attached patch on top of 3.17.

Thanks

Yinghai

[-- Attachment #2: debug_suspend_resume_z_xx.patch --]
[-- Type: text/x-patch, Size: 511 bytes --]

---
 drivers/pci/pci.c |    2 ++
 1 file changed, 2 insertions(+)

Index: linux-2.6/drivers/pci/pci.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -1265,6 +1265,8 @@ static void pci_enable_bridge(struct pci
 	pci_set_master(dev);
 }
 
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ITE, 0x8892, pci_enable_bridge);
+
 static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
 {
 	struct pci_dev *bridge;
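The fixup that finally worked reuses pci_enable_bridge(). As far as we can tell from the 3.17 source (the hunk only shows its tail, pci_set_master()), it first enables the upstream bridge recursively, then enables the device itself and turns on bus mastering, doing only the bus-master part if the device is already enabled. A userspace model of that ordering; `struct node`, `enable_bridge` and the log are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

struct node {
    const char *name;
    struct node *upstream; /* parent bridge, NULL at the top */
    int enabled, busmaster;
};

static const char *enable_log[8]; /* order in which devices got enabled */
static int enable_count;

/* Model of the recursion: upstream bridges first, then this device. */
static void enable_bridge(struct node *n)
{
    if (n->upstream)
        enable_bridge(n->upstream);

    if (n->enabled) {       /* already enabled: only ensure bus mastering */
        n->busmaster = 1;
        return;
    }
    n->enabled = 1;         /* stands in for pci_enable_device() */
    if (enable_count < 8)
        enable_log[enable_count++] = n->name;
    n->busmaster = 1;       /* stands in for pci_set_master() */
}
```

Modeling 04:00.0 as sitting below 00:1c.3, enabling the ITE bridge enables the root port first, and a second call changes nothing.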


* Re: Machine crashes right *after* ~successful resume
  2014-10-29  5:17                                                     ` Yinghai Lu
@ 2014-10-29  9:37                                                       ` Wilmer van der Gaast
  2014-10-30  0:53                                                         ` Yinghai Lu
  0 siblings, 1 reply; 51+ messages in thread
From: Wilmer van der Gaast @ 2014-10-29  9:37 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
	linux-kernel

Hello,

On 29-10-14 05:17, Yinghai Lu wrote:
>> (Diff is in the Intel device, not the ITE one.)
> That is strange.
>
I did wonder later why I was no longer seeing the ff* dump after the
resume.

> Anyway, please try the attached patch on top of 3.17.
>
Done, and that did work! Four suspend+resume cycles later and it's still 
stable.


Wilmer v/d Gaast.

-- 
+-------- .''`.     - -- ---+  +        - -- --- ---- ----- ------+
| wilmer : :'  :  gaast.net |  | OSS Programmer   www.bitlbee.org |
| lintux `. `~'  debian.org |  | Full-time geek  wilmer.gaast.net |
+--- -- -  ` ---------------+  +------ ----- ---- --- -- -        +


* Re: Machine crashes right *after* ~successful resume
  2014-10-29  9:37                                                       ` Wilmer van der Gaast
@ 2014-10-30  0:53                                                         ` Yinghai Lu
  2014-10-30 10:36                                                           ` Wilmer van der Gaast
  0 siblings, 1 reply; 51+ messages in thread
From: Yinghai Lu @ 2014-10-30  0:53 UTC (permalink / raw)
  To: Wilmer van der Gaast
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
	linux-kernel

[-- Attachment #1: Type: text/plain, Size: 268 bytes --]

On Wed, Oct 29, 2014 at 2:37 AM, Wilmer van der Gaast <wilmer@gaast.net> wrote:
>
>> Anyway, please try the attached patch on top of 3.17.
>>
> Done, and that did work! Four suspend+resume cycles later and it's still
> stable.

Then can you test the attached simplified one?
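For readers following along: the simplified patch reads the Power Management Control/Status register (PMCSR) so that `dev->current_state` matches what the hardware reports before `pci_set_power_state(dev, PCI_D0)` is called. A rough userspace sketch of that decode is below. It assumes the standard PCI PM layout in which bits 1:0 of PMCSR hold the power state (the kernel's `PCI_PM_CTRL_STATE_MASK`, 0x0003); `pmcsr_state_name()` is a hypothetical helper, not a kernel function.

```c
#include <stdint.h>

/* Bits 1:0 of PMCSR hold the power state; kernel name PCI_PM_CTRL_STATE_MASK. */
#define PM_CTRL_STATE_MASK 0x0003

/* Hypothetical helper: name the power state encoded in a PMCSR value. */
static const char *pmcsr_state_name(uint16_t pmcsr)
{
	static const char *const names[] = { "D0", "D1", "D2", "D3hot" };

	/* Other PMCSR bits (PME_En, PME_Status, ...) are masked off. */
	return names[pmcsr & PM_CTRL_STATE_MASK];
}
```

For example, a PMCSR value of 0x0103 (PME_En set, state bits 11) would decode as D3hot, while 0x0100 decodes as D0.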

[-- Attachment #2: debug_suspend_resume_z_yy.patch --]
[-- Type: text/x-patch, Size: 835 bytes --]

---
 drivers/pci/pci.c |   13 +++++++++++++
 1 file changed, 13 insertions(+)

Index: linux-2.6/drivers/pci/pci.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -1265,6 +1265,19 @@ static void pci_enable_bridge(struct pci
 	pci_set_master(dev);
 }
 
+static void ite_set_d0(struct pci_dev *dev)
+{
+	if (dev->pm_cap) {
+		u16 pmcsr;
+		pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
+		dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK);
+	}
+
+	pci_set_power_state(dev, PCI_D0);
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x244e, ite_set_d0);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ITE, 0x8892, ite_set_d0);
+
 static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
 {
 	struct pci_dev *bridge;


* Re: Machine crashes right *after* ~successful resume
  2014-10-30  0:53                                                         ` Yinghai Lu
@ 2014-10-30 10:36                                                           ` Wilmer van der Gaast
  2014-10-30 16:57                                                             ` Yinghai Lu
  0 siblings, 1 reply; 51+ messages in thread
From: Wilmer van der Gaast @ 2014-10-30 10:36 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
	linux-kernel

Hello,

On 30-10-14 00:53, Yinghai Lu wrote:
>> Done, and that did work! Four suspend+resume cycles later and it's still
>> stable.
> Then can you test the attached simplified one?
>
Sadly, with that patch (applied against a vanilla 3.17 tree like all the 
others) the second resume fails already. :-(


Wilmer v/d Gaast.



* Re: Machine crashes right *after* ~successful resume
  2014-10-30 10:36                                                           ` Wilmer van der Gaast
@ 2014-10-30 16:57                                                             ` Yinghai Lu
  2014-10-30 21:54                                                               ` Wilmer van der Gaast
  0 siblings, 1 reply; 51+ messages in thread
From: Yinghai Lu @ 2014-10-30 16:57 UTC (permalink / raw)
  To: Wilmer van der Gaast
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
	linux-kernel

[-- Attachment #1: Type: text/plain, Size: 347 bytes --]

On Thu, Oct 30, 2014 at 3:36 AM, Wilmer van der Gaast <wilmer@gaast.net> wrote:

> Sadly, with that patch (applied against a vanilla 3.17 tree like all the
> others) the second resume fails already. :-(

Oh no. I really want to know which bit causes the problem.

Please check the debug patches; they print out the PCI config space before
and after pci_enable_bridge() runs on the ITE bridge.
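The dump helper in the attached debug patch, `dump_pci_device_range()`, reads config space one dword at a time and prints the four bytes least-significant first. Because the byte at the lowest offset is the least-significant byte of the dword, the output comes out in ascending offset order. A minimal userspace sketch of just that formatting step follows; `format_dword()` is a hypothetical name, and a constant stands in for the real `pci_read_config_dword()` result.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Format one 32-bit config read the way the debug patch's inner loop
 * does: four " %02x" groups, least-significant byte first. The caller
 * supplies a buffer of at least 13 bytes (12 characters plus NUL).
 */
static void format_dword(uint32_t val, char out[13])
{
	char *p = out;

	for (int j = 0; j < 4; j++) {
		p += sprintf(p, " %02x", (unsigned int)(val & 0xff));
		val >>= 8;
	}
}
```

With the value 0x0009605b this yields ` 5b 60 09 00`, which matches the start of the 320 row in the lspci diff quoted earlier in this thread.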

[-- Attachment #2: debug_extra_dump_pci.patch --]
[-- Type: text/x-patch, Size: 1804 bytes --]

Subject: [PATCH] pci: add pci=dump to print out config space

Print out the PCI config space for debugging, before a later driver hang.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/pci.c |   52 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 51 insertions(+), 1 deletion(-)

Index: linux-2.6/drivers/pci/pci.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -3858,6 +3858,54 @@ void __weak pci_fixup_cardbus(struct pci
 }
 EXPORT_SYMBOL(pci_fixup_cardbus);
 
+static void dump_pci_device_range(struct pci_dev *dev, unsigned start_reg,
+					 unsigned size)
+{
+	int i;
+	int j;
+	u32 val;
+	int end = start_reg + size;
+
+	printk(KERN_DEBUG "PCI: %s", pci_name(dev));
+
+	for (i = start_reg; i < end; i += 4) {
+		if (!(i & 0x0f))
+			printk("\n%04x:", i);
+
+		pci_read_config_dword(dev, i, &val);
+		for (j = 0; j < 4; j++) {
+			printk(" %02x", val & 0xff);
+			val >>= 8;
+		}
+	}
+	printk("\n");
+}
+
+static int dump_pci_devices(void)
+{
+	struct pci_dev *dev = NULL;
+
+	while ((dev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, dev)) != NULL)
+		dump_pci_device_range(dev, 0, dev->cfg_size);
+
+	return 0;
+}
+
+static int pci_dump_regs;
+static void pci_dump(void)
+{
+	pci_dump_regs = 1;
+}
+
+static int pci_init(void)
+{
+	if (pci_dump_regs)
+		dump_pci_devices();
+
+	return 0;
+}
+device_initcall(pci_init);
+
 static int __init pci_setup(char *str)
 {
 	while (str) {
@@ -3865,7 +3913,9 @@ static int __init pci_setup(char *str)
 		if (k)
 			*k++ = 0;
 		if (*str && (str = pcibios_setup(str)) && *str) {
-			if (!strcmp(str, "nomsi")) {
+			if (!strcmp(str, "dump")) {
+				pci_dump();
+			} else if (!strcmp(str, "nomsi")) {
 				pci_no_msi();
 			} else if (!strcmp(str, "noaer")) {
 				pci_no_aer();

[-- Attachment #3: debug_suspend_resume_z_zz.patch --]
[-- Type: text/x-patch, Size: 740 bytes --]

---
 drivers/pci/pci.c |   14 ++++++++++++++
 1 file changed, 14 insertions(+)

Index: linux-2.6/drivers/pci/pci.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -1265,6 +1265,20 @@ static void pci_enable_bridge(struct pci
 	pci_set_master(dev);
 }
 
+static int dump_pci_devices(void);
+
+static void pci_enable_ite(struct pci_dev *dev)
+{
+	pr_info("before...\n");
+	dump_pci_devices();
+
+	pci_enable_bridge(dev);
+
+	pr_info("after...\n");
+	dump_pci_devices();
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ITE, 0x8892, pci_enable_ite);
+
 static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
 {
 	struct pci_dev *bridge;


* Re: Machine crashes right *after* ~successful resume
  2014-10-30 16:57                                                             ` Yinghai Lu
@ 2014-10-30 21:54                                                               ` Wilmer van der Gaast
  2014-10-30 23:02                                                                 ` Yinghai Lu
  0 siblings, 1 reply; 51+ messages in thread
From: Wilmer van der Gaast @ 2014-10-30 21:54 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
	linux-kernel

Hello,

On 30-10-14 16:57, Yinghai Lu wrote:
>> Sadly, with that patch (applied against a vanilla 3.17 tree like all the
>> others) the second resume fails already. :-(
>
> Oh no. I really want to know which bit causes the problem.
>
Good question. And I think you will find my new finding even more
confusing: with your two patches from this e-mail, I could
suspend+resume 3× with no problems, with just your two debugging
patches applied.

Lovely heisenbug here. I'll add that for every test so far I've removed
the kernel source tree, re-untarred it and applied the patches from your
e-mails on top of that, so the tests should be consistent. As for the bug
itself: before we started testing patches, the crashes were already
always *very* reliably happening exactly after the third resume.

Just to be sure this morning was not a fluke, I've retested your patch
from this morning, and it still crashes on the second resume.

> Please check the debug patches; they print out the PCI config space before
> and after pci_enable_bridge() runs on the ITE bridge.
>
http://gaast.net/~wilmer/.lkml/good3.17-patched-megadebug.txt


Wilmer v/d Gaast.




* Re: Machine crashes right *after* ~successful resume
  2014-10-30 21:54                                                               ` Wilmer van der Gaast
@ 2014-10-30 23:02                                                                 ` Yinghai Lu
  2014-10-30 23:24                                                                   ` Wilmer van der Gaast
  0 siblings, 1 reply; 51+ messages in thread
From: Yinghai Lu @ 2014-10-30 23:02 UTC (permalink / raw)
  To: Wilmer van der Gaast
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
	linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1203 bytes --]

On Thu, Oct 30, 2014 at 2:54 PM, Wilmer van der Gaast <wilmer@gaast.net> wrote:

> http://gaast.net/~wilmer/.lkml/good3.17-patched-megadebug.txt

no difference except on 00:1c.3

--- before.txt    2014-10-30 15:20:35.782886485 -0700
+++ after.txt    2014-10-30 15:21:37.034882515 -0700
@@ -49,10 +49,10 @@
 02f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 0300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 0310: 00 00 00 00 1b 36 3a 74 00 00 14 14 31 17 42 00
-0320: 5b 60 09 00 00 20 00 0a ec 19 b8 04 eb 09 b9 06
+0320: 5b 60 09 00 00 20 00 0a 33 1a b8 04 32 0a 00 07
 0330: 16 00 00 28 bc b5 bc 4a 00 00 00 00 74 4c 85 00
-0340: 33 03 33 00 64 03 3f 00 30 00 0c 00 45 02 b9 00
-0350: 4b 02 c1 00 01 00 08 00 00 00 00 00 00 00 00 00
+0340: 33 03 33 00 64 03 3f 00 30 00 0c 00 46 02 00 00
+0350: 4c 02 08 00 01 00 08 00 00 00 00 00 00 00 00 00
 0360: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 0370: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 0380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Please try the attached patch on top of 3.17 without any other patches.

If it works, please dump the ACPI tables, including the DSDT.
I need to check whether there is extra work in _PRT.

Thanks

Yinghai

[-- Attachment #2: debug_suspend_resume_xxx.patch --]
[-- Type: text/x-patch, Size: 720 bytes --]

---
 arch/x86/pci/common.c |    8 ++++++++
 1 file changed, 8 insertions(+)

Index: linux-2.6/arch/x86/pci/common.c
===================================================================
--- linux-2.6.orig/arch/x86/pci/common.c
+++ linux-2.6/arch/x86/pci/common.c
@@ -719,6 +719,14 @@ int pcibios_enable_device(struct pci_dev
 	return 0;
 }
 
+static void pci_enable_irq_ite(struct pci_dev *dev)
+{
+	if (!pci_dev_msi_enabled(dev))
+		pcibios_enable_irq(dev);
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x244e, pci_enable_irq_ite);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ITE, 0x8892, pci_enable_irq_ite);
+
 void pcibios_disable_device (struct pci_dev *dev)
 {
 	if (!pci_dev_msi_enabled(dev) && pcibios_disable_irq)


* Re: Machine crashes right *after* ~successful resume
  2014-10-30 23:02                                                                 ` Yinghai Lu
@ 2014-10-30 23:24                                                                   ` Wilmer van der Gaast
  2014-10-31  0:43                                                                     ` Yinghai Lu
  0 siblings, 1 reply; 51+ messages in thread
From: Wilmer van der Gaast @ 2014-10-30 23:24 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
	linux-kernel



On 30-10-14 23:02, Yinghai Lu wrote:
>> http://gaast.net/~wilmer/.lkml/good3.17-patched-megadebug.txt
>
> no difference except on 00:1c.3
>
> --- before.txt    2014-10-30 15:20:35.782886485 -0700
> +++ after.txt    2014-10-30 15:21:37.034882515 -0700
> @@ -49,10 +49,10 @@
>   02f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>   0300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>   0310: 00 00 00 00 1b 36 3a 74 00 00 14 14 31 17 42 00
> -0320: 5b 60 09 00 00 20 00 0a ec 19 b8 04 eb 09 b9 06
> +0320: 5b 60 09 00 00 20 00 0a 33 1a b8 04 32 0a 00 07
>   0330: 16 00 00 28 bc b5 bc 4a 00 00 00 00 74 4c 85 00
> -0340: 33 03 33 00 64 03 3f 00 30 00 0c 00 45 02 b9 00
> -0350: 4b 02 c1 00 01 00 08 00 00 00 00 00 00 00 00 00
> +0340: 33 03 33 00 64 03 3f 00 30 00 0c 00 46 02 00 00
> +0350: 4c 02 08 00 01 00 08 00 00 00 00 00 00 00 00 00
>   0360: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>   0370: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>   0380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
It seems those diffs are at exactly the same offsets as in the dumps I
was diffing a few days ago.
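To see exactly which offsets (and which bits) changed on 00:1c.3, the before/after rows can be compared byte by byte. A small sketch, assuming each row has already been parsed into 16 bytes; `diff_row()` is a hypothetical helper, and the test data is the 0x320 row quoted above.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Print every byte offset in a 16-byte config-space row whose value
 * differs between two dumps, plus the XOR of the two bytes (the bits
 * that flipped). Returns the number of differing bytes.
 */
static int diff_row(unsigned int base, const uint8_t before[16],
		    const uint8_t after[16])
{
	int changed = 0;

	for (int i = 0; i < 16; i++) {
		if (before[i] != after[i]) {
			printf("0x%03x: %02x -> %02x (xor %02x)\n", base + i,
			       (unsigned int)before[i], (unsigned int)after[i],
			       (unsigned int)(before[i] ^ after[i]));
			changed++;
		}
	}
	return changed;
}
```

For the 0x320 row this reports six changed bytes: 0x328, 0x329 and 0x32c through 0x32f.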

> Please try the attached patch on top of 3.17 without any other patches.
>
Same problem as this morning: it already fails after the second resume. :-(

> If it works, please dump the ACPI tables, including the DSDT.
> I need to check whether there is extra work in _PRT.
>
The original files and their iasl disassembly are in:
http://gaast.net/~wilmer/.lkml/tables/


Thanks,

Wilmer v/d Gaast.



* Re: Machine crashes right *after* ~successful resume
  2014-10-30 23:24                                                                   ` Wilmer van der Gaast
@ 2014-10-31  0:43                                                                     ` Yinghai Lu
  2014-10-31  2:13                                                                       ` Yinghai Lu
  0 siblings, 1 reply; 51+ messages in thread
From: Yinghai Lu @ 2014-10-31  0:43 UTC (permalink / raw)
  To: Wilmer van der Gaast, Bjorn Helgaas
  Cc: Rafael J. Wysocki, Pavel Machek, Rafael Wysocki, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 411 bytes --]

On Thu, Oct 30, 2014 at 4:24 PM, Wilmer van der Gaast <wilmer@gaast.net> wrote:
>
>
> Same problem as this morning: it already fails after the second resume. :-(
>
I cannot find any magic line in pci_enable_bridge() that could cause
the difference.

So should we either use the attached pci_enable_bridge_ite.patch or just
revert commit 928bea9?

Bjorn, please decide which one you want to go with.
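A rough way to see what the two options differ in: the revert brings back `pci_enable_bridges()`, which walks the bus at scan time and enables every bridge that is not already enabled, while the 3.17 `pci_enable_bridge()` enables an upstream bridge only when some device below it gets enabled. The toy model below illustrates that difference. All names in it are hypothetical; it ignores bus mastering, locking and error handling, and flattens the recursive bus walk into an array.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct pci_dev. */
struct node {
	struct node *upstream;	/* parent bridge, NULL at the top */
	bool is_bridge;
	int enable_cnt;
};

/* Lazy scheme, modelled on pci_enable_bridge(): bring up the ancestors
 * first, then this bridge unless it is already enabled. */
static void lazy_enable_bridge(struct node *b)
{
	if (b->upstream)
		lazy_enable_bridge(b->upstream);
	if (b->enable_cnt > 0)
		return;
	b->enable_cnt++;
}

/* Lazy scheme, modelled on pci_enable_device_flags(): only the first
 * enable of a device pulls its upstream bridge chain up. */
static void lazy_enable_device(struct node *d)
{
	if (++d->enable_cnt > 1)
		return;
	if (d->upstream)
		lazy_enable_bridge(d->upstream);
}

/* Eager scheme, modelled on pci_enable_bridges(): every bridge found at
 * scan time is enabled, whether or not anything below it is used. */
static void eager_enable_bridges(struct node *const nodes[], size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (nodes[i]->is_bridge && nodes[i]->enable_cnt == 0)
			nodes[i]->enable_cnt++;
}
```

In this model, enabling a device that sits beside a driverless bridge brings up their shared upstream bridge but leaves the driverless one at `enable_cnt == 0`, whereas the eager walk enables it unconditionally.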

Thanks

Yinghai

[-- Attachment #2: pci_enable_bridge_ite.patch --]
[-- Type: text/x-patch, Size: 594 bytes --]

---
 drivers/pci/pci.c |    6 ++++++
 1 file changed, 6 insertions(+)

Index: linux-2.6/drivers/pci/pci.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -1265,6 +1265,12 @@ static void pci_enable_bridge(struct pci
 	pci_set_master(dev);
 }
 
+static void pci_enable_ite(struct pci_dev *dev)
+{
+	pci_enable_bridge(dev);
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ITE, 0x8892, pci_enable_ite);
+
 static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
 {
 	struct pci_dev *bridge;

[-- Attachment #3: revert_928bea9_from_3.17.patch --]
[-- Type: text/x-patch, Size: 7187 bytes --]

diff --git a/arch/arm/kernel/bios32.c b/arch/arm/kernel/bios32.c
index 17a26c1..306ca53 100644
--- a/arch/arm/kernel/bios32.c
+++ b/arch/arm/kernel/bios32.c
@@ -538,6 +538,12 @@ void pci_common_init_dev(struct device *parent, struct hw_pci *hw)
 			 * Assign resources.
 			 */
 			pci_bus_assign_resources(bus);
+
+
+			/*
+			 * Enable bridges
+			 */
+			pci_enable_bridges(bus);
 		}
 
 		/*
diff --git a/arch/m68k/coldfire/pci.c b/arch/m68k/coldfire/pci.c
index df96792..b33f97a 100644
--- a/arch/m68k/coldfire/pci.c
+++ b/arch/m68k/coldfire/pci.c
@@ -319,6 +319,7 @@ static int __init mcf_pci_init(void)
 	pci_fixup_irqs(pci_common_swizzle, mcf_pci_map_irq);
 	pci_bus_size_bridges(rootbus);
 	pci_bus_assign_resources(rootbus);
+	pci_enable_bridges(rootbus);
 	return 0;
 }
 
diff --git a/arch/mips/pci/pci.c b/arch/mips/pci/pci.c
index 1bf60b1..4f2e17d 100644
--- a/arch/mips/pci/pci.c
+++ b/arch/mips/pci/pci.c
@@ -113,6 +113,7 @@ static void pcibios_scanbus(struct pci_controller *hose)
 		if (!pci_has_flag(PCI_PROBE_ONLY)) {
 			pci_bus_size_bridges(bus);
 			pci_bus_assign_resources(bus);
+			pci_enable_bridges(bus);
 		}
 	}
 }
diff --git a/arch/sh/drivers/pci/pci.c b/arch/sh/drivers/pci/pci.c
index 1bc09ee..5272327 100644
--- a/arch/sh/drivers/pci/pci.c
+++ b/arch/sh/drivers/pci/pci.c
@@ -69,6 +69,7 @@ static void pcibios_scanbus(struct pci_channel *hose)
 
 		pci_bus_size_bridges(bus);
 		pci_bus_assign_resources(bus);
+		pci_enable_bridges(bus);
 	} else {
 		pci_free_resource_list(&resources);
 	}
diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
index cd4de7e..c15bc3c 100644
--- a/drivers/acpi/pci_root.c
+++ b/drivers/acpi/pci_root.c
@@ -614,6 +614,9 @@ static int acpi_pci_root_add(struct acpi_device *device,
 	if (system_state != SYSTEM_BOOTING) {
 		pcibios_resource_survey_bus(root->bus);
 		pci_assign_unassigned_root_bus_resources(root->bus);
+
+		/* need to after hot-added ioapic is registered */
+		pci_enable_bridges(root->bus);
 	}
 
 	pci_lock_rescan_remove();
diff --git a/drivers/parisc/lba_pci.c b/drivers/parisc/lba_pci.c
index 37e71ff..19f6f70 100644
--- a/drivers/parisc/lba_pci.c
+++ b/drivers/parisc/lba_pci.c
@@ -1590,6 +1590,7 @@ lba_driver_probe(struct parisc_device *dev)
 		lba_dump_res(&lba_dev->hba.lmmio_space, 2);
 #endif
 	}
+	pci_enable_bridges(lba_bus);
 
 	/*
 	** Once PCI register ops has walked the bus, access to config
diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 73aef51..761601e 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -283,6 +283,26 @@ void pci_bus_add_devices(const struct pci_bus *bus)
 }
 EXPORT_SYMBOL(pci_bus_add_devices);
 
+void pci_enable_bridges(struct pci_bus *bus)
+{
+	struct pci_dev *dev;
+	int retval;
+
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+		if (dev->subordinate) {
+			if (!pci_is_enabled(dev)) {
+				retval = pci_enable_device(dev);
+				if (retval)
+					dev_err(&dev->dev, "Error enabling bridge (%d), continuing\n", retval);
+				pci_set_master(dev);
+			}
+			pci_enable_bridges(dev->subordinate);
+		}
+	}
+}
+EXPORT_SYMBOL(pci_enable_bridges);
+
+
 /** pci_walk_bus - walk devices on/under bus, calling callback.
  *  @top      bus whose devices should be walked
  *  @cb       callback to be called for each device found
diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
index bcb90e4..8fadc84 100644
--- a/drivers/pci/hotplug/acpiphp_glue.c
+++ b/drivers/pci/hotplug/acpiphp_glue.c
@@ -511,6 +511,7 @@ static void enable_slot(struct acpiphp_slot *slot)
 	acpiphp_sanitize_bus(bus);
 	pcie_bus_configure_settings(bus);
 	acpiphp_set_acpi_region(slot);
+	pci_enable_bridges(bus);
 
 	list_for_each_entry(dev, &bus->devices, bus_list) {
 		/* Assume that newly added devices are powered on already. */
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 625a4ac..4121518 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1242,31 +1242,8 @@ int pci_reenable_device(struct pci_dev *dev)
 }
 EXPORT_SYMBOL(pci_reenable_device);
 
-static void pci_enable_bridge(struct pci_dev *dev)
-{
-	struct pci_dev *bridge;
-	int retval;
-
-	bridge = pci_upstream_bridge(dev);
-	if (bridge)
-		pci_enable_bridge(bridge);
-
-	if (pci_is_enabled(dev)) {
-		if (!dev->is_busmaster)
-			pci_set_master(dev);
-		return;
-	}
-
-	retval = pci_enable_device(dev);
-	if (retval)
-		dev_err(&dev->dev, "Error enabling bridge (%d), continuing\n",
-			retval);
-	pci_set_master(dev);
-}
-
 static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
 {
-	struct pci_dev *bridge;
 	int err;
 	int i, bars = 0;
 
@@ -1285,10 +1262,6 @@ static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
 	if (atomic_inc_return(&dev->enable_cnt) > 1)
 		return 0;		/* already enabled */
 
-	bridge = pci_upstream_bridge(dev);
-	if (bridge)
-		pci_enable_bridge(bridge);
-
 	/* only skip sriov related */
 	for (i = 0; i <= PCI_ROM_RESOURCE; i++)
 		if (dev->resource[i].flags & flags)
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 5ed9930..df17ba8 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2177,6 +2177,7 @@ unsigned int pci_rescan_bus(struct pci_bus *bus)
 
 	max = pci_scan_child_bus(bus);
 	pci_assign_unassigned_bus_resources(bus);
+	pci_enable_bridges(bus);
 	pci_bus_add_devices(bus);
 
 	return max;
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 0482235..2cfb1eb 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1587,7 +1587,7 @@ again:
 
 	/* any device complain? */
 	if (list_empty(&fail_head))
-		goto dump;
+		goto enable_and_dump;
 
 	if (tried_times >= pci_try_num) {
 		if (enable_local == undefined)
@@ -1596,7 +1596,7 @@ again:
 			dev_info(&bus->dev, "Automatically enabled pci realloc, if you have problem, try booting with pci=realloc=off\n");
 
 		free_list(&fail_head);
-		goto dump;
+		goto enable_and_dump;
 	}
 
 	dev_printk(KERN_DEBUG, &bus->dev,
@@ -1629,7 +1629,10 @@ again:
 
 	goto again;
 
-dump:
+enable_and_dump:
+	/* Depth last, update the hardware. */
+	pci_enable_bridges(bus);
+
 	/* dump the resource on buses */
 	pci_bus_dump_resources(bus);
 }
@@ -1700,6 +1703,7 @@ enable_all:
 	if (retval)
 		dev_err(&bridge->dev, "Error reenabling bridge (%d)\n", retval);
 	pci_set_master(bridge);
+	pci_enable_bridges(parent);
 }
 EXPORT_SYMBOL_GPL(pci_assign_unassigned_bridge_resources);
 
diff --git a/drivers/pcmcia/cardbus.c b/drivers/pcmcia/cardbus.c
index 4fe4cc4..9cbe4cf 100644
--- a/drivers/pcmcia/cardbus.c
+++ b/drivers/pcmcia/cardbus.c
@@ -92,6 +92,7 @@ int __ref cb_alloc(struct pcmcia_socket *s)
 	if (s->tune_bridge)
 		s->tune_bridge(s, bus);
 
+	pci_enable_bridges(bus);
 	pci_bus_add_devices(bus);
 
 	pci_unlock_rescan_remove();
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 5be8db4..1f85fb5 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1105,7 +1105,7 @@ int __must_check pci_bus_alloc_resource(struct pci_bus *bus,
 						  resource_size_t,
 						  resource_size_t),
 			void *alignf_data);
-
+void pci_enable_bridges(struct pci_bus *bus);
 
 int pci_remap_iospace(const struct resource *res, phys_addr_t phys_addr);
 


* Re: Machine crashes right *after* ~successful resume
  2014-10-31  0:43                                                                     ` Yinghai Lu
@ 2014-10-31  2:13                                                                       ` Yinghai Lu
  2014-10-31  9:39                                                                         ` Wilmer van der Gaast
  0 siblings, 1 reply; 51+ messages in thread
From: Yinghai Lu @ 2014-10-31  2:13 UTC (permalink / raw)
  To: Wilmer van der Gaast, Bjorn Helgaas
  Cc: Rafael J. Wysocki, Pavel Machek, Rafael Wysocki, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 516 bytes --]

On Thu, Oct 30, 2014 at 5:43 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Thu, Oct 30, 2014 at 4:24 PM, Wilmer van der Gaast <wilmer@gaast.net> wrote:
>>
>>
>> Same problem as this morning: it already fails after the second resume. :-(
>>
> I cannot find any magic line in pci_enable_bridge() that could cause
> the difference.
>
> So should we either use the attached pci_enable_bridge_ite.patch or just
> revert commit 928bea9?

Last try:

Please check the attached patch, which keeps the state consistent.
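The state the patch tries to keep consistent can be sketched in userspace as follows: if firmware left I/O or memory decoding on, count the bridge as enabled, and if bus mastering is on, record that as well. The struct and function names are hypothetical; the bit values are the standard PCI command register bits (`PCI_COMMAND_IO` 0x1, `PCI_COMMAND_MEMORY` 0x2, `PCI_COMMAND_MASTER` 0x4). Two details matter: the register must be read before any of its bits are tested, and the decode mask needs a bitwise `|`, since `PCI_COMMAND_IO || PCI_COMMAND_MEMORY` is a logical OR that evaluates to 1 and would test only the I/O bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard PCI command register bits (values as in linux/pci_regs.h). */
#define PCI_COMMAND_IO     0x1
#define PCI_COMMAND_MEMORY 0x2
#define PCI_COMMAND_MASTER 0x4

/* Hypothetical subset of the struct pci_dev state the fixup touches. */
struct dev_state {
	bool is_busmaster;
	int enable_cnt;
};

/* Sync bookkeeping with a command register value firmware left behind. */
static void sync_from_command(struct dev_state *st, uint16_t cmd)
{
	if (cmd & PCI_COMMAND_MASTER)
		st->is_busmaster = true;

	/* Bitwise OR: the mask is 0x3, covering both decode bits. */
	if (cmd & (PCI_COMMAND_IO | PCI_COMMAND_MEMORY))
		st->enable_cnt++;
}
```

With a memory-only value such as 0x0002, the bitwise mask counts the bridge as enabled, whereas a mask of 1 would not.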

Thanks

Yinghai

[-- Attachment #2: pci_enable_bridge_ite_x.patch --]
[-- Type: text/x-patch, Size: 1088 bytes --]

---
 drivers/pci/pci.c |   20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

Index: linux-2.6/drivers/pci/pci.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -1264,6 +1264,26 @@ static void pci_enable_bridge(struct pci
 	pci_set_master(dev);
 }
 
+static void pci_enable_ite(struct pci_dev *dev)
+{
+	u16 cmd;
+
+	/*
+	 * FW enabled the bridge already, so keep enable_cnt consistent;
+	 * then later we can go through pci_pm_resume/pci_pm_reenable_device
+	 * to enable it again.
+	 * --- for pci bridge without driver case.
+	 */
+	pci_read_config_word(dev, PCI_COMMAND, &cmd);
+	if (cmd & PCI_COMMAND_MASTER)
+		dev->is_busmaster = true;
+
+	if (cmd & (PCI_COMMAND_IO | PCI_COMMAND_MEMORY))
+		atomic_inc(&dev->enable_cnt);
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x244e, pci_enable_ite);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ITE, 0x8892, pci_enable_ite);
+
 static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
 {
 	struct pci_dev *bridge;


* Re: Machine crashes right *after* ~successful resume
  2014-10-31  2:13                                                                       ` Yinghai Lu
@ 2014-10-31  9:39                                                                         ` Wilmer van der Gaast
  2014-10-31 16:11                                                                           ` Yinghai Lu
  0 siblings, 1 reply; 51+ messages in thread
From: Wilmer van der Gaast @ 2014-10-31  9:39 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
	linux-kernel

Hello Yinghai,

On 31-10-14 02:13, Yinghai Lu wrote:
> Last try:
>
> Please check the attached patch, which keeps the state consistent.

Good news: This last patch worked! For good measure, I ran my test twice 
with a reboot in between. Worked consistently.

And similarly, to ensure that your debugging-at-boottime-only patch 
wasn't just working by accident yesterday, I tested it twice more with 
the same effect.


Thanks,

Wilmer van der Gaast.



* Re: Machine crashes right *after* ~successful resume
  2014-10-31  9:39                                                                         ` Wilmer van der Gaast
@ 2014-10-31 16:11                                                                           ` Yinghai Lu
  2014-10-31 21:13                                                                             ` Wilmer van der Gaast
  0 siblings, 1 reply; 51+ messages in thread
From: Yinghai Lu @ 2014-10-31 16:11 UTC (permalink / raw)
  To: Wilmer van der Gaast
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
	linux-kernel

[-- Attachment #1: Type: text/plain, Size: 602 bytes --]

On Fri, Oct 31, 2014 at 2:39 AM, Wilmer van der Gaast <wilmer@gaast.net> wrote:
> Hello Yinghai,
>
> On 31-10-14 02:13, Yinghai Lu wrote:
>>
>> Last try:
>>
>> Please check the attached patch, which keeps the state consistent.
>
>
> Good news: This last patch worked! For good measure, I ran my test twice
> with a reboot in between. Worked consistently.
>
> And similarly, to ensure that your debugging-at-boottime-only patch wasn't
> just working by accident yesterday, I tested it twice more with the same
> effect.

Good. Please check whether the attached one alone, on top of 3.17, works too.
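For non-type-0 headers (bridges), the attached patch splits the single 0..15 restore into 4..15 followed by 0..3, the idea being that the BARs, bus numbers and window registers (dwords 4 to 9) are written before the command register (dword 1, offset 0x04). The sketch below models only the write order. It assumes each range is written from its highest dword down to its lowest, which is my reading of `pci_restore_config_space_range()` in 3.17 and should be checked against the source; `log_restore_range()` and `position_of()` are hypothetical helpers.

```c
#include <stddef.h>

/*
 * Append the dword indices of one restore range to log[], in the order
 * they would be written. Assumed to walk from 'end' down to 'start'.
 */
static size_t log_restore_range(int *log, size_t n, int start, int end)
{
	for (int index = end; index >= start; index--)
		log[n++] = index;
	return n;
}

/* Position of a dword index in the log, or -1 if it is absent. */
static int position_of(const int *log, size_t n, int index)
{
	for (size_t i = 0; i < n; i++)
		if (log[i] == index)
			return (int)i;
	return -1;
}
```

Under that assumption, both orderings write the command register after BAR0, and the two sequences come out identical (15 down to 0), so the split would not change the order in which the bridge's registers are restored.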

Thanks

Yinghai

[-- Attachment #2: debug_suspend_resume_xxx1.patch --]
[-- Type: text/x-patch, Size: 643 bytes --]

---
 drivers/pci/pci.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Index: linux-2.6/drivers/pci/pci.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -1063,7 +1063,9 @@ static void pci_restore_config_space(str
 		pci_restore_config_space_range(pdev, 4, 9, 10);
 		pci_restore_config_space_range(pdev, 0, 3, 0);
 	} else {
-		pci_restore_config_space_range(pdev, 0, 15, 0);
+		/* Restore BARs before the command register. */
+		pci_restore_config_space_range(pdev, 4, 15, 0);
+		pci_restore_config_space_range(pdev, 0, 3, 0);
 	}
 }
 


* Re: Machine crashes right *after* ~successful resume
  2014-10-31 16:11                                                                           ` Yinghai Lu
@ 2014-10-31 21:13                                                                             ` Wilmer van der Gaast
  2014-10-31 21:22                                                                               ` Yinghai Lu
  0 siblings, 1 reply; 51+ messages in thread
From: Wilmer van der Gaast @ 2014-10-31 21:13 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
	linux-kernel

On 31-10-14 16:11, Yinghai Lu wrote:
>
> Good. Please check whether the attached one alone, on top of 3.17, works too.
>
No luck, sadly. :-( Unsuccessful third resume.

I forgot to set up the serial console, would that still be useful?


Wilmer v/d Gaast.



* Re: Machine crashes right *after* ~successful resume
  2014-10-31 21:13                                                                             ` Wilmer van der Gaast
@ 2014-10-31 21:22                                                                               ` Yinghai Lu
  2014-10-31 23:18                                                                                 ` Yinghai Lu
  0 siblings, 1 reply; 51+ messages in thread
From: Yinghai Lu @ 2014-10-31 21:22 UTC (permalink / raw)
  To: Wilmer van der Gaast
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
	linux-kernel

On Fri, Oct 31, 2014 at 2:13 PM, Wilmer van der Gaast <wilmer@gaast.net> wrote:
> On 31-10-14 16:11, Yinghai Lu wrote:
>>
>>
>> Good. Please check whether the attached one alone, on top of 3.17, works too.
>>
> No luck, sadly. :-( Unsuccessful third resume.
>
> I forgot to set up the serial console; would that still be useful?

Never mind; let me go through the suspend/resume code path again.

* Re: Machine crashes right *after* ~successful resume
  2014-10-31 21:22                                                                               ` Yinghai Lu
@ 2014-10-31 23:18                                                                                 ` Yinghai Lu
  2014-11-01  0:00                                                                                   ` Wilmer van der Gaast
  0 siblings, 1 reply; 51+ messages in thread
From: Yinghai Lu @ 2014-10-31 23:18 UTC (permalink / raw)
  To: Wilmer van der Gaast
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
	linux-kernel

[-- Attachment #1: Type: text/plain, Size: 391 bytes --]

On Fri, Oct 31, 2014 at 2:22 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Fri, Oct 31, 2014 at 2:13 PM, Wilmer van der Gaast <wilmer@gaast.net> wrote:
>> On 31-10-14 16:11, Yinghai Lu wrote:
>>>
>>>
>>> Good. Please check if attached one on top of 3.17 only would work too.
>>>
>> No luck, sadly. :-( Unsuccessful third resume.

Please try the two attached patches separately on top of 3.17.

[-- Attachment #2: pci_enable_bridge_ite.patch --]
[-- Type: text/x-patch, Size: 1016 bytes --]

---
 drivers/pci/pci.c |   18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

Index: linux-2.6/drivers/pci/pci.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -1265,6 +1265,24 @@ static void pci_enable_bridge(struct pci
 	pci_set_master(dev);
 }
 
+static void pci_enable_ite(struct pci_dev *dev)
+{
+	/*
+	 * Firmware has already enabled the bridge, so call
+	 * pci_enable_bridge() to keep enable_cnt consistent; later,
+	 * pci_pm_resume()/pci_pm_reenable_device() can enable it again.
+	 * This covers a PCI bridge that has no driver bound.
+	 */
+	if (!pci_is_enabled(dev)) {
+		u16 cmd;
+
+		pci_read_config_word(dev, PCI_COMMAND, &cmd);
+		if ((cmd & (PCI_COMMAND_IO || PCI_COMMAND_MEMORY)) &&
+			pci_enable_bridge(dev);
+	}
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ITE, 0x8892, pci_enable_ite);
+
 static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
 {
 	struct pci_dev *bridge;

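In this first version the mask is built with logical `||`, so `PCI_COMMAND_IO || PCI_COMMAND_MEMORY` evaluates to 1 and only the I/O-decode bit is ever tested; the s/&&/)/ fix reported later in the thread only repairs the syntax error. The sketch below is a minimal userspace illustration of the difference, assuming the bit values from the kernel's pci_regs.h; `decodes_v1()` and `decodes_v2()` are hypothetical helper names, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same values as the kernel's pci_regs.h. */
#define PCI_COMMAND_IO     0x1  /* I/O space decode enable */
#define PCI_COMMAND_MEMORY 0x2  /* memory space decode enable */

/* Test as written in the first patch: `||` yields 1, so the mask is 0x1. */
static bool decodes_v1(uint16_t cmd)
{
	return (cmd & (PCI_COMMAND_IO || PCI_COMMAND_MEMORY)) != 0;
}

/* Test as written with bitwise `|`: the mask is 0x3 (I/O or memory). */
static bool decodes_v2(uint16_t cmd)
{
	return (cmd & (PCI_COMMAND_IO | PCI_COMMAND_MEMORY)) != 0;
}
```

With the `||` form, a bridge that has only memory decoding enabled would be skipped; the bitwise `|` form, used in the second patch and in the later v2, accepts either bit.
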
[-- Attachment #3: pci_pm_reenable_device_enhance.patch --]
[-- Type: text/x-patch, Size: 859 bytes --]

---
 drivers/pci/pci-driver.c |    9 +++++++++
 1 file changed, 9 insertions(+)

Index: linux-2.6/drivers/pci/pci-driver.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-driver.c
+++ linux-2.6/drivers/pci/pci-driver.c
@@ -519,8 +519,17 @@ static void pci_pm_set_unknown_state(str
  */
 static int pci_pm_reenable_device(struct pci_dev *pci_dev)
 {
+	u16 cmd;
 	int retval;
 
+	/* update enable_cnt according to cmd register */
+	pci_read_config_word(pci_dev, PCI_COMMAND, &cmd);
+	if (!pci_dev->is_busmaster && (cmd & PCI_COMMAND_MASTER))
+		pci_dev->is_busmaster = true;
+	if (!pci_is_enabled(pci_dev) &&
+	    (cmd & (PCI_COMMAND_IO | PCI_COMMAND_MEMORY)))
+		atomic_inc(&pci_dev->enable_cnt);
+
 	/* if the device was enabled before suspend, reenable */
 	retval = pci_reenable_device(pci_dev);
 	/*

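The second patch makes pci_pm_reenable_device() trust the command register: if firmware left I/O or memory decoding on but the kernel's enable_cnt is still zero, the count is bumped so that pci_reenable_device(), which only acts on a device whose enable_cnt is non-zero, re-enables the bridge on resume. A minimal userspace model of that logic, assuming a simplified device struct; every `model_*` name is hypothetical, and the real code uses an atomic_t with atomic_inc().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Command-register bits, same values as the kernel's pci_regs.h. */
#define PCI_COMMAND_IO     0x1
#define PCI_COMMAND_MEMORY 0x2
#define PCI_COMMAND_MASTER 0x4

/* Hypothetical model of the struct pci_dev state the patch looks at. */
struct model_dev {
	uint16_t cmd;         /* PCI_COMMAND word as restored on resume */
	int enable_cnt;       /* atomic_t in the kernel */
	bool is_busmaster;
	int reenables;        /* simulated do_pci_enable_device() calls */
	int set_masters;      /* simulated pci_set_master() calls */
};

static bool model_is_enabled(const struct model_dev *d)
{
	return d->enable_cnt > 0;  /* like pci_is_enabled() */
}

/* Mirrors the lines the patch adds to pci_pm_reenable_device(). */
static void model_sync_from_cmd(struct model_dev *d)
{
	if (!d->is_busmaster && (d->cmd & PCI_COMMAND_MASTER))
		d->is_busmaster = true;
	if (!model_is_enabled(d) &&
	    (d->cmd & (PCI_COMMAND_IO | PCI_COMMAND_MEMORY)))
		d->enable_cnt++;
}

/*
 * Models pci_pm_reenable_device(): pci_reenable_device() only acts on a
 * device whose enable_cnt is non-zero, then bus mastering is restored.
 * `patched` selects whether the new sync step runs first.
 */
static void model_pm_reenable(struct model_dev *d, bool patched)
{
	if (patched)
		model_sync_from_cmd(d);
	if (model_is_enabled(d))
		d->reenables++;
	if (d->is_busmaster)
		d->set_masters++;
}
```

Running the patched path a second time leaves enable_cnt at 1, since the increment only happens while the count is still zero.
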
* Re: Machine crashes right *after* ~successful resume
  2014-10-31 23:18                                                                                 ` Yinghai Lu
@ 2014-11-01  0:00                                                                                   ` Wilmer van der Gaast
  2014-11-01  2:10                                                                                     ` Yinghai Lu
  0 siblings, 1 reply; 51+ messages in thread
From: Wilmer van der Gaast @ 2014-11-01  0:00 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
	linux-kernel

Hello,

Patch #1 worked after a simple s/&&/)/. And patch #2 seems to fix the 
problem as well!


Wilmer v/d Gaast.

* Re: Machine crashes right *after* ~successful resume
  2014-11-01  0:00                                                                                   ` Wilmer van der Gaast
@ 2014-11-01  2:10                                                                                     ` Yinghai Lu
  2014-11-02 23:16                                                                                       ` Wilmer van der Gaast
  0 siblings, 1 reply; 51+ messages in thread
From: Yinghai Lu @ 2014-11-01  2:10 UTC (permalink / raw)
  To: Wilmer van der Gaast
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
	linux-kernel

[-- Attachment #1: Type: text/plain, Size: 201 bytes --]

On Fri, Oct 31, 2014 at 5:00 PM, Wilmer van der Gaast <wilmer@gaast.net> wrote:
> Hello,
>
> Patch #1 worked after a simple s/&&/)/. And patch #2 seems to fix the
> problem as well!

Updated patch #1.

[-- Attachment #2: pci_enable_bridge_ite_v2.patch --]
[-- Type: text/x-patch, Size: 1011 bytes --]

---
 drivers/pci/pci.c |   18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

Index: linux-2.6/drivers/pci/pci.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -1265,6 +1265,24 @@ static void pci_enable_bridge(struct pci
 	pci_set_master(dev);
 }
 
+static void pci_enable_ite(struct pci_dev *dev)
+{
+	/*
+	 * Firmware has already enabled the bridge, so call
+	 * pci_enable_bridge() to keep enable_cnt consistent; later,
+	 * pci_pm_resume()/pci_pm_reenable_device() can enable it again.
+	 * This covers a PCI bridge that has no driver bound.
+	 */
+	if (!pci_is_enabled(dev)) {
+		u16 cmd;
+
+		pci_read_config_word(dev, PCI_COMMAND, &cmd);
+		if (cmd & (PCI_COMMAND_IO | PCI_COMMAND_MEMORY))
+			pci_enable_bridge(dev);
+	}
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ITE, 0x8892, pci_enable_ite);
+
 static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
 {
 	struct pci_dev *bridge;

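DECLARE_PCI_FIXUP_FINAL() registers pci_enable_ite() in the kernel's table of "final" fixups, which are run for each device whose IDs match the entry. A simplified userspace model of that lookup, assuming matching on vendor and device ID only (the kernel's pci_do_fixups() also compares the class code) and taking 0x1283 as the value of PCI_VENDOR_ID_ITE from pci_ids.h; all `model_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plays the role of (u16)PCI_ANY_ID in the kernel's matching code. */
#define MODEL_ANY_ID 0xffffu

/* Hypothetical stand-in for the struct pci_dev fields used in matching. */
struct model_pci_dev {
	uint16_t vendor;
	uint16_t device;
	int ite_hook_runs;  /* how many times the ITE fixup ran */
};

/* Hypothetical stand-in for struct pci_fixup (class matching omitted). */
struct model_fixup {
	uint16_t vendor;
	uint16_t device;
	void (*hook)(struct model_pci_dev *dev);
};

/* Placeholder for pci_enable_ite(); it only records that it was called. */
static void model_ite_hook(struct model_pci_dev *dev)
{
	dev->ite_hook_runs++;
}

/* One entry, like DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ITE, 0x8892, ...). */
static const struct model_fixup model_final_fixups[] = {
	{ 0x1283 /* PCI_VENDOR_ID_ITE */, 0x8892, model_ite_hook },
};

/* Walk the table and run every hook whose IDs match the device. */
static void model_do_fixups(struct model_pci_dev *dev,
			    const struct model_fixup *f, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if ((f[i].vendor == dev->vendor || f[i].vendor == MODEL_ANY_ID) &&
		    (f[i].device == dev->device || f[i].device == MODEL_ANY_ID))
			f[i].hook(dev);
	}
}
```

Only a device reporting both the ITE vendor ID and device ID 0x8892 reaches the hook; every other device passes through the table untouched.
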
* Re: Machine crashes right *after* ~successful resume
  2014-11-01  2:10                                                                                     ` Yinghai Lu
@ 2014-11-02 23:16                                                                                       ` Wilmer van der Gaast
  0 siblings, 0 replies; 51+ messages in thread
From: Wilmer van der Gaast @ 2014-11-02 23:16 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
	linux-kernel

On 01-11-14 02:10, Yinghai Lu wrote:
>> Patch #1 worked after a simple s/&&/)/. And patch #2 seems to fix the
>> problem as well!
> Updated patch #1.
>
Works as well!


Wilmer v/d Gaast.

end of thread

Thread overview: 51+ messages
2014-10-07 23:20 Machine crashes right *after* ~successful resume Wilmer van der Gaast
2014-10-12 14:30 ` Pavel Machek
2014-10-12 15:49   ` Wilmer van der Gaast
2014-10-12 20:40     ` Pavel Machek
2014-10-12 23:47       ` Wilmer van der Gaast
2014-10-13 15:06       ` Rafael J. Wysocki
2014-10-15 11:16         ` Wilmer van der Gaast
2014-10-15 13:58           ` Bjorn Helgaas
2014-10-15 18:39             ` Yinghai Lu
2014-10-15 23:34               ` Wilmer van der Gaast
2014-10-16  4:32                 ` Yinghai Lu
2014-10-16  9:36                   ` Wilmer van der Gaast
2014-10-16 16:36                     ` Yinghai Lu
2014-10-16 21:08                       ` Wilmer van der Gaast
2014-10-18 21:28                         ` Yinghai Lu
2014-10-18 23:57                           ` Wilmer van der Gaast
2014-10-19  4:29                             ` Yinghai Lu
2014-10-19 10:48                               ` Wilmer van der Gaast
2014-10-21 21:40                               ` Wilmer van der Gaast
2014-10-21 23:15                                 ` Yinghai Lu
2014-10-22 12:53                                   ` Wilmer van der Gaast
2014-10-26 21:53                                     ` Yinghai Lu
2014-10-27 10:50                                       ` Wilmer van der Gaast
2014-10-27 18:23                                         ` Yinghai Lu
2014-10-27 22:22                                           ` Wilmer van der Gaast
2014-10-27 23:41                                             ` Yinghai Lu
2014-10-28  0:03                                               ` Wilmer van der Gaast
2014-10-28  1:12                                                 ` Yinghai Lu
2014-10-28  4:03                                                   ` Yinghai Lu
2014-10-28 10:23                                                     ` Wilmer van der Gaast
2014-10-28 23:34                                                   ` Wilmer van der Gaast
2014-10-29  5:17                                                     ` Yinghai Lu
2014-10-29  9:37                                                       ` Wilmer van der Gaast
2014-10-30  0:53                                                         ` Yinghai Lu
2014-10-30 10:36                                                           ` Wilmer van der Gaast
2014-10-30 16:57                                                             ` Yinghai Lu
2014-10-30 21:54                                                               ` Wilmer van der Gaast
2014-10-30 23:02                                                                 ` Yinghai Lu
2014-10-30 23:24                                                                   ` Wilmer van der Gaast
2014-10-31  0:43                                                                     ` Yinghai Lu
2014-10-31  2:13                                                                       ` Yinghai Lu
2014-10-31  9:39                                                                         ` Wilmer van der Gaast
2014-10-31 16:11                                                                           ` Yinghai Lu
2014-10-31 21:13                                                                             ` Wilmer van der Gaast
2014-10-31 21:22                                                                               ` Yinghai Lu
2014-10-31 23:18                                                                                 ` Yinghai Lu
2014-11-01  0:00                                                                                   ` Wilmer van der Gaast
2014-11-01  2:10                                                                                     ` Yinghai Lu
2014-11-02 23:16                                                                                       ` Wilmer van der Gaast
2014-10-27 21:21                                         ` Pavel Machek
2014-10-19  8:07                             ` Pavel Machek
