* Machine crashes right *after* ~successful resume
@ 2014-10-07 23:20 Wilmer van der Gaast
2014-10-12 14:30 ` Pavel Machek
0 siblings, 1 reply; 51+ messages in thread
From: Wilmer van der Gaast @ 2014-10-07 23:20 UTC (permalink / raw)
To: rafael.j.wysocki, linux-kernel
Hello,
Rafael, including you on this since
http://linuxconcloudopenna2013.sched.org/event/d708f47d07cd44b9669610778c024708#.VDRzTDS_EUF
mentions you as the maintainer for Linux + power management. I hope this
is still accurate.
Since Linux 3.12 (Debian version 3.12.9-1~bpo70+1) and all the way up to
3.16 (Debian version 3.16.3-2), I'm having suspend-resume issues on my
machine (Intel Z68, i7-3770K) that are somewhat less obvious.
After every boot, I get two successful suspend+resume cycles, but after
the third suspend, the machine won't resume successfully. I've never had
anything useful logged on the VGA console, but I've had more luck over
the serial console. I seem to get as far as:
[ 153.787678] PM: resume of devices complete after 3797.737 msecs
[ 153.787775] PM: resume devices took 3.796 seconds
[ 154.238612] Restarting tasks ... done.
And indeed, while testing I was running a "ping -i0.01" to a host on my
network, and it managed to get a few packets out. Timing already seems
quite off though:
22:11:49.515489 IP 192.168.44.101 > 192.168.44.100: ICMP echo request,
id 3074, seq 894, length 64
22:11:49.982265 IP 192.168.44.101 > 192.168.44.100: ICMP echo request,
id 3074, seq 895, length 64
22:11:50.986779 IP 192.168.44.101 > 192.168.44.100: ICMP echo request,
id 3074, seq 896, length 64
Note the gaps that are 0.4-1.0s instead of the 0.01s they should've
been. To me these pings going *out* sound like userland's definitely
waking up for a while, or at least some processes are. Also, for several
seconds even during earlier stages of the resume, the machine is already
responding to echo requests.
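As a rough check of those gaps, here is a minimal Python sketch (not part
of the original report) that computes the intervals between the three
tcpdump timestamps quoted above. The helper name gaps_seconds is made up
for illustration, and it assumes all timestamps fall on the same day.

```python
from datetime import datetime


def gaps_seconds(stamps):
    """Return gaps in seconds between consecutive HH:MM:SS.ffffff stamps.

    Assumption: all stamps belong to the same day (no midnight wrap).
    """
    times = [datetime.strptime(s, "%H:%M:%S.%f") for s in stamps]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


# Timestamps copied from the three tcpdump lines in the report.
stamps = ["22:11:49.515489", "22:11:49.982265", "22:11:50.986779"]

for gap in gaps_seconds(stamps):
    # Compare each gap with the 0.01 s interval requested via "ping -i0.01".
    print(f"gap {gap:.6f} s, about {gap / 0.01:.0f}x the requested interval")
```

The same function could be fed a longer capture to see at which point
after the resume the intervals start to stretch.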
Sadly after this message to my serial console and these few ICMP
packets, the machine locks up quite hard, to the point that SysRq
doesn't respond anymore. :-(
This has been happening for a while already and makes suspend+resume
mostly useless on my machine. What other debugging info can I provide to
help with getting this fixed?
I've found out about pm_trace, which always points at the same line (and
no device):
/var/log/syslog.1:Oct 10 16:43:58 ruby kernel: [ 0.780503] Magic
number: 0:52:740
/var/log/syslog.1:Oct 10 16:43:58 ruby kernel: [ 0.780599] hash
matches /tmp/linux-3.16.3/drivers/base/power/main.c:812
In my source tree that line is:
TRACE_RESUME(error);
Right at the end of device_resume(), under the Complete: label. Note
that I might have to redo this though, as I now realise I had only
recompiled my *kernel* with the PM_TRACE_RTC flag set, not all my
modules, which I assume is not enough. (I'm thinking of filing a Debian
bug requesting this flag to be enabled by default..) However since the
kernel seems to declare the resume as complete I'm not sure whether
pm_trace is still of any use?
With kernels 3.10 and older I have no such problems; I can
suspend+resume as often as I want.
I've already tried to skip the NVidia + VMware modules at boot time (as
you can see from the logs they're not loaded at any point), but it
didn't help. I could try omitting more modules.
I'm attaching a full dmesg of boot + a few suspend+resume cycles in 3.10
and 3.16, and a dump of the serial console showing the last resume cycle
(which I couldn't get from dmesg of course).
You might notice the message about s2ram segfaulting. I've looked at
that, and it seems to be in VBE-related code, but this problem occurs
even when I just echo mem to /sys/power/state directly without using
s2ram, so I assume it's not related.
Sorry for the long message. I'd love some ideas for troubleshooting an
issue like this.
"Attachments" in http://roy.gaast.net/~wilmer/.lkml/ since I just
realised >200KB of attachments might not be appreciated. :-)
Cheers,
Wilmer van der Gaast.
--
+-------- .''`. - -- ---+ + - -- --- ---- ----- ------+
| wilmer : :' : gaast.net | | OSS Programmer www.bitlbee.org |
| lintux `. `~' debian.org | | Full-time geek wilmer.gaast.net |
+--- -- - ` ---------------+ +------ ----- ---- --- -- - +
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Machine crashes right *after* ~successful resume
2014-10-07 23:20 Machine crashes right *after* ~successful resume Wilmer van der Gaast
@ 2014-10-12 14:30 ` Pavel Machek
2014-10-12 15:49 ` Wilmer van der Gaast
0 siblings, 1 reply; 51+ messages in thread
From: Pavel Machek @ 2014-10-12 14:30 UTC (permalink / raw)
To: Wilmer van der Gaast; +Cc: rafael.j.wysocki, linux-kernel
Hi!
> Rafael, including you on this since http://linuxconcloudopenna2013.sched.org/event/d708f47d07cd44b9669610778c024708#.VDRzTDS_EUF
> mentions you as the maintainer for Linux + power management. I hope this is
> still accurate.
>
> Since Linux 3.12 (Debian version 3.12.9-1~bpo70+1) and all the way up to
> 3.16 (Debian version 3.16.3-2), I'm having suspend-resume issues on my
> machine (Intel Z68, i7-3770K) that are somewhat less obvious.
>
> After every boot, I get two successful suspend+resume cycles, but after the
> third suspend, it won't resume successfully. On the VGA console I've never
> had anything useful logged, luckily over the serial console I've had more
> luck. I seem to get as far as:
Has it ever worked ok? ...aha, in 3.10, ok.
> I've found out about pm_trace, which always points at the same line (and no
> device):
>
> /var/log/syslog.1:Oct 10 16:43:58 ruby kernel: [ 0.780503] Magic
> number: 0:52:740
> /var/log/syslog.1:Oct 10 16:43:58 ruby kernel: [ 0.780599] hash matches
> /tmp/linux-3.16.3/drivers/base/power/main.c:812
>
> In my source tree that line is:
>
> TRACE_RESUME(error);
If it resumes OK, this kind of tracing will not help.
> With kernels 3.10 and older I have no such problems, I can suspend+resume as
> often as I want.
Is there a chance you could bisect?
> I've already tried to skip the NVidia + VMware modules at boot time (as you
> can see from the logs they're not loaded at any point), but it didn't help.
> I could try omitting more modules.
Yes, trying with minimal modules (and no s2ram) would be nice.
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: Machine crashes right *after* ~successful resume
2014-10-12 14:30 ` Pavel Machek
@ 2014-10-12 15:49 ` Wilmer van der Gaast
2014-10-12 20:40 ` Pavel Machek
0 siblings, 1 reply; 51+ messages in thread
From: Wilmer van der Gaast @ 2014-10-12 15:49 UTC (permalink / raw)
To: Pavel Machek; +Cc: rafael.j.wysocki, linux-kernel
Hello,
Many thanks for your response!
On 12-10-14 15:30, Pavel Machek wrote:
>
> Has it ever worked ok? ...aha, in 3.10, ok.
>
Correct. And I've tried a few more kernels now, compiled on my own. 3.17
still has this issue, while 3.10 is completely fine all the way up to
3.10.57 (I tested just under 50 cycles last night). I tried 3.11 as well,
but it seems to have other suspend-resume stability issues that are no
longer present in later kernels, so I've mostly not used those results.
git bisect: I've finally succeeded! I've tried automating it completely,
but sadly Gigabyte couldn't be bothered wiring up the motherboard to
make the watchdog work. :-(
The culprit appears to be this one: 2e8b5f621dbe29425906852c6079afb6b28720cb
Merge: 07f2daa fed2451
Author: Bjorn Helgaas <bhelgaas@google.com>
Date: Wed Aug 28 20:55:41 2013 -0600
Merge branch 'pci/misc' into next
* pci/misc:
PCI: Remove pcie_cap_has_devctl()
PCI: Support PCIe Capability Slot registers only for ports with slots
PCI: Remove PCIe Capability version checks
PCI: Allow PCIe Capability link-related register access for switches
PCI: Add offsets of PCIe capability registers
PCI: Tidy bitmasks and spacing of PCIe capability definitions
PCI: Remove obsolete comment reference to pci_pcie_cap2()
PCI: Clarify PCI_EXP_TYPE_PCI_BRIDGE comment
PCI: Rename PCIe capability definitions to follow convention
PCI: Disable decoding for BAR sizing only when it was actually enabled
PCI: Add comment about needing pci_msi_off() even when CONFIG_PCI_MSI=n
PCI: Add pcibios_pm_ops for optional arch-specific hibernate functionality
I've then tried to narrow down which of the merged changes is my issue
but with no luck, possibly because there's a problem with a combination
of one of these changes, and a change that was not in the pci/misc
branch at the time. I could do a manual test instead.
>> I've already tried to skip the NVidia + VMware modules at boot time (as you
>> can see from the logs they're not loaded at any point), but it didn't help.
>> I could try omitting more modules.
> Yes, try with minimal modules (and no s2ram) would be nice.
>
I've tried unloading a bunch of modules (sound and NIC IIRC), same
results. I can try this again with an even more minimal set. If this
improves the situation, I'll post again.
Wilmer van der Gaast.
* Re: Machine crashes right *after* ~successful resume
2014-10-12 15:49 ` Wilmer van der Gaast
@ 2014-10-12 20:40 ` Pavel Machek
2014-10-12 23:47 ` Wilmer van der Gaast
2014-10-13 15:06 ` Rafael J. Wysocki
0 siblings, 2 replies; 51+ messages in thread
From: Pavel Machek @ 2014-10-12 20:40 UTC (permalink / raw)
To: Wilmer van der Gaast, bhelgaas; +Cc: rafael.j.wysocki, linux-kernel
Bjorn, any ideas?
Would it be feasible to revert 2e8b... to see if it fixes it on 3.17?
Thanks,
Pavel
* Re: Machine crashes right *after* ~successful resume
2014-10-12 20:40 ` Pavel Machek
@ 2014-10-12 23:47 ` Wilmer van der Gaast
2014-10-13 15:06 ` Rafael J. Wysocki
1 sibling, 0 replies; 51+ messages in thread
From: Wilmer van der Gaast @ 2014-10-12 23:47 UTC (permalink / raw)
To: Pavel Machek; +Cc: bhelgaas, rafael.j.wysocki, linux-kernel
On 12-10-14 21:40, Pavel Machek wrote:
> Bjorn, any ideas?
>
> Would it be feasible to revert 2e8b... to see if it fixes it on 3.17?
>
I've tried this, too many conflicts unfortunately.
Just noticed this message appear during failing resumes by the way:
[ 54.203072] Clocksource tsc unstable (delta = -499956111 ns)
[ 54.203151] Switched to clocksource hpet
[ 54.203166] PM: resume of devices complete after 2142.341 msecs
It doesn't appear every time, though. It feels more like another symptom
of the same problem. In my original e-mail I already noted timing
strangeness, with a 0.01s ping interval growing to 0.4s+.
Anyway, my previous bisect result appears to be wrong. :-( I've done
another bisect on a narrow range around it, now
928bea964827d7824b548c1f8e06eccbbc4d0d7d is considered guilty. I've
rerun the test twice with that revision and the one before it
(55ed83a615730c2578da155bc99b68f4417ffe20), and the result seems
consistent now; 928bea gets me just two clean suspend+resumes, 55ed83 more.
I have tried to revert this change in a 3.17 tree but it didn't apply
cleanly. One issue was an "Unreversed patch detected!" warning, which
suggests to me that some of this work has been changed already. Even
against a 3.12 tree I get this issue.
Just to be sure, I've tried ignoring the unreversed patch warning and
tweaked the patch in two more places to make it apply, but indeed that
does not solve my problem.
A Google search for the revision number shows that there has been quite
a discussion about it already. Maybe my machine has found another issue
(though I suppose my machine's more guilty than the kernel! :-/).
>> I've tried unloading a bunch of modules (sound and NIC IIRC), same results.
>> I can try this again with an even more minimal set. If this improves the
>> situation, I'll post again.
>>
This is done: Still seeing the same issue. (And I'm using raw echo
mem>/proc/... for all testing now.) Same for a "make defconfig" kernel.
Wilmer van der Gaast.
* Re: Machine crashes right *after* ~successful resume
2014-10-12 20:40 ` Pavel Machek
2014-10-12 23:47 ` Wilmer van der Gaast
@ 2014-10-13 15:06 ` Rafael J. Wysocki
2014-10-15 11:16 ` Wilmer van der Gaast
1 sibling, 1 reply; 51+ messages in thread
From: Rafael J. Wysocki @ 2014-10-13 15:06 UTC (permalink / raw)
To: Pavel Machek, Wilmer van der Gaast
Cc: bhelgaas, rafael.j.wysocki, linux-kernel
On Sunday, October 12, 2014 10:40:32 PM Pavel Machek wrote:
> Bjorn, any ideas?
>
> Would it be feasible to revert 2e8b... to see if it fixes it on 3.17?
That's a merge, isn't it?
I'd rather check what the pci/misc branch was based on and then bisect that
branch.
If you do
$ git show fed2451
you'll see (among other things) that this indeed is the PCI branch merged
by that commit and that it is based on
3b2f64d00c46 Linux 3.11-rc2
So, you can do
$ git bisect start fed2451 3b2f64d00c46
and see which of the commits in there introduced the problem you're seeing.
Note: Test fed2451 itself *first* and if that is bad already, then the merge
itself was problematic, in which case please let me know.
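To illustrate why each bisect step needs a thorough test here, the
following is a minimal Python sketch of a binary search over a linear
list of commits. It is only a simplified model, not how git bisect walks
real history, and every name in it (first_bad, make_test, the integer
"commits", the choice of commit 42) is hypothetical. It assumes, as in
the report, that a broken kernel only fails on its third resume.

```python
def first_bad(commits, is_bad):
    """Return the first commit for which is_bad() is true.

    Assumptions: commits are ordered oldest first, commits[0] is known
    good, commits[-1] is known bad, and every commit after the first bad
    one is bad as well.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


def make_test(first_broken, cycles):
    """Build a hypothetical test that runs `cycles` suspend+resume cycles.

    A broken build is modelled as hanging on its third resume, so a test
    with fewer than three cycles never notices the problem.
    """
    def is_bad(commit):
        return commit >= first_broken and cycles >= 3
    return is_bad


commits = list(range(100))  # integers standing in for commit hashes

# Thorough test: enough cycles to reach the failing third resume.
print(first_bad(commits, make_test(first_broken=42, cycles=5)))
# Test that is too short: every step reports "good", so the search
# drifts to the last commit and blames the wrong one.
print(first_bad(commits, make_test(first_broken=42, cycles=2)))
```

In this model a single wrongly "good" verdict is enough to send the
search past the real culprit, so it seems worth running well over three
cycles at every step before marking a commit good.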
--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
* Re: Machine crashes right *after* ~successful resume
2014-10-13 15:06 ` Rafael J. Wysocki
@ 2014-10-15 11:16 ` Wilmer van der Gaast
2014-10-15 13:58 ` Bjorn Helgaas
0 siblings, 1 reply; 51+ messages in thread
From: Wilmer van der Gaast @ 2014-10-15 11:16 UTC (permalink / raw)
To: Rafael J. Wysocki; +Cc: Pavel Machek, bhelgaas, rafael.j.wysocki, linux-kernel
Hello Rafael,
Rafael J. Wysocki (rjw@rjwysocki.net) wrote:
> > Would it be feasible to revert 2e8b... to see if it fixes it on 3.17?
> That's a merge, isn't it?
>
Correct, it was, and I did try to figure out which of its parents was
the guilty one, but then I found out the real problem is
928bea964827d7824b548c1f8e06eccbbc4d0d7d.
Not sure why 2e8b... was initially found guilty by git bisect, I fear
that my testing was not thorough enough. I've verified a couple of times
now that 928bea96... does cause crashes and the previous revision does not.
928bea... seems to reshuffle PCI initialisation a little bit and has
caused more trouble, judging from a Google query for it. Some changes
were made already as a result, and this unfortunately makes a revert on
a later kernel tree (to see if that fixes the problem for me) much less
straightforward. :-(
I can look at the code and see how to revert this now, but I'm
definitely not very proficient outside userland.
Wilmer v/d Gaast.
* Re: Machine crashes right *after* ~successful resume
2014-10-15 11:16 ` Wilmer van der Gaast
@ 2014-10-15 13:58 ` Bjorn Helgaas
2014-10-15 18:39 ` Yinghai Lu
0 siblings, 1 reply; 51+ messages in thread
From: Bjorn Helgaas @ 2014-10-15 13:58 UTC (permalink / raw)
To: Wilmer van der Gaast
Cc: Rafael J. Wysocki, Pavel Machek, Rafael Wysocki, linux-kernel,
Yinghai Lu
[+cc Yinghai, author of 928bea964827 ("PCI: Delay enabling bridges
until they're needed")]
On Wed, Oct 15, 2014 at 5:16 AM, Wilmer van der Gaast <wilmer@gaast.net> wrote:
> Hello Rafael,
>
> Rafael J. Wysocki (rjw@rjwysocki.net) wrote:
>> > Would it be feasible to revert 2e8b... to see if it fixes it on 3.17?
>> That's a merge, isn't it?
>>
> Correct, it was, and I did try to figure out which of its parents was
> the guilty one, but then I found out the real problem is
> 928bea964827d7824b548c1f8e06eccbbc4d0d7d.
>
> Not sure why 2e8b... was initially found guilty by git bisect, I fear
> that my testing was not thorough enough. I've verified a couple of times
> now that 928bea96... does cause crashes and the previous revision does not.
>
> 928bea... seems to reshuffle PCI initialisation a little bit and has
> caused more troubles, judging from a Google query for it. Some changes
> were made already as a result, and this unfortunately makes a revert on
> a later kernel tree (to see if that fixes the problem for me) much less
> straight-forward. :-(
More details (from initial post) here: http://roy.gaast.net/~wilmer/.lkml/
Can you open a report at http://bugzilla.kernel.org, please? Please
also attach the complete "lspci -vv" output.
Bjorn
* Re: Machine crashes right *after* ~successful resume
2014-10-15 13:58 ` Bjorn Helgaas
@ 2014-10-15 18:39 ` Yinghai Lu
2014-10-15 23:34 ` Wilmer van der Gaast
0 siblings, 1 reply; 51+ messages in thread
From: Yinghai Lu @ 2014-10-15 18:39 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Wilmer van der Gaast, Rafael J. Wysocki, Pavel Machek,
Rafael Wysocki, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 1050 bytes --]
On Wed, Oct 15, 2014 at 6:58 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> [+cc Yinghai, author of 928bea964827 ("PCI: Delay enabling bridges
> until they're needed")]
>
> On Wed, Oct 15, 2014 at 5:16 AM, Wilmer van der Gaast <wilmer@gaast.net>
>> Not sure why 2e8b... was initially found guilty by git bisect, I fear
>> that my testing was not thorough enough. I've verified a couple of times
>> now that 928bea96... does cause crashes and the previous revision does not.
so third resume will not work? that is strange.
second and third should not use same code path...
>>
>> 928bea... seems to reshuffle PCI initialisation a little bit and has
>> caused more troubles, judging from a Google query for it. Some changes
>> were made already as a result, and this unfortunately makes a revert on
>> a later kernel tree (to see if that fixes the problem for me) much less
>> straight-forward. :-(
>
> More details (from initial post) here: http://roy.gaast.net/~wilmer/.lkml/
Please check whether the attached revert patch works on 3.17.
Yinghai
[-- Attachment #2: revert_928bea9_from_3.17.patch --]
[-- Type: text/x-patch, Size: 7187 bytes --]
diff --git a/arch/arm/kernel/bios32.c b/arch/arm/kernel/bios32.c
index 17a26c1..306ca53 100644
--- a/arch/arm/kernel/bios32.c
+++ b/arch/arm/kernel/bios32.c
@@ -538,6 +538,12 @@ void pci_common_init_dev(struct device *parent, struct hw_pci *hw)
* Assign resources.
*/
pci_bus_assign_resources(bus);
+
+
+ /*
+ * Enable bridges
+ */
+ pci_enable_bridges(bus);
}
/*
diff --git a/arch/m68k/coldfire/pci.c b/arch/m68k/coldfire/pci.c
index df96792..b33f97a 100644
--- a/arch/m68k/coldfire/pci.c
+++ b/arch/m68k/coldfire/pci.c
@@ -319,6 +319,7 @@ static int __init mcf_pci_init(void)
pci_fixup_irqs(pci_common_swizzle, mcf_pci_map_irq);
pci_bus_size_bridges(rootbus);
pci_bus_assign_resources(rootbus);
+ pci_enable_bridges(rootbus);
return 0;
}
diff --git a/arch/mips/pci/pci.c b/arch/mips/pci/pci.c
index 1bf60b1..4f2e17d 100644
--- a/arch/mips/pci/pci.c
+++ b/arch/mips/pci/pci.c
@@ -113,6 +113,7 @@ static void pcibios_scanbus(struct pci_controller *hose)
if (!pci_has_flag(PCI_PROBE_ONLY)) {
pci_bus_size_bridges(bus);
pci_bus_assign_resources(bus);
+ pci_enable_bridges(bus);
}
}
}
diff --git a/arch/sh/drivers/pci/pci.c b/arch/sh/drivers/pci/pci.c
index 1bc09ee..5272327 100644
--- a/arch/sh/drivers/pci/pci.c
+++ b/arch/sh/drivers/pci/pci.c
@@ -69,6 +69,7 @@ static void pcibios_scanbus(struct pci_channel *hose)
pci_bus_size_bridges(bus);
pci_bus_assign_resources(bus);
+ pci_enable_bridges(bus);
} else {
pci_free_resource_list(&resources);
}
diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
index cd4de7e..c15bc3c 100644
--- a/drivers/acpi/pci_root.c
+++ b/drivers/acpi/pci_root.c
@@ -614,6 +614,9 @@ static int acpi_pci_root_add(struct acpi_device *device,
if (system_state != SYSTEM_BOOTING) {
pcibios_resource_survey_bus(root->bus);
pci_assign_unassigned_root_bus_resources(root->bus);
+
+ /* need to after hot-added ioapic is registered */
+ pci_enable_bridges(root->bus);
}
pci_lock_rescan_remove();
diff --git a/drivers/parisc/lba_pci.c b/drivers/parisc/lba_pci.c
index 37e71ff..19f6f70 100644
--- a/drivers/parisc/lba_pci.c
+++ b/drivers/parisc/lba_pci.c
@@ -1590,6 +1590,7 @@ lba_driver_probe(struct parisc_device *dev)
lba_dump_res(&lba_dev->hba.lmmio_space, 2);
#endif
}
+ pci_enable_bridges(lba_bus);
/*
** Once PCI register ops has walked the bus, access to config
diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 73aef51..761601e 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -283,6 +283,26 @@ void pci_bus_add_devices(const struct pci_bus *bus)
}
EXPORT_SYMBOL(pci_bus_add_devices);
+void pci_enable_bridges(struct pci_bus *bus)
+{
+ struct pci_dev *dev;
+ int retval;
+
+ list_for_each_entry(dev, &bus->devices, bus_list) {
+ if (dev->subordinate) {
+ if (!pci_is_enabled(dev)) {
+ retval = pci_enable_device(dev);
+ if (retval)
+ dev_err(&dev->dev, "Error enabling bridge (%d), continuing\n", retval);
+ pci_set_master(dev);
+ }
+ pci_enable_bridges(dev->subordinate);
+ }
+ }
+}
+EXPORT_SYMBOL(pci_enable_bridges);
+
+
/** pci_walk_bus - walk devices on/under bus, calling callback.
* @top bus whose devices should be walked
* @cb callback to be called for each device found
diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
index bcb90e4..8fadc84 100644
--- a/drivers/pci/hotplug/acpiphp_glue.c
+++ b/drivers/pci/hotplug/acpiphp_glue.c
@@ -511,6 +511,7 @@ static void enable_slot(struct acpiphp_slot *slot)
acpiphp_sanitize_bus(bus);
pcie_bus_configure_settings(bus);
acpiphp_set_acpi_region(slot);
+ pci_enable_bridges(bus);
list_for_each_entry(dev, &bus->devices, bus_list) {
/* Assume that newly added devices are powered on already. */
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 625a4ac..4121518 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1242,31 +1242,8 @@ int pci_reenable_device(struct pci_dev *dev)
}
EXPORT_SYMBOL(pci_reenable_device);
-static void pci_enable_bridge(struct pci_dev *dev)
-{
- struct pci_dev *bridge;
- int retval;
-
- bridge = pci_upstream_bridge(dev);
- if (bridge)
- pci_enable_bridge(bridge);
-
- if (pci_is_enabled(dev)) {
- if (!dev->is_busmaster)
- pci_set_master(dev);
- return;
- }
-
- retval = pci_enable_device(dev);
- if (retval)
- dev_err(&dev->dev, "Error enabling bridge (%d), continuing\n",
- retval);
- pci_set_master(dev);
-}
-
static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
{
- struct pci_dev *bridge;
int err;
int i, bars = 0;
@@ -1285,10 +1262,6 @@ static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
if (atomic_inc_return(&dev->enable_cnt) > 1)
return 0; /* already enabled */
- bridge = pci_upstream_bridge(dev);
- if (bridge)
- pci_enable_bridge(bridge);
-
/* only skip sriov related */
for (i = 0; i <= PCI_ROM_RESOURCE; i++)
if (dev->resource[i].flags & flags)
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 5ed9930..df17ba8 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2177,6 +2177,7 @@ unsigned int pci_rescan_bus(struct pci_bus *bus)
max = pci_scan_child_bus(bus);
pci_assign_unassigned_bus_resources(bus);
+ pci_enable_bridges(bus);
pci_bus_add_devices(bus);
return max;
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 0482235..2cfb1eb 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1587,7 +1587,7 @@ again:
/* any device complain? */
if (list_empty(&fail_head))
- goto dump;
+ goto enable_and_dump;
if (tried_times >= pci_try_num) {
if (enable_local == undefined)
@@ -1596,7 +1596,7 @@ again:
dev_info(&bus->dev, "Automatically enabled pci realloc, if you have problem, try booting with pci=realloc=off\n");
free_list(&fail_head);
- goto dump;
+ goto enable_and_dump;
}
dev_printk(KERN_DEBUG, &bus->dev,
@@ -1629,7 +1629,10 @@ again:
goto again;
-dump:
+enable_and_dump:
+ /* Depth last, update the hardware. */
+ pci_enable_bridges(bus);
+
/* dump the resource on buses */
pci_bus_dump_resources(bus);
}
@@ -1700,6 +1703,7 @@ enable_all:
if (retval)
dev_err(&bridge->dev, "Error reenabling bridge (%d)\n", retval);
pci_set_master(bridge);
+ pci_enable_bridges(parent);
}
EXPORT_SYMBOL_GPL(pci_assign_unassigned_bridge_resources);
diff --git a/drivers/pcmcia/cardbus.c b/drivers/pcmcia/cardbus.c
index 4fe4cc4..9cbe4cf 100644
--- a/drivers/pcmcia/cardbus.c
+++ b/drivers/pcmcia/cardbus.c
@@ -92,6 +92,7 @@ int __ref cb_alloc(struct pcmcia_socket *s)
if (s->tune_bridge)
s->tune_bridge(s, bus);
+ pci_enable_bridges(bus);
pci_bus_add_devices(bus);
pci_unlock_rescan_remove();
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 5be8db4..1f85fb5 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1105,7 +1105,7 @@ int __must_check pci_bus_alloc_resource(struct pci_bus *bus,
resource_size_t,
resource_size_t),
void *alignf_data);
-
+void pci_enable_bridges(struct pci_bus *bus);
int pci_remap_iospace(const struct resource *res, phys_addr_t phys_addr);
* Re: Machine crashes right *after* ~successful resume
2014-10-15 18:39 ` Yinghai Lu
@ 2014-10-15 23:34 ` Wilmer van der Gaast
2014-10-16 4:32 ` Yinghai Lu
0 siblings, 1 reply; 51+ messages in thread
From: Wilmer van der Gaast @ 2014-10-15 23:34 UTC (permalink / raw)
To: Yinghai Lu
Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
linux-kernel
Hello Yinghai,
On 15-10-14 19:39, Yinghai Lu wrote:
>
> so third resume will not work? that is strange.
> second and third should not use same code path...
>
Always exactly the third time, yes. Seems strange indeed. :-( I was
under the impression that the completion time of device resumes was
growing with each resume, and wondered whether that could be related.
However, looking back at my logs, this is not consistent; in some cases
the time is constant.
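For anyone wanting to check the same thing in their own logs, here is a
minimal Python sketch that pulls the device resume times out of kernel
log lines. It only assumes the "PM: resume of devices complete after
... msecs" message format seen in this thread; the function names are
made up for illustration.

```python
import re

# Matches the message format quoted earlier in this thread.
RESUME_RE = re.compile(r"PM: resume of devices complete after ([0-9.]+) msecs")


def resume_times_ms(log_lines):
    """Return the device resume durations (in ms) found in log_lines."""
    times = []
    for line in log_lines:
        match = RESUME_RE.search(line)
        if match:
            times.append(float(match.group(1)))
    return times


def strictly_growing(times):
    """True if every resume took longer than the one before it."""
    return all(b > a for a, b in zip(times, times[1:]))


# Sample lines copied from two different logs quoted in this thread, so
# the comparison below is only a demonstration of the helpers.
sample = [
    "[ 153.787678] PM: resume of devices complete after 3797.737 msecs",
    "[ 153.787775] PM: resume devices took 3.796 seconds",
    "[ 54.203166] PM: resume of devices complete after 2142.341 msecs",
]
times = resume_times_ms(sample)
print(times, strictly_growing(times))
```

Running it over the dmesg of one boot with several cycles would show
whether the times really grow before the failing resume.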
Anyway, your patch works! Had to tweak it slightly to apply cleanly to
the 3.17 tarball I have, but my machine now went through eleven
successful suspend+resume cycles again.
Is there anything I can do now to find out why your change is causing my
machine to crash?
Thank you!
Wilmer v/d Gaast.
--
+-------- .''`. - -- ---+ + - -- --- ---- ----- ------+
| wilmer : :' : gaast.net | | OSS Programmer www.bitlbee.org |
| lintux `. `~' debian.org | | Full-time geek wilmer.gaast.net |
+--- -- - ` ---------------+ +------ ----- ---- --- -- - +
* Re: Machine crashes right *after* ~successful resume
2014-10-15 23:34 ` Wilmer van der Gaast
@ 2014-10-16 4:32 ` Yinghai Lu
2014-10-16 9:36 ` Wilmer van der Gaast
0 siblings, 1 reply; 51+ messages in thread
From: Yinghai Lu @ 2014-10-16 4:32 UTC (permalink / raw)
To: Wilmer van der Gaast
Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
linux-kernel
[-- Attachment #1: Type: text/plain, Size: 468 bytes --]
On Wed, Oct 15, 2014 at 4:34 PM, Wilmer van der Gaast <wilmer@gaast.net> wrote:
>
> Is there anything I can do now to find out why your change is causing my
> machine to crash?
Can you please try the attached patch? It should work around the problem,
as some drivers use pci_enable_device() in .resume instead of
pci_reenable_device().
We should skip pci_enable_bridge() in those pci_enable_device() calls to
avoid contention between async device_resume calls.
Thanks
Yinghai
[-- Attachment #2: skip_enable_bridge_on_resume_path.patch --]
[-- Type: text/x-patch, Size: 1081 bytes --]
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 625a4ac..6567831 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1266,7 +1266,6 @@ static void pci_enable_bridge(struct pci_dev *dev)
static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
{
- struct pci_dev *bridge;
int err;
int i, bars = 0;
@@ -1285,9 +1284,19 @@ static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
if (atomic_inc_return(&dev->enable_cnt) > 1)
return 0; /* already enabled */
- bridge = pci_upstream_bridge(dev);
- if (bridge)
- pci_enable_bridge(bridge);
+ /*
+ * Do not enable bridge again on resume path, as parent state
+ * get restored before.
+ * Also could avoid delay between different async resume.
+ */
+ if (!(dev->dev.power.is_suspended ||
+ dev->dev.power.is_noirq_suspended ||
+ dev->dev.power.is_late_suspended)) {
+ struct pci_dev *bridge = pci_upstream_bridge(dev);
+
+ if (bridge)
+ pci_enable_bridge(bridge);
+ }
/* only skip sriov related */
for (i = 0; i <= PCI_ROM_RESOURCE; i++)
* Re: Machine crashes right *after* ~successful resume
2014-10-16 4:32 ` Yinghai Lu
@ 2014-10-16 9:36 ` Wilmer van der Gaast
2014-10-16 16:36 ` Yinghai Lu
0 siblings, 1 reply; 51+ messages in thread
From: Wilmer van der Gaast @ 2014-10-16 9:36 UTC (permalink / raw)
To: Yinghai Lu
Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
linux-kernel
Hello,
On 16-10-14 05:32, Yinghai Lu wrote:
>
> Can you please try attached patch? that should workaround the problem.
>
Sadly, no luck. (I do assume you meant me to use the patch against a
clean 3.17 tree *without* yesterday's revert patch applied.) Back to a
crash at/after the third resume:
[ 372.502897] usb 3-1.1: reset high-speed USB device number 3 using
ehci-pci
[ 372.678765] usb 2-1.5: reset low-speed USB device number 3 using ehci-pci
[ 373.398437] Clocksource tsc unstable (delta = -136457848 ns)
[ 373.897503] Switched to clocksource hpet
[ 373.897536] PM: resume of devices complete after 2143.535 msecs
[ 373.898225] r8169 0000:07:00.0 eth0: link up
[ 374.319311] Restarting tasks ... done.
(And then nothing.)
Interestingly, I did see the "resume of devices" time grow on each resume
again this time. I'll put the full dmesg dump in the same place as
before: http://gaast.net/~wilmer/.lkml/
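To check whether the "resume of devices" time really grows from one
resume to the next, the figure can be pulled out of the saved serial
logs. A minimal userspace sketch, assuming the log line format shown
above; parse_resume_msecs() and strictly_growing() are hypothetical
helper names, not existing tools:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Extract the duration in msecs from a dmesg line such as
 *   "[  373.897536] PM: resume of devices complete after 2143.535 msecs"
 * Returns 1 and stores the value in *msecs on a match, 0 otherwise.
 */
int parse_resume_msecs(const char *line, double *msecs)
{
	static const char marker[] = "PM: resume of devices complete after ";
	const char *p;

	assert(line != NULL && msecs != NULL);

	p = strstr(line, marker);
	if (!p)
		return 0;
	return sscanf(p + strlen(marker), "%lf", msecs) == 1;
}

/*
 * Returns 1 if every value in vals[1..n-1] is strictly larger than
 * the one before it, 0 otherwise.
 */
int strictly_growing(const double *vals, size_t n)
{
	size_t i;

	assert(vals != NULL || n == 0);

	for (i = 1; i < n; i++)
		if (vals[i] <= vals[i - 1])
			return 0;
	return 1;
}
```

Feeding each line of one boot's log to parse_resume_msecs() and the
collected values to strictly_growing() shows whether the growth holds
for that run.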
There's a lspci -vv dump there as well, as Bjorn asked for. I'll file a
bug on bugzilla tonight.
> as some driver is using pci_enable_device in .resume instead of
> pci_reenable_device....
>
Maybe this doesn't matter, but I could reproduce this issue even with no
modules loaded at all (so barebone that I couldn't even mount my rootfs
and had to do this testing in the initrd), so with only mainline kernel
code running.
Thanks,
Wilmer v/d Gaast.
* Re: Machine crashes right *after* ~successful resume
2014-10-16 9:36 ` Wilmer van der Gaast
@ 2014-10-16 16:36 ` Yinghai Lu
2014-10-16 21:08 ` Wilmer van der Gaast
0 siblings, 1 reply; 51+ messages in thread
From: Yinghai Lu @ 2014-10-16 16:36 UTC (permalink / raw)
To: Wilmer van der Gaast
Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
linux-kernel
On Thu, Oct 16, 2014 at 2:36 AM, Wilmer van der Gaast <wilmer@gaast.net> wrote:
> Hello,
>
> On 16-10-14 05:32, Yinghai Lu wrote:
>>
>>
>> Can you please try attached patch? that should workaround the problem.
>>
> Sadly, no luck. (I do assume you meant me to use the patch against a clean
> 3.17 tree *without* yesterday's revert patch applied.) Back to a crash
> at/after the third resume:
>
> [ 372.502897] usb 3-1.1: reset high-speed USB device number 3 using
> ehci-pci
> [ 372.678765] usb 2-1.5: reset low-speed USB device number 3 using ehci-pci
> [ 373.398437] Clocksource tsc unstable (delta = -136457848 ns)
> [ 373.897503] Switched to clocksource hpet
> [ 373.897536] PM: resume of devices complete after 2143.535 msecs
> [ 373.898225] r8169 0000:07:00.0 eth0: link up
> [ 374.319311] Restarting tasks ... done.
> (And then nothing.)
>
> Interestingly I did see the "resume of devices" time grow on each resume
> again this time. I'll put the full dmesg dump in the same place like before:
> http://gaast.net/~wilmer/.lkml/
Checked that dmesg and console output, looks ok from last resume.
Can you put "debug ignore_loglevel" in boot command line?
So we can compare output from serial console between good one and bad
one directly.
Also did you try to remove r8169 every time before suspend?
Thanks
Yinghai
* Re: Machine crashes right *after* ~successful resume
2014-10-16 16:36 ` Yinghai Lu
@ 2014-10-16 21:08 ` Wilmer van der Gaast
2014-10-18 21:28 ` Yinghai Lu
0 siblings, 1 reply; 51+ messages in thread
From: Wilmer van der Gaast @ 2014-10-16 21:08 UTC (permalink / raw)
To: Yinghai Lu
Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
linux-kernel
Hello,
I have filed a bug now:
https://bugzilla.kernel.org/show_bug.cgi?id=86421 We should probably
continue the discussion there now? I've added just you to the CC field,
not sure who else on this thread is still interested at this point.
On 16-10-14 17:36, Yinghai Lu wrote:
>
> Can you put "debug ignore_loglevel" in boot command line?
> So we can compare output from serial console between good one and bad
> one directly.
>
Did that, will throw the output in the same log dir. Those arguments
resulted in very little extra output. :-/
> Also did you try to remove r8169 every time before suspend?
>
Did that on this run, no difference either. For full completeness, I
reproduced this problem with no modules loaded (done from initramfs) at
all, with a kernel with your workaround included, logs are here:
http://gaast.net/~wilmer/.lkml/bad3.17-patched-debug-initramfs.txt
Wilmer v/d Gaast.
* Re: Machine crashes right *after* ~successful resume
2014-10-16 21:08 ` Wilmer van der Gaast
@ 2014-10-18 21:28 ` Yinghai Lu
2014-10-18 23:57 ` Wilmer van der Gaast
0 siblings, 1 reply; 51+ messages in thread
From: Yinghai Lu @ 2014-10-18 21:28 UTC (permalink / raw)
To: Wilmer van der Gaast
Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
linux-kernel
[-- Attachment #1: Type: text/plain, Size: 586 bytes --]
On Thu, Oct 16, 2014 at 2:08 PM, Wilmer van der Gaast <wilmer@gaast.net> wrote:
> Did that on this run, no difference either. For full completeness, I
> reproduced this problem with no modules loaded (done from initramfs) at all,
> with a kernel with your workaround included, logs are here:
> http://gaast.net/~wilmer/.lkml/bad3.17-patched-debug-initramfs.txt
Yes, that output looks good.
Please apply the attached debug patch on top of v3.17 and boot with
"debug ignore_loglevel initcall_debug no_console_suspend".
Hopefully we can find out which notifier block (nb) causes the problem.
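With the attached debug patch, every PM_POST_SUSPEND notifier logs
"PM: calling nb <fn>" before it runs and "PM: ... nb <fn> done"
afterwards, so a notifier that hangs would show up as a "calling" line
with no matching "done". A minimal userspace sketch for scanning a
captured log for that pattern; find_pending_notifier() is a
hypothetical helper name and the buffer size is arbitrary:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Scan log lines in order. If the most recent "PM: calling nb <name>"
 * line has no later "PM: ... nb <name> done" line, copy <name> into
 * out and return 1. Return 0 if every call was followed by its "done".
 */
int find_pending_notifier(const char *const *lines, size_t n,
			  char *out, size_t outlen)
{
	static const char call_tag[] = "PM: calling nb ";
	static const char done_tag[] = "PM: ... nb ";
	char pending[128] = "";
	size_t i;

	assert(lines != NULL && out != NULL && outlen > 0);

	for (i = 0; i < n; i++) {
		const char *p = strstr(lines[i], call_tag);

		if (p) {
			/* Remember the most recent notifier that was entered. */
			snprintf(pending, sizeof(pending), "%s",
				 p + strlen(call_tag));
			pending[strcspn(pending, "\r\n")] = '\0';
			continue;
		}

		p = strstr(lines[i], done_tag);
		if (p && pending[0]) {
			const char *name = p + strlen(done_tag);
			size_t len = strlen(pending);

			/* Clear the pending entry once its "done" is seen. */
			if (strncmp(name, pending, len) == 0 &&
			    strncmp(name + len, " done", 5) == 0)
				pending[0] = '\0';
		}
	}

	if (!pending[0])
		return 0;
	snprintf(out, outlen, "%s", pending);
	return 1;
}
```

If this returns 0 for the log of a failed resume, the hang is somewhere
after the notifier chain rather than inside one of the notifiers.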
Thanks
Yinghai
[-- Attachment #2: debug_suspend_resume_x.patch --]
[-- Type: text/x-patch, Size: 1849 bytes --]
---
kernel/notifier.c | 9 +++++++++
kernel/power/main.c | 4 +++-
2 files changed, 12 insertions(+), 1 deletion(-)
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -24,16 +24,18 @@ DEFINE_MUTEX(pm_mutex);
/* Routines for PM-transition notifications */
-static BLOCKING_NOTIFIER_HEAD(pm_chain_head);
+BLOCKING_NOTIFIER_HEAD(pm_chain_head);
int register_pm_notifier(struct notifier_block *nb)
{
+ pr_info("PM: registering nb %pF\n", nb->notifier_call);
return blocking_notifier_chain_register(&pm_chain_head, nb);
}
EXPORT_SYMBOL_GPL(register_pm_notifier);
int unregister_pm_notifier(struct notifier_block *nb)
{
+ pr_info("PM: unregistering nb %pF\n", nb->notifier_call);
return blocking_notifier_chain_unregister(&pm_chain_head, nb);
}
EXPORT_SYMBOL_GPL(unregister_pm_notifier);
Index: linux-2.6/kernel/notifier.c
===================================================================
--- linux-2.6.orig/kernel/notifier.c
+++ linux-2.6/kernel/notifier.c
@@ -59,6 +59,9 @@ static int notifier_chain_unregister(str
return -ENOENT;
}
+extern struct blocking_notifier_head pm_chain_head;
+#define PM_POST_SUSPEND 0x0004 /* Suspend finished */
+
/**
* notifier_call_chain - Informs the registered notifiers about an event.
* @nl: Pointer to head of the blocking notifier chain
@@ -90,8 +93,14 @@ static int notifier_call_chain(struct no
continue;
}
#endif
+ if (nl == &pm_chain_head.head && val == PM_POST_SUSPEND)
+ pr_info("PM: calling nb %pF\n", nb->notifier_call);
+
ret = nb->notifier_call(nb, val, v);
+ if (nl == &pm_chain_head.head && val == PM_POST_SUSPEND)
+ pr_info("PM: ... nb %pF done\n", nb->notifier_call);
+
if (nr_calls)
(*nr_calls)++;
* Re: Machine crashes right *after* ~successful resume
2014-10-18 21:28 ` Yinghai Lu
@ 2014-10-18 23:57 ` Wilmer van der Gaast
2014-10-19 4:29 ` Yinghai Lu
2014-10-19 8:07 ` Pavel Machek
0 siblings, 2 replies; 51+ messages in thread
From: Wilmer van der Gaast @ 2014-10-18 23:57 UTC (permalink / raw)
To: Yinghai Lu
Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
linux-kernel
(Resending, forgot to hit reply-to-all.)
Hello Yinghai,
On 18-10-14 22:28, Yinghai Lu wrote:
>
> Please apply attached debug patch on top of v3.17 and boot with
> "debug ignore_loglevel initcall_debug no_console_suspend".
>
> Hope we can find out which nb notifier cause problem.
>
Did that. Strangely, or better said, quite annoyingly, I'm now getting
no output anymore at all on the third resume! :-(
I could try non-serial instead if you think that's worth a shot, but the
most annoying thing is that my video doesn't get initialised properly
after resume unless I have the tainting nvidia driver loaded. I could
try if nouveau helps.
I've dropped all the debugging output in the same directory as before,
look for files named like
http://roy.gaast.net/~wilmer/.lkml/bad3.17-patched-initcall.txt
Thanks,
Wilmer v/d Gaast.
* Re: Machine crashes right *after* ~successful resume
2014-10-18 23:57 ` Wilmer van der Gaast
@ 2014-10-19 4:29 ` Yinghai Lu
2014-10-19 10:48 ` Wilmer van der Gaast
2014-10-21 21:40 ` Wilmer van der Gaast
2014-10-19 8:07 ` Pavel Machek
1 sibling, 2 replies; 51+ messages in thread
From: Yinghai Lu @ 2014-10-19 4:29 UTC (permalink / raw)
To: Wilmer van der Gaast
Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
linux-kernel
On Sat, Oct 18, 2014 at 4:57 PM, Wilmer van der Gaast <wilmer@gaast.net> wrote:
> On 18-10-14 22:28, Yinghai Lu wrote:
>>
>> Please apply attached debug patch on top of v3.17 and boot with
>> "debug ignore_loglevel initcall_debug no_console_suspend".
>>
>> Hope we can find out which nb notifier cause problem.
>>
> Did that. Strangely, or better said, quite annoyingly, I'm now getting no
> output anymore at all on the third resume! :-(
>
> I could try non-serial instead if you think that's worth a shot, but the
> most annoying thing is that my video doesn't get initialised properly after
> resume unless I have the tainting nvidia driver loaded. I could try if
> nouveau helps.
oh no.
Please try booting with "debug ignore_loglevel no_console_suspend".
Thanks
Yinghai
* Re: Machine crashes right *after* ~successful resume
2014-10-18 23:57 ` Wilmer van der Gaast
2014-10-19 4:29 ` Yinghai Lu
@ 2014-10-19 8:07 ` Pavel Machek
1 sibling, 0 replies; 51+ messages in thread
From: Pavel Machek @ 2014-10-19 8:07 UTC (permalink / raw)
To: Wilmer van der Gaast
Cc: Yinghai Lu, Bjorn Helgaas, Rafael J. Wysocki, Rafael Wysocki,
linux-kernel
On Sun 2014-10-19 00:57:12, Wilmer van der Gaast wrote:
> (Resending, forgot to hit reply-to-all.)
>
> Hello Yinghai,
>
> On 18-10-14 22:28, Yinghai Lu wrote:
> >
> > Please apply attached debug patch on top of v3.17 and boot with
> > "debug ignore_loglevel initcall_debug no_console_suspend".
> >
> > Hope we can find out which nb notifier cause problem.
> >
> Did that. Strangely, or better said, quite annoyingly, I'm now getting no
> output anymore at all on the third resume! :-(
>
> I could try non-serial instead if you think that's worth a shot, but the
> most annoying thing is that my video doesn't get initialised properly after
> resume unless I have the tainting nvidia driver loaded. I could try if
> nouveau helps.
Tainting should not be a problem. If it works for you, it works...
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: Machine crashes right *after* ~successful resume
2014-10-19 4:29 ` Yinghai Lu
@ 2014-10-19 10:48 ` Wilmer van der Gaast
2014-10-21 21:40 ` Wilmer van der Gaast
1 sibling, 0 replies; 51+ messages in thread
From: Wilmer van der Gaast @ 2014-10-19 10:48 UTC (permalink / raw)
To: Yinghai Lu
Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
linux-kernel
Hello,
On 19-10-14 05:29, Yinghai Lu wrote:
>
> Please try to "debug ignore_loglevel no_console_suspend".
>
Same thing. :-(
[ 72.572354] Restarting tasks ... done.
[ 72.576554] PM: calling nb rcu_pm_notify+0x0/0x60
[ 72.581277] PM: ... nb rcu_pm_notify+0x0/0x60 done
[ 72.586115] PM: calling nb cpu_hotplug_pm_callback+0x0/0x50
[ 72.591692] PM: ... nb cpu_hotplug_pm_callback+0x0/0x50 done
[ 72.597345] PM: calling nb fw_pm_notify+0x0/0x150
[ 72.602047] PM: ... nb fw_pm_notify+0x0/0x150 done
[ 72.606839] PM: calling nb bsp_pm_callback+0x0/0x50
[ 72.611711] PM: ... nb bsp_pm_callback+0x0/0x50 done
[ 73.382175] r8169 0000:07:00.0 eth0: link up
[ 78.857526] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 79.025718] ata3.00: configured for UDMA/133
[ 81.379533] ata4: softreset failed (device not ready)
[ 82.623212] PM: Syncing filesystems ... done.
[ 82.661564] PM: Preparing system for mem sleep
[ 82.669405] Freezing user space processes ... (elapsed 0.001 seconds)
done.
[ 82.677729] Freezing remaining freezable tasks ... (elapsed 0.001
seconds) done.
[ 82.686338] PM: Entering mem sleep
And nothing related to resume. :-(
Is there any point in me retrying with the initcall_debug flag but
without your patch?
Looking at your patch again, it seems pretty mad that this would cause
such a big difference. Overnight I remembered how my machine has TSC
issues at the time this bug shows, so I tried setting hpet as the
clocksource. (hpet=force on the cmdline did not seem to have that effect
so I used sysfs instead) No effect either.
I need to go now, can experiment a little more tonight.
Thanks,
Wilmer v/d Gaast.
* Re: Machine crashes right *after* ~successful resume
2014-10-19 4:29 ` Yinghai Lu
2014-10-19 10:48 ` Wilmer van der Gaast
@ 2014-10-21 21:40 ` Wilmer van der Gaast
2014-10-21 23:15 ` Yinghai Lu
1 sibling, 1 reply; 51+ messages in thread
From: Wilmer van der Gaast @ 2014-10-21 21:40 UTC (permalink / raw)
To: Yinghai Lu
Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
linux-kernel
Hello,
Sorry for the delay, finally poked at this again. It looks like the
no_console_suspend flag was causing troubles, which I didn't really need
anyway with logging going to my serial port.
This is what I get now on the failing resume:
[ 112.879390] PM: resume of devices complete after 2239.905 msecs
[ 112.880068] r8169 0000:07:00.0 eth0: link up
[ 112.880078] Switched to clocksource hpet
[ 116.069248] PM: Finishing wakeup.
[ 116.072574] Restarting tasks ... done.
[ 116.076664] PM: calling nb rcu_pm_notify+0x0/0x60
[ 116.081439] PM: ... nb rcu_pm_notify+0x0/0x60 done
[ 116.086267] PM: calling nb cpu_hotplug_pm_callback+0x0/0x50
[ 116.088526] systemd[1]: Got notification message for unit
systemd-journald.service
[ 116.099442] PM: ... nb cpu_hotplug_pm_callback+0x0/0x50 done
[ 116.105099] PM: calling nb fw_pm_notify+0x0/0x150
[ 116.109812] PM: ... nb fw_pm_notify+0x0/0x150 done
[ 116.114623] PM: calling nb bsp_pm_callback+0x0/0x50
[ 116.119504] PM: ... nb bsp_pm_callback+0x0/0x50 done
And then nothing, and it's hung. Looks the same to me (apart from the
tsc issues + hpet switch) as a successful resume:
[ 95.499513] PM: resume of devices complete after 1240.115 msecs
[ 96.368940] r8169 0000:07:00.0 eth0: link up
[ 98.676455] PM: Finishing wakeup.
[ 98.679765] Restarting tasks ... done.
[ 98.683821] PM: calling nb rcu_pm_notify+0x0/0x60
[ 98.688524] PM: ... nb rcu_pm_notify+0x0/0x60 done
[ 98.692044] systemd[1]: Got notification message for unit
systemd-journald.service
[ 98.700897] PM: calling nb cpu_hotplug_pm_callback+0x0/0x50
[ 98.706470] PM: ... nb cpu_hotplug_pm_callback+0x0/0x50 done
[ 98.712132] PM: calling nb fw_pm_notify+0x0/0x150
[ 98.716848] PM: ... nb fw_pm_notify+0x0/0x150 done
[ 98.721644] PM: calling nb bsp_pm_callback+0x0/0x50
[ 98.726536] PM: ... nb bsp_pm_callback+0x0/0x50 done
Full logs in http://gaast.net/~wilmer/.lkml/bad3.17-patched-megadebug.txt
Wilmer v/d Gaast.
* Re: Machine crashes right *after* ~successful resume
2014-10-21 21:40 ` Wilmer van der Gaast
@ 2014-10-21 23:15 ` Yinghai Lu
2014-10-22 12:53 ` Wilmer van der Gaast
0 siblings, 1 reply; 51+ messages in thread
From: Yinghai Lu @ 2014-10-21 23:15 UTC (permalink / raw)
To: Wilmer van der Gaast
Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
linux-kernel
[-- Attachment #1: Type: text/plain, Size: 1360 bytes --]
On Tue, Oct 21, 2014 at 2:40 PM, Wilmer van der Gaast <wilmer@gaast.net> wrote:
> Hello,
>
> Sorry for the delay, finally poked at this again. It looks like the
> no_console_suspend flag was causing troubles, which I didn't really need
> anyway with logging going to my serial port.
>
> This is what I get now on the failing resume:
>
> [ 112.879390] PM: resume of devices complete after 2239.905 msecs
> [ 112.880068] r8169 0000:07:00.0 eth0: link up
> [ 112.880078] Switched to clocksource hpet
> [ 116.069248] PM: Finishing wakeup.
> [ 116.072574] Restarting tasks ... done.
> [ 116.076664] PM: calling nb rcu_pm_notify+0x0/0x60
> [ 116.081439] PM: ... nb rcu_pm_notify+0x0/0x60 done
> [ 116.086267] PM: calling nb cpu_hotplug_pm_callback+0x0/0x50
> [ 116.088526] systemd[1]: Got notification message for unit
> systemd-journald.service
> [ 116.099442] PM: ... nb cpu_hotplug_pm_callback+0x0/0x50 done
> [ 116.105099] PM: calling nb fw_pm_notify+0x0/0x150
> [ 116.109812] PM: ... nb fw_pm_notify+0x0/0x150 done
> [ 116.114623] PM: calling nb bsp_pm_callback+0x0/0x50
> [ 116.119504] PM: ... nb bsp_pm_callback+0x0/0x50 done
>
> And then nothing, and it's hung. Looks the same to me (apart from the tsc
> issues + hpet switch) as a successful resume:
Then it is stuck in pm_restore_console()?
Please try the attached debug patch.
Thanks
Yinghai
[-- Attachment #2: debug_suspend_resume_y.patch --]
[-- Type: text/x-patch, Size: 1770 bytes --]
---
kernel/power/console.c | 9 +++++++++
1 file changed, 9 insertions(+)
Index: linux-2.6/kernel/power/console.c
===================================================================
--- linux-2.6.orig/kernel/power/console.c
+++ linux-2.6/kernel/power/console.c
@@ -51,6 +51,7 @@ void pm_vt_switch_required(struct device
if (tmp->dev == dev) {
/* already registered, update requirement */
tmp->required = required;
+ dev_info(dev, "pm_vt_switch_required() update %d\n", required);
goto out;
}
}
@@ -61,6 +62,7 @@ void pm_vt_switch_required(struct device
entry->required = required;
entry->dev = dev;
+ dev_info(dev, "pm_vt_switch_required() added %d\n", required);
list_add(&entry->head, &pm_vt_switch_list);
out:
@@ -81,6 +83,7 @@ void pm_vt_switch_unregister(struct devi
mutex_lock(&vt_switch_mutex);
list_for_each_entry(tmp, &pm_vt_switch_list, head) {
if (tmp->dev == dev) {
+ dev_info(dev, "pm_vt_switch_unregister() removed %d\n", tmp->required);
list_del(&tmp->head);
kfree(tmp);
break;
@@ -131,11 +134,14 @@ int pm_prepare_console(void)
if (!pm_vt_switch())
return 0;
+ pr_info("pm_prepare_console() before move\n");
orig_fgconsole = vt_move_to_console(SUSPEND_CONSOLE, 1);
if (orig_fgconsole < 0)
return 1;
+ pr_info("pm_prepare_console() before redirect\n");
orig_kmsg = vt_kmsg_redirect(SUSPEND_CONSOLE);
+ pr_info("pm_prepare_console() done\n");
return 0;
}
@@ -145,7 +151,10 @@ void pm_restore_console(void)
return;
if (orig_fgconsole >= 0) {
+ pr_info("pm_restore_console() before move\n");
vt_move_to_console(orig_fgconsole, 0);
+ pr_info("pm_restore_console() before redirect\n");
vt_kmsg_redirect(orig_kmsg);
+ pr_info("pm_restore_console() done\n");
}
}
* Re: Machine crashes right *after* ~successful resume
2014-10-21 23:15 ` Yinghai Lu
@ 2014-10-22 12:53 ` Wilmer van der Gaast
2014-10-26 21:53 ` Yinghai Lu
0 siblings, 1 reply; 51+ messages in thread
From: Wilmer van der Gaast @ 2014-10-22 12:53 UTC (permalink / raw)
To: Yinghai Lu
Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
linux-kernel
Hello Yinghai,
This looks more promising!
Yinghai Lu (yinghai@kernel.org) wrote:
> >
> > And then nothing, and it's hung. Looks the same to me (apart from the tsc
> > issues + hpet switch) as a successful resume:
>
> then it stuck in pm_restore_console()?
>
That seems to be the case yes:
[ 106.661152] PM: ... nb fw_pm_notify+0x0/0x150 done
[ 106.665939] PM: calling nb bsp_pm_callback+0x0/0x50
[ 106.670814] PM: ... nb bsp_pm_callback+0x0/0x50 done
[ 106.675775] pm_restore_console() before move
Then nothing, during the third resume.
http://gaast.net/~wilmer/.lkml/bad3.17-patched-console-restore.txt has
the full log.
(Some of your other debug lines in your patch don't seem to be logging
anything during my repro BTW.)
Wilmer v/d Gaast.
* Re: Machine crashes right *after* ~successful resume
2014-10-22 12:53 ` Wilmer van der Gaast
@ 2014-10-26 21:53 ` Yinghai Lu
2014-10-27 10:50 ` Wilmer van der Gaast
0 siblings, 1 reply; 51+ messages in thread
From: Yinghai Lu @ 2014-10-26 21:53 UTC (permalink / raw)
To: Wilmer van der Gaast
Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
linux-kernel
[-- Attachment #1: Type: text/plain, Size: 672 bytes --]
On Wed, Oct 22, 2014 at 5:53 AM, Wilmer van der Gaast <wilmer@gaast.net> wrote:
> That seems to be the case yes:
>
> [ 106.661152] PM: ... nb fw_pm_notify+0x0/0x150 done
> [ 106.665939] PM: calling nb bsp_pm_callback+0x0/0x50
> [ 106.670814] PM: ... nb bsp_pm_callback+0x0/0x50 done
> [ 106.675775] pm_restore_console() before move
>
> Then nothing, during the third resume.
>
> http://gaast.net/~wilmer/.lkml/bad3.17-patched-console-restore.txt has
> the full log.
>
> (Some of your other debug lines in your patch don't seem to be logging
> anything during my repro BTW.)
Please try the two attached debug patches to check the PCI registers
across suspend/resume.
[-- Attachment #2: debug_extra_dump_pci.patch --]
[-- Type: text/x-patch, Size: 1804 bytes --]
Subject: [PATCH] pci: print out about pci=dump
debug print out before later driver hang
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
drivers/pci/pci.c | 52 +++++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 51 insertions(+), 1 deletion(-)
Index: linux-2.6/drivers/pci/pci.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -3858,6 +3858,54 @@ void __weak pci_fixup_cardbus(struct pci
}
EXPORT_SYMBOL(pci_fixup_cardbus);
+static void dump_pci_device_range(struct pci_dev *dev, unsigned start_reg,
+ unsigned size)
+{
+ int i;
+ int j;
+ u32 val;
+ int end = start_reg + size;
+
+ printk(KERN_DEBUG "PCI: %s", pci_name(dev));
+
+ for (i = start_reg; i < end; i += 4) {
+ if (!(i & 0x0f))
+ printk("\n%04x:", i);
+
+ pci_read_config_dword(dev, i, &val);
+ for (j = 0; j < 4; j++) {
+ printk(" %02x", val & 0xff);
+ val >>= 8;
+ }
+ }
+ printk("\n");
+}
+
+static int dump_pci_devices(void)
+{
+ struct pci_dev *dev = NULL;
+
+ while ((dev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, dev)) != NULL)
+ dump_pci_device_range(dev, 0, dev->cfg_size);
+
+ return 0;
+}
+
+static int pci_dump_regs;
+static void pci_dump(void)
+{
+ pci_dump_regs = 1;
+}
+
+static int pci_init(void)
+{
+ if (pci_dump_regs)
+ dump_pci_devices();
+
+ return 0;
+}
+device_initcall(pci_init);
+
static int __init pci_setup(char *str)
{
while (str) {
@@ -3865,7 +3913,9 @@ static int __init pci_setup(char *str)
if (k)
*k++ = 0;
if (*str && (str = pcibios_setup(str)) && *str) {
- if (!strcmp(str, "nomsi")) {
+ if (!strcmp(str, "dump")) {
+ pci_dump();
+ } else if (!strcmp(str, "nomsi")) {
pci_no_msi();
} else if (!strcmp(str, "noaer")) {
pci_no_aer();
[-- Attachment #3: debug_suspend_resume_z.patch --]
[-- Type: text/x-patch, Size: 1037 bytes --]
---
drivers/pci/pci.c | 2 +-
kernel/power/suspend.c | 2 ++
2 files changed, 3 insertions(+), 1 deletion(-)
Index: linux-2.6/drivers/pci/pci.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -4462,7 +4462,7 @@ static void dump_pci_device_range(struct
printk("\n");
}
-static int dump_pci_devices(void)
+int dump_pci_devices(void)
{
struct pci_dev *dev = NULL;
Index: linux-2.6/kernel/power/suspend.c
===================================================================
--- linux-2.6.orig/kernel/power/suspend.c
+++ linux-2.6/kernel/power/suspend.c
@@ -401,6 +401,7 @@ int suspend_devices_and_enter(suspend_st
goto Resume_devices;
}
+int dump_pci_devices(void);
/**
* suspend_finish - Clean up before finishing the suspend sequence.
*
@@ -411,6 +412,7 @@ static void suspend_finish(void)
{
suspend_thaw_processes();
pm_notifier_call_chain(PM_POST_SUSPEND);
+ dump_pci_devices();
pm_restore_console();
}
* Re: Machine crashes right *after* ~successful resume
2014-10-26 21:53 ` Yinghai Lu
@ 2014-10-27 10:50 ` Wilmer van der Gaast
2014-10-27 18:23 ` Yinghai Lu
2014-10-27 21:21 ` Pavel Machek
0 siblings, 2 replies; 51+ messages in thread
From: Wilmer van der Gaast @ 2014-10-27 10:50 UTC (permalink / raw)
To: Yinghai Lu
Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
linux-kernel
Hello Yinghai,
Thanks again for your time!
I've applied your two patches, and as a wild guess also added pci=dump
to my kernel cmdline though I guess that just gave me a boot-time dump -
which mostly didn't make it into my dmesg.
I accidentally booted with no_console_suspend on the first run, which
still caused no output at all on the failed resume. I'm including the
output of that anyway, but also I have a run with that flag removed, and
annoyingly the crash appears to happen before the dump during the crash
finishes - while dumping info for this device, it seems:
04:00.0 PCI bridge: Integrated Technology Express, Inc. Device 8892 (rev
10) (prog-if 01 [Subtractive decode])
(More info in my lspci.txt)
Wondering what device that is exactly, I stumbled upon
http://sourceforge.net/p/linux1394/mailman/message/29755048/ where
someone describes it as a "cheap and crappy PCI bridge". More and more I
wonder if I should just buy a new motherboard - sadly this one wasn't
even that cheap. :-( I don't know, though, whether the output stopping
while dumping this device means that it is the culprit. Is printk() to
the serial console blocking or buffered in any way?
Anyway, dumps are in:
http://gaast.net/~wilmer/.lkml/bad3.17-pcidumps-no_console_suspend.txt
http://gaast.net/~wilmer/.lkml/bad3.17-pcidumps.txt
Cheers,
Wilmer van der Gaast.
* Re: Machine crashes right *after* ~successful resume
2014-10-27 10:50 ` Wilmer van der Gaast
@ 2014-10-27 18:23 ` Yinghai Lu
2014-10-27 22:22 ` Wilmer van der Gaast
2014-10-27 21:21 ` Pavel Machek
1 sibling, 1 reply; 51+ messages in thread
From: Yinghai Lu @ 2014-10-27 18:23 UTC (permalink / raw)
To: Wilmer van der Gaast
Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
linux-kernel
[-- Attachment #1: Type: text/plain, Size: 1588 bytes --]
On Mon, Oct 27, 2014 at 3:50 AM, Wilmer van der Gaast <wilmer@gaast.net> wrote:
> http://gaast.net/~wilmer/.lkml/bad3.17-pcidumps.txt
[ 252.028142] PCI: 0000:04:00.0
0000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0010: ff ff ff ff ff ff ff ff
04:00.0 PCI bridge: Integrated Technology Express, Inc. Device 8892
(rev 10) (prog-if 01 [Subtractive decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 4 bytes
Bus: primary=04, secondary=05, subordinate=05, sec-latency=32
I/O behind bridge: 0000d000-0000dfff
Memory behind bridge: fbc00000-fbcfffff
Secondary status: 66MHz+ FastB2B- ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort+ <SERR- <PERR-
BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr+ DiscTmrStat- DiscTmrSERREn-
Capabilities: [90] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=55mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [a0] Subsystem: Gigabyte Technology Co., Ltd Device 5000
under
00:1c.3 PCI bridge: Intel Corporation 82801 PCI Bridge (rev b5)
(prog-if 01 [Subtractive decode])
So that ITE bridge will not work after suspend/resume?
Please apply the 4 attached patches and try to remove the device like
echo 1 > /sys/bus/pci/devices/0000\:04\:00.0/remove
echo 1 > /sys/bus/pci/devices/0000\:00\:1c.3/pcie_link_disable
before suspend/resume test.
Thanks
Yinghai
[-- Attachment #2: move_pcie_link_disable_1.patch --]
[-- Type: text/x-patch, Size: 2628 bytes --]
Subject: [PATCH] PCI: Add generic pcie_link_disable
Remove the unneeded return value checking that Linus pointed out before.
Will use it from /sys/.../pcie/link_disable
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
drivers/pci/Makefile | 2 +-
drivers/pci/pcie-link.c | 42 ++++++++++++++++++++++++++++++++++++++++++
include/linux/pci.h | 2 ++
3 files changed, 45 insertions(+), 1 deletion(-)
Index: linux-2.6/drivers/pci/pcie-link.c
===================================================================
--- /dev/null
+++ linux-2.6/drivers/pci/pcie-link.c
@@ -0,0 +1,42 @@
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/errno.h>
+#include <linux/jiffies.h>
+#include <linux/delay.h>
+
+int pcie_link_disable_get(struct pci_dev *dev)
+{
+ u16 lnk_ctrl;
+ if (!pci_is_pcie(dev))
+ return 0;
+
+ pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnk_ctrl);
+
+ return !!(lnk_ctrl & PCI_EXP_LNKCTL_LD);
+}
+
+void pcie_link_disable_set(struct pci_dev *dev, int bit)
+{
+ u16 lnk_ctrl, old_lnk_ctrl;
+
+ if (!pci_is_pcie(dev))
+ return;
+
+ pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnk_ctrl);
+ old_lnk_ctrl = lnk_ctrl;
+
+ if (!bit)
+ lnk_ctrl &= ~PCI_EXP_LNKCTL_LD;
+ else
+ lnk_ctrl |= PCI_EXP_LNKCTL_LD;
+
+ if (old_lnk_ctrl == lnk_ctrl)
+ return;
+
+ pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnk_ctrl);
+
+ dev_printk(KERN_DEBUG, &dev->dev, "%s: lnk_ctrl = %x\n", __func__,
+ lnk_ctrl);
+}
+EXPORT_SYMBOL(pcie_link_disable_set);
Index: linux-2.6/include/linux/pci.h
===================================================================
--- linux-2.6.orig/include/linux/pci.h
+++ linux-2.6/include/linux/pci.h
@@ -842,6 +842,8 @@ struct pci_bus *pci_scan_root_bus(struct
struct pci_bus *pci_add_new_bus(struct pci_bus *parent, struct pci_dev *dev,
int busnr);
void pcie_update_link_speed(struct pci_bus *bus, u16 link_status);
+void pcie_link_disable_set(struct pci_dev *dev, int bit);
+int pcie_link_disable_get(struct pci_dev *dev);
struct pci_slot *pci_create_slot(struct pci_bus *parent, int slot_nr,
const char *name,
struct hotplug_slot *hotplug);
Index: linux-2.6/drivers/pci/Makefile
===================================================================
--- linux-2.6.orig/drivers/pci/Makefile
+++ linux-2.6/drivers/pci/Makefile
@@ -4,7 +4,7 @@
obj-y += access.o bus.o probe.o host-bridge.o remove.o pci.o \
pci-driver.o search.o pci-sysfs.o rom.o setup-res.o \
- irq.o vpd.o setup-bus.o vc.o
+ irq.o vpd.o setup-bus.o pcie-link.o vc.o
obj-$(CONFIG_PROC_FS) += proc.o
obj-$(CONFIG_SYSFS) += slot.o
[-- Attachment #3: move_pcie_link_disable_2.patch --]
[-- Type: text/x-patch, Size: 1947 bytes --]
Subject: [PATCH] PCI, pciehp: Use generic pcie_link_disable
Also remove the old version with the unneeded return check.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
drivers/pci/hotplug/pciehp_hpc.c | 30 +++---------------------------
1 file changed, 3 insertions(+), 27 deletions(-)
Index: linux-2.6/drivers/pci/hotplug/pciehp_hpc.c
===================================================================
--- linux-2.6.orig/drivers/pci/hotplug/pciehp_hpc.c
+++ linux-2.6/drivers/pci/hotplug/pciehp_hpc.c
@@ -305,28 +305,6 @@ int pciehp_check_link_status(struct cont
return 0;
}
-static int __pciehp_link_set(struct controller *ctrl, bool enable)
-{
- struct pci_dev *pdev = ctrl_dev(ctrl);
- u16 lnk_ctrl;
-
- pcie_capability_read_word(pdev, PCI_EXP_LNKCTL, &lnk_ctrl);
-
- if (enable)
- lnk_ctrl &= ~PCI_EXP_LNKCTL_LD;
- else
- lnk_ctrl |= PCI_EXP_LNKCTL_LD;
-
- pcie_capability_write_word(pdev, PCI_EXP_LNKCTL, lnk_ctrl);
- ctrl_dbg(ctrl, "%s: lnk_ctrl = %x\n", __func__, lnk_ctrl);
- return 0;
-}
-
-static int pciehp_link_enable(struct controller *ctrl)
-{
- return __pciehp_link_set(ctrl, true);
-}
-
void pciehp_get_attention_status(struct slot *slot, u8 *status)
{
struct controller *ctrl = slot->ctrl;
@@ -473,7 +451,6 @@ int pciehp_power_on_slot(struct slot * s
struct controller *ctrl = slot->ctrl;
struct pci_dev *pdev = ctrl_dev(ctrl);
u16 slot_status;
- int retval;
/* Clear sticky power-fault bit from previous power failures */
pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slot_status);
@@ -487,11 +464,10 @@ int pciehp_power_on_slot(struct slot * s
pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL,
PCI_EXP_SLTCTL_PWR_ON);
- retval = pciehp_link_enable(ctrl);
- if (retval)
- ctrl_err(ctrl, "%s: Can not enable the link!\n", __func__);
+ /* Enable the link */
+ pcie_link_disable_set(ctrl->pcie->port, 0);
- return retval;
+ return 0;
}
void pciehp_power_off_slot(struct slot * slot)
[-- Attachment #4: pci_express_link.patch --]
[-- Type: text/x-patch, Size: 2574 bytes --]
Subject: [PATCH] PCI, sysfs: Add pcie attrs for pcie device under pci dev dir.
Will put link_disable and link_retrain
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
drivers/pci/Makefile | 2 +-
drivers/pci/pci-sysfs.c | 1 +
drivers/pci/pci.h | 1 +
drivers/pci/pcie-sysfs.c | 23 +++++++++++++++++++++++
4 files changed, 26 insertions(+), 1 deletion(-)
Index: linux-2.6/drivers/pci/pci-sysfs.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-sysfs.c
+++ linux-2.6/drivers/pci/pci-sysfs.c
@@ -1608,6 +1608,7 @@ static struct attribute_group pci_dev_br
static const struct attribute_group *pci_dev_attr_groups[] = {
&pci_dev_attr_group,
&pci_dev_bridge_attr_group,
+ &pci_dev_pcie_attr_group,
&pci_dev_hp_attr_group,
#ifdef CONFIG_PCI_IOV
&sriov_dev_attr_group,
Index: linux-2.6/drivers/pci/pcie-sysfs.c
===================================================================
--- /dev/null
+++ linux-2.6/drivers/pci/pcie-sysfs.c
@@ -0,0 +1,23 @@
+#include <linux/kernel.h>
+#include <linux/pci.h>
+
+static struct attribute *pci_dev_pcie_dev_attrs[] = {
+ NULL,
+};
+
+static umode_t pci_dev_pcie_attrs_are_visible(struct kobject *kobj,
+ struct attribute *a, int n)
+{
+ struct device *dev = container_of(kobj, struct device, kobj);
+ struct pci_dev *pdev = to_pci_dev(dev);
+
+ if (!pci_is_pcie(pdev))
+ return 0;
+
+ return a->mode;
+}
+
+struct attribute_group pci_dev_pcie_attr_group = {
+ .is_visible = pci_dev_pcie_attrs_are_visible,
+ .attrs = pci_dev_pcie_dev_attrs,
+};
Index: linux-2.6/drivers/pci/pci.h
===================================================================
--- linux-2.6.orig/drivers/pci/pci.h
+++ linux-2.6/drivers/pci/pci.h
@@ -152,6 +152,7 @@ static inline int pci_no_d1d2(struct pci
extern const struct attribute_group *pci_dev_groups[];
extern const struct attribute_group *pcibus_groups[];
extern struct device_type pci_dev_type;
+extern struct attribute_group pci_dev_pcie_attr_group;
extern const struct attribute_group *pci_bus_groups[];
Index: linux-2.6/drivers/pci/Makefile
===================================================================
--- linux-2.6.orig/drivers/pci/Makefile
+++ linux-2.6/drivers/pci/Makefile
@@ -4,7 +4,7 @@
obj-y += access.o bus.o probe.o host-bridge.o remove.o pci.o \
pci-driver.o search.o pci-sysfs.o rom.o setup-res.o \
- irq.o vpd.o setup-bus.o pcie-link.o vc.o
+ irq.o vpd.o setup-bus.o pcie-link.o pcie-sysfs.o vc.o
obj-$(CONFIG_PROC_FS) += proc.o
obj-$(CONFIG_SYSFS) += slot.o
[-- Attachment #5: pci_express_link_disable.patch --]
[-- Type: text/x-patch, Size: 1930 bytes --]
Subject: [PATCH] PCI: Add link_disable in /sysfs for pcie device
Found PCIe cards from one vendor that will not respond to a scan from the
bridge if we change the bus number setting in the bridge device.
We have to do link disable/enable on the PCIe root port.
So try to expose the link disable bit of the PCIe link control register. We can use
echo 1 > /sys/..../link_disable
echo 0 > /sys/..../link_disable
to bring the pcie device back to respond to scan.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
drivers/pci/pcie-sysfs.c | 33 +++++++++++++++++++++++++++++++++
1 file changed, 33 insertions(+)
Index: linux-2.6/drivers/pci/pcie-sysfs.c
===================================================================
--- linux-2.6.orig/drivers/pci/pcie-sysfs.c
+++ linux-2.6/drivers/pci/pcie-sysfs.c
@@ -1,7 +1,35 @@
#include <linux/kernel.h>
#include <linux/pci.h>
+static ssize_t
+pcie_link_disable_show(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ struct pci_dev *pdev = to_pci_dev(dev);
+
+ return sprintf(buf, "%u\n", pcie_link_disable_get(pdev));
+}
+static ssize_t
+pcie_link_disable_store(struct device *dev, struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct pci_dev *pdev = to_pci_dev(dev);
+ unsigned long val;
+
+ if (kstrtoul(buf, 0, &val) < 0)
+ return -EINVAL;
+
+ pcie_link_disable_set(pdev, val);
+
+ return count;
+}
+
+static struct device_attribute pcie_link_disable_attr =
+ __ATTR(pcie_link_disable, 0644,
+ pcie_link_disable_show, pcie_link_disable_store);
+
static struct attribute *pci_dev_pcie_dev_attrs[] = {
+ &pcie_link_disable_attr.attr,
NULL,
};
@@ -14,6 +42,11 @@ static umode_t pci_dev_pcie_attrs_are_vi
if (!pci_is_pcie(pdev))
return 0;
+ if (a == &pcie_link_disable_attr.attr)
+ if ((pci_pcie_type(pdev) != PCI_EXP_TYPE_ROOT_PORT) &&
+ (pci_pcie_type(pdev) != PCI_EXP_TYPE_DOWNSTREAM))
+ return 0;
+
return a->mode;
}
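The core of the pcie_link_disable patches above is a read-modify-write of one bit in the PCIe Link Control register. As a rough user-space sketch of just that bit logic (no hardware access): the helper names below are hypothetical and not part of the patches; the bit value 0x0010 is what the kernel calls PCI_EXP_LNKCTL_LD.

```c
#include <stdint.h>

/* Link Disable is bit 4 of the PCIe Link Control register
 * (PCI_EXP_LNKCTL_LD in the kernel headers). */
#define LNKCTL_LINK_DISABLE 0x0010u

/* Hypothetical helper: 1 if the Link Disable bit is set, else 0. */
static int link_disable_get(uint16_t lnk_ctrl)
{
	return !!(lnk_ctrl & LNKCTL_LINK_DISABLE);
}

/* Hypothetical helper: return the Link Control value with the
 * Link Disable bit set (disable != 0) or cleared (disable == 0).
 * All other bits are left untouched. */
static uint16_t link_disable_apply(uint16_t lnk_ctrl, int disable)
{
	if (disable)
		return (uint16_t)(lnk_ctrl | LNKCTL_LINK_DISABLE);
	return (uint16_t)(lnk_ctrl & ~LNKCTL_LINK_DISABLE);
}
```

As in the patch, a caller would compare the old and new values and skip the config write when nothing changed.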
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Machine crashes right *after* ~successful resume
2014-10-27 10:50 ` Wilmer van der Gaast
2014-10-27 18:23 ` Yinghai Lu
@ 2014-10-27 21:21 ` Pavel Machek
1 sibling, 0 replies; 51+ messages in thread
From: Pavel Machek @ 2014-10-27 21:21 UTC (permalink / raw)
To: Wilmer van der Gaast
Cc: Yinghai Lu, Bjorn Helgaas, Rafael J. Wysocki, Rafael Wysocki,
linux-kernel
On Mon 2014-10-27 10:50:04, Wilmer van der Gaast wrote:
> Hello Yinghai,
>
> Thanks again for your time!
>
> I've applied your two patches, and as a wild guess also added pci=dump to my
> kernel cmdline though I guess that just gave me a boot-time dump - which
> mostly didn't make it into my dmesg.
>
> I accidentally booted with no_console_suspend on the first run, which still
> caused no output at all on the failed resume. I'm including the output of
> that anyway, but also I have a run with that flag removed, and annoyingly
> the crash appears to happen before the dump during the crash finishes -
> while dumping info for this device, it seems:
>
> 04:00.0 PCI bridge: Integrated Technology Express, Inc. Device 8892 (rev 10)
> (prog-if 01 [Subtractive decode])
>
> (More info in my lspci.txt)
>
> Wondering what device that is exactly, I stumbled upon
> http://sourceforge.net/p/linux1394/mailman/message/29755048/ where someone
> describes it as a "cheap and crappy PCI bridge". More and more I wonder if I
> should just buy a new motherboard - sadly this one wasn't even that
> cheap.
It is probably not just you that is affected, and we already know what
change broke it. So we really should fix it.
> :-( Though I don't know if the output stopping while dumping output for this
> device means that it is the culprit, is printk() to the serial console in
> any way blocking/buffered?
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Machine crashes right *after* ~successful resume
2014-10-27 18:23 ` Yinghai Lu
@ 2014-10-27 22:22 ` Wilmer van der Gaast
2014-10-27 23:41 ` Yinghai Lu
0 siblings, 1 reply; 51+ messages in thread
From: Wilmer van der Gaast @ 2014-10-27 22:22 UTC (permalink / raw)
To: Yinghai Lu
Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
linux-kernel
Hello,
On 27-10-14 18:23, Yinghai Lu wrote:
>
> 04:00.0 PCI bridge: Integrated Technology Express, Inc. Device 8892
>
> So that ITE will not work after suspend/resume?
>
Even after the first one already, you mean?
Honestly, I don't really know what its purpose is, and it doesn't have
any child nodes in the PCI tree from what I can tell. Possibly because I
don't have any PCI cards in the machine, just a PCIe video card -
assuming this is a PCI bridge taking care of legacy PCI plugin cards?
> Please apply 4 attached patches and try to remove the device like
>
> echo 1 > /sys/bus/pci/devices/0000\:04\:00.0/remove
> echo 1 > /sys/bus/pci/devices/0000\:00\:1c.3/pcie_link_disable
>
> before suspend/resume test.
>
That worked! Resumed properly now.
Full log in http://gaast.net/~wilmer/.lkml/good3.17.txt . Including the
PCI dump at boot time, where that device doesn't dump just ff's.
Wilmer van der Gaast.
--
+-------- .''`. - -- ---+ + - -- --- ---- ----- ------+
| wilmer : :' : gaast.net | | OSS Programmer www.bitlbee.org |
| lintux `. `~' debian.org | | Full-time geek wilmer.gaast.net |
+--- -- - ` ---------------+ +------ ----- ---- --- -- - +
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Machine crashes right *after* ~successful resume
2014-10-27 22:22 ` Wilmer van der Gaast
@ 2014-10-27 23:41 ` Yinghai Lu
2014-10-28 0:03 ` Wilmer van der Gaast
0 siblings, 1 reply; 51+ messages in thread
From: Yinghai Lu @ 2014-10-27 23:41 UTC (permalink / raw)
To: Wilmer van der Gaast
Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
linux-kernel
On Mon, Oct 27, 2014 at 3:22 PM, Wilmer van der Gaast <wilmer@gaast.net> wrote:
> Hello,
>
> On 27-10-14 18:23, Yinghai Lu wrote:
>>
>>
>> 04:00.0 PCI bridge: Integrated Technology Express, Inc. Device 8892
>>
>> So that ITE will not work after suspend/resume?
>>
> Even after the first one already, you mean?
Yes.
>
> Honestly, I don't really know what its purpose is, and it doesn't have any
> child nodes in the PCI tree from what I can tell. Possibly because I don't
> have any PCI cards in the machine, just a PCIe video card - assuming this is
> a PCI bridge taking care of legacy PCI plugin cards?
>
>> Please apply 4 attached patches and try to remove the device like
>>
>> echo 1 > /sys/bus/pci/devices/0000\:04\:00.0/remove
>> echo 1 > /sys/bus/pci/devices/0000\:00\:1c.3/pcie_link_disable
>>
>> before suspend/resume test.
>>
> That worked! Resumed properly now.
>
> Full log in http://gaast.net/~wilmer/.lkml/good3.17.txt . Including the PCI
> dump at boot time, where that device doesn't dump just ff's.
Can you apply only the patch that reverts "enable bridge early" plus the
two pci dump patches, to see if the 04:00.0 readout is 0xff?
Thanks
Yinghai
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Machine crashes right *after* ~successful resume
2014-10-27 23:41 ` Yinghai Lu
@ 2014-10-28 0:03 ` Wilmer van der Gaast
2014-10-28 1:12 ` Yinghai Lu
0 siblings, 1 reply; 51+ messages in thread
From: Wilmer van der Gaast @ 2014-10-28 0:03 UTC (permalink / raw)
To: Yinghai Lu
Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
linux-kernel
On 27-10-14 23:41, Yinghai Lu wrote:
>
> Can you only apply the patch that revert enable bridge early and
> two pci dump patches to see if 04:00.0 readout is 0xff?
>
I was curious about that already, did that with a 3.16.6 that I think
just had your revert applied (and using lspci -xxxx to get the dump
which I assumed would be the same): No changes to 04:00 at all.
Confirmed that this is the case with 3.17 + those patches as well, it's
showing this at all times:
[ 130.000122] PCI: 0000:04:00.0
0000: 83 12 92 88 07 00 10 00 10 01 04 06 01 00 01 00
0010: 00 00 00 00 00 00 00 00 04 05 05 20 d1 d1 20 22
0020: c0 fb c0 fb f1 ff 01 00 00 00 00 00 00 00 00 00
0030: 00 00 00 00 90 00 00 00 00 00 00 00 0b 01 00 02
0040: 0c 31 00 00 08 06 00 00 00 00 00 00 ff 00 00 00
0050: 72 ab b9 6d 00 00 00 00 20 c9 8e 00 00 00 00 00
0060: 00 00 00 00 aa 0d 00 10 00 44 00 00 00 00 00 80
0070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0090: 01 a0 42 fe 00 00 00 00 00 00 00 00 00 00 00 00
00a0: 0d 00 00 00 58 14 00 50 00 00 00 00 00 00 00 00
00b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00f0: 00 00 00 00 00 1f 00 00 00 00 00 00 00 00 00 00
Wilmer v/d Gaast.
--
+-------- .''`. - -- ---+ + - -- --- ---- ----- ------+
| wilmer : :' : gaast.net | | OSS Programmer www.bitlbee.org |
| lintux `. `~' debian.org | | Full-time geek wilmer.gaast.net |
+--- -- - ` ---------------+ +------ ----- ---- --- -- - +
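The dump above can be read by hand: bytes 0x00-0x03 give vendor 0x1283 and device 0x8892, the capabilities pointer at 0x34 is 0x90, and the capability at 0x90 has ID 0x01 (Power Management) with a PMCSR of 0 at 0x94, i.e. D0. A small user-space sketch of that decoding, working on a 256-byte copy of config space, might look like the following. The function names are made up for illustration; only the register offsets follow the standard PCI layout.

```c
#include <stdint.h>

/* Read a little-endian 16-bit value from a config space copy. */
static uint16_t cfg_read16(const uint8_t *cfg, int off)
{
	return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

/* A vendor ID of 0xffff is what a read returns when the device
 * does not respond (the "all ff" dump seen after a bad resume). */
static int device_absent(const uint8_t *cfg)
{
	return cfg_read16(cfg, 0x00) == 0xffff;
}

/* Walk the capability list in a 256-byte config space copy and
 * return the offset of capability cap_id, or 0 if not found. */
static int find_capability(const uint8_t *cfg, uint8_t cap_id)
{
	int ttl = 48;	/* guard against looping lists */
	int pos;

	if (!(cfg_read16(cfg, 0x06) & 0x0010))	/* Status: cap list bit */
		return 0;

	pos = cfg[0x34] & 0xfc;
	while (pos >= 0x40 && ttl-- > 0) {
		if (cfg[pos] == 0xff)
			break;
		if (cfg[pos] == cap_id)
			return pos;
		pos = cfg[pos + 1] & 0xfc;
	}
	return 0;
}

/* Return the D-state (0..3) from PMCSR, or -1 without a PM capability. */
static int pm_power_state(const uint8_t *cfg)
{
	int pos = find_capability(cfg, 0x01);	/* 0x01 = Power Management */

	if (pos == 0 || pos > 256 - 6)
		return -1;
	return cfg_read16(cfg, pos + 4) & 0x0003;
}
```

Fed the 04:00.0 bytes from the dump, this finds the PM capability at 0x90 and reports state 0; fed an all-ff buffer, it reports the device as absent.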
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Machine crashes right *after* ~successful resume
2014-10-28 0:03 ` Wilmer van der Gaast
@ 2014-10-28 1:12 ` Yinghai Lu
2014-10-28 4:03 ` Yinghai Lu
2014-10-28 23:34 ` Wilmer van der Gaast
0 siblings, 2 replies; 51+ messages in thread
From: Yinghai Lu @ 2014-10-28 1:12 UTC (permalink / raw)
To: Wilmer van der Gaast
Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
linux-kernel
On Mon, Oct 27, 2014 at 5:03 PM, Wilmer van der Gaast <wilmer@gaast.net> wrote:
> I was curious about that already, did that with a 3.16.6 that I think just
> had your revert applied (and using lspci -xxxx to get the dump which I
> assumed would be the same): No changes to 04:00 at all.
>
> Confirmed that this is the case with 3.17 + those patches as well, it's
> showing this at all times:
Can you post the output of
lspci -vvxxxx -s 00:1c.3
lspci -vvxxxx -s 04:00.0
before reverting the "enable bridge early" patch
and after reverting it on 3.17+?
Thanks
Yinghai
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Machine crashes right *after* ~successful resume
2014-10-28 1:12 ` Yinghai Lu
@ 2014-10-28 4:03 ` Yinghai Lu
2014-10-28 10:23 ` Wilmer van der Gaast
2014-10-28 23:34 ` Wilmer van der Gaast
1 sibling, 1 reply; 51+ messages in thread
From: Yinghai Lu @ 2014-10-28 4:03 UTC (permalink / raw)
To: Wilmer van der Gaast
Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
linux-kernel
[-- Attachment #1: Type: text/plain, Size: 707 bytes --]
On Mon, Oct 27, 2014 at 6:12 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Mon, Oct 27, 2014 at 5:03 PM, Wilmer van der Gaast <wilmer@gaast.net> wrote:
>> I was curious about that already, did that with a 3.16.6 that I think just
>> had your revert applied (and using lspci -xxxx to get the dump which I
>> assumed would be the same): No changes to 04:00 at all.
>>
>> Confirmed that this is the case with 3.17 + those patches as well, it's
>> showing this at all times:
>
> can you post
> lspci -vvxxxx -s 00:1c.3
> lspci -vvxxxx -s 04:00.0
> before reverting enable bridge early patch
> and after reverting on 3.17+?
Please check if attached patch could fix the problem on your setup.
Thanks
Yinghai
[-- Attachment #2: pci_set_bridge_d0.patch --]
[-- Type: text/x-patch, Size: 793 bytes --]
---
| 6 ++++++
1 file changed, 6 insertions(+)
Index: linux-2.6/drivers/pci/quirks.c
===================================================================
--- linux-2.6.orig/drivers/pci/quirks.c
+++ linux-2.6/drivers/pci/quirks.c
@@ -3098,6 +3098,12 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_IN
DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x8c02, quirk_remove_d3_delay);
DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x8c22, quirk_remove_d3_delay);
+static void enable_pci_bridge_d0(struct pci_dev *dev)
+{
+ pci_set_power_state(dev, PCI_D0);
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ITE, 0x8892, enable_pci_bridge_d0);
+
/*
* Some devices may pass our check in pci_intx_mask_supported if
* PCI_COMMAND_INTX_DISABLE works though they actually do not properly
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Machine crashes right *after* ~successful resume
2014-10-28 4:03 ` Yinghai Lu
@ 2014-10-28 10:23 ` Wilmer van der Gaast
0 siblings, 0 replies; 51+ messages in thread
From: Wilmer van der Gaast @ 2014-10-28 10:23 UTC (permalink / raw)
To: Yinghai Lu
Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
linux-kernel
On 28-10-14 04:03, Yinghai Lu wrote:
>
> Please check if attached patch could fix the problem on your setup.
>
Sadly it looks like it did not. :-( Applied your patch on a vanilla 3.17
tree, still seeing the same crash.
I'll get more debugging output and the output you asked for in your
previous e-mail tonight, need to go to work now.
Cheers,
Wilmer v/d Gaast.
--
+-------- .''`. - -- ---+ + - -- --- ---- ----- ------+
| wilmer : :' : gaast.net | | OSS Programmer www.bitlbee.org |
| lintux `. `~' debian.org | | Full-time geek wilmer.gaast.net |
+--- -- - ` ---------------+ +------ ----- ---- --- -- - +
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Machine crashes right *after* ~successful resume
2014-10-28 1:12 ` Yinghai Lu
2014-10-28 4:03 ` Yinghai Lu
@ 2014-10-28 23:34 ` Wilmer van der Gaast
2014-10-29 5:17 ` Yinghai Lu
1 sibling, 1 reply; 51+ messages in thread
From: Wilmer van der Gaast @ 2014-10-28 23:34 UTC (permalink / raw)
To: Yinghai Lu
Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
linux-kernel
Hello,
On 28-10-14 01:12, Yinghai Lu wrote:
> lspci -vvxxxx -s 00:1c.3
> lspci -vvxxxx -s 04:00.0
> before reverting enable bridge early patch
http://gaast.net/~wilmer/.lkml/lspcixx-nopatch.txt (So that's 3.17 +
your revert patch)
> and after reverting on 3.17+?
>
http://gaast.net/~wilmer/.lkml/lspcixx-patched.txt
plain 3.17.
I've run the commands twice, once before and once after a single
suspend+resume cycle. Small difference and only before that cycle:
ruby:~/crashit# diff -u lspcixx-*
--- lspcixx-nopatch.txt 2014-10-28 23:26:08.679690828 +0000
+++ lspcixx-patched.txt 2014-10-28 23:10:05.391896757 +0000
@@ -92,10 +92,10 @@
2f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
310: 00 00 00 00 1b 36 3a 74 00 00 14 14 31 17 42 00
-320: 5b 60 09 00 00 20 00 0a 08 10 b8 04 07 00 d5 0c
+320: 5b 60 09 00 00 20 00 0a 90 10 b8 04 8f 00 5d 0d
330: 16 00 00 28 bc b5 bc 4a 00 00 00 00 74 4c 85 00
-340: 33 03 33 00 64 03 3f 00 30 00 0c 00 f8 07 d5 00
-350: fe 07 dd 00 01 00 08 00 00 00 00 00 00 00 00 00
+340: 33 03 33 00 64 03 3f 00 30 00 0c 00 f8 07 5d 00
+350: fe 07 65 00 01 00 08 00 00 00 00 00 00 00 00 00
360: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
370: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
(Diff is in the Intel device, not the ITE one.)
Wilmer v/d Gaast.
--
+-------- .''`. - -- ---+ + - -- --- ---- ----- ------+
| wilmer : :' : gaast.net | | OSS Programmer www.bitlbee.org |
| lintux `. `~' debian.org | | Full-time geek wilmer.gaast.net |
+--- -- - ` ---------------+ +------ ----- ---- --- -- - +
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Machine crashes right *after* ~successful resume
2014-10-28 23:34 ` Wilmer van der Gaast
@ 2014-10-29 5:17 ` Yinghai Lu
2014-10-29 9:37 ` Wilmer van der Gaast
0 siblings, 1 reply; 51+ messages in thread
From: Yinghai Lu @ 2014-10-29 5:17 UTC (permalink / raw)
To: Wilmer van der Gaast
Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
linux-kernel
[-- Attachment #1: Type: text/plain, Size: 1271 bytes --]
On Tue, Oct 28, 2014 at 4:34 PM, Wilmer van der Gaast <wilmer@gaast.net> wrote:
>
> I've run the commands twice, once before and once after a single
> suspend+resume cycle. Small difference and only before that cycle:
>
> ruby:~/crashit# diff -u lspcixx-*
> --- lspcixx-nopatch.txt 2014-10-28 23:26:08.679690828 +0000
> +++ lspcixx-patched.txt 2014-10-28 23:10:05.391896757 +0000
> @@ -92,10 +92,10 @@
> 2f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 310: 00 00 00 00 1b 36 3a 74 00 00 14 14 31 17 42 00
> -320: 5b 60 09 00 00 20 00 0a 08 10 b8 04 07 00 d5 0c
> +320: 5b 60 09 00 00 20 00 0a 90 10 b8 04 8f 00 5d 0d
> 330: 16 00 00 28 bc b5 bc 4a 00 00 00 00 74 4c 85 00
> -340: 33 03 33 00 64 03 3f 00 30 00 0c 00 f8 07 d5 00
> -350: fe 07 dd 00 01 00 08 00 00 00 00 00 00 00 00 00
> +340: 33 03 33 00 64 03 3f 00 30 00 0c 00 f8 07 5d 00
> +350: fe 07 65 00 01 00 08 00 00 00 00 00 00 00 00 00
> 360: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 370: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> (Diff is in the Intel device, not the ITE one.)
>
That is strange.
Anyway, please try the attached patch on top of 3.17.
Thanks
Yinghai
[-- Attachment #2: debug_suspend_resume_z_xx.patch --]
[-- Type: text/x-patch, Size: 511 bytes --]
---
drivers/pci/pci.c | 2 ++
1 file changed, 2 insertions(+)
Index: linux-2.6/drivers/pci/pci.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -1265,6 +1265,8 @@ static void pci_enable_bridge(struct pci
pci_set_master(dev);
}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ITE, 0x8892, pci_enable_bridge);
+
static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
{
struct pci_dev *bridge;
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Machine crashes right *after* ~successful resume
2014-10-29 5:17 ` Yinghai Lu
@ 2014-10-29 9:37 ` Wilmer van der Gaast
2014-10-30 0:53 ` Yinghai Lu
0 siblings, 1 reply; 51+ messages in thread
From: Wilmer van der Gaast @ 2014-10-29 9:37 UTC (permalink / raw)
To: Yinghai Lu
Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
linux-kernel
Hello,
On 29-10-14 05:17, Yinghai Lu wrote:
>> (Diff is in the Intel device, not the ITE one.)
> That is strange.
>
I did wonder later why I was not seeing the ff* dump anymore after the
resume.
> Anyway, please try the attached patch on top of 3.17.
>
Done, and that did work! Four suspend+resume cycles later and it's still
stable.
Wilmer v/d Gaast.
--
+-------- .''`. - -- ---+ + - -- --- ---- ----- ------+
| wilmer : :' : gaast.net | | OSS Programmer www.bitlbee.org |
| lintux `. `~' debian.org | | Full-time geek wilmer.gaast.net |
+--- -- - ` ---------------+ +------ ----- ---- --- -- - +
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Machine crashes right *after* ~successful resume
2014-10-29 9:37 ` Wilmer van der Gaast
@ 2014-10-30 0:53 ` Yinghai Lu
2014-10-30 10:36 ` Wilmer van der Gaast
0 siblings, 1 reply; 51+ messages in thread
From: Yinghai Lu @ 2014-10-30 0:53 UTC (permalink / raw)
To: Wilmer van der Gaast
Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
linux-kernel
[-- Attachment #1: Type: text/plain, Size: 268 bytes --]
On Wed, Oct 29, 2014 at 2:37 AM, Wilmer van der Gaast <wilmer@gaast.net> wrote:
>
>> Anyway, please try the attached patch on top of 3.17.
>>
> Done, and that did work! Four suspend+resume cycles later and it's still
> stable.
Then can you test the attached simplified one?
[-- Attachment #2: debug_suspend_resume_z_yy.patch --]
[-- Type: text/x-patch, Size: 835 bytes --]
---
drivers/pci/pci.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
Index: linux-2.6/drivers/pci/pci.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -1265,6 +1265,19 @@ static void pci_enable_bridge(struct pci
pci_set_master(dev);
}
+static void ite_set_d0(struct pci_dev *dev)
+{
+ if (dev->pm_cap) {
+ u16 pmcsr;
+ pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
+ dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK);
+ }
+
+ pci_set_power_state(dev, PCI_D0);
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x244e, ite_set_d0);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ITE, 0x8892, ite_set_d0);
+
static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
{
struct pci_dev *bridge;
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Machine crashes right *after* ~successful resume
2014-10-30 0:53 ` Yinghai Lu
@ 2014-10-30 10:36 ` Wilmer van der Gaast
2014-10-30 16:57 ` Yinghai Lu
0 siblings, 1 reply; 51+ messages in thread
From: Wilmer van der Gaast @ 2014-10-30 10:36 UTC (permalink / raw)
To: Yinghai Lu
Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
linux-kernel
Hello,
On 30-10-14 00:53, Yinghai Lu wrote:
>> Done, and that did work! Four suspend+resume cycles later and it's still
>> stable.
> Then can you test the attached simplified one?
>
Sadly, with that patch (applied against a vanilla 3.17 tree like all the
others) the second resume fails already. :-(
Wilmer v/d Gaast.
--
+-------- .''`. - -- ---+ + - -- --- ---- ----- ------+
| wilmer : :' : gaast.net | | OSS Programmer www.bitlbee.org |
| lintux `. `~' debian.org | | Full-time geek wilmer.gaast.net |
+--- -- - ` ---------------+ +------ ----- ---- --- -- - +
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Machine crashes right *after* ~successful resume
2014-10-30 10:36 ` Wilmer van der Gaast
@ 2014-10-30 16:57 ` Yinghai Lu
2014-10-30 21:54 ` Wilmer van der Gaast
0 siblings, 1 reply; 51+ messages in thread
From: Yinghai Lu @ 2014-10-30 16:57 UTC (permalink / raw)
To: Wilmer van der Gaast
Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
linux-kernel
[-- Attachment #1: Type: text/plain, Size: 347 bytes --]
On Thu, Oct 30, 2014 at 3:36 AM, Wilmer van der Gaast <wilmer@gaast.net> wrote:
> Sadly, with that patch (applied against a vanilla 3.17 tree like all the
> others) the second resume fails already. :-(
Oh, no. I really want to know which bit causes the problem.
Please try the attached debug patches; they will print out PCI config
space before and after.
[-- Attachment #2: debug_extra_dump_pci.patch --]
[-- Type: text/x-patch, Size: 1804 bytes --]
Subject: [PATCH] pci: print out about pci=dump
debug print out before later driver hang
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
drivers/pci/pci.c | 52 +++++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 51 insertions(+), 1 deletion(-)
Index: linux-2.6/drivers/pci/pci.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -3858,6 +3858,54 @@ void __weak pci_fixup_cardbus(struct pci
}
EXPORT_SYMBOL(pci_fixup_cardbus);
+static void dump_pci_device_range(struct pci_dev *dev, unsigned start_reg,
+ unsigned size)
+{
+ int i;
+ int j;
+ u32 val;
+ int end = start_reg + size;
+
+ printk(KERN_DEBUG "PCI: %s", pci_name(dev));
+
+ for (i = start_reg; i < end; i += 4) {
+ if (!(i & 0x0f))
+ printk("\n%04x:", i);
+
+ pci_read_config_dword(dev, i, &val);
+ for (j = 0; j < 4; j++) {
+ printk(" %02x", val & 0xff);
+ val >>= 8;
+ }
+ }
+ printk("\n");
+}
+
+static int dump_pci_devices(void)
+{
+ struct pci_dev *dev = NULL;
+
+ while ((dev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, dev)) != NULL)
+ dump_pci_device_range(dev, 0, dev->cfg_size);
+
+ return 0;
+}
+
+static int pci_dump_regs;
+static void pci_dump(void)
+{
+ pci_dump_regs = 1;
+}
+
+static int pci_init(void)
+{
+ if (pci_dump_regs)
+ dump_pci_devices();
+
+ return 0;
+}
+device_initcall(pci_init);
+
static int __init pci_setup(char *str)
{
while (str) {
@@ -3865,7 +3913,9 @@ static int __init pci_setup(char *str)
if (k)
*k++ = 0;
if (*str && (str = pcibios_setup(str)) && *str) {
- if (!strcmp(str, "nomsi")) {
+ if (!strcmp(str, "dump")) {
+ pci_dump();
+ } else if (!strcmp(str, "nomsi")) {
pci_no_msi();
} else if (!strcmp(str, "noaer")) {
pci_no_aer();
[-- Attachment #3: debug_suspend_resume_z_zz.patch --]
[-- Type: text/x-patch, Size: 740 bytes --]
---
drivers/pci/pci.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
Index: linux-2.6/drivers/pci/pci.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -1265,6 +1265,20 @@ static void pci_enable_bridge(struct pci
pci_set_master(dev);
}
+static int dump_pci_devices(void);
+
+static void pci_enable_ite(struct pci_dev *dev)
+{
+ pr_info("before...\n");
+ dump_pci_devices();
+
+ pci_enable_bridge(dev);
+
+ pr_info("after...\n");
+ dump_pci_devices();
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ITE, 0x8892, pci_enable_ite);
+
static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
{
struct pci_dev *bridge;
* Re: Machine crashes right *after* ~successful resume
2014-10-30 16:57 ` Yinghai Lu
@ 2014-10-30 21:54 ` Wilmer van der Gaast
2014-10-30 23:02 ` Yinghai Lu
0 siblings, 1 reply; 51+ messages in thread
From: Wilmer van der Gaast @ 2014-10-30 21:54 UTC (permalink / raw)
To: Yinghai Lu
Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
linux-kernel
Hello,
On 30-10-14 16:57, Yinghai Lu wrote:
>> Sadly, with that patch (applied against a vanilla 3.17 tree like all the
>> others) the second resume fails already. :-(
>
> Oh, no. I really want to know which bit causes the problem.
>
Good question. And I think you will find my new finding even more
confusing: with just your two debugging patches from this e-mail
applied, I could suspend+resume 3× with no problems.
Lovely heisenbug here. I'll add that for every test so far I've removed
the kernel source tree, re-untarred it and applied the patches from your
e-mails on that, so the tests should be consistent. The bug itself is
normally consistent too: before we started testing patches, the crashes
were already happening *very* reliably, exactly after the third resume.
Just to be sure this morning was not a fluke, I've retested your patch
from this morning, and it still crashes on the second resume.
> Please try the attached debug patches; they print out the PCI config space
> before and after the pci_enable_bridge() call.
>
http://gaast.net/~wilmer/.lkml/good3.17-patched-megadebug.txt
Wilmer v/d Gaast.
--
+-------- .''`. - -- ---+ + - -- --- ---- ----- ------+
| wilmer : :' : gaast.net | | OSS Programmer www.bitlbee.org |
| lintux `. `~' debian.org | | Full-time geek wilmer.gaast.net |
+--- -- - ` ---------------+ +------ ----- ---- --- -- - +
* Re: Machine crashes right *after* ~successful resume
2014-10-30 21:54 ` Wilmer van der Gaast
@ 2014-10-30 23:02 ` Yinghai Lu
2014-10-30 23:24 ` Wilmer van der Gaast
0 siblings, 1 reply; 51+ messages in thread
From: Yinghai Lu @ 2014-10-30 23:02 UTC (permalink / raw)
To: Wilmer van der Gaast
Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
linux-kernel
[-- Attachment #1: Type: text/plain, Size: 1203 bytes --]
On Thu, Oct 30, 2014 at 2:54 PM, Wilmer van der Gaast <wilmer@gaast.net> wrote:
> http://gaast.net/~wilmer/.lkml/good3.17-patched-megadebug.txt
no difference except on 00:1c.3
--- before.txt 2014-10-30 15:20:35.782886485 -0700
+++ after.txt 2014-10-30 15:21:37.034882515 -0700
@@ -49,10 +49,10 @@
02f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0310: 00 00 00 00 1b 36 3a 74 00 00 14 14 31 17 42 00
-0320: 5b 60 09 00 00 20 00 0a ec 19 b8 04 eb 09 b9 06
+0320: 5b 60 09 00 00 20 00 0a 33 1a b8 04 32 0a 00 07
0330: 16 00 00 28 bc b5 bc 4a 00 00 00 00 74 4c 85 00
-0340: 33 03 33 00 64 03 3f 00 30 00 0c 00 45 02 b9 00
-0350: 4b 02 c1 00 01 00 08 00 00 00 00 00 00 00 00 00
+0340: 33 03 33 00 64 03 3f 00 30 00 0c 00 46 02 00 00
+0350: 4c 02 08 00 01 00 08 00 00 00 00 00 00 00 00 00
0360: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0370: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Please try the attached patch on top of 3.17, without any other patches.
If it works, please dump the ACPI tables, including the DSDT.
I need to check whether there is extra work in _PRT.
Thanks
Yinghai
[-- Attachment #2: debug_suspend_resume_xxx.patch --]
[-- Type: text/x-patch, Size: 720 bytes --]
---
arch/x86/pci/common.c | 8 ++++++++
1 file changed, 8 insertions(+)
Index: linux-2.6/arch/x86/pci/common.c
===================================================================
--- linux-2.6.orig/arch/x86/pci/common.c
+++ linux-2.6/arch/x86/pci/common.c
@@ -719,6 +719,14 @@ int pcibios_enable_device(struct pci_dev
return 0;
}
+static void pci_enable_irq_ite(struct pci_dev *dev)
+{
+ if (!pci_dev_msi_enabled(dev))
+ pcibios_enable_irq(dev);
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x244e, pci_enable_irq_ite);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ITE, 0x8892, pci_enable_irq_ite);
+
void pcibios_disable_device (struct pci_dev *dev)
{
if (!pci_dev_msi_enabled(dev) && pcibios_disable_irq)
* Re: Machine crashes right *after* ~successful resume
2014-10-30 23:02 ` Yinghai Lu
@ 2014-10-30 23:24 ` Wilmer van der Gaast
2014-10-31 0:43 ` Yinghai Lu
0 siblings, 1 reply; 51+ messages in thread
From: Wilmer van der Gaast @ 2014-10-30 23:24 UTC (permalink / raw)
To: Yinghai Lu
Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
linux-kernel
On 30-10-14 23:02, Yinghai Lu wrote:
>> http://gaast.net/~wilmer/.lkml/good3.17-patched-megadebug.txt
>
> no difference except on 00:1c.3
>
> --- before.txt 2014-10-30 15:20:35.782886485 -0700
> +++ after.txt 2014-10-30 15:21:37.034882515 -0700
> @@ -49,10 +49,10 @@
> 02f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0310: 00 00 00 00 1b 36 3a 74 00 00 14 14 31 17 42 00
> -0320: 5b 60 09 00 00 20 00 0a ec 19 b8 04 eb 09 b9 06
> +0320: 5b 60 09 00 00 20 00 0a 33 1a b8 04 32 0a 00 07
> 0330: 16 00 00 28 bc b5 bc 4a 00 00 00 00 74 4c 85 00
> -0340: 33 03 33 00 64 03 3f 00 30 00 0c 00 45 02 b9 00
> -0350: 4b 02 c1 00 01 00 08 00 00 00 00 00 00 00 00 00
> +0340: 33 03 33 00 64 03 3f 00 30 00 0c 00 46 02 00 00
> +0350: 4c 02 08 00 01 00 08 00 00 00 00 00 00 00 00 00
> 0360: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0370: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
Those diffs seem to be at exactly the same offsets as in the dumps I was
diffing a few days ago.
> Please try the attached patch on top of 3.17, without any other patches.
>
Same problem as this morning: failure after the second resume already. :-(
> If it works, please dump the ACPI tables, including the DSDT.
> I need to check whether there is extra work in _PRT.
>
Original files and iasl interpretations in:
http://gaast.net/~wilmer/.lkml/tables/
Thanks,
Wilmer v/d Gaast.
--
+-------- .''`. - -- ---+ + - -- --- ---- ----- ------+
| wilmer : :' : gaast.net | | OSS Programmer www.bitlbee.org |
| lintux `. `~' debian.org | | Full-time geek wilmer.gaast.net |
+--- -- - ` ---------------+ +------ ----- ---- --- -- - +
* Re: Machine crashes right *after* ~successful resume
2014-10-30 23:24 ` Wilmer van der Gaast
@ 2014-10-31 0:43 ` Yinghai Lu
2014-10-31 2:13 ` Yinghai Lu
0 siblings, 1 reply; 51+ messages in thread
From: Yinghai Lu @ 2014-10-31 0:43 UTC (permalink / raw)
To: Wilmer van der Gaast, Bjorn Helgaas
Cc: Rafael J. Wysocki, Pavel Machek, Rafael Wysocki, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 411 bytes --]
On Thu, Oct 30, 2014 at 4:24 PM, Wilmer van der Gaast <wilmer@gaast.net> wrote:
>
>
> Same problem as this morning: failure after the second resume already. :-(
>
I cannot find any magic line in pci_enable_bridge() that could cause
the difference.
So either use the attached pci_enable_bridge_ite.patch, or just revert
commit 928bea9?
Bjorn, please decide which one you want to go with.
Thanks
Yinghai
[-- Attachment #2: pci_enable_bridge_ite.patch --]
[-- Type: text/x-patch, Size: 594 bytes --]
---
drivers/pci/pci.c | 6 ++++++
1 file changed, 6 insertions(+)
Index: linux-2.6/drivers/pci/pci.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -1265,6 +1265,12 @@ static void pci_enable_bridge(struct pci
pci_set_master(dev);
}
+static void pci_enable_ite(struct pci_dev *dev)
+{
+ pci_enable_bridge(dev);
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ITE, 0x8892, pci_enable_ite);
+
static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
{
struct pci_dev *bridge;
[-- Attachment #3: revert_928bea9_from_3.17.patch --]
[-- Type: text/x-patch, Size: 7187 bytes --]
diff --git a/arch/arm/kernel/bios32.c b/arch/arm/kernel/bios32.c
index 17a26c1..306ca53 100644
--- a/arch/arm/kernel/bios32.c
+++ b/arch/arm/kernel/bios32.c
@@ -538,6 +538,12 @@ void pci_common_init_dev(struct device *parent, struct hw_pci *hw)
* Assign resources.
*/
pci_bus_assign_resources(bus);
+
+
+ /*
+ * Enable bridges
+ */
+ pci_enable_bridges(bus);
}
/*
diff --git a/arch/m68k/coldfire/pci.c b/arch/m68k/coldfire/pci.c
index df96792..b33f97a 100644
--- a/arch/m68k/coldfire/pci.c
+++ b/arch/m68k/coldfire/pci.c
@@ -319,6 +319,7 @@ static int __init mcf_pci_init(void)
pci_fixup_irqs(pci_common_swizzle, mcf_pci_map_irq);
pci_bus_size_bridges(rootbus);
pci_bus_assign_resources(rootbus);
+ pci_enable_bridges(rootbus);
return 0;
}
diff --git a/arch/mips/pci/pci.c b/arch/mips/pci/pci.c
index 1bf60b1..4f2e17d 100644
--- a/arch/mips/pci/pci.c
+++ b/arch/mips/pci/pci.c
@@ -113,6 +113,7 @@ static void pcibios_scanbus(struct pci_controller *hose)
if (!pci_has_flag(PCI_PROBE_ONLY)) {
pci_bus_size_bridges(bus);
pci_bus_assign_resources(bus);
+ pci_enable_bridges(bus);
}
}
}
diff --git a/arch/sh/drivers/pci/pci.c b/arch/sh/drivers/pci/pci.c
index 1bc09ee..5272327 100644
--- a/arch/sh/drivers/pci/pci.c
+++ b/arch/sh/drivers/pci/pci.c
@@ -69,6 +69,7 @@ static void pcibios_scanbus(struct pci_channel *hose)
pci_bus_size_bridges(bus);
pci_bus_assign_resources(bus);
+ pci_enable_bridges(bus);
} else {
pci_free_resource_list(&resources);
}
diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
index cd4de7e..c15bc3c 100644
--- a/drivers/acpi/pci_root.c
+++ b/drivers/acpi/pci_root.c
@@ -614,6 +614,9 @@ static int acpi_pci_root_add(struct acpi_device *device,
if (system_state != SYSTEM_BOOTING) {
pcibios_resource_survey_bus(root->bus);
pci_assign_unassigned_root_bus_resources(root->bus);
+
+ /* need to after hot-added ioapic is registered */
+ pci_enable_bridges(root->bus);
}
pci_lock_rescan_remove();
diff --git a/drivers/parisc/lba_pci.c b/drivers/parisc/lba_pci.c
index 37e71ff..19f6f70 100644
--- a/drivers/parisc/lba_pci.c
+++ b/drivers/parisc/lba_pci.c
@@ -1590,6 +1590,7 @@ lba_driver_probe(struct parisc_device *dev)
lba_dump_res(&lba_dev->hba.lmmio_space, 2);
#endif
}
+ pci_enable_bridges(lba_bus);
/*
** Once PCI register ops has walked the bus, access to config
diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 73aef51..761601e 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -283,6 +283,26 @@ void pci_bus_add_devices(const struct pci_bus *bus)
}
EXPORT_SYMBOL(pci_bus_add_devices);
+void pci_enable_bridges(struct pci_bus *bus)
+{
+ struct pci_dev *dev;
+ int retval;
+
+ list_for_each_entry(dev, &bus->devices, bus_list) {
+ if (dev->subordinate) {
+ if (!pci_is_enabled(dev)) {
+ retval = pci_enable_device(dev);
+ if (retval)
+ dev_err(&dev->dev, "Error enabling bridge (%d), continuing\n", retval);
+ pci_set_master(dev);
+ }
+ pci_enable_bridges(dev->subordinate);
+ }
+ }
+}
+EXPORT_SYMBOL(pci_enable_bridges);
+
+
/** pci_walk_bus - walk devices on/under bus, calling callback.
* @top bus whose devices should be walked
* @cb callback to be called for each device found
diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
index bcb90e4..8fadc84 100644
--- a/drivers/pci/hotplug/acpiphp_glue.c
+++ b/drivers/pci/hotplug/acpiphp_glue.c
@@ -511,6 +511,7 @@ static void enable_slot(struct acpiphp_slot *slot)
acpiphp_sanitize_bus(bus);
pcie_bus_configure_settings(bus);
acpiphp_set_acpi_region(slot);
+ pci_enable_bridges(bus);
list_for_each_entry(dev, &bus->devices, bus_list) {
/* Assume that newly added devices are powered on already. */
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 625a4ac..4121518 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1242,31 +1242,8 @@ int pci_reenable_device(struct pci_dev *dev)
}
EXPORT_SYMBOL(pci_reenable_device);
-static void pci_enable_bridge(struct pci_dev *dev)
-{
- struct pci_dev *bridge;
- int retval;
-
- bridge = pci_upstream_bridge(dev);
- if (bridge)
- pci_enable_bridge(bridge);
-
- if (pci_is_enabled(dev)) {
- if (!dev->is_busmaster)
- pci_set_master(dev);
- return;
- }
-
- retval = pci_enable_device(dev);
- if (retval)
- dev_err(&dev->dev, "Error enabling bridge (%d), continuing\n",
- retval);
- pci_set_master(dev);
-}
-
static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
{
- struct pci_dev *bridge;
int err;
int i, bars = 0;
@@ -1285,10 +1262,6 @@ static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
if (atomic_inc_return(&dev->enable_cnt) > 1)
return 0; /* already enabled */
- bridge = pci_upstream_bridge(dev);
- if (bridge)
- pci_enable_bridge(bridge);
-
/* only skip sriov related */
for (i = 0; i <= PCI_ROM_RESOURCE; i++)
if (dev->resource[i].flags & flags)
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 5ed9930..df17ba8 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2177,6 +2177,7 @@ unsigned int pci_rescan_bus(struct pci_bus *bus)
max = pci_scan_child_bus(bus);
pci_assign_unassigned_bus_resources(bus);
+ pci_enable_bridges(bus);
pci_bus_add_devices(bus);
return max;
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 0482235..2cfb1eb 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1587,7 +1587,7 @@ again:
/* any device complain? */
if (list_empty(&fail_head))
- goto dump;
+ goto enable_and_dump;
if (tried_times >= pci_try_num) {
if (enable_local == undefined)
@@ -1596,7 +1596,7 @@ again:
dev_info(&bus->dev, "Automatically enabled pci realloc, if you have problem, try booting with pci=realloc=off\n");
free_list(&fail_head);
- goto dump;
+ goto enable_and_dump;
}
dev_printk(KERN_DEBUG, &bus->dev,
@@ -1629,7 +1629,10 @@ again:
goto again;
-dump:
+enable_and_dump:
+ /* Depth last, update the hardware. */
+ pci_enable_bridges(bus);
+
/* dump the resource on buses */
pci_bus_dump_resources(bus);
}
@@ -1700,6 +1703,7 @@ enable_all:
if (retval)
dev_err(&bridge->dev, "Error reenabling bridge (%d)\n", retval);
pci_set_master(bridge);
+ pci_enable_bridges(parent);
}
EXPORT_SYMBOL_GPL(pci_assign_unassigned_bridge_resources);
diff --git a/drivers/pcmcia/cardbus.c b/drivers/pcmcia/cardbus.c
index 4fe4cc4..9cbe4cf 100644
--- a/drivers/pcmcia/cardbus.c
+++ b/drivers/pcmcia/cardbus.c
@@ -92,6 +92,7 @@ int __ref cb_alloc(struct pcmcia_socket *s)
if (s->tune_bridge)
s->tune_bridge(s, bus);
+ pci_enable_bridges(bus);
pci_bus_add_devices(bus);
pci_unlock_rescan_remove();
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 5be8db4..1f85fb5 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1105,7 +1105,7 @@ int __must_check pci_bus_alloc_resource(struct pci_bus *bus,
resource_size_t,
resource_size_t),
void *alignf_data);
-
+void pci_enable_bridges(struct pci_bus *bus);
int pci_remap_iospace(const struct resource *res, phys_addr_t phys_addr);
* Re: Machine crashes right *after* ~successful resume
2014-10-31 0:43 ` Yinghai Lu
@ 2014-10-31 2:13 ` Yinghai Lu
2014-10-31 9:39 ` Wilmer van der Gaast
0 siblings, 1 reply; 51+ messages in thread
From: Yinghai Lu @ 2014-10-31 2:13 UTC (permalink / raw)
To: Wilmer van der Gaast, Bjorn Helgaas
Cc: Rafael J. Wysocki, Pavel Machek, Rafael Wysocki, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 516 bytes --]
On Thu, Oct 30, 2014 at 5:43 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Thu, Oct 30, 2014 at 4:24 PM, Wilmer van der Gaast <wilmer@gaast.net> wrote:
>>
>>
>> Same problem as this morning: failure after the second resume already. :-(
>>
> I cannot find any magic line in pci_enable_bridge() that could cause
> the difference.
>
> So either use the attached pci_enable_bridge_ite.patch, or just revert
> commit 928bea9?
Last try:
Please check attached patch that will keep state consistent.
Thanks
Yinghai
[-- Attachment #2: pci_enable_bridge_ite_x.patch --]
[-- Type: text/x-patch, Size: 1088 bytes --]
---
drivers/pci/pci.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
Index: linux-2.6/drivers/pci/pci.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -1264,6 +1264,26 @@ static void pci_enable_bridge(struct pci
pci_set_master(dev);
}
+static void pci_enable_ite(struct pci_dev *dev)
+{
+ u16 cmd;
+
+ /*
+ * FW enable the bridge already, so keep enable_cnt consistent,
+ * then later we can go through pci_pm_resume/pci_pm_reenable_device
+ * to enable it again.
+ * --- for pci bridge without driver case.
+ */
+ pci_read_config_word(dev, PCI_COMMAND, &cmd);
+ if (cmd & PCI_COMMAND_MASTER)
+ dev->is_busmaster = true;
+
+ if (cmd & (PCI_COMMAND_IO | PCI_COMMAND_MEMORY))
+ atomic_inc(&dev->enable_cnt);
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x244e, pci_enable_ite);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ITE, 0x8892, pci_enable_ite);
+
static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
{
struct pci_dev *bridge;
* Re: Machine crashes right *after* ~successful resume
2014-10-31 2:13 ` Yinghai Lu
@ 2014-10-31 9:39 ` Wilmer van der Gaast
2014-10-31 16:11 ` Yinghai Lu
0 siblings, 1 reply; 51+ messages in thread
From: Wilmer van der Gaast @ 2014-10-31 9:39 UTC (permalink / raw)
To: Yinghai Lu
Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
linux-kernel
Hello Yinghai,
On 31-10-14 02:13, Yinghai Lu wrote:
> Last try:
>
> Please check attached patch that will keep state consistent.
Good news: This last patch worked! For good measure, I ran my test twice
with a reboot in between. Worked consistently.
And similarly, to ensure that your debugging-at-boottime-only patch
wasn't just working by accident yesterday, I tested it twice more with
the same effect.
Thanks,
Wilmer van der Gaast.
--
+-------- .''`. - -- ---+ + - -- --- ---- ----- ------+
| wilmer : :' : gaast.net | | OSS Programmer www.bitlbee.org |
| lintux `. `~' debian.org | | Full-time geek wilmer.gaast.net |
+--- -- - ` ---------------+ +------ ----- ---- --- -- - +
* Re: Machine crashes right *after* ~successful resume
2014-10-31 9:39 ` Wilmer van der Gaast
@ 2014-10-31 16:11 ` Yinghai Lu
2014-10-31 21:13 ` Wilmer van der Gaast
0 siblings, 1 reply; 51+ messages in thread
From: Yinghai Lu @ 2014-10-31 16:11 UTC (permalink / raw)
To: Wilmer van der Gaast
Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
linux-kernel
[-- Attachment #1: Type: text/plain, Size: 602 bytes --]
On Fri, Oct 31, 2014 at 2:39 AM, Wilmer van der Gaast <wilmer@gaast.net> wrote:
> Hello Yinghai,
>
> On 31-10-14 02:13, Yinghai Lu wrote:
>>
>> Last try:
>>
>> Please check attached patch that will keep state consistent.
>
>
> Good news: This last patch worked! For good measure, I ran my test twice
> with a reboot in between. Worked consistently.
>
> And similarly, to ensure that your debugging-at-boottime-only patch wasn't
> just working by accident yesterday, I tested it twice more with the same
> effect.
Good. Please check if attached one on top of 3.17 only would work too.
Thanks
Yinghai
[-- Attachment #2: debug_suspend_resume_xxx1.patch --]
[-- Type: text/x-patch, Size: 643 bytes --]
---
drivers/pci/pci.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
Index: linux-2.6/drivers/pci/pci.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -1063,7 +1063,9 @@ static void pci_restore_config_space(str
pci_restore_config_space_range(pdev, 4, 9, 10);
pci_restore_config_space_range(pdev, 0, 3, 0);
} else {
- pci_restore_config_space_range(pdev, 0, 15, 0);
+ /* Restore BARs before the command register. */
+ pci_restore_config_space_range(pdev, 4, 15, 0);
+ pci_restore_config_space_range(pdev, 0, 3, 0);
}
}
* Re: Machine crashes right *after* ~successful resume
2014-10-31 16:11 ` Yinghai Lu
@ 2014-10-31 21:13 ` Wilmer van der Gaast
2014-10-31 21:22 ` Yinghai Lu
0 siblings, 1 reply; 51+ messages in thread
From: Wilmer van der Gaast @ 2014-10-31 21:13 UTC (permalink / raw)
To: Yinghai Lu
Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
linux-kernel
On 31-10-14 16:11, Yinghai Lu wrote:
>
> Good. Please check if attached one on top of 3.17 only would work too.
>
No luck, sadly. :-( Unsuccessful third resume.
I forgot to set up the serial console, would that still be useful?
Wilmer v/d Gaast.
--
+-------- .''`. - -- ---+ + - -- --- ---- ----- ------+
| wilmer : :' : gaast.net | | OSS Programmer www.bitlbee.org |
| lintux `. `~' debian.org | | Full-time geek wilmer.gaast.net |
+--- -- - ` ---------------+ +------ ----- ---- --- -- - +
* Re: Machine crashes right *after* ~successful resume
2014-10-31 21:13 ` Wilmer van der Gaast
@ 2014-10-31 21:22 ` Yinghai Lu
2014-10-31 23:18 ` Yinghai Lu
0 siblings, 1 reply; 51+ messages in thread
From: Yinghai Lu @ 2014-10-31 21:22 UTC (permalink / raw)
To: Wilmer van der Gaast
Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
linux-kernel
On Fri, Oct 31, 2014 at 2:13 PM, Wilmer van der Gaast <wilmer@gaast.net> wrote:
> On 31-10-14 16:11, Yinghai Lu wrote:
>>
>>
>> Good. Please check if attached one on top of 3.17 only would work too.
>>
> No luck, sadly. :-( Unsuccessful third resume.
>
> I forgot to set up the serial console, would that still be useful?
never mind, let me go through suspend/resume code path again.
* Re: Machine crashes right *after* ~successful resume
2014-10-31 21:22 ` Yinghai Lu
@ 2014-10-31 23:18 ` Yinghai Lu
2014-11-01 0:00 ` Wilmer van der Gaast
0 siblings, 1 reply; 51+ messages in thread
From: Yinghai Lu @ 2014-10-31 23:18 UTC (permalink / raw)
To: Wilmer van der Gaast
Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
linux-kernel
[-- Attachment #1: Type: text/plain, Size: 391 bytes --]
On Fri, Oct 31, 2014 at 2:22 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Fri, Oct 31, 2014 at 2:13 PM, Wilmer van der Gaast <wilmer@gaast.net> wrote:
>> On 31-10-14 16:11, Yinghai Lu wrote:
>>>
>>>
>>> Good. Please check if attached one on top of 3.17 only would work too.
>>>
>> No luck, sadly. :-( Unsuccessful third resume.
Please try attached two patches separately on top of 3.17.
[-- Attachment #2: pci_enable_bridge_ite.patch --]
[-- Type: text/x-patch, Size: 1016 bytes --]
---
drivers/pci/pci.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
Index: linux-2.6/drivers/pci/pci.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -1265,6 +1265,24 @@ static void pci_enable_bridge(struct pci
pci_set_master(dev);
}
+static void pci_enable_ite(struct pci_dev *dev)
+{
+ /*
+ * FW enable the bridge already, so call pci_enable_bridge()
+ * to keep enable_cnt consistent, then later we can go through
+ * pci_pm_resume/pci_pm_reenable_device to enable it again.
+ * --- for pci bridge without driver case.
+ */
+ if (!pci_is_enabled(dev)) {
+ u16 cmd;
+
+ pci_read_config_word(dev, PCI_COMMAND, &cmd);
+ if ((cmd & (PCI_COMMAND_IO || PCI_COMMAND_MEMORY)) &&
+ pci_enable_bridge(dev);
+ }
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ITE, 0x8892, pci_enable_ite);
+
static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
{
struct pci_dev *bridge;
[-- Attachment #3: pci_pm_reenable_device_enhance.patch --]
[-- Type: text/x-patch, Size: 859 bytes --]
---
drivers/pci/pci-driver.c | 9 +++++++++
1 file changed, 9 insertions(+)
Index: linux-2.6/drivers/pci/pci-driver.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-driver.c
+++ linux-2.6/drivers/pci/pci-driver.c
@@ -519,8 +519,17 @@ static void pci_pm_set_unknown_state(str
*/
static int pci_pm_reenable_device(struct pci_dev *pci_dev)
{
+ u16 cmd;
int retval;
+ /* update enable_cnt according to cmd register */
+ pci_read_config_word(pci_dev, PCI_COMMAND, &cmd);
+ if (!pci_dev->is_busmaster && (cmd & PCI_COMMAND_MASTER))
+ pci_dev->is_busmaster = true;
+ if (!pci_is_enabled(pci_dev) &&
+ (cmd & (PCI_COMMAND_IO | PCI_COMMAND_MEMORY)))
+ atomic_inc(&pci_dev->enable_cnt);
+
/* if the device was enabled before suspend, reenable */
retval = pci_reenable_device(pci_dev);
/*
* Re: Machine crashes right *after* ~successful resume
2014-10-31 23:18 ` Yinghai Lu
@ 2014-11-01 0:00 ` Wilmer van der Gaast
2014-11-01 2:10 ` Yinghai Lu
0 siblings, 1 reply; 51+ messages in thread
From: Wilmer van der Gaast @ 2014-11-01 0:00 UTC (permalink / raw)
To: Yinghai Lu
Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
linux-kernel
Hello,
Patch #1 worked after a simple s/&&/)/. And patch #2 seems to fix the
problem as well!
Wilmer v/d Gaast.
--
+-------- .''`. - -- ---+ + - -- --- ---- ----- ------+
| wilmer : :' : gaast.net | | OSS Programmer www.bitlbee.org |
| lintux `. `~' debian.org | | Full-time geek wilmer.gaast.net |
+--- -- - ` ---------------+ +------ ----- ---- --- -- - +
* Re: Machine crashes right *after* ~successful resume
2014-11-01 0:00 ` Wilmer van der Gaast
@ 2014-11-01 2:10 ` Yinghai Lu
2014-11-02 23:16 ` Wilmer van der Gaast
0 siblings, 1 reply; 51+ messages in thread
From: Yinghai Lu @ 2014-11-01 2:10 UTC (permalink / raw)
To: Wilmer van der Gaast
Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
linux-kernel
[-- Attachment #1: Type: text/plain, Size: 201 bytes --]
On Fri, Oct 31, 2014 at 5:00 PM, Wilmer van der Gaast <wilmer@gaast.net> wrote:
> Hello,
>
> Patch #1 worked after a simple s/&&/)/. And patch #2 seems to fix the
> problem as well!
updated first #1.
[-- Attachment #2: pci_enable_bridge_ite_v2.patch --]
[-- Type: text/x-patch, Size: 1011 bytes --]
---
drivers/pci/pci.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
Index: linux-2.6/drivers/pci/pci.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -1265,6 +1265,24 @@ static void pci_enable_bridge(struct pci
pci_set_master(dev);
}
+static void pci_enable_ite(struct pci_dev *dev)
+{
+ /*
+ * FW enable the bridge already, so call pci_enable_bridge()
+ * to keep enable_cnt consistent, then later we can go through
+ * pci_pm_resume/pci_pm_reenable_device to enable it again.
+ * --- for pci bridge without driver case.
+ */
+ if (!pci_is_enabled(dev)) {
+ u16 cmd;
+
+ pci_read_config_word(dev, PCI_COMMAND, &cmd);
+ if (cmd & (PCI_COMMAND_IO | PCI_COMMAND_MEMORY))
+ pci_enable_bridge(dev);
+ }
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ITE, 0x8892, pci_enable_ite);
+
static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
{
struct pci_dev *bridge;
* Re: Machine crashes right *after* ~successful resume
2014-11-01 2:10 ` Yinghai Lu
@ 2014-11-02 23:16 ` Wilmer van der Gaast
0 siblings, 0 replies; 51+ messages in thread
From: Wilmer van der Gaast @ 2014-11-02 23:16 UTC (permalink / raw)
To: Yinghai Lu
Cc: Bjorn Helgaas, Rafael J. Wysocki, Pavel Machek, Rafael Wysocki,
linux-kernel
On 01-11-14 02:10, Yinghai Lu wrote:
>> Patch #1 worked after a simple s/&&/)/. And patch #2 seems to fix the
>> problem as well!
> updated first #1.
>
Works as well!
Wilmer v/d Gaast.
--
+-------- .''`. - -- ---+ + - -- --- ---- ----- ------+
| wilmer : :' : gaast.net | | OSS Programmer www.bitlbee.org |
| lintux `. `~' debian.org | | Full-time geek wilmer.gaast.net |
+--- -- - ` ---------------+ +------ ----- ---- --- -- - +
Thread overview: 51+ messages
2014-10-07 23:20 Machine crashes right *after* ~successful resume Wilmer van der Gaast
2014-10-12 14:30 ` Pavel Machek
2014-10-12 15:49 ` Wilmer van der Gaast
2014-10-12 20:40 ` Pavel Machek
2014-10-12 23:47 ` Wilmer van der Gaast
2014-10-13 15:06 ` Rafael J. Wysocki
2014-10-15 11:16 ` Wilmer van der Gaast
2014-10-15 13:58 ` Bjorn Helgaas
2014-10-15 18:39 ` Yinghai Lu
2014-10-15 23:34 ` Wilmer van der Gaast
2014-10-16 4:32 ` Yinghai Lu
2014-10-16 9:36 ` Wilmer van der Gaast
2014-10-16 16:36 ` Yinghai Lu
2014-10-16 21:08 ` Wilmer van der Gaast
2014-10-18 21:28 ` Yinghai Lu
2014-10-18 23:57 ` Wilmer van der Gaast
2014-10-19 4:29 ` Yinghai Lu
2014-10-19 10:48 ` Wilmer van der Gaast
2014-10-21 21:40 ` Wilmer van der Gaast
2014-10-21 23:15 ` Yinghai Lu
2014-10-22 12:53 ` Wilmer van der Gaast
2014-10-26 21:53 ` Yinghai Lu
2014-10-27 10:50 ` Wilmer van der Gaast
2014-10-27 18:23 ` Yinghai Lu
2014-10-27 22:22 ` Wilmer van der Gaast
2014-10-27 23:41 ` Yinghai Lu
2014-10-28 0:03 ` Wilmer van der Gaast
2014-10-28 1:12 ` Yinghai Lu
2014-10-28 4:03 ` Yinghai Lu
2014-10-28 10:23 ` Wilmer van der Gaast
2014-10-28 23:34 ` Wilmer van der Gaast
2014-10-29 5:17 ` Yinghai Lu
2014-10-29 9:37 ` Wilmer van der Gaast
2014-10-30 0:53 ` Yinghai Lu
2014-10-30 10:36 ` Wilmer van der Gaast
2014-10-30 16:57 ` Yinghai Lu
2014-10-30 21:54 ` Wilmer van der Gaast
2014-10-30 23:02 ` Yinghai Lu
2014-10-30 23:24 ` Wilmer van der Gaast
2014-10-31 0:43 ` Yinghai Lu
2014-10-31 2:13 ` Yinghai Lu
2014-10-31 9:39 ` Wilmer van der Gaast
2014-10-31 16:11 ` Yinghai Lu
2014-10-31 21:13 ` Wilmer van der Gaast
2014-10-31 21:22 ` Yinghai Lu
2014-10-31 23:18 ` Yinghai Lu
2014-11-01 0:00 ` Wilmer van der Gaast
2014-11-01 2:10 ` Yinghai Lu
2014-11-02 23:16 ` Wilmer van der Gaast
2014-10-27 21:21 ` Pavel Machek
2014-10-19 8:07 ` Pavel Machek