From: Andreas Hartmann <andihartmann@freenet.de>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>,
linux-pci <linux-pci@vger.kernel.org>
Subject: Re: Hard and silent lock up since linux 3.14 with PCIe pass through (vfio)
Date: Thu, 30 Oct 2014 20:09:22 +0100
Message-ID: <20141030200922.15126d7a@dualc.maya.org>
In-Reply-To: <1414688299.27420.292.camel@ul30vt.home>
Alex Williamson wrote:
> On Thu, 2014-10-30 at 17:35 +0100, Andreas Hartmann wrote:
>> Alex Williamson wrote:
>>> On Wed, 2014-10-29 at 20:43 +0100, Andreas Hartmann wrote:
>> [...]
>>>> Therefore, I should never need pci_save_vc_state and
>>>> pci_restore_vc_state. Thus, it should be ok to add "return" at the
>>>> beginning of each of these functions, true? Then it should work.
>>>>
>>>> I tested it. It worked.
>>>>
>>>> But if I remove only one of these returns, either in
>>>> pci_save_vc_state or in pci_restore_vc_state, the machine hangs again.
>>>>
>>>> Therefore, there must be something odd going on in the for loops. Isn't
>>>> it possible to add some useful debug code to these loops to see what's
>>>> really going on? But the output *must* go to the actual console,
>>>> otherwise I can't see it!
>>>>
>>>>
>>>> int pci_save_vc_state(struct pci_dev *dev)
>>>> {
>>>> return 0; // must be set
>>>> int i;
>>>>
>>>> for (i = 0; i < ARRAY_SIZE(vc_caps); i++) {
>> // continue; -> works
>>>> int pos, ret;
>>>> struct pci_cap_saved_state *save_state;
>> // continue does not work!
>>
>> --> Most probably the
>>
>> struct pci_cap_saved_state *save_state;
>>
>> makes the system hang!
>
> We've done nothing more than declare variables there; there's no actual
> code. What happens if you increase the delay after bus reset? Edit
> drivers/pci/pci.c, find the call to ssleep(1) and change the 1 to a 2,
> doubling the delay after reset.
Same behaviour.
> It seems like VC save/restore is just a
> scapegoat for the platform already being broken by the bus reset. Also,
> if you have any other card to test in this slot, it would be useful
> comparison data to know if we're dealing with an endpoint issue or a bus
> issue.
I got hold of an Intel PCIe card:
03:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
	Subsystem: Intel Corporation Gigabit CT Desktop Adapter
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin A routed to IRQ 17
	Region 0: Memory at fdbc0000 (32-bit, non-prefetchable) [disabled] [size=128K]
	Region 1: Memory at fdb00000 (32-bit, non-prefetchable) [disabled] [size=512K]
	Region 2: I/O ports at cf00 [disabled] [size=32]
	Region 3: Memory at fdbfc000 (32-bit, non-prefetchable) [disabled] [size=16K]
	[virtual] Expansion ROM at fdb80000 [disabled] [size=256K]
	Capabilities: [c8] Power Management version 2
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable+ DSel=0 DScale=1 PME-
	Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000 Data: 0000
	Capabilities: [e0] Express (v1) Endpoint, MSI 00
		DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
		LnkCap: Port #1, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <128ns, L1 <64us
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
	Capabilities: [a0] MSI-X: Enable- Count=5 Masked-
		Vector table: BAR=3 offset=00000000
		PBA: BAR=3 offset=00002000
	Capabilities: [100 v1] Advanced Error Reporting
		UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
	Capabilities: [140 v1] Device Serial Number 00-1b-21-ff-ff-cf-8f-57
	Kernel driver in use: vfio-pci
and tested it with the same kernel that hangs with the Atheros card. It
just worked, not only once but in every test I did. I retested with the
Atheros card -> hang. Tested again with the Intel card -> works. Back to
the Atheros card -> hang.
So it really seems to be a problem with the Atheros card, which is
triggered by the new VC save/restore code.
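For comparison: the VC save/restore loop quoted above only does real work when the device exposes one of the extended capabilities VC (ID 0x0002), MFVC (ID 0x0008) or VC9 (ID 0x0009); for any ID it cannot find, the loop simply moves on. The Intel card's lspci output lists only Advanced Error Reporting and Device Serial Number as extended capabilities, so on that card the loop should have nothing to save or restore. Below is a minimal userspace sketch, assuming config space is read from the sysfs `config` file. It walks the extended capability list starting at offset 0x100, the same way the kernel does, to check which card actually exposes VC capabilities. The helper names (`find_ext_cap`, `report_vc_caps`, `read_config`) are hypothetical and are not a kernel or libpci API.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* PCIe extended capability IDs covered by VC save/restore. */
#define EXT_CAP_ID_VC      0x0002
#define EXT_CAP_ID_MFVC    0x0008
#define EXT_CAP_ID_VC9     0x0009

#define CFG_SPACE_SIZE      256   /* conventional config space */
#define CFG_SPACE_EXP_SIZE 4096   /* PCIe extended config space */

/* Read a little-endian 32-bit value from a config-space buffer. */
static uint32_t cfg_read32(const uint8_t *cfg, size_t off)
{
	return (uint32_t)cfg[off] | (uint32_t)cfg[off + 1] << 8 |
	       (uint32_t)cfg[off + 2] << 16 | (uint32_t)cfg[off + 3] << 24;
}

/*
 * Hypothetical helper: walk the extended capability list at 0x100 and
 * return the offset of the first capability with ID 'id', or 0 if it
 * is absent. 'len' is the number of valid bytes in 'cfg'.
 */
static size_t find_ext_cap(const uint8_t *cfg, size_t len, uint16_t id)
{
	size_t pos = CFG_SPACE_SIZE;
	/* Bound the walk so a corrupted (looping) list cannot hang us. */
	int ttl = (CFG_SPACE_EXP_SIZE - CFG_SPACE_SIZE) / 8;

	while (ttl-- > 0 && pos >= CFG_SPACE_SIZE && pos + 4 <= len) {
		uint32_t header = cfg_read32(cfg, pos);

		if (header == 0 || header == 0xffffffffu)
			return 0;             /* empty list or unreadable */
		if ((header & 0xffff) == id)
			return pos;           /* bits 15:0 = capability ID */
		pos = (header >> 20) & 0xffc; /* bits 31:20 = next offset */
	}
	return 0;
}

/* Hypothetical helper: print the VC-related capabilities; return count. */
static int report_vc_caps(const uint8_t *cfg, size_t len)
{
	static const struct { uint16_t id; const char *name; } caps[] = {
		{ EXT_CAP_ID_MFVC, "MFVC" },
		{ EXT_CAP_ID_VC,   "VC" },
		{ EXT_CAP_ID_VC9,  "VC9" },
	};
	int found = 0;

	for (size_t i = 0; i < sizeof(caps) / sizeof(caps[0]); i++) {
		size_t pos = find_ext_cap(cfg, len, caps[i].id);

		if (pos) {
			printf("%s capability at 0x%zx\n", caps[i].name, pos);
			found++;
		}
	}
	if (!found)
		printf("no VC/MFVC/VC9 capability: nothing to save/restore\n");
	return found;
}

/*
 * Hypothetical helper: read up to 'cap' bytes of config space from a
 * sysfs file. Without root, sysfs returns only the first 64 bytes, so
 * the extended capabilities would not be visible. Returns bytes read.
 */
static size_t read_config(const char *path, uint8_t *buf, size_t cap)
{
	FILE *f = fopen(path, "rb");
	size_t n;

	if (!f)
		return 0;
	n = fread(buf, 1, cap, f);
	fclose(f);
	return n;
}
```

Usage, as root and assuming PCI domain 0000: fill a 4096-byte buffer with `read_config("/sys/bus/pci/devices/0000:03:00.0/config", buf, sizeof buf)`, then call `report_vc_caps(buf, n)`. Run this once with each card in the slot. This only reads config space and never triggers a bus reset, so it should be safe to run even with the card that hangs.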
Well, but what to do now? I know how to "fix" it, but that means I have
to build my own kernels again for every version >= 3.14.
Thanks,
kind regards,
Andreas