Linux-PCI Archive on lore.kernel.org
 help / color / Atom feed
From: Andreas Hartmann <andihartmann@freenet.de>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>,
	linux-pci <linux-pci@vger.kernel.org>
Subject: Re: Hard and silent lock up since linux 3.14 with PCIe pass through (vfio)
Date: Thu, 30 Oct 2014 20:09:22 +0100
Message-ID: <20141030200922.15126d7a@dualc.maya.org> (raw)
In-Reply-To: <1414688299.27420.292.camel@ul30vt.home>

Alex Williamson wrote:
> On Thu, 2014-10-30 at 17:35 +0100, Andreas Hartmann wrote:
>> Alex Williamson wrote:
>>> On Wed, 2014-10-29 at 20:43 +0100, Andreas Hartmann wrote:
>> [...]
>>>> Therefore, I never should need pci_save_vc_state and
>>>> pci_restore_vc_state. Thus, it should be ok to add "return" at the
>>>> beginning of each of these function, true? Then it should work.
>>>>
>>>> I tested it. It worked.
>>>>
>>>> But if I'm removing only one of these returns either in
>>>> pci_save_vc_state or pci_restore_vc_state, the machine hangs again.
>>>>
>>>> Therefore, there must be something odd going on in the for loops. Isn't
>>>> it possible to add some useful debug code to these loops to see what's
>>>> really going on? But the output *must* go to the actual console,
>>>> otherwise I can't see it!
>>>>
>>>>
>>>> int pci_save_vc_state(struct pci_dev *dev)
>>>> {
>>>>         return 0; // must be set
>>>>         int i;
>>>>
>>>>         for (i = 0; i < ARRAY_SIZE(vc_caps); i++) {
>> 		   // continue; -> works
>>>>                 int pos, ret;
>>>>                 struct pci_cap_saved_state *save_state;
>> 		   // continue does not work!
>>
>> --> Most probably the
>>
>>             struct pci_cap_saved_state *save_state;
>>
>>     makes the system hang!
> 
> We've done nothing more than declare variables there, there's no actual
> code.  What happens if you increase the delay after bus reset, edit
> drivers/pci/pci.c, find the call to ssleep(1) and change the 1 to a 2,
> doubling the delay after reset.

Same behaviour.

>  It seems like VC save/restore is just a
> scapegoat for the platform already being broken by the bus reset.  Also,
> if you have any other card to test in this slot, it would be useful
> comparison data to know if we're dealing with an endpoint issue or a bus
> issue.

I organized an Intel pcie card:

03:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
        Subsystem: Intel Corporation Gigabit CT Desktop Adapter
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin A routed to IRQ 17
        Region 0: Memory at fdbc0000 (32-bit, non-prefetchable) [disabled] [size=128K]
        Region 1: Memory at fdb00000 (32-bit, non-prefetchable) [disabled] [size=512K]
        Region 2: I/O ports at cf00 [disabled] [size=32]
        Region 3: Memory at fdbfc000 (32-bit, non-prefetchable) [disabled] [size=16K]
        [virtual] Expansion ROM at fdb80000 [disabled] [size=256K]
        Capabilities: [c8] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable+ DSel=0 DScale=1 PME-
        Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [e0] Express (v1) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
                LnkCap: Port #1, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <128ns, L1 <64us
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        Capabilities: [a0] MSI-X: Enable- Count=5 Masked-
                Vector table: BAR=3 offset=00000000
                PBA: BAR=3 offset=00002000
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
        Capabilities: [140 v1] Device Serial Number 00-1b-21-ff-ff-cf-8f-57
        Kernel driver in use: vfio-pci


and tested with the same kernel, which hangs w/ atheros card. It just
worked. Not just once, but each of the tests I did. I retested w/
atheros -> hang. Tested again with intel-card -> works. Back to
atheros -> hang.

Seems to be really a problem w/ the atheros card, which is triggered by
new vc save/restore.

Well, but what to do now? I know how to "fix" it. But this means I have
to compile my kernels again on my own if it is >= 3.14.


Thanks,
kind regards,
Andreas

  reply index

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-09-23 19:03 Andreas Hartmann
2014-09-23 20:07 ` Alex Williamson
2014-09-24 14:54   ` Andreas Hartmann
2014-09-24 17:16     ` Andreas Hartmann
2014-10-10  9:39   ` Andreas Hartmann
2014-10-10 14:37     ` Bjorn Helgaas
2014-10-10 14:49       ` Andreas Hartmann
2014-10-10 15:55         ` Bjorn Helgaas
2014-10-10 16:09           ` Andreas Hartmann
2014-10-10 16:41             ` Bjorn Helgaas
2014-10-10 22:32               ` Andreas Hartmann
2014-10-10 22:54                 ` Bjorn Helgaas
2014-10-11  6:20                   ` Andreas Hartmann
2014-10-15  8:04                     ` Alex Williamson
2014-10-17  1:04                       ` Andreas Hartmann
2014-10-21 21:06                         ` Alex Williamson
2014-10-21 21:32                           ` Alex Williamson
2014-10-22 16:22                             ` Andreas Hartmann
2014-10-22 20:36                               ` Alex Williamson
2014-10-23 16:00                                 ` Andreas Hartmann
2014-10-23 16:33                                   ` Alex Williamson
2014-10-23 17:12                                     ` Andreas Hartmann
2014-10-23 17:33                                     ` Andreas Hartmann
2014-10-23 19:37                                       ` Alex Williamson
2014-10-24 14:21                                         ` Andreas Hartmann
2014-10-25  6:03                                         ` Andreas Hartmann
2014-10-28 21:51                                           ` Alex Williamson
2014-10-29 16:47                                             ` Andreas Hartmann
2014-10-29 17:44                                               ` Alex Williamson
2014-10-29 17:57                                                 ` Andreas Hartmann
2014-10-29 18:16                                                   ` Alex Williamson
2014-10-29 19:43                                                     ` Andreas Hartmann
2014-10-29 20:50                                                       ` Alex Williamson
2014-10-29 21:35                                                         ` Andreas Hartmann
2014-10-30 16:35                                                         ` Andreas Hartmann
2014-10-30 16:58                                                           ` Alex Williamson
2014-10-30 19:09                                                             ` Andreas Hartmann [this message]
2014-10-30 19:45                                                               ` Alex Williamson
2014-10-30 20:21                                                                 ` Andreas Hartmann
2014-10-22 15:34                           ` Andreas Hartmann
2014-10-22 16:02                             ` Alex Williamson
2014-10-22 16:20                               ` Andreas Hartmann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20141030200922.15126d7a@dualc.maya.org \
    --to=andihartmann@freenet.de \
    --cc=alex.williamson@redhat.com \
    --cc=bhelgaas@google.com \
    --cc=linux-pci@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-PCI Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-pci/0 linux-pci/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-pci linux-pci/ https://lore.kernel.org/linux-pci \
		linux-pci@vger.kernel.org
	public-inbox-index linux-pci

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-pci


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git