Linux-PCI Archive on lore.kernel.org
From: Andreas Hartmann <andihartmann@freenet.de>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: linux-pci <linux-pci@vger.kernel.org>
Subject: Re: Hard and silent lock up since linux 3.14 with PCIe pass through (vfio)
Date: Fri, 10 Oct 2014 11:39:36 +0200
Message-ID: <5437A958.3000201@maya.org> (raw)
In-Reply-To: <1411502866.24563.8.camel@ul30vt.home>

In short: I retested with QEMU 2.1.0 and Linux 3.17.0, and there is no change in behaviour.

Alex Williamson wrote:
> On Tue, 2014-09-23 at 21:03 +0200, Andreas Hartmann wrote:
>> Hello!
>>
>> For a long time now, I have been using PCIe pass through w/o any problem
>> with a Gigabyte GA-990XA-UD3/GA-990XA-UD3 mainboard (AMD 990X chipset) and
>> enabled IOMMU with vfio-pci.
>>
>> The last kernel working w/o any problem is kernel 3.13.7 (I didn't use
>> .8 and .9, but I do not think they would have been problematic).
>>
>> Since 3.14.19 (I didn't test any 3.14 kernel before) I'm encountering a
>> hard and silent lock up of the complete machine when starting the VM
>> with the PCIe card passed through.
>>
>> That's the relevant PCIe card, which locks up the machine (here
>> running w/ 3.12.28) when passed to the VM:
>>
>> 03:00.0 Network controller: Qualcomm Atheros AR93xx Wireless Network Adapter (rev 01)
>>         Subsystem: Qualcomm Atheros Device 3112
>>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
>>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>>         Latency: 0, Cache Line Size: 64 bytes
>>         Interrupt: pin A routed to IRQ 17
>>         Region 0: Memory at fdbc0000 (64-bit, non-prefetchable) [size=128K]
>>         Expansion ROM at fda00000 [size=64K]
>>         Capabilities: [40] Power Management version 3
>>                 Flags: PMEClk- DSI- D1+ D2- AuxCurrent=375mA PME(D0+,D1+,D2-,D3hot+,D3cold-)
>>                 Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
>>         Capabilities: [50] MSI: Enable- Count=1/4 Maskable+ 64bit+
>>                 Address: 0000000000000000  Data: 0000
>>                 Masking: 00000000  Pending: 00000000
>>         Capabilities: [70] Express (v2) Endpoint, MSI 00
>>                 DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <1us, L1 <8us
>>                         ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
>>                 DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
>>                         RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
>>                         MaxPayload 128 bytes, MaxReadReq 512 bytes
>>                 DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
>>                 LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <2us, L1 <64us
>>                         ClockPM- Surprise- LLActRep- BwNot-
>>                 LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
>>                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>>                 LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>>                 DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR-, OBFF Not Supported
>>                 DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
>>                 LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
>>                          Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
>>                          Compliance De-emphasis: -6dB
>>                 LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
>>                          EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
>>         Capabilities: [100 v1] Advanced Error Reporting
>>                 UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>                 UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>                 UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>>                 CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout+ NonFatalErr+
>>                 CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
>>                 AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
>>         Capabilities: [140 v1] Virtual Channel
>>                 Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
>>                 Arb:    Fixed- WRR32- WRR64- WRR128-
>>                 Ctrl:   ArbSelect=Fixed
>>                 Status: InProgress-
>>                 VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
>>                         Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
>>                         Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
>>                         Status: NegoPending- InProgress-
>>         Capabilities: [300 v1] Device Serial Number 00-00-00-00-00-00-00-00
>>         Kernel driver in use: vfio-pci
>>         Kernel modules: ath9k
>>
>>
>> Unbinding it works w/o any problem. The lock up occurs about 4 s
>> after the start of the VM.
>>
>> On 3.12.x, I can see the following message on the error terminal when
>> starting the VM: 
>> vfio-pci: 03:00.0: invalid ROM contents.
>>
>> I compared AMD-Vi debug output between 3.12 and 3.14, but couldn't see
>> any difference. I also compared /proc/interrupts between 3.12 and 3.14
>> and couldn't see any difference there either so far.
>>
>>
>> qemu version I'm using is 1.7.0.
>>
>>
>> It is strange(?), that a second VM using PCI (legacy) pass through works
>> w/o any problem. I tried to start the problematic VM even w/o running
>> this VM - same result: machine is locked up hard.
>>
>>
>> Do you have any idea what could be going on there? Or how to debug it
>> to see what happened?
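
For reference, a /proc/interrupts comparison like the one described above
could be scripted roughly as follows. This is only a sketch: the function
name and the snapshot file names are hypothetical, and it assumes one
snapshot was saved per kernel (e.g. with cat /proc/interrupts > file).
The per-CPU counters are stripped because they always differ between boots.

```shell
#!/bin/sh
# normalize_interrupts FILE
# Print each IRQ line of a saved /proc/interrupts snapshot without the
# per-CPU counters, so only IRQ number, controller and handler names remain.
# (Hypothetical helper; purely numeric fields are treated as counters.)
normalize_interrupts() {
    awk 'NR > 1 {
        out = $1
        for (i = 2; i <= NF; i++)
            if ($i !~ /^[0-9]+$/) out = out " " $i
        print out
    }' "$1"
}

# Usage (file names are assumptions, one snapshot per kernel):
#   normalize_interrupts interrupts-3.12.txt > a.txt
#   normalize_interrupts interrupts-3.14.txt > b.txt
#   diff -u a.txt b.txt
```

An empty diff only shows that the same IRQs are routed to the same
handlers on both kernels; it says nothing about interrupt rates.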

> There weren't many vfio changes between 3.13 and 3.14.

Could it be a PCI problem, too? It is strange that there is no problem
with the PCI card, but the PCIe card hangs the machine!

> Have you tested whether the problem still occurs on 3.16 +

Same problem with 3.17.0.

> newer QEMU?

Same problem with QEMU 2.1.0.

>  Maybe also remove the ROM from the equation with the
> rombar=0 option for the vfio-pci device in QEMU.

Same problem :-(. The machine really is completely dead: it doesn't even
respond to pings any more.
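
For reference, the rombar=0 test corresponds to a device option along
these lines. The host address 03:00.0 is taken from the lspci output
above; the helper name is hypothetical, and the rest of the QEMU command
line is not given in this thread, so it is left out here.

```shell
#!/bin/sh
# vfio_device_arg HOST_ADDR
# Build the QEMU -device option for a vfio-pci device with its expansion
# ROM BAR hidden from the guest (rombar=0). Hypothetical helper, for
# illustration only; it prints the option and does not start QEMU.
vfio_device_arg() {
    printf '%s\n' "-device vfio-pci,host=$1,rombar=0"
}

# Print the option for the card shown in the lspci output above.
vfio_device_arg 03:00.0
```

The printed option would be appended to whatever QEMU command line
normally starts the VM.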



Regards,
Andreas


Thread overview: 42+ messages
2014-09-23 19:03 Andreas Hartmann
2014-09-23 20:07 ` Alex Williamson
2014-09-24 14:54   ` Andreas Hartmann
2014-09-24 17:16     ` Andreas Hartmann
2014-10-10  9:39   ` Andreas Hartmann [this message]
2014-10-10 14:37     ` Bjorn Helgaas
2014-10-10 14:49       ` Andreas Hartmann
2014-10-10 15:55         ` Bjorn Helgaas
2014-10-10 16:09           ` Andreas Hartmann
2014-10-10 16:41             ` Bjorn Helgaas
2014-10-10 22:32               ` Andreas Hartmann
2014-10-10 22:54                 ` Bjorn Helgaas
2014-10-11  6:20                   ` Andreas Hartmann
2014-10-15  8:04                     ` Alex Williamson
2014-10-17  1:04                       ` Andreas Hartmann
2014-10-21 21:06                         ` Alex Williamson
2014-10-21 21:32                           ` Alex Williamson
2014-10-22 16:22                             ` Andreas Hartmann
2014-10-22 20:36                               ` Alex Williamson
2014-10-23 16:00                                 ` Andreas Hartmann
2014-10-23 16:33                                   ` Alex Williamson
2014-10-23 17:12                                     ` Andreas Hartmann
2014-10-23 17:33                                     ` Andreas Hartmann
2014-10-23 19:37                                       ` Alex Williamson
2014-10-24 14:21                                         ` Andreas Hartmann
2014-10-25  6:03                                         ` Andreas Hartmann
2014-10-28 21:51                                           ` Alex Williamson
2014-10-29 16:47                                             ` Andreas Hartmann
2014-10-29 17:44                                               ` Alex Williamson
2014-10-29 17:57                                                 ` Andreas Hartmann
2014-10-29 18:16                                                   ` Alex Williamson
2014-10-29 19:43                                                     ` Andreas Hartmann
2014-10-29 20:50                                                       ` Alex Williamson
2014-10-29 21:35                                                         ` Andreas Hartmann
2014-10-30 16:35                                                         ` Andreas Hartmann
2014-10-30 16:58                                                           ` Alex Williamson
2014-10-30 19:09                                                             ` Andreas Hartmann
2014-10-30 19:45                                                               ` Alex Williamson
2014-10-30 20:21                                                                 ` Andreas Hartmann
2014-10-22 15:34                           ` Andreas Hartmann
2014-10-22 16:02                             ` Alex Williamson
2014-10-22 16:20                               ` Andreas Hartmann
