All of lore.kernel.org
 help / color / mirror / Atom feed
* Bug: Completion-Wait loop timed out with vfio
@ 2023-02-25  6:25 Tasos Sahanidis
  2023-02-27  5:33 ` Abhishek Sahu
  0 siblings, 1 reply; 16+ messages in thread
From: Tasos Sahanidis @ 2023-02-25  6:25 UTC (permalink / raw)
  To: alex.williamson; +Cc: abhsahu, kvm

Hello everyone,

Attempting to pass through my graphics card to a VM with kernel 
>= 5.19.results in the following (host):

[   72.645091] AMD-Vi: Completion-Wait loop timed out
[   72.791448] AMD-Vi: Completion-Wait loop timed out
[   72.937768] AMD-Vi: Completion-Wait loop timed out
[   73.084388] AMD-Vi: Completion-Wait loop timed out
[   73.231661] AMD-Vi: Completion-Wait loop timed out
[   73.231711] ahci 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0017 address=0xc5f3f000 flags=0x0050]
[   73.231724] ahci 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0017 address=0xc5f3f040 flags=0x0050]
[   73.231734] ahci 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0017 address=0xc5f3f080 flags=0x0050]
[   73.231743] ahci 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0017 address=0xc5f3f0c0 flags=0x0050]
[   73.231752] ahci 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0017 address=0xc5f3f100 flags=0x0050]
[   73.231761] ahci 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0017 address=0xc5f3f140 flags=0x0050]
[   73.231770] ahci 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0017 address=0xc5f3f180 flags=0x0050]
[   73.231779] ahci 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0017 address=0xc5f3f1c0 flags=0x0050]
[   73.231788] ahci 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0017 address=0xc5f3f200 flags=0x0050]
[   73.231797] ahci 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0017 address=0xc5f3f240 flags=0x0050]
[   73.377900] AMD-Vi: Completion-Wait loop timed out
[   73.500538] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=06:00.0 address=0x1001e4600]
[   73.546431] AMD-Vi: Completion-Wait loop timed out
[   73.693772] AMD-Vi: Completion-Wait loop timed out
[   73.847385] AMD-Vi: Completion-Wait loop timed out
[   74.001796] AMD-Vi: Completion-Wait loop timed out
[   74.148077] AMD-Vi: Completion-Wait loop timed out
[   74.168380] virbr0: port 2(vnet0) entered learning state
[   74.294937] AMD-Vi: Completion-Wait loop timed out
[   74.296484] ata2.00: exception Emask 0x20 SAct 0x7e703fff SErr 0x0 action 0x6 frozen
[   74.296492] ata2.00: irq_stat 0x20000000, host bus error
[   74.296496] ata2.00: failed command: WRITE FPDMA QUEUED
[   74.296498] ata2.00: cmd 61/08:00:c0:ec:91/00:00:01:00:00/40 tag 0 ncq dma 4096 out
                        res 40/00:34:20:eb:91/00:00:01:00:00/40 Emask 0x20 (host bus error)
[   74.296507] ata2.00: status: { DRDY }
[more ATA errors]
[   74.296724] ata2: hard resetting link
[   74.430739] AMD-Vi: Completion-Wait loop timed out
[   74.502557] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=06:00.0 address=0x1001e4660]
[   74.502563] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=06:00.0 address=0x1001e4680]
[   74.680713] vfio-pci 0000:06:00.0: enabling device (0000 -> 0003)
[   74.681219] vfio-pci 0000:06:00.0: vfio_ecap_init: hiding ecap 0x19@0x270
[   74.681235] vfio-pci 0000:06:00.0: vfio_ecap_init: hiding ecap 0x1b@0x2d0
[   74.700687] vfio-pci 0000:06:00.1: enabling device (0000 -> 0002)
[   74.772816] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[   74.775906] ata2.00: configured for UDMA/133
[   74.775957] ata2: EH complete
[   74.935315] AMD-Vi: Completion-Wait loop timed out
[   75.073590] AMD-Vi: Completion-Wait loop timed out
[   75.212946] AMD-Vi: Completion-Wait loop timed out
[   75.379316] AMD-Vi: Completion-Wait loop timed out
[   75.504512] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=06:00.0 address=0x1001e46f0]

Stopping the VM results in similar messages.

The card is an AMD Radeon HD 7790 (1002:665c) and shows up at 06:00.0 on
the host. This is a Ryzen system with an ASUS "TUF GAMING X570-PLUS".
Userspace virt-related packages are all stock from Ubuntu 20.04.

While these messages are printed, sometimes the cursor and audio
stutter. These temporary freezes have also caused file system
corruption. The graphics card is non functional in this state.

Bisecting this shows that the issue was introduced by:
7ab5e10eda02d ("vfio/pci: Move the unused device into low power state with runtime PM").

Reverting that commit in 5.19 results in GPU passthrough working as
expected. The patch doesn't cleanly revert on kernels newer than 5.19.

--
Tasos

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Bug: Completion-Wait loop timed out with vfio
  2023-02-25  6:25 Bug: Completion-Wait loop timed out with vfio Tasos Sahanidis
@ 2023-02-27  5:33 ` Abhishek Sahu
  2023-02-28  7:33   ` Tasos Sahanidis
  0 siblings, 1 reply; 16+ messages in thread
From: Abhishek Sahu @ 2023-02-27  5:33 UTC (permalink / raw)
  To: Tasos Sahanidis, alex.williamson; +Cc: kvm

On 2/25/2023 11:55 AM, Tasos Sahanidis wrote:
> Hello everyone,
> 
> Attempting to pass through my graphics card to a VM with kernel 
>> = 5.19.results in the following (host):
> 
> [   72.645091] AMD-Vi: Completion-Wait loop timed out
> [   72.791448] AMD-Vi: Completion-Wait loop timed out
> [   72.937768] AMD-Vi: Completion-Wait loop timed out
> [   73.084388] AMD-Vi: Completion-Wait loop timed out
> [   73.231661] AMD-Vi: Completion-Wait loop timed out
> [   73.231711] ahci 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0017 address=0xc5f3f000 flags=0x0050]
> [   73.231724] ahci 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0017 address=0xc5f3f040 flags=0x0050]
> [   73.231734] ahci 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0017 address=0xc5f3f080 flags=0x0050]
> [   73.231743] ahci 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0017 address=0xc5f3f0c0 flags=0x0050]
> [   73.231752] ahci 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0017 address=0xc5f3f100 flags=0x0050]
> [   73.231761] ahci 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0017 address=0xc5f3f140 flags=0x0050]
> [   73.231770] ahci 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0017 address=0xc5f3f180 flags=0x0050]
> [   73.231779] ahci 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0017 address=0xc5f3f1c0 flags=0x0050]
> [   73.231788] ahci 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0017 address=0xc5f3f200 flags=0x0050]
> [   73.231797] ahci 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0017 address=0xc5f3f240 flags=0x0050]
> [   73.377900] AMD-Vi: Completion-Wait loop timed out
> [   73.500538] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=06:00.0 address=0x1001e4600]
> [   73.546431] AMD-Vi: Completion-Wait loop timed out
> [   73.693772] AMD-Vi: Completion-Wait loop timed out
> [   73.847385] AMD-Vi: Completion-Wait loop timed out
> [   74.001796] AMD-Vi: Completion-Wait loop timed out
> [   74.148077] AMD-Vi: Completion-Wait loop timed out
> [   74.168380] virbr0: port 2(vnet0) entered learning state
> [   74.294937] AMD-Vi: Completion-Wait loop timed out
> [   74.296484] ata2.00: exception Emask 0x20 SAct 0x7e703fff SErr 0x0 action 0x6 frozen
> [   74.296492] ata2.00: irq_stat 0x20000000, host bus error
> [   74.296496] ata2.00: failed command: WRITE FPDMA QUEUED
> [   74.296498] ata2.00: cmd 61/08:00:c0:ec:91/00:00:01:00:00/40 tag 0 ncq dma 4096 out
>                         res 40/00:34:20:eb:91/00:00:01:00:00/40 Emask 0x20 (host bus error)
> [   74.296507] ata2.00: status: { DRDY }
> [more ATA errors]
> [   74.296724] ata2: hard resetting link
> [   74.430739] AMD-Vi: Completion-Wait loop timed out
> [   74.502557] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=06:00.0 address=0x1001e4660]
> [   74.502563] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=06:00.0 address=0x1001e4680]
> [   74.680713] vfio-pci 0000:06:00.0: enabling device (0000 -> 0003)
> [   74.681219] vfio-pci 0000:06:00.0: vfio_ecap_init: hiding ecap 0x19@0x270
> [   74.681235] vfio-pci 0000:06:00.0: vfio_ecap_init: hiding ecap 0x1b@0x2d0
> [   74.700687] vfio-pci 0000:06:00.1: enabling device (0000 -> 0002)
> [   74.772816] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> [   74.775906] ata2.00: configured for UDMA/133
> [   74.775957] ata2: EH complete
> [   74.935315] AMD-Vi: Completion-Wait loop timed out
> [   75.073590] AMD-Vi: Completion-Wait loop timed out
> [   75.212946] AMD-Vi: Completion-Wait loop timed out
> [   75.379316] AMD-Vi: Completion-Wait loop timed out
> [   75.504512] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=06:00.0 address=0x1001e46f0]
> 
> Stopping the VM results in similar messages.
> 
> The card is an AMD Radeon HD 7790 (1002:665c) and shows up at 06:00.0 on
> the host. This is a Ryzen system with an ASUS "TUF GAMING X570-PLUS".
> Userspace virt-related packages are all stock from Ubuntu 20.04.
> 
> While these messages are printed, sometimes the cursor and audio
> stutter. These temporary freezes have also caused file system
> corruption. The graphics card is non functional in this state.
> 
> Bisecting this shows that the issue was introduced by:
> 7ab5e10eda02d ("vfio/pci: Move the unused device into low power state with runtime PM").
> 
> Reverting that commit in 5.19 results in GPU passthrough working as
> expected. The patch doesn't cleanly revert on kernels newer than 5.19.
> 
> --
> Tasos

 Thanks Tasos.

 The patch enables the runtime power management. Previously, when the device is unused
 state, then it will be put in D3hot state. Now, it will be put into D3cold.
 In D3cold, the device power will be removed completely.

 If the issue is happening after this patch that means somehow the runtime power
 management is not working as expected with this device or platform.

 Is it possible to try following things at your end to get more information,

 1. Set disable_idle_d3 module parameter set and check if this issue happens.
    It can be done by adding following entry in command line

               vfio_pci.disable_idle_d3=1

 2. Without starting the VM, check the status of following sysfs entries.

    # cat /sys/bus/pci/devices/<B:D:F>/power/runtime_status
    # cat /sys/bus/pci/devices/<B:D:F>/power/power_state

 3. After issue happens, run the above command again.
 4. Do lspci -s <B:D:F> -vvv without starting the VM and see if it is printing the correct
    results and there is no new prints in the dmesg.
 5. Enable the ftrace events related with runtime power management before starting the VM

    # echo 1 > /sys/kernel/debug/tracing/events/rpm/enable
               
    and collect the trace logs after this issue happens

    # cat /sys/kernel/debug/tracing/trace
           
 6. Do you have any NVIDIA graphics card with you. If you have, then
    could you please check if issue happens with that.

  Thanks,
  Abhishek

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Bug: Completion-Wait loop timed out with vfio
  2023-02-27  5:33 ` Abhishek Sahu
@ 2023-02-28  7:33   ` Tasos Sahanidis
  2023-02-28 18:46     ` Alex Williamson
  2023-03-01 10:10     ` Abhishek Sahu
  0 siblings, 2 replies; 16+ messages in thread
From: Tasos Sahanidis @ 2023-02-28  7:33 UTC (permalink / raw)
  To: Abhishek Sahu, alex.williamson; +Cc: kvm

Thank you very much for your quick response, Abhishek.

> 1. Set disable_idle_d3 module parameter set and check if this issue happens.
The issue does not happen with disable_idle_d3, which means I can at
least now use newer kernels. All the following commands were ran
*without* disable_idle_d3, so that the issue would occur.

> 2. Without starting the VM, check the status of following sysfs entries.
I assume by /sys/bus/pci/devices/<B:D:F>/power/power_state you meant
/sys/bus/pci/devices/<B:D:F>/power_state, as the former doesn't exist.

# cat /sys/bus/pci/devices/0000\:06\:00.0/power/runtime_status
suspended
# cat /sys/bus/pci/devices/0000\:06\:00.0/power_state
D3hot

> 3. After issue happens, run the above command again.
This is with the VM running and the errors in dmesg:

# cat /sys/bus/pci/devices/0000\:06\:00.0/power/runtime_status
active
# cat /sys/bus/pci/devices/0000\:06\:00.0/power_state
D0

> 4. Do lspci -s <B:D:F> -vvv without starting the VM and see if it is printing the correct
>    results and there is no new prints in the dmesg.
This is from before the VM was started:

# lspci -s 0000:06:00.0 -vvv
06:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Bonaire XT [Radeon HD 7790/8770 / R7 360 / R9 260/360 OEM] (prog-if 00 [VGA controller])
	Subsystem: ASUSTeK Computer Inc. Radeon HD 7790 DirectCU II OC
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin A routed to IRQ 255
	Region 0: Memory at <ignored> (64-bit, prefetchable) [disabled]
	Region 2: Memory at <ignored> (64-bit, prefetchable) [disabled]
	Region 4: I/O ports at d000 [disabled] [size=256]
	Region 5: Memory at fca00000 (32-bit, non-prefetchable) [disabled] [size=256K]
	Expansion ROM at fca40000 [disabled] [size=128K]
	Capabilities: [48] Vendor Specific Information: Len=08 <?>
	Capabilities: [50] Power Management version 3
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold-)
		Status: D3 NoSoftRst+ PME-Enable+ DSel=0 DScale=0 PME-
	Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
		LnkCap:	Port #2, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 8GT/s (ok), Width x4 (downgraded)
			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis-, NROPrPrP-, LTR-
			 10BitTagComp-, 10BitTagReq-, OBFF Not Supported, ExtFmt+, EETLPPrefix+, MaxEETLPPrefixes 1
			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
			 FRS-
			 AtomicOpsCap: 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
			 AtomicOpsCtl: ReqEn-
		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
			 EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
	Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
	Capabilities: [150 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
		AERCap:	First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
		HeaderLog: 00000000 00000000 00000000 00000000
	Capabilities: [270 v1] Secondary PCI Express
		LnkCtl3: LnkEquIntrruptEn-, PerformEqu-
		LaneErrStat: 0
	Capabilities: [2b0 v1] Address Translation Service (ATS)
		ATSCap:	Invalidate Queue Depth: 00
		ATSCtl:	Enable+, Smallest Translation Unit: 00
	Capabilities: [2c0 v1] Page Request Interface (PRI)
		PRICtl: Enable- Reset-
		PRISta: RF- UPRGI- Stopped+
		Page Request Capacity: 00000020, Page Request Allocation: 00000000
	Capabilities: [2d0 v1] Process Address Space ID (PASID)
		PASIDCap: Exec+ Priv+, Max PASID Width: 10
		PASIDCtl: Enable- Exec- Priv-
	Kernel driver in use: vfio-pci
	Kernel modules: amdgpu


This is the diff from while the VM was running:

@@ -1,25 +1,25 @@
 root@tasos-Standard-PC-Q35-ICH9-2009:~# lspci -s 0000:06:00.0 -vvv
 06:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Bonaire XT [Radeon HD 7790/8770 / R7 360 / R9 260/360 OEM] (prog-if 00 [VGA controller])
 	Subsystem: ASUSTeK Computer Inc. Radeon HD 7790 DirectCU II OC
-	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
+	Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
-	Interrupt: pin A routed to IRQ 255
-	Region 0: Memory at <ignored> (64-bit, prefetchable) [disabled]
-	Region 2: Memory at <ignored> (64-bit, prefetchable) [disabled]
-	Region 4: I/O ports at d000 [disabled] [size=256]
-	Region 5: Memory at fca00000 (32-bit, non-prefetchable) [disabled] [size=256K]
+	Interrupt: pin A routed to IRQ 42
+	Region 0: Memory at <ignored> (64-bit, prefetchable)
+	Region 2: Memory at <ignored> (64-bit, prefetchable)
+	Region 4: I/O ports at d000 [size=256]
+	Region 5: Memory at fca00000 (32-bit, non-prefetchable) [size=256K]
 	Expansion ROM at fca40000 [disabled] [size=128K]
 	Capabilities: [48] Vendor Specific Information: Len=08 <?>
 	Capabilities: [50] Power Management version 3
 		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold-)
-		Status: D3 NoSoftRst+ PME-Enable+ DSel=0 DScale=0 PME-
+		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
 	Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
 		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
 			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
 		DevCtl:	CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
 			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
 			MaxPayload 128 bytes, MaxReadReq 512 bytes
-		DevSta:	CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
+		DevSta:	CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
 		LnkCap:	Port #2, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
 			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
 		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+


If I am reading this correctly, the card claims to not support D3cold.

Nothing extra was printed in dmesg while running the lspci commands.

Regarding the "Memory at <ignored> (64-bit, prefetchable)" lines, this
is because of presumably a firmware bug where it doesn't correctly map
the secondary graphics card in the address space.

On my main installation, I boot with pci=realloc which fixes it, but due
to the filesystem corruption risk, I am performing this testing on a
separate installation with the rest of the drives unmounted, and I
forgot to add pci=realloc. The behaviour in regards to the power
management issue is the same in both cases (it was originally discovered
with pci=realloc set).

> 5. Enable the ftrace events related with runtime power management before starting the VM
I captured the trace, but
$ wc -l trace
41129 trace

It doesn't sound like a good idea to send the contents of that entire
file. Is there something specific you'd like me to filter for?

Perhaps this is a bit better?:
$ grep "KVM\|qemu\|0000:06:00" trace | wc -l
719

If not, I can upload the entire file and send a link, although I don't
know if it will be caught in a spam filter.

>  6. Do you have any NVIDIA graphics card with you. If you have, then
>     could you please check if issue happens with that.
Unfortunately, no, I do not have any NVIDIA cards.

I tried passing through a:
0a:00.0 Serial controller [0700]: MosChip Semiconductor Technology Ltd. PCIe 9912 Multi-I/O Controller [9710:9912] (prog-if 02 [16550])
which claims:
	Capabilities: [78] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-

and I did not get any messages/errors in dmesg.

Thank you

--
Tasos


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Bug: Completion-Wait loop timed out with vfio
  2023-02-28  7:33   ` Tasos Sahanidis
@ 2023-02-28 18:46     ` Alex Williamson
  2023-03-01 10:34       ` Tasos Sahanidis
  2023-03-01 10:10     ` Abhishek Sahu
  1 sibling, 1 reply; 16+ messages in thread
From: Alex Williamson @ 2023-02-28 18:46 UTC (permalink / raw)
  To: Tasos Sahanidis; +Cc: Abhishek Sahu, kvm

On Tue, 28 Feb 2023 09:33:53 +0200
Tasos Sahanidis <tasos@tasossah.com> wrote:

> Thank you very much for your quick response, Abhishek.
> 
> > 1. Set disable_idle_d3 module parameter set and check if this issue happens.  
> The issue does not happen with disable_idle_d3, which means I can at
> least now use newer kernels. All the following commands were ran
> *without* disable_idle_d3, so that the issue would occur.
> 
> > 2. Without starting the VM, check the status of following sysfs entries.  
> I assume by /sys/bus/pci/devices/<B:D:F>/power/power_state you meant
> /sys/bus/pci/devices/<B:D:F>/power_state, as the former doesn't exist.
> 
> # cat /sys/bus/pci/devices/0000\:06\:00.0/power/runtime_status
> suspended
> # cat /sys/bus/pci/devices/0000\:06\:00.0/power_state
> D3hot
> 
> > 3. After issue happens, run the above command again.  
> This is with the VM running and the errors in dmesg:
> 
> # cat /sys/bus/pci/devices/0000\:06\:00.0/power/runtime_status
> active
> # cat /sys/bus/pci/devices/0000\:06\:00.0/power_state
> D0


Can you do the same for the root port to the GPU, ex. use lspci -t to
find the parent root port.  Since the device doesn't seem to be
achieving D3cold (expected on a desktop system), the other significant
change of the identified commit is that the root port will also enter a
low power state.  Prior to that commit the device would enter D3hot, but
we never touched the root port.  Perhaps confirm the root port now
enters D3hot and compare lspci for the root port when using
disable_idle_d3 to that found when trying to use the device without
disable_idle_d3. Thanks,

Alex


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Bug: Completion-Wait loop timed out with vfio
  2023-02-28  7:33   ` Tasos Sahanidis
  2023-02-28 18:46     ` Alex Williamson
@ 2023-03-01 10:10     ` Abhishek Sahu
  2023-03-02  7:18       ` Tasos Sahanidis
  1 sibling, 1 reply; 16+ messages in thread
From: Abhishek Sahu @ 2023-03-01 10:10 UTC (permalink / raw)
  To: Tasos Sahanidis, alex.williamson; +Cc: kvm

On 2/28/2023 1:03 PM, Tasos Sahanidis wrote:
> Thank you very much for your quick response, Abhishek.
> 
>> 1. Set disable_idle_d3 module parameter set and check if this issue happens.
> The issue does not happen with disable_idle_d3, which means I can at
> least now use newer kernels. All the following commands were ran
> *without* disable_idle_d3, so that the issue would occur.
> 
>> 2. Without starting the VM, check the status of following sysfs entries.
> I assume by /sys/bus/pci/devices/<B:D:F>/power/power_state you meant
> /sys/bus/pci/devices/<B:D:F>/power_state, as the former doesn't exist.
> 
> # cat /sys/bus/pci/devices/0000\:06\:00.0/power/runtime_status
> suspended
> # cat /sys/bus/pci/devices/0000\:06\:00.0/power_state
> D3hot
> 

 So D3cold is not supported on this system.
 Most of the desktop systems doesn’t support D3cold. 
 In that case, as Alex mentioned that after that patch the root port can also
 go into D3hot state.
 
 Another difference is that earlier we were changing the device power state by
 directly writing into PCI PM_CTRL registers. Now, we are using kernel generic
 runtime PM function to perform the same.

 We need to print the root port runtime status and power_state as Alex mentioned.

 Apart from that, can we try following things to get more information,

 Before binding the Device to vfio-pci driver, disable the runtime power
 management of the root port
 
 # echo on > /sys/bus/pci/devices/<root_port B:D:F>/power/control

 After this, bind the device to vfio-pci driver and check the runtime status and power_state
 for both device and root port. The root port runtime_status should be active and power_state
 should be D0.

 With the runtime PM disabled for the root port, check if this issue happens.
 It will give clue if the root port going into D3hot status is causing the issue or
 the use of runtime PM to put device into D3hot is causing this. 

>> 3. After issue happens, run the above command again.
> This is with the VM running and the errors in dmesg:
> 
> # cat /sys/bus/pci/devices/0000\:06\:00.0/power/runtime_status
> active
> # cat /sys/bus/pci/devices/0000\:06\:00.0/power_state
> D0
> 
>> 4. Do lspci -s <B:D:F> -vvv without starting the VM and see if it is printing the correct
>>    results and there is no new prints in the dmesg.
> This is from before the VM was started:
> 
> # lspci -s 0000:06:00.0 -vvv
> 06:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Bonaire XT [Radeon HD 7790/8770 / R7 360 / R9 260/360 OEM] (prog-if 00 [VGA controller])
> 	Subsystem: ASUSTeK Computer Inc. Radeon HD 7790 DirectCU II OC
> 	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> 	Interrupt: pin A routed to IRQ 255
> 	Region 0: Memory at <ignored> (64-bit, prefetchable) [disabled]
> 	Region 2: Memory at <ignored> (64-bit, prefetchable) [disabled]
> 	Region 4: I/O ports at d000 [disabled] [size=256]
> 	Region 5: Memory at fca00000 (32-bit, non-prefetchable) [disabled] [size=256K]
> 	Expansion ROM at fca40000 [disabled] [size=128K]
> 	Capabilities: [48] Vendor Specific Information: Len=08 <?>
> 	Capabilities: [50] Power Management version 3
> 		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold-)
> 		Status: D3 NoSoftRst+ PME-Enable+ DSel=0 DScale=0 PME-
> 	Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
> 		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
> 			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
> 		DevCtl:	CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
> 			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
> 			MaxPayload 128 bytes, MaxReadReq 512 bytes
> 		DevSta:	CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
> 		LnkCap:	Port #2, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
> 			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
> 		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
> 			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> 		LnkSta:	Speed 8GT/s (ok), Width x4 (downgraded)
> 			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> 		DevCap2: Completion Timeout: Not Supported, TimeoutDis-, NROPrPrP-, LTR-
> 			 10BitTagComp-, 10BitTagReq-, OBFF Not Supported, ExtFmt+, EETLPPrefix+, MaxEETLPPrefixes 1
> 			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
> 			 FRS-
> 			 AtomicOpsCap: 32bit- 64bit- 128bitCAS-
> 		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
> 			 AtomicOpsCtl: ReqEn-
> 		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
> 			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> 			 Compliance De-emphasis: -6dB
> 		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
> 			 EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
> 	Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
> 		Address: 0000000000000000  Data: 0000
> 	Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
> 	Capabilities: [150 v2] Advanced Error Reporting
> 		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> 		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> 		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> 		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
> 		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
> 		AERCap:	First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
> 			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
> 		HeaderLog: 00000000 00000000 00000000 00000000
> 	Capabilities: [270 v1] Secondary PCI Express
> 		LnkCtl3: LnkEquIntrruptEn-, PerformEqu-
> 		LaneErrStat: 0
> 	Capabilities: [2b0 v1] Address Translation Service (ATS)
> 		ATSCap:	Invalidate Queue Depth: 00
> 		ATSCtl:	Enable+, Smallest Translation Unit: 00
> 	Capabilities: [2c0 v1] Page Request Interface (PRI)
> 		PRICtl: Enable- Reset-
> 		PRISta: RF- UPRGI- Stopped+
> 		Page Request Capacity: 00000020, Page Request Allocation: 00000000
> 	Capabilities: [2d0 v1] Process Address Space ID (PASID)
> 		PASIDCap: Exec+ Priv+, Max PASID Width: 10
> 		PASIDCtl: Enable- Exec- Priv-
> 	Kernel driver in use: vfio-pci
> 	Kernel modules: amdgpu
> 
> 
> This is the diff from while the VM was running:
> 
> @@ -1,25 +1,25 @@
>  root@tasos-Standard-PC-Q35-ICH9-2009:~# lspci -s 0000:06:00.0 -vvv
>  06:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Bonaire XT [Radeon HD 7790/8770 / R7 360 / R9 260/360 OEM] (prog-if 00 [VGA controller])
>  	Subsystem: ASUSTeK Computer Inc. Radeon HD 7790 DirectCU II OC
> -	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
> +	Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
>  	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> -	Interrupt: pin A routed to IRQ 255
> -	Region 0: Memory at <ignored> (64-bit, prefetchable) [disabled]
> -	Region 2: Memory at <ignored> (64-bit, prefetchable) [disabled]
> -	Region 4: I/O ports at d000 [disabled] [size=256]
> -	Region 5: Memory at fca00000 (32-bit, non-prefetchable) [disabled] [size=256K]
> +	Interrupt: pin A routed to IRQ 42
> +	Region 0: Memory at <ignored> (64-bit, prefetchable)
> +	Region 2: Memory at <ignored> (64-bit, prefetchable)
> +	Region 4: I/O ports at d000 [size=256]
> +	Region 5: Memory at fca00000 (32-bit, non-prefetchable) [size=256K]
>  	Expansion ROM at fca40000 [disabled] [size=128K]
>  	Capabilities: [48] Vendor Specific Information: Len=08 <?>
>  	Capabilities: [50] Power Management version 3
>  		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold-)
> -		Status: D3 NoSoftRst+ PME-Enable+ DSel=0 DScale=0 PME-
> +		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
>  	Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
>  		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
>  			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
>  		DevCtl:	CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
>  			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
>  			MaxPayload 128 bytes, MaxReadReq 512 bytes
> -		DevSta:	CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
> +		DevSta:	CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
>  		LnkCap:	Port #2, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
>  			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
>  		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
> 
> 
> If I am reading this correctly, the card claims to not support D3cold.
> 
> Nothing extra was printed in dmesg while running the lspci commands.
> 
> Regarding the "Memory at <ignored> (64-bit, prefetchable)" lines, this
> is because of presumably a firmware bug where it doesn't correctly map
> the secondary graphics card in the address space.
> 
> On my main installation, I boot with pci=realloc which fixes it, but due
> to the filesystem corruption risk, I am performing this testing on a
> separate installation with the rest of the drives unmounted, and I
> forgot to add pci=realloc. The behaviour in regards to the power
> management issue is the same in both cases (it was originally discovered
> with pci=realloc set).
> 

 This  “Completion-Wait loop timed out with vfio” prints is coming
 from the IOMMU driver. Can you please check once by adding ‘pci=realloc’
 in your separate installation and see if we the memory are enabled after
 D3hot cycles. If memory is getting disabled only after D3hot cycles with
 ‘pci=realloc’, then we need to find out at which stage it is happening
 (when the device is going into D3hot or when root port is going into D3hot).

 For this we can disable the runtime PM of both device and root port before
 binding the device to vfio-pci driver. Then enable runtime PM of device first
 and wait for it to go into suspended state. Then check lspci output. 
 Then enable the same for root port and check lspci output.

>> 5. Enable the ftrace events related with runtime power management before starting the VM
> I captured the trace, but
> $ wc -l trace
> 41129 trace
> 
> It doesn't sound like a good idea to send the contents of that entire
> file. Is there something specific you'd like me to filter for?
> 
> Perhaps this is a bit better?:
> $ grep "KVM\|qemu\|0000:06:00" trace | wc -l
> 719
> 
> If not, I can upload the entire file and send a link, although I don't
> know if it will be caught in a spam filter.
> 

 Mainly I was looking for the prints related with the device, root port
 and any other device present under the root port (like audio function of the GPU).
 But given these logs are huge and won’t help much here so you can skip this.

 Thanks,
 Abhishek

>>  6. Do you have any NVIDIA graphics card with you. If you have, then
>>     could you please check if issue happens with that.
> Unfortunately, no, I do not have any NVIDIA cards.
> 
> I tried passing through a:
> 0a:00.0 Serial controller [0700]: MosChip Semiconductor Technology Ltd. PCIe 9912 Multi-I/O Controller [9710:9912] (prog-if 02 [16550])
> which claims:
> 	Capabilities: [78] Power Management version 3
> 		Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
> 		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> 
> and I did not get any messages/errors in dmesg.
> 
> Thank you
> 
> --
> Tasos
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Bug: Completion-Wait loop timed out with vfio
  2023-02-28 18:46     ` Alex Williamson
@ 2023-03-01 10:34       ` Tasos Sahanidis
  2023-03-01 11:53         ` Abhishek Sahu
  2023-03-01 14:10         ` Alex Williamson
  0 siblings, 2 replies; 16+ messages in thread
From: Tasos Sahanidis @ 2023-03-01 10:34 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Abhishek Sahu, kvm

On 2023-02-28 20:46, Alex Williamson wrote:
> Can you do the same for the root port to the GPU, ex. use lspci -t to
> find the parent root port.  Since the device doesn't seem to be
> achieving D3cold (expected on a desktop system), the other significant
> change of the identified commit is that the root port will also enter a
> low power state.  Prior to that commit the device would enter D3hot, but
> we never touched the root port.  Perhaps confirm the root port now
> enters D3hot and compare lspci for the root port when using
> disable_idle_d3 to that found when trying to use the device without
> disable_idle_d3. Thanks,
> 
> Alex
> 

I seem to have trouble understanding the lspci tree.

The tree is as follows:

-[0000:00]-+-00.0  Advanced Micro Devices, Inc. [AMD] Starship/Matisse Root Complex
[...]      |
           +-01.2-[02-0d]----00.0-[03-0d]--+-01.0-[04-05]----00.0-[05]--+-00.0  Creative Labs EMU10k2/CA0100/CA0102/CA10200 [Sound Blaster Audigy Series]
           |                               |                            +-00.1  Creative Labs SB Audigy Game Port
           |                               |                            +-01.0  Brooktree Corporation Bt878 Video Capture
           |                               |                            \-01.1  Brooktree Corporation Bt878 Audio Capture
           |                               +-02.0-[06]--+-00.0  Advanced Micro Devices, Inc. [AMD/ATI] Bonaire XT [Radeon HD 7790/8770 / R7 360 / R9 260/360 OEM]
           |                               |            \-00.1  Advanced Micro Devices, Inc. [AMD/ATI] Tobago HDMI Audio [Radeon R7 360 / R9 360 OEM]
           |                               +-03.0-[07-08]----00.0-[08]--+-00.0  Philips Semiconductors SAA7131/SAA7133/SAA7135 Video Broadcast Decoder
           |                               |                            \-01.0  Yamaha Corporation YMF-744B [DS-1S Audio Controller]
           |                               +-05.0-[09]----00.0  Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
           |                               +-06.0-[0a]--+-00.0  MosChip Semiconductor Technology Ltd. PCIe 9912 Multi-I/O Controller
           |                               |            +-00.1  MosChip Semiconductor Technology Ltd. PCIe 9912 Multi-I/O Controller
           |                               |            \-00.2  MosChip Semiconductor Technology Ltd. PCIe 9912 Multi-I/O Controller
           |                               +-08.0-[0b]--+-00.0  Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP
           |                               |            +-00.1  Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller
           |                               |            \-00.3  Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller
           |                               +-09.0-[0c]----00.0  Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode]
           |                               \-0a.0-[0d]----00.0  Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode]
[...]      |

The parent root port is either 0000:00:01.2 or 0000:00:02.0, correct?
00:01.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge
02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse Switch Upstream

If so, I tested in 5.18, both before and while running the VM, with 6.2
both with and without disable_idle_d3, and in all cases they stayed at D0.

Only difference was the card itself would be at D0 instead of D3hot with
disable_idle_d3. In the working 5.18, without disable_idle_d3, it would
still enter D3hot.

==> 5_18_before_vm <==
# cat /sys/bus/pci/devices/0000:02:00.0/power_state
D0
# cat /sys/bus/pci/devices/0000:02:00.0/power/runtime_status
active
# cat /sys/bus/pci/devices/0000:00:01.2/power_state
D0
# cat /sys/bus/pci/devices/0000:00:01.2/power/runtime_status
active
# cat /sys/bus/pci/devices/0000:00:02.0/power_state
unknown
# cat /sys/bus/pci/devices/0000:00:02.0/power/runtime_status
active
# cat /sys/bus/pci/devices/0000:06:00.0/power_state
D3hot
# cat /sys/bus/pci/devices/0000:06:00.0/power/runtime_status
active

==> 5_18_running_vm <==
# cat /sys/bus/pci/devices/0000:02:00.0/power_state
D0
# cat /sys/bus/pci/devices/0000:02:00.0/power/runtime_status
active
# cat /sys/bus/pci/devices/0000:00:01.2/power_state
D0
# cat /sys/bus/pci/devices/0000:00:01.2/power/runtime_status
active
# cat /sys/bus/pci/devices/0000:00:02.0/power_state
unknown
# cat /sys/bus/pci/devices/0000:00:02.0/power/runtime_status
active
# cat /sys/bus/pci/devices/0000:06:00.0/power_state
D0
# cat /sys/bus/pci/devices/0000:06:00.0/power/runtime_status
active

==> 6_2_before_vm <==
# cat /sys/bus/pci/devices/0000:02:00.0/power_state
D0
# cat /sys/bus/pci/devices/0000:02:00.0/power/runtime_status
active
# cat /sys/bus/pci/devices/0000:00:01.2/power_state
D0
# cat /sys/bus/pci/devices/0000:00:01.2/power/runtime_status
active
# cat /sys/bus/pci/devices/0000:00:02.0/power_state
unknown
# cat /sys/bus/pci/devices/0000:00:02.0/power/runtime_status
active
# cat /sys/bus/pci/devices/0000:06:00.0/power_state
D3hot
# cat /sys/bus/pci/devices/0000:06:00.0/power/runtime_status
suspended

==> 6_2_running_vm <==
# cat /sys/bus/pci/devices/0000:02:00.0/power_state
D0
# cat /sys/bus/pci/devices/0000:02:00.0/power/runtime_status
active
# cat /sys/bus/pci/devices/0000:00:01.2/power_state
D0
# cat /sys/bus/pci/devices/0000:00:01.2/power/runtime_status
active
# cat /sys/bus/pci/devices/0000:00:02.0/power_state
unknown
# cat /sys/bus/pci/devices/0000:00:02.0/power/runtime_status
active
# cat /sys/bus/pci/devices/0000:06:00.0/power_state
D0
# cat /sys/bus/pci/devices/0000:06:00.0/power/runtime_status
active

==> 6_2_before_vm_disable_idle_d3 <==
# cat /sys/bus/pci/devices/0000:02:00.0/power_state
D0
# cat /sys/bus/pci/devices/0000:02:00.0/power/runtime_status
active
# cat /sys/bus/pci/devices/0000:00:01.2/power_state
D0
# cat /sys/bus/pci/devices/0000:00:01.2/power/runtime_status
active
# cat /sys/bus/pci/devices/0000:00:02.0/power_state
unknown
# cat /sys/bus/pci/devices/0000:00:02.0/power/runtime_status
active
# cat /sys/bus/pci/devices/0000:06:00.0/power_state
D0
# cat /sys/bus/pci/devices/0000:06:00.0/power/runtime_status
active

==> 6_2_running_vm_disable_idle_d3 <==
# cat /sys/bus/pci/devices/0000:02:00.0/power_state
D0
# cat /sys/bus/pci/devices/0000:02:00.0/power/runtime_status
active
# cat /sys/bus/pci/devices/0000:00:01.2/power_state
D0
# cat /sys/bus/pci/devices/0000:00:01.2/power/runtime_status
active
# cat /sys/bus/pci/devices/0000:00:02.0/power_state
unknown
# cat /sys/bus/pci/devices/0000:00:02.0/power/runtime_status
active
# cat /sys/bus/pci/devices/0000:06:00.0/power_state
D0
# cat /sys/bus/pci/devices/0000:06:00.0/power/runtime_status
active

0000:00:02.0 is Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
and can presumably be ignored.

--
Tasos

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Bug: Completion-Wait loop timed out with vfio
  2023-03-01 10:34       ` Tasos Sahanidis
@ 2023-03-01 11:53         ` Abhishek Sahu
  2023-03-02  7:26           ` Tasos Sahanidis
  2023-03-01 14:10         ` Alex Williamson
  1 sibling, 1 reply; 16+ messages in thread
From: Abhishek Sahu @ 2023-03-01 11:53 UTC (permalink / raw)
  To: Tasos Sahanidis, Alex Williamson; +Cc: kvm

On 3/1/2023 4:04 PM, Tasos Sahanidis wrote:
> On 2023-02-28 20:46, Alex Williamson wrote:
>> Can you do the same for the root port to the GPU, ex. use lspci -t to
>> find the parent root port.  Since the device doesn't seem to be
>> achieving D3cold (expected on a desktop system), the other significant
>> change of the identified commit is that the root port will also enter a
>> low power state.  Prior to that commit the device would enter D3hot, but
>> we never touched the root port.  Perhaps confirm the root port now
>> enters D3hot and compare lspci for the root port when using
>> disable_idle_d3 to that found when trying to use the device without
>> disable_idle_d3. Thanks,
>>
>> Alex
>>
> 
> I seem to have trouble understanding the lspci tree.
> 

 Can you please try following way to confirm the path

 ls -l /sys/bus/pci/devices

 It generally displays the full path like

 0000:03:00.0 -> ../../../devices/pci0000:00/0000:00:1c.5/0000:02:00.0/0000:03:00.0

 Also, Can you please print the runtime PM control entry as well

 /sys/bus/pci/devices/<root_port B:D:F>/power/control

 Thanks,
 Abhishek

> The tree is as follows:
> 
> -[0000:00]-+-00.0  Advanced Micro Devices, Inc. [AMD] Starship/Matisse Root Complex
> [...]      |
>            +-01.2-[02-0d]----00.0-[03-0d]--+-01.0-[04-05]----00.0-[05]--+-00.0  Creative Labs EMU10k2/CA0100/CA0102/CA10200 [Sound Blaster Audigy Series]
>            |                               |                            +-00.1  Creative Labs SB Audigy Game Port
>            |                               |                            +-01.0  Brooktree Corporation Bt878 Video Capture
>            |                               |                            \-01.1  Brooktree Corporation Bt878 Audio Capture
>            |                               +-02.0-[06]--+-00.0  Advanced Micro Devices, Inc. [AMD/ATI] Bonaire XT [Radeon HD 7790/8770 / R7 360 / R9 260/360 OEM]
>            |                               |            \-00.1  Advanced Micro Devices, Inc. [AMD/ATI] Tobago HDMI Audio [Radeon R7 360 / R9 360 OEM]
>            |                               +-03.0-[07-08]----00.0-[08]--+-00.0  Philips Semiconductors SAA7131/SAA7133/SAA7135 Video Broadcast Decoder
>            |                               |                            \-01.0  Yamaha Corporation YMF-744B [DS-1S Audio Controller]
>            |                               +-05.0-[09]----00.0  Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
>            |                               +-06.0-[0a]--+-00.0  MosChip Semiconductor Technology Ltd. PCIe 9912 Multi-I/O Controller
>            |                               |            +-00.1  MosChip Semiconductor Technology Ltd. PCIe 9912 Multi-I/O Controller
>            |                               |            \-00.2  MosChip Semiconductor Technology Ltd. PCIe 9912 Multi-I/O Controller
>            |                               +-08.0-[0b]--+-00.0  Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP
>            |                               |            +-00.1  Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller
>            |                               |            \-00.3  Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller
>            |                               +-09.0-[0c]----00.0  Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode]
>            |                               \-0a.0-[0d]----00.0  Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode]
> [...]      |
> 
> The parent root port is either 0000:00:01.2 or 0000:00:02.0, correct?
> 00:01.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge
> 02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse Switch Upstream
> 
> If so, I tested in 5.18, both before and while running the VM, with 6.2
> both with and without disable_idle_d3, and in all cases they stayed at D0.
> 
> Only difference was the card itself would be at D0 instead of D3hot with
> disable_idle_d3. In the working 5.18, without disable_idle_d3, it would
> still enter D3hot.
> 
> ==> 5_18_before_vm <==
> # cat /sys/bus/pci/devices/0000:02:00.0/power_state
> D0
> # cat /sys/bus/pci/devices/0000:02:00.0/power/runtime_status
> active
> # cat /sys/bus/pci/devices/0000:00:01.2/power_state
> D0
> # cat /sys/bus/pci/devices/0000:00:01.2/power/runtime_status
> active
> # cat /sys/bus/pci/devices/0000:00:02.0/power_state
> unknown
> # cat /sys/bus/pci/devices/0000:00:02.0/power/runtime_status
> active
> # cat /sys/bus/pci/devices/0000:06:00.0/power_state
> D3hot
> # cat /sys/bus/pci/devices/0000:06:00.0/power/runtime_status
> active
> 
> ==> 5_18_running_vm <==
> # cat /sys/bus/pci/devices/0000:02:00.0/power_state
> D0
> # cat /sys/bus/pci/devices/0000:02:00.0/power/runtime_status
> active
> # cat /sys/bus/pci/devices/0000:00:01.2/power_state
> D0
> # cat /sys/bus/pci/devices/0000:00:01.2/power/runtime_status
> active
> # cat /sys/bus/pci/devices/0000:00:02.0/power_state
> unknown
> # cat /sys/bus/pci/devices/0000:00:02.0/power/runtime_status
> active
> # cat /sys/bus/pci/devices/0000:06:00.0/power_state
> D0
> # cat /sys/bus/pci/devices/0000:06:00.0/power/runtime_status
> active
> 
> ==> 6_2_before_vm <==
> # cat /sys/bus/pci/devices/0000:02:00.0/power_state
> D0
> # cat /sys/bus/pci/devices/0000:02:00.0/power/runtime_status
> active
> # cat /sys/bus/pci/devices/0000:00:01.2/power_state
> D0
> # cat /sys/bus/pci/devices/0000:00:01.2/power/runtime_status
> active
> # cat /sys/bus/pci/devices/0000:00:02.0/power_state
> unknown
> # cat /sys/bus/pci/devices/0000:00:02.0/power/runtime_status
> active
> # cat /sys/bus/pci/devices/0000:06:00.0/power_state
> D3hot
> # cat /sys/bus/pci/devices/0000:06:00.0/power/runtime_status
> suspended
> 
> ==> 6_2_running_vm <==
> # cat /sys/bus/pci/devices/0000:02:00.0/power_state
> D0
> # cat /sys/bus/pci/devices/0000:02:00.0/power/runtime_status
> active
> # cat /sys/bus/pci/devices/0000:00:01.2/power_state
> D0
> # cat /sys/bus/pci/devices/0000:00:01.2/power/runtime_status
> active
> # cat /sys/bus/pci/devices/0000:00:02.0/power_state
> unknown
> # cat /sys/bus/pci/devices/0000:00:02.0/power/runtime_status
> active
> # cat /sys/bus/pci/devices/0000:06:00.0/power_state
> D0
> # cat /sys/bus/pci/devices/0000:06:00.0/power/runtime_status
> active
> 
> ==> 6_2_before_vm_disable_idle_d3 <==
> # cat /sys/bus/pci/devices/0000:02:00.0/power_state
> D0
> # cat /sys/bus/pci/devices/0000:02:00.0/power/runtime_status
> active
> # cat /sys/bus/pci/devices/0000:00:01.2/power_state
> D0
> # cat /sys/bus/pci/devices/0000:00:01.2/power/runtime_status
> active
> # cat /sys/bus/pci/devices/0000:00:02.0/power_state
> unknown
> # cat /sys/bus/pci/devices/0000:00:02.0/power/runtime_status
> active
> # cat /sys/bus/pci/devices/0000:06:00.0/power_state
> D0
> # cat /sys/bus/pci/devices/0000:06:00.0/power/runtime_status
> active
> 
> ==> 6_2_running_vm_disable_idle_d3 <==
> # cat /sys/bus/pci/devices/0000:02:00.0/power_state
> D0
> # cat /sys/bus/pci/devices/0000:02:00.0/power/runtime_status
> active
> # cat /sys/bus/pci/devices/0000:00:01.2/power_state
> D0
> # cat /sys/bus/pci/devices/0000:00:01.2/power/runtime_status
> active
> # cat /sys/bus/pci/devices/0000:00:02.0/power_state
> unknown
> # cat /sys/bus/pci/devices/0000:00:02.0/power/runtime_status
> active
> # cat /sys/bus/pci/devices/0000:06:00.0/power_state
> D0
> # cat /sys/bus/pci/devices/0000:06:00.0/power/runtime_status
> active
> 
> 0000:00:02.0 is Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
> and can presumably be ignored.
> 
> --
> Tasos


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Bug: Completion-Wait loop timed out with vfio
  2023-03-01 10:34       ` Tasos Sahanidis
  2023-03-01 11:53         ` Abhishek Sahu
@ 2023-03-01 14:10         ` Alex Williamson
  2023-03-02  7:40           ` Tasos Sahanidis
  1 sibling, 1 reply; 16+ messages in thread
From: Alex Williamson @ 2023-03-01 14:10 UTC (permalink / raw)
  To: Tasos Sahanidis; +Cc: Abhishek Sahu, kvm

On Wed, 1 Mar 2023 12:34:32 +0200
Tasos Sahanidis <tasos@tasossah.com> wrote:

> On 2023-02-28 20:46, Alex Williamson wrote:
> > Can you do the same for the root port to the GPU, ex. use lspci -t to
> > find the parent root port.  Since the device doesn't seem to be
> > achieving D3cold (expected on a desktop system), the other significant
> > change of the identified commit is that the root port will also enter a
> > low power state.  Prior to that commit the device would enter D3hot, but
> > we never touched the root port.  Perhaps confirm the root port now
> > enters D3hot and compare lspci for the root port when using
> > disable_idle_d3 to that found when trying to use the device without
> > disable_idle_d3. Thanks,
> > 
> > Alex
> >   
> 
> I seem to have trouble understanding the lspci tree.
> 
> The tree is as follows:
> 
> -[0000:00]-+-00.0  Advanced Micro Devices, Inc. [AMD] Starship/Matisse Root Complex
> [...]      |
>            +-01.2-[02-0d]----00.0-[03-0d]--+-01.0-[04-05]----00.0-[05]--+-00.0  Creative Labs EMU10k2/CA0100/CA0102/CA10200 [Sound Blaster Audigy Series]
>            |                               |                            +-00.1  Creative Labs SB Audigy Game Port
>            |                               |                            +-01.0  Brooktree Corporation Bt878 Video Capture
>            |                               |                            \-01.1  Brooktree Corporation Bt878 Audio Capture
>            |                               +-02.0-[06]--+-00.0  Advanced Micro Devices, Inc. [AMD/ATI] Bonaire XT [Radeon HD 7790/8770 / R7 360 / R9 260/360 OEM]
>            |                               |            \-00.1  Advanced Micro Devices, Inc. [AMD/ATI] Tobago HDMI Audio [Radeon R7 360 / R9 360 OEM]
>            |                               +-03.0-[07-08]----00.0-[08]--+-00.0  Philips Semiconductors SAA7131/SAA7133/SAA7135 Video Broadcast Decoder
>            |                               |                            \-01.0  Yamaha Corporation YMF-744B [DS-1S Audio Controller]
>            |                               +-05.0-[09]----00.0  Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
>            |                               +-06.0-[0a]--+-00.0  MosChip Semiconductor Technology Ltd. PCIe 9912 Multi-I/O Controller
>            |                               |            +-00.1  MosChip Semiconductor Technology Ltd. PCIe 9912 Multi-I/O Controller
>            |                               |            \-00.2  MosChip Semiconductor Technology Ltd. PCIe 9912 Multi-I/O Controller
>            |                               +-08.0-[0b]--+-00.0  Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP
>            |                               |            +-00.1  Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller
>            |                               |            \-00.3  Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller
>            |                               +-09.0-[0c]----00.0  Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode]
>            |                               \-0a.0-[0d]----00.0  Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode]
> [...]      |
> 
> The parent root port is either 0000:00:01.2 or 0000:00:02.0, correct?

The topology is a bit more complex than usual, the root port is indeed
0000:00:01.2, but we have a PCIe switch.

> 00:01.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge
> 02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse Switch Upstream

0000:02:00.0 is the upstream port of that switch and 0000:03:02.0 is
the downstream port for the 7790.  0000:03:02.0 is the port that should
also now enter D3hot.

> If so, I tested in 5.18, both before and while running the VM, with 6.2
> both with and without disable_idle_d3, and in all cases they stayed at D0.

In this case the upstream port should always stay in D0, it has quite a
lot more devices under it than just the GPU.  It's interesting that the
MosChip that assigns ok is also under a downstream port of this switch.
That means the downstream port 0000:03:06.0 should also be entering
D3hot when all of the MosChip devices are attached to vfio-pci and
unused.

I'm not convinced thought that the MosChip assignment is a good
comparison device though, as a "multi-i/o" controller, it's possible
that it doesn't actually make use of DMA that would trigger the IOMMU
like the GPU does.  Do you have a NIC card you could replace one of
these with?

It's possible the switch has a problem with D3hot support and it may
need to be disabled or augmented with a PCI quirk.  In addition to
investigating what power state the downstream port is achieving and
reporting lspci -vvv with and without disable_idle_d3, would you mind
reporting "lspci -nns 2:00.0" and "lspci -nns 3:" to report all the
vendor and device IDs of the switch.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Bug: Completion-Wait loop timed out with vfio
  2023-03-01 10:10     ` Abhishek Sahu
@ 2023-03-02  7:18       ` Tasos Sahanidis
  0 siblings, 0 replies; 16+ messages in thread
From: Tasos Sahanidis @ 2023-03-02  7:18 UTC (permalink / raw)
  To: Abhishek Sahu, alex.williamson; +Cc: kvm

On 2023-03-01 12:10, Abhishek Sahu wrote:
>  So D3cold is not supported on this system.
>  Most of the desktop systems doesn’t support D3cold. 
>  In that case, as Alex mentioned that after that patch the root port can also
>  go into D3hot state.
>  
>  Another difference is that earlier we were changing the device power state by
>  directly writing into PCI PM_CTRL registers. Now, we are using kernel generic
>  runtime PM function to perform the same.
> 
>  We need to print the root port runtime status and power_state as Alex mentioned.

Understood. Thanks for explaining!

>  Apart from that, can we try following things to get more information,
> 
>  Before binding the Device to vfio-pci driver, disable the runtime power
>  management of the root port
>  
>  # echo on > /sys/bus/pci/devices/<root_port B:D:F>/power/control
> 
>  After this, bind the device to vfio-pci driver and check the runtime status and power_state
>  for both device and root port. The root port runtime_status should be active and power_state
>  should be D0.
> 
>  With the runtime PM disabled for the root port, check if this issue happens.
>  It will give clue if the root port going into D3hot status is causing the issue or
>  the use of runtime PM to put device into D3hot is causing this. 

I prevented vfio-pci from loading automatically on boot, and booted with
nomodeset. I set the power control to manual (echo on), and then loaded
vfio-pci. At that point, the root port was at D0 (never entered D3hot)
but the card was at D3hot.

There were no errors in dmesg when I started the VM.

On boot:
==> 6_2_before_vm_before_vfio_before_manual <==
# cat /sys/bus/pci/devices/0000:03:02.0/power_state
D0
# cat /sys/bus/pci/devices/0000:03:02.0/power/runtime_status
active
# cat /sys/bus/pci/devices/0000:03:02.0/power/control
auto
# cat /sys/bus/pci/devices/0000:06:00.0/power_state
unknown
# cat /sys/bus/pci/devices/0000:06:00.0/power/runtime_status
active
# cat /sys/bus/pci/devices/0000:06:00.0/power/control
on

After echo on > [..]/power/control
==> 6_2_before_vm_before_vfio_after_manual <==
# cat /sys/bus/pci/devices/0000:03:02.0/power_state
D0
# cat /sys/bus/pci/devices/0000:03:02.0/power/runtime_status
active
# cat /sys/bus/pci/devices/0000:03:02.0/power/control
on
# cat /sys/bus/pci/devices/0000:06:00.0/power_state
unknown
# cat /sys/bus/pci/devices/0000:06:00.0/power/runtime_status
active
# cat /sys/bus/pci/devices/0000:06:00.0/power/control
on

After loading vfio-pci:
==> 6_2_before_vm_after_vfio_after_manual <==
# cat /sys/bus/pci/devices/0000:03:02.0/power_state
D0
# cat /sys/bus/pci/devices/0000:03:02.0/power/runtime_status
active
# cat /sys/bus/pci/devices/0000:03:02.0/power/control
on
# cat /sys/bus/pci/devices/0000:06:00.0/power_state
D3hot
# cat /sys/bus/pci/devices/0000:06:00.0/power/runtime_status
suspended
# cat /sys/bus/pci/devices/0000:06:00.0/power/control
auto

And finally, while the VM was running:
==> 6_2_running_vm_after_vfio_after_manual <==
# cat /sys/bus/pci/devices/0000:03:02.0/power_state
D0
# cat /sys/bus/pci/devices/0000:03:02.0/power/runtime_status
active
# cat /sys/bus/pci/devices/0000:03:02.0/power/control
on
# cat /sys/bus/pci/devices/0000:06:00.0/power_state
D0
# cat /sys/bus/pci/devices/0000:06:00.0/power/runtime_status
active
# cat /sys/bus/pci/devices/0000:06:00.0/power/control
auto

>  This  “Completion-Wait loop timed out with vfio” prints is coming
>  from the IOMMU driver. Can you please check once by adding ‘pci=realloc’
>  in your separate installation and see if we the memory are enabled after
>  D3hot cycles. If memory is getting disabled only after D3hot cycles with
>  ‘pci=realloc’, then we need to find out at which stage it is happening
>  (when the device is going into D3hot or when root port is going into D3hot).
> 
>  For this we can disable the runtime PM of both device and root port before
>  binding the device to vfio-pci driver. Then enable runtime PM of device first
>  and wait for it to go into suspended state. Then check lspci output. 
>  Then enable the same for root port and check lspci output.

Assuming I understood this correctly, I added pci=realloc, and the
memory was disabled while in D3hot, but enabled as expected on D0.
While running the below commands, power control was set to the default
"auto" after a fresh reboot.

Before running the VM:
# lspci -vvnn -s 0000:06:00.0
06:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Bonaire XT [Radeon HD 7790/8770 / R7 360 / R9 260/360 OEM] [1002:665c] (prog-if 00 [VGA controller])
	Subsystem: ASUSTeK Computer Inc. Radeon HD 7790 DirectCU II OC [1043:0452]
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin A routed to IRQ 255
	Region 0: Memory at 1030000000 (64-bit, prefetchable) [disabled] [size=256M]
	Region 2: Memory at 1040000000 (64-bit, prefetchable) [disabled] [size=8M]
	Region 4: I/O ports at d000 [disabled] [size=256]
	Region 5: Memory at d0100000 (32-bit, non-prefetchable) [disabled] [size=256K]
	Expansion ROM at <ignored> [disabled]
	Capabilities: [48] Vendor Specific Information: Len=08 <?>
	Capabilities: [50] Power Management version 3
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold-)
		Status: D3 NoSoftRst+ PME-Enable+ DSel=0 DScale=0 PME-
	Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
		LnkCap:	Port #2, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 8GT/s (ok), Width x4 (downgraded)
			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis-, NROPrPrP-, LTR-
			 10BitTagComp-, 10BitTagReq-, OBFF Not Supported, ExtFmt+, EETLPPrefix+, MaxEETLPPrefixes 1
			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
			 FRS-
			 AtomicOpsCap: 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
			 AtomicOpsCtl: ReqEn-
		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
			 EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
	Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
	Capabilities: [150 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
		AERCap:	First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
		HeaderLog: 00000000 00000000 00000000 00000000
	Capabilities: [270 v1] Secondary PCI Express
		LnkCtl3: LnkEquIntrruptEn-, PerformEqu-
		LaneErrStat: 0
	Capabilities: [2b0 v1] Address Translation Service (ATS)
		ATSCap:	Invalidate Queue Depth: 00
		ATSCtl:	Enable+, Smallest Translation Unit: 00
	Capabilities: [2c0 v1] Page Request Interface (PRI)
		PRICtl: Enable- Reset-
		PRISta: RF- UPRGI- Stopped+
		Page Request Capacity: 00000020, Page Request Allocation: 00000000
	Capabilities: [2d0 v1] Process Address Space ID (PASID)
		PASIDCap: Exec+ Priv+, Max PASID Width: 10
		PASIDCtl: Enable- Exec- Priv-
	Kernel driver in use: vfio-pci
	Kernel modules: amdgpu


While running the VM (with IOMMU errors):
# lspci -vvnn -s 0000:06:00.0
06:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Bonaire XT [Radeon HD 7790/8770 / R7 360 / R9 260/360 OEM] [1002:665c] (prog-if 00 [VGA controller])
	Subsystem: ASUSTeK Computer Inc. Radeon HD 7790 DirectCU II OC [1043:0452]
	Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin A routed to IRQ 42
	Region 0: Memory at 1030000000 (64-bit, prefetchable) [size=256M]
	Region 2: Memory at 1040000000 (64-bit, prefetchable) [size=8M]
	Region 4: I/O ports at d000 [size=256]
	Region 5: Memory at d0100000 (32-bit, non-prefetchable) [size=256K]
	Expansion ROM at <ignored> [disabled]
	Capabilities: [48] Vendor Specific Information: Len=08 <?>
	Capabilities: [50] Power Management version 3
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
		LnkCap:	Port #2, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 8GT/s (ok), Width x4 (downgraded)
			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis-, NROPrPrP-, LTR-
			 10BitTagComp-, 10BitTagReq-, OBFF Not Supported, ExtFmt+, EETLPPrefix+, MaxEETLPPrefixes 1
			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
			 FRS-
			 AtomicOpsCap: 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
			 AtomicOpsCtl: ReqEn-
		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
			 EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
	Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
	Capabilities: [150 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
		AERCap:	First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
		HeaderLog: 00000000 00000000 00000000 00000000
	Capabilities: [270 v1] Secondary PCI Express
		LnkCtl3: LnkEquIntrruptEn-, PerformEqu-
		LaneErrStat: 0
	Capabilities: [2b0 v1] Address Translation Service (ATS)
		ATSCap:	Invalidate Queue Depth: 00
		ATSCtl:	Enable+, Smallest Translation Unit: 00
	Capabilities: [2c0 v1] Page Request Interface (PRI)
		PRICtl: Enable- Reset-
		PRISta: RF- UPRGI- Stopped+
		Page Request Capacity: 00000020, Page Request Allocation: 00000000
	Capabilities: [2d0 v1] Process Address Space ID (PASID)
		PASIDCap: Exec+ Priv+, Max PASID Width: 10
		PASIDCtl: Enable- Exec- Priv-
	Kernel driver in use: vfio-pci
	Kernel modules: amdgpu

--
Tasos

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Bug: Completion-Wait loop timed out with vfio
  2023-03-01 11:53         ` Abhishek Sahu
@ 2023-03-02  7:26           ` Tasos Sahanidis
  0 siblings, 0 replies; 16+ messages in thread
From: Tasos Sahanidis @ 2023-03-02  7:26 UTC (permalink / raw)
  To: Abhishek Sahu, Alex Williamson; +Cc: kvm

On 2023-03-01 13:53, Abhishek Sahu wrote:
> On 3/1/2023 4:04 PM, Tasos Sahanidis wrote:
>> On 2023-02-28 20:46, Alex Williamson wrote:
>>> Can you do the same for the root port to the GPU, ex. use lspci -t to
>>> find the parent root port.  Since the device doesn't seem to be
>>> achieving D3cold (expected on a desktop system), the other significant
>>> change of the identified commit is that the root port will also enter a
>>> low power state.  Prior to that commit the device would enter D3hot, but
>>> we never touched the root port.  Perhaps confirm the root port now
>>> enters D3hot and compare lspci for the root port when using
>>> disable_idle_d3 to that found when trying to use the device without
>>> disable_idle_d3. Thanks,
>>>
>>> Alex
>>>
>>
>> I seem to have trouble understanding the lspci tree.
>>
> 
>  Can you please try following way to confirm the path
> 
>  ls -l /sys/bus/pci/devices
> 
>  It generally displays the full path like
> 
>  0000:03:00.0 -> ../../../devices/pci0000:00/0000:00:1c.5/0000:02:00.0/0000:03:00.0

Thanks! I had the wrong device (as Alex pointed out), but this helped
verify that it is in fact 0000:03:02.0.

$ ls -l /sys/bus/pci/devices/ | grep 06:
0000:06:00.0 -> ../../../devices/pci0000:00/0000:00:01.2/0000:02:00.0/0000:03:02.0/0000:06:00.0
0000:06:00.1 -> ../../../devices/pci0000:00/0000:00:01.2/0000:02:00.0/0000:03:02.0/0000:06:00.1

>  Also, Can you please print the runtime PM control entry as well
> 
>  /sys/bus/pci/devices/<root_port B:D:F>/power/control

I now printed it in my reply to your other message, and I also did so
while collecting information in response to Alex's email. Hope it helps.

>  Thanks,
>  Abhishek

--
Tasos


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Bug: Completion-Wait loop timed out with vfio
  2023-03-01 14:10         ` Alex Williamson
@ 2023-03-02  7:40           ` Tasos Sahanidis
  2023-03-02 20:36             ` Alex Williamson
  0 siblings, 1 reply; 16+ messages in thread
From: Tasos Sahanidis @ 2023-03-02  7:40 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Abhishek Sahu, kvm

On 2023-03-01 16:10, Alex Williamson wrote:
> 0000:02:00.0 is the upstream port of that switch and 0000:03:02.0 is
> the downstream port for the 7790.  0000:03:02.0 is the port that should
> also now enter D3hot.
> 
>> If so, I tested in 5.18, both before and while running the VM, with 6.2
>> both with and without disable_idle_d3, and in all cases they stayed at D0.
> 
> In this case the upstream port should always stay in D0, it has quite a
> lot more devices under it than just the GPU.  It's interesting that the
> MosChip that assigns ok is also under a downstream port of this switch.
> That means the downstream port 0000:03:06.0 should also be entering
> D3hot when all of the MosChip devices are attached to vfio-pci and
> unused.
> 
> I'm not convinced thought that the MosChip assignment is a good
> comparison device though, as a "multi-i/o" controller, it's possible
> that it doesn't actually make use of DMA that would trigger the IOMMU
> like the GPU does.  Do you have a NIC card you could replace one of
> these with?

Unfortunately I do not have any other PCIe devices available.
I did, however, pass through the onboard Realtek NIC and it worked
perfectly fine (without any errors in dmesg).

0000:03:05.0 should be the root port for the NIC.

==> 6_2_before_vm_rtl <==
+ cat /sys/bus/pci/devices/0000:03:05.0/power_state
D3hot
+ cat /sys/bus/pci/devices/0000:03:05.0/power/runtime_status
suspended
+ cat /sys/bus/pci/devices/0000:03:05.0/power/control
auto
+ cat /sys/bus/pci/devices/0000:09:00.0/power_state
D3hot
+ cat /sys/bus/pci/devices/0000:09:00.0/power/runtime_status
suspended
+ cat /sys/bus/pci/devices/0000:09:00.0/power/control
auto
+ lspci -vvv -s 0000:03:05.0
03:05.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin ? routed to IRQ 37
	Bus: primary=03, secondary=09, subordinate=09, sec-latency=0
	I/O behind bridge: 0000b000-0000bfff [size=4K]
	Memory behind bridge: d0300000-d03fffff [size=1M]
	Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff [disabled]
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: <access denied>
	Kernel driver in use: pcieport


==> 6_2_running_vm_rtl <==
+ cat /sys/bus/pci/devices/0000:03:05.0/power_state
D0
+ cat /sys/bus/pci/devices/0000:03:05.0/power/runtime_status
active
+ cat /sys/bus/pci/devices/0000:03:05.0/power/control
auto
+ cat /sys/bus/pci/devices/0000:09:00.0/power_state
D0
+ cat /sys/bus/pci/devices/0000:09:00.0/power/runtime_status
active
+ cat /sys/bus/pci/devices/0000:09:00.0/power/control
auto
+ lspci -vvv -s 0000:03:05.0
03:05.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin ? routed to IRQ 37
	Bus: primary=03, secondary=09, subordinate=09, sec-latency=0
	I/O behind bridge: 0000b000-0000bfff [size=4K]
	Memory behind bridge: d0300000-d03fffff [size=1M]
	Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff [disabled]
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: <access denied>
	Kernel driver in use: pcieport


> It's possible the switch has a problem with D3hot support and it may
> need to be disabled or augmented with a PCI quirk.  In addition to
> investigating what power state the downstream port is achieving and
> reporting lspci -vvv with and without disable_idle_d3, would you mind
> reporting "lspci -nns 2:00.0" and "lspci -nns 3:" to report all the
> vendor and device IDs of the switch.  Thanks,
> 

It seems that way, especially after manually preventing the root port
for the graphics card from entering D3hot, however the one for the NIC
seems to be doing that just fine, which makes things more confusing.

$ lspci -nns 2:00.0
02:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Matisse Switch Upstream [1022:57ad]

$ lspci -nns 3:
03:01.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge [1022:57a3]
03:02.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge [1022:57a3]
03:03.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge [1022:57a3]
03:05.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge [1022:57a3]
03:06.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge [1022:57a3]
03:08.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge [1022:57a4]
03:09.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge [1022:57a4]
03:0a.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge [1022:57a4]

And the information for the PCIe port:

==> 6_2_before_vm <==
+ cat /sys/bus/pci/devices/0000:03:02.0/power_state
D3hot
+ cat /sys/bus/pci/devices/0000:03:02.0/power/runtime_status
suspended
+ cat /sys/bus/pci/devices/0000:03:02.0/power/control
auto
+ lspci -vvv -s 0000:03:02.0
03:02.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin ? routed to IRQ 35
	Bus: primary=03, secondary=06, subordinate=06, sec-latency=0
	I/O behind bridge: 0000d000-0000dfff [size=4K]
	Memory behind bridge: d0100000-d01fffff [size=1M]
	Prefetchable memory behind bridge: 0000001030000000-0000001047ffffff [size=384M]
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
	BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [50] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D3 NoSoftRst- PME-Enable+ DSel=0 DScale=0 PME-
	Capabilities: [58] Express (v2) Downstream Port (Slot+), MSI 00
		DevCap:	MaxPayload 512 bytes, PhantFunc 0
			ExtTag+ RBE+
		DevCtl:	CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
		LnkCap:	Port #2, Speed 16GT/s, Width x4, ASPM L1, Exit Latency L1 <32us
			ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
		LnkCtl:	ASPM Disabled; Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 8GT/s (downgraded), Width x4 (ok)
			TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
		SltCap:	AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
			Slot #0, PowerLimit 0.000W; Interlock- NoCompl+
		SltCtl:	Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
			Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
		SltSta:	Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
			Changed: MRL- PresDet- LinkState+
		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, NROPrPrP-, LTR+
			 10BitTagComp-, 10BitTagReq-, OBFF Not Supported, ExtFmt-, EETLPPrefix-
			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
			 FRS-, ARIFwd-
			 AtomicOpsCap: Routing-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
			 AtomicOpsCtl: EgressBlck-
		LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
			 EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
	Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
		Address: 00000000fee00000  Data: 0000
	Capabilities: [c0] Subsystem: ASUSTeK Computer Inc. Matisse PCIe GPP Bridge
	Capabilities: [c8] HyperTransport: MSI Mapping Enable+ Fixed+
	Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
	Capabilities: [150 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
		AERCap:	First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
		HeaderLog: 00000000 00000000 00000000 00000000
	Capabilities: [270 v1] Secondary PCI Express
		LnkCtl3: LnkEquIntrruptEn-, PerformEqu-
		LaneErrStat: LaneErr at lane: 0 1 2 3
	Capabilities: [2a0 v1] Access Control Services
		ACSCap:	SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans+
		ACSCtl:	SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
	Capabilities: [370 v1] L1 PM Substates
		L1SubCap: PCI-PM_L1.2- PCI-PM_L1.1+ ASPM_L1.2- ASPM_L1.1+ L1_PM_Substates+
		L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
		L1SubCtl2:
	Capabilities: [400 v1] Data Link Feature <?>
	Capabilities: [410 v1] Physical Layer 16.0 GT/s <?>
	Capabilities: [440 v1] Lane Margining at the Receiver <?>
	Kernel driver in use: pcieport


==> 6_2_running_vm <==
+ cat /sys/bus/pci/devices/0000:03:02.0/power_state
D0
+ cat /sys/bus/pci/devices/0000:03:02.0/power/runtime_status
active
+ cat /sys/bus/pci/devices/0000:03:02.0/power/control
auto
+ lspci -vvv -s 0000:03:02.0
03:02.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin ? routed to IRQ 35
	Bus: primary=03, secondary=06, subordinate=06, sec-latency=0
	I/O behind bridge: 0000d000-0000dfff [size=4K]
	Memory behind bridge: d0100000-d01fffff [size=1M]
	Prefetchable memory behind bridge: 0000001030000000-0000001047ffffff [size=384M]
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
	BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [50] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [58] Express (v2) Downstream Port (Slot+), MSI 00
		DevCap:	MaxPayload 512 bytes, PhantFunc 0
			ExtTag+ RBE+
		DevCtl:	CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
		LnkCap:	Port #2, Speed 16GT/s, Width x4, ASPM L1, Exit Latency L1 <32us
			ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
		LnkCtl:	ASPM Disabled; Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 8GT/s (downgraded), Width x4 (ok)
			TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt+
		SltCap:	AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
			Slot #0, PowerLimit 0.000W; Interlock- NoCompl+
		SltCtl:	Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
			Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
		SltSta:	Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
			Changed: MRL- PresDet- LinkState+
		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, NROPrPrP-, LTR+
			 10BitTagComp-, 10BitTagReq-, OBFF Not Supported, ExtFmt-, EETLPPrefix-
			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
			 FRS-, ARIFwd-
			 AtomicOpsCap: Routing-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
			 AtomicOpsCtl: EgressBlck-
		LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
			 EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
	Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
		Address: 00000000fee00000  Data: 0000
	Capabilities: [c0] Subsystem: ASUSTeK Computer Inc. Matisse PCIe GPP Bridge
	Capabilities: [c8] HyperTransport: MSI Mapping Enable+ Fixed+
	Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
	Capabilities: [150 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
		AERCap:	First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
		HeaderLog: 00000000 00000000 00000000 00000000
	Capabilities: [270 v1] Secondary PCI Express
		LnkCtl3: LnkEquIntrruptEn-, PerformEqu-
		LaneErrStat: LaneErr at lane: 0 1 2 3
	Capabilities: [2a0 v1] Access Control Services
		ACSCap:	SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans+
		ACSCtl:	SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
	Capabilities: [370 v1] L1 PM Substates
		L1SubCap: PCI-PM_L1.2- PCI-PM_L1.1+ ASPM_L1.2- ASPM_L1.1+ L1_PM_Substates+
		L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
		L1SubCtl2:
	Capabilities: [400 v1] Data Link Feature <?>
	Capabilities: [410 v1] Physical Layer 16.0 GT/s <?>
	Capabilities: [440 v1] Lane Margining at the Receiver <?>
	Kernel driver in use: pcieport


==> 6_2_before_vm_disable_d3_idle <==
+ cat /sys/bus/pci/devices/0000:03:02.0/power_state
D0
+ cat /sys/bus/pci/devices/0000:03:02.0/power/runtime_status
active
+ cat /sys/bus/pci/devices/0000:03:02.0/power/control
auto
+ lspci -vvv -s 0000:03:02.0
03:02.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin ? routed to IRQ 35
	Bus: primary=03, secondary=06, subordinate=06, sec-latency=0
	I/O behind bridge: 0000d000-0000dfff [size=4K]
	Memory behind bridge: d0100000-d01fffff [size=1M]
	Prefetchable memory behind bridge: 0000001030000000-0000001047ffffff [size=384M]
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
	BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [50] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [58] Express (v2) Downstream Port (Slot+), MSI 00
		DevCap:	MaxPayload 512 bytes, PhantFunc 0
			ExtTag+ RBE+
		DevCtl:	CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
		LnkCap:	Port #2, Speed 16GT/s, Width x4, ASPM L1, Exit Latency L1 <32us
			ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
		LnkCtl:	ASPM Disabled; Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 8GT/s (downgraded), Width x4 (ok)
			TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
		SltCap:	AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
			Slot #0, PowerLimit 0.000W; Interlock- NoCompl+
		SltCtl:	Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
			Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
		SltSta:	Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
			Changed: MRL- PresDet- LinkState+
		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, NROPrPrP-, LTR+
			 10BitTagComp-, 10BitTagReq-, OBFF Not Supported, ExtFmt-, EETLPPrefix-
			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
			 FRS-, ARIFwd-
			 AtomicOpsCap: Routing-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
			 AtomicOpsCtl: EgressBlck-
		LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
			 EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
	Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
		Address: 00000000fee00000  Data: 0000
	Capabilities: [c0] Subsystem: ASUSTeK Computer Inc. Matisse PCIe GPP Bridge
	Capabilities: [c8] HyperTransport: MSI Mapping Enable+ Fixed+
	Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
	Capabilities: [150 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
		AERCap:	First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
		HeaderLog: 00000000 00000000 00000000 00000000
	Capabilities: [270 v1] Secondary PCI Express
		LnkCtl3: LnkEquIntrruptEn-, PerformEqu-
		LaneErrStat: LaneErr at lane: 0 1 2 3
	Capabilities: [2a0 v1] Access Control Services
		ACSCap:	SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans+
		ACSCtl:	SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
	Capabilities: [370 v1] L1 PM Substates
		L1SubCap: PCI-PM_L1.2- PCI-PM_L1.1+ ASPM_L1.2- ASPM_L1.1+ L1_PM_Substates+
		L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
		L1SubCtl2:
	Capabilities: [400 v1] Data Link Feature <?>
	Capabilities: [410 v1] Physical Layer 16.0 GT/s <?>
	Capabilities: [440 v1] Lane Margining at the Receiver <?>
	Kernel driver in use: pcieport


==> 6_2_running_vm_disable_d3_idle <==
+ cat /sys/bus/pci/devices/0000:03:02.0/power_state
D0
+ cat /sys/bus/pci/devices/0000:03:02.0/power/runtime_status
active
+ cat /sys/bus/pci/devices/0000:03:02.0/power/control
auto
+ lspci -vvv -s 0000:03:02.0
03:02.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin ? routed to IRQ 35
	Bus: primary=03, secondary=06, subordinate=06, sec-latency=0
	I/O behind bridge: 0000d000-0000dfff [size=4K]
	Memory behind bridge: d0100000-d01fffff [size=1M]
	Prefetchable memory behind bridge: 0000001030000000-0000001047ffffff [size=384M]
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
	BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [50] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [58] Express (v2) Downstream Port (Slot+), MSI 00
		DevCap:	MaxPayload 512 bytes, PhantFunc 0
			ExtTag+ RBE+
		DevCtl:	CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
		LnkCap:	Port #2, Speed 16GT/s, Width x4, ASPM L1, Exit Latency L1 <32us
			ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
		LnkCtl:	ASPM Disabled; Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 8GT/s (downgraded), Width x4 (ok)
			TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt+
		SltCap:	AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
			Slot #0, PowerLimit 0.000W; Interlock- NoCompl+
		SltCtl:	Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
			Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
		SltSta:	Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
			Changed: MRL- PresDet- LinkState+
		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, NROPrPrP-, LTR+
			 10BitTagComp-, 10BitTagReq-, OBFF Not Supported, ExtFmt-, EETLPPrefix-
			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
			 FRS-, ARIFwd-
			 AtomicOpsCap: Routing-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
			 AtomicOpsCtl: EgressBlck-
		LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
			 EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
	Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
		Address: 00000000fee00000  Data: 0000
	Capabilities: [c0] Subsystem: ASUSTeK Computer Inc. Matisse PCIe GPP Bridge
	Capabilities: [c8] HyperTransport: MSI Mapping Enable+ Fixed+
	Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
	Capabilities: [150 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
		AERCap:	First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
		HeaderLog: 00000000 00000000 00000000 00000000
	Capabilities: [270 v1] Secondary PCI Express
		LnkCtl3: LnkEquIntrruptEn-, PerformEqu-
		LaneErrStat: LaneErr at lane: 0 1 2 3
	Capabilities: [2a0 v1] Access Control Services
		ACSCap:	SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans+
		ACSCtl:	SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
	Capabilities: [370 v1] L1 PM Substates
		L1SubCap: PCI-PM_L1.2- PCI-PM_L1.1+ ASPM_L1.2- ASPM_L1.1+ L1_PM_Substates+
		L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
		L1SubCtl2:
	Capabilities: [400 v1] Data Link Feature <?>
	Capabilities: [410 v1] Physical Layer 16.0 GT/s <?>
	Capabilities: [440 v1] Lane Margining at the Receiver <?>
	Kernel driver in use: pcieport


--
Tasos

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Bug: Completion-Wait loop timed out with vfio
  2023-03-02  7:40           ` Tasos Sahanidis
@ 2023-03-02 20:36             ` Alex Williamson
  2023-03-03  6:33               ` Tasos Sahanidis
  2023-03-06  8:21               ` Abhishek Sahu
  0 siblings, 2 replies; 16+ messages in thread
From: Alex Williamson @ 2023-03-02 20:36 UTC (permalink / raw)
  To: Tasos Sahanidis; +Cc: Abhishek Sahu, kvm

On Thu, 2 Mar 2023 09:40:35 +0200
Tasos Sahanidis <tasos@tasossah.com> wrote:

> On 2023-03-01 16:10, Alex Williamson wrote:
> > 0000:02:00.0 is the upstream port of that switch and 0000:03:02.0 is
> > the downstream port for the 7790.  0000:03:02.0 is the port that should
> > also now enter D3hot.
> >   
> >> If so, I tested in 5.18, both before and while running the VM, with 6.2
> >> both with and without disable_idle_d3, and in all cases they stayed at D0.  
> > 
> > In this case the upstream port should always stay in D0, it has quite a
> > lot more devices under it than just the GPU.  It's interesting that the
> > MosChip that assigns ok is also under a downstream port of this switch.
> > That means the downstream port 0000:03:06.0 should also be entering
> > D3hot when all of the MosChip devices are attached to vfio-pci and
> > unused.
> > 
> > I'm not convinced thought that the MosChip assignment is a good
> > comparison device though, as a "multi-i/o" controller, it's possible
> > that it doesn't actually make use of DMA that would trigger the IOMMU
> > like the GPU does.  Do you have a NIC card you could replace one of
> > these with?  
> 
> Unfortunately I do not have any other PCIe devices available.
> I did, however, pass through the onboard Realtek NIC and it worked
> perfectly fine (without any errors in dmesg).
> 
> 0000:03:05.0 should be the root port for the NIC.
> 
> ==> 6_2_before_vm_rtl <==  
> + cat /sys/bus/pci/devices/0000:03:05.0/power_state
> D3hot
> + cat /sys/bus/pci/devices/0000:03:05.0/power/runtime_status
> suspended
> + cat /sys/bus/pci/devices/0000:03:05.0/power/control
> auto
> + cat /sys/bus/pci/devices/0000:09:00.0/power_state
> D3hot
> + cat /sys/bus/pci/devices/0000:09:00.0/power/runtime_status
> suspended
> + cat /sys/bus/pci/devices/0000:09:00.0/power/control
> auto
> + lspci -vvv -s 0000:03:05.0
> 03:05.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge (prog-if 00 [Normal decode])
> 	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> 	Latency: 0, Cache Line Size: 64 bytes
> 	Interrupt: pin ? routed to IRQ 37
> 	Bus: primary=03, secondary=09, subordinate=09, sec-latency=0
> 	I/O behind bridge: 0000b000-0000bfff [size=4K]
> 	Memory behind bridge: d0300000-d03fffff [size=1M]
> 	Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff [disabled]
> 	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
> 	BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B-
> 		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
> 	Capabilities: <access denied>
> 	Kernel driver in use: pcieport
> 
> 
> ==> 6_2_running_vm_rtl <==  
> + cat /sys/bus/pci/devices/0000:03:05.0/power_state
> D0
> + cat /sys/bus/pci/devices/0000:03:05.0/power/runtime_status
> active
> + cat /sys/bus/pci/devices/0000:03:05.0/power/control
> auto
> + cat /sys/bus/pci/devices/0000:09:00.0/power_state
> D0
> + cat /sys/bus/pci/devices/0000:09:00.0/power/runtime_status
> active
> + cat /sys/bus/pci/devices/0000:09:00.0/power/control
> auto
> + lspci -vvv -s 0000:03:05.0
> 03:05.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge (prog-if 00 [Normal decode])
> 	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> 	Latency: 0, Cache Line Size: 64 bytes
> 	Interrupt: pin ? routed to IRQ 37
> 	Bus: primary=03, secondary=09, subordinate=09, sec-latency=0
> 	I/O behind bridge: 0000b000-0000bfff [size=4K]
> 	Memory behind bridge: d0300000-d03fffff [size=1M]
> 	Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff [disabled]
> 	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
> 	BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B-
> 		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
> 	Capabilities: <access denied>
> 	Kernel driver in use: pcieport
> 
> 
> > It's possible the switch has a problem with D3hot support and it may
> > need to be disabled or augmented with a PCI quirk.  In addition to
> > investigating what power state the downstream port is achieving and
> > reporting lspci -vvv with and without disable_idle_d3, would you mind
> > reporting "lspci -nns 2:00.0" and "lspci -nns 3:" to report all the
> > vendor and device IDs of the switch.  Thanks,
> >   
> 
> It seems that way, especially after manually preventing the root port
> for the graphics card from entering D3hot, however the one for the NIC
> seems to be doing that just fine, which makes things more confusing.

Yes, the fact that the NIC works suggests there's not simply a blatant
chip defect where we should blindly disable D3 power state support for
this downstream port.  I'm also not seeing any difference in the
downstream port configuration between the VM running after the port has
resumed from D3hot and the case where the port never entered D3hot.

But it suddenly dawns on me that you're assigning a Radeon HD 7790,
which is one of the many AMD GPUs which is plagued by reset problems.
I wonder if that's a factor there.  This particular GPU even has
special handling in QEMU to try to manually reset the device, and which
likely has never been tested since adding runtime power management
support.  In fact, I'm surprised anyone is doing regular device
assignment with an HD 7790 and considers it a normal, acceptable
experience even with the QEMU workarounds.

I certainly wouldn't feel comfortable proposing a quirk for the
downstream port to disable D3hot for an issue only seen when assigning
a device with such a nefarious background relative to device
assignment.  It does however seem like there are sufficient options in
place to work around the issue, either disabling power management at
the vfio-pci driver, or specifically for the downstream port via sysfs.
I don't really have any better suggestions given our limited ability to
test and highly suspect target device.  Any other ideas, Abhishek?
Thanks,

Alex


> $ lspci -nns 2:00.0
> 02:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Matisse Switch Upstream [1022:57ad]
> 
> $ lspci -nns 3:
> 03:01.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge [1022:57a3]
> 03:02.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge [1022:57a3]
> 03:03.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge [1022:57a3]
> 03:05.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge [1022:57a3]
> 03:06.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge [1022:57a3]
> 03:08.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge [1022:57a4]
> 03:09.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge [1022:57a4]
> 03:0a.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge [1022:57a4]
> 
> And the information for the PCIe port:
> 
> ==> 6_2_before_vm <==  
> + cat /sys/bus/pci/devices/0000:03:02.0/power_state
> D3hot
> + cat /sys/bus/pci/devices/0000:03:02.0/power/runtime_status
> suspended
> + cat /sys/bus/pci/devices/0000:03:02.0/power/control
> auto
> + lspci -vvv -s 0000:03:02.0
> 03:02.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge (prog-if 00 [Normal decode])
> 	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> 	Latency: 0, Cache Line Size: 64 bytes
> 	Interrupt: pin ? routed to IRQ 35
> 	Bus: primary=03, secondary=06, subordinate=06, sec-latency=0
> 	I/O behind bridge: 0000d000-0000dfff [size=4K]
> 	Memory behind bridge: d0100000-d01fffff [size=1M]
> 	Prefetchable memory behind bridge: 0000001030000000-0000001047ffffff [size=384M]
> 	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
> 	BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B-
> 		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
> 	Capabilities: [50] Power Management version 3
> 		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
> 		Status: D3 NoSoftRst- PME-Enable+ DSel=0 DScale=0 PME-
> 	Capabilities: [58] Express (v2) Downstream Port (Slot+), MSI 00
> 		DevCap:	MaxPayload 512 bytes, PhantFunc 0
> 			ExtTag+ RBE+
> 		DevCtl:	CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
> 			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
> 			MaxPayload 128 bytes, MaxReadReq 512 bytes
> 		DevSta:	CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
> 		LnkCap:	Port #2, Speed 16GT/s, Width x4, ASPM L1, Exit Latency L1 <32us
> 			ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
> 		LnkCtl:	ASPM Disabled; Disabled- CommClk+
> 			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> 		LnkSta:	Speed 8GT/s (downgraded), Width x4 (ok)
> 			TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
> 		SltCap:	AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
> 			Slot #0, PowerLimit 0.000W; Interlock- NoCompl+
> 		SltCtl:	Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
> 			Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
> 		SltSta:	Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
> 			Changed: MRL- PresDet- LinkState+
> 		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, NROPrPrP-, LTR+
> 			 10BitTagComp-, 10BitTagReq-, OBFF Not Supported, ExtFmt-, EETLPPrefix-
> 			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
> 			 FRS-, ARIFwd-
> 			 AtomicOpsCap: Routing-
> 		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
> 			 AtomicOpsCtl: EgressBlck-
> 		LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
> 			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> 			 Compliance De-emphasis: -6dB
> 		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
> 			 EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
> 	Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
> 		Address: 00000000fee00000  Data: 0000
> 	Capabilities: [c0] Subsystem: ASUSTeK Computer Inc. Matisse PCIe GPP Bridge
> 	Capabilities: [c8] HyperTransport: MSI Mapping Enable+ Fixed+
> 	Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
> 	Capabilities: [150 v2] Advanced Error Reporting
> 		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> 		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> 		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> 		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
> 		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
> 		AERCap:	First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
> 			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
> 		HeaderLog: 00000000 00000000 00000000 00000000
> 	Capabilities: [270 v1] Secondary PCI Express
> 		LnkCtl3: LnkEquIntrruptEn-, PerformEqu-
> 		LaneErrStat: LaneErr at lane: 0 1 2 3
> 	Capabilities: [2a0 v1] Access Control Services
> 		ACSCap:	SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans+
> 		ACSCtl:	SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
> 	Capabilities: [370 v1] L1 PM Substates
> 		L1SubCap: PCI-PM_L1.2- PCI-PM_L1.1+ ASPM_L1.2- ASPM_L1.1+ L1_PM_Substates+
> 		L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
> 		L1SubCtl2:
> 	Capabilities: [400 v1] Data Link Feature <?>
> 	Capabilities: [410 v1] Physical Layer 16.0 GT/s <?>
> 	Capabilities: [440 v1] Lane Margining at the Receiver <?>
> 	Kernel driver in use: pcieport
> 
> 
> ==> 6_2_running_vm <==  
> + cat /sys/bus/pci/devices/0000:03:02.0/power_state
> D0
> + cat /sys/bus/pci/devices/0000:03:02.0/power/runtime_status
> active
> + cat /sys/bus/pci/devices/0000:03:02.0/power/control
> auto
> + lspci -vvv -s 0000:03:02.0
> 03:02.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge (prog-if 00 [Normal decode])
> 	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> 	Latency: 0, Cache Line Size: 64 bytes
> 	Interrupt: pin ? routed to IRQ 35
> 	Bus: primary=03, secondary=06, subordinate=06, sec-latency=0
> 	I/O behind bridge: 0000d000-0000dfff [size=4K]
> 	Memory behind bridge: d0100000-d01fffff [size=1M]
> 	Prefetchable memory behind bridge: 0000001030000000-0000001047ffffff [size=384M]
> 	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
> 	BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B-
> 		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
> 	Capabilities: [50] Power Management version 3
> 		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
> 		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
> 	Capabilities: [58] Express (v2) Downstream Port (Slot+), MSI 00
> 		DevCap:	MaxPayload 512 bytes, PhantFunc 0
> 			ExtTag+ RBE+
> 		DevCtl:	CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
> 			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
> 			MaxPayload 128 bytes, MaxReadReq 512 bytes
> 		DevSta:	CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
> 		LnkCap:	Port #2, Speed 16GT/s, Width x4, ASPM L1, Exit Latency L1 <32us
> 			ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
> 		LnkCtl:	ASPM Disabled; Disabled- CommClk+
> 			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> 		LnkSta:	Speed 8GT/s (downgraded), Width x4 (ok)
> 			TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt+
> 		SltCap:	AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
> 			Slot #0, PowerLimit 0.000W; Interlock- NoCompl+
> 		SltCtl:	Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
> 			Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
> 		SltSta:	Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
> 			Changed: MRL- PresDet- LinkState+
> 		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, NROPrPrP-, LTR+
> 			 10BitTagComp-, 10BitTagReq-, OBFF Not Supported, ExtFmt-, EETLPPrefix-
> 			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
> 			 FRS-, ARIFwd-
> 			 AtomicOpsCap: Routing-
> 		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
> 			 AtomicOpsCtl: EgressBlck-
> 		LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
> 			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> 			 Compliance De-emphasis: -6dB
> 		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
> 			 EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
> 	Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
> 		Address: 00000000fee00000  Data: 0000
> 	Capabilities: [c0] Subsystem: ASUSTeK Computer Inc. Matisse PCIe GPP Bridge
> 	Capabilities: [c8] HyperTransport: MSI Mapping Enable+ Fixed+
> 	Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
> 	Capabilities: [150 v2] Advanced Error Reporting
> 		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> 		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> 		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> 		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
> 		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
> 		AERCap:	First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
> 			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
> 		HeaderLog: 00000000 00000000 00000000 00000000
> 	Capabilities: [270 v1] Secondary PCI Express
> 		LnkCtl3: LnkEquIntrruptEn-, PerformEqu-
> 		LaneErrStat: LaneErr at lane: 0 1 2 3
> 	Capabilities: [2a0 v1] Access Control Services
> 		ACSCap:	SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans+
> 		ACSCtl:	SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
> 	Capabilities: [370 v1] L1 PM Substates
> 		L1SubCap: PCI-PM_L1.2- PCI-PM_L1.1+ ASPM_L1.2- ASPM_L1.1+ L1_PM_Substates+
> 		L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
> 		L1SubCtl2:
> 	Capabilities: [400 v1] Data Link Feature <?>
> 	Capabilities: [410 v1] Physical Layer 16.0 GT/s <?>
> 	Capabilities: [440 v1] Lane Margining at the Receiver <?>
> 	Kernel driver in use: pcieport
> 
> 
> ==> 6_2_before_vm_disable_d3_idle <==  
> + cat /sys/bus/pci/devices/0000:03:02.0/power_state
> D0
> + cat /sys/bus/pci/devices/0000:03:02.0/power/runtime_status
> active
> + cat /sys/bus/pci/devices/0000:03:02.0/power/control
> auto
> + lspci -vvv -s 0000:03:02.0
> 03:02.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge (prog-if 00 [Normal decode])
> 	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> 	Latency: 0, Cache Line Size: 64 bytes
> 	Interrupt: pin ? routed to IRQ 35
> 	Bus: primary=03, secondary=06, subordinate=06, sec-latency=0
> 	I/O behind bridge: 0000d000-0000dfff [size=4K]
> 	Memory behind bridge: d0100000-d01fffff [size=1M]
> 	Prefetchable memory behind bridge: 0000001030000000-0000001047ffffff [size=384M]
> 	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
> 	BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B-
> 		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
> 	Capabilities: [50] Power Management version 3
> 		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
> 		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
> 	Capabilities: [58] Express (v2) Downstream Port (Slot+), MSI 00
> 		DevCap:	MaxPayload 512 bytes, PhantFunc 0
> 			ExtTag+ RBE+
> 		DevCtl:	CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
> 			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
> 			MaxPayload 128 bytes, MaxReadReq 512 bytes
> 		DevSta:	CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
> 		LnkCap:	Port #2, Speed 16GT/s, Width x4, ASPM L1, Exit Latency L1 <32us
> 			ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
> 		LnkCtl:	ASPM Disabled; Disabled- CommClk+
> 			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> 		LnkSta:	Speed 8GT/s (downgraded), Width x4 (ok)
> 			TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
> 		SltCap:	AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
> 			Slot #0, PowerLimit 0.000W; Interlock- NoCompl+
> 		SltCtl:	Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
> 			Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
> 		SltSta:	Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
> 			Changed: MRL- PresDet- LinkState+
> 		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, NROPrPrP-, LTR+
> 			 10BitTagComp-, 10BitTagReq-, OBFF Not Supported, ExtFmt-, EETLPPrefix-
> 			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
> 			 FRS-, ARIFwd-
> 			 AtomicOpsCap: Routing-
> 		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
> 			 AtomicOpsCtl: EgressBlck-
> 		LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
> 			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> 			 Compliance De-emphasis: -6dB
> 		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
> 			 EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
> 	Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
> 		Address: 00000000fee00000  Data: 0000
> 	Capabilities: [c0] Subsystem: ASUSTeK Computer Inc. Matisse PCIe GPP Bridge
> 	Capabilities: [c8] HyperTransport: MSI Mapping Enable+ Fixed+
> 	Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
> 	Capabilities: [150 v2] Advanced Error Reporting
> 		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> 		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> 		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> 		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
> 		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
> 		AERCap:	First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
> 			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
> 		HeaderLog: 00000000 00000000 00000000 00000000
> 	Capabilities: [270 v1] Secondary PCI Express
> 		LnkCtl3: LnkEquIntrruptEn-, PerformEqu-
> 		LaneErrStat: LaneErr at lane: 0 1 2 3
> 	Capabilities: [2a0 v1] Access Control Services
> 		ACSCap:	SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans+
> 		ACSCtl:	SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
> 	Capabilities: [370 v1] L1 PM Substates
> 		L1SubCap: PCI-PM_L1.2- PCI-PM_L1.1+ ASPM_L1.2- ASPM_L1.1+ L1_PM_Substates+
> 		L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
> 		L1SubCtl2:
> 	Capabilities: [400 v1] Data Link Feature <?>
> 	Capabilities: [410 v1] Physical Layer 16.0 GT/s <?>
> 	Capabilities: [440 v1] Lane Margining at the Receiver <?>
> 	Kernel driver in use: pcieport
> 
> 
> ==> 6_2_running_vm_disable_d3_idle <==  
> + cat /sys/bus/pci/devices/0000:03:02.0/power_state
> D0
> + cat /sys/bus/pci/devices/0000:03:02.0/power/runtime_status
> active
> + cat /sys/bus/pci/devices/0000:03:02.0/power/control
> auto
> + lspci -vvv -s 0000:03:02.0
> 03:02.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge (prog-if 00 [Normal decode])
> 	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> 	Latency: 0, Cache Line Size: 64 bytes
> 	Interrupt: pin ? routed to IRQ 35
> 	Bus: primary=03, secondary=06, subordinate=06, sec-latency=0
> 	I/O behind bridge: 0000d000-0000dfff [size=4K]
> 	Memory behind bridge: d0100000-d01fffff [size=1M]
> 	Prefetchable memory behind bridge: 0000001030000000-0000001047ffffff [size=384M]
> 	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
> 	BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B-
> 		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
> 	Capabilities: [50] Power Management version 3
> 		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
> 		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
> 	Capabilities: [58] Express (v2) Downstream Port (Slot+), MSI 00
> 		DevCap:	MaxPayload 512 bytes, PhantFunc 0
> 			ExtTag+ RBE+
> 		DevCtl:	CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
> 			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
> 			MaxPayload 128 bytes, MaxReadReq 512 bytes
> 		DevSta:	CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
> 		LnkCap:	Port #2, Speed 16GT/s, Width x4, ASPM L1, Exit Latency L1 <32us
> 			ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
> 		LnkCtl:	ASPM Disabled; Disabled- CommClk+
> 			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> 		LnkSta:	Speed 8GT/s (downgraded), Width x4 (ok)
> 			TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt+
> 		SltCap:	AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
> 			Slot #0, PowerLimit 0.000W; Interlock- NoCompl+
> 		SltCtl:	Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
> 			Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
> 		SltSta:	Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
> 			Changed: MRL- PresDet- LinkState+
> 		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, NROPrPrP-, LTR+
> 			 10BitTagComp-, 10BitTagReq-, OBFF Not Supported, ExtFmt-, EETLPPrefix-
> 			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
> 			 FRS-, ARIFwd-
> 			 AtomicOpsCap: Routing-
> 		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
> 			 AtomicOpsCtl: EgressBlck-
> 		LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
> 			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> 			 Compliance De-emphasis: -6dB
> 		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
> 			 EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
> 	Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
> 		Address: 00000000fee00000  Data: 0000
> 	Capabilities: [c0] Subsystem: ASUSTeK Computer Inc. Matisse PCIe GPP Bridge
> 	Capabilities: [c8] HyperTransport: MSI Mapping Enable+ Fixed+
> 	Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
> 	Capabilities: [150 v2] Advanced Error Reporting
> 		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> 		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> 		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> 		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
> 		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
> 		AERCap:	First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
> 			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
> 		HeaderLog: 00000000 00000000 00000000 00000000
> 	Capabilities: [270 v1] Secondary PCI Express
> 		LnkCtl3: LnkEquIntrruptEn-, PerformEqu-
> 		LaneErrStat: LaneErr at lane: 0 1 2 3
> 	Capabilities: [2a0 v1] Access Control Services
> 		ACSCap:	SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans+
> 		ACSCtl:	SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
> 	Capabilities: [370 v1] L1 PM Substates
> 		L1SubCap: PCI-PM_L1.2- PCI-PM_L1.1+ ASPM_L1.2- ASPM_L1.1+ L1_PM_Substates+
> 		L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
> 		L1SubCtl2:
> 	Capabilities: [400 v1] Data Link Feature <?>
> 	Capabilities: [410 v1] Physical Layer 16.0 GT/s <?>
> 	Capabilities: [440 v1] Lane Margining at the Receiver <?>
> 	Kernel driver in use: pcieport
> 
> 
> --
> Tasos
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Bug: Completion-Wait loop timed out with vfio
  2023-03-02 20:36             ` Alex Williamson
@ 2023-03-03  6:33               ` Tasos Sahanidis
  2023-03-03 16:41                 ` Alex Williamson
  2023-03-06  8:21               ` Abhishek Sahu
  1 sibling, 1 reply; 16+ messages in thread
From: Tasos Sahanidis @ 2023-03-03  6:33 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Abhishek Sahu, kvm

On 2023-03-02 22:36, Alex Williamson wrote:
> Yes, the fact that the NIC works suggests there's not simply a blatant
> chip defect where we should blindly disable D3 power state support for
> this downstream port.  I'm also not seeing any difference in the
> downstream port configuration between the VM running after the port has
> resumed from D3hot and the case where the port never entered D3hot.

Agreed.

> But it suddenly dawns on me that you're assigning a Radeon HD 7790,
> which is one of the many AMD GPUs which is plagued by reset problems.
> I wonder if that's a factor there.  This particular GPU even has
> special handling in QEMU to try to manually reset the device, and which
> likely has never been tested since adding runtime power management
> support.  In fact, I'm surprised anyone is doing regular device
> assignment with an HD 7790 and considers it a normal, acceptable
> experience even with the QEMU workarounds.

I had no idea. I always assumed that because it worked out of the box
ever since I first tried passing it through, it wasn't affected by these
reset issues. I never had any trouble with it until now.

> I certainly wouldn't feel comfortable proposing a quirk for the
> downstream port to disable D3hot for an issue only seen when assigning
> a device with such a nefarious background relative to device
> assignment.  It does however seem like there are sufficient options in
> place to work around the issue, either disabling power management at
> the vfio-pci driver, or specifically for the downstream port via sysfs.
> I don't really have any better suggestions given our limited ability to
> test and highly suspect target device.  Any other ideas, Abhishek?
> Thanks,
> 
> Alex

This actually gave me an idea on how to check if it's the graphics card
that's at fault, or if it is QEMU's workarounds.

I booted up the system as usual and let vfio-pci take over the device.
Both the device itself and the PCIe port were at D3hot. I manually
forced the PCIe port to switch to D0, with the GPU remaining at D3hot. I
then proceeded to start up the VM, and there were no errors in dmesg.

If it's even possible, it sounds like QEMU might be doing something
before the PCIe port is (fully?) out of D3hot, and thus the card tries
to do something which makes the IOMMU unhappy.

Is there something in either the rpm trace, or elsewhere that can help
me dig into this further?

--
Tasos

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Bug: Completion-Wait loop timed out with vfio
  2023-03-03  6:33               ` Tasos Sahanidis
@ 2023-03-03 16:41                 ` Alex Williamson
  2023-03-06  9:49                   ` Tasos Sahanidis
  0 siblings, 1 reply; 16+ messages in thread
From: Alex Williamson @ 2023-03-03 16:41 UTC (permalink / raw)
  To: Tasos Sahanidis; +Cc: Abhishek Sahu, kvm

On Fri, 3 Mar 2023 08:33:14 +0200
Tasos Sahanidis <tasos@tasossah.com> wrote:

> On 2023-03-02 22:36, Alex Williamson wrote:
> > Yes, the fact that the NIC works suggests there's not simply a blatant
> > chip defect where we should blindly disable D3 power state support for
> > this downstream port.  I'm also not seeing any difference in the
> > downstream port configuration between the VM running after the port has
> > resumed from D3hot and the case where the port never entered D3hot.  
> 
> Agreed.
> 
> > But it suddenly dawns on me that you're assigning a Radeon HD 7790,
> > which is one of the many AMD GPUs which is plagued by reset problems.
> > I wonder if that's a factor there.  This particular GPU even has
> > special handling in QEMU to try to manually reset the device, and which
> > likely has never been tested since adding runtime power management
> > support.  In fact, I'm surprised anyone is doing regular device
> > assignment with an HD 7790 and considers it a normal, acceptable
> > experience even with the QEMU workarounds.  
> 
> I had no idea. I always assumed that because it worked out of the box
> ever since I first tried passing it through, it wasn't affected by these
> reset issues. I never had any trouble with it until now.

IIRC, so long as the VM is always booting and cleanly shutting down,
then the QEMU quirk is sufficient, but if you need to kill QEMU the GPU
might be in a bad state that requires a host reboot to recover.

> > I certainly wouldn't feel comfortable proposing a quirk for the
> > downstream port to disable D3hot for an issue only seen when assigning
> > a device with such a nefarious background relative to device
> > assignment.  It does however seem like there are sufficient options in
> > place to work around the issue, either disabling power management at
> > the vfio-pci driver, or specifically for the downstream port via sysfs.
> > I don't really have any better suggestions given our limited ability to
> > test and highly suspect target device.  Any other ideas, Abhishek?
> > Thanks,
> > 
> > Alex  
> 
> This actually gave me an idea on how to check if it's the graphics card
> that's at fault, or if it is QEMU's workarounds.
> 
> I booted up the system as usual and let vfio-pci take over the device.
> Both the device itself and the PCIe port were at D3hot. I manually
> forced the PCIe port to switch to D0, with the GPU remaining at D3hot. I
> then proceeded to start up the VM, and there were no errors in dmesg.
> 
> If it's even possible, it sounds like QEMU might be doing something
> before the PCIe port is (fully?) out of D3hot, and thus the card tries
> to do something which makes the IOMMU unhappy.
> 
> Is there something in either the rpm trace, or elsewhere that can help
> me dig into this further?

That's interesting to find.  There are quirks in the kernel that don't
disable D3hot, but just extend the suspend/resume time.  If you're
slightly comfortable with coding and building the kernel, you could try
something like below.  With the level of information we have, I'd feel
more comfortable only proposing to extend the resume time for the 7790
and not the downstream port, but I've put both in below to play with.

You can comment out one of the DECLARE... lines to disable each.  The 20
value here is in ms and I have no idea what it should be.  There are a
couple quirks that use this 20ms value and a bunch of Intel device IDs
set an equivalent value to 120ms.  Experiment and see if you can find
something that works reliably.  Thanks,

Alex

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 44cab813bf95..d9ae376d9524 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -1956,6 +1956,15 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x15e0, quirk_ryzen_xhci_d3hot);
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x15e1, quirk_ryzen_xhci_d3hot);
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x1639, quirk_ryzen_xhci_d3hot);
 
+static void quirk_d3hot_test_delay(struct pci_dev *dev)
+{
+	quirk_d3hot_delay(dev, 20);
+}
+/* Radeon HD 7790 */
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x665c, quirk_d3hot_test_delay);
+/* Matisse PCIe GPP Bridge Downstream Ports */
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x57a3, quirk_d3hot_test_delay);
+
 #ifdef CONFIG_X86_IO_APIC
 static int dmi_disable_ioapicreroute(const struct dmi_system_id *d)
 {


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: Bug: Completion-Wait loop timed out with vfio
  2023-03-02 20:36             ` Alex Williamson
  2023-03-03  6:33               ` Tasos Sahanidis
@ 2023-03-06  8:21               ` Abhishek Sahu
  1 sibling, 0 replies; 16+ messages in thread
From: Abhishek Sahu @ 2023-03-06  8:21 UTC (permalink / raw)
  To: Alex Williamson, Tasos Sahanidis; +Cc: kvm

On 3/3/2023 2:06 AM, Alex Williamson wrote:
> On Thu, 2 Mar 2023 09:40:35 +0200
> Tasos Sahanidis <tasos@tasossah.com> wrote:
> 
>> On 2023-03-01 16:10, Alex Williamson wrote:
>>> 0000:02:00.0 is the upstream port of that switch and 0000:03:02.0 is
>>> the downstream port for the 7790.  0000:03:02.0 is the port that should
>>> also now enter D3hot.
>>>   
>>>> If so, I tested in 5.18, both before and while running the VM, with 6.2
>>>> both with and without disable_idle_d3, and in all cases they stayed at D0.  
>>>
>>> It's possible the switch has a problem with D3hot support and it may
>>> need to be disabled or augmented with a PCI quirk.  In addition to
>>> investigating what power state the downstream port is achieving and
>>> reporting lspci -vvv with and without disable_idle_d3, would you mind
>>> reporting "lspci -nns 2:00.0" and "lspci -nns 3:" to report all the
>>> vendor and device IDs of the switch.  Thanks,
>>>   
>>
>> It seems that way, especially after manually preventing the root port
>> for the graphics card from entering D3hot, however the one for the NIC
>> seems to be doing that just fine, which makes things more confusing.
> 
> Yes, the fact that the NIC works suggests there's not simply a blatant
> chip defect where we should blindly disable D3 power state support for
> this downstream port.  I'm also not seeing any difference in the
> downstream port configuration between the VM running after the port has
> resumed from D3hot and the case where the port never entered D3hot.
> 
> But it suddenly dawns on me that you're assigning a Radeon HD 7790,
> which is one of the many AMD GPUs which is plagued by reset problems.
> I wonder if that's a factor there.  This particular GPU even has
> special handling in QEMU to try to manually reset the device, and which
> likely has never been tested since adding runtime power management
> support.  In fact, I'm surprised anyone is doing regular device
> assignment with an HD 7790 and considers it a normal, acceptable
> experience even with the QEMU workarounds.
> 
> I certainly wouldn't feel comfortable proposing a quirk for the
> downstream port to disable D3hot for an issue only seen when assigning
> a device with such a nefarious background relative to device
> assignment.  It does however seem like there are sufficient options in
> place to work around the issue, either disabling power management at
> the vfio-pci driver, or specifically for the downstream port via sysfs.

  Thanks Tasos and Alex. 

  We can use the udev rules to toggle the sysfs entries automatically.
  The information regarding udev parameters for the downstream or upstream bridge
  can be fetched through 

  # udevadm info <device_path>

  And then create a rules file.
  For nvidia GPU runtime PM, the udev rules are documented in 

  https://download.nvidia.com/XFree86/Linux-x86_64/525.89.02/README/dynamicpowermanagement.html#AutomatedSetup803b0

  We can create similar kind of udev rules, if we want to disable runtime PM only
  for specific device automatically.
 
> I don't really have any better suggestions given our limited ability to
> test and highly suspect target device.  Any other ideas, Abhishek?

 Given that we already tried all the possible isolation steps from the
 user-space side, so nothing new from my side which can be tried easily.
 I checked the lspci dumps and no issues observed with that.
 
 I have written standalone programs by using the example mentioned in
 https://www.kernel.org/doc/html/next/driver-api/vfio.html when I did
 testing for my runtime PM patches to get more coverage. For this issue
 also, if we can try to repro the issue through standalone programs first, then
 the debugging may be easier. But it requires effort and access to register
 manual, so it won't be worth to try at Tasos's end.

 We can go with delay option as you suggested in the latest thread and see if
 that helps.

 Thanks,
 Abhishek

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Bug: Completion-Wait loop timed out with vfio
  2023-03-03 16:41                 ` Alex Williamson
@ 2023-03-06  9:49                   ` Tasos Sahanidis
  0 siblings, 0 replies; 16+ messages in thread
From: Tasos Sahanidis @ 2023-03-06  9:49 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Abhishek Sahu, kvm

On 2023-03-03 18:41, Alex Williamson wrote:>>> But it suddenly dawns on
me that you're assigning a Radeon HD 7790,
>>> which is one of the many AMD GPUs which is plagued by reset problems.
>>> I wonder if that's a factor there.  This particular GPU even has
>>> special handling in QEMU to try to manually reset the device, and which
>>> likely has never been tested since adding runtime power management
>>> support.  In fact, I'm surprised anyone is doing regular device
>>> assignment with an HD 7790 and considers it a normal, acceptable
>>> experience even with the QEMU workarounds.  
>>
>> I had no idea. I always assumed that because it worked out of the box
>> ever since I first tried passing it through, it wasn't affected by these
>> reset issues. I never had any trouble with it until now.
> 
> IIRC, so long as the VM is always booting and cleanly shutting down,
> then the QEMU quirk is sufficient, but if you need to kill QEMU the GPU
> might be in a bad state that requires a host reboot to recover.
> 

I tried SIGKILLing QEMU a few times and the card kept working.

>>> I certainly wouldn't feel comfortable proposing a quirk for the
>>> downstream port to disable D3hot for an issue only seen when assigning
>>> a device with such a nefarious background relative to device
>>> assignment.  It does however seem like there are sufficient options in
>>> place to work around the issue, either disabling power management at
>>> the vfio-pci driver, or specifically for the downstream port via sysfs.
>>> I don't really have any better suggestions given our limited ability to
>>> test and highly suspect target device.  Any other ideas, Abhishek?
>>> Thanks,
>>>
>>> Alex  
>>
>> This actually gave me an idea on how to check if it's the graphics card
>> that's at fault, or if it is QEMU's workarounds.
>>
>> I booted up the system as usual and let vfio-pci take over the device.
>> Both the device itself and the PCIe port were at D3hot. I manually
>> forced the PCIe port to switch to D0, with the GPU remaining at D3hot. I
>> then proceeded to start up the VM, and there were no errors in dmesg.
>>
>> If it's even possible, it sounds like QEMU might be doing something
>> before the PCIe port is (fully?) out of D3hot, and thus the card tries
>> to do something which makes the IOMMU unhappy.
>>
>> Is there something in either the rpm trace, or elsewhere that can help
>> me dig into this further?
> 
> That's interesting to find.  There are quirks in the kernel that don't
> disable D3hot, but just extend the suspend/resume time.  If you're
> slightly comfortable with coding and building the kernel, you could try
> something like below.  With the level of information we have, I'd feel
> more comfortable only proposing to extend the resume time for the 7790
> and not the downstream port, but I've put both in below to play with.
> 
> You can comment out one of the DECLARE... lines to disable each.  The 20
> value here is in ms and I have no idea what it should be.  There are a
> couple quirks that use this 20ms value and a bunch of Intel device IDs
> set an equivalent value to 120ms.  Experiment and see if you can find
> something that works reliably.  Thanks,
> 
> Alex
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 44cab813bf95..d9ae376d9524 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -1956,6 +1956,15 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x15e0, quirk_ryzen_xhci_d3hot);
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x15e1, quirk_ryzen_xhci_d3hot);
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x1639, quirk_ryzen_xhci_d3hot);
>  
> +static void quirk_d3hot_test_delay(struct pci_dev *dev)
> +{
> +	quirk_d3hot_delay(dev, 20);
> +}
> +/* Radeon HD 7790 */
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x665c, quirk_d3hot_test_delay);
> +/* Matisse PCIe GPP Bridge Downstream Ports */
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x57a3, quirk_d3hot_test_delay);
> +
>  #ifdef CONFIG_X86_IO_APIC
>  static int dmi_disable_ioapicreroute(const struct dmi_system_id *d)
>  {
> 

The quirk on the downstream port changed nothing, which is both good and
bad I guess. The quirk on the 7790, when set to 120ms actually stopped
the error messages, but only when the VM was stopping. When the VM was
starting, the messages remained the same, which is puzzling. The delay
applies when going from D3 to D0, which happens when the VM starts, not
when it stops... I tried it as high as 500ms and nothing else changed.

I looked at QEMU's source, and I'll try both disabling the reset
temporarily, to see if the errors go away, and also adding some delays
in there in different areas (as there are a few already).

--
Tasos

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2023-03-06  9:50 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-25  6:25 Bug: Completion-Wait loop timed out with vfio Tasos Sahanidis
2023-02-27  5:33 ` Abhishek Sahu
2023-02-28  7:33   ` Tasos Sahanidis
2023-02-28 18:46     ` Alex Williamson
2023-03-01 10:34       ` Tasos Sahanidis
2023-03-01 11:53         ` Abhishek Sahu
2023-03-02  7:26           ` Tasos Sahanidis
2023-03-01 14:10         ` Alex Williamson
2023-03-02  7:40           ` Tasos Sahanidis
2023-03-02 20:36             ` Alex Williamson
2023-03-03  6:33               ` Tasos Sahanidis
2023-03-03 16:41                 ` Alex Williamson
2023-03-06  9:49                   ` Tasos Sahanidis
2023-03-06  8:21               ` Abhishek Sahu
2023-03-01 10:10     ` Abhishek Sahu
2023-03-02  7:18       ` Tasos Sahanidis

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.