* Bug: Completion-Wait loop timed out with vfio
@ 2023-02-25  6:25 Tasos Sahanidis
From: Tasos Sahanidis @ 2023-02-25  6:25 UTC (permalink / raw)
  To: alex.williamson; +Cc: abhsahu, kvm

Hello everyone,

Attempting to pass through my graphics card to a VM with kernel >= 5.19
results in the following messages on the host:

[   72.645091] AMD-Vi: Completion-Wait loop timed out
[   72.791448] AMD-Vi: Completion-Wait loop timed out
[   72.937768] AMD-Vi: Completion-Wait loop timed out
[   73.084388] AMD-Vi: Completion-Wait loop timed out
[   73.231661] AMD-Vi: Completion-Wait loop timed out
[   73.231711] ahci 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0017 address=0xc5f3f000 flags=0x0050]
[   73.231724] ahci 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0017 address=0xc5f3f040 flags=0x0050]
[   73.231734] ahci 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0017 address=0xc5f3f080 flags=0x0050]
[   73.231743] ahci 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0017 address=0xc5f3f0c0 flags=0x0050]
[   73.231752] ahci 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0017 address=0xc5f3f100 flags=0x0050]
[   73.231761] ahci 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0017 address=0xc5f3f140 flags=0x0050]
[   73.231770] ahci 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0017 address=0xc5f3f180 flags=0x0050]
[   73.231779] ahci 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0017 address=0xc5f3f1c0 flags=0x0050]
[   73.231788] ahci 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0017 address=0xc5f3f200 flags=0x0050]
[   73.231797] ahci 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0017 address=0xc5f3f240 flags=0x0050]
[   73.377900] AMD-Vi: Completion-Wait loop timed out
[   73.500538] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=06:00.0 address=0x1001e4600]
[   73.546431] AMD-Vi: Completion-Wait loop timed out
[   73.693772] AMD-Vi: Completion-Wait loop timed out
[   73.847385] AMD-Vi: Completion-Wait loop timed out
[   74.001796] AMD-Vi: Completion-Wait loop timed out
[   74.148077] AMD-Vi: Completion-Wait loop timed out
[   74.168380] virbr0: port 2(vnet0) entered learning state
[   74.294937] AMD-Vi: Completion-Wait loop timed out
[   74.296484] ata2.00: exception Emask 0x20 SAct 0x7e703fff SErr 0x0 action 0x6 frozen
[   74.296492] ata2.00: irq_stat 0x20000000, host bus error
[   74.296496] ata2.00: failed command: WRITE FPDMA QUEUED
[   74.296498] ata2.00: cmd 61/08:00:c0:ec:91/00:00:01:00:00/40 tag 0 ncq dma 4096 out
                        res 40/00:34:20:eb:91/00:00:01:00:00/40 Emask 0x20 (host bus error)
[   74.296507] ata2.00: status: { DRDY }
[more ATA errors]
[   74.296724] ata2: hard resetting link
[   74.430739] AMD-Vi: Completion-Wait loop timed out
[   74.502557] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=06:00.0 address=0x1001e4660]
[   74.502563] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=06:00.0 address=0x1001e4680]
[   74.680713] vfio-pci 0000:06:00.0: enabling device (0000 -> 0003)
[   74.681219] vfio-pci 0000:06:00.0: vfio_ecap_init: hiding ecap 0x19@0x270
[   74.681235] vfio-pci 0000:06:00.0: vfio_ecap_init: hiding ecap 0x1b@0x2d0
[   74.700687] vfio-pci 0000:06:00.1: enabling device (0000 -> 0002)
[   74.772816] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[   74.775906] ata2.00: configured for UDMA/133
[   74.775957] ata2: EH complete
[   74.935315] AMD-Vi: Completion-Wait loop timed out
[   75.073590] AMD-Vi: Completion-Wait loop timed out
[   75.212946] AMD-Vi: Completion-Wait loop timed out
[   75.379316] AMD-Vi: Completion-Wait loop timed out
[   75.504512] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=06:00.0 address=0x1001e46f0]

Stopping the VM results in similar messages.
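
For anyone comparing their own logs against the excerpt above, here is a
minimal sketch of how the AMD-Vi messages could be tallied by type. It
only assumes the message formats shown in the log; the function name
summarize() and the COMPLETION_WAIT_TIMEOUT label are hypothetical and
not anything the kernel prints.

```python
import re
import sys
from collections import Counter

# Matches the event name in lines such as
# "AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0017 ...]"
EVENT_RE = re.compile(r"Event logged \[(\w+)")


def summarize(log_text):
    """Count AMD-Vi messages in a dmesg excerpt, keyed by message type."""
    counts = Counter()
    for line in log_text.splitlines():
        if "Completion-Wait loop timed out" in line:
            # Hypothetical label chosen for this sketch.
            counts["COMPLETION_WAIT_TIMEOUT"] += 1
            continue
        match = EVENT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Example: dmesg | python3 this_script.py
    for name, count in summarize(sys.stdin.read()).most_common():
        print(f"{count:6d}  {name}")
```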

The card is an AMD Radeon HD 7790 (1002:665c) and shows up at 06:00.0 on
the host. This is a Ryzen system with an ASUS "TUF GAMING X570-PLUS".
Userspace virt-related packages are all stock from Ubuntu 20.04.

While these messages are printed, sometimes the cursor and audio
stutter. These temporary freezes have also caused file system
corruption. The graphics card is non-functional in this state.

Bisecting this shows that the issue was introduced by:
7ab5e10eda02d ("vfio/pci: Move the unused device into low power state with runtime PM").

Reverting that commit in 5.19 results in GPU passthrough working as
expected. The patch doesn't cleanly revert on kernels newer than 5.19.
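
Since the bisected commit concerns runtime PM, it may help to look at
the runtime PM state the host reports for the card before and after
starting the VM. The sketch below reads the standard sysfs attributes
power/control and power/runtime_status for a PCI device. The helper
name runtime_pm_state() and its root parameter are hypothetical, and
the 0000:06:00.0 and 0000:06:00.1 addresses are taken from the log
above. This is a diagnostic aid only, not a fix.

```python
from pathlib import Path

SYSFS_PCI = Path("/sys/bus/pci/devices")


def runtime_pm_state(bdf, root=SYSFS_PCI):
    """Return the runtime PM sysfs attributes of one PCI device as a dict.

    bdf is the full PCI address, e.g. "0000:06:00.0". Attributes that are
    missing or unreadable are reported as None.
    """
    power = Path(root) / bdf / "power"
    state = {}
    for attr in ("control", "runtime_status"):
        try:
            state[attr] = (power / attr).read_text().strip()
        except OSError:
            state[attr] = None
    return state


if __name__ == "__main__":
    # GPU function and its second function, as seen in the log above.
    for bdf in ("0000:06:00.0", "0000:06:00.1"):
        print(bdf, runtime_pm_state(bdf))
```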

--
Tasos

