* [Bug 106111] [GPU Passthrough]GPU (Polaris) not reinitialized with Linux VM (Reset bug)
@ 2018-04-17 22:29 bugzilla-daemon
  2018-04-17 22:30 ` bugzilla-daemon
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: bugzilla-daemon @ 2018-04-17 22:29 UTC (permalink / raw)
  To: dri-devel


https://bugs.freedesktop.org/show_bug.cgi?id=106111

            Bug ID: 106111
           Summary: [GPU Passthrough]GPU (Polaris) not reinitialized with
                    Linux VM (Reset bug)
           Product: DRI
           Version: unspecified
          Hardware: x86-64 (AMD64)
                OS: All
            Status: NEW
          Severity: normal
          Priority: medium
         Component: DRM/AMDgpu
          Assignee: dri-devel@lists.freedesktop.org
          Reporter: berillions@gmail.com
                CC: alexdeucher@gmail.com

Created attachment 138887
  --> https://bugs.freedesktop.org/attachment.cgi?id=138887&action=edit
xorg.conf

Hi,

My setup:
- AMD Ryzen 1600
- 16 GB RAM
- Host (Debian Stable, kernel 4.16.2): AMD RX 560, 4 GB
- Guest (Windows 10 / Arch Linux, kernel 4.15.x-4.16.x): AMD RX 580, 8 GB

Years ago there was an issue with Windows virtual machines using QEMU/VFIO and
an AMD GPU: it was impossible to reboot the guest or use it a second time,
because the GPU was not reinitialized when the guest was shut down. The only
ways to use the VM again were to reboot the host or to use an Nvidia GPU.

That issue is now fixed for Windows VMs with a passed-through AMD GPU (I don't
know how); I can start my VM several times without rebooting the host.

But with my Linux VM and the RX 580, the issue still exists. The first launch
works, and I can use the RX 580 to play without problems. But if I shut down or
reboot the guest, the RX 580 is "blocked". I need to hard reboot because the
system hangs after ~2-3 minutes.

Thanks for your help,
Maxime 

(Sorry for my English, I'm French)

-- 
You are receiving this mail because:
You are the assignee for the bug.

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


* [Bug 106111] [GPU Passthrough]GPU (Polaris) not reinitialized with Linux VM (Reset bug)
  2018-04-17 22:29 [Bug 106111] [GPU Passthrough]GPU (Polaris) not reinitialized with Linux VM (Reset bug) bugzilla-daemon
@ 2018-04-17 22:30 ` bugzilla-daemon
  2018-04-17 22:55 ` bugzilla-daemon
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: bugzilla-daemon @ 2018-04-17 22:30 UTC (permalink / raw)
  To: dri-devel


https://bugs.freedesktop.org/show_bug.cgi?id=106111

--- Comment #1 from Max <berillions@gmail.com> ---
Created attachment 138888
  --> https://bugs.freedesktop.org/attachment.cgi?id=138888&action=edit
dmesg output after launching the VM a second time


* [Bug 106111] [GPU Passthrough]GPU (Polaris) not reinitialized with Linux VM (Reset bug)
  2018-04-17 22:29 [Bug 106111] [GPU Passthrough]GPU (Polaris) not reinitialized with Linux VM (Reset bug) bugzilla-daemon
  2018-04-17 22:30 ` bugzilla-daemon
@ 2018-04-17 22:55 ` bugzilla-daemon
  2018-04-18  6:11 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: bugzilla-daemon @ 2018-04-17 22:55 UTC (permalink / raw)
  To: dri-devel


https://bugs.freedesktop.org/show_bug.cgi?id=106111

--- Comment #2 from Alex Williamson <alex.williamson@redhat.com> ---
The IOMMU looks to be unhappy first:

[   40.201258] vfio_ecap_init: 0000:0a:00.0 hiding ecap 0x19@0x270
[   40.201271] vfio_ecap_init: 0000:0a:00.0 hiding ecap 0x1b@0x2d0
[   40.201279] vfio_ecap_init: 0000:0a:00.0 hiding ecap 0x1e@0x370
[  159.958402] AMD-Vi: Completion-Wait loop timed out
[  160.118777] AMD-Vi: Completion-Wait loop timed out
[  160.799864] AMD-Vi: Event logged [
[  160.799868] IOTLB_INV_TIMEOUT device=0a:00.0 address=0x000000043e8e8550]
[  160.799872] AMD-Vi: Event logged [
[  160.799874] IOTLB_INV_TIMEOUT device=0a:00.0 address=0x000000043e8e8570]
[  160.799876] AMD-Vi: Event logged [
[  160.799878] IOTLB_INV_TIMEOUT device=0a:00.0 address=0x000000043e8e8590]
[  161.801729] AMD-Vi: Event logged [
[  161.801732] IOTLB_INV_TIMEOUT device=0a:00.0 address=0x000000043e8e85e0]
[  180.096365] AMD-Vi: Completion-Wait loop timed out
[  180.256758] AMD-Vi: Completion-Wait loop timed out
[  180.417182] AMD-Vi: Completion-Wait loop timed out
[  180.577636] AMD-Vi: Completion-Wait loop timed out
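
A log like the one above can be summarized mechanically. The following is only
a minimal sketch in Python: the function name and regular expressions are
illustrative, and it assumes the dmesg text is already available as a string
in the format quoted in this comment.

```python
import re
from collections import Counter

# Sample lines copied from the dmesg excerpt quoted above.
SAMPLE = """\
[  159.958402] AMD-Vi: Completion-Wait loop timed out
[  160.118777] AMD-Vi: Completion-Wait loop timed out
[  160.799864] AMD-Vi: Event logged [
[  160.799868] IOTLB_INV_TIMEOUT device=0a:00.0 address=0x000000043e8e8550]
[  160.799872] AMD-Vi: Event logged [
[  160.799874] IOTLB_INV_TIMEOUT device=0a:00.0 address=0x000000043e8e8570]
"""

# Illustrative patterns; they match the message text shown in this thread.
TIMEOUT_RE = re.compile(
    r"IOTLB_INV_TIMEOUT device=([0-9a-f]{2}:[0-9a-f]{2}\.[0-7])"
)
WAIT_RE = re.compile(r"AMD-Vi: Completion-Wait loop timed out")


def summarize_iommu_errors(dmesg_text):
    """Count IOTLB invalidation timeouts per device and Completion-Wait timeouts."""
    per_device = Counter()
    completion_waits = 0
    for line in dmesg_text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match:
            per_device[match.group(1)] += 1
        elif WAIT_RE.search(line):
            completion_waits += 1
    return per_device, completion_waits


if __name__ == "__main__":
    devices, waits = summarize_iommu_errors(SAMPLE)
    for device, count in devices.items():
        print(f"{device}: {count} IOTLB_INV_TIMEOUT event(s)")
    print(f"Completion-Wait timeouts: {waits}")
```

Because the pattern is searched anywhere in the line, the same sketch should
also pick up lines carrying an extra "iommu ivhd0:" prefix.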

Can you try a v4.17-rc1 kernel?  Specifically, these two updates:

6bd06f5a486c vfio/type1: Adopt fast IOTLB flush interface when unmap IOVAs
eb5ecd1a40e2 iommu/amd: Add support for fast IOTLB flushing

AMD GPUs seem to get unhappy if the IOMMU sends out too many invalidations,
and the above two patches can reduce the number of those invalidations by up
to a factor of 512.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6bd06f5a486c06023a618a86e8153b91d26f75f4
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=eb5ecd1a40e2098f805fb63cb07817ac48826e40
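
To make the "factor of 512" concrete, here is a rough model in Python. It is
not the kernel code: it simply assumes that, instead of flushing the IOTLB
after every unmap, flushes are deferred until up to 512 unmapped regions have
accumulated. BATCH_LIMIT and both function names are assumptions made for
illustration only.

```python
import math

# Assumption: at most this many unmaps are batched before one IOTLB flush.
BATCH_LIMIT = 512


def flushes_unbatched(num_unmaps):
    """One IOTLB flush per unmapped region (the older, synchronous behaviour)."""
    return num_unmaps


def flushes_batched(num_unmaps, batch_limit=BATCH_LIMIT):
    """One IOTLB flush per full or partial batch of unmapped regions."""
    return math.ceil(num_unmaps / batch_limit)


if __name__ == "__main__":
    n = 512 * 100
    print("unbatched flushes:", flushes_unbatched(n))
    print("batched flushes:  ", flushes_batched(n))
```

In this model the reduction only reaches the full factor when batches are
filled completely, which is why the figure is an upper bound ("up to").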


* [Bug 106111] [GPU Passthrough]GPU (Polaris) not reinitialized with Linux VM (Reset bug)
  2018-04-17 22:29 [Bug 106111] [GPU Passthrough]GPU (Polaris) not reinitialized with Linux VM (Reset bug) bugzilla-daemon
  2018-04-17 22:30 ` bugzilla-daemon
  2018-04-17 22:55 ` bugzilla-daemon
@ 2018-04-18  6:11 ` bugzilla-daemon
  2018-04-18 16:13 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: bugzilla-daemon @ 2018-04-18  6:11 UTC (permalink / raw)
  To: dri-devel


https://bugs.freedesktop.org/show_bug.cgi?id=106111

--- Comment #3 from Max <berillions@gmail.com> ---
Created attachment 138893
  --> https://bugs.freedesktop.org/attachment.cgi?id=138893&action=edit
dmesg after second launch + 4.17-rc1

Same problem with kernel 4.17-rc1. Just to be sure: I only need to install this
kernel on the host, not on the Linux guest, right?

I use my own 4.17 kernel build, so maybe some IOMMU/VFIO options are missing:

odelpasso@debian-desktop:~/Bureau$ cat /boot/config-4.17.0-rc1 | grep VFIO
CONFIG_VFIO_IOMMU_TYPE1=m
CONFIG_VFIO_VIRQFD=m
CONFIG_VFIO=m
# CONFIG_VFIO_NOIOMMU is not set
CONFIG_VFIO_PCI=m
CONFIG_VFIO_PCI_VGA=y
CONFIG_VFIO_PCI_MMAP=y
CONFIG_VFIO_PCI_INTX=y
CONFIG_VFIO_PCI_IGD=y
# CONFIG_VFIO_MDEV is not set
CONFIG_KVM_VFIO=y

odelpasso@debian-desktop:~/Bureau$ cat /boot/config-4.17.0-rc1 | grep IOMMU
# CONFIG_GART_IOMMU is not set
# CONFIG_CALGARY_IOMMU is not set
CONFIG_IOMMU_HELPER=y
CONFIG_VFIO_IOMMU_TYPE1=m
# CONFIG_VFIO_NOIOMMU is not set
CONFIG_IOMMU_API=y
CONFIG_IOMMU_SUPPORT=y
# Generic IOMMU Pagetable Support
CONFIG_IOMMU_IOVA=y
CONFIG_AMD_IOMMU=y
CONFIG_AMD_IOMMU_V2=y
# CONFIG_INTEL_IOMMU is not set
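
The same check can be scripted instead of read by eye. The sketch below is a
minimal Python example; the WANTED list is an assumption (simply a subset of
the options visible in the output above, not an authoritative list of what
passthrough requires), and the function names are illustrative.

```python
# A few lines copied from the grep output above.
SAMPLE_CONFIG = """\
CONFIG_VFIO_IOMMU_TYPE1=m
CONFIG_VFIO=m
# CONFIG_VFIO_NOIOMMU is not set
CONFIG_VFIO_PCI=m
# CONFIG_VFIO_MDEV is not set
CONFIG_IOMMU_SUPPORT=y
# Generic IOMMU Pagetable Support
CONFIG_AMD_IOMMU=y
# CONFIG_INTEL_IOMMU is not set
"""

# Assumption: options of interest for this setup, for illustration only.
WANTED = [
    "CONFIG_VFIO",
    "CONFIG_VFIO_PCI",
    "CONFIG_VFIO_IOMMU_TYPE1",
    "CONFIG_IOMMU_SUPPORT",
    "CONFIG_AMD_IOMMU",
]

NOT_SET_SUFFIX = " is not set"


def parse_kernel_config(text):
    """Map option names to their value; options marked 'is not set' map to 'n'."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(NOT_SET_SUFFIX):
            options[line[2:-len(NOT_SET_SUFFIX)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def missing_options(options, wanted):
    """Return the wanted options that are neither built in (y) nor modular (m)."""
    return [name for name in wanted if options.get(name, "n") not in ("y", "m")]


if __name__ == "__main__":
    opts = parse_kernel_config(SAMPLE_CONFIG)
    print("missing:", missing_options(opts, WANTED))
```

To check a real system, the text of a file such as /boot/config-4.17.0-rc1
could be read and passed to parse_kernel_config() instead of SAMPLE_CONFIG.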


* [Bug 106111] [GPU Passthrough]GPU (Polaris) not reinitialized with Linux VM (Reset bug)
  2018-04-17 22:29 [Bug 106111] [GPU Passthrough]GPU (Polaris) not reinitialized with Linux VM (Reset bug) bugzilla-daemon
                   ` (2 preceding siblings ...)
  2018-04-18  6:11 ` bugzilla-daemon
@ 2018-04-18 16:13 ` bugzilla-daemon
  2018-04-18 16:33 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: bugzilla-daemon @ 2018-04-18 16:13 UTC (permalink / raw)
  To: dri-devel


https://bugs.freedesktop.org/show_bug.cgi?id=106111

--- Comment #4 from Alex Williamson <alex.williamson@redhat.com> ---
There is a difference, now we have:

[   84.997634] vfio_ecap_init: 0000:0a:00.0 hiding ecap 0x19@0x270
[   84.997645] vfio_ecap_init: 0000:0a:00.0 hiding ecap 0x1b@0x2d0
[   84.997653] vfio_ecap_init: 0000:0a:00.0 hiding ecap 0x1e@0x370
[  145.518307] vfio_ecap_init: 0000:0a:00.0 hiding ecap 0x19@0x270
[  145.518313] vfio_ecap_init: 0000:0a:00.0 hiding ecap 0x1b@0x2d0
[  145.518318] vfio_ecap_init: 0000:0a:00.0 hiding ecap 0x1e@0x370

So prior to time 145.5 the VM was shut down and started again, and we could
still read the config space of the device.  Previously we were already getting
IOMMU faults before the second startup.  But shortly after:

[  193.328586] AMD-Vi: Completion-Wait loop timed out
[  193.488711] AMD-Vi: Completion-Wait loop timed out
[  194.169913] iommu ivhd0: AMD-Vi: Event logged [
[  194.169921] iommu ivhd0: IOTLB_INV_TIMEOUT device=0a:00.0 address=0x000000043e8aaca0]
[  194.169924] iommu ivhd0: AMD-Vi: Event logged [
[  194.169928] iommu ivhd0: IOTLB_INV_TIMEOUT device=0a:00.0 address=0x000000043e8aacc0]

And being stuck in the D3 state is evidence that the device is no longer
accessible on the bus.  So that only delayed the issue; some interaction
between the IOMMU and GPU is still failing.
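
The reasoning about VM starts above can be sketched in Python: each burst of
vfio_ecap_init lines is treated as one VM start. The 5-second gap used to
separate bursts and the function name are assumptions chosen for illustration;
the sample lines are copied from the log quoted in this comment.

```python
import re

SAMPLE = """\
[   84.997634] vfio_ecap_init: 0000:0a:00.0 hiding ecap 0x19@0x270
[   84.997645] vfio_ecap_init: 0000:0a:00.0 hiding ecap 0x1b@0x2d0
[   84.997653] vfio_ecap_init: 0000:0a:00.0 hiding ecap 0x1e@0x370
[  145.518307] vfio_ecap_init: 0000:0a:00.0 hiding ecap 0x19@0x270
[  145.518313] vfio_ecap_init: 0000:0a:00.0 hiding ecap 0x1b@0x2d0
[  145.518318] vfio_ecap_init: 0000:0a:00.0 hiding ecap 0x1e@0x370
[  193.328586] AMD-Vi: Completion-Wait loop timed out
"""

# Matches the timestamp and PCI address of a vfio_ecap_init line.
ECAP_RE = re.compile(r"^\[\s*(\d+\.\d+)\] vfio_ecap_init: (\S+) hiding ecap")


def vm_start_times(dmesg_text, gap_seconds=5.0):
    """Return the timestamp of the first vfio_ecap_init line in each burst.

    Lines closer together than gap_seconds (an assumed threshold) are
    considered part of the same VM start.
    """
    starts = []
    last = None
    for line in dmesg_text.splitlines():
        match = ECAP_RE.match(line)
        if not match:
            continue
        timestamp = float(match.group(1))
        if last is None or timestamp - last > gap_seconds:
            starts.append(timestamp)
        last = timestamp
    return starts


if __name__ == "__main__":
    print(vm_start_times(SAMPLE))
```

Two start times in the result mean the device's config space was still
readable when the VM was launched the second time.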


* [Bug 106111] [GPU Passthrough]GPU (Polaris) not reinitialized with Linux VM (Reset bug)
  2018-04-17 22:29 [Bug 106111] [GPU Passthrough]GPU (Polaris) not reinitialized with Linux VM (Reset bug) bugzilla-daemon
                   ` (3 preceding siblings ...)
  2018-04-18 16:13 ` bugzilla-daemon
@ 2018-04-18 16:33 ` bugzilla-daemon
  2018-08-18  8:31 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: bugzilla-daemon @ 2018-04-18 16:33 UTC (permalink / raw)
  To: dri-devel


https://bugs.freedesktop.org/show_bug.cgi?id=106111

--- Comment #5 from Max <berillions@gmail.com> ---
(In reply to Alex Williamson from comment #4)
> There is a difference, now we have:
> 
> [   84.997634] vfio_ecap_init: 0000:0a:00.0 hiding ecap 0x19@0x270
> [   84.997645] vfio_ecap_init: 0000:0a:00.0 hiding ecap 0x1b@0x2d0
> [   84.997653] vfio_ecap_init: 0000:0a:00.0 hiding ecap 0x1e@0x370
> [  145.518307] vfio_ecap_init: 0000:0a:00.0 hiding ecap 0x19@0x270
> [  145.518313] vfio_ecap_init: 0000:0a:00.0 hiding ecap 0x1b@0x2d0
> [  145.518318] vfio_ecap_init: 0000:0a:00.0 hiding ecap 0x1e@0x370
> 
> So prior to time 145.5 the VM was shutdown and started again and we could
> still read config space of the device.  Previously we were already getting
> IOMMU faults before the second startup.  But shortly after:
> 
> [  193.328586] AMD-Vi: Completion-Wait loop timed out
> [  193.488711] AMD-Vi: Completion-Wait loop timed out
> [  194.169913] iommu ivhd0: AMD-Vi: Event logged [
> [  194.169921] iommu ivhd0: IOTLB_INV_TIMEOUT device=0a:00.0
> address=0x000000043e8aaca0]
> [  194.169924] iommu ivhd0: AMD-Vi: Event logged [
> [  194.169928] iommu ivhd0: IOTLB_INV_TIMEOUT device=0a:00.0
> address=0x000000043e8aacc0]
> 
> And the stuck in D3 state is evidence that the device is no longer
> accessible on the bus.  So that only delayed the issue, some interaction
> between the IOMMU and GPU is still failing.

Thanks for the explanation, Alex.
Could something be done about this, by AMD or by the VFIO maintainers?


* [Bug 106111] [GPU Passthrough]GPU (Polaris) not reinitialized with Linux VM (Reset bug)
  2018-04-17 22:29 [Bug 106111] [GPU Passthrough]GPU (Polaris) not reinitialized with Linux VM (Reset bug) bugzilla-daemon
                   ` (4 preceding siblings ...)
  2018-04-18 16:33 ` bugzilla-daemon
@ 2018-08-18  8:31 ` bugzilla-daemon
  2018-09-14 10:30 ` bugzilla-daemon
  2019-11-19  8:35 ` bugzilla-daemon
  7 siblings, 0 replies; 9+ messages in thread
From: bugzilla-daemon @ 2018-08-18  8:31 UTC (permalink / raw)
  To: dri-devel


https://bugs.freedesktop.org/show_bug.cgi?id=106111

--- Comment #6 from Radosław Szkodziński <astralstorm@gmail.com> ---
This is still happening. It seems that these GPUs need engine resets before a
bus reset, similar to what was done for Fury and Polaris, but more extensive.

A temporary workaround (yeah, sure) is to eject the driver: rmmod in a Linux
guest, or eject in Windows. This resets the engines.

Windows did the resets on shutdown until driver version 18.5.1, where they
broke the shutdown sequence again; read the release notes for the Radeon Pro
Vega FE drivers, where they actually slightly care.


* [Bug 106111] [GPU Passthrough]GPU (Polaris) not reinitialized with Linux VM (Reset bug)
  2018-04-17 22:29 [Bug 106111] [GPU Passthrough]GPU (Polaris) not reinitialized with Linux VM (Reset bug) bugzilla-daemon
                   ` (5 preceding siblings ...)
  2018-08-18  8:31 ` bugzilla-daemon
@ 2018-09-14 10:30 ` bugzilla-daemon
  2019-11-19  8:35 ` bugzilla-daemon
  7 siblings, 0 replies; 9+ messages in thread
From: bugzilla-daemon @ 2018-09-14 10:30 UTC (permalink / raw)
  To: dri-devel


https://bugs.freedesktop.org/show_bug.cgi?id=106111

--- Comment #7 from Andrew Sheldon <asheldon55@gmail.com> ---
Another workaround that has worked for me with a Vega 56 is to suspend-to-ram
the host system before trying to start the guest again.
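
For reference, a suspend-to-RAM can be triggered from a script by writing
"mem" to /sys/power/state, which is the standard Linux sysfs interface for
this. The sketch below is a minimal Python example, not something tested
against this bug: the function name and the state_path parameter are
illustrative, it must run as root on the host, and the VM must be shut down
first.

```python
DEFAULT_STATE_PATH = "/sys/power/state"


def suspend_to_ram(state_path=DEFAULT_STATE_PATH):
    """Request suspend-to-RAM by writing 'mem' to the power state file.

    state_path is a parameter only so the function can be exercised against
    an ordinary file; on a real host, keep the default and run as root.
    """
    with open(state_path, "w") as state_file:
        state_file.write("mem")


if __name__ == "__main__":
    # WARNING: this suspends the machine immediately.
    suspend_to_ram()
```

After the host resumes, the guest can be started again as described above.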


* [Bug 106111] [GPU Passthrough]GPU (Polaris) not reinitialized with Linux VM (Reset bug)
  2018-04-17 22:29 [Bug 106111] [GPU Passthrough]GPU (Polaris) not reinitialized with Linux VM (Reset bug) bugzilla-daemon
                   ` (6 preceding siblings ...)
  2018-09-14 10:30 ` bugzilla-daemon
@ 2019-11-19  8:35 ` bugzilla-daemon
  7 siblings, 0 replies; 9+ messages in thread
From: bugzilla-daemon @ 2019-11-19  8:35 UTC (permalink / raw)
  To: dri-devel


https://bugs.freedesktop.org/show_bug.cgi?id=106111

Martin Peres <martin.peres@free.fr> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |MOVED
             Status|NEW                         |RESOLVED

--- Comment #8 from Martin Peres <martin.peres@free.fr> ---
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been
closed from further activity.

You can subscribe and participate further through the new bug through this link
to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/346.

