dri-devel.lists.freedesktop.org archive mirror
* [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290 / 4.10 kernel]
@ 2017-07-27 11:46 bugzilla-daemon
  2017-07-27 11:47 ` bugzilla-daemon
                   ` (32 more replies)
  0 siblings, 33 replies; 34+ messages in thread
From: bugzilla-daemon @ 2017-07-27 11:46 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 2457 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=101946

            Bug ID: 101946
           Summary: Rebinding AMDGPU causes initialization errors [R9 290
                    / 4.10 kernel]
           Product: DRI
           Version: XOrg git
          Hardware: x86-64 (AMD64)
                OS: Linux (All)
            Status: NEW
          Severity: normal
          Priority: medium
         Component: DRM/AMDgpu
          Assignee: dri-devel@lists.freedesktop.org
          Reporter: beanow@oscp.info

Created attachment 133068
  --> https://bugs.freedesktop.org/attachment.cgi?id=133068&action=edit
The script used to reproduce the error.

As I attempted to hotplug my R9 290 for a VM gaming setup, I stumbled on this
issue.

The main kern.log error to come up is:

> [  160.013733] [drm:ci_dpm_enable [amdgpu]] *ERROR* ci_start_dpm failed
> [  160.014134] [drm:amdgpu_device_init [amdgpu]] *ERROR* hw_init of IP block <amdgpu_powerplay> failed -22
> [  160.014531] amdgpu 0000:01:00.0: amdgpu_init failed


My setup uses a Kaby Lake iGPU running i915, with the R9 290 bound to either
vfio-pci or amdgpu.
Ubuntu 17.04 (4.10.0-28-generic).
Mesa 17.1.4 from the padoka stable PPA.


I'm able to reproduce this as follows.

1. Boot with vfio-pci capturing the card and amdgpu blacklisted. Kernel flags:
> intel_iommu=on iommu=pt vfio-pci.ids=1002:67b1,1002:aac8

2. Since I run GNOME 3 on Ubuntu 17.04, this will bring me to a Wayland greeter
which uses my iGPU. Drop to a free TTY, without logging in. This prevents Xorg
from responding to the AMD card becoming available.

3. Run the attached script "rebind-amd.sh" as root to bind back and forth
between vfio-pci and amdgpu in an infinite loop.

This will:

A. modprobe both drivers to be sure they're loaded.
B. Print information about the driver and card usage.
C. Use the new_id > unbind > bind > remove_id sequence to switch drivers.
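
The attached script is not reproduced in this message. As a rough sketch of the
new_id > unbind > bind > remove_id sequence from step C, assuming the standard
sysfs PCI driver interface, something like the following could be used. The
names switch_driver, sysfs_write and DRY_RUN are hypothetical and are not taken
from rebind-amd.sh; the device address 0000:01:00.0 and the ID 1002:67b1 come
from the log lines and kernel flags above. By default it only prints the writes
it would perform.

```shell
#!/bin/bash
# Hypothetical sketch, not the attached rebind-amd.sh.
# Switches one PCI device to another driver through sysfs.
# With DRY_RUN=1 (the default) it only prints the writes it would perform.

DRY_RUN="${DRY_RUN:-1}"

# sysfs_write VALUE PATH: write VALUE to PATH, or print the action in dry-run mode.
sysfs_write() {
    if [ "$DRY_RUN" = "1" ]; then
        printf 'echo "%s" > %s\n' "$1" "$2"
    else
        printf '%s\n' "$1" > "$2"
    fi
}

# switch_driver BDF "VENDOR DEVICE" NEW_DRIVER
switch_driver() {
    local bdf="$1" id="$2" new="$3"
    local drv="/sys/bus/pci/drivers/$new"
    local dev="/sys/bus/pci/devices/$bdf"

    sysfs_write "$id"  "$drv/new_id"         # let the new driver accept this ID
    sysfs_write "$bdf" "$dev/driver/unbind"  # detach from the current driver
    sysfs_write "$bdf" "$drv/bind"           # attach to the new driver
    sysfs_write "$id"  "$drv/remove_id"      # drop the dynamic ID again
}
```

For example, `switch_driver 0000:01:00.0 "1002 67b1" amdgpu` prints the four
writes in order. Running it for real would need root and DRY_RUN=0, and the
unbind step assumes the device is currently bound to some driver.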

What happens is:

vfio-pci -> vfio-pci: No problems, of course.
vfio-pci -> amdgpu: This works and the amdgpu driver initializes the card.
Attached monitor(s) start searching for signals.
amdgpu -> vfio-pci: Since no Xorg is using the dGPU, this works without
problems.
vfio-pci -> amdgpu: Fails to initialize the dGPU with the kernel error above.


I've attached the script, the output of the script and the full kern.log.

-- 
You are receiving this mail because:
You are the assignee for the bug.


_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


* [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290 / 4.10 kernel]
  2017-07-27 11:46 [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290 / 4.10 kernel] bugzilla-daemon
@ 2017-07-27 11:47 ` bugzilla-daemon
  2017-07-27 11:47 ` bugzilla-daemon
                   ` (31 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: bugzilla-daemon @ 2017-07-27 11:47 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 298 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=101946

--- Comment #1 from Robin <beanow@oscp.info> ---
Created attachment 133069
  --> https://bugs.freedesktop.org/attachment.cgi?id=133069&action=edit
Script output


* [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290 / 4.10 kernel]
  2017-07-27 11:46 [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290 / 4.10 kernel] bugzilla-daemon
  2017-07-27 11:47 ` bugzilla-daemon
@ 2017-07-27 11:47 ` bugzilla-daemon
  2017-07-27 11:47 ` bugzilla-daemon
                   ` (30 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: bugzilla-daemon @ 2017-07-27 11:47 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 293 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=101946

--- Comment #2 from Robin <beanow@oscp.info> ---
Created attachment 133070
  --> https://bugs.freedesktop.org/attachment.cgi?id=133070&action=edit
kern.log


* [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290 / 4.10 kernel]
  2017-07-27 11:46 [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290 / 4.10 kernel] bugzilla-daemon
  2017-07-27 11:47 ` bugzilla-daemon
  2017-07-27 11:47 ` bugzilla-daemon
@ 2017-07-27 11:47 ` bugzilla-daemon
  2017-07-27 12:00 ` bugzilla-daemon
                   ` (29 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: bugzilla-daemon @ 2017-07-27 11:47 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 433 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=101946

Robin <beanow@oscp.info> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |beanow@oscp.info
            Version|XOrg git                    |unspecified


* [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290 / 4.10 kernel]
  2017-07-27 11:46 [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290 / 4.10 kernel] bugzilla-daemon
                   ` (2 preceding siblings ...)
  2017-07-27 11:47 ` bugzilla-daemon
@ 2017-07-27 12:00 ` bugzilla-daemon
  2017-07-27 14:42 ` bugzilla-daemon
                   ` (28 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: bugzilla-daemon @ 2017-07-27 12:00 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 650 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=101946

--- Comment #3 from Robin <beanow@oscp.info> ---
What I noticed from the kern.log is that it seems to try and skip init steps
the second time amdgpu loads. So perhaps the unbind doesn't do a clean enough
shutdown or there may be a bug in the init step skipping.

For example, the first time:
> [  129.439652] amdgpu 0000:01:00.0: enabling device (0000 -> 0003)
> ...
> [  129.918128] [drm] GPU posting now...

The second time:
No mention of enabling device.
> [  159.722828] [drm] GPU post is not needed
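
To compare binds side by side, the two messages quoted above can be pulled out
of a kernel log. This is a minimal sketch; the function name gpu_post_history
is hypothetical, and it assumes each amdgpu init logs exactly one of the two
strings.

```shell
#!/bin/bash
# Hypothetical helper: for each amdgpu init found in a kernel log on stdin,
# report whether the driver posted the GPU or skipped posting.
# The two message strings are the ones quoted from kern.log above.

gpu_post_history() {
    awk '
        /GPU posting now/        { printf "init %d: posted\n",  ++n }
        /GPU post is not needed/ { printf "init %d: skipped\n", ++n }
    '
}
```

Usage would be along the lines of `gpu_post_history < /var/log/kern.log`.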


* [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290 / 4.10 kernel]
  2017-07-27 11:46 [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290 / 4.10 kernel] bugzilla-daemon
                   ` (3 preceding siblings ...)
  2017-07-27 12:00 ` bugzilla-daemon
@ 2017-07-27 14:42 ` bugzilla-daemon
  2017-07-27 14:42 ` bugzilla-daemon
                   ` (27 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: bugzilla-daemon @ 2017-07-27 14:42 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 385 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=101946

--- Comment #4 from Alex Deucher <alexdeucher@gmail.com> ---
Created attachment 133074
  --> https://bugs.freedesktop.org/attachment.cgi?id=133074&action=edit
possible fix 1/2

Do the attached patches help (based on my drm-next-4.14-wip branch)?


* [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290 / 4.10 kernel]
  2017-07-27 11:46 [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290 / 4.10 kernel] bugzilla-daemon
                   ` (4 preceding siblings ...)
  2017-07-27 14:42 ` bugzilla-daemon
@ 2017-07-27 14:42 ` bugzilla-daemon
  2017-07-28 10:26 ` bugzilla-daemon
                   ` (26 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: bugzilla-daemon @ 2017-07-27 14:42 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 313 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=101946

--- Comment #5 from Alex Deucher <alexdeucher@gmail.com> ---
Created attachment 133075
  --> https://bugs.freedesktop.org/attachment.cgi?id=133075&action=edit
possible fix 2/2


* [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290 / 4.10 kernel]
  2017-07-27 11:46 [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290 / 4.10 kernel] bugzilla-daemon
                   ` (5 preceding siblings ...)
  2017-07-27 14:42 ` bugzilla-daemon
@ 2017-07-28 10:26 ` bugzilla-daemon
  2017-07-28 11:46 ` bugzilla-daemon
                   ` (25 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: bugzilla-daemon @ 2017-07-28 10:26 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 737 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=101946

--- Comment #6 from Robin <beanow@oscp.info> ---
Thanks for the quick patches! I'm working my way to your kernel branch to rule
out other changes fixing the issue. May take a little bit as I've not had to
build my own kernels before.

Anyway, going from 4.10 to the 4.13rc2 kernel found here
http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.13-rc2/
gives the same problem, although slightly less reliably, and now including ring
test errors.

By less reliably I mean that I've seen the driver *sometimes* work on the 2nd
binding, but give the same error on the 3rd.

More results pending.


* [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290 / 4.10 kernel]
  2017-07-27 11:46 [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290 / 4.10 kernel] bugzilla-daemon
                   ` (6 preceding siblings ...)
  2017-07-28 10:26 ` bugzilla-daemon
@ 2017-07-28 11:46 ` bugzilla-daemon
  2017-07-28 14:24 ` bugzilla-daemon
                   ` (24 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: bugzilla-daemon @ 2017-07-28 11:46 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 524 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=101946

--- Comment #7 from Robin <beanow@oscp.info> ---
Created attachment 133098
  --> https://bugs.freedesktop.org/attachment.cgi?id=133098&action=edit
kern.log for drm-next-4.14-wip

Building the drm-next-4.14-wip branch including both patches does not resolve
the issue. It behaves similarly to the previous 4.13rc2 kernel, with ring test
errors showing up, typically on rings 1, 9 and/or 10.


* [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290 / 4.10 kernel]
  2017-07-27 11:46 [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290 / 4.10 kernel] bugzilla-daemon
                   ` (7 preceding siblings ...)
  2017-07-28 11:46 ` bugzilla-daemon
@ 2017-07-28 14:24 ` bugzilla-daemon
  2017-07-28 15:19 ` bugzilla-daemon
                   ` (23 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: bugzilla-daemon @ 2017-07-28 14:24 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 367 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=101946

--- Comment #8 from Alex Deucher <alexdeucher@gmail.com> ---
Created attachment 133099
  --> https://bugs.freedesktop.org/attachment.cgi?id=133099&action=edit
possible fix 3/2

Does using this patch on top of the other two help?


* [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290 / 4.10 kernel]
  2017-07-27 11:46 [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290 / 4.10 kernel] bugzilla-daemon
                   ` (8 preceding siblings ...)
  2017-07-28 14:24 ` bugzilla-daemon
@ 2017-07-28 15:19 ` bugzilla-daemon
  2017-07-28 15:37 ` bugzilla-daemon
                   ` (22 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: bugzilla-daemon @ 2017-07-28 15:19 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 492 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=101946

--- Comment #9 from Robin <beanow@oscp.info> ---
Created attachment 133100
  --> https://bugs.freedesktop.org/attachment.cgi?id=133100&action=edit
kern.log for drm-next-4.14-wip with patch 3

Same issue with patch3.

I've attached the kern.log of one of the occasions where it gave the init error
on the 3rd time binding amdgpu, rather than the 2nd.


* [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290 / 4.10 kernel]
  2017-07-27 11:46 [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290 / 4.10 kernel] bugzilla-daemon
                   ` (9 preceding siblings ...)
  2017-07-28 15:19 ` bugzilla-daemon
@ 2017-07-28 15:37 ` bugzilla-daemon
  2017-07-28 15:53 ` bugzilla-daemon
                   ` (21 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: bugzilla-daemon @ 2017-07-28 15:37 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1242 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=101946

--- Comment #10 from Robin <beanow@oscp.info> ---
Created attachment 133101
  --> https://bugs.freedesktop.org/attachment.cgi?id=133101&action=edit
4.13rc2 ubuntu kern.log

Inspecting the output more closely there's a subtle difference in the error
produced.

While the 4.10 kernel produces:

> [  160.013733] [drm:ci_dpm_enable [amdgpu]] *ERROR* ci_start_dpm failed
> [  160.014134] [drm:amdgpu_device_init [amdgpu]] *ERROR* hw_init of IP block <amdgpu_powerplay> failed -22

The 4.13rc2, drm-next-4.14-wip and drm-next-4.14-wip with patch 3 produce:

> [  134.226312] [drm:cik_sdma_ring_test_ring [amdgpu]] *ERROR* amdgpu: ring 9 test failed (0xCAFEDEAD)
> [  134.226822] [drm:amdgpu_device_init [amdgpu]] *ERROR* hw_init of IP block <cik_sdma> failed -22


Something else I noticed: in the cases where the error shows up on the third
bind, the 2nd and 3rd binds have much longer ring 1 tests than the first bind.

> [   69.938959] [drm] ring test on 1 succeeded in 2 usecs
> ...
> [  102.040253] [drm] ring test on 1 succeeded in 677 usecs
> ...
> [  134.121468] [drm] ring test on 1 succeeded in 677 usecs
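
The timings above can be extracted from a log with a small filter. This is only
a sketch; the function name ring_test_times is hypothetical, and it assumes the
"ring test on N succeeded in M usecs" wording shown in the lines above.

```shell
#!/bin/bash
# Hypothetical helper: print the duration (in usecs) of every successful
# ring test for one ring number, reading a kernel log from stdin.

# ring_test_times RING
ring_test_times() {
    local ring="$1"
    awk -v ring="$ring" '
        /ring test on [0-9]+ succeeded in [0-9]+ usecs/ {
            # Locate the words by content, since the timestamp may split
            # into one or two fields depending on its padding.
            for (i = 1; i <= NF; i++) {
                if ($i == "on" && $(i + 1) == ring && $(i + 2) == "succeeded") {
                    print $(i + 4)
                }
            }
        }'
}
```

For the three lines quoted above, `ring_test_times 1` prints 2, 677 and 677 on
separate lines.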


* [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290 / 4.10 kernel]
  2017-07-27 11:46 [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290 / 4.10 kernel] bugzilla-daemon
                   ` (10 preceding siblings ...)
  2017-07-28 15:37 ` bugzilla-daemon
@ 2017-07-28 15:53 ` bugzilla-daemon
  2017-07-28 17:26 ` bugzilla-daemon
                   ` (20 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: bugzilla-daemon @ 2017-07-28 15:53 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 379 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=101946

--- Comment #11 from Alex Deucher <alexdeucher@gmail.com> ---
Note that the GPU reset in patch 3/2 requires access to pci config registers
for the GPU which many hypervisors block, so you'd need to make sure that works
for the reset to work.


* [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290 / 4.10 kernel]
  2017-07-27 11:46 [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290 / 4.10 kernel] bugzilla-daemon
                   ` (11 preceding siblings ...)
  2017-07-28 15:53 ` bugzilla-daemon
@ 2017-07-28 17:26 ` bugzilla-daemon
  2017-07-28 17:29 ` bugzilla-daemon
                   ` (19 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: bugzilla-daemon @ 2017-07-28 17:26 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 3318 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=101946

--- Comment #12 from Robin <beanow@oscp.info> ---
Created attachment 133103
  --> https://bugs.freedesktop.org/attachment.cgi?id=133103&action=edit
case2-rescan-amd.sh

In an attempt to make a second test case I've created a new script that
produced some noteworthy results.

Rather than bind/unbind, this approach uses rmmod and modprobe, removing the pci
device and rescanning to switch drivers.

Please excuse how poorly written and contrived this "hotswapping" test case
is. I'll try isolating what causes the differences from the first test
case in some mutations next, but wanted to share the intermediate results as-is
first.

Some details about this test.

The starting point is the same as the other test case, TTY and vfio-pci taking
the card first. In order it will:

1. rmmod the current driver.
2. remove one pci subdevice (either VGA or Audio)
3. modprobe the new driver.
4. perform a pci rescan.

It will do this in a loop switching between amdgpu and vfio-pci again.
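
The four steps above can be sketched as a single iteration. This is not the
attached case2-rescan-amd.sh; the function name rescan_switch is hypothetical,
and the sketch assumes the standard sysfs remove and rescan files. It prints
the commands rather than running them.

```shell
#!/bin/bash
# Hypothetical sketch of one iteration, not the attached case2-rescan-amd.sh.
# Prints the commands for the four steps instead of executing them.

# rescan_switch OLD_DRIVER NEW_DRIVER BDF
rescan_switch() {
    local old="$1" new="$2" bdf="$3"
    echo "rmmod $old"                                  # 1. unload the current driver
    echo "echo 1 > /sys/bus/pci/devices/$bdf/remove"   # 2. remove one pci subdevice
    echo "modprobe $new"                               # 3. load the new driver
    echo "echo 1 > /sys/bus/pci/rescan"                # 4. rescan the pci bus
}
```

Calling `rescan_switch vfio-pci amdgpu 0000:01:00.0` lists the steps for the
VGA function; the address is assumed from the log lines in this report.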

Another difference is that snd_hda_intel is in use elsewhere, so it does not get
an rmmod. Because of this, the Audio component will not switch back to vfio-pci.

---

As for results, on 4.10 there was no change.
From the 2nd binding onward, the driver fails to init with this error:
> [  160.013733] [drm:ci_dpm_enable [amdgpu]] *ERROR* ci_start_dpm failed
> [  160.014134] [drm:amdgpu_device_init [amdgpu]] *ERROR* hw_init of IP block <amdgpu_powerplay> failed -22

For 4.13rc2, drm-next-4.14-wip and drm-next-4.14-wip with patch 3 it's a
different story.

They have an irregular pattern of errors every loop.
Either the 2nd or 3rd time the first error crops up. Typically this is:
> [  211.818341] [drm:cik_sdma_ring_test_ring [amdgpu]] *ERROR* amdgpu: ring 9 test failed (0xCAFEDEAD)
> [  211.818725] [drm:amdgpu_device_init [amdgpu]] *ERROR* hw_init of IP block <cik_sdma> failed -22

After that first error, additionally the following error can appear as well.
> [  247.626839] [drm:gfx_v7_0_ring_test_ring [amdgpu]] *ERROR* amdgpu: ring 1 test failed (scratch(0xC040)=0xCAFEDEAD)

And instead of ring 9, ring 10 may fail.
> [  356.686092] [drm:cik_sdma_ring_test_ring [amdgpu]] *ERROR* amdgpu: ring 10 test failed (0xCAFEDEAD)
> [  356.686580] [drm:amdgpu_device_init [amdgpu]] *ERROR* hw_init of IP block <cik_sdma> failed -22

They seem to randomly happen in the following combinations:

A. Ring 1 fails.
B. Ring 9 or 10 fails.
C. Ring 1 + Ring 9 or 10 fails.

Most importantly though. Only if 9 or 10 fail (B or C combinations) will the
hw_init error occur. If it's just a ring 1 failure (A) the driver will
successfully init the GPU.
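
The A/B/C pattern can be checked mechanically against the log of a single bind
attempt. This is a minimal sketch of the rule as observed here, not of anything
in the driver; the function name classify_ring_failures is hypothetical, and it
assumes the "ring N test failed" wording from the errors quoted above.

```shell
#!/bin/bash
# Hypothetical helper: read the kernel log of ONE bind attempt on stdin and
# report which combination of ring test failures it contains, together with
# the init outcome observed for that combination in this report.

classify_ring_failures() {
    awk '
        /ring 1 test failed/      { gfx = 1 }
        /ring (9|10) test failed/ { sdma = 1 }
        END {
            if (gfx && sdma) print "C: ring 1 and ring 9/10 failed, hw_init expected to fail"
            else if (sdma)   print "B: ring 9/10 failed, hw_init expected to fail"
            else if (gfx)    print "A: ring 1 failed only, init expected to succeed"
            else             print "no ring test failures"
        }'
}
```

The log would need to be split per bind attempt first, since the function
treats all of its input as one attempt.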

Also, the drm-next-4.14-wip with patch 3 kernel will have this A combination
and successfully init a lot more often than the other two.

---

So my suspicion is that this difference could be due to:
- Repeatedly rmmodding and modprobing being part of the loop now.
- The rescanning method vs bind/unbind.
- The different treatment of the Audio component.
- The different access of vfio-pci to the Audio component.

So I will make several variations on the test scripts to try and narrow this
down.


* [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290 / 4.10 kernel]
  2017-07-27 11:46 [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290 / 4.10 kernel] bugzilla-daemon
                   ` (12 preceding siblings ...)
  2017-07-28 17:26 ` bugzilla-daemon
@ 2017-07-28 17:29 ` bugzilla-daemon
  2017-07-28 20:55 ` bugzilla-daemon
                   ` (18 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: bugzilla-daemon @ 2017-07-28 17:29 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 638 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=101946

--- Comment #13 from Robin <beanow@oscp.info> ---
(In reply to Alex Deucher from comment #11)
> Note that the GPU reset in patch 3/2 requires access to pci config registers
> for the GPU which many hypervisors block, so you'd need to make sure that
> works for the reset to work.

I'm not actually utilizing vfio-pci in these test cases, this runs as root from
a TTY on the host machine. So I would assume it to work. I don't know how to
test this though, so let me know how I could.


* [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290 / 4.10 kernel]
  2017-07-27 11:46 [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290 / 4.10 kernel] bugzilla-daemon
                   ` (13 preceding siblings ...)
  2017-07-28 17:29 ` bugzilla-daemon
@ 2017-07-28 20:55 ` bugzilla-daemon
  2017-07-28 21:06 ` bugzilla-daemon
                   ` (17 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: bugzilla-daemon @ 2017-07-28 20:55 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1044 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=101946

--- Comment #14 from Robin <beanow@oscp.info> ---
Created attachment 133108
  --> https://bugs.freedesktop.org/attachment.cgi?id=133108&action=edit
case3.sh

So, tinkering with the test script, I've only been able to eliminate some
suspicions and invalidate my observation that the patch performed better.

I've taken out vfio-pci binding from the loop. It's only used during boot to
keep the GPU free to unbind. So the issue is not related to vfio-pci having
access in between binds.

I've made 3 methods for rebinding amdgpu.
amdgpu rmmod > modprobe
remove pci devices > rescan
driver unbind > bind

I've run each of these a few dozen times on each kernel and none of them really
stands out. All of them have a chance to work (as in, ring 1 test failure only)
or to fail.

4.13rc2, drm-next-4.14-wip, drm-next-4.14-wip + patch3 all have this behaviour.
So no, I don't think the patches have helped after all.


* [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290 / 4.10 kernel]
  2017-07-27 11:46 [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290 / 4.10 kernel] bugzilla-daemon
                   ` (14 preceding siblings ...)
  2017-07-28 20:55 ` bugzilla-daemon
@ 2017-07-28 21:06 ` bugzilla-daemon
  2017-07-28 21:17 ` bugzilla-daemon
                   ` (16 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: bugzilla-daemon @ 2017-07-28 21:06 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 461 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=101946

--- Comment #15 from Alex Deucher <alexdeucher@gmail.com> ---
Are you using a patched qemu that attempts to do radeon device specific gpu
reset?  If so, does removing that code help?  Next, are you sure pci config
access is allowed in your configuration?  As I mentioned in comment 11, it's
required for gpu reset to work.


* [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290 / 4.10 kernel]
  2017-07-27 11:46 [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290 / 4.10 kernel] bugzilla-daemon
                   ` (15 preceding siblings ...)
  2017-07-28 21:06 ` bugzilla-daemon
@ 2017-07-28 21:17 ` bugzilla-daemon
  2017-07-28 21:55 ` bugzilla-daemon
                   ` (15 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: bugzilla-daemon @ 2017-07-28 21:17 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1118 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=101946

--- Comment #16 from Robin <beanow@oscp.info> ---
(In reply to Alex Deucher from comment #15)
> Are you using a patched qemu that attempts to do radeon device specific gpu
> reset?  If so, does removing that code help?  Next, are you sure pci config
> access is allowed in your configuration?  As I mentioned in comment 11, it's
> required for gpu reset to work.

I have installed the ubuntu supplied version.
> $ kvm --version
> QEMU emulator version 2.8.0(Debian 1:2.8+dfsg-3ubuntu2.3)
> Copyright (c) 2003-2016 Fabrice Bellard and the QEMU Project developers

But KVM/Qemu is not being invoked. After a fresh boot on bare metal, these are
the results I get in a root TTY. 

I have seen mention of vfio-pci using device specific resets though.
https://www.spinics.net/lists/kvm/msg116277.html
So I will try to completely take it out of my test.

I'm not sure about pci config access, since I don't know how to verify this.
Any instructions would be appreciated.
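
One possible first check, offered as an assumption rather than as the test
Alex has in mind, is to read the start of the device's config space through
sysfs and confirm it returns the expected vendor and device ID. This only shows
that config space is readable from the host; it does not prove the write access
a reset would need. The function name pci_ids_from_config is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read the first four bytes of a PCI config space file
# (for example /sys/bus/pci/devices/0000:01:00.0/config) and print them as
# vendor:device. Both IDs are stored little-endian, so the bytes are swapped.

# pci_ids_from_config CONFIG_FILE
pci_ids_from_config() {
    local cfg="$1" bytes
    bytes="$(od -An -tx1 -N4 "$cfg" 2>/dev/null)" || return 1
    # Intentionally unquoted: split the hex dump into one word per byte.
    set -- $bytes
    [ "$#" -eq 4 ] || return 1
    printf '%s%s:%s%s\n' "$2" "$1" "$4" "$3"
}
```

On this setup the output would be expected to match the 1002:67b1 ID from the
vfio-pci.ids kernel flag if the read succeeds.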


* [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290 / 4.10 kernel]
  2017-07-27 11:46 [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290 / 4.10 kernel] bugzilla-daemon
                   ` (16 preceding siblings ...)
  2017-07-28 21:17 ` bugzilla-daemon
@ 2017-07-28 21:55 ` bugzilla-daemon
  2017-07-28 23:06 ` [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290] bugzilla-daemon
                   ` (14 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: bugzilla-daemon @ 2017-07-28 21:55 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 332 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=101946

--- Comment #17 from Robin <beanow@oscp.info> ---
I've tested:

- Disabling vfio-pci, no changes
- Disabling iommu support, no changes
- Booting with and without amdgpu blacklisted, no changes
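
For tests like these it helps to confirm which driver actually holds the card
at each step. A minimal sketch, assuming the standard sysfs layout; the address
0000:01:00.0 is the one from the log in the original report, while the function
name and the optional sysfs-root argument are made up for illustration:

```shell
#!/bin/sh
# Print the name of the kernel driver bound to a PCI device, or "none".
# $1: PCI address such as 0000:01:00.0
# $2: sysfs root (hypothetical parameter, only there for testing; default /sys)
bound_driver() {
    dev="${2:-/sys}/bus/pci/devices/$1"
    if [ -L "$dev/driver" ]; then
        # The "driver" symlink points at .../bus/pci/drivers/<name>.
        basename "$(readlink "$dev/driver")"
    else
        echo none
    fi
}

# Example: bound_driver 0000:01:00.0
```

Running it before and after each change shows whether vfio-pci, amdgpu, or no
driver at all owns the device at that point.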


* [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290]
  2017-07-27 11:46 [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290 / 4.10 kernel] bugzilla-daemon
                   ` (17 preceding siblings ...)
  2017-07-28 21:55 ` bugzilla-daemon
@ 2017-07-28 23:06 ` bugzilla-daemon
  2017-07-29 16:05 ` bugzilla-daemon
                   ` (13 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: bugzilla-daemon @ 2017-07-28 23:06 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 509 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=101946

Robin <beanow@oscp.info> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Rebinding AMDGPU causes     |Rebinding AMDGPU causes
                   |initialization errors [R9   |initialization errors [R9
                   |290 / 4.10 kernel]          |290]


* [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290]
  2017-07-27 11:46 [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290 / 4.10 kernel] bugzilla-daemon
                   ` (18 preceding siblings ...)
  2017-07-28 23:06 ` [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290] bugzilla-daemon
@ 2017-07-29 16:05 ` bugzilla-daemon
  2017-07-29 17:20 ` bugzilla-daemon
                   ` (12 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: bugzilla-daemon @ 2017-07-29 16:05 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 410 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=101946

--- Comment #18 from Robin <beanow@oscp.info> ---
Created attachment 133127
  --> https://bugs.freedesktop.org/attachment.cgi?id=133127&action=edit
Logging shutdown function

I've modified the patch to include info messages. The code path is never
executed in my tests.


* [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290]
  2017-07-27 11:46 [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290 / 4.10 kernel] bugzilla-daemon
                   ` (19 preceding siblings ...)
  2017-07-29 16:05 ` bugzilla-daemon
@ 2017-07-29 17:20 ` bugzilla-daemon
  2017-07-29 22:22 ` bugzilla-daemon
                   ` (11 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: bugzilla-daemon @ 2017-07-29 17:20 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 430 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=101946

--- Comment #19 from Robin <beanow@oscp.info> ---
I've found that my test cases only trigger the PCI driver's
amdgpu_pci_remove and amdgpu_pci_probe functions.

Adding a call to the new shutdown function, amdgpu_device_shutdown(adev);, to
the amdgpu_pci_remove function does not resolve the issue.
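
For context, remove and probe are the callbacks the PCI core runs when a device
is unbound from or bound to a driver through sysfs. A rough sketch of such a
rebind, assuming the standard unbind/bind nodes; the function name and the
sysfs-root argument are hypothetical, and this is not the reproduction script
attached to the bug:

```shell
#!/bin/sh
# Move a PCI device from its current driver to another one via sysfs.
# $1: PCI address, e.g. 0000:01:00.0
# $2: target driver name, e.g. amdgpu
# $3: sysfs root (hypothetical parameter, only there for testing; default /sys)
rebind() {
    root="${3:-/sys}"
    dev="$root/bus/pci/devices/$1"
    # Unbinding makes the PCI core call the old driver's remove callback.
    if [ -e "$dev/driver/unbind" ]; then
        echo "$1" > "$dev/driver/unbind" || return 1
    fi
    # Binding makes the PCI core call the new driver's probe callback.
    echo "$1" > "$root/bus/pci/drivers/$2/bind"
}

# Example (as root): rebind 0000:01:00.0 amdgpu
```

Binding to amdgpu works this way because the driver already lists the device
ID; binding to vfio-pci additionally needs the ID registered with that driver,
for example through the vfio-pci.ids= kernel flag.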


* [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290]
  2017-07-27 11:46 [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290 / 4.10 kernel] bugzilla-daemon
                   ` (20 preceding siblings ...)
  2017-07-29 17:20 ` bugzilla-daemon
@ 2017-07-29 22:22 ` bugzilla-daemon
  2017-08-01 14:28 ` bugzilla-daemon
                   ` (10 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: bugzilla-daemon @ 2017-07-29 22:22 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 741 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=101946

--- Comment #20 from Robin <beanow@oscp.info> ---
Created attachment 133132
  --> https://bugs.freedesktop.org/attachment.cgi?id=133132&action=edit
Brute-force fix, resets sdma every init

After much trial and error, I've found this approach to work.
On every hw_init, both sDMAs are flagged for a soft reset.

I have tried the existing soft reset code as well, but the busy status flags
it uses to selectively reset the sDMAs do not work reliably enough in my tests
to prevent the errors.

With this patch the ring 9 and 10 test errors no longer appear, and the ring 1
errors are prevented as well.
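
To compare runs with and without such a patch, the kernel log can be reduced
to the amdgpu error lines. A small sketch; the pattern is taken from the
"[drm:... [amdgpu]] *ERROR* ..." lines quoted in the original report, the
function name is made up, and the exact wording of the ring test messages is
not assumed here:

```shell
#!/bin/sh
# Read kernel log text on stdin and print only amdgpu *ERROR* lines.
amdgpu_errors() {
    # -F matches the string literally, so "*" and "[" are not regex operators.
    grep -F '[amdgpu]] *ERROR*'
}

# Example: dmesg | amdgpu_errors
```

An empty result after rebinding would suggest a clean init; grep itself then
exits with status 1, which is expected.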


* [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290]
  2017-07-27 11:46 [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290 / 4.10 kernel] bugzilla-daemon
                   ` (21 preceding siblings ...)
  2017-07-29 22:22 ` bugzilla-daemon
@ 2017-08-01 14:28 ` bugzilla-daemon
  2017-08-01 14:59 ` bugzilla-daemon
                   ` (9 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: bugzilla-daemon @ 2017-08-01 14:28 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1014 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=101946

--- Comment #21 from Luke A. Guest <laguest@archeia.com> ---
Created attachment 133172
  --> https://bugs.freedesktop.org/attachment.cgi?id=133172&action=edit
Test of above with R9 380 with Windows 8.1 and latest AMD drivers

Hi,

After being asked by Alex on IRC to try this, I've attached the output of the
various logs (dmesg and messages); there will be overlap in places.

I'm running 4.13.0-rc2 with the drm-next-4.14 branch merged and the set of 3
patches from Alex. With and without the third patch, I still get a black
screen on restarting the VM (using virt-manager). The first boot from a freshly
booted host starts fine.

In the log I've put "START" and "RESTART" where the VM is started (then
shut down) and then restarted. There are also extra PCI debugging messages
enabled in the kernel.

I, too, would like an answer regarding the probing mentioned in comment 11.


* [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290]
  2017-07-27 11:46 [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290 / 4.10 kernel] bugzilla-daemon
                   ` (22 preceding siblings ...)
  2017-08-01 14:28 ` bugzilla-daemon
@ 2017-08-01 14:59 ` bugzilla-daemon
  2017-08-01 15:16 ` bugzilla-daemon
                   ` (8 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: bugzilla-daemon @ 2017-08-01 14:59 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 864 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=101946

--- Comment #22 from Luke A. Guest <laguest@archeia.com> ---
I'd like to report a minor success. I've managed to boot into win8.1 twice in a
row. I booted as normal through virt-manager, then shut down from inside the
guest, then ran a script:

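#!/bin/sh

# Detach both functions of the device from the kernel's PCI tree,
# then rescan the bus so they are enumerated again.
echo 1 > /sys/bus/pci/devices/0000\:03\:00.0/remove
echo 1 > /sys/bus/pci/devices/0000\:03\:00.1/remove
echo 1 > /sys/bus/pci/rescan

# Bind the re-discovered functions back to vfio-pci.
/opt/vfio/rebind_dev_to_vfio.sh 0000:03:00.0
/opt/vfio/rebind_dev_to_vfio.sh 0000:03:00.1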

I then restarted the guest from virt-manager. It booted fine, and again I shut
down from within the guest.

On running the above script a second time, the machine hung, hard. I couldn't
log in over serial, and the SysRq keys didn't do anything.
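
A slightly more defensive variant of the remove/rescan step could be sketched
as below. It only refuses to touch anything when one of the functions is
already missing; it is not a fix for the hang. The function name and the
sysfs-root argument are hypothetical:

```shell
#!/bin/sh
# Remove the given PCI functions and rescan the bus, but only if all of
# them are currently present.
# $1: sysfs root (normally /sys; a parameter only so it can be tested)
# remaining arguments: PCI addresses such as 0000:03:00.0
remove_and_rescan() {
    root="$1"; shift
    # First pass: check every function before changing anything.
    for bdf in "$@"; do
        if [ ! -e "$root/bus/pci/devices/$bdf/remove" ]; then
            echo "not present, aborting: $bdf" >&2
            return 1
        fi
    done
    # Second pass: detach each function, then let the kernel re-enumerate.
    for bdf in "$@"; do
        echo 1 > "$root/bus/pci/devices/$bdf/remove" || return 1
    done
    echo 1 > "$root/bus/pci/rescan"
}

# Example (as root): remove_and_rescan /sys 0000:03:00.0 0000:03:00.1
```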


* [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290]
  2017-07-27 11:46 [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290 / 4.10 kernel] bugzilla-daemon
                   ` (23 preceding siblings ...)
  2017-08-01 14:59 ` bugzilla-daemon
@ 2017-08-01 15:16 ` bugzilla-daemon
  2017-08-01 16:16 ` bugzilla-daemon
                   ` (7 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: bugzilla-daemon @ 2017-08-01 15:16 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 376 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=101946

--- Comment #23 from Luke A. Guest <laguest@archeia.com> ---
Created attachment 133173
  --> https://bugs.freedesktop.org/attachment.cgi?id=133173&action=edit
Script to rebind a device back to the vfio-pci driver

Forgot to submit this.


* [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290]
  2017-07-27 11:46 [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290 / 4.10 kernel] bugzilla-daemon
                   ` (24 preceding siblings ...)
  2017-08-01 15:16 ` bugzilla-daemon
@ 2017-08-01 16:16 ` bugzilla-daemon
  2017-08-01 16:30 ` bugzilla-daemon
                   ` (6 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: bugzilla-daemon @ 2017-08-01 16:16 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 763 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=101946

--- Comment #24 from Robin <beanow@oscp.info> ---
(In reply to Luke A. Guest from comment #22)
> I'd like to report a minor success. I've managed to boot into win8.1 twice
> in a row. I booted as normal through virt-manager, then shutdown from inside
> the guest, then called a script:

Hi Luke, a few questions. When booting the host, do you boot with amdgpu or
vfio-pci bound to the GPU? After you've started a VM, did you bind back to
amdgpu or did you stay on vfio-pci?

Is it during vfio or amdgpu control that your system hangs on the second boot?

If it's during amdgpu, have you tried my patch from comment 20?


* [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290]
  2017-07-27 11:46 [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290 / 4.10 kernel] bugzilla-daemon
                   ` (25 preceding siblings ...)
  2017-08-01 16:16 ` bugzilla-daemon
@ 2017-08-01 16:30 ` bugzilla-daemon
  2017-08-01 16:58 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: bugzilla-daemon @ 2017-08-01 16:30 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1048 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=101946

--- Comment #25 from Luke A. Guest <laguest@archeia.com> ---
> Hi Luke, few questions. When booting the host, do you boot with amdgpu or vfio-pci bound to the GPU? After you've started a VM, did you bind back to amdgpu or did you stay on vfio-pci?

I have 2 AMD GPUs, R9 390 (host) and R9 380 (guest). I boot with the 380 being
passed over to vfio-pci. On exit the VM sets the 380 back to vfio-pci.

> Is it during vfio or amdgpu control that your system hangs on the second boot?

It was during a boot of the VM, the devices were attached to the vfio-pci
driver.

> If it's during amdgpu, have you tried my patch from comment 20?

I haven't tried it; I don't think it would apply to my card as it's VI, not CIK.
Although, if I were using the 390 (CIK) it likely would. The issues are similar
though, and I believe I've just proved that the so-called hw reset bug, in my
case anyway, is sw not hw.


* [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290]
  2017-07-27 11:46 [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290 / 4.10 kernel] bugzilla-daemon
                   ` (26 preceding siblings ...)
  2017-08-01 16:30 ` bugzilla-daemon
@ 2017-08-01 16:58 ` bugzilla-daemon
  2017-08-01 17:47 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: bugzilla-daemon @ 2017-08-01 16:58 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1350 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=101946

--- Comment #26 from Robin <beanow@oscp.info> ---
(In reply to Luke A. Guest from comment #25)
> I have 2 AMD GPU's, R9 390 (host) and R9 380 (guest). I boot with the 380
> being passed over to vfio-pci. On exit the VM sets the 380 back to vfio-pci.

FWIW I don't think any of these patches are relevant to you then.
The reset logic for your 380 would be coming from the guest's driver plus
vfio-pci, where vfio in theory should only try to get the 380's state back to
how it would be if you actually rebooted, and leave the more sophisticated work
to the guest driver.

Though as mentioned here https://www.spinics.net/lists/kvm/msg116277.html
vfio-pci may employ hardware-specific solutions if there's no good blanket
solution.

---

For my scenario I have the Intel iGPU and the R9 290, so I am trying to find a
setup where I can use the 290 for gaming on both the host and the guest. Once
the 290 is bound to vfio-pci I have no issues with the VM. I can reboot or
force it off as many times as I like with no problems.

It's when I am done with the VMs and try to give the 290 back to amdgpu that I
had init issues, which my comment 20 patch does resolve, even if it is a
carpet-bomb approach to solving it.


* [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290]
  2017-07-27 11:46 [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290 / 4.10 kernel] bugzilla-daemon
                   ` (27 preceding siblings ...)
  2017-08-01 16:58 ` bugzilla-daemon
@ 2017-08-01 17:47 ` bugzilla-daemon
  2017-08-02  6:37 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: bugzilla-daemon @ 2017-08-01 17:47 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 576 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=101946

--- Comment #27 from Luke A. Guest <laguest@archeia.com> ---
> FWIW I don't think any of these patches are relevant to you then.

Not strictly true. As I said, Alex pointed me at this page to try his patches.
I believe all this is connected. There are issues un/binding from/to the
driver. There are reset issues as well. 

I've put my test branch here
https://github.com/Lucretia/linux-amdgpu/tree/amdgpu/v4.13-rc2-amdgpu-reset-test


* [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290]
  2017-07-27 11:46 [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290 / 4.10 kernel] bugzilla-daemon
                   ` (28 preceding siblings ...)
  2017-08-01 17:47 ` bugzilla-daemon
@ 2017-08-02  6:37 ` bugzilla-daemon
  2017-08-30 20:55 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: bugzilla-daemon @ 2017-08-02  6:37 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 721 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=101946

--- Comment #28 from Robin <beanow@oscp.info> ---
(In reply to Luke A. Guest from comment #27)
> > FWIW I don't think any of these patches are relevant to you then.
> 
> Not strictly true. As I said, Alex pointed me at this page to try his
> patches. I believe all this is connected. There are issues un/binding
> from/to the driver. There are reset issues as well. 

True, there may be init/cleanup issues with AMD cards that would be better
understood and documented in the open source community if this were fixed in
amdgpu, and hopefully that helps you with vfio-pci as well.


* [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290]
  2017-07-27 11:46 [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290 / 4.10 kernel] bugzilla-daemon
                   ` (29 preceding siblings ...)
  2017-08-02  6:37 ` bugzilla-daemon
@ 2017-08-30 20:55 ` bugzilla-daemon
  2019-07-27  5:47 ` bugzilla-daemon
  2019-11-19  8:20 ` bugzilla-daemon
  32 siblings, 0 replies; 34+ messages in thread
From: bugzilla-daemon @ 2017-08-30 20:55 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 345 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=101946

--- Comment #29 from Robin <beanow@oscp.info> ---
As I will pass on my R9 290 and switch to an RX 580, please let me know if you
need any extra information from either card before I no longer have the R9 290.


* [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290]
  2017-07-27 11:46 [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290 / 4.10 kernel] bugzilla-daemon
                   ` (30 preceding siblings ...)
  2017-08-30 20:55 ` bugzilla-daemon
@ 2019-07-27  5:47 ` bugzilla-daemon
  2019-11-19  8:20 ` bugzilla-daemon
  32 siblings, 0 replies; 34+ messages in thread
From: bugzilla-daemon @ 2019-07-27  5:47 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 451 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=101946

wedens13@yandex.ru changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://bugs.freedesktop.or
                   |                            |g/show_bug.cgi?id=111229


* [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290]
  2017-07-27 11:46 [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290 / 4.10 kernel] bugzilla-daemon
                   ` (31 preceding siblings ...)
  2019-07-27  5:47 ` bugzilla-daemon
@ 2019-11-19  8:20 ` bugzilla-daemon
  32 siblings, 0 replies; 34+ messages in thread
From: bugzilla-daemon @ 2019-11-19  8:20 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 806 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=101946

Martin Peres <martin.peres@free.fr> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |MOVED

--- Comment #30 from Martin Peres <martin.peres@free.fr> ---
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been
closed from further activity.

You can subscribe and participate further through the new bug through this link
to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/207.


end of thread, other threads:[~2019-11-19  8:20 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-27 11:46 [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290 / 4.10 kernel] bugzilla-daemon
2017-07-27 11:47 ` bugzilla-daemon
2017-07-27 11:47 ` bugzilla-daemon
2017-07-27 11:47 ` bugzilla-daemon
2017-07-27 12:00 ` bugzilla-daemon
2017-07-27 14:42 ` bugzilla-daemon
2017-07-27 14:42 ` bugzilla-daemon
2017-07-28 10:26 ` bugzilla-daemon
2017-07-28 11:46 ` bugzilla-daemon
2017-07-28 14:24 ` bugzilla-daemon
2017-07-28 15:19 ` bugzilla-daemon
2017-07-28 15:37 ` bugzilla-daemon
2017-07-28 15:53 ` bugzilla-daemon
2017-07-28 17:26 ` bugzilla-daemon
2017-07-28 17:29 ` bugzilla-daemon
2017-07-28 20:55 ` bugzilla-daemon
2017-07-28 21:06 ` bugzilla-daemon
2017-07-28 21:17 ` bugzilla-daemon
2017-07-28 21:55 ` bugzilla-daemon
2017-07-28 23:06 ` [Bug 101946] Rebinding AMDGPU causes initialization errors [R9 290] bugzilla-daemon
2017-07-29 16:05 ` bugzilla-daemon
2017-07-29 17:20 ` bugzilla-daemon
2017-07-29 22:22 ` bugzilla-daemon
2017-08-01 14:28 ` bugzilla-daemon
2017-08-01 14:59 ` bugzilla-daemon
2017-08-01 15:16 ` bugzilla-daemon
2017-08-01 16:16 ` bugzilla-daemon
2017-08-01 16:30 ` bugzilla-daemon
2017-08-01 16:58 ` bugzilla-daemon
2017-08-01 17:47 ` bugzilla-daemon
2017-08-02  6:37 ` bugzilla-daemon
2017-08-30 20:55 ` bugzilla-daemon
2019-07-27  5:47 ` bugzilla-daemon
2019-11-19  8:20 ` bugzilla-daemon
