* [Bug 216625] New: [regression] GPU lockup on Radeon R7 Kaveri
From: bugzilla-daemon @ 2022-10-25 15:52 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=216625

            Bug ID: 216625
           Summary: [regression] GPU lockup on Radeon R7 Kaveri
           Product: Drivers
           Version: 2.5
    Kernel Version: 5.19.16-100.fc35.x86_64
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: Video(DRI - non Intel)
          Assignee: drivers_video-dri@kernel-bugs.osdl.org
          Reporter: pierre-bugzilla@ossman.eu
        Regression: No

Created attachment 303084
  --> https://bugzilla.kernel.org/attachment.cgi?id=303084&action=edit
dmesg since problems began

Ever since I upgraded from Fedora 34 to Fedora 35 I've gotten random GPU
lockups. This machine has otherwise been stable for years.

I don't really know what triggers the issue. I *think* it happens in some cases
when I try to play a video in Firefox, but I'm not completely sure.

Also reported here, but Fedora generally doesn't pay much attention to GPU
driver issues:

https://bugzilla.redhat.com/show_bug.cgi?id=2131923

Last working system:

  kernel-5.13.8-100.fc33.x86_64
  libglvnd-1:1.3.3-1.fc34.x86_64
  mesa-libGL-21.1.8-3.fc34.x86_64
  libdrm-2.4.109-1.fc34.x86_64
  xorg-x11-server-Xorg-1.20.14-3.fc34.x86_64

First broken system:

  kernel-5.19.8-100.fc35.x86_64
  libglvnd-1:1.3.4-2.fc35.x86_64
  mesa-libGL-21.3.9-1.fc35.x86_64
  libdrm-2.4.110-1.fc35.x86_64
  xorg-x11-server-Xorg-1.20.14-7.fc35.x86_64

Attached are all kernel logs since the issue started happening. They also
include a fresh boot from the last good kernel, and a good run with the new
kernel.

I think that first run with the new kernel was just a fluke, though. The only
package upgraded after the system upgrade and before the lockups started is
annobin.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

* [Bug 216625] [regression] GPU lockup on Radeon R7 Kaveri
From: bugzilla-daemon @ 2022-10-25 15:52 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=216625

Pierre Ossman (pierre-bugzilla@ossman.eu) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
               Tree|Mainline                    |Fedora

* [Bug 216625] [regression] GPU lockup on Radeon R7 Kaveri
From: bugzilla-daemon @ 2022-10-25 16:24 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=216625

Alex Deucher (alexdeucher@gmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |alexdeucher@gmail.com

--- Comment #1 from Alex Deucher (alexdeucher@gmail.com) ---
Any chance you could bisect?  There have been very few changes to the radeon
kernel driver over the last few years.  It could also be a mesa regression.
Does upgrading or downgrading mesa help?

* [Bug 216625] [regression] GPU lockup on Radeon R7 Kaveri
From: bugzilla-daemon @ 2022-10-26  5:40 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=216625

--- Comment #2 from Pierre Ossman (pierre-bugzilla@ossman.eu) ---
A bisect will be difficult, given that I can't reproduce it. :/

Any clues from the dmesg that could tell how to provoke it? Or some settings
that could provide more information?

I can try a few versions and see if I'm able to narrow it down somewhat. It's
difficult to know when to assume a version is good, as in some cases it has
gone weeks without a lockup...
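
A rough way to put a number on "when to assume it's a good version" is
sketched below. It assumes, purely for illustration, that lockups on a bad
kernel arrive independently at a constant average rate (a Poisson model).
Nothing in the attached logs establishes that, and both the function name and
the 14-day figure are hypothetical.

```python
import math


def days_without_lockup_needed(mean_days_between_lockups: float,
                               confidence: float) -> float:
    """Days a run must stay lockup-free before calling the version good.

    Assumption (hypothetical model): on a bad version, lockups follow a
    Poisson process, so the chance it survives t days without a lockup is
    exp(-t / mean_days_between_lockups).
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    if mean_days_between_lockups <= 0:
        raise ValueError("mean interval must be positive")
    return -mean_days_between_lockups * math.log(1.0 - confidence)


# Illustrative only: if a bad kernel locks up about every 14 days on average,
# how long must a candidate run clean to be 95% sure it is not bad?
print(round(days_without_lockup_needed(14, 0.95), 1))
```

Under those assumptions each bisect step would need roughly six clean weeks,
which is why a bisect is impractical without a faster reproducer.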

* [Bug 216625] [regression] GPU lockup on Radeon R7 Kaveri
From: bugzilla-daemon @ 2022-10-26  5:58 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=216625

--- Comment #3 from Pierre Ossman (pierre-bugzilla@ossman.eu) ---
This is wrong, I checked the wrong lines in dnf's history:

> Last working system:
> 
>   kernel-5.13.8-100.fc33.x86_64

The last working kernel is actually 5.17.12-100.fc34.x86_64. So if it's the
kernel it's likely 5.18 or 5.19 that regressed. I'll give 5.18.1 a spin.

* [Bug 216625] [regression] GPU lockup on Radeon R7 Kaveri
From: bugzilla-daemon @ 2022-10-28  5:40 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=216625

--- Comment #4 from Pierre Ossman (pierre-bugzilla@ossman.eu) ---
I just got a GPU lockup on 5.18.4. So it's either not the kernel, or a bug that
appeared in the 5.18 series. I'll go back to the known good kernel now and see
if I can get the bug there.


One thought though, even if it is mesa that happens to issue a bad sequence of
commands, shouldn't the kernel driver be able to reset the GPU? It certainly
indicates that it is trying.

* [Bug 216625] [regression] GPU lockup on Radeon R7 Kaveri
From: bugzilla-daemon @ 2022-11-11  6:47 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=216625

--- Comment #5 from Pierre Ossman (pierre-bugzilla@ossman.eu) ---
The lockup happens on 5.17.2 as well, so it seems the kernel is not the most
likely suspect.
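
The reasoning can be sketched as a consistency check over the kernel versions
reported so far in this bug (5.17.12 believed good, 5.18.4 and 5.17.2 bad). A
single kernel regression implies every good version is older than every bad
one. The helper names here are hypothetical, and the sketch only handles plain
dotted version numbers.

```python
from typing import Dict, Optional, Tuple

Version = Tuple[int, ...]


def parse(version: str) -> Version:
    """Turn '5.17.12' into (5, 17, 12) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def regression_window(results: Dict[str, bool]) -> Optional[Tuple[str, str]]:
    """Return (newest_good, oldest_bad) if the results fit one regression.

    results maps a version string to True (lockup seen) or False (none seen).
    Returns None when either side is empty or when a "good" version is not
    older than every "bad" one, i.e. no single regression point explains it.
    """
    good = [v for v, locked_up in results.items() if not locked_up]
    bad = [v for v, locked_up in results.items() if locked_up]
    if not good or not bad:
        return None
    newest_good = max(good, key=parse)
    oldest_bad = min(bad, key=parse)
    if parse(newest_good) >= parse(oldest_bad):
        return None
    return newest_good, oldest_bad


observations = {"5.17.12": False, "5.18.4": True}
print(regression_window(observations))  # ('5.17.12', '5.18.4')

observations["5.17.2"] = True  # the new data point from this comment
print(regression_window(observations))  # None
```

Once 5.17.2 is marked bad, no window remains, so either the "good" verdict on
5.17.12 was luck or the cause lies outside the kernel.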

I'll see if I can try an older mesa next.

Could the issue be with the firmware? Has that changed recently for these
devices?

The last good firmware should be:

  linux-firmware-20220509-132.fc34.noarch

And the first bad firmware should be:

  linux-firmware-20220708-136.fc35.noarch

* [Bug 216625] [regression] GPU lockup on Radeon R7 Kaveri
From: bugzilla-daemon @ 2022-11-11 14:54 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=216625

--- Comment #6 from Alex Deucher (alexdeucher@gmail.com) ---
(In reply to Pierre Ossman from comment #5)
> 
> Could the issue be with the firmware? Has that changed recently for these
> devices?
> 
> The last good firmware should be:
> 
>   linux-firmware-20220509-132.fc34.noarch
> 
> And the first bad firmware should be:
> 
>   linux-firmware-20220708-136.fc35.noarch

Not likely. The firmware for this chip has not changed in years.

* [Bug 216625] [regression] GPU lockup on Radeon R7 Kaveri
From: bugzilla-daemon @ 2022-12-20  6:53 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=216625

--- Comment #7 from Pierre Ossman (pierre-bugzilla@ossman.eu) ---
Sorry, I haven't had time to look at downgrading Mesa yet. But FYI, it does
still happen with mesa 22.1.7 and kernel 6.0.10.

I am now almost 100% certain that it is videos that are triggering this. And
possibly not all videos. So I'm thinking, perhaps the video acceleration?

Is that also handled by mesa, or some other component?

* [Bug 216625] [regression] GPU lockup on Radeon R7 Kaveri
From: bugzilla-daemon @ 2022-12-20 15:13 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=216625

--- Comment #8 from Alex Deucher (alexdeucher@gmail.com) ---
(In reply to Pierre Ossman from comment #7)
> 
> Is that also handled by mesa, or some other component?

Yes, mesa handles video APIs (VAAPI, OpenMAX, VDPAU) as well as 3D (OpenGL,
Vulkan).

* [Bug 216625] [regression] GPU lockup on Radeon R7 Kaveri
From: bugzilla-daemon @ 2023-03-06  6:13 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=216625

--- Comment #9 from Pierre Ossman (pierre-bugzilla@ossman.eu) ---
FYI, it seems to have gotten worse since upgrading from
kernel-6.1.8-100.fc36.x86_64 to kernel-6.1.13-100.fc36.x86_64.

It now hangs more arbitrarily, not just when trying to play a video. Having
done a suspend/resume cycle is still a requirement though.

I'm struggling to build the old version of mesa that still worked. It isn't
very compatible with newer LLVM, and there is something wrong with Fedora's
packaging of LLVM 12 (that seems to be the matching LLVM version for that old
mesa). It will take some more effort to get that test up and running.

* [Bug 216625] [regression] GPU lockup on Radeon R7 Kaveri
From: bugzilla-daemon @ 2023-03-09  6:23 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=216625

--- Comment #10 from Pierre Ossman (pierre-bugzilla@ossman.eu) ---
I finally got that old version of mesa to build. Unfortunately, the hangs still
happen even with that. :/

> Mar 09 07:18:30 kernel: radeon 0000:00:01.0: ring 3 stalled for more than 10028msec
> Mar 09 07:18:30 kernel: radeon 0000:00:01.0: GPU lockup (current fence id 0x000000000000fa91 last fence id 0x000000000000fabc on ring 3)
> Mar 09 07:18:31 kernel: radeon 0000:00:01.0: ring 5 stalled for more than 10077msec
> Mar 09 07:18:31 kernel: radeon 0000:00:01.0: GPU lockup (current fence id 0x00000000000018fb last fence id 0x00000000000018fe on ring 5)
> Mar 09 07:18:31 kernel: radeon 0000:00:01.0: ring 0 stalled for more than 10202msec
> ...
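
To compare lockups across runs, the messages above can be reduced to one
number per ring. This is a minimal sketch, assuming the "current" fence id is
the last fence the ring completed and the "last" fence id is the most recent
one submitted. The function name is hypothetical and the pattern only covers
the message format shown here.

```python
import re
from typing import Dict

LOCKUP_RE = re.compile(
    r"GPU lockup \(current fence id (0x[0-9a-f]+) "
    r"last fence id (0x[0-9a-f]+) on ring (\d+)\)"
)


def pending_fences_by_ring(log_text: str) -> Dict[int, int]:
    """Map ring number -> fences submitted but not completed at lockup."""
    # Undo mail quoting and wrapping: drop leading "> " and join the lines.
    flat = " ".join(line.lstrip("> ").strip() for line in log_text.splitlines())
    pending = {}
    for current, last, ring in LOCKUP_RE.findall(flat):
        pending[int(ring)] = int(last, 16) - int(current, 16)
    return pending


sample = """\
> Mar 09 07:18:30 kernel: radeon 0000:00:01.0: GPU lockup (current fence id
> 0x000000000000fa91 last fence id 0x000000000000fabc on ring 3)
> Mar 09 07:18:31 kernel: radeon 0000:00:01.0: GPU lockup (current fence id
> 0x00000000000018fb last fence id 0x00000000000018fe on ring 5)
"""
print(pending_fences_by_ring(sample))  # {3: 43, 5: 3}
```

Running the same function over the full attached dmesg would show whether the
same rings stall every time.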

What can we do next to pinpoint this?

It seems to fail rather reliably after a suspend/resume. Is there some test
suite I can run to provoke things?

* [Bug 216625] [regression] GPU lockup on Radeon R7 Kaveri
From: bugzilla-daemon @ 2023-03-24 17:52 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=216625

--- Comment #11 from Pierre Ossman (pierre-bugzilla@ossman.eu) ---
(In reply to Pierre Ossman from comment #9)
> 
> It now hangs more arbitrarily, not just when trying to play a video. Having
> done a suspend/resume cycle is still a requirement though.
> 

I tried disabling video acceleration, and the hangs are now gone. So it does
seem to be the culprit after all.

Could this help you pinpoint things somehow?
