* [Bug 213391] New: AMDGPU retries page fault with some specific processes amdgpu: [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered
From: bugzilla-daemon @ 2021-06-10 11:16 UTC
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=213391

            Bug ID: 213391
           Summary: AMDGPU retries page fault with some specific processes
                    amdgpu: [gfxhub0] retry page fault until *ERROR* ring
                    gfx timeout, but soft recovered
           Product: Drivers
           Version: 2.5
    Kernel Version: Linux 5.12.9-arch1-1
          Hardware: x86-64
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: low
          Priority: P1
         Component: Video(DRI - non Intel)
          Assignee: drivers_video-dri@kernel-bugs.osdl.org
          Reporter: samy@lahfa.xyz
        Regression: No

Hi,

I recently updated from mainline kernel 5.11.16 to 5.12.9 and have run into
this issue. I also updated the Mesa driver from mesa-git
(21.1.0_devel.137307.f8e5f945b8f-1) to mesa-git
(21.2.0_devel.140633.c04f20e7e01-1).

Current kernel parameters: /vmlinuz-linux zfs=zroot/ROOT/default rw loglevel=3
quiet radeon.si_support=0 amdgpu.si_support=1 radeon.cik_support=0
amdgpu.cik_support=1

My computer is a ThinkPad T495 laptop (AMD Ryzen 7 PRO 3700U with integrated
Radeon RX Vega 10 graphics, 16GB DDR4 3200MHz). A very important detail is that
the BIOS can reserve up to 2GB of DDR4 RAM as VRAM for the iGPU; I currently
have 1GB (1024MB) set aside in the BIOS. I suspect the page fault retries could
be linked to this in some way.

I think this is more likely to happen when RAM is under heavy load and the
system is swapping a lot (I have 12.3GB of swap on a PCIe 3.0 NVMe drive).

I cannot reproduce this issue consistently yet. So far it has happened mostly
with the Qutebrowser web browser, and only once with Chromium. That time the
X11 server crashed and the computer froze completely, but the kernel was still
responsive to SysRq keys, so I was able to get out of that tricky situation
safely.

I'll upload the logs of both crashes I have encountered, along with lspci
output and other log files that could be useful.

Kind regards,

Lahfa Samy

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

* [Bug 213391] AMDGPU retries page fault with some specific processes amdgpu: [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered
From: bugzilla-daemon @ 2021-06-10 11:33 UTC
  To: dri-devel

--- Comment #1 from Lahfa Samy (samy@lahfa.xyz) ---
Created attachment 297287
  --> https://bugzilla.kernel.org/attachment.cgi?id=297287&action=edit
dmesg-chromium-amdgpu-retry-page-fault

The dmesg includes the end of an entry into a sleep state and then the resume
from it. A USB-C dock with external screens was connected to the laptop, but
the errors happened both with it plugged in and unplugged.

* [Bug 213391] AMDGPU retries page fault with some specific processes amdgpu: [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered
From: bugzilla-daemon @ 2021-06-10 11:43 UTC
  To: dri-devel

Lahfa Samy (samy@lahfa.xyz) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |samy@lahfa.xyz

--- Comment #2 from Lahfa Samy (samy@lahfa.xyz) ---
Created attachment 297291
  --> https://bugzilla.kernel.org/attachment.cgi?id=297291&action=edit
journalctl-amdgpu-qutebrowser-page-retry

This time there was no gfx timeout and thus the X11 server did not freeze, and
I didn't notice the retry page faults until I ran dmesg.

At the beginning there is an "irq 7: nobody cared (try booting with the
"irqpoll" option)" message followed by a call trace; this is a known, already
reported bug that has not affected my computer's functionality in any way
since I acquired it.

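Comment #2 above describes retry page faults that only showed up in dmesg, with no gfx timeout, and the summary was later changed to say the faults are only sometimes followed by a ring gfx timeout. As a hypothetical aid for sorting saved logs into those cases, here is a minimal Python sketch. The function name, the result labels and the plain substring matching are assumptions; the two substrings are copied from the bug title rather than from a complete amdgpu log line.

```python
from typing import Iterable

# Substrings copied from the bug title (an assumption: real amdgpu log lines
# surround them with extra fields such as timestamp, PCI address and pasid).
RETRY_FAULT = "retry page fault"
GFX_TIMEOUT = "ring gfx timeout"


def classify_log(lines: Iterable[str]) -> str:
    """Sort a saved dmesg/journalctl log into one of three cases.

    "no-faults"           no retry page fault line at all
    "faults-only"         retry page faults, never followed by a gfx timeout
    "faults-then-timeout" a ring gfx timeout after at least one retry fault
    """
    seen_fault = False
    for line in lines:
        if RETRY_FAULT in line:
            seen_fault = True
        elif seen_fault and GFX_TIMEOUT in line:
            # The hang case: a timeout that comes after a retry page fault.
            return "faults-then-timeout"
    # A timeout with no earlier retry fault is not counted as this bug.
    return "faults-only" if seen_fault else "no-faults"
```

For example, after saving a log with dmesg > dmesg.txt (a hypothetical file name), passing the opened file to classify_log returns one of the three labels. Matching is deliberately loose, so a fault and a later, unrelated timeout in the same log are reported together.
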
* [Bug 213391] AMDGPU retries page fault with some specific processes amdgpu and sometimes [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered
From: bugzilla-daemon @ 2021-06-10 12:34 UTC
  To: dri-devel

Lahfa Samy (samy@lahfa.xyz) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|AMDGPU retries page fault   |AMDGPU retries page fault
                   |with some specific          |with some specific
                   |processes amdgpu: [gfxhub0] |processes amdgpu and
                   |retry page fault until      |sometimes [gfxhub0] retry
                   |*ERROR* ring gfx timeout,   |page fault until *ERROR*
                   |but soft recovered          |ring gfx timeout, but soft
                   |                            |recovered

* [Bug 213391] AMDGPU retries page fault with some specific processes amdgpu and sometimes followed [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered
From: bugzilla-daemon @ 2021-06-10 12:34 UTC
  To: dri-devel

Lahfa Samy (samy@lahfa.xyz) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|AMDGPU retries page fault   |AMDGPU retries page fault
                   |with some specific          |with some specific
                   |processes amdgpu and        |processes amdgpu and
                   |sometimes [gfxhub0] retry   |sometimes followed
                   |page fault until *ERROR*    |[gfxhub0] retry page fault
                   |ring gfx timeout, but soft  |until *ERROR* ring gfx
                   |recovered                   |timeout, but soft recovered

* [Bug 213391] AMDGPU retries page fault with some specific processes amdgpu and sometimes followed [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered
From: bugzilla-daemon @ 2021-06-10 12:36 UTC
  To: dri-devel

Nirmoy (nirmoy.aiemd@gmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |nirmoy.aiemd@gmail.com

--- Comment #3 from Nirmoy (nirmoy.aiemd@gmail.com) ---
How much VRAM do you have? I can't seem to find that in the dmesg. We recently
fixed a similar issue with https://patchwork.freedesktop.org/patch/437369/. I
wonder if you could try this patch out.

* [Bug 213391] AMDGPU retries page fault with some specific processes amdgpu and sometimes followed [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered
From: bugzilla-daemon @ 2021-06-10 12:51 UTC
  To: dri-devel

--- Comment #4 from Lahfa Samy (samy@lahfa.xyz) ---
According to glxinfo, I currently have 1GB of VRAM set:

Extended renderer info (GLX_MESA_query_renderer):
    Vendor: AMD (0x1002)
    Device: AMD Radeon(TM) Vega 10 Graphics (RAVEN, DRM 3.40.0, 5.12.9-arch1-1, LLVM 12.0.0) (0x15d8)
    Version: 21.2.0
    Accelerated: yes
    Video memory: 1024MB
    Unified memory: no
Memory info (GL_ATI_meminfo):
    VBO free memory - total: 42 MB, largest block: 42 MB
    VBO free aux. memory - total: 2442 MB, largest block: 2442 MB
    Texture free memory - total: 42 MB, largest block: 42 MB
    Texture free aux. memory - total: 2442 MB, largest block: 2442 MB
    Renderbuffer free memory - total: 42 MB, largest block: 42 MB
    Renderbuffer free aux. memory - total: 2442 MB, largest block: 2442 MB
Memory info (GL_NVX_gpu_memory_info):
    Dedicated video memory: 1024 MB
    Total available memory: 4096 MB
    Currently available dedicated video memory: 42 MB
OpenGL vendor string: AMD
OpenGL renderer string: AMD Radeon(TM) Vega 10 Graphics (RAVEN, DRM 3.40.0, 5.12.9-arch1-1, LLVM 12.0.0)

How would I go about testing a patch? I assume I need to rebuild the Linux
kernel with the patch applied and boot it, right? I found this link, but it
says that the information in it is probably outdated:
https://www.kernel.org/doc/html/v5.12/process/applying-patches.html

* [Bug 213391] AMDGPU retries page fault with some specific processes amdgpu and sometimes followed [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered
From: bugzilla-daemon @ 2021-06-10 13:09 UTC
  To: dri-devel

--- Comment #5 from Nirmoy (nirmoy.aiemd@gmail.com) ---
Please let me know which distro you are using, and I can prepare a complete
guide.

* [Bug 213391] AMDGPU retries page fault with some specific processes amdgpu and sometimes followed [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered
From: bugzilla-daemon @ 2021-06-10 13:19 UTC
  To: dri-devel

--- Comment #6 from Lahfa Samy (samy@lahfa.xyz) ---
I'm running Arch Linux with the ZFS module (I can't boot and mount the
root/home "partition" without it). Thanks for taking the time to write this
guide; I'll do my best to test the patch in any way I can.

* [Bug 213391] AMDGPU retries page fault with some specific processes amdgpu and sometimes followed [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered
From: bugzilla-daemon @ 2021-06-10 17:41 UTC
  To: dri-devel

--- Comment #7 from Nirmoy (nirmoy.aiemd@gmail.com) ---
Actually, I was wrong: I checked out v5.12.9-arch1 from Arch and realized the
fix I mentioned before isn't applicable here.

* [Bug 213391] AMDGPU retries page fault with some specific processes amdgpu and sometimes followed [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered
From: bugzilla-daemon @ 2021-06-10 19:45 UTC
  To: dri-devel

--- Comment #8 from Lahfa Samy (samy@lahfa.xyz) ---
In the meantime, I'll try to find a way to reproduce this issue reliably. If
you plan to write a patch for this issue, I'd be glad to help with testing to
squash this bug.

* [Bug 213391] AMDGPU retries page fault with some specific processes amdgpu and sometimes followed [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered
From: bugzilla-daemon @ 2021-06-11  7:31 UTC
  To: dri-devel

--- Comment #9 from Michel Dänzer (michel@daenzer.net) ---
If you can, reverting to an older version of the files under
/lib/firmware/amdgpu/ may avoid the hangs.

* [Bug 213391] AMDGPU retries page fault with some specific processes amdgpu and sometimes followed [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered
From: bugzilla-daemon @ 2021-06-11 23:32 UTC
  To: dri-devel

--- Comment #10 from dimitris@gmail.com ---
I'm seeing the same thing on a T495 running Fedora 33 and Wayland, typically
involving Firefox: https://bugzilla.redhat.com/show_bug.cgi?id=1966384

Would it be possible for me to try that patch?

* [Bug 213391] AMDGPU retries page fault with some specific processes amdgpu and sometimes followed [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered
From: bugzilla-daemon @ 2021-06-12 23:02 UTC
  To: dri-devel

--- Comment #11 from Lahfa Samy (samy@lahfa.xyz) ---
Hi Dimitris, what is your current kernel version on Fedora (the output of
"uname --kernel-release" in a terminal)? I can't try the given patch. However,
I haven't run into the issue again, since I haven't had the time to put my RAM
under heavy load.

* [Bug 213391] AMDGPU retries page fault with some specific processes amdgpu and sometimes followed [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered
From: bugzilla-daemon @ 2021-06-13 17:43 UTC
  To: dri-devel

--- Comment #12 from dimitris@gmail.com ---
Hi, I've seen this under 5.12.10-200.fc33.x86_64, with two incidents hours
apart. Earlier I had a number of incidents under 5.12.9.

In all cases I was using Firefox "heavily": creating tabs and using
graphics-heavy pages.

* [Bug 213391] AMDGPU retries page fault with some specific processes amdgpu and sometimes followed [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered
From: bugzilla-daemon @ 2021-06-14  8:01 UTC
  To: dri-devel

--- Comment #13 from Nirmoy (nirmoy.aiemd@gmail.com) ---
Hi Dimitris and Lahfa, please try Michel's suggestion.

* [Bug 213391] AMDGPU retries page fault with some specific processes amdgpu and sometimes followed [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered
From: bugzilla-daemon @ 2021-06-15 22:14 UTC
  To: dri-devel

--- Comment #14 from Dominic Letz (dominic.letz@berlin.de) ---
I'm having the same issue on an E495 with kernel 5.12.9. I'll try to downgrade
/lib/firmware/amdgpu; any hint as to which git tag you would consider safe?

* [Bug 213391] AMDGPU retries page fault with some specific processes amdgpu and sometimes followed [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered
From: bugzilla-daemon @ 2021-06-16  8:51 UTC
  To: dri-devel

--- Comment #15 from Michel Dänzer (michel@daenzer.net) ---
(In reply to Dominic Letz from comment #14)
> Having the same issue on an E495 with Kernel 5.12.9. Will try to downgrade
> the /lib/firmware/amdgpu any hint to which git tag you would consider safe?

20210315 seems to work fine here (on an E595).

* [Bug 213391] AMDGPU retries page fault with some specific processes amdgpu and sometimes followed [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered
From: bugzilla-daemon @ 2021-06-16 10:46 UTC
  To: dri-devel

--- Comment #16 from Dominic Letz (dominic.letz@berlin.de) ---
(In reply to Michel Dänzer from comment #15)
> (In reply to Dominic Letz from comment #14)
> > Having the same issue on an E495 with Kernel 5.12.9. Will try to downgrade
> > the /lib/firmware/amdgpu any hint to which git tag you would consider safe?
> 
> 20210315 seems to work fine here (on an E595).

+1 trying that

* [Bug 213391] AMDGPU retries page fault with some specific processes amdgpu and sometimes followed [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered
From: bugzilla-daemon @ 2021-06-16 20:55 UTC
  To: dri-devel

Leandro Jacques (lsrzj@yahoo.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |lsrzj@yahoo.com

--- Comment #17 from Leandro Jacques (lsrzj@yahoo.com) ---
Created attachment 297413
  --> https://bugzilla.kernel.org/attachment.cgi?id=297413&action=edit
Crash log for kernel 5.12.10

I've been having issues with amdgpu since kernel 5.10. I had to downgrade to
5.4 LTS to get rid of all of them.

* [Bug 213391] AMDGPU retries page fault with some specific processes amdgpu and sometimes followed [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered
From: bugzilla-daemon @ 2021-06-18 18:27 UTC
  To: dri-devel

--- Comment #18 from Leandro Jacques (lsrzj@yahoo.com) ---
Created attachment 297467
  --> https://bugzilla.kernel.org/attachment.cgi?id=297467&action=edit
amdgpu crash log for kernel 5.4.126

Before 5.4.126 I had no issues at all; I'm downgrading to 5.4.123 to check
whether the problem goes away.

* [Bug 213391] AMDGPU retries page fault with some specific processes amdgpu and sometimes followed [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered
From: bugzilla-daemon @ 2021-06-18 20:30 UTC
  To: dri-devel

--- Comment #19 from dimitris@gmail.com ---
I've also just replaced /lib/firmware/amdgpu with the `20210315` version; I'll
see how this goes. I'm currently running Fedora kernel 5.12.11-200.fc33.x86_64
on a T495.

Question: don't I also need to update the initrd? `lsinitrd` shows that all
the amdgpu modules are included in the initrd image. Or is the firmware
reloaded once root is mounted?

* [Bug 213391] AMDGPU retries page fault with some specific processes amdgpu and sometimes followed [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered
From: bugzilla-daemon @ 2021-06-19 12:15 UTC
  To: dri-devel

--- Comment #20 from Michel Dänzer (michel@daenzer.net) ---
(In reply to dimitris from comment #19)
> Question, don't I also need to update the initrd?

Yes, you do, if it didn't happen automatically.
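A minimal sketch for checking which amdgpu firmware files actually ended up in the initramfs. It assumes that `lsinitrd` (from dracut) prints an `ls -l`-style listing whose last field on each line is the path. With no arguments, `lsinitrd` inspects the running kernel's image by default. On Fedora the initramfs is usually regenerated with `dracut -f`. The helper names `amdgpu_firmware_in_listing` and `list_initrd` are hypothetical.

```python
import subprocess


def amdgpu_firmware_in_listing(listing: str) -> list[str]:
    """Return amdgpu firmware paths from an ls -l style initramfs listing.

    Assumes the path is the last whitespace-separated field of each line.
    """
    paths = []
    for line in listing.splitlines():
        fields = line.split()
        if fields and "firmware/amdgpu/" in fields[-1]:
            paths.append(fields[-1])
    return paths


def list_initrd() -> str:
    # Runs dracut's lsinitrd on the default initramfs; requires dracut installed.
    result = subprocess.run(["lsinitrd"], capture_output=True, text=True, check=True)
    return result.stdout
```

Usage would be along the lines of `amdgpu_firmware_in_listing(list_initrd())`. Comparing the result before and after regenerating the initramfs shows whether the firmware swap reached early boot.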

* [Bug 213391] AMDGPU retries page fault with some specific processes amdgpu and sometimes followed [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered
  2021-06-10 11:16 [Bug 213391] New: AMDGPU retries page fault with some specific processes amdgpu: [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered bugzilla-daemon
                   ` (21 preceding siblings ...)
  2021-06-19 12:15 ` bugzilla-daemon
@ 2021-06-20 13:02 ` bugzilla-daemon
  2021-06-20 21:07 ` bugzilla-daemon
                   ` (16 subsequent siblings)
  39 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2021-06-20 13:02 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=213391

--- Comment #21 from Dominic Letz (dominic.letz@berlin.de) ---
I've been running 20210315 since the 16th, and it has been stable so far,
versus multiple freezes a day before.

* [Bug 213391] AMDGPU retries page fault with some specific processes amdgpu and sometimes followed [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered
  2021-06-10 11:16 [Bug 213391] New: AMDGPU retries page fault with some specific processes amdgpu: [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered bugzilla-daemon
                   ` (22 preceding siblings ...)
  2021-06-20 13:02 ` bugzilla-daemon
@ 2021-06-20 21:07 ` bugzilla-daemon
  2021-06-21  7:04 ` bugzilla-daemon
                   ` (15 subsequent siblings)
  39 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2021-06-20 21:07 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=213391

--- Comment #22 from dimitris@gmail.com ---
I also updated the initrd to 20210315 and ran 5.12.11-200.fc33 for a day or so
without issues. Now I'm on 5.12.12-200.fc33; we'll see how it goes.

For reference, what's the best way to check the active/loaded firmware? I don't
see anything obvious in dmesg or lspci -vv.

* [Bug 213391] AMDGPU retries page fault with some specific processes amdgpu and sometimes followed [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered
  2021-06-10 11:16 [Bug 213391] New: AMDGPU retries page fault with some specific processes amdgpu: [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered bugzilla-daemon
                   ` (23 preceding siblings ...)
  2021-06-20 21:07 ` bugzilla-daemon
@ 2021-06-21  7:04 ` bugzilla-daemon
  2021-06-21 18:55 ` bugzilla-daemon
                   ` (14 subsequent siblings)
  39 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2021-06-21  7:04 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=213391

--- Comment #23 from Michel Dänzer (michel@daenzer.net) ---
/sys/kernel/debug/dri/0/amdgpu_firmware_info has all the info.
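Several reporters here attach this file, so a small parser makes it easier to diff two snapshots (for example, before and after swapping linux-firmware versions). This is a sketch under an assumed line shape such as `ME feature version: 29, firmware version: 0x0000009c`. Lines of any other shape (for example the VBIOS line) are skipped. The function names are hypothetical, and reading debugfs normally requires root.

```python
import re

# Assumed line shape, e.g. "ME feature version: 29, firmware version: 0x0000009c".
# Lines that don't match this pattern are ignored.
_FW_LINE = re.compile(
    r"^(?P<name>.+?) feature version: (?P<feature>\S+), "
    r"firmware version: (?P<fw>0x[0-9a-fA-F]+)"
)


def parse_firmware_info(text: str) -> dict[str, tuple[str, int]]:
    """Map each component name to (feature version string, firmware version int)."""
    result: dict[str, tuple[str, int]] = {}
    for line in text.splitlines():
        match = _FW_LINE.match(line.strip())
        if match:
            result[match.group("name")] = (match.group("feature"), int(match.group("fw"), 16))
    return result


def changed_components(old: dict, new: dict) -> list[str]:
    """Names present in both snapshots whose versions differ, sorted."""
    return sorted(name for name in old.keys() & new.keys() if old[name] != new[name])


def read_firmware_info(card: int = 0) -> str:
    # debugfs must be mounted and is normally readable only by root.
    with open(f"/sys/kernel/debug/dri/{card}/amdgpu_firmware_info") as f:
        return f.read()
```

Saving the output of `read_firmware_info()` under each firmware package and passing both parsed results to `changed_components` narrows a regression down to specific firmware components.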

* [Bug 213391] AMDGPU retries page fault with some specific processes amdgpu and sometimes followed [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered
  2021-06-10 11:16 [Bug 213391] New: AMDGPU retries page fault with some specific processes amdgpu: [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered bugzilla-daemon
                   ` (24 preceding siblings ...)
  2021-06-21  7:04 ` bugzilla-daemon
@ 2021-06-21 18:55 ` bugzilla-daemon
  2021-06-21 19:26 ` bugzilla-daemon
                   ` (13 subsequent siblings)
  39 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2021-06-21 18:55 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=213391

--- Comment #24 from Leandro Jacques (lsrzj@yahoo.com) ---
Created attachment 297557
  --> https://bugzilla.kernel.org/attachment.cgi?id=297557&action=edit
Firmware info

Downgrading to kernel 5.4.123 didn't have any effect; I hit the same bug. Here
is my firmware version information.

* [Bug 213391] AMDGPU retries page fault with some specific processes amdgpu and sometimes followed [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered
  2021-06-10 11:16 [Bug 213391] New: AMDGPU retries page fault with some specific processes amdgpu: [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered bugzilla-daemon
                   ` (25 preceding siblings ...)
  2021-06-21 18:55 ` bugzilla-daemon
@ 2021-06-21 19:26 ` bugzilla-daemon
  2021-06-29 23:55 ` bugzilla-daemon
                   ` (12 subsequent siblings)
  39 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2021-06-21 19:26 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=213391

--- Comment #25 from Leandro Jacques (lsrzj@yahoo.com) ---
(In reply to Dominic Letz from comment #21)

Trying the same linux-firmware version, 20210315. Let's see how it goes.

* [Bug 213391] AMDGPU retries page fault with some specific processes amdgpu and sometimes followed [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered
  2021-06-10 11:16 [Bug 213391] New: AMDGPU retries page fault with some specific processes amdgpu: [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered bugzilla-daemon
                   ` (26 preceding siblings ...)
  2021-06-21 19:26 ` bugzilla-daemon
@ 2021-06-29 23:55 ` bugzilla-daemon
  2021-06-29 23:58 ` bugzilla-daemon
                   ` (11 subsequent siblings)
  39 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2021-06-29 23:55 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=213391

--- Comment #26 from Lahfa Samy (samy@lahfa.xyz) ---
Created attachment 297669
  --> https://bugzilla.kernel.org/attachment.cgi?id=297669&action=edit
amdgpu-xorg-page-faults-screen-blackout-when-memory-heavily-used

Here are more logs. When I triggered the bug yet again on the 5.12.10-arch1-1
kernel on Arch Linux, the computer didn't freeze like before; it just stopped
displaying anything (Xorg was affected, which I guess is why).
I'm using this version of the linux-firmware package on Arch:
linux-firmware-20210511.7685cf4-1

I have not yet tested with a downgraded linux-firmware package; I may try that
soon if the issue hits me too frequently.

* [Bug 213391] AMDGPU retries page fault with some specific processes amdgpu and sometimes followed [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered
  2021-06-10 11:16 [Bug 213391] New: AMDGPU retries page fault with some specific processes amdgpu: [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered bugzilla-daemon
                   ` (27 preceding siblings ...)
  2021-06-29 23:55 ` bugzilla-daemon
@ 2021-06-29 23:58 ` bugzilla-daemon
  2021-06-30 19:00 ` bugzilla-daemon
                   ` (10 subsequent siblings)
  39 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2021-06-29 23:58 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=213391

--- Comment #27 from Lahfa Samy (samy@lahfa.xyz) ---
Created attachment 297671
  --> https://bugzilla.kernel.org/attachment.cgi?id=297671&action=edit
Firmware information for a T495 with an AMD Vega RX 10

Here again is my linux-firmware package version (as reported by pacman, from
the Arch Linux core repository): 20210511.7685cf4-1

* [Bug 213391] AMDGPU retries page fault with some specific processes amdgpu and sometimes followed [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered
  2021-06-10 11:16 [Bug 213391] New: AMDGPU retries page fault with some specific processes amdgpu: [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered bugzilla-daemon
                   ` (28 preceding siblings ...)
  2021-06-29 23:58 ` bugzilla-daemon
@ 2021-06-30 19:00 ` bugzilla-daemon
  2021-07-05 16:55 ` bugzilla-daemon
                   ` (9 subsequent siblings)
  39 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2021-06-30 19:00 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=213391

--- Comment #28 from Leandro Jacques (lsrzj@yahoo.com) ---
(In reply to Leandro Jacques from comment #25)

No problems so far. So the problem is with newer firmware versions: I've been
running version 20210315 without any issues since 2021-06-21 19:26:28 UTC.

* [Bug 213391] AMDGPU retries page fault with some specific processes amdgpu and sometimes followed [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered
  2021-06-10 11:16 [Bug 213391] New: AMDGPU retries page fault with some specific processes amdgpu: [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered bugzilla-daemon
                   ` (29 preceding siblings ...)
  2021-06-30 19:00 ` bugzilla-daemon
@ 2021-07-05 16:55 ` bugzilla-daemon
  2021-07-06 17:35 ` bugzilla-daemon
                   ` (8 subsequent siblings)
  39 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2021-07-05 16:55 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=213391

--- Comment #29 from Leandro Jacques (lsrzj@yahoo.com) ---
How do I file a bug against the linux-firmware project for the amdgpu driver?
Since the downgrade I haven't experienced any issues.

* [Bug 213391] AMDGPU retries page fault with some specific processes amdgpu and sometimes followed [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered
  2021-06-10 11:16 [Bug 213391] New: AMDGPU retries page fault with some specific processes amdgpu: [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered bugzilla-daemon
                   ` (30 preceding siblings ...)
  2021-07-05 16:55 ` bugzilla-daemon
@ 2021-07-06 17:35 ` bugzilla-daemon
  2021-07-08 18:24 ` bugzilla-daemon
                   ` (7 subsequent siblings)
  39 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2021-07-06 17:35 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=213391

--- Comment #30 from Leandro Jacques (lsrzj@yahoo.com) ---
(In reply to Dominic Letz from comment #21)

I did what you suggested, and I have had no issues since. It was a
linux-firmware package problem, not a kernel driver problem.

* [Bug 213391] AMDGPU retries page fault with some specific processes amdgpu and sometimes followed [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered
  2021-06-10 11:16 [Bug 213391] New: AMDGPU retries page fault with some specific processes amdgpu: [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered bugzilla-daemon
                   ` (31 preceding siblings ...)
  2021-07-06 17:35 ` bugzilla-daemon
@ 2021-07-08 18:24 ` bugzilla-daemon
  2021-07-08 18:28 ` bugzilla-daemon
                   ` (6 subsequent siblings)
  39 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2021-07-08 18:24 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=213391

--- Comment #31 from Lahfa Samy (samy@lahfa.xyz) ---
I just hit the same error even after downgrading; the installed linux-firmware
package is currently version 20210315.3568f96-3.

When it hit, the computer froze for a few seconds, and the logs show many retry
page faults from the amdgpu driver.
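To quantify "many retry page faults" and see which processes trigger them, a log scan like the following may help. This is a sketch. The matched strings `[gfxhub0] retry page fault` and `ring gfx timeout, but soft recovered` come from this bug's title. The `for process <name> pid` part is an assumption about the kernel's wording, and faults without it are counted under `"unknown"`. The function name is hypothetical.

```python
import re
from collections import Counter

# Hub name and message text come from the bug title; the optional
# "for process <name> pid" clause is an assumed part of the message.
_RETRY_FAULT = re.compile(
    r"\[(?P<hub>\w+)\] retry page fault(?:.*?for process (?P<process>\S+) pid)?"
)
_SOFT_RECOVERED = "ring gfx timeout, but soft recovered"


def summarize_amdgpu_faults(log: str) -> tuple[Counter, int]:
    """Count retry page faults per (hub, process) and soft-recovered gfx timeouts."""
    faults: Counter = Counter()
    timeouts = 0
    for line in log.splitlines():
        match = _RETRY_FAULT.search(line)
        if match:
            faults[(match.group("hub"), match.group("process") or "unknown")] += 1
        elif _SOFT_RECOVERED in line:
            timeouts += 1
    return faults, timeouts
```

Feeding it the text of a saved `dmesg` or journal log gives per-process fault counts that can be compared across firmware versions.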

Furthermore, I'm on Arch Linux and will attach the output of `modinfo amdgpu`.
I suspect that downgrading linux-firmware on my distro wasn't enough to
downgrade the AMDGPU driver.

* [Bug 213391] AMDGPU retries page fault with some specific processes amdgpu and sometimes followed [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered
  2021-06-10 11:16 [Bug 213391] New: AMDGPU retries page fault with some specific processes amdgpu: [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered bugzilla-daemon
                   ` (32 preceding siblings ...)
  2021-07-08 18:24 ` bugzilla-daemon
@ 2021-07-08 18:28 ` bugzilla-daemon
  2021-07-15 13:29 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  39 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2021-07-08 18:28 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=213391

--- Comment #32 from Lahfa Samy (samy@lahfa.xyz) ---
Created attachment 297781
  --> https://bugzilla.kernel.org/attachment.cgi?id=297781&action=edit
Archlinux-part-of-modinfo-amdgpu

I think my kernel is using the latest amdgpu driver, which comes with
5.12.13-arch1-2, and not the version that comes with the linux-firmware
package. Can anyone enlighten me or explain whether I'm mistaken?
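As a general note, the amdgpu driver itself is the amdgpu.ko module shipped with the kernel package. linux-firmware only supplies the binary blobs that the driver requests when it loads. `modinfo` lists those blob names on its `firmware:` lines, so a sketch like this one separates the two. The helper names are hypothetical. The chip prefix `picasso` is an assumption for this APU family.

```python
import subprocess


def firmware_requests(modinfo_output: str) -> list[str]:
    """Return firmware file names from the `firmware:` lines of modinfo output."""
    names = []
    for line in modinfo_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "firmware":
            names.append(value.strip())
    return names


def for_chip(names: list[str], chip: str) -> list[str]:
    """Keep only amdgpu firmware files for one chip prefix, e.g. "picasso"."""
    return [name for name in names if name.startswith(f"amdgpu/{chip}_")]


def modinfo(module: str = "amdgpu") -> str:
    # Requires kmod's modinfo on PATH.
    return subprocess.run(["modinfo", module], capture_output=True, text=True, check=True).stdout
```

Calling `for_chip(firmware_requests(modinfo()), "picasso")` lists the files the running driver expects. Those files come from linux-firmware, while the driver version follows the kernel package.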

* [Bug 213391] AMDGPU retries page fault with some specific processes amdgpu and sometimes followed [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered
  2021-06-10 11:16 [Bug 213391] New: AMDGPU retries page fault with some specific processes amdgpu: [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered bugzilla-daemon
                   ` (33 preceding siblings ...)
  2021-07-08 18:28 ` bugzilla-daemon
@ 2021-07-15 13:29 ` bugzilla-daemon
  2021-07-15 13:31 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  39 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2021-07-15 13:29 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=213391

--- Comment #33 from Leandro Jacques (lsrzj@yahoo.com) ---
Created attachment 297881
  --> https://bugzilla.kernel.org/attachment.cgi?id=297881&action=edit
Kernel crash log for linux firmware version 20210511.7685cf4

amdgpu kernel crash log from when the problem occurred, with the exact same
page fault message.

* [Bug 213391] AMDGPU retries page fault with some specific processes amdgpu and sometimes followed [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered
  2021-06-10 11:16 [Bug 213391] New: AMDGPU retries page fault with some specific processes amdgpu: [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered bugzilla-daemon
                   ` (34 preceding siblings ...)
  2021-07-15 13:29 ` bugzilla-daemon
@ 2021-07-15 13:31 ` bugzilla-daemon
  2021-08-14 21:00 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  39 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2021-07-15 13:31 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=213391

--- Comment #34 from Leandro Jacques (lsrzj@yahoo.com) ---
Created attachment 297883
  --> https://bugzilla.kernel.org/attachment.cgi?id=297883&action=edit
Linux Firmware version info 20210511.7685cf4

Firmware info at the moment the system crashed.

* [Bug 213391] AMDGPU retries page fault with some specific processes amdgpu and sometimes followed [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered
  2021-06-10 11:16 [Bug 213391] New: AMDGPU retries page fault with some specific processes amdgpu: [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered bugzilla-daemon
                   ` (35 preceding siblings ...)
  2021-07-15 13:31 ` bugzilla-daemon
@ 2021-08-14 21:00 ` bugzilla-daemon
  2021-09-10 11:46 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  39 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2021-08-14 21:00 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=213391

mcmarius@gmx.net changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mcmarius@gmx.net

--- Comment #35 from mcmarius@gmx.net ---
I have a Lenovo L340 and the same problem.

Here is the complete dmesg log:

https://gist.github.com/McMarius11/36c8d21a2dcaf5c2289c91a74af4f7fb

Operating System: Manjaro Linux
KDE Plasma Version: 5.22.4
KDE Frameworks Version: 5.84.0
Qt Version: 5.15.2
Kernel Version: 5.11.22-2-MANJARO (64-bit)
Graphics Platform: X11
Processors: 8 × AMD Ryzen 7 3700U with Radeon Vega Mobile Gfx
Memory: 5.6 GiB of RAM
Graphics Processor: AMD Radeon™ Vega 10 Graphics

* [Bug 213391] AMDGPU retries page fault with some specific processes amdgpu and sometimes followed [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered
  2021-06-10 11:16 [Bug 213391] New: AMDGPU retries page fault with some specific processes amdgpu: [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered bugzilla-daemon
                   ` (36 preceding siblings ...)
  2021-08-14 21:00 ` bugzilla-daemon
@ 2021-09-10 11:46 ` bugzilla-daemon
  2021-09-10 13:29 ` bugzilla-daemon
  2021-11-19 13:28 ` bugzilla-daemon
  39 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2021-09-10 11:46 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=213391

--- Comment #36 from Lahfa Samy (samy@lahfa.xyz) ---
Did anyone test whether this has been fixed in newer firmware updates, or
should we still stay on version 20210315.3568f96-3?

* [Bug 213391] AMDGPU retries page fault with some specific processes amdgpu and sometimes followed [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered
  2021-06-10 11:16 [Bug 213391] New: AMDGPU retries page fault with some specific processes amdgpu: [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered bugzilla-daemon
                   ` (37 preceding siblings ...)
  2021-09-10 11:46 ` bugzilla-daemon
@ 2021-09-10 13:29 ` bugzilla-daemon
  2021-11-19 13:28 ` bugzilla-daemon
  39 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2021-09-10 13:29 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=213391

--- Comment #37 from Michel Dänzer (michel@daenzer.net) ---
(In reply to Lahfa Samy from comment #36)
> Did anyone test whether this has been fixed in newer firmware updates, or
> should we still stay on version 20210315.3568f96-3 ?

It's fixed in upstream linux-firmware 20210818.
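A rough way to check whether an installed package includes that upstream snapshot is to compare the leading YYYYMMDD date of its version string. This is a heuristic sketch. It assumes version strings start with the upstream date, as the Arch versions quoted in this thread do (e.g. `20210511.7685cf4-1`). Distributions may backport or repackage firmware, so a date check is not proof either way. The names are hypothetical.

```python
import re
from datetime import date

# Upstream linux-firmware snapshot reported as containing the fix.
FIXED_IN = date(2021, 8, 18)


def snapshot_date(version: str) -> date:
    """Extract the leading YYYYMMDD upstream snapshot date from a version string."""
    match = re.match(r"(\d{4})(\d{2})(\d{2})", version)
    if not match:
        raise ValueError(f"no YYYYMMDD snapshot date in {version!r}")
    return date(*map(int, match.groups()))


def has_fix(version: str) -> bool:
    """True if the package's upstream snapshot is on or after FIXED_IN."""
    return snapshot_date(version) >= FIXED_IN
```

For the versions mentioned above, `has_fix("20210511.7685cf4-1")` and `has_fix("20210315.3568f96-3")` are both False.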

* [Bug 213391] AMDGPU retries page fault with some specific processes amdgpu and sometimes followed [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered
  2021-06-10 11:16 [Bug 213391] New: AMDGPU retries page fault with some specific processes amdgpu: [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered bugzilla-daemon
                   ` (38 preceding siblings ...)
  2021-09-10 13:29 ` bugzilla-daemon
@ 2021-11-19 13:28 ` bugzilla-daemon
  39 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2021-11-19 13:28 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=213391

Lahfa Samy (samy@lahfa.xyz) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |UNREPRODUCIBLE

Thread overview: 41+ messages
2021-06-10 11:16 [Bug 213391] New: AMDGPU retries page fault with some specific processes amdgpu: [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered bugzilla-daemon
2021-06-10 11:33 ` [Bug 213391] " bugzilla-daemon
2021-06-10 11:43 ` bugzilla-daemon
2021-06-10 12:34 ` [Bug 213391] AMDGPU retries page fault with some specific processes amdgpu and sometimes " bugzilla-daemon
2021-06-10 12:34 ` [Bug 213391] AMDGPU retries page fault with some specific processes amdgpu and sometimes followed " bugzilla-daemon
2021-06-10 12:36 ` bugzilla-daemon
2021-06-10 12:51 ` bugzilla-daemon
2021-06-10 13:09 ` bugzilla-daemon
2021-06-10 13:19 ` bugzilla-daemon
2021-06-10 17:41 ` bugzilla-daemon
2021-06-10 19:45 ` bugzilla-daemon
2021-06-11  7:31 ` bugzilla-daemon
2021-06-11 23:32 ` bugzilla-daemon
2021-06-12 23:02 ` bugzilla-daemon
2021-06-13 17:43 ` bugzilla-daemon
2021-06-14  8:01 ` bugzilla-daemon
2021-06-15 22:14 ` bugzilla-daemon
2021-06-16  8:51 ` bugzilla-daemon
2021-06-16 10:46 ` bugzilla-daemon
2021-06-16 20:55 ` bugzilla-daemon
2021-06-18 18:27 ` bugzilla-daemon
2021-06-18 20:30 ` bugzilla-daemon
2021-06-19 12:15 ` bugzilla-daemon
2021-06-20 13:02 ` bugzilla-daemon
2021-06-20 21:07 ` bugzilla-daemon
2021-06-21  7:04 ` bugzilla-daemon
2021-06-21 18:55 ` bugzilla-daemon
2021-06-21 19:26 ` bugzilla-daemon
2021-06-29 23:55 ` bugzilla-daemon
2021-06-29 23:58 ` bugzilla-daemon
2021-06-30 19:00 ` bugzilla-daemon
2021-07-05 16:55 ` bugzilla-daemon
2021-07-06 17:35 ` bugzilla-daemon
2021-07-08 18:24 ` bugzilla-daemon
2021-07-08 18:28 ` bugzilla-daemon
2021-07-15 13:29 ` bugzilla-daemon
2021-07-15 13:31 ` bugzilla-daemon
2021-08-14 21:00 ` bugzilla-daemon
2021-09-10 11:46 ` bugzilla-daemon
2021-09-10 13:29 ` bugzilla-daemon
2021-11-19 13:28 ` bugzilla-daemon
