* [Bug 213569] New: Amdgpu temperature reaching dangerous levels
@ 2021-06-24 15:32 bugzilla-daemon
  2021-06-24 20:58 ` [Bug 213569] " bugzilla-daemon
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: bugzilla-daemon @ 2021-06-24 15:32 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=213569

            Bug ID: 213569
           Summary: Amdgpu temperature reaching dangerous levels
           Product: Drivers
           Version: 2.5
    Kernel Version: 5.12, 5.11
          Hardware: x86-64
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: blocking
          Priority: P1
         Component: Video(DRI - non Intel)
          Assignee: drivers_video-dri@kernel-bugs.osdl.org
          Reporter: martin.tk@gmx.com
        Regression: No

Ever since moving to kernel 5.11, and later 5.12, the fan speed on my Radeon
RX550 has been erratic, causing the temperature to reach dangerous levels.

sensors output:

amdgpu-pci-0100
Adapter: PCI adapter
vddgfx:      825.00 mV 
fan1:         200 RPM  (min =    0 RPM, max = 3500 RPM)
edge:         +69.0°C  (crit = +97.0°C, hyst = -273.1°C)
power1:        7.03 W  (cap =  36.00 W)
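
The readings above are what the kernel exposes through the hwmon sysfs
interface. As a rough sketch (not part of the original report), they could be
read directly as shown below. It assumes that temp1 is the "edge" sensor and
that fan1_input is present for this card; the helper names are hypothetical.

```python
from pathlib import Path
from typing import Optional


def parse_millidegrees(text: str) -> float:
    """hwmon temperature files hold integer millidegrees Celsius."""
    return int(text.strip()) / 1000.0


def read_value(path: Path) -> Optional[str]:
    """Return the stripped file contents, or None if the attribute is absent."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def read_hwmon(hwmon_dir: Path) -> dict:
    """Collect temperature, critical limit and fan speed from one hwmon directory.

    Assumption: temp1_* is the edge sensor (temp1_label would confirm this).
    """
    temp = read_value(hwmon_dir / "temp1_input")
    crit = read_value(hwmon_dir / "temp1_crit")
    fan = read_value(hwmon_dir / "fan1_input")
    return {
        "name": read_value(hwmon_dir / "name"),
        "edge_c": parse_millidegrees(temp) if temp is not None else None,
        "crit_c": parse_millidegrees(crit) if crit is not None else None,
        "fan_rpm": int(fan) if fan is not None else None,
    }
```

Calling read_hwmon() on each /sys/class/hwmon/hwmon* directory and keeping the
entry whose "name" is "amdgpu" should give the same numbers that sensors prints.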


I'm afraid it'll eventually kill my gpu.

I've already reported another bug for 5.11: 
https://bugzilla.kernel.org/show_bug.cgi?id=212107

From what I gather, there were changes to fan control in 5.11. Is it possible
to disable those changes?
There were no issues on 5.10. The fan ran at roughly 1000 RPM, and the card
stayed cool and quiet.

The behaviour from 5.11 onward is dangerous and can destroy hardware.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.


* [Bug 213569] Amdgpu temperature reaching dangerous levels
From: bugzilla-daemon @ 2021-06-24 20:58 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=213569

miloog (mileikasjos@mailbox.org) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mileikasjos@mailbox.org

--- Comment #1 from miloog (mileikasjos@mailbox.org) ---
I can confirm.

But in a different scenario. I'm using Debian bullseye with the LTS kernel and
the latest amdgpu firmware. I haven't changed any fan control mechanism.

5.10.44 and 5.10.45 work fine, but on 5.10.46, just starting sway (a Wayland
window manager) puts my GPU usage at 100% without doing anything.

It's a Vega 56.


* [Bug 213569] Amdgpu temperature reaching dangerous levels
From: bugzilla-daemon @ 2021-06-25 12:34 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=213569

--- Comment #2 from Martin (martin.tk@gmx.com) ---
In my case it was watching a video that made the GPU reach 70°C.


* [Bug 213569] Amdgpu temperature reaching dangerous levels
From: bugzilla-daemon @ 2021-06-27  6:09 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=213569

James (mrjameshennig@gmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mrjameshennig@gmail.com

--- Comment #3 from James (mrjameshennig@gmail.com) ---
This is a legitimate bug which is present starting 5.12.13 and the issue was
said to have been fixed starting 5.13-rc8. I wanted to reassure you that a
70°C edge temperature cannot damage that GPU. Notice "crit = +97.0°C", which
is the throttle temperature.

The computer should shut down at the "emerg" temperature which is not present
in your sensors output, but should be +5.0°C over "crit" for your GPU.
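
The margins described here can be worked out from the numbers in the original
sensors output. The sketch below is only illustrative: the +5.0°C "emerg"
offset is the estimate given in this comment, not a value reported by sensors,
and the function name is hypothetical.

```python
def thermal_headroom(edge_c: float, crit_c: float, emerg_offset_c: float = 5.0) -> dict:
    """Return the distance from the current edge temperature to each limit.

    emerg_offset_c is an assumption taken from this comment (emerg = crit + 5.0).
    """
    emerg_c = crit_c + emerg_offset_c
    return {
        "to_crit_c": crit_c - edge_c,    # margin before throttling
        "emerg_c": emerg_c,              # estimated shutdown temperature
        "to_emerg_c": emerg_c - edge_c,  # margin before shutdown
    }


# Values from the sensors output in the original report: edge +69.0, crit +97.0
print(thermal_headroom(69.0, 97.0))
```

With the reported values this leaves 28°C before throttling and, under the
assumed offset, 33°C before an emergency shutdown at 102°C.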


* [Bug 213569] Amdgpu temperature reaching dangerous levels
From: bugzilla-daemon @ 2021-06-27  7:14 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=213569

Frank Kruger (fkrueger@mailbox.org) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fkrueger@mailbox.org

--- Comment #4 from Frank Kruger (fkrueger@mailbox.org) ---
(In reply to miloog from comment #1)
> I can confirm.
> 
> But in a different scenario. I'm using Debian bullseye with the LTS kernel and
> the latest amdgpu firmware. I haven't changed any fan control mechanism.
> 
> 5.10.44 and 5.10.45 work fine, but on 5.10.46, just starting sway (a Wayland
> window manager) puts my GPU usage at 100% without doing anything.
> 
> It's a Vega 56.

You are probably hit by a recent regression introduced with kernels 5.10.46 and
5.12.13 (cf. https://bugzilla.kernel.org/show_bug.cgi?id=213561), for which
patches are on their way
(https://lists.freedesktop.org/archives/amd-gfx/2021-June/065612.html). This is
not related to the original bug report here, I presume.


* [Bug 213569] Amdgpu temperature reaching dangerous levels
From: bugzilla-daemon @ 2021-06-28 13:09 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=213569

--- Comment #5 from Martin (martin.tk@gmx.com) ---
(In reply to James from comment #3)
> This is a legitimate bug which is present starting 5.12.13 and the issue was
> said to have been fixed starting 5.13-rc8. I wanted to reassure you that a
> 70°C edge temperature cannot damage that GPU. Notice "crit = +97.0°C", which
> is the throttle temperature.
> 
> The computer should shut down at the "emerg" temperature which is not
> present in your sensors output, but should be +5.0°C over "crit" for your
> GPU.

Thank you for the explanation. I've never seen 70°C on my GPU before, so to me
it looked scary.

Before those changes landed in 5.11, the usual temperature on my GPU was
around 40°C. The fan ran at around 1000 RPM, which on my GPU doesn't produce
any perceivable sound.


* [Bug 213569] Amdgpu temperature reaching dangerous levels
From: bugzilla-daemon @ 2021-07-17 11:44 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=213569

Martin (martin.tk@gmx.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Kernel Version|5.12, 5.11                  |5.13
