dri-devel.lists.freedesktop.org archive mirror
From: bugzilla-daemon@bugzilla.kernel.org
To: dri-devel@lists.freedesktop.org
Subject: [Bug 206475] amdgpu under load drop signal to monitor until hard reset
Date: Wed, 24 Jun 2020 20:33:24 +0000
Message-ID: <bug-206475-2300-ltqfoIk2Za@https.bugzilla.kernel.org/>
In-Reply-To: <bug-206475-2300@https.bugzilla.kernel.org/>

https://bugzilla.kernel.org/show_bug.cgi?id=206475

--- Comment #15 from Andrew Ammerlaan (andrewammerlaan@riseup.net) ---
So today it was *really* hot, and I had this issue occur a couple of times.
(The solution with the extra fans was nice and all, but not enough to prevent
it entirely.)

However, now that the iGPU is the default, I can still see the system monitor
that I usually run on the other monitor when this issue occurs. Every single
time, the thermal sensor of the GPU shows a ridiculous value (e.g. 511 degrees
Celsius).

Now, this could explain why the GPU does a reset. If the thermal sensor all of
a sudden returns a value of e.g. 511, then of course the GPU will shut itself
down.

It is clearly impossible for the temperature of the GPU to jump from somewhere
between 80 and 90 to over 500 within a couple of milliseconds, so I conclude
that there is something wrong, either physically with the thermal sensor, or
with the way the firmware/driver handles the temperature reporting from the
sensor. Also, if the GPU had actually reached a temperature of 511 it would be
broken now, as the melting temperature of tin (the main component of solder)
is about 230 degrees Celsius.
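
To make the plausibility argument concrete, here is a rough sketch of the kind
of sanity check I have in mind. It is not taken from the amdgpu driver or the
firmware; the function name and both thresholds are assumptions picked purely
for illustration.

```python
from typing import Optional

# Hypothetical limits, chosen only for illustration (not driver values).
MAX_PLAUSIBLE_C = 150.0    # assumed ceiling for a believable GPU reading
MAX_RATE_C_PER_S = 50.0    # assumed fastest believable temperature change


def is_plausible(prev_c: Optional[float], curr_c: float, dt_s: float) -> bool:
    """Return True if curr_c is a believable reading dt_s seconds after prev_c."""
    # Reject readings above the assumed physical ceiling outright.
    if curr_c > MAX_PLAUSIBLE_C:
        return False
    # Without an earlier sample (or a usable time step) only the ceiling applies.
    if prev_c is None or dt_s <= 0:
        return True
    # Reject jumps that are faster than the assumed maximum rate of change.
    return abs(curr_c - prev_c) / dt_s <= MAX_RATE_C_PER_S
```

With these assumed limits, a step from 85 to 88 degrees over one second passes,
while a step from 85 to 511 degrees within 5 ms is rejected both by the ceiling
and by the rate check.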

I happen to work with thermometers quite a lot, and I have seen temperature
readings do stuff like this. Usually the cause is either a broken or shorted
sensor (which is unlikely in this case, because it works normally most of the
time), or a wrong/incomplete calibration curve. (Usually thermal sensors are
only calibrated within the range they are expected to operate in, and the high
limit of this calibration curve might be too low.)
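
As an illustration of the calibration point, the sketch below converts a raw
sensor value to degrees Celsius with a lookup table and linear interpolation,
and refuses to answer outside the calibrated range instead of extrapolating.
The table values and the name raw_to_celsius are made up for this example; I
do not know how the GPU firmware actually does the conversion.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Hypothetical calibration points as (raw sensor value, degrees Celsius).
CAL_TABLE: List[Tuple[int, float]] = [
    (100, 20.0),
    (200, 50.0),
    (300, 80.0),
    (400, 110.0),
]


def raw_to_celsius(raw: int,
                   table: List[Tuple[int, float]] = CAL_TABLE) -> Optional[float]:
    """Interpolate linearly inside the table; return None outside its range."""
    raws = [r for r, _ in table]
    # Outside the calibrated range the curve says nothing reliable.
    if raw < raws[0] or raw > raws[-1]:
        return None
    # Index of the last calibration point whose raw value is <= raw.
    i = bisect_right(raws, raw) - 1
    if i == len(table) - 1:
        return table[-1][1]
    (r0, t0), (r1, t1) = table[i], table[i + 1]
    return t0 + (t1 - t0) * (raw - r0) / (r1 - r0)
```

A converter that extrapolates or wraps around past the last calibration point,
rather than returning "no valid reading" as above, is one way a sensor that is
otherwise fine could end up reporting a nonsensical temperature.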

Anyway, either the GPU reset is caused by the incorrect temperature readings,
or the incorrect temperature readings are caused by the GPU reset (which is
also possible, I guess). In any case, it would be great if AMD could look into
this soon, because clearly something is wrong.

